From kishor.kharbas at intel.com Thu Sep 1 05:17:04 2016
From: kishor.kharbas at intel.com (Kharbas, Kishor)
Date: Thu, 1 Sep 2016 05:17:04 +0000
Subject: RFR(M) 8078122 : YMM registers upper 128 bits may get clobbered by a JNI call on windows
In-Reply-To:
References: <57BE1AD4.7070403@oracle.com>
Message-ID:

Hello,

I removed the unwanted save and restore of registers in the range
XMM6-XMM31 from the x86_64 stubs. I also removed the #ifdef _WIN64 block
from the x86.ad file.

Link to the new patch: http://cr.openjdk.java.net/~vdeshpande/8078122/webrev.01/

Thanks
Kishor

-----Original Message-----
From: Kharbas, Kishor
Sent: Wednesday, August 24, 2016 6:24 PM
To: Vladimir Kozlov ; hotspot-compiler-dev at openjdk.java.net
Cc: Kharbas, Kishor
Subject: RE: RFR(M) 8078122 : YMM registers upper 128 bits may get clobbered by a JNI call on windows

Thanks Vladimir for the quick feedback. I will look into the stubs which
save the registers in the range XMM6-XMM31. Also, the first comment makes
perfect sense.

Thanks
Kishor

-----Original Message-----
From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
Sent: Wednesday, August 24, 2016 3:08 PM
To: Kharbas, Kishor ; hotspot-compiler-dev at openjdk.java.net
Subject: Re: RFR(M) 8078122 : YMM registers upper 128 bits may get clobbered by a JNI call on windows

Hi Kishor,

First, #ifdef _WIN64 is not needed anymore since the calling convention is
similar to unix now.

Second, I would like you to look more broadly. With this change we don't
need to preserve XMM6-XMM31 in our stubs for WIN64. I am not sure that we
can remove all #ifdef _WIN64 there, but for most of them I think we can.
Please, look.

Thanks,
Vladimir

On 8/24/16 2:40 PM, Kharbas, Kishor wrote:
> Requesting the community to review the patch for
> https://bugs.openjdk.java.net/browse/JDK-8078122
>
> Webrev : http://cr.openjdk.java.net/~vdeshpande/8078122/webrev.00
>
> The patch changes the definitions of registers XMM6-XMM31 for WIN64.
>
> Thank you.
>
> Kishor
>

From dmitrij.pochepko at oracle.com Thu Sep 1 14:28:57 2016
From: dmitrij.pochepko at oracle.com (Dmitrij Pochepko)
Date: Thu, 1 Sep 2016 17:28:57 +0300
Subject: RFR(S): 8146096 - [TEST BUG] compiler/loopopts/UseCountedLoopSafepoints.java Timeouts
In-Reply-To: <6e2b1074-0573-b078-3ade-1d028e5fb377@oracle.com>
References: <4d957fd9-442a-9096-ac01-0f39a974f6a2@oracle.com> <91e6de05-de11-d026-aa25-127775169479@oracle.com> <6e2b1074-0573-b078-3ade-1d028e5fb377@oracle.com>
Message-ID: <20b0d0de-c992-e7e3-dff4-601cc6ab03f5@oracle.com>

Thank you for the attentive review.

> Wow! You wrote a "parser" for the C2 Ideal graph.
>
> // now, find SafePoint->CountedLoopEnd edge
>
> Actually CountedLoopEnd input edge should point to SafePoint. Not
> reverse. You should search from LoopEnd up.

fixed

>
> 115 SafePoint === 112 1 107 1 1 10 110 [[ 116 ]] SafePoint !orig=76 !jvms: SimpleTest::testMethod @ bci:21
> 116 CountedLoopEnd === 115 105 [[ 117 118 ]] [lt] P=0.500000, C=6633.000000 !orig=97,[80] !jvms: SimpleTest::testMethod @ bci:5
>
> The test is too simple. Without UseCountedLoopSafepoints the loop is
> folded to accum += 100. You need something a little more complex.

I've changed the increment to a combination of different shifts.

>
> Also test should be run only with C2 (Tiered should be off). I don't
> think require vm.opt.TieredStopAtLevel is the same as
> -XX:-TieredCompilation.

The requires expression just helps to ensure that level 4 is available,
because testMethod is specifically compiled at level=4 using WhiteBox
(CompilerWhiteBoxTest.COMP_LEVEL_FULL_OPTIMIZATION).
This differs from the original test idea, which was to disable tiered
compilation and trigger compilation by running the loop 2 billion times,
which made it a long "stress" test.
Now that the method is compiled via WhiteBox, this test became much faster.

Please take a look at v03:
http://cr.openjdk.java.net/~dpochepk/8146096/webrev.03/

Thanks,
Dmitrij

>
> Thanks,
> Vladimir
>
> On 8/31/16 12:03 PM, Dmitrij Pochepko wrote:
>> Hi,
>>
>> Please take a look at v02.
>>
>> I've rewritten this test. Now it launches the VM with
>> -XX:+UseCountedLoopSafepoints (restricting compilation to the tested
>> method with a single simple counted loop) and checks that the output of
>> -XX:+PrintIdeal has the edge SafePoint -> CountedLoopEnd.
>>
>> Then it launches the same with -XX:-UseCountedLoopSafepoints and checks
>> that there is no such edge.
>>
>> I've tested the fix via rbt on all platforms.
>>
>> webrev: http://cr.openjdk.java.net/~dpochepk/8146096/webrev.02/
>>
>>
>> Thanks,
>>
>> Dmitrij
>>
>> On 29.08.2016 19:28, Vladimir Kozlov wrote:
>>> I am not against marking test as stress but I think the test itself is
>>> not good. It should be rewritten. I added a comment to the JBS with
>>> the discussion during the original 8146096 RFR.
>>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 8/29/16 7:08 AM, Dmitrij Pochepko wrote:
>>>> Hi,
>>>>
>>>> please review a small fix for 8146096 - [TEST BUG]
>>>> compiler/loopopts/UseCountedLoopSafepoints.java Timeouts
>>>>
>>>>
>>>> The test times out on slow platforms, so this fix adds execution
>>>> control with respect to elapsed time. Also, the test is marked as
>>>> stress.
>>>>
>>>> webrev: http://cr.openjdk.java.net/~dpochepk/8146096/webrev.01/
>>>>
>>>> CR: https://bugs.openjdk.java.net/browse/JDK-8146096
>>>>
>>>>
>>>> I've tested this fix on linux-amd64.
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Dmitrij
>>>>
>>

From vladimir.x.ivanov at oracle.com Thu Sep 1 15:35:32 2016
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Thu, 1 Sep 2016 18:35:32 +0300
Subject: [9] RFR(XS): 8165050: [jittester] generated tests cannot be run with jtreg
In-Reply-To: <3e682080-a02c-d2ab-2827-e4233a49082f@oracle.com>
References: <3e682080-a02c-d2ab-2827-e4233a49082f@oracle.com>
Message-ID:

Reviewed.

Best regards,
Vladimir Ivanov

On 8/31/16 8:14 PM, Tatiana Pivovarova wrote:
> Hello!
>
> Please review this small patch
>
> Bug:
> After jdk.test.lib was moved and the jittester build was subsequently
> fixed (8164648), JTREG cannot find the jdk.test.lib library to compile
> the generated tests.
> Fix:
> After considering several approaches we decided to return to the
> previous approach: just copy jdk.test.lib into the $TESTBASE folder.
>
> Tested locally
>
> bug: https://bugs.openjdk.java.net/browse/JDK-8165050
> webrev: http://cr.openjdk.java.net/~tpivovarova/8165050/webrev.00/
>
> Thanks,
> Tatiana
>

From dmitrij.pochepko at oracle.com Thu Sep 1 17:00:13 2016
From: dmitrij.pochepko at oracle.com (Dmitrij Pochepko)
Date: Thu, 1 Sep 2016 20:00:13 +0300
Subject: RFR(XS): 8165244 - Unquarantine compiler/jvmci/compilerToVM/ExecuteInstalledCodeTest.java
Message-ID:

Hi,

please review a small patch for 8165244 - Unquarantine
compiler/jvmci/compilerToVM/ExecuteInstalledCodeTest.java

The quarantine reason (JDK-8139383) is closed as a duplicate of the fixed
JDK-8139700, so
compiler/jvmci/compilerToVM/ExecuteInstalledCodeTest.java should be
unquarantined.

webrev: http://cr.openjdk.java.net/~dpochepk/8165244/webrev.01/

CR: https://bugs.openjdk.java.net/browse/JDK-8165244

I've tested the fix on linux-amd64.

Thanks,

Dmitrij

From tatiana.pivovarova at oracle.com Thu Sep 1 17:05:26 2016
From: tatiana.pivovarova at oracle.com (Tatiana Pivovarova)
Date: Thu, 1 Sep 2016 20:05:26 +0300
Subject: [9] RFR(XS): 8165050: [jittester] generated tests cannot be run with jtreg
In-Reply-To:
References: <3e682080-a02c-d2ab-2827-e4233a49082f@oracle.com>
Message-ID:

Thanks Vladimir!

On 09/01/2016 06:35 PM, Vladimir Ivanov wrote:
> Reviewed.
>
> Best regards,
> Vladimir Ivanov

From vladimir.kozlov at oracle.com Thu Sep 1 17:36:15 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 1 Sep 2016 10:36:15 -0700
Subject: RFR(XS): 8165244 - Unquarantine compiler/jvmci/compilerToVM/ExecuteInstalledCodeTest.java
In-Reply-To:
References:
Message-ID:

Good.

thanks,
Vladimir

On 9/1/16 10:00 AM, Dmitrij Pochepko wrote:
> Hi,
>
> please review a small patch for 8165244 - Unquarantine
> compiler/jvmci/compilerToVM/ExecuteInstalledCodeTest.java
>
> The quarantine reason (JDK-8139383) is closed as a duplicate of the
> fixed JDK-8139700, so
> compiler/jvmci/compilerToVM/ExecuteInstalledCodeTest.java should be
> unquarantined.
>
> webrev: http://cr.openjdk.java.net/~dpochepk/8165244/webrev.01/
>
> CR: https://bugs.openjdk.java.net/browse/JDK-8165244
>
> I've tested the fix on linux-amd64.
>
> Thanks,
>
> Dmitrij
>

From dmitrij.pochepko at oracle.com Thu Sep 1 17:38:56 2016
From: dmitrij.pochepko at oracle.com (Dmitrij Pochepko)
Date: Thu, 1 Sep 2016 20:38:56 +0300
Subject: RFR(XS): 8165244 - Unquarantine compiler/jvmci/compilerToVM/ExecuteInstalledCodeTest.java
In-Reply-To:
References:
Message-ID:

Thank you!

On 01.09.2016 20:36, Vladimir Kozlov wrote:
> Good.
>
> thanks,
> Vladimir

From vladimir.kozlov at oracle.com Thu Sep 1 17:40:57 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 1 Sep 2016 10:40:57 -0700
Subject: RFR(S): 8146096 - [TEST BUG] compiler/loopopts/UseCountedLoopSafepoints.java Timeouts
In-Reply-To: <20b0d0de-c992-e7e3-dff4-601cc6ab03f5@oracle.com>
References: <4d957fd9-442a-9096-ac01-0f39a974f6a2@oracle.com> <91e6de05-de11-d026-aa25-127775169479@oracle.com> <6e2b1074-0573-b078-3ade-1d028e5fb377@oracle.com> <20b0d0de-c992-e7e3-dff4-601cc6ab03f5@oracle.com>
Message-ID:

Yes, this looks good.

On 9/1/16 7:28 AM, Dmitrij Pochepko wrote:
> Thank you for the attentive review.
>
>
>> Wow! You wrote a "parser" for the C2 Ideal graph.
>>
>> // now, find SafePoint->CountedLoopEnd edge
>>
>> Actually CountedLoopEnd input edge should point to SafePoint. Not
>> reverse. You should search from LoopEnd up.
> fixed
>>
>> 115 SafePoint === 112 1 107 1 1 10 110 [[ 116 ]] SafePoint !orig=76 !jvms: SimpleTest::testMethod @ bci:21
>> 116 CountedLoopEnd === 115 105 [[ 117 118 ]] [lt] P=0.500000, C=6633.000000 !orig=97,[80] !jvms: SimpleTest::testMethod @ bci:5
>>
>> The test is too simple. Without UseCountedLoopSafepoints the loop is
>> folded to accum += 100. You need something a little more complex.
> I've changed the increment to a combination of different shifts.
>>
>> Also test should be run only with C2 (Tiered should be off). I don't
>> think require vm.opt.TieredStopAtLevel is the same as
>> -XX:-TieredCompilation.
> The requires expression just helps to ensure that level 4 is available,
> because testMethod is specifically compiled at level=4 using WhiteBox
> (CompilerWhiteBoxTest.COMP_LEVEL_FULL_OPTIMIZATION).
> This differs from the original test idea, which was to disable tiered
> compilation and trigger compilation by running the loop 2 billion
> times, which made it a long "stress" test. Now that the method is
> compiled via WhiteBox, this test became much faster.

Got it.

Thanks,
Vladimir

>
> Please take a look at v03:
> http://cr.openjdk.java.net/~dpochepk/8146096/webrev.03/
>
> Thanks,
> Dmitrij
>>
>> Thanks,
>> Vladimir
>>
>> On 8/31/16 12:03 PM, Dmitrij Pochepko wrote:
>>> Hi,
>>>
>>> Please take a look at v02.
>>>
>>> I've rewritten this test. Now it launches the VM with
>>> -XX:+UseCountedLoopSafepoints (restricting compilation to the tested
>>> method with a single simple counted loop) and checks that the output
>>> of -XX:+PrintIdeal has the edge SafePoint -> CountedLoopEnd.
>>>
>>> Then it launches the same with -XX:-UseCountedLoopSafepoints and
>>> checks that there is no such edge.
>>>
>>> I've tested the fix via rbt on all platforms.
>>>
>>> webrev: http://cr.openjdk.java.net/~dpochepk/8146096/webrev.02/
>>>
>>>
>>> Thanks,
>>>
>>> Dmitrij
>>>
>>> On 29.08.2016 19:28, Vladimir Kozlov wrote:
>>>> I am not against marking test as stress but I think the test itself is
>>>> not good. It should be rewritten.

From dmitrij.pochepko at oracle.com Thu Sep 1 17:44:24 2016
From: dmitrij.pochepko at oracle.com (Dmitrij Pochepko)
Date: Thu, 1 Sep 2016 20:44:24 +0300
Subject: RFR(S): 8146096 - [TEST BUG] compiler/loopopts/UseCountedLoopSafepoints.java Timeouts
In-Reply-To:
References: <4d957fd9-442a-9096-ac01-0f39a974f6a2@oracle.com> <91e6de05-de11-d026-aa25-127775169479@oracle.com> <6e2b1074-0573-b078-3ade-1d028e5fb377@oracle.com> <20b0d0de-c992-e7e3-dff4-601cc6ab03f5@oracle.com>
Message-ID: <3b88b0b2-4f2e-175e-c45e-e85e2ca719ef@oracle.com>

Thank you for the review!

On 01.09.2016 20:40, Vladimir Kozlov wrote:
> Yes, this looks good.

From vladimir.kozlov at oracle.com Thu Sep 1 17:47:56 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 1 Sep 2016 10:47:56 -0700
Subject: RFR(S): 8146096 - [TEST BUG] compiler/loopopts/UseCountedLoopSafepoints.java Timeouts
In-Reply-To:
References: <4d957fd9-442a-9096-ac01-0f39a974f6a2@oracle.com> <91e6de05-de11-d026-aa25-127775169479@oracle.com> <6e2b1074-0573-b078-3ade-1d028e5fb377@oracle.com> <20b0d0de-c992-e7e3-dff4-601cc6ab03f5@oracle.com>
Message-ID:

I forgot another useful C2 flag, -XX:LoopUnrollLimit=0, to avoid loop
iteration splitting. Otherwise you will get several loops (pre-, main-,
post-).

Thanks,
Vladimir

On 9/1/16 10:40 AM, Vladimir Kozlov wrote:
> Yes, this looks good.
>
> On 9/1/16 7:28 AM, Dmitrij Pochepko wrote:
>> Thank you for the attentive review.
>>
>>
>>> Wow! You wrote a "parser" for the C2 Ideal graph.
>>>
>>> // now, find SafePoint->CountedLoopEnd edge
>>>
>>> Actually CountedLoopEnd input edge should point to SafePoint. Not
>>> reverse. You should search from LoopEnd up.
>> fixed
>>>
>>> 115 SafePoint === 112 1 107 1 1 10 110 [[ 116 ]] SafePoint !orig=76 !jvms: SimpleTest::testMethod @ bci:21
>>> 116 CountedLoopEnd === 115 105 [[ 117 118 ]] [lt] P=0.500000, C=6633.000000 !orig=97,[80] !jvms: SimpleTest::testMethod @ bci:5
>>>
>>> The test is too simple.

From vladimir.kozlov at oracle.com Thu Sep 1 17:52:14 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 1 Sep 2016 10:52:14 -0700
Subject: RFR(S): 8157956 - OverflowCodeCacheTest.java fails with Out of space in CodeCache for method handle intrinsic
In-Reply-To: <62af160f-2438-8dcf-db3e-8954f931951e@oracle.com>
References: <62af160f-2438-8dcf-db3e-8954f931951e@oracle.com>
Message-ID: <4445baa9-4c2b-3e14-ae31-2e5a32331981@oracle.com>

Good.

Thanks,
Vladimir

On 8/31/16 11:07 AM, Dmitrij Pochepko wrote:
> Hi,
>
> please review a small fix for 8157956 - OverflowCodeCacheTest.java fails
> with Out of space in CodeCache for method handle intrinsic
>
> The test failed with "VirtualMachineError: Out of space for method
> handle intrinsic" because of code cache exhaustion. This happened on
> the final assert. This fix moves the assert after the code cache
> freeing logic to get rid of the error.
>
> I've tested it on linux-i586
>
> webrev: http://cr.openjdk.java.net/~dpochepk/8157956/webrev.01/
>
> CR: https://bugs.openjdk.java.net/browse/JDK-8157956
>
> Thanks,
>
> Dmitrij
>

From dmitrij.pochepko at oracle.com Thu Sep 1 17:57:19 2016
From: dmitrij.pochepko at oracle.com (Dmitrij Pochepko)
Date: Thu, 1 Sep 2016 20:57:19 +0300
Subject: RFR(S): 8157956 - OverflowCodeCacheTest.java fails with Out of space in CodeCache for method handle intrinsic
In-Reply-To: <4445baa9-4c2b-3e14-ae31-2e5a32331981@oracle.com>
References: <62af160f-2438-8dcf-db3e-8954f931951e@oracle.com> <4445baa9-4c2b-3e14-ae31-2e5a32331981@oracle.com>
Message-ID: <7f9fbfef-0162-aa40-72ec-7ddf0ab8ede6@oracle.com>

Thank you!

On 01.09.2016 20:52, Vladimir Kozlov wrote:
> Good.
>
> Thanks,
> Vladimir

From dmitrij.pochepko at oracle.com Thu Sep 1 18:03:19 2016
From: dmitrij.pochepko at oracle.com (Dmitrij Pochepko)
Date: Thu, 1 Sep 2016 21:03:19 +0300
Subject: RFR(S): 8146096 - [TEST BUG] compiler/loopopts/UseCountedLoopSafepoints.java Timeouts
In-Reply-To:
References: <4d957fd9-442a-9096-ac01-0f39a974f6a2@oracle.com> <91e6de05-de11-d026-aa25-127775169479@oracle.com> <6e2b1074-0573-b078-3ade-1d028e5fb377@oracle.com> <20b0d0de-c992-e7e3-dff4-601cc6ab03f5@oracle.com>
Message-ID: <0456966a-6e5e-c293-a16e-59a2cc2fcc8a@oracle.com>

Hi,

I've added LoopUnrollLimit=0 to the spawned VM options and tested on
linux-amd64.

Please take a look at v04:
http://cr.openjdk.java.net/~dpochepk/8146096/webrev.04/

Thanks,

Dmitrij

On 01.09.2016 20:47, Vladimir Kozlov wrote:
> I forgot another useful C2 flag, -XX:LoopUnrollLimit=0, to avoid loop
> iteration splitting. Otherwise you will get several loops (pre-, main-,
> post-).
>
> Thanks,
> Vladimir

From vladimir.kozlov at oracle.com Thu Sep 1 18:03:53 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 1 Sep 2016 11:03:53 -0700
Subject: RFR(S): 8146096 - [TEST BUG] compiler/loopopts/UseCountedLoopSafepoints.java Timeouts
In-Reply-To: <0456966a-6e5e-c293-a16e-59a2cc2fcc8a@oracle.com>
References: <4d957fd9-442a-9096-ac01-0f39a974f6a2@oracle.com> <91e6de05-de11-d026-aa25-127775169479@oracle.com> <6e2b1074-0573-b078-3ade-1d028e5fb377@oracle.com> <20b0d0de-c992-e7e3-dff4-601cc6ab03f5@oracle.com> <0456966a-6e5e-c293-a16e-59a2cc2fcc8a@oracle.com>
Message-ID:

Nice!

Thank you,
Vladimir

On 9/1/16 11:03 AM, Dmitrij Pochepko wrote:
> Hi,
>
> I've added LoopUnrollLimit=0 to the spawned VM options and tested on
> linux-amd64.
>
> Please take a look at v04:
> http://cr.openjdk.java.net/~dpochepk/8146096/webrev.04/
>
> Thanks,
>
> Dmitrij

From dmitrij.pochepko at oracle.com Thu Sep 1 18:07:52 2016
From: dmitrij.pochepko at oracle.com (Dmitrij Pochepko)
Date: Thu, 1 Sep 2016 21:07:52 +0300
Subject: RFR(S): 8146096 - [TEST BUG] compiler/loopopts/UseCountedLoopSafepoints.java Timeouts
In-Reply-To:
References: <4d957fd9-442a-9096-ac01-0f39a974f6a2@oracle.com> <91e6de05-de11-d026-aa25-127775169479@oracle.com> <6e2b1074-0573-b078-3ade-1d028e5fb377@oracle.com> <20b0d0de-c992-e7e3-dff4-601cc6ab03f5@oracle.com> <0456966a-6e5e-c293-a16e-59a2cc2fcc8a@oracle.com>
Message-ID:

Thank you!

On 01.09.2016 21:03, Vladimir Kozlov wrote:
> Nice!
>
> Thank you,
> Vladimir
>
> On 9/1/16 11:03 AM, Dmitrij Pochepko wrote:
>> Hi,
>>
>> I've added LoopUnrollLimit=0 to the spawned VM options and tested on
>> linux-amd64.
>>
>> Please take a look at v04:
>> http://cr.openjdk.java.net/~dpochepk/8146096/webrev.04/
>>
>> Thanks,
>>
>> Dmitrij
>>
>> On 01.09.2016 20:47, Vladimir Kozlov wrote:
>>> I forgot another useful C2 flag, -XX:LoopUnrollLimit=0, to avoid loop
>>> iteration splitting. Otherwise you will get several loops (pre-,
>>> main-, post-).
>>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 9/1/16 10:40 AM, Vladimir Kozlov wrote:
>>>> Yes, this looks good.
>>>>
>>>> On 9/1/16 7:28 AM, Dmitrij Pochepko wrote:
>>>>> Thank you for the attentive review.
>>>>>
>>>>>
>>>>>> Wow! You wrote a "parser" for the C2 Ideal graph.
>>>>>>
>>>>>> // now, find SafePoint->CountedLoopEnd edge
>>>>>>
>>>>>> Actually CountedLoopEnd input edge should point to SafePoint. Not
>>>>>> reverse. You should search from LoopEnd up.
>>>>> fixed
>>>>>>
>>>>>> 115 SafePoint === 112 1 107 1 1 10 110 [[ 116 ]] SafePoint !orig=76 !jvms: SimpleTest::testMethod @ bci:21
>>>>>> 116 CountedLoopEnd === 115 105 [[ 117 118 ]] [lt] P=0.500000, C=6633.000000 !orig=97,[80] !jvms: SimpleTest::testMethod @ bci:5
>>>>>>
>>>>>> The test is too simple.
Without UseCountedLoopSafepoints the loop is >>>>>> folded to accum += 100. You need a little more complex. >>>>> I've changed increment to combination of different shifts >>>>>> >>>>>> Also test should be run only with C2 (Tiered should be off). I don't >>>>>> think require vm.opt.TieredStopAtLevel is the same as >>>>>> -XX:-TieredCompilation. >>>>> Requires expression just help to ensure level 4 is available, because >>>>> testMethod specifically compiled on level=4 using WhiteBox >>>>> (CompilerWhiteBoxTest.COMP_LEVEL_FULL_OPTIMIZATION). >>>>> This differs from original test idea to disable tiered compilation >>>>> and >>>>> trigger compilation by running cycle 2 billion times, which made it a >>>>> long "stress" test. Now that method compiled via WhiteBox, this test >>>>> became much more faster >>>> >>>> Got it. >>>> >>>> Thanks, >>>> Vladimir >>>> >>>>> >>>>> Please take a look at v03: >>>>> http://cr.openjdk.java.net/~dpochepk/8146096/webrev.03/ >>>>> >>>>> Thanks, >>>>> Dmitrij >>>>>> >>>>>> Thanks, >>>>>> Vladimir >>>>>> >>>>>> On 8/31/16 12:03 PM, Dmitrij Pochepko wrote: >>>>>>> Hi, >>>>>>> >>>>>>> Please take a look at v02. >>>>>>> >>>>>>> I've rewritten this test. Now it launch vm with >>>>>>> -XX:+UseCountedLoopSafepoints (restricting compilation to tested >>>>>>> method >>>>>>> with single simple counted loop) and checks that output of >>>>>>> -XX:+PrintIdeal have edge SafePoint -> CountedLoopEnd >>>>>>> >>>>>>> Then, launch the same with -XX:-UseCountedLoopSafepoints and checks >>>>>>> that >>>>>>> there is no such edge. >>>>>>> >>>>>>> I've tested fix via rbt on all platforms. >>>>>>> >>>>>>> webrev: http://cr.openjdk.java.net/~dpochepk/8146096/webrev.02/ >>>>>>> >>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> Dmitrij >>>>>>> >>>>>>> On 29.08.2016 19:28, Vladimir Kozlov wrote: >>>>>>>> I am not against marking test as stress but I think the test >>>>>>>> itself is >>>>>>>> not good. It should be rewrote. 
I added comment to the JBS with >>>>>>>> discussion during original 8146096 RFR. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Vladimir >>>>>>>> >>>>>>>> On 8/29/16 7:08 AM, Dmitrij Pochepko wrote: >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> please review small fix for 8146096 - [TEST BUG] >>>>>>>>> compiler/loopopts/UseCountedLoopSafepoints.java Timeouts >>>>>>>>> >>>>>>>>> >>>>>>>>> Test timeouts on slow platforms, so, this fix adds execution >>>>>>>>> control >>>>>>>>> with respect to elapsed time. Also, test marked as stress. >>>>>>>>> >>>>>>>>> webrev: http://cr.openjdk.java.net/~dpochepk/8146096/webrev.01/ >>>>>>>>> >>>>>>>>> CR: https://bugs.openjdk.java.net/browse/JDK-8146096 >>>>>>>>> >>>>>>>>> >>>>>>>>> I've tested this fix on linux-amd64. >>>>>>>>> >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> >>>>>>>>> Dmitrij >>>>>>>>> >>>>>>> >>>>> >> From Leonid.Mesnik at oracle.com Thu Sep 1 18:13:45 2016 From: Leonid.Mesnik at oracle.com (Leonid Mesnik) Date: Thu, 1 Sep 2016 21:13:45 +0300 Subject: [9-dev] Request for review: JDK-8146128: compiler/cpuflags/TestAESIntrinsicsOnSupportedConfig timeouts In-Reply-To: <0be045b2-ec1f-cf9b-bcf8-86ca602eadec@oracle.com> References: <542E8041.1010101@oracle.com> <0be045b2-ec1f-cf9b-bcf8-86ca602eadec@oracle.com> Message-ID: <57C86FD9.2030508@oracle.com> Hi The hotspot compiler changes should go to jdk9/hs-comp and not to 9-dev. Also hotspot-compiler-dev at openjdk.java.net alias should be used for compiler specific product and test changes. It is unclear from issue description/comment what is the root cause of failure and how it was fixed. Could you please add this information. Leonid On 01.09.2016 20:58, Alexander Vorobyev wrote: > > Hi All, > > I'd like review for JDK-8146128 > (https://bugs.openjdk.java.net/browse/JDK-8146128) > > Test passes with timeout increased. Looks like it times out in > sub-tests where AESIntrinsics are disabled (testNoUseAES(), > testNoUseAESIntrinsic()). The easiest way to fix this test is to > increase timeout. 
> > Run parameter was added: > @run main/othervm/timeout=300 > > > Here is webrev: > http://cr.openjdk.java.net/~avorobye/8146128/webrev.00/ > > > Thanks, > Alexander > > > From vladimir.kozlov at oracle.com Thu Sep 1 18:15:50 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 1 Sep 2016 11:15:50 -0700 Subject: [9-dev] Request for review: JDK-8146128: compiler/cpuflags/TestAESIntrinsicsOnSupportedConfig timeouts In-Reply-To: <57C86FD9.2030508@oracle.com> References: <542E8041.1010101@oracle.com> <0be045b2-ec1f-cf9b-bcf8-86ca602eadec@oracle.com> <57C86FD9.2030508@oracle.com> Message-ID: Yes, removing jdk90dev from to: 300 is not enough. From bug report: elapsed time (seconds): 482.214 An other way to solve that is to check remaining time after each test (forked VM) is executed and exit gracefully. Thanks, Vladimir On 9/1/16 11:13 AM, Leonid Mesnik wrote: > Hi > > The hotspot compiler changes should go to jdk9/hs-comp and not to 9-dev. > Also hotspot-compiler-dev at openjdk.java.net alias should be used for > compiler specific product and test changes. > > It is unclear from issue description/comment what is the root cause of > failure and how it was fixed. Could you please add this information. > > Leonid > > On 01.09.2016 20:58, Alexander Vorobyev wrote: >> >> Hi All, >> >> I'd like review for JDK-8146128 >> (https://bugs.openjdk.java.net/browse/JDK-8146128) >> >> Test passes with timeout increased. Looks like it times out in >> sub-tests where AESIntrinsics are disabled (testNoUseAES(), >> testNoUseAESIntrinsic()). The easiest way to fix this test is to >> increase timeout. 
>> >> Run parameter was added: >> @run main/othervm/timeout=300 >> >> >> Here is webrev: >> http://cr.openjdk.java.net/~avorobye/8146128/webrev.00/ >> >> >> Thanks, >> Alexander >> >> >> > From alexander.vorobyev at oracle.com Thu Sep 1 18:23:57 2016 From: alexander.vorobyev at oracle.com (Alexander Vorobyev) Date: Thu, 1 Sep 2016 21:23:57 +0300 Subject: [9-dev] Request for review: JDK-8146128: compiler/cpuflags/TestAESIntrinsicsOnSupportedConfig timeouts In-Reply-To: <57C86FD9.2030508@oracle.com> References: <542E8041.1010101@oracle.com> <0be045b2-ec1f-cf9b-bcf8-86ca602eadec@oracle.com> <57C86FD9.2030508@oracle.com> Message-ID: <9fa4959b-b6bf-1dcc-d909-d46d62748000@oracle.com> This test uses test/compiler/codegen/aes/TestAESMain.java. It runs it with different AES-related flags and then analyzes the output. But TestAESMain.java performs very intensive calculations, and these take more time when the AES accelerations are disabled (testNoUseAES(), testNoUseAESIntrinsic()). That is the cause of the timeout. On 01.09.2016 21:13, Leonid Mesnik wrote: > Hi > > The hotspot compiler changes should go to jdk9/hs-comp and not to > 9-dev. Also hotspot-compiler-dev at openjdk.java.net alias should be used > for compiler specific product and test changes. > > It is unclear from issue description/comment what is the root cause > of failure and how it was fixed. Could you please add this information. > > Leonid > > On 01.09.2016 20:58, Alexander Vorobyev wrote: >> >> Hi All, >> >> I'd like review for JDK-8146128 >> (https://bugs.openjdk.java.net/browse/JDK-8146128) >> >> Test passes with timeout increased. Looks like it times out in >> sub-tests where AESIntrinsics are disabled (testNoUseAES(), >> testNoUseAESIntrinsic()). The easiest way to fix this test is to >> increase timeout.
>> >> Run parameter was added: >> @run main/othervm/timeout=300 >> >> >> Here is webrev: >> http://cr.openjdk.java.net/~avorobye/8146128/webrev.00/ >> >> >> Thanks, >> Alexander >> >> >> > From alexander.vorobyev at oracle.com Thu Sep 1 18:36:41 2016 From: alexander.vorobyev at oracle.com (Alexander Vorobyev) Date: Thu, 1 Sep 2016 21:36:41 +0300 Subject: [9-dev] Request for review: JDK-8146128: compiler/cpuflags/TestAESIntrinsicsOnSupportedConfig timeouts In-Reply-To: References: <542E8041.1010101@oracle.com> <0be045b2-ec1f-cf9b-bcf8-86ca602eadec@oracle.com> <57C86FD9.2030508@oracle.com> Message-ID: <2b3ca745-2d17-61a7-07fe-50ef619d8dde@oracle.com> Do you mean to stop the test execution if there is not enough time remained? Even if not all test cases finished? On 01.09.2016 21:15, Vladimir Kozlov wrote: > Yes, removing jdk90dev from to: > > 300 is not enough. From bug report: > > elapsed time (seconds): 482.214 > > An other way to solve that is to check remaining time after each test > (forked VM) is executed and exit gracefully. > > Thanks, > Vladimir > > > > On 9/1/16 11:13 AM, Leonid Mesnik wrote: >> Hi >> >> The hotspot compiler changes should go to jdk9/hs-comp and not to 9-dev. >> Also hotspot-compiler-dev at openjdk.java.net alias should be used for >> compiler specific product and test changes. >> >> It is unclear from issue description/comment what is the root cause of >> failure and how it was fixed. Could you please add this information. >> >> Leonid >> >> On 01.09.2016 20:58, Alexander Vorobyev wrote: >>> >>> Hi All, >>> >>> I'd like review for JDK-8146128 >>> (https://bugs.openjdk.java.net/browse/JDK-8146128) >>> >>> Test passes with timeout increased. Looks like it times out in >>> sub-tests where AESIntrinsics are disabled (testNoUseAES(), >>> testNoUseAESIntrinsic()). The easiest way to fix this test is to >>> increase timeout. 
>>> >>> Run parameter was added: >>> @run main/othervm/timeout=300 >>> >>> >>> Here is webrev: >>> http://cr.openjdk.java.net/~avorobye/8146128/webrev.00/ >>> >>> >>> Thanks, >>> Alexander >>> >>> >>> >> From vladimir.kozlov at oracle.com Thu Sep 1 18:38:32 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 1 Sep 2016 11:38:32 -0700 Subject: RFR(M) 8078122 : YMM registers upper 128 bits may get clobbered by a JNI call on windows In-Reply-To: References: <57BE1AD4.7070403@oracle.com> Message-ID: <6aee0e7c-76a5-a920-7099-a3edc349f205@oracle.com> Good. But looks like some code relied on old stack layout in stubs, for example sha256_AVX2(): #ifndef _WIN64 _XMM_SAVE_SIZE = 0, #else _XMM_SAVE_SIZE = 8*16, #endif Please, check that all other related code is fixed too. (I looked on all cases of _WIN64 in src/cpu/x86/vm/). Thanks, Vladimir On 8/31/16 10:17 PM, Kharbas, Kishor wrote: > Hello, > > I removed the unwanted save and restore of registers in the range XMM6-XMM31 from the x64_64 stubs. > I also removed the #ifdef _WIN64 block from x86.ad file. > > Link to the new patch : http://cr.openjdk.java.net/~vdeshpande/8078122/webrev.01/ > > Thanks > Kishor > > > -----Original Message----- > From: Kharbas, Kishor > Sent: Wednesday, August 24, 2016 6:24 PM > To: Vladimir Kozlov ; hotspot-compiler-dev at openjdk.java.net > Cc: Kharbas, Kishor > Subject: RE: RFR(M) 8078122 : YMM registers upper 128 bits may get clobbered by a JNI call on windows > > Thanks Vladimir for quick feedback. > I will look into the stubs which save the registers in the range XMM6-XMM31. Also the first comment makes perfect sense. 
> > Thanks > Kishor > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Wednesday, August 24, 2016 3:08 PM > To: Kharbas, Kishor ; hotspot-compiler-dev at openjdk.java.net > Subject: Re: RFR(M) 8078122 : YMM registers upper 128 bits may get clobbered by a JNI call on windows > > Hi Kishor, > > First, #ifdef _WIN64 is not needed anymore since calling convention is similat to unix now. > > Second, I would like you to look more broadly. With this change we don't need to preserve XMM6-XMM31 in our stubs for WIN64. I am not sure that we can remove all #ifdef _WIN64 there but for most of them I think we can do. Please, look. > > Thanks, > Vladimir > > On 8/24/16 2:40 PM, Kharbas, Kishor wrote: >> Requesting the community to review the patch for >> https://bugs.openjdk.java.net/browse/JDK-8078122 >> >> Webrev : http://cr.openjdk.java.net/~vdeshpande/8078122/webrev.00 >> >> The patch changes the definitions of registers XMM6-XMM31 for WIN64. >> >> Thank you. >> >> Kishor >> From vladimir.kozlov at oracle.com Thu Sep 1 18:44:52 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 1 Sep 2016 11:44:52 -0700 Subject: [9-dev] Request for review: JDK-8146128: compiler/cpuflags/TestAESIntrinsicsOnSupportedConfig timeouts In-Reply-To: <2b3ca745-2d17-61a7-07fe-50ef619d8dde@oracle.com> References: <542E8041.1010101@oracle.com> <0be045b2-ec1f-cf9b-bcf8-86ca602eadec@oracle.com> <57C86FD9.2030508@oracle.com> <2b3ca745-2d17-61a7-07fe-50ef619d8dde@oracle.com> Message-ID: <2c927fd6-17ad-137e-669d-822a1bab7c57@oracle.com> Yes, in addition to timeout increase. Because we can always find very slow platform (SPARC VM, for example) on which any reasonable timeout may be not enough. It would be rare cases with increased timeout so that skipping remaining tests is fine, I think. You can't increase timeout to hours. 
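A minimal sketch of the remaining-time check suggested in this thread, not code from any webrev: it runs test cases in order and skips any case that would start after a wall-clock budget has expired. The class name TimeBudgetRunner, its methods, and the injected clock are hypothetical, introduced only for illustration.

```java
import java.time.Duration;
import java.util.List;
import java.util.function.LongSupplier;

/** Hypothetical helper: runs test cases until a wall-clock budget is used up. */
final class TimeBudgetRunner {
    private final LongSupplier nanoClock;
    private final long deadlineNanos;

    /** The clock is injected so the logic can be exercised without real waiting. */
    TimeBudgetRunner(Duration budget, LongSupplier nanoClock) {
        this.nanoClock = nanoClock;
        this.deadlineNanos = nanoClock.getAsLong() + budget.toNanos();
    }

    /** Runs the cases in order and returns how many were skipped for lack of time. */
    int runAll(List<Runnable> cases) {
        int skipped = 0;
        for (Runnable testCase : cases) {
            // Subtract before comparing so the check stays correct if nanoTime wraps.
            if (nanoClock.getAsLong() - deadlineNanos >= 0) {
                skipped++;
                continue;
            }
            testCase.run();
        }
        return skipped;
    }

    public static void main(String[] args) {
        TimeBudgetRunner runner =
                new TimeBudgetRunner(Duration.ofSeconds(600), System::nanoTime);
        int skipped = runner.runAll(List.of(() -> System.out.println("case 1")));
        System.out.println("skipped cases: " + skipped);
    }
}
```

Returning the number of skipped cases lets the caller log or report them, so a shortened run is visible in the test output.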
Thanks, Vladimir On 9/1/16 11:36 AM, Alexander Vorobyev wrote: > Do you mean to stop the test execution if there is not enough time > remained? Even if not all test cases finished? > > > On 01.09.2016 21:15, Vladimir Kozlov wrote: >> Yes, removing jdk90dev from to: >> >> 300 is not enough. From bug report: >> >> elapsed time (seconds): 482.214 >> >> An other way to solve that is to check remaining time after each test >> (forked VM) is executed and exit gracefully. >> >> Thanks, >> Vladimir >> >> >> >> On 9/1/16 11:13 AM, Leonid Mesnik wrote: >>> Hi >>> >>> The hotspot compiler changes should go to jdk9/hs-comp and not to 9-dev. >>> Also hotspot-compiler-dev at openjdk.java.net alias should be used for >>> compiler specific product and test changes. >>> >>> It is unclear from issue description/comment what is the root cause of >>> failure and how it was fixed. Could you please add this information. >>> >>> Leonid >>> >>> On 01.09.2016 20:58, Alexander Vorobyev wrote: >>>> >>>> Hi All, >>>> >>>> I'd like review for JDK-8146128 >>>> (https://bugs.openjdk.java.net/browse/JDK-8146128) >>>> >>>> Test passes with timeout increased. Looks like it times out in >>>> sub-tests where AESIntrinsics are disabled (testNoUseAES(), >>>> testNoUseAESIntrinsic()). The easiest way to fix this test is to >>>> increase timeout. >>>> >>>> Run parameter was added: >>>> @run main/othervm/timeout=300 >>>> >>>> >>>> Here is webrev: >>>> http://cr.openjdk.java.net/~avorobye/8146128/webrev.00/ >>>> >>>> >>>> Thanks, >>>> Alexander >>>> >>>> >>>> >>> > From vladimir.kozlov at oracle.com Thu Sep 1 18:56:12 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 1 Sep 2016 11:56:12 -0700 Subject: CR for RFR 8164989 In-Reply-To: References: Message-ID: Hi, Michael Please, add comment which explain why it is disabled ( 0 && ). File a bug (if you did not do that already) which will address the compress issue and reference it in the comment. 
Thanks, Vladimir On 8/30/16 6:30 PM, Berg, Michael C wrote: > Hi Folks, > > I would like to contribute a bug fix for SKX and KNL EVEX code gen. The > inflate and compress intrinsics on avx512 yield incorrect results and > cause derby, sunflow, xml.transform and xml.validation to fail. I have > disabled the avx512 context for compress as it needs some rework and > repaired inflate. Please review the resultant code. > > > > This code was tested as follows: hotspot jreg, SPECjvm2008 bdw, skx, knl > complete with no issues. This change addresses > http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/4a39ee246f70 which > was added in early May. > > > Bug-id: https://bugs.openjdk.java.net/browse/JDK-8164989 > > > webrev: > > http://cr.openjdk.java.net/~mcberg/8164989/webrev/ > > > > Regards, > > Michael > From alexander.vorobyev at oracle.com Fri Sep 2 19:40:02 2016 From: alexander.vorobyev at oracle.com (Alexander Vorobyev) Date: Fri, 2 Sep 2016 22:40:02 +0300 Subject: [9-dev] Request for review: JDK-8146128: compiler/cpuflags/TestAESIntrinsicsOnSupportedConfig timeouts In-Reply-To: <2c927fd6-17ad-137e-669d-822a1bab7c57@oracle.com> References: <542E8041.1010101@oracle.com> <0be045b2-ec1f-cf9b-bcf8-86ca602eadec@oracle.com> <57C86FD9.2030508@oracle.com> <2b3ca745-2d17-61a7-07fe-50ef619d8dde@oracle.com> <2c927fd6-17ad-137e-669d-822a1bab7c57@oracle.com> Message-ID: <6814a4c0-272e-44a4-4dd3-e3dd30a61966@oracle.com> Here is a new webrev: http://cr.openjdk.java.net/~avorobye/8146128/webrev.01/ Changes: - timeout increased to 600; - TestAESMain now runs with 100 iterations and 1000 iterations for warm-up with -XX:CompileThresholdScaling=0.01 option added. These changes allow our test to run much faster. Also, we can still be sure that the methods are compiled (as I understand, by default compilation starts after 10000 iterations for the server compiler, so the settings listed above are suitable for us).
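The arithmetic behind these settings, as a minimal sketch: taking the 10000-iteration default quoted in this message, and assuming the flag simply multiplies the threshold by the scaling factor (the VM's exact rounding and the tiered-compilation thresholds are not modeled here), a scaling of 0.01 gives a threshold of 100, which the 1000 warm-up iterations exceed. The class and method names are hypothetical.

```java
/** Hypothetical illustration of how a threshold scaling factor shrinks a compile threshold. */
final class ScaledThresholdSketch {
    /** Assumption: effective threshold = base threshold * scaling factor. */
    static long scaledThreshold(long baseThreshold, double scaling) {
        return Math.round(baseThreshold * scaling);
    }

    public static void main(String[] args) {
        long threshold = scaledThreshold(10_000, 0.01);
        long warmupIterations = 1_000;
        System.out.println("scaled threshold = " + threshold);
        System.out.println("warm-up reaches it: " + (warmupIterations >= threshold));
    }
}
```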
About checking the remained time - how can we predict whether remained time is still enough for the next test case? Also, those test cases have different duration - it also makes our suggestions about time very vague. And if we just skip some test cases, we never know about it from test results (because whole test will be marked as passed). I am not sure, if we should add such functionality for really rare cases when there is not enough time. What do you think? On 01.09.2016 21:44, Vladimir Kozlov wrote: > Yes, in addition to timeout increase. > > Because we can always find very slow platform (SPARC VM, for example) > on which any reasonable timeout may be not enough. It would be rare > cases with increased timeout so that skipping remaining tests is fine, > I think. You can't increase timeout to hours. > > Thanks, > Vladimir > > On 9/1/16 11:36 AM, Alexander Vorobyev wrote: >> Do you mean to stop the test execution if there is not enough time >> remained? Even if not all test cases finished? >> >> >> On 01.09.2016 21:15, Vladimir Kozlov wrote: >>> Yes, removing jdk90dev from to: >>> >>> 300 is not enough. From bug report: >>> >>> elapsed time (seconds): 482.214 >>> >>> An other way to solve that is to check remaining time after each test >>> (forked VM) is executed and exit gracefully. >>> >>> Thanks, >>> Vladimir >>> >>> >>> >>> On 9/1/16 11:13 AM, Leonid Mesnik wrote: >>>> Hi >>>> >>>> The hotspot compiler changes should go to jdk9/hs-comp and not to >>>> 9-dev. >>>> Also hotspot-compiler-dev at openjdk.java.net alias should be used for >>>> compiler specific product and test changes. >>>> >>>> It is unclear from issue description/comment what is the root cause of >>>> failure and how it was fixed. Could you please add this information. 
>>>> >>>> Leonid >>>> >>>> On 01.09.2016 20:58, Alexander Vorobyev wrote: >>>>> >>>>> Hi All, >>>>> >>>>> I'd like review for JDK-8146128 >>>>> (https://bugs.openjdk.java.net/browse/JDK-8146128) >>>>> >>>>> Test passes with timeout increased. Looks like it times out in >>>>> sub-tests where AESIntrinsics are disabled (testNoUseAES(), >>>>> testNoUseAESIntrinsic()). The easiest way to fix this test is to >>>>> increase timeout. >>>>> >>>>> Run parameter was added: >>>>> @run main/othervm/timeout=300 >>>>> >>>>> >>>>> Here is webrev: >>>>> http://cr.openjdk.java.net/~avorobye/8146128/webrev.00/ >>>>> >>>>> >>>>> Thanks, >>>>> Alexander >>>>> >>>>> >>>>> >>>> >> From michael.c.berg at intel.com Fri Sep 2 20:05:13 2016 From: michael.c.berg at intel.com (Berg, Michael C) Date: Fri, 2 Sep 2016 20:05:13 +0000 Subject: CR for RFR 8164989 In-Reply-To: References: Message-ID: Vladimir, please see the latest webrev for the comment addition: http://cr.openjdk.java.net/~mcberg/8164989/webrev.02/ Also, I have created a new bug and linked it to this JBS issue (https://bugs.openjdk.java.net/browse/JDK-8164989). See https://bugs.openjdk.java.net/browse/JDK-8165287 for details concerning the remaining issue for compress. Thanks, Michael -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Thursday, September 01, 2016 11:56 AM To: hotspot-compiler-dev at openjdk.java.net; Berg, Michael C Subject: Re: CR for RFR 8164989 Hi, Michael Please, add comment which explain why it is disabled ( 0 && ). File a bug (if you did not do that already) which will address the compress issue and reference it in the comment. Thanks, Vladimir On 8/30/16 6:30 PM, Berg, Michael C wrote: > Hi Folks, > > I would like to contribute a bug fix for SKX and KNL EVEX code gen. > The inflate and compress intrinsics on avx512 yield incorrect results > and cause derby, sunflow, xml.transform and xml.validation to fail.
I > have disabled the avx512 context for compress as it needs some rework > and repaired inflate. Please review the resultant code. > > > > This code was tested as follows: hotspot jreg, SPECjvm2008 bdw, skx, > knl complete with no issues. This change addresses > http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/4a39ee246f70 which > was added in early May. > > > Bug-id: https://bugs.openjdk.java.net/browse/JDK-8164989 > > > webrev: > > http://cr.openjdk.java.net/~mcberg/8164989/webrev/ > > > > Regards, > > Michael > From kishor.kharbas at intel.com Fri Sep 2 22:07:44 2016 From: kishor.kharbas at intel.com (Kharbas, Kishor) Date: Fri, 2 Sep 2016 22:07:44 +0000 Subject: RFR(M) 8078122 : YMM registers upper 128 bits may get clobbered by a JNI call on windows In-Reply-To: <6aee0e7c-76a5-a920-7099-a3edc349f205@oracle.com> References: <57BE1AD4.7070403@oracle.com> <6aee0e7c-76a5-a920-7099-a3edc349f205@oracle.com> Message-ID: Thanks Vladimir, I have updated the patch : http://cr.openjdk.java.net/~vdeshpande/8078122/webrev.02/ I looked for other places in src/cpu/x86/vm. I feel every case is covered. - Kishor -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Thursday, September 1, 2016 11:39 AM To: Kharbas, Kishor ; hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR(M) 8078122 : YMM registers upper 128 bits may get clobbered by a JNI call on windows Good. But looks like some code relied on old stack layout in stubs, for example sha256_AVX2(): #ifndef _WIN64 _XMM_SAVE_SIZE = 0, #else _XMM_SAVE_SIZE = 8*16, #endif Please, check that all other related code is fixed too. (I looked on all cases of _WIN64 in src/cpu/x86/vm/). Thanks, Vladimir On 8/31/16 10:17 PM, Kharbas, Kishor wrote: > Hello, > > I removed the unwanted save and restore of registers in the range XMM6-XMM31 from the x64_64 stubs. > I also removed the #ifdef _WIN64 block from x86.ad file. 
> > Link to the new patch : > http://cr.openjdk.java.net/~vdeshpande/8078122/webrev.01/ > > Thanks > Kishor > > > -----Original Message----- > From: Kharbas, Kishor > Sent: Wednesday, August 24, 2016 6:24 PM > To: Vladimir Kozlov ; > hotspot-compiler-dev at openjdk.java.net > Cc: Kharbas, Kishor > Subject: RE: RFR(M) 8078122 : YMM registers upper 128 bits may get > clobbered by a JNI call on windows > > Thanks Vladimir for quick feedback. > I will look into the stubs which save the registers in the range XMM6-XMM31. Also the first comment makes perfect sense. > > Thanks > Kishor > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Wednesday, August 24, 2016 3:08 PM > To: Kharbas, Kishor ; > hotspot-compiler-dev at openjdk.java.net > Subject: Re: RFR(M) 8078122 : YMM registers upper 128 bits may get > clobbered by a JNI call on windows > > Hi Kishor, > > First, #ifdef _WIN64 is not needed anymore since calling convention is similat to unix now. > > Second, I would like you to look more broadly. With this change we don't need to preserve XMM6-XMM31 in our stubs for WIN64. I am not sure that we can remove all #ifdef _WIN64 there but for most of them I think we can do. Please, look. > > Thanks, > Vladimir > > On 8/24/16 2:40 PM, Kharbas, Kishor wrote: >> Requesting the community to review the patch for >> https://bugs.openjdk.java.net/browse/JDK-8078122 >> >> Webrev : http://cr.openjdk.java.net/~vdeshpande/8078122/webrev.00 >> >> The patch changes the definitions of registers XMM6-XMM31 for WIN64. >> >> Thank you. 
>> >> Kishor >> From smita.kamath at intel.com Fri Sep 2 22:35:56 2016 From: smita.kamath at intel.com (Kamath, Smita) Date: Fri, 2 Sep 2016 22:35:56 +0000 Subject: FW: RFR (M): bug-id: bug summary In-Reply-To: <02FCFB8477C4EF43A2AD8E0C60F3DA2B75AA104B@FMSMSX112.amr.corp.intel.com> References: <02FCFB8477C4EF43A2AD8E0C60F3DA2B75AA104B@FMSMSX112.amr.corp.intel.com> Message-ID: <6563F381B547594081EF9DE181D07912444A5AF3@FMSMSX119.amr.corp.intel.com> Hi All, I would like to contribute an optimization for SHA 512 towards JDK 9.1. This optimization shows ~2X improvement on X86_64 platforms. Bug: https://bugs.openjdk.java.net/browse/JDK-8165381 Webrev: http://cr.openjdk.java.net/~vdeshpande/8165381/webrev.00/ Hotspot jtreg tests pass on Windows and Linux with this patch. Please review and sponsor. Thanks, Smita Kamath From smita.kamath at intel.com Fri Sep 2 22:48:28 2016 From: smita.kamath at intel.com (Kamath, Smita) Date: Fri, 2 Sep 2016 22:48:28 +0000 Subject: RFR (M): 8165381 : Update for x86 SHA512 using AVX2 Message-ID: <6563F381B547594081EF9DE181D07912444A5B15@FMSMSX119.amr.corp.intel.com> Hi All, I would like to contribute an optimization for SHA 512 towards JDK 9.1. This optimization shows ~2X improvement on X86_64 platforms. Bug: https://bugs.openjdk.java.net/browse/JDK-8165381 Webrev: http://cr.openjdk.java.net/~vdeshpande/8165381/webrev.00/ Hotspot jtreg tests pass on Windows and Linux with this patch. Please review and sponsor. Thanks, Smita Kamath
From jamsheed.c.m at oracle.com Mon Sep 5 07:53:04 2016 From: jamsheed.c.m at oracle.com (Jamsheed C m) Date: Mon, 5 Sep 2016 13:23:04 +0530 Subject: RFR: 8164508: unexpected profiling mismatch in c1 generated code Message-ID: <6d873f34-2f65-e96f-0189-ca8af07ae824@oracle.com> Hi, webrev: http://cr.openjdk.java.net/~jcm/8164508/webrev.00/ bug id: https://bugs.openjdk.java.net/browse/JDK-8164508 We were skipping profiling of the first argument (recv) for virtual call sites to a static callee, but this was not done for the non-inlined case in C1. (See the linked case for reference: https://bugs.openjdk.java.net/browse/JDK-8027631)
- bool has_receiver = x->inlined() && !x->callee()->is_static() && !Bytecodes::has_receiver(bc);
+ bool has_receiver = x->callee()->is_loaded() && !x->callee()->is_static() && !Bytecodes::has_receiver(bc);
The above change is not absolutely necessary, since at present this can happen only for inlining at _linkToVirtual/_linkToInterface sites, and linker elimination and callee inlining always happen together in C1. Please review, Best Regards, Jamsheed From goetz.lindenmaier at sap.com Mon Sep 5 11:54:52 2016 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Mon, 5 Sep 2016 11:54:52 +0000 Subject: RFR(M): 8165235: [TESTBUG] RTM tests must check OS version Message-ID: Hi, This fixes the RTM tests with respect to supported platforms on ppc. Please review this change. I also need a sponsor, please. http://cr.openjdk.java.net/~goetz/wr16/8165235-osRecog/01/webrev.bs/ http://cr.openjdk.java.net/~goetz/wr16/8165235-osRecog/01/webrev.hs/ RTM uses special instructions that are only available on recent x86 cpus. On x86, this feature does not need OS support. On ppc, the equivalent functionality, hardware transactional memory, requires OS support. Thus the feature is only enabled by the VM if CPU and OS are at a specific level. The tests must check this, too. This holds for AIX and Linux.
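As a rough illustration of the kind of OS-version gate such tests need (not the code in the webrev): a minimal sketch, assuming the OS version is available as a dotted string such as the os.version system property. The class OsVersionSketch and its methods are hypothetical, and the required levels passed to atLeast are up to the caller.

```java
/** Hypothetical helper: parses a dotted OS version and compares major.minor. */
final class OsVersionSketch {
    final int major;
    final int minor;

    private OsVersionSketch(int major, int minor) {
        this.major = major;
        this.minor = minor;
    }

    /** Parses strings like "7.2" or "3.10.0-327.el7"; missing parts count as 0. */
    static OsVersionSketch parse(String osVersion) {
        String[] parts = osVersion.split("\\.");
        return new OsVersionSketch(leadingInt(parts, 0), leadingInt(parts, 1));
    }

    /** Returns the leading decimal digits of parts[index], or 0 if there are none. */
    private static int leadingInt(String[] parts, int index) {
        if (index >= parts.length) {
            return 0;
        }
        String digits = parts[index].replaceFirst("\\D.*$", "");
        return digits.isEmpty() ? 0 : Integer.parseInt(digits);
    }

    /** True if this version is at least reqMajor.reqMinor. */
    boolean atLeast(int reqMajor, int reqMinor) {
        return major > reqMajor || (major == reqMajor && minor >= reqMinor);
    }

    public static void main(String[] args) {
        OsVersionSketch v = parse(System.getProperty("os.version", "0.0"));
        System.out.println("major=" + v.major + " minor=" + v.minor);
    }
}
```

Only major and minor are compared here, since finer-grained levels may not be present in the version string at all.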
To do so, this change introduces rtm/predicate/SupportedOS.java, which checks for proper OS versions on ppc and returns true on all other platforms. The OS version is retrieved from Platform.java, which has new methods getOsVersionMajor() and getOsVersionMinor(). To simplify the checks in the tests, I also introduced a 3-way AndPredicate constructor. To simplify the OS version check on AIX, I changed the enabling of RTM on AIX to require AIX 7.2. Before, it was enabled on AIX 7.1.3.30, which contains an important bug fix. The last digits of this version are not exported to the os.version property, so I cannot check for them in the test. Best regards, Goetz. From volker.simonis at gmail.com Mon Sep 5 12:56:41 2016 From: volker.simonis at gmail.com (Volker Simonis) Date: Mon, 5 Sep 2016 14:56:41 +0200 Subject: RFR(M): 8165235: [TESTBUG] RTM tests must check OS version In-Reply-To: References: Message-ID: Hi Goetz, I think you only forgot to import compiler.testlibrary.rtm.predicate.SupportedOS into test/compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java Also, in SupportedOS.java the line: public boolean getAsBoolean() is indented too far (it should be four spaces less, like the annotation in the line before). Besides that, the change looks good. Thanks for fixing this, Volker On Mon, Sep 5, 2016 at 1:54 PM, Lindenmaier, Goetz wrote: > Hi, > > > > This fixes the RTM tests wrt. to supported platforms on ppc. > > Please review this change. I please need a sponsor. > http://cr.openjdk.java.net/~goetz/wr16/8165235-osRecog/01/webrev.bs/ > > http://cr.openjdk.java.net/~goetz/wr16/8165235-osRecog/01/webrev.hs/ > > > RTM uses special instructions that are only available on recent x86 cpus. On > x86, this feature does not need OS support. On ppc, the equivalent > functionality, hardware transactional memory, requires OS support. Thus the > feature is only enabled by the VM if CPU and OS are at a specific level.
The > tests must check this. too. This holds for AIX and Linux. > > > > To do so, this change introduces rtm/predicate/SupportedOS.java which checks > for proper OS versions on ppc, else returns true. > > The OS version is retrieved from Platform.java, which has new methods > getOsVersionMajor() and getOsVersionMinor(). > > To simplify the checks in the tests, I also introduced a 3-way AndPredicate > constructor. > > > > To simplify the OS version check on Aix, I change enabling RTM on Aix to > require AIX 7.2. > > Before, it was enabled on AIX 7.1.3.30, which contains an important bug fix. > The > > last digits of this version are not exported to os.version property, so I > can not > > check for them in the test. > > > > Best regards, > > Goetz. From doug.simon at oracle.com Mon Sep 5 16:45:56 2016 From: doug.simon at oracle.com (Doug Simon) Date: Mon, 5 Sep 2016 18:45:56 +0200 Subject: RFR: 8165434: [JVMCI] remove uses of setAccessible Message-ID: JVMCI currently uses java.lang.reflect.AccessibleObject.setAccessible to get at private internals of certain JDK objects (e.g. java.lang.reflect.Method::slot). In light of changes around java.lang.reflect.AccessibleObject::setAccessible at http://openjdk.java.net/jeps/261, this may require extra command line options at some point. To avoid that, I've removed all uses of setAccessible in JVMCI. http://cr.openjdk.java.net/~dnsimon/8165434/ https://bugs.openjdk.java.net/browse/JDK-8165434 -Doug From doug.simon at oracle.com Mon Sep 5 16:49:16 2016 From: doug.simon at oracle.com (Doug Simon) Date: Mon, 5 Sep 2016 18:49:16 +0200 Subject: RFR: 8165457: [JVMCI] increase InterpreterCodeSize for JVMCI Message-ID: <39E38A4A-7DEB-49C3-BC8B-C41C9F0F0ED1@oracle.com> In jvmci-8, we increased the interpreter code size when JVMCI code is included: http://hg.openjdk.java.net/graal/graal-jvmci-8/file/a074ae16281d/src/cpu/x86/vm/templateInterpreter_x86.hpp#l37 This needs to also be done in jdk9.
https://bugs.openjdk.java.net/browse/JDK-8165457 http://cr.openjdk.java.net/~dnsimon/8165457/ -Doug From goetz.lindenmaier at sap.com Tue Sep 6 09:11:49 2016 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Tue, 6 Sep 2016 09:11:49 +0000 Subject: RFR(M): 8165235: [TESTBUG] RTM tests must check OS version In-Reply-To: References: Message-ID: Hi Volker, thanks for the review! I fixed the two issues: http://cr.openjdk.java.net/~goetz/wr16/8165235-osRecog/02/webrev.hs/ http://cr.openjdk.java.net/~goetz/wr16/8165235-osRecog/02/webrev.bs/ Best regards, Goetz. > -----Original Message----- > From: Volker Simonis [mailto:volker.simonis at gmail.com] > Sent: Montag, 5. September 2016 14:57 > To: Lindenmaier, Goetz > Cc: hotspot-compiler-dev at openjdk.java.net > Subject: Re: RFR(M): 8165235: [TESTBUG] RTM tests must check OS version > > Hi Goetz, > > I think you've only forgot to import > compiler.testlibrary.rtm.predicate.SupportedOS into > test/compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java > > Also, in SupportedOS.java the line: > > public boolean getAsBoolean() > > is indented to far (should be four spaces less like the annotation in > the line before). > > Besides that, the change looks good. > > Thanks for fixing this, > Volker > > On Mon, Sep 5, 2016 at 1:54 PM, Lindenmaier, Goetz > wrote: > > Hi, > > > > > > > > This fixes the RTM tests wrt. to supported platforms on ppc. > > > > Please review this change. I please need a sponsor. > > http://cr.openjdk.java.net/~goetz/wr16/8165235-osRecog/01/webrev.bs/ > > > > http://cr.openjdk.java.net/~goetz/wr16/8165235-osRecog/01/webrev.hs/ > > > > > > RTM uses special instructions that are only available on recent x86 cpus. On > > x86, this feature does not need OS support. On ppc, the equivalent > > functionality, hardware transactional memory, requires OS support. Thus > the > > feature is only enabled by the VM if CPU and OS are at a specific level. The > > tests must check this. too. 
This holds for AIX and Linux. > > > > > > > > To do so, this change introduces rtm/predicate/SupportedOS.java which > checks > > for proper OS versions on ppc, else returns true. > > > > The OS version is retrieved from Platform.java, which has new methods > > getOsVersionMajor() and getOsVersionMinor(). > > > > To simplify the checks in the tests, I also introduced a 3-way AndPredicate > > constructor. > > > > > > > > To simplify the OS version check on Aix, I change enabling RTM on Aix to > > require AIX 7.2. > > > > Before, it was enabled on AIX 7.1.3.30, which contains an important bug fix. > > The > > > > last digits of this version are not exported to os.version property, so I > > can not > > > > check for them in the test. > > > > > > > > Best regards, > > > > Goetz. From volker.simonis at gmail.com Tue Sep 6 10:21:03 2016 From: volker.simonis at gmail.com (Volker Simonis) Date: Tue, 6 Sep 2016 12:21:03 +0200 Subject: RFR(M): 8165235: [TESTBUG] RTM tests must check OS version In-Reply-To: References: Message-ID: Thumbs up from me! Volker On Tue, Sep 6, 2016 at 11:11 AM, Lindenmaier, Goetz wrote: > Hi Volker, > > thanks for the review! I fixed the two issues: > http://cr.openjdk.java.net/~goetz/wr16/8165235-osRecog/02/webrev.hs/ > http://cr.openjdk.java.net/~goetz/wr16/8165235-osRecog/02/webrev.bs/ > > Best regards, > Goetz. > > >> -----Original Message----- >> From: Volker Simonis [mailto:volker.simonis at gmail.com] >> Sent: Montag, 5. September 2016 14:57 >> To: Lindenmaier, Goetz >> Cc: hotspot-compiler-dev at openjdk.java.net >> Subject: Re: RFR(M): 8165235: [TESTBUG] RTM tests must check OS version >> >> Hi Goetz, >> >> I think you've only forgot to import >> compiler.testlibrary.rtm.predicate.SupportedOS into >> test/compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java >> >> Also, in SupportedOS.java the line: >> >> public boolean getAsBoolean() >> >> is indented to far (should be four spaces less like the annotation in >> the line before). 
>> >> Besides that, the change looks good. >> >> Thanks for fixing this, >> Volker >> >> On Mon, Sep 5, 2016 at 1:54 PM, Lindenmaier, Goetz >> wrote: >> > Hi, >> > >> > >> > >> > This fixes the RTM tests wrt. to supported platforms on ppc. >> > >> > Please review this change. I please need a sponsor. >> > http://cr.openjdk.java.net/~goetz/wr16/8165235-osRecog/01/webrev.bs/ >> > >> > http://cr.openjdk.java.net/~goetz/wr16/8165235-osRecog/01/webrev.hs/ >> > >> > >> > RTM uses special instructions that are only available on recent x86 cpus. On >> > x86, this feature does not need OS support. On ppc, the equivalent >> > functionality, hardware transactional memory, requires OS support. Thus >> the >> > feature is only enabled by the VM if CPU and OS are at a specific level. The >> > tests must check this. too. This holds for AIX and Linux. >> > >> > >> > >> > To do so, this change introduces rtm/predicate/SupportedOS.java which >> checks >> > for proper OS versions on ppc, else returns true. >> > >> > The OS version is retrieved from Platform.java, which has new methods >> > getOsVersionMajor() and getOsVersionMinor(). >> > >> > To simplify the checks in the tests, I also introduced a 3-way AndPredicate >> > constructor. >> > >> > >> > >> > To simplify the OS version check on Aix, I change enabling RTM on Aix to >> > require AIX 7.2. >> > >> > Before, it was enabled on AIX 7.1.3.30, which contains an important bug fix. >> > The >> > >> > last digits of this version are not exported to os.version property, so I >> > can not >> > >> > check for them in the test. >> > >> > >> > >> > Best regards, >> > >> > Goetz. 
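
[Editorial sketch] The quoted description says the OS version is taken from the os.version property and split into a major and a minor part. A minimal sketch of how that could look is given below. The class name OsVersionSketch, the helper names, and the -1 fallback are assumptions made for illustration; this is not the code from the webrev. The AIX rule (RTM enabled from 7.2 on) is taken from the thread.

```java
public class OsVersionSketch {
    // Hypothetical helper: returns the index-th dot-separated component of a
    // version string as an int, or -1 if it is missing or not a plain number.
    public static int component(String version, int index) {
        if (version == null) {
            return -1;
        }
        String[] parts = version.split("\\.");
        if (index >= parts.length) {
            return -1;
        }
        try {
            return Integer.parseInt(parts[index]);
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    public static int major(String version) {
        return component(version, 0);
    }

    public static int minor(String version) {
        return component(version, 1);
    }

    // Rule stated in the thread: on AIX, RTM is enabled starting with 7.2.
    public static boolean aixSupportsRtm(String version) {
        int major = major(version);
        int minor = minor(version);
        return major > 7 || (major == 7 && minor >= 2);
    }

    public static void main(String[] args) {
        String v = System.getProperty("os.version");
        System.out.println(v + " -> major=" + major(v) + ", minor=" + minor(v));
    }
}
```

Because only the first two components are inspected, a level such as 7.1.3.30 cannot be told apart from plain 7.1, which matches the reason given in the thread for raising the requirement to 7.2.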
From filipp.zhinkin at gmail.com Tue Sep 6 11:46:03 2016
From: filipp.zhinkin at gmail.com (Filipp Zhinkin)
Date: Tue, 6 Sep 2016 14:46:03 +0300
Subject: RFR(M): 8165235: [TESTBUG] RTM tests must check OS version
In-Reply-To:
References:
Message-ID:

Hi,

I would suggest using something like Boolean.TRUE::booleanValue
instead of null in the AndPredicate ctor, and using camel case for
Platform's fields and methods.
Otherwise the change looks good.

Just for the record: all those predicates were introduced because
there was no way to check OS/CPU/whatever using jtreg.
Now it should be possible to skip tests using jtreg's @requires tag. So
maybe we can get rid of some Java code? :)
// Not suggesting to do it right now.

Regards,
Filipp.

On Tue, Sep 6, 2016 at 1:21 PM, Volker Simonis wrote:
> Thumbs up from me!
>
> Volker
>
> On Tue, Sep 6, 2016 at 11:11 AM, Lindenmaier, Goetz
> wrote:
>> Hi Volker,
>>
>> thanks for the review! I fixed the two issues:
>> http://cr.openjdk.java.net/~goetz/wr16/8165235-osRecog/02/webrev.hs/
>> http://cr.openjdk.java.net/~goetz/wr16/8165235-osRecog/02/webrev.bs/
>>
>> Best regards,
>> Goetz.
>>
>>
>>> -----Original Message-----
>>> From: Volker Simonis [mailto:volker.simonis at gmail.com]
>>> Sent: Montag, 5. September 2016 14:57
>>> To: Lindenmaier, Goetz
>>> Cc: hotspot-compiler-dev at openjdk.java.net
>>> Subject: Re: RFR(M): 8165235: [TESTBUG] RTM tests must check OS version
>>>
>>> Hi Goetz,
>>>
>>> I think you've only forgot to import
>>> compiler.testlibrary.rtm.predicate.SupportedOS into
>>> test/compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java
>>>
>>> Also, in SupportedOS.java the line:
>>>
>>> public boolean getAsBoolean()
>>>
>>> is indented to far (should be four spaces less like the annotation in
>>> the line before).
>>>
>>> Besides that, the change looks good.
>>> >>> Thanks for fixing this, >>> Volker >>> >>> On Mon, Sep 5, 2016 at 1:54 PM, Lindenmaier, Goetz >>> wrote: >>> > Hi, >>> > >>> > >>> > >>> > This fixes the RTM tests wrt. to supported platforms on ppc. >>> > >>> > Please review this change. I please need a sponsor. >>> > http://cr.openjdk.java.net/~goetz/wr16/8165235-osRecog/01/webrev.bs/ >>> > >>> > http://cr.openjdk.java.net/~goetz/wr16/8165235-osRecog/01/webrev.hs/ >>> > >>> > >>> > RTM uses special instructions that are only available on recent x86 cpus. On >>> > x86, this feature does not need OS support. On ppc, the equivalent >>> > functionality, hardware transactional memory, requires OS support. Thus >>> the >>> > feature is only enabled by the VM if CPU and OS are at a specific level. The >>> > tests must check this. too. This holds for AIX and Linux. >>> > >>> > >>> > >>> > To do so, this change introduces rtm/predicate/SupportedOS.java which >>> checks >>> > for proper OS versions on ppc, else returns true. >>> > >>> > The OS version is retrieved from Platform.java, which has new methods >>> > getOsVersionMajor() and getOsVersionMinor(). >>> > >>> > To simplify the checks in the tests, I also introduced a 3-way AndPredicate >>> > constructor. >>> > >>> > >>> > >>> > To simplify the OS version check on Aix, I change enabling RTM on Aix to >>> > require AIX 7.2. >>> > >>> > Before, it was enabled on AIX 7.1.3.30, which contains an important bug fix. >>> > The >>> > >>> > last digits of this version are not exported to os.version property, so I >>> > can not >>> > >>> > check for them in the test. >>> > >>> > >>> > >>> > Best regards, >>> > >>> > Goetz. 
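
[Editorial sketch] Filipp's suggestion in the message above (pass an always-true supplier rather than null for the unused third operand) could be sketched roughly as follows. The class name AndPredicateSketch and its field layout are assumptions for illustration and are not the test library's actual AndPredicate; only Boolean.TRUE::booleanValue and the 3-way constructor idea come from the thread.

```java
import java.util.function.BooleanSupplier;

public class AndPredicateSketch implements BooleanSupplier {
    private final BooleanSupplier a;
    private final BooleanSupplier b;
    private final BooleanSupplier c;

    // Two-way form: the third operand is a supplier that is always true,
    // so getAsBoolean() never has to test for null.
    public AndPredicateSketch(BooleanSupplier a, BooleanSupplier b) {
        this(a, b, Boolean.TRUE::booleanValue);
    }

    // Three-way form, as mentioned in the review request.
    public AndPredicateSketch(BooleanSupplier a, BooleanSupplier b, BooleanSupplier c) {
        this.a = a;
        this.b = b;
        this.c = c;
    }

    @Override
    public boolean getAsBoolean() {
        // Short-circuits left to right, like a plain && chain.
        return a.getAsBoolean() && b.getAsBoolean() && c.getAsBoolean();
    }
}
```

With this shape the two-argument and three-argument uses share one code path, and a missing operand is a neutral element of the conjunction instead of a special case.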
From goetz.lindenmaier at sap.com Tue Sep 6 13:12:24 2016 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Tue, 6 Sep 2016 13:12:24 +0000 Subject: RFR(M): 8165235: [TESTBUG] RTM tests must check OS version In-Reply-To: References: Message-ID: <28e894e35a3a431aa92d05b310b48970@DEWDFE13DE50.global.corp.sap> Hi Filipp, thanks for reviewing my change! I fixed the two issues: http://cr.openjdk.java.net/~goetz/wr16/8165235-osRecog/03/webrev.bs/ http://cr.openjdk.java.net/~goetz/wr16/8165235-osRecog/03/webrev.hs/ The hotspot change is unchanged except for the reviewer attribution. I also fixed the comment in Platform.java: major->minor. Would you mind sponsoring the change? Best regards, Goetz. > -----Original Message----- > From: Filipp Zhinkin [mailto:filipp.zhinkin at gmail.com] > Sent: Dienstag, 6. September 2016 13:46 > To: Volker Simonis > Cc: Lindenmaier, Goetz ; hotspot-compiler- > dev at openjdk.java.net > Subject: Re: RFR(M): 8165235: [TESTBUG] RTM tests must check OS version > > Hi, > > I would suggest to use something like Boolean.TRUE::booleanValue > instead of null in AndPredicated ctor and use camel case for > Platform's fields and methods. > Otherwise the change looks good. > > Just for the record: all those predicates where introduced because > there were no way to check OS/CPU/whatever using jtreg. > Now it should be possible to skip tests using jreg's @required tag. So > maybe we can get rid of some java code? :) > // Not suggesting to do it right now. > > Regards, > Filipp. > > On Tue, Sep 6, 2016 at 1:21 PM, Volker Simonis > wrote: > > Thumbs up from me! > > > > Volker > > > > On Tue, Sep 6, 2016 at 11:11 AM, Lindenmaier, Goetz > > wrote: > >> Hi Volker, > >> > >> thanks for the review! I fixed the two issues: > >> http://cr.openjdk.java.net/~goetz/wr16/8165235- > osRecog/02/webrev.hs/ > >> http://cr.openjdk.java.net/~goetz/wr16/8165235- > osRecog/02/webrev.bs/ > >> > >> Best regards, > >> Goetz. 
> >> > >> > >>> -----Original Message----- > >>> From: Volker Simonis [mailto:volker.simonis at gmail.com] > >>> Sent: Montag, 5. September 2016 14:57 > >>> To: Lindenmaier, Goetz > >>> Cc: hotspot-compiler-dev at openjdk.java.net > >>> Subject: Re: RFR(M): 8165235: [TESTBUG] RTM tests must check OS > version > >>> > >>> Hi Goetz, > >>> > >>> I think you've only forgot to import > >>> compiler.testlibrary.rtm.predicate.SupportedOS into > >>> test/compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java > >>> > >>> Also, in SupportedOS.java the line: > >>> > >>> public boolean getAsBoolean() > >>> > >>> is indented to far (should be four spaces less like the annotation in > >>> the line before). > >>> > >>> Besides that, the change looks good. > >>> > >>> Thanks for fixing this, > >>> Volker > >>> > >>> On Mon, Sep 5, 2016 at 1:54 PM, Lindenmaier, Goetz > >>> wrote: > >>> > Hi, > >>> > > >>> > > >>> > > >>> > This fixes the RTM tests wrt. to supported platforms on ppc. > >>> > > >>> > Please review this change. I please need a sponsor. > >>> > http://cr.openjdk.java.net/~goetz/wr16/8165235- > osRecog/01/webrev.bs/ > >>> > > >>> > http://cr.openjdk.java.net/~goetz/wr16/8165235- > osRecog/01/webrev.hs/ > >>> > > >>> > > >>> > RTM uses special instructions that are only available on recent x86 > cpus. On > >>> > x86, this feature does not need OS support. On ppc, the equivalent > >>> > functionality, hardware transactional memory, requires OS support. > Thus > >>> the > >>> > feature is only enabled by the VM if CPU and OS are at a specific level. > The > >>> > tests must check this. too. This holds for AIX and Linux. > >>> > > >>> > > >>> > > >>> > To do so, this change introduces rtm/predicate/SupportedOS.java > which > >>> checks > >>> > for proper OS versions on ppc, else returns true. > >>> > > >>> > The OS version is retrieved from Platform.java, which has new > methods > >>> > getOsVersionMajor() and getOsVersionMinor(). 
> >>> > > >>> > To simplify the checks in the tests, I also introduced a 3-way > AndPredicate > >>> > constructor. > >>> > > >>> > > >>> > > >>> > To simplify the OS version check on Aix, I change enabling RTM on Aix > to > >>> > require AIX 7.2. > >>> > > >>> > Before, it was enabled on AIX 7.1.3.30, which contains an important > bug fix. > >>> > The > >>> > > >>> > last digits of this version are not exported to os.version property, so I > >>> > can not > >>> > > >>> > check for them in the test. > >>> > > >>> > > >>> > > >>> > Best regards, > >>> > > >>> > Goetz. From HORII at jp.ibm.com Tue Sep 6 14:50:13 2016 From: HORII at jp.ibm.com (Hiroshi H Horii) Date: Tue, 6 Sep 2016 23:50:13 +0900 Subject: RFR(m) 8164920: ppc: enhancement of CRC32 intrinsic Message-ID: Dear Vladimir and all: Can I please request reviews for the following change? JIRA: https://bugs.openjdk.java.net/browse/JDK-8164920 webrev: http://cr.openjdk.java.net/~gromero/8164920/01/ As Volker's comments in the above JIRA, this is a ppc64-only improvement which will not affect any of the Oracle platforms in any way. This change includes new implementation of CRC32 Intrinsics for ppc64le. In my local experiment, CRC32 of 64KB was calculated more than 20 times faster than original. Performance of CRC32 Intrinsic is important to run recent Apache Cassandra. A Cassandra daemon needs to read 64KB data from a disk with CRC32 checksum by default. This JIRA entry has "jdk9-fc-request" label. If there is a chance to include new change in JDK 9 for ppc64le, I would like to request a review for this change. Regards, Hiroshi ----------------------- Hiroshi Horii, Ph.D. IBM Research - Tokyo -------------- next part -------------- An HTML attachment was scrubbed... 
From vladimir.kozlov at oracle.com Tue Sep 6 16:08:45 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 6 Sep 2016 09:08:45 -0700
Subject: RFR: 8164508: unexpected profiling mismatch in c1 generated code
In-Reply-To: <6d873f34-2f65-e96f-0189-ca8af07ae824@oracle.com>
References: <6d873f34-2f65-e96f-0189-ca8af07ae824@oracle.com>
Message-ID: <3ace86b5-ff91-7d2c-9b74-e4b9c497365c@oracle.com>

Good.

thanks,
Vladimir

On 9/5/16 12:53 AM, Jamsheed C m wrote:
> Hi,
>
> webrev: http://cr.openjdk.java.net/~jcm/8164508/webrev.00/
>
> bug id: https://bugs.openjdk.java.net/browse/JDK-8164508
>
>
> we were skipping profiling of first argument(recv) for virtual call
> sites to static callee. this was not done for non-inline case in c1.
> (see linked case for ref: https://bugs.openjdk.java.net/browse/JDK-8027631)
>
> - bool has_receiver = x->inlined() && !x->callee()->is_static() &&
> !Bytecodes::has_receiver(bc);
> + bool has_receiver = x->callee()->is_loaded() &&
> !x->callee()->is_static() && !Bytecodes::has_receiver(bc);
>
> above change is not absolutely necessary as this can happen only for
> _linkToVirtual,_linkToInterface sites inlining at present, and linker
> elimination and callee inlining always happen together in c1.
>
> Please review,
> Best Regards,
> Jamsheed
>
>
>

From vladimir.kozlov at oracle.com Tue Sep 6 16:10:20 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 6 Sep 2016 09:10:20 -0700
Subject: CR for RFR 8164989
In-Reply-To:
References:
Message-ID: <8c9d34cf-7e2c-11b7-4ab4-be8070315fad@oracle.com>

Looks good. I will sponsor it.

Thanks,
Vladimir

On 9/2/16 1:05 PM, Berg, Michael C wrote:
> Vladimir, please see the latest webrev for the comment addition:
>
> http://cr.openjdk.java.net/~mcberg/8164989/webrev.02/
>
> Also I have create a new bug and referenced it to this jbs issue (https://bugs.openjdk.java.net/browse/JDK-8164989 ).
> > See https://bugs.openjdk.java.net/browse/JDK-8165287 for details concerning the remaining issue for compress. > > Thanks, > Michael > > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Thursday, September 01, 2016 11:56 AM > To: hotspot-compiler-dev at openjdk.java.net; Berg, Michael C > Subject: Re: CR for RFR 8164989 > > Hi, Michael > > Please, add comment which explain why it is disabled ( 0 && ). > File a bug (if you did not do that already) which will address the compress issue and reference it in the comment. > > Thanks, > Vladimir > > On 8/30/16 6:30 PM, Berg, Michael C wrote: >> Hi Folks, >> >> I would like to contribute a bug fix for SKX and KNL EVEX code gen. >> The inflate and compress intrinsics on avx512 yield incorrect results >> and cause derby, sunflow, xml.transform and xml.validation to fail. I >> have disabled the avx512 context for compress as it needs some rework >> and repaired inflate. Please review the resultant code. >> >> >> >> This code was tested as follows: hotspot jreg, SPECjvm2008 bdw, skx, >> knl complete with no issues. This change addresses >> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/4a39ee246f70 which >> was added in early May. >> >> >> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8164989 >> >> >> webrev: >> >> http://cr.openjdk.java.net/~mcberg/8164989/webrev/ >> >> >> >> Regards, >> >> Michael >> From vladimir.kozlov at oracle.com Tue Sep 6 16:49:06 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 6 Sep 2016 09:49:06 -0700 Subject: RFR: 8165457: [JVMCI] increase InterpreterCodeSize for JVMCI In-Reply-To: <39E38A4A-7DEB-49C3-BC8B-C41C9F0F0ED1@oracle.com> References: <39E38A4A-7DEB-49C3-BC8B-C41C9F0F0ED1@oracle.com> Message-ID: <77b35cd8-0d95-d661-cc91-324112cdd62e@oracle.com> Good. 
Thanks, Vladimir On 9/5/16 9:49 AM, Doug Simon wrote: > In jvmci-8, we increased the interpreter code size when JVMCI code is included: > > http://hg.openjdk.java.net/graal/graal-jvmci-8/file/a074ae16281d/src/cpu/x86/vm/templateInterpreter_x86.hpp#l37 > > This needs to also be done in jdk9. > > https://bugs.openjdk.java.net/browse/JDK-8165457 > http://cr.openjdk.java.net/~dnsimon/8165457/ > > -Doug > From vladimir.kozlov at oracle.com Tue Sep 6 16:51:55 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 6 Sep 2016 09:51:55 -0700 Subject: [9-dev] Request for review: JDK-8146128: compiler/cpuflags/TestAESIntrinsicsOnSupportedConfig timeouts In-Reply-To: <6814a4c0-272e-44a4-4dd3-e3dd30a61966@oracle.com> References: <542E8041.1010101@oracle.com> <0be045b2-ec1f-cf9b-bcf8-86ca602eadec@oracle.com> <57C86FD9.2030508@oracle.com> <2b3ca745-2d17-61a7-07fe-50ef619d8dde@oracle.com> <2c927fd6-17ad-137e-669d-822a1bab7c57@oracle.com> <6814a4c0-272e-44a4-4dd3-e3dd30a61966@oracle.com> Message-ID: <5b26f27d-ade9-1dab-692b-46f9c996d9b2@oracle.com> Yes, this looks reasonable. We may not need to check time between tests since you significantly reduced number of iterations. I think we can go with these changes. Thanks, Vladimir On 9/2/16 12:40 PM, Alexander Vorobyev wrote: > Here is a new webrew: > http://cr.openjdk.java.net/~avorobye/8146128/webrev.01/ > > Changes: > > - timeout increased to 600; > > - TestAESMain now runs with 100 iterations and 1000 iterations for > warm-up with -XX:CompileThresholdScaling=0.01 option added. > > Those changes allow our test to run much faster. Also, we still can be > sure that methods are compiled (as I understand, by default compilation > starts after 10000 iterations for server compiler, so settings listed > above are suitable for us). > > About checking the remained time - how can we predict whether remained > time is still enough for the next test case? 
Also, those test cases have > different duration - it also makes our suggestions about time very > vague. And if we just skip some test cases, we never know about it from > test results (because whole test will be marked as passed). I am not > sure, if we should add such functionality for really rare cases when > there is not enough time. What do you think? > > > On 01.09.2016 21:44, Vladimir Kozlov wrote: >> Yes, in addition to timeout increase. >> >> Because we can always find very slow platform (SPARC VM, for example) >> on which any reasonable timeout may be not enough. It would be rare >> cases with increased timeout so that skipping remaining tests is fine, >> I think. You can't increase timeout to hours. >> >> Thanks, >> Vladimir >> >> On 9/1/16 11:36 AM, Alexander Vorobyev wrote: >>> Do you mean to stop the test execution if there is not enough time >>> remained? Even if not all test cases finished? >>> >>> >>> On 01.09.2016 21:15, Vladimir Kozlov wrote: >>>> Yes, removing jdk90dev from to: >>>> >>>> 300 is not enough. From bug report: >>>> >>>> elapsed time (seconds): 482.214 >>>> >>>> An other way to solve that is to check remaining time after each test >>>> (forked VM) is executed and exit gracefully. >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> >>>> >>>> On 9/1/16 11:13 AM, Leonid Mesnik wrote: >>>>> Hi >>>>> >>>>> The hotspot compiler changes should go to jdk9/hs-comp and not to >>>>> 9-dev. >>>>> Also hotspot-compiler-dev at openjdk.java.net alias should be used for >>>>> compiler specific product and test changes. >>>>> >>>>> It is unclear from issue description/comment what is the root cause of >>>>> failure and how it was fixed. Could you please add this information. >>>>> >>>>> Leonid >>>>> >>>>> On 01.09.2016 20:58, Alexander Vorobyev wrote: >>>>>> >>>>>> Hi All, >>>>>> >>>>>> I'd like review for JDK-8146128 >>>>>> (https://bugs.openjdk.java.net/browse/JDK-8146128) >>>>>> >>>>>> Test passes with timeout increased. 
Looks like it times out in >>>>>> sub-tests where AESIntrinsics are disabled (testNoUseAES(), >>>>>> testNoUseAESIntrinsic()). The easiest way to fix this test is to >>>>>> increase timeout. >>>>>> >>>>>> Run parameter was added: >>>>>> @run main/othervm/timeout=300 >>>>>> >>>>>> >>>>>> Here is webrev: >>>>>> http://cr.openjdk.java.net/~avorobye/8146128/webrev.00/ >>>>>> >>>>>> >>>>>> Thanks, >>>>>> Alexander >>>>>> >>>>>> >>>>>> >>>>> >>> > From vladimir.kozlov at oracle.com Tue Sep 6 17:12:13 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 6 Sep 2016 10:12:13 -0700 Subject: RFR(M) 8078122 : YMM registers upper 128 bits may get clobbered by a JNI call on windows In-Reply-To: References: <57BE1AD4.7070403@oracle.com> <6aee0e7c-76a5-a920-7099-a3edc349f205@oracle.com> Message-ID: <4af19c5d-9a7f-d18b-820b-6f3664b8183a@oracle.com> Good. I start testing these changes. I will push it if testing pass. Thanks, Vladimir On 9/2/16 3:07 PM, Kharbas, Kishor wrote: > Thanks Vladimir, > > I have updated the patch : http://cr.openjdk.java.net/~vdeshpande/8078122/webrev.02/ > > I looked for other places in src/cpu/x86/vm. I feel every case is covered. > > - Kishor > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Thursday, September 1, 2016 11:39 AM > To: Kharbas, Kishor ; hotspot-compiler-dev at openjdk.java.net > Subject: Re: RFR(M) 8078122 : YMM registers upper 128 bits may get clobbered by a JNI call on windows > > Good. But looks like some code relied on old stack layout in stubs, for example sha256_AVX2(): > > #ifndef _WIN64 > _XMM_SAVE_SIZE = 0, > #else > _XMM_SAVE_SIZE = 8*16, > #endif > > Please, check that all other related code is fixed too. (I looked on all cases of _WIN64 in src/cpu/x86/vm/). > > Thanks, > Vladimir > > On 8/31/16 10:17 PM, Kharbas, Kishor wrote: >> Hello, >> >> I removed the unwanted save and restore of registers in the range XMM6-XMM31 from the x64_64 stubs. 
>> I also removed the #ifdef _WIN64 block from x86.ad file. >> >> Link to the new patch : >> http://cr.openjdk.java.net/~vdeshpande/8078122/webrev.01/ >> >> Thanks >> Kishor >> >> >> -----Original Message----- >> From: Kharbas, Kishor >> Sent: Wednesday, August 24, 2016 6:24 PM >> To: Vladimir Kozlov ; >> hotspot-compiler-dev at openjdk.java.net >> Cc: Kharbas, Kishor >> Subject: RE: RFR(M) 8078122 : YMM registers upper 128 bits may get >> clobbered by a JNI call on windows >> >> Thanks Vladimir for quick feedback. >> I will look into the stubs which save the registers in the range XMM6-XMM31. Also the first comment makes perfect sense. >> >> Thanks >> Kishor >> >> -----Original Message----- >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >> Sent: Wednesday, August 24, 2016 3:08 PM >> To: Kharbas, Kishor ; >> hotspot-compiler-dev at openjdk.java.net >> Subject: Re: RFR(M) 8078122 : YMM registers upper 128 bits may get >> clobbered by a JNI call on windows >> >> Hi Kishor, >> >> First, #ifdef _WIN64 is not needed anymore since calling convention is similat to unix now. >> >> Second, I would like you to look more broadly. With this change we don't need to preserve XMM6-XMM31 in our stubs for WIN64. I am not sure that we can remove all #ifdef _WIN64 there but for most of them I think we can do. Please, look. >> >> Thanks, >> Vladimir >> >> On 8/24/16 2:40 PM, Kharbas, Kishor wrote: >>> Requesting the community to review the patch for >>> https://bugs.openjdk.java.net/browse/JDK-8078122 >>> >>> Webrev : http://cr.openjdk.java.net/~vdeshpande/8078122/webrev.00 >>> >>> The patch changes the definitions of registers XMM6-XMM31 for WIN64. >>> >>> Thank you. 
>>> >>> Kishor >>> From kishor.kharbas at intel.com Tue Sep 6 18:08:57 2016 From: kishor.kharbas at intel.com (Kharbas, Kishor) Date: Tue, 6 Sep 2016 18:08:57 +0000 Subject: RFR(M) 8078122 : YMM registers upper 128 bits may get clobbered by a JNI call on windows In-Reply-To: <4af19c5d-9a7f-d18b-820b-6f3664b8183a@oracle.com> References: <57BE1AD4.7070403@oracle.com> <6aee0e7c-76a5-a920-7099-a3edc349f205@oracle.com> <4af19c5d-9a7f-d18b-820b-6f3664b8183a@oracle.com> Message-ID: Thanks Vladimir! -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Tuesday, September 6, 2016 10:12 AM To: Kharbas, Kishor ; hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR(M) 8078122 : YMM registers upper 128 bits may get clobbered by a JNI call on windows Good. I start testing these changes. I will push it if testing pass. Thanks, Vladimir On 9/2/16 3:07 PM, Kharbas, Kishor wrote: > Thanks Vladimir, > > I have updated the patch : > http://cr.openjdk.java.net/~vdeshpande/8078122/webrev.02/ > > I looked for other places in src/cpu/x86/vm. I feel every case is covered. > > - Kishor > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Thursday, September 1, 2016 11:39 AM > To: Kharbas, Kishor ; > hotspot-compiler-dev at openjdk.java.net > Subject: Re: RFR(M) 8078122 : YMM registers upper 128 bits may get > clobbered by a JNI call on windows > > Good. But looks like some code relied on old stack layout in stubs, for example sha256_AVX2(): > > #ifndef _WIN64 > _XMM_SAVE_SIZE = 0, > #else > _XMM_SAVE_SIZE = 8*16, > #endif > > Please, check that all other related code is fixed too. (I looked on all cases of _WIN64 in src/cpu/x86/vm/). > > Thanks, > Vladimir > > On 8/31/16 10:17 PM, Kharbas, Kishor wrote: >> Hello, >> >> I removed the unwanted save and restore of registers in the range XMM6-XMM31 from the x64_64 stubs. >> I also removed the #ifdef _WIN64 block from x86.ad file. 
>> >> Link to the new patch : >> http://cr.openjdk.java.net/~vdeshpande/8078122/webrev.01/ >> >> Thanks >> Kishor >> >> >> -----Original Message----- >> From: Kharbas, Kishor >> Sent: Wednesday, August 24, 2016 6:24 PM >> To: Vladimir Kozlov ; >> hotspot-compiler-dev at openjdk.java.net >> Cc: Kharbas, Kishor >> Subject: RE: RFR(M) 8078122 : YMM registers upper 128 bits may get >> clobbered by a JNI call on windows >> >> Thanks Vladimir for quick feedback. >> I will look into the stubs which save the registers in the range XMM6-XMM31. Also the first comment makes perfect sense. >> >> Thanks >> Kishor >> >> -----Original Message----- >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >> Sent: Wednesday, August 24, 2016 3:08 PM >> To: Kharbas, Kishor ; >> hotspot-compiler-dev at openjdk.java.net >> Subject: Re: RFR(M) 8078122 : YMM registers upper 128 bits may get >> clobbered by a JNI call on windows >> >> Hi Kishor, >> >> First, #ifdef _WIN64 is not needed anymore since calling convention is similat to unix now. >> >> Second, I would like you to look more broadly. With this change we don't need to preserve XMM6-XMM31 in our stubs for WIN64. I am not sure that we can remove all #ifdef _WIN64 there but for most of them I think we can do. Please, look. >> >> Thanks, >> Vladimir >> >> On 8/24/16 2:40 PM, Kharbas, Kishor wrote: >>> Requesting the community to review the patch for >>> https://bugs.openjdk.java.net/browse/JDK-8078122 >>> >>> Webrev : http://cr.openjdk.java.net/~vdeshpande/8078122/webrev.00 >>> >>> The patch changes the definitions of registers XMM6-XMM31 for WIN64. >>> >>> Thank you. 
>>>
>>> Kishor
>>>

From cthalinger at twitter.com Tue Sep 6 18:12:21 2016
From: cthalinger at twitter.com (Christian Thalinger)
Date: Tue, 6 Sep 2016 08:12:21 -1000
Subject: RFR: 8165434: [JVMCI] remove uses of setAccessible
In-Reply-To:
References:
Message-ID: <864558C5-C2AD-4D6B-BB6F-568F00BBE28A@twitter.com>

> On Sep 5, 2016, at 6:45 AM, Doug Simon wrote:
>
> JVMCI currently uses java.lang.reflect.AccessibleObject.setAccessible to get at private internals of certain JDK objects (e.g. java.lang.reflect.Method::slot). In light of changes around java.lang.reflect.AccessibleObject::setAccessible at http://openjdk.java.net/jeps/261, this may require extra command line options at some point. To avoid that, I've removed all uses of setAccessible in JVMCI.
>
> http://cr.openjdk.java.net/~dnsimon/8165434/

src/jdk.vm.ci/share/classes/jdk.vm.ci.meta/src/jdk/vm/ci/meta/ModifiersProvider.java

+ int BRIDGE = 0x0040;
+ int VARARGS = 0x0080;
+ int SYNTHETIC = 0x1000;
+ int ANNOTATION = 0x2000;
+ int ENUM = 0x4000;

I wish we could avoid that. We can't use this stuff because it's HotSpot-dependent, right?

+ assert ModifiersProvider.SYNTHETIC == getConstant("JVM_ACC_SYNTHETIC", Integer.class);
+ assert ModifiersProvider.ANNOTATION == getConstant("JVM_ACC_ANNOTATION", Integer.class);
+ assert ModifiersProvider.BRIDGE == getConstant("JVM_ACC_BRIDGE", Integer.class);
+ assert ModifiersProvider.VARARGS == getConstant("JVM_ACC_VARARGS", Integer.class);
+ assert ModifiersProvider.ENUM == getConstant("JVM_ACC_ENUM", Integer.class);

What if we convert these constants to interface methods and the VM-dependent part has to implement them? Or maybe even keep the fields and assign them via interface methods.
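
[Editorial sketch] The "interface methods" alternative raised just above might look roughly like this. All type and method names here (ModifierFlagsDemo, ModifierFlags, HotSpotModifierFlags, bridgeFlag() and so on) are hypothetical and are not part of JVMCI; only the five flag values are taken from the diff quoted in the message.

```java
public class ModifierFlagsDemo {
    // Hypothetical VM-independent view: flag values are obtained through
    // methods, so the shared interface hard-codes no VM-specific constants.
    public interface ModifierFlags {
        int bridgeFlag();
        int varargsFlag();
        int syntheticFlag();
        int annotationFlag();
        int enumFlag();

        // Shared logic can be written once against the accessor methods.
        default boolean isSynthetic(int modifiers) {
            return (modifiers & syntheticFlag()) != 0;
        }

        default boolean isBridge(int modifiers) {
            return (modifiers & bridgeFlag()) != 0;
        }
    }

    // Hypothetical VM-dependent implementation using the values from the diff.
    public static final class HotSpotModifierFlags implements ModifierFlags {
        @Override public int bridgeFlag()     { return 0x0040; }
        @Override public int varargsFlag()    { return 0x0080; }
        @Override public int syntheticFlag()  { return 0x1000; }
        @Override public int annotationFlag() { return 0x2000; }
        @Override public int enumFlag()       { return 0x4000; }
    }
}
```

The trade-off is an extra virtual call per query in exchange for keeping the VM-specific values out of the shared interface.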
src/share/vm/jvmci/vmStructs_jvmci.cpp

  declare_constant(JVM_ACC_FIELD_HAS_GENERIC_SIGNATURE) \
+ declare_preprocessor_constant("JVM_ACC_VARARGS", JVM_ACC_VARARGS) \
+ declare_preprocessor_constant("JVM_ACC_BRIDGE", JVM_ACC_BRIDGE) \
+ declare_preprocessor_constant("JVM_ACC_ANNOTATION", JVM_ACC_ANNOTATION) \
+ declare_preprocessor_constant("JVM_ACC_ENUM", JVM_ACC_ENUM) \
  declare_preprocessor_constant("JVM_ACC_SYNTHETIC", JVM_ACC_SYNTHETIC) \

Please align the "\".

Otherwise this looks good and generally a good cleanup.

> https://bugs.openjdk.java.net/browse/JDK-8165434
>
> -Doug

From cthalinger at twitter.com Tue Sep 6 18:14:17 2016
From: cthalinger at twitter.com (Christian Thalinger)
Date: Tue, 6 Sep 2016 08:14:17 -1000
Subject: RFR: 8165457: [JVMCI] increase InterpreterCodeSize for JVMCI
In-Reply-To: <39E38A4A-7DEB-49C3-BC8B-C41C9F0F0ED1@oracle.com>
References: <39E38A4A-7DEB-49C3-BC8B-C41C9F0F0ED1@oracle.com>
Message-ID:

> On Sep 5, 2016, at 6:49 AM, Doug Simon wrote:
>
> In jvmci-8, we increased the interpreter code size when JVMCI code is included:
>
> http://hg.openjdk.java.net/graal/graal-jvmci-8/file/a074ae16281d/src/cpu/x86/vm/templateInterpreter_x86.hpp#l37

What about SPARC? Have we ever seen a problem there? Or AArch64 for that matter?

>
> This needs to also be done in jdk9.
> > https://bugs.openjdk.java.net/browse/JDK-8165457 > http://cr.openjdk.java.net/~dnsimon/8165457/ > > -Doug From vladimir.kozlov at oracle.com Tue Sep 6 21:31:16 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 6 Sep 2016 14:31:16 -0700 Subject: RFR(M) 8078122 : YMM registers upper 128 bits may get clobbered by a JNI call on windows In-Reply-To: <4af19c5d-9a7f-d18b-820b-6f3664b8183a@oracle.com> References: <57BE1AD4.7070403@oracle.com> <6aee0e7c-76a5-a920-7099-a3edc349f205@oracle.com> <4af19c5d-9a7f-d18b-820b-6f3664b8183a@oracle.com> Message-ID: <7de8489c-943b-5ecf-48c1-0bffad101070@oracle.com> Next jtreg test failed on 32-bit Linux: hotspot/test/compiler/runtime/Test7196199.java ----------System.err:(57/2416)---------- test_incrc: [41] = 8.081506E20 != 150000.0 test_incrc: [42] = 1.8632992E31 != 150000.0 test_incrc: [43] = 2.8397877E29 != 150000.0 ... https://bugs.openjdk.java.net/browse/JDK-7196199 was related to Upper bits (64-255) of XMM (YMM) registers are not saved/restored in interrupt handle code during safepoint. Looks like your changes are not enough. Vladimir On 9/6/16 10:12 AM, Vladimir Kozlov wrote: > Good. I start testing these changes. I will push it if testing pass. > > Thanks, > Vladimir > > On 9/2/16 3:07 PM, Kharbas, Kishor wrote: >> Thanks Vladimir, >> >> I have updated the patch : >> http://cr.openjdk.java.net/~vdeshpande/8078122/webrev.02/ >> >> I looked for other places in src/cpu/x86/vm. I feel every case is >> covered. >> >> - Kishor >> >> -----Original Message----- >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >> Sent: Thursday, September 1, 2016 11:39 AM >> To: Kharbas, Kishor ; >> hotspot-compiler-dev at openjdk.java.net >> Subject: Re: RFR(M) 8078122 : YMM registers upper 128 bits may get >> clobbered by a JNI call on windows >> >> Good. 
But looks like some code relied on old stack layout in stubs, >> for example sha256_AVX2(): >> >> #ifndef _WIN64 >> _XMM_SAVE_SIZE = 0, >> #else >> _XMM_SAVE_SIZE = 8*16, >> #endif >> >> Please, check that all other related code is fixed too. (I looked on >> all cases of _WIN64 in src/cpu/x86/vm/). >> >> Thanks, >> Vladimir >> >> On 8/31/16 10:17 PM, Kharbas, Kishor wrote: >>> Hello, >>> >>> I removed the unwanted save and restore of registers in the range >>> XMM6-XMM31 from the x64_64 stubs. >>> I also removed the #ifdef _WIN64 block from x86.ad file. >>> >>> Link to the new patch : >>> http://cr.openjdk.java.net/~vdeshpande/8078122/webrev.01/ >>> >>> Thanks >>> Kishor >>> >>> >>> -----Original Message----- >>> From: Kharbas, Kishor >>> Sent: Wednesday, August 24, 2016 6:24 PM >>> To: Vladimir Kozlov ; >>> hotspot-compiler-dev at openjdk.java.net >>> Cc: Kharbas, Kishor >>> Subject: RE: RFR(M) 8078122 : YMM registers upper 128 bits may get >>> clobbered by a JNI call on windows >>> >>> Thanks Vladimir for quick feedback. >>> I will look into the stubs which save the registers in the range >>> XMM6-XMM31. Also the first comment makes perfect sense. >>> >>> Thanks >>> Kishor >>> >>> -----Original Message----- >>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>> Sent: Wednesday, August 24, 2016 3:08 PM >>> To: Kharbas, Kishor ; >>> hotspot-compiler-dev at openjdk.java.net >>> Subject: Re: RFR(M) 8078122 : YMM registers upper 128 bits may get >>> clobbered by a JNI call on windows >>> >>> Hi Kishor, >>> >>> First, #ifdef _WIN64 is not needed anymore since calling convention >>> is similat to unix now. >>> >>> Second, I would like you to look more broadly. With this change we >>> don't need to preserve XMM6-XMM31 in our stubs for WIN64. I am not >>> sure that we can remove all #ifdef _WIN64 there but for most of them >>> I think we can do. Please, look. 
>>> >>> Thanks, >>> Vladimir >>> >>> On 8/24/16 2:40 PM, Kharbas, Kishor wrote: >>>> Requesting the community to review the patch for >>>> https://bugs.openjdk.java.net/browse/JDK-8078122 >>>> >>>> Webrev : http://cr.openjdk.java.net/~vdeshpande/8078122/webrev.00 >>>> >>>> The patch changes the definitions of registers XMM6-XMM31 for WIN64. >>>> >>>> Thank you. >>>> >>>> Kishor >>>> From doug.simon at oracle.com Tue Sep 6 21:37:52 2016 From: doug.simon at oracle.com (Doug Simon) Date: Tue, 6 Sep 2016 23:37:52 +0200 Subject: RFR: 8165457: [JVMCI] increase InterpreterCodeSize for JVMCI In-Reply-To: References: <39E38A4A-7DEB-49C3-BC8B-C41C9F0F0ED1@oracle.com> Message-ID: <7ED300F2-253B-4550-BF5E-878A99EDAEB2@oracle.com> > On 06 Sep 2016, at 20:14, Christian Thalinger wrote: > > >> On Sep 5, 2016, at 6:49 AM, Doug Simon wrote: >> >> In jvmci-8, we increased the interpreter code size when JVMCI code is included: >> >> http://hg.openjdk.java.net/graal/graal-jvmci-8/file/a074ae16281d/src/cpu/x86/vm/templateInterpreter_x86.hpp#l37 > > What about SPARC? Have we ever seen a problem there? Or AArch64 for that matter? I've only ever seen problems on AMD64. I've never seen it on SPARC and have never run on AArch64. The real fix is that the interpreter generator should never have to guess the size of the code buffer it needs but should resize things as needed after generating the interpreter. -Doug From cthalinger at twitter.com Tue Sep 6 21:39:17 2016 From: cthalinger at twitter.com (Christian Thalinger) Date: Tue, 6 Sep 2016 11:39:17 -1000 Subject: JVMCI compiler thread idle state is RUNNABLE Message-ID: <959DA194-390F-49B2-97FA-CE402CA9D03D@twitter.com> One thing we noticed here at Twitter is that JVMCI threads are not hidden (is_hidden_from_external_view) but at the same time they show up as always active. I don't know the history here but I'm speculating that since compiler threads were always hidden no-one bothered.
In the SIGQUIT thread dump compiler threads show up as RUNNABLE: "C1 CompilerThread3" #8 daemon prio=9 os_prio=31 tid=0x00007fdcc2016800 nid=0x5103 waiting on condition [0x0000000000000000] java.lang.Thread.State: RUNNABLE "C2 CompilerThread2" #7 daemon prio=9 os_prio=31 tid=0x00007fdcc2821800 nid=0x4f03 waiting on condition [0x0000000000000000] java.lang.Thread.State: RUNNABLE The specification of RUNNABLE is: /** * Thread state for a runnable thread. A thread in the runnable * state is executing in the Java virtual machine but it may * be waiting for other resources from the operating system * such as processor. */ RUNNABLE, and that makes sense. But this is very confusing to the user (as one of our internal users reported to me). Maybe JVMCI threads should just be hidden, too? From cthalinger at twitter.com Tue Sep 6 21:58:08 2016 From: cthalinger at twitter.com (Christian Thalinger) Date: Tue, 6 Sep 2016 11:58:08 -1000 Subject: RFR: 8165457: [JVMCI] increase InterpreterCodeSize for JVMCI In-Reply-To: <7ED300F2-253B-4550-BF5E-878A99EDAEB2@oracle.com> References: <39E38A4A-7DEB-49C3-BC8B-C41C9F0F0ED1@oracle.com> <7ED300F2-253B-4550-BF5E-878A99EDAEB2@oracle.com> Message-ID: <92B9E4F8-DF56-475B-A9EC-6FB179C58925@twitter.com> > On Sep 6, 2016, at 11:37 AM, Doug Simon wrote: > > >> On 06 Sep 2016, at 20:14, Christian Thalinger wrote: >> >> >>> On Sep 5, 2016, at 6:49 AM, Doug Simon wrote: >>> >>> In jvmci-8, we increased the interpreter code size when JVMCI code is included: >>> >>> http://hg.openjdk.java.net/graal/graal-jvmci-8/file/a074ae16281d/src/cpu/x86/vm/templateInterpreter_x86.hpp#l37 >> >> What about SPARC? Have we ever seen a problem there? Or AArch64 for that matter? > > I've only ever seen problems on AMD64. I've never seen it on SPARC and have never run on AArch64.
> > The real fix is that the interpreter generator should never have to guess the size of the code buffer it needs but should resize things as needed after generating the interpreter. Yes, it should. From kishor.kharbas at intel.com Wed Sep 7 00:39:31 2016 From: kishor.kharbas at intel.com (Kharbas, Kishor) Date: Wed, 7 Sep 2016 00:39:31 +0000 Subject: RFR(M) 8078122 : YMM registers upper 128 bits may get clobbered by a JNI call on windows In-Reply-To: <7de8489c-943b-5ecf-48c1-0bffad101070@oracle.com> References: <57BE1AD4.7070403@oracle.com> <6aee0e7c-76a5-a920-7099-a3edc349f205@oracle.com> <4af19c5d-9a7f-d18b-820b-6f3664b8183a@oracle.com> <7de8489c-943b-5ecf-48c1-0bffad101070@oracle.com> Message-ID: Hi Vladimir, The patch only touches code in _WIN64. I am having hard time to understand why the test fails for 32-bit Linux Btw, that test passes on Windows 64 platform. I am planning to test on Linux too. Thanks Kishor -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Tuesday, September 6, 2016 2:31 PM To: Kharbas, Kishor ; hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR(M) 8078122 : YMM registers upper 128 bits may get clobbered by a JNI call on windows Next jtreg test failed on 32-bit Linux: hotspot/test/compiler/runtime/Test7196199.java ----------System.err:(57/2416)---------- test_incrc: [41] = 8.081506E20 != 150000.0 test_incrc: [42] = 1.8632992E31 != 150000.0 test_incrc: [43] = 2.8397877E29 != 150000.0 ... https://bugs.openjdk.java.net/browse/JDK-7196199 was related to Upper bits (64-255) of XMM (YMM) registers are not saved/restored in interrupt handle code during safepoint. Looks like your changes are not enough. Vladimir On 9/6/16 10:12 AM, Vladimir Kozlov wrote: > Good. I start testing these changes. I will push it if testing pass. 
> > Thanks, > Vladimir > > On 9/2/16 3:07 PM, Kharbas, Kishor wrote: >> Thanks Vladimir, >> >> I have updated the patch : >> http://cr.openjdk.java.net/~vdeshpande/8078122/webrev.02/ >> >> I looked for other places in src/cpu/x86/vm. I feel every case is >> covered. >> >> - Kishor >> >> -----Original Message----- >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >> Sent: Thursday, September 1, 2016 11:39 AM >> To: Kharbas, Kishor ; >> hotspot-compiler-dev at openjdk.java.net >> Subject: Re: RFR(M) 8078122 : YMM registers upper 128 bits may get >> clobbered by a JNI call on windows >> >> Good. But looks like some code relied on old stack layout in stubs, >> for example sha256_AVX2(): >> >> #ifndef _WIN64 >> _XMM_SAVE_SIZE = 0, >> #else >> _XMM_SAVE_SIZE = 8*16, >> #endif >> >> Please, check that all other related code is fixed too. (I looked on >> all cases of _WIN64 in src/cpu/x86/vm/). >> >> Thanks, >> Vladimir >> >> On 8/31/16 10:17 PM, Kharbas, Kishor wrote: >>> Hello, >>> >>> I removed the unwanted save and restore of registers in the range >>> XMM6-XMM31 from the x64_64 stubs. >>> I also removed the #ifdef _WIN64 block from x86.ad file. >>> >>> Link to the new patch : >>> http://cr.openjdk.java.net/~vdeshpande/8078122/webrev.01/ >>> >>> Thanks >>> Kishor >>> >>> >>> -----Original Message----- >>> From: Kharbas, Kishor >>> Sent: Wednesday, August 24, 2016 6:24 PM >>> To: Vladimir Kozlov ; >>> hotspot-compiler-dev at openjdk.java.net >>> Cc: Kharbas, Kishor >>> Subject: RE: RFR(M) 8078122 : YMM registers upper 128 bits may get >>> clobbered by a JNI call on windows >>> >>> Thanks Vladimir for quick feedback. >>> I will look into the stubs which save the registers in the range >>> XMM6-XMM31. Also the first comment makes perfect sense. 
>>> >>> Thanks >>> Kishor >>> >>> -----Original Message----- >>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>> Sent: Wednesday, August 24, 2016 3:08 PM >>> To: Kharbas, Kishor ; >>> hotspot-compiler-dev at openjdk.java.net >>> Subject: Re: RFR(M) 8078122 : YMM registers upper 128 bits may get >>> clobbered by a JNI call on windows >>> >>> Hi Kishor, >>> >>> First, #ifdef _WIN64 is not needed anymore since calling convention >>> is similat to unix now. >>> >>> Second, I would like you to look more broadly. With this change we >>> don't need to preserve XMM6-XMM31 in our stubs for WIN64. I am not >>> sure that we can remove all #ifdef _WIN64 there but for most of them >>> I think we can do. Please, look. >>> >>> Thanks, >>> Vladimir >>> >>> On 8/24/16 2:40 PM, Kharbas, Kishor wrote: >>>> Requesting the community to review the patch for >>>> https://bugs.openjdk.java.net/browse/JDK-8078122 >>>> >>>> Webrev : http://cr.openjdk.java.net/~vdeshpande/8078122/webrev.00 >>>> >>>> The patch changes the definitions of registers XMM6-XMM31 for WIN64. >>>> >>>> Thank you. >>>> >>>> Kishor >>>> From goetz.lindenmaier at sap.com Wed Sep 7 10:07:57 2016 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Wed, 7 Sep 2016 10:07:57 +0000 Subject: [9] RFR (M): 8160543: C1: Crash in java.lang.String.indexOf in some java.sql tests In-Reply-To: <57C6EA3F.1060702@oracle.com> References: <57C6EA3F.1060702@oracle.com> Message-ID: Hi Nils, I encountered this issue in our nightly jck test runs with -Xcomp. I applied your fix to the VM tested, and I can no more observe the error. Also, I had a look at the code. It looks good. The if around assert(klass->is_loaded(), "sanity"); could be merged into the assertion. Also, if this holds, a row of calls to klass->is_loaded() can be removed. Please fix the indentation in c1_GraphBuilder.cpp 2056ff. Thanks for fixing this, Goetz. 
> -----Original Message----- > From: hotspot-compiler-dev [mailto:hotspot-compiler-dev- > bounces at openjdk.java.net] On Behalf Of Nils Eliasson > Sent: Mittwoch, 31. August 2016 16:31 > To: hotspot-compiler-dev at openjdk.java.net compiler dev at openjdk.java.net> > Subject: [9] RFR (M): 8160543: C1: Crash in java.lang.String.indexOf in some > java.sql tests > > Hi, > > This is fixes for bug [1] JDK-8160543 "C1: Crash in java.lang.String.indexOf in > some java.sql tests" and [2] JDK-8154172 "NPE is thrown instead of linkage > error when invoking nonexistent method " > > * Description: > > Problem in bug #2: A method that is not loaded must not have null check at > the call. The unloaded method may not exist and then we may throw a NPE > on a null receiver before LinkageError and violate the VM spec. > > Problem in bug #1: A final method that is not loaded at compile time (the > final-property is unknown), but is actually loaded from another classloader > (and may already be compiled) must null check its receiver. The null check > can not be at the call site since it would violate #2. Instead the call will have to > use the target methods unverified entry point. > > An additional problem i encountered is that profiling requires a null check, > but if the method isn't loaded we can't add one. So we can't profile unloaded > methods. > > The solution to these problems shouldn't introduce any regression in the > normal use case. Unloaded methods is only common in the compiler when > using -Xcomp when the interpreter hasn't made sure everything is loaded. I > have made the trade-off that it is acceptable to have an performance > regression in the -Xcomp case in order to meet the VM specification. > > * Summary of changes: > > hotspot/src/share/vm/code/compiledIC.cpp > > - if (static_bound || is_optimized) { > + if (is_optimized) { > > static_bound is true if the method at resolve time is declared final. 
This is > uninteresting, we need to know if the call was known final at compile time. > is_optimized however is only true if the targets was loaded and was final at > compile time. This change makes sure that we get a call to the unverified > entry point if there was no null check emitted. > > hotspot/src/share/vm/c1/c1_GraphBuilder.cpp > Contains both changed and some simplifications. The is_resolved method > has been exploded, and redundant check was removed. The major change is > where we decide if a null check should be emitted and when profiling can be > added. > > * Testing > > These are some useful test I have run with and without -Xcomp and with and > without -XX:TieredStopAtLevel=1: > - jdk/test/java/sql/testng/test/sql/CallableStatementTests.java (for bug #1) > - JCK/BINC (binc02908m01 for bug #2 and all binc0500) > - hotspot/test/compiler/linkage/LinkageErrors.java > > I will await complete runs of hs-comp tier 0 - 5 before checkin. > > Please review, > > Regards, > Nils Eliasson > > > > Webrev: http://cr.openjdk.java.net/~neliasso/8160543/webrev.09/ > Bug [1]: https://bugs.openjdk.java.net/browse/JDK-8160543 > Bug [2]: https://bugs.openjdk.java.net/browse/JDK-8160383 From doug.simon at oracle.com Wed Sep 7 10:34:08 2016 From: doug.simon at oracle.com (Doug Simon) Date: Wed, 7 Sep 2016 12:34:08 +0200 Subject: JVMCI compiler thread idle state is RUNNABLE In-Reply-To: <959DA194-390F-49B2-97FA-CE402CA9D03D@twitter.com> References: <959DA194-390F-49B2-97FA-CE402CA9D03D@twitter.com> Message-ID: <7C8D02B7-0069-4D1C-BF47-D435F443EBD9@oracle.com> > On 06 Sep 2016, at 23:39, Christian Thalinger wrote: > > One thing we noticed here at Twitter is that JVMCI threads are not hidden (is_hidden_from_external_view) but at the same time they show up as always active. > > I don't know the history here but I'm speculating that since compiler threads were always hidden no-one bothered.
In the SIGQUIT thread dump compiler threads show up as RUNNABLE: > > "C1 CompilerThread3" #8 daemon prio=9 os_prio=31 tid=0x00007fdcc2016800 nid=0x5103 waiting on condition [0x0000000000000000] > java.lang.Thread.State: RUNNABLE > > "C2 CompilerThread2" #7 daemon prio=9 os_prio=31 tid=0x00007fdcc2821800 nid=0x4f03 waiting on condition [0x0000000000000000] > java.lang.Thread.State: RUNNABLE > > The specification of RUNNABLE is: > > /** > * Thread state for a runnable thread. A thread in the runnable > * state is executing in the Java virtual machine but it may > * be waiting for other resources from the operating system > * such as processor. > */ > RUNNABLE, > > and that makes sense. But this is very confusing to the user (as one of our internal users reported to me). Maybe JVMCI threads should just be hidden, too? Why is this very confusing? All sorts of non-app threads show up in a SIGQUIT thread dump don't they? Not sure if this is covered by your proposal/question but I don't think JVMCI compiler threads should be hidden from JVMTI otherwise they could never be debugged by a Java IDE. -Doug From doug.simon at oracle.com Wed Sep 7 12:29:20 2016 From: doug.simon at oracle.com (Doug Simon) Date: Wed, 7 Sep 2016 14:29:20 +0200 Subject: RFR: 8165434: [JVMCI] remove uses of setAccessible In-Reply-To: <864558C5-C2AD-4D6B-BB6F-568F00BBE28A@twitter.com> References: <864558C5-C2AD-4D6B-BB6F-568F00BBE28A@twitter.com> Message-ID: <6224CDA0-63E6-442C-BD13-732208FA75A2@oracle.com> > On 06 Sep 2016, at 20:12, Christian Thalinger wrote: > > >> On Sep 5, 2016, at 6:45 AM, Doug Simon wrote: >> >> JVMCI currently uses java.lang.reflect.AccessibleObject.setAccessible to get at private internals of certain JDK objects (e.g. java.lang.reflect.Method::slot). In light of changes around java.lang.reflect.AccessibleObject::setAccessible at http://openjdk.java.net/jeps/261, this may require extra command line options at some point.
To avoid that, I've removed all uses of setAccessible in JVMCI. >> >> http://cr.openjdk.java.net/~dnsimon/8165434/ > > src/jdk.vm.ci/share/classes/jdk.vm.ci.meta/src/jdk/vm/ci/meta/ModifiersProvider.java > > + int BRIDGE = 0x0040; > + int VARARGS = 0x0080; > + int SYNTHETIC = 0x1000; > + int ANNOTATION = 0x2000; > + int ENUM = 0x4000; > I wish we could avoid that. We can't use this stuff because it's HotSpot-dependent, right? > + assert ModifiersProvider.SYNTHETIC == getConstant("JVM_ACC_SYNTHETIC", Integer.class); > + assert ModifiersProvider.ANNOTATION == getConstant("JVM_ACC_ANNOTATION", Integer.class); > + assert ModifiersProvider.BRIDGE == getConstant("JVM_ACC_BRIDGE", Integer.class); > + assert ModifiersProvider.VARARGS == getConstant("JVM_ACC_VARARGS", Integer.class); > + assert ModifiersProvider.ENUM == getConstant("JVM_ACC_ENUM", Integer.class); > What if we convert these constants to interface methods and the VM-dependent part has to implement them? Or maybe even keep the fields and assign them via interface methods. Following your suggestion, I've factored out these VM dependent flags to a new HotSpotModifiers class: http://cr.openjdk.java.net/~dnsimon/8165434.v2/ > src/share/vm/jvmci/vmStructs_jvmci.cpp > > declare_constant(JVM_ACC_FIELD_HAS_GENERIC_SIGNATURE) \ > > + declare_preprocessor_constant("JVM_ACC_VARARGS", JVM_ACC_VARARGS) \ > + declare_preprocessor_constant("JVM_ACC_BRIDGE", JVM_ACC_BRIDGE) \ > + declare_preprocessor_constant("JVM_ACC_ANNOTATION", JVM_ACC_ANNOTATION) \ > + declare_preprocessor_constant("JVM_ACC_ENUM", JVM_ACC_ENUM) \ > > declare_preprocessor_constant("JVM_ACC_SYNTHETIC", JVM_ACC_SYNTHETIC) \ > > Please align the '\'. Done. > > Otherwise this looks good and generally a good cleanup. Thanks for the review.
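[Editorial sketch, not part of the original message.] The flag values quoted in the review above (BRIDGE, VARARGS, SYNTHETIC, ANNOTATION, ENUM) are plain bit masks over a raw modifiers value. The following is a minimal Java sketch of how such masks can be decoded. The class name AccessFlagSketch and its helpers are hypothetical and are not taken from JVMCI or from the webrev; only the five numeric values come from the quoted ModifiersProvider diff.

```java
import java.lang.reflect.Method;

// Hypothetical helper, NOT part of JVMCI: decodes the access flag bits
// quoted in the review from a raw modifiers value.
final class AccessFlagSketch {
    // Values as quoted from the ModifiersProvider diff in this thread.
    static final int BRIDGE     = 0x0040;
    static final int VARARGS    = 0x0080;
    static final int SYNTHETIC  = 0x1000;
    static final int ANNOTATION = 0x2000;
    static final int ENUM       = 0x4000;

    // True if the given flag bit is set in the modifiers value.
    static boolean has(int modifiers, int flag) {
        return (modifiers & flag) != 0;
    }

    // Space-separated names of the flags above that are set, in a fixed order.
    // The decoding is context-free: the caller must know whether the value
    // came from a method or a class, since the same bits mean different
    // things on fields.
    static String describe(int modifiers) {
        StringBuilder sb = new StringBuilder();
        append(sb, modifiers, BRIDGE, "BRIDGE");
        append(sb, modifiers, VARARGS, "VARARGS");
        append(sb, modifiers, SYNTHETIC, "SYNTHETIC");
        append(sb, modifiers, ANNOTATION, "ANNOTATION");
        append(sb, modifiers, ENUM, "ENUM");
        return sb.toString();
    }

    private static void append(StringBuilder sb, int modifiers, int flag, String name) {
        if (has(modifiers, flag)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(name);
        }
    }

    public static void main(String[] args) throws Exception {
        // String.format(String, Object...) is a varargs method.
        Method format = String.class.getMethod("format", String.class, Object[].class);
        System.out.println("String.format: " + describe(format.getModifiers()));
        // Thread.State is an enum type.
        System.out.println("Thread.State:  " + describe(Thread.State.class.getModifiers()));
    }
}
```

The sketch only reads public reflection data, so it needs no setAccessible call. Whether a VM-specific class should own such constants is the design question discussed in the message above; this example takes no position on it.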
-Doug From jamsheed.c.m at oracle.com Wed Sep 7 14:14:15 2016 From: jamsheed.c.m at oracle.com (Jamsheed C m) Date: Wed, 7 Sep 2016 19:44:15 +0530 Subject: RFR: 8164508: unexpected profiling mismatch in c1 generated code In-Reply-To: <3ace86b5-ff91-7d2c-9b74-e4b9c497365c@oracle.com> References: <6d873f34-2f65-e96f-0189-ca8af07ae824@oracle.com> <3ace86b5-ff91-7d2c-9b74-e4b9c497365c@oracle.com> Message-ID: <2c802d19-5c80-afb1-cd24-943bced4d5ff@oracle.com> Thank you, Vladimir. Best Regards, Jamsheed On 9/6/2016 9:38 PM, Vladimir Kozlov wrote: > Good. > > thanks, > Vladimir > > On 9/5/16 12:53 AM, Jamsheed C m wrote: >> Hi, >> >> webrev: http://cr.openjdk.java.net/~jcm/8164508/webrev.00/ >> >> bug id: https://bugs.openjdk.java.net/browse/JDK-8164508 >> >> >> we were skipping profiling of first argument(recv) for virtual call >> sites to static callee. this was not done for non-inline case in c1. >> (see linked case for ref: >> https://bugs.openjdk.java.net/browse/JDK-8027631) >> >> - bool has_receiver = x->inlined() && !x->callee()->is_static() && >> !Bytecodes::has_receiver(bc); >> + bool has_receiver = x->callee()->is_loaded() && >> !x->callee()->is_static() && !Bytecodes::has_receiver(bc); above change >> is not absolutely necessary as this can happen only for >> _linkToVirtual,_linkToInterface sites inlining at present, and linker >> elimination and callee inlining always happen together in c1. Please >> review, Best Regards, Jamsheed >> >> >> From michael.c.berg at intel.com Wed Sep 7 16:57:45 2016 From: michael.c.berg at intel.com (Berg, Michael C) Date: Wed, 7 Sep 2016 16:57:45 +0000 Subject: CR for RFR 8165565 Message-ID: Hi Folks, Some cases of CountedLoopEnd have side effect code on targets like SKX for vector processed post loops that are unsafe to translate to short branch versions. A recent change between b126 and b127 exposes this problem. The simple solution is to not allow short branch mapping for these cases. This produces correct code. 
A patch will be uploaded shortly to exemplify this case. The failures show up in SPECjvm2008 in the scimark metrics after b127 inclusive for SKX targets. This code was tested as follows: hotspot jreg, SPECjvm2008 on bdw and skx complete with no issues. Bug-id: https://bugs.openjdk.java.net/browse/JDK-8165565 webrev: http://cr.openjdk.java.net/~mcberg/8165565/webrev.01/ Essentially it preserves this ad file pattern on x86 by disallowing branch shortening (this instruction pattern is predicate guarded for skx like platforms): jmpLoopEnd_and_restoreMask() { match(CountedLoopEnd cop cr); __ jcc((Assembler::Condition)($cop$$cmpcode), *L, false); __ restorevectmask(); which has the restore vector mask side effect code. Regards, Michael From vladimir.kozlov at oracle.com Wed Sep 7 17:09:50 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 7 Sep 2016 10:09:50 -0700 Subject: CR for RFR 8165565 In-Reply-To: References: Message-ID: Michael, I think you should instead modify first condition in InstructForm::check_branch_variant() Thanks, Vladimir On 9/7/16 9:57 AM, Berg, Michael C wrote: > Hi Folks, > > Some cases of CountedLoopEnd have side effect code on targets like SKX > for vector processed post loops that are unsafe to translate to short > branch versions. A recent change between b126 and b127 exposes this > problem. The simple solution is to not allow short branch mapping for > these cases. This produces correct code. A patch will be uploaded > shortly to exemplify this case. The failures show up in SPECjvm2008 in > the scimark metrics after b127 inclusive for SKX targets. > > > > This code was tested as follows: hotspot jreg, SPECjvm2008 on bdw and > skx complete with no issues.
> > > Bug-id: https://bugs.openjdk.java.net/browse/JDK-8165565 > > > webrev: > > http://cr.openjdk.java.net/~mcberg/8165565/webrev.01/ > > > > Essentially it preserves this ad file pattern on x86 by disallowing > branch shortening (this instruction pattern is predicate guarded for skx > like platforms): > > > > jmpLoopEnd_and_restoreMask() { > match(CountedLoopEnd cop cr); > __ jcc((Assembler::Condition)($cop$$cmpcode), *L, false); > __ restorevectmask(); > > > > which has the restore vector mask side effect code. > > > > Regards, > > Michael > From cthalinger at twitter.com Wed Sep 7 17:19:05 2016 From: cthalinger at twitter.com (Christian Thalinger) Date: Wed, 7 Sep 2016 07:19:05 -1000 Subject: JVMCI compiler thread idle state is RUNNABLE In-Reply-To: <7C8D02B7-0069-4D1C-BF47-D435F443EBD9@oracle.com> References: <959DA194-390F-49B2-97FA-CE402CA9D03D@twitter.com> <7C8D02B7-0069-4D1C-BF47-D435F443EBD9@oracle.com> Message-ID: > On Sep 7, 2016, at 12:34 AM, Doug Simon wrote: > > >> On 06 Sep 2016, at 23:39, Christian Thalinger wrote: >> >> One thing we noticed here at Twitter is that JVMCI threads are not hidden (is_hidden_from_external_view) but at the same time they show up as always active. >> >> I don't know the history here but I'm speculating that since compiler threads were always hidden no-one bothered. In the SIGQUIT thread dump compiler threads show up as RUNNABLE: >> >> "C1 CompilerThread3" #8 daemon prio=9 os_prio=31 tid=0x00007fdcc2016800 nid=0x5103 waiting on condition [0x0000000000000000] >> java.lang.Thread.State: RUNNABLE >> >> "C2 CompilerThread2" #7 daemon prio=9 os_prio=31 tid=0x00007fdcc2821800 nid=0x4f03 waiting on condition [0x0000000000000000] >> java.lang.Thread.State: RUNNABLE >> >> The specification of RUNNABLE is: >> >> /** >> * Thread state for a runnable thread. A thread in the runnable >> * state is executing in the Java virtual machine but it may >> * be waiting for other resources from the operating system >> * such as processor.
>> */ >> RUNNABLE, >> >> and that makes sense. But this is very confusing to the user (as one of our internal users reported to me). Maybe JVMCI threads should just be hidden, too? > > Why is this very confusing? All sorts of non-app threads show up in a SIGQUIT thread dump don't they? The confusing part is that JVMCI threads show up but C1/C2 threads don't. Oh, maybe I wasn't clear enough. I'm not talking about a SIGQUIT thread dump; I'm talking about the thread list you can get in Java code (JVM_GetAllThreads aka. "external view"). > > Not sure if this is covered by your proposal/question but I don't think JVMCI compiler threads should be hidden from JVMTI otherwise they could never be debugged by a Java IDE. I don't know which threads JVMTI sees but I would assume it's not using the "external view". > > -Doug From doug.simon at oracle.com Wed Sep 7 17:32:02 2016 From: doug.simon at oracle.com (Doug Simon) Date: Wed, 7 Sep 2016 19:32:02 +0200 Subject: JVMCI compiler thread idle state is RUNNABLE In-Reply-To: References: <959DA194-390F-49B2-97FA-CE402CA9D03D@twitter.com> <7C8D02B7-0069-4D1C-BF47-D435F443EBD9@oracle.com> Message-ID: <9ACAF625-ABE6-40DC-AECB-E57E17659AD8@oracle.com> > On 07 Sep 2016, at 19:19, Christian Thalinger wrote: > >> >> On Sep 7, 2016, at 12:34 AM, Doug Simon wrote: >> >> >>> On 06 Sep 2016, at 23:39, Christian Thalinger wrote: >>> >>> One thing we noticed here at Twitter is that JVMCI threads are not hidden (is_hidden_from_external_view) but at the same time they show up as always active. >>> >>> I don't know the history here but I'm speculating that since compiler threads were always hidden no-one bothered.
In the SIGQUIT thread dump compiler threads show up as RUNNABLE: >>> >>> "C1 CompilerThread3" #8 daemon prio=9 os_prio=31 tid=0x00007fdcc2016800 nid=0x5103 waiting on condition [0x0000000000000000] >>> java.lang.Thread.State: RUNNABLE >>> >>> "C2 CompilerThread2" #7 daemon prio=9 os_prio=31 tid=0x00007fdcc2821800 nid=0x4f03 waiting on condition [0x0000000000000000] >>> java.lang.Thread.State: RUNNABLE >>> >>> The specification of RUNNABLE is: >>> >>> /** >>> * Thread state for a runnable thread. A thread in the runnable >>> * state is executing in the Java virtual machine but it may >>> * be waiting for other resources from the operating system >>> * such as processor. >>> */ >>> RUNNABLE, >>> >>> and that makes sense. But this is very confusing to the user (as one of our internal users reported to me). Maybe JVMCI threads should just be hidden, too? >> >> Why is this very confusing? All sorts of non-app threads show up in a SIGQUIT thread dump don't they? > > The confusing part is that JVMCI threads show up but C1/C2 threads don't. Oh, maybe I wasn't clear enough. I'm not talking about a SIGQUIT thread dump; I'm talking about the thread list you can get in Java code (JVM_GetAllThreads aka. "external view"). I don't know what exactly is meant by "external view" but I see that a bunch of JVMTI methods use is_hidden_from_external_view which, as you observed, returns false for JVMCI compiler threads. That seems to imply that changing the returned value for JVMCI threads will probably make such threads disappear in a Java debugger. Maybe all uses of is_hidden_from_external_view need to make their own decision on whether or not to include CompilerThreads that can_call_java.
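[Editorial sketch, not part of the original message.] To see which threads are visible from Java code, and in which state, one can enumerate them with Thread.getAllStackTraces(), which, to my understanding, is backed by JVM_GetAllThreads in HotSpot. The following is a minimal Java sketch under that assumption. The class name ThreadViewSketch and the name-based heuristic looksLikeCompilerThread are hypothetical illustrations; the heuristic is only a guess based on the thread names shown in the dump above and is not how HotSpot decides visibility.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, NOT part of HotSpot or JVMCI: lists the threads that
// are visible to Java code together with their reported Thread.State.
final class ThreadViewSketch {

    // Snapshot of all live threads visible through the Java API and the state
    // each one reports at the time of the call.
    static Map<Thread, Thread.State> visibleThreadStates() {
        Map<Thread, Thread.State> states = new LinkedHashMap<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            states.put(t, t.getState());
        }
        return states;
    }

    // Assumption: compiler threads can be recognised by name, as in the
    // "C1 CompilerThread3" / "C2 CompilerThread2" entries quoted above.
    // The names of JVMCI compiler threads may differ.
    static boolean looksLikeCompilerThread(String threadName) {
        return threadName.contains("CompilerThread");
    }

    public static void main(String[] args) {
        for (Map.Entry<Thread, Thread.State> e : visibleThreadStates().entrySet()) {
            String name = e.getKey().getName();
            String marker = looksLikeCompilerThread(name) ? "  <- compiler thread?" : "";
            System.out.println(name + ": " + e.getValue() + marker);
        }
    }
}
```

On a VM where compiler threads are hidden from this view, no line should carry the marker; on a VM with visible JVMCI compiler threads, idle ones would be expected to appear as RUNNABLE, which is the behaviour reported in this thread.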
-Doug From cthalinger at twitter.com Wed Sep 7 17:52:19 2016 From: cthalinger at twitter.com (Christian Thalinger) Date: Wed, 7 Sep 2016 07:52:19 -1000 Subject: RFR: 8165434: [JVMCI] remove uses of setAccessible In-Reply-To: <6224CDA0-63E6-442C-BD13-732208FA75A2@oracle.com> References: <864558C5-C2AD-4D6B-BB6F-568F00BBE28A@twitter.com> <6224CDA0-63E6-442C-BD13-732208FA75A2@oracle.com> Message-ID: <999A422E-6CF6-45C5-955B-D58745DBB456@twitter.com> > On Sep 7, 2016, at 2:29 AM, Doug Simon wrote: > >> >> On 06 Sep 2016, at 20:12, Christian Thalinger wrote: >> >> >>> On Sep 5, 2016, at 6:45 AM, Doug Simon wrote: >>> >>> JVMCI currently uses java.lang.reflect.AccessibleObject.setAccessible to get at private internals of certain JDK objects (e.g. java.lang.reflect.Method::slot). In light of changes around java.lang.reflect.AccessibleObject::setAccessible at http://openjdk.java.net/jeps/261, this may require extra command line options at some point. To avoid that, I've removed all uses of setAccessible in JVMCI. >>> >>> http://cr.openjdk.java.net/~dnsimon/8165434/ >> >> src/jdk.vm.ci/share/classes/jdk.vm.ci.meta/src/jdk/vm/ci/meta/ModifiersProvider.java >> >> + int BRIDGE = 0x0040; >> + int VARARGS = 0x0080; >> + int SYNTHETIC = 0x1000; >> + int ANNOTATION = 0x2000; >> + int ENUM = 0x4000; >> I wish we could avoid that. We can't use this stuff because it's HotSpot-dependent, right? >> + assert ModifiersProvider.SYNTHETIC == getConstant("JVM_ACC_SYNTHETIC", Integer.class); >> + assert ModifiersProvider.ANNOTATION == getConstant("JVM_ACC_ANNOTATION", Integer.class); >> + assert ModifiersProvider.BRIDGE == getConstant("JVM_ACC_BRIDGE", Integer.class); >> + assert ModifiersProvider.VARARGS == getConstant("JVM_ACC_VARARGS", Integer.class); >> + assert ModifiersProvider.ENUM == getConstant("JVM_ACC_ENUM", Integer.class); >> What if we convert these constants to interface methods and the VM-dependent part has to implement them?
Or maybe even keep the fields and assign them via interface methods. > > Following your suggestion, I've factored out these VM dependent flags to a new HotSpotModifiers class: > > http://cr.openjdk.java.net/~dnsimon/8165434.v2/ Excellent. One question... I noticed HotSpotModifiers is an interface but no other class implements it. Is there a reason for it being an interface? Only nit, remove 2011: 2 * Copyright (c) 2011, 2016, Oracle and/or its affiliates. All rights reserved. > >> src/share/vm/jvmci/vmStructs_jvmci.cpp >> >> declare_constant(JVM_ACC_FIELD_HAS_GENERIC_SIGNATURE) \ >> >> + declare_preprocessor_constant("JVM_ACC_VARARGS", JVM_ACC_VARARGS) \ >> + declare_preprocessor_constant("JVM_ACC_BRIDGE", JVM_ACC_BRIDGE) \ >> + declare_preprocessor_constant("JVM_ACC_ANNOTATION", JVM_ACC_ANNOTATION) \ >> + declare_preprocessor_constant("JVM_ACC_ENUM", JVM_ACC_ENUM) \ >> >> declare_preprocessor_constant("JVM_ACC_SYNTHETIC", JVM_ACC_SYNTHETIC) \ >> >> Please align the '\'. > > Done. Looks good. > >> >> Otherwise this looks good and generally a good cleanup. > > Thanks for the review. > > -Doug
From cthalinger at twitter.com Wed Sep 7 17:59:32 2016 From: cthalinger at twitter.com (Christian Thalinger) Date: Wed, 7 Sep 2016 07:59:32 -1000 Subject: JVMCI compiler thread idle state is RUNNABLE In-Reply-To: <9ACAF625-ABE6-40DC-AECB-E57E17659AD8@oracle.com> References: <959DA194-390F-49B2-97FA-CE402CA9D03D@twitter.com> <7C8D02B7-0069-4D1C-BF47-D435F443EBD9@oracle.com> <9ACAF625-ABE6-40DC-AECB-E57E17659AD8@oracle.com> Message-ID: <31663455-9FF4-4FDF-A33D-02CEBE488D00@twitter.com> > On Sep 7, 2016, at 7:32 AM, Doug Simon wrote: > >> >> On 07 Sep 2016, at 19:19, Christian Thalinger wrote: >> >>> >>> On Sep 7, 2016, at 12:34 AM, Doug Simon wrote: >>> >>> >>>> On 06 Sep 2016, at 23:39, Christian Thalinger wrote: >>>> >>>> One thing we noticed here at Twitter is that JVMCI threads are not hidden (is_hidden_from_external_view) but at the same time they show up as always active. >>>> >>>> I don't know the history here but I'm speculating that since compiler threads were always hidden no-one bothered. In the SIGQUIT thread dump compiler threads show up as RUNNABLE: >>>> >>>> "C1 CompilerThread3" #8 daemon prio=9 os_prio=31 tid=0x00007fdcc2016800 nid=0x5103 waiting on condition [0x0000000000000000] >>>> java.lang.Thread.State: RUNNABLE >>>> >>>> "C2 CompilerThread2" #7 daemon prio=9 os_prio=31 tid=0x00007fdcc2821800 nid=0x4f03 waiting on condition [0x0000000000000000] >>>> java.lang.Thread.State: RUNNABLE >>>> >>>> The specification of RUNNABLE is: >>>> >>>> /** >>>> * Thread state for a runnable thread. A thread in the runnable >>>> * state is executing in the Java virtual machine but it may >>>> * be waiting for other resources from the operating system >>>> * such as processor. >>>> */ >>>> RUNNABLE, >>>> >>>> and that makes sense. But this is very confusing to the user (as one of our internal users reported to me). Maybe JVMCI threads should just be hidden, too? >>> >>> Why is this very confusing?
All sorts of non-app threads show up in a SIGQUIT thread dump don't they? >> >> The confusing part is that JVMCI threads show up but C1/C2 threads don't. Oh, maybe I wasn't clear enough. I'm not talking about a SIGQUIT thread dump; I'm talking about the thread list you can get in Java code (JVM_GetAllThreads aka. "external view"). > > I don't know what exactly is meant by "external view" but I see that a bunch of JVMTI methods use is_hidden_from_external_view which, as you observed, returns false for JVMCI compiler threads. That seems to imply that changing the returned value for JVMCI threads will probably make such threads disappear in a Java debugger. Yes, I've seen the code in JVMTI too. We need a JVMTI expert. > Maybe all uses of is_hidden_from_external_view need to make their own decision on whether or not to include CompilerThreads that can_call_java. That would be suboptimal but from what I can see that might be the only way. I think we need another method next to is_hidden_from_external_view that distinguishes between native compiler threads and JVMCI threads. can_call_java might be that method but it has a horrible name. From michael.c.berg at intel.com Wed Sep 7 18:04:02 2016 From: michael.c.berg at intel.com (Berg, Michael C) Date: Wed, 7 Sep 2016 18:04:02 +0000 Subject: CR for RFR 8165565 In-Reply-To: References: Message-ID: Vladimir, please see the latest webrev: http://cr.openjdk.java.net/~mcberg/8165565/webrev.02a/ It has the change for check_branch_variant. I have tested it out and it works as advertised. 
Regards, Michael -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Wednesday, September 07, 2016 10:10 AM To: Berg, Michael C ; 'hotspot-compiler-dev at openjdk.java.net' Subject: Re: CR for RFR 8165565 Michael, I think you should instead modify first condition in InstructForm::check_branch_variant() Thanks, Vladimir On 9/7/16 9:57 AM, Berg, Michael C wrote: > Hi Folks, > > Some cases of CountedLoopEnd have side effect code on targets like SKX > for vector processed post loops that are unsafe to translate to short > branch versions. A recent change between b126 and b127 exposes this > problem. The simple solution is to not allow short branch mapping for > these cases. This produces correct code. A patch will be uploaded > shortly to exemplify this case. The failures show up in SPECjvm2008 in > the scimark metrics after b127 inclusive for SKX targets. > > > > This code was tested as follows: hotspot jreg, SPECjvm2008 on bdw and > skx complete with no issues. > > > Bug-id: https://bugs.openjdk.java.net/browse/JDK-8165565 > > > webrev: > > http://cr.openjdk.java.net/~mcberg/8165565/webrev.01/ > > > > Essentially it preserves this ad file pattern on x86 by disallowing > branch shortening (this instruction pattern is predicate guarded for > skx like platforms): > > > > jmpLoopEnd_and_restoreMask() { > match(CountedLoopEnd cop cr); > __ jcc((Assembler::Condition)($cop$$cmpcode), *L, false); > __ restorevectmask(); > > > > which has the restore vector mask side effect code. > > > > Regards, > > Michael > From vladimir.kozlov at oracle.com Wed Sep 7 18:05:20 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 7 Sep 2016 11:05:20 -0700 Subject: CR for RFR 8165565 In-Reply-To: References: Message-ID: <82c35d8b-b85a-2595-d6f8-849eaf619c16@oracle.com> Looks good. 
Thanks, Vladimir On 9/7/16 11:04 AM, Berg, Michael C wrote: > Vladimir please see the latest webrev: > > http://cr.openjdk.java.net/~mcberg/8165565/webrev.02a/ > > It has the change for check_branch_variant. > I have tested it out and it works as advertised. > > Regards, > Michael > > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Wednesday, September 07, 2016 10:10 AM > To: Berg, Michael C ; 'hotspot-compiler-dev at openjdk.java.net' > Subject: Re: CR for RFR 8165565 > > Michael, I think you should instead modify first condition in > InstructForm::check_branch_variant() > > Thanks, > Vladimir > > On 9/7/16 9:57 AM, Berg, Michael C wrote: >> Hi Folks, >> >> Some cases of CountedLoopEnd have side effect code on targets like SKX >> for vector processed post loops that are unsafe to translate to short >> branch versions. A recent change between b126 and b127 exposes this >> problem. The simple solution is to not allow short branch mapping for >> these cases. This produces correct code. A patch will be uploaded >> shortly to exemplify this case. The failures show up in SPECjvm2008 in >> the scimark metrics after b127 inclusive for SKX targets. >> >> >> >> This code was tested as follows: hotspot jreg, SPECjvm2008 on bdw and >> skx complete with no issues. >> >> >> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8165565 >> >> >> webrev: >> >> http://cr.openjdk.java.net/~mcberg/8165565/webrev.01/ >> >> >> >> Essentially it preserves this ad file pattern on x86 by disallowing >> branch shortening (this instruction pattern is predicate guarded for >> skx like platforms): >> >> >> >> jmpLoopEnd_and_restoreMask() { >> match(CountedLoopEnd cop cr); >> __ jcc((Assembler::Condition)($cop$$cmpcode), *L, false); >> __ restorevectmask(); >> >> >> >> which has the restore vector mask side effect code. 
>> >> >> >> Regards, >> >> Michael >> From jamsheed.c.m at oracle.com Thu Sep 8 09:56:14 2016 From: jamsheed.c.m at oracle.com (Jamsheed C m) Date: Thu, 8 Sep 2016 15:26:14 +0530 Subject: RFR: 8134389: Crash in HotSpot with jvm.dll+0x42b48 ciObjectFactory::create_new_metadata Message-ID: Hi All, bugid: https://bugs.openjdk.java.net/browse/JDK-8134389 webrev: http://cr.openjdk.java.net/~jcm/8134389/webrev.00/ return type information is not available in lforms, this causes contradictions in operation like store indexed. mh _linkTo* site arg type casting. etc.. fix: TypeCast to declared return type at lform return. Best Regards, Jamsheed From vitalyd at gmail.com Thu Sep 8 10:38:46 2016 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Thu, 08 Sep 2016 10:38:46 +0000 Subject: MaxBCEAEstimateSize and inlining clarification Message-ID: Hi guys, I'm hoping someone could clarify how MaxBCEAEstimateSize interacts with inlining. The default max size is 150, nearly half the size of FreqInlineSize. Is EA eligibility performed on a method before it's inlined then? I can't imagine that 150 is the limit after inlining. If it's before inlining, how exactly does this work after the method is inlined since the inlined call graph may have quite a bit of code and thus EA may take a while? My understanding is EA is run after inlining to maximize its effectiveness. Or is the MaxBCEAEstimateLevel used as pseudo inlining for the analysis? I'm seeing some code that iterates over a ConcurrentHashMap's entrySet that allocates tens of GB of CHM$MapEntry objects even though they don't escape. I'm also seeing some other places where EA ought to be kicking in but isn't. So I'd like to understand the nuances of it a bit better. Thanks -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rwestrel at redhat.com Thu Sep 8 12:25:13 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 08 Sep 2016 14:25:13 +0200 Subject: MaxBCEAEstimateSize and inlining clarification In-Reply-To: References: Message-ID: > I'm hoping someone could clarify how MaxBCEAEstimateSize interacts with > inlining. > > The default max size is 150, nearly half the size of FreqInlineSize. Is EA > eligibility performed on a method before it's inlined then? I can't imagine > that 150 is the limit after inlining. If it's before inlining, how exactly > does this work after the method is inlined since the inlined call graph may > have quite a bit of code and thus EA may take a while? My understanding is > EA is run after inlining to maximize its effectiveness. Or is the > MaxBCEAEstimateLevel used as pseudo inlining for the analysis? EA happens after inlining. For calls that are not inlined, the bytecodes of the callees is analyzed to find more opportunities for EA. MaxBCEAEstimateSize affects the pass that operates on bytecodes of non inlined methods. Roland. From vitalyd at gmail.com Thu Sep 8 12:31:56 2016 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Thu, 8 Sep 2016 08:31:56 -0400 Subject: MaxBCEAEstimateSize and inlining clarification In-Reply-To: References: Message-ID: Hi Roland, Thanks for the quick reply. On Thu, Sep 8, 2016 at 8:25 AM, Roland Westrelin wrote: > > > I'm hoping someone could clarify how MaxBCEAEstimateSize interacts with > > inlining. > > > > The default max size is 150, nearly half the size of FreqInlineSize. Is > EA > > eligibility performed on a method before it's inlined then? I can't > imagine > > that 150 is the limit after inlining. If it's before inlining, how > exactly > > does this work after the method is inlined since the inlined call graph > may > > have quite a bit of code and thus EA may take a while? My understanding > is > > EA is run after inlining to maximize its effectiveness. 
Or is the > > MaxBCEAEstimateLevel used as pseudo inlining for the analysis? > > EA happens after inlining. For calls that are not inlined, the bytecodes > of the callees is analyzed to find more opportunities for > EA. MaxBCEAEstimateSize affects the pass that operates on bytecodes of > non inlined methods. > Ok, I see - so this flag has no bearing on a method that's inlined. Great. Are there any other conditions/flags that may prevent EA from running? I'm talking about things other than an object escaping in some paths (i.e. the control flow insensitive EA as implemented in C2) or the ordering of EA vs loop unrolling (as I came to find out a few months ago on this list). Are OSR compilations performed with EA? > > Roland. > Thanks again. -------------- next part -------------- An HTML attachment was scrubbed... URL: From vladimir.x.ivanov at oracle.com Thu Sep 8 12:38:35 2016 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 8 Sep 2016 15:38:35 +0300 Subject: MaxBCEAEstimateSize and inlining clarification In-Reply-To: References: Message-ID: <9a143950-c069-7cae-0c52-d16aae4fe8fe@oracle.com> Vitaly, > The default max size is 150, nearly half the size of FreqInlineSize. Is > EA eligibility performed on a method before it's inlined then? I can't > imagine that 150 is the limit after inlining. If it's before inlining, > how exactly does this work after the method is inlined since the inlined > call graph may have quite a bit of code and thus EA may take a while? My > understanding is EA is run after inlining to maximize its effectiveness. > Or is the MaxBCEAEstimateLevel used as pseudo inlining for the analysis? Yes, it's sort of "pseudo inlining". EA happens after inlining is over (both parse & post-parse phases). For calls with known target, EA performs static analysis to compute escape info for arguments. It happens for methods smaller than MaxBCEAEstimateSize. MaxBCEAEstimateLevel limits the inlining depth during analysis. 
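To make the non-escaping case concrete, here is a minimal Java sketch. It is not from the thread: the class EscapeSketch and its methods are hypothetical names chosen for illustration. sumOfDistances allocates a small object per iteration that is never stored, returned, or passed to an unknown callee, which is the kind of allocation C2's escape analysis can in principle eliminate once the callee is inlined. sumValues mirrors the ConcurrentHashMap entrySet iteration raised in this thread; whether its entry objects are actually eliminated depends on inlining decisions, and the sketch does not demonstrate that either way.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative only: all names here are hypothetical, not HotSpot or JDK internals.
final class EscapeSketch {

    // Small value holder; instances created below never leave the creating method.
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        int manhattan() {
            return Math.abs(x) + Math.abs(y);
        }
    }

    // The Point allocated in each iteration is not stored in a field, not returned,
    // and only used as the receiver of a small method. If manhattan() is inlined,
    // the allocation is a candidate for elimination by escape analysis.
    static long sumOfDistances(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            Point p = new Point(i, -i);
            sum += p.manhattan();
        }
        return sum;
    }

    // Iterating entrySet() of a ConcurrentHashMap hands out one entry object per
    // element; the entries are only read here and do not escape this method.
    static long sumValues(Map<String, Integer> map) {
        long sum = 0;
        for (Map.Entry<String, Integer> e : map.entrySet()) {
            sum += e.getValue();
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new ConcurrentHashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        System.out.println(sumOfDistances(4));
        System.out.println(sumValues(map));
    }
}
```

One way to see whether escape analysis matters for a loop like this is to compare allocation rates between a default run and a run with -XX:-DoEscapeAnalysis; that only shows the overall effect, not which allocations were removed.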
> I'm seeing some code that iterates over a ConcurrentHashMap's entrySet > that allocates tens of GB of CHM$MapEntry objects even though they don't > escape. I'm also seeing some other places where EA ought to be kicking > in but isn't. So I'd like to understand the nuances of it a bit better. I wish -XX:+PrintEscapeAnalysis & -XX:+PrintEliminateAllocations were available in product binaries, but they aren't, unfortunately. You can build an "optimized" JVM though. It's close to product binaries w.r.t. speed, but contains also provides most of diagnostic logic (e.g., all nonproduct flags are available). If autoboxing is involved, you can try -XX:+AggressiveUnboxing. Best regards, Vladimir Ivanov From vitalyd at gmail.com Thu Sep 8 13:07:17 2016 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Thu, 8 Sep 2016 09:07:17 -0400 Subject: MaxBCEAEstimateSize and inlining clarification In-Reply-To: <9a143950-c069-7cae-0c52-d16aae4fe8fe@oracle.com> References: <9a143950-c069-7cae-0c52-d16aae4fe8fe@oracle.com> Message-ID: Hi Vladimir, On Thu, Sep 8, 2016 at 8:38 AM, Vladimir Ivanov < vladimir.x.ivanov at oracle.com> wrote: > Vitaly, > > The default max size is 150, nearly half the size of FreqInlineSize. Is >> EA eligibility performed on a method before it's inlined then? I can't >> imagine that 150 is the limit after inlining. If it's before inlining, >> how exactly does this work after the method is inlined since the inlined >> call graph may have quite a bit of code and thus EA may take a while? My >> understanding is EA is run after inlining to maximize its effectiveness. >> Or is the MaxBCEAEstimateLevel used as pseudo inlining for the analysis? >> > > Yes, it's sort of "pseudo inlining". EA happens after inlining is over > (both parse & post-parse phases). For calls with known target, EA performs > static analysis to compute escape info for arguments. It happens for > methods smaller than MaxBCEAEstimateSize. 
MaxBCEAEstimateLevel limits the > inlining depth during analysis. By "known target", does that take profiling into account or it has to be statically known? But basically, it sounds like this is what Roland said -- any methods not inlined for whatever reason (not hot enough, too big, etc) are also inspected for EA purposes, but with the MaxBCEAEstimateSize and Level limits. > > > I'm seeing some code that iterates over a ConcurrentHashMap's entrySet >> that allocates tens of GB of CHM$MapEntry objects even though they don't >> escape. I'm also seeing some other places where EA ought to be kicking >> in but isn't. So I'd like to understand the nuances of it a bit better. >> > > I wish -XX:+PrintEscapeAnalysis & -XX:+PrintEliminateAllocations were > available in product binaries, but they aren't, unfortunately. Yes, that would be great! Is there a good reason they couldn't be turned into prod flags for, say, java 9? > You can build an "optimized" JVM though. It's close to product binaries > w.r.t. speed, but contains also provides most of diagnostic logic (e.g., > all nonproduct flags are available). > If autoboxing is involved, you can try -XX:+AggressiveUnboxing. > So I see this is behind UnlockExperimentalVMOptions (I'm on 8u92). Some of the instances I'm seeing are, indeed, autoboxing. Is this feature stable? What additional optimizations does it enable? Or put another way, why is it experimental? :) > > Best regards, > Vladimir Ivanov > Thanks Vladimir, very helpful. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From doug.simon at oracle.com Thu Sep 8 13:12:34 2016 From: doug.simon at oracle.com (Doug Simon) Date: Thu, 8 Sep 2016 15:12:34 +0200 Subject: RFR: 8165434: [JVMCI] remove uses of setAccessible In-Reply-To: <999A422E-6CF6-45C5-955B-D58745DBB456@twitter.com> References: <864558C5-C2AD-4D6B-BB6F-568F00BBE28A@twitter.com> <6224CDA0-63E6-442C-BD13-732208FA75A2@oracle.com> <999A422E-6CF6-45C5-955B-D58745DBB456@twitter.com> Message-ID: <21860311-D6E9-482B-B0A0-F488A516A1D3@oracle.com> > On 07 Sep 2016, at 19:52, Christian Thalinger wrote: > >> >> On Sep 7, 2016, at 2:29 AM, Doug Simon wrote: >> >>> >>> On 06 Sep 2016, at 20:12, Christian Thalinger wrote: >>> >>> >>>> On Sep 5, 2016, at 6:45 AM, Doug Simon wrote: >>>> >>>> JVMCI currently uses java.lang.reflect.AccessibleObject.setAccessible to get at private internals of certain JDK objects (e.g. java.lang.reflect.Method::slot). In light of changes around java.lang.reflect.AccessibleObject::setAccessible at http://openjdk.java.net/jeps/261, this may require extra command line options at some point. To avoid that, I?ve removed all uses of setAccessible in JVMCI. >>>> >>>> http://cr.openjdk.java.net/~dnsimon/8165434/ >>> >>> src/jdk.vm.ci/share/classes/jdk.vm.ci.meta/src/jdk/vm/ci/meta/ModifiersProvider.java >>> >>> + int BRIDGE = 0x0040; >>> + int VARARGS = 0x0080; >>> + int SYNTHETIC = 0x1000; >>> + int ANNOTATION = 0x2000; >>> + int ENUM = 0x4000; >>> I wish we could avoid that. We can?t use this stuff because it?s HotSpot-dependent, right? 
>>> + assert ModifiersProvider.SYNTHETIC == getConstant("JVM_ACC_SYNTHETIC", Integer.class); >>> + assert ModifiersProvider.ANNOTATION == getConstant("JVM_ACC_ANNOTATION", Integer.class); >>> + assert ModifiersProvider.BRIDGE == getConstant("JVM_ACC_BRIDGE", Integer.class); >>> + assert ModifiersProvider.VARARGS == getConstant("JVM_ACC_VARARGS", Integer.class); >>> + assert ModifiersProvider.ENUM == getConstant("JVM_ACC_ENUM", Integer.class); >>> What if we convert these constants to interface methods and the VM-dependent part has to implement them? Or maybe even keep the fields and assign them via interface methods. >> >> Following your suggestion, I?ve factored out these VM dependent flags to a new HotSpotModifiers class: >> >> http://cr.openjdk.java.net/~dnsimon/8165434.v2/ > > Excellent. One question? I noticed HotSpotModifiers is an interface but no other class implements it. Is there a reason for it being an interface? Nope. It?s now a class. > > Only nit, remove 2011: > 2 * Copyright (c) 2011, 2016, Oracle and/or its affiliates. All rights reserved. Fixed. -Doug From vladimir.x.ivanov at oracle.com Thu Sep 8 13:43:06 2016 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 8 Sep 2016 16:43:06 +0300 Subject: MaxBCEAEstimateSize and inlining clarification In-Reply-To: References: <9a143950-c069-7cae-0c52-d16aae4fe8fe@oracle.com> Message-ID: <156e3c2e-238b-bbc1-08c6-358837839f5c@oracle.com> > Yes, it's sort of "pseudo inlining". EA happens after inlining is > over (both parse & post-parse phases). For calls with known target, > EA performs static analysis to compute escape info for arguments. It > happens for methods smaller than MaxBCEAEstimateSize. > MaxBCEAEstimateLevel limits the inlining depth during analysis. > > By "known target", does that take profiling into account or it has to be > statically known? 
But basically, it sounds like this is what Roland said > -- any methods not inlined for whatever reason (not hot enough, too big, > etc) are also inspected for EA purposes, but with the > MaxBCEAEstimateSize and Level limits. Profiling info isn't used at all. At the beginning all calls with known targets are already static calls (CallStaticJavaNode in the IR). And during analysis only static info (CHA) is used to devirtualize calls. > I wish -XX:+PrintEscapeAnalysis & -XX:+PrintEliminateAllocations > were available in product binaries, but they aren't, unfortunately. > > Yes, that would be great! Is there a good reason they couldn't be turned > into prod flags for, say, java 9? It's not that simple, since the flags use dumping logic not available in product binaries (e.g., Node::dump() to print the corresponding IR nodes). I don't see a compelling reason not to have all the dumping logic available in product binaries, but it's a much larger project compared to changing the type of a couple of flags from "nonproduct" to "diagnostic". > You can build an "optimized" JVM though. It's close to product > binaries w.r.t. speed, but contains also provides most of diagnostic > logic (e.g., all nonproduct flags are available). > > > If autoboxing is involved, you can try -XX:+AggressiveUnboxing. > > So I see this is behind UnlockExperimentalVMOptions (I'm on 8u92). Some > of the instances I'm seeing are, indeed, autoboxing. Is this feature > stable? What additional optimizations does it enable? Or put another > way, why is it experimental? :) Its approach to box elimination is more reliable (it operates on valueOf calls instead of an inlined method). At least, we are not aware of any bugs in the implementation. It is still experimental because we haven't had time to test it thoroughly yet and it fell off our radar after integration. Hope to take care of it in 9u. 
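For reference, a minimal sketch of the kind of autoboxing loop this flag is aimed at. The class name BoxingSketch is hypothetical and the example is an assumption about a typical case, not code from the application discussed here. The assignment to an Integer compiles to an Integer.valueOf call and the addition to an intValue call, so every iteration goes through a box that never leaves the loop.

```java
// Illustrative only: BoxingSketch is a hypothetical name.
final class BoxingSketch {

    // Sums 0..n-1 through a boxed temporary. javac turns the assignment into
    // Integer.valueOf(i) and the addition into boxed.intValue(); values outside
    // the Integer cache range may allocate a fresh Integer per iteration.
    static long sumBoxed(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            Integer boxed = i;
            sum += boxed;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumBoxed(1000));
    }
}
```

On a build where the flag is experimental, as reported for 8u92 in this thread, it would be enabled with something like: java -XX:+UnlockExperimentalVMOptions -XX:+AggressiveUnboxing BoxingSketch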
Best regards, Vladimir Ivanov From vitalyd at gmail.com Thu Sep 8 13:54:47 2016 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Thu, 8 Sep 2016 09:54:47 -0400 Subject: MaxBCEAEstimateSize and inlining clarification In-Reply-To: <156e3c2e-238b-bbc1-08c6-358837839f5c@oracle.com> References: <9a143950-c069-7cae-0c52-d16aae4fe8fe@oracle.com> <156e3c2e-238b-bbc1-08c6-358837839f5c@oracle.com> Message-ID: On Thu, Sep 8, 2016 at 9:43 AM, Vladimir Ivanov < vladimir.x.ivanov at oracle.com> wrote: > > Yes, it's sort of "pseudo inlining". EA happens after inlining is >> over (both parse & post-parse phases). For calls with known target, >> EA performs static analysis to compute escape info for arguments. It >> happens for methods smaller than MaxBCEAEstimateSize. >> MaxBCEAEstimateLevel limits the inlining depth during analysis. >> >> By "known target", does that take profiling into account or it has to be >> statically known? But basically, it sounds like this is what Roland said >> -- any methods not inlined for whatever reason (not hot enough, too big, >> etc) are also inspected for EA purposes, but with the >> MaxBCEAEstimateSize and Level limits. >> > > Profiling info isn't used at all. At the beginning all calls with known > targets are already static calls (CallStaticJavaNode in the IR). And during > analysis only static info (CHA) is used to devirtualize calls. Thanks. Does that mean that marking classes/methods final helps here even if at runtime there're no subclasses? I know marking classes/methods final removes the need to register guards (by virtue of making the call static when receiver is final), but does it have added benefit for EA purposes here as well? I'm slightly confused by your "only static info (CHA) is used to devirtualize calls" statement. Are you referring to the same CHA concept where loaded class hierarchy is inspected? It sounds like you're not since you mention "static info", but CHA is dynamic in my mind. I'm probably misinterpreting this. 
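To pin down what I am asking, here is a small sketch of the two situations (hypothetical names, written only for this question). Square is final, so a call on a receiver of that exact type has one possible target by declaration. Rect is not final but has no subclasses loaded; my understanding is that class hierarchy analysis can still treat its area() as having a single target, with a dependency recorded so the compiled code is invalidated if a subclass is loaded later. The sketch only sets up the shapes of code involved; it does not show which of the two the escape analysis pass actually relies on.

```java
// Illustrative only: ChaSketch and its nested types are hypothetical names.
final class ChaSketch {

    interface Shape {
        int area();
    }

    // Declared final: no subclass can exist, so the call target for a receiver
    // statically typed as Square is known without looking at loaded classes.
    static final class Square implements Shape {
        private final int side;

        Square(int side) {
            this.side = side;
        }

        @Override
        public int area() {
            return side * side;
        }
    }

    // Not final: a subclass could be loaded later and override area(), so a
    // single-target assumption here depends on the currently loaded hierarchy.
    static class Rect implements Shape {
        private final int w;
        private final int h;

        Rect(int w, int h) {
            this.w = w;
            this.h = h;
        }

        @Override
        public int area() {
            return w * h;
        }
    }

    // Interface call with two loaded implementors: not a single-target call site.
    static int totalArea(Shape[] shapes) {
        int total = 0;
        for (Shape s : shapes) {
            total += s.area();
        }
        return total;
    }

    public static void main(String[] args) {
        Shape[] shapes = { new Square(2), new Rect(2, 3) };
        System.out.println(totalArea(shapes));
    }
}
```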
> > > I wish -XX:+PrintEscapeAnalysis & -XX:+PrintEliminateAllocations >> were available in product binaries, but they aren't, unfortunately. >> >> Yes, that would be great! Is there a good reason they couldn't be turned >> into prod flags for, say, java 9? >> > > It's not that simple, since the flags use dumping logic not available in > product binaries (e.g., Node::dump() to print corresponing IR nodes). > > I don't see a compelling reason not to have all the dumping logic > available in product binaries, but it's much larger project, comparing to > changing type for a couple of flags from "nonproduct" to "diagnostic". Ok, understood. I do think this would be very valuable, so if you guys can make it happen it'd be greatly appreciated. > > > You can build an "optimized" JVM though. It's close to product >> binaries w.r.t. speed, but contains also provides most of diagnostic >> logic (e.g., all nonproduct flags are available). >> >> >> If autoboxing is involved, you can try -XX:+AggressiveUnboxing. >> >> So I see this is behind UnlockExperimentalVMOptions (I'm on 8u92). Some >> of the instances I'm seeing are, indeed, autoboxing. Is this feature >> stable? What additional optimizations does it enable? Or put another >> way, why is it experimental? :) >> > > The approach to box elimination it does is more reliable (operates on > valueOf calls instead of an inlined method). > > At least, we are not aware of any bugs in the implementation. It is still > experimental because we haven't had time to test it thoroughly yet and it > went out of our radars after intergration.. Hope to take care of it in 9u. > Ok, thanks. > > Best regards, > Vladimir Ivanov > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From filipp.zhinkin at gmail.com Thu Sep 8 14:14:26 2016 From: filipp.zhinkin at gmail.com (Filipp Zhinkin) Date: Thu, 8 Sep 2016 17:14:26 +0300 Subject: RFR(M): 8165235: [TESTBUG] RTM tests must check OS version In-Reply-To: <28e894e35a3a431aa92d05b310b48970@DEWDFE13DE50.global.corp.sap> References: <28e894e35a3a431aa92d05b310b48970@DEWDFE13DE50.global.corp.sap> Message-ID: Hi Goetz, sorry for the late reply. The change looks good for me. Unfortunately I'm not able to sponsor it, because I'm not working at Oracle and can't submit JPRT. Regards, Filipp. On Tue, Sep 6, 2016 at 4:12 PM, Lindenmaier, Goetz wrote: > Hi Filipp, > > thanks for reviewing my change! > I fixed the two issues: > http://cr.openjdk.java.net/~goetz/wr16/8165235-osRecog/03/webrev.bs/ > http://cr.openjdk.java.net/~goetz/wr16/8165235-osRecog/03/webrev.hs/ > > The hotspot change is unchanged except for the reviewer attribution. > > I also fixed the comment in Platform.java: major->minor. > > Would you mind sponsoring the change? > > Best regards, > Goetz. > > >> -----Original Message----- >> From: Filipp Zhinkin [mailto:filipp.zhinkin at gmail.com] >> Sent: Dienstag, 6. September 2016 13:46 >> To: Volker Simonis >> Cc: Lindenmaier, Goetz ; hotspot-compiler- >> dev at openjdk.java.net >> Subject: Re: RFR(M): 8165235: [TESTBUG] RTM tests must check OS version >> >> Hi, >> >> I would suggest to use something like Boolean.TRUE::booleanValue >> instead of null in AndPredicated ctor and use camel case for >> Platform's fields and methods. >> Otherwise the change looks good. >> >> Just for the record: all those predicates where introduced because >> there were no way to check OS/CPU/whatever using jtreg. >> Now it should be possible to skip tests using jreg's @required tag. So >> maybe we can get rid of some java code? :) >> // Not suggesting to do it right now. >> >> Regards, >> Filipp. >> >> On Tue, Sep 6, 2016 at 1:21 PM, Volker Simonis >> wrote: >> > Thumbs up from me! 
>> > >> > Volker >> > >> > On Tue, Sep 6, 2016 at 11:11 AM, Lindenmaier, Goetz >> > wrote: >> >> Hi Volker, >> >> >> >> thanks for the review! I fixed the two issues: >> >> http://cr.openjdk.java.net/~goetz/wr16/8165235- >> osRecog/02/webrev.hs/ >> >> http://cr.openjdk.java.net/~goetz/wr16/8165235- >> osRecog/02/webrev.bs/ >> >> >> >> Best regards, >> >> Goetz. >> >> >> >> >> >>> -----Original Message----- >> >>> From: Volker Simonis [mailto:volker.simonis at gmail.com] >> >>> Sent: Montag, 5. September 2016 14:57 >> >>> To: Lindenmaier, Goetz >> >>> Cc: hotspot-compiler-dev at openjdk.java.net >> >>> Subject: Re: RFR(M): 8165235: [TESTBUG] RTM tests must check OS >> version >> >>> >> >>> Hi Goetz, >> >>> >> >>> I think you've only forgot to import >> >>> compiler.testlibrary.rtm.predicate.SupportedOS into >> >>> test/compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java >> >>> >> >>> Also, in SupportedOS.java the line: >> >>> >> >>> public boolean getAsBoolean() >> >>> >> >>> is indented to far (should be four spaces less like the annotation in >> >>> the line before). >> >>> >> >>> Besides that, the change looks good. >> >>> >> >>> Thanks for fixing this, >> >>> Volker >> >>> >> >>> On Mon, Sep 5, 2016 at 1:54 PM, Lindenmaier, Goetz >> >>> wrote: >> >>> > Hi, >> >>> > >> >>> > >> >>> > >> >>> > This fixes the RTM tests wrt. to supported platforms on ppc. >> >>> > >> >>> > Please review this change. I please need a sponsor. >> >>> > http://cr.openjdk.java.net/~goetz/wr16/8165235- >> osRecog/01/webrev.bs/ >> >>> > >> >>> > http://cr.openjdk.java.net/~goetz/wr16/8165235- >> osRecog/01/webrev.hs/ >> >>> > >> >>> > >> >>> > RTM uses special instructions that are only available on recent x86 >> cpus. On >> >>> > x86, this feature does not need OS support. On ppc, the equivalent >> >>> > functionality, hardware transactional memory, requires OS support. >> Thus >> >>> the >> >>> > feature is only enabled by the VM if CPU and OS are at a specific level. 
>> The >> >>> > tests must check this. too. This holds for AIX and Linux. >> >>> > >> >>> > >> >>> > >> >>> > To do so, this change introduces rtm/predicate/SupportedOS.java >> which >> >>> checks >> >>> > for proper OS versions on ppc, else returns true. >> >>> > >> >>> > The OS version is retrieved from Platform.java, which has new >> methods >> >>> > getOsVersionMajor() and getOsVersionMinor(). >> >>> > >> >>> > To simplify the checks in the tests, I also introduced a 3-way >> AndPredicate >> >>> > constructor. >> >>> > >> >>> > >> >>> > >> >>> > To simplify the OS version check on Aix, I change enabling RTM on Aix >> to >> >>> > require AIX 7.2. >> >>> > >> >>> > Before, it was enabled on AIX 7.1.3.30, which contains an important >> bug fix. >> >>> > The >> >>> > >> >>> > last digits of this version are not exported to os.version property, so I >> >>> > can not >> >>> > >> >>> > check for them in the test. >> >>> > >> >>> > >> >>> > >> >>> > Best regards, >> >>> > >> >>> > Goetz. From goetz.lindenmaier at sap.com Thu Sep 8 14:36:00 2016 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Thu, 8 Sep 2016 14:36:00 +0000 Subject: RFR(M): 8165235: [TESTBUG] RTM tests must check OS version In-Reply-To: References: <28e894e35a3a431aa92d05b310b48970@DEWDFE13DE50.global.corp.sap> Message-ID: Hi Fillipp, Oh, I understand. Thanks for reviewing anyways! Best regards, Goetz. > -----Original Message----- > From: Filipp Zhinkin [mailto:filipp.zhinkin at gmail.com] > Sent: Donnerstag, 8. September 2016 16:14 > To: Lindenmaier, Goetz > Cc: Volker Simonis ; hotspot-compiler- > dev at openjdk.java.net > Subject: Re: RFR(M): 8165235: [TESTBUG] RTM tests must check OS version > > Hi Goetz, > > sorry for the late reply. > > The change looks good for me. > Unfortunately I'm not able to sponsor it, because I'm not working at > Oracle and can't submit JPRT. > > Regards, > Filipp. 
> > On Tue, Sep 6, 2016 at 4:12 PM, Lindenmaier, Goetz > wrote: > > Hi Filipp, > > > > thanks for reviewing my change! > > I fixed the two issues: > > http://cr.openjdk.java.net/~goetz/wr16/8165235-osRecog/03/webrev.bs/ > > http://cr.openjdk.java.net/~goetz/wr16/8165235-osRecog/03/webrev.hs/ > > > > The hotspot change is unchanged except for the reviewer attribution. > > > > I also fixed the comment in Platform.java: major->minor. > > > > Would you mind sponsoring the change? > > > > Best regards, > > Goetz. > > > > > >> -----Original Message----- > >> From: Filipp Zhinkin [mailto:filipp.zhinkin at gmail.com] > >> Sent: Dienstag, 6. September 2016 13:46 > >> To: Volker Simonis > >> Cc: Lindenmaier, Goetz ; hotspot- > compiler- > >> dev at openjdk.java.net > >> Subject: Re: RFR(M): 8165235: [TESTBUG] RTM tests must check OS > version > >> > >> Hi, > >> > >> I would suggest to use something like Boolean.TRUE::booleanValue > >> instead of null in AndPredicated ctor and use camel case for > >> Platform's fields and methods. > >> Otherwise the change looks good. > >> > >> Just for the record: all those predicates where introduced because > >> there were no way to check OS/CPU/whatever using jtreg. > >> Now it should be possible to skip tests using jreg's @required tag. So > >> maybe we can get rid of some java code? :) > >> // Not suggesting to do it right now. > >> > >> Regards, > >> Filipp. > >> > >> On Tue, Sep 6, 2016 at 1:21 PM, Volker Simonis > > >> wrote: > >> > Thumbs up from me! > >> > > >> > Volker > >> > > >> > On Tue, Sep 6, 2016 at 11:11 AM, Lindenmaier, Goetz > >> > wrote: > >> >> Hi Volker, > >> >> > >> >> thanks for the review! I fixed the two issues: > >> >> http://cr.openjdk.java.net/~goetz/wr16/8165235- > >> osRecog/02/webrev.hs/ > >> >> http://cr.openjdk.java.net/~goetz/wr16/8165235- > >> osRecog/02/webrev.bs/ > >> >> > >> >> Best regards, > >> >> Goetz. 
> >> >> > >> >> > >> >>> -----Original Message----- > >> >>> From: Volker Simonis [mailto:volker.simonis at gmail.com] > >> >>> Sent: Montag, 5. September 2016 14:57 > >> >>> To: Lindenmaier, Goetz > >> >>> Cc: hotspot-compiler-dev at openjdk.java.net > >> >>> Subject: Re: RFR(M): 8165235: [TESTBUG] RTM tests must check OS > >> version > >> >>> > >> >>> Hi Goetz, > >> >>> > >> >>> I think you've only forgot to import > >> >>> compiler.testlibrary.rtm.predicate.SupportedOS into > >> >>> > test/compiler/rtm/cli/TestUseRTMLockingOptionWithBiasedLocking.java > >> >>> > >> >>> Also, in SupportedOS.java the line: > >> >>> > >> >>> public boolean getAsBoolean() > >> >>> > >> >>> is indented to far (should be four spaces less like the annotation in > >> >>> the line before). > >> >>> > >> >>> Besides that, the change looks good. > >> >>> > >> >>> Thanks for fixing this, > >> >>> Volker > >> >>> > >> >>> On Mon, Sep 5, 2016 at 1:54 PM, Lindenmaier, Goetz > >> >>> wrote: > >> >>> > Hi, > >> >>> > > >> >>> > > >> >>> > > >> >>> > This fixes the RTM tests wrt. to supported platforms on ppc. > >> >>> > > >> >>> > Please review this change. I please need a sponsor. > >> >>> > http://cr.openjdk.java.net/~goetz/wr16/8165235- > >> osRecog/01/webrev.bs/ > >> >>> > > >> >>> > http://cr.openjdk.java.net/~goetz/wr16/8165235- > >> osRecog/01/webrev.hs/ > >> >>> > > >> >>> > > >> >>> > RTM uses special instructions that are only available on recent x86 > >> cpus. On > >> >>> > x86, this feature does not need OS support. On ppc, the equivalent > >> >>> > functionality, hardware transactional memory, requires OS support. > >> Thus > >> >>> the > >> >>> > feature is only enabled by the VM if CPU and OS are at a specific > level. > >> The > >> >>> > tests must check this. too. This holds for AIX and Linux. 
> >> >>> > > >> >>> > > >> >>> > > >> >>> > To do so, this change introduces rtm/predicate/SupportedOS.java > >> which > >> >>> checks > >> >>> > for proper OS versions on ppc, else returns true. > >> >>> > > >> >>> > The OS version is retrieved from Platform.java, which has new > >> methods > >> >>> > getOsVersionMajor() and getOsVersionMinor(). > >> >>> > > >> >>> > To simplify the checks in the tests, I also introduced a 3-way > >> AndPredicate > >> >>> > constructor. > >> >>> > > >> >>> > > >> >>> > > >> >>> > To simplify the OS version check on Aix, I change enabling RTM on > Aix > >> to > >> >>> > require AIX 7.2. > >> >>> > > >> >>> > Before, it was enabled on AIX 7.1.3.30, which contains an important > >> bug fix. > >> >>> > The > >> >>> > > >> >>> > last digits of this version are not exported to os.version property, so > I > >> >>> > can not > >> >>> > > >> >>> > check for them in the test. > >> >>> > > >> >>> > > >> >>> > > >> >>> > Best regards, > >> >>> > > >> >>> > Goetz. From goetz.lindenmaier at sap.com Thu Sep 8 14:38:44 2016 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Thu, 8 Sep 2016 14:38:44 +0000 Subject: please sponsor? RFR(M): 8165235: [TESTBUG] RTM tests must check OS version Message-ID: Hi, This change was reviewed by Volker Simonis and Fillipp Zhinkin. Final webrevs: http://cr.openjdk.java.net/~goetz/wr16/8165235-osRecog/03/webrev.bs/ http://cr.openjdk.java.net/~goetz/wr16/8165235-osRecog/03/webrev.hs/ Could someone please sponsor? Thanks! Goetz > -----Original Message----- > From: hotspot-compiler-dev [mailto:hotspot-compiler-dev- > bounces at openjdk.java.net] On Behalf Of Lindenmaier, Goetz > Sent: Montag, 5. September 2016 13:55 > To: hotspot-compiler-dev at openjdk.java.net > Subject: RFR(M): 8165235: [TESTBUG] RTM tests must check OS version > > Hi, > > > > This fixes the RTM tests wrt. to supported platforms on ppc. > > Please review this change. I please need a sponsor. 
> http://cr.openjdk.java.net/~goetz/wr16/8165235-osRecog/01/webrev.bs/
> http://cr.openjdk.java.net/~goetz/wr16/8165235-osRecog/01/webrev.hs/
>
> RTM uses special instructions that are only available on recent x86 CPUs. On
> x86, this feature does not need OS support. On ppc, the equivalent
> functionality, hardware transactional memory, requires OS support. Thus the
> feature is only enabled by the VM if CPU and OS are at a specific level. The
> tests must check this, too. This holds for AIX and Linux.
>
> To do so, this change introduces rtm/predicate/SupportedOS.java, which
> checks for proper OS versions on ppc and otherwise returns true.
>
> The OS version is retrieved from Platform.java, which has new methods
> getOsVersionMajor() and getOsVersionMinor().
>
> To simplify the checks in the tests, I also introduced a 3-way AndPredicate
> constructor.
>
> To simplify the OS version check on AIX, I changed enabling RTM on AIX to
> require AIX 7.2.
>
> Before, it was enabled on AIX 7.1.3.30, which contains an important bug fix.
> The last digits of this version are not exported to the os.version property,
> so I cannot check for them in the test.
>
> Best regards,
> Goetz.

From dmitrij.pochepko at oracle.com Thu Sep 8 14:48:00 2016
From: dmitrij.pochepko at oracle.com (Dmitrij Pochepko)
Date: Thu, 8 Sep 2016 17:48:00 +0300
Subject: RFR: 8155219 - [TESTBUG] Rewrite compiler/ciReplay/TestVM.sh in java
Message-ID:

Hi,

please review the fix for 8155219 - [TESTBUG] Rewrite compiler/ciReplay/TestVM.sh in java

The compiler/ciReplay/* tests were ported from shell to java.

CR: https://bugs.openjdk.java.net/browse/JDK-8155219
webrev for root level: http://cr.openjdk.java.net/~dpochepk/8155219/webrev.root.01/
webrev for hotspot: http://cr.openjdk.java.net/~dpochepk/8155219/webrev.01/

I've tested it via rbt.
Thanks,
Dmitrij

From vladimir.x.ivanov at oracle.com Thu Sep 8 15:53:11 2016
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Thu, 8 Sep 2016 18:53:11 +0300
Subject: MaxBCEAEstimateSize and inlining clarification
In-Reply-To:
References: <9a143950-c069-7cae-0c52-d16aae4fe8fe@oracle.com> <156e3c2e-238b-bbc1-08c6-358837839f5c@oracle.com>
Message-ID:

> Profiling info isn't used at all. At the beginning all calls with
> known targets are already static calls (CallStaticJavaNode in the
> IR). And during analysis only static info (CHA) is used to
> devirtualize calls.
>
> Thanks. Does that mean that marking classes/methods final helps here
> even if at runtime there're no subclasses? I know marking
> classes/methods final removes the need to register guards (by virtue of
> making the call static when receiver is final), but does it have added
> benefit for EA purposes here as well?

Marking classes/methods final can only reduce the number of dependencies
associated with a method, produced by CHA, but it doesn't give any new
information to the analysis itself, so it shouldn't affect inlining decisions.

> I'm slightly confused by your "only static info (CHA) is used to
> devirtualize calls" statement. Are you referring to the same CHA
> concept where loaded class hierarchy is inspected? It sounds like you're
> not since you mention "static info", but CHA is dynamic in my mind. I'm
> probably misinterpreting this.

Yes, sorry for the confusion. That's the same concept which is used during
ordinary inlining: class hierarchy inspection and nmethod dependencies to
track changes.

> It's not that simple, since the flags use dumping logic not
> available in product binaries (e.g., Node::dump() to print
> corresponding IR nodes).
>
> I don't see a compelling reason not to have all the dumping logic
> available in product binaries, but it's a much larger project
> compared to changing the type for a couple of flags from "nonproduct"
> to "diagnostic".
>
> Ok, understood.
> I do think this would be very valuable, so if you guys
> can make it happen it'd be greatly appreciated.

Filed JDK-8165716 [1].

Best regards,
Vladimir Ivanov

[1] https://bugs.openjdk.java.net/browse/JDK-8165716

From rednaxelafx at gmail.com Thu Sep 8 16:14:49 2016
From: rednaxelafx at gmail.com (Krystal Mok)
Date: Thu, 8 Sep 2016 09:14:49 -0700
Subject: MaxBCEAEstimateSize and inlining clarification
In-Reply-To:
References: <9a143950-c069-7cae-0c52-d16aae4fe8fe@oracle.com> <156e3c2e-238b-bbc1-08c6-358837839f5c@oracle.com>
Message-ID:

Hi Vitaly,

On Thu, Sep 8, 2016 at 6:54 AM, Vitaly Davidovich wrote:

>
> I'm slightly confused by your "only static info (CHA) is used to
> devirtualize calls" statement. Are you referring to the same CHA concept
> where loaded class hierarchy is inspected? It sounds like you're not since
> you mention "static info", but CHA is dynamic in my mind. I'm probably
> misinterpreting this.
>

One general rule of thumb: when you see JIT people talking about "static
info" (e.g. statically resolvable target), that means values that are known
at JIT-compile time. Or simply, at compile time. From the JIT compilers'
point of view, CHA information is considered "static" with dependencies.

Speaking of "statically known", I'd like to make a side note about static
finals. Assuming we trust static finals will not be changed after first
assignment (excluding outliers like System.in, System.out, System.err),
then there's an interesting difference between what a JIT compiler considers
a "static constant" and what javac does.

In Java, "final" is really overloaded with two different meanings on the
language level: "const" and "readonly" (using C#'s terminology) -- "const"
for javac-level compile-time constants, and "readonly" for values that are
initialized at runtime, but stay immutable after initialization.
javac implements the Java Language Spec, and only treats "const" usage as constants.
On the other hand, to a JIT compiler, both "const" and "readonly" usages would be considered as static constants, because the value is known at JIT-compile time and won't change afterwards. Just my two cents ;-) - Kris -------------- next part -------------- An HTML attachment was scrubbed... URL: From vitalyd at gmail.com Thu Sep 8 16:27:55 2016 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Thu, 8 Sep 2016 12:27:55 -0400 Subject: MaxBCEAEstimateSize and inlining clarification In-Reply-To: References: <9a143950-c069-7cae-0c52-d16aae4fe8fe@oracle.com> <156e3c2e-238b-bbc1-08c6-358837839f5c@oracle.com> Message-ID: Hi Kris, On Thu, Sep 8, 2016 at 12:14 PM, Krystal Mok wrote: > Hi Vitaly, > > On Thu, Sep 8, 2016 at 6:54 AM, Vitaly Davidovich > wrote: >> >> I'm slightly confused by your "only static info (CHA) is used to >> devirtualize calls" statement. Are you referring to the same CHA concept >> where loaded class hierarchy is inspected? It sounds like you're not since >> you mention "static info", but CHA is dynamic in my mind. I'm probably >> misinterpreting this. >> > > One general rule of thumb: when you see JIT people talking about "static > info" (e.g. statically resolvable target), that means values that are know > at JIT-compile time. Or simply, at compile time. From the JIT compilers' > point of view, CHA information is considered "static" with dependencies. > Right. I just wanted to make sure that was the case here. "JIT-time static" would prevent confusion :). > > Speaking of "statically known", I'd like to make a side note about static > finals. Assuming we trust static finals will not be changed after first > assignment (excluding outliers like System.in, System.out, System.err), > then there's an interesting difference between what a JIT compiler consider > as "static constant" than javac. 
> In Java, "final" is really overloaded with two different meanings on the > language level: "const" and "readonly" (using C#'s terminology) -- "const" > for javac-level compile-time constants, and "readonly" for values that are > initialized at runtime, but stay immutable after initialization. > javac implements the Java Language Spec, and only treats "const" usage as > constants. On the other hand, to a JIT compiler, both "const" and > "readonly" usages would be considered as static constants, because the > value is known at JIT-compile time and won't change afterwards. > Yes. I make use of that quite a bit to make javac-time dynamic expressions be JIT-time constants. However, I hope the whole trusting final instance fields stuff happens soon. Otherwise, seemingly const-foldable code like this: static long makeMask() { return (1L << SomeEnum.A.ordinal()) | (1L << SomeEnum.B.ordinal()); // etc } isn't folded when makeMask is compiled. But make that mask a static final: static final long MASK = // same expression as above and we're good. > Just my two cents ;-) > > - Kris > Anyway, we're going off on a tangent here, but thanks for the thoughts Kris. -------------- next part -------------- An HTML attachment was scrubbed... URL: From vitalyd at gmail.com Thu Sep 8 16:32:04 2016 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Thu, 8 Sep 2016 12:32:04 -0400 Subject: MaxBCEAEstimateSize and inlining clarification In-Reply-To: References: <9a143950-c069-7cae-0c52-d16aae4fe8fe@oracle.com> <156e3c2e-238b-bbc1-08c6-358837839f5c@oracle.com> Message-ID: On Thu, Sep 8, 2016 at 11:53 AM, Vladimir Ivanov < vladimir.x.ivanov at oracle.com> wrote: > > Profiling info isn't used at all. At the beginning all calls with >> known targets are already static calls (CallStaticJavaNode in the >> IR). And during analysis only static info (CHA) is used to >> devirtualize calls. >> >> Thanks. 
Does that mean that marking classes/methods final helps here >> even if at runtime there're no subclasses? I know marking >> classes/methods final removes the need to register guards (by virtue of >> making the call static when receiver is final), but does it have added >> benefit for EA purposes here as well? >> > > Marking classes/methods final can only reduce number of dependencies > associated with a method, produced by CHA, but it doesn't give any new > information to the analysis itself, so shouldn't affect inlining decisions. Right. We're on the same page now with respect to "static info (CHA)" :). By the way, and this is off-topic to this thread (apologies), but while we're discussing marking classes/methods final, are there any other footprint advantages to doing it even if CHA will devirt calls properly? So removing the need to register dependencies is one, and is good. Are the vtables smaller for these cases? Anything else that's an added benefit (from JVM runtime standpoint)? > > > I'm slightly confused by your "only static info (CHA) is used to >> devirtualize calls" statement. Are you referring to the same CHA >> concept where loaded class hierarchy is inspected? It sounds like you're >> not since you mention "static info", but CHA is dynamic in my mind. I'm >> probably misinterpreting this. >> > > Yes, sorry for the confusion. That's the same concept which is used during > ordinary inlining: class hierarchy inspection and nmethod dependencies to > trach changes. Yup, all good - thanks. > > > It's not that simple, since the flags use dumping logic not >> available in product binaries (e.g., Node::dump() to print >> corresponing IR nodes). >> >> I don't see a compelling reason not to have all the dumping logic >> available in product binaries, but it's much larger project, >> comparing to changing type for a couple of flags from "nonproduct" >> to "diagnostic". >> >> Ok, understood. 
>> I do think this would be very valuable, so if you guys
>> can make it happen it'd be greatly appreciated.
>>
>
> Filed JDK-8165716 [1].
>

Thank you!

>
> Best regards,
> Vladimir Ivanov
>
> [1] https://bugs.openjdk.java.net/browse/JDK-8165716
>

From rednaxelafx at gmail.com Thu Sep 8 21:48:11 2016
From: rednaxelafx at gmail.com (Krystal Mok)
Date: Thu, 8 Sep 2016 14:48:11 -0700
Subject: MaxBCEAEstimateSize and inlining clarification
In-Reply-To:
References: <9a143950-c069-7cae-0c52-d16aae4fe8fe@oracle.com> <156e3c2e-238b-bbc1-08c6-358837839f5c@oracle.com>
Message-ID:

On Thu, Sep 8, 2016 at 9:32 AM, Vitaly Davidovich wrote:

>
> By the way, and this is off-topic to this thread (apologies), but while
> we're discussing marking classes/methods final, are there any other
> footprint advantages to doing it even if CHA will devirt calls properly? So
> removing the need to register dependencies is one, and is good. Are the
> vtables smaller for these cases? Anything else that's an added benefit
> (from JVM runtime standpoint)?
>

Well...nothing that really stands out.

Removing the need for registering the dependencies is certainly a good
thing, but it doesn't really matter that much.

The vtable won't necessarily be smaller, it depends. What's guaranteed is
that a final method won't need a *new* vtable entry.
Because "final" can be labeled on a method that's virtual in some base
class, and is only "final" on some derived class. That vtable slot in the
derived class is going to be inherited from the base class and then set to
the overriding target, so no saving at all in this case.

bool klassVtable::needs_new_vtable_entry(methodHandle target_method,
                                         Klass* super,
                                         Handle classloader,
                                         Symbol* classname,
                                         AccessFlags class_flags,
                                         TRAPS) {
  // ...

  if (target_method->is_final_method(class_flags) ||
      // a final method never needs a new entry; final methods can be statically
      // resolved and they have to be present in the vtable only if they override
      // a super's method, in which case they re-use its entry
      (target_method()->is_static()) ||
      // static methods don't need to be in vtable
      (target_method()->name() == vmSymbols::object_initializer_name())
      // <init> is never called dynamically-bound
      ) {
    return false;
  }

  // ...
}

The only thing that I can think of that improves *interpreter* performance
is the invoke_vfinal HotSpot internal bytecode. It allows the interpreter
in HotSpot to skip the vtable lookup and directly dispatch to the target
method, even when the original Java bytecode was invokevirtual. But it's
only an optimization for the interpreter, and it doesn't matter for the JIT
compilers.

- Kris

From john.r.rose at oracle.com Thu Sep 8 22:13:12 2016
From: john.r.rose at oracle.com (John Rose)
Date: Thu, 8 Sep 2016 15:13:12 -0700
Subject: MaxBCEAEstimateSize and inlining clarification
In-Reply-To:
References: <9a143950-c069-7cae-0c52-d16aae4fe8fe@oracle.com> <156e3c2e-238b-bbc1-08c6-358837839f5c@oracle.com>
Message-ID: <80D3ACAD-7751-4DF9-AFEE-B3AFEC420210@oracle.com>

On Sep 8, 2016, at 2:48 PM, Krystal Mok wrote:
>
> On Thu, Sep 8, 2016 at 9:32 AM, Vitaly Davidovich wrote:
>
> By the way, and this is off-topic to this thread (apologies), but while we're discussing marking classes/methods final, are there any other footprint advantages to doing it even if CHA will devirt calls properly? So removing the need to register dependencies is one, and is good. Are the vtables smaller for these cases? Anything else that's an added benefit (from JVM runtime standpoint)?
>
> Well...nothing that really stands out.
>
> Removing the need for registering the dependencies is certainly a good thing, but it doesn't really matter that much.
>
> The vtable won't necessarily be smaller, it depends. What's guaranteed is that a final method won't need a *new* vtable entry.
> Because "final" can be labeled on a method that's virtual in some base class, and is only "final" on some derived class. That vtable slot in the derived class is going to be inherited from the base class and then set to the overriding target, so no saving at all in this case.

HotSpot is overly generous with v-table entries. IIRC even privates get their own entries, for convoluted reasons.

- John

From vitalyd at gmail.com Thu Sep 8 22:13:31 2016
From: vitalyd at gmail.com (Vitaly Davidovich)
Date: Thu, 8 Sep 2016 18:13:31 -0400
Subject: MaxBCEAEstimateSize and inlining clarification
In-Reply-To:
References: <9a143950-c069-7cae-0c52-d16aae4fe8fe@oracle.com> <156e3c2e-238b-bbc1-08c6-358837839f5c@oracle.com>
Message-ID:

On Thursday, September 8, 2016, Krystal Mok wrote:

> On Thu, Sep 8, 2016 at 9:32 AM, Vitaly Davidovich
> wrote:
>
>>
>> By the way, and this is off-topic to this thread (apologies), but while
>> we're discussing marking classes/methods final, are there any other
>> footprint advantages to doing it even if CHA will devirt calls properly? So
>> removing the need to register dependencies is one, and is good. Are the
>> vtables smaller for these cases? Anything else that's an added benefit
>> (from JVM runtime standpoint)?
>>
>
> Well...nothing that really stands out.
>
> Removing the need for registering the dependencies is certainly a good
> thing, but it doesn't really matter that much.
>

I'll take it :). I'm assuming you think it doesn't really matter because
it's only done for C2 compiled code (or C1 as well?), and that's not an
excessive number, and this is only checked at class loading time, which also
shouldn't happen much (if at all) once steady state is reached. Or is there
something else/more to your reasoning?
> > The vtable won't be necessarily be smaller, it depends. What's guaranteed > is that a final method won't need a *new* vtable entry. > Yes, I meant for classes that declare the method, not inherit it. > Because "final" can be labeled on a method that's virtual in some base > class, and is only "final" on some derived class. That vtable slot in the > derived class is going to be inherited from the base class and then set to > the overriding target, so no saving at all in this case. > > bool klassVtable::needs_new_vtable_entry(methodHandle target_method, > Klass* super, > Handle classloader, > Symbol* classname, > AccessFlags class_flags, > TRAPS) { > // ... > > if (target_method->is_final_method(class_flags) || > // a final method never needs a new entry; final methods can be > statically > // resolved and they have to be present in the vtable only if they > override > // a super's method, in which case they re-use its entry > (target_method()->is_static()) || > // static methods don't need to be in vtable > (target_method()->name() == vmSymbols::object_initializer_name()) > // is never called dynamically-bound > ) { > return false; > } > > // ... > } > Thanks for the code pointer. > > The only thing that I can think of that improves *interpreter* performance > is the invoke_vfinal HotSpot internal bytecode. It allows the interpreter > in HotSpot to skip the vtable lookup and directly dispatch to the target > method, even when the original Java bytecode was invokevirtual. But it's > only an optimization for the interpreter, and it doesn't matter for the JIT > compilers. > Yeah, don't care too much about interpreter :) > > - Kris > > > -- Sent from my phone -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rednaxelafx at gmail.com Thu Sep 8 23:28:29 2016 From: rednaxelafx at gmail.com (Krystal Mok) Date: Thu, 8 Sep 2016 16:28:29 -0700 Subject: MaxBCEAEstimateSize and inlining clarification In-Reply-To: <80D3ACAD-7751-4DF9-AFEE-B3AFEC420210@oracle.com> References: <9a143950-c069-7cae-0c52-d16aae4fe8fe@oracle.com> <156e3c2e-238b-bbc1-08c6-358837839f5c@oracle.com> <80D3ACAD-7751-4DF9-AFEE-B3AFEC420210@oracle.com> Message-ID: On Thu, Sep 8, 2016 at 3:13 PM, John Rose wrote: > On Sep 8, 2016, at 2:48 PM, Krystal Mok wrote: > > > On Thu, Sep 8, 2016 at 9:32 AM, Vitaly Davidovich > wrote: > >> >> By the way, and this is off-topic to this thread (apologies), but while >> we're discussing marking classes/methods final, are there any other >> footprint advantages to doing it even if CHA will devirt calls properly? So >> removing the need to register dependencies is one, and is good. Are the >> vtables smaller for these cases? Anything else that's an added benefit >> (from JVM runtime standpoint)? >> > > Well...nothing that really stands out. > > Removing the need for registering the dependencies is certainly a good > thing, but it doesn't really matter that much. > > The vtable won't be necessarily be smaller, it depends. What's guaranteed > is that a final method won't need a *new* vtable entry. > Because "final" can be labeled on a method that's virtual in some base > class, and is only "final" on some derived class. That vtable slot in the > derived class is going to be inherited from the base class and then set to > the overriding target, so no saving at all in this case. > > > HotSpot is overly generous with v-table entries. IIRC even privates get > their own entries, for convoluted reasons. > > Thanks for the tip, John! You're right. Yikes... 
  // private methods in classes always have a new entry in the vtable
  // specification interpretation since classic has
  // private methods not overriding
  // JDK8 adds private methods in interfaces which require invokespecial
  if (target_method()->is_private()) {
    return true;
  }

- Kris

> - John
>

From kishor.kharbas at intel.com Fri Sep 9 00:46:49 2016
From: kishor.kharbas at intel.com (Kharbas, Kishor)
Date: Fri, 9 Sep 2016 00:46:49 +0000
Subject: RFR(M) 8078122 : YMM registers upper 128 bits may get clobbered by a JNI call on windows
In-Reply-To:
References: <57BE1AD4.7070403@oracle.com> <6aee0e7c-76a5-a920-7099-a3edc349f205@oracle.com> <4af19c5d-9a7f-d18b-820b-6f3664b8183a@oracle.com> <7de8489c-943b-5ecf-48c1-0bffad101070@oracle.com>
Message-ID:

Hi Vladimir,

I couldn't reproduce the error on my 32-bit Linux machine.
The test was done on a Sandy Bridge machine (has AVX instruction set).

Please advise how to proceed further.

Thanks
Kishor

-----Original Message-----
From: Kharbas, Kishor
Sent: Tuesday, September 6, 2016 5:40 PM
To: Vladimir Kozlov ; hotspot-compiler-dev at openjdk.java.net
Cc: Kharbas, Kishor
Subject: RE: RFR(M) 8078122 : YMM registers upper 128 bits may get clobbered by a JNI call on windows

Hi Vladimir,

The patch only touches code in _WIN64. I am having a hard time understanding why the test fails for 32-bit Linux.
Btw, that test passes on Windows 64 platform. I am planning to test on Linux too.
Thanks Kishor -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Tuesday, September 6, 2016 2:31 PM To: Kharbas, Kishor ; hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR(M) 8078122 : YMM registers upper 128 bits may get clobbered by a JNI call on windows Next jtreg test failed on 32-bit Linux: hotspot/test/compiler/runtime/Test7196199.java ----------System.err:(57/2416)---------- test_incrc: [41] = 8.081506E20 != 150000.0 test_incrc: [42] = 1.8632992E31 != 150000.0 test_incrc: [43] = 2.8397877E29 != 150000.0 ... https://bugs.openjdk.java.net/browse/JDK-7196199 was related to Upper bits (64-255) of XMM (YMM) registers are not saved/restored in interrupt handle code during safepoint. Looks like your changes are not enough. Vladimir On 9/6/16 10:12 AM, Vladimir Kozlov wrote: > Good. I start testing these changes. I will push it if testing pass. > > Thanks, > Vladimir > > On 9/2/16 3:07 PM, Kharbas, Kishor wrote: >> Thanks Vladimir, >> >> I have updated the patch : >> http://cr.openjdk.java.net/~vdeshpande/8078122/webrev.02/ >> >> I looked for other places in src/cpu/x86/vm. I feel every case is >> covered. >> >> - Kishor >> >> -----Original Message----- >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >> Sent: Thursday, September 1, 2016 11:39 AM >> To: Kharbas, Kishor ; >> hotspot-compiler-dev at openjdk.java.net >> Subject: Re: RFR(M) 8078122 : YMM registers upper 128 bits may get >> clobbered by a JNI call on windows >> >> Good. But looks like some code relied on old stack layout in stubs, >> for example sha256_AVX2(): >> >> #ifndef _WIN64 >> _XMM_SAVE_SIZE = 0, >> #else >> _XMM_SAVE_SIZE = 8*16, >> #endif >> >> Please, check that all other related code is fixed too. (I looked on >> all cases of _WIN64 in src/cpu/x86/vm/). 
>> >> Thanks, >> Vladimir >> >> On 8/31/16 10:17 PM, Kharbas, Kishor wrote: >>> Hello, >>> >>> I removed the unwanted save and restore of registers in the range >>> XMM6-XMM31 from the x64_64 stubs. >>> I also removed the #ifdef _WIN64 block from x86.ad file. >>> >>> Link to the new patch : >>> http://cr.openjdk.java.net/~vdeshpande/8078122/webrev.01/ >>> >>> Thanks >>> Kishor >>> >>> >>> -----Original Message----- >>> From: Kharbas, Kishor >>> Sent: Wednesday, August 24, 2016 6:24 PM >>> To: Vladimir Kozlov ; >>> hotspot-compiler-dev at openjdk.java.net >>> Cc: Kharbas, Kishor >>> Subject: RE: RFR(M) 8078122 : YMM registers upper 128 bits may get >>> clobbered by a JNI call on windows >>> >>> Thanks Vladimir for quick feedback. >>> I will look into the stubs which save the registers in the range >>> XMM6-XMM31. Also the first comment makes perfect sense. >>> >>> Thanks >>> Kishor >>> >>> -----Original Message----- >>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>> Sent: Wednesday, August 24, 2016 3:08 PM >>> To: Kharbas, Kishor ; >>> hotspot-compiler-dev at openjdk.java.net >>> Subject: Re: RFR(M) 8078122 : YMM registers upper 128 bits may get >>> clobbered by a JNI call on windows >>> >>> Hi Kishor, >>> >>> First, #ifdef _WIN64 is not needed anymore since calling convention >>> is similat to unix now. >>> >>> Second, I would like you to look more broadly. With this change we >>> don't need to preserve XMM6-XMM31 in our stubs for WIN64. I am not >>> sure that we can remove all #ifdef _WIN64 there but for most of them >>> I think we can do. Please, look. >>> >>> Thanks, >>> Vladimir >>> >>> On 8/24/16 2:40 PM, Kharbas, Kishor wrote: >>>> Requesting the community to review the patch for >>>> https://bugs.openjdk.java.net/browse/JDK-8078122 >>>> >>>> Webrev : http://cr.openjdk.java.net/~vdeshpande/8078122/webrev.00 >>>> >>>> The patch changes the definitions of registers XMM6-XMM31 for WIN64. >>>> >>>> Thank you. 
>>>>
>>>> Kishor
>>>>

From cnewland at chrisnewland.com Fri Sep 9 06:48:05 2016
From: cnewland at chrisnewland.com (Chris Newland)
Date: Fri, 9 Sep 2016 07:48:05 +0100
Subject: Clarification on hot_throw optimisation
Message-ID: <4a2a7e1c8db461198aa896cd24be45b1.squirrel@excalibur.xssl.net>

Hi all,

I'm adding support for highlighting the hot_throw HotSpot optimisation in
JITWatch (a LogCompilation visualiser) [1] and would like to ask if I've
understood it correctly please.

Example code:
https://github.com/AdoptOpenJDK/jitwatch/blob/master/core/src/main/resources/examples/HotThrow.java

Issue where I've tried to note my findings:
https://github.com/AdoptOpenJDK/jitwatch/issues/223

======================================
import java.util.Random;

public class HotThrow
{
    private Random random = new Random();

    public HotThrow()
    {
        StringBuilder builder = new StringBuilder();

        String string = "The quick brown fox jumps over the lazy dog";

        char[] chars = string.toCharArray();

        for (int i = 0 ; i < 1_000_000; i++)
        {
            int index = random.nextInt(100);

            char c = getChar(chars, index);

            builder.append(c);
        }

        System.out.println(builder.toString());
    }

    public char getChar(char[] chars, int index)
    {
        try
        {
            return chars[index];
        }
        catch(ArrayIndexOutOfBoundsException e)
        {
            return '*';
        }
    }

    public static void main(String[] args)
    {
        new HotThrow();
    }
}
======================================

I believe that the range check on the array index was eliminated in C2 but
hit a trap when index was out of range.

HotSpot then detected this as a hot throw in vm/opto/graphKit.cpp

    case Deoptimization::Reason_range_check:
      ex_obj = env()->ArrayIndexOutOfBoundsException_instance();
      break;

and because there was a local exception handler it uses a pre-allocated
AIOOBE (without a stack trace?) and didn't deoptimise or drop back to the
interpreter.
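
One way to probe the "(without a stack trace?)" question from plain Java is to count how many of the caught exceptions arrive with an empty stack trace. The following is only a rough sketch under stated assumptions: the class HotThrowProbe, its Result type and the fixed seed are hypothetical names made up for this illustration (they are not part of JITWatch or HotSpot), and whether withoutTrace ever becomes non-zero depends on the JVM, its flags and how far the loop has been compiled.

```java
import java.util.Random;

// Hypothetical probe class, not part of JITWatch or HotSpot.
final class HotThrowProbe {

    /** Counters collected by one run. */
    static final class Result {
        final int inRange;       // reads that stayed inside the array
        final int thrown;        // ArrayIndexOutOfBoundsExceptions caught
        final int withoutTrace;  // caught exceptions whose stack trace was empty
        final long checksum;     // keeps the array reads observable

        Result(int inRange, int thrown, int withoutTrace, long checksum) {
            this.inRange = inRange;
            this.thrown = thrown;
            this.withoutTrace = withoutTrace;
            this.checksum = checksum;
        }
    }

    static Result run(int iterations, long seed) {
        char[] chars = "The quick brown fox jumps over the lazy dog".toCharArray();
        Random random = new Random(seed); // fixed seed (an assumption) for repeatable index sequences
        int inRange = 0;
        int thrown = 0;
        int withoutTrace = 0;
        long checksum = 0;

        for (int i = 0; i < iterations; i++) {
            int index = random.nextInt(100); // roughly half the indices fall outside the array
            try {
                checksum += chars[index];
                inRange++;
            } catch (ArrayIndexOutOfBoundsException e) {
                thrown++;
                // An exception with no frames is what a pre-allocated instance would look like.
                if (e.getStackTrace().length == 0) {
                    withoutTrace++;
                }
            }
        }
        return new Result(inRange, thrown, withoutTrace, checksum);
    }

    public static void main(String[] args) {
        Result r = run(1_000_000, 42L);
        System.out.println("inRange=" + r.inRange
                + " thrown=" + r.thrown
                + " withoutTrace=" + r.withoutTrace);
    }
}
```

Comparing a default run against one started with -XX:-OmitStackTraceInFastThrow (a HotSpot product flag) should show whether the empty traces come from this fast-throw path; the sketch makes no claim about the exact counts.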
JITWatch looks for LogCompilation like: and I then use bci reference and the method bytecode Exception table to look up the exception type and highlight it in the JITWatch UI: https://www.chrisnewland.com/images/jitwatch/release1.1/hotthrow.png Is this correct? I didn't quite understand the comment in graphKit // Note: If the deopt count has blown up, the uncommon trap // runtime is going to flush this nmethod, not matter what. Will the hot_throw optimisation stop working after a certain count? I've not observed that yet. Many thanks, Chris @chriswhocodes [1] https://github.com/AdoptOpenJDK/jitwatch From adinn at redhat.com Fri Sep 9 08:34:52 2016 From: adinn at redhat.com (Andrew Dinn) Date: Fri, 9 Sep 2016 09:34:52 +0100 Subject: [aarch64-port-dev ] RFR: AArch64: Fix JNI floating point argument handling In-Reply-To: References: <1d9d7d75-a20e-4145-dfe6-e8ff8e3aea7c@redhat.com> Message-ID: <8f6113af-fd97-08c1-776d-f37b289be060@redhat.com> Hi Ningsheng, On 08/09/16 10:20, Ningsheng Jian wrote: > I have updated the webrev at: > > http://people.linaro.org/~ningsheng.jian/jni-fix/webrev.01/ > > Please help to review it. It passed jtreg tests on my arm server with > fastdebug build. Andrew Haley is on holiday at the moment so I have reviewed this patch. It looks fine and the test passes on my patched build. This will need a sponsor from Oracle to get it supplied with the necessary exemption for jdk9 and committed -- any takers from the [in cc] compiler dev list? regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 
03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander From doug.simon at oracle.com Fri Sep 9 09:01:39 2016 From: doug.simon at oracle.com (Doug Simon) Date: Fri, 9 Sep 2016 11:01:39 +0200 Subject: RFR: 8165755: [JVMCI] replace use of vm_abort with vm_exit Message-ID: <799CA13D-6BDC-4BF5-9241-515A684191F4@oracle.com> Calling vm_abort from multiple threads can cause nasty crashes such as double free errors. We've seen this in Graal during JVMCI initialization when an unknown Graal option is encountered. Multiple compiler threads try to initialize JVMCI which fails with an exception indicating the bad option: Uncaught exception at /scratch/graaluser/buildslave/buildlog/ci_executor/main/graal-jvmci-8/src/share/vm/jvmci/jvmciCompiler.cpp:127 java.lang.ExceptionInInitializerError at jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.runtime(HotSpotJVMCIRuntime.java:85) at jdk.vm.ci.runtime.JVMCI.initializeRuntime(Native Method) at jdk.vm.ci.runtime.JVMCI.(JVMCI.java:58) Caused by: java.lang.IllegalArgumentException: Could not find option OptSomethingThatDoesNotExcist at com.oracle.graal.options.OptionsParser.parseOption(OptionsParser.java:134) at com.oracle.graal.options.OptionsParser.parseOptions(OptionsParser.java:62) at com.oracle.graal.hotspot.HotSpotGraalCompilerFactory.initializeOptions(HotSpotGraalCompilerFactory.java:156) at com.oracle.graal.hotspot.HotSpotGraalCompilerFactory.onSelection(HotSpotGraalCompilerFactory.java:86) at jdk.vm.ci.hotspot.HotSpotJVMCICompilerConfig.getCompilerFactory(HotSpotJVMCICompilerConfig.java:96) at jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.(HotSpotJVMCIRuntime.java:277) at jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.(HotSpotJVMCIRuntime.java:67) at jdk.vm.ci.hotspot.HotSpotJVMCIRuntime$DelayedInit.(HotSpotJVMCIRuntime.java:75) at jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.runtime(HotSpotJVMCIRuntime.java:85) at jdk.vm.ci.runtime.JVMCI.initializeRuntime(Native Method) at jdk.vm.ci.runtime.JVMCI.(JVMCI.java:58) The native JVMCI code 
then tries to exit the VM by calling vm_abort. If multiple compiler threads do this concurrently, certain destructors can be called twice as shown by these thread dumps: thread #26: tid = 0x0019, 0x00007fff84280124 libsystem_malloc.dylib`szone_size + 227, stop reason = signal SIGSTOP frame #0: 0x00007fff84280124 libsystem_malloc.dylib`szone_size + 227 frame #1: 0x00007fff8427fed5 libsystem_malloc.dylib`free + 61 frame #2: 0x000000010ac95963 libjvm.dylib`os::free(memblock=0x00007fedc86226e0, memflags=mtInternal) + 307 at os.cpp:711 frame #3: 0x000000010a2afc54 libjvm.dylib`FreeHeap(p=0x00007fedc86226e0, memflags=mtInternal) + 52 at allocation.inline.hpp:93 frame #4: 0x000000010acf0a9f libjvm.dylib`PerfData::~PerfData(this=0x00007fedc8622650) + 63 at perfData.cpp:116 frame #5: 0x000000010acf0ae5 libjvm.dylib`PerfData::~PerfData(this=0x00007fedc8622650) + 21 at perfData.cpp:114 frame #6: 0x000000010acf163d libjvm.dylib`PerfDataManager::destroy() + 109 at perfData.cpp:287 frame #7: 0x000000010acf3f4d libjvm.dylib`perfMemory_exit() + 61 at perfMemory.cpp:74 frame #8: 0x000000010ac9bb0d libjvm.dylib`os::shutdown() + 13 at os_bsd.cpp:1130 frame #9: 0x000000010ac9bb55 libjvm.dylib`os::abort(dump_core=false) + 21 at os_bsd.cpp:1150 frame #10: 0x000000010a9188e7 libjvm.dylib`vm_abort(dump_core=false) + 39 at java.cpp:666 frame #11: 0x000000010aa4f1e7 libjvm.dylib`JVMCIRuntime::abort_on_pending_exception(exception=Handle @ 0x000070000175b208, message="Uncaught exception at /Users/dsimon/graal/graal-jvmci-8/src/share/vm/jvmci/jvmciCompiler.cpp:127", dump_core=false) + 167 at jvmciRuntime.cpp:992 frame #12: 0x000000010aa17017 libjvm.dylib`JVMCICompiler::compile_method(this=0x00007fedcb203050, method=0x000070000175b8d8, entry_bci=-1, env=0x000070000175b8f0) + 311 at jvmciCompiler.cpp:127 frame #13: 0x000000010a656cd2 libjvm.dylib`CompileBroker::invoke_compiler_on_method(task=0x00007fedc853ca30) + 1314 at compileBroker.cpp:2207 thread #23: tid = 0x0016, 0x00007fff91fcb122 
libsystem_kernel.dylib`__semwait_signal_nocancel + 10, stop reason = signal SIGSTOP frame #0: 0x00007fff91fcb122 libsystem_kernel.dylib`__semwait_signal_nocancel + 10 frame #1: 0x00007fff9578c318 libsystem_c.dylib`nanosleep$NOCANCEL + 188 frame #2: 0x00007fff957b62ce libsystem_c.dylib`usleep$NOCANCEL + 54 frame #3: 0x00007fff957e46e9 libsystem_c.dylib`abort + 139 frame #4: 0x00007fff8428c396 libsystem_malloc.dylib`szone_error + 626 frame #5: 0x000000010ac95963 libjvm.dylib`os::free(memblock=0x00007fedc8601cd0, memflags=mtInternal) + 307 at os.cpp:711 frame #6: 0x000000010a2afc54 libjvm.dylib`FreeHeap(p=0x00007fedc8601cd0, memflags=mtInternal) + 52 at allocation.inline.hpp:93 frame #7: 0x000000010acf0a9f libjvm.dylib`PerfData::~PerfData(this=0x00007fedc8601c60) + 63 at perfData.cpp:116 frame #8: 0x000000010acf0ae5 libjvm.dylib`PerfData::~PerfData(this=0x00007fedc8601c60) + 21 at perfData.cpp:114 frame #9: 0x000000010acf163d libjvm.dylib`PerfDataManager::destroy() + 109 at perfData.cpp:287 frame #10: 0x000000010acf3f4d libjvm.dylib`perfMemory_exit() + 61 at perfMemory.cpp:74 frame #11: 0x000000010ac9bb0d libjvm.dylib`os::shutdown() + 13 at os_bsd.cpp:1130 frame #12: 0x000000010ac9bb55 libjvm.dylib`os::abort(dump_core=false) + 21 at os_bsd.cpp:1150 frame #13: 0x000000010a9188e7 libjvm.dylib`vm_abort(dump_core=false) + 39 at java.cpp:666 frame #14: 0x000000010aa4f1e7 libjvm.dylib`JVMCIRuntime::abort_on_pending_exception(exception=Handle @ 0x0000700001452208, message="Uncaught exception at /Users/dsimon/graal/graal-jvmci-8/src/share/vm/jvmci/jvmciCompiler.cpp:127", dump_core=false) + 167 at jvmciRuntime.cpp:992 frame #15: 0x000000010aa17017 libjvm.dylib`JVMCICompiler::compile_method(this=0x00007fedcb203050, method=0x00007000014528d8, entry_bci=-1, env=0x00007000014528f0) + 311 at jvmciCompiler.cpp:127 frame #16: 0x000000010a656cd2 libjvm.dylib`CompileBroker::invoke_compiler_on_method(task=0x00007fedc862a320) + 1314 at compileBroker.cpp:2207 This webrev replaces calls to 
vm_abort() with before_exit() + vm_exit(). The latter is thread safe. https://bugs.openjdk.java.net/browse/JDK-8165755 http://cr.openjdk.java.net/~dnsimon/8165755/ -Doug From aph at redhat.com Fri Sep 9 10:10:23 2016 From: aph at redhat.com (Andrew Haley) Date: Fri, 9 Sep 2016 11:10:23 +0100 Subject: [aarch64-port-dev ] RFR: AArch64: Fix JNI floating point argument handling In-Reply-To: <8f6113af-fd97-08c1-776d-f37b289be060@redhat.com> References: <1d9d7d75-a20e-4145-dfe6-e8ff8e3aea7c@redhat.com> <8f6113af-fd97-08c1-776d-f37b289be060@redhat.com> Message-ID: <0e8ac5bb-0ab7-1120-a1d3-a15bf786c6da@redhat.com> On 09/09/16 09:34, Andrew Dinn wrote: > Hi Ningsheng, > > On 08/09/16 10:20, Ningsheng Jian wrote: >> I have updated the webrev at: >> >> http://people.linaro.org/~ningsheng.jian/jni-fix/webrev.01/ >> >> Please help to review it. It passed jtreg tests on my arm server with >> fastdebug build. > > Andrew Haley is on holiday at the moment so I have reviewed this patch. > It looks fine and the test passes on my patched build. It's good. I'm surprised (not to say appalled) that we never noticed this before now. Andrew. From adinn at redhat.com Fri Sep 9 12:59:11 2016 From: adinn at redhat.com (Andrew Dinn) Date: Fri, 9 Sep 2016 13:59:11 +0100 Subject: [aarch64-port-dev ] RFR: AArch64: Fix JNI floating point argument handling In-Reply-To: <0e8ac5bb-0ab7-1120-a1d3-a15bf786c6da@redhat.com> References: <1d9d7d75-a20e-4145-dfe6-e8ff8e3aea7c@redhat.com> <8f6113af-fd97-08c1-776d-f37b289be060@redhat.com> <0e8ac5bb-0ab7-1120-a1d3-a15bf786c6da@redhat.com> Message-ID: <3f7844ac-8bd6-207d-cd81-ae93c8391dcf@redhat.com> On 09/09/16 11:10, Andrew Haley wrote: > On 09/09/16 09:34, Andrew Dinn wrote: >> Hi Ningsheng, >> >> On 08/09/16 10:20, Ningsheng Jian wrote: >>> I have updated the webrev at: >>> >>> http://people.linaro.org/~ningsheng.jian/jni-fix/webrev.01/ >>> >>> Please help to review it. It passed jtreg tests on my arm server with >>> fastdebug build. 
>> >> Andrew Haley is on holiday at the moment so I have reviewed this patch. >> It looks fine and the test passes on my patched build. > > It's good. I'm surprised (not to say appalled) that we never noticed > this before now. Yeah, I was going to run hg blame on the jdk8 tree to see who wrote such obviously broken code -- but I think it was me so I decided not to bother! Do we still need someone from Oracle to sponsor this and provide it with an exemption before it can go into JDK9? Or can you commit it? regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander From aph at redhat.com Fri Sep 9 13:30:39 2016 From: aph at redhat.com (Andrew Haley) Date: Fri, 9 Sep 2016 14:30:39 +0100 Subject: [aarch64-port-dev ] RFR: AArch64: Fix JNI floating point argument handling In-Reply-To: <3f7844ac-8bd6-207d-cd81-ae93c8391dcf@redhat.com> References: <1d9d7d75-a20e-4145-dfe6-e8ff8e3aea7c@redhat.com> <8f6113af-fd97-08c1-776d-f37b289be060@redhat.com> <0e8ac5bb-0ab7-1120-a1d3-a15bf786c6da@redhat.com> <3f7844ac-8bd6-207d-cd81-ae93c8391dcf@redhat.com> Message-ID: On 09/09/16 13:59, Andrew Dinn wrote: > Yeah, I was going to run hg blame on the jdk8 tree to see who wrote such > obviously broken code -- but I think it was me so I decided not to bother! > > Do we still need someone from Oracle to sponsor this and provide it with > an exemption before it can go into JDK9? Or can you commit it? This is a serious bug so it must go in. We need sponsorship for the test case. If we don't get that sponsorship we can commit into the aarch64 dir, but let's try for sponsorship first. Andrew. 
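A side note on the vm_abort discussion in the 8165755 thread: the report describes several compiler threads entering the same shutdown path at once, so that teardown work can run more than once. What follows is a minimal Java sketch of that hazard and of a once-only guard; it is not HotSpot code. All names in it (ExitOnceSketch, unguardedShutdown, guardedShutdown, runConcurrently) are hypothetical, and the thread itself only states that vm_exit is thread safe, not how that is achieved.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only: models "teardown must run once" in plain Java.
final class ExitOnceSketch {
    // Counts how many times the teardown body actually executed.
    static final AtomicInteger teardownRuns = new AtomicInteger();
    // Set by the first thread that wins the right to perform the teardown.
    static final AtomicBoolean exiting = new AtomicBoolean(false);

    // Every caller runs the teardown body (stand-in for an unguarded abort path).
    static void unguardedShutdown() {
        teardownRuns.incrementAndGet();
    }

    // Only the first caller runs the teardown body; later callers do nothing.
    static void guardedShutdown() {
        if (exiting.compareAndSet(false, true)) {
            teardownRuns.incrementAndGet();
        }
    }

    // Starts `threads` workers, releases them together, and returns the
    // number of times the teardown body ran.
    static int runConcurrently(Runnable shutdown, int threads) {
        teardownRuns.set(0);
        exiting.set(false);
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    start.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                shutdown.run();
            });
            workers[i].start();
        }
        start.countDown();
        try {
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return teardownRuns.get();
    }

    public static void main(String[] args) {
        System.out.println("unguarded teardown runs: "
                + runConcurrently(ExitOnceSketch::unguardedShutdown, 4));
        System.out.println("guarded teardown runs: "
                + runConcurrently(ExitOnceSketch::guardedShutdown, 4));
    }
}
```

With four threads, the unguarded variant runs the teardown body four times, while the compare-and-set guard lets exactly one caller through.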
From cthalinger at twitter.com Fri Sep 9 17:48:28 2016 From: cthalinger at twitter.com (Christian Thalinger) Date: Fri, 9 Sep 2016 07:48:28 -1000 Subject: RFR: 8165755: [JVMCI] replace use of vm_abort with vm_exit In-Reply-To: <799CA13D-6BDC-4BF5-9241-515A684191F4@oracle.com> References: <799CA13D-6BDC-4BF5-9241-515A684191F4@oracle.com> Message-ID: <6391B00B-AFBF-410C-A6A1-2ED95B35EBEB@twitter.com> I think this looks fine but maybe we should ask the runtime folks. > On Sep 8, 2016, at 11:01 PM, Doug Simon wrote: > > Calling vm_abort from multiple threads can cause nasty crashes such as double free errors. We've seen this in Graal during JVMCI initialization when an unknown Graal option is encountered. Multiple compiler threads try to initialize JVMCI which fails with an exception indicating the bad option: > > [...] > > This webrev replaces calls to vm_abort() with before_exit() + vm_exit(). The latter is thread safe. > > https://bugs.openjdk.java.net/browse/JDK-8165755 > http://cr.openjdk.java.net/~dnsimon/8165755/ > > -Doug From doug.simon at oracle.com Fri Sep 9 18:33:40 2016 From: doug.simon at oracle.com (Doug Simon) Date: Fri, 9 Sep 2016 20:33:40 +0200 Subject: RFR: 8165755: [JVMCI] replace use of vm_abort with vm_exit In-Reply-To: <6391B00B-AFBF-410C-A6A1-2ED95B35EBEB@twitter.com> References: <799CA13D-6BDC-4BF5-9241-515A684191F4@oracle.com> <6391B00B-AFBF-410C-A6A1-2ED95B35EBEB@twitter.com> Message-ID: <2F360221-7E6B-43BB-B69C-69A17777E5F2@oracle.com> Can someone from the runtime team confirm that using vm_exit (instead of vm_abort) is the best way to stop the VM when JVMCI initialization fails (e.g., when invalid JVMCI options are provided on the command line). Thanks! -Doug > On 09 Sep 2016, at 19:48, Christian Thalinger wrote: > > I think this looks fine but maybe we should ask the runtime folks. > >> On Sep 8, 2016, at 11:01 PM, Doug Simon wrote: >> >> Calling vm_abort from multiple threads can cause nasty crashes such as double free errors. We've seen this in Graal during JVMCI initialization when an unknown Graal option is encountered.
Multiple compiler threads try to initialize JVMCI which fails with an exception indicating the bad option: >> >> [...] >> >> This webrev replaces calls to vm_abort() with before_exit() + vm_exit(). The latter is thread safe. >> >> https://bugs.openjdk.java.net/browse/JDK-8165755 >> http://cr.openjdk.java.net/~dnsimon/8165755/ >> >> -Doug > From jamsheed.c.m at oracle.com Fri Sep 9 22:23:00 2016 From: jamsheed.c.m at oracle.com (Jamsheed C m) Date: Sat, 10 Sep 2016 03:53:00 +0530 Subject: RFR: 8134389: Crash in HotSpot with jvm.dll+0x42b48 ciObjectFactory::create_new_metadata In-Reply-To: References: Message-ID: <05c82c51-9525-eec7-206e-a265c7d47194@oracle.com> Adding a little more description, as per my understanding. This issue can happen only when compiled lforms are not inlined. There are two scenarios: 1) no compiled lforms are inlined; 2) some compiled lforms are inlined, or the final method is not inlined (linkTo* not inlined), i.e. partially inlined. In all these cases the *Invoke instruction* will be the *return Value* and will have an erased type, so we reify the return type either by type casting (for the partially inlined case) or by pulling it directly from the callsite MT. Best Regards, Jamsheed On 9/8/2016 3:26 PM, Jamsheed C m wrote: > Hi All, > > bugid: https://bugs.openjdk.java.net/browse/JDK-8134389 > > webrev: http://cr.openjdk.java.net/~jcm/8134389/webrev.00/ > > return type information is not available in lforms, this causes > contradictions in operation like store indexed. mh _linkTo* site arg > type casting. etc.. > > fix: TypeCast to declared return type at lform return. > > Best Regards, > > Jamsheed > -------------- next part -------------- An HTML attachment was scrubbed...
URL: From jamsheed.c.m at oracle.com Sun Sep 11 11:51:43 2016 From: jamsheed.c.m at oracle.com (Jamsheed C m) Date: Sun, 11 Sep 2016 17:21:43 +0530 Subject: RFR: 8134389: Crash in HotSpot with jvm.dll+0x42b48 ciObjectFactory::create_new_metadata In-Reply-To: <05c82c51-9525-eec7-206e-a265c7d47194@oracle.com> References: <05c82c51-9525-eec7-206e-a265c7d47194@oracle.com> Message-ID: <7c1a8b01-b4ec-ea23-b59a-500c1bfd5dbc@oracle.com> I made some changes to my fix. The webrev is updated in place. PIT results with the latest modification are updated in the bug (not yet complete). Best Regards, Jamsheed On 9/10/2016 3:53 AM, Jamsheed C m wrote: > > adding a little more description as per my understanding > > This issue can happen only for compiled lforms not inlined case > > there are two scenarios. > 1) no compiled lforms inlined > 2) some compiled lforms are inlined or final method is not inlined > (linkTo* not inlined).. (i.e partially inlined) > > in all these cases *Invoke instruction* will be *return Value*. and > will have erased type. > so we reify return type either by type casting(for partially inlined > case) or by directly pulling from callsite MT. > > Best Regards, > > Jamsheed > > > On 9/8/2016 3:26 PM, Jamsheed C m wrote: >> Hi All, >> >> bugid: https://bugs.openjdk.java.net/browse/JDK-8134389 >> >> webrev: http://cr.openjdk.java.net/~jcm/8134389/webrev.00/ >> >> return type information is not available in lforms, this causes >> contradictions in operation like store indexed. mh _linkTo* site arg >> type casting. etc.. >> >> fix: TypeCast to declared return type at lform return. >> >> Best Regards, >> >> Jamsheed >> > -------------- next part -------------- An HTML attachment was scrubbed...
URL: From volker.simonis at gmail.com Mon Sep 12 16:35:24 2016 From: volker.simonis at gmail.com (Volker Simonis) Date: Mon, 12 Sep 2016 18:35:24 +0200 Subject: RFR(S): 8159611: C2: ArrayCopy elimination skips required parameter checks In-Reply-To: <7ce01d28-13f5-098a-9898-080f8258881d@oracle.com> References: <57B2A380.6000408@oracle.com> <41851a79-5ffe-2b9d-504a-6a2301de5384@oracle.com> <7ce01d28-13f5-098a-9898-080f8258881d@oracle.com> Message-ID: Sorry for the long delay... Here's my new version: http://cr.openjdk.java.net/~simonis/webrevs/2016/8159611.v3/ I've actually changed PhaseMacroExpand::expand_arraycopy_node() such that it calls generate_arraycopy() with 'length_never_negative' set to true if EliminateAllocations is true (in this case we already checked in LibraryCallKit::inline_arraycopy() that 'length' is not negative). This way I could leave generate_arraycopy() untouched. The generated code now looks as follows: Original version (without 'length < 0' check): 0a7 B5: # B17 B6 <- B4 Freq: 0,999998 0a7 cmpl R9, R11 # unsigned 0aa jb,u B17 P=0,000001 C=-1,000000 ... 0da B7: # B18 B8 <- B6 B12 B13 Freq: 0,999997 0da movl R11, [rsp + #8] # spill 0df testl R11, R11 0e2 jle B18 P=0,000001 C=-1,000000 ... 0e8 B8: # B9 <- B7 Freq: 0,999996 0f9 call_leaf_nofp,runtime oop_disjoint_arraycopy ... 106 B9: # B10 <- B8 B18 B20 Freq: 0,999997 113 ret ... 184 B17: # N1 <- B4 B5 Freq: 2,01328e-06 193 call,static wrapper for: uncommon_trap(reason='intrinsic_or_type_checked_inlining' action='make_not_entrant' debug_id='0') 19d B18: # B9 B19 <- B7 Freq: 9,99997e-07 19d testl R11, R11 1a0 jge B9 P=0,999999 C=-1,000000 1a0 1a6 B19: # B22 B20 <- B18 Freq: 9,99997e-13 1a6 movq RSI, R8 # spill 1a9 movl RDX, #1 # int 1ae movq RCX, R10 # spill 1b1 movl R8, #1 # int 1b7 movl R9, R11 # spill nop # 1 bytes pad for loops and calls 1bb call,static wrapper for: slow_arraycopy In B5 there's a check if 'offset+length' is still in the array range. 
If not we jump to the uncommon trap in B17. In B7 there's the first check from PhaseMacroExpand::generate_arraycopy() (i.e. generate_nonpositive_guard()). If 'length' is less than or equal to zero we jump to B18 where there's the second check from PhaseMacroExpand::generate_arraycopy() (i.e. generate_negative_guard()). If 'length' is zero, we jump to B9 and return. Otherwise we fall into B19 from where we call slow_arraycopy. slow_arraycopy (which is generated in ObjArrayKlass::copy_array()) will throw an AIOOB exception if 'length' is negative. The new version now looks as follows: 0a2 B5: # B19 B6 <- B4 Freq: 0,999998 0a2 cmpl R10, RCX # unsigned 0a5 jb,u B19 P=0,000001 C=-1,000000 0a5 0ab B6: # B20 B7 <- B5 Freq: 0,999997 0ab movl R10, [rsp + #0] # spill 0af testl R10, R10 0b2 jl B20 P=0,000001 C=-1,000000 0b2 ... 0e2 B8: # B10 B9 <- B7 B13 B14 Freq: 0,999996 0e2 testl R10, R10 0e5 je,s B10 P=0,000001 C=-1,000000 ... 0e7 B9: # B10 <- B8 Freq: 0,999995 0f8 call_leaf_nofp,runtime oop_disjoint_arraycopy ... 105 B10: # B11 <- B9 B8 Freq: 0,999996 112 ret ... 18e B19: # B20 <- B5 Freq: 9,99998e-07 192 B20: # N1 <- B18 B19 B6 Freq: 3,01327e-06 1a3 call,static wrapper for: uncommon_trap(reason='intrinsic_or_type_checked_inlining' action='make_not_entrant' debug_id='0') B5 is like before, but is now followed by the extra check for 'length' being not negative in B6. In B8 we now have the first check (i.e. generate_nonpositive_guard()) from PhaseMacroExpand::generate_arraycopy(). It directly checks if 'length' is zero and jumps to B10 (i.e. returns) if so. Otherwise we fall directly into oop_disjoint_arraycopy(). There's no need to check for 'length' being negative and to call 'slow_arraycopy' because this case is already handled earlier (in B6). Is this OK now? Thank you and best regards, Volker On Fri, Aug 26, 2016 at 3:51 AM, Vladimir Kozlov wrote: > Looks good. > > Check does not fold because it is different: LT vs LE.
> > Actually there are 3 checks together with yours (see > PhaseMacroExpand::generate_arraycopy()): > > Node* not_pos = generate_nonpositive_guard(ctrl, copy_length, > length_never_negative); > if (not_pos != NULL) { > Node* local_ctrl = not_pos, *local_io = *io; > MergeMemNode* local_mem = MergeMemNode::make(mem); > transform_later(local_mem); > > // (6) length must not be negative. > if (!length_never_negative) { > generate_negative_guard(&local_ctrl, copy_length, slow_region); > } > > I think the only way to avoid this is to modify code in generate_arraycopy() > when EliminateAllocations is true. In such case you need to generate only > length == 0 check. > > Thanks, > Vladimir > > > On 8/25/16 10:03 AM, Volker Simonis wrote: >> >> On Tue, Aug 16, 2016 at 11:49 PM, Vladimir Kozlov >> wrote: >>> >>> Not generating exception is definitely bug. >>> >>> First, about test case. It would be nice if it also verifies other >>> IndexOutOfBoundsException cases. >>> >> >> I've extended the test case. See: >> >> http://cr.openjdk.java.net/~simonis/webrevs/2016/8159611.v2/ >> >> With the new test I've caught another problem in C1 (only on x86 and >> s390, but that's not in the OpenJDK yet :). >> >> LIR_Assembler::emit_arraycopy() had a shortcut for length==0 which >> prevented the throwing of an ArrayStoreException if src and dst arrays >> have incompatible type (see do_test2() in the new regression test). >> Note that this is a different error from 8160591 and not fixed by the >> change for 8160591. >> >> I've also moved the new check after the offset + length check as >> suggested by you (see new webrev). >> >> Unfortunately, the new check is still not eliminated. 
Here's how it looks: >> >> 0ae B6: # B20 B7 <- B5 Freq: 0,999997 >> 0ae movl R9, [rsp + #0] # spill >> 0b2 testl R9, R9 >> 0b5 jl B20 P=0,000001 C=-1,000000 >> 0b5 >> 0bb B7: # B12 B8 <- B6 Freq: 0,999996 >> 0bb movl R11, [R10 + #8 (8-bit)] # compressed klass ptr >> 0bf decode_klass_not_null RAX,R11 >> 0cc movl RBX, [RAX + #16 (8-bit)] # int >> 0cf movslq RCX, RBX # i2l >> 0d2 movq RSI, precise klass [Ljava/lang/Object;: >> 0x00007ff1080320d0:Constant:exact * # ptr >> 0dc movq RCX, [RSI + RCX] # class >> 0e0 cmpq RAX, RCX # ptr >> 0e3 jne,us B12 P=0,170000 C=-1,000000 >> 0e3 >> 0e5 B8: # B21 B9 <- B7 B13 B14 Freq: 0,999996 >> 0e5 testl R9, R9 >> 0e8 jle B21 P=0,000001 C=-1,000000 >> >> As you can see 'testl R9, R9' is executed two times. >> >> I've even tried to move the new check after the subtype check, but >> that doesn't help either: >> >> 0da B7: # B20 B8 <- B6 B13 B14 Freq: 0,999997 >> 0da movl R11, [rsp + #8] # spill >> 0df testl R11, R11 >> 0e2 jl B20 P=0,000001 C=-1,000000 >> 0e2 >> 0e8 B8: # B10 B9 <- B7 Freq: 0,999996 >> 0e8 testl R11, R11 >> 0eb jle,s B10 P=0,000001 C=-1,000000 >> >> Any idea how this could be fixed? >> >> Thanks, >> Volker >> >> PS: and I still don't have a reproducible benchmark which shows a >> regression with my change... >> >> >>> Actually an additional dynamic check will help in case a negative length is >>> known during compilation. The allocation code will be eliminated very >>> early >>> instead of waiting for macro expansion: >>> >>> int length = alloc->in(AllocateNode::ALength)->find_int_con(-1); >>> if (length < 0) { >>> NOT_PRODUCT(fail_eliminate = "Array's size is not constant";) >>> can_eliminate = false; >>> } >>> >>> About the additional length check in your new test. I think it may be >>> collapsed >>> with the preceding check since it is generated after other checks. >>> So I would suggest to move it after the offset + length check.
>>> >>> Thanks, >>> Vladimir >>> >>> >>> On 8/16/16 7:57 AM, Volker Simonis wrote: >>>> >>>> >>>> On Tue, Aug 16, 2016 at 7:24 AM, Tobias Hartmann >>>> wrote: >>>>> >>>>> >>>>> Hi Volker, >>>>> >>>>> thanks for taking care of this issue! >>>>> >>>>> Did you check what happens if the allocation is not eliminated and >>>>> macro >>>>> expansion phase emits another negative guard? Are the checks merged? >>>>> >>>> >>>> It depends. I just saw that in some cases the regression test worked >>>> before, because the length check was done in >>>> SharedRuntime::slow_arraycopy_C(). So in that case there's obviously >>>> nothing that can be merged. But the test case is obviously a >>>> degenerate example anyway, so I don't think that's a problem. >>>> >>>> If I do a more real-world example like this where the arraycopy cannot >>>> be eliminated because one of its arguments escapes: >>>> >>>> public static boolean do_test2(int length, Object[] dest) { >>>> try { >>>> System.arraycopy(new Object[10], 1, dest, 1, length); >>>> return false; >>>> } catch (IndexOutOfBoundsException e) { >>>> return true; >>>> } >>>> } >>>> >>>> and call it with: >>>> >>>> do_test2(8, new Object[10]) >>>> >>>> the generated code for do_test2() unfortunately contains one more >>>> check now with my change (the 'length' field is in [rsp + #0]): >>>> >>>> 0a2 B4: # B18 B5 <- B3 Freq: 0,999999 >>>> 0a2 movl R9, [rsp + #0] # spill >>>> 0a6 testl R9, R9 >>>> 0a9 jl B18 P=0,000001 C=-1,000000 >>>> 0a9 >>>> 0af B5: # B18 B6 <- B4 Freq: 0,999998 >>>> 0af movl RBX, R9 # spill >>>> 0b2 incl RBX # int >>>> 0b4 cmpl RBX, #10 # unsigned >>>> 0b7 jnbe,u B18 P=0,000001 C=-1,000000 >>>> >>>> The generated code before my change looked like this (again the >>>> 'length' field is in [rsp + #0]): >>>> >>>> 0a1 B4: # B17 B5 <- B3 Freq: 0,999999 >>>> 0a1 movl R11, [rsp + #8] # spill >>>> 0a6 incl R11 # int >>>> 0a9 cmpl R11, #10 # unsigned >>>> 0ad jnbe,u B17 P=0,000001 C=-1,000000 >>>> >>>> It seems that the 'length'
check has been completely eliminated before. >>>> >>>> So I need to do some more tests to understand why the new check isn't >>>> eliminated. >>>> >>>> Do you think the new check results in a performance regression? Have >>>> you run some benchmarks? >>>> >>>>> I would prefer brackets around the if body but you don't need to send >>>>> another webrev: >>>>> if (EliminateAllocations) { >>>>> generate_negative_guard(length, slow_region); >>>>> } >>>> >>>> >>>> >>>> Yes, I agree. >>>> >>>>> >>>>> Best regards, >>>>> Tobias >>>>> >>>>> On 12.08.2016 21:13, Volker Simonis wrote: >>>>>> >>>>>> >>>>>> Hi, >>>>>> >>>>>> can I please have a review and sponsor for the following fix: >>>>>> >>>>>> http://cr.openjdk.java.net/~simonis/webrevs/2016/8159611 >>>>>> https://bugs.openjdk.java.net/browse/JDK-8159611 >>>>>> >>>>>> >>>>>> We are inserting several checks for the arguments of >>>>>> System.arraycopy() in LibraryCallKit::inline_arraycopy() before >>>>>> intrinsifying the call in LibraryCallKit::inline_arraycopy. However the >>>>>> check for the 'length' argument of arraycopy is postponed to the macro >>>>>> expansion phase in PhaseMacroExpand::generate_arraycopy(). >>>>>> >>>>>> But if we are running with EscapeAnalysis and EliminateAllocations, >>>>>> the array allocations inside a call to System.arraycopy() may get >>>>>> eliminated and thus the complete call to System.arraycopy() will be >>>>>> removed (see PhaseMacroExpand::process_users_of_allocation). In this >>>>>> case the extra 'length' check won't be added by >>>>>> PhaseMacroExpand::generate_arraycopy() any more because macro >>>>>> expansion happens after the elimination of macro nodes. >>>>>> >>>>>> In such a case it may happen that System.arraycopy() will silently >>>>>> accept an invalid (i.e. negative) 'length' parameter, although it >>>>>> should actually throw an IndexOutOfBoundsException. 
>>>>>> >>>>>> The fix is simple: also insert a check for the length field in >>>>>> LibraryCallKit::inline_arraycopy() if we are running with >>>>>> EliminateAllocations. >>>>>> >>>>>> Regards, >>>>>> Volker >>>>>> >>> > From cthalinger at twitter.com Mon Sep 12 17:26:02 2016 From: cthalinger at twitter.com (Christian Thalinger) Date: Mon, 12 Sep 2016 07:26:02 -1000 Subject: RFR: 8165457: [JVMCI] increase InterpreterCodeSize for JVMCI In-Reply-To: <92B9E4F8-DF56-475B-A9EC-6FB179C58925@twitter.com> References: <39E38A4A-7DEB-49C3-BC8B-C41C9F0F0ED1@oracle.com> <7ED300F2-253B-4550-BF5E-878A99EDAEB2@oracle.com> <92B9E4F8-DF56-475B-A9EC-6FB179C58925@twitter.com> Message-ID: <938FBEA5-0AAB-4640-B231-E259B00275AB@twitter.com> > On Sep 6, 2016, at 11:58 AM, Christian Thalinger wrote: > >> >> On Sep 6, 2016, at 11:37 AM, Doug Simon wrote: >> >> >>> On 06 Sep 2016, at 20:14, Christian Thalinger wrote: >>> >>> >>>> On Sep 5, 2016, at 6:49 AM, Doug Simon wrote: >>>> >>>> In jvmci-8, we increased the interpreter code size when JVMCI code is included: >>>> >>>> http://hg.openjdk.java.net/graal/graal-jvmci-8/file/a074ae16281d/src/cpu/x86/vm/templateInterpreter_x86.hpp#l37 >>> >>> What about SPARC? Have we ever seen a problem there? Or AArch64 for that matter? >> >> I've only ever seen problems on AMD64. I've never seen it on SPARC and have never run on AArch64. >> >> The real fix is that the interpreter generator should never have to guess the size of the code buffer it needs but should resize things as needed after generating the interpreter. > > Yes, it should. Forgot to say that it looks good. -------------- next part -------------- An HTML attachment was scrubbed... URL: From vitalyd at gmail.com Mon Sep 12 19:13:27 2016 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Mon, 12 Sep 2016 15:13:27 -0400 Subject: Odd interaction between ArrayList$Itr and Escape Analysis Message-ID: Hi all, Vladimir I. 
and I have been looking at a peculiarity in EA as it relates to eliminating the ArrayList$Itr. What Vladimir found (and I see it as well) is that ArrayList$Itr::init isn't always inlined due to "unloaded signature classes", e.g.: @ 6 java.util.ArrayList::iterator (10 bytes) inline (hot) @ 6 java.util.ArrayList$Itr::<init> (6 bytes) unloaded signature classes I tried to dig a bit further into this, and it appears that what's "unloaded" is ArrayList$1. LogCompilation shows this (which I think is relevant): It looks like ArrayList$1 is a synthetic class generated by javac because ArrayList$Itr constructor is private (despite the class itself being private). Here's the bytecode (8u51) of ArrayList::iterator: public java.util.Iterator iterator(); descriptor: ()Ljava/util/Iterator; flags: ACC_PUBLIC Code: stack=4, locals=1, args_size=1 0: new #61 // class java/util/ArrayList$Itr 3: dup 4: aload_0 5: aconst_null 6: invokespecial #62 // Method java/util/ArrayList$Itr."<init>":(Ljava/util/ArrayList;Ljava/util/ArrayList$1;)V 9: areturn LineNumberTable: line 834: 0 Signature: #185 // ()Ljava/util/Iterator; The only way I can get the Itr allocation removed in my method is by causing some other method that does the same thing to be JIT compiled prior to mine. Does anyone have a good idea of what's actually going on here? Why is that synthetic ArrayList$1 such a pest here? It's a bit sad that such a little thing can prevent EA from working in a perfectly good candidate method for it. Thoughts? Thanks -------------- next part -------------- An HTML attachment was scrubbed... URL: From rednaxelafx at gmail.com Mon Sep 12 19:19:24 2016 From: rednaxelafx at gmail.com (Krystal Mok) Date: Mon, 12 Sep 2016 12:19:24 -0700 Subject: Odd interaction between ArrayList$Itr and Escape Analysis In-Reply-To: References: Message-ID: Hi Vitaly, Haha. I've actually fixed the exact same problem in Zing JVM when I found this out a while ago. Do you guys want the patch be upstreamed? 
Here's the bug description that I wrote for Zing, but it applies to HotSpot as well (since we inherited that bug from HotSpot): This bug is to track an enhancement that would allow compilation and inlining of "bridge constructors" for private inner classes, generated by javac. In HotSpot's compilation policy, and C2's inlining heuristic, if a method/constructor is found to have unloaded classes in its signature, then it gets special handling: * in compilation policy, if a method is about to trigger a C2 compilation, and there are unloaded classes in its signature, then these classes are forced to be loaded before compilation; * in C2, when a method is considered to be a candidate for inlining, if there are unloaded classes in its signature, it will NOT be inlined. It's questionable whether or not the C2 inlining heuristic is profitable in general, but there's a case where it's definitely not profitable - when dealing with "bridge constructors" generated by javac. When javac sees a private inner class with no explicit constructors, e.g. > package java.util; > > public class ArrayList implements Iterable { > public Iterator iterator() { > return new Itr(); > } > > private class Itr implements Iterator { } > } javac will synthesize two constructors for the inner class (e.g. Itr above): 1. The normal default constructor, with accessibility the same as its holder class - private private java.util.ArrayList$Itr(java.util.ArrayList); 2. A "bridge constructor". 
The enclosing class needs to access Itr's constructor but has no access to the private one, so javac further synthesizes this "bridge constructor" with package accessibility, which simply delegates to the private default one: java.util.ArrayList$Itr(java.util.ArrayList, java.util.ArrayList$1); The sole purpose of the "bridge constructor" is to provide accessibility, but if it differed from the private one only in its accessibility, the two constructors wouldn't be distinguishable under JVM's overload resolution rules. So, javac pulls a trick, and appends a marker argument called "access constructor tag" to the argument list of the bridge constructor, e.g. java.util.ArrayList$1 in this example, and always passes a null to this argument. In effect, the class of this marker argument never needs to be loaded, because it's never instantiated. But C2 isn't happy about unloaded classes in signature, so it'd refuse to inline any bridge constructors. 0.320: 17 2 TestC2ArrayListIteratorLoop::sumList 0.321: @ 3 java.util.ArrayList::iterator (10 bytes) inlined (hot) 0.321: - @ 6 java.util.ArrayList$Itr::<init> (6 bytes) unloaded signature classes 0.321: @ 8 java.util.ArrayList$Itr::hasNext (20 bytes) inlined (hot) 0.321: @ 8 java.util.ArrayList::access$100 (5 bytes) inlined (hot) 0.322: @ 25 java.lang.Integer::intValue (5 bytes) inlined (hot) With this enhancement, C2 will be able to ignore the unloaded class in the bridge constructor, and inline it: 0.269: 18 2 TestC2ArrayListIteratorLoop::sumList 0.269: @ 3 java.util.ArrayList::iterator (10 bytes) inlined (hot) 0.270: @ 6 java.util.ArrayList$Itr::<init> (6 bytes) inlined (hot) 0.270: @ 2 java.util.ArrayList$Itr::<init> (26 bytes) inlined (hot) 0.270: - @ 6 java.lang.Object::<init> (1 bytes) don't intrinsify this 0.270: @ 6 java.lang.Object::<init> (1 bytes) inlined (hot) 0.270: @ 8 java.util.ArrayList$Itr::hasNext (20 bytes) inlined (hot) 0.270: @ 8 java.util.ArrayList::access$100 (5 bytes) inlined (hot) 0.271: @ 25 
java.lang.Integer::intValue (5 bytes) inlined (hot) - Kris On Mon, Sep 12, 2016 at 12:13 PM, Vitaly Davidovich wrote: > Hi all, > > Vladimir I. and I have been looking at a peculiarity in EA as it relates > to eliminating the ArrayList$Itr. What Vladimir found (and I see it as > well) is that ArrayList$Itr::init isn't always inlined due to "unloaded > signature classes", e.g.: > > @ 6 java.util.ArrayList::iterator (10 bytes) inline (hot) > @ 6 java.util.ArrayList$Itr:: (6 > bytes) unloaded signature classes > > I tried to dig a bit further into this, and it appears that what's > "unloaded" is ArrayList$1. LogCompilation shows this (which I think is > relevant): > > > > > arguments='820 827' flags='4096' bytes='6' iicount='1853'/> > > > > > > > It looks like ArrayList$1 is a synthetic class generated by javac because > ArrayList$Itr constructor is private (despite the class itself being > private). Here's the bytecode (8u51) of ArrayList::iterator: > > public java.util.Iterator iterator(); > descriptor: ()Ljava/util/Iterator; > flags: ACC_PUBLIC > Code: > stack=4, locals=1, args_size=1 > 0: new #61 // class > java/util/ArrayList$Itr > 3: dup > 4: aload_0 > 5: aconst_null > 6: invokespecial #62 // Method > java/util/ArrayList$Itr."":(Ljava/util/ArrayList; > Ljava/util/ArrayList$1;)V > 9: areturn > LineNumberTable: > line 834: 0 > Signature: #185 // ()Ljava/util/Iterator; > > The only way I can get the Itr allocation removed in my method is by > causing some other method that does the same thing to be JIT compiled prior > to mine. > > Does anyone have a good idea of what's actually going on here? Why is that > synthetic ArrayList$1 such a pest here? It's a bit sad that such a little > thing can prevent EA from working in a perfectly good candidate method for > it. > > Thoughts? > > Thanks > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From vitalyd at gmail.com Mon Sep 12 19:38:14 2016 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Mon, 12 Sep 2016 15:38:14 -0400 Subject: Odd interaction between ArrayList$Itr and Escape Analysis In-Reply-To: References: Message-ID: Hi Kris, On Mon, Sep 12, 2016 at 3:19 PM, Krystal Mok wrote: > Hi Vitaly, > > Haha. I've actually fixed the exact same problem in Zing JVM when I found > this out a while ago. Do you guys want the patch be upstreamed? > Incredible - nothing like debugging/troubleshooting an already solved problem! :( What other goodies could you upstream? :) Vladimir I. (or K. :)), could you guys accept that patch? > > Here's the bug description that I wrote for Zing, but it applies to > HotSpot as well (since we inherited that bug from HotSpot): > > This bug is to track an enhancement that would allow compilation and > inlining of "bridge constructors" for private inner classes, generated by > javac. > > In HotSpot's compilation policy, and C2's inlining heuristic, if a > method/constructor is found to have unloaded classes in its signature, then > there are special handling: > * in compilation policy, if a method is about to be triggered a C2 > compilation, and there are unloaded classes in its signature, then these > classes are forced to be loaded before compilation; > This explains why I actually had to trigger a dummy method to JIT compile so that bridge class would be loaded. I was slightly puzzled why simply exercising that iteration code in the interpreter wasn't "loading" the unloaded class(es). > * in C2, when a method is considered to be a candidate for inlining, if > there are unloaded classes in its signature, it will NOT be inlined. > > It's questionable whether or not the C2 inlining heuristic is profitable > in general, but there's a case where it's definitely not profitable - when > dealing with "bridge constructors" generated by javac. > It seems odd to me as well why inlining won't force load the missing class(es). 
If we're inlining, it means the method itself or the call chain it's part of is hot - failing to inline can have negative side-effects, like this example. I suppose there must be a good reason why it doesn't do this though? > > When javac sees a private inner class with no explicit constructors, e.g. > > > package java.util; > > > > public class ArrayList implements Iterable { > > public Iterator iterator() { > > return new Itr(); > > } > > > > private class Itr implements Iterator { } > > } > > javac will synthesize two constructors for the inner class (e.g. Itr > above): > 1. The normal default constructor, with accessibility the same as its > holder class - private > private java.util.ArrayList$Itr(java.util.ArrayList); > 2. A "bridge constructor". Because the enclosing class needs to access > Itr's constructor, but doesn't have accessibility to the private one, so > javac futher synthesizes this "bridge constructor" with package > accessibility, which simply delegates to the private default one: > java.util.ArrayList$Itr(java.util.ArrayList, java.util.ArrayList$1); > > The sole purpose of the "bridge constructor" is to provide accessibility, > but if it were only different from the private one in its accessibility, > the two constructors won't be distinguishable under JVM's overload > resolution rules. So, javac pulls a trick, and appends a marker argument > called "access constructor tag" to the argument list of the bridge > constructor, e.g. java.util.ArrayList$1 in this example, and always passes > a null to this argument. > Aha, so that's why there's that aconst_null right before the invokespecial! I was wondering what the heck that was. > > In effect, the class of this marker argument never needs to be loaded, > because it's never instantiated. But C2 isn't happy about unloaded classes > in signature, so it'd refuse to inline any bridge constructors. 
> > 0.320: 17 2 TestC2ArrayListIteratorLoop::sumList > 0.321: @ 3 java.util.ArrayList::iterator (10 bytes) > inlined (hot) > 0.321: - @ 6 java.util.ArrayList$Itr:: (6 bytes) > unloaded signature classes > 0.321: @ 8 java.util.ArrayList$Itr::hasNext (20 bytes) > inlined (hot) > 0.321: @ 8 java.util.ArrayList::access$100 (5 bytes) > inlined (hot) > 0.322: @ 25 java.lang.Integer::intValue (5 bytes) inlined > (hot) > > With this enhancement, C2 will be able to ignore the unloaded class in the > bridge constructor, and inline it: > > 0.269: 18 2 TestC2ArrayListIteratorLoop::sumList > 0.269: @ 3 java.util.ArrayList::iterator (10 bytes) > inlined (hot) > 0.270: @ 6 java.util.ArrayList$Itr:: (6 bytes) > inlined (hot) > 0.270: @ 2 java.util.ArrayList$Itr:: (26 bytes) > inlined (hot) > 0.270: - @ 6 java.lang.Object:: (1 bytes) don't > intrinsify this > 0.270: @ 6 java.lang.Object:: (1 bytes) > inlined (hot) > 0.270: @ 8 java.util.ArrayList$Itr::hasNext (20 bytes) > inlined (hot) > 0.270: @ 8 java.util.ArrayList::access$100 (5 bytes) > inlined (hot) > 0.271: @ 25 java.lang.Integer::intValue (5 bytes) inlined > (hot) > > - Kris > Thanks for the great explanation Kris. > > On Mon, Sep 12, 2016 at 12:13 PM, Vitaly Davidovich > wrote: > >> Hi all, >> >> Vladimir I. and I have been looking at a peculiarity in EA as it relates >> to eliminating the ArrayList$Itr. What Vladimir found (and I see it as >> well) is that ArrayList$Itr::init isn't always inlined due to "unloaded >> signature classes", e.g.: >> >> @ 6 java.util.ArrayList::iterator (10 bytes) inline (hot) >> @ 6 java.util.ArrayList$Itr:: (6 >> bytes) unloaded signature classes >> >> I tried to dig a bit further into this, and it appears that what's >> "unloaded" is ArrayList$1. 
LogCompilation shows this (which I think is >> relevant): >> >> >> >> >> > arguments='820 827' flags='4096' bytes='6' iicount='1853'/> >> >> >> >> >> >> >> It looks like ArrayList$1 is a synthetic class generated by javac because >> ArrayList$Itr constructor is private (despite the class itself being >> private). Here's the bytecode (8u51) of ArrayList::iterator: >> >> public java.util.Iterator iterator(); >> descriptor: ()Ljava/util/Iterator; >> flags: ACC_PUBLIC >> Code: >> stack=4, locals=1, args_size=1 >> 0: new #61 // class >> java/util/ArrayList$Itr >> 3: dup >> 4: aload_0 >> 5: aconst_null >> 6: invokespecial #62 // Method >> java/util/ArrayList$Itr."":(Ljava/util/ArrayList;Ljava >> /util/ArrayList$1;)V >> 9: areturn >> LineNumberTable: >> line 834: 0 >> Signature: #185 // ()Ljava/util/Iterator; >> >> The only way I can get the Itr allocation removed in my method is by >> causing some other method that does the same thing to be JIT compiled prior >> to mine. >> >> Does anyone have a good idea of what's actually going on here? Why is >> that synthetic ArrayList$1 such a pest here? It's a bit sad that such a >> little thing can prevent EA from working in a perfectly good candidate >> method for it. >> >> Thoughts? >> >> Thanks >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matcdac at gmail.com Mon Sep 12 19:53:09 2016 From: matcdac at gmail.com (Prakhar Makhija) Date: Tue, 13 Sep 2016 01:23:09 +0530 Subject: Incomplete Iterator Message-ID: Hi, I feel there should be one more Iterator added to the Collections, let's say UpdatedIterator, which should be implemented for List, Set, Map, etc. The reason being the existing one does not support the manipulation of Collection and throws Exception. It would be great to have a new one besides this, so programmers will have a choice to pick any of the two. 
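For readers following the proposal, here is a minimal sketch of the behavior being described, assuming a plain single-threaded case. The class and method names (IteratorModificationDemo, removeEvensViaList, removeEvensViaIterator) are made up for illustration and are not from the original message. It shows the fail-fast iterator of ArrayList reporting a structural change with ConcurrentModificationException, and, for comparison, two things the standard library already offers: Iterator.remove() and the snapshot iterator of CopyOnWriteArrayList.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class IteratorModificationDemo {

    /**
     * Removes even values through the list itself while a for-each loop is
     * iterating over it. Returns true if the iterator reported the change
     * by throwing ConcurrentModificationException.
     */
    public static boolean removeEvensViaList(List<Integer> list) {
        try {
            for (Integer value : list) {
                if (value % 2 == 0) {
                    // remove(Object): a structural change behind the iterator's back
                    list.remove(value);
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    /** Removes even values through the iterator itself, which ArrayList's iterator permits. */
    public static void removeEvensViaIterator(List<Integer> list) {
        for (Iterator<Integer> it = list.iterator(); it.hasNext();) {
            if (it.next() % 2 == 0) {
                it.remove();
            }
        }
    }

    public static void main(String[] args) {
        List<Integer> plain = new ArrayList<>(Arrays.asList(1, 2, 3, 4));
        System.out.println("ArrayList threw: " + removeEvensViaList(plain));

        List<Integer> viaIterator = new ArrayList<>(Arrays.asList(1, 2, 3, 4));
        removeEvensViaIterator(viaIterator);
        System.out.println("After Iterator.remove(): " + viaIterator);

        // CopyOnWriteArrayList iterates over a snapshot, so the loop never sees the removals.
        List<Integer> snapshot = new CopyOnWriteArrayList<>(Arrays.asList(1, 2, 3, 4));
        System.out.println("CopyOnWriteArrayList threw: " + removeEvensViaList(snapshot));
        System.out.println("After snapshot iteration: " + snapshot);
    }
}
```

The sketch says nothing about which semantics a hypothetical UpdatedIterator should have; it only makes concrete which existing behavior the proposal is reacting to.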
Let's say more than one thread are accessing their own static collection field, or some encapsulated data of another object. Now some of these threads will want to just iterate it, others may remove something from it, while others may add in it, also some can manipulate the existing data. So the basic need is UpdatedIterator must keep track of the latest updated modified version of the Collection, using the same. -------------- next part -------------- An HTML attachment was scrubbed... URL: From rednaxelafx at gmail.com Mon Sep 12 19:56:52 2016 From: rednaxelafx at gmail.com (Krystal Mok) Date: Mon, 12 Sep 2016 12:56:52 -0700 Subject: Odd interaction between ArrayList$Itr and Escape Analysis In-Reply-To: References: Message-ID: On Mon, Sep 12, 2016 at 12:38 PM, Vitaly Davidovich wrote: > > It seems odd to me as well why inlining won't force load the missing > class(es). If we're inlining, it means the method itself or the call chain > it's part of is hot - failing to inline can have negative side-effects, > like this example. 
I suppose there must be a good reason why it doesn't do >> this though? >> > > That's because we can't. The JIT compilers are running on their own > threads, and they're not real "Java threads". So they are not allowed to > run arbitrary Java code. But Java class loading may involve running > arbitrary Java code, e.g. the ClassLoader.loadClass() upcall. > Force class loading can be done on the triggering side (for the top-level > method), because compilation tasks are triggered from real Java threads, > and they're allowed to run arbitrary Java code. > I see, makes sense. Perhaps there can be an option to turn on loading of required types in the entire compilation unit, after all inlining is done (and therefore make the unloaded types not be barriers for inlining). I'd personally prefer that over having odd performance differences. > > - Kris > Thanks Kris. -------------- next part -------------- An HTML attachment was scrubbed... URL: From cheremin at gmail.com Tue Sep 13 09:07:48 2016 From: cheremin at gmail.com (Cheremin Ruslan) Date: Tue, 13 Sep 2016 12:07:48 +0300 Subject: MaxBCEAEstimateSize and inlining clarification Message-ID: <00C16B65-A85F-491E-9384-1172735D9952@gmail.com> > I'm seeing some code that iterates over a ConcurrentHashMap's entrySet that allocates tens of GB of CHM$MapEntry objects even though they don't escape I'm a bit confused: I was sure BCEA-style params do affect EA, but don't affect scalar replacement. With bcEscapeAnalyser you can get (sort of) inter-procedural EA, but this only allows you to have more allocations identified as ArgEscape instead of GlobalEscape. But you can't get more NoEscape without real inlining. ArgEscape (afaik) is used only for synchronization removals in HotSpot, not for scalar replacements. Am I incorrect? 
----
Ruslan

From martin.doerr at sap.com Tue Sep 13 09:35:09 2016
From: martin.doerr at sap.com (Doerr, Martin)
Date: Tue, 13 Sep 2016 09:35:09 +0000
Subject: RFR(m) 8164920: ppc: enhancement of CRC32 intrinsic
In-Reply-To:
References:
Message-ID:

Hi Hiroshi,

we appreciate your change. Thanks for contributing it.
It basically looks good, but I'd like to propose some minor improvements.

kernel_crc32_1word_vpmsumd:
1. The pre-align code can be implemented shorter:
     clrldi_(prealign, buf, 57);
     beq(CCR0, L_alignHead);
     subfic(prealign, prealign, 128);
2. I'd prefer the label name "L_alignedHead".
3. The branch b(L_alignTail) and the label are not needed and should be removed.

kernel_crc32_1word_aligned:
1. When saving and restoring non-volatile vector registers, please use offset differences of -16 instead of -32. (The ABI allows up to 288 bytes to be used in frameless functions so it will fit if -16 is used.)
2. The std instructions should better be used with int offsets so you can get rid of the addi(offset, offset, -8) instructions.

Comments:
For single-line comments, "//" should be used instead of "/*". It would be nice if you could change them.

Thanks and best regards,
Martin

From: Hiroshi H Horii [mailto:HORII at jp.ibm.com]
Sent: Tuesday, September 6, 2016 16:50
To: hotspot-compiler-dev at openjdk.java.net; vladimir.kozlov at oracle.com
Cc: Volker Simonis (volker.simonis at gmail.com) ; Doerr, Martin ; Gustavo Bueno Romero
Subject: RFR(m) 8164920: ppc: enhancement of CRC32 intrinsic

Dear Vladimir and all:

Can I please request reviews for the following change?
JIRA: https://bugs.openjdk.java.net/browse/JDK-8164920
webrev: http://cr.openjdk.java.net/~gromero/8164920/01/

As Volker commented in the above JIRA, this is a ppc64-only improvement which will not affect any of the Oracle platforms in any way.

This change includes a new implementation of the CRC32 intrinsics for ppc64le. In my local experiment, CRC32 of 64KB was calculated more than 20 times faster than with the original.
Performance of the CRC32 intrinsic is important for running recent Apache Cassandra. A Cassandra daemon needs to read 64KB of data from a disk with a CRC32 checksum by default.

This JIRA entry has the "jdk9-fc-request" label. If there is a chance to include a new change in JDK 9 for ppc64le, I would like to request a review for this change.

Regards,
Hiroshi
-----------------------
Hiroshi Horii, Ph.D.
IBM Research - Tokyo

From forax at univ-mlv.fr Tue Sep 13 14:18:29 2016
From: forax at univ-mlv.fr (Remi Forax)
Date: Tue, 13 Sep 2016 16:18:29 +0200 (CEST)
Subject: Odd interaction between ArrayList$Itr and Escape Analysis
In-Reply-To:
References:
Message-ID: <1619527975.952230.1473776309365.JavaMail.zimbra@u-pem.fr>

I've always found the empty inner classes generated by javac to be a kind of hack.
These classes should be removed in Java 10, thanks to the nestmate attributes.
http://mail.openjdk.java.net/pipermail/valhalla-spec-experts/2016-January/000060.html

The other solution is to have an empty class in the JDK which is not visible from javac (the class itself can be marked as synthetic), so javac can use it without creating a method clash.
And to solve the problem now, the easy solution is to add a package-private constructor in ArrayList.Itr:

    private class Itr implements Iterator<E> {
        int cursor;       // index of next element to return
        int lastRet = -1; // index of last element returned; -1 if no such
        int expectedModCount = modCount;

        Itr() {
            // avoid generating a synthetic accessor constructor
        }
    }

regards,
Rémi

> From: "Vitaly Davidovich"
> To: "Krystal Mok"
> Cc: "hotspot compiler"
> Sent: Monday, September 12, 2016 22:15:41
> Subject: Re: Odd interaction between ArrayList$Itr and Escape Analysis

> On Mon, Sep 12, 2016 at 3:56 PM, Krystal Mok < rednaxelafx at gmail.com > wrote:
>> On Mon, Sep 12, 2016 at 12:38 PM, Vitaly Davidovich < vitalyd at gmail.com > wrote:
>>> It seems odd to me as well why inlining won't force load the missing class(es).
>>> If we're inlining, it means the method itself or the call chain it's part of is
>>> hot - failing to inline can have negative side-effects, like this example. I
>>> suppose there must be a good reason why it doesn't do this though?
>> That's because we can't. The JIT compilers are running on their own threads, and
>> they're not real "Java threads". So they are not allowed to run arbitrary Java
>> code. But Java class loading may involve running arbitrary Java code, e.g. the
>> ClassLoader.loadClass() upcall.
>> Force class loading can be done on the triggering side (for the top-level
>> method), because compilation tasks are triggered from real Java threads, and
>> they're allowed to run arbitrary Java code.
> I see, makes sense. Perhaps there can be an option to turn on loading of
> required types in the entire compilation unit, after all inlining is done (and
> therefore make the unloaded types not be barriers for inlining). I'd personally
> prefer that over having odd performance differences.
>> - Kris
> Thanks Kris.
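To make the constructor suggestion in this message concrete, here is a minimal sketch. It is not JDK code, and every class and method name in it is made up for illustration. It contrasts a private inner class that relies on the implicit default constructor with one that declares a package-private constructor, and uses reflection to count synthetic constructors. With a javac that predates nest-based access control (JEP 181, JDK 11), the implicit variant is expected to get a synthetic accessor constructor whose signature mentions an otherwise unused marker class; with newer compilers both counts should be zero.

```java
import java.lang.reflect.Constructor;

// Hypothetical demo class; the names are illustrative and not taken from the JDK.
class SyntheticCtorDemo {
    // Implicit default constructor: it is private because the class is private.
    private class Implicit { }

    // Explicit package-private constructor, as proposed for ArrayList.Itr.
    private class Explicit {
        Explicit() {
            // nothing to do; declaring it avoids the need for an accessor
        }
    }

    // Both inner classes are instantiated from the outer class, which is what
    // forces older compilers to emit an accessor for the private constructor.
    Object newImplicit() { return new Implicit(); }
    Object newExplicit() { return new Explicit(); }

    // Counts constructors that carry the ACC_SYNTHETIC flag.
    static int syntheticConstructors(Class<?> c) {
        int n = 0;
        for (Constructor<?> ctor : c.getDeclaredConstructors()) {
            if (ctor.isSynthetic()) {
                n++;
            }
        }
        return n;
    }

    static int implicitSyntheticCount() { return syntheticConstructors(Implicit.class); }
    static int explicitSyntheticCount() { return syntheticConstructors(Explicit.class); }

    public static void main(String[] args) {
        SyntheticCtorDemo demo = new SyntheticCtorDemo();
        demo.newImplicit();
        demo.newExplicit();
        // The first number depends on the javac version used to compile this file.
        System.out.println("implicit: " + implicitSyntheticCount());
        System.out.println("explicit: " + explicitSyntheticCount());
    }
}
```

The explicit variant needs no accessor with any compiler, so there is no marker class in the constructor signature for the JIT to find unloaded.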
From zoltan.majo at oracle.com Tue Sep 13 15:04:47 2016
From: zoltan.majo at oracle.com (Zoltán Majó)
Date: Tue, 13 Sep 2016 17:04:47 +0200
Subject: RFR(S): 8159611: C2: ArrayCopy elimination skips required parameter checks
In-Reply-To:
References: <57B2A380.6000408@oracle.com> <41851a79-5ffe-2b9d-504a-6a2301de5384@oracle.com> <7ce01d28-13f5-098a-9898-080f8258881d@oracle.com>
Message-ID: <8e399624-8e67-ebe6-d348-7691690532e8@oracle.com>

Hi Volker,

On 09/12/2016 06:35 PM, Volker Simonis wrote:
> Sorry for the long delay...

thank you for spending more time on this bug and also for the detailed description of the way your solution works!

> Here's my new version:
>
> http://cr.openjdk.java.net/~simonis/webrevs/2016/8159611.v3/

That looks good to me. I did a preliminary performance evaluation with Octane-Gbemu and Octane-PdfJS, results look good on all platforms. Let me now do a more detailed evaluation. I'll get back to you once the results are available.

Thank you!

Best regards,
Zoltan

> I've actually changed PhaseMacroExpand::expand_arraycopy_node() such
> that it calls generate_arraycopy() with 'length_never_negative' set to
> true if EliminateAllocations is true (in this case we already checked
> in LibraryCallKit::inline_arraycopy() that 'length' is not negative).
> This way I could leave generate_arraycopy() untouched.
>
> The generated code now looks as follows:
>
> Original version (without 'length < 0' check):
>
> 0a7 B5: # B17 B6 <- B4 Freq: 0,999998
> 0a7   cmpl R9, R11 # unsigned
> 0aa   jb,u B17 P=0,000001 C=-1,000000
> ...
> 0da B7: # B18 B8 <- B6 B12 B13 Freq: 0,999997
> 0da   movl R11, [rsp + #8] # spill
> 0df   testl R11, R11
> 0e2   jle B18 P=0,000001 C=-1,000000
> ...
> 0e8 B8: # B9 <- B7 Freq: 0,999996
> 0f9   call_leaf_nofp,runtime oop_disjoint_arraycopy
> ...
> 106 B9: # B10 <- B8 B18 B20 Freq: 0,999997
> 113   ret
> ...
> 184 B17: # N1 <- B4 B5 Freq: 2,01328e-06
> 193   call,static wrapper for: uncommon_trap(reason='intrinsic_or_type_checked_inlining' action='make_not_entrant' debug_id='0')
>
> 19d B18: # B9 B19 <- B7 Freq: 9,99997e-07
> 19d   testl R11, R11
> 1a0   jge B9 P=0,999999 C=-1,000000
> 1a0
> 1a6 B19: # B22 B20 <- B18 Freq: 9,99997e-13
> 1a6   movq RSI, R8 # spill
> 1a9   movl RDX, #1 # int
> 1ae   movq RCX, R10 # spill
> 1b1   movl R8, #1 # int
> 1b7   movl R9, R11 # spill
>       nop # 1 bytes pad for loops and calls
> 1bb   call,static wrapper for: slow_arraycopy
>
> In B5 there's a check if 'offset+length' is still in the array range.
> If not we jump to the uncommon trap in B17.
> In B7 there's the first check from
> PhaseMacroExpand::generate_arraycopy() (i.e.
> generate_nonpositive_guard()). If 'length' is less than or equal to
> zero we jump to B18 where there's the second check from
> PhaseMacroExpand::generate_arraycopy() (i.e.
> generate_negative_guard()). If 'length' is zero, we jump to B9 and
> return. Otherwise we fall into B19 from where we call slow_arraycopy.
> slow_arraycopy (which is generated in ObjArrayKlass::copy_array()) will
> throw an AIOOB exception if 'length' is negative.
>
> The new version now looks as follows:
>
> 0a2 B5: # B19 B6 <- B4 Freq: 0,999998
> 0a2   cmpl R10, RCX # unsigned
> 0a5   jb,u B19 P=0,000001 C=-1,000000
> 0a5
> 0ab B6: # B20 B7 <- B5 Freq: 0,999997
> 0ab   movl R10, [rsp + #0] # spill
> 0af   testl R10, R10
> 0b2   jl B20 P=0,000001 C=-1,000000
> 0b2
> ...
> 0e2 B8: # B10 B9 <- B7 B13 B14 Freq: 0,999996
> 0e2   testl R10, R10
> 0e5   je,s B10 P=0,000001 C=-1,000000
> ...
> 0e7 B9: # B10 <- B8 Freq: 0,999995
> 0f8   call_leaf_nofp,runtime oop_disjoint_arraycopy
> ...
> 105 B10: # B11 <- B9 B8 Freq: 0,999996
> 112   ret
> ...
> 18e B19: # B20 <- B5 Freq: 9,99998e-07
> 192 B20: # N1 <- B18 B19 B6 Freq: 3,01327e-06
> 1a3   call,static wrapper for: uncommon_trap(reason='intrinsic_or_type_checked_inlining' action='make_not_entrant' debug_id='0')
>
> B5 is like before, but is now followed by the extra check for 'length'
> being not negative in B6. In B8 we now have the first check (i.e.
> generate_nonpositive_guard()) from
> PhaseMacroExpand::generate_arraycopy(). It directly checks if 'length'
> is zero and jumps to B10 (i.e. returns) if so. Otherwise we fall
> directly into oop_disjoint_arraycopy(). There's no need to check for
> 'length' being negative and calling 'slow_arraycopy' because this case
> is already handled before now (in B6).
>
> Is this OK now?
>
> Thank you and best regards,
> Volker
>
> On Fri, Aug 26, 2016 at 3:51 AM, Vladimir Kozlov wrote:
>> Looks good.
>>
>> Check does not fold because it is different: LT vs LE.
>>
>> Actually there are 3 checks together with yours (see
>> PhaseMacroExpand::generate_arraycopy()):
>>
>>   Node* not_pos = generate_nonpositive_guard(ctrl, copy_length, length_never_negative);
>>   if (not_pos != NULL) {
>>     Node* local_ctrl = not_pos, *local_io = *io;
>>     MergeMemNode* local_mem = MergeMemNode::make(mem);
>>     transform_later(local_mem);
>>
>>     // (6) length must not be negative.
>>     if (!length_never_negative) {
>>       generate_negative_guard(&local_ctrl, copy_length, slow_region);
>>     }
>>
>> I think the only way to avoid this is to modify code in generate_arraycopy()
>> when EliminateAllocations is true. In such case you need to generate only
>> length == 0 check.
>>
>> Thanks,
>> Vladimir
>>
>> On 8/25/16 10:03 AM, Volker Simonis wrote:
>>> On Tue, Aug 16, 2016 at 11:49 PM, Vladimir Kozlov wrote:
>>>> Not generating exception is definitely bug.
>>>>
>>>> First, about test case. It would be nice if it also verifies other
>>>> IndexOutOfBoundsException cases.
>>>
>>> I've extended the test case.
>>> See:
>>>
>>> http://cr.openjdk.java.net/~simonis/webrevs/2016/8159611.v2/
>>>
>>> With the new test I've caught another problem in C1 (only on x86 and
>>> s390, but that's not in the OpenJDK yet :).
>>>
>>> LIR_Assembler::emit_arraycopy() had a shortcut for length==0 which
>>> prevented the throwing of an ArrayStoreException if src and dst arrays
>>> have incompatible types (see do_test2() in the new regression test).
>>> Note that this is a different error from 8160591 and not fixed by the
>>> change for 8160591.
>>>
>>> I've also moved the new check after the offset + length check as
>>> suggested by you (see new webrev).
>>>
>>> Unfortunately, the new check is still not eliminated. Here's how it looks:
>>>
>>> 0ae B6: # B20 B7 <- B5 Freq: 0,999997
>>> 0ae   movl R9, [rsp + #0] # spill
>>> 0b2   testl R9, R9
>>> 0b5   jl B20 P=0,000001 C=-1,000000
>>> 0b5
>>> 0bb B7: # B12 B8 <- B6 Freq: 0,999996
>>> 0bb   movl R11, [R10 + #8 (8-bit)] # compressed klass ptr
>>> 0bf   decode_klass_not_null RAX,R11
>>> 0cc   movl RBX, [RAX + #16 (8-bit)] # int
>>> 0cf   movslq RCX, RBX # i2l
>>> 0d2   movq RSI, precise klass [Ljava/lang/Object;: 0x00007ff1080320d0:Constant:exact * # ptr
>>> 0dc   movq RCX, [RSI + RCX] # class
>>> 0e0   cmpq RAX, RCX # ptr
>>> 0e3   jne,us B12 P=0,170000 C=-1,000000
>>> 0e3
>>> 0e5 B8: # B21 B9 <- B7 B13 B14 Freq: 0,999996
>>> 0e5   testl R9, R9
>>> 0e8   jle B21 P=0,000001 C=-1,000000
>>>
>>> As you can see 'testl R9, R9' is executed two times.
>>>
>>> I've even tried to move the new check after the subtype check, but
>>> that doesn't help either:
>>>
>>> 0da B7: # B20 B8 <- B6 B13 B14 Freq: 0,999997
>>> 0da   movl R11, [rsp + #8] # spill
>>> 0df   testl R11, R11
>>> 0e2   jl B20 P=0,000001 C=-1,000000
>>> 0e2
>>> 0e8 B8: # B10 B9 <- B7 Freq: 0,999996
>>> 0e8   testl R11, R11
>>> 0eb   jle,s B10 P=0,000001 C=-1,000000
>>>
>>> Any idea how this could be fixed?
>>>
>>> Thanks,
>>> Volker
>>>
>>> PS: and I still don't have a reproducible benchmark which shows a
>>> regression with my change...
>>>
>>>> Actually, an additional dynamic check will help in case a negative length is
>>>> known during compilation. The allocation code will be eliminated very
>>>> early instead of waiting for macro expansion:
>>>>
>>>>   int length = alloc->in(AllocateNode::ALength)->find_int_con(-1);
>>>>   if (length < 0) {
>>>>     NOT_PRODUCT(fail_eliminate = "Array's size is not constant";)
>>>>     can_eliminate = false;
>>>>   }
>>>>
>>>> About the additional length check in your new test. I think it may be
>>>> collapsed with the preceding check since it is generated after other checks.
>>>> So I would suggest to move it after the offset + length check.
>>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>> On 8/16/16 7:57 AM, Volker Simonis wrote:
>>>>> On Tue, Aug 16, 2016 at 7:24 AM, Tobias Hartmann wrote:
>>>>>> Hi Volker,
>>>>>>
>>>>>> thanks for taking care of this issue!
>>>>>>
>>>>>> Did you check what happens if the allocation is not eliminated and
>>>>>> macro expansion phase emits another negative guard? Are the checks merged?
>>>>>
>>>>> It depends. I just saw that in some cases the regression test worked
>>>>> before, because the length check was done in
>>>>> SharedRuntime::slow_arraycopy_C(). So in that case there's obviously
>>>>> nothing that can be merged. But the test case is obviously a
>>>>> degenerated example anyway, so I don't think that's a problem.
>>>>>
>>>>> If I do a more real-world example like this where the arraycopy cannot
>>>>> be eliminated because one of its arguments escapes:
>>>>>
>>>>>   public static boolean do_test2(int length, Object[] dest) {
>>>>>     try {
>>>>>       System.arraycopy(new Object[10], 1, dest, 1, length);
>>>>>       return false;
>>>>>     } catch (IndexOutOfBoundsException e) {
>>>>>       return true;
>>>>>     }
>>>>>   }
>>>>>
>>>>> and call it with:
>>>>>
>>>>>   do_test2(8, new Object[10])
>>>>>
>>>>> the generated code for do_test2() unfortunately contains one more
>>>>> check now with my change (the 'length' field is in [rsp + #0]):
>>>>>
>>>>> 0a2 B4: # B18 B5 <- B3 Freq: 0,999999
>>>>> 0a2   movl R9, [rsp + #0] # spill
>>>>> 0a6   testl R9, R9
>>>>> 0a9   jl B18 P=0,000001 C=-1,000000
>>>>> 0a9
>>>>> 0af B5: # B18 B6 <- B4 Freq: 0,999998
>>>>> 0af   movl RBX, R9 # spill
>>>>> 0b2   incl RBX # int
>>>>> 0b4   cmpl RBX, #10 # unsigned
>>>>> 0b7   jnbe,u B18 P=0,000001 C=-1,000000
>>>>>
>>>>> The generated code before my change looked like this (again the
>>>>> 'length' field is in [rsp + #0]):
>>>>>
>>>>> 0a1 B4: # B17 B5 <- B3 Freq: 0,999999
>>>>> 0a1   movl R11, [rsp + #8] # spill
>>>>> 0a6   incl R11 # int
>>>>> 0a9   cmpl R11, #10 # unsigned
>>>>> 0ad   jnbe,u B17 P=0,000001 C=-1,000000
>>>>>
>>>>> It seems that the 'length' check has been completely eliminated before.
>>>>>
>>>>> So I need to do some more tests to understand why the new check isn't
>>>>> eliminated.
>>>>>
>>>>> Do you think the new check results in a performance regression? Have
>>>>> you run some benchmarks?
>>>>>
>>>>>> I would prefer brackets around the if body but you don't need to send
>>>>>> another webrev:
>>>>>>   if (EliminateAllocations) {
>>>>>>     generate_negative_guard(length, slow_region);
>>>>>>   }
>>>>>
>>>>> Yes, I agree.
>>>>>
>>>>>> Best regards,
>>>>>> Tobias
>>>>>>
>>>>>> On 12.08.2016 21:13, Volker Simonis wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> can I please have a review and sponsor for the following fix:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~simonis/webrevs/2016/8159611
>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8159611
>>>>>>>
>>>>>>> We are inserting several checks for the arguments of
>>>>>>> System.arraycopy() in LibraryCallKit::inline_arraycopy() before
>>>>>>> intrinsifying the call in LibraryCallKit::inline_arraycopy. However the
>>>>>>> check for the 'length' argument of arraycopy is postponed to the macro
>>>>>>> expansion phase in PhaseMacroExpand::generate_arraycopy().
>>>>>>>
>>>>>>> But if we are running with EscapeAnalysis and EliminateAllocations,
>>>>>>> the array allocations inside a call to System.arraycopy() may get
>>>>>>> eliminated and thus the complete call to System.arraycopy() will be
>>>>>>> removed (see PhaseMacroExpand::process_users_of_allocation). In this
>>>>>>> case the extra 'length' check won't be added by
>>>>>>> PhaseMacroExpand::generate_arraycopy() any more because macro
>>>>>>> expansion happens after the elimination of macro nodes.
>>>>>>>
>>>>>>> In such a case it may happen that System.arraycopy() will silently
>>>>>>> accept an invalid (i.e. negative) 'length' parameter, although it
>>>>>>> should actually throw an ArrayIndexOutOfBoundsException.
>>>>>>>
>>>>>>> The fix is simple: also insert a check for the length field in
>>>>>>> LibraryCallKit::inline_arraycopy() if we are running with
>>>>>>> EliminateAllocations.
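
The failure mode described in this message can be checked from plain Java. The following is a minimal sketch that assumes nothing beyond the System.arraycopy() specification, which requires an IndexOutOfBoundsException for a negative length. The class and method names are hypothetical, and the warm-up count is only a guess at what is enough to trigger C2 compilation; it is not the regression test from the webrev.

```java
// Hypothetical check, not the regression test from the webrev.
class ArrayCopyLengthCheck {
    // Returns true if System.arraycopy() rejected 'length' with an
    // IndexOutOfBoundsException, false if the copy completed silently.
    static boolean rejects(int length) {
        try {
            // Both arrays are fresh allocations that never leave this method,
            // which makes them candidates for allocation elimination in C2.
            System.arraycopy(new Object[10], 1, new Object[10], 1, length);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Warm up with a valid length so that rejects() can become hot.
        // The iteration count is an assumption, not a documented threshold.
        for (int i = 0; i < 20_000; i++) {
            if (rejects(8)) {
                throw new AssertionError("valid length was rejected");
            }
        }
        // A negative length must be rejected, compiled or not.
        if (!rejects(-1)) {
            throw new AssertionError("negative length was silently accepted");
        }
        System.out.println("negative length rejected as specified");
    }
}
```

On a JVM affected by the bug, the final check would be the one expected to fail once rejects() has been compiled with allocation elimination enabled.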
>>>>>>>
>>>>>>> Regards,
>>>>>>> Volker

From vladimir.kozlov at oracle.com Tue Sep 13 16:32:46 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 13 Sep 2016 09:32:46 -0700
Subject: RFR(S): 8159611: C2: ArrayCopy elimination skips required parameter checks
In-Reply-To:
References: <57B2A380.6000408@oracle.com> <41851a79-5ffe-2b9d-504a-6a2301de5384@oracle.com> <7ce01d28-13f5-098a-9898-080f8258881d@oracle.com>
Message-ID: <57D82A2E.7020902@oracle.com>

Yes, I agree with generate_negative_guard() in inline_arraycopy().

But I think we should pass a flag to ArrayCopyNode::make() when the negative guard is generated in inline_arraycopy(). It is generated under several conditions so I don't want it to be missed in expand_arraycopy_node().

Thanks,
Vladimir

On 9/12/16 9:35 AM, Volker Simonis wrote:
> Sorry for the long delay...
>
> Here's my new version:
>
> http://cr.openjdk.java.net/~simonis/webrevs/2016/8159611.v3/
>
> I've actually changed PhaseMacroExpand::expand_arraycopy_node() such
> that it calls generate_arraycopy() with 'length_never_negative' set to
> true if EliminateAllocations is true (in this case we already checked
> in LibraryCallKit::inline_arraycopy() that 'length' is not negative).
> This way I could leave generate_arraycopy() untouched.
>
> [...]

From vitalyd at gmail.com Tue Sep 13 17:51:46 2016
From: vitalyd at gmail.com (Vitaly Davidovich)
Date: Tue, 13 Sep 2016 13:51:46 -0400
Subject: MaxBCEAEstimateSize and inlining clarification
In-Reply-To: <00C16B65-A85F-491E-9384-1172735D9952@gmail.com>
References: <00C16B65-A85F-491E-9384-1172735D9952@gmail.com>
Message-ID:

On Tuesday, September 13, 2016, Cheremin Ruslan wrote:
> > I'm seeing some code that iterates over a ConcurrentHashMap's entrySet
> > that allocates tens of GB of CHM$MapEntry objects even though they don't
> > escape
>
> I'm a bit confused: I was sure BCEA-style params do affect EA, but don't
> affect scalar replacement. With bcEscapeAnalyser you can get (sort of)
> inter-procedural EA, but this only allows you to have more allocations
> identified as ArgEscape instead of GlobalEscape. But you can't get more
> NoEscape without real inlining. ArgEscape (afaik) is used only for
> synchronization removals in HotSpot, not for scalar replacements.
>
> Am I incorrect?

That's my understanding as well (and matches what I'm seeing in some synthetic test harnesses).

I'm generally seeing a lot of variability in scalar replacement in particular, all driven by profile data. HashMap::get(int) sometimes works at eliminating the box and sometimes doesn't - the difference appears to be whether Integer::equals is inlined or not, which in turn depends on whether the lookup finds something or not and whether the number of successful lookups reaches compilation threshold. It's pretty brittle, sadly, and more importantly, unstable.

> ----
> Ruslan

--
Sent from my phone
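
For readers who want to poke at the boxed-key lookup described in this message, here is a minimal sketch of such a harness. All names are hypothetical and the key range and iteration counts are arbitrary. The stride parameter controls how many lookups succeed, which is the profile property the message identifies as influencing whether Integer::equals gets inlined. The sketch only sets up the situation; whether the Integer box is actually scalar-replaced is not something it can assert.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical harness; key range and iteration counts are arbitrary choices.
class BoxedLookupSketch {
    // 'key' is boxed into an Integer for the call to Map.get(Object). The box
    // never leaves this method, so it is a candidate for scalar replacement.
    static String lookup(Map<Integer, String> map, int key) {
        return map.get(key);
    }

    // Fills the map with every 'stride'-th key in [1000, 2024) and counts how
    // many of 'iterations' lookups find a value. Keys start at 1000 so that
    // they fall outside the default Integer cache range.
    static int countHits(int iterations, int stride) {
        Map<Integer, String> map = new HashMap<>();
        for (int i = 0; i < 1024; i += stride) {
            map.put(1000 + i, "v" + i);
        }
        int hits = 0;
        for (int i = 0; i < iterations; i++) {
            if (lookup(map, 1000 + (i & 1023)) != null) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Mostly successful lookups versus lookups that rarely succeed.
        System.out.println(countHits(10_000_000, 1));
        System.out.println(countHits(10_000_000, 512));
    }
}
```

One way to compare runs is to measure allocation for each stride with and without -XX:-EliminateAllocations, keeping in mind that the outcome is expected to vary between JVM builds and runs.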
From cheremin at gmail.com Tue Sep 13 18:25:09 2016
From: cheremin at gmail.com (Ruslan Cheremin)
Date: Tue, 13 Sep 2016 21:25:09 +0300
Subject: MaxBCEAEstimateSize and inlining clarification
In-Reply-To:
References: <00C16B65-A85F-491E-9384-1172735D9952@gmail.com>
Message-ID:

> That's my understanding as well (and matches what I'm seeing in some synthetic test harnesses).

OK, I just wanted to clear that up, because it is not the first time I've seen BCEA... mentioned in the context of scalar replacement, and I was starting to doubt my eyes :)

> It's pretty brittle, sadly, and more importantly, unstable.

In similar experiments I see the same. E.g. a HashMap.get(TupleKey) lookup can be successfully scalarized in 99% of cases, but scalarization broke once with a slightly changed key generation scheme, because the hash code distribution became worse, HashMap buckets started converting themselves to TreeBins, and TreeBin code is a much harder task for EA.

Another can of worms is the mismatch between different inlining heuristics. E.g. the FreqInlineSize and InlineSmallCode thresholds may give different decisions for the same piece of code, and the inlining decision taken depends on whether the method was already compiled or not, which in turn depends on the finest details of initialization order and execution profile. These scenarios became rarer in 1.8 with InlineSmallCode increased, but I'm not sure they are gone...

Currently, I'm starting to think code needs to be written specifically with EA/SR in mind to be more-or-less stably scalarized. I.e. you can't get it for free (or it will be unstable).

----
Ruslan

2016-09-13 20:51 GMT+03:00 Vitaly Davidovich :
> On Tuesday, September 13, 2016, Cheremin Ruslan wrote:
>> > I'm seeing some code that iterates over a ConcurrentHashMap's entrySet
>> > that allocates tens of GB of CHM$MapEntry objects even though they don't
>> > escape
>>
>> I'm a bit confused: I was sure BCEA-style params do affect EA, but don't
>> affect scalar replacement.
With bcEscapeAnalyser you can get (sort of) >> inter-procedural EA, but this only allows you to have more allocations >> identified as ArgEscape instead of GlobalEscape. But you can't get more >> NoEscape without real inlining. ArgEscape (afaik) is used only for >> synchronization removals in HotSpot, not for scalar replacements. >> >> Am I incorrect? > > That's my understanding as well (and matches what I'm seeing in some > synthetic test harnesses). > > I'm generally seeing a lot of variability in scalar replacement in > particular, all driven by profile data. HashMap::get(int) > sometimes works at eliminating the box and sometimes doesn't - the > difference appears to be whether Integer::equals is inlined or not, which > in turn depends on whether the lookup finds something or not and whether > the number of successful lookups reaches compilation threshold. It's pretty > brittle, sadly, and more importantly, unstable. > > > >> ---- >> Ruslan > > > > -- > Sent from my phone > -------------- next part -------------- An HTML attachment was scrubbed... URL: From vitalyd at gmail.com Tue Sep 13 18:33:51 2016 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Tue, 13 Sep 2016 14:33:51 -0400 Subject: MaxBCEAEstimateSize and inlining clarification In-Reply-To: References: <00C16B65-A85F-491E-9384-1172735D9952@gmail.com> Message-ID: On Tue, Sep 13, 2016 at 2:25 PM, Ruslan Cheremin wrote: > >That's my understanding as well (and matches what I'm seeing in some > synthetic test harnesses). > > Ok, I just tried to clear it out, because it is not the first time I see > BCEA... noted in context of scalar replacement, and I start to doubt my > eyes :) > > >t's pretty brittle, sadly, and more importantly, unstable. > > Making similar experiments I see the same. E.g. 
HashMap.get(TupleKey) > lookup can be successfully scalarized 99% cases, but scalarization become > broken once with slightly changed key generation schema -- because > hashcodes distribution becomes worse, and HashMap buckets start to convert > themself to TreeBins, and TreeBins code is much harder task for EA. > > Another can of worms is mismatch between different inlining heuristics. > E.g. FreqInlineSize and InlineSmallCode thresholds may give different > decision for the same piece of code, and taken inlining decision depends on > was method already compiled or not -- which depends on thinnest details of > initialization order and execution profile. This scenarios becomes rare in > 1.8 with InlineSmallCode increased, but I'm not sure they are gone... > > Currently, I'm starting to think code needs to be specifically written for > EA/SR in mind to be more-or-less stably scalarized. I.e. you can't get it > for free (or it will be unstable). > I'm not sure this is practical, to be honest, at least for a big enough application. I've long considered EA (and scalar replacement) as a bonus optimization, and never to rely on it if the allocations would hurt otherwise. I'm just a bit surprised *just* how unstable it appears to be, in the "simplest" of cases. I think code can be written to increase likelihood of scalar replacement, but I just can't see how it can be made stable to the point where you can rely/depend on it for performance. > > ---- > Ruslan > > > 2016-09-13 20:51 GMT+03:00 Vitaly Davidovich : > >> >> >> On Tuesday, September 13, 2016, Cheremin Ruslan >> wrote: >> >>> > I'm seeing some code that iterates over a ConcurrentHashMap's entrySet >>> that allocates tens of GB of CHM$MapEntry objects even though they don't >>> escape >>> >>> >>> I'm a bit confused: I was sure BCEA-style params do affect EA, but don't >>> affect scalar replacement. 
With bcEscapeAnalyser you can get (sort of) >>> inter-procedural EA, but this only allows you to have more allocations >>> identified as ArgEscape instead of GlobalEscape. But you can't get more >>> NoEscape without real inlining. ArgEscape (afaik) is used only for >>> synchronization removals in HotSpot, not for scalar replacements. >>> >>> Am I incorrect? >> >> That's my understanding as well (and matches what I'm seeing in some >> synthetic test harnesses). >> >> I'm generally seeing a lot of variability in scalar replacement in >> particular, all driven by profile data. HashMap::get(int) >> sometimes works at eliminating the box and sometimes doesn't - the >> difference appears to be whether Integer::equals is inlined or not, which >> in turn depends on whether the lookup finds something or not and whether >> the number of successful lookups reaches compilation threshold. It's pretty >> brittle, sadly, and more importantly, unstable. >> >> >> >>> ---- >>> Ruslan >> >> >> >> -- >> Sent from my phone >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cheremin at gmail.com Tue Sep 13 19:32:18 2016 From: cheremin at gmail.com (Ruslan Cheremin) Date: Tue, 13 Sep 2016 22:32:18 +0300 Subject: MaxBCEAEstimateSize and inlining clarification In-Reply-To: References: <00C16B65-A85F-491E-9384-1172735D9952@gmail.com> Message-ID: >how it can be made stable to the point where you can rely/depend on it for performance. Well, same can be said about any JIT optimization -- (may be it is time to rename dynamic runtime to stochastic runtime?). Personally I see SR to be the same order of stability as inlining. Actually, apart from few SR-specific issues (like with merge points), EA/SR mostly follow inlining: if you have enough scope inlined you'll have, say, 80% chance of SR. From my perspective it is inlining which is so surprisingly unstable. BTW: have you considered to share you experience with EA/SR pitfalls? 
Even if "increase likelihood" is the best option available -- there are still very little information about it in the net. ---- Ruslan 2016-09-13 21:33 GMT+03:00 Vitaly Davidovich : > > > On Tue, Sep 13, 2016 at 2:25 PM, Ruslan Cheremin > wrote: > >> >That's my understanding as well (and matches what I'm seeing in some >> synthetic test harnesses). >> >> Ok, I just tried to clear it out, because it is not the first time I see >> BCEA... noted in context of scalar replacement, and I start to doubt my >> eyes :) >> >> >t's pretty brittle, sadly, and more importantly, unstable. >> >> Making similar experiments I see the same. E.g. HashMap.get(TupleKey) >> lookup can be successfully scalarized 99% cases, but scalarization become >> broken once with slightly changed key generation schema -- because >> hashcodes distribution becomes worse, and HashMap buckets start to convert >> themself to TreeBins, and TreeBins code is much harder task for EA. >> >> Another can of worms is mismatch between different inlining heuristics. >> E.g. FreqInlineSize and InlineSmallCode thresholds may give different >> decision for the same piece of code, and taken inlining decision depends on >> was method already compiled or not -- which depends on thinnest details of >> initialization order and execution profile. This scenarios becomes rare in >> 1.8 with InlineSmallCode increased, but I'm not sure they are gone... >> >> Currently, I'm starting to think code needs to be specifically written >> for EA/SR in mind to be more-or-less stably scalarized. I.e. you can't get >> it for free (or it will be unstable). >> > I'm not sure this is practical, to be honest, at least for a big enough > application. I've long considered EA (and scalar replacement) as a bonus > optimization, and never to rely on it if the allocations would hurt > otherwise. I'm just a bit surprised *just* how unstable it appears to be, > in the "simplest" of cases. 
> > I think code can be written to increase likelihood of scalar replacement, > but I just can't see how it can be made stable to the point where you can > rely/depend on it for performance. > >> >> ---- >> Ruslan >> >> >> 2016-09-13 20:51 GMT+03:00 Vitaly Davidovich : >> >>> >>> >>> On Tuesday, September 13, 2016, Cheremin Ruslan >>> wrote: >>> >>>> > I'm seeing some code that iterates over a ConcurrentHashMap's >>>> entrySet that allocates tens of GB of CHM$MapEntry objects even though they >>>> don't escape >>>> >>>> >>>> I'm a bit confused: I was sure BCEA-style params do affect EA, but >>>> don't affect scalar replacement. With bcEscapeAnalyser you can get (sort >>>> of) inter-procedural EA, but this only allows you to have more allocations >>>> identified as ArgEscape instead of GlobalEscape. But you can't get more >>>> NoEscape without real inlining. ArgEscape (afaik) is used only for >>>> synchronization removals in HotSpot, not for scalar replacements. >>>> >>>> Am I incorrect? >>> >>> That's my understanding as well (and matches what I'm seeing in some >>> synthetic test harnesses). >>> >>> I'm generally seeing a lot of variability in scalar replacement in >>> particular, all driven by profile data. HashMap::get(int) >>> sometimes works at eliminating the box and sometimes doesn't - the >>> difference appears to be whether Integer::equals is inlined or not, which >>> in turn depends on whether the lookup finds something or not and whether >>> the number of successful lookups reaches compilation threshold. It's pretty >>> brittle, sadly, and more importantly, unstable. >>> >>> >>> >>>> ---- >>>> Ruslan >>> >>> >>> >>> -- >>> Sent from my phone >>> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From vitalyd at gmail.com Tue Sep 13 19:44:05 2016 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Tue, 13 Sep 2016 15:44:05 -0400 Subject: MaxBCEAEstimateSize and inlining clarification In-Reply-To: References: <00C16B65-A85F-491E-9384-1172735D9952@gmail.com> Message-ID: On Tue, Sep 13, 2016 at 3:32 PM, Ruslan Cheremin wrote: > >how it can be made stable to the point where you can rely/depend on it > for performance. > > Well, same can be said about any JIT optimization -- (may be it is time to > rename dynamic runtime to stochastic runtime?). Personally I see SR to be > the same order of stability as inlining. Actually, apart from few > SR-specific issues (like with merge points), EA/SR mostly follow inlining: > if you have enough scope inlined you'll have, say, 80% chance of SR. From > my perspective it is inlining which is so surprisingly unstable. > Yeah, I'd agree. The difference, in my mind, is failing to inline a function may not have as drastic performance implications as failing to eliminate temporaries. > > BTW: have you considered to share you experience with EA/SR pitfalls? Even > if "increase likelihood" is the best option available -- there are still > very little information about it in the net. > I'm kind of doing that via the few emails on this list :). I think you pretty much covered the biggest (apparent) flake in the equation - inlining, which can fail for all sorts of different reasons. Beyond that, there's the control flow insensitive aspect of the EA, which is tangentially related to inlining (or lack thereof). There was also another thread a few months back where I was asking why a small local array allocation wasn't scalarized, and the answer there was ordering between loop unrolling and EA passes (I can dig up that thread if you're interested). 
The bizarre thing there was the loop operation was folded into a constant, and the compiled method was returning a constant value, but the array allocation was left behind (although it wasn't needed). I agree that there isn't much information about EA in Hotspot (there's a lot of handwaving and inaccuracies online). In particular, it'd be nice if the performance wiki had a section on making user code play well with EA (just like it has guidance on some other JIT aspects currently). > > ---- > Ruslan > > > > 2016-09-13 21:33 GMT+03:00 Vitaly Davidovich : > >> >> >> On Tue, Sep 13, 2016 at 2:25 PM, Ruslan Cheremin >> wrote: >> >>> >That's my understanding as well (and matches what I'm seeing in some >>> synthetic test harnesses). >>> >>> Ok, I just tried to clear it out, because it is not the first time I see >>> BCEA... noted in context of scalar replacement, and I start to doubt my >>> eyes :) >>> >>> >t's pretty brittle, sadly, and more importantly, unstable. >>> >>> Making similar experiments I see the same. E.g. HashMap.get(TupleKey) >>> lookup can be successfully scalarized 99% cases, but scalarization become >>> broken once with slightly changed key generation schema -- because >>> hashcodes distribution becomes worse, and HashMap buckets start to convert >>> themself to TreeBins, and TreeBins code is much harder task for EA. >>> >>> Another can of worms is mismatch between different inlining heuristics. >>> E.g. FreqInlineSize and InlineSmallCode thresholds may give different >>> decision for the same piece of code, and taken inlining decision depends on >>> was method already compiled or not -- which depends on thinnest details of >>> initialization order and execution profile. This scenarios becomes rare in >>> 1.8 with InlineSmallCode increased, but I'm not sure they are gone... >>> >>> Currently, I'm starting to think code needs to be specifically written >>> for EA/SR in mind to be more-or-less stably scalarized. I.e. 
you can't get >>> it for free (or it will be unstable). >>> >> I'm not sure this is practical, to be honest, at least for a big enough >> application. I've long considered EA (and scalar replacement) as a bonus >> optimization, and never to rely on it if the allocations would hurt >> otherwise. I'm just a bit surprised *just* how unstable it appears to be, >> in the "simplest" of cases. >> >> I think code can be written to increase likelihood of scalar replacement, >> but I just can't see how it can be made stable to the point where you can >> rely/depend on it for performance. >> >>> >>> ---- >>> Ruslan >>> >>> >>> 2016-09-13 20:51 GMT+03:00 Vitaly Davidovich : >>> >>>> >>>> >>>> On Tuesday, September 13, 2016, Cheremin Ruslan >>>> wrote: >>>> >>>>> > I'm seeing some code that iterates over a ConcurrentHashMap's >>>>> entrySet that allocates tens of GB of CHM$MapEntry objects even though they >>>>> don't escape >>>>> >>>>> >>>>> I'm a bit confused: I was sure BCEA-style params do affect EA, but >>>>> don't affect scalar replacement. With bcEscapeAnalyser you can get (sort >>>>> of) inter-procedural EA, but this only allows you to have more allocations >>>>> identified as ArgEscape instead of GlobalEscape. But you can't get more >>>>> NoEscape without real inlining. ArgEscape (afaik) is used only for >>>>> synchronization removals in HotSpot, not for scalar replacements. >>>>> >>>>> Am I incorrect? >>>> >>>> That's my understanding as well (and matches what I'm seeing in some >>>> synthetic test harnesses). >>>> >>>> I'm generally seeing a lot of variability in scalar replacement in >>>> particular, all driven by profile data. HashMap::get(int) >>>> sometimes works at eliminating the box and sometimes doesn't - the >>>> difference appears to be whether Integer::equals is inlined or not, which >>>> in turn depends on whether the lookup finds something or not and whether >>>> the number of successful lookups reaches compilation threshold. 
It's pretty >>>> brittle, sadly, and more importantly, unstable. >>>> >>>> >>>> >>>>> ---- >>>>> Ruslan >>>> >>>> >>>> >>>> -- >>>> Sent from my phone >>>> >>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From vladimir.kozlov at oracle.com Tue Sep 13 19:52:24 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 13 Sep 2016 12:52:24 -0700 Subject: please sponsor? RFR(M): 8165235: [TESTBUG] RTM tests must check OS version In-Reply-To: References: Message-ID: <57D858F8.1010807@oracle.com> Submitted to JPRT. Thanks, Vladimir On 9/8/16 7:38 AM, Lindenmaier, Goetz wrote: > Hi, > > This change was reviewed by Volker Simonis and Fillipp Zhinkin. > Final webrevs: > http://cr.openjdk.java.net/~goetz/wr16/8165235-osRecog/03/webrev.bs/ > http://cr.openjdk.java.net/~goetz/wr16/8165235-osRecog/03/webrev.hs/ > > Could someone please sponsor? > > Thanks! > Goetz > >> -----Original Message----- >> From: hotspot-compiler-dev [mailto:hotspot-compiler-dev- >> bounces at openjdk.java.net] On Behalf Of Lindenmaier, Goetz >> Sent: Montag, 5. September 2016 13:55 >> To: hotspot-compiler-dev at openjdk.java.net >> Subject: RFR(M): 8165235: [TESTBUG] RTM tests must check OS version >> >> Hi, >> >> >> >> This fixes the RTM tests wrt. to supported platforms on ppc. >> >> Please review this change. I please need a sponsor. >> http://cr.openjdk.java.net/~goetz/wr16/8165235-osRecog/01/webrev.bs/ >> >> http://cr.openjdk.java.net/~goetz/wr16/8165235-osRecog/01/webrev.hs/ >> >> >> RTM uses special instructions that are only available on recent x86 cpus. On >> x86, this feature does not need OS support. On ppc, the equivalent >> functionality, hardware transactional memory, requires OS support. Thus the >> feature is only enabled by the VM if CPU and OS are at a specific level. The >> tests must check this. too. This holds for AIX and Linux. 
>> >> >> >> To do so, this change introduces rtm/predicate/SupportedOS.java which >> checks for proper OS versions on ppc, else returns true. >> >> The OS version is retrieved from Platform.java, which has new methods >> getOsVersionMajor() and getOsVersionMinor(). >> >> To simplify the checks in the tests, I also introduced a 3-way AndPredicate >> constructor. >> >> >> >> To simplify the OS version check on Aix, I change enabling RTM on Aix to >> require AIX 7.2. >> >> Before, it was enabled on AIX 7.1.3.30, which contains an important bug fix. >> The >> >> last digits of this version are not exported to os.version property, so I can not >> >> check for them in the test. >> >> >> >> Best regards, >> >> Goetz. > From vitalyd at gmail.com Tue Sep 13 19:54:27 2016 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Tue, 13 Sep 2016 15:54:27 -0400 Subject: Odd interaction between ArrayList$Itr and Escape Analysis In-Reply-To: <1619527975.952230.1473776309365.JavaMail.zimbra@u-pem.fr> References: <1619527975.952230.1473776309365.JavaMail.zimbra@u-pem.fr> Message-ID: On Tuesday, September 13, 2016, Remi Forax wrote: > I've always found that the empty inner classes generated by javac as a > kind of hack. > > These classes should be removed in Java 10, thanks to the nestmate > attributes. > http://mail.openjdk.java.net/pipermail/valhalla-spec- > experts/2016-January/000060.html > > The other solution, is to have an empty class in the jdk which is not > visible from javac (the class itself can be marked as synthetic), > so javac can use it without creating method clash. > > and to solve the problem now, the easy solution is to add a package > private constructor in ArrayList.Itr, > I'm hoping Oracle can take Kris' (Azul) patch (or do something similar). It might catch more cases than just modifying Itr. 
>
> private class Itr implements Iterator<E> {
>     int cursor;       // index of next element to return
>     int lastRet = -1; // index of last element returned; -1 if no such
>     int expectedModCount = modCount;
>
>     Itr() {
>         // avoid generating a synthetic accessor constructor
>     }
> }
>
>
> regards,
> Rémi
>
> ------------------------------
>
> *From: *"Vitaly Davidovich"
> *To: *"Krystal Mok"
> *Cc: *"hotspot compiler"
> *Sent: *Monday, 12 September 2016 22:15:41
> *Subject: *Re: Odd interaction between ArrayList$Itr and Escape Analysis
>
>
>
> On Mon, Sep 12, 2016 at 3:56 PM, Krystal Mok
> wrote:
>
>> On Mon, Sep 12, 2016 at 12:38 PM, Vitaly Davidovich
>> wrote:
>>>
>>> It seems odd to me as well why inlining won't force load the missing
>>> class(es). If we're inlining, it means the method itself or the call chain
>>> it's part of is hot - failing to inline can have negative side-effects,
>>> like this example. I suppose there must be a good reason why it doesn't do
>>> this though?
>>>
>>
>> That's because we can't. The JIT compilers are running on their own
>> threads, and they're not real "Java threads". So they are not allowed to
>> run arbitrary Java code. But Java class loading may involve running
>> arbitrary Java code, e.g. the ClassLoader.loadClass() upcall.
>> Force class loading can be done on the triggering side (for the top-level
>> method), because compilation tasks are triggered from real Java threads,
>> and they're allowed to run arbitrary Java code.
>>
> I see, makes sense. Perhaps there can be an option to turn on loading of
> required types in the entire compilation unit, after all inlining is done
> (and therefore make the unloaded types not be barriers for inlining). I'd
> personally prefer that over having odd performance differences.
>
>>
>> - Kris
>>
> Thanks Kris.
>
>

--
Sent from my phone
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From cheremin at gmail.com Tue Sep 13 19:55:05 2016 From: cheremin at gmail.com (Ruslan Cheremin) Date: Tue, 13 Sep 2016 22:55:05 +0300 Subject: MaxBCEAEstimateSize and inlining clarification In-Reply-To: References: <00C16B65-A85F-491E-9384-1172735D9952@gmail.com> Message-ID: >There was also another thread a few months back where I was asking why a small local array allocation wasn't scalarized, and the answer there was ordering between loop unrolling and EA passes (I can dig up that thread if you're interested). It would be very nice, please -- I've tried to google it by myself (because you've noted it already in the thread) but wasn't able to guess right keywords :) 2016-09-13 22:44 GMT+03:00 Vitaly Davidovich : > > > On Tue, Sep 13, 2016 at 3:32 PM, Ruslan Cheremin > wrote: > >> >how it can be made stable to the point where you can rely/depend on it >> for performance. >> >> Well, same can be said about any JIT optimization -- (may be it is time >> to rename dynamic runtime to stochastic runtime?). Personally I see SR to >> be the same order of stability as inlining. Actually, apart from few >> SR-specific issues (like with merge points), EA/SR mostly follow inlining: >> if you have enough scope inlined you'll have, say, 80% chance of SR. >> From my perspective it is inlining which is so surprisingly unstable. >> > Yeah, I'd agree. The difference, in my mind, is failing to inline a > function may not have as drastic performance implications as failing to > eliminate temporaries. > >> >> BTW: have you considered to share you experience with EA/SR pitfalls? >> Even if "increase likelihood" is the best option available -- there are >> still very little information about it in the net. >> > I'm kind of doing that via the few emails on this list :). I think you > pretty much covered the biggest (apparent) flake in the equation - > inlining, which can fail for all sorts of different reasons. 
Beyond that, > there's the control flow insensitive aspect of the EA, which is > tangentially related to inlining (or lack thereof). > > There was also another thread a few months back where I was asking why a > small local array allocation wasn't scalarized, and the answer there was > ordering between loop unrolling and EA passes (I can dig up that thread if > you're interested). The bizarre thing there was the loop operation was > folded into a constant, and the compiled method was returning a constant > value, but the array allocation was left behind (although it wasn't needed). > > I agree that there isn't much information about EA in Hotspot (there's a > lot of handwaving and inaccuracies online). In particular, it'd be nice if > the performance wiki had a section on making user code play well with EA > (just like it has guidance on some other JIT aspects currently). > >> >> ---- >> Ruslan >> >> >> >> 2016-09-13 21:33 GMT+03:00 Vitaly Davidovich : >> >>> >>> >>> On Tue, Sep 13, 2016 at 2:25 PM, Ruslan Cheremin >>> wrote: >>> >>>> >That's my understanding as well (and matches what I'm seeing in some >>>> synthetic test harnesses). >>>> >>>> Ok, I just tried to clear it out, because it is not the first time I >>>> see BCEA... noted in context of scalar replacement, and I start to doubt my >>>> eyes :) >>>> >>>> >t's pretty brittle, sadly, and more importantly, unstable. >>>> >>>> Making similar experiments I see the same. E.g. HashMap.get(TupleKey) >>>> lookup can be successfully scalarized 99% cases, but scalarization become >>>> broken once with slightly changed key generation schema -- because >>>> hashcodes distribution becomes worse, and HashMap buckets start to convert >>>> themself to TreeBins, and TreeBins code is much harder task for EA. >>>> >>>> Another can of worms is mismatch between different inlining heuristics. >>>> E.g. 
FreqInlineSize and InlineSmallCode thresholds may give different >>>> decision for the same piece of code, and taken inlining decision depends on >>>> was method already compiled or not -- which depends on thinnest details of >>>> initialization order and execution profile. This scenarios becomes rare in >>>> 1.8 with InlineSmallCode increased, but I'm not sure they are gone... >>>> >>>> Currently, I'm starting to think code needs to be specifically written >>>> for EA/SR in mind to be more-or-less stably scalarized. I.e. you can't get >>>> it for free (or it will be unstable). >>>> >>> I'm not sure this is practical, to be honest, at least for a big enough >>> application. I've long considered EA (and scalar replacement) as a bonus >>> optimization, and never to rely on it if the allocations would hurt >>> otherwise. I'm just a bit surprised *just* how unstable it appears to be, >>> in the "simplest" of cases. >>> >>> I think code can be written to increase likelihood of scalar >>> replacement, but I just can't see how it can be made stable to the point >>> where you can rely/depend on it for performance. >>> >>>> >>>> ---- >>>> Ruslan >>>> >>>> >>>> 2016-09-13 20:51 GMT+03:00 Vitaly Davidovich : >>>> >>>>> >>>>> >>>>> On Tuesday, September 13, 2016, Cheremin Ruslan >>>>> wrote: >>>>> >>>>>> > I'm seeing some code that iterates over a ConcurrentHashMap's >>>>>> entrySet that allocates tens of GB of CHM$MapEntry objects even though they >>>>>> don't escape >>>>>> >>>>>> >>>>>> I'm a bit confused: I was sure BCEA-style params do affect EA, but >>>>>> don't affect scalar replacement. With bcEscapeAnalyser you can get (sort >>>>>> of) inter-procedural EA, but this only allows you to have more allocations >>>>>> identified as ArgEscape instead of GlobalEscape. But you can't get more >>>>>> NoEscape without real inlining. ArgEscape (afaik) is used only for >>>>>> synchronization removals in HotSpot, not for scalar replacements. >>>>>> >>>>>> Am I incorrect? 
>>>>> >>>>> That's my understanding as well (and matches what I'm seeing in some >>>>> synthetic test harnesses). >>>>> >>>>> I'm generally seeing a lot of variability in scalar replacement in >>>>> particular, all driven by profile data. HashMap::get(int) >>>>> sometimes works at eliminating the box and sometimes doesn't - the >>>>> difference appears to be whether Integer::equals is inlined or not, which >>>>> in turn depends on whether the lookup finds something or not and whether >>>>> the number of successful lookups reaches compilation threshold. It's pretty >>>>> brittle, sadly, and more importantly, unstable. >>>>> >>>>> >>>>> >>>>>> ---- >>>>>> Ruslan >>>>> >>>>> >>>>> >>>>> -- >>>>> Sent from my phone >>>>> >>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rednaxelafx at gmail.com Tue Sep 13 20:01:39 2016 From: rednaxelafx at gmail.com (Krystal Mok) Date: Tue, 13 Sep 2016 13:01:39 -0700 Subject: Odd interaction between ArrayList$Itr and Escape Analysis In-Reply-To: References: <1619527975.952230.1473776309365.JavaMail.zimbra@u-pem.fr> Message-ID: And I'm happy to upstream that patch, if the team is interested. Now, when I first discovered the problem, my first intuition was that it's better to "fix" it in javac. But before nest mates in the Class file, there isn't much that javac could do. Changing the Java libraries to not use private constructors in inner classes is also doable, but needs changing a lot of files. So I ended up fixing it in the VM, even though I agree fully with what R?mi brought up. The access constructor tag thingy in javac is really a weird hack. If you guys ever look at the contents of ArrayList$1, it's really empty -- the class doesn't even declare some of the usual structures in a normal Class file... Hopefully we can get rid of it in javac soon. 
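
To make the javac behaviour described above concrete, here is a minimal sketch. AccessCtorDemo and its nested classes are hypothetical names chosen for illustration, not JDK code. The assumption is that a javac without nestmate support (before JDK 11) reaches the implicit private constructor of ImplicitCtor through a synthetic constructor that takes an empty AccessCtorDemo$1 class as a dummy parameter, while the explicit package-private constructor needs no such accessor. A nestmate-aware javac should report zero synthetic constructors for both.

```java
import java.lang.reflect.Constructor;

// Hypothetical demo class; the names are illustrative, not JDK code.
final class AccessCtorDemo {

    // No explicit constructor: the default constructor is private, like the class.
    private class ImplicitCtor {
        int cursor;
    }

    // Explicit package-private constructor, as suggested for ArrayList.Itr.
    private class ExplicitCtor {
        int cursor;

        ExplicitCtor() {
            // package-private, so the outer class can call it directly
        }
    }

    // Instantiating from the outer class is what makes javac provide access.
    Object newImplicit() { return new ImplicitCtor(); }
    Object newExplicit() { return new ExplicitCtor(); }

    // Counts the compiler-generated (synthetic) constructors of a class.
    static int syntheticCtorCount(Class<?> c) {
        int n = 0;
        for (Constructor<?> ctor : c.getDeclaredConstructors()) {
            if (ctor.isSynthetic()) {
                n++;
            }
        }
        return n;
    }

    static int implicitCount() { return syntheticCtorCount(ImplicitCtor.class); }
    static int explicitCount() { return syntheticCtorCount(ExplicitCtor.class); }

    public static void main(String[] args) {
        System.out.println("implicit private ctor: " + implicitCount() + " synthetic");
        System.out.println("explicit package ctor: " + explicitCount() + " synthetic");
    }
}
```

The count for the implicit case depends on the javac version used to compile the file, which is why no expected output is given for it.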
- Kris

On Tuesday, September 13, 2016, Vitaly Davidovich wrote:

>
>
> On Tuesday, September 13, 2016, Remi Forax
> wrote:
>
>> I've always found that the empty inner classes generated by javac as a
>> kind of hack.
>>
>> These classes should be removed in Java 10, thanks to the nestmate
>> attributes.
>> http://mail.openjdk.java.net/pipermail/valhalla-spec-experts/2016-January/000060.html
>>
>> The other solution, is to have an empty class in the jdk which is not
>> visible from javac (the class itself can be marked as synthetic),
>> so javac can use it without creating method clash.
>>
>> and to solve the problem now, the easy solution is to add a package
>> private constructor in ArrayList.Itr,
>>
> I'm hoping Oracle can take Kris' (Azul) patch (or do something similar).
> It might catch more cases than just modifying Itr.
>
>>
>> private class Itr implements Iterator<E> {
>>     int cursor;       // index of next element to return
>>     int lastRet = -1; // index of last element returned; -1 if no such
>>     int expectedModCount = modCount;
>>
>>     Itr() {
>>         // avoid generating a synthetic accessor constructor
>>     }
>> }
>>
>>
>> regards,
>> Rémi
>>
>> ------------------------------
>>
>> *From: *"Vitaly Davidovich"
>> *To: *"Krystal Mok"
>> *Cc: *"hotspot compiler"
>> *Sent: *Monday, 12 September 2016 22:15:41
>> *Subject: *Re: Odd interaction between ArrayList$Itr and Escape Analysis
>>
>>
>>
>> On Mon, Sep 12, 2016 at 3:56 PM, Krystal Mok
>> wrote:
>>
>>> On Mon, Sep 12, 2016 at 12:38 PM, Vitaly Davidovich
>>> wrote:
>>>>
>>>> It seems odd to me as well why inlining won't force load the missing
>>>> class(es). If we're inlining, it means the method itself or the call chain
>>>> it's part of is hot - failing to inline can have negative side-effects,
>>>> like this example. I suppose there must be a good reason why it doesn't do
>>>> this though?
>>>>
>>>
>>> That's because we can't. The JIT compilers are running on their own
>>> threads, and they're not real "Java threads".
>>> So they are not allowed to
>>> run arbitrary Java code. But Java class loading may involve running
>>> arbitrary Java code, e.g. the ClassLoader.loadClass() upcall.
>>> Force class loading can be done on the triggering side (for the
>>> top-level method), because compilation tasks are triggered from real Java
>>> threads, and they're allowed to run arbitrary Java code.
>>>
>> I see, makes sense. Perhaps there can be an option to turn on loading of
>> required types in the entire compilation unit, after all inlining is done
>> (and therefore make the unloaded types not be barriers for inlining). I'd
>> personally prefer that over having odd performance differences.
>>
>>>
>>> - Kris
>>>
>> Thanks Kris.
>>
>>
>
> --
> Sent from my phone
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From vladimir.kozlov at oracle.com  Tue Sep 13 20:15:06 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 13 Sep 2016 13:15:06 -0700
Subject: MaxBCEAEstimateSize and inlining clarification
In-Reply-To:
References: <00C16B65-A85F-491E-9384-1172735D9952@gmail.com>
Message-ID: <57D85E4A.2080302@oracle.com>

If allocation is done locally in loop it could be SR (but not guaranteed):

for () {
   Foo f = new Foo();
}

"Currently" we can't SR it if there is merge:

Foo f = new Foo();
for () {
   f = new Foo();
}
x = f.x;

Also we can't SR an array if it has index access because we can't map loads/stores to concrete element:

int[] a = new int[3];
for (i) {
   x = a[i]
}

If elements are accessed without index (using array to pass or return several values) or a loop is fully unrolled we can SR it:

x0 = a[0];
x1 = a[1];
x2 = a[2];

Regards,
Vladimir

On 9/13/16 12:55 PM, Ruslan Cheremin wrote:
>>There was also another thread a few months back where I was asking why a small local array allocation wasn't scalarized, and the answer there was ordering between loop unrolling and EA passes (I can
> dig up that thread if you're interested).
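
The four code shapes listed in the message above can be written out as a small compilable Java sketch. Point, ScalarReplacementShapes and all method names are made up for illustration. The comments only restate which shapes are described above as candidates for scalar replacement (SR); whether C2 actually eliminates an allocation still depends on inlining, profile data and the JVM version, so this is not a guarantee of behaviour.

```java
// Illustrative sketch only: Point and all method names are hypothetical.
final class ScalarReplacementShapes {

    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // Shape 1: allocation is local to the loop body. Candidate for SR, not guaranteed.
    static int allocInLoop(int n) {
        int sum = 0;
        for (int i = 0; i < n; i++) {
            Point p = new Point(i, i + 1);
            sum += p.x + p.y;
        }
        return sum;
    }

    // Shape 2: after the loop, p is a merge of two allocation sites.
    // Described above as not scalar-replaceable "currently".
    static int allocMerged(int n) {
        Point p = new Point(0, 1);
        for (int i = 1; i < n; i++) {
            p = new Point(i, i + 1);
        }
        return p.x + p.y;
    }

    // Shape 3: the array is read with an index known only at run time,
    // so the load cannot be mapped to one concrete element.
    static int arrayVariableIndex(int i) {
        int[] a = new int[3];
        a[0] = 10;
        a[1] = 20;
        a[2] = 30;
        return a[i];
    }

    // Shape 4: every access uses a constant index. Candidate for SR.
    static int arrayConstantIndex(int v) {
        int[] a = new int[3];
        a[0] = v;
        a[1] = v + 1;
        a[2] = v + 2;
        return a[0] + a[1] + a[2];
    }

    public static void main(String[] args) {
        // Warm up so that the JIT gets a chance to compile the methods.
        long sink = 0;
        for (int i = 0; i < 200_000; i++) {
            sink += allocInLoop(10) + allocMerged(10)
                    + arrayVariableIndex(i % 3) + arrayConstantIndex(i);
        }
        System.out.println(sink);
    }
}
```

One way to look for a difference, assuming a HotSpot JVM with C2, is to run the warm-up loop with and without -XX:-EliminateAllocations and compare allocation rates or GC activity.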
> > It would be very nice, please -- I've tried to google it by myself (because you've noted it already in the thread) but wasn't able to guess right keywords :) > > > 2016-09-13 22:44 GMT+03:00 Vitaly Davidovich >: > > > > On Tue, Sep 13, 2016 at 3:32 PM, Ruslan Cheremin > wrote: > > >how it can be made stable to the point where you can rely/depend on it for performance. > > Well, same can be said about any JIT optimization -- (may be it is time to rename dynamic runtime to stochastic runtime?). Personally I see SR to be the same order of stability as inlining. > Actually, apart from few SR-specific issues (like with merge points), EA/SR mostly follow inlining: if you have enough scope inlined you'll have, say, 80% chance of SR. From my perspective it > is inlining which is so surprisingly unstable. > > Yeah, I'd agree. The difference, in my mind, is failing to inline a function may not have as drastic performance implications as failing to eliminate temporaries. > > > BTW: have you considered to share you experience with EA/SR pitfalls? Even if "increase likelihood" is the best option available -- there are still very little information about it in the net. > > I'm kind of doing that via the few emails on this list :). I think you pretty much covered the biggest (apparent) flake in the equation - inlining, which can fail for all sorts of different > reasons. Beyond that, there's the control flow insensitive aspect of the EA, which is tangentially related to inlining (or lack thereof). > > There was also another thread a few months back where I was asking why a small local array allocation wasn't scalarized, and the answer there was ordering between loop unrolling and EA passes (I > can dig up that thread if you're interested). The bizarre thing there was the loop operation was folded into a constant, and the compiled method was returning a constant value, but the array > allocation was left behind (although it wasn't needed). 
> > I agree that there isn't much information about EA in Hotspot (there's a lot of handwaving and inaccuracies online). In particular, it'd be nice if the performance wiki had a section on making > user code play well with EA (just like it has guidance on some other JIT aspects currently). > > > ---- > Ruslan > > > > 2016-09-13 21:33 GMT+03:00 Vitaly Davidovich >: > > > > On Tue, Sep 13, 2016 at 2:25 PM, Ruslan Cheremin > wrote: > > >That's my understanding as well (and matches what I'm seeing in some synthetic test harnesses). > > Ok, I just tried to clear it out, because it is not the first time I see BCEA... noted in context of scalar replacement, and I start to doubt my eyes :) > > >t's pretty brittle, sadly, and more importantly, unstable. > > Making similar experiments I see the same. E.g. HashMap.get(TupleKey) lookup can be successfully scalarized 99% cases, but scalarization become broken once with slightly changed key > generation schema -- because hashcodes distribution becomes worse, and HashMap buckets start to convert themself to TreeBins, and TreeBins code is much harder task for EA. > > Another can of worms is mismatch between different inlining heuristics. E.g. FreqInlineSize and InlineSmallCode thresholds may give different decision for the same piece of code, and > taken inlining decision depends on was method already compiled or not -- which depends on thinnest details of initialization order and execution profile. This scenarios becomes rare in > 1.8 with InlineSmallCode increased, but I'm not sure they are gone... > > Currently, I'm starting to think code needs to be specifically written for EA/SR in mind to be more-or-less stably scalarized. I.e. you can't get it for free (or it will be unstable). > > I'm not sure this is practical, to be honest, at least for a big enough application. I've long considered EA (and scalar replacement) as a bonus optimization, and never to rely on it if > the allocations would hurt otherwise. 
I'm just a bit surprised *just* how unstable it appears to be, in the "simplest" of cases. > > I think code can be written to increase likelihood of scalar replacement, but I just can't see how it can be made stable to the point where you can rely/depend on it for performance. > > > ---- > Ruslan > > > 2016-09-13 20:51 GMT+03:00 Vitaly Davidovich >: > > > > On Tuesday, September 13, 2016, Cheremin Ruslan > wrote: > > > I'm seeing some code that iterates over a ConcurrentHashMap's entrySet that allocates tens of GB of CHM$MapEntry objects even though they don't escape > > > I'm a bit confused: I was sure BCEA-style params do affect EA, but don't affect scalar replacement. With bcEscapeAnalyser you can get (sort of) inter-procedural EA, but this > only allows you to have more allocations identified as ArgEscape instead of GlobalEscape. But you can't get more NoEscape without real inlining. ArgEscape (afaik) is used only > for synchronization removals in HotSpot, not for scalar replacements. > > Am I incorrect? > > That's my understanding as well (and matches what I'm seeing in some synthetic test harnesses). > > I'm generally seeing a lot of variability in scalar replacement in particular, all driven by profile data. HashMap::get(int) sometimes works at eliminating the box > and sometimes doesn't - the difference appears to be whether Integer::equals is inlined or not, which in turn depends on whether the lookup finds something or not and whether the > number of successful lookups reaches compilation threshold. It's pretty brittle, sadly, and more importantly, unstable. 
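
To make the HashMap::get(int) case concrete: passing an int key to get makes javac emit an Integer.valueOf call at the call site, and that box is the temporary escape analysis has to remove. The sketch below only shows the source shape. The class and method names are invented for illustration, and it makes no claim about when C2 eliminates the box.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of a boxed HashMap lookup in a hot loop.
final class BoxedLookup {

    // Counts how many keys in [from, to) are present in the map.
    static int countHits(Map<Integer, String> map, int from, int to) {
        int hits = 0;
        for (int key = from; key < to; key++) {
            // 'key' is autoboxed here (Integer.valueOf(key)). HashMap.get
            // hashes the box and may call equals() on it when it finds a
            // node with a matching hash, so misses and hits exercise
            // different code.
            if (map.get(key) != null) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<Integer, String> map = new HashMap<>();
        for (int i = 0; i < 1000; i += 2) {
            map.put(i, "v" + i);
        }
        System.out.println(countHits(map, 0, 1000));    // range with hits
        System.out.println(countHits(map, 1000, 2000)); // range with misses only
    }
}
```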
> > > > ---- > Ruslan > > > > -- > Sent from my phone > > > > > > From vitalyd at gmail.com Tue Sep 13 20:29:13 2016 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Tue, 13 Sep 2016 16:29:13 -0400 Subject: MaxBCEAEstimateSize and inlining clarification In-Reply-To: References: <00C16B65-A85F-491E-9384-1172735D9952@gmail.com> Message-ID: On Tue, Sep 13, 2016 at 3:55 PM, Ruslan Cheremin wrote: > >There was also another thread a few months back where I was asking why a > small local array allocation wasn't scalarized, and the answer there was > ordering between loop unrolling and EA passes (I can dig up that thread if > you're interested). > > It would be very nice, please -- I've tried to google it by myself > (because you've noted it already in the thread) but wasn't able to guess > right keywords :) > http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-December/020546.html > > > 2016-09-13 22:44 GMT+03:00 Vitaly Davidovich : > >> >> >> On Tue, Sep 13, 2016 at 3:32 PM, Ruslan Cheremin >> wrote: >> >>> >how it can be made stable to the point where you can rely/depend on it >>> for performance. >>> >>> Well, same can be said about any JIT optimization -- (may be it is time >>> to rename dynamic runtime to stochastic runtime?). Personally I see SR to >>> be the same order of stability as inlining. Actually, apart from few >>> SR-specific issues (like with merge points), EA/SR mostly follow inlining: >>> if you have enough scope inlined you'll have, say, 80% chance of SR. >>> From my perspective it is inlining which is so surprisingly unstable. >>> >> Yeah, I'd agree. The difference, in my mind, is failing to inline a >> function may not have as drastic performance implications as failing to >> eliminate temporaries. >> >>> >>> BTW: have you considered to share you experience with EA/SR pitfalls? >>> Even if "increase likelihood" is the best option available -- there are >>> still very little information about it in the net. 
>>> >> I'm kind of doing that via the few emails on this list :). I think you >> pretty much covered the biggest (apparent) flake in the equation - >> inlining, which can fail for all sorts of different reasons. Beyond that, >> there's the control flow insensitive aspect of the EA, which is >> tangentially related to inlining (or lack thereof). >> >> There was also another thread a few months back where I was asking why a >> small local array allocation wasn't scalarized, and the answer there was >> ordering between loop unrolling and EA passes (I can dig up that thread if >> you're interested). The bizarre thing there was the loop operation was >> folded into a constant, and the compiled method was returning a constant >> value, but the array allocation was left behind (although it wasn't needed). >> >> I agree that there isn't much information about EA in Hotspot (there's a >> lot of handwaving and inaccuracies online). In particular, it'd be nice if >> the performance wiki had a section on making user code play well with EA >> (just like it has guidance on some other JIT aspects currently). >> >>> >>> ---- >>> Ruslan >>> >>> >>> >>> 2016-09-13 21:33 GMT+03:00 Vitaly Davidovich : >>> >>>> >>>> >>>> On Tue, Sep 13, 2016 at 2:25 PM, Ruslan Cheremin >>>> wrote: >>>> >>>>> >That's my understanding as well (and matches what I'm seeing in some >>>>> synthetic test harnesses). >>>>> >>>>> Ok, I just tried to clear it out, because it is not the first time I >>>>> see BCEA... noted in context of scalar replacement, and I start to doubt my >>>>> eyes :) >>>>> >>>>> >t's pretty brittle, sadly, and more importantly, unstable. >>>>> >>>>> Making similar experiments I see the same. E.g. 
HashMap.get(TupleKey) >>>>> lookup can be successfully scalarized 99% cases, but scalarization become >>>>> broken once with slightly changed key generation schema -- because >>>>> hashcodes distribution becomes worse, and HashMap buckets start to convert >>>>> themself to TreeBins, and TreeBins code is much harder task for EA. >>>>> >>>>> Another can of worms is mismatch between different inlining >>>>> heuristics. E.g. FreqInlineSize and InlineSmallCode thresholds may give >>>>> different decision for the same piece of code, and taken inlining decision >>>>> depends on was method already compiled or not -- which depends on thinnest >>>>> details of initialization order and execution profile. This scenarios >>>>> becomes rare in 1.8 with InlineSmallCode increased, but I'm not sure they >>>>> are gone... >>>>> >>>>> Currently, I'm starting to think code needs to be specifically written >>>>> for EA/SR in mind to be more-or-less stably scalarized. I.e. you can't get >>>>> it for free (or it will be unstable). >>>>> >>>> I'm not sure this is practical, to be honest, at least for a big enough >>>> application. I've long considered EA (and scalar replacement) as a bonus >>>> optimization, and never to rely on it if the allocations would hurt >>>> otherwise. I'm just a bit surprised *just* how unstable it appears to be, >>>> in the "simplest" of cases. >>>> >>>> I think code can be written to increase likelihood of scalar >>>> replacement, but I just can't see how it can be made stable to the point >>>> where you can rely/depend on it for performance. 
>>>> >>>>> >>>>> ---- >>>>> Ruslan >>>>> >>>>> >>>>> 2016-09-13 20:51 GMT+03:00 Vitaly Davidovich : >>>>> >>>>>> >>>>>> >>>>>> On Tuesday, September 13, 2016, Cheremin Ruslan >>>>>> wrote: >>>>>> >>>>>>> > I'm seeing some code that iterates over a ConcurrentHashMap's >>>>>>> entrySet that allocates tens of GB of CHM$MapEntry objects even though they >>>>>>> don't escape >>>>>>> >>>>>>> >>>>>>> I'm a bit confused: I was sure BCEA-style params do affect EA, but >>>>>>> don't affect scalar replacement. With bcEscapeAnalyser you can get (sort >>>>>>> of) inter-procedural EA, but this only allows you to have more allocations >>>>>>> identified as ArgEscape instead of GlobalEscape. But you can't get more >>>>>>> NoEscape without real inlining. ArgEscape (afaik) is used only for >>>>>>> synchronization removals in HotSpot, not for scalar replacements. >>>>>>> >>>>>>> Am I incorrect? >>>>>> >>>>>> That's my understanding as well (and matches what I'm seeing in some >>>>>> synthetic test harnesses). >>>>>> >>>>>> I'm generally seeing a lot of variability in scalar replacement in >>>>>> particular, all driven by profile data. HashMap::get(int) >>>>>> sometimes works at eliminating the box and sometimes doesn't - the >>>>>> difference appears to be whether Integer::equals is inlined or not, which >>>>>> in turn depends on whether the lookup finds something or not and whether >>>>>> the number of successful lookups reaches compilation threshold. It's pretty >>>>>> brittle, sadly, and more importantly, unstable. >>>>>> >>>>>> >>>>>> >>>>>>> ---- >>>>>>> Ruslan >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Sent from my phone >>>>>> >>>>> >>>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
From doug.simon at oracle.com Tue Sep 13 22:33:59 2016
From: doug.simon at oracle.com (Doug Simon)
Date: Wed, 14 Sep 2016 00:33:59 +0200
Subject: RFR: 8165434: [JVMCI] remove uses of setAccessible
In-Reply-To: <21860311-D6E9-482B-B0A0-F488A516A1D3@oracle.com>
References: <864558C5-C2AD-4D6B-BB6F-568F00BBE28A@twitter.com> <6224CDA0-63E6-442C-BD13-732208FA75A2@oracle.com> <999A422E-6CF6-45C5-955B-D58745DBB456@twitter.com> <21860311-D6E9-482B-B0A0-F488A516A1D3@oracle.com>
Message-ID: <9B5CFF51-7C4E-44E8-B743-B37411E3C77C@oracle.com>

JPRT testing revealed a test bug in FindUniqueConcreteMethodTest.java where CompilerToVM.findUniqueConcreteMethod was being called with a default method. This is not supported by HotSpot, which is why the only other usage of this private API avoids it:

http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/ec36e3e03d65/src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotResolvedJavaMethodImpl.java#l385

The offending test in FindUniqueConcreteMethodTest has been commented out.

-Doug

> On 08 Sep 2016, at 15:12, Doug Simon wrote:
>
>>
>> On 07 Sep 2016, at 19:52, Christian Thalinger wrote:
>>
>>>
>>> On Sep 7, 2016, at 2:29 AM, Doug Simon wrote:
>>>
>>>>
>>>> On 06 Sep 2016, at 20:12, Christian Thalinger wrote:
>>>>
>>>>
>>>>> On Sep 5, 2016, at 6:45 AM, Doug Simon wrote:
>>>>>
>>>>> JVMCI currently uses java.lang.reflect.AccessibleObject.setAccessible to get at private internals of certain JDK objects (e.g. java.lang.reflect.Method::slot). In light of changes around java.lang.reflect.AccessibleObject::setAccessible at http://openjdk.java.net/jeps/261, this may require extra command line options at some point. To avoid that, I've removed all uses of setAccessible in JVMCI.
>>>>>
>>>>> http://cr.openjdk.java.net/~dnsimon/8165434/
>>>>
>>>> src/jdk.vm.ci/share/classes/jdk.vm.ci.meta/src/jdk/vm/ci/meta/ModifiersProvider.java
>>>>
>>>> + int BRIDGE = 0x0040;
>>>> + int VARARGS = 0x0080;
>>>> + int SYNTHETIC = 0x1000;
>>>> + int ANNOTATION = 0x2000;
>>>> + int ENUM = 0x4000;
>>>> I wish we could avoid that. We can't use this stuff because it's HotSpot-dependent, right?
>>>> + assert ModifiersProvider.SYNTHETIC == getConstant("JVM_ACC_SYNTHETIC", Integer.class);
>>>> + assert ModifiersProvider.ANNOTATION == getConstant("JVM_ACC_ANNOTATION", Integer.class);
>>>> + assert ModifiersProvider.BRIDGE == getConstant("JVM_ACC_BRIDGE", Integer.class);
>>>> + assert ModifiersProvider.VARARGS == getConstant("JVM_ACC_VARARGS", Integer.class);
>>>> + assert ModifiersProvider.ENUM == getConstant("JVM_ACC_ENUM", Integer.class);
>>>> What if we convert these constants to interface methods and the VM-dependent part has to implement them? Or maybe even keep the fields and assign them via interface methods.
>>>
>>> Following your suggestion, I've factored out these VM dependent flags to a new HotSpotModifiers class:
>>>
>>> http://cr.openjdk.java.net/~dnsimon/8165434.v2/
>>
>> Excellent. One question? I noticed HotSpotModifiers is an interface but no other class implements it. Is there a reason for it being an interface?
>
> Nope. It's now a class.
>
>>
>> Only nit, remove 2011:
>> 2 * Copyright (c) 2011, 2016, Oracle and/or its affiliates. All rights reserved.
>
> Fixed.
>
> -Doug

From goetz.lindenmaier at sap.com Wed Sep 14 06:29:09 2016
From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz)
Date: Wed, 14 Sep 2016 06:29:09 +0000
Subject: please sponsor? RFR(M): 8165235: [TESTBUG] RTM tests must check OS version
In-Reply-To: <57D858F8.1010807@oracle.com>
References: <57D858F8.1010807@oracle.com>
Message-ID:

Hi Vladimir,

Thanks a lot!
Best regards, Goetz > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Dienstag, 13. September 2016 21:52 > To: Lindenmaier, Goetz ; hotspot-compiler- > dev at openjdk.java.net > Subject: Re: please sponsor? RFR(M): 8165235: [TESTBUG] RTM tests must > check OS version > > Submitted to JPRT. > > Thanks, > Vladimir > > On 9/8/16 7:38 AM, Lindenmaier, Goetz wrote: > > Hi, > > > > This change was reviewed by Volker Simonis and Fillipp Zhinkin. > > Final webrevs: > > http://cr.openjdk.java.net/~goetz/wr16/8165235-osRecog/03/webrev.bs/ > > http://cr.openjdk.java.net/~goetz/wr16/8165235-osRecog/03/webrev.hs/ > > > > Could someone please sponsor? > > > > Thanks! > > Goetz > > > >> -----Original Message----- > >> From: hotspot-compiler-dev [mailto:hotspot-compiler-dev- > >> bounces at openjdk.java.net] On Behalf Of Lindenmaier, Goetz > >> Sent: Montag, 5. September 2016 13:55 > >> To: hotspot-compiler-dev at openjdk.java.net > >> Subject: RFR(M): 8165235: [TESTBUG] RTM tests must check OS version > >> > >> Hi, > >> > >> > >> > >> This fixes the RTM tests wrt. to supported platforms on ppc. > >> > >> Please review this change. I please need a sponsor. > >> http://cr.openjdk.java.net/~goetz/wr16/8165235- > osRecog/01/webrev.bs/ > >> > >> http://cr.openjdk.java.net/~goetz/wr16/8165235- > osRecog/01/webrev.hs/ > >> > >> > >> RTM uses special instructions that are only available on recent x86 cpus. > On > >> x86, this feature does not need OS support. On ppc, the equivalent > >> functionality, hardware transactional memory, requires OS support. Thus > the > >> feature is only enabled by the VM if CPU and OS are at a specific level. The > >> tests must check this. too. This holds for AIX and Linux. > >> > >> > >> > >> To do so, this change introduces rtm/predicate/SupportedOS.java which > >> checks for proper OS versions on ppc, else returns true. 
> >> > >> The OS version is retrieved from Platform.java, which has new methods > >> getOsVersionMajor() and getOsVersionMinor(). > >> > >> To simplify the checks in the tests, I also introduced a 3-way AndPredicate > >> constructor. > >> > >> > >> > >> To simplify the OS version check on Aix, I change enabling RTM on Aix to > >> require AIX 7.2. > >> > >> Before, it was enabled on AIX 7.1.3.30, which contains an important bug > fix. > >> The > >> > >> last digits of this version are not exported to os.version property, so I can > not > >> > >> check for them in the test. > >> > >> > >> > >> Best regards, > >> > >> Goetz. > > From vladimir.x.ivanov at oracle.com Wed Sep 14 10:12:35 2016 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 14 Sep 2016 13:12:35 +0300 Subject: Odd interaction between ArrayList$Itr and Escape Analysis In-Reply-To: References: <1619527975.952230.1473776309365.JavaMail.zimbra@u-pem.fr> Message-ID: <4c873846-5322-ebdf-5e0a-393089aea590@oracle.com> Kris, > And I'm happy to upstream that patch, if the team is interested. Sure, we are definitely interested in fixing that. Feel free to file a bug and send the fix out for review. > Now, when I first discovered the problem, my first intuition was that > it's better to "fix" it in javac. But before nest mates in the Class > file, there isn't much that javac could do. Changing the Java libraries > to not use private constructors in inner classes is also doable, but > needs changing a lot of files. I agree that javac is not the best place to fix the immediate problem: it requires recompilation and there are already lots of problematic bytecode shapes out in the wild. The JVM should optimize for that case instead. > So I ended up fixing it in the VM, even though I agree fully with what > R?mi brought up. I'm curious how did you fix it. I haven't found a description in the thread. It's possible to force class loading, but I'm worried about undesirable effects of class initialization. 
Is it enough for C2 to have the class loaded but not initialized to make it work? Another approach would be to issue a null check and deoptimize (for bridge methods, the check collapses after inlining since the argument is always null) or add a nmethod dependency and throw away the code when the parameter class is loaded. Best regards, Vladimir Ivanov > The access constructor tag thingy in javac is really a weird hack. If > you guys ever look at the contents of ArrayList$1, it's really empty > -- the class doesn't even declare some of the usual structures in a > normal Class file... Hopefully we can get rid of it in javac soon. > On Tuesday, September 13, 2016, Vitaly Davidovich > wrote: > > > > On Tuesday, September 13, 2016, Remi Forax > wrote: > > I've always found that the empty inner classes generated by > javac as a kind of hack. > > These classes should be removed in Java 10, thanks to the > nestmate attributes. > > http://mail.openjdk.java.net/pipermail/valhalla-spec-experts/2016-January/000060.html > > > The other solution, is to have an empty class in the jdk which > is not visible from javac (the class itself can be marked as > synthetic), > so javac can use it without creating method clash. > > and to solve the problem now, the easy solution is to add a > package private constructor in ArrayList.Itr, > > I'm hoping Oracle can take Kris' (Azul) patch (or do something > similar). It might catch more cases than just modifying Itr. 
>
>
> private class Itr implements Iterator<E> {
>     int cursor;       // index of next element to return
>     int lastRet = -1; // index of last element returned; -1 if no such
>     int expectedModCount = modCount;
>
>     Itr() {
>         // avoid to generate a synthetic accessor constructor
>     }
> }
>
>
> regards,
> Rémi
>
> ------------------------------------------------------------------------
>
> *From: *"Vitaly Davidovich"
> *To: *"Krystal Mok"
> *Cc: *"hotspot compiler"
> *Sent: *Monday, September 12, 2016 22:15:41
> *Subject: *Re: Odd interaction between ArrayList$Itr and
> Escape Analysis
>
>
>
> On Mon, Sep 12, 2016 at 3:56 PM, Krystal Mok wrote:
>
> On Mon, Sep 12, 2016 at 12:38 PM, Vitaly Davidovich wrote:
>
> It seems odd to me as well why inlining won't force
> load the missing class(es). If we're inlining, it
> means the method itself or the call chain it's part
> of is hot - failing to inline can have negative
> side-effects, like this example. I suppose there
> must be a good reason why it doesn't do this though?
>
>
> That's because we can't. The JIT compilers are running
> on their own threads, and they're not real "Java
> threads". So they are not allowed to run arbitrary Java
> code. But Java class loading may involve running
> arbitrary Java code, e.g. the ClassLoader.loadClass()
> upcall.
> Force class loading can be done on the triggering side
> (for the top-level method), because compilation tasks
> are triggered from real Java threads, and they're
> allowed to run arbitrary Java code.
>
> I see, makes sense. Perhaps there can be an option to turn
> on loading of required types in the entire compilation unit,
> after all inlining is done (and therefore make the unloaded
> types not be barriers for inlining). I'd personally prefer
> that over having odd performance differences.
>
>
> - Kris
>
> Thanks Kris.
>
>
>
> --
> Sent from my phone
>

From zoltan.majo at oracle.com Wed Sep 14 12:19:15 2016
From: zoltan.majo at oracle.com (Zoltán Majó)
Date: Wed, 14 Sep 2016 14:19:15 +0200
Subject: RFR(S): 8159611: C2: ArrayCopy elimination skips required parameter checks
In-Reply-To: <8e399624-8e67-ebe6-d348-7691690532e8@oracle.com>
References: <57B2A380.6000408@oracle.com> <41851a79-5ffe-2b9d-504a-6a2301de5384@oracle.com> <7ce01d28-13f5-098a-9898-080f8258881d@oracle.com> <8e399624-8e67-ebe6-d348-7691690532e8@oracle.com>
Message-ID:

Hi Volker,

On 09/13/2016 05:04 PM, Zoltán Majó wrote:
> Hi Volker,
>
>
> On 09/12/2016 06:35 PM, Volker Simonis wrote:
>> Sorry for the long delay...
>
> thank you for spending more time on this bug and also for the detailed
> description of the way your solution works!
>
>>
>> Here's my new version:
>>
>> http://cr.openjdk.java.net/~simonis/webrevs/2016/8159611.v3/
>
> That looks good to me.
>
> I did a preliminary performance evaluation with Octane-Gbemu and
> Octane-PdfJS, results look good on all platforms. Let me now do a
> more detailed evaluation. I'll get back to you once the results are
> available.

The performance evaluation with webrev.v3 is complete now. The change does not cause any performance regressions (neither for SPECjvm2008 nor for Octane).

Once you update the code according to Vladimir's suggestions, I can look again.

Thank you!

Best regards,

Zoltan

>
> Thank you!
>
> Best regards,
>
>
> Zoltan
>
>>
>> I've actually changed PhaseMacroExpand::expand_arraycopy_node() such
>> that it calls generate_arraycopy() with 'length_never_negative' set to
>> true if EliminateAllocations is true (in this case we already checked
>> in LibraryCallKit::inline_arraycopy() that 'length' is not negative).
>> This way I could leave generate_arraycopy() untouched.
>>
>> The generated code now looks as follows:
>>
>> Original version (without 'length < 0' check):
>>
>> 0a7   B5: #   B17 B6 <- B4  Freq: 0,999998
>> 0a7       cmpl    R9, R11   # unsigned
>> 0aa       jb,u    B17  P=0,000001 C=-1,000000
>> ...
>> 0da   B7: #   B18 B8 <- B6 B12 B13  Freq: 0,999997
>> 0da       movl    R11, [rsp + #8]   # spill
>> 0df       testl   R11, R11
>> 0e2       jle     B18  P=0,000001 C=-1,000000
>> ...
>> 0e8   B8: #   B9 <- B7  Freq: 0,999996
>> 0f9       call_leaf_nofp,runtime  oop_disjoint_arraycopy
>> ...
>> 106   B9: #   B10 <- B8 B18 B20  Freq: 0,999997
>> 113       ret
>> ...
>> 184   B17: #  N1 <- B4 B5  Freq: 2,01328e-06
>> 193       call,static  wrapper for:
>>           uncommon_trap(reason='intrinsic_or_type_checked_inlining'
>>           action='make_not_entrant' debug_id='0')
>>
>> 19d   B18: #  B9 B19 <- B7  Freq: 9,99997e-07
>> 19d       testl   R11, R11
>> 1a0       jge     B9  P=0,999999 C=-1,000000
>> 1a0
>> 1a6   B19: #  B22 B20 <- B18  Freq: 9,99997e-13
>> 1a6       movq    RSI, R8   # spill
>> 1a9       movl    RDX, #1   # int
>> 1ae       movq    RCX, R10  # spill
>> 1b1       movl    R8, #1    # int
>> 1b7       movl    R9, R11   # spill
>>           nop     # 1 bytes pad for loops and calls
>> 1bb       call,static  wrapper for: slow_arraycopy
>>
>> In B5 there's a check if 'offset+length' is still in the array range.
>> If not we jump to the uncommon trap in B17.
>> In B7 there's the first check from
>> PhaseMacroExpand::generate_arraycopy() (i.e.
>> generate_nonpositive_guard()). If 'length' is less than or equal to
>> zero we jump to B18 where there's the second check from
>> PhaseMacroExpand::generate_arraycopy() (i.e.
>> generate_negative_guard()). If 'length' is zero, we jump to B9 and
>> return. Otherwise we fall into B19 from where we call slow_arraycopy.
>> slow_arraycopy (which is generated in ObjArrayKlass::copy_array()) will
>> throw an AIOOB exception if 'length' is negative.
>>
>> The new version now looks as follows:
>>
>> 0a2   B5: #   B19 B6 <- B4  Freq: 0,999998
>> 0a2       cmpl    R10, RCX  # unsigned
>> 0a5       jb,u    B19  P=0,000001 C=-1,000000
>> 0a5
>> 0ab   B6: #   B20 B7 <- B5  Freq: 0,999997
>> 0ab       movl    R10, [rsp + #0]   # spill
>> 0af       testl   R10, R10
>> 0b2       jl      B20  P=0,000001 C=-1,000000
>> 0b2
>> ...
>> 0e2   B8: #   B10 B9 <- B7 B13 B14  Freq: 0,999996
>> 0e2       testl   R10, R10
>> 0e5       je,s    B10  P=0,000001 C=-1,000000
>> ...
>> 0e7   B9: #   B10 <- B8  Freq: 0,999995
>> 0f8       call_leaf_nofp,runtime  oop_disjoint_arraycopy
>> ...
>> 105   B10: #  B11 <- B9 B8  Freq: 0,999996
>> 112       ret
>> ...
>> 18e   B19: #  B20 <- B5  Freq: 9,99998e-07
>> 192   B20: #  N1 <- B18 B19 B6  Freq: 3,01327e-06
>> 1a3       call,static  wrapper for:
>>           uncommon_trap(reason='intrinsic_or_type_checked_inlining'
>>           action='make_not_entrant' debug_id='0')
>>
>> B5 is like before, but is now followed by the extra check for 'length'
>> being not negative in B6. In B8 we now have the first check (i.e.
>> generate_negative_guard()) from
>> PhaseMacroExpand::generate_arraycopy(). It directly checks if 'length'
>> is zero and jumps to B10 (i.e. returns) if so. Otherwise we fall
>> directly into oop_disjoint_arraycopy(). There's no need to check for
>> 'length' being negative and calling 'slow_arraycopy' because this case
>> is already handled before now (in B6).
>>
>> Is this OK now?
>>
>> Thank you and best regards,
>> Volker
>>
>>
>> On Fri, Aug 26, 2016 at 3:51 AM, Vladimir Kozlov wrote:
>>> Looks good.
>>>
>>> Check does not fold because it is different: LT vs LE.
>>>
>>> Actually there are 3 checks together with yours (see
>>> PhaseMacroExpand::generate_arraycopy()):
>>>
>>>   Node* not_pos = generate_nonpositive_guard(ctrl, copy_length, length_never_negative);
>>>   if (not_pos != NULL) {
>>>     Node* local_ctrl = not_pos, *local_io = *io;
>>>     MergeMemNode* local_mem = MergeMemNode::make(mem);
>>>     transform_later(local_mem);
>>>
>>>     // (6) length must not be negative.
>>>     if (!length_never_negative) {
>>>       generate_negative_guard(&local_ctrl, copy_length, slow_region);
>>>     }
>>>
>>> I think the only way to avoid this is to modify code in generate_arraycopy()
>>> when EliminateAllocations is true. In such case you need to generate only
>>> length == 0 check.
>>>
>>> Thanks,
>>> Vladimir
>>>
>>>
>>> On 8/25/16 10:03 AM, Volker Simonis wrote:
>>>> On Tue, Aug 16, 2016 at 11:49 PM, Vladimir Kozlov wrote:
>>>>> Not generating exception is definitely bug.
>>>>>
>>>>> First, about test case. It would be nice if it also verifies other
>>>>> IndexOutOfBoundsException cases.
>>>>>
>>>> I've extended the test case. See:
>>>>
>>>> http://cr.openjdk.java.net/~simonis/webrevs/2016/8159611.v2/
>>>>
>>>> With the new test I've caught another problem in C1 (only on x86 and
>>>> s390, but that's not in the OpenJDK yet :).
>>>>
>>>> LIR_Assembler::emit_arraycopy() had a shortcut for length==0 which
>>>> prevented the throwing of an ArrayStoreException if src and dst arrays
>>>> have incompatible type (see do_test2() in the new regression test).
>>>> Note that this is a different error from 8160591 and not fixed by the
>>>> change for 8160591.
>>>>
>>>> I've also moved the new check after the offset + length check as
>>>> suggested by you (see new webrev).
>>>>
>>>> Unfortunately, the new check is still not eliminated. Here's how it
>>>> looks:
Here's how it >>>> looks: >>>> >>>> 0ae B6: # B20 B7 <- B5 Freq: 0,999997 >>>> 0ae movl R9, [rsp + #0] # spill >>>> 0b2 testl R9, R9 >>>> 0b5 jl B20 P=0,000001 C=-1,000000 >>>> 0b5 >>>> 0bb B7: # B12 B8 <- B6 Freq: 0,999996 >>>> 0bb movl R11, [R10 + #8 (8-bit)] # compressed klass ptr >>>> 0bf decode_klass_not_null RAX,R11 >>>> 0cc movl RBX, [RAX + #16 (8-bit)] # int >>>> 0cf movslq RCX, RBX # i2l >>>> 0d2 movq RSI, precise klass [Ljava/lang/Object;: >>>> 0x00007ff1080320d0:Constant:exact * # ptr >>>> 0dc movq RCX, [RSI + RCX] # class >>>> 0e0 cmpq RAX, RCX # ptr >>>> 0e3 jne,us B12 P=0,170000 C=-1,000000 >>>> 0e3 >>>> 0e5 B8: # B21 B9 <- B7 B13 B14 Freq: 0,999996 >>>> 0e5 testl R9, R9 >>>> 0e8 jle B21 P=0,000001 C=-1,000000 >>>> >>>> As you can see 'testl R9, R9' is executed two times. >>>> >>>> I've even tried to move the new check after the subtype check, but >>>> that doesn't helps either: >>>> >>>> 0da B7: # B20 B8 <- B6 B13 B14 Freq: 0,999997 >>>> 0da movl R11, [rsp + #8] # spill >>>> 0df testl R11, R11 >>>> 0e2 jl B20 P=0,000001 C=-1,000000 >>>> 0e2 >>>> 0e8 B8: # B10 B9 <- B7 Freq: 0,999996 >>>> 0e8 testl R11, R11 >>>> 0eb jle,s B10 P=0,000001 C=-1,000000 >>>> >>>> Any idea how this could be fixed? >>>> >>>> Thanks, >>>> Volker >>>> >>>> PS: and I still don't have a reproducible benchmark which shows a >>>> regression with my change... >>>> >>>> >>>>> Actually additional dynamic check will help in case of negative >>>>> length is >>>>> know during compilation. The allocation code will be eliminated very >>>>> early >>>>> instead of waiting macro expansion: >>>>> >>>>> int length = >>>>> alloc->in(AllocateNode::ALength)->find_int_con(-1); >>>>> if (length < 0) { >>>>> NOT_PRODUCT(fail_eliminate = "Array's size is not >>>>> constant";) >>>>> can_eliminate = false; >>>>> } >>>>> >>>>> About additional length check in your new test. I think it may be >>>>> collapsed >>>>> with preceding check since it is generated after other checks. 
>>>>> So I would suggest moving it after the offset + length check. >>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> >>>>> On 8/16/16 7:57 AM, Volker Simonis wrote: >>>>>> >>>>>> On Tue, Aug 16, 2016 at 7:24 AM, Tobias Hartmann >>>>>> wrote: >>>>>>> >>>>>>> Hi Volker, >>>>>>> >>>>>>> thanks for taking care of this issue! >>>>>>> >>>>>>> Did you check what happens if the allocation is not eliminated and >>>>>>> macro >>>>>>> expansion phase emits another negative guard? Are the checks >>>>>>> merged? >>>>>>> >>>>>> It depends. I just saw that in some cases the regression test worked >>>>>> before, because the length check was done in >>>>>> SharedRuntime::slow_arraycopy_C(). So in that case there's obviously >>>>>> nothing that can be merged. But the test case is obviously a >>>>>> degenerate example anyway, so I don't think that's a problem. >>>>>> >>>>>> If I do a more real-world example like this where the arraycopy >>>>>> cannot >>>>>> be eliminated because one of its arguments escapes: >>>>>> >>>>>> public static boolean do_test2(int length, Object[] dest) { >>>>>> try { >>>>>> System.arraycopy(new Object[10], 1, dest, 1, length); >>>>>> return false; >>>>>> } catch (IndexOutOfBoundsException e) { >>>>>> return true; >>>>>> } >>>>>> } >>>>>> >>>>>> and call it with: >>>>>> >>>>>> do_test2(8, new Object[10]) >>>>>> >>>>>> the generated code for do_test2() unfortunately contains one more >>>>>> check now with my change (the 'length' field is in [rsp + #0]): >>>>>> >>>>>> 0a2 B4: # B18 B5 <- B3 Freq: 0,999999 >>>>>> 0a2 movl R9, [rsp + #0] # spill >>>>>> 0a6 testl R9, R9 >>>>>> 0a9 jl B18 P=0,000001 C=-1,000000 >>>>>> 0a9 >>>>>> 0af B5: # B18 B6 <- B4 Freq: 0,999998 >>>>>> 0af movl RBX, R9 # spill >>>>>> 0b2 incl RBX # int >>>>>> 0b4 cmpl RBX, #10 # unsigned >>>>>> 0b7 jnbe,u B18 P=0,000001 C=-1,000000 >>>>>> >>>>>> The generated code before my change looked like this (again the >>>>>> 'length' field is in [rsp + #0]): >>>>>> >>>>>> 0a1 B4: # B17 B5 <- B3 Freq: 
0,999999 >>>>>> 0a1 movl R11, [rsp + #8] # spill >>>>>> 0a6 incl R11 # int >>>>>> 0a9 cmpl R11, #10 # unsigned >>>>>> 0ad jnbe,u B17 P=0,000001 C=-1,000000 >>>>>> >>>>>> It seems that the 'length' check has been completely eliminated >>>>>> before. >>>>>> >>>>>> So I need to do some more tests to understand why the new check >>>>>> isn't >>>>>> eliminated. >>>>>> >>>>>> Do you think the new check results in a performance regression? Have >>>>>> you run some benchmarks? >>>>>> >>>>>>> I would prefer brackets around the if body but you don't need to >>>>>>> send >>>>>>> another webrev: >>>>>>> if (EliminateAllocations) { >>>>>>> generate_negative_guard(length, slow_region); >>>>>>> } >>>>>> >>>>>> >>>>>> Yes, I agree. >>>>>> >>>>>>> Best regards, >>>>>>> Tobias >>>>>>> >>>>>>> On 12.08.2016 21:13, Volker Simonis wrote: >>>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> can I please have a review and sponsor for the following fix: >>>>>>>> >>>>>>>> http://cr.openjdk.java.net/~simonis/webrevs/2016/8159611 >>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8159611 >>>>>>>> >>>>>>>> >>>>>>>> We are inserting several checks for the arguments of >>>>>>>> System.arraycopy() in LibraryCallKit::inline_arraycopy() before >>>>>>>> intrinsifying the call in LibraryCallKit::inline_arraycopy. >>>>>>>> However the >>>>>>>> check for the 'length' argument of arraycopy is postponed to the >>>>>>>> macro >>>>>>>> expansion phase in PhaseMacroExpand::generate_arraycopy(). >>>>>>>> >>>>>>>> But if we are running with EscapeAnalysis and >>>>>>>> EliminateAllocations, >>>>>>>> the array allocations inside a call to System.arraycopy() may get >>>>>>>> eliminated and thus the complete call to System.arraycopy() >>>>>>>> will be >>>>>>>> removed (see PhaseMacroExpand::process_users_of_allocation). In >>>>>>>> this >>>>>>>> case the extra 'length' check won't be added by >>>>>>>> PhaseMacroExpand::generate_arraycopy() any more because macro >>>>>>>> expansion happens after the elimination of macro nodes. 
>>>>>>>> >>>>>>>> In such a case it may happen that System.arraycopy() will silently >>>>>>>> accept an invalid (i.e. negative) 'length' parameter, although it >>>>>>>> should actually throw an IndexOutOfBoundsException. >>>>>>>> >>>>>>>> The fix is simple: also insert a check for the length field in >>>>>>>> LibraryCallKit::inline_arraycopy() if we are running with >>>>>>>> EliminateAllocations. >>>>>>>> >>>>>>>> Regards, >>>>>>>> Volker >>>>>>>> > From tom.rodriguez at oracle.com Wed Sep 14 15:04:46 2016 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Wed, 14 Sep 2016 08:04:46 -0700 Subject: RFR: 8165434: [JVMCI] remove uses of setAccessible In-Reply-To: <9B5CFF51-7C4E-44E8-B743-B37411E3C77C@oracle.com> References: <864558C5-C2AD-4D6B-BB6F-568F00BBE28A@twitter.com> <6224CDA0-63E6-442C-BD13-732208FA75A2@oracle.com> <999A422E-6CF6-45C5-955B-D58745DBB456@twitter.com> <21860311-D6E9-482B-B0A0-F488A516A1D3@oracle.com> <9B5CFF51-7C4E-44E8-B743-B37411E3C77C@oracle.com> Message-ID: <280B564D-C015-4AF1-8B1E-B16A8BE0E7A0@oracle.com> The updated test looks fine to me. tom > On Sep 13, 2016, at 3:33 PM, Doug Simon wrote: > > JPRT testing revealed a test bug in FindUniqueConcreteMethodTest.java where CompilerToVM.findUniqueConcreteMethod was being called with a default method. This is not supported by HotSpot which is why the only other usage of this private API avoids it: > > http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/ec36e3e03d65/src/jdk.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotResolvedJavaMethodImpl.java#l385 > > The offending test in FindUniqueConcreteMethodTest has been commented out. 
> > -Doug > >> On 08 Sep 2016, at 15:12, Doug Simon wrote: >> >>> >>> On 07 Sep 2016, at 19:52, Christian Thalinger wrote: >>> >>>> >>>> On Sep 7, 2016, at 2:29 AM, Doug Simon wrote: >>>> >>>>> >>>>> On 06 Sep 2016, at 20:12, Christian Thalinger wrote: >>>>> >>>>> >>>>>> On Sep 5, 2016, at 6:45 AM, Doug Simon wrote: >>>>>> >>>>>> JVMCI currently uses java.lang.reflect.AccessibleObject.setAccessible to get at private internals of certain JDK objects (e.g. java.lang.reflect.Method::slot). In light of changes around java.lang.reflect.AccessibleObject::setAccessible at http://openjdk.java.net/jeps/261, this may require extra command line options at some point. To avoid that, I've removed all uses of setAccessible in JVMCI. >>>>>> >>>>>> http://cr.openjdk.java.net/~dnsimon/8165434/ >>>>> >>>>> src/jdk.vm.ci/share/classes/jdk.vm.ci.meta/src/jdk/vm/ci/meta/ModifiersProvider.java >>>>> >>>>> + int BRIDGE = 0x0040; >>>>> + int VARARGS = 0x0080; >>>>> + int SYNTHETIC = 0x1000; >>>>> + int ANNOTATION = 0x2000; >>>>> + int ENUM = 0x4000; >>>>> I wish we could avoid that. We can't use this stuff because it's HotSpot-dependent, right? >>>>> + assert ModifiersProvider.SYNTHETIC == getConstant("JVM_ACC_SYNTHETIC", Integer.class); >>>>> + assert ModifiersProvider.ANNOTATION == getConstant("JVM_ACC_ANNOTATION", Integer.class); >>>>> + assert ModifiersProvider.BRIDGE == getConstant("JVM_ACC_BRIDGE", Integer.class); >>>>> + assert ModifiersProvider.VARARGS == getConstant("JVM_ACC_VARARGS", Integer.class); >>>>> + assert ModifiersProvider.ENUM == getConstant("JVM_ACC_ENUM", Integer.class); >>>>> What if we convert these constants to interface methods and the VM-dependent part has to implement them? Or maybe even keep the fields and assign them via interface methods. >>>> >>>> Following your suggestion, I've factored out these VM dependent flags to a new HotSpotModifiers class: >>>> >>>> http://cr.openjdk.java.net/~dnsimon/8165434.v2/ >>> >>> Excellent. One question? 
I noticed HotSpotModifiers is an interface but no other class implements it. Is there a reason for it being an interface? >> >> Nope. It's now a class. >> >>> >>> Only nit, remove 2011: >>> 2 * Copyright (c) 2011, 2016, Oracle and/or its affiliates. All rights reserved. >> >> Fixed. >> >> -Doug > From vitalyd at gmail.com Wed Sep 14 15:46:03 2016 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Wed, 14 Sep 2016 11:46:03 -0400 Subject: MaxBCEAEstimateSize and inlining clarification In-Reply-To: <57D85E4A.2080302@oracle.com> References: <00C16B65-A85F-491E-9384-1172735D9952@gmail.com> <57D85E4A.2080302@oracle.com> Message-ID: Hi Vladimir, Do OSR compilations run EA? I'm looking at some code (roughly) like this: while (true) { for (Entry<...> e : concurrentHashMap.entrySet()) { // e does not escape } Thread.sleep(...); } I see the enclosing method OSR compiled, but the iterator and entry aren't eliminated. Makes me wonder if OSR doesn't do EA. Is that the case? Thanks On Tuesday, September 13, 2016, Vladimir Kozlov wrote: > If allocation is done locally in loop it could be SR (but not guaranteed): > > for () { > Foo f = new Foo(); > } > > "Currently" we can't SR it if there is merge: > > Foo f = new Foo(); > for () { > f = new Foo(); > } > x = f.x; > > Also we can't SR an array if it has index access because we can't map > loads/stores to concrete element: > > int[] a = new int[3]; > for (i) { > x = a[i] > } > > If elements are accessed without index (using array to pass or return > several values) or a loop is fully unrolled we can SR it: > > x0 = a[0]; > x1 = a[1]; > x2 = a[2]; > > Regards, > Vladimir > > On 9/13/16 12:55 PM, Ruslan Cheremin wrote: > >> There was also another thread a few months back where I was asking why a >>> small local array allocation wasn't scalarized, and the answer there was >>> ordering between loop unrolling and EA passes (I can >>> >> dig up that thread if you're interested). 
>> >> It would be very nice, please -- I've tried to google it by myself >> (because you've noted it already in the thread) but wasn't able to guess >> right keywords :) >> >> >> 2016-09-13 22:44 GMT+03:00 Vitaly Davidovich > vitalyd at gmail.com>>: >> >> >> >> On Tue, Sep 13, 2016 at 3:32 PM, Ruslan Cheremin > > wrote: >> >> >how it can be made stable to the point where you can rely/depend >> on it for performance. >> >> Well, same can be said about any JIT optimization -- (may be it >> is time to rename dynamic runtime to stochastic runtime?). Personally I see >> SR to be the same order of stability as inlining. >> Actually, apart from few SR-specific issues (like with merge >> points), EA/SR mostly follow inlining: if you have enough scope inlined >> you'll have, say, 80% chance of SR. From my perspective it >> is inlining which is so surprisingly unstable. >> >> Yeah, I'd agree. The difference, in my mind, is failing to inline a >> function may not have as drastic performance implications as failing to >> eliminate temporaries. >> >> >> BTW: have you considered to share you experience with EA/SR >> pitfalls? Even if "increase likelihood" is the best option available -- >> there are still very little information about it in the net. >> >> I'm kind of doing that via the few emails on this list :). I think >> you pretty much covered the biggest (apparent) flake in the equation - >> inlining, which can fail for all sorts of different >> reasons. Beyond that, there's the control flow insensitive aspect of >> the EA, which is tangentially related to inlining (or lack thereof). >> >> There was also another thread a few months back where I was asking >> why a small local array allocation wasn't scalarized, and the answer there >> was ordering between loop unrolling and EA passes (I >> can dig up that thread if you're interested). 
The bizarre thing >> there was the loop operation was folded into a constant, and the compiled >> method was returning a constant value, but the array >> allocation was left behind (although it wasn't needed). >> >> I agree that there isn't much information about EA in Hotspot >> (there's a lot of handwaving and inaccuracies online). In particular, it'd >> be nice if the performance wiki had a section on making >> user code play well with EA (just like it has guidance on some other >> JIT aspects currently). >> >> >> ---- >> Ruslan >> >> >> >> 2016-09-13 21:33 GMT+03:00 Vitaly Davidovich > >: >> >> >> >> On Tue, Sep 13, 2016 at 2:25 PM, Ruslan Cheremin < >> cheremin at gmail.com > wrote: >> >> >That's my understanding as well (and matches what I'm >> seeing in some synthetic test harnesses). >> >> Ok, I just tried to clear it out, because it is not the >> first time I see BCEA... noted in context of scalar replacement, and I >> start to doubt my eyes :) >> >> >t's pretty brittle, sadly, and more importantly, >> unstable. >> >> Making similar experiments I see the same. E.g. >> HashMap.get(TupleKey) lookup can be successfully scalarized 99% cases, but >> scalarization become broken once with slightly changed key >> generation schema -- because hashcodes distribution >> becomes worse, and HashMap buckets start to convert themself to TreeBins, >> and TreeBins code is much harder task for EA. >> >> Another can of worms is mismatch between different >> inlining heuristics. E.g. FreqInlineSize and InlineSmallCode thresholds may >> give different decision for the same piece of code, and >> taken inlining decision depends on was method already >> compiled or not -- which depends on thinnest details of initialization >> order and execution profile. This scenarios becomes rare in >> 1.8 with InlineSmallCode increased, but I'm not sure they >> are gone... 
>> >> Currently, I'm starting to think code needs to be >> specifically written for EA/SR in mind to be more-or-less stably >> scalarized. I.e. you can't get it for free (or it will be unstable). >> >> I'm not sure this is practical, to be honest, at least for a >> big enough application. I've long considered EA (and scalar replacement) >> as a bonus optimization, and never to rely on it if >> the allocations would hurt otherwise. I'm just a bit >> surprised *just* how unstable it appears to be, in the "simplest" of cases. >> >> I think code can be written to increase likelihood of scalar >> replacement, but I just can't see how it can be made stable to the point >> where you can rely/depend on it for performance. >> >> >> ---- >> Ruslan >> >> >> 2016-09-13 20:51 GMT+03:00 Vitaly Davidovich < >> vitalyd at gmail.com >: >> >> >> >> On Tuesday, September 13, 2016, Cheremin Ruslan < >> cheremin at gmail.com > wrote: >> >> > I'm seeing some code that iterates over a >> ConcurrentHashMap's entrySet that allocates tens of GB of CHM$MapEntry >> objects even though they don't escape >> >> >> I'm a bit confused: I was sure BCEA-style params >> do affect EA, but don't affect scalar replacement. With bcEscapeAnalyser >> you can get (sort of) inter-procedural EA, but this >> only allows you to have more allocations >> identified as ArgEscape instead of GlobalEscape. But you can't get more >> NoEscape without real inlining. ArgEscape (afaik) is used only >> for synchronization removals in HotSpot, not for >> scalar replacements. >> >> Am I incorrect? >> >> That's my understanding as well (and matches what I'm >> seeing in some synthetic test harnesses). >> >> I'm generally seeing a lot of variability in scalar >> replacement in particular, all driven by profile data. 
HashMap<Integer, ...>::get(int) sometimes works at eliminating the box >> and sometimes doesn't - the difference appears to be >> whether Integer::equals is inlined or not, which in turn depends on whether >> the lookup finds something or not and whether the >> number of successful lookups reaches >> compilation threshold. It's pretty brittle, sadly, and more importantly, unstable. >> >> >> >> ---- >> Ruslan >> >> >> >> -- >> Sent from my phone >> >> >> >> >> >> >> -- Sent from my phone
From vladimir.x.ivanov at oracle.com Wed Sep 14 16:13:50 2016 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 14 Sep 2016 19:13:50 +0300 Subject: MaxBCEAEstimateSize and inlining clarification In-Reply-To: References: <00C16B65-A85F-491E-9384-1172735D9952@gmail.com> <57D85E4A.2080302@oracle.com> Message-ID: <45c9ea9a-872a-b5f7-28fc-c1211ef8f044@oracle.com>
> Do OSR compilations run EA? I'm looking at some code (roughly) like this: > > while (true) { > for (Entry<...> e : concurrentHashMap.entrySet()) { > // e does not escape > } > Thread.sleep(...); > } > > I see the enclosing method OSR compiled, but the iterator and entry > aren't eliminated. Makes me wonder if OSR doesn't do EA. Is that the case? EA is performed for OSR compilations, but keep in mind that the entry point for OSR compilation is the back branch in the loop. The whole JVM state is passed as the argument, so EA can only detect that something is local for the duration of a single loop iteration, not when something temporary is allocated for the whole loop. It means that the iterator object can't be eliminated in OSR compilation. Probably, it causes the element object to escape as well. 
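To make the OSR point concrete, here is a minimal sketch of one possible restructuring; it is an illustration under stated assumptions, not code from this thread. The class `OsrLoopSketch` and the methods `sumEntries` and `poll` are hypothetical names. The idea is only that the iterator's whole lifetime sits inside a small method that is entered through its normal entry point, so no iterator is live across the back branch of the long-running outer loop. Whether the allocations are actually eliminated still depends on inlining decisions, and the helper itself may be OSR-compiled first until it has been invoked often enough to get a standard compilation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class OsrLoopSketch {
    // Hypothetical helper: the iterator and the entries are created and
    // dropped inside this method, so a standard (non-OSR) compilation of it
    // sees their whole lifetime.
    static long sumEntries(ConcurrentHashMap<String, Integer> map) {
        long sum = 0;
        for (Map.Entry<String, Integer> e : map.entrySet()) {
            sum += e.getValue(); // e is only read here and is not stored anywhere
        }
        return sum;
    }

    // The long-running outer loop only calls the helper. If this method is
    // OSR-compiled, no iterator is part of the state at its back branch.
    static long poll(ConcurrentHashMap<String, Integer> map, int rounds) {
        long total = 0;
        for (int i = 0; i < rounds; i++) {
            total += sumEntries(map);
        }
        return total;
    }

    public static void main(String[] args) {
        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        System.out.println(poll(map, 20_000));
    }
}
```

Running this with allocation profiling or -XX:+PrintInlining on a given JVM is the only way to tell whether the iterator and entry objects really go away; the sketch only removes the structural obstacle described above.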
Best regards, Vladimir Ivanov
From vitalyd at gmail.com Wed Sep 14 16:15:41 2016 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Wed, 14 Sep 2016 12:15:41 -0400 Subject: MaxBCEAEstimateSize and inlining clarification In-Reply-To: References: <00C16B65-A85F-491E-9384-1172735D9952@gmail.com> <57D85E4A.2080302@oracle.com> Message-ID:
Looking at PrintInlining output of an application with places where SR isn't happening (but should, in my mind), it appears that lots of call graphs along the path where the object "escapes" end because some part of the path fails to inline with "already compiled into a big method" failure reason. So basically we end up hitting a "black hole" along the way, and the JIT can no longer prove the object doesn't escape. I wonder how 1000 (when tiered is disabled) for 64bit was chosen as the default value for InlineSmallCode - is that still the current thinking as a good default? 
I understand the rationale for this check, but it also seems like this heuristic is somewhat problematic; how do we, for example, know that inlining that method (and whatever was inlined into it to cause it to be > InlineSmallCode) won't produce smaller machine code because more optimizations can be done? It also seems like it would be nice to force inlining if bcEscapeAnalysis estimates that some allocations can go away as a result. Also, is the size of the method already taking into account any untaken/cold code pruning that was done prior to code gen? I assume so, but just wanted to check. Finally, would it be possible to print out the actual native code size as part of the "already compiled into a big method" message? Otherwise, it's hard to say what value I should try for InlineSmallCode. Thanks P.S. When is @ForceInline going to be part of Java SE? :) -------------- next part -------------- An HTML attachment was scrubbed... URL: From vitalyd at gmail.com Wed Sep 14 16:18:19 2016 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Wed, 14 Sep 2016 12:18:19 -0400 Subject: MaxBCEAEstimateSize and inlining clarification In-Reply-To: <45c9ea9a-872a-b5f7-28fc-c1211ef8f044@oracle.com> References: <00C16B65-A85F-491E-9384-1172735D9952@gmail.com> <57D85E4A.2080302@oracle.com> <45c9ea9a-872a-b5f7-28fc-c1211ef8f044@oracle.com> Message-ID: On Wed, Sep 14, 2016 at 12:13 PM, Vladimir Ivanov < vladimir.x.ivanov at oracle.com> wrote: > Do OSR compilations run EA? I'm looking at some code (roughly) like this: >> >> while (true) { >> for (Entry<...> e : concurrentHashMap.entrySet()) { >> // e does not escape >> } >> Thread.sleep(...); >> } >> >> I see the enclosing method OSR compiled, but the iterator and entry >> aren't eliminated. Makes me wonder if OSR doesn't do EA. Is that the >> case? >> > > EA is performed for OSR compilations, but keep in mind that the entry > point for OSR compilation is the back branch in the loop. 
> > The whole JVM state is passed as the argument, so EA can only detect that > something is local for the duration of a single loop iteration, not when > something temporary is allocated for the whole loop. > > It means that the iterator object can't be eliminated in OSR compilation. > Probably, it causes the element object to escape as well. > Darn! Ok, thanks Vladimir - that would explain what I'm seeing. So basically need to find a way to avoid OSR compiles for cases like this. > > Best regards, > Vladimir Ivanov > > On Tuesday, September 13, 2016, Vladimir Kozlov >> > wrote: >> >> If allocation is done locally in loop it could be SR (but not >> guaranteed): >> >> for () { >> Foo f = new Foo(); >> } >> >> "Currently" we can't SR it if there is merge: >> >> Foo f = new Foo(); >> for () { >> f = new Foo(); >> } >> x = f.x; >> >> Also we can't SR an array if it has index access because we can't >> map loads/stores to concrete element: >> >> int[] a = new int[3]; >> for (i) { >> x = a[i] >> } >> >> If elements are accessed without index (using array to pass or >> return several values) or a loop is fully unrolled we can SR it: >> >> x0 = a[0]; >> x1 = a[1]; >> x2 = a[2]; >> >> Regards, >> Vladimir >> >> On 9/13/16 12:55 PM, Ruslan Cheremin wrote: >> >> There was also another thread a few months back where I was >> asking why a small local array allocation wasn't scalarized, >> and the answer there was ordering between loop unrolling and >> EA passes (I can >> >> dig up that thread if you're interested). >> >> It would be very nice, please -- I've tried to google it by >> myself (because you've noted it already in the thread) but >> wasn't able to guess right keywords :) >> >> >> 2016-09-13 22:44 GMT+03:00 Vitaly Davidovich > >: >> >> >> >> On Tue, Sep 13, 2016 at 3:32 PM, Ruslan Cheremin >> > wrote: >> >> >how it can be made stable to the point where you can >> rely/depend on it for performance. 
>> >> Well, same can be said about any JIT optimization -- >> (may be it is time to rename dynamic runtime to stochastic >> runtime?). Personally I see SR to be the same order of stability >> as inlining. >> Actually, apart from few SR-specific issues (like with >> merge points), EA/SR mostly follow inlining: if you have enough >> scope inlined you'll have, say, 80% chance of SR. From my >> perspective it >> is inlining which is so surprisingly unstable. >> >> Yeah, I'd agree. The difference, in my mind, is failing to >> inline a function may not have as drastic performance >> implications as failing to eliminate temporaries. >> >> >> BTW: have you considered to share you experience with >> EA/SR pitfalls? Even if "increase likelihood" is the best option >> available -- there are still very little information about it in >> the net. >> >> I'm kind of doing that via the few emails on this list :). >> I think you pretty much covered the biggest (apparent) flake in >> the equation - inlining, which can fail for all sorts of different >> reasons. Beyond that, there's the control flow insensitive >> aspect of the EA, which is tangentially related to inlining (or >> lack thereof). >> >> There was also another thread a few months back where I was >> asking why a small local array allocation wasn't scalarized, and >> the answer there was ordering between loop unrolling and EA >> passes (I >> can dig up that thread if you're interested). The bizarre >> thing there was the loop operation was folded into a constant, >> and the compiled method was returning a constant value, but the >> array >> allocation was left behind (although it wasn't needed). >> >> I agree that there isn't much information about EA in >> Hotspot (there's a lot of handwaving and inaccuracies online). >> In particular, it'd be nice if the performance wiki had a >> section on making >> user code play well with EA (just like it has guidance on >> some other JIT aspects currently). 
>> >> >> ---- >> Ruslan >> >> >> >> 2016-09-13 21:33 GMT+03:00 Vitaly Davidovich >> >: >> >> >> >> On Tue, Sep 13, 2016 at 2:25 PM, Ruslan Cheremin >> > wrote: >> >> >That's my understanding as well (and matches >> what I'm seeing in some synthetic test harnesses). >> >> Ok, I just tried to clear it out, because it is >> not the first time I see BCEA... noted in context of scalar >> replacement, and I start to doubt my eyes :) >> >> >t's pretty brittle, sadly, and more >> importantly, unstable. >> >> Making similar experiments I see the same. E.g. >> HashMap.get(TupleKey) lookup can be successfully scalarized 99% >> cases, but scalarization become broken once with slightly >> changed key >> generation schema -- because hashcodes >> distribution becomes worse, and HashMap buckets start to convert >> themself to TreeBins, and TreeBins code is much harder task for >> EA. >> >> Another can of worms is mismatch between >> different inlining heuristics. E.g. FreqInlineSize and >> InlineSmallCode thresholds may give different decision for the >> same piece of code, and >> taken inlining decision depends on was method >> already compiled or not -- which depends on thinnest details of >> initialization order and execution profile. This scenarios >> becomes rare in >> 1.8 with InlineSmallCode increased, but I'm not >> sure they are gone... >> >> Currently, I'm starting to think code needs to >> be specifically written for EA/SR in mind to be more-or-less >> stably scalarized. I.e. you can't get it for free (or it will be >> unstable). >> >> I'm not sure this is practical, to be honest, at >> least for a big enough application. I've long considered EA >> (and scalar replacement) as a bonus optimization, and never to >> rely on it if >> the allocations would hurt otherwise. I'm just a >> bit surprised *just* how unstable it appears to be, in the >> "simplest" of cases. 
>> >> I think code can be written to increase likelihood >> of scalar replacement, but I just can't see how it can be made >> stable to the point where you can rely/depend on it for >> performance. >> >> >> ---- >> Ruslan >> >> >> 2016-09-13 20:51 GMT+03:00 Vitaly Davidovich >> >: >> >> >> >> On Tuesday, September 13, 2016, Cheremin >> Ruslan > wrote: >> >> > I'm seeing some code that iterates >> over a ConcurrentHashMap's entrySet that allocates tens of GB of >> CHM$MapEntry objects even though they don't escape >> >> >> I'm a bit confused: I was sure >> BCEA-style params do affect EA, but don't affect scalar >> replacement. With bcEscapeAnalyser you can get (sort of) >> inter-procedural EA, but this >> only allows you to have more allocations >> identified as ArgEscape instead of GlobalEscape. But you can't >> get more NoEscape without real inlining. ArgEscape (afaik) is >> used only >> for synchronization removals in HotSpot, >> not for scalar replacements. >> >> Am I incorrect? >> >> That's my understanding as well (and matches >> what I'm seeing in some synthetic test harnesses). >> >> I'm generally seeing a lot of variability in >> scalar replacement in particular, all driven by profile data. >> HashMap::get(int) sometimes works at eliminating >> the box >> and sometimes doesn't - the difference >> appears to be whether Integer::equals is inlined or not, which >> in turn depends on whether the lookup finds something or not and >> whether the >> number of successful lookups reaches >> compilation threshold. It's pretty brittle, sadly, and more >> importantly, unstable. >> >> >> >> ---- >> Ruslan >> >> >> >> -- >> Sent from my phone >> >> >> >> >> >> >> >> >> -- >> Sent from my phone >> > -------------- next part -------------- An HTML attachment was scrubbed... 
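[Editor's illustration, not part of the original thread.] The exchange above concerns temporaries such as a key object passed to HashMap.get, which C2 can scalar-replace only when the whole call chain is inlined. The following is a minimal sketch of that shape of code. The names ScalarReplacementSketch and TupleKey are hypothetical stand-ins (TupleKey is only mentioned by name in the thread), and nothing here guarantees that scalar replacement actually happens on a given JVM or profile.

```java
import java.util.HashMap;
import java.util.Map;

public class ScalarReplacementSketch {
    // Hypothetical two-field key, similar in spirit to the TupleKey mentioned in the thread.
    static final class TupleKey {
        final int a;
        final int b;

        TupleKey(int a, int b) {
            this.a = a;
            this.b = b;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof TupleKey)) {
                return false;
            }
            TupleKey other = (TupleKey) o;
            return a == other.a && b == other.b;
        }

        @Override
        public int hashCode() {
            return 31 * a + b;
        }
    }

    static final Map<TupleKey, Integer> MAP = new HashMap<>();

    // The temporary TupleKey never leaves this method, so it is a candidate for
    // scalar replacement, but only if get(), hashCode() and equals() all get inlined.
    static int lookup(int a, int b) {
        Integer v = MAP.get(new TupleKey(a, b));
        return v == null ? -1 : v;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 100; i++) {
            MAP.put(new TupleKey(i, i + 1), i);
        }
        long sum = 0;
        // Enough iterations to make lookup() hot enough for JIT compilation.
        for (int iter = 0; iter < 200_000; iter++) {
            int k = iter % 100;
            sum += lookup(k, k + 1);
        }
        System.out.println(sum);
    }
}
```

One way to observe the effect, assuming a HotSpot JVM with C2, is to compare allocation rates between a default run and a run with -XX:-EliminateAllocations, and to inspect the inlining decisions with -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining as described in the thread.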
From cheremin at gmail.com Wed Sep 14 17:50:42 2016 From: cheremin at gmail.com (Ruslan) Date: Wed, 14 Sep 2016 20:50:42 +0300 Subject: MaxBCEAEstimateSize and inlining clarification In-Reply-To: References: <00C16B65-A85F-491E-9384-1172735D9952@gmail.com> <57D85E4A.2080302@oracle.com> Message-ID: AFAIK, InlineSmallCode is 2000 in 1.8+, which makes such scenarios less frequent. ---- Ruslan > On 14 Sep 2016, at 19:15, Vitaly Davidovich wrote: > > Looking at PrintInlining output of an application with places where SR isn't happening (but should, in my mind), it appears that lots of call graphs along the path where the object "escapes" end because some part of the path fails to inline with "already compiled into a big method" failure reason. So basically we end up hitting a "black hole" along the way, and the JIT can no longer prove the object doesn't escape. > > I wonder how 1000 (when tiered is disabled) for 64bit was chosen as the default value for InlineSmallCode - is that still the current thinking as a good default? I understand the rationale for this check, but it also seems like this heuristic is somewhat problematic; how do we, for example, know that inlining that method (and whatever was inlined into it to cause it to be > InlineSmallCode) won't produce smaller machine code because more optimizations can be done? It also seems like it would be nice to force inlining if bcEscapeAnalysis estimates that some allocations can go away as a result. > > Also, is the size of the method already taking into account any untaken/cold code pruning that was done prior to code gen? I assume so, but just wanted to check. > > Finally, would it be possible to print out the actual native code size as part of the "already compiled into a big method" message? Otherwise, it's hard to say what value I should try for InlineSmallCode. > > Thanks > > P.S. When is @ForceInline going to be part of Java SE?
:) From vitalyd at gmail.com Wed Sep 14 17:52:57 2016 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Wed, 14 Sep 2016 13:52:57 -0400 Subject: MaxBCEAEstimateSize and inlining clarification In-Reply-To: References: <00C16B65-A85F-491E-9384-1172735D9952@gmail.com> <57D85E4A.2080302@oracle.com> Message-ID: On Wed, Sep 14, 2016 at 1:50 PM, Ruslan wrote: > Afaik, InlineSmallCode is 2000 in 1.8+. Which makes such scenarios not so > often > If you disable tiered compilation, it's 1000. > ---- > Ruslan > > > On 14 Sep 2016, at 19:15, Vitaly Davidovich > wrote: > > > > Looking at PrintInlining output of an application with places where SR > isn't happening (but should, in my mind), it appears that lots of call > graphs along the path where the object "escapes" end because some part of > the path fails to inline with "already compiled into a big method" failure > reason. So basically we end up hitting a "black hole" along the way, and > the JIT can no longer prove the object doesn't escape. > > > > I wonder how 1000 (when tiered is disabled) for 64bit was chosen as the > default value for InlineSmallCode - is that still the current thinking as a > good default? I understand the rationale for this check, but it also seems > like this heuristic is somewhat problematic; how do we, for example, know > that inlining that method (and whatever was inlined into it to cause it to > be > InlineSmallCode) won't produce smaller machine code because more > optimizations can be done? It also seems like it would be nice to force > inlining if bcEscapeAnalysis estimates that some allocations can go away as > a result. > > > > Also, is the size of the method already taking into account any > untaken/cold code pruning that was done prior to code gen? I assume so, but > just wanted to check. > > > > Finally, would it be possible to print out the actual native code size > as part of the "already compiled into a big method" message?
Otherwise, > it's hard to say what value I should try for InlineSmallCode. > > > > Thanks > > > > P.S. When is @ForceInline going to be part of Java SE? :) > From vladimir.kozlov at oracle.com Wed Sep 14 18:11:28 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 14 Sep 2016 11:11:28 -0700 Subject: Possible integer overflow in LIRGenerator::generate_address on SPARC and other platforms In-Reply-To: <1c3f2f5368754962a6e2b2684c6e1fa2@DEWDFE13DE14.global.corp.sap> References: <1c3f2f5368754962a6e2b2684c6e1fa2@DEWDFE13DE14.global.corp.sap> Message-ID: <57D992D0.4010009@oracle.com> CC'ing the group since I am not familiar with C1. On SPARC generate_address() is called only from LIR_Address* generate_address(LIR_Opr base, int disp, BasicType type) { return generate_address(base, LIR_OprFact::illegalOpr, 0, disp, type); } So it is not an issue. But I agree with you in general. On x86 LIRGenerator::emit_array_address() may have this problem. The only explanation I see for why we have not hit it is that the interpreter may be more careful about checking it and throws an exception. It could also be that C1 checks these values somewhere else. Thanks, Vladimir On 9/6/16 9:21 AM, Doerr, Martin wrote: > Hi Vladimir, > > I was wondering about the following code in LIRGenerator::generate_address in c1_LIRGenerator_sparc.cpp (and some other platforms): > > if (index->is_constant()) { > > disp += index->as_constant_ptr()->as_jint() << shift; > > It's fine to compute the constant in general, but disp is an int! > > Seems like the only user of this function which uses an index is Unsafe put/get where nobody has noticed it yet. > > Do you think we have to fix this in 9? > > I can open a bug if you like.
> > Best regards, > > Martin > From dean.long at oracle.com Wed Sep 14 21:26:14 2016 From: dean.long at oracle.com (dean.long at oracle.com) Date: Wed, 14 Sep 2016 14:26:14 -0700 Subject: Possible integer overflow in LIRGenerator::generate_address on SPARC and other platforms In-Reply-To: <57D992D0.4010009@oracle.com> References: <1c3f2f5368754962a6e2b2684c6e1fa2@DEWDFE13DE14.global.corp.sap> <57D992D0.4010009@oracle.com> Message-ID: For SPARC, I think this is used by LIRGenerator::do_LoadField() and do_StoreField(), so "disp" should be limited by the size of the object. I didn't find SPARC using it for Unsafe put. dl On 9/14/16 11:11 AM, Vladimir Kozlov wrote: > CC to group since I am not familiar with C1. > > On SPARC generate_address() is called only from > > LIR_Address* generate_address(LIR_Opr base, int disp, BasicType type) { > return generate_address(base, LIR_OprFact::illegalOpr, 0, disp, > type); > } > > So it is not a issue. But I agree with you in general. > > On x86 LIRGenerator::emit_array_address() may have this problem. > > The only explanation I see that we did not hit it is Interpreter may > be more careful about checking it and throw exception. > > It could be C1 check this values somewhere else. > > Thanks, > Vladimir > > On 9/6/16 9:21 AM, Doerr, Martin wrote: >> Hi Vladimir, >> >> I was wondering about the following code in >> LIRGenerator::generate_address in c1_LIRGenerator_sparc.cpp (and some >> other platforms): >> >> if (index->is_constant()) { >> >> disp += index->as_constant_ptr()->as_jint() << shift; >> >> It's fine to compute the constant in general, but disp is an int! >> >> Seems like the only user of this function which uses an index is >> Unsafe put/get where nobody has noticed it yet. >> >> Do you think we have to fix this in 9? >> >> I can open a bug if you like.
>> >> Best regards, >> >> Martin >> From david.d.leopoldseder at oracle.com Thu Sep 15 14:10:05 2016 From: david.d.leopoldseder at oracle.com (David Leopoldseder) Date: Thu, 15 Sep 2016 16:10:05 +0200 Subject: RFR: 8166125: [JVMCI] Missing JVMCI flag default values Message-ID: <57a99e49-68ad-8e21-0736-8fda150bb7f4@oracle.com> Hi, Please review this patch. Bug: During the initial commit for the JVMCI JEP some options JVMCI sets differently than c2 have been forgotten. Fix: Set the options if INCLUDE_JVMCI is true and -XX:+UseJVMCICompiler. http://cr.openjdk.java.net/~davleopo/JDK-8166125/webrev.001/ https://bugs.openjdk.java.net/browse/JDK-8166125 - david -------------- next part -------------- An HTML attachment was scrubbed... URL: From martin.doerr at sap.com Thu Sep 15 15:13:04 2016 From: martin.doerr at sap.com (Doerr, Martin) Date: Thu, 15 Sep 2016 15:13:04 +0000 Subject: Possible integer overflow in LIRGenerator::generate_address on SPARC and other platforms In-Reply-To: <57D992D0.4010009@oracle.com> References: <1c3f2f5368754962a6e2b2684c6e1fa2@DEWDFE13DE14.global.corp.sap> <57D992D0.4010009@oracle.com> Message-ID: Hi Vladimir, thanks for taking a look. I'll provide a webrev and send a RFR. Best regards, Martin -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Mittwoch, 14. September 2016 20:11 To: Doerr, Martin Cc: hotspot compiler Subject: Re: Possible integer overflow in LIRGenerator::generate_address on SPARC and other platforms CC to group since I am not familiar with C1. On SPARC generate_address() is called only from LIR_Address* generate_address(LIR_Opr base, int disp, BasicType type) { return generate_address(base, LIR_OprFact::illegalOpr, 0, disp, type); } So it is not a issue. But I agree with you in general. On x86 LIRGenerator::emit_array_address() may have this problem. The only explanation I see that we did not hit it is Interpreter may be more careful about checking it and throw exception. 
It could be C1 check this values somewhere else. Thanks, Vladimir On 9/6/16 9:21 AM, Doerr, Martin wrote: > Hi Vladimir, > > I was wondering about the following code in LIRGenerator::generate_address in c1_LIRGenerator_sparc.cpp (and some other platforms): > > if (index->is_constant()) { > > disp += index->as_constant_ptr()->as_jint() << shift; > > It's fine to compute the constant in general, but disp is an int! > > Seems like the only user of this function which uses an index is Unsafe put/get where nobody has noticed it yet. > > Do you think we have to fix this in 9? > > I can open a bug if you like. > > Best regards, > > Martin > From martin.doerr at sap.com Thu Sep 15 15:25:05 2016 From: martin.doerr at sap.com (Doerr, Martin) Date: Thu, 15 Sep 2016 15:25:05 +0000 Subject: RFR(M): 8166140: C1: Possible integer overflow in LIRGenerator::generate_address on several platforms Message-ID: <29e2b45c984248da8172cf921b7811a6@DEWDFE13DE14.global.corp.sap> Hi, as discussed with Vladimir, C1 contains code to simplify constant index/displacement addressing which uses int. However, int may overflow on 64 bit platforms. Please review the following webrev: http://cr.openjdk.java.net/~mdoerr/8166140_C1_int_overflow/webrev.00/ I'll also need a sponsor, please. Thanks and best regards, Martin
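[Editor's illustration, not part of the original thread.] The overflow discussed in this thread comes from folding a constant index into a 32-bit displacement ("disp += index << shift"). The sketch below models only the arithmetic, in Java, where int overflow wraps silently; it is not the HotSpot code, and the class and method names (DispOverflowSketch, dispAsInt, dispAsLong) are hypothetical. In the C++ original the same situation is signed overflow on an int, which is worse than a plain wraparound because it is undefined behaviour.

```java
public class DispOverflowSketch {
    // Models "disp += index << shift" with a 32-bit displacement: the shifted
    // index is truncated to 32 bits before it is added.
    static int dispAsInt(int disp, int index, int shift) {
        return disp + (index << shift);
    }

    // Same computation with a 64-bit displacement: the index is widened
    // before shifting, so no bits are lost.
    static long dispAsLong(long disp, int index, int shift) {
        return disp + ((long) index << shift);
    }

    public static void main(String[] args) {
        int index = 0x2000_0000; // 536870912, a valid positive int index
        int shift = 3;           // scale factor for 8-byte elements
        int base = 16;           // illustrative base displacement (assumption)

        System.out.println(dispAsInt(base, index, shift));  // prints 16
        System.out.println(dispAsLong(base, index, shift)); // prints 4294967312
    }
}
```

With these inputs the 32-bit version silently drops the high bit of the scaled index and yields 16, whereas the 64-bit version yields 2^32 + 16.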
From dmitrij.pochepko at oracle.com Thu Sep 15 16:57:58 2016 From: dmitrij.pochepko at oracle.com (Dmitrij Pochepko) Date: Thu, 15 Sep 2016 19:57:58 +0300 Subject: RFR: 8166146 - [Testbug] update codecache tests with minimal vm filter Message-ID: Hi, please review a small fix for 8166146 - [Testbug] update codecache tests with minimal vm filter This patch updates the codecache tests which are not applicable to the minimal VM with the respective requires expression webrev: http://cr.openjdk.java.net/~dpochepk/8166146/webrev.01/ CR: https://bugs.openjdk.java.net/browse/JDK-8166146 I've tested this change on linux-x86 using the minimal VM Thanks, Dmitrij From igor.ignatyev at oracle.com Thu Sep 15 17:00:59 2016 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Thu, 15 Sep 2016 20:00:59 +0300 Subject: RFR: 8166146 - [Testbug] update codecache tests with minimal vm filter In-Reply-To: References: Message-ID: <6B559DCA-5732-456C-9516-197B0252B452@oracle.com> Dmitrij, could you please explain why those tests are applicable for minimal vm? Thanks, --
Igor > On Sep 15, 2016, at 7:57 PM, Dmitrij Pochepko wrote: > > Hi, > > please review small fix for 8166146 - [Testbug] update codecache tests with minimal vm filter > > This patch update codecache tests which are not applicable for minimal vm with respective requires expression > > > webrev: http://cr.openjdk.java.net/~dpochepk/8166146/webrev.01/ > > CR: https://bugs.openjdk.java.net/browse/JDK-8166146 > > I've tested this change on linux-x86 using minimal vm > > > Thanks, > > Dmitrij > From dmitrij.pochepko at oracle.com Thu Sep 15 17:12:03 2016 From: dmitrij.pochepko at oracle.com (Dmitrij Pochepko) Date: Thu, 15 Sep 2016 20:12:03 +0300 Subject: RFR: 8166146 - [Testbug] update codecache tests with minimal vm filter In-Reply-To: <6B559DCA-5732-456C-9516-197B0252B452@oracle.com> References: <6B559DCA-5732-456C-9516-197B0252B452@oracle.com> Message-ID: Hi, these tests are using java.management module, which is incompatible with minimal vm(even if this modules is present in tested image). Thanks, Dmitrij > Dmitrij, > > could you please explain why those tests are applicable for minimal vm? > > Thanks, > ? 
Igor > >> On Sep 15, 2016, at 7:57 PM, Dmitrij Pochepko wrote: >> >> Hi, >> >> please review small fix for 8166146 - [Testbug] update codecache tests with minimal vm filter >> >> This patch update codecache tests which are not applicable for minimal vm with respective requires expression >> >> >> webrev: http://cr.openjdk.java.net/~dpochepk/8166146/webrev.01/ >> >> CR: https://bugs.openjdk.java.net/browse/JDK-8166146 >> >> I've tested this change on linux-x86 using minimal vm >> >> >> Thanks, >> >> Dmitrij >> From igor.ignatyev at oracle.com Thu Sep 15 17:18:58 2016 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Thu, 15 Sep 2016 20:18:58 +0300 Subject: RFR: 8166146 - [Testbug] update codecache tests with minimal vm filter In-Reply-To: References: <6B559DCA-5732-456C-9516-197B0252B452@oracle.com> Message-ID: The tests have '@modules java.management', so it's jtreg's responsibility to filter them out if this module is incompatible/unavailable. I don't think we should clutter up tests w/ unneeded directives; otherwise you will have to go and update all tests which use modules from compact3+, and you will have to update them again when someone changes which modules the minimal VM supports. In other words, this should be solved on another level: the test execution system, the test harness, anywhere but in the tests. They already declare everything they depend on. Thanks, -- Igor > On Sep 15, 2016, at 8:12 PM, Dmitrij Pochepko wrote: > > Hi, > > these tests are using java.management module, which is incompatible with minimal vm(even if this modules is present in tested image). > > Thanks, > Dmitrij >> Dmitrij, >> >> could you please explain why those tests are applicable for minimal vm? >> >> Thanks, >> --
Igor >> >>> On Sep 15, 2016, at 7:57 PM, Dmitrij Pochepko wrote: >>> >>> Hi, >>> >>> please review small fix for 8166146 - [Testbug] update codecache tests with minimal vm filter >>> >>> This patch update codecache tests which are not applicable for minimal vm with respective requires expression >>> >>> >>> webrev: http://cr.openjdk.java.net/~dpochepk/8166146/webrev.01/ >>> >>> CR: https://bugs.openjdk.java.net/browse/JDK-8166146 >>> >>> I've tested this change on linux-x86 using minimal vm >>> >>> >>> Thanks, >>> >>> Dmitrij >>> > From vladimir.kozlov at oracle.com Thu Sep 15 17:49:35 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 15 Sep 2016 10:49:35 -0700 Subject: RFR: 8166146 - [Testbug] update codecache tests with minimal vm filter In-Reply-To: References: Message-ID: Good. thanks, Vladimir On 9/15/16 9:57 AM, Dmitrij Pochepko wrote: > Hi, > > please review small fix for 8166146 - [Testbug] update codecache tests with minimal vm filter > > This patch update codecache tests which are not applicable for minimal vm with respective requires expression > > > webrev: http://cr.openjdk.java.net/~dpochepk/8166146/webrev.01/ > > CR: https://bugs.openjdk.java.net/browse/JDK-8166146 > > I've tested this change on linux-x86 using minimal vm > > > Thanks, > > Dmitrij > From vladimir.kozlov at oracle.com Thu Sep 15 17:58:59 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 15 Sep 2016 10:58:59 -0700 Subject: RFR: 8166125: [JVMCI] Missing JVMCI flag default values In-Reply-To: <57a99e49-68ad-8e21-0736-8fda150bb7f4@oracle.com> References: <57a99e49-68ad-8e21-0736-8fda150bb7f4@oracle.com> Message-ID: <6ef5a5b0-a1a1-421d-4b7e-f104819d0c61@oracle.com> Looks good. thanks, Vladimir On 9/15/16 7:10 AM, David Leopoldseder wrote: > Hi, > > Please review this patch. > > Bug: > During the initial commit for the JVMCI JEP some options JVMCI sets differently than c2 have been forgotten. 
> Fix: > Set the options if INCLUDE_JVMCI is true and -XX:+UseJVMCICompiler. > > http://cr.openjdk.java.net/~davleopo/JDK-8166125/webrev.001/ > https://bugs.openjdk.java.net/browse/JDK-8166125 > > - david From vladimir.kozlov at oracle.com Thu Sep 15 18:05:42 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 15 Sep 2016 11:05:42 -0700 Subject: RFR(M): 8166140: C1: Possible integer overflow in LIRGenerator::generate_address on several platforms In-Reply-To: <29e2b45c984248da8172cf921b7811a6@DEWDFE13DE14.global.corp.sap> References: <29e2b45c984248da8172cf921b7811a6@DEWDFE13DE14.global.corp.sap> Message-ID: <66073428-8ee1-ecf2-52c0-8f4af5a6e6e8@oracle.com> Good, but it is not enough. emit_array_address() in c1_LIRGenerator_x86.cpp has the same problem. I would suggest looking at all places where the following methods are called and making sure they are correct: LIR_Address(LIR_Opr base, intx disp, BasicType type) LIR_Address(LIR_Opr base, LIR_Opr index, Scale scale, intx disp, BasicType type) Thanks, Vladimir On 9/15/16 8:25 AM, Doerr, Martin wrote: > Hi, > > > > as discussed with Vladimir, C1 contains code to simplify constant index/displacement addressing which uses int. However, > int may overflow on 64 bit platforms. > > > > Please review the following webrev: > > http://cr.openjdk.java.net/~mdoerr/8166140_C1_int_overflow/webrev.00/ > > > > I'll also need a sponsor, please.
> > > > Thanks and best regards, > > Martin > > > From doug.simon at oracle.com Thu Sep 15 18:31:00 2016 From: doug.simon at oracle.com (Doug Simon) Date: Thu, 15 Sep 2016 20:31:00 +0200 Subject: RFR: 8166125: [JVMCI] Missing JVMCI flag default values In-Reply-To: <57a99e49-68ad-8e21-0736-8fda150bb7f4@oracle.com> References: <57a99e49-68ad-8e21-0736-8fda150bb7f4@oracle.com> Message-ID: I would consider moving the logic from Arguments::set_jvmci_specific_flags (in arguments.[ch]pp) to JVMCIGlobals::set_jvmci_specific_flags (in jvmci_globals.[ch]pp) to match the approach taken for check_jvmci_flags_are_consistent. > On 15 Sep 2016, at 16:10, David Leopoldseder wrote: > > Hi, > > Please review this patch. > > Bug: > During the initial commit for the JVMCI JEP some options JVMCI sets differently than c2 have been forgotten. > Fix: > Set the options if INCLUDE_JVMCI is true and -XX:+UseJVMCICompiler. > > http://cr.openjdk.java.net/~davleopo/JDK-8166125/webrev.001/ > https://bugs.openjdk.java.net/browse/JDK-8166125 > > - david From cthalinger at twitter.com Thu Sep 15 18:34:45 2016 From: cthalinger at twitter.com (Christian Thalinger) Date: Thu, 15 Sep 2016 08:34:45 -1000 Subject: RFR: 8166125: [JVMCI] Missing JVMCI flag default values In-Reply-To: <57a99e49-68ad-8e21-0736-8fda150bb7f4@oracle.com> References: <57a99e49-68ad-8e21-0736-8fda150bb7f4@oracle.com> Message-ID: <03D19E4E-8D70-4D43-A8EC-C4F5BD9505DF@twitter.com> Nobody noticed until now? > On Sep 15, 2016, at 4:10 AM, David Leopoldseder wrote: > > Hi, > > Please review this patch. > > Bug: > During the initial commit for the JVMCI JEP some options JVMCI sets differently than c2 have been forgotten. > Fix: > Set the options if INCLUDE_JVMCI is true and -XX:+UseJVMCICompiler. > > http://cr.openjdk.java.net/~davleopo/JDK-8166125/webrev.001/ > https://bugs.openjdk.java.net/browse/JDK-8166125 > > - david -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From doug.simon at oracle.com Thu Sep 15 18:38:18 2016 From: doug.simon at oracle.com (Doug Simon) Date: Thu, 15 Sep 2016 20:38:18 +0200 Subject: RFR: 8166125: [JVMCI] Missing JVMCI flag default values In-Reply-To: <03D19E4E-8D70-4D43-A8EC-C4F5BD9505DF@twitter.com> References: <57a99e49-68ad-8e21-0736-8fda150bb7f4@oracle.com> <03D19E4E-8D70-4D43-A8EC-C4F5BD9505DF@twitter.com> Message-ID: <139FEE17-8C56-4E3F-A289-670D34A061D1@oracle.com> David noticed this while investigating some performance regressions that occurred around the time we switched from a separate JVMCI VM binary (i.e. COMPILERJVMCI) to -XX:+UseJVMCICompiler. -Doug > On 15 Sep 2016, at 20:34, Christian Thalinger wrote: > > Nobody noticed until now? > >> On Sep 15, 2016, at 4:10 AM, David Leopoldseder wrote: >> >> Hi, >> >> Please review this patch. >> >> Bug: >> During the initial commit for the JVMCI JEP some options JVMCI sets differently than c2 have been forgotten. >> Fix: >> Set the options if INCLUDE_JVMCI is true and -XX:+UseJVMCICompiler. >> >> http://cr.openjdk.java.net/~davleopo/JDK-8166125/webrev.001/ >> https://bugs.openjdk.java.net/browse/JDK-8166125 >> >> - david > From cthalinger at twitter.com Thu Sep 15 18:40:35 2016 From: cthalinger at twitter.com (Christian Thalinger) Date: Thu, 15 Sep 2016 08:40:35 -1000 Subject: RFR: 8166125: [JVMCI] Missing JVMCI flag default values In-Reply-To: <139FEE17-8C56-4E3F-A289-670D34A061D1@oracle.com> References: <57a99e49-68ad-8e21-0736-8fda150bb7f4@oracle.com> <03D19E4E-8D70-4D43-A8EC-C4F5BD9505DF@twitter.com> <139FEE17-8C56-4E3F-A289-670D34A061D1@oracle.com> Message-ID: <49305203-6D6F-4B5D-B5E6-9530A16E669F@twitter.com> Crazy :-) Well, better late than never. Looks good. > On Sep 15, 2016, at 8:38 AM, Doug Simon wrote: > > David noticed this while investigating some performance regressions that occurred around the time we switched from a separate JVMCI VM binary (i.e. COMPILERJVMCI) to -XX:+UseJVMCICompiler. 
> > -Doug > >> On 15 Sep 2016, at 20:34, Christian Thalinger wrote: >> >> Nobody noticed until now? >> >>> On Sep 15, 2016, at 4:10 AM, David Leopoldseder wrote: >>> >>> Hi, >>> >>> Please review this patch. >>> >>> Bug: >>> During the initial commit for the JVMCI JEP some options JVMCI sets differently than c2 have been forgotten. >>> Fix: >>> Set the options if INCLUDE_JVMCI is true and -XX:+UseJVMCICompiler. >>> >>> http://cr.openjdk.java.net/~davleopo/JDK-8166125/webrev.001/ >>> https://bugs.openjdk.java.net/browse/JDK-8166125 >>> >>> - david >> > From vladimir.kozlov at oracle.com Fri Sep 16 17:56:20 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 16 Sep 2016 10:56:20 -0700 Subject: RFR: 8134389: Crash in HotSpot with jvm.dll+0x42b48 ciObjectFactory::create_new_metadata In-Reply-To: <7c1a8b01-b4ec-ea23-b59a-500c1bfd5dbc@oracle.com> References: <05c82c51-9525-eec7-206e-a265c7d47194@oracle.com> <7c1a8b01-b4ec-ea23-b59a-500c1bfd5dbc@oracle.com> Message-ID: <57DC3244.7050300@oracle.com> Add assert (we have it in other places too): assert(declared_signature != NULL, "cannot be null"); For functionality correctness ask Vladimir Ivanov to look. Thanks, Vladimir On 9/11/16 4:51 AM, Jamsheed C m wrote: > i made some changes to my fix. webrev is updated in place. > > pit results with latest modification updated in bug(not still completed) > > Best Regards, > > Jamsheed > > > On 9/10/2016 3:53 AM, Jamsheed C m wrote: >> >> adding a little more description as per my understanding >> >> This issue can happen only for compiled lforms not inlined case >> >> there are two scenarios. >> 1) no compiled lforms inlined >> 2) some compiled lforms are inlined or final method is not inlined (linkTo* not inlined).. (i.e partially inlined) >> >> in all these cases *Invoke instruction* will be *return Value*. and will have erased type. >> so we reify return type either by type casting(for partially inlined case) or by directly pulling from callsite MT. 
>> >> Best Regards, >> >> Jamsheed >> >> >> On 9/8/2016 3:26 PM, Jamsheed C m wrote: >>> Hi All, >>> >>> bugid: https://bugs.openjdk.java.net/browse/JDK-8134389 >>> >>> webrev: http://cr.openjdk.java.net/~jcm/8134389/webrev.00/ >>> >>> return type information is not available in lforms, this causes contradictions in operation like store indexed. mh _linkTo* site arg type casting. etc.. >>> >>> fix: TypeCast to declared return type at lform return. >>> >>> Best Regards, >>> >>> Jamsheed >>> >> > From vladimir.kozlov at oracle.com Fri Sep 16 18:04:19 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 16 Sep 2016 11:04:19 -0700 Subject: RFR: 8155219 - [TESTBUG] Rewrite compiler/ciReplay/TestVM.sh in java In-Reply-To: References: Message-ID: <57DC3423.4020707@oracle.com> Thank you for doing this. Looks good. I assume vm.debug is true for both builds: fastdebug and slowdebug. Thanks, Vladimir On 9/8/16 7:48 AM, Dmitrij Pochepko wrote: > Hi, > > please review fix for 8155219 - [TESTBUG] Rewrite compiler/ciReplay/TestVM.sh in java > > compiler/ciReplay/* tests were ported from shell to java. > > > > CR: https://bugs.openjdk.java.net/browse/JDK-8155219 > > webrev for root level: http://cr.openjdk.java.net/~dpochepk/8155219/webrev.root.01/ > > webrev for hotspot: http://cr.openjdk.java.net/~dpochepk/8155219/webrev.01/ > > > I've tested it via rbt. > > Thanks, > > Dmitrij > From dmitrij.pochepko at oracle.com Fri Sep 16 18:44:22 2016 From: dmitrij.pochepko at oracle.com (Dmitrij Pochepko) Date: Fri, 16 Sep 2016 21:44:22 +0300 Subject: RFR: 8155219 - [TESTBUG] Rewrite compiler/ciReplay/TestVM.sh in java In-Reply-To: References: Message-ID: <05897ff9-f355-972a-7755-0edb03d3ba4e@oracle.com> Hi, can somebody take a look? Thanks, Dmitrij On 08.09.2016 17:48, Dmitrij Pochepko wrote: > Hi, > > please review fix for 8155219 - [TESTBUG] Rewrite > compiler/ciReplay/TestVM.sh in java > > compiler/ciReplay/* tests were ported from shell to java.
> > > > CR: https://bugs.openjdk.java.net/browse/JDK-8155219 > > webrev for root level: > http://cr.openjdk.java.net/~dpochepk/8155219/webrev.root.01/ > > webrev for hotspot: > http://cr.openjdk.java.net/~dpochepk/8155219/webrev.01/ > > > I've tested it via rbt. > > Thanks, > > Dmitrij > From vladimir.kozlov at oracle.com Sat Sep 17 01:17:56 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 16 Sep 2016 18:17:56 -0700 Subject: [9] RFR[XS] 8166096: variable tracking size limit exceeded in jvmciCompilerToVM.cpp Message-ID: <57DC99C4.4030701@oracle.com> https://bugs.openjdk.java.net/browse/JDK-8166096 +++ b/make/lib/JvmOverrideFiles.gmk Fri Sep 16 18:14:06 2016 -0700 @@ -31,6 +31,7 @@ ifeq ($(TOOLCHAIN_TYPE), gcc) BUILD_LIBJVM_vmStructs.cpp_CXXFLAGS := -fno-var-tracking-assignments -O0 + BUILD_LIBJVM_jvmciCompilerToVM.cpp_CXXFLAGS := -fno-var-tracking-assignments endif ifeq ($(OPENJDK_TARGET_OS), linux) Remove annoying Hotspot compilation warning: hotspot/src/share/vm/jvmci/jvmciCompilerToVM.cpp: In static member function 'static objArrayHandle CompilerToVM::initialize_intrinsics(Thread*)': hotspot/src/share/vm/jvmci/jvmciCompilerToVM.cpp:206:16: note: variable tracking size limit exceeded with -fvar-tracking-assignments, retrying without objArrayHandle CompilerToVM::initialize_intrinsics(TRAPS) { Thanks, Vladimir From dean.long at oracle.com Sat Sep 17 03:28:39 2016 From: dean.long at oracle.com (dean.long at oracle.com) Date: Fri, 16 Sep 2016 20:28:39 -0700 Subject: [9] RFR[XS] 8166096: variable tracking size limit exceeded in jvmciCompilerToVM.cpp In-Reply-To: <57DC99C4.4030701@oracle.com> References: <57DC99C4.4030701@oracle.com> Message-ID: <9721ae67-e2f8-f5fb-0078-5c45b90c3040@oracle.com> Good. 
dl On 9/16/16 6:17 PM, Vladimir Kozlov wrote: > https://bugs.openjdk.java.net/browse/JDK-8166096 > > +++ b/make/lib/JvmOverrideFiles.gmk Fri Sep 16 18:14:06 2016 -0700 > @@ -31,6 +31,7 @@ > > ifeq ($(TOOLCHAIN_TYPE), gcc) > BUILD_LIBJVM_vmStructs.cpp_CXXFLAGS := > -fno-var-tracking-assignments -O0 > + BUILD_LIBJVM_jvmciCompilerToVM.cpp_CXXFLAGS := > -fno-var-tracking-assignments > endif > > ifeq ($(OPENJDK_TARGET_OS), linux) > > > Remove annoying Hotspot compilation warning: > > hotspot/src/share/vm/jvmci/jvmciCompilerToVM.cpp: In static member > function 'static objArrayHandle > CompilerToVM::initialize_intrinsics(Thread*)': > hotspot/src/share/vm/jvmci/jvmciCompilerToVM.cpp:206:16: note: > variable tracking size limit exceeded with -fvar-tracking-assignments, > retrying without > objArrayHandle CompilerToVM::initialize_intrinsics(TRAPS) { > > Thanks, > Vladimir From vladimir.kozlov at oracle.com Sat Sep 17 04:09:00 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 16 Sep 2016 21:09:00 -0700 Subject: [9] RFR[XS] 8166096: variable tracking size limit exceeded in jvmciCompilerToVM.cpp In-Reply-To: <9721ae67-e2f8-f5fb-0078-5c45b90c3040@oracle.com> References: <57DC99C4.4030701@oracle.com> <9721ae67-e2f8-f5fb-0078-5c45b90c3040@oracle.com> Message-ID: <57DCC1DC.9000007@oracle.com> Thank you, Dean Vladimir On 9/16/16 8:28 PM, dean.long at oracle.com wrote: > Good. 
> > dl > > > On 9/16/16 6:17 PM, Vladimir Kozlov wrote: >> https://bugs.openjdk.java.net/browse/JDK-8166096 >> >> +++ b/make/lib/JvmOverrideFiles.gmk Fri Sep 16 18:14:06 2016 -0700 >> @@ -31,6 +31,7 @@ >> >> ifeq ($(TOOLCHAIN_TYPE), gcc) >> BUILD_LIBJVM_vmStructs.cpp_CXXFLAGS := -fno-var-tracking-assignments -O0 >> + BUILD_LIBJVM_jvmciCompilerToVM.cpp_CXXFLAGS := -fno-var-tracking-assignments >> endif >> >> ifeq ($(OPENJDK_TARGET_OS), linux) >> >> >> Remove annoying Hotspot compilation warning: >> >> hotspot/src/share/vm/jvmci/jvmciCompilerToVM.cpp: In static member function 'static objArrayHandle CompilerToVM::initialize_intrinsics(Thread*)': >> hotspot/src/share/vm/jvmci/jvmciCompilerToVM.cpp:206:16: note: variable tracking size limit exceeded with -fvar-tracking-assignments, retrying without >> objArrayHandle CompilerToVM::initialize_intrinsics(TRAPS) { >> >> Thanks, >> Vladimir > From HORII at jp.ibm.com Sun Sep 18 17:36:27 2016 From: HORII at jp.ibm.com (Hiroshi H Horii) Date: Sun, 18 Sep 2016 17:36:27 +0000 Subject: RFR(m) 8164920: ppc: enhancement of CRC32 intrinsic In-Reply-To: References: Message-ID: Hi Martin, and all (Please allow me to send this mail twice. The first mail is awaiting because it exceeded 100KB) Thank you for your reviewing. Gustavo and I recreated a new change based on your comments. I would like to request a review again. My account of cr server is not available now (because of my mistake...) and Gustavo cannot create a webrev file with another reason. I would like to attach a diff file created with "hg diff -g" in hotspot. If possible, could someone create a webrev file with this changeset? Regards, Hiroshi ----------------------- Hiroshi Horii, Ph.D. 
IBM Research - Tokyo "Doerr, Martin" wrote on 09/13/2016 18:35:09: > From: "Doerr, Martin" > To: Hiroshi H Horii/Japan/IBM at IBMJP, "hotspot-compiler- > dev at openjdk.java.net" > Cc: "Volker Simonis (volker.simonis at gmail.com)" > , Gustavo Bueno Romero > Date: 09/13/2016 18:36 > Subject: RE: RFR(m) 8164920: ppc: enhancement of CRC32 intrinsic > > Hi Hiroshi, > > we appreciate your change. Thanks for contributing it. > It basically looks good, but I'd like to propose some minor improvements. > > > kernel_crc32_1word_vpmsumd: > > 1. The Pre-align code can be implemented shorter: > clrldi_(prealign, buf, 57); > beq(CCR0, L_alignHead); > > subfic(prealign, prealign, 128); > > 2. I'd prefer the label name "L_alignedHead". > > 3. The branch b(L_alignTail) and the label are not needed and should > get removed. > > > kernel_crc32_1word_aligned: > > 1. When saving and restoring non-volatile vector register, please > use offset differences of -16 instead of -32. > (The ABI allows up to 288 bytes to be used in frameless functions so > it will fit if -16 is used.) > > 2. The std instructions should better be used with int offsets so > you can get rid of the addi(offset, offset, -8) instructions. > > > Comments: > For single line comments "//" should be used instead of "/*". Would > be nice if you could change them. > > > Thanks and best regards, > Martin > > > From: Hiroshi H Horii [mailto:HORII at jp.ibm.com] > Sent: Tuesday, 6 September 2016 16:50 > To: hotspot-compiler-dev at openjdk.java.net; vladimir.kozlov at oracle.com > Cc: Volker Simonis (volker.simonis at gmail.com) > ; Doerr, Martin ; > Gustavo Bueno Romero > Subject: RFR(m) 8164920: ppc: enhancement of CRC32 intrinsic > > Dear Vladimir and all: > > Can I please request reviews for the following change?
> > JIRA: https://bugs.openjdk.java.net/browse/JDK-8164920 > webrev: http://cr.openjdk.java.net/~gromero/8164920/01/ > > As Volker's comments in the above JIRA, this is a ppc64-only > improvement which will not > affect any of the Oracle platforms in any way. > > This change includes new implementation of CRC32 Intrinsics for ppc64le. > In my local experiment, CRC32 of 64KB was calculated more than 20 > times faster than original. > Performance of CRC32 Intrinsic is important to run recent Apache Cassandra. > A Cassandra daemon needs to read 64KB data from a disk with CRC32 > checksum by default. > > This JIRA entry has "jdk9-fc-request" label. > If there is a chance to include new change in JDK 9 for ppc64le, I > would like to request > a review for this change. > > Regards, > Hiroshi > ----------------------- > Hiroshi Horii, Ph.D. > IBM Research - Tokyo -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: hotspot.crc32.20160918.changeset Type: application/octet-stream Size: 60026 bytes Desc: not available URL: From david.holmes at oracle.com Mon Sep 19 05:01:47 2016 From: david.holmes at oracle.com (David Holmes) Date: Mon, 19 Sep 2016 15:01:47 +1000 Subject: [9] RFR[XS] 8166096: variable tracking size limit exceeded in jvmciCompilerToVM.cpp In-Reply-To: <57DC99C4.4030701@oracle.com> References: <57DC99C4.4030701@oracle.com> Message-ID: <5157ce5a-2b55-22e8-dd88-68bbfc237dae@oracle.com> On 17/09/2016 11:17 AM, Vladimir Kozlov wrote: > https://bugs.openjdk.java.net/browse/JDK-8166096 > > +++ b/make/lib/JvmOverrideFiles.gmk Fri Sep 16 18:14:06 2016 -0700 > @@ -31,6 +31,7 @@ > > ifeq ($(TOOLCHAIN_TYPE), gcc) > BUILD_LIBJVM_vmStructs.cpp_CXXFLAGS := -fno-var-tracking-assignments -O0 > + BUILD_LIBJVM_jvmciCompilerToVM.cpp_CXXFLAGS := > -fno-var-tracking-assignments > endif > > ifeq ($(OPENJDK_TARGET_OS), linux) > > > Remove annoying Hotspot compilation warning: Seems reasonable as a short term silencer, but ... does it imply the code needs to be changed somehow? Thanks, David > hotspot/src/share/vm/jvmci/jvmciCompilerToVM.cpp: In static member > function 'static objArrayHandle > CompilerToVM::initialize_intrinsics(Thread*)': > hotspot/src/share/vm/jvmci/jvmciCompilerToVM.cpp:206:16: note: variable > tracking size limit exceeded with -fvar-tracking-assignments, retrying > without > objArrayHandle CompilerToVM::initialize_intrinsics(TRAPS) { > > Thanks, > Vladimir From igor.ignatyev at oracle.com Mon Sep 19 09:38:00 2016 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Mon, 19 Sep 2016 12:38:00 +0300 Subject: RFR(XS) : 8166164 : compiler/compilercontrol/share/processors/LogProcessor.java does not close Scanner Message-ID: http://cr.openjdk.java.net/~iignatyev/8166164/webrev.00/ > 16 lines changed: 2 ins; 0 del; 14 mod; Hi all, could you please review this small patch which fixes resource leak in compiler/compilercontrol tests? 
LogProcessor::getScanner creates a new Scanner, but there is no code which closes it. This leak leads to 'failed to clean up files after test' error from jtreg. the fix was tested by running :hotspot_compiler test group. JBS: https://bugs.openjdk.java.net/browse/JDK-8166164 webrev: http://cr.openjdk.java.net/~iignatyev/8166164/webrev.00/ Thanks, -- Igor From tobias.hartmann at oracle.com Mon Sep 19 12:15:04 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 19 Sep 2016 14:15:04 +0200 Subject: [9] RFR(S): 8166046: [TESTBUG] compiler/stringopts/TestStringObjectInitialization.java fails with OOME Message-ID: <57DFD6C8.2080508@oracle.com> Hi, please review the following patch: https://bugs.openjdk.java.net/browse/JDK-8166046 http://cr.openjdk.java.net/~thartmann/8166046/webrev.00/ The test creates 101 threads that each execute a loop with 10.000 iterations that append to a String another String of size 17. This results in a String of size 101 * 10.000 * 17 = 17.170.000 ( = 35 MB). In the failing cases, the test is executed on 32-bit Windows with -Xcomp and -XX:+DeoptimizeALot which increase memory consumption of the VM due to extensive (re-)compilation, deoptimization and re-allocation. The test fails because there is not enough heap space to hold the String. I reduced the number of threads to 32+1 and the number of per-thread iterations to 1000. I verified that this fixes the OOMEs on the failing 32-bit Windows machines. Thanks, Tobias From pavel.punegov at oracle.com Mon Sep 19 12:18:24 2016 From: pavel.punegov at oracle.com (Pavel Punegov) Date: Mon, 19 Sep 2016 15:18:24 +0300 Subject: RFR(XS) : 8166164 : compiler/compilercontrol/share/processors/LogProcessor.java does not close Scanner In-Reply-To: References: Message-ID: Hi Igor, the change looks good. Thanks for fixing. -- Pavel.
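For readers who want to see the kind of leak discussed in the 8166164 thread, here is a minimal sketch. It is not the code from the webrev: the class name ScannerCloseSketch, the method countLines and the temporary file are hypothetical and exist only for illustration. The sketch shows the usual try-with-resources idiom, which closes the Scanner (and the file handle behind it) even if reading throws.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Scanner;

public class ScannerCloseSketch {
    // Hypothetical helper: counts the lines of a file. The Scanner is
    // declared in the try header, so it is closed when the block exits,
    // whether normally or through an exception.
    public static int countLines(Path file) throws IOException {
        int lines = 0;
        try (Scanner scanner = new Scanner(file)) {
            while (scanner.hasNextLine()) {
                scanner.nextLine();
                lines++;
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stand-in for a compiler log file.
        Path log = Files.createTempFile("scanner-sketch", ".log");
        Files.write(log, Arrays.asList("first", "second", "third"));
        System.out.println(countLines(log)); // prints 3
        // The file can be removed here because no Scanner still holds it open.
        Files.delete(log);
    }
}
```

An unclosed Scanner keeps the underlying file open, which is one plausible way for a test harness to fail when it later tries to delete the test's files.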
> On 19 Sep 2016, at 12:38, Igor Ignatyev wrote: > > http://cr.openjdk.java.net/~iignatyev/8166164/webrev.00/ >> 16 lines changed: 2 ins; 0 del; 14 mod; > > Hi all, > > could you please review this small patch which fixes resource leak in compiler/compilercontrol tests? > LogProcessor::getScanner creates a new Scanner, but there is no code which closes it. This leak leads to 'failed to clean up files after test' error from jtreg. > > the fix was tested by running :hotspot_compiler test group. > > JBS: https://bugs.openjdk.java.net/browse/JDK-8166164 > webrev: http://cr.openjdk.java.net/~iignatyev/8166164/webrev.00/ > > Thanks, > -- Igor > From martin.doerr at sap.com Mon Sep 19 13:47:50 2016 From: martin.doerr at sap.com (Doerr, Martin) Date: Mon, 19 Sep 2016 13:47:50 +0000 Subject: RFR(M): 8166140: C1: Possible integer overflow in LIRGenerator::generate_address on several platforms In-Reply-To: <66073428-8ee1-ecf2-52c0-8f4af5a6e6e8@oracle.com> References: <29e2b45c984248da8172cf921b7811a6@DEWDFE13DE14.global.corp.sap> <66073428-8ee1-ecf2-52c0-8f4af5a6e6e8@oracle.com> Message-ID: <73f98e3882bd46dab427a02de68a1b93@DEWDFE13DE14.global.corp.sap> Hi Vladimir, you're right. I have fixed that too in the new webrev: http://cr.openjdk.java.net/~mdoerr/8166140_C1_int_overflow/webrev.01/ The 2 LIR_Address constructors you have mentioned don't have many users. The other ones look ok. Thanks and best regards, Martin -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Thursday, 15 September 2016 20:06 To: hotspot-compiler-dev at openjdk.java.net Cc: Doerr, Martin Subject: Re: RFR(M): 8166140: C1: Possible integer overflow in LIRGenerator::generate_address on several platforms Good but it is not enough. emit_array_address() in c1_LIRGenerator_x86.cpp has the same problem.
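To make the overflow in the 8166140 thread concrete, here is a small sketch of the arithmetic only. It is not the C1 code: HotSpot is C++ and the fix concerns int versus intx, while this sketch uses Java int and long as stand-ins. The class AddressOverflowSketch, its method names and the sample values are hypothetical.

```java
public class AddressOverflowSketch {
    // Folds a constant index and offset into one displacement using
    // 32-bit arithmetic. The shift can wrap around silently.
    public static int displacementInt(int index, int shift, int offset) {
        return (index << shift) + offset;
    }

    // Same computation, but the index is widened to 64 bits before the
    // shift, so the result cannot wrap for these inputs.
    public static long displacementLong(int index, int shift, int offset) {
        return ((long) index << shift) + offset;
    }

    public static void main(String[] args) {
        int index = 0x20000000; // 2^29, a large constant array index
        int shift = 3;          // log2 of an 8-byte element size
        int offset = 16;        // assumed array header size, for illustration

        // 2^29 << 3 is 2^32, which does not fit in 32 bits and becomes 0.
        System.out.println(displacementInt(index, shift, offset));  // prints 16
        System.out.println(displacementLong(index, shift, offset)); // prints 4294967312
    }
}
```

The wrapped 32-bit result is a perfectly valid-looking displacement, which is why this class of bug tends to go unnoticed until a very large index is used.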
I would suggest to look on all places where next methods are called and make sure they are correct: LIR_Address(LIR_Opr base, intx disp, BasicType type) LIR_Address(LIR_Opr base, LIR_Opr index, Scale scale, intx disp, BasicType type) Thanks, Vladimir On 9/15/16 8:25 AM, Doerr, Martin wrote: > Hi, > > > > as discussed with Vladimir, C1 contains code to simplify constant index/displacement addressing which uses int. However, > int may overflow on 64 bit platforms. > > > > Please review the following webrev: > > http://cr.openjdk.java.net/~mdoerr/8166140_C1_int_overflow/webrev.00/ > > > > I'll also need a sponsor, please. > > > > Thanks and best regards, > > Martin > > > From HORII at jp.ibm.com Sun Sep 18 15:00:57 2016 From: HORII at jp.ibm.com (Hiroshi H Horii) Date: Sun, 18 Sep 2016 15:00:57 +0000 Subject: RFR(m) 8164920: ppc: enhancement of CRC32 intrinsic In-Reply-To: References: Message-ID: Hi Martin, and all Thank you for your reviewing. Gustavo and I recreated a new change based on your comments. I would like to request a review again. My account of cr server is not available now (because of my mistake...) and Gustavo cannot create a webrev file for another reason. I would like to attach a diff file created with "hg diff -g" in hotspot. If possible, could someone create a webrev file with this changeset? I also attach a test program for CRC32 Intrinsic. Regards, Hiroshi ----------------------- Hiroshi Horii, Ph.D. IBM Research - Tokyo From: "Doerr, Martin" To: Hiroshi H Horii/Japan/IBM at IBMJP, "hotspot-compiler-dev at openjdk.java.net" Cc: "Volker Simonis (volker.simonis at gmail.com)" , Gustavo Bueno Romero Date: 09/13/2016 18:36 Subject: RE: RFR(m) 8164920: ppc: enhancement of CRC32 intrinsic Hi Hiroshi, we appreciate your change. Thanks for contributing it. It basically looks good, but I'd like to propose some minor improvements. kernel_crc32_1word_vpmsumd: 1.
The Pre-align code can be implemented shorter: clrldi_(prealign, buf, 57); beq(CCR0, L_alignHead); subfic(prealign, prealign, 128); 2. I'd prefer the label name "L_alignedHead". 3. The branch b(L_alignTail) and the label are not needed and should get removed. kernel_crc32_1word_aligned: 1. When saving and restoring non-volatile vector register, please use offset differences of -16 instead of -32. (The ABI allows up to 288 bytes to be used in frameless functions so it will fit if -16 is used.) 2. The std instructions should better be used with int offsets so you can get rid of the addi(offset, offset, -8) instructions. Comments: For single line comments "//" should be used instead of "/*". Would be nice if you could change them. Thanks and best regards, Martin From: Hiroshi H Horii [mailto:HORII at jp.ibm.com] Sent: Tuesday, 6 September 2016 16:50 To: hotspot-compiler-dev at openjdk.java.net; vladimir.kozlov at oracle.com Cc: Volker Simonis (volker.simonis at gmail.com) ; Doerr, Martin ; Gustavo Bueno Romero Subject: RFR(m) 8164920: ppc: enhancement of CRC32 intrinsic Dear Vladimir and all: Can I please request reviews for the following change? JIRA: https://bugs.openjdk.java.net/browse/JDK-8164920 webrev: http://cr.openjdk.java.net/~gromero/8164920/01/ As Volker's comments in the above JIRA, this is a ppc64-only improvement which will not affect any of the Oracle platforms in any way. This change includes new implementation of CRC32 Intrinsics for ppc64le. In my local experiment, CRC32 of 64KB was calculated more than 20 times faster than original. Performance of CRC32 Intrinsic is important to run recent Apache Cassandra. A Cassandra daemon needs to read 64KB data from a disk with CRC32 checksum by default. This JIRA entry has "jdk9-fc-request" label. If there is a chance to include new change in JDK 9 for ppc64le, I would like to request a review for this change. Regards, Hiroshi ----------------------- Hiroshi Horii, Ph.D.
IBM Research - Tokyo -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: CRC32Test.java Type: application/octet-stream Size: 8738 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: hotspot.crc32.20160918.changeset Type: application/octet-stream Size: 60026 bytes Desc: not available URL: From erik.joelsson at oracle.com Mon Sep 19 07:58:40 2016 From: erik.joelsson at oracle.com (Erik Joelsson) Date: Mon, 19 Sep 2016 09:58:40 +0200 Subject: [9] RFR[XS] 8166096: variable tracking size limit exceeded in jvmciCompilerToVM.cpp In-Reply-To: <57DC99C4.4030701@oracle.com> References: <57DC99C4.4030701@oracle.com> Message-ID: Looks ok to me. /Erik On 2016-09-17 03:17, Vladimir Kozlov wrote: > https://bugs.openjdk.java.net/browse/JDK-8166096 > > +++ b/make/lib/JvmOverrideFiles.gmk Fri Sep 16 18:14:06 2016 -0700 > @@ -31,6 +31,7 @@ > > ifeq ($(TOOLCHAIN_TYPE), gcc) > BUILD_LIBJVM_vmStructs.cpp_CXXFLAGS := > -fno-var-tracking-assignments -O0 > + BUILD_LIBJVM_jvmciCompilerToVM.cpp_CXXFLAGS := > -fno-var-tracking-assignments > endif > > ifeq ($(OPENJDK_TARGET_OS), linux) > > > Remove annoying Hotspot compilation warning: > > hotspot/src/share/vm/jvmci/jvmciCompilerToVM.cpp: In static member > function 'static objArrayHandle > CompilerToVM::initialize_intrinsics(Thread*)': > hotspot/src/share/vm/jvmci/jvmciCompilerToVM.cpp:206:16: note: > variable tracking size limit exceeded with -fvar-tracking-assignments, > retrying without > objArrayHandle CompilerToVM::initialize_intrinsics(TRAPS) { > > Thanks, > Vladimir From vladimir.x.ivanov at oracle.com Mon Sep 19 16:38:54 2016 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Mon, 19 Sep 2016 19:38:54 +0300 Subject: RFR: 8134389: Crash in HotSpot with jvm.dll+0x42b48 ciObjectFactory::create_new_metadata In-Reply-To: 
<7c1a8b01-b4ec-ea23-b59a-500c1bfd5dbc@oracle.com> References: <05c82c51-9525-eec7-206e-a265c7d47194@oracle.com> <7c1a8b01-b4ec-ea23-b59a-500c1bfd5dbc@oracle.com> Message-ID: <3d9039ee-84a2-d4e8-e36a-08f12c4cd504@oracle.com> Overall, the fix looks good. Some nitpicks: (1) I'd prefer to avoid using ciMethod::is_compiled_lambda_form(); (2) align with other uses of TypeCast for method handles. Also, ciType::is_klass() can be replaced with !ciType::is_primitive_type() check, but IMO it doesn't matter much. Something like the following: diff --git a/src/share/vm/c1/c1_GraphBuilder.cpp b/src/share/vm/c1/c1_GraphBuilder.cpp --- a/src/share/vm/c1/c1_GraphBuilder.cpp +++ b/src/share/vm/c1/c1_GraphBuilder.cpp @@ -1493,6 +1493,24 @@ // Check to see whether we are inlining. If so, Return // instructions become Gotos to the continuation point. if (continuation() != NULL) { + + int invoke_bci = state()->caller_state()->bci(); + + if (x != NULL && !ignore_return) { + ciMethod* caller = state()->scope()->caller()->method(); + Bytecodes::Code invoke_raw_bc = caller->raw_code_at_bci(invoke_bci); + if (invoke_raw_bc == Bytecodes::_invokehandle || + invoke_raw_bc == Bytecodes::_invokedynamic) { + ciType* declared_ret_type = caller->get_declared_signature_at_bci(invoke_bci)->return_type(); + if (declared_ret_type->is_klass() && + x->exact_type() == NULL && + x->declared_type() != declared_ret_type && + declared_ret_type != compilation()->env()->Object_klass()) { + x = append(new TypeCast(declared_ret_type->as_klass(), x, copy_state_before())); + } + } + } + assert(!method()->is_synchronized() || InlineSynchronizedMethods, "can not inline synchronized methods yet"); if (compilation()->env()->dtrace_method_probes()) { @@ -1516,7 +1534,6 @@ // State at end of inlined method is the state of the caller // without the method parameters on stack, including the // return value, if any, of the inlined method on operand stack. 
- int invoke_bci = state()->caller_state()->bci(); set_state(state()->caller_state()->copy_for_parsing()); if (x != NULL) { if (!ignore_return) { diff --git a/src/share/vm/c1/c1_Instruction.cpp b/src/share/vm/c1/c1_Instruction.cpp --- a/src/share/vm/c1/c1_Instruction.cpp +++ b/src/share/vm/c1/c1_Instruction.cpp @@ -360,7 +360,8 @@ } ciType* Invoke::declared_type() const { - ciType *t = _target->signature()->return_type(); + ciSignature* declared_signature = state()->scope()->method()->get_declared_signature_at_bci(state()->bci()); + ciType *t = declared_signature->return_type(); assert(t->basic_type() != T_VOID, "need return value of void method?"); return t; } diff --git a/src/share/vm/ci/ciMethod.hpp b/src/share/vm/ci/ciMethod.hpp --- a/src/share/vm/ci/ciMethod.hpp +++ b/src/share/vm/ci/ciMethod.hpp @@ -255,6 +255,12 @@ ciSignature* ignored_declared_signature; return get_method_at_bci(bci, ignored_will_link, &ignored_declared_signature); } + ciSignature* get_declared_signature_at_bci(int bci) { + bool ignored_will_link; + ciSignature* declared_signature; + get_method_at_bci(bci, ignored_will_link, &declared_signature); + return declared_signature; + } // Given a certain calling environment, find the monomorphic target // for the call. Return NULL if the call is not monomorphic in Best regards, Vladimir Ivanov On 9/11/16 2:51 PM, Jamsheed C m wrote: > i made some changes to my fix. webrev is updated in place. > > pit results with latest modification updated in bug(not still completed) > > Best Regards, > > Jamsheed > > > On 9/10/2016 3:53 AM, Jamsheed C m wrote: >> >> adding a little more description as per my understanding >> >> This issue can happen only for compiled lforms not inlined case >> >> there are two scenarios. >> 1) no compiled lforms inlined >> 2) some compiled lforms are inlined or final method is not inlined >> (linkTo* not inlined).. (i.e partially inlined) >> >> in all these cases *Invoke instruction* will be *return Value*. 
and >> will have erased type. >> so we reify return type either by type casting(for partially inlined >> case) or by directly pulling from callsite MT. >> >> Best Regards, >> >> Jamsheed >> >> >> On 9/8/2016 3:26 PM, Jamsheed C m wrote: >>> Hi All, >>> >>> bugid: https://bugs.openjdk.java.net/browse/JDK-8134389 >>> >>> webrev: http://cr.openjdk.java.net/~jcm/8134389/webrev.00/ >>> >>> return type information is not available in lforms, this causes >>> contradictions in operation like store indexed. mh _linkTo* site arg >>> type casting. etc.. >>> >>> fix: TypeCast to declared return type at lform return. >>> >>> Best Regards, >>> >>> Jamsheed >>> >> > From vladimir.kozlov at oracle.com Mon Sep 19 17:10:00 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 19 Sep 2016 10:10:00 -0700 Subject: RFR(M): 8166140: C1: Possible integer overflow in LIRGenerator::generate_address on several platforms In-Reply-To: <73f98e3882bd46dab427a02de68a1b93@DEWDFE13DE14.global.corp.sap> References: <29e2b45c984248da8172cf921b7811a6@DEWDFE13DE14.global.corp.sap> <66073428-8ee1-ecf2-52c0-8f4af5a6e6e8@oracle.com> <73f98e3882bd46dab427a02de68a1b93@DEWDFE13DE14.global.corp.sap> Message-ID: This looks good. Thanks, Vladimir On 9/19/16 6:47 AM, Doerr, Martin wrote: > Hi Vladimir, > > you're right. I have fixed that too in the new webrev: > http://cr.openjdk.java.net/~mdoerr/8166140_C1_int_overflow/webrev.01/ > > The 2 LIR_Address constructors you have mentioned don't have many users. The other ones look ok. > > Thanks and best regards, > Martin > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Thursday, 15 September 2016 20:06 > To: hotspot-compiler-dev at openjdk.java.net > Cc: Doerr, Martin > Subject: Re: RFR(M): 8166140: C1: Possible integer overflow in LIRGenerator::generate_address on several platforms > > Good but it is not enough. > > emit_array_address() in c1_LIRGenerator_x86.cpp has the same problem.
> I would suggest to look on all places where next methods are called and make sure they are correct: > > LIR_Address(LIR_Opr base, intx disp, BasicType type) > LIR_Address(LIR_Opr base, LIR_Opr index, Scale scale, intx disp, BasicType type) > > Thanks, > Vladimir > > On 9/15/16 8:25 AM, Doerr, Martin wrote: >> Hi, >> >> >> >> as discussed with Vladimir, C1 contains code to simplify constant index/displacement addressing which uses int. However, >> int may overflow on 64 bit platforms. >> >> >> >> Please review the following webrev: >> >> http://cr.openjdk.java.net/~mdoerr/8166140_C1_int_overflow/webrev.00/ >> >> >> >> I'll also need a sponsor, please. >> >> >> >> Thanks and best regards, >> >> Martin >> >> >> From vladimir.kozlov at oracle.com Mon Sep 19 17:24:48 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 19 Sep 2016 10:24:48 -0700 Subject: [9] RFR[XS] 8166096: variable tracking size limit exceeded in jvmciCompilerToVM.cpp In-Reply-To: <5157ce5a-2b55-22e8-dd88-68bbfc237dae@oracle.com> References: <57DC99C4.4030701@oracle.com> <5157ce5a-2b55-22e8-dd88-68bbfc237dae@oracle.com> Message-ID: Thank you, David, for review. I don't think we can do anything more here. 
It is number of intrinsics which overflow some internal buffer related to var-tracking-assignments in gcc during compilation: VM_INTRINSICS_DO(VM_INTRINSIC_INFO, VM_SYMBOL_IGNORE, VM_SYMBOL_IGNORE, VM_SYMBOL_IGNORE, VM_ALIAS_IGNORE) Thanks, Vladimir On 9/18/16 10:01 PM, David Holmes wrote: > On 17/09/2016 11:17 AM, Vladimir Kozlov wrote: >> https://bugs.openjdk.java.net/browse/JDK-8166096 >> >> +++ b/make/lib/JvmOverrideFiles.gmk Fri Sep 16 18:14:06 2016 -0700 >> @@ -31,6 +31,7 @@ >> >> ifeq ($(TOOLCHAIN_TYPE), gcc) >> BUILD_LIBJVM_vmStructs.cpp_CXXFLAGS := -fno-var-tracking-assignments -O0 >> + BUILD_LIBJVM_jvmciCompilerToVM.cpp_CXXFLAGS := >> -fno-var-tracking-assignments >> endif >> >> ifeq ($(OPENJDK_TARGET_OS), linux) >> >> >> Remove annoying Hotspot compilation warning: > > Seems reasonable as a short term silencer, but ... does it imply the code needs to be changed somehow? > > Thanks, > David > >> hotspot/src/share/vm/jvmci/jvmciCompilerToVM.cpp: In static member >> function 'static objArrayHandle >> CompilerToVM::initialize_intrinsics(Thread*)': >> hotspot/src/share/vm/jvmci/jvmciCompilerToVM.cpp:206:16: note: variable >> tracking size limit exceeded with -fvar-tracking-assignments, retrying >> without >> objArrayHandle CompilerToVM::initialize_intrinsics(TRAPS) { >> >> Thanks, >> Vladimir From vladimir.kozlov at oracle.com Mon Sep 19 17:25:12 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 19 Sep 2016 10:25:12 -0700 Subject: [9] RFR[XS] 8166096: variable tracking size limit exceeded in jvmciCompilerToVM.cpp In-Reply-To: References: <57DC99C4.4030701@oracle.com> Message-ID: <07c1b43a-648c-d2f2-d1ac-61a6f6ff58fd@oracle.com> Thank you, Erik Vladimir On 9/19/16 12:58 AM, Erik Joelsson wrote: > Looks ok to me. 
> > /Erik > > > On 2016-09-17 03:17, Vladimir Kozlov wrote: >> https://bugs.openjdk.java.net/browse/JDK-8166096 >> >> +++ b/make/lib/JvmOverrideFiles.gmk Fri Sep 16 18:14:06 2016 -0700 >> @@ -31,6 +31,7 @@ >> >> ifeq ($(TOOLCHAIN_TYPE), gcc) >> BUILD_LIBJVM_vmStructs.cpp_CXXFLAGS := -fno-var-tracking-assignments -O0 >> + BUILD_LIBJVM_jvmciCompilerToVM.cpp_CXXFLAGS := -fno-var-tracking-assignments >> endif >> >> ifeq ($(OPENJDK_TARGET_OS), linux) >> >> >> Remove annoying Hotspot compilation warning: >> >> hotspot/src/share/vm/jvmci/jvmciCompilerToVM.cpp: In static member function 'static objArrayHandle >> CompilerToVM::initialize_intrinsics(Thread*)': >> hotspot/src/share/vm/jvmci/jvmciCompilerToVM.cpp:206:16: note: variable tracking size limit exceeded with >> -fvar-tracking-assignments, retrying without >> objArrayHandle CompilerToVM::initialize_intrinsics(TRAPS) { >> >> Thanks, >> Vladimir > From vladimir.kozlov at oracle.com Mon Sep 19 17:26:36 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 19 Sep 2016 10:26:36 -0700 Subject: RFR(XS) : 8166164 : compiler/compilercontrol/share/processors/LogProcessor.java does not close Scanner In-Reply-To: References: Message-ID: Good. thanks, Vladimir On 9/19/16 2:38 AM, Igor Ignatyev wrote: > http://cr.openjdk.java.net/~iignatyev/8166164/webrev.00/ >> 16 lines changed: 2 ins; 0 del; 14 mod; > > Hi all, > > could you please review this small patch which fixes resource leak in compiler/compilercontrol tests? > LogProcessor::getScanner creates a new Scanner, but there is no code which closes it. This leak leads to 'failed to clean up files after test' error from jtreg. > > the fix was tested by running :hotspot_compiler test group. > > JBS: https://bugs.openjdk.java.net/browse/JDK-8166164 > webrev: http://cr.openjdk.java.net/~iignatyev/8166164/webrev.00/ > > Thanks, > -- Igor
> From vladimir.kozlov at oracle.com Mon Sep 19 17:32:05 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 19 Sep 2016 10:32:05 -0700 Subject: [9] RFR(S): 8166046: [TESTBUG] compiler/stringopts/TestStringObjectInitialization.java fails with OOME In-Reply-To: <57DFD6C8.2080508@oracle.com> References: <57DFD6C8.2080508@oracle.com> Message-ID: <1391f045-7e4a-9301-0c4d-4a0083403486@oracle.com> Should we scale down compilation threshold too? The test verifies C2 optimization we need to make sure that we still catch original JDK-8159244 problem. Thanks, Vladimir On 9/19/16 5:15 AM, Tobias Hartmann wrote: > Hi, > > please review the following patch: > https://bugs.openjdk.java.net/browse/JDK-8166046 > http://cr.openjdk.java.net/~thartmann/8166046/webrev.00/ > > The test creates 101 threads that each execute a loop with 10.000 iterations that append to a String another String of size 17. This results in a String of size 101 * 10.000 * 17 = 17.170.000 ( = 35 MB). In the failing cases, the test is executed on 32-bit Windows with -Xcomp and -XX:+DeoptimizeALot which increase memory consumption of the VM due to extensive (re-)compilation, deoptimization and re-allocation. The test fails because there is not enough heap space to hold the String. > > I reduced the number of threads to 32+1 and the number of per-thread iterations to 1000. I verified that this fixes the OOMEs on the failing 32-bit Windows machines.
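The size estimate quoted above can be reproduced with a few lines of arithmetic. This is only a sketch of the calculation, not part of the test: the class StringSizeSketch and its methods are hypothetical, and the byte count assumes two bytes per character (UTF-16 storage). A VM that stores such strings in a more compact form would need about half of that.

```java
public class StringSizeSketch {
    // Total characters appended by all threads together.
    public static long totalChars(long threads, long iterations, long appendedLength) {
        return threads * iterations * appendedLength;
    }

    // Payload size under the assumption of 2 bytes per char.
    public static long bytesAsUtf16(long chars) {
        return chars * 2;
    }

    public static void main(String[] args) {
        long before = totalChars(101, 10_000, 17);
        long after = totalChars(32 + 1, 1_000, 17);

        System.out.println(before);               // prints 17170000
        System.out.println(bytesAsUtf16(before)); // prints 34340000 (roughly 35 MB)
        System.out.println(after);                // prints 561000
        System.out.println(bytesAsUtf16(after));  // prints 1122000 (roughly 1 MB)
    }
}
```

The figures cover only the final character data. Intermediate copies made while the String grows come on top of that, which fits the observation that the failure shows up on machines with a small heap.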
> > Thanks, > Tobias > From doug.simon at oracle.com Mon Sep 19 20:21:30 2016 From: doug.simon at oracle.com (Doug Simon) Date: Mon, 19 Sep 2016 22:21:30 +0200 Subject: RFR: 8165457: [JVMCI] increase InterpreterCodeSize for JVMCI In-Reply-To: <92B9E4F8-DF56-475B-A9EC-6FB179C58925@twitter.com> References: <39E38A4A-7DEB-49C3-BC8B-C41C9F0F0ED1@oracle.com> <7ED300F2-253B-4550-BF5E-878A99EDAEB2@oracle.com> <92B9E4F8-DF56-475B-A9EC-6FB179C58925@twitter.com> Message-ID: <8A747A8B-7D76-471C-9709-4F850629F67C@oracle.com> > On 06 Sep 2016, at 23:58, Christian Thalinger wrote: > >> >> On Sep 6, 2016, at 11:37 AM, Doug Simon wrote: >> >> >>> On 06 Sep 2016, at 20:14, Christian Thalinger wrote: >>> >>> >>>> On Sep 5, 2016, at 6:49 AM, Doug Simon wrote: >>>> >>>> In jvmci-8, we increased the interpreter code size when JVMCI code is included: >>>> >>>> http://hg.openjdk.java.net/graal/graal-jvmci-8/file/a074ae16281d/src/cpu/x86/vm/templateInterpreter_x86.hpp#l37 >>> >>> What about SPARC? Have we ever seen a problem there? Or AArch64 for that matter? >> >> I've only ever seen problems on AMD64. I've never seen it on SPARC and have never run on AArch64. >> >> The real fix is that the interpreter generator should never have to guess the size of the code buffer it needs but should resize things as needed after generating the interpreter. > > Yes, it should.
In the hope that this gets addressed one day: https://bugs.openjdk.java.net/browse/JDK-8166317 -Doug From tobias.hartmann at oracle.com Tue Sep 20 08:27:35 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 20 Sep 2016 10:27:35 +0200 Subject: [9] RFR(S): 8166046: [TESTBUG] compiler/stringopts/TestStringObjectInitialization.java fails with OOME In-Reply-To: <1391f045-7e4a-9301-0c4d-4a0083403486@oracle.com> References: <57DFD6C8.2080508@oracle.com> <1391f045-7e4a-9301-0c4d-4a0083403486@oracle.com> Message-ID: <57E0F2F7.8000808@oracle.com> Hi Vladimir, On 19.09.2016 19:32, Vladimir Kozlov wrote: > Should we scale down compilation threshold too? The test verifies C2 optimization we need to make sure that we still catch original JDK-8159244 problem. I verified that the test still (rarely) triggers the problem I fixed with JDK-8159244 but while testing I found a better way to avoid the OOMEs: http://cr.openjdk.java.net/~thartmann/8166046/webrev.01 This significantly reduces the runtime of the test from 1m30s to 12s on my machine and triggers JDK-8159244 in 100% of the runs. I verified that the patch still fixes the OOME's on the 32-bit Windows machines. Thanks, Tobias > Thanks, > Vladimir > > On 9/19/16 5:15 AM, Tobias Hartmann wrote: >> Hi, >> >> please review the following patch: >> https://bugs.openjdk.java.net/browse/JDK-8166046 >> http://cr.openjdk.java.net/~thartmann/8166046/webrev.00/ >> >> The test creates 101 threads that each execute a loop with 10.000 iterations that append to a String another String of size 17. This results in a String of size 101 * 10.000 * 17 = 17.170.000 ( = 35 MB). In the failing cases, the test is executed on 32-bit Windows with -Xcomp and -XX:+DeoptimizeALot which increase memory consumption of the VM due to extensive (re-)compilation, deoptimization and re-allocation. The test fails because there is not enough heap space to hold the String. 
>> >> I reduced the number of threads to 32+1 and the number of per-thread iterations to 1000. I verified that this fixes the OOMEs on the failing 32-bit Windows machines. >> >> Thanks, >> Tobias >> From igor.ignatyev at oracle.com Tue Sep 20 14:36:55 2016 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Tue, 20 Sep 2016 17:36:55 +0300 Subject: RFR(XS) : 8166164 : compiler/compilercontrol/share/processors/LogProcessor.java does not close Scanner In-Reply-To: References: Message-ID: Vladimir, thank you for review. -- Igor > On Sep 19, 2016, at 8:26 PM, Vladimir Kozlov wrote: > > Good. > > thanks, > Vladimir > > On 9/19/16 2:38 AM, Igor Ignatyev wrote: >> http://cr.openjdk.java.net/~iignatyev/8166164/webrev.00/ >>> 16 lines changed: 2 ins; 0 del; 14 mod; >> >> Hi all, >> >> could you please review this small patch which fixes resource leak in compiler/compilercontrol tests? >> LogProcessor::getScanner creates a new Scanner, but there is no code which closes it. This leak leads to 'failed to clean up files after test' error from jtreg. >> >> the fix was tested by running :hotspot_compiler test group. >> >> JBS: https://bugs.openjdk.java.net/browse/JDK-8166164 >> webrev: http://cr.openjdk.java.net/~iignatyev/8166164/webrev.00/ >> >> Thanks, >> -- Igor >> From igor.ignatyev at oracle.com Tue Sep 20 14:37:14 2016 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Tue, 20 Sep 2016 17:37:14 +0300 Subject: RFR(XS) : 8166164 : compiler/compilercontrol/share/processors/LogProcessor.java does not close Scanner In-Reply-To: References: Message-ID: <427682F7-2B61-464A-987D-C8BEC6FEACC4@oracle.com> Pavel, thanks for review. -- Igor > On Sep 19, 2016, at 3:18 PM, Pavel Punegov wrote: > > Hi Igor, > > the change looks good. Thanks for fixing. > > -- Pavel.
> >> On 19 Sep 2016, at 12:38, Igor Ignatyev wrote: >> >> http://cr.openjdk.java.net/~iignatyev/8166164/webrev.00/ >>> 16 lines changed: 2 ins; 0 del; 14 mod; >> >> Hi all, >> >> could you please review this small patch which fixes resource leak in compiler/compilercontrol tests? >> LogProcessor::getScanner creates a new Scanner, but there is no code which closes it. This leak leads to 'failed to clean up files after test' error from jtreg. >> >> the fix was tested by running :hotspot_compiler test group. >> >> JBS: https://bugs.openjdk.java.net/browse/JDK-8166164 >> webrev: http://cr.openjdk.java.net/~iignatyev/8166164/webrev.00/ >> >> Thanks, >> -- Igor >> > From vladimir.kozlov at oracle.com Tue Sep 20 16:58:12 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 20 Sep 2016 09:58:12 -0700 Subject: [9] RFR(S): 8166046: [TESTBUG] compiler/stringopts/TestStringObjectInitialization.java fails with OOME In-Reply-To: <57E0F2F7.8000808@oracle.com> References: <57DFD6C8.2080508@oracle.com> <1391f045-7e4a-9301-0c4d-4a0083403486@oracle.com> <57E0F2F7.8000808@oracle.com> Message-ID: Perfect. thanks, Vladimir On 9/20/16 1:27 AM, Tobias Hartmann wrote: > Hi Vladimir, > > On 19.09.2016 19:32, Vladimir Kozlov wrote: >> Should we scale down compilation threshold too? The test verifies C2 optimization we need to make sure that we still catch original JDK-8159244 problem. > > I verified that the test still (rarely) triggers the problem I fixed with JDK-8159244 but while testing I found a better way to avoid the OOMEs: > http://cr.openjdk.java.net/~thartmann/8166046/webrev.01 > > This significantly reduces the runtime of the test from 1m30s to 12s on my machine and triggers JDK-8159244 in 100% of the runs. I verified that the patch still fixes the OOME's on the 32-bit Windows machines.
> > Thanks, > Tobias > >> Thanks, >> Vladimir >> >> On 9/19/16 5:15 AM, Tobias Hartmann wrote: >>> Hi, >>> >>> please review the following patch: >>> https://bugs.openjdk.java.net/browse/JDK-8166046 >>> http://cr.openjdk.java.net/~thartmann/8166046/webrev.00/ >>> >>> The test creates 101 threads that each execute a loop with 10.000 iterations that append to a String another String of size 17. This results in a String of size 101 * 10.000 * 17 = 17.170.000 ( = 35 MB). In the failing cases, the test is executed on 32-bit Windows with -Xcomp and -XX:+DeoptimizeALot which increase memory consumption of the VM due to extensive (re-)compilation, deoptimization and re-allocation. The test fails because there is not enough heap space to hold the String. >>> >>> I reduced the number of threads to 32+1 and the number of per-thread iterations to 1000. I verified that this fixes the OOMEs on the failing 32-bit Windows machines. >>> >>> Thanks, >>> Tobias >>> From tobias.hartmann at oracle.com Tue Sep 20 17:01:28 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 20 Sep 2016 19:01:28 +0200 Subject: [9] RFR(S): 8166046: [TESTBUG] compiler/stringopts/TestStringObjectInitialization.java fails with OOME In-Reply-To: References: <57DFD6C8.2080508@oracle.com> <1391f045-7e4a-9301-0c4d-4a0083403486@oracle.com> <57E0F2F7.8000808@oracle.com> Message-ID: <57E16B68.1040001@oracle.com> Thanks, Vladimir. Best regards, Tobias On 20.09.2016 18:58, Vladimir Kozlov wrote: > Perfect. > > thanks, > Vladimir > > On 9/20/16 1:27 AM, Tobias Hartmann wrote: >> Hi Vladimir, >> >> On 19.09.2016 19:32, Vladimir Kozlov wrote: >>> Should we scale down compilation threshold too? The test verifies C2 optimization we need to make sure that we still catch original JDK-8159244 problem. 
>> >> I verified that the test still (rarely) triggers the problem I fixed with JDK-8159244 but while testing I found a better way to avoid the OOMEs: >> http://cr.openjdk.java.net/~thartmann/8166046/webrev.01 >> >> This significantly reduces the runtime of the test from 1m30s to 12s on my machine and triggers JDK-8159244 in 100% of the runs. I verified that the patch still fixes the OOME's on the 32-bit Windows machines. >> >> Thanks, >> Tobias >> >>> Thanks, >>> Vladimir >>> >>> On 9/19/16 5:15 AM, Tobias Hartmann wrote: >>>> Hi, >>>> >>>> please review the following patch: >>>> https://bugs.openjdk.java.net/browse/JDK-8166046 >>>> http://cr.openjdk.java.net/~thartmann/8166046/webrev.00/ >>>> >>>> The test creates 101 threads that each execute a loop with 10.000 iterations that append to a String another String of size 17. This results in a String of size 101 * 10.000 * 17 = 17.170.000 ( = 35 MB). In the failing cases, the test is executed on 32-bit Windows with -Xcomp and -XX:+DeoptimizeALot which increase memory consumption of the VM due to extensive (re-)compilation, deoptimization and re-allocation. The test fails because there is not enough heap space to hold the String. >>>> >>>> I reduced the number of threads to 32+1 and the number of per-thread iterations to 1000. I verified that this fixes the OOMEs on the failing 32-bit Windows machines. >>>> >>>> Thanks, >>>> Tobias >>>> From HORII at jp.ibm.com Tue Sep 20 17:03:19 2016 From: HORII at jp.ibm.com (Hiroshi H Horii) Date: Wed, 21 Sep 2016 02:03:19 +0900 Subject: RFR(m) 8164920: ppc: enhancement of CRC32 intrinsic In-Reply-To: References: Message-ID: Hi all, Martin thankfully created a webrev with some good correction. http://cr.openjdk.java.net/~mdoerr/8164920_ppc_crc32/webrev.01/ Could someone review this change again? Regards, Hiroshi ----------------------- Hiroshi Horii, Ph.D. 
IBM Research - Tokyo From: Hiroshi H Horii/Japan/IBM To: "Doerr, Martin" Cc: Gustavo Bueno Romero , "hotspot-compiler-dev at openjdk.java.net" , "Volker Simonis (volker.simonis at gmail.com)" Date: 09/19/2016 02:36 Subject: RE: RFR(m) 8164920: ppc: enhancement of CRC32 intrinsic Hi Martin, and all (Please allow me to send this mail twice. The first mail is awaiting because it exceeded 100KB) Thank you for your reviewing. Gustavo and I recreated a new change based on your comments. I would like to request a review again. My account of cr server is not available now (because of my mistake...) and Gustavo cannot create a webrev file with another reason. I would like to attach a diff file created with "hg diff -g" in hotspot. If possible, could someone create a webrev file with this changeset? [attachment "hotspot.crc32.20160918.changeset" deleted by Hiroshi H Horii/Japan/IBM] Regards, Hiroshi ----------------------- Hiroshi Horii, Ph.D. IBM Research - Tokyo "Doerr, Martin" wrote on 09/13/2016 18:35:09: > From: "Doerr, Martin" > To: Hiroshi H Horii/Japan/IBM at IBMJP, "hotspot-compiler- > dev at openjdk.java.net" > Cc: "Volker Simonis (volker.simonis at gmail.com)" > , Gustavo Bueno Romero > Date: 09/13/2016 18:36 > Subject: RE: RFR(m) 8164920: ppc: enhancement of CRC32 intrinsic > > Hi Hiroshi, > > we appreciate your change. Thanks for contributing it. > It basically looks good, but I'd like to propose some minor improvements. > > > kernel_crc32_1word_vpmsumd: > > 1. The Pre-align code can be implemented shorter: > clrldi_(prealign, buf, 57); > beq(CCR0, L_alignHead); > > subfic(prealign, prealign, 128); > > 2. I'd prefer the label name "L_alignedHead". > > 3. The branch b(L_alignTail) and the label are not needed and should > get removed. > > > kernel_crc32_1word_aligned: > > 1. When saving and restoring non-volatile vector register, please > use offset differences of -16 instead of -32.
> (The ABI allows up to 288 bytes to be used in frameless functions so > it will fit if -16 is used.) > > 2. The std instructions should better be used with int offsets so > you can get rid of the addi(offset, offset, -8) instructions. > > > Comments: > For single line comments "//" should be used instead of "/*". Would > be nice if you could change them. > > > Thanks and best regards, > Martin > > > From: Hiroshi H Horii [mailto:HORII at jp.ibm.com] > Sent: Dienstag, 6. September 2016 16:50 > To: hotspot-compiler-dev at openjdk.java.net; vladimir.kozlov at oracle.com > Cc: Volker Simonis (volker.simonis at gmail.com) > ; Doerr, Martin ; > Gustavo Bueno Romero > Subject: RFR(m) 8164920: ppc: enhancement of CRC32 intrinsic > > Dear Vladimir and all: > > Can I please request reviews for the following change? > > JIRA: https://bugs.openjdk.java.net/browse/JDK-8164920 > webrev: http://cr.openjdk.java.net/~gromero/8164920/01/ > > As Volker's comments in the above JIRA, this is a ppc64-only > improvement which will not > affect any of the Oracle platforms in any way. > > This change includes new implementation of CRC32 Intrinsics for ppc64le. > In my local experiment, CRC32 of 64KB was calculated more than 20 > times faster than original. > Performance of CRC32 Intrinsic is important to run recent Apache Cassandra. > A Cassandra daemon needs to read 64KB data from a disk with CRC32 > checksum by default. > > This JIRA entry has "jdk9-fc-request" label. > If there is a chance to include new change in JDK 9 for ppc64le, I > would like to request > a review for this change. > > Regards, > Hiroshi > ----------------------- > Hiroshi Horii, Ph.D. > IBM Research - Tokyo
From cthalinger at twitter.com Tue Sep 20 17:59:30 2016 From: cthalinger at twitter.com (Christian Thalinger) Date: Tue, 20 Sep 2016 07:59:30 -1000 Subject: RFR: 8165457: [JVMCI] increase InterpreterCodeSize for JVMCI In-Reply-To: <8A747A8B-7D76-471C-9709-4F850629F67C@oracle.com> References: <39E38A4A-7DEB-49C3-BC8B-C41C9F0F0ED1@oracle.com> <7ED300F2-253B-4550-BF5E-878A99EDAEB2@oracle.com> <92B9E4F8-DF56-475B-A9EC-6FB179C58925@twitter.com> <8A747A8B-7D76-471C-9709-4F850629F67C@oracle.com> Message-ID: > On Sep 19, 2016, at 10:21 AM, Doug Simon wrote: > > >> On 06 Sep 2016, at 23:58, Christian Thalinger wrote: >> >>> >>> On Sep 6, 2016, at 11:37 AM, Doug Simon wrote: >>> >>> >>>> On 06 Sep 2016, at 20:14, Christian Thalinger wrote: >>>> >>>> >>>>> On Sep 5, 2016, at 6:49 AM, Doug Simon wrote: >>>>> >>>>> In jvmci-8, we increased the interpreter code size when JVMCI code is included: >>>>> >>>>> http://hg.openjdk.java.net/graal/graal-jvmci-8/file/a074ae16281d/src/cpu/x86/vm/templateInterpreter_x86.hpp#l37 >>>> >>>> What about SPARC? Have we ever seen a problem there? Or AArch64 for that matter? >>> >>> I've only ever seen problems on AMD64. I've never seen it on SPARC and have never run on AArch64. >>> >>> The real fix is that the interpreter generator should never have to guess the size of the code buffer it needs but should resize things as needed after generating the interpreter. >> >> Yes, it should. > > In the hope that this gets addressed one day: https://bugs.openjdk.java.net/browse/JDK-8166317 From michael.c.berg at intel.com Tue Sep 20 23:02:58 2016 From: michael.c.berg at intel.com (Berg, Michael C) Date: Tue, 20 Sep 2016 23:02:58 +0000 Subject: CR for RFR 8129376 Message-ID: Hi Folks, Performance on client x86 targets was hampered in two SPECjvm98 metrics (mpegaudio and mtrt) for 32-bit since we added AVX512.
I also checked to make sure only client on x86-32 was affected. I have mitigated this by altering the xmm pad modeling in the x86-32-bit machine description so that register allocation cannot see the dummy definitions, enabling the desired performance while retaining correctness for 32-bit on AVX512. This code was tested as follows: hotspot jtreg, SPECjvm2008, SPECjvm98 on hsw, skx and knl targets complete with no issues on 32-bit. These changes do not alter behavior on x86-64. Bug-id: https://bugs.openjdk.java.net/browse/JDK-8129376 webrev: http://cr.openjdk.java.net/~mcberg/8129376/webrev.01 Regards, Michael From vladimir.kozlov at oracle.com Tue Sep 20 23:15:52 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 20 Sep 2016 16:15:52 -0700 Subject: CR for RFR 8129376 In-Reply-To: References: Message-ID: <57E1C328.4020402@oracle.com> Changes look good. Michael, can you explain how pads affected code generation to cause regression? .ad changes affect Server VM (c2) code generation. Do you need it based on performance numbers? Try running Server VM with -XX:-TieredCompilation. Thanks, Vladimir On 9/20/16 4:02 PM, Berg, Michael C wrote: > Hi Folks, > > Performance on client x86 targets was hampered in two SPECjvm98 metrics (mpegaudio and mtrt) for 32-bit since we added AVX512. I also checked to make sure only client on x86-32 was affected. I have > mitigated this by altering the xmm pad modeling in the x86-32-bit machine description so that register allocation cannot see the dummy definitions, enabling the desired performance while retaining > correctness for 32-bit on AVX512. > > This code was tested as follows: hotspot jtreg, SPECjvm2008, SPECjvm98 on hsw, skx and knl targets complete with no issues on 32-bit. These changes do not alter behavior on x86-64.
> > > Bug-id: https://bugs.openjdk.java.net/browse/JDK-8129376 > > > webrev: > > http://cr.openjdk.java.net/~mcberg/8129376/webrev.01 > > Regards, > > Michael > From michael.c.berg at intel.com Tue Sep 20 23:22:09 2016 From: michael.c.berg at intel.com (Berg, Michael C) Date: Tue, 20 Sep 2016 23:22:09 +0000 Subject: CR for RFR 8129376 In-Reply-To: <57E1C328.4020402@oracle.com> References: <57E1C328.4020402@oracle.com> Message-ID: Vladmir, The way they were versed they caused allocation issues on part of the xmm bank for client only. I believe I ran with both tiered off and inlining off while sleuthing the issue and after I applied the change as part of my verification process. I tested client and server 32-bit. With the change applied 32-bit server performance does not seem affected. The generated code looks like it did before the change was applied now. Regards, Michael -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Tuesday, September 20, 2016 4:16 PM To: hotspot-compiler-dev at openjdk.java.net Cc: Berg, Michael C Subject: Re: CR for RFR 8129376 Changes look good. Michael, can you explain how pads affected code generation to cause regression? .ad changes affects Server VM (c2) code generation. Do you need it based on performance numbers? TRy running Server VM with -XX:-TieredCompilation. Thanks, Vladimir On 9/20/16 4:02 PM, Berg, Michael C wrote: > Hi Folks, > > Performance on client x86 targets was hampered in two SPECjvm98 > metrics (mpegaudio and mtrt) for 32-bit since we added AVX512. I > also checked to make sure only client on x86-32 was affected. I have mitigated this by altering the xmm pad modeling in the x86-32-bit machine description so that register allocation cannot see the dummy definitions, enabling the desired performance while retaining correctness for 32-bit on AVX512. 
> > This code was tested as follows: hotspot jreg, SPECjvm2008, SPECjvm98 on hsw, skx and knl targets complete with no issues on 32-bit. These changes do not alter behavior on x86-64. > > > Bug-id: https://bugs.openjdk.java.net/browse/JDK-8129376 > > > webrev: > > http://cr.openjdk.java.net/~mcberg/8129376/webrev.01 > > Regards, > > Michael > From michael.c.berg at intel.com Tue Sep 20 23:24:19 2016 From: michael.c.berg at intel.com (Berg, Michael C) Date: Tue, 20 Sep 2016 23:24:19 +0000 Subject: CR for RFR 8129376 References: <57E1C328.4020402@oracle.com> Message-ID: Small augment... -----Original Message----- From: Berg, Michael C Sent: Tuesday, September 20, 2016 4:22 PM To: 'Vladimir Kozlov' ; hotspot-compiler-dev at openjdk.java.net Subject: RE: CR for RFR 8129376 Vladmir, The way they were versed they caused allocation issues on part of the xmm bank for client only. I believe I ran with both tiered off and inlining off while sleuthing the issue and after I applied the change as part of my verification process. I tested client and server 32-bit. With the change applied 32-bit server performance does not seem affected. The generated code looks like it did before the change was applied now. Regards, Michael -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Tuesday, September 20, 2016 4:16 PM To: hotspot-compiler-dev at openjdk.java.net Cc: Berg, Michael C Subject: Re: CR for RFR 8129376 Changes look good. Michael, can you explain how pads affected code generation to cause regression? .ad changes affects Server VM (c2) code generation. Do you need it based on performance numbers? TRy running Server VM with -XX:-TieredCompilation. Thanks, Vladimir On 9/20/16 4:02 PM, Berg, Michael C wrote: > Hi Folks, > > Performance on client x86 targets was hampered in two SPECjvm98 > metrics (mpegaudio and mtrt) for 32-bit since we added AVX512. I > also checked to make sure only client on x86-32 was affected. 
I have mitigated this by altering the xmm pad modeling in the x86-32-bit machine description so that register allocation cannot see the dummy definitions, enabling the desired performance while retaining correctness for 32-bit on AVX512. > > This code was tested as follows: hotspot jreg, SPECjvm2008, SPECjvm98 on hsw, skx and knl targets complete with no issues on 32-bit. These changes do not alter behavior on x86-64. > > > Bug-id: https://bugs.openjdk.java.net/browse/JDK-8129376 > > > webrev: > > http://cr.openjdk.java.net/~mcberg/8129376/webrev.01 > > Regards, > > Michael > From vladimir.kozlov at oracle.com Tue Sep 20 23:40:21 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 20 Sep 2016 16:40:21 -0700 Subject: CR for RFR 8129376 In-Reply-To: References: <57E1C328.4020402@oracle.com> Message-ID: <57E1C8E5.8060808@oracle.com> Okay, goods then. I will sponsor it. Thanks, Vladimir On 9/20/16 4:24 PM, Berg, Michael C wrote: > Small augment... > > -----Original Message----- > From: Berg, Michael C > Sent: Tuesday, September 20, 2016 4:22 PM > To: 'Vladimir Kozlov' ; hotspot-compiler-dev at openjdk.java.net > Subject: RE: CR for RFR 8129376 > > Vladmir, > > The way they were versed they caused allocation issues on part of the xmm bank for client only. > I believe I ran with both tiered off and inlining off while sleuthing the issue and after I applied the change as part of my verification process. I tested client and server 32-bit. > With the change applied 32-bit server performance does not seem affected. The generated code looks like it did before the change was applied now. > > Regards, > Michael > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Tuesday, September 20, 2016 4:16 PM > To: hotspot-compiler-dev at openjdk.java.net > Cc: Berg, Michael C > Subject: Re: CR for RFR 8129376 > > Changes look good. > > Michael, can you explain how pads affected code generation to cause regression? 
> .ad changes affects Server VM (c2) code generation. Do you need it based on performance numbers? TRy running Server VM with -XX:-TieredCompilation. > > Thanks, > Vladimir > > On 9/20/16 4:02 PM, Berg, Michael C wrote: >> Hi Folks, >> >> Performance on client x86 targets was hampered in two SPECjvm98 >> metrics (mpegaudio and mtrt) for 32-bit since we added AVX512. I >> also checked to make sure only client on x86-32 was affected. I have mitigated this by altering the xmm pad modeling in the x86-32-bit machine description so that register allocation cannot see the dummy definitions, enabling the desired performance while retaining correctness for 32-bit on AVX512. >> >> This code was tested as follows: hotspot jreg, SPECjvm2008, SPECjvm98 on hsw, skx and knl targets complete with no issues on 32-bit. These changes do not alter behavior on x86-64. >> >> >> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8129376 >> >> >> webrev: >> >> http://cr.openjdk.java.net/~mcberg/8129376/webrev.01 >> >> Regards, >> >> Michael >> From tobias.hartmann at oracle.com Wed Sep 21 07:01:42 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 21 Sep 2016 09:01:42 +0200 Subject: [9] RFR(S): 8161085: PreserveFPRegistersTest fails with 'AssertionError: Final value has changed' Message-ID: <57E23056.50200@oracle.com> Hi, please review the following patch: https://bugs.openjdk.java.net/browse/JDK-8161085 http://cr.openjdk.java.net/~thartmann/8161085/webrev.00/ This problem is very similar to JDK-8148175, the test fails because G1 barriers emitted by C1 do not preserve floating point registers on SPARC. The problem is that the barrier code calls into the runtime without saving/restoring the FP registers. I fixed this by using save_live_registers() instead of manually saving individual registers. Tested with failing regression test on SPARC and RBT (running). 
Thanks, Tobias From vladimir.x.ivanov at oracle.com Wed Sep 21 10:07:43 2016 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 21 Sep 2016 13:07:43 +0300 Subject: [9] RFR(S): 8161085: PreserveFPRegistersTest fails with 'AssertionError: Final value has changed' In-Reply-To: <57E23056.50200@oracle.com> References: <57E23056.50200@oracle.com> Message-ID: Looks good. Best regards, Vladimir Ivanov On 9/21/16 10:01 AM, Tobias Hartmann wrote: > Hi, > > please review the following patch: > https://bugs.openjdk.java.net/browse/JDK-8161085 > http://cr.openjdk.java.net/~thartmann/8161085/webrev.00/ > > This problem is very similar to JDK-8148175, the test fails because G1 barriers emitted by C1 do not preserve floating point registers on SPARC. The problem is that the barrier code calls into the runtime without saving/restoring the FP registers. I fixed this by using save_live_registers() instead of manually saving individual registers. > > Tested with failing regression test on SPARC and RBT (running). > > Thanks, > Tobias > From tobias.hartmann at oracle.com Wed Sep 21 10:08:11 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 21 Sep 2016 12:08:11 +0200 Subject: [9] RFR(S): 8161085: PreserveFPRegistersTest fails with 'AssertionError: Final value has changed' In-Reply-To: References: <57E23056.50200@oracle.com> Message-ID: <57E25C0B.8090000@oracle.com> Thanks, Vladimir! Best regards, Tobias On 21.09.2016 12:07, Vladimir Ivanov wrote: > Looks good. > > Best regards, > Vladimir Ivanov > > On 9/21/16 10:01 AM, Tobias Hartmann wrote: >> Hi, >> >> please review the following patch: >> https://bugs.openjdk.java.net/browse/JDK-8161085 >> http://cr.openjdk.java.net/~thartmann/8161085/webrev.00/ >> >> This problem is very similar to JDK-8148175, the test fails because G1 barriers emitted by C1 do not preserve floating point registers on SPARC. The problem is that the barrier code calls into the runtime without saving/restoring the FP registers. 
I fixed this by using save_live_registers() instead of manually saving individual registers. >> >> Tested with failing regression test on SPARC and RBT (running). >> >> Thanks, >> Tobias >> From vladimir.kozlov at oracle.com Wed Sep 21 16:02:05 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 21 Sep 2016 09:02:05 -0700 Subject: [9] RFR(S): 8161085: PreserveFPRegistersTest fails with 'AssertionError: Final value has changed' In-Reply-To: <57E23056.50200@oracle.com> References: <57E23056.50200@oracle.com> Message-ID: <8add0ad3-f4bf-c81d-823e-97fe509b2ef3@oracle.com> Good. Thanks, Vladimir On 9/21/16 12:01 AM, Tobias Hartmann wrote: > Hi, > > please review the following patch: > https://bugs.openjdk.java.net/browse/JDK-8161085 > http://cr.openjdk.java.net/~thartmann/8161085/webrev.00/ > > This problem is very similar to JDK-8148175, the test fails because G1 barriers emitted by C1 do not preserve floating point registers on SPARC. The problem is that the barrier code calls into the runtime without saving/restoring the FP registers. I fixed this by using save_live_registers() instead of manually saving individual registers. > > Tested with failing regression test on SPARC and RBT (running). > > Thanks, > Tobias > From tobias.hartmann at oracle.com Wed Sep 21 16:07:47 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 21 Sep 2016 18:07:47 +0200 Subject: [9] RFR(S): 8161085: PreserveFPRegistersTest fails with 'AssertionError: Final value has changed' In-Reply-To: <8add0ad3-f4bf-c81d-823e-97fe509b2ef3@oracle.com> References: <57E23056.50200@oracle.com> <8add0ad3-f4bf-c81d-823e-97fe509b2ef3@oracle.com> Message-ID: <57E2B053.8020904@oracle.com> Thanks, Vladimir! Best regards, Tobias On 21.09.2016 18:02, Vladimir Kozlov wrote: > Good. 
> > Thanks, > Vladimir > > On 9/21/16 12:01 AM, Tobias Hartmann wrote: >> Hi, >> >> please review the following patch: >> https://bugs.openjdk.java.net/browse/JDK-8161085 >> http://cr.openjdk.java.net/~thartmann/8161085/webrev.00/ >> >> This problem is very similar to JDK-8148175, the test fails because G1 barriers emitted by C1 do not preserve floating point registers on SPARC. The problem is that the barrier code calls into the runtime without saving/restoring the FP registers. I fixed this by using save_live_registers() instead of manually saving individual registers. >> >> Tested with failing regression test on SPARC and RBT (running). >> >> Thanks, >> Tobias >> From vladimir.kozlov at oracle.com Wed Sep 21 22:51:33 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 21 Sep 2016 15:51:33 -0700 Subject: RFR(M) 8078122 : YMM registers upper 128 bits may get clobbered by a JNI call on windows In-Reply-To: References: <57BE1AD4.7070403@oracle.com> <6aee0e7c-76a5-a920-7099-a3edc349f205@oracle.com> <4af19c5d-9a7f-d18b-820b-6f3664b8183a@oracle.com> <7de8489c-943b-5ecf-48c1-0bffad101070@oracle.com> Message-ID: <57E30EF5.8010709@oracle.com> To close the loop on this: it looks like the machine on which the test failed had a well-known XMM saving problem in the Linux kernel, so we decided to push the changes. You may have seen the notification already. regards, Vladimir On 9/8/16 5:46 PM, Kharbas, Kishor wrote: > Hi Vladimir, > I couldn't reproduce the error on my 32-bit Linux machine. The test was done on a Sandy bridge machine (has AVX instruction set) > Please advise how to proceed further. > > Thanks > Kishor > > > -----Original Message----- > From: Kharbas, Kishor > Sent: Tuesday, September 6, 2016 5:40 PM > To: Vladimir Kozlov ; hotspot-compiler-dev at openjdk.java.net > Cc: Kharbas, Kishor > Subject: RE: RFR(M) 8078122 : YMM registers upper 128 bits may get clobbered by a JNI call on windows > > Hi Vladimir, > > The patch only touches code in _WIN64.
I am having hard time to understand why the test fails for 32-bit Linux > > Btw, that test passes on Windows 64 platform. I am planning to test on Linux too. > > Thanks > Kishor > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Tuesday, September 6, 2016 2:31 PM > To: Kharbas, Kishor ; hotspot-compiler-dev at openjdk.java.net > Subject: Re: RFR(M) 8078122 : YMM registers upper 128 bits may get clobbered by a JNI call on windows > > Next jtreg test failed on 32-bit Linux: > > hotspot/test/compiler/runtime/Test7196199.java > > ----------System.err:(57/2416)---------- > test_incrc: [41] = 8.081506E20 != 150000.0 > test_incrc: [42] = 1.8632992E31 != 150000.0 > test_incrc: [43] = 2.8397877E29 != 150000.0 ... > > https://bugs.openjdk.java.net/browse/JDK-7196199 > > was related to Upper bits (64-255) of XMM (YMM) registers are not saved/restored in interrupt handle code during safepoint. > > Looks like your changes are not enough. > > Vladimir > > > On 9/6/16 10:12 AM, Vladimir Kozlov wrote: >> Good. I start testing these changes. I will push it if testing pass. >> >> Thanks, >> Vladimir >> >> On 9/2/16 3:07 PM, Kharbas, Kishor wrote: >>> Thanks Vladimir, >>> >>> I have updated the patch : >>> http://cr.openjdk.java.net/~vdeshpande/8078122/webrev.02/ >>> >>> I looked for other places in src/cpu/x86/vm. I feel every case is >>> covered. >>> >>> - Kishor >>> >>> -----Original Message----- >>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>> Sent: Thursday, September 1, 2016 11:39 AM >>> To: Kharbas, Kishor ; >>> hotspot-compiler-dev at openjdk.java.net >>> Subject: Re: RFR(M) 8078122 : YMM registers upper 128 bits may get >>> clobbered by a JNI call on windows >>> >>> Good. 
But looks like some code relied on old stack layout in stubs, >>> for example sha256_AVX2(): >>> >>> #ifndef _WIN64 >>> _XMM_SAVE_SIZE = 0, >>> #else >>> _XMM_SAVE_SIZE = 8*16, >>> #endif >>> >>> Please, check that all other related code is fixed too. (I looked on >>> all cases of _WIN64 in src/cpu/x86/vm/). >>> >>> Thanks, >>> Vladimir >>> >>> On 8/31/16 10:17 PM, Kharbas, Kishor wrote: >>>> Hello, >>>> >>>> I removed the unwanted save and restore of registers in the range >>>> XMM6-XMM31 from the x64_64 stubs. >>>> I also removed the #ifdef _WIN64 block from x86.ad file. >>>> >>>> Link to the new patch : >>>> http://cr.openjdk.java.net/~vdeshpande/8078122/webrev.01/ >>>> >>>> Thanks >>>> Kishor >>>> >>>> >>>> -----Original Message----- >>>> From: Kharbas, Kishor >>>> Sent: Wednesday, August 24, 2016 6:24 PM >>>> To: Vladimir Kozlov ; >>>> hotspot-compiler-dev at openjdk.java.net >>>> Cc: Kharbas, Kishor >>>> Subject: RE: RFR(M) 8078122 : YMM registers upper 128 bits may get >>>> clobbered by a JNI call on windows >>>> >>>> Thanks Vladimir for quick feedback. >>>> I will look into the stubs which save the registers in the range >>>> XMM6-XMM31. Also the first comment makes perfect sense. >>>> >>>> Thanks >>>> Kishor >>>> >>>> -----Original Message----- >>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>> Sent: Wednesday, August 24, 2016 3:08 PM >>>> To: Kharbas, Kishor ; >>>> hotspot-compiler-dev at openjdk.java.net >>>> Subject: Re: RFR(M) 8078122 : YMM registers upper 128 bits may get >>>> clobbered by a JNI call on windows >>>> >>>> Hi Kishor, >>>> >>>> First, #ifdef _WIN64 is not needed anymore since calling convention >>>> is similat to unix now. >>>> >>>> Second, I would like you to look more broadly. With this change we >>>> don't need to preserve XMM6-XMM31 in our stubs for WIN64. I am not >>>> sure that we can remove all #ifdef _WIN64 there but for most of them >>>> I think we can do. Please, look. 
>>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>> On 8/24/16 2:40 PM, Kharbas, Kishor wrote:
>>>>> Requesting the community to review the patch for
>>>>> https://bugs.openjdk.java.net/browse/JDK-8078122
>>>>>
>>>>> Webrev : http://cr.openjdk.java.net/~vdeshpande/8078122/webrev.00
>>>>>
>>>>> The patch changes the definitions of registers XMM6-XMM31 for WIN64.
>>>>>
>>>>> Thank you.
>>>>>
>>>>> Kishor
>>>>>

From doug.simon at oracle.com Thu Sep 22 07:54:03 2016
From: doug.simon at oracle.com (Doug Simon)
Date: Thu, 22 Sep 2016 09:54:03 +0200
Subject: RFR: 8166517: [JVMCI] export JVMCI to auto-detected JVMCI compiler
Message-ID:

When JVMCI compiler auto-selection (JDK-8160730) is used, then JVMCI needs to be exported to the selected compiler the same way as if the -Djvmci.Compiler property was specified.

https://bugs.openjdk.java.net/browse/JDK-8166517
http://cr.openjdk.java.net/~dnsimon/8166517/

-Doug

From goetz.lindenmaier at sap.com Thu Sep 22 10:06:09 2016
From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz)
Date: Thu, 22 Sep 2016 10:06:09 +0000
Subject: RFR(m) 8164920: ppc: enhancement of CRC32 intrinsic
In-Reply-To:
References:
Message-ID: <919b03528ad546a8996e39ed7a737ebb@DEWDFE13DE50.global.corp.sap>

Hi Hiroshi,

I had a look at your change. While I can't tell whether the algorithm is correct, I can state all our tests are green. Spilling the registers to stack is fine as there is the ShadowZone on the stack.
I would appreciate it if it were ported to big endian, too, as a follow-up.

Reviewed.

Best regards,
Goetz

> -----Original Message-----
> From: Hiroshi H Horii [mailto:HORII at jp.ibm.com]
> Sent: Dienstag, 20. September 2016 19:03
> To: Doerr, Martin
> Cc: Gustavo Bueno Romero; hotspot-compiler-dev at openjdk.java.net;
> Volker Simonis (volker.simonis at gmail.com); Lindenmaier, Goetz
> Subject: RE: RFR(m) 8164920: ppc: enhancement of CRC32 intrinsic
>
> Hi all,
>
> Martin thankfully created a webrev with some good correction.
> http://cr.openjdk.java.net/~mdoerr/8164920_ppc_crc32/webrev.01/
>
>
> Could someone review this change again?
>
> Regards,
> Hiroshi
> -----------------------
> Hiroshi Horii, Ph.D.
> IBM Research - Tokyo
>
>
>
>
> From: Hiroshi H Horii/Japan/IBM
> To: "Doerr, Martin"
> Cc: Gustavo Bueno Romero, "hotspot-compiler-dev at openjdk.java.net",
> "Volker Simonis (volker.simonis at gmail.com)"
> Date: 09/19/2016 02:36
> Subject: RE: RFR(m) 8164920: ppc: enhancement of CRC32 intrinsic
>
> ________________________________
>
>
>
> Hi Martin, and all
> (Please allow me to send this mail twice. The first mail is awaiting because it
> exceeded 100KB)
>
> Thank you for your reviewing. Gustavo and I recreated a new change based
> on your comments. I would like to request a review again.
>
> My account of cr server is not available now (because of my mistake...) and
> Gustavo cannot create a webrev file with another reason. I would like to
> attach a diff file created with "hg diff -g" in hotspot. If possible, could
> someone create a webrev file with this changeset?
>
> [attachment "hotspot.crc32.20160918.changeset" deleted by Hiroshi H
> Horii/Japan/IBM]
>
> Regards,
> Hiroshi
> -----------------------
> Hiroshi Horii, Ph.D.
> IBM Research - Tokyo
>
> "Doerr, Martin" wrote on 09/13/2016 18:35:09:
>
> > From: "Doerr, Martin"
> > To: Hiroshi H Horii/Japan/IBM at IBMJP, "hotspot-compiler-dev at openjdk.java.net"
> > Cc: "Volker Simonis (volker.simonis at gmail.com)", Gustavo Bueno Romero
> > Date: 09/13/2016 18:36
> > Subject: RE: RFR(m) 8164920: ppc: enhancement of CRC32 intrinsic
> >
> > Hi Hiroshi,
> >
> > we appreciate your change. Thanks for contributing it.
> > It basically looks good, but I'd like to propose some minor improvements.
> >
> >
> > kernel_crc32_1word_vpmsumd:
> >
> > 1. The Pre-align code can be implemented shorter:
> > clrldi_(prealign, buf, 57);
> > beq(CCR0, L_alignHead);
> >
> > subfic(prealign, prealign, 128);
> >
> > 2. I'd prefer the label name "L_alignedHead".
> >
> > 3. The branch b(L_alignTail) and the label are not needed and should
> > get removed.
> >
> >
> > kernel_crc32_1word_aligned:
> >
> > 1. When saving and restoring non-volatile vector register, please
> > use offset differences of -16 instead of -32.
> > (The ABI allows up to 288 bytes to be used in frameless functions so
> > it will fit if -16 is used.)
> >
> > 2. The std instructions should better be used with int offsets so
> > you can get rid of the addi(offset, offset, -8) instructions.
> >
> >
> > Comments:
> > For single line comments "//" should be used instead of "/*". Would
> > be nice if you could change them.
> >
> >
> > Thanks and best regards,
> > Martin
> >
> >
> > From: Hiroshi H Horii [mailto:HORII at jp.ibm.com]
> > Sent: Dienstag, 6. September 2016 16:50
> > To: hotspot-compiler-dev at openjdk.java.net; vladimir.kozlov at oracle.com
> > Cc: Volker Simonis (volker.simonis at gmail.com); Doerr, Martin;
> > Gustavo Bueno Romero
> > Subject: RFR(m) 8164920: ppc: enhancement of CRC32 intrinsic
> >
> > Dear Vladimir and all:
> >
> > Can I please request reviews for the following change?
> >
> > JIRA: https://bugs.openjdk.java.net/browse/JDK-8164920
> > webrev: http://cr.openjdk.java.net/~gromero/8164920/01/
> >
> > As Volker's comments in the above JIRA, this is a ppc64-only
> > improvement which will not
> > affect any of the Oracle platforms in any way.
> >
> > This change includes new implementation of CRC32 Intrinsics for ppc64le.
> > In my local experiment, CRC32 of 64KB was calculated more than 20
> > times faster than original.
> > Performance of CRC32 Intrinsic is important to run recent Apache Cassandra.
> > A Cassandra daemon needs to read 64KB data from a disk with CRC32
> > checksum by default.
> >
> > This JIRA entry has "jdk9-fc-request" label.
> > If there is a chance to include new change in JDK 9 for ppc64le, I
> > would like to request
> > a review for this change.
> > > > Regards, > > Hiroshi > > ----------------------- > > Hiroshi Horii, Ph.D. > > IBM Research - Tokyo > > From martin.doerr at sap.com Thu Sep 22 10:16:00 2016 From: martin.doerr at sap.com (Doerr, Martin) Date: Thu, 22 Sep 2016 10:16:00 +0000 Subject: RFR(m) 8164920: ppc: enhancement of CRC32 intrinsic In-Reply-To: <919b03528ad546a8996e39ed7a737ebb@DEWDFE13DE50.global.corp.sap> References: <919b03528ad546a8996e39ed7a737ebb@DEWDFE13DE50.global.corp.sap> Message-ID: Hi all, thanks a lot for the contribution and for reviewing. 8164920 has the label jdk9-fc-yes, a second review and the tests have passed. I will push it. Best regards, Martin -----Original Message----- From: Lindenmaier, Goetz Sent: Donnerstag, 22. September 2016 12:06 To: Hiroshi H Horii ; Doerr, Martin Cc: Gustavo Bueno Romero ; hotspot-compiler-dev at openjdk.java.net; Volker Simonis (volker.simonis at gmail.com) Subject: RE: RFR(m) 8164920: ppc: enhancement of CRC32 intrinsic Hi Hiroshi, I had a look at your change. While I can't tell whether the algorithm is correct, I can state all our tests are green. Spilling the registers to stack is fine as there is the ShadowZone on the stack. I would appreciate if it would be ported for big endian, too, as a follow up though. Reviewed. Best regards, Goetz > -----Original Message----- > From: Hiroshi H Horii [mailto:HORII at jp.ibm.com] > Sent: Dienstag, 20. September 2016 19:03 > To: Doerr, Martin > Cc: Gustavo Bueno Romero ; hotspot-compiler- > dev at openjdk.java.net; Volker Simonis (volker.simonis at gmail.com) > ; Lindenmaier, Goetz > > Subject: RE: RFR(m) 8164920: ppc: enhancement of CRC32 intrinsic > > Hi all, > > Martin thankfully created a webrev with some good correction. > http://cr.openjdk.java.net/~mdoerr/8164920_ppc_crc32/webrev.01/ > > > Could someone review this change again? > > Regards, > Hiroshi > ----------------------- > Hiroshi Horii, Ph.D. 
> > IBM Research - Tokyo > > From jamsheed.c.m at oracle.com Thu Sep 22 16:22:18 2016 From: jamsheed.c.m at oracle.com (Jamsheed C m) Date: Thu, 22 Sep 2016 21:52:18 +0530 Subject: RFR: 8134389: Crash in HotSpot with jvm.dll+0x42b48 ciObjectFactory::create_new_metadata In-Reply-To: <3d9039ee-84a2-d4e8-e36a-08f12c4cd504@oracle.com> References: <05c82c51-9525-eec7-206e-a265c7d47194@oracle.com> <7c1a8b01-b4ec-ea23-b59a-500c1bfd5dbc@oracle.com> <3d9039ee-84a2-d4e8-e36a-08f12c4cd504@oracle.com> Message-ID: <087568ef-70b1-2938-eb27-784264d0ec39@oracle.com> Hi Vladimir, Thanks for the review, On 9/19/2016 10:08 PM, Vladimir Ivanov wrote: > Overall, the fix looks good. > > Some nitpicks: > (1) I'd prefer to avoid using ciMethod::is_compiled_lambda_form(); Ok. Hope there is no correctness reasons behind this? > (2) align with other uses of TypeCast for method handles. There was a bug in closed arm port blocking this(8166441). i was getting failure in arm-32 closed port. its fixed and is out on review. The issue was with null constant getting typecast. i chose alternate implementation to avoid all those cases. i am Ok for aligning with previous typecast implementation. hope i needn't send updated webrev? Best Regards, Jamsheed > > Also, ciType::is_klass() can be replaced with > !ciType::is_primitive_type() check, but IMO it doesn't matter much. > > Something like the following: > > diff --git a/src/share/vm/c1/c1_GraphBuilder.cpp > b/src/share/vm/c1/c1_GraphBuilder.cpp > --- a/src/share/vm/c1/c1_GraphBuilder.cpp > +++ b/src/share/vm/c1/c1_GraphBuilder.cpp > @@ -1493,6 +1493,24 @@ > // Check to see whether we are inlining. If so, Return > // instructions become Gotos to the continuation point. 
> if (continuation() != NULL) { > + > + int invoke_bci = state()->caller_state()->bci(); > + > + if (x != NULL && !ignore_return) { > + ciMethod* caller = state()->scope()->caller()->method(); > + Bytecodes::Code invoke_raw_bc = > caller->raw_code_at_bci(invoke_bci); > + if (invoke_raw_bc == Bytecodes::_invokehandle || > + invoke_raw_bc == Bytecodes::_invokedynamic) { > + ciType* declared_ret_type = > caller->get_declared_signature_at_bci(invoke_bci)->return_type(); > + if (declared_ret_type->is_klass() && > + x->exact_type() == NULL && > + x->declared_type() != declared_ret_type && > + declared_ret_type != compilation()->env()->Object_klass()) { > + x = append(new TypeCast(declared_ret_type->as_klass(), x, > copy_state_before())); > + } > + } > + } > + > assert(!method()->is_synchronized() || InlineSynchronizedMethods, > "can not inline synchronized methods yet"); > > if (compilation()->env()->dtrace_method_probes()) { > @@ -1516,7 +1534,6 @@ > // State at end of inlined method is the state of the caller > // without the method parameters on stack, including the > // return value, if any, of the inlined method on operand stack. 
> - int invoke_bci = state()->caller_state()->bci(); > set_state(state()->caller_state()->copy_for_parsing()); > if (x != NULL) { > if (!ignore_return) { > diff --git a/src/share/vm/c1/c1_Instruction.cpp > b/src/share/vm/c1/c1_Instruction.cpp > --- a/src/share/vm/c1/c1_Instruction.cpp > +++ b/src/share/vm/c1/c1_Instruction.cpp > @@ -360,7 +360,8 @@ > } > > ciType* Invoke::declared_type() const { > - ciType *t = _target->signature()->return_type(); > + ciSignature* declared_signature = > state()->scope()->method()->get_declared_signature_at_bci(state()->bci()); > + ciType *t = declared_signature->return_type(); > assert(t->basic_type() != T_VOID, "need return value of void > method?"); > return t; > } > diff --git a/src/share/vm/ci/ciMethod.hpp b/src/share/vm/ci/ciMethod.hpp > --- a/src/share/vm/ci/ciMethod.hpp > +++ b/src/share/vm/ci/ciMethod.hpp > @@ -255,6 +255,12 @@ > ciSignature* ignored_declared_signature; > return get_method_at_bci(bci, ignored_will_link, > &ignored_declared_signature); > } > + ciSignature* get_declared_signature_at_bci(int bci) { > + bool ignored_will_link; > + ciSignature* declared_signature; > + get_method_at_bci(bci, ignored_will_link, &declared_signature); > + return declared_signature; > + } > > // Given a certain calling environment, find the monomorphic target > // for the call. Return NULL if the call is not monomorphic in > > Best regards, > Vladimir Ivanov > > On 9/11/16 2:51 PM, Jamsheed C m wrote: >> i made some changes to my fix. webrev is updated in place. >> >> pit results with latest modification updated in bug(not still completed) >> >> Best Regards, >> >> Jamsheed >> >> >> On 9/10/2016 3:53 AM, Jamsheed C m wrote: >>> >>> adding a little more description as per my understanding >>> >>> This issue can happen only for compiled lforms not inlined case >>> >>> there are two scenarios. >>> 1) no compiled lforms inlined >>> 2) some compiled lforms are inlined or final method is not inlined >>> (linkTo* not inlined).. 
(i.e partially inlined)
>>>
>>> in all these cases *Invoke instruction* will be *return Value*. and
>>> will have erased type.
>>> so we reify return type either by type casting(for partially inlined
>>> case) or by directly pulling from callsite MT.
>>>
>>> Best Regards,
>>>
>>> Jamsheed
>>>
>>>
>>> On 9/8/2016 3:26 PM, Jamsheed C m wrote:
>>>> Hi All,
>>>>
>>>> bugid: https://bugs.openjdk.java.net/browse/JDK-8134389
>>>>
>>>> webrev: http://cr.openjdk.java.net/~jcm/8134389/webrev.00/
>>>>
>>>> return type information is not available in lforms, this causes
>>>> contradictions in operation like store indexed. mh _linkTo* site arg
>>>> type casting. etc..
>>>>
>>>> fix: TypeCast to declared return type at lform return.
>>>>
>>>> Best Regards,
>>>>
>>>> Jamsheed
>>>>
>>>
>>

From igor.ignatyev at oracle.com Thu Sep 22 17:56:13 2016
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Thu, 22 Sep 2016 20:56:13 +0300
Subject: RFR(XXS) : 8166549 : fix incorrectly @ignore-d hotspot/compiler tests
Message-ID:

http://cr.openjdk.java.net/~iignatyev/8166549/webrev.00/
> 1 line changed: 0 ins; 0 del; 1 mod;

Hi all,

could you please review this tiny patch which corrects bug id used in @ignore?
compiler/codecache/stress/OverloadCompileQueueTest.java was @ignored due to JDK-8071905[1], but JDK-8071905 is closed as a dup of JDK-8079586[2] which is fixed. the test still has a problem (it can timeout), so I have filed a new bug[3] and used its id in @ignore.

JBS: https://bugs.openjdk.java.net/browse/JDK-8166549
webrev: http://cr.openjdk.java.net/~iignatyev/8166549/webrev.00/

[1] https://bugs.openjdk.java.net/browse/JDK-8071905
[2] https://bugs.openjdk.java.net/browse/JDK-8079586
[3] https://bugs.openjdk.java.net/browse/JDK-8166554

Thanks,
-- Igor

From kirill.zhaldybin at oracle.com Thu Sep 22 18:08:11 2016
From: kirill.zhaldybin at oracle.com (Kirill Zhaldybin)
Date: Thu, 22 Sep 2016 21:08:11 +0300
Subject: RFR(XXS) : 8166549 : fix incorrectly @ignore-d hotspot/compiler tests
In-Reply-To:
References:
Message-ID:

Igor,

Looks good to me.

Regards, Kirill

On 22.09.2016 20:56, Igor Ignatyev wrote:
> http://cr.openjdk.java.net/~iignatyev/8166549/webrev.00/
>> 1 line changed: 0 ins; 0 del; 1 mod;
> Hi all,
>
> could you please review this tiny patch which corrects bug id used in @ignore?
> compiler/codecache/stress/OverloadCompileQueueTest.java was @ignored due to JDK-8071905[1], but JDK-8071905 is closed as a dup of JDK-8079586[2] which is fixed. the test still has a problem (it can timeout), so I have filed a new bug[3] and used its id in @ignore.
>
> JBS: https://bugs.openjdk.java.net/browse/JDK-8166549
> webrev: http://cr.openjdk.java.net/~iignatyev/8166549/webrev.00/
>
> [1] https://bugs.openjdk.java.net/browse/JDK-8071905
> [2] https://bugs.openjdk.java.net/browse/JDK-8079586
> [3] https://bugs.openjdk.java.net/browse/JDK-8166554
>
> Thanks,
> -- Igor

From vladimir.x.ivanov at oracle.com Thu Sep 22 18:16:56 2016
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Thu, 22 Sep 2016 21:16:56 +0300
Subject: RFR: 8134389: Crash in HotSpot with jvm.dll+0x42b48 ciObjectFactory::create_new_metadata
In-Reply-To: <087568ef-70b1-2938-eb27-784264d0ec39@oracle.com>
References: <05c82c51-9525-eec7-206e-a265c7d47194@oracle.com> <7c1a8b01-b4ec-ea23-b59a-500c1bfd5dbc@oracle.com> <3d9039ee-84a2-d4e8-e36a-08f12c4cd504@oracle.com> <087568ef-70b1-2938-eb27-784264d0ec39@oracle.com>
Message-ID: <59650400-0347-12c7-1a26-4efb99d5b29e@oracle.com>

>> (1) I'd prefer to avoid using ciMethod::is_compiled_lambda_form();
> Ok. Hope there is no correctness reasons behind this?

No, it simply doesn't communicate the intention clearly enough.

All LambdaForms are marked w/ @Compiled, but we are only interested in
invokers (indy, exact & generic invokers).

>> (2) align with other uses of TypeCast for method handles.
> There was a bug in closed arm port blocking this(8166441). i was getting
> failure in arm-32 closed port. its fixed and is out on review.
> The issue was with null constant getting typecast. i chose alternate
> implementation to avoid all those cases. i am Ok for aligning with
> previous typecast implementation.

Good.

> hope i needn't send updated webrev?

No need to send new webrev.

Best regards,
Vladimir Ivanov

>
> Best Regards,
> Jamsheed
>
>>
>> Also, ciType::is_klass() can be replaced with
>> !ciType::is_primitive_type() check, but IMO it doesn't matter much.
>>
>> Something like the following:
>>
>> diff --git a/src/share/vm/c1/c1_GraphBuilder.cpp
>> b/src/share/vm/c1/c1_GraphBuilder.cpp
>> --- a/src/share/vm/c1/c1_GraphBuilder.cpp
>> +++ b/src/share/vm/c1/c1_GraphBuilder.cpp
>> @@ -1493,6 +1493,24 @@
>> // Check to see whether we are inlining. If so, Return
>> // instructions become Gotos to the continuation point.
>> if (continuation() != NULL) { >> + >> + int invoke_bci = state()->caller_state()->bci(); >> + >> + if (x != NULL && !ignore_return) { >> + ciMethod* caller = state()->scope()->caller()->method(); >> + Bytecodes::Code invoke_raw_bc = >> caller->raw_code_at_bci(invoke_bci); >> + if (invoke_raw_bc == Bytecodes::_invokehandle || >> + invoke_raw_bc == Bytecodes::_invokedynamic) { >> + ciType* declared_ret_type = >> caller->get_declared_signature_at_bci(invoke_bci)->return_type(); >> + if (declared_ret_type->is_klass() && >> + x->exact_type() == NULL && >> + x->declared_type() != declared_ret_type && >> + declared_ret_type != compilation()->env()->Object_klass()) { >> + x = append(new TypeCast(declared_ret_type->as_klass(), x, >> copy_state_before())); >> + } >> + } >> + } >> + >> assert(!method()->is_synchronized() || InlineSynchronizedMethods, >> "can not inline synchronized methods yet"); >> >> if (compilation()->env()->dtrace_method_probes()) { >> @@ -1516,7 +1534,6 @@ >> // State at end of inlined method is the state of the caller >> // without the method parameters on stack, including the >> // return value, if any, of the inlined method on operand stack. 
>> - int invoke_bci = state()->caller_state()->bci(); >> set_state(state()->caller_state()->copy_for_parsing()); >> if (x != NULL) { >> if (!ignore_return) { >> diff --git a/src/share/vm/c1/c1_Instruction.cpp >> b/src/share/vm/c1/c1_Instruction.cpp >> --- a/src/share/vm/c1/c1_Instruction.cpp >> +++ b/src/share/vm/c1/c1_Instruction.cpp >> @@ -360,7 +360,8 @@ >> } >> >> ciType* Invoke::declared_type() const { >> - ciType *t = _target->signature()->return_type(); >> + ciSignature* declared_signature = >> state()->scope()->method()->get_declared_signature_at_bci(state()->bci()); >> >> + ciType *t = declared_signature->return_type(); >> assert(t->basic_type() != T_VOID, "need return value of void >> method?"); >> return t; >> } >> diff --git a/src/share/vm/ci/ciMethod.hpp b/src/share/vm/ci/ciMethod.hpp >> --- a/src/share/vm/ci/ciMethod.hpp >> +++ b/src/share/vm/ci/ciMethod.hpp >> @@ -255,6 +255,12 @@ >> ciSignature* ignored_declared_signature; >> return get_method_at_bci(bci, ignored_will_link, >> &ignored_declared_signature); >> } >> + ciSignature* get_declared_signature_at_bci(int bci) { >> + bool ignored_will_link; >> + ciSignature* declared_signature; >> + get_method_at_bci(bci, ignored_will_link, &declared_signature); >> + return declared_signature; >> + } >> >> // Given a certain calling environment, find the monomorphic target >> // for the call. Return NULL if the call is not monomorphic in >> >> Best regards, >> Vladimir Ivanov >> >> On 9/11/16 2:51 PM, Jamsheed C m wrote: >>> i made some changes to my fix. webrev is updated in place. >>> >>> pit results with latest modification updated in bug(not still completed) >>> >>> Best Regards, >>> >>> Jamsheed >>> >>> >>> On 9/10/2016 3:53 AM, Jamsheed C m wrote: >>>> >>>> adding a little more description as per my understanding >>>> >>>> This issue can happen only for compiled lforms not inlined case >>>> >>>> there are two scenarios. 
>>>> 1) no compiled lforms inlined >>>> 2) some compiled lforms are inlined or final method is not inlined >>>> (linkTo* not inlined).. (i.e partially inlined) >>>> >>>> in all these cases *Invoke instruction* will be *return Value*. and >>>> will have erased type. >>>> so we reify return type either by type casting(for partially inlined >>>> case) or by directly pulling from callsite MT. >>>> >>>> Best Regards, >>>> >>>> Jamsheed >>>> >>>> >>>> On 9/8/2016 3:26 PM, Jamsheed C m wrote: >>>>> Hi All, >>>>> >>>>> bugid: https://bugs.openjdk.java.net/browse/JDK-8134389 >>>>> >>>>> webrev: http://cr.openjdk.java.net/~jcm/8134389/webrev.00/ >>>>> >>>>> return type information is not available in lforms, this causes >>>>> contradictions in operation like store indexed. mh _linkTo* site arg >>>>> type casting. etc.. >>>>> >>>>> fix: TypeCast to declared return type at lform return. >>>>> >>>>> Best Regards, >>>>> >>>>> Jamsheed >>>>> >>>> >>> > From vladimir.kozlov at oracle.com Thu Sep 22 18:32:23 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 22 Sep 2016 11:32:23 -0700 Subject: RFR(XXS) : 8166549 : fix incorrectly @ignore-d hotspot/compiler tests In-Reply-To: References: Message-ID: <57E423B7.5080107@oracle.com> Good. Thanks, Vladimir On 9/22/16 10:56 AM, Igor Ignatyev wrote: > http://cr.openjdk.java.net/~iignatyev/8166549/webrev.00/ >> 1 line changed: 0 ins; 0 del; 1 mod; > > Hi all, > > could you please review this tiny patch which corrects bug id used in @ignore? > compiler/codecache/stress/OverloadCompileQueueTest.java was @ignored due to JDK-8071905[1], but JDK-8071905 is closed as a dup of JDK-8079586[2] which is fixed. the test still has a problem (it can timeout), so I have filed a new bug[3] and used its id in @ignore. 
>
> JBS: https://bugs.openjdk.java.net/browse/JDK-8166549
> webrev: http://cr.openjdk.java.net/~iignatyev/8166549/webrev.00/
>
> [1] https://bugs.openjdk.java.net/browse/JDK-8071905
> [2] https://bugs.openjdk.java.net/browse/JDK-8079586
> [3] https://bugs.openjdk.java.net/browse/JDK-8166554
>
> Thanks,
> -- Igor
>

From igor.ignatyev at oracle.com Thu Sep 22 19:05:34 2016
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Thu, 22 Sep 2016 22:05:34 +0300
Subject: RFR(XXS) : 8166549 : fix incorrectly @ignore-d hotspot/compiler tests
In-Reply-To:
References:
Message-ID: <975B41AA-2B5E-43FD-AB97-9D3C3ABE2169@oracle.com>

Kirill, thank you.

-- Igor

> On Sep 22, 2016, at 9:08 PM, Kirill Zhaldybin wrote:
>
> Igor,
>
> Looks good to me.
>
> Regards, Kirill
>
> On 22.09.2016 20:56, Igor Ignatyev wrote:
>> http://cr.openjdk.java.net/~iignatyev/8166549/webrev.00/
>>> 1 line changed: 0 ins; 0 del; 1 mod;
>> Hi all,
>>
>> could you please review this tiny patch which corrects bug id used in @ignore?
>> compiler/codecache/stress/OverloadCompileQueueTest.java was @ignored due to JDK-8071905[1], but JDK-8071905 is closed as a dup of JDK-8079586[2] which is fixed. the test still has a problem (it can timeout), so I have filed a new bug[3] and used its id in @ignore.
>>
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8166549
>> webrev: http://cr.openjdk.java.net/~iignatyev/8166549/webrev.00/
>>
>> [1] https://bugs.openjdk.java.net/browse/JDK-8071905
>> [2] https://bugs.openjdk.java.net/browse/JDK-8079586
>> [3] https://bugs.openjdk.java.net/browse/JDK-8166554
>>
>> Thanks,
>> -- Igor
>

From igor.ignatyev at oracle.com Thu Sep 22 19:05:50 2016
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Thu, 22 Sep 2016 22:05:50 +0300
Subject: RFR(XXS) : 8166549 : fix incorrectly @ignore-d hotspot/compiler tests
In-Reply-To: <57E423B7.5080107@oracle.com>
References: <57E423B7.5080107@oracle.com>
Message-ID: <57321D41-26E8-4713-8923-1BEC64C63E99@oracle.com>

Vladimir,

Thank you for the review,

-- Igor

> On Sep 22, 2016, at 9:32 PM, Vladimir Kozlov wrote:
>
> Good.
>
> Thanks,
> Vladimir
>
> On 9/22/16 10:56 AM, Igor Ignatyev wrote:
>> http://cr.openjdk.java.net/~iignatyev/8166549/webrev.00/
>>> 1 line changed: 0 ins; 0 del; 1 mod;
>>
>> Hi all,
>>
>> could you please review this tiny patch which corrects bug id used in @ignore?
>> compiler/codecache/stress/OverloadCompileQueueTest.java was @ignored due to JDK-8071905[1], but JDK-8071905 is closed as a dup of JDK-8079586[2] which is fixed. the test still has a problem (it can timeout), so I have filed a new bug[3] and used its id in @ignore.
>>
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8166549
>> webrev: http://cr.openjdk.java.net/~iignatyev/8166549/webrev.00/
>>
>> [1] https://bugs.openjdk.java.net/browse/JDK-8071905
>> [2] https://bugs.openjdk.java.net/browse/JDK-8079586
>> [3] https://bugs.openjdk.java.net/browse/JDK-8166554
>>
>> Thanks,
>> -- Igor
>>

From jamsheed.c.m at oracle.com Fri Sep 23 07:13:25 2016
From: jamsheed.c.m at oracle.com (Jamsheed C m)
Date: Fri, 23 Sep 2016 12:43:25 +0530
Subject: RFR: 8134389: Crash in HotSpot with jvm.dll+0x42b48 ciObjectFactory::create_new_metadata
In-Reply-To: <59650400-0347-12c7-1a26-4efb99d5b29e@oracle.com>
References: <05c82c51-9525-eec7-206e-a265c7d47194@oracle.com> <7c1a8b01-b4ec-ea23-b59a-500c1bfd5dbc@oracle.com> <3d9039ee-84a2-d4e8-e36a-08f12c4cd504@oracle.com> <087568ef-70b1-2938-eb27-784264d0ec39@oracle.com> <59650400-0347-12c7-1a26-4efb99d5b29e@oracle.com>
Message-ID: <0ab782c5-036a-9374-e39f-5980011849dd@oracle.com>

Thanks for the clarification, Vladimir Ivanov.

Best Regards,
Jamsheed

On 9/22/2016 11:46 PM, Vladimir Ivanov wrote:
>> Ok. Hope there is no correctness reasons behind this?
>
> No, it simply doesn't communicate the intention clearly enough.
>
> All LambdaForms are marked w/ @Compiled, but we are only interested in
> invokers (indy, exact & generic invokers).

From goetz.lindenmaier at sap.com Fri Sep 23 10:58:37 2016
From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz)
Date: Fri, 23 Sep 2016 10:58:37 +0000
Subject: RFR(M): 8166562: C2: Suppress relocations in scratch emit.
Message-ID:

Hi,

Please review this nice and small improvement to scratch emit. It simplifies the s390 port considerably, but is completely independent of it. I introduced usage of the feature on ppc. I need a sponsor, please.

http://cr.openjdk.java.net/~goetz/wr16/8166562-scratch_emit/webrev.01/

The C2 compiler needs to know how much space the assembly emitted for a MachNode requires. For many nodes, this is statically specified. Some nodes don't have fixed sizes, as the code emitted depends on flags or even runtime values. To determine the sizes of these, C2 does a scratch emit, i.e., it emits the assembly for the MachNode to a dedicated code buffer and remembers the space needed.
In the debug build, this is also done on each emit for nodes with fixed size, to verify the fixed size.

The scratch emit buffer does not support relocations. Therefore any code needing relocations must check for scratch emit and skip the relocations if so.
The s390x architecture offers a lot of instructions with pc-relative addressing. We use these to access constants in the constant section of the code buffer. As this section can be resized, these offsets must be relocatable. Instead of coding the check whether a scratch emit is happening into all the MachNodes, we mark the scratch emit buffers as such and just skip the relocation in these buffers. This simplifies usage of relocations in a lot of nodes and macroAssembler routines.

Best regards,
Goetz.

From doug.simon at oracle.com Fri Sep 23 12:31:05 2016
From: doug.simon at oracle.com (Doug Simon)
Date: Fri, 23 Sep 2016 14:31:05 +0200
Subject: RFR: 8166517: [JVMCI] export JVMCI to auto-detected JVMCI compiler
In-Reply-To:
References:
Message-ID: <17D28FE7-8E15-4D0C-89DE-7178E75A339D@oracle.com>

Can I please get a review of this tiny change?

Thanks!

-Doug

> On 22 Sep 2016, at 09:54, Doug Simon wrote:
>
> When JVMCI compiler auto-selection (JDK-8160730) is used, then JVMCI needs to be exported to the selected compiler the same way as if the -Djvmci.Compiler property was specified.
>
> https://bugs.openjdk.java.net/browse/JDK-8166517
> http://cr.openjdk.java.net/~dnsimon/8166517/
>
> -Doug

From vladimir.kozlov at oracle.com Fri Sep 23 19:09:24 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 23 Sep 2016 12:09:24 -0700
Subject: RFR: 8166517: [JVMCI] export JVMCI to auto-detected JVMCI compiler
In-Reply-To: <17D28FE7-8E15-4D0C-89DE-7178E75A339D@oracle.com>
References: <17D28FE7-8E15-4D0C-89DE-7178E75A339D@oracle.com>
Message-ID: <57E57DE4.3000305@oracle.com>

Looks fine to me. But I thought Chris or Tom could review it. They are both OpenJDK Reviewers.

Thanks,
Vladimir

On 9/23/16 5:31 AM, Doug Simon wrote:
> Can I please get a review of this tiny change.
>
> Thanks!
>
> -Doug
>
>> On 22 Sep 2016, at 09:54, Doug Simon wrote:
>>
>> When JVMCI compiler auto-selection (JDK-8160730) is used, then JVMCI needs to be exported to the selected compiler the same way as if the -Djvmci.Compiler property was specified.
>>
>> https://bugs.openjdk.java.net/browse/JDK-8166517
>> http://cr.openjdk.java.net/~dnsimon/8166517/
>>
>> -Doug
>

From doug.simon at oracle.com Fri Sep 23 20:19:00 2016
From: doug.simon at oracle.com (Doug Simon)
Date: Fri, 23 Sep 2016 22:19:00 +0200
Subject: RFR: 8166517: [JVMCI] export JVMCI to auto-detected JVMCI compiler
In-Reply-To: <57E57DE4.3000305@oracle.com>
References: <17D28FE7-8E15-4D0C-89DE-7178E75A339D@oracle.com> <57E57DE4.3000305@oracle.com>
Message-ID: <720062BC-70D6-4592-AA39-9662398BBBC1@oracle.com>

> On 23 Sep 2016, at 21:09, Vladimir Kozlov wrote:
>
> Looks fine to me. But I thought Chris or Tom could review it. They are both OpenJDK Reviewers.

I'm not fussy - anyone with a sufficient role will do ;-)

Thanks for the review in any case.

-Doug

> On 9/23/16 5:31 AM, Doug Simon wrote:
>> Can I please get a review of this tiny change.
>>
>> Thanks!
>>
>> -Doug
>>
>>> On 22 Sep 2016, at 09:54, Doug Simon wrote:
>>>
>>> When JVMCI compiler auto-selection (JDK-8160730) is used, then JVMCI needs to be exported to the selected compiler the same way as if the -Djvmci.Compiler property was specified.
>>>
>>> https://bugs.openjdk.java.net/browse/JDK-8166517
>>> http://cr.openjdk.java.net/~dnsimon/8166517/
>>>
>>> -Doug
>>

From kishor.kharbas at intel.com Fri Sep 23 20:32:31 2016
From: kishor.kharbas at intel.com (Kharbas, Kishor)
Date: Fri, 23 Sep 2016 20:32:31 +0000
Subject: RFR(M) 8078122 : YMM registers upper 128 bits may get clobbered by a JNI call on windows
In-Reply-To: <57E30EF5.8010709@oracle.com>
References: <57BE1AD4.7070403@oracle.com> <6aee0e7c-76a5-a920-7099-a3edc349f205@oracle.com> <4af19c5d-9a7f-d18b-820b-6f3664b8183a@oracle.com> <7de8489c-943b-5ecf-48c1-0bffad101070@oracle.com> <57E30EF5.8010709@oracle.com>
Message-ID:

Ah ok, glad to know it wasn't a regression caused by this patch.

Thanks
Kishor

-----Original Message-----
From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
Sent: Wednesday, September 21, 2016 3:52 PM
To: Kharbas, Kishor; hotspot-compiler-dev at openjdk.java.net
Subject: Re: RFR(M) 8078122 : YMM registers upper 128 bits may get clobbered by a JNI call on windows

To close the loop on this. It looks like the machine on which the test failed had a well-known XMM saving problem in the Linux kernel. So we decided to push the changes. You may have seen the notification already.

regards,
Vladimir

On 9/8/16 5:46 PM, Kharbas, Kishor wrote:
> Hi Vladimir,
> I couldn't reproduce the error on my 32-bit Linux machine. The test
> was done on a Sandy bridge machine (has AVX instruction set) Please advise how to proceed further.
>
> Thanks
> Kishor
>
>
> -----Original Message-----
> From: Kharbas, Kishor
> Sent: Tuesday, September 6, 2016 5:40 PM
> To: Vladimir Kozlov;
> hotspot-compiler-dev at openjdk.java.net
> Cc: Kharbas, Kishor
> Subject: RE: RFR(M) 8078122 : YMM registers upper 128 bits may get
> clobbered by a JNI call on windows
>
> Hi Vladimir,
>
> The patch only touches code in _WIN64. I am having hard time to
> understand why the test fails for 32-bit Linux
>
> Btw, that test passes on Windows 64 platform. I am planning to test on Linux too.
> > Thanks > Kishor > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Tuesday, September 6, 2016 2:31 PM > To: Kharbas, Kishor ; > hotspot-compiler-dev at openjdk.java.net > Subject: Re: RFR(M) 8078122 : YMM registers upper 128 bits may get > clobbered by a JNI call on windows > > Next jtreg test failed on 32-bit Linux: > > hotspot/test/compiler/runtime/Test7196199.java > > ----------System.err:(57/2416)---------- > test_incrc: [41] = 8.081506E20 != 150000.0 > test_incrc: [42] = 1.8632992E31 != 150000.0 > test_incrc: [43] = 2.8397877E29 != 150000.0 ... > > https://bugs.openjdk.java.net/browse/JDK-7196199 > > was related to Upper bits (64-255) of XMM (YMM) registers are not saved/restored in interrupt handle code during safepoint. > > Looks like your changes are not enough. > > Vladimir > > > On 9/6/16 10:12 AM, Vladimir Kozlov wrote: >> Good. I start testing these changes. I will push it if testing pass. >> >> Thanks, >> Vladimir >> >> On 9/2/16 3:07 PM, Kharbas, Kishor wrote: >>> Thanks Vladimir, >>> >>> I have updated the patch : >>> http://cr.openjdk.java.net/~vdeshpande/8078122/webrev.02/ >>> >>> I looked for other places in src/cpu/x86/vm. I feel every case is >>> covered. >>> >>> - Kishor >>> >>> -----Original Message----- >>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>> Sent: Thursday, September 1, 2016 11:39 AM >>> To: Kharbas, Kishor ; >>> hotspot-compiler-dev at openjdk.java.net >>> Subject: Re: RFR(M) 8078122 : YMM registers upper 128 bits may get >>> clobbered by a JNI call on windows >>> >>> Good. But looks like some code relied on old stack layout in stubs, >>> for example sha256_AVX2(): >>> >>> #ifndef _WIN64 >>> _XMM_SAVE_SIZE = 0, >>> #else >>> _XMM_SAVE_SIZE = 8*16, >>> #endif >>> >>> Please, check that all other related code is fixed too. (I looked on >>> all cases of _WIN64 in src/cpu/x86/vm/). 
>>> >>> Thanks, >>> Vladimir >>> >>> On 8/31/16 10:17 PM, Kharbas, Kishor wrote: >>>> Hello, >>>> >>>> I removed the unwanted save and restore of registers in the range >>>> XMM6-XMM31 from the x64_64 stubs. >>>> I also removed the #ifdef _WIN64 block from x86.ad file. >>>> >>>> Link to the new patch : >>>> http://cr.openjdk.java.net/~vdeshpande/8078122/webrev.01/ >>>> >>>> Thanks >>>> Kishor >>>> >>>> >>>> -----Original Message----- >>>> From: Kharbas, Kishor >>>> Sent: Wednesday, August 24, 2016 6:24 PM >>>> To: Vladimir Kozlov ; >>>> hotspot-compiler-dev at openjdk.java.net >>>> Cc: Kharbas, Kishor >>>> Subject: RE: RFR(M) 8078122 : YMM registers upper 128 bits may get >>>> clobbered by a JNI call on windows >>>> >>>> Thanks Vladimir for quick feedback. >>>> I will look into the stubs which save the registers in the range >>>> XMM6-XMM31. Also the first comment makes perfect sense. >>>> >>>> Thanks >>>> Kishor >>>> >>>> -----Original Message----- >>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>> Sent: Wednesday, August 24, 2016 3:08 PM >>>> To: Kharbas, Kishor ; >>>> hotspot-compiler-dev at openjdk.java.net >>>> Subject: Re: RFR(M) 8078122 : YMM registers upper 128 bits may get >>>> clobbered by a JNI call on windows >>>> >>>> Hi Kishor, >>>> >>>> First, #ifdef _WIN64 is not needed anymore since calling convention >>>> is similat to unix now. >>>> >>>> Second, I would like you to look more broadly. With this change we >>>> don't need to preserve XMM6-XMM31 in our stubs for WIN64. I am not >>>> sure that we can remove all #ifdef _WIN64 there but for most of >>>> them I think we can do. Please, look. 
>>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 8/24/16 2:40 PM, Kharbas, Kishor wrote: >>>>> Requesting the community to review the patch for >>>>> https://bugs.openjdk.java.net/browse/JDK-8078122 >>>>> >>>>> Webrev : http://cr.openjdk.java.net/~vdeshpande/8078122/webrev.00 >>>>> >>>>> The patch changes the definitions of registers XMM6-XMM31 for WIN64. >>>>> >>>>> Thank you. >>>>> >>>>> Kishor >>>>> From vladimir.kozlov at oracle.com Fri Sep 23 21:25:52 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 23 Sep 2016 14:25:52 -0700 Subject: RFR(M): 8166562: C2: Suppress relocations in scratch emit. In-Reply-To: References: Message-ID: <57E59DE0.8020200@oracle.com> Looks good. I thought about using new type of CodeBlobType but it may need more changes then in your. Thanks, Vladimir On 9/23/16 3:58 AM, Lindenmaier, Goetz wrote: > Hi, > > Please review this nice and small improvement to scratch emit. It simplifies > > The s390 port considerably, but is completely independent. I introduced > > usage of the feature on ppc. I please need a sponsor. > > http://cr.openjdk.java.net/~goetz/wr16/8166562-scratch_emit/webrev.01/ > > The C2 compiler needs to know how much space the assembly emitted for a MachNode requires. For many nodes, this is statically specified. Some nodes don't have fixed sizes, as the code emitted depends > on flags or even runtime values. To determine the sizes of these, C2 does a scratch emit, i.e., it emits the assembly for the MachNode to a dedicated code buffer and remembers the space needed. In the > debug build, this is done on each emit also for nodes with fixed size to verify the fixed size. > > The scratch emit buffer does not support relocations. Therefore any code needing relocations must check for scratch emit and skip the relocations if so. > > The s390x architecture offers a lot of instructions with pc-relative addressing. We use these to access constants in the constant section of the code buffer. 
As this section can be resized, these > offsets must be able to be relocated. Instead of coding the check whether a scratch emit is happening into all the MachNodes, we mark the scratch emit buffers as such and just skip the relocation in > these buffers. This simplifies usage of relocations in a lot of nodes and macroAssembler routines. > > Best regards, > > Goetz. > From martin.doerr at sap.com Mon Sep 26 08:46:02 2016 From: martin.doerr at sap.com (Doerr, Martin) Date: Mon, 26 Sep 2016 08:46:02 +0000 Subject: RFR(S): 8166689: PPC64: Race condition between stack bang and non-entrant patching Message-ID: <1e90645ee3e14fee8ef0b7dc131157a2@DEWDFE13DE14.global.corp.sap> Hi, I found a race condition between stack bang and non-entrant patching on linux PPC64. The signal handler on linux PPC64 investigates the instruction when a stack bang has hit the protected zone. Another thread may patch the verified entry point preventing the signal handler from recognizing the stack overflow. This problem can be prevented by rearranging C1's prolog code such that the stack bang instruction will never be at the verified entry point. C2's prolog code is already implemented accordingly. My proposed fix is here: http://cr.openjdk.java.net/~mdoerr/8166689_PPC64_C1_stackbang/webrev.00/ I have also fixed a missing RewriteControl check in the template interpreter on PPC64. Please review. Thanks and best regards, Martin From goetz.lindenmaier at sap.com Mon Sep 26 09:16:21 2016 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Mon, 26 Sep 2016 09:16:21 +0000 Subject: RFR(S): 8166689: PPC64: Race condition between stack bang and non-entrant patching In-Reply-To: <1e90645ee3e14fee8ef0b7dc131157a2@DEWDFE13DE14.global.corp.sap> References: <1e90645ee3e14fee8ef0b7dc131157a2@DEWDFE13DE14.global.corp.sap> Message-ID: <59f6c7b138df480680b2f3e47d80c986@DEWDFE13DE50.global.corp.sap> Hi Martin, Good catch!
I wondered whether CodeOffsets::Frame_Complete is still set properly, but that's set in shared code after calling build_frame. So the fix is good. Thanks also for fixing the better byte behavior issue. Best regards, Goetz. From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Doerr, Martin Sent: Montag, 26. September 2016 10:46 To: hotspot-compiler-dev at openjdk.java.net Subject: RFR(S): 8166689: PPC64: Race condition between stack bang and non-entrant patching Hi, I found a race condition between stack bang and non-entrant patching on linux PPC64. The signal handler on linux PPC64 investigates the instruction when a stack bang has hit the protected zone. Another thread may patch the verified entry point preventing the signal handler from recognizing the stack overflow. This problem can be prevented by rearranging C1's prolog code such that the stack bang instruction will never be at the verified entry point. C2's prolog code is already implemented accordingly. My proposed fix is here: http://cr.openjdk.java.net/~mdoerr/8166689_PPC64_C1_stackbang/webrev.00/ I have also fixed a missing RewriteControl check in the template interpreter on PPC64. Please review. Thanks and best regards, Martin From martin.doerr at sap.com Mon Sep 26 09:27:25 2016 From: martin.doerr at sap.com (Doerr, Martin) Date: Mon, 26 Sep 2016 09:27:25 +0000 Subject: RFR(M): 8166140: C1: Possible integer overflow in LIRGenerator::generate_address on several platforms In-Reply-To: References: <29e2b45c984248da8172cf921b7811a6@DEWDFE13DE14.global.corp.sap> <66073428-8ee1-ecf2-52c0-8f4af5a6e6e8@oracle.com> <73f98e3882bd46dab427a02de68a1b93@DEWDFE13DE14.global.corp.sap> Message-ID: <14cca179436e4d49ae94f44977af033d@DEWDFE13DE14.global.corp.sap> Hi, can somebody sponsor this C1 bug fix, please? It already has one review.
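(A minimal sketch of the kind of overflow named in the subject line, not the LIRGenerator code itself. The thread describes constant index/displacement folding that is done in int and may overflow on 64-bit platforms; the sketch below assumes the folding has the form index * element_size + offset and widens to 64 bits before computing it. The helper names fold_displacement and fits_in_int32 are hypothetical and exist only for this illustration.)

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not HotSpot code: fold a constant array index into a
// displacement using 64-bit arithmetic, so the intermediate value cannot wrap.
int64_t fold_displacement(int32_t index, int shift, int32_t offset) {
  assert(shift >= 0 && shift <= 3);  // element sizes of 1, 2, 4 or 8 bytes
  return static_cast<int64_t>(index) * (int64_t{1} << shift) + offset;
}

// True when the folded displacement still fits a 32-bit signed value.
bool fits_in_int32(int64_t disp) {
  return disp >= INT32_MIN && disp <= INT32_MAX;
}
```

With an index of 0x40000000 and 8-byte elements the 64-bit result is 2^33 plus the offset, which a 32-bit int cannot hold; a caller would then have to keep the value in a wider type or materialize it in a register instead of an immediate.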
Thanks and best regards, Martin -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Montag, 19. September 2016 19:10 To: Doerr, Martin ; hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR(M): 8166140: C1: Possible integer overflow in LIRGenerator::generate_address on several platforms This looks good. Thanks, Vladimir On 9/19/16 6:47 AM, Doerr, Martin wrote: > Hi Vladimir, > > you're right. I have fixed that too in the new webrev: > http://cr.openjdk.java.net/~mdoerr/8166140_C1_int_overflow/webrev.01/ > > The 2 LIR_Address constructors you have mentioned don't have many users. The other ones look ok. > > Thanks and best regards, > Martin > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Donnerstag, 15. September 2016 20:06 > To: hotspot-compiler-dev at openjdk.java.net > Cc: Doerr, Martin > Subject: Re: RFR(M): 8166140: C1: Possible integer overflow in LIRGenerator::generate_address on several platforms > > Good but is is not enough. > > emit_array_address() in c1_LIRGenerator_x86.cpp has the same problem. > I would suggest to look on all places where next methods are called and make sure they are correct: > > LIR_Address(LIR_Opr base, intx disp, BasicType type) > LIR_Address(LIR_Opr base, LIR_Opr index, Scale scale, intx disp, BasicType type) > > Thanks, > Vladimir > > On 9/15/16 8:25 AM, Doerr, Martin wrote: >> Hi, >> >> >> >> as discussed with Vladimir, C1 contains code to simplify constant index/displacement addressing which uses int. However, >> int may overflow on 64 bit platforms. >> >> >> >> Please review the following webrev: >> >> http://cr.openjdk.java.net/~mdoerr/8166140_C1_int_overflow/webrev.00/ >> >> >> >> I'll also need a sponsor, please. 
>> >> >> >> Thanks and best regards, >> >> Martin >> >> >> From vladimir.kozlov at oracle.com Mon Sep 26 16:15:27 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 26 Sep 2016 09:15:27 -0700 Subject: RFR(M): 8166140: C1: Possible integer overflow in LIRGenerator::generate_address on several platforms In-Reply-To: <14cca179436e4d49ae94f44977af033d@DEWDFE13DE14.global.corp.sap> References: <29e2b45c984248da8172cf921b7811a6@DEWDFE13DE14.global.corp.sap> <66073428-8ee1-ecf2-52c0-8f4af5a6e6e8@oracle.com> <73f98e3882bd46dab427a02de68a1b93@DEWDFE13DE14.global.corp.sap> <14cca179436e4d49ae94f44977af033d@DEWDFE13DE14.global.corp.sap> Message-ID: <57E9499F.3010908@oracle.com> Sent to JPRT. Vladimir On 9/26/16 2:27 AM, Doerr, Martin wrote: > Hi, > > can somebody sponsor this C1 bug fix, please? > It has already one review. > > Thanks and best regards, > Martin > > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Montag, 19. September 2016 19:10 > To: Doerr, Martin ; hotspot-compiler-dev at openjdk.java.net > Subject: Re: RFR(M): 8166140: C1: Possible integer overflow in LIRGenerator::generate_address on several platforms > > This looks good. > > Thanks, > Vladimir > > On 9/19/16 6:47 AM, Doerr, Martin wrote: >> Hi Vladimir, >> >> you're right. I have fixed that too in the new webrev: >> http://cr.openjdk.java.net/~mdoerr/8166140_C1_int_overflow/webrev.01/ >> >> The 2 LIR_Address constructors you have mentioned don't have many users. The other ones look ok. >> >> Thanks and best regards, >> Martin >> >> -----Original Message----- >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >> Sent: Donnerstag, 15. September 2016 20:06 >> To: hotspot-compiler-dev at openjdk.java.net >> Cc: Doerr, Martin >> Subject: Re: RFR(M): 8166140: C1: Possible integer overflow in LIRGenerator::generate_address on several platforms >> >> Good but is is not enough. 
>> >> emit_array_address() in c1_LIRGenerator_x86.cpp has the same problem. >> I would suggest to look on all places where next methods are called and make sure they are correct: >> >> LIR_Address(LIR_Opr base, intx disp, BasicType type) >> LIR_Address(LIR_Opr base, LIR_Opr index, Scale scale, intx disp, BasicType type) >> >> Thanks, >> Vladimir >> >> On 9/15/16 8:25 AM, Doerr, Martin wrote: >>> Hi, >>> >>> >>> >>> as discussed with Vladimir, C1 contains code to simplify constant index/displacement addressing which uses int. However, >>> int may overflow on 64 bit platforms. >>> >>> >>> >>> Please review the following webrev: >>> >>> http://cr.openjdk.java.net/~mdoerr/8166140_C1_int_overflow/webrev.00/ >>> >>> >>> >>> I'll also need a sponsor, please. >>> >>> >>> >>> Thanks and best regards, >>> >>> Martin >>> >>> >>> From vitalyd at gmail.com Mon Sep 26 17:23:01 2016 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Mon, 26 Sep 2016 13:23:01 -0400 Subject: Odd inlining failure Message-ID: Hi guys, I'm trying to understand some "odd" inlining output from PrintInlining - hoping someone can explain/confirm. I have the following call graph: a() ------> b() --------------> c() So a() calls b() (and some other methods that aren't relevant here). b() calls c() and d() internally. a() gets hot, and is queued up for compilation (C2, tiered is disabled). b() is large (> MaxInlineSize) but less than FreqInlineSize - it gets inlined with "inline (hot)" in the log. c() is similar -- it's large, but < FreqInlineSize. However, the inlining output says "too big", and c() isn't inlined. Now, c() is *always* called when b() is called - it's a helper method (ironically, contains code moved out of b() to make b() smaller). b() is also the only caller of c(). So, if b() is "hot", why is c() not? Is it because compilation, and therefore inlining, started top-down here? CompileThreshold is the default here - 10000. 
Is it the case that b() reaches 10k, but c() is at 9999 still and is therefore not inlined? Let me know if something's not clear in the above description. Thanks From cthalinger at twitter.com Mon Sep 26 20:36:51 2016 From: cthalinger at twitter.com (Christian Thalinger) Date: Mon, 26 Sep 2016 10:36:51 -1000 Subject: RFR: 8166517: [JVMCI] export JVMCI to auto-detected JVMCI compiler In-Reply-To: <720062BC-70D6-4592-AA39-9662398BBBC1@oracle.com> References: <17D28FE7-8E15-4D0C-89DE-7178E75A339D@oracle.com> <57E57DE4.3000305@oracle.com> <720062BC-70D6-4592-AA39-9662398BBBC1@oracle.com> Message-ID: <77F1D693-5893-48EB-9A77-DF8CBC092F0D@twitter.com> > On Sep 23, 2016, at 10:19 AM, Doug Simon wrote: > > >> On 23 Sep 2016, at 21:09, Vladimir Kozlov wrote: >> >> Looks fine to me. But I thought Chris or Tom could review it. They are both OpenJDK Reviewers. Sorry, I traveled. Looks good, FTR. > > I'm not fussy - anyone with the sufficient role will do ;-) > > Thanks for the review in any case. > > -Doug > >> On 9/23/16 5:31 AM, Doug Simon wrote: >>> Can I please get a review of this tiny change. >>> >>> Thanks! >>> >>> -Doug >>> >>>> On 22 Sep 2016, at 09:54, Doug Simon wrote: >>>> >>>> When JVMCI compiler auto-selection (JDK-8160730) is used, then JVMCI needs to be exported to the selected compiler the same way as if the -Djvmci.Compiler property was specified. >>>> >>>> https://bugs.openjdk.java.net/browse/JDK-8166517 >>>> http://cr.openjdk.java.net/~dnsimon/8166517/ >>>> >>>> -Doug >>> > From rasbold at google.com Mon Sep 26 22:18:31 2016 From: rasbold at google.com (Chuck Rasbold) Date: Mon, 26 Sep 2016 15:18:31 -0700 Subject: RFR(S): 8166742 : SIGFPE in C2 Loop IV elimination Message-ID: A small fix for an edge case crash in C2... Bug: https://bugs.openjdk.java.net/browse/JDK-8166742 Webrev: http://cr.openjdk.java.net/~rasbold/8166742/webrev.00/ Requesting a sponsor and reviews. Thanks.
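(A minimal sketch of the edge case behind this report, assuming the cause identified in the follow-up discussion: when stride_con2 is min_jint and stride_con is -1, the quotient cannot be represented in 32 bits, and the 32-bit division raises SIGFPE on x86. The helper exact_int_ratio is hypothetical and is not the HotSpot code; it widens to 64 bits before dividing and rejects quotients that do not fit in an int.)

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not HotSpot code: returns true and stores the quotient
// in *ratio when stride2 is an exact multiple of stride and the quotient is
// representable as a 32-bit int.
bool exact_int_ratio(int32_t stride2, int32_t stride, int32_t* ratio) {
  assert(ratio != nullptr);
  if (stride == 0) return false;  // division by zero is never valid
  // Widen before dividing: INT32_MIN / -1 is well defined in 64 bits.
  int64_t q = static_cast<int64_t>(stride2) / stride;
  if (q < INT32_MIN || q > INT32_MAX) return false;  // quotient too large
  if (q * stride != stride2) return false;           // not an exact multiple
  *ratio = static_cast<int32_t>(q);
  return true;
}
```

The range check comes before the narrowing cast, so the only unrepresentable quotient (2^31, from the minimum int divided by -1) is rejected instead of being truncated.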
-- Chuck From vladimir.kozlov at oracle.com Mon Sep 26 22:45:23 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 26 Sep 2016 15:45:23 -0700 Subject: RFR(S): 8166742 : SIGFPE in C2 Loop IV elimination In-Reply-To: References: Message-ID: <57E9A503.6090506@oracle.com> Hi Chuck Can you do 'long' arithmetic in existing condition to catch integer overflow instead? if ((ratio_con * stride_con) == stride_con2) { // Check for exact thanks, Vladimir On 9/26/16 3:18 PM, Chuck Rasbold wrote: > A small fix for an edge case crash in C2... > > Bug: https://bugs.openjdk.java.net/browse/JDK-8166742 > Webrev: http://cr.openjdk.java.net/~rasbold/8166742/webrev.00/ > > Requesting a sponsor and reviews. Thanks. > > -- Chuck From rasbold at google.com Tue Sep 27 00:01:14 2016 From: rasbold at google.com (Chuck Rasbold) Date: Mon, 26 Sep 2016 17:01:14 -0700 Subject: RFR(S): 8166742 : SIGFPE in C2 Loop IV elimination In-Reply-To: <57E9A503.6090506@oracle.com> References: <57E9A503.6090506@oracle.com> Message-ID: Just to confirm, are you suggesting that the ratio be first computed as a 64 bit quantity, effectively along the lines of... long ratio_conl = ((long) stride_con2) / stride_con; if ((ratio_conl * stride_con) == stride_con2 && ratio_conl < 0x8000000 ) { // Check for exact int ratio_con = (int) ratio_conl; On Mon, Sep 26, 2016 at 3:45 PM, Vladimir Kozlov wrote: > Hi Chuck > > Can you do 'long' arithmetic in existing condition to catch integer > overflow instead? > > if ((ratio_con * stride_con) == stride_con2) { // Check for exact > > thanks, > Vladimir > > > > On 9/26/16 3:18 PM, Chuck Rasbold wrote: > >> A small fix for an edge case crash in C2... >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8166742 >> Webrev: http://cr.openjdk.java.net/~rasbold/8166742/webrev.00/ >> >> Requesting a sponsor and reviews. Thanks.
>> >> -- Chuck >> > From vladimir.kozlov at oracle.com Tue Sep 27 00:35:17 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 26 Sep 2016 17:35:17 -0700 Subject: RFR(S): 8166742 : SIGFPE in C2 Loop IV elimination In-Reply-To: References: <57E9A503.6090506@oracle.com> Message-ID: <57E9BEC5.2060308@oracle.com> Slightly different (cast after /) and jlong type: jlong ratio_conl = (jlong) (stride_con2 / stride_con); if ((ratio_conl * stride_con) == (jlong)stride_con2) { // Check for exact Vladimir On 9/26/16 5:01 PM, Chuck Rasbold wrote: > Just to confirm, are you suggesting that the ratio be first computed as a 64 bit quantity, effectively along the lines of... > > long ratio_conl = ((long) stride_con2) / stride_con; > > if ((ratio_conl * stride_con) == stride_con2 && > ratio_conl < 0x8000000 ) { // Check for exact > int ratio_con = (int) ratio_conl; > > > On Mon, Sep 26, 2016 at 3:45 PM, Vladimir Kozlov > wrote: > > Hi Chuck > > Can you do 'long' arithmetic in existing condition to catch integer overflow instead? > > if ((ratio_con * stride_con) == stride_con2) { // Check for exact > > thanks, > Vladimir > > > > On 9/26/16 3:18 PM, Chuck Rasbold wrote: > > A small fix for an edge case crash in C2... > > Bug: https://bugs.openjdk.java.net/browse/JDK-8166742 > Webrev: http://cr.openjdk.java.net/~rasbold/8166742/webrev.00/ > > Requesting a sponsor and reviews. Thanks.
> > -- Chuck > > From martin.doerr at sap.com Tue Sep 27 08:19:01 2016 From: martin.doerr at sap.com (Doerr, Martin) Date: Tue, 27 Sep 2016 08:19:01 +0000 Subject: RFR(M): 8166140: C1: Possible integer overflow in LIRGenerator::generate_address on several platforms In-Reply-To: <57E9499F.3010908@oracle.com> References: <29e2b45c984248da8172cf921b7811a6@DEWDFE13DE14.global.corp.sap> <66073428-8ee1-ecf2-52c0-8f4af5a6e6e8@oracle.com> <73f98e3882bd46dab427a02de68a1b93@DEWDFE13DE14.global.corp.sap> <14cca179436e4d49ae94f44977af033d@DEWDFE13DE14.global.corp.sap> <57E9499F.3010908@oracle.com> Message-ID: <563ea24df5e9458092264113e8989bbf@DEWDFE13DE14.global.corp.sap> Hi Vladimir, thank you very much for your support. Best regards, Martin -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Montag, 26. September 2016 18:15 To: Doerr, Martin ; hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR(M): 8166140: C1: Possible integer overflow in LIRGenerator::generate_address on several platforms Sent to JPRT. Vladimir On 9/26/16 2:27 AM, Doerr, Martin wrote: > Hi, > > can somebody sponsor this C1 bug fix, please? > It has already one review. > > Thanks and best regards, > Martin > > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Montag, 19. September 2016 19:10 > To: Doerr, Martin ; hotspot-compiler-dev at openjdk.java.net > Subject: Re: RFR(M): 8166140: C1: Possible integer overflow in LIRGenerator::generate_address on several platforms > > This looks good. > > Thanks, > Vladimir > > On 9/19/16 6:47 AM, Doerr, Martin wrote: >> Hi Vladimir, >> >> you're right. I have fixed that too in the new webrev: >> http://cr.openjdk.java.net/~mdoerr/8166140_C1_int_overflow/webrev.01/ >> >> The 2 LIR_Address constructors you have mentioned don't have many users. The other ones look ok. 
>> >> Thanks and best regards, >> Martin >> >> -----Original Message----- >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >> Sent: Donnerstag, 15. September 2016 20:06 >> To: hotspot-compiler-dev at openjdk.java.net >> Cc: Doerr, Martin >> Subject: Re: RFR(M): 8166140: C1: Possible integer overflow in LIRGenerator::generate_address on several platforms >> >> Good but is is not enough. >> >> emit_array_address() in c1_LIRGenerator_x86.cpp has the same problem. >> I would suggest to look on all places where next methods are called and make sure they are correct: >> >> LIR_Address(LIR_Opr base, intx disp, BasicType type) >> LIR_Address(LIR_Opr base, LIR_Opr index, Scale scale, intx disp, BasicType type) >> >> Thanks, >> Vladimir >> >> On 9/15/16 8:25 AM, Doerr, Martin wrote: >>> Hi, >>> >>> >>> >>> as discussed with Vladimir, C1 contains code to simplify constant index/displacement addressing which uses int. However, >>> int may overflow on 64 bit platforms. >>> >>> >>> >>> Please review the following webrev: >>> >>> http://cr.openjdk.java.net/~mdoerr/8166140_C1_int_overflow/webrev.00/ >>> >>> >>> >>> I'll also need a sponsor, please. >>> >>> >>> >>> Thanks and best regards, >>> >>> Martin >>> >>> >>> From martin.doerr at sap.com Tue Sep 27 09:03:12 2016 From: martin.doerr at sap.com (Doerr, Martin) Date: Tue, 27 Sep 2016 09:03:12 +0000 Subject: RFR(XXS): 8166767: C2: OptimizeStringConcat produces wrong results when copying UTF16 Strings Message-ID: <6359416b37c7459594386d16d48d0644@DEWDFE13DE14.global.corp.sap> Hi, can somebody review and sponsor this very small C2 bug fix, please? PhaseStringOpts::copy_constant_string increments index twice in the copy loop when source and destination are UTF16 encoded. http://cr.openjdk.java.net/~mdoerr/8166767_StringOpts_copy_bug/webrev.00/ Thanks and best regards, Martin
From tobias.hartmann at oracle.com Tue Sep 27 09:30:41 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 27 Sep 2016 11:30:41 +0200 Subject: RFR(XXS): 8166767: C2: OptimizeStringConcat produces wrong results when copying UTF16 Strings In-Reply-To: <6359416b37c7459594386d16d48d0644@DEWDFE13DE14.global.corp.sap> References: <6359416b37c7459594386d16d48d0644@DEWDFE13DE14.global.corp.sap> Message-ID: <57EA3C41.5050206@oracle.com> Hi Martin, On 27.09.2016 11:03, Doerr, Martin wrote: > PhaseStringOpts::copy_constant_string increments index twice in the copy loop when source and destination are UTF16 encoded. > http://cr.openjdk.java.net/~mdoerr/8166767_StringOpts_copy_bug/webrev.00/ The index passed to readChar refers to an index in the source byte array (and length is the size in bytes): // Read two bytes from index and index+1 and convert them to a char For example, if we want to read the second char value, we need to use index = 4. Therefore, if we read chars, we need to increment i twice in each loop iteration to get the correct char index in the byte array. Or am I missing something? Best regards, Tobias From martin.doerr at sap.com Tue Sep 27 09:40:59 2016 From: martin.doerr at sap.com (Doerr, Martin) Date: Tue, 27 Sep 2016 09:40:59 +0000 Subject: RFR(XXS): 8166767: C2: OptimizeStringConcat produces wrong results when copying UTF16 Strings In-Reply-To: <57EA3C41.5050206@oracle.com> References: <6359416b37c7459594386d16d48d0644@DEWDFE13DE14.global.corp.sap> <57EA3C41.5050206@oracle.com> Message-ID: <0d0271854c0d4792ac0b2cc0180a83d5@DEWDFE13DE14.global.corp.sap> Hi Tobias, thank you very much for reviewing. You're right. I just noticed that the code didn't fit to another change which I have locally. I got to fix that. Sorry for the mistake. I'll close the bug. Best regards, Martin -----Original Message----- From: Tobias Hartmann [mailto:tobias.hartmann at oracle.com] Sent: Dienstag, 27.
September 2016 11:31 To: Doerr, Martin ; hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR(XXS): 8166767: C2: OptimizeStringConcat produces wrong results when copying UTF16 Strings Hi Martin, On 27.09.2016 11:03, Doerr, Martin wrote: > PhaseStringOpts::copy_constant_string increments index twice in the copy loop when source and destination are UTF16 encoded. > http://cr.openjdk.java.net/~mdoerr/8166767_StringOpts_copy_bug/webrev.00/ The index passed to readChar refers to an index in the source byte array (and length is the size in bytes): // Read two bytes from index and index+1 and convert them to a char For example, if we want to read the second char value, we need to use index = 4. Therefore, if we read chars, we need to increment i twice in each loop iteration to get the correct char index in the byte array. Or am I missing something? Best regards, Tobias From tobias.hartmann at oracle.com Tue Sep 27 10:19:46 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 27 Sep 2016 12:19:46 +0200 Subject: RFR(XXS): 8166767: C2: OptimizeStringConcat produces wrong results when copying UTF16 Strings In-Reply-To: <0d0271854c0d4792ac0b2cc0180a83d5@DEWDFE13DE14.global.corp.sap> References: <6359416b37c7459594386d16d48d0644@DEWDFE13DE14.global.corp.sap> <57EA3C41.5050206@oracle.com> <0d0271854c0d4792ac0b2cc0180a83d5@DEWDFE13DE14.global.corp.sap> Message-ID: <57EA47C2.4000404@oracle.com> Hi Martin, On 27.09.2016 11:40, Doerr, Martin wrote: > Hi Tobias, > > thank you very much for reviewing. You're right. > I just noticed that the code didn't fit to another change which I have locally. I got to fix that. > Sorry for the mistake. I'll close the bug. Sure, no problem! Best regards, Tobias > Best regards, > Martin > > > -----Original Message----- > From: Tobias Hartmann [mailto:tobias.hartmann at oracle.com] > Sent: Dienstag, 27. 
September 2016 11:31 > To: Doerr, Martin ; hotspot-compiler-dev at openjdk.java.net > Subject: Re: RFR(XXS): 8166767: C2: OptimizeStringConcat produces wrong results when copying UTF16 Strings > > Hi Martin, > > On 27.09.2016 11:03, Doerr, Martin wrote: >> PhaseStringOpts::copy_constant_string increments index twice in the copy loop when source and destination are UTF16 encoded. >> http://cr.openjdk.java.net/~mdoerr/8166767_StringOpts_copy_bug/webrev.00/ > > The index passed to readChar refers to an index in the source byte array (and length is the size in bytes): > // Read two bytes from index and index+1 and convert them to a char > > For example, if we want to read the second char value, we need to use index = 4. Therefore, if we read chars, we need to increment i twice in each loop iteration to get the correct char index in the byte array. > > Or am I missing something? > > Best regards, > Tobias > From roland.schatz at oracle.com Tue Sep 27 12:54:56 2016 From: roland.schatz at oracle.com (Roland Schatz) Date: Tue, 27 Sep 2016 14:54:56 +0200 Subject: RFR: 8166781: fix wrong comment in ReceiverTypeData Message-ID: <6e26665d-b687-4ef8-619f-b398eca63b2a@oracle.com> Hi, Please review this comment fix: webrev: http://cr.openjdk.java.net/~rschatz/JDK-8166781/webrev.00/ issue: https://bugs.openjdk.java.net/browse/JDK-8166781 According to my reading of the code, the comment should now agree with the code. But I don't pretend to really understand that code. 
It would be nice if someone who knows about the profiling code could confirm that's actually true ;) See also previous thread about that issue: http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2016-August/024105.html Thanks, Roland From rasbold at google.com Tue Sep 27 14:56:27 2016 From: rasbold at google.com (Chuck Rasbold) Date: Tue, 27 Sep 2016 07:56:27 -0700 Subject: RFR(S): 8166742 : SIGFPE in C2 Loop IV elimination In-Reply-To: <57E9BEC5.2060308@oracle.com> References: <57E9A503.6090506@oracle.com> <57E9BEC5.2060308@oracle.com> Message-ID: On Mon, Sep 26, 2016 at 5:35 PM, Vladimir Kozlov wrote: > Slightly different (cast after /) and jlong type: > > jlong ratio_conl = (jlong) (stride_con2 / stride_con); > The division above won't work (at least, it raises a SIGFPE on my Linux x86 platform) when stride_con2 == min_jint and stride_con == -1. > if ((ratio_conl * stride_con) == (jlong)stride_con2) { // Check for exact > > What would be the value of ratio_conl such that this test fails? I think I'm missing something... -- Chuck > Vladimir > > On 9/26/16 5:01 PM, Chuck Rasbold wrote: > >> Just to confirm, are you suggesting that the ratio be first computed as a >> 64 bit quantity, effectively along the lines of... >> >> long ratio_conl = ((long) stride_con2) / stride_con; >> >> if ((ratio_conl * stride_con) == stride_con2 && >> ratio_conl < 0x8000000 ) { // Check for exact >> int ratio_con = (int) ratio_conl; >> >> >> On Mon, Sep 26, 2016 at 3:45 PM, Vladimir Kozlov < >> vladimir.kozlov at oracle.com > wrote: >> >> Hi Chuck >> >> Can you do 'long' arithmetic in existing condition to catch integer >> overflow instead? >> >> if ((ratio_con * stride_con) == stride_con2) { // Check for exact >> >> thanks, >> Vladimir >> >> >> >> On 9/26/16 3:18 PM, Chuck Rasbold wrote: >> >> A small fix for an edge case crash in C2... 
>> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8166742 >> Webrev: http://cr.openjdk.java.net/~rasbold/8166742/webrev.00/ >> >> Requesting a sponsor and reviews. Thanks. >> >> -- Chuck >> >> >> From vladimir.kozlov at oracle.com Tue Sep 27 16:48:44 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 27 Sep 2016 09:48:44 -0700 Subject: RFR(S): 8166742 : SIGFPE in C2 Loop IV elimination In-Reply-To: References: <57E9A503.6090506@oracle.com> <57E9BEC5.2060308@oracle.com> Message-ID: <274bdc7a-e7e5-882c-5715-32063f1e0f2c@oracle.com> So why is it SIGFPE when both values are 'int'? I thought it was incorrect results that cause SIGFPE, which is why I suggested checking for integer overflow. Let's then go with your second suggested change here. But let's check that the ratio is small first and cast to (jint); otherwise the long check is useless: // The ratio of the two strides cannot be represented as an int // if stride_con2 is min_int and stride_con is -1. jlong ratio_conl = ((jlong)stride_con2 / stride_con); if ((ratio_conl < 0x80000000L) && (jint)(ratio_conl * stride_con) == stride_con2) { // Check for exact jint ratio_con = (jint)ratio_conl; Thanks, Vladimir On 9/27/16 7:56 AM, Chuck Rasbold wrote: > > > On Mon, Sep 26, 2016 at 5:35 PM, Vladimir Kozlov wrote: > > Slightly different (cast after /) and jlong type: > > jlong ratio_conl = (jlong) (stride_con2 / stride_con); > > > The division above won't work (at least, it raises a SIGFPE on my Linux > x86 platform) when stride_con2 == min_jint and stride_con == -1. > > > if ((ratio_conl * stride_con) == (jlong)stride_con2) { // Check > for exact > > > What would be the value of ratio_conl such that this test fails? I > think I'm missing something...
> > -- Chuck > > > Vladimir > > On 9/26/16 5:01 PM, Chuck Rasbold wrote: > > Just to confirm, are you suggesting that the ratio be first > computed as a 64 bit quantity, effectively along the lines of... > > long ratio_conl = ((long) stride_con2) / stride_con; > > if ((ratio_conl * stride_con) == stride_con2 && > ratio_conl < 0x8000000 ) { // Check for exact > int ratio_con = (int) ratio_conl; > > > On Mon, Sep 26, 2016 at 3:45 PM, Vladimir Kozlov > > >> wrote: > > Hi Chuck > > Can you do 'long' arithmetic in existing condition to catch > integer overflow instead? > > if ((ratio_con * stride_con) == stride_con2) { // Check for > exact > > thanks, > Vladimir > > > > On 9/26/16 3:18 PM, Chuck Rasbold wrote: > > A small fix for an edge case crash in C2... > > Bug: https://bugs.openjdk.java.net/browse/JDK-8166742 > > > > Webrev: > http://cr.openjdk.java.net/~rasbold/8166742/webrev.00/ > > > > > Requesting a sponsor and reviews. Thanks. > > -- Chuck > > > From rasbold at google.com Tue Sep 27 20:57:35 2016 From: rasbold at google.com (Chuck Rasbold) Date: Tue, 27 Sep 2016 13:57:35 -0700 Subject: RFR(S): 8166742 : SIGFPE in C2 Loop IV elimination In-Reply-To: <274bdc7a-e7e5-882c-5715-32063f1e0f2c@oracle.com> References: <57E9A503.6090506@oracle.com> <57E9BEC5.2060308@oracle.com> <274bdc7a-e7e5-882c-5715-32063f1e0f2c@oracle.com> Message-ID: Sorry for not being transparent enough. Here's an external reference that describes the problem that is being encountered by the division: https://www.gnu.org/software/autoconf/manual/autoconf-2.67/html_node/Signed-Integer-Division.html That's why the original fix targeted a very specific case. One can't represent ratio_con as a 32 bit value in that case. Worse, trying to compute it by division causes a SIGFPE. Do you think the revised code below is as straightforward as the original? -- Chuck On Tue, Sep 27, 2016 at 9:48 AM, Vladimir Kozlov wrote: > So why it is SIGFPE when both values are 'int'? 
> > I thought it is incorrect results cause SIGFPE that is why I suggested to > check for integer overflow. > > Lets then go with your second suggested change here. But let check that > ratio is small first and do cast to (jint) otherwise the long check is > useless: > > // The ratio of the two strides cannot be represented as an int > // if stride_con2 is min_int and stride_con is -1. > jlong ratio_conl = ((jlong)stride_con2 / stride_con); > > if ((ratio_conl < 0x80000000L) && > (jint)(ratio_conl * stride_con) == stride_con2) { // Check for exact > jint ratio_con = (jint)ratio_conl; > > Thanks, > Vladimir > > On 9/27/16 7:56 AM, Chuck Rasbold wrote: > >> >> >> On Mon, Sep 26, 2016 at 5:35 PM, Vladimir Kozlov >> > wrote: >> >> Slightly different (cast after /) and jlong type: >> >> jlong ratio_conl = (jlong) (stride_con2 / stride_con); >> >> >> The division above won't work (at least, it raises a SIGFPE on my Linux >> x86 platform) when stride_con2 == min_jint and stride_con == -1. >> >> >> if ((ratio_conl * stride_con) == (jlong)stride_con2) { // Check >> for exact >> >> >> What would be the value of ratio_conl such that this test fails? I >> think I'm missing something... >> >> -- Chuck >> >> >> Vladimir >> >> On 9/26/16 5:01 PM, Chuck Rasbold wrote: >> >> Just to confirm, are you suggesting that the ratio be first >> computed as a 64 bit quantity, effectively along the lines of... >> >> long ratio_conl = ((long) stride_con2) / stride_con; >> >> if ((ratio_conl * stride_con) == stride_con2 && >> ratio_conl < 0x8000000 ) { // Check for exact >> int ratio_con = (int) ratio_conl; >> >> >> On Mon, Sep 26, 2016 at 3:45 PM, Vladimir Kozlov >> >> > >> wrote: >> >> Hi Chuck >> >> Can you do 'long' arithmetic in existing condition to catch >> integer overflow instead? >> >> if ((ratio_con * stride_con) == stride_con2) { // Check for >> exact >> >> thanks, >> Vladimir >> >> >> >> On 9/26/16 3:18 PM, Chuck Rasbold wrote: >> >> A small fix for an edge case crash in C2... 
>> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8166742 >> >> > > >> Webrev: >> http://cr.openjdk.java.net/~rasbold/8166742/webrev.00/ >> >> > > >> >> Requesting a sponsor and reviews. Thanks. >> >> -- Chuck >> >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From markus.gronlund at oracle.com Tue Sep 27 20:58:46 2016 From: markus.gronlund at oracle.com (Markus Gronlund) Date: Tue, 27 Sep 2016 13:58:46 -0700 (PDT) Subject: RFR(XS): 8166806: Add intrinsic support for writer used in event based tracing Message-ID: Greetings, Kindly asking for reviews for the following change: Bug: http://bugs.openjdk.java.net/browse/JDK-8166806 Webrev: http://cr.openjdk.java.net/~mgronlun/8166806/webrev/ Thanks in advance Markus -------------- next part -------------- An HTML attachment was scrubbed... URL: From rwestrel at redhat.com Wed Sep 28 09:30:03 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Wed, 28 Sep 2016 11:30:03 +0200 Subject: RFR(S): 8166836: Elimination of clone's ArrayCopyNode may make compilation fail silently Message-ID: http://cr.openjdk.java.net/~roland/8166836/webrev.00/ For a non escaping allocation, the logic that eliminates a clone's ArrayCopy node, adds loads for each field of the eliminated allocation at safepoints. For each load, the load's control is set so the load is after the memory barrier that always precedes the ArrayCopy node but the memory edge is set to the memory state before the memory barrier. Anti dependency edges are added to the load nodes resulting in a graph that can't be scheduled and a compilation that always fail. I think the memory edges of the loads bypass the memory barrier so the loads have a chance to be optimized out (if for instance the loads are from a just allocated object). But that code doesn't seem to even work in simple cases. Instead, I propose we eliminate the memory barrier before the ArrayCopy node (and the one after). 
It's quite unfortunate that this wasn't found by testing because
compilations where the graph is non schedulable simply fail. This could
have gone unnoticed much longer. In debug builds shouldn't we abort the
VM in C2Compiler::compile_method() if the compilation fails because of a
non schedulable graph?

Roland.

From vladimir.x.ivanov at oracle.com Wed Sep 28 10:04:39 2016
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Wed, 28 Sep 2016 13:04:39 +0300
Subject: RFR(S): 8166836: Elimination of clone's ArrayCopyNode may make compilation fail silently
In-Reply-To:
References:
Message-ID: <182f42a0-caca-ea5b-0e7d-3ebaff5f1bc1@oracle.com>

> http://cr.openjdk.java.net/~roland/8166836/webrev.00/

Looks good.

> It's quite unfortunate that this wasn't found by testing because
> compilations where the graph is non schedulable simply fail. This could
> have gone unnoticed much longer. In debug builds shouldn't we abort the
> VM in C2Compiler::compile_method() if the compilation fails because of a
> non schedulable graph?

Sounds reasonable. I expect there are other cases when compilers bail out
unexpectedly. It would be good to have an assert checking it doesn't happen.

Best regards,
Vladimir Ivanov

From rwestrel at redhat.com Wed Sep 28 13:03:40 2016
From: rwestrel at redhat.com (Roland Westrelin)
Date: Wed, 28 Sep 2016 15:03:40 +0200
Subject: Odd inlining failure
In-Reply-To:
References:
Message-ID:

Hi Vitaly,

> I'm trying to understand some "odd" inlining output from PrintInlining -
> hoping someone can explain/confirm.

You could run with -XX:+PrintMethodData (diagnostic). It prints all
profile data at the end of the execution of the VM. You can then look at
invocation counts at the call site of c in b and b in a.

> I have the following call graph:
> a()
> ------> b()
> --------------> c()
>
> So a() calls b() (and some other methods that aren't relevant here). b()
> calls c() and d() internally. a() gets hot, and is queued up for
> compilation (C2, tiered is disabled).
>
> b() is large (> MaxInlineSize) but less than FreqInlineSize - it gets
> inlined with "inline (hot)" in the log. c() is similar -- it's large, but
> < FreqInlineSize. However, the inlining output says "too big", and c()
> isn't inlined. Now, c() is *always* called when b() is called - it's a
> helper method (ironically, contains code moved out of b() to make b()
> smaller). b() is also the only caller of c().
>
> So, if b() is "hot", why is c() not? Is it because compilation, and
> therefore inlining, started top-down here? CompileThreshold is the default
> here - 10000. Is it the case that b() reaches 10k, but c() is at 9999
> still and is therefore not inlined?

Maybe b() gets compiled early which would mean we stop collecting
profile data at the call site for c() in b()? Is there a loop in b() that
would cause it to be compiled before it's invoked 10k times?

Roland.

From vitalyd at gmail.com Wed Sep 28 14:49:34 2016
From: vitalyd at gmail.com (Vitaly Davidovich)
Date: Wed, 28 Sep 2016 10:49:34 -0400
Subject: Odd inlining failure
In-Reply-To:
References:
Message-ID:

Apologies, I accidentally dropped the list from my reply to Roland (quoted
below).

On Wed, Sep 28, 2016 at 10:38 AM, Roland Westrelin wrote:

>
> > In this case when b() is called its invocation count is +1 to c() because
> > c() is only called by b(). Now, a() has a big switch statement, with one
> > arm calling into b(). a() is called in a loop of sorts. So I think the
> > switch arm calling b() gets hot and inlining starts. But since inlining is
> > going top-down here, I suspect it's failing to inline helper methods, such
> > as c(), that are just as hot as b() but the 10000'th invocation hasn't been
> > recorded yet? This seems kind of broken though, if true, so I'm wondering
> > if I'm missing something.
> >
> > When recursively inlining, starting at a hot method, do the recursive
> > callsites, like c(), need to also have exactly 10,000 (or more)
> > invocations? What if it's, say, 9995?
>
> CompileThreshold=10000 is when compilation is triggered. It doesn't come
> into play to decide whether inlining happens or not. Also if you have
> loops, compilation is triggered when invocation counter + backedge
> counter exceeds CompileThreshold.
>
> Note also, that profiling (invocation counters at calls etc.) doesn't
> start until a method has been invoked a minimum number of times:
> InterpreterProfilePercentage % of CompileThreshold. So profiling doesn't
> start until invocation counter + backedge counter is greater than 3300
> by default with tiered off. If your method is inlined before it's been
> invoked 3300, all the call sites in the method are cold.
>
Ah, maybe that's the reason -- there's a loop in the outer method (a()), so
maybe that's the cause. I'll need to look at the compilation log or the
PrintMethodData that you suggested.

>
> And if methods are invoked by multiple threads, updates to the counters
> can be lost.
>
Single thread here.
>
> The code that triggers inlining is:
>
>   int call_site_count = method()->scale_count(profile.count());
>   int invoke_count = method()->interpreter_invocation_count();
>
>   assert(invoke_count != 0, "require invocation count greater than zero");
>   int freq = call_site_count / invoke_count;
>
>   // bump the max size if the call is frequent
>   if ((freq >= InlineFrequencyRatio) ||
>       (call_site_count >= InlineFrequencyCount) ||
>       is_unboxing_method(callee_method, C) ||
>       is_init_with_ea(callee_method, caller_method, C)) {
>
>     max_inline_size = C->freq_inline_size();
>     if (size <= max_inline_size && TraceFrequencyInlining) {
>       CompileTask::print_inline_indent(inline_level());
>       tty->print_cr("Inlined frequent method (freq=%d count=%d):", freq,
>                     call_site_count);
>       CompileTask::print_inline_indent(inline_level());
>       callee_method->print();
>       tty->cr();
>     }
>   } else {
>     // Not hot. Check for medium-sized pre-existing nmethod at cold sites.
>     if (callee_method->has_compiled_code() &&
>         callee_method->instructions_size() > inline_small_code_size) {
>       set_msg("already compiled into a medium method");
>       return false;
>     }
>   }
>   if (size > max_inline_size) {
>     if (max_inline_size > default_max_inline_size) {
>       set_msg("hot method too big");
>     } else {
>       set_msg("too big");
>     }
>     return false;
>   }
>
> So a call site is hot if the call site count exceeds
> InlineFrequencyCount (100) or the frequency (ratio of number of time the
> call was taken and the number of time the caller was entered) exceeds
> InlineFrequencyRatio (20). InlineFrequencyCount is way below 10000.
>
> Do you have this as a simple test case that you can share?
>
I don't yet - I'll see if I can reproduce something. As noted,
microbenchmarks/reduced test cases usually do the right thing but when
same/similar code shapes/call graphs are incorporated into a large app,
they don't.

>
> > I need to go look at the inlining heuristic code again, but maybe you know
> > offhand.
> >
> > As a general observation, I'm seeing lots of inlining failures, for a
> > variety of reasons, in a complex app where I think inlining would help.
> > The heuristics aren't doing the "right" thing. I know there are a few
> > longstanding JBS entries around inlining, but I'm wondering if they will
> > ever be addressed or whether Graal simply takes over for C2. I wonder if
> > Oracle or RedHat or anyone else looks at inlining output on large apps as a
> > way to assess its effect? Microbenchmarks are usually fine because the
> > profile is different, methods typically don't fail to inline because of
> > InlineSmallCode, etc.
> >
> > I know I'm preaching to the choir and I apologize for the semi-rant, but
> > inlining is paramount to Java performance, moreso than other languages (eg
> > C/C++) because of all the safety checks. Given @ForceInline isn't really
> > available for end users, it's a huge pain and sometimes practically
> > impossible to convince C2 to inline something.
> >
> > I understand Graal has better inlining properties (I believe it pseudo
> > inlines to see if it's profitable, regardless of bytecode size). Is that
> > the Hotspot answer to improved inlining?
> >
> > What the heck is everyone else doing for large apps with lots of hot
> > callsites? :) I can move some code around manually to outline some
> > (uncommon) code to slim down methods, but that's a hack IMO.
>
> You didn't send that email to the list. Was it intended?

Argh - no, that was unintentional. I'm adding the list back in here.

> I'm curious
> what others would say. All I can say is that inlining heuristics are a
> known weakness of c2. Improving them is not a simple project. Also
> having graal on the horizon probably doesn't help: it could be a lot of
> work that will be of little value when graal is here, whenever that
> happens.
>
> Roland.
>
Thanks again
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From rednaxelafx at gmail.com Wed Sep 28 15:37:19 2016 From: rednaxelafx at gmail.com (Krystal Mok) Date: Wed, 28 Sep 2016 08:37:19 -0700 Subject: Odd interaction between ArrayList$Itr and Escape Analysis In-Reply-To: <4c873846-5322-ebdf-5e0a-393089aea590@oracle.com> References: <1619527975.952230.1473776309365.JavaMail.zimbra@u-pem.fr> <4c873846-5322-ebdf-5e0a-393089aea590@oracle.com> Message-ID: Hi guys, Here's the HotSpot-side patch, based on OpenJDK9 HotSpot: Webrev: http://cr.openjdk.java.net/~kmo/8166840/webrev.00/ Please give me a preliminary idea of how you guys feel about the patch, and then I'll start an actual review thread if people agree on the direction of this patch. Note: This is the way javac constructs that "XXX$1" name for the accessConstructorTag: JDK7u: http://hg.openjdk.java.net/jdk7u/jdk7u/langtools/file/93a2788178e6/src/share/classes/com/sun/tools/javac/comp/Lower.java#l1154 JDK9: http://hg.openjdk.java.net/jdk9/jdk9/langtools/file/9f61004270d8/src/jdk.compiler/share/classes/com/sun/tools/javac/comp/Lower.java#l1241 So name matching on "$1" suffix is sufficient here to workaround this particular pattern from javac. P.S. I haven't built OpenJDK9 in quite a while now, and apparently the makefiles have changed and the scripts that I used to build JDK7u / JDK8u doesn't work on JDK9. What's the current recommended way to build just HotSpot with fastdebug / product levels? Thanks, Kris (OpenJDK username: kmo) On Wed, Sep 14, 2016 at 3:12 AM, Vladimir Ivanov < vladimir.x.ivanov at oracle.com> wrote: > Kris, > > And I'm happy to upstream that patch, if the team is interested. >> > > Sure, we are definitely interested in fixing that. Feel free to file a bug > and send the fix out for review. > > Now, when I first discovered the problem, my first intuition was that >> it's better to "fix" it in javac. But before nest mates in the Class >> file, there isn't much that javac could do. 
Changing the Java libraries >> to not use private constructors in inner classes is also doable, but >> needs changing a lot of files. >> > > I agree that javac is not the best place to fix the immediate problem: it > requires recompilation and there are already lots of problematic bytecode > shapes out in the wild. The JVM should optimize for that case instead. > > So I ended up fixing it in the VM, even though I agree fully with what >> R?mi brought up. >> > > I'm curious how did you fix it. I haven't found a description in the > thread. > > It's possible to force class loading, but I'm worried about undesirable > effects of class initialization. Is it enough for C2 to have the class > loaded but not initialized to make it work? > > Another approach would be to issue a null check and deoptimize (for bridge > methods, the check collapses after inlining since the argument is always > null) or add a nmethod dependency and throw away the code when the > parameter class is loaded. > > Best regards, > Vladimir Ivanov > > The access constructor tag thingy in javac is really a weird hack. If >> you guys ever look at the contents of ArrayList$1, it's really empty >> -- the class doesn't even declare some of the usual structures in a >> normal Class file... Hopefully we can get rid of it in javac soon. >> > > On Tuesday, September 13, 2016, Vitaly Davidovich > > wrote: >> >> >> >> On Tuesday, September 13, 2016, Remi Forax > > wrote: >> >> I've always found that the empty inner classes generated by >> javac as a kind of hack. >> >> These classes should be removed in Java 10, thanks to the >> nestmate attributes. >> >> http://mail.openjdk.java.net/pipermail/valhalla-spec-experts >> /2016-January/000060.html >> > s/2016-January/000060.html> >> >> The other solution, is to have an empty class in the jdk which >> is not visible from javac (the class itself can be marked as >> synthetic), >> so javac can use it without creating method clash. 
>> >> and to solve the problem now, the easy solution is to add a >> package private constructor in ArrayList.Itr, >> >> I'm hoping Oracle can take Kris' (Azul) patch (or do something >> similar). It might catch more cases than just modifying Itr. >> >> >> private class Itr implements Iterator { >> int cursor; // index of next element to return >> int lastRet = -1; // index of last element returned; -1 if no such >> int expectedModCount = modCount; >> >> Itr() { >> // avoid to generate a synthetic accessor constructor >> } >> } >> >> >> regards, >> R?mi >> >> ------------------------------------------------------------ >> ------------ >> >> *De: *"Vitaly Davidovich" >> *?: *"Krystal Mok" >> *Cc: *"hotspot compiler" > java.net> >> *Envoy?: *Lundi 12 Septembre 2016 22:15:41 >> *Objet: *Re: Odd interaction between ArrayList$Itr and >> >> Escape Analysis >> >> >> >> On Mon, Sep 12, 2016 at 3:56 PM, Krystal Mok >> wrote: >> >> On Mon, Sep 12, 2016 at 12:38 PM, Vitaly Davidovich >> wrote: >> >> It seems odd to me as well why inlining won't force >> load the missing class(es). If we're inlining, it >> means the method itself or the call chain it's part >> of is hot - failing to inline can have negative >> side-effects, like this example. I suppose there >> must be a good reason why it doesn't do this though? >> >> >> That's because we can't. The JIT compilers are running >> on their own threads, and they're not real "Java >> threads". So they are not allowed to run arbitrary Java >> code. But Java class loading may involve running >> arbitrary Java code, e.g. the ClassLoader.loadClass() >> upcall. >> Force class loading can be done on the triggering side >> (for the top-level method), because compilation tasks >> are triggered from real Java threads, and they're >> allowed to run arbitrary Java code. >> >> I see, makes sense. 
Perhaps there can be an option to turn >> on loading of required types in the entire compilation unit, >> after all inlining is done (and therefore make the unloaded >> types not be barriers for inlining). I'd personally prefer >> that over having odd performance differences. >> >> >> - Kris >> >> Thanks Kris. >> >> >> >> -- >> Sent from my phone >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From vladimir.x.ivanov at oracle.com Wed Sep 28 16:17:48 2016 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 28 Sep 2016 19:17:48 +0300 Subject: Odd interaction between ArrayList$Itr and Escape Analysis In-Reply-To: References: <1619527975.952230.1473776309365.JavaMail.zimbra@u-pem.fr> <4c873846-5322-ebdf-5e0a-393089aea590@oracle.com> Message-ID: Kris, thanks for sharing the patch! IMO the problem we observe is not specific to bridge methods. It demonstrates a generic short-coming in C2 inlining heuristic: even though the argument is never used (otherwise, the class would have been already loaded, right?), we don't inline the whole method. So, I'd prefer to see a solution which covers the general case. Can we do that? It seems so: it could be achieved by a null guard on the argument or a nmethod dependency on the unloaded class. Best regards, Vladimir Ivanov On 9/28/16 6:37 PM, Krystal Mok wrote: > Hi guys, > > Here's the HotSpot-side patch, based on OpenJDK9 HotSpot: > > Webrev: http://cr.openjdk.java.net/~kmo/8166840/webrev.00/ > > Please give me a preliminary idea of how you guys feel about the patch, > and then I'll start an actual review thread if people agree on the > direction of this patch. 
> > Note: This is the way javac constructs that "XXX$1" name for the > accessConstructorTag: > JDK7u: http://hg.openjdk.java.net/jdk7u/jdk7u/langtools/file/93a2788178e6/src/share/classes/com/sun/tools/javac/comp/Lower.java#l1154 > JDK9: > http://hg.openjdk.java.net/jdk9/jdk9/langtools/file/9f61004270d8/src/jdk.compiler/share/classes/com/sun/tools/javac/comp/Lower.java#l1241 > > So name matching on "$1" suffix is sufficient here to workaround this > particular pattern from javac. > > P.S. I haven't built OpenJDK9 in quite a while now, and apparently the > makefiles have changed and the scripts that I used to build JDK7u / > JDK8u doesn't work on JDK9. What's the current recommended way to build > just HotSpot with fastdebug / product levels? > > Thanks, > Kris (OpenJDK username: kmo) > > On Wed, Sep 14, 2016 at 3:12 AM, Vladimir Ivanov > > wrote: > > Kris, > > And I'm happy to upstream that patch, if the team is interested. > > > Sure, we are definitely interested in fixing that. Feel free to file > a bug and send the fix out for review. > > Now, when I first discovered the problem, my first intuition was > that > it's better to "fix" it in javac. But before nest mates in the Class > file, there isn't much that javac could do. Changing the Java > libraries > to not use private constructors in inner classes is also doable, but > needs changing a lot of files. > > > I agree that javac is not the best place to fix the immediate > problem: it requires recompilation and there are already lots of > problematic bytecode shapes out in the wild. The JVM should optimize > for that case instead. > > So I ended up fixing it in the VM, even though I agree fully > with what > R?mi brought up. > > > I'm curious how did you fix it. I haven't found a description in the > thread. > > It's possible to force class loading, but I'm worried about > undesirable effects of class initialization. Is it enough for C2 to > have the class loaded but not initialized to make it work? 
> > Another approach would be to issue a null check and deoptimize (for > bridge methods, the check collapses after inlining since the > argument is always null) or add a nmethod dependency and throw away > the code when the parameter class is loaded. > > Best regards, > Vladimir Ivanov > > The access constructor tag thingy in javac is really a weird > hack. If > you guys ever look at the contents of ArrayList$1, it's really empty > -- the class doesn't even declare some of the usual structures in a > normal Class file... Hopefully we can get rid of it in javac soon. > > > On Tuesday, September 13, 2016, Vitaly Davidovich > > >> wrote: > > > > On Tuesday, September 13, 2016, Remi Forax > > ');>> wrote: > > I've always found that the empty inner classes generated by > javac as a kind of hack. > > These classes should be removed in Java 10, thanks to the > nestmate attributes. > > > http://mail.openjdk.java.net/pipermail/valhalla-spec-experts/2016-January/000060.html > > > > > > The other solution, is to have an empty class in the jdk > which > is not visible from javac (the class itself can be marked as > synthetic), > so javac can use it without creating method clash. > > and to solve the problem now, the easy solution is to add a > package private constructor in ArrayList.Itr, > > I'm hoping Oracle can take Kris' (Azul) patch (or do something > similar). It might catch more cases than just modifying Itr. 
> > > private class Itr implements Iterator { > int cursor; // index of next element to return > int lastRet = -1; // index of last element returned; -1 > if no such > int expectedModCount = modCount; > > Itr() { > // avoid to generate a synthetic accessor constructor > } > } > > > regards, > R?mi > > > ------------------------------------------------------------------------ > > *De: *"Vitaly Davidovich" > > *?: *"Krystal Mok" > > *Cc: *"hotspot compiler" > > > *Envoy?: *Lundi 12 Septembre 2016 22:15:41 > *Objet: *Re: Odd interaction between ArrayList$Itr and > > Escape Analysis > > > > On Mon, Sep 12, 2016 at 3:56 PM, Krystal Mok > > wrote: > > On Mon, Sep 12, 2016 at 12:38 PM, Vitaly Davidovich > > > wrote: > > It seems odd to me as well why inlining > won't force > load the missing class(es). If we're > inlining, it > means the method itself or the call chain > it's part > of is hot - failing to inline can have negative > side-effects, like this example. I suppose > there > must be a good reason why it doesn't do this > though? > > > That's because we can't. The JIT compilers are > running > on their own threads, and they're not real "Java > threads". So they are not allowed to run > arbitrary Java > code. But Java class loading may involve running > arbitrary Java code, e.g. the > ClassLoader.loadClass() > upcall. > Force class loading can be done on the > triggering side > (for the top-level method), because compilation > tasks > are triggered from real Java threads, and they're > allowed to run arbitrary Java code. > > I see, makes sense. Perhaps there can be an option > to turn > on loading of required types in the entire > compilation unit, > after all inlining is done (and therefore make the > unloaded > types not be barriers for inlining). I'd personally > prefer > that over having odd performance differences. > > > - Kris > > Thanks Kris. 
> > > > -- > Sent from my phone > > From rednaxelafx at gmail.com Wed Sep 28 16:42:34 2016 From: rednaxelafx at gmail.com (Krystal Mok) Date: Wed, 28 Sep 2016 09:42:34 -0700 Subject: Odd interaction between ArrayList$Itr and Escape Analysis In-Reply-To: References: <1619527975.952230.1473776309365.JavaMail.zimbra@u-pem.fr> <4c873846-5322-ebdf-5e0a-393089aea590@oracle.com> Message-ID: Hi Vladimir, Yes, the patch I posted was the short-term one that I used for getting rid of this particular kind of problem before a release, and it's already in production for us. So it was deliberately focused on a very narrow scenario so that I don't have to worry about testing too much. I do also have another patch for the general case for "unused unloaded arguments". I haven't gotten around to polish and test that patch yet, but since we're seeing a good motivation on the OpenJDK side as well, I may as well go back and get that patch ready soon. A null guard is a good way to go. It's basically the same kind of logic that C2 OSR entry already uses. In this case, at a call site, a null guard on the caller-side against an argument whose type is unloaded is one way to do it. (There are of course other alternatives. e.g. If we focus on the callee-side, in a compiler with a mixed top-down / bottom-up inlining heuristics system, the (devirtualized if needed) callee can be inspected first to see if an argument of unloaded type is never used or not. If it is never used, don't even bother inserting the null guard on the caller-side, and just go ahead and inline would be good and safe. C2 doesn't have this luxury yet so tackling the problem with a caller-side solution is easier to do.) IMO a nmethod dependency on an "unloaded class" isn't that feasible, since you might not even have a concrete entity to "depend on", and registering symbolic dependencies for "unloaded classes" in general, even though I believe is doable, might be rather tedious. 
Thanks, Kris On Wed, Sep 28, 2016 at 9:17 AM, Vladimir Ivanov < vladimir.x.ivanov at oracle.com> wrote: > Kris, thanks for sharing the patch! > > IMO the problem we observe is not specific to bridge methods. > > It demonstrates a generic short-coming in C2 inlining heuristic: even > though the argument is never used (otherwise, the class would have been > already loaded, right?), we don't inline the whole method. > > So, I'd prefer to see a solution which covers the general case. > > Can we do that? It seems so: it could be achieved by a null guard on the > argument or a nmethod dependency on the unloaded class. > > Best regards, > Vladimir Ivanov > > On 9/28/16 6:37 PM, Krystal Mok wrote: > >> Hi guys, >> >> Here's the HotSpot-side patch, based on OpenJDK9 HotSpot: >> >> Webrev: http://cr.openjdk.java.net/~kmo/8166840/webrev.00/ >> >> Please give me a preliminary idea of how you guys feel about the patch, >> and then I'll start an actual review thread if people agree on the >> direction of this patch. >> >> Note: This is the way javac constructs that "XXX$1" name for the >> accessConstructorTag: >> JDK7u: http://hg.openjdk.java.net/jdk7u/jdk7u/langtools/file/93a278 >> 8178e6/src/share/classes/com/sun/tools/javac/comp/Lower.java#l1154 >> JDK9: >> http://hg.openjdk.java.net/jdk9/jdk9/langtools/file/9f610042 >> 70d8/src/jdk.compiler/share/classes/com/sun/tools/javac/ >> comp/Lower.java#l1241 >> >> So name matching on "$1" suffix is sufficient here to workaround this >> particular pattern from javac. >> >> P.S. I haven't built OpenJDK9 in quite a while now, and apparently the >> makefiles have changed and the scripts that I used to build JDK7u / >> JDK8u doesn't work on JDK9. What's the current recommended way to build >> just HotSpot with fastdebug / product levels? >> >> Thanks, >> Kris (OpenJDK username: kmo) >> >> On Wed, Sep 14, 2016 at 3:12 AM, Vladimir Ivanov >> > >> wrote: >> >> Kris, >> >> And I'm happy to upstream that patch, if the team is interested. 
>> >> >> Sure, we are definitely interested in fixing that. Feel free to file >> a bug and send the fix out for review. >> >> Now, when I first discovered the problem, my first intuition was >> that >> it's better to "fix" it in javac. But before nest mates in the >> Class >> file, there isn't much that javac could do. Changing the Java >> libraries >> to not use private constructors in inner classes is also doable, >> but >> needs changing a lot of files. >> >> >> I agree that javac is not the best place to fix the immediate >> problem: it requires recompilation and there are already lots of >> problematic bytecode shapes out in the wild. The JVM should optimize >> for that case instead. >> >> So I ended up fixing it in the VM, even though I agree fully >> with what >> R?mi brought up. >> >> >> I'm curious how did you fix it. I haven't found a description in the >> thread. >> >> It's possible to force class loading, but I'm worried about >> undesirable effects of class initialization. Is it enough for C2 to >> have the class loaded but not initialized to make it work? >> >> Another approach would be to issue a null check and deoptimize (for >> bridge methods, the check collapses after inlining since the >> argument is always null) or add a nmethod dependency and throw away >> the code when the parameter class is loaded. >> >> Best regards, >> Vladimir Ivanov >> >> The access constructor tag thingy in javac is really a weird >> hack. If >> you guys ever look at the contents of ArrayList$1, it's really >> empty >> -- the class doesn't even declare some of the usual structures in >> a >> normal Class file... Hopefully we can get rid of it in javac soon. >> >> >> On Tuesday, September 13, 2016, Vitaly Davidovich >> >> >> wrote: >> >> >> >> On Tuesday, September 13, 2016, Remi Forax >> >> > ');>> wrote: >> >> I've always found that the empty inner classes generated >> by >> javac as a kind of hack. 
>> >> These classes should be removed in Java 10, thanks to the >> nestmate attributes. >> >> >> http://mail.openjdk.java.net/pipermail/valhalla-spec-experts/2016-January/000060.html >> >> The other solution is to have an empty class in the JDK >> which >> is not visible from javac (the class itself can be marked >> as >> synthetic), >> so javac can use it without creating a method clash. >> >> And to solve the problem now, the easy solution is to add >> a >> package-private constructor in ArrayList.Itr, >> >> I'm hoping Oracle can take Kris' (Azul) patch (or do something >> similar). It might catch more cases than just modifying Itr. >> >> >> private class Itr implements Iterator<E> { >> int cursor; // index of next element to return >> int lastRet = -1; // index of last element returned; -1 if no such >> int expectedModCount = modCount; >> >> Itr() { >> // avoid generating a synthetic accessor constructor >> } >> } >> >> >> regards, >> Rémi >> >> >> ------------------------------------------------------------------------ >> >> *From: *"Vitaly Davidovich" >> *To: *"Krystal Mok" >> *Cc: *"hotspot compiler" >> *Sent: *Monday, 12 September 2016 22:15:41 >> *Subject: *Re: Odd interaction between ArrayList$Itr and >> Escape Analysis >> >> >> >> On Mon, Sep 12, 2016 at 3:56 PM, Krystal Mok >> wrote: >> >> On Mon, Sep 12, 2016 at 12:38 PM, Vitaly >> Davidovich >> wrote: >> >> It seems odd to me as well why inlining >> won't force >> load the missing class(es). If we're >> inlining, it >> means the method itself or the call chain >> it's part >> of is hot - failing to inline can have >> negative >> side-effects, like this example. I suppose >> there >> must be a good reason why it doesn't do this >> though? >> >> >> That's because we can't. The JIT compilers are >> running >> on their own threads, and they're not real "Java >> threads".
So they are not allowed to run >> arbitrary Java >> code. But Java class loading may involve running >> arbitrary Java code, e.g. the >> ClassLoader.loadClass() >> upcall. >> Force class loading can be done on the >> triggering side >> (for the top-level method), because compilation >> tasks >> are triggered from real Java threads, and they're >> allowed to run arbitrary Java code. >> >> I see, makes sense. Perhaps there can be an option >> to turn >> on loading of required types in the entire >> compilation unit, >> after all inlining is done (and therefore make the >> unloaded >> types not be barriers for inlining). I'd personally >> prefer >> that over having odd performance differences. >> >> >> - Kris >> >> Thanks Kris. >> >> >> >> -- >> Sent from my phone >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From vladimir.kozlov at oracle.com Wed Sep 28 16:47:19 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 28 Sep 2016 09:47:19 -0700 Subject: RFR(S): 8166742 : SIGFPE in C2 Loop IV elimination In-Reply-To: References: <57E9A503.6090506@oracle.com> <57E9BEC5.2060308@oracle.com> <274bdc7a-e7e5-882c-5715-32063f1e0f2c@oracle.com> Message-ID: On 9/27/16 1:57 PM, Chuck Rasbold wrote: > Sorry for not being transparent enough. Here's an external reference > that describes the problem > that is being encountered by the division: > > https://www.gnu.org/software/autoconf/manual/autoconf-2.67/html_node/Signed-Integer-Division.html > > That's why the original fix targeted a very specific case. One can't > represent ratio_con as a 32 bit value in that case. > Worse, trying to compute it by division causes a SIGFPE. > > Do you think the revised code below is as straightforward as the original? Okay, looks like it is very special only one case and not a range of cases. Lets use your original fix then. I will sponsor it. 
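For reference, the single problematic case (stride_con2 == min_jint, stride_con == -1) can be reproduced outside HotSpot. The following is only a minimal standalone sketch, not the patch from the webrev: the helper name `exact_stride_ratio` and its out-parameter are hypothetical, and it uses plain `int32_t`/`int64_t` in place of `jint`/`jlong`. It widens to 64 bits before dividing, then checks that the quotient both fits in 32 bits and is exact.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (not HotSpot code): computes stride2 / stride and
// stores it in *ratio_out only if the division is exact and the quotient
// is representable as a 32-bit integer. Returns false otherwise.
bool exact_stride_ratio(int32_t stride2, int32_t stride, int32_t* ratio_out) {
    if (stride == 0) {
        return false;  // division by zero is undefined
    }
    // Widen BEFORE dividing. In 32-bit arithmetic INT32_MIN / -1 overflows:
    // undefined behavior in C++, and a SIGFPE from idiv on x86.
    // In 64-bit arithmetic the same division is well defined (result 2^31).
    const int64_t ratio = static_cast<int64_t>(stride2) / stride;
    if (ratio < std::numeric_limits<int32_t>::min() ||
        ratio > std::numeric_limits<int32_t>::max()) {
        return false;  // quotient does not fit in 32 bits
    }
    if (ratio * stride != stride2) {
        return false;  // stride2 is not an exact multiple of stride
    }
    *ratio_out = static_cast<int32_t>(ratio);
    return true;
}
```

With this shape the range check rejects exactly one input pair, which matches the observation that the overflow is a single special case rather than a range of cases.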
Thanks, Vladimir > > -- Chuck > > On Tue, Sep 27, 2016 at 9:48 AM, Vladimir Kozlov > > wrote: > > So why it is SIGFPE when both values are 'int'? > > I thought it is incorrect results cause SIGFPE that is why I > suggested to check for integer overflow. > > Lets then go with your second suggested change here. But let check > that ratio is small first and do cast to (jint) otherwise the long > check is useless: > > // The ratio of the two strides cannot be represented as an int > // if stride_con2 is min_int and stride_con is -1. > jlong ratio_conl = ((jlong)stride_con2 / stride_con); > > if ((ratio_conl < 0x80000000L) && > (jint)(ratio_conl * stride_con) == stride_con2) { // Check for > exact > jint ratio_con = (jint)ratio_conl; > > Thanks, > Vladimir > > On 9/27/16 7:56 AM, Chuck Rasbold wrote: > > > > On Mon, Sep 26, 2016 at 5:35 PM, Vladimir Kozlov > > >> wrote: > > Slightly different (cast after /) and jlong type: > > jlong ratio_conl = (jlong) (stride_con2 / stride_con); > > > The division above won't work (at least, it raises a SIGFPE on > my Linux > x86 platform) when stride_con2 == min_jint and stride_con == -1. > > > if ((ratio_conl * stride_con) == (jlong)stride_con2) { // > Check > for exact > > > What would be the value of ratio_conl such that this test fails? I > think I'm missing something... > > -- Chuck > > > Vladimir > > On 9/26/16 5:01 PM, Chuck Rasbold wrote: > > Just to confirm, are you suggesting that the ratio be first > computed as a 64 bit quantity, effectively along the > lines of... > > long ratio_conl = ((long) stride_con2) / stride_con; > > if ((ratio_conl * stride_con) == stride_con2 && > ratio_conl < 0x8000000 ) { // Check for exact > int ratio_con = (int) ratio_conl; > > > On Mon, Sep 26, 2016 at 3:45 PM, Vladimir Kozlov > > > > > >>> wrote: > > Hi Chuck > > Can you do 'long' arithmetic in existing condition > to catch > integer overflow instead? 
> > if ((ratio_con * stride_con) == stride_con2) { // > Check for > exact > > thanks, > Vladimir > > > > On 9/26/16 3:18 PM, Chuck Rasbold wrote: > > A small fix for an edge case crash in C2... > > Bug: > https://bugs.openjdk.java.net/browse/JDK-8166742 > > > > > >> > Webrev: > http://cr.openjdk.java.net/~rasbold/8166742/webrev.00/ > > > > > >> > > Requesting a sponsor and reviews. Thanks. > > -- Chuck > > > > From rasbold at google.com Wed Sep 28 17:02:00 2016 From: rasbold at google.com (Chuck Rasbold) Date: Wed, 28 Sep 2016 10:02:00 -0700 Subject: RFR(S): 8166742 : SIGFPE in C2 Loop IV elimination In-Reply-To: References: <57E9A503.6090506@oracle.com> <57E9BEC5.2060308@oracle.com> <274bdc7a-e7e5-882c-5715-32063f1e0f2c@oracle.com> Message-ID: Thanks, Vladimir! On Wed, Sep 28, 2016 at 9:47 AM, Vladimir Kozlov wrote: > On 9/27/16 1:57 PM, Chuck Rasbold wrote: > >> Sorry for not being transparent enough. Here's an external reference >> that describes the problem >> that is being encountered by the division: >> >> https://www.gnu.org/software/autoconf/manual/autoconf-2.67/h >> tml_node/Signed-Integer-Division.html >> >> That's why the original fix targeted a very specific case. One can't >> represent ratio_con as a 32 bit value in that case. >> Worse, trying to compute it by division causes a SIGFPE. >> >> Do you think the revised code below is as straightforward as the original? >> > > Okay, looks like it is very special only one case and not a range of > cases. Lets use your original fix then. > > I will sponsor it. > > Thanks, > Vladimir > > >> -- Chuck >> >> On Tue, Sep 27, 2016 at 9:48 AM, Vladimir Kozlov >> > wrote: >> >> So why it is SIGFPE when both values are 'int'? >> >> I thought it is incorrect results cause SIGFPE that is why I >> suggested to check for integer overflow. >> >> Lets then go with your second suggested change here. 
But let check >> that ratio is small first and do cast to (jint) otherwise the long >> check is useless: >> >> // The ratio of the two strides cannot be represented as an int >> // if stride_con2 is min_int and stride_con is -1. >> jlong ratio_conl = ((jlong)stride_con2 / stride_con); >> >> if ((ratio_conl < 0x80000000L) && >> (jint)(ratio_conl * stride_con) == stride_con2) { // Check for >> exact >> jint ratio_con = (jint)ratio_conl; >> >> Thanks, >> Vladimir >> >> On 9/27/16 7:56 AM, Chuck Rasbold wrote: >> >> >> >> On Mon, Sep 26, 2016 at 5:35 PM, Vladimir Kozlov >> >> > >> wrote: >> >> Slightly different (cast after /) and jlong type: >> >> jlong ratio_conl = (jlong) (stride_con2 / stride_con); >> >> >> The division above won't work (at least, it raises a SIGFPE on >> my Linux >> x86 platform) when stride_con2 == min_jint and stride_con == -1. >> >> >> if ((ratio_conl * stride_con) == (jlong)stride_con2) { // >> Check >> for exact >> >> >> What would be the value of ratio_conl such that this test fails? >> I >> think I'm missing something... >> >> -- Chuck >> >> >> Vladimir >> >> On 9/26/16 5:01 PM, Chuck Rasbold wrote: >> >> Just to confirm, are you suggesting that the ratio be >> first >> computed as a 64 bit quantity, effectively along the >> lines of... >> >> long ratio_conl = ((long) stride_con2) / stride_con; >> >> if ((ratio_conl * stride_con) == stride_con2 && >> ratio_conl < 0x8000000 ) { // Check for exact >> int ratio_con = (int) ratio_conl; >> >> >> On Mon, Sep 26, 2016 at 3:45 PM, Vladimir Kozlov >> > >> > > >> > >> > >>> wrote: >> >> Hi Chuck >> >> Can you do 'long' arithmetic in existing condition >> to catch >> integer overflow instead? >> >> if ((ratio_con * stride_con) == stride_con2) { // >> Check for >> exact >> >> thanks, >> Vladimir >> >> >> >> On 9/26/16 3:18 PM, Chuck Rasbold wrote: >> >> A small fix for an edge case crash in C2... 
>> >> Bug: >> https://bugs.openjdk.java.net/browse/JDK-8166742 >> >> > > >> > >> > >> >> Webrev: >> http://cr.openjdk.java.net/~rasbold/8166742/webrev.00/ >> >> > > >> > >> > >> >> >> Requesting a sponsor and reviews. Thanks. >> >> -- Chuck >> >> >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From vladimir.x.ivanov at oracle.com Wed Sep 28 17:10:33 2016 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 28 Sep 2016 20:10:33 +0300 Subject: Odd interaction between ArrayList$Itr and Escape Analysis In-Reply-To: References: <1619527975.952230.1473776309365.JavaMail.zimbra@u-pem.fr> <4c873846-5322-ebdf-5e0a-393089aea590@oracle.com> Message-ID: <78ec340d-7c56-233a-5c0f-4f60b7bdda89@oracle.com> Kris, > A null guard is a good way to go. It's basically the same kind of logic > that C2 OSR entry already uses. In this case, at a call site, a null > guard on the caller-side against an argument whose type is unloaded is > one way to do it. For the case when argument value (null) is a compile-time constant, the guard collapses right away. So, it sounds like a good solution. > (There are of course other alternatives. e.g. If we focus on the > callee-side, in a compiler with a mixed top-down / bottom-up inlining > heuristics system, the (devirtualized if needed) callee can be inspected > first to see if an argument of unloaded type is never used or not. If it > is never used, don't even bother inserting the null guard on the > caller-side, and just go ahead and inline would be good and safe. C2 > doesn't have this luxury yet so tackling the problem with a caller-side > solution is easier to do.) > > IMO a nmethod dependency on an "unloaded class" isn't that feasible, > since you might not even have a concrete entity to "depend on", and > registering symbolic dependencies for "unloaded classes" in general, > even though I believe is doable, might be rather tedious. Agree, it requires a new flavor of nmethod dependency. 
Best regards, Vladimir Ivanov From rednaxelafx at gmail.com Wed Sep 28 17:15:53 2016 From: rednaxelafx at gmail.com (Krystal Mok) Date: Wed, 28 Sep 2016 10:15:53 -0700 Subject: Odd interaction between ArrayList$Itr and Escape Analysis In-Reply-To: <78ec340d-7c56-233a-5c0f-4f60b7bdda89@oracle.com> References: <1619527975.952230.1473776309365.JavaMail.zimbra@u-pem.fr> <4c873846-5322-ebdf-5e0a-393089aea590@oracle.com> <78ec340d-7c56-233a-5c0f-4f60b7bdda89@oracle.com> Message-ID: On Wed, Sep 28, 2016 at 10:10 AM, Vladimir Ivanov < vladimir.x.ivanov at oracle.com> wrote: > Kris, > > A null guard is a good way to go. It's basically the same kind of logic >> that C2 OSR entry already uses. In this case, at a call site, a null >> guard on the caller-side against an argument whose type is unloaded is >> one way to do it. >> > > For the case when argument value (null) is a compile-time constant, the > guard collapses right away. So, it sounds like a good solution. That's exactly what I'm doing with my other patch. Let me prepare that and send it out for review sometime this weekend. Thanks, Kris > > > (There are of course other alternatives. e.g. If we focus on the >> callee-side, in a compiler with a mixed top-down / bottom-up inlining >> heuristics system, the (devirtualized if needed) callee can be inspected >> first to see if an argument of unloaded type is never used or not. If it >> is never used, don't even bother inserting the null guard on the >> caller-side, and just go ahead and inline would be good and safe. C2 >> doesn't have this luxury yet so tackling the problem with a caller-side >> solution is easier to do.) >> >> IMO a nmethod dependency on an "unloaded class" isn't that feasible, >> since you might not even have a concrete entity to "depend on", and >> registering symbolic dependencies for "unloaded classes" in general, >> even though I believe is doable, might be rather tedious. >> > > Agree, it requires a new flavor of nmethod dependency. 
> > Best regards, > Vladimir Ivanov > -------------- next part -------------- An HTML attachment was scrubbed... URL: From john.r.rose at oracle.com Wed Sep 28 17:37:58 2016 From: john.r.rose at oracle.com (John Rose) Date: Wed, 28 Sep 2016 10:37:58 -0700 Subject: Odd interaction between ArrayList$Itr and Escape Analysis In-Reply-To: References: <1619527975.952230.1473776309365.JavaMail.zimbra@u-pem.fr> <4c873846-5322-ebdf-5e0a-393089aea590@oracle.com> Message-ID: On Sep 28, 2016, at 9:42 AM, Krystal Mok wrote: > > I do also have another patch for the general case for "unused unloaded arguments". I haven't gotten around to polish and test that patch yet, but since we're seeing a good motivation on the OpenJDK side as well, I may as well go back and get that patch ready soon. > > A null guard is a good way to go. It's basically the same kind of logic that C2 OSR entry already uses. In this case, at a call site, a null guard on the caller-side against an argument whose type is unloaded is one way to do it. This is the fix I would prefer for the inliner. > (There are of course other alternatives. e.g. If we focus on the callee-side, in a compiler with a mixed top-down / bottom-up inlining heuristics system, the (devirtualized if needed) callee can be inspected first to see if an argument of unloaded type is never used or not. If it is never used, don't even bother inserting the null guard on the caller-side, and just go ahead and inline would be good and safe. C2 doesn't have this luxury yet so tackling the problem with a caller-side solution is easier to do.) I'd like to do more in this direction. The EA function summarizer could be overloaded to also gather data on the usage of arguments (as well as their escape status). For example, if an argument is used to gate a branch (somehow), then having that argument be constant should "add points" to the heuristic that decides inlining. 
*In general*, constant arguments should be an "argument" to raise the likelihood of inlining a call. I'm going to guess that this work would be better done in Graal, but we don't have that luxury yet. -- John From vitalyd at gmail.com Wed Sep 28 17:46:03 2016 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Wed, 28 Sep 2016 13:46:03 -0400 Subject: Odd interaction between ArrayList$Itr and Escape Analysis In-Reply-To: References: <1619527975.952230.1473776309365.JavaMail.zimbra@u-pem.fr> <4c873846-5322-ebdf-5e0a-393089aea590@oracle.com> Message-ID: On Wed, Sep 28, 2016 at 1:37 PM, John Rose wrote: > On Sep 28, 2016, at 9:42 AM, Krystal Mok wrote: > > > I do also have another patch for the general case for "unused unloaded > arguments". I haven't gotten around to polish and test that patch yet, but > since we're seeing a good motivation on the OpenJDK side as well, I may as > well go back and get that patch ready soon. > > A null guard is a good way to go. It's basically the same kind of logic > that C2 OSR entry already uses. In this case, at a call site, a null guard > on the caller-side against an argument whose type is unloaded is one way to > do it. > > > This is the fix I would prefer for the inliner. > > (There are of course other alternatives. e.g. If we focus on the > callee-side, in a compiler with a mixed top-down / bottom-up inlining > heuristics system, the (devirtualized if needed) callee can be inspected > first to see if an argument of unloaded type is never used or not. If it is > never used, don't even bother inserting the null guard on the caller-side, > and just go ahead and inline would be good and safe. C2 doesn't have this > luxury yet so tackling the problem with a caller-side solution is easier to > do.) > > > I'd like to do more in this direction.
The EA function summarizer could > be overloaded to also gather data on the usage of arguments (as well as > their escape status). For example, if an argument is used to gate a branch > (somehow), then having that argument be constant should "add points" to the > heuristic that decides inlining. *In general*, constant arguments should > be an "argument" to raise the likelihood of inlining a call. > Yes! I'm a bit surprised we've gone so long without constants adding bonus points for inlining. I'm definitely seeing places where a callsite isn't inlined for one reason or another, but there's a constant (sometimes several) being passed through which would end up folding a bunch of code in the callee, and sometimes eliminating code there altogether. > > I'm going to guess that this work would be better done in Graal, but we > don't have that luxury yet. > > ? John > -------------- next part -------------- An HTML attachment was scrubbed... URL: From vladimir.kozlov at oracle.com Wed Sep 28 18:00:53 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 28 Sep 2016 11:00:53 -0700 Subject: RFR(XS): 8166806: Add intrinsic support for writer used in event based tracing In-Reply-To: References: Message-ID: <08a3d3a5-5d59-b18d-4aa6-82a2662408d3@oracle.com> Hi Markus, Where _getBufferWriter is defined? I don't see closed changes. c1_LIRGenerator.cpp: should you use oopConst(NULL) in compare? library_call.cpp: TypeInstPtr::MIRROR is useless since the result phi type is TypePtr::BOTTOM. Using TypePtr::BOTTOM for load could be less bug prone. 
Thanks, Vladimir On 9/27/16 1:58 PM, Markus Gronlund wrote: > Greetings, > > > > Kindly asking for reviews for the following change: > > > > Bug: http://bugs.openjdk.java.net/browse/JDK-8166806 > > Webrev: http://cr.openjdk.java.net/~mgronlun/8166806/webrev/ > > > > Thanks in advance > > Markus > From vladimir.kozlov at oracle.com Wed Sep 28 18:38:39 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 28 Sep 2016 11:38:39 -0700 Subject: RFR(S): 8166836: Elimination of clone's ArrayCopyNode may make compilation fail silently In-Reply-To: <182f42a0-caca-ea5b-0e7d-3ebaff5f1bc1@oracle.com> References: <182f42a0-caca-ea5b-0e7d-3ebaff5f1bc1@oracle.com> Message-ID: I thought we do that. There are several places in loopnode.cpp where we hit an assert if the graph is bad. Thanks, Vladimir On 9/28/16 3:04 AM, Vladimir Ivanov wrote: >> http://cr.openjdk.java.net/~roland/8166836/webrev.00/ > > Looks good. > >> It's quite unfortunate that this wasn't found by testing because >> compilations where the graph is non schedulable simply fail. This could >> have gone unnoticed much longer. In debug builds shouldn't we abort the >> VM in C2Compiler::compile_method() if the compilation fails because of a >> non schedulable graph? > > Sounds reasonable. I expect there are other cases when compilers bail > out unexpectedly. It would be good to have an assert checking it doesn't > happen. > > Best regards, > Vladimir Ivanov From vladimir.kozlov at oracle.com Wed Sep 28 18:44:59 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 28 Sep 2016 11:44:59 -0700 Subject: RFR: 8166781: fix wrong comment in ReceiverTypeData In-Reply-To: <6e26665d-b687-4ef8-619f-b398eca63b2a@oracle.com> References: <6e26665d-b687-4ef8-619f-b398eca63b2a@oracle.com> Message-ID: <6e307bee-3205-b10c-5120-a68ef56c6f5e@oracle.com> Yes, the comment change is correct. Thank you for pointing out our previous discussion.
Vladimir On 9/27/16 5:54 AM, Roland Schatz wrote: > Hi, > > Please review this comment fix: > > webrev: http://cr.openjdk.java.net/~rschatz/JDK-8166781/webrev.00/ > issue: https://bugs.openjdk.java.net/browse/JDK-8166781 > > According to my reading of the code, the comment should now agree with > the code. > But I don't pretend to really understand that code. It would be nice if > someone who knows about the profiling code could confirm that's actually > true ;) > > See also previous thread about that issue: > http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2016-August/024105.html > > > Thanks, > Roland > From rwestrel at redhat.com Thu Sep 29 13:29:43 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 29 Sep 2016 15:29:43 +0200 Subject: RFR(S): 8166836: Elimination of clone's ArrayCopyNode may make compilation fail silently In-Reply-To: <182f42a0-caca-ea5b-0e7d-3ebaff5f1bc1@oracle.com> References: <182f42a0-caca-ea5b-0e7d-3ebaff5f1bc1@oracle.com> Message-ID: >> http://cr.openjdk.java.net/~roland/8166836/webrev.00/ > > Looks good. Thanks for the review, Vladimir. Roland. From rwestrel at redhat.com Thu Sep 29 13:31:51 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 29 Sep 2016 15:31:51 +0200 Subject: RFR(S): 8166836: Elimination of clone's ArrayCopyNode may make compilation fail silently In-Reply-To: References: <182f42a0-caca-ea5b-0e7d-3ebaff5f1bc1@oracle.com> Message-ID: Thanks for taking a look at this. > I thought we do that. There are several places in loopnode.cpp where we > hit assert if graph is bad. In my case, the graph becomes unschedulable because of anti dependencies so only after loop opts. Should I open another bug to check C.failure_reason() in C2Compiler::compile_method() and abort if the schedule failed? Roland. 
From alexander.vorobyev at oracle.com Thu Sep 29 16:30:24 2016 From: alexander.vorobyev at oracle.com (Alexander Vorobyev) Date: Thu, 29 Sep 2016 19:30:24 +0300 Subject: Request for review: JDK-8145728: compiler/cpuflags/TestAESIntrinsicsOnSupportedConfig.java Expected message not found: 'com.sun.crypto.provider.AESCrypt::(implEncryptBlock|implDecryptBlock) ([0-9]+ bytes) (intrinsic) not found on supported platfroms In-Reply-To: <542E8041.1010101@oracle.com> References: <542E8041.1010101@oracle.com> Message-ID: <3f1a7b1e-1ec1-6af6-5b38-84eae3ba4d40@oracle.com> Hi All, I'd like review for JDK-8145728 (https://bugs.openjdk.java.net/browse/JDK-8145728) Judging by the test results, test fails with specific compiler options: -XX:+TieredCompilation -XX:TieredStopAtLevel=N, where N<4. In this case C2 is not used and we are not able to see intrinsics usage in the test log. So such configuration is not valid for this test and should not be used. Supposed fix is to prevent this test from accepting such options. "@requires" tag was added: @requires vm.opt.TieredStopAtLevel == null | vm.opt.TieredStopAtLevel == 4 Here is webrev: http://cr.openjdk.java.net/~avorobye/8145728/webrew.00/ Thanks, Alexander From vladimir.kozlov at oracle.com Thu Sep 29 16:38:19 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 29 Sep 2016 09:38:19 -0700 Subject: Request for review: JDK-8145728: compiler/cpuflags/TestAESIntrinsicsOnSupportedConfig.java Expected message not found: 'com.sun.crypto.provider.AESCrypt::(implEncryptBlock|implDecryptBlock) ([0-9]+ bytes) (intrinsic) not found on supported platfroms In-Reply-To: <3f1a7b1e-1ec1-6af6-5b38-84eae3ba4d40@oracle.com> References: <542E8041.1010101@oracle.com> <3f1a7b1e-1ec1-6af6-5b38-84eae3ba4d40@oracle.com> Message-ID: <4949ae4e-d2f5-f09b-7c8c-ea99cc61351e@oracle.com> Looks good. Did you run all compiler/cpuflags tests to verify that we don't need to fix other tests too? 
Thanks, Vladimir On 9/29/16 9:30 AM, Alexander Vorobyev wrote: > > Hi All, > > I'd like review for JDK-8145728 > (https://bugs.openjdk.java.net/browse/JDK-8145728) > > Judging by the test results, test fails with specific compiler options: > -XX:+TieredCompilation -XX:TieredStopAtLevel=N, where N<4. In this case > C2 is not used and we are not able to see intrinsics usage in the test > log. So such configuration is not valid for this test and should not be > used. Supposed fix is to prevent this test from accepting such options. > > "@requires" tag was added: > @requires vm.opt.TieredStopAtLevel == null | vm.opt.TieredStopAtLevel == 4 > > > Here is webrev: > http://cr.openjdk.java.net/~avorobye/8145728/webrew.00/ > > > Thanks, > Alexander > > > From vladimir.kozlov at oracle.com Thu Sep 29 16:44:51 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 29 Sep 2016 09:44:51 -0700 Subject: RFR(S): 8166836: Elimination of clone's ArrayCopyNode may make compilation fail silently In-Reply-To: References: <182f42a0-caca-ea5b-0e7d-3ebaff5f1bc1@oracle.com> Message-ID: <0b65059d-d5c1-9451-2cc9-4f8d7c00ae9d@oracle.com> Yea, by all means. Thanks, Vladimir On 9/29/16 6:31 AM, Roland Westrelin wrote: > > Thanks for taking a look at this. > >> I thought we do that. There are several places in loopnode.cpp where we >> hit assert if graph is bad. > > In my case, the graph becomes unschedulable because of anti dependencies > so only after loop opts. Should I open another bug to check > C.failure_reason() in C2Compiler::compile_method() and abort if the > schedule failed? > > Roland. 
> From tom.rodriguez at oracle.com Thu Sep 29 18:25:15 2016 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Thu, 29 Sep 2016 11:25:15 -0700 Subject: RFR(S) 8166869: [JVMCI] record metadata relocations for metadata references Message-ID: <73408CA4-EA92-41B4-9499-AE89F4F2F27B@oracle.com> http://cr.openjdk.java.net/~never/8166869/webrev https://bugs.openjdk.java.net/browse/JDK-8166869 JVMCI records metadata references in the metadata section, so scanning of referenced metadata will work properly but it never actually creates a relocation in code or constants section. This means the disassembly is a little less readable than it might be. This adds the creation of the appropriate relocation. Tested by inspection of assembly printing on sparc and x86. tom From vladimir.kozlov at oracle.com Thu Sep 29 18:30:08 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 29 Sep 2016 11:30:08 -0700 Subject: RFR(S) 8166869: [JVMCI] record metadata relocations for metadata references In-Reply-To: <73408CA4-EA92-41B4-9499-AE89F4F2F27B@oracle.com> References: <73408CA4-EA92-41B4-9499-AE89F4F2F27B@oracle.com> Message-ID: <650ac4af-ea41-27b7-d52d-49be3e282c31@oracle.com> Looks good. Thanks, Vladimir On 9/29/16 11:25 AM, Tom Rodriguez wrote: > http://cr.openjdk.java.net/~never/8166869/webrev > https://bugs.openjdk.java.net/browse/JDK-8166869 > > JVMCI records metadata references in the metadata section, so scanning of referenced metadata will work properly but it never actually creates a relocation in code or constants section. This means the disassembly is a little less readable than it might be. This adds the creation of the appropriate relocation. Tested by inspection of assembly printing on sparc and x86. 
> > tom > From tom.rodriguez at oracle.com Fri Sep 30 00:24:55 2016 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Thu, 29 Sep 2016 17:24:55 -0700 Subject: RFR 8166929: [JVMCI] Expose decompile counts in MDO Message-ID: <54C73C2D-F44F-491E-92C3-79DE73CE7B8F@oracle.com> http://cr.openjdk.java.net/~never/8166929/webrev https://bugs.openjdk.java.net/browse/JDK-8166929 This is a minor API addition to expose some of the top-level MDO decompile and recompile counts. It's necessary to detect recompilation pathologies. Tested by printing MDOs from JVMCI. I also fixed a few problems I discovered with the formatting of the MDO printed form. tom From vitalyd at gmail.com Fri Sep 30 02:16:52 2016 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Thu, 29 Sep 2016 22:16:52 -0400 Subject: RFR 8166929: [JVMCI] Expose decompile counts in MDO In-Reply-To: <54C73C2D-F44F-491E-92C3-79DE73CE7B8F@oracle.com> References: <54C73C2D-F44F-491E-92C3-79DE73CE7B8F@oracle.com> Message-ID: Quick fly-by comment: HotSpotMethodData::toString should use %d for overflow recompiles count printing, like the other counters. Thanks On Thursday, September 29, 2016, Tom Rodriguez wrote: > http://cr.openjdk.java.net/~never/8166929/webrev > https://bugs.openjdk.java.net/browse/JDK-8166929 > > This is a minor API addition to expose some of the top-level MDO decompile > and recompile counts. It's necessary to detect recompilation pathologies. > Tested by printing MDOs from JVMCI. I also fixed a few problems I > discovered with the formatting of the MDO printed form. > > tom -- Sent from my phone -------------- next part -------------- An HTML attachment was scrubbed...
URL: From rwestrel at redhat.com Fri Sep 30 07:52:44 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 30 Sep 2016 09:52:44 +0200 Subject: RFR(S): 8166836: Elimination of clone's ArrayCopyNode may make compilation fail silently In-Reply-To: References: Message-ID: > http://cr.openjdk.java.net/~roland/8166836/webrev.00/ I need a sponsor for this. Roland. From zoltan.majo at oracle.com Fri Sep 30 08:16:33 2016 From: zoltan.majo at oracle.com (=?UTF-8?B?Wm9sdMOhbiBNYWrDsw==?=) Date: Fri, 30 Sep 2016 10:16:33 +0200 Subject: RFR(S): 8166836: Elimination of clone's ArrayCopyNode may make compilation fail silently In-Reply-To: References: Message-ID: <22cacbd4-052f-72ef-f671-7b905c3e47dd@oracle.com> Hi Roland, On 09/30/2016 09:52 AM, Roland Westrelin wrote: >> http://cr.openjdk.java.net/~roland/8166836/webrev.00/ > I need a sponsor for this. I'll take care of it. Best regards, Zoltan > > Roland. From HORII at jp.ibm.com Fri Sep 30 10:17:05 2016 From: HORII at jp.ibm.com (Hiroshi H Horii) Date: Fri, 30 Sep 2016 10:17:05 +0000 Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64 In-Reply-To: References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com> <571A1FA3.9030006@oracle.com> <201604250709.u3P79jwN024101@d19av07.sagamino.japan.ibm.com> <1574d9e7-c9cd-b1e8-e9a1-d63630713724@oracle.com> <201605061011.u46ABZDR015108@d19av07.sagamino.japan.ibm.com> <848a70ad-00b3-b742-fa4e-87dc0124e0e3@oracle.com> <347b1733-fbbc-b65b-5417-7be52a0b5d68@oracle.com> <0e47ed4857d94f9bbd99b0738bf1708a@DEWDFE13DE14.global.corp.sap> Message-ID: Dear David, and Dan, Thank you for your comments. > In hotspot/src/share/vm/gc/parallel/psPromotionManager.inline.hpp: > 266 the log line reads data from the forwardee even when the CAS > fails. I believe those reads will be unsafe without barriers after > the copy of the content of the object. 
> hotspot/src/share/vm/gc/parallel/psPromotionManager.inline.hpp:288 > same problem as in line 266 Can we use o->size() or new_obj_size instead of new_obj->size()? > If you feel that the use of new_obj->size() is potentially unsafe then > the fact we return new_obj means that any use of new_obj by the caller > may also potentially be unsafe. In my understanding, while copying objects to a survivor space, if a thread creates a new_obj and sets a pointer with CAS, the other threads can touch the new_obj after the thread calls push_contents(new_obj) (Line: 239). In push_contents, OrderAccess::release_store is called before pushing the object as a task into a deque of workstealing (taskqueue.inline.hpp). If the other thread reads the task, all of copy for new_obj is safe. Thank you for your helps again. I may be misunderstanding or missing something critical. Any comments and claims are always appreciated. Regards, Hiroshi ----------------------- Hiroshi Horii, Ph.D. IBM Research - Tokyo David Holmes wrote on 09/30/2016 07:16:16: > From: David Holmes > To: Carsten Varming , Hiroshi H Horii/Japan/IBM at IBMJP > Cc: Tim Ellison , "ppc-aix-port- > dev at openjdk.java.net" , "hotspot- > runtime-dev at openjdk.java.net" dev at openjdk.java.net>, "hotspot-gc-dev at openjdk.java.net" gc-dev at openjdk.java.net>, hotspot-compiler-dev dev-bounces at openjdk.java.net> > Date: 09/30/2016 07:17 > Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and > copy_to_survivor for ppc64 > > On 30/09/2016 12:47 AM, Carsten Varming wrote: > > Dear Hiroshi, > > > > In hotspot/src/share/vm/gc/parallel/psPromotionManager.inline.hpp:266 > > the log line reads data from the forwardee even when the CAS fails. I > > believe those reads will be unsafe without barriers after the copy of > > the content of the object. > > I find it extremely hard to reason about a barrier-less cmpxchg in general. 
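The concern about reading the forwardee after a failed CAS can be illustrated with a small standalone sketch. This is not the HotSpot code: `Obj`, its fields, and `install_forwardee` are hypothetical names, and `std::atomic` stands in for HotSpot's own atomic primitives. The sketch shows the conservative variant, in which the losing thread acquires the winner's pointer before touching the copy; it does not settle whether the relaxed variant is safe given the later release_store in push_contents.

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical object layout, for illustration only (not HotSpot's oopDesc).
struct Obj {
    std::atomic<Obj*> forwardee{nullptr};  // null until a copy is installed
    std::size_t size = 0;                  // stand-in for object contents
};

// Tries to install `copy` as the forwardee of `old_obj`.
// Returns the winning copy: `copy` if this thread won the race,
// otherwise the copy that another thread installed first.
Obj* install_forwardee(Obj* old_obj, Obj* copy) {
    Obj* expected = nullptr;
    // Success order acq_rel: writes made to `copy` before this call are
    // published together with the pointer.
    // Failure order acquire: the loser's later reads of the winner's copy
    // (for example size, for a log line) are ordered after the winner's
    // writes. With memory_order_relaxed on the failure path there would be
    // no such ordering from this operation alone.
    if (old_obj->forwardee.compare_exchange_strong(
            expected, copy,
            std::memory_order_acq_rel,
            std::memory_order_acquire)) {
        return copy;
    }
    return expected;  // on failure, holds the forwardee currently installed
}
```

A fully relaxed CAS would still return the correct pointer on failure; what it would not provide by itself is any guarantee about the contents behind that pointer, which is the point raised about the log lines.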
> > If you feel that the use of new_obj->size() is potentially unsafe then > the fact we return new_obj means that any use of new_obj by the caller > may also potentially be unsafe. > > David > ----- > > > hotspot/src/share/vm/gc/parallel/psPromotionManager.inline.hpp:288 same > > problem as in line 266 > > > > I would argue that the logging should only happen if the thread > > successfully copied the object and CAS failures should be logged > > separately without reading data from the forwardee. > > > > BTW, unrelated to your change: It seems like the logging in line 266 > > should be guarded by something like "if (log_develop_is_enabled(Trace, > > gc, scavenge)" like the logging in line 288. > > > > Carsten > > > > On Thu, Sep 29, 2016 at 8:00 AM, Hiroshi H Horii > > wrote: > > > > Hi all, > > > > Can I please request reviews for a change for 8154736 that improve > > copy_to_survivor performance of ppc64 and aarch64? > > If possible, I would like to include this change into jdk9. > > > > 8154736 includes two changes, cmpxchg and copy_to_suvivor, and the > > former > > was resolved as 8155949. > > Now, I would like to ask a review for the remaining, copy_to_suvivor > > change. > > > > webrev: > > http://cr.openjdk.java.net/~mdoerr/8154736_copy_to_survivor/webrev.01/ > > < http://cr.openjdk.java.net/~mdoerr/8154736_copy_to_survivor/webrev.01/> > > JIRA: https://bugs.openjdk.java.net/browse/JDK-8154736 > > > > > > I tested this change with SPECjbb2013. Also, I re-check that relaxed > > cmpxchg is available for changing forwarding pointers. However, because > > this change is sensitive, we need more reviews not only from > > compiler-dev, > > but also from gc-dev. > > > > Regards, > > Hiroshi > > ----------------------- > > Hiroshi Horii, Ph.D. 
> > IBM Research - Tokyo > > > > > > > > > > From: David Holmes > > > > To: "Doerr, Martin" > >, Hiroshi H > > Horii/Japan/IBM at IBMJP > > Cc: Tim Ellison > >, > > "ppc-aix-port-dev at openjdk.java.net > > " > > > >, > > "hotspot-gc-dev at openjdk.java.net > > " > > > >, > > "hotspot-runtime-dev at openjdk.java.net > > " > > > > > > Date: 05/10/2016 19:31 > > Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and > > copy_to_survivor for ppc64 > > > > > > > > On 10/05/2016 7:41 PM, Doerr, Martin wrote: > > > Hi David, > > > > > > thank you very much for testing the other platforms. > > > > > > Here's an updated webrev: > > > http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.01/ > > > > > > Thanks. Second test run on its way. > > > > David > > ----- > > > > > Best regards, > > > Martin > > > > > > -----Original Message----- > > > From: hotspot-runtime-dev [ > > mailto:hotspot-runtime-dev-bounces at openjdk.java.net > > ] On Behalf Of > > David > > Holmes > > > Sent: Dienstag, 10. Mai 2016 11:11 > > > To: Hiroshi H Horii > > > > Cc: Tim Ellison > >; > > ppc-aix-port-dev at openjdk.java.net > > ; > > hotspot-gc-dev at openjdk.java.net > > ; > > hotspot-runtime-dev at openjdk.java.net > > > > > Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and > > copy_to_survivor for ppc64 > > > > > > The fix seems incomplete for solaris: > > > > > > make/Main.gmk:232: recipe for target 'hotspot' failed > > > > > "/opt/jprt/T/P1/073516.daholme/s/hotspot/src/os_cpu/ > solaris_x86/vm/atomic_solaris_x86.inline.hpp", > > > line 124: Error: Too many arguments in call to > > > "_Atomic_cmpxchg_long(long, volatile long*, long)". > > > > > "/opt/jprt/T/P1/073516.daholme/s/hotspot/src/os_cpu/ > solaris_x86/vm/atomic_solaris_x86.inline.hpp", > > > line 128: Error: Too many arguments in call to > > > "_Atomic_cmpxchg_long(long, volatile long*, long)". 
> > > > > > David > > > > > > On 10/05/2016 5:34 PM, David Holmes wrote: > > >> Hi Hiroshi, > > >> > > >> On 6/05/2016 8:11 PM, Hiroshi H Horii wrote: > > >>> Hi David, > > >>> > > >>> Thank you for your comments. > > >>> > > >>> As Martin suggested me, I would like to separate this proposal to > > >>> - relaxing memory order of cmpxchg > > >>> - improvement of copy_to_survivior with relaxed cmpxchg > > >>> and discuss the former first. > > >>> > > >>> Martin thankfully created a new webrev that include a change of > > cmpxchg. > > >>> > > http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.00/ > > > > >>> He has already tested it with AIX, linuxx86_64, linuxppc64le and > > >>> darwinintel64. > > >>> (Please tell me if I need to send a new mail for this PFR) > > >> > > >> Please do as it will be simpler to track that way. > > >> > > >>>> What I would prefer to see is an additional memory_order value > > (such > > as > > >>>> memory_order_ignored) which is the default for all methods declared > > to > > >>>> take a memory_order parameter. > > >>> > > >>> We added simple enum to specify memory order in atomic.hpp as > > follows. > > >>> > > >>> typedef enum cmpxchg_cmpxchg_memory_order { > > >>> memory_order_relaxed, > > >>> memory_order_conservative > > >>> } cmpxchg_memory_order; > > >>> > > >>> All of cmpxchg functions have an argument of cmpxchg_memory_order > > >>> with a default value memory_order_conservative that uses the same > > >>> semantics with the existing cmpxchg and requires no change for the > > >>> existing > > >>> callers. If you think "memory_order_ignored" is better than > > >>> "memory_order_conservative", I will be happy to modify this change. > > >>> (I just thought, "ignored" may resemble "relaxed" and may make > > >>> people who are familiar with C++11's memory semantics confused. > > >>> I would like to know thoughts of native speakers.) > > >> > > >> That is fine by me. 
I don't think "ignored" would be confused with > > >> "relaxed", but "conservative" is fine. > > >> > > >> I will run the patch through our internal build system while you > > prepare > > >> the updated RFR. My only concern is "unused argument" warnings > > from the > > >> compiler. :) > > >> > > >> We are quickly running into a hard deadline with Feature Complete > > >> however - possibly less than 24 hours - for hotspot changes. If this > > >> doesn't get in in time I will see if I can shepherd it through the > > >> approval process. > > >> > > >> Thanks, > > >> David > > >> > > >> > > >>> Regards, > > >>> Hiroshi > > >>> ----------------------- > > >>> Hiroshi Horii, Ph.D. > > >>> IBM Research - Tokyo > > >>> > > >>> > > >>> David Holmes > > wrote on 05/04/2016 14:55:29: > > >>> > > >>>> From: David Holmes > > > > >>>> To: Hiroshi H Horii/Japan/IBM at IBMJP > > >>>> Cc: hotspot-gc-dev at openjdk.java.net > > , hotspot-runtime- > > >>>> dev at openjdk.java.net , > > ppc-aix-port-dev at openjdk.java.net > > , Tim Ellison > > >>>> >, > > Volker Simonis > >, > > >>>> "Doerr, Martin" > >, "Lindenmaier, Goetz" > > >>>> > > > >>>> Date: 05/04/2016 14:57 > > >>>> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and > > >>>> copy_to_survivor for ppc64 > > >>>> > > >>>> Hi Hiroshi, > > >>>> > > >>>> Sorry for the delay on getting back to this. > > >>>> > > >>>> On 25/04/2016 5:09 PM, Hiroshi H Horii wrote: > > >>>>> Hi David, > > >>>>> > > >>>>> Thank you for your comments and questions. > > >>>>> > > >>>>>> 1. Are the current cmpxchg semantics exactly the same as > > >>>>>> memory_order_seq_cst? > > >>>>> > > >>>>> This is very good question.. > > >>>>> > > >>>>> I guess, cmpxchg needs a more conservative constraint for memory > > >>> ordering > > >>>>> than C++11, to add sync after a compare-and-exchange operation. > > >>>>> > > >>>>> Could someone give comments or thoughts? > > >>>> > > >>>> I don't want to comment on the comparison with C++11. 
What I would
> > >>>> prefer to see is an additional memory_order value (such as
> > >>>> memory_order_ignored) which is the default for all methods declared to
> > >>>> take a memory_order parameter. That way existing implementations are
> > >>>> clearly ignoring the memory_order attribute and there is no potential
> > >>>> for confusion as to whether the existing implementations equate to
> > >>>> memory_order_seq_cst or not.
> > >>>>
> > >>>> That said, I'm not sure it makes sense to add the memory_order parameter
> > >>>> to all methods with "cas" in their name, e.g. oopDesc::cas_set_mark,
> > >>>> oopDesc::cas_forward_to, unless those methods can sensibly be called
> > >>>> with any value for memory_order - which seems highly unlikely. Perhaps
> > >>>> those methods should identify the weakest form of memory_order they
> > >>>> support and that should be hard-wired into them?
> > >>>>
> > >>>> Thanks,
> > >>>> David
> > >>>>
> > >>>>> memory_order_seq_cst is defined as
> > >>>>>    "Any operation with this memory order is both an acquire operation and
> > >>>>>     a release operation, plus a single total order exists in which all
> > >>>>>     threads observe all modifications (see below) in the same order."
> > >>>>>    (http://en.cppreference.com/w/cpp/atomic/memory_order)
> > >>>>>
> > >>>>> In my environment, g++ and xlc generate the following assembly on ppc64le.
> > >>>>> (Interestingly, they generate the same assembly for any memory_order.)
> > >>>>>
> > >>>>> g++ (4.9.2)
> > >>>>>   100008a4: ac 04 00 7c   sync
> > >>>>>   100008a8: 28 50 20 7d   lwarx   r9,0,r10
> > >>>>>   100008ac: 00 18 09 7c   cmpw    r9,r3
> > >>>>>   100008b0: 0c 00 c2 40   bne-    100008bc
> > >>>>>   100008b4: 2d 51 80 7c   stwcx.  r4,0,r10
> > >>>>>   100008b8: f0 ff c2 40   bne-    100008a8
> > >>>>>   100008bc: 2c 01 00 4c   isync
> > >>>>>
> > >>>>> xlc (13.1.3)
> > >>>>>   10000888: ac 04 00 7c   sync
> > >>>>>   1000088c: 28 28 c0 7c   lwarx   r6,0,r5
> > >>>>>   10000890: 40 00 26 7c   cmpld   r6,r0
> > >>>>>   10000894: 0c 00 82 40   bne     100008a0
> > >>>>>   10000898: 2d 29 80 7c   stwcx.  r4,0,r5
> > >>>>>   1000089c: f0 ff e2 40   bne+    1000088c
> > >>>>>   100008a0: 2c 01 00 4c   isync
> > >>>>>
> > >>>>> On the other hand, the current OpenJDK generates the following assembly.
> > >>>>>
> > >>>>>   508: ac 04 00 7c   sync
> > >>>>>   50c: 00 00 5c e9   ld      r10,0(r28)
> > >>>>>   510: 00 50 3b 7c   cmpd    r27,r10
> > >>>>>   514: 1c 00 c2 40   bne-    530
> > >>>>>   518: a8 40 5c 7d   ldarx   r10,r28,r8
> > >>>>>   51c: 00 50 3b 7c   cmpd    r27,r10
> > >>>>>   520: 10 00 c2 40   bne-    530
> > >>>>>   524: ad 41 3c 7d   stdcx.  r9,r28,r8
> > >>>>>   528: f0 ff c2 40   bne-    518
> > >>>>>   52c: ac 04 00 7c   sync
> > >>>>>   530: 00 50 bb 7f   ...
> > >>>>>
> > >>>>> Though we can ignore 50c-514 (because they are a duplicated guard
> > >>>>> condition), the last sync instruction (52c) makes cmpxchg more strict
> > >>>>> than memory_order_seq_cst.
> > >>>>>
> > >>>>> In some cases, the last sync is necessary when this thread must be able
> > >>>>> to read all of the changes in the other threads while executing from
> > >>>>> 508 to 530 (that processes compare-and-exchange).
> > >>>>>
> > >>>>>> 2. Has there been a discussion already, establishing that the modified
> > >>>>>> GC code can indeed use memory_order_relaxed? Otherwise who is
> > >>>>>> postulating that and based on what evidence?
> > >>>>>
> > >>>>> Volker and his colleagues have investigated the current GC code
> > >>>>> according to this.
> > >>>>> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-April/019079.html
> > >>>>> However, I believe we need comments from other GC experts to change
> > >>>>> the shared code.
> > >>>>> > > >>>>> Regards, > > >>>>> Hiroshi > > >>>>> ----------------------- > > >>>>> Hiroshi Horii, Ph.D. > > >>>>> IBM Research - Tokyo > > >>>>> > > >>>>> > > >>>>> David Holmes > > wrote on 04/22/2016 21:57:07: > > >>>>> > > >>>>>> From: David Holmes > > > > >>>>>> To: Hiroshi H Horii/Japan/IBM at IBMJP, hotspot-runtime- > > >>>>>> dev at openjdk.java.net , > > hotspot-gc-dev at openjdk.java.net < mailto:hotspot-gc-dev at openjdk.java.net> > > >>>>>> Cc: Tim Ellison > >, > > >>>>> ppc-aix-port-dev at openjdk.java.net > > > > >>>>>> Date: 04/22/2016 21:58 > > >>>>>> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and > > >>>>>> copy_to_survivor for ppc64 > > >>>>>> > > >>>>>> Hi Hiroshi, > > >>>>>> > > >>>>>> Two initial questions: > > >>>>>> > > >>>>>> 1. Are the current cmpxchg semantics exactly the same as > > >>>>>> memory_order_seq_cst? > > >>>>>> > > >>>>>> 2. Has there been a discussion already, establishing that the > > >>>> modified > > >>>>>> GC code can indeed use memory_order_relaxed? Otherwise who is > > >>>>>> postulating that and based on what evidence? > > >>>>>> > > >>>>>> Missing memory barriers have caused very difficult to track down > > >>> bugs in > > >>>>>> the past - very rare race conditions. So any relaxation here has > > >>>> to be > > >>>>>> done with extreme confidence. > > >>>>>> > > >>>>>> Thanks, > > >>>>>> David > > >>>>>> > > >>>>>> On 22/04/2016 10:28 PM, Hiroshi H Horii wrote: > > >>>>>>> Dear all: > > >>>>>>> > > >>>>>>> Can I please request reviews for the following change? > > >>>>>>> > > >>>>>>> Code change: > > >>>>>>> > > >>> > > http://cr.openjdk.java.net/~mdoerr/8154736_copy_to_survivor/webrev.00/ > > < http://cr.openjdk.java.net/~mdoerr/8154736_copy_to_survivor/webrev.00/> > > >>>>>>> (I initially created and Martin enhanced so much) > > >>>>>>> > > >>>>>>> This change follows the discussion started from this mail. 
> > >>>>>>> > > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016- > > > > >>>>>> April/018960.html > > >>>>>>> > > >>>>>>> Description: > > >>>>>>> This change provides relaxed compare-and-exchange by introducing > > >>>>>>> similar semantics of C++ atomic memory operators, enum > > >>>> memory_order. > > >>>>>>> As described in atomic_linux_ppc.inline.hpp, the current > > >>>>> implementation of > > >>>>>>> cmpxchg is fence_cmpxchg_acquire. This implementation is useful > > for > > >>>>>>> general purposes because twice calls of sync before and after > > >>>>> cmpxchg will > > >>>>>>> provide strict consistency. However, they sometimes cause > > overheads > > >>>>>>> because > > >>>>>>> sync instructions are very expensive in the current POWER chip > > >>> design. > > >>>>>>> In addition, for the other platforms, such as aarch64, this > > strict > > >>>>>>> semantics > > >>>>>>> may cause some overheads (according to the Andrew's mail). > > >>>>>>> > > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016- > > > > >>>>>> April/019073.html > > >>>>>>> > > >>>>>>> With this change, callers can explicitly specify constraints of > > >>> memory > > >>>>>>> ordering > > >>>>>>> for cmpxchg with an additional parameter, memory_order order. > > >>>>>>> > > >>>>>>> typedef enum memory_order { > > >>>>>>> memory_order_relaxed, > > >>>>>>> memory_order_consume, > > >>>>>>> memory_order_acquire, > > >>>>>>> memory_order_release, > > >>>>>>> memory_order_acq_rel, > > >>>>>>> memory_order_seq_cst > > >>>>>>> } memory_order; > > >>>>>>> > > >>>>>>> Because the default value of the parameter is > > memory_order_seq_cst, > > >>>>>>> existing codes can use the same semantics of cmpxchg without any > > >>>>>>> modification. The relaxed cmpxchg is implemented only on ppc > > >>>>>>> in this changeset. Therefore, the behavior on the other > > platforms > > >>> will > > >>>>>>> not be changed with this changeset. 
> > >>>>>>> > > >>>>>>> In addition, with the new parameter of cmpxchg, this change > > >>>> improves > > >>>>>>> performance of copy_to_survivor in the parallel GC. > > >>>>>>> copy_to_survivor changes forward pointers by using cmpxchg. This > > >>>>>>> operation doesn't require any sync instructions. A pointer is > > >>> changed > > >>>>>>> at most once in a GC and when cmpxchg fails, the latest > > pointer is > > >>>>>>> available for the caller. cas_set_mark and cas_forward_to are > > >>> extended > > >>>>>>> with an additional memory_order parameter as cmpxchg and > > >>>>> copy_to_survivor > > >>>>>>> uses memory_order_relaxed to modify the forward pointers. > > >>>>>>> > > >>>>>>> Summary of source code changes: > > >>>>>>> > > >>>>>>> * src/share/vm/runtime/atomic.hpp > > >>>>>>> - Defines enum memory_order and adds a parameter to > > cmpxchg. > > >>>>>>> > > >>>>>>> * src/share/vm/runtime/atomic.cpp > > >>>>>>> * src/os_cpu/bsd_x86/vm/atomic_bsd_x86.inline.hpp > > >>>>>>> * src/os_cpu/bsd_zero/vm/atomic_bsd_zero.inline.hpp > > >>>>>>> * src/os_cpu/linux_aarch64/vm/atomic_linux_aarch64.inline.hpp > > >>>>>>> * src/os_cpu/linux_sparc/vm/atomic_linux_sparc.inline.hpp > > >>>>>>> * src/os_cpu/linux_x86/vm/atomic_linux_x86.inline.hpp > > >>>>>>> * src/os_cpu/linux_zero/vm/atomic_linux_zero.inline.hpp > > >>>>>>> * src/os_cpu/solaris_sparc/vm/atomic_solaris_sparc.inline.hpp > > >>>>>>> * src/os_cpu/solaris_x86/vm/atomic_solaris_x86.inline.hpp > > >>>>>>> * src/os_cpu/windows_x86/vm/atomic_windows_x86.inline.hpp > > >>>>>>> - Added a parameter for each cmpxchg function to follow > > >>>>>>> the change of atomic.hpp. Their implementations are not > > >>>>> changed. > > >>>>>>> > > >>>>>>> * src/os_cpu/aix_ppc/vm/atomic_aix_ppc.inline.hpp > > >>>>>>> * src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp > > >>>>>>> - Added a parameter for each cmpxchg function to follow > > >>>>>>> the change of atomic.hpp. 
In addition, implementations
> > >>>>>>>          are changed corresponding to the specified memory_order.
> > >>>>>>>
> > >>>>>>> * src/share/vm/oops/oop.hpp
> > >>>>>>> * src/share/vm/oops/oop.inline.hpp
> > >>>>>>>       - Add a memory_order parameter to use relaxed cmpxchg in
> > >>>>>>>          cas_set_mark and cas_forward_to.
> > >>>>>>>
> > >>>>>>> * src/share/vm/gc/parallel/psPromotionManager.cpp
> > >>>>>>> * src/share/vm/gc/parallel/psPromotionManager.inline.hpp
> > >>>>>>>
> > >>>>>>> Martin tested this changeset on linuxx86_64, linuxppc64le and
> > >>>>>>> darwinintel64.
> > >>>>>>> Though more time is needed to test on the other platforms, we would
> > >>>>>>> like to ask for reviews and start discussion on this changeset.
> > >>>>>>> I also tested this changeset with SPECjbb2013 and confirmed that gc
> > >>>>>>> pause time is reduced.
> > >>>>>>>
> > >>>>>>> Regards,
> > >>>>>>> Hiroshi
> > >>>>>>> -----------------------
> > >>>>>>> Hiroshi Horii, Ph.D.
> > >>>>>>> IBM Research - Tokyo
From david.holmes at oracle.com Fri Sep 30 11:12:27 2016
From: david.holmes at oracle.com (David Holmes)
Date: Fri, 30 Sep 2016 21:12:27 +1000
Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64
In-Reply-To:
References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com> <571A1FA3.9030006@oracle.com> <201604250709.u3P79jwN024101@d19av07.sagamino.japan.ibm.com> <1574d9e7-c9cd-b1e8-e9a1-d63630713724@oracle.com> <201605061011.u46ABZDR015108@d19av07.sagamino.japan.ibm.com> <848a70ad-00b3-b742-fa4e-87dc0124e0e3@oracle.com> <347b1733-fbbc-b65b-5417-7be52a0b5d68@oracle.com> <0e47ed4857d94f9bbd99b0738bf1708a@DEWDFE13DE14.global.corp.sap>
Message-ID: <1e40040e-b494-6e1e-00a4-dc130954cebd@oracle.com>

On 30/09/2016 8:17 PM, Hiroshi H Horii wrote:
> Dear David, and Dan,
>
> Thank you for your comments.
>
>> In hotspot/src/share/vm/gc/parallel/psPromotionManager.inline.hpp:
>> 266 the log line reads data from the forwardee even when the CAS
>> fails. I believe those reads will be unsafe without barriers after
>> the copy of the content of the object.
>> hotspot/src/share/vm/gc/parallel/psPromotionManager.inline.hpp:288
>> same problem as in line 266
>
> Can we use o->size() or new_obj_size instead of new_obj->size()?
>
>> If you feel that the use of new_obj->size() is potentially unsafe then
>> the fact we return new_obj means that any use of new_obj by the caller
>> may also potentially be unsafe.
>
> In my understanding, while copying objects to a survivor space, if a
> thread creates a new_obj and sets a pointer with CAS, the other threads
> can touch the new_obj after the thread calls push_contents(new_obj)
> (line 239). In push_contents, OrderAccess::release_store is called
> before the object is pushed as a task onto the work-stealing deque
> (taskqueue.inline.hpp). If another thread reads the task, the whole copy
> of new_obj is safe.
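
The hand-off described above can be sketched in standard C++ roughly as follows. This is only an illustration under stated assumptions, not HotSpot code: `Obj` and `forward_to` are hypothetical stand-ins for oopDesc and cas_forward_to, and std::atomic is used in place of HotSpot's Atomic::cmpxchg and OrderAccess. The sketch shows the conservative variant, in which the thread that loses the CAS acquires the winner's writes before touching the winning copy.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a heap object; this is not HotSpot's oopDesc.
struct Obj {
    std::size_t size = 0;                  // plain (non-atomic) payload
    std::atomic<Obj*> forwardee{nullptr};  // forwarding pointer, null until copied
};

// Hypothetical analogue of cas_forward_to: try to install `copy` as the
// forwardee of `o` and return whichever copy won the race.
inline Obj* forward_to(Obj* o, Obj* copy) {
    Obj* expected = nullptr;
    if (o->forwardee.compare_exchange_strong(
            expected, copy,
            std::memory_order_acq_rel,    // success: publishes earlier writes to *copy
            std::memory_order_acquire)) { // failure: pairs with the winner's release
        return copy;                      // this thread installed its own copy
    }
    // The CAS failed, so `expected` now holds the pointer the winner installed.
    // Because the failure ordering is acquire, reading winner->size is ordered
    // after the winner's writes to the copy.
    return expected;
}
```

If both orderings were replaced by std::memory_order_relaxed, the returned pointer would still be the winner's, but nothing in this function would order the loser's later read of `size` after the winner's writes; some other synchronization, such as the release-store in push_contents paired with an acquire on the reading side, would have to provide that.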

I'm not familiar with the larger picture of the GC protocols here, but
just looking at this code fragment in isolation, if the CAS fails we read
o->forwardee() to set new_obj. That in itself is fine because we're
reading the field that we were testing with the CAS. But we could then
dereference new_obj before the thread that won the CAS calls
push_contents; and even if it is after push_contents we have not done an
acquire to pair with the release-store in push_contents. So I'm really
not seeing how we can use a barrier-less CAS here.

David
-----

>
> Thank you for your help again. I may be misunderstanding or missing
> something critical. Any comments and corrections are always appreciated.
>
> Regards,
> Hiroshi
> -----------------------
> Hiroshi Horii, Ph.D.
> IBM Research - Tokyo
>
>
> David Holmes wrote on 09/30/2016 07:16:16:
>
>> From: David Holmes
>> To: Carsten Varming, Hiroshi H Horii/Japan/IBM at IBMJP
>> Cc: Tim Ellison, ppc-aix-port-dev at openjdk.java.net,
>> hotspot-runtime-dev at openjdk.java.net, hotspot-gc-dev at openjdk.java.net,
>> hotspot-compiler-dev
>> Date: 09/30/2016 07:17
>> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and
>> copy_to_survivor for ppc64
>>
>> On 30/09/2016 12:47 AM, Carsten Varming wrote:
>> > Dear Hiroshi,
>> >
>> > In hotspot/src/share/vm/gc/parallel/psPromotionManager.inline.hpp:266
>> > the log line reads data from the forwardee even when the CAS fails. I
>> > believe those reads will be unsafe without barriers after the copy of
>> > the content of the object.
>>
>> I find it extremely hard to reason about a barrier-less cmpxchg in general.
>>
>> If you feel that the use of new_obj->size() is potentially unsafe then
>> the fact we return new_obj means that any use of new_obj by the caller
>> may also potentially be unsafe.
>> >> David >> ----- >> >> > hotspot/src/share/vm/gc/parallel/psPromotionManager.inline.hpp:288 same >> > problem as in line 266 >> > >> > I would argue that the logging should only happen if the thread >> > successfully copied the object and CAS failures should be logged >> > separately without reading data from the forwardee. >> > >> > BTW, unrelated to your change: It seems like the logging in line 266 >> > should be guarded by something like "if (log_develop_is_enabled(Trace, >> > gc, scavenge)" like the logging in line 288. >> > >> > Carsten >> > >> > On Thu, Sep 29, 2016 at 8:00 AM, Hiroshi H Horii > > > wrote: >> > >> > Hi all, >> > >> > Can I please request reviews for a change for 8154736 that improve >> > copy_to_survivor performance of ppc64 and aarch64? >> > If possible, I would like to include this change into jdk9. >> > >> > 8154736 includes two changes, cmpxchg and copy_to_suvivor, and the >> > former >> > was resolved as 8155949. >> > Now, I would like to ask a review for the remaining, copy_to_suvivor >> > change. >> > >> > webrev: >> > > http://cr.openjdk.java.net/~mdoerr/8154736_copy_to_survivor/webrev.01/ >> > > >> > JIRA: https://bugs.openjdk.java.net/browse/JDK-8154736 >> > >> > >> > I tested this change with SPECjbb2013. Also, I re-check that relaxed >> > cmpxchg is available for changing forwarding pointers. However, > because >> > this change is sensitive, we need more reviews not only from >> > compiler-dev, >> > but also from gc-dev. >> > >> > Regards, >> > Hiroshi >> > ----------------------- >> > Hiroshi Horii, Ph.D. 
>> > IBM Research - Tokyo >> > >> > >> > >> > >> > From: David Holmes > > > >> > To: "Doerr, Martin" > > >, Hiroshi H >> > Horii/Japan/IBM at IBMJP >> > Cc: Tim Ellison > > >, >> > "ppc-aix-port-dev at openjdk.java.net >> > " >> > > > >, >> > "hotspot-gc-dev at openjdk.java.net >> > " >> > > > >, >> > "hotspot-runtime-dev at openjdk.java.net >> > " >> > > > > >> > Date: 05/10/2016 19:31 >> > Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and >> > copy_to_survivor for ppc64 >> > >> > >> > >> > On 10/05/2016 7:41 PM, Doerr, Martin wrote: >> > > Hi David, >> > > >> > > thank you very much for testing the other platforms. >> > > >> > > Here's an updated webrev: >> > > http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.01/ >> > >> > >> > Thanks. Second test run on its way. >> > >> > David >> > ----- >> > >> > > Best regards, >> > > Martin >> > > >> > > -----Original Message----- >> > > From: hotspot-runtime-dev [ >> > mailto:hotspot-runtime-dev-bounces at openjdk.java.net >> > ] On Behalf Of >> > David >> > Holmes >> > > Sent: Dienstag, 10. Mai 2016 11:11 >> > > To: Hiroshi H Horii > >> > > Cc: Tim Ellison > > >; >> > ppc-aix-port-dev at openjdk.java.net >> > ; >> > hotspot-gc-dev at openjdk.java.net >> > ; >> > hotspot-runtime-dev at openjdk.java.net >> > >> > > Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and >> > copy_to_survivor for ppc64 >> > > >> > > The fix seems incomplete for solaris: >> > > >> > > make/Main.gmk:232: recipe for target 'hotspot' failed >> > > >> > "/opt/jprt/T/P1/073516.daholme/s/hotspot/src/os_cpu/ >> solaris_x86/vm/atomic_solaris_x86.inline.hpp", >> > > line 124: Error: Too many arguments in call to >> > > "_Atomic_cmpxchg_long(long, volatile long*, long)". >> > > >> > "/opt/jprt/T/P1/073516.daholme/s/hotspot/src/os_cpu/ >> solaris_x86/vm/atomic_solaris_x86.inline.hpp", >> > > line 128: Error: Too many arguments in call to >> > > "_Atomic_cmpxchg_long(long, volatile long*, long)". 
>> > > >> > > David >> > > >> > > On 10/05/2016 5:34 PM, David Holmes wrote: >> > >> Hi Hiroshi, >> > >> >> > >> On 6/05/2016 8:11 PM, Hiroshi H Horii wrote: >> > >>> Hi David, >> > >>> >> > >>> Thank you for your comments. >> > >>> >> > >>> As Martin suggested me, I would like to separate this > proposal to >> > >>> - relaxing memory order of cmpxchg >> > >>> - improvement of copy_to_survivior with relaxed cmpxchg >> > >>> and discuss the former first. >> > >>> >> > >>> Martin thankfully created a new webrev that include a change of >> > cmpxchg. >> > >>> >> > http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.00/ >> > >> > >>> He has already tested it with AIX, linuxx86_64, linuxppc64le and >> > >>> darwinintel64. >> > >>> (Please tell me if I need to send a new mail for this PFR) >> > >> >> > >> Please do as it will be simpler to track that way. >> > >> >> > >>>> What I would prefer to see is an additional memory_order value >> > (such >> > as >> > >>>> memory_order_ignored) which is the default for all methods > declared >> > to >> > >>>> take a memory_order parameter. >> > >>> >> > >>> We added simple enum to specify memory order in atomic.hpp as >> > follows. >> > >>> >> > >>> typedef enum cmpxchg_cmpxchg_memory_order { >> > >>> memory_order_relaxed, >> > >>> memory_order_conservative >> > >>> } cmpxchg_memory_order; >> > >>> >> > >>> All of cmpxchg functions have an argument of > cmpxchg_memory_order >> > >>> with a default value memory_order_conservative that uses the > same >> > >>> semantics with the existing cmpxchg and requires no change > for the >> > >>> existing >> > >>> callers. If you think "memory_order_ignored" is better than >> > >>> "memory_order_conservative", I will be happy to modify this > change. >> > >>> (I just thought, "ignored" may resemble "relaxed" and may make >> > >>> people who are familiar with C++11's memory semantics confused. >> > >>> I would like to know thoughts of native speakers.) 
>> > >> >> > >> That is fine by me. I don't think "ignored" would be confused > with >> > >> "relaxed", but "conservative" is fine. >> > >> >> > >> I will run the patch through our internal build system while you >> > prepare >> > >> the updated RFR. My only concern is "unused argument" warnings >> > from the >> > >> compiler. :) >> > >> >> > >> We are quickly running into a hard deadline with Feature Complete >> > >> however - possibly less than 24 hours - for hotspot changes. > If this >> > >> doesn't get in in time I will see if I can shepherd it > through the >> > >> approval process. >> > >> >> > >> Thanks, >> > >> David >> > >> >> > >> >> > >>> Regards, >> > >>> Hiroshi >> > >>> ----------------------- >> > >>> Hiroshi Horii, Ph.D. >> > >>> IBM Research - Tokyo >> > >>> >> > >>> >> > >>> David Holmes > > > wrote on 05/04/2016 14:55:29: >> > >>> >> > >>>> From: David Holmes > > > >> > >>>> To: Hiroshi H Horii/Japan/IBM at IBMJP >> > >>>> Cc: hotspot-gc-dev at openjdk.java.net >> > , hotspot-runtime- >> > >>>> dev at openjdk.java.net , >> > ppc-aix-port-dev at openjdk.java.net >> > , Tim Ellison >> > >>>> >, >> > Volker Simonis > > >, >> > >>>> "Doerr, Martin" > > >, "Lindenmaier, Goetz" >> > >>>> > >> > >>>> Date: 05/04/2016 14:57 >> > >>>> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and >> > >>>> copy_to_survivor for ppc64 >> > >>>> >> > >>>> Hi Hiroshi, >> > >>>> >> > >>>> Sorry for the delay on getting back to this. >> > >>>> >> > >>>> On 25/04/2016 5:09 PM, Hiroshi H Horii wrote: >> > >>>>> Hi David, >> > >>>>> >> > >>>>> Thank you for your comments and questions. >> > >>>>> >> > >>>>>> 1. Are the current cmpxchg semantics exactly the same as >> > >>>>>> memory_order_seq_cst? >> > >>>>> >> > >>>>> This is very good question.. >> > >>>>> >> > >>>>> I guess, cmpxchg needs a more conservative constraint for > memory >> > >>> ordering >> > >>>>> than C++11, to add sync after a compare-and-exchange > operation. 
>> > >>>>> >> > >>>>> Could someone give comments or thoughts? >> > >>>> >> > >>>> I don't want to comment on the comparison with C++11. What > I would >> > >>>> prefer to see is an additional memory_order value (such as >> > >>>> memory_order_ignored) which is the default for all methods > declared >> > to >> > >>>> take a memory_order parameter. That way existing >> > implementations are >> > >>>> clearly ignoring the memory_order attribute and there is no >> > potential >> > >>>> for confusion as to whether the existing implementations > equate to >> > >>>> memory_order_seq_cst or not. >> > >>>> >> > >>>> That said, I'm not sure it makes sense to add the memory_order >> > parameter >> > >>>> to all methods with "cas" in their name, e.g. >> > oopDesc::cas_set_mark, >> > >>>> oopDesc::cas_forward_to, unless those methods can sensibly be >> > called >> > >>>> with any value for memory_order - which seems highly unlikely. >> > Perhaps >> > >>>> those methods should identify the weakest form of > memory_order they >> > >>>> support and that should be hard-wired into them? >> > >>>> >> > >>>> Thanks, >> > >>>> David >> > >>>> >> > >>>>> memory_order_seq_cst is defined as >> > >>>>> "Any operation with this memory order is both an acquire >> > >>> operation and >> > >>>>> a release operation, plus a single total order exists in >> > which >> > >>>> all >> > >>>>> threads >> > >>>>> observe all modifications (see below) in the same order." >> > >>>>> (http://en.cppreference.com/w/cpp/atomic/memory_order >> > ) >> > >>>>> >> > >>>>> In my environment, g++ and xlc generate following > assemblies on >> > >>>> ppc64le. >> > >>>>> (interestingly, they generates the same assemblies for any >> > >>>> memory_order) >> > >>>>> >> > >>>>> g++ (4.9.2) >> > >>>>> 100008a4: ac 04 00 7c sync >> > >>>>> 100008a8: 28 50 20 7d lwarx r9,0,r10 >> > >>>>> 100008ac: 00 18 09 7c cmpw r9,r3 >> > >>>>> 100008b0: 0c 00 c2 40 bne- 100008bc >> > >>>>> 100008b4: 2d 51 80 7c stwcx. 
r4,0,r10 >> > >>>>> 100008b8: f0 ff c2 40 bne- 100008a8 >> > >>>>> 100008bc: 2c 01 00 4c isync >> > >>>>> >> > >>>>> xlc (13.1.3) >> > >>>>> 10000888: ac 04 00 7c sync >> > >>>>> 1000088c: 28 28 c0 7c lwarx r6,0,r5 >> > >>>>> 10000890: 40 00 26 7c cmpld r6,r0 >> > >>>>> 10000894: 0c 00 82 40 bne 100008a0 >> > >>>>> 10000898: 2d 29 80 7c stwcx. r4,0,r5 >> > >>>>> 1000089c: f0 ff e2 40 bne+ 1000088c >> > >>>>> 100008a0: 2c 01 00 4c isync >> > >>>>> >> > >>>>> On the other hand, the current OpenJDK generates following >> > assemblies. >> > >>>>> >> > >>>>> 508: ac 04 00 7c sync >> > >>>>> 50c: 00 00 5c e9 ld r10,0(r28) >> > >>>>> 510: 00 50 3b 7c cmpd r27,r10 >> > >>>>> 514: 1c 00 c2 40 bne- 530 >> > >>>>> 518: a8 40 5c 7d ldarx r10,r28,r8 >> > >>>>> 51c: 00 50 3b 7c cmpd r27,r10 >> > >>>>> 520: 10 00 c2 40 bne- 530 >> > >>>>> 524: ad 41 3c 7d stdcx. r9,r28,r8 >> > >>>>> 528: f0 ff c2 40 bne- 518 >> > >>>>> 52c: ac 04 00 7c sync >> > >>>>> 530: 00 50 bb 7f ... >> > >>>>> >> > >>>>> Though we can ignore 50c-514 (because they are a > duplicated guard >> > >>>>> condition), >> > >>>>> the last sync instruction (52c) makes cmpxchg more strict than >> > >>>>> memory_order_seq_cst. >> > >>>>> >> > >>>>> In some cases, the last sync is necessary when this thread > must be >> > >>>> able >> > >>>>> to read >> > >>>>> all of the changes in the other threads while executing from >> > 508 to >> > >>>> 530 >> > >>>>> (that processes compare-and-exchange). >> > >>>>> >> > >>>>>> 2. Has there been a discussion already, establishing that the >> > >>>> modified >> > >>>>>> GC code can indeed use memory_order_relaxed? Otherwise who is >> > >>>>>> postulating that and based on what evidence? >> > >>>>> >> > >>>>> Volker and his colleagues have investigated the current GC > codes >> > >>>>> according to this. 
>> > >>>>> >> > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016- >> > >> > >>>> April/019079.html >> > >>>>> However, I believe, we need comments of other GC expertsto > change >> > >>>>> the shared codes. >> > >>>>> >> > >>>>> Regards, >> > >>>>> Hiroshi >> > >>>>> ----------------------- >> > >>>>> Hiroshi Horii, Ph.D. >> > >>>>> IBM Research - Tokyo >> > >>>>> >> > >>>>> >> > >>>>> David Holmes > > > wrote on 04/22/2016 21:57:07: >> > >>>>> >> > >>>>>> From: David Holmes > > > >> > >>>>>> To: Hiroshi H Horii/Japan/IBM at IBMJP, hotspot-runtime- >> > >>>>>> dev at openjdk.java.net , >> > hotspot-gc-dev at openjdk.java.net > >> > >>>>>> Cc: Tim Ellison > > >, >> > >>>>> ppc-aix-port-dev at openjdk.java.net >> > >> > >>>>>> Date: 04/22/2016 21:58 >> > >>>>>> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and >> > >>>>>> copy_to_survivor for ppc64 >> > >>>>>> >> > >>>>>> Hi Hiroshi, >> > >>>>>> >> > >>>>>> Two initial questions: >> > >>>>>> >> > >>>>>> 1. Are the current cmpxchg semantics exactly the same as >> > >>>>>> memory_order_seq_cst? >> > >>>>>> >> > >>>>>> 2. Has there been a discussion already, establishing that the >> > >>>> modified >> > >>>>>> GC code can indeed use memory_order_relaxed? Otherwise who is >> > >>>>>> postulating that and based on what evidence? >> > >>>>>> >> > >>>>>> Missing memory barriers have caused very difficult to > track down >> > >>> bugs in >> > >>>>>> the past - very rare race conditions. So any relaxation > here has >> > >>>> to be >> > >>>>>> done with extreme confidence. >> > >>>>>> >> > >>>>>> Thanks, >> > >>>>>> David >> > >>>>>> >> > >>>>>> On 22/04/2016 10:28 PM, Hiroshi H Horii wrote: >> > >>>>>>> Dear all: >> > >>>>>>> >> > >>>>>>> Can I please request reviews for the following change? 
>> > >>>>>>> >> > >>>>>>> Code change: >> > >>>>>>> >> > >>> >> > > http://cr.openjdk.java.net/~mdoerr/8154736_copy_to_survivor/webrev.00/ >> > > >> > >>>>>>> (I initially created and Martin enhanced so much) >> > >>>>>>> >> > >>>>>>> This change follows the discussion started from this mail. >> > >>>>>>> >> > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016- >> > >> > >>>>>> April/018960.html >> > >>>>>>> >> > >>>>>>> Description: >> > >>>>>>> This change provides relaxed compare-and-exchange by > introducing >> > >>>>>>> similar semantics of C++ atomic memory operators, enum >> > >>>> memory_order. >> > >>>>>>> As described in atomic_linux_ppc.inline.hpp, the current >> > >>>>> implementation of >> > >>>>>>> cmpxchg is fence_cmpxchg_acquire. This implementation is > useful >> > for >> > >>>>>>> general purposes because twice calls of sync before and > after >> > >>>>> cmpxchg will >> > >>>>>>> provide strict consistency. However, they sometimes cause >> > overheads >> > >>>>>>> because >> > >>>>>>> sync instructions are very expensive in the current > POWER chip >> > >>> design. >> > >>>>>>> In addition, for the other platforms, such as aarch64, this >> > strict >> > >>>>>>> semantics >> > >>>>>>> may cause some overheads (according to the Andrew's mail). >> > >>>>>>> >> > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016- >> > >> > >>>>>> April/019073.html >> > >>>>>>> >> > >>>>>>> With this change, callers can explicitly specify > constraints of >> > >>> memory >> > >>>>>>> ordering >> > >>>>>>> for cmpxchg with an additional parameter, memory_order > order. 
>> > >>>>>>> >> > >>>>>>> typedef enum memory_order { >> > >>>>>>> memory_order_relaxed, >> > >>>>>>> memory_order_consume, >> > >>>>>>> memory_order_acquire, >> > >>>>>>> memory_order_release, >> > >>>>>>> memory_order_acq_rel, >> > >>>>>>> memory_order_seq_cst >> > >>>>>>> } memory_order; >> > >>>>>>> >> > >>>>>>> Because the default value of the parameter is >> > memory_order_seq_cst, >> > >>>>>>> existing codes can use the same semantics of cmpxchg > without any >> > >>>>>>> modification. The relaxed cmpxchg is implemented only on ppc >> > >>>>>>> in this changeset. Therefore, the behavior on the other >> > platforms >> > >>> will >> > >>>>>>> not be changed with this changeset. >> > >>>>>>> >> > >>>>>>> In addition, with the new parameter of cmpxchg, this change >> > >>>> improves >> > >>>>>>> performance of copy_to_survivor in the parallel GC. >> > >>>>>>> copy_to_survivor changes forward pointers by using > cmpxchg. This >> > >>>>>>> operation doesn't require any sync instructions. A > pointer is >> > >>> changed >> > >>>>>>> at most once in a GC and when cmpxchg fails, the latest >> > pointer is >> > >>>>>>> available for the caller. cas_set_mark and > cas_forward_to are >> > >>> extended >> > >>>>>>> with an additional memory_order parameter as cmpxchg and >> > >>>>> copy_to_survivor >> > >>>>>>> uses memory_order_relaxed to modify the forward pointers. >> > >>>>>>> >> > >>>>>>> Summary of source code changes: >> > >>>>>>> >> > >>>>>>> * src/share/vm/runtime/atomic.hpp >> > >>>>>>> - Defines enum memory_order and adds a parameter to >> > cmpxchg. 
>> > >>>>>>> >> > >>>>>>> * src/share/vm/runtime/atomic.cpp >> > >>>>>>> * src/os_cpu/bsd_x86/vm/atomic_bsd_x86.inline.hpp >> > >>>>>>> * src/os_cpu/bsd_zero/vm/atomic_bsd_zero.inline.hpp >> > >>>>>>> * > src/os_cpu/linux_aarch64/vm/atomic_linux_aarch64.inline.hpp >> > >>>>>>> * src/os_cpu/linux_sparc/vm/atomic_linux_sparc.inline.hpp >> > >>>>>>> * src/os_cpu/linux_x86/vm/atomic_linux_x86.inline.hpp >> > >>>>>>> * src/os_cpu/linux_zero/vm/atomic_linux_zero.inline.hpp >> > >>>>>>> * > src/os_cpu/solaris_sparc/vm/atomic_solaris_sparc.inline.hpp >> > >>>>>>> * src/os_cpu/solaris_x86/vm/atomic_solaris_x86.inline.hpp >> > >>>>>>> * src/os_cpu/windows_x86/vm/atomic_windows_x86.inline.hpp >> > >>>>>>> - Added a parameter for each cmpxchg function to > follow >> > >>>>>>> the change of atomic.hpp. Their implementations > are not >> > >>>>> changed. >> > >>>>>>> >> > >>>>>>> * src/os_cpu/aix_ppc/vm/atomic_aix_ppc.inline.hpp >> > >>>>>>> * src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp >> > >>>>>>> - Added a parameter for each cmpxchg function to > follow >> > >>>>>>> the change of atomic.hpp. In addition, > implementations >> > >>>>>>> are changed corresponding to the specified >> > memory_order. >> > >>>>>>> >> > >>>>>>> * src/share/vm/oops/oop.hpp >> > >>>>>>> * src/share/vm/oops/oop.inline.hpp >> > >>>>>>> - Add a memory_order parameter to use relaxed > cmpxchg in >> > >>>>>>> cas_set_mark and cas_forward_to. >> > >>>>>>> >> > >>>>>>> * src/share/vm/gc/parallel/psPromotionManager.cpp >> > >>>>>>> * src/share/vm/gc/parallel/psPromotionManager.inline.hpp >> > >>>>>>> >> > >>>>>>> Martin tested this changeset on linuxx86_64, > linuxppc64le and >> > >>>>>>> darwinintel64. >> > >>>>>>> Though more time is needed to test on the other platform, we >> > would >> > >>>>> like to >> > >>>>>>> ask >> > >>>>>>> reviews and start discussion on this changeset. 
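The proposal above, a cmpxchg that takes a memory_order argument defaulting to the strongest ordering, plus a forwarding-pointer CAS that passes memory_order_relaxed, can be sketched roughly as follows. This is only an illustration under stated assumptions: std::atomic stands in for HotSpot's Atomic class, and the names cmpxchg, Obj and forward_to as written here are hypothetical, not the code in the webrev.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the proposed cmpxchg signature. Returns the value
// found at dest, which equals compare_value exactly when the exchange happened.
// The default keeps the strongest ordering for callers that pass nothing.
inline intptr_t cmpxchg(intptr_t exchange_value,
                        std::atomic<intptr_t>* dest,
                        intptr_t compare_value,
                        std::memory_order order = std::memory_order_seq_cst) {
  // The failure ordering must not be release or acq_rel, so derive a legal one.
  std::memory_order failure = order;
  if (order == std::memory_order_release) {
    failure = std::memory_order_relaxed;
  } else if (order == std::memory_order_acq_rel) {
    failure = std::memory_order_acquire;
  }
  intptr_t observed = compare_value;
  // On failure, compare_exchange_strong writes the current value to 'observed'.
  dest->compare_exchange_strong(observed, exchange_value, order, failure);
  return observed;
}

// Toy object: mark == 0 means "not forwarded yet" (an assumption of this sketch).
struct Obj {
  std::atomic<intptr_t> mark{0};
};

// Try to install 'copy' as the forwardee of 'o'; return whichever copy won.
inline Obj* forward_to(Obj* o, Obj* copy, std::memory_order order) {
  intptr_t old = cmpxchg(reinterpret_cast<intptr_t>(copy), &o->mark, 0, order);
  return old == 0 ? copy : reinterpret_cast<Obj*>(old);
}
```

With memory_order_relaxed the losing thread still learns which copy won, but nothing orders its later reads of that copy's fields after the winner's writes. Reading through the returned pointer would need an acquire that pairs with a release on the winning side, which is the concern raised later in this thread.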
>> > >>>>>>> I also tested this changeset with SPECjbb2013 and > confirmed that >> > gc >> > >>>>> pause >> > >>>>>>> time >> > >>>>>>> is reduced. >> > >>>>>>> >> > >>>>>>> Regards, >> > >>>>>>> Hiroshi >> > >>>>>>> ----------------------- >> > >>>>>>> Hiroshi Horii, Ph.D. >> > >>>>>>> IBM Research - Tokyo >> > >>>>>>> >> > >>>>>>> >> > >>>>>> >> > >>>>> >> > >>>> >> > >>> >> > >> > >> > >> > >> > >> > >> > From alexander.vorobyev at oracle.com Fri Sep 30 11:31:25 2016 From: alexander.vorobyev at oracle.com (Alexander Vorobyev) Date: Fri, 30 Sep 2016 14:31:25 +0300 Subject: Request for review: JDK-8145728: compiler/cpuflags/TestAESIntrinsicsOnSupportedConfig.java Expected message not found: 'com.sun.crypto.provider.AESCrypt::(implEncryptBlock|implDecryptBlock) ([0-9]+ bytes) (intrinsic) not found on supported platfroms In-Reply-To: <4949ae4e-d2f5-f09b-7c8c-ea99cc61351e@oracle.com> References: <542E8041.1010101@oracle.com> <3f1a7b1e-1ec1-6af6-5b38-84eae3ba4d40@oracle.com> <4949ae4e-d2f5-f09b-7c8c-ea99cc61351e@oracle.com> Message-ID: Do you mean with -XX:+TieredCompilation -XX:TieredStopAtLevel=1? There are no reports about such failures - all other compiler/cpuflags tests pass judging by existing test runs results. Just in case, I run compiler/cpuflags tests manually - no failures. On 29.09.2016 19:38, Vladimir Kozlov wrote: > Looks good. Did you run all compiler/cpuflags tests to verify that we > don't need to fix other tests too? > > Thanks, > Vladimir > > On 9/29/16 9:30 AM, Alexander Vorobyev wrote: >> >> Hi All, >> >> I'd like review for JDK-8145728 >> (https://bugs.openjdk.java.net/browse/JDK-8145728) >> >> Judging by the test results, test fails with specific compiler options: >> -XX:+TieredCompilation -XX:TieredStopAtLevel=N, where N<4. In this case >> C2 is not used and we are not able to see intrinsics usage in the test >> log. So such configuration is not valid for this test and should not be >> used. 
Supposed fix is to prevent this test from accepting such options. >> >> "@requires" tag was added: >> @requires vm.opt.TieredStopAtLevel == null | vm.opt.TieredStopAtLevel >> == 4 >> >> >> Here is webrev: >> http://cr.openjdk.java.net/~avorobye/8145728/webrew.00/ >> >> >> Thanks, >> Alexander >> >> >> From thomas.schatzl at oracle.com Fri Sep 30 12:02:31 2016 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Fri, 30 Sep 2016 14:02:31 +0200 Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64 In-Reply-To: <1e40040e-b494-6e1e-00a4-dc130954cebd@oracle.com> References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com> <571A1FA3.9030006@oracle.com> <201604250709.u3P79jwN024101@d19av07.sagamino.japan.ibm.com> <1574d9e7-c9cd-b1e8-e9a1-d63630713724@oracle.com> <201605061011.u46ABZDR015108@d19av07.sagamino.japan.ibm.com> <848a70ad-00b3-b742-fa4e-87dc0124e0e3@oracle.com> <347b1733-fbbc-b65b-5417-7be52a0b5d68@oracle.com> <0e47ed4857d94f9bbd99b0738bf1708a@DEWDFE13DE14.global.corp.sap> <1e40040e-b494-6e1e-00a4-dc130954cebd@oracle.com> Message-ID: <1475236951.6301.72.camel@oracle.com> Hi, On Fri, 2016-09-30 at 21:12 +1000, David Holmes wrote: > On 30/09/2016 8:17 PM, Hiroshi H Horii wrote: > > > > Dear David, and Dan, > > > > Thank you for your comments. > > > > > > > > In > > > hotspot/src/share/vm/gc/parallel/psPromotionManager.inline.hpp: > > > 266 the log line reads data from the forwardee even when the CAS > > > fails. I believe those reads will be unsafe without barriers > > > after > > > the copy of the content of the object. > > > hotspot/src/share/vm/gc/parallel/psPromotionManager.inline.hpp:28 > > > 8 > > > same problem as in line 266 > > Can we use o->size() or new_obj_size instead of new_obj->size()? They are not equivalent. Parallel GC and other collectors creatively reuse the "length" field of objArrays to indicate progress in the scanning them during GC. 
new_obj_size is the result of a call to o->size() (and the compiler may redo computations at any point), so it has the same issue. > > > If you feel that the use of new_obj->size() is potentially unsafe > > > then > > > the fact we return new_obj means that any use of new_obj by the > > > caller > > > may also potentially be unsafe. > > In my understanding, while copying objects to a survivor space, if > > a thread creates a new_obj and sets a pointer with CAS, the other > > threads can touch the new_obj after the thread calls > > push_contents(new_obj) (Line: 239). In push_contents, > > OrderAccess::release_store is called before pushing the object as a > > task into a deque of workstealing (taskqueue.inline.hpp). If the > > other thread reads the task, all of copy for new_obj is safe. > I'm not familiar with the larger picture of the GC protocols here, > but just looking at this code fragment in isolation if the CAS fails > we read o->forwardee() to set new_obj. That in itself is fine because > we're reading the field that we were testing with the CAS. But we > could then dereference new_obj before the thread that won the CAS calls > push_contents; and even if it is after push_contents we have not done > an acquire to pair with the release-store in push_contents. I think Hiroshi thinks that since the work stealing itself does a CAS with barrier after obtaining "new_obj" in the other thread, it should be safe (for other threads consuming an object on the task queue). > So I'm really not seeing how we can use a barrier-less CAS here. I also do not think it is safe as is - for example, at least PSPromotionManager::copy_and_push_safe_barrier() reads data from the returned new_obj (in another log message :)) regardless of failure. That method also reads the forwardee if forwarded, and then again uses object information in that same log message. A quick look did not show other issues, but don't count this as a review. Thanks,
Thomas From igor.ignatyev at oracle.com Fri Sep 30 13:05:30 2016 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Fri, 30 Sep 2016 16:05:30 +0300 Subject: Request for review: JDK-8145728: compiler/cpuflags/TestAESIntrinsicsOnSupportedConfig.java Expected message not found: 'com.sun.crypto.provider.AESCrypt::(implEncryptBlock|implDecryptBlock) ([0-9]+ bytes) (intrinsic) not found on supported platfroms In-Reply-To: <3f1a7b1e-1ec1-6af6-5b38-84eae3ba4d40@oracle.com> References: <542E8041.1010101@oracle.com> <3f1a7b1e-1ec1-6af6-5b38-84eae3ba4d40@oracle.com> Message-ID: Alexander, your fix literally removes the test from almost all executions, because we do not set -XX:TieredStopAtLevel=4 in any configs. From my point of view, changing the AESSupportPredicate class is a better way to fix this issue, since it will be reused by all other tests. I also have a question regarding your evaluation. Based on your own comment [1], an unused C2 cannot be the reason why this test failed before; otherwise you would have been able to reproduce this bug w/o any problems. Could you please provide a more detailed evaluation? [1] https://bugs.openjdk.java.net/browse/JDK-8145728?focusedCommentId=13996257&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13996257 It is not reproducible with the latest builds of JDK 9 (b133), even on the same host with the same options. Thanks, -- Igor > On Sep 29, 2016, at 7:30 PM, Alexander Vorobyev wrote: > > > Hi All, > > I'd like review for JDK-8145728 (https://bugs.openjdk.java.net/browse/JDK-8145728) > > Judging by the test results, test fails with specific compiler options: -XX:+TieredCompilation -XX:TieredStopAtLevel=N, where N<4. In this case C2 is not used and we are not able to see intrinsics usage in the test log. So such configuration is not valid for this test and should not be used. Supposed fix is to prevent this test from accepting such options.
> > "@requires" tag was added: > @requires vm.opt.TieredStopAtLevel == null | vm.opt.TieredStopAtLevel == 4 > > > Here is webrev: > http://cr.openjdk.java.net/~avorobye/8145728/webrew.00/ > > > Thanks, > Alexander > > > From alexander.vorobyev at oracle.com Fri Sep 30 13:51:32 2016 From: alexander.vorobyev at oracle.com (Alexander Vorobyev) Date: Fri, 30 Sep 2016 16:51:32 +0300 Subject: Request for review: JDK-8145728: compiler/cpuflags/TestAESIntrinsicsOnSupportedConfig.java Expected message not found: 'com.sun.crypto.provider.AESCrypt::(implEncryptBlock|implDecryptBlock) ([0-9]+ bytes) (intrinsic) not found on supported platfroms In-Reply-To: References: <542E8041.1010101@oracle.com> <3f1a7b1e-1ec1-6af6-5b38-84eae3ba4d40@oracle.com> Message-ID: <47fcc8c7-c986-fec5-74a5-6f5fcd35e83f@oracle.com> About my comment. I really was not able to reproduce this issue at that time, because the earliest failure report has different VM options and does not contain -XX:TieredStopAtLevel option. Do you always use -XX:TieredStopAtLevel option in test runs? My fix allows to run this test when this option is not set (vm.opt.TieredStopAtLevel == null). AESSupportPredicate class only checks CPU AES feature. It is exactly what it is supposed to do. Is it really necessary to add some new functionality (unrelated to AES feature) to it? Thanks On 30.09.2016 16:05, Igor Ignatyev wrote: > Alexander, > > your fix literally removes the test from almost all executions, because we do not set -XX:TieredStopAtLevel=4 in any configs. from my point of view, changing AESSupportPredicate class is a better way to fix this issue, since it will be reused by all other tests. > > I also have a question regarding your evaluation. Basing on own comment[1], not used C2 can not be a reason why this test failed before, otherwise you would be able to reproduce this bug w/o any problems. could you please provide more detailed evaluation? 
> > [1] https://bugs.openjdk.java.net/browse/JDK-8145728?focusedCommentId=13996257&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13996257 > It is not reproducible with the latest builds of JDK 9 (b133), even on the same host with the same options. > > Thanks, > ? Igor > >> On Sep 29, 2016, at 7:30 PM, Alexander Vorobyev wrote: >> >> >> Hi All, >> >> I'd like review for JDK-8145728 (https://bugs.openjdk.java.net/browse/JDK-8145728) >> >> Judging by the test results, test fails with specific compiler options: -XX:+TieredCompilation -XX:TieredStopAtLevel=N, where N<4. In this case C2 is not used and we are not able to see intrinsics usage in the test log. So such configuration is not valid for this test and should not be used. Supposed fix is to prevent this test from accepting such options. >> >> "@requires" tag was added: >> @requires vm.opt.TieredStopAtLevel == null | vm.opt.TieredStopAtLevel == 4 >> >> >> Here is webrev: >> http://cr.openjdk.java.net/~avorobye/8145728/webrew.00/ >> >> >> Thanks, >> Alexander >> >> >> From igor.ignatyev at oracle.com Fri Sep 30 13:58:06 2016 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Fri, 30 Sep 2016 16:58:06 +0300 Subject: Request for review: JDK-8145728: compiler/cpuflags/TestAESIntrinsicsOnSupportedConfig.java Expected message not found: 'com.sun.crypto.provider.AESCrypt::(implEncryptBlock|implDecryptBlock) ([0-9]+ bytes) (intrinsic) not found on supported platfroms In-Reply-To: <47fcc8c7-c986-fec5-74a5-6f5fcd35e83f@oracle.com> References: <542E8041.1010101@oracle.com> <3f1a7b1e-1ec1-6af6-5b38-84eae3ba4d40@oracle.com> <47fcc8c7-c986-fec5-74a5-6f5fcd35e83f@oracle.com> Message-ID: <02A2359A-B806-4D3A-B993-7647EB5BFA48@oracle.com> Alexander, please see inline below. > On Sep 30, 2016, at 4:51 PM, Alexander Vorobyev wrote: > > About my comment. 
I really was not able to reproduce this issue at that time, because the earliest failure report has different VM options and does not contain -XX:TieredStopAtLevel option. That means there can be another issue w/ the test or the product which hasn't been investigated, and your fix can hide it. > > Do you always use -XX:TieredStopAtLevel option in test runs? My fix allows to run this test when this option is not set (vm.opt.TieredStopAtLevel == null). No, we don't. > > AESSupportPredicate class only checks CPU AES feature. It is exactly what it is supposed to do. Is it really necessary to add some new functionality (unrelated to AES feature) to it? AESSupportPredicate is supposed to check that the JVM can use AES. AFAIR there is AES intrinsics support only in C2, so a disabled C2 basically means the JVM cannot use AES intrinsics. Regards, -- Igor > > Thanks > > On 30.09.2016 16:05, Igor Ignatyev wrote: >> Alexander, >> >> your fix literally removes the test from almost all executions, because we do not set -XX:TieredStopAtLevel=4 in any configs. from my point of view, changing AESSupportPredicate class is a better way to fix this issue, since it will be reused by all other tests. >> >> I also have a question regarding your evaluation. Basing on own comment[1], not used C2 can not be a reason why this test failed before, otherwise you would be able to reproduce this bug w/o any problems. could you please provide more detailed evaluation? >> >> [1] https://bugs.openjdk.java.net/browse/JDK-8145728?focusedCommentId=13996257&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13996257 >> It is not reproducible with the latest builds of JDK 9 (b133), even on the same host with the same options. >> >> Thanks, >> --
Igor >> >>> On Sep 29, 2016, at 7:30 PM, Alexander Vorobyev wrote: >>> >>> >>> Hi All, >>> >>> I'd like review for JDK-8145728 (https://bugs.openjdk.java.net/browse/JDK-8145728) >>> >>> Judging by the test results, test fails with specific compiler options: -XX:+TieredCompilation -XX:TieredStopAtLevel=N, where N<4. In this case C2 is not used and we are not able to see intrinsics usage in the test log. So such configuration is not valid for this test and should not be used. Supposed fix is to prevent this test from accepting such options. >>> >>> "@requires" tag was added: >>> @requires vm.opt.TieredStopAtLevel == null | vm.opt.TieredStopAtLevel == 4 >>> >>> >>> Here is webrev: >>> http://cr.openjdk.java.net/~avorobye/8145728/webrew.00/ >>> >>> >>> Thanks, >>> Alexander >>> >>> >>> > From alexander.vorobyev at oracle.com Fri Sep 30 15:06:33 2016 From: alexander.vorobyev at oracle.com (Alexander Vorobyev) Date: Fri, 30 Sep 2016 18:06:33 +0300 Subject: Request for review: JDK-8145728: compiler/cpuflags/TestAESIntrinsicsOnSupportedConfig.java Expected message not found: 'com.sun.crypto.provider.AESCrypt::(implEncryptBlock|implDecryptBlock) ([0-9]+ bytes) (intrinsic) not found on supported platfroms In-Reply-To: <02A2359A-B806-4D3A-B993-7647EB5BFA48@oracle.com> References: <542E8041.1010101@oracle.com> <3f1a7b1e-1ec1-6af6-5b38-84eae3ba4d40@oracle.com> <47fcc8c7-c986-fec5-74a5-6f5fcd35e83f@oracle.com> <02A2359A-B806-4D3A-B993-7647EB5BFA48@oracle.com> Message-ID: On 30.09.2016 16:58, Igor Ignatyev wrote: > Alexander, > > please see inline below. > >> On Sep 30, 2016, at 4:51 PM, Alexander Vorobyev wrote: >> >> About my comment. I really was not able to reproduce this issue at that time, because the earliest failure report has different VM options and does not contain -XX:TieredStopAtLevel option. > that means there can be another issue w/ the test or the product, which hasn?t investigated and your fix can hide it. 
I don't think my fix can hide it, because it does not use VM options/configurations from the earliest failure reports. Only issue my fix is targeted for is invalid VM configuration with -XX:TieredStopAtLevel option. >> Do you always use -XX:TieredStopAtLevel option in test runs? My fix allows to run this test when this option is not set (vm.opt.TieredStopAtLevel == null). > no we don?t. >> AESSupportPredicate class only checks CPU AES feature. It is exactly what it is supposed to do. Is it really necessary to add some new functionality (unrelated to AES feature) to it? > AESSupportPredicate is supposed to check that JVM can use AES, AFAIR there is AES intrinsics support only in C2, so a disabled C2 basically means JVM can not use AES intrinsics. Maybe I was wrong. For now, AESSupportPredicate uses CPUInfo class which shows us exactly CPU features, not JVM. And, for example, TestAESIntrinsicsOnUnsupportedConfig.java expects exactly such behaviour. Because "UnsupportedConfig" means CPU with no AES feature. On such CPU we will see "AES instructions are not available on this CPU" warning (TestAESIntrinsicsOnUnsupportedConfig expects this warning) in the test log, but on CPU with AES feature and with -XX:TieredStopAtLevel=1 option (you suppose to make AESSupportPredicate return FALSE for this configuration, right?) we won't. In result, we will have TestAESIntrinsicsOnUnsupportedConfig failures on platforms where this test is not even supposed to be run. Please correct me if I misunderstand your idea. > > Regards, > ? Igor >> Thanks >> >> On 30.09.2016 16:05, Igor Ignatyev wrote: >>> Alexander, >>> >>> your fix literally removes the test from almost all executions, because we do not set -XX:TieredStopAtLevel=4 in any configs. from my point of view, changing AESSupportPredicate class is a better way to fix this issue, since it will be reused by all other tests. >>> >>> I also have a question regarding your evaluation. 
Basing on own comment[1], not used C2 can not be a reason why this test failed before, otherwise you would be able to reproduce this bug w/o any problems. could you please provide more detailed evaluation? >>> >>> [1] https://bugs.openjdk.java.net/browse/JDK-8145728?focusedCommentId=13996257&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13996257 >>> It is not reproducible with the latest builds of JDK 9 (b133), even on the same host with the same options. >>> >>> Thanks, >>> ? Igor >>> >>>> On Sep 29, 2016, at 7:30 PM, Alexander Vorobyev wrote: >>>> >>>> >>>> Hi All, >>>> >>>> I'd like review for JDK-8145728 (https://bugs.openjdk.java.net/browse/JDK-8145728) >>>> >>>> Judging by the test results, test fails with specific compiler options: -XX:+TieredCompilation -XX:TieredStopAtLevel=N, where N<4. In this case C2 is not used and we are not able to see intrinsics usage in the test log. So such configuration is not valid for this test and should not be used. Supposed fix is to prevent this test from accepting such options. >>>> >>>> "@requires" tag was added: >>>> @requires vm.opt.TieredStopAtLevel == null | vm.opt.TieredStopAtLevel == 4 >>>> >>>> >>>> Here is webrev: >>>> http://cr.openjdk.java.net/~avorobye/8145728/webrew.00/ >>>> >>>> >>>> Thanks, >>>> Alexander >>>> >>>> >>>> From HORIE at jp.ibm.com Fri Sep 30 13:30:36 2016 From: HORIE at jp.ibm.com (Michihiro Horie) Date: Fri, 30 Sep 2016 13:30:36 +0000 Subject: RFR:8166684:implement intrinsic code with vector instructions for Unsafe.copyMemory() In-Reply-To: References: , <4c013cabdeeb476f97c427643aef7a1b@DEWDFE13DE14.global.corp.sap> Message-ID: An HTML attachment was scrubbed... 
URL: From martin.doerr at sap.com Fri Sep 30 16:00:23 2016 From: martin.doerr at sap.com (Doerr, Martin) Date: Fri, 30 Sep 2016 16:00:23 +0000 Subject: RFR:8166684:implement intrinsic code with vector instructions for Unsafe.copyMemory() In-Reply-To: References: , <4c013cabdeeb476f97c427643aef7a1b@DEWDFE13DE14.global.corp.sap> Message-ID: <690881fb156d4d0a83dc31a01e50b4ec@DEWDFE13DE14.global.corp.sap> Hi Michihiro, thanks for contributing this change. Looks good, now. We will test it. We?ll push it if it gets approved and reviewed. Best regards, Martin From: Michihiro Horie [mailto:HORIE at jp.ibm.com] Sent: Freitag, 30. September 2016 15:31 To: Lindenmaier, Goetz Cc: gromero at linux.vnet.ibm.com; Hiroshi H Horii ; Doerr, Martin ; ppc-aix-port-dev at openjdk.java.net; Simonis, Volker ; hotspot-compiler-dev at openjdk.java.net Subject: Re: RE: RFR:8166684:implement intrinsic code with vector instructions for Unsafe.copyMemory() Hi Goetz, Martin, Would you review this? The initialization of tmp1 is now outside the loop. JIRA: https://bugs.openjdk.java.net/browse/JDK-8166684 Webrev: http://cr.openjdk.java.net/~horii/8166684/webrev.01/ We created webrev by ourselves, and cced to hotspot-compiler-dev. Best regards, -- Michihiro, IBM Research - Tokyo ----- Original message ----- From: "Lindenmaier, Goetz" > To: "Doerr, Martin" >, Michihiro Horie/Japan/IBM at IBMJP, "Simonis, Volker" >, "ppc-aix-port-dev at openjdk.java.net" > Cc: Hiroshi H Horii/Japan/IBM at IBMJP, Gustavo Romero > Subject: RE: RFR:8166684:implement intrinsic code with vector instructions for Unsafe.copyMemory() Date: Mon, Sep 26, 2016 10:51 PM Hi, please post this RFR also to hotspot-compiler-dev. It must be reviewed on one of the official lists before it can be pushed. Ppc-aix-port-dev is only for communication about the port, not for reviews. Also I would appreciate if you could upload your webrevs yourselves. 
We are happy to help out in the beginning, and also with testing, reviewing and pushing, but making webrevs is a task I don't see on our side in the long term. Thanks and best regards, Goetz. > -----Original Message----- > From: ppc-aix-port-dev [mailto:ppc-aix-port-dev- > bounces at openjdk.java.net] On Behalf Of Doerr, Martin > Sent: Montag, 26. September 2016 11:53 > To: Michihiro Horie >; Simonis, Volker > >; ppc-aix-port-dev at openjdk.java.net > Cc: Hiroshi H Horii >; Gustavo Romero > > > Subject: RE: RFR:8166684:implement intrinsic code with vector instructions > for Unsafe.copyMemory() > > Hi Michihiro, > > > > the initialization of tmp1 should be done outside of the loop. Beside that, the > change looks good: > > http://cr.openjdk.java.net/~mdoerr/8166684_PPC64_unsafe_copymemory/ > webrev.00/ > /webrev.00/> > > > > Best regards, > > Martin > > > > > > From: Michihiro Horie [mailto:HORIE at jp.ibm.com] > Sent: Montag, 26. September 2016 09:37 > To: Doerr, Martin >; Simonis, Volker > >; ppc-aix-port-dev at openjdk.java.net > Cc: volker.simonis at gmail.com; Gustavo Romero > >; Hiroshi H Horii > > Subject: RFR:8166684:implement intrinsic code with vector instructions for > Unsafe.copyMemory() > > > > Dear all, > > Could I please request reviews for the following change? > This change was created for JDK9. > > I added fixes to the intrinsic code for sun.misc.Unsafe.copyMemory() by > using VSX. > Since Spark often invokes Unsafe.copyMemory(), it is beneficial to use the > vector instructions for these intrinsic code. > > jira: https://bugs.openjdk.java.net/browse/JDK-8166684 > > diff: (See attached file: unsafe-copymemory-openjdk9.diff) > > Best regards, > -- > Michihiro Horie, > IBM Research - Tokyo -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From tom.rodriguez at oracle.com Fri Sep 30 16:07:27 2016 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Fri, 30 Sep 2016 09:07:27 -0700 Subject: RFR 8166929: [JVMCI] Expose decompile counts in MDO In-Reply-To: References: <54C73C2D-F44F-491E-92C3-79DE73CE7B8F@oracle.com> Message-ID: <0C3B6756-89C4-4CB2-BA71-A509C45FF82F@oracle.com> > On Sep 29, 2016, at 7:16 PM, Vitaly Davidovich wrote: > > Quick fly-by comment: HotSpotMethodData::toString should use %d for overflow recompiles count printing, like the other counters. Thanks, that was a typo. Fixed and updated in place. tom > > Thanks > > On Thursday, September 29, 2016, Tom Rodriguez > wrote: > http://cr.openjdk.java.net/~never/8166929/webrev > https://bugs.openjdk.java.net/browse/JDK-8166929 > > This is a minor API addition to expose some of the top-level MDO decompile and recompile counts. It?s necessary to detect recompilation pathologies. Tested by printing MDOs from JVMCI. I also fixed a few problems I discovered with the formatting of the MDO printed form. > > tom > > > -- > Sent from my phone -------------- next part -------------- An HTML attachment was scrubbed... URL: From cthalinger at twitter.com Fri Sep 30 22:32:42 2016 From: cthalinger at twitter.com (Christian Thalinger) Date: Fri, 30 Sep 2016 12:32:42 -1000 Subject: RFR 8166929: [JVMCI] Expose decompile counts in MDO In-Reply-To: <0C3B6756-89C4-4CB2-BA71-A509C45FF82F@oracle.com> References: <54C73C2D-F44F-491E-92C3-79DE73CE7B8F@oracle.com> <0C3B6756-89C4-4CB2-BA71-A509C45FF82F@oracle.com> Message-ID: <0A743A77-9620-438C-8D04-C304744327AB@twitter.com> > On Sep 30, 2016, at 6:07 AM, Tom Rodriguez wrote: > > >> On Sep 29, 2016, at 7:16 PM, Vitaly Davidovich > wrote: >> >> Quick fly-by comment: HotSpotMethodData::toString should use %d for overflow recompiles count printing, like the other counters. > > Thanks, that was a typo. Fixed and updated in place. 
+ COMPILER2_OR_JVMCI_PRESENT(nonstatic_field(MethodCounters, _interpreter_invocation_count, int)) \ + COMPILER2_OR_JVMCI_PRESENT(nonstatic_field(MethodCounters, _interpreter_throwout_count, u2)) \ Isn't that true always? + public int getDecompileCount() { + return UNSAFE.getInt(metaspaceMethodData + config.methodDataDecompiles); + } + + public int getOverflowRecompileCount() { + return UNSAFE.getInt(metaspaceMethodData + config.methodDataOverflowRecompiles); + } + + public int getOverflowTrapsCount() { + return UNSAFE.getInt(metaspaceMethodData + config.methodDataOverflowTraps); + } This is high-level nitpicking: the fields are plural but you named the methods singular except "OverflowTraps". Either plural everywhere or none. > > tom > >> >> Thanks >> >> On Thursday, September 29, 2016, Tom Rodriguez > wrote: >> http://cr.openjdk.java.net/~never/8166929/webrev >> https://bugs.openjdk.java.net/browse/JDK-8166929 >> >> This is a minor API addition to expose some of the top-level MDO decompile and recompile counts. It's necessary to detect recompilation pathologies. Tested by printing MDOs from JVMCI. I also fixed a few problems I discovered with the formatting of the MDO printed form. >> >> tom >> >> >> -- >> Sent from my phone > -------------- next part -------------- An HTML attachment was scrubbed... 
From tom.rodriguez at oracle.com Fri Sep 30 23:04:30 2016
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Fri, 30 Sep 2016 16:04:30 -0700
Subject: RFR 8166929: [JVMCI] Expose decompile counts in MDO
In-Reply-To: <0A743A77-9620-438C-8D04-C304744327AB@twitter.com>
References: <54C73C2D-F44F-491E-92C3-79DE73CE7B8F@oracle.com> <0C3B6756-89C4-4CB2-BA71-A509C45FF82F@oracle.com> <0A743A77-9620-438C-8D04-C304744327AB@twitter.com>
Message-ID:

> On Sep 30, 2016, at 3:32 PM, Christian Thalinger wrote:
>
>
>> On Sep 30, 2016, at 6:07 AM, Tom Rodriguez wrote:
>>
>>
>>> On Sep 29, 2016, at 7:16 PM, Vitaly Davidovich wrote:
>>>
>>> Quick fly-by comment: HotSpotMethodData::toString should use %d for overflow recompiles count printing, like the other counters.
>>
>> Thanks, that was a typo. Fixed and updated in place.
>
> +  COMPILER2_OR_JVMCI_PRESENT(nonstatic_field(MethodCounters, _interpreter_invocation_count, int)) \
> +  COMPILER2_OR_JVMCI_PRESENT(nonstatic_field(MethodCounters, _interpreter_throwout_count, u2)) \
> Isn't that true always?

Yes, it is here. I copied those lines from the normal vmstructs database where it might not be true but they can be removed here. While fixing this I also realized that InvocationCounter wasn't declared in the JVMCI copy of vmstructs, so I added that.

>
> +    public int getDecompileCount() {
> +        return UNSAFE.getInt(metaspaceMethodData + config.methodDataDecompiles);
> +    }
> +
> +    public int getOverflowRecompileCount() {
> +        return UNSAFE.getInt(metaspaceMethodData + config.methodDataOverflowRecompiles);
> +    }
> +
> +    public int getOverflowTrapsCount() {
> +        return UNSAFE.getInt(metaspaceMethodData + config.methodDataOverflowTraps);
> +    }
> This is high-level nitpicking: the fields are plural but you named the methods singular except "OverflowTraps". Either plural everywhere or none.

Yes I fixed that in 8 while preparing this webrev but I missed updating it in 9.
I've put the updated webrev at http://cr.openjdk.java.net/~never/8166929.1/webrev

tom

>
>>
>> tom
>>
>>>
>>> Thanks
>>>
>>> On Thursday, September 29, 2016, Tom Rodriguez wrote:
>>> http://cr.openjdk.java.net/~never/8166929/webrev
>>> https://bugs.openjdk.java.net/browse/JDK-8166929
>>>
>>> This is a minor API addition to expose some of the top-level MDO decompile and recompile counts. It's necessary to detect recompilation pathologies. Tested by printing MDOs from JVMCI. I also fixed a few problems I discovered with the formatting of the MDO printed form.
>>>
>>> tom
>>>
>>>
>>> --
>>> Sent from my phone
>>
>

From cthalinger at twitter.com Fri Sep 30 23:12:08 2016
From: cthalinger at twitter.com (Christian Thalinger)
Date: Fri, 30 Sep 2016 13:12:08 -1000
Subject: RFR 8166929: [JVMCI] Expose decompile counts in MDO
In-Reply-To:
References: <54C73C2D-F44F-491E-92C3-79DE73CE7B8F@oracle.com> <0C3B6756-89C4-4CB2-BA71-A509C45FF82F@oracle.com> <0A743A77-9620-438C-8D04-C304744327AB@twitter.com>
Message-ID: <874B3DFE-DF6A-43E5-AEFE-1AAE5BA628B0@twitter.com>

> On Sep 30, 2016, at 1:04 PM, Tom Rodriguez wrote:
>
>>
>> On Sep 30, 2016, at 3:32 PM, Christian Thalinger wrote:
>>
>>
>>> On Sep 30, 2016, at 6:07 AM, Tom Rodriguez wrote:
>>>
>>>
>>>> On Sep 29, 2016, at 7:16 PM, Vitaly Davidovich wrote:
>>>>
>>>> Quick fly-by comment: HotSpotMethodData::toString should use %d for overflow recompiles count printing, like the other counters.
>>>
>>> Thanks, that was a typo. Fixed and updated in place.
>>
>> +  COMPILER2_OR_JVMCI_PRESENT(nonstatic_field(MethodCounters, _interpreter_invocation_count, int)) \
>> +  COMPILER2_OR_JVMCI_PRESENT(nonstatic_field(MethodCounters, _interpreter_throwout_count, u2)) \
>> Isn't that true always?
>
> Yes, it is here. I copied those lines from the normal vmstructs database where it might not be true but they can be removed here.
> While fixing this I also realized that InvocationCounter wasn't declared in the JVMCI copy of vmstructs, so I added that.
>
>>
>> +    public int getDecompileCount() {
>> +        return UNSAFE.getInt(metaspaceMethodData + config.methodDataDecompiles);
>> +    }
>> +
>> +    public int getOverflowRecompileCount() {
>> +        return UNSAFE.getInt(metaspaceMethodData + config.methodDataOverflowRecompiles);
>> +    }
>> +
>> +    public int getOverflowTrapsCount() {
>> +        return UNSAFE.getInt(metaspaceMethodData + config.methodDataOverflowTraps);
>> +    }
>> This is high-level nitpicking: the fields are plural but you named the methods singular except "OverflowTraps". Either plural everywhere or none.
>
> Yes I fixed that in 8 while preparing this webrev but I missed updating it in 9. I've put the updated webrev at http://cr.openjdk.java.net/~never/8166929.1/webrev

Looks good. Unrelated question:

         String nl = String.format("%n");
         String nlIndent = String.format("%n%38s", "");
+        sb.append("Raw method data for ");
+        sb.append(method.format("%H.%n(%p)"));
+        sb.append(":");
+        sb.append(nl);
+        sb.append(String.format("nof_decompiles(%d) nof_overflow_recompiles(%d) nof_overflow_traps(%d)%n",
+                        getDecompileCount(), getOverflowRecompileCount(), getOverflowTrapCount()));

Is pre-formatting nl really a win? If yes, why are we not doing the same trick on the last line?

>
> tom
>
>>
>>>
>>> tom
>>>
>>>>
>>>> Thanks
>>>>
>>>> On Thursday, September 29, 2016, Tom Rodriguez wrote:
>>>> http://cr.openjdk.java.net/~never/8166929/webrev
>>>> https://bugs.openjdk.java.net/browse/JDK-8166929
>>>>
>>>> This is a minor API addition to expose some of the top-level MDO decompile and recompile counts. It's necessary to detect recompilation pathologies. Tested by printing MDOs from JVMCI. I also fixed a few problems I discovered with the formatting of the MDO printed form.
>>>>
>>>> tom
>>>>
>>>>
>>>> --
>>>> Sent from my phone
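
For the question about pre-formatting nl, here is a small sketch of the two styles side by side. The class and method names (NewlineFormatting, withPreformattedNl, withInlineNl) are hypothetical, and the method name and counter values are made-up inputs. It only shows that a cached String.format("%n") and a trailing %n inside the format string yield the same text; it makes no claim about which is faster.

```java
public class NewlineFormatting {
    // Style 1: format the platform line separator once and append it explicitly.
    static String withPreformattedNl(String method, int decompiles, int recompiles, int traps) {
        String nl = String.format("%n");
        StringBuilder sb = new StringBuilder();
        sb.append("Raw method data for ").append(method).append(":").append(nl);
        sb.append(String.format("nof_decompiles(%d) nof_overflow_recompiles(%d) nof_overflow_traps(%d)",
                decompiles, recompiles, traps));
        sb.append(nl);
        return sb.toString();
    }

    // Style 2: let each format string carry its own %n.
    static String withInlineNl(String method, int decompiles, int recompiles, int traps) {
        StringBuilder sb = new StringBuilder();
        sb.append(String.format("Raw method data for %s:%n", method));
        sb.append(String.format("nof_decompiles(%d) nof_overflow_recompiles(%d) nof_overflow_traps(%d)%n",
                decompiles, recompiles, traps));
        return sb.toString();
    }

    public static void main(String[] args) {
        String a = withPreformattedNl("Foo.bar(int)", 1, 2, 3);
        String b = withInlineNl("Foo.bar(int)", 1, 2, 3);
        System.out.print(a);
        System.out.println("same output: " + a.equals(b));
    }
}
```

Since both variants produce identical strings, choosing between them is a matter of consistency and of how often the separator is reused, which is what the question is probing.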