From roland.westrelin at oracle.com Mon Aug 1 05:00:31 2011 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Mon, 1 Aug 2011 14:00:31 +0200 Subject: proposed membar simplification in c2 In-Reply-To: <82AE947F-0E5B-475C-84B2-49D9CB6F282A@oracle.com> References: <4E25C003.5030805@oracle.com> <82AE947F-0E5B-475C-84B2-49D9CB6F282A@oracle.com> Message-ID: <5D28412E-42A2-4CCE-B27E-26E2836EC394@oracle.com> While doing more testing I found that I had to make some changes in src/share/vm/adlc/formssel.cpp as well. Here is an updated webrev: http://cr.openjdk.java.net/~roland/membar/webrev.03/ Roland. From vladimir.kozlov at oracle.com Mon Aug 1 08:21:10 2011 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 01 Aug 2011 08:21:10 -0700 Subject: proposed membar simplification in c2 In-Reply-To: <5D28412E-42A2-4CCE-B27E-26E2836EC394@oracle.com> References: <4E25C003.5030805@oracle.com> <82AE947F-0E5B-475C-84B2-49D9CB6F282A@oracle.com> <5D28412E-42A2-4CCE-B27E-26E2836EC394@oracle.com> Message-ID: <4E36C466.9090609@oracle.com> Good. Vladimir On 8/1/11 5:00 AM, Roland Westrelin wrote: > > While doing more testing I found that I had to make some changes in src/share/vm/adlc/formssel.cpp as well. Here is an updated webrev: > > http://cr.openjdk.java.net/~roland/membar/webrev.03/ > > Roland. From christian.thalinger at oracle.com Tue Aug 2 01:46:50 2011 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Tue, 2 Aug 2011 10:46:50 +0200 Subject: proposed membar simplification in c2 In-Reply-To: <5D28412E-42A2-4CCE-B27E-26E2836EC394@oracle.com> References: <4E25C003.5030805@oracle.com> <82AE947F-0E5B-475C-84B2-49D9CB6F282A@oracle.com> <5D28412E-42A2-4CCE-B27E-26E2836EC394@oracle.com> Message-ID: <013EBD4C-417A-41AE-8865-56A14D240AE1@oracle.com> Looks good. -- Christian On Aug 1, 2011, at 2:00 PM, Roland Westrelin wrote: > > While doing more testing I found that I had to make some changes in src/share/vm/adlc/formssel.cpp as well. 
Here is an updated webrev: > > http://cr.openjdk.java.net/~roland/membar/webrev.03/ > > Roland. From joe.j.kearney at gmail.com Wed Aug 3 07:17:20 2011 From: joe.j.kearney at gmail.com (Joe Kearney) Date: Wed, 3 Aug 2011 15:17:20 +0100 Subject: IdealGraphVisualizer file compatibility Message-ID: Hi, I've been trying to play with igv from http://ssw.jku.at/General/Staff/TW/igv.html, http://ssw.jku.at/General/Staff/PH/ but I don't know how to generate the required log files. What sort of files should I expect the igv to be able to read? The example files are graphDocument XMLs. I was hoping to be able to generate a file with something like the following: -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation -XX:LogFile=output.xml Needless to say, these hotspot_log files are totally different and the igv barfs with the below. java.lang.NullPointerException at com.sun.hotspot.igv.data.GraphDocument.addGraphDocument(GraphDocument.java:70) at com.sun.hotspot.igv.coordinator.actions.ImportAction$3.run(ImportAction.java:128) at org.openide.util.RequestProcessor$Task.run(RequestProcessor.java:572) [catch] at org.openide.util.RequestProcessor$Processor.run(RequestProcessor.java:997) How do I get the jvm to generate the right output file? 
Many thanks, Joe From christian.thalinger at oracle.com Wed Aug 3 07:51:06 2011 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Wed, 3 Aug 2011 16:51:06 +0200 Subject: IdealGraphVisualizer file compatibility In-Reply-To: References: Message-ID: <60867EAC-E2A7-4EA1-9A86-1B421C328693@oracle.com> You want: -XX:PrintIdealGraphLevel=1 -XX:PrintIdealGraphFile=output.xml The README of the visualizer also helps: http://hg.openjdk.java.net/hsx/hotspot-main/hotspot/file/tip/src/share/tools/IdealGraphVisualizer/README -- Christian On Aug 3, 2011, at 4:17 PM, Joe Kearney wrote: > Hi, > > I've been trying to play with igv from > http://ssw.jku.at/General/Staff/TW/igv.html, > http://ssw.jku.at/General/Staff/PH/ but I don't know how to generate > the required log files. What sort of files should I expect the igv to > be able to read? The example files are graphDocument XMLs. I was > hoping to be able to generate a file with something like the > following: > > -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation -XX:LogFile=output.xml > > Needless to say, these hotspot_log files are totally different and the > igv barfs with the below. > > java.lang.NullPointerException > at com.sun.hotspot.igv.data.GraphDocument.addGraphDocument(GraphDocument.java:70) > at com.sun.hotspot.igv.coordinator.actions.ImportAction$3.run(ImportAction.java:128) > at org.openide.util.RequestProcessor$Task.run(RequestProcessor.java:572) > [catch] at org.openide.util.RequestProcessor$Processor.run(RequestProcessor.java:997) > > > How do I get the jvm to generate the right output file? > > Many thanks, > Joe From peter.hofer at jku.at Wed Aug 3 08:14:43 2011 From: peter.hofer at jku.at (Peter Hofer) Date: Wed, 3 Aug 2011 17:14:43 +0200 Subject: IdealGraphVisualizer file compatibility In-Reply-To: References: Message-ID: <20110803171443.4447acd5@sunflower> Hi Joe! 
> I've been trying to play with igv from > http://ssw.jku.at/General/Staff/TW/igv.html, > http://ssw.jku.at/General/Staff/PH/ but I don't know how to generate > the required log files. What sort of files should I expect the igv to > be able to read? The example files are graphDocument XMLs. This is IGV's custom XML format. Its structure is described in Thomas Wuerthinger's master's thesis: http://ssw.jku.at/Research/Papers/Wuerthinger07Master/ > I was hoping to be able to generate a file with something like the > following: > [...] > How do I get the jvm to generate the right output file? You need a debug or fastdebug build of Hotspot. Only the server compiler can generate IGV output, so you need to specify -server if your VM uses the client compiler by default. You can then use -XX:PrintIdealGraphLevel= to enable IGV output and to control the detail level of the generated output (with 1 being the minimum). By default, Hotspot's IGV printer tries to send the output to an IGV instance listening at localhost:4444. You can instead write it to a file using -XX:PrintIdealGraphFile= or use -XX:PrintIdealGraphAddress= and -XX:PrintIdealGraphPort= for a different network destination. Best regards, Peter From joe.j.kearney at gmail.com Wed Aug 3 09:37:31 2011 From: joe.j.kearney at gmail.com (Joe Kearney) Date: Wed, 3 Aug 2011 17:37:31 +0100 Subject: IdealGraphVisualizer file compatibility In-Reply-To: <60867EAC-E2A7-4EA1-9A86-1B421C328693@oracle.com> References: <60867EAC-E2A7-4EA1-9A86-1B421C328693@oracle.com> Message-ID: Ah, thanks for the readme link. I can't get hotspot 1.6.0_25 or 1.7.0 to recognise the PrintIdealGraphLevel/PrintIdealGraphFile options. I tried with UnlockDiagnosticVMOptions etc as well. to no avail. Is there something else needed to expose this? 
Joe On 3 August 2011 15:51, Christian Thalinger wrote: > You want: -XX:PrintIdealGraphLevel=1 -XX:PrintIdealGraphFile=output.xml > > The README of the visualizer also helps: > > http://hg.openjdk.java.net/hsx/hotspot-main/hotspot/file/tip/src/share/tools/IdealGraphVisualizer/README > > -- Christian > > On Aug 3, 2011, at 4:17 PM, Joe Kearney wrote: > >> Hi, >> >> I've been trying to play with igv from >> http://ssw.jku.at/General/Staff/TW/igv.html, >> http://ssw.jku.at/General/Staff/PH/ but I don't know how to generate >> the required log files. What sort of files should I expect the igv to >> be able to read? The example files are graphDocument XMLs. I was >> hoping to be able to generate a file with something like the >> following: >> >> -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation -XX:LogFile=output.xml >> >> Needless to say, these hotspot_log files are totally different and the >> igv barfs with the below. >> >> java.lang.NullPointerException >> at com.sun.hotspot.igv.data.GraphDocument.addGraphDocument(GraphDocument.java:70) >> at com.sun.hotspot.igv.coordinator.actions.ImportAction$3.run(ImportAction.java:128) >> at org.openide.util.RequestProcessor$Task.run(RequestProcessor.java:572) >> [catch] at org.openide.util.RequestProcessor$Processor.run(RequestProcessor.java:997) >> >> >> How do I get the jvm to generate the right output file? >> >> Many thanks, >> Joe > > From christian.thalinger at oracle.com Wed Aug 3 09:40:45 2011 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Wed, 3 Aug 2011 18:40:45 +0200 Subject: IdealGraphVisualizer file compatibility In-Reply-To: References: <60867EAC-E2A7-4EA1-9A86-1B421C328693@oracle.com> Message-ID: You need a debug build. -- Christian On Aug 3, 2011, at 6:37 PM, Joe Kearney wrote: > Ah, thanks for the readme link. > > I can't get hotspot 1.6.0_25 or 1.7.0 to recognise the > PrintIdealGraphLevel/PrintIdealGraphFile options. 
I tried with > UnlockDiagnosticVMOptions etc as well.
to no avail. Is there something > else needed to expose this? > > Joe > > On 3 August 2011 15:51, Christian Thalinger > wrote: >> You want: -XX:PrintIdealGraphLevel=1 -XX:PrintIdealGraphFile=output.xml >> >> The README of the visualizer also helps: >> >> http://hg.openjdk.java.net/hsx/hotspot-main/hotspot/file/tip/src/share/tools/IdealGraphVisualizer/README >> >> -- Christian >> >> On Aug 3, 2011, at 4:17 PM, Joe Kearney wrote: >> >>> Hi, >>> >>> I've been trying to play with igv from >>> http://ssw.jku.at/General/Staff/TW/igv.html, >>> http://ssw.jku.at/General/Staff/PH/ but I don't know how to generate >>> the required log files. What sort of files should I expect the igv to >>> be able to read? The example files are graphDocument XMLs. I was >>> hoping to be able to generate a file with something like the >>> following: >>> >>> -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation -XX:LogFile=output.xml >>> >>> Needless to say, these hotspot_log files are totally different and the >>> igv barfs with the below. >>> >>> java.lang.NullPointerException >>> at com.sun.hotspot.igv.data.GraphDocument.addGraphDocument(GraphDocument.java:70) >>> at com.sun.hotspot.igv.coordinator.actions.ImportAction$3.run(ImportAction.java:128) >>> at org.openide.util.RequestProcessor$Task.run(RequestProcessor.java:572) >>> [catch] at org.openide.util.RequestProcessor$Processor.run(RequestProcessor.java:997) >>> >>> >>> How do I get the jvm to generate the right output file? >>> >>> Many thanks, >>> Joe >> >> From tom.rodriguez at oracle.com Wed Aug 3 09:42:05 2011 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Wed, 3 Aug 2011 09:42:05 -0700 Subject: IdealGraphVisualizer file compatibility In-Reply-To: References: <60867EAC-E2A7-4EA1-9A86-1B421C328693@oracle.com> Message-ID: It's not available in the product as it's really intended for developers. Use a fastdebug build. tom On Aug 3, 2011, at 9:37 AM, Joe Kearney wrote: > Ah, thanks for the readme link. 
> > I can't get hotspot 1.6.0_25 or 1.7.0 to recognise the > PrintIdealGraphLevel/PrintIdealGraphFile options. I tried with > UnlockDiagnosticVMOptions etc as well. to no avail. Is there something > else needed to expose this? > > Joe > > On 3 August 2011 15:51, Christian Thalinger > wrote: >> You want: -XX:PrintIdealGraphLevel=1 -XX:PrintIdealGraphFile=output.xml >> >> The README of the visualizer also helps: >> >> http://hg.openjdk.java.net/hsx/hotspot-main/hotspot/file/tip/src/share/tools/IdealGraphVisualizer/README >> >> -- Christian >> >> On Aug 3, 2011, at 4:17 PM, Joe Kearney wrote: >> >>> Hi, >>> >>> I've been trying to play with igv from >>> http://ssw.jku.at/General/Staff/TW/igv.html, >>> http://ssw.jku.at/General/Staff/PH/ but I don't know how to generate >>> the required log files. What sort of files should I expect the igv to >>> be able to read? The example files are graphDocument XMLs. I was >>> hoping to be able to generate a file with something like the >>> following: >>> >>> -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation -XX:LogFile=output.xml >>> >>> Needless to say, these hotspot_log files are totally different and the >>> igv barfs with the below. >>> >>> java.lang.NullPointerException >>> at com.sun.hotspot.igv.data.GraphDocument.addGraphDocument(GraphDocument.java:70) >>> at com.sun.hotspot.igv.coordinator.actions.ImportAction$3.run(ImportAction.java:128) >>> at org.openide.util.RequestProcessor$Task.run(RequestProcessor.java:572) >>> [catch] at org.openide.util.RequestProcessor$Processor.run(RequestProcessor.java:997) >>> >>> >>> How do I get the jvm to generate the right output file? >>> >>> Many thanks, >>> Joe >> >> From joe.j.kearney at gmail.com Wed Aug 3 10:00:26 2011 From: joe.j.kearney at gmail.com (Joe Kearney) Date: Wed, 3 Aug 2011 18:00:26 +0100 Subject: IdealGraphVisualizer file compatibility In-Reply-To: References: <60867EAC-E2A7-4EA1-9A86-1B421C328693@oracle.com> Message-ID: Oh ok, I didn't realise. Thanks. 
Are there any plans to make it more widely available? I can see it being useful for experimenting to squeeze performance. Thanks, Joe On 3 August 2011 17:42, Tom Rodriguez wrote: > It's not available in the product as it's really intended for developers. Use a fastdebug build. > > tom > > On Aug 3, 2011, at 9:37 AM, Joe Kearney wrote: > >> Ah, thanks for the readme link. >> >> I can't get hotspot 1.6.0_25 or 1.7.0 to recognise the >> PrintIdealGraphLevel/PrintIdealGraphFile options. I tried with >> UnlockDiagnosticVMOptions etc as well. to no avail. Is there something >> else needed to expose this? >> >> Joe >> >> On 3 August 2011 15:51, Christian Thalinger >> wrote: >>> You want: -XX:PrintIdealGraphLevel=1 -XX:PrintIdealGraphFile=output.xml >>> >>> The README of the visualizer also helps: >>> >>> http://hg.openjdk.java.net/hsx/hotspot-main/hotspot/file/tip/src/share/tools/IdealGraphVisualizer/README >>> >>> -- Christian >>> >>> On Aug 3, 2011, at 4:17 PM, Joe Kearney wrote: >>> >>>> Hi, >>>> >>>> I've been trying to play with igv from >>>> http://ssw.jku.at/General/Staff/TW/igv.html, >>>> http://ssw.jku.at/General/Staff/PH/ but I don't know how to generate >>>> the required log files. What sort of files should I expect the igv to >>>> be able to read? The example files are graphDocument XMLs. I was >>>> hoping to be able to generate a file with something like the >>>> following: >>>> >>>> -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation -XX:LogFile=output.xml >>>> >>>> Needless to say, these hotspot_log files are totally different and the >>>> igv barfs with the below. >>>> >>>> java.lang.NullPointerException >>>> at com.sun.hotspot.igv.data.GraphDocument.addGraphDocument(GraphDocument.java:70) >>>> at com.sun.hotspot.igv.coordinator.actions.ImportAction$3.run(ImportAction.java:128) >>>>
at org.openide.util.RequestProcessor$Task.run(RequestProcessor.java:572) >>>> [catch] at org.openide.util.RequestProcessor$Processor.run(RequestProcessor.java:997) >>>> >>>> >>>> How do I get the jvm to generate the right output file? >>>> >>>> Many thanks, >>>> Joe >>> >>> > > From igor.veresov at oracle.com Wed Aug 3 13:40:17 2011 From: igor.veresov at oracle.com (Igor Veresov) Date: Wed, 03 Aug 2011 13:40:17 -0700 Subject: review(XXS): 7060842: UseNUMA crash with UseHugreTLBFS running SPECjvm2008 Message-ID: <4E39B231.5070606@oracle.com> It seems that madvise(MADV_FREE) breaks the page reservation semantics of the underlying segment. With tight memory constraints this would cause a race for pages and a segfault if the JVM loses. The solution is to revert back to the previous implementation of os::free_memory() that used mmap(). Webrev: http://cr.openjdk.java.net/~iveresov/7060842/webrev.00/ Tested with the gc test suite. igor From vladimir.kozlov at oracle.com Thu Aug 4 18:19:42 2011 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 04 Aug 2011 18:19:42 -0700 Subject: Request for reviews (L): 7063629: use cbcond in C2 generated code on T4 Message-ID: <4E3B452E.10509@oracle.com> http://cr.openjdk.java.net/~kvn/7063629/webrev 7063629: use cbcond in C2 generated code on T4 The code is finally shaped as I want and it passed CTW, regression, nsk tests on T4 and x86. Added new fused compare and branch instructions into sparc.ad and corresponding short versions which use cbcond instruction. Added new flag avoid_back_to_back to avoid generation of cbcond back to back. Split shorten_branches() into 2 methods. First method conservatively estimates code size and branches location and does a few rounds of branch shortening. It is executed before ScheduleAndBundle(). Step 3 is moved to new method shorten_branches_final() called after ScheduleAndBundle(). It does final paddings, alignment and final branch replacement.
Method fill_buffer() does verification instead of padding. Labels are now bound only during code generation in fill_buffer(). As a result, they are not available when forward branches are emitted. To fix that, MacroAssembler branch instructions are now used in x86 .ad files. I replaced the unused rtype parameter with a maybe_short flag to force using only long branches in .ad long branch instructions. Added a check to adlc to verify that the short version of a branch instruction has the same declaration in the .ad file. Added an assert to verify that the size of an emitted instruction matches the value returned by MachNode::size(). Found that MachBreakpointNode::size() returned an incorrect value on x64. Fixed loop alignment for Sparc (min alignment should be the instruction size, which is 4 bytes, instead of 1 byte). The prototype was done by Tom and I took some of his additional fixes. The block changes go with some code in output to put opto assembly style block comments in the PrintNMethods output. There's also a snippet in there that deals with the fact that kill projections on branches make it appear that the kill occurs after the branch instead of being part of it. From christian.thalinger at oracle.com Fri Aug 5 06:26:26 2011 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Fri, 5 Aug 2011 15:26:26 +0200 Subject: review for 6990212: JSR 292 JVMTI MethodEnter hook is not called for JSR 292 bootstrap and target methods In-Reply-To: <8D08FBBE-B796-45C1-A8DC-626531ABD5C2@oracle.com> References: <6FC4D868-6EC6-4DE5-92C4-A55B42AF3CFE@oracle.com> <25A1B825-BC91-4F1E-B7B1-C8E507F8EA34@oracle.com> <839E75F4-67A3-4C3B-AD06-9985EB762357@oracle.com> <9E4737BC-6971-42B4-B9B4-C5BC9A2FCA1C@oracle.com> <8D08FBBE-B796-45C1-A8DC-626531ABD5C2@oracle.com> Message-ID: <8A9AEEB6-BB68-4D9A-A762-97C0561FC2B8@oracle.com> I really had this feeling that this change is going to break something.
Two JDK tests are failing on x86 and SPARC: FAILED: java/lang/invoke/JavaDocExamplesTest.java FAILED: java/lang/invoke/MethodHandlesTest.java It's the raise_exception path: JUnit version 4.4 .......................................E.E. Time: 1.767 There were 2 failures: 1) testInterfaceCast(test.java.lang.invoke.MethodHandlesTest) java.lang.InternalError: unexpected code -38348624: required class java.lang.Number but encountered class java.lang.String at java.lang.invoke.MethodHandleNatives.raiseException(MethodHandleNatives.java:375) at java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:566) at test.java.lang.invoke.MethodHandlesTest.testInterfaceCast(MethodHandlesTest.java:2231) at test.java.lang.invoke.MethodHandlesTest.testInterfaceCast(MethodHandlesTest.java:2208) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.junit.internal.runners.TestMethod.invoke(TestMethod.java:59) at org.junit.internal.runners.MethodRoadie.runTestMethod(MethodRoadie.java:98) at org.junit.internal.runners.MethodRoadie$2.run(MethodRoadie.java:79) at org.junit.internal.runners.MethodRoadie.runBeforesThenTestThenAfters(MethodRoadie.java:87) at org.junit.internal.runners.MethodRoadie.runTest(MethodRoadie.java:77) at org.junit.internal.runners.MethodRoadie.run(MethodRoadie.java:42) at org.junit.internal.runners.JUnit4ClassRunner.invokeTestMethod(JUnit4ClassRunner.java:88) at org.junit.internal.runners.JUnit4ClassRunner.runMethods(JUnit4ClassRunner.java:51) at org.junit.internal.runners.JUnit4ClassRunner$1.run(JUnit4ClassRunner.java:44) at org.junit.internal.runners.ClassRoadie.runUnprotected(ClassRoadie.java:27) at org.junit.internal.runners.ClassRoadie.runProtected(ClassRoadie.java:37) at 
org.junit.internal.runners.JUnit4ClassRunner.run(JUnit4ClassRunner.java:42) at org.junit.internal.runners.CompositeRunner.runChildren(CompositeRunner.java:33) at org.junit.internal.runners.CompositeRunner.run(CompositeRunner.java:28) at org.junit.runner.JUnitCore.run(JUnitCore.java:130) at org.junit.runner.JUnitCore.run(JUnitCore.java:109) at org.junit.runner.JUnitCore.run(JUnitCore.java:100) at org.junit.runner.JUnitCore.runMain(JUnitCore.java:81) at org.junit.runner.JUnitCore.main(JUnitCore.java:44) 2) testCastFailure(test.java.lang.invoke.MethodHandlesTest) java.lang.InternalError: unexpected code -38348480: required class java.lang.Integer but encountered class java.lang.String at java.lang.invoke.MethodHandleNatives.raiseException(MethodHandleNatives.java:375) at test.java.lang.invoke.MethodHandlesTest.testCastFailure(MethodHandlesTest.java:2340) at test.java.lang.invoke.MethodHandlesTest.testCastFailure(MethodHandlesTest.java:2251) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.junit.internal.runners.TestMethod.invoke(TestMethod.java:59) at org.junit.internal.runners.MethodRoadie.runTestMethod(MethodRoadie.java:98) at org.junit.internal.runners.MethodRoadie$2.run(MethodRoadie.java:79) at org.junit.internal.runners.MethodRoadie.runBeforesThenTestThenAfters(MethodRoadie.java:87) at org.junit.internal.runners.MethodRoadie.runTest(MethodRoadie.java:77) at org.junit.internal.runners.MethodRoadie.run(MethodRoadie.java:42) at org.junit.internal.runners.JUnit4ClassRunner.invokeTestMethod(JUnit4ClassRunner.java:88) at org.junit.internal.runners.JUnit4ClassRunner.runMethods(JUnit4ClassRunner.java:51) at org.junit.internal.runners.JUnit4ClassRunner$1.run(JUnit4ClassRunner.java:44) at 
org.junit.internal.runners.ClassRoadie.runUnprotected(ClassRoadie.java:27) at org.junit.internal.runners.ClassRoadie.runProtected(ClassRoadie.java:37) at org.junit.internal.runners.JUnit4ClassRunner.run(JUnit4ClassRunner.java:42) at org.junit.internal.runners.CompositeRunner.runChildren(CompositeRunner.java:33) at org.junit.internal.runners.CompositeRunner.run(CompositeRunner.java:28) at org.junit.runner.JUnitCore.run(JUnitCore.java:130) at org.junit.runner.JUnitCore.run(JUnitCore.java:109) at org.junit.runner.JUnitCore.run(JUnitCore.java:100) at org.junit.runner.JUnitCore.runMain(JUnitCore.java:81) at org.junit.runner.JUnitCore.main(JUnitCore.java:44) FAILURES!!! Tests run: 41, Failures: 2 -- Christian On Jul 14, 2011, at 9:49 PM, Tom Rodriguez wrote: > > On Jul 12, 2011, at 9:38 AM, Christian Thalinger wrote: > >> On Jul 11, 2011, at 5:43 PM, Christian Thalinger wrote: >>> On Jul 9, 2011, at 12:21 AM, Tom Rodriguez wrote: >>>> Coleen point out that it's confusing to reuse the name jump_from_interpreted since we're not really in the interpreter. I've changed it to jump_from_method_handle and left that note that it parallels jump_from_interpreted. >>> >>> This looks good. Although I'm a little worried about the raise_exception changes on SPARC. In the past I had various crashes with versions that used the interpreter stack to pass the arguments. That's why I changed it to the simpler, more reliable current version (which uses the compiler calling convention). Maybe I got adjust_SP_and_Gargs_down_by_slots right and there is no problem now. >>> >>> Just to be sure I'm currently running JRuby's benchmarks (my memory tells me that I had the crashes with these benchmarks) on two different SPARC boxes. I'll let you know when they are finished. >> >> Sorry, it took a little longer to run them because one of the benchmarks (bench_full_load_path.rb) does not finish (it hangs around doing nothing). Anyway, all others look good. > > Thanks. 
I fixed the interp_only check to look more like the original code and reran the mlvm tests and they all look good. > > tom > >> >> -- Christian > From christian.thalinger at oracle.com Fri Aug 5 06:32:14 2011 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Fri, 5 Aug 2011 15:32:14 +0200 Subject: Request for review (L): 7071653: JSR 292: call site change notification should be pushed not pulled Message-ID: <34EE7AEC-FD11-4526-B49D-DCEA296E767A@oracle.com> http://cr.openjdk.java.net/~twisti/7071653 7071653: JSR 292: call site change notification should be pushed not pulled Reviewed-by: Currently every speculatively inlined method handle call site has a guard that compares the current target of the CallSite object to the inlined one. This per-invocation overhead can be removed if the notification is changed from pulled to pushed (i.e. deoptimization). I had to change the logic in TemplateTable::patch_bytecode to skip bytecode quickening for putfield instructions when the put_code written to the constant pool cache is zero. This is required so that every execution of a putfield to CallSite.target calls out to InterpreterRuntime::resolve_get_put to do the deoptimization of depending compiled methods. I also had to change the dependency machinery to understand other dependencies than class hierarchy ones. DepChange got the super-type of two new dependencies, KlassDepChange and CallSiteDepChange. Tested with JRuby tests and benchmarks, hand-written testcases, JDK tests and vm.mlvm tests. Here is the speedup for the JRuby fib benchmark (first is JDK 7 b147, second with 7071653). 
Since the CallSite targets don't change during the runtime of this benchmark we can see the performance benefit of eliminating the guard: $ jruby --server bench/bench_fib_recursive.rb 5 35 0.883000 0.000000 0.883000 ( 0.854000) 0.715000 0.000000 0.715000 ( 0.715000) 0.712000 0.000000 0.712000 ( 0.712000) 0.713000 0.000000 0.713000 ( 0.713000) 0.713000 0.000000 0.713000 ( 0.712000) $ jruby --server bench/bench_fib_recursive.rb 5 35 0.772000 0.000000 0.772000 ( 0.742000) 0.624000 0.000000 0.624000 ( 0.624000) 0.621000 0.000000 0.621000 ( 0.621000) 0.622000 0.000000 0.622000 ( 0.622000) 0.622000 0.000000 0.622000 ( 0.621000) From tom.rodriguez at oracle.com Fri Aug 5 09:48:28 2011 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Fri, 5 Aug 2011 09:48:28 -0700 Subject: review for 6990212: JSR 292 JVMTI MethodEnter hook is not called for JSR 292 bootstrap and target methods In-Reply-To: <8A9AEEB6-BB68-4D9A-A762-97C0561FC2B8@oracle.com> References: <6FC4D868-6EC6-4DE5-92C4-A55B42AF3CFE@oracle.com> <25A1B825-BC91-4F1E-B7B1-C8E507F8EA34@oracle.com> <839E75F4-67A3-4C3B-AD06-9985EB762357@oracle.com> <9E4737BC-6971-42B4-B9B4-C5BC9A2FCA1C@oracle.com> <8D08FBBE-B796-45C1-A8DC-626531ABD5C2@oracle.com> <8A9AEEB6-BB68-4D9A-A762-97C0561FC2B8@oracle.com> Message-ID: <2115F2F0-1A22-46CB-9ACE-DB1B404A4853@oracle.com> Yeah vladimir reported something similar to me last night. I'm looking at it. tom On Aug 5, 2011, at 6:26 AM, Christian Thalinger wrote: > I really had this feeling that this change is going to break something. Two JDK tests are failing on x86 and SPARC: > > FAILED: java/lang/invoke/JavaDocExamplesTest.java > FAILED: java/lang/invoke/MethodHandlesTest.java > > It's the raise_exception path: > > JUnit version 4.4 > .......................................E.E. 
> Time: 1.767 > There were 2 failures: > 1) testInterfaceCast(test.java.lang.invoke.MethodHandlesTest) > java.lang.InternalError: unexpected code -38348624: required class java.lang.Number but encountered class java.lang.String > at java.lang.invoke.MethodHandleNatives.raiseException(MethodHandleNatives.java:375) > at java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:566) > at test.java.lang.invoke.MethodHandlesTest.testInterfaceCast(MethodHandlesTest.java:2231) > at test.java.lang.invoke.MethodHandlesTest.testInterfaceCast(MethodHandlesTest.java:2208) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:601) > at org.junit.internal.runners.TestMethod.invoke(TestMethod.java:59) > at org.junit.internal.runners.MethodRoadie.runTestMethod(MethodRoadie.java:98) > at org.junit.internal.runners.MethodRoadie$2.run(MethodRoadie.java:79) > at org.junit.internal.runners.MethodRoadie.runBeforesThenTestThenAfters(MethodRoadie.java:87) > at org.junit.internal.runners.MethodRoadie.runTest(MethodRoadie.java:77) > at org.junit.internal.runners.MethodRoadie.run(MethodRoadie.java:42) > at org.junit.internal.runners.JUnit4ClassRunner.invokeTestMethod(JUnit4ClassRunner.java:88) > at org.junit.internal.runners.JUnit4ClassRunner.runMethods(JUnit4ClassRunner.java:51) > at org.junit.internal.runners.JUnit4ClassRunner$1.run(JUnit4ClassRunner.java:44) > at org.junit.internal.runners.ClassRoadie.runUnprotected(ClassRoadie.java:27) > at org.junit.internal.runners.ClassRoadie.runProtected(ClassRoadie.java:37) > at org.junit.internal.runners.JUnit4ClassRunner.run(JUnit4ClassRunner.java:42) > at org.junit.internal.runners.CompositeRunner.runChildren(CompositeRunner.java:33) > at 
org.junit.internal.runners.CompositeRunner.run(CompositeRunner.java:28) > at org.junit.runner.JUnitCore.run(JUnitCore.java:130) > at org.junit.runner.JUnitCore.run(JUnitCore.java:109) > at org.junit.runner.JUnitCore.run(JUnitCore.java:100) > at org.junit.runner.JUnitCore.runMain(JUnitCore.java:81) > at org.junit.runner.JUnitCore.main(JUnitCore.java:44) > 2) testCastFailure(test.java.lang.invoke.MethodHandlesTest) > java.lang.InternalError: unexpected code -38348480: required class java.lang.Integer but encountered class java.lang.String > at java.lang.invoke.MethodHandleNatives.raiseException(MethodHandleNatives.java:375) > at test.java.lang.invoke.MethodHandlesTest.testCastFailure(MethodHandlesTest.java:2340) > at test.java.lang.invoke.MethodHandlesTest.testCastFailure(MethodHandlesTest.java:2251) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:601) > at org.junit.internal.runners.TestMethod.invoke(TestMethod.java:59) > at org.junit.internal.runners.MethodRoadie.runTestMethod(MethodRoadie.java:98) > at org.junit.internal.runners.MethodRoadie$2.run(MethodRoadie.java:79) > at org.junit.internal.runners.MethodRoadie.runBeforesThenTestThenAfters(MethodRoadie.java:87) > at org.junit.internal.runners.MethodRoadie.runTest(MethodRoadie.java:77) > at org.junit.internal.runners.MethodRoadie.run(MethodRoadie.java:42) > at org.junit.internal.runners.JUnit4ClassRunner.invokeTestMethod(JUnit4ClassRunner.java:88) > at org.junit.internal.runners.JUnit4ClassRunner.runMethods(JUnit4ClassRunner.java:51) > at org.junit.internal.runners.JUnit4ClassRunner$1.run(JUnit4ClassRunner.java:44) > at org.junit.internal.runners.ClassRoadie.runUnprotected(ClassRoadie.java:27) > at org.junit.internal.runners.ClassRoadie.runProtected(ClassRoadie.java:37) > at 
org.junit.internal.runners.JUnit4ClassRunner.run(JUnit4ClassRunner.java:42) > at org.junit.internal.runners.CompositeRunner.runChildren(CompositeRunner.java:33) > at org.junit.internal.runners.CompositeRunner.run(CompositeRunner.java:28) > at org.junit.runner.JUnitCore.run(JUnitCore.java:130) > at org.junit.runner.JUnitCore.run(JUnitCore.java:109) > at org.junit.runner.JUnitCore.run(JUnitCore.java:100) > at org.junit.runner.JUnitCore.runMain(JUnitCore.java:81) > at org.junit.runner.JUnitCore.main(JUnitCore.java:44) > > FAILURES!!! > Tests run: 41, Failures: 2 > > -- Christian > > On Jul 14, 2011, at 9:49 PM, Tom Rodriguez wrote: > >> >> On Jul 12, 2011, at 9:38 AM, Christian Thalinger wrote: >> >>> On Jul 11, 2011, at 5:43 PM, Christian Thalinger wrote: >>>> On Jul 9, 2011, at 12:21 AM, Tom Rodriguez wrote: >>>>> Coleen point out that it's confusing to reuse the name jump_from_interpreted since we're not really in the interpreter. I've changed it to jump_from_method_handle and left that note that it parallels jump_from_interpreted. >>>> >>>> This looks good. Although I'm a little worried about the raise_exception changes on SPARC. In the past I had various crashes with versions that used the interpreter stack to pass the arguments. That's why I changed it to the simpler, more reliable current version (which uses the compiler calling convention). Maybe I got adjust_SP_and_Gargs_down_by_slots right and there is no problem now. >>>> >>>> Just to be sure I'm currently running JRuby's benchmarks (my memory tells me that I had the crashes with these benchmarks) on two different SPARC boxes. I'll let you know when they are finished. >>> >>> Sorry, it took a little longer to run them because one of the benchmarks (bench_full_load_path.rb) does not finish (it hangs around doing nothing). Anyway, all others look good. >> >> Thanks. I fixed the interp_only check to look more like the original code and reran the mlvm tests and they all look good. 
>> >> tom >> >>> >>> -- Christian >> > From forax at univ-mlv.fr Fri Aug 5 10:19:56 2011 From: forax at univ-mlv.fr (Rémi Forax) Date: Fri, 05 Aug 2011 19:19:56 +0200 Subject: Request for review (L): 7071653: JSR 292: call site change notification should be pushed not pulled In-Reply-To: <34EE7AEC-FD11-4526-B49D-DCEA296E767A@oracle.com> References: <34EE7AEC-FD11-4526-B49D-DCEA296E767A@oracle.com> Message-ID: <4E3C263C.50604@univ-mlv.fr> Cool :) Rémi On 08/05/2011 03:32 PM, Christian Thalinger wrote: > http://cr.openjdk.java.net/~twisti/7071653 > > 7071653: JSR 292: call site change notification should be pushed not pulled > Reviewed-by: > > Currently every speculatively inlined method handle call site has a > guard that compares the current target of the CallSite object to the > inlined one. This per-invocation overhead can be removed if the > notification is changed from pulled to pushed (i.e. deoptimization). > > I had to change the logic in TemplateTable::patch_bytecode to skip > bytecode quickening for putfield instructions when the put_code > written to the constant pool cache is zero. This is required so that > every execution of a putfield to CallSite.target calls out to > InterpreterRuntime::resolve_get_put to do the deoptimization of > depending compiled methods. > > I also had to change the dependency machinery to understand other > dependencies than class hierarchy ones. DepChange got the super-type > of two new dependencies, KlassDepChange and CallSiteDepChange. > > Tested with JRuby tests and benchmarks, hand-written testcases, JDK > tests and vm.mlvm tests. > > Here is the speedup for the JRuby fib benchmark (first is JDK 7 b147, > second with 7071653).
Since the CallSite targets don't change during > the runtime of this benchmark we can see the performance benefit of > eliminating the guard: > > $ jruby --server bench/bench_fib_recursive.rb 5 35 > 0.883000 0.000000 0.883000 ( 0.854000) > 0.715000 0.000000 0.715000 ( 0.715000) > 0.712000 0.000000 0.712000 ( 0.712000) > 0.713000 0.000000 0.713000 ( 0.713000) > 0.713000 0.000000 0.713000 ( 0.712000) > > $ jruby --server bench/bench_fib_recursive.rb 5 35 > 0.772000 0.000000 0.772000 ( 0.742000) > 0.624000 0.000000 0.624000 ( 0.624000) > 0.621000 0.000000 0.621000 ( 0.621000) > 0.622000 0.000000 0.622000 ( 0.622000) > 0.622000 0.000000 0.622000 ( 0.621000) > From tom.rodriguez at oracle.com Fri Aug 5 13:22:37 2011 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Fri, 5 Aug 2011 13:22:37 -0700 Subject: review for 7075623: 6990212 broke raiseException in 64 bit Message-ID: http://cr.openjdk.java.net/~never/7075623 3 lines changed: 0 ins; 0 del; 3 mod; 4699 unchg 7075623: 6990212 broke raiseException in 64 bit Reviewed-by: The fix for 6990212 included making the raiseException path do a normal dispatch instead of always using the compiler entry. The assembly for 64 bit had a few issues. On x86 the saved sp register is wrong which causes rarg0_code to be killed. On sparc the code should be passed as an int instead of a ptr which causes problems because of endianness. I also modified the x86 code to do the same. Tested with original regression test on sparc/x86 32/64 -Xcomp/-Xmixed. I also reran the failing JDK regression tests. 
From headius at headius.com Fri Aug 5 14:26:27 2011 From: headius at headius.com (Charles Oliver Nutter) Date: Fri, 5 Aug 2011 16:26:27 -0500 Subject: Request for review (L): 7071653: JSR 292: call site change notification should be pushed not pulled In-Reply-To: <4E3C263C.50604@univ-mlv.fr> References: <34EE7AEC-FD11-4526-B49D-DCEA296E767A@oracle.com> <4E3C263C.50604@univ-mlv.fr> Message-ID: <37CFE89F-7AC8-4A37-979A-F7EF4B06745B@headius.com> I concur! I can't wait to see the new asm with recent fixes! - Charlie (mobile) On Aug 5, 2011, at 12:19, Rémi Forax wrote: > Cool :) > > Rémi > > On 08/05/2011 03:32 PM, Christian Thalinger wrote: >> http://cr.openjdk.java.net/~twisti/7071653 >> >> 7071653: JSR 292: call site change notification should be pushed not pulled >> Reviewed-by: >> >> Currently every speculatively inlined method handle call site has a >> guard that compares the current target of the CallSite object to the >> inlined one. This per-invocation overhead can be removed if the >> notification is changed from pulled to pushed (i.e. deoptimization). >> >> I had to change the logic in TemplateTable::patch_bytecode to skip >> bytecode quickening for putfield instructions when the put_code >> written to the constant pool cache is zero. This is required so that >> every execution of a putfield to CallSite.target calls out to >> InterpreterRuntime::resolve_get_put to do the deoptimization of >> depending compiled methods. >> >> I also had to change the dependency machinery to understand other >> dependencies than class hierarchy ones. DepChange got the super-type >> of two new dependencies, KlassDepChange and CallSiteDepChange. >> >> Tested with JRuby tests and benchmarks, hand-written testcases, JDK >> tests and vm.mlvm tests. >> >> Here is the speedup for the JRuby fib benchmark (first is JDK 7 b147, >> second with 7071653).
Since the CallSite targets don't change during >> the runtime of this benchmark we can see the performance benefit of >> eliminating the guard: >> >> $ jruby --server bench/bench_fib_recursive.rb 5 35 >> 0.883000 0.000000 0.883000 ( 0.854000) >> 0.715000 0.000000 0.715000 ( 0.715000) >> 0.712000 0.000000 0.712000 ( 0.712000) >> 0.713000 0.000000 0.713000 ( 0.713000) >> 0.713000 0.000000 0.713000 ( 0.712000) >> >> $ jruby --server bench/bench_fib_recursive.rb 5 35 >> 0.772000 0.000000 0.772000 ( 0.742000) >> 0.624000 0.000000 0.624000 ( 0.624000) >> 0.621000 0.000000 0.621000 ( 0.621000) >> 0.622000 0.000000 0.622000 ( 0.622000) >> 0.622000 0.000000 0.622000 ( 0.621000) >> > From vladimir.kozlov at oracle.com Sat Aug 6 10:50:44 2011 From: vladimir.kozlov at oracle.com (vladimir.kozlov at oracle.com) Date: Sat, 06 Aug 2011 17:50:44 +0000 Subject: hg: hsx/hotspot-comp/hotspot: 7075559: JPRT windows_x64 build failure Message-ID: <20110806175046.BF9DB479AD@hg.openjdk.java.net> Changeset: 4aa5974a06dd Author: kvn Date: 2011-08-06 08:28 -0700 URL: http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/4aa5974a06dd 7075559: JPRT windows_x64 build failure Summary: use SA_CLASSDIR variable instead of directory saclasses. Reviewed-by: kamg, dcubed ! make/linux/makefiles/defs.make ! make/solaris/makefiles/defs.make ! make/solaris/makefiles/saproc.make ! make/windows/makefiles/defs.make ! make/windows/makefiles/sa.make From vladimir.kozlov at oracle.com Sun Aug 7 15:35:29 2011 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Sun, 07 Aug 2011 15:35:29 -0700 Subject: Request for review (L): 7071653: JSR 292: call site change notification should be pushed not pulled In-Reply-To: <34EE7AEC-FD11-4526-B49D-DCEA296E767A@oracle.com> References: <34EE7AEC-FD11-4526-B49D-DCEA296E767A@oracle.com> Message-ID: <4E3F1331.2000909@oracle.com> Christian, You need to add a big comment to the new code in templateTable_.cpp explaining what it does and why.
Why on sparc you use ld_ptr() to load from cache but on X86 and X64 you use movl() (only 32 bit)? Add assert(byte_no == -1, ) to default: case to make sure you got all cases above it. I am concern about using next short branch in new code in templateTable_sparc.cpp: cmp_and_br_short(..., L_patch_done); // don't patch There is __ stop() call which generates a lot of code so that label L_patch_done could be far. Why you added new #include into ciEnv.cpp and nmethod.cpp, what code needs it? Nothing else is changed in these files. I don't like assignments in condition and implicit NULL checks. Can you change check_dependency() to next?: klassOop check_dependency() { klassOop result = check_klass_dependency(NULL); if (result != NULL) return result; return check_call_site_dependency(NULL); } In interpreterRuntime.cpp initialize marked: int marked = 0; Why you did not leave "volatile" call site inlining with guard? You did not explain why virtual call is fine for it. Vladimir On 8/5/11 6:32 AM, Christian Thalinger wrote: > http://cr.openjdk.java.net/~twisti/7071653 > > 7071653: JSR 292: call site change notification should be pushed not pulled > Reviewed-by: > > Currently every speculatively inlined method handle call site has a > guard that compares the current target of the CallSite object to the > inlined one. This per-invocation overhead can be removed if the > notification is changed from pulled to pushed (i.e. deoptimization). > > I had to change the logic in TemplateTable::patch_bytecode to skip > bytecode quickening for putfield instructions when the put_code > written to the constant pool cache is zero. This is required so that > every execution of a putfield to CallSite.target calls out to > InterpreterRuntime::resolve_get_put to do the deoptimization of > depending compiled methods. > > I also had to change the dependency machinery to understand other > dependencies than class hierarchy ones. 
DepChange got the super-type > of two new dependencies, KlassDepChange and CallSiteDepChange. > > Tested with JRuby tests and benchmarks, hand-written testcases, JDK > tests and vm.mlvm tests. > > Here is the speedup for the JRuby fib benchmark (first is JDK 7 b147, > second with 7071653). Since the CallSite targets don't change during > the runtime of this benchmark we can see the performance benefit of > eliminating the guard: > > $ jruby --server bench/bench_fib_recursive.rb 5 35 > 0.883000 0.000000 0.883000 ( 0.854000) > 0.715000 0.000000 0.715000 ( 0.715000) > 0.712000 0.000000 0.712000 ( 0.712000) > 0.713000 0.000000 0.713000 ( 0.713000) > 0.713000 0.000000 0.713000 ( 0.712000) > > $ jruby --server bench/bench_fib_recursive.rb 5 35 > 0.772000 0.000000 0.772000 ( 0.742000) > 0.624000 0.000000 0.624000 ( 0.624000) > 0.621000 0.000000 0.621000 ( 0.621000) > 0.622000 0.000000 0.622000 ( 0.622000) > 0.622000 0.000000 0.622000 ( 0.621000) > From christian.thalinger at oracle.com Mon Aug 8 01:34:50 2011 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Mon, 8 Aug 2011 10:34:50 +0200 Subject: review for 7075623: 6990212 broke raiseException in 64 bit In-Reply-To: References: Message-ID: <98787BDF-3C3A-45AF-B2D7-CDB4763E8D0D@oracle.com> Looks good. -- Christian On Aug 5, 2011, at 10:22 PM, Tom Rodriguez wrote: > http://cr.openjdk.java.net/~never/7075623 > 3 lines changed: 0 ins; 0 del; 3 mod; 4699 unchg > > 7075623: 6990212 broke raiseException in 64 bit > Reviewed-by: > > The fix for 6990212 included making the raiseException path do a > normal dispatch instead of always using the compiler entry. The > assembly for 64 bit had a few issues. On x86 the saved sp register is > wrong which causes rarg0_code to be killed. On sparc the code should > be passed as an int instead of a ptr which causes problems because of > endianness. I also modified the x86 code to do the same. Tested with > original regression test on sparc/x86 32/64 -Xcomp/-Xmixed. 
I also > reran the failing JDK regression tests. > From gbenson at redhat.com Mon Aug 8 03:25:22 2011 From: gbenson at redhat.com (Gary Benson) Date: Mon, 8 Aug 2011 11:25:22 +0100 Subject: Review Request: zero/shark doesn't build after b147-fcs In-Reply-To: <7ADFDF69-ADDA-4B24-8F78-82D52F46FD2B@oracle.com> References: <4E1C5E4F.1080307@zafena.se> <4E2049CA.8060506@LGonQn.Org> <20110715145127.GA3311@redhat.com> <7ADFDF69-ADDA-4B24-8F78-82D52F46FD2B@oracle.com> Message-ID: <20110808102522.GB2761@redhat.com> Christian Thalinger wrote: > On Jul 15, 2011, at 4:51 PM, Gary Benson wrote: > > Chris Phillips wrote: > > > http://lgonqn.org/temp/ChrisPhi/webrev-sharkContext.hpp-typo-in-assert/ > > > > Nice catch :) > > > > > http://lgonqn.org/temp/ChrisPhi/webrev-methodHandles_zero.hpp-missing/ > > > > You could probably make adapter_code_size be 0, or something small > > 1ike 1*k. Nothing will presumably be generated into these buffers > > after all? > > Gary, can I add you as a reviewer? -- Christian Sure. Sorry for the delay in replying, I was on PTO. Thanks, Gary -- http://gbenson.net/ From christian.thalinger at oracle.com Mon Aug 8 06:56:00 2011 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Mon, 8 Aug 2011 15:56:00 +0200 Subject: Request for review (L): 7071653: JSR 292: call site change notification should be pushed not pulled In-Reply-To: <4E3F1331.2000909@oracle.com> References: <34EE7AEC-FD11-4526-B49D-DCEA296E767A@oracle.com> <4E3F1331.2000909@oracle.com> Message-ID: On Aug 8, 2011, at 12:35 AM, Vladimir Kozlov wrote: > Christian, > > You need to add big comment to the new code in templateTable_.cpp explaining what it does and why. Done. I made the wording a little more general because Tom's effectively final work might use the same machinery. > > Why on sparc you use ld_ptr() to load from cache but on X86 and X64 you use movl() (only 32 bit)? Good question. 
I took the code from TemplateTable::resolve_cache_and_index without thinking about it and that one uses ld_ptr. _indices in CosntantPoolCacheEntry is defined as intx: volatile intx _indices; // constant pool index & rewrite bytecodes and bytecode 1 and 2 are in the upper 16-bit of the lower 32-bit word: // bit number |31 0| // bit length |-8--|-8--|---16----| // -------------------------------- // _indices [ b2 | b1 | index ] Loading 32-bit on LE gives you the right bits but on BE it does not. I think that's the reason for the "optimization" on x64. > > Add assert(byte_no == -1, ) to default: case to make sure you got all cases above it. Done. > > I am concern about using next short branch in new code in templateTable_sparc.cpp: > > cmp_and_br_short(..., L_patch_done); // don't patch > > There is __ stop() call which generates a lot of code so that label L_patch_done could be far. Yeah, I thought I give it a try if it works. cmp_and_br_short should assert if the branch displacement is too far, right? > > > Why you added new #include into ciEnv.cpp and nmethod.cpp, what code needs it? Nothing else is changed in these files. Both files use dependencies and I got linkage errors on Linux while working on the fix (because of inline methods). It seems that the include is not required in ciEnv.cpp because ciEnv.hpp already includes it. I missed that. But nmethod.cpp needs it because nmethod.hpp only declares class Dependencies. > > I don't like assignments in condition and implicit NULL checks. Can you change check_dependency() to next?: > > klassOop check_dependency() { > klassOop result = check_klass_dependency(NULL); > if (result != NULL) return result; > return check_call_site_dependency(NULL); > } Done. > > In interpreterRuntime.cpp initialize marked: int marked = 0; OK. > > Why you did not leave "volatile" call site inlining with guard? You did not explain why virtual call is fine for it. 
The spec of MutableCallSite says: "For target values which will be frequently updated, consider using a volatile call site instead." And VolatileCallSite says: "A VolatileCallSite is a CallSite whose target acts like a volatile variable. An invokedynamic instruction linked to a VolatileCallSite sees updates to its call site target immediately, even if the update occurs in another thread. There may be a performance penalty for such tight coupling between threads. Unlike MutableCallSite, there is no syncAll operation on volatile call sites, since every write to a volatile variable is implicitly synchronized with reader threads. In other respects, a VolatileCallSite is interchangeable with MutableCallSite." Since VolatileCallSite really should only be used when you know the target changes very often we don't do optimizations for this case. Obviously this is just a guess how people will use VolatileCallSite but I think for now this is a safe bet. Additionally I had to do two small changes because the build was broken on some configurations: - klassOop new_type = _changes.is_klass_change() ? _changes.as_klass_change()->new_type() : NULL; + klassOop new_type = _changes.is_klass_change() ? _changes.as_klass_change()->new_type() : (klassOop) NULL; and - MutexLockerEx ccl(CodeCache_lock, thread); + MutexLockerEx ccl(CodeCache_lock, Mutex::_no_safepoint_check_flag); I updated the webrev. -- Christian > > > Vladimir > > On 8/5/11 6:32 AM, Christian Thalinger wrote: >> http://cr.openjdk.java.net/~twisti/7071653 >> >> 7071653: JSR 292: call site change notification should be pushed not pulled >> Reviewed-by: >> >> Currently every speculatively inlined method handle call site has a >> guard that compares the current target of the CallSite object to the >> inlined one. This per-invocation overhead can be removed if the >> notification is changed from pulled to pushed (i.e. deoptimization). 
>> >> I had to change the logic in TemplateTable::patch_bytecode to skip >> bytecode quickening for putfield instructions when the put_code >> written to the constant pool cache is zero. This is required so that >> every execution of a putfield to CallSite.target calls out to >> InterpreterRuntime::resolve_get_put to do the deoptimization of >> depending compiled methods. >> >> I also had to change the dependency machinery to understand other >> dependencies than class hierarchy ones. DepChange got the super-type >> of two new dependencies, KlassDepChange and CallSiteDepChange. >> >> Tested with JRuby tests and benchmarks, hand-written testcases, JDK >> tests and vm.mlvm tests. >> >> Here is the speedup for the JRuby fib benchmark (first is JDK 7 b147, >> second with 7071653). Since the CallSite targets don't change during >> the runtime of this benchmark we can see the performance benefit of >> eliminating the guard: >> >> $ jruby --server bench/bench_fib_recursive.rb 5 35 >> 0.883000 0.000000 0.883000 ( 0.854000) >> 0.715000 0.000000 0.715000 ( 0.715000) >> 0.712000 0.000000 0.712000 ( 0.712000) >> 0.713000 0.000000 0.713000 ( 0.713000) >> 0.713000 0.000000 0.713000 ( 0.712000) >> >> $ jruby --server bench/bench_fib_recursive.rb 5 35 >> 0.772000 0.000000 0.772000 ( 0.742000) >> 0.624000 0.000000 0.624000 ( 0.624000) >> 0.621000 0.000000 0.621000 ( 0.621000) >> 0.622000 0.000000 0.622000 ( 0.622000) >> 0.622000 0.000000 0.622000 ( 0.621000) >> From vladimir.kozlov at oracle.com Mon Aug 8 07:55:32 2011 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 08 Aug 2011 07:55:32 -0700 Subject: Request for review (L): 7071653: JSR 292: call site change notification should be pushed not pulled In-Reply-To: References: <34EE7AEC-FD11-4526-B49D-DCEA296E767A@oracle.com> <4E3F1331.2000909@oracle.com> Message-ID: <4E3FF8E4.2070302@oracle.com> Christian, Should we put "skip bytecode quickening" code under flag to do this only when invoke dynamic is enabled? 
Or put_code is zero only in invoke dynamic case? On 8/8/11 6:56 AM, Christian Thalinger wrote: >> Why on sparc you use ld_ptr() to load from cache but on X86 and X64 you use movl() (only 32 bit)? > > Good question. I took the code from TemplateTable::resolve_cache_and_index without thinking about it and that one uses ld_ptr. > > _indices in CosntantPoolCacheEntry is defined as intx: > > volatile intx _indices; // constant pool index& rewrite bytecodes > > and bytecode 1 and 2 are in the upper 16-bit of the lower 32-bit word: > > // bit number |31 0| > // bit length |-8--|-8--|---16----| > // -------------------------------- > // _indices [ b2 | b1 | index ] > > Loading 32-bit on LE gives you the right bits but on BE it does not. I think that's the reason for the "optimization" on x64. I don't like this "optimization" but I understand why we using it. Add a comment (especially in x64 file). >> >> I am concern about using next short branch in new code in templateTable_sparc.cpp: >> >> cmp_and_br_short(..., L_patch_done); // don't patch >> >> There is __ stop() call which generates a lot of code so that label L_patch_done could be far. > > Yeah, I thought I give it a try if it works. cmp_and_br_short should assert if the branch displacement is too far, right? > Yes, it will assert but may be only in some worst case which we do not test. For example, try to run 64 bit fastdebug VM on Sparc + compressed oops + VerifyOops. >> >> >> Why you added new #include into ciEnv.cpp and nmethod.cpp, what code needs it? Nothing else is changed in these files. > > Both files use dependencies and I got linkage errors on Linux while working on the fix (because of inline methods). It seems that the include is not required in ciEnv.cpp because ciEnv.hpp already includes it. I missed that. But nmethod.cpp needs it because nmethod.hpp only declares class Dependencies. > OK. > >> >> Why you did not leave "volatile" call site inlining with guard? 
You did not explain why virtual call is fine for it. > > The spec of MutableCallSite says: > > "For target values which will be frequently updated, consider using a volatile call site instead." > > And VolatileCallSite says: > > "A VolatileCallSite is a CallSite whose target acts like a volatile variable. An invokedynamic instruction linked to a VolatileCallSite sees updates to its call site target immediately, even if the update occurs in another thread. There may be a performance penalty for such tight coupling between threads. > > Unlike MutableCallSite, there is no syncAll operation on volatile call sites, since every write to a volatile variable is implicitly synchronized with reader threads. > > In other respects, a VolatileCallSite is interchangeable with MutableCallSite." > > Since VolatileCallSite really should only be used when you know the target changes very often we don't do optimizations for this case. Obviously this is just a guess how people will use VolatileCallSite but I think for now this is a safe bet. > Thank you for explaining it. > Additionally I had to do two small changes because the build was broken on some configurations: > > - klassOop new_type = _changes.is_klass_change() ? _changes.as_klass_change()->new_type() : NULL; > + klassOop new_type = _changes.is_klass_change() ? _changes.as_klass_change()->new_type() : (klassOop) NULL; > > and > > - MutexLockerEx ccl(CodeCache_lock, thread); > + MutexLockerEx ccl(CodeCache_lock, Mutex::_no_safepoint_check_flag); > > I updated the webrev. Good. Vladimir > > -- Christian > >> >> >> Vladimir >> >> On 8/5/11 6:32 AM, Christian Thalinger wrote: >>> http://cr.openjdk.java.net/~twisti/7071653 >>> >>> 7071653: JSR 292: call site change notification should be pushed not pulled >>> Reviewed-by: >>> >>> Currently every speculatively inlined method handle call site has a >>> guard that compares the current target of the CallSite object to the >>> inlined one. 
This per-invocation overhead can be removed if the >>> notification is changed from pulled to pushed (i.e. deoptimization). >>> >>> I had to change the logic in TemplateTable::patch_bytecode to skip >>> bytecode quickening for putfield instructions when the put_code >>> written to the constant pool cache is zero. This is required so that >>> every execution of a putfield to CallSite.target calls out to >>> InterpreterRuntime::resolve_get_put to do the deoptimization of >>> depending compiled methods. >>> >>> I also had to change the dependency machinery to understand other >>> dependencies than class hierarchy ones. DepChange got the super-type >>> of two new dependencies, KlassDepChange and CallSiteDepChange. >>> >>> Tested with JRuby tests and benchmarks, hand-written testcases, JDK >>> tests and vm.mlvm tests. >>> >>> Here is the speedup for the JRuby fib benchmark (first is JDK 7 b147, >>> second with 7071653). Since the CallSite targets don't change during >>> the runtime of this benchmark we can see the performance benefit of >>> eliminating the guard: >>> >>> $ jruby --server bench/bench_fib_recursive.rb 5 35 >>> 0.883000 0.000000 0.883000 ( 0.854000) >>> 0.715000 0.000000 0.715000 ( 0.715000) >>> 0.712000 0.000000 0.712000 ( 0.712000) >>> 0.713000 0.000000 0.713000 ( 0.713000) >>> 0.713000 0.000000 0.713000 ( 0.712000) >>> >>> $ jruby --server bench/bench_fib_recursive.rb 5 35 >>> 0.772000 0.000000 0.772000 ( 0.742000) >>> 0.624000 0.000000 0.624000 ( 0.624000) >>> 0.621000 0.000000 0.621000 ( 0.621000) >>> 0.622000 0.000000 0.622000 ( 0.622000) >>> 0.622000 0.000000 0.622000 ( 0.621000) >>> > From christian.thalinger at oracle.com Mon Aug 8 09:40:46 2011 From: christian.thalinger at oracle.com (christian.thalinger at oracle.com) Date: Mon, 08 Aug 2011 16:40:46 +0000 Subject: hg: hsx/hotspot-comp/hotspot: 7071823: Zero: zero/shark doesn't build after b147-fcs Message-ID: <20110808164048.9272747A1C@hg.openjdk.java.net> Changeset: a3142bdb6707 Author: twisti 
Date: 2011-08-08 05:49 -0700 URL: http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/a3142bdb6707 7071823: Zero: zero/shark doesn't build after b147-fcs Reviewed-by: gbenson, twisti Contributed-by: Chris Phillips ! src/cpu/zero/vm/frame_zero.cpp + src/cpu/zero/vm/methodHandles_zero.hpp ! src/cpu/zero/vm/sharedRuntime_zero.cpp ! src/share/vm/shark/sharkContext.hpp From christian.thalinger at oracle.com Mon Aug 8 11:12:06 2011 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Mon, 8 Aug 2011 20:12:06 +0200 Subject: Request for review (L): 7071653: JSR 292: call site change notification should be pushed not pulled In-Reply-To: <4E3FF8E4.2070302@oracle.com> References: <34EE7AEC-FD11-4526-B49D-DCEA296E767A@oracle.com> <4E3F1331.2000909@oracle.com> <4E3FF8E4.2070302@oracle.com> Message-ID: On Aug 8, 2011, at 4:55 PM, Vladimir Kozlov wrote: > Christian, > > Should we put "skip bytecode quickening" code under flag to do this only when invoke dynamic is enabled? Or put_code is zero only in invoke dynamic case? No, it doesn't buy us anything. The new checking code is only executed the first time as the bytecodes are quickened right after that. And in the case where a putfield isn't quickened and we call resolve_get_put it gets very expensive anyway. > > On 8/8/11 6:56 AM, Christian Thalinger wrote: >>> Why on sparc you use ld_ptr() to load from cache but on X86 and X64 you use movl() (only 32 bit)? >> >> Good question. I took the code from TemplateTable::resolve_cache_and_index without thinking about it and that one uses ld_ptr. >> >> _indices in CosntantPoolCacheEntry is defined as intx: >> >> volatile intx _indices; // constant pool index& rewrite bytecodes >> >> and bytecode 1 and 2 are in the upper 16-bit of the lower 32-bit word: >> >> // bit number |31 0| >> // bit length |-8--|-8--|---16----| >> // -------------------------------- >> // _indices [ b2 | b1 | index ] >> >> Loading 32-bit on LE gives you the right bits but on BE it does not. 
I think that's the reason for the "optimization" on x64. > > I don't like this "optimization" but I understand why we using it. Add a comment (especially in x64 file). I factored reading the bytecode into InterpreterMacroAssembler::get_cache_and_index_and_bytecode_at_bcp since the same code is used twice in TemplateTable and added the comment there. > >>> >>> I am concern about using next short branch in new code in templateTable_sparc.cpp: >>> >>> cmp_and_br_short(..., L_patch_done); // don't patch >>> >>> There is __ stop() call which generates a lot of code so that label L_patch_done could be far. >> >> Yeah, I thought I give it a try if it works. cmp_and_br_short should assert if the branch displacement is too far, right? >> > > Yes, it will assert but may be only in some worst case which we do not test. For example, try to run 64 bit fastdebug VM on Sparc + compressed oops + VerifyOops. That works. > >>> >>> >>> Why you added new #include into ciEnv.cpp and nmethod.cpp, what code needs it? Nothing else is changed in these files. >> >> Both files use dependencies and I got linkage errors on Linux while working on the fix (because of inline methods). It seems that the include is not required in ciEnv.cpp because ciEnv.hpp already includes it. I missed that. But nmethod.cpp needs it because nmethod.hpp only declares class Dependencies. >> > > OK. > >> >>> >>> Why you did not leave "volatile" call site inlining with guard? You did not explain why virtual call is fine for it. >> >> The spec of MutableCallSite says: >> >> "For target values which will be frequently updated, consider using a volatile call site instead." >> >> And VolatileCallSite says: >> >> "A VolatileCallSite is a CallSite whose target acts like a volatile variable. An invokedynamic instruction linked to a VolatileCallSite sees updates to its call site target immediately, even if the update occurs in another thread. There may be a performance penalty for such tight coupling between threads. 
>> >> Unlike MutableCallSite, there is no syncAll operation on volatile call sites, since every write to a volatile variable is implicitly synchronized with reader threads. >> >> In other respects, a VolatileCallSite is interchangeable with MutableCallSite." >> >> Since VolatileCallSite really should only be used when you know the target changes very often we don't do optimizations for this case. Obviously this is just a guess how people will use VolatileCallSite but I think for now this is a safe bet. >> > > Thank you for explaining it. > >> Additionally I had to do two small changes because the build was broken on some configurations: >> >> - klassOop new_type = _changes.is_klass_change() ? _changes.as_klass_change()->new_type() : NULL; >> + klassOop new_type = _changes.is_klass_change() ? _changes.as_klass_change()->new_type() : (klassOop) NULL; >> >> and >> >> - MutexLockerEx ccl(CodeCache_lock, thread); >> + MutexLockerEx ccl(CodeCache_lock, Mutex::_no_safepoint_check_flag); >> >> I updated the webrev. > > Good. Thanks. -- Christian > > Vladimir > >> >> -- Christian >> >>> >>> >>> Vladimir >>> >>> On 8/5/11 6:32 AM, Christian Thalinger wrote: >>>> http://cr.openjdk.java.net/~twisti/7071653 >>>> >>>> 7071653: JSR 292: call site change notification should be pushed not pulled >>>> Reviewed-by: >>>> >>>> Currently every speculatively inlined method handle call site has a >>>> guard that compares the current target of the CallSite object to the >>>> inlined one. This per-invocation overhead can be removed if the >>>> notification is changed from pulled to pushed (i.e. deoptimization). >>>> >>>> I had to change the logic in TemplateTable::patch_bytecode to skip >>>> bytecode quickening for putfield instructions when the put_code >>>> written to the constant pool cache is zero. 
This is required so that >>>> every execution of a putfield to CallSite.target calls out to >>>> InterpreterRuntime::resolve_get_put to do the deoptimization of >>>> depending compiled methods. >>>> >>>> I also had to change the dependency machinery to understand other >>>> dependencies than class hierarchy ones. DepChange got the super-type >>>> of two new dependencies, KlassDepChange and CallSiteDepChange. >>>> >>>> Tested with JRuby tests and benchmarks, hand-written testcases, JDK >>>> tests and vm.mlvm tests. >>>> >>>> Here is the speedup for the JRuby fib benchmark (first is JDK 7 b147, >>>> second with 7071653). Since the CallSite targets don't change during >>>> the runtime of this benchmark we can see the performance benefit of >>>> eliminating the guard: >>>> >>>> $ jruby --server bench/bench_fib_recursive.rb 5 35 >>>> 0.883000 0.000000 0.883000 ( 0.854000) >>>> 0.715000 0.000000 0.715000 ( 0.715000) >>>> 0.712000 0.000000 0.712000 ( 0.712000) >>>> 0.713000 0.000000 0.713000 ( 0.713000) >>>> 0.713000 0.000000 0.713000 ( 0.712000) >>>> >>>> $ jruby --server bench/bench_fib_recursive.rb 5 35 >>>> 0.772000 0.000000 0.772000 ( 0.742000) >>>> 0.624000 0.000000 0.624000 ( 0.624000) >>>> 0.621000 0.000000 0.621000 ( 0.621000) >>>> 0.622000 0.000000 0.622000 ( 0.622000) >>>> 0.622000 0.000000 0.622000 ( 0.621000) >>>> >> From tom.rodriguez at oracle.com Mon Aug 8 11:49:16 2011 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Mon, 8 Aug 2011 11:49:16 -0700 Subject: Request for review (L): 7071653: JSR 292: call site change notification should be pushed not pulled In-Reply-To: <34EE7AEC-FD11-4526-B49D-DCEA296E767A@oracle.com> References: <34EE7AEC-FD11-4526-B49D-DCEA296E767A@oracle.com> Message-ID: dependencies.cpp: in check_call_site_target_value, the changes == NULL case should be checking that the call site hasn't changed. 
It should probably look more like this: klassOop Dependencies::check_call_site_target_value(klassOop ctxk, oop call_site, CallSiteDepChange* changes) { assert(call_site->is_a(SystemDictionary::CallSite_klass()), "sanity"); // Same CallSite object but different target? Check this specific call site // if changes is non-NULL or validate all CallSites if ((changes == NULL || (call_site == changes->call_site())) && (java_lang_invoke_CallSite::target(call_site) != changes->method_handle())) { return ctxk; // assertion failed } assert(java_lang_invoke_CallSite::target(call_site) == changes->method_handle(), "should still be valid"); return NULL; // assertion still valid } The final assert is just a paranoia check that a call site hasn't changed without the dependencies being checked. interpreterRuntime.cpp: Please move the dependence check code into universe with the other dependence check code. Also add some comments explaining why it's doing what it's doing. doCall.cpp: Can you put in a comment explaining that VolatileCallSite is never inlined. Otherwise it looks good. tom On Aug 5, 2011, at 6:32 AM, Christian Thalinger wrote: > http://cr.openjdk.java.net/~twisti/7071653 > > 7071653: JSR 292: call site change notification should be pushed not pulled > Reviewed-by: > > Currently every speculatively inlined method handle call site has a > guard that compares the current target of the CallSite object to the > inlined one. This per-invocation overhead can be removed if the > notification is changed from pulled to pushed (i.e. deoptimization). > > I had to change the logic in TemplateTable::patch_bytecode to skip > bytecode quickening for putfield instructions when the put_code > written to the constant pool cache is zero. This is required so that > every execution of a putfield to CallSite.target calls out to > InterpreterRuntime::resolve_get_put to do the deoptimization of > depending compiled methods. 
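A minimal sketch of the push-style notification described in this RFR, assuming a drastically simplified model: `CallSiteSketch`, `CompiledMethodSketch`, `register_dependent`, `set_target` and `should_quicken_putfield` are hypothetical names, not HotSpot code, and an `int` stands in for the MethodHandle target. Compiled code registers a dependency instead of re-checking the target on every call; each store to the target goes through a slow path that invalidates stale dependents, and that slow path only stays reachable because a putfield whose put_code is zero is never quickened.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model only: stands in for compiled code that inlined a
// call site target without a per-call guard.
struct CompiledMethodSketch {
  int inlined_target = 0;  // target value assumed when "compiling"
  bool valid = true;       // cleared when the code is "deoptimized"
};

// Hypothetical model of a mutable call site with pushed notification.
struct CallSiteSketch {
  int target = 0;                                 // stands in for the MethodHandle
  std::vector<CompiledMethodSketch*> dependents;  // code relying on 'target'

  // "Compile" against the current target: record it and register a dependency.
  void register_dependent(CompiledMethodSketch* m) {
    m->inlined_target = target;
    dependents.push_back(m);
  }

  // Slow path for every store to the target (in the RFR, the call into the
  // runtime from the unquickened putfield). Dependents whose inlined target
  // no longer matches the new value are invalidated.
  void set_target(int new_target) {
    for (CompiledMethodSketch* m : dependents) {
      if (m->inlined_target != new_target) m->valid = false;
    }
    target = new_target;
  }
};

// Models the interpreter's patching decision: a resolved putfield is
// rewritten to its fast form only when its put_code is non-zero; zero is
// the marker the RFR describes for CallSite.target.
bool should_quicken_putfield(int put_code) {
  return put_code != 0;
}
```

Storing the same target again leaves dependents intact, while storing a different one invalidates them; the sketch only flips a flag where the VM would deoptimize.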
> > I also had to change the dependency machinery to understand other > dependencies than class hierarchy ones. DepChange got the super-type > of two new dependencies, KlassDepChange and CallSiteDepChange. > > Tested with JRuby tests and benchmarks, hand-written testcases, JDK > tests and vm.mlvm tests. > > Here is the speedup for the JRuby fib benchmark (first is JDK 7 b147, > second with 7071653). Since the CallSite targets don't change during > the runtime of this benchmark we can see the performance benefit of > eliminating the guard: > > $ jruby --server bench/bench_fib_recursive.rb 5 35 > 0.883000 0.000000 0.883000 ( 0.854000) > 0.715000 0.000000 0.715000 ( 0.715000) > 0.712000 0.000000 0.712000 ( 0.712000) > 0.713000 0.000000 0.713000 ( 0.713000) > 0.713000 0.000000 0.713000 ( 0.712000) > > $ jruby --server bench/bench_fib_recursive.rb 5 35 > 0.772000 0.000000 0.772000 ( 0.742000) > 0.624000 0.000000 0.624000 ( 0.624000) > 0.621000 0.000000 0.621000 ( 0.621000) > 0.622000 0.000000 0.622000 ( 0.622000) > 0.622000 0.000000 0.622000 ( 0.621000) > From tom.rodriguez at oracle.com Mon Aug 8 11:50:57 2011 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Mon, 8 Aug 2011 11:50:57 -0700 Subject: review for 7075623: 6990212 broke raiseException in 64 bit In-Reply-To: <98787BDF-3C3A-45AF-B2D7-CDB4763E8D0D@oracle.com> References: <98787BDF-3C3A-45AF-B2D7-CDB4763E8D0D@oracle.com> Message-ID: <193DDFC1-78CF-4097-B3F9-0D9ECBA5DB63@oracle.com> Thanks Christian and Vladimir. tom On Aug 8, 2011, at 1:34 AM, Christian Thalinger wrote: > Looks good. -- Christian > > On Aug 5, 2011, at 10:22 PM, Tom Rodriguez wrote: > >> http://cr.openjdk.java.net/~never/7075623 >> 3 lines changed: 0 ins; 0 del; 3 mod; 4699 unchg >> >> 7075623: 6990212 broke raiseException in 64 bit >> Reviewed-by: >> >> The fix for 6990212 included making the raiseException path do a >> normal dispatch instead of always using the compiler entry. The >> assembly for 64 bit had a few issues. 
On x86 the saved sp register is >> wrong which causes rarg0_code to be killed. On sparc the code should >> be passed as an int instead of a ptr which causes problems because of >> endianness. I also modified the x86 code to do the same. Tested with >> original regression test on sparc/x86 32/64 -Xcomp/-Xmixed. I also >> reran the failing JDK regression tests. >> > From vladimir.kozlov at oracle.com Mon Aug 8 11:52:57 2011 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 08 Aug 2011 11:52:57 -0700 Subject: Request for review (L): 7071653: JSR 292: call site change notification should be pushed not pulled In-Reply-To: References: <34EE7AEC-FD11-4526-B49D-DCEA296E767A@oracle.com> <4E3F1331.2000909@oracle.com> <4E3FF8E4.2070302@oracle.com> Message-ID: <4E403089.5010204@oracle.com> Christian Thalinger wrote: > On Aug 8, 2011, at 4:55 PM, Vladimir Kozlov wrote: > >> Christian, >> >> Should we put "skip bytecode quickening" code under flag to do this only when invoke dynamic is enabled? Or put_code is zero only in invoke dynamic case? > > No, it doesn't buy us anything. The new checking code is only executed the first time as the bytecodes are quickened right after that. And in the case where a putfield isn't quickened and we call resolve_get_put it gets very expensive anyway. You lost me here. New code in resolve_get_put() is executed only for putfield to CallSite.target. But new code in patch_bytecode() skips quickening for all putfield bytecodes. My question is: can you narrow skipping quickening only for putfield to CallSite.target? Or you are saying that there is no performance difference between executing _aputfield vs _fast_aputfield? Vladimir > >> On 8/8/11 6:56 AM, Christian Thalinger wrote: >>>> Why on sparc you use ld_ptr() to load from cache but on X86 and X64 you use movl() (only 32 bit)? >>> Good question. I took the code from TemplateTable::resolve_cache_and_index without thinking about it and that one uses ld_ptr. 
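The ld_ptr/movl question comes down to byte order. In the `_indices` layout discussed in this reply, bytecode 2 sits in bits 31..24, bytecode 1 in bits 23..16 and the constant pool index in bits 15..0, all inside the low 32-bit half of a 64-bit `intx`. The helpers below are hypothetical; they simulate both byte orders explicitly instead of relying on the host, and show that a 32-bit load from the start of the field returns that low half on a little-endian machine but the high half on a big-endian one.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical field extractors for the layout [ b2 | b1 | index ].
uint32_t extract_index(uint64_t indices) { return static_cast<uint32_t>(indices & 0xFFFF); }
uint32_t extract_b1(uint64_t indices)    { return static_cast<uint32_t>((indices >> 16) & 0xFF); }
uint32_t extract_b2(uint64_t indices)    { return static_cast<uint32_t>((indices >> 24) & 0xFF); }

// Stores a 64-bit value in memory with the given byte order, then performs
// a 32-bit load (like movl) from the same base address.
uint32_t load32_at_base(uint64_t value, bool little_endian) {
  uint8_t mem[8];
  for (int i = 0; i < 8; ++i) {
    // Byte i in memory: least significant first (LE) or most significant first (BE).
    int shift = little_endian ? 8 * i : 8 * (7 - i);
    mem[i] = static_cast<uint8_t>((value >> shift) & 0xFF);
  }
  uint32_t result = 0;
  for (int i = 0; i < 4; ++i) {
    // Reassemble mem[0..3] with the byte order a 32-bit load would use.
    int shift = little_endian ? 8 * i : 8 * (3 - i);
    result |= static_cast<uint32_t>(mem[i]) << shift;
  }
  return result;
}
```

A full pointer-sized load followed by shifts (the ld_ptr route) is correct on both byte orders; the 32-bit shortcut is only valid on little-endian targets.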
>>> >>> _indices in ConstantPoolCacheEntry is defined as intx: >>> >>> volatile intx _indices; // constant pool index & rewrite bytecodes >>> >>> and bytecode 1 and 2 are in the upper 16-bit of the lower 32-bit word: >>> >>> // bit number |31 0| >>> // bit length |-8--|-8--|---16----| >>> // -------------------------------- >>> // _indices [ b2 | b1 | index ] >>> >>> Loading 32-bit on LE gives you the right bits but on BE it does not. I think that's the reason for the "optimization" on x64. >> I don't like this "optimization" but I understand why we are using it. Add a comment (especially in x64 file). > > I factored reading the bytecode into InterpreterMacroAssembler::get_cache_and_index_and_bytecode_at_bcp since the same code is used twice in TemplateTable and added the comment there. >> >>>> I am concerned about using next short branch in new code in templateTable_sparc.cpp: >>>> >>>> cmp_and_br_short(..., L_patch_done); // don't patch >>>> >>>> There is __ stop() call which generates a lot of code so that label L_patch_done could be far. >>> Yeah, I thought I give it a try if it works. cmp_and_br_short should assert if the branch displacement is too far, right? >>> >> Yes, it will assert but may be only in some worst case which we do not test. For example, try to run 64 bit fastdebug VM on Sparc + compressed oops + VerifyOops. > > That works. >> >>>> >>>> Why you added new #include into ciEnv.cpp and nmethod.cpp, what code needs it? Nothing else is changed in these files. >>> Both files use dependencies and I got linkage errors on Linux while working on the fix (because of inline methods). It seems that the include is not required in ciEnv.cpp because ciEnv.hpp already includes it. I missed that. But nmethod.cpp needs it because nmethod.hpp only declares class Dependencies. >>> >> OK. >> >>>> Why you did not leave "volatile" call site inlining with guard? You did not explain why virtual call is fine for it.
>>> The spec of MutableCallSite says: >>> >>> "For target values which will be frequently updated, consider using a volatile call site instead." >>> >>> And VolatileCallSite says: >>> >>> "A VolatileCallSite is a CallSite whose target acts like a volatile variable. An invokedynamic instruction linked to a VolatileCallSite sees updates to its call site target immediately, even if the update occurs in another thread. There may be a performance penalty for such tight coupling between threads. >>> >>> Unlike MutableCallSite, there is no syncAll operation on volatile call sites, since every write to a volatile variable is implicitly synchronized with reader threads. >>> >>> In other respects, a VolatileCallSite is interchangeable with MutableCallSite." >>> >>> Since VolatileCallSite really should only be used when you know the target changes very often we don't do optimizations for this case. Obviously this is just a guess how people will use VolatileCallSite but I think for now this is a safe bet. >>> >> Thank you for explaining it. >> >>> Additionally I had to do two small changes because the build was broken on some configurations: >>> >>> - klassOop new_type = _changes.is_klass_change() ? _changes.as_klass_change()->new_type() : NULL; >>> + klassOop new_type = _changes.is_klass_change() ? _changes.as_klass_change()->new_type() : (klassOop) NULL; >>> >>> and >>> >>> - MutexLockerEx ccl(CodeCache_lock, thread); >>> + MutexLockerEx ccl(CodeCache_lock, Mutex::_no_safepoint_check_flag); >>> >>> I updated the webrev. >> Good. > > Thanks. 
> > -- Christian > >> Vladimir >> >>> -- Christian >>> >>>> >>>> Vladimir >>>> >>>> On 8/5/11 6:32 AM, Christian Thalinger wrote: >>>>> http://cr.openjdk.java.net/~twisti/7071653 >>>>> >>>>> 7071653: JSR 292: call site change notification should be pushed not pulled >>>>> Reviewed-by: >>>>> >>>>> Currently every speculatively inlined method handle call site has a >>>>> guard that compares the current target of the CallSite object to the >>>>> inlined one. This per-invocation overhead can be removed if the >>>>> notification is changed from pulled to pushed (i.e. deoptimization). >>>>> >>>>> I had to change the logic in TemplateTable::patch_bytecode to skip >>>>> bytecode quickening for putfield instructions when the put_code >>>>> written to the constant pool cache is zero. This is required so that >>>>> every execution of a putfield to CallSite.target calls out to >>>>> InterpreterRuntime::resolve_get_put to do the deoptimization of >>>>> depending compiled methods. >>>>> >>>>> I also had to change the dependency machinery to understand other >>>>> dependencies than class hierarchy ones. DepChange got the super-type >>>>> of two new dependencies, KlassDepChange and CallSiteDepChange. >>>>> >>>>> Tested with JRuby tests and benchmarks, hand-written testcases, JDK >>>>> tests and vm.mlvm tests. >>>>> >>>>> Here is the speedup for the JRuby fib benchmark (first is JDK 7 b147, >>>>> second with 7071653). 
Since the CallSite targets don't change during >>>>> the runtime of this benchmark we can see the performance benefit of >>>>> eliminating the guard: >>>>> >>>>> $ jruby --server bench/bench_fib_recursive.rb 5 35 >>>>> 0.883000 0.000000 0.883000 ( 0.854000) >>>>> 0.715000 0.000000 0.715000 ( 0.715000) >>>>> 0.712000 0.000000 0.712000 ( 0.712000) >>>>> 0.713000 0.000000 0.713000 ( 0.713000) >>>>> 0.713000 0.000000 0.713000 ( 0.712000) >>>>> >>>>> $ jruby --server bench/bench_fib_recursive.rb 5 35 >>>>> 0.772000 0.000000 0.772000 ( 0.742000) >>>>> 0.624000 0.000000 0.624000 ( 0.624000) >>>>> 0.621000 0.000000 0.621000 ( 0.621000) >>>>> 0.622000 0.000000 0.622000 ( 0.622000) >>>>> 0.622000 0.000000 0.622000 ( 0.621000) >>>>> > From tom.rodriguez at oracle.com Mon Aug 8 12:08:52 2011 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Mon, 8 Aug 2011 12:08:52 -0700 Subject: Request for review (L): 7071653: JSR 292: call site change notification should be pushed not pulled In-Reply-To: <4E403089.5010204@oracle.com> References: <34EE7AEC-FD11-4526-B49D-DCEA296E767A@oracle.com> <4E3F1331.2000909@oracle.com> <4E3FF8E4.2070302@oracle.com> <4E403089.5010204@oracle.com> Message-ID: On Aug 8, 2011, at 11:52 AM, Vladimir Kozlov wrote: > Christian Thalinger wrote: >> On Aug 8, 2011, at 4:55 PM, Vladimir Kozlov wrote: >> >>> Christian, >>> >>> Should we put "skip bytecode quickening" code under flag to do this only when invoke dynamic is enabled? Or put_code is zero only in invoke dynamic case? >> >> No, it doesn't buy us anything. The new checking code is only executed the first time as the bytecodes are quickened right after that. And in the case where a putfield isn't quickened and we call resolve_get_put it gets very expensive anyway. > > You lost me here. New code in resolve_get_put() is executed only for putfield to > CallSite.target. But new code in patch_bytecode() skips quickening for all > putfield bytecodes. 
My question is: can you narrow skipping quickening only for > putfield to CallSite.target? Or you are saying that there is no performance > difference between executing _aputfield vs _fast_aputfield? It only skips quickening if put_code is zero, which is only done for CallSite.target. All the others proceed as they used to. tom > > Vladimir > >> >>> On 8/8/11 6:56 AM, Christian Thalinger wrote: >>>>> Why on sparc you use ld_ptr() to load from cache but on X86 and X64 you use movl() (only 32 bit)? >>>> Good question. I took the code from TemplateTable::resolve_cache_and_index without thinking about it and that one uses ld_ptr. >>>> >>>> _indices in ConstantPoolCacheEntry is defined as intx: >>>> >>>> volatile intx _indices; // constant pool index & rewrite bytecodes >>>> >>>> and bytecode 1 and 2 are in the upper 16-bit of the lower 32-bit word: >>>> >>>> // bit number |31 0| >>>> // bit length |-8--|-8--|---16----| >>>> // -------------------------------- >>>> // _indices [ b2 | b1 | index ] >>>> >>>> Loading 32-bit on LE gives you the right bits but on BE it does not. I think that's the reason for the "optimization" on x64. >>> I don't like this "optimization" but I understand why we are using it. Add a comment (especially in x64 file). >> >> I factored reading the bytecode into InterpreterMacroAssembler::get_cache_and_index_and_bytecode_at_bcp since the same code is used twice in TemplateTable and added the comment there. >> >>>>> I am concerned about using next short branch in new code in templateTable_sparc.cpp: >>>>> >>>>> cmp_and_br_short(..., L_patch_done); // don't patch >>>>> >>>>> There is __ stop() call which generates a lot of code so that label L_patch_done could be far. >>>> Yeah, I thought I give it a try if it works. cmp_and_br_short should assert if the branch displacement is too far, right? >>>> >>> Yes, it will assert but may be only in some worst case which we do not test.
For example, try to run 64 bit fastdebug VM on Sparc + compressed oops + VerifyOops. >> >> That works. >> >>>>> >>>>> Why you added new #include into ciEnv.cpp and nmethod.cpp, what code needs it? Nothing else is changed in these files. >>>> Both files use dependencies and I got linkage errors on Linux while working on the fix (because of inline methods). It seems that the include is not required in ciEnv.cpp because ciEnv.hpp already includes it. I missed that. But nmethod.cpp needs it because nmethod.hpp only declares class Dependencies. >>>> >>> OK. >>> >>>>> Why you did not leave "volatile" call site inlining with guard? You did not explain why virtual call is fine for it. >>>> The spec of MutableCallSite says: >>>> >>>> "For target values which will be frequently updated, consider using a volatile call site instead." >>>> >>>> And VolatileCallSite says: >>>> >>>> "A VolatileCallSite is a CallSite whose target acts like a volatile variable. An invokedynamic instruction linked to a VolatileCallSite sees updates to its call site target immediately, even if the update occurs in another thread. There may be a performance penalty for such tight coupling between threads. >>>> >>>> Unlike MutableCallSite, there is no syncAll operation on volatile call sites, since every write to a volatile variable is implicitly synchronized with reader threads. >>>> >>>> In other respects, a VolatileCallSite is interchangeable with MutableCallSite." >>>> >>>> Since VolatileCallSite really should only be used when you know the target changes very often we don't do optimizations for this case. Obviously this is just a guess how people will use VolatileCallSite but I think for now this is a safe bet. >>>> >>> Thank you for explaining it. >>> >>>> Additionally I had to do two small changes because the build was broken on some configurations: >>>> >>>> - klassOop new_type = _changes.is_klass_change() ? 
_changes.as_klass_change()->new_type() : NULL; >>>> + klassOop new_type = _changes.is_klass_change() ? _changes.as_klass_change()->new_type() : (klassOop) NULL; >>>> >>>> and >>>> >>>> - MutexLockerEx ccl(CodeCache_lock, thread); >>>> + MutexLockerEx ccl(CodeCache_lock, Mutex::_no_safepoint_check_flag); >>>> >>>> I updated the webrev. >>> Good. >> >> Thanks. >> >> -- Christian >> >>> Vladimir >>> >>>> -- Christian >>>> >>>>> >>>>> Vladimir >>>>> >>>>> On 8/5/11 6:32 AM, Christian Thalinger wrote: >>>>>> http://cr.openjdk.java.net/~twisti/7071653 >>>>>> >>>>>> 7071653: JSR 292: call site change notification should be pushed not pulled >>>>>> Reviewed-by: >>>>>> >>>>>> Currently every speculatively inlined method handle call site has a >>>>>> guard that compares the current target of the CallSite object to the >>>>>> inlined one. This per-invocation overhead can be removed if the >>>>>> notification is changed from pulled to pushed (i.e. deoptimization). >>>>>> >>>>>> I had to change the logic in TemplateTable::patch_bytecode to skip >>>>>> bytecode quickening for putfield instructions when the put_code >>>>>> written to the constant pool cache is zero. This is required so that >>>>>> every execution of a putfield to CallSite.target calls out to >>>>>> InterpreterRuntime::resolve_get_put to do the deoptimization of >>>>>> depending compiled methods. >>>>>> >>>>>> I also had to change the dependency machinery to understand other >>>>>> dependencies than class hierarchy ones. DepChange got the super-type >>>>>> of two new dependencies, KlassDepChange and CallSiteDepChange. >>>>>> >>>>>> Tested with JRuby tests and benchmarks, hand-written testcases, JDK >>>>>> tests and vm.mlvm tests. >>>>>> >>>>>> Here is the speedup for the JRuby fib benchmark (first is JDK 7 b147, >>>>>> second with 7071653). 
Since the CallSite targets don't change during >>>>>> the runtime of this benchmark we can see the performance benefit of >>>>>> eliminating the guard: >>>>>> >>>>>> $ jruby --server bench/bench_fib_recursive.rb 5 35 >>>>>> 0.883000 0.000000 0.883000 ( 0.854000) >>>>>> 0.715000 0.000000 0.715000 ( 0.715000) >>>>>> 0.712000 0.000000 0.712000 ( 0.712000) >>>>>> 0.713000 0.000000 0.713000 ( 0.713000) >>>>>> 0.713000 0.000000 0.713000 ( 0.712000) >>>>>> >>>>>> $ jruby --server bench/bench_fib_recursive.rb 5 35 >>>>>> 0.772000 0.000000 0.772000 ( 0.742000) >>>>>> 0.624000 0.000000 0.624000 ( 0.624000) >>>>>> 0.621000 0.000000 0.621000 ( 0.621000) >>>>>> 0.622000 0.000000 0.622000 ( 0.622000) >>>>>> 0.622000 0.000000 0.622000 ( 0.621000) >>>>>> >> > _______________________________________________ > mlvm-dev mailing list > mlvm-dev at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev From vladimir.kozlov at oracle.com Mon Aug 8 12:36:45 2011 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 08 Aug 2011 12:36:45 -0700 Subject: Request for review (L): 7071653: JSR 292: call site change notification should be pushed not pulled In-Reply-To: References: <34EE7AEC-FD11-4526-B49D-DCEA296E767A@oracle.com> <4E3F1331.2000909@oracle.com> <4E3FF8E4.2070302@oracle.com> <4E403089.5010204@oracle.com> Message-ID: <4E403ACD.5000500@oracle.com> Tom Rodriguez wrote: > On Aug 8, 2011, at 11:52 AM, Vladimir Kozlov wrote: > >> Christian Thalinger wrote: >>> On Aug 8, 2011, at 4:55 PM, Vladimir Kozlov wrote: >>> >>>> Christian, >>>> >>>> Should we put "skip bytecode quickening" code under flag to do this only when invoke dynamic is enabled? Or put_code is zero only in invoke dynamic case? >>> No, it doesn't buy us anything. The new checking code is only executed the first time as the bytecodes are quickened right after that. And in the case where a putfield isn't quickened and we call resolve_get_put it gets very expensive anyway. >> You lost me here. 
New code in resolve_get_put() is executed only for putfield to >> CallSite.target. But new code in patch_bytecode() skips quickening for all >> putfield bytecodes. My question is: can you narrow skipping quickening only for >> putfield to CallSite.target? Or you are saying that there is no performance >> difference between executing _aputfield vs _fast_aputfield? > > It only skips quickening if put_code is zero, which is only done for CallSite.target. All the others proceed as they used to. Good. Thank you, Tom Vladimir > > tom > >> Vladimir >> >>>> On 8/8/11 6:56 AM, Christian Thalinger wrote: >>>>>> Why on sparc you use ld_ptr() to load from cache but on X86 and X64 you use movl() (only 32 bit)? >>>>> Good question. I took the code from TemplateTable::resolve_cache_and_index without thinking about it and that one uses ld_ptr. >>>>> >>>>> _indices in ConstantPoolCacheEntry is defined as intx: >>>>> >>>>> volatile intx _indices; // constant pool index & rewrite bytecodes >>>>> >>>>> and bytecode 1 and 2 are in the upper 16-bit of the lower 32-bit word: >>>>> >>>>> // bit number |31 0| >>>>> // bit length |-8--|-8--|---16----| >>>>> // -------------------------------- >>>>> // _indices [ b2 | b1 | index ] >>>>> >>>>> Loading 32-bit on LE gives you the right bits but on BE it does not. I think that's the reason for the "optimization" on x64. >>>> I don't like this "optimization" but I understand why we are using it. Add a comment (especially in x64 file). >>> I factored reading the bytecode into InterpreterMacroAssembler::get_cache_and_index_and_bytecode_at_bcp since the same code is used twice in TemplateTable and added the comment there. >>> >>>>>> I am concerned about using next short branch in new code in templateTable_sparc.cpp: >>>>>> >>>>>> cmp_and_br_short(..., L_patch_done); // don't patch >>>>>> >>>>>> There is __ stop() call which generates a lot of code so that label L_patch_done could be far. >>>>> Yeah, I thought I give it a try if it works.
cmp_and_br_short should assert if the branch displacement is too far, right? >>>>> >>>> Yes, it will assert but may be only in some worst case which we do not test. For example, try to run 64 bit fastdebug VM on Sparc + compressed oops + VerifyOops. >>> That works. >>> >>>>>> Why you added new #include into ciEnv.cpp and nmethod.cpp, what code needs it? Nothing else is changed in these files. >>>>> Both files use dependencies and I got linkage errors on Linux while working on the fix (because of inline methods). It seems that the include is not required in ciEnv.cpp because ciEnv.hpp already includes it. I missed that. But nmethod.cpp needs it because nmethod.hpp only declares class Dependencies. >>>>> >>>> OK. >>>> >>>>>> Why you did not leave "volatile" call site inlining with guard? You did not explain why virtual call is fine for it. >>>>> The spec of MutableCallSite says: >>>>> >>>>> "For target values which will be frequently updated, consider using a volatile call site instead." >>>>> >>>>> And VolatileCallSite says: >>>>> >>>>> "A VolatileCallSite is a CallSite whose target acts like a volatile variable. An invokedynamic instruction linked to a VolatileCallSite sees updates to its call site target immediately, even if the update occurs in another thread. There may be a performance penalty for such tight coupling between threads. >>>>> >>>>> Unlike MutableCallSite, there is no syncAll operation on volatile call sites, since every write to a volatile variable is implicitly synchronized with reader threads. >>>>> >>>>> In other respects, a VolatileCallSite is interchangeable with MutableCallSite." >>>>> >>>>> Since VolatileCallSite really should only be used when you know the target changes very often we don't do optimizations for this case. Obviously this is just a guess how people will use VolatileCallSite but I think for now this is a safe bet. >>>>> >>>> Thank you for explaining it. 
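The inlining policy explained above can be summarized in a few lines. This is a hypothetical sketch, not the doCall.cpp logic: `CallSiteKind`, `InlineDecision` and `decide_inline` are illustrative names, and only the two cases discussed in this exchange are modelled.

```cpp
#include <cassert>

// Hypothetical classification of the two call site kinds discussed.
enum class CallSiteKind { Mutable, Volatile };

enum class InlineDecision {
  InlineWithDependency,  // inline the target with no guard; rely on deoptimization
  NoInline               // emit an ordinary (virtual) call
};

// VolatileCallSite targets are expected to change often, so they are never
// inlined; other mutable call sites are inlined and protected by a
// dependency instead of a per-call guard.
InlineDecision decide_inline(CallSiteKind kind) {
  switch (kind) {
    case CallSiteKind::Volatile: return InlineDecision::NoInline;
    case CallSiteKind::Mutable:  return InlineDecision::InlineWithDependency;
  }
  return InlineDecision::NoInline;  // unreachable; keeps compilers quiet
}
```

The trade-off is that a frequently retargeted site would otherwise trigger repeated deoptimization and recompilation, whereas a plain call pays a small cost on every invocation but nothing on retargeting.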
>>>> >>>>> Additionally I had to do two small changes because the build was broken on some configurations: >>>>> >>>>> - klassOop new_type = _changes.is_klass_change() ? _changes.as_klass_change()->new_type() : NULL; >>>>> + klassOop new_type = _changes.is_klass_change() ? _changes.as_klass_change()->new_type() : (klassOop) NULL; >>>>> >>>>> and >>>>> >>>>> - MutexLockerEx ccl(CodeCache_lock, thread); >>>>> + MutexLockerEx ccl(CodeCache_lock, Mutex::_no_safepoint_check_flag); >>>>> >>>>> I updated the webrev. >>>> Good. >>> Thanks. >>> >>> -- Christian >>> >>>> Vladimir >>>> >>>>> -- Christian >>>>> >>>>>> Vladimir >>>>>> >>>>>> On 8/5/11 6:32 AM, Christian Thalinger wrote: >>>>>>> http://cr.openjdk.java.net/~twisti/7071653 >>>>>>> >>>>>>> 7071653: JSR 292: call site change notification should be pushed not pulled >>>>>>> Reviewed-by: >>>>>>> >>>>>>> Currently every speculatively inlined method handle call site has a >>>>>>> guard that compares the current target of the CallSite object to the >>>>>>> inlined one. This per-invocation overhead can be removed if the >>>>>>> notification is changed from pulled to pushed (i.e. deoptimization). >>>>>>> >>>>>>> I had to change the logic in TemplateTable::patch_bytecode to skip >>>>>>> bytecode quickening for putfield instructions when the put_code >>>>>>> written to the constant pool cache is zero. This is required so that >>>>>>> every execution of a putfield to CallSite.target calls out to >>>>>>> InterpreterRuntime::resolve_get_put to do the deoptimization of >>>>>>> depending compiled methods. >>>>>>> >>>>>>> I also had to change the dependency machinery to understand other >>>>>>> dependencies than class hierarchy ones. DepChange got the super-type >>>>>>> of two new dependencies, KlassDepChange and CallSiteDepChange. >>>>>>> >>>>>>> Tested with JRuby tests and benchmarks, hand-written testcases, JDK >>>>>>> tests and vm.mlvm tests. 
>>>>>>> >>>>>>> Here is the speedup for the JRuby fib benchmark (first is JDK 7 b147, >>>>>>> second with 7071653). Since the CallSite targets don't change during >>>>>>> the runtime of this benchmark we can see the performance benefit of >>>>>>> eliminating the guard: >>>>>>> >>>>>>> $ jruby --server bench/bench_fib_recursive.rb 5 35 >>>>>>> 0.883000 0.000000 0.883000 ( 0.854000) >>>>>>> 0.715000 0.000000 0.715000 ( 0.715000) >>>>>>> 0.712000 0.000000 0.712000 ( 0.712000) >>>>>>> 0.713000 0.000000 0.713000 ( 0.713000) >>>>>>> 0.713000 0.000000 0.713000 ( 0.712000) >>>>>>> >>>>>>> $ jruby --server bench/bench_fib_recursive.rb 5 35 >>>>>>> 0.772000 0.000000 0.772000 ( 0.742000) >>>>>>> 0.624000 0.000000 0.624000 ( 0.624000) >>>>>>> 0.621000 0.000000 0.621000 ( 0.621000) >>>>>>> 0.622000 0.000000 0.622000 ( 0.622000) >>>>>>> 0.622000 0.000000 0.622000 ( 0.621000) >>>>>>> >> _______________________________________________ >> mlvm-dev mailing list >> mlvm-dev at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev > From headius at headius.com Mon Aug 8 14:29:46 2011 From: headius at headius.com (Charles Oliver Nutter) Date: Mon, 8 Aug 2011 17:29:46 -0400 Subject: review for 7071307: MethodHandle bimorphic inlining should consider the frequency In-Reply-To: <97DD49F1-0A6B-4F0C-88EA-76C93D054007@oracle.com> References: <97DD49F1-0A6B-4F0C-88EA-76C93D054007@oracle.com> Message-ID: On Thu, Jul 28, 2011 at 7:47 PM, Tom Rodriguez wrote: > http://cr.openjdk.java.net/~never/7071307 > 46 lines changed: 27 ins; 6 del; 13 mod; 3568 unchg > > 7071307: MethodHandle bimorphic inlining should consider the frequency > Reviewed-by: > > The fix for 7050554 added a bimorphic inline path but didn't take into > account the frequency of the guarding test. This ends up treating > both sides of the if as equally frequent which can lead to over > inlining and overflowing the method inlining limits.
The fix is to > grab the frequency from the If and apply that to the branches. This > addresses a major source of overinlining that can result in bad > performance with JSR 292. We may do a later extension to this to > actually do per call chain profiling of selectAlternative but that's a > more complicated fix. > > I also fixed a problem with the ideal graph printer where debug_orig > printing would go into an infinite loop. > > Tested with jruby and vm.mlvm tests. Building on Ubuntu (an admittedly old install) yields some warnings that are turned into errors: g++ -DLINUX -D_GNU_SOURCE -DIA32 -DPRODUCT -I. -I/home/headius/hsx-hotspot/src/share/vm/prims -I/home/headius/hsx-hotspot/src/share/vm -I/home/headius/hsx-hotspot/src/cpu/x86/vm -I/home/headius/hsx-hotspot/src/os_cpu/linux_x86/vm -I/home/headius/hsx-hotspot/src/os/linux/vm -I/home/headius/hsx-hotspot/src/os/posix/vm -I../generated -DHOTSPOT_RELEASE_VERSION="\"22.0-b01-internal\"" -DHOTSPOT_BUILD_TARGET="\"product\"" -DHOTSPOT_BUILD_USER="\"headius\"" -DHOTSPOT_LIB_ARCH=\"i386\" -DJRE_RELEASE_VERSION="\"1.7.0\"" -DHOTSPOT_VM_DISTRO="\"OpenJDK\"" -DTARGET_OS_FAMILY_linux -DTARGET_ARCH_x86 -DTARGET_ARCH_MODEL_x86_32 -DTARGET_OS_ARCH_linux_x86 -DTARGET_OS_ARCH_MODEL_linux_x86_32 -DTARGET_COMPILER_gcc -DCOMPILER2 -DCOMPILER1 -fPIC -fno-rtti -fno-exceptions -D_REENTRANT -fcheck-new -m32 -march=i586 -pipe -O3 -fno-strict-aliasing -DVM_LITTLE_ENDIAN -Werror -Wpointer-arith -Wconversion -Wsign-compare -c -MMD -MP -MF ../generated/dependencies/precompiled.hpp.gch.d -x c++-header /home/headius/hsx-hotspot/src/share/vm/precompiled.hpp -o precompiled.hpp.gch cc1plus: warnings being treated as errors /home/headius/hsx-hotspot/src/share/vm/ci/ciCallProfile.hpp: In member function 'ciCallProfile ciCallProfile::rescale(double)': /home/headius/hsx-hotspot/src/share/vm/ci/ciCallProfile.hpp:87: warning: converting to 'int' from 'double' /home/headius/hsx-hotspot/src/share/vm/ci/ciCallProfile.hpp:89: warning: converting to
'int' from 'double' The lines in question are doing (int) *= (double), which gcc complains about. Ubuntu probably has warnings set up to be errors, so it fails the build. I modified them in my local copy to do the long form with an explicit cast back to int, but you can fix in whatever way is best. - Charlie From tom.rodriguez at oracle.com Mon Aug 8 14:44:18 2011 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Mon, 8 Aug 2011 14:44:18 -0700 Subject: review for 7071307: MethodHandle bimorphic inlining should consider the frequency In-Reply-To: References: <97DD49F1-0A6B-4F0C-88EA-76C93D054007@oracle.com> Message-ID: <7BE903A3-19FF-433C-909C-AAD3105E69D2@oracle.com> I'll fix that as you suggest. diff -r a19c671188cb src/share/vm/ci/ciCallProfile.hpp --- a/src/share/vm/ci/ciCallProfile.hpp +++ b/src/share/vm/ci/ciCallProfile.hpp @@ -79,6 +79,17 @@ assert(i < _limit, "out of Call Profile MorphismLimit"); return _receiver[i]; } + + // Rescale the current profile based on the incoming scale + ciCallProfile rescale(double scale) { + assert(scale >= 0 && scale <= 1.0, "out of range"); + ciCallProfile call = *this; + call._count = (int)(call._count * scale); + for (int i = 0; i < _morphism; i++) { + call._receiver_count[i] = (int)(call._receiver_count[i] * scale); + } + return call; + } }; #endif // SHARE_VM_CI_CICALLPROFILE_HPP I haven't pushed this yet because I was seeing some cases where the if's were ordered how I expect and I'm still trying to figure out if this is me or something odd in jruby. I should get to the bottom of this today. 
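A standalone sketch of the `rescale` pattern from the diff above, using a hypothetical `CallProfileSketch` struct that keeps only the fields `rescale` touches (the real ciCallProfile has more state). Writing `count = (int)(count * scale)` rather than `count *= scale` makes the double-to-int truncation explicit; the compound form converts implicitly, which is what gcc flagged in the build above (that command line includes `-Wconversion` and `-Werror`).

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for a call profile.
struct CallProfileSketch {
  static const int kMorphismLimit = 2;
  int count = 0;
  int morphism = 0;
  int receiver_count[kMorphismLimit] = {0, 0};

  // Returns a copy with all counts scaled by 'scale' (expected in [0, 1]).
  // The explicit cast truncates toward zero and keeps -Wconversion quiet.
  CallProfileSketch rescale(double scale) const {
    assert(scale >= 0.0 && scale <= 1.0 && "out of range");
    CallProfileSketch scaled = *this;
    scaled.count = static_cast<int>(scaled.count * scale);
    for (int i = 0; i < morphism; i++) {
      scaled.receiver_count[i] = static_cast<int>(scaled.receiver_count[i] * scale);
    }
    return scaled;
  }
};
```

Returning a scaled copy, as the diff does, leaves the original profile untouched for other users.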
tom On Aug 8, 2011, at 2:29 PM, Charles Oliver Nutter wrote: > On Thu, Jul 28, 2011 at 7:47 PM, Tom Rodriguez wrote: >> http://cr.openjdk.java.net/~never/7071307 >> 46 lines changed: 27 ins; 6 del; 13 mod; 3568 unchg >> >> 7071307: MethodHandle bimorphic inlining should consider the frequency >> Reviewed-by: >> >> The fix for 7050554 added a bimorphic inline path but didn't take into >> account the frequency of the guarding test. This ends up treating >> both sides of the if as equally frequent which can lead to over >> inlining and overflowing the method inlining limits. The fix is to >> grab the frequency from the If and apply that to the branches. This >> addresses a major source of overinlining that can result in bad >> performance with JSR 292. We may do a later extension to this to >> actually do per call chain profiling of selectAlternative but that's a >> more complicated fix. >> >> I also fixed a problem with the ideal graph printer where debug_orig >> printing would go into an infinite loop. >> >> Tested with jruby and vm.mlvm tests. > > Building on Ubuntu (an admittedly old install) yields some warnings > that are turned into errors: > > g++ -DLINUX -D_GNU_SOURCE -DIA32 -DPRODUCT -I. 
> -I/home/headius/hsx-hotspot/src/share/vm/prims > -I/home/headius/hsx-hotspot/src/share/vm > -I/home/headius/hsx-hotspot/src/cpu/x86/vm > -I/home/headius/hsx-hotspot/src/os_cpu/linux_x86/vm > -I/home/headius/hsx-hotspot/src/os/linux/vm > -I/home/headius/hsx-hotspot/src/os/posix/vm -I../generated > -DHOTSPOT_RELEASE_VERSION="\"22.0-b01-internal\"" > -DHOTSPOT_BUILD_TARGET="\"product\"" > -DHOTSPOT_BUILD_USER="\"headius\"" -DHOTSPOT_LIB_ARCH=\"i386\" > -DJRE_RELEASE_VERSION="\"1.7.0\"" -DHOTSPOT_VM_DISTRO="\"OpenJDK\"" > -DTARGET_OS_FAMILY_linux -DTARGET_ARCH_x86 -DTARGET_ARCH_MODEL_x86_32 > -DTARGET_OS_ARCH_linux_x86 -DTARGET_OS_ARCH_MODEL_linux_x86_32 > -DTARGET_COMPILER_gcc -DCOMPILER2 -DCOMPILER1 -fPIC -fno-rtti > -fno-exceptions -D_REENTRANT -fcheck-new -m32 -march=i586 -pipe -O3 > -fno-strict-aliasing -DVM_LITTLE_ENDIAN -Werror -Wpointer-arith > -Wconversion -Wsign-compare -c -MMD -MP -MF > ../generated/dependencies/precompiled.hpp.gch.d -x c++-header > /home/headius/hsx-hotspot/src/share/vm/precompiled.hpp -o > precompiled.hpp.gch > cc1plus: warnings being treated as errors > /home/headius/hsx-hotspot/src/share/vm/ci/ciCallProfile.hpp: In member > function 'ciCallProfile ciCallProfile::rescale(double)': > /home/headius/hsx-hotspot/src/share/vm/ci/ciCallProfile.hpp:87: > warning: converting to 'int' from 'double' > /home/headius/hsx-hotspot/src/share/vm/ci/ciCallProfile.hpp:89: > warning: converting to 'int' from 'double' > > The lines in question are doing (int) *= (double), which gcc complains > about. Ubuntu probably has warnings set up to be errors, so it fails > the build. > > I modified them in my local copy to do the long form with an explicit > cast back to int, but you can fix in whatever way is best. 
> > - Charlie From tom.rodriguez at oracle.com Mon Aug 8 20:44:51 2011 From: tom.rodriguez at oracle.com (tom.rodriguez at oracle.com) Date: Tue, 09 Aug 2011 03:44:51 +0000 Subject: hg: hsx/hotspot-comp/hotspot: 7075623: 6990212 broke raiseException in 64 bit Message-ID: <20110809034457.B64E147A38@hg.openjdk.java.net> Changeset: a19c671188cb Author: never Date: 2011-08-08 13:19 -0700 URL: http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/a19c671188cb 7075623: 6990212 broke raiseException in 64 bit Reviewed-by: kvn, twisti ! src/cpu/sparc/vm/methodHandles_sparc.cpp ! src/cpu/x86/vm/methodHandles_x86.cpp From christian.thalinger at oracle.com Tue Aug 9 04:33:29 2011 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Tue, 9 Aug 2011 13:33:29 +0200 Subject: Request for review (L): 7071653: JSR 292: call site change notification should be pushed not pulled In-Reply-To: References: <34EE7AEC-FD11-4526-B49D-DCEA296E767A@oracle.com> Message-ID: <6908A407-5908-4B30-8540-E6474B96DBA9@oracle.com> On Aug 8, 2011, at 8:49 PM, Tom Rodriguez wrote: > dependencies.cpp: > > in check_call_site_target_value, the changes == NULL case should be checking that the call site hasn't changed. It should probably look more like this: > > klassOop Dependencies::check_call_site_target_value(klassOop ctxk, oop call_site, CallSiteDepChange* changes) { > assert(call_site->is_a(SystemDictionary::CallSite_klass()), "sanity"); > // Same CallSite object but different target? Check this specific call site > // if changes is non-NULL or validate all CallSites > if ((changes == NULL || (call_site == changes->call_site())) && > (java_lang_invoke_CallSite::target(call_site) != changes->method_handle())) { > return ctxk; // assertion failed > } > assert(java_lang_invoke_CallSite::target(call_site) == changes->method_handle(), "should still be valid"); > return NULL; // assertion still valid > } I see your point. 
But the code above is broken as changes->method_handle() will not work when changes == NULL. One of my first versions of this code also stored the MethodHandle target in the dependence stream which seems to be required when we want to validate all CallSites. Something like this: ! klassOop Dependencies::check_call_site_target_value(klassOop ctxk, oop call_site, oop method_handle, CallSiteDepChange* changes) { + assert(call_site ->is_a(SystemDictionary::CallSite_klass()), "sanity"); + assert(method_handle->is_a(SystemDictionary::MethodHandle_klass()), "sanity"); + if (changes == NULL) { + // Validate all CallSites + if (java_lang_invoke_CallSite::target(call_site) != method_handle) + return ctxk; // assertion failed + } else { + // Validate the given CallSite + if (call_site == changes->call_site() && java_lang_invoke_CallSite::target(call_site) != changes->method_handle()) { + assert(method_handle != changes->method_handle(), "must be"); + return ctxk; // assertion failed + } + } + assert(java_lang_invoke_CallSite::target(call_site) == method_handle, "should still be valid"); + return NULL; // assertion still valid + } > > The final assert is just a paranoia check that a call site hasn't changed without the dependencies being checked. > > interpreterRuntime.cpp: > > Please move the dependence check code into universe with the other dependence check code. Where it says: // %%% The Universe::flush_foo methods belong in CodeCache. :-) > Also add some comments explaining why it's doing what it's doing. Done. > > doCall.cpp: > > Can you put in a comment explaining that VolatileCallSite is never inlined. Done. > > Otherwise it looks good. webrev updated. 
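The key point in this exchange is that when `changes` is NULL, there is no new target to compare against. The dependency therefore has to carry the target that was inlined. The sketch below models Christian's revised check. It uses hypothetical stand-in structs: `CallSite`, `MethodHandle`, `Klass` and `CallSiteDepChange` here are plain C++ types, not HotSpot oops or klassOops. It illustrates the logic only and is not the code in the webrev.

```cpp
#include <cassert>

// Hypothetical stand-ins for HotSpot's oops and klassOops.
struct MethodHandle {};
struct Klass {};
struct CallSite {
  const MethodHandle* target;
};

// A pending change: 'call_site' is about to be given 'method_handle'.
struct CallSiteDepChange {
  const CallSite* call_site;
  const MethodHandle* method_handle;
};

// Returns ctxk if the dependency is broken, nullptr if it still holds.
// 'recorded_target' is the MethodHandle that was inlined at compile time,
// stored alongside the call site in the dependency.
const Klass* check_call_site_target_value(const Klass* ctxk,
                                          const CallSite* call_site,
                                          const MethodHandle* recorded_target,
                                          const CallSiteDepChange* changes) {
  if (changes == nullptr) {
    // Validate all call sites: no new target exists, so compare with the
    // target recorded in the dependency.
    if (call_site->target != recorded_target) return ctxk;
  } else if (call_site == changes->call_site &&
             call_site->target != changes->method_handle) {
    // The call site this dependency names is being retargeted.
    return ctxk;
  }
  assert(call_site->target == recorded_target && "should still be valid");
  return nullptr;
}
```

Returning `ctxk` signals a broken dependency, following the `// assertion failed` convention in the quoted code. A change to an unrelated call site leaves the dependency intact.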
-- Christian > > tom > > > On Aug 5, 2011, at 6:32 AM, Christian Thalinger wrote: > >> http://cr.openjdk.java.net/~twisti/7071653 >> >> 7071653: JSR 292: call site change notification should be pushed not pulled >> Reviewed-by: >> >> Currently every speculatively inlined method handle call site has a >> guard that compares the current target of the CallSite object to the >> inlined one. This per-invocation overhead can be removed if the >> notification is changed from pulled to pushed (i.e. deoptimization). >> >> I had to change the logic in TemplateTable::patch_bytecode to skip >> bytecode quickening for putfield instructions when the put_code >> written to the constant pool cache is zero. This is required so that >> every execution of a putfield to CallSite.target calls out to >> InterpreterRuntime::resolve_get_put to do the deoptimization of >> depending compiled methods. >> >> I also had to change the dependency machinery to understand other >> dependencies than class hierarchy ones. DepChange got the super-type >> of two new dependencies, KlassDepChange and CallSiteDepChange. >> >> Tested with JRuby tests and benchmarks, hand-written testcases, JDK >> tests and vm.mlvm tests. >> >> Here is the speedup for the JRuby fib benchmark (first is JDK 7 b147, >> second with 7071653). 
Since the CallSite targets don't change during >> the runtime of this benchmark we can see the performance benefit of >> eliminating the guard: >> >> $ jruby --server bench/bench_fib_recursive.rb 5 35 >> 0.883000 0.000000 0.883000 ( 0.854000) >> 0.715000 0.000000 0.715000 ( 0.715000) >> 0.712000 0.000000 0.712000 ( 0.712000) >> 0.713000 0.000000 0.713000 ( 0.713000) >> 0.713000 0.000000 0.713000 ( 0.712000) >> >> $ jruby --server bench/bench_fib_recursive.rb 5 35 >> 0.772000 0.000000 0.772000 ( 0.742000) >> 0.624000 0.000000 0.624000 ( 0.624000) >> 0.621000 0.000000 0.621000 ( 0.621000) >> 0.622000 0.000000 0.622000 ( 0.622000) >> 0.622000 0.000000 0.622000 ( 0.621000) >> > From rednaxelafx at gmail.com Tue Aug 9 06:14:37 2011 From: rednaxelafx at gmail.com (Krystal Mok) Date: Tue, 9 Aug 2011 21:14:37 +0800 Subject: A failed attempt to add Phi::exact_type() to C1 Message-ID: Hi all, I tried to add an implementation of Phi::exact_type() to C1 last weekend, but failed. I'd like to share my experience. Any comments would be appreciated. I was reading a blog post with a microbenchmark [1]. The microbenchmark, when run on the client VM, triggers an OSR compilation of Client1.main() by C1, and none of the method invocations on the local variable "list" were inlined. At first I thought it was weird: since there's only one definition of "list", all of its uses should know its exact type, and thus should be able to get inlined. I tried moving the code in main() to another method, so that it could get a standard compilation, and found those methods were indeed inlined when standard compiled. So apparently the difference had to do with OSRs. Then I realized it was the extra Phi introduced by the OSR entry that lost the exact type information. And it wasn't just in OSRs: Phi nodes in C1 HIR always lose exact type information, because Phi doesn't override Instruction::exact_type().
C1 will not inline "list.size()" in the code snippet below, which contains diamond control flow with different definitions of the same variable:

    public static void test(String[] args) {
      List list;
      if (args.length % 2 == 0) {
        list = new ArrayList();    // a1
      } else {
        list = new ArrayList(32);  // a2
      }
      // a3 = Phi(a1, a2)
      int size = list.size();      // a3.invokeinterface() java/util/List.size()I
      System.out.println(size);
    }

Even if the local variable "list" always holds a reference to an ArrayList instance, the Phi node stops the exact type information from flowing through it, so the "list.size()" call site can't be inlined. I thought I'd be able to fix the problem by adding an implementation of Phi::exact_type(), and I made a patch, available at [2]. The basic idea is simple: if all operands of a Phi node agree on a single exact type, use it as the exact type of this Phi. It relies on two assumptions:

1. Because C1's HIR is in SSA form, the only kind of node that can form cycles in the data dependence graph is Phi. Cycles have to be broken when recursively traversing the operands of a Phi node. If a cycle is found, I'll just give up finding the exact type of this Phi.

2. Unlike C2, which prunes the part of the graph not reachable from the OSR entry point in OSR compilations, C1 always sees the whole graph of a method, whether it is doing a standard or an OSR compilation. If a variable needs a Phi and the operands don't agree on a single exact type, a standard compilation would have noticed. Otherwise, if a variable doesn't need a Phi, or it needs a Phi but the operands agree on a single exact type, that should still hold in an OSR compilation. So, if an operand of a Phi node is an UnsafeGetRaw (which can only be introduced in an OSR entry), skipping it should be safe.

Applying the patch did allow the affected call sites to get inlined in the microbenchmark in [1]. The diamond control flow example got the "list.size()" call site inlined as well. But the patch had a fatal bug.
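The rule described above can be sketched over a toy SSA node type. The rule is: all operands must agree, give up on a cycle, and skip the OSR-entry UnsafeGetRaw operands. Everything here is hypothetical. `Node`, `Kind` and the string-valued `exact_type` stand in for C1's Instruction, Phi and ciType, and this is not the code from the patch.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy SSA node; a hypothetical stand-in for C1's Instruction hierarchy.
enum class Kind { Value, Phi, UnsafeGetRaw };

struct Node {
  Kind kind;
  std::string exact_type;             // "" means unknown (for plain values)
  std::vector<const Node*> operands;  // only used by Phi nodes
};

// Folds every reachable non-Phi operand into 'agreed'. Returns false if two
// operands disagree, an operand has no exact type, or a cycle is found.
static bool collect(const Node* n, std::set<const Node*>& visiting,
                    std::string& agreed) {
  switch (n->kind) {
    case Kind::UnsafeGetRaw:
      return true;  // OSR-entry load: skipped, per assumption 2
    case Kind::Value:
      if (n->exact_type.empty()) return false;
      if (agreed.empty()) agreed = n->exact_type;
      return agreed == n->exact_type;
    case Kind::Phi:
      if (!visiting.insert(n).second) return false;  // cycle: give up
      for (const Node* op : n->operands) {
        if (!collect(op, visiting, agreed)) return false;
      }
      visiting.erase(n);  // allow reaching this Phi again via another path
      return true;
  }
  return false;
}

// Returns the exact type every operand agrees on, or "" if there is none.
std::string phi_exact_type(const Node& phi) {
  assert(phi.kind == Kind::Phi);
  std::set<const Node*> visiting;
  std::string agreed;
  if (!collect(&phi, visiting, agreed)) return "";
  return agreed;
}
```

The result is only sound once every operand is attached. A Phi whose only operand so far is an ArrayList value reports `ArrayList`, and adding a LinkedList operand afterwards changes the answer to unknown. A caller that queries the Phi while the graph is still being built can therefore act on an answer that a later operand contradicts.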
The Java code example in [2] demonstrates that bug. C1 builds the HIR graph incrementally; inline decisions are made as a part of the HIR building process. When C1's GraphBuilder sees an invoke* bytecode, it'll try to devirtualize the call site by asking for the receiver's exact_type(). But the relationships between Phi nodes may still be incomplete by then, so the Phi::exact_type() in my patch may return immature (thus incorrect) results. In the example in [2], the "list.size()" call site at line 18 inlined java.util.ArrayList.size(), which is incorrect. The HIR log shows that when GraphBuilder tried to inline this call site, the receiver (list4 in code comment) had only one operand (list3 in code comment), which covers the first two definitions of "list" (list1 and list2) but missed the third one (list5). The connection between list4 and list5 was added later, too late. So, the patch doesn't work. My questions: 1. Any ideas on how to implement a correct Phi::exact_type() that conforms to the way HIR graph is built now? 2. If the inline decisions are decoupled from the HIR graph building phase, and pushed to a later phase, would it significantly slow down/complicate C1? If it was done later, it could have allowed a much better chance of inlining more stuff. Besides, it might allow policy-controlled iterations of other optimizations + inlining, so in tiered mode warm methods may get finer grain control of optimizations, and result in better code quality (suppose it couldn't go to tier 4, or got deopt'd and fell to tier 1). Regards, Kris Mok [1]: http://icyfenix.iteye.com/blog/1110279 [2]: https://gist.github.com/1133678 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20110809/585331bc/attachment.html From tom.rodriguez at oracle.com Tue Aug 9 14:02:07 2011 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Tue, 9 Aug 2011 14:02:07 -0700 Subject: Request for reviews (L): 7063629: use cbcond in C2 generated code on T4 In-Reply-To: <4E3B452E.10509@oracle.com> References: <4E3B452E.10509@oracle.com> Message-ID: This looks really good. This might be for another day but now that label must be non-NULL, maybe it should be a Label& instead of a Label*. That would make it easier to use it directly during code generation, as in: + __ jmpb($labl$$label); sparc.ad: It might be nice to factor this out: Assembler::Predict predict_taken = + cbuf.is_backward_branch(*L) ? Assembler::pt : Assembler::pn; x86_32.ad: Would you be averse to inlining Jcc and JccShort? output.cpp: Why does the first round of shorten_branches occur in the middle of init_buffer? Couldn't it be done right afterwards? It's just odd that it's buried inside there. That first round is conservative since we haven't done all padding yet, right? Then shorten_branches_final does a last pass based on the real offsets? shorten_branches_final isn't a great name. Maybe finalize_offsets_and_shorten? The core shorten branch logic is duplicated in those functions. Could it be factored out or is there too much local state? Why was this needed? *** 2182,2192 **** --- 2383,2393 ---- (op != Op_Node && // Not an unused antidepedence node and // not an unallocated boxlock (OptoReg::is_valid(_regalloc->get_reg_first(n)) || op != Op_BoxLock)) ) { // Push any trailing projections ! if( bb->_nodes[bb->_nodes.size()-1] != n ) { ! if( bb->_nodes[_bb_end-1] != n ) { for (DUIterator_Fast imax, i = n->fast_outs(imax); i < imax; i++) { Node *foi = n->fast_out(i); if( foi->is_Proj() ) _scheduled.push(foi); } That code is complicated enough that I can't reason about its correctness from a webrev.
Is this because of the trailing NOPs? Can you add this comment to that last anti_do_def piece I added: // kill projections on a branch should appear to occur on the // branch, not afterwards, so grab the masks from the projections // and process them. tom On Aug 4, 2011, at 6:19 PM, Vladimir Kozlov wrote: > http://cr.openjdk.java.net/~kvn/7063629/webrev > > 7063629: use cbcond in C2 generated code on T4 > > The code is finally shaped as I want and it passed CTW, regression, nsk tests on T4 and x86. > > Added new fused compare and branch instructions into sparc.ad and corresponding short versions which use the cbcond instruction. Added new flag avoid_back_to_back to avoid generation of cbcond back to back. > > Split shorten_branches() into 2 methods. The first method conservatively estimates code size and branch locations and does a few rounds of branch shortening. It is executed before ScheduleAndBundle(). Step 3 is moved to new method shorten_branches_final() called after ScheduleAndBundle(). It does final paddings, alignment and final branch replacement. Method fill_buffer() does verification instead of padding. > > Labels are now bound only during code generation in fill_buffer(). As a result, they are not available when forward branches are emitted. To fix that, MacroAssembler branch instructions are now used in x86 .ad files. I replaced the unused rtype parameter with a maybe_short flag to force using only long branches in .ad long branch instructions. > > Added check to adlc to verify that the short version of a branch instruction has the same declaration in the .ad file. > > Added assert to verify that the size of emitted instruction matches the value returned by MachNode::size(). Found that MachBreakpointNode::size() returned an incorrect value on x64. > > Fixed loop alignment for Sparc (min alignment should be instruction size which is 4 bytes instead of 1 byte). > > The prototype was done by Tom and I took some of his additional fixes.
The block changes go with some code in output to put opto assembly style block comments in the PrintNMethods output. There's also a snippet in there that deals with the fact that kill projections on branches make it appear the kill occurs after the branch instead of being part of it. From vladimir.kozlov at oracle.com Tue Aug 9 15:55:37 2011 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 09 Aug 2011 15:55:37 -0700 Subject: Request for reviews (L): 7063629: use cbcond in C2 generated code on T4 In-Reply-To: References: <4E3B452E.10509@oracle.com> Message-ID: <4E41BAE9.1070505@oracle.com> Thank you, Tom Tom Rodriguez wrote: > This looks really good. > > This might be for another day but now that label must be non-NULL, maybe it should be a Label& instead of a Label*. That would make it easier to use it directly during code generation, as in: > > + __ jmpb($labl$$label); Yes, I would leave it for another time. I will file an RFE. > > sparc.ad: > > It might be nice to factor this out: > > Assembler::Predict predict_taken = > + cbuf.is_backward_branch(*L) ? Assembler::pt : Assembler::pn; I will file an RFE for that: use the probability from IfNode to determine the pt value, as you suggested before. > > x86_32.ad: > > Would you be averse to inlining Jcc and JccShort? I did not realize that it is just one instruction now :) They are used in a lot of places and I did not want to duplicate the original code. I will inline them now. > > output.cpp: > > Why does the first round of shorten_branches occur in the middle of init_buffer? Couldn't it be done right afterwards? It's just odd that it's buried inside there. The first loop in shorten_branches() estimates the code, locals, and stubs sizes, which are used later in init_buffer() to allocate the CodeBuffer. I would need to split the shorten_branches() method, which is not easy since the first loop also collects information about branches which could be replaced. > > That first round is conservative since we haven't done all padding yet, right?
Correct. > Then shorten_branches_final does a last pass based on the real offsets? Yes, backward branches inserted in this method use final offsets. For forward branches we still have only conservative offsets, since the following blocks have not been processed yet. > shorten_branches_final isn't a great name. Maybe finalize_offsets_and_shorten? I also did not like it; I will use finalize_offsets_and_shorten(). > > The core shorten branch logic is duplicated in those functions. Could it be factored out or is there too much local state? I thought about it, but as you said, "too much local state". > > Why was this needed? > > *** 2182,2192 **** > --- 2383,2393 ---- > (op != Op_Node && // Not an unused antidepedence node and > // not an unallocated boxlock > (OptoReg::is_valid(_regalloc->get_reg_first(n)) || op != Op_BoxLock)) ) { > > // Push any trailing projections > ! if( bb->_nodes[bb->_nodes.size()-1] != n ) { > ! if( bb->_nodes[_bb_end-1] != n ) { > for (DUIterator_Fast imax, i = n->fast_outs(imax); i < imax; i++) { > Node *foi = n->fast_out(i); > if( foi->is_Proj() ) > _scheduled.push(foi); > } > > That code is complicated enough that I can't reason about its correctness from a webrev. Is this because of the trailing NOPs? I hit the following assert during development because the loop above pushed nodes which are not meant to be scheduled: assert( _scheduled.size() == _bb_end - _bb_start, "wrong number of instructions" ); It may have happened before I split shorten_branches() and there were trailing NOPs. But it is not only trailing NOPs; it is also projections after calls and MachNullCheck nodes (see code in DoScheduling()). I think in general the check above should check the last node to be scheduled and not the last node in the block. > > Can you add this comment to that last anti_do_def piece I added: > > // kill projections on a branch should appear to occur on the > // branch, not afterwards, so grab the masks from the projections > // and process them. Done.
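The conservative first round discussed here is an instance of branch relaxation. Assume every branch is long, compute offsets, shrink each branch whose displacement already fits the short form, and repeat until nothing changes. The sketch below shows that general technique. It is not HotSpot's shorten_branches(), and it makes several simplifying assumptions. It ignores padding and alignment, which the thread says the final pass handles. It allows one branch at the end of each block. It uses the sizes and range of an x86 unconditional `jmp`: 5 bytes with rel32, 2 bytes with rel8. `Block` and the constants are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: each block holds straight-line code followed by an
// optional branch to another block (target == -1 means no branch).
struct Block {
  int body_size;
  int target;
  bool is_short = false;
};

// Illustrative sizes/range of an x86 unconditional jmp (rel32 vs rel8).
constexpr int kLongSize = 5;
constexpr int kShortSize = 2;
constexpr int kShortMin = -128;
constexpr int kShortMax = 127;

static int branch_size(const Block& b) {
  if (b.target < 0) return 0;
  return b.is_short ? kShortSize : kLongSize;
}

// start[i] is the offset of block i; start.back() is the total code size.
static std::vector<int> compute_offsets(const std::vector<Block>& blocks) {
  std::vector<int> start(blocks.size() + 1, 0);
  for (std::size_t i = 0; i < blocks.size(); i++) {
    start[i + 1] = start[i] + blocks[i].body_size + branch_size(blocks[i]);
  }
  return start;
}

// Starts with every branch long and shrinks until nothing changes. Code only
// ever gets smaller, so offsets from the start of a pass overestimate
// distances, and a branch that was made short stays in range.
int shorten_branches(std::vector<Block>& blocks) {
  std::vector<int> start = compute_offsets(blocks);
  bool changed = true;
  while (changed) {
    changed = false;
    for (std::size_t i = 0; i < blocks.size(); i++) {
      Block& b = blocks[i];
      if (b.target < 0 || b.is_short) continue;
      assert(b.target < static_cast<int>(blocks.size()));
      // Displacement is measured from the end of the (short) branch.
      int branch_end = start[i] + b.body_size + kShortSize;
      int target_start = start[b.target];
      // A forward target moves closer when this branch itself shrinks.
      if (b.target > static_cast<int>(i)) target_start -= kLongSize - kShortSize;
      int disp = target_start - branch_end;
      if (disp >= kShortMin && disp <= kShortMax) {
        b.is_short = true;
        changed = true;
      }
    }
    start = compute_offsets(blocks);
  }
  return start.back();
}
```

For example, blocks {10, 2}, {50, -1}, {10, 0} end up with both branches short and a total size of 74 bytes. Widening the middle block to 200 bytes leaves both branches long (230 bytes).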
Thanks, Vladimir > > tom > > > On Aug 4, 2011, at 6:19 PM, Vladimir Kozlov wrote: > >> http://cr.openjdk.java.net/~kvn/7063629/webrev >> >> 7063629: use cbcond in C2 generated code on T4 >> >> The code is finally shaped as I want and it passed CTW, regression, nsk tests on T4 and x86. >> >> Added new fused compare and branch instructions into sparc.ad and corresponding short versions which use the cbcond instruction. Added new flag avoid_back_to_back to avoid generation of cbcond back to back. >> >> Split shorten_branches() into 2 methods. The first method conservatively estimates code size and branch locations and does a few rounds of branch shortening. It is executed before ScheduleAndBundle(). Step 3 is moved to new method shorten_branches_final() called after ScheduleAndBundle(). It does final paddings, alignment and final branch replacement. Method fill_buffer() does verification instead of padding. >> >> Labels are now bound only during code generation in fill_buffer(). As a result, they are not available when forward branches are emitted. To fix that, MacroAssembler branch instructions are now used in x86 .ad files. I replaced the unused rtype parameter with a maybe_short flag to force using only long branches in .ad long branch instructions. >> >> Added check to adlc to verify that the short version of a branch instruction has the same declaration in the .ad file. >> >> Added assert to verify that the size of emitted instruction matches the value returned by MachNode::size(). Found that MachBreakpointNode::size() returned an incorrect value on x64. >> >> Fixed loop alignment for Sparc (min alignment should be instruction size which is 4 bytes instead of 1 byte). >> >> The prototype was done by Tom and I took some of his additional fixes. The block changes go with some code in output to put opto assembly style block comments in the PrintNMethods output.
There's also a snippet in there that deals with the fact that kill projections on branches make it appear the kill occurs after the branch instead of being part of it. > From roland.westrelin at oracle.com Wed Aug 10 05:00:28 2011 From: roland.westrelin at oracle.com (roland.westrelin at oracle.com) Date: Wed, 10 Aug 2011 12:00:28 +0000 Subject: hg: hsx/hotspot-comp/hotspot: 7074017: Introduce MemBarAcquireLock/MemBarReleaseLock nodes for monitor enter/exit code paths Message-ID: <20110810120032.DC88047A8F@hg.openjdk.java.net> Changeset: f1c12354c3f7 Author: roland Date: 2011-08-02 18:36 +0200 URL: http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/f1c12354c3f7 7074017: Introduce MemBarAcquireLock/MemBarReleaseLock nodes for monitor enter/exit code paths Summary: replace MemBarAcquire/MemBarRelease nodes on the monitor enter/exit code paths with new MemBarAcquireLock/MemBarReleaseLock nodes Reviewed-by: kvn, twisti ! src/cpu/sparc/vm/sparc.ad ! src/cpu/x86/vm/x86_32.ad ! src/cpu/x86/vm/x86_64.ad ! src/share/vm/adlc/formssel.cpp ! src/share/vm/opto/classes.hpp ! src/share/vm/opto/graphKit.cpp ! src/share/vm/opto/macro.cpp ! src/share/vm/opto/matcher.cpp ! src/share/vm/opto/matcher.hpp ! src/share/vm/opto/memnode.cpp !
src/share/vm/opto/memnode.hpp From tom.rodriguez at oracle.com Wed Aug 10 12:28:07 2011 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Wed, 10 Aug 2011 12:28:07 -0700 Subject: Request for review (L): 7071653: JSR 292: call site change notification should be pushed not pulled In-Reply-To: <6908A407-5908-4B30-8540-E6474B96DBA9@oracle.com> References: <34EE7AEC-FD11-4526-B49D-DCEA296E767A@oracle.com> <6908A407-5908-4B30-8540-E6474B96DBA9@oracle.com> Message-ID: <15BDEB85-0323-4026-A249-D979D88E863B@oracle.com> On Aug 9, 2011, at 4:33 AM, Christian Thalinger wrote: > > On Aug 8, 2011, at 8:49 PM, Tom Rodriguez wrote: > >> dependencies.cpp: >> >> in check_call_site_target_value, the changes == NULL case should be checking that the call site hasn't changed. It should probably look more like this: >> >> klassOop Dependencies::check_call_site_target_value(klassOop ctxk, oop call_site, CallSiteDepChange* changes) { >> assert(call_site->is_a(SystemDictionary::CallSite_klass()), "sanity"); >> // Same CallSite object but different target? Check this specific call site >> // if changes is non-NULL or validate all CallSites >> if ((changes == NULL || (call_site == changes->call_site())) && >> (java_lang_invoke_CallSite::target(call_site) != changes->method_handle())) { >> return ctxk; // assertion failed >> } >> assert(java_lang_invoke_CallSite::target(call_site) == changes->method_handle(), "should still be valid"); >> return NULL; // assertion still valid >> } > > I see your point. But the code above is broken as changes->method_handle() will not work when changes == NULL. One of my first versions of this code also stored the MethodHandle target in the dependence stream which seems to be required when we want to validate all CallSites. Something like this Yes, that's right. The new webrev looks good. tom > > !
klassOop Dependencies::check_call_site_target_value(klassOop ctxk, oop call_site, oop method_handle, CallSiteDepChange* changes) { > + assert(call_site ->is_a(SystemDictionary::CallSite_klass()), "sanity"); > + assert(method_handle->is_a(SystemDictionary::MethodHandle_klass()), "sanity"); > + if (changes == NULL) { > + // Validate all CallSites > + if (java_lang_invoke_CallSite::target(call_site) != method_handle) > + return ctxk; // assertion failed > + } else { > + // Validate the given CallSite > + if (call_site == changes->call_site() && java_lang_invoke_CallSite::target(call_site) != changes->method_handle()) { > + assert(method_handle != changes->method_handle(), "must be"); > + return ctxk; // assertion failed > + } > + } > + assert(java_lang_invoke_CallSite::target(call_site) == method_handle, "should still be valid"); > + return NULL; // assertion still valid > + } > >> >> The final assert is just a paranoia check that a call site hasn't changed without the dependencies being checked. >> >> interpreterRuntime.cpp: >> >> Please move the dependence check code into universe with the other dependence check code. > > Where it says: > > // %%% The Universe::flush_foo methods belong in CodeCache. > > :-) > >> Also add some comments explaining why it's doing what it's doing. > > Done. > >> >> doCall.cpp: >> >> Can you put in a comment explaining that VolatileCallSite is never inlined. > > Done. > >> >> Otherwise it looks good. > > webrev updated. > > -- Christian > >> >> tom >> >> >> On Aug 5, 2011, at 6:32 AM, Christian Thalinger wrote: >> >>> http://cr.openjdk.java.net/~twisti/7071653 >>> >>> 7071653: JSR 292: call site change notification should be pushed not pulled >>> Reviewed-by: >>> >>> Currently every speculatively inlined method handle call site has a >>> guard that compares the current target of the CallSite object to the >>> inlined one. This per-invocation overhead can be removed if the >>> notification is changed from pulled to pushed (i.e. 
deoptimization). >>> >>> I had to change the logic in TemplateTable::patch_bytecode to skip >>> bytecode quickening for putfield instructions when the put_code >>> written to the constant pool cache is zero. This is required so that >>> every execution of a putfield to CallSite.target calls out to >>> InterpreterRuntime::resolve_get_put to do the deoptimization of >>> depending compiled methods. >>> >>> I also had to change the dependency machinery to understand other >>> dependencies than class hierarchy ones. DepChange got the super-type >>> of two new dependencies, KlassDepChange and CallSiteDepChange. >>> >>> Tested with JRuby tests and benchmarks, hand-written testcases, JDK >>> tests and vm.mlvm tests. >>> >>> Here is the speedup for the JRuby fib benchmark (first is JDK 7 b147, >>> second with 7071653). Since the CallSite targets don't change during >>> the runtime of this benchmark we can see the performance benefit of >>> eliminating the guard: >>> >>> $ jruby --server bench/bench_fib_recursive.rb 5 35 >>> 0.883000 0.000000 0.883000 ( 0.854000) >>> 0.715000 0.000000 0.715000 ( 0.715000) >>> 0.712000 0.000000 0.712000 ( 0.712000) >>> 0.713000 0.000000 0.713000 ( 0.713000) >>> 0.713000 0.000000 0.713000 ( 0.712000) >>> >>> $ jruby --server bench/bench_fib_recursive.rb 5 35 >>> 0.772000 0.000000 0.772000 ( 0.742000) >>> 0.624000 0.000000 0.624000 ( 0.624000) >>> 0.621000 0.000000 0.621000 ( 0.621000) >>> 0.622000 0.000000 0.622000 ( 0.622000) >>> 0.622000 0.000000 0.622000 ( 0.621000) >>> >> > From christian.thalinger at oracle.com Wed Aug 10 12:34:27 2011 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Wed, 10 Aug 2011 21:34:27 +0200 Subject: Request for review (L): 7071653: JSR 292: call site change notification should be pushed not pulled In-Reply-To: <15BDEB85-0323-4026-A249-D979D88E863B@oracle.com> References: <34EE7AEC-FD11-4526-B49D-DCEA296E767A@oracle.com> <6908A407-5908-4B30-8540-E6474B96DBA9@oracle.com> 
<15BDEB85-0323-4026-A249-D979D88E863B@oracle.com> Message-ID: <27ED8701-5353-4929-B9F1-D5A4F7A361B4@oracle.com> On Aug 10, 2011, at 9:28 PM, Tom Rodriguez wrote: > > On Aug 9, 2011, at 4:33 AM, Christian Thalinger wrote: > >> >> On Aug 8, 2011, at 8:49 PM, Tom Rodriguez wrote: >> >>> dependencies.cpp: >>> >>> in check_call_site_target_value, the changes == NULL case should be checking that the call site hasn't changed. It should probably look more like this: >>> >>> klassOop Dependencies::check_call_site_target_value(klassOop ctxk, oop call_site, CallSiteDepChange* changes) { >>> assert(call_site->is_a(SystemDictionary::CallSite_klass()), "sanity"); >>> // Same CallSite object but different target? Check this specific call site >>> // if changes is non-NULL or validate all CallSites >>> if ((changes == NULL || (call_site == changes->call_site())) && >>> (java_lang_invoke_CallSite::target(call_site) != changes->method_handle())) { >>> return ctxk; // assertion failed >>> } >>> assert(java_lang_invoke_CallSite::target(call_site) == changes->method_handle(), "should still be valid"); >>> return NULL; // assertion still valid >>> } >> >> I see your point. But the code above is broken as changes->method_handle() will not work when changes == NULL. One of my first versions of this code also stored the MethodHandle target in the dependence stream which seems to be required when we want to validate all CallSites. Something like this > > Yes, that's right. The new webrev looks good. Thank you, Tom. -- Christian > > tom > > >> >> !
klassOop Dependencies::check_call_site_target_value(klassOop ctxk, oop call_site, oop method_handle, CallSiteDepChange* changes) { >> + assert(call_site ->is_a(SystemDictionary::CallSite_klass()), "sanity"); >> + assert(method_handle->is_a(SystemDictionary::MethodHandle_klass()), "sanity"); >> + if (changes == NULL) { >> + // Validate all CallSites >> + if (java_lang_invoke_CallSite::target(call_site) != method_handle) >> + return ctxk; // assertion failed >> + } else { >> + // Validate the given CallSite >> + if (call_site == changes->call_site() && java_lang_invoke_CallSite::target(call_site) != changes->method_handle()) { >> + assert(method_handle != changes->method_handle(), "must be"); >> + return ctxk; // assertion failed >> + } >> + } >> + assert(java_lang_invoke_CallSite::target(call_site) == method_handle, "should still be valid"); >> + return NULL; // assertion still valid >> + } >> >>> >>> The final assert is just a paranoia check that a call site hasn't changed without the dependencies being checked. >>> >>> interpreterRuntime.cpp: >>> >>> Please move the dependence check code into universe with the other dependence check code. >> >> Where it says: >> >> // %%% The Universe::flush_foo methods belong in CodeCache. >> >> :-) >> >>> Also add some comments explaining why it's doing what it's doing. >> >> Done. >> >>> >>> doCall.cpp: >>> >>> Can you put in a comment explaining that VolatileCallSite is never inlined. >> >> Done. >> >>> >>> Otherwise it looks good. >> >> webrev updated. >> >> -- Christian >> >>> >>> tom >>> >>> >>> On Aug 5, 2011, at 6:32 AM, Christian Thalinger wrote: >>> >>>> http://cr.openjdk.java.net/~twisti/7071653 >>>> >>>> 7071653: JSR 292: call site change notification should be pushed not pulled >>>> Reviewed-by: >>>> >>>> Currently every speculatively inlined method handle call site has a >>>> guard that compares the current target of the CallSite object to the >>>> inlined one. 
>>>> This per-invocation overhead can be removed if the
>>>> notification is changed from pulled to pushed (i.e. deoptimization).
>>>>
>>>> I had to change the logic in TemplateTable::patch_bytecode to skip
>>>> bytecode quickening for putfield instructions when the put_code
>>>> written to the constant pool cache is zero. This is required so that
>>>> every execution of a putfield to CallSite.target calls out to
>>>> InterpreterRuntime::resolve_get_put to do the deoptimization of
>>>> depending compiled methods.
>>>>
>>>> I also had to change the dependency machinery to understand
>>>> dependencies other than class hierarchy ones. DepChange is now the
>>>> supertype of two new classes, KlassDepChange and CallSiteDepChange.
>>>>
>>>> Tested with JRuby tests and benchmarks, hand-written testcases, JDK
>>>> tests and vm.mlvm tests.
>>>>
>>>> Here is the speedup for the JRuby fib benchmark (first is JDK 7 b147,
>>>> second with 7071653). Since the CallSite targets don't change during
>>>> the runtime of this benchmark we can see the performance benefit of
>>>> eliminating the guard:
>>>>
>>>> $ jruby --server bench/bench_fib_recursive.rb 5 35
>>>> 0.883000 0.000000 0.883000 ( 0.854000)
>>>> 0.715000 0.000000 0.715000 ( 0.715000)
>>>> 0.712000 0.000000 0.712000 ( 0.712000)
>>>> 0.713000 0.000000 0.713000 ( 0.713000)
>>>> 0.713000 0.000000 0.713000 ( 0.712000)
>>>>
>>>> $ jruby --server bench/bench_fib_recursive.rb 5 35
>>>> 0.772000 0.000000 0.772000 ( 0.742000)
>>>> 0.624000 0.000000 0.624000 ( 0.624000)
>>>> 0.621000 0.000000 0.621000 ( 0.621000)
>>>> 0.622000 0.000000 0.622000 ( 0.622000)
>>>> 0.622000 0.000000 0.622000 ( 0.621000)
>>>>
>>>
>>
>
From vladimir.kozlov at oracle.com Wed Aug 10 12:47:51 2011
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 10 Aug 2011 12:47:51 -0700
Subject: Request for reviews (S): 7077439: Possible reference through NULL in loopPredicate.cpp:726
Message-ID: <4E42E067.8020302@oracle.com>
http://cr.openjdk.java.net/~kvn/7077439/webrev

Fixed 7077439: Possible reference through NULL in loopPredicate.cpp:726

VM crashed at the next line because cl->loopexit() == NULL when I tried to port 7070134 into previous Hotspot sources:

BoolTest::mask bt = cl->loopexit()->test_trip();

I did not see such a crash with the latest HS22 sources, but it does not mean it can't happen. The check cl->is_valid_counted_loop() should be used in the code to avoid such a crash. Note, this check is a superset of cl->stride_is_con(), so the latter could be replaced.

From tom.rodriguez at oracle.com Wed Aug 10 12:52:28 2011
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Wed, 10 Aug 2011 12:52:28 -0700
Subject: IdealGraphVisualizer file compatibility
In-Reply-To:
References: <60867EAC-E2A7-4EA1-9A86-1B421C328693@oracle.com>
Message-ID: <16906A61-ADC0-4700-A1B8-5082604F8420@oracle.com>

On Aug 3, 2011, at 10:00 AM, Joe Kearney wrote:

> Oh ok, I didn't realise. Thanks. Are there any plans to make it more
> widely available? I can see it being useful for experimenting to
> squeeze performance.

We don't have any current plans. We've tended not to include developer-specific features in the product binary, mainly to avoid making an already large library even larger. Admittedly, IGV support is pretty small code-size-wise.

tom

>
> Thanks,
> Joe
>
> On 3 August 2011 17:42, Tom Rodriguez wrote:
>> It's not available in the product as it's really intended for developers. Use a fastdebug build.
>>
>> tom
>>
>> On Aug 3, 2011, at 9:37 AM, Joe Kearney wrote:
>>
>>> Ah, thanks for the readme link.
>>>
>>> I can't get hotspot 1.6.0_25 or 1.7.0 to recognise the
>>> PrintIdealGraphLevel/PrintIdealGraphFile options. I tried with
>>> UnlockDiagnosticVMOptions etc. as well, to no avail. Is there something
>>> else needed to expose this?
>>>
>>> Joe
>>>
>>> On 3 August 2011 15:51, Christian Thalinger
>>> wrote:
>>>> You want: -XX:PrintIdealGraphLevel=1 -XX:PrintIdealGraphFile=output.xml
>>>>
>>>> The README of the visualizer also helps:
>>>>
>>>> http://hg.openjdk.java.net/hsx/hotspot-main/hotspot/file/tip/src/share/tools/IdealGraphVisualizer/README
>>>>
>>>> -- Christian
>>>>
>>>> On Aug 3, 2011, at 4:17 PM, Joe Kearney wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I've been trying to play with igv from
>>>>> http://ssw.jku.at/General/Staff/TW/igv.html,
>>>>> http://ssw.jku.at/General/Staff/PH/ but I don't know how to generate
>>>>> the required log files. What sort of files should I expect the igv to
>>>>> be able to read? The example files are graphDocument XMLs. I was
>>>>> hoping to be able to generate a file with something like the
>>>>> following:
>>>>>
>>>>> -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation -XX:LogFile=output.xml
>>>>>
>>>>> Needless to say, these hotspot_log files are totally different and the
>>>>> igv barfs with the below.
>>>>>
>>>>> java.lang.NullPointerException
>>>>> at com.sun.hotspot.igv.data.GraphDocument.addGraphDocument(GraphDocument.java:70)
>>>>> at com.sun.hotspot.igv.coordinator.actions.ImportAction$3.run(ImportAction.java:128)
>>>>> at org.openide.util.RequestProcessor$Task.run(RequestProcessor.java:572)
>>>>> [catch] at org.openide.util.RequestProcessor$Processor.run(RequestProcessor.java:997)
>>>>>
>>>>>
>>>>> How do I get the jvm to generate the right output file?
>>>>>
>>>>> Many thanks,
>>>>> Joe
>>>>
>>>>
>>
>>
From tom.rodriguez at oracle.com Wed Aug 10 13:07:03 2011
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Wed, 10 Aug 2011 13:07:03 -0700
Subject: Request for reviews (S): 7077439: Possible reference through NULL in loopPredicate.cpp:726
In-Reply-To: <4E42E067.8020302@oracle.com>
References: <4E42E067.8020302@oracle.com>
Message-ID:

Looks good.
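The guard pattern behind 7077439 can be sketched in Java as follows. The types here are assumptions: CountedLoop, LoopExit, TripTest and tripTestOrNull are hypothetical stand-ins for C2's counted-loop nodes and BoolTest::mask, not HotSpot classes. The sketch shows one point only. A check that includes the stride_is_con() test and also rejects a missing loop exit keeps the test_trip() dereference from ever receiving a NULL exit.

```java
// Hypothetical stand-ins for C2's counted-loop nodes; these are NOT HotSpot classes.
final class CountedLoopGuardSketch {

    // Illustrative subset of comparison kinds a loop-exit test might use.
    enum TripTest { LT, LE, GT, GE, NE }

    // Models the loop-exit node whose test_trip() was dereferenced in the crash.
    static final class LoopExit {
        private final TripTest trip;

        LoopExit(TripTest trip) {
            this.trip = trip;
        }

        TripTest testTrip() {
            return trip;
        }
    }

    // Models a counted loop whose exit may have been removed (null)
    // and whose stride may not be a compile-time constant (null).
    static final class CountedLoop {
        private final LoopExit loopExit;
        private final Integer stride;

        CountedLoop(LoopExit loopExit, Integer stride) {
            this.loopExit = loopExit;
            this.stride = stride;
        }

        LoopExit loopExit() {
            return loopExit;
        }

        boolean strideIsCon() {
            return stride != null;
        }

        // Superset of strideIsCon(): it also requires a live loop exit.
        boolean isValidCountedLoop() {
            return loopExit != null && strideIsCon();
        }
    }

    // Returns the exit test, or null when the loop is not in a usable shape.
    static TripTest tripTestOrNull(CountedLoop cl) {
        // Checking only strideIsCon() here would let a null exit through
        // and fail on the dereference below.
        if (!cl.isValidCountedLoop()) {
            return null;
        }
        return cl.loopExit().testTrip();
    }
}
```

In this model, a loop with a constant stride but no exit still passes strideIsCon(). The broader isValidCountedLoop() check filters out exactly that case before testTrip() is called.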
tom

On Aug 10, 2011, at 12:47 PM, Vladimir Kozlov wrote:

> http://cr.openjdk.java.net/~kvn/7077439/webrev
>
> Fixed 7077439: Possible reference through NULL in loopPredicate.cpp:726
>
> VM crashed at the next line because cl->loopexit() == NULL when I tried to port 7070134 into previous Hotspot sources:
>
> BoolTest::mask bt = cl->loopexit()->test_trip();
>
> I did not see such a crash with the latest HS22 sources, but it does not mean it can't happen. The check cl->is_valid_counted_loop() should be used in the code to avoid such a crash. Note, this check is a superset of cl->stride_is_con(), so the latter could be replaced.
>
From vladimir.kozlov at oracle.com Wed Aug 10 14:01:03 2011
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 10 Aug 2011 14:01:03 -0700
Subject: Request for reviews (S): 7077439: Possible reference through NULL in loopPredicate.cpp:726
In-Reply-To:
References: <4E42E067.8020302@oracle.com>
Message-ID: <4E42F18F.5060001@oracle.com>

Thank you, Tom

Vladimir

Tom Rodriguez wrote:
> Looks good.
>
> tom
>
> On Aug 10, 2011, at 12:47 PM, Vladimir Kozlov wrote:
>
>> http://cr.openjdk.java.net/~kvn/7077439/webrev
>>
>> Fixed 7077439: Possible reference through NULL in loopPredicate.cpp:726
>>
>> VM crashed at the next line because cl->loopexit() == NULL when I tried to port 7070134 into previous Hotspot sources:
>>
>> BoolTest::mask bt = cl->loopexit()->test_trip();
>>
>> I did not see such a crash with the latest HS22 sources, but it does not mean it can't happen. The check cl->is_valid_counted_loop() should be used in the code to avoid such a crash. Note, this check is a superset of cl->stride_is_con(), so the latter could be replaced.
>>
>
From vladimir.kozlov at oracle.com Wed Aug 10 18:12:36 2011
From: vladimir.kozlov at oracle.com (vladimir.kozlov at oracle.com)
Date: Thu, 11 Aug 2011 01:12:36 +0000
Subject: hg: hsx/hotspot-comp/hotspot: 7077439: Possible reference through NULL in loopPredicate.cpp:726
Message-ID: <20110811011238.EA27A47AB5@hg.openjdk.java.net>

Changeset: 6987871cfb9b
Author: kvn
Date: 2011-08-10 14:06 -0700
URL: http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/6987871cfb9b

7077439: Possible reference through NULL in loopPredicate.cpp:726
Summary: Use cl->is_valid_counted_loop() check.
Reviewed-by: never

! src/share/vm/opto/loopPredicate.cpp
! src/share/vm/opto/loopTransform.cpp
! src/share/vm/opto/loopnode.cpp
! src/share/vm/opto/superword.cpp

From vladimir.kozlov at oracle.com Thu Aug 11 11:22:04 2011
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 11 Aug 2011 11:22:04 -0700
Subject: Request for reviews (L): 7063629: use cbcond in C2 generated code on T4
In-Reply-To: <4E41BAE9.1070505@oracle.com>
References: <4E3B452E.10509@oracle.com> <4E41BAE9.1070505@oracle.com>
Message-ID: <4E441DCC.5040303@oracle.com>

>> !       if( bb->_nodes[bb->_nodes.size()-1] != n ) {
>> !       if( bb->_nodes[_bb_end-1] != n ) {
>>
>> That code is complicated enough that I can't reason about its
>> correctness from a webrev. Is this because of the trailing NOPs?
>
> I hit the following assert during development because the loop above pushed nodes
> which are not for schedule.
>
> assert( _scheduled.size() == _bb_end - _bb_start, "wrong number of instructions" );
>
> It may have happened before I split shorten_branches() and there were
> trailing NOPs. But it is not only trailing NOPs, it is also projections
> after calls and MachNullCheck nodes (see code in DoScheduling()). I
> think in general the check above should check the last node for schedule
> and not the last node in block.
Tom,

I ran full CTW without this change with my latest changes and did not hit the assert, which confirms that it was a problem in early development, when trailing NOPs were inserted before the DoScheduling() call. Do you think I should remove this change?

Thanks,
Vladimir

Vladimir Kozlov wrote:
> Thank you, Tom
>
> Tom Rodriguez wrote:
>> This looks really good.
>>
>> This might be for another day but now that label must be non-NULL,
>> maybe it should be a Label& instead of a Label*. That would make it
>> easier to use it directly during code generation, as in:
>>
>> + __ jmpb($labl$$label);
>
> Yes, I would leave it for another time. I will file RFE.
>
>>
>> sparc.ad:
>>
>> It might be nice to factor this out:
>>
>> Assembler::Predict predict_taken =
>> + cbuf.is_backward_branch(*L) ? Assembler::pt : Assembler::pn;
>
> I will file RFE for that: use probability from IfNode to determine the
> pt value as you suggested before.
>
>>
>> x86_32.ad:
>>
>> Would you be averse to inlining Jcc and JccShort?
>
> I did not realize that it is just one instruction now :)
> They are used in a lot of places and I did not want to duplicate the
> original code. I will inline them now.
>
>>
>> output.cpp:
>>
>> Why does the first round of shorten_branches occur in the middle of
>> init_buffer? Couldn't it be done right afterwards? It's just odd
>> that it's buried inside there.
>
> First loop in shorten_branches() estimates code, locals, stubs sizes
> which are used later in init_buffer() to allocate CodeBuffer. I would
> need to split shorten_branches() method which is not easy since the
> first loop also collects information about branches which could be
> replaced.
>
>>
>> That first round is conservative since we haven't done all padding
>> yet, right?
>
> Correct.
>
>> Then shorten_branches_final does a last pass based on the real offsets?
>
> Yes, backward branches inserted in this method use final offsets.
> For forward branches we still have only conservative offsets since following
> blocks are not processed yet.
>
>> shorten_branches_final isn't a great name. Maybe
>> finalize_offsets_and_shorten?
>
> I also did not like it, I will use finalize_offsets_and_shorten()
>
>>
>> The core shorten branch logic is duplicated in those functions. Could
>> it be factored out or is there too much local state?
>
> I thought about it but as you said "too much local state".
>
>>
>> Why was this needed?
>>
>> *** 2182,2192 ****
>> --- 2383,2393 ----
>>         (op != Op_Node &&         // Not an unused antidepedence node and
>>          // not an unallocated boxlock
>>          (OptoReg::is_valid(_regalloc->get_reg_first(n)) || op != Op_BoxLock)) ) {
>>         // Push any trailing projections
>> !       if( bb->_nodes[bb->_nodes.size()-1] != n ) {
>> !       if( bb->_nodes[_bb_end-1] != n ) {
>>           for (DUIterator_Fast imax, i = n->fast_outs(imax); i < imax; i++) {
>>             Node *foi = n->fast_out(i);
>>             if( foi->is_Proj() )
>>               _scheduled.push(foi);
>>           }
>>
>> That code is complicated enough that I can't reason about its
>> correctness from a webrev. Is this because of the trailing NOPs?
>
> I hit the following assert during development because the loop above pushed nodes
> which are not for schedule.
>
> assert( _scheduled.size() == _bb_end - _bb_start, "wrong number of instructions" );
>
> It may have happened before I split shorten_branches() and there were
> trailing NOPs. But it is not only trailing NOPs, it is also projections
> after calls and MachNullCheck nodes (see code in DoScheduling()). I
> think in general the check above should check the last node for schedule
> and not the last node in block.
>
>>
>> Can you add this comment to that last anti_do_def piece I added:
>>
>>         // kill projections on a branch should appear to occur on the
>>         // branch, not afterwards, so grab the masks from the projections
>>         // and process them.
>
> Done.
>
> Thanks,
> Vladimir
>
>>
>> tom
>>
>>
>> On Aug 4, 2011, at 6:19 PM, Vladimir Kozlov wrote:
>>
>>> http://cr.openjdk.java.net/~kvn/7063629/webrev
>>>
>>> 7063629: use cbcond in C2 generated code on T4
>>>
>>> The code is finally shaped as I want and it passed CTW, regression,
>>> nsk tests on T4 and x86.
>>>
>>> Added new fused compare and branch instructions into sparc.ad and
>>> corresponding short versions which use cbcond instruction. Added new
>>> flag avoid_back_to_back to avoid generation of cbcond back to back.
>>>
>>> Split shorten_branches() into 2 methods. First method conservatively
>>> estimates code size and branches location and does a few rounds of
>>> branch shortening. It is executed before ScheduleAndBundle(). Step 3
>>> is moved to new method shorten_branches_final() called after
>>> ScheduleAndBundle(). It does final paddings, alignment and final
>>> branch replacement. Method fill_buffer() does verification instead of
>>> padding.
>>>
>>> Labels are bound now only during code generation in fill_buffer().
>>> As a result they are not available when forward branches are emitted.
>>> To fix that MacroAssembler branch instructions are used now in x86
>>> .ad files. I replaced unused rtype parameter with maybe_short flag to
>>> force using only long branches in .ad long branch instructions.
>>>
>>> Added check to adlc to verify that the short version of a branch
>>> instruction has the same declaration in .ad file.
>>>
>>> Added assert to verify that the size of emitted instruction matches
>>> the value returned by MachNode::size(). Found that
>>> MachBreakpointNode::size() returned incorrect value on x64.
>>>
>>> Fixed loop alignment for Sparc (min alignment should be instruction
>>> size which is 4 bytes instead of 1 byte).
>>>
>>> The prototype was done by Tom and I took some of his additional
>>> fixes. The block changes go with some code in output to put opto
>>> assembly style block comments in the PrintNMethods output.
>>> There's also a snippet in there that deals with the fact kill projections on
>>> branches make it appear the kill occurs after the branch instead of
>>> being part of it.
>>
From tom.rodriguez at oracle.com Thu Aug 11 11:30:42 2011
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Thu, 11 Aug 2011 11:30:42 -0700
Subject: Request for reviews (L): 7063629: use cbcond in C2 generated code on T4
In-Reply-To: <4E441DCC.5040303@oracle.com>
References: <4E3B452E.10509@oracle.com> <4E41BAE9.1070505@oracle.com> <4E441DCC.5040303@oracle.com>
Message-ID:

On Aug 11, 2011, at 11:22 AM, Vladimir Kozlov wrote:

> >> !       if( bb->_nodes[bb->_nodes.size()-1] != n ) {
> >> !       if( bb->_nodes[_bb_end-1] != n ) {
> >>
> >> That code is complicated enough that I can't reason about its
> >> correctness from a webrev. Is this because of the trailing NOPs?
> >
> > I hit the following assert during development because the loop above pushed nodes
> > which are not for schedule.
> >
> > assert( _scheduled.size() == _bb_end - _bb_start, "wrong number of instructions" );
> >
> > It may have happened before I split shorten_branches() and there were
> > trailing NOPs. But it is not only trailing NOPs, it is also projections
> > after calls and MachNullCheck nodes (see code in DoScheduling()). I
> > think in general the check above should check the last node for schedule
> > and not the last node in block.
>
> Tom,
>
> I ran full CTW without this change with my latest changes and did not hit the assert, which confirms that it was a problem in early development, when trailing NOPs were inserted before the DoScheduling() call. Do you think I should remove this change?

If it isn't needed then I think it should be removed. You could put in an assert that the old and new values are equal and then investigate any cases where they are different to confirm which value is correct. It may be that they are different but both could be correct.
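Tom's suggestion can be sketched in Java: compute both the old and the new value, then look at every case where they differ. This is a hypothetical model, not output.cpp. In it, nodes stands in for bb->_nodes and bbEnd for _bb_end, and node names are plain strings. The sketch records divergences in a list instead of asserting on them, so each case can be examined before deciding which value is correct. That is a variation on the assert Tom proposed, and the sketch keeps returning the old answer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the trailing-projection check discussed above;
// "nodes" stands in for bb->_nodes and "bbEnd" for _bb_end. Not HotSpot code.
final class LastNodeCheckSketch {

    // Cases where the old and new checks disagree, kept for later inspection.
    static final List<String> divergences = new ArrayList<>();

    // Old form: is n the last node stored in the block?
    static boolean isLastInBlock(List<String> nodes, String n) {
        return nodes.get(nodes.size() - 1).equals(n);
    }

    // New form: is n the last node of the scheduled range (index bbEnd - 1)?
    static boolean isLastScheduled(List<String> nodes, int bbEnd, String n) {
        return nodes.get(bbEnd - 1).equals(n);
    }

    // Decides whether n's trailing projections should be pushed. It keeps the
    // old behaviour and records any case where the new form answers differently.
    static boolean pushTrailingProjections(List<String> nodes, int bbEnd, String n) {
        boolean oldAnswer = !isLastInBlock(nodes, n);
        boolean newAnswer = !isLastScheduled(nodes, bbEnd, n);
        if (oldAnswer != newAnswer) {
            divergences.add(n + ": old=" + oldAnswer + ", new=" + newAnswer);
        }
        return oldAnswer;
    }
}
```

For example, take nodes = [load, call, proj, nop] with bbEnd = 3, so the trailing nop lies outside the scheduled range. Here "proj" is recorded: the old form says it is not the last node, and the new form says it is. For "call", both forms give the same answer.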
tom

>
> Thanks,
> Vladimir
>
> Vladimir Kozlov wrote:
>> Thank you, Tom
>> Tom Rodriguez wrote:
>>> This looks really good.
>>>
>>> This might be for another day but now that label must be non-NULL, maybe it should be a Label& instead of a Label*. That would make it easier to use it directly during code generation, as in:
>>>
>>> + __ jmpb($labl$$label);
>> Yes, I would leave it for another time. I will file RFE.
>>>
>>> sparc.ad:
>>>
>>> It might be nice to factor this out:
>>>
>>> Assembler::Predict predict_taken =
>>> + cbuf.is_backward_branch(*L) ? Assembler::pt : Assembler::pn;
>> I will file RFE for that: use probability from IfNode to determine the pt value as you suggested before.
>>>
>>> x86_32.ad:
>>>
>>> Would you be averse to inlining Jcc and JccShort?
>> I did not realize that it is just one instruction now :)
>> They are used in a lot of places and I did not want to duplicate the original code. I will inline them now.
>>>
>>> output.cpp:
>>>
>>> Why does the first round of shorten_branches occur in the middle of init_buffer? Couldn't it be done right afterwards? It's just odd that it's buried inside there.
>> First loop in shorten_branches() estimates code, locals, stubs sizes which are used later in init_buffer() to allocate CodeBuffer. I would need to split shorten_branches() method which is not easy since the first loop also collects information about branches which could be replaced.
>>>
>>> That first round is conservative since we haven't done all padding yet, right?
>> Correct.
>>> Then shorten_branches_final does a last pass based on the real offsets?
>> Yes, backward branches inserted in this method use final offsets. For forward branches we still have only conservative offsets since following blocks are not processed yet.
>>> shorten_branches_final isn't a great name. Maybe finalize_offsets_and_shorten?
>> I also did not like it, I will use finalize_offsets_and_shorten()
>>>
>>> The core shorten branch logic is duplicated in those functions. Could it be factored out or is there too much local state?
>> I thought about it but as you said "too much local state".
>>>
>>> Why was this needed?
>>>
>>> *** 2182,2192 ****
>>> --- 2383,2393 ----
>>>         (op != Op_Node &&         // Not an unused antidepedence node and
>>>          // not an unallocated boxlock
>>>          (OptoReg::is_valid(_regalloc->get_reg_first(n)) || op != Op_BoxLock)) ) {
>>>         // Push any trailing projections
>>> !       if( bb->_nodes[bb->_nodes.size()-1] != n ) {
>>> !       if( bb->_nodes[_bb_end-1] != n ) {
>>>           for (DUIterator_Fast imax, i = n->fast_outs(imax); i < imax; i++) {
>>>             Node *foi = n->fast_out(i);
>>>             if( foi->is_Proj() )
>>>               _scheduled.push(foi);
>>>           }
>>>
>>> That code is complicated enough that I can't reason about its correctness from a webrev. Is this because of the trailing NOPs?
>> I hit the following assert during development because the loop above pushed nodes which are not for schedule.
>> assert( _scheduled.size() == _bb_end - _bb_start, "wrong number of instructions" );
>> It may have happened before I split shorten_branches() and there were trailing NOPs. But it is not only trailing NOPs, it is also projections after calls and MachNullCheck nodes (see code in DoScheduling()). I think in general the check above should check the last node for schedule and not the last node in block.
>>>
>>> Can you add this comment to that last anti_do_def piece I added:
>>>
>>>         // kill projections on a branch should appear to occur on the
>>>         // branch, not afterwards, so grab the masks from the projections
>>>         // and process them.
>> Done.
>> Thanks,
>> Vladimir
>>>
>>> tom
>>>
>>>
>>> On Aug 4, 2011, at 6:19 PM, Vladimir Kozlov wrote:
>>>
>>>> http://cr.openjdk.java.net/~kvn/7063629/webrev
>>>>
>>>> 7063629: use cbcond in C2 generated code on T4
>>>>
>>>> The code is finally shaped as I want and it passed CTW, regression, nsk tests on T4 and x86.
>>>>
>>>> Added new fused compare and branch instructions into sparc.ad and corresponding short versions which use cbcond instruction. Added new flag avoid_back_to_back to avoid generation of cbcond back to back.
>>>>
>>>> Split shorten_branches() into 2 methods. First method conservatively estimates code size and branches location and does a few rounds of branch shortening. It is executed before ScheduleAndBundle(). Step 3 is moved to new method shorten_branches_final() called after ScheduleAndBundle(). It does final paddings, alignment and final branch replacement. Method fill_buffer() does verification instead of padding.
>>>>
>>>> Labels are bound now only during code generation in fill_buffer(). As a result they are not available when forward branches are emitted. To fix that MacroAssembler branch instructions are used now in x86 .ad files. I replaced unused rtype parameter with maybe_short flag to force using only long branches in .ad long branch instructions.
>>>>
>>>> Added check to adlc to verify that the short version of a branch instruction has the same declaration in .ad file.
>>>>
>>>> Added assert to verify that the size of emitted instruction matches the value returned by MachNode::size(). Found that MachBreakpointNode::size() returned incorrect value on x64.
>>>>
>>>> Fixed loop alignment for Sparc (min alignment should be instruction size which is 4 bytes instead of 1 byte).
>>>>
>>>> The prototype was done by Tom and I took some of his additional fixes. The block changes go with some code in output to put opto assembly style block comments in the PrintNMethods output.
>>>> There's also a snippet in there that deals with the fact kill projections on branches make it appear the kill occurs after the branch instead of being part of it.
>>>
From vladimir.kozlov at oracle.com Thu Aug 11 11:52:37 2011
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 11 Aug 2011 11:52:37 -0700
Subject: Request for reviews (L): 7063629: use cbcond in C2 generated code on T4
In-Reply-To:
References: <4E3B452E.10509@oracle.com> <4E41BAE9.1070505@oracle.com> <4E441DCC.5040303@oracle.com>
Message-ID: <4E4424F5.7070308@oracle.com>

They are different, but the result is the same. I ran with the assert as you suggested and found it immediately (-Xcomp). Anyway, I will revert the change since we still have the "wrong number of instructions" assert, which should catch problems.

Thanks,
Vladimir

Tom Rodriguez wrote:
> On Aug 11, 2011, at 11:22 AM, Vladimir Kozlov wrote:
>
>>>> !       if( bb->_nodes[bb->_nodes.size()-1] != n ) {
>>>> !       if( bb->_nodes[_bb_end-1] != n ) {
>>>>
>>>> That code is complicated enough that I can't reason about its
>>>> correctness from a webrev. Is this because of the trailing NOPs?
>>> I hit the following assert during development because the loop above pushed nodes
>>> which are not for schedule.
>>>
>>> assert( _scheduled.size() == _bb_end - _bb_start, "wrong number of instructions" );
>>>
>>> It may have happened before I split shorten_branches() and there were
>>> trailing NOPs. But it is not only trailing NOPs, it is also projections
>>> after calls and MachNullCheck nodes (see code in DoScheduling()). I
>>> think in general the check above should check the last node for schedule
>>> and not the last node in block.
>> Tom,
>>
>> I ran full CTW without this change with my latest changes and did not hit the assert, which confirms that it was a problem in early development, when trailing NOPs were inserted before the DoScheduling() call. Do you think I should remove this change?
>
> If it isn't needed then I think it should be removed.
> You could put in an assert that the old and new values are equal and then investigate any cases where they are different to confirm which value is correct. It may be that they are different but both could be correct.
>
> tom
>
>> Thanks,
>> Vladimir
>>
>> Vladimir Kozlov wrote:
>>> Thank you, Tom
>>> Tom Rodriguez wrote:
>>>> This looks really good.
>>>>
>>>> This might be for another day but now that label must be non-NULL, maybe it should be a Label& instead of a Label*. That would make it easier to use it directly during code generation, as in:
>>>>
>>>> + __ jmpb($labl$$label);
>>> Yes, I would leave it for another time. I will file RFE.
>>>> sparc.ad:
>>>>
>>>> It might be nice to factor this out:
>>>>
>>>> Assembler::Predict predict_taken =
>>>> + cbuf.is_backward_branch(*L) ? Assembler::pt : Assembler::pn;
>>> I will file RFE for that: use probability from IfNode to determine the pt value as you suggested before.
>>>> x86_32.ad:
>>>>
>>>> Would you be averse to inlining Jcc and JccShort?
>>> I did not realize that it is just one instruction now :)
>>> They are used in a lot of places and I did not want to duplicate the original code. I will inline them now.
>>>> output.cpp:
>>>>
>>>> Why does the first round of shorten_branches occur in the middle of init_buffer? Couldn't it be done right afterwards? It's just odd that it's buried inside there.
>>> First loop in shorten_branches() estimates code, locals, stubs sizes which are used later in init_buffer() to allocate CodeBuffer. I would need to split shorten_branches() method which is not easy since the first loop also collects information about branches which could be replaced.
>>>> That first round is conservative since we haven't done all padding yet, right?
>>> Correct.
>>>> Then shorten_branches_final does a last pass based on the real offsets?
>>> Yes, backward branches inserted in this method use final offsets.
>>> For forward branches we still have only conservative offsets since following blocks are not processed yet.
>>>> shorten_branches_final isn't a great name. Maybe finalize_offsets_and_shorten?
>>> I also did not like it, I will use finalize_offsets_and_shorten()
>>>> The core shorten branch logic is duplicated in those functions. Could it be factored out or is there too much local state?
>>> I thought about it but as you said "too much local state".
>>>> Why was this needed?
>>>>
>>>> *** 2182,2192 ****
>>>> --- 2383,2393 ----
>>>>         (op != Op_Node &&         // Not an unused antidepedence node and
>>>>          // not an unallocated boxlock
>>>>          (OptoReg::is_valid(_regalloc->get_reg_first(n)) || op != Op_BoxLock)) ) {
>>>>         // Push any trailing projections
>>>> !       if( bb->_nodes[bb->_nodes.size()-1] != n ) {
>>>> !       if( bb->_nodes[_bb_end-1] != n ) {
>>>>           for (DUIterator_Fast imax, i = n->fast_outs(imax); i < imax; i++) {
>>>>             Node *foi = n->fast_out(i);
>>>>             if( foi->is_Proj() )
>>>>               _scheduled.push(foi);
>>>>           }
>>>>
>>>> That code is complicated enough that I can't reason about its correctness from a webrev. Is this because of the trailing NOPs?
>>> I hit the following assert during development because the loop above pushed nodes which are not for schedule.
>>> assert( _scheduled.size() == _bb_end - _bb_start, "wrong number of instructions" );
>>> It may have happened before I split shorten_branches() and there were trailing NOPs. But it is not only trailing NOPs, it is also projections after calls and MachNullCheck nodes (see code in DoScheduling()). I think in general the check above should check the last node for schedule and not the last node in block.
>>>> Can you add this comment to that last anti_do_def piece I added:
>>>>
>>>>         // kill projections on a branch should appear to occur on the
>>>>         // branch, not afterwards, so grab the masks from the projections
>>>>         // and process them.
>>> Done.
>>> Thanks,
>>> Vladimir
>>>> tom
>>>>
>>>>
>>>> On Aug 4, 2011, at 6:19 PM, Vladimir Kozlov wrote:
>>>>
>>>>> http://cr.openjdk.java.net/~kvn/7063629/webrev
>>>>>
>>>>> 7063629: use cbcond in C2 generated code on T4
>>>>>
>>>>> The code is finally shaped as I want and it passed CTW, regression, nsk tests on T4 and x86.
>>>>>
>>>>> Added new fused compare and branch instructions into sparc.ad and corresponding short versions which use cbcond instruction. Added new flag avoid_back_to_back to avoid generation of cbcond back to back.
>>>>>
>>>>> Split shorten_branches() into 2 methods. First method conservatively estimates code size and branches location and does a few rounds of branch shortening. It is executed before ScheduleAndBundle(). Step 3 is moved to new method shorten_branches_final() called after ScheduleAndBundle(). It does final paddings, alignment and final branch replacement. Method fill_buffer() does verification instead of padding.
>>>>>
>>>>> Labels are bound now only during code generation in fill_buffer(). As a result they are not available when forward branches are emitted. To fix that MacroAssembler branch instructions are used now in x86 .ad files. I replaced unused rtype parameter with maybe_short flag to force using only long branches in .ad long branch instructions.
>>>>>
>>>>> Added check to adlc to verify that the short version of a branch instruction has the same declaration in .ad file.
>>>>>
>>>>> Added assert to verify that the size of emitted instruction matches the value returned by MachNode::size(). Found that MachBreakpointNode::size() returned incorrect value on x64.
>>>>>
>>>>> Fixed loop alignment for Sparc (min alignment should be instruction size which is 4 bytes instead of 1 byte).
>>>>>
>>>>> The prototype was done by Tom and I took some of his additional fixes. The block changes go with some code in output to put opto assembly style block comments in the PrintNMethods output.
>>>>> There's also a snippet in there that deals with the fact kill projections on branches make it appear the kill occurs after the branch instead of being part of it.
>
From tom.rodriguez at oracle.com Thu Aug 11 15:02:53 2011
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Thu, 11 Aug 2011 15:02:53 -0700
Subject: ReentrantLock performance regression between JDK5 and 6/7?
In-Reply-To:
References:
Message-ID: <0B41DBAD-E290-425F-8214-F90DBBFCC5E3@oracle.com>

I believe this was caused by the switch to using lock addl [esp], 0 instead of mfence for volatile membars, 6822204. My review request for that said that at the time I didn't measure any performance change for Intel, http://cr.openjdk.java.net/~never/6822204. On your microbenchmark I can measure the difference though, so I'm going to remeasure derby, which previously showed the big difference. We may want to make the lock addl be AMD-specific.

tom

On Aug 11, 2011, at 11:05 AM, Clemens Eisserer wrote:

> Hi Vitaly,
>
> I tried this bench on 6u23 and if I first run that code in a 10k iteration loop and then time the 1mm iteration loop I get about 10 ms speedup. The first loop would trigger jit compilation (10k is the default threshold I believe) and second should run without compilation interruption.
>
> Can you try the same? Also might be interesting to time it under the interpreter (-Xint).
>
> I changed the testcase a bit, to no longer rely on OSR - as lockBench() will for sure soon hit the compilation threshold after a few runs.
>
> I get the following timings for 1m runs:
>
> jdk7-server: 53ms
> jdk7-client: 62ms
> jdk7-xint : 955ms
>
> jdk6-xint : 1000ms
> jdk6-client: 68ms
> jdk6-server: 52ms
>
> jdk5-server: 40ms
> jdk5-client: 61ms
> jdk5-xint : 832ms
>
> So JDK7 is slower in every case, the regression seems to have landed in jdk6 (I was using openjdk6).
>
> Should I file a bug-report about this behaviour?
> > Thanks, Clemens > > > public class LockPerf { > static ReentrantLock lock = new ReentrantLock(); > > public static void main(String[] args) { > while (true) { > long start2 = System.nanoTime(); > for(int i=0; i < 1000; i++) { > lockBench(); > } > System.out.println("Lock bench: " + ((System.nanoTime() - start2)) / 1000000); > } > } > > private static void lockBench() { > for (int i = 0; i < 1000; i++) { > lock.lock(); > lock.unlock(); > } > } > } > > > On Aug 11, 2011 11:38 AM, "Clemens Eisserer" wrote: > > Hi Vitaly, > > > > Which OS are you using? > >> > > Linux-3.0 (Fedora 15) > > > > > >> Also, you should use System.nanoTime() for this type of timing as it gives > >> you a more precise timer. > >> > > I tried, but results remained the same. ~53ms for jdk6/7, ~41 for JDK5. > > I was using the server compiler both times. > > > > Thanks, Clemens > From vitalyd at gmail.com Thu Aug 11 15:39:15 2011 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Thu, 11 Aug 2011 18:39:15 -0400 Subject: ReentrantLock performance regression between JDK5 and 6/7? In-Reply-To: <0B41DBAD-E290-425F-8214-F90DBBFCC5E3@oracle.com> References: <0B41DBAD-E290-425F-8214-F90DBBFCC5E3@oracle.com> Message-ID: Hi Tom, Just curious - I recall reading on Dave Dice's blog that he found locked add to perform better than mfence. Granted he tested on a nehalem box - do you think it may need more granular decision making in the jit than just amd vs Intel? i.e. check Intel generation as well. Thanks On Aug 11, 2011 6:03 PM, "Tom Rodriguez" wrote: > I believe this was caused by the switch to using lock addl[esp], 0 instead of mfence for volatile membars, 6822204. My review request for that said that at the time I didn't measure any performance change for Intel, http://cr.openjdk.java.net/~never/6822204. On your microbenchmark I can measure the difference though so I'm going to remeasure derby which previously showed the big difference. We may want to make the lock addl be AMD specific. 
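Tom suggests making lock addl AMD specific. Vitaly asks whether the Intel generation should also be checked. Together these amount to a small CPU-dependent selection policy for the volatile StoreLoad barrier. Both instructions act as a full barrier for ordinary memory on x86, so the choice is only about cost. The sketch below is a hypothetical illustration of one such policy. It relies on Tom's later observation that pre-Nehalem Intel chips had a heavier lock addl. The enum and struct names are assumptions. Classifying a CPU as pre-Nehalem is assumed to happen elsewhere. None of this is HotSpot's actual VM_Version logic.

```cpp
#include <cassert>

// Hypothetical types for illustration only.
enum class Vendor { Intel, AMD, Other };
enum class FenceKind { LockAddl, Mfence };

struct CpuInfo {
  Vendor vendor;
  bool pre_nehalem;  // assumed to be derived from CPUID family/model elsewhere
};

// One possible policy: use mfence only where locked instructions are assumed
// to be the more expensive option.
FenceKind choose_volatile_fence(const CpuInfo& cpu) {
  if (cpu.vendor == Vendor::Intel && cpu.pre_nehalem) {
    return FenceKind::Mfence;   // lock addl reported heavier on pre-Nehalem Intel
  }
  return FenceKind::LockAddl;   // AMD, Nehalem and later, unknown vendors
}

// AT&T-syntax text of the barrier each choice would emit on 32-bit x86.
const char* fence_asm(FenceKind k) {
  return k == FenceKind::Mfence ? "mfence" : "lock; addl $0,0(%esp)";
}
```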
> > tom > > On Aug 11, 2011, at 11:05 AM, Clemens Eisserer wrote: > >> Hi Vitaly, >> >> I tried this bench on 6u23 and if I first run that code in a 10k iteration loop and then time the 1mm iteration loop I get about 10 ms speedup. The first loop would trigger jit compilation (10k is the default threshold I believe) and second should run without compilation interruption. >> >> Can you try the same? Also might be interesting to time it under the interpreter (-Xint). >> >> I changed the testcase a bit, to no longer rely on OSR - as lockBench() will for sure soon hit the compilation threshold after a few runs. >> >> I get the following timings for 1m runs: >> >> jdk7-server: 53ms >> jdk7-client: 62ms >> jdk7-xint : 955ms >> >> jdk6-xint : 1000ms >> jdk6-client: 68ms >> jdk6-server: 52ms >> >> jdk5-server: 40ms >> jdk5-client: 61ms >> jdk5-xint : 832ms >> >> So JDK7 is slower in every case, the regression seems to have landed in jdk6 (I was using openjdk6). >> >> Should I file a bug-report about this behaviour? >> >> Thanks, Clemens >> >> >> public class LockPerf { >> static ReentrantLock lock = new ReentrantLock(); >> >> public static void main(String[] args) { >> while (true) { >> long start2 = System.nanoTime(); >> for(int i=0; i < 1000; i++) { >> lockBench(); >> } >> System.out.println("Lock bench: " + ((System.nanoTime() - start2)) / 1000000); >> } >> } >> >> private static void lockBench() { >> for (int i = 0; i < 1000; i++) { >> lock.lock(); >> lock.unlock(); >> } >> } >> } >> >> >> On Aug 11, 2011 11:38 AM, "Clemens Eisserer" wrote: >> > Hi Vitaly, >> > >> > Which OS are you using? >> >> >> > Linux-3.0 (Fedora 15) >> > >> > >> >> Also, you should use System.nanoTime() for this type of timing as it gives >> >> you a more precise timer. >> >> >> > I tried, but results remained the same. ~53ms for jdk6/7, ~41 for JDK5. >> > I was using the server compiler both times. 
>> > >> > Thanks, Clemens >> > From vladimir.kozlov at oracle.com Thu Aug 11 23:30:29 2011 From: vladimir.kozlov at oracle.com (vladimir.kozlov at oracle.com) Date: Fri, 12 Aug 2011 06:30:29 +0000 Subject: hg: hsx/hotspot-comp/hotspot: 7063629: use cbcond in C2 generated code on T4 Message-ID: <20110812063035.2F11447B02@hg.openjdk.java.net> Changeset: 95134e034042 Author: kvn Date: 2011-08-11 12:08 -0700 URL: http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/95134e034042 7063629: use cbcond in C2 generated code on T4 Summary: Use new short branch instruction in C2 generated code. Reviewed-by: never ! src/cpu/sparc/vm/assembler_sparc.hpp ! src/cpu/sparc/vm/sparc.ad ! src/cpu/sparc/vm/vm_version_sparc.cpp ! src/cpu/x86/vm/assembler_x86.cpp ! src/cpu/x86/vm/assembler_x86.hpp ! src/cpu/x86/vm/x86_32.ad ! src/cpu/x86/vm/x86_64.ad ! src/os_cpu/linux_x86/vm/linux_x86_32.ad ! src/os_cpu/linux_x86/vm/linux_x86_64.ad ! src/os_cpu/solaris_x86/vm/solaris_x86_32.ad ! src/os_cpu/solaris_x86/vm/solaris_x86_64.ad ! src/share/vm/adlc/formssel.cpp ! src/share/vm/adlc/output_h.cpp ! src/share/vm/opto/block.cpp ! src/share/vm/opto/block.hpp ! src/share/vm/opto/compile.hpp ! src/share/vm/opto/machnode.hpp ! src/share/vm/opto/matcher.hpp ! src/share/vm/opto/node.hpp ! src/share/vm/opto/output.cpp From fweimer at bfk.de Fri Aug 12 00:57:50 2011 From: fweimer at bfk.de (Florian Weimer) Date: Fri, 12 Aug 2011 07:57:50 +0000 Subject: ReentrantLock performance regression between JDK5 and 6/7?
In-Reply-To: <0B41DBAD-E290-425F-8214-F90DBBFCC5E3@oracle.com> (Tom Rodriguez's message of "Thu, 11 Aug 2011 15:02:53 -0700") References: <0B41DBAD-E290-425F-8214-F90DBBFCC5E3@oracle.com> Message-ID: <824o1nukn5.fsf@mid.bfk.de> * Tom Rodriguez: > I believe this was caused by the switch to using lock addl[esp], 0 > instead of mfence for volatile membars, 6822204. My review request > for that said that at the time I didn't measure any performance change > for Intel, http://cr.openjdk.java.net/~never/6822204. On your > microbenchmark I can measure the difference though so I'm going to > remeasure derby which previously showed the big difference. We may > want to make the lock addl be AMD specific. Couldn't the relative speed of the two instructions also depend on the type of benchmark? -- Florian Weimer BFK edv-consulting GmbH http://www.bfk.de/ Kriegsstraße 100 tel: +49-721-96201-1 D-76133 Karlsruhe fax: +49-721-96201-99 From tom.rodriguez at oracle.com Fri Aug 12 11:22:14 2011 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Fri, 12 Aug 2011 11:22:14 -0700 Subject: ReentrantLock performance regression between JDK5 and 6/7? In-Reply-To: <824o1nukn5.fsf@mid.bfk.de> References: <0B41DBAD-E290-425F-8214-F90DBBFCC5E3@oracle.com> <824o1nukn5.fsf@mid.bfk.de> Message-ID: <0F9B135C-E961-4A73-8CD6-A17BAF2ABA19@oracle.com> On Aug 12, 2011, at 12:57 AM, Florian Weimer wrote: > * Tom Rodriguez: > >> I believe this was caused by the switch to using lock addl[esp], 0 >> instead of mfence for volatile membars, 6822204. My review request >> for that said that at the time I didn't measure any performance change >> for Intel, http://cr.openjdk.java.net/~never/6822204. On your >> microbenchmark I can measure the difference though so I'm going to >> remeasure derby which previously showed the big difference. We may >> want to make the lock addl be AMD specific. > > Couldn't the relative speed of the two instructions also depend on the > type of benchmark?
These are primarily being emitted for volatile fences, so many programs won't care about their speed at all. If you look at my other email, it suggests that the difference is that Intel chips prior to Nehalem had a heavier-weight implementation of lock addl than was required. mfence stayed approximately the same between processor versions, with its speed pretty much tracking the relative clock speeds: 2.4 GHz for the Tigerton and 2.8 GHz for Nehalem. The original data suggested no performance change on Nehalem when switching instructions, so it probably doesn't care either way. tom > > -- > Florian Weimer > BFK edv-consulting GmbH http://www.bfk.de/ > Kriegsstraße 100 tel: +49-721-96201-1 > D-76133 Karlsruhe fax: +49-721-96201-99 From vladimir.kozlov at oracle.com Mon Aug 15 08:58:12 2011 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 15 Aug 2011 08:58:12 -0700 Subject: Request for reviews (XS): 7079317: Incorrect branch's destination block in PrintoOptoAssembly output Message-ID: <4E494214.2080407@oracle.com> http://cr.openjdk.java.net/~kvn/7079317/webrev 7079317: Incorrect branch's destination block in PrintoOptoAssembly output After changes for 7063629 PrintoOptoAssembly output shows all branches have B0 as destination block. Remove unneeded debug verification code which overwrites label and block information for branches. There are other checks there which verify that code size was not changed. From tom.rodriguez at oracle.com Mon Aug 15 10:46:51 2011 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Mon, 15 Aug 2011 10:46:51 -0700 Subject: Request for reviews (XS): 7079317: Incorrect branch's destination block in PrintoOptoAssembly output In-Reply-To: <4E494214.2080407@oracle.com> References: <4E494214.2080407@oracle.com> Message-ID: <45803A4A-49AB-4B4A-88CB-44419E09AE7B@oracle.com> I don't understand how calling insts_size and Node::size causes a bug. What am I missing?
tom On Aug 15, 2011, at 8:58 AM, Vladimir Kozlov wrote: > http://cr.openjdk.java.net/~kvn/7079317/webrev > > 7079317: Incorrect branch's destination block in PrintoOptoAssembly output > > After changes for 7063629 PrintoOptoAssembly output shows all branches have B0 as destination block. > Remove unneeded debug verification code which overwrites label and block information for branches. There are other checks there which verify that code size was not changed. From vladimir.kozlov at oracle.com Mon Aug 15 10:50:10 2011 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 15 Aug 2011 10:50:10 -0700 Subject: Request for reviews (XS): 7079317: Incorrect branch's destination block in PrintoOptoAssembly output In-Reply-To: <45803A4A-49AB-4B4A-88CB-44419E09AE7B@oracle.com> References: <4E494214.2080407@oracle.com> <45803A4A-49AB-4B4A-88CB-44419E09AE7B@oracle.com> Message-ID: <4E495C52.7080807@oracle.com> Node::size() for branches calls code in scratch_emit_size() which resets label and block. An other solution for this problem would be save/restore label and block in scratch_emit_size() but it would require a lot more code changes. Vladimir Tom Rodriguez wrote: > I don't understand how calling insts_size and Node::size causes a bug. What am I missing? > > tom > > On Aug 15, 2011, at 8:58 AM, Vladimir Kozlov wrote: > >> http://cr.openjdk.java.net/~kvn/7079317/webrev >> >> 7079317: Incorrect branch's destination block in PrintoOptoAssembly output >> >> After changes for 7063629 PrintoOptoAssembly output shows all branches have B0 as destination block. >> Remove unneeded debug verification code which overwrites label and block information for branches. There are other checks there which verify that code size was not changed. 
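Vladimir mentions an alternative: save and restore the label and block in scratch_emit_size(). The idea can be shown with a simplified, hypothetical model. To measure a branch, the code emits it against a bound dummy label, which overwrites the real label and destination block. If they are not put back afterwards, the branch is later printed with B0 as its target. The method names label_set and save_label are taken from the thread, but their signatures here are assumptions. The classes are illustrative stand-ins, not HotSpot's.

```cpp
#include <cassert>

struct Label { bool bound = false; };  // hypothetical stand-in for an assembler label

// Hypothetical, simplified branch node.
class BranchNode {
 public:
  void label_set(Label* l, unsigned block_num) { label_ = l; block_num_ = block_num; }
  void save_label(Label** l, unsigned* block_num) const { *l = label_; *block_num = block_num_; }
  unsigned block_num() const { return block_num_; }  // printed as "B<n>"
  const Label* label() const { return label_; }
  int emit() const { return 4; }  // assume a fixed 4-byte encoding
 private:
  Label* label_ = nullptr;
  unsigned block_num_ = 0;
};

// Size a branch by emitting it against a bound dummy label, then restore the
// real label and destination block. Dropping the restore step reproduces the
// symptom from the thread: the branch is left pointing at block 0.
int scratch_emit_size(BranchNode& n) {
  Label fake;
  fake.bound = true;                      // a scratch emission needs a bound label
  Label* saved_label = nullptr;
  unsigned saved_block = 0;
  n.save_label(&saved_label, &saved_block);
  n.label_set(&fake, 0);
  int size = n.emit();
  n.label_set(saved_label, saved_block);  // restore before `fake` goes out of scope
  return size;
}
```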
> From tom.rodriguez at oracle.com Mon Aug 15 11:05:39 2011 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Mon, 15 Aug 2011 11:05:39 -0700 Subject: Request for reviews (XS): 7079317: Incorrect branch's destination block in PrintoOptoAssembly output In-Reply-To: <4E495C52.7080807@oracle.com> References: <4E494214.2080407@oracle.com> <45803A4A-49AB-4B4A-88CB-44419E09AE7B@oracle.com> <4E495C52.7080807@oracle.com> Message-ID: <08E31550-58B4-4125-876A-304C4465BC78@oracle.com> On Aug 15, 2011, at 10:50 AM, Vladimir Kozlov wrote: > Node::size() for branches calls code in scratch_emit_size() which resets label and block. An other solution for this problem would be save/restore label and block in scratch_emit_size() but it would require a lot more code changes. Ah. Fixing scratch_emit_size seems better since it's kind of a surprising behaviour. It's not that much code is it? tom > > Vladimir > > Tom Rodriguez wrote: >> I don't understand how calling insts_size and Node::size causes a bug. What am I missing? >> tom >> On Aug 15, 2011, at 8:58 AM, Vladimir Kozlov wrote: >>> http://cr.openjdk.java.net/~kvn/7079317/webrev >>> >>> 7079317: Incorrect branch's destination block in PrintoOptoAssembly output >>> >>> After changes for 7063629 PrintoOptoAssembly output shows all branches have B0 as destination block. >>> Remove unneeded debug verification code which overwrites label and block information for branches. There are other checks there which verify that code size was not changed. 
From vladimir.kozlov at oracle.com Mon Aug 15 12:04:40 2011 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 15 Aug 2011 12:04:40 -0700 Subject: Request for reviews (XS): 7079317: Incorrect branch's destination block in PrintoOptoAssembly output In-Reply-To: <08E31550-58B4-4125-876A-304C4465BC78@oracle.com> References: <4E494214.2080407@oracle.com> <45803A4A-49AB-4B4A-88CB-44419E09AE7B@oracle.com> <4E495C52.7080807@oracle.com> <08E31550-58B4-4125-876A-304C4465BC78@oracle.com> Message-ID: <4E496DC8.60107@oracle.com> Tom Rodriguez wrote: > On Aug 15, 2011, at 10:50 AM, Vladimir Kozlov wrote: > >> Node::size() for branches calls code in scratch_emit_size() which resets label and block. An other solution for this problem would be save/restore label and block in scratch_emit_size() but it would require a lot more code changes. > > Ah. Fixing scratch_emit_size seems better since it's kind of a surprising behaviour. It's not that much code is it? It needs a virtual method in MachNode which increase vtable of all Mach nodes. Here is webrev: http://cr.openjdk.java.net/~kvn/7079317/webrev Vladimir > > tom > >> Vladimir >> >> Tom Rodriguez wrote: >>> I don't understand how calling insts_size and Node::size causes a bug. What am I missing? >>> tom >>> On Aug 15, 2011, at 8:58 AM, Vladimir Kozlov wrote: >>>> http://cr.openjdk.java.net/~kvn/7079317/webrev >>>> >>>> 7079317: Incorrect branch's destination block in PrintoOptoAssembly output >>>> >>>> After changes for 7063629 PrintoOptoAssembly output shows all branches have B0 as destination block. >>>> Remove unneeded debug verification code which overwrites label and block information for branches. There are other checks there which verify that code size was not changed. 
> From tom.rodriguez at oracle.com Mon Aug 15 12:48:18 2011 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Mon, 15 Aug 2011 12:48:18 -0700 Subject: Request for reviews (XS): 7079317: Incorrect branch's destination block in PrintoOptoAssembly output In-Reply-To: <4E496DC8.60107@oracle.com> References: <4E494214.2080407@oracle.com> <45803A4A-49AB-4B4A-88CB-44419E09AE7B@oracle.com> <4E495C52.7080807@oracle.com> <08E31550-58B4-4125-876A-304C4465BC78@oracle.com> <4E496DC8.60107@oracle.com> Message-ID: <8AA4ABA0-5C39-4F77-9FBD-F4B006A4AFC5@oracle.com> On Aug 15, 2011, at 12:04 PM, Vladimir Kozlov wrote: > Tom Rodriguez wrote: >> On Aug 15, 2011, at 10:50 AM, Vladimir Kozlov wrote: >>> Node::size() for branches calls code in scratch_emit_size() which resets label and block. An other solution for this problem would be save/restore label and block in scratch_emit_size() but it would require a lot more code changes. >> Ah. Fixing scratch_emit_size seems better since it's kind of a surprising behaviour. It's not that much code is it? > > It needs a virtual method in MachNode which increase vtable of all Mach nodes. Here is webrev: If we're really concerned about vtable size, all of those subtype specific setter/getters could probably be elsewhere down in the hierarchy. The only meaningful implementations of label_set are in subclasses of MachGotoNode and MachIfNode so it seems like it could be moved into a new superclass of them. I guess alternatively you could have a single virtual which returns the labelOper and implement label_set and save_label non-virtually in terms of that, though that probably doesn't play well with MachNullCheck which is_Branch but doesn't have a label. The whole labelOper machinery looks ridiculously complicated... Anyway, your change is ok with me as is. 
tom > > http://cr.openjdk.java.net/~kvn/7079317/webrev > > Vladimir > >> tom >>> Vladimir >>> >>> Tom Rodriguez wrote: >>>> I don't understand how calling insts_size and Node::size causes a bug. What am I missing? >>>> tom >>>> On Aug 15, 2011, at 8:58 AM, Vladimir Kozlov wrote: >>>>> http://cr.openjdk.java.net/~kvn/7079317/webrev >>>>> >>>>> 7079317: Incorrect branch's destination block in PrintoOptoAssembly output >>>>> >>>>> After changes for 7063629 PrintoOptoAssembly output shows all branches have B0 as destination block. >>>>> Remove unneeded debug verification code which overwrites label and block information for branches. There are other checks there which verify that code size was not changed. From vladimir.kozlov at oracle.com Mon Aug 15 17:20:53 2011 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 15 Aug 2011 17:20:53 -0700 Subject: Request for reviews (XS): 7079317: Incorrect branch's destination block in PrintoOptoAssembly output In-Reply-To: <8AA4ABA0-5C39-4F77-9FBD-F4B006A4AFC5@oracle.com> References: <4E494214.2080407@oracle.com> <45803A4A-49AB-4B4A-88CB-44419E09AE7B@oracle.com> <4E495C52.7080807@oracle.com> <08E31550-58B4-4125-876A-304C4465BC78@oracle.com> <4E496DC8.60107@oracle.com> <8AA4ABA0-5C39-4F77-9FBD-F4B006A4AFC5@oracle.com> Message-ID: <4E49B7E5.8080909@oracle.com> Tom, You should not give me these ideas since I can't back out now :) . Here is implementation using MachBranchNode. The only problem was JumpX mach node which is subclass of MachConstantNode. But it is fine since it does not have label, short version or delay slot (the sparc instruction has delay slot but we use ialu_reg_reg pipe_class). It needs only one additional check in output.cpp where Kill projections are processed. 
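Tom's suggestion, which Vladimir implements with MachBranchNode, is to move the label accessors into a common superclass of the branch nodes. Then only branches pay the extra vtable slots. The sketch below is heavily trimmed and hypothetical. The as_MachBranch() query and the JumpTableNode class are assumptions made for illustration. JumpTableNode stands in for a labelless node such as the JumpX case Vladimir describes. The code is not HotSpot's machnode.hpp.

```cpp
#include <cassert>

struct Label {};  // hypothetical stand-in for an assembler label

class MachBranchNode;  // forward declaration for the query below

// Hypothetical, heavily trimmed hierarchy (not HotSpot's real classes).
class MachNode {
 public:
  virtual ~MachNode() = default;
  // Non-branch nodes answer nullptr; only branches override this.
  virtual MachBranchNode* as_MachBranch() { return nullptr; }
};

// Label accessors live here, so they add vtable slots only to branch nodes.
class MachBranchNode : public MachNode {
 public:
  MachBranchNode* as_MachBranch() override { return this; }
  virtual void label_set(Label* l, unsigned block_num) = 0;
  virtual void save_label(Label** l, unsigned* block_num) const = 0;
};

// Stands in for both goto-style and if-style branches.
class MachGotoNode : public MachBranchNode {
 public:
  void label_set(Label* l, unsigned block_num) override { label_ = l; block_num_ = block_num; }
  void save_label(Label** l, unsigned* block_num) const override { *l = label_; *block_num = block_num_; }
 private:
  Label* label_ = nullptr;
  unsigned block_num_ = 0;
};

// A node with no label at all.
class JumpTableNode : public MachNode {};

// Callers ask for the branch view instead of calling a MachNode-wide virtual.
unsigned branch_target_or_zero(MachNode& n) {
  MachBranchNode* br = n.as_MachBranch();
  if (br == nullptr) return 0;
  Label* l = nullptr;
  unsigned block_num = 0;
  br->save_label(&l, &block_num);
  return block_num;
}
```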
http://cr.openjdk.java.net/~kvn/7079317/webrev Thanks, Vladimir Tom Rodriguez wrote: > On Aug 15, 2011, at 12:04 PM, Vladimir Kozlov wrote: > >> Tom Rodriguez wrote: >>> On Aug 15, 2011, at 10:50 AM, Vladimir Kozlov wrote: >>>> Node::size() for branches calls code in scratch_emit_size() which resets label and block. An other solution for this problem would be save/restore label and block in scratch_emit_size() but it would require a lot more code changes. >>> Ah. Fixing scratch_emit_size seems better since it's kind of a surprising behaviour. It's not that much code is it? >> It needs a virtual method in MachNode which increase vtable of all Mach nodes. Here is webrev: > > If we're really concerned about vtable size, all of those subtype specific setter/getters could probably be elsewhere down in the hierarchy. The only meaningful implementations of label_set are in subclasses of MachGotoNode and MachIfNode so it seems like it could be moved into a new superclass of them. > > I guess alternatively you could have a single virtual which returns the labelOper and implement label_set and save_label non-virtually in terms of that, though that probably doesn't play well with MachNullCheck which is_Branch but doesn't have a label. The whole labelOper machinery looks ridiculously complicated... > > Anyway, your change is ok with me as is. > > tom > >> http://cr.openjdk.java.net/~kvn/7079317/webrev >> >> Vladimir >> >>> tom >>>> Vladimir >>>> >>>> Tom Rodriguez wrote: >>>>> I don't understand how calling insts_size and Node::size causes a bug. What am I missing? >>>>> tom >>>>> On Aug 15, 2011, at 8:58 AM, Vladimir Kozlov wrote: >>>>>> http://cr.openjdk.java.net/~kvn/7079317/webrev >>>>>> >>>>>> 7079317: Incorrect branch's destination block in PrintoOptoAssembly output >>>>>> >>>>>> After changes for 7063629 PrintoOptoAssembly output shows all branches have B0 as destination block. 
>>>>>> Remove unneeded debug verification code which overwrites label and block information for branches. There are other checks there which verify that code size was not changed. > From vladimir.kozlov at oracle.com Mon Aug 15 18:12:03 2011 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 15 Aug 2011 18:12:03 -0700 Subject: Request for reviews (M): 7079329: Adjust allocation prefetching for T4 Message-ID: <4E49C3E3.6060903@oracle.com> http://cr.openjdk.java.net/~kvn/7079329/webrev 7079329: Adjust allocation prefetching for T4 L2 cache line size is 32 bytes on T4 instead of 64 bytes on T series before. As result BIS instruction prefetches only 32 bytes. Jbb2005 runs show that prefetching 64 bytes is still better on T4 so 2 BIS instructions should be issued. BIS can't be use for general prefetching since it may fault. New PrefetchAllocation node was added for allocation prefetching. Changed prefetchAlloc_bis parameter from memory to regP. Use AllocatePrefetchInstr on Sparc to allow specify what instruction to use for allocation prefetching (0: prefetch write, 1: BIS). Added new instructions on Sparc cacheLineAdrX to reduce number of instructions generated for finding next cache line address. Added new flag AllocateInstPrefetchLines to specify number of lines to prefetch for instance allocation. L1_data_cache_line_size() renamed to prefetch_data_size(). From christian.thalinger at oracle.com Tue Aug 16 02:29:44 2011 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Tue, 16 Aug 2011 11:29:44 +0200 Subject: Request for reviews (M): 7079329: Adjust allocation prefetching for T4 In-Reply-To: <4E49C3E3.6060903@oracle.com> References: <4E49C3E3.6060903@oracle.com> Message-ID: On Aug 16, 2011, at 3:12 AM, Vladimir Kozlov wrote: > http://cr.openjdk.java.net/~kvn/7079329/webrev > > 7079329: Adjust allocation prefetching for T4 > > L2 cache line size is 32 bytes on T4 instead of 64 bytes on T series before. 
As result BIS instruction prefetches only 32 bytes. Jbb2005 runs show that prefetching 64 bytes is still better on T4 so 2 BIS instructions should be issued. > > BIS can't be use for general prefetching since it may fault. New PrefetchAllocation node was added for allocation prefetching. > > Changed prefetchAlloc_bis parameter from memory to regP. > > Use AllocatePrefetchInstr on Sparc to allow specify what instruction to use for allocation prefetching (0: prefetch write, 1: BIS). > > Added new instructions on Sparc cacheLineAdrX to reduce number of instructions generated for finding next cache line address. > > Added new flag AllocateInstPrefetchLines to specify number of lines to prefetch for instance allocation. > > L1_data_cache_line_size() renamed to prefetch_data_size(). src/cpu/x86/vm/x86_32.ad: src/cpu/x86/vm/x86_64.ad: Can you use MacroAssembler instructions to emit the code for the new instructs? src/cpu/sparc/vm/vm_version_sparc.cpp: + if (is_T4()) { + // Double number of prefetched cache lines on T4 + // since L2 cache line size is smaller (32 bytes). + if (FLAG_IS_DEFAULT(AllocatePrefetchLines)) { + FLAG_SET_DEFAULT(AllocatePrefetchLines, 6); + } + if (FLAG_IS_DEFAULT(AllocateInstPrefetchLines)) { + FLAG_SET_DEFAULT(AllocateInstPrefetchLines, 2); + } + } Maybe you should use *2 here. Otherwise this looks good. -- Christian From igor.veresov at oracle.com Tue Aug 16 02:47:58 2011 From: igor.veresov at oracle.com (Igor Veresov) Date: Tue, 16 Aug 2011 02:47:58 -0700 Subject: Request for reviews (M): 7079329: Adjust allocation prefetching for T4 In-Reply-To: <4E49C3E3.6060903@oracle.com> References: <4E49C3E3.6060903@oracle.com> Message-ID: I think this looks good. igor On Monday, August 15, 2011 at 6:12 PM, Vladimir Kozlov wrote: > http://cr.openjdk.java.net/~kvn/7079329/webrev > > 7079329: Adjust allocation prefetching for T4 > > L2 cache line size is 32 bytes on T4 instead of 64 bytes on T series before. 
As > result BIS instruction prefetches only 32 bytes. Jbb2005 runs show that > prefetching 64 bytes is still better on T4 so 2 BIS instructions should be issued. > > BIS can't be use for general prefetching since it may fault. New > PrefetchAllocation node was added for allocation prefetching. > > Changed prefetchAlloc_bis parameter from memory to regP. > > Use AllocatePrefetchInstr on Sparc to allow specify what instruction to use for > allocation prefetching (0: prefetch write, 1: BIS). > > Added new instructions on Sparc cacheLineAdrX to reduce number of instructions > generated for finding next cache line address. > > Added new flag AllocateInstPrefetchLines to specify number of lines to prefetch > for instance allocation. > > L1_data_cache_line_size() renamed to prefetch_data_size(). From martin.doerr at sap.com Tue Aug 16 03:31:53 2011 From: martin.doerr at sap.com (Doerr, Martin) Date: Tue, 16 Aug 2011 12:31:53 +0200 Subject: allocation prefetching with block initializing instructions Message-ID: <160598AAAEA6C640BF796BA28D836C6404FC0D85E6@DEWDFECCR04.wdf.sap.corp> Hello everybody, I have read your emails about the allocation prefetching on SPARC. Avoiding fetching the cache lines from memory seems to make a lot of sense. However, it should be possible to use these block initializing stores to replace the ClearArray nodes in addition. We are losing quite some time in these clear loops. Have you guys already thought about this? I had played with the ZeroTLAB switch some time ago, but the TLABs appear to get too large so clearing them at once doesn't perform well. But if we only clear up to something like a prefetch watermark and get rid of the ClearArray, we should get better performance. We only have to make sure that we always clear up to some distance behind the object being allocated. I'm looking forward to reading your comments. Kind regards, Martin D
From paul.hohensee at oracle.com Tue Aug 16 06:01:12 2011 From: paul.hohensee at oracle.com (Paul Hohensee) Date: Tue, 16 Aug 2011 09:01:12 -0400 Subject: Request for reviews (M): 7079329: Adjust allocation prefetching for T4 In-Reply-To: <4E49C3E3.6060903@oracle.com> References: <4E49C3E3.6060903@oracle.com> Message-ID: <4E4A6A18.6080807@oracle.com> You're changing the meaning of an existing flag, AllocatePrefetchLines, to apply only to arrays, right? If so, I'd add another flag for arrays, maybe call it AllocateArrayPrefetchLines, and change the code so AllocatePrefetchLines becomes an optional parameter. E.g., default it to -1 in globals.hpp, and if it's specified on the command line, set both AllocateArrayPrefetchLines and AllocateInstPrefetchLines to the command line value. That would retain backward compatibility: I believe I've seen AllocatePrefetchLines used in a few jbb submissions. Also, I'd rename AllocateInstPrefetchLines to AllocateInstancePrefetchLines. 'Inst' is a bit confusing to me and perhaps to others: the first thing I think of is 'instruction'. Paul On 8/15/11 9:12 PM, Vladimir Kozlov wrote: > http://cr.openjdk.java.net/~kvn/7079329/webrev > > 7079329: Adjust allocation prefetching for T4 > > L2 cache line size is 32 bytes on T4 instead of 64 bytes on T series > before. As result BIS instruction prefetches only 32 bytes. Jbb2005 > runs show that prefetching 64 bytes is still better on T4 so 2 BIS > instructions should be issued. > > BIS can't be use for general prefetching since it may fault. New > PrefetchAllocation node was added for allocation prefetching. > > Changed prefetchAlloc_bis parameter from memory to regP. > > Use AllocatePrefetchInstr on Sparc to allow specify what instruction > to use for allocation prefetching (0: prefetch write, 1: BIS).
> > Added new instructions on Sparc cacheLineAdrX to reduce number of > instructions generated for finding next cache line address. > > Added new flag AllocateInstPrefetchLines to specify number of lines to > prefetch for instance allocation. > > L1_data_cache_line_size() renamed to prefetch_data_size(). From paul.hohensee at oracle.com Tue Aug 16 06:01:36 2011 From: paul.hohensee at oracle.com (Paul Hohensee) Date: Tue, 16 Aug 2011 09:01:36 -0400 Subject: Request for reviews (M): 7079329: Adjust allocation prefetching for T4 In-Reply-To: <4E49C3E3.6060903@oracle.com> References: <4E49C3E3.6060903@oracle.com> Message-ID: <4E4A6A30.6090608@oracle.com> You're changing the meaning of an existing flag, AllocatePrefetchLines, to apply only to arrays, right? If so, I'd add another flag for arrays, maybe call it AllocateArrayPrefetchLines, and change the code so AllocatePrefetchLines becomes an optional parameter. E.g., default it to -1 in globals.hpp, and if it's specified on the command line, set both AllocateArrayPrefetchLines and AllocateInstPrefetchLines to the command line value. That would retain backward compatibility: I remember seeing AllocatePrefetchLines used in a few jbb submissions. Also, I'd rename AllocateInstPrefetchLines to AllocateInstancePrefetchLines. 'Inst" is a bit confusing to me and perhaps to others: the first thing I think of is 'instruction'. Paul On 8/15/11 9:12 PM, Vladimir Kozlov wrote: > http://cr.openjdk.java.net/~kvn/7079329/webrev > > 7079329: Adjust allocation prefetching for T4 > > L2 cache line size is 32 bytes on T4 instead of 64 bytes on T series > before. As result BIS instruction prefetches only 32 bytes. Jbb2005 > runs show that prefetching 64 bytes is still better on T4 so 2 BIS > instructions should be issued. > > BIS can't be use for general prefetching since it may fault. New > PrefetchAllocation node was added for allocation prefetching. > > Changed prefetchAlloc_bis parameter from memory to regP. 
> > Use AllocatePrefetchInstr on Sparc to allow specify what instruction > to use for allocation prefetching (0: prefetch write, 1: BIS). > > Added new instructions on Sparc cacheLineAdrX to reduce number of > instructions generated for finding next cache line address. > > Added new flag AllocateInstPrefetchLines to specify number of lines to > prefetch for instance allocation. > > L1_data_cache_line_size() renamed to prefetch_data_size(). From paul.hohensee at oracle.com Tue Aug 16 06:11:38 2011 From: paul.hohensee at oracle.com (Paul Hohensee) Date: Tue, 16 Aug 2011 09:11:38 -0400 Subject: Request for reviews (M): 7079329: Adjust allocation prefetching for T4 In-Reply-To: <4E4A6A30.6090608@oracle.com> References: <4E49C3E3.6060903@oracle.com> <4E4A6A30.6090608@oracle.com> Message-ID: <4E4A6C8A.9030306@oracle.com> Also, is there a way to avoid using #ifdef SPARC in threadLocalAllocBuffer.hpp? Maybe add a predicate to vm_version that says whether or not to play the tlab reserve game. Paul On 8/16/11 9:01 AM, Paul Hohensee wrote: > You're changing the meaning of an existing flag, > AllocatePrefetchLines, to > apply only to arrays, right? > > If so, I'd add another flag for arrays, maybe call it > AllocateArrayPrefetchLines, > and change the code so AllocatePrefetchLines becomes an optional > parameter. > E.g., default it to -1 in globals.hpp, and if it's specified on the > command line, > set both AllocateArrayPrefetchLines and AllocateInstPrefetchLines to the > command line value. That would retain backward compatibility: I remember > seeing AllocatePrefetchLines used in a few jbb submissions. > > Also, I'd rename AllocateInstPrefetchLines to > AllocateInstancePrefetchLines. 'Inst" > is a bit confusing to me and perhaps to others: the first thing I > think of is 'instruction'. 
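Paul proposes keeping AllocatePrefetchLines for backward compatibility. It would default to -1, and an explicit value would drive both the array and instance flags. The sketch below illustrates that rule with plain struct fields standing in for VM flags. The flag names follow Paul's suggested spelling. The defaults of 3 and 1 are assumptions, chosen because doubling them gives the T4 values 6 and 2 quoted later in the thread. This is not HotSpot's globals.hpp or argument-processing code.

```cpp
#include <cassert>

// Hypothetical stand-ins for the VM flags discussed in the thread.
struct PrefetchFlags {
  int AllocatePrefetchLines = -1;         // legacy flag; -1 means "not set"
  int AllocateArrayPrefetchLines = 3;     // assumed default
  int AllocateInstancePrefetchLines = 1;  // assumed default
};

// If the legacy flag was given explicitly, let it drive both new flags so
// existing command lines keep their meaning.
void apply_legacy_prefetch_flag(PrefetchFlags& f) {
  if (f.AllocatePrefetchLines >= 0) {
    f.AllocateArrayPrefetchLines = f.AllocatePrefetchLines;
    f.AllocateInstancePrefetchLines = f.AllocatePrefetchLines;
  }
}
```

A real implementation would also have to decide what happens when the legacy flag and one of the new flags are both given. The sketch simply lets the legacy flag win.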
> > Paul > > On 8/15/11 9:12 PM, Vladimir Kozlov wrote: >> http://cr.openjdk.java.net/~kvn/7079329/webrev >> >> 7079329: Adjust allocation prefetching for T4 >> >> L2 cache line size is 32 bytes on T4 instead of 64 bytes on T series >> before. As result BIS instruction prefetches only 32 bytes. Jbb2005 >> runs show that prefetching 64 bytes is still better on T4 so 2 BIS >> instructions should be issued. >> >> BIS can't be use for general prefetching since it may fault. New >> PrefetchAllocation node was added for allocation prefetching. >> >> Changed prefetchAlloc_bis parameter from memory to regP. >> >> Use AllocatePrefetchInstr on Sparc to allow specify what instruction >> to use for allocation prefetching (0: prefetch write, 1: BIS). >> >> Added new instructions on Sparc cacheLineAdrX to reduce number of >> instructions generated for finding next cache line address. >> >> Added new flag AllocateInstPrefetchLines to specify number of lines >> to prefetch for instance allocation. >> >> L1_data_cache_line_size() renamed to prefetch_data_size(). From christian.thalinger at oracle.com Tue Aug 16 06:26:24 2011 From: christian.thalinger at oracle.com (christian.thalinger at oracle.com) Date: Tue, 16 Aug 2011 13:26:24 +0000 Subject: hg: hsx/hotspot-comp/hotspot: 7071653: JSR 292: call site change notification should be pushed not pulled Message-ID: <20110816132629.4D79447BFC@hg.openjdk.java.net> Changeset: fdb992d83a87 Author: twisti Date: 2011-08-16 04:14 -0700 URL: http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/fdb992d83a87 7071653: JSR 292: call site change notification should be pushed not pulled Reviewed-by: kvn, never, bdelsart ! src/cpu/sparc/vm/interp_masm_sparc.cpp ! src/cpu/sparc/vm/interp_masm_sparc.hpp ! src/cpu/sparc/vm/templateTable_sparc.cpp ! src/cpu/x86/vm/interp_masm_x86_32.cpp ! src/cpu/x86/vm/interp_masm_x86_32.hpp ! src/cpu/x86/vm/interp_masm_x86_64.cpp ! src/cpu/x86/vm/interp_masm_x86_64.hpp ! src/cpu/x86/vm/templateTable_x86_32.cpp ! 
src/cpu/x86/vm/templateTable_x86_64.cpp ! src/share/vm/ci/ciCallSite.cpp ! src/share/vm/ci/ciCallSite.hpp ! src/share/vm/ci/ciField.hpp ! src/share/vm/classfile/systemDictionary.cpp ! src/share/vm/classfile/systemDictionary.hpp ! src/share/vm/classfile/vmSymbols.hpp ! src/share/vm/code/dependencies.cpp ! src/share/vm/code/dependencies.hpp ! src/share/vm/code/nmethod.cpp ! src/share/vm/interpreter/interpreterRuntime.cpp ! src/share/vm/interpreter/templateTable.hpp ! src/share/vm/memory/universe.cpp ! src/share/vm/memory/universe.hpp ! src/share/vm/oops/instanceKlass.cpp ! src/share/vm/opto/callGenerator.cpp ! src/share/vm/opto/callGenerator.hpp ! src/share/vm/opto/doCall.cpp ! src/share/vm/opto/parse3.cpp From vladimir.kozlov at oracle.com Tue Aug 16 08:01:37 2011 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 16 Aug 2011 08:01:37 -0700 Subject: Request for reviews (M): 7079329: Adjust allocation prefetching for T4 In-Reply-To: References: <4E49C3E3.6060903@oracle.com> Message-ID: <4E4A8651.60006@oracle.com> On 8/16/11 2:29 AM, Christian Thalinger wrote: > > On Aug 16, 2011, at 3:12 AM, Vladimir Kozlov wrote: > >> http://cr.openjdk.java.net/~kvn/7079329/webrev >> >> 7079329: Adjust allocation prefetching for T4 >> >> L2 cache line size is 32 bytes on T4 instead of 64 bytes on T series before. As result BIS instruction prefetches only 32 bytes. Jbb2005 runs show that prefetching 64 bytes is still better on T4 so 2 BIS instructions should be issued. >> >> BIS can't be use for general prefetching since it may fault. New PrefetchAllocation node was added for allocation prefetching. >> >> Changed prefetchAlloc_bis parameter from memory to regP. >> >> Use AllocatePrefetchInstr on Sparc to allow specify what instruction to use for allocation prefetching (0: prefetch write, 1: BIS). >> >> Added new instructions on Sparc cacheLineAdrX to reduce number of instructions generated for finding next cache line address. 
>> >> Added new flag AllocateInstPrefetchLines to specify number of lines to prefetch for instance allocation. >> >> L1_data_cache_line_size() renamed to prefetch_data_size(). > > src/cpu/x86/vm/x86_32.ad: > src/cpu/x86/vm/x86_64.ad: > > Can you use MacroAssembler instructions to emit the code for the new instructs? OK. > > src/cpu/sparc/vm/vm_version_sparc.cpp: > > + if (is_T4()) { > + // Double number of prefetched cache lines on T4 > + // since L2 cache line size is smaller (32 bytes). > + if (FLAG_IS_DEFAULT(AllocatePrefetchLines)) { > + FLAG_SET_DEFAULT(AllocatePrefetchLines, 6); > + } > + if (FLAG_IS_DEFAULT(AllocateInstPrefetchLines)) { > + FLAG_SET_DEFAULT(AllocateInstPrefetchLines, 2); > + } > + } > > Maybe you should use *2 here. Something like this?: + if (FLAG_IS_DEFAULT(AllocatePrefetchLines)) { + FLAG_SET_DEFAULT(AllocatePrefetchLines, AllocatePrefetchLines*2); + } Vladimir > > Otherwise this looks good. > > -- Christian From vladimir.kozlov at oracle.com Tue Aug 16 08:13:30 2011 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 16 Aug 2011 08:13:30 -0700 Subject: allocation prefetching with block initializing instructions In-Reply-To: <160598AAAEA6C640BF796BA28D836C6404FC0D85E6@DEWDFECCR04.wdf.sap.corp> References: <160598AAAEA6C640BF796BA28D836C6404FC0D85E6@DEWDFECCR04.wdf.sap.corp> Message-ID: <4E4A891A.2060703@oracle.com> Martin, I have next RFE which I am working on. I do use BIS in ClearArray. I still need to figure out how to use it for zeroing new objects in runtime: pd_fill_to_aligned_words() in copy_sparc.hpp which is used for big arrays. 7059037: Use BIS for zeroing on T4 Regards, Vladimir On 8/16/11 3:31 AM, Doerr, Martin wrote: > Hello everybody, > I have read your emails about the allocation prefetching on SPARC. > Avoiding fetching the cache lines from memory seems to make a lot of sense. > However, it should be possible to use these block initializing stores to replace > the ClearArray nodes in addition. 
We are loosing quite some time in these > clear loops. > Have you guys already thought about this? > I had played with the ZeroTLAB switch some time ago, but the TLABs appear to > get too large so clearing them at once doesn't perform well. But if we only > clear to something like a prefetch watermark and get rid of the ClearArray > we should get better performance. We only have to make sure that we always clear > up to some distance behind the object being allocated. > I'm looking forward to read your comments. Kind regards, > Martin D From vladimir.kozlov at oracle.com Tue Aug 16 08:18:52 2011 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 16 Aug 2011 08:18:52 -0700 Subject: Request for reviews (M): 7079329: Adjust allocation prefetching for T4 In-Reply-To: <4E4A6A18.6080807@oracle.com> References: <4E49C3E3.6060903@oracle.com> <4E4A6A18.6080807@oracle.com> Message-ID: <4E4A8A5C.7070808@oracle.com> On 8/16/11 6:01 AM, Paul Hohensee wrote: > You're changing the meaning of an existing flag, AllocatePrefetchLines, to > apply only to arrays, right? No. It was always used only for arrays: ! uint lines = (length != NULL) ? AllocatePrefetchLines : 1; > That would retain backward compatibility: I believe > I've seen AllocatePrefetchLines used in a few jbb submissions. That is why I did not rename it. > > Also, I'd rename AllocateInstPrefetchLines to AllocateInstancePrefetchLines. 'Inst" > is a bit confusing to me and perhaps to others: the first thing I think of is 'instruction'. Agree. Thanks, Vladimir > > Paul > > On 8/15/11 9:12 PM, Vladimir Kozlov wrote: >> http://cr.openjdk.java.net/~kvn/7079329/webrev >> >> 7079329: Adjust allocation prefetching for T4 >> >> L2 cache line size is 32 bytes on T4 instead of 64 bytes on T series before. As result BIS instruction prefetches only >> 32 bytes. Jbb2005 runs show that prefetching 64 bytes is still better on T4 so 2 BIS instructions should be issued. 
>> >> BIS can't be use for general prefetching since it may fault. New PrefetchAllocation node was added for allocation >> prefetching. >> >> Changed prefetchAlloc_bis parameter from memory to regP. >> >> Use AllocatePrefetchInstr on Sparc to allow specify what instruction to use for allocation prefetching (0: prefetch >> write, 1: BIS). >> >> Added new instructions on Sparc cacheLineAdrX to reduce number of instructions generated for finding next cache line >> address. >> >> Added new flag AllocateInstPrefetchLines to specify number of lines to prefetch for instance allocation. >> >> L1_data_cache_line_size() renamed to prefetch_data_size(). From vladimir.kozlov at oracle.com Tue Aug 16 08:20:09 2011 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 16 Aug 2011 08:20:09 -0700 Subject: Request for reviews (M): 7079329: Adjust allocation prefetching for T4 In-Reply-To: <4E4A6C8A.9030306@oracle.com> References: <4E49C3E3.6060903@oracle.com> <4E4A6A30.6090608@oracle.com> <4E4A6C8A.9030306@oracle.com> Message-ID: <4E4A8AA9.2080006@oracle.com> I will think about it. Thanks, Vladimir On 8/16/11 6:11 AM, Paul Hohensee wrote: > Also, is there a way to avoid using #ifdef SPARC in threadLocalAllocBuffer.hpp? > Maybe add a predicate to vm_version that says whether or not to play the tlab > reserve game. > > Paul > > On 8/16/11 9:01 AM, Paul Hohensee wrote: >> You're changing the meaning of an existing flag, AllocatePrefetchLines, to >> apply only to arrays, right? >> >> If so, I'd add another flag for arrays, maybe call it AllocateArrayPrefetchLines, >> and change the code so AllocatePrefetchLines becomes an optional parameter. >> E.g., default it to -1 in globals.hpp, and if it's specified on the command line, >> set both AllocateArrayPrefetchLines and AllocateInstPrefetchLines to the >> command line value. That would retain backward compatibility: I remember >> seeing AllocatePrefetchLines used in a few jbb submissions. 
>> >> Also, I'd rename AllocateInstPrefetchLines to AllocateInstancePrefetchLines. 'Inst" >> is a bit confusing to me and perhaps to others: the first thing I think of is 'instruction'. >> >> Paul >> >> On 8/15/11 9:12 PM, Vladimir Kozlov wrote: >>> http://cr.openjdk.java.net/~kvn/7079329/webrev >>> >>> 7079329: Adjust allocation prefetching for T4 >>> >>> L2 cache line size is 32 bytes on T4 instead of 64 bytes on T series before. As result BIS instruction prefetches >>> only 32 bytes. Jbb2005 runs show that prefetching 64 bytes is still better on T4 so 2 BIS instructions should be issued. >>> >>> BIS can't be use for general prefetching since it may fault. New PrefetchAllocation node was added for allocation >>> prefetching. >>> >>> Changed prefetchAlloc_bis parameter from memory to regP. >>> >>> Use AllocatePrefetchInstr on Sparc to allow specify what instruction to use for allocation prefetching (0: prefetch >>> write, 1: BIS). >>> >>> Added new instructions on Sparc cacheLineAdrX to reduce number of instructions generated for finding next cache line >>> address. >>> >>> Added new flag AllocateInstPrefetchLines to specify number of lines to prefetch for instance allocation. >>> >>> L1_data_cache_line_size() renamed to prefetch_data_size(). From christian.thalinger at oracle.com Tue Aug 16 08:48:34 2011 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Tue, 16 Aug 2011 17:48:34 +0200 Subject: Request for reviews (M): 7079329: Adjust allocation prefetching for T4 In-Reply-To: <4E4A8651.60006@oracle.com> References: <4E49C3E3.6060903@oracle.com> <4E4A8651.60006@oracle.com> Message-ID: On Aug 16, 2011, at 5:01 PM, Vladimir Kozlov wrote: > On 8/16/11 2:29 AM, Christian Thalinger wrote: >> >> On Aug 16, 2011, at 3:12 AM, Vladimir Kozlov wrote: >> >>> http://cr.openjdk.java.net/~kvn/7079329/webrev >>> >>> 7079329: Adjust allocation prefetching for T4 >>> >>> L2 cache line size is 32 bytes on T4 instead of 64 bytes on T series before. 
As result BIS instruction prefetches only 32 bytes. Jbb2005 runs show that prefetching 64 bytes is still better on T4 so 2 BIS instructions should be issued. >>> >>> BIS can't be use for general prefetching since it may fault. New PrefetchAllocation node was added for allocation prefetching. >>> >>> Changed prefetchAlloc_bis parameter from memory to regP. >>> >>> Use AllocatePrefetchInstr on Sparc to allow specify what instruction to use for allocation prefetching (0: prefetch write, 1: BIS). >>> >>> Added new instructions on Sparc cacheLineAdrX to reduce number of instructions generated for finding next cache line address. >>> >>> Added new flag AllocateInstPrefetchLines to specify number of lines to prefetch for instance allocation. >>> >>> L1_data_cache_line_size() renamed to prefetch_data_size(). >> >> src/cpu/x86/vm/x86_32.ad: >> src/cpu/x86/vm/x86_64.ad: >> >> Can you use MacroAssembler instructions to emit the code for the new instructs? > > OK. > >> >> src/cpu/sparc/vm/vm_version_sparc.cpp: >> >> + if (is_T4()) { >> + // Double number of prefetched cache lines on T4 >> + // since L2 cache line size is smaller (32 bytes). >> + if (FLAG_IS_DEFAULT(AllocatePrefetchLines)) { >> + FLAG_SET_DEFAULT(AllocatePrefetchLines, 6); >> + } >> + if (FLAG_IS_DEFAULT(AllocateInstPrefetchLines)) { >> + FLAG_SET_DEFAULT(AllocateInstPrefetchLines, 2); >> + } >> + } >> >> Maybe you should use *2 here. > > Something like this?: > > + if (FLAG_IS_DEFAULT(AllocatePrefetchLines)) { > + FLAG_SET_DEFAULT(AllocatePrefetchLines, AllocatePrefetchLines*2); > + } Yes, that makes more sense to me. -- Christian > > Vladimir > >> >> Otherwise this looks good. 
>> >> -- Christian From vladimir.kozlov at oracle.com Tue Aug 16 10:05:15 2011 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 16 Aug 2011 10:05:15 -0700 Subject: Request for reviews (M): 7079329: Adjust allocation prefetching for T4 In-Reply-To: <4E4A8AA9.2080006@oracle.com> References: <4E49C3E3.6060903@oracle.com> <4E4A6A30.6090608@oracle.com> <4E4A6C8A.9030306@oracle.com> <4E4A8AA9.2080006@oracle.com> Message-ID: <4E4AA34B.8010504@oracle.com> Thank you, Christian, Paul and Igor I updated webrev with suggestions: http://cr.openjdk.java.net/~kvn/7079329/webrev - AllocateInstPrefetchLines renamed to AllocateInstancePrefetchLines. - Prefetch instructions in x86 .ad use MacroAssembler instructions. - Added Abstract_VM_Version::reserve_for_allocation_prefetch() method used in ThreadLocalAllocBuffer::end_reserve(). - I have to use FLAG_SET_ERGO() for AllocatePrefetchLines*2 setting since VM_Version::initialize() is called twice on Sparc (long story which I don't want to discuss here). Vladimir Vladimir Kozlov wrote: > I will think about it. > > Thanks, > Vladimir > > On 8/16/11 6:11 AM, Paul Hohensee wrote: >> Also, is there a way to avoid using #ifdef SPARC in >> threadLocalAllocBuffer.hpp? >> Maybe add a predicate to vm_version that says whether or not to play >> the tlab >> reserve game. >> >> Paul >> >> On 8/16/11 9:01 AM, Paul Hohensee wrote: >>> You're changing the meaning of an existing flag, >>> AllocatePrefetchLines, to >>> apply only to arrays, right? >>> >>> If so, I'd add another flag for arrays, maybe call it >>> AllocateArrayPrefetchLines, >>> and change the code so AllocatePrefetchLines becomes an optional >>> parameter. >>> E.g., default it to -1 in globals.hpp, and if it's specified on the >>> command line, >>> set both AllocateArrayPrefetchLines and AllocateInstPrefetchLines to the >>> command line value. That would retain backward compatibility: I remember >>> seeing AllocatePrefetchLines used in a few jbb submissions. 
>>> >>> Also, I'd rename AllocateInstPrefetchLines to >>> AllocateInstancePrefetchLines. 'Inst" >>> is a bit confusing to me and perhaps to others: the first thing I >>> think of is 'instruction'. >>> >>> Paul >>> >>> On 8/15/11 9:12 PM, Vladimir Kozlov wrote: >>>> http://cr.openjdk.java.net/~kvn/7079329/webrev >>>> >>>> 7079329: Adjust allocation prefetching for T4 >>>> >>>> L2 cache line size is 32 bytes on T4 instead of 64 bytes on T series >>>> before. As result BIS instruction prefetches >>>> only 32 bytes. Jbb2005 runs show that prefetching 64 bytes is still >>>> better on T4 so 2 BIS instructions should be issued. >>>> >>>> BIS can't be use for general prefetching since it may fault. New >>>> PrefetchAllocation node was added for allocation >>>> prefetching. >>>> >>>> Changed prefetchAlloc_bis parameter from memory to regP. >>>> >>>> Use AllocatePrefetchInstr on Sparc to allow specify what instruction >>>> to use for allocation prefetching (0: prefetch >>>> write, 1: BIS). >>>> >>>> Added new instructions on Sparc cacheLineAdrX to reduce number of >>>> instructions generated for finding next cache line >>>> address. >>>> >>>> Added new flag AllocateInstPrefetchLines to specify number of lines >>>> to prefetch for instance allocation. >>>> >>>> L1_data_cache_line_size() renamed to prefetch_data_size(). From igor.veresov at oracle.com Tue Aug 16 11:09:05 2011 From: igor.veresov at oracle.com (Igor Veresov) Date: Tue, 16 Aug 2011 11:09:05 -0700 Subject: Request for reviews (M): 7079329: Adjust allocation prefetching for T4 In-Reply-To: <4E4AA34B.8010504@oracle.com> References: <4E49C3E3.6060903@oracle.com> <4E4A6A30.6090608@oracle.com> <4E4A6C8A.9030306@oracle.com> <4E4A8AA9.2080006@oracle.com> <4E4AA34B.8010504@oracle.com> Message-ID: <8D4DCF97085E4A44A5462CB4356E2B81@oracle.com> Still looks good. 
igor On Tuesday, August 16, 2011 at 10:05 AM, Vladimir Kozlov wrote: > Thank you, Christian, Paul and Igor > > I updated webrev with suggestions: > > http://cr.openjdk.java.net/~kvn/7079329/webrev > > - AllocateInstPrefetchLines renamed to AllocateInstancePrefetchLines. > - Prefetch instructions in x86 .ad use MacroAssembler instructions. > - Added Abstract_VM_Version::reserve_for_allocation_prefetch() method used in > ThreadLocalAllocBuffer::end_reserve(). > - I have to use FLAG_SET_ERGO() for AllocatePrefetchLines*2 setting since > VM_Version::initialize() is called twice on Sparc (long story which I don't want > to discuss here). > > Vladimir > > Vladimir Kozlov wrote: > > I will think about it. > > > > Thanks, > > Vladimir > > > > On 8/16/11 6:11 AM, Paul Hohensee wrote: > > > Also, is there a way to avoid using #ifdef SPARC in > > > threadLocalAllocBuffer.hpp? > > > Maybe add a predicate to vm_version that says whether or not to play > > > the tlab > > > reserve game. > > > > > > Paul > > > > > > On 8/16/11 9:01 AM, Paul Hohensee wrote: > > > > You're changing the meaning of an existing flag, > > > > AllocatePrefetchLines, to > > > > apply only to arrays, right? > > > > > > > > If so, I'd add another flag for arrays, maybe call it > > > > AllocateArrayPrefetchLines, > > > > and change the code so AllocatePrefetchLines becomes an optional > > > > parameter. > > > > E.g., default it to -1 in globals.hpp, and if it's specified on the > > > > command line, > > > > set both AllocateArrayPrefetchLines and AllocateInstPrefetchLines to the > > > > command line value. That would retain backward compatibility: I remember > > > > seeing AllocatePrefetchLines used in a few jbb submissions. > > > > > > > > Also, I'd rename AllocateInstPrefetchLines to > > > > AllocateInstancePrefetchLines. 'Inst" > > > > is a bit confusing to me and perhaps to others: the first thing I > > > > think of is 'instruction'. 
> > > > > > > > Paul > > > > > > > > On 8/15/11 9:12 PM, Vladimir Kozlov wrote: > > > > > http://cr.openjdk.java.net/~kvn/7079329/webrev > > > > > > > > > > 7079329: Adjust allocation prefetching for T4 > > > > > > > > > > L2 cache line size is 32 bytes on T4 instead of 64 bytes on T series > > > > > before. As result BIS instruction prefetches > > > > > only 32 bytes. Jbb2005 runs show that prefetching 64 bytes is still > > > > > better on T4 so 2 BIS instructions should be issued. > > > > > > > > > > BIS can't be use for general prefetching since it may fault. New > > > > > PrefetchAllocation node was added for allocation > > > > > prefetching. > > > > > > > > > > Changed prefetchAlloc_bis parameter from memory to regP. > > > > > > > > > > Use AllocatePrefetchInstr on Sparc to allow specify what instruction > > > > > to use for allocation prefetching (0: prefetch > > > > > write, 1: BIS). > > > > > > > > > > Added new instructions on Sparc cacheLineAdrX to reduce number of > > > > > instructions generated for finding next cache line > > > > > address. > > > > > > > > > > Added new flag AllocateInstPrefetchLines to specify number of lines > > > > > to prefetch for instance allocation. > > > > > > > > > > L1_data_cache_line_size() renamed to prefetch_data_size(). From tom.rodriguez at oracle.com Tue Aug 16 11:11:13 2011 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Tue, 16 Aug 2011 11:11:13 -0700 Subject: Request for reviews (XS): 7079317: Incorrect branch's destination block in PrintoOptoAssembly output In-Reply-To: <4E49B7E5.8080909@oracle.com> References: <4E494214.2080407@oracle.com> <45803A4A-49AB-4B4A-88CB-44419E09AE7B@oracle.com> <4E495C52.7080807@oracle.com> <08E31550-58B4-4125-876A-304C4465BC78@oracle.com> <4E496DC8.60107@oracle.com> <8AA4ABA0-5C39-4F77-9FBD-F4B006A4AFC5@oracle.com> <4E49B7E5.8080909@oracle.com> Message-ID: <9987443B-50F2-42FF-8A98-5713B87B50A3@oracle.com> That looks good. 
tom On Aug 15, 2011, at 5:20 PM, Vladimir Kozlov wrote: > Tom, > > You should not give me these ideas since I can't back out now :) . Here is implementation using MachBranchNode. The only problem was JumpX mach node which is subclass of MachConstantNode. But it is fine since it does not have label, short version or delay slot (the sparc instruction has delay slot but we use ialu_reg_reg pipe_class). It needs only one additional check in output.cpp where Kill projections are processed. > > http://cr.openjdk.java.net/~kvn/7079317/webrev > > Thanks, > Vladimir > > Tom Rodriguez wrote: >> On Aug 15, 2011, at 12:04 PM, Vladimir Kozlov wrote: >>> Tom Rodriguez wrote: >>>> On Aug 15, 2011, at 10:50 AM, Vladimir Kozlov wrote: >>>>> Node::size() for branches calls code in scratch_emit_size() which resets label and block. An other solution for this problem would be save/restore label and block in scratch_emit_size() but it would require a lot more code changes. >>>> Ah. Fixing scratch_emit_size seems better since it's kind of a surprising behaviour. It's not that much code is it? >>> It needs a virtual method in MachNode which increase vtable of all Mach nodes. Here is webrev: >> If we're really concerned about vtable size, all of those subtype specific setter/getters could probably be elsewhere down in the hierarchy. The only meaningful implementations of label_set are in subclasses of MachGotoNode and MachIfNode so it seems like it could be moved into a new superclass of them. >> I guess alternatively you could have a single virtual which returns the labelOper and implement label_set and save_label non-virtually in terms of that, though that probably doesn't play well with MachNullCheck which is_Branch but doesn't have a label. The whole labelOper machinery looks ridiculously complicated... >> Anyway, your change is ok with me as is. 
>> tom >>> http://cr.openjdk.java.net/~kvn/7079317/webrev >>> >>> Vladimir >>> >>>> tom >>>>> Vladimir >>>>> >>>>> Tom Rodriguez wrote: >>>>>> I don't understand how calling insts_size and Node::size causes a bug. What am I missing? >>>>>> tom >>>>>> On Aug 15, 2011, at 8:58 AM, Vladimir Kozlov wrote: >>>>>>> http://cr.openjdk.java.net/~kvn/7079317/webrev >>>>>>> >>>>>>> 7079317: Incorrect branch's destination block in PrintoOptoAssembly output >>>>>>> >>>>>>> After changes for 7063629 PrintoOptoAssembly output shows all branches have B0 as destination block. >>>>>>> Remove unneeded debug verification code which overwrites label and block information for branches. There are other checks there which verify that code size was not changed. From vladimir.kozlov at oracle.com Tue Aug 16 11:14:36 2011 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 16 Aug 2011 11:14:36 -0700 Subject: Request for reviews (XS): 7079317: Incorrect branch's destination block in PrintoOptoAssembly output In-Reply-To: <9987443B-50F2-42FF-8A98-5713B87B50A3@oracle.com> References: <4E494214.2080407@oracle.com> <45803A4A-49AB-4B4A-88CB-44419E09AE7B@oracle.com> <4E495C52.7080807@oracle.com> <08E31550-58B4-4125-876A-304C4465BC78@oracle.com> <4E496DC8.60107@oracle.com> <8AA4ABA0-5C39-4F77-9FBD-F4B006A4AFC5@oracle.com> <4E49B7E5.8080909@oracle.com> <9987443B-50F2-42FF-8A98-5713B87B50A3@oracle.com> Message-ID: <4E4AB38C.4000504@oracle.com> Thank you, Tom Vladimir Tom Rodriguez wrote: > That looks good. > > tom > > On Aug 15, 2011, at 5:20 PM, Vladimir Kozlov wrote: > >> Tom, >> >> You should not give me these ideas since I can't back out now :) . Here is implementation using MachBranchNode. The only problem was JumpX mach node which is subclass of MachConstantNode. But it is fine since it does not have label, short version or delay slot (the sparc instruction has delay slot but we use ialu_reg_reg pipe_class). 
It needs only one additional check in output.cpp where Kill projections are processed. >> >> http://cr.openjdk.java.net/~kvn/7079317/webrev >> >> Thanks, >> Vladimir >> >> Tom Rodriguez wrote: >>> On Aug 15, 2011, at 12:04 PM, Vladimir Kozlov wrote: >>>> Tom Rodriguez wrote: >>>>> On Aug 15, 2011, at 10:50 AM, Vladimir Kozlov wrote: >>>>>> Node::size() for branches calls code in scratch_emit_size() which resets label and block. An other solution for this problem would be save/restore label and block in scratch_emit_size() but it would require a lot more code changes. >>>>> Ah. Fixing scratch_emit_size seems better since it's kind of a surprising behaviour. It's not that much code is it? >>>> It needs a virtual method in MachNode which increase vtable of all Mach nodes. Here is webrev: >>> If we're really concerned about vtable size, all of those subtype specific setter/getters could probably be elsewhere down in the hierarchy. The only meaningful implementations of label_set are in subclasses of MachGotoNode and MachIfNode so it seems like it could be moved into a new superclass of them. >>> I guess alternatively you could have a single virtual which returns the labelOper and implement label_set and save_label non-virtually in terms of that, though that probably doesn't play well with MachNullCheck which is_Branch but doesn't have a label. The whole labelOper machinery looks ridiculously complicated... >>> Anyway, your change is ok with me as is. >>> tom >>>> http://cr.openjdk.java.net/~kvn/7079317/webrev >>>> >>>> Vladimir >>>> >>>>> tom >>>>>> Vladimir >>>>>> >>>>>> Tom Rodriguez wrote: >>>>>>> I don't understand how calling insts_size and Node::size causes a bug. What am I missing? 
>>>>>>> tom >>>>>>> On Aug 15, 2011, at 8:58 AM, Vladimir Kozlov wrote: >>>>>>>> http://cr.openjdk.java.net/~kvn/7079317/webrev >>>>>>>> >>>>>>>> 7079317: Incorrect branch's destination block in PrintoOptoAssembly output >>>>>>>> >>>>>>>> After changes for 7063629 PrintoOptoAssembly output shows all branches have B0 as destination block. >>>>>>>> Remove unneeded debug verification code which overwrites label and block information for branches. There are other checks there which verify that code size was not changed. > From christian.thalinger at oracle.com Tue Aug 16 11:32:40 2011 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Tue, 16 Aug 2011 20:32:40 +0200 Subject: Request for reviews (M): 7079329: Adjust allocation prefetching for T4 In-Reply-To: <4E4AA34B.8010504@oracle.com> References: <4E49C3E3.6060903@oracle.com> <4E4A6A30.6090608@oracle.com> <4E4A6C8A.9030306@oracle.com> <4E4A8AA9.2080006@oracle.com> <4E4AA34B.8010504@oracle.com> Message-ID: Looks good. -- Christian On Aug 16, 2011, at 7:05 PM, Vladimir Kozlov wrote: > Thank you, Christian, Paul and Igor > > I updated webrev with suggestions: > > http://cr.openjdk.java.net/~kvn/7079329/webrev > > - AllocateInstPrefetchLines renamed to AllocateInstancePrefetchLines. > - Prefetch instructions in x86 .ad use MacroAssembler instructions. > - Added Abstract_VM_Version::reserve_for_allocation_prefetch() method used in ThreadLocalAllocBuffer::end_reserve(). > - I have to use FLAG_SET_ERGO() for AllocatePrefetchLines*2 setting since VM_Version::initialize() is called twice on Sparc (long story which I don't want to discuss here). > > Vladimir > > Vladimir Kozlov wrote: >> I will think about it. >> Thanks, >> Vladimir >> On 8/16/11 6:11 AM, Paul Hohensee wrote: >>> Also, is there a way to avoid using #ifdef SPARC in threadLocalAllocBuffer.hpp? >>> Maybe add a predicate to vm_version that says whether or not to play the tlab >>> reserve game. 
>>> >>> Paul >>> >>> On 8/16/11 9:01 AM, Paul Hohensee wrote: >>>> You're changing the meaning of an existing flag, AllocatePrefetchLines, to >>>> apply only to arrays, right? >>>> >>>> If so, I'd add another flag for arrays, maybe call it AllocateArrayPrefetchLines, >>>> and change the code so AllocatePrefetchLines becomes an optional parameter. >>>> E.g., default it to -1 in globals.hpp, and if it's specified on the command line, >>>> set both AllocateArrayPrefetchLines and AllocateInstPrefetchLines to the >>>> command line value. That would retain backward compatibility: I remember >>>> seeing AllocatePrefetchLines used in a few jbb submissions. >>>> >>>> Also, I'd rename AllocateInstPrefetchLines to AllocateInstancePrefetchLines. 'Inst" >>>> is a bit confusing to me and perhaps to others: the first thing I think of is 'instruction'. >>>> >>>> Paul >>>> >>>> On 8/15/11 9:12 PM, Vladimir Kozlov wrote: >>>>> http://cr.openjdk.java.net/~kvn/7079329/webrev >>>>> >>>>> 7079329: Adjust allocation prefetching for T4 >>>>> >>>>> L2 cache line size is 32 bytes on T4 instead of 64 bytes on T series before. As result BIS instruction prefetches >>>>> only 32 bytes. Jbb2005 runs show that prefetching 64 bytes is still better on T4 so 2 BIS instructions should be issued. >>>>> >>>>> BIS can't be use for general prefetching since it may fault. New PrefetchAllocation node was added for allocation >>>>> prefetching. >>>>> >>>>> Changed prefetchAlloc_bis parameter from memory to regP. >>>>> >>>>> Use AllocatePrefetchInstr on Sparc to allow specify what instruction to use for allocation prefetching (0: prefetch >>>>> write, 1: BIS). >>>>> >>>>> Added new instructions on Sparc cacheLineAdrX to reduce number of instructions generated for finding next cache line >>>>> address. >>>>> >>>>> Added new flag AllocateInstPrefetchLines to specify number of lines to prefetch for instance allocation. >>>>> >>>>> L1_data_cache_line_size() renamed to prefetch_data_size(). 
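In the updated webrev, Vladimir says he had to use FLAG_SET_ERGO() for the AllocatePrefetchLines*2 setting because VM_Version::initialize() is called twice on SPARC. Here is one plausible reading of why that matters. If the doubling leaves the flag marked as "default", the second initialization pass still sees a default value and doubles it again. Recording the change as ergonomic makes the adjustment idempotent. The sketch models this with hypothetical types (IntFlag, Origin, set_default, set_ergo, adjust_for_t4), not with HotSpot's real flag machinery. The starting value of 3 used below is an assumption, chosen to match the 6 hardcoded in the first version of the patch.

```cpp
#include <cassert>

// Hypothetical model of a VM flag that remembers where its value came from.
// Simplified illustration only; not HotSpot's globals/Flag implementation.
enum class Origin { Default, Ergonomic, CommandLine };

struct IntFlag {
  int    value;
  Origin origin;
};

bool is_default(const IntFlag& f) { return f.origin == Origin::Default; }

// Changes the value but leaves the flag marked as "default".
void set_default(IntFlag& f, int v) { f.value = v; }

// Changes the value and records that ergonomics chose it.
void set_ergo(IntFlag& f, int v) {
  f.value = v;
  f.origin = Origin::Ergonomic;
}

// Models the T4 adjustment: with 32-byte L2 lines, issue twice as many
// prefetches so the same number of bytes is covered. Values given
// explicitly (non-default origin) are never touched.
void adjust_for_t4(IntFlag& lines, bool use_ergo) {
  if (!is_default(lines)) return;
  assert(lines.value > 0);
  if (use_ergo) {
    set_ergo(lines, lines.value * 2);
  } else {
    set_default(lines, lines.value * 2);  // origin stays Default
  }
}
```

If adjust_for_t4 runs twice with set_default, the value goes from 3 to 12. With set_ergo, the second pass sees a non-default origin and leaves the value at 6. A value supplied on the command line stays unchanged in both modes.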
From vladimir.kozlov at oracle.com Tue Aug 16 11:31:55 2011 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 16 Aug 2011 11:31:55 -0700 Subject: Request for reviews (M): 7079329: Adjust allocation prefetching for T4 In-Reply-To: References: <4E49C3E3.6060903@oracle.com> <4E4A6A30.6090608@oracle.com> <4E4A6C8A.9030306@oracle.com> <4E4A8AA9.2080006@oracle.com> <4E4AA34B.8010504@oracle.com> Message-ID: <4E4AB79B.4020608@oracle.com> thank you, Christian. Vladimir Christian Thalinger wrote: > Looks good. > > -- Christian > > On Aug 16, 2011, at 7:05 PM, Vladimir Kozlov wrote: > >> Thank you, Christian, Paul and Igor >> >> I updated webrev with suggestions: >> >> http://cr.openjdk.java.net/~kvn/7079329/webrev >> >> - AllocateInstPrefetchLines renamed to AllocateInstancePrefetchLines. >> - Prefetch instructions in x86 .ad use MacroAssembler instructions. >> - Added Abstract_VM_Version::reserve_for_allocation_prefetch() method used in ThreadLocalAllocBuffer::end_reserve(). >> - I have to use FLAG_SET_ERGO() for AllocatePrefetchLines*2 setting since VM_Version::initialize() is called twice on Sparc (long story which I don't want to discuss here). >> >> Vladimir >> >> Vladimir Kozlov wrote: >>> I will think about it. >>> Thanks, >>> Vladimir >>> On 8/16/11 6:11 AM, Paul Hohensee wrote: >>>> Also, is there a way to avoid using #ifdef SPARC in threadLocalAllocBuffer.hpp? >>>> Maybe add a predicate to vm_version that says whether or not to play the tlab >>>> reserve game. >>>> >>>> Paul >>>> >>>> On 8/16/11 9:01 AM, Paul Hohensee wrote: >>>>> You're changing the meaning of an existing flag, AllocatePrefetchLines, to >>>>> apply only to arrays, right? >>>>> >>>>> If so, I'd add another flag for arrays, maybe call it AllocateArrayPrefetchLines, >>>>> and change the code so AllocatePrefetchLines becomes an optional parameter. 
>>>>> E.g., default it to -1 in globals.hpp, and if it's specified on the command line, >>>>> set both AllocateArrayPrefetchLines and AllocateInstPrefetchLines to the >>>>> command line value. That would retain backward compatibility: I remember >>>>> seeing AllocatePrefetchLines used in a few jbb submissions. >>>>> >>>>> Also, I'd rename AllocateInstPrefetchLines to AllocateInstancePrefetchLines. 'Inst" >>>>> is a bit confusing to me and perhaps to others: the first thing I think of is 'instruction'. >>>>> >>>>> Paul >>>>> >>>>> On 8/15/11 9:12 PM, Vladimir Kozlov wrote: >>>>>> http://cr.openjdk.java.net/~kvn/7079329/webrev >>>>>> >>>>>> 7079329: Adjust allocation prefetching for T4 >>>>>> >>>>>> L2 cache line size is 32 bytes on T4 instead of 64 bytes on T series before. As result BIS instruction prefetches >>>>>> only 32 bytes. Jbb2005 runs show that prefetching 64 bytes is still better on T4 so 2 BIS instructions should be issued. >>>>>> >>>>>> BIS can't be use for general prefetching since it may fault. New PrefetchAllocation node was added for allocation >>>>>> prefetching. >>>>>> >>>>>> Changed prefetchAlloc_bis parameter from memory to regP. >>>>>> >>>>>> Use AllocatePrefetchInstr on Sparc to allow specify what instruction to use for allocation prefetching (0: prefetch >>>>>> write, 1: BIS). >>>>>> >>>>>> Added new instructions on Sparc cacheLineAdrX to reduce number of instructions generated for finding next cache line >>>>>> address. >>>>>> >>>>>> Added new flag AllocateInstPrefetchLines to specify number of lines to prefetch for instance allocation. >>>>>> >>>>>> L1_data_cache_line_size() renamed to prefetch_data_size(). 
> From christian.thalinger at Oracle.com Tue Aug 16 12:52:34 2011 From: christian.thalinger at Oracle.com (Christian Thalinger) Date: Tue, 16 Aug 2011 21:52:34 +0200 Subject: Request for reviews (XXS): 7079626: x64 emits unnecessary REX prefix Message-ID: <0B8C0892-E990-441A-B140-754C5CE96FE6@Oracle.com> http://cr.openjdk.java.net/~twisti/7079626/ 7079626: x64 emits unnecessary REX prefix Reviewed-by: While investigating some other bug we found out that on x64 we sometimes emit unnecessary REX prefixes. From vladimir.kozlov at oracle.com Tue Aug 16 12:57:32 2011 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 16 Aug 2011 12:57:32 -0700 Subject: Request for reviews (XXS): 7079626: x64 emits unnecessary REX prefix In-Reply-To: <0B8C0892-E990-441A-B140-754C5CE96FE6@Oracle.com> References: <0B8C0892-E990-441A-B140-754C5CE96FE6@Oracle.com> Message-ID: <4E4ACBAC.40801@oracle.com> Looks good. Vladimir Christian Thalinger wrote: > http://cr.openjdk.java.net/~twisti/7079626/ > > 7079626: x64 emits unnecessary REX prefix > Reviewed-by: > > While investigating some other bug we found out that on x64 we > sometimes emit unnecessary REX prefixes. From igor.veresov at oracle.com Tue Aug 16 13:05:21 2011 From: igor.veresov at oracle.com (Igor Veresov) Date: Tue, 16 Aug 2011 13:05:21 -0700 Subject: Request for reviews (XXS): 7079626: x64 emits unnecessary REX prefix In-Reply-To: <0B8C0892-E990-441A-B140-754C5CE96FE6@Oracle.com> References: <0B8C0892-E990-441A-B140-754C5CE96FE6@Oracle.com> Message-ID: Looks good. igor On Tuesday, August 16, 2011 at 12:52 PM, Christian Thalinger wrote: > http://cr.openjdk.java.net/~twisti/7079626/ > > 7079626: x64 emits unnecessary REX prefix > Reviewed-by: > > While investigating some other bug we found out that on x64 we > sometimes emit unnecessary REX prefixes. 
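For readers unfamiliar with what makes a REX prefix "unnecessary", here is a sketch of the general x86-64 encoding rule. It is not the 7079626 patch itself, which is in assembler_x86.cpp and not shown here. The REX byte has the form 0100WRXB. It is needed only for a 64-bit operand size (W), for a register numbered 8 or above in ModRM.reg (R) or ModRM.rm (B), or for byte operations on spl/bpl/sil/dil. A bare 0x40 with none of these conditions can be dropped. The function name and signature are hypothetical, and only the register-register form is covered, so the X (SIB index) bit never applies.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch (not HotSpot code): REX prefix for a reg-reg
// instruction. reg/rm are register encodings 0..15 (0 = rax, 8 = r8, ...).
// Returns the prefix byte, or -1 when no prefix is needed.
int rex_prefix(int reg, int rm, bool wide, bool byte_op) {
  std::uint8_t rex = 0x40;         // 0100 WRXB
  if (wide)     rex |= 0x08;       // REX.W: 64-bit operand size
  if (reg >= 8) rex |= 0x04;       // REX.R: extends ModRM.reg
  if (rm >= 8)  rex |= 0x01;       // REX.B: extends ModRM.rm
  // Byte ops on encodings 4..7 mean ah/ch/dh/bh without REX and
  // spl/bpl/sil/dil with it, so those still need a bare 0x40.
  const bool uniform_byte =
      byte_op && ((reg >= 4 && reg <= 7) || (rm >= 4 && rm <= 7));
  if (rex == 0x40 && !uniform_byte) {
    return -1;                     // the prefix would be redundant
  }
  return rex;
}
```

For example, a 32-bit operation between eax and ecx needs no prefix, but the same operation on rax and rcx needs 0x48.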
From tom.rodriguez at oracle.com Tue Aug 16 13:08:11 2011 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Tue, 16 Aug 2011 13:08:11 -0700 Subject: Request for reviews (XXS): 7079626: x64 emits unnecessary REX prefix In-Reply-To: <0B8C0892-E990-441A-B140-754C5CE96FE6@Oracle.com> References: <0B8C0892-E990-441A-B140-754C5CE96FE6@Oracle.com> Message-ID: Looks good. tom On Aug 16, 2011, at 12:52 PM, Christian Thalinger wrote: > http://cr.openjdk.java.net/~twisti/7079626/ > > 7079626: x64 emits unnecessary REX prefix > Reviewed-by: > > While investigating some other bug we found out that on x64 we > sometimes emit unnecessary REX prefixes. From vladimir.kozlov at oracle.com Tue Aug 16 16:27:01 2011 From: vladimir.kozlov at oracle.com (vladimir.kozlov at oracle.com) Date: Tue, 16 Aug 2011 23:27:01 +0000 Subject: hg: hsx/hotspot-comp/hotspot: 7079317: Incorrect branch's destination block in PrintoOptoAssembly output Message-ID: <20110816232707.BA0DC47C30@hg.openjdk.java.net> Changeset: 11211f7cb5a0 Author: kvn Date: 2011-08-16 11:53 -0700 URL: http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/11211f7cb5a0 7079317: Incorrect branch's destination block in PrintoOptoAssembly output Summary: save/restore label and block in scratch_emit_size() Reviewed-by: never ! src/share/vm/adlc/archDesc.cpp ! src/share/vm/adlc/formssel.cpp ! src/share/vm/adlc/output_c.cpp ! src/share/vm/adlc/output_h.cpp ! src/share/vm/opto/block.cpp ! src/share/vm/opto/compile.cpp ! src/share/vm/opto/idealGraphPrinter.cpp ! src/share/vm/opto/machnode.cpp ! src/share/vm/opto/machnode.hpp ! src/share/vm/opto/node.hpp ! 
src/share/vm/opto/output.cpp From vladimir.kozlov at oracle.com Tue Aug 16 21:32:33 2011 From: vladimir.kozlov at oracle.com (vladimir.kozlov at oracle.com) Date: Wed, 17 Aug 2011 04:32:33 +0000 Subject: hg: hsx/hotspot-comp/hotspot: 7079329: Adjust allocation prefetching for T4 Message-ID: <20110817043235.1C4A147C41@hg.openjdk.java.net> Changeset: 1af104d6cf99 Author: kvn Date: 2011-08-16 16:59 -0700 URL: http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/1af104d6cf99 7079329: Adjust allocation prefetching for T4 Summary: on T4 2 BIS instructions should be issued to prefetch 64 bytes Reviewed-by: iveresov, phh, twisti ! src/cpu/sparc/vm/assembler_sparc.hpp ! src/cpu/sparc/vm/sparc.ad ! src/cpu/sparc/vm/vm_version_sparc.cpp ! src/cpu/sparc/vm/vm_version_sparc.hpp ! src/cpu/x86/vm/assembler_x86.cpp ! src/cpu/x86/vm/vm_version_x86.cpp ! src/cpu/x86/vm/vm_version_x86.hpp ! src/cpu/x86/vm/x86_32.ad ! src/cpu/x86/vm/x86_64.ad ! src/share/vm/adlc/formssel.cpp ! src/share/vm/memory/threadLocalAllocBuffer.hpp ! src/share/vm/opto/classes.hpp ! src/share/vm/opto/macro.cpp ! src/share/vm/opto/matcher.cpp ! src/share/vm/opto/memnode.hpp ! src/share/vm/runtime/globals.hpp ! src/share/vm/runtime/vm_version.cpp ! src/share/vm/runtime/vm_version.hpp From paul.hohensee at oracle.com Wed Aug 17 04:43:48 2011 From: paul.hohensee at oracle.com (Paul Hohensee) Date: Wed, 17 Aug 2011 07:43:48 -0400 Subject: Request for reviews (M): 7079329: Adjust allocation prefetching for T4 In-Reply-To: <4E4AA34B.8010504@oracle.com> References: <4E49C3E3.6060903@oracle.com> <4E4A6A30.6090608@oracle.com> <4E4A6C8A.9030306@oracle.com> <4E4A8AA9.2080006@oracle.com> <4E4AA34B.8010504@oracle.com> Message-ID: <4E4BA974.2080008@oracle.com> Looks good. Paul On 8/16/11 1:05 PM, Vladimir Kozlov wrote: > Thank you, Christian, Paul and Igor > > I updated webrev with suggestions: > > http://cr.openjdk.java.net/~kvn/7079329/webrev > > - AllocateInstPrefetchLines renamed to AllocateInstancePrefetchLines. 
> - Prefetch instructions in x86 .ad use MacroAssembler instructions. > - Added Abstract_VM_Version::reserve_for_allocation_prefetch() method > used in ThreadLocalAllocBuffer::end_reserve(). > - I have to use FLAG_SET_ERGO() for AllocatePrefetchLines*2 setting > since VM_Version::initialize() is called twice on Sparc (long story > which I don't want to discuss here). > > Vladimir > > Vladimir Kozlov wrote: >> I will think about it. >> >> Thanks, >> Vladimir >> >> On 8/16/11 6:11 AM, Paul Hohensee wrote: >>> Also, is there a way to avoid using #ifdef SPARC in >>> threadLocalAllocBuffer.hpp? >>> Maybe add a predicate to vm_version that says whether or not to play >>> the tlab >>> reserve game. >>> >>> Paul >>> >>> On 8/16/11 9:01 AM, Paul Hohensee wrote: >>>> You're changing the meaning of an existing flag, >>>> AllocatePrefetchLines, to >>>> apply only to arrays, right? >>>> >>>> If so, I'd add another flag for arrays, maybe call it >>>> AllocateArrayPrefetchLines, >>>> and change the code so AllocatePrefetchLines becomes an optional >>>> parameter. >>>> E.g., default it to -1 in globals.hpp, and if it's specified on the >>>> command line, >>>> set both AllocateArrayPrefetchLines and AllocateInstPrefetchLines >>>> to the >>>> command line value. That would retain backward compatibility: I >>>> remember >>>> seeing AllocatePrefetchLines used in a few jbb submissions. >>>> >>>> Also, I'd rename AllocateInstPrefetchLines to >>>> AllocateInstancePrefetchLines. 'Inst" >>>> is a bit confusing to me and perhaps to others: the first thing I >>>> think of is 'instruction'. >>>> >>>> Paul >>>> >>>> On 8/15/11 9:12 PM, Vladimir Kozlov wrote: >>>>> http://cr.openjdk.java.net/~kvn/7079329/webrev >>>>> >>>>> 7079329: Adjust allocation prefetching for T4 >>>>> >>>>> L2 cache line size is 32 bytes on T4 instead of 64 bytes on T >>>>> series before. As result BIS instruction prefetches >>>>> only 32 bytes. 
Jbb2005 runs show that prefetching 64 bytes is >>>>> still better on T4 so 2 BIS instructions should be issued. >>>>> >>>>> BIS can't be use for general prefetching since it may fault. New >>>>> PrefetchAllocation node was added for allocation >>>>> prefetching. >>>>> >>>>> Changed prefetchAlloc_bis parameter from memory to regP. >>>>> >>>>> Use AllocatePrefetchInstr on Sparc to allow specify what >>>>> instruction to use for allocation prefetching (0: prefetch >>>>> write, 1: BIS). >>>>> >>>>> Added new instructions on Sparc cacheLineAdrX to reduce number of >>>>> instructions generated for finding next cache line >>>>> address. >>>>> >>>>> Added new flag AllocateInstPrefetchLines to specify number of >>>>> lines to prefetch for instance allocation. >>>>> >>>>> L1_data_cache_line_size() renamed to prefetch_data_size(). From christian.thalinger at oracle.com Wed Aug 17 09:35:42 2011 From: christian.thalinger at oracle.com (christian.thalinger at oracle.com) Date: Wed, 17 Aug 2011 16:35:42 +0000 Subject: hg: hsx/hotspot-comp/hotspot: 7079626: x64 emits unnecessary REX prefix Message-ID: <20110817163545.80DC347C64@hg.openjdk.java.net> Changeset: 381bf869f784 Author: twisti Date: 2011-08-17 05:14 -0700 URL: http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/381bf869f784 7079626: x64 emits unnecessary REX prefix Reviewed-by: kvn, iveresov, never ! src/cpu/x86/vm/assembler_x86.cpp From christian.thalinger at oracle.com Wed Aug 17 11:20:01 2011 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Wed, 17 Aug 2011 20:20:01 +0200 Subject: Request for reviews (XXS): 7079769: JSR 292: incorrect size() for CallStaticJavaHandle on sparc Message-ID: <001166E8-B9AE-4843-AF8A-6F1F9063D751@oracle.com> http://cr.openjdk.java.net/~twisti/7079769/ 7079769: JSR 292: incorrect size() for CallStaticJavaHandle on sparc Reviewed-by: The preserve_SP and restore_SP add two instructions resulting in a size of 16 not 8. 
src/cpu/sparc/vm/sparc.ad From tom.rodriguez at oracle.com Wed Aug 17 11:37:19 2011 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Wed, 17 Aug 2011 11:37:19 -0700 Subject: Request for reviews (XXS): 7079769: JSR 292: incorrect size() for CallStaticJavaHandle on sparc In-Reply-To: <001166E8-B9AE-4843-AF8A-6F1F9063D751@oracle.com> References: <001166E8-B9AE-4843-AF8A-6F1F9063D751@oracle.com> Message-ID: <132B3A1B-B8EC-45B5-B08B-982CEC305B3D@oracle.com> Looks good. tom On Aug 17, 2011, at 11:20 AM, Christian Thalinger wrote: > http://cr.openjdk.java.net/~twisti/7079769/ > > 7079769: JSR 292: incorrect size() for CallStaticJavaHandle on sparc > Reviewed-by: > > The preserve_SP and restore_SP add two instructions resulting in a > size of 16 not 8. > > src/cpu/sparc/vm/sparc.ad > From christian.thalinger at oracle.com Wed Aug 17 16:25:02 2011 From: christian.thalinger at oracle.com (christian.thalinger at oracle.com) Date: Wed, 17 Aug 2011 23:25:02 +0000 Subject: hg: hsx/hotspot-comp/hotspot: 7079769: JSR 292: incorrect size() for CallStaticJavaHandle on sparc Message-ID: <20110817232505.6EDE447C89@hg.openjdk.java.net> Changeset: bd87c0dcaba5 Author: twisti Date: 2011-08-17 11:52 -0700 URL: http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/bd87c0dcaba5 7079769: JSR 292: incorrect size() for CallStaticJavaHandle on sparc Reviewed-by: never, kvn ! src/cpu/sparc/vm/sparc.ad From vladimir.kozlov at oracle.com Wed Aug 17 17:35:29 2011 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 17 Aug 2011 17:35:29 -0700 Subject: Request for reviews (S): 7080431: VM asserts if specified size(x) in .ad is larger than emitted size Message-ID: <4E4C5E51.4020307@oracle.com> http://cr.openjdk.java.net/~kvn/7080431/webrev 7080431: VM asserts if specified size(x) in .ad is larger than emitted size It was allowed to specify larger size(x) in mach node definition in .ad file than actual emitted instruction size. It was treated as upper bound on instruction size. 
7063629 changes broke that, it requires size(x) in mach node definition match the emitted size which reduced flexibility in C2 development. Move code from finalize_offsets_and_shorten() to fill_buffer() to restore previous behavior. From tom.rodriguez at oracle.com Wed Aug 17 17:55:07 2011 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Wed, 17 Aug 2011 17:55:07 -0700 Subject: Request for reviews (S): 7080431: VM asserts if specified size(x) in .ad is larger than emitted size In-Reply-To: <4E4C5E51.4020307@oracle.com> References: <4E4C5E51.4020307@oracle.com> Message-ID: <4DE59851-FB56-4155-8E11-ACCF0C3EC706@oracle.com> Is this effectively a partial anti-delta of the fill_buffer changes? tom On Aug 17, 2011, at 5:35 PM, Vladimir Kozlov wrote: > http://cr.openjdk.java.net/~kvn/7080431/webrev > > 7080431: VM asserts if specified size(x) in .ad is larger than emitted size > > It was allowed to specify larger size(x) in mach node definition in .ad file than actual emitted instruction size. It was treated as upper bound on instruction size. 7063629 changes broke that, it requires size(x) in mach node definition match the emitted size which reduced flexibility in C2 development. > > Move code from finalize_offsets_and_shorten() to fill_buffer() to restore previous behavior. From vladimir.kozlov at oracle.com Wed Aug 17 18:20:11 2011 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 17 Aug 2011 18:20:11 -0700 Subject: Request for reviews (S): 7080431: VM asserts if specified size(x) in .ad is larger than emitted size In-Reply-To: <4DE59851-FB56-4155-8E11-ACCF0C3EC706@oracle.com> References: <4E4C5E51.4020307@oracle.com> <4DE59851-FB56-4155-8E11-ACCF0C3EC706@oracle.com> Message-ID: <4E4C68CB.8030204@oracle.com> On 8/17/11 5:55 PM, Tom Rodriguez wrote: > Is this effectively a partial anti-delta of the fill_buffer changes? Yes for inserting padding and block alignment. It never did branch shortening and corresponding offsets verification. 
Vladimir > > tom > > On Aug 17, 2011, at 5:35 PM, Vladimir Kozlov wrote: > >> http://cr.openjdk.java.net/~kvn/7080431/webrev >> >> 7080431: VM asserts if specified size(x) in .ad is larger than emitted size >> >> It was allowed to specify larger size(x) in mach node definition in .ad file than actual emitted instruction size. It was treated as upper bound on instruction size. 7063629 changes broke that, it requires size(x) in mach node definition match the emitted size which reduced flexibility in C2 development. >> >> Move code from finalize_offsets_and_shorten() to fill_buffer() to restore previous behavior. > From tom.rodriguez at oracle.com Wed Aug 17 18:46:46 2011 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Wed, 17 Aug 2011 18:46:46 -0700 Subject: Request for reviews (S): 7080431: VM asserts if specified size(x) in .ad is larger than emitted size In-Reply-To: <4E4C68CB.8030204@oracle.com> References: <4E4C5E51.4020307@oracle.com> <4DE59851-FB56-4155-8E11-ACCF0C3EC706@oracle.com> <4E4C68CB.8030204@oracle.com> Message-ID: <6CA01544-397C-4654-A4E4-4DBD297E1A9A@oracle.com> On Aug 17, 2011, at 6:20 PM, Vladimir Kozlov wrote: > On 8/17/11 5:55 PM, Tom Rodriguez wrote: >> Is this effectively a partial anti-delta of the fill_buffer changes? > > Yes for inserting padding and block alignment. It never did branch shortening and corresponding offsets verification. I compared it with the previous one and it looks good. Thanks for fixing this. tom > > Vladimir > >> >> tom >> >> On Aug 17, 2011, at 5:35 PM, Vladimir Kozlov wrote: >> >>> http://cr.openjdk.java.net/~kvn/7080431/webrev >>> >>> 7080431: VM asserts if specified size(x) in .ad is larger than emitted size >>> >>> It was allowed to specify larger size(x) in mach node definition in .ad file than actual emitted instruction size. It was treated as upper bound on instruction size. 
7063629 changes broke that, it requires size(x) in mach node definition match the emitted size which reduced flexibility in C2 development. >>> >>> Move code from finalize_offsets_and_shorten() to fill_buffer() to restore previous behavior. >> From vladimir.kozlov at oracle.com Wed Aug 17 18:53:24 2011 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 17 Aug 2011 18:53:24 -0700 Subject: Request for reviews (S): 7080431: VM asserts if specified size(x) in .ad is larger than emitted size In-Reply-To: <6CA01544-397C-4654-A4E4-4DBD297E1A9A@oracle.com> References: <4E4C5E51.4020307@oracle.com> <4DE59851-FB56-4155-8E11-ACCF0C3EC706@oracle.com> <4E4C68CB.8030204@oracle.com> <6CA01544-397C-4654-A4E4-4DBD297E1A9A@oracle.com> Message-ID: <4E4C7094.3030209@oracle.com> Thank you, Tom Vladimir On 8/17/11 6:46 PM, Tom Rodriguez wrote: > > On Aug 17, 2011, at 6:20 PM, Vladimir Kozlov wrote: > >> On 8/17/11 5:55 PM, Tom Rodriguez wrote: >>> Is this effectively a partial anti-delta of the fill_buffer changes? >> >> Yes for inserting padding and block alignment. It never did branch shortening and corresponding offsets verification. > > I compared it with the previous one and it looks good. Thanks for fixing this. > > tom > >> >> Vladimir >> >>> >>> tom >>> >>> On Aug 17, 2011, at 5:35 PM, Vladimir Kozlov wrote: >>> >>>> http://cr.openjdk.java.net/~kvn/7080431/webrev >>>> >>>> 7080431: VM asserts if specified size(x) in .ad is larger than emitted size >>>> >>>> It was allowed to specify larger size(x) in mach node definition in .ad file than actual emitted instruction size. It was treated as upper bound on instruction size. 7063629 changes broke that, it requires size(x) in mach node definition match the emitted size which reduced flexibility in C2 development. >>>> >>>> Move code from finalize_offsets_and_shorten() to fill_buffer() to restore previous behavior. 
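The difference between the two policies in the 7080431 thread can be sketched in a few lines. Before 7063629, a size(x) in a .ad mach node was treated as an upper bound on the emitted bytes. 7063629 effectively required an exact match, and the fix restores the upper-bound behavior. The enum and function names are hypothetical illustrations, not ADLC or C2 code. The buffer estimate shows one consequence of the upper-bound policy: summing declared sizes may overestimate the code size but never underestimates it, as long as every instruction respects its bound.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical names (not ADLC/C2 code) for the two policies discussed.
enum class SizePolicy { UpperBound, Exact };

// Is an instruction that emitted `emitted` bytes consistent with a
// declared size(x) of `declared` bytes under `policy`?
bool size_ok(std::size_t declared, std::size_t emitted, SizePolicy policy) {
  return policy == SizePolicy::Exact ? emitted == declared
                                     : emitted <= declared;
}

// Under the upper-bound policy, the sum of declared sizes is a safe
// (possibly generous) estimate for sizing a code buffer.
std::size_t code_buffer_estimate(const std::vector<std::size_t>& declared) {
  return std::accumulate(declared.begin(), declared.end(), std::size_t{0});
}
```

Because emitted code can then be shorter than declared, the real offsets have to be taken from the bytes actually emitted, which fits the thread's description of moving that work back into fill_buffer().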
>>> > From vladimir.kozlov at oracle.com Thu Aug 18 16:14:17 2011 From: vladimir.kozlov at oracle.com (vladimir.kozlov at oracle.com) Date: Thu, 18 Aug 2011 23:14:17 +0000 Subject: hg: hsx/hotspot-comp/hotspot: 7080431: VM asserts if specified size(x) in .ad is larger than emitted size Message-ID: <20110818231422.52BE147D40@hg.openjdk.java.net> Changeset: 739a9abbbd4b Author: kvn Date: 2011-08-18 11:49 -0700 URL: http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/739a9abbbd4b 7080431: VM asserts if specified size(x) in .ad is larger than emitted size Summary: Move code from finalize_offsets_and_shorten() to fill_buffer() to restore previous behavior. Reviewed-by: never ! src/share/vm/opto/compile.hpp ! src/share/vm/opto/output.cpp From vladimir.kozlov at oracle.com Fri Aug 19 12:41:37 2011 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 19 Aug 2011 12:41:37 -0700 Subject: Request for reviews (XS): 7076831: TEST_BUG: compiler/5091921/Test7005594.java fails on LOW MEM SYSTEMS Message-ID: <4E4EBC71.9060101@oracle.com> http://cr.openjdk.java.net/~kvn/7076831/webrev 7076831: TEST_BUG: compiler/5091921/Test7005594.java fails on LOW MEM SYSTEMS Run test only on systems with 2Gbyte or more memory. Don't zap heap to reduce execution time. From vladimir.kozlov at oracle.com Fri Aug 19 22:20:04 2011 From: vladimir.kozlov at oracle.com (vladimir.kozlov at oracle.com) Date: Sat, 20 Aug 2011 05:20:04 +0000 Subject: hg: hsx/hotspot-comp/hotspot: 40 new changesets Message-ID: <20110820052115.A5E8247EBE@hg.openjdk.java.net> Changeset: d9dc0a55c848 Author: schien Date: 2011-05-20 16:03 -0700 URL: http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/d9dc0a55c848 Added tag jdk7-b143 for changeset c149193c768b ! .hgtags Changeset: 278445be9145 Author: trims Date: 2011-05-24 14:02 -0700 URL: http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/278445be9145 Added tag hs21-b13 for changeset c149193c768b ! 
.hgtags Changeset: 01e01c25d24a Author: trims Date: 2011-05-24 14:07 -0700 URL: http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/01e01c25d24a Merge ! .hgtags Changeset: e6e7d76b2bd3 Author: mr Date: 2011-05-24 15:28 -0700 URL: http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/e6e7d76b2bd3 7048009: Update .jcheck/conf files for JDK 8 Reviewed-by: jjh ! .jcheck/conf Changeset: 968305b802ee Author: trims Date: 2011-07-23 01:56 -0700 URL: http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/968305b802ee Merge Changeset: 8e5d4aa73a8c Author: trims Date: 2011-07-22 23:47 -0700 URL: http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/8e5d4aa73a8c 7069176: Update the JDK version numbers in Hotspot for JDK 8 Summary: Change JDK_MINOR_VER and JDK_PREVIOUS_VERSION to reflect JDK8 values Reviewed-by: jcoomes ! make/hotspot_version Changeset: 0cc8a70952c3 Author: trims Date: 2011-07-22 23:42 -0700 URL: http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/0cc8a70952c3 7070061: Adjust Hotspot make/jprt.properties for new JDK8 settings Summary: Fix so the JPRT can build with -release jdk8 now Reviewed-by: ohair ! make/jprt.properties Changeset: 20cac004a4f9 Author: dsamersoff Date: 2011-06-09 01:06 +0400 URL: http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/20cac004a4f9 Merge Changeset: 1744e37e032b Author: dsamersoff Date: 2011-06-18 13:32 +0400 URL: http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/1744e37e032b Merge Changeset: d425748f2203 Author: dcubed Date: 2011-06-23 20:31 -0700 URL: http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/d425748f2203 7043987: 3/3 JVMTI FollowReferences is slow Summary: VM_HeapWalkOperation::doit() should only reset mark bits when necessary. Reviewed-by: dsamersoff, ysr, dholmes, dcubed Contributed-by: ashok.srinivasa.murthy at oracle.com ! 
src/share/vm/prims/jvmtiTagMap.cpp Changeset: 88dce6a60ac8 Author: dcubed Date: 2011-06-29 20:28 -0700 URL: http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/88dce6a60ac8 6951623: 3/3 possible performance problems in FollowReferences() and GetObjectsWithTags() Summary: Call collect_stack_roots() before collect_simple_roots() as an optimization. Reviewed-by: ysr, dsamersoff, dcubed Contributed-by: ashok.srinivasa.murthy at oracle.com ! src/share/vm/prims/jvmtiTagMap.cpp Changeset: 109d1d265924 Author: dholmes Date: 2011-07-02 04:17 -0400 URL: http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/109d1d265924 7052988: JPRT embedded builds don't set MINIMIZE_RAM_USAGE Reviewed-by: kamg, dsamersoff ! make/jprt.gmk Changeset: 5447b2c582ad Author: coleenp Date: 2011-07-07 22:34 -0400 URL: http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/5447b2c582ad Merge Changeset: bcc6475bc68f Author: coleenp Date: 2011-07-16 22:21 -0400 URL: http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/bcc6475bc68f Merge Changeset: 0b80db433fcb Author: dholmes Date: 2011-07-22 00:29 -0700 URL: http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/0b80db433fcb 7046490: Preallocated OOME objects should obey Throwable stack trace protocol Summary: Update the OOME stacktrace to contain Throwable.UNASSIGNED_STACK when the backtrace is filled in Reviewed-by: mchung, phh ! src/share/vm/classfile/javaClasses.cpp ! src/share/vm/classfile/javaClasses.hpp Changeset: 8107273fd204 Author: coleenp Date: 2011-07-23 10:42 -0400 URL: http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/8107273fd204 Merge Changeset: ca1f1753c866 Author: andrew Date: 2011-07-28 14:10 -0400 URL: http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/ca1f1753c866 7072341: enable hotspot builds on Linux 3.0 Summary: Add "3" to list of allowable versions Reviewed-by: kamg, chrisphi ! 
make/linux/Makefile Changeset: 14a2fd14c0db Author: johnc Date: 2011-08-01 10:04 -0700 URL: http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/14a2fd14c0db 7068240: G1: Long "parallel other time" and "ext root scanning" when running specific benchmark Summary: In root processing, move the scanning of the reference processor's discovered lists to before RSet updating and scanning. When scanning the reference processor's discovered lists, use a buffering closure so that the time spent copying any reference object is correctly attributed. Also removed a couple of unused and irrelevant timers. Reviewed-by: ysr, jmasa ! src/share/vm/gc_implementation/g1/g1CollectedHeap.cpp ! src/share/vm/gc_implementation/g1/g1CollectedHeap.hpp ! src/share/vm/gc_implementation/g1/g1CollectorPolicy.cpp ! src/share/vm/gc_implementation/g1/g1CollectorPolicy.hpp Changeset: 6aa4feb8a366 Author: johnc Date: 2011-08-02 12:13 -0700 URL: http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/6aa4feb8a366 7069863: G1: SIGSEGV running SPECjbb2011 and -UseBiasedLocking Summary: Align the reserved size of the heap and perm to the heap region size to get a preferred heap base that is aligned to the region size, and call the correct heap reservation constructor. Also add a check in the heap reservation code that the reserved space starts at the requested address (if any). Reviewed-by: kvn, ysr ! src/share/vm/gc_implementation/g1/g1CollectedHeap.cpp ! src/share/vm/runtime/virtualspace.cpp Changeset: a20e6e447d3d Author: iveresov Date: 2011-08-05 16:44 -0700 URL: http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/a20e6e447d3d 7060842: UseNUMA crash with UseHugreTLBFS running SPECjvm2008 Summary: Use mmap() instead of madvise(MADV_DONTNEED) to uncommit pages Reviewed-by: ysr ! 
src/os/linux/vm/os_linux.cpp Changeset: 7c2653aefc46 Author: iveresov Date: 2011-08-05 16:50 -0700 URL: http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/7c2653aefc46 7060836: RHEL 5.5 and 5.6 should support UseNUMA Summary: Add a wrapper for sched_getcpu() for systems where libc lacks it Reviewed-by: ysr Contributed-by: Andrew John Hughes ! src/os/linux/vm/os_linux.cpp ! src/os/linux/vm/os_linux.hpp Changeset: 41e6ee74f879 Author: kevinw Date: 2011-08-02 14:37 +0100 URL: http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/41e6ee74f879 7072527: CMS: JMM GC counters overcount in some cases Summary: Avoid overcounting when CMS has concurrent mode failure. Reviewed-by: ysr Contributed-by: rednaxelafx at gmail.com ! src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp ! src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.hpp + test/gc/7072527/TestFullGCCount.java Changeset: e9db47a083cc Author: kevinw Date: 2011-08-11 14:58 +0100 URL: http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/e9db47a083cc Merge Changeset: 87e40b34bc2b Author: johnc Date: 2011-08-11 11:36 -0700 URL: http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/87e40b34bc2b 7074579: G1: JVM crash with JDK7 running ATG CRMDemo Fusion App Summary: Handlize MemoryUsage klass oop in createGCInfo routine Reviewed-by: tonyp, fparain, ysr, jcoomes ! src/share/vm/services/gcNotifier.cpp Changeset: f44782f04dd4 Author: tonyp Date: 2011-08-12 11:31 -0400 URL: http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/f44782f04dd4 7039627: G1: avoid BOT updates for survivor allocations and dirty survivor regions incrementally Summary: Refactor the allocation code during GC to use the G1AllocRegion abstraction. Use separate subclasses of G1AllocRegion for survivor and old regions. Avoid BOT updates and dirty survivor cards incrementally for the former. Reviewed-by: brutisso, johnc, ysr ! src/share/vm/gc_implementation/g1/g1AllocRegion.cpp ! 
src/share/vm/gc_implementation/g1/g1AllocRegion.hpp ! src/share/vm/gc_implementation/g1/g1CollectedHeap.cpp ! src/share/vm/gc_implementation/g1/g1CollectedHeap.hpp ! src/share/vm/gc_implementation/g1/g1CollectedHeap.inline.hpp ! src/share/vm/gc_implementation/g1/g1CollectorPolicy.cpp ! src/share/vm/gc_implementation/g1/g1CollectorPolicy.hpp ! src/share/vm/gc_implementation/g1/heapRegion.cpp ! src/share/vm/gc_implementation/g1/heapRegion.hpp ! src/share/vm/gc_implementation/g1/heapRegionRemSet.cpp Changeset: 76b1a9420e3d Author: ysr Date: 2011-08-16 08:02 -0700 URL: http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/76b1a9420e3d Merge Changeset: 46cb9a7b8b01 Author: dsamersoff Date: 2011-08-10 15:04 +0400 URL: http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/46cb9a7b8b01 7073913: The fix for 7017193 causes segfaults Summary: Buffer overflow in os::get_line_chars Reviewed-by: coleenp, dholmes, dcubed Contributed-by: aph at redhat.com ! src/share/vm/runtime/os.cpp Changeset: b1cbb0907b36 Author: zgu Date: 2011-04-15 09:34 -0400 URL: http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/b1cbb0907b36 7016797: Hotspot: securely/restrictive load dlls and new API for loading system dlls Summary: Created Windows Dll wrapped to handle jdk6 and jdk7 platform requirements, also provided more restictive Dll search orders for Windows system Dlls. Reviewed-by: acorn, dcubed, ohair, alanb ! make/windows/makefiles/compile.make ! src/os/windows/vm/decoder_windows.cpp ! src/os/windows/vm/jvm_windows.h ! src/os/windows/vm/os_windows.cpp ! src/os/windows/vm/os_windows.hpp Changeset: 279ef1916773 Author: zgu Date: 2011-07-12 21:13 -0400 URL: http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/279ef1916773 7065535: Mistyped function name that disabled UseLargePages on Windows Summary: Missing suffix "A" of Windows API LookupPrivilegeValue failed finding function pointer, caused VM to disable UseLargePages option Reviewed-by: coleenp, phh ! 
src/os/windows/vm/os_windows.cpp Changeset: a68e11dceb83 Author: zgu Date: 2011-08-16 09:18 -0400 URL: http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/a68e11dceb83 Merge Changeset: 00ed4ccfe642 Author: collins Date: 2011-08-17 07:05 -0400 URL: http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/00ed4ccfe642 Merge Changeset: de147f62e695 Author: kvn Date: 2011-08-19 08:55 -0700 URL: http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/de147f62e695 Merge - agent/src/share/classes/sun/jvm/hotspot/interpreter/BytecodeFastAAccess0.java - agent/src/share/classes/sun/jvm/hotspot/interpreter/BytecodeFastIAccess0.java Changeset: 24cee90e9453 Author: jcoomes Date: 2011-08-17 10:32 -0700 URL: http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/24cee90e9453 6791672: enable 1G and larger pages on solaris Reviewed-by: ysr, iveresov, johnc ! src/os/solaris/vm/os_solaris.cpp ! src/share/vm/runtime/os.cpp ! src/share/vm/runtime/os.hpp Changeset: 3be7439273c5 Author: katleman Date: 2011-05-25 13:31 -0700 URL: http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/3be7439273c5 7044486: open jdk repos have files with incorrect copyright headers, which can end up in src bundles Reviewed-by: ohair, trims ! agent/src/share/classes/sun/jvm/hotspot/runtime/ServiceThread.java ! make/linux/README ! make/windows/projectfiles/kernel/Makefile ! src/cpu/x86/vm/vm_version_x86.cpp ! src/cpu/x86/vm/vm_version_x86.hpp ! src/os_cpu/solaris_sparc/vm/solaris_sparc.s ! src/share/tools/hsdis/README ! src/share/vm/gc_implementation/g1/heapRegionSet.inline.hpp ! src/share/vm/gc_implementation/parNew/parCardTableModRefBS.cpp ! src/share/vm/utilities/yieldingWorkgroup.cpp Changeset: 8b135e6129d6 Author: jeff Date: 2011-05-27 15:01 -0700 URL: http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/8b135e6129d6 7045697: JDK7 THIRD PARTY README update Reviewed-by: lana ! 
THIRD_PARTY_README Changeset: 52e4ba46751f Author: kamg Date: 2011-04-12 16:42 -0400 URL: http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/52e4ba46751f 7020373: JSR rewriting can overflow memory address size variables Summary: Abort if incoming classfile's parameters would cause overflows Reviewed-by: coleenp, dcubed, never ! src/share/vm/oops/generateOopMap.cpp + test/runtime/7020373/Test7020373.sh Changeset: bca686989d4b Author: asaha Date: 2011-06-15 14:59 -0700 URL: http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/bca686989d4b 7055247: Ignore test of # 7020373 Reviewed-by: dcubed ! test/runtime/7020373/Test7020373.sh Changeset: 337ffef74c37 Author: jeff Date: 2011-06-22 10:10 -0700 URL: http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/337ffef74c37 7057046: Add embedded license to THIRD PARTY README Reviewed-by: lana ! THIRD_PARTY_README Changeset: 9f12ede5571a Author: jcoomes Date: 2011-08-19 14:08 -0700 URL: http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/9f12ede5571a Merge ! src/cpu/x86/vm/vm_version_x86.cpp ! src/cpu/x86/vm/vm_version_x86.hpp ! src/share/vm/oops/generateOopMap.cpp ! src/share/vm/runtime/os.cpp Changeset: 7c29742c41b4 Author: jcoomes Date: 2011-08-19 14:22 -0700 URL: http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/7c29742c41b4 7081251: bump the hs22 build number to 02 Reviewed-by: johnc ! make/hotspot_version From igor.veresov at oracle.com Fri Aug 19 23:17:51 2011 From: igor.veresov at oracle.com (Igor Veresov) Date: Fri, 19 Aug 2011 23:17:51 -0700 Subject: Request for reviews (XS): 7076831: TEST_BUG: compiler/5091921/Test7005594.java fails on LOW MEM SYSTEMS In-Reply-To: <4E4EBC71.9060101@oracle.com> References: <4E4EBC71.9060101@oracle.com> Message-ID: Looks good. 
igor On Friday, August 19, 2011 at 12:41 PM, Vladimir Kozlov wrote: > http://cr.openjdk.java.net/~kvn/7076831/webrev > > 7076831: TEST_BUG: compiler/5091921/Test7005594.java fails on LOW MEM SYSTEMS > > Run test only on systems with 2Gbyte or more memory. Don't zap heap to reduce > execution time. From vladimir.kozlov at oracle.com Sat Aug 20 17:24:05 2011 From: vladimir.kozlov at oracle.com (vladimir.kozlov at oracle.com) Date: Sun, 21 Aug 2011 00:24:05 +0000 Subject: hg: hsx/hotspot-comp/hotspot: 7076831: TEST_BUG: compiler/5091921/Test7005594.java fails on LOW MEM SYSTEMS Message-ID: <20110821002407.7A6CD47F4B@hg.openjdk.java.net> Changeset: ff9ab6327924 Author: kvn Date: 2011-08-20 14:03 -0700 URL: http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/ff9ab6327924 7076831: TEST_BUG: compiler/5091921/Test7005594.java fails on LOW MEM SYSTEMS Summary: Run test only on systems with 2Gbyte or more memory. Don't zap heap to reduce execution time. Reviewed-by: iveresov ! test/compiler/5091921/Test7005594.sh From vladimir.kozlov at oracle.com Mon Aug 22 10:33:50 2011 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 22 Aug 2011 10:33:50 -0700 Subject: Request for reviews (XXXS): 7081926 assert(VM_Version::supports_sse2()) failed: must support Message-ID: <4E5292FE.3010500@oracle.com> http://cr.openjdk.java.net/~kvn/7081926/webrev 7081926 assert(VM_Version::supports_sse2()) failed: must support Changes in 7079329 (use MacroAssembler prefetch instructions in x86 .ad files) exposed typo in this assert, prefetchnta is supported since SSE not SSE2. From tom.rodriguez at oracle.com Mon Aug 22 10:44:38 2011 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Mon, 22 Aug 2011 10:44:38 -0700 Subject: Request for reviews (XXXS): 7081926 assert(VM_Version::supports_sse2()) failed: must support In-Reply-To: <4E5292FE.3010500@oracle.com> References: <4E5292FE.3010500@oracle.com> Message-ID: <83375FC5-1695-4748-AB88-78E4011AB1C2@oracle.com> Looks good. 
tom On Aug 22, 2011, at 10:33 AM, Vladimir Kozlov wrote: > http://cr.openjdk.java.net/~kvn/7081926/webrev > > 7081926 assert(VM_Version::supports_sse2()) failed: must support > > Changes in 7079329 (use MacroAssembler prefetch instructions in x86 .ad files) exposed typo in this assert, prefetchnta is supported since SSE not SSE2. > From vladimir.kozlov at oracle.com Mon Aug 22 10:52:24 2011 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 22 Aug 2011 10:52:24 -0700 Subject: Request for reviews (XXXS): 7081926 assert(VM_Version::supports_sse2()) failed: must support In-Reply-To: <83375FC5-1695-4748-AB88-78E4011AB1C2@oracle.com> References: <4E5292FE.3010500@oracle.com> <83375FC5-1695-4748-AB88-78E4011AB1C2@oracle.com> Message-ID: <4E529758.9080202@oracle.com> Thank you, Tom Vladimir Tom Rodriguez wrote: > Looks good. > > tom > > On Aug 22, 2011, at 10:33 AM, Vladimir Kozlov wrote: > >> http://cr.openjdk.java.net/~kvn/7081926/webrev >> >> 7081926 assert(VM_Version::supports_sse2()) failed: must support >> >> Changes in 7079329 (use MacroAssembler prefetch instructions in x86 .ad files) exposed typo in this assert, prefetchnta is supported since SSE not SSE2. >> > From vladimir.kozlov at oracle.com Mon Aug 22 17:32:48 2011 From: vladimir.kozlov at oracle.com (vladimir.kozlov at oracle.com) Date: Tue, 23 Aug 2011 00:32:48 +0000 Subject: hg: hsx/hotspot-comp/hotspot: 7081926: assert(VM_Version::supports_sse2()) failed: must support Message-ID: <20110823003251.D5BF547FF5@hg.openjdk.java.net> Changeset: a594deb1d6dc Author: kvn Date: 2011-08-22 11:00 -0700 URL: http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/a594deb1d6dc 7081926: assert(VM_Version::supports_sse2()) failed: must support Summary: fix assert, prefetchnta is supported since SSE not SSE2. Reviewed-by: never ! 
src/cpu/x86/vm/assembler_x86.cpp From christian.thalinger at oracle.com Tue Aug 23 12:20:30 2011 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Tue, 23 Aug 2011 21:20:30 +0200 Subject: Request for reviews (S): 7078382: JSR 292: don't count method handle adapters against inlining budgets Message-ID: http://cr.openjdk.java.net/~twisti/7078382/ 7078382: JSR 292: don't count method handle adapters against inlining budgets Reviewed-by: Currently the code size of method handle adapters is counted against inlining budgets like DesiredMethodLimit. This results in earlier compiler bailouts at method handle call sites than at regular call sites, leading to worse performance. The fix is to return an adjusted bytecode size for method handle adapters when making inlining decisions (the metric we use for now is the number of invokes). Tested with JRuby benchmarks. From tom.rodriguez at oracle.com Tue Aug 23 16:44:38 2011 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Tue, 23 Aug 2011 16:44:38 -0700 Subject: review for 7071307: MethodHandle bimorphic inlining should consider the frequency Message-ID: This is a re-review since I added per method handle GWT profiling. http://cr.openjdk.java.net/~never/7071307 312 lines changed: 270 ins; 15 del; 27 mod; 22101 unchg 7071307: MethodHandle bimorphic inlining should consider the frequency Reviewed-by: The fix for 7050554 added a bimorphic inline path but didn't take into account the frequency of the guarding test. This ends up treating both sides of the if as equally frequent, which can lead to over-inlining and overflowing the method inlining limits. The fix is to grab the frequency from the If and apply that to the branches. Additionally I added support for per method handle profile collection since this was required to get good results for more complex programs. This requires the fix for 7082631 on the JDK side.
http://cr.openjdk.java.net/~never/7082631 I also fixed a problem with the ideal graph printer where debug_orig printing would go into an infinite loop. Tested with jruby and vm.mlvm tests. From christian.thalinger at oracle.com Wed Aug 24 06:12:55 2011 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Wed, 24 Aug 2011 15:12:55 +0200 Subject: review for 7071307: MethodHandle bimorphic inlining should consider the frequency In-Reply-To: References: Message-ID: <4B352C37-8DBD-432D-8B12-BE01D04155C4@oracle.com> On Aug 24, 2011, at 1:44 AM, Tom Rodriguez wrote: > This is a re-review since I added per method handle GWT profiling. > > http://cr.openjdk.java.net/~never/7071307 > 312 lines changed: 270 ins; 15 del; 27 mod; 22101 unchg src/share/vm/prims/methodHandleWalk.cpp: MethodHandleCompiler::fetch_counts: + int count1 = -1, count2 = -1; ... + int total = count1 + count2; + if (count1 != -1 && count2 != -2 && total != 0) { Why -2? + int _taken_count; + int _not_taken_count; Does taken refer to target and not_taken to fallback in the GWT? MethodHandleCompiler::make_invoke: Can you use emit_bc instead of _bytecode.push where possible so we have at least a little sanity checking? + bool found_sel = false; Can you rename that to maybe found_selectAlternative? src/share/vm/ci/ciMethodHandle.cpp: That print_chain is very helpful. Thanks for that. src/share/vm/classfile/javaClasses.cpp: + int java_lang_invoke_CountingMethodHandle::vmcount(oop mh) { + assert(is_instance(mh), "DMH only"); + return mh->int_field(_vmcount_offset); + } + + void java_lang_invoke_CountingMethodHandle::set_vmcount(oop mh, int count) { + assert(is_instance(mh), "DMH only"); + mh->int_field_put(_vmcount_offset, count); + } I think the assert message is a copy-paste bug. Otherwise looks good. 
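The review above concerns the -1 sentinel check in fetch_counts. The sketch below shows the idea under review, assuming a simple pair of counters: it turns the two vmcounts of a guardWithTest into a branch probability, then scales the call-site frequency by that probability. It falls back to a default when no profile exists. GwtProfile, taken_probability and split_frequency are hypothetical names, not the code in the webrev.

```cpp
#include <cassert>

// Hypothetical per-GWT profile; -1 means "no count collected".
struct GwtProfile {
  int taken_count;
  int not_taken_count;
};

// Probability that the guard's test succeeds, or `fallback` when there is
// no usable profile. Both counts are checked against the same -1 sentinel.
double taken_probability(const GwtProfile& p, double fallback) {
  if (p.taken_count == -1 || p.not_taken_count == -1) return fallback;
  long long total = (long long)p.taken_count + p.not_taken_count;
  if (total == 0) return fallback;
  return (double)p.taken_count / (double)total;
}

// Scale the call-site frequency by the branch probability so each side of
// the If is charged only for how often it actually runs.
void split_frequency(double site_freq, double prob_taken,
                     double* taken_freq, double* not_taken_freq) {
  assert(prob_taken >= 0.0 && prob_taken <= 1.0);
  *taken_freq = site_freq * prob_taken;
  *not_taken_freq = site_freq * (1.0 - prob_taken);
}
```

With counts of 3 and 1, a call site of frequency 8.0 splits into 6.0 and 2.0 instead of being treated as equally hot on both sides.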
> > 7071307: MethodHandle bimorphic inlining should consider the frequency > Reviewed-by: > > The fix for 7050554 added a bimorphic inline path but didn't take into > account the frequency of the guarding test. This ends up treating > both sides of the if as equally frequent which can lead to over > inlining and overflowing the method inlining limits. The fix is to > grab the frequency from the If and apply that to the branches. > > Additionally I added support for per method handle profile collection > since this was required to get good results for more complex programs. > This requires the fix for 7082631 on the JDK side. > http://cr.openjdk.java.net/~never/7082631 The JDK changes look good. -- Christian > > I also fixed a problem with the ideal graph printer where debug_orig > printing would go into an infinite loop. > > Tested with jruby and vm.mlvm tests. > From tom.deneau at amd.com Wed Aug 24 09:26:54 2011 From: tom.deneau at amd.com (Deneau, Tom) Date: Wed, 24 Aug 2011 11:26:54 -0500 Subject: Review Request: UseNUMAInterleaving #6 In-Reply-To: <4E543CDA.3050904@oracle.com> References: <5EA33A275136844D843B73A29FB9A6A901362B54B2@SAUSEXMBP01.amd.com> <4E402E1C.1010807@oracle.com> <5EA33A275136844D843B73A29FB9A6A90186EF904E@SAUSEXMBP01.amd.com> <247BA26129A14681B03D0856A6FAC69D@oracle.com> <5EA33A275136844D843B73A29FB9A6A90186EF98B7@SAUSEXMBP01.amd.com> <91928C974B07497184AF80B96F196606@oracle.com> <5EA33A275136844D843B73A29FB9A6A90186FA618A@SAUSEXMBP01.amd.com> <462098EF18364A629C463AC72D5495CC@oracle.com> <5EA33A275136844D843B73A29FB9A6A9018D581E22@SAUSEXMBP01.amd.com> <9F66C366BA1C4D8A83183711ADE738A0@oracle.com> <5EA33A275136844D843B73A29FB9A6A9018D581EBA@SAUSEXMBP01.amd.com> <4E543CDA.3050904@oracle.com> Message-ID: <5EA33A275136844D843B73A29FB9A6A9018D582275@SAUSEXMBP01.amd.com> I believe I have addressed ramki's comments with http://cr.openjdk.java.net/~tdeneau/UseNUMAInterleaving/webrev.06/ -- Tom > -----Original Message----- > From: Y. S. 
Ramakrishna [mailto:y.s.ramakrishna at oracle.com] > Sent: Tuesday, August 23, 2011 6:51 PM > To: Deneau, Tom > Cc: hotspot-gc-dev at openjdk.java.net > Subject: Re: Review Request: UseNUMAInterleaving #4 > > Hi Tom -- the perf improvement on windows is impressive. > > The changes look good. Just a few very minor nits below: > > globals.hpp: In the doc string field for NUMAInterleaveGranularity, you > might state that this is a Windows only option. (although i recognize > that this hasn't been done for some of the other windows options that > i became aware of now as being used exclusively in windows before > your changes, for instance: UseLargePagesIndividualAllocation > and LargePagesIndividualAllocationInjectError). > > arguments.cpp: you could get rid of the empty lines 1432-1433, and move > the > content of 1428-1430 into the if-scope of 1422-1426. > > os_windows.cpp: you can probably get rid of the extra newline > introduced at line 1967. > > line 3018, typo: "NUMAInterleavaing" > also at line 3033: "thNUMANodeListHolderat" > The comment at lines 3030-3033 would also benefit > from a few missing punctuation marks. > > at lines 3040 and 3043, it might read better to place the returns > on lines of their own. > > If you run with +UseNUMAInterleaving and a commit failed, > it would seem that the error message at line 2987 would be > confusing and incorrect. Perhaps you want to suitably modify > it or just suppress the additional text in that case. > > os_solaris.cpp: 2780-2784, it might make sense to do the madvise > global/many > call only if the mmap_chunk() succeeds, rather than all the time as you > are doing. 
May be something like:- > > 2780 char *res = Solaris::mmap_chunk(addr, size, MAP_PRIVATE|MAP_FIXED, > prot); > 2781 if (res != NULL) { > if (UseNUMAInterleaving) { > 2782 numa_make_global(res, size); > } > return true; > 2783 } > 2784 return false; > > At line 3444, would it make sense to use "size" instead of "bytes" > (although > size is just a copy of bytes -- i don't understand the reason for making > the copy, so feel free to ignore if this is some recherche style issue; > otherwise > it might make sense to get rid of the copy and just use the formal > parameter as is > the case for the Linux code; although this is really not code that you > introduced, > but just because you happen to be touching code in the vicinity... your > choice.) > > In the same vein, i'd make the Linux code similar in shape to > the solaris code for the two hunks changed in os_linux.cpp. > > rest looks good. > -- ramki > > On 08/23/11 12:59, Deneau, Tom wrote: > > OK, http://cr.openjdk.java.net/~tdeneau/UseNUMAInterleaving/webrev.05/ > > should address the concerns listed below... > > > > -- Tom > > > > > >> -----Original Message----- > >> From: Igor Veresov [mailto:igor.veresov at oracle.com] > >> Sent: Tuesday, August 23, 2011 1:53 PM > >> To: Deneau, Tom > >> Cc: hotspot-gc-dev at openjdk.java.net > >> Subject: Re: Review Request: UseNUMAInterleaving #4 > >> > >> Tom, > >> > >> This looks good to me, except three minor things: > >> > >> os_windows.cpp: > >> > >> - you should check for null here: > >> 2630 ~NUMANodeListHolder() { > >>> if (_numa_used_node_list != NULL) { > >> 2631 FREE_C_HEAP_ARRAY(int, _numa_used_node_list); > >>> } > >> 2632 } > >> > >> - if NUMANodeListHolder::build() will be called multiple times, you'll > >> leak memory. I guess you should check if _numa_used_node_list is NULL > and > >> if not free it first. 
> >> > >> - you didn't modify os::numa_get_leaf_groups() to handle the situation > >> when the value of argument "size" is bigger than > >> NUMANodeListHolder::get_count(). You can use MIN2 to adjust the value. > >> See my comment in the previous mail. > >> > >> > >> igor > >> > >> On Tuesday, August 23, 2011 at 11:23 AM, Deneau, Tom wrote: > >> > >>> Please review this patch which adds a new flag called > >>> UseNUMAInterleaving. This flag provides a subset of the functionality > >>> provided by UseNUMA. In Hotspot UseNUMA terminology, > >>> UseNUMAInterleaved makes all memory "numa_global" which is > implemented > >>> as interleaved. This patch's main purpose is to provide that subset > >>> on OSes like Windows which do not support the full UseNUMA > >>> functionality. However, a simple implementation of > UseNUMAInterleaving > >> is > >>> also provided for other OSes > >>> > >>> The situations where this shows the biggest benefits would be: > >>> * Windows platforms with multiple numa nodes (eg, 4) > >>> > >>> * The JVM process is run across all the nodes (not affinitized to > >>> one node). > >>> > >>> * A workload that has enough threads so that it uses the majority > >>> of the cores in the machine, so that the heap is being accessed > >>> from many cores, including remote ones. > >>> > >>> * Enough memory per node and a heap size such that the default heap > >>> placement policy on windows would end up with the heap (or > >>> nursery) placed on one node. > >>> > >>> jbb2005 and SPECPower_ssj2008 are examples of such workloads. In our > >>> measurements, we have seen some cases where the performance with > >>> UseNUMAInterleaving was 2.7x vs. the performance without. There were > >>> gains of varying sizes across all systems. 
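Igor's three requests can be shown together in a simplified holder, sketched below:

- a null-checked free in the destructor;
- no leak when build() runs twice;
- a MIN2-style clamp in the leaf-groups query.

NodeListHolder and leaf_groups are hypothetical stand-ins for NUMANodeListHolder and os::numa_get_leaf_groups(). Plain malloc/free replace the C-heap macros, and node ids are passed in rather than queried from Windows.

```cpp
#include <cassert>
#include <cstdlib>

class NodeListHolder {
  int* _nodes = nullptr;
  int _count = 0;

  void release() {
    if (_nodes != nullptr) {  // only free what was actually allocated
      std::free(_nodes);
      _nodes = nullptr;
    }
    _count = 0;
  }

 public:
  NodeListHolder() = default;
  NodeListHolder(const NodeListHolder&) = delete;  // owns raw memory
  NodeListHolder& operator=(const NodeListHolder&) = delete;
  ~NodeListHolder() { release(); }

  // Safe to call repeatedly: any previous list is freed first, so no leak.
  bool build(const int* nodes, int n) {
    release();
    if (n <= 0) return false;
    assert(nodes != nullptr);
    _nodes = static_cast<int*>(std::malloc(sizeof(int) * n));
    if (_nodes == nullptr) return false;
    for (int i = 0; i < n; i++) _nodes[i] = nodes[i];
    _count = n;
    return true;
  }

  int count() const { return _count; }

  // Out-of-range indexes yield -1.
  int entry(int i) const { return (i >= 0 && i < _count) ? _nodes[i] : -1; }
};

// Clamp `size` to the number of known nodes before copying ids out.
int leaf_groups(const NodeListHolder& h, int* ids, int size) {
  int n = size < h.count() ? size : h.count();
  for (int i = 0; i < n; i++) ids[i] = h.entry(i);
  return n;
}
```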
> >>> > >>> The webrev is at > >>> http://cr.openjdk.java.net/~tdeneau/UseNUMAInterleaving/webrev.04/ > >>> > >>> Summary of changes in webrev.04 from webrev.03: > >>> > >>> * As suggested by Igor Veresov, UseNUMA can imply > >>> UseNUMAInterleaving on all platforms. This is in arguments.cpp > >>> > >>> * In NUMANodeListHolder in os_windows.cpp, allocates the node_list > >>> dynamically rather than assuming a length of 64. The method > >>> NUMANodeListHolder::get_node_list_entry checks returns -1 for > >>> indexes that are out of bounds. > >>> > >>> * Several code convention cleanups suggested by Igor. > >>> > >>> * Merge with the new style system dll function resolutions from > >>> "7016797: Hotspot: securely/restrictive load dlls and new API for > >>> loading system dlls" Note: my new NUMA functions are outside the > >> ifdefs. > >>> > >>> Summary of changes in webrev.03 from webrev.02: > >>> > >>> * As suggested by Igor Veresov, reverts to using > >>> UseNUMAInterleaving as the enabling flag. This will make it > >>> easier in the future when there are GCs that enable fuller > >>> UseNUMA on Windows. > >>> > >>> * Adds a simple implementation of UseNUMAInterleaving on Linux and > >>> Solaris, which just calls numa_make_global after commit_memory > >>> and reserve_memory_special > >>> > >>> * Adds a flag NUMAInterleaveGranularity which allows setting the > >>> granularity with which we move to a different node in a memory > >>> allocation. The default is 2MB. This flag only applies to > >>> Windows for now. > >>> > >>> * Several code cleanups in os_windows.cpp suggested by Igor. > >>> > >>> > >>> Summary of overall changes in os_windows.cpp: > >>> > >>> * Some static routines were added to set things up init time. 
These > >>> * check that the required APIs (VirtualAllocExNuma, > >>> GetNumaHighestNodeNumber, GetNumaNodeProcessorMask) exist in > >>> the OS > >>> > >>> * build the list of numa nodes on which this process has affinity > >>> > >>> * Changes to os::reserve_memory > >>> * There was already a routine that reserved pages one page at a > >>> time (used for Individual Large Page Allocation on WS2003). > >>> This was abstracted to a separate routine, called > >>> allocate_pages_individually. This gets called both for the > >>> Individual Large Page Allocation thing mentioned above and for > >>> UseNUMAInterleaving (for both small and large pages) > >>> > >>> * When used for NUMA Interleaving this just goes thru the numa > >>> node list in a round-robin fashion, allocating chunks at the > >>> NUMAInterleaveGranularity using a different allocation for > >>> each chunk > >>> > >>> * Whether we do just a reserve or a combined reserve/commit is > >>> determined by the caller of allocate_pages_individually > >>> > >>> * When used with large pages, we do a Reserve and Commit at > >>> the same time which is the way it always worked and the way > >>> it has to work on windows. > >>> > >>> * For small pages, only the reserve is done, the commit will > >>> come later. (which is the way it worked for > >>> non-interleaved) > >>> > >>> * os::commit_memory changes > >>> * If UseNUMAIntereaving is true, os::commit_memory has to check > >>> whether it was being asked to commit memory that might have > >>> come from multiple Reserve allocations, if so, the commits > >>> must also be broken up. We don't keep any data structure to > >>> keep track of this, we just use VirtualQuery which queries the > >>> properties of a VA range and can tell us how much came from > >>> one VirtualAlloc call. > >>> > >>> I do not have a bug id for this. 
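The round-robin reservation described above can be sketched as a pure planning function. It assumes the reservation starts on a granularity boundary. plan_interleaving and Chunk are hypothetical. The real code issues a separate allocation per chunk (presumably via VirtualAllocExNuma) instead of returning a plan.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One piece of an interleaved reservation: where it sits within the request,
// how big it is, and which NUMA node it should come from.
struct Chunk {
  size_t offset;
  size_t size;
  int node;
};

// Walk the node list round-robin, one chunk of `granularity` bytes per
// allocation; the last chunk may be shorter.
std::vector<Chunk> plan_interleaving(size_t bytes, size_t granularity,
                                     const std::vector<int>& nodes) {
  assert(granularity > 0 && !nodes.empty());
  std::vector<Chunk> plan;
  size_t next_node = 0;
  for (size_t off = 0; off < bytes; off += granularity) {
    size_t len = bytes - off < granularity ? bytes - off : granularity;
    plan.push_back(Chunk{off, len, nodes[next_node]});
    next_node = (next_node + 1) % nodes.size();
  }
  return plan;
}
```

For example, a 5 MB request with the default 2 MB granularity on two nodes would come out as three chunks: 2 MB on node 0, 2 MB on node 1, then 1 MB on node 0. Because each chunk is a separate allocation, a later commit spanning several chunks must also be split, which is why os::commit_memory consults VirtualQuery.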
> >>> > >>> -- Tom Deneau, AMD > >> > > From tom.rodriguez at oracle.com Wed Aug 24 11:59:01 2011 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Wed, 24 Aug 2011 11:59:01 -0700 Subject: review for 7082949: JSR 292: missing ResourceMark in methodOopDesc::make_invoke_method Message-ID: <5004B040-A61F-48C5-986B-D4CE9C3D1F0F@oracle.com> http://cr.openjdk.java.net/~never/7082949 55 lines changed: 55 ins; 0 del; 0 mod; 1606 unchg 7082949: JSR 292: missing ResourceMark in methodOopDesc::make_invoke_method Summary: Reviewed-by: The fix for 7056328 added some resource allocation in some cases when building the invoke method but didn't insert a ResourceMark. Mostly we ended up using one in a caller but sometimes the caller doesn't have one so this code needs its own. Tested with failing test case. From vladimir.kozlov at oracle.com Wed Aug 24 12:17:26 2011 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 24 Aug 2011 12:17:26 -0700 Subject: review for 7082949: JSR 292: missing ResourceMark in methodOopDesc::make_invoke_method In-Reply-To: <5004B040-A61F-48C5-986B-D4CE9C3D1F0F@oracle.com> References: <5004B040-A61F-48C5-986B-D4CE9C3D1F0F@oracle.com> Message-ID: <4E554E46.40409@oracle.com> Looks good. Vladimir Tom Rodriguez wrote: > http://cr.openjdk.java.net/~never/7082949 > 55 lines changed: 55 ins; 0 del; 0 mod; 1606 unchg > > 7082949: JSR 292: missing ResourceMark in methodOopDesc::make_invoke_method > Summary: > Reviewed-by: > > The fix for 7056328 added some resource allocation in some cases when > building the invoke method but didn't insert a ResourceMark. Mostly > we ended up using one in a caller but sometimes the caller doesn't > have one so this code needs its own. Tested with failing test case. 
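The pattern behind 7082949 can be illustrated with a toy arena:

- a mark records the arena's top when it is constructed;
- the mark restores that top when it is destroyed.

So a function that allocates resource memory and opens its own mark does not depend on any caller having one. ToyArena and ToyResourceMark are hypothetical and much simpler than HotSpot's Arena and ResourceMark.

```cpp
#include <cassert>
#include <cstddef>

// Bump-pointer arena over a fixed buffer.
class ToyArena {
  char _buf[4096];
  size_t _top = 0;
 public:
  void* alloc(size_t n) {
    assert(_top + n <= sizeof(_buf));
    void* p = _buf + _top;
    _top += n;
    return p;
  }
  size_t used() const { return _top; }
  void reset_to(size_t top) { _top = top; }
};

// RAII mark: everything allocated after construction is released on scope exit.
class ToyResourceMark {
  ToyArena& _arena;
  size_t _saved_top;
 public:
  explicit ToyResourceMark(ToyArena& a) : _arena(a), _saved_top(a.used()) {}
  ~ToyResourceMark() { _arena.reset_to(_saved_top); }
};

// Allocates scratch space under its own mark instead of relying on the caller.
size_t build_something(ToyArena& arena) {
  ToyResourceMark rm(arena);
  arena.alloc(128);  // temporary data
  return arena.used();
}
```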
> From christian.thalinger at oracle.com Wed Aug 24 12:42:36 2011 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Wed, 24 Aug 2011 21:42:36 +0200 Subject: review for 7082949: JSR 292: missing ResourceMark in methodOopDesc::make_invoke_method In-Reply-To: <5004B040-A61F-48C5-986B-D4CE9C3D1F0F@oracle.com> References: <5004B040-A61F-48C5-986B-D4CE9C3D1F0F@oracle.com> Message-ID: <014007F6-6C38-4B89-8622-43BAD1B64D3E@oracle.com> Looks good. -- Christian On Aug 24, 2011, at 8:59 PM, Tom Rodriguez wrote: > http://cr.openjdk.java.net/~never/7082949 > 55 lines changed: 55 ins; 0 del; 0 mod; 1606 unchg > > 7082949: JSR 292: missing ResourceMark in methodOopDesc::make_invoke_method > Summary: > Reviewed-by: > > The fix for 7056328 added some resource allocation in some cases when > building the invoke method but didn't insert a ResourceMark. Mostly > we ended up using one in a caller but sometimes the caller doesn't > have one so this code needs its own. Tested with failing test case. > From tom.rodriguez at oracle.com Wed Aug 24 13:57:20 2011 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Wed, 24 Aug 2011 13:57:20 -0700 Subject: review for 7082949: JSR 292: missing ResourceMark in methodOopDesc::make_invoke_method In-Reply-To: <014007F6-6C38-4B89-8622-43BAD1B64D3E@oracle.com> References: <5004B040-A61F-48C5-986B-D4CE9C3D1F0F@oracle.com> <014007F6-6C38-4B89-8622-43BAD1B64D3E@oracle.com> Message-ID: <9DA84FDC-34BE-460D-92CA-F7BBCD752A94@oracle.com> Thanks Christian and Vladimir. tom On Aug 24, 2011, at 12:42 PM, Christian Thalinger wrote: > Looks good. 
-- Christian > > On Aug 24, 2011, at 8:59 PM, Tom Rodriguez wrote: > >> http://cr.openjdk.java.net/~never/7082949 >> 55 lines changed: 55 ins; 0 del; 0 mod; 1606 unchg >> >> 7082949: JSR 292: missing ResourceMark in methodOopDesc::make_invoke_method >> Summary: >> Reviewed-by: >> >> The fix for 7056328 added some resource allocation in some cases when >> building the invoke method but didn't insert a ResourceMark. Mostly >> we ended up using one in a caller but sometimes the caller doesn't >> have one so this code needs its own. Tested with failing test case. >> > From tom.rodriguez at oracle.com Wed Aug 24 14:12:39 2011 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Wed, 24 Aug 2011 14:12:39 -0700 Subject: review for 7071307: MethodHandle bimorphic inlining should consider the frequency In-Reply-To: <4B352C37-8DBD-432D-8B12-BE01D04155C4@oracle.com> References: <4B352C37-8DBD-432D-8B12-BE01D04155C4@oracle.com> Message-ID: On Aug 24, 2011, at 6:12 AM, Christian Thalinger wrote: > > On Aug 24, 2011, at 1:44 AM, Tom Rodriguez wrote: > >> This is a re-review since I added per method handle GWT profiling. >> >> http://cr.openjdk.java.net/~never/7071307 >> 312 lines changed: 270 ins; 15 del; 27 mod; 22101 unchg > > src/share/vm/prims/methodHandleWalk.cpp: > > MethodHandleCompiler::fetch_counts: > > + int count1 = -1, count2 = -1; > ... > + int total = count1 + count2; > + if (count1 != -1 && count2 != -2 && total != 0) { > > Why -2? Just a typo. It's fixed. > > + int _taken_count; > + int _not_taken_count; > > Does taken refer to target and not_taken to fallback in the GWT? They refer to the bytecode and the vmcounts collected. I think they are actually reversed from what selectAlternative generates but as long as they agree with the bytecodes generated I don't think it matters. I verified empirically that the counts match the execution and feed into the frequency in the proper fashion. 
> > MethodHandleCompiler::make_invoke: > > Can you use emit_bc instead of _bytecode.push where possible so we have at least a little sanity checking? I added support for ifeq and added update_branch_dest to correct the offsets. I only added support for ifeq for now. > > + bool found_sel = false; > > Can you rename that to maybe found_selectAlternative? Yup. > > > src/share/vm/ci/ciMethodHandle.cpp: > > That print_chain is very helpful. Thanks for that. > > > src/share/vm/classfile/javaClasses.cpp: > > + int java_lang_invoke_CountingMethodHandle::vmcount(oop mh) { > + assert(is_instance(mh), "DMH only"); > + return mh->int_field(_vmcount_offset); > + } > + > + void java_lang_invoke_CountingMethodHandle::set_vmcount(oop mh, int count) { > + assert(is_instance(mh), "DMH only"); > + mh->int_field_put(_vmcount_offset, count); > + } > > I think the assert message is a copy-paste bug. Fixed. > > Otherwise looks good. Thanks! tom > >> >> 7071307: MethodHandle bimorphic inlining should consider the frequency >> Reviewed-by: >> >> The fix for 7050554 added a bimorphic inline path but didn't take into >> account the frequency of the guarding test. This ends up treating >> both sides of the if as equally frequent which can lead to over >> inlining and overflowing the method inlining limits. The fix is to >> grab the frequency from the If and apply that to the branches. >> >> Additionally I added support for per method handle profile collection >> since this was required to get good results for more complex programs. >> This requires the fix for 7082631 on the JDK side. >> http://cr.openjdk.java.net/~never/7082631 > > The JDK changes look good. > > -- Christian > >> >> I also fixed a problem with the ideal graph printer where debug_orig >> printing would go into an infinite loop. >> >> Tested with jruby and vm.mlvm tests. 
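Tom mentions adding ifeq support to emit_bc and an update_branch_dest that corrects the offsets. The sketch below shows the usual forward-branch patching under those assumptions. ToyBytecodeBuffer and its methods are hypothetical. Only the ifeq encoding comes from the JVM specification: opcode 0x99, followed by a signed 16-bit big-endian offset relative to the address of the ifeq opcode.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

class ToyBytecodeBuffer {
  std::vector<uint8_t> _code;
 public:
  int pos() const { return (int)_code.size(); }
  void emit(uint8_t b) { _code.push_back(b); }

  // Emit ifeq with a placeholder offset; return the branch's bci.
  int emit_ifeq() {
    int bci = pos();
    emit(0x99);
    emit(0);
    emit(0);
    return bci;
  }

  // Patch the branch once the destination bci is known.
  void update_branch_dest(int branch_bci, int dest_bci) {
    assert(_code[branch_bci] == 0x99);
    int offset = dest_bci - branch_bci;
    assert(offset >= INT16_MIN && offset <= INT16_MAX);
    uint16_t u = (uint16_t)(int16_t)offset;
    _code[branch_bci + 1] = (uint8_t)(u >> 8);
    _code[branch_bci + 2] = (uint8_t)(u & 0xff);
  }

  const std::vector<uint8_t>& code() const { return _code; }
};
```

Asserting on the opcode before patching gives a small version of the sanity checking the review asked for.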
>> > From vladimir.kozlov at oracle.com Wed Aug 24 17:52:16 2011 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 24 Aug 2011 17:52:16 -0700 Subject: Request for reviews (M): 7059037: Use BIS for zeroing on T4 Message-ID: <4E559CC0.6030701@oracle.com> http://cr.openjdk.java.net/~kvn/7059037/webrev 7059037: Use BIS for zeroing on T4 On T4 BIS to the beginning of cache line always zeros it. Use it for zeroing new allocated java objects. The main code is in MacroAssembler::bis_zeroing() and is used by C2 generated code (ClearArray), runtime (Copy::fill_to_aligned_words()) and template interpreter (TemplateTable::_new()). New stub zero_aligned_words was added to use in runtime. BIS is used only for objects bigger than BlkZeroingLowLimit (2Kbyte) since it requires membar. 2Hb was selected based on microbenchmark results. I also added wrasi(Reg, immI) instruction which I used during development. VM_Version::has_mru_blk_init() is replaced with has_blk_zeroing() since original was not used. Zap new object in CollectedHeap::allocate_from_tlab_slow() instead of zeroing it since it will be cleaned later in init_obj(). Fixed call sites of check_for_bad_heap_word_value() where klass is not initialized to avoid the verification failure. From christian.thalinger at oracle.com Thu Aug 25 00:59:25 2011 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Thu, 25 Aug 2011 09:59:25 +0200 Subject: Request for reviews (M): 7059037: Use BIS for zeroing on T4 In-Reply-To: <4E559CC0.6030701@oracle.com> References: <4E559CC0.6030701@oracle.com> Message-ID: On Aug 25, 2011, at 2:52 AM, Vladimir Kozlov wrote: > http://cr.openjdk.java.net/~kvn/7059037/webrev > > 7059037: Use BIS for zeroing on T4 > > On T4 BIS to the beginning of cache line always zeros it. Use it for zeroing new > allocated java objects. 
The main code is in MacroAssembler::bis_zeroing() and is > used by C2 generated code (ClearArray), runtime (Copy::fill_to_aligned_words()) > and template interpreter (TemplateTable::_new()). New stub zero_aligned_words > was added to use in runtime. > > BIS is used only for objects bigger than BlkZeroingLowLimit (2Kbyte) since it > requires membar. 2Hb was selected based on microbenchmark results. > > I also added wrasi(Reg, immI) instruction which I used during development. > VM_Version::has_mru_blk_init() is replaced with has_blk_zeroing() since original > was not used. > Zap new object in CollectedHeap::allocate_from_tlab_slow() instead of zeroing it > since it will be cleaned later in init_obj(). > Fixed call sites of check_for_bad_heap_word_value() where klass is not > initialized to avoid the verification failure. > src/cpu/sparc/vm/assembler_sparc.cpp: + int cach_line_size = VM_Version::prefetch_data_size(); I guess this should be cache_line_size. + // Use BIS zeroing only for big arrays since it requires membar. + if (Assembler::is_simm13(blk_zero_size)) { // < 4096 + cmp(count, blk_zero_size); + } else { + set(blk_zero_size, temp); + cmp(count, temp); + } You could use ensure_simm13_or_reg here: cmp(count, ensure_simm13_or_reg(blk_zero_size, temp)); but I think you have to add a cmp(Register s1, RegisterOrConstant s2). + // Clean the beginning of space upto next cache line. There is a space missing: "up to". Otherwise this looks good. A side question: what's the difference between using reg_to_register_object($tmp$$reg) and $tmp$$Register? What does: assert(L5->encoding() == R_L5_enc && G1->encoding() == R_G1_enc, "right coding"); in reg_to_register_object actually check for? 
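The threshold-and-cache-line logic in bis_zeroing() can be modeled as splitting a zeroing range into three parts:

- an unaligned head, cleared with ordinary stores;
- the full cache lines, one BIS each;
- an unaligned tail, cleared with ordinary stores.

Ranges below the limit skip BIS, because it needs a membar. plan_zeroing is hypothetical. The limit and line size are plain parameters here, standing in for BlkZeroingLowLimit and VM_Version::prefetch_data_size().

```cpp
#include <cassert>
#include <cstddef>

struct ZeroPlan {
  size_t head_bytes;  // ordinary stores up to the first line boundary
  size_t bis_lines;   // full cache lines zeroed with BIS
  size_t tail_bytes;  // ordinary stores after the last full line
  bool use_bis;
};

ZeroPlan plan_zeroing(size_t addr, size_t bytes, size_t line, size_t bis_limit) {
  assert(line > 0 && (line & (line - 1)) == 0);  // line size is a power of two
  ZeroPlan p{bytes, 0, 0, false};                // default: all ordinary stores
  if (bytes < bis_limit) return p;               // too small to pay for the membar
  size_t first_line = (addr + line - 1) & ~(line - 1);
  size_t end = addr + bytes;
  size_t last_line = end & ~(line - 1);
  if (first_line >= last_line) return p;         // no full line to zero
  p.head_bytes = first_line - addr;
  p.bis_lines = (last_line - first_line) / line;
  p.tail_bytes = end - last_line;
  p.use_bis = true;
  return p;
}
```

With 64-byte lines, zeroing 4096 bytes at 0x1010 gives 48 head bytes, 63 BIS lines and 16 tail bytes.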
-- Christian From christian.thalinger at oracle.com Thu Aug 25 02:17:00 2011 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Thu, 25 Aug 2011 11:17:00 +0200 Subject: Request for reviews (M): 7083184: JSR 292: don't store context class argument with call site dependencies Message-ID: <0C8F4861-FA92-4CAC-861D-8B26CAE7D8D2@oracle.com> http://cr.openjdk.java.net/~twisti/7083184/ 7083184: JSR 292: don't store context class argument with call site dependencies Reviewed-by: The changes of 7071653 store a context class argument per call site dependency in the dependency stream. This is actually not required since the context class is implicitly available with the first argument; the call site object. Additionally call site dependencies should not depend on the very general super class CallSite but rather its actual class. src/share/vm/ci/ciEnv.cpp src/share/vm/ci/ciEnv.hpp src/share/vm/code/dependencies.cpp src/share/vm/code/dependencies.hpp src/share/vm/memory/universe.cpp src/share/vm/opto/callGenerator.cpp From christian.thalinger at oracle.com Thu Aug 25 06:54:42 2011 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Thu, 25 Aug 2011 15:54:42 +0200 Subject: Request for reviews (S): 7071709: JSR 292: switchpoint invalidation should be pushed not pulled Message-ID: <5F4038AD-6959-480E-9DB8-1DEF17D6C4A6@oracle.com> http://cr.openjdk.java.net/~twisti/7071709/ 7071709: JSR 292: switchpoint invalidation should be pushed not pulled Reviewed-by: SwitchPoints use a MutableCallSite for its implementation. The fix is to treat the target field of constant CallSites as a compile time constant and add a dependence for invalidation of the optimization. 
src/share/vm/opto/memnode.cpp src/share/vm/opto/parse3.cpp From martin.doerr at sap.com Thu Aug 25 09:42:03 2011 From: martin.doerr at sap.com (Doerr, Martin) Date: Thu, 25 Aug 2011 18:42:03 +0200 Subject: Request for reviews (M): 7059037: Use BIS for zeroing on T4 In-Reply-To: <4E559CC0.6030701@oracle.com> References: <4E559CC0.6030701@oracle.com> Message-ID: <160598AAAEA6C640BF796BA28D836C6404FC3DC8BF@DEWDFECCR04.wdf.sap.corp> Hi Vladimir, this looks like a good starting point. Have you already seen my comments which I had added to bug 7059037? I just pasted them below. Kind regards, Martin D I'm aware of 2 easy to implement but problematic ways to use block initializing instructions for TLAB initialization: 1. Use them in ClearArray. The problem here is that objects are not cache line aligned in general so we need to clear the slow way before (and after?) a cache line boundary. This is not difficult to implement but has quite some overhead and does not avoid fetching cache lines from memory at the beginning (end?) of objects. 2. Use them in zero_to_words and activate -XX:+ZeroTLAB. This will clear the whole TLABs when they get allocated. Doesn't perform well when TLABs get large and cache lines get squeezed out to other levels in the memory hierarchy. (BTW: filling with badHeapWordVal in ThreadLocalAllocBuffer::allocate breaks ZeroTLAB function in debug build, maybe we should open a new bug for it) My new proposal is to combine the zeroing with the prefetching. We only have to make sure that we always clear up to some distance behind the object being allocated. Then we can disable the ClearArray nodes as it is done when ZeroTLAB is used. We already have tlab_pf_top_offset which is used with AllocatePrefetchStyle==2. Block initializing prefetching could be implemented using such kind of a prefetch watermark. 
If we establish to align the TLABs to cache line boundaries and to use a size which is divisible by the cache line size, this should be easy to implement (which shouldn't be a bad thing for any platform). We could use an AllocatePrefetchDistance of one cache line behind new_eden_top which probably makes sense, but playing with it might still be interesting because some processors use automatic hardware prefetching which can interfere with what we're doing. We should probably clear so far ahead that the hardware prefetch engine doesn't overtake us. -----Original Message----- From: hotspot-compiler-dev-bounces at openjdk.java.net [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Vladimir Kozlov Sent: Donnerstag, 25. August 2011 02:52 To: hotspot compiler Subject: Request for reviews (M): 7059037: Use BIS for zeroing on T4 http://cr.openjdk.java.net/~kvn/7059037/webrev 7059037: Use BIS for zeroing on T4 On T4 BIS to the beginning of cache line always zeros it. Use it for zeroing new allocated java objects. The main code is in MacroAssembler::bis_zeroing() and is used by C2 generated code (ClearArray), runtime (Copy::fill_to_aligned_words()) and template interpreter (TemplateTable::_new()). New stub zero_aligned_words was added to use in runtime. BIS is used only for objects bigger than BlkZeroingLowLimit (2Kbyte) since it requires membar. 2Hb was selected based on microbenchmark results. I also added wrasi(Reg, immI) instruction which I used during development. VM_Version::has_mru_blk_init() is replaced with has_blk_zeroing() since original was not used. Zap new object in CollectedHeap::allocate_from_tlab_slow() instead of zeroing it since it will be cleaned later in init_obj(). Fixed call sites of check_for_bad_heap_word_value() where klass is not initialized to avoid the verification failure. 
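The head/body/tail split implied by BIS zeroing can be sketched as follows. This is a simplified C++ model under stated assumptions, not the actual MacroAssembler::bis_zeroing(). block_zero_line() is a hypothetical stand-in for a block-initializing store, which on T4 zeros a whole cache line when it hits the line's first word. The line size (8 words) and threshold (256 words, i.e. 2 KB of 8-byte words) are illustrative values. Objects below the threshold use ordinary stores, because the block path needs a trailing membar. Larger objects use ordinary stores up to the first line boundary, the block path for whole lines, and ordinary stores for the tail.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Illustrative parameters (assumptions, not HotSpot values read from the VM).
constexpr std::size_t kLineWords = 8;              // assumed 64-byte lines, 8-byte words
constexpr std::size_t kBlockZeroLimitWords = 256;  // assumed 2 KB threshold in words

struct ZeroStats {
  std::size_t plain_words = 0;  // words cleared with ordinary stores
  std::size_t block_lines = 0;  // whole lines cleared via the "block" path
};

// Hypothetical stand-in for a block-initializing store: clears one full, aligned line.
static void block_zero_line(std::uint64_t* line) {
  std::memset(line, 0, kLineWords * sizeof(std::uint64_t));
}

// Zero 'count' words at base[start..start+count); 'base' is treated as line-aligned.
ZeroStats zero_words(std::uint64_t* base, std::size_t start, std::size_t count) {
  ZeroStats s;
  std::size_t i = start;
  const std::size_t end = start + count;
  if (count < kBlockZeroLimitWords) {
    // Small objects: the membar required after block stores is not worth it.
    for (; i < end; ++i) base[i] = 0;
    s.plain_words = count;
    return s;
  }
  // Head: ordinary stores up to the next line boundary.
  for (; i < end && i % kLineWords != 0; ++i, ++s.plain_words) base[i] = 0;
  // Body: whole lines via the block path (no fetch of the old line contents).
  for (; i + kLineWords <= end; i += kLineWords, ++s.block_lines) block_zero_line(base + i);
  // Tail: remaining words with ordinary stores.
  for (; i < end; ++i, ++s.plain_words) base[i] = 0;
  // Real code would issue a memory barrier here before the object is published.
  return s;
}
```

For example, zeroing 300 words starting at word 3 clears 5 head words, 36 whole lines and 7 tail words.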
From tom.rodriguez at oracle.com Thu Aug 25 09:55:19 2011 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Thu, 25 Aug 2011 09:55:19 -0700 Subject: Request for reviews (S): 7071709: JSR 292: switchpoint invalidation should be pushed not pulled In-Reply-To: <5F4038AD-6959-480E-9DB8-1DEF17D6C4A6@oracle.com> References: <5F4038AD-6959-480E-9DB8-1DEF17D6C4A6@oracle.com> Message-ID: <108F560B-C3D0-40AC-9C10-3EC21A4FB8C5@oracle.com> Why is this being done for VolatileCallSite? There's no mechanism for falling back if the field is invalided too many times so we're just going to recompile over and over again which seems wrong. Otherwise it looks ok. tom On Aug 25, 2011, at 6:54 AM, Christian Thalinger wrote: > http://cr.openjdk.java.net/~twisti/7071709/ > > 7071709: JSR 292: switchpoint invalidation should be pushed not pulled > Reviewed-by: > > SwitchPoints use a MutableCallSite for its implementation. The fix is > to treat the target field of constant CallSites as a compile time > constant and add a dependence for invalidation of the optimization. > > src/share/vm/opto/memnode.cpp > src/share/vm/opto/parse3.cpp > From forax at univ-mlv.fr Thu Aug 25 10:51:27 2011 From: forax at univ-mlv.fr (Rémi Forax) Date: Thu, 25 Aug 2011 19:51:27 +0200 Subject: Request for reviews (S): 7071709: JSR 292: switchpoint invalidation should be pushed not pulled In-Reply-To: <108F560B-C3D0-40AC-9C10-3EC21A4FB8C5@oracle.com> References: <5F4038AD-6959-480E-9DB8-1DEF17D6C4A6@oracle.com> <108F560B-C3D0-40AC-9C10-3EC21A4FB8C5@oracle.com> Message-ID: <4E568B9F.8040606@univ-mlv.fr> On 08/25/2011 06:55 PM, Tom Rodriguez wrote: > Why is this being done for VolatileCallSite? There's no mechanism for falling back if the field is invalided too many times so we're just going to recompile over and over again which seems wrong. Otherwise it looks ok. Maybe because it can be invalidated only once from the API side.
> > tom Rémi > > On Aug 25, 2011, at 6:54 AM, Christian Thalinger wrote: > >> http://cr.openjdk.java.net/~twisti/7071709/ >> >> 7071709: JSR 292: switchpoint invalidation should be pushed not pulled >> Reviewed-by: >> >> SwitchPoints use a MutableCallSite for its implementation. The fix is >> to treat the target field of constant CallSites as a compile time >> constant and add a dependence for invalidation of the optimization. >> >> src/share/vm/opto/memnode.cpp >> src/share/vm/opto/parse3.cpp >> From vladimir.kozlov at oracle.com Thu Aug 25 10:50:19 2011 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 25 Aug 2011 10:50:19 -0700 Subject: Request for reviews (M): 7059037: Use BIS for zeroing on T4 In-Reply-To: References: <4E559CC0.6030701@oracle.com> Message-ID: <4E568B5B.5050307@oracle.com> Thank you, Christian Christian Thalinger wrote: > On Aug 25, 2011, at 2:52 AM, Vladimir Kozlov wrote: > >> http://cr.openjdk.java.net/~kvn/7059037/webrev >> >> 7059037: Use BIS for zeroing on T4 >> >> On T4 BIS to the beginning of cache line always zeros it. Use it for zeroing new >> allocated java objects. The main code is in MacroAssembler::bis_zeroing() and is >> used by C2 generated code (ClearArray), runtime (Copy::fill_to_aligned_words()) >> and template interpreter (TemplateTable::_new()). New stub zero_aligned_words >> was added to use in runtime. >> >> BIS is used only for objects bigger than BlkZeroingLowLimit (2Kbyte) since it >> requires membar. 2Hb was selected based on microbenchmark results. >> >> I also added wrasi(Reg, immI) instruction which I used during development. >> VM_Version::has_mru_blk_init() is replaced with has_blk_zeroing() since original >> was not used. >> Zap new object in CollectedHeap::allocate_from_tlab_slow() instead of zeroing it >> since it will be cleaned later in init_obj(). >> Fixed call sites of check_for_bad_heap_word_value() where klass is not >> initialized to avoid the verification failure.
>> > > src/cpu/sparc/vm/assembler_sparc.cpp: > > + int cach_line_size = VM_Version::prefetch_data_size(); > > I guess this should be cache_line_size. Fixed. > > + // Use BIS zeroing only for big arrays since it requires membar. > + if (Assembler::is_simm13(blk_zero_size)) { // < 4096 > + cmp(count, blk_zero_size); > + } else { > + set(blk_zero_size, temp); > + cmp(count, temp); > + } > > You could use ensure_simm13_or_reg here: > > cmp(count, ensure_simm13_or_reg(blk_zero_size, temp)); > > but I think you have to add a cmp(Register s1, RegisterOrConstant s2). I will keep it as it is. I don't want to add new method for just one case. > > + // Clean the beginning of space upto next cache line. > > There is a space missing: "up to". Fixed. > > Otherwise this looks good. > > > A side question: what's the difference between using reg_to_register_object($tmp$$reg) and $tmp$$Register? What does: I think reg_to_register_object() was implemented before $tmp$$Register. I copied code from original clear_array() which is very old. I will switch to $tmp$$Register form. > > assert(L5->encoding() == R_L5_enc && G1->encoding() == R_G1_enc, "right coding"); > > in reg_to_register_object actually check for? It is old code which verifies that encoding() produces correct result. Thanks, Vladimir > > -- Christian From john.r.rose at oracle.com Thu Aug 25 11:15:03 2011 From: john.r.rose at oracle.com (John Rose) Date: Thu, 25 Aug 2011 11:15:03 -0700 Subject: Request for reviews (M): 7083184: JSR 292: don't store context class argument with call site dependencies In-Reply-To: <0C8F4861-FA92-4CAC-861D-8B26CAE7D8D2@oracle.com> References: <0C8F4861-FA92-4CAC-861D-8B26CAE7D8D2@oracle.com> Message-ID: <4F53CD27-1042-4193-810B-C2A906490077@oracle.com> Looks good.
-- John On Aug 25, 2011, at 2:17 AM, Christian Thalinger wrote: > http://cr.openjdk.java.net/~twisti/7083184/ > > 7083184: JSR 292: don't store context class argument with call site dependencies > Reviewed-by: > > The changes of 7071653 store a context class argument per call site > dependency in the dependency stream. This is actually not required > since the context class is implicitly available with the first > argument; the call site object. Additionally call site dependencies > should not depend on the very general super class CallSite but rather > its actual class. > > src/share/vm/ci/ciEnv.cpp > src/share/vm/ci/ciEnv.hpp > src/share/vm/code/dependencies.cpp > src/share/vm/code/dependencies.hpp > src/share/vm/memory/universe.cpp > src/share/vm/opto/callGenerator.cpp > From john.r.rose at oracle.com Thu Aug 25 11:21:40 2011 From: john.r.rose at oracle.com (John Rose) Date: Thu, 25 Aug 2011 11:21:40 -0700 Subject: Request for reviews (S): 7071709: JSR 292: switchpoint invalidation should be pushed not pulled In-Reply-To: <5F4038AD-6959-480E-9DB8-1DEF17D6C4A6@oracle.com> References: <5F4038AD-6959-480E-9DB8-1DEF17D6C4A6@oracle.com> Message-ID: <52852391-3B23-4326-B75C-D2CB502C52AF@oracle.com> That's nice and clean. One question: What happens when a CallSite optimizes down to a ConstantCallSite? It looks like a useless dependency will get inserted. Maybe the call to assert_call_site_target_value should be guarded by a check whether the field is marked final. Also, get_field_by_offset should be called on the actual type of the ciCallSite, not env->CallSite_klass. Otherwise you might get NULL for the ciField, if the target fields are split out across different call site subclasses. -- John On Aug 25, 2011, at 6:54 AM, Christian Thalinger wrote: > http://cr.openjdk.java.net/~twisti/7071709/ > > 7071709: JSR 292: switchpoint invalidation should be pushed not pulled > Reviewed-by: > > SwitchPoints use a MutableCallSite for its implementation. 
The fix is > to treat the target field of constant CallSites as a compile time > constant and add a dependence for invalidation of the optimization. > > src/share/vm/opto/memnode.cpp > src/share/vm/opto/parse3.cpp > From tom.rodriguez at oracle.com Thu Aug 25 11:32:28 2011 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Thu, 25 Aug 2011 11:32:28 -0700 Subject: Request for reviews (S): 7071709: JSR 292: switchpoint invalidation should be pushed not pulled In-Reply-To: <4E568B9F.8040606@univ-mlv.fr> References: <5F4038AD-6959-480E-9DB8-1DEF17D6C4A6@oracle.com> <108F560B-C3D0-40AC-9C10-3EC21A4FB8C5@oracle.com> <4E568B9F.8040606@univ-mlv.fr> Message-ID: <05ACD704-7163-497D-8C2C-9E7AD760E080@oracle.com> On Aug 25, 2011, at 10:51 AM, Rémi Forax wrote: > On 08/25/2011 06:55 PM, Tom Rodriguez wrote: >> Why is this being done for VolatileCallSite? There's no mechanism for falling back if the field is invalided too many times so we're just going to recompile over and over again which seems wrong. Otherwise it looks ok. > > Maybe because it can be invalidated only once from the API side. The docs for VolatileCallSite suggest setTarget can be called whenever you feel like it, so it seems like it can be set many times. MutableCallSite can be as well but the implication is that it's not updated very often. I'm actually unclear what distinction is trying to be made with VolatileCallSite. tom > >> >> tom > > Rémi > >> >> On Aug 25, 2011, at 6:54 AM, Christian Thalinger wrote: >> >>> http://cr.openjdk.java.net/~twisti/7071709/ >>> >>> 7071709: JSR 292: switchpoint invalidation should be pushed not pulled >>> Reviewed-by: >>> >>> SwitchPoints use a MutableCallSite for its implementation. The fix is >>> to treat the target field of constant CallSites as a compile time >>> constant and add a dependence for invalidation of the optimization.
>>> >>> src/share/vm/opto/memnode.cpp >>> src/share/vm/opto/parse3.cpp >>> > From tom.rodriguez at oracle.com Thu Aug 25 11:34:13 2011 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Thu, 25 Aug 2011 11:34:13 -0700 Subject: Request for reviews (S): 7071709: JSR 292: switchpoint invalidation should be pushed not pulled In-Reply-To: <52852391-3B23-4326-B75C-D2CB502C52AF@oracle.com> References: <5F4038AD-6959-480E-9DB8-1DEF17D6C4A6@oracle.com> <52852391-3B23-4326-B75C-D2CB502C52AF@oracle.com> Message-ID: On Aug 25, 2011, at 11:21 AM, John Rose wrote: > That's nice and clean. > > One question: What happens when a CallSite optimizes down to a ConstantCallSite? It looks like a useless dependency will get inserted. > > Maybe the call to assert_call_site_target_value should be guarded by a check whether the field is marked final. target isn't final. The semantics of the field are captured in the subclass. It's true you don't need the dependence for ConstantCallSite though. tom > > Also, get_field_by_offset should be called on the actual type of the ciCallSite, not env->CallSite_klass. Otherwise you might get NULL for the ciField, if the target fields are split out across different call site subclasses. > > -- John > > On Aug 25, 2011, at 6:54 AM, Christian Thalinger wrote: > >> http://cr.openjdk.java.net/~twisti/7071709/ >> >> 7071709: JSR 292: switchpoint invalidation should be pushed not pulled >> Reviewed-by: >> >> SwitchPoints use a MutableCallSite for its implementation. The fix is >> to treat the target field of constant CallSites as a compile time >> constant and add a dependence for invalidation of the optimization. 
>> >> src/share/vm/opto/memnode.cpp >> src/share/vm/opto/parse3.cpp >> > From tom.rodriguez at oracle.com Thu Aug 25 12:58:54 2011 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Thu, 25 Aug 2011 12:58:54 -0700 Subject: Request for reviews (M): 7059037: Use BIS for zeroing on T4 In-Reply-To: <4E559CC0.6030701@oracle.com> References: <4E559CC0.6030701@oracle.com> Message-ID: <2526D33F-9391-4E3A-A702-20B4D438F0C8@oracle.com> src/share/vm/gc_interface/collectedHeap.inline.hpp, src/share/vm/oops/cpCacheKlass.cpp: Please use an ifdef block instead of the expression form. You might consider using more sophisticated predicates to statically rule out ClearArrays with constant arguments. Something like: predicate(!n->in(1)->is_Con() || n->in(1)->find_intrpt_t_con() > BlkZeroingLowLimit) That would reduce any overhead for large instances that will never benefit from BIS. Could we use block instead of blk? Otherwise this looks good. tom On Aug 24, 2011, at 5:52 PM, Vladimir Kozlov wrote: > http://cr.openjdk.java.net/~kvn/7059037/webrev > > 7059037: Use BIS for zeroing on T4 > > On T4 BIS to the beginning of cache line always zeros it. Use it for zeroing new > allocated java objects. The main code is in MacroAssembler::bis_zeroing() and is > used by C2 generated code (ClearArray), runtime (Copy::fill_to_aligned_words()) > and template interpreter (TemplateTable::_new()). New stub zero_aligned_words > was added to use in runtime. > > BIS is used only for objects bigger than BlkZeroingLowLimit (2Kbyte) since it > requires membar. 2Hb was selected based on microbenchmark results. > > I also added wrasi(Reg, immI) instruction which I used during development. > VM_Version::has_mru_blk_init() is replaced with has_blk_zeroing() since original > was not used. > Zap new object in CollectedHeap::allocate_from_tlab_slow() instead of zeroing it > since it will be cleaned later in init_obj(). 
> Fixed call sites of check_for_bad_heap_word_value() where klass is not > initialized to avoid the verification failure. > From tom.rodriguez at oracle.com Thu Aug 25 13:01:58 2011 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Thu, 25 Aug 2011 13:01:58 -0700 Subject: Request for reviews (M): 7083184: JSR 292: don't store context class argument with call site dependencies In-Reply-To: <0C8F4861-FA92-4CAC-861D-8B26CAE7D8D2@oracle.com> References: <0C8F4861-FA92-4CAC-861D-8B26CAE7D8D2@oracle.com> Message-ID: Looks good. tom On Aug 25, 2011, at 2:17 AM, Christian Thalinger wrote: > http://cr.openjdk.java.net/~twisti/7083184/ > > 7083184: JSR 292: don't store context class argument with call site dependencies > Reviewed-by: > > The changes of 7071653 store a context class argument per call site > dependency in the dependency stream. This is actually not required > since the context class is implicitly available with the first > argument; the call site object. Additionally call site dependencies > should not depend on the very general super class CallSite but rather > its actual class. > > src/share/vm/ci/ciEnv.cpp > src/share/vm/ci/ciEnv.hpp > src/share/vm/code/dependencies.cpp > src/share/vm/code/dependencies.hpp > src/share/vm/memory/universe.cpp > src/share/vm/opto/callGenerator.cpp > From y.s.ramakrishna at oracle.com Thu Aug 25 13:23:12 2011 From: y.s.ramakrishna at oracle.com (Ramki Ramakrishna) Date: Thu, 25 Aug 2011 13:23:12 -0700 Subject: Request for reviews (M): 7059037: Use BIS for zeroing on T4 In-Reply-To: <4E559CC0.6030701@oracle.com> References: <4E559CC0.6030701@oracle.com> Message-ID: <4E56AF30.8050909@oracle.com> Hi Vladimir -- On 8/24/2011 5:52 PM, Vladimir Kozlov wrote: > http://cr.openjdk.java.net/~kvn/7059037/webrev > > 7059037: Use BIS for zeroing on T4 > ... > Zap new object in CollectedHeap::allocate_from_tlab_slow() instead of > zeroing it > since it will be cleaned later in init_obj(). 
> Fixed call sites of check_for_bad_heap_word_value() where klass is not > initialized to avoid the verification failure. > Can you describe why these two changes were necessary? There was already support for skipping headers for concurrent GC's when zapping and verifying. Did something change that caused this to be changed. I haven't looked at the rest of the files, but a high level description of the need to make this change would allow me to review the changes that necessitated this, and whether it could not be done more easily otherwise (using the existing framework of skipping a preamble of words in the object). -- ramki From john.r.rose at oracle.com Thu Aug 25 16:47:13 2011 From: john.r.rose at oracle.com (John Rose) Date: Thu, 25 Aug 2011 16:47:13 -0700 Subject: Request for reviews (S): 7071709: JSR 292: switchpoint invalidation should be pushed not pulled In-Reply-To: <05ACD704-7163-497D-8C2C-9E7AD760E080@oracle.com> References: <5F4038AD-6959-480E-9DB8-1DEF17D6C4A6@oracle.com> <108F560B-C3D0-40AC-9C10-3EC21A4FB8C5@oracle.com> <4E568B9F.8040606@univ-mlv.fr> <05ACD704-7163-497D-8C2C-9E7AD760E080@oracle.com> Message-ID: <35544B63-E3B9-4F37-9015-4E081392A9D1@oracle.com> On Aug 25, 2011, at 11:32 AM, Tom Rodriguez wrote: > The docs for VolatileCallSite suggest setTarget can be called whenever you feel like it, so it seems like it can be set many times. MutableCallSite can be as well but the implication is that it's not updated very often. I'm actually unclear what distinction is trying to be made with VolatileCallSite. The MCS and VCS have the same semantics, except for the extra memory barriers on VCS. These barriers do not affect the validity or applicability of push notification. Also, either an MCS or VCS might end up being "megamutable", so there has to be some sort of cutoff that will prevent target-prediction for a CS which has been mispredicted too many times. 
We need to squeeze a bit of state into the CS, somewhere, which displays how many times the thing has been mispredicted. -- John From vladimir.kozlov at oracle.com Thu Aug 25 16:51:56 2011 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 25 Aug 2011 16:51:56 -0700 Subject: Request for reviews (M): 7059037: Use BIS for zeroing on T4 In-Reply-To: <4E56AF30.8050909@oracle.com> References: <4E559CC0.6030701@oracle.com> <4E56AF30.8050909@oracle.com> Message-ID: <4E56E01C.6000604@oracle.com> Ramki, Ramki Ramakrishna wrote: > Hi Vladimir -- > > On 8/24/2011 5:52 PM, Vladimir Kozlov wrote: >> http://cr.openjdk.java.net/~kvn/7059037/webrev >> >> 7059037: Use BIS for zeroing on T4 >> > ... >> Zap new object in CollectedHeap::allocate_from_tlab_slow() instead of >> zeroing it >> since it will be cleaned later in init_obj(). TLAB::allocate() zaps new objects so I think allocate_from_tlab_slow() should also zap new object (and I copied code from ThreadLocalAllocBuffer::allocate()) instead of cleaning it since it will be cleaned later in init_obj(). >> Fixed call sites of check_for_bad_heap_word_value() where klass is not >> initialized to avoid the verification failure.
>> % /java/re/jdk/7/latest/binaries/solaris-i586/fastdebug/bin/java -XX:+CheckMemoryInitialization -Xcomp t VM option '+CheckMemoryInitialization' # To suppress the following error report, specify this argument # after -XX: or in .hotspotrc: SuppressErrorAt=/collectedHeap.cpp:98 # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (/tmp/workspace/jdk7-2-build-solaris-i586-product/jdk7/hotspot/src/share/vm/gc_interface/collectedHeap.cpp:98), pid=27663, tid=2 # assert((*(intptr_t*) (addr + slot)) != ((intptr_t) badHeapWordVal)) failed: Found badHeapWordValue in post-allocation check # # JRE version: 7.0-b147 # Java VM: Java HotSpot(TM) Server VM (21.0-b17-fastdebug compiled mode solaris-x86 ) Vladimir > > Can you describe why these two changes were necessary? There was already > support > for skipping headers for concurrent GC's when zapping and verifying. Did > something > change that caused this to be changed. > > I haven't looked at the rest of the files, but a high level description > of the need to > make this change would allow me to review the changes that necessitated > this, > and whether it could not be done more easily otherwise (using the existing > framework of skipping a preamble of words in the object). > > -- ramki From igor.veresov at oracle.com Thu Aug 25 17:10:28 2011 From: igor.veresov at oracle.com (Igor Veresov) Date: Thu, 25 Aug 2011 17:10:28 -0700 Subject: review(XS): 6591247: C2 cleans up the merge point too early during SplitIf. Message-ID: The problem here is that during split-if we remove the region's self reference too early while processing its users, which can make get_ctrl_no_update() return the wrong answer. I wasn't able to reproduce the problem, but it seems to be possible for it to occur if the region points to something else but phi and the self reference is deleted too early. The solution is to remove the self reference last. 
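A toy C++ model helps show why the order matters. It is a simplification and does not quote C2: it assumes that a control node whose control input is null is treated as dead, and that lookups then follow its forwarding entry instead. CtrlNode, lookup_ctrl and split_users are hypothetical names. If the self reference is cleared before the users are processed, a lookup treats the region as dead and follows a forwarding entry that has not been set up yet.

```cpp
#include <cassert>
#include <vector>

// Toy model, not C2 code. Assumption: a control node whose ctrl_in is null is
// treated as dead, and lookups follow its forwarding pointer instead.
struct CtrlNode {
  CtrlNode* ctrl_in = nullptr;    // a live region points to itself
  CtrlNode* forwarded = nullptr;  // where lookups go once the node is dead
};

// Hypothetical analogue of a get_ctrl-style lookup that skips dead nodes.
CtrlNode* lookup_ctrl(CtrlNode* n) {
  while (n != nullptr && n->ctrl_in == nullptr) n = n->forwarded;
  return n;
}

// Process every user of 'region', recording the control each one sees.
// clear_self_first == true models dropping the self reference too early.
std::vector<CtrlNode*> split_users(CtrlNode& region, int num_users, bool clear_self_first) {
  std::vector<CtrlNode*> seen;
  if (clear_self_first) region.ctrl_in = nullptr;
  for (int u = 0; u < num_users; ++u) seen.push_back(lookup_ctrl(&region));
  region.ctrl_in = nullptr;  // fixed order: remove the self reference last
  return seen;
}
```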
Webrev: http://cr.openjdk.java.net/~iveresov/6591247/webrev.00/ Testing: specjvm98, CTW Thanks, igor From vladimir.kozlov at oracle.com Thu Aug 25 18:47:37 2011 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 25 Aug 2011 18:47:37 -0700 Subject: Request for reviews (M): 7059037: Use BIS for zeroing on T4 In-Reply-To: <2526D33F-9391-4E3A-A702-20B4D438F0C8@oracle.com> References: <4E559CC0.6030701@oracle.com> <2526D33F-9391-4E3A-A702-20B4D438F0C8@oracle.com> Message-ID: <4E56FB39.2050207@oracle.com> Thank you, Tom I updated webrev with your and Christian suggestions: http://cr.openjdk.java.net/~kvn/7059037/webrev Tom Rodriguez wrote: > src/share/vm/gc_interface/collectedHeap.inline.hpp, src/share/vm/oops/cpCacheKlass.cpp: > > Please use an ifdef block instead of the expression form. Done. > > You might consider using more sophisticated predicates to statically rule out ClearArrays with constant arguments. Something like: > > predicate(!n->in(1)->is_Con() || n->in(1)->find_intrpt_t_con() > BlkZeroingLowLimit) > > That would reduce any overhead for large instances that will never benefit from BIS. Done. I thought about that but found that such cases are rare since the expression which calculates count could be complex (because we mostly do partial zeroing) or when object is small with constant count ClearArray is replaced with stores in ideal transformation. But I agree it still may help. > > Could we use block instead of blk? Otherwise this looks good. Done. Thanks, Vladimir > > tom > > On Aug 24, 2011, at 5:52 PM, Vladimir Kozlov wrote: > >> http://cr.openjdk.java.net/~kvn/7059037/webrev >> >> 7059037: Use BIS for zeroing on T4 >> >> On T4 BIS to the beginning of cache line always zeros it. Use it for zeroing new >> allocated java objects. The main code is in MacroAssembler::bis_zeroing() and is >> used by C2 generated code (ClearArray), runtime (Copy::fill_to_aligned_words()) >> and template interpreter (TemplateTable::_new()). 
New stub zero_aligned_words >> was added to use in runtime. >> >> BIS is used only for objects bigger than BlkZeroingLowLimit (2Kbyte) since it >> requires membar. 2Hb was selected based on microbenchmark results. >> >> I also added wrasi(Reg, immI) instruction which I used during development. >> VM_Version::has_mru_blk_init() is replaced with has_blk_zeroing() since original >> was not used. >> Zap new object in CollectedHeap::allocate_from_tlab_slow() instead of zeroing it >> since it will be cleaned later in init_obj(). >> Fixed call sites of check_for_bad_heap_word_value() where klass is not >> initialized to avoid the verification failure. >> > From igor.veresov at oracle.com Thu Aug 25 19:48:49 2011 From: igor.veresov at oracle.com (Igor Veresov) Date: Thu, 25 Aug 2011 19:48:49 -0700 Subject: review(XS): 6591247: C2 cleans up the merge point too early during SplitIf. In-Reply-To: <4E56ED79.3080902@oracle.com> References: <4E56ED79.3080902@oracle.com> Message-ID: <4FB4620C833040C090409B0FBB3DD4EF@oracle.com> Thanks, Vladimir! igor On Thursday, August 25, 2011 at 5:48 PM, Vladimir Kozlov wrote: > It is good. > > Vladimir > > Igor Veresov wrote: > > The problem here is that during split-if we remove the region's self reference too early while processing its users, which can make get_ctrl_no_update() return the wrong answer. > > I wasn't able to reproduce the problem, but it seems to be possible for it to occur if the region points to something else but phi and the self reference is deleted too early. > > The solution is to remove the self reference last. > > > > Webrev: http://cr.openjdk.java.net/~iveresov/6591247/webrev.00/ > > > > Testing: specjvm98, CTW > > > > Thanks, > > igor From tom.rodriguez at oracle.com Thu Aug 25 19:57:11 2011 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Thu, 25 Aug 2011 19:57:11 -0700 Subject: review(XS): 6591247: C2 cleans up the merge point too early during SplitIf. 
In-Reply-To: References: Message-ID: <7E0E16EB-ACA8-482F-92FE-E76CD2B79CC2@oracle.com> Looks good. tom On Aug 25, 2011, at 5:10 PM, Igor Veresov wrote: > The problem here is that during split-if we remove the region's self reference too early while processing its users, which can make get_ctrl_no_update() return the wrong answer. > I wasn't able to reproduce the problem, but it seems to be possible for it to occur if the region points to something else but phi and the self reference is deleted too early. > The solution is to remove the self reference last. > > Webrev: http://cr.openjdk.java.net/~iveresov/6591247/webrev.00/ > > Testing: specjvm98, CTW > > Thanks, > igor > From tom.rodriguez at oracle.com Thu Aug 25 19:58:47 2011 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Thu, 25 Aug 2011 19:58:47 -0700 Subject: Request for reviews (M): 7059037: Use BIS for zeroing on T4 In-Reply-To: <4E56FB39.2050207@oracle.com> References: <4E559CC0.6030701@oracle.com> <2526D33F-9391-4E3A-A702-20B4D438F0C8@oracle.com> <4E56FB39.2050207@oracle.com> Message-ID: <33F76179-F235-46E6-8B7F-A2A92DE9FE20@oracle.com> Looks good. tom On Aug 25, 2011, at 6:47 PM, Vladimir Kozlov wrote: > Thank you, Tom > > I updated webrev with your and Christian suggestions: > > http://cr.openjdk.java.net/~kvn/7059037/webrev > > Tom Rodriguez wrote: >> src/share/vm/gc_interface/collectedHeap.inline.hpp, src/share/vm/oops/cpCacheKlass.cpp: >> Please use an ifdef block instead of the expression form. > > Done. > >> You might consider using more sophisticated predicates to statically rule out ClearArrays with constant arguments. Something like: >> predicate(!n->in(1)->is_Con() || n->in(1)->find_intrpt_t_con() > BlkZeroingLowLimit) >> That would reduce any overhead for large instances that will never benefit from BIS. > > Done. 
I thought about that but found that such cases are rare since the expression which calculates count could be complex (because we mostly do partial zeroing) or when object is small with constant count ClearArray is replaced with stores in ideal transformation. But I agree it still may help. > >> Could we use block instead of blk? Otherwise this looks good. > > Done. > > Thanks, > Vladimir > >> tom >> On Aug 24, 2011, at 5:52 PM, Vladimir Kozlov wrote: >>> http://cr.openjdk.java.net/~kvn/7059037/webrev >>> >>> 7059037: Use BIS for zeroing on T4 >>> >>> On T4 BIS to the beginning of cache line always zeros it. Use it for zeroing new >>> allocated java objects. The main code is in MacroAssembler::bis_zeroing() and is >>> used by C2 generated code (ClearArray), runtime (Copy::fill_to_aligned_words()) >>> and template interpreter (TemplateTable::_new()). New stub zero_aligned_words >>> was added to use in runtime. >>> >>> BIS is used only for objects bigger than BlkZeroingLowLimit (2Kbyte) since it >>> requires membar. 2Hb was selected based on microbenchmark results. >>> >>> I also added wrasi(Reg, immI) instruction which I used during development. >>> VM_Version::has_mru_blk_init() is replaced with has_blk_zeroing() since original >>> was not used. >>> Zap new object in CollectedHeap::allocate_from_tlab_slow() instead of zeroing it >>> since it will be cleaned later in init_obj(). >>> Fixed call sites of check_for_bad_heap_word_value() where klass is not >>> initialized to avoid the verification failure. >>> From igor.veresov at oracle.com Thu Aug 25 20:51:43 2011 From: igor.veresov at oracle.com (Igor Veresov) Date: Thu, 25 Aug 2011 20:51:43 -0700 Subject: review(XS): 6591247: C2 cleans up the merge point too early during SplitIf. In-Reply-To: <7E0E16EB-ACA8-482F-92FE-E76CD2B79CC2@oracle.com> References: <7E0E16EB-ACA8-482F-92FE-E76CD2B79CC2@oracle.com> Message-ID: <5D1DCB1FC80E4C499A9FF261E399F976@oracle.com> Thanks, Tom! 
igor On Thursday, August 25, 2011 at 7:57 PM, Tom Rodriguez wrote: > Looks good. > > tom > > On Aug 25, 2011, at 5:10 PM, Igor Veresov wrote: > > > The problem here is that during split-if we remove the region's self reference too early while processing its users, which can make get_ctrl_no_update() return the wrong answer. > > I wasn't able to reproduce the problem, but it seems to be possible for it to occur if the region points to something else but phi and the self reference is deleted too early. > > The solution is to remove the self reference last. > > > > Webrev: http://cr.openjdk.java.net/~iveresov/6591247/webrev.00/ > > > > Testing: specjvm98, CTW > > > > Thanks, > > igor From christian.thalinger at oracle.com Fri Aug 26 00:02:05 2011 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Fri, 26 Aug 2011 09:02:05 +0200 Subject: Request for reviews (M): 7059037: Use BIS for zeroing on T4 In-Reply-To: <4E56FB39.2050207@oracle.com> References: <4E559CC0.6030701@oracle.com> <2526D33F-9391-4E3A-A702-20B4D438F0C8@oracle.com> <4E56FB39.2050207@oracle.com> Message-ID: <3FF5DFAA-BF98-42C3-9744-758441EB2BB2@oracle.com> Looks good. -- Christian On Aug 26, 2011, at 3:47 AM, Vladimir Kozlov wrote: > Thank you, Tom > > I updated webrev with your and Christian suggestions: > > http://cr.openjdk.java.net/~kvn/7059037/webrev > > Tom Rodriguez wrote: >> src/share/vm/gc_interface/collectedHeap.inline.hpp, src/share/vm/oops/cpCacheKlass.cpp: >> Please use an ifdef block instead of the expression form. > > Done. > >> You might consider using more sophisticated predicates to statically rule out ClearArrays with constant arguments. Something like: >> predicate(!n->in(1)->is_Con() || n->in(1)->find_intrpt_t_con() > BlkZeroingLowLimit) >> That would reduce any overhead for large instances that will never benefit from BIS. > > Done. 
I thought about that but found that such cases are rare since the expression which calculates count could be complex (because we mostly do partial zeroing) or when object is small with constant count ClearArray is replaced with stores in ideal transformation. But I agree it still may help. > >> Could we use block instead of blk? Otherwise this looks good. > > Done. > > Thanks, > Vladimir > >> tom >> On Aug 24, 2011, at 5:52 PM, Vladimir Kozlov wrote: >>> http://cr.openjdk.java.net/~kvn/7059037/webrev >>> >>> 7059037: Use BIS for zeroing on T4 >>> >>> On T4 BIS to the beginning of cache line always zeros it. Use it for zeroing new >>> allocated java objects. The main code is in MacroAssembler::bis_zeroing() and is >>> used by C2 generated code (ClearArray), runtime (Copy::fill_to_aligned_words()) >>> and template interpreter (TemplateTable::_new()). New stub zero_aligned_words >>> was added to use in runtime. >>> >>> BIS is used only for objects bigger than BlkZeroingLowLimit (2Kbyte) since it >>> requires membar. 2Hb was selected based on microbenchmark results. >>> >>> I also added wrasi(Reg, immI) instruction which I used during development. >>> VM_Version::has_mru_blk_init() is replaced with has_blk_zeroing() since original >>> was not used. >>> Zap new object in CollectedHeap::allocate_from_tlab_slow() instead of zeroing it >>> since it will be cleaned later in init_obj(). >>> Fixed call sites of check_for_bad_heap_word_value() where klass is not >>> initialized to avoid the verification failure. 
>>> From y.s.ramakrishna at oracle.com Fri Aug 26 00:51:19 2011 From: y.s.ramakrishna at oracle.com (Ramki Ramakrishna) Date: Fri, 26 Aug 2011 00:51:19 -0700 Subject: Request for reviews (M): 7059037: Use BIS for zeroing on T4 In-Reply-To: <4E56E01C.6000604@oracle.com> References: <4E559CC0.6030701@oracle.com> <4E56AF30.8050909@oracle.com> <4E56E01C.6000604@oracle.com> Message-ID: <4E575077.6080502@oracle.com> On 8/25/2011 4:51 PM, Vladimir Kozlov wrote: > Ramki, > > Ramki Ramakrishna wrote: >> Hi Vladimir -- >> >> On 8/24/2011 5:52 PM, Vladimir Kozlov wrote: >>> http://cr.openjdk.java.net/~kvn/7059037/webrev >>> >>> 7059037: Use BIS for zeroing on T4 >>> >> ... >>> Zap new object in CollectedHeap::allocate_from_tlab_slow() instead >>> of zeroing it >>> since it will be cleaned later in init_obj(). > > TLAB::allocate() zaps new objects so I think allocate_from_tlab_slow() > should also zap new object (and I copied code from > ThreadLocalAllocBuffer::allocate()) instead of cleaning it since it > will be cleaned later in init_obj(). > I see. OK I agree that this is the right thing to do for concurrent gc's, although I wish this could be cleanly abstracted based on the collector, rather than a blanket imposition from concurrent gc's. >>> Fixed call sites of check_for_bad_heap_word_value() where klass is not >>> initialized to avoid the verification failure. >>> I see that the skip_header_HeapWords() used in GCH::check_for_non_bad_heap_word_value() was not extended to the CH::check_for_bad_heap_word_value(). Your changes look good; thanks for fixing up the shortcomings. I'll check with my colleagues on the need to clean this up (in a separate CR of course) so that the concurrent GC'isms do not leak out in this manner into the general code, or at least are left sufficiently abstract when they can be.
-- ramki > > % /java/re/jdk/7/latest/binaries/solaris-i586/fastdebug/bin/java > -XX:+CheckMemoryInitialization -Xcomp t > VM option '+CheckMemoryInitialization' > # To suppress the following error report, specify this argument > # after -XX: or in .hotspotrc: SuppressErrorAt=/collectedHeap.cpp:98 > # > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error > (/tmp/workspace/jdk7-2-build-solaris-i586-product/jdk7/hotspot/src/share/vm/gc_interface/collectedHeap.cpp:98), > pid=27663, tid=2 > # assert((*(intptr_t*) (addr + slot)) != ((intptr_t) badHeapWordVal)) > failed: Found badHeapWordValue in post-allocation check > # > # JRE version: 7.0-b147 > # Java VM: Java HotSpot(TM) Server VM (21.0-b17-fastdebug compiled > mode solaris-x86 ) > > Vladimir > >> >> Can you describe why these two changes were necessary? There was >> already support >> for skipping headers for concurrent GC's when zapping and verifying. >> Did something >> change that caused this to be changed? >> >> I haven't looked at the rest of the files, but a high level >> description of the need to >> make this change would allow me to review the changes that >> necessitated this, >> and whether it could not be done more easily otherwise (using the >> existing >> framework of skipping a preamble of words in the object). >> >> -- ramki From christian.thalinger at oracle.com Fri Aug 26 02:16:26 2011 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Fri, 26 Aug 2011 11:16:26 +0200 Subject: Request for reviews (S): 7071709: JSR 292: switchpoint invalidation should be pushed not pulled In-Reply-To: <52852391-3B23-4326-B75C-D2CB502C52AF@oracle.com> References: <5F4038AD-6959-480E-9DB8-1DEF17D6C4A6@oracle.com> <52852391-3B23-4326-B75C-D2CB502C52AF@oracle.com> Message-ID: On Aug 25, 2011, at 8:21 PM, John Rose wrote: > That's nice and clean. > > One question: What happens when a CallSite optimizes down to a ConstantCallSite?
It looks like a useless dependency will get inserted. Right. That slipped through the cracks. I also changed the check in callGenerator. > > Maybe the call to assert_call_site_target_value should be guarded by a check whether the field is marked final. I'm not sure I understand. The target field isn't final. > > Also, get_field_by_offset should be called on the actual type of the ciCallSite, not env->CallSite_klass. Otherwise you might get NULL for the ciField, if the target fields are split out across different call site subclasses. Are there plans to do this? I changed ciField::is_call_site_target to check for subclasses of CallSite. -- Christian > > -- John > > On Aug 25, 2011, at 6:54 AM, Christian Thalinger wrote: > >> http://cr.openjdk.java.net/~twisti/7071709/ >> >> 7071709: JSR 292: switchpoint invalidation should be pushed not pulled >> Reviewed-by: >> >> SwitchPoints use a MutableCallSite for their implementation. The fix is >> to treat the target field of constant CallSites as a compile time >> constant and add a dependence for invalidation of the optimization.
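To make the pushed-versus-pulled distinction concrete, here is a small C++ model. All names are hypothetical and this is not HotSpot's Dependencies API: compiled code that folds a call site's target registers itself as a dependent, and setTarget invalidates every dependent instead of each compiled method re-checking the target on every call. It also reflects the point raised above that a ConstantCallSite should not get a dependency at all.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of an nmethod that constant-folded a call-site target.
struct CompiledMethod {
  bool valid = true;  // cleared when one of its dependencies is invalidated
};

// Hypothetical model of a CallSite whose target the compiler may treat as
// a constant, with invalidation pushed to dependent code.
class CallSiteModel {
 public:
  CallSiteModel(int target, bool is_constant)
      : target_(target), is_constant_(is_constant) {}

  // Compiler side: fold the target and, unless the site is a
  // ConstantCallSite, record that 'dependent' relies on it.
  int target_as_constant(CompiledMethod* dependent) {
    if (!is_constant_) dependents_.push_back(dependent);
    return target_;
  }

  // Runtime side (setTarget): push invalidation to every dependent.
  void set_target(int new_target) {
    assert(!is_constant_ && "a ConstantCallSite cannot be retargeted");
    target_ = new_target;
    for (CompiledMethod* m : dependents_) m->valid = false;
    dependents_.clear();
  }

  std::size_t dependent_count() const { return dependents_.size(); }

 private:
  int target_;        // stands in for the MethodHandle target
  bool is_constant_;  // true for a ConstantCallSite
  std::vector<CompiledMethod*> dependents_;
};
```

The cost of a retarget is paid once, at setTarget time, which is why compiled code can use the target with no per-call guard.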
>> >> src/share/vm/opto/memnode.cpp >> src/share/vm/opto/parse3.cpp >> > From christian.thalinger at oracle.com Fri Aug 26 02:23:08 2011 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Fri, 26 Aug 2011 11:23:08 +0200 Subject: Request for reviews (S): 7071709: JSR 292: switchpoint invalidation should be pushed not pulled In-Reply-To: <35544B63-E3B9-4F37-9015-4E081392A9D1@oracle.com> References: <5F4038AD-6959-480E-9DB8-1DEF17D6C4A6@oracle.com> <108F560B-C3D0-40AC-9C10-3EC21A4FB8C5@oracle.com> <4E568B9F.8040606@univ-mlv.fr> <05ACD704-7163-497D-8C2C-9E7AD760E080@oracle.com> <35544B63-E3B9-4F37-9015-4E081392A9D1@oracle.com> Message-ID: <775B68CD-023B-4821-A6DD-383F57FDFE73@oracle.com> On Aug 26, 2011, at 1:47 AM, John Rose wrote: > On Aug 25, 2011, at 11:32 AM, Tom Rodriguez wrote: > >> The docs for VolatileCallSite suggest setTarget can be called whenever you feel like it, so it seems like it can be set many times. MutableCallSite can be as well but the implication is that it's not updated very often. I'm actually unclear what distinction is trying to be made with VolatileCallSite. > > The MCS and VCS have the same semantics, except for the extra memory barriers on VCS. These barriers do not affect the validity or applicability of push notification. Also, either an MCS or VCS might end up being "megamutable", so there has to be some sort of cutoff that will prevent target-prediction for a CS which has been mispredicted too many times. We need to squeeze a bit of state into the CS, somewhere, which displays how many times the thing has been mispredicted. So the question is should we go with what we currently have for invokedynamic (only optimize CCS and MCS) or should we allow all CSs to be optimized and start working on the logic John describes. I'm fine with both. -- Christian > > -- John
From christian.thalinger at oracle.com Fri Aug 26 04:16:43 2011 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Fri, 26 Aug 2011 13:16:43 +0200 Subject: review for 7071307: MethodHandle bimorphic inlining should consider the frequency In-Reply-To: References: <4B352C37-8DBD-432D-8B12-BE01D04155C4@oracle.com> Message-ID: <6E9A5D90-666D-436B-BCA9-A781510B70DD@oracle.com> I just applied this patch to test the rtalk implementation and I hit an assert: Internal Error at bytecodeInfo.cpp:152, pid=10351, tid=11 assert(mha_profile) failed: must exist Some context: (dbx) p _caller_jvms->method()->print() _caller_jvms->method()->print() = (void) (dbx) p _caller_jvms->bci() _caller_jvms->bci() = 7 (dbx) p _caller_jvms->method()->print_codes() 0 aload_2 1 astore_3 2 aload_1 3 fast_aload_0 4 aload_2 5 astore_3 6 aload_3 7 invokedynamic secondary cache[4] of CP[2] missing bias? 0 bci: 7 CounterData count(16900) 12 astore_3 13 aload_3 14 invokedynamic secondary cache[5] of CP[3] missing bias? 8 bci: 14 CounterData count(16900) 19 astore_3 20 aload_3 21 areturn _caller_jvms->method()->print_codes() = (void) (dbx) p mdo->print() --- Extra data: mdo->print() = (void) (dbx) -- Christian On Aug 24, 2011, at 11:12 PM, Tom Rodriguez wrote: > > On Aug 24, 2011, at 6:12 AM, Christian Thalinger wrote: > >> >> On Aug 24, 2011, at 1:44 AM, Tom Rodriguez wrote: >> >>> This is a re-review since I added per method handle GWT profiling. >>> >>> http://cr.openjdk.java.net/~never/7071307 >>> 312 lines changed: 270 ins; 15 del; 27 mod; 22101 unchg >> >> src/share/vm/prims/methodHandleWalk.cpp: >> >> MethodHandleCompiler::fetch_counts: >> >> + int count1 = -1, count2 = -1; >> ... >> + int total = count1 + count2; >> + if (count1 != -1 && count2 != -2 && total != 0) { >> >> Why -2? > > Just a typo. It's fixed.
> >> >> + int _taken_count; >> + int _not_taken_count; >> >> Does taken refer to target and not_taken to fallback in the GWT? > > They refer to the bytecode and the vmcounts collected. I think they are actually reversed from what selectAlternative generates but as long as they agree with the bytecodes generated I don't think it matters. I verified empirically that the counts match the execution and feed into the frequency in the proper fashion. > >> >> MethodHandleCompiler::make_invoke: >> >> Can you use emit_bc instead of _bytecode.push where possible so we have at least a little sanity checking? > > I added support for ifeq and added update_branch_dest to correct the offsets. I only added support for ifeq for now. > >> >> + bool found_sel = false; >> >> Can you rename that to maybe found_selectAlternative? > > Yup. > >> >> >> src/share/vm/ci/ciMethodHandle.cpp: >> >> That print_chain is very helpful. Thanks for that. >> >> >> src/share/vm/classfile/javaClasses.cpp: >> >> + int java_lang_invoke_CountingMethodHandle::vmcount(oop mh) { >> + assert(is_instance(mh), "DMH only"); >> + return mh->int_field(_vmcount_offset); >> + } >> + >> + void java_lang_invoke_CountingMethodHandle::set_vmcount(oop mh, int count) { >> + assert(is_instance(mh), "DMH only"); >> + mh->int_field_put(_vmcount_offset, count); >> + } >> >> I think the assert message is a copy-paste bug. > > Fixed. > >> >> Otherwise looks good. > > Thanks! > > tom > >> >>> >>> 7071307: MethodHandle bimorphic inlining should consider the frequency >>> Reviewed-by: >>> >>> The fix for 7050554 added a bimorphic inline path but didn't take into >>> account the frequency of the guarding test. This ends up treating >>> both sides of the if as equally frequent which can lead to over >>> inlining and overflowing the method inlining limits. The fix is to >>> grab the frequency from the If and apply that to the branches. 
>>> >>> Additionally I added support for per method handle profile collection >>> since this was required to get good results for more complex programs. >>> This requires the fix for 7082631 on the JDK side. >>> http://cr.openjdk.java.net/~never/7082631 >> >> The JDK changes look good. >> >> -- Christian >> >>> >>> I also fixed a problem with the ideal graph printer where debug_orig >>> printing would go into an infinite loop. >>> >>> Tested with jruby and vm.mlvm tests. >>> >> > From tom.rodriguez at oracle.com Fri Aug 26 04:53:32 2011 From: tom.rodriguez at oracle.com (tom.rodriguez at oracle.com) Date: Fri, 26 Aug 2011 11:53:32 +0000 Subject: hg: hsx/hotspot-comp/hotspot: 7082949: JSR 292: missing ResourceMark in methodOopDesc::make_invoke_method Message-ID: <20110826115338.EE4DF47131@hg.openjdk.java.net> Changeset: ac8738449b6f Author: never Date: 2011-08-25 20:29 -0700 URL: http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/ac8738449b6f 7082949: JSR 292: missing ResourceMark in methodOopDesc::make_invoke_method Reviewed-by: kvn, twisti ! src/share/vm/oops/methodOop.cpp + test/compiler/7082949/Test7082949.java From vladimir.kozlov at oracle.com Fri Aug 26 07:51:28 2011 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 26 Aug 2011 07:51:28 -0700 Subject: Request for reviews (M): 7059037: Use BIS for zeroing on T4 In-Reply-To: <3FF5DFAA-BF98-42C3-9744-758441EB2BB2@oracle.com> References: <4E559CC0.6030701@oracle.com> <2526D33F-9391-4E3A-A702-20B4D438F0C8@oracle.com> <4E56FB39.2050207@oracle.com> <3FF5DFAA-BF98-42C3-9744-758441EB2BB2@oracle.com> Message-ID: <4E57B2F0.90805@oracle.com> Thank you, Tom and Christian for reviews. Vladimir On 8/26/11 12:02 AM, Christian Thalinger wrote: > Looks good. 
-- Christian > > On Aug 26, 2011, at 3:47 AM, Vladimir Kozlov wrote: > >> Thank you, Tom >> >> I updated webrev with your and Christian's suggestions: >> >> http://cr.openjdk.java.net/~kvn/7059037/webrev >> >> Tom Rodriguez wrote: >>> src/share/vm/gc_interface/collectedHeap.inline.hpp, src/share/vm/oops/cpCacheKlass.cpp: >>> Please use an ifdef block instead of the expression form. >> >> Done. >> >>> You might consider using more sophisticated predicates to statically rule out ClearArrays with constant arguments. Something like: >>> predicate(!n->in(1)->is_Con() || n->in(1)->find_intptr_t_con()> BlkZeroingLowLimit) >>> That would reduce any overhead for large instances that will never benefit from BIS. >> >> Done. I thought about that but found that such cases are rare since the expression which calculates count could be complex (because we mostly do partial zeroing) or when object is small with constant count ClearArray is replaced with stores in ideal transformation. But I agree it still may help. >> >>> Could we use block instead of blk? Otherwise this looks good. >> >> Done. >> >> Thanks, >> Vladimir >> >>> tom >>> On Aug 24, 2011, at 5:52 PM, Vladimir Kozlov wrote: >>>> http://cr.openjdk.java.net/~kvn/7059037/webrev >>>> >>>> 7059037: Use BIS for zeroing on T4 >>>> >>>> On T4 BIS to the beginning of cache line always zeros it. Use it for zeroing new >>>> allocated java objects. The main code is in MacroAssembler::bis_zeroing() and is >>>> used by C2 generated code (ClearArray), runtime (Copy::fill_to_aligned_words()) >>>> and template interpreter (TemplateTable::_new()). New stub zero_aligned_words >>>> was added to use in runtime. >>>> >>>> BIS is used only for objects bigger than BlkZeroingLowLimit (2Kbyte) since it >>>> requires membar. 2Kb was selected based on microbenchmark results. >>>> >>>> I also added wrasi(Reg, immI) instruction which I used during development.
>>>> VM_Version::has_mru_blk_init() is replaced with has_blk_zeroing() since original >>>> was not used. >>>> Zap new object in CollectedHeap::allocate_from_tlab_slow() instead of zeroing it >>>> since it will be cleaned later in init_obj(). >>>> Fixed call sites of check_for_bad_heap_word_value() where klass is not >>>> initialized to avoid the verification failure. >>>> > From vladimir.kozlov at oracle.com Fri Aug 26 13:33:55 2011 From: vladimir.kozlov at oracle.com (vladimir.kozlov at oracle.com) Date: Fri, 26 Aug 2011 20:33:55 +0000 Subject: hg: hsx/hotspot-comp/hotspot: 7059037: Use BIS for zeroing on T4 Message-ID: <20110826203400.01D1247146@hg.openjdk.java.net> Changeset: baf763f388e6 Author: kvn Date: 2011-08-26 08:52 -0700 URL: http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/baf763f388e6 7059037: Use BIS for zeroing on T4 Summary: Use BIS for zeroing new allocated big (2Kb and more) objects and arrays. Reviewed-by: never, twisti, ysr ! src/cpu/sparc/vm/assembler_sparc.cpp ! src/cpu/sparc/vm/assembler_sparc.hpp ! src/cpu/sparc/vm/copy_sparc.hpp ! src/cpu/sparc/vm/sparc.ad ! src/cpu/sparc/vm/stubGenerator_sparc.cpp ! src/cpu/sparc/vm/templateTable_sparc.cpp ! src/cpu/sparc/vm/vm_version_sparc.cpp ! src/cpu/sparc/vm/vm_version_sparc.hpp ! src/share/vm/gc_interface/collectedHeap.cpp ! src/share/vm/gc_interface/collectedHeap.inline.hpp ! src/share/vm/oops/cpCacheKlass.cpp ! src/share/vm/runtime/globals.hpp ! src/share/vm/runtime/stubRoutines.cpp ! 
src/share/vm/runtime/stubRoutines.hpp From tom.rodriguez at oracle.com Fri Aug 26 13:47:47 2011 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Fri, 26 Aug 2011 13:47:47 -0700 Subject: review for 7071307: MethodHandle bimorphic inlining should consider the frequency In-Reply-To: <6E9A5D90-666D-436B-BCA9-A781510B70DD@oracle.com> References: <4B352C37-8DBD-432D-8B12-BE01D04155C4@oracle.com> <6E9A5D90-666D-436B-BCA9-A781510B70DD@oracle.com> Message-ID: <8D08BA0D-D556-463A-BD48-E3800C76A9ED@oracle.com> I needed an is_empty() test in addition to the method_data() != NULL. if (_caller_jvms != NULL && _caller_jvms->method() != NULL && _caller_jvms->method()->method_data() != NULL && !_caller_jvms->method()->method_data()->is_empty()) { ciMethodData* mdo = _caller_jvms->method()->method_data(); ciProfileData* mha_profile = mdo->bci_to_data(_caller_jvms->bci()); assert(mha_profile, "must exist"); CounterData* cd = mha_profile->as_CounterData(); call_site_count = cd->count(); } else { call_site_count = invoke_count; // use the same value } I also hit another unrelated assertion when running the test where he had optimized away all the invokedynamics, so Compile::has_method_handle_invokes was true but we never actually emitted any. So we failed this assert in nmethod.cpp: assert(has_method_handle_invokes() == (_deoptimize_mh_offset != -1), "must have deopt mh handler"); The fix is to remove the set_has_method_handle_invokes call in callGenerator.cpp and set them when they are matched. 
diff -r ac8738449b6f src/share/vm/opto/matcher.cpp --- a/src/share/vm/opto/matcher.cpp +++ b/src/share/vm/opto/matcher.cpp @@ -1106,6 +1106,9 @@ mcall_java->_optimized_virtual = call_java->is_optimized_virtual(); is_method_handle_invoke = call_java->is_method_handle_invoke(); mcall_java->_method_handle_invoke = is_method_handle_invoke; + if (is_method_handle_invoke) { + C->set_has_method_handle_invokes(true); + } if( mcall_java->is_MachCallStaticJava() ) mcall_java->as_MachCallStaticJava()->_name = call_java->as_CallStaticJava()->_name; There's some crazy deep inlining in that smalltalk test case. I think there must be some sort of bug with it. The PrintInlining output wraps on my screen several times. I'm looking at it. tom On Aug 26, 2011, at 4:16 AM, Christian Thalinger wrote: > I just applied this patch to test the rtalk implementation and I hit an assert: > > Internal Error at bytecodeInfo.cpp:152, pid=10351, tid=11 > assert(mha_profile) failed: must exist > > Some context: > > (dbx) p _caller_jvms->method()->print() > _caller_jvms->method()->print() = (void) > (dbx) p _caller_jvms->bci() > _caller_jvms->bci() = 7 > (dbx) p _caller_jvms->method()->print_codes() > 0 aload_2 > 1 astore_3 > 2 aload_1 > 3 fast_aload_0 > 4 aload_2 > 5 astore_3 > 6 aload_3 > 7 invokedynamic secondary cache[4] of CP[2] missing bias? > 0 bci: 7 CounterData count(16900) > 12 astore_3 > 13 aload_3 > 14 invokedynamic secondary cache[5] of CP[3] missing bias? > 8 bci: 14 CounterData count(16900) > 19 astore_3 > 20 aload_3 > 21 areturn > _caller_jvms->method()->print_codes() = (void) > (dbx) p mdo->print() > --- Extra data: > mdo->print() = (void) > (dbx) > > -- Christian > > On Aug 24, 2011, at 11:12 PM, Tom Rodriguez wrote: > >> >> On Aug 24, 2011, at 6:12 AM, Christian Thalinger wrote: >> >>> >>> On Aug 24, 2011, at 1:44 AM, Tom Rodriguez wrote: >>> >>>> This is a re-review since I added per method handle GWT profiling. 
>>>> >>>> http://cr.openjdk.java.net/~never/7071307 >>>> 312 lines changed: 270 ins; 15 del; 27 mod; 22101 unchg >>> >>> src/share/vm/prims/methodHandleWalk.cpp: >>> >>> MethodHandleCompiler::fetch_counts: >>> >>> + int count1 = -1, count2 = -1; >>> ... >>> + int total = count1 + count2; >>> + if (count1 != -1 && count2 != -2 && total != 0) { >>> >>> Why -2? >> >> Just a typo. It's fixed. >> >>> >>> + int _taken_count; >>> + int _not_taken_count; >>> >>> Does taken refer to target and not_taken to fallback in the GWT? >> >> They refer to the bytecode and the vmcounts collected. I think they are actually reversed from what selectAlternative generates but as long as they agree with the bytecodes generated I don't think it matters. I verified empirically that the counts match the execution and feed into the frequency in the proper fashion. >> >>> >>> MethodHandleCompiler::make_invoke: >>> >>> Can you use emit_bc instead of _bytecode.push where possible so we have at least a little sanity checking? >> >> I added support for ifeq and added update_branch_dest to correct the offsets. I only added support for ifeq for now. >> >>> >>> + bool found_sel = false; >>> >>> Can you rename that to maybe found_selectAlternative? >> >> Yup. >> >>> >>> >>> src/share/vm/ci/ciMethodHandle.cpp: >>> >>> That print_chain is very helpful. Thanks for that. >>> >>> >>> src/share/vm/classfile/javaClasses.cpp: >>> >>> + int java_lang_invoke_CountingMethodHandle::vmcount(oop mh) { >>> + assert(is_instance(mh), "DMH only"); >>> + return mh->int_field(_vmcount_offset); >>> + } >>> + >>> + void java_lang_invoke_CountingMethodHandle::set_vmcount(oop mh, int count) { >>> + assert(is_instance(mh), "DMH only"); >>> + mh->int_field_put(_vmcount_offset, count); >>> + } >>> >>> I think the assert message is a copy-paste bug. >> >> Fixed. >> >>> >>> Otherwise looks good. >> >> Thanks! 
>> >> tom >> >>> >>>> >>>> 7071307: MethodHandle bimorphic inlining should consider the frequency >>>> Reviewed-by: >>>> >>>> The fix for 7050554 added a bimorphic inline path but didn't take into >>>> account the frequency of the guarding test. This ends up treating >>>> both sides of the if as equally frequent which can lead to over >>>> inlining and overflowing the method inlining limits. The fix is to >>>> grab the frequency from the If and apply that to the branches. >>>> >>>> Additionally I added support for per method handle profile collection >>>> since this was required to get good results for more complex programs. >>>> This requires the fix for 7082631 on the JDK side. >>>> http://cr.openjdk.java.net/~never/7082631 >>> >>> The JDK changes look good. >>> >>> -- Christian >>> >>>> >>>> I also fixed a problem with the ideal graph printer where debug_orig >>>> printing would go into an infinite loop. >>>> >>>> Tested with jruby and vm.mlvm tests. >>>> >>> >> > From igor.veresov at oracle.com Sat Aug 27 02:21:23 2011 From: igor.veresov at oracle.com (igor.veresov at oracle.com) Date: Sat, 27 Aug 2011 09:21:23 +0000 Subject: hg: hsx/hotspot-comp/hotspot: 6591247: C2 cleans up the merge point too early during SplitIf Message-ID: <20110827092125.9CD5147167@hg.openjdk.java.net> Changeset: 8805f8c1e23e Author: iveresov Date: 2011-08-27 00:23 -0700 URL: http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/8805f8c1e23e 6591247: C2 cleans up the merge point too early during SplitIf Summary: Remove region self reference last Reviewed-by: kvn, never ! 
src/share/vm/opto/split_if.cpp From john.r.rose at oracle.com Sat Aug 27 16:44:47 2011 From: john.r.rose at oracle.com (John Rose) Date: Sat, 27 Aug 2011 16:44:47 -0700 Subject: Request for reviews (S): 7071709: JSR 292: switchpoint invalidation should be pushed not pulled In-Reply-To: References: <5F4038AD-6959-480E-9DB8-1DEF17D6C4A6@oracle.com> <52852391-3B23-4326-B75C-D2CB502C52AF@oracle.com> Message-ID: On Aug 26, 2011, at 2:16 AM, Christian Thalinger wrote: >> Also, get_field_by_offset should be called on the actual type of the ciCallSite, not env->CallSite_klass. Otherwise you might get NULL for the ciField, if the target fields are split out across different call site subclasses. > > Are there plans to do this? I changed ciField::is_call_site_target to check for subclasses of CallSite. No, no plans. Just a move toward robustness. Right now we have a common field inherited from CS that is faked into final or volatile field semantics in the subclasses. I'm afraid we could get forced to split the field at some point. On Aug 26, 2011, at 2:23 AM, Christian Thalinger wrote: > > On Aug 26, 2011, at 1:47 AM, John Rose wrote: > >> On Aug 25, 2011, at 11:32 AM, Tom Rodriguez wrote: >> >>> The docs for VolatileCallSite suggest setTarget can be called whenever you feel like it, so it seems like it can be set many times. MutableCallSite can be as well but the implication is that it's not updated very often. I'm actually unclear what distinction is trying to be made with VolatileCallSite. >> >> The MCS and VCS have the same semantics, except for the extra memory barriers on VCS. These barriers do not affect the validity or applicability of push notification. Also, either an MCS or VCS might end up being "megamutable", so there has to be some sort of cutoff that will prevent target-prediction for a CS which has been mispredicted too many times. We need to squeeze a bit of state into the CS, somewhere, which displays how many times the thing has been mispredicted. 
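One way to picture the cutoff John describes, as a hedged sketch: a small per-call-site counter that stops target prediction once the site has been retargeted too often. The struct, its field names, and the limit of 3 are assumptions for illustration; the thread does not say where the state would live or what the threshold would be, and this sketch simply counts every setTarget as a misprediction.

```cpp
#include <cassert>

// Hypothetical cutoff; the thread does not give a value.
constexpr int kMaxRetargets = 3;

// Hypothetical per-call-site state: "a bit of state in the CS" recording
// how often a predicted target has been overturned.
struct CallSiteState {
  int target = 0;
  int retargets = 0;

  // Compiler query: keep predicting the target only while the site has
  // not become "megamutable".
  bool should_predict_target() const { return retargets < kMaxRetargets; }

  // Runtime: every setTarget is counted as a misprediction in this sketch.
  void set_target(int new_target) {
    target = new_target;
    ++retargets;
  }
};
```

Because MutableCallSite and VolatileCallSite differ only in memory barriers, the same counter would apply to both kinds of site.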
> > So the question is should we go with what we currently have for invokedynamic (only optimize CCS and MCS) or should we allow all CSs to be optimized and start working on the logic John describes. I'm fine with both. I think we need the throttling logic right away. I'm not comfortable with encouraging users to use VCS just because MCS has a performance bug. -- John From christian.thalinger at oracle.com Sun Aug 28 01:16:23 2011 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Sun, 28 Aug 2011 10:16:23 +0200 Subject: Request for reviews (S): 7071709: JSR 292: switchpoint invalidation should be pushed not pulled In-Reply-To: References: <5F4038AD-6959-480E-9DB8-1DEF17D6C4A6@oracle.com> <52852391-3B23-4326-B75C-D2CB502C52AF@oracle.com> Message-ID: <4CE5B9F0-9357-4714-968B-2F818D0090A6@oracle.com> On Aug 28, 2011, at 1:44 AM, John Rose wrote: > On Aug 26, 2011, at 2:16 AM, Christian Thalinger wrote: > >>> Also, get_field_by_offset should be called on the actual type of the ciCallSite, not env->CallSite_klass. Otherwise you might get NULL for the ciField, if the target fields are split out across different call site subclasses. >> >> Are there plans to do this? I changed ciField::is_call_site_target to check for subclasses of CallSite. > > No, no plans. Just a move toward robustness. Right now we have a common field inherited from CS that is faked into final or volatile field semantics in the subclasses. I'm afraid we could get forced to split the field at some point.
> > On Aug 26, 2011, at 2:23 AM, Christian Thalinger wrote: > >> >> On Aug 26, 2011, at 1:47 AM, John Rose wrote: >> >>> On Aug 25, 2011, at 11:32 AM, Tom Rodriguez wrote: >>> >>>> The docs for VolatileCallSite suggest setTarget can be called whenever you feel like it, so it seems like it can be set many times. MutableCallSite can be as well but the implication is that it's not updated very often. I'm actually unclear what distinction is trying to be made with VolatileCallSite. >>> >>> The MCS and VCS have the same semantics, except for the extra memory barriers on VCS. These barriers do not affect the validity or applicability of push notification. Also, either an MCS or VCS might end up being "megamutable", so there has to be some sort of cutoff that will prevent target-prediction for a CS which has been mispredicted too many times. We need to squeeze a bit of state into the CS, somewhere, which displays how many times the thing has been mispredicted. >> >> So the question is should we go with what we currently have for invokedynamic (only optimize CCS and MCS) or should we allow all CSs to be optimized and start working on the logic John describes. I'm fine with both. > > > I think we need the throttling logic right away. I'm not comfortable with encouraging users to use VCS just because MCS has a performance bug. The other way around (MCS instead of VCS). Alright, then I'll change the logic in callGenerator and doCall to optimize VCSs too for this patch and start working on the throttling logic. -- Christian > > -- John
From christian.thalinger at oracle.com Mon Aug 29 05:45:58 2011 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Mon, 29 Aug 2011 14:45:58 +0200 Subject: review for 7071307: MethodHandle bimorphic inlining should consider the frequency In-Reply-To: <8D08BA0D-D556-463A-BD48-E3800C76A9ED@oracle.com> References: <4B352C37-8DBD-432D-8B12-BE01D04155C4@oracle.com> <6E9A5D90-666D-436B-BCA9-A781510B70DD@oracle.com> <8D08BA0D-D556-463A-BD48-E3800C76A9ED@oracle.com> Message-ID: <6B7CD75D-B7D9-4F9A-ACE8-BD0718426A59@oracle.com> On Aug 26, 2011, at 10:47 PM, Tom Rodriguez wrote: > I needed an is_empty() test in addition to the method_data() != NULL. > > if (_caller_jvms != NULL && _caller_jvms->method() != NULL && > _caller_jvms->method()->method_data() != NULL && > !_caller_jvms->method()->method_data()->is_empty()) { > ciMethodData* mdo = _caller_jvms->method()->method_data(); > ciProfileData* mha_profile = mdo->bci_to_data(_caller_jvms->bci()); > assert(mha_profile, "must exist"); > CounterData* cd = mha_profile->as_CounterData(); > call_site_count = cd->count(); > } else { > call_site_count = invoke_count; // use the same value > } Looks good. > > I also hit another unrelated assertion when running the test where he had optimized away all the invokedynamics, so Compile::has_method_handle_invokes was true but we never actually emitted any. So we failed this assert in nmethod.cpp: > > assert(has_method_handle_invokes() == (_deoptimize_mh_offset != -1), "must have deopt mh handler"); > > The fix is to remove the set_has_method_handle_invokes call in callGenerator.cpp and set them when they are matched.
> > diff -r ac8738449b6f src/share/vm/opto/matcher.cpp > --- a/src/share/vm/opto/matcher.cpp > +++ b/src/share/vm/opto/matcher.cpp > @@ -1106,6 +1106,9 @@ > mcall_java->_optimized_virtual = call_java->is_optimized_virtual(); > is_method_handle_invoke = call_java->is_method_handle_invoke(); > mcall_java->_method_handle_invoke = is_method_handle_invoke; > + if (is_method_handle_invoke) { > + C->set_has_method_handle_invokes(true); > + } > if( mcall_java->is_MachCallStaticJava() ) > mcall_java->as_MachCallStaticJava()->_name = > call_java->as_CallStaticJava()->_name; Ahh, good catch. > > There's some crazy deep inlining in that smalltalk test case. I think there must be some sort of bug with it. The PrintInlining output wraps on my screen several times. I'm looking at it. I haven't printed the inlining tree yet. I will try... -- Christian > > tom > > On Aug 26, 2011, at 4:16 AM, Christian Thalinger wrote: > >> I just applied this patch to test the rtalk implementation and I hit an assert: >> >> Internal Error at bytecodeInfo.cpp:152, pid=10351, tid=11 >> assert(mha_profile) failed: must exist >> >> Some context: >> >> (dbx) p _caller_jvms->method()->print() >> _caller_jvms->method()->print() = (void) >> (dbx) p _caller_jvms->bci() >> _caller_jvms->bci() = 7 >> (dbx) p _caller_jvms->method()->print_codes() >> 0 aload_2 >> 1 astore_3 >> 2 aload_1 >> 3 fast_aload_0 >> 4 aload_2 >> 5 astore_3 >> 6 aload_3 >> 7 invokedynamic secondary cache[4] of CP[2] missing bias? >> 0 bci: 7 CounterData count(16900) >> 12 astore_3 >> 13 aload_3 >> 14 invokedynamic secondary cache[5] of CP[3] missing bias? 
>> 8 bci: 14 CounterData count(16900) >> 19 astore_3 >> 20 aload_3 >> 21 areturn >> _caller_jvms->method()->print_codes() = (void) >> (dbx) p mdo->print() >> --- Extra data: >> mdo->print() = (void) >> (dbx) >> >> -- Christian >> >> On Aug 24, 2011, at 11:12 PM, Tom Rodriguez wrote: >> >>> >>> On Aug 24, 2011, at 6:12 AM, Christian Thalinger wrote: >>> >>>> >>>> On Aug 24, 2011, at 1:44 AM, Tom Rodriguez wrote: >>>> >>>>> This is a re-review since I added per method handle GWT profiling. >>>>> >>>>> http://cr.openjdk.java.net/~never/7071307 >>>>> 312 lines changed: 270 ins; 15 del; 27 mod; 22101 unchg >>>> >>>> src/share/vm/prims/methodHandleWalk.cpp: >>>> >>>> MethodHandleCompiler::fetch_counts: >>>> >>>> + int count1 = -1, count2 = -1; >>>> ... >>>> + int total = count1 + count2; >>>> + if (count1 != -1 && count2 != -2 && total != 0) { >>>> >>>> Why -2? >>> >>> Just a typo. It's fixed. >>> >>>> >>>> + int _taken_count; >>>> + int _not_taken_count; >>>> >>>> Does taken refer to target and not_taken to fallback in the GWT? >>> >>> They refer to the bytecode and the vmcounts collected. I think they are actually reversed from what selectAlternative generates but as long as they agree with the bytecodes generated I don't think it matters. I verified empirically that the counts match the execution and feed into the frequency in the proper fashion. >>> >>>> >>>> MethodHandleCompiler::make_invoke: >>>> >>>> Can you use emit_bc instead of _bytecode.push where possible so we have at least a little sanity checking? >>> >>> I added support for ifeq and added update_branch_dest to correct the offsets. I only added support for ifeq for now. >>> >>>> >>>> + bool found_sel = false; >>>> >>>> Can you rename that to maybe found_selectAlternative? >>> >>> Yup. >>> >>>> >>>> >>>> src/share/vm/ci/ciMethodHandle.cpp: >>>> >>>> That print_chain is very helpful. Thanks for that. 
>>>> >>>> >>>> src/share/vm/classfile/javaClasses.cpp: >>>> >>>> + int java_lang_invoke_CountingMethodHandle::vmcount(oop mh) { >>>> + assert(is_instance(mh), "DMH only"); >>>> + return mh->int_field(_vmcount_offset); >>>> + } >>>> + >>>> + void java_lang_invoke_CountingMethodHandle::set_vmcount(oop mh, int count) { >>>> + assert(is_instance(mh), "DMH only"); >>>> + mh->int_field_put(_vmcount_offset, count); >>>> + } >>>> >>>> I think the assert message is a copy-paste bug. >>> >>> Fixed. >>> >>>> >>>> Otherwise looks good. >>> >>> Thanks! >>> >>> tom >>> >>>> >>>>> >>>>> 7071307: MethodHandle bimorphic inlining should consider the frequency >>>>> Reviewed-by: >>>>> >>>>> The fix for 7050554 added a bimorphic inline path but didn't take into >>>>> account the frequency of the guarding test. This ends up treating >>>>> both sides of the if as equally frequent which can lead to over >>>>> inlining and overflowing the method inlining limits. The fix is to >>>>> grab the frequency from the If and apply that to the branches. >>>>> >>>>> Additionally I added support for per method handle profile collection >>>>> since this was required to get good results for more complex programs. >>>>> This requires the fix for 7082631 on the JDK side. >>>>> http://cr.openjdk.java.net/~never/7082631 >>>> >>>> The JDK changes look good. >>>> >>>> -- Christian >>>> >>>>> >>>>> I also fixed a problem with the ideal graph printer where debug_orig >>>>> printing would go into an infinite loop. >>>>> >>>>> Tested with jruby and vm.mlvm tests. 
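The counter guard in fetch_counts and the idea of applying the If frequency to both branches can be illustrated together. The following is a minimal sketch under stated assumptions, not the code in the webrev: `BranchCounts`, `taken_probability` and `split_frequency` are hypothetical names, and the even split used when no profile exists is this sketch's own choice. It only assumes, as in the review, that each counter starts at -1 when no profile is found.

```cpp
#include <cassert>

// Hypothetical holder for the two counters read back from a
// guardWithTest profile. -1 means "no profile", like the initial
// values of count1/count2 in the quoted fetch_counts.
struct BranchCounts {
  int taken_count = -1;
  int not_taken_count = -1;
};

// Probability that the guard branch is taken, or -1.0 if the profile is
// missing or empty. Both counters are compared against the same -1
// sentinel; the "count2 != -2" typo caught in review would let a missing
// second counter through.
double taken_probability(const BranchCounts& c) {
  if (c.taken_count == -1 || c.not_taken_count == -1) return -1.0;
  long long total = static_cast<long long>(c.taken_count) + c.not_taken_count;
  if (total == 0) return -1.0;
  return static_cast<double>(c.taken_count) / static_cast<double>(total);
}

// Splits a call site frequency between the two inlined branches instead of
// treating both sides of the guard as equally frequent. The even split used
// when there is no usable profile is an assumption of this sketch.
void split_frequency(double site_freq, const BranchCounts& c,
                     double* taken_freq, double* not_taken_freq) {
  double p = taken_probability(c);
  if (p < 0.0) p = 0.5;
  *taken_freq = site_freq * p;
  *not_taken_freq = site_freq * (1.0 - p);
}
```

As Tom notes, whether "taken" maps to the target or the fallback of the guardWithTest does not matter as long as the counts agree with the generated bytecodes.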
>>>>> >>>> >>> >> > From christian.thalinger at oracle.com Mon Aug 29 08:21:54 2011 From: christian.thalinger at oracle.com (christian.thalinger at oracle.com) Date: Mon, 29 Aug 2011 15:21:54 +0000 Subject: hg: hsx/hotspot-comp/hotspot: 7083184: JSR 292: don't store context class argument with call site dependencies Message-ID: <20110829152156.726D7471D7@hg.openjdk.java.net> Changeset: b27c72d69fd1 Author: twisti Date: 2011-08-29 05:07 -0700 URL: http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/b27c72d69fd1 7083184: JSR 292: don't store context class argument with call site dependencies Reviewed-by: jrose, never ! src/share/vm/ci/ciEnv.cpp ! src/share/vm/ci/ciEnv.hpp ! src/share/vm/code/dependencies.cpp ! src/share/vm/code/dependencies.hpp ! src/share/vm/memory/universe.cpp ! src/share/vm/opto/callGenerator.cpp From christian.thalinger at oracle.com Mon Aug 29 09:52:27 2011 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Mon, 29 Aug 2011 18:52:27 +0200 Subject: Request for reviews (S): 7071709: JSR 292: switchpoint invalidation should be pushed not pulled In-Reply-To: References: <5F4038AD-6959-480E-9DB8-1DEF17D6C4A6@oracle.com> <52852391-3B23-4326-B75C-D2CB502C52AF@oracle.com> Message-ID: <9EC6D299-AE3B-44C7-AC71-5526AB810557@oracle.com> On Aug 28, 2011, at 1:44 AM, John Rose wrote: > On Aug 26, 2011, at 2:16 AM, Christian Thalinger wrote: > >>> Also, get_field_by_offset should be called on the actual type of the ciCallSite, not env->CallSite_klass. Otherwise you might get NULL for the ciField, if the target fields are split out across different call site subclasses. >> >> Are there plans to do this? I changed ciField::is_call_site_target to check for subclasses of CallSite. > > No, no plans. Just a move toward robustness. Right now we have a common field inherited from CS that is faked into final or volatile field semantics in the subclasses. I'm afraid we could get forced to split the field at some point. I think the point is now. 
setTargetVolatile uses Unsafe to fake the volatile field semantics and the compiler doesn't recognize this as a field store to CS.target. Thus we miss the field stores to a VCS and end up with wrong behavior. I'm currently preparing something that does this refactoring. Why was this Unsafe trick used in the first place? -- Christian > > On Aug 26, 2011, at 2:23 AM, Christian Thalinger wrote: > >> >> On Aug 26, 2011, at 1:47 AM, John Rose wrote: >> >>> On Aug 25, 2011, at 11:32 AM, Tom Rodriguez wrote: >>> >>>> The docs for VolatileCallSite suggest setTarget can be called whenever you feel like it, so it seems like it can be set many times. MutableCallSite can be as well but the implication is that it's not updated very often. I'm actually unclear what distinction is trying to be made with VolatileCallSite. >>> >>> The MCS and VCS have the same semantics, except for the extra memory barriers on VCS. These barriers do not affect the validity or applicability of push notification. Also, either an MCS or VCS might end up being "megamutable", so there has to be some sort of cutoff that will prevent target-prediction for a CS which has been mispredicted too many times. We need to squeeze a bit of state into the CS, somewhere, which displays how many times the thing has been mispredicted. >> >> So the question is should we go with what we currently have for invokedynamic (only optimize CCS and MCS) or should we allow all CSs to be optimized and start working on the logic John describes. I'm fine with both. > > > I think we need the throttling logic right away. I'm not comfortable with encouraging users to use VCS just because MCS has a performance bug. > > -- John
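John's "megamutable" cutoff can be pictured as a small saturating counter stored with each call site. This is a hypothetical sketch only: `CallSiteState`, `kMaxMispredictions` and its value are invented for illustration and are not HotSpot or java.lang.invoke names. The assumption is that every target change which invalidates a prediction bumps the counter, and the compiler stops predicting the target once the counter reaches the cutoff.

```cpp
#include <cassert>

// Hypothetical per-call-site state: the "bit of state squeezed into the CS"
// that records how often a predicted target turned out to be wrong.
struct CallSiteState {
  const void* target = nullptr;
  int mispredictions = 0;
};

// Hypothetical cutoff; the value is arbitrary for this sketch.
constexpr int kMaxMispredictions = 3;

// Compiler side: keep predicting (inlining) the current target only while
// the call site has not turned out to be "megamutable".
bool should_predict_target(const CallSiteState& cs) {
  return cs.mispredictions < kMaxMispredictions;
}

// Runtime side: a target change while predictions are still being made
// invalidates them, so it counts as a misprediction. The initial binding
// and no-op updates do not count; past the cutoff the counter stops.
void set_target(CallSiteState& cs, const void* new_target) {
  bool changed = cs.target != nullptr && cs.target != new_target;
  if (changed && should_predict_target(cs)) {
    ++cs.mispredictions;
  }
  cs.target = new_target;
}
```

Past the cutoff, a compiler following this scheme would emit an ordinary indirect call through the current target instead of inlining it.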
From christian.thalinger at oracle.com Mon Aug 29 10:22:36 2011 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Mon, 29 Aug 2011 19:22:36 +0200 Subject: Request for reviews (S): 7071709: JSR 292: switchpoint invalidation should be pushed not pulled In-Reply-To: <9EC6D299-AE3B-44C7-AC71-5526AB810557@oracle.com> References: <5F4038AD-6959-480E-9DB8-1DEF17D6C4A6@oracle.com> <52852391-3B23-4326-B75C-D2CB502C52AF@oracle.com> <9EC6D299-AE3B-44C7-AC71-5526AB810557@oracle.com> Message-ID: <4A211AB4-019F-4E4A-8E68-F7E3F4CDF2CC@oracle.com> On Aug 29, 2011, at 6:52 PM, Christian Thalinger wrote: > > On Aug 28, 2011, at 1:44 AM, John Rose wrote: > >> On Aug 26, 2011, at 2:16 AM, Christian Thalinger wrote: >> >>>> Also, get_field_by_offset should be called on the actual type of the ciCallSite, not env->CallSite_klass. Otherwise you might get NULL for the ciField, if the target fields are split out across different call site subclasses. >>> >>> Are there plans to do this? I changed ciField::is_call_site_target to check for subclasses of CallSite. >> >> No, no plans. Just a move toward robustness. Right now we have a common field inherited from CS that is faked into final or volatile field semantics in the subclasses. I'm afraid we could get forced to split the field at some point. > > I think the point is now. setTargetVolatile uses Unsafe to fake the volatile field semantics and the compiler doesn't recognize this as a field store to CS.target. Thus we miss the field stores to a VCS and end up with wrong behavior. > > I'm currently preparing something that does this refactoring. > > Why was this Unsafe trick used in the first place? Never mind. I can see now why.
-- Christian > > -- Christian > >> >> On Aug 26, 2011, at 2:23 AM, Christian Thalinger wrote: >> >>> >>> On Aug 26, 2011, at 1:47 AM, John Rose wrote: >>> >>>> On Aug 25, 2011, at 11:32 AM, Tom Rodriguez wrote: >>>> >>>>> The docs for VolatileCallSite suggest setTarget can be called whenever you feel like it, so it seems like it can be set many times. MutableCallSite can be as well but the implication is that it's not updated very often. I'm actually unclear what distinction is trying to be made with VolatileCallSite. >>>> >>>> The MCS and VCS have the same semantics, except for the extra memory barriers on VCS. These barriers do not affect the validity or applicability of push notification. Also, either an MCS or VCS might end up being "megamutable", so there has to be some sort of cutoff that will prevent target-prediction for a CS which has been mispredicted too many times. We need to squeeze a bit of state into the CS, somewhere, which displays how many times the thing has been mispredicted. >>> >>> So the question is should we go with what we currently have for invokedynamic (only optimize CCS and MCS) or should we allow all CSs to be optimized and start working on the logic John describes. I'm fine with both. >> >> >> I think we need the throttling logic right away. I'm not comfortable with encouraging users to use VCS just because MCS has a performance bug. >> >> -- John >
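The "pushed not pulled" wording in 7071709 refers to who pays for invalidation. With pull, compiled code re-checks the call site (or switch point) state every time it runs; with push, compiled code assumes the current target, records a dependency, and the store to the target invalidates every dependent method. A minimal sketch of the push side, using hypothetical `CallSite` and `CompiledMethod` stand-ins rather than HotSpot's dependency machinery:

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a compiled method that inlined a call site target.
struct CompiledMethod {
  bool valid = true;
};

// Hypothetical call site that pushes invalidation to its dependents.
class CallSite {
 public:
  explicit CallSite(int target) : target_(target) {}

  int target() const { return target_; }

  // Called by the compiler when it bakes the current target into code.
  void add_dependent(CompiledMethod* m) { dependents_.push_back(m); }

  // Push model: the writer pays once; compiled readers never re-check.
  void set_target(int new_target) {
    if (new_target == target_) return;
    target_ = new_target;
    for (CompiledMethod* m : dependents_) m->valid = false;  // "deoptimize"
    dependents_.clear();
  }

 private:
  int target_;
  std::vector<CompiledMethod*> dependents_;
};
```

In this model, a write that updated `target_` directly without going through `set_target` would skip the invalidation loop, which mirrors the problem Christian describes with the Unsafe-based setTargetVolatile.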
From tom.rodriguez at oracle.com Mon Aug 29 11:03:37 2011 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Mon, 29 Aug 2011 11:03:37 -0700 Subject: Request for reviews (S): 7071709: JSR 292: switchpoint invalidation should be pushed not pulled In-Reply-To: <9EC6D299-AE3B-44C7-AC71-5526AB810557@oracle.com> References: <5F4038AD-6959-480E-9DB8-1DEF17D6C4A6@oracle.com> <52852391-3B23-4326-B75C-D2CB502C52AF@oracle.com> <9EC6D299-AE3B-44C7-AC71-5526AB810557@oracle.com> Message-ID: <0EEADB80-E8F9-49B7-BF9C-9FD4A50BD73D@oracle.com> On Aug 29, 2011, at 9:52 AM, Christian Thalinger wrote: > > On Aug 28, 2011, at 1:44 AM, John Rose wrote: > >> On Aug 26, 2011, at 2:16 AM, Christian Thalinger wrote: >> >>>> Also, get_field_by_offset should be called on the actual type of the ciCallSite, not env->CallSite_klass. Otherwise you might get NULL for the ciField, if the target fields are split out across different call site subclasses. >>> >>> Are there plans to do this? I changed ciField::is_call_site_target to check for subclasses of CallSite. >> >> No, no plans. Just a move toward robustness. Right now we have a common field inherited from CS that is faked into final or volatile field semantics in the subclasses. I'm afraid we could get forced to split the field at some point. > > I think the point is now. setTargetVolatile uses Unsafe to fake the volatile field semantics and the compiler doesn't recognize this as a field store to CS.target. Thus we miss the field stores to a VCS and end up with wrong behavior. > > I'm currently preparing something that does this refactoring. I'm not so sure this is a good idea.
A fair amount of code assumes that the structure of all CallSites is the same: __ load_heap_oop(rcx_method_handle, Address(rax_callsite, __ delayed_value(java_lang_invoke_CallSite::target_offset_in_bytes, rdx))); __ null_check(rcx_method_handle); __ verify_oop(rcx_method_handle); __ prepare_to_jump_from_interpreted(); __ jump_to_method_handle_entry(rcx_method_handle, rdx); I guess we could require/enforce that all call site subclasses have their target field at the same offset but it does seem to be break something fairly fundamental. You could trap these writes in the Unsafe machinery instead. That's fairly ugly but easy enough to do with a few assumptions about how it will be written. We might have to worry about reflection too, though that should either use Unsafe I think. Maybe we should move forward with what we have and deal with VCS later? tom > > Why was this Unsafe trick used in the first place? > > -- Christian > >> >> On Aug 26, 2011, at 2:23 AM, Christian Thalinger wrote: >> >>> >>> On Aug 26, 2011, at 1:47 AM, John Rose wrote: >>> >>>> On Aug 25, 2011, at 11:32 AM, Tom Rodriguez wrote: >>>> >>>>> The docs for VolatileCallSite suggest setTarget can be called whenever you feel like it, so it seems like it can be set many times. MutableCallSite can be as well but the implication is that it's not updated very often. I'm actually unclear what distinction is trying to be made with VolatileCallSite. >>>> >>>> The MCS and VCS have the same semantics, except for the extra memory barriers on VCS. These barriers do not affect the validity or applicability of push notification. Also, either an MCS or VCS might end up being "megamutable", so there has to be some sort of cutoff that will prevent target-prediction for a CS which has been mispredicted too many times. We need to squeeze a bit of state into the CS, somewhere, which displays how many times the thing has been mispredicted. 
>>> >>> So the question is should we go with what we currently have for invokedynamic (only optimize CCS and MCS) or should we allow all CSs to be optimized and start working on the logic John describes. I'm fine with both. >> >> I think we need the throttling logic right away. I'm not comfortable with encouraging users to use VCS just because MCS has a performance bug. >> >> -- John > From christian.thalinger at oracle.com Mon Aug 29 11:56:32 2011 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Mon, 29 Aug 2011 20:56:32 +0200 Subject: Request for reviews (S): 7071709: JSR 292: switchpoint invalidation should be pushed not pulled In-Reply-To: <0EEADB80-E8F9-49B7-BF9C-9FD4A50BD73D@oracle.com> References: <5F4038AD-6959-480E-9DB8-1DEF17D6C4A6@oracle.com> <52852391-3B23-4326-B75C-D2CB502C52AF@oracle.com> <9EC6D299-AE3B-44C7-AC71-5526AB810557@oracle.com> <0EEADB80-E8F9-49B7-BF9C-9FD4A50BD73D@oracle.com> Message-ID: <2F4D4364-320E-4CCD-A6CB-28E1535FBACF@oracle.com> On Aug 29, 2011, at 8:03 PM, Tom Rodriguez wrote: > > On Aug 29, 2011, at 9:52 AM, Christian Thalinger wrote: > >> >> On Aug 28, 2011, at 1:44 AM, John Rose wrote: >> >>> On Aug 26, 2011, at 2:16 AM, Christian Thalinger wrote: >>> >>>>> Also, get_field_by_offset should be called on the actual type of the ciCallSite, not env->CallSite_klass. Otherwise you might get NULL for the ciField, if the target fields are split out across different call site subclasses. >>>> >>>> Are there plans to do this? I changed ciField::is_call_site_target to check for subclasses of CallSite. >>> >>> No, no plans. Just a move toward robustness. Right now we have a common field inherited from CS that is faked into final or volatile field semantics in the subclasses. I'm afraid we could get forced to split the field at some point. >> >> I think the point is now. setTargetVolatile uses Unsafe to fake the volatile field semantics and the compiler doesn't recognize this as a field store to CS.target. 
Thus we miss the field stores to a VCS and end up with wrong behavior. >> >> I'm currently preparing something that does this refactoring. > > I'm not so sure this is a good idea. A fair amount of code assumes that the structure of all CallSites is the same: > > __ load_heap_oop(rcx_method_handle, Address(rax_callsite, __ delayed_value(java_lang_invoke_CallSite::target_offset_in_bytes, rdx))); > __ null_check(rcx_method_handle); > __ verify_oop(rcx_method_handle); > __ prepare_to_jump_from_interpreted(); > __ jump_to_method_handle_entry(rcx_method_handle, rdx); > > I guess we could require/enforce that all call site subclasses have their target field at the same offset but it does seem to be break something fairly fundamental. I agree. I got it working but it's fragile. > > You could trap these writes in the Unsafe machinery instead. That's fairly ugly but easy enough to do with a few assumptions about how it will be written. We might have to worry about reflection too, though that should either use Unsafe I think. I don't like that very much either. > > Maybe we should move forward with what we have and deal with VCS later? Yes, I think that would be the best approach. John, what do you think, optimize CCS and MCS for now and deal with VCS later? -- Christian > > tom > >> >> Why was this Unsafe trick used in the first place? >> >> -- Christian >> >>> >>> On Aug 26, 2011, at 2:23 AM, Christian Thalinger wrote: >>> >>>> >>>> On Aug 26, 2011, at 1:47 AM, John Rose wrote: >>>> >>>>> On Aug 25, 2011, at 11:32 AM, Tom Rodriguez wrote: >>>>> >>>>>> The docs for VolatileCallSite suggest setTarget can be called whenever you feel like it, so it seems like it can be set many times. MutableCallSite can be as well but the implication is that it's not updated very often. I'm actually unclear what distinction is trying to be made with VolatileCallSite. >>>>> >>>>> The MCS and VCS have the same semantics, except for the extra memory barriers on VCS. 
These barriers do not affect the validity or applicability of push notification. Also, either an MCS or VCS might end up being "megamutable", so there has to be some sort of cutoff that will prevent target-prediction for a CS which has been mispredicted too many times. We need to squeeze a bit of state into the CS, somewhere, which displays how many times the thing has been mispredicted. >>>> >>>> So the question is should we go with what we currently have for invokedynamic (only optimize CCS and MCS) or should we allow all CSs to be optimized and start working on the logic John describes. I'm fine with both. >>> >>> I think we need the throttling logic right away. I'm not comfortable with encouraging users to use VCS just because MCS has a performance bug. >>> >>> -- John >> > From john.r.rose at oracle.com Mon Aug 29 12:41:56 2011 From: john.r.rose at oracle.com (John Rose) Date: Mon, 29 Aug 2011 12:41:56 -0700 Subject: Request for reviews (S): 7071709: JSR 292: switchpoint invalidation should be pushed not pulled In-Reply-To: <2F4D4364-320E-4CCD-A6CB-28E1535FBACF@oracle.com> References: <5F4038AD-6959-480E-9DB8-1DEF17D6C4A6@oracle.com> <52852391-3B23-4326-B75C-D2CB502C52AF@oracle.com> <9EC6D299-AE3B-44C7-AC71-5526AB810557@oracle.com> <0EEADB80-E8F9-49B7-BF9C-9FD4A50BD73D@oracle.com> <2F4D4364-320E-4CCD-A6CB-28E1535FBACF@oracle.com> Message-ID: Yes, deal with volatile fields later. I do think that VCS should get push notif now. -- John (on my iPhone) On Aug 29, 2011, at 11:56 AM, Christian Thalinger wrote: > Yes, I think that would be the best approach. John, what do you think, optimize CCS and MCS for now and deal with VCS later? 
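Tom's objection to splitting the target field rests on the interpreter loading the target from one fixed offset for every CallSite. A toy C++ model of that constraint follows; it is not how HotSpot lays out Java objects, and `CallSiteLayout`, `SplitTargetLayout` and `load_target` are hypothetical. A single shared header keeps the target at one offset, while a subclass that declared its own target after extra state would put it somewhere else.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Toy object layout (not HotSpot's): every call site kind shares one
// header, so the target sits at the same offset for all of them.
struct CallSiteLayout {
  const char* klass_name;
  const void* target;
};

// The "interpreter" knows exactly one offset, in the spirit of
// java_lang_invoke_CallSite::target_offset_in_bytes in the quoted assembly.
constexpr std::size_t kTargetOffset = offsetof(CallSiteLayout, target);

// Reads the target from raw object memory at that fixed offset.
const void* load_target(const void* obj) {
  const void* t = nullptr;
  std::memcpy(&t, static_cast<const char*>(obj) + kTargetOffset, sizeof t);
  return t;
}

// A hypothetical subclass layout with its own, split-out target field
// placed after extra state: the target lands at a different offset, so
// the fixed-offset load above would read the wrong word.
struct SplitTargetLayout {
  const char* klass_name;
  int extra_state;
  const void* own_target;
};
```

This is the "require/enforce that all call site subclasses have their target field at the same offset" option from Tom's mail, expressed as a layout rule.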
From christian.thalinger at oracle.com Tue Aug 30 01:07:52 2011 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Tue, 30 Aug 2011 10:07:52 +0200 Subject: Request for reviews (S): 7078382: JSR 292: don't count method handle adapters against inlining budgets In-Reply-To: References: Message-ID: <5467AE30-2D5E-4514-B0CD-6FB7F56EE420@oracle.com> So, the change is so small that nobody cares? :-) -- Christian On Aug 23, 2011, at 9:20 PM, Christian Thalinger wrote: > http://cr.openjdk.java.net/~twisti/7078382/ > > 7078382: JSR 292: don't count method handle adapters against inlining budgets > Reviewed-by: > > Currently the code size of method handle adapters are counted against > inlining budgets like DesiredMethodLimit. This results to earlier > compiler bailouts with method handle call sites than without leading > to worse performance. > > The fix is to return an adjusted bytecode size for method handle > adapters for inlining decisions (the metric we use for now is the > number of invokes). > > Tested with JRuby benchmarks. > From vladimir.kozlov at oracle.com Tue Aug 30 07:59:58 2011 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 30 Aug 2011 07:59:58 -0700 Subject: Request for reviews (S): 7078382: JSR 292: don't count method handle adapters against inlining budgets In-Reply-To: <5467AE30-2D5E-4514-B0CD-6FB7F56EE420@oracle.com> References: <5467AE30-2D5E-4514-B0CD-6FB7F56EE420@oracle.com> Message-ID: <4E5CFAEE.2010006@oracle.com> + // (a) Don't fully count method handle adapters against inlining ^ you have only one paragraph so (a) is not needed. "sites of the adapter" --> "sites in the adapter" Can you not assign inside loop's condition? You can do next: + while (iter.next() != ciBytecodeStream::EOBC()) { + if (Bytecodes::is_invoke(iter.cur_bc())) { Other looks good. Thanks, Vladimir On 8/30/11 1:07 AM, Christian Thalinger wrote: > So, the change is so small that nobody cares? 
:-) > > -- Christian > > On Aug 23, 2011, at 9:20 PM, Christian Thalinger wrote: > >> http://cr.openjdk.java.net/~twisti/7078382/ >> >> 7078382: JSR 292: don't count method handle adapters against inlining budgets >> Reviewed-by: >> >> Currently the code size of method handle adapters are counted against >> inlining budgets like DesiredMethodLimit. This results to earlier >> compiler bailouts with method handle call sites than without leading >> to worse performance. >> >> The fix is to return an adjusted bytecode size for method handle >> adapters for inlining decisions (the metric we use for now is the >> number of invokes). >> >> Tested with JRuby benchmarks. >> > From christian.thalinger at oracle.com Tue Aug 30 08:35:11 2011 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Tue, 30 Aug 2011 17:35:11 +0200 Subject: Request for reviews (S): 7078382: JSR 292: don't count method handle adapters against inlining budgets In-Reply-To: <4E5CFAEE.2010006@oracle.com> References: <5467AE30-2D5E-4514-B0CD-6FB7F56EE420@oracle.com> <4E5CFAEE.2010006@oracle.com> Message-ID: <80C99D6A-8D90-4512-9548-29597313B7FD@oracle.com> On Aug 30, 2011, at 4:59 PM, Vladimir Kozlov wrote: > + // (a) Don't fully count method handle adapters against inlining > ^ you have only one paragraph so (a) is not needed. Yeah. I thought maybe we get more in the future :-) I removed it. > > "sites of the adapter" --> "sites in the adapter" Thanks. > > Can you not assign inside loop's condition? You can do next: > > + while (iter.next() != ciBytecodeStream::EOBC()) { > + if (Bytecodes::is_invoke(iter.cur_bc())) { Yes, I like that better. I also changed the example in ciStreams.hpp as I got that code from there. > > Other looks good. Thank you. I updated the webrev. -- Christian > > Thanks, > Vladimir > > > On 8/30/11 1:07 AM, Christian Thalinger wrote: >> So, the change is so small that nobody cares? 
:-) >> >> -- Christian >> >> On Aug 23, 2011, at 9:20 PM, Christian Thalinger wrote: >> >>> http://cr.openjdk.java.net/~twisti/7078382/ >>> >>> 7078382: JSR 292: don't count method handle adapters against inlining budgets >>> Reviewed-by: >>> >>> Currently the code size of method handle adapters are counted against >>> inlining budgets like DesiredMethodLimit. This results to earlier >>> compiler bailouts with method handle call sites than without leading >>> to worse performance. >>> >>> The fix is to return an adjusted bytecode size for method handle >>> adapters for inlining decisions (the metric we use for now is the >>> number of invokes). >>> >>> Tested with JRuby benchmarks. >>> >> From vladimir.kozlov at oracle.com Tue Aug 30 08:45:26 2011 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 30 Aug 2011 08:45:26 -0700 Subject: Request for reviews (S): 7078382: JSR 292: don't count method handle adapters against inlining budgets In-Reply-To: <80C99D6A-8D90-4512-9548-29597313B7FD@oracle.com> References: <5467AE30-2D5E-4514-B0CD-6FB7F56EE420@oracle.com> <4E5CFAEE.2010006@oracle.com> <80C99D6A-8D90-4512-9548-29597313B7FD@oracle.com> Message-ID: <4E5D0596.6060703@oracle.com> Looks good. Thanks, Vladimir Christian Thalinger wrote: > On Aug 30, 2011, at 4:59 PM, Vladimir Kozlov wrote: > >> + // (a) Don't fully count method handle adapters against inlining >> ^ you have only one paragraph so (a) is not needed. > > Yeah. I thought maybe we get more in the future :-) I removed it. > >> "sites of the adapter" --> "sites in the adapter" > > Thanks. > >> Can you not assign inside loop's condition? You can do next: >> >> + while (iter.next() != ciBytecodeStream::EOBC()) { >> + if (Bytecodes::is_invoke(iter.cur_bc())) { > > Yes, I like that better. I also changed the example in ciStreams.hpp as I got that code from there. > >> Other looks good. > > Thank you. I updated the webrev. 
> > -- Christian > >> Thanks, >> Vladimir >> >> >> On 8/30/11 1:07 AM, Christian Thalinger wrote: >>> So, the change is so small that nobody cares? :-) >>> >>> -- Christian >>> >>> On Aug 23, 2011, at 9:20 PM, Christian Thalinger wrote: >>> >>>> http://cr.openjdk.java.net/~twisti/7078382/ >>>> >>>> 7078382: JSR 292: don't count method handle adapters against inlining budgets >>>> Reviewed-by: >>>> >>>> Currently the code size of method handle adapters are counted against >>>> inlining budgets like DesiredMethodLimit. This results to earlier >>>> compiler bailouts with method handle call sites than without leading >>>> to worse performance. >>>> >>>> The fix is to return an adjusted bytecode size for method handle >>>> adapters for inlining decisions (the metric we use for now is the >>>> number of invokes). >>>> >>>> Tested with JRuby benchmarks. >>>> > From christian.thalinger at oracle.com Tue Aug 30 09:21:21 2011 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Tue, 30 Aug 2011 18:21:21 +0200 Subject: Request for reviews (M): 7079673: JSR 292: C1 should inline bytecoded method handle adapters Message-ID: <454A8FE2-FC6F-450C-9473-942E2CC4CA5D@oracle.com> http://cr.openjdk.java.net/~twisti/7079673/ 7079673: JSR 292: C1 should inline bytecoded method handle adapters Reviewed-by: The current JSR 292 support in C1 always does an invoke for method handle invokes which results in a lot of C2I-I2C transfers. This results in very poor performance. 
src/share/vm/c1/c1_GraphBuilder.cpp src/share/vm/c1/c1_GraphBuilder.hpp src/share/vm/c1/c1_Instruction.cpp src/share/vm/c1/c1_Instruction.hpp src/share/vm/classfile/javaClasses.cpp src/share/vm/classfile/vmSymbols.hpp From john.cuthbertson at oracle.com Tue Aug 30 09:54:09 2011 From: john.cuthbertson at oracle.com (John Cuthbertson) Date: Tue, 30 Aug 2011 09:54:09 -0700 Subject: RFR(S): 7066841: remove MacroAssembler::br_on_reg_cond() on sparc Message-ID: <4E5D15B1.9010006@oracle.com> Hi Everyone, Can I have couple of volunteers look over these changes? The webrev can be found at: http://cr.openjdk.java.net/~johnc/7066841/webrev.0/. These changes basically remove the macro assembler routine br_on_reg_cond and replace the remaining calls to that routine, in the G1 barriers, with an equivalent. Testing: GC test suite and Kitchensink on 32/64 bit sparc with -Xint, -client -Xcomp, -XX:+TieredCompilation -XX:TieredStopAtLevel=1, and default. VerifyDuringGC and VerifyBeforeGC were also enabled to detect missing barriers. Thanks, JohnC From tom.rodriguez at oracle.com Tue Aug 30 09:56:23 2011 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Tue, 30 Aug 2011 09:56:23 -0700 Subject: Request for reviews (S): 7078382: JSR 292: don't count method handle adapters against inlining budgets In-Reply-To: <4E5D0596.6060703@oracle.com> References: <5467AE30-2D5E-4514-B0CD-6FB7F56EE420@oracle.com> <4E5CFAEE.2010006@oracle.com> <80C99D6A-8D90-4512-9548-29597313B7FD@oracle.com> <4E5D0596.6060703@oracle.com> Message-ID: <6770529E-2661-4D36-8B1A-3607CF33CAE6@oracle.com> Yes it looks good. tom On Aug 30, 2011, at 8:45 AM, Vladimir Kozlov wrote: > Looks good. > > Thanks, > Vladimir > > Christian Thalinger wrote: >> On Aug 30, 2011, at 4:59 PM, Vladimir Kozlov wrote: >>> + // (a) Don't fully count method handle adapters against inlining >>> ^ you have only one paragraph so (a) is not needed. >> Yeah. I thought maybe we get more in the future :-) I removed it. 
>>> "sites of the adapter" --> "sites in the adapter" >> Thanks. >>> Can you not assign inside loop's condition? You can do next: >>> >>> + while (iter.next() != ciBytecodeStream::EOBC()) { >>> + if (Bytecodes::is_invoke(iter.cur_bc())) { >> Yes, I like that better. I also changed the example in ciStreams.hpp as I got that code from there. >>> Other looks good. >> Thank you. I updated the webrev. >> -- Christian >>> Thanks, >>> Vladimir >>> >>> >>> On 8/30/11 1:07 AM, Christian Thalinger wrote: >>>> So, the change is so small that nobody cares? :-) >>>> >>>> -- Christian >>>> >>>> On Aug 23, 2011, at 9:20 PM, Christian Thalinger wrote: >>>> >>>>> http://cr.openjdk.java.net/~twisti/7078382/ >>>>> >>>>> 7078382: JSR 292: don't count method handle adapters against inlining budgets >>>>> Reviewed-by: >>>>> >>>>> Currently the code size of method handle adapters are counted against >>>>> inlining budgets like DesiredMethodLimit. This results to earlier >>>>> compiler bailouts with method handle call sites than without leading >>>>> to worse performance. >>>>> >>>>> The fix is to return an adjusted bytecode size for method handle >>>>> adapters for inlining decisions (the metric we use for now is the >>>>> number of invokes). >>>>> >>>>> Tested with JRuby benchmarks. >>>>> From christian.thalinger at oracle.com Tue Aug 30 10:03:01 2011 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Tue, 30 Aug 2011 19:03:01 +0200 Subject: Request for reviews (S): 7078382: JSR 292: don't count method handle adapters against inlining budgets In-Reply-To: <6770529E-2661-4D36-8B1A-3607CF33CAE6@oracle.com> References: <5467AE30-2D5E-4514-B0CD-6FB7F56EE420@oracle.com> <4E5CFAEE.2010006@oracle.com> <80C99D6A-8D90-4512-9548-29597313B7FD@oracle.com> <4E5D0596.6060703@oracle.com> <6770529E-2661-4D36-8B1A-3607CF33CAE6@oracle.com> Message-ID: <8F1601DB-D389-4451-8BF2-0530028D21B2@oracle.com> Thanks, Tom and Vladimir. 
-- Christian On Aug 30, 2011, at 6:56 PM, Tom Rodriguez wrote: > Yes it looks good. > > tom > > On Aug 30, 2011, at 8:45 AM, Vladimir Kozlov wrote: > >> Looks good. >> >> Thanks, >> Vladimir >> >> Christian Thalinger wrote: >>> On Aug 30, 2011, at 4:59 PM, Vladimir Kozlov wrote: >>>> + // (a) Don't fully count method handle adapters against inlining >>>> ^ you have only one paragraph so (a) is not needed. >>> Yeah. I thought maybe we get more in the future :-) I removed it. >>>> "sites of the adapter" --> "sites in the adapter" >>> Thanks. >>>> Can you not assign inside loop's condition? You can do next: >>>> >>>> + while (iter.next() != ciBytecodeStream::EOBC()) { >>>> + if (Bytecodes::is_invoke(iter.cur_bc())) { >>> Yes, I like that better. I also changed the example in ciStreams.hpp as I got that code from there. >>>> Other looks good. >>> Thank you. I updated the webrev. >>> -- Christian >>>> Thanks, >>>> Vladimir >>>> >>>> >>>> On 8/30/11 1:07 AM, Christian Thalinger wrote: >>>>> So, the change is so small that nobody cares? :-) >>>>> >>>>> -- Christian >>>>> >>>>> On Aug 23, 2011, at 9:20 PM, Christian Thalinger wrote: >>>>> >>>>>> http://cr.openjdk.java.net/~twisti/7078382/ >>>>>> >>>>>> 7078382: JSR 292: don't count method handle adapters against inlining budgets >>>>>> Reviewed-by: >>>>>> >>>>>> Currently the code size of method handle adapters are counted against >>>>>> inlining budgets like DesiredMethodLimit. This results to earlier >>>>>> compiler bailouts with method handle call sites than without leading >>>>>> to worse performance. >>>>>> >>>>>> The fix is to return an adjusted bytecode size for method handle >>>>>> adapters for inlining decisions (the metric we use for now is the >>>>>> number of invokes). >>>>>> >>>>>> Tested with JRuby benchmarks. 
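The adjusted-size metric from 7078382 (count the invokes in a method handle adapter instead of its bytecode length) and the loop shape Vladimir suggested can be sketched as follows. Everything here is hypothetical: `Bc`, `BytecodeStream` and `adapter_inline_size` only mimic the shape of ciBytecodeStream, Bytecodes::is_invoke and EOBC from the review and are not those classes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical opcode subset; a real bytecode enum is far larger.
enum class Bc {
  aload_0, iload_1, areturn,
  invokevirtual, invokespecial, invokestatic, invokeinterface, invokedynamic,
  EOBC  // end-of-bytecodes marker returned when the stream is exhausted
};

bool is_invoke(Bc bc) {
  switch (bc) {
    case Bc::invokevirtual:
    case Bc::invokespecial:
    case Bc::invokestatic:
    case Bc::invokeinterface:
    case Bc::invokedynamic:
      return true;
    default:
      return false;
  }
}

// Hypothetical stream with the same calling shape as the reviewed loop:
// next() advances and returns the new bytecode, cur_bc() re-reads it.
class BytecodeStream {
 public:
  explicit BytecodeStream(const std::vector<Bc>& code) : code_(code) {}

  Bc next() {
    cur_ = pos_ < code_.size() ? code_[pos_++] : Bc::EOBC;
    return cur_;
  }

  Bc cur_bc() const { return cur_; }

 private:
  const std::vector<Bc>& code_;
  std::size_t pos_ = 0;
  Bc cur_ = Bc::EOBC;
};

// Adjusted "size" of a method handle adapter for inlining decisions:
// the number of invoke bytecodes rather than its raw length.
int adapter_inline_size(const std::vector<Bc>& code) {
  BytecodeStream iter(code);
  int invokes = 0;
  // No assignment inside the loop condition, as suggested in review.
  while (iter.next() != Bc::EOBC) {
    if (is_invoke(iter.cur_bc())) {
      invokes++;
    }
  }
  return invokes;
}
```

For an adapter such as {aload_0, iload_1, invokestatic, aload_0, invokevirtual, areturn}, counting elements gives 6 while the adjusted size is 2; the intent described in the RFR is that adapters then stop consuming inlining budgets such as DesiredMethodLimit.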
>>>>>> > From vladimir.kozlov at oracle.com Tue Aug 30 10:47:28 2011 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 30 Aug 2011 10:47:28 -0700 Subject: RFR(S): 7066841: remove MacroAssembler::br_on_reg_cond() on sparc In-Reply-To: <4E5D15B1.9010006@oracle.com> References: <4E5D15B1.9010006@oracle.com> Message-ID: <4E5D2230.70802@oracle.com> Nice cleanup. Thank you, John. Vladimir John Cuthbertson wrote: > Hi Everyone, > > Can I have couple of volunteers look over these changes? The webrev can > be found at: http://cr.openjdk.java.net/~johnc/7066841/webrev.0/. > > These changes basically remove the macro assembler routine > br_on_reg_cond and replace the remaining calls to that routine, in the > G1 barriers, with an equivalent. > > Testing: GC test suite and Kitchensink on 32/64 bit sparc with -Xint, > -client -Xcomp, -XX:+TieredCompilation -XX:TieredStopAtLevel=1, and > default. VerifyDuringGC and VerifyBeforeGC were also enabled to detect > missing barriers. > > Thanks, > > JohnC From tom.rodriguez at oracle.com Tue Aug 30 11:50:33 2011 From: tom.rodriguez at oracle.com (tom.rodriguez at oracle.com) Date: Tue, 30 Aug 2011 18:50:33 +0000 Subject: hg: hsx/hotspot-comp/hotspot: 7082263: Reflection::resolve_field/field_get/field_set are broken Message-ID: <20110830185035.0FA6E47231@hg.openjdk.java.net> Changeset: 19241ae0d839 Author: never Date: 2011-08-30 00:54 -0700 URL: http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/19241ae0d839 7082263: Reflection::resolve_field/field_get/field_set are broken Reviewed-by: kvn, dholmes, stefank, coleenp ! make/linux/makefiles/mapfile-vers-debug ! make/linux/makefiles/mapfile-vers-product ! make/solaris/makefiles/debug.make ! make/solaris/makefiles/fastdebug.make ! make/solaris/makefiles/jvmg.make - make/solaris/makefiles/mapfile-vers-nonproduct ! make/solaris/makefiles/optimized.make ! make/solaris/makefiles/product.make ! src/share/vm/precompiled.hpp ! src/share/vm/prims/jvm.cpp ! 
src/share/vm/prims/jvm.h ! src/share/vm/prims/unsafe.cpp ! src/share/vm/runtime/reflection.cpp ! src/share/vm/runtime/reflection.hpp - src/share/vm/runtime/reflectionCompat.hpp From tom.rodriguez at oracle.com Tue Aug 30 12:08:23 2011 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Tue, 30 Aug 2011 12:08:23 -0700 Subject: Request for reviews (M): 7079673: JSR 292: C1 should inline bytecoded method handle adapters In-Reply-To: <454A8FE2-FC6F-450C-9473-942E2CC4CA5D@oracle.com> References: <454A8FE2-FC6F-450C-9473-942E2CC4CA5D@oracle.com> Message-ID: <1C3853BB-211B-4082-950A-B837A4582775@oracle.com> c1_GraphBuilder.cpp: + } else if (receiver->as_CheckCast()) { I think this should be more robust. The as_Phi and operand_count checks should be part of this guard instead of being asserts. I assume this will be updated to do the optimization for VCS as well? Otherwise it looks good. tom On Aug 30, 2011, at 9:21 AM, Christian Thalinger wrote: > http://cr.openjdk.java.net/~twisti/7079673/ > > 7079673: JSR 292: C1 should inline bytecoded method handle adapters > Reviewed-by: > > The current JSR 292 support in C1 always does an invoke for method > handle invokes which results in a lot of C2I-I2C transfers. This > results in very poor performance. > > src/share/vm/c1/c1_GraphBuilder.cpp > src/share/vm/c1/c1_GraphBuilder.hpp > src/share/vm/c1/c1_Instruction.cpp > src/share/vm/c1/c1_Instruction.hpp > src/share/vm/classfile/javaClasses.cpp > src/share/vm/classfile/vmSymbols.hpp > From vladimir.kozlov at oracle.com Tue Aug 30 14:26:24 2011 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 30 Aug 2011 14:26:24 -0700 Subject: Request for reviews (S): 7085137: -XX:+VerifyOops is broken Message-ID: <4E5D5580.9010604@oracle.com> http://cr.openjdk.java.net/~kvn/7085137/webrev 7085137: -XX:+VerifyOops is broken I hit my new assert about different code emit size (7063629) when I specified -XX:+VerifyOops on sparc. 
It uses a set((intptr_t)msg, O0) instruction to set the address of the message, which is new each time; as a result, the size of set() could differ. Replace set() with patchable_set() to always generate 8 instructions. Add the missing case Op_PrefetchAllocation to the verification code in emit_form3_mem_reg(). Thanks, Vladimir From tom.rodriguez at oracle.com Tue Aug 30 16:12:02 2011 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Tue, 30 Aug 2011 16:12:02 -0700 Subject: review for 7016881: JSR 292: JDI: sun.jvm.hotspot.utilities.AssertionFailure: index out of bounds Message-ID: <4DE24E60-5CE6-4417-A7D9-B58C5563C8D3@oracle.com> http://cr.openjdk.java.net/~never/7016881 1 line changed: 0 ins; 0 del; 1 mod; 233 unchg 7016881: JSR 292: JDI: sun.jvm.hotspot.utilities.AssertionFailure: index out of bounds Reviewed-by: This was a bug in the 7012081 changes. A reference to rawIndex wasn't updated to poolIndex, so sometimes the wrong index was used, resulting in exceptions. Tested with the failing test. From vladimir.kozlov at oracle.com Tue Aug 30 15:00:48 2011 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 30 Aug 2011 15:00:48 -0700 Subject: Request for reviews (S): 7085137: -XX:+VerifyOops is broken In-Reply-To: <9429CD3F52A14F59B559311750D61152@oracle.com> References: <4E5D5580.9010604@oracle.com> <9429CD3F52A14F59B559311750D61152@oracle.com> Message-ID: <4E5D5D90.3010302@oracle.com> Thank you, Igor Vladimir Igor Veresov wrote: > Looks good > > igor > > On Tuesday, August 30, 2011 at 2:26 PM, Vladimir Kozlov wrote: > >> http://cr.openjdk.java.net/~kvn/7085137/webrev >> >> 7085137: -XX:+VerifyOops is broken >> >> I hit my new assert about different code emit size (7063629) when I specified >> -XX:+VerifyOops on sparc. It uses set((intptr_t)msg, O0) instruction to set >> address of message which is new each time, as result set() size could be different. >> Replace set() with patchable_set() to generate 8 instructions always.
>> Add missing case Op_PrefetchAllocation in verification code in emit_form3_mem_reg(). >> >> Thanks, >> Vladimir > > From vladimir.kozlov at oracle.com Tue Aug 30 16:22:32 2011 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 30 Aug 2011 16:22:32 -0700 Subject: review for 7016881: JSR 292: JDI: sun.jvm.hotspot.utilities.AssertionFailure: index out of bounds In-Reply-To: <4DE24E60-5CE6-4417-A7D9-B58C5563C8D3@oracle.com> References: <4DE24E60-5CE6-4417-A7D9-B58C5563C8D3@oracle.com> Message-ID: <4E5D70B8.60304@oracle.com> Good. Vladimir Tom Rodriguez wrote: > http://cr.openjdk.java.net/~never/7016881 > 1 line changed: 0 ins; 0 del; 1 mod; 233 unchg > > 7016881: JSR 292: JDI: sun.jvm.hotspot.utilities.AssertionFailure: index out of bounds > Reviewed-by: > > This was a bug in the 7012081 changes. A reference to rawIndex wasn't > updated to poolIndex so some times the wrong index was used resulting in > exceptions. Tested with failing test. > From igor.veresov at oracle.com Tue Aug 30 14:57:00 2011 From: igor.veresov at oracle.com (Igor Veresov) Date: Tue, 30 Aug 2011 14:57:00 -0700 Subject: Request for reviews (S): 7085137: -XX:+VerifyOops is broken In-Reply-To: <4E5D5580.9010604@oracle.com> References: <4E5D5580.9010604@oracle.com> Message-ID: <9429CD3F52A14F59B559311750D61152@oracle.com> Looks good igor On Tuesday, August 30, 2011 at 2:26 PM, Vladimir Kozlov wrote: > http://cr.openjdk.java.net/~kvn/7085137/webrev > > 7085137: -XX:+VerifyOops is broken > > I hit my new assert about different code emit size (7063629) when I specified > -XX:+VerifyOops on sparc. It uses set((intptr_t)msg, O0) instruction to set > address of message which is new each time, as result set() size could be different. > Replace set() with patchable_set() to generate 8 instructions always. > Add missing case Op_PrefetchAllocation in verification code in emit_form3_mem_reg(). 
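A toy model of the 7085137 failure mode: if the length of an instruction sequence depends on the constant being loaded, and the embedded constant (here, the address of a freshly allocated message) differs between the time the size is measured and the time the code is emitted, a size-consistency check will fire, while a fixed-length sequence avoids that. Assumptions: the counts in `toy_set_size()` are invented for illustration and are not the real SPARC `set()` encoding; only the fixed count of 8 for `patchable_set()` comes from the RFR, and `sizes_agree()` is a hypothetical stand-in for the emit-size assert.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of a value-dependent constant load. The instruction counts are
// invented for illustration; they are NOT the real SPARC set() encoding.
static int toy_set_size(std::int64_t value) {
  if (value >= -4096 && value < 4096) {
    return 1;  // small immediate
  }
  if (value >= -2147483648LL && value <= 2147483647LL) {
    return 2;  // 32-bit constant
  }
  return 6;    // full 64-bit constant
}

// Fixed-length load: the same size for every value, like the
// 8-instruction patchable_set() described in the RFR.
static int toy_patchable_set_size(std::int64_t /*value*/) {
  return 8;
}

// Hypothetical stand-in for the emit-size check: the size measured with one
// embedded address must match the size emitted with another.
template <typename SizeFn>
static bool sizes_agree(SizeFn size_of, std::int64_t addr_when_sized,
                        std::int64_t addr_when_emitted) {
  return size_of(addr_when_sized) == size_of(addr_when_emitted);
}
```

With the value-dependent load, two different message addresses can yield two different sizes; with the fixed-length load the size is independent of the address, which is why the RFR switches to patchable_set().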
> > Thanks, > Vladimir From igor.veresov at oracle.com Tue Aug 30 17:19:22 2011 From: igor.veresov at oracle.com (Igor Veresov) Date: Tue, 30 Aug 2011 17:19:22 -0700 Subject: review(XS): 7085279: C1 overflows code buffer with VerifyOops and CompressedOops Message-ID: This happens during emission of LIR_OpAllocObj. C1 assumes that a LIR instruction will fit into 1K, but in this case that's not true because the allocation code is pretty large by itself and oop verification adds an order of magnitude more code. I bumped up the size of the code per LIR operation to 2K, and also increased NMethodSizeLimit to accommodate all the verification code emitted. Webrev: http://cr.openjdk.java.net/~iveresov/7085279/webrev.00/ Thanks, igor From tom.rodriguez at oracle.com Tue Aug 30 17:51:18 2011 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Tue, 30 Aug 2011 17:51:18 -0700 Subject: review(XS): 7085279: C1 overflows code buffer with VerifyOops and CompressedOops In-Reply-To: References: Message-ID: <99BF7288-C64C-4F1A-93AD-70E668343872@oracle.com> On Aug 30, 2011, at 5:19 PM, Igor Veresov wrote: > This happens during emission of LIR_OpAllocObj. C1 assumes that a LIR instruction will fit into 1K, but in this case that's not true because the allocation code is pretty large by itself and oop verification adds an order of magnitude more code. > > I bumped up the size of the code per LIR operation to 2K, and also increased NMethodSizeLimit to accommodate all the verification code emitted. The 2K limit is fine. I have some memory that the NMethodSizeLimit may be set at 32K because of the reach of branches on some platform. I can't remember for sure though.
tom > > Webrev: http://cr.openjdk.java.net/~iveresov/7085279/webrev.00/ > > Thanks, > igor > From tom.rodriguez at oracle.com Tue Aug 30 17:53:48 2011 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Tue, 30 Aug 2011 17:53:48 -0700 Subject: Request for reviews (S): 7085137: -XX:+VerifyOops is broken In-Reply-To: <4E5D5580.9010604@oracle.com> References: <4E5D5580.9010604@oracle.com> Message-ID: <432D4798-1664-4B0A-9C4D-B5A47174D5FD@oracle.com> Looks good. tom On Aug 30, 2011, at 2:26 PM, Vladimir Kozlov wrote: > http://cr.openjdk.java.net/~kvn/7085137/webrev > > 7085137: -XX:+VerifyOops is broken > > I hit my new assert about different code emit size (7063629) when I specified -XX:+VerifyOops on sparc. It uses set((intptr_t)msg, O0) instruction to set address of message which is new each time, as result set() size could be different. > Replace set() with patchable_set() to generate 8 instructions always. > Add missing case Op_PrefetchAllocation in verification code in emit_form3_mem_reg(). > > Thanks, > Vladimir From vladimir.kozlov at oracle.com Tue Aug 30 17:49:52 2011 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 30 Aug 2011 17:49:52 -0700 Subject: review(XS): 7085279: C1 overflows code buffer with VerifyOops and CompressedOops In-Reply-To: References: Message-ID: <4E5D8530.5050507@oracle.com> Igor, May be you need to increase size only if VerifyOops is specified. What do you think? Vladimir Igor Veresov wrote: > This happens during emission of LIR_OpAllocObj. C1 assumes that a LIR instruction will fit into 1K but in this case it's not true because the allocation code is pretty large by itself and oop verfication adds an order of magnitude more of additional code. > > I bumped up the size of the code per LIR operation to 2K. And also increased the size of the NMethodSizeLimit to accommodate all the verification code emitted. 
> > Webrev: http://cr.openjdk.java.net/~iveresov/7085279/webrev.00/ > > Thanks, > igor > From igor.veresov at oracle.com Tue Aug 30 18:09:13 2011 From: igor.veresov at oracle.com (Igor Veresov) Date: Tue, 30 Aug 2011 18:09:13 -0700 Subject: review(XS): 7085279: C1 overflows code buffer with VerifyOops and CompressedOops In-Reply-To: <99BF7288-C64C-4F1A-93AD-70E668343872@oracle.com> References: <99BF7288-C64C-4F1A-93AD-70E668343872@oracle.com> Message-ID: <6EF543BAD8094CB7AAEBD9A2CED6BE2C@oracle.com> I think it's being taken care of here: static int desired_max_code_buffer_size() { #ifndef PPC return (int) NMethodSizeLimit; // default 256K or 512K #else // conditional branches on PPC are restricted to 16 bit signed return MIN2((unsigned int)NMethodSizeLimit,32*K); #endif } igor On Tuesday, August 30, 2011 at 5:51 PM, Tom Rodriguez wrote: > > On Aug 30, 2011, at 5:19 PM, Igor Veresov wrote: > > > This happens during emission of LIR_OpAllocObj. C1 assumes that a LIR instruction will fit into 1K but in this case it's not true because the allocation code is pretty large by itself and oop verfication adds an order of magnitude more of additional code. > > > > I bumped up the size of the code per LIR operation to 2K. And also increased the size of the NMethodSizeLimit to accommodate all the verification code emitted. > > The 2K limit is fine. I have some memory that the NMethodSizeLimit may be set at 32K because of the reach of branches on some platform. I can't remember for sure though. 
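The clamp in `desired_max_code_buffer_size()` can be exercised in isolation. In this sketch the compile-time `#ifndef PPC` switch is replaced by a runtime flag so both paths can be tested; the function name mirrors the quoted code, but its parameters are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

constexpr std::size_t K = 1024;

// Sketch of the quoted desired_max_code_buffer_size(). The #ifndef PPC switch
// is modeled as a runtime flag so both paths can be exercised.
static std::size_t desired_max_code_buffer_size(std::size_t nmethod_size_limit,
                                                bool short_branch_reach) {
  if (short_branch_reach) {
    // Conditional branches limited to a 16-bit signed displacement:
    // keep the whole buffer within 32K so every branch target is reachable.
    return std::min(nmethod_size_limit, 32 * K);
  }
  return nmethod_size_limit;
}
```

Because the clamp is a `MIN2`, raising NMethodSizeLimit (as 7085279 does) cannot push the buffer on such a platform past the branch-reach limit.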
> > tom > > > > > Webrev: http://cr.openjdk.java.net/~iveresov/7085279/webrev.00/ > > > > Thanks, > > igor From igor.veresov at oracle.com Tue Aug 30 18:12:29 2011 From: igor.veresov at oracle.com (Igor Veresov) Date: Tue, 30 Aug 2011 18:12:29 -0700 Subject: review(XS): 7085279: C1 overflows code buffer with VerifyOops and CompressedOops In-Reply-To: <4E5D8530.5050507@oracle.com> References: <4E5D8530.5050507@oracle.com> Message-ID: I just thought that it might need to be adjusted anyway for 64bit. I haven't seen any problems with that (because of the inlining constraints), but intuitively we would hit the limit sooner on 64 bit. igor On Tuesday, August 30, 2011 at 5:49 PM, Vladimir Kozlov wrote: > Igor, > > May be you need to increase size only if VerifyOops is specified. What do you think? > > Vladimir > > Igor Veresov wrote: > > This happens during emission of LIR_OpAllocObj. C1 assumes that a LIR instruction will fit into 1K but in this case it's not true because the allocation code is pretty large by itself and oop verfication adds an order of magnitude more of additional code. > > > > I bumped up the size of the code per LIR operation to 2K. And also increased the size of the NMethodSizeLimit to accommodate all the verification code emitted. > > > > Webrev: http://cr.openjdk.java.net/~iveresov/7085279/webrev.00/ > > > > Thanks, > > igor From vladimir.kozlov at oracle.com Tue Aug 30 18:17:40 2011 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 30 Aug 2011 18:17:40 -0700 Subject: review(XS): 7085279: C1 overflows code buffer with VerifyOops and CompressedOops In-Reply-To: References: <4E5D8530.5050507@oracle.com> Message-ID: <4E5D8BB4.7020505@oracle.com> Then it is fine. Changes looks good. Vladimir Igor Veresov wrote: > I just thought that it might need to be adjusted anyway for 64bit. I haven't seen any problems with that (because of the inlining constraints), but intuitively we would hit the limit sooner on 64 bit. 
> > igor > > On Tuesday, August 30, 2011 at 5:49 PM, Vladimir Kozlov wrote: > >> Igor, >> >> May be you need to increase size only if VerifyOops is specified. What do you think? >> >> Vladimir >> >> Igor Veresov wrote: >>> This happens during emission of LIR_OpAllocObj. C1 assumes that a LIR instruction will fit into 1K but in this case it's not true because the allocation code is pretty large by itself and oop verfication adds an order of magnitude more of additional code. >>> >>> I bumped up the size of the code per LIR operation to 2K. And also increased the size of the NMethodSizeLimit to accommodate all the verification code emitted. >>> >>> Webrev: http://cr.openjdk.java.net/~iveresov/7085279/webrev.00/ >>> >>> Thanks, >>> igor > > From tom.rodriguez at oracle.com Tue Aug 30 18:24:08 2011 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Tue, 30 Aug 2011 18:24:08 -0700 Subject: review(XS): 7085279: C1 overflows code buffer with VerifyOops and CompressedOops In-Reply-To: References: <4E5D8530.5050507@oracle.com> Message-ID: <431272B6-2C57-49BD-97DD-C8721A54E32C@oracle.com> On Aug 30, 2011, at 6:12 PM, Igor Veresov wrote: > I just thought that it might need to be adjusted anyway for 64bit. I haven't seen any problems with that (because of the inlining constraints), but intuitively we would hit the limit sooner on 64 bit. It's 32k * wordSize so it's already twice as big on 64 bit. We might want to revisit these limits for tiered though since profiling generates quite a bit of extra code. tom > > igor > > On Tuesday, August 30, 2011 at 5:49 PM, Vladimir Kozlov wrote: > >> Igor, >> >> May be you need to increase size only if VerifyOops is specified. What do you think? >> >> Vladimir >> >> Igor Veresov wrote: >>> This happens during emission of LIR_OpAllocObj. 
C1 assumes that a LIR instruction will fit into 1K but in this case it's not true because the allocation code is pretty large by itself and oop verfication adds an order of magnitude more of additional code. >>> >>> I bumped up the size of the code per LIR operation to 2K. And also increased the size of the NMethodSizeLimit to accommodate all the verification code emitted. >>> >>> Webrev: http://cr.openjdk.java.net/~iveresov/7085279/webrev.00/ >>> >>> Thanks, >>> igor > > From tom.rodriguez at oracle.com Tue Aug 30 18:24:19 2011 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Tue, 30 Aug 2011 18:24:19 -0700 Subject: review(XS): 7085279: C1 overflows code buffer with VerifyOops and CompressedOops In-Reply-To: <6EF543BAD8094CB7AAEBD9A2CED6BE2C@oracle.com> References: <99BF7288-C64C-4F1A-93AD-70E668343872@oracle.com> <6EF543BAD8094CB7AAEBD9A2CED6BE2C@oracle.com> Message-ID: On Aug 30, 2011, at 6:09 PM, Igor Veresov wrote: > I think it's being taken care of here: > > static int desired_max_code_buffer_size() { > #ifndef PPC > return (int) NMethodSizeLimit; // default 256K or 512K > #else > // conditional branches on PPC are restricted to 16 bit signed > return MIN2((unsigned int)NMethodSizeLimit,32*K); > #endif > } Ah, that's what I'm thinking of. tom > > igor > > On Tuesday, August 30, 2011 at 5:51 PM, Tom Rodriguez wrote: > >> >> On Aug 30, 2011, at 5:19 PM, Igor Veresov wrote: >> >>> This happens during emission of LIR_OpAllocObj. C1 assumes that a LIR instruction will fit into 1K but in this case it's not true because the allocation code is pretty large by itself and oop verfication adds an order of magnitude more of additional code. >>> >>> I bumped up the size of the code per LIR operation to 2K. And also increased the size of the NMethodSizeLimit to accommodate all the verification code emitted. >> >> The 2K limit is fine. I have some memory that the NMethodSizeLimit may be set at 32K because of the reach of branches on some platform. 
I can't remember for sure though. >> >> tom >> >>> >>> Webrev: http://cr.openjdk.java.net/~iveresov/7085279/webrev.00/ >>> >>> Thanks, >>> igor > > From vladimir.kozlov at ORACLE.COM Tue Aug 30 18:14:15 2011 From: vladimir.kozlov at ORACLE.COM (Vladimir Kozlov) Date: Tue, 30 Aug 2011 18:14:15 -0700 Subject: Request for reviews (S): 7085137: -XX:+VerifyOops is broken In-Reply-To: <432D4798-1664-4B0A-9C4D-B5A47174D5FD@oracle.com> References: <4E5D5580.9010604@oracle.com> <432D4798-1664-4B0A-9C4D-B5A47174D5FD@oracle.com> Message-ID: <4E5D8AE7.2050404@oracle.com> Thank you, Tom Vladimir Tom Rodriguez wrote: > Looks good. > > tom > > On Aug 30, 2011, at 2:26 PM, Vladimir Kozlov wrote: > >> http://cr.openjdk.java.net/~kvn/7085137/webrev >> >> 7085137: -XX:+VerifyOops is broken >> >> I hit my new assert about different code emit size (7063629) when I specified -XX:+VerifyOops on sparc. It uses set((intptr_t)msg, O0) instruction to set address of message which is new each time, as result set() size could be different. >> Replace set() with patchable_set() to generate 8 instructions always. >> Add missing case Op_PrefetchAllocation in verification code in emit_form3_mem_reg(). >> >> Thanks, >> Vladimir > From igor.veresov at oracle.com Tue Aug 30 18:37:25 2011 From: igor.veresov at oracle.com (Igor Veresov) Date: Tue, 30 Aug 2011 18:37:25 -0700 Subject: review(XS): 7085279: C1 overflows code buffer with VerifyOops and CompressedOops In-Reply-To: <431272B6-2C57-49BD-97DD-C8721A54E32C@oracle.com> References: <4E5D8530.5050507@oracle.com> <431272B6-2C57-49BD-97DD-C8721A54E32C@oracle.com> Message-ID: On Tuesday, August 30, 2011 at 6:24 PM, Tom Rodriguez wrote: > > On Aug 30, 2011, at 6:12 PM, Igor Veresov wrote: > > > I just thought that it might need to be adjusted anyway for 64bit. I haven't seen any problems with that (because of the inlining constraints), but intuitively we would hit the limit sooner on 64 bit. > > It's 32k * wordSize so it's already twice as big on 64 bit. 
We might want to revisit these limits for tiered though since profiling generates quite a bit of extra code. > Yes, of course you're right. I was thinking about something else when I replied... I guess we could make the increase predicated upon the verification, but I thought it should be pretty harmless to increase it since those buffers are allocated only once per compiler thread. igor > tom > > > > > igor > > > > On Tuesday, August 30, 2011 at 5:49 PM, Vladimir Kozlov wrote: > > > > > Igor, > > > > > > May be you need to increase size only if VerifyOops is specified. What do you think? > > > > > > Vladimir > > > > > > Igor Veresov wrote: > > > > This happens during emission of LIR_OpAllocObj. C1 assumes that a LIR instruction will fit into 1K but in this case it's not true because the allocation code is pretty large by itself and oop verfication adds an order of magnitude more of additional code. > > > > > > > > I bumped up the size of the code per LIR operation to 2K. And also increased the size of the NMethodSizeLimit to accommodate all the verification code emitted. > > > > > > > > Webrev: http://cr.openjdk.java.net/~iveresov/7085279/webrev.00/ > > > > > > > > Thanks, > > > > igor From tom.rodriguez at oracle.com Tue Aug 30 18:38:07 2011 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Tue, 30 Aug 2011 18:38:07 -0700 Subject: review(XS): 7085279: C1 overflows code buffer with VerifyOops and CompressedOops In-Reply-To: References: <4E5D8530.5050507@oracle.com> <431272B6-2C57-49BD-97DD-C8721A54E32C@oracle.com> Message-ID: On Aug 30, 2011, at 6:37 PM, Igor Veresov wrote: > On Tuesday, August 30, 2011 at 6:24 PM, Tom Rodriguez wrote: >> >> On Aug 30, 2011, at 6:12 PM, Igor Veresov wrote: >> >>> I just thought that it might need to be adjusted anyway for 64bit. I haven't seen any problems with that (because of the inlining constraints), but intuitively we would hit the limit sooner on 64 bit. >> >> It's 32k * wordSize so it's already twice as big on 64 bit. 
We might want to revisit these limits for tiered though since profiling generates quite a bit of extra code. >> > Yes, of course you're right. I was thinking about something else when I replied... > > I guess we could make the increase predicated upon the verification, but I thought it should be pretty harmless to increase it since those buffers are allocated only once per compiler thread. Either way. tom > > igor > >> tom >> >>> >>> igor >>> >>> On Tuesday, August 30, 2011 at 5:49 PM, Vladimir Kozlov wrote: >>> >>>> Igor, >>>> >>>> May be you need to increase size only if VerifyOops is specified. What do you think? >>>> >>>> Vladimir >>>> >>>> Igor Veresov wrote: >>>>> This happens during emission of LIR_OpAllocObj. C1 assumes that a LIR instruction will fit into 1K but in this case it's not true because the allocation code is pretty large by itself and oop verfication adds an order of magnitude more of additional code. >>>>> >>>>> I bumped up the size of the code per LIR operation to 2K. And also increased the size of the NMethodSizeLimit to accommodate all the verification code emitted. >>>>> >>>>> Webrev: http://cr.openjdk.java.net/~iveresov/7085279/webrev.00/ >>>>> >>>>> Thanks, >>>>> igor > > From igor.veresov at oracle.com Tue Aug 30 18:42:45 2011 From: igor.veresov at oracle.com (Igor Veresov) Date: Tue, 30 Aug 2011 18:42:45 -0700 Subject: review(XS): 7085279: C1 overflows code buffer with VerifyOops and CompressedOops In-Reply-To: References: <4E5D8530.5050507@oracle.com> <431272B6-2C57-49BD-97DD-C8721A54E32C@oracle.com> Message-ID: I'll just go with increasing it. Otherwise we'll have to factor in tiered, compressed oops, verification. Thanks Tom and Vladimir! 
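The sizing trade-off discussed in this thread can be modeled with a code buffer that, before emitting each LIR op, only checks that a fixed reserve of bytes is left, since the exact size of an op is not known in advance. This is a hypothetical sketch (`ToyCodeBuffer`, `EmitResult` and the two reserve functions are not C1 code, and the op sizes in the usage are made up): with a 1K reserve an op larger than 1K can overrun the buffer, while a reserve at least as large as the largest op turns that case into a clean bailout. `reserve_conditional()` models Vladimir's alternative of enlarging the reserve only when VerifyOops is on.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t K = 1024;

enum class EmitResult { ok, bailout, overflow };

// Hypothetical model of a code buffer filled one LIR op at a time. Before each
// op the emitter can only check that `reserve` bytes are left, because the
// op's exact size is not known until it has been emitted.
struct ToyCodeBuffer {
  std::size_t capacity;
  std::size_t used = 0;

  explicit ToyCodeBuffer(std::size_t cap) : capacity(cap) {}

  std::size_t remaining() const { return capacity - used; }

  EmitResult emit_op(std::size_t op_bytes, std::size_t reserve) {
    if (remaining() < reserve) {
      return EmitResult::bailout;   // clean stop: guaranteed headroom is gone
    }
    if (op_bytes > remaining()) {
      return EmitResult::overflow;  // op outgrew the reserve: buffer overrun
    }
    used += op_bytes;
    return EmitResult::ok;
  }
};

// The approach taken in the thread: a 2K per-op reserve in every configuration.
static std::size_t reserve_unconditional() { return 2 * K; }

// Alternative raised in review: the larger reserve only with VerifyOops.
static std::size_t reserve_conditional(bool verify_oops) {
  return verify_oops ? 2 * K : 1 * K;
}
```

As Igor notes, the unconditional version avoids having to factor in tiered compilation, compressed oops and verification separately, at the cost of a larger reserve in every configuration.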
igor On Tuesday, August 30, 2011 at 6:38 PM, Tom Rodriguez wrote: > > On Aug 30, 2011, at 6:37 PM, Igor Veresov wrote: > > > On Tuesday, August 30, 2011 at 6:24 PM, Tom Rodriguez wrote: > > > > > > On Aug 30, 2011, at 6:12 PM, Igor Veresov wrote: > > > > > > > I just thought that it might need to be adjusted anyway for 64bit. I haven't seen any problems with that (because of the inlining constraints), but intuitively we would hit the limit sooner on 64 bit. > > > > > > It's 32k * wordSize so it's already twice as big on 64 bit. We might want to revisit these limits for tiered though since profiling generates quite a bit of extra code. > > Yes, of course you're right. I was thinking about something else when I replied... > > > > I guess we could make the increase predicated upon the verification, but I thought it should be pretty harmless to increase it since those buffers are allocated only once per compiler thread. > > Either way. > > tom > > > > > igor > > > > > tom > > > > > > > > > > > igor > > > > > > > > On Tuesday, August 30, 2011 at 5:49 PM, Vladimir Kozlov wrote: > > > > > > > > > Igor, > > > > > > > > > > May be you need to increase size only if VerifyOops is specified. What do you think? > > > > > > > > > > Vladimir > > > > > > > > > > Igor Veresov wrote: > > > > > > This happens during emission of LIR_OpAllocObj. C1 assumes that a LIR instruction will fit into 1K but in this case it's not true because the allocation code is pretty large by itself and oop verfication adds an order of magnitude more of additional code. > > > > > > > > > > > > I bumped up the size of the code per LIR operation to 2K. And also increased the size of the NMethodSizeLimit to accommodate all the verification code emitted. 
> > > > > > > > > > > > Webrev: http://cr.openjdk.java.net/~iveresov/7085279/webrev.00/ > > > > > > > > > > > > Thanks, > > > > > > igor From igor.veresov at oracle.com Tue Aug 30 18:47:36 2011 From: igor.veresov at oracle.com (Igor Veresov) Date: Tue, 30 Aug 2011 18:47:36 -0700 Subject: RFR(S): 7066841: remove MacroAssembler::br_on_reg_cond() on sparc In-Reply-To: <4E5D15B1.9010006@oracle.com> References: <4E5D15B1.9010006@oracle.com> Message-ID: <4EA6DFEB650F440C8DCBD000C1F07B34@oracle.com> Looks good. igor On Tuesday, August 30, 2011 at 9:54 AM, John Cuthbertson wrote: > Hi Everyone, > > Can I have couple of volunteers look over these changes? The webrev can > be found at: http://cr.openjdk.java.net/~johnc/7066841/webrev.0/. > > These changes basically remove the macro assembler routine > br_on_reg_cond and replace the remaining calls to that routine, in the > G1 barriers, with an equivalent. > > Testing: GC test suite and Kitchensink on 32/64 bit sparc with -Xint, > -client -Xcomp, -XX:+TieredCompilation -XX:TieredStopAtLevel=1, and > default. VerifyDuringGC and VerifyBeforeGC were also enabled to detect > missing barriers. > > Thanks, > > JohnC From igor.veresov at oracle.com Tue Aug 30 21:28:26 2011 From: igor.veresov at oracle.com (igor.veresov at oracle.com) Date: Wed, 31 Aug 2011 04:28:26 +0000 Subject: hg: hsx/hotspot-comp/hotspot: 7085279: C1 overflows code buffer with VerifyOops and CompressedOops Message-ID: <20110831042828.9290C47248@hg.openjdk.java.net> Changeset: b346f13112d8 Author: iveresov Date: 2011-08-30 19:01 -0700 URL: http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/b346f13112d8 7085279: C1 overflows code buffer with VerifyOops and CompressedOops Summary: Increase the limit of code emitted per LIR instruction, increase the max size of the nmethod generated by C1 Reviewed-by: never, kvn, johnc ! src/share/vm/c1/c1_LIRAssembler.cpp ! 
src/share/vm/c1/c1_globals.hpp From christian.thalinger at oracle.com Wed Aug 31 03:36:52 2011 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Wed, 31 Aug 2011 12:36:52 +0200 Subject: Request for reviews (M): 7079673: JSR 292: C1 should inline bytecoded method handle adapters In-Reply-To: <1C3853BB-211B-4082-950A-B837A4582775@oracle.com> References: <454A8FE2-FC6F-450C-9473-942E2CC4CA5D@oracle.com> <1C3853BB-211B-4082-950A-B837A4582775@oracle.com> Message-ID: On Aug 30, 2011, at 9:08 PM, Tom Rodriguez wrote: > c1_GraphBuilder.cpp: > > + } else if (receiver->as_CheckCast()) { > > I think this should be more robust. The as_Phi and operand_count checks should be part of this guard instead of being asserts. I changed that and updated the webrev. > > I assume this will be updated to do the optimization for VCS as well? Otherwise it looks good. For the VCS optimization, I decided to split that off into its own CR since there were a couple of overlaps between C1 and C2. It's covered by: 7085404: JSR 292: VolatileCallSites should have push notification too http://cr.openjdk.java.net/~twisti/7085404/ To get this right, the order of pushing these related CRs will be: 1. 7079673: JSR 292: C1 should inline bytecoded method handle adapters 2. 7085404: JSR 292: VolatileCallSites should have push notification too 3. 7071709: JSR 292: switchpoint invalidation should be pushed not pulled -- Christian > > tom > > On Aug 30, 2011, at 9:21 AM, Christian Thalinger wrote: > >> http://cr.openjdk.java.net/~twisti/7079673/ >> >> 7079673: JSR 292: C1 should inline bytecoded method handle adapters >> Reviewed-by: >> >> The current JSR 292 support in C1 always does an invoke for method >> handle invokes which results in a lot of C2I-I2C transfers. This >> results in very poor performance.
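Tom's review point, moving the shape checks into the guard instead of asserting them, can be sketched as follows. The IR here is a hypothetical simplification, not C1's `Instruction` hierarchy, and the specific shape tested (a CheckCast whose single input is a one-operand Phi) is an assumption for illustration; the thread does not show the exact condition in c1_GraphBuilder.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified IR node (not C1's Instruction hierarchy).
struct Node {
  enum Kind { kCheckCast, kPhi, kOther } kind;
  std::vector<const Node*> inputs;  // Phi: its operands; CheckCast: {object}
};

static const Node* as_CheckCast(const Node* n) {
  return n->kind == Node::kCheckCast ? n : nullptr;
}

static const Node* as_Phi(const Node* n) {
  return n->kind == Node::kPhi ? n : nullptr;
}

// Robust form: every shape requirement is part of the guard. A receiver that
// does not match simply returns false and takes the generic path, instead of
// asserting that the shape is always as expected.
static bool can_take_fast_path(const Node* receiver) {
  const Node* cast = as_CheckCast(receiver);
  if (cast == nullptr || cast->inputs.size() != 1) {
    return false;
  }
  const Node* phi = as_Phi(cast->inputs[0]);
  // Assumed shape for illustration: a Phi with exactly one operand.
  return phi != nullptr && phi->inputs.size() == 1;
}
```

With asserts, an unexpected shape either stops a debug build or slips through where asserts are compiled out; with the checks folded into the guard, it just falls back to the generic, non-optimized path.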
>> >> src/share/vm/c1/c1_GraphBuilder.cpp >> src/share/vm/c1/c1_GraphBuilder.hpp >> src/share/vm/c1/c1_Instruction.cpp >> src/share/vm/c1/c1_Instruction.hpp >> src/share/vm/classfile/javaClasses.cpp >> src/share/vm/classfile/vmSymbols.hpp >> > From christian.thalinger at oracle.com Wed Aug 31 03:42:58 2011 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Wed, 31 Aug 2011 12:42:58 +0200 Subject: review for 7016881: JSR 292: JDI: sun.jvm.hotspot.utilities.AssertionFailure: index out of bounds In-Reply-To: <4DE24E60-5CE6-4417-A7D9-B58C5563C8D3@oracle.com> References: <4DE24E60-5CE6-4417-A7D9-B58C5563C8D3@oracle.com> Message-ID: Looks good. -- Christian On Aug 31, 2011, at 1:12 AM, Tom Rodriguez wrote: > http://cr.openjdk.java.net/~never/7016881 > 1 line changed: 0 ins; 0 del; 1 mod; 233 unchg > > 7016881: JSR 292: JDI: sun.jvm.hotspot.utilities.AssertionFailure: index out of bounds > Reviewed-by: > > This was a bug in the 7012081 changes. A reference to rawIndex wasn't > updated to poolIndex so some times the wrong index was used resulting in > exceptions. Tested with failing test. > From christian.thalinger at oracle.com Wed Aug 31 03:42:07 2011 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Wed, 31 Aug 2011 12:42:07 +0200 Subject: Request for reviews (S): 7071709: JSR 292: switchpoint invalidation should be pushed not pulled In-Reply-To: References: <5F4038AD-6959-480E-9DB8-1DEF17D6C4A6@oracle.com> <52852391-3B23-4326-B75C-D2CB502C52AF@oracle.com> <9EC6D299-AE3B-44C7-AC71-5526AB810557@oracle.com> <0EEADB80-E8F9-49B7-BF9C-9FD4A50BD73D@oracle.com> <2F4D4364-320E-4CCD-A6CB-28E1535FBACF@oracle.com> Message-ID: On Aug 29, 2011, at 9:41 PM, John Rose wrote: > Yes, deal with volatile fields later. > > I do think that VCS should get push notif now. 
They will: 7085404: JSR 292: VolatileCallSites should have push notification too http://cr.openjdk.java.net/~twisti/7085404/ This patch now only contains the SwitchPoint optimization and will be pushed as the last of my fixes (as stated in an earlier email): http://cr.openjdk.java.net/~twisti/7071709/ Tom, John, can you review this again? -- Christian > > -- John (on my iPhone) > > On Aug 29, 2011, at 11:56 AM, Christian Thalinger wrote: > >> Yes, I think that would be the best approach. John, what do you think, optimize CCS and MCS for now and deal with VCS later? From christian.thalinger at oracle.com Wed Aug 31 08:18:21 2011 From: christian.thalinger at oracle.com (christian.thalinger at oracle.com) Date: Wed, 31 Aug 2011 15:18:21 +0000 Subject: hg: hsx/hotspot-comp/hotspot: 7078382: JSR 292: don't count method handle adapters against inlining budgets Message-ID: <20110831151823.8BB7B47261@hg.openjdk.java.net> Changeset: de847cac9235 Author: twisti Date: 2011-08-31 01:40 -0700 URL: http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/de847cac9235 7078382: JSR 292: don't count method handle adapters against inlining budgets Reviewed-by: kvn, never ! src/share/vm/c1/c1_GraphBuilder.cpp ! src/share/vm/ci/ciMethod.cpp ! src/share/vm/ci/ciMethod.hpp ! src/share/vm/ci/ciStreams.hpp ! src/share/vm/interpreter/bytecodes.hpp ! src/share/vm/opto/bytecodeInfo.cpp From tom.rodriguez at oracle.com Wed Aug 31 10:24:30 2011 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Wed, 31 Aug 2011 10:24:30 -0700 Subject: Request for reviews (M): 7079673: JSR 292: C1 should inline bytecoded method handle adapters In-Reply-To: References: <454A8FE2-FC6F-450C-9473-942E2CC4CA5D@oracle.com> <1C3853BB-211B-4082-950A-B837A4582775@oracle.com> Message-ID: On Aug 31, 2011, at 3:36 AM, Christian Thalinger wrote: > > On Aug 30, 2011, at 9:08 PM, Tom Rodriguez wrote: > >> c1_GraphBuilder.cpp: >> >> + } else if (receiver->as_CheckCast()) { >> >> I think this should be more robust. 
The as_Phi and operand_count checks should be part of this guard instead of being asserts. > > I changed that and updated the webrev. > >> >> I assume this will be updated to do the optimization for VCS as well? Otherwise it looks good. > > For the VCS optimization, I decided to split that off into its own CR since there where a couple of overlaps between C1 and C2. It's covered by: > > 7085404: JSR 292: VolatileCallSites should have push notification too > > http://cr.openjdk.java.net/~twisti/7085404/ > > To get this right the order of pushing these related CRs will be: > > 1. 7079673: JSR 292: C1 should inline bytecoded method handle adapters > 2. 7085404: JSR 292: VolatileCallSites should have push notification too > 3. 7071709: JSR 292: switchpoint invalidation should be pushed not pulled These all look good. tom > > -- Christian > >> >> tom >> >> On Aug 30, 2011, at 9:21 AM, Christian Thalinger wrote: >> >>> http://cr.openjdk.java.net/~twisti/7079673/ >>> >>> 7079673: JSR 292: C1 should inline bytecoded method handle adapters >>> Reviewed-by: >>> >>> The current JSR 292 support in C1 always does an invoke for method >>> handle invokes which results in a lot of C2I-I2C transfers. This >>> results in very poor performance. 
>>> >>> src/share/vm/c1/c1_GraphBuilder.cpp >>> src/share/vm/c1/c1_GraphBuilder.hpp >>> src/share/vm/c1/c1_Instruction.cpp >>> src/share/vm/c1/c1_Instruction.hpp >>> src/share/vm/classfile/javaClasses.cpp >>> src/share/vm/classfile/vmSymbols.hpp >>> >> > From christian.thalinger at oracle.com Wed Aug 31 11:45:14 2011 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Wed, 31 Aug 2011 20:45:14 +0200 Subject: Request for reviews (M): 7079673: JSR 292: C1 should inline bytecoded method handle adapters In-Reply-To: References: <454A8FE2-FC6F-450C-9473-942E2CC4CA5D@oracle.com> <1C3853BB-211B-4082-950A-B837A4582775@oracle.com> Message-ID: On Aug 31, 2011, at 7:24 PM, Tom Rodriguez wrote: > > On Aug 31, 2011, at 3:36 AM, Christian Thalinger wrote: > >> >> On Aug 30, 2011, at 9:08 PM, Tom Rodriguez wrote: >> >>> c1_GraphBuilder.cpp: >>> >>> + } else if (receiver->as_CheckCast()) { >>> >>> I think this should be more robust. The as_Phi and operand_count checks should be part of this guard instead of being asserts. >> >> I changed that and updated the webrev. >> >>> >>> I assume this will be updated to do the optimization for VCS as well? Otherwise it looks good. >> >> For the VCS optimization, I decided to split that off into its own CR since there were a couple of overlaps between C1 and C2. It's covered by: >> >> 7085404: JSR 292: VolatileCallSites should have push notification too >> >> http://cr.openjdk.java.net/~twisti/7085404/ >> >> To get this right, the order of pushing these related CRs will be: >> >> 1. 7079673: JSR 292: C1 should inline bytecoded method handle adapters >> 2. 7085404: JSR 292: VolatileCallSites should have push notification too >> 3. 7071709: JSR 292: switchpoint invalidation should be pushed not pulled > > These all look good. Thanks, Tom.
-- Christian > > tom > >> >> -- Christian >> >>> >>> tom >>> >>> On Aug 30, 2011, at 9:21 AM, Christian Thalinger wrote: >>> >>>> http://cr.openjdk.java.net/~twisti/7079673/ >>>> >>>> 7079673: JSR 292: C1 should inline bytecoded method handle adapters >>>> Reviewed-by: >>>> >>>> The current JSR 292 support in C1 always does an invoke for method >>>> handle invokes which results in a lot of C2I-I2C transfers. This >>>> results in very poor performance. >>>> >>>> src/share/vm/c1/c1_GraphBuilder.cpp >>>> src/share/vm/c1/c1_GraphBuilder.hpp >>>> src/share/vm/c1/c1_Instruction.cpp >>>> src/share/vm/c1/c1_Instruction.hpp >>>> src/share/vm/classfile/javaClasses.cpp >>>> src/share/vm/classfile/vmSymbols.hpp >>>> >>> >> > From vladimir.kozlov at oracle.com Wed Aug 31 12:08:23 2011 From: vladimir.kozlov at oracle.com (vladimir.kozlov at oracle.com) Date: Wed, 31 Aug 2011 19:08:23 +0000 Subject: hg: hsx/hotspot-comp/hotspot: 7085137: -XX:+VerifyOops is broken Message-ID: <20110831190826.731244726B@hg.openjdk.java.net> Changeset: a64d352d1118 Author: kvn Date: 2011-08-31 09:48 -0700 URL: http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/a64d352d1118 7085137: -XX:+VerifyOops is broken Summary: Replace set() with patchable_set() to generate 8 instructions always. Reviewed-by: iveresov, never, roland ! src/cpu/sparc/vm/assembler_sparc.cpp ! src/cpu/sparc/vm/sparc.ad From tom.rodriguez at oracle.com Wed Aug 31 12:56:38 2011 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Wed, 31 Aug 2011 12:56:38 -0700 Subject: review for 7051798: SA-JDI: NPE in Frame.addressOfStackSlot(Frame.java:244) Message-ID: <6D60141A-85A4-4762-946E-A4509CDA2CAA@oracle.com> http://cr.openjdk.java.net/~never/7051798 1346 lines changed: 585 ins; 637 del; 124 mod; 26143 unchg 7051798: SA-JDI: NPE in Frame.addressOfStackSlot(Frame.java:244) Reviewed-by: The SA was never updated to handle ricochet frames so stack walking was broken when they were encountered. 
The X86 stack walking code hadn't been updated in a while, so I synced it to the current version of frame_x86.cpp and eliminated the AMD64 variants of many of these classes since they should be exactly the same. All SA-related exceptions in the mlvm test have been fixed. I had to convert the PcDesc flags into masks since the SA can't deal with bitfields. Because of some JDI features being used by the test, I had to fix other unreported SAJDI issues when asking for locals for optimized and native frames. I also hit an unreported assertion failure in C1 with large frames. Tested with the failing mlvm sajdi tests from the report plus the regular tmtools and sajdi tests to stress the stack walking. From john.r.rose at oracle.com Wed Aug 31 14:34:26 2011 From: john.r.rose at oracle.com (John Rose) Date: Wed, 31 Aug 2011 14:34:26 -0700 Subject: Request for reviews (S): 7071709: JSR 292: switchpoint invalidation should be pushed not pulled In-Reply-To: References: <5F4038AD-6959-480E-9DB8-1DEF17D6C4A6@oracle.com> <52852391-3B23-4326-B75C-D2CB502C52AF@oracle.com> <9EC6D299-AE3B-44C7-AC71-5526AB810557@oracle.com> <0EEADB80-E8F9-49B7-BF9C-9FD4A50BD73D@oracle.com> <2F4D4364-320E-4CCD-A6CB-28E1535FBACF@oracle.com> Message-ID: On Aug 31, 2011, at 3:42 AM, Christian Thalinger wrote: > This patch now only contains the SwitchPoint optimization and will be pushed as the last of my fixes (as stated in an earlier email): > > http://cr.openjdk.java.net/~twisti/7071709/ > > Tom, John, can you review this again? It is good, but I have a question. What happens when this line produces a null value for the target (because of -Xcomp etc.): ciMethodHandle* target = call_site->get_target(); Shouldn't there be a guard for that edge case, in case Murphy's Law kicks in? -- John
From tom.rodriguez at oracle.com Wed Aug 31 15:32:09 2011 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Wed, 31 Aug 2011 15:32:09 -0700 Subject: review for 7083786: dead various dead chunks of code Message-ID: http://cr.openjdk.java.net/~never/7083786 180 lines changed: 0 ins; 178 del; 2 mod; 32710 unchg 7083786: dead various dead chunks of code Reviewed-by: Delete some dead code. Tested with JPRT. From igor.veresov at oracle.com Wed Aug 31 16:24:55 2011 From: igor.veresov at oracle.com (Igor Veresov) Date: Wed, 31 Aug 2011 16:24:55 -0700 Subject: review for 7083786: dead various dead chunks of code In-Reply-To: References: Message-ID: <7CF3CFF69B5042E0B27A86A12214C804@oracle.com> Looks good. igor On Wednesday, August 31, 2011 at 3:32 PM, Tom Rodriguez wrote: > http://cr.openjdk.java.net/~never/7083786 > 180 lines changed: 0 ins; 178 del; 2 mod; 32710 unchg > > 7083786: dead various dead chunks of code > Reviewed-by: > > Delete some dead code. Tested with JPRT. From vladimir.kozlov at oracle.com Wed Aug 31 16:33:37 2011 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 31 Aug 2011 16:33:37 -0700 Subject: review for 7083786: dead various dead chunks of code In-Reply-To: References: Message-ID: <4E5EC4D1.4010502@oracle.com> Looks good. How did you find all these cases (except #if 0)? Thanks, Vladimir Tom Rodriguez wrote: > http://cr.openjdk.java.net/~never/7083786 > 180 lines changed: 0 ins; 178 del; 2 mod; 32710 unchg > > 7083786: dead various dead chunks of code > Reviewed-by: > > Delete some dead code. Tested with JPRT.
> From tom.rodriguez at oracle.com Wed Aug 31 16:44:07 2011 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Wed, 31 Aug 2011 16:44:07 -0700 Subject: review for 7083786: dead various dead chunks of code In-Reply-To: <4E5EC4D1.4010502@oracle.com> References: <4E5EC4D1.4010502@oracle.com> Message-ID: <0B5DF037-D4A4-4D84-BC8B-AF8A7D34346F@oracle.com> I noticed them when doing various other changes and ended up collecting them. Volker reported one of them. Thanks! tom On Aug 31, 2011, at 4:33 PM, Vladimir Kozlov wrote: > Looks good. How did you find all these cases (except #if 0)? > > Thanks, > Vladimir > > Tom Rodriguez wrote: >> http://cr.openjdk.java.net/~never/7083786 >> 180 lines changed: 0 ins; 178 del; 2 mod; 32710 unchg >> 7083786: dead various dead chunks of code >> Reviewed-by: >> Delete some dead code. Tested with JPRT. From vladimir.kozlov at oracle.com Wed Aug 31 17:09:29 2011 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 31 Aug 2011 17:09:29 -0700 Subject: review for 7051798: SA-JDI: NPE in Frame.addressOfStackSlot(Frame.java:244) In-Reply-To: <6D60141A-85A4-4762-946E-A4509CDA2CAA@oracle.com> References: <6D60141A-85A4-4762-946E-A4509CDA2CAA@oracle.com> Message-ID: <4E5ECD39.9060106@oracle.com> I think it looks good. Thanks, Vladimir Tom Rodriguez wrote: > http://cr.openjdk.java.net/~never/7051798 > 1346 lines changed: 585 ins; 637 del; 124 mod; 26143 unchg > > 7051798: SA-JDI: NPE in Frame.addressOfStackSlot(Frame.java:244) > Reviewed-by: > > The SA was never updated to handle ricochet frames so stack walking > was broken when they were encountered. The X86 stack walking code > hadn't been updated in a while, so I synced it to the current version of > frame_x86.cpp and eliminated the AMD64 variants of many of these > classes since they should be exactly the same. All SA-related > exceptions in the mlvm test have been fixed. I had to convert the > PcDesc flags into masks since the SA can't deal with bitfields.
> > Because of some JDI features being used by the test, I had to fix other > unreported SAJDI issues when asking for locals for optimized and > native frames. I also hit an unreported assertion failure in C1 with > large frames. > > Tested with the failing mlvm sajdi tests from the report plus the regular > tmtools and sajdi tests to stress the stack walking. >
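Tom's note that the PcDesc flags had to become masks "since the SA can't deal with bitfields" comes down to this: in C++ the placement of bit-fields within their storage unit is implementation-defined, so an out-of-process reader has no portable way to find an individual bit, whereas a plain integer field plus published mask constants can be decoded anywhere. The Java sketch below (the SA itself is written in Java) shows that decoding pattern. The class name PcDescFlags, the mask names, and the bit positions are hypothetical illustrations, not the actual HotSpot or SA definitions, and the raw flags word is assumed to have already been read as an int.

```java
// Hypothetical sketch: decode a packed flags word with explicit masks
// instead of relying on C++ bit-field layout (which is implementation-defined).
// Names and bit positions are illustrative assumptions, not HotSpot's.
public final class PcDescFlags {
    public static final int REEXECUTE_MASK               = 1 << 0;
    public static final int IS_METHOD_HANDLE_INVOKE_MASK = 1 << 1;
    public static final int RETURN_OOP_MASK              = 1 << 2;

    // Raw flags word, assumed to be read from the target VM as a plain int.
    private final int flags;

    public PcDescFlags(int flags) {
        this.flags = flags;
    }

    public boolean shouldReexecute()      { return (flags & REEXECUTE_MASK) != 0; }
    public boolean isMethodHandleInvoke() { return (flags & IS_METHOD_HANDLE_INVOKE_MASK) != 0; }
    public boolean returnOop()            { return (flags & RETURN_OOP_MASK) != 0; }

    // Writer side: set or clear one flag without disturbing the others.
    public static int set(int flags, int mask, boolean value) {
        return value ? (flags | mask) : (flags & ~mask);
    }

    public static void main(String[] args) {
        int raw = 0;
        raw = set(raw, REEXECUTE_MASK, true);
        raw = set(raw, RETURN_OOP_MASK, true);
        PcDescFlags d = new PcDescFlags(raw);
        // Prints "5 true false true"
        System.out.println(raw + " " + d.shouldReexecute() + " "
                + d.isMethodHandleInvoke() + " " + d.returnOop());
    }
}
```

Because the writer and the reader only have to agree on the integer field and the mask values, the same constants can be exported to a remote tool without that tool knowing anything about how a particular C++ compiler packs bit-fields.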