From duncan.macgregor at ge.com Mon Jan 5 15:50:00 2015 From: duncan.macgregor at ge.com (MacGregor, Duncan (GE Energy Management)) Date: Mon, 5 Jan 2015 15:50:00 +0000 Subject: That was the year that was. Message-ID: <978D194CF4618446926EDAD23C949D9939BC32F5@LONURLNA08.e2k.ad.ge.com>

Since it's now the new year I thought it was a good opportunity to look back on the progress we've made in Magik on Java over the course of the last twelve months. In my JVMLS talk I mentioned LF memory usage and startup time as areas of concern, as did Marcus and others. Over the last couple of months I and a couple of other team members have been given the time to seriously look at our startup time and performance and, along with the changes made in 8u40, have made substantial progress.

Startup time

Getting our system to boot on Linux, using Solaris Studio and other profiling tools, and producing piles and piles of flame graphs has proved very useful in analysing startup time. It has shown up some areas of our own legacy infrastructure that were contributing substantially to startup, but reducing the total number of classes generated has also greatly reduced our startup time. Due to the nature of the language we do need to evaluate as we compile, so we have introduced a two-stage compilation process where we compile and evaluate files in small chunks but do not write out those class files, instead generating one large class file representing the whole source file at the end. On typical application code this has reduced the class count by 75% and substantially reduced the class loading time (also greatly reducing the time spent resolving method handle constants - partly why I haven't had version 2 of that patch higher on my priority queue - sorry John).

linkCallSite and friends (especially setTarget) still show up significantly on flame graphs (almost 17% of samples). The time to create a mutable call site appears to be almost completely dominated by the MethodHandleNatives.setCallSiteTargetNormal call commonly done in the constructor of the call site itself. Some quick and dirty instrumentation shows that we create about 50% more constant call sites for symbols than we do mutable call sites for method calls, but the constant sites show up in about 1/60th of the traces compared to the mutable sites. Another 12% of startup is taken up with resetting call site targets after the fallback has been invoked.

I'm not sure how much more time we'll get to work on this area, or whether startup time (or at least this portion of it) will be regarded as "good enough", but there seem to be a couple of avenues we could explore to improve things:

1. We could look at refactoring our code so that setTarget does not need to be used when initialising our mutable call sites. Since most sites need a fallback method bound to themselves in some way, this would require refactoring our code to create objects that hold a MutableCallSite rather than subclass MutableCallSite (a rough sketch of this shape follows below). This might help to further our plans for decomposing call sites into their functional parts, but it is something I'm not going to explore without doing some thorough benchmarking first.

2. It's also worth digging into when it is worth resetting a call site's target. Mutable sites hit during bootstrap frequently get used only once, or at most a small number of times, so we might do better gathering some type information and only setting the target when it seems worth the cost.
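To make option 1 above a little more concrete, here is a minimal, hypothetical Java sketch of the "hold a MutableCallSite instead of subclassing it" shape. The class and method names (DispatchSite, fallback, resolve) are illustrative only and are not the actual Magik runtime code. Because the fallback handle is bound to the wrapper object before the call site is constructed, it can be passed straight to the MutableCallSite constructor, so no setTarget call is needed during initialisation:

import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

final class DispatchSite {
    private final MutableCallSite site;

    DispatchSite(MethodType type) throws ReflectiveOperationException {
        // Bind the slow path to this wrapper object, then shape it to the site's type.
        MethodHandle fallback = MethodHandles.lookup()
                .bind(this, "fallback", MethodType.methodType(Object.class, Object[].class))
                .asCollector(Object[].class, type.parameterCount())
                .asType(type);
        // The initial target is supplied to the constructor, so no setTarget()
        // happens while the site is being created.
        site = new MutableCallSite(fallback);
    }

    // Slow path: link the call, install a faster target, and complete this invocation.
    Object fallback(Object... args) throws Throwable {
        MethodHandle resolved = resolve(args);            // runtime-specific linking (placeholder)
        site.setTarget(resolved.asType(site.type()));     // setTarget now runs only on the slow path
        return resolved.invokeWithArguments(args);
    }

    MethodHandle resolve(Object[] args) {
        throw new UnsupportedOperationException("method lookup goes here");
    }

    MutableCallSite callSite() {
        return site;
    }
}

Whether this actually wins anything would, as noted above, need careful benchmarking; the sketch only shows that the fallback-bound-to-the-site pattern does not force a subclass of MutableCallSite.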
We've considered a couple of more radical approaches to reducing startup time, mostly around either implementing an interpreter to handle the bootstrap code (because it's always fun to maintain an interpreter and a compiler) or some form of serialisation (tricky to get right and to fit in with the modularisation work), but I'm more than open to any other wacky ideas people want to throw in.

Memory

The LambdaForm changes have had an excellent effect on application memory usage. There's still plenty of room to reduce it further, but that is now probably more a matter of us optimising our core and application code than of fundamental JVM issues.

Anyway, happy new year to everyone on the mlvm list, Duncan.

From forax at univ-mlv.fr Tue Jan 6 07:51:35 2015 From: forax at univ-mlv.fr (Remi Forax) Date: Tue, 06 Jan 2015 08:51:35 +0100 Subject: Invokedynamic and recursive method call In-Reply-To: <54A3019C.1070909@univ-mlv.fr> References: <54A3019C.1070909@univ-mlv.fr> Message-ID: <54AB9407.7020805@univ-mlv.fr> ping ? Rémi On 12/30/2014 08:48 PM, Remi Forax wrote: > Hi guys, > I've found a bug in the interaction between the lambda form and > inlining algorithm, > basically if the inlining heuristic bailout because the method is > recursive and already inlined once, > instead to emit a code to do a direct call, it revert to do call to > linkStatic with the method > as MemberName. > > I think it's a regression because before the introduction of lambda > forms, > I'm pretty sure that the JIT was emitting a direct call. > > Step to reproduce with nashorn, run this JavaScript code > function fibo(n) { > return (n < 2)? 1: fibo(n - 1) + fibo(n - 2) > } > > print(fibo(45)) > > like this: > /usr/jdk/jdk1.9.0/bin/jjs -J-XX:+UnlockDiagnosticVMOptions > -J-XX:+PrintAssembly fibo.js > log.txt > > look for a method 'fibo' from the tail of the log, you will find > something like this: > > 0x00007f97e4b4743f: mov $0x76d08f770,%r8 ; {oop(a > 'java/lang/invoke/MemberName' = {method} {0x00007f97dcff8e40} 'fibo' > '(Ljdk/nashorn/internal/runtime/ScriptFunction;Ljava/lang/Object;I)I' > in 'jdk/nashorn/internal/scripts/Script$Recompilation$2$fibo')} > 0x00007f97e4b47449: xchg %ax,%ax > 0x00007f97e4b4744b: callq 0x00007f97dd0446e0 > > I hope this can be fixed. My demonstration that I can have fibo > written with a dynamic language > that run as fast as written in Java doesn't work anymore :( > > cheers, > Rémi > > _______________________________________________ > mlvm-dev mailing list > mlvm-dev at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev From marcus.lagergren at oracle.com Wed Jan 7 09:43:26 2015 From: marcus.lagergren at oracle.com (Marcus Lagergren) Date: Wed, 7 Jan 2015 10:43:26 +0100 Subject: Invokedynamic and recursive method call In-Reply-To: <54A3019C.1070909@univ-mlv.fr> References: <54A3019C.1070909@univ-mlv.fr> Message-ID: <9E513159-F926-4845-A11E-6585F8CFD788@oracle.com> Remi, I tried to reproduce your problem with jdk9 b44. It runs decently fast. When did it start to regress? Regards Marcus > On 30 Dec 2014, at 20:48, Remi Forax wrote: > > Hi guys, > I've found a bug in the interaction between the lambda form and inlining algorithm, > basically if the inlining heuristic bailout because the method is recursive and already inlined once, > instead to emit a code to do a direct call, it revert to do call to linkStatic with the method > as MemberName. > > I think it's a regression because before the introduction of lambda forms, > I'm pretty sure that the JIT was emitting a direct call.
> > Step to reproduce with nashorn, run this JavaScript code > function fibo(n) { > return (n < 2)? 1: fibo(n - 1) + fibo(n - 2) > } > > print(fibo(45)) > > like this: > /usr/jdk/jdk1.9.0/bin/jjs -J-XX:+UnlockDiagnosticVMOptions -J-XX:+PrintAssembly fibo.js > log.txt > > look for a method 'fibo' from the tail of the log, you will find something like this: > > 0x00007f97e4b4743f: mov $0x76d08f770,%r8 ; {oop(a 'java/lang/invoke/MemberName' = {method} {0x00007f97dcff8e40} 'fibo' '(Ljdk/nashorn/internal/runtime/ScriptFunction;Ljava/lang/Object;I)I' in 'jdk/nashorn/internal/scripts/Script$Recompilation$2$fibo')} > 0x00007f97e4b47449: xchg %ax,%ax > 0x00007f97e4b4744b: callq 0x00007f97dd0446e0 > > I hope this can be fixed. My demonstration that I can have fibo written with a dynamic language > that run as fast as written in Java doesn't work anymore :( > > cheers, > R?mi > > _______________________________________________ > mlvm-dev mailing list > mlvm-dev at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev From paul.sandoz at oracle.com Wed Jan 7 11:16:20 2015 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Wed, 7 Jan 2015 12:16:20 +0100 Subject: [9] RFR (M): 8067344: Adjust java/lang/invoke/LFCaching/LFGarbageCollectedTest.java for recent changes in java.lang.invoke In-Reply-To: <549962C4.2040301@oracle.com> References: <549884E7.8040204@oracle.com> <549962C4.2040301@oracle.com> Message-ID: Hi 70 TestMethods testCase = getTestMethod(); 71 if (testCase == TestMethods.EXACT_INVOKER || testCase == TestMethods.INVOKER) { 72 // Invokers aren't collected. 73 return; 74 } Can you just filter those test cases out in the main method within EnumSet.complementOf? On Dec 23, 2014, at 1:40 PM, Vladimir Ivanov wrote: > Spotted some more problems: > - need to skip identity operations (identity_* LambdaForms) in the test, since corresponding LambdaForms reside in a permanent cache; > 82 mtype = adapter.type(); 83 if (mtype.parameterCount() == 0) { 84 // Ignore identity_* LambdaForms. 85 return; 86 } Under what conditions does this arise? i guess it might be non-determinisitic based on the randomly generated arity for the test case, so could filter more tests than absolutely required? > - need to keep original test data for diagnostic purposes, since getTestCaseData() produces new instance. > 78 adapter = getTestMethod().getTestCaseMH(data, TestMethods.Kind.ONE); Could replace "getTestMethod()" with "testCase". Paul. > Updated version: > http://cr.openjdk.java.net/~vlivanov/8067344/webrev.01/ > > Best regards, > Vladimir Ivanov > > On 12/22/14 11:53 PM, Vladimir Ivanov wrote: >> http://cr.openjdk.java.net/~vlivanov/8067344/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8067344 >> >> LFGarbageCollectedTest should be adjusted after JDK-8057020. >> >> There are a couple of problems with the test. >> >> (1) Existing logic to test that LambdaForm instance is collected isn't >> stable enough. Consequent System.GCs can hinder reference enqueueing. >> To speed up the test, I added -XX:SoftRefLRUPolicyMSPerMB=0 and limited >> the heap by -Xmx64m. >> >> (2) MethodType-based invoker caches are deliberately left strongly >> reachable. So, they should be skipped in the test. >> >> (3) Added additional diagnostic output to simplify failure analysis >> (test case details, method handle type and LambdaForm, heap dump >> (optional, -DHEAP_DUMP=true)). >> >> Testing: failing test. >> >> Thanks! 
>> >> Best regards, >> Vladimir Ivanov > _______________________________________________ > mlvm-dev mailing list > mlvm-dev at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev From forax at univ-mlv.fr Wed Jan 7 16:13:48 2015 From: forax at univ-mlv.fr (Remi Forax) Date: Wed, 07 Jan 2015 17:13:48 +0100 Subject: Invokedynamic and recursive method call In-Reply-To: <9E513159-F926-4845-A11E-6585F8CFD788@oracle.com> References: <54A3019C.1070909@univ-mlv.fr> <9E513159-F926-4845-A11E-6585F8CFD788@oracle.com> Message-ID: <54AD5B3C.80004@univ-mlv.fr> On 01/07/2015 10:43 AM, Marcus Lagergren wrote: > Remi, I tried to reproduce your problem with jdk9 b44. It runs decently fast. Yes, nashorn is fast enough, but it could be faster if the JIT were not doing something stupid. When the VM inlines fibo, because fibo is recursive the recursive call is inlined only once, so the call at depth=2 cannot be inlined but should still be a classical direct call. But if fibo is called through an invokedynamic, instead of emitting a direct call to fibo, the JIT generates code that pushes the method handle on the stack and executes it as if the method handle were not constant (the method handle is constant because the call at depth=1 is inlined!). > When did it start to regress? jdk7u40, I believe. I've created a jar containing some handwritten bytecodes with no dependencies to reproduce the issue easily: https://github.com/forax/vmboiler/blob/master/test7/fibo7.jar [forax at localhost test7]$ time /usr/jdk/jdk1.9.0/bin/java -cp fibo7.jar FiboSample 1836311903 real 0m6.653s user 0m6.729s sys 0m0.019s [forax at localhost test7]$ time /usr/jdk/jdk1.8.0_25/bin/java -cp fibo7.jar FiboSample 1836311903 real 0m6.572s user 0m6.591s sys 0m0.019s [forax at localhost test7]$ time /usr/jdk/jdk1.7.0_71/bin/java -cp fibo7.jar FiboSample 1836311903 real 0m6.373s user 0m6.396s sys 0m0.016s [forax at localhost test7]$ time /usr/jdk/jdk1.7.0_25/bin/java -cp fibo7.jar FiboSample 1836311903 real 0m4.847s user 0m4.832s sys 0m0.019s As you can see, it was faster with a JDK before jdk7u40. > > Regards > Marcus cheers, Rémi > >> On 30 Dec 2014, at 20:48, Remi Forax wrote: >> >> Hi guys, >> I've found a bug in the interaction between the lambda form and inlining algorithm, >> basically if the inlining heuristic bailout because the method is recursive and already inlined once, >> instead to emit a code to do a direct call, it revert to do call to linkStatic with the method >> as MemberName. >> >> I think it's a regression because before the introduction of lambda forms, >> I'm pretty sure that the JIT was emitting a direct call. >> >> Step to reproduce with nashorn, run this JavaScript code >> function fibo(n) { >> return (n < 2)?
1: fibo(n - 1) + fibo(n - 2) >> } >> >> print(fibo(45)) >> >> like this: >> /usr/jdk/jdk1.9.0/bin/jjs -J-XX:+UnlockDiagnosticVMOptions -J-XX:+PrintAssembly fibo.js > log.txt >> >> look for a method 'fibo' from the tail of the log, you will find something like this: >> >> 0x00007f97e4b4743f: mov $0x76d08f770,%r8 ; {oop(a 'java/lang/invoke/MemberName' = {method} {0x00007f97dcff8e40} 'fibo' '(Ljdk/nashorn/internal/runtime/ScriptFunction;Ljava/lang/Object;I)I' in 'jdk/nashorn/internal/scripts/Script$Recompilation$2$fibo')} >> 0x00007f97e4b47449: xchg %ax,%ax >> 0x00007f97e4b4744b: callq 0x00007f97dd0446e0 >> >> I hope this can be fixed. My demonstration that I can have fibo written with a dynamic language >> that run as fast as written in Java doesn't work anymore :( >> >> cheers, >> R?mi >> >> _______________________________________________ >> mlvm-dev mailing list >> mlvm-dev at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev > _______________________________________________ > mlvm-dev mailing list > mlvm-dev at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev From marcus.lagergren at oracle.com Wed Jan 7 16:27:35 2015 From: marcus.lagergren at oracle.com (Marcus Lagergren) Date: Wed, 7 Jan 2015 17:27:35 +0100 Subject: Invokedynamic and recursive method call In-Reply-To: <54AD5B3C.80004@univ-mlv.fr> References: <54A3019C.1070909@univ-mlv.fr> <9E513159-F926-4845-A11E-6585F8CFD788@oracle.com> <54AD5B3C.80004@univ-mlv.fr> Message-ID: <4B1C6F6B-08FB-464C-B130-896C00833EE5@oracle.com> 7u40 is when the native invoke dynamic implementation was replaced with Lambda Forms :-/ /M > On 07 Jan 2015, at 17:13, Remi Forax wrote: > > > On 01/07/2015 10:43 AM, Marcus Lagergren wrote: >> Remi, I tried to reproduce your problem with jdk9 b44. It runs decently fast. > > yes, nashorn is fast enough but it can be faster if the JIT was not doing something stupid. > > When the VM inline fibo, because fibo is recursive, the recursive call is inlined only once, > so the call at depth=2 can not be inlined but should be a classical direct call. > > But if fibo is called through an invokedynamic, instead of emitting a direct call to fibo, > the JIT generates a code that push the method handle on stack and execute it > like if the metod handle was not constant > (the method handle is constant because the call at depth=1 is inlined !). > >> When did it start to regress? > > jdk7u40, i believe. > > I've created a jar containing some handwritten bytecodes with no dependency to reproduce the issue easily: > https://github.com/forax/vmboiler/blob/master/test7/fibo7.jar > > [forax at localhost test7]$ time /usr/jdk/jdk1.9.0/bin/java -cp fibo7.jar FiboSample > 1836311903 > > real 0m6.653s > user 0m6.729s > sys 0m0.019s > [forax at localhost test7]$ time /usr/jdk/jdk1.8.0_25/bin/java -cp fibo7.jar FiboSample > 1836311903 > > real 0m6.572s > user 0m6.591s > sys 0m0.019s > [forax at localhost test7]$ time /usr/jdk/jdk1.7.0_71/bin/java -cp fibo7.jar FiboSample > 1836311903 > > real 0m6.373s > user 0m6.396s > sys 0m0.016s > [forax at localhost test7]$ time /usr/jdk/jdk1.7.0_25/bin/java -cp fibo7.jar FiboSample > 1836311903 > > real 0m4.847s > user 0m4.832s > sys 0m0.019s > > as you can see, it was faster with a JDK before jdk7u40. 
> >> >> Regards >> Marcus > > cheers, > R?mi > >> >>> On 30 Dec 2014, at 20:48, Remi Forax wrote: >>> >>> Hi guys, >>> I've found a bug in the interaction between the lambda form and inlining algorithm, >>> basically if the inlining heuristic bailout because the method is recursive and already inlined once, >>> instead to emit a code to do a direct call, it revert to do call to linkStatic with the method >>> as MemberName. >>> >>> I think it's a regression because before the introduction of lambda forms, >>> I'm pretty sure that the JIT was emitting a direct call. >>> >>> Step to reproduce with nashorn, run this JavaScript code >>> function fibo(n) { >>> return (n < 2)? 1: fibo(n - 1) + fibo(n - 2) >>> } >>> >>> print(fibo(45)) >>> >>> like this: >>> /usr/jdk/jdk1.9.0/bin/jjs -J-XX:+UnlockDiagnosticVMOptions -J-XX:+PrintAssembly fibo.js > log.txt >>> >>> look for a method 'fibo' from the tail of the log, you will find something like this: >>> >>> 0x00007f97e4b4743f: mov $0x76d08f770,%r8 ; {oop(a 'java/lang/invoke/MemberName' = {method} {0x00007f97dcff8e40} 'fibo' '(Ljdk/nashorn/internal/runtime/ScriptFunction;Ljava/lang/Object;I)I' in 'jdk/nashorn/internal/scripts/Script$Recompilation$2$fibo')} >>> 0x00007f97e4b47449: xchg %ax,%ax >>> 0x00007f97e4b4744b: callq 0x00007f97dd0446e0 >>> >>> I hope this can be fixed. My demonstration that I can have fibo written with a dynamic language >>> that run as fast as written in Java doesn't work anymore :( >>> >>> cheers, >>> R?mi >>> >>> _______________________________________________ >>> mlvm-dev mailing list >>> mlvm-dev at openjdk.java.net >>> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev >> _______________________________________________ >> mlvm-dev mailing list >> mlvm-dev at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev > > _______________________________________________ > mlvm-dev mailing list > mlvm-dev at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev -------------- next part -------------- An HTML attachment was scrubbed... URL: From headius at headius.com Wed Jan 7 18:07:23 2015 From: headius at headius.com (Charles Oliver Nutter) Date: Wed, 7 Jan 2015 12:07:23 -0600 Subject: Invokedynamic and recursive method call In-Reply-To: <54AD5B3C.80004@univ-mlv.fr> References: <54A3019C.1070909@univ-mlv.fr> <9E513159-F926-4845-A11E-6585F8CFD788@oracle.com> <54AD5B3C.80004@univ-mlv.fr> Message-ID: This could explain performance regressions we've seen on the performance of heavily-recursive algorithms. I'll try to get an assembly dump for fib in JRuby later today. - Charlie On Wed, Jan 7, 2015 at 10:13 AM, Remi Forax wrote: > > On 01/07/2015 10:43 AM, Marcus Lagergren wrote: >> >> Remi, I tried to reproduce your problem with jdk9 b44. It runs decently >> fast. > > > yes, nashorn is fast enough but it can be faster if the JIT was not doing > something stupid. > > When the VM inline fibo, because fibo is recursive, the recursive call is > inlined only once, > so the call at depth=2 can not be inlined but should be a classical direct > call. > > But if fibo is called through an invokedynamic, instead of emitting a direct > call to fibo, > the JIT generates a code that push the method handle on stack and execute it > like if the metod handle was not constant > (the method handle is constant because the call at depth=1 is inlined !). > >> When did it start to regress? > > > jdk7u40, i believe. 
> > I've created a jar containing some handwritten bytecodes with no dependency > to reproduce the issue easily: > https://github.com/forax/vmboiler/blob/master/test7/fibo7.jar > > [forax at localhost test7]$ time /usr/jdk/jdk1.9.0/bin/java -cp fibo7.jar > FiboSample > 1836311903 > > real 0m6.653s > user 0m6.729s > sys 0m0.019s > [forax at localhost test7]$ time /usr/jdk/jdk1.8.0_25/bin/java -cp fibo7.jar > FiboSample > 1836311903 > > real 0m6.572s > user 0m6.591s > sys 0m0.019s > [forax at localhost test7]$ time /usr/jdk/jdk1.7.0_71/bin/java -cp fibo7.jar > FiboSample > 1836311903 > > real 0m6.373s > user 0m6.396s > sys 0m0.016s > [forax at localhost test7]$ time /usr/jdk/jdk1.7.0_25/bin/java -cp fibo7.jar > FiboSample > 1836311903 > > real 0m4.847s > user 0m4.832s > sys 0m0.019s > > as you can see, it was faster with a JDK before jdk7u40. > >> >> Regards >> Marcus > > > cheers, > R?mi > > >> >>> On 30 Dec 2014, at 20:48, Remi Forax wrote: >>> >>> Hi guys, >>> I've found a bug in the interaction between the lambda form and inlining >>> algorithm, >>> basically if the inlining heuristic bailout because the method is >>> recursive and already inlined once, >>> instead to emit a code to do a direct call, it revert to do call to >>> linkStatic with the method >>> as MemberName. >>> >>> I think it's a regression because before the introduction of lambda >>> forms, >>> I'm pretty sure that the JIT was emitting a direct call. >>> >>> Step to reproduce with nashorn, run this JavaScript code >>> function fibo(n) { >>> return (n < 2)? 1: fibo(n - 1) + fibo(n - 2) >>> } >>> >>> print(fibo(45)) >>> >>> like this: >>> /usr/jdk/jdk1.9.0/bin/jjs -J-XX:+UnlockDiagnosticVMOptions >>> -J-XX:+PrintAssembly fibo.js > log.txt >>> >>> look for a method 'fibo' from the tail of the log, you will find >>> something like this: >>> >>> 0x00007f97e4b4743f: mov $0x76d08f770,%r8 ; {oop(a >>> 'java/lang/invoke/MemberName' = {method} {0x00007f97dcff8e40} 'fibo' >>> '(Ljdk/nashorn/internal/runtime/ScriptFunction;Ljava/lang/Object;I)I' in >>> 'jdk/nashorn/internal/scripts/Script$Recompilation$2$fibo')} >>> 0x00007f97e4b47449: xchg %ax,%ax >>> 0x00007f97e4b4744b: callq 0x00007f97dd0446e0 >>> >>> I hope this can be fixed. My demonstration that I can have fibo written >>> with a dynamic language >>> that run as fast as written in Java doesn't work anymore :( >>> >>> cheers, >>> R?mi >>> >>> _______________________________________________ >>> mlvm-dev mailing list >>> mlvm-dev at openjdk.java.net >>> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev >> >> _______________________________________________ >> mlvm-dev mailing list >> mlvm-dev at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev > > > _______________________________________________ > mlvm-dev mailing list > mlvm-dev at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev From joe.darcy at oracle.com Fri Jan 9 02:53:54 2015 From: joe.darcy at oracle.com (Joseph D. 
Darcy) Date: Thu, 08 Jan 2015 18:53:54 -0800 Subject: [9] RFR (M): 8067344: Adjust java/lang/invoke/LFCaching/LFGarbageCollectedTest.java for recent changes in java.lang.invoke In-Reply-To: References: <549884E7.8040204@oracle.com> <549962C4.2040301@oracle.com> Message-ID: <54AF42C2.9070105@oracle.com> Hello, I don't have a comment on the changes to the test per se, but as someone who keeps an eye on test failures that occur in regression tests in the jdk repo of the JDK 9 dev forest, I'd like to see this test stop failing, either by the test being fixed for, barring that, the testing being @ignore-d in some way until the semantics of the test can be corrected. Thanks, -Joe On 1/7/2015 3:16 AM, Paul Sandoz wrote: > Hi > > 70 TestMethods testCase = getTestMethod(); > 71 if (testCase == TestMethods.EXACT_INVOKER || testCase == TestMethods.INVOKER) { > 72 // Invokers aren't collected. > 73 return; > 74 } > > Can you just filter those test cases out in the main method within EnumSet.complementOf? > > On Dec 23, 2014, at 1:40 PM, Vladimir Ivanov wrote: > >> Spotted some more problems: >> - need to skip identity operations (identity_* LambdaForms) in the test, since corresponding LambdaForms reside in a permanent cache; >> > 82 mtype = adapter.type(); > 83 if (mtype.parameterCount() == 0) { > 84 // Ignore identity_* LambdaForms. > 85 return; > 86 } > > Under what conditions does this arise? i guess it might be non-determinisitic based on the randomly generated arity for the test case, so could filter more tests than absolutely required? > > >> - need to keep original test data for diagnostic purposes, since getTestCaseData() produces new instance. >> > 78 adapter = getTestMethod().getTestCaseMH(data, TestMethods.Kind.ONE); > > > Could replace "getTestMethod()" with "testCase". > > Paul. > >> Updated version: >> http://cr.openjdk.java.net/~vlivanov/8067344/webrev.01/ >> >> Best regards, >> Vladimir Ivanov >> >> On 12/22/14 11:53 PM, Vladimir Ivanov wrote: >>> http://cr.openjdk.java.net/~vlivanov/8067344/webrev.00/ >>> https://bugs.openjdk.java.net/browse/JDK-8067344 >>> >>> LFGarbageCollectedTest should be adjusted after JDK-8057020. >>> >>> There are a couple of problems with the test. >>> >>> (1) Existing logic to test that LambdaForm instance is collected isn't >>> stable enough. Consequent System.GCs can hinder reference enqueueing. >>> To speed up the test, I added -XX:SoftRefLRUPolicyMSPerMB=0 and limited >>> the heap by -Xmx64m. >>> >>> (2) MethodType-based invoker caches are deliberately left strongly >>> reachable. So, they should be skipped in the test. >>> >>> (3) Added additional diagnostic output to simplify failure analysis >>> (test case details, method handle type and LambdaForm, heap dump >>> (optional, -DHEAP_DUMP=true)). >>> >>> Testing: failing test. >>> >>> Thanks! >>> >>> Best regards, >>> Vladimir Ivanov >> _______________________________________________ >> mlvm-dev mailing list >> mlvm-dev at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev From vladimir.x.ivanov at oracle.com Mon Jan 12 18:06:54 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Mon, 12 Jan 2015 21:06:54 +0300 Subject: [9] RFR (M): 8067344: Adjust java/lang/invoke/LFCaching/LFGarbageCollectedTest.java for recent changes in java.lang.invoke In-Reply-To: References: <549884E7.8040204@oracle.com> <549962C4.2040301@oracle.com> Message-ID: <54B40D3E.3080608@oracle.com> Paul, Thanks for the review! 
Updated webrev: http://cr.openjdk.java.net/~vlivanov/8067344/webrev.02 > 70 TestMethods testCase = getTestMethod(); > 71 if (testCase == TestMethods.EXACT_INVOKER || testCase == TestMethods.INVOKER) { > 72 // Invokers aren't collected. > 73 return; > 74 } > > Can you just filter those test cases out in the main method within EnumSet.complementOf? Good point! Done. > 82 mtype = adapter.type(); > 83 if (mtype.parameterCount() == 0) { > 84 // Ignore identity_* LambdaForms. > 85 return; > 86 } > > Under what conditions does this arise? i guess it might be non-determinisitic based on the randomly generated arity for the test case, so could filter more tests than absolutely required? Some transformations can rarely degenerate into identity. I share your concern, so I decided to check LambdaFor.debugName instead. >> - need to keep original test data for diagnostic purposes, since getTestCaseData() produces new instance. >> > > 78 adapter = getTestMethod().getTestCaseMH(data, TestMethods.Kind.ONE); > > > Could replace "getTestMethod()" with "testCase". Done. Best regards, Vladimir Ivanov From paul.sandoz at oracle.com Mon Jan 12 18:42:11 2015 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Mon, 12 Jan 2015 19:42:11 +0100 Subject: [9] RFR (M): 8067344: Adjust java/lang/invoke/LFCaching/LFGarbageCollectedTest.java for recent changes in java.lang.invoke In-Reply-To: <54B40D3E.3080608@oracle.com> References: <549884E7.8040204@oracle.com> <549962C4.2040301@oracle.com> <54B40D3E.3080608@oracle.com> Message-ID: <6C72F39E-3CD3-4005-BD47-0FEEFFCD1F43@oracle.com> On Jan 12, 2015, at 7:06 PM, Vladimir Ivanov wrote: > Paul, > > Thanks for the review! > Look good, +1, Paul. > Updated webrev: > http://cr.openjdk.java.net/~vlivanov/8067344/webrev.02 > >> 70 TestMethods testCase = getTestMethod(); >> 71 if (testCase == TestMethods.EXACT_INVOKER || testCase == TestMethods.INVOKER) { >> 72 // Invokers aren't collected. >> 73 return; >> 74 } >> >> Can you just filter those test cases out in the main method within EnumSet.complementOf? > Good point! Done. > >> 82 mtype = adapter.type(); >> 83 if (mtype.parameterCount() == 0) { >> 84 // Ignore identity_* LambdaForms. >> 85 return; >> 86 } >> >> Under what conditions does this arise? i guess it might be non-determinisitic based on the randomly generated arity for the test case, so could filter more tests than absolutely required? > Some transformations can rarely degenerate into identity. I share your concern, so I decided to check LambdaFor.debugName instead. > >>> - need to keep original test data for diagnostic purposes, since getTestCaseData() produces new instance. >>> >> >> 78 adapter = getTestMethod().getTestCaseMH(data, TestMethods.Kind.ONE); >> >> >> Could replace "getTestMethod()" with "testCase". > Done. > -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 841 bytes Desc: Message signed with OpenPGP using GPGMail URL: From vladimir.x.ivanov at oracle.com Mon Jan 12 19:12:10 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Mon, 12 Jan 2015 22:12:10 +0300 Subject: [9] RFR (M): 8067344: Adjust java/lang/invoke/LFCaching/LFGarbageCollectedTest.java for recent changes in java.lang.invoke In-Reply-To: <6C72F39E-3CD3-4005-BD47-0FEEFFCD1F43@oracle.com> References: <549884E7.8040204@oracle.com> <549962C4.2040301@oracle.com> <54B40D3E.3080608@oracle.com> <6C72F39E-3CD3-4005-BD47-0FEEFFCD1F43@oracle.com> Message-ID: <54B41C8A.4060103@oracle.com> Thanks, Paul! 
Best regards, Vladimir Ivanov On 1/12/15 9:42 PM, Paul Sandoz wrote: > On Jan 12, 2015, at 7:06 PM, Vladimir Ivanov wrote: >> Paul, >> >> Thanks for the review! >> > > Look good, +1, > Paul. > >> Updated webrev: >> http://cr.openjdk.java.net/~vlivanov/8067344/webrev.02 >> >>> 70 TestMethods testCase = getTestMethod(); >>> 71 if (testCase == TestMethods.EXACT_INVOKER || testCase == TestMethods.INVOKER) { >>> 72 // Invokers aren't collected. >>> 73 return; >>> 74 } >>> >>> Can you just filter those test cases out in the main method within EnumSet.complementOf? >> Good point! Done. >> >>> 82 mtype = adapter.type(); >>> 83 if (mtype.parameterCount() == 0) { >>> 84 // Ignore identity_* LambdaForms. >>> 85 return; >>> 86 } >>> >>> Under what conditions does this arise? i guess it might be non-determinisitic based on the randomly generated arity for the test case, so could filter more tests than absolutely required? >> Some transformations can rarely degenerate into identity. I share your concern, so I decided to check LambdaFor.debugName instead. >> >>>> - need to keep original test data for diagnostic purposes, since getTestCaseData() produces new instance. >>>> >>> >>> 78 adapter = getTestMethod().getTestCaseMH(data, TestMethods.Kind.ONE); >>> >>> >>> Could replace "getTestMethod()" with "testCase". >> Done. >> > > From vladimir.x.ivanov at oracle.com Fri Jan 16 17:16:22 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 16 Jan 2015 20:16:22 +0300 Subject: [9] RFR (M): 8063137: Never-taken branches should be pruned when GWT LambdaForms are shared Message-ID: <54B94766.2080102@oracle.com> http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/hotspot/ http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/jdk/ https://bugs.openjdk.java.net/browse/JDK-8063137 After GuardWithTest (GWT) LambdaForms became shared, profile pollution significantly distorted compilation decisions. It affected inlining and hindered some optimizations. It causes significant performance regressions for Nashorn (on Octane benchmarks). Inlining was fixed by 8059877 [1], but it didn't cover the case when a branch is never taken. It can cause missed optimization opportunity, and not just increase in code size. For example, non-pruned branch can break escape analysis. Currently, there are 2 problems: - branch frequencies profile pollution - deoptimization counts pollution Branch frequency pollution hides from JIT the fact that a branch is never taken. Since GWT LambdaForms (and hence their bytecode) are heavily shared, but the behavior is specific to MethodHandle, there's no way for JIT to understand how particular GWT instance behaves. The solution I propose is to do profiling in Java code and feed it to JIT. Every GWT MethodHandle holds an auxiliary array (int[2]) where profiling info is stored. Once JIT kicks in, it can retrieve these counts, if corresponding MethodHandle is a compile-time constant (and it is usually the case). To communicate the profile data from Java code to JIT, MethodHandleImpl::profileBranch() is used. If GWT MethodHandle isn't a compile-time constant, profiling should proceed. It happens when corresponding LambdaForm is already shared, for newly created GWT MethodHandles profiling can occur only in native code (dedicated nmethod for a single LambdaForm). So, when compilation of the whole MethodHandle chain is triggered, the profile should be already gathered. Overriding branch frequencies is not enough. Statistics on deoptimization events is also polluted. 
Even if a branch is never taken, JIT doesn't issue an uncommon trap there unless corresponding bytecode doesn't trap too much and doesn't cause too many recompiles. I added @IgnoreProfile and place it only on GWT LambdaForms. When JIT sees it on some method, Compile::too_many_traps & Compile::too_many_recompiles for that method always return false. It allows JIT to prune the branch based on custom profile and recompile the method, if the branch is visited. For now, I wanted to keep the fix very focused. The next thing I plan to do is to experiment with ignoring deoptimization counts for other LambdaForms which are heavily shared. I already saw problems caused by deoptimization counts pollution (see JDK-8068915 [2]). I plan to backport the fix into 8u40, once I finish extensive performance testing. Testing: JPRT, java/lang/invoke tests, nashorn (nashorn testsuite, Octane). Thanks! PS: as a summary, my experiments show that fixes for 8063137 & 8068915 [2] almost completely recovers peak performance after LambdaForm sharing [3]. There's one more problem left (non-inlined MethodHandle invocations are more expensive when LFs are shared), but it's a story for another day. Best regards, Vladimir Ivanov [1] https://bugs.openjdk.java.net/browse/JDK-8059877 8059877: GWT branch frequencies pollution due to LF sharing [2] https://bugs.openjdk.java.net/browse/JDK-8068915 [3] https://bugs.openjdk.java.net/browse/JDK-8046703 JEP 210: LambdaForm Reduction and Caching From vladimir.kozlov at oracle.com Fri Jan 16 20:34:50 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 16 Jan 2015 12:34:50 -0800 Subject: [9] RFR (M): 8063137: Never-taken branches should be pruned when GWT LambdaForms are shared In-Reply-To: <54B94766.2080102@oracle.com> References: <54B94766.2080102@oracle.com> Message-ID: <54B975EA.6040005@oracle.com> Nice! At least Hotspot part since I don't understand jdk part :) I would suggest to add more detailed comment (instead of simple "Stop profiling") to inline_profileBranch() intrinsic explaining what it is doing because it is not strictly "intrinsic" - it does not implement profileBranch() java code when counts is constant. You forgot to mark Opaque4Node as macro node. I would suggest to base it on Opaque2Node then you will get some methods from it. Thanks, Vladimir On 1/16/15 9:16 AM, Vladimir Ivanov wrote: > http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/hotspot/ > http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/jdk/ > https://bugs.openjdk.java.net/browse/JDK-8063137 > > After GuardWithTest (GWT) LambdaForms became shared, profile pollution > significantly distorted compilation decisions. It affected inlining and > hindered some optimizations. It causes significant performance > regressions for Nashorn (on Octane benchmarks). > > Inlining was fixed by 8059877 [1], but it didn't cover the case when a > branch is never taken. It can cause missed optimization opportunity, and > not just increase in code size. For example, non-pruned branch can break > escape analysis. > > Currently, there are 2 problems: > - branch frequencies profile pollution > - deoptimization counts pollution > > Branch frequency pollution hides from JIT the fact that a branch is > never taken. Since GWT LambdaForms (and hence their bytecode) are > heavily shared, but the behavior is specific to MethodHandle, there's no > way for JIT to understand how particular GWT instance behaves. > > The solution I propose is to do profiling in Java code and feed it to > JIT. 
Every GWT MethodHandle holds an auxiliary array (int[2]) where > profiling info is stored. Once JIT kicks in, it can retrieve these > counts, if corresponding MethodHandle is a compile-time constant (and it > is usually the case). To communicate the profile data from Java code to > JIT, MethodHandleImpl::profileBranch() is used. > > If GWT MethodHandle isn't a compile-time constant, profiling should > proceed. It happens when corresponding LambdaForm is already shared, for > newly created GWT MethodHandles profiling can occur only in native code > (dedicated nmethod for a single LambdaForm). So, when compilation of the > whole MethodHandle chain is triggered, the profile should be already > gathered. > > Overriding branch frequencies is not enough. Statistics on > deoptimization events is also polluted. Even if a branch is never taken, > JIT doesn't issue an uncommon trap there unless corresponding bytecode > doesn't trap too much and doesn't cause too many recompiles. > > I added @IgnoreProfile and place it only on GWT LambdaForms. When JIT > sees it on some method, Compile::too_many_traps & > Compile::too_many_recompiles for that method always return false. It > allows JIT to prune the branch based on custom profile and recompile the > method, if the branch is visited. > > For now, I wanted to keep the fix very focused. The next thing I plan to > do is to experiment with ignoring deoptimization counts for other > LambdaForms which are heavily shared. I already saw problems caused by > deoptimization counts pollution (see JDK-8068915 [2]). > > I plan to backport the fix into 8u40, once I finish extensive > performance testing. > > Testing: JPRT, java/lang/invoke tests, nashorn (nashorn testsuite, Octane). > > Thanks! > > PS: as a summary, my experiments show that fixes for 8063137 & 8068915 > [2] almost completely recovers peak performance after LambdaForm sharing > [3]. There's one more problem left (non-inlined MethodHandle invocations > are more expensive when LFs are shared), but it's a story for another day. > > Best regards, > Vladimir Ivanov > > [1] https://bugs.openjdk.java.net/browse/JDK-8059877 > 8059877: GWT branch frequencies pollution due to LF sharing > [2] https://bugs.openjdk.java.net/browse/JDK-8068915 > [3] https://bugs.openjdk.java.net/browse/JDK-8046703 > JEP 210: LambdaForm Reduction and Caching > _______________________________________________ > mlvm-dev mailing list > mlvm-dev at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev From john.r.rose at oracle.com Fri Jan 16 23:13:48 2015 From: john.r.rose at oracle.com (John Rose) Date: Fri, 16 Jan 2015 15:13:48 -0800 Subject: [9] RFR (M): 8063137: Never-taken branches should be pruned when GWT LambdaForms are shared In-Reply-To: <54B94766.2080102@oracle.com> References: <54B94766.2080102@oracle.com> Message-ID: <7B03B9FB-17B4-4AE0-92B8-F2DC5B231294@oracle.com> On Jan 16, 2015, at 9:16 AM, Vladimir Ivanov wrote: > > http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/hotspot/ > http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/jdk/ > https://bugs.openjdk.java.net/browse/JDK-8063137 > ... > PS: as a summary, my experiments show that fixes for 8063137 & 8068915 [2] almost completely recovers peak performance after LambdaForm sharing [3]. There's one more problem left (non-inlined MethodHandle invocations are more expensive when LFs are shared), but it's a story for another day. This performance bump is excellent news. 
LFs are supposed to express emergently common behaviors, like hidden classes. We are much closer to that goal now. I'm glad to see that the library-assisted profiling turns out to be relatively clean. In effect this restores the pre-LF CountingMethodHandle logic from 2011, which was so beneficial in JDK 7: http://hg.openjdk.java.net/jdk7u/jdk7u/jdk/file/02de5cdbef21/src/share/classes/java/lang/invoke/CountingMethodHandle.java

I have some suggestions to make this version a little cleaner; see below.

Starting with the JDK changes:

In LambdaForm.java, I'm feeling flag pressure from all the little boolean fields and constructor parameters. (Is it time to put in a bit-encoded field "private byte LambdaForm.flags", or do we wait for another boolean to come along? But see the next questions, which are more important.)

What happens when a GWT LF gets inlined into a larger LF? Then there might be two or more selectAlternative calls. Will this confuse anything or will it Just Work? The combined LF will get profiled as usual, and the selectAlternative calls will also collect profile (or not?).

This leads to another question: Why have a boolean 'isGWT' at all? Why not just check for one or more occurrences of selectAlternative, and declare that those guys override (some of) the profiling. Something like:

-+ if (PROFILE_GWT && lambdaForm.isGWT)
++ if (PROFILE_GWT && lambdaForm.containsFunction(NF_selectAlternative))

(...where LF.containsFunction(NamedFunction) is a variation of LF.contains(Name).)

I suppose the answer may be that you want to inline GWTs (if ever) into customized code where the JVM profiling should get maximum benefit. In that case you might want to set the boolean to "false" to distinguish "immature" GWT combinators from customized ones. If that's the case, perhaps the real boolean flag you want is not 'isGWT' but 'sharedProfile' or 'immature' or some such, or (inverting) 'customized'. (I like the feel of a 'customized' flag.) Then @IgnoreProfile would get attached to a LF that (a) contains selectAlternative and (b) is marked as non-customized/immature/shared. You might also want to adjust the call to 'profileBranch' based on whether the containing LF was shared or customized. What I'm mainly poking at here is that 'isGWT' is not informative about the intended use of the flag.

In 'updateCounters', if the counter overflows, you'll get continuous creation of ArithmeticExceptions. Will that optimize or will it cause a permanent slowdown? Consider a hack like this on the exception path: counters[idx] = Integer.MAX_VALUE / 2; (A tiny sketch of this appears below.)

On the Name Bikeshed: It looks like @IgnoreProfile (ignore_profile in the VM) promises too much "ignorance", since it suppresses branch counts and traps, but allows type profiles to be consulted. Maybe something positive like "@ManyTraps" or "@SharedMegamorphic"? (It's just a name, and this is just a suggestion.)

Going to the JVM:

In library_call.cpp, I think you should change the assert to a guard:

-+ assert(aobj->length() == 2, "");
++ && aobj->length() == 2) {

In Parse::dynamic_branch_prediction, the mere presence of the Opaque4 node is enough to trigger replacement of profiling. I think there should *not* be a test of method()->ignore_profile(). That should provide better integration between the two sources of profile data to JVM profiling?

Also, I think the name 'Opaque4Node' is way too opaque. Suggest 'ProfileBranchNode', since that's exactly what it does.
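Picking up the updateCounters overflow point above, a tiny sketch of the suggested exception-path hack might look like the following (counters being the int[2] held by the GWT method handle; the helper shown here is illustrative, not the actual MethodHandleImpl code):

static void updateCounters(int[] counters, boolean result) {
    int idx = result ? 0 : 1;
    try {
        counters[idx] = Math.addExact(counters[idx], 1);
    } catch (ArithmeticException e) {
        // Saturate rather than overflow again on every subsequent call.
        counters[idx] = Integer.MAX_VALUE / 2;
    }
}

A saturated counter still tells the JIT which branch dominates, without creating a fresh ArithmeticException on every invocation once the count tops out.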
Suggest changing the log element "profile_branch" to "observe source='profileBranch'", to make a better hint as to the source of the info.

— John

From marcus.lagergren at oracle.com Sun Jan 18 21:54:43 2015 From: marcus.lagergren at oracle.com (Marcus Lagergren) Date: Sun, 18 Jan 2015 22:54:43 +0100 Subject: JFokus 2015 - the VM Tech Day Message-ID:

Greetings community members! Here is something that I'm sure you'll find interesting. I want to advertise the upcoming "VM tech day" event, scheduled to take place February 2, 2015 at the JFokus conference in Stockholm. Sorry about the short notice here, but finalizing the speaker list took us a bit more time than expected.

The VM tech day is a mini-track that runs the first day of the JFokus conference. This is its schedule: https://www.jfokus.se/jfokus/jvmtech.jsp

After some rather challenging months of jigsaw puzzles, it is with great pleasure that I can announce that our speaker line-up is now complete - and it is great indeed! We are talking 100% gurus, prophets, ninjas, rock stars, and all other similar terms that normally get your resume binned if it passes my desk. But in this case the labels are true. We have strictly top names from both the commercial world and from academia ready to take you on a great ride.

So what is the VM tech day? For those of you familiar with the JVM Language Summit (JVMLS) that usually takes place in Santa Clara in the summers, the format is similar. It's the usual deal: anyone morbidly interested in runtime internals, code generation, polyglot programming and the complexities of language implementation should find a veritable gold mine of stimulating conversation and knowledge transfer here. What is different from a typical JVMLS (except for the shorter duration) is that we have widened the scope a bit to include several runtimes, language implementation issues and polyglot problems.

There will be six scheduled sessions and plenty of time for breakouts and discussions. We will also heavily encourage audience interaction and participation.

The JFokus VM tech day is opened by John Rose. I am sure John needs no introduction to the subscribers of this list. With advanced OpenJDK projects like Valhalla and Panama booting up, John will discuss what the JVM has in store for the future.

Other speakers include the tireless Charlie Nutter from Red Hat, the formidable Remi Forax, the brilliant Vyacheslav Egorov of Google V8 fame, the esteemed Dan Heidinga from IBM and the good-looking Attila Szegedi from Oracle.

We also have plenty of non-speaking celebrity participants in the audience, for example Fredrik Öhrström: invokedynamic specification wizard extraordinaire and architect behind the new OpenJDK build system. Stop by and get autographs ;)

Thusly: if you are attending JFokus, or if you are making up your mind about attending it right now, the VM tech summit is definitely something anyone subscribing to mlvm-dev wouldn't want to miss. The cross-platform/cross-technology/cross-company focus that we have tried very hard to create will without a doubt be ultra stimulating. Of that you can be sure.

Please help us spread the word in whatever forums you deem appropriate! Talk to your friends! Tweet links to this post! Yell from your cubicle soap boxes across the neverending seas of fluorescent lights!
Any further questions you may have about the event, not answered by the web pages, can be directed either to me (@lagergren) or Mattias Karlsson (@matkar) or as replies to this e-mail thread. On behalf of JFokus / VM Tech Day 2015 Marcus Lagergren Master of ceremonies (or something) From marcus.lagergren at oracle.com Mon Jan 19 09:58:50 2015 From: marcus.lagergren at oracle.com (Marcus Lagergren) Date: Mon, 19 Jan 2015 10:58:50 +0100 Subject: JFokus 2015 - the VM Tech Day In-Reply-To: References: Message-ID: <36D2A3B5-A5ED-421A-8DAC-379712002366@oracle.com> And to further clarify things - you can attend _only_ the VM Tech day / tech summit, should you so desire, and skip the rest of the JFokus conference. (What a strange thing to do, given the quality of JFokus, but I can?t be the one questioning your priorities here) (http://www.jfokus.se/jfokus/register.jsp ) /M > On 18 Jan 2015, at 22:54, Marcus Lagergren wrote: > > Greetings community members! > > Here is something that I'm sure you'll find interesting. > > I want to advertise the upcoming "VM tech day? event, scheduled to > take place February 2, 2015 at the JFokus conference in > Stockholm. Sorry I am on a bit of a short notice here, but finalizing > the speaker list took us a bit more time than expected. > > The VM tech day is a mini-track that runs the first day of the JFokus > conference. This is its schedule: > https://www.jfokus.se/jfokus/jvmtech.jsp > > After some rather challenging months of jigsaw puzzles, it is with > great pleasure that I can announce that our speaker line up is now > complete - and it is great indeed! We are talking 100% gurus, > prophets, ninjas, rock stars, and all other similar terms that > normally gets your resume binned if it passes my desk. But in this > case the labels are true. We have strictly top names from both the > commercial world and from academia ready to take you on a great > ride. > > So what is the VM tech day? For those of you familiar with the JVM > Language Summit (JVMLS) that usually takes place in Santa Clara in > the summers, the format is similar. It?s the usual deal: anyone > morbidly interested in runtime internals, code generation, polyglot > programming and the complexities of language implementation, should > find a veritable gold mine of stimulating conversation and knowledge > transfer here. What is different from a typical JVMLS (except for the > shorter duration), is that we have widened the scope a bit to include > several runtimes, language implementation issues and polyglot > problems. > > There will be six scheduled sessions and plenty of time for breakouts > and discussions. We will also heavily encourage audience interaction > and participation. > > The JFokus VM tech day is opened by John Rose. I am sure John needs > no introduction to the subscribers of this list. With advanced OpenJDK > projects like Valhalla and Panama booting up, John will discuss what > the JVM has in store for the future. > > Other speakers include the tireless Charlie Nutter from Red Hat, the > formidable Remi Forax, the brilliant Vyacheslav Egorov of Google v8 > fame, the esteemed Dan Heidinga from IBM and the good looking Attila > Szegedi from Oracle. > > We also have plenty of non-speaking celebrity participants in the > audience, for example Fredrik ?hrstr?m: invokedynamic specification > wizard extraordinaire and architect behind the new OpenJDK build > system. 
Stop by and get autographs ;) > > Thusly: if you are attending JFokus, or if you are making up your mind > about attending it right now, the VM tech summit is definitely > something anyone subscribing to mlvm-dev wouldn't want to miss. The > cross-platform/cross-technology/cross-company focus that we have tried > very hard to create will without a doubt be ultra stimulating. Of that > you can be sure. > > Please help us spread the word in whatever forums you deem > appropriate! Talk to you friends! Tweet links to this post! Yell from > your cubicle soap boxes across the neverending seas of fluorescent > lights! > > Any further questions you may have about the event, not answered by > the web pages, can be directed either to me (@lagergren) or Mattias > Karlsson (@matkar) or as replies to this e-mail thread. > > On behalf of JFokus / VM Tech Day 2015 > Marcus Lagergren > Master of ceremonies (or something) > > _______________________________________________ > mlvm-dev mailing list > mlvm-dev at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev -------------- next part -------------- An HTML attachment was scrubbed... URL: From vladimir.x.ivanov at oracle.com Mon Jan 19 17:05:49 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Mon, 19 Jan 2015 20:05:49 +0300 Subject: [9] RFR (M): 8063137: Never-taken branches should be pruned when GWT LambdaForms are shared In-Reply-To: <54B975EA.6040005@oracle.com> References: <54B94766.2080102@oracle.com> <54B975EA.6040005@oracle.com> Message-ID: <54BD396D.2050907@oracle.com> Thanks, Vladimir! > I would suggest to add more detailed comment (instead of simple "Stop > profiling") to inline_profileBranch() intrinsic explaining what it is > doing because it is not strictly "intrinsic" - it does not implement > profileBranch() java code when counts is constant. Sure, will do. > You forgot to mark Opaque4Node as macro node. I would suggest to base it > on Opaque2Node then you will get some methods from it. Do I really need to do so? I expect it to go away during IGVN pass right after parsing is over. That's why I register the node for igvn in LibraryCallKit::inline_profileBranch(). Changes in macro.cpp & compile.cpp are leftovers from the version when Opaque4 was macro node. I plan to remove them. Best regards, Vladimir Ivanov > On 1/16/15 9:16 AM, Vladimir Ivanov wrote: >> http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/hotspot/ >> http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/jdk/ >> https://bugs.openjdk.java.net/browse/JDK-8063137 >> >> After GuardWithTest (GWT) LambdaForms became shared, profile pollution >> significantly distorted compilation decisions. It affected inlining and >> hindered some optimizations. It causes significant performance >> regressions for Nashorn (on Octane benchmarks). >> >> Inlining was fixed by 8059877 [1], but it didn't cover the case when a >> branch is never taken. It can cause missed optimization opportunity, and >> not just increase in code size. For example, non-pruned branch can break >> escape analysis. >> >> Currently, there are 2 problems: >> - branch frequencies profile pollution >> - deoptimization counts pollution >> >> Branch frequency pollution hides from JIT the fact that a branch is >> never taken. Since GWT LambdaForms (and hence their bytecode) are >> heavily shared, but the behavior is specific to MethodHandle, there's no >> way for JIT to understand how particular GWT instance behaves. 
>> >> The solution I propose is to do profiling in Java code and feed it to >> JIT. Every GWT MethodHandle holds an auxiliary array (int[2]) where >> profiling info is stored. Once JIT kicks in, it can retrieve these >> counts, if corresponding MethodHandle is a compile-time constant (and it >> is usually the case). To communicate the profile data from Java code to >> JIT, MethodHandleImpl::profileBranch() is used. >> >> If GWT MethodHandle isn't a compile-time constant, profiling should >> proceed. It happens when corresponding LambdaForm is already shared, for >> newly created GWT MethodHandles profiling can occur only in native code >> (dedicated nmethod for a single LambdaForm). So, when compilation of the >> whole MethodHandle chain is triggered, the profile should be already >> gathered. >> >> Overriding branch frequencies is not enough. Statistics on >> deoptimization events is also polluted. Even if a branch is never taken, >> JIT doesn't issue an uncommon trap there unless corresponding bytecode >> doesn't trap too much and doesn't cause too many recompiles. >> >> I added @IgnoreProfile and place it only on GWT LambdaForms. When JIT >> sees it on some method, Compile::too_many_traps & >> Compile::too_many_recompiles for that method always return false. It >> allows JIT to prune the branch based on custom profile and recompile the >> method, if the branch is visited. >> >> For now, I wanted to keep the fix very focused. The next thing I plan to >> do is to experiment with ignoring deoptimization counts for other >> LambdaForms which are heavily shared. I already saw problems caused by >> deoptimization counts pollution (see JDK-8068915 [2]). >> >> I plan to backport the fix into 8u40, once I finish extensive >> performance testing. >> >> Testing: JPRT, java/lang/invoke tests, nashorn (nashorn testsuite, >> Octane). >> >> Thanks! >> >> PS: as a summary, my experiments show that fixes for 8063137 & 8068915 >> [2] almost completely recovers peak performance after LambdaForm sharing >> [3]. There's one more problem left (non-inlined MethodHandle invocations >> are more expensive when LFs are shared), but it's a story for another >> day. >> >> Best regards, >> Vladimir Ivanov >> >> [1] https://bugs.openjdk.java.net/browse/JDK-8059877 >> 8059877: GWT branch frequencies pollution due to LF sharing >> [2] https://bugs.openjdk.java.net/browse/JDK-8068915 >> [3] https://bugs.openjdk.java.net/browse/JDK-8046703 >> JEP 210: LambdaForm Reduction and Caching >> _______________________________________________ >> mlvm-dev mailing list >> mlvm-dev at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev > _______________________________________________ > mlvm-dev mailing list > mlvm-dev at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev From duncan.macgregor at ge.com Mon Jan 19 20:21:59 2015 From: duncan.macgregor at ge.com (MacGregor, Duncan (GE Energy Management)) Date: Mon, 19 Jan 2015 20:21:59 +0000 Subject: [9] RFR (M): 8063137: Never-taken branches should be pruned when GWT LambdaForms are shared In-Reply-To: <54B94766.2080102@oracle.com> References: <54B94766.2080102@oracle.com> Message-ID: Okay, I?ve done some tests of this with the micro benchmarks for our language & runtime which show pretty much no change except for one test which is now almost 3x slower. It uses nested loops to iterate over an array and concatenate the string-like objects it contains, and replaces elements with these new longer string-llike objects. 
It?s a bit of a pathological case, and I haven?t seen the same sort of degradation in the other benchmarks or in real applications, but I haven?t done serious benchmarking of them with this change. I shall see if the test case can be reduced down to anything simpler while still showing the same performance behaviour, and try add some compilation logging options to narrow down what?s going on. Duncan. On 16/01/2015 17:16, "Vladimir Ivanov" wrote: >http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/hotspot/ >http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/jdk/ >https://bugs.openjdk.java.net/browse/JDK-8063137 > >After GuardWithTest (GWT) LambdaForms became shared, profile pollution >significantly distorted compilation decisions. It affected inlining and >hindered some optimizations. It causes significant performance >regressions for Nashorn (on Octane benchmarks). > >Inlining was fixed by 8059877 [1], but it didn't cover the case when a >branch is never taken. It can cause missed optimization opportunity, and >not just increase in code size. For example, non-pruned branch can break >escape analysis. > >Currently, there are 2 problems: > - branch frequencies profile pollution > - deoptimization counts pollution > >Branch frequency pollution hides from JIT the fact that a branch is >never taken. Since GWT LambdaForms (and hence their bytecode) are >heavily shared, but the behavior is specific to MethodHandle, there's no >way for JIT to understand how particular GWT instance behaves. > >The solution I propose is to do profiling in Java code and feed it to >JIT. Every GWT MethodHandle holds an auxiliary array (int[2]) where >profiling info is stored. Once JIT kicks in, it can retrieve these >counts, if corresponding MethodHandle is a compile-time constant (and it >is usually the case). To communicate the profile data from Java code to >JIT, MethodHandleImpl::profileBranch() is used. > >If GWT MethodHandle isn't a compile-time constant, profiling should >proceed. It happens when corresponding LambdaForm is already shared, for >newly created GWT MethodHandles profiling can occur only in native code >(dedicated nmethod for a single LambdaForm). So, when compilation of the >whole MethodHandle chain is triggered, the profile should be already >gathered. > >Overriding branch frequencies is not enough. Statistics on >deoptimization events is also polluted. Even if a branch is never taken, >JIT doesn't issue an uncommon trap there unless corresponding bytecode >doesn't trap too much and doesn't cause too many recompiles. > >I added @IgnoreProfile and place it only on GWT LambdaForms. When JIT >sees it on some method, Compile::too_many_traps & >Compile::too_many_recompiles for that method always return false. It >allows JIT to prune the branch based on custom profile and recompile the >method, if the branch is visited. > >For now, I wanted to keep the fix very focused. The next thing I plan to >do is to experiment with ignoring deoptimization counts for other >LambdaForms which are heavily shared. I already saw problems caused by >deoptimization counts pollution (see JDK-8068915 [2]). > >I plan to backport the fix into 8u40, once I finish extensive >performance testing. > >Testing: JPRT, java/lang/invoke tests, nashorn (nashorn testsuite, >Octane). > >Thanks! > >PS: as a summary, my experiments show that fixes for 8063137 & 8068915 >[2] almost completely recovers peak performance after LambdaForm sharing >[3]. 
There's one more problem left (non-inlined MethodHandle invocations >are more expensive when LFs are shared), but it's a story for another day. > >Best regards, >Vladimir Ivanov > >[1] https://bugs.openjdk.java.net/browse/JDK-8059877 > 8059877: GWT branch frequencies pollution due to LF sharing >[2] https://bugs.openjdk.java.net/browse/JDK-8068915 >[3] https://bugs.openjdk.java.net/browse/JDK-8046703 > JEP 210: LambdaForm Reduction and Caching >_______________________________________________ >mlvm-dev mailing list >mlvm-dev at openjdk.java.net >http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev From vladimir.x.ivanov at oracle.com Tue Jan 20 12:40:50 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 20 Jan 2015 15:40:50 +0300 Subject: [9] RFR (M): 8063137: Never-taken branches should be pruned when GWT LambdaForms are shared In-Reply-To: References: <54B94766.2080102@oracle.com> Message-ID: <54BE4CD2.30805@oracle.com> Duncan, thanks a lot for giving it a try! If you plan to spend more time on it, please, apply 8068915 as well. I saw huge intermittent performance regressions due to continuous deoptimization storm. You can look into -XX:+LogCompilation output and look for repeated deoptimization events in steady state w/ Action_none. Also, there's deoptimization statistics in the log (at least, in jdk9). It's located right before compilation_log tag. Thanks again for the valuable feedback! Best regards, Vladimir Ivanov [1] http://cr.openjdk.java.net/~vlivanov/8068915/webrev.00 On 1/19/15 11:21 PM, MacGregor, Duncan (GE Energy Management) wrote: > Okay, I?ve done some tests of this with the micro benchmarks for our > language & runtime which show pretty much no change except for one test > which is now almost 3x slower. It uses nested loops to iterate over an > array and concatenate the string-like objects it contains, and replaces > elements with these new longer string-llike objects. It?s a bit of a > pathological case, and I haven?t seen the same sort of degradation in the > other benchmarks or in real applications, but I haven?t done serious > benchmarking of them with this change. > > I shall see if the test case can be reduced down to anything simpler while > still showing the same performance behaviour, and try add some compilation > logging options to narrow down what?s going on. > > Duncan. > > On 16/01/2015 17:16, "Vladimir Ivanov" > wrote: > >> http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/hotspot/ >> http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/jdk/ >> https://bugs.openjdk.java.net/browse/JDK-8063137 >> >> After GuardWithTest (GWT) LambdaForms became shared, profile pollution >> significantly distorted compilation decisions. It affected inlining and >> hindered some optimizations. It causes significant performance >> regressions for Nashorn (on Octane benchmarks). >> >> Inlining was fixed by 8059877 [1], but it didn't cover the case when a >> branch is never taken. It can cause missed optimization opportunity, and >> not just increase in code size. For example, non-pruned branch can break >> escape analysis. >> >> Currently, there are 2 problems: >> - branch frequencies profile pollution >> - deoptimization counts pollution >> >> Branch frequency pollution hides from JIT the fact that a branch is >> never taken. Since GWT LambdaForms (and hence their bytecode) are >> heavily shared, but the behavior is specific to MethodHandle, there's no >> way for JIT to understand how particular GWT instance behaves. 
>> >> The solution I propose is to do profiling in Java code and feed it to >> JIT. Every GWT MethodHandle holds an auxiliary array (int[2]) where >> profiling info is stored. Once JIT kicks in, it can retrieve these >> counts, if corresponding MethodHandle is a compile-time constant (and it >> is usually the case). To communicate the profile data from Java code to >> JIT, MethodHandleImpl::profileBranch() is used. >> >> If GWT MethodHandle isn't a compile-time constant, profiling should >> proceed. It happens when corresponding LambdaForm is already shared, for >> newly created GWT MethodHandles profiling can occur only in native code >> (dedicated nmethod for a single LambdaForm). So, when compilation of the >> whole MethodHandle chain is triggered, the profile should be already >> gathered. >> >> Overriding branch frequencies is not enough. Statistics on >> deoptimization events is also polluted. Even if a branch is never taken, >> JIT doesn't issue an uncommon trap there unless corresponding bytecode >> doesn't trap too much and doesn't cause too many recompiles. >> >> I added @IgnoreProfile and place it only on GWT LambdaForms. When JIT >> sees it on some method, Compile::too_many_traps & >> Compile::too_many_recompiles for that method always return false. It >> allows JIT to prune the branch based on custom profile and recompile the >> method, if the branch is visited. >> >> For now, I wanted to keep the fix very focused. The next thing I plan to >> do is to experiment with ignoring deoptimization counts for other >> LambdaForms which are heavily shared. I already saw problems caused by >> deoptimization counts pollution (see JDK-8068915 [2]). >> >> I plan to backport the fix into 8u40, once I finish extensive >> performance testing. >> >> Testing: JPRT, java/lang/invoke tests, nashorn (nashorn testsuite, >> Octane). >> >> Thanks! >> >> PS: as a summary, my experiments show that fixes for 8063137 & 8068915 >> [2] almost completely recovers peak performance after LambdaForm sharing >> [3]. There's one more problem left (non-inlined MethodHandle invocations >> are more expensive when LFs are shared), but it's a story for another day. >> >> Best regards, >> Vladimir Ivanov >> >> [1] https://bugs.openjdk.java.net/browse/JDK-8059877 >> 8059877: GWT branch frequencies pollution due to LF sharing >> [2] https://bugs.openjdk.java.net/browse/JDK-8068915 >> [3] https://bugs.openjdk.java.net/browse/JDK-8046703 >> JEP 210: LambdaForm Reduction and Caching >> _______________________________________________ >> mlvm-dev mailing list >> mlvm-dev at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev > > _______________________________________________ > mlvm-dev mailing list > mlvm-dev at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev > From duncan.macgregor at ge.com Tue Jan 20 13:09:45 2015 From: duncan.macgregor at ge.com (MacGregor, Duncan (GE Energy Management)) Date: Tue, 20 Jan 2015 13:09:45 +0000 Subject: [9] RFR (M): 8063137: Never-taken branches should be pruned when GWT LambdaForms are shared In-Reply-To: <54BE4CD2.30805@oracle.com> References: <54B94766.2080102@oracle.com> <54BE4CD2.30805@oracle.com> Message-ID: I?ll apply that patch and try to run more tests this afternoon. On 20/01/2015 12:40, "Vladimir Ivanov" wrote: >Duncan, thanks a lot for giving it a try! > >If you plan to spend more time on it, please, apply 8068915 as well. I >saw huge intermittent performance regressions due to continuous >deoptimization storm. 
You can look into -XX:+LogCompilation output and >look for repeated deoptimization events in steady state w/ Action_none. >Also, there's deoptimization statistics in the log (at least, in jdk9). >It's located right before compilation_log tag. > >Thanks again for the valuable feedback! > >Best regards, >Vladimir Ivanov > >[1] http://cr.openjdk.java.net/~vlivanov/8068915/webrev.00 > >On 1/19/15 11:21 PM, MacGregor, Duncan (GE Energy Management) wrote: >> Okay, I?ve done some tests of this with the micro benchmarks for our >> language & runtime which show pretty much no change except for one test >> which is now almost 3x slower. It uses nested loops to iterate over an >> array and concatenate the string-like objects it contains, and replaces >> elements with these new longer string-llike objects. It?s a bit of a >> pathological case, and I haven?t seen the same sort of degradation in >>the >> other benchmarks or in real applications, but I haven?t done serious >> benchmarking of them with this change. >> >> I shall see if the test case can be reduced down to anything simpler >>while >> still showing the same performance behaviour, and try add some >>compilation >> logging options to narrow down what?s going on. >> >> Duncan. >> >> On 16/01/2015 17:16, "Vladimir Ivanov" >> wrote: >> >>> http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/hotspot/ >>> http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/jdk/ >>> https://bugs.openjdk.java.net/browse/JDK-8063137 >>> >>> After GuardWithTest (GWT) LambdaForms became shared, profile pollution >>> significantly distorted compilation decisions. It affected inlining and >>> hindered some optimizations. It causes significant performance >>> regressions for Nashorn (on Octane benchmarks). >>> >>> Inlining was fixed by 8059877 [1], but it didn't cover the case when a >>> branch is never taken. It can cause missed optimization opportunity, >>>and >>> not just increase in code size. For example, non-pruned branch can >>>break >>> escape analysis. >>> >>> Currently, there are 2 problems: >>> - branch frequencies profile pollution >>> - deoptimization counts pollution >>> >>> Branch frequency pollution hides from JIT the fact that a branch is >>> never taken. Since GWT LambdaForms (and hence their bytecode) are >>> heavily shared, but the behavior is specific to MethodHandle, there's >>>no >>> way for JIT to understand how particular GWT instance behaves. >>> >>> The solution I propose is to do profiling in Java code and feed it to >>> JIT. Every GWT MethodHandle holds an auxiliary array (int[2]) where >>> profiling info is stored. Once JIT kicks in, it can retrieve these >>> counts, if corresponding MethodHandle is a compile-time constant (and >>>it >>> is usually the case). To communicate the profile data from Java code to >>> JIT, MethodHandleImpl::profileBranch() is used. >>> >>> If GWT MethodHandle isn't a compile-time constant, profiling should >>> proceed. It happens when corresponding LambdaForm is already shared, >>>for >>> newly created GWT MethodHandles profiling can occur only in native code >>> (dedicated nmethod for a single LambdaForm). So, when compilation of >>>the >>> whole MethodHandle chain is triggered, the profile should be already >>> gathered. >>> >>> Overriding branch frequencies is not enough. Statistics on >>> deoptimization events is also polluted. Even if a branch is never >>>taken, >>> JIT doesn't issue an uncommon trap there unless corresponding bytecode >>> doesn't trap too much and doesn't cause too many recompiles. 
>>> >>> I added @IgnoreProfile and place it only on GWT LambdaForms. When JIT >>> sees it on some method, Compile::too_many_traps & >>> Compile::too_many_recompiles for that method always return false. It >>> allows JIT to prune the branch based on custom profile and recompile >>>the >>> method, if the branch is visited. >>> >>> For now, I wanted to keep the fix very focused. The next thing I plan >>>to >>> do is to experiment with ignoring deoptimization counts for other >>> LambdaForms which are heavily shared. I already saw problems caused by >>> deoptimization counts pollution (see JDK-8068915 [2]). >>> >>> I plan to backport the fix into 8u40, once I finish extensive >>> performance testing. >>> >>> Testing: JPRT, java/lang/invoke tests, nashorn (nashorn testsuite, >>> Octane). >>> >>> Thanks! >>> >>> PS: as a summary, my experiments show that fixes for 8063137 & 8068915 >>> [2] almost completely recovers peak performance after LambdaForm >>>sharing >>> [3]. There's one more problem left (non-inlined MethodHandle >>>invocations >>> are more expensive when LFs are shared), but it's a story for another >>>day. >>> >>> Best regards, >>> Vladimir Ivanov >>> >>> [1] https://bugs.openjdk.java.net/browse/JDK-8059877 >>> 8059877: GWT branch frequencies pollution due to LF sharing >>> [2] https://bugs.openjdk.java.net/browse/JDK-8068915 >>> [3] https://bugs.openjdk.java.net/browse/JDK-8046703 >>> JEP 210: LambdaForm Reduction and Caching >>> _______________________________________________ >>> mlvm-dev mailing list >>> mlvm-dev at openjdk.java.net >>> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev >> >> _______________________________________________ >> mlvm-dev mailing list >> mlvm-dev at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev >> >_______________________________________________ >mlvm-dev mailing list >mlvm-dev at openjdk.java.net >http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev From duncan.macgregor at ge.com Tue Jan 20 17:14:00 2015 From: duncan.macgregor at ge.com (MacGregor, Duncan (GE Energy Management)) Date: Tue, 20 Jan 2015 17:14:00 +0000 Subject: [9] RFR (M): 8063137: Never-taken branches should be pruned when GWT LambdaForms are shared In-Reply-To: <54BE4CD2.30805@oracle.com> References: <54B94766.2080102@oracle.com> <54BE4CD2.30805@oracle.com> Message-ID: Hmm, 8068915 hasn?t fixed it, but running fewer benchmarks seems to make the problem go away, so it looks like there?s something going wrong fairly deep in our runtime. Trying the full suite with compilation logging enabled now to see if I can find a smoking gun. On 20/01/2015 12:40, "Vladimir Ivanov" wrote: >Duncan, thanks a lot for giving it a try! > >If you plan to spend more time on it, please, apply 8068915 as well. I >saw huge intermittent performance regressions due to continuous >deoptimization storm. You can look into -XX:+LogCompilation output and >look for repeated deoptimization events in steady state w/ Action_none. >Also, there's deoptimization statistics in the log (at least, in jdk9). >It's located right before compilation_log tag. > >Thanks again for the valuable feedback! > >Best regards, >Vladimir Ivanov > >[1] http://cr.openjdk.java.net/~vlivanov/8068915/webrev.00 > >On 1/19/15 11:21 PM, MacGregor, Duncan (GE Energy Management) wrote: >> Okay, I?ve done some tests of this with the micro benchmarks for our >> language & runtime which show pretty much no change except for one test >> which is now almost 3x slower. 
It uses nested loops to iterate over an >> array and concatenate the string-like objects it contains, and replaces >> elements with these new longer string-llike objects. It?s a bit of a >> pathological case, and I haven?t seen the same sort of degradation in >>the >> other benchmarks or in real applications, but I haven?t done serious >> benchmarking of them with this change. >> >> I shall see if the test case can be reduced down to anything simpler >>while >> still showing the same performance behaviour, and try add some >>compilation >> logging options to narrow down what?s going on. >> >> Duncan. >> >> On 16/01/2015 17:16, "Vladimir Ivanov" >> wrote: >> >>> http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/hotspot/ >>> http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/jdk/ >>> https://bugs.openjdk.java.net/browse/JDK-8063137 >>> >>> After GuardWithTest (GWT) LambdaForms became shared, profile pollution >>> significantly distorted compilation decisions. It affected inlining and >>> hindered some optimizations. It causes significant performance >>> regressions for Nashorn (on Octane benchmarks). >>> >>> Inlining was fixed by 8059877 [1], but it didn't cover the case when a >>> branch is never taken. It can cause missed optimization opportunity, >>>and >>> not just increase in code size. For example, non-pruned branch can >>>break >>> escape analysis. >>> >>> Currently, there are 2 problems: >>> - branch frequencies profile pollution >>> - deoptimization counts pollution >>> >>> Branch frequency pollution hides from JIT the fact that a branch is >>> never taken. Since GWT LambdaForms (and hence their bytecode) are >>> heavily shared, but the behavior is specific to MethodHandle, there's >>>no >>> way for JIT to understand how particular GWT instance behaves. >>> >>> The solution I propose is to do profiling in Java code and feed it to >>> JIT. Every GWT MethodHandle holds an auxiliary array (int[2]) where >>> profiling info is stored. Once JIT kicks in, it can retrieve these >>> counts, if corresponding MethodHandle is a compile-time constant (and >>>it >>> is usually the case). To communicate the profile data from Java code to >>> JIT, MethodHandleImpl::profileBranch() is used. >>> >>> If GWT MethodHandle isn't a compile-time constant, profiling should >>> proceed. It happens when corresponding LambdaForm is already shared, >>>for >>> newly created GWT MethodHandles profiling can occur only in native code >>> (dedicated nmethod for a single LambdaForm). So, when compilation of >>>the >>> whole MethodHandle chain is triggered, the profile should be already >>> gathered. >>> >>> Overriding branch frequencies is not enough. Statistics on >>> deoptimization events is also polluted. Even if a branch is never >>>taken, >>> JIT doesn't issue an uncommon trap there unless corresponding bytecode >>> doesn't trap too much and doesn't cause too many recompiles. >>> >>> I added @IgnoreProfile and place it only on GWT LambdaForms. When JIT >>> sees it on some method, Compile::too_many_traps & >>> Compile::too_many_recompiles for that method always return false. It >>> allows JIT to prune the branch based on custom profile and recompile >>>the >>> method, if the branch is visited. >>> >>> For now, I wanted to keep the fix very focused. The next thing I plan >>>to >>> do is to experiment with ignoring deoptimization counts for other >>> LambdaForms which are heavily shared. I already saw problems caused by >>> deoptimization counts pollution (see JDK-8068915 [2]). 
>>> >>> I plan to backport the fix into 8u40, once I finish extensive >>> performance testing. >>> >>> Testing: JPRT, java/lang/invoke tests, nashorn (nashorn testsuite, >>> Octane). >>> >>> Thanks! >>> >>> PS: as a summary, my experiments show that fixes for 8063137 & 8068915 >>> [2] almost completely recovers peak performance after LambdaForm >>>sharing >>> [3]. There's one more problem left (non-inlined MethodHandle >>>invocations >>> are more expensive when LFs are shared), but it's a story for another >>>day. >>> >>> Best regards, >>> Vladimir Ivanov >>> >>> [1] https://bugs.openjdk.java.net/browse/JDK-8059877 >>> 8059877: GWT branch frequencies pollution due to LF sharing >>> [2] https://bugs.openjdk.java.net/browse/JDK-8068915 >>> [3] https://bugs.openjdk.java.net/browse/JDK-8046703 >>> JEP 210: LambdaForm Reduction and Caching >>> _______________________________________________ >>> mlvm-dev mailing list >>> mlvm-dev at openjdk.java.net >>> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev >> >> _______________________________________________ >> mlvm-dev mailing list >> mlvm-dev at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev >> >_______________________________________________ >mlvm-dev mailing list >mlvm-dev at openjdk.java.net >http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev From vladimir.x.ivanov at oracle.com Tue Jan 20 19:09:11 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 20 Jan 2015 22:09:11 +0300 Subject: [9] RFR (M): 8063137: Never-taken branches should be pruned when GWT LambdaForms are shared In-Reply-To: <7B03B9FB-17B4-4AE0-92B8-F2DC5B231294@oracle.com> References: <54B94766.2080102@oracle.com> <7B03B9FB-17B4-4AE0-92B8-F2DC5B231294@oracle.com> Message-ID: <54BEA7D7.6080008@oracle.com> John, thanks for the review! Updated webrev: http://cr.openjdk.java.net/~vlivanov/8063137/webrev.01/hotspot http://cr.openjdk.java.net/~vlivanov/8063137/webrev.01/jdk See my answers inline. On 1/17/15 2:13 AM, John Rose wrote: > On Jan 16, 2015, at 9:16 AM, Vladimir Ivanov > > wrote: >> >> http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/hotspot/ >> http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/jdk/ >> https://bugs.openjdk.java.net/browse/JDK-8063137 >> ... >> PS: as a summary, my experiments show that fixes for 8063137 & 8068915 >> [2] almost completely recovers peak performance after LambdaForm >> sharing [3]. There's one more problem left (non-inlined MethodHandle >> invocations are more expensive when LFs are shared), but it's a story >> for another day. > > This performance bump is excellent news. LFs are supposed to express > emergently common behaviors, like hidden classes. We are much closer to > that goal now. > > I'm glad to see that the library-assisted profiling turns out to be > relatively clean. > > In effect this restores the pre-LF CountingMethodHandle logic from 2011, > which was so beneficial in JDK 7: > http://hg.openjdk.java.net/jdk7u/jdk7u/jdk/file/02de5cdbef21/src/share/classes/java/lang/invoke/CountingMethodHandle.java > > I have some suggestions to make this version a little cleaner; see below. > > Starting with the JDK changes: > > In LambdaForm.java, I'm feeling flag pressure from all the little > boolean fields and constructor parameters. > > (Is it time to put in a bit-encoded field "private byte > LambdaForm.flags", or do we wait for another boolean to come along? But > see next questions, which are more important.) > > What happens when a GWT LF gets inlined into a larger LF? 
Then there > might be two or more selectAlternative calls. > Will this confuse anything or will it Just Work? The combined LF will > get profiled as usual, and the selectAlternative calls will also collect > profile (or not?). > > This leads to another question: Why have a boolean 'isGWT' at all? Why > not just check for one or more occurrence of selectAlternative, and > declare that those guys override (some of) the profiling. Something like: > > -+ if (PROFILE_GWT && lambdaForm.isGWT) > ++ if (PROFILE_GWT && lambdaForm.containsFunction(NF_selectAlternative)) > (...where LF.containsFunction(NamedFunction) is a variation of > LF.contains(Name).) > > I suppose the answer may be that you want to inline GWTs (if ever) into > customized code where the JVM profiling should get maximum benefit. In > that case case you might want to set the boolean to "false" to > distinguish "immature" GWT combinators from customized ones. > > If that's the case, perhaps the real boolean flag you want is not > 'isGWT' but 'sharedProfile' or 'immature' or some such, or (inverting) > 'customized'. (I like the feel of a 'customized' flag.) Then > @IgnoreProfile would get attached to a LF that (a ) contains > selectAlternative and (b ) is marked as non-customized/immature/shared. > You might also want to adjust the call to 'profileBranch' based on > whether the containing LF was shared or customized. > > What I'm mainly poking at here is that 'isGWT' is not informative about > the intended use of the flag. I agree. It was an interim solution. Initially, I planned to introduce customization and guide the logic based on that property. But it's not there yet and I needed something for GWT case. Unfortunately, I missed the case when GWT is edited. In that case, isGWT flag is missed and no annotation is set. So, I removed isGWT flag and introduced a check for selectAlternative occurence in LambdaForm shape, as you suggested. > In 'updateCounters', if the counter overflows, you'll get continuous > creation of ArithmeticExceptions. Will that optimize or will it cause a > permanent slowdown? Consider a hack like this on the exception path: > counters[idx] = Integer.MAX_VALUE / 2; I had an impression that VM optimizes overflows in Math.exact* intrinsics, but it's not the case - it always inserts an uncommon trap. I used the workaround you proposed. > On the Name Bikeshed: It looks like @IgnoreProfile (ignore_profile in > the VM) promises too much "ignorance", since it suppresses branch counts > and traps, but allows type profiles to be consulted. Maybe something > positive like "@ManyTraps" or "@SharedMegamorphic"? (It's just a name, > and this is just a suggestion.) What do you think about @LambdaForm.Shared? > Going to the JVM: > > In library_call.cpp, I think you should change the assert to a guard: > -+ assert(aobj->length() == 2, ""); > ++ && aobj->length() == 2) { Done. > In Parse::dynamic_branch_prediction, the mere presence of the Opaque4 > node is enough to trigger replacement of profiling. I think there > should *not* be a test of method()->ignore_profile(). That should > provide better integration between the two sources of profile data to > JVM profiling? Done. > Also, I think the name 'Opaque4Node' is way too? opaque. Suggest > 'ProfileBranchNode', since that's exactly what it does. Done. > Suggest changing the log element "profile_branch" to "observe > source='profileBranch'", to make a better hint as to the source of the info. Done. 
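For reference, the saturating counter update discussed above amounts to something like the following sketch (illustrative only, not the exact webrev code; which array slot counts the taken branch is an assumption here):

    // counters is the per-GWT int[2]; on overflow the count saturates
    // instead of throwing again on every subsequent invocation.
    static void updateCounters(boolean result, int[] counters) {
        int idx = result ? 1 : 0;   // assumed: slot 1 counts the taken branch
        try {
            counters[idx] = Math.addExact(counters[idx], 1);
        } catch (ArithmeticException e) {
            // Saturate well below Integer.MAX_VALUE so the exception path
            // is not hit repeatedly once the counter is full.
            counters[idx] = Integer.MAX_VALUE / 2;
        }
    }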
Best regards, Vladimir Ivanov From duncan.macgregor at ge.com Tue Jan 20 20:11:42 2015 From: duncan.macgregor at ge.com (MacGregor, Duncan (GE Energy Management)) Date: Tue, 20 Jan 2015 20:11:42 +0000 Subject: [9] RFR (M): 8063137: Never-taken branches should be pruned when GWT LambdaForms are shared In-Reply-To: References: <54B94766.2080102@oracle.com> <54BE4CD2.30805@oracle.com> Message-ID: So, very few deopt events in the logs (exactly 4 in fact, in both the performant and non-performant cases, and for the exact same methods), but in the case where performance has degraded I only see an initial compilation for the problem method and not the later inlining I see in the performant case. I?ll dig through the rest of the logs and try see if there?s any differences leading up to the inlining. On the bright side while going through the logs I did spot one obvious snafu in our code (unnecessary MutableCallSite usage), and have got a 2.5 times speed up on another benchmark, so I?m not too unhappy. :-) On 20/01/2015 17:14, "MacGregor, Duncan (GE Energy Management)" wrote: >Hmm, 8068915 hasn?t fixed it, but running fewer benchmarks seems to make >the problem go away, so it looks like there?s something going wrong fairly >deep in our runtime. Trying the full suite with compilation logging >enabled now to see if I can find a smoking gun. > >On 20/01/2015 12:40, "Vladimir Ivanov" >wrote: > >>Duncan, thanks a lot for giving it a try! >> >>If you plan to spend more time on it, please, apply 8068915 as well. I >>saw huge intermittent performance regressions due to continuous >>deoptimization storm. You can look into -XX:+LogCompilation output and >>look for repeated deoptimization events in steady state w/ Action_none. >>Also, there's deoptimization statistics in the log (at least, in jdk9). >>It's located right before compilation_log tag. >> >>Thanks again for the valuable feedback! >> >>Best regards, >>Vladimir Ivanov >> >>[1] http://cr.openjdk.java.net/~vlivanov/8068915/webrev.00 >> >>On 1/19/15 11:21 PM, MacGregor, Duncan (GE Energy Management) wrote: >>> Okay, I?ve done some tests of this with the micro benchmarks for our >>> language & runtime which show pretty much no change except for one test >>> which is now almost 3x slower. It uses nested loops to iterate over an >>> array and concatenate the string-like objects it contains, and replaces >>> elements with these new longer string-llike objects. It?s a bit of a >>> pathological case, and I haven?t seen the same sort of degradation in >>>the >>> other benchmarks or in real applications, but I haven?t done serious >>> benchmarking of them with this change. >>> >>> I shall see if the test case can be reduced down to anything simpler >>>while >>> still showing the same performance behaviour, and try add some >>>compilation >>> logging options to narrow down what?s going on. >>> >>> Duncan. >>> >>> On 16/01/2015 17:16, "Vladimir Ivanov" >>> wrote: >>> >>>> http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/hotspot/ >>>> http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/jdk/ >>>> https://bugs.openjdk.java.net/browse/JDK-8063137 >>>> >>>> After GuardWithTest (GWT) LambdaForms became shared, profile pollution >>>> significantly distorted compilation decisions. It affected inlining >>>>and >>>> hindered some optimizations. It causes significant performance >>>> regressions for Nashorn (on Octane benchmarks). >>>> >>>> Inlining was fixed by 8059877 [1], but it didn't cover the case when a >>>> branch is never taken. 
It can cause missed optimization opportunity, >>>>and >>>> not just increase in code size. For example, non-pruned branch can >>>>break >>>> escape analysis. >>>> >>>> Currently, there are 2 problems: >>>> - branch frequencies profile pollution >>>> - deoptimization counts pollution >>>> >>>> Branch frequency pollution hides from JIT the fact that a branch is >>>> never taken. Since GWT LambdaForms (and hence their bytecode) are >>>> heavily shared, but the behavior is specific to MethodHandle, there's >>>>no >>>> way for JIT to understand how particular GWT instance behaves. >>>> >>>> The solution I propose is to do profiling in Java code and feed it to >>>> JIT. Every GWT MethodHandle holds an auxiliary array (int[2]) where >>>> profiling info is stored. Once JIT kicks in, it can retrieve these >>>> counts, if corresponding MethodHandle is a compile-time constant (and >>>>it >>>> is usually the case). To communicate the profile data from Java code >>>>to >>>> JIT, MethodHandleImpl::profileBranch() is used. >>>> >>>> If GWT MethodHandle isn't a compile-time constant, profiling should >>>> proceed. It happens when corresponding LambdaForm is already shared, >>>>for >>>> newly created GWT MethodHandles profiling can occur only in native >>>>code >>>> (dedicated nmethod for a single LambdaForm). So, when compilation of >>>>the >>>> whole MethodHandle chain is triggered, the profile should be already >>>> gathered. >>>> >>>> Overriding branch frequencies is not enough. Statistics on >>>> deoptimization events is also polluted. Even if a branch is never >>>>taken, >>>> JIT doesn't issue an uncommon trap there unless corresponding bytecode >>>> doesn't trap too much and doesn't cause too many recompiles. >>>> >>>> I added @IgnoreProfile and place it only on GWT LambdaForms. When JIT >>>> sees it on some method, Compile::too_many_traps & >>>> Compile::too_many_recompiles for that method always return false. It >>>> allows JIT to prune the branch based on custom profile and recompile >>>>the >>>> method, if the branch is visited. >>>> >>>> For now, I wanted to keep the fix very focused. The next thing I plan >>>>to >>>> do is to experiment with ignoring deoptimization counts for other >>>> LambdaForms which are heavily shared. I already saw problems caused by >>>> deoptimization counts pollution (see JDK-8068915 [2]). >>>> >>>> I plan to backport the fix into 8u40, once I finish extensive >>>> performance testing. >>>> >>>> Testing: JPRT, java/lang/invoke tests, nashorn (nashorn testsuite, >>>> Octane). >>>> >>>> Thanks! >>>> >>>> PS: as a summary, my experiments show that fixes for 8063137 & 8068915 >>>> [2] almost completely recovers peak performance after LambdaForm >>>>sharing >>>> [3]. There's one more problem left (non-inlined MethodHandle >>>>invocations >>>> are more expensive when LFs are shared), but it's a story for another >>>>day. 
>>>> >>>> Best regards, >>>> Vladimir Ivanov >>>> >>>> [1] https://bugs.openjdk.java.net/browse/JDK-8059877 >>>> 8059877: GWT branch frequencies pollution due to LF sharing >>>> [2] https://bugs.openjdk.java.net/browse/JDK-8068915 >>>> [3] https://bugs.openjdk.java.net/browse/JDK-8046703 >>>> JEP 210: LambdaForm Reduction and Caching >>>> _______________________________________________ >>>> mlvm-dev mailing list >>>> mlvm-dev at openjdk.java.net >>>> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev >>> >>> _______________________________________________ >>> mlvm-dev mailing list >>> mlvm-dev at openjdk.java.net >>> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev >>> >>_______________________________________________ >>mlvm-dev mailing list >>mlvm-dev at openjdk.java.net >>http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev > >_______________________________________________ >mlvm-dev mailing list >mlvm-dev at openjdk.java.net >http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev From duncan.macgregor at ge.com Wed Jan 21 10:39:54 2015 From: duncan.macgregor at ge.com (MacGregor, Duncan (GE Energy Management)) Date: Wed, 21 Jan 2015 10:39:54 +0000 Subject: [9] RFR (M): 8063137: Never-taken branches should be pruned when GWT LambdaForms are shared In-Reply-To: References: <54B94766.2080102@oracle.com> Message-ID: This version seems to have inconsistent removal of ignore profile in the hotspot patch. It?s no longer added to vmSymbols but is still referenced in classFileParser. On 19/01/2015 20:21, "MacGregor, Duncan (GE Energy Management)" wrote: >Okay, I?ve done some tests of this with the micro benchmarks for our >language & runtime which show pretty much no change except for one test >which is now almost 3x slower. It uses nested loops to iterate over an >array and concatenate the string-like objects it contains, and replaces >elements with these new longer string-llike objects. It?s a bit of a >pathological case, and I haven?t seen the same sort of degradation in the >other benchmarks or in real applications, but I haven?t done serious >benchmarking of them with this change. > >I shall see if the test case can be reduced down to anything simpler while >still showing the same performance behaviour, and try add some compilation >logging options to narrow down what?s going on. > >Duncan. > >On 16/01/2015 17:16, "Vladimir Ivanov" >wrote: > >>http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/hotspot/ >>http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/jdk/ >>https://bugs.openjdk.java.net/browse/JDK-8063137 >> >>After GuardWithTest (GWT) LambdaForms became shared, profile pollution >>significantly distorted compilation decisions. It affected inlining and >>hindered some optimizations. It causes significant performance >>regressions for Nashorn (on Octane benchmarks). >> >>Inlining was fixed by 8059877 [1], but it didn't cover the case when a >>branch is never taken. It can cause missed optimization opportunity, and >>not just increase in code size. For example, non-pruned branch can break >>escape analysis. >> >>Currently, there are 2 problems: >> - branch frequencies profile pollution >> - deoptimization counts pollution >> >>Branch frequency pollution hides from JIT the fact that a branch is >>never taken. Since GWT LambdaForms (and hence their bytecode) are >>heavily shared, but the behavior is specific to MethodHandle, there's no >>way for JIT to understand how particular GWT instance behaves. 
>> >>The solution I propose is to do profiling in Java code and feed it to >>JIT. Every GWT MethodHandle holds an auxiliary array (int[2]) where >>profiling info is stored. Once JIT kicks in, it can retrieve these >>counts, if corresponding MethodHandle is a compile-time constant (and it >>is usually the case). To communicate the profile data from Java code to >>JIT, MethodHandleImpl::profileBranch() is used. >> >>If GWT MethodHandle isn't a compile-time constant, profiling should >>proceed. It happens when corresponding LambdaForm is already shared, for >>newly created GWT MethodHandles profiling can occur only in native code >>(dedicated nmethod for a single LambdaForm). So, when compilation of the >>whole MethodHandle chain is triggered, the profile should be already >>gathered. >> >>Overriding branch frequencies is not enough. Statistics on >>deoptimization events is also polluted. Even if a branch is never taken, >>JIT doesn't issue an uncommon trap there unless corresponding bytecode >>doesn't trap too much and doesn't cause too many recompiles. >> >>I added @IgnoreProfile and place it only on GWT LambdaForms. When JIT >>sees it on some method, Compile::too_many_traps & >>Compile::too_many_recompiles for that method always return false. It >>allows JIT to prune the branch based on custom profile and recompile the >>method, if the branch is visited. >> >>For now, I wanted to keep the fix very focused. The next thing I plan to >>do is to experiment with ignoring deoptimization counts for other >>LambdaForms which are heavily shared. I already saw problems caused by >>deoptimization counts pollution (see JDK-8068915 [2]). >> >>I plan to backport the fix into 8u40, once I finish extensive >>performance testing. >> >>Testing: JPRT, java/lang/invoke tests, nashorn (nashorn testsuite, >>Octane). >> >>Thanks! >> >>PS: as a summary, my experiments show that fixes for 8063137 & 8068915 >>[2] almost completely recovers peak performance after LambdaForm sharing >>[3]. There's one more problem left (non-inlined MethodHandle invocations >>are more expensive when LFs are shared), but it's a story for another >>day. >> >>Best regards, >>Vladimir Ivanov >> >>[1] https://bugs.openjdk.java.net/browse/JDK-8059877 >> 8059877: GWT branch frequencies pollution due to LF sharing >>[2] https://bugs.openjdk.java.net/browse/JDK-8068915 >>[3] https://bugs.openjdk.java.net/browse/JDK-8046703 >> JEP 210: LambdaForm Reduction and Caching >>_______________________________________________ >>mlvm-dev mailing list >>mlvm-dev at openjdk.java.net >>http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev > >_______________________________________________ >mlvm-dev mailing list >mlvm-dev at openjdk.java.net >http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev From vladimir.x.ivanov at oracle.com Wed Jan 21 11:41:15 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 21 Jan 2015 14:41:15 +0300 Subject: [9] RFR (M): 8063137: Never-taken branches should be pruned when GWT LambdaForms are shared In-Reply-To: References: <54B94766.2080102@oracle.com> Message-ID: <54BF905B.7020407@oracle.com> Duncan, sorry for that. Updated webrev inplace. Best regards, Vladimir Ivanov On 1/21/15 1:39 PM, MacGregor, Duncan (GE Energy Management) wrote: > This version seems to have inconsistent removal of ignore profile in the > hotspot patch. It?s no longer added to vmSymbols but is still referenced > in classFileParser. 
> > On 19/01/2015 20:21, "MacGregor, Duncan (GE Energy Management)" > wrote: > >> Okay, I?ve done some tests of this with the micro benchmarks for our >> language & runtime which show pretty much no change except for one test >> which is now almost 3x slower. It uses nested loops to iterate over an >> array and concatenate the string-like objects it contains, and replaces >> elements with these new longer string-llike objects. It?s a bit of a >> pathological case, and I haven?t seen the same sort of degradation in the >> other benchmarks or in real applications, but I haven?t done serious >> benchmarking of them with this change. >> >> I shall see if the test case can be reduced down to anything simpler while >> still showing the same performance behaviour, and try add some compilation >> logging options to narrow down what?s going on. >> >> Duncan. >> >> On 16/01/2015 17:16, "Vladimir Ivanov" >> wrote: >> >>> http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/hotspot/ >>> http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/jdk/ >>> https://bugs.openjdk.java.net/browse/JDK-8063137 >>> >>> After GuardWithTest (GWT) LambdaForms became shared, profile pollution >>> significantly distorted compilation decisions. It affected inlining and >>> hindered some optimizations. It causes significant performance >>> regressions for Nashorn (on Octane benchmarks). >>> >>> Inlining was fixed by 8059877 [1], but it didn't cover the case when a >>> branch is never taken. It can cause missed optimization opportunity, and >>> not just increase in code size. For example, non-pruned branch can break >>> escape analysis. >>> >>> Currently, there are 2 problems: >>> - branch frequencies profile pollution >>> - deoptimization counts pollution >>> >>> Branch frequency pollution hides from JIT the fact that a branch is >>> never taken. Since GWT LambdaForms (and hence their bytecode) are >>> heavily shared, but the behavior is specific to MethodHandle, there's no >>> way for JIT to understand how particular GWT instance behaves. >>> >>> The solution I propose is to do profiling in Java code and feed it to >>> JIT. Every GWT MethodHandle holds an auxiliary array (int[2]) where >>> profiling info is stored. Once JIT kicks in, it can retrieve these >>> counts, if corresponding MethodHandle is a compile-time constant (and it >>> is usually the case). To communicate the profile data from Java code to >>> JIT, MethodHandleImpl::profileBranch() is used. >>> >>> If GWT MethodHandle isn't a compile-time constant, profiling should >>> proceed. It happens when corresponding LambdaForm is already shared, for >>> newly created GWT MethodHandles profiling can occur only in native code >>> (dedicated nmethod for a single LambdaForm). So, when compilation of the >>> whole MethodHandle chain is triggered, the profile should be already >>> gathered. >>> >>> Overriding branch frequencies is not enough. Statistics on >>> deoptimization events is also polluted. Even if a branch is never taken, >>> JIT doesn't issue an uncommon trap there unless corresponding bytecode >>> doesn't trap too much and doesn't cause too many recompiles. >>> >>> I added @IgnoreProfile and place it only on GWT LambdaForms. When JIT >>> sees it on some method, Compile::too_many_traps & >>> Compile::too_many_recompiles for that method always return false. It >>> allows JIT to prune the branch based on custom profile and recompile the >>> method, if the branch is visited. >>> >>> For now, I wanted to keep the fix very focused. 
The next thing I plan to >>> do is to experiment with ignoring deoptimization counts for other >>> LambdaForms which are heavily shared. I already saw problems caused by >>> deoptimization counts pollution (see JDK-8068915 [2]). >>> >>> I plan to backport the fix into 8u40, once I finish extensive >>> performance testing. >>> >>> Testing: JPRT, java/lang/invoke tests, nashorn (nashorn testsuite, >>> Octane). >>> >>> Thanks! >>> >>> PS: as a summary, my experiments show that fixes for 8063137 & 8068915 >>> [2] almost completely recovers peak performance after LambdaForm sharing >>> [3]. There's one more problem left (non-inlined MethodHandle invocations >>> are more expensive when LFs are shared), but it's a story for another >>> day. >>> >>> Best regards, >>> Vladimir Ivanov >>> >>> [1] https://bugs.openjdk.java.net/browse/JDK-8059877 >>> 8059877: GWT branch frequencies pollution due to LF sharing >>> [2] https://bugs.openjdk.java.net/browse/JDK-8068915 >>> [3] https://bugs.openjdk.java.net/browse/JDK-8046703 >>> JEP 210: LambdaForm Reduction and Caching >>> _______________________________________________ >>> mlvm-dev mailing list >>> mlvm-dev at openjdk.java.net >>> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev >> >> _______________________________________________ >> mlvm-dev mailing list >> mlvm-dev at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev > From marcus.lagergren at oracle.com Wed Jan 21 13:48:33 2015 From: marcus.lagergren at oracle.com (Marcus Lagergren) Date: Wed, 21 Jan 2015 14:48:33 +0100 Subject: JFokus 2015 - the VM Tech Day In-Reply-To: <36D2A3B5-A5ED-421A-8DAC-379712002366@oracle.com> References: <36D2A3B5-A5ED-421A-8DAC-379712002366@oracle.com> Message-ID: Btw, I have a few 50% discounts left for the VM tech day. If you are interested, please e-mail me directly! /Marcus > On 19 Jan 2015, at 10:58, Marcus Lagergren wrote: > > And to further clarify things - you can attend _only_ the VM Tech day / tech summit, should you so desire, and skip the rest of the JFokus conference. (What a strange thing to do, given the quality of JFokus, but I can?t be the one questioning your priorities here) > > (http://www.jfokus.se/jfokus/register.jsp ) > > /M > >> On 18 Jan 2015, at 22:54, Marcus Lagergren > wrote: >> >> Greetings community members! >> >> Here is something that I'm sure you'll find interesting. >> >> I want to advertise the upcoming "VM tech day? event, scheduled to >> take place February 2, 2015 at the JFokus conference in >> Stockholm. Sorry I am on a bit of a short notice here, but finalizing >> the speaker list took us a bit more time than expected. >> >> The VM tech day is a mini-track that runs the first day of the JFokus >> conference. This is its schedule: >> https://www.jfokus.se/jfokus/jvmtech.jsp >> >> After some rather challenging months of jigsaw puzzles, it is with >> great pleasure that I can announce that our speaker line up is now >> complete - and it is great indeed! We are talking 100% gurus, >> prophets, ninjas, rock stars, and all other similar terms that >> normally gets your resume binned if it passes my desk. But in this >> case the labels are true. We have strictly top names from both the >> commercial world and from academia ready to take you on a great >> ride. >> >> So what is the VM tech day? For those of you familiar with the JVM >> Language Summit (JVMLS) that usually takes place in Santa Clara in >> the summers, the format is similar. 
It?s the usual deal: anyone >> morbidly interested in runtime internals, code generation, polyglot >> programming and the complexities of language implementation, should >> find a veritable gold mine of stimulating conversation and knowledge >> transfer here. What is different from a typical JVMLS (except for the >> shorter duration), is that we have widened the scope a bit to include >> several runtimes, language implementation issues and polyglot >> problems. >> >> There will be six scheduled sessions and plenty of time for breakouts >> and discussions. We will also heavily encourage audience interaction >> and participation. >> >> The JFokus VM tech day is opened by John Rose. I am sure John needs >> no introduction to the subscribers of this list. With advanced OpenJDK >> projects like Valhalla and Panama booting up, John will discuss what >> the JVM has in store for the future. >> >> Other speakers include the tireless Charlie Nutter from Red Hat, the >> formidable Remi Forax, the brilliant Vyacheslav Egorov of Google v8 >> fame, the esteemed Dan Heidinga from IBM and the good looking Attila >> Szegedi from Oracle. >> >> We also have plenty of non-speaking celebrity participants in the >> audience, for example Fredrik ?hrstr?m: invokedynamic specification >> wizard extraordinaire and architect behind the new OpenJDK build >> system. Stop by and get autographs ;) >> >> Thusly: if you are attending JFokus, or if you are making up your mind >> about attending it right now, the VM tech summit is definitely >> something anyone subscribing to mlvm-dev wouldn't want to miss. The >> cross-platform/cross-technology/cross-company focus that we have tried >> very hard to create will without a doubt be ultra stimulating. Of that >> you can be sure. >> >> Please help us spread the word in whatever forums you deem >> appropriate! Talk to you friends! Tweet links to this post! Yell from >> your cubicle soap boxes across the neverending seas of fluorescent >> lights! >> >> Any further questions you may have about the event, not answered by >> the web pages, can be directed either to me (@lagergren) or Mattias >> Karlsson (@matkar) or as replies to this e-mail thread. >> >> On behalf of JFokus / VM Tech Day 2015 >> Marcus Lagergren >> Master of ceremonies (or something) >> >> _______________________________________________ >> mlvm-dev mailing list >> mlvm-dev at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev > > _______________________________________________ > mlvm-dev mailing list > mlvm-dev at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev -------------- next part -------------- An HTML attachment was scrubbed... URL: From vladimir.x.ivanov at oracle.com Wed Jan 21 16:25:18 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 21 Jan 2015 19:25:18 +0300 Subject: [9] RFR (M): 8069591: Customize LambdaForms which are invoked using MH.invoke/invokeExact Message-ID: <54BFD2EE.3060909@oracle.com> http://cr.openjdk.java.net/~vlivanov/8069591/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8069591 Overhead of non-inlined MH.invoke/invokeExact calls significantly increased with LambdaForm sharing. The cause is JIT compiler can't produce a single nmethod for the whole MethodHandle chain, so the execution is spread around numerous nmethods (1 per each MethodHandle in the chain). The longer the chain the larger overhead. The fix is to customize LambdaForms (create a dedicated LambdaForm for a MethodHandle). 
A per-MethodHandle count is introduced, which is incremented every time a MethodHandle is invoked using MethodHandle.invoke/invokeExact. Once CUSTOMIZE_THRESHOLD is reached for a particular MethodHandle, its LambdaForm is substituted with a customized one, which has its MethodHandle embedded. It allows the JIT to see the actual MethodHandle during compilation and produce more efficient code. This fix completely recovers Gbemu peak performance to the pre-LambdaForm sharing level. Testing: jck (api/java_lang/invoke), jdk/java/lang/invoke, nashorn tests, nashorn/octane Thanks! Best regards, Vladimir Ivanov From forax at univ-mlv.fr Wed Jan 21 17:31:05 2015 From: forax at univ-mlv.fr (Remi Forax) Date: Wed, 21 Jan 2015 18:31:05 +0100 Subject: [9] RFR (M): 8069591: Customize LambdaForms which are invoked using MH.invoke/invokeExact In-Reply-To: <54BFD2EE.3060909@oracle.com> References: <54BFD2EE.3060909@oracle.com> Message-ID: <54BFE259.1090402@univ-mlv.fr> Hi Vladimir, in Invokers.java, I think that checkCustomized should take an Object and not a MethodHandle exactly like getCallSiteTarget takes an Object and not a CallSite. in MethodHandle.java, customizationCount is declared as a byte and there is no check that the CUSTOMIZE_THRESHOLD is not greater than 127. cheers, Rémi On 01/21/2015 05:25 PM, Vladimir Ivanov wrote: > http://cr.openjdk.java.net/~vlivanov/8069591/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8069591 > > Overhead of non-inlined MH.invoke/invokeExact calls significantly > increased with LambdaForm sharing. The cause is JIT compiler can't > produce a single nmethod for the whole MethodHandle chain, so the > execution is spread around numerous nmethods (1 per each MethodHandle > in the chain). The longer the chain the larger overhead. > > The fix is to customize LambdaForms (create a dedicated LambdaForm for > a MethodHandle). Per-MethodHandle count is introduced, which is > incremented every time a MethodHandle is invoked using > MethodHandle.invoke/invokeExact. Once CUSTOMIZE_THRESHOLD is reached > for a particular MethodHandle, it's LambdaForm is substituted with a > customized one, which has it's MethodHandle embedded. It allows JIT to > see actual MethodHandle during compilation and produce more efficient > code. > > This fix completely recovers Gbemu peak performance to pre-LambdaForm > sharing level. > > Testing: jck (api/java_lang/invoke), jdk/java/lang/invoke, nashorn > tests, nashorn/octane > > Thanks! > > Best regards, > Vladimir Ivanov > _______________________________________________ > mlvm-dev mailing list > mlvm-dev at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev From john.r.rose at oracle.com Wed Jan 21 19:30:28 2015 From: john.r.rose at oracle.com (John Rose) Date: Wed, 21 Jan 2015 11:30:28 -0800 Subject: [9] RFR (M): 8069591: Customize LambdaForms which are invoked using MH.invoke/invokeExact In-Reply-To: <54BFE259.1090402@univ-mlv.fr> References: <54BFD2EE.3060909@oracle.com> <54BFE259.1090402@univ-mlv.fr> Message-ID: <3B4D19E0-8DA6-4FE1-BD77-E12E8BCF15EC@oracle.com> On Jan 21, 2015, at 9:31 AM, Remi Forax wrote: > > in Invokers.java, I think that checkCustomized should take an Object and not a MethodHandle > exactly like getCallSiteTarget takes an Object and not a CallSite. The use of erased types (any ref => Object) in the MH runtime is an artifact of bootstrapping difficulties, early in the project. I hope it is not necessary any more. That said, I agree that the pattern should be consistent.
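Rémi's point about the byte-sized counter is easier to see with a sketch (hypothetical code, not the actual MethodHandle.java changes): if the counter is read once, compared against the threshold, and only written back while still below it, it can neither wrap nor run past the byte range, even for a large CUSTOMIZE_THRESHOLD:

    // Invented class name; stands in for java.lang.invoke.MethodHandle here.
    class CustomizationCounterSketch {
        static final int CUSTOMIZE_THRESHOLD = 127;  // assumed upper bound, clamped to the byte range
        private byte customizationCount;

        void maybeCustomize() {
            byte count = customizationCount;         // single read of the racy field
            if (count >= CUSTOMIZE_THRESHOLD) {
                customize();                         // would swap in the customized LambdaForm
            } else {
                customizationCount = (byte) (count + 1);
            }
        }

        private void customize() { /* placeholder for LambdaForm customization */ }
    }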
Vladimir, would you please file a tracking bug for this cleanup, to change MH library functions to use stronger types instead of Object? > in MethodHandle.java, customizationCount is declared as a byte and there is no check that > the CUSTOMIZE_THRESHOLD is not greater than 127. Yes. Also, the maybeCustomize method has a race condition that could cause the counter to wrap. It shouldn't use "+=1" to increment; it should load the old counter value, test it, increment it (in a local), and then store the updated value. That is also one possible place to deal with jumbo CUSTOMIZE_THRESHOLD values. ? John -------------- next part -------------- An HTML attachment was scrubbed... URL: From vladimir.x.ivanov at oracle.com Thu Jan 22 17:56:30 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 22 Jan 2015 20:56:30 +0300 Subject: [9] RFR (M): 8069591: Customize LambdaForms which are invoked using MH.invoke/invokeExact In-Reply-To: <3B4D19E0-8DA6-4FE1-BD77-E12E8BCF15EC@oracle.com> References: <54BFD2EE.3060909@oracle.com> <54BFE259.1090402@univ-mlv.fr> <3B4D19E0-8DA6-4FE1-BD77-E12E8BCF15EC@oracle.com> Message-ID: <54C139CE.4000005@oracle.com> Remi, John, thanks for review! Updated webrev: http://cr.openjdk.java.net/~vlivanov/8069591/webrev.01/ This time I did additional testing (COMPILE_THRESHOLD > 0) and spotted a problem with MethodHandle.copyWith(): a MethodHandle can inherit customized LambdaForm this way. I could have added LambdaForm::uncustomize() call in evey Species_*::copyWith() method, but I decided to add it into MethodHandle constructor. Let me know if you think it's too intrusive. Also, I made DirectMethodHandles a special-case, since I don't see any benefit in customizing them. Best regards, Vladimir Ivanov On 1/21/15 10:30 PM, John Rose wrote: > On Jan 21, 2015, at 9:31 AM, Remi Forax wrote: >> >> in Invokers.java, I think that checkCustomized should take an Object and not a MethodHandle >> exactly like getCallSiteTarget takes an Object and not a CallSite. > > The use of erased types (any ref => Object) in the MH runtime is an artifact of bootstrapping difficulties, early in the project. I hope it is not necessary any more. That said, I agree that the pattern should be consistent. > > Vladimir, would you please file a tracking bug for this cleanup, to change MH library functions to use stronger types instead of Object? > >> in MethodHandle.java, customizationCount is declared as a byte and there is no check that >> the CUSTOMIZE_THRESHOLD is not greater than 127. > > Yes. Also, the maybeCustomize method has a race condition that could cause the counter to wrap. It shouldn't use "+=1" to increment; it should load the old counter value, test it, increment it (in a local), and then store the updated value. That is also one possible place to deal with jumbo CUSTOMIZE_THRESHOLD values. > > ? John > From vladimir.x.ivanov at oracle.com Thu Jan 22 18:21:52 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 22 Jan 2015 21:21:52 +0300 Subject: [9] RFR (M): 8069591: Customize LambdaForms which are invoked using MH.invoke/invokeExact In-Reply-To: <3B4D19E0-8DA6-4FE1-BD77-E12E8BCF15EC@oracle.com> References: <54BFD2EE.3060909@oracle.com> <54BFE259.1090402@univ-mlv.fr> <3B4D19E0-8DA6-4FE1-BD77-E12E8BCF15EC@oracle.com> Message-ID: <54C13FC0.1030608@oracle.com> >> in Invokers.java, I think that checkCustomized should take an Object and not a MethodHandle >> exactly like getCallSiteTarget takes an Object and not a CallSite. 
> > The use of erased types (any ref => Object) in the MH runtime is an artifact of bootstrapping difficulties, early in the project. I hope it is not necessary any more. That said, I agree that the pattern should be consistent. Sure. Here is it [1] Best regards, Vladimir Ivanov [1] https://bugs.openjdk.java.net/browse/JDK-8071368 From john.r.rose at oracle.com Thu Jan 22 23:30:59 2015 From: john.r.rose at oracle.com (John Rose) Date: Thu, 22 Jan 2015 15:30:59 -0800 Subject: [9] RFR (M): 8069591: Customize LambdaForms which are invoked using MH.invoke/invokeExact In-Reply-To: <54C139CE.4000005@oracle.com> References: <54BFD2EE.3060909@oracle.com> <54BFE259.1090402@univ-mlv.fr> <3B4D19E0-8DA6-4FE1-BD77-E12E8BCF15EC@oracle.com> <54C139CE.4000005@oracle.com> Message-ID: On Jan 22, 2015, at 9:56 AM, Vladimir Ivanov wrote: > > Remi, John, thanks for review! > > Updated webrev: > http://cr.openjdk.java.net/~vlivanov/8069591/webrev.01/ > > This time I did additional testing (COMPILE_THRESHOLD > 0) and spotted a problem with MethodHandle.copyWith(): a MethodHandle can inherit customized LambdaForm this way. I could have added LambdaForm::uncustomize() call in evey Species_*::copyWith() method, but I decided to add it into MethodHandle constructor. Let me know if you think it's too intrusive. It's OK to put it there. Now I'm worried that the new customization logic will defeat code sharing for invoked MHs, since uncustomize creates a new LF that is a duplicate of the original LF. That breaks the genetic link for children of the invoked MH, doesn't it? (I like the compileToBytecode call, if it is done on the original.) In fact, that is also a potential problem for the first version of your patch, also. Suggestion: Have every customized LF contain a direct link to its uncustomized original. Have uncustomize just return that same original, every time. Then, when using LF editor operations to derive new LFs, always have them extract the original before making a derivation. (Alternatively, have the LF editor caches be shared between original LFs and all their customized versions. But that doesn't save all the genetic links.) > Also, I made DirectMethodHandles a special-case, since I don't see any benefit in customizing them. The overriding method in DHM should be marked @Override, so that we know all the bits fit together. ? John From john.r.rose at oracle.com Fri Jan 23 01:31:59 2015 From: john.r.rose at oracle.com (John Rose) Date: Thu, 22 Jan 2015 17:31:59 -0800 Subject: [9] RFR (M): 8063137: Never-taken branches should be pruned when GWT LambdaForms are shared In-Reply-To: <54BEA7D7.6080008@oracle.com> References: <54B94766.2080102@oracle.com> <7B03B9FB-17B4-4AE0-92B8-F2DC5B231294@oracle.com> <54BEA7D7.6080008@oracle.com> Message-ID: <5BA1E369-ED87-4EBD-8408-B73B726D91BD@oracle.com> On Jan 20, 2015, at 11:09 AM, Vladimir Ivanov wrote: > >> What I'm mainly poking at here is that 'isGWT' is not informative about >> the intended use of the flag. > I agree. It was an interim solution. Initially, I planned to introduce customization and guide the logic based on that property. But it's not there yet and I needed something for GWT case. Unfortunately, I missed the case when GWT is edited. In that case, isGWT flag is missed and no annotation is set. > So, I removed isGWT flag and introduced a check for selectAlternative occurence in LambdaForm shape, as you suggested. Good. I think there is a sweeter spot just a little further on. 
Make profileBranch be an LF intrinsic and expose it like this: GWT(p,t,f;S) := let(a=new int[3]) in lambda(*: S) { selectAlternative(profileBranch(p.invoke( *), a), t, f).invoke( *); } Then selectAlternative triggers branchy bytecodes in the IBGen, and profileBranch injects profiling in C2. The presence of profileBranch would then trigger the @Shared annotation, if you still need it. After thinking about it some more, I still believe it would be better to detect the use of profileBranch during a C2 compile task, and feed that to the too_many_traps logic. I agree it is much easier to stick the annotation on in the IBGen; the problem is that because of a minor phase ordering problem you are introducing an annotation which flows from the JDK to the VM. Here's one more suggestion at reducing this coupling? Note that C->set_trap_count is called when each Parse phase processes a whole method. This means that information about the contents of the nmethod accumulates during the parse. Likewise, add a flag method C->{has,set}_injected_profile, and set the flag whenever the parser sees a profileBranch intrinsic (with or without a constant profile array; your call). Then consult that flag from too_many_traps. It is true that code which is parsed upstream of the very first profileBranch will potentially issue a non-trapping fallback, but by definition that code would be unrelated to the injected profile, so I don't see a harm in that. If this approach works, then you can remove the annotation altogether, which is clearly preferable. We understand the annotation now, but it has the danger of becoming a maintainer's puzzlement. > >> In 'updateCounters', if the counter overflows, you'll get continuous >> creation of ArithmeticExceptions. Will that optimize or will it cause a >> permanent slowdown? Consider a hack like this on the exception path: >> counters[idx] = Integer.MAX_VALUE / 2; > I had an impression that VM optimizes overflows in Math.exact* intrinsics, but it's not the case - it always inserts an uncommon trap. I used the workaround you proposed. Good. > >> On the Name Bikeshed: It looks like @IgnoreProfile (ignore_profile in >> the VM) promises too much "ignorance", since it suppresses branch counts >> and traps, but allows type profiles to be consulted. Maybe something >> positive like "@ManyTraps" or "@SharedMegamorphic"? (It's just a name, >> and this is just a suggestion.) > What do you think about @LambdaForm.Shared? That's fine. Suggest changing the JVM accessor to is_lambda_form_shared, because the term "shared" is already overused in the VM. Or, to be much more accurate, s/@Shared/@CollectiveProfile/. Better yet, get rid of it, as suggested above. (I just realized that profile pollution looks logically parallel to the http://en.wikipedia.org/wiki/Tragedy_of_the_commons .) Also, in the comment explaining the annotation: s/mostly useless/probably polluted by conflicting behavior from multiple call sites/ I very much like the fact that profileBranch is the VM intrinsic, not selectAlternative. A VM intrinsic should be nice and narrow like that. In fact, you can delete selectAlternative from vmSymbols while you are at it. (We could do profileInteger and profileClass in a similar way, if that turned out to be useful.) ? John -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From peter.levart at gmail.com Fri Jan 23 14:38:58 2015 From: peter.levart at gmail.com (Peter Levart) Date: Fri, 23 Jan 2015 15:38:58 +0100 Subject: [9] RFR (M): 8069591: Customize LambdaForms which are invoked using MH.invoke/invokeExact In-Reply-To: References: <54BFD2EE.3060909@oracle.com> <54BFE259.1090402@univ-mlv.fr> <3B4D19E0-8DA6-4FE1-BD77-E12E8BCF15EC@oracle.com> <54C139CE.4000005@oracle.com> Message-ID: <54C25D02.2020209@gmail.com> On 01/23/2015 12:30 AM, John Rose wrote: > On Jan 22, 2015, at 9:56 AM, Vladimir Ivanov wrote: >> Remi, John, thanks for review! >> >> Updated webrev: >> http://cr.openjdk.java.net/~vlivanov/8069591/webrev.01/ >> >> This time I did additional testing (COMPILE_THRESHOLD > 0) and spotted a problem with MethodHandle.copyWith(): a MethodHandle can inherit customized LambdaForm this way. I could have added LambdaForm::uncustomize() call in evey Species_*::copyWith() method, but I decided to add it into MethodHandle constructor. Let me know if you think it's too intrusive. > It's OK to put it there. > > Now I'm worried that the new customization logic will defeat code sharing for invoked MHs, since uncustomize creates a new LF that is a duplicate of the original LF. That breaks the genetic link for children of the invoked MH, doesn't it? (I like the compileToBytecode call, if it is done on the original.) In fact, that is also a potential problem for the first version of your patch, also. > > Suggestion: Have every customized LF contain a direct link to its uncustomized original. Have uncustomize just return that same original, every time. Then, when using LF editor operations to derive new LFs, always have them extract the original before making a derivation. The customized LF then don't need 'transformCache' field. It could be re-used to point to original uncustomized LF. That would also be a signal for LF editor (the 4th type of payload attached to transformCache field) to follow the link to get to the uncustomized LF... Peter > > (Alternatively, have the LF editor caches be shared between original LFs and all their customized versions. But that doesn't save all the genetic links.) > >> Also, I made DirectMethodHandles a special-case, since I don't see any benefit in customizing them. > The overriding method in DHM should be marked @Override, so that we know all the bits fit together. > > ? John > _______________________________________________ > mlvm-dev mailing list > mlvm-dev at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev From vladimir.x.ivanov at oracle.com Fri Jan 23 16:00:53 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 23 Jan 2015 19:00:53 +0300 Subject: [9] RFR (M): 8069591: Customize LambdaForms which are invoked using MH.invoke/invokeExact In-Reply-To: <54C25D02.2020209@gmail.com> References: <54BFD2EE.3060909@oracle.com> <54BFE259.1090402@univ-mlv.fr> <3B4D19E0-8DA6-4FE1-BD77-E12E8BCF15EC@oracle.com> <54C139CE.4000005@oracle.com> <54C25D02.2020209@gmail.com> Message-ID: <54C27035.5050103@oracle.com> Good idea, Peter! Updated version: http://cr.openjdk.java.net/~vlivanov/8069591/webrev.02/ Best regards, Vladimir Ivanov On 1/23/15 5:38 PM, Peter Levart wrote: > On 01/23/2015 12:30 AM, John Rose wrote: >> On Jan 22, 2015, at 9:56 AM, Vladimir Ivanov >> wrote: >>> Remi, John, thanks for review! 
>>> >>> Updated webrev: >>> http://cr.openjdk.java.net/~vlivanov/8069591/webrev.01/ >>> >>> This time I did additional testing (COMPILE_THRESHOLD > 0) and >>> spotted a problem with MethodHandle.copyWith(): a MethodHandle can >>> inherit customized LambdaForm this way. I could have added >>> LambdaForm::uncustomize() call in evey Species_*::copyWith() method, >>> but I decided to add it into MethodHandle constructor. Let me know if >>> you think it's too intrusive. >> It's OK to put it there. >> >> Now I'm worried that the new customization logic will defeat code >> sharing for invoked MHs, since uncustomize creates a new LF that is a >> duplicate of the original LF. That breaks the genetic link for >> children of the invoked MH, doesn't it? (I like the compileToBytecode >> call, if it is done on the original.) In fact, that is also a >> potential problem for the first version of your patch, also. >> >> Suggestion: Have every customized LF contain a direct link to its >> uncustomized original. Have uncustomize just return that same >> original, every time. Then, when using LF editor operations to derive >> new LFs, always have them extract the original before making a >> derivation. > > The customized LF then don't need 'transformCache' field. It could be > re-used to point to original uncustomized LF. That would also be a > signal for LF editor (the 4th type of payload attached to transformCache > field) to follow the link to get to the uncustomized LF... > > Peter > >> >> (Alternatively, have the LF editor caches be shared between original >> LFs and all their customized versions. But that doesn't save all the >> genetic links.) >> >>> Also, I made DirectMethodHandles a special-case, since I don't see >>> any benefit in customizing them. >> The overriding method in DHM should be marked @Override, so that we >> know all the bits fit together. >> >> ? John >> _______________________________________________ >> mlvm-dev mailing list >> mlvm-dev at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev > From john.r.rose at oracle.com Fri Jan 23 18:13:40 2015 From: john.r.rose at oracle.com (John Rose) Date: Fri, 23 Jan 2015 10:13:40 -0800 Subject: [9] RFR (M): 8069591: Customize LambdaForms which are invoked using MH.invoke/invokeExact In-Reply-To: <54C27035.5050103@oracle.com> References: <54BFD2EE.3060909@oracle.com> <54BFE259.1090402@univ-mlv.fr> <3B4D19E0-8DA6-4FE1-BD77-E12E8BCF15EC@oracle.com> <54C139CE.4000005@oracle.com> <54C25D02.2020209@gmail.com> <54C27035.5050103@oracle.com> Message-ID: On Jan 23, 2015, at 8:00 AM, Vladimir Ivanov wrote: > > Good idea, Peter! +1 > Updated version: > http://cr.openjdk.java.net/~vlivanov/8069591/webrev.02/ Yes, that's good, and you can count me as a reviewer. ? John P.S. One could also get rid of the LF.customized field by stuffing both that value and the original LF in the transformCache (as a 2-array), but that's overkill. P.P.S. A possible generalization to the LF.customized field would be an optional list of type, value, and/or structure constraints for one or more arguments to the LF. Then we could (a) customize on additional arguments if we thought that were useful, and/or (b) produce semi-custom code that could be shared by more than one MH, if we thought there was an interesting equivalence class of MHs to speed up with common code. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From vladimir.x.ivanov at oracle.com Mon Jan 26 16:41:50 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Mon, 26 Jan 2015 19:41:50 +0300 Subject: [9] RFR (M): 8063137: Never-taken branches should be pruned when GWT LambdaForms are shared In-Reply-To: <5BA1E369-ED87-4EBD-8408-B73B726D91BD@oracle.com> References: <54B94766.2080102@oracle.com> <7B03B9FB-17B4-4AE0-92B8-F2DC5B231294@oracle.com> <54BEA7D7.6080008@oracle.com> <5BA1E369-ED87-4EBD-8408-B73B726D91BD@oracle.com> Message-ID: <54C66E4E.9050805@oracle.com> John, What do you think about the following version? http://cr.openjdk.java.net/~vlivanov/8063137/webrev.02 As you suggested, I reified MHI::profileBranch on LambdaForm level and removed @LambdaForm.Shared. My main concern about removing @Sharen was that profile pollution can affect the code before profileBranch call (akin to 8068915 [1]) and it seems it's the case: Gbemu (at least) is sensitive to that change (there's a 10% difference in peak performance between @Shared and has_injected_profile()). I can leave @Shared as is for now or remove it and work on the fix to the deoptimization counts pollution. What do you prefer? Best regards, Vladimir Ivanov [1] https://bugs.openjdk.java.net/browse/JDK-8068915 On 1/23/15 4:31 AM, John Rose wrote: > On Jan 20, 2015, at 11:09 AM, Vladimir Ivanov > > wrote: >> >>> What I'm mainly poking at here is that 'isGWT' is not informative about >>> the intended use of the flag. >> I agree. It was an interim solution. Initially, I planned to introduce >> customization and guide the logic based on that property. But it's not >> there yet and I needed something for GWT case. Unfortunately, I missed >> the case when GWT is edited. In that case, isGWT flag is missed and no >> annotation is set. >> So, I removed isGWT flag and introduced a check for selectAlternative >> occurence in LambdaForm shape, as you suggested. > > Good. > > I think there is a sweeter spot just a little further on. Make > profileBranch be an LF intrinsic and expose it like this: > GWT(p,t,f;S) := let(a=new int[3]) in lambda(*: S) { > selectAlternative(profileBranch(p.invoke( *), a), t, f).invoke( *); } > > Then selectAlternative triggers branchy bytecodes in the IBGen, and > profileBranch injects profiling in C2. > The presence of profileBranch would then trigger the @Shared annotation, > if you still need it. > > After thinking about it some more, I still believe it would be better to > detect the use of profileBranch during a C2 compile task, and feed that > to the too_many_traps logic. I agree it is much easier to stick the > annotation on in the IBGen; the problem is that because of a minor phase > ordering problem you are introducing an annotation which flows from the > JDK to the VM. Here's one more suggestion at reducing this coupling? > > Note that C->set_trap_count is called when each Parse phase processes a > whole method. This means that information about the contents of the > nmethod accumulates during the parse. Likewise, add a flag method > C->{has,set}_injected_profile, and set the flag whenever the parser sees > a profileBranch intrinsic (with or without a constant profile array; > your call). Then consult that flag from too_many_traps. It is true > that code which is parsed upstream of the very first profileBranch will > potentially issue a non-trapping fallback, but by definition that code > would be unrelated to the injected profile, so I don't see a harm in > that. 
If this approach works, then you can remove the annotation > altogether, which is clearly preferable. We understand the annotation > now, but it has the danger of becoming a maintainer's puzzlement. > >> >>> In 'updateCounters', if the counter overflows, you'll get continuous >>> creation of ArithmeticExceptions. Will that optimize or will it cause a >>> permanent slowdown? Consider a hack like this on the exception path: >>> counters[idx] = Integer.MAX_VALUE / 2; >> I had an impression that VM optimizes overflows in Math.exact* >> intrinsics, but it's not the case - it always inserts an uncommon >> trap. I used the workaround you proposed. > > Good. > >> >>> On the Name Bikeshed: It looks like @IgnoreProfile (ignore_profile in >>> the VM) promises too much "ignorance", since it suppresses branch counts >>> and traps, but allows type profiles to be consulted. Maybe something >>> positive like "@ManyTraps" or "@SharedMegamorphic"? (It's just a name, >>> and this is just a suggestion.) >> What do you think about @LambdaForm.Shared? > > That's fine. Suggest changing the JVM accessor to > is_lambda_form_shared, because the term "shared" is already overused in > the VM. > > Or, to be much more accurate, s/@Shared/@CollectiveProfile/. Better > yet, get rid of it, as suggested above. > > (I just realized that profile pollution looks logically parallel to the > http://en.wikipedia.org/wiki/Tragedy_of_the_commons .) > > Also, in the comment explaining the annotation: > s/mostly useless/probably polluted by conflicting behavior from > multiple call sites/ > > I very much like the fact that profileBranch is the VM intrinsic, not > selectAlternative. A VM intrinsic should be nice and narrow like that. > In fact, you can delete selectAlternative from vmSymbols while you are > at it. > > (We could do profileInteger and profileClass in a similar way, if that > turned out to be useful.) > > ? John From vladimir.x.ivanov at oracle.com Mon Jan 26 18:31:30 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Mon, 26 Jan 2015 21:31:30 +0300 Subject: [9] RFR (M): 8063137: Never-taken branches should be pruned when GWT LambdaForms are shared In-Reply-To: <54C66E4E.9050805@oracle.com> References: <54B94766.2080102@oracle.com> <7B03B9FB-17B4-4AE0-92B8-F2DC5B231294@oracle.com> <54BEA7D7.6080008@oracle.com> <5BA1E369-ED87-4EBD-8408-B73B726D91BD@oracle.com> <54C66E4E.9050805@oracle.com> Message-ID: <54C68802.7020105@oracle.com> > As you suggested, I reified MHI::profileBranch on LambdaForm level and > removed @LambdaForm.Shared. My main concern about removing @Sharen was > that profile pollution can affect the code before profileBranch call > (akin to 8068915 [1]) and it seems it's the case: Gbemu (at least) is > sensitive to that change (there's a 10% difference in peak performance > between @Shared and has_injected_profile()). Ignore that. Additional runs don't prove there's a regression on Gbemu. There's some variance on Gbemu and it's present w/ and w/o @Shared. Best regards, Vladimir Ivanov > I can leave @Shared as is for now or remove it and work on the fix to > the deoptimization counts pollution. What do you prefer? > > Best regards, > Vladimir Ivanov > > [1] https://bugs.openjdk.java.net/browse/JDK-8068915 > > On 1/23/15 4:31 AM, John Rose wrote: >> On Jan 20, 2015, at 11:09 AM, Vladimir Ivanov >> > >> wrote: >>> >>>> What I'm mainly poking at here is that 'isGWT' is not informative about >>>> the intended use of the flag. >>> I agree. It was an interim solution. 
Initially, I planned to introduce >>> customization and guide the logic based on that property. But it's not >>> there yet and I needed something for GWT case. Unfortunately, I missed >>> the case when GWT is edited. In that case, isGWT flag is missed and no >>> annotation is set. >>> So, I removed isGWT flag and introduced a check for selectAlternative >>> occurence in LambdaForm shape, as you suggested. >> >> Good. >> >> I think there is a sweeter spot just a little further on. Make >> profileBranch be an LF intrinsic and expose it like this: >> GWT(p,t,f;S) := let(a=new int[3]) in lambda(*: S) { >> selectAlternative(profileBranch(p.invoke( *), a), t, f).invoke( *); } >> >> Then selectAlternative triggers branchy bytecodes in the IBGen, and >> profileBranch injects profiling in C2. >> The presence of profileBranch would then trigger the @Shared annotation, >> if you still need it. >> >> After thinking about it some more, I still believe it would be better to >> detect the use of profileBranch during a C2 compile task, and feed that >> to the too_many_traps logic. I agree it is much easier to stick the >> annotation on in the IBGen; the problem is that because of a minor phase >> ordering problem you are introducing an annotation which flows from the >> JDK to the VM. Here's one more suggestion at reducing this coupling? >> >> Note that C->set_trap_count is called when each Parse phase processes a >> whole method. This means that information about the contents of the >> nmethod accumulates during the parse. Likewise, add a flag method >> C->{has,set}_injected_profile, and set the flag whenever the parser sees >> a profileBranch intrinsic (with or without a constant profile array; >> your call). Then consult that flag from too_many_traps. It is true >> that code which is parsed upstream of the very first profileBranch will >> potentially issue a non-trapping fallback, but by definition that code >> would be unrelated to the injected profile, so I don't see a harm in >> that. If this approach works, then you can remove the annotation >> altogether, which is clearly preferable. We understand the annotation >> now, but it has the danger of becoming a maintainer's puzzlement. >> >>> >>>> In 'updateCounters', if the counter overflows, you'll get continuous >>>> creation of ArithmeticExceptions. Will that optimize or will it >>>> cause a >>>> permanent slowdown? Consider a hack like this on the exception path: >>>> counters[idx] = Integer.MAX_VALUE / 2; >>> I had an impression that VM optimizes overflows in Math.exact* >>> intrinsics, but it's not the case - it always inserts an uncommon >>> trap. I used the workaround you proposed. >> >> Good. >> >>> >>>> On the Name Bikeshed: It looks like @IgnoreProfile (ignore_profile in >>>> the VM) promises too much "ignorance", since it suppresses branch >>>> counts >>>> and traps, but allows type profiles to be consulted. Maybe something >>>> positive like "@ManyTraps" or "@SharedMegamorphic"? (It's just a name, >>>> and this is just a suggestion.) >>> What do you think about @LambdaForm.Shared? >> >> That's fine. Suggest changing the JVM accessor to >> is_lambda_form_shared, because the term "shared" is already overused in >> the VM. >> >> Or, to be much more accurate, s/@Shared/@CollectiveProfile/. Better >> yet, get rid of it, as suggested above. >> >> (I just realized that profile pollution looks logically parallel to the >> http://en.wikipedia.org/wiki/Tragedy_of_the_commons .) 
>> >> Also, in the comment explaining the annotation: >> s/mostly useless/probably polluted by conflicting behavior from >> multiple call sites/ >> >> I very much like the fact that profileBranch is the VM intrinsic, not >> selectAlternative. A VM intrinsic should be nice and narrow like that. >> In fact, you can delete selectAlternative from vmSymbols while you are >> at it. >> >> (We could do profileInteger and profileClass in a similar way, if that >> turned out to be useful.) >> >> ? John From john.r.rose at oracle.com Tue Jan 27 00:04:03 2015 From: john.r.rose at oracle.com (John Rose) Date: Mon, 26 Jan 2015 16:04:03 -0800 Subject: [9] RFR (M): 8063137: Never-taken branches should be pruned when GWT LambdaForms are shared In-Reply-To: <54C66E4E.9050805@oracle.com> References: <54B94766.2080102@oracle.com> <7B03B9FB-17B4-4AE0-92B8-F2DC5B231294@oracle.com> <54BEA7D7.6080008@oracle.com> <5BA1E369-ED87-4EBD-8408-B73B726D91BD@oracle.com> <54C66E4E.9050805@oracle.com> Message-ID: <915998BE-25E9-4196-BAC7-FE5527E10F83@oracle.com> On Jan 26, 2015, at 8:41 AM, Vladimir Ivanov wrote: > > What do you think about the following version? > http://cr.openjdk.java.net/~vlivanov/8063137/webrev.02 > > As you suggested, I reified MHI::profileBranch on LambdaForm level and removed @LambdaForm.Shared. My main concern about removing @Sharen was that profile pollution can affect the code before profileBranch call (akin to 8068915 [1]) and it seems it's the case: Gbemu (at least) is sensitive to that change (there's a 10% difference in peak performance between @Shared and has_injected_profile()). > > I can leave @Shared as is for now or remove it and work on the fix to the deoptimization counts pollution. What do you prefer? Generic advice here: It's better to leave it out, if in doubt. If it has a real benefit, and we don't have time to make it clean, put it in and file a tracking bug to clean it up. I re-read the change. It's simpler and more coherent now. I see one more issue which we should fix now, while we can. It's the sort of thing which is hard to clean up later. The two fields of the profileBranch array have obscure and inconsistent labelings. It took me some hard thought and the inspection of three files to decide what "taken" and "not taken" mean in the C2 code that injects the profile. The problem is that, when you look at profileBranch, all you see is an integer (boolean) argument and an array, and no clear indication about which array element corresponds to which argument value. It's made worse by the fact that "taken" and "not taken" are not mentioned at all in the JDK code, which instead wires together the branches of selectAlternative without much comment. My preferred formulation, for making things clearer: Decouple the idea of branching from the idea of profile injection. Name the intrinsic (yes, one more bikeshed color) "profileBoolean" (or even "injectBooleanProfile"), and use the natural indexing of the array: 0 (Java false) is a[0], and 1 (Java true) is a[1]. We might later extend this to work with "booleans" (more generally, small-integer flags), of more than two possible values, klasses, etc. This line then goes away, and 'result' is used directly as the profile index: + int idx = result ? 0 : 1; The ProfileBooleanNode should have an embedded (or simply indirect) array of ints which is a simple copy of the profile array, so there's no doubt about which count is which. 
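As a rough sketch of what that natural indexing looks like on the JDK side (illustrative code only, not the actual MethodHandleImpl source; the saturation check is just one possible way to avoid counter wrap):

    class ProfileSketch {
        // counts[0] holds the false count, counts[1] holds the true count,
        // so the boolean result maps directly onto the array index.
        static boolean profileBoolean(boolean result, int[] counts) {
            int idx = result ? 1 : 0;
            int c = counts[idx];
            if (c < Integer.MAX_VALUE) {   // saturate instead of wrapping on overflow
                counts[idx] = c + 1;
            }
            return result;                 // pass the profiled value through unchanged
        }
    }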
The parsing of the predicate that contains "profileBoolean" should probably be more robust, at least allowing for 'eq' and 'ne' versions of the test. (C2 freely flips comparison senses, in various places.) The check for Op_AndI must be more precise; make sure n->in(2) is a constant of the expected value (1). The most robust way to handle it (but try this another time, I think) would be to make two temp copies of the predicate, substituting the occurrence of ProfileBoolean with '0' and '1', respectively; if they both fold to '0' and '1' or '1' and '0', then you take the indicated action. I suggest putting the new code in Parse::dynamic_branch_prediction, which pattern-matches for injected profiles, into its own subroutine. Maybe: bool use_mdo = true; if (has_injected_profile(btest, test, &taken, &not_taken)) { use_mdo = false; } if (use_mdo) { ... old code I see why you used the opposite order in the existing code: It mirrors the order of the second and third arguments to selectAlternative. But the JVM knows nothing about selectAlternative, so it's just confusing when reading the VM code to know which profile array element means what. -- John P.S. Long experience with byte-order bugs in HotSpot convinces me that if you are not scrupulously clear in your terms, when working with equal and opposite configuration pairs, you will have a long bug tail, especially if you have to maintain agreement about the configurations through many layers of software. This is one of those cases. The best chance to fix such bugs is not to allow them in the first place. In the case of byte-order, we have "first" vs. "second", "MSB" vs. "LSB", and "high" vs. "low" parts of values, for values in memory and in registers, and all possible misunderstandings about them and their relation have probably happened and caused bugs. -------------- next part -------------- An HTML attachment was scrubbed...
URL: From vladimir.x.ivanov at oracle.com Tue Jan 27 16:05:19 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 27 Jan 2015 19:05:19 +0300 Subject: [9] RFR (M): 8063137: Never-taken branches should be pruned when GWT LambdaForms are shared In-Reply-To: <915998BE-25E9-4196-BAC7-FE5527E10F83@oracle.com> References: <54B94766.2080102@oracle.com> <7B03B9FB-17B4-4AE0-92B8-F2DC5B231294@oracle.com> <54BEA7D7.6080008@oracle.com> <5BA1E369-ED87-4EBD-8408-B73B726D91BD@oracle.com> <54C66E4E.9050805@oracle.com> <915998BE-25E9-4196-BAC7-FE5527E10F83@oracle.com> Message-ID: <54C7B73F.50404@oracle.com> Thanks for the feedback, John! Updated webrev: http://cr.openjdk.java.net/~vlivanov/8063137/webrev.03/jdk http://cr.openjdk.java.net/~vlivanov/8063137/webrev.03/hotspot Changes: - renamed MHI::profileBranch to MHI::profileBoolean, and ProfileBranchNode to ProfileBooleanNode; - restructured profile layout ([0] => false_cnt, [1] => true_cnt) - factored out profile injection in a separate function (has_injected_profile() in parse2.cpp) - ProfileBooleanNode stores true/false counts instead of taken/not_taken counts - matching from value counts to taken/not_taken happens in has_injected_profile(); - added BoolTest::ne support - sharpened test for AndI case: now it checks AndI (ProfileBoolean) (ConI 1) shape Best regards, Vladimir Ivanov On 1/27/15 3:04 AM, John Rose wrote: > On Jan 26, 2015, at 8:41 AM, Vladimir Ivanov > > wrote: >> >> What do you think about the following version? >> http://cr.openjdk.java.net/~vlivanov/8063137/webrev.02 >> >> As you suggested, I reified MHI::profileBranch on LambdaForm level and >> removed @LambdaForm.Shared. My main concern about removing @Sharen was >> that profile pollution can affect the code before profileBranch call >> (akin to 8068915 [1]) and it seems it's the case: Gbemu (at least) is >> sensitive to that change (there's a 10% difference in peak performance >> between @Shared and has_injected_profile()). >> >> I can leave @Shared as is for now or remove it and work on the fix to >> the deoptimization counts pollution. What do you prefer? > > Generic advice here: It's better to leave it out, if in doubt. If it > has a real benefit, and we don't have time to make it clean, put it in > and file a tracking bug to clean it up. > > I re-read the change. It's simpler and more coherent now. > > I see one more issue which we should fix now, while we can. It's the > sort of thing which is hard to clean up later. > > The two fields of the profileBranch array have obscure and inconsistent > labelings. It took me some hard thought and the inspection of three > files to decide what "taken" and "not taken" mean in the C2 code that > injects the profile. The problem is that, when you look at > profileBranch, all you see is an integer (boolean) argument and an > array, and no clear indication about which array element corresponds to > which argument value. It's made worse by the fact that "taken" and "not > taken" are not mentioned at all in the JDK code, which instead wires > together the branches of selectAlternative without much comment. > > My preferred formulation, for making things clearer: Decouple the idea > of branching from the idea of profile injection. Name the intrinsic > (yes, one more bikeshed color) "profileBoolean" (or even > "injectBooleanProfile"), and use the natural indexing of the array: 0 > (Java false) is a[0], and 1 (Java true) is a[1]. 
We might later extend > this to work with "booleans" (more generally, small-integer flags), of > more than two possible values, klasses, etc. > > This line then goes away, and 'result' is used directly as the profile > index: > + int idx = result ? 0 : 1; > > The ProfileBooleanNode should have an embedded (or simply indirect) > array of ints which is a simple copy of the profile array, so there's no > doubt about which count is which. > > The parsing of the predicate that contains "profileBoolean" should > probably be more robust, at least allowing for 'eq' and 'ne' versions of > the test. (C2 freely flips comparison senses, in various places.) The > check for Op_AndI must be more precise; make sure n->in(2) is a constant > of the expected value (1). The most robust way to handle it (but try > this another time, I think) would be to make two temp copies of the > predicate, substituting the occurrence of ProfileBoolean with '0' and > '1', respectively; if they both fold to '0' and '1' or '1' and '0', then > you take the indicated action. > > I suggest putting the new code in Parse::dynamic_branch_prediction, > which pattern-matches for injected profiles, into its own subroutine. > Maybe: > bool use_mdo = true; > if (has_injected_profile(btest, test, &taken, ¬_taken)) { > use_mdo = false; > } > if (use_mdo) { ... old code > > I see why you used the opposite order in the existing code: It mirrors > the order of the second and third arguments to selectAlternative. But > the JVM knows nothing about selectAlternative, so it's just confusing > when reading the VM code to know which profile array element means what. > > ? John > > P.S. Long experience with byte-order bugs in HotSpot convinces me that > if you are not scrupulously clear in your terms, when working with equal > and opposite configuration pairs, you will have a long bug tail, > especially if you have to maintain agreement about the configurations > through many layers of software. This is one of those cases. The best > chance to fix such bugs is not to allow them in the first place. In the > case of byte-order, we have "first" vs. "second", "MSB" vs. "LSB", and > "high" vs. "low" parts of values, for values in memory and in registers, > and all possible misunderstandings about them and their relation have > probably happened and caused bugs. From john.r.rose at oracle.com Tue Jan 27 21:08:47 2015 From: john.r.rose at oracle.com (John Rose) Date: Tue, 27 Jan 2015 13:08:47 -0800 Subject: [9] RFR (M): 8063137: Never-taken branches should be pruned when GWT LambdaForms are shared In-Reply-To: <54C7B73F.50404@oracle.com> References: <54B94766.2080102@oracle.com> <7B03B9FB-17B4-4AE0-92B8-F2DC5B231294@oracle.com> <54BEA7D7.6080008@oracle.com> <5BA1E369-ED87-4EBD-8408-B73B726D91BD@oracle.com> <54C66E4E.9050805@oracle.com> <915998BE-25E9-4196-BAC7-FE5527E10F83@oracle.com> <54C7B73F.50404@oracle.com> Message-ID: <8AD9A8CC-E570-4DE6-ABB1-10B00FACB8AB@oracle.com> Looking very good, thanks. Ship it! Actually, can you insert a comment why the injected counts are not scaled? (Or perhaps they should be??) Also, we may need a followup bug for the code with this comment: // Look for the following shape: AndI (ProfileBoolean) (ConI 1)) Since profileBoolean returns a TypeInt::BOOL, the AndI with (ConI 1) should fold up. So there's some work to do in MulNode, which may allow that special pattern match to go away. But I don't want to divert the present bug by a possibly complex dive into fixing AndI::Ideal. 
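Restated at the Java level, the fold in question is just this (an illustration of why the mask is redundant once the value is known to be 0 or 1, not C2 code):

    class FoldSketch {
        static int masked(boolean flag) {
            int v = flag ? 1 : 0;  // v is provably 0 or 1, i.e. TypeInt::BOOL
            return v & 1;          // (v & 1) == v for such a value, so the AndI node
                                   // that implements this mask can be folded away
        }
    }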
(Generally speaking, pattern matching should assume strong normalization of its inputs. Otherwise you end up duplicating pattern match code in many places, inconsistently. Funny one-off idiom checks like this are evidence of incomplete IR normalization. See http://en.wikipedia.org/wiki/Rewriting for some background on terms like "normalization" and "confluence" which are relevant to C2.) ? John On Jan 27, 2015, at 8:05 AM, Vladimir Ivanov wrote: > > Thanks for the feedback, John! > > Updated webrev: > http://cr.openjdk.java.net/~vlivanov/8063137/webrev.03/jdk > http://cr.openjdk.java.net/~vlivanov/8063137/webrev.03/hotspot > > Changes: > - renamed MHI::profileBranch to MHI::profileBoolean, and ProfileBranchNode to ProfileBooleanNode; > - restructured profile layout ([0] => false_cnt, [1] => true_cnt) > - factored out profile injection in a separate function (has_injected_profile() in parse2.cpp) > - ProfileBooleanNode stores true/false counts instead of taken/not_taken counts > - matching from value counts to taken/not_taken happens in has_injected_profile(); > - added BoolTest::ne support > - sharpened test for AndI case: now it checks AndI (ProfileBoolean) (ConI 1) shape > > Best regards, > Vladimir Ivanov From vladimir.x.ivanov at oracle.com Wed Jan 28 09:00:55 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 28 Jan 2015 12:00:55 +0300 Subject: [9] RFR (M): 8063137: Never-taken branches should be pruned when GWT LambdaForms are shared In-Reply-To: <8AD9A8CC-E570-4DE6-ABB1-10B00FACB8AB@oracle.com> References: <54B94766.2080102@oracle.com> <7B03B9FB-17B4-4AE0-92B8-F2DC5B231294@oracle.com> <54BEA7D7.6080008@oracle.com> <5BA1E369-ED87-4EBD-8408-B73B726D91BD@oracle.com> <54C66E4E.9050805@oracle.com> <915998BE-25E9-4196-BAC7-FE5527E10F83@oracle.com> <54C7B73F.50404@oracle.com> <8AD9A8CC-E570-4DE6-ABB1-10B00FACB8AB@oracle.com> Message-ID: <54C8A547.6050607@oracle.com> > Looking very good, thanks. Ship it! Thanks, John! > Actually, can you insert a comment why the injected counts are not scaled? (Or perhaps they should be??) Sure! I intentionally don't scale the counts because I don't see any reason to do so. Profiling is done on per-MethodHandle basis, so the counts should be very close (considering racy updates) to the actual behavior. > Also, we may need a followup bug for the code with this comment: > // Look for the following shape: AndI (ProfileBoolean) (ConI 1)) > > Since profileBoolean returns a TypeInt::BOOL, the AndI with (ConI 1) should fold up. > So there's some work to do in MulNode, which may allow that special pattern match to go away. > But I don't want to divert the present bug by a possibly complex dive into fixing AndI::Ideal. Good catch! It's an overlook on my side. 
The following change for ProfileBooleanNode solves the problem: - virtual const Type *bottom_type() const { return TypeInt::INT; } + virtual const Type *bottom_type() const { return TypeInt::BOOL; } I polished the change a little according to your comments (diff against v03): http://cr.openjdk.java.net/~vlivanov/8063137/webrev.03-04/hotspot Changes: - added short explanation why injected counts aren't scaled - adjusted ProfileBooleanNode type to TypeInt::BOOL and removed excessive pattern matching in has_injected_profile() - added an assert when ProfileBooleanNode is removed to catch the cases when injected profile isn't used: if we decide to generalize the API, I'd be happy to remove it, but current usages assumes that injected counts are always consumed during parsing and missing cases can cause hard-to-diagnose performance problems. Best regards, Vladimir Ivanov > > (Generally speaking, pattern matching should assume strong normalization of its inputs. Otherwise you end up duplicating pattern match code in many places, inconsistently. Funny one-off idiom checks like this are evidence of incomplete IR normalization. See http://en.wikipedia.org/wiki/Rewriting for some background on terms like "normalization" and "confluence" which are relevant to C2.) > > ? John > > On Jan 27, 2015, at 8:05 AM, Vladimir Ivanov wrote: >> >> Thanks for the feedback, John! >> >> Updated webrev: >> http://cr.openjdk.java.net/~vlivanov/8063137/webrev.03/jdk >> http://cr.openjdk.java.net/~vlivanov/8063137/webrev.03/hotspot >> >> Changes: >> - renamed MHI::profileBranch to MHI::profileBoolean, and ProfileBranchNode to ProfileBooleanNode; >> - restructured profile layout ([0] => false_cnt, [1] => true_cnt) >> - factored out profile injection in a separate function (has_injected_profile() in parse2.cpp) >> - ProfileBooleanNode stores true/false counts instead of taken/not_taken counts >> - matching from value counts to taken/not_taken happens in has_injected_profile(); >> - added BoolTest::ne support >> - sharpened test for AndI case: now it checks AndI (ProfileBoolean) (ConI 1) shape >> >> Best regards, >> Vladimir Ivanov > From vladimir.x.ivanov at oracle.com Wed Jan 28 17:12:23 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 28 Jan 2015 20:12:23 +0300 Subject: [9] RFR (XS): 8071787: Don't block inlining when DONT_INLINE_THRESHOLD=0 Message-ID: <54C91877.5040707@oracle.com> http://cr.openjdk.java.net/~vlivanov/8071787/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8071787 For testing & performance measurements, sometimes it's useful to replace block inlining wrappers with trivial reinvokers. This change extends DONT_INLINE_THRESHOLD in the following manner: DONT_INLINE_THRESHOLD = -1: no wrapper DONT_INLINE_THRESHOLD = 0: reinvoker DONT_INLINE_THRESHOLD > 0: counting wrapper Before that DONT_INLINE_THRESHOLD=0 meant a counting wrapper which is removed on the first invocation. After the change, it's DONT_INLINE_THRESHOLD=1. Testing: manual, java/lang/invoke Best regards, Vladimir Ivanov From vladimir.x.ivanov at oracle.com Wed Jan 28 17:22:57 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 28 Jan 2015 20:22:57 +0300 Subject: [9] RFR (XXS): 8071788: CountingWrapper.asType() is broken Message-ID: <54C91AF1.3010602@oracle.com> http://cr.openjdk.java.net/~vlivanov/8071788/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8071788 There's a type mismatch between MethodHandle and LambdaForm in CountingWrapper.asTypeUncached(). Sometimes, it leads to a VM crash. 
The fix is to use adapted MethodHandle to construct LambdaForm. There's no way to reproduce this problem with vanilla 8u40/9 binaries, because CountingWrapper is used only to block inlinining in GWT (MHI::profile() on target and fallback MethodHandles). It means there's no way to call CountingWrapper.asType() on wrapped MethodHandles outside of java.lang.invoke code, and there are no such calls inside it. Testing: manual, java/lang/invoke Thanks! Best regards, Vladimir Ivanov From john.r.rose at oracle.com Wed Jan 28 20:30:37 2015 From: john.r.rose at oracle.com (John Rose) Date: Wed, 28 Jan 2015 12:30:37 -0800 Subject: [9] RFR (M): 8063137: Never-taken branches should be pruned when GWT LambdaForms are shared In-Reply-To: <54C8A547.6050607@oracle.com> References: <54B94766.2080102@oracle.com> <7B03B9FB-17B4-4AE0-92B8-F2DC5B231294@oracle.com> <54BEA7D7.6080008@oracle.com> <5BA1E369-ED87-4EBD-8408-B73B726D91BD@oracle.com> <54C66E4E.9050805@oracle.com> <915998BE-25E9-4196-BAC7-FE5527E10F83@oracle.com> <54C7B73F.50404@oracle.com> <8AD9A8CC-E570-4DE6-ABB1-10B00FACB8AB@oracle.com> <54C8A547.6050607@oracle.com> Message-ID: On Jan 28, 2015, at 1:00 AM, Vladimir Ivanov wrote: > I polished the change a little according to your comments (diff against v03): > http://cr.openjdk.java.net/~vlivanov/8063137/webrev.03-04/hotspot +1 Glad to see the AndI folds up easily; thanks for the cleanup. -------------- next part -------------- An HTML attachment was scrubbed... URL: From volker.simonis at gmail.com Wed Jan 28 20:40:49 2015 From: volker.simonis at gmail.com (Volker Simonis) Date: Wed, 28 Jan 2015 21:40:49 +0100 Subject: What's the status of / relation between "JEP 169: Value Objects" / "Value Types for Java" / "Object Layout" Message-ID: Hi everybody, I've recently did some research on Java "value objects" / "value types" / "object layout" (I'll be actually giving a short talk on the topic at FOSDEM[0] this weekend). I just want to quickly summarize my current findings here and gently ask for feedback in case you think I've totally misunderstood something. Of course any comments and additional information is highly welcome as well. 1. JEP 169: Value Objects [1] - Created by John Rose in 2012 (last update in Sep. 2014) - Still in "Draft" state - Proposes a new "lockPermanently()" operator which marks objects as immutable - Seems to be only a little "helper functionality" to simplify automatic boxing/unboxing and escape analysis - Referenced the mlwm mailing list and repository but the mlwm repo seems dead since about 15 month now Question: is JEP 169 still under active development or has it been merged into the more general "Value types for Java" proposal below? 2. "Value types for Java" / "State of the Values" [2] - By J. Rose, B. Goetz and Guy Stele - Based on earlier ideas from "Value Types in the VM" [3] - Newest and most elaborate proposal - Proposes general (i.e. function arguments, return values, variables, arrays), "immutable" value types - Requires fundamental changes to the VM as well as to the Java language - Related to the "State of the Specialization" proposal [4] about support for generics over primitive and value types by B Goetz. - Discussed and developed in the OpenJDK "Valhalla" [5] project - Still very early stage (i.e. no "code" available yet) 3. 
PackedObjects as provided by the IBM J9 [6,7] - Flattens the memory layout of "@Packed" object fields and array - Removes object headers of and references to "@Packed" objects - Object headers can be generated on the fly (kind of "auto-boxing") - Currently the most complete and mature solution - Not Java-compatible (e.g. can not write to a nested "@Packed" fields). Must be enabled as an experimental extension. 4. ObjectLayout [8] - A pure Java, layout-optimized data structure package - Designed similar to "@ValueSafe"/"ValueType" in [3] and "Value-base classes" in Java 8 [9] - Designed such that it can be tranparently optimized within the VM - VM can transparently layout "@Intrinsic" objects within other objects - All objects are still complete Java object with valid header - The Java part of the library is mature, first native VM-optimizations on the way [10] The "Value types for Java" approach clearly seems to be the most general but also the most complex proposal. It's out of scope for Java 9 and still questionable for Java 10 and above. The "PackedObject" and "ObjectLayout" approaches are clearly simpler and more limited in scope as they only concentrate on better object layout. However the "ObjectLayout" proposal demonstrates that this is still possible within the current Java specification while the "PackedObjects" proposal demonstrated that an optimizing implementation is feasible. I've recently built a prototype which intrinsifies/optimizes some parts of the "ObjectLayout" proposal in the HotSpot [10]. Question: is there a chance to get a some sort of Java-only but transparently optimizable structure package like "ObjectLayout" into Java early (i.e. Java 9)? In my eyes this wouldn't contradict with a more general solution like the one proposed in the "Value types for Java" approach while still offering quite significant performance improvements for quite a big range of problems. And if carefully designed, it could be easily retrofitted to use the new, general "Value Types" once they are available. Question: what would be the right place to propose something like the "ObjectLayout" library for Java 9/10? Would that fit within the umbrella of the Valhalla project or would it be done within its own project / under it's own JEP? Thanks for your patience, Volker [0] https://fosdem.org/2015/schedule/event/packed_objects/ [1] http://openjdk.java.net/jeps/169 [2] http://cr.openjdk.java.net/~jrose/values/values-0.html [3] https://blogs.oracle.com/jrose/entry/value_types_in_the_vm [4] http://cr.openjdk.java.net/~briangoetz/valhalla/specialization.html [5] http://openjdk.java.net/projects/valhalla [6] http://www.slideshare.net/rsciampacone/javaone-2013-introduction-to-packedobjects?related=1 [7] http://medianetwork.oracle.com/video/player/2623645005001 [8] http://objectlayout.org [9] http://docs.oracle.com/javase/8/docs/api/java/lang/doc-files/ValueBased.html [10] https://github.com/simonis/ObjectLayout/tree/hotspot_intrinsification/hotspot From john.r.rose at oracle.com Thu Jan 29 03:10:34 2015 From: john.r.rose at oracle.com (John Rose) Date: Wed, 28 Jan 2015 19:10:34 -0800 Subject: [9] RFR (XS): 8071787: Don't block inlining when DONT_INLINE_THRESHOLD=0 In-Reply-To: <54C91877.5040707@oracle.com> References: <54C91877.5040707@oracle.com> Message-ID: Good. Consider fixing the typo in 'makeBlockInlningWrapper'. ? 
John On Jan 28, 2015, at 9:12 AM, Vladimir Ivanov wrote: > > http://cr.openjdk.java.net/~vlivanov/8071787/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8071787 > > For testing & performance measurements, sometimes it's useful to replace block inlining wrappers with trivial reinvokers. > > This change extends DONT_INLINE_THRESHOLD in the following manner: > DONT_INLINE_THRESHOLD = -1: no wrapper > DONT_INLINE_THRESHOLD = 0: reinvoker > DONT_INLINE_THRESHOLD > 0: counting wrapper > > Before that DONT_INLINE_THRESHOLD=0 meant a counting wrapper which is removed on the first invocation. After the change, it's DONT_INLINE_THRESHOLD=1. > > Testing: manual, java/lang/invoke > > Best regards, > Vladimir Ivanov From john.r.rose at oracle.com Thu Jan 29 03:11:49 2015 From: john.r.rose at oracle.com (John Rose) Date: Wed, 28 Jan 2015 19:11:49 -0800 Subject: [9] RFR (XXS): 8071788: CountingWrapper.asType() is broken In-Reply-To: <54C91AF1.3010602@oracle.com> References: <54C91AF1.3010602@oracle.com> Message-ID: <53D3F321-0259-4878-9767-EA909EF90810@oracle.com> Good. On Jan 28, 2015, at 9:22 AM, Vladimir Ivanov wrote: > > The fix is to use adapted MethodHandle to construct LambdaForm. -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniel.latremoliere at gmail.com Thu Jan 29 11:02:32 2015 From: daniel.latremoliere at gmail.com (=?UTF-8?B?RGFuaWVsIExhdHLDqW1vbGnDqHJl?=) Date: Thu, 29 Jan 2015 12:02:32 +0100 Subject: What's the status of / relation between "JEP 169: Value Objects" / "Value Types for Java" / "Object Layout" In-Reply-To: References: Message-ID: <54CA1348.5050903@gmail.com> > I just want to quickly summarize my > current findings here and gently ask for feedback in case you think > I've totally misunderstood something. Of course any comments and > additional information is highly welcome as well. I don't know if that can be useful, but here is my point of view of developer oriented towards the question: "What feature for solving my problem?". This contains probably some or many errors, but it is another point of view (only mine), if useful. I will not use strictly projects/proposal list as the structure of my mail because content of proposal is changing and it is not my target. I am oriented towards the final user, i.e. the developer consuming these projects, not the implementer working in each of these projects. I will preferably split in three scopes following my perceived split of job between developer and runtime. The problem is data, then what can do JVM/GC with an object? I find two possibilities regarding this domain: move it, clone it. If JVM can clone the object, JVM can also move the object because the clone will not have the same address, then we have the following three features: --- 1) JVM can clone and move objects (Project Valhalla): Constraint: no complex constructor/no complex finalizer, because lifecycle of object is managed by JVM (JVM can clone, then JVM can create and destroy the object like JVM want). Only field affectation constructor, possibly with simple conversion of data format. Constraint: immutable, because we don't know which clone is good when one is modified and because modifying all clones simultaneously is slow/complex/parallel-unfriendly. Constraint: non-null because cloning a non-existing object is a non-existing problem. 
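As a small sketch in today's Java of the kind of class these constraints describe (Complex is only an illustrative example, not part of any of the cited proposals):

    // Immutable, non-null fields, and a constructor that does nothing beyond
    // field assignment: a candidate the JVM could freely clone or move.
    final class Complex {
        final double re;
        final double im;

        Complex(double re, double im) { this.re = re; this.im = im; }

        Complex plus(Complex other) {          // returns a fresh value, never mutates
            return new Complex(re + other.re, im + other.im);
        }
    }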
Use-case "Performance": objects to clone for being closer to execution silicon and better parallelism (registers or cache of CPU/GPU) - Runtime: expose features of CPU/GPU like SIMD (mostly like a modern version of javax.vecmath). - Developer: create custom low-level structures for CPU/GPU parallel computing. - Java language: small tuples, like complex numbers (immutable by performance choice, like SIMD, for being close to silicon; cloned at each pass by value). Use-case "Language": objects to clone for being closer to registers (in stack, then less allocations in heap; simpler than escape analysis) - Java language: multiple return values from a method (immutable because it's a result; cloned, by example, at the return of each delegate or not even created when stack-only). Use-case "Efficiency": others immutable non-null objects possibly concerned for reducing indirection/improving cache, given by specialization of collection classes - Database: primary key for Map (like HashMap)/B-Tree (like MapDB)/SQL (like JPA). A primary key is immutable and non-null by choice of developer, then possible gains. --- 2) JVM can move but not clone objects It's current state of Java objects: Constraint: developer need to define lifecycle in object, for being triggered by GC (constructor/finalizer) like current Java class. Constraint: small object, because when GC move a big object, there is possibly a noticeable latency. Constraint: usable directly only in Java code (because native code will need an indirection level for finding the real address of the object, changing after each move) Improvement by adding custom layout for objects (Project Panama on heap / ObjectLayout): Specific constraint: objects which are near identity-less, i.e. only one other object (the owner) know their identity/have pointer on it. Non-constraint: applicable to all objects types, contrary to Project Valhalla. Applicable to complex constructor, because complex constructor can be inlined in owner code where called. Applicable to mutable objects , because no cloning then no incoherency. Applicable to nullable objects only by adding a boolean field in the custom layout for storing potential existence or non-existence of the inlined object, and updating code testing nullability for using this boolean. Use-case "General efficiency": Custom layout (Inline sub-object in the object owning it): - Reduce memory use with less objects then less headers and less pointers. - Improve cache performance with better locality (objects inlined are in same cache line, then no reference to follow). - Applicable to many fields containing reference, requiring only the referenced object to be invisible from all objects except one (the owner). By example, a private field containing an internal ArrayList (without getter/setter) can probably be replaced by the integer containing the used size and the reference to backing array, with inlining of the few methods of ArrayList really used. It need probably to be driven by developer after real profiling for finding best ratio between efficiency/code expansion. It will probably have much more use-cases when AOT will be available and developer-manageable precisely (Jigsaw???), because most slow work of object-code inlining and following optimizations can be done at AOT time, while gains will be at running time. 
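To make the ArrayList example above concrete, here is a hand-written sketch; the Order class and its members are invented for illustration, and no VM or library generates this today. The owner absorbs the list's used size and backing array instead of holding a reference to a separate ArrayList object, which is roughly the layout a VM- or AOT-driven inlining of a single-owner sub-object would aim for.

import java.util.Arrays;

// Illustrative only: the two interesting fields of a private ArrayList are
// flattened into the owner, removing one object header and one indirection.
final class Order {
    private Object[] lines = new Object[4];   // backing array, formerly inside the ArrayList
    private int lineCount;                    // used size, formerly ArrayList's size field

    void addLine(Object line) {
        if (lineCount == lines.length) {
            lines = Arrays.copyOf(lines, lineCount * 2);
        }
        lines[lineCount++] = line;
    }

    Object lineAt(int index) {
        if (index < 0 || index >= lineCount) {
            throw new IndexOutOfBoundsException(String.valueOf(index));
        }
        return lines[index];
    }
}

Doing this by hand trades clarity for layout, which is why it is better left to the VM or an AOT compiler guided by profiling data, as discussed next.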
Probably useful for the hottest code (JIT after this pre-optimization at AOT time) and clearly bad for the coldest code (interpreter then avoid code expansion), but very useful for the big quantity of code between, which will gain from AOT if complex optimizations are available. This will very probably require developer help/instructions/annotations using profiler data obtained on functional tests of application. --- 3) JVM can not move or clone objects (Project Panama off heap / PackedObjects) Constraint: developer need to manage externally the full lifecycle of object and need to choose when creating or destroying it. Object is off-heap and an handle is on-heap for managing off-heap part. Constraint: potential fragmentation of free memory when frequently creating and removing objects not having the same size (taking attention to object size vs. page size is probably important). Use-case "GC Latency": big data structure inducing GC latency when moved if stored in heap - All big chunks of data, like Big Data or textures in games, etc. - Few number of objects for being manageable more explicitly by developer (without too much work). Use-case "Native": communicate with native library - Modern version of JNI Only my 2 cents, Daniel. From blackdrag at gmx.org Thu Jan 29 11:55:39 2015 From: blackdrag at gmx.org (Jochen Theodorou) Date: Thu, 29 Jan 2015 12:55:39 +0100 Subject: What's the status of / relation between "JEP 169: Value Objects" / "Value Types for Java" / "Object Layout" In-Reply-To: <54CA1348.5050903@gmail.com> References: <54CA1348.5050903@gmail.com> Message-ID: <54CA1FBB.7020601@gmx.org> Am 29.01.2015 12:02, schrieb Daniel Latr?moli?re: > >> I just want to quickly summarize my >> current findings here and gently ask for feedback in case you think >> I've totally misunderstood something. Of course any comments and >> additional information is highly welcome as well. > I don't know if that can be useful, but here is my point of view of > developer oriented towards the question: "What feature for solving my > problem?". This contains probably some or many errors, but it is another > point of view (only mine), if useful. [...] > 3) JVM can not move or clone objects (Project Panama off heap / > PackedObjects) > Constraint: developer need to manage externally the full lifecycle of > object and need to choose when creating or destroying it. Object is > off-heap and an handle is on-heap for managing off-heap part. > Constraint: potential fragmentation of free memory when frequently > creating and removing objects not having the same size (taking attention > to object size vs. page size is probably important). > > Use-case "GC Latency": big data structure inducing GC latency when moved > if stored in heap > - All big chunks of data, like Big Data or textures in games, etc. > - Few number of objects for being manageable more explicitly by > developer (without too much work). > > Use-case "Native": communicate with native library > - Modern version of JNI From that view it makes me wonder if that is really in the scope of JEP 169. 
bye Jochen -- Jochen "blackdrag" Theodorou - Groovy Project Tech Lead blog: http://blackdragsview.blogspot.com/ german groovy discussion newsgroup: de.comp.lang.misc For Groovy programming sources visit http://groovy-lang.org From brian.goetz at oracle.com Thu Jan 29 17:05:23 2015 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 29 Jan 2015 12:05:23 -0500 Subject: What's the status of / relation between "JEP 169: Value Objects" / "Value Types for Java" / "Object Layout" In-Reply-To: References: Message-ID: <54CA6853.2060601@oracle.com> > Question: is JEP 169 still under active development or has it been > merged into the more general "Value types for Java" proposal below? It has been merged into the more general Value Types for Java proposal. > The "Value types for Java" approach clearly seems to be the most > general but also the most complex proposal. For some meanings of "complex". It is certainly the most intrusive and large; new bytecodes, new type signatures. But from a user-model perspective, value types are actually fairly simple. > It's out of scope for Java > 9 and still questionable for Java 10 and above. The "PackedObject" and > "ObjectLayout" approaches are clearly simpler and more limited in > scope as they only concentrate on better object layout. To your list, I'd add: Project Panama, the sister project to Valhalla. Panama focuses on interop with native code and data, including layout specification. A key goal of Packed was to be able to access off-heap native data in its native format, rather than marshalling it across the JNI boundary. Panama is focused on this problem as well, but aims to treat it as a separate problem from Java object layout, resulting in what we believe to be a cleaner decomposition of the two concerns. Packed is an interesting mix of memory density (object embedding and packed arrays) and native interop. But mixing the two goals also has costs; our approach is to separate them into orthogonal concerns, and we think that Valhalla and Panama do just that. So in many ways, while a larger project, the combination of Valhalla+Panama addresses the problem that Packed did, in a cleaner way. > Question: is there a chance to get a some sort of Java-only but > transparently optimizable structure package like "ObjectLayout" into > Java early (i.e. Java 9)? It would depend on a lot of things -- including the level of readiness of the design and implementation, and the overlap with anticipated future features. We've reviewed some of the early design of ObjectLayout and provided feedback to the project's architects; currently, I think it's in the "promising exploration" stage, but I think multiple rounds of simplification are needed before it is ready to be considered for "everybody's Java." But if the choice is to push something that's not ready into 9, or to wait longer -- there's not actually a choice to be made there. I appreciate the desire to "get something you can use now", but we have to be prepared to support whatever we push into Java for the next 20 years, and deal with the additional constraints it generates -- which can be an enormous cost. (Even though the direct cost is mostly borne by Oracle, the indirect cost is borne by everyone, in the form of slower progress on everything else.) So I am very wary of the motivation of "well, something better is coming, but this works now, so can we push it in?" I'd prefer to focus on answering whether this is the right thing for Java for the next 20 years.
> In my eyes this wouldn't contradict with a more general solution like > the one proposed in the "Value types for Java" approach while still > offering quite significant performance improvements for quite a big > range of problems. The goals of the ObjectLayout effort overlap with, but also differ from, the goals of Valhalla. And herein is the problem; neither generalizes the other, and I don't think we do the user base a great favor by pursuing two separate neither-coincident-nor-orthogonal approaches. I suspect, though, that after a few rounds of simplification, ObjectLayout could morph into something that fit either coincidentally or orthogonally with the Valhalla work -- which would be great. But, as you know, our resources are limited, so we (Oracle) can't really afford to invest in both. And such simplification takes time -- getting to that "aha" moment when you realize you can simplify something is generally an incompressible process. > Question: what would be the right place to propose something like the > "ObjectLayout" library for Java 9/10? Would that fit within the > umbrella of the Valhalla project or would it be done within its own > project / under it's own JEP? Suggesting a version number at this point would be putting the cart before the horse (you'll note that we've not even proposed a version number for Valhalla; the closest we've gotten to that is "after 9".) OpenJDK Projects are a tool for building a community around a body of work; JEPs are a project-management tool for defining, scoping, and tracking the progress of a feature. Given where OL is, it would be reasonable to start a Project, which would become the nexus of collaboration that could eventually produce a JEP. Hope this helps, -Brian From volker.simonis at gmail.com Thu Jan 29 17:31:09 2015 From: volker.simonis at gmail.com (Volker Simonis) Date: Thu, 29 Jan 2015 18:31:09 +0100 Subject: What's the status of / relation between "JEP 169: Value Objects" / "Value Types for Java" / "Object Layout" In-Reply-To: <54CA1348.5050903@gmail.com> References: <54CA1348.5050903@gmail.com> Message-ID: Hi Daniel, thanks a lot for sharing your point of view. I wasn't aware that Project Panama is also working on similar topics (I always thought it was only about the Foreign Function Interface and the next-generation JNI). In [1,2] John Rose nicely explains that new data layouts in the JVM heap are very much on the agenda of Project Panama, and he also mentions IBM's PackedObjects and Gil Tene's ObjectLayout proposals. Regards, Volker [1] http://mail.openjdk.java.net/pipermail/panama-dev/2014-October/000042.html [2] https://blogs.oracle.com/jrose/entry/the_isthmus_in_the_vm On Thu, Jan 29, 2015 at 12:02 PM, Daniel Latrémolière wrote: > >> I just want to quickly summarize my >> current findings here and gently ask for feedback in case you think >> I've totally misunderstood something. Of course any comments and >> additional information is highly welcome as well. > > I don't know if that can be useful, but here is my point of view of > developer oriented towards the question: "What feature for solving my > problem?". This contains probably some or many errors, but it is another > point of view (only mine), if useful. > > I will not use strictly projects/proposal list as the structure of my mail > because content of proposal is changing and it is not my target. I am > oriented towards the final user, i.e. the developer consuming these > projects, not the implementer working in each of these projects.
> > I will preferably split in three scopes following my perceived split of job > between developer and runtime. The problem is data, then what can do JVM/GC > with an object? I find two possibilities regarding this domain: move it, > clone it. > > If JVM can clone the object, JVM can also move the object because the clone > will not have the same address, then we have the following three features: > --- > 1) JVM can clone and move objects (Project Valhalla): > Constraint: no complex constructor/no complex finalizer, because lifecycle > of object is managed by JVM (JVM can clone, then JVM can create and destroy > the object like JVM want). Only field affectation constructor, possibly with > simple conversion of data format. > Constraint: immutable, because we don't know which clone is good when one is > modified and because modifying all clones simultaneously is > slow/complex/parallel-unfriendly. > Constraint: non-null because cloning a non-existing object is a non-existing > problem. > > Use-case "Performance": objects to clone for being closer to execution > silicon and better parallelism (registers or cache of CPU/GPU) > - Runtime: expose features of CPU/GPU like SIMD (mostly like a modern > version of javax.vecmath). > - Developer: create custom low-level structures for CPU/GPU parallel > computing. > - Java language: small tuples, like complex numbers (immutable by > performance choice, like SIMD, for being close to silicon; cloned at each > pass by value). > > Use-case "Language": objects to clone for being closer to registers (in > stack, then less allocations in heap; simpler than escape analysis) > - Java language: multiple return values from a method (immutable because > it's a result; cloned, by example, at the return of each delegate or not > even created when stack-only). > > Use-case "Efficiency": others immutable non-null objects possibly concerned > for reducing indirection/improving cache, given by specialization of > collection classes > - Database: primary key for Map (like HashMap)/B-Tree (like MapDB)/SQL (like > JPA). A primary key is immutable and non-null by choice of developer, then > possible gains. > --- > 2) JVM can move but not clone objects > > It's current state of Java objects: > Constraint: developer need to define lifecycle in object, for being > triggered by GC (constructor/finalizer) like current Java class. > Constraint: small object, because when GC move a big object, there is > possibly a noticeable latency. > Constraint: usable directly only in Java code (because native code will need > an indirection level for finding the real address of the object, changing > after each move) > > Improvement by adding custom layout for objects (Project Panama on heap / > ObjectLayout): > Specific constraint: objects which are near identity-less, i.e. only one > other object (the owner) know their identity/have pointer on it. > Non-constraint: applicable to all objects types, contrary to Project > Valhalla. Applicable to complex constructor, because complex constructor can > be inlined in owner code where called. Applicable to mutable objects , > because no cloning then no incoherency. Applicable to nullable objects only > by adding a boolean field in the custom layout for storing potential > existence or non-existence of the inlined object, and updating code testing > nullability for using this boolean. 
> > Use-case "General efficiency": Custom layout (Inline sub-object in the > object owning it): > - Reduce memory use with less objects then less headers and less pointers. > - Improve cache performance with better locality (objects inlined are in > same cache line, then no reference to follow). > - Applicable to many fields containing reference, requiring only the > referenced object to be invisible from all objects except one (the owner). > > By example, a private field containing an internal ArrayList (without > getter/setter) can probably be replaced by the integer containing the used > size and the reference to backing array, with inlining of the few methods of > ArrayList really used. > It need probably to be driven by developer after real profiling for finding > best ratio between efficiency/code expansion. It will probably have much > more use-cases when AOT will be available and developer-manageable precisely > (Jigsaw???), because most slow work of object-code inlining and following > optimizations can be done at AOT time, while gains will be at running time. > Probably useful for the hottest code (JIT after this pre-optimization at AOT > time) and clearly bad for the coldest code (interpreter then avoid code > expansion), but very useful for the big quantity of code between, which will > gain from AOT if complex optimizations are available. This will very > probably require developer help/instructions/annotations using profiler data > obtained on functional tests of application. > --- > 3) JVM can not move or clone objects (Project Panama off heap / > PackedObjects) > Constraint: developer need to manage externally the full lifecycle of object > and need to choose when creating or destroying it. Object is off-heap and an > handle is on-heap for managing off-heap part. > Constraint: potential fragmentation of free memory when frequently creating > and removing objects not having the same size (taking attention to object > size vs. page size is probably important). > > Use-case "GC Latency": big data structure inducing GC latency when moved if > stored in heap > - All big chunks of data, like Big Data or textures in games, etc. > - Few number of objects for being manageable more explicitly by developer > (without too much work). > > Use-case "Native": communicate with native library > - Modern version of JNI > > Only my 2 cents, > Daniel. From vladimir.x.ivanov at oracle.com Thu Jan 29 18:18:13 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 29 Jan 2015 21:18:13 +0300 Subject: [9] RFR (XS): 8071787: Don't block inlining when DONT_INLINE_THRESHOLD=0 In-Reply-To: References: <54C91877.5040707@oracle.com> Message-ID: <54CA7965.7030301@oracle.com> Thanks, John! Best regards, Vladimir Ivanov On 1/29/15 6:10 AM, John Rose wrote: > Good. Consider fixing the typo in 'makeBlockInlningWrapper'. ? John > > On Jan 28, 2015, at 9:12 AM, Vladimir Ivanov wrote: >> >> http://cr.openjdk.java.net/~vlivanov/8071787/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8071787 >> >> For testing & performance measurements, sometimes it's useful to replace block inlining wrappers with trivial reinvokers. >> >> This change extends DONT_INLINE_THRESHOLD in the following manner: >> DONT_INLINE_THRESHOLD = -1: no wrapper >> DONT_INLINE_THRESHOLD = 0: reinvoker >> DONT_INLINE_THRESHOLD > 0: counting wrapper >> >> Before that DONT_INLINE_THRESHOLD=0 meant a counting wrapper which is removed on the first invocation. After the change, it's DONT_INLINE_THRESHOLD=1. 
>> >> Testing: manual, java/lang/invoke >> >> Best regards, >> Vladimir Ivanov > From vladimir.x.ivanov at oracle.com Thu Jan 29 18:18:22 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 29 Jan 2015 21:18:22 +0300 Subject: [9] RFR (XXS): 8071788: CountingWrapper.asType() is broken In-Reply-To: <53D3F321-0259-4878-9767-EA909EF90810@oracle.com> References: <54C91AF1.3010602@oracle.com> <53D3F321-0259-4878-9767-EA909EF90810@oracle.com> Message-ID: <54CA796E.2090500@oracle.com> Thanks, John! Best regards, Vladimir Ivanov On 1/29/15 6:11 AM, John Rose wrote: > Good. > > On Jan 28, 2015, at 9:22 AM, Vladimir Ivanov > > wrote: >> >> The fix is to use adapted MethodHandle to construct LambdaForm. > From christian.thalinger at oracle.com Fri Jan 30 00:41:13 2015 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Thu, 29 Jan 2015 16:41:13 -0800 Subject: Invokedynamic and recursive method call In-Reply-To: References: <54A3019C.1070909@univ-mlv.fr> <9E513159-F926-4845-A11E-6585F8CFD788@oracle.com> <54AD5B3C.80004@univ-mlv.fr> Message-ID: Trying to remember compiler implementation details this sounds reasonable and is a bug (or an enhancement, actually ;-). Can someone file a bug? > On Jan 7, 2015, at 10:07 AM, Charles Oliver Nutter wrote: > > This could explain performance regressions we've seen on the > performance of heavily-recursive algorithms. I'll try to get an > assembly dump for fib in JRuby later today. > > - Charlie > > On Wed, Jan 7, 2015 at 10:13 AM, Remi Forax wrote: >> >> On 01/07/2015 10:43 AM, Marcus Lagergren wrote: >>> >>> Remi, I tried to reproduce your problem with jdk9 b44. It runs decently >>> fast. >> >> >> yes, nashorn is fast enough but it can be faster if the JIT was not doing >> something stupid. >> >> When the VM inline fibo, because fibo is recursive, the recursive call is >> inlined only once, >> so the call at depth=2 can not be inlined but should be a classical direct >> call. >> >> But if fibo is called through an invokedynamic, instead of emitting a direct >> call to fibo, >> the JIT generates a code that push the method handle on stack and execute it >> like if the metod handle was not constant >> (the method handle is constant because the call at depth=1 is inlined !). >> >>> When did it start to regress? >> >> >> jdk7u40, i believe. >> >> I've created a jar containing some handwritten bytecodes with no dependency >> to reproduce the issue easily: >> https://github.com/forax/vmboiler/blob/master/test7/fibo7.jar >> >> [forax at localhost test7]$ time /usr/jdk/jdk1.9.0/bin/java -cp fibo7.jar >> FiboSample >> 1836311903 >> >> real 0m6.653s >> user 0m6.729s >> sys 0m0.019s >> [forax at localhost test7]$ time /usr/jdk/jdk1.8.0_25/bin/java -cp fibo7.jar >> FiboSample >> 1836311903 >> >> real 0m6.572s >> user 0m6.591s >> sys 0m0.019s >> [forax at localhost test7]$ time /usr/jdk/jdk1.7.0_71/bin/java -cp fibo7.jar >> FiboSample >> 1836311903 >> >> real 0m6.373s >> user 0m6.396s >> sys 0m0.016s >> [forax at localhost test7]$ time /usr/jdk/jdk1.7.0_25/bin/java -cp fibo7.jar >> FiboSample >> 1836311903 >> >> real 0m4.847s >> user 0m4.832s >> sys 0m0.019s >> >> as you can see, it was faster with a JDK before jdk7u40. 
>> >>> >>> Regards >>> Marcus >> >> >> cheers, >> R?mi >> >> >>> >>>> On 30 Dec 2014, at 20:48, Remi Forax wrote: >>>> >>>> Hi guys, >>>> I've found a bug in the interaction between the lambda form and inlining >>>> algorithm, >>>> basically if the inlining heuristic bailout because the method is >>>> recursive and already inlined once, >>>> instead to emit a code to do a direct call, it revert to do call to >>>> linkStatic with the method >>>> as MemberName. >>>> >>>> I think it's a regression because before the introduction of lambda >>>> forms, >>>> I'm pretty sure that the JIT was emitting a direct call. >>>> >>>> Step to reproduce with nashorn, run this JavaScript code >>>> function fibo(n) { >>>> return (n < 2)? 1: fibo(n - 1) + fibo(n - 2) >>>> } >>>> >>>> print(fibo(45)) >>>> >>>> like this: >>>> /usr/jdk/jdk1.9.0/bin/jjs -J-XX:+UnlockDiagnosticVMOptions >>>> -J-XX:+PrintAssembly fibo.js > log.txt >>>> >>>> look for a method 'fibo' from the tail of the log, you will find >>>> something like this: >>>> >>>> 0x00007f97e4b4743f: mov $0x76d08f770,%r8 ; {oop(a >>>> 'java/lang/invoke/MemberName' = {method} {0x00007f97dcff8e40} 'fibo' >>>> '(Ljdk/nashorn/internal/runtime/ScriptFunction;Ljava/lang/Object;I)I' in >>>> 'jdk/nashorn/internal/scripts/Script$Recompilation$2$fibo')} >>>> 0x00007f97e4b47449: xchg %ax,%ax >>>> 0x00007f97e4b4744b: callq 0x00007f97dd0446e0 >>>> >>>> I hope this can be fixed. My demonstration that I can have fibo written >>>> with a dynamic language >>>> that run as fast as written in Java doesn't work anymore :( >>>> >>>> cheers, >>>> R?mi >>>> >>>> _______________________________________________ >>>> mlvm-dev mailing list >>>> mlvm-dev at openjdk.java.net >>>> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev >>> >>> _______________________________________________ >>> mlvm-dev mailing list >>> mlvm-dev at openjdk.java.net >>> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev >> >> >> _______________________________________________ >> mlvm-dev mailing list >> mlvm-dev at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev > _______________________________________________ > mlvm-dev mailing list > mlvm-dev at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev From john.r.rose at oracle.com Fri Jan 30 00:48:03 2015 From: john.r.rose at oracle.com (John Rose) Date: Thu, 29 Jan 2015 16:48:03 -0800 Subject: Invokedynamic and recursive method call In-Reply-To: <54AD5B3C.80004@univ-mlv.fr> References: <54A3019C.1070909@univ-mlv.fr> <9E513159-F926-4845-A11E-6585F8CFD788@oracle.com> <54AD5B3C.80004@univ-mlv.fr> Message-ID: <68EA3AFE-2625-4797-A552-ED07576BC46B@oracle.com> On Jan 7, 2015, at 8:13 AM, Remi Forax wrote: > > But if fibo is called through an invokedynamic, instead of emitting a direct call to fibo, > the JIT generates a code that push the method handle on stack and execute it > like if the metod handle was not constant > (the method handle is constant because the call at depth=1 is inlined !). Invocation of non-constant MH's had a performance regression with the LF-based implementation. As of JDK-8069591 they should be no slower and sometimes faster than the old implementation. ? John -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From christian.thalinger at oracle.com Fri Jan 30 01:01:04 2015 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Thu, 29 Jan 2015 17:01:04 -0800 Subject: Invokedynamic and recursive method call In-Reply-To: <68EA3AFE-2625-4797-A552-ED07576BC46B@oracle.com> References: <54A3019C.1070909@univ-mlv.fr> <9E513159-F926-4845-A11E-6585F8CFD788@oracle.com> <54AD5B3C.80004@univ-mlv.fr> <68EA3AFE-2625-4797-A552-ED07576BC46B@oracle.com> Message-ID: <67DAAC92-261C-4769-8299-027F66081AFE@oracle.com> > On Jan 29, 2015, at 4:48 PM, John Rose wrote: > > On Jan 7, 2015, at 8:13 AM, Remi Forax > wrote: >> >> But if fibo is called through an invokedynamic, instead of emitting a direct call to fibo, >> the JIT generates a code that push the method handle on stack and execute it >> like if the metod handle was not constant >> (the method handle is constant because the call at depth=1 is inlined !). > > Invocation of non-constant MH's had a performance regression with the LF-based implementation. > As of JDK-8069591 they should be no slower and sometimes faster than the old implementation. Maybe but what Remi is saying that the MH is constant and we could emit a direct call. > ? John > > _______________________________________________ > mlvm-dev mailing list > mlvm-dev at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev -------------- next part -------------- An HTML attachment was scrubbed... URL: From forax at univ-mlv.fr Fri Jan 30 01:03:03 2015 From: forax at univ-mlv.fr (Remi Forax) Date: Fri, 30 Jan 2015 02:03:03 +0100 Subject: Invokedynamic and recursive method call In-Reply-To: <68EA3AFE-2625-4797-A552-ED07576BC46B@oracle.com> References: <54A3019C.1070909@univ-mlv.fr> <9E513159-F926-4845-A11E-6585F8CFD788@oracle.com> <54AD5B3C.80004@univ-mlv.fr> <68EA3AFE-2625-4797-A552-ED07576BC46B@oracle.com> Message-ID: <54CAD847.1070804@univ-mlv.fr> On 01/30/2015 01:48 AM, John Rose wrote: > On Jan 7, 2015, at 8:13 AM, Remi Forax > wrote: >> >> But if fibo is called through an invokedynamic, instead of emitting a >> direct call to fibo, >> the JIT generates a code that push the method handle on stack and >> execute it >> like if the metod handle was not constant >> (the method handle is constant because the call at depth=1 is inlined !). > > Invocation of non-constant MH's had a performance regression with the > LF-based implementation. > As of JDK-8069591 they should be no slower and sometimes faster than > the old implementation. > ? John > In my case, the method handle is constant (I think it's also the case when you write fibo in javascript). At depth=1, the call is correctly inlined. At depth=2, the call is not inlined because it's a recursive call and by default hotspot only inline recursive call once, this is normal behavior. The bug is that instead of doing a call (using the call assembly instruction), the JIT pushes the method handle on stack and do an invokebasic, which is slower. R?mi -------------- next part -------------- An HTML attachment was scrubbed... 
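A rough Java-only approximation of the call shape Remi describes is sketched below; the class and member names are invented, and this is not the bytecode nashorn actually emits. The recursive call goes through a constant MethodHandle, so an ideal JIT would turn the non-inlined depth-2 call into a plain direct call instead of going through the generic method-handle linkage path.

import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Rough approximation only: a recursive fibo whose self-call goes through a
// constant MethodHandle, similar in shape to the call a dynamic language
// makes once its invokedynamic call site has a constant target.
public class FiboViaHandle {
    static final MethodHandle FIBO;
    static {
        try {
            FIBO = MethodHandles.lookup().findStatic(
                    FiboViaHandle.class, "fibo",
                    MethodType.methodType(int.class, int.class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    static int fibo(int n) throws Throwable {
        // The recursive calls go through the constant handle; with the reported
        // bug the non-inlined copy is dispatched through linkStatic/invokeBasic
        // instead of being compiled to a direct call.
        return n < 2 ? 1 : (int) FIBO.invokeExact(n - 1) + (int) FIBO.invokeExact(n - 2);
    }

    public static void main(String[] args) throws Throwable {
        System.out.println((int) FIBO.invokeExact(45));
    }
}

Running it with the same -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly flags used in the original report (the hsdis plugin is needed) should show whether the recursive, non-inlined call ends up as a direct call.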
URL: From vladimir.x.ivanov at oracle.com Fri Jan 30 15:07:26 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 30 Jan 2015 18:07:26 +0300 Subject: Invokedynamic and recursive method call In-Reply-To: <54CAD847.1070804@univ-mlv.fr> References: <54A3019C.1070909@univ-mlv.fr> <9E513159-F926-4845-A11E-6585F8CFD788@oracle.com> <54AD5B3C.80004@univ-mlv.fr> <68EA3AFE-2625-4797-A552-ED07576BC46B@oracle.com> <54CAD847.1070804@univ-mlv.fr> Message-ID: <54CB9E2E.6040203@oracle.com> Remi, thanks for the report! Filed JDK-8072008 [1]. Best regards, Vladimir Ivanov [1] https://bugs.openjdk.java.net/browse/JDK-8072008 On 1/30/15 4:03 AM, Remi Forax wrote: > > On 01/30/2015 01:48 AM, John Rose wrote: >> On Jan 7, 2015, at 8:13 AM, Remi Forax > > wrote: >>> >>> But if fibo is called through an invokedynamic, instead of emitting a >>> direct call to fibo, >>> the JIT generates a code that push the method handle on stack and >>> execute it >>> like if the metod handle was not constant >>> (the method handle is constant because the call at depth=1 is inlined !). >> >> Invocation of non-constant MH's had a performance regression with the >> LF-based implementation. >> As of JDK-8069591 they should be no slower and sometimes faster than >> the old implementation. >> ? John >> > > In my case, the method handle is constant (I think it's also the case > when you write fibo in javascript). > At depth=1, the call is correctly inlined. > At depth=2, the call is not inlined because it's a recursive call and by > default hotspot only inline recursive call once, > this is normal behavior. The bug is that instead of doing a call (using > the call assembly instruction), > the JIT pushes the method handle on stack and do an invokebasic, which > is slower. > > R?mi > > > > _______________________________________________ > mlvm-dev mailing list > mlvm-dev at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev > From forax at univ-mlv.fr Sat Jan 31 22:54:46 2015 From: forax at univ-mlv.fr (Remi Forax) Date: Sat, 31 Jan 2015 23:54:46 +0100 Subject: Invokedynamic and recursive method call In-Reply-To: <54CB9E2E.6040203@oracle.com> References: <54A3019C.1070909@univ-mlv.fr> <9E513159-F926-4845-A11E-6585F8CFD788@oracle.com> <54AD5B3C.80004@univ-mlv.fr> <68EA3AFE-2625-4797-A552-ED07576BC46B@oracle.com> <54CAD847.1070804@univ-mlv.fr> <54CB9E2E.6040203@oracle.com> Message-ID: <54CD5D36.5040700@univ-mlv.fr> Thank you, Vladimir ! R?mi On 01/30/2015 04:07 PM, Vladimir Ivanov wrote: > Remi, thanks for the report! > > Filed JDK-8072008 [1]. > > Best regards, > Vladimir Ivanov > > [1] https://bugs.openjdk.java.net/browse/JDK-8072008 > > On 1/30/15 4:03 AM, Remi Forax wrote: >> >> On 01/30/2015 01:48 AM, John Rose wrote: >>> On Jan 7, 2015, at 8:13 AM, Remi Forax >> > wrote: >>>> >>>> But if fibo is called through an invokedynamic, instead of emitting a >>>> direct call to fibo, >>>> the JIT generates a code that push the method handle on stack and >>>> execute it >>>> like if the metod handle was not constant >>>> (the method handle is constant because the call at depth=1 is >>>> inlined !). >>> >>> Invocation of non-constant MH's had a performance regression with the >>> LF-based implementation. >>> As of JDK-8069591 they should be no slower and sometimes faster than >>> the old implementation. >>> ? John >>> >> >> In my case, the method handle is constant (I think it's also the case >> when you write fibo in javascript). >> At depth=1, the call is correctly inlined. 
>> At depth=2, the call is not inlined because it's a recursive call and by >> default hotspot only inline recursive call once, >> this is normal behavior. The bug is that instead of doing a call (using >> the call assembly instruction), >> the JIT pushes the method handle on stack and do an invokebasic, which >> is slower. >> >> R?mi >> >> >> >> _______________________________________________ >> mlvm-dev mailing list >> mlvm-dev at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev >> > _______________________________________________ > mlvm-dev mailing list > mlvm-dev at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev