From duncan.macgregor at ge.com Mon Jan 5 15:50:00 2015 From: duncan.macgregor at ge.com (MacGregor, Duncan (GE Energy Management)) Date: Mon, 5 Jan 2015 15:50:00 +0000 Subject: That was the year that was. Message-ID: <978D194CF4618446926EDAD23C949D9939BC32F5@LONURLNA08.e2k.ad.ge.com>

Since it's now the new year I thought it was a good opportunity to look back on the progress we've made in Magik on Java over the course of the last twelve months. In my JVMLS talk I mentioned LF memory usage and startup time as areas of concern, as did Marcus and others. Over the last couple of months I and a couple of other team members have been given the time to seriously look at our startup time and performance and, along with the changes made in 8u40, have made substantial progress.

Startup time

Getting our system to boot on Linux, using Solaris Studio and other profiling tools, and producing piles and piles of flame graphs has proved very useful in analysing startup time. It has shown up some areas of our own legacy infrastructure that were contributing substantially to startup, but reducing the total number of classes generated has also greatly reduced our startup time. Due to the nature of the language we do need to evaluate as we compile, so we have introduced a two-stage compilation process where we compile and evaluate files in small chunks but do not write out those class files, instead generating one large class file representing the whole source file at the end. On typical application code this has reduced the class count by 75% and substantially reduced the class loading time (also greatly reducing the time spent resolving method handle constants - partly why I haven't had version 2 of that patch higher on my priority queue - sorry John).

linkCallSite and friends (especially setTarget) still show up significantly on flame graphs (almost 17% of samples). The time to create a mutable call site appears to be almost completely dominated by the MethodHandleNatives.setCallSiteTargetNormal call commonly done in the constructor of the call site itself. Some quick and dirty instrumentation shows that we create about 50% more constant call sites for symbols than we do mutable call sites for method calls, but the constant sites show up in about 1/60th of the traces compared to the mutable sites. Another 12% of startup is taken up with resetting call site targets after the fallback has been invoked.

I'm not sure how much more time we'll get to work on this area, or whether startup time (or at least this portion of it) will be regarded as "good enough", but there seem to be a couple of avenues we could explore to improve things:

1. We could look at refactoring our code so that setTarget does not need to be used when initialising our mutable call sites. Since most sites need a fallback method bound to themselves in some way, this would require refactoring our code to create objects that hold a MutableCallSite rather than subclass MutableCallSite (a rough sketch of this shape follows below). This might help to further our plans for decomposing call sites into their functional parts, but it is something I'm not going to explore without doing some thorough benchmarking first.

2. It's also worth digging into when it is worth resetting a call site's target. Mutable sites hit during bootstrap frequently get used only once, or at most a small number of times, so we might do better gathering some type information and only setting the target when it seems worth the cost.
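To make option 1 above a little more concrete, here is a minimal, hypothetical Java sketch of the "hold a MutableCallSite instead of subclassing it" shape. The class and method names (DispatchSite, fallback, resolve) are illustrative only and are not the actual Magik runtime code. Because the fallback handle is bound to the wrapper object before the call site is constructed, it can be passed straight to the MutableCallSite constructor, so no setTarget call is needed during initialisation:

import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

final class DispatchSite {
    private final MutableCallSite site;

    DispatchSite(MethodType type) throws ReflectiveOperationException {
        // Bind the slow path to this wrapper object, then shape it to the site's type.
        MethodHandle fallback = MethodHandles.lookup()
                .bind(this, "fallback", MethodType.methodType(Object.class, Object[].class))
                .asCollector(Object[].class, type.parameterCount())
                .asType(type);
        // The initial target is supplied to the constructor, so no setTarget()
        // happens while the site is being created.
        site = new MutableCallSite(fallback);
    }

    // Slow path: link the call, install a faster target, and complete this invocation.
    Object fallback(Object... args) throws Throwable {
        MethodHandle resolved = resolve(args);            // runtime-specific linking (placeholder)
        site.setTarget(resolved.asType(site.type()));     // setTarget now runs only on the slow path
        return resolved.invokeWithArguments(args);
    }

    MethodHandle resolve(Object[] args) {
        throw new UnsupportedOperationException("method lookup goes here");
    }

    MutableCallSite callSite() {
        return site;
    }
}

Whether this actually wins anything would, as noted above, need careful benchmarking; the sketch only shows that the fallback-bound-to-the-site pattern does not force a subclass of MutableCallSite.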
We've considered a couple of more radical approaches to reducing startup time, mostly around either implementing an interpreter to handle the bootstrap code (because it's always fun to maintain an interpreter and a compiler) or some form of serialisation (tricky to get right and to fit in with the modularisation work), but I'm more than open to any other wacky ideas people want to throw in.

Memory

The LambdaForm changes have had an excellent effect on application memory usage. There's still plenty of room to reduce it further, but that is now probably more a matter of us optimising our core and application code than of fundamental JVM issues.

Anyway, happy new year to everyone on the mlvm list, Duncan.

From forax at univ-mlv.fr Tue Jan 6 07:51:35 2015 From: forax at univ-mlv.fr (Remi Forax) Date: Tue, 06 Jan 2015 08:51:35 +0100 Subject: Invokedynamic and recursive method call In-Reply-To: <54A3019C.1070909@univ-mlv.fr> References: <54A3019C.1070909@univ-mlv.fr> Message-ID: <54AB9407.7020805@univ-mlv.fr> ping ? Rémi On 12/30/2014 08:48 PM, Remi Forax wrote: > Hi guys, > I've found a bug in the interaction between the lambda form and > inlining algorithm, > basically if the inlining heuristic bailout because the method is > recursive and already inlined once, > instead to emit a code to do a direct call, it revert to do call to > linkStatic with the method > as MemberName. > > I think it's a regression because before the introduction of lambda > forms, > I'm pretty sure that the JIT was emitting a direct call. > > Step to reproduce with nashorn, run this JavaScript code > function fibo(n) { > return (n < 2)? 1: fibo(n - 1) + fibo(n - 2) > } > > print(fibo(45)) > > like this: > /usr/jdk/jdk1.9.0/bin/jjs -J-XX:+UnlockDiagnosticVMOptions > -J-XX:+PrintAssembly fibo.js > log.txt > > look for a method 'fibo' from the tail of the log, you will find > something like this: > > 0x00007f97e4b4743f: mov $0x76d08f770,%r8 ; {oop(a > 'java/lang/invoke/MemberName' = {method} {0x00007f97dcff8e40} 'fibo' > '(Ljdk/nashorn/internal/runtime/ScriptFunction;Ljava/lang/Object;I)I' > in 'jdk/nashorn/internal/scripts/Script$Recompilation$2$fibo')} > 0x00007f97e4b47449: xchg %ax,%ax > 0x00007f97e4b4744b: callq 0x00007f97dd0446e0 > > I hope this can be fixed. My demonstration that I can have fibo > written with a dynamic language > that run as fast as written in Java doesn't work anymore :( > > cheers, > Rémi > > _______________________________________________ > mlvm-dev mailing list > mlvm-dev at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev From marcus.lagergren at oracle.com Wed Jan 7 09:43:26 2015 From: marcus.lagergren at oracle.com (Marcus Lagergren) Date: Wed, 7 Jan 2015 10:43:26 +0100 Subject: Invokedynamic and recursive method call In-Reply-To: <54A3019C.1070909@univ-mlv.fr> References: <54A3019C.1070909@univ-mlv.fr> Message-ID: <9E513159-F926-4845-A11E-6585F8CFD788@oracle.com> Remi, I tried to reproduce your problem with jdk9 b44. It runs decently fast. When did it start to regress? Regards Marcus > On 30 Dec 2014, at 20:48, Remi Forax wrote: > > Hi guys, > I've found a bug in the interaction between the lambda form and inlining algorithm, > basically if the inlining heuristic bailout because the method is recursive and already inlined once, > instead to emit a code to do a direct call, it revert to do call to linkStatic with the method > as MemberName. > > I think it's a regression because before the introduction of lambda forms, > I'm pretty sure that the JIT was emitting a direct call.
> > Step to reproduce with nashorn, run this JavaScript code > function fibo(n) { > return (n < 2)? 1: fibo(n - 1) + fibo(n - 2) > } > > print(fibo(45)) > > like this: > /usr/jdk/jdk1.9.0/bin/jjs -J-XX:+UnlockDiagnosticVMOptions -J-XX:+PrintAssembly fibo.js > log.txt > > look for a method 'fibo' from the tail of the log, you will find something like this: > > 0x00007f97e4b4743f: mov $0x76d08f770,%r8 ; {oop(a 'java/lang/invoke/MemberName' = {method} {0x00007f97dcff8e40} 'fibo' '(Ljdk/nashorn/internal/runtime/ScriptFunction;Ljava/lang/Object;I)I' in 'jdk/nashorn/internal/scripts/Script$Recompilation$2$fibo')} > 0x00007f97e4b47449: xchg %ax,%ax > 0x00007f97e4b4744b: callq 0x00007f97dd0446e0 > > I hope this can be fixed. My demonstration that I can have fibo written with a dynamic language > that run as fast as written in Java doesn't work anymore :( > > cheers, > R?mi > > _______________________________________________ > mlvm-dev mailing list > mlvm-dev at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev From paul.sandoz at oracle.com Wed Jan 7 11:16:20 2015 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Wed, 7 Jan 2015 12:16:20 +0100 Subject: [9] RFR (M): 8067344: Adjust java/lang/invoke/LFCaching/LFGarbageCollectedTest.java for recent changes in java.lang.invoke In-Reply-To: <549962C4.2040301@oracle.com> References: <549884E7.8040204@oracle.com> <549962C4.2040301@oracle.com> Message-ID: Hi 70 TestMethods testCase = getTestMethod(); 71 if (testCase == TestMethods.EXACT_INVOKER || testCase == TestMethods.INVOKER) { 72 // Invokers aren't collected. 73 return; 74 } Can you just filter those test cases out in the main method within EnumSet.complementOf? On Dec 23, 2014, at 1:40 PM, Vladimir Ivanov wrote: > Spotted some more problems: > - need to skip identity operations (identity_* LambdaForms) in the test, since corresponding LambdaForms reside in a permanent cache; > 82 mtype = adapter.type(); 83 if (mtype.parameterCount() == 0) { 84 // Ignore identity_* LambdaForms. 85 return; 86 } Under what conditions does this arise? i guess it might be non-determinisitic based on the randomly generated arity for the test case, so could filter more tests than absolutely required? > - need to keep original test data for diagnostic purposes, since getTestCaseData() produces new instance. > 78 adapter = getTestMethod().getTestCaseMH(data, TestMethods.Kind.ONE); Could replace "getTestMethod()" with "testCase". Paul. > Updated version: > http://cr.openjdk.java.net/~vlivanov/8067344/webrev.01/ > > Best regards, > Vladimir Ivanov > > On 12/22/14 11:53 PM, Vladimir Ivanov wrote: >> http://cr.openjdk.java.net/~vlivanov/8067344/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8067344 >> >> LFGarbageCollectedTest should be adjusted after JDK-8057020. >> >> There are a couple of problems with the test. >> >> (1) Existing logic to test that LambdaForm instance is collected isn't >> stable enough. Consequent System.GCs can hinder reference enqueueing. >> To speed up the test, I added -XX:SoftRefLRUPolicyMSPerMB=0 and limited >> the heap by -Xmx64m. >> >> (2) MethodType-based invoker caches are deliberately left strongly >> reachable. So, they should be skipped in the test. >> >> (3) Added additional diagnostic output to simplify failure analysis >> (test case details, method handle type and LambdaForm, heap dump >> (optional, -DHEAP_DUMP=true)). >> >> Testing: failing test. >> >> Thanks! 
>> >> Best regards, >> Vladimir Ivanov > _______________________________________________ > mlvm-dev mailing list > mlvm-dev at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev From forax at univ-mlv.fr Wed Jan 7 16:13:48 2015 From: forax at univ-mlv.fr (Remi Forax) Date: Wed, 07 Jan 2015 17:13:48 +0100 Subject: Invokedynamic and recursive method call In-Reply-To: <9E513159-F926-4845-A11E-6585F8CFD788@oracle.com> References: <54A3019C.1070909@univ-mlv.fr> <9E513159-F926-4845-A11E-6585F8CFD788@oracle.com> Message-ID: <54AD5B3C.80004@univ-mlv.fr> On 01/07/2015 10:43 AM, Marcus Lagergren wrote: > Remi, I tried to reproduce your problem with jdk9 b44. It runs decently fast. Yes, nashorn is fast enough, but it could be faster if the JIT were not doing something stupid. When the VM inlines fibo, because fibo is recursive the recursive call is inlined only once, so the call at depth=2 cannot be inlined but should still be a classical direct call. But if fibo is called through an invokedynamic, instead of emitting a direct call to fibo, the JIT generates code that pushes the method handle on the stack and executes it as if the method handle were not constant (the method handle is constant because the call at depth=1 is inlined!). > When did it start to regress? jdk7u40, I believe. I've created a jar containing some handwritten bytecodes with no dependencies to reproduce the issue easily: https://github.com/forax/vmboiler/blob/master/test7/fibo7.jar [forax at localhost test7]$ time /usr/jdk/jdk1.9.0/bin/java -cp fibo7.jar FiboSample 1836311903 real 0m6.653s user 0m6.729s sys 0m0.019s [forax at localhost test7]$ time /usr/jdk/jdk1.8.0_25/bin/java -cp fibo7.jar FiboSample 1836311903 real 0m6.572s user 0m6.591s sys 0m0.019s [forax at localhost test7]$ time /usr/jdk/jdk1.7.0_71/bin/java -cp fibo7.jar FiboSample 1836311903 real 0m6.373s user 0m6.396s sys 0m0.016s [forax at localhost test7]$ time /usr/jdk/jdk1.7.0_25/bin/java -cp fibo7.jar FiboSample 1836311903 real 0m4.847s user 0m4.832s sys 0m0.019s As you can see, it was faster with a JDK before jdk7u40. > > Regards > Marcus cheers, Rémi > >> On 30 Dec 2014, at 20:48, Remi Forax wrote: >> >> Hi guys, >> I've found a bug in the interaction between the lambda form and inlining algorithm, >> basically if the inlining heuristic bailout because the method is recursive and already inlined once, >> instead to emit a code to do a direct call, it revert to do call to linkStatic with the method >> as MemberName. >> >> I think it's a regression because before the introduction of lambda forms, >> I'm pretty sure that the JIT was emitting a direct call. >> >> Step to reproduce with nashorn, run this JavaScript code >> function fibo(n) { >> return (n < 2)?
1: fibo(n - 1) + fibo(n - 2) >> } >> >> print(fibo(45)) >> >> like this: >> /usr/jdk/jdk1.9.0/bin/jjs -J-XX:+UnlockDiagnosticVMOptions -J-XX:+PrintAssembly fibo.js > log.txt >> >> look for a method 'fibo' from the tail of the log, you will find something like this: >> >> 0x00007f97e4b4743f: mov $0x76d08f770,%r8 ; {oop(a 'java/lang/invoke/MemberName' = {method} {0x00007f97dcff8e40} 'fibo' '(Ljdk/nashorn/internal/runtime/ScriptFunction;Ljava/lang/Object;I)I' in 'jdk/nashorn/internal/scripts/Script$Recompilation$2$fibo')} >> 0x00007f97e4b47449: xchg %ax,%ax >> 0x00007f97e4b4744b: callq 0x00007f97dd0446e0 >> >> I hope this can be fixed. My demonstration that I can have fibo written with a dynamic language >> that run as fast as written in Java doesn't work anymore :( >> >> cheers, >> R?mi >> >> _______________________________________________ >> mlvm-dev mailing list >> mlvm-dev at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev > _______________________________________________ > mlvm-dev mailing list > mlvm-dev at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev From marcus.lagergren at oracle.com Wed Jan 7 16:27:35 2015 From: marcus.lagergren at oracle.com (Marcus Lagergren) Date: Wed, 7 Jan 2015 17:27:35 +0100 Subject: Invokedynamic and recursive method call In-Reply-To: <54AD5B3C.80004@univ-mlv.fr> References: <54A3019C.1070909@univ-mlv.fr> <9E513159-F926-4845-A11E-6585F8CFD788@oracle.com> <54AD5B3C.80004@univ-mlv.fr> Message-ID: <4B1C6F6B-08FB-464C-B130-896C00833EE5@oracle.com> 7u40 is when the native invoke dynamic implementation was replaced with Lambda Forms :-/ /M > On 07 Jan 2015, at 17:13, Remi Forax wrote: > > > On 01/07/2015 10:43 AM, Marcus Lagergren wrote: >> Remi, I tried to reproduce your problem with jdk9 b44. It runs decently fast. > > yes, nashorn is fast enough but it can be faster if the JIT was not doing something stupid. > > When the VM inline fibo, because fibo is recursive, the recursive call is inlined only once, > so the call at depth=2 can not be inlined but should be a classical direct call. > > But if fibo is called through an invokedynamic, instead of emitting a direct call to fibo, > the JIT generates a code that push the method handle on stack and execute it > like if the metod handle was not constant > (the method handle is constant because the call at depth=1 is inlined !). > >> When did it start to regress? > > jdk7u40, i believe. > > I've created a jar containing some handwritten bytecodes with no dependency to reproduce the issue easily: > https://github.com/forax/vmboiler/blob/master/test7/fibo7.jar > > [forax at localhost test7]$ time /usr/jdk/jdk1.9.0/bin/java -cp fibo7.jar FiboSample > 1836311903 > > real 0m6.653s > user 0m6.729s > sys 0m0.019s > [forax at localhost test7]$ time /usr/jdk/jdk1.8.0_25/bin/java -cp fibo7.jar FiboSample > 1836311903 > > real 0m6.572s > user 0m6.591s > sys 0m0.019s > [forax at localhost test7]$ time /usr/jdk/jdk1.7.0_71/bin/java -cp fibo7.jar FiboSample > 1836311903 > > real 0m6.373s > user 0m6.396s > sys 0m0.016s > [forax at localhost test7]$ time /usr/jdk/jdk1.7.0_25/bin/java -cp fibo7.jar FiboSample > 1836311903 > > real 0m4.847s > user 0m4.832s > sys 0m0.019s > > as you can see, it was faster with a JDK before jdk7u40. 
> >> >> Regards >> Marcus > > cheers, > R?mi > >> >>> On 30 Dec 2014, at 20:48, Remi Forax wrote: >>> >>> Hi guys, >>> I've found a bug in the interaction between the lambda form and inlining algorithm, >>> basically if the inlining heuristic bailout because the method is recursive and already inlined once, >>> instead to emit a code to do a direct call, it revert to do call to linkStatic with the method >>> as MemberName. >>> >>> I think it's a regression because before the introduction of lambda forms, >>> I'm pretty sure that the JIT was emitting a direct call. >>> >>> Step to reproduce with nashorn, run this JavaScript code >>> function fibo(n) { >>> return (n < 2)? 1: fibo(n - 1) + fibo(n - 2) >>> } >>> >>> print(fibo(45)) >>> >>> like this: >>> /usr/jdk/jdk1.9.0/bin/jjs -J-XX:+UnlockDiagnosticVMOptions -J-XX:+PrintAssembly fibo.js > log.txt >>> >>> look for a method 'fibo' from the tail of the log, you will find something like this: >>> >>> 0x00007f97e4b4743f: mov $0x76d08f770,%r8 ; {oop(a 'java/lang/invoke/MemberName' = {method} {0x00007f97dcff8e40} 'fibo' '(Ljdk/nashorn/internal/runtime/ScriptFunction;Ljava/lang/Object;I)I' in 'jdk/nashorn/internal/scripts/Script$Recompilation$2$fibo')} >>> 0x00007f97e4b47449: xchg %ax,%ax >>> 0x00007f97e4b4744b: callq 0x00007f97dd0446e0 >>> >>> I hope this can be fixed. My demonstration that I can have fibo written with a dynamic language >>> that run as fast as written in Java doesn't work anymore :( >>> >>> cheers, >>> R?mi >>> >>> _______________________________________________ >>> mlvm-dev mailing list >>> mlvm-dev at openjdk.java.net >>> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev >> _______________________________________________ >> mlvm-dev mailing list >> mlvm-dev at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev > > _______________________________________________ > mlvm-dev mailing list > mlvm-dev at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev -------------- next part -------------- An HTML attachment was scrubbed... URL: From headius at headius.com Wed Jan 7 18:07:23 2015 From: headius at headius.com (Charles Oliver Nutter) Date: Wed, 7 Jan 2015 12:07:23 -0600 Subject: Invokedynamic and recursive method call In-Reply-To: <54AD5B3C.80004@univ-mlv.fr> References: <54A3019C.1070909@univ-mlv.fr> <9E513159-F926-4845-A11E-6585F8CFD788@oracle.com> <54AD5B3C.80004@univ-mlv.fr> Message-ID: This could explain performance regressions we've seen on the performance of heavily-recursive algorithms. I'll try to get an assembly dump for fib in JRuby later today. - Charlie On Wed, Jan 7, 2015 at 10:13 AM, Remi Forax wrote: > > On 01/07/2015 10:43 AM, Marcus Lagergren wrote: >> >> Remi, I tried to reproduce your problem with jdk9 b44. It runs decently >> fast. > > > yes, nashorn is fast enough but it can be faster if the JIT was not doing > something stupid. > > When the VM inline fibo, because fibo is recursive, the recursive call is > inlined only once, > so the call at depth=2 can not be inlined but should be a classical direct > call. > > But if fibo is called through an invokedynamic, instead of emitting a direct > call to fibo, > the JIT generates a code that push the method handle on stack and execute it > like if the metod handle was not constant > (the method handle is constant because the call at depth=1 is inlined !). > >> When did it start to regress? > > > jdk7u40, i believe. 
> > I've created a jar containing some handwritten bytecodes with no dependency > to reproduce the issue easily: > https://github.com/forax/vmboiler/blob/master/test7/fibo7.jar > > [forax at localhost test7]$ time /usr/jdk/jdk1.9.0/bin/java -cp fibo7.jar > FiboSample > 1836311903 > > real 0m6.653s > user 0m6.729s > sys 0m0.019s > [forax at localhost test7]$ time /usr/jdk/jdk1.8.0_25/bin/java -cp fibo7.jar > FiboSample > 1836311903 > > real 0m6.572s > user 0m6.591s > sys 0m0.019s > [forax at localhost test7]$ time /usr/jdk/jdk1.7.0_71/bin/java -cp fibo7.jar > FiboSample > 1836311903 > > real 0m6.373s > user 0m6.396s > sys 0m0.016s > [forax at localhost test7]$ time /usr/jdk/jdk1.7.0_25/bin/java -cp fibo7.jar > FiboSample > 1836311903 > > real 0m4.847s > user 0m4.832s > sys 0m0.019s > > as you can see, it was faster with a JDK before jdk7u40. > >> >> Regards >> Marcus > > > cheers, > R?mi > > >> >>> On 30 Dec 2014, at 20:48, Remi Forax wrote: >>> >>> Hi guys, >>> I've found a bug in the interaction between the lambda form and inlining >>> algorithm, >>> basically if the inlining heuristic bailout because the method is >>> recursive and already inlined once, >>> instead to emit a code to do a direct call, it revert to do call to >>> linkStatic with the method >>> as MemberName. >>> >>> I think it's a regression because before the introduction of lambda >>> forms, >>> I'm pretty sure that the JIT was emitting a direct call. >>> >>> Step to reproduce with nashorn, run this JavaScript code >>> function fibo(n) { >>> return (n < 2)? 1: fibo(n - 1) + fibo(n - 2) >>> } >>> >>> print(fibo(45)) >>> >>> like this: >>> /usr/jdk/jdk1.9.0/bin/jjs -J-XX:+UnlockDiagnosticVMOptions >>> -J-XX:+PrintAssembly fibo.js > log.txt >>> >>> look for a method 'fibo' from the tail of the log, you will find >>> something like this: >>> >>> 0x00007f97e4b4743f: mov $0x76d08f770,%r8 ; {oop(a >>> 'java/lang/invoke/MemberName' = {method} {0x00007f97dcff8e40} 'fibo' >>> '(Ljdk/nashorn/internal/runtime/ScriptFunction;Ljava/lang/Object;I)I' in >>> 'jdk/nashorn/internal/scripts/Script$Recompilation$2$fibo')} >>> 0x00007f97e4b47449: xchg %ax,%ax >>> 0x00007f97e4b4744b: callq 0x00007f97dd0446e0 >>> >>> I hope this can be fixed. My demonstration that I can have fibo written >>> with a dynamic language >>> that run as fast as written in Java doesn't work anymore :( >>> >>> cheers, >>> R?mi >>> >>> _______________________________________________ >>> mlvm-dev mailing list >>> mlvm-dev at openjdk.java.net >>> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev >> >> _______________________________________________ >> mlvm-dev mailing list >> mlvm-dev at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev > > > _______________________________________________ > mlvm-dev mailing list > mlvm-dev at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev From joe.darcy at oracle.com Fri Jan 9 02:53:54 2015 From: joe.darcy at oracle.com (Joseph D. 
Darcy) Date: Thu, 08 Jan 2015 18:53:54 -0800 Subject: [9] RFR (M): 8067344: Adjust java/lang/invoke/LFCaching/LFGarbageCollectedTest.java for recent changes in java.lang.invoke In-Reply-To: References: <549884E7.8040204@oracle.com> <549962C4.2040301@oracle.com> Message-ID: <54AF42C2.9070105@oracle.com> Hello, I don't have a comment on the changes to the test per se, but as someone who keeps an eye on test failures that occur in regression tests in the jdk repo of the JDK 9 dev forest, I'd like to see this test stop failing, either by the test being fixed for, barring that, the testing being @ignore-d in some way until the semantics of the test can be corrected. Thanks, -Joe On 1/7/2015 3:16 AM, Paul Sandoz wrote: > Hi > > 70 TestMethods testCase = getTestMethod(); > 71 if (testCase == TestMethods.EXACT_INVOKER || testCase == TestMethods.INVOKER) { > 72 // Invokers aren't collected. > 73 return; > 74 } > > Can you just filter those test cases out in the main method within EnumSet.complementOf? > > On Dec 23, 2014, at 1:40 PM, Vladimir Ivanov wrote: > >> Spotted some more problems: >> - need to skip identity operations (identity_* LambdaForms) in the test, since corresponding LambdaForms reside in a permanent cache; >> > 82 mtype = adapter.type(); > 83 if (mtype.parameterCount() == 0) { > 84 // Ignore identity_* LambdaForms. > 85 return; > 86 } > > Under what conditions does this arise? i guess it might be non-determinisitic based on the randomly generated arity for the test case, so could filter more tests than absolutely required? > > >> - need to keep original test data for diagnostic purposes, since getTestCaseData() produces new instance. >> > 78 adapter = getTestMethod().getTestCaseMH(data, TestMethods.Kind.ONE); > > > Could replace "getTestMethod()" with "testCase". > > Paul. > >> Updated version: >> http://cr.openjdk.java.net/~vlivanov/8067344/webrev.01/ >> >> Best regards, >> Vladimir Ivanov >> >> On 12/22/14 11:53 PM, Vladimir Ivanov wrote: >>> http://cr.openjdk.java.net/~vlivanov/8067344/webrev.00/ >>> https://bugs.openjdk.java.net/browse/JDK-8067344 >>> >>> LFGarbageCollectedTest should be adjusted after JDK-8057020. >>> >>> There are a couple of problems with the test. >>> >>> (1) Existing logic to test that LambdaForm instance is collected isn't >>> stable enough. Consequent System.GCs can hinder reference enqueueing. >>> To speed up the test, I added -XX:SoftRefLRUPolicyMSPerMB=0 and limited >>> the heap by -Xmx64m. >>> >>> (2) MethodType-based invoker caches are deliberately left strongly >>> reachable. So, they should be skipped in the test. >>> >>> (3) Added additional diagnostic output to simplify failure analysis >>> (test case details, method handle type and LambdaForm, heap dump >>> (optional, -DHEAP_DUMP=true)). >>> >>> Testing: failing test. >>> >>> Thanks! >>> >>> Best regards, >>> Vladimir Ivanov >> _______________________________________________ >> mlvm-dev mailing list >> mlvm-dev at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev From vladimir.x.ivanov at oracle.com Mon Jan 12 18:06:54 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Mon, 12 Jan 2015 21:06:54 +0300 Subject: [9] RFR (M): 8067344: Adjust java/lang/invoke/LFCaching/LFGarbageCollectedTest.java for recent changes in java.lang.invoke In-Reply-To: References: <549884E7.8040204@oracle.com> <549962C4.2040301@oracle.com> Message-ID: <54B40D3E.3080608@oracle.com> Paul, Thanks for the review! 
Updated webrev: http://cr.openjdk.java.net/~vlivanov/8067344/webrev.02 > 70 TestMethods testCase = getTestMethod(); > 71 if (testCase == TestMethods.EXACT_INVOKER || testCase == TestMethods.INVOKER) { > 72 // Invokers aren't collected. > 73 return; > 74 } > > Can you just filter those test cases out in the main method within EnumSet.complementOf? Good point! Done. > 82 mtype = adapter.type(); > 83 if (mtype.parameterCount() == 0) { > 84 // Ignore identity_* LambdaForms. > 85 return; > 86 } > > Under what conditions does this arise? i guess it might be non-determinisitic based on the randomly generated arity for the test case, so could filter more tests than absolutely required? Some transformations can rarely degenerate into identity. I share your concern, so I decided to check LambdaFor.debugName instead. >> - need to keep original test data for diagnostic purposes, since getTestCaseData() produces new instance. >> > > 78 adapter = getTestMethod().getTestCaseMH(data, TestMethods.Kind.ONE); > > > Could replace "getTestMethod()" with "testCase". Done. Best regards, Vladimir Ivanov From paul.sandoz at oracle.com Mon Jan 12 18:42:11 2015 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Mon, 12 Jan 2015 19:42:11 +0100 Subject: [9] RFR (M): 8067344: Adjust java/lang/invoke/LFCaching/LFGarbageCollectedTest.java for recent changes in java.lang.invoke In-Reply-To: <54B40D3E.3080608@oracle.com> References: <549884E7.8040204@oracle.com> <549962C4.2040301@oracle.com> <54B40D3E.3080608@oracle.com> Message-ID: <6C72F39E-3CD3-4005-BD47-0FEEFFCD1F43@oracle.com> On Jan 12, 2015, at 7:06 PM, Vladimir Ivanov wrote: > Paul, > > Thanks for the review! > Look good, +1, Paul. > Updated webrev: > http://cr.openjdk.java.net/~vlivanov/8067344/webrev.02 > >> 70 TestMethods testCase = getTestMethod(); >> 71 if (testCase == TestMethods.EXACT_INVOKER || testCase == TestMethods.INVOKER) { >> 72 // Invokers aren't collected. >> 73 return; >> 74 } >> >> Can you just filter those test cases out in the main method within EnumSet.complementOf? > Good point! Done. > >> 82 mtype = adapter.type(); >> 83 if (mtype.parameterCount() == 0) { >> 84 // Ignore identity_* LambdaForms. >> 85 return; >> 86 } >> >> Under what conditions does this arise? i guess it might be non-determinisitic based on the randomly generated arity for the test case, so could filter more tests than absolutely required? > Some transformations can rarely degenerate into identity. I share your concern, so I decided to check LambdaFor.debugName instead. > >>> - need to keep original test data for diagnostic purposes, since getTestCaseData() produces new instance. >>> >> >> 78 adapter = getTestMethod().getTestCaseMH(data, TestMethods.Kind.ONE); >> >> >> Could replace "getTestMethod()" with "testCase". > Done. > -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 841 bytes Desc: Message signed with OpenPGP using GPGMail URL: From vladimir.x.ivanov at oracle.com Mon Jan 12 19:12:10 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Mon, 12 Jan 2015 22:12:10 +0300 Subject: [9] RFR (M): 8067344: Adjust java/lang/invoke/LFCaching/LFGarbageCollectedTest.java for recent changes in java.lang.invoke In-Reply-To: <6C72F39E-3CD3-4005-BD47-0FEEFFCD1F43@oracle.com> References: <549884E7.8040204@oracle.com> <549962C4.2040301@oracle.com> <54B40D3E.3080608@oracle.com> <6C72F39E-3CD3-4005-BD47-0FEEFFCD1F43@oracle.com> Message-ID: <54B41C8A.4060103@oracle.com> Thanks, Paul! 
Best regards, Vladimir Ivanov On 1/12/15 9:42 PM, Paul Sandoz wrote: > On Jan 12, 2015, at 7:06 PM, Vladimir Ivanov wrote: >> Paul, >> >> Thanks for the review! >> > > Look good, +1, > Paul. > >> Updated webrev: >> http://cr.openjdk.java.net/~vlivanov/8067344/webrev.02 >> >>> 70 TestMethods testCase = getTestMethod(); >>> 71 if (testCase == TestMethods.EXACT_INVOKER || testCase == TestMethods.INVOKER) { >>> 72 // Invokers aren't collected. >>> 73 return; >>> 74 } >>> >>> Can you just filter those test cases out in the main method within EnumSet.complementOf? >> Good point! Done. >> >>> 82 mtype = adapter.type(); >>> 83 if (mtype.parameterCount() == 0) { >>> 84 // Ignore identity_* LambdaForms. >>> 85 return; >>> 86 } >>> >>> Under what conditions does this arise? i guess it might be non-determinisitic based on the randomly generated arity for the test case, so could filter more tests than absolutely required? >> Some transformations can rarely degenerate into identity. I share your concern, so I decided to check LambdaFor.debugName instead. >> >>>> - need to keep original test data for diagnostic purposes, since getTestCaseData() produces new instance. >>>> >>> >>> 78 adapter = getTestMethod().getTestCaseMH(data, TestMethods.Kind.ONE); >>> >>> >>> Could replace "getTestMethod()" with "testCase". >> Done. >> > > From vladimir.x.ivanov at oracle.com Fri Jan 16 17:16:22 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 16 Jan 2015 20:16:22 +0300 Subject: [9] RFR (M): 8063137: Never-taken branches should be pruned when GWT LambdaForms are shared Message-ID: <54B94766.2080102@oracle.com> http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/hotspot/ http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/jdk/ https://bugs.openjdk.java.net/browse/JDK-8063137 After GuardWithTest (GWT) LambdaForms became shared, profile pollution significantly distorted compilation decisions. It affected inlining and hindered some optimizations. It causes significant performance regressions for Nashorn (on Octane benchmarks). Inlining was fixed by 8059877 [1], but it didn't cover the case when a branch is never taken. It can cause missed optimization opportunity, and not just increase in code size. For example, non-pruned branch can break escape analysis. Currently, there are 2 problems: - branch frequencies profile pollution - deoptimization counts pollution Branch frequency pollution hides from JIT the fact that a branch is never taken. Since GWT LambdaForms (and hence their bytecode) are heavily shared, but the behavior is specific to MethodHandle, there's no way for JIT to understand how particular GWT instance behaves. The solution I propose is to do profiling in Java code and feed it to JIT. Every GWT MethodHandle holds an auxiliary array (int[2]) where profiling info is stored. Once JIT kicks in, it can retrieve these counts, if corresponding MethodHandle is a compile-time constant (and it is usually the case). To communicate the profile data from Java code to JIT, MethodHandleImpl::profileBranch() is used. If GWT MethodHandle isn't a compile-time constant, profiling should proceed. It happens when corresponding LambdaForm is already shared, for newly created GWT MethodHandles profiling can occur only in native code (dedicated nmethod for a single LambdaForm). So, when compilation of the whole MethodHandle chain is triggered, the profile should be already gathered. Overriding branch frequencies is not enough. Statistics on deoptimization events is also polluted. 
Even if a branch is never taken, JIT doesn't issue an uncommon trap there unless corresponding bytecode doesn't trap too much and doesn't cause too many recompiles. I added @IgnoreProfile and place it only on GWT LambdaForms. When JIT sees it on some method, Compile::too_many_traps & Compile::too_many_recompiles for that method always return false. It allows JIT to prune the branch based on custom profile and recompile the method, if the branch is visited. For now, I wanted to keep the fix very focused. The next thing I plan to do is to experiment with ignoring deoptimization counts for other LambdaForms which are heavily shared. I already saw problems caused by deoptimization counts pollution (see JDK-8068915 [2]). I plan to backport the fix into 8u40, once I finish extensive performance testing. Testing: JPRT, java/lang/invoke tests, nashorn (nashorn testsuite, Octane). Thanks! PS: as a summary, my experiments show that fixes for 8063137 & 8068915 [2] almost completely recovers peak performance after LambdaForm sharing [3]. There's one more problem left (non-inlined MethodHandle invocations are more expensive when LFs are shared), but it's a story for another day. Best regards, Vladimir Ivanov [1] https://bugs.openjdk.java.net/browse/JDK-8059877 8059877: GWT branch frequencies pollution due to LF sharing [2] https://bugs.openjdk.java.net/browse/JDK-8068915 [3] https://bugs.openjdk.java.net/browse/JDK-8046703 JEP 210: LambdaForm Reduction and Caching From vladimir.kozlov at oracle.com Fri Jan 16 20:34:50 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 16 Jan 2015 12:34:50 -0800 Subject: [9] RFR (M): 8063137: Never-taken branches should be pruned when GWT LambdaForms are shared In-Reply-To: <54B94766.2080102@oracle.com> References: <54B94766.2080102@oracle.com> Message-ID: <54B975EA.6040005@oracle.com> Nice! At least Hotspot part since I don't understand jdk part :) I would suggest to add more detailed comment (instead of simple "Stop profiling") to inline_profileBranch() intrinsic explaining what it is doing because it is not strictly "intrinsic" - it does not implement profileBranch() java code when counts is constant. You forgot to mark Opaque4Node as macro node. I would suggest to base it on Opaque2Node then you will get some methods from it. Thanks, Vladimir On 1/16/15 9:16 AM, Vladimir Ivanov wrote: > http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/hotspot/ > http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/jdk/ > https://bugs.openjdk.java.net/browse/JDK-8063137 > > After GuardWithTest (GWT) LambdaForms became shared, profile pollution > significantly distorted compilation decisions. It affected inlining and > hindered some optimizations. It causes significant performance > regressions for Nashorn (on Octane benchmarks). > > Inlining was fixed by 8059877 [1], but it didn't cover the case when a > branch is never taken. It can cause missed optimization opportunity, and > not just increase in code size. For example, non-pruned branch can break > escape analysis. > > Currently, there are 2 problems: > - branch frequencies profile pollution > - deoptimization counts pollution > > Branch frequency pollution hides from JIT the fact that a branch is > never taken. Since GWT LambdaForms (and hence their bytecode) are > heavily shared, but the behavior is specific to MethodHandle, there's no > way for JIT to understand how particular GWT instance behaves. > > The solution I propose is to do profiling in Java code and feed it to > JIT. 
Every GWT MethodHandle holds an auxiliary array (int[2]) where > profiling info is stored. Once JIT kicks in, it can retrieve these > counts, if corresponding MethodHandle is a compile-time constant (and it > is usually the case). To communicate the profile data from Java code to > JIT, MethodHandleImpl::profileBranch() is used. > > If GWT MethodHandle isn't a compile-time constant, profiling should > proceed. It happens when corresponding LambdaForm is already shared, for > newly created GWT MethodHandles profiling can occur only in native code > (dedicated nmethod for a single LambdaForm). So, when compilation of the > whole MethodHandle chain is triggered, the profile should be already > gathered. > > Overriding branch frequencies is not enough. Statistics on > deoptimization events is also polluted. Even if a branch is never taken, > JIT doesn't issue an uncommon trap there unless corresponding bytecode > doesn't trap too much and doesn't cause too many recompiles. > > I added @IgnoreProfile and place it only on GWT LambdaForms. When JIT > sees it on some method, Compile::too_many_traps & > Compile::too_many_recompiles for that method always return false. It > allows JIT to prune the branch based on custom profile and recompile the > method, if the branch is visited. > > For now, I wanted to keep the fix very focused. The next thing I plan to > do is to experiment with ignoring deoptimization counts for other > LambdaForms which are heavily shared. I already saw problems caused by > deoptimization counts pollution (see JDK-8068915 [2]). > > I plan to backport the fix into 8u40, once I finish extensive > performance testing. > > Testing: JPRT, java/lang/invoke tests, nashorn (nashorn testsuite, Octane). > > Thanks! > > PS: as a summary, my experiments show that fixes for 8063137 & 8068915 > [2] almost completely recovers peak performance after LambdaForm sharing > [3]. There's one more problem left (non-inlined MethodHandle invocations > are more expensive when LFs are shared), but it's a story for another day. > > Best regards, > Vladimir Ivanov > > [1] https://bugs.openjdk.java.net/browse/JDK-8059877 > 8059877: GWT branch frequencies pollution due to LF sharing > [2] https://bugs.openjdk.java.net/browse/JDK-8068915 > [3] https://bugs.openjdk.java.net/browse/JDK-8046703 > JEP 210: LambdaForm Reduction and Caching > _______________________________________________ > mlvm-dev mailing list > mlvm-dev at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev From john.r.rose at oracle.com Fri Jan 16 23:13:48 2015 From: john.r.rose at oracle.com (John Rose) Date: Fri, 16 Jan 2015 15:13:48 -0800 Subject: [9] RFR (M): 8063137: Never-taken branches should be pruned when GWT LambdaForms are shared In-Reply-To: <54B94766.2080102@oracle.com> References: <54B94766.2080102@oracle.com> Message-ID: <7B03B9FB-17B4-4AE0-92B8-F2DC5B231294@oracle.com> On Jan 16, 2015, at 9:16 AM, Vladimir Ivanov wrote: > > http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/hotspot/ > http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/jdk/ > https://bugs.openjdk.java.net/browse/JDK-8063137 > ... > PS: as a summary, my experiments show that fixes for 8063137 & 8068915 [2] almost completely recovers peak performance after LambdaForm sharing [3]. There's one more problem left (non-inlined MethodHandle invocations are more expensive when LFs are shared), but it's a story for another day. This performance bump is excellent news. 
LFs are supposed to express emergently common behaviors, like hidden classes. We are much closer to that goal now. I'm glad to see that the library-assisted profiling turns out to be relatively clean. In effect this restores the pre-LF CountingMethodHandle logic from 2011, which was so beneficial in JDK 7: http://hg.openjdk.java.net/jdk7u/jdk7u/jdk/file/02de5cdbef21/src/share/classes/java/lang/invoke/CountingMethodHandle.java

I have some suggestions to make this version a little cleaner; see below.

Starting with the JDK changes:

In LambdaForm.java, I'm feeling flag pressure from all the little boolean fields and constructor parameters. (Is it time to put in a bit-encoded field "private byte LambdaForm.flags", or do we wait for another boolean to come along? But see the next questions, which are more important.)

What happens when a GWT LF gets inlined into a larger LF? Then there might be two or more selectAlternative calls. Will this confuse anything or will it Just Work? The combined LF will get profiled as usual, and the selectAlternative calls will also collect profile (or not?).

This leads to another question: Why have a boolean 'isGWT' at all? Why not just check for one or more occurrences of selectAlternative, and declare that those guys override (some of) the profiling. Something like:

-+ if (PROFILE_GWT && lambdaForm.isGWT)
++ if (PROFILE_GWT && lambdaForm.containsFunction(NF_selectAlternative))

(...where LF.containsFunction(NamedFunction) is a variation of LF.contains(Name).)

I suppose the answer may be that you want to inline GWTs (if ever) into customized code where the JVM profiling should get maximum benefit. In that case you might want to set the boolean to "false" to distinguish "immature" GWT combinators from customized ones. If that's the case, perhaps the real boolean flag you want is not 'isGWT' but 'sharedProfile' or 'immature' or some such, or (inverting) 'customized'. (I like the feel of a 'customized' flag.) Then @IgnoreProfile would get attached to a LF that (a) contains selectAlternative and (b) is marked as non-customized/immature/shared. You might also want to adjust the call to 'profileBranch' based on whether the containing LF was shared or customized. What I'm mainly poking at here is that 'isGWT' is not informative about the intended use of the flag.

In 'updateCounters', if the counter overflows, you'll get continuous creation of ArithmeticExceptions. Will that optimize or will it cause a permanent slowdown? Consider a hack like this on the exception path: counters[idx] = Integer.MAX_VALUE / 2; (A tiny sketch of this appears below.)

On the Name Bikeshed: It looks like @IgnoreProfile (ignore_profile in the VM) promises too much "ignorance", since it suppresses branch counts and traps, but allows type profiles to be consulted. Maybe something positive like "@ManyTraps" or "@SharedMegamorphic"? (It's just a name, and this is just a suggestion.)

Going to the JVM:

In library_call.cpp, I think you should change the assert to a guard:

-+ assert(aobj->length() == 2, "");
++ && aobj->length() == 2) {

In Parse::dynamic_branch_prediction, the mere presence of the Opaque4 node is enough to trigger replacement of profiling. I think there should *not* be a test of method()->ignore_profile(). That should provide better integration between the two sources of profile data to JVM profiling?

Also, I think the name 'Opaque4Node' is way too opaque. Suggest 'ProfileBranchNode', since that's exactly what it does.
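Picking up the updateCounters overflow point above, a tiny sketch of the suggested exception-path hack might look like the following (counters being the int[2] held by the GWT method handle; the helper shown here is illustrative, not the actual MethodHandleImpl code):

static void updateCounters(int[] counters, boolean result) {
    int idx = result ? 0 : 1;
    try {
        counters[idx] = Math.addExact(counters[idx], 1);
    } catch (ArithmeticException e) {
        // Saturate rather than overflow again on every subsequent call.
        counters[idx] = Integer.MAX_VALUE / 2;
    }
}

A saturated counter still tells the JIT which branch dominates, without creating a fresh ArithmeticException on every invocation once the count tops out.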
Suggest changing the log element "profile_branch" to "observe source='profileBranch'", to make a better hint as to the source of the info.

— John

From marcus.lagergren at oracle.com Sun Jan 18 21:54:43 2015 From: marcus.lagergren at oracle.com (Marcus Lagergren) Date: Sun, 18 Jan 2015 22:54:43 +0100 Subject: JFokus 2015 - the VM Tech Day Message-ID:

Greetings community members! Here is something that I'm sure you'll find interesting. I want to advertise the upcoming "VM tech day" event, scheduled to take place February 2, 2015 at the JFokus conference in Stockholm. Sorry about the short notice here, but finalizing the speaker list took us a bit more time than expected.

The VM tech day is a mini-track that runs the first day of the JFokus conference. This is its schedule: https://www.jfokus.se/jfokus/jvmtech.jsp

After some rather challenging months of jigsaw puzzles, it is with great pleasure that I can announce that our speaker line-up is now complete - and it is great indeed! We are talking 100% gurus, prophets, ninjas, rock stars, and all other similar terms that normally get your resume binned if it passes my desk. But in this case the labels are true. We have strictly top names from both the commercial world and from academia ready to take you on a great ride.

So what is the VM tech day? For those of you familiar with the JVM Language Summit (JVMLS) that usually takes place in Santa Clara in the summers, the format is similar. It's the usual deal: anyone morbidly interested in runtime internals, code generation, polyglot programming and the complexities of language implementation should find a veritable gold mine of stimulating conversation and knowledge transfer here. What is different from a typical JVMLS (except for the shorter duration) is that we have widened the scope a bit to include several runtimes, language implementation issues and polyglot problems.

There will be six scheduled sessions and plenty of time for breakouts and discussions. We will also heavily encourage audience interaction and participation.

The JFokus VM tech day is opened by John Rose. I am sure John needs no introduction to the subscribers of this list. With advanced OpenJDK projects like Valhalla and Panama booting up, John will discuss what the JVM has in store for the future.

Other speakers include the tireless Charlie Nutter from Red Hat, the formidable Remi Forax, the brilliant Vyacheslav Egorov of Google V8 fame, the esteemed Dan Heidinga from IBM and the good-looking Attila Szegedi from Oracle.

We also have plenty of non-speaking celebrity participants in the audience, for example Fredrik Öhrström: invokedynamic specification wizard extraordinaire and architect behind the new OpenJDK build system. Stop by and get autographs ;)

Thusly: if you are attending JFokus, or if you are making up your mind about attending it right now, the VM tech summit is definitely something anyone subscribing to mlvm-dev wouldn't want to miss. The cross-platform/cross-technology/cross-company focus that we have tried very hard to create will without a doubt be ultra stimulating. Of that you can be sure.

Please help us spread the word in whatever forums you deem appropriate! Talk to your friends! Tweet links to this post! Yell from your cubicle soap boxes across the neverending seas of fluorescent lights!
Any further questions you may have about the event, not answered by the web pages, can be directed either to me (@lagergren) or Mattias Karlsson (@matkar) or as replies to this e-mail thread. On behalf of JFokus / VM Tech Day 2015 Marcus Lagergren Master of ceremonies (or something) From marcus.lagergren at oracle.com Mon Jan 19 09:58:50 2015 From: marcus.lagergren at oracle.com (Marcus Lagergren) Date: Mon, 19 Jan 2015 10:58:50 +0100 Subject: JFokus 2015 - the VM Tech Day In-Reply-To: References: Message-ID: <36D2A3B5-A5ED-421A-8DAC-379712002366@oracle.com> And to further clarify things - you can attend _only_ the VM Tech day / tech summit, should you so desire, and skip the rest of the JFokus conference. (What a strange thing to do, given the quality of JFokus, but I can?t be the one questioning your priorities here) (http://www.jfokus.se/jfokus/register.jsp ) /M > On 18 Jan 2015, at 22:54, Marcus Lagergren wrote: > > Greetings community members! > > Here is something that I'm sure you'll find interesting. > > I want to advertise the upcoming "VM tech day? event, scheduled to > take place February 2, 2015 at the JFokus conference in > Stockholm. Sorry I am on a bit of a short notice here, but finalizing > the speaker list took us a bit more time than expected. > > The VM tech day is a mini-track that runs the first day of the JFokus > conference. This is its schedule: > https://www.jfokus.se/jfokus/jvmtech.jsp > > After some rather challenging months of jigsaw puzzles, it is with > great pleasure that I can announce that our speaker line up is now > complete - and it is great indeed! We are talking 100% gurus, > prophets, ninjas, rock stars, and all other similar terms that > normally gets your resume binned if it passes my desk. But in this > case the labels are true. We have strictly top names from both the > commercial world and from academia ready to take you on a great > ride. > > So what is the VM tech day? For those of you familiar with the JVM > Language Summit (JVMLS) that usually takes place in Santa Clara in > the summers, the format is similar. It?s the usual deal: anyone > morbidly interested in runtime internals, code generation, polyglot > programming and the complexities of language implementation, should > find a veritable gold mine of stimulating conversation and knowledge > transfer here. What is different from a typical JVMLS (except for the > shorter duration), is that we have widened the scope a bit to include > several runtimes, language implementation issues and polyglot > problems. > > There will be six scheduled sessions and plenty of time for breakouts > and discussions. We will also heavily encourage audience interaction > and participation. > > The JFokus VM tech day is opened by John Rose. I am sure John needs > no introduction to the subscribers of this list. With advanced OpenJDK > projects like Valhalla and Panama booting up, John will discuss what > the JVM has in store for the future. > > Other speakers include the tireless Charlie Nutter from Red Hat, the > formidable Remi Forax, the brilliant Vyacheslav Egorov of Google v8 > fame, the esteemed Dan Heidinga from IBM and the good looking Attila > Szegedi from Oracle. > > We also have plenty of non-speaking celebrity participants in the > audience, for example Fredrik ?hrstr?m: invokedynamic specification > wizard extraordinaire and architect behind the new OpenJDK build > system. 
Stop by and get autographs ;) > > Thusly: if you are attending JFokus, or if you are making up your mind > about attending it right now, the VM tech summit is definitely > something anyone subscribing to mlvm-dev wouldn't want to miss. The > cross-platform/cross-technology/cross-company focus that we have tried > very hard to create will without a doubt be ultra stimulating. Of that > you can be sure. > > Please help us spread the word in whatever forums you deem > appropriate! Talk to you friends! Tweet links to this post! Yell from > your cubicle soap boxes across the neverending seas of fluorescent > lights! > > Any further questions you may have about the event, not answered by > the web pages, can be directed either to me (@lagergren) or Mattias > Karlsson (@matkar) or as replies to this e-mail thread. > > On behalf of JFokus / VM Tech Day 2015 > Marcus Lagergren > Master of ceremonies (or something) > > _______________________________________________ > mlvm-dev mailing list > mlvm-dev at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev -------------- next part -------------- An HTML attachment was scrubbed... URL: From vladimir.x.ivanov at oracle.com Mon Jan 19 17:05:49 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Mon, 19 Jan 2015 20:05:49 +0300 Subject: [9] RFR (M): 8063137: Never-taken branches should be pruned when GWT LambdaForms are shared In-Reply-To: <54B975EA.6040005@oracle.com> References: <54B94766.2080102@oracle.com> <54B975EA.6040005@oracle.com> Message-ID: <54BD396D.2050907@oracle.com> Thanks, Vladimir! > I would suggest to add more detailed comment (instead of simple "Stop > profiling") to inline_profileBranch() intrinsic explaining what it is > doing because it is not strictly "intrinsic" - it does not implement > profileBranch() java code when counts is constant. Sure, will do. > You forgot to mark Opaque4Node as macro node. I would suggest to base it > on Opaque2Node then you will get some methods from it. Do I really need to do so? I expect it to go away during IGVN pass right after parsing is over. That's why I register the node for igvn in LibraryCallKit::inline_profileBranch(). Changes in macro.cpp & compile.cpp are leftovers from the version when Opaque4 was macro node. I plan to remove them. Best regards, Vladimir Ivanov > On 1/16/15 9:16 AM, Vladimir Ivanov wrote: >> http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/hotspot/ >> http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/jdk/ >> https://bugs.openjdk.java.net/browse/JDK-8063137 >> >> After GuardWithTest (GWT) LambdaForms became shared, profile pollution >> significantly distorted compilation decisions. It affected inlining and >> hindered some optimizations. It causes significant performance >> regressions for Nashorn (on Octane benchmarks). >> >> Inlining was fixed by 8059877 [1], but it didn't cover the case when a >> branch is never taken. It can cause missed optimization opportunity, and >> not just increase in code size. For example, non-pruned branch can break >> escape analysis. >> >> Currently, there are 2 problems: >> - branch frequencies profile pollution >> - deoptimization counts pollution >> >> Branch frequency pollution hides from JIT the fact that a branch is >> never taken. Since GWT LambdaForms (and hence their bytecode) are >> heavily shared, but the behavior is specific to MethodHandle, there's no >> way for JIT to understand how particular GWT instance behaves. 
>> >> The solution I propose is to do profiling in Java code and feed it to >> JIT. Every GWT MethodHandle holds an auxiliary array (int[2]) where >> profiling info is stored. Once JIT kicks in, it can retrieve these >> counts, if corresponding MethodHandle is a compile-time constant (and it >> is usually the case). To communicate the profile data from Java code to >> JIT, MethodHandleImpl::profileBranch() is used. >> >> If GWT MethodHandle isn't a compile-time constant, profiling should >> proceed. It happens when corresponding LambdaForm is already shared, for >> newly created GWT MethodHandles profiling can occur only in native code >> (dedicated nmethod for a single LambdaForm). So, when compilation of the >> whole MethodHandle chain is triggered, the profile should be already >> gathered. >> >> Overriding branch frequencies is not enough. Statistics on >> deoptimization events is also polluted. Even if a branch is never taken, >> JIT doesn't issue an uncommon trap there unless corresponding bytecode >> doesn't trap too much and doesn't cause too many recompiles. >> >> I added @IgnoreProfile and place it only on GWT LambdaForms. When JIT >> sees it on some method, Compile::too_many_traps & >> Compile::too_many_recompiles for that method always return false. It >> allows JIT to prune the branch based on custom profile and recompile the >> method, if the branch is visited. >> >> For now, I wanted to keep the fix very focused. The next thing I plan to >> do is to experiment with ignoring deoptimization counts for other >> LambdaForms which are heavily shared. I already saw problems caused by >> deoptimization counts pollution (see JDK-8068915 [2]). >> >> I plan to backport the fix into 8u40, once I finish extensive >> performance testing. >> >> Testing: JPRT, java/lang/invoke tests, nashorn (nashorn testsuite, >> Octane). >> >> Thanks! >> >> PS: as a summary, my experiments show that fixes for 8063137 & 8068915 >> [2] almost completely recovers peak performance after LambdaForm sharing >> [3]. There's one more problem left (non-inlined MethodHandle invocations >> are more expensive when LFs are shared), but it's a story for another >> day. >> >> Best regards, >> Vladimir Ivanov >> >> [1] https://bugs.openjdk.java.net/browse/JDK-8059877 >> 8059877: GWT branch frequencies pollution due to LF sharing >> [2] https://bugs.openjdk.java.net/browse/JDK-8068915 >> [3] https://bugs.openjdk.java.net/browse/JDK-8046703 >> JEP 210: LambdaForm Reduction and Caching >> _______________________________________________ >> mlvm-dev mailing list >> mlvm-dev at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev > _______________________________________________ > mlvm-dev mailing list > mlvm-dev at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev From duncan.macgregor at ge.com Mon Jan 19 20:21:59 2015 From: duncan.macgregor at ge.com (MacGregor, Duncan (GE Energy Management)) Date: Mon, 19 Jan 2015 20:21:59 +0000 Subject: [9] RFR (M): 8063137: Never-taken branches should be pruned when GWT LambdaForms are shared In-Reply-To: <54B94766.2080102@oracle.com> References: <54B94766.2080102@oracle.com> Message-ID: Okay, I?ve done some tests of this with the micro benchmarks for our language & runtime which show pretty much no change except for one test which is now almost 3x slower. It uses nested loops to iterate over an array and concatenate the string-like objects it contains, and replaces elements with these new longer string-llike objects. 
It?s a bit of a pathological case, and I haven?t seen the same sort of degradation in the other benchmarks or in real applications, but I haven?t done serious benchmarking of them with this change. I shall see if the test case can be reduced down to anything simpler while still showing the same performance behaviour, and try add some compilation logging options to narrow down what?s going on. Duncan. On 16/01/2015 17:16, "Vladimir Ivanov" wrote: >http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/hotspot/ >http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/jdk/ >https://bugs.openjdk.java.net/browse/JDK-8063137 > >After GuardWithTest (GWT) LambdaForms became shared, profile pollution >significantly distorted compilation decisions. It affected inlining and >hindered some optimizations. It causes significant performance >regressions for Nashorn (on Octane benchmarks). > >Inlining was fixed by 8059877 [1], but it didn't cover the case when a >branch is never taken. It can cause missed optimization opportunity, and >not just increase in code size. For example, non-pruned branch can break >escape analysis. > >Currently, there are 2 problems: > - branch frequencies profile pollution > - deoptimization counts pollution > >Branch frequency pollution hides from JIT the fact that a branch is >never taken. Since GWT LambdaForms (and hence their bytecode) are >heavily shared, but the behavior is specific to MethodHandle, there's no >way for JIT to understand how particular GWT instance behaves. > >The solution I propose is to do profiling in Java code and feed it to >JIT. Every GWT MethodHandle holds an auxiliary array (int[2]) where >profiling info is stored. Once JIT kicks in, it can retrieve these >counts, if corresponding MethodHandle is a compile-time constant (and it >is usually the case). To communicate the profile data from Java code to >JIT, MethodHandleImpl::profileBranch() is used. > >If GWT MethodHandle isn't a compile-time constant, profiling should >proceed. It happens when corresponding LambdaForm is already shared, for >newly created GWT MethodHandles profiling can occur only in native code >(dedicated nmethod for a single LambdaForm). So, when compilation of the >whole MethodHandle chain is triggered, the profile should be already >gathered. > >Overriding branch frequencies is not enough. Statistics on >deoptimization events is also polluted. Even if a branch is never taken, >JIT doesn't issue an uncommon trap there unless corresponding bytecode >doesn't trap too much and doesn't cause too many recompiles. > >I added @IgnoreProfile and place it only on GWT LambdaForms. When JIT >sees it on some method, Compile::too_many_traps & >Compile::too_many_recompiles for that method always return false. It >allows JIT to prune the branch based on custom profile and recompile the >method, if the branch is visited. > >For now, I wanted to keep the fix very focused. The next thing I plan to >do is to experiment with ignoring deoptimization counts for other >LambdaForms which are heavily shared. I already saw problems caused by >deoptimization counts pollution (see JDK-8068915 [2]). > >I plan to backport the fix into 8u40, once I finish extensive >performance testing. > >Testing: JPRT, java/lang/invoke tests, nashorn (nashorn testsuite, >Octane). > >Thanks! > >PS: as a summary, my experiments show that fixes for 8063137 & 8068915 >[2] almost completely recovers peak performance after LambdaForm sharing >[3]. 
There's one more problem left (non-inlined MethodHandle invocations >are more expensive when LFs are shared), but it's a story for another day. > >Best regards, >Vladimir Ivanov > >[1] https://bugs.openjdk.java.net/browse/JDK-8059877 > 8059877: GWT branch frequencies pollution due to LF sharing >[2] https://bugs.openjdk.java.net/browse/JDK-8068915 >[3] https://bugs.openjdk.java.net/browse/JDK-8046703 > JEP 210: LambdaForm Reduction and Caching >_______________________________________________ >mlvm-dev mailing list >mlvm-dev at openjdk.java.net >http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev From vladimir.x.ivanov at oracle.com Tue Jan 20 12:40:50 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 20 Jan 2015 15:40:50 +0300 Subject: [9] RFR (M): 8063137: Never-taken branches should be pruned when GWT LambdaForms are shared In-Reply-To: References: <54B94766.2080102@oracle.com> Message-ID: <54BE4CD2.30805@oracle.com> Duncan, thanks a lot for giving it a try! If you plan to spend more time on it, please, apply 8068915 as well. I saw huge intermittent performance regressions due to continuous deoptimization storm. You can look into -XX:+LogCompilation output and look for repeated deoptimization events in steady state w/ Action_none. Also, there's deoptimization statistics in the log (at least, in jdk9). It's located right before compilation_log tag. Thanks again for the valuable feedback! Best regards, Vladimir Ivanov [1] http://cr.openjdk.java.net/~vlivanov/8068915/webrev.00 On 1/19/15 11:21 PM, MacGregor, Duncan (GE Energy Management) wrote: > Okay, I?ve done some tests of this with the micro benchmarks for our > language & runtime which show pretty much no change except for one test > which is now almost 3x slower. It uses nested loops to iterate over an > array and concatenate the string-like objects it contains, and replaces > elements with these new longer string-llike objects. It?s a bit of a > pathological case, and I haven?t seen the same sort of degradation in the > other benchmarks or in real applications, but I haven?t done serious > benchmarking of them with this change. > > I shall see if the test case can be reduced down to anything simpler while > still showing the same performance behaviour, and try add some compilation > logging options to narrow down what?s going on. > > Duncan. > > On 16/01/2015 17:16, "Vladimir Ivanov" > wrote: > >> http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/hotspot/ >> http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/jdk/ >> https://bugs.openjdk.java.net/browse/JDK-8063137 >> >> After GuardWithTest (GWT) LambdaForms became shared, profile pollution >> significantly distorted compilation decisions. It affected inlining and >> hindered some optimizations. It causes significant performance >> regressions for Nashorn (on Octane benchmarks). >> >> Inlining was fixed by 8059877 [1], but it didn't cover the case when a >> branch is never taken. It can cause missed optimization opportunity, and >> not just increase in code size. For example, non-pruned branch can break >> escape analysis. >> >> Currently, there are 2 problems: >> - branch frequencies profile pollution >> - deoptimization counts pollution >> >> Branch frequency pollution hides from JIT the fact that a branch is >> never taken. Since GWT LambdaForms (and hence their bytecode) are >> heavily shared, but the behavior is specific to MethodHandle, there's no >> way for JIT to understand how particular GWT instance behaves. 
>> >> The solution I propose is to do profiling in Java code and feed it to >> JIT. Every GWT MethodHandle holds an auxiliary array (int[2]) where >> profiling info is stored. Once JIT kicks in, it can retrieve these >> counts, if corresponding MethodHandle is a compile-time constant (and it >> is usually the case). To communicate the profile data from Java code to >> JIT, MethodHandleImpl::profileBranch() is used. >> >> If GWT MethodHandle isn't a compile-time constant, profiling should >> proceed. It happens when corresponding LambdaForm is already shared, for >> newly created GWT MethodHandles profiling can occur only in native code >> (dedicated nmethod for a single LambdaForm). So, when compilation of the >> whole MethodHandle chain is triggered, the profile should be already >> gathered. >> >> Overriding branch frequencies is not enough. Statistics on >> deoptimization events is also polluted. Even if a branch is never taken, >> JIT doesn't issue an uncommon trap there unless corresponding bytecode >> doesn't trap too much and doesn't cause too many recompiles. >> >> I added @IgnoreProfile and place it only on GWT LambdaForms. When JIT >> sees it on some method, Compile::too_many_traps & >> Compile::too_many_recompiles for that method always return false. It >> allows JIT to prune the branch based on custom profile and recompile the >> method, if the branch is visited. >> >> For now, I wanted to keep the fix very focused. The next thing I plan to >> do is to experiment with ignoring deoptimization counts for other >> LambdaForms which are heavily shared. I already saw problems caused by >> deoptimization counts pollution (see JDK-8068915 [2]). >> >> I plan to backport the fix into 8u40, once I finish extensive >> performance testing. >> >> Testing: JPRT, java/lang/invoke tests, nashorn (nashorn testsuite, >> Octane). >> >> Thanks! >> >> PS: as a summary, my experiments show that fixes for 8063137 & 8068915 >> [2] almost completely recovers peak performance after LambdaForm sharing >> [3]. There's one more problem left (non-inlined MethodHandle invocations >> are more expensive when LFs are shared), but it's a story for another day. >> >> Best regards, >> Vladimir Ivanov >> >> [1] https://bugs.openjdk.java.net/browse/JDK-8059877 >> 8059877: GWT branch frequencies pollution due to LF sharing >> [2] https://bugs.openjdk.java.net/browse/JDK-8068915 >> [3] https://bugs.openjdk.java.net/browse/JDK-8046703 >> JEP 210: LambdaForm Reduction and Caching >> _______________________________________________ >> mlvm-dev mailing list >> mlvm-dev at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev > > _______________________________________________ > mlvm-dev mailing list > mlvm-dev at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev > From duncan.macgregor at ge.com Tue Jan 20 13:09:45 2015 From: duncan.macgregor at ge.com (MacGregor, Duncan (GE Energy Management)) Date: Tue, 20 Jan 2015 13:09:45 +0000 Subject: [9] RFR (M): 8063137: Never-taken branches should be pruned when GWT LambdaForms are shared In-Reply-To: <54BE4CD2.30805@oracle.com> References: <54B94766.2080102@oracle.com> <54BE4CD2.30805@oracle.com> Message-ID: I?ll apply that patch and try to run more tests this afternoon. On 20/01/2015 12:40, "Vladimir Ivanov" wrote: >Duncan, thanks a lot for giving it a try! > >If you plan to spend more time on it, please, apply 8068915 as well. I >saw huge intermittent performance regressions due to continuous >deoptimization storm. 
You can look into -XX:+LogCompilation output and >look for repeated deoptimization events in steady state w/ Action_none. >Also, there's deoptimization statistics in the log (at least, in jdk9). >It's located right before compilation_log tag. > >Thanks again for the valuable feedback! > >Best regards, >Vladimir Ivanov > >[1] http://cr.openjdk.java.net/~vlivanov/8068915/webrev.00 > >On 1/19/15 11:21 PM, MacGregor, Duncan (GE Energy Management) wrote: >> Okay, I?ve done some tests of this with the micro benchmarks for our >> language & runtime which show pretty much no change except for one test >> which is now almost 3x slower. It uses nested loops to iterate over an >> array and concatenate the string-like objects it contains, and replaces >> elements with these new longer string-llike objects. It?s a bit of a >> pathological case, and I haven?t seen the same sort of degradation in >>the >> other benchmarks or in real applications, but I haven?t done serious >> benchmarking of them with this change. >> >> I shall see if the test case can be reduced down to anything simpler >>while >> still showing the same performance behaviour, and try add some >>compilation >> logging options to narrow down what?s going on. >> >> Duncan. >> >> On 16/01/2015 17:16, "Vladimir Ivanov" >> wrote: >> >>> http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/hotspot/ >>> http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/jdk/ >>> https://bugs.openjdk.java.net/browse/JDK-8063137 >>> >>> After GuardWithTest (GWT) LambdaForms became shared, profile pollution >>> significantly distorted compilation decisions. It affected inlining and >>> hindered some optimizations. It causes significant performance >>> regressions for Nashorn (on Octane benchmarks). >>> >>> Inlining was fixed by 8059877 [1], but it didn't cover the case when a >>> branch is never taken. It can cause missed optimization opportunity, >>>and >>> not just increase in code size. For example, non-pruned branch can >>>break >>> escape analysis. >>> >>> Currently, there are 2 problems: >>> - branch frequencies profile pollution >>> - deoptimization counts pollution >>> >>> Branch frequency pollution hides from JIT the fact that a branch is >>> never taken. Since GWT LambdaForms (and hence their bytecode) are >>> heavily shared, but the behavior is specific to MethodHandle, there's >>>no >>> way for JIT to understand how particular GWT instance behaves. >>> >>> The solution I propose is to do profiling in Java code and feed it to >>> JIT. Every GWT MethodHandle holds an auxiliary array (int[2]) where >>> profiling info is stored. Once JIT kicks in, it can retrieve these >>> counts, if corresponding MethodHandle is a compile-time constant (and >>>it >>> is usually the case). To communicate the profile data from Java code to >>> JIT, MethodHandleImpl::profileBranch() is used. >>> >>> If GWT MethodHandle isn't a compile-time constant, profiling should >>> proceed. It happens when corresponding LambdaForm is already shared, >>>for >>> newly created GWT MethodHandles profiling can occur only in native code >>> (dedicated nmethod for a single LambdaForm). So, when compilation of >>>the >>> whole MethodHandle chain is triggered, the profile should be already >>> gathered. >>> >>> Overriding branch frequencies is not enough. Statistics on >>> deoptimization events is also polluted. Even if a branch is never >>>taken, >>> JIT doesn't issue an uncommon trap there unless corresponding bytecode >>> doesn't trap too much and doesn't cause too many recompiles. 
>>> >>> I added @IgnoreProfile and place it only on GWT LambdaForms. When JIT >>> sees it on some method, Compile::too_many_traps & >>> Compile::too_many_recompiles for that method always return false. It >>> allows JIT to prune the branch based on custom profile and recompile >>>the >>> method, if the branch is visited. >>> >>> For now, I wanted to keep the fix very focused. The next thing I plan >>>to >>> do is to experiment with ignoring deoptimization counts for other >>> LambdaForms which are heavily shared. I already saw problems caused by >>> deoptimization counts pollution (see JDK-8068915 [2]). >>> >>> I plan to backport the fix into 8u40, once I finish extensive >>> performance testing. >>> >>> Testing: JPRT, java/lang/invoke tests, nashorn (nashorn testsuite, >>> Octane). >>> >>> Thanks! >>> >>> PS: as a summary, my experiments show that fixes for 8063137 & 8068915 >>> [2] almost completely recovers peak performance after LambdaForm >>>sharing >>> [3]. There's one more problem left (non-inlined MethodHandle >>>invocations >>> are more expensive when LFs are shared), but it's a story for another >>>day. >>> >>> Best regards, >>> Vladimir Ivanov >>> >>> [1] https://bugs.openjdk.java.net/browse/JDK-8059877 >>> 8059877: GWT branch frequencies pollution due to LF sharing >>> [2] https://bugs.openjdk.java.net/browse/JDK-8068915 >>> [3] https://bugs.openjdk.java.net/browse/JDK-8046703 >>> JEP 210: LambdaForm Reduction and Caching >>> _______________________________________________ >>> mlvm-dev mailing list >>> mlvm-dev at openjdk.java.net >>> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev >> >> _______________________________________________ >> mlvm-dev mailing list >> mlvm-dev at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev >> >_______________________________________________ >mlvm-dev mailing list >mlvm-dev at openjdk.java.net >http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev From duncan.macgregor at ge.com Tue Jan 20 17:14:00 2015 From: duncan.macgregor at ge.com (MacGregor, Duncan (GE Energy Management)) Date: Tue, 20 Jan 2015 17:14:00 +0000 Subject: [9] RFR (M): 8063137: Never-taken branches should be pruned when GWT LambdaForms are shared In-Reply-To: <54BE4CD2.30805@oracle.com> References: <54B94766.2080102@oracle.com> <54BE4CD2.30805@oracle.com> Message-ID: Hmm, 8068915 hasn?t fixed it, but running fewer benchmarks seems to make the problem go away, so it looks like there?s something going wrong fairly deep in our runtime. Trying the full suite with compilation logging enabled now to see if I can find a smoking gun. On 20/01/2015 12:40, "Vladimir Ivanov" wrote: >Duncan, thanks a lot for giving it a try! > >If you plan to spend more time on it, please, apply 8068915 as well. I >saw huge intermittent performance regressions due to continuous >deoptimization storm. You can look into -XX:+LogCompilation output and >look for repeated deoptimization events in steady state w/ Action_none. >Also, there's deoptimization statistics in the log (at least, in jdk9). >It's located right before compilation_log tag. > >Thanks again for the valuable feedback! > >Best regards, >Vladimir Ivanov > >[1] http://cr.openjdk.java.net/~vlivanov/8068915/webrev.00 > >On 1/19/15 11:21 PM, MacGregor, Duncan (GE Energy Management) wrote: >> Okay, I?ve done some tests of this with the micro benchmarks for our >> language & runtime which show pretty much no change except for one test >> which is now almost 3x slower. 
It uses nested loops to iterate over an >> array and concatenate the string-like objects it contains, and replaces >> elements with these new longer string-llike objects. It?s a bit of a >> pathological case, and I haven?t seen the same sort of degradation in >>the >> other benchmarks or in real applications, but I haven?t done serious >> benchmarking of them with this change. >> >> I shall see if the test case can be reduced down to anything simpler >>while >> still showing the same performance behaviour, and try add some >>compilation >> logging options to narrow down what?s going on. >> >> Duncan. >> >> On 16/01/2015 17:16, "Vladimir Ivanov" >> wrote: >> >>> http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/hotspot/ >>> http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/jdk/ >>> https://bugs.openjdk.java.net/browse/JDK-8063137 >>> >>> After GuardWithTest (GWT) LambdaForms became shared, profile pollution >>> significantly distorted compilation decisions. It affected inlining and >>> hindered some optimizations. It causes significant performance >>> regressions for Nashorn (on Octane benchmarks). >>> >>> Inlining was fixed by 8059877 [1], but it didn't cover the case when a >>> branch is never taken. It can cause missed optimization opportunity, >>>and >>> not just increase in code size. For example, non-pruned branch can >>>break >>> escape analysis. >>> >>> Currently, there are 2 problems: >>> - branch frequencies profile pollution >>> - deoptimization counts pollution >>> >>> Branch frequency pollution hides from JIT the fact that a branch is >>> never taken. Since GWT LambdaForms (and hence their bytecode) are >>> heavily shared, but the behavior is specific to MethodHandle, there's >>>no >>> way for JIT to understand how particular GWT instance behaves. >>> >>> The solution I propose is to do profiling in Java code and feed it to >>> JIT. Every GWT MethodHandle holds an auxiliary array (int[2]) where >>> profiling info is stored. Once JIT kicks in, it can retrieve these >>> counts, if corresponding MethodHandle is a compile-time constant (and >>>it >>> is usually the case). To communicate the profile data from Java code to >>> JIT, MethodHandleImpl::profileBranch() is used. >>> >>> If GWT MethodHandle isn't a compile-time constant, profiling should >>> proceed. It happens when corresponding LambdaForm is already shared, >>>for >>> newly created GWT MethodHandles profiling can occur only in native code >>> (dedicated nmethod for a single LambdaForm). So, when compilation of >>>the >>> whole MethodHandle chain is triggered, the profile should be already >>> gathered. >>> >>> Overriding branch frequencies is not enough. Statistics on >>> deoptimization events is also polluted. Even if a branch is never >>>taken, >>> JIT doesn't issue an uncommon trap there unless corresponding bytecode >>> doesn't trap too much and doesn't cause too many recompiles. >>> >>> I added @IgnoreProfile and place it only on GWT LambdaForms. When JIT >>> sees it on some method, Compile::too_many_traps & >>> Compile::too_many_recompiles for that method always return false. It >>> allows JIT to prune the branch based on custom profile and recompile >>>the >>> method, if the branch is visited. >>> >>> For now, I wanted to keep the fix very focused. The next thing I plan >>>to >>> do is to experiment with ignoring deoptimization counts for other >>> LambdaForms which are heavily shared. I already saw problems caused by >>> deoptimization counts pollution (see JDK-8068915 [2]). 
>>> >>> I plan to backport the fix into 8u40, once I finish extensive >>> performance testing. >>> >>> Testing: JPRT, java/lang/invoke tests, nashorn (nashorn testsuite, >>> Octane). >>> >>> Thanks! >>> >>> PS: as a summary, my experiments show that fixes for 8063137 & 8068915 >>> [2] almost completely recovers peak performance after LambdaForm >>>sharing >>> [3]. There's one more problem left (non-inlined MethodHandle >>>invocations >>> are more expensive when LFs are shared), but it's a story for another >>>day. >>> >>> Best regards, >>> Vladimir Ivanov >>> >>> [1] https://bugs.openjdk.java.net/browse/JDK-8059877 >>> 8059877: GWT branch frequencies pollution due to LF sharing >>> [2] https://bugs.openjdk.java.net/browse/JDK-8068915 >>> [3] https://bugs.openjdk.java.net/browse/JDK-8046703 >>> JEP 210: LambdaForm Reduction and Caching >>> _______________________________________________ >>> mlvm-dev mailing list >>> mlvm-dev at openjdk.java.net >>> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev >> >> _______________________________________________ >> mlvm-dev mailing list >> mlvm-dev at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev >> >_______________________________________________ >mlvm-dev mailing list >mlvm-dev at openjdk.java.net >http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev From vladimir.x.ivanov at oracle.com Tue Jan 20 19:09:11 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 20 Jan 2015 22:09:11 +0300 Subject: [9] RFR (M): 8063137: Never-taken branches should be pruned when GWT LambdaForms are shared In-Reply-To: <7B03B9FB-17B4-4AE0-92B8-F2DC5B231294@oracle.com> References: <54B94766.2080102@oracle.com> <7B03B9FB-17B4-4AE0-92B8-F2DC5B231294@oracle.com> Message-ID: <54BEA7D7.6080008@oracle.com> John, thanks for the review! Updated webrev: http://cr.openjdk.java.net/~vlivanov/8063137/webrev.01/hotspot http://cr.openjdk.java.net/~vlivanov/8063137/webrev.01/jdk See my answers inline. On 1/17/15 2:13 AM, John Rose wrote: > On Jan 16, 2015, at 9:16 AM, Vladimir Ivanov > > wrote: >> >> http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/hotspot/ >> http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/jdk/ >> https://bugs.openjdk.java.net/browse/JDK-8063137 >> ... >> PS: as a summary, my experiments show that fixes for 8063137 & 8068915 >> [2] almost completely recovers peak performance after LambdaForm >> sharing [3]. There's one more problem left (non-inlined MethodHandle >> invocations are more expensive when LFs are shared), but it's a story >> for another day. > > This performance bump is excellent news. LFs are supposed to express > emergently common behaviors, like hidden classes. We are much closer to > that goal now. > > I'm glad to see that the library-assisted profiling turns out to be > relatively clean. > > In effect this restores the pre-LF CountingMethodHandle logic from 2011, > which was so beneficial in JDK 7: > http://hg.openjdk.java.net/jdk7u/jdk7u/jdk/file/02de5cdbef21/src/share/classes/java/lang/invoke/CountingMethodHandle.java > > I have some suggestions to make this version a little cleaner; see below. > > Starting with the JDK changes: > > In LambdaForm.java, I'm feeling flag pressure from all the little > boolean fields and constructor parameters. > > (Is it time to put in a bit-encoded field "private byte > LambdaForm.flags", or do we wait for another boolean to come along? But > see next questions, which are more important.) > > What happens when a GWT LF gets inlined into a larger LF? 
Then there > might be two or more selectAlternative calls. > Will this confuse anything or will it Just Work? The combined LF will > get profiled as usual, and the selectAlternative calls will also collect > profile (or not?). > > This leads to another question: Why have a boolean 'isGWT' at all? Why > not just check for one or more occurrence of selectAlternative, and > declare that those guys override (some of) the profiling. Something like: > > -+ if (PROFILE_GWT && lambdaForm.isGWT) > ++ if (PROFILE_GWT && lambdaForm.containsFunction(NF_selectAlternative)) > (...where LF.containsFunction(NamedFunction) is a variation of > LF.contains(Name).) > > I suppose the answer may be that you want to inline GWTs (if ever) into > customized code where the JVM profiling should get maximum benefit. In > that case case you might want to set the boolean to "false" to > distinguish "immature" GWT combinators from customized ones. > > If that's the case, perhaps the real boolean flag you want is not > 'isGWT' but 'sharedProfile' or 'immature' or some such, or (inverting) > 'customized'. (I like the feel of a 'customized' flag.) Then > @IgnoreProfile would get attached to a LF that (a ) contains > selectAlternative and (b ) is marked as non-customized/immature/shared. > You might also want to adjust the call to 'profileBranch' based on > whether the containing LF was shared or customized. > > What I'm mainly poking at here is that 'isGWT' is not informative about > the intended use of the flag. I agree. It was an interim solution. Initially, I planned to introduce customization and guide the logic based on that property. But it's not there yet and I needed something for GWT case. Unfortunately, I missed the case when GWT is edited. In that case, isGWT flag is missed and no annotation is set. So, I removed isGWT flag and introduced a check for selectAlternative occurence in LambdaForm shape, as you suggested. > In 'updateCounters', if the counter overflows, you'll get continuous > creation of ArithmeticExceptions. Will that optimize or will it cause a > permanent slowdown? Consider a hack like this on the exception path: > counters[idx] = Integer.MAX_VALUE / 2; I had an impression that VM optimizes overflows in Math.exact* intrinsics, but it's not the case - it always inserts an uncommon trap. I used the workaround you proposed. > On the Name Bikeshed: It looks like @IgnoreProfile (ignore_profile in > the VM) promises too much "ignorance", since it suppresses branch counts > and traps, but allows type profiles to be consulted. Maybe something > positive like "@ManyTraps" or "@SharedMegamorphic"? (It's just a name, > and this is just a suggestion.) What do you think about @LambdaForm.Shared? > Going to the JVM: > > In library_call.cpp, I think you should change the assert to a guard: > -+ assert(aobj->length() == 2, ""); > ++ && aobj->length() == 2) { Done. > In Parse::dynamic_branch_prediction, the mere presence of the Opaque4 > node is enough to trigger replacement of profiling. I think there > should *not* be a test of method()->ignore_profile(). That should > provide better integration between the two sources of profile data to > JVM profiling? Done. > Also, I think the name 'Opaque4Node' is way too? opaque. Suggest > 'ProfileBranchNode', since that's exactly what it does. Done. > Suggest changing the log element "profile_branch" to "observe > source='profileBranch'", to make a better hint as to the source of the info. Done. 
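For reference, the saturating counter update discussed above amounts to something like the following sketch (illustrative only, not the exact webrev code; which array slot counts the taken branch is an assumption here):

    // counters is the per-GWT int[2]; on overflow the count saturates
    // instead of throwing again on every subsequent invocation.
    static void updateCounters(boolean result, int[] counters) {
        int idx = result ? 1 : 0;   // assumed: slot 1 counts the taken branch
        try {
            counters[idx] = Math.addExact(counters[idx], 1);
        } catch (ArithmeticException e) {
            // Saturate well below Integer.MAX_VALUE so the exception path
            // is not hit repeatedly once the counter is full.
            counters[idx] = Integer.MAX_VALUE / 2;
        }
    }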
Best regards, Vladimir Ivanov From duncan.macgregor at ge.com Tue Jan 20 20:11:42 2015 From: duncan.macgregor at ge.com (MacGregor, Duncan (GE Energy Management)) Date: Tue, 20 Jan 2015 20:11:42 +0000 Subject: [9] RFR (M): 8063137: Never-taken branches should be pruned when GWT LambdaForms are shared In-Reply-To: References: <54B94766.2080102@oracle.com> <54BE4CD2.30805@oracle.com> Message-ID: So, very few deopt events in the logs (exactly 4 in fact, in both the performant and non-performant cases, and for the exact same methods), but in the case where performance has degraded I only see an initial compilation for the problem method and not the later inlining I see in the performant case. I?ll dig through the rest of the logs and try see if there?s any differences leading up to the inlining. On the bright side while going through the logs I did spot one obvious snafu in our code (unnecessary MutableCallSite usage), and have got a 2.5 times speed up on another benchmark, so I?m not too unhappy. :-) On 20/01/2015 17:14, "MacGregor, Duncan (GE Energy Management)" wrote: >Hmm, 8068915 hasn?t fixed it, but running fewer benchmarks seems to make >the problem go away, so it looks like there?s something going wrong fairly >deep in our runtime. Trying the full suite with compilation logging >enabled now to see if I can find a smoking gun. > >On 20/01/2015 12:40, "Vladimir Ivanov" >wrote: > >>Duncan, thanks a lot for giving it a try! >> >>If you plan to spend more time on it, please, apply 8068915 as well. I >>saw huge intermittent performance regressions due to continuous >>deoptimization storm. You can look into -XX:+LogCompilation output and >>look for repeated deoptimization events in steady state w/ Action_none. >>Also, there's deoptimization statistics in the log (at least, in jdk9). >>It's located right before compilation_log tag. >> >>Thanks again for the valuable feedback! >> >>Best regards, >>Vladimir Ivanov >> >>[1] http://cr.openjdk.java.net/~vlivanov/8068915/webrev.00 >> >>On 1/19/15 11:21 PM, MacGregor, Duncan (GE Energy Management) wrote: >>> Okay, I?ve done some tests of this with the micro benchmarks for our >>> language & runtime which show pretty much no change except for one test >>> which is now almost 3x slower. It uses nested loops to iterate over an >>> array and concatenate the string-like objects it contains, and replaces >>> elements with these new longer string-llike objects. It?s a bit of a >>> pathological case, and I haven?t seen the same sort of degradation in >>>the >>> other benchmarks or in real applications, but I haven?t done serious >>> benchmarking of them with this change. >>> >>> I shall see if the test case can be reduced down to anything simpler >>>while >>> still showing the same performance behaviour, and try add some >>>compilation >>> logging options to narrow down what?s going on. >>> >>> Duncan. >>> >>> On 16/01/2015 17:16, "Vladimir Ivanov" >>> wrote: >>> >>>> http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/hotspot/ >>>> http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/jdk/ >>>> https://bugs.openjdk.java.net/browse/JDK-8063137 >>>> >>>> After GuardWithTest (GWT) LambdaForms became shared, profile pollution >>>> significantly distorted compilation decisions. It affected inlining >>>>and >>>> hindered some optimizations. It causes significant performance >>>> regressions for Nashorn (on Octane benchmarks). >>>> >>>> Inlining was fixed by 8059877 [1], but it didn't cover the case when a >>>> branch is never taken. 
It can cause missed optimization opportunity, >>>>and >>>> not just increase in code size. For example, non-pruned branch can >>>>break >>>> escape analysis. >>>> >>>> Currently, there are 2 problems: >>>> - branch frequencies profile pollution >>>> - deoptimization counts pollution >>>> >>>> Branch frequency pollution hides from JIT the fact that a branch is >>>> never taken. Since GWT LambdaForms (and hence their bytecode) are >>>> heavily shared, but the behavior is specific to MethodHandle, there's >>>>no >>>> way for JIT to understand how particular GWT instance behaves. >>>> >>>> The solution I propose is to do profiling in Java code and feed it to >>>> JIT. Every GWT MethodHandle holds an auxiliary array (int[2]) where >>>> profiling info is stored. Once JIT kicks in, it can retrieve these >>>> counts, if corresponding MethodHandle is a compile-time constant (and >>>>it >>>> is usually the case). To communicate the profile data from Java code >>>>to >>>> JIT, MethodHandleImpl::profileBranch() is used. >>>> >>>> If GWT MethodHandle isn't a compile-time constant, profiling should >>>> proceed. It happens when corresponding LambdaForm is already shared, >>>>for >>>> newly created GWT MethodHandles profiling can occur only in native >>>>code >>>> (dedicated nmethod for a single LambdaForm). So, when compilation of >>>>the >>>> whole MethodHandle chain is triggered, the profile should be already >>>> gathered. >>>> >>>> Overriding branch frequencies is not enough. Statistics on >>>> deoptimization events is also polluted. Even if a branch is never >>>>taken, >>>> JIT doesn't issue an uncommon trap there unless corresponding bytecode >>>> doesn't trap too much and doesn't cause too many recompiles. >>>> >>>> I added @IgnoreProfile and place it only on GWT LambdaForms. When JIT >>>> sees it on some method, Compile::too_many_traps & >>>> Compile::too_many_recompiles for that method always return false. It >>>> allows JIT to prune the branch based on custom profile and recompile >>>>the >>>> method, if the branch is visited. >>>> >>>> For now, I wanted to keep the fix very focused. The next thing I plan >>>>to >>>> do is to experiment with ignoring deoptimization counts for other >>>> LambdaForms which are heavily shared. I already saw problems caused by >>>> deoptimization counts pollution (see JDK-8068915 [2]). >>>> >>>> I plan to backport the fix into 8u40, once I finish extensive >>>> performance testing. >>>> >>>> Testing: JPRT, java/lang/invoke tests, nashorn (nashorn testsuite, >>>> Octane). >>>> >>>> Thanks! >>>> >>>> PS: as a summary, my experiments show that fixes for 8063137 & 8068915 >>>> [2] almost completely recovers peak performance after LambdaForm >>>>sharing >>>> [3]. There's one more problem left (non-inlined MethodHandle >>>>invocations >>>> are more expensive when LFs are shared), but it's a story for another >>>>day. 
>>>> >>>> Best regards, >>>> Vladimir Ivanov >>>> >>>> [1] https://bugs.openjdk.java.net/browse/JDK-8059877 >>>> 8059877: GWT branch frequencies pollution due to LF sharing >>>> [2] https://bugs.openjdk.java.net/browse/JDK-8068915 >>>> [3] https://bugs.openjdk.java.net/browse/JDK-8046703 >>>> JEP 210: LambdaForm Reduction and Caching >>>> _______________________________________________ >>>> mlvm-dev mailing list >>>> mlvm-dev at openjdk.java.net >>>> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev >>> >>> _______________________________________________ >>> mlvm-dev mailing list >>> mlvm-dev at openjdk.java.net >>> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev >>> >>_______________________________________________ >>mlvm-dev mailing list >>mlvm-dev at openjdk.java.net >>http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev > >_______________________________________________ >mlvm-dev mailing list >mlvm-dev at openjdk.java.net >http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev From duncan.macgregor at ge.com Wed Jan 21 10:39:54 2015 From: duncan.macgregor at ge.com (MacGregor, Duncan (GE Energy Management)) Date: Wed, 21 Jan 2015 10:39:54 +0000 Subject: [9] RFR (M): 8063137: Never-taken branches should be pruned when GWT LambdaForms are shared In-Reply-To: References: <54B94766.2080102@oracle.com> Message-ID: This version seems to have inconsistent removal of ignore profile in the hotspot patch. It?s no longer added to vmSymbols but is still referenced in classFileParser. On 19/01/2015 20:21, "MacGregor, Duncan (GE Energy Management)" wrote: >Okay, I?ve done some tests of this with the micro benchmarks for our >language & runtime which show pretty much no change except for one test >which is now almost 3x slower. It uses nested loops to iterate over an >array and concatenate the string-like objects it contains, and replaces >elements with these new longer string-llike objects. It?s a bit of a >pathological case, and I haven?t seen the same sort of degradation in the >other benchmarks or in real applications, but I haven?t done serious >benchmarking of them with this change. > >I shall see if the test case can be reduced down to anything simpler while >still showing the same performance behaviour, and try add some compilation >logging options to narrow down what?s going on. > >Duncan. > >On 16/01/2015 17:16, "Vladimir Ivanov" >wrote: > >>http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/hotspot/ >>http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/jdk/ >>https://bugs.openjdk.java.net/browse/JDK-8063137 >> >>After GuardWithTest (GWT) LambdaForms became shared, profile pollution >>significantly distorted compilation decisions. It affected inlining and >>hindered some optimizations. It causes significant performance >>regressions for Nashorn (on Octane benchmarks). >> >>Inlining was fixed by 8059877 [1], but it didn't cover the case when a >>branch is never taken. It can cause missed optimization opportunity, and >>not just increase in code size. For example, non-pruned branch can break >>escape analysis. >> >>Currently, there are 2 problems: >> - branch frequencies profile pollution >> - deoptimization counts pollution >> >>Branch frequency pollution hides from JIT the fact that a branch is >>never taken. Since GWT LambdaForms (and hence their bytecode) are >>heavily shared, but the behavior is specific to MethodHandle, there's no >>way for JIT to understand how particular GWT instance behaves. 
>> >>The solution I propose is to do profiling in Java code and feed it to >>JIT. Every GWT MethodHandle holds an auxiliary array (int[2]) where >>profiling info is stored. Once JIT kicks in, it can retrieve these >>counts, if corresponding MethodHandle is a compile-time constant (and it >>is usually the case). To communicate the profile data from Java code to >>JIT, MethodHandleImpl::profileBranch() is used. >> >>If GWT MethodHandle isn't a compile-time constant, profiling should >>proceed. It happens when corresponding LambdaForm is already shared, for >>newly created GWT MethodHandles profiling can occur only in native code >>(dedicated nmethod for a single LambdaForm). So, when compilation of the >>whole MethodHandle chain is triggered, the profile should be already >>gathered. >> >>Overriding branch frequencies is not enough. Statistics on >>deoptimization events is also polluted. Even if a branch is never taken, >>JIT doesn't issue an uncommon trap there unless corresponding bytecode >>doesn't trap too much and doesn't cause too many recompiles. >> >>I added @IgnoreProfile and place it only on GWT LambdaForms. When JIT >>sees it on some method, Compile::too_many_traps & >>Compile::too_many_recompiles for that method always return false. It >>allows JIT to prune the branch based on custom profile and recompile the >>method, if the branch is visited. >> >>For now, I wanted to keep the fix very focused. The next thing I plan to >>do is to experiment with ignoring deoptimization counts for other >>LambdaForms which are heavily shared. I already saw problems caused by >>deoptimization counts pollution (see JDK-8068915 [2]). >> >>I plan to backport the fix into 8u40, once I finish extensive >>performance testing. >> >>Testing: JPRT, java/lang/invoke tests, nashorn (nashorn testsuite, >>Octane). >> >>Thanks! >> >>PS: as a summary, my experiments show that fixes for 8063137 & 8068915 >>[2] almost completely recovers peak performance after LambdaForm sharing >>[3]. There's one more problem left (non-inlined MethodHandle invocations >>are more expensive when LFs are shared), but it's a story for another >>day. >> >>Best regards, >>Vladimir Ivanov >> >>[1] https://bugs.openjdk.java.net/browse/JDK-8059877 >> 8059877: GWT branch frequencies pollution due to LF sharing >>[2] https://bugs.openjdk.java.net/browse/JDK-8068915 >>[3] https://bugs.openjdk.java.net/browse/JDK-8046703 >> JEP 210: LambdaForm Reduction and Caching >>_______________________________________________ >>mlvm-dev mailing list >>mlvm-dev at openjdk.java.net >>http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev > >_______________________________________________ >mlvm-dev mailing list >mlvm-dev at openjdk.java.net >http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev From vladimir.x.ivanov at oracle.com Wed Jan 21 11:41:15 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 21 Jan 2015 14:41:15 +0300 Subject: [9] RFR (M): 8063137: Never-taken branches should be pruned when GWT LambdaForms are shared In-Reply-To: References: <54B94766.2080102@oracle.com> Message-ID: <54BF905B.7020407@oracle.com> Duncan, sorry for that. Updated webrev inplace. Best regards, Vladimir Ivanov On 1/21/15 1:39 PM, MacGregor, Duncan (GE Energy Management) wrote: > This version seems to have inconsistent removal of ignore profile in the > hotspot patch. It?s no longer added to vmSymbols but is still referenced > in classFileParser. 
> > On 19/01/2015 20:21, "MacGregor, Duncan (GE Energy Management)" > wrote: > >> Okay, I?ve done some tests of this with the micro benchmarks for our >> language & runtime which show pretty much no change except for one test >> which is now almost 3x slower. It uses nested loops to iterate over an >> array and concatenate the string-like objects it contains, and replaces >> elements with these new longer string-llike objects. It?s a bit of a >> pathological case, and I haven?t seen the same sort of degradation in the >> other benchmarks or in real applications, but I haven?t done serious >> benchmarking of them with this change. >> >> I shall see if the test case can be reduced down to anything simpler while >> still showing the same performance behaviour, and try add some compilation >> logging options to narrow down what?s going on. >> >> Duncan. >> >> On 16/01/2015 17:16, "Vladimir Ivanov" >> wrote: >> >>> http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/hotspot/ >>> http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/jdk/ >>> https://bugs.openjdk.java.net/browse/JDK-8063137 >>> >>> After GuardWithTest (GWT) LambdaForms became shared, profile pollution >>> significantly distorted compilation decisions. It affected inlining and >>> hindered some optimizations. It causes significant performance >>> regressions for Nashorn (on Octane benchmarks). >>> >>> Inlining was fixed by 8059877 [1], but it didn't cover the case when a >>> branch is never taken. It can cause missed optimization opportunity, and >>> not just increase in code size. For example, non-pruned branch can break >>> escape analysis. >>> >>> Currently, there are 2 problems: >>> - branch frequencies profile pollution >>> - deoptimization counts pollution >>> >>> Branch frequency pollution hides from JIT the fact that a branch is >>> never taken. Since GWT LambdaForms (and hence their bytecode) are >>> heavily shared, but the behavior is specific to MethodHandle, there's no >>> way for JIT to understand how particular GWT instance behaves. >>> >>> The solution I propose is to do profiling in Java code and feed it to >>> JIT. Every GWT MethodHandle holds an auxiliary array (int[2]) where >>> profiling info is stored. Once JIT kicks in, it can retrieve these >>> counts, if corresponding MethodHandle is a compile-time constant (and it >>> is usually the case). To communicate the profile data from Java code to >>> JIT, MethodHandleImpl::profileBranch() is used. >>> >>> If GWT MethodHandle isn't a compile-time constant, profiling should >>> proceed. It happens when corresponding LambdaForm is already shared, for >>> newly created GWT MethodHandles profiling can occur only in native code >>> (dedicated nmethod for a single LambdaForm). So, when compilation of the >>> whole MethodHandle chain is triggered, the profile should be already >>> gathered. >>> >>> Overriding branch frequencies is not enough. Statistics on >>> deoptimization events is also polluted. Even if a branch is never taken, >>> JIT doesn't issue an uncommon trap there unless corresponding bytecode >>> doesn't trap too much and doesn't cause too many recompiles. >>> >>> I added @IgnoreProfile and place it only on GWT LambdaForms. When JIT >>> sees it on some method, Compile::too_many_traps & >>> Compile::too_many_recompiles for that method always return false. It >>> allows JIT to prune the branch based on custom profile and recompile the >>> method, if the branch is visited. >>> >>> For now, I wanted to keep the fix very focused. 
The next thing I plan to >>> do is to experiment with ignoring deoptimization counts for other >>> LambdaForms which are heavily shared. I already saw problems caused by >>> deoptimization counts pollution (see JDK-8068915 [2]). >>> >>> I plan to backport the fix into 8u40, once I finish extensive >>> performance testing. >>> >>> Testing: JPRT, java/lang/invoke tests, nashorn (nashorn testsuite, >>> Octane). >>> >>> Thanks! >>> >>> PS: as a summary, my experiments show that fixes for 8063137 & 8068915 >>> [2] almost completely recovers peak performance after LambdaForm sharing >>> [3]. There's one more problem left (non-inlined MethodHandle invocations >>> are more expensive when LFs are shared), but it's a story for another >>> day. >>> >>> Best regards, >>> Vladimir Ivanov >>> >>> [1] https://bugs.openjdk.java.net/browse/JDK-8059877 >>> 8059877: GWT branch frequencies pollution due to LF sharing >>> [2] https://bugs.openjdk.java.net/browse/JDK-8068915 >>> [3] https://bugs.openjdk.java.net/browse/JDK-8046703 >>> JEP 210: LambdaForm Reduction and Caching >>> _______________________________________________ >>> mlvm-dev mailing list >>> mlvm-dev at openjdk.java.net >>> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev >> >> _______________________________________________ >> mlvm-dev mailing list >> mlvm-dev at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev > From marcus.lagergren at oracle.com Wed Jan 21 13:48:33 2015 From: marcus.lagergren at oracle.com (Marcus Lagergren) Date: Wed, 21 Jan 2015 14:48:33 +0100 Subject: JFokus 2015 - the VM Tech Day In-Reply-To: <36D2A3B5-A5ED-421A-8DAC-379712002366@oracle.com> References: <36D2A3B5-A5ED-421A-8DAC-379712002366@oracle.com> Message-ID: Btw, I have a few 50% discounts left for the VM tech day. If you are interested, please e-mail me directly! /Marcus > On 19 Jan 2015, at 10:58, Marcus Lagergren wrote: > > And to further clarify things - you can attend _only_ the VM Tech day / tech summit, should you so desire, and skip the rest of the JFokus conference. (What a strange thing to do, given the quality of JFokus, but I can?t be the one questioning your priorities here) > > (http://www.jfokus.se/jfokus/register.jsp ) > > /M > >> On 18 Jan 2015, at 22:54, Marcus Lagergren > wrote: >> >> Greetings community members! >> >> Here is something that I'm sure you'll find interesting. >> >> I want to advertise the upcoming "VM tech day? event, scheduled to >> take place February 2, 2015 at the JFokus conference in >> Stockholm. Sorry I am on a bit of a short notice here, but finalizing >> the speaker list took us a bit more time than expected. >> >> The VM tech day is a mini-track that runs the first day of the JFokus >> conference. This is its schedule: >> https://www.jfokus.se/jfokus/jvmtech.jsp >> >> After some rather challenging months of jigsaw puzzles, it is with >> great pleasure that I can announce that our speaker line up is now >> complete - and it is great indeed! We are talking 100% gurus, >> prophets, ninjas, rock stars, and all other similar terms that >> normally gets your resume binned if it passes my desk. But in this >> case the labels are true. We have strictly top names from both the >> commercial world and from academia ready to take you on a great >> ride. >> >> So what is the VM tech day? For those of you familiar with the JVM >> Language Summit (JVMLS) that usually takes place in Santa Clara in >> the summers, the format is similar. 
It?s the usual deal: anyone >> morbidly interested in runtime internals, code generation, polyglot >> programming and the complexities of language implementation, should >> find a veritable gold mine of stimulating conversation and knowledge >> transfer here. What is different from a typical JVMLS (except for the >> shorter duration), is that we have widened the scope a bit to include >> several runtimes, language implementation issues and polyglot >> problems. >> >> There will be six scheduled sessions and plenty of time for breakouts >> and discussions. We will also heavily encourage audience interaction >> and participation. >> >> The JFokus VM tech day is opened by John Rose. I am sure John needs >> no introduction to the subscribers of this list. With advanced OpenJDK >> projects like Valhalla and Panama booting up, John will discuss what >> the JVM has in store for the future. >> >> Other speakers include the tireless Charlie Nutter from Red Hat, the >> formidable Remi Forax, the brilliant Vyacheslav Egorov of Google v8 >> fame, the esteemed Dan Heidinga from IBM and the good looking Attila >> Szegedi from Oracle. >> >> We also have plenty of non-speaking celebrity participants in the >> audience, for example Fredrik ?hrstr?m: invokedynamic specification >> wizard extraordinaire and architect behind the new OpenJDK build >> system. Stop by and get autographs ;) >> >> Thusly: if you are attending JFokus, or if you are making up your mind >> about attending it right now, the VM tech summit is definitely >> something anyone subscribing to mlvm-dev wouldn't want to miss. The >> cross-platform/cross-technology/cross-company focus that we have tried >> very hard to create will without a doubt be ultra stimulating. Of that >> you can be sure. >> >> Please help us spread the word in whatever forums you deem >> appropriate! Talk to you friends! Tweet links to this post! Yell from >> your cubicle soap boxes across the neverending seas of fluorescent >> lights! >> >> Any further questions you may have about the event, not answered by >> the web pages, can be directed either to me (@lagergren) or Mattias >> Karlsson (@matkar) or as replies to this e-mail thread. >> >> On behalf of JFokus / VM Tech Day 2015 >> Marcus Lagergren >> Master of ceremonies (or something) >> >> _______________________________________________ >> mlvm-dev mailing list >> mlvm-dev at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev > > _______________________________________________ > mlvm-dev mailing list > mlvm-dev at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev -------------- next part -------------- An HTML attachment was scrubbed... URL: From vladimir.x.ivanov at oracle.com Wed Jan 21 16:25:18 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 21 Jan 2015 19:25:18 +0300 Subject: [9] RFR (M): 8069591: Customize LambdaForms which are invoked using MH.invoke/invokeExact Message-ID: <54BFD2EE.3060909@oracle.com> http://cr.openjdk.java.net/~vlivanov/8069591/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8069591 Overhead of non-inlined MH.invoke/invokeExact calls significantly increased with LambdaForm sharing. The cause is JIT compiler can't produce a single nmethod for the whole MethodHandle chain, so the execution is spread around numerous nmethods (1 per each MethodHandle in the chain). The longer the chain the larger overhead. The fix is to customize LambdaForms (create a dedicated LambdaForm for a MethodHandle). 
A per-MethodHandle count is introduced, which is incremented every time a MethodHandle is invoked using MethodHandle.invoke/invokeExact. Once CUSTOMIZE_THRESHOLD is reached for a particular MethodHandle, its LambdaForm is substituted with a customized one, which has its MethodHandle embedded. It allows the JIT to see the actual MethodHandle during compilation and produce more efficient code. This fix completely recovers Gbemu peak performance to the pre-LambdaForm sharing level. Testing: jck (api/java_lang/invoke), jdk/java/lang/invoke, nashorn tests, nashorn/octane Thanks! Best regards, Vladimir Ivanov From forax at univ-mlv.fr Wed Jan 21 17:31:05 2015 From: forax at univ-mlv.fr (Remi Forax) Date: Wed, 21 Jan 2015 18:31:05 +0100 Subject: [9] RFR (M): 8069591: Customize LambdaForms which are invoked using MH.invoke/invokeExact In-Reply-To: <54BFD2EE.3060909@oracle.com> References: <54BFD2EE.3060909@oracle.com> Message-ID: <54BFE259.1090402@univ-mlv.fr> Hi Vladimir, in Invokers.java, I think that checkCustomized should take an Object and not a MethodHandle exactly like getCallSiteTarget takes an Object and not a CallSite. in MethodHandle.java, customizationCount is declared as a byte and there is no check that the CUSTOMIZE_THRESHOLD is not greater than 127. cheers, Rémi On 01/21/2015 05:25 PM, Vladimir Ivanov wrote: > http://cr.openjdk.java.net/~vlivanov/8069591/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8069591 > > Overhead of non-inlined MH.invoke/invokeExact calls significantly > increased with LambdaForm sharing. The cause is JIT compiler can't > produce a single nmethod for the whole MethodHandle chain, so the > execution is spread around numerous nmethods (1 per each MethodHandle > in the chain). The longer the chain the larger overhead. > > The fix is to customize LambdaForms (create a dedicated LambdaForm for > a MethodHandle). Per-MethodHandle count is introduced, which is > incremented every time a MethodHandle is invoked using > MethodHandle.invoke/invokeExact. Once CUSTOMIZE_THRESHOLD is reached > for a particular MethodHandle, it's LambdaForm is substituted with a > customized one, which has it's MethodHandle embedded. It allows JIT to > see actual MethodHandle during compilation and produce more efficient > code. > > This fix completely recovers Gbemu peak performance to pre-LambdaForm > sharing level. > > Testing: jck (api/java_lang/invoke), jdk/java/lang/invoke, nashorn > tests, nashorn/octane > > Thanks! > > Best regards, > Vladimir Ivanov > _______________________________________________ > mlvm-dev mailing list > mlvm-dev at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev From john.r.rose at oracle.com Wed Jan 21 19:30:28 2015 From: john.r.rose at oracle.com (John Rose) Date: Wed, 21 Jan 2015 11:30:28 -0800 Subject: [9] RFR (M): 8069591: Customize LambdaForms which are invoked using MH.invoke/invokeExact In-Reply-To: <54BFE259.1090402@univ-mlv.fr> References: <54BFD2EE.3060909@oracle.com> <54BFE259.1090402@univ-mlv.fr> Message-ID: <3B4D19E0-8DA6-4FE1-BD77-E12E8BCF15EC@oracle.com> On Jan 21, 2015, at 9:31 AM, Remi Forax wrote: > > in Invokers.java, I think that checkCustomized should take an Object and not a MethodHandle > exactly like getCallSiteTarget takes an Object and not a CallSite. The use of erased types (any ref => Object) in the MH runtime is an artifact of bootstrapping difficulties, early in the project. I hope it is not necessary any more. That said, I agree that the pattern should be consistent.
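Rémi's point about the byte-sized counter is easier to see with a sketch (hypothetical code, not the actual MethodHandle.java changes): if the counter is read once, compared against the threshold, and only written back while still below it, it can neither wrap nor run past the byte range, even for a large CUSTOMIZE_THRESHOLD:

    // Invented class name; stands in for java.lang.invoke.MethodHandle here.
    class CustomizationCounterSketch {
        static final int CUSTOMIZE_THRESHOLD = 127;  // assumed upper bound, clamped to the byte range
        private byte customizationCount;

        void maybeCustomize() {
            byte count = customizationCount;         // single read of the racy field
            if (count >= CUSTOMIZE_THRESHOLD) {
                customize();                         // would swap in the customized LambdaForm
            } else {
                customizationCount = (byte) (count + 1);
            }
        }

        private void customize() { /* placeholder for LambdaForm customization */ }
    }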
Vladimir, would you please file a tracking bug for this cleanup, to change MH library functions to use stronger types instead of Object? > in MethodHandle.java, customizationCount is declared as a byte and there is no check that > the CUSTOMIZE_THRESHOLD is not greater than 127. Yes. Also, the maybeCustomize method has a race condition that could cause the counter to wrap. It shouldn't use "+=1" to increment; it should load the old counter value, test it, increment it (in a local), and then store the updated value. That is also one possible place to deal with jumbo CUSTOMIZE_THRESHOLD values. ? John -------------- next part -------------- An HTML attachment was scrubbed... URL: From vladimir.x.ivanov at oracle.com Thu Jan 22 17:56:30 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 22 Jan 2015 20:56:30 +0300 Subject: [9] RFR (M): 8069591: Customize LambdaForms which are invoked using MH.invoke/invokeExact In-Reply-To: <3B4D19E0-8DA6-4FE1-BD77-E12E8BCF15EC@oracle.com> References: <54BFD2EE.3060909@oracle.com> <54BFE259.1090402@univ-mlv.fr> <3B4D19E0-8DA6-4FE1-BD77-E12E8BCF15EC@oracle.com> Message-ID: <54C139CE.4000005@oracle.com> Remi, John, thanks for review! Updated webrev: http://cr.openjdk.java.net/~vlivanov/8069591/webrev.01/ This time I did additional testing (COMPILE_THRESHOLD > 0) and spotted a problem with MethodHandle.copyWith(): a MethodHandle can inherit customized LambdaForm this way. I could have added LambdaForm::uncustomize() call in evey Species_*::copyWith() method, but I decided to add it into MethodHandle constructor. Let me know if you think it's too intrusive. Also, I made DirectMethodHandles a special-case, since I don't see any benefit in customizing them. Best regards, Vladimir Ivanov On 1/21/15 10:30 PM, John Rose wrote: > On Jan 21, 2015, at 9:31 AM, Remi Forax wrote: >> >> in Invokers.java, I think that checkCustomized should take an Object and not a MethodHandle >> exactly like getCallSiteTarget takes an Object and not a CallSite. > > The use of erased types (any ref => Object) in the MH runtime is an artifact of bootstrapping difficulties, early in the project. I hope it is not necessary any more. That said, I agree that the pattern should be consistent. > > Vladimir, would you please file a tracking bug for this cleanup, to change MH library functions to use stronger types instead of Object? > >> in MethodHandle.java, customizationCount is declared as a byte and there is no check that >> the CUSTOMIZE_THRESHOLD is not greater than 127. > > Yes. Also, the maybeCustomize method has a race condition that could cause the counter to wrap. It shouldn't use "+=1" to increment; it should load the old counter value, test it, increment it (in a local), and then store the updated value. That is also one possible place to deal with jumbo CUSTOMIZE_THRESHOLD values. > > ? John > From vladimir.x.ivanov at oracle.com Thu Jan 22 18:21:52 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 22 Jan 2015 21:21:52 +0300 Subject: [9] RFR (M): 8069591: Customize LambdaForms which are invoked using MH.invoke/invokeExact In-Reply-To: <3B4D19E0-8DA6-4FE1-BD77-E12E8BCF15EC@oracle.com> References: <54BFD2EE.3060909@oracle.com> <54BFE259.1090402@univ-mlv.fr> <3B4D19E0-8DA6-4FE1-BD77-E12E8BCF15EC@oracle.com> Message-ID: <54C13FC0.1030608@oracle.com> >> in Invokers.java, I think that checkCustomized should take an Object and not a MethodHandle >> exactly like getCallSiteTarget takes an Object and not a CallSite. 
> > The use of erased types (any ref => Object) in the MH runtime is an artifact of bootstrapping difficulties, early in the project. I hope it is not necessary any more. That said, I agree that the pattern should be consistent. Sure. Here is it [1] Best regards, Vladimir Ivanov [1] https://bugs.openjdk.java.net/browse/JDK-8071368 From john.r.rose at oracle.com Thu Jan 22 23:30:59 2015 From: john.r.rose at oracle.com (John Rose) Date: Thu, 22 Jan 2015 15:30:59 -0800 Subject: [9] RFR (M): 8069591: Customize LambdaForms which are invoked using MH.invoke/invokeExact In-Reply-To: <54C139CE.4000005@oracle.com> References: <54BFD2EE.3060909@oracle.com> <54BFE259.1090402@univ-mlv.fr> <3B4D19E0-8DA6-4FE1-BD77-E12E8BCF15EC@oracle.com> <54C139CE.4000005@oracle.com> Message-ID: On Jan 22, 2015, at 9:56 AM, Vladimir Ivanov wrote: > > Remi, John, thanks for review! > > Updated webrev: > http://cr.openjdk.java.net/~vlivanov/8069591/webrev.01/ > > This time I did additional testing (COMPILE_THRESHOLD > 0) and spotted a problem with MethodHandle.copyWith(): a MethodHandle can inherit customized LambdaForm this way. I could have added LambdaForm::uncustomize() call in evey Species_*::copyWith() method, but I decided to add it into MethodHandle constructor. Let me know if you think it's too intrusive. It's OK to put it there. Now I'm worried that the new customization logic will defeat code sharing for invoked MHs, since uncustomize creates a new LF that is a duplicate of the original LF. That breaks the genetic link for children of the invoked MH, doesn't it? (I like the compileToBytecode call, if it is done on the original.) In fact, that is also a potential problem for the first version of your patch, also. Suggestion: Have every customized LF contain a direct link to its uncustomized original. Have uncustomize just return that same original, every time. Then, when using LF editor operations to derive new LFs, always have them extract the original before making a derivation. (Alternatively, have the LF editor caches be shared between original LFs and all their customized versions. But that doesn't save all the genetic links.) > Also, I made DirectMethodHandles a special-case, since I don't see any benefit in customizing them. The overriding method in DHM should be marked @Override, so that we know all the bits fit together. ? John From john.r.rose at oracle.com Fri Jan 23 01:31:59 2015 From: john.r.rose at oracle.com (John Rose) Date: Thu, 22 Jan 2015 17:31:59 -0800 Subject: [9] RFR (M): 8063137: Never-taken branches should be pruned when GWT LambdaForms are shared In-Reply-To: <54BEA7D7.6080008@oracle.com> References: <54B94766.2080102@oracle.com> <7B03B9FB-17B4-4AE0-92B8-F2DC5B231294@oracle.com> <54BEA7D7.6080008@oracle.com> Message-ID: <5BA1E369-ED87-4EBD-8408-B73B726D91BD@oracle.com> On Jan 20, 2015, at 11:09 AM, Vladimir Ivanov wrote: > >> What I'm mainly poking at here is that 'isGWT' is not informative about >> the intended use of the flag. > I agree. It was an interim solution. Initially, I planned to introduce customization and guide the logic based on that property. But it's not there yet and I needed something for GWT case. Unfortunately, I missed the case when GWT is edited. In that case, isGWT flag is missed and no annotation is set. > So, I removed isGWT flag and introduced a check for selectAlternative occurence in LambdaForm shape, as you suggested. Good. I think there is a sweeter spot just a little further on. 
Make profileBranch be an LF intrinsic and expose it like this: GWT(p,t,f;S) := let(a=new int[3]) in lambda(*: S) { selectAlternative(profileBranch(p.invoke( *), a), t, f).invoke( *); } Then selectAlternative triggers branchy bytecodes in the IBGen, and profileBranch injects profiling in C2. The presence of profileBranch would then trigger the @Shared annotation, if you still need it. After thinking about it some more, I still believe it would be better to detect the use of profileBranch during a C2 compile task, and feed that to the too_many_traps logic. I agree it is much easier to stick the annotation on in the IBGen; the problem is that because of a minor phase ordering problem you are introducing an annotation which flows from the JDK to the VM. Here's one more suggestion at reducing this coupling? Note that C->set_trap_count is called when each Parse phase processes a whole method. This means that information about the contents of the nmethod accumulates during the parse. Likewise, add a flag method C->{has,set}_injected_profile, and set the flag whenever the parser sees a profileBranch intrinsic (with or without a constant profile array; your call). Then consult that flag from too_many_traps. It is true that code which is parsed upstream of the very first profileBranch will potentially issue a non-trapping fallback, but by definition that code would be unrelated to the injected profile, so I don't see a harm in that. If this approach works, then you can remove the annotation altogether, which is clearly preferable. We understand the annotation now, but it has the danger of becoming a maintainer's puzzlement. > >> In 'updateCounters', if the counter overflows, you'll get continuous >> creation of ArithmeticExceptions. Will that optimize or will it cause a >> permanent slowdown? Consider a hack like this on the exception path: >> counters[idx] = Integer.MAX_VALUE / 2; > I had an impression that VM optimizes overflows in Math.exact* intrinsics, but it's not the case - it always inserts an uncommon trap. I used the workaround you proposed. Good. > >> On the Name Bikeshed: It looks like @IgnoreProfile (ignore_profile in >> the VM) promises too much "ignorance", since it suppresses branch counts >> and traps, but allows type profiles to be consulted. Maybe something >> positive like "@ManyTraps" or "@SharedMegamorphic"? (It's just a name, >> and this is just a suggestion.) > What do you think about @LambdaForm.Shared? That's fine. Suggest changing the JVM accessor to is_lambda_form_shared, because the term "shared" is already overused in the VM. Or, to be much more accurate, s/@Shared/@CollectiveProfile/. Better yet, get rid of it, as suggested above. (I just realized that profile pollution looks logically parallel to the http://en.wikipedia.org/wiki/Tragedy_of_the_commons .) Also, in the comment explaining the annotation: s/mostly useless/probably polluted by conflicting behavior from multiple call sites/ I very much like the fact that profileBranch is the VM intrinsic, not selectAlternative. A VM intrinsic should be nice and narrow like that. In fact, you can delete selectAlternative from vmSymbols while you are at it. (We could do profileInteger and profileClass in a similar way, if that turned out to be useful.) ? John -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From peter.levart at gmail.com Fri Jan 23 14:38:58 2015 From: peter.levart at gmail.com (Peter Levart) Date: Fri, 23 Jan 2015 15:38:58 +0100 Subject: [9] RFR (M): 8069591: Customize LambdaForms which are invoked using MH.invoke/invokeExact In-Reply-To: References: <54BFD2EE.3060909@oracle.com> <54BFE259.1090402@univ-mlv.fr> <3B4D19E0-8DA6-4FE1-BD77-E12E8BCF15EC@oracle.com> <54C139CE.4000005@oracle.com> Message-ID: <54C25D02.2020209@gmail.com> On 01/23/2015 12:30 AM, John Rose wrote: > On Jan 22, 2015, at 9:56 AM, Vladimir Ivanov wrote: >> Remi, John, thanks for review! >> >> Updated webrev: >> http://cr.openjdk.java.net/~vlivanov/8069591/webrev.01/ >> >> This time I did additional testing (COMPILE_THRESHOLD > 0) and spotted a problem with MethodHandle.copyWith(): a MethodHandle can inherit customized LambdaForm this way. I could have added LambdaForm::uncustomize() call in evey Species_*::copyWith() method, but I decided to add it into MethodHandle constructor. Let me know if you think it's too intrusive. > It's OK to put it there. > > Now I'm worried that the new customization logic will defeat code sharing for invoked MHs, since uncustomize creates a new LF that is a duplicate of the original LF. That breaks the genetic link for children of the invoked MH, doesn't it? (I like the compileToBytecode call, if it is done on the original.) In fact, that is also a potential problem for the first version of your patch, also. > > Suggestion: Have every customized LF contain a direct link to its uncustomized original. Have uncustomize just return that same original, every time. Then, when using LF editor operations to derive new LFs, always have them extract the original before making a derivation. The customized LF then don't need 'transformCache' field. It could be re-used to point to original uncustomized LF. That would also be a signal for LF editor (the 4th type of payload attached to transformCache field) to follow the link to get to the uncustomized LF... Peter > > (Alternatively, have the LF editor caches be shared between original LFs and all their customized versions. But that doesn't save all the genetic links.) > >> Also, I made DirectMethodHandles a special-case, since I don't see any benefit in customizing them. > The overriding method in DHM should be marked @Override, so that we know all the bits fit together. > > ? John > _______________________________________________ > mlvm-dev mailing list > mlvm-dev at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev From vladimir.x.ivanov at oracle.com Fri Jan 23 16:00:53 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 23 Jan 2015 19:00:53 +0300 Subject: [9] RFR (M): 8069591: Customize LambdaForms which are invoked using MH.invoke/invokeExact In-Reply-To: <54C25D02.2020209@gmail.com> References: <54BFD2EE.3060909@oracle.com> <54BFE259.1090402@univ-mlv.fr> <3B4D19E0-8DA6-4FE1-BD77-E12E8BCF15EC@oracle.com> <54C139CE.4000005@oracle.com> <54C25D02.2020209@gmail.com> Message-ID: <54C27035.5050103@oracle.com> Good idea, Peter! Updated version: http://cr.openjdk.java.net/~vlivanov/8069591/webrev.02/ Best regards, Vladimir Ivanov On 1/23/15 5:38 PM, Peter Levart wrote: > On 01/23/2015 12:30 AM, John Rose wrote: >> On Jan 22, 2015, at 9:56 AM, Vladimir Ivanov >> wrote: >>> Remi, John, thanks for review! 
>>> >>> Updated webrev: >>> http://cr.openjdk.java.net/~vlivanov/8069591/webrev.01/ >>> >>> This time I did additional testing (COMPILE_THRESHOLD > 0) and >>> spotted a problem with MethodHandle.copyWith(): a MethodHandle can >>> inherit customized LambdaForm this way. I could have added >>> LambdaForm::uncustomize() call in evey Species_*::copyWith() method, >>> but I decided to add it into MethodHandle constructor. Let me know if >>> you think it's too intrusive. >> It's OK to put it there. >> >> Now I'm worried that the new customization logic will defeat code >> sharing for invoked MHs, since uncustomize creates a new LF that is a >> duplicate of the original LF. That breaks the genetic link for >> children of the invoked MH, doesn't it? (I like the compileToBytecode >> call, if it is done on the original.) In fact, that is also a >> potential problem for the first version of your patch, also. >> >> Suggestion: Have every customized LF contain a direct link to its >> uncustomized original. Have uncustomize just return that same >> original, every time. Then, when using LF editor operations to derive >> new LFs, always have them extract the original before making a >> derivation. > > The customized LF then don't need 'transformCache' field. It could be > re-used to point to original uncustomized LF. That would also be a > signal for LF editor (the 4th type of payload attached to transformCache > field) to follow the link to get to the uncustomized LF... > > Peter > >> >> (Alternatively, have the LF editor caches be shared between original >> LFs and all their customized versions. But that doesn't save all the >> genetic links.) >> >>> Also, I made DirectMethodHandles a special-case, since I don't see >>> any benefit in customizing them. >> The overriding method in DHM should be marked @Override, so that we >> know all the bits fit together. >> >> ? John >> _______________________________________________ >> mlvm-dev mailing list >> mlvm-dev at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev > From john.r.rose at oracle.com Fri Jan 23 18:13:40 2015 From: john.r.rose at oracle.com (John Rose) Date: Fri, 23 Jan 2015 10:13:40 -0800 Subject: [9] RFR (M): 8069591: Customize LambdaForms which are invoked using MH.invoke/invokeExact In-Reply-To: <54C27035.5050103@oracle.com> References: <54BFD2EE.3060909@oracle.com> <54BFE259.1090402@univ-mlv.fr> <3B4D19E0-8DA6-4FE1-BD77-E12E8BCF15EC@oracle.com> <54C139CE.4000005@oracle.com> <54C25D02.2020209@gmail.com> <54C27035.5050103@oracle.com> Message-ID: On Jan 23, 2015, at 8:00 AM, Vladimir Ivanov wrote: > > Good idea, Peter! +1 > Updated version: > http://cr.openjdk.java.net/~vlivanov/8069591/webrev.02/ Yes, that's good, and you can count me as a reviewer. ? John P.S. One could also get rid of the LF.customized field by stuffing both that value and the original LF in the transformCache (as a 2-array), but that's overkill. P.P.S. A possible generalization to the LF.customized field would be an optional list of type, value, and/or structure constraints for one or more arguments to the LF. Then we could (a) customize on additional arguments if we thought that were useful, and/or (b) produce semi-custom code that could be shared by more than one MH, if we thought there was an interesting equivalence class of MHs to speed up with common code. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From vladimir.x.ivanov at oracle.com Mon Jan 26 16:41:50 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Mon, 26 Jan 2015 19:41:50 +0300 Subject: [9] RFR (M): 8063137: Never-taken branches should be pruned when GWT LambdaForms are shared In-Reply-To: <5BA1E369-ED87-4EBD-8408-B73B726D91BD@oracle.com> References: <54B94766.2080102@oracle.com> <7B03B9FB-17B4-4AE0-92B8-F2DC5B231294@oracle.com> <54BEA7D7.6080008@oracle.com> <5BA1E369-ED87-4EBD-8408-B73B726D91BD@oracle.com> Message-ID: <54C66E4E.9050805@oracle.com> John, What do you think about the following version? http://cr.openjdk.java.net/~vlivanov/8063137/webrev.02 As you suggested, I reified MHI::profileBranch on LambdaForm level and removed @LambdaForm.Shared. My main concern about removing @Sharen was that profile pollution can affect the code before profileBranch call (akin to 8068915 [1]) and it seems it's the case: Gbemu (at least) is sensitive to that change (there's a 10% difference in peak performance between @Shared and has_injected_profile()). I can leave @Shared as is for now or remove it and work on the fix to the deoptimization counts pollution. What do you prefer? Best regards, Vladimir Ivanov [1] https://bugs.openjdk.java.net/browse/JDK-8068915 On 1/23/15 4:31 AM, John Rose wrote: > On Jan 20, 2015, at 11:09 AM, Vladimir Ivanov > > wrote: >> >>> What I'm mainly poking at here is that 'isGWT' is not informative about >>> the intended use of the flag. >> I agree. It was an interim solution. Initially, I planned to introduce >> customization and guide the logic based on that property. But it's not >> there yet and I needed something for GWT case. Unfortunately, I missed >> the case when GWT is edited. In that case, isGWT flag is missed and no >> annotation is set. >> So, I removed isGWT flag and introduced a check for selectAlternative >> occurence in LambdaForm shape, as you suggested. > > Good. > > I think there is a sweeter spot just a little further on. Make > profileBranch be an LF intrinsic and expose it like this: > GWT(p,t,f;S) := let(a=new int[3]) in lambda(*: S) { > selectAlternative(profileBranch(p.invoke( *), a), t, f).invoke( *); } > > Then selectAlternative triggers branchy bytecodes in the IBGen, and > profileBranch injects profiling in C2. > The presence of profileBranch would then trigger the @Shared annotation, > if you still need it. > > After thinking about it some more, I still believe it would be better to > detect the use of profileBranch during a C2 compile task, and feed that > to the too_many_traps logic. I agree it is much easier to stick the > annotation on in the IBGen; the problem is that because of a minor phase > ordering problem you are introducing an annotation which flows from the > JDK to the VM. Here's one more suggestion at reducing this coupling? > > Note that C->set_trap_count is called when each Parse phase processes a > whole method. This means that information about the contents of the > nmethod accumulates during the parse. Likewise, add a flag method > C->{has,set}_injected_profile, and set the flag whenever the parser sees > a profileBranch intrinsic (with or without a constant profile array; > your call). Then consult that flag from too_many_traps. It is true > that code which is parsed upstream of the very first profileBranch will > potentially issue a non-trapping fallback, but by definition that code > would be unrelated to the injected profile, so I don't see a harm in > that. 
If this approach works, then you can remove the annotation > altogether, which is clearly preferable. We understand the annotation > now, but it has the danger of becoming a maintainer's puzzlement. > >> >>> In 'updateCounters', if the counter overflows, you'll get continuous >>> creation of ArithmeticExceptions. Will that optimize or will it cause a >>> permanent slowdown? Consider a hack like this on the exception path: >>> counters[idx] = Integer.MAX_VALUE / 2; >> I had an impression that VM optimizes overflows in Math.exact* >> intrinsics, but it's not the case - it always inserts an uncommon >> trap. I used the workaround you proposed. > > Good. > >> >>> On the Name Bikeshed: It looks like @IgnoreProfile (ignore_profile in >>> the VM) promises too much "ignorance", since it suppresses branch counts >>> and traps, but allows type profiles to be consulted. Maybe something >>> positive like "@ManyTraps" or "@SharedMegamorphic"? (It's just a name, >>> and this is just a suggestion.) >> What do you think about @LambdaForm.Shared? > > That's fine. Suggest changing the JVM accessor to > is_lambda_form_shared, because the term "shared" is already overused in > the VM. > > Or, to be much more accurate, s/@Shared/@CollectiveProfile/. Better > yet, get rid of it, as suggested above. > > (I just realized that profile pollution looks logically parallel to the > http://en.wikipedia.org/wiki/Tragedy_of_the_commons .) > > Also, in the comment explaining the annotation: > s/mostly useless/probably polluted by conflicting behavior from > multiple call sites/ > > I very much like the fact that profileBranch is the VM intrinsic, not > selectAlternative. A VM intrinsic should be nice and narrow like that. > In fact, you can delete selectAlternative from vmSymbols while you are > at it. > > (We could do profileInteger and profileClass in a similar way, if that > turned out to be useful.) > > ? John From vladimir.x.ivanov at oracle.com Mon Jan 26 18:31:30 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Mon, 26 Jan 2015 21:31:30 +0300 Subject: [9] RFR (M): 8063137: Never-taken branches should be pruned when GWT LambdaForms are shared In-Reply-To: <54C66E4E.9050805@oracle.com> References: <54B94766.2080102@oracle.com> <7B03B9FB-17B4-4AE0-92B8-F2DC5B231294@oracle.com> <54BEA7D7.6080008@oracle.com> <5BA1E369-ED87-4EBD-8408-B73B726D91BD@oracle.com> <54C66E4E.9050805@oracle.com> Message-ID: <54C68802.7020105@oracle.com> > As you suggested, I reified MHI::profileBranch on LambdaForm level and > removed @LambdaForm.Shared. My main concern about removing @Sharen was > that profile pollution can affect the code before profileBranch call > (akin to 8068915 [1]) and it seems it's the case: Gbemu (at least) is > sensitive to that change (there's a 10% difference in peak performance > between @Shared and has_injected_profile()). Ignore that. Additional runs don't prove there's a regression on Gbemu. There's some variance on Gbemu and it's present w/ and w/o @Shared. Best regards, Vladimir Ivanov > I can leave @Shared as is for now or remove it and work on the fix to > the deoptimization counts pollution. What do you prefer? > > Best regards, > Vladimir Ivanov > > [1] https://bugs.openjdk.java.net/browse/JDK-8068915 > > On 1/23/15 4:31 AM, John Rose wrote: >> On Jan 20, 2015, at 11:09 AM, Vladimir Ivanov >> > >> wrote: >>> >>>> What I'm mainly poking at here is that 'isGWT' is not informative about >>>> the intended use of the flag. >>> I agree. It was an interim solution. 
Initially, I planned to introduce >>> customization and guide the logic based on that property. But it's not >>> there yet and I needed something for GWT case. Unfortunately, I missed >>> the case when GWT is edited. In that case, isGWT flag is missed and no >>> annotation is set. >>> So, I removed isGWT flag and introduced a check for selectAlternative >>> occurence in LambdaForm shape, as you suggested. >> >> Good. >> >> I think there is a sweeter spot just a little further on. Make >> profileBranch be an LF intrinsic and expose it like this: >> GWT(p,t,f;S) := let(a=new int[3]) in lambda(*: S) { >> selectAlternative(profileBranch(p.invoke( *), a), t, f).invoke( *); } >> >> Then selectAlternative triggers branchy bytecodes in the IBGen, and >> profileBranch injects profiling in C2. >> The presence of profileBranch would then trigger the @Shared annotation, >> if you still need it. >> >> After thinking about it some more, I still believe it would be better to >> detect the use of profileBranch during a C2 compile task, and feed that >> to the too_many_traps logic. I agree it is much easier to stick the >> annotation on in the IBGen; the problem is that because of a minor phase >> ordering problem you are introducing an annotation which flows from the >> JDK to the VM. Here's one more suggestion at reducing this coupling? >> >> Note that C->set_trap_count is called when each Parse phase processes a >> whole method. This means that information about the contents of the >> nmethod accumulates during the parse. Likewise, add a flag method >> C->{has,set}_injected_profile, and set the flag whenever the parser sees >> a profileBranch intrinsic (with or without a constant profile array; >> your call). Then consult that flag from too_many_traps. It is true >> that code which is parsed upstream of the very first profileBranch will >> potentially issue a non-trapping fallback, but by definition that code >> would be unrelated to the injected profile, so I don't see a harm in >> that. If this approach works, then you can remove the annotation >> altogether, which is clearly preferable. We understand the annotation >> now, but it has the danger of becoming a maintainer's puzzlement. >> >>> >>>> In 'updateCounters', if the counter overflows, you'll get continuous >>>> creation of ArithmeticExceptions. Will that optimize or will it >>>> cause a >>>> permanent slowdown? Consider a hack like this on the exception path: >>>> counters[idx] = Integer.MAX_VALUE / 2; >>> I had an impression that VM optimizes overflows in Math.exact* >>> intrinsics, but it's not the case - it always inserts an uncommon >>> trap. I used the workaround you proposed. >> >> Good. >> >>> >>>> On the Name Bikeshed: It looks like @IgnoreProfile (ignore_profile in >>>> the VM) promises too much "ignorance", since it suppresses branch >>>> counts >>>> and traps, but allows type profiles to be consulted. Maybe something >>>> positive like "@ManyTraps" or "@SharedMegamorphic"? (It's just a name, >>>> and this is just a suggestion.) >>> What do you think about @LambdaForm.Shared? >> >> That's fine. Suggest changing the JVM accessor to >> is_lambda_form_shared, because the term "shared" is already overused in >> the VM. >> >> Or, to be much more accurate, s/@Shared/@CollectiveProfile/. Better >> yet, get rid of it, as suggested above. >> >> (I just realized that profile pollution looks logically parallel to the >> http://en.wikipedia.org/wiki/Tragedy_of_the_commons .) 
>> >> Also, in the comment explaining the annotation: >> s/mostly useless/probably polluted by conflicting behavior from >> multiple call sites/ >> >> I very much like the fact that profileBranch is the VM intrinsic, not >> selectAlternative. A VM intrinsic should be nice and narrow like that. >> In fact, you can delete selectAlternative from vmSymbols while you are >> at it. >> >> (We could do profileInteger and profileClass in a similar way, if that >> turned out to be useful.) >> >> ? John From john.r.rose at oracle.com Tue Jan 27 00:04:03 2015 From: john.r.rose at oracle.com (John Rose) Date: Mon, 26 Jan 2015 16:04:03 -0800 Subject: [9] RFR (M): 8063137: Never-taken branches should be pruned when GWT LambdaForms are shared In-Reply-To: <54C66E4E.9050805@oracle.com> References: <54B94766.2080102@oracle.com> <7B03B9FB-17B4-4AE0-92B8-F2DC5B231294@oracle.com> <54BEA7D7.6080008@oracle.com> <5BA1E369-ED87-4EBD-8408-B73B726D91BD@oracle.com> <54C66E4E.9050805@oracle.com> Message-ID: <915998BE-25E9-4196-BAC7-FE5527E10F83@oracle.com> On Jan 26, 2015, at 8:41 AM, Vladimir Ivanov wrote: > > What do you think about the following version? > http://cr.openjdk.java.net/~vlivanov/8063137/webrev.02 > > As you suggested, I reified MHI::profileBranch on LambdaForm level and removed @LambdaForm.Shared. My main concern about removing @Sharen was that profile pollution can affect the code before profileBranch call (akin to 8068915 [1]) and it seems it's the case: Gbemu (at least) is sensitive to that change (there's a 10% difference in peak performance between @Shared and has_injected_profile()). > > I can leave @Shared as is for now or remove it and work on the fix to the deoptimization counts pollution. What do you prefer? Generic advice here: It's better to leave it out, if in doubt. If it has a real benefit, and we don't have time to make it clean, put it in and file a tracking bug to clean it up. I re-read the change. It's simpler and more coherent now. I see one more issue which we should fix now, while we can. It's the sort of thing which is hard to clean up later. The two fields of the profileBranch array have obscure and inconsistent labelings. It took me some hard thought and the inspection of three files to decide what "taken" and "not taken" mean in the C2 code that injects the profile. The problem is that, when you look at profileBranch, all you see is an integer (boolean) argument and an array, and no clear indication about which array element corresponds to which argument value. It's made worse by the fact that "taken" and "not taken" are not mentioned at all in the JDK code, which instead wires together the branches of selectAlternative without much comment. My preferred formulation, for making things clearer: Decouple the idea of branching from the idea of profile injection. Name the intrinsic (yes, one more bikeshed color) "profileBoolean" (or even "injectBooleanProfile"), and use the natural indexing of the array: 0 (Java false) is a[0], and 1 (Java true) is a[1]. We might later extend this to work with "booleans" (more generally, small-integer flags), of more than two possible values, klasses, etc. This line then goes away, and 'result' is used directly as the profile index: + int idx = result ? 0 : 1; The ProfileBooleanNode should have an embedded (or simply indirect) array of ints which is a simple copy of the profile array, so there's no doubt about which count is which. 
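As a rough sketch of what that natural indexing looks like on the JDK side (illustrative code only, not the actual MethodHandleImpl source; the saturation check is just one possible way to avoid counter wrap):

    class ProfileSketch {
        // counts[0] holds the false count, counts[1] holds the true count,
        // so the boolean result maps directly onto the array index.
        static boolean profileBoolean(boolean result, int[] counts) {
            int idx = result ? 1 : 0;
            int c = counts[idx];
            if (c < Integer.MAX_VALUE) {   // saturate instead of wrapping on overflow
                counts[idx] = c + 1;
            }
            return result;                 // pass the profiled value through unchanged
        }
    }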
The parsing of the predicate that contains "profileBoolean" should probably be more robust, at least allowing for 'eq' and 'ne' versions of the test. (C2 freely flips comparison senses, in various places.) The check for Op_AndI must be more precise; make sure n->in(2) is a constant of the expected value (1). The most robust way to handle it (but try this another time, I think) would be to make two temp copies of the predicate, substituting the occurrence of ProfileBoolean with '0' and '1', respectively; if they both fold to '0' and '1' or '1' and '0', then you take the indicated action. I suggest putting the new code in Parse::dynamic_branch_prediction, which pattern-matches for injected profiles, into its own subroutine. Maybe: bool use_mdo = true; if (has_injected_profile(btest, test, &taken, &not_taken)) { use_mdo = false; } if (use_mdo) { ... old code I see why you used the opposite order in the existing code: It mirrors the order of the second and third arguments to selectAlternative. But the JVM knows nothing about selectAlternative, so it's just confusing when reading the VM code to know which profile array element means what. -- John P.S. Long experience with byte-order bugs in HotSpot convinces me that if you are not scrupulously clear in your terms, when working with equal and opposite configuration pairs, you will have a long bug tail, especially if you have to maintain agreement about the configurations through many layers of software. This is one of those cases. The best chance to fix such bugs is not to allow them in the first place. In the case of byte-order, we have "first" vs. "second", "MSB" vs. "LSB", and "high" vs. "low" parts of values, for values in memory and in registers, and all possible misunderstandings about them and their relation have probably happened and caused bugs. -------------- next part -------------- An HTML attachment was scrubbed...
URL: From vladimir.x.ivanov at oracle.com Tue Jan 27 16:05:19 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 27 Jan 2015 19:05:19 +0300 Subject: [9] RFR (M): 8063137: Never-taken branches should be pruned when GWT LambdaForms are shared In-Reply-To: <915998BE-25E9-4196-BAC7-FE5527E10F83@oracle.com> References: <54B94766.2080102@oracle.com> <7B03B9FB-17B4-4AE0-92B8-F2DC5B231294@oracle.com> <54BEA7D7.6080008@oracle.com> <5BA1E369-ED87-4EBD-8408-B73B726D91BD@oracle.com> <54C66E4E.9050805@oracle.com> <915998BE-25E9-4196-BAC7-FE5527E10F83@oracle.com> Message-ID: <54C7B73F.50404@oracle.com> Thanks for the feedback, John! Updated webrev: http://cr.openjdk.java.net/~vlivanov/8063137/webrev.03/jdk http://cr.openjdk.java.net/~vlivanov/8063137/webrev.03/hotspot Changes: - renamed MHI::profileBranch to MHI::profileBoolean, and ProfileBranchNode to ProfileBooleanNode; - restructured profile layout ([0] => false_cnt, [1] => true_cnt) - factored out profile injection in a separate function (has_injected_profile() in parse2.cpp) - ProfileBooleanNode stores true/false counts instead of taken/not_taken counts - matching from value counts to taken/not_taken happens in has_injected_profile(); - added BoolTest::ne support - sharpened test for AndI case: now it checks AndI (ProfileBoolean) (ConI 1) shape Best regards, Vladimir Ivanov On 1/27/15 3:04 AM, John Rose wrote: > On Jan 26, 2015, at 8:41 AM, Vladimir Ivanov > > wrote: >> >> What do you think about the following version? >> http://cr.openjdk.java.net/~vlivanov/8063137/webrev.02 >> >> As you suggested, I reified MHI::profileBranch on LambdaForm level and >> removed @LambdaForm.Shared. My main concern about removing @Sharen was >> that profile pollution can affect the code before profileBranch call >> (akin to 8068915 [1]) and it seems it's the case: Gbemu (at least) is >> sensitive to that change (there's a 10% difference in peak performance >> between @Shared and has_injected_profile()). >> >> I can leave @Shared as is for now or remove it and work on the fix to >> the deoptimization counts pollution. What do you prefer? > > Generic advice here: It's better to leave it out, if in doubt. If it > has a real benefit, and we don't have time to make it clean, put it in > and file a tracking bug to clean it up. > > I re-read the change. It's simpler and more coherent now. > > I see one more issue which we should fix now, while we can. It's the > sort of thing which is hard to clean up later. > > The two fields of the profileBranch array have obscure and inconsistent > labelings. It took me some hard thought and the inspection of three > files to decide what "taken" and "not taken" mean in the C2 code that > injects the profile. The problem is that, when you look at > profileBranch, all you see is an integer (boolean) argument and an > array, and no clear indication about which array element corresponds to > which argument value. It's made worse by the fact that "taken" and "not > taken" are not mentioned at all in the JDK code, which instead wires > together the branches of selectAlternative without much comment. > > My preferred formulation, for making things clearer: Decouple the idea > of branching from the idea of profile injection. Name the intrinsic > (yes, one more bikeshed color) "profileBoolean" (or even > "injectBooleanProfile"), and use the natural indexing of the array: 0 > (Java false) is a[0], and 1 (Java true) is a[1]. 
We might later extend > this to work with "booleans" (more generally, small-integer flags), of > more than two possible values, klasses, etc. > > This line then goes away, and 'result' is used directly as the profile > index: > + int idx = result ? 0 : 1; > > The ProfileBooleanNode should have an embedded (or simply indirect) > array of ints which is a simple copy of the profile array, so there's no > doubt about which count is which. > > The parsing of the predicate that contains "profileBoolean" should > probably be more robust, at least allowing for 'eq' and 'ne' versions of > the test. (C2 freely flips comparison senses, in various places.) The > check for Op_AndI must be more precise; make sure n->in(2) is a constant > of the expected value (1). The most robust way to handle it (but try > this another time, I think) would be to make two temp copies of the > predicate, substituting the occurrence of ProfileBoolean with '0' and > '1', respectively; if they both fold to '0' and '1' or '1' and '0', then > you take the indicated action. > > I suggest putting the new code in Parse::dynamic_branch_prediction, > which pattern-matches for injected profiles, into its own subroutine. > Maybe: > bool use_mdo = true; > if (has_injected_profile(btest, test, &taken, ¬_taken)) { > use_mdo = false; > } > if (use_mdo) { ... old code > > I see why you used the opposite order in the existing code: It mirrors > the order of the second and third arguments to selectAlternative. But > the JVM knows nothing about selectAlternative, so it's just confusing > when reading the VM code to know which profile array element means what. > > ? John > > P.S. Long experience with byte-order bugs in HotSpot convinces me that > if you are not scrupulously clear in your terms, when working with equal > and opposite configuration pairs, you will have a long bug tail, > especially if you have to maintain agreement about the configurations > through many layers of software. This is one of those cases. The best > chance to fix such bugs is not to allow them in the first place. In the > case of byte-order, we have "first" vs. "second", "MSB" vs. "LSB", and > "high" vs. "low" parts of values, for values in memory and in registers, > and all possible misunderstandings about them and their relation have > probably happened and caused bugs. From john.r.rose at oracle.com Tue Jan 27 21:08:47 2015 From: john.r.rose at oracle.com (John Rose) Date: Tue, 27 Jan 2015 13:08:47 -0800 Subject: [9] RFR (M): 8063137: Never-taken branches should be pruned when GWT LambdaForms are shared In-Reply-To: <54C7B73F.50404@oracle.com> References: <54B94766.2080102@oracle.com> <7B03B9FB-17B4-4AE0-92B8-F2DC5B231294@oracle.com> <54BEA7D7.6080008@oracle.com> <5BA1E369-ED87-4EBD-8408-B73B726D91BD@oracle.com> <54C66E4E.9050805@oracle.com> <915998BE-25E9-4196-BAC7-FE5527E10F83@oracle.com> <54C7B73F.50404@oracle.com> Message-ID: <8AD9A8CC-E570-4DE6-ABB1-10B00FACB8AB@oracle.com> Looking very good, thanks. Ship it! Actually, can you insert a comment why the injected counts are not scaled? (Or perhaps they should be??) Also, we may need a followup bug for the code with this comment: // Look for the following shape: AndI (ProfileBoolean) (ConI 1)) Since profileBoolean returns a TypeInt::BOOL, the AndI with (ConI 1) should fold up. So there's some work to do in MulNode, which may allow that special pattern match to go away. But I don't want to divert the present bug by a possibly complex dive into fixing AndI::Ideal. 
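Restated at the Java level, the fold in question is just this (an illustration of why the mask is redundant once the value is known to be 0 or 1, not C2 code):

    class FoldSketch {
        static int masked(boolean flag) {
            int v = flag ? 1 : 0;  // v is provably 0 or 1, i.e. TypeInt::BOOL
            return v & 1;          // (v & 1) == v for such a value, so the AndI node
                                   // that implements this mask can be folded away
        }
    }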
(Generally speaking, pattern matching should assume strong normalization of its inputs. Otherwise you end up duplicating pattern match code in many places, inconsistently. Funny one-off idiom checks like this are evidence of incomplete IR normalization. See http://en.wikipedia.org/wiki/Rewriting for some background on terms like "normalization" and "confluence" which are relevant to C2.) ? John On Jan 27, 2015, at 8:05 AM, Vladimir Ivanov wrote: > > Thanks for the feedback, John! > > Updated webrev: > http://cr.openjdk.java.net/~vlivanov/8063137/webrev.03/jdk > http://cr.openjdk.java.net/~vlivanov/8063137/webrev.03/hotspot > > Changes: > - renamed MHI::profileBranch to MHI::profileBoolean, and ProfileBranchNode to ProfileBooleanNode; > - restructured profile layout ([0] => false_cnt, [1] => true_cnt) > - factored out profile injection in a separate function (has_injected_profile() in parse2.cpp) > - ProfileBooleanNode stores true/false counts instead of taken/not_taken counts > - matching from value counts to taken/not_taken happens in has_injected_profile(); > - added BoolTest::ne support > - sharpened test for AndI case: now it checks AndI (ProfileBoolean) (ConI 1) shape > > Best regards, > Vladimir Ivanov From vladimir.x.ivanov at oracle.com Wed Jan 28 09:00:55 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 28 Jan 2015 12:00:55 +0300 Subject: [9] RFR (M): 8063137: Never-taken branches should be pruned when GWT LambdaForms are shared In-Reply-To: <8AD9A8CC-E570-4DE6-ABB1-10B00FACB8AB@oracle.com> References: <54B94766.2080102@oracle.com> <7B03B9FB-17B4-4AE0-92B8-F2DC5B231294@oracle.com> <54BEA7D7.6080008@oracle.com> <5BA1E369-ED87-4EBD-8408-B73B726D91BD@oracle.com> <54C66E4E.9050805@oracle.com> <915998BE-25E9-4196-BAC7-FE5527E10F83@oracle.com> <54C7B73F.50404@oracle.com> <8AD9A8CC-E570-4DE6-ABB1-10B00FACB8AB@oracle.com> Message-ID: <54C8A547.6050607@oracle.com> > Looking very good, thanks. Ship it! Thanks, John! > Actually, can you insert a comment why the injected counts are not scaled? (Or perhaps they should be??) Sure! I intentionally don't scale the counts because I don't see any reason to do so. Profiling is done on per-MethodHandle basis, so the counts should be very close (considering racy updates) to the actual behavior. > Also, we may need a followup bug for the code with this comment: > // Look for the following shape: AndI (ProfileBoolean) (ConI 1)) > > Since profileBoolean returns a TypeInt::BOOL, the AndI with (ConI 1) should fold up. > So there's some work to do in MulNode, which may allow that special pattern match to go away. > But I don't want to divert the present bug by a possibly complex dive into fixing AndI::Ideal. Good catch! It's an overlook on my side. 
The following change for ProfileBooleanNode solves the problem: - virtual const Type *bottom_type() const { return TypeInt::INT; } + virtual const Type *bottom_type() const { return TypeInt::BOOL; } I polished the change a little according to your comments (diff against v03): http://cr.openjdk.java.net/~vlivanov/8063137/webrev.03-04/hotspot Changes: - added short explanation why injected counts aren't scaled - adjusted ProfileBooleanNode type to TypeInt::BOOL and removed excessive pattern matching in has_injected_profile() - added an assert when ProfileBooleanNode is removed to catch the cases when injected profile isn't used: if we decide to generalize the API, I'd be happy to remove it, but current usages assumes that injected counts are always consumed during parsing and missing cases can cause hard-to-diagnose performance problems. Best regards, Vladimir Ivanov > > (Generally speaking, pattern matching should assume strong normalization of its inputs. Otherwise you end up duplicating pattern match code in many places, inconsistently. Funny one-off idiom checks like this are evidence of incomplete IR normalization. See http://en.wikipedia.org/wiki/Rewriting for some background on terms like "normalization" and "confluence" which are relevant to C2.) > > ? John > > On Jan 27, 2015, at 8:05 AM, Vladimir Ivanov wrote: >> >> Thanks for the feedback, John! >> >> Updated webrev: >> http://cr.openjdk.java.net/~vlivanov/8063137/webrev.03/jdk >> http://cr.openjdk.java.net/~vlivanov/8063137/webrev.03/hotspot >> >> Changes: >> - renamed MHI::profileBranch to MHI::profileBoolean, and ProfileBranchNode to ProfileBooleanNode; >> - restructured profile layout ([0] => false_cnt, [1] => true_cnt) >> - factored out profile injection in a separate function (has_injected_profile() in parse2.cpp) >> - ProfileBooleanNode stores true/false counts instead of taken/not_taken counts >> - matching from value counts to taken/not_taken happens in has_injected_profile(); >> - added BoolTest::ne support >> - sharpened test for AndI case: now it checks AndI (ProfileBoolean) (ConI 1) shape >> >> Best regards, >> Vladimir Ivanov > From vladimir.x.ivanov at oracle.com Wed Jan 28 17:12:23 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 28 Jan 2015 20:12:23 +0300 Subject: [9] RFR (XS): 8071787: Don't block inlining when DONT_INLINE_THRESHOLD=0 Message-ID: <54C91877.5040707@oracle.com> http://cr.openjdk.java.net/~vlivanov/8071787/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8071787 For testing & performance measurements, sometimes it's useful to replace block inlining wrappers with trivial reinvokers. This change extends DONT_INLINE_THRESHOLD in the following manner: DONT_INLINE_THRESHOLD = -1: no wrapper DONT_INLINE_THRESHOLD = 0: reinvoker DONT_INLINE_THRESHOLD > 0: counting wrapper Before that DONT_INLINE_THRESHOLD=0 meant a counting wrapper which is removed on the first invocation. After the change, it's DONT_INLINE_THRESHOLD=1. Testing: manual, java/lang/invoke Best regards, Vladimir Ivanov From vladimir.x.ivanov at oracle.com Wed Jan 28 17:22:57 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 28 Jan 2015 20:22:57 +0300 Subject: [9] RFR (XXS): 8071788: CountingWrapper.asType() is broken Message-ID: <54C91AF1.3010602@oracle.com> http://cr.openjdk.java.net/~vlivanov/8071788/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8071788 There's a type mismatch between MethodHandle and LambdaForm in CountingWrapper.asTypeUncached(). Sometimes, it leads to a VM crash. 
The fix is to use adapted MethodHandle to construct LambdaForm. There's no way to reproduce this problem with vanilla 8u40/9 binaries, because CountingWrapper is used only to block inlinining in GWT (MHI::profile() on target and fallback MethodHandles). It means there's no way to call CountingWrapper.asType() on wrapped MethodHandles outside of java.lang.invoke code, and there are no such calls inside it. Testing: manual, java/lang/invoke Thanks! Best regards, Vladimir Ivanov From john.r.rose at oracle.com Wed Jan 28 20:30:37 2015 From: john.r.rose at oracle.com (John Rose) Date: Wed, 28 Jan 2015 12:30:37 -0800 Subject: [9] RFR (M): 8063137: Never-taken branches should be pruned when GWT LambdaForms are shared In-Reply-To: <54C8A547.6050607@oracle.com> References: <54B94766.2080102@oracle.com> <7B03B9FB-17B4-4AE0-92B8-F2DC5B231294@oracle.com> <54BEA7D7.6080008@oracle.com> <5BA1E369-ED87-4EBD-8408-B73B726D91BD@oracle.com> <54C66E4E.9050805@oracle.com> <915998BE-25E9-4196-BAC7-FE5527E10F83@oracle.com> <54C7B73F.50404@oracle.com> <8AD9A8CC-E570-4DE6-ABB1-10B00FACB8AB@oracle.com> <54C8A547.6050607@oracle.com> Message-ID: On Jan 28, 2015, at 1:00 AM, Vladimir Ivanov wrote: > I polished the change a little according to your comments (diff against v03): > http://cr.openjdk.java.net/~vlivanov/8063137/webrev.03-04/hotspot +1 Glad to see the AndI folds up easily; thanks for the cleanup. -------------- next part -------------- An HTML attachment was scrubbed... URL: From volker.simonis at gmail.com Wed Jan 28 20:40:49 2015 From: volker.simonis at gmail.com (Volker Simonis) Date: Wed, 28 Jan 2015 21:40:49 +0100 Subject: What's the status of / relation between "JEP 169: Value Objects" / "Value Types for Java" / "Object Layout" Message-ID: Hi everybody, I've recently did some research on Java "value objects" / "value types" / "object layout" (I'll be actually giving a short talk on the topic at FOSDEM[0] this weekend). I just want to quickly summarize my current findings here and gently ask for feedback in case you think I've totally misunderstood something. Of course any comments and additional information is highly welcome as well. 1. JEP 169: Value Objects [1] - Created by John Rose in 2012 (last update in Sep. 2014) - Still in "Draft" state - Proposes a new "lockPermanently()" operator which marks objects as immutable - Seems to be only a little "helper functionality" to simplify automatic boxing/unboxing and escape analysis - Referenced the mlwm mailing list and repository but the mlwm repo seems dead since about 15 month now Question: is JEP 169 still under active development or has it been merged into the more general "Value types for Java" proposal below? 2. "Value types for Java" / "State of the Values" [2] - By J. Rose, B. Goetz and Guy Stele - Based on earlier ideas from "Value Types in the VM" [3] - Newest and most elaborate proposal - Proposes general (i.e. function arguments, return values, variables, arrays), "immutable" value types - Requires fundamental changes to the VM as well as to the Java language - Related to the "State of the Specialization" proposal [4] about support for generics over primitive and value types by B Goetz. - Discussed and developed in the OpenJDK "Valhalla" [5] project - Still very early stage (i.e. no "code" available yet) 3. 
PackedObjects as provided by the IBM J9 [6,7] - Flattens the memory layout of "@Packed" object fields and array - Removes object headers of and references to "@Packed" objects - Object headers can be generated on the fly (kind of "auto-boxing") - Currently the most complete and mature solution - Not Java-compatible (e.g. can not write to a nested "@Packed" fields). Must be enabled as an experimental extension. 4. ObjectLayout [8] - A pure Java, layout-optimized data structure package - Designed similar to "@ValueSafe"/"ValueType" in [3] and "Value-base classes" in Java 8 [9] - Designed such that it can be tranparently optimized within the VM - VM can transparently layout "@Intrinsic" objects within other objects - All objects are still complete Java object with valid header - The Java part of the library is mature, first native VM-optimizations on the way [10] The "Value types for Java" approach clearly seems to be the most general but also the most complex proposal. It's out of scope for Java 9 and still questionable for Java 10 and above. The "PackedObject" and "ObjectLayout" approaches are clearly simpler and more limited in scope as they only concentrate on better object layout. However the "ObjectLayout" proposal demonstrates that this is still possible within the current Java specification while the "PackedObjects" proposal demonstrated that an optimizing implementation is feasible. I've recently built a prototype which intrinsifies/optimizes some parts of the "ObjectLayout" proposal in the HotSpot [10]. Question: is there a chance to get a some sort of Java-only but transparently optimizable structure package like "ObjectLayout" into Java early (i.e. Java 9)? In my eyes this wouldn't contradict with a more general solution like the one proposed in the "Value types for Java" approach while still offering quite significant performance improvements for quite a big range of problems. And if carefully designed, it could be easily retrofitted to use the new, general "Value Types" once they are available. Question: what would be the right place to propose something like the "ObjectLayout" library for Java 9/10? Would that fit within the umbrella of the Valhalla project or would it be done within its own project / under it's own JEP? Thanks for your patience, Volker [0] https://fosdem.org/2015/schedule/event/packed_objects/ [1] http://openjdk.java.net/jeps/169 [2] http://cr.openjdk.java.net/~jrose/values/values-0.html [3] https://blogs.oracle.com/jrose/entry/value_types_in_the_vm [4] http://cr.openjdk.java.net/~briangoetz/valhalla/specialization.html [5] http://openjdk.java.net/projects/valhalla [6] http://www.slideshare.net/rsciampacone/javaone-2013-introduction-to-packedobjects?related=1 [7] http://medianetwork.oracle.com/video/player/2623645005001 [8] http://objectlayout.org [9] http://docs.oracle.com/javase/8/docs/api/java/lang/doc-files/ValueBased.html [10] https://github.com/simonis/ObjectLayout/tree/hotspot_intrinsification/hotspot From john.r.rose at oracle.com Thu Jan 29 03:10:34 2015 From: john.r.rose at oracle.com (John Rose) Date: Wed, 28 Jan 2015 19:10:34 -0800 Subject: [9] RFR (XS): 8071787: Don't block inlining when DONT_INLINE_THRESHOLD=0 In-Reply-To: <54C91877.5040707@oracle.com> References: <54C91877.5040707@oracle.com> Message-ID: Good. Consider fixing the typo in 'makeBlockInlningWrapper'. ? 
John On Jan 28, 2015, at 9:12 AM, Vladimir Ivanov wrote: > > http://cr.openjdk.java.net/~vlivanov/8071787/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8071787 > > For testing & performance measurements, sometimes it's useful to replace block inlining wrappers with trivial reinvokers. > > This change extends DONT_INLINE_THRESHOLD in the following manner: > DONT_INLINE_THRESHOLD = -1: no wrapper > DONT_INLINE_THRESHOLD = 0: reinvoker > DONT_INLINE_THRESHOLD > 0: counting wrapper > > Before that DONT_INLINE_THRESHOLD=0 meant a counting wrapper which is removed on the first invocation. After the change, it's DONT_INLINE_THRESHOLD=1. > > Testing: manual, java/lang/invoke > > Best regards, > Vladimir Ivanov From john.r.rose at oracle.com Thu Jan 29 03:11:49 2015 From: john.r.rose at oracle.com (John Rose) Date: Wed, 28 Jan 2015 19:11:49 -0800 Subject: [9] RFR (XXS): 8071788: CountingWrapper.asType() is broken In-Reply-To: <54C91AF1.3010602@oracle.com> References: <54C91AF1.3010602@oracle.com> Message-ID: <53D3F321-0259-4878-9767-EA909EF90810@oracle.com> Good. On Jan 28, 2015, at 9:22 AM, Vladimir Ivanov wrote: > > The fix is to use adapted MethodHandle to construct LambdaForm. -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniel.latremoliere at gmail.com Thu Jan 29 11:02:32 2015 From: daniel.latremoliere at gmail.com (=?UTF-8?B?RGFuaWVsIExhdHLDqW1vbGnDqHJl?=) Date: Thu, 29 Jan 2015 12:02:32 +0100 Subject: What's the status of / relation between "JEP 169: Value Objects" / "Value Types for Java" / "Object Layout" In-Reply-To: References: Message-ID: <54CA1348.5050903@gmail.com> > I just want to quickly summarize my > current findings here and gently ask for feedback in case you think > I've totally misunderstood something. Of course any comments and > additional information is highly welcome as well. I don't know if that can be useful, but here is my point of view of developer oriented towards the question: "What feature for solving my problem?". This contains probably some or many errors, but it is another point of view (only mine), if useful. I will not use strictly projects/proposal list as the structure of my mail because content of proposal is changing and it is not my target. I am oriented towards the final user, i.e. the developer consuming these projects, not the implementer working in each of these projects. I will preferably split in three scopes following my perceived split of job between developer and runtime. The problem is data, then what can do JVM/GC with an object? I find two possibilities regarding this domain: move it, clone it. If JVM can clone the object, JVM can also move the object because the clone will not have the same address, then we have the following three features: --- 1) JVM can clone and move objects (Project Valhalla): Constraint: no complex constructor/no complex finalizer, because lifecycle of object is managed by JVM (JVM can clone, then JVM can create and destroy the object like JVM want). Only field affectation constructor, possibly with simple conversion of data format. Constraint: immutable, because we don't know which clone is good when one is modified and because modifying all clones simultaneously is slow/complex/parallel-unfriendly. Constraint: non-null because cloning a non-existing object is a non-existing problem. 
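As a small sketch in today's Java of the kind of class these constraints describe (Complex is only an illustrative example, not part of any of the cited proposals):

    // Immutable, non-null fields, and a constructor that does nothing beyond
    // field assignment: a candidate the JVM could freely clone or move.
    final class Complex {
        final double re;
        final double im;

        Complex(double re, double im) { this.re = re; this.im = im; }

        Complex plus(Complex other) {          // returns a fresh value, never mutates
            return new Complex(re + other.re, im + other.im);
        }
    }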
Use-case "Performance": objects to clone for being closer to execution silicon and better parallelism (registers or cache of CPU/GPU) - Runtime: expose features of CPU/GPU like SIMD (mostly like a modern version of javax.vecmath). - Developer: create custom low-level structures for CPU/GPU parallel computing. - Java language: small tuples, like complex numbers (immutable by performance choice, like SIMD, for being close to silicon; cloned at each pass by value). Use-case "Language": objects to clone for being closer to registers (in stack, then less allocations in heap; simpler than escape analysis) - Java language: multiple return values from a method (immutable because it's a result; cloned, by example, at the return of each delegate or not even created when stack-only). Use-case "Efficiency": others immutable non-null objects possibly concerned for reducing indirection/improving cache, given by specialization of collection classes - Database: primary key for Map (like HashMap)/B-Tree (like MapDB)/SQL (like JPA). A primary key is immutable and non-null by choice of developer, then possible gains. --- 2) JVM can move but not clone objects It's current state of Java objects: Constraint: developer need to define lifecycle in object, for being triggered by GC (constructor/finalizer) like current Java class. Constraint: small object, because when GC move a big object, there is possibly a noticeable latency. Constraint: usable directly only in Java code (because native code will need an indirection level for finding the real address of the object, changing after each move) Improvement by adding custom layout for objects (Project Panama on heap / ObjectLayout): Specific constraint: objects which are near identity-less, i.e. only one other object (the owner) know their identity/have pointer on it. Non-constraint: applicable to all objects types, contrary to Project Valhalla. Applicable to complex constructor, because complex constructor can be inlined in owner code where called. Applicable to mutable objects , because no cloning then no incoherency. Applicable to nullable objects only by adding a boolean field in the custom layout for storing potential existence or non-existence of the inlined object, and updating code testing nullability for using this boolean. Use-case "General efficiency": Custom layout (Inline sub-object in the object owning it): - Reduce memory use with less objects then less headers and less pointers. - Improve cache performance with better locality (objects inlined are in same cache line, then no reference to follow). - Applicable to many fields containing reference, requiring only the referenced object to be invisible from all objects except one (the owner). By example, a private field containing an internal ArrayList (without getter/setter) can probably be replaced by the integer containing the used size and the reference to backing array, with inlining of the few methods of ArrayList really used. It need probably to be driven by developer after real profiling for finding best ratio between efficiency/code expansion. It will probably have much more use-cases when AOT will be available and developer-manageable precisely (Jigsaw???), because most slow work of object-code inlining and following optimizations can be done at AOT time, while gains will be at running time. 
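To make the ArrayList example above concrete, here is a hand-written sketch; the Order class and its members are invented for illustration, and no VM or library generates this today. The owner absorbs the list's used size and backing array instead of holding a reference to a separate ArrayList object, which is roughly the layout a VM- or AOT-driven inlining of a single-owner sub-object would aim for.

import java.util.Arrays;

// Illustrative only: the two interesting fields of a private ArrayList are
// flattened into the owner, removing one object header and one indirection.
final class Order {
    private Object[] lines = new Object[4];   // backing array, formerly inside the ArrayList
    private int lineCount;                    // used size, formerly ArrayList's size field

    void addLine(Object line) {
        if (lineCount == lines.length) {
            lines = Arrays.copyOf(lines, lineCount * 2);
        }
        lines[lineCount++] = line;
    }

    Object lineAt(int index) {
        if (index < 0 || index >= lineCount) {
            throw new IndexOutOfBoundsException(String.valueOf(index));
        }
        return lines[index];
    }
}

Doing this by hand trades clarity for layout, which is why it is better left to the VM or an AOT compiler guided by profiling data, as discussed next.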
Probably useful for the hottest code (JIT after this pre-optimization at AOT time) and clearly bad for the coldest code (interpreter then avoid code expansion), but very useful for the big quantity of code between, which will gain from AOT if complex optimizations are available. This will very probably require developer help/instructions/annotations using profiler data obtained on functional tests of application. --- 3) JVM can not move or clone objects (Project Panama off heap / PackedObjects) Constraint: developer need to manage externally the full lifecycle of object and need to choose when creating or destroying it. Object is off-heap and an handle is on-heap for managing off-heap part. Constraint: potential fragmentation of free memory when frequently creating and removing objects not having the same size (taking attention to object size vs. page size is probably important). Use-case "GC Latency": big data structure inducing GC latency when moved if stored in heap - All big chunks of data, like Big Data or textures in games, etc. - Few number of objects for being manageable more explicitly by developer (without too much work). Use-case "Native": communicate with native library - Modern version of JNI Only my 2 cents, Daniel. From blackdrag at gmx.org Thu Jan 29 11:55:39 2015 From: blackdrag at gmx.org (Jochen Theodorou) Date: Thu, 29 Jan 2015 12:55:39 +0100 Subject: What's the status of / relation between "JEP 169: Value Objects" / "Value Types for Java" / "Object Layout" In-Reply-To: <54CA1348.5050903@gmail.com> References: <54CA1348.5050903@gmail.com> Message-ID: <54CA1FBB.7020601@gmx.org> Am 29.01.2015 12:02, schrieb Daniel Latr?moli?re: > >> I just want to quickly summarize my >> current findings here and gently ask for feedback in case you think >> I've totally misunderstood something. Of course any comments and >> additional information is highly welcome as well. > I don't know if that can be useful, but here is my point of view of > developer oriented towards the question: "What feature for solving my > problem?". This contains probably some or many errors, but it is another > point of view (only mine), if useful. [...] > 3) JVM can not move or clone objects (Project Panama off heap / > PackedObjects) > Constraint: developer need to manage externally the full lifecycle of > object and need to choose when creating or destroying it. Object is > off-heap and an handle is on-heap for managing off-heap part. > Constraint: potential fragmentation of free memory when frequently > creating and removing objects not having the same size (taking attention > to object size vs. page size is probably important). > > Use-case "GC Latency": big data structure inducing GC latency when moved > if stored in heap > - All big chunks of data, like Big Data or textures in games, etc. > - Few number of objects for being manageable more explicitly by > developer (without too much work). > > Use-case "Native": communicate with native library > - Modern version of JNI From that view it makes me wonder if that is really in the scope of JEP 169. 
bye Jochen -- Jochen "blackdrag" Theodorou - Groovy Project Tech Lead blog: http://blackdragsview.blogspot.com/ german groovy discussion newsgroup: de.comp.lang.misc For Groovy programming sources visit http://groovy-lang.org From brian.goetz at oracle.com Thu Jan 29 17:05:23 2015 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 29 Jan 2015 12:05:23 -0500 Subject: What's the status of / relation between "JEP 169: Value Objects" / "Value Types for Java" / "Object Layout" In-Reply-To: References: Message-ID: <54CA6853.2060601@oracle.com> > Question: is JEP 169 still under active development or has it been > merged into the more general "Value types for Java" proposal below? It has been merged into the more general Value Types for Java proposal. > The "Value types for Java" approach clearly seems to be the most > general but also the most complex proposal. For some meanings of "complex". It is certainly the most intrusive and large; new bytecodes, new type signatures. But from a user-model perspective, value types are actually fairly simple. > It's out of scope for Java > 9 and still questionable for Java 10 and above. The "PackedObject" and > "ObjectLayout" approaches are clearly simpler and more limited in > scope as they only concentrate on better object layout. To your list, I'd add: Project Panama, the sister project to Valhalla. Panama focuses on interop with native code and data, including layout specification. A key goal of Packed was to be able to access off-heap native data in its native format, rather than marshalling it across the JNI boundary. Panama is focused on this problem as well, but aims to treat it as a separate problem from Java object layout, resulting in what we believe to be a cleaner decomposition of the two concerns. Packed is an interesting mix of memory density (object embedding and packed arrays) and native interop. But mixing the two goals also has costs; our approach is to separate them into orthogonal concerns, and we think that Valhalla and Panama do just that. So in many ways, while a larger project, the combination of Valhalla+Panama addresses the problem that Packed did, in a cleaner way. > Question: is there a chance to get a some sort of Java-only but > transparently optimizable structure package like "ObjectLayout" into > Java early (i.e. Java 9)? It would depend on a lot of things -- including the level of readiness of the design and implementation, and the overlap with anticipated future features. We've reviewed some of the early design of ObjectLayout and provided feedback to the project's architects; currently, I think it's in the "promising exploration" stage, but I think multiple rounds of simplification are needed before it is ready to be considered for "everybody's Java." But if the choice is to push something that's not ready into 9, or to wait longer -- there's not actually a choice to be made there. I appreciate the desire to "get something you can use now", but we have to be prepared to support whatever we push into Java for the next 20 years, and deal with the additional constraints it generates -- which can be an enormous cost. (Even though the direct cost is mostly borne by Oracle, the indirect cost is borne by everyone, in the form of slower progress on everything else.) So I am very wary of the motivation of "well, something better is coming, but this works now, so can we push it in?" I'd prefer to focus on answering whether this is the right thing for Java for the next 20 years.
> In my eyes this wouldn't contradict with a more general solution like > the one proposed in the "Value types for Java" approach while still > offering quite significant performance improvements for quite a big > range of problems. The goals of the ObjectLayout effort overlap with, but also differ from, the goals of Valhalla. And herein is the problem; neither generalizes the other, and I don't think we do the user base a great favor by pursuing two separate neither-coincident-nor-orthogonal approaches. I suspect, though, that after a few rounds of simplification, ObjectLayout could morph into something that fit either coincidentally or orthogonally with the Valhalla work -- which would be great. But, as you know, our resources are limited, so we (Oracle) can't really afford to invest in both. And such simplification takes time -- getting to that "aha" moment when you realize you can simplify something is generally an incompressible process. > Question: what would be the right place to propose something like the > "ObjectLayout" library for Java 9/10? Would that fit within the > umbrella of the Valhalla project or would it be done within its own > project / under it's own JEP? Suggesting a version number at this point would be putting the cart before the horse (you'll note that we've not even proposed a version number for Valhalla; the closest we've gotten to that is "after 9".) OpenJDK Projects are a tool for building a community around a body of work; JEPs are a project-management tool for defining, scoping, and tracking the progress of a feature. Given where OL is, it would be reasonable to start a Project, which would become the nexus of collaboration that could eventually produce a JEP. Hope this helps, -Brian From volker.simonis at gmail.com Thu Jan 29 17:31:09 2015 From: volker.simonis at gmail.com (Volker Simonis) Date: Thu, 29 Jan 2015 18:31:09 +0100 Subject: What's the status of / relation between "JEP 169: Value Objects" / "Value Types for Java" / "Object Layout" In-Reply-To: <54CA1348.5050903@gmail.com> References: <54CA1348.5050903@gmail.com> Message-ID: Hi Daniel, thanks a lot for sharing your point of view. I wasn't aware that Project Panama is also working on similar topics (I always thought it was only about the Foreign Function Interface and the next-generation JNI). In [1,2] John Rose nicely explains that new data layouts in the JVM heap are very much on the agenda of Project Panama, and he also mentions IBM's PackedObjects and Gil Tene's ObjectLayout proposals. Regards, Volker [1] http://mail.openjdk.java.net/pipermail/panama-dev/2014-October/000042.html [2] https://blogs.oracle.com/jrose/entry/the_isthmus_in_the_vm On Thu, Jan 29, 2015 at 12:02 PM, Daniel Latrémolière wrote: > >> I just want to quickly summarize my >> current findings here and gently ask for feedback in case you think >> I've totally misunderstood something. Of course any comments and >> additional information is highly welcome as well. > > I don't know if that can be useful, but here is my point of view of > developer oriented towards the question: "What feature for solving my > problem?". This contains probably some or many errors, but it is another > point of view (only mine), if useful. > > I will not use strictly projects/proposal list as the structure of my mail > because content of proposal is changing and it is not my target. I am > oriented towards the final user, i.e. the developer consuming these > projects, not the implementer working in each of these projects.
> > I will preferably split in three scopes following my perceived split of job > between developer and runtime. The problem is data, then what can do JVM/GC > with an object? I find two possibilities regarding this domain: move it, > clone it. > > If JVM can clone the object, JVM can also move the object because the clone > will not have the same address, then we have the following three features: > --- > 1) JVM can clone and move objects (Project Valhalla): > Constraint: no complex constructor/no complex finalizer, because lifecycle > of object is managed by JVM (JVM can clone, then JVM can create and destroy > the object like JVM want). Only field affectation constructor, possibly with > simple conversion of data format. > Constraint: immutable, because we don't know which clone is good when one is > modified and because modifying all clones simultaneously is > slow/complex/parallel-unfriendly. > Constraint: non-null because cloning a non-existing object is a non-existing > problem. > > Use-case "Performance": objects to clone for being closer to execution > silicon and better parallelism (registers or cache of CPU/GPU) > - Runtime: expose features of CPU/GPU like SIMD (mostly like a modern > version of javax.vecmath). > - Developer: create custom low-level structures for CPU/GPU parallel > computing. > - Java language: small tuples, like complex numbers (immutable by > performance choice, like SIMD, for being close to silicon; cloned at each > pass by value). > > Use-case "Language": objects to clone for being closer to registers (in > stack, then less allocations in heap; simpler than escape analysis) > - Java language: multiple return values from a method (immutable because > it's a result; cloned, by example, at the return of each delegate or not > even created when stack-only). > > Use-case "Efficiency": others immutable non-null objects possibly concerned > for reducing indirection/improving cache, given by specialization of > collection classes > - Database: primary key for Map (like HashMap)/B-Tree (like MapDB)/SQL (like > JPA). A primary key is immutable and non-null by choice of developer, then > possible gains. > --- > 2) JVM can move but not clone objects > > It's current state of Java objects: > Constraint: developer need to define lifecycle in object, for being > triggered by GC (constructor/finalizer) like current Java class. > Constraint: small object, because when GC move a big object, there is > possibly a noticeable latency. > Constraint: usable directly only in Java code (because native code will need > an indirection level for finding the real address of the object, changing > after each move) > > Improvement by adding custom layout for objects (Project Panama on heap / > ObjectLayout): > Specific constraint: objects which are near identity-less, i.e. only one > other object (the owner) know their identity/have pointer on it. > Non-constraint: applicable to all objects types, contrary to Project > Valhalla. Applicable to complex constructor, because complex constructor can > be inlined in owner code where called. Applicable to mutable objects , > because no cloning then no incoherency. Applicable to nullable objects only > by adding a boolean field in the custom layout for storing potential > existence or non-existence of the inlined object, and updating code testing > nullability for using this boolean. 
> > Use-case "General efficiency": Custom layout (Inline sub-object in the > object owning it): > - Reduce memory use with less objects then less headers and less pointers. > - Improve cache performance with better locality (objects inlined are in > same cache line, then no reference to follow). > - Applicable to many fields containing reference, requiring only the > referenced object to be invisible from all objects except one (the owner). > > By example, a private field containing an internal ArrayList (without > getter/setter) can probably be replaced by the integer containing the used > size and the reference to backing array, with inlining of the few methods of > ArrayList really used. > It need probably to be driven by developer after real profiling for finding > best ratio between efficiency/code expansion. It will probably have much > more use-cases when AOT will be available and developer-manageable precisely > (Jigsaw???), because most slow work of object-code inlining and following > optimizations can be done at AOT time, while gains will be at running time. > Probably useful for the hottest code (JIT after this pre-optimization at AOT > time) and clearly bad for the coldest code (interpreter then avoid code > expansion), but very useful for the big quantity of code between, which will > gain from AOT if complex optimizations are available. This will very > probably require developer help/instructions/annotations using profiler data > obtained on functional tests of application. > --- > 3) JVM can not move or clone objects (Project Panama off heap / > PackedObjects) > Constraint: developer need to manage externally the full lifecycle of object > and need to choose when creating or destroying it. Object is off-heap and an > handle is on-heap for managing off-heap part. > Constraint: potential fragmentation of free memory when frequently creating > and removing objects not having the same size (taking attention to object > size vs. page size is probably important). > > Use-case "GC Latency": big data structure inducing GC latency when moved if > stored in heap > - All big chunks of data, like Big Data or textures in games, etc. > - Few number of objects for being manageable more explicitly by developer > (without too much work). > > Use-case "Native": communicate with native library > - Modern version of JNI > > Only my 2 cents, > Daniel. From vladimir.x.ivanov at oracle.com Thu Jan 29 18:18:13 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 29 Jan 2015 21:18:13 +0300 Subject: [9] RFR (XS): 8071787: Don't block inlining when DONT_INLINE_THRESHOLD=0 In-Reply-To: References: <54C91877.5040707@oracle.com> Message-ID: <54CA7965.7030301@oracle.com> Thanks, John! Best regards, Vladimir Ivanov On 1/29/15 6:10 AM, John Rose wrote: > Good. Consider fixing the typo in 'makeBlockInlningWrapper'. ? John > > On Jan 28, 2015, at 9:12 AM, Vladimir Ivanov wrote: >> >> http://cr.openjdk.java.net/~vlivanov/8071787/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8071787 >> >> For testing & performance measurements, sometimes it's useful to replace block inlining wrappers with trivial reinvokers. >> >> This change extends DONT_INLINE_THRESHOLD in the following manner: >> DONT_INLINE_THRESHOLD = -1: no wrapper >> DONT_INLINE_THRESHOLD = 0: reinvoker >> DONT_INLINE_THRESHOLD > 0: counting wrapper >> >> Before that DONT_INLINE_THRESHOLD=0 meant a counting wrapper which is removed on the first invocation. After the change, it's DONT_INLINE_THRESHOLD=1. 
>> >> Testing: manual, java/lang/invoke >> >> Best regards, >> Vladimir Ivanov > From vladimir.x.ivanov at oracle.com Thu Jan 29 18:18:22 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 29 Jan 2015 21:18:22 +0300 Subject: [9] RFR (XXS): 8071788: CountingWrapper.asType() is broken In-Reply-To: <53D3F321-0259-4878-9767-EA909EF90810@oracle.com> References: <54C91AF1.3010602@oracle.com> <53D3F321-0259-4878-9767-EA909EF90810@oracle.com> Message-ID: <54CA796E.2090500@oracle.com> Thanks, John! Best regards, Vladimir Ivanov On 1/29/15 6:11 AM, John Rose wrote: > Good. > > On Jan 28, 2015, at 9:22 AM, Vladimir Ivanov > > wrote: >> >> The fix is to use adapted MethodHandle to construct LambdaForm. > From christian.thalinger at oracle.com Fri Jan 30 00:41:13 2015 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Thu, 29 Jan 2015 16:41:13 -0800 Subject: Invokedynamic and recursive method call In-Reply-To: References: <54A3019C.1070909@univ-mlv.fr> <9E513159-F926-4845-A11E-6585F8CFD788@oracle.com> <54AD5B3C.80004@univ-mlv.fr> Message-ID: Trying to remember compiler implementation details this sounds reasonable and is a bug (or an enhancement, actually ;-). Can someone file a bug? > On Jan 7, 2015, at 10:07 AM, Charles Oliver Nutter wrote: > > This could explain performance regressions we've seen on the > performance of heavily-recursive algorithms. I'll try to get an > assembly dump for fib in JRuby later today. > > - Charlie > > On Wed, Jan 7, 2015 at 10:13 AM, Remi Forax wrote: >> >> On 01/07/2015 10:43 AM, Marcus Lagergren wrote: >>> >>> Remi, I tried to reproduce your problem with jdk9 b44. It runs decently >>> fast. >> >> >> yes, nashorn is fast enough but it can be faster if the JIT was not doing >> something stupid. >> >> When the VM inline fibo, because fibo is recursive, the recursive call is >> inlined only once, >> so the call at depth=2 can not be inlined but should be a classical direct >> call. >> >> But if fibo is called through an invokedynamic, instead of emitting a direct >> call to fibo, >> the JIT generates a code that push the method handle on stack and execute it >> like if the metod handle was not constant >> (the method handle is constant because the call at depth=1 is inlined !). >> >>> When did it start to regress? >> >> >> jdk7u40, i believe. >> >> I've created a jar containing some handwritten bytecodes with no dependency >> to reproduce the issue easily: >> https://github.com/forax/vmboiler/blob/master/test7/fibo7.jar >> >> [forax at localhost test7]$ time /usr/jdk/jdk1.9.0/bin/java -cp fibo7.jar >> FiboSample >> 1836311903 >> >> real 0m6.653s >> user 0m6.729s >> sys 0m0.019s >> [forax at localhost test7]$ time /usr/jdk/jdk1.8.0_25/bin/java -cp fibo7.jar >> FiboSample >> 1836311903 >> >> real 0m6.572s >> user 0m6.591s >> sys 0m0.019s >> [forax at localhost test7]$ time /usr/jdk/jdk1.7.0_71/bin/java -cp fibo7.jar >> FiboSample >> 1836311903 >> >> real 0m6.373s >> user 0m6.396s >> sys 0m0.016s >> [forax at localhost test7]$ time /usr/jdk/jdk1.7.0_25/bin/java -cp fibo7.jar >> FiboSample >> 1836311903 >> >> real 0m4.847s >> user 0m4.832s >> sys 0m0.019s >> >> as you can see, it was faster with a JDK before jdk7u40. 
>> >>> >>> Regards >>> Marcus >> >> >> cheers, >> R?mi >> >> >>> >>>> On 30 Dec 2014, at 20:48, Remi Forax wrote: >>>> >>>> Hi guys, >>>> I've found a bug in the interaction between the lambda form and inlining >>>> algorithm, >>>> basically if the inlining heuristic bailout because the method is >>>> recursive and already inlined once, >>>> instead to emit a code to do a direct call, it revert to do call to >>>> linkStatic with the method >>>> as MemberName. >>>> >>>> I think it's a regression because before the introduction of lambda >>>> forms, >>>> I'm pretty sure that the JIT was emitting a direct call. >>>> >>>> Step to reproduce with nashorn, run this JavaScript code >>>> function fibo(n) { >>>> return (n < 2)? 1: fibo(n - 1) + fibo(n - 2) >>>> } >>>> >>>> print(fibo(45)) >>>> >>>> like this: >>>> /usr/jdk/jdk1.9.0/bin/jjs -J-XX:+UnlockDiagnosticVMOptions >>>> -J-XX:+PrintAssembly fibo.js > log.txt >>>> >>>> look for a method 'fibo' from the tail of the log, you will find >>>> something like this: >>>> >>>> 0x00007f97e4b4743f: mov $0x76d08f770,%r8 ; {oop(a >>>> 'java/lang/invoke/MemberName' = {method} {0x00007f97dcff8e40} 'fibo' >>>> '(Ljdk/nashorn/internal/runtime/ScriptFunction;Ljava/lang/Object;I)I' in >>>> 'jdk/nashorn/internal/scripts/Script$Recompilation$2$fibo')} >>>> 0x00007f97e4b47449: xchg %ax,%ax >>>> 0x00007f97e4b4744b: callq 0x00007f97dd0446e0 >>>> >>>> I hope this can be fixed. My demonstration that I can have fibo written >>>> with a dynamic language >>>> that run as fast as written in Java doesn't work anymore :( >>>> >>>> cheers, >>>> R?mi >>>> >>>> _______________________________________________ >>>> mlvm-dev mailing list >>>> mlvm-dev at openjdk.java.net >>>> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev >>> >>> _______________________________________________ >>> mlvm-dev mailing list >>> mlvm-dev at openjdk.java.net >>> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev >> >> >> _______________________________________________ >> mlvm-dev mailing list >> mlvm-dev at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev > _______________________________________________ > mlvm-dev mailing list > mlvm-dev at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev From john.r.rose at oracle.com Fri Jan 30 00:48:03 2015 From: john.r.rose at oracle.com (John Rose) Date: Thu, 29 Jan 2015 16:48:03 -0800 Subject: Invokedynamic and recursive method call In-Reply-To: <54AD5B3C.80004@univ-mlv.fr> References: <54A3019C.1070909@univ-mlv.fr> <9E513159-F926-4845-A11E-6585F8CFD788@oracle.com> <54AD5B3C.80004@univ-mlv.fr> Message-ID: <68EA3AFE-2625-4797-A552-ED07576BC46B@oracle.com> On Jan 7, 2015, at 8:13 AM, Remi Forax wrote: > > But if fibo is called through an invokedynamic, instead of emitting a direct call to fibo, > the JIT generates a code that push the method handle on stack and execute it > like if the metod handle was not constant > (the method handle is constant because the call at depth=1 is inlined !). Invocation of non-constant MH's had a performance regression with the LF-based implementation. As of JDK-8069591 they should be no slower and sometimes faster than the old implementation. ? John -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From christian.thalinger at oracle.com Fri Jan 30 01:01:04 2015 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Thu, 29 Jan 2015 17:01:04 -0800 Subject: Invokedynamic and recursive method call In-Reply-To: <68EA3AFE-2625-4797-A552-ED07576BC46B@oracle.com> References: <54A3019C.1070909@univ-mlv.fr> <9E513159-F926-4845-A11E-6585F8CFD788@oracle.com> <54AD5B3C.80004@univ-mlv.fr> <68EA3AFE-2625-4797-A552-ED07576BC46B@oracle.com> Message-ID: <67DAAC92-261C-4769-8299-027F66081AFE@oracle.com> > On Jan 29, 2015, at 4:48 PM, John Rose wrote: > > On Jan 7, 2015, at 8:13 AM, Remi Forax > wrote: >> >> But if fibo is called through an invokedynamic, instead of emitting a direct call to fibo, >> the JIT generates a code that push the method handle on stack and execute it >> like if the metod handle was not constant >> (the method handle is constant because the call at depth=1 is inlined !). > > Invocation of non-constant MH's had a performance regression with the LF-based implementation. > As of JDK-8069591 they should be no slower and sometimes faster than the old implementation. Maybe but what Remi is saying that the MH is constant and we could emit a direct call. > ? John > > _______________________________________________ > mlvm-dev mailing list > mlvm-dev at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev -------------- next part -------------- An HTML attachment was scrubbed... URL: From forax at univ-mlv.fr Fri Jan 30 01:03:03 2015 From: forax at univ-mlv.fr (Remi Forax) Date: Fri, 30 Jan 2015 02:03:03 +0100 Subject: Invokedynamic and recursive method call In-Reply-To: <68EA3AFE-2625-4797-A552-ED07576BC46B@oracle.com> References: <54A3019C.1070909@univ-mlv.fr> <9E513159-F926-4845-A11E-6585F8CFD788@oracle.com> <54AD5B3C.80004@univ-mlv.fr> <68EA3AFE-2625-4797-A552-ED07576BC46B@oracle.com> Message-ID: <54CAD847.1070804@univ-mlv.fr> On 01/30/2015 01:48 AM, John Rose wrote: > On Jan 7, 2015, at 8:13 AM, Remi Forax > wrote: >> >> But if fibo is called through an invokedynamic, instead of emitting a >> direct call to fibo, >> the JIT generates a code that push the method handle on stack and >> execute it >> like if the metod handle was not constant >> (the method handle is constant because the call at depth=1 is inlined !). > > Invocation of non-constant MH's had a performance regression with the > LF-based implementation. > As of JDK-8069591 they should be no slower and sometimes faster than > the old implementation. > ? John > In my case, the method handle is constant (I think it's also the case when you write fibo in javascript). At depth=1, the call is correctly inlined. At depth=2, the call is not inlined because it's a recursive call and by default hotspot only inline recursive call once, this is normal behavior. The bug is that instead of doing a call (using the call assembly instruction), the JIT pushes the method handle on stack and do an invokebasic, which is slower. R?mi -------------- next part -------------- An HTML attachment was scrubbed... 
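A rough Java-only approximation of the call shape Remi describes is sketched below; the class and member names are invented, and this is not the bytecode nashorn actually emits. The recursive call goes through a constant MethodHandle, so an ideal JIT would turn the non-inlined depth-2 call into a plain direct call instead of going through the generic method-handle linkage path.

import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Rough approximation only: a recursive fibo whose self-call goes through a
// constant MethodHandle, similar in shape to the call a dynamic language
// makes once its invokedynamic call site has a constant target.
public class FiboViaHandle {
    static final MethodHandle FIBO;
    static {
        try {
            FIBO = MethodHandles.lookup().findStatic(
                    FiboViaHandle.class, "fibo",
                    MethodType.methodType(int.class, int.class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    static int fibo(int n) throws Throwable {
        // The recursive calls go through the constant handle; with the reported
        // bug the non-inlined copy is dispatched through linkStatic/invokeBasic
        // instead of being compiled to a direct call.
        return n < 2 ? 1 : (int) FIBO.invokeExact(n - 1) + (int) FIBO.invokeExact(n - 2);
    }

    public static void main(String[] args) throws Throwable {
        System.out.println((int) FIBO.invokeExact(45));
    }
}

Running it with the same -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly flags used in the original report (the hsdis plugin is needed) should show whether the recursive, non-inlined call ends up as a direct call.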
URL: From vladimir.x.ivanov at oracle.com Fri Jan 30 15:07:26 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 30 Jan 2015 18:07:26 +0300 Subject: Invokedynamic and recursive method call In-Reply-To: <54CAD847.1070804@univ-mlv.fr> References: <54A3019C.1070909@univ-mlv.fr> <9E513159-F926-4845-A11E-6585F8CFD788@oracle.com> <54AD5B3C.80004@univ-mlv.fr> <68EA3AFE-2625-4797-A552-ED07576BC46B@oracle.com> <54CAD847.1070804@univ-mlv.fr> Message-ID: <54CB9E2E.6040203@oracle.com> Remi, thanks for the report! Filed JDK-8072008 [1]. Best regards, Vladimir Ivanov [1] https://bugs.openjdk.java.net/browse/JDK-8072008 On 1/30/15 4:03 AM, Remi Forax wrote: > > On 01/30/2015 01:48 AM, John Rose wrote: >> On Jan 7, 2015, at 8:13 AM, Remi Forax > > wrote: >>> >>> But if fibo is called through an invokedynamic, instead of emitting a >>> direct call to fibo, >>> the JIT generates a code that push the method handle on stack and >>> execute it >>> like if the metod handle was not constant >>> (the method handle is constant because the call at depth=1 is inlined !). >> >> Invocation of non-constant MH's had a performance regression with the >> LF-based implementation. >> As of JDK-8069591 they should be no slower and sometimes faster than >> the old implementation. >> ? John >> > > In my case, the method handle is constant (I think it's also the case > when you write fibo in javascript). > At depth=1, the call is correctly inlined. > At depth=2, the call is not inlined because it's a recursive call and by > default hotspot only inline recursive call once, > this is normal behavior. The bug is that instead of doing a call (using > the call assembly instruction), > the JIT pushes the method handle on stack and do an invokebasic, which > is slower. > > R?mi > > > > _______________________________________________ > mlvm-dev mailing list > mlvm-dev at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev > From forax at univ-mlv.fr Sat Jan 31 22:54:46 2015 From: forax at univ-mlv.fr (Remi Forax) Date: Sat, 31 Jan 2015 23:54:46 +0100 Subject: Invokedynamic and recursive method call In-Reply-To: <54CB9E2E.6040203@oracle.com> References: <54A3019C.1070909@univ-mlv.fr> <9E513159-F926-4845-A11E-6585F8CFD788@oracle.com> <54AD5B3C.80004@univ-mlv.fr> <68EA3AFE-2625-4797-A552-ED07576BC46B@oracle.com> <54CAD847.1070804@univ-mlv.fr> <54CB9E2E.6040203@oracle.com> Message-ID: <54CD5D36.5040700@univ-mlv.fr> Thank you, Vladimir ! R?mi On 01/30/2015 04:07 PM, Vladimir Ivanov wrote: > Remi, thanks for the report! > > Filed JDK-8072008 [1]. > > Best regards, > Vladimir Ivanov > > [1] https://bugs.openjdk.java.net/browse/JDK-8072008 > > On 1/30/15 4:03 AM, Remi Forax wrote: >> >> On 01/30/2015 01:48 AM, John Rose wrote: >>> On Jan 7, 2015, at 8:13 AM, Remi Forax >> > wrote: >>>> >>>> But if fibo is called through an invokedynamic, instead of emitting a >>>> direct call to fibo, >>>> the JIT generates a code that push the method handle on stack and >>>> execute it >>>> like if the metod handle was not constant >>>> (the method handle is constant because the call at depth=1 is >>>> inlined !). >>> >>> Invocation of non-constant MH's had a performance regression with the >>> LF-based implementation. >>> As of JDK-8069591 they should be no slower and sometimes faster than >>> the old implementation. >>> ? John >>> >> >> In my case, the method handle is constant (I think it's also the case >> when you write fibo in javascript). >> At depth=1, the call is correctly inlined. 
>> At depth=2, the call is not inlined because it's a recursive call and by >> default hotspot only inline recursive call once, >> this is normal behavior. The bug is that instead of doing a call (using >> the call assembly instruction), >> the JIT pushes the method handle on stack and do an invokebasic, which >> is slower. >> >> R?mi >> >> >> >> _______________________________________________ >> mlvm-dev mailing list >> mlvm-dev at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev >> > _______________________________________________ > mlvm-dev mailing list > mlvm-dev at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev