From roland.westrelin at oracle.com  Mon Aug  1 05:00:31 2011
From: roland.westrelin at oracle.com (Roland Westrelin)
Date: Mon, 1 Aug 2011 14:00:31 +0200
Subject: proposed membar simplification in c2
In-Reply-To: <82AE947F-0E5B-475C-84B2-49D9CB6F282A@oracle.com>
References: <op.vyu3v8drrz1tqb@grosvic> <4E25C003.5030805@oracle.com>
	<op.vy0mj5bmrz1tqb@grosvic>
	<82AE947F-0E5B-475C-84B2-49D9CB6F282A@oracle.com>
Message-ID: <5D28412E-42A2-4CCE-B27E-26E2836EC394@oracle.com>


While doing more testing I found that I had to make some changes in src/share/vm/adlc/formssel.cpp as well. Here is an updated webrev:

http://cr.openjdk.java.net/~roland/membar/webrev.03/

Roland.

From vladimir.kozlov at oracle.com  Mon Aug  1 08:21:10 2011
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 01 Aug 2011 08:21:10 -0700
Subject: proposed membar simplification in c2
In-Reply-To: <5D28412E-42A2-4CCE-B27E-26E2836EC394@oracle.com>
References: <op.vyu3v8drrz1tqb@grosvic>
	<4E25C003.5030805@oracle.com>	<op.vy0mj5bmrz1tqb@grosvic>	<82AE947F-0E5B-475C-84B2-49D9CB6F282A@oracle.com>
	<5D28412E-42A2-4CCE-B27E-26E2836EC394@oracle.com>
Message-ID: <4E36C466.9090609@oracle.com>

Good.

Vladimir

On 8/1/11 5:00 AM, Roland Westrelin wrote:
>
> While doing more testing I found that I had to make some changes in src/share/vm/adlc/formssel.cpp as well. Here is an updated webrev:
>
> http://cr.openjdk.java.net/~roland/membar/webrev.03/
>
> Roland.

From christian.thalinger at oracle.com  Tue Aug  2 01:46:50 2011
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Tue, 2 Aug 2011 10:46:50 +0200
Subject: proposed membar simplification in c2
In-Reply-To: <5D28412E-42A2-4CCE-B27E-26E2836EC394@oracle.com>
References: <op.vyu3v8drrz1tqb@grosvic> <4E25C003.5030805@oracle.com>
	<op.vy0mj5bmrz1tqb@grosvic>
	<82AE947F-0E5B-475C-84B2-49D9CB6F282A@oracle.com>
	<5D28412E-42A2-4CCE-B27E-26E2836EC394@oracle.com>
Message-ID: <013EBD4C-417A-41AE-8865-56A14D240AE1@oracle.com>

Looks good.  -- Christian

On Aug 1, 2011, at 2:00 PM, Roland Westrelin wrote:

> 
> While doing more testing I found that I had to make some changes in src/share/vm/adlc/formssel.cpp as well. Here is an updated webrev:
> 
> http://cr.openjdk.java.net/~roland/membar/webrev.03/
> 
> Roland.


From joe.j.kearney at gmail.com  Wed Aug  3 07:17:20 2011
From: joe.j.kearney at gmail.com (Joe Kearney)
Date: Wed, 3 Aug 2011 15:17:20 +0100
Subject: IdealGraphVisualizer file compatibility
Message-ID: <CAARN+eEPit0KA+B_WS5Buxi5YAZVe8k2rhJ6ssNqK1nNEdhqTw@mail.gmail.com>

Hi,

I've been trying to play with igv from
http://ssw.jku.at/General/Staff/TW/igv.html,
http://ssw.jku.at/General/Staff/PH/ but I don't know how to generate
the required log files. What sort of files should I expect the igv to
be able to read? The example files are graphDocument XMLs. I was
hoping to be able to generate a file with something like the
following:

-XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation -XX:LogFile=output.xml

Needless to say, these hotspot_log files are totally different and the
igv barfs with the below.

java.lang.NullPointerException
	at com.sun.hotspot.igv.data.GraphDocument.addGraphDocument(GraphDocument.java:70)
	at com.sun.hotspot.igv.coordinator.actions.ImportAction$3.run(ImportAction.java:128)
	at org.openide.util.RequestProcessor$Task.run(RequestProcessor.java:572)
[catch] at org.openide.util.RequestProcessor$Processor.run(RequestProcessor.java:997)


How do I get the jvm to generate the right output file?

Many thanks,
Joe

From christian.thalinger at oracle.com  Wed Aug  3 07:51:06 2011
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Wed, 3 Aug 2011 16:51:06 +0200
Subject: IdealGraphVisualizer file compatibility
In-Reply-To: <CAARN+eEPit0KA+B_WS5Buxi5YAZVe8k2rhJ6ssNqK1nNEdhqTw@mail.gmail.com>
References: <CAARN+eEPit0KA+B_WS5Buxi5YAZVe8k2rhJ6ssNqK1nNEdhqTw@mail.gmail.com>
Message-ID: <60867EAC-E2A7-4EA1-9A86-1B421C328693@oracle.com>

You want:  -XX:PrintIdealGraphLevel=1 -XX:PrintIdealGraphFile=output.xml

The README of the visualizer also helps:

http://hg.openjdk.java.net/hsx/hotspot-main/hotspot/file/tip/src/share/tools/IdealGraphVisualizer/README

-- Christian

On Aug 3, 2011, at 4:17 PM, Joe Kearney wrote:

> Hi,
> 
> I've been trying to play with igv from
> http://ssw.jku.at/General/Staff/TW/igv.html,
> http://ssw.jku.at/General/Staff/PH/ but I don't know how to generate
> the required log files. What sort of files should I expect the igv to
> be able to read? The example files are graphDocument XMLs. I was
> hoping to be able to generate a file with something like the
> following:
> 
> -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation -XX:LogFile=output.xml
> 
> Needless to say, these hotspot_log files are totally different and the
> igv barfs with the below.
> 
> java.lang.NullPointerException
> 	at com.sun.hotspot.igv.data.GraphDocument.addGraphDocument(GraphDocument.java:70)
> 	at com.sun.hotspot.igv.coordinator.actions.ImportAction$3.run(ImportAction.java:128)
> 	at org.openide.util.RequestProcessor$Task.run(RequestProcessor.java:572)
> [catch] at org.openide.util.RequestProcessor$Processor.run(RequestProcessor.java:997)
> 
> 
> How do I get the jvm to generate the right output file?
> 
> Many thanks,
> Joe


From peter.hofer at jku.at  Wed Aug  3 08:14:43 2011
From: peter.hofer at jku.at (Peter Hofer)
Date: Wed, 3 Aug 2011 17:14:43 +0200
Subject: IdealGraphVisualizer file compatibility
In-Reply-To: <CAARN+eEPit0KA+B_WS5Buxi5YAZVe8k2rhJ6ssNqK1nNEdhqTw@mail.gmail.com>
References: <CAARN+eEPit0KA+B_WS5Buxi5YAZVe8k2rhJ6ssNqK1nNEdhqTw@mail.gmail.com>
Message-ID: <20110803171443.4447acd5@sunflower>

Hi Joe!

> I've been trying to play with igv from
> http://ssw.jku.at/General/Staff/TW/igv.html,
> http://ssw.jku.at/General/Staff/PH/ but I don't know how to generate
> the required log files. What sort of files should I expect the igv to
> be able to read? The example files are graphDocument XMLs.

This is IGV's custom XML format. Its structure is described in Thomas
Wuerthinger's master's thesis:
http://ssw.jku.at/Research/Papers/Wuerthinger07Master/

> I was hoping to be able to generate a file with something like the
> following:
> [...]
> How do I get the jvm to generate the right output file?

You need a debug or fastdebug build of Hotspot. Only the server
compiler can generate IGV output, so you need to specify -server if
your VM uses the client compiler by default.

You can then use -XX:PrintIdealGraphLevel=<level> to enable IGV output
and to control the detail level of the generated output (with 1 being
the minimum).

By default, Hotspot's IGV printer tries to send the output to an IGV
instance listening at localhost:4444. You can instead write it to a
file using -XX:PrintIdealGraphFile=<filename> or use
-XX:PrintIdealGraphAddress=<host> and -XX:PrintIdealGraphPort=<port> for
a different network destination.
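
As a concrete way to try the flags above, here is a minimal sketch of a
test program with one small hot method. The class name HotLoop and the
iteration counts are made up for illustration and are not from this
thread. The flags in the comment are the ones described above, and they
are only recognised by a debug or fastdebug build.

```java
// Hypothetical workload for producing IGV output; run on a debug or
// fastdebug HotSpot build, for example:
//
//   java -server -XX:PrintIdealGraphLevel=1 \
//        -XX:PrintIdealGraphFile=output.xml HotLoop
//
class HotLoop {
    // Small method that is called often enough that the server compiler
    // should normally pick it up for compilation.
    static long sumOfSquares(int n) {
        long sum = 0;
        for (int i = 1; i <= n; i++) {
            sum += (long) i * i;
        }
        return sum;
    }

    public static void main(String[] args) {
        long total = 0;
        // The repetition count is an assumption; it just needs to exceed
        // the compile threshold of the VM in use.
        for (int rep = 0; rep < 20000; rep++) {
            total += sumOfSquares(1000);
        }
        // Print the result so the loop is not trivially dead code.
        System.out.println(total);
    }
}
```

If no file is given, the same run would instead try to connect to an IGV
instance on localhost:4444, as described above.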

Best regards,
 Peter

From joe.j.kearney at gmail.com  Wed Aug  3 09:37:31 2011
From: joe.j.kearney at gmail.com (Joe Kearney)
Date: Wed, 3 Aug 2011 17:37:31 +0100
Subject: IdealGraphVisualizer file compatibility
In-Reply-To: <60867EAC-E2A7-4EA1-9A86-1B421C328693@oracle.com>
References: <CAARN+eEPit0KA+B_WS5Buxi5YAZVe8k2rhJ6ssNqK1nNEdhqTw@mail.gmail.com>
	<60867EAC-E2A7-4EA1-9A86-1B421C328693@oracle.com>
Message-ID: <CAARN+eENB3=_XpW7Hdqbe1hXEJ2oRL-JOSEtFNUPZjmp=joNug@mail.gmail.com>

Ah, thanks for the readme link.

I can't get hotspot 1.6.0_25 or 1.7.0 to recognise the
PrintIdealGraphLevel/PrintIdealGraphFile options. I tried with
UnlockDiagnosticVMOptions etc. as well, to no avail. Is there something
else needed to expose this?

Joe

On 3 August 2011 15:51, Christian Thalinger
<christian.thalinger at oracle.com> wrote:
> You want:  -XX:PrintIdealGraphLevel=1 -XX:PrintIdealGraphFile=output.xml
>
> The README of the visualizer also helps:
>
> http://hg.openjdk.java.net/hsx/hotspot-main/hotspot/file/tip/src/share/tools/IdealGraphVisualizer/README
>
> -- Christian
>
> On Aug 3, 2011, at 4:17 PM, Joe Kearney wrote:
>
>> Hi,
>>
>> I've been trying to play with igv from
>> http://ssw.jku.at/General/Staff/TW/igv.html,
>> http://ssw.jku.at/General/Staff/PH/ but I don't know how to generate
>> the required log files. What sort of files should I expect the igv to
>> be able to read? The example files are graphDocument XMLs. I was
>> hoping to be able to generate a file with something like the
>> following:
>>
>> -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation -XX:LogFile=output.xml
>>
>> Needless to say, these hotspot_log files are totally different and the
>> igv barfs with the below.
>>
>> java.lang.NullPointerException
>>       at com.sun.hotspot.igv.data.GraphDocument.addGraphDocument(GraphDocument.java:70)
>>       at com.sun.hotspot.igv.coordinator.actions.ImportAction$3.run(ImportAction.java:128)
>>       at org.openide.util.RequestProcessor$Task.run(RequestProcessor.java:572)
>> [catch] at org.openide.util.RequestProcessor$Processor.run(RequestProcessor.java:997)
>>
>>
>> How do I get the jvm to generate the right output file?
>>
>> Many thanks,
>> Joe
>
>

From christian.thalinger at oracle.com  Wed Aug  3 09:40:45 2011
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Wed, 3 Aug 2011 18:40:45 +0200
Subject: IdealGraphVisualizer file compatibility
In-Reply-To: <CAARN+eENB3=_XpW7Hdqbe1hXEJ2oRL-JOSEtFNUPZjmp=joNug@mail.gmail.com>
References: <CAARN+eEPit0KA+B_WS5Buxi5YAZVe8k2rhJ6ssNqK1nNEdhqTw@mail.gmail.com>
	<60867EAC-E2A7-4EA1-9A86-1B421C328693@oracle.com>
	<CAARN+eENB3=_XpW7Hdqbe1hXEJ2oRL-JOSEtFNUPZjmp=joNug@mail.gmail.com>
Message-ID: <A01FAB06-00A7-4300-A948-6730EB564280@oracle.com>

You need a debug build.  -- Christian

On Aug 3, 2011, at 6:37 PM, Joe Kearney wrote:

> Ah, thanks for the readme link.
> 
> I can't get hotspot 1.6.0_25 or 1.7.0 to recognise the
> PrintIdealGraphLevel/PrintIdealGraphFile options. I tried with
> UnlockDiagnosticVMOptions etc as well. to no avail. Is there something
> else needed to expose this?
> 
> Joe
> 
> On 3 August 2011 15:51, Christian Thalinger
> <christian.thalinger at oracle.com> wrote:
>> You want:  -XX:PrintIdealGraphLevel=1 -XX:PrintIdealGraphFile=output.xml
>> 
>> The README of the visualizer also helps:
>> 
>> http://hg.openjdk.java.net/hsx/hotspot-main/hotspot/file/tip/src/share/tools/IdealGraphVisualizer/README
>> 
>> -- Christian
>> 
>> On Aug 3, 2011, at 4:17 PM, Joe Kearney wrote:
>> 
>>> Hi,
>>> 
>>> I've been trying to play with igv from
>>> http://ssw.jku.at/General/Staff/TW/igv.html,
>>> http://ssw.jku.at/General/Staff/PH/ but I don't know how to generate
>>> the required log files. What sort of files should I expect the igv to
>>> be able to read? The example files are graphDocument XMLs. I was
>>> hoping to be able to generate a file with something like the
>>> following:
>>> 
>>> -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation -XX:LogFile=output.xml
>>> 
>>> Needless to say, these hotspot_log files are totally different and the
>>> igv barfs with the below.
>>> 
>>> java.lang.NullPointerException
>>>       at com.sun.hotspot.igv.data.GraphDocument.addGraphDocument(GraphDocument.java:70)
>>>       at com.sun.hotspot.igv.coordinator.actions.ImportAction$3.run(ImportAction.java:128)
>>>       at org.openide.util.RequestProcessor$Task.run(RequestProcessor.java:572)
>>> [catch] at org.openide.util.RequestProcessor$Processor.run(RequestProcessor.java:997)
>>> 
>>> 
>>> How do I get the jvm to generate the right output file?
>>> 
>>> Many thanks,
>>> Joe
>> 
>> 


From tom.rodriguez at oracle.com  Wed Aug  3 09:42:05 2011
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Wed, 3 Aug 2011 09:42:05 -0700
Subject: IdealGraphVisualizer file compatibility
In-Reply-To: <CAARN+eENB3=_XpW7Hdqbe1hXEJ2oRL-JOSEtFNUPZjmp=joNug@mail.gmail.com>
References: <CAARN+eEPit0KA+B_WS5Buxi5YAZVe8k2rhJ6ssNqK1nNEdhqTw@mail.gmail.com>
	<60867EAC-E2A7-4EA1-9A86-1B421C328693@oracle.com>
	<CAARN+eENB3=_XpW7Hdqbe1hXEJ2oRL-JOSEtFNUPZjmp=joNug@mail.gmail.com>
Message-ID: <ABDD905E-085D-45AC-B552-BD1CBFF93DAC@oracle.com>

It's not available in the product as it's really intended for developers.  Use a fastdebug build.

tom

On Aug 3, 2011, at 9:37 AM, Joe Kearney wrote:

> Ah, thanks for the readme link.
> 
> I can't get hotspot 1.6.0_25 or 1.7.0 to recognise the
> PrintIdealGraphLevel/PrintIdealGraphFile options. I tried with
> UnlockDiagnosticVMOptions etc as well. to no avail. Is there something
> else needed to expose this?
> 
> Joe
> 
> On 3 August 2011 15:51, Christian Thalinger
> <christian.thalinger at oracle.com> wrote:
>> You want:  -XX:PrintIdealGraphLevel=1 -XX:PrintIdealGraphFile=output.xml
>> 
>> The README of the visualizer also helps:
>> 
>> http://hg.openjdk.java.net/hsx/hotspot-main/hotspot/file/tip/src/share/tools/IdealGraphVisualizer/README
>> 
>> -- Christian
>> 
>> On Aug 3, 2011, at 4:17 PM, Joe Kearney wrote:
>> 
>>> Hi,
>>> 
>>> I've been trying to play with igv from
>>> http://ssw.jku.at/General/Staff/TW/igv.html,
>>> http://ssw.jku.at/General/Staff/PH/ but I don't know how to generate
>>> the required log files. What sort of files should I expect the igv to
>>> be able to read? The example files are graphDocument XMLs. I was
>>> hoping to be able to generate a file with something like the
>>> following:
>>> 
>>> -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation -XX:LogFile=output.xml
>>> 
>>> Needless to say, these hotspot_log files are totally different and the
>>> igv barfs with the below.
>>> 
>>> java.lang.NullPointerException
>>>       at com.sun.hotspot.igv.data.GraphDocument.addGraphDocument(GraphDocument.java:70)
>>>       at com.sun.hotspot.igv.coordinator.actions.ImportAction$3.run(ImportAction.java:128)
>>>       at org.openide.util.RequestProcessor$Task.run(RequestProcessor.java:572)
>>> [catch] at org.openide.util.RequestProcessor$Processor.run(RequestProcessor.java:997)
>>> 
>>> 
>>> How do I get the jvm to generate the right output file?
>>> 
>>> Many thanks,
>>> Joe
>> 
>> 


From joe.j.kearney at gmail.com  Wed Aug  3 10:00:26 2011
From: joe.j.kearney at gmail.com (Joe Kearney)
Date: Wed, 3 Aug 2011 18:00:26 +0100
Subject: IdealGraphVisualizer file compatibility
In-Reply-To: <ABDD905E-085D-45AC-B552-BD1CBFF93DAC@oracle.com>
References: <CAARN+eEPit0KA+B_WS5Buxi5YAZVe8k2rhJ6ssNqK1nNEdhqTw@mail.gmail.com>
	<60867EAC-E2A7-4EA1-9A86-1B421C328693@oracle.com>
	<CAARN+eENB3=_XpW7Hdqbe1hXEJ2oRL-JOSEtFNUPZjmp=joNug@mail.gmail.com>
	<ABDD905E-085D-45AC-B552-BD1CBFF93DAC@oracle.com>
Message-ID: <CAARN+eGVB=4qqu1KNJRx3j-cV8E=ucrNruU7qKeJU0Mwx=KhOA@mail.gmail.com>

Oh ok, I didn't realise. Thanks. Are there any plans to make it more
widely available? I can see it being useful for experimenting to
squeeze performance.

Thanks,
Joe

On 3 August 2011 17:42, Tom Rodriguez <tom.rodriguez at oracle.com> wrote:
> It's not available in the product as it's really intended for developers.  Use a fastdebug build.
>
> tom
>
> On Aug 3, 2011, at 9:37 AM, Joe Kearney wrote:
>
>> Ah, thanks for the readme link.
>>
>> I can't get hotspot 1.6.0_25 or 1.7.0 to recognise the
>> PrintIdealGraphLevel/PrintIdealGraphFile options. I tried with
>> UnlockDiagnosticVMOptions etc as well. to no avail. Is there something
>> else needed to expose this?
>>
>> Joe
>>
>> On 3 August 2011 15:51, Christian Thalinger
>> <christian.thalinger at oracle.com> wrote:
>>> You want:  -XX:PrintIdealGraphLevel=1 -XX:PrintIdealGraphFile=output.xml
>>>
>>> The README of the visualizer also helps:
>>>
>>> http://hg.openjdk.java.net/hsx/hotspot-main/hotspot/file/tip/src/share/tools/IdealGraphVisualizer/README
>>>
>>> -- Christian
>>>
>>> On Aug 3, 2011, at 4:17 PM, Joe Kearney wrote:
>>>
>>>> Hi,
>>>>
>>>> I've been trying to play with igv from
>>>> http://ssw.jku.at/General/Staff/TW/igv.html,
>>>> http://ssw.jku.at/General/Staff/PH/ but I don't know how to generate
>>>> the required log files. What sort of files should I expect the igv to
>>>> be able to read? The example files are graphDocument XMLs. I was
>>>> hoping to be able to generate a file with something like the
>>>> following:
>>>>
>>>> -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation -XX:LogFile=output.xml
>>>>
>>>> Needless to say, these hotspot_log files are totally different and the
>>>> igv barfs with the below.
>>>>
>>>> java.lang.NullPointerException
>>>>       at com.sun.hotspot.igv.data.GraphDocument.addGraphDocument(GraphDocument.java:70)
>>>>       at com.sun.hotspot.igv.coordinator.actions.ImportAction$3.run(ImportAction.java:128)
>>>>       at org.openide.util.RequestProcessor$Task.run(RequestProcessor.java:572)
>>>> [catch] at org.openide.util.RequestProcessor$Processor.run(RequestProcessor.java:997)
>>>>
>>>>
>>>> How do I get the jvm to generate the right output file?
>>>>
>>>> Many thanks,
>>>> Joe
>>>
>>>
>
>

From igor.veresov at oracle.com  Wed Aug  3 13:40:17 2011
From: igor.veresov at oracle.com (Igor Veresov)
Date: Wed, 03 Aug 2011 13:40:17 -0700
Subject: review(XXS): 7060842: UseNUMA crash with UseHugreTLBFS running
	SPECjvm2008
Message-ID: <4E39B231.5070606@oracle.com>

It seems that madvise(MADV_FREE) breaks the page reservation semantics of
the underlying segment. With tight memory constraints this would
cause a race for pages and a segfault if the JVM loses. The solution is
to revert to the previous implementation of os::free_memory() that
used mmap().

Webrev: http://cr.openjdk.java.net/~iveresov/7060842/webrev.00/

Tested with the gc test suite.

igor

From vladimir.kozlov at oracle.com  Thu Aug  4 18:19:42 2011
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 04 Aug 2011 18:19:42 -0700
Subject: Request for reviews (L):  7063629: use cbcond in C2 generated code
	on T4
Message-ID: <4E3B452E.10509@oracle.com>

http://cr.openjdk.java.net/~kvn/7063629/webrev

7063629: use cbcond in C2 generated code on T4

The code is finally shaped as I want and it passed CTW, regression, nsk tests on 
T4 and x86.

Added new fused compare and branch instructions into sparc.ad and corresponding
short versions which use the cbcond instruction. Added a new flag
avoid_back_to_back to avoid generating cbcond instructions back to back.

Split shorten_branches() into 2 methods. The first method conservatively
estimates code size and branch locations and does a few rounds of branch
shortening. It is executed before ScheduleAndBundle(). Step 3 is moved to a new
method, shorten_branches_final(), called after ScheduleAndBundle(). It does the
final padding, alignment and final branch replacement. Method fill_buffer()
does verification instead of padding.
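
To illustrate the general idea of starting from a conservative size
estimate and shortening branches over a few rounds, here is a toy sketch
in Java. It is not the C2 code: the Instr class, the instruction sizes,
and the way the distance is measured (from the start of the branch to the
start of its target) are all assumptions made for the example.

```java
class BranchShorteningSketch {
    // Hypothetical toy instruction; target < 0 means "not a branch".
    static final class Instr {
        final int target;     // index of the branch target, or -1
        final int longSize;   // size in bytes of the long form
        final int shortSize;  // size in bytes of the short form
        boolean isShort;      // starts out false (conservative)

        Instr(int target, int longSize, int shortSize) {
            this.target = target;
            this.longSize = longSize;
            this.shortSize = shortSize;
        }

        int size() {
            return isShort ? shortSize : longSize;
        }
    }

    // Byte offset of every instruction, plus the total size at the end.
    static int[] offsets(Instr[] code) {
        int[] off = new int[code.length + 1];
        for (int i = 0; i < code.length; i++) {
            off[i + 1] = off[i] + code[i].size();
        }
        return off;
    }

    // Repeats rounds until no further branch can be shortened and
    // returns the number of rounds. Shortening only shrinks the code,
    // so a branch that fits in one round still fits in later rounds.
    static int shorten(Instr[] code, int maxShortDistance) {
        int rounds = 0;
        boolean changed = true;
        while (changed) {
            changed = false;
            rounds++;
            int[] off = offsets(code);
            for (int i = 0; i < code.length; i++) {
                Instr in = code[i];
                if (in.target < 0 || in.isShort) {
                    continue;
                }
                int distance = Math.abs(off[in.target] - off[i]);
                if (distance <= maxShortDistance) {
                    in.isShort = true;
                    changed = true;
                }
            }
        }
        return rounds;
    }

    public static void main(String[] args) {
        Instr[] code = {
            new Instr(3, 8, 2),   // forward branch to instruction 3
            new Instr(-1, 4, 4),  // plain instruction
            new Instr(0, 8, 2),   // backward branch to instruction 0
            new Instr(-1, 4, 4)   // plain instruction
        };
        int rounds = shorten(code, 14);
        System.out.println("rounds: " + rounds
                + ", final size: " + offsets(code)[code.length]);
    }
}
```

In this toy input the forward branch only fits after the backward branch
has been shortened, which is why more than one round is needed.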

Labels are now bound only during code generation in fill_buffer(). As a result
they are not available when forward branches are emitted. To fix that,
MacroAssembler branch instructions are now used in x86 .ad files. I replaced
the unused rtype parameter with a maybe_short flag to force using only long
branches in .ad long branch instructions.

Added a check to adlc to verify that the short version of a branch instruction
has the same declaration in the .ad file.

Added an assert to verify that the size of an emitted instruction matches the
value returned by MachNode::size(). Found that MachBreakpointNode::size()
returned an incorrect value on x64.

Fixed loop alignment for Sparc (min alignment should be instruction size which 
is 4 bytes instead of 1 byte).

The prototype was done by Tom and I took some of his additional fixes. The block
changes go with some code in output to put opto assembly style block comments in
the PrintNMethods output. There's also a snippet in there that deals with the
fact that kill projections on branches make it appear the kill occurs after the
branch instead of being part of it.

From christian.thalinger at oracle.com  Fri Aug  5 06:26:26 2011
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Fri, 5 Aug 2011 15:26:26 +0200
Subject: review for 6990212: JSR 292 JVMTI MethodEnter hook is not called
	for JSR 292 bootstrap and target methods
In-Reply-To: <8D08FBBE-B796-45C1-A8DC-626531ABD5C2@oracle.com>
References: <6FC4D868-6EC6-4DE5-92C4-A55B42AF3CFE@oracle.com>
	<25A1B825-BC91-4F1E-B7B1-C8E507F8EA34@oracle.com>
	<839E75F4-67A3-4C3B-AD06-9985EB762357@oracle.com>
	<9E4737BC-6971-42B4-B9B4-C5BC9A2FCA1C@oracle.com>
	<8D08FBBE-B796-45C1-A8DC-626531ABD5C2@oracle.com>
Message-ID: <8A9AEEB6-BB68-4D9A-A762-97C0561FC2B8@oracle.com>

I really had this feeling that this change is going to break something.  Two JDK tests are failing on x86 and SPARC:

FAILED: java/lang/invoke/JavaDocExamplesTest.java
FAILED: java/lang/invoke/MethodHandlesTest.java

It's the raise_exception path:

JUnit version 4.4
.......................................E.E.
Time: 1.767
There were 2 failures:
1) testInterfaceCast(test.java.lang.invoke.MethodHandlesTest)
java.lang.InternalError: unexpected code -38348624: required class java.lang.Number but encountered class java.lang.String
        at java.lang.invoke.MethodHandleNatives.raiseException(MethodHandleNatives.java:375)
        at java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:566)
        at test.java.lang.invoke.MethodHandlesTest.testInterfaceCast(MethodHandlesTest.java:2231)
        at test.java.lang.invoke.MethodHandlesTest.testInterfaceCast(MethodHandlesTest.java:2208)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at org.junit.internal.runners.TestMethod.invoke(TestMethod.java:59)
        at org.junit.internal.runners.MethodRoadie.runTestMethod(MethodRoadie.java:98)
        at org.junit.internal.runners.MethodRoadie$2.run(MethodRoadie.java:79)
        at org.junit.internal.runners.MethodRoadie.runBeforesThenTestThenAfters(MethodRoadie.java:87)
        at org.junit.internal.runners.MethodRoadie.runTest(MethodRoadie.java:77)
        at org.junit.internal.runners.MethodRoadie.run(MethodRoadie.java:42)
        at org.junit.internal.runners.JUnit4ClassRunner.invokeTestMethod(JUnit4ClassRunner.java:88)
        at org.junit.internal.runners.JUnit4ClassRunner.runMethods(JUnit4ClassRunner.java:51)
        at org.junit.internal.runners.JUnit4ClassRunner$1.run(JUnit4ClassRunner.java:44)
        at org.junit.internal.runners.ClassRoadie.runUnprotected(ClassRoadie.java:27)
        at org.junit.internal.runners.ClassRoadie.runProtected(ClassRoadie.java:37)
        at org.junit.internal.runners.JUnit4ClassRunner.run(JUnit4ClassRunner.java:42)
        at org.junit.internal.runners.CompositeRunner.runChildren(CompositeRunner.java:33)
        at org.junit.internal.runners.CompositeRunner.run(CompositeRunner.java:28)
        at org.junit.runner.JUnitCore.run(JUnitCore.java:130)
        at org.junit.runner.JUnitCore.run(JUnitCore.java:109)
        at org.junit.runner.JUnitCore.run(JUnitCore.java:100)
        at org.junit.runner.JUnitCore.runMain(JUnitCore.java:81)
        at org.junit.runner.JUnitCore.main(JUnitCore.java:44)
2) testCastFailure(test.java.lang.invoke.MethodHandlesTest)
java.lang.InternalError: unexpected code -38348480: required class java.lang.Integer but encountered class java.lang.String
        at java.lang.invoke.MethodHandleNatives.raiseException(MethodHandleNatives.java:375)
        at test.java.lang.invoke.MethodHandlesTest.testCastFailure(MethodHandlesTest.java:2340)
        at test.java.lang.invoke.MethodHandlesTest.testCastFailure(MethodHandlesTest.java:2251)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at org.junit.internal.runners.TestMethod.invoke(TestMethod.java:59)
        at org.junit.internal.runners.MethodRoadie.runTestMethod(MethodRoadie.java:98)
        at org.junit.internal.runners.MethodRoadie$2.run(MethodRoadie.java:79)
        at org.junit.internal.runners.MethodRoadie.runBeforesThenTestThenAfters(MethodRoadie.java:87)
        at org.junit.internal.runners.MethodRoadie.runTest(MethodRoadie.java:77)
        at org.junit.internal.runners.MethodRoadie.run(MethodRoadie.java:42)
        at org.junit.internal.runners.JUnit4ClassRunner.invokeTestMethod(JUnit4ClassRunner.java:88)
        at org.junit.internal.runners.JUnit4ClassRunner.runMethods(JUnit4ClassRunner.java:51)
        at org.junit.internal.runners.JUnit4ClassRunner$1.run(JUnit4ClassRunner.java:44)
        at org.junit.internal.runners.ClassRoadie.runUnprotected(ClassRoadie.java:27)
        at org.junit.internal.runners.ClassRoadie.runProtected(ClassRoadie.java:37)
        at org.junit.internal.runners.JUnit4ClassRunner.run(JUnit4ClassRunner.java:42)
        at org.junit.internal.runners.CompositeRunner.runChildren(CompositeRunner.java:33)
        at org.junit.internal.runners.CompositeRunner.run(CompositeRunner.java:28)
        at org.junit.runner.JUnitCore.run(JUnitCore.java:130)
        at org.junit.runner.JUnitCore.run(JUnitCore.java:109)
        at org.junit.runner.JUnitCore.run(JUnitCore.java:100)
        at org.junit.runner.JUnitCore.runMain(JUnitCore.java:81)
        at org.junit.runner.JUnitCore.main(JUnitCore.java:44)

FAILURES!!!
Tests run: 41,  Failures: 2

-- Christian

On Jul 14, 2011, at 9:49 PM, Tom Rodriguez wrote:

> 
> On Jul 12, 2011, at 9:38 AM, Christian Thalinger wrote:
> 
>> On Jul 11, 2011, at 5:43 PM, Christian Thalinger wrote:
>>> On Jul 9, 2011, at 12:21 AM, Tom Rodriguez wrote:
>>>> Coleen pointed out that it's confusing to reuse the name jump_from_interpreted since we're not really in the interpreter.  I've changed it to jump_from_method_handle and left a note that it parallels jump_from_interpreted.
>>> 
>>> This looks good.  Although I'm a little worried about the raise_exception changes on SPARC.  In the past I had various crashes with versions that used the interpreter stack to pass the arguments.  That's why I changed it to the simpler, more reliable current version (which uses the compiler calling convention).  Maybe I got adjust_SP_and_Gargs_down_by_slots right and there is no problem now.
>>> 
>>> Just to be sure I'm currently running JRuby's benchmarks (my memory tells me that I had the crashes with these benchmarks) on two different SPARC boxes.  I'll let you know when they are finished.
>> 
>> Sorry, it took a little longer to run them because one of the benchmarks (bench_full_load_path.rb) does not finish (it hangs around doing nothing).  Anyway, all others look good.
> 
> Thanks.  I fixed the interp_only check to look more like the original code and reran the mlvm tests and they all look good.
> 
> tom
> 
>> 
>> -- Christian
> 


From christian.thalinger at oracle.com  Fri Aug  5 06:32:14 2011
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Fri, 5 Aug 2011 15:32:14 +0200
Subject: Request for review (L): 7071653: JSR 292: call site change
	notification should be pushed not pulled
Message-ID: <34EE7AEC-FD11-4526-B49D-DCEA296E767A@oracle.com>

http://cr.openjdk.java.net/~twisti/7071653

7071653: JSR 292: call site change notification should be pushed not pulled
Reviewed-by:

Currently every speculatively inlined method handle call site has a
guard that compares the current target of the CallSite object to the
inlined one.  This per-invocation overhead can be removed if the
notification is changed from pulled to pushed (i.e. deoptimization).
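
For readers less familiar with the API involved, the small sketch below
uses java.lang.invoke.MutableCallSite to show what "the current target of
the CallSite" means at the Java level. It only demonstrates the observable
behaviour (calls through the site follow the latest target). Whether the
VM implements that with a per-invocation guard or with deoptimization is
not visible from Java code. The class and helper names are hypothetical.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MutableCallSite;

class CallSiteTargetSketch {
    // Invokes a ()int method handle and wraps any checked Throwable.
    static int call(MethodHandle handle) {
        try {
            return (int) handle.invokeExact();
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    // Returns the values seen before and after the target is changed.
    static int[] demo() {
        MethodHandle one = MethodHandles.constant(int.class, 1);
        MethodHandle two = MethodHandles.constant(int.class, 2);

        MutableCallSite site = new MutableCallSite(one);
        // The dynamic invoker always dispatches to the site's current target.
        MethodHandle invoker = site.dynamicInvoker();

        int before = call(invoker);
        site.setTarget(two);   // the "call site change" discussed above
        int after = call(invoker);
        return new int[] { before, after };
    }

    public static void main(String[] args) {
        int[] seen = demo();
        System.out.println(seen[0] + " then " + seen[1]);
    }
}
```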

I had to change the logic in TemplateTable::patch_bytecode to skip
bytecode quickening for putfield instructions when the put_code
written to the constant pool cache is zero.  This is required so that
every execution of a putfield to CallSite.target calls out to
InterpreterRuntime::resolve_get_put to do the deoptimization of
dependent compiled methods.

I also had to change the dependency machinery to understand
dependencies other than class hierarchy ones.  DepChange became the
super-type of two new classes, KlassDepChange and CallSiteDepChange.

Tested with JRuby tests and benchmarks, hand-written testcases, JDK
tests and vm.mlvm tests.

Here is the speedup for the JRuby fib benchmark (first is JDK 7 b147,
second with 7071653).  Since the CallSite targets don't change during
the runtime of this benchmark we can see the performance benefit of
eliminating the guard:

$ jruby --server bench/bench_fib_recursive.rb 5 35
  0.883000   0.000000   0.883000 (  0.854000)
  0.715000   0.000000   0.715000 (  0.715000)
  0.712000   0.000000   0.712000 (  0.712000)
  0.713000   0.000000   0.713000 (  0.713000)
  0.713000   0.000000   0.713000 (  0.712000)

$ jruby --server bench/bench_fib_recursive.rb 5 35
  0.772000   0.000000   0.772000 (  0.742000)
  0.624000   0.000000   0.624000 (  0.624000)
  0.621000   0.000000   0.621000 (  0.621000)
  0.622000   0.000000   0.622000 (  0.622000)
  0.622000   0.000000   0.622000 (  0.621000)
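
To put a number on the difference, the sketch below averages the "total"
column (third column) of the two runs above, skipping the first iteration
of each run as warm-up. Treating the first iteration as warm-up and using
the total column are choices made for this example, not something stated
in the measurements themselves. The class name is hypothetical.

```java
class FibSpeedupSketch {
    static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    // Fraction by which the mean time drops from 'before' to 'after'.
    static double reduction(double[] before, double[] after) {
        return 1.0 - mean(after) / mean(before);
    }

    // Iterations 2 to 5 of each run, in seconds, copied from the output above.
    static final double[] BEFORE = { 0.715, 0.712, 0.713, 0.713 };
    static final double[] AFTER  = { 0.624, 0.621, 0.622, 0.622 };

    public static void main(String[] args) {
        System.out.println("mean before: " + mean(BEFORE));
        System.out.println("mean after:  " + mean(AFTER));
        System.out.println("reduction:   " + reduction(BEFORE, AFTER));
    }
}
```

With these inputs the reduction comes out at a little under 13% of the
original time per iteration.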


From tom.rodriguez at oracle.com  Fri Aug  5 09:48:28 2011
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Fri, 5 Aug 2011 09:48:28 -0700
Subject: review for 6990212: JSR 292 JVMTI MethodEnter hook is not called
	for JSR 292 bootstrap and target methods
In-Reply-To: <8A9AEEB6-BB68-4D9A-A762-97C0561FC2B8@oracle.com>
References: <6FC4D868-6EC6-4DE5-92C4-A55B42AF3CFE@oracle.com>
	<25A1B825-BC91-4F1E-B7B1-C8E507F8EA34@oracle.com>
	<839E75F4-67A3-4C3B-AD06-9985EB762357@oracle.com>
	<9E4737BC-6971-42B4-B9B4-C5BC9A2FCA1C@oracle.com>
	<8D08FBBE-B796-45C1-A8DC-626531ABD5C2@oracle.com>
	<8A9AEEB6-BB68-4D9A-A762-97C0561FC2B8@oracle.com>
Message-ID: <2115F2F0-1A22-46CB-9ACE-DB1B404A4853@oracle.com>

Yeah, Vladimir reported something similar to me last night.  I'm looking at it.

tom

On Aug 5, 2011, at 6:26 AM, Christian Thalinger wrote:

> I really had this feeling that this change is going to break something.  Two JDK tests are failing on x86 and SPARC:
> 
> FAILED: java/lang/invoke/JavaDocExamplesTest.java
> FAILED: java/lang/invoke/MethodHandlesTest.java
> 
> It's the raise_exception path:
> 
> JUnit version 4.4
> .......................................E.E.
> Time: 1.767
> There were 2 failures:
> 1) testInterfaceCast(test.java.lang.invoke.MethodHandlesTest)
> java.lang.InternalError: unexpected code -38348624: required class java.lang.Number but encountered class java.lang.String
>        at java.lang.invoke.MethodHandleNatives.raiseException(MethodHandleNatives.java:375)
>        at java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:566)
>        at test.java.lang.invoke.MethodHandlesTest.testInterfaceCast(MethodHandlesTest.java:2231)
>        at test.java.lang.invoke.MethodHandlesTest.testInterfaceCast(MethodHandlesTest.java:2208)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>        at java.lang.reflect.Method.invoke(Method.java:601)
>        at org.junit.internal.runners.TestMethod.invoke(TestMethod.java:59)
>        at org.junit.internal.runners.MethodRoadie.runTestMethod(MethodRoadie.java:98)
>        at org.junit.internal.runners.MethodRoadie$2.run(MethodRoadie.java:79)
>        at org.junit.internal.runners.MethodRoadie.runBeforesThenTestThenAfters(MethodRoadie.java:87)
>        at org.junit.internal.runners.MethodRoadie.runTest(MethodRoadie.java:77)
>        at org.junit.internal.runners.MethodRoadie.run(MethodRoadie.java:42)
>        at org.junit.internal.runners.JUnit4ClassRunner.invokeTestMethod(JUnit4ClassRunner.java:88)
>        at org.junit.internal.runners.JUnit4ClassRunner.runMethods(JUnit4ClassRunner.java:51)
>        at org.junit.internal.runners.JUnit4ClassRunner$1.run(JUnit4ClassRunner.java:44)
>        at org.junit.internal.runners.ClassRoadie.runUnprotected(ClassRoadie.java:27)
>        at org.junit.internal.runners.ClassRoadie.runProtected(ClassRoadie.java:37)
>        at org.junit.internal.runners.JUnit4ClassRunner.run(JUnit4ClassRunner.java:42)
>        at org.junit.internal.runners.CompositeRunner.runChildren(CompositeRunner.java:33)
>        at org.junit.internal.runners.CompositeRunner.run(CompositeRunner.java:28)
>        at org.junit.runner.JUnitCore.run(JUnitCore.java:130)
>        at org.junit.runner.JUnitCore.run(JUnitCore.java:109)
>        at org.junit.runner.JUnitCore.run(JUnitCore.java:100)
>        at org.junit.runner.JUnitCore.runMain(JUnitCore.java:81)
>        at org.junit.runner.JUnitCore.main(JUnitCore.java:44)
> 2) testCastFailure(test.java.lang.invoke.MethodHandlesTest)
> java.lang.InternalError: unexpected code -38348480: required class java.lang.Integer but encountered class java.lang.String
>        at java.lang.invoke.MethodHandleNatives.raiseException(MethodHandleNatives.java:375)
>        at test.java.lang.invoke.MethodHandlesTest.testCastFailure(MethodHandlesTest.java:2340)
>        at test.java.lang.invoke.MethodHandlesTest.testCastFailure(MethodHandlesTest.java:2251)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>        at java.lang.reflect.Method.invoke(Method.java:601)
>        at org.junit.internal.runners.TestMethod.invoke(TestMethod.java:59)
>        at org.junit.internal.runners.MethodRoadie.runTestMethod(MethodRoadie.java:98)
>        at org.junit.internal.runners.MethodRoadie$2.run(MethodRoadie.java:79)
>        at org.junit.internal.runners.MethodRoadie.runBeforesThenTestThenAfters(MethodRoadie.java:87)
>        at org.junit.internal.runners.MethodRoadie.runTest(MethodRoadie.java:77)
>        at org.junit.internal.runners.MethodRoadie.run(MethodRoadie.java:42)
>        at org.junit.internal.runners.JUnit4ClassRunner.invokeTestMethod(JUnit4ClassRunner.java:88)
>        at org.junit.internal.runners.JUnit4ClassRunner.runMethods(JUnit4ClassRunner.java:51)
>        at org.junit.internal.runners.JUnit4ClassRunner$1.run(JUnit4ClassRunner.java:44)
>        at org.junit.internal.runners.ClassRoadie.runUnprotected(ClassRoadie.java:27)
>        at org.junit.internal.runners.ClassRoadie.runProtected(ClassRoadie.java:37)
>        at org.junit.internal.runners.JUnit4ClassRunner.run(JUnit4ClassRunner.java:42)
>        at org.junit.internal.runners.CompositeRunner.runChildren(CompositeRunner.java:33)
>        at org.junit.internal.runners.CompositeRunner.run(CompositeRunner.java:28)
>        at org.junit.runner.JUnitCore.run(JUnitCore.java:130)
>        at org.junit.runner.JUnitCore.run(JUnitCore.java:109)
>        at org.junit.runner.JUnitCore.run(JUnitCore.java:100)
>        at org.junit.runner.JUnitCore.runMain(JUnitCore.java:81)
>        at org.junit.runner.JUnitCore.main(JUnitCore.java:44)
> 
> FAILURES!!!
> Tests run: 41,  Failures: 2
> 
> -- Christian
> 
> On Jul 14, 2011, at 9:49 PM, Tom Rodriguez wrote:
> 
>> 
>> On Jul 12, 2011, at 9:38 AM, Christian Thalinger wrote:
>> 
>>> On Jul 11, 2011, at 5:43 PM, Christian Thalinger wrote:
>>>> On Jul 9, 2011, at 12:21 AM, Tom Rodriguez wrote:
>>>>> Coleen pointed out that it's confusing to reuse the name jump_from_interpreted since we're not really in the interpreter.  I've changed it to jump_from_method_handle and left a note that it parallels jump_from_interpreted.
>>>> 
>>>> This looks good.  Although I'm a little worried about the raise_exception changes on SPARC.  In the past I had various crashes with versions that used the interpreter stack to pass the arguments.  That's why I changed it to the simpler, more reliable current version (which uses the compiler calling convention).  Maybe I got adjust_SP_and_Gargs_down_by_slots right and there is no problem now.
>>>> 
>>>> Just to be sure I'm currently running JRuby's benchmarks (my memory tells me that I had the crashes with these benchmarks) on two different SPARC boxes.  I'll let you know when they are finished.
>>> 
>>> Sorry, it took a little longer to run them because one of the benchmarks (bench_full_load_path.rb) does not finish (it hangs around doing nothing).  Anyway, all others look good.
>> 
>> Thanks.  I fixed the interp_only check to look more like the original code and reran the mlvm tests and they all look good.
>> 
>> tom
>> 
>>> 
>>> -- Christian
>> 
> 


From forax at univ-mlv.fr  Fri Aug  5 10:19:56 2011
From: forax at univ-mlv.fr (=?ISO-8859-1?Q?R=E9mi_Forax?=)
Date: Fri, 05 Aug 2011 19:19:56 +0200
Subject: Request for review (L): 7071653: JSR 292: call site
	change	notification should be pushed not pulled
In-Reply-To: <34EE7AEC-FD11-4526-B49D-DCEA296E767A@oracle.com>
References: <34EE7AEC-FD11-4526-B49D-DCEA296E767A@oracle.com>
Message-ID: <4E3C263C.50604@univ-mlv.fr>

Cool :)

Rémi

On 08/05/2011 03:32 PM, Christian Thalinger wrote:
> http://cr.openjdk.java.net/~twisti/7071653
>
> 7071653: JSR 292: call site change notification should be pushed not pulled
> Reviewed-by:
>
> Currently every speculatively inlined method handle call site has a
> guard that compares the current target of the CallSite object to the
> inlined one.  This per-invocation overhead can be removed if the
> notification is changed from pulled to pushed (i.e. deoptimization).
>
> I had to change the logic in TemplateTable::patch_bytecode to skip
> bytecode quickening for putfield instructions when the put_code
> written to the constant pool cache is zero.  This is required so that
> every execution of a putfield to CallSite.target calls out to
> InterpreterRuntime::resolve_get_put to do the deoptimization of
> depending compiled methods.
>
> I also had to change the dependency machinery to understand other
> dependencies than class hierarchy ones.  DepChange got the super-type
> of two new dependencies, KlassDepChange and CallSiteDepChange.
>
> Tested with JRuby tests and benchmarks, hand-written testcases, JDK
> tests and vm.mlvm tests.
>
> Here is the speedup for the JRuby fib benchmark (first is JDK 7 b147,
> second with 7071653).  Since the CallSite targets don't change during
> the runtime of this benchmark we can see the performance benefit of
> eliminating the guard:
>
> $ jruby --server bench/bench_fib_recursive.rb 5 35
>    0.883000   0.000000   0.883000 (  0.854000)
>    0.715000   0.000000   0.715000 (  0.715000)
>    0.712000   0.000000   0.712000 (  0.712000)
>    0.713000   0.000000   0.713000 (  0.713000)
>    0.713000   0.000000   0.713000 (  0.712000)
>
> $ jruby --server bench/bench_fib_recursive.rb 5 35
>    0.772000   0.000000   0.772000 (  0.742000)
>    0.624000   0.000000   0.624000 (  0.624000)
>    0.621000   0.000000   0.621000 (  0.621000)
>    0.622000   0.000000   0.622000 (  0.622000)
>    0.622000   0.000000   0.622000 (  0.621000)
>


From tom.rodriguez at oracle.com  Fri Aug  5 13:22:37 2011
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Fri, 5 Aug 2011 13:22:37 -0700
Subject: review for 7075623: 6990212 broke raiseException in 64 bit
Message-ID: <A320A62A-37AF-4CA7-8763-A832C2A00DCE@oracle.com>

http://cr.openjdk.java.net/~never/7075623
3 lines changed: 0 ins; 0 del; 3 mod; 4699 unchg

7075623: 6990212 broke raiseException in 64 bit
Reviewed-by:

The fix for 6990212 included making the raiseException path do a
normal dispatch instead of always using the compiler entry.  The
64-bit assembly had a few issues.  On x86 the saved sp register is
wrong, which causes rarg0_code to be killed.  On sparc the code should
be passed as an int instead of a ptr; passing it as a ptr causes
problems because of endianness.  I also modified the x86 code to do
the same.  Tested with the original regression test on sparc/x86 32/64
-Xcomp/-Xmixed.  I also reran the failing JDK regression tests.


From headius at headius.com  Fri Aug  5 14:26:27 2011
From: headius at headius.com (Charles Oliver Nutter)
Date: Fri, 5 Aug 2011 16:26:27 -0500
Subject: Request for review (L): 7071653: JSR 292: call site change
	notification should be pushed not pulled
In-Reply-To: <4E3C263C.50604@univ-mlv.fr>
References: <34EE7AEC-FD11-4526-B49D-DCEA296E767A@oracle.com>
	<4E3C263C.50604@univ-mlv.fr>
Message-ID: <37CFE89F-7AC8-4A37-979A-F7EF4B06745B@headius.com>

I concur! I can't wait to see the new asm with recent fixes!

- Charlie (mobile)

On Aug 5, 2011, at 12:19, Rémi Forax <forax at univ-mlv.fr> wrote:

> Cool :)
> 
> Rémi
> 
> On 08/05/2011 03:32 PM, Christian Thalinger wrote:
>> http://cr.openjdk.java.net/~twisti/7071653
>> 
>> 7071653: JSR 292: call site change notification should be pushed not pulled
>> Reviewed-by:
>> 
>> Currently every speculatively inlined method handle call site has a
>> guard that compares the current target of the CallSite object to the
>> inlined one.  This per-invocation overhead can be removed if the
>> notification is changed from pulled to pushed (i.e. deoptimization).
>> 
>> I had to change the logic in TemplateTable::patch_bytecode to skip
>> bytecode quickening for putfield instructions when the put_code
>> written to the constant pool cache is zero.  This is required so that
>> every execution of a putfield to CallSite.target calls out to
>> InterpreterRuntime::resolve_get_put to do the deoptimization of
>> depending compiled methods.
>> 
>> I also had to change the dependency machinery to understand other
>> dependencies than class hierarchy ones.  DepChange got the super-type
>> of two new dependencies, KlassDepChange and CallSiteDepChange.
>> 
>> Tested with JRuby tests and benchmarks, hand-written testcases, JDK
>> tests and vm.mlvm tests.
>> 
>> Here is the speedup for the JRuby fib benchmark (first is JDK 7 b147,
>> second with 7071653).  Since the CallSite targets don't change during
>> the runtime of this benchmark we can see the performance benefit of
>> eliminating the guard:
>> 
>> $ jruby --server bench/bench_fib_recursive.rb 5 35
>>   0.883000   0.000000   0.883000 (  0.854000)
>>   0.715000   0.000000   0.715000 (  0.715000)
>>   0.712000   0.000000   0.712000 (  0.712000)
>>   0.713000   0.000000   0.713000 (  0.713000)
>>   0.713000   0.000000   0.713000 (  0.712000)
>> 
>> $ jruby --server bench/bench_fib_recursive.rb 5 35
>>   0.772000   0.000000   0.772000 (  0.742000)
>>   0.624000   0.000000   0.624000 (  0.624000)
>>   0.621000   0.000000   0.621000 (  0.621000)
>>   0.622000   0.000000   0.622000 (  0.622000)
>>   0.622000   0.000000   0.622000 (  0.621000)
>> 
> 

From vladimir.kozlov at oracle.com  Sat Aug  6 10:50:44 2011
From: vladimir.kozlov at oracle.com (vladimir.kozlov at oracle.com)
Date: Sat, 06 Aug 2011 17:50:44 +0000
Subject: hg: hsx/hotspot-comp/hotspot: 7075559: JPRT windows_x64 build failure
Message-ID: <20110806175046.BF9DB479AD@hg.openjdk.java.net>

Changeset: 4aa5974a06dd
Author:    kvn
Date:      2011-08-06 08:28 -0700
URL:       http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/4aa5974a06dd

7075559: JPRT windows_x64 build failure
Summary: use SA_CLASSDIR variable instead of directory saclasses.
Reviewed-by: kamg, dcubed

! make/linux/makefiles/defs.make
! make/solaris/makefiles/defs.make
! make/solaris/makefiles/saproc.make
! make/windows/makefiles/defs.make
! make/windows/makefiles/sa.make


From vladimir.kozlov at oracle.com  Sun Aug  7 15:35:29 2011
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Sun, 07 Aug 2011 15:35:29 -0700
Subject: Request for review (L): 7071653: JSR 292: call site change
	notification should be pushed not pulled
In-Reply-To: <34EE7AEC-FD11-4526-B49D-DCEA296E767A@oracle.com>
References: <34EE7AEC-FD11-4526-B49D-DCEA296E767A@oracle.com>
Message-ID: <4E3F1331.2000909@oracle.com>

Christian,

You need to add a big comment to the new code in templateTable_<arch>.cpp explaining what it does and why.

Why do you use ld_ptr() to load from the cache on sparc but movl() (32-bit only) on X86 and X64?

Add assert(byte_no == -1, ) to the default: case to make sure you covered all the cases above it.

I am concerned about using the following short branch in the new code in templateTable_sparc.cpp:

cmp_and_br_short(..., L_patch_done);  // don't patch

There is a __ stop() call which generates a lot of code, so the label L_patch_done could be far away.


Why did you add a new #include to ciEnv.cpp and nmethod.cpp? What code needs it? Nothing else is changed in these files.

I don't like assignments in conditions and implicit NULL checks. Can you change check_dependency() to the following?

       klassOop check_dependency() {
         klassOop result = check_klass_dependency(NULL);
         if (result != NULL) return result;
         return check_call_site_dependency(NULL);
       }

In interpreterRuntime.cpp initialize marked:  int marked = 0;

Why did you not keep the guarded inlining for "volatile" call sites? You did not explain why a virtual call is fine for them.


Vladimir

On 8/5/11 6:32 AM, Christian Thalinger wrote:
> http://cr.openjdk.java.net/~twisti/7071653
>
> 7071653: JSR 292: call site change notification should be pushed not pulled
> Reviewed-by:
>
> Currently every speculatively inlined method handle call site has a
> guard that compares the current target of the CallSite object to the
> inlined one.  This per-invocation overhead can be removed if the
> notification is changed from pulled to pushed (i.e. deoptimization).
>
> I had to change the logic in TemplateTable::patch_bytecode to skip
> bytecode quickening for putfield instructions when the put_code
> written to the constant pool cache is zero.  This is required so that
> every execution of a putfield to CallSite.target calls out to
> InterpreterRuntime::resolve_get_put to do the deoptimization of
> depending compiled methods.
>
> I also had to change the dependency machinery to understand other
> dependencies than class hierarchy ones.  DepChange got the super-type
> of two new dependencies, KlassDepChange and CallSiteDepChange.
>
> Tested with JRuby tests and benchmarks, hand-written testcases, JDK
> tests and vm.mlvm tests.
>
> Here is the speedup for the JRuby fib benchmark (first is JDK 7 b147,
> second with 7071653).  Since the CallSite targets don't change during
> the runtime of this benchmark we can see the performance benefit of
> eliminating the guard:
>
> $ jruby --server bench/bench_fib_recursive.rb 5 35
>    0.883000   0.000000   0.883000 (  0.854000)
>    0.715000   0.000000   0.715000 (  0.715000)
>    0.712000   0.000000   0.712000 (  0.712000)
>    0.713000   0.000000   0.713000 (  0.713000)
>    0.713000   0.000000   0.713000 (  0.712000)
>
> $ jruby --server bench/bench_fib_recursive.rb 5 35
>    0.772000   0.000000   0.772000 (  0.742000)
>    0.624000   0.000000   0.624000 (  0.624000)
>    0.621000   0.000000   0.621000 (  0.621000)
>    0.622000   0.000000   0.622000 (  0.622000)
>    0.622000   0.000000   0.622000 (  0.621000)
>

From christian.thalinger at oracle.com  Mon Aug  8 01:34:50 2011
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Mon, 8 Aug 2011 10:34:50 +0200
Subject: review for 7075623: 6990212 broke raiseException in 64 bit
In-Reply-To: <A320A62A-37AF-4CA7-8763-A832C2A00DCE@oracle.com>
References: <A320A62A-37AF-4CA7-8763-A832C2A00DCE@oracle.com>
Message-ID: <98787BDF-3C3A-45AF-B2D7-CDB4763E8D0D@oracle.com>

Looks good.  -- Christian

On Aug 5, 2011, at 10:22 PM, Tom Rodriguez wrote:

> http://cr.openjdk.java.net/~never/7075623
> 3 lines changed: 0 ins; 0 del; 3 mod; 4699 unchg
> 
> 7075623: 6990212 broke raiseException in 64 bit
> Reviewed-by:
> 
> The fix for 6990212 included making the raiseException path do a
> normal dispatch instead of always using the compiler entry.  The
> assembly for 64 bit had a few issues.  On x86 the saved sp register is
> wrong which causes rarg0_code to be killed.  On sparc the code should
> be passed as an int instead of a ptr which causes problems because of
> endianness.  I also modified the x86 code to do the same.  Tested with
> original regression test on sparc/x86 32/64 -Xcomp/-Xmixed.  I also
> reran the failing JDK regression tests.
> 


From gbenson at redhat.com  Mon Aug  8 03:25:22 2011
From: gbenson at redhat.com (Gary Benson)
Date: Mon, 8 Aug 2011 11:25:22 +0100
Subject: Review Request: zero/shark doesn't build after b147-fcs
In-Reply-To: <7ADFDF69-ADDA-4B24-8F78-82D52F46FD2B@oracle.com>
References: <4E1C5E4F.1080307@zafena.se> <4E2049CA.8060506@LGonQn.Org>
	<20110715145127.GA3311@redhat.com>
	<7ADFDF69-ADDA-4B24-8F78-82D52F46FD2B@oracle.com>
Message-ID: <20110808102522.GB2761@redhat.com>

Christian Thalinger wrote:
> On Jul 15, 2011, at 4:51 PM, Gary Benson wrote:
> > Chris Phillips wrote:
> > > http://lgonqn.org/temp/ChrisPhi/webrev-sharkContext.hpp-typo-in-assert/
> > 
> > Nice catch :)
> > 
> > > http://lgonqn.org/temp/ChrisPhi/webrev-methodHandles_zero.hpp-missing/
> > 
> > You could probably make adapter_code_size be 0, or something small
> > 1ike 1*k.  Nothing will presumably be generated into these buffers
> > after all?
> 
> Gary, can I add you as a reviewer?  -- Christian

Sure.  Sorry for the delay in replying, I was on PTO.

Thanks,
Gary

-- 
http://gbenson.net/

From christian.thalinger at oracle.com  Mon Aug  8 06:56:00 2011
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Mon, 8 Aug 2011 15:56:00 +0200
Subject: Request for review (L): 7071653: JSR 292: call site change
	notification should be pushed not pulled
In-Reply-To: <4E3F1331.2000909@oracle.com>
References: <34EE7AEC-FD11-4526-B49D-DCEA296E767A@oracle.com>
	<4E3F1331.2000909@oracle.com>
Message-ID: <BB9775B4-13F9-4222-8C2E-FCF3997897E5@oracle.com>


On Aug 8, 2011, at 12:35 AM, Vladimir Kozlov wrote:

> Christian,
> 
> You need to add big comment to the new code in templateTable_<arch>.cpp explaining what it does and why.

Done.  I made the wording a little more general because Tom's effectively final work might use the same machinery.

> 
> Why on sparc you use ld_ptr() to load from cache but on X86 and X64 you use movl() (only 32  bit)?

Good question.  I took the code from TemplateTable::resolve_cache_and_index without thinking about it and that one uses ld_ptr.

_indices in ConstantPoolCacheEntry is defined as intx:

  volatile intx     _indices;  // constant pool index & rewrite bytecodes

and bytecodes 1 and 2 are in the upper 16 bits of the lower 32-bit word:

// bit number |31                0|
// bit length |-8--|-8--|---16----|
// --------------------------------
// _indices   [ b2 | b1 |  index  ]

A 32-bit load on LE gives you the right bits, but on BE it does not.  I think that's the reason for the "optimization" on x64.
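
To make the effect concrete, here is a small sketch.  It assumes a 64-bit intx whose low 32 bits follow the diagram above, and it simulates both byte orders explicitly instead of relying on the host.  All helper names are made up for illustration, and 0xB4/0xB5 are just example values; none of this is HotSpot code.

```cpp
#include <cassert>
#include <cstdint>

// Pack index (16 bits), b1 and b2 (8 bits each) into the low 32 bits
// of a 64-bit word, following the _indices diagram.
static uint64_t pack_indices(uint8_t b2, uint8_t b1, uint16_t index) {
  return (uint64_t(b2) << 24) | (uint64_t(b1) << 16) | index;
}

// Store a 64-bit value into memory in the requested byte order.
static void store64(unsigned char* mem, uint64_t v, bool big_endian) {
  for (int i = 0; i < 8; i++) {
    int shift = big_endian ? (56 - 8 * i) : (8 * i);
    mem[i] = (unsigned char)(v >> shift);
  }
}

// Load 32 bits from the start of the field, in the same byte order.
// This models a 32-bit load at the field's base address.
static uint32_t load32_at_base(const unsigned char* mem, bool big_endian) {
  uint32_t v = 0;
  for (int i = 0; i < 4; i++) {
    int shift = big_endian ? (24 - 8 * i) : (8 * i);
    v |= uint32_t(mem[i]) << shift;
  }
  return v;
}

// Extract bytecode 1 (bits 16..23) from a loaded 32-bit word.
static uint8_t bytecode_1(uint32_t word) { return (word >> 16) & 0xFF; }

// Hypothetical wrapper: which b1 does a 32-bit load at the base see?
static uint8_t b1_seen_by_32bit_load(uint64_t indices, bool big_endian) {
  unsigned char mem[8];
  store64(mem, indices, big_endian);
  return bytecode_1(load32_at_base(mem, big_endian));
}
```

With pack_indices(0xB5, 0xB4, 0x1234), the simulated little-endian load sees b1 == 0xB4, while the simulated big-endian load at the same base address sees only the zero upper half of the word.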

> 
> Add assert(byte_no == -1, ) to default: case to make sure you got all cases above it.

Done.

> 
> I am concern about using next short branch in new code in templateTable_sparc.cpp:
> 
> cmp_and_br_short(..., L_patch_done);  // don't patch
> 
> There is __ stop() call which generates a lot of code so that label L_patch_done could be far.

Yeah, I thought I'd give it a try and see whether it works.  cmp_and_br_short should assert if the branch displacement is too far, right?
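
As a rough illustration of the concern: a short compare-and-branch encodes its target as a small signed displacement, so whether it can reach a label depends on how much code sits in between.  The sketch below assumes a 10-bit signed displacement counted in 4-byte instruction words; that width and the helper names are assumptions for illustration, not taken from the assembler sources.

```cpp
#include <cassert>
#include <cstdint>

// Does a signed value fit into a two's-complement field of nbits bits?
static bool fits_in_signed_bits(int64_t value, int nbits) {
  int64_t limit = int64_t(1) << (nbits - 1);
  return -limit <= value && value < limit;
}

// Can a short branch at byte offset 'from' reach byte offset 'to'?
// Assumed encoding: 10-bit signed displacement in 4-byte words.
static bool short_branch_reaches(int64_t from, int64_t to) {
  const int kDispBits = 10;  // assumption, see the text above
  const int kInsnSize = 4;   // bytes per instruction
  int64_t disp_words = (to - from) / kInsnSize;
  return fits_in_signed_bits(disp_words, kDispBits);
}
```

Under these assumptions a label more than 511 instructions ahead of the branch is out of reach, which is why a large __ stop() expansion between the branch and L_patch_done matters.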

> 
> 
> Why you added new #include into ciEnv.cpp and nmethod.cpp, what code needs it? Nothing else is changed in these files.

Both files use dependencies and I got linkage errors on Linux while working on the fix (because of inline methods).  It seems that the include is not required in ciEnv.cpp because ciEnv.hpp already includes it.  I missed that.  But nmethod.cpp needs it because nmethod.hpp only declares class Dependencies.

> 
> I don't like assignments in condition and implicit NULL checks. Can you change check_dependency() to next?:
> 
>      klassOop check_dependency() {
>        klassOop result = check_klass_dependency(NULL);
>        if (result != NULL) return result;
>        return check_call_site_dependency(NULL);
>      }

Done.

> 
> In interpreterRuntime.cpp initialize marked:  int marked = 0;

OK.

> 
> Why you did not leave "volatile" call site inlining with guard? You did not explain why virtual call is fine for it.

The spec of MutableCallSite says:

"For target values which will be frequently updated, consider using a volatile call site instead."

And VolatileCallSite says:

"A VolatileCallSite is a CallSite whose target acts like a volatile variable. An invokedynamic instruction linked to a VolatileCallSite sees updates to its call site target immediately, even if the update occurs in another thread. There may be a performance penalty for such tight coupling between threads.

Unlike MutableCallSite, there is no syncAll operation on volatile call sites, since every write to a volatile variable is implicitly synchronized with reader threads.

In other respects, a VolatileCallSite is interchangeable with MutableCallSite."

Since VolatileCallSite really should only be used when you know the target changes very often, we don't do optimizations for this case.  Obviously this is just a guess at how people will use VolatileCallSite, but I think for now this is a safe bet.

Additionally I had to make two small changes because the build was broken on some configurations:
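
The trade-off behind this choice can be seen in a toy model of "pulled" versus "pushed" notification.  This is only a sketch: none of these types exist in HotSpot, and the valid flag merely stands in for deoptimization of dependent compiled code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only: none of these types exist in HotSpot.
typedef int (*Target)(int);

struct CompiledCode {
  Target inlined;  // target that was speculatively inlined
  bool   valid;    // cleared when the code is "deoptimized"
};

struct CallSite {
  Target target;
  std::vector<CompiledCode*> dependents;  // code that inlined 'target'
  long guard_checks;                      // per-call guards executed
  long invalidations;                     // dependents thrown away

  explicit CallSite(Target t)
    : target(t), guard_checks(0), invalidations(0) {}

  // Pulled: every invocation compares the current target with the
  // inlined one before taking the fast path.
  int invoke_pulled(const CompiledCode& code, int arg) {
    guard_checks++;
    if (target == code.inlined) return code.inlined(arg);
    return target(arg);
  }

  // Pushed: valid code trusts its inlined target without comparing.
  // The 'valid' test only models which version of the code is running.
  int invoke_pushed(const CompiledCode& code, int arg) {
    if (code.valid) return code.inlined(arg);
    return target(arg);
  }

  // Pushed: the cost moves to the writer, which invalidates dependents.
  void set_target(Target t) {
    target = t;
    for (size_t i = 0; i < dependents.size(); i++) {
      if (dependents[i]->valid) {
        dependents[i]->valid = false;
        invalidations++;
      }
    }
  }
};

static int plus_one(int x)  { return x + 1; }
static int times_two(int x) { return x * 2; }
```

In this model a call site whose target never changes pays nothing per call in the pushed scheme, while a site whose target changes often pays an invalidation on each change.  That cost is why frequently updated (volatile) call sites are a poor fit for it.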

-  klassOop new_type = _changes.is_klass_change() ? _changes.as_klass_change()->new_type() : NULL;
+  klassOop new_type = _changes.is_klass_change() ? _changes.as_klass_change()->new_type() : (klassOop) NULL;

and

-      MutexLockerEx ccl(CodeCache_lock, thread);
+      MutexLockerEx ccl(CodeCache_lock, Mutex::_no_safepoint_check_flag);

I updated the webrev.

-- Christian

> 
> 
> Vladimir
> 
> On 8/5/11 6:32 AM, Christian Thalinger wrote:
>> http://cr.openjdk.java.net/~twisti/7071653
>> 
>> 7071653: JSR 292: call site change notification should be pushed not pulled
>> Reviewed-by:
>> 
>> Currently every speculatively inlined method handle call site has a
>> guard that compares the current target of the CallSite object to the
>> inlined one.  This per-invocation overhead can be removed if the
>> notification is changed from pulled to pushed (i.e. deoptimization).
>> 
>> I had to change the logic in TemplateTable::patch_bytecode to skip
>> bytecode quickening for putfield instructions when the put_code
>> written to the constant pool cache is zero.  This is required so that
>> every execution of a putfield to CallSite.target calls out to
>> InterpreterRuntime::resolve_get_put to do the deoptimization of
>> depending compiled methods.
>> 
>> I also had to change the dependency machinery to understand other
>> dependencies than class hierarchy ones.  DepChange got the super-type
>> of two new dependencies, KlassDepChange and CallSiteDepChange.
>> 
>> Tested with JRuby tests and benchmarks, hand-written testcases, JDK
>> tests and vm.mlvm tests.
>> 
>> Here is the speedup for the JRuby fib benchmark (first is JDK 7 b147,
>> second with 7071653).  Since the CallSite targets don't change during
>> the runtime of this benchmark we can see the performance benefit of
>> eliminating the guard:
>> 
>> $ jruby --server bench/bench_fib_recursive.rb 5 35
>>   0.883000   0.000000   0.883000 (  0.854000)
>>   0.715000   0.000000   0.715000 (  0.715000)
>>   0.712000   0.000000   0.712000 (  0.712000)
>>   0.713000   0.000000   0.713000 (  0.713000)
>>   0.713000   0.000000   0.713000 (  0.712000)
>> 
>> $ jruby --server bench/bench_fib_recursive.rb 5 35
>>   0.772000   0.000000   0.772000 (  0.742000)
>>   0.624000   0.000000   0.624000 (  0.624000)
>>   0.621000   0.000000   0.621000 (  0.621000)
>>   0.622000   0.000000   0.622000 (  0.622000)
>>   0.622000   0.000000   0.622000 (  0.621000)
>> 


From vladimir.kozlov at oracle.com  Mon Aug  8 07:55:32 2011
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 08 Aug 2011 07:55:32 -0700
Subject: Request for review (L): 7071653: JSR 292: call site change
	notification should be pushed not pulled
In-Reply-To: <BB9775B4-13F9-4222-8C2E-FCF3997897E5@oracle.com>
References: <34EE7AEC-FD11-4526-B49D-DCEA296E767A@oracle.com>
	<4E3F1331.2000909@oracle.com>
	<BB9775B4-13F9-4222-8C2E-FCF3997897E5@oracle.com>
Message-ID: <4E3FF8E4.2070302@oracle.com>

Christian,

Should we put the "skip bytecode quickening" code under a flag so that it is done only when invokedynamic is enabled? Or is 
put_code zero only in the invokedynamic case?

On 8/8/11 6:56 AM, Christian Thalinger wrote:
>> Why on sparc you use ld_ptr() to load from cache but on X86 and X64 you use movl() (only 32  bit)?
>
> Good question.  I took the code from TemplateTable::resolve_cache_and_index without thinking about it and that one uses ld_ptr.
>
> _indices in ConstantPoolCacheEntry is defined as intx:
>
>    volatile intx     _indices;  // constant pool index & rewrite bytecodes
>
> and bytecode 1 and 2 are in the upper 16-bit of the lower 32-bit word:
>
> // bit number |31                0|
> // bit length |-8--|-8--|---16----|
> // --------------------------------
> // _indices   [ b2 | b1 |  index  ]
>
> Loading 32-bit on LE gives you the right bits but on BE it does not.  I think that's the reason for the "optimization" on x64.

I don't like this "optimization" but I understand why we are using it. Please add a comment (especially in the x64 file).

>>
>> I am concern about using next short branch in new code in templateTable_sparc.cpp:
>>
>> cmp_and_br_short(..., L_patch_done);  // don't patch
>>
>> There is __ stop() call which generates a lot of code so that label L_patch_done could be far.
>
> Yeah, I thought I give it a try if it works.  cmp_and_br_short should assert if the branch displacement is too far, right?
>

Yes, it will assert, but maybe only in some worst case which we do not test. For example, try to run a 64-bit fastdebug VM 
on Sparc + compressed oops + VerifyOops.

>>
>>
>> Why you added new #include into ciEnv.cpp and nmethod.cpp, what code needs it? Nothing else is changed in these files.
>
> Both files use dependencies and I got linkage errors on Linux while working on the fix (because of inline methods).  It seems that the include is not required in ciEnv.cpp because ciEnv.hpp already includes it.  I missed that.  But nmethod.cpp needs it because nmethod.hpp only declares class Dependencies.
>

OK.

>
>>
>> Why you did not leave "volatile" call site inlining with guard? You did not explain why virtual call is fine for it.
>
> The spec of MutableCallSite says:
>
> "For target values which will be frequently updated, consider using a volatile call site instead."
>
> And VolatileCallSite says:
>
> "A VolatileCallSite is a CallSite whose target acts like a volatile variable. An invokedynamic instruction linked to a VolatileCallSite sees updates to its call site target immediately, even if the update occurs in another thread. There may be a performance penalty for such tight coupling between threads.
>
> Unlike MutableCallSite, there is no syncAll operation on volatile call sites, since every write to a volatile variable is implicitly synchronized with reader threads.
>
> In other respects, a VolatileCallSite is interchangeable with MutableCallSite."
>
> Since VolatileCallSite really should only be used when you know the target changes very often we don't do optimizations for this case.  Obviously this is just a guess how people will use VolatileCallSite but I think for now this is a safe bet.
>

Thank you for explaining it.

> Additionally I had to do two small changes because the build was broken on some configurations:
>
> -  klassOop new_type = _changes.is_klass_change() ? _changes.as_klass_change()->new_type() : NULL;
> +  klassOop new_type = _changes.is_klass_change() ? _changes.as_klass_change()->new_type() : (klassOop) NULL;
>
> and
>
> -      MutexLockerEx ccl(CodeCache_lock, thread);
> +      MutexLockerEx ccl(CodeCache_lock, Mutex::_no_safepoint_check_flag);
>
> I updated the webrev.

Good.

Vladimir

>
> -- Christian
>
>>
>>
>> Vladimir
>>
>> On 8/5/11 6:32 AM, Christian Thalinger wrote:
>>> http://cr.openjdk.java.net/~twisti/7071653
>>>
>>> 7071653: JSR 292: call site change notification should be pushed not pulled
>>> Reviewed-by:
>>>
>>> Currently every speculatively inlined method handle call site has a
>>> guard that compares the current target of the CallSite object to the
>>> inlined one.  This per-invocation overhead can be removed if the
>>> notification is changed from pulled to pushed (i.e. deoptimization).
>>>
>>> I had to change the logic in TemplateTable::patch_bytecode to skip
>>> bytecode quickening for putfield instructions when the put_code
>>> written to the constant pool cache is zero.  This is required so that
>>> every execution of a putfield to CallSite.target calls out to
>>> InterpreterRuntime::resolve_get_put to do the deoptimization of
>>> depending compiled methods.
>>>
>>> I also had to change the dependency machinery to understand other
>>> dependencies than class hierarchy ones.  DepChange got the super-type
>>> of two new dependencies, KlassDepChange and CallSiteDepChange.
>>>
>>> Tested with JRuby tests and benchmarks, hand-written testcases, JDK
>>> tests and vm.mlvm tests.
>>>
>>> Here is the speedup for the JRuby fib benchmark (first is JDK 7 b147,
>>> second with 7071653).  Since the CallSite targets don't change during
>>> the runtime of this benchmark we can see the performance benefit of
>>> eliminating the guard:
>>>
>>> $ jruby --server bench/bench_fib_recursive.rb 5 35
>>>    0.883000   0.000000   0.883000 (  0.854000)
>>>    0.715000   0.000000   0.715000 (  0.715000)
>>>    0.712000   0.000000   0.712000 (  0.712000)
>>>    0.713000   0.000000   0.713000 (  0.713000)
>>>    0.713000   0.000000   0.713000 (  0.712000)
>>>
>>> $ jruby --server bench/bench_fib_recursive.rb 5 35
>>>    0.772000   0.000000   0.772000 (  0.742000)
>>>    0.624000   0.000000   0.624000 (  0.624000)
>>>    0.621000   0.000000   0.621000 (  0.621000)
>>>    0.622000   0.000000   0.622000 (  0.622000)
>>>    0.622000   0.000000   0.622000 (  0.621000)
>>>
>

From christian.thalinger at oracle.com  Mon Aug  8 09:40:46 2011
From: christian.thalinger at oracle.com (christian.thalinger at oracle.com)
Date: Mon, 08 Aug 2011 16:40:46 +0000
Subject: hg: hsx/hotspot-comp/hotspot: 7071823: Zero: zero/shark doesn't build
	after b147-fcs
Message-ID: <20110808164048.9272747A1C@hg.openjdk.java.net>

Changeset: a3142bdb6707
Author:    twisti
Date:      2011-08-08 05:49 -0700
URL:       http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/a3142bdb6707

7071823: Zero: zero/shark doesn't build after b147-fcs
Reviewed-by: gbenson, twisti
Contributed-by: Chris Phillips <chphilli at redhat.com>

! src/cpu/zero/vm/frame_zero.cpp
+ src/cpu/zero/vm/methodHandles_zero.hpp
! src/cpu/zero/vm/sharedRuntime_zero.cpp
! src/share/vm/shark/sharkContext.hpp


From christian.thalinger at oracle.com  Mon Aug  8 11:12:06 2011
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Mon, 8 Aug 2011 20:12:06 +0200
Subject: Request for review (L): 7071653: JSR 292: call site change
	notification should be pushed not pulled
In-Reply-To: <4E3FF8E4.2070302@oracle.com>
References: <34EE7AEC-FD11-4526-B49D-DCEA296E767A@oracle.com>
	<4E3F1331.2000909@oracle.com>
	<BB9775B4-13F9-4222-8C2E-FCF3997897E5@oracle.com>
	<4E3FF8E4.2070302@oracle.com>
Message-ID: <F6375024-D824-4351-9B62-E604CBC6E45D@oracle.com>


On Aug 8, 2011, at 4:55 PM, Vladimir Kozlov wrote:

> Christian,
> 
> Should we put "skip bytecode quickening" code under flag to do this only when invoke dynamic is enabled? Or put_code is zero only in invoke dynamic case?

No, it doesn't buy us anything.  The new checking code is only executed the first time as the bytecodes are quickened right after that.  And in the case where a putfield isn't quickened and we call resolve_get_put it gets very expensive anyway.
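
A toy model of that behavior, for illustration only: the names below are invented and the control flow is much simpler than the real template interpreter.  An ordinary putfield calls into the runtime once and is then quickened; a putfield whose put_code stays zero is never quickened and therefore calls out every time.

```cpp
#include <cassert>

// Toy interpreter model; names are illustrative, not HotSpot's.
enum Bytecode { PUTFIELD, FAST_PUTFIELD };

struct CacheEntry {
  int put_code;  // 0 means "do not quicken this putfield"
};

struct PutSite {
  Bytecode   bc;
  CacheEntry entry;
  int        slow_calls;  // how often the runtime was called
};

static PutSite make_site(int put_code) {
  PutSite s = { PUTFIELD, { put_code }, 0 };
  return s;
}

// Execute the store once.
static void execute(PutSite& s) {
  if (s.bc == FAST_PUTFIELD) return;  // quickened: no runtime call
  s.slow_calls++;                     // models the resolve_get_put call
  if (s.entry.put_code != 0) {
    s.bc = FAST_PUTFIELD;             // quicken: patch the bytecode
  }                                   // put_code == 0: leave it unpatched
}
```

Executing an ordinary site three times gives one slow call; executing a site with put_code == 0 three times gives three, which is what lets every write to such a field reach the runtime.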

> 
> On 8/8/11 6:56 AM, Christian Thalinger wrote:
>>> Why on sparc you use ld_ptr() to load from cache but on X86 and X64 you use movl() (only 32  bit)?
>> 
>> Good question.  I took the code from TemplateTable::resolve_cache_and_index without thinking about it and that one uses ld_ptr.
>> 
>> _indices in ConstantPoolCacheEntry is defined as intx:
>> 
>>   volatile intx     _indices;  // constant pool index & rewrite bytecodes
>> 
>> and bytecodes 1 and 2 are in the upper 16 bits of the lower 32-bit word:
>> 
>> // bit number |31                0|
>> // bit length |-8--|-8--|---16----|
>> // --------------------------------
>> // _indices   [ b2 | b1 |  index  ]
>> 
>> Loading 32-bit on LE gives you the right bits but on BE it does not.  I think that's the reason for the "optimization" on x64.
> 
> I don't like this "optimization" but I understand why we are using it. Add a comment (especially in the x64 file).

I factored reading the bytecode into InterpreterMacroAssembler::get_cache_and_index_and_bytecode_at_bcp since the same code is used twice in TemplateTable and added the comment there.
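
For readers following the endianness point, here is a minimal standalone sketch of why a 32-bit load from the field's address works on little-endian but not on big-endian. It is an illustration only, not the HotSpot code: make_indices, bytecode_1, bytecode_2 and load32_at_field_address are hypothetical helpers, and a 64-bit integer stands in for the intx field.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a 64-bit intx _indices value:
// low 32 bits are [ b2 (8) | b1 (8) | index (16) ], upper 32 bits are zero.
inline std::uint64_t make_indices(std::uint8_t b2, std::uint8_t b1, std::uint16_t index) {
  return (std::uint64_t(b2) << 24) | (std::uint64_t(b1) << 16) | index;
}

// Extract the two bytecodes from the low 32-bit word.
inline std::uint8_t bytecode_1(std::uint32_t low_word) {
  return static_cast<std::uint8_t>((low_word >> 16) & 0xFF);
}
inline std::uint8_t bytecode_2(std::uint32_t low_word) {
  return static_cast<std::uint8_t>((low_word >> 24) & 0xFF);
}

// Model a 32-bit load from the base address of the 64-bit field:
// store the value in memory with the given byte order, then read the
// first four bytes back using the same byte order.
inline std::uint32_t load32_at_field_address(std::uint64_t value, bool little_endian) {
  unsigned char mem[8];
  for (int i = 0; i < 8; i++) {
    int shift = little_endian ? 8 * i : 8 * (7 - i);
    mem[i] = static_cast<unsigned char>((value >> shift) & 0xFF);
  }
  std::uint32_t result = 0;
  for (int i = 0; i < 4; i++) {
    int shift = little_endian ? 8 * i : 8 * (3 - i);
    result |= std::uint32_t(mem[i]) << shift;
  }
  return result;
}
```

On little-endian the first four bytes are the low word, so both bytecodes can be extracted from it; on big-endian the same load returns the upper word, which is why a pointer-sized load (or an adjusted offset) is needed there.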

> 
>>> 
>>> I am concerned about using the short branch in the new code in templateTable_sparc.cpp:
>>> 
>>> cmp_and_br_short(..., L_patch_done);  // don't patch
>>> 
>>> There is a __ stop() call which generates a lot of code, so the label L_patch_done could be far away.
>> 
>> Yeah, I thought I'd give it a try and see if it works.  cmp_and_br_short should assert if the branch displacement is too far, right?
>> 
> 
> Yes, it will assert, but maybe only in some worst case which we do not test. For example, try to run a 64-bit fastdebug VM on Sparc + compressed oops + VerifyOops.

That works.

> 
>>> 
>>> 
>>> Why did you add a new #include to ciEnv.cpp and nmethod.cpp? What code needs it? Nothing else is changed in these files.
>> 
>> Both files use dependencies and I got linkage errors on Linux while working on the fix (because of inline methods).  It seems that the include is not required in ciEnv.cpp because ciEnv.hpp already includes it.  I missed that.  But nmethod.cpp needs it because nmethod.hpp only declares class Dependencies.
>> 
> 
> OK.
> 
>> 
>>> 
>>> Why did you not keep "volatile" call site inlining with a guard? You did not explain why a virtual call is fine for it.
>> 
>> The spec of MutableCallSite says:
>> 
>> "For target values which will be frequently updated, consider using a volatile call site instead."
>> 
>> And VolatileCallSite says:
>> 
>> "A VolatileCallSite is a CallSite whose target acts like a volatile variable. An invokedynamic instruction linked to a VolatileCallSite sees updates to its call site target immediately, even if the update occurs in another thread. There may be a performance penalty for such tight coupling between threads.
>> 
>> Unlike MutableCallSite, there is no syncAll operation on volatile call sites, since every write to a volatile variable is implicitly synchronized with reader threads.
>> 
>> In other respects, a VolatileCallSite is interchangeable with MutableCallSite."
>> 
>> Since VolatileCallSite really should only be used when you know the target changes very often, we don't do optimizations for this case.  Obviously this is just a guess at how people will use VolatileCallSite, but I think for now this is a safe bet.
>> 
> 
> Thank you for explaining it.
> 
>> Additionally I had to make two small changes because the build was broken on some configurations:
>> 
>> -  klassOop new_type = _changes.is_klass_change() ? _changes.as_klass_change()->new_type() : NULL;
>> +  klassOop new_type = _changes.is_klass_change() ? _changes.as_klass_change()->new_type() : (klassOop) NULL;
>> 
>> and
>> 
>> -      MutexLockerEx ccl(CodeCache_lock, thread);
>> +      MutexLockerEx ccl(CodeCache_lock, Mutex::_no_safepoint_check_flag);
>> 
>> I updated the webrev.
> 
> Good.

Thanks.

-- Christian

> 
> Vladimir
> 
>> 
>> -- Christian
>> 
>>> 
>>> 
>>> Vladimir
>>> 
>>> On 8/5/11 6:32 AM, Christian Thalinger wrote:
>>>> http://cr.openjdk.java.net/~twisti/7071653
>>>> 
>>>> 7071653: JSR 292: call site change notification should be pushed not pulled
>>>> Reviewed-by:
>>>> 
>>>> Currently every speculatively inlined method handle call site has a
>>>> guard that compares the current target of the CallSite object to the
>>>> inlined one.  This per-invocation overhead can be removed if the
>>>> notification is changed from pulled to pushed (i.e. deoptimization).
>>>> 
>>>> I had to change the logic in TemplateTable::patch_bytecode to skip
>>>> bytecode quickening for putfield instructions when the put_code
>>>> written to the constant pool cache is zero.  This is required so that
>>>> every execution of a putfield to CallSite.target calls out to
>>>> InterpreterRuntime::resolve_get_put to do the deoptimization of
>>>> depending compiled methods.
>>>> 
>>>> I also had to change the dependency machinery to understand
>>>> dependencies other than class hierarchy ones.  DepChange became the
>>>> super-type of two new classes, KlassDepChange and CallSiteDepChange.
>>>> 
>>>> Tested with JRuby tests and benchmarks, hand-written testcases, JDK
>>>> tests and vm.mlvm tests.
>>>> 
>>>> Here is the speedup for the JRuby fib benchmark (first is JDK 7 b147,
>>>> second with 7071653).  Since the CallSite targets don't change during
>>>> the runtime of this benchmark we can see the performance benefit of
>>>> eliminating the guard:
>>>> 
>>>> $ jruby --server bench/bench_fib_recursive.rb 5 35
>>>>   0.883000   0.000000   0.883000 (  0.854000)
>>>>   0.715000   0.000000   0.715000 (  0.715000)
>>>>   0.712000   0.000000   0.712000 (  0.712000)
>>>>   0.713000   0.000000   0.713000 (  0.713000)
>>>>   0.713000   0.000000   0.713000 (  0.712000)
>>>> 
>>>> $ jruby --server bench/bench_fib_recursive.rb 5 35
>>>>   0.772000   0.000000   0.772000 (  0.742000)
>>>>   0.624000   0.000000   0.624000 (  0.624000)
>>>>   0.621000   0.000000   0.621000 (  0.621000)
>>>>   0.622000   0.000000   0.622000 (  0.622000)
>>>>   0.622000   0.000000   0.622000 (  0.621000)
>>>> 
>> 


From tom.rodriguez at oracle.com  Mon Aug  8 11:49:16 2011
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Mon, 8 Aug 2011 11:49:16 -0700
Subject: Request for review (L): 7071653: JSR 292: call site change
	notification should be pushed not pulled
In-Reply-To: <34EE7AEC-FD11-4526-B49D-DCEA296E767A@oracle.com>
References: <34EE7AEC-FD11-4526-B49D-DCEA296E767A@oracle.com>
Message-ID: <E09300F6-0DBF-4AA2-9245-C84F4BE6B10F@oracle.com>

dependencies.cpp:

in check_call_site_target_value, the changes == NULL case should be checking that the call site hasn't changed.  It should probably look more like this:

klassOop Dependencies::check_call_site_target_value(klassOop ctxk, oop call_site, CallSiteDepChange* changes) {
  assert(call_site->is_a(SystemDictionary::CallSite_klass()), "sanity");
  // Same CallSite object but different target?  Check this specific call site
  //  if changes is non-NULL or validate all CallSites
  if ((changes == NULL || (call_site == changes->call_site())) &&
      (java_lang_invoke_CallSite::target(call_site) != changes->method_handle())) {
    return ctxk;  // assertion failed
  }
  assert(java_lang_invoke_CallSite::target(call_site) == changes->method_handle(), "should still be valid");
  return NULL;  // assertion still valid
}

The final assert is just a paranoia check that a call site hasn't changed without the dependencies being checked.
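
To spell out the two cases (no change record means validate every call site; a change record means validate only the call site it names), here is a small standalone sketch. It is not HotSpot code: MethodHandle, CallSite, CallSiteChange and call_site_dependency_holds are hypothetical stand-ins, and recorded_target is an assumed extra parameter holding the target the compiled code was inlined against. The NULL case gets its own branch because there is no change record to read from.

```cpp
#include <cassert>

// Hypothetical stand-ins for the VM types.
struct MethodHandle {};
struct CallSite { const MethodHandle* target; };
struct CallSiteChange {
  const CallSite*     call_site;      // the call site being updated
  const MethodHandle* method_handle;  // the handle carried by the change
};

// Returns true while the dependency is still valid.
// recorded_target is an assumption of this sketch: the target that the
// compiled code was inlined against.
inline bool call_site_dependency_holds(const CallSite& site,
                                       const MethodHandle* recorded_target,
                                       const CallSiteChange* change) {
  if (change == nullptr) {
    // Validate-all mode: the call site must still point at what was inlined.
    return site.target == recorded_target;
  }
  // Specific change: only the named call site can be invalidated, and only
  // if its current target differs from the handle carried by the change.
  if (&site == change->call_site && site.target != change->method_handle) {
    return false;
  }
  return true;
}
```

Whether the recorded target is passed in like this or read from the dependency record is a design choice the sketch does not try to settle.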

interpreterRuntime.cpp:

Please move the dependence check code into universe with the other dependence check code.  Also add some comments explaining why it's doing what it's doing.

doCall.cpp:

Can you put in a comment explaining that VolatileCallSite is never inlined?

Otherwise it looks good.

tom




From tom.rodriguez at oracle.com  Mon Aug  8 11:50:57 2011
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Mon, 8 Aug 2011 11:50:57 -0700
Subject: review for 7075623: 6990212 broke raiseException in 64 bit
In-Reply-To: <98787BDF-3C3A-45AF-B2D7-CDB4763E8D0D@oracle.com>
References: <A320A62A-37AF-4CA7-8763-A832C2A00DCE@oracle.com>
	<98787BDF-3C3A-45AF-B2D7-CDB4763E8D0D@oracle.com>
Message-ID: <193DDFC1-78CF-4097-B3F9-0D9ECBA5DB63@oracle.com>

Thanks Christian and Vladimir.

tom

On Aug 8, 2011, at 1:34 AM, Christian Thalinger wrote:

> Looks good.  -- Christian
> 
> On Aug 5, 2011, at 10:22 PM, Tom Rodriguez wrote:
> 
>> http://cr.openjdk.java.net/~never/7075623
>> 3 lines changed: 0 ins; 0 del; 3 mod; 4699 unchg
>> 
>> 7075623: 6990212 broke raiseException in 64 bit
>> Reviewed-by:
>> 
>> The fix for 6990212 included making the raiseException path do a
>> normal dispatch instead of always using the compiler entry.  The
>> assembly for 64 bit had a few issues.  On x86 the saved sp register is
>> wrong which causes rarg0_code to be killed.  On sparc the code should
>> be passed as an int instead of a ptr which causes problems because of
>> endianness.  I also modified the x86 code to do the same.  Tested with
>> original regression test on sparc/x86 32/64 -Xcomp/-Xmixed.  I also
>> reran the failing JDK regression tests.
>> 
> 


From vladimir.kozlov at oracle.com  Mon Aug  8 11:52:57 2011
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 08 Aug 2011 11:52:57 -0700
Subject: Request for review (L): 7071653: JSR 292: call site change
	notification should be pushed not pulled
In-Reply-To: <F6375024-D824-4351-9B62-E604CBC6E45D@oracle.com>
References: <34EE7AEC-FD11-4526-B49D-DCEA296E767A@oracle.com>
	<4E3F1331.2000909@oracle.com>
	<BB9775B4-13F9-4222-8C2E-FCF3997897E5@oracle.com>
	<4E3FF8E4.2070302@oracle.com>
	<F6375024-D824-4351-9B62-E604CBC6E45D@oracle.com>
Message-ID: <4E403089.5010204@oracle.com>

Christian Thalinger wrote:
> On Aug 8, 2011, at 4:55 PM, Vladimir Kozlov wrote:
> 
>> Christian,
>>
>> Should we put the "skip bytecode quickening" code under a flag so that it runs only when invokedynamic is enabled? Or is put_code zero only in the invokedynamic case?
> 
> No, it doesn't buy us anything.  The new checking code is only executed the first time as the bytecodes are quickened right after that.  And in the case where a putfield isn't quickened and we call resolve_get_put it gets very expensive anyway.

You lost me here. The new code in resolve_get_put() is executed only for putfield
to CallSite.target. But the new code in patch_bytecode() skips quickening for all
putfield bytecodes. My question is: can you narrow the skipped quickening to only
putfield to CallSite.target? Or are you saying that there is no performance
difference between executing _aputfield and _fast_aputfield?

Vladimir


From tom.rodriguez at oracle.com  Mon Aug  8 12:08:52 2011
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Mon, 8 Aug 2011 12:08:52 -0700
Subject: Request for review (L): 7071653: JSR 292: call site change
	notification should be pushed not pulled
In-Reply-To: <4E403089.5010204@oracle.com>
References: <34EE7AEC-FD11-4526-B49D-DCEA296E767A@oracle.com>
	<4E3F1331.2000909@oracle.com>
	<BB9775B4-13F9-4222-8C2E-FCF3997897E5@oracle.com>
	<4E3FF8E4.2070302@oracle.com>
	<F6375024-D824-4351-9B62-E604CBC6E45D@oracle.com>
	<4E403089.5010204@oracle.com>
Message-ID: <CD7E6BE0-FBD6-4A8F-A4FE-DA2C597D7B8F@oracle.com>


On Aug 8, 2011, at 11:52 AM, Vladimir Kozlov wrote:

> Christian Thalinger wrote:
>> On Aug 8, 2011, at 4:55 PM, Vladimir Kozlov wrote:
>> 
>>> Christian,
>>> 
>>> Should we put the "skip bytecode quickening" code under a flag so that it runs only when invokedynamic is enabled? Or is put_code zero only in the invokedynamic case?
>> 
>> No, it doesn't buy us anything.  The new checking code is only executed the first time as the bytecodes are quickened right after that.  And in the case where a putfield isn't quickened and we call resolve_get_put it gets very expensive anyway.
> 
> You lost me here. The new code in resolve_get_put() is executed only for putfield
> to CallSite.target. But the new code in patch_bytecode() skips quickening for all
> putfield bytecodes. My question is: can you narrow the skipped quickening to only
> putfield to CallSite.target? Or are you saying that there is no performance
> difference between executing _aputfield and _fast_aputfield?

It only skips quickening if put_code is zero, which is only done for CallSite.target.  All the others proceed as they used to.
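
As a rough model of that decision (a sketch only, not the TemplateTable assembly): the enum values, CacheEntry and patch_bytecode below are hypothetical names for illustration, and the only rule taken from the thread is "a putfield whose put_code is zero is left unquickened".

```cpp
#include <cassert>
#include <cstdint>

// Illustrative bytecode tags; the numeric values are arbitrary.
enum class Bytecode : std::uint8_t { Putfield, FastAputfield };

// Hypothetical view of a constant pool cache entry: put_code == 0 means
// the entry was deliberately left without a put bytecode.
struct CacheEntry { std::uint8_t put_code; };

// Decide which bytecode ends up in the bytecode stream after the first
// execution. A putfield with a zero put_code is not patched, so every
// later execution takes the slow path again.
inline Bytecode patch_bytecode(Bytecode current, Bytecode fast, const CacheEntry& entry) {
  if (current == Bytecode::Putfield && entry.put_code == 0) {
    return current;  // don't patch: keep calling out to the runtime
  }
  return fast;       // normal case: quicken to the fast variant
}
```

Only entries whose put_code stays zero keep paying for the runtime call; every other putfield is quickened exactly as before.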

tom



From vladimir.kozlov at oracle.com  Mon Aug  8 12:36:45 2011
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 08 Aug 2011 12:36:45 -0700
Subject: Request for review (L): 7071653: JSR 292: call site change
	notification should be pushed not pulled
In-Reply-To: <CD7E6BE0-FBD6-4A8F-A4FE-DA2C597D7B8F@oracle.com>
References: <34EE7AEC-FD11-4526-B49D-DCEA296E767A@oracle.com>
	<4E3F1331.2000909@oracle.com>
	<BB9775B4-13F9-4222-8C2E-FCF3997897E5@oracle.com>
	<4E3FF8E4.2070302@oracle.com>
	<F6375024-D824-4351-9B62-E604CBC6E45D@oracle.com>
	<4E403089.5010204@oracle.com>
	<CD7E6BE0-FBD6-4A8F-A4FE-DA2C597D7B8F@oracle.com>
Message-ID: <4E403ACD.5000500@oracle.com>

Tom Rodriguez wrote:
> On Aug 8, 2011, at 11:52 AM, Vladimir Kozlov wrote:
> 
>> Christian Thalinger wrote:
>>> On Aug 8, 2011, at 4:55 PM, Vladimir Kozlov wrote:
>>>
>>>> Christian,
>>>>
>>>> Should we put the "skip bytecode quickening" code under a flag so that it runs only when invokedynamic is enabled? Or is put_code zero only in the invokedynamic case?
>>> No, it doesn't buy us anything.  The new checking code is only executed the first time as the bytecodes are quickened right after that.  And in the case where a putfield isn't quickened and we call resolve_get_put it gets very expensive anyway.
>> You lost me here. The new code in resolve_get_put() is executed only for putfield
>> to CallSite.target. But the new code in patch_bytecode() skips quickening for all
>> putfield bytecodes. My question is: can you narrow the skipped quickening to only
>> putfield to CallSite.target? Or are you saying that there is no performance
>> difference between executing _aputfield and _fast_aputfield?
> 
> It only skips quickening if put_code is zero, which is only done for CallSite.target.  All the others proceed as they used to.

Good. Thank you, Tom

Vladimir

>>>>>> On 8/5/11 6:32 AM, Christian Thalinger wrote:
>>>>>>> http://cr.openjdk.java.net/~twisti/7071653
>>>>>>>
>>>>>>> 7071653: JSR 292: call site change notification should be pushed not pulled
>>>>>>> Reviewed-by:
>>>>>>>
>>>>>>> Currently every speculatively inlined method handle call site has a
>>>>>>> guard that compares the current target of the CallSite object to the
>>>>>>> inlined one.  This per-invocation overhead can be removed if the
>>>>>>> notification is changed from pulled to pushed (i.e. deoptimization).
>>>>>>>
>>>>>>> I had to change the logic in TemplateTable::patch_bytecode to skip
>>>>>>> bytecode quickening for putfield instructions when the put_code
>>>>>>> written to the constant pool cache is zero.  This is required so that
>>>>>>> every execution of a putfield to CallSite.target calls out to
>>>>>>> InterpreterRuntime::resolve_get_put to do the deoptimization of
>>>>>>> depending compiled methods.
>>>>>>>
>>>>>>> I also had to change the dependency machinery to understand other
>>>>>>> dependencies than class hierarchy ones.  DepChange got the super-type
>>>>>>> of two new dependencies, KlassDepChange and CallSiteDepChange.
>>>>>>>
>>>>>>> Tested with JRuby tests and benchmarks, hand-written testcases, JDK
>>>>>>> tests and vm.mlvm tests.
>>>>>>>
>>>>>>> Here is the speedup for the JRuby fib benchmark (first is JDK 7 b147,
>>>>>>> second with 7071653).  Since the CallSite targets don't change during
>>>>>>> the runtime of this benchmark we can see the performance benefit of
>>>>>>> eliminating the guard:
>>>>>>>
>>>>>>> $ jruby --server bench/bench_fib_recursive.rb 5 35
>>>>>>>  0.883000   0.000000   0.883000 (  0.854000)
>>>>>>>  0.715000   0.000000   0.715000 (  0.715000)
>>>>>>>  0.712000   0.000000   0.712000 (  0.712000)
>>>>>>>  0.713000   0.000000   0.713000 (  0.713000)
>>>>>>>  0.713000   0.000000   0.713000 (  0.712000)
>>>>>>>
>>>>>>> $ jruby --server bench/bench_fib_recursive.rb 5 35
>>>>>>>  0.772000   0.000000   0.772000 (  0.742000)
>>>>>>>  0.624000   0.000000   0.624000 (  0.624000)
>>>>>>>  0.621000   0.000000   0.621000 (  0.621000)
>>>>>>>  0.622000   0.000000   0.622000 (  0.622000)
>>>>>>>  0.622000   0.000000   0.622000 (  0.621000)
>>>>>>>
>> _______________________________________________
>> mlvm-dev mailing list
>> mlvm-dev at openjdk.java.net
>> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
> 

From headius at headius.com  Mon Aug  8 14:29:46 2011
From: headius at headius.com (Charles Oliver Nutter)
Date: Mon, 8 Aug 2011 17:29:46 -0400
Subject: review for 7071307: MethodHandle bimorphic inlining should
	consider the frequency
In-Reply-To: <97DD49F1-0A6B-4F0C-88EA-76C93D054007@oracle.com>
References: <97DD49F1-0A6B-4F0C-88EA-76C93D054007@oracle.com>
Message-ID: <CAE-f1xTU2rbmQ+fzz_fnvw-5woVFausCWZ9UHAO13syWiGdOyg@mail.gmail.com>

On Thu, Jul 28, 2011 at 7:47 PM, Tom Rodriguez <tom.rodriguez at oracle.com> wrote:
> http://cr.openjdk.java.net/~never/7071307
> 46 lines changed: 27 ins; 6 del; 13 mod; 3568 unchg
>
> 7071307: MethodHandle bimorphic inlining should consider the frequency
> Reviewed-by:
>
> The fix for 7050554 added a bimorphic inline path but didn't take into
> account the frequency of the guarding test.  This ends up treating
> both sides of the if as equally frequent which can lead to over
> inlining and overflowing the method inlining limits.  The fix is to
> grab the frequency from the If and apply that to the branches.  This
> addresses a major source of overinlining that can result in bad
> performance with JSR 292.  We may do a later extension to this to
> actually do per call chain profiling of selectAlternative but that's a
> more complicated fix.
>
> I also fixed a problem with the ideal graph printer where debug_orig
> printing would go into an infinite loop.
>
> Tested with jruby and vm.mlvm tests.

Building on Ubuntu (an admittedly old install) yields some warnings
that are turned into errors:

g++ -DLINUX -D_GNU_SOURCE -DIA32 -DPRODUCT -I.
-I/home/headius/hsx-hotspot/src/share/vm/prims
-I/home/headius/hsx-hotspot/src/share/vm
-I/home/headius/hsx-hotspot/src/cpu/x86/vm
-I/home/headius/hsx-hotspot/src/os_cpu/linux_x86/vm
-I/home/headius/hsx-hotspot/src/os/linux/vm
-I/home/headius/hsx-hotspot/src/os/posix/vm -I../generated
-DHOTSPOT_RELEASE_VERSION="\"22.0-b01-internal\""
-DHOTSPOT_BUILD_TARGET="\"product\""
-DHOTSPOT_BUILD_USER="\"headius\"" -DHOTSPOT_LIB_ARCH=\"i386\"
-DJRE_RELEASE_VERSION="\"1.7.0\"" -DHOTSPOT_VM_DISTRO="\"OpenJDK\""
-DTARGET_OS_FAMILY_linux -DTARGET_ARCH_x86 -DTARGET_ARCH_MODEL_x86_32
-DTARGET_OS_ARCH_linux_x86 -DTARGET_OS_ARCH_MODEL_linux_x86_32
-DTARGET_COMPILER_gcc -DCOMPILER2 -DCOMPILER1 -fPIC -fno-rtti
-fno-exceptions -D_REENTRANT -fcheck-new -m32 -march=i586 -pipe -O3
-fno-strict-aliasing -DVM_LITTLE_ENDIAN -Werror -Wpointer-arith
-Wconversion -Wsign-compare    -c -MMD -MP -MF
../generated/dependencies/precompiled.hpp.gch.d -x c++-header
/home/headius/hsx-hotspot/src/share/vm/precompiled.hpp -o
precompiled.hpp.gch
cc1plus: warnings being treated as errors
/home/headius/hsx-hotspot/src/share/vm/ci/ciCallProfile.hpp: In member
function 'ciCallProfile ciCallProfile::rescale(double)':
/home/headius/hsx-hotspot/src/share/vm/ci/ciCallProfile.hpp:87:
warning: converting to 'int' from 'double'
/home/headius/hsx-hotspot/src/share/vm/ci/ciCallProfile.hpp:89:
warning: converting to 'int' from 'double'

The lines in question are doing (int) *= (double), which gcc complains
about. Ubuntu probably has warnings set up to be errors, so it fails
the build.

I modified them in my local copy to do the long form with an explicit
cast back to int, but you can fix in whatever way is best.
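
For illustration, a minimal sketch of the long form with an explicit cast. Counter and rescale here are hypothetical stand-ins, not the HotSpot code, and the exact warning text depends on the gcc version:

```cpp
#include <cassert>

// Hypothetical stand-in for a profile counter; not the HotSpot class.
struct Counter {
  int count;
};

// The compound form below is what -Wconversion complains about, because the
// product is a double that gets narrowed back to int implicitly:
//
//   c.count *= scale;
//
// The long form computes the same value but spells out the truncation.
inline void rescale(Counter& c, double scale) {
  assert(scale >= 0.0 && scale <= 1.0);
  c.count = static_cast<int>(c.count * scale);
}
```

Both forms truncate toward zero, so the change should only silence the warning and not alter the computed counts.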

- Charlie

From tom.rodriguez at oracle.com  Mon Aug  8 14:44:18 2011
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Mon, 8 Aug 2011 14:44:18 -0700
Subject: review for 7071307: MethodHandle bimorphic inlining should
	consider the frequency
In-Reply-To: <CAE-f1xTU2rbmQ+fzz_fnvw-5woVFausCWZ9UHAO13syWiGdOyg@mail.gmail.com>
References: <97DD49F1-0A6B-4F0C-88EA-76C93D054007@oracle.com>
	<CAE-f1xTU2rbmQ+fzz_fnvw-5woVFausCWZ9UHAO13syWiGdOyg@mail.gmail.com>
Message-ID: <7BE903A3-19FF-433C-909C-AAD3105E69D2@oracle.com>

I'll fix that as you suggest.

diff -r a19c671188cb src/share/vm/ci/ciCallProfile.hpp
--- a/src/share/vm/ci/ciCallProfile.hpp
+++ b/src/share/vm/ci/ciCallProfile.hpp
@@ -79,6 +79,17 @@
     assert(i < _limit, "out of Call Profile MorphismLimit");
     return _receiver[i];
   }
+
+  // Rescale the current profile based on the incoming scale
+  ciCallProfile rescale(double scale) {
+    assert(scale >= 0 && scale <= 1.0, "out of range");
+    ciCallProfile call = *this;
+    call._count = (int)(call._count * scale);
+    for (int i = 0; i < _morphism; i++) {
+      call._receiver_count[i] = (int)(call._receiver_count[i] * scale);
+    }
+    return call;
+  }
 };
 
 #endif // SHARE_VM_CI_CICALLPROFILE_HPP
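
As a rough illustration of how a rescale like the one in the diff could be applied to the two sides of a guard, here is a self-contained sketch. Profile and split_by_probability are hypothetical names, and the model keeps only a total count, not the per-receiver counts:

```cpp
#include <cassert>
#include <utility>

// Simplified, hypothetical model of a call profile: a total count only.
struct Profile {
  int count;

  // Same shape as rescale() in the diff: copy, then scale with an explicit cast.
  Profile rescale(double scale) const {
    assert(scale >= 0.0 && scale <= 1.0);
    Profile scaled = *this;
    scaled.count = static_cast<int>(scaled.count * scale);
    return scaled;
  }
};

// Split a profile across the two sides of a guard that is taken with
// probability prob, instead of treating both sides as equally frequent.
inline std::pair<Profile, Profile> split_by_probability(const Profile& p,
                                                        double prob) {
  return { p.rescale(prob), p.rescale(1.0 - prob) };
}
```

With a count of 1000 and a taken probability of 0.75, the taken side sees 750 calls and the other side 250, so the rarely taken side looks less attractive to the inliner.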

I haven't pushed this yet because I was seeing some cases where the if's weren't ordered how I expect and I'm still trying to figure out if this is me or something odd in jruby.  I should get to the bottom of this today.

tom

On Aug 8, 2011, at 2:29 PM, Charles Oliver Nutter wrote:

> On Thu, Jul 28, 2011 at 7:47 PM, Tom Rodriguez <tom.rodriguez at oracle.com> wrote:
>> http://cr.openjdk.java.net/~never/7071307
>> 46 lines changed: 27 ins; 6 del; 13 mod; 3568 unchg
>> 
>> 7071307: MethodHandle bimorphic inlining should consider the frequency
>> Reviewed-by:
>> 
>> The fix for 7050554 added a bimorphic inline path but didn't take into
>> account the frequency of the guarding test.  This ends up treating
>> both sides of the if as equally frequent which can lead to over
>> inlining and overflowing the method inlining limits.  The fix is to
>> grab the frequency from the If and apply that to the branches.  This
>> addresses a major source of overinlining that can result in bad
>> performance with JSR 292.  We may do a later extension to this to
>> actually do per call chain profiling of selectAlternative but that's a
>> more complicated fix.
>> 
>> I also fixed a problem with the ideal graph printer where debug_orig
>> printing would go into an infinite loop.
>> 
>> Tested with jruby and vm.mlvm tests.
> 
> Building on Ubuntu (an admittedly old install) yields some warnings
> that are turned into errors:
> 
> g++ -DLINUX -D_GNU_SOURCE -DIA32 -DPRODUCT -I.
> -I/home/headius/hsx-hotspot/src/share/vm/prims
> -I/home/headius/hsx-hotspot/src/share/vm
> -I/home/headius/hsx-hotspot/src/cpu/x86/vm
> -I/home/headius/hsx-hotspot/src/os_cpu/linux_x86/vm
> -I/home/headius/hsx-hotspot/src/os/linux/vm
> -I/home/headius/hsx-hotspot/src/os/posix/vm -I../generated
> -DHOTSPOT_RELEASE_VERSION="\"22.0-b01-internal\""
> -DHOTSPOT_BUILD_TARGET="\"product\""
> -DHOTSPOT_BUILD_USER="\"headius\"" -DHOTSPOT_LIB_ARCH=\"i386\"
> -DJRE_RELEASE_VERSION="\"1.7.0\"" -DHOTSPOT_VM_DISTRO="\"OpenJDK\""
> -DTARGET_OS_FAMILY_linux -DTARGET_ARCH_x86 -DTARGET_ARCH_MODEL_x86_32
> -DTARGET_OS_ARCH_linux_x86 -DTARGET_OS_ARCH_MODEL_linux_x86_32
> -DTARGET_COMPILER_gcc -DCOMPILER2 -DCOMPILER1 -fPIC -fno-rtti
> -fno-exceptions -D_REENTRANT -fcheck-new -m32 -march=i586 -pipe -O3
> -fno-strict-aliasing -DVM_LITTLE_ENDIAN -Werror -Wpointer-arith
> -Wconversion -Wsign-compare    -c -MMD -MP -MF
> ../generated/dependencies/precompiled.hpp.gch.d -x c++-header
> /home/headius/hsx-hotspot/src/share/vm/precompiled.hpp -o
> precompiled.hpp.gch
> cc1plus: warnings being treated as errors
> /home/headius/hsx-hotspot/src/share/vm/ci/ciCallProfile.hpp: In member
> function 'ciCallProfile ciCallProfile::rescale(double)':
> /home/headius/hsx-hotspot/src/share/vm/ci/ciCallProfile.hpp:87:
> warning: converting to 'int' from 'double'
> /home/headius/hsx-hotspot/src/share/vm/ci/ciCallProfile.hpp:89:
> warning: converting to 'int' from 'double'
> 
> The lines in question are doing (int) *= (double), which gcc complains
> about. Ubuntu probably has warnings set up to be errors, so it fails
> the build.
> 
> I modified them in my local copy to do the long form with an explicit
> cast back to int, but you can fix in whatever way is best.
> 
> - Charlie


From tom.rodriguez at oracle.com  Mon Aug  8 20:44:51 2011
From: tom.rodriguez at oracle.com (tom.rodriguez at oracle.com)
Date: Tue, 09 Aug 2011 03:44:51 +0000
Subject: hg: hsx/hotspot-comp/hotspot: 7075623: 6990212 broke raiseException
	in 64 bit
Message-ID: <20110809034457.B64E147A38@hg.openjdk.java.net>

Changeset: a19c671188cb
Author:    never
Date:      2011-08-08 13:19 -0700
URL:       http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/a19c671188cb

7075623: 6990212 broke raiseException in 64 bit
Reviewed-by: kvn, twisti

! src/cpu/sparc/vm/methodHandles_sparc.cpp
! src/cpu/x86/vm/methodHandles_x86.cpp


From christian.thalinger at oracle.com  Tue Aug  9 04:33:29 2011
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Tue, 9 Aug 2011 13:33:29 +0200
Subject: Request for review (L): 7071653: JSR 292: call site change
	notification should be pushed not pulled
In-Reply-To: <E09300F6-0DBF-4AA2-9245-C84F4BE6B10F@oracle.com>
References: <34EE7AEC-FD11-4526-B49D-DCEA296E767A@oracle.com>
	<E09300F6-0DBF-4AA2-9245-C84F4BE6B10F@oracle.com>
Message-ID: <6908A407-5908-4B30-8540-E6474B96DBA9@oracle.com>


On Aug 8, 2011, at 8:49 PM, Tom Rodriguez wrote:

> dependencies.cpp:
> 
> in check_call_site_target_value, the changes == NULL case should be checking that the call site hasn't changed.  It should probably look more like this:
> 
> klassOop Dependencies::check_call_site_target_value(klassOop ctxk, oop call_site, CallSiteDepChange* changes) {
>  assert(call_site->is_a(SystemDictionary::CallSite_klass()), "sanity");
>  // Same CallSite object but different target?  Check this specific call site
>  //  if changes is non-NULL or validate all CallSites
>  if ((changes == NULL || (call_site == changes->call_site())) &&
>      (java_lang_invoke_CallSite::target(call_site) != changes->method_handle())) {
>    return ctxk;  // assertion failed
>  }
>  assert(java_lang_invoke_CallSite::target(call_site) == changes->method_handle(), "should still be valid");
>  return NULL;  // assertion still valid
> }

I see your point.  But the code above is broken as changes->method_handle() will not work when changes == NULL.  One of my first versions of this code also stored the MethodHandle target in the dependence stream which seems to be required when we want to validate all CallSites.  Something like this:

! klassOop Dependencies::check_call_site_target_value(klassOop ctxk, oop call_site, oop method_handle, CallSiteDepChange* changes) {
+   assert(call_site    ->is_a(SystemDictionary::CallSite_klass()),     "sanity");
+   assert(method_handle->is_a(SystemDictionary::MethodHandle_klass()), "sanity");
+   if (changes == NULL) {
+     // Validate all CallSites
+     if (java_lang_invoke_CallSite::target(call_site) != method_handle)
+       return ctxk;  // assertion failed
+   } else {
+     // Validate the given CallSite
+     if (call_site == changes->call_site() && java_lang_invoke_CallSite::target(call_site) != changes->method_handle()) {
+       assert(method_handle != changes->method_handle(), "must be");
+       return ctxk;  // assertion failed
+     }
+   }
+   assert(java_lang_invoke_CallSite::target(call_site) == method_handle, "should still be valid");
+   return NULL;  // assertion still valid
+ }
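
A self-contained model of the two validation modes above, for illustration only. MethodHandle, CallSite and CallSiteChange are hypothetical stand-ins for the HotSpot types, and the result is a bool where the real function returns a witness klassOop:

```cpp
#include <cassert>

// Hypothetical stand-ins for HotSpot types; only object identity matters here.
struct MethodHandle {};
struct CallSite {
  const MethodHandle* target;      // current target of the call site
};
struct CallSiteChange {
  const CallSite* call_site;       // call site that is being updated
  const MethodHandle* new_target;  // target it is being updated to
};

// Returns true while the dependency still holds, false when dependent
// compiled code would have to be invalidated.
//  - changes == nullptr: compare against the target recorded at compile time.
//  - changes != nullptr: only the call site named in the change can fail.
inline bool call_site_dependency_holds(const CallSite& site,
                                       const MethodHandle* recorded_target,
                                       const CallSiteChange* changes) {
  assert(recorded_target != nullptr);
  if (changes == nullptr) {
    return site.target == recorded_target;
  }
  if (&site == changes->call_site && site.target != changes->new_target) {
    return false;
  }
  return true;
}
```

The point of the extra recorded_target argument is that the changes == nullptr path has nothing else to compare the current target against.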

> 
> The final assert is just a paranoia check that a call site hasn't changed without the dependencies being checked.
> 
> interpreterRuntime.cpp:
> 
> Please move the dependence check code into universe with the other dependence check code.

Where it says:

// %%% The Universe::flush_foo methods belong in CodeCache.

:-)

>   Also add some comments explaining why it's doing what it's doing.

Done.

> 
> doCall.cpp:
> 
> Can you put in a comment explaining that VolatileCallSite is never inlined.

Done.

> 
> Otherwise it looks good.

webrev updated.

-- Christian

> 
> tom
> 
> 
> On Aug 5, 2011, at 6:32 AM, Christian Thalinger wrote:
> 
>> http://cr.openjdk.java.net/~twisti/7071653
>> 
>> 7071653: JSR 292: call site change notification should be pushed not pulled
>> Reviewed-by:
>> 
>> Currently every speculatively inlined method handle call site has a
>> guard that compares the current target of the CallSite object to the
>> inlined one.  This per-invocation overhead can be removed if the
>> notification is changed from pulled to pushed (i.e. deoptimization).
>> 
>> I had to change the logic in TemplateTable::patch_bytecode to skip
>> bytecode quickening for putfield instructions when the put_code
>> written to the constant pool cache is zero.  This is required so that
>> every execution of a putfield to CallSite.target calls out to
>> InterpreterRuntime::resolve_get_put to do the deoptimization of
>> depending compiled methods.
>> 
>> I also had to change the dependency machinery to understand other
>> dependencies than class hierarchy ones.  DepChange got the super-type
>> of two new dependencies, KlassDepChange and CallSiteDepChange.
>> 
>> Tested with JRuby tests and benchmarks, hand-written testcases, JDK
>> tests and vm.mlvm tests.
>> 
>> Here is the speedup for the JRuby fib benchmark (first is JDK 7 b147,
>> second with 7071653).  Since the CallSite targets don't change during
>> the runtime of this benchmark we can see the performance benefit of
>> eliminating the guard:
>> 
>> $ jruby --server bench/bench_fib_recursive.rb 5 35
>> 0.883000   0.000000   0.883000 (  0.854000)
>> 0.715000   0.000000   0.715000 (  0.715000)
>> 0.712000   0.000000   0.712000 (  0.712000)
>> 0.713000   0.000000   0.713000 (  0.713000)
>> 0.713000   0.000000   0.713000 (  0.712000)
>> 
>> $ jruby --server bench/bench_fib_recursive.rb 5 35
>> 0.772000   0.000000   0.772000 (  0.742000)
>> 0.624000   0.000000   0.624000 (  0.624000)
>> 0.621000   0.000000   0.621000 (  0.621000)
>> 0.622000   0.000000   0.622000 (  0.622000)
>> 0.622000   0.000000   0.622000 (  0.621000)
>> 
> 


From rednaxelafx at gmail.com  Tue Aug  9 06:14:37 2011
From: rednaxelafx at gmail.com (Krystal Mok)
Date: Tue, 9 Aug 2011 21:14:37 +0800
Subject: A failed attempt to add Phi::exact_type() to C1
Message-ID: <CA+cQ+tQmK1CEZpaBBxq9FN2jF46RcTKv4KhrsCDGc9o5SE_x=A@mail.gmail.com>

Hi all,

I tried to add an implementation of Phi::exact_type() to C1 last weekend,
but failed. I'd like to share my experience. Any comment would be
appreciated.

I was reading a blog post with a microbenchmark [1]. The microbenchmark, when
run on the client VM, triggers an OSR compilation of Client1.main() by C1;
none of the method invocations on the local variable "list" were inlined. At
first I thought it was weird: since there's only one definition of "list",
all of its uses should know its exact type, and thus should be able to get
inlined. I tried moving the code in main() to another method, so that it can
get a standard compilation, and found those methods were indeed inlined when
standard compiled.

So apparently the difference had to do with OSRs. Then I realized it was the
extra Phi introduced by the OSR entry that lost the exact type information.
And it isn't just in OSRs: Phi nodes in C1 HIR always lose exact type
information, because Phi doesn't override Instruction::exact_type(). C1 will
not inline "list.size()" in the code snippet below, which contains diamond
control flow, with different definitions of the same variable:

public static void test(String[] args) {
  List<?> list;
  if (args.length % 2 == 0) {
    list = new ArrayList<String>();   // a1
  } else {
    list = new ArrayList<String>(32); // a2
  }

  // a3 = Phi(a1, a2)
  int size = list.size(); // a3.invokeinterface() java/util/List.size()I
  System.out.println(size);
}

Even though the local variable "list" always holds a reference to an ArrayList
instance, the Phi node stops the exact type information from flowing through
it, so the "list.size()" call site can't be inlined.

I thought I'd be able to fix the problem by adding an implementation of
Phi::exact_type(), and I made a patch, available at [2].

The basic idea is simple: if all operands of a Phi node agree on a single
exact type, use it as the exact type of this Phi.
Some assumptions:
1. Because C1's HIR is in SSA form, the only kind of node that can have
cycles in the data dependence graph is Phi. Cycles have to be broken when
recursively traversing the operands of a Phi node. If a cycle is found, I'll
just give up finding the exact type of this Phi.
2. Unlike C2, which prunes the part of the graph not reachable from the OSR
entry point in OSR compilations, C1 always sees the whole graph of a method,
regardless of whether it is a standard or an OSR compilation. If a variable
needs a Phi and the operands don't agree on a single exact type, a standard
compilation would have noticed; otherwise, if a variable doesn't need a Phi,
or it needs a Phi but the operands agree on a single exact type, that should
still hold in an OSR compilation. So, if an operand of a Phi node is an
UnsafeGetRaw (which can only be introduced in an OSR entry), skipping it
should be safe.
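
The idea and the cycle handling from assumption 1 can be sketched as follows. This is a heavily simplified model, not the patch in [2]: Value, exact_type and exact_type_of are hypothetical names, types are plain strings, and the UnsafeGetRaw skipping from assumption 2 is left out:

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical, heavily simplified model of an HIR value; not C1's Instruction.
struct Value {
  bool is_phi = false;
  std::string exact;             // exact type of a non-phi value, empty if unknown
  std::vector<Value*> operands;  // operands of a phi
};

// Recursive helper. on_path holds the phis currently being visited, so
// reaching one of them again means the traversal has found a cycle.
inline std::string exact_type(const Value* v,
                              std::unordered_set<const Value*>& on_path) {
  assert(v != nullptr);
  if (!v->is_phi) return v->exact;
  if (v->operands.empty()) return "";          // nothing to agree on
  if (!on_path.insert(v).second) return "";    // cycle found: give up
  std::string agreed;
  for (const Value* op : v->operands) {
    std::string t = exact_type(op, on_path);
    if (t.empty() || (!agreed.empty() && t != agreed)) {
      agreed.clear();                          // unknown or conflicting operand
      break;
    }
    agreed = t;
  }
  on_path.erase(v);
  return agreed;
}

// Entry point: exact type of v, or an empty string if none can be proven.
inline std::string exact_type_of(const Value* v) {
  std::unordered_set<const Value*> on_path;
  return exact_type(v, on_path);
}
```

For the diamond example, a phi over two ArrayList allocations yields "ArrayList", while a phi whose operands disagree, or which reaches itself, yields the empty string.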

Applying the patch did allow the affected call sites to get inlined, in the
microbenchmark in [1]. The diamond control flow example got the
"list.size()" call site inlined as well.

But, the patch had a fatal bug. The Java code example in [2] demonstrates
that bug.
C1 builds the HIR graph incrementally; inline decisions are made as a part
of the HIR building process. When C1's GraphBuilder sees an invoke*
bytecode, it'll try to devirtualize the call site by asking for the
receiver's exact_type(). But the relationships between Phi nodes may still
be incomplete by then, so the Phi::exact_type() in my patch may return
premature (and thus incorrect) results.
In the example in [2], the "list.size()" call site at line 18 inlined
java.util.ArrayList.size(), which is incorrect. The HIR log shows that when
GraphBuilder tried to inline this call site, the receiver (list4 in code
comment) had only one operand (list3 in code comment), which covers the
first two definitions of "list" (list1 and list2) but missed the third one
(list5). The connection between list4 and list5 was added later, too late.
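
The ordering problem can be reproduced in a toy model. Phi here is a hypothetical struct that only records the exact types of the operands wired up so far, and the type chosen for list5 is an assumption made for the sake of the example:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model: a phi that only knows the operands connected so far.
struct Phi {
  std::vector<std::string> operand_types;

  // Exact type if all operands seen so far agree, empty string otherwise.
  std::string exact_type() const {
    if (operand_types.empty()) return "";
    for (const std::string& t : operand_types) {
      if (t != operand_types.front()) return "";
    }
    return operand_types.front();
  }
};

// Replays the order of events: the query happens before the last operand
// is connected, so the first answer is later invalidated.
inline void demonstrate_premature_answer() {
  Phi list4;
  list4.operand_types.push_back("ArrayList");   // list3, known at the call site
  assert(list4.exact_type() == "ArrayList");    // inlining decision made here
  list4.operand_types.push_back("LinkedList");  // list5, connected later (assumed type)
  assert(list4.exact_type().empty());           // earlier answer no longer holds
}
```

Any answer computed from an incomplete operand list is only safe if it can be revisited once the graph is complete, which is what the incremental builder does not do.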

So, the patch doesn't work.

My questions:
1. Any ideas on how to implement a correct Phi::exact_type() that conforms
to the way HIR graph is built now?
2. If the inline decisions are decoupled from the HIR graph building phase,
and pushed to a later phase, would it significantly slow down/complicate C1?
If inlining were done later, it could have a much better chance of inlining
more stuff. Besides, it might allow policy-controlled iterations of other
optimizations + inlining, so in tiered mode warm methods may get finer-grained
control of optimizations, resulting in better code quality (suppose a method
couldn't go to tier 4, or got deopt'd and fell to tier 1).

Regards,
Kris Mok

[1]: http://icyfenix.iteye.com/blog/1110279
[2]: https://gist.github.com/1133678

From tom.rodriguez at oracle.com  Tue Aug  9 14:02:07 2011
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Tue, 9 Aug 2011 14:02:07 -0700
Subject: Request for reviews (L): 7063629: use cbcond in C2 generated code
	on T4
In-Reply-To: <4E3B452E.10509@oracle.com>
References: <4E3B452E.10509@oracle.com>
Message-ID: <FFE3DD2B-A68C-4781-A3B9-EF642E7E484F@oracle.com>

This looks really good.

This might be for another day but now that label must be non-NULL, maybe it should be a Label& instead of a Label*.  That would make it easier to use it directly during code generation, as in:

+     __ jmpb($labl$$label);

sparc.ad:

It might be nice to factor this out:

      Assembler::Predict predict_taken =
+       cbuf.is_backward_branch(*L) ? Assembler::pt : Assembler::pn;
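
One possible shape for such a helper, as a sketch only. Predict and predict_for are hypothetical stand-ins for Assembler::Predict and whatever the factored-out function would be called:

```cpp
// Hypothetical stand-in for Assembler::Predict.
enum class Predict { pn, pt };

// Backward branches (typically loop back-edges) are predicted taken,
// forward branches are predicted not taken.
constexpr Predict predict_for(bool is_backward_branch) {
  return is_backward_branch ? Predict::pt : Predict::pn;
}
```

A single helper would also give one place to switch to a different policy later, such as one driven by profile data.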

x86_32.ad:

Would you be averse to inlining Jcc and JccShort?

output.cpp:

Why does the first round of shorten_branches occur in the middle of init_buffer?  Couldn't it be done right afterwards?  It's just odd that it's buried inside there.

That first round is conservative since we haven't done all padding yet, right?  Then shorten_branches_final does a last pass based on the real offsets?  shorten_branches_final isn't a great name.  Maybe finalize_offsets_and_shorten?

The core shorten branch logic is duplicated in those functions.  Could it be factored out or is there too much local state?

Why was this needed?

*** 2182,2192 ****
--- 2383,2393 ----
        (op != Op_Node &&         // Not an unused antidepedence node and
         // not an unallocated boxlock
         (OptoReg::is_valid(_regalloc->get_reg_first(n)) || op != Op_BoxLock)) ) {
  
      // Push any trailing projections
!     if( bb->_nodes[bb->_nodes.size()-1] != n ) {
!     if( bb->_nodes[_bb_end-1] != n ) {
        for (DUIterator_Fast imax, i = n->fast_outs(imax); i < imax; i++) {
          Node *foi = n->fast_out(i);
          if( foi->is_Proj() )
            _scheduled.push(foi);
        }

That code is complicated enough that I can't reason about its correctness from a webrev.  Is this because of the trailing NOPs?

Can you add this comment to that last anti_do_def piece I added:

// kill projections on a branch should appear to occur on the
// branch, not afterwards, so grab the masks from the projections
// and process them.

tom


On Aug 4, 2011, at 6:19 PM, Vladimir Kozlov wrote:

> http://cr.openjdk.java.net/~kvn/7063629/webrev
> 
> 7063629: use cbcond in C2 generated code on T4
> 
> The code is finally shaped as I want and it passed CTW, regression, nsk tests on T4 and x86.
> 
> Added new fused compare-and-branch instructions to sparc.ad and corresponding short versions which use the cbcond instruction. Added a new flag, avoid_back_to_back, to avoid generating cbcond instructions back to back.
> 
> Split shorten_branches() into 2 methods. The first method conservatively estimates code size and branch locations and does a few rounds of branch shortening. It is executed before ScheduleAndBundle(). Step 3 is moved to a new method, shorten_branches_final(), called after ScheduleAndBundle(). It does final padding, alignment and final branch replacement. Method fill_buffer() does verification instead of padding.
> 
> Labels are now bound only during code generation in fill_buffer(). As a result they are not available when forward branches are emitted. To fix that, MacroAssembler branch instructions are now used in x86 .ad files. I replaced the unused rtype parameter with a maybe_short flag to force using only long branches in .ad long branch instructions.
> 
> Added a check to adlc to verify that the short version of a branch instruction has the same declaration in the .ad file.
> 
> Added an assert to verify that the size of an emitted instruction matches the value returned by MachNode::size(). Found that MachBreakpointNode::size() returned an incorrect value on x64.
> 
> Fixed loop alignment for Sparc (min alignment should be the instruction size, which is 4 bytes, instead of 1 byte).
> 
> The prototype was done by Tom and I took some of his additional fixes. The block changes go with some code in output to put opto assembly style block comments in the PrintNMethods output. There's also a snippet in there that deals with the fact that kill projections on branches make it appear the kill occurs after the branch instead of being part of it.


From vladimir.kozlov at oracle.com  Tue Aug  9 15:55:37 2011
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 09 Aug 2011 15:55:37 -0700
Subject: Request for reviews (L):  7063629: use cbcond in C2 generated
	code on T4
In-Reply-To: <FFE3DD2B-A68C-4781-A3B9-EF642E7E484F@oracle.com>
References: <4E3B452E.10509@oracle.com>
	<FFE3DD2B-A68C-4781-A3B9-EF642E7E484F@oracle.com>
Message-ID: <4E41BAE9.1070505@oracle.com>

Thank you, Tom

Tom Rodriguez wrote:
> This looks really good.
> 
> This might be for another day but now that label must be non-NULL, maybe it should be a Label& instead of a Label*.  That would make it easier to use it directly during code generation, as in:
> 
> +     __ jmpb($labl$$label);

Yes, I would leave it for another time. I will file an RFE.

> 
> sparc.ad:
> 
> It might be nice to factor this out:
> 
>       Assembler::Predict predict_taken =
> +       cbuf.is_backward_branch(*L) ? Assembler::pt : Assembler::pn;

I will file an RFE for that: use the probability from IfNode to determine the
pt value as you suggested before.

> 
> x86_32.ad:
> 
> Would you be averse to inlining Jcc and JccShort?

I did not realize that it is just one instruction now :)
They are used in a lot of places and I did not want to duplicate the original 
code. I will inline them now.

> 
> output.cpp:
> 
> Why does the first round of shorten_branches occur in the middle of init_buffer?  Couldn't it be done right afterwards?  It's just odd that it's buried inside there.

The first loop in shorten_branches() estimates the code, locals and stubs
sizes, which are used later in init_buffer() to allocate the CodeBuffer. I
would need to split the shorten_branches() method, which is not easy since the
first loop also collects information about branches which could be replaced.

> 
> That first round is conservative since we haven't done all padding yet, right?

Correct.

> Then shorten_branches_final does a last pass based on the real offsets? 

Yes, backward branches inserted in this method use final offsets. For forward 
branches we still have only conservative offsets since following blocks are not 
processed yet.
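
As a rough picture of what "conservative offsets plus a few rounds of shortening" means, here is a toy model. It is not the code in output.cpp: Block, the encoding sizes and the displacement limits are assumptions picked for illustration, and every block is assumed to end in exactly one branch:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: every block has some body bytes and ends in one branch.
struct Block {
  int body_size;       // bytes in the block before its branch
  std::size_t target;  // index of the block the branch jumps to
  bool is_short;       // true once the branch uses the short encoding
};

// Illustrative encoding sizes and reach (for example x86 jmp rel32 vs. jmp rel8).
constexpr int kLongBranch = 5;
constexpr int kShortBranch = 2;
constexpr int kMinShortDisp = -128;
constexpr int kMaxShortDisp = 127;

// Shortens branches until no more of them fit. Offsets are computed from the
// current encodings, so they are upper bounds: shortening one branch can only
// bring other branches closer to their targets. Returns the number shortened.
inline int shorten_branches(std::vector<Block>& blocks) {
  int shortened = 0;
  bool progress = true;
  while (progress) {
    progress = false;
    // start[i] is the offset of block i; start[i + 1] is the end of its branch.
    std::vector<int> start(blocks.size() + 1, 0);
    for (std::size_t i = 0; i < blocks.size(); i++) {
      int branch = blocks[i].is_short ? kShortBranch : kLongBranch;
      start[i + 1] = start[i] + blocks[i].body_size + branch;
    }
    for (std::size_t i = 0; i < blocks.size(); i++) {
      if (blocks[i].is_short) continue;
      assert(blocks[i].target < blocks.size());
      int disp = start[blocks[i].target] - start[i + 1];
      if (disp >= kMinShortDisp && disp <= kMaxShortDisp) {
        blocks[i].is_short = true;
        shortened++;
        progress = true;
      }
    }
  }
  return shortened;
}
```

Because the estimates only ever shrink, a branch that is judged to fit in one round still fits after later rounds, which is why running the early rounds on estimated sizes is safe.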

> shorten_branches_final isn't a great name.  Maybe finalize_offsets_and_shorten?

I also did not like it. I will use finalize_offsets_and_shorten().

> 
> The core shorten branch logic is duplicated in those functions.  Could it be factored out or is there too much local state?

I thought about it but as you said "too much local state".

> 
> Why was this needed?
> 
> *** 2182,2192 ****
> --- 2383,2393 ----
>         (op != Op_Node &&         // Not an unused antidepedence node and
>          // not an unallocated boxlock
>          (OptoReg::is_valid(_regalloc->get_reg_first(n)) || op != Op_BoxLock)) ) {
>   
>       // Push any trailing projections
> !     if( bb->_nodes[bb->_nodes.size()-1] != n ) {
> !     if( bb->_nodes[_bb_end-1] != n ) {
>         for (DUIterator_Fast imax, i = n->fast_outs(imax); i < imax; i++) {
>           Node *foi = n->fast_out(i);
>           if( foi->is_Proj() )
>             _scheduled.push(foi);
>         }
> 
> That code is complicated enough that I can't reason about its correctness from a webrev.  Is this because of the trailing NOPs?

I hit the following assert during development because the loop above pushed
nodes which are not to be scheduled.

     assert( _scheduled.size() == _bb_end - _bb_start, "wrong number of instructions" );

It may have happened before I split shorten_branches() and there were trailing
NOPs. But it is not only trailing NOPs, it is also projections after calls and
MachNullCheck nodes (see code in DoScheduling()). I think in general the check
above should look at the last node to be scheduled and not the last node in
the block.

> 
> Can you add this comment to that last anti_do_def piece I added:
> 
> // kill projections on a branch should appear to occur on the
> // branch, not afterwards, so grab the masks from the projections
> // and process them.

Done.

Thanks,
Vladimir

> 
> tom
> 
> 
> On Aug 4, 2011, at 6:19 PM, Vladimir Kozlov wrote:
> 
>> http://cr.openjdk.java.net/~kvn/7063629/webrev
>>
>> 7063629: use cbcond in C2 generated code on T4
>>
>> The code is finally shaped as I want and it passed CTW, regression, nsk tests on T4 and x86.
>>
>> Added new fused compare and branch instructions into sparc.ad and corresponding short versions which use cbcond instruction. Added new flag avoid_back_to_back to avoid generation of cbcond back to back.
>>
>> Split shorten_branches() into 2 methods. The first method conservatively estimates code size and branch locations and does a few rounds of branch shortening. It is executed before ScheduleAndBundle(). Step 3 is moved to a new method, shorten_branches_final(), called after ScheduleAndBundle(). It does final padding, alignment and final branch replacement. Method fill_buffer() does verification instead of padding.
>>
>> Labels are now bound only during code generation in fill_buffer(). As a result they are not available when forward branches are emitted. To fix that, MacroAssembler branch instructions are now used in x86 .ad files. I replaced the unused rtype parameter with a maybe_short flag to force using only long branches in .ad long branch instructions.
>>
>> Added a check to adlc to verify that the short version of a branch instruction has the same declaration in the .ad file.
>>
>> Added assert to verify that the size of emitted instruction matches the value returned by MachNode::size(). Found that MachBreakpointNode::size() returned incorrect value on x64.
>>
>> Fixed loop alignment for Sparc (min alignment should be instruction size which is 4 bytes instead of 1 byte).
>>
>> The prototype was done by Tom and I took some of his additional fixes. The block changes go with some code in output to put opto assembly style block comments in the PrintNMethods output. There's also a snippet in there that deals with the fact that kill projections on branches make it appear that the kill occurs after the branch instead of being part of it.
> 
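
The two-phase scheme described in the quoted message (start from conservative
long branches, then shorten while offsets allow) can be illustrated with a
minimal sketch. Everything here is hypothetical and is not the code in
output.cpp: the Instr struct, the kShortReach limit and the function name are
assumptions made for illustration, and padding and alignment are ignored.

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical instruction record, not a HotSpot type.
struct Instr {
  int size;        // current encoded size in bytes
  int short_size;  // size of the short branch form, or 0 if not a branch
  int target;      // index of the target instruction (branches only)
};

// Assumed reach of a short branch, in bytes.
const int kShortReach = 127;

// Starts from the conservative (long) sizes and repeatedly replaces long
// branches with short ones while the current offsets allow it.
// Returns the final code size in bytes.
int shorten_branches_sketch(std::vector<Instr>& code) {
  std::vector<int> offset(code.size() + 1);
  bool changed = true;
  while (changed) {
    changed = false;
    // Recompute the start offset of every instruction from current sizes.
    offset[0] = 0;
    for (std::size_t i = 0; i < code.size(); i++) {
      offset[i + 1] = offset[i] + code[i].size;
    }
    for (std::size_t i = 0; i < code.size(); i++) {
      Instr& in = code[i];
      if (in.short_size == 0 || in.size == in.short_size) continue;
      int dist = std::abs(offset[in.target] - offset[i]);
      if (dist <= kShortReach) {
        in.size = in.short_size;  // the short form is safe at these offsets
        changed = true;
      }
    }
  }
  return offset[code.size()];
}
```

Without padding, shortening only ever reduces distances, so a branch that
fits under the conservative estimate still fits afterwards. Alignment padding
breaks that monotonicity, which is presumably why the real code needs a final
pass once the offsets are known.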

From roland.westrelin at oracle.com  Wed Aug 10 05:00:28 2011
From: roland.westrelin at oracle.com (roland.westrelin at oracle.com)
Date: Wed, 10 Aug 2011 12:00:28 +0000
Subject: hg: hsx/hotspot-comp/hotspot: 7074017: Introduce
	MemBarAcquireLock/MemBarReleaseLock nodes for monitor
	enter/exit code paths
Message-ID: <20110810120032.DC88047A8F@hg.openjdk.java.net>

Changeset: f1c12354c3f7
Author:    roland
Date:      2011-08-02 18:36 +0200
URL:       http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/f1c12354c3f7

7074017: Introduce MemBarAcquireLock/MemBarReleaseLock nodes for monitor enter/exit code paths
Summary: replace MemBarAcquire/MemBarRelease nodes on the monitor enter/exit code paths with new MemBarAcquireLock/MemBarReleaseLock nodes
Reviewed-by: kvn, twisti

! src/cpu/sparc/vm/sparc.ad
! src/cpu/x86/vm/x86_32.ad
! src/cpu/x86/vm/x86_64.ad
! src/share/vm/adlc/formssel.cpp
! src/share/vm/opto/classes.hpp
! src/share/vm/opto/graphKit.cpp
! src/share/vm/opto/macro.cpp
! src/share/vm/opto/matcher.cpp
! src/share/vm/opto/matcher.hpp
! src/share/vm/opto/memnode.cpp
! src/share/vm/opto/memnode.hpp


From tom.rodriguez at oracle.com  Wed Aug 10 12:28:07 2011
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Wed, 10 Aug 2011 12:28:07 -0700
Subject: Request for review (L): 7071653: JSR 292: call site change
	notification should be pushed not pulled
In-Reply-To: <6908A407-5908-4B30-8540-E6474B96DBA9@oracle.com>
References: <34EE7AEC-FD11-4526-B49D-DCEA296E767A@oracle.com>
	<E09300F6-0DBF-4AA2-9245-C84F4BE6B10F@oracle.com>
	<6908A407-5908-4B30-8540-E6474B96DBA9@oracle.com>
Message-ID: <15BDEB85-0323-4026-A249-D979D88E863B@oracle.com>


On Aug 9, 2011, at 4:33 AM, Christian Thalinger wrote:

> 
> On Aug 8, 2011, at 8:49 PM, Tom Rodriguez wrote:
> 
>> dependencies.cpp:
>> 
>> in check_call_site_target_value, the changes == NULL case should be checking that the call site hasn't changed.  It should probably look more like this:
>> 
>> klassOop Dependencies::check_call_site_target_value(klassOop ctxk, oop call_site, CallSiteDepChange* changes) {
>> assert(call_site->is_a(SystemDictionary::CallSite_klass()), "sanity");
>> // Same CallSite object but different target?  Check this specific call site
>> //  if changes is non-NULL or validate all CallSites
>> if ((changes == NULL || (call_site == changes->call_site())) &&
>>     (java_lang_invoke_CallSite::target(call_site) != changes->method_handle())) {
>>   return ctxk;  // assertion failed
>> }
>> assert(java_lang_invoke_CallSite::target(call_site) == changes->method_handle(), "should still be valid");
>> return NULL;  // assertion still valid
>> }
> 
> I see your point.  But the code above is broken as changes->method_handle() will not work when changes == NULL.  One of my first versions of this code also stored the MethodHandle target in the dependence stream which seems to be required when we want to validate all CallSites.  Something like this

Yes, that's right.  The new webrev looks good.

tom


> 
> ! klassOop Dependencies::check_call_site_target_value(klassOop ctxk, oop call_site, oop method_handle, CallSiteDepChange* changes) {
> +   assert(call_site    ->is_a(SystemDictionary::CallSite_klass()),     "sanity");
> +   assert(method_handle->is_a(SystemDictionary::MethodHandle_klass()), "sanity");
> +   if (changes == NULL) {
> +     // Validate all CallSites
> +     if (java_lang_invoke_CallSite::target(call_site) != method_handle)
> +       return ctxk;  // assertion failed
> +   } else {
> +     // Validate the given CallSite
> +     if (call_site == changes->call_site() && java_lang_invoke_CallSite::target(call_site) != changes->method_handle()) {
> +       assert(method_handle != changes->method_handle(), "must be");
> +       return ctxk;  // assertion failed
> +     }
> +   }
> +   assert(java_lang_invoke_CallSite::target(call_site) == method_handle, "should still be valid");
> +   return NULL;  // assertion still valid
> + }
> 
>> 
>> The final assert is just a paranoia check that a call site hasn't changed without the dependencies being checked.
>> 
>> interpreterRuntime.cpp:
>> 
>> Please move the dependence check code into universe with the other dependence check code.
> 
> Where it says:
> 
> // %%% The Universe::flush_foo methods belong in CodeCache.
> 
> :-)
> 
>>  Also add some comments explaining why it's doing what it's doing.
> 
> Done.
> 
>> 
>> doCall.cpp:
>> 
>> Can you put in a comment explaining that VolatileCallSite is never inlined.
> 
> Done.
> 
>> 
>> Otherwise it looks good.
> 
> webrev updated.
> 
> -- Christian
> 
>> 
>> tom
>> 
>> 
>> On Aug 5, 2011, at 6:32 AM, Christian Thalinger wrote:
>> 
>>> http://cr.openjdk.java.net/~twisti/7071653
>>> 
>>> 7071653: JSR 292: call site change notification should be pushed not pulled
>>> Reviewed-by:
>>> 
>>> Currently every speculatively inlined method handle call site has a
>>> guard that compares the current target of the CallSite object to the
>>> inlined one.  This per-invocation overhead can be removed if the
>>> notification is changed from pulled to pushed (i.e. deoptimization).
>>> 
>>> I had to change the logic in TemplateTable::patch_bytecode to skip
>>> bytecode quickening for putfield instructions when the put_code
>>> written to the constant pool cache is zero.  This is required so that
>>> every execution of a putfield to CallSite.target calls out to
>>> InterpreterRuntime::resolve_get_put to do the deoptimization of
>>> depending compiled methods.
>>> 
>>> I also had to change the dependency machinery to understand other
>>> dependencies than class hierarchy ones.  DepChange got the super-type
>>> of two new dependencies, KlassDepChange and CallSiteDepChange.
>>> 
>>> Tested with JRuby tests and benchmarks, hand-written testcases, JDK
>>> tests and vm.mlvm tests.
>>> 
>>> Here is the speedup for the JRuby fib benchmark (first is JDK 7 b147,
>>> second with 7071653).  Since the CallSite targets don't change during
>>> the runtime of this benchmark we can see the performance benefit of
>>> eliminating the guard:
>>> 
>>> $ jruby --server bench/bench_fib_recursive.rb 5 35
>>> 0.883000   0.000000   0.883000 (  0.854000)
>>> 0.715000   0.000000   0.715000 (  0.715000)
>>> 0.712000   0.000000   0.712000 (  0.712000)
>>> 0.713000   0.000000   0.713000 (  0.713000)
>>> 0.713000   0.000000   0.713000 (  0.712000)
>>> 
>>> $ jruby --server bench/bench_fib_recursive.rb 5 35
>>> 0.772000   0.000000   0.772000 (  0.742000)
>>> 0.624000   0.000000   0.624000 (  0.624000)
>>> 0.621000   0.000000   0.621000 (  0.621000)
>>> 0.622000   0.000000   0.622000 (  0.622000)
>>> 0.622000   0.000000   0.622000 (  0.621000)
>>> 
>> 
> 
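
For readers following the two modes of check_call_site_target_value discussed
above, here is a minimal standalone sketch of the decision logic. It is only a
model: the struct types are hypothetical stand-ins for the VM's oops, it
returns a bool instead of a klassOop, and it omits the final paranoia assert.

```cpp
#include <cassert>

// Hypothetical stand-ins for HotSpot oops; not the real VM types.
struct MethodHandle {};
struct CallSite { const MethodHandle* target; };

// Models a CallSiteDepChange: one call site is getting a new target.
struct CallSiteDepChange {
  const CallSite* call_site;
  const MethodHandle* method_handle;
};

// Returns true when the recorded dependency no longer holds.
// 'recorded' plays the role of the MethodHandle stored in the dependence
// stream at compile time.
bool call_site_dependency_violated(const CallSite& site,
                                   const MethodHandle* recorded,
                                   const CallSiteDepChange* changes) {
  assert(recorded != nullptr);
  if (changes == nullptr) {
    // Validate-all mode: compare against the recorded target.
    return site.target != recorded;
  }
  // Targeted mode: only the call site named in the change can be affected.
  return &site == changes->call_site &&
         site.target != changes->method_handle;
}
```

The point of storing the recorded target is visible in the first branch: with
no change object there is nothing else to compare the current target against.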


From christian.thalinger at oracle.com  Wed Aug 10 12:34:27 2011
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Wed, 10 Aug 2011 21:34:27 +0200
Subject: Request for review (L): 7071653: JSR 292: call site change
	notification should be pushed not pulled
In-Reply-To: <15BDEB85-0323-4026-A249-D979D88E863B@oracle.com>
References: <34EE7AEC-FD11-4526-B49D-DCEA296E767A@oracle.com>
	<E09300F6-0DBF-4AA2-9245-C84F4BE6B10F@oracle.com>
	<6908A407-5908-4B30-8540-E6474B96DBA9@oracle.com>
	<15BDEB85-0323-4026-A249-D979D88E863B@oracle.com>
Message-ID: <27ED8701-5353-4929-B9F1-D5A4F7A361B4@oracle.com>


On Aug 10, 2011, at 9:28 PM, Tom Rodriguez wrote:

> 
> On Aug 9, 2011, at 4:33 AM, Christian Thalinger wrote:
> 
>> 
>> On Aug 8, 2011, at 8:49 PM, Tom Rodriguez wrote:
>> 
>>> dependencies.cpp:
>>> 
>>> in check_call_site_target_value, the changes == NULL case should be checking that the call site hasn't changed.  It should probably look more like this:
>>> 
>>> klassOop Dependencies::check_call_site_target_value(klassOop ctxk, oop call_site, CallSiteDepChange* changes) {
>>> assert(call_site->is_a(SystemDictionary::CallSite_klass()), "sanity");
>>> // Same CallSite object but different target?  Check this specific call site
>>> //  if changes is non-NULL or validate all CallSites
>>> if ((changes == NULL || (call_site == changes->call_site())) &&
>>>    (java_lang_invoke_CallSite::target(call_site) != changes->method_handle())) {
>>>  return ctxk;  // assertion failed
>>> }
>>> assert(java_lang_invoke_CallSite::target(call_site) == changes->method_handle(), "should still be valid");
>>> return NULL;  // assertion still valid
>>> }
>> 
>> I see your point.  But the code above is broken as changes->method_handle() will not work when changes == NULL.  One of my first versions of this code also stored the MethodHandle target in the dependence stream which seems to be required when we want to validate all CallSites.  Something like this
> 
> Yes, that's right.  The new webrev looks good.

Thank you, Tom.

-- Christian

> 
> tom
> 
> 
>> 
>> ! klassOop Dependencies::check_call_site_target_value(klassOop ctxk, oop call_site, oop method_handle, CallSiteDepChange* changes) {
>> +   assert(call_site    ->is_a(SystemDictionary::CallSite_klass()),     "sanity");
>> +   assert(method_handle->is_a(SystemDictionary::MethodHandle_klass()), "sanity");
>> +   if (changes == NULL) {
>> +     // Validate all CallSites
>> +     if (java_lang_invoke_CallSite::target(call_site) != method_handle)
>> +       return ctxk;  // assertion failed
>> +   } else {
>> +     // Validate the given CallSite
>> +     if (call_site == changes->call_site() && java_lang_invoke_CallSite::target(call_site) != changes->method_handle()) {
>> +       assert(method_handle != changes->method_handle(), "must be");
>> +       return ctxk;  // assertion failed
>> +     }
>> +   }
>> +   assert(java_lang_invoke_CallSite::target(call_site) == method_handle, "should still be valid");
>> +   return NULL;  // assertion still valid
>> + }
>> 
>>> 
>>> The final assert is just a paranoia check that a call site hasn't changed without the dependencies being checked.
>>> 
>>> interpreterRuntime.cpp:
>>> 
>>> Please move the dependence check code into universe with the other dependence check code.
>> 
>> Where it says:
>> 
>> // %%% The Universe::flush_foo methods belong in CodeCache.
>> 
>> :-)
>> 
>>> Also add some comments explaining why it's doing what it's doing.
>> 
>> Done.
>> 
>>> 
>>> doCall.cpp:
>>> 
>>> Can you put in a comment explaining that VolatileCallSite is never inlined.
>> 
>> Done.
>> 
>>> 
>>> Otherwise it looks good.
>> 
>> webrev updated.
>> 
>> -- Christian
>> 
>>> 
>>> tom
>>> 
>>> 
>>> On Aug 5, 2011, at 6:32 AM, Christian Thalinger wrote:
>>> 
>>>> http://cr.openjdk.java.net/~twisti/7071653
>>>> 
>>>> 7071653: JSR 292: call site change notification should be pushed not pulled
>>>> Reviewed-by:
>>>> 
>>>> Currently every speculatively inlined method handle call site has a
>>>> guard that compares the current target of the CallSite object to the
>>>> inlined one.  This per-invocation overhead can be removed if the
>>>> notification is changed from pulled to pushed (i.e. deoptimization).
>>>> 
>>>> I had to change the logic in TemplateTable::patch_bytecode to skip
>>>> bytecode quickening for putfield instructions when the put_code
>>>> written to the constant pool cache is zero.  This is required so that
>>>> every execution of a putfield to CallSite.target calls out to
>>>> InterpreterRuntime::resolve_get_put to do the deoptimization of
>>>> depending compiled methods.
>>>> 
>>>> I also had to change the dependency machinery to understand other
>>>> dependencies than class hierarchy ones.  DepChange got the super-type
>>>> of two new dependencies, KlassDepChange and CallSiteDepChange.
>>>> 
>>>> Tested with JRuby tests and benchmarks, hand-written testcases, JDK
>>>> tests and vm.mlvm tests.
>>>> 
>>>> Here is the speedup for the JRuby fib benchmark (first is JDK 7 b147,
>>>> second with 7071653).  Since the CallSite targets don't change during
>>>> the runtime of this benchmark we can see the performance benefit of
>>>> eliminating the guard:
>>>> 
>>>> $ jruby --server bench/bench_fib_recursive.rb 5 35
>>>> 0.883000   0.000000   0.883000 (  0.854000)
>>>> 0.715000   0.000000   0.715000 (  0.715000)
>>>> 0.712000   0.000000   0.712000 (  0.712000)
>>>> 0.713000   0.000000   0.713000 (  0.713000)
>>>> 0.713000   0.000000   0.713000 (  0.712000)
>>>> 
>>>> $ jruby --server bench/bench_fib_recursive.rb 5 35
>>>> 0.772000   0.000000   0.772000 (  0.742000)
>>>> 0.624000   0.000000   0.624000 (  0.624000)
>>>> 0.621000   0.000000   0.621000 (  0.621000)
>>>> 0.622000   0.000000   0.622000 (  0.622000)
>>>> 0.622000   0.000000   0.622000 (  0.621000)
>>>> 
>>> 
>> 
> 


From vladimir.kozlov at oracle.com  Wed Aug 10 12:47:51 2011
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 10 Aug 2011 12:47:51 -0700
Subject: Request for reviews (S): 7077439: Possible reference through NULL
	in loopPredicate.cpp:726
Message-ID: <4E42E067.8020302@oracle.com>

http://cr.openjdk.java.net/~kvn/7077439/webrev

Fixed 7077439: Possible reference through NULL in loopPredicate.cpp:726

The VM crashed at the following line because cl->loopexit() == NULL when I tried 
to port 7070134 into previous HotSpot sources:

     BoolTest::mask bt = cl->loopexit()->test_trip();

I did not see such a crash with the latest HS22 sources, but that does not mean 
it can't happen. The check cl->is_valid_counted_loop() should be used in the code 
to avoid such a crash. Note that this check is a superset of cl->stride_is_con(), 
so the latter can be replaced.
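
A minimal sketch of the guard, using hypothetical simplified types and not the
real C2 node classes. In particular, the bodies of stride_is_con() and
is_valid_counted_loop() below are assumptions chosen only to show why the
stronger check protects the dereference:

```cpp
#include <cassert>

// Hypothetical stand-in for BoolTest::mask.
enum class BoolMask { lt, le, gt, ge };

// Hypothetical stand-in for the loop-end node.
struct LoopEnd {
  BoolMask mask;
  BoolMask test_trip() const { return mask; }
};

// Hypothetical stand-in for a counted loop node.
struct CountedLoop {
  LoopEnd* exit;      // may be NULL for a malformed loop
  bool stride_const;  // whether the stride is a constant

  LoopEnd* loopexit() const { return exit; }
  // Assumed for this sketch: says nothing about the exit.
  bool stride_is_con() const { return stride_const; }
  // Assumed for this sketch: also requires a non-NULL exit.
  bool is_valid_counted_loop() const {
    return exit != nullptr && stride_const;
  }
};

// Reads the trip test only when the loop is safe to inspect.
// Returns false and leaves 'out' untouched otherwise.
bool trip_test(const CountedLoop& cl, BoolMask& out) {
  if (!cl.is_valid_counted_loop()) return false;  // guards the dereference
  out = cl.loopexit()->test_trip();
  return true;
}
```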


From tom.rodriguez at oracle.com  Wed Aug 10 12:52:28 2011
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Wed, 10 Aug 2011 12:52:28 -0700
Subject: IdealGraphVisualizer file compatibility
In-Reply-To: <CAARN+eGVB=4qqu1KNJRx3j-cV8E=ucrNruU7qKeJU0Mwx=KhOA@mail.gmail.com>
References: <CAARN+eEPit0KA+B_WS5Buxi5YAZVe8k2rhJ6ssNqK1nNEdhqTw@mail.gmail.com>
	<60867EAC-E2A7-4EA1-9A86-1B421C328693@oracle.com>
	<CAARN+eENB3=_XpW7Hdqbe1hXEJ2oRL-JOSEtFNUPZjmp=joNug@mail.gmail.com>
	<ABDD905E-085D-45AC-B552-BD1CBFF93DAC@oracle.com>
	<CAARN+eGVB=4qqu1KNJRx3j-cV8E=ucrNruU7qKeJU0Mwx=KhOA@mail.gmail.com>
Message-ID: <16906A61-ADC0-4700-A1B8-5082604F8420@oracle.com>


On Aug 3, 2011, at 10:00 AM, Joe Kearney wrote:

> Oh ok, I didn't realise. Thanks. Are there any plans to make it more
> widely available? I can see it being useful for experimenting to
> squeeze performance.

We don't have any current plans.  We've tended not to include developer-specific features in the product binary, mainly to avoid making an already large library even larger.  Admittedly, IGV support is pretty small in terms of code size.

tom

> 
> Thanks,
> Joe
> 
> On 3 August 2011 17:42, Tom Rodriguez <tom.rodriguez at oracle.com> wrote:
>> It's not available in the product as it's really intended for developers.  Use a fastdebug build.
>> 
>> tom
>> 
>> On Aug 3, 2011, at 9:37 AM, Joe Kearney wrote:
>> 
>>> Ah, thanks for the readme link.
>>> 
>>> I can't get hotspot 1.6.0_25 or 1.7.0 to recognise the
>>> PrintIdealGraphLevel/PrintIdealGraphFile options. I tried with
>>> UnlockDiagnosticVMOptions etc. as well, to no avail. Is there something
>>> else needed to expose this?
>>> 
>>> Joe
>>> 
>>> On 3 August 2011 15:51, Christian Thalinger
>>> <christian.thalinger at oracle.com> wrote:
>>>> You want:  -XX:PrintIdealGraphLevel=1 -XX:PrintIdealGraphFile=output.xml
>>>> 
>>>> The README of the visualizer also helps:
>>>> 
>>>> http://hg.openjdk.java.net/hsx/hotspot-main/hotspot/file/tip/src/share/tools/IdealGraphVisualizer/README
>>>> 
>>>> -- Christian
>>>> 
>>>> On Aug 3, 2011, at 4:17 PM, Joe Kearney wrote:
>>>> 
>>>>> Hi,
>>>>> 
>>>>> I've been trying to play with igv from
>>>>> http://ssw.jku.at/General/Staff/TW/igv.html,
>>>>> http://ssw.jku.at/General/Staff/PH/ but I don't know how to generate
>>>>> the required log files. What sort of files should I expect the igv to
>>>>> be able to read? The example files are graphDocument XMLs. I was
>>>>> hoping to be able to generate a file with something like the
>>>>> following:
>>>>> 
>>>>> -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation -XX:LogFile=output.xml
>>>>> 
>>>>> Needless to say, these hotspot_log files are totally different and the
>>>>> igv barfs with the below.
>>>>> 
>>>>> java.lang.NullPointerException
>>>>>       at com.sun.hotspot.igv.data.GraphDocument.addGraphDocument(GraphDocument.java:70)
>>>>>       at com.sun.hotspot.igv.coordinator.actions.ImportAction$3.run(ImportAction.java:128)
>>>>>       at org.openide.util.RequestProcessor$Task.run(RequestProcessor.java:572)
>>>>> [catch] at org.openide.util.RequestProcessor$Processor.run(RequestProcessor.java:997)
>>>>> 
>>>>> 
>>>>> How do I get the jvm to generate the right output file?
>>>>> 
>>>>> Many thanks,
>>>>> Joe
>>>> 
>>>> 
>> 
>> 


From tom.rodriguez at oracle.com  Wed Aug 10 13:07:03 2011
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Wed, 10 Aug 2011 13:07:03 -0700
Subject: Request for reviews (S): 7077439: Possible reference through NULL
	in loopPredicate.cpp:726
In-Reply-To: <4E42E067.8020302@oracle.com>
References: <4E42E067.8020302@oracle.com>
Message-ID: <CE67D87D-30F8-4159-ABD6-2F8B6377E1D7@oracle.com>

Looks good.

tom

On Aug 10, 2011, at 12:47 PM, Vladimir Kozlov wrote:

> http://cr.openjdk.java.net/~kvn/7077439/webrev
> 
> Fixed 7077439: Possible reference through NULL in loopPredicate.cpp:726
> 
> The VM crashed at the following line because cl->loopexit() == NULL when I tried to port 7070134 into previous HotSpot sources:
> 
>    BoolTest::mask bt = cl->loopexit()->test_trip();
> 
> I did not see such a crash with the latest HS22 sources, but that does not mean it can't happen. The check cl->is_valid_counted_loop() should be used in the code to avoid such a crash. Note that this check is a superset of cl->stride_is_con(), so the latter can be replaced.
> 


From vladimir.kozlov at oracle.com  Wed Aug 10 14:01:03 2011
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 10 Aug 2011 14:01:03 -0700
Subject: Request for reviews (S): 7077439: Possible reference through
	NULL in loopPredicate.cpp:726
In-Reply-To: <CE67D87D-30F8-4159-ABD6-2F8B6377E1D7@oracle.com>
References: <4E42E067.8020302@oracle.com>
	<CE67D87D-30F8-4159-ABD6-2F8B6377E1D7@oracle.com>
Message-ID: <4E42F18F.5060001@oracle.com>

Thank you, Tom

Vladimir

Tom Rodriguez wrote:
> Looks good.
> 
> tom
> 
> On Aug 10, 2011, at 12:47 PM, Vladimir Kozlov wrote:
> 
>> http://cr.openjdk.java.net/~kvn/7077439/webrev
>>
>> Fixed 7077439: Possible reference through NULL in loopPredicate.cpp:726
>>
>> The VM crashed at the following line because cl->loopexit() == NULL when I tried to port 7070134 into previous HotSpot sources:
>>
>>    BoolTest::mask bt = cl->loopexit()->test_trip();
>>
>> I did not see such a crash with the latest HS22 sources, but that does not mean it can't happen. The check cl->is_valid_counted_loop() should be used in the code to avoid such a crash. Note that this check is a superset of cl->stride_is_con(), so the latter can be replaced.
>>
> 

From vladimir.kozlov at oracle.com  Wed Aug 10 18:12:36 2011
From: vladimir.kozlov at oracle.com (vladimir.kozlov at oracle.com)
Date: Thu, 11 Aug 2011 01:12:36 +0000
Subject: hg: hsx/hotspot-comp/hotspot: 7077439: Possible reference through
	NULL in loopPredicate.cpp:726
Message-ID: <20110811011238.EA27A47AB5@hg.openjdk.java.net>

Changeset: 6987871cfb9b
Author:    kvn
Date:      2011-08-10 14:06 -0700
URL:       http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/6987871cfb9b

7077439: Possible reference through NULL in loopPredicate.cpp:726
Summary: Use cl->is_valid_counted_loop() check.
Reviewed-by: never

! src/share/vm/opto/loopPredicate.cpp
! src/share/vm/opto/loopTransform.cpp
! src/share/vm/opto/loopnode.cpp
! src/share/vm/opto/superword.cpp


From vladimir.kozlov at oracle.com  Thu Aug 11 11:22:04 2011
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 11 Aug 2011 11:22:04 -0700
Subject: Request for reviews (L):  7063629: use cbcond in C2 generated
	code on T4
In-Reply-To: <4E41BAE9.1070505@oracle.com>
References: <4E3B452E.10509@oracle.com>
	<FFE3DD2B-A68C-4781-A3B9-EF642E7E484F@oracle.com>
	<4E41BAE9.1070505@oracle.com>
Message-ID: <4E441DCC.5040303@oracle.com>

 >> !     if( bb->_nodes[bb->_nodes.size()-1] != n ) {
 >> !     if( bb->_nodes[_bb_end-1] != n ) {
 >>
 >> That code is complicated enough that I can't reason about its
 >> correctness from a webrev.  Is this because of the trailing NOPs?
 >
 > I hit the following assert during development because the loop above
 > pushed nodes that are not meant to be scheduled:
 >
 >     assert( _scheduled.size() == _bb_end - _bb_start, "wrong number of instructions" );
 >
 > It may have happened before I split shorten_branches(), when there were
 > trailing NOPs. But it is not only trailing NOPs; it is also projections
 > after calls and MachNullCheck nodes (see code in DoScheduling()). I
 > think in general the check above should check the last node to be
 > scheduled and not the last node in the block.

Tom,

I ran full CTW with my latest changes but without this change and did not hit 
the assert, which confirms that it was a problem in early development, when 
trailing NOPs were inserted before the DoScheduling() call. Do you think I 
should remove this change?

Thanks,
Vladimir

Vladimir Kozlov wrote:
> Thank you, Tom
> 
> Tom Rodriguez wrote:
>> This looks really good.
>>
>> This might be for another day but now that label must be non-NULL, 
>> maybe it should be a Label& instead of a Label*.  That would make it 
>> easier to use it directly during code generation, as in:
>>
>> +     __ jmpb($labl$$label);
> 
> Yes, I would leave it for another time. I will file an RFE.
> 
>>
>> sparc.ad:
>>
>> It might be nice to factor this out:
>>
>>       Assembler::Predict predict_taken =
>> +       cbuf.is_backward_branch(*L) ? Assembler::pt : Assembler::pn;
> 
> I will file an RFE for that: use the probability from IfNode to determine 
> the pt value, as you suggested before.
> 
>>
>> x86_32.ad:
>>
>> Would you be averse to inlining Jcc and JccShort?
> 
> I did not realize that it is just one instruction now :)
> They are used in a lot of places and I did not want to duplicate the 
> original code. I will inline them now.
> 
>>
>> output.cpp:
>>
>> Why does the first round of shorten_branches occur in the middle of 
>> init_buffer?  Couldn't it be done right afterwards?  It's just odd 
>> that it's buried inside there.
> 
> First loop in shorten_branches() estimates code, locals, stubs sizes 
> which are used later in init_buffer() to allocate CodeBuffer. I would 
> need to split shorten_branches() method which is not easy since the 
> first loop also collects information about branches which could be 
> replaced.
> 
>>
>> That first round is conservative since we haven't done all padding 
>> yet, right?
> 
> Correct.
> 
>> Then shorten_branches_final does a last pass based on the real offsets? 
> 
> Yes, backward branches inserted in this method use final offsets. For 
> forward branches we still have only conservative offsets since following 
> blocks are not processed yet.
> 
>> shorten_branches_final isn't a great name.  Maybe 
>> finalize_offsets_and_shorten?
> 
> I did not like it either. I will use finalize_offsets_and_shorten().
> 
>>
>> The core shorten branch logic is duplicated in those functions.  Could 
>> it be factored out or is there too much local state?
> 
> I thought about it but as you said "too much local state".
> 
>>
>> Why was this needed?
>>
>> *** 2182,2192 ****
>> --- 2383,2393 ----
>>         (op != Op_Node &&         // Not an unused antidepedence node and
>>          // not an unallocated boxlock
>>          (OptoReg::is_valid(_regalloc->get_reg_first(n)) || op != 
>> Op_BoxLock)) ) {
>>         // Push any trailing projections
>> !     if( bb->_nodes[bb->_nodes.size()-1] != n ) {
>> !     if( bb->_nodes[_bb_end-1] != n ) {
>>         for (DUIterator_Fast imax, i = n->fast_outs(imax); i < imax; 
>> i++) {
>>           Node *foi = n->fast_out(i);
>>           if( foi->is_Proj() )
>>             _scheduled.push(foi);
>>         }
>>
>> That code is complicated enough that I can't reason about its 
>> correctness from a webrev.  Is this because of the trailing NOPs?
> 
> I hit the following assert during development because the loop above 
> pushed nodes that are not meant to be scheduled:
> 
>     assert( _scheduled.size() == _bb_end - _bb_start, "wrong number of instructions" );
> 
> It may have happened before I split shorten_branches(), when there were 
> trailing NOPs. But it is not only trailing NOPs; it is also projections 
> after calls and MachNullCheck nodes (see code in DoScheduling()). I 
> think in general the check above should check the last node to be 
> scheduled and not the last node in the block.
> 
>>
>> Can you add this comment to that last anti_do_def piece I added:
>>
>> // kill projections on a branch should appear to occur on the
>> // branch, not afterwards, so grab the masks from the projections
>> // and process them.
> 
> Done.
> 
> Thanks,
> Vladimir
> 
>>
>> tom
>>
>>
>> On Aug 4, 2011, at 6:19 PM, Vladimir Kozlov wrote:
>>
>>> http://cr.openjdk.java.net/~kvn/7063629/webrev
>>>
>>> 7063629: use cbcond in C2 generated code on T4
>>>
>>> The code is finally shaped as I want and it passed CTW, regression, 
>>> nsk tests on T4 and x86.
>>>
>>> Added new fused compare and branch instructions into sparc.ad and 
>>> corresponding short versions which use cbcond instruction. Added new 
>>> flag avoid_back_to_back to avoid generation of cbcond back to back.
>>>
>>> Split shorten_branches() into 2 methods. The first method conservatively 
>>> estimates code size and branch locations and does a few rounds of 
>>> branch shortening. It is executed before ScheduleAndBundle(). Step 3 
>>> is moved to a new method, shorten_branches_final(), called after 
>>> ScheduleAndBundle(). It does final padding, alignment and final 
>>> branch replacement. Method fill_buffer() does verification instead of 
>>> padding.
>>>
>>> Labels are now bound only during code generation in fill_buffer(). 
>>> As a result they are not available when forward branches are emitted. 
>>> To fix that, MacroAssembler branch instructions are now used in x86 
>>> .ad files. I replaced the unused rtype parameter with a maybe_short flag 
>>> to force using only long branches in .ad long branch instructions.
>>>
>>> Added a check to adlc to verify that the short version of a branch 
>>> instruction has the same declaration in the .ad file.
>>>
>>> Added assert to verify that the size of emitted instruction matches 
>>> the value returned by MachNode::size(). Found that 
>>> MachBreakpointNode::size() returned incorrect value on x64.
>>>
>>> Fixed loop alignment for Sparc (min alignment should be instruction 
>>> size which is 4 bytes instead of 1 byte).
>>>
>>> The prototype was done by Tom and I took some of his additional 
>>> fixes. The block changes go with some code in output to put opto 
>>> assembly style block comments in the PrintNMethods output. There's 
>>> also a snippet in there that deals with the fact that kill projections 
>>> on branches make it appear that the kill occurs after the branch 
>>> instead of being part of it.
>>

From tom.rodriguez at oracle.com  Thu Aug 11 11:30:42 2011
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Thu, 11 Aug 2011 11:30:42 -0700
Subject: Request for reviews (L): 7063629: use cbcond in C2 generated code
	on T4
In-Reply-To: <4E441DCC.5040303@oracle.com>
References: <4E3B452E.10509@oracle.com>
	<FFE3DD2B-A68C-4781-A3B9-EF642E7E484F@oracle.com>
	<4E41BAE9.1070505@oracle.com> <4E441DCC.5040303@oracle.com>
Message-ID: <CE1D8AB0-215A-451F-B954-F53AE4E08B72@oracle.com>


On Aug 11, 2011, at 11:22 AM, Vladimir Kozlov wrote:

> >> !     if( bb->_nodes[bb->_nodes.size()-1] != n ) {
> >> !     if( bb->_nodes[_bb_end-1] != n ) {
> >>
> >> That code is complicated enough that I can't reason about its
> >> correctness from a webrev.  Is this because of the trailing NOPs?
> >
> > I hit the following assert during development because the loop above
> > pushed nodes that are not meant to be scheduled:
> >
> >     assert( _scheduled.size() == _bb_end - _bb_start, "wrong number of instructions" );
> >
> > It may have happened before I split shorten_branches(), when there were
> > trailing NOPs. But it is not only trailing NOPs; it is also projections
> > after calls and MachNullCheck nodes (see code in DoScheduling()). I
> > think in general the check above should check the last node to be
> > scheduled and not the last node in the block.
> 
> Tom,
> 
> I ran full CTW with my latest changes but without this change and did not hit the assert, which confirms that it was a problem in early development, when trailing NOPs were inserted before the DoScheduling() call. Do you think I should remove this change?

If it isn't needed then I think it should be removed.  You could put in an assert that the old and new values are equal and then investigate any cases where they differ to confirm which value is correct.  It may be that they are different but both are correct.

tom

> 
> Thanks,
> Vladimir
> 
> Vladimir Kozlov wrote:
>> Thank you, Tom
>> Tom Rodriguez wrote:
>>> This looks really good.
>>> 
>>> This might be for another day but now that label must be non-NULL, maybe it should be a Label& instead of a Label*.  That would make it easier to use it directly during code generation, as in:
>>> 
>>> +     __ jmpb($labl$$label);
>> Yes, I would leave it for another time. I will file an RFE.
>>> 
>>> sparc.ad:
>>> 
>>> It might be nice to factor this out:
>>> 
>>>      Assembler::Predict predict_taken =
>>> +       cbuf.is_backward_branch(*L) ? Assembler::pt : Assembler::pn;
>> I will file an RFE for that: use the probability from IfNode to determine the pt value, as you suggested before.
>>> 
>>> x86_32.ad:
>>> 
>>> Would you be averse to inlining Jcc and JccShort?
>> I did not realize that it is just one instruction now :)
>> They are used in a lot of places and I did not want to duplicate the original code. I will inline them now.
>>> 
>>> output.cpp:
>>> 
>>> Why does the first round of shorten_branches occur in the middle of init_buffer?  Couldn't it be done right afterwards?  It's just odd that it's buried inside there.
>> First loop in shorten_branches() estimates code, locals, stubs sizes which are used later in init_buffer() to allocate CodeBuffer. I would need to split shorten_branches() method which is not easy since the first loop also collects information about branches which could be replaced.
>>> 
>>> That first round is conservative since we haven't done all padding yet, right?
>> Correct.
>>> Then shorten_branches_final does a last pass based on the real offsets? 
>> Yes, backward branches inserted in this method use final offsets. For forward branches we still have only conservative offsets since following blocks are not processed yet.
>>> shorten_branches_final isn't a great name.  Maybe finalize_offsets_and_shorten?
>> I also did not like it, I will use finalize_offsets_and_shorten()
>>> 
>>> The core shorten branch logic is duplicated in those functions.  Could it be factored out or is there too much local state?
>> I thought about it but as you said "too much local state".
>>> 
>>> Why was this needed?
>>> 
>>> *** 2182,2192 ****
>>> --- 2383,2393 ----
>>>        (op != Op_Node &&         // Not an unused antidepedence node and
>>>         // not an unallocated boxlock
>>>         (OptoReg::is_valid(_regalloc->get_reg_first(n)) || op != Op_BoxLock)) ) {
>>>        // Push any trailing projections
>>> !     if( bb->_nodes[bb->_nodes.size()-1] != n ) {
>>> !     if( bb->_nodes[_bb_end-1] != n ) {
>>>        for (DUIterator_Fast imax, i = n->fast_outs(imax); i < imax; i++) {
>>>          Node *foi = n->fast_out(i);
>>>          if( foi->is_Proj() )
>>>            _scheduled.push(foi);
>>>        }
>>> 
>>> That code is complicated enough that I can't reason about it's correctness from a webrev.  Is this because of the trailing NOPs?
>> I hit next assert during development because the loop above pushed nodes which are not for schedule.
>>    assert( _scheduled.size() == _bb_end - _bb_start, "wrong number of instructions" );
>> It may happened before I split shorten_branches() and there were trailing NOPs. But it is not only trailing NOPs, it is also projections after calls and MachNullCheck nodes (see code in DoScheduling()). I think in general the check above should check the last node for schedule and not the last node in block.
>>> 
>>> Can you add this comment to the that last anti_do_def piece I added:
>>> 
>>> // kill projections on a branch should appear to occur on the
>>> // branch, not afterwards, so grab the masks from the projections
>>> // and process them.
>> Done.
>> Thanks,
>> Vladimir
>>> 
>>> tom
>>> 
>>> 
>>> On Aug 4, 2011, at 6:19 PM, Vladimir Kozlov wrote:
>>> 
>>>> http://cr.openjdk.java.net/~kvn/7063629/webrev
>>>> 
>>>> 7063629: use cbcond in C2 generated code on T4
>>>> 
>>>> The code is finally shaped as I want and it passed CTW, regression, nsk tests on T4 and x86.
>>>> 
>>>> Added new fused compare and branch instructions into sparc.ad and corresponding short versions which use cbcond instruction. Added new flag avoid_back_to_back to avoid generation of cbcond back to back.
>>>> 
>>>> Split shorten_branches() into 2 methods. First method conservatively estimates code size and branches location and does few rounds of branch shortening. It is executed before ScheduleAndBundle(). Step 3 is moved to new method shorten_branches_final() called after ScheduleAndBundle(). It does final paddings, alignment and final branch replacement. Method fill_buffer() does verification instead of padding.
>>>> 
>>>> Labels are binded now only during code generation in fill_buffer(). As result they are not available when forward branches are emitted. To fix that MacroAssembler branch instructions are used now in x86 .ad files. I replaced unused rtype parameter with maybe_short flag to force using only long branches in .ad long branch instructions.
>>>> 
>>>> Added check to adlc to verify that short version of a branch instructions has the same declaration in .ad file.
>>>> 
>>>> Added assert to verify that the size of emitted instruction matches the value returned by MachNode::size(). Found that MachBreakpointNode::size() returned incorrect value on x64.
>>>> 
>>>> Fixed loop alignment for Sparc (min alignment should be instruction size which is 4 bytes instead of 1 byte).
>>>> 
>>>> The prototype was done by Tom and I took some of his additional fixes. The block changes go with some code in output to put opto assembly style block comments in the PrintNMethods output. There's also snippet in there that deals with the fact kill projections on branches make it appear the kill occurs after the branch instead of being part of it.
>>> 


From vladimir.kozlov at oracle.com  Thu Aug 11 11:52:37 2011
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 11 Aug 2011 11:52:37 -0700
Subject: Request for reviews (L):  7063629: use cbcond in C2 generated
	code on T4
In-Reply-To: <CE1D8AB0-215A-451F-B954-F53AE4E08B72@oracle.com>
References: <4E3B452E.10509@oracle.com>
	<FFE3DD2B-A68C-4781-A3B9-EF642E7E484F@oracle.com>
	<4E41BAE9.1070505@oracle.com> <4E441DCC.5040303@oracle.com>
	<CE1D8AB0-215A-451F-B954-F53AE4E08B72@oracle.com>
Message-ID: <4E4424F5.7070308@oracle.com>

They are different but the result is the same. I ran with the assert as you 
suggested and hit it immediately (-Xcomp). Anyway, I will revert the change since 
we still have the "wrong number of instructions" assert, which should catch problems.
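
For illustration only, the comparison discussed above can be modelled in a few lines of Java. This is a sketch, not the HotSpot C++ code: SchedCheck, oldCheck, newCheck and agree are hypothetical names, and the node list is a made-up block whose last entry lies past the scheduled range. It only shows that the two forms can look at different nodes and that whether they agree depends on which node n is being tested.

```java
import java.util.List;

final class SchedCheck {
    // Old form: compare n against the last node stored in the block.
    static boolean oldCheck(List<String> nodes, String n) {
        return !nodes.get(nodes.size() - 1).equals(n);
    }

    // New form: compare n against the last node of the scheduled range,
    // which ends at index bbEnd (exclusive).
    static boolean newCheck(List<String> nodes, int bbEnd, String n) {
        return !nodes.get(bbEnd - 1).equals(n);
    }

    // True when both forms give the same answer for n.
    static boolean agree(List<String> nodes, int bbEnd, String n) {
        return oldCheck(nodes, n) == newCheck(nodes, bbEnd, n);
    }
}
```

An assert built on agree() would fire only for nodes where the two forms disagree, which is the set of cases worth investigating by hand.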

Thanks,
Vladimir

Tom Rodriguez wrote:
> On Aug 11, 2011, at 11:22 AM, Vladimir Kozlov wrote:
> 
>>>> !     if( bb->_nodes[bb->_nodes.size()-1] != n ) {
>>>> !     if( bb->_nodes[_bb_end-1] != n ) {
>>>>
>>>> That code is complicated enough that I can't reason about it's
>>>> correctness from a webrev.  Is this because of the trailing NOPs?
>>> I hit next assert during development because the loop above pushed nodes
>>> which are not for schedule.
>>>
>>>     assert( _scheduled.size() == _bb_end - _bb_start, "wrong number of
>>> instructions" );
>>>
>>> It may happened before I split shorten_branches() and there were
>>> trailing NOPs. But it is not only trailing NOPs, it is also projections
>>> after calls and MachNullCheck nodes (see code in DoScheduling()). I
>>> think in general the check above should check the last node for schedule
>>> and not the last node in block.
>> Tom,
>>
>> I ran full CTW without this change with my latest changes and did not hit the assert which confirms that it was problem in early development when trailing NOPs were inserted before DoScheduling() call. Do you think I should remove this change?
> 
> If it isn't be needed then I think should be removed.  You could put in an assert that the old and new value are equal and then investigate any cases where they are different to confirm which value is correct.  It may be that they are different but both could be correct.
> 
> tom
> 
>> Thanks,
>> Vladimir
>>
>> Vladimir Kozlov wrote:
>>> Thank you, Tom
>>> Tom Rodriguez wrote:
>>>> This looks really good.
>>>>
>>>> This might be for another day but now that label must be non-NULL, maybe it should be a Label& instead of a Label*.  That would make it easier to use it directly during code generation, as in:
>>>>
>>>> +     __ jmpb($labl$$label);
>>> Yes, I would leave it for an other time. I will file RFE.
>>>> sparc.ad:
>>>>
>>>> It might be nice to factor this out:
>>>>
>>>>      Assembler::Predict predict_taken =
>>>> +       cbuf.is_backward_branch(*L) ? Assembler::pt : Assembler::pn;
>>> I will file RFE for that: use probability from IfNode to determine the pt value as you suggested before.
>>>> x86_32.ad:
>>>>
>>>> Would you get averse to inlining Jcc and JccShort?
>>> I did not realize that it is just one instruction now :)
>>> They are used in a lot of places and I did not want to duplicate the original code. I will inline them now.
>>>> output.cpp:
>>>>
>>>> Why does the first round of shorten_branches occur in the middle of init_buffer?  Couldn't it be done right afterwards?  It's just odd that it's buried inside there.
>>> First loop in shorten_branches() estimates code, locals, stubs sizes which are used later in init_buffer() to allocate CodeBuffer. I would need to split shorten_branches() method which is not easy since the first loop also collects information about branches which could be replaced.
>>>> That first round is conservative since we haven't done all padding yet, right?
>>> Correct.
>>>> Then shorten_branches_final does a last pass based on the real offsets? 
>>> Yes, backward branches inserted in this method use final offsets. For forward branches we still have only conservative offsets since following blocks are not processed yet.
>>>> shorten_branches_final isn't a great name.  Maybe finalize_offsets_and_shorten?
>>> I also did not like it, I will use finalize_offsets_and_shorten()
>>>> The core shorten branch logic is duplicated in those functions.  Could it be factored out or is there too much local state?
>>> I thought about it but as you said "too much local state".
>>>> Why was this needed?
>>>>
>>>> *** 2182,2192 ****
>>>> --- 2383,2393 ----
>>>>        (op != Op_Node &&         // Not an unused antidepedence node and
>>>>         // not an unallocated boxlock
>>>>         (OptoReg::is_valid(_regalloc->get_reg_first(n)) || op != Op_BoxLock)) ) {
>>>>        // Push any trailing projections
>>>> !     if( bb->_nodes[bb->_nodes.size()-1] != n ) {
>>>> !     if( bb->_nodes[_bb_end-1] != n ) {
>>>>        for (DUIterator_Fast imax, i = n->fast_outs(imax); i < imax; i++) {
>>>>          Node *foi = n->fast_out(i);
>>>>          if( foi->is_Proj() )
>>>>            _scheduled.push(foi);
>>>>        }
>>>>
>>>> That code is complicated enough that I can't reason about it's correctness from a webrev.  Is this because of the trailing NOPs?
>>> I hit next assert during development because the loop above pushed nodes which are not for schedule.
>>>    assert( _scheduled.size() == _bb_end - _bb_start, "wrong number of instructions" );
>>> It may happened before I split shorten_branches() and there were trailing NOPs. But it is not only trailing NOPs, it is also projections after calls and MachNullCheck nodes (see code in DoScheduling()). I think in general the check above should check the last node for schedule and not the last node in block.
>>>> Can you add this comment to the that last anti_do_def piece I added:
>>>>
>>>> // kill projections on a branch should appear to occur on the
>>>> // branch, not afterwards, so grab the masks from the projections
>>>> // and process them.
>>> Done.
>>> Thanks,
>>> Vladimir
>>>> tom
>>>>
>>>>
>>>> On Aug 4, 2011, at 6:19 PM, Vladimir Kozlov wrote:
>>>>
>>>>> http://cr.openjdk.java.net/~kvn/7063629/webrev
>>>>>
>>>>> 7063629: use cbcond in C2 generated code on T4
>>>>>
>>>>> The code is finally shaped as I want and it passed CTW, regression, nsk tests on T4 and x86.
>>>>>
>>>>> Added new fused compare and branch instructions into sparc.ad and corresponding short versions which use cbcond instruction. Added new flag avoid_back_to_back to avoid generation of cbcond back to back.
>>>>>
>>>>> Split shorten_branches() into 2 methods. First method conservatively estimates code size and branches location and does few rounds of branch shortening. It is executed before ScheduleAndBundle(). Step 3 is moved to new method shorten_branches_final() called after ScheduleAndBundle(). It does final paddings, alignment and final branch replacement. Method fill_buffer() does verification instead of padding.
>>>>>
>>>>> Labels are binded now only during code generation in fill_buffer(). As result they are not available when forward branches are emitted. To fix that MacroAssembler branch instructions are used now in x86 .ad files. I replaced unused rtype parameter with maybe_short flag to force using only long branches in .ad long branch instructions.
>>>>>
>>>>> Added check to adlc to verify that short version of a branch instructions has the same declaration in .ad file.
>>>>>
>>>>> Added assert to verify that the size of emitted instruction matches the value returned by MachNode::size(). Found that MachBreakpointNode::size() returned incorrect value on x64.
>>>>>
>>>>> Fixed loop alignment for Sparc (min alignment should be instruction size which is 4 bytes instead of 1 byte).
>>>>>
>>>>> The prototype was done by Tom and I took some of his additional fixes. The block changes go with some code in output to put opto assembly style block comments in the PrintNMethods output. There's also snippet in there that deals with the fact kill projections on branches make it appear the kill occurs after the branch instead of being part of it.
> 

From tom.rodriguez at oracle.com  Thu Aug 11 15:02:53 2011
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Thu, 11 Aug 2011 15:02:53 -0700
Subject: ReentrantLock performance regression between JDK5 and 6/7?
In-Reply-To: <CAFvQSYRV+B1WyRW5nJznHGAnbx_PBwwo8JsajNofUK_2oRURTw@mail.gmail.com>
References: <CAFvQSYT=MEEQ933ekHNPUVp5WporN62n2Rk8QFgppmrBho1SOA@mail.gmail.com>
	<CAHjP37EcxUcdEZFqO_jEj+8otnbGcqAUc0ydZRZpv1+808Bdpg@mail.gmail.com>
	<CAFvQSYQ8atHxh4GSkw0HH+Oh5A92d4-py4c-i1bJt2XDPA_-LA@mail.gmail.com>
	<CAHjP37F_X2hZE1C_j-TrEWydorncU1dajf2CHw5ysK9erYJfDQ@mail.gmail.com>
	<CAFvQSYRV+B1WyRW5nJznHGAnbx_PBwwo8JsajNofUK_2oRURTw@mail.gmail.com>
Message-ID: <0B41DBAD-E290-425F-8214-F90DBBFCC5E3@oracle.com>

I believe this was caused by the switch to using lock addl[esp], 0 instead of mfence for volatile membars, 6822204.  My review request for that said that at the time I didn't measure any performance change for Intel, http://cr.openjdk.java.net/~never/6822204.  On your microbenchmark I can measure the difference though so I'm going to remeasure derby which previously showed the big difference.  We may want to make the lock addl be AMD specific.

tom

On Aug 11, 2011, at 11:05 AM, Clemens Eisserer wrote:

> Hi Vitaly,
> 
> I tried this bench on 6u23 and if I first run that code in a 10k iteration loop and then time the 1mm iteration loop I get about 10 ms speedup.  The first loop would trigger jit compilation (10k is the default threshold I believe) and second should run without compilation interruption.
> 
> Can you try the same? Also might be interesting to time it under the interpreter (-Xint).
> 
> I changed the testcase a bit, to no longer rely on OSR - as lockBench() will for sure soon hit the compilation threshold after a few runs.
> 
> I get the following timings for 1m runs:
> 
> jdk7-server: 53ms
> jdk7-client: 62ms
> jdk7-xint  : 955ms
> 
> jdk6-xint  : 1000ms
> jdk6-client: 68ms
> jdk6-server: 52ms
> 
> jdk5-server: 40ms
> jdk5-client: 61ms
> jdk5-xint  : 832ms
> 
> So JDK7 is slower in every case, the regression seems to have landed in jdk6 (I was using openjdk6).
> 
> Should I file a bug-report about this behaviour?
> 
> Thanks, Clemens
> 
> 
> import java.util.concurrent.locks.ReentrantLock;
> 
> public class LockPerf {
>     static ReentrantLock lock = new ReentrantLock();
> 
>     public static void main(String[] args) {
>         while (true) {
>             long start2 = System.nanoTime();
>             // 1000 calls x 1000 lock/unlock pairs = 1m pairs per timing
>             for (int i = 0; i < 1000; i++) {
>                 lockBench();
>             }
>             System.out.println("Lock bench: " + ((System.nanoTime() - start2)) / 1000000);
>         }
>     }
> 
>     private static void lockBench() {
>         for (int i = 0; i < 1000; i++) {
>             lock.lock();
>             lock.unlock();
>         }
>     }
> }
> 
>  
> On Aug 11, 2011 11:38 AM, "Clemens Eisserer" <linuxhippy at gmail.com> wrote:
> > Hi Vitaly,
> > 
> > Which OS are you using?
> >>
> > Linux-3.0 (Fedora 15)
> > 
> > 
> >> Also, you should use System.nanoTime() for this type of timing as it gives
> >> you a more precise timer.
> >>
> > I tried, but results remained the same. ~53ms for jdk6/7, ~41 for JDK5.
> > I was using the server compiler both times.
> > 
> > Thanks, Clemens
> 


From vitalyd at gmail.com  Thu Aug 11 15:39:15 2011
From: vitalyd at gmail.com (Vitaly Davidovich)
Date: Thu, 11 Aug 2011 18:39:15 -0400
Subject: ReentrantLock performance regression between JDK5 and 6/7?
In-Reply-To: <0B41DBAD-E290-425F-8214-F90DBBFCC5E3@oracle.com>
References: <CAFvQSYT=MEEQ933ekHNPUVp5WporN62n2Rk8QFgppmrBho1SOA@mail.gmail.com>
	<CAHjP37EcxUcdEZFqO_jEj+8otnbGcqAUc0ydZRZpv1+808Bdpg@mail.gmail.com>
	<CAFvQSYQ8atHxh4GSkw0HH+Oh5A92d4-py4c-i1bJt2XDPA_-LA@mail.gmail.com>
	<CAHjP37F_X2hZE1C_j-TrEWydorncU1dajf2CHw5ysK9erYJfDQ@mail.gmail.com>
	<CAFvQSYRV+B1WyRW5nJznHGAnbx_PBwwo8JsajNofUK_2oRURTw@mail.gmail.com>
	<0B41DBAD-E290-425F-8214-F90DBBFCC5E3@oracle.com>
Message-ID: <CAHjP37EaOQMNqFc1sFVVkRz+8sGKehJJf5m8ELPnMfXcvFyD=w@mail.gmail.com>

Hi Tom,

Just curious - I recall reading on Dave Dice's blog that he found locked add
to perform better than mfence.  Granted, he tested on a Nehalem box - do you
think it may need more granular decision making in the JIT than just AMD vs
Intel? i.e. check the Intel generation as well.

Thanks
On Aug 11, 2011 6:03 PM, "Tom Rodriguez" <tom.rodriguez at oracle.com> wrote:
> I believe this was caused by the switch to using lock addl[esp], 0 instead
> of mfence for volatile membars, 6822204. My review request for that said
> that at the time I didn't measure any performance change for Intel,
> http://cr.openjdk.java.net/~never/6822204. On your microbenchmark I can
> measure the difference though so I'm going to remeasure derby which
> previously showed the big difference. We may want to make the lock addl be
> AMD specific.
>
> tom
>
> On Aug 11, 2011, at 11:05 AM, Clemens Eisserer wrote:
>
>> Hi Vitaly,
>>
>> I tried this bench on 6u23 and if I first run that code in a 10k
>> iteration loop and then time the 1mm iteration loop I get about 10 ms
>> speedup. The first loop would trigger jit compilation (10k is the default
>> threshold I believe) and second should run without compilation interruption.
>>
>> Can you try the same? Also might be interesting to time it under the
>> interpreter (-Xint).
>>
>> I changed the testcase a bit, to no longer rely on OSR - as lockBench()
>> will for sure soon hit the compilation threshold after a few runs.
>>
>> I get the following timings for 1m runs:
>>
>> jdk7-server: 53ms
>> jdk7-client: 62ms
>> jdk7-xint : 955ms
>>
>> jdk6-xint : 1000ms
>> jdk6-client: 68ms
>> jdk6-server: 52ms
>>
>> jdk5-server: 40ms
>> jdk5-client: 61ms
>> jdk5-xint : 832ms
>>
>> So JDK7 is slower in every case, the regression seems to have landed in
>> jdk6 (I was using openjdk6).
>>
>> Should I file a bug-report about this behaviour?
>>
>> Thanks, Clemens
>>
>>
>> public class LockPerf {
>> static ReentrantLock lock = new ReentrantLock();
>>
>> public static void main(String[] args) {
>> while (true) {
>> long start2 = System.nanoTime();
>> for(int i=0; i < 1000; i++) {
>> lockBench();
>> }
>> System.out.println("Lock bench: " + ((System.nanoTime() - start2)) / 1000000);
>> }
>> }
>>
>> private static void lockBench() {
>> for (int i = 0; i < 1000; i++) {
>> lock.lock();
>> lock.unlock();
>> }
>> }
>> }
>>
>>
>> On Aug 11, 2011 11:38 AM, "Clemens Eisserer" <linuxhippy at gmail.com> wrote:
>> > Hi Vitaly,
>> >
>> > Which OS are you using?
>> >>
>> > Linux-3.0 (Fedora 15)
>> >
>> >
>> >> Also, you should use System.nanoTime() for this type of timing as it gives
>> >> you a more precise timer.
>> >>
>> > I tried, but results remained the same. ~53ms for jdk6/7, ~41 for JDK5.
>> > I was using the server compiler both times.
>> >
>> > Thanks, Clemens
>>
>

From vladimir.kozlov at oracle.com  Thu Aug 11 23:30:29 2011
From: vladimir.kozlov at oracle.com (vladimir.kozlov at oracle.com)
Date: Fri, 12 Aug 2011 06:30:29 +0000
Subject: hg: hsx/hotspot-comp/hotspot: 7063629: use cbcond in C2 generated
	code on T4
Message-ID: <20110812063035.2F11447B02@hg.openjdk.java.net>

Changeset: 95134e034042
Author:    kvn
Date:      2011-08-11 12:08 -0700
URL:       http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/95134e034042

7063629: use cbcond in C2 generated code on T4
Summary: Use new short branch instruction in C2 generated code.
Reviewed-by: never

! src/cpu/sparc/vm/assembler_sparc.hpp
! src/cpu/sparc/vm/sparc.ad
! src/cpu/sparc/vm/vm_version_sparc.cpp
! src/cpu/x86/vm/assembler_x86.cpp
! src/cpu/x86/vm/assembler_x86.hpp
! src/cpu/x86/vm/x86_32.ad
! src/cpu/x86/vm/x86_64.ad
! src/os_cpu/linux_x86/vm/linux_x86_32.ad
! src/os_cpu/linux_x86/vm/linux_x86_64.ad
! src/os_cpu/solaris_x86/vm/solaris_x86_32.ad
! src/os_cpu/solaris_x86/vm/solaris_x86_64.ad
! src/share/vm/adlc/formssel.cpp
! src/share/vm/adlc/output_h.cpp
! src/share/vm/opto/block.cpp
! src/share/vm/opto/block.hpp
! src/share/vm/opto/compile.hpp
! src/share/vm/opto/machnode.hpp
! src/share/vm/opto/matcher.hpp
! src/share/vm/opto/node.hpp
! src/share/vm/opto/output.cpp


From fweimer at bfk.de  Fri Aug 12 00:57:50 2011
From: fweimer at bfk.de (Florian Weimer)
Date: Fri, 12 Aug 2011 07:57:50 +0000
Subject: ReentrantLock performance regression between JDK5 and 6/7?
In-Reply-To: <0B41DBAD-E290-425F-8214-F90DBBFCC5E3@oracle.com> (Tom
	Rodriguez's message of "Thu, 11 Aug 2011 15:02:53 -0700")
References: <CAFvQSYT=MEEQ933ekHNPUVp5WporN62n2Rk8QFgppmrBho1SOA@mail.gmail.com>
	<CAHjP37EcxUcdEZFqO_jEj+8otnbGcqAUc0ydZRZpv1+808Bdpg@mail.gmail.com>
	<CAFvQSYQ8atHxh4GSkw0HH+Oh5A92d4-py4c-i1bJt2XDPA_-LA@mail.gmail.com>
	<CAHjP37F_X2hZE1C_j-TrEWydorncU1dajf2CHw5ysK9erYJfDQ@mail.gmail.com>
	<CAFvQSYRV+B1WyRW5nJznHGAnbx_PBwwo8JsajNofUK_2oRURTw@mail.gmail.com>
	<0B41DBAD-E290-425F-8214-F90DBBFCC5E3@oracle.com>
Message-ID: <824o1nukn5.fsf@mid.bfk.de>

* Tom Rodriguez:

> I believe this was caused by the switch to using lock addl[esp], 0
> instead of mfence for volatile membars, 6822204.  My review request
> for that said that at the time I didn't measure any performance change
> for Intel, http://cr.openjdk.java.net/~never/6822204.  On your
> microbenchmark I can measure the difference though so I'm going to
> remeasure derby which previously showed the big difference.  We may
> want to make the lock addl be AMD specific.

Couldn't the relative speed of the two instructions also depend on the
type of benchmark?

-- 
Florian Weimer                <fweimer at bfk.de>
BFK edv-consulting GmbH       http://www.bfk.de/
Kriegsstra?e 100              tel: +49-721-96201-1
D-76133 Karlsruhe             fax: +49-721-96201-99

From tom.rodriguez at oracle.com  Fri Aug 12 11:22:14 2011
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Fri, 12 Aug 2011 11:22:14 -0700
Subject: ReentrantLock performance regression between JDK5 and 6/7?
In-Reply-To: <824o1nukn5.fsf@mid.bfk.de>
References: <CAFvQSYT=MEEQ933ekHNPUVp5WporN62n2Rk8QFgppmrBho1SOA@mail.gmail.com>
	<CAHjP37EcxUcdEZFqO_jEj+8otnbGcqAUc0ydZRZpv1+808Bdpg@mail.gmail.com>
	<CAFvQSYQ8atHxh4GSkw0HH+Oh5A92d4-py4c-i1bJt2XDPA_-LA@mail.gmail.com>
	<CAHjP37F_X2hZE1C_j-TrEWydorncU1dajf2CHw5ysK9erYJfDQ@mail.gmail.com>
	<CAFvQSYRV+B1WyRW5nJznHGAnbx_PBwwo8JsajNofUK_2oRURTw@mail.gmail.com>
	<0B41DBAD-E290-425F-8214-F90DBBFCC5E3@oracle.com>
	<824o1nukn5.fsf@mid.bfk.de>
Message-ID: <0F9B135C-E961-4A73-8CD6-A17BAF2ABA19@oracle.com>


On Aug 12, 2011, at 12:57 AM, Florian Weimer wrote:

> * Tom Rodriguez:
> 
>> I believe this was caused by the switch to using lock addl[esp], 0
>> instead of mfence for volatile membars, 6822204.  My review request
>> for that said that at the time I didn't measure any performance change
>> for Intel, http://cr.openjdk.java.net/~never/6822204.  On your
>> microbenchmark I can measure the difference though so I'm going to
>> remeasure derby which previously showed the big difference.  We may
>> want to make the lock addl be AMD specific.
> 
> Couldn't the relative speed of the two instructions also depend on the
> type of benchmark?

These are primarily being emitted for volatile fences so many programs won't care about their speed at all.  If you look at my other email it suggests that the difference is that Intel chips prior to Nehalem had a heavier-weight implementation of lock addl than was required.  mfence stayed approximately the same between processor versions, with its speed pretty much tracking the relative clock speeds, 2.4 for the Tigerton and 2.8 for Nehalem.  The original data suggested no performance change on Nehalem when switching instructions so it probably doesn't care either way.
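
To make "volatile fences" concrete, here is a minimal Java sketch. It assumes the usual HotSpot mapping on x86, where a store to a volatile field is followed by a StoreLoad barrier, and that barrier is the instruction (mfence or lock addl) under discussion. The class and method names are made up for illustration. ReentrantLock.unlock() ends in a volatile store of this kind to the synchronizer state, which is one reason the lock microbenchmark is sensitive to the choice.

```java
final class VolatileStoreLoop {
    // A volatile field: each store to it requires a StoreLoad barrier on x86.
    private static volatile int state;

    // Performs `iterations` volatile stores (iterations >= 1) and returns
    // the last value written.
    static int run(int iterations) {
        for (int i = 1; i <= iterations; i++) {
            state = i; // compiled code places the membar after this store
        }
        return state;
    }
}
```

Timing a loop like this under each JDK isolates the cost of the barrier instruction from the rest of the lock path, though any numbers will depend on the processor generation.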

tom

> 
> -- 
> Florian Weimer                <fweimer at bfk.de>
> BFK edv-consulting GmbH       http://www.bfk.de/
> Kriegsstraße 100              tel: +49-721-96201-1
> D-76133 Karlsruhe             fax: +49-721-96201-99


From vladimir.kozlov at oracle.com  Mon Aug 15 08:58:12 2011
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 15 Aug 2011 08:58:12 -0700
Subject: Request for reviews (XS):  7079317: Incorrect branch's destination
	block in PrintoOptoAssembly output
Message-ID: <4E494214.2080407@oracle.com>

http://cr.openjdk.java.net/~kvn/7079317/webrev

7079317: Incorrect branch's destination block in PrintoOptoAssembly output

After the changes for 7063629, PrintOptoAssembly output shows all branches with 
B0 as the destination block.
Remove the unneeded debug verification code, which overwrites label and block 
information for branches. Other checks there still verify that the code size 
did not change.

From tom.rodriguez at oracle.com  Mon Aug 15 10:46:51 2011
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Mon, 15 Aug 2011 10:46:51 -0700
Subject: Request for reviews (XS): 7079317: Incorrect branch's destination
	block in PrintoOptoAssembly output
In-Reply-To: <4E494214.2080407@oracle.com>
References: <4E494214.2080407@oracle.com>
Message-ID: <45803A4A-49AB-4B4A-88CB-44419E09AE7B@oracle.com>

I don't understand how calling insts_size and Node::size causes a bug.  What am I missing?

tom

On Aug 15, 2011, at 8:58 AM, Vladimir Kozlov wrote:

> http://cr.openjdk.java.net/~kvn/7079317/webrev
> 
> 7079317: Incorrect branch's destination block in PrintoOptoAssembly output
> 
> After changes for 7063629 PrintoOptoAssembly output shows all branches have B0 as destination block.
> Remove unneeded debug verification code which overwrites label and block information for branches. There are other checks there which verify that code size was not changed.


From vladimir.kozlov at oracle.com  Mon Aug 15 10:50:10 2011
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 15 Aug 2011 10:50:10 -0700
Subject: Request for reviews (XS): 7079317: Incorrect branch's destination
	block in PrintoOptoAssembly output
In-Reply-To: <45803A4A-49AB-4B4A-88CB-44419E09AE7B@oracle.com>
References: <4E494214.2080407@oracle.com>
	<45803A4A-49AB-4B4A-88CB-44419E09AE7B@oracle.com>
Message-ID: <4E495C52.7080807@oracle.com>

Node::size() for branches calls code in scratch_emit_size() which resets the label 
and block. Another solution for this problem would be to save/restore the label and 
block in scratch_emit_size(), but it would require a lot more code changes.
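
The save/restore alternative can be sketched as follows. This is a small Java model, not the HotSpot C++ code: ScratchEmit, Branch and emit are hypothetical stand-ins, and the fixed 4-byte size is an arbitrary placeholder. The point is only that the sizing pass swaps in a throwaway label and then puts the real label and block back, so later printing still sees the true target.

```java
final class ScratchEmit {
    /** Hypothetical stand-in for a branch node's label operand. */
    static final class Branch {
        String label;
        int blockNum;

        Branch(String label, int blockNum) {
            this.label = label;
            this.blockNum = blockNum;
        }
    }

    // Measures a branch by emitting it against a throwaway label, then
    // restores the real label and block number.
    static int scratchEmitSize(Branch b) {
        String savedLabel = b.label;
        int savedBlock = b.blockNum;
        try {
            b.label = "scratch"; // fake label used only for sizing
            b.blockNum = 0;
            return emit(b);
        } finally {
            b.label = savedLabel;
            b.blockNum = savedBlock;
        }
    }

    // Placeholder "emission": pretend every branch is 4 bytes long.
    private static int emit(Branch b) {
        return 4;
    }
}
```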

Vladimir

Tom Rodriguez wrote:
> I don't understand how calling insts_size and Node::size causes a bug.  What am I missing?
> 
> tom
> 
> On Aug 15, 2011, at 8:58 AM, Vladimir Kozlov wrote:
> 
>> http://cr.openjdk.java.net/~kvn/7079317/webrev
>>
>> 7079317: Incorrect branch's destination block in PrintoOptoAssembly output
>>
>> After changes for 7063629 PrintoOptoAssembly output shows all branches have B0 as destination block.
>> Remove unneeded debug verification code which overwrites label and block information for branches. There are other checks there which verify that code size was not changed.
> 

From tom.rodriguez at oracle.com  Mon Aug 15 11:05:39 2011
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Mon, 15 Aug 2011 11:05:39 -0700
Subject: Request for reviews (XS): 7079317: Incorrect branch's destination
	block in PrintoOptoAssembly output
In-Reply-To: <4E495C52.7080807@oracle.com>
References: <4E494214.2080407@oracle.com>
	<45803A4A-49AB-4B4A-88CB-44419E09AE7B@oracle.com>
	<4E495C52.7080807@oracle.com>
Message-ID: <08E31550-58B4-4125-876A-304C4465BC78@oracle.com>


On Aug 15, 2011, at 10:50 AM, Vladimir Kozlov wrote:

> Node::size() for branches calls code in scratch_emit_size() which resets label and block. An other solution for this problem would be save/restore label and block in scratch_emit_size() but it would require a lot more code changes.

Ah.  Fixing scratch_emit_size seems better since it's kind of a surprising behaviour.  It's not that much code is it?

tom

> 
> Vladimir
> 
> Tom Rodriguez wrote:
>> I don't understand how calling insts_size and Node::size causes a bug.  What am I missing?
>> tom
>> On Aug 15, 2011, at 8:58 AM, Vladimir Kozlov wrote:
>>> http://cr.openjdk.java.net/~kvn/7079317/webrev
>>> 
>>> 7079317: Incorrect branch's destination block in PrintoOptoAssembly output
>>> 
>>> After changes for 7063629 PrintoOptoAssembly output shows all branches have B0 as destination block.
>>> Remove unneeded debug verification code which overwrites label and block information for branches. There are other checks there which verify that code size was not changed.


From vladimir.kozlov at oracle.com  Mon Aug 15 12:04:40 2011
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 15 Aug 2011 12:04:40 -0700
Subject: Request for reviews (XS): 7079317: Incorrect branch's destination
	block in PrintoOptoAssembly output
In-Reply-To: <08E31550-58B4-4125-876A-304C4465BC78@oracle.com>
References: <4E494214.2080407@oracle.com>
	<45803A4A-49AB-4B4A-88CB-44419E09AE7B@oracle.com>
	<4E495C52.7080807@oracle.com>
	<08E31550-58B4-4125-876A-304C4465BC78@oracle.com>
Message-ID: <4E496DC8.60107@oracle.com>

Tom Rodriguez wrote:
> On Aug 15, 2011, at 10:50 AM, Vladimir Kozlov wrote:
> 
>> Node::size() for branches calls code in scratch_emit_size() which resets label and block. An other solution for this problem would be save/restore label and block in scratch_emit_size() but it would require a lot more code changes.
> 
> Ah.  Fixing scratch_emit_size seems better since it's kind of a surprising behaviour.  It's not that much code is it?

It needs a virtual method in MachNode, which increases the vtable of all Mach nodes. 
Here is the webrev:

http://cr.openjdk.java.net/~kvn/7079317/webrev

Vladimir

> 
> tom
> 
>> Vladimir
>>
>> Tom Rodriguez wrote:
>>> I don't understand how calling insts_size and Node::size causes a bug.  What am I missing?
>>> tom
>>> On Aug 15, 2011, at 8:58 AM, Vladimir Kozlov wrote:
>>>> http://cr.openjdk.java.net/~kvn/7079317/webrev
>>>>
>>>> 7079317: Incorrect branch's destination block in PrintoOptoAssembly output
>>>>
>>>> After changes for 7063629 PrintoOptoAssembly output shows all branches have B0 as destination block.
>>>> Remove unneeded debug verification code which overwrites label and block information for branches. There are other checks there which verify that code size was not changed.
> 

From tom.rodriguez at oracle.com  Mon Aug 15 12:48:18 2011
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Mon, 15 Aug 2011 12:48:18 -0700
Subject: Request for reviews (XS): 7079317: Incorrect branch's destination
	block in PrintoOptoAssembly output
In-Reply-To: <4E496DC8.60107@oracle.com>
References: <4E494214.2080407@oracle.com>
	<45803A4A-49AB-4B4A-88CB-44419E09AE7B@oracle.com>
	<4E495C52.7080807@oracle.com>
	<08E31550-58B4-4125-876A-304C4465BC78@oracle.com>
	<4E496DC8.60107@oracle.com>
Message-ID: <8AA4ABA0-5C39-4F77-9FBD-F4B006A4AFC5@oracle.com>


On Aug 15, 2011, at 12:04 PM, Vladimir Kozlov wrote:

> Tom Rodriguez wrote:
>> On Aug 15, 2011, at 10:50 AM, Vladimir Kozlov wrote:
>>> Node::size() for branches calls code in scratch_emit_size() which resets label and block. An other solution for this problem would be save/restore label and block in scratch_emit_size() but it would require a lot more code changes.
>> Ah.  Fixing scratch_emit_size seems better since it's kind of a surprising behaviour.  It's not that much code is it?
> 
> It needs a virtual method in MachNode which increase vtable of all Mach nodes. Here is webrev:

If we're really concerned about vtable size, all of those subtype specific setter/getters could probably be elsewhere down in the hierarchy.  The only meaningful implementations of label_set are in subclasses of MachGotoNode and MachIfNode so it seems like it could be moved into a new superclass of them.

I guess alternatively you could have a single virtual which returns the labelOper and implement label_set and save_label non-virtually in terms of that, though that probably doesn't play well with MachNullCheck which is_Branch but doesn't have a label.  The whole labelOper machinery looks ridiculously complicated...
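
A rough sketch of the first layout described above, where the label accessors live in a new superclass of the goto and if nodes instead of in the common base. Every name ending in Sketch is hypothetical and the signatures are simplified assumptions; the real classes are declared differently, and HotSpot does not use dynamic_cast (it is used here only to keep the example short).

```cpp
#include <cassert>

// Hypothetical stand-in for an assembler label.
struct Label {};

// Common base of all machine nodes: no label-related virtuals, so the
// vtable of non-branch nodes does not grow.
struct MachNodeSketch {
  virtual ~MachNodeSketch() = default;
  virtual bool is_branch() const { return false; }
};

// Hypothetical intermediate superclass: only branches that really carry
// a label pay for the label virtuals.
struct LabeledBranchSketch : MachNodeSketch {
  bool is_branch() const override { return true; }
  virtual void label_set(Label* l, int block_num) = 0;
  virtual void save_label(Label** l, int* block_num) const = 0;
};

// Stand-in for a goto node (an if node would look the same).
struct GotoSketch : LabeledBranchSketch {
  Label* _label = nullptr;
  int _block_num = 0;
  void label_set(Label* l, int b) override { _label = l; _block_num = b; }
  void save_label(Label** l, int* b) const override {
    *l = _label;
    *b = _block_num;
  }
};

// A null check is a branch but has no label, so it derives from the
// common base directly and never sees label_set().
struct NullCheckSketch : MachNodeSketch {
  bool is_branch() const override { return true; }
};

// Callers test for the labeled subclass rather than is_branch() alone.
inline LabeledBranchSketch* as_labeled_branch(MachNodeSketch* n) {
  return dynamic_cast<LabeledBranchSketch*>(n);
}
```

The null check case shows why is_branch() alone is not a sufficient test before touching a label in this layout.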

Anyway, your change is ok with me as is.

tom

> 
> http://cr.openjdk.java.net/~kvn/7079317/webrev
> 
> Vladimir
> 
>> tom
>>> Vladimir
>>> 
>>> Tom Rodriguez wrote:
>>>> I don't understand how calling insts_size and Node::size causes a bug.  What am I missing?
>>>> tom
>>>> On Aug 15, 2011, at 8:58 AM, Vladimir Kozlov wrote:
>>>>> http://cr.openjdk.java.net/~kvn/7079317/webrev
>>>>> 
>>>>> 7079317: Incorrect branch's destination block in PrintoOptoAssembly output
>>>>> 
>>>>> After changes for 7063629 PrintoOptoAssembly output shows all branches have B0 as destination block.
>>>>> Remove unneeded debug verification code which overwrites label and block information for branches. There are other checks there which verify that code size was not changed.


From vladimir.kozlov at oracle.com  Mon Aug 15 17:20:53 2011
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 15 Aug 2011 17:20:53 -0700
Subject: Request for reviews (XS): 7079317: Incorrect branch's destination
	block in PrintoOptoAssembly output
In-Reply-To: <8AA4ABA0-5C39-4F77-9FBD-F4B006A4AFC5@oracle.com>
References: <4E494214.2080407@oracle.com>
	<45803A4A-49AB-4B4A-88CB-44419E09AE7B@oracle.com>
	<4E495C52.7080807@oracle.com>
	<08E31550-58B4-4125-876A-304C4465BC78@oracle.com>
	<4E496DC8.60107@oracle.com>
	<8AA4ABA0-5C39-4F77-9FBD-F4B006A4AFC5@oracle.com>
Message-ID: <4E49B7E5.8080909@oracle.com>

Tom,

You should not give me these ideas since I can't back out now :) . Here is an
implementation using MachBranchNode. The only problem was the JumpX mach node, which
is a subclass of MachConstantNode. But it is fine since it does not have a label,
a short version or a delay slot (the sparc instruction has a delay slot but we use
the ialu_reg_reg pipe_class). It needs only one additional check in output.cpp where
Kill projections are processed.

http://cr.openjdk.java.net/~kvn/7079317/webrev

Thanks,
Vladimir

Tom Rodriguez wrote:
> On Aug 15, 2011, at 12:04 PM, Vladimir Kozlov wrote:
> 
>> Tom Rodriguez wrote:
>>> On Aug 15, 2011, at 10:50 AM, Vladimir Kozlov wrote:
>>>> Node::size() for branches calls code in scratch_emit_size() which resets label and block. An other solution for this problem would be save/restore label and block in scratch_emit_size() but it would require a lot more code changes.
>>> Ah.  Fixing scratch_emit_size seems better since it's kind of a surprising behaviour.  It's not that much code is it?
>> It needs a virtual method in MachNode which increase vtable of all Mach nodes. Here is webrev:
> 
> If we're really concerned about vtable size, all of those subtype specific setter/getters could probably be elsewhere down in the hierarchy.  The only meaningful implementations of label_set are in subclasses of MachGotoNode and MachIfNode so it seems like it could be moved into a new superclass of them.
> 
> I guess alternatively you could have a single virtual which returns the labelOper and implement label_set and save_label non-virtually in terms of that, though that probably doesn't play well with MachNullCheck which is_Branch but doesn't have a label.  The whole labelOper machinery looks ridiculously complicated...
> 
> Anyway, your change is ok with me as is.
> 
> tom
> 
>> http://cr.openjdk.java.net/~kvn/7079317/webrev
>>
>> Vladimir
>>
>>> tom
>>>> Vladimir
>>>>
>>>> Tom Rodriguez wrote:
>>>>> I don't understand how calling insts_size and Node::size causes a bug.  What am I missing?
>>>>> tom
>>>>> On Aug 15, 2011, at 8:58 AM, Vladimir Kozlov wrote:
>>>>>> http://cr.openjdk.java.net/~kvn/7079317/webrev
>>>>>>
>>>>>> 7079317: Incorrect branch's destination block in PrintoOptoAssembly output
>>>>>>
>>>>>> After changes for 7063629 PrintoOptoAssembly output shows all branches have B0 as destination block.
>>>>>> Remove unneeded debug verification code which overwrites label and block information for branches. There are other checks there which verify that code size was not changed.
> 

From vladimir.kozlov at oracle.com  Mon Aug 15 18:12:03 2011
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 15 Aug 2011 18:12:03 -0700
Subject: Request for reviews (M):  7079329: Adjust allocation prefetching
	for T4
Message-ID: <4E49C3E3.6060903@oracle.com>

http://cr.openjdk.java.net/~kvn/7079329/webrev

7079329: Adjust allocation prefetching for T4

The L2 cache line size is 32 bytes on T4, instead of 64 bytes on earlier T-series
processors. As a result, a BIS instruction prefetches only 32 bytes. Jbb2005 runs show
that prefetching 64 bytes is still better on T4, so 2 BIS instructions should be issued.
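
The arithmetic behind "2 BIS instructions" can be written out as a small helper. This is only an illustration; stores_per_prefetch is a hypothetical name and does not appear in the webrev.

```cpp
#include <cassert>

// Number of cache-line-sized stores needed to cover prefetch_bytes,
// rounding up. Both arguments are in bytes.
constexpr unsigned stores_per_prefetch(unsigned prefetch_bytes,
                                       unsigned line_bytes) {
  return (prefetch_bytes + line_bytes - 1) / line_bytes;
}

// 64-byte lines: one store covers the whole 64-byte prefetch.
static_assert(stores_per_prefetch(64, 64) == 1, "one store per 64-byte line");
// 32-byte lines: two stores are needed for the same 64 bytes.
static_assert(stores_per_prefetch(64, 32) == 2, "two stores per 64 bytes");
```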

BIS can't be used for general prefetching since it may fault. A new
PrefetchAllocation node was added for allocation prefetching.

Changed prefetchAlloc_bis parameter from memory to regP.

Use AllocatePrefetchInstr on Sparc to specify which instruction to use for
allocation prefetching (0: prefetch write, 1: BIS).

Added new cacheLineAdrX instructions on Sparc to reduce the number of instructions
generated for finding the next cache line address.

Added a new flag, AllocateInstPrefetchLines, to specify the number of lines to
prefetch for instance allocation.

L1_data_cache_line_size() renamed to prefetch_data_size().

From christian.thalinger at oracle.com  Tue Aug 16 02:29:44 2011
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Tue, 16 Aug 2011 11:29:44 +0200
Subject: Request for reviews (M): 7079329: Adjust allocation prefetching
	for T4
In-Reply-To: <4E49C3E3.6060903@oracle.com>
References: <4E49C3E3.6060903@oracle.com>
Message-ID: <AF474182-3118-4EA4-B06E-836A5A50D6AA@oracle.com>


On Aug 16, 2011, at 3:12 AM, Vladimir Kozlov wrote:

> http://cr.openjdk.java.net/~kvn/7079329/webrev
> 
> 7079329: Adjust allocation prefetching for T4
> 
> L2 cache line size is 32 bytes on T4 instead of 64 bytes on T series before. As result BIS instruction prefetches only 32 bytes. Jbb2005 runs show that prefetching 64 bytes is still better on T4 so 2 BIS instructions should be issued.
> 
> BIS can't be use for general prefetching since it may fault. New PrefetchAllocation node was added for allocation prefetching.
> 
> Changed prefetchAlloc_bis parameter from memory to regP.
> 
> Use AllocatePrefetchInstr on Sparc to allow specify what instruction to use for allocation prefetching (0: prefetch write, 1: BIS).
> 
> Added new instructions on Sparc cacheLineAdrX to reduce number of instructions generated for finding next cache line address.
> 
> Added new flag AllocateInstPrefetchLines to specify number of lines to prefetch for instance allocation.
> 
> L1_data_cache_line_size() renamed to prefetch_data_size().

src/cpu/x86/vm/x86_32.ad:
src/cpu/x86/vm/x86_64.ad:

Can you use MacroAssembler instructions to emit the code for the new instructs?

src/cpu/sparc/vm/vm_version_sparc.cpp:

+       if (is_T4()) {
+         // Double number of prefetched cache lines on T4
+         // since L2 cache line size is smaller (32 bytes).
+         if (FLAG_IS_DEFAULT(AllocatePrefetchLines)) {
+           FLAG_SET_DEFAULT(AllocatePrefetchLines, 6);
+         }
+         if (FLAG_IS_DEFAULT(AllocateInstPrefetchLines)) {
+           FLAG_SET_DEFAULT(AllocateInstPrefetchLines, 2);
+         }
+       }

Maybe you should use *2 here.

Otherwise this looks good.

-- Christian

From igor.veresov at oracle.com  Tue Aug 16 02:47:58 2011
From: igor.veresov at oracle.com (Igor Veresov)
Date: Tue, 16 Aug 2011 02:47:58 -0700
Subject: Request for reviews (M): 7079329: Adjust allocation
	prefetching for T4
In-Reply-To: <4E49C3E3.6060903@oracle.com>
References: <4E49C3E3.6060903@oracle.com>
Message-ID: <EE01C8A2008B4A1B8746809779FE9107@oracle.com>

 I think this looks good. 

igor

On Monday, August 15, 2011 at 6:12 PM, Vladimir Kozlov wrote:

> http://cr.openjdk.java.net/~kvn/7079329/webrev
> 
> 7079329: Adjust allocation prefetching for T4
> 
> L2 cache line size is 32 bytes on T4 instead of 64 bytes on T series before. As 
> result BIS instruction prefetches only 32 bytes. Jbb2005 runs show that 
> prefetching 64 bytes is still better on T4 so 2 BIS instructions should be issued.
> 
> BIS can't be use for general prefetching since it may fault. New 
> PrefetchAllocation node was added for allocation prefetching.
> 
> Changed prefetchAlloc_bis parameter from memory to regP.
> 
> Use AllocatePrefetchInstr on Sparc to allow specify what instruction to use for 
> allocation prefetching (0: prefetch write, 1: BIS).
> 
> Added new instructions on Sparc cacheLineAdrX to reduce number of instructions 
> generated for finding next cache line address.
> 
> Added new flag AllocateInstPrefetchLines to specify number of lines to prefetch 
> for instance allocation.
> 
> L1_data_cache_line_size() renamed to prefetch_data_size().


From martin.doerr at sap.com  Tue Aug 16 03:31:53 2011
From: martin.doerr at sap.com (Doerr, Martin)
Date: Tue, 16 Aug 2011 12:31:53 +0200
Subject: allocation prefetching with block initializing instructions
Message-ID: <160598AAAEA6C640BF796BA28D836C6404FC0D85E6@DEWDFECCR04.wdf.sap.corp>

Hello everybody,

I have read your emails about the allocation prefetching on SPARC.
Avoiding fetching the cache lines from memory seems to make a lot of sense.
However, it should also be possible to use these block initializing stores to
replace the ClearArray nodes. We are losing quite some time in these
clear loops.

Have you guys already thought about this?

I had played with the ZeroTLAB switch some time ago, but the TLABs appear to
get too large so clearing them at once doesn't perform well. But if we only
clear to something like a prefetch watermark and get rid of the ClearArray
we should get better performance. We only have to make sure that we always clear
up to some distance behind the object being allocated.
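
A toy model of the watermark idea, to make the invariant concrete. It is a sketch under assumptions, not HotSpot code: WatermarkBufferSketch is hypothetical, memset stands in for block initializing stores, and the 0xAB fill models stale memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Bump-pointer buffer that keeps memory zeroed up to a watermark ahead of
// the allocation pointer, so objects need no separate clear loop.
class WatermarkBufferSketch {
  std::vector<unsigned char> _mem;
  std::size_t _top = 0;     // next free byte
  std::size_t _zeroed = 0;  // bytes in [_top, _zeroed) are known to be zero
  std::size_t _distance;    // how far past the new object to keep clearing
 public:
  WatermarkBufferSketch(std::size_t size, std::size_t distance)
      : _mem(size, 0xAB), _distance(distance) {}

  // Returns nullptr when the buffer is exhausted.
  unsigned char* allocate(std::size_t bytes) {
    if (bytes > _mem.size() - _top) return nullptr;
    std::size_t new_top = _top + bytes;
    // Move the watermark to new_top + distance, capped at the buffer end.
    std::size_t room = _mem.size() - new_top;
    std::size_t target = new_top + (_distance < room ? _distance : room);
    if (target > _zeroed) {
      std::memset(_mem.data() + _zeroed, 0, target - _zeroed);
      _zeroed = target;
    }
    unsigned char* obj = _mem.data() + _top;  // already zeroed
    _top = new_top;
    return obj;
  }

  std::size_t zeroed_up_to() const { return _zeroed; }
};
```

Each byte is cleared once, in increments tied to allocation progress, rather than clearing the whole buffer up front as with ZeroTLAB.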

I'm looking forward to reading your comments. Kind regards,
Martin D



From paul.hohensee at oracle.com  Tue Aug 16 06:01:12 2011
From: paul.hohensee at oracle.com (Paul Hohensee)
Date: Tue, 16 Aug 2011 09:01:12 -0400
Subject: Request for reviews (M): 7079329: Adjust allocation prefetching
	for T4
In-Reply-To: <4E49C3E3.6060903@oracle.com>
References: <4E49C3E3.6060903@oracle.com>
Message-ID: <4E4A6A18.6080807@oracle.com>

You're changing the meaning of an existing flag, AllocatePrefetchLines, to
apply only to arrays, right?

If so, I'd add another flag for arrays, maybe call it 
AllocateArrayPrefetchLines,
and change the code so AllocatePrefetchLines becomes an optional parameter.
E.g., default it to -1 in globals.hpp, and if it's specified on the 
command line,
set both AllocateArrayPrefetchLines and AllocateInstPrefetchLines to the
command line value.  That would retain backward compatibility: I believe
I've seen AllocatePrefetchLines used in a few jbb submissions.
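
The fallback described above could look roughly like this. It is a sketch of the proposal only: resolve_prefetch_lines and PrefetchLinesSketch are hypothetical, AllocateArrayPrefetchLines is the suggested (not existing) flag, and real flag processing would go through the VM's argument handling rather than a plain function.

```cpp
#include <cassert>

// Effective settings after resolving the three flags.
struct PrefetchLinesSketch {
  long array_lines;
  long instance_lines;
};

// legacy_lines < 0 models "AllocatePrefetchLines was not given on the
// command line" (the proposed -1 default).
inline PrefetchLinesSketch resolve_prefetch_lines(long legacy_lines,
                                                  long array_lines,
                                                  long instance_lines) {
  if (legacy_lines >= 0) {
    // Old flag given: it sets both new flags, keeping old command lines valid.
    return PrefetchLinesSketch{legacy_lines, legacy_lines};
  }
  // Old flag absent: the two specific flags are used as they are.
  return PrefetchLinesSketch{array_lines, instance_lines};
}
```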

Also, I'd rename AllocateInstPrefetchLines to 
AllocateInstancePrefetchLines.  'Inst'
is a bit confusing to me and perhaps to others: the first thing I think 
of is 'instruction'.

Paul

On 8/15/11 9:12 PM, Vladimir Kozlov wrote:
> http://cr.openjdk.java.net/~kvn/7079329/webrev
>
> 7079329: Adjust allocation prefetching for T4
>
> L2 cache line size is 32 bytes on T4 instead of 64 bytes on T series 
> before. As result BIS instruction prefetches only 32 bytes. Jbb2005 
> runs show that prefetching 64 bytes is still better on T4 so 2 BIS 
> instructions should be issued.
>
> BIS can't be use for general prefetching since it may fault. New 
> PrefetchAllocation node was added for allocation prefetching.
>
> Changed prefetchAlloc_bis parameter from memory to regP.
>
> Use AllocatePrefetchInstr on Sparc to allow specify what instruction 
> to use for allocation prefetching (0: prefetch write, 1: BIS).
>
> Added new instructions on Sparc cacheLineAdrX to reduce number of 
> instructions generated for finding next cache line address.
>
> Added new flag AllocateInstPrefetchLines to specify number of lines to 
> prefetch for instance allocation.
>
> L1_data_cache_line_size() renamed to prefetch_data_size().

From paul.hohensee at oracle.com  Tue Aug 16 06:01:36 2011
From: paul.hohensee at oracle.com (Paul Hohensee)
Date: Tue, 16 Aug 2011 09:01:36 -0400
Subject: Request for reviews (M): 7079329: Adjust allocation prefetching
	for T4
In-Reply-To: <4E49C3E3.6060903@oracle.com>
References: <4E49C3E3.6060903@oracle.com>
Message-ID: <4E4A6A30.6090608@oracle.com>

You're changing the meaning of an existing flag, AllocatePrefetchLines, to
apply only to arrays, right?

If so, I'd add another flag for arrays, maybe call it 
AllocateArrayPrefetchLines,
and change the code so AllocatePrefetchLines becomes an optional parameter.
E.g., default it to -1 in globals.hpp, and if it's specified on the 
command line,
set both AllocateArrayPrefetchLines and AllocateInstPrefetchLines to the
command line value.  That would retain backward compatibility: I remember
seeing AllocatePrefetchLines used in a few jbb submissions.

Also, I'd rename AllocateInstPrefetchLines to 
AllocateInstancePrefetchLines.  'Inst'
is a bit confusing to me and perhaps to others: the first thing I think 
of is 'instruction'.

Paul

On 8/15/11 9:12 PM, Vladimir Kozlov wrote:
> http://cr.openjdk.java.net/~kvn/7079329/webrev
>
> 7079329: Adjust allocation prefetching for T4
>
> L2 cache line size is 32 bytes on T4 instead of 64 bytes on T series 
> before. As result BIS instruction prefetches only 32 bytes. Jbb2005 
> runs show that prefetching 64 bytes is still better on T4 so 2 BIS 
> instructions should be issued.
>
> BIS can't be use for general prefetching since it may fault. New 
> PrefetchAllocation node was added for allocation prefetching.
>
> Changed prefetchAlloc_bis parameter from memory to regP.
>
> Use AllocatePrefetchInstr on Sparc to allow specify what instruction 
> to use for allocation prefetching (0: prefetch write, 1: BIS).
>
> Added new instructions on Sparc cacheLineAdrX to reduce number of 
> instructions generated for finding next cache line address.
>
> Added new flag AllocateInstPrefetchLines to specify number of lines to 
> prefetch for instance allocation.
>
> L1_data_cache_line_size() renamed to prefetch_data_size().

From paul.hohensee at oracle.com  Tue Aug 16 06:11:38 2011
From: paul.hohensee at oracle.com (Paul Hohensee)
Date: Tue, 16 Aug 2011 09:11:38 -0400
Subject: Request for reviews (M): 7079329: Adjust allocation prefetching
	for T4
In-Reply-To: <4E4A6A30.6090608@oracle.com>
References: <4E49C3E3.6060903@oracle.com> <4E4A6A30.6090608@oracle.com>
Message-ID: <4E4A6C8A.9030306@oracle.com>

Also, is there a way to avoid using #ifdef SPARC in 
threadLocalAllocBuffer.hpp?
Maybe add a predicate to vm_version that says whether or not to play the 
tlab
reserve game.
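
One possible shape for such a predicate, sketched with hypothetical names (prefetch_reserve_words, the two version structs, and the value 8 are all made up for illustration). Shared code asks the platform's version class instead of testing a preprocessor symbol.

```cpp
#include <cassert>
#include <cstddef>

// Generic default: nothing extra is reserved at the end of a TLAB.
struct AbstractVersionSketch {
  static std::size_t prefetch_reserve_words() { return 0; }
};

// A platform whose allocation prefetch writes ahead of the allocation
// pointer and therefore needs spare room at the end of the TLAB.
struct StoringPrefetchVersionSketch : AbstractVersionSketch {
  static std::size_t prefetch_reserve_words() { return 8; }  // made-up value
};

// Shared code, parameterized on the platform version type rather than
// guarded by #ifdef: the reserve is the larger of the two requirements.
template <typename Version>
std::size_t end_reserve_sketch(std::size_t filler_words) {
  std::size_t r = Version::prefetch_reserve_words();
  return r > filler_words ? r : filler_words;
}
```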

Paul

On 8/16/11 9:01 AM, Paul Hohensee wrote:
> You're changing the meaning of an existing flag, 
> AllocatePrefetchLines, to
> apply only to arrays, right?
>
> If so, I'd add another flag for arrays, maybe call it 
> AllocateArrayPrefetchLines,
> and change the code so AllocatePrefetchLines becomes an optional 
> parameter.
> E.g., default it to -1 in globals.hpp, and if it's specified on the 
> command line,
> set both AllocateArrayPrefetchLines and AllocateInstPrefetchLines to the
> command line value.  That would retain backward compatibility: I remember
> seeing AllocatePrefetchLines used in a few jbb submissions.
>
> Also, I'd rename AllocateInstPrefetchLines to 
> AllocateInstancePrefetchLines.  'Inst"
> is a bit confusing to me and perhaps to others: the first thing I 
> think of is 'instruction'.
>
> Paul
>
> On 8/15/11 9:12 PM, Vladimir Kozlov wrote:
>> http://cr.openjdk.java.net/~kvn/7079329/webrev
>>
>> 7079329: Adjust allocation prefetching for T4
>>
>> L2 cache line size is 32 bytes on T4 instead of 64 bytes on T series 
>> before. As result BIS instruction prefetches only 32 bytes. Jbb2005 
>> runs show that prefetching 64 bytes is still better on T4 so 2 BIS 
>> instructions should be issued.
>>
>> BIS can't be use for general prefetching since it may fault. New 
>> PrefetchAllocation node was added for allocation prefetching.
>>
>> Changed prefetchAlloc_bis parameter from memory to regP.
>>
>> Use AllocatePrefetchInstr on Sparc to allow specify what instruction 
>> to use for allocation prefetching (0: prefetch write, 1: BIS).
>>
>> Added new instructions on Sparc cacheLineAdrX to reduce number of 
>> instructions generated for finding next cache line address.
>>
>> Added new flag AllocateInstPrefetchLines to specify number of lines 
>> to prefetch for instance allocation.
>>
>> L1_data_cache_line_size() renamed to prefetch_data_size().

From christian.thalinger at oracle.com  Tue Aug 16 06:26:24 2011
From: christian.thalinger at oracle.com (christian.thalinger at oracle.com)
Date: Tue, 16 Aug 2011 13:26:24 +0000
Subject: hg: hsx/hotspot-comp/hotspot: 7071653: JSR 292: call site change
	notification should be pushed not pulled
Message-ID: <20110816132629.4D79447BFC@hg.openjdk.java.net>

Changeset: fdb992d83a87
Author:    twisti
Date:      2011-08-16 04:14 -0700
URL:       http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/fdb992d83a87

7071653: JSR 292: call site change notification should be pushed not pulled
Reviewed-by: kvn, never, bdelsart

! src/cpu/sparc/vm/interp_masm_sparc.cpp
! src/cpu/sparc/vm/interp_masm_sparc.hpp
! src/cpu/sparc/vm/templateTable_sparc.cpp
! src/cpu/x86/vm/interp_masm_x86_32.cpp
! src/cpu/x86/vm/interp_masm_x86_32.hpp
! src/cpu/x86/vm/interp_masm_x86_64.cpp
! src/cpu/x86/vm/interp_masm_x86_64.hpp
! src/cpu/x86/vm/templateTable_x86_32.cpp
! src/cpu/x86/vm/templateTable_x86_64.cpp
! src/share/vm/ci/ciCallSite.cpp
! src/share/vm/ci/ciCallSite.hpp
! src/share/vm/ci/ciField.hpp
! src/share/vm/classfile/systemDictionary.cpp
! src/share/vm/classfile/systemDictionary.hpp
! src/share/vm/classfile/vmSymbols.hpp
! src/share/vm/code/dependencies.cpp
! src/share/vm/code/dependencies.hpp
! src/share/vm/code/nmethod.cpp
! src/share/vm/interpreter/interpreterRuntime.cpp
! src/share/vm/interpreter/templateTable.hpp
! src/share/vm/memory/universe.cpp
! src/share/vm/memory/universe.hpp
! src/share/vm/oops/instanceKlass.cpp
! src/share/vm/opto/callGenerator.cpp
! src/share/vm/opto/callGenerator.hpp
! src/share/vm/opto/doCall.cpp
! src/share/vm/opto/parse3.cpp


From vladimir.kozlov at oracle.com  Tue Aug 16 08:01:37 2011
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 16 Aug 2011 08:01:37 -0700
Subject: Request for reviews (M): 7079329: Adjust allocation prefetching
	for T4
In-Reply-To: <AF474182-3118-4EA4-B06E-836A5A50D6AA@oracle.com>
References: <4E49C3E3.6060903@oracle.com>
	<AF474182-3118-4EA4-B06E-836A5A50D6AA@oracle.com>
Message-ID: <4E4A8651.60006@oracle.com>

On 8/16/11 2:29 AM, Christian Thalinger wrote:
>
> On Aug 16, 2011, at 3:12 AM, Vladimir Kozlov wrote:
>
>> http://cr.openjdk.java.net/~kvn/7079329/webrev
>>
>> 7079329: Adjust allocation prefetching for T4
>>
>> L2 cache line size is 32 bytes on T4 instead of 64 bytes on T series before. As result BIS instruction prefetches only 32 bytes. Jbb2005 runs show that prefetching 64 bytes is still better on T4 so 2 BIS instructions should be issued.
>>
>> BIS can't be use for general prefetching since it may fault. New PrefetchAllocation node was added for allocation prefetching.
>>
>> Changed prefetchAlloc_bis parameter from memory to regP.
>>
>> Use AllocatePrefetchInstr on Sparc to allow specify what instruction to use for allocation prefetching (0: prefetch write, 1: BIS).
>>
>> Added new instructions on Sparc cacheLineAdrX to reduce number of instructions generated for finding next cache line address.
>>
>> Added new flag AllocateInstPrefetchLines to specify number of lines to prefetch for instance allocation.
>>
>> L1_data_cache_line_size() renamed to prefetch_data_size().
>
> src/cpu/x86/vm/x86_32.ad:
> src/cpu/x86/vm/x86_64.ad:
>
> Can you use MacroAssembler instructions to emit the code for the new instructs?

OK.

>
> src/cpu/sparc/vm/vm_version_sparc.cpp:
>
> +       if (is_T4()) {
> +         // Double number of prefetched cache lines on T4
> +         // since L2 cache line size is smaller (32 bytes).
> +         if (FLAG_IS_DEFAULT(AllocatePrefetchLines)) {
> +           FLAG_SET_DEFAULT(AllocatePrefetchLines, 6);
> +         }
> +         if (FLAG_IS_DEFAULT(AllocateInstPrefetchLines)) {
> +           FLAG_SET_DEFAULT(AllocateInstPrefetchLines, 2);
> +         }
> +       }
>
> Maybe you should use *2 here.

Something like this?:

+         if (FLAG_IS_DEFAULT(AllocatePrefetchLines)) {
+           FLAG_SET_DEFAULT(AllocatePrefetchLines, AllocatePrefetchLines*2);
+         }

Vladimir

>
> Otherwise this looks good.
>
> -- Christian

From vladimir.kozlov at oracle.com  Tue Aug 16 08:13:30 2011
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 16 Aug 2011 08:13:30 -0700
Subject: allocation prefetching with block initializing instructions
In-Reply-To: <160598AAAEA6C640BF796BA28D836C6404FC0D85E6@DEWDFECCR04.wdf.sap.corp>
References: <160598AAAEA6C640BF796BA28D836C6404FC0D85E6@DEWDFECCR04.wdf.sap.corp>
Message-ID: <4E4A891A.2060703@oracle.com>

Martin,

This is covered by the next RFE I am working on. I do use BIS in ClearArray. I still need to figure out how to use it for zeroing 
new objects in the runtime: pd_fill_to_aligned_words() in copy_sparc.hpp, which is used for big arrays.

7059037: Use BIS for zeroing on T4

Regards,
Vladimir

On 8/16/11 3:31 AM, Doerr, Martin wrote:
> Hello everybody,
> I have read your emails about the allocation prefetching on SPARC.
> Avoiding fetching the cache lines from memory seems to make a lot of sense.
> However, it should be possible to use these block initializing stores to replace
> the ClearArray nodes in addition. We are loosing quite some time in these
> clear loops.
> Have you guys already thought about this?
> I had played with the ZeroTLAB switch some time ago, but the TLABs appear to
> get too large so clearing them at once doesn't perform well. But if we only
> clear to something like a prefetch watermark and get rid of the ClearArray
> we should get better performance. We only have to make sure that we always clear
> up to some distance behind the object being allocated.
> I'm looking forward to read your comments. Kind regards,
> Martin D

From vladimir.kozlov at oracle.com  Tue Aug 16 08:18:52 2011
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 16 Aug 2011 08:18:52 -0700
Subject: Request for reviews (M): 7079329: Adjust allocation prefetching
	for T4
In-Reply-To: <4E4A6A18.6080807@oracle.com>
References: <4E49C3E3.6060903@oracle.com> <4E4A6A18.6080807@oracle.com>
Message-ID: <4E4A8A5C.7070808@oracle.com>

On 8/16/11 6:01 AM, Paul Hohensee wrote:
> You're changing the meaning of an existing flag, AllocatePrefetchLines, to
> apply only to arrays, right?

No. It was always used only for arrays:

!       uint lines = (length != NULL) ? AllocatePrefetchLines : 1;


> That would retain backward compatibility: I believe
> I've seen AllocatePrefetchLines used in a few jbb submissions.

That is why I did not rename it.

>
> Also, I'd rename AllocateInstPrefetchLines to AllocateInstancePrefetchLines. 'Inst"
> is a bit confusing to me and perhaps to others: the first thing I think of is 'instruction'.

Agree.

Thanks,
Vladimir

>
> Paul
>
> On 8/15/11 9:12 PM, Vladimir Kozlov wrote:
>> http://cr.openjdk.java.net/~kvn/7079329/webrev
>>
>> 7079329: Adjust allocation prefetching for T4
>>
>> L2 cache line size is 32 bytes on T4 instead of 64 bytes on T series before. As result BIS instruction prefetches only
>> 32 bytes. Jbb2005 runs show that prefetching 64 bytes is still better on T4 so 2 BIS instructions should be issued.
>>
>> BIS can't be use for general prefetching since it may fault. New PrefetchAllocation node was added for allocation
>> prefetching.
>>
>> Changed prefetchAlloc_bis parameter from memory to regP.
>>
>> Use AllocatePrefetchInstr on Sparc to allow specify what instruction to use for allocation prefetching (0: prefetch
>> write, 1: BIS).
>>
>> Added new instructions on Sparc cacheLineAdrX to reduce number of instructions generated for finding next cache line
>> address.
>>
>> Added new flag AllocateInstPrefetchLines to specify number of lines to prefetch for instance allocation.
>>
>> L1_data_cache_line_size() renamed to prefetch_data_size().

From vladimir.kozlov at oracle.com  Tue Aug 16 08:20:09 2011
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 16 Aug 2011 08:20:09 -0700
Subject: Request for reviews (M): 7079329: Adjust allocation prefetching
	for T4
In-Reply-To: <4E4A6C8A.9030306@oracle.com>
References: <4E49C3E3.6060903@oracle.com> <4E4A6A30.6090608@oracle.com>
	<4E4A6C8A.9030306@oracle.com>
Message-ID: <4E4A8AA9.2080006@oracle.com>

I will think about it.

Thanks,
Vladimir

On 8/16/11 6:11 AM, Paul Hohensee wrote:
> Also, is there a way to avoid using #ifdef SPARC in threadLocalAllocBuffer.hpp?
> Maybe add a predicate to vm_version that says whether or not to play the tlab
> reserve game.
>
> Paul
>
> On 8/16/11 9:01 AM, Paul Hohensee wrote:
>> You're changing the meaning of an existing flag, AllocatePrefetchLines, to
>> apply only to arrays, right?
>>
>> If so, I'd add another flag for arrays, maybe call it AllocateArrayPrefetchLines,
>> and change the code so AllocatePrefetchLines becomes an optional parameter.
>> E.g., default it to -1 in globals.hpp, and if it's specified on the command line,
>> set both AllocateArrayPrefetchLines and AllocateInstPrefetchLines to the
>> command line value. That would retain backward compatibility: I remember
>> seeing AllocatePrefetchLines used in a few jbb submissions.
>>
>> Also, I'd rename AllocateInstPrefetchLines to AllocateInstancePrefetchLines. 'Inst"
>> is a bit confusing to me and perhaps to others: the first thing I think of is 'instruction'.
>>
>> Paul
>>
>> On 8/15/11 9:12 PM, Vladimir Kozlov wrote:
>>> http://cr.openjdk.java.net/~kvn/7079329/webrev
>>>
>>> 7079329: Adjust allocation prefetching for T4
>>>
>>> L2 cache line size is 32 bytes on T4 instead of 64 bytes on T series before. As result BIS instruction prefetches
>>> only 32 bytes. Jbb2005 runs show that prefetching 64 bytes is still better on T4 so 2 BIS instructions should be issued.
>>>
>>> BIS can't be use for general prefetching since it may fault. New PrefetchAllocation node was added for allocation
>>> prefetching.
>>>
>>> Changed prefetchAlloc_bis parameter from memory to regP.
>>>
>>> Use AllocatePrefetchInstr on Sparc to allow specify what instruction to use for allocation prefetching (0: prefetch
>>> write, 1: BIS).
>>>
>>> Added new instructions on Sparc cacheLineAdrX to reduce number of instructions generated for finding next cache line
>>> address.
>>>
>>> Added new flag AllocateInstPrefetchLines to specify number of lines to prefetch for instance allocation.
>>>
>>> L1_data_cache_line_size() renamed to prefetch_data_size().

From christian.thalinger at oracle.com  Tue Aug 16 08:48:34 2011
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Tue, 16 Aug 2011 17:48:34 +0200
Subject: Request for reviews (M): 7079329: Adjust allocation prefetching
	for T4
In-Reply-To: <4E4A8651.60006@oracle.com>
References: <4E49C3E3.6060903@oracle.com>
	<AF474182-3118-4EA4-B06E-836A5A50D6AA@oracle.com>
	<4E4A8651.60006@oracle.com>
Message-ID: <BEB7C4D3-C358-4962-B99B-2D3CC0EA3164@oracle.com>


On Aug 16, 2011, at 5:01 PM, Vladimir Kozlov wrote:

> On 8/16/11 2:29 AM, Christian Thalinger wrote:
>> 
>> On Aug 16, 2011, at 3:12 AM, Vladimir Kozlov wrote:
>> 
>>> http://cr.openjdk.java.net/~kvn/7079329/webrev
>>> 
>>> 7079329: Adjust allocation prefetching for T4
>>> 
>>> L2 cache line size is 32 bytes on T4 instead of 64 bytes on T series before. As result BIS instruction prefetches only 32 bytes. Jbb2005 runs show that prefetching 64 bytes is still better on T4 so 2 BIS instructions should be issued.
>>> 
>>> BIS can't be use for general prefetching since it may fault. New PrefetchAllocation node was added for allocation prefetching.
>>> 
>>> Changed prefetchAlloc_bis parameter from memory to regP.
>>> 
>>> Use AllocatePrefetchInstr on Sparc to allow specify what instruction to use for allocation prefetching (0: prefetch write, 1: BIS).
>>> 
>>> Added new instructions on Sparc cacheLineAdrX to reduce number of instructions generated for finding next cache line address.
>>> 
>>> Added new flag AllocateInstPrefetchLines to specify number of lines to prefetch for instance allocation.
>>> 
>>> L1_data_cache_line_size() renamed to prefetch_data_size().
>> 
>> src/cpu/x86/vm/x86_32.ad:
>> src/cpu/x86/vm/x86_64.ad:
>> 
>> Can you use MacroAssembler instructions to emit the code for the new instructs?
> 
> OK.
> 
>> 
>> src/cpu/sparc/vm/vm_version_sparc.cpp:
>> 
>> +       if (is_T4()) {
>> +         // Double number of prefetched cache lines on T4
>> +         // since L2 cache line size is smaller (32 bytes).
>> +         if (FLAG_IS_DEFAULT(AllocatePrefetchLines)) {
>> +           FLAG_SET_DEFAULT(AllocatePrefetchLines, 6);
>> +         }
>> +         if (FLAG_IS_DEFAULT(AllocateInstPrefetchLines)) {
>> +           FLAG_SET_DEFAULT(AllocateInstPrefetchLines, 2);
>> +         }
>> +       }
>> 
>> Maybe you should use *2 here.
> 
> Something like this?:
> 
> +         if (FLAG_IS_DEFAULT(AllocatePrefetchLines)) {
> +           FLAG_SET_DEFAULT(AllocatePrefetchLines, AllocatePrefetchLines*2);
> +         }

Yes, that makes more sense to me.

-- Christian

> 
> Vladimir
> 
>> 
>> Otherwise this looks good.
>> 
>> -- Christian


From vladimir.kozlov at oracle.com  Tue Aug 16 10:05:15 2011
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 16 Aug 2011 10:05:15 -0700
Subject: Request for reviews (M): 7079329: Adjust allocation prefetching
	for T4
In-Reply-To: <4E4A8AA9.2080006@oracle.com>
References: <4E49C3E3.6060903@oracle.com> <4E4A6A30.6090608@oracle.com>
	<4E4A6C8A.9030306@oracle.com> <4E4A8AA9.2080006@oracle.com>
Message-ID: <4E4AA34B.8010504@oracle.com>

Thank you, Christian, Paul and Igor

I updated the webrev with the suggestions:

http://cr.openjdk.java.net/~kvn/7079329/webrev

- AllocateInstPrefetchLines renamed to AllocateInstancePrefetchLines.
- Prefetch instructions in x86 .ad use MacroAssembler instructions.
- Added Abstract_VM_Version::reserve_for_allocation_prefetch() method used in 
ThreadLocalAllocBuffer::end_reserve().
- I have to use FLAG_SET_ERGO() for the AllocatePrefetchLines*2 setting since 
VM_Version::initialize() is called twice on Sparc (a long story which I don't want 
to discuss here).
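
Why the doubling interacts badly with a second initialize() call can be seen in a simplified model of a flag. This is a sketch under assumptions: FlagSketch and the two init functions are hypothetical, the model only captures that a "set default" style update leaves the flag looking default while an "ergo" style update marks it as ergonomically set, and the starting value 3 is chosen so that doubling gives the 6 from the earlier hunk.

```cpp
#include <cassert>

// Who last set the flag.
enum class Origin { Default, Ergonomic, CommandLine };

// Simplified model of a VM flag: a value plus its origin.
struct FlagSketch {
  long value;
  Origin origin;
  explicit FlagSketch(long v) : value(v), origin(Origin::Default) {}
  bool is_default() const { return origin == Origin::Default; }
  // Models a FLAG_SET_DEFAULT style update: value changes, origin does not.
  void set_default(long v) { value = v; }
  // Models a FLAG_SET_ERGO style update: value changes, origin is recorded.
  void set_ergo(long v) { value = v; origin = Origin::Ergonomic; }
};

// Initialization step that doubles the flag, written both ways.
inline void init_with_set_default(FlagSketch& f) {
  if (f.is_default()) f.set_default(f.value * 2);
}
inline void init_with_set_ergo(FlagSketch& f) {
  if (f.is_default()) f.set_ergo(f.value * 2);
}
```

Running init_with_set_default twice on a flag that starts at 3 leaves it at 12, because the flag still looks default on the second pass. Running init_with_set_ergo twice leaves it at 6, because the second pass sees a non-default origin and skips the update.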

Vladimir

Vladimir Kozlov wrote:
> I will think about it.
> 
> Thanks,
> Vladimir
> 
> On 8/16/11 6:11 AM, Paul Hohensee wrote:
>> Also, is there a way to avoid using #ifdef SPARC in 
>> threadLocalAllocBuffer.hpp?
>> Maybe add a predicate to vm_version that says whether or not to play 
>> the tlab
>> reserve game.
>>
>> Paul
>>
>> On 8/16/11 9:01 AM, Paul Hohensee wrote:
>>> You're changing the meaning of an existing flag, AllocatePrefetchLines, to
>>> apply only to arrays, right?
>>>
>>> If so, I'd add another flag for arrays, maybe call it AllocateArrayPrefetchLines,
>>> and change the code so AllocatePrefetchLines becomes an optional parameter.
>>> E.g., default it to -1 in globals.hpp, and if it's specified on the command line,
>>> set both AllocateArrayPrefetchLines and AllocateInstPrefetchLines to the
>>> command line value. That would retain backward compatibility: I remember
>>> seeing AllocatePrefetchLines used in a few jbb submissions.
>>>
>>> Also, I'd rename AllocateInstPrefetchLines to AllocateInstancePrefetchLines. 'Inst'
>>> is a bit confusing to me and perhaps to others: the first thing I think of is 'instruction'.
>>>
>>> Paul
>>>
>>> On 8/15/11 9:12 PM, Vladimir Kozlov wrote:
>>>> http://cr.openjdk.java.net/~kvn/7079329/webrev
>>>>
>>>> 7079329: Adjust allocation prefetching for T4
>>>>
>>>> The L2 cache line size is 32 bytes on T4, instead of 64 bytes on earlier T-series
>>>> processors. As a result, a BIS instruction prefetches only 32 bytes. Jbb2005 runs
>>>> show that prefetching 64 bytes is still better on T4, so 2 BIS instructions should be issued.
>>>>
>>>> BIS can't be used for general prefetching since it may fault. A new PrefetchAllocation
>>>> node was added for allocation prefetching.
>>>>
>>>> Changed prefetchAlloc_bis parameter from memory to regP.
>>>>
>>>> Use AllocatePrefetchInstr on Sparc to allow specifying which instruction to use for
>>>> allocation prefetching (0: prefetch write, 1: BIS).
>>>>
>>>> Added new cacheLineAdrX instructions on Sparc to reduce the number of instructions
>>>> generated for finding the next cache line address.
>>>>
>>>> Added new flag AllocateInstPrefetchLines to specify the number of lines to prefetch
>>>> for instance allocation.
>>>>
>>>> L1_data_cache_line_size() renamed to prefetch_data_size().
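
Paul's backward-compatibility suggestion quoted above (default the old flag to -1 and, if it is given on the command line, copy it into both specific flags) could look roughly like the following. This is only a sketch of the proposal as worded in the thread: PrefetchFlags, its field names and the defaults 3 and 1 are made up for illustration, and nothing here says the proposal was adopted.

```cpp
#include <cassert>

// Hypothetical flag set; the names follow the thread, the default values
// are illustrative only.
struct PrefetchFlags {
  int allocate_prefetch_lines = -1;  // -1 means "not given on the command line"
  int allocate_array_prefetch_lines = 3;
  int allocate_instance_prefetch_lines = 1;
};

// If the legacy flag was given, it overrides both specific flags, so old
// command lines keep their meaning.
inline void apply_legacy_prefetch_flag(PrefetchFlags& f) {
  if (f.allocate_prefetch_lines >= 0) {
    f.allocate_array_prefetch_lines = f.allocate_prefetch_lines;
    f.allocate_instance_prefetch_lines = f.allocate_prefetch_lines;
  }
}
```

With the legacy flag left at -1 the two specific flags keep their own defaults; setting it to, say, 5 forces both to 5.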

From igor.veresov at oracle.com  Tue Aug 16 11:09:05 2011
From: igor.veresov at oracle.com (Igor Veresov)
Date: Tue, 16 Aug 2011 11:09:05 -0700
Subject: Request for reviews (M): 7079329: Adjust allocation
	prefetching for T4
In-Reply-To: <4E4AA34B.8010504@oracle.com>
References: <4E49C3E3.6060903@oracle.com> <4E4A6A30.6090608@oracle.com>
	<4E4A6C8A.9030306@oracle.com> <4E4A8AA9.2080006@oracle.com>
	<4E4AA34B.8010504@oracle.com>
Message-ID: <8D4DCF97085E4A44A5462CB4356E2B81@oracle.com>

Still looks good. 

igor



From tom.rodriguez at oracle.com  Tue Aug 16 11:11:13 2011
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Tue, 16 Aug 2011 11:11:13 -0700
Subject: Request for reviews (XS): 7079317: Incorrect branch's destination
	block in PrintoOptoAssembly output
In-Reply-To: <4E49B7E5.8080909@oracle.com>
References: <4E494214.2080407@oracle.com>
	<45803A4A-49AB-4B4A-88CB-44419E09AE7B@oracle.com>
	<4E495C52.7080807@oracle.com>
	<08E31550-58B4-4125-876A-304C4465BC78@oracle.com>
	<4E496DC8.60107@oracle.com>
	<8AA4ABA0-5C39-4F77-9FBD-F4B006A4AFC5@oracle.com>
	<4E49B7E5.8080909@oracle.com>
Message-ID: <9987443B-50F2-42FF-8A98-5713B87B50A3@oracle.com>

That looks good.

tom

On Aug 15, 2011, at 5:20 PM, Vladimir Kozlov wrote:

> Tom,
> 
> You should not give me these ideas since I can't back out now :) . Here is an implementation using MachBranchNode. The only problem was the JumpX mach node, which is a subclass of MachConstantNode. But it is fine since it does not have a label, a short version or a delay slot (the sparc instruction has a delay slot but we use the ialu_reg_reg pipe_class). It needs only one additional check in output.cpp where Kill projections are processed.
> 
> http://cr.openjdk.java.net/~kvn/7079317/webrev
> 
> Thanks,
> Vladimir
> 
> Tom Rodriguez wrote:
>> On Aug 15, 2011, at 12:04 PM, Vladimir Kozlov wrote:
>>> Tom Rodriguez wrote:
>>>> On Aug 15, 2011, at 10:50 AM, Vladimir Kozlov wrote:
>>>>> Node::size() for branches calls code in scratch_emit_size() which resets the label and block. Another solution for this problem would be to save/restore the label and block in scratch_emit_size(), but it would require a lot more code changes.
>>>> Ah.  Fixing scratch_emit_size seems better since it's kind of a surprising behaviour.  It's not that much code, is it?
>>> It needs a virtual method in MachNode, which increases the vtable of all Mach nodes. Here is the webrev:
>> If we're really concerned about vtable size, all of those subtype specific setter/getters could probably be elsewhere down in the hierarchy.  The only meaningful implementations of label_set are in subclasses of MachGotoNode and MachIfNode so it seems like it could be moved into a new superclass of them.
>> I guess alternatively you could have a single virtual which returns the labelOper and implement label_set and save_label non-virtually in terms of that, though that probably doesn't play well with MachNullCheck which is_Branch but doesn't have a label.  The whole labelOper machinery looks ridiculously complicated...
>> Anyway, your change is ok with me as is.
>> tom
>>> http://cr.openjdk.java.net/~kvn/7079317/webrev
>>> 
>>> Vladimir
>>> 
>>>> tom
>>>>> Vladimir
>>>>> 
>>>>> Tom Rodriguez wrote:
>>>>>> I don't understand how calling insts_size and Node::size causes a bug.  What am I missing?
>>>>>> tom
>>>>>> On Aug 15, 2011, at 8:58 AM, Vladimir Kozlov wrote:
>>>>>>> http://cr.openjdk.java.net/~kvn/7079317/webrev
>>>>>>> 
>>>>>>> 7079317: Incorrect branch's destination block in PrintoOptoAssembly output
>>>>>>> 
>>>>>>> After changes for 7063629 PrintoOptoAssembly output shows all branches have B0 as destination block.
>>>>>>> Remove unneeded debug verification code which overwrites label and block information for branches. There are other checks there which verify that code size was not changed.


From vladimir.kozlov at oracle.com  Tue Aug 16 11:14:36 2011
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 16 Aug 2011 11:14:36 -0700
Subject: Request for reviews (XS): 7079317: Incorrect branch's destination
	block in PrintoOptoAssembly output
In-Reply-To: <9987443B-50F2-42FF-8A98-5713B87B50A3@oracle.com>
References: <4E494214.2080407@oracle.com>
	<45803A4A-49AB-4B4A-88CB-44419E09AE7B@oracle.com>
	<4E495C52.7080807@oracle.com>
	<08E31550-58B4-4125-876A-304C4465BC78@oracle.com>
	<4E496DC8.60107@oracle.com>
	<8AA4ABA0-5C39-4F77-9FBD-F4B006A4AFC5@oracle.com>
	<4E49B7E5.8080909@oracle.com>
	<9987443B-50F2-42FF-8A98-5713B87B50A3@oracle.com>
Message-ID: <4E4AB38C.4000504@oracle.com>

Thank you, Tom

Vladimir


From christian.thalinger at oracle.com  Tue Aug 16 11:32:40 2011
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Tue, 16 Aug 2011 20:32:40 +0200
Subject: Request for reviews (M): 7079329: Adjust allocation prefetching
	for T4
In-Reply-To: <4E4AA34B.8010504@oracle.com>
References: <4E49C3E3.6060903@oracle.com> <4E4A6A30.6090608@oracle.com>
	<4E4A6C8A.9030306@oracle.com> <4E4A8AA9.2080006@oracle.com>
	<4E4AA34B.8010504@oracle.com>
Message-ID: <D20D8995-5409-4C6E-94B4-FCC7C0C99463@oracle.com>

Looks good.

-- Christian



From vladimir.kozlov at oracle.com  Tue Aug 16 11:31:55 2011
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 16 Aug 2011 11:31:55 -0700
Subject: Request for reviews (M): 7079329: Adjust allocation prefetching
	for T4
In-Reply-To: <D20D8995-5409-4C6E-94B4-FCC7C0C99463@oracle.com>
References: <4E49C3E3.6060903@oracle.com> <4E4A6A30.6090608@oracle.com>
	<4E4A6C8A.9030306@oracle.com> <4E4A8AA9.2080006@oracle.com>
	<4E4AA34B.8010504@oracle.com>
	<D20D8995-5409-4C6E-94B4-FCC7C0C99463@oracle.com>
Message-ID: <4E4AB79B.4020608@oracle.com>

Thank you, Christian.

Vladimir


From christian.thalinger at Oracle.com  Tue Aug 16 12:52:34 2011
From: christian.thalinger at Oracle.com (Christian Thalinger)
Date: Tue, 16 Aug 2011 21:52:34 +0200
Subject: Request for reviews (XXS): 7079626: x64 emits unnecessary REX prefix
Message-ID: <0B8C0892-E990-441A-B140-754C5CE96FE6@Oracle.com>

http://cr.openjdk.java.net/~twisti/7079626/

7079626: x64 emits unnecessary REX prefix
Reviewed-by:

While investigating some other bug we found out that on x64 we
sometimes emit unnecessary REX prefixes.
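
For readers unfamiliar with the encoding, here is a minimal sketch of when a REX prefix is actually required for a register-to-register instruction on x64. The message does not say which cases were affected, so this is background only, and rex_prefix() is a hypothetical helper, not the assembler's API.

```cpp
#include <cassert>
#include <cstdint>

// Returns the REX prefix byte for a register-to-register instruction, or 0
// if no prefix is needed. reg and rm are register numbers in 0..15.
inline uint8_t rex_prefix(bool operand_64bit, int reg, int rm, bool byte_operands) {
  uint8_t rex = 0x40;
  if (operand_64bit) rex |= 0x08;  // REX.W: 64-bit operand size
  if (reg >= 8)      rex |= 0x04;  // REX.R: extends the ModRM reg field
  if (rm >= 8)       rex |= 0x01;  // REX.B: extends the ModRM rm field

  // A bare 0x40 is only meaningful for byte access to spl, bpl, sil and dil
  // (register numbers 4..7); without it those encodings select ah, ch, dh, bh.
  bool needs_bare_rex =
      byte_operands && ((reg >= 4 && reg < 8) || (rm >= 4 && rm < 8));

  if (rex == 0x40 && !needs_bare_rex) return 0;  // prefix would be unnecessary
  return rex;
}
```

A 32-bit operation on two low registers needs no prefix at all, so emitting 0x40 there only costs a byte.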

From vladimir.kozlov at oracle.com  Tue Aug 16 12:57:32 2011
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 16 Aug 2011 12:57:32 -0700
Subject: Request for reviews (XXS): 7079626: x64 emits unnecessary REX
	prefix
In-Reply-To: <0B8C0892-E990-441A-B140-754C5CE96FE6@Oracle.com>
References: <0B8C0892-E990-441A-B140-754C5CE96FE6@Oracle.com>
Message-ID: <4E4ACBAC.40801@oracle.com>

Looks good.

Vladimir

Christian Thalinger wrote:
> http://cr.openjdk.java.net/~twisti/7079626/
> 
> 7079626: x64 emits unnecessary REX prefix
> Reviewed-by:
> 
> While investigating some other bug we found out that on x64 we
> sometimes emit unnecessary REX prefixes.

From igor.veresov at oracle.com  Tue Aug 16 13:05:21 2011
From: igor.veresov at oracle.com (Igor Veresov)
Date: Tue, 16 Aug 2011 13:05:21 -0700
Subject: Request for reviews (XXS): 7079626: x64 emits unnecessary
	REX prefix
In-Reply-To: <0B8C0892-E990-441A-B140-754C5CE96FE6@Oracle.com>
References: <0B8C0892-E990-441A-B140-754C5CE96FE6@Oracle.com>
Message-ID: <B0620A8AC2BC4384AF18AB57CE307968@oracle.com>

 Looks good. 

igor

On Tuesday, August 16, 2011 at 12:52 PM, Christian Thalinger wrote:

> http://cr.openjdk.java.net/~twisti/7079626/
> 
> 7079626: x64 emits unnecessary REX prefix
> Reviewed-by:
> 
> While investigating some other bug we found out that on x64 we
> sometimes emit unnecessary REX prefixes.


From tom.rodriguez at oracle.com  Tue Aug 16 13:08:11 2011
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Tue, 16 Aug 2011 13:08:11 -0700
Subject: Request for reviews (XXS): 7079626: x64 emits unnecessary REX
	prefix
In-Reply-To: <0B8C0892-E990-441A-B140-754C5CE96FE6@Oracle.com>
References: <0B8C0892-E990-441A-B140-754C5CE96FE6@Oracle.com>
Message-ID: <D568FBB5-09D8-4992-9101-D0B80BD024CE@oracle.com>

Looks good.

tom

On Aug 16, 2011, at 12:52 PM, Christian Thalinger wrote:

> http://cr.openjdk.java.net/~twisti/7079626/
> 
> 7079626: x64 emits unnecessary REX prefix
> Reviewed-by:
> 
> While investigating some other bug we found out that on x64 we
> sometimes emit unnecessary REX prefixes.


From vladimir.kozlov at oracle.com  Tue Aug 16 16:27:01 2011
From: vladimir.kozlov at oracle.com (vladimir.kozlov at oracle.com)
Date: Tue, 16 Aug 2011 23:27:01 +0000
Subject: hg: hsx/hotspot-comp/hotspot: 7079317: Incorrect branch's destination
	block in PrintoOptoAssembly output
Message-ID: <20110816232707.BA0DC47C30@hg.openjdk.java.net>

Changeset: 11211f7cb5a0
Author:    kvn
Date:      2011-08-16 11:53 -0700
URL:       http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/11211f7cb5a0

7079317: Incorrect branch's destination block in PrintoOptoAssembly output
Summary: save/restore label and block in scratch_emit_size()
Reviewed-by: never

! src/share/vm/adlc/archDesc.cpp
! src/share/vm/adlc/formssel.cpp
! src/share/vm/adlc/output_c.cpp
! src/share/vm/adlc/output_h.cpp
! src/share/vm/opto/block.cpp
! src/share/vm/opto/compile.cpp
! src/share/vm/opto/idealGraphPrinter.cpp
! src/share/vm/opto/machnode.cpp
! src/share/vm/opto/machnode.hpp
! src/share/vm/opto/node.hpp
! src/share/vm/opto/output.cpp
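
The summary line "save/restore label and block in scratch_emit_size()" describes a common pattern, sketched below with a scope guard. Everything here is a hypothetical simplification: BranchNode, SaveLabelAndBlock and scratch_size() are invented for the example, and the actual changeset may well use explicit save and restore calls instead (the thread mentions save_label and label_set).

```cpp
#include <cassert>

// Hypothetical, much simplified branch node: only the two fields that the
// thread says get clobbered when the node is emitted into a scratch buffer.
struct BranchNode {
  const void* label;
  int block_num;
};

// Scope guard: remembers label and block on construction and restores them
// when the scope exits.
class SaveLabelAndBlock {
 public:
  explicit SaveLabelAndBlock(BranchNode& n)
      : node_(n), label_(n.label), block_num_(n.block_num) {}
  ~SaveLabelAndBlock() {
    node_.label = label_;
    node_.block_num = block_num_;
  }
  SaveLabelAndBlock(const SaveLabelAndBlock&) = delete;
  SaveLabelAndBlock& operator=(const SaveLabelAndBlock&) = delete;

 private:
  BranchNode& node_;
  const void* label_;
  int block_num_;
};

// Stand-in for a size query that emits into a scratch buffer and, as a side
// effect, points the branch at a fake label in block 0.
inline int scratch_size(BranchNode& n) {
  SaveLabelAndBlock guard(n);
  static const int fake_label = 0;
  n.label = &fake_label;
  n.block_num = 0;
  return 4;  // arbitrary illustrative size in bytes
}
```

After scratch_size() returns, the node still refers to its real label and block, so later printing would not show every branch targeting B0.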


From vladimir.kozlov at oracle.com  Tue Aug 16 21:32:33 2011
From: vladimir.kozlov at oracle.com (vladimir.kozlov at oracle.com)
Date: Wed, 17 Aug 2011 04:32:33 +0000
Subject: hg: hsx/hotspot-comp/hotspot: 7079329: Adjust allocation prefetching
	for T4
Message-ID: <20110817043235.1C4A147C41@hg.openjdk.java.net>

Changeset: 1af104d6cf99
Author:    kvn
Date:      2011-08-16 16:59 -0700
URL:       http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/1af104d6cf99

7079329: Adjust allocation prefetching for T4
Summary: on T4 2 BIS instructions should be issued to prefetch 64 bytes
Reviewed-by: iveresov, phh, twisti

! src/cpu/sparc/vm/assembler_sparc.hpp
! src/cpu/sparc/vm/sparc.ad
! src/cpu/sparc/vm/vm_version_sparc.cpp
! src/cpu/sparc/vm/vm_version_sparc.hpp
! src/cpu/x86/vm/assembler_x86.cpp
! src/cpu/x86/vm/vm_version_x86.cpp
! src/cpu/x86/vm/vm_version_x86.hpp
! src/cpu/x86/vm/x86_32.ad
! src/cpu/x86/vm/x86_64.ad
! src/share/vm/adlc/formssel.cpp
! src/share/vm/memory/threadLocalAllocBuffer.hpp
! src/share/vm/opto/classes.hpp
! src/share/vm/opto/macro.cpp
! src/share/vm/opto/matcher.cpp
! src/share/vm/opto/memnode.hpp
! src/share/vm/runtime/globals.hpp
! src/share/vm/runtime/vm_version.cpp
! src/share/vm/runtime/vm_version.hpp


From paul.hohensee at oracle.com  Wed Aug 17 04:43:48 2011
From: paul.hohensee at oracle.com (Paul Hohensee)
Date: Wed, 17 Aug 2011 07:43:48 -0400
Subject: Request for reviews (M): 7079329: Adjust allocation prefetching
	for T4
In-Reply-To: <4E4AA34B.8010504@oracle.com>
References: <4E49C3E3.6060903@oracle.com>
	<4E4A6A30.6090608@oracle.com>	<4E4A6C8A.9030306@oracle.com>
	<4E4A8AA9.2080006@oracle.com> <4E4AA34B.8010504@oracle.com>
Message-ID: <4E4BA974.2080008@oracle.com>

Looks good.

Paul


From christian.thalinger at oracle.com  Wed Aug 17 09:35:42 2011
From: christian.thalinger at oracle.com (christian.thalinger at oracle.com)
Date: Wed, 17 Aug 2011 16:35:42 +0000
Subject: hg: hsx/hotspot-comp/hotspot: 7079626: x64 emits unnecessary REX
	prefix
Message-ID: <20110817163545.80DC347C64@hg.openjdk.java.net>

Changeset: 381bf869f784
Author:    twisti
Date:      2011-08-17 05:14 -0700
URL:       http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/381bf869f784

7079626: x64 emits unnecessary REX prefix
Reviewed-by: kvn, iveresov, never

! src/cpu/x86/vm/assembler_x86.cpp


From christian.thalinger at oracle.com  Wed Aug 17 11:20:01 2011
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Wed, 17 Aug 2011 20:20:01 +0200
Subject: Request for reviews (XXS): 7079769: JSR 292: incorrect size() for
	CallStaticJavaHandle on sparc
Message-ID: <001166E8-B9AE-4843-AF8A-6F1F9063D751@oracle.com>

http://cr.openjdk.java.net/~twisti/7079769/

7079769: JSR 292: incorrect size() for CallStaticJavaHandle on sparc
Reviewed-by:

preserve_SP and restore_SP add two instructions, resulting in a
size of 16 bytes, not 8.

src/cpu/sparc/vm/sparc.ad
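
The arithmetic behind "16 not 8" can be spelled out as follows. SPARC instructions are 4 bytes wide; the assumption here (not stated in the message) is that the plain call sequence counts as two instructions, the call and its delay slot. All names are illustrative.

```cpp
#include <cassert>

// Every SPARC instruction is 4 bytes wide.
constexpr int kSparcInstructionBytes = 4;

constexpr int size_in_bytes(int instruction_count) {
  return instruction_count * kSparcInstructionBytes;
}

// Assumption: the plain call sequence is two instructions (call plus delay slot).
constexpr int kPlainCallInstructions = 2;

// From the message: preserve_SP and restore_SP together add two instructions.
constexpr int kHandleCallInstructions = kPlainCallInstructions + 2;

static_assert(size_in_bytes(kPlainCallInstructions) == 8, "old size(8)");
static_assert(size_in_bytes(kHandleCallInstructions) == 16, "new size(16)");
```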


From tom.rodriguez at oracle.com  Wed Aug 17 11:37:19 2011
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Wed, 17 Aug 2011 11:37:19 -0700
Subject: Request for reviews (XXS): 7079769: JSR 292: incorrect size() for
	CallStaticJavaHandle on sparc
In-Reply-To: <001166E8-B9AE-4843-AF8A-6F1F9063D751@oracle.com>
References: <001166E8-B9AE-4843-AF8A-6F1F9063D751@oracle.com>
Message-ID: <132B3A1B-B8EC-45B5-B08B-982CEC305B3D@oracle.com>

Looks good.

tom

On Aug 17, 2011, at 11:20 AM, Christian Thalinger wrote:

> http://cr.openjdk.java.net/~twisti/7079769/
> 
> 7079769: JSR 292: incorrect size() for CallStaticJavaHandle on sparc
> Reviewed-by:
> 
> The preserve_SP and restore_SP add two instructions resulting in a
> size of 16 not 8.
> 
> src/cpu/sparc/vm/sparc.ad
> 


From christian.thalinger at oracle.com  Wed Aug 17 16:25:02 2011
From: christian.thalinger at oracle.com (christian.thalinger at oracle.com)
Date: Wed, 17 Aug 2011 23:25:02 +0000
Subject: hg: hsx/hotspot-comp/hotspot: 7079769: JSR 292: incorrect size() for
	CallStaticJavaHandle on sparc
Message-ID: <20110817232505.6EDE447C89@hg.openjdk.java.net>

Changeset: bd87c0dcaba5
Author:    twisti
Date:      2011-08-17 11:52 -0700
URL:       http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/bd87c0dcaba5

7079769: JSR 292: incorrect size() for CallStaticJavaHandle on sparc
Reviewed-by: never, kvn

! src/cpu/sparc/vm/sparc.ad


From vladimir.kozlov at oracle.com  Wed Aug 17 17:35:29 2011
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 17 Aug 2011 17:35:29 -0700
Subject: Request for reviews (S):  7080431: VM asserts if specified size(x)
	in .ad is larger than emitted size
Message-ID: <4E4C5E51.4020307@oracle.com>

http://cr.openjdk.java.net/~kvn/7080431/webrev

7080431: VM asserts if specified size(x) in .ad is larger than emitted size

Previously it was allowed to specify a larger size(x) in a mach node definition
in the .ad file than the actual emitted instruction size; it was treated as an
upper bound on the instruction size. The changes for 7063629 broke that: they
require size(x) in the mach node definition to match the emitted size, which
reduces flexibility in C2 development.

Move code from finalize_offsets_and_shorten() to fill_buffer() to restore the
previous behavior.
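
The difference between the two rules is small enough to state as code. This is a sketch with hypothetical helper names, where 'declared' stands for the size(x) value from the .ad file and 'emitted' for the number of bytes the node actually produced; it is not the check used in output.cpp.

```cpp
#include <cassert>

// Strict rule (the behavior described as introduced by 7063629): the
// declared and emitted sizes must match exactly.
inline bool size_ok_exact(int declared, int emitted) {
  return emitted == declared;
}

// Relaxed rule (the previous behavior being restored): size(x) is only an
// upper bound on the emitted size.
inline bool size_ok_upper_bound(int declared, int emitted) {
  return emitted <= declared;
}
```

A node declared with size(16) that emits 12 bytes passes the relaxed rule and fails the strict one; emitting more than declared fails both.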

From tom.rodriguez at oracle.com  Wed Aug 17 17:55:07 2011
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Wed, 17 Aug 2011 17:55:07 -0700
Subject: Request for reviews (S): 7080431: VM asserts if specified size(x)
	in .ad is larger than emitted size
In-Reply-To: <4E4C5E51.4020307@oracle.com>
References: <4E4C5E51.4020307@oracle.com>
Message-ID: <4DE59851-FB56-4155-8E11-ACCF0C3EC706@oracle.com>

Is this effectively a partial anti-delta of the fill_buffer changes?

tom

On Aug 17, 2011, at 5:35 PM, Vladimir Kozlov wrote:

> http://cr.openjdk.java.net/~kvn/7080431/webrev
> 
> 7080431: VM asserts if specified size(x) in .ad is larger than emitted size
> 
> It was allowed to specify larger size(x) in mach node definition in .ad file than actual emitted instruction size. It was treated as upper bound on instruction size. 7063629 changes broke that, it requires size(x) in mach node definition match the emitted size which reduced flexibility in C2 development.
> 
> Move code from finalize_offsets_and_shorten() to fill_buffer() to restore previous behavior.


From vladimir.kozlov at oracle.com  Wed Aug 17 18:20:11 2011
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 17 Aug 2011 18:20:11 -0700
Subject: Request for reviews (S): 7080431: VM asserts if specified size(x)
	in .ad is larger than emitted size
In-Reply-To: <4DE59851-FB56-4155-8E11-ACCF0C3EC706@oracle.com>
References: <4E4C5E51.4020307@oracle.com>
	<4DE59851-FB56-4155-8E11-ACCF0C3EC706@oracle.com>
Message-ID: <4E4C68CB.8030204@oracle.com>

On 8/17/11 5:55 PM, Tom Rodriguez wrote:
> Is this effectively a partial anti-delta of the fill_buffer changes?

Yes for inserting padding and block alignment. It never did branch shortening and corresponding offsets verification.

Vladimir

>
> tom
>
> On Aug 17, 2011, at 5:35 PM, Vladimir Kozlov wrote:
>
>> http://cr.openjdk.java.net/~kvn/7080431/webrev
>>
>> 7080431: VM asserts if specified size(x) in .ad is larger than emitted size
>>
>> Previously, a mach node definition in an .ad file could specify a size(x) larger than the actual emitted instruction size; it was treated as an upper bound on the instruction size. The 7063629 changes broke that: they require size(x) in the mach node definition to match the emitted size, which reduced flexibility in C2 development.
>>
>> Move code from finalize_offsets_and_shorten() to fill_buffer() to restore previous behavior.
>

From tom.rodriguez at oracle.com  Wed Aug 17 18:46:46 2011
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Wed, 17 Aug 2011 18:46:46 -0700
Subject: Request for reviews (S): 7080431: VM asserts if specified size(x)
	in .ad is larger than emitted size
In-Reply-To: <4E4C68CB.8030204@oracle.com>
References: <4E4C5E51.4020307@oracle.com>
	<4DE59851-FB56-4155-8E11-ACCF0C3EC706@oracle.com>
	<4E4C68CB.8030204@oracle.com>
Message-ID: <6CA01544-397C-4654-A4E4-4DBD297E1A9A@oracle.com>


On Aug 17, 2011, at 6:20 PM, Vladimir Kozlov wrote:

> On 8/17/11 5:55 PM, Tom Rodriguez wrote:
>> Is this effectively a partial anti-delta of the fill_buffer changes?
> 
> Yes for inserting padding and block alignment. It never did branch shortening and corresponding offsets verification.

I compared it with the previous one and it looks good.  Thanks for fixing this.

tom

> 
> Vladimir
> 
>> 
>> tom
>> 
>> On Aug 17, 2011, at 5:35 PM, Vladimir Kozlov wrote:
>> 
>>> http://cr.openjdk.java.net/~kvn/7080431/webrev
>>> 
>>> 7080431: VM asserts if specified size(x) in .ad is larger than emitted size
>>> 
>>> Previously, a mach node definition in an .ad file could specify a size(x) larger than the actual emitted instruction size; it was treated as an upper bound on the instruction size. The 7063629 changes broke that: they require size(x) in the mach node definition to match the emitted size, which reduced flexibility in C2 development.
>>> 
>>> Move code from finalize_offsets_and_shorten() to fill_buffer() to restore previous behavior.
>> 


From vladimir.kozlov at oracle.com  Wed Aug 17 18:53:24 2011
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 17 Aug 2011 18:53:24 -0700
Subject: Request for reviews (S): 7080431: VM asserts if specified size(x)
	in .ad is larger than emitted size
In-Reply-To: <6CA01544-397C-4654-A4E4-4DBD297E1A9A@oracle.com>
References: <4E4C5E51.4020307@oracle.com>
	<4DE59851-FB56-4155-8E11-ACCF0C3EC706@oracle.com>
	<4E4C68CB.8030204@oracle.com>
	<6CA01544-397C-4654-A4E4-4DBD297E1A9A@oracle.com>
Message-ID: <4E4C7094.3030209@oracle.com>

Thank you, Tom

Vladimir

On 8/17/11 6:46 PM, Tom Rodriguez wrote:
>
> On Aug 17, 2011, at 6:20 PM, Vladimir Kozlov wrote:
>
>> On 8/17/11 5:55 PM, Tom Rodriguez wrote:
>>> Is this effectively a partial anti-delta of the fill_buffer changes?
>>
>> Yes for inserting padding and block alignment. It never did branch shortening and corresponding offsets verification.
>
> I compared it with the previous one and it looks good.  Thanks for fixing this.
>
> tom
>
>>
>> Vladimir
>>
>>>
>>> tom
>>>
>>> On Aug 17, 2011, at 5:35 PM, Vladimir Kozlov wrote:
>>>
>>>> http://cr.openjdk.java.net/~kvn/7080431/webrev
>>>>
>>>> 7080431: VM asserts if specified size(x) in .ad is larger than emitted size
>>>>
>>>> Previously, a mach node definition in an .ad file could specify a size(x) larger than the actual emitted instruction size; it was treated as an upper bound on the instruction size. The 7063629 changes broke that: they require size(x) in the mach node definition to match the emitted size, which reduced flexibility in C2 development.
>>>>
>>>> Move code from finalize_offsets_and_shorten() to fill_buffer() to restore previous behavior.
>>>
>

From vladimir.kozlov at oracle.com  Thu Aug 18 16:14:17 2011
From: vladimir.kozlov at oracle.com (vladimir.kozlov at oracle.com)
Date: Thu, 18 Aug 2011 23:14:17 +0000
Subject: hg: hsx/hotspot-comp/hotspot: 7080431: VM asserts if specified
	size(x) in .ad is larger than emitted size
Message-ID: <20110818231422.52BE147D40@hg.openjdk.java.net>

Changeset: 739a9abbbd4b
Author:    kvn
Date:      2011-08-18 11:49 -0700
URL:       http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/739a9abbbd4b

7080431: VM asserts if specified size(x) in .ad is larger than emitted size
Summary: Move code from finalize_offsets_and_shorten() to fill_buffer() to restore previous behavior.
Reviewed-by: never

! src/share/vm/opto/compile.hpp
! src/share/vm/opto/output.cpp


From vladimir.kozlov at oracle.com  Fri Aug 19 12:41:37 2011
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 19 Aug 2011 12:41:37 -0700
Subject: Request for reviews (XS): 7076831: TEST_BUG:
	compiler/5091921/Test7005594.java fails on LOW MEM SYSTEMS
Message-ID: <4E4EBC71.9060101@oracle.com>

http://cr.openjdk.java.net/~kvn/7076831/webrev

7076831: TEST_BUG: compiler/5091921/Test7005594.java fails on LOW MEM SYSTEMS

Run test only on systems with 2Gbyte or more memory. Don't zap heap to reduce 
execution time.


From vladimir.kozlov at oracle.com  Fri Aug 19 22:20:04 2011
From: vladimir.kozlov at oracle.com (vladimir.kozlov at oracle.com)
Date: Sat, 20 Aug 2011 05:20:04 +0000
Subject: hg: hsx/hotspot-comp/hotspot: 40 new changesets
Message-ID: <20110820052115.A5E8247EBE@hg.openjdk.java.net>

Changeset: d9dc0a55c848
Author:    schien
Date:      2011-05-20 16:03 -0700
URL:       http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/d9dc0a55c848

Added tag jdk7-b143 for changeset c149193c768b

! .hgtags

Changeset: 278445be9145
Author:    trims
Date:      2011-05-24 14:02 -0700
URL:       http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/278445be9145

Added tag hs21-b13 for changeset c149193c768b

! .hgtags

Changeset: 01e01c25d24a
Author:    trims
Date:      2011-05-24 14:07 -0700
URL:       http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/01e01c25d24a

Merge

! .hgtags

Changeset: e6e7d76b2bd3
Author:    mr
Date:      2011-05-24 15:28 -0700
URL:       http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/e6e7d76b2bd3

7048009: Update .jcheck/conf files for JDK 8
Reviewed-by: jjh

! .jcheck/conf

Changeset: 968305b802ee
Author:    trims
Date:      2011-07-23 01:56 -0700
URL:       http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/968305b802ee

Merge


Changeset: 8e5d4aa73a8c
Author:    trims
Date:      2011-07-22 23:47 -0700
URL:       http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/8e5d4aa73a8c

7069176: Update the JDK version numbers in Hotspot for JDK 8
Summary: Change JDK_MINOR_VER and JDK_PREVIOUS_VERSION to reflect JDK8 values
Reviewed-by: jcoomes

! make/hotspot_version

Changeset: 0cc8a70952c3
Author:    trims
Date:      2011-07-22 23:42 -0700
URL:       http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/0cc8a70952c3

7070061: Adjust Hotspot make/jprt.properties for new JDK8 settings
Summary: Fix so the JPRT can build with -release jdk8 now
Reviewed-by: ohair

! make/jprt.properties

Changeset: 20cac004a4f9
Author:    dsamersoff
Date:      2011-06-09 01:06 +0400
URL:       http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/20cac004a4f9

Merge


Changeset: 1744e37e032b
Author:    dsamersoff
Date:      2011-06-18 13:32 +0400
URL:       http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/1744e37e032b

Merge


Changeset: d425748f2203
Author:    dcubed
Date:      2011-06-23 20:31 -0700
URL:       http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/d425748f2203

7043987: 3/3 JVMTI FollowReferences is slow
Summary: VM_HeapWalkOperation::doit() should only reset mark bits when necessary.
Reviewed-by: dsamersoff, ysr, dholmes, dcubed
Contributed-by: ashok.srinivasa.murthy at oracle.com

! src/share/vm/prims/jvmtiTagMap.cpp

Changeset: 88dce6a60ac8
Author:    dcubed
Date:      2011-06-29 20:28 -0700
URL:       http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/88dce6a60ac8

6951623: 3/3 possible performance problems in FollowReferences() and GetObjectsWithTags()
Summary: Call collect_stack_roots() before collect_simple_roots() as an optimization.
Reviewed-by: ysr, dsamersoff, dcubed
Contributed-by: ashok.srinivasa.murthy at oracle.com

! src/share/vm/prims/jvmtiTagMap.cpp

Changeset: 109d1d265924
Author:    dholmes
Date:      2011-07-02 04:17 -0400
URL:       http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/109d1d265924

7052988: JPRT embedded builds don't set MINIMIZE_RAM_USAGE
Reviewed-by: kamg, dsamersoff

! make/jprt.gmk

Changeset: 5447b2c582ad
Author:    coleenp
Date:      2011-07-07 22:34 -0400
URL:       http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/5447b2c582ad

Merge


Changeset: bcc6475bc68f
Author:    coleenp
Date:      2011-07-16 22:21 -0400
URL:       http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/bcc6475bc68f

Merge


Changeset: 0b80db433fcb
Author:    dholmes
Date:      2011-07-22 00:29 -0700
URL:       http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/0b80db433fcb

7046490: Preallocated OOME objects should obey Throwable stack trace protocol
Summary: Update the OOME stacktrace to contain Throwable.UNASSIGNED_STACK when the backtrace is filled in
Reviewed-by: mchung, phh

! src/share/vm/classfile/javaClasses.cpp
! src/share/vm/classfile/javaClasses.hpp

Changeset: 8107273fd204
Author:    coleenp
Date:      2011-07-23 10:42 -0400
URL:       http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/8107273fd204

Merge


Changeset: ca1f1753c866
Author:    andrew
Date:      2011-07-28 14:10 -0400
URL:       http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/ca1f1753c866

7072341: enable hotspot builds on Linux 3.0
Summary: Add "3" to list of allowable versions
Reviewed-by: kamg, chrisphi

! make/linux/Makefile

Changeset: 14a2fd14c0db
Author:    johnc
Date:      2011-08-01 10:04 -0700
URL:       http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/14a2fd14c0db

7068240: G1: Long "parallel other time" and "ext root scanning" when running specific benchmark
Summary: In root processing, move the scanning of the reference processor's discovered lists to before RSet updating and scanning. When scanning the reference processor's discovered lists, use a buffering closure so that the time spent copying any reference object is correctly attributed. Also removed a couple of unused and irrelevant timers.
Reviewed-by: ysr, jmasa

! src/share/vm/gc_implementation/g1/g1CollectedHeap.cpp
! src/share/vm/gc_implementation/g1/g1CollectedHeap.hpp
! src/share/vm/gc_implementation/g1/g1CollectorPolicy.cpp
! src/share/vm/gc_implementation/g1/g1CollectorPolicy.hpp

Changeset: 6aa4feb8a366
Author:    johnc
Date:      2011-08-02 12:13 -0700
URL:       http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/6aa4feb8a366

7069863: G1: SIGSEGV running SPECjbb2011 and -UseBiasedLocking
Summary: Align the reserved size of the heap and perm to the heap region size to get a preferred heap base that is aligned to the region size, and call the correct heap reservation constructor. Also add a check in the heap reservation code that the reserved space starts at the requested address (if any).
Reviewed-by: kvn, ysr

! src/share/vm/gc_implementation/g1/g1CollectedHeap.cpp
! src/share/vm/runtime/virtualspace.cpp

Changeset: a20e6e447d3d
Author:    iveresov
Date:      2011-08-05 16:44 -0700
URL:       http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/a20e6e447d3d

7060842: UseNUMA crash with UseHugeTLBFS running SPECjvm2008
Summary: Use mmap() instead of madvise(MADV_DONTNEED) to uncommit pages
Reviewed-by: ysr

! src/os/linux/vm/os_linux.cpp

Changeset: 7c2653aefc46
Author:    iveresov
Date:      2011-08-05 16:50 -0700
URL:       http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/7c2653aefc46

7060836: RHEL 5.5 and 5.6 should support UseNUMA
Summary: Add a wrapper for sched_getcpu() for systems where libc lacks it
Reviewed-by: ysr
Contributed-by: Andrew John Hughes <ahughes at redhat.com>

! src/os/linux/vm/os_linux.cpp
! src/os/linux/vm/os_linux.hpp

Changeset: 41e6ee74f879
Author:    kevinw
Date:      2011-08-02 14:37 +0100
URL:       http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/41e6ee74f879

7072527: CMS: JMM GC counters overcount in some cases
Summary: Avoid overcounting when CMS has concurrent mode failure.
Reviewed-by: ysr
Contributed-by: rednaxelafx at gmail.com

! src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp
! src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.hpp
+ test/gc/7072527/TestFullGCCount.java

Changeset: e9db47a083cc
Author:    kevinw
Date:      2011-08-11 14:58 +0100
URL:       http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/e9db47a083cc

Merge


Changeset: 87e40b34bc2b
Author:    johnc
Date:      2011-08-11 11:36 -0700
URL:       http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/87e40b34bc2b

7074579: G1: JVM crash with JDK7 running ATG CRMDemo Fusion App
Summary: Handlize MemoryUsage klass oop in createGCInfo routine
Reviewed-by: tonyp, fparain, ysr, jcoomes

! src/share/vm/services/gcNotifier.cpp

Changeset: f44782f04dd4
Author:    tonyp
Date:      2011-08-12 11:31 -0400
URL:       http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/f44782f04dd4

7039627: G1: avoid BOT updates for survivor allocations and dirty survivor regions incrementally
Summary: Refactor the allocation code during GC to use the G1AllocRegion abstraction. Use separate subclasses of G1AllocRegion for survivor and old regions. Avoid BOT updates and dirty survivor cards incrementally for the former.
Reviewed-by: brutisso, johnc, ysr

! src/share/vm/gc_implementation/g1/g1AllocRegion.cpp
! src/share/vm/gc_implementation/g1/g1AllocRegion.hpp
! src/share/vm/gc_implementation/g1/g1CollectedHeap.cpp
! src/share/vm/gc_implementation/g1/g1CollectedHeap.hpp
! src/share/vm/gc_implementation/g1/g1CollectedHeap.inline.hpp
! src/share/vm/gc_implementation/g1/g1CollectorPolicy.cpp
! src/share/vm/gc_implementation/g1/g1CollectorPolicy.hpp
! src/share/vm/gc_implementation/g1/heapRegion.cpp
! src/share/vm/gc_implementation/g1/heapRegion.hpp
! src/share/vm/gc_implementation/g1/heapRegionRemSet.cpp

Changeset: 76b1a9420e3d
Author:    ysr
Date:      2011-08-16 08:02 -0700
URL:       http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/76b1a9420e3d

Merge


Changeset: 46cb9a7b8b01
Author:    dsamersoff
Date:      2011-08-10 15:04 +0400
URL:       http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/46cb9a7b8b01

7073913: The fix for 7017193 causes segfaults
Summary: Buffer overflow in os::get_line_chars
Reviewed-by: coleenp, dholmes, dcubed
Contributed-by: aph at redhat.com

! src/share/vm/runtime/os.cpp

Changeset: b1cbb0907b36
Author:    zgu
Date:      2011-04-15 09:34 -0400
URL:       http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/b1cbb0907b36

7016797: Hotspot: securely/restrictive load dlls and new API for loading system dlls
Summary: Created Windows Dll wrapper to handle jdk6 and jdk7 platform requirements, also provided more restrictive Dll search orders for Windows system Dlls.
Reviewed-by: acorn, dcubed, ohair, alanb

! make/windows/makefiles/compile.make
! src/os/windows/vm/decoder_windows.cpp
! src/os/windows/vm/jvm_windows.h
! src/os/windows/vm/os_windows.cpp
! src/os/windows/vm/os_windows.hpp

Changeset: 279ef1916773
Author:    zgu
Date:      2011-07-12 21:13 -0400
URL:       http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/279ef1916773

7065535: Mistyped function name that disabled UseLargePages on Windows
Summary: The missing "A" suffix on the Windows API name LookupPrivilegeValue made the function pointer lookup fail, which caused the VM to disable the UseLargePages option
Reviewed-by: coleenp, phh

! src/os/windows/vm/os_windows.cpp

Changeset: a68e11dceb83
Author:    zgu
Date:      2011-08-16 09:18 -0400
URL:       http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/a68e11dceb83

Merge


Changeset: 00ed4ccfe642
Author:    collins
Date:      2011-08-17 07:05 -0400
URL:       http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/00ed4ccfe642

Merge


Changeset: de147f62e695
Author:    kvn
Date:      2011-08-19 08:55 -0700
URL:       http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/de147f62e695

Merge

- agent/src/share/classes/sun/jvm/hotspot/interpreter/BytecodeFastAAccess0.java
- agent/src/share/classes/sun/jvm/hotspot/interpreter/BytecodeFastIAccess0.java

Changeset: 24cee90e9453
Author:    jcoomes
Date:      2011-08-17 10:32 -0700
URL:       http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/24cee90e9453

6791672: enable 1G and larger pages on solaris
Reviewed-by: ysr, iveresov, johnc

! src/os/solaris/vm/os_solaris.cpp
! src/share/vm/runtime/os.cpp
! src/share/vm/runtime/os.hpp

Changeset: 3be7439273c5
Author:    katleman
Date:      2011-05-25 13:31 -0700
URL:       http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/3be7439273c5

7044486: open jdk repos have files with incorrect copyright headers, which can end up in src bundles
Reviewed-by: ohair, trims

! agent/src/share/classes/sun/jvm/hotspot/runtime/ServiceThread.java
! make/linux/README
! make/windows/projectfiles/kernel/Makefile
! src/cpu/x86/vm/vm_version_x86.cpp
! src/cpu/x86/vm/vm_version_x86.hpp
! src/os_cpu/solaris_sparc/vm/solaris_sparc.s
! src/share/tools/hsdis/README
! src/share/vm/gc_implementation/g1/heapRegionSet.inline.hpp
! src/share/vm/gc_implementation/parNew/parCardTableModRefBS.cpp
! src/share/vm/utilities/yieldingWorkgroup.cpp

Changeset: 8b135e6129d6
Author:    jeff
Date:      2011-05-27 15:01 -0700
URL:       http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/8b135e6129d6

7045697: JDK7 THIRD PARTY README update
Reviewed-by: lana

! THIRD_PARTY_README

Changeset: 52e4ba46751f
Author:    kamg
Date:      2011-04-12 16:42 -0400
URL:       http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/52e4ba46751f

7020373: JSR rewriting can overflow memory address size variables
Summary: Abort if incoming classfile's parameters would cause overflows
Reviewed-by: coleenp, dcubed, never

! src/share/vm/oops/generateOopMap.cpp
+ test/runtime/7020373/Test7020373.sh

Changeset: bca686989d4b
Author:    asaha
Date:      2011-06-15 14:59 -0700
URL:       http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/bca686989d4b

7055247: Ignore test of # 7020373
Reviewed-by: dcubed

! test/runtime/7020373/Test7020373.sh

Changeset: 337ffef74c37
Author:    jeff
Date:      2011-06-22 10:10 -0700
URL:       http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/337ffef74c37

7057046: Add embedded license to THIRD PARTY README
Reviewed-by: lana

! THIRD_PARTY_README

Changeset: 9f12ede5571a
Author:    jcoomes
Date:      2011-08-19 14:08 -0700
URL:       http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/9f12ede5571a

Merge

! src/cpu/x86/vm/vm_version_x86.cpp
! src/cpu/x86/vm/vm_version_x86.hpp
! src/share/vm/oops/generateOopMap.cpp
! src/share/vm/runtime/os.cpp

Changeset: 7c29742c41b4
Author:    jcoomes
Date:      2011-08-19 14:22 -0700
URL:       http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/7c29742c41b4

7081251: bump the hs22 build number to 02
Reviewed-by: johnc

! make/hotspot_version


From igor.veresov at oracle.com  Fri Aug 19 23:17:51 2011
From: igor.veresov at oracle.com (Igor Veresov)
Date: Fri, 19 Aug 2011 23:17:51 -0700
Subject: Request for reviews (XS): 7076831: TEST_BUG:
	compiler/5091921/Test7005594.java fails on LOW MEM SYSTEMS
In-Reply-To: <4E4EBC71.9060101@oracle.com>
References: <4E4EBC71.9060101@oracle.com>
Message-ID: <EE578608FF604486A6F8AC04757393EB@oracle.com>

 Looks good. 

igor

On Friday, August 19, 2011 at 12:41 PM, Vladimir Kozlov wrote:

> http://cr.openjdk.java.net/~kvn/7076831/webrev
> 
> 7076831: TEST_BUG: compiler/5091921/Test7005594.java fails on LOW MEM SYSTEMS
> 
> Run test only on systems with 2Gbyte or more memory. Don't zap heap to reduce 
> execution time.


From vladimir.kozlov at oracle.com  Sat Aug 20 17:24:05 2011
From: vladimir.kozlov at oracle.com (vladimir.kozlov at oracle.com)
Date: Sun, 21 Aug 2011 00:24:05 +0000
Subject: hg: hsx/hotspot-comp/hotspot: 7076831: TEST_BUG:
	compiler/5091921/Test7005594.java fails on LOW MEM SYSTEMS
Message-ID: <20110821002407.7A6CD47F4B@hg.openjdk.java.net>

Changeset: ff9ab6327924
Author:    kvn
Date:      2011-08-20 14:03 -0700
URL:       http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/ff9ab6327924

7076831: TEST_BUG: compiler/5091921/Test7005594.java fails on LOW MEM SYSTEMS
Summary: Run test only on systems with 2Gbyte or more memory. Don't zap heap to reduce execution time.
Reviewed-by: iveresov

! test/compiler/5091921/Test7005594.sh


From vladimir.kozlov at oracle.com  Mon Aug 22 10:33:50 2011
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 22 Aug 2011 10:33:50 -0700
Subject: Request for reviews (XXXS): 7081926
	assert(VM_Version::supports_sse2()) failed: must support
Message-ID: <4E5292FE.3010500@oracle.com>

http://cr.openjdk.java.net/~kvn/7081926/webrev

7081926 assert(VM_Version::supports_sse2()) failed: must support

Changes in 7079329 (use MacroAssembler prefetch instructions in x86 .ad files) 
exposed a typo in this assert: prefetchnta has been supported since SSE, not SSE2.
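
As a rough illustration of the corrected condition (not the actual
assembler_x86.cpp code), the check only needs the SSE feature bit. CpuFeatures,
can_use_prefetchnta() and prefetchnta_checked() are hypothetical stand-ins for
the VM_Version queries.

```cpp
#include <cassert>

// Hypothetical feature record; stands in for the VM_Version::supports_*()
// queries used by the real assert.
struct CpuFeatures {
  bool sse;
  bool sse2;
};

// prefetchnta was introduced with SSE, so SSE2 support is not required.
bool can_use_prefetchnta(const CpuFeatures& f) {
  return f.sse;
}

// Mirrors the shape of the assert: it fires only when SSE itself is missing.
void prefetchnta_checked(const CpuFeatures& f) {
  assert(can_use_prefetchnta(f) && "must support");
}
```

A CPU that reports SSE but not SSE2 passes this check, which is the case the
old assert rejected.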


From tom.rodriguez at oracle.com  Mon Aug 22 10:44:38 2011
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Mon, 22 Aug 2011 10:44:38 -0700
Subject: Request for reviews (XXXS): 7081926
	assert(VM_Version::supports_sse2()) failed: must support
In-Reply-To: <4E5292FE.3010500@oracle.com>
References: <4E5292FE.3010500@oracle.com>
Message-ID: <83375FC5-1695-4748-AB88-78E4011AB1C2@oracle.com>

Looks good.

tom

On Aug 22, 2011, at 10:33 AM, Vladimir Kozlov wrote:

> http://cr.openjdk.java.net/~kvn/7081926/webrev
> 
> 7081926 assert(VM_Version::supports_sse2()) failed: must support
> 
> Changes in 7079329 (use MacroAssembler prefetch instructions in x86 .ad files) exposed a typo in this assert: prefetchnta has been supported since SSE, not SSE2.
> 


From vladimir.kozlov at oracle.com  Mon Aug 22 10:52:24 2011
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 22 Aug 2011 10:52:24 -0700
Subject: Request for reviews (XXXS): 7081926
	assert(VM_Version::supports_sse2()) failed: must support
In-Reply-To: <83375FC5-1695-4748-AB88-78E4011AB1C2@oracle.com>
References: <4E5292FE.3010500@oracle.com>
	<83375FC5-1695-4748-AB88-78E4011AB1C2@oracle.com>
Message-ID: <4E529758.9080202@oracle.com>

Thank you, Tom

Vladimir

Tom Rodriguez wrote:
> Looks good.
> 
> tom
> 
> On Aug 22, 2011, at 10:33 AM, Vladimir Kozlov wrote:
> 
>> http://cr.openjdk.java.net/~kvn/7081926/webrev
>>
>> 7081926 assert(VM_Version::supports_sse2()) failed: must support
>>
>> Changes in 7079329 (use MacroAssembler prefetch instructions in x86 .ad files) exposed a typo in this assert: prefetchnta has been supported since SSE, not SSE2.
>>
>

From vladimir.kozlov at oracle.com  Mon Aug 22 17:32:48 2011
From: vladimir.kozlov at oracle.com (vladimir.kozlov at oracle.com)
Date: Tue, 23 Aug 2011 00:32:48 +0000
Subject: hg: hsx/hotspot-comp/hotspot: 7081926:
	assert(VM_Version::supports_sse2()) failed: must support
Message-ID: <20110823003251.D5BF547FF5@hg.openjdk.java.net>

Changeset: a594deb1d6dc
Author:    kvn
Date:      2011-08-22 11:00 -0700
URL:       http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/a594deb1d6dc

7081926: assert(VM_Version::supports_sse2()) failed: must support
Summary: fix assert, prefetchnta is supported since SSE not SSE2.
Reviewed-by: never

! src/cpu/x86/vm/assembler_x86.cpp


From christian.thalinger at oracle.com  Tue Aug 23 12:20:30 2011
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Tue, 23 Aug 2011 21:20:30 +0200
Subject: Request for reviews (S): 7078382: JSR 292: don't count method handle
	adapters against inlining budgets
Message-ID: <E888AE2D-54D3-4D9C-855F-70A555D12385@oracle.com>

http://cr.openjdk.java.net/~twisti/7078382/

7078382: JSR 292: don't count method handle adapters against inlining budgets
Reviewed-by:

Currently the code size of method handle adapters is counted against
inlining budgets like DesiredMethodLimit.  This results in earlier
compiler bailouts with method handle call sites than without, leading
to worse performance.

The fix is to return an adjusted bytecode size for method handle
adapters for inlining decisions (the metric we use for now is the
number of invokes).
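
The idea can be sketched roughly as below. This is a minimal illustration, not
the code in the webrev: MethodInfo and size_for_inlining() are hypothetical
names, and the fields are assumed to be already known for each method.

```cpp
#include <cassert>

// Hypothetical summary of a method as seen by the inliner.
struct MethodInfo {
  bool is_method_handle_adapter;  // generated adapter, not user bytecode
  int  code_size;                 // real bytecode size in bytes
  int  invoke_count;              // number of invoke bytecodes in the method
};

// Size charged against inlining budgets: adapters are charged by their
// number of invokes, everything else by its real bytecode size.
int size_for_inlining(const MethodInfo& m) {
  assert(m.code_size >= 0 && m.invoke_count >= 0);
  return m.is_method_handle_adapter ? m.invoke_count : m.code_size;
}
```

An adapter with 120 bytes of bytecode and 2 invokes is then charged 2 rather
than 120, so it consumes far less of the budget.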

Tested with JRuby benchmarks.


From tom.rodriguez at oracle.com  Tue Aug 23 16:44:38 2011
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Tue, 23 Aug 2011 16:44:38 -0700
Subject: review for 7071307: MethodHandle bimorphic inlining should consider
	the frequency
Message-ID: <DF1ECF39-6119-466F-97E5-3B6DB9C11F70@oracle.com>

This is a re-review since I added per method handle GWT profiling.

http://cr.openjdk.java.net/~never/7071307
312 lines changed: 270 ins; 15 del; 27 mod; 22101 unchg

7071307: MethodHandle bimorphic inlining should consider the frequency
Reviewed-by:

The fix for 7050554 added a bimorphic inline path but didn't take into
account the frequency of the guarding test.  This ends up treating
both sides of the if as equally frequent which can lead to over
inlining and overflowing the method inlining limits.  The fix is to
grab the frequency from the If and apply that to the branches.
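
A minimal sketch of applying the profile to the two branches, assuming the two
counts are available as plain integers and that a negative value means "no
profile". BranchFrequencies and split_frequency() are hypothetical names and
this is not the code in the webrev.

```cpp
#include <cassert>

// Frequencies assigned to the two sides of the guarding test.
struct BranchFrequencies {
  double first;
  double second;
};

// Splits the call site frequency between the two branches in proportion to
// the profiled counts. Without usable counts both sides are weighted equally,
// which corresponds to the behavior described above as the problem.
BranchFrequencies split_frequency(double site_frequency,
                                  long long count1, long long count2) {
  assert(site_frequency >= 0.0);
  if (count1 < 0 || count2 < 0 || count1 + count2 == 0) {
    return { site_frequency * 0.5, site_frequency * 0.5 };
  }
  const double p = static_cast<double>(count1) /
                   static_cast<double>(count1 + count2);
  return { site_frequency * p, site_frequency * (1.0 - p) };
}
```

For example, counts of 3 and 1 at a site with frequency 100 give the branches
75 and 25 instead of 50 each.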

Additionally I added support for per method handle profile collection
since this was required to get good results for more complex programs.
This requires the fix for 7082631 on the JDK side.
http://cr.openjdk.java.net/~never/7082631

I also fixed a problem with the ideal graph printer where debug_orig
printing would go into an infinite loop.

Tested with jruby and vm.mlvm tests.


From christian.thalinger at oracle.com  Wed Aug 24 06:12:55 2011
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Wed, 24 Aug 2011 15:12:55 +0200
Subject: review for 7071307: MethodHandle bimorphic inlining should
	consider the frequency
In-Reply-To: <DF1ECF39-6119-466F-97E5-3B6DB9C11F70@oracle.com>
References: <DF1ECF39-6119-466F-97E5-3B6DB9C11F70@oracle.com>
Message-ID: <4B352C37-8DBD-432D-8B12-BE01D04155C4@oracle.com>


On Aug 24, 2011, at 1:44 AM, Tom Rodriguez wrote:

> This is a re-review since I added per method handle GWT profiling.
> 
> http://cr.openjdk.java.net/~never/7071307
> 312 lines changed: 270 ins; 15 del; 27 mod; 22101 unchg

src/share/vm/prims/methodHandleWalk.cpp:

MethodHandleCompiler::fetch_counts:

+   int count1 = -1, count2 = -1;
...
+   int total = count1 + count2;
+   if (count1 != -1 && count2 != -2 && total != 0) {

Why -2?

+   int          _taken_count;
+   int          _not_taken_count;

Does taken refer to target and not_taken to fallback in the GWT?

MethodHandleCompiler::make_invoke:

Can you use emit_bc instead of _bytecode.push where possible so we have at least a little sanity checking?

+     bool found_sel = false;

Can you rename that to maybe found_selectAlternative?


src/share/vm/ci/ciMethodHandle.cpp:

That print_chain is very helpful.  Thanks for that.


src/share/vm/classfile/javaClasses.cpp:

+ int java_lang_invoke_CountingMethodHandle::vmcount(oop mh) {
+   assert(is_instance(mh), "DMH only");
+   return mh->int_field(_vmcount_offset);
+ }
+ 
+ void java_lang_invoke_CountingMethodHandle::set_vmcount(oop mh, int count) {
+   assert(is_instance(mh), "DMH only");
+   mh->int_field_put(_vmcount_offset, count);
+ }

I think the assert message is a copy-paste bug.

Otherwise looks good.

> 
> 7071307: MethodHandle bimorphic inlining should consider the frequency
> Reviewed-by:
> 
> The fix for 7050554 added a bimorphic inline path but didn't take into
> account the frequency of the guarding test.  This ends up treating
> both sides of the if as equally frequent which can lead to over
> inlining and overflowing the method inlining limits.  The fix is to
> grab the frequency from the If and apply that to the branches.
> 
> Additionally I added support for per method handle profile collection
> since this was required to get good results for more complex programs.
> This requires the fix for 7082631 on the JDK side.
> http://cr.openjdk.java.net/~never/7082631

The JDK changes look good.

-- Christian

> 
> I also fixed a problem with the ideal graph printer where debug_orig
> printing would go into an infinite loop.
> 
> Tested with jruby and vm.mlvm tests.
> 


From tom.deneau at amd.com  Wed Aug 24 09:26:54 2011
From: tom.deneau at amd.com (Deneau, Tom)
Date: Wed, 24 Aug 2011 11:26:54 -0500
Subject: Review Request: UseNUMAInterleaving #6
In-Reply-To: <4E543CDA.3050904@oracle.com>
References: <5EA33A275136844D843B73A29FB9A6A901362B54B2@SAUSEXMBP01.amd.com>
	<4E402E1C.1010807@oracle.com>
	<5EA33A275136844D843B73A29FB9A6A90186EF904E@SAUSEXMBP01.amd.com>
	<247BA26129A14681B03D0856A6FAC69D@oracle.com>
	<5EA33A275136844D843B73A29FB9A6A90186EF98B7@SAUSEXMBP01.amd.com>
	<F091B4C37F4044B38EC0D7C1E58B5641@oracle.com>
	<91928C974B07497184AF80B96F196606@oracle.com>
	<5EA33A275136844D843B73A29FB9A6A90186FA618A@SAUSEXMBP01.amd.com>
	<462098EF18364A629C463AC72D5495CC@oracle.com>
	<5EA33A275136844D843B73A29FB9A6A9018D581E22@SAUSEXMBP01.amd.com>
	<9F66C366BA1C4D8A83183711ADE738A0@oracle.com>
	<5EA33A275136844D843B73A29FB9A6A9018D581EBA@SAUSEXMBP01.amd.com>
	<4E543CDA.3050904@oracle.com>
Message-ID: <5EA33A275136844D843B73A29FB9A6A9018D582275@SAUSEXMBP01.amd.com>

I believe I have addressed ramki's comments with 
http://cr.openjdk.java.net/~tdeneau/UseNUMAInterleaving/webrev.06/

-- Tom

> -----Original Message-----
> From: Y. S. Ramakrishna [mailto:y.s.ramakrishna at oracle.com]
> Sent: Tuesday, August 23, 2011 6:51 PM
> To: Deneau, Tom
> Cc: hotspot-gc-dev at openjdk.java.net
> Subject: Re: Review Request: UseNUMAInterleaving #4
> 
> Hi Tom -- the perf improvement on windows is impressive.
> 
> The changes look good. Just a few very minor nits below:
> 
> globals.hpp: In the doc string field for NUMAInterleaveGranularity, you
> might state that this is a Windows only option. (although i recognize
> that this hasn't been done for some of the other windows options that
> i became aware of now as being used exclusively in windows before
> your changes, for instance: UseLargePagesIndividualAllocation
> and LargePagesIndividualAllocationInjectError).
> 
> arguments.cpp: you could get rid of the empty lines 1432-1433, and move
> the
> content of 1428-1430 into the if-scope of 1422-1426.
> 
> os_windows.cpp: you can probably get rid of the extra newline
> introduced at line 1967.
> 
> line 3018, typo: "NUMAInterleavaing"
> also at line 3033: "thNUMANodeListHolderat"
> The comment at lines 3030-3033 would also benefit
> from a few missing punctuation marks.
> 
> at lines 3040 and 3043, it might read better to place the returns
> on lines of their own.
> 
> If you run with +UseNUMAInterleaving and a commit failed,
> it would seem that the error message at line 2987 would be
> confusing and incorrect. Perhaps you want to suitably modify
> it or just suppress the additional text in that case.
> 
> os_solaris.cpp: 2780-2784, it might make sense to do the madvise
> global/many
> call only if the mmap_chunk() succeeds, rather than all the time as you
> are doing. May be something like:-
> 
> 2780   char *res = Solaris::mmap_chunk(addr, size, MAP_PRIVATE|MAP_FIXED,
> prot);
> 2781   if (res != NULL) {
>           if (UseNUMAInterleaving) {
> 2782       numa_make_global(res, size);
>           }
>           return true;
> 2783   }
> 2784   return false;
> 
> At line 3444, would it make sense to use "size" instead of "bytes"
> (although
> size is just a copy of bytes -- i don't understand the reason for making
> the copy, so feel free to ignore if this is some recherche style issue;
> otherwise
> it might make sense to get rid of the copy and just use the formal
> parameter as is
> the case for the Linux code; although this is really not code that you
> introduced,
> but just because you happen to be touching code in the vicinity... your
> choice.)
> 
> In the same vein, i'd make the Linux code similar in shape to
> the solaris code for the two hunks changed in os_linux.cpp.
> 
> rest looks good.
> -- ramki
> 
> On 08/23/11 12:59, Deneau, Tom wrote:
> > OK, http://cr.openjdk.java.net/~tdeneau/UseNUMAInterleaving/webrev.05/
> > should address the concerns listed below...
> >
> > -- Tom
> >
> >
> >> -----Original Message-----
> >> From: Igor Veresov [mailto:igor.veresov at oracle.com]
> >> Sent: Tuesday, August 23, 2011 1:53 PM
> >> To: Deneau, Tom
> >> Cc: hotspot-gc-dev at openjdk.java.net
> >> Subject: Re: Review Request: UseNUMAInterleaving #4
> >>
> >>  Tom,
> >>
> >> This looks good to me, except three minor things:
> >>
> >> os_windows.cpp:
> >>
> >> - you should check for null here:
> >> 2630 ~NUMANodeListHolder() {
> >>> if (_numa_used_node_list != NULL) {
> >> 2631 FREE_C_HEAP_ARRAY(int, _numa_used_node_list);
> >>> }
> >> 2632 }
> >>
> >> - if NUMANodeListHolder::build() will be called multiple times, you'll
> >> leak memory. I guess you should check if _numa_used_node_list is NULL
> and
> >> if not free it first.
> >>
> >> - you didn't modify os::numa_get_leaf_groups() to handle the situation
> >> when the value of argument "size" is bigger than
> >> NUMANodeListHolder::get_count(). You can use MIN2 to adjust the value.
> >> See my comment in the previous mail.
> >>
> >>
> >> igor
> >>
> >> On Tuesday, August 23, 2011 at 11:23 AM, Deneau, Tom wrote:
> >>
> >>> Please review this patch which adds a new flag called
> >>> UseNUMAInterleaving. This flag provides a subset of the functionality
> >>> provided by UseNUMA. In Hotspot UseNUMA terminology,
> >>> UseNUMAInterleaving makes all memory "numa_global" which is
> implemented
> >>> as interleaved. This patch's main purpose is to provide that subset
> >>> on OSes like Windows which do not support the full UseNUMA
> >>> functionality. However, a simple implementation of
> UseNUMAInterleaving
> >> is
> >>> also provided for other OSes
> >>>
> >>> The situations where this shows the biggest benefits would be:
> >>>  * Windows platforms with multiple numa nodes (eg, 4)
> >>>
> >>>  * The JVM process is run across all the nodes (not affinitized to
> >>>  one node).
> >>>
> >>>  * A workload that has enough threads so that it uses the majority
> >>>  of the cores in the machine, so that the heap is being accessed
> >>>  from many cores, including remote ones.
> >>>
> >>>  * Enough memory per node and a heap size such that the default heap
> >>>  placement policy on windows would end up with the heap (or
> >>>  nursery) placed on one node.
> >>>
> >>> jbb2005 and SPECPower_ssj2008 are examples of such workloads. In our
> >>> measurements, we have seen some cases where the performance with
> >>> UseNUMAInterleaving was 2.7x vs. the performance without. There were
> >>> gains of varying sizes across all systems.
> >>>
> >>> The webrev is at
> >>> http://cr.openjdk.java.net/~tdeneau/UseNUMAInterleaving/webrev.04/
> >>>
> >>> Summary of changes in webrev.04 from webrev.03:
> >>>
> >>>  * As suggested by Igor Veresov, UseNUMA can imply
> >>>  UseNUMAInterleaving on all platforms. This is in arguments.cpp
> >>>
> >>>  * In NUMANodeListHolder in os_windows.cpp, allocates the node_list
> >>>  dynamically rather than assuming a length of 64. The method
> >>>  NUMANodeListHolder::get_node_list_entry returns -1 for
> >>>  indexes that are out of bounds.
> >>>
> >>>  * Several code convention cleanups suggested by Igor.
> >>>
> >>>  * Merge with the new style system dll function resolutions from
> >>>  "7016797: Hotspot: securely/restrictive load dlls and new API for
> >>>  loading system dlls" Note: my new NUMA functions are outside the
> >> ifdefs.
> >>>
> >>> Summary of changes in webrev.03 from webrev.02:
> >>>
> >>>  * As suggested by Igor Veresov, reverts to using
> >>>  UseNUMAInterleaving as the enabling flag. This will make it
> >>>  easier in the future when there are GCs that enable fuller
> >>>  UseNUMA on Windows.
> >>>
> >>>  * Adds a simple implementation of UseNUMAInterleaving on Linux and
> >>>  Solaris, which just calls numa_make_global after commit_memory
> >>>  and reserve_memory_special
> >>>
> >>>  * Adds a flag NUMAInterleaveGranularity which allows setting the
> >>>  granularity with which we move to a different node in a memory
> >>>  allocation. The default is 2MB. This flag only applies to
> >>>  Windows for now.
> >>>
> >>>  * Several code cleanups in os_windows.cpp suggested by Igor.
> >>>
> >>>
> >>> Summary of overall changes in os_windows.cpp:
> >>>
> >>>  * Some static routines were added to set things up at init time. These
> >>>  * check that the required APIs (VirtualAllocExNuma,
> >>>  GetNumaHighestNodeNumber, GetNumaNodeProcessorMask) exist in
> >>>  the OS
> >>>
> >>>  * build the list of numa nodes on which this process has affinity
> >>>
> >>>  * Changes to os::reserve_memory
> >>>  * There was already a routine that reserved pages one page at a
> >>>  time (used for Individual Large Page Allocation on WS2003).
> >>>  This was abstracted to a separate routine, called
> >>>  allocate_pages_individually. This gets called both for the
> >>>  Individual Large Page Allocation thing mentioned above and for
> >>>  UseNUMAInterleaving (for both small and large pages)
> >>>
> >>>  * When used for NUMA Interleaving this just goes thru the numa
> >>>  node list in a round-robin fashion, allocating chunks at the
> >>>  NUMAInterleaveGranularity using a different allocation for
> >>>  each chunk
> >>>
> >>>  * Whether we do just a reserve or a combined reserve/commit is
> >>>  determined by the caller of allocate_pages_individually
> >>>
> >>>  * When used with large pages, we do a Reserve and Commit at
> >>>  the same time which is the way it always worked and the way
> >>>  it has to work on windows.
> >>>
> >>>  * For small pages, only the reserve is done, the commit will
> >>>  come later. (which is the way it worked for
> >>>  non-interleaved)
> >>>
> >>>  * os::commit_memory changes
> >>>  * If UseNUMAInterleaving is true, os::commit_memory has to check
> >>>  whether it was being asked to commit memory that might have
> >>>  come from multiple Reserve allocations, if so, the commits
> >>>  must also be broken up. We don't keep any data structure to
> >>>  keep track of this, we just use VirtualQuery which queries the
> >>>  properties of a VA range and can tell us how much came from
> >>>  one VirtualAlloc call.
> >>>
> >>> I do not have a bug id for this.
> >>>
> >>> -- Tom Deneau, AMD
> >>
> >
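
The round-robin placement described in the quoted summary can be sketched
roughly as follows. This is only an illustration of the chunking logic, not
the code in the webrev: the names Chunk and plan_interleaved_chunks are
hypothetical, and where this sketch records a node number a real
implementation would issue one VirtualAllocExNuma call per chunk.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One planned allocation: `bytes` starting at `offset` within the
// requested range, to be placed on NUMA node `node`.
struct Chunk {
  std::size_t offset;
  std::size_t bytes;
  int node;
};

// Split [0, total_bytes) into chunks of at most `granularity` bytes and
// assign them to the nodes in `node_list` in round-robin order.
// Hypothetical helper: it only plans the layout and allocates nothing.
std::vector<Chunk> plan_interleaved_chunks(std::size_t total_bytes,
                                           std::size_t granularity,
                                           const std::vector<int>& node_list) {
  assert(granularity > 0 && !node_list.empty());
  std::vector<Chunk> plan;
  std::size_t next = 0;  // index into node_list of the next node to use
  for (std::size_t off = 0; off < total_bytes; off += granularity) {
    std::size_t remaining = total_bytes - off;
    std::size_t bytes = remaining < granularity ? remaining : granularity;
    plan.push_back(Chunk{off, bytes, node_list[next]});
    next = (next + 1) % node_list.size();
  }
  return plan;
}
```

Because each chunk comes from its own allocation, a later commit that spans
several chunks has to be split along the same boundaries, which is the reason
given above for the VirtualQuery walk in os::commit_memory.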


From tom.rodriguez at oracle.com  Wed Aug 24 11:59:01 2011
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Wed, 24 Aug 2011 11:59:01 -0700
Subject: review for 7082949: JSR 292: missing ResourceMark in
	methodOopDesc::make_invoke_method
Message-ID: <5004B040-A61F-48C5-986B-D4CE9C3D1F0F@oracle.com>

http://cr.openjdk.java.net/~never/7082949
55 lines changed: 55 ins; 0 del; 0 mod; 1606 unchg

7082949: JSR 292: missing ResourceMark in methodOopDesc::make_invoke_method
Summary:
Reviewed-by:

The fix for 7056328 added some resource allocation in some cases when
building the invoke method but didn't insert a ResourceMark.  Mostly
we ended up using one in a caller but sometimes the caller doesn't
have one so this code needs its own.  Tested with failing test case.
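
As a rough illustration of why a missing mark matters, here is a toy model of
the scoped-mark pattern. ToyArena and ToyResourceMark are hypothetical
stand-ins, not HotSpot's ResourceArea and ResourceMark; the sketch only shows
that a function which allocates scratch memory should take its own mark
rather than depend on a caller having one.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy bump-pointer arena standing in for a per-thread resource area.
class ToyArena {
 public:
  explicit ToyArena(std::size_t capacity) : _buf(capacity), _top(0) {}

  void* allocate(std::size_t bytes) {
    assert(bytes <= _buf.size() - _top);
    void* p = _buf.data() + _top;
    _top += bytes;
    return p;
  }

  std::size_t used() const { return _top; }
  void rollback_to(std::size_t mark) { _top = mark; }

 private:
  std::vector<char> _buf;
  std::size_t _top;
};

// Scope guard: remembers the arena's top on entry and restores it on exit,
// releasing everything allocated inside the scope.
class ToyResourceMark {
 public:
  explicit ToyResourceMark(ToyArena& arena)
      : _arena(arena), _saved(arena.used()) {}
  ~ToyResourceMark() { _arena.rollback_to(_saved); }
  ToyResourceMark(const ToyResourceMark&) = delete;
  ToyResourceMark& operator=(const ToyResourceMark&) = delete;

 private:
  ToyArena& _arena;
  std::size_t _saved;
};

// Takes its own mark, so the 64 scratch bytes are released on return even
// when no caller has a mark. Returns the peak usage seen inside the scope.
std::size_t build_with_scratch(ToyArena& arena) {
  ToyResourceMark rm(arena);
  arena.allocate(64);
  return arena.used();
}
```

Without the local mark, the scratch bytes would stay allocated until some
enclosing mark was released, or indefinitely if no caller had one.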


From vladimir.kozlov at oracle.com  Wed Aug 24 12:17:26 2011
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 24 Aug 2011 12:17:26 -0700
Subject: review for 7082949: JSR 292: missing ResourceMark in
	methodOopDesc::make_invoke_method
In-Reply-To: <5004B040-A61F-48C5-986B-D4CE9C3D1F0F@oracle.com>
References: <5004B040-A61F-48C5-986B-D4CE9C3D1F0F@oracle.com>
Message-ID: <4E554E46.40409@oracle.com>

Looks good.

Vladimir

Tom Rodriguez wrote:
> http://cr.openjdk.java.net/~never/7082949
> 55 lines changed: 55 ins; 0 del; 0 mod; 1606 unchg
> 
> 7082949: JSR 292: missing ResourceMark in methodOopDesc::make_invoke_method
> Summary:
> Reviewed-by:
> 
> The fix for 7056328 added some resource allocation in some cases when
> building the invoke method but didn't insert a ResourceMark.  Mostly
> we ended up using one in a caller but sometimes the caller doesn't
> have one so this code needs its own.  Tested with failing test case.
> 

From christian.thalinger at oracle.com  Wed Aug 24 12:42:36 2011
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Wed, 24 Aug 2011 21:42:36 +0200
Subject: review for 7082949: JSR 292: missing ResourceMark in
	methodOopDesc::make_invoke_method
In-Reply-To: <5004B040-A61F-48C5-986B-D4CE9C3D1F0F@oracle.com>
References: <5004B040-A61F-48C5-986B-D4CE9C3D1F0F@oracle.com>
Message-ID: <014007F6-6C38-4B89-8622-43BAD1B64D3E@oracle.com>

Looks good.  -- Christian

On Aug 24, 2011, at 8:59 PM, Tom Rodriguez wrote:

> http://cr.openjdk.java.net/~never/7082949
> 55 lines changed: 55 ins; 0 del; 0 mod; 1606 unchg
> 
> 7082949: JSR 292: missing ResourceMark in methodOopDesc::make_invoke_method
> Summary:
> Reviewed-by:
> 
> The fix for 7056328 added some resource allocation in some cases when
> building the invoke method but didn't insert a ResourceMark.  Mostly
> we ended up using one in a caller but sometimes the caller doesn't
> have one so this code needs its own.  Tested with failing test case.
> 


From tom.rodriguez at oracle.com  Wed Aug 24 13:57:20 2011
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Wed, 24 Aug 2011 13:57:20 -0700
Subject: review for 7082949: JSR 292: missing ResourceMark in
	methodOopDesc::make_invoke_method
In-Reply-To: <014007F6-6C38-4B89-8622-43BAD1B64D3E@oracle.com>
References: <5004B040-A61F-48C5-986B-D4CE9C3D1F0F@oracle.com>
	<014007F6-6C38-4B89-8622-43BAD1B64D3E@oracle.com>
Message-ID: <9DA84FDC-34BE-460D-92CA-F7BBCD752A94@oracle.com>

Thanks Christian and Vladimir.

tom

On Aug 24, 2011, at 12:42 PM, Christian Thalinger wrote:

> Looks good.  -- Christian
> 
> On Aug 24, 2011, at 8:59 PM, Tom Rodriguez wrote:
> 
>> http://cr.openjdk.java.net/~never/7082949
>> 55 lines changed: 55 ins; 0 del; 0 mod; 1606 unchg
>> 
>> 7082949: JSR 292: missing ResourceMark in methodOopDesc::make_invoke_method
>> Summary:
>> Reviewed-by:
>> 
>> The fix for 7056328 added some resource allocation in some cases when
>> building the invoke method but didn't insert a ResourceMark.  Mostly
>> we ended up using one in a caller but sometimes the caller doesn't
>> have one so this code needs its own.  Tested with failing test case.
>> 
> 


From tom.rodriguez at oracle.com  Wed Aug 24 14:12:39 2011
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Wed, 24 Aug 2011 14:12:39 -0700
Subject: review for 7071307: MethodHandle bimorphic inlining should
	consider the frequency
In-Reply-To: <4B352C37-8DBD-432D-8B12-BE01D04155C4@oracle.com>
References: <DF1ECF39-6119-466F-97E5-3B6DB9C11F70@oracle.com>
	<4B352C37-8DBD-432D-8B12-BE01D04155C4@oracle.com>
Message-ID: <CA629B01-E95B-45C0-A8F5-FE8EF01E0420@oracle.com>


On Aug 24, 2011, at 6:12 AM, Christian Thalinger wrote:

> 
> On Aug 24, 2011, at 1:44 AM, Tom Rodriguez wrote:
> 
>> This is a re-review since I added per method handle GWT profiling.
>> 
>> http://cr.openjdk.java.net/~never/7071307
>> 312 lines changed: 270 ins; 15 del; 27 mod; 22101 unchg
> 
> src/share/vm/prims/methodHandleWalk.cpp:
> 
> MethodHandleCompiler::fetch_counts:
> 
> +   int count1 = -1, count2 = -1;
> ...
> +   int total = count1 + count2;
> +   if (count1 != -1 && count2 != -2 && total != 0) {
> 
> Why -2?

Just a typo.  It's fixed.

> 
> +   int          _taken_count;
> +   int          _not_taken_count;
> 
> Does taken refer to target and not_taken to fallback in the GWT?

They refer to the bytecode and the vmcounts collected.  I think they are actually reversed from what selectAlternative generates but as long as they agree with the bytecodes generated I don't think it matters.  I verified empirically that the counts match the execution and feed into the frequency in the proper fashion.

> 
> MethodHandleCompiler::make_invoke:
> 
> Can you use emit_bc instead of _bytecode.push where possible so we have at least a little sanity checking?

I added support for ifeq and added update_branch_dest to correct the offsets.  I only added support for ifeq for now.

> 
> +     bool found_sel = false;
> 
> Can you rename that to maybe found_selectAlternative?

Yup.

> 
> 
> src/share/vm/ci/ciMethodHandle.cpp:
> 
> That print_chain is very helpful.  Thanks for that.
> 
> 
> src/share/vm/classfile/javaClasses.cpp:
> 
> + int java_lang_invoke_CountingMethodHandle::vmcount(oop mh) {
> +   assert(is_instance(mh), "DMH only");
> +   return mh->int_field(_vmcount_offset);
> + }
> + 
> + void java_lang_invoke_CountingMethodHandle::set_vmcount(oop mh, int count) {
> +   assert(is_instance(mh), "DMH only");
> +   mh->int_field_put(_vmcount_offset, count);
> + }
> 
> I think the assert message is a copy-paste bug.

Fixed.

> 
> Otherwise looks good.

Thanks!

tom

> 
>> 
>> 7071307: MethodHandle bimorphic inlining should consider the frequency
>> Reviewed-by:
>> 
>> The fix for 7050554 added a bimorphic inline path but didn't take into
>> account the frequency of the guarding test.  This ends up treating
>> both sides of the if as equally frequent which can lead to over
>> inlining and overflowing the method inlining limits.  The fix is to
>> grab the frequency from the If and apply that to the branches.
>> 
>> Additionally I added support for per method handle profile collection
>> since this was required to get good results for more complex programs.
>> This requires the fix for 7082631 on the JDK side.
>> http://cr.openjdk.java.net/~never/7082631
> 
> The JDK changes look good.
> 
> -- Christian
> 
>> 
>> I also fixed a problem with the ideal graph printer where debug_orig
>> printing would go into an infinite loop.
>> 
>> Tested with jruby and vm.mlvm tests.
>> 
> 
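
The idea of applying the guard's frequency to the two inlined branches can be
sketched as below. This is a simplified model, not the code under review:
BranchFrequencies and split_frequency are hypothetical names, and the even
split used when no profile is available is an assumption made for the sketch.

```cpp
#include <cassert>

// Frequencies assigned to the two sides of a guarded branch.
struct BranchFrequencies {
  double taken;
  double not_taken;
};

// Scale the call-site frequency by the observed branch probability.
// A count of -1 means "no profile collected". In that case, or when both
// counts are zero, fall back to an even split (an assumption of this sketch).
BranchFrequencies split_frequency(double site_freq,
                                  int taken_count,
                                  int not_taken_count) {
  assert(site_freq >= 0.0);
  int total = taken_count + not_taken_count;
  if (taken_count != -1 && not_taken_count != -1 && total != 0) {
    double prob = static_cast<double>(taken_count) / total;
    return BranchFrequencies{site_freq * prob, site_freq * (1.0 - prob)};
  }
  return BranchFrequencies{site_freq * 0.5, site_freq * 0.5};
}
```

Note that both sentinel checks compare against -1, which is the point of the
"Why -2?" question earlier in this message.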


From vladimir.kozlov at oracle.com  Wed Aug 24 17:52:16 2011
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 24 Aug 2011 17:52:16 -0700
Subject: Request for reviews (M): 7059037: Use BIS for zeroing on T4
Message-ID: <4E559CC0.6030701@oracle.com>

http://cr.openjdk.java.net/~kvn/7059037/webrev

7059037: Use BIS for zeroing on T4

On T4 BIS to the beginning of cache line always zeros it. Use it for zeroing new
allocated java objects. The main code is in MacroAssembler::bis_zeroing() and is
used by C2 generated code (ClearArray), runtime (Copy::fill_to_aligned_words())
and template interpreter (TemplateTable::_new()). New stub zero_aligned_words
was added to use in runtime.

BIS is used only for objects bigger than BlkZeroingLowLimit (2Kbyte) since it
requires membar. 2Kb was selected based on microbenchmark results.

I also added wrasi(Reg, immI) instruction which I used during development.
VM_Version::has_mru_blk_init() is replaced with has_blk_zeroing() since original
was not used.
Zap new object in CollectedHeap::allocate_from_tlab_slow() instead of zeroing it
since it will be cleaned later in init_obj().
Fixed call sites of check_for_bad_heap_word_value() where klass is not
initialized to avoid the verification failure.
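
The size threshold and the cache-line handling described above can be
modelled in portable C++ as follows. This is a sketch under stated
assumptions, not MacroAssembler::bis_zeroing(): memset stands in for the BIS
stores and the membar, and the names zero_region, kCacheLineSize and
kBlockZeroLowLimit, as well as the 64-byte line size, are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Illustrative constants, not values taken from the VM.
const std::size_t kCacheLineSize = 64;
const std::size_t kBlockZeroLowLimit = 2048;

// Zero [base, base + bytes). Regions no bigger than the limit use a plain
// fill. Bigger regions are split into an unaligned head, a run of whole
// cache lines (where block-initializing stores could be used), and a tail.
// Returns the number of bytes handled by the whole-line path.
std::size_t zero_region(unsigned char* base, std::size_t bytes) {
  assert(base != nullptr);
  if (bytes <= kBlockZeroLowLimit) {
    std::memset(base, 0, bytes);
    return 0;
  }
  std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(base);
  // Bytes needed to reach the next cache-line boundary (0 if aligned).
  std::size_t head = (kCacheLineSize - addr % kCacheLineSize) % kCacheLineSize;
  std::size_t lines = (bytes - head) / kCacheLineSize * kCacheLineSize;
  std::size_t tail = bytes - head - lines;
  std::memset(base, 0, head);                  // slow path up to the boundary
  std::memset(base + head, 0, lines);          // stand-in for BIS + membar
  std::memset(base + head + lines, 0, tail);   // slow path for the remainder
  return lines;
}
```

The fixed cost of the barrier is the reason for the lower limit: for small
objects the plain fill is cheaper than the block path plus a membar.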


From christian.thalinger at oracle.com  Thu Aug 25 00:59:25 2011
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Thu, 25 Aug 2011 09:59:25 +0200
Subject: Request for reviews (M): 7059037: Use BIS for zeroing on T4
In-Reply-To: <4E559CC0.6030701@oracle.com>
References: <4E559CC0.6030701@oracle.com>
Message-ID: <E0B14F3E-685F-43B0-8D7D-4B302D0C239F@oracle.com>


On Aug 25, 2011, at 2:52 AM, Vladimir Kozlov wrote:

> http://cr.openjdk.java.net/~kvn/7059037/webrev
> 
> 7059037: Use BIS for zeroing on T4
> 
> On T4 BIS to the beginning of cache line always zeros it. Use it for zeroing new
> allocated java objects. The main code is in MacroAssembler::bis_zeroing() and is
> used by C2 generated code (ClearArray), runtime (Copy::fill_to_aligned_words())
> and template interpreter (TemplateTable::_new()). New stub zero_aligned_words
> was added to use in runtime.
> 
> BIS is used only for objects bigger than BlkZeroingLowLimit (2Kbyte) since it
> requires membar. 2Kb was selected based on microbenchmark results.
> 
> I also added wrasi(Reg, immI) instruction which I used during development.
> VM_Version::has_mru_blk_init() is replaced with has_blk_zeroing() since original
> was not used.
> Zap new object in CollectedHeap::allocate_from_tlab_slow() instead of zeroing it
> since it will be cleaned later in init_obj().
> Fixed call sites of check_for_bad_heap_word_value() where klass is not
> initialized to avoid the verification failure.
> 

src/cpu/sparc/vm/assembler_sparc.cpp:

+   int cach_line_size = VM_Version::prefetch_data_size();

I guess this should be cache_line_size.

+   // Use BIS zeroing only for big arrays since it requires membar.
+   if (Assembler::is_simm13(blk_zero_size)) { // < 4096
+     cmp(count, blk_zero_size);
+   } else {
+     set(blk_zero_size, temp);
+     cmp(count, temp);
+   }

You could use ensure_simm13_or_reg here:

  cmp(count, ensure_simm13_or_reg(blk_zero_size, temp));

but I think you have to add a cmp(Register s1, RegisterOrConstant s2).
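
For reference, the immediate-or-register choice being discussed can be
modelled like this. It is a standalone sketch, not the assembler's code:
fits_simm13, Operand and immediate_or_temp are hypothetical names, and the
only fact relied on is that a 13-bit signed immediate covers -4096..4095.

```cpp
#include <cassert>
#include <cstdint>

// True if `x` fits in a 13-bit signed immediate field: -4096 <= x <= 4095.
bool fits_simm13(std::int64_t x) {
  return x >= -4096 && x <= 4095;
}

// Second operand of a compare: either an immediate or a register number.
struct Operand {
  bool in_register;       // true: the value had to be loaded into a register
  std::int64_t constant;  // meaningful when !in_register
  int reg;                // meaningful when in_register
};

// Use the constant directly when it fits the immediate field, otherwise
// fall back to the temp register. A real assembler would also emit the
// instruction that loads `value` into `temp_reg` in the second case.
Operand immediate_or_temp(std::int64_t value, int temp_reg) {
  assert(temp_reg >= 0);
  if (fits_simm13(value)) {
    return Operand{false, value, -1};
  }
  return Operand{true, 0, temp_reg};
}
```

With a helper of this shape the caller needs a single compare that accepts
either operand kind, which is why an extra cmp overload would be required.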

+   // Clean the beginning of space upto next cache line.

There is a space missing:  "up to".

Otherwise this looks good.


A side question:  what's the difference between using reg_to_register_object($tmp$$reg) and $tmp$$Register?  What does:

  assert(L5->encoding() == R_L5_enc && G1->encoding() == R_G1_enc, "right coding");

in reg_to_register_object actually check for?

-- Christian

From christian.thalinger at oracle.com  Thu Aug 25 02:17:00 2011
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Thu, 25 Aug 2011 11:17:00 +0200
Subject: Request for reviews (M): 7083184: JSR 292: don't store context class
	argument with call site dependencies
Message-ID: <0C8F4861-FA92-4CAC-861D-8B26CAE7D8D2@oracle.com>

http://cr.openjdk.java.net/~twisti/7083184/

7083184: JSR 292: don't store context class argument with call site dependencies
Reviewed-by:

The changes of 7071653 store a context class argument per call site
dependency in the dependency stream.  This is actually not required
since the context class is implicitly available with the first
argument; the call site object.  Additionally call site dependencies
should not depend on the very general super class CallSite but rather
its actual class.

src/share/vm/ci/ciEnv.cpp
src/share/vm/ci/ciEnv.hpp
src/share/vm/code/dependencies.cpp
src/share/vm/code/dependencies.hpp
src/share/vm/memory/universe.cpp
src/share/vm/opto/callGenerator.cpp


From christian.thalinger at oracle.com  Thu Aug 25 06:54:42 2011
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Thu, 25 Aug 2011 15:54:42 +0200
Subject: Request for reviews (S): 7071709: JSR 292: switchpoint invalidation
	should be pushed not pulled
Message-ID: <5F4038AD-6959-480E-9DB8-1DEF17D6C4A6@oracle.com>

http://cr.openjdk.java.net/~twisti/7071709/

7071709: JSR 292: switchpoint invalidation should be pushed not pulled
Reviewed-by:

SwitchPoints use a MutableCallSite for their implementation.  The fix is
to treat the target field of constant CallSites as a compile time
constant and add a dependence for invalidation of the optimization.
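
A toy model of the "pushed, not pulled" scheme, to make the idea concrete.
ToyCallSite and ToyCompiledCode are hypothetical classes and have nothing to
do with the actual dependency machinery; the sketch only shows that code
which treated the target as a constant is invalidated when the target
changes, instead of re-checking the target on every call.

```cpp
#include <cassert>
#include <vector>

// Stand-in for a compiled method that inlined a call site's target.
struct ToyCompiledCode {
  bool valid = true;
};

// Toy call site. Compiled code that treated `target` as a compile-time
// constant registers itself as a dependent. Changing the target pushes the
// invalidation to every dependent.
class ToyCallSite {
 public:
  explicit ToyCallSite(int target) : _target(target) {}

  int target() const { return _target; }

  void add_dependent(ToyCompiledCode* code) {
    assert(code != nullptr);
    _dependents.push_back(code);
  }

  void set_target(int new_target) {
    if (new_target == _target) return;  // nothing changed, keep the code
    _target = new_target;
    for (ToyCompiledCode* code : _dependents) {
      code->valid = false;              // push: invalidate eagerly
    }
    _dependents.clear();
  }

 private:
  int _target;
  std::vector<ToyCompiledCode*> _dependents;
};
```

In this model every real target change discards all dependent code, so a
site whose target changes often would keep triggering recompilation.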

src/share/vm/opto/memnode.cpp
src/share/vm/opto/parse3.cpp


From martin.doerr at sap.com  Thu Aug 25 09:42:03 2011
From: martin.doerr at sap.com (Doerr, Martin)
Date: Thu, 25 Aug 2011 18:42:03 +0200
Subject: Request for reviews (M): 7059037: Use BIS for zeroing on T4
In-Reply-To: <4E559CC0.6030701@oracle.com>
References: <4E559CC0.6030701@oracle.com>
Message-ID: <160598AAAEA6C640BF796BA28D836C6404FC3DC8BF@DEWDFECCR04.wdf.sap.corp>

Hi Vladimir,

this looks like a good starting point. Have you already seen my comments which I had added to bug 7059037?
I just pasted them below.

Kind regards,
Martin D


I'm aware of 2 easy to implement but problematic ways to use block initializing
instructions for TLAB initialization:

1. Use them in ClearArray. The problem here is that objects are not cache line
aligned in general so we need to clear the slow way before (and after?) a cache line
boundary. This is not difficult to implement but has quite some overhead and
does not avoid fetching cache lines from memory at the beginning (end?) of objects.

2. Use them in zero_to_words and activate -XX:+ZeroTLAB. This will clear
the whole TLABs when they get allocated. Doesn't perform well when TLABs get
large and cache lines get squeezed out to other levels in the memory hierarchy.
(BTW: filling with badHeapWordVal in ThreadLocalAllocBuffer::allocate breaks
ZeroTLAB function in debug build, maybe we should open a new bug for it)

My new proposal is to combine the zeroing with the prefetching. We only have to
make sure that we always clear up to some distance behind the object being allocated.
Then we can disable the ClearArray nodes as it is done when ZeroTLAB is used. We already
have tlab_pf_top_offset which is used with AllocatePrefetchStyle==2. Block initializing
prefetching could be implemented using this kind of prefetch watermark. If we arrange
for the TLABs to be aligned to cache line boundaries and to have a size that is divisible
by the cache line size, this should be easy to implement (and such alignment shouldn't be
a bad thing for any platform).

We could use an AllocatePrefetchDistance of one cache line behind new_eden_top which
probably makes sense, but playing with it might still be interesting because some processors
use automatic hardware prefetching which can interfere with what we're doing. We should
probably clear so far ahead that the hardware prefetch engine doesn't overtake us.
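
The watermark idea in this proposal could look roughly like the following.
It is a sketch of the bookkeeping only, under the assumptions stated in the
proposal (buffer size divisible by the line size): ToyTlab and its members
are hypothetical, memset stands in for block-initializing stores, and 0xAB is
just a marker for memory that has not been cleared yet.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Toy thread-local allocation buffer that keeps a "zeroed up to here"
// watermark ahead of the allocation pointer.
class ToyTlab {
 public:
  ToyTlab(std::size_t size, std::size_t line_size, std::size_t distance)
      : _mem(size, 0xAB), _line(line_size), _distance(distance),
        _top(0), _zeroed_top(0) {
    assert(line_size > 0 && size % line_size == 0);
  }

  // Bump-allocate `bytes`. On success stores the object's offset and returns
  // true. The object's memory is already zero when this returns.
  bool allocate(std::size_t bytes, std::size_t* offset) {
    if (bytes > _mem.size() - _top) return false;
    std::size_t new_top = _top + bytes;
    // Clear up to `_distance` bytes past the new top, rounded up to a line
    // boundary and clamped to the end of the buffer.
    std::size_t want = (new_top + _distance + _line - 1) / _line * _line;
    if (want > _mem.size()) want = _mem.size();
    if (want > _zeroed_top) {
      std::memset(_mem.data() + _zeroed_top, 0, want - _zeroed_top);
      _zeroed_top = want;
    }
    *offset = _top;
    _top = new_top;
    return true;
  }

  std::size_t zeroed_top() const { return _zeroed_top; }
  unsigned char byte_at(std::size_t i) const { return _mem[i]; }

 private:
  std::vector<unsigned char> _mem;
  std::size_t _line;
  std::size_t _distance;
  std::size_t _top;
  std::size_t _zeroed_top;
};
```

Since the watermark never falls behind the allocation pointer, per-object
clearing can be skipped, much as it is when the whole TLAB is pre-zeroed.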


-----Original Message-----
From: hotspot-compiler-dev-bounces at openjdk.java.net [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Vladimir Kozlov
Sent: Donnerstag, 25. August 2011 02:52
To: hotspot compiler
Subject: Request for reviews (M): 7059037: Use BIS for zeroing on T4

http://cr.openjdk.java.net/~kvn/7059037/webrev

7059037: Use BIS for zeroing on T4

On T4 BIS to the beginning of cache line always zeros it. Use it for zeroing new
allocated java objects. The main code is in MacroAssembler::bis_zeroing() and is
used by C2 generated code (ClearArray), runtime (Copy::fill_to_aligned_words())
and template interpreter (TemplateTable::_new()). New stub zero_aligned_words
was added to use in runtime.

BIS is used only for objects bigger than BlkZeroingLowLimit (2Kbyte) since it
requires membar. 2Kb was selected based on microbenchmark results.

I also added wrasi(Reg, immI) instruction which I used during development.
VM_Version::has_mru_blk_init() is replaced with has_blk_zeroing() since original
was not used.
Zap new object in CollectedHeap::allocate_from_tlab_slow() instead of zeroing it
since it will be cleaned later in init_obj().
Fixed call sites of check_for_bad_heap_word_value() where klass is not
initialized to avoid the verification failure.


From tom.rodriguez at oracle.com  Thu Aug 25 09:55:19 2011
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Thu, 25 Aug 2011 09:55:19 -0700
Subject: Request for reviews (S): 7071709: JSR 292: switchpoint
	invalidation should be pushed not pulled
In-Reply-To: <5F4038AD-6959-480E-9DB8-1DEF17D6C4A6@oracle.com>
References: <5F4038AD-6959-480E-9DB8-1DEF17D6C4A6@oracle.com>
Message-ID: <108F560B-C3D0-40AC-9C10-3EC21A4FB8C5@oracle.com>

Why is this being done for VolatileCallSite?  There's no mechanism for falling back if the field is invalidated too many times so we're just going to recompile over and over again which seems wrong.  Otherwise it looks ok.

tom

On Aug 25, 2011, at 6:54 AM, Christian Thalinger wrote:

> http://cr.openjdk.java.net/~twisti/7071709/
> 
> 7071709: JSR 292: switchpoint invalidation should be pushed not pulled
> Reviewed-by:
> 
> SwitchPoints use a MutableCallSite for its implementation.  The fix is
> to treat the target field of constant CallSites as a compile time
> constant and add a dependence for invalidation of the optimization.
> 
> src/share/vm/opto/memnode.cpp
> src/share/vm/opto/parse3.cpp
> 


From forax at univ-mlv.fr  Thu Aug 25 10:51:27 2011
From: forax at univ-mlv.fr (=?ISO-8859-1?Q?R=E9mi_Forax?=)
Date: Thu, 25 Aug 2011 19:51:27 +0200
Subject: Request for reviews (S): 7071709: JSR 292: switchpoint
	invalidation should be pushed not pulled
In-Reply-To: <108F560B-C3D0-40AC-9C10-3EC21A4FB8C5@oracle.com>
References: <5F4038AD-6959-480E-9DB8-1DEF17D6C4A6@oracle.com>
	<108F560B-C3D0-40AC-9C10-3EC21A4FB8C5@oracle.com>
Message-ID: <4E568B9F.8040606@univ-mlv.fr>

On 08/25/2011 06:55 PM, Tom Rodriguez wrote:
> Why is this being done for VolatileCallSite?  There's no mechanism for falling back if the field is invalidated too many times so we're just going to recompile over and over again which seems wrong.  Otherwise it looks ok.

Maybe because it can be invalidated only once from the API side.

>
> tom

Rémi

>
> On Aug 25, 2011, at 6:54 AM, Christian Thalinger wrote:
>
>> http://cr.openjdk.java.net/~twisti/7071709/
>>
>> 7071709: JSR 292: switchpoint invalidation should be pushed not pulled
>> Reviewed-by:
>>
>> SwitchPoints use a MutableCallSite for its implementation.  The fix is
>> to treat the target field of constant CallSites as a compile time
>> constant and add a dependence for invalidation of the optimization.
>>
>> src/share/vm/opto/memnode.cpp
>> src/share/vm/opto/parse3.cpp
>>


From vladimir.kozlov at oracle.com  Thu Aug 25 10:50:19 2011
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 25 Aug 2011 10:50:19 -0700
Subject: Request for reviews (M): 7059037: Use BIS for zeroing on T4
In-Reply-To: <E0B14F3E-685F-43B0-8D7D-4B302D0C239F@oracle.com>
References: <4E559CC0.6030701@oracle.com>
	<E0B14F3E-685F-43B0-8D7D-4B302D0C239F@oracle.com>
Message-ID: <4E568B5B.5050307@oracle.com>

Thank you, Christian

Christian Thalinger wrote:
> On Aug 25, 2011, at 2:52 AM, Vladimir Kozlov wrote:
> 
>> http://cr.openjdk.java.net/~kvn/7059037/webrev
>>
>> 7059037: Use BIS for zeroing on T4
>>
>> On T4 BIS to the beginning of cache line always zeros it. Use it for zeroing new
>> allocated java objects. The main code is in MacroAssembler::bis_zeroing() and is
>> used by C2 generated code (ClearArray), runtime (Copy::fill_to_aligned_words())
>> and template interpreter (TemplateTable::_new()). New stub zero_aligned_words
>> was added to use in runtime.
>>
>> BIS is used only for objects bigger than BlkZeroingLowLimit (2Kbyte) since it
>> requires membar. 2Kb was selected based on microbenchmark results.
>>
>> I also added wrasi(Reg, immI) instruction which I used during development.
>> VM_Version::has_mru_blk_init() is replaced with has_blk_zeroing() since original
>> was not used.
>> Zap new object in CollectedHeap::allocate_from_tlab_slow() instead of zeroing it
>> since it will be cleaned later in init_obj().
>> Fixed call sites of check_for_bad_heap_word_value() where klass is not
>> initialized to avoid the verification failure.
>>
> 
> src/cpu/sparc/vm/assembler_sparc.cpp:
> 
> +   int cach_line_size = VM_Version::prefetch_data_size();
> 
> I guess this should be cache_line_size.

Fixed.

> 
> +   // Use BIS zeroing only for big arrays since it requires membar.
> +   if (Assembler::is_simm13(blk_zero_size)) { // < 4096
> +     cmp(count, blk_zero_size);
> +   } else {
> +     set(blk_zero_size, temp);
> +     cmp(count, temp);
> +   }
> 
> You could use ensure_simm13_or_reg here:
> 
>   cmp(count, ensure_simm13_or_reg(blk_zero_size, temp));
> 
> but I think you have to add a cmp(Register s1, RegisterOrConstant s2).

I will keep it as it is. I don't want to add new method for just one case.

> 
> +   // Clean the beginning of space upto next cache line.
> 
> There is a space missing:  "up to".

Fixed.

> 
> Otherwise this looks good.
> 
> 
> A side question:  what's the difference between using reg_to_register_object($tmp$$reg) and $tmp$$Register?  What does:

I think reg_to_register_object() was implemented before $tmp$$Register. I copied 
code from original clear_array() which is very old. I will switch to 
$tmp$$Register form.

> 
>   assert(L5->encoding() == R_L5_enc && G1->encoding() == R_G1_enc, "right coding");
> 
> in reg_to_register_object actually check for?

It is old code which verifies that encoding() produces correct result.

Thanks,
Vladimir

> 
> -- Christian

From john.r.rose at oracle.com  Thu Aug 25 11:15:03 2011
From: john.r.rose at oracle.com (John Rose)
Date: Thu, 25 Aug 2011 11:15:03 -0700
Subject: Request for reviews (M): 7083184: JSR 292: don't store context
	class argument with call site dependencies
In-Reply-To: <0C8F4861-FA92-4CAC-861D-8B26CAE7D8D2@oracle.com>
References: <0C8F4861-FA92-4CAC-861D-8B26CAE7D8D2@oracle.com>
Message-ID: <4F53CD27-1042-4193-810B-C2A906490077@oracle.com>

Looks good.  -- John

On Aug 25, 2011, at 2:17 AM, Christian Thalinger wrote:

> http://cr.openjdk.java.net/~twisti/7083184/
> 
> 7083184: JSR 292: don't store context class argument with call site dependencies
> Reviewed-by:
> 
> The changes of 7071653 store a context class argument per call site
> dependency in the dependency stream.  This is actually not required
> since the context class is implicitly available with the first
> argument; the call site object.  Additionally call site dependencies
> should not depend on the very general super class CallSite but rather
> its actual class.
> 
> src/share/vm/ci/ciEnv.cpp
> src/share/vm/ci/ciEnv.hpp
> src/share/vm/code/dependencies.cpp
> src/share/vm/code/dependencies.hpp
> src/share/vm/memory/universe.cpp
> src/share/vm/opto/callGenerator.cpp
> 


From john.r.rose at oracle.com  Thu Aug 25 11:21:40 2011
From: john.r.rose at oracle.com (John Rose)
Date: Thu, 25 Aug 2011 11:21:40 -0700
Subject: Request for reviews (S): 7071709: JSR 292: switchpoint
	invalidation should be pushed not pulled
In-Reply-To: <5F4038AD-6959-480E-9DB8-1DEF17D6C4A6@oracle.com>
References: <5F4038AD-6959-480E-9DB8-1DEF17D6C4A6@oracle.com>
Message-ID: <52852391-3B23-4326-B75C-D2CB502C52AF@oracle.com>

That's nice and clean.

One question:  What happens when a CallSite optimizes down to a ConstantCallSite?  It looks like a useless dependency will get inserted.

Maybe the call to assert_call_site_target_value should be guarded by a check whether the field is marked final.

Also, get_field_by_offset should be called on the actual type of the ciCallSite, not env->CallSite_klass.  Otherwise you might get NULL for the ciField, if the target fields are split out across different call site subclasses.

-- John

On Aug 25, 2011, at 6:54 AM, Christian Thalinger wrote:

> http://cr.openjdk.java.net/~twisti/7071709/
> 
> 7071709: JSR 292: switchpoint invalidation should be pushed not pulled
> Reviewed-by:
> 
> SwitchPoints use a MutableCallSite for its implementation.  The fix is
> to treat the target field of constant CallSites as a compile time
> constant and add a dependence for invalidation of the optimization.
> 
> src/share/vm/opto/memnode.cpp
> src/share/vm/opto/parse3.cpp
> 


From tom.rodriguez at oracle.com  Thu Aug 25 11:32:28 2011
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Thu, 25 Aug 2011 11:32:28 -0700
Subject: Request for reviews (S): 7071709: JSR 292: switchpoint
	invalidation should be pushed not pulled
In-Reply-To: <4E568B9F.8040606@univ-mlv.fr>
References: <5F4038AD-6959-480E-9DB8-1DEF17D6C4A6@oracle.com>
	<108F560B-C3D0-40AC-9C10-3EC21A4FB8C5@oracle.com>
	<4E568B9F.8040606@univ-mlv.fr>
Message-ID: <05ACD704-7163-497D-8C2C-9E7AD760E080@oracle.com>


On Aug 25, 2011, at 10:51 AM, Rémi Forax wrote:

> On 08/25/2011 06:55 PM, Tom Rodriguez wrote:
>> Why is this being done for VolatileCallSite?  There's no mechanism for falling back if the field is invalidated too many times, so we're just going to recompile over and over again, which seems wrong.  Otherwise it looks ok.
> 
> Maybe because it can be invalidated only once from the API side.

The docs for VolatileCallSite suggest setTarget can be called whenever you feel like it, so it seems like it can be set many times.  MutableCallSite can be as well but the implication is that it's not updated very often.  I'm actually unclear what distinction is trying to be made with VolatileCallSite.
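
To make the point concrete, here is a minimal sketch, assuming only the public java.lang.invoke API, showing that both kinds of call site accept an arbitrary number of setTarget calls. The class and helper names (RetargetSketch, retarget, mutableSite, volatileSite) are invented for the example.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MutableCallSite;
import java.lang.invoke.VolatileCallSite;

// Hypothetical demo class; not HotSpot code.
final class RetargetSketch {
    static CallSite mutableSite() {
        return new MutableCallSite(MethodHandles.constant(int.class, 0));
    }

    static CallSite volatileSite() {
        return new VolatileCallSite(MethodHandles.constant(int.class, 0));
    }

    // Retargets the site `times` times and returns what the last call saw.
    static int retarget(CallSite site, int times) {
        try {
            MethodHandle invoker = site.dynamicInvoker();
            int last = (int) invoker.invokeExact();
            for (int i = 1; i <= times; i++) {
                site.setTarget(MethodHandles.constant(int.class, i));
                last = (int) invoker.invokeExact();
            }
            return last;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(retarget(mutableSite(), 1000));
        System.out.println(retarget(volatileSite(), 1000));
    }
}
```

Nothing in the API limits how often the target changes, so a compiler that treats the target as a constant would have to recompile after each change unless it has some way to give up.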

tom

> 
>> 
>> tom
> 
> Rémi
> 
>> 
>> On Aug 25, 2011, at 6:54 AM, Christian Thalinger wrote:
>> 
>>> http://cr.openjdk.java.net/~twisti/7071709/
>>> 
>>> 7071709: JSR 292: switchpoint invalidation should be pushed not pulled
>>> Reviewed-by:
>>> 
>>> SwitchPoints use a MutableCallSite for their implementation.  The fix is
>>> to treat the target field of constant CallSites as a compile time
>>> constant and add a dependence for invalidation of the optimization.
>>> 
>>> src/share/vm/opto/memnode.cpp
>>> src/share/vm/opto/parse3.cpp
>>> 
> 


From tom.rodriguez at oracle.com  Thu Aug 25 11:34:13 2011
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Thu, 25 Aug 2011 11:34:13 -0700
Subject: Request for reviews (S): 7071709: JSR 292: switchpoint
	invalidation should be pushed not pulled
In-Reply-To: <52852391-3B23-4326-B75C-D2CB502C52AF@oracle.com>
References: <5F4038AD-6959-480E-9DB8-1DEF17D6C4A6@oracle.com>
	<52852391-3B23-4326-B75C-D2CB502C52AF@oracle.com>
Message-ID: <FB55D31E-6C12-4901-91AB-B02327BE7D49@oracle.com>


On Aug 25, 2011, at 11:21 AM, John Rose wrote:

> That's nice and clean.
> 
> One question:  What happens when a CallSite optimizes down to a ConstantCallSite?  It looks like a useless dependency will get inserted.
> 
> Maybe the call to assert_call_site_target_value should be guarded by a check whether the field is marked final.

target isn't final.  The semantics of the field are captured in the subclass.  It's true you don't need the dependence for ConstantCallSite though.

tom

> 
> Also, get_field_by_offset should be called on the actual type of the ciCallSite, not env->CallSite_klass.  Otherwise you might get NULL for the ciField, if the target fields are split out across different call site subclasses.
> 
> -- John
> 
> On Aug 25, 2011, at 6:54 AM, Christian Thalinger wrote:
> 
>> http://cr.openjdk.java.net/~twisti/7071709/
>> 
>> 7071709: JSR 292: switchpoint invalidation should be pushed not pulled
>> Reviewed-by:
>> 
>> SwitchPoints use a MutableCallSite for their implementation.  The fix is
>> to treat the target field of constant CallSites as a compile time
>> constant and add a dependence for invalidation of the optimization.
>> 
>> src/share/vm/opto/memnode.cpp
>> src/share/vm/opto/parse3.cpp
>> 
> 


From tom.rodriguez at oracle.com  Thu Aug 25 12:58:54 2011
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Thu, 25 Aug 2011 12:58:54 -0700
Subject: Request for reviews (M): 7059037: Use BIS for zeroing on T4
In-Reply-To: <4E559CC0.6030701@oracle.com>
References: <4E559CC0.6030701@oracle.com>
Message-ID: <2526D33F-9391-4E3A-A702-20B4D438F0C8@oracle.com>

src/share/vm/gc_interface/collectedHeap.inline.hpp, src/share/vm/oops/cpCacheKlass.cpp:

Please use an ifdef block instead of the expression form.

You might consider using more sophisticated predicates to statically rule out ClearArrays with constant arguments.  Something like:

predicate(!n->in(1)->is_Con() || n->in(1)->find_intptr_t_con() > BlkZeroingLowLimit)

That would reduce any overhead for large instances that will never benefit from BIS.
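
The logic of such a predicate can be modelled outside the AD file. The sketch below is in Java purely for illustration; LOW_LIMIT stands in for BlkZeroingLowLimit, the value 2048 and its units are assumptions taken from the "2Kbyte" in the change description, and OptionalLong models "the size is a known constant or not".

```java
import java.util.OptionalLong;

// Hypothetical model of the suggested predicate; not the actual AD rule.
final class BlockZeroingPredicateSketch {
    // Assumed stand-in for BlkZeroingLowLimit.
    static final long LOW_LIMIT = 2048;

    // constantSize is empty when the size is not a compile-time constant.
    static boolean mayUseBlockZeroing(OptionalLong constantSize) {
        // Non-constant sizes must keep the runtime check; constant sizes
        // at or below the limit can be ruled out statically.
        return !constantSize.isPresent() || constantSize.getAsLong() > LOW_LIMIT;
    }

    public static void main(String[] args) {
        System.out.println(mayUseBlockZeroing(OptionalLong.empty()));
        System.out.println(mayUseBlockZeroing(OptionalLong.of(64)));
        System.out.println(mayUseBlockZeroing(OptionalLong.of(4096)));
    }
}
```

Only the middle case is rejected up front, so the BIS-capable instruction form is never selected for it.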

Could we use block instead of blk?  Otherwise this looks good.

tom

On Aug 24, 2011, at 5:52 PM, Vladimir Kozlov wrote:

> http://cr.openjdk.java.net/~kvn/7059037/webrev
> 
> 7059037: Use BIS for zeroing on T4
> 
> On T4 BIS to the beginning of cache line always zeros it. Use it for zeroing new
> allocated java objects. The main code is in MacroAssembler::bis_zeroing() and is
> used by C2 generated code (ClearArray), runtime (Copy::fill_to_aligned_words())
> and template interpreter (TemplateTable::_new()). New stub zero_aligned_words
> was added to use in runtime.
> 
> BIS is used only for objects bigger than BlkZeroingLowLimit (2Kbyte) since it
> requires membar. 2Kb was selected based on microbenchmark results.
> 
> I also added wrasi(Reg, immI) instruction which I used during development.
> VM_Version::has_mru_blk_init() is replaced with has_blk_zeroing() since original
> was not used.
> Zap new object in CollectedHeap::allocate_from_tlab_slow() instead of zeroing it
> since it will be cleaned later in init_obj().
> Fixed call sites of check_for_bad_heap_word_value() where klass is not
> initialized to avoid the verification failure.
> 


From tom.rodriguez at oracle.com  Thu Aug 25 13:01:58 2011
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Thu, 25 Aug 2011 13:01:58 -0700
Subject: Request for reviews (M): 7083184: JSR 292: don't store context
	class argument with call site dependencies
In-Reply-To: <0C8F4861-FA92-4CAC-861D-8B26CAE7D8D2@oracle.com>
References: <0C8F4861-FA92-4CAC-861D-8B26CAE7D8D2@oracle.com>
Message-ID: <F67831B1-38CF-48D0-97B0-D608246BDB81@oracle.com>

Looks good.

tom

On Aug 25, 2011, at 2:17 AM, Christian Thalinger wrote:

> http://cr.openjdk.java.net/~twisti/7083184/
> 
> 7083184: JSR 292: don't store context class argument with call site dependencies
> Reviewed-by:
> 
> The changes of 7071653 store a context class argument per call site
> dependency in the dependency stream.  This is actually not required
> since the context class is implicitly available with the first
> argument; the call site object.  Additionally call site dependencies
> should not depend on the very general super class CallSite but rather
> its actual class.
> 
> src/share/vm/ci/ciEnv.cpp
> src/share/vm/ci/ciEnv.hpp
> src/share/vm/code/dependencies.cpp
> src/share/vm/code/dependencies.hpp
> src/share/vm/memory/universe.cpp
> src/share/vm/opto/callGenerator.cpp
> 


From y.s.ramakrishna at oracle.com  Thu Aug 25 13:23:12 2011
From: y.s.ramakrishna at oracle.com (Ramki Ramakrishna)
Date: Thu, 25 Aug 2011 13:23:12 -0700
Subject: Request for reviews (M): 7059037: Use BIS for zeroing on T4
In-Reply-To: <4E559CC0.6030701@oracle.com>
References: <4E559CC0.6030701@oracle.com>
Message-ID: <4E56AF30.8050909@oracle.com>

Hi Vladimir --

On 8/24/2011 5:52 PM, Vladimir Kozlov wrote:
> http://cr.openjdk.java.net/~kvn/7059037/webrev
>
> 7059037: Use BIS for zeroing on T4
>
...
> Zap new object in CollectedHeap::allocate_from_tlab_slow() instead of 
> zeroing it
> since it will be cleaned later in init_obj().
> Fixed call sites of check_for_bad_heap_word_value() where klass is not
> initialized to avoid the verification failure.
>

Can you describe why these two changes were necessary? There was already
support for skipping headers for concurrent GC's when zapping and verifying.
Did something change that caused this to be changed?

I haven't looked at the rest of the files, but a high level description of
the need to make this change would allow me to review the changes that
necessitated this, and whether it could not be done more easily otherwise
(using the existing framework of skipping a preamble of words in the object).

-- ramki

From john.r.rose at oracle.com  Thu Aug 25 16:47:13 2011
From: john.r.rose at oracle.com (John Rose)
Date: Thu, 25 Aug 2011 16:47:13 -0700
Subject: Request for reviews (S): 7071709: JSR 292: switchpoint
	invalidation should be pushed not pulled
In-Reply-To: <05ACD704-7163-497D-8C2C-9E7AD760E080@oracle.com>
References: <5F4038AD-6959-480E-9DB8-1DEF17D6C4A6@oracle.com>
	<108F560B-C3D0-40AC-9C10-3EC21A4FB8C5@oracle.com>
	<4E568B9F.8040606@univ-mlv.fr>
	<05ACD704-7163-497D-8C2C-9E7AD760E080@oracle.com>
Message-ID: <35544B63-E3B9-4F37-9015-4E081392A9D1@oracle.com>

On Aug 25, 2011, at 11:32 AM, Tom Rodriguez wrote:

> The docs for VolatileCallSite suggest setTarget can be called whenever you feel like it, so it seems like it can be set many times.  MutableCallSite can be as well but the implication is that it's not updated very often.  I'm actually unclear what distinction is trying to be made with VolatileCallSite.

The MCS and VCS have the same semantics, except for the extra memory barriers on VCS.  These barriers do not affect the validity or applicability of push notification.  Also, either an MCS or VCS might end up being "megamutable", so there has to be some sort of cutoff that will prevent target-prediction for a CS which has been mispredicted too many times.  We need to squeeze a bit of state into the CS, somewhere, which displays how many times the thing has been mispredicted.
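
One possible shape for such a cutoff, as a minimal sketch only: the class below keeps a per-call-site misprediction count and stops allowing target prediction once a limit is reached. The names and the limit of 3 are hypothetical, and the sketch says nothing about where the state would actually live in the CS or the VM.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical policy object; not an existing HotSpot or JDK class.
final class MispredictionCutoffSketch {
    // Assumed limit, chosen arbitrarily for the example.
    static final int MAX_MISPREDICTIONS = 3;

    private final AtomicInteger mispredictions = new AtomicInteger();

    // Called whenever a predicted target turned out to be stale.
    void recordMisprediction() {
        mispredictions.incrementAndGet();
    }

    // A "megamutable" site stops being a candidate for target prediction.
    boolean mayPredictTarget() {
        return mispredictions.get() < MAX_MISPREDICTIONS;
    }

    public static void main(String[] args) {
        MispredictionCutoffSketch site = new MispredictionCutoffSketch();
        for (int i = 0; i < MAX_MISPREDICTIONS; i++) {
            site.recordMisprediction();
        }
        System.out.println(site.mayPredictTarget());
    }
}
```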

-- John

From vladimir.kozlov at oracle.com  Thu Aug 25 16:51:56 2011
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 25 Aug 2011 16:51:56 -0700
Subject: Request for reviews (M): 7059037: Use BIS for zeroing on T4
In-Reply-To: <4E56AF30.8050909@oracle.com>
References: <4E559CC0.6030701@oracle.com> <4E56AF30.8050909@oracle.com>
Message-ID: <4E56E01C.6000604@oracle.com>

Ramki,

Ramki Ramakrishna wrote:
> Hi Vladimir --
> 
> On 8/24/2011 5:52 PM, Vladimir Kozlov wrote:
>> http://cr.openjdk.java.net/~kvn/7059037/webrev
>>
>> 7059037: Use BIS for zeroing on T4
>>
> ...
>> Zap new object in CollectedHeap::allocate_from_tlab_slow() instead of 
>> zeroing it
>> since it will be cleaned later in init_obj().

TLAB::allocate() zaps new objects, so I think allocate_from_tlab_slow() should
also zap the new object (I copied the code from
ThreadLocalAllocBuffer::allocate()) instead of clearing it, since it will be
cleared later in init_obj().

>> Fixed call sites of check_for_bad_heap_word_value() where klass is not
>> initialized to avoid the verification failure.
>>

% /java/re/jdk/7/latest/binaries/solaris-i586/fastdebug/bin/java 
-XX:+CheckMemoryInitialization -Xcomp t
VM option '+CheckMemoryInitialization'
# To suppress the following error report, specify this argument
# after -XX: or in .hotspotrc:  SuppressErrorAt=/collectedHeap.cpp:98
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error 
(/tmp/workspace/jdk7-2-build-solaris-i586-product/jdk7/hotspot/src/share/vm/gc_interface/collectedHeap.cpp:98), 
pid=27663, tid=2
#  assert((*(intptr_t*) (addr + slot)) != ((intptr_t) badHeapWordVal)) failed: 
Found badHeapWordValue in post-allocation check
#
# JRE version: 7.0-b147
# Java VM: Java HotSpot(TM) Server VM (21.0-b17-fastdebug compiled mode 
solaris-x86 )

Vladimir

> 
> Can you describe why these two changes were necessary? There was already 
> support
> for skipping headers for concurrent GC's when zapping and verifying. Did 
> something
> change that caused this to be changed?
> 
> I haven't looked at the rest of the files, but a high level description 
> of the need to
> make this change would allow me to review the changes that necessitated 
> this,
> and whether it could not be done more easily otherwise (using the existing
> framework of skipping a preamble of words in the object).
> 
> -- ramki

From igor.veresov at oracle.com  Thu Aug 25 17:10:28 2011
From: igor.veresov at oracle.com (Igor Veresov)
Date: Thu, 25 Aug 2011 17:10:28 -0700
Subject: review(XS): 6591247: C2 cleans up the merge point too early
	during SplitIf.
Message-ID: <F4AD466A5CE349A08332EAE8577FC431@oracle.com>

The problem here is that during split-if we remove the region's self reference too early while processing its users, which can make get_ctrl_no_update() return the wrong answer.
I wasn't able to reproduce the problem, but it seems possible for it to occur if the region points to something other than a phi and the self reference is deleted too early.
The solution is to remove the self reference last.

Webrev: http://cr.openjdk.java.net/~iveresov/6591247/webrev.00/

Testing: specjvm98, CTW

Thanks,
igor


From vladimir.kozlov at oracle.com  Thu Aug 25 18:47:37 2011
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 25 Aug 2011 18:47:37 -0700
Subject: Request for reviews (M): 7059037: Use BIS for zeroing on T4
In-Reply-To: <2526D33F-9391-4E3A-A702-20B4D438F0C8@oracle.com>
References: <4E559CC0.6030701@oracle.com>
	<2526D33F-9391-4E3A-A702-20B4D438F0C8@oracle.com>
Message-ID: <4E56FB39.2050207@oracle.com>

Thank you, Tom

I updated webrev with your and Christian's suggestions:

http://cr.openjdk.java.net/~kvn/7059037/webrev

Tom Rodriguez wrote:
> src/share/vm/gc_interface/collectedHeap.inline.hpp, src/share/vm/oops/cpCacheKlass.cpp:
> 
> Please use an ifdef block instead of the expression form.

Done.

> 
> You might consider using more sophisticated predicates to statically rule out ClearArrays with constant arguments.  Something like:
> 
> predicate(!n->in(1)->is_Con() || n->in(1)->find_intptr_t_con() > BlkZeroingLowLimit)
> 
> That would reduce any overhead for large instances that will never benefit from BIS.

Done. I thought about that but found that such cases are rare: either the
expression which calculates the count is complex (because we mostly do partial
zeroing), or the object is small with a constant count and ClearArray is
replaced with stores in the ideal transformation. But I agree it still may help.

> 
> Could we use block instead of blk?  Otherwise this looks good.

Done.

Thanks,
Vladimir

> 
> tom
> 
> On Aug 24, 2011, at 5:52 PM, Vladimir Kozlov wrote:
> 
>> http://cr.openjdk.java.net/~kvn/7059037/webrev
>>
>> 7059037: Use BIS for zeroing on T4
>>
>> On T4 BIS to the beginning of cache line always zeros it. Use it for zeroing new
>> allocated java objects. The main code is in MacroAssembler::bis_zeroing() and is
>> used by C2 generated code (ClearArray), runtime (Copy::fill_to_aligned_words())
>> and template interpreter (TemplateTable::_new()). New stub zero_aligned_words
>> was added to use in runtime.
>>
>> BIS is used only for objects bigger than BlkZeroingLowLimit (2Kbyte) since it
>> requires membar. 2Kb was selected based on microbenchmark results.
>>
>> I also added wrasi(Reg, immI) instruction which I used during development.
>> VM_Version::has_mru_blk_init() is replaced with has_blk_zeroing() since original
>> was not used.
>> Zap new object in CollectedHeap::allocate_from_tlab_slow() instead of zeroing it
>> since it will be cleaned later in init_obj().
>> Fixed call sites of check_for_bad_heap_word_value() where klass is not
>> initialized to avoid the verification failure.
>>
> 

From igor.veresov at oracle.com  Thu Aug 25 19:48:49 2011
From: igor.veresov at oracle.com (Igor Veresov)
Date: Thu, 25 Aug 2011 19:48:49 -0700
Subject: review(XS): 6591247: C2 cleans up the merge point too early
	during SplitIf.
In-Reply-To: <4E56ED79.3080902@oracle.com>
References: <F4AD466A5CE349A08332EAE8577FC431@oracle.com>
	<4E56ED79.3080902@oracle.com>
Message-ID: <4FB4620C833040C090409B0FBB3DD4EF@oracle.com>

 Thanks, Vladimir! 

igor

On Thursday, August 25, 2011 at 5:48 PM, Vladimir Kozlov wrote:

> It is good.
> 
> Vladimir
> 
> Igor Veresov wrote:
> > The problem here is that during split-if we remove the region's self reference too early while processing its users, which can make get_ctrl_no_update() return the wrong answer. 
> >  I wasn't able to reproduce the problem, but it seems to be possible for it to occur if the region points to something else but phi and the self reference is deleted too early. 
> > The solution is to remove the self reference last.
> > 
> > Webrev: http://cr.openjdk.java.net/~iveresov/6591247/webrev.00/
> > 
> > Testing: specjvm98, CTW
> > 
> > Thanks,
> > igor


From tom.rodriguez at oracle.com  Thu Aug 25 19:57:11 2011
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Thu, 25 Aug 2011 19:57:11 -0700
Subject: review(XS): 6591247: C2 cleans up the merge point too early
	during SplitIf.
In-Reply-To: <F4AD466A5CE349A08332EAE8577FC431@oracle.com>
References: <F4AD466A5CE349A08332EAE8577FC431@oracle.com>
Message-ID: <7E0E16EB-ACA8-482F-92FE-E76CD2B79CC2@oracle.com>

Looks good.

tom

On Aug 25, 2011, at 5:10 PM, Igor Veresov wrote:

> The problem here is that during split-if we remove the region's self reference too early while processing its users, which can make get_ctrl_no_update() return the wrong answer. 
> I wasn't able to reproduce the problem, but it seems to be possible for it to occur if the region points to something else but phi and the self reference is deleted too early. 
> The solution is to remove the self reference last.
> 
> Webrev: http://cr.openjdk.java.net/~iveresov/6591247/webrev.00/
> 
> Testing: specjvm98, CTW
> 
> Thanks,
> igor
> 


From tom.rodriguez at oracle.com  Thu Aug 25 19:58:47 2011
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Thu, 25 Aug 2011 19:58:47 -0700
Subject: Request for reviews (M): 7059037: Use BIS for zeroing on T4
In-Reply-To: <4E56FB39.2050207@oracle.com>
References: <4E559CC0.6030701@oracle.com>
	<2526D33F-9391-4E3A-A702-20B4D438F0C8@oracle.com>
	<4E56FB39.2050207@oracle.com>
Message-ID: <33F76179-F235-46E6-8B7F-A2A92DE9FE20@oracle.com>

Looks good.

tom

On Aug 25, 2011, at 6:47 PM, Vladimir Kozlov wrote:

> Thank you, Tom
> 
> I updated webrev with your and Christian's suggestions:
> 
> http://cr.openjdk.java.net/~kvn/7059037/webrev
> 
> Tom Rodriguez wrote:
>> src/share/vm/gc_interface/collectedHeap.inline.hpp, src/share/vm/oops/cpCacheKlass.cpp:
>> Please use an ifdef block instead of the expression form.
> 
> Done.
> 
>> You might consider using more sophisticated predicates to statically rule out ClearArrays with constant arguments.  Something like:
>> predicate(!n->in(1)->is_Con() || n->in(1)->find_intptr_t_con() > BlkZeroingLowLimit)
>> That would reduce any overhead for large instances that will never benefit from BIS.
> 
> Done. I thought about that but found that such cases are rare since the expression which calculates count could be complex (because we mostly do partial zeroing) or when object is small with constant count ClearArray is replaced with stores in ideal transformation. But I agree it still may help.
> 
>> Could we use block instead of blk?  Otherwise this looks good.
> 
> Done.
> 
> Thanks,
> Vladimir
> 
>> tom
>> On Aug 24, 2011, at 5:52 PM, Vladimir Kozlov wrote:
>>> http://cr.openjdk.java.net/~kvn/7059037/webrev
>>> 
>>> 7059037: Use BIS for zeroing on T4
>>> 
>>> On T4 BIS to the beginning of cache line always zeros it. Use it for zeroing new
>>> allocated java objects. The main code is in MacroAssembler::bis_zeroing() and is
>>> used by C2 generated code (ClearArray), runtime (Copy::fill_to_aligned_words())
>>> and template interpreter (TemplateTable::_new()). New stub zero_aligned_words
>>> was added to use in runtime.
>>> 
>>> BIS is used only for objects bigger than BlkZeroingLowLimit (2Kbyte) since it
>>> requires membar. 2Kb was selected based on microbenchmark results.
>>> 
>>> I also added wrasi(Reg, immI) instruction which I used during development.
>>> VM_Version::has_mru_blk_init() is replaced with has_blk_zeroing() since original
>>> was not used.
>>> Zap new object in CollectedHeap::allocate_from_tlab_slow() instead of zeroing it
>>> since it will be cleaned later in init_obj().
>>> Fixed call sites of check_for_bad_heap_word_value() where klass is not
>>> initialized to avoid the verification failure.
>>> 


From igor.veresov at oracle.com  Thu Aug 25 20:51:43 2011
From: igor.veresov at oracle.com (Igor Veresov)
Date: Thu, 25 Aug 2011 20:51:43 -0700
Subject: review(XS): 6591247: C2 cleans up the merge point too early
	during SplitIf.
In-Reply-To: <7E0E16EB-ACA8-482F-92FE-E76CD2B79CC2@oracle.com>
References: <F4AD466A5CE349A08332EAE8577FC431@oracle.com>
	<7E0E16EB-ACA8-482F-92FE-E76CD2B79CC2@oracle.com>
Message-ID: <5D1DCB1FC80E4C499A9FF261E399F976@oracle.com>

 Thanks, Tom! 

igor

On Thursday, August 25, 2011 at 7:57 PM, Tom Rodriguez wrote:

> Looks good.
> 
> tom
> 
> On Aug 25, 2011, at 5:10 PM, Igor Veresov wrote:
> 
> > The problem here is that during split-if we remove the region's self reference too early while processing its users, which can make get_ctrl_no_update() return the wrong answer. 
> > I wasn't able to reproduce the problem, but it seems to be possible for it to occur if the region points to something else but phi and the self reference is deleted too early. 
> > The solution is to remove the self reference last.
> > 
> > Webrev: http://cr.openjdk.java.net/~iveresov/6591247/webrev.00/
> > 
> > Testing: specjvm98, CTW
> > 
> > Thanks,
> > igor


From christian.thalinger at oracle.com  Fri Aug 26 00:02:05 2011
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Fri, 26 Aug 2011 09:02:05 +0200
Subject: Request for reviews (M): 7059037: Use BIS for zeroing on T4
In-Reply-To: <4E56FB39.2050207@oracle.com>
References: <4E559CC0.6030701@oracle.com>
	<2526D33F-9391-4E3A-A702-20B4D438F0C8@oracle.com>
	<4E56FB39.2050207@oracle.com>
Message-ID: <3FF5DFAA-BF98-42C3-9744-758441EB2BB2@oracle.com>

Looks good.  -- Christian

On Aug 26, 2011, at 3:47 AM, Vladimir Kozlov wrote:

> Thank you, Tom
> 
> I updated webrev with your and Christian's suggestions:
> 
> http://cr.openjdk.java.net/~kvn/7059037/webrev
> 
> Tom Rodriguez wrote:
>> src/share/vm/gc_interface/collectedHeap.inline.hpp, src/share/vm/oops/cpCacheKlass.cpp:
>> Please use an ifdef block instead of the expression form.
> 
> Done.
> 
>> You might consider using more sophisticated predicates to statically rule out ClearArrays with constant arguments.  Something like:
>> predicate(!n->in(1)->is_Con() || n->in(1)->find_intptr_t_con() > BlkZeroingLowLimit)
>> That would reduce any overhead for large instances that will never benefit from BIS.
> 
> Done. I thought about that but found that such cases are rare since the expression which calculates count could be complex (because we mostly do partial zeroing) or when object is small with constant count ClearArray is replaced with stores in ideal transformation. But I agree it still may help.
> 
>> Could we use block instead of blk?  Otherwise this looks good.
> 
> Done.
> 
> Thanks,
> Vladimir
> 
>> tom
>> On Aug 24, 2011, at 5:52 PM, Vladimir Kozlov wrote:
>>> http://cr.openjdk.java.net/~kvn/7059037/webrev
>>> 
>>> 7059037: Use BIS for zeroing on T4
>>> 
>>> On T4 BIS to the beginning of cache line always zeros it. Use it for zeroing new
>>> allocated java objects. The main code is in MacroAssembler::bis_zeroing() and is
>>> used by C2 generated code (ClearArray), runtime (Copy::fill_to_aligned_words())
>>> and template interpreter (TemplateTable::_new()). New stub zero_aligned_words
>>> was added to use in runtime.
>>> 
>>> BIS is used only for objects bigger than BlkZeroingLowLimit (2Kbyte) since it
>>> requires membar. 2Kb was selected based on microbenchmark results.
>>> 
>>> I also added wrasi(Reg, immI) instruction which I used during development.
>>> VM_Version::has_mru_blk_init() is replaced with has_blk_zeroing() since original
>>> was not used.
>>> Zap new object in CollectedHeap::allocate_from_tlab_slow() instead of zeroing it
>>> since it will be cleaned later in init_obj().
>>> Fixed call sites of check_for_bad_heap_word_value() where klass is not
>>> initialized to avoid the verification failure.
>>> 


From y.s.ramakrishna at oracle.com  Fri Aug 26 00:51:19 2011
From: y.s.ramakrishna at oracle.com (Ramki Ramakrishna)
Date: Fri, 26 Aug 2011 00:51:19 -0700
Subject: Request for reviews (M): 7059037: Use BIS for zeroing on T4
In-Reply-To: <4E56E01C.6000604@oracle.com>
References: <4E559CC0.6030701@oracle.com> <4E56AF30.8050909@oracle.com>
	<4E56E01C.6000604@oracle.com>
Message-ID: <4E575077.6080502@oracle.com>


On 8/25/2011 4:51 PM, Vladimir Kozlov wrote:
> Ramki,
>
> Ramki Ramakrishna wrote:
>> Hi Vladimir --
>>
>> On 8/24/2011 5:52 PM, Vladimir Kozlov wrote:
>>> http://cr.openjdk.java.net/~kvn/7059037/webrev
>>>
>>> 7059037: Use BIS for zeroing on T4
>>>
>> ...
>>> Zap new object in CollectedHeap::allocate_from_tlab_slow() instead 
>>> of zeroing it
>>> since it will be cleaned later in init_obj().
>
> TLAB::allocate() zaps new objects so I think allocate_from_tlab_slow() 
> should also zap new object (and I copied code from 
> ThreadLocalAllocBuffer::allocate()) instead of cleaning it since it 
> will be cleaned later in init_obj().
>

I see. OK, I agree that this is the right thing to do for concurrent GC's,
although I wish this could be cleanly abstracted based on the collector,
rather than a blanket imposition from concurrent GC's.

>>> Fixed call sites of check_for_bad_heap_word_value() where klass is not
>>> initialized to avoid the verification failure.
>>>

I see that the skip_header_HeapWords() used in
GCH::check_for_non_bad_heap_word_value() was not extended to
CH::check_for_bad_heap_word_value().
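
The idea of skipping a preamble of header words during the post-allocation check can be sketched as below. This is a toy model in Java, not the HotSpot code: the object is a long[] of "heap words", all names are hypothetical, and the zap pattern 0xBAADBABE is used as an assumed illustrative value.

```java
import java.util.Arrays;

// Hypothetical model of zap-then-verify with a skipped header.
final class ZapCheckSketch {
    // Assumed zap pattern; treat the exact value as illustrative.
    static final long BAD_HEAP_WORD = 0xBAADBABEL;

    // Fill the whole object with the zap pattern.
    static void zap(long[] words) {
        Arrays.fill(words, BAD_HEAP_WORD);
    }

    // Clear the body only; the header words are left for later initialization.
    static void clearBody(long[] words, int headerWords) {
        Arrays.fill(words, headerWords, words.length, 0L);
    }

    // Index of the first zapped word at or after skipWords, or -1 if none.
    static int firstBadWord(long[] words, int skipWords) {
        for (int i = skipWords; i < words.length; i++) {
            if (words[i] == BAD_HEAP_WORD) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long[] obj = new long[8];
        zap(obj);
        clearBody(obj, 2);
        System.out.println(firstBadWord(obj, 0)); // header still zapped
        System.out.println(firstBadWord(obj, 2)); // header skipped
    }
}
```

A check that starts at word 0 trips over the not-yet-initialized header, while one that skips the header words passes.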

Your changes look good; thanks for fixing up the shortcomings. I'll check
with my colleagues on the need to clean this up (in a separate CR of course)
so that the concurrent GC'isms do not leak out in this manner into the
general code, or at least are left sufficiently abstract when they can be.

-- ramki
>
> % /java/re/jdk/7/latest/binaries/solaris-i586/fastdebug/bin/java 
> -XX:+CheckMemoryInitialization -Xcomp t
> VM option '+CheckMemoryInitialization'
> # To suppress the following error report, specify this argument
> # after -XX: or in .hotspotrc:  SuppressErrorAt=/collectedHeap.cpp:98
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  Internal Error 
> (/tmp/workspace/jdk7-2-build-solaris-i586-product/jdk7/hotspot/src/share/vm/gc_interface/collectedHeap.cpp:98), 
> pid=27663, tid=2
> #  assert((*(intptr_t*) (addr + slot)) != ((intptr_t) badHeapWordVal)) 
> failed: Found badHeapWordValue in post-allocation check
> #
> # JRE version: 7.0-b147
> # Java VM: Java HotSpot(TM) Server VM (21.0-b17-fastdebug compiled 
> mode solaris-x86 )
>
> Vladimir
>
>>
>> Can you describe why these two changes were necessary? There was 
>> already support
>> for skipping headers for concurrent GC's when zapping and verifying. 
>> Did something
>> change that caused this to be changed?
>>
>> I haven't looked at the rest of the files, but a high level 
>> description of the need to
>> make this change would allow me to review the changes that 
>> necessitated this,
>> and whether it could not be done more easily otherwise (using the 
>> existing
>> framework of skipping a preamble of words in the object).
>>
>> -- ramki

From christian.thalinger at oracle.com  Fri Aug 26 02:16:26 2011
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Fri, 26 Aug 2011 11:16:26 +0200
Subject: Request for reviews (S): 7071709: JSR 292: switchpoint
	invalidation should be pushed not pulled
In-Reply-To: <52852391-3B23-4326-B75C-D2CB502C52AF@oracle.com>
References: <5F4038AD-6959-480E-9DB8-1DEF17D6C4A6@oracle.com>
	<52852391-3B23-4326-B75C-D2CB502C52AF@oracle.com>
Message-ID: <E90E01D9-DAA5-465A-B553-11842E5ED514@oracle.com>


On Aug 25, 2011, at 8:21 PM, John Rose wrote:

> That's nice and clean.
> 
> One question:  What happens when a CallSite optimizes down to a ConstantCallSite?  It looks like a useless dependency will get inserted.

Right.  That slipped through the cracks.  I also changed the check in callGenerator.

> 
> Maybe the call to assert_call_site_target_value should be guarded by a check whether the field is marked final.

I'm not sure I understand.  The target field isn't final.

> 
> Also, get_field_by_offset should be called on the actual type of the ciCallSite, not env->CallSite_klass.  Otherwise you might get NULL for the ciField, if the target fields are split out across different call site subclasses.

Are there plans to do this?  I changed ciField::is_call_site_target to check for subclasses of CallSite.

-- Christian

> 
> -- John
> 
> On Aug 25, 2011, at 6:54 AM, Christian Thalinger wrote:
> 
>> http://cr.openjdk.java.net/~twisti/7071709/
>> 
>> 7071709: JSR 292: switchpoint invalidation should be pushed not pulled
>> Reviewed-by:
>> 
>> SwitchPoints use a MutableCallSite for their implementation.  The fix is
>> to treat the target field of constant CallSites as a compile time
>> constant and add a dependence for invalidation of the optimization.
>> 
>> src/share/vm/opto/memnode.cpp
>> src/share/vm/opto/parse3.cpp
>> 
> 


From christian.thalinger at oracle.com  Fri Aug 26 02:23:08 2011
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Fri, 26 Aug 2011 11:23:08 +0200
Subject: Request for reviews (S): 7071709: JSR 292: switchpoint
	invalidation should be pushed not pulled
In-Reply-To: <35544B63-E3B9-4F37-9015-4E081392A9D1@oracle.com>
References: <5F4038AD-6959-480E-9DB8-1DEF17D6C4A6@oracle.com>
	<108F560B-C3D0-40AC-9C10-3EC21A4FB8C5@oracle.com>
	<4E568B9F.8040606@univ-mlv.fr>
	<05ACD704-7163-497D-8C2C-9E7AD760E080@oracle.com>
	<35544B63-E3B9-4F37-9015-4E081392A9D1@oracle.com>
Message-ID: <775B68CD-023B-4821-A6DD-383F57FDFE73@oracle.com>


On Aug 26, 2011, at 1:47 AM, John Rose wrote:

> On Aug 25, 2011, at 11:32 AM, Tom Rodriguez wrote:
> 
>> The docs for VolatileCallSite suggest setTarget can be called whenever you feel like it, so it seems like it can be set many times.  MutableCallSite can be as well but the implication is that it's not updated very often.  I'm actually unclear what distinction is trying to be made with VolatileCallSite.
> 
> The MCS and VCS have the same semantics, except for the extra memory barriers on VCS.  These barriers do not affect the validity or applicability of push notification.  Also, either an MCS or VCS might end up being "megamutable", so there has to be some sort of cutoff that will prevent target-prediction for a CS which has been mispredicted too many times.  We need to squeeze a bit of state into the CS, somewhere, which displays how many times the thing has been mispredicted.

So the question is: should we go with what we currently have for invokedynamic (only optimize CCS and MCS), or should we allow all CSs to be optimized and start working on the logic John describes?  I'm fine with both.

-- Christian

> 
> -- John


From christian.thalinger at oracle.com  Fri Aug 26 04:16:43 2011
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Fri, 26 Aug 2011 13:16:43 +0200
Subject: review for 7071307: MethodHandle bimorphic inlining should
	consider the frequency
In-Reply-To: <CA629B01-E95B-45C0-A8F5-FE8EF01E0420@oracle.com>
References: <DF1ECF39-6119-466F-97E5-3B6DB9C11F70@oracle.com>
	<4B352C37-8DBD-432D-8B12-BE01D04155C4@oracle.com>
	<CA629B01-E95B-45C0-A8F5-FE8EF01E0420@oracle.com>
Message-ID: <6E9A5D90-666D-436B-BCA9-A781510B70DD@oracle.com>

I just applied this patch to test the rtalk implementation and I hit an assert:

Internal Error at bytecodeInfo.cpp:152, pid=10351, tid=11
assert(mha_profile) failed: must exist

Some context:

(dbx) p _caller_jvms->method()->print()
<ciMethod name=invoke holder=rtPbc/r68 signature=(Lri/core/rtalk/RtObject;Lri/core/rtalk/RtObject;Lri/core/rtalk/RtObject;)Lri/core/rtalk/RtObject; loaded=true flags=public,static ident=904 PERM address=0x8d6ab58>_caller_jvms->method()->print() = (void)
(dbx) p _caller_jvms->bci()             
_caller_jvms->bci() = 7
(dbx) p _caller_jvms->method()->print_codes()
0 aload_2
1 astore_3
2 aload_1
3 fast_aload_0
4 aload_2
5 astore_3
6 aload_3
7 invokedynamic secondary cache[4] of CP[2] missing bias?
  0   bci: 7    CounterData         count(16900)
12 astore_3
13 aload_3
14 invokedynamic secondary cache[5] of CP[3] missing bias?
  8   bci: 14   CounterData         count(16900)
19 astore_3
20 aload_3
21 areturn
_caller_jvms->method()->print_codes() = (void)
(dbx) p mdo->print()
--- Extra data:
mdo->print() = (void)
(dbx) 

-- Christian

On Aug 24, 2011, at 11:12 PM, Tom Rodriguez wrote:

> 
> On Aug 24, 2011, at 6:12 AM, Christian Thalinger wrote:
> 
>> 
>> On Aug 24, 2011, at 1:44 AM, Tom Rodriguez wrote:
>> 
>>> This is a re-review since I added per method handle GWT profiling.
>>> 
>>> http://cr.openjdk.java.net/~never/7071307
>>> 312 lines changed: 270 ins; 15 del; 27 mod; 22101 unchg
>> 
>> src/share/vm/prims/methodHandleWalk.cpp:
>> 
>> MethodHandleCompiler::fetch_counts:
>> 
>> +   int count1 = -1, count2 = -1;
>> ...
>> +   int total = count1 + count2;
>> +   if (count1 != -1 && count2 != -2 && total != 0) {
>> 
>> Why -2?
> 
> Just a typo.  It's fixed.
> 
>> 
>> +   int          _taken_count;
>> +   int          _not_taken_count;
>> 
>> Does taken refer to target and not_taken to fallback in the GWT?
> 
> They refer to the bytecode and the vmcounts collected.  I think they are actually reversed from what selectAlternative generates but as long as they agree with the bytecodes generated I don't think it matters.  I verified empirically that the counts match the execution and feed into the frequency in the proper fashion.
> 
>> 
>> MethodHandleCompiler::make_invoke:
>> 
>> Can you use emit_bc instead of _bytecode.push where possible so we have at least a little sanity checking?
> 
> I added support for ifeq and added update_branch_dest to correct the offsets.  I only added support for ifeq for now.
> 
>> 
>> +     bool found_sel = false;
>> 
>> Can you rename that to maybe found_selectAlternative?
> 
> Yup.
> 
>> 
>> 
>> src/share/vm/ci/ciMethodHandle.cpp:
>> 
>> That print_chain is very helpful.  Thanks for that.
>> 
>> 
>> src/share/vm/classfile/javaClasses.cpp:
>> 
>> + int java_lang_invoke_CountingMethodHandle::vmcount(oop mh) {
>> +   assert(is_instance(mh), "DMH only");
>> +   return mh->int_field(_vmcount_offset);
>> + }
>> + 
>> + void java_lang_invoke_CountingMethodHandle::set_vmcount(oop mh, int count) {
>> +   assert(is_instance(mh), "DMH only");
>> +   mh->int_field_put(_vmcount_offset, count);
>> + }
>> 
>> I think the assert message is a copy-paste bug.
> 
> Fixed.
> 
>> 
>> Otherwise looks good.
> 
> Thanks!
> 
> tom
> 
>> 
>>> 
>>> 7071307: MethodHandle bimorphic inlining should consider the frequency
>>> Reviewed-by:
>>> 
>>> The fix for 7050554 added a bimorphic inline path but didn't take into
>>> account the frequency of the guarding test.  This ends up treating
>>> both sides of the if as equally frequent which can lead to over
>>> inlining and overflowing the method inlining limits.  The fix is to
>>> grab the frequency from the If and apply that to the branches.
>>> 
>>> Additionally I added support for per method handle profile collection
>>> since this was required to get good results for more complex programs.
>>> This requires the fix for 7082631 on the JDK side.
>>> http://cr.openjdk.java.net/~never/7082631
>> 
>> The JDK changes look good.
>> 
>> -- Christian
>> 
>>> 
>>> I also fixed a problem with the ideal graph printer where debug_orig
>>> printing would go into an infinite loop.
>>> 
>>> Tested with jruby and vm.mlvm tests.
>>> 
>> 
> 


From tom.rodriguez at oracle.com  Fri Aug 26 04:53:32 2011
From: tom.rodriguez at oracle.com (tom.rodriguez at oracle.com)
Date: Fri, 26 Aug 2011 11:53:32 +0000
Subject: hg: hsx/hotspot-comp/hotspot: 7082949: JSR 292: missing ResourceMark
	in methodOopDesc::make_invoke_method
Message-ID: <20110826115338.EE4DF47131@hg.openjdk.java.net>

Changeset: ac8738449b6f
Author:    never
Date:      2011-08-25 20:29 -0700
URL:       http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/ac8738449b6f

7082949: JSR 292: missing ResourceMark in methodOopDesc::make_invoke_method
Reviewed-by: kvn, twisti

! src/share/vm/oops/methodOop.cpp
+ test/compiler/7082949/Test7082949.java


From vladimir.kozlov at oracle.com  Fri Aug 26 07:51:28 2011
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 26 Aug 2011 07:51:28 -0700
Subject: Request for reviews (M): 7059037: Use BIS for zeroing on T4
In-Reply-To: <3FF5DFAA-BF98-42C3-9744-758441EB2BB2@oracle.com>
References: <4E559CC0.6030701@oracle.com>
	<2526D33F-9391-4E3A-A702-20B4D438F0C8@oracle.com>
	<4E56FB39.2050207@oracle.com>
	<3FF5DFAA-BF98-42C3-9744-758441EB2BB2@oracle.com>
Message-ID: <4E57B2F0.90805@oracle.com>

Thank you, Tom and Christian for reviews.

Vladimir

On 8/26/11 12:02 AM, Christian Thalinger wrote:
> Looks good.  -- Christian
>
> On Aug 26, 2011, at 3:47 AM, Vladimir Kozlov wrote:
>
>> Thank you, Tom
>>
>> I updated the webrev with your and Christian's suggestions:
>>
>> http://cr.openjdk.java.net/~kvn/7059037/webrev
>>
>> Tom Rodriguez wrote:
>>> src/share/vm/gc_interface/collectedHeap.inline.hpp, src/share/vm/oops/cpCacheKlass.cpp:
>>> Please use an ifdef block instead of the expression form.
>>
>> Done.
>>
>>> You might consider using more sophisticated predicates to statically rule out ClearArrays with constant arguments.  Something like:
>>> predicate(!n->in(1)->is_Con() || n->in(1)->find_intptr_t_con() > BlkZeroingLowLimit)
>>> That would reduce any overhead for large instances that will never benefit from BIS.
>>
>> Done. I thought about that but found that such cases are rare since the expression which calculates count could be complex (because we mostly do partial zeroing) or when object is small with constant count ClearArray is replaced with stores in ideal transformation. But I agree it still may help.
>>
>>> Could we use block instead of blk?  Otherwise this looks good.
>>
>> Done.
>>
>> Thanks,
>> Vladimir
>>
>>> tom
>>> On Aug 24, 2011, at 5:52 PM, Vladimir Kozlov wrote:
>>>> http://cr.openjdk.java.net/~kvn/7059037/webrev
>>>>
>>>> 7059037: Use BIS for zeroing on T4
>>>>
>>>> On T4, a BIS to the beginning of a cache line always zeros it. Use it for zeroing newly
>>>> allocated Java objects. The main code is in MacroAssembler::bis_zeroing() and is
>>>> used by C2 generated code (ClearArray), runtime (Copy::fill_to_aligned_words())
>>>> and template interpreter (TemplateTable::_new()). New stub zero_aligned_words
>>>> was added to use in runtime.
>>>>
>>>> BIS is used only for objects bigger than BlkZeroingLowLimit (2Kbyte) since it
>>>> requires membar. 2Kb was selected based on microbenchmark results.
>>>>
>>>> I also added wrasi(Reg, immI) instruction which I used during development.
>>>> VM_Version::has_mru_blk_init() is replaced with has_blk_zeroing() since original
>>>> was not used.
>>>> Zap new object in CollectedHeap::allocate_from_tlab_slow() instead of zeroing it
>>>> since it will be cleaned later in init_obj().
>>>> Fixed call sites of check_for_bad_heap_word_value() where klass is not
>>>> initialized to avoid the verification failure.
>>>>
>

From vladimir.kozlov at oracle.com  Fri Aug 26 13:33:55 2011
From: vladimir.kozlov at oracle.com (vladimir.kozlov at oracle.com)
Date: Fri, 26 Aug 2011 20:33:55 +0000
Subject: hg: hsx/hotspot-comp/hotspot: 7059037: Use BIS for zeroing on T4
Message-ID: <20110826203400.01D1247146@hg.openjdk.java.net>

Changeset: baf763f388e6
Author:    kvn
Date:      2011-08-26 08:52 -0700
URL:       http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/baf763f388e6

7059037: Use BIS for zeroing on T4
Summary: Use BIS for zeroing new allocated big (2Kb and more) objects and arrays.
Reviewed-by: never, twisti, ysr

! src/cpu/sparc/vm/assembler_sparc.cpp
! src/cpu/sparc/vm/assembler_sparc.hpp
! src/cpu/sparc/vm/copy_sparc.hpp
! src/cpu/sparc/vm/sparc.ad
! src/cpu/sparc/vm/stubGenerator_sparc.cpp
! src/cpu/sparc/vm/templateTable_sparc.cpp
! src/cpu/sparc/vm/vm_version_sparc.cpp
! src/cpu/sparc/vm/vm_version_sparc.hpp
! src/share/vm/gc_interface/collectedHeap.cpp
! src/share/vm/gc_interface/collectedHeap.inline.hpp
! src/share/vm/oops/cpCacheKlass.cpp
! src/share/vm/runtime/globals.hpp
! src/share/vm/runtime/stubRoutines.cpp
! src/share/vm/runtime/stubRoutines.hpp


From tom.rodriguez at oracle.com  Fri Aug 26 13:47:47 2011
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Fri, 26 Aug 2011 13:47:47 -0700
Subject: review for 7071307: MethodHandle bimorphic inlining should
	consider the frequency
In-Reply-To: <6E9A5D90-666D-436B-BCA9-A781510B70DD@oracle.com>
References: <DF1ECF39-6119-466F-97E5-3B6DB9C11F70@oracle.com>
	<4B352C37-8DBD-432D-8B12-BE01D04155C4@oracle.com>
	<CA629B01-E95B-45C0-A8F5-FE8EF01E0420@oracle.com>
	<6E9A5D90-666D-436B-BCA9-A781510B70DD@oracle.com>
Message-ID: <8D08BA0D-D556-463A-BD48-E3800C76A9ED@oracle.com>

I needed an is_empty() test in addition to the method_data() != NULL.

    if (_caller_jvms != NULL && _caller_jvms->method() != NULL &&
        _caller_jvms->method()->method_data() != NULL &&
        !_caller_jvms->method()->method_data()->is_empty()) {
      ciMethodData* mdo = _caller_jvms->method()->method_data();
      ciProfileData* mha_profile = mdo->bci_to_data(_caller_jvms->bci());
      assert(mha_profile, "must exist");
      CounterData* cd = mha_profile->as_CounterData();
      call_site_count = cd->count();
    } else {
      call_site_count = invoke_count;  // use the same value
    }
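
For readers without the HotSpot sources at hand, the guard above can be sketched standalone. This is only an illustration under assumptions: ProfileData, MethodData and call_site_count() below are hypothetical stand-ins, not the real ciMethodData/CounterData classes, and the map-based lookup is invented for the example.

```cpp
#include <cassert>
#include <map>

// Hypothetical stand-in for a profile entry; not HotSpot's CounterData.
struct ProfileData {
  int count;
};

// Hypothetical stand-in for method profile data; not HotSpot's ciMethodData.
struct MethodData {
  std::map<int, ProfileData> by_bci;  // profile entries keyed by bytecode index

  bool is_empty() const { return by_bci.empty(); }

  const ProfileData* bci_to_data(int bci) const {
    std::map<int, ProfileData>::const_iterator it = by_bci.find(bci);
    return it == by_bci.end() ? nullptr : &it->second;
  }
};

// Returns the profiled count at bci, or invoke_count when there is no
// usable profile (no profile object at all, or one with no entries).
int call_site_count(const MethodData* mdo, int bci, int invoke_count) {
  if (mdo != nullptr && !mdo->is_empty()) {
    const ProfileData* data = mdo->bci_to_data(bci);
    assert(data != nullptr && "must exist");
    return data->count;
  }
  return invoke_count;  // use the same value
}
```

The point of the extra is_empty() test is that a profile object can exist without containing any entries, in which case the lookup would return nothing and trip the assert.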

I also hit another unrelated assertion when running the test where he had optimized away all the invokedynamics, so Compile::has_method_handle_invokes was true but we never actually emitted any.  So we failed this assert in nmethod.cpp:

  assert(has_method_handle_invokes() == (_deoptimize_mh_offset != -1), "must have deopt mh handler");

The fix is to remove the set_has_method_handle_invokes call in callGenerator.cpp and set the flag when the method handle invokes are actually matched.

diff -r ac8738449b6f src/share/vm/opto/matcher.cpp
--- a/src/share/vm/opto/matcher.cpp
+++ b/src/share/vm/opto/matcher.cpp
@@ -1106,6 +1106,9 @@
       mcall_java->_optimized_virtual = call_java->is_optimized_virtual();
       is_method_handle_invoke = call_java->is_method_handle_invoke();
       mcall_java->_method_handle_invoke = is_method_handle_invoke;
+      if (is_method_handle_invoke) {
+        C->set_has_method_handle_invokes(true);
+      }
       if( mcall_java->is_MachCallStaticJava() )
         mcall_java->as_MachCallStaticJava()->_name =
          call_java->as_CallStaticJava()->_name;
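
As a toy model of why the flag belongs at match time (this is not HotSpot code; Call, any_mh_invoke() and count_mh_invokes() are made-up names for illustration): a flag computed from the calls seen while parsing can disagree with what is finally emitted, while a flag computed from the calls that reach matching cannot.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a call node; not HotSpot's CallJavaNode.
struct Call {
  bool is_method_handle_invoke;
};

// True if any call in the list is a method handle invoke.
bool any_mh_invoke(const std::vector<Call>& calls) {
  for (std::size_t i = 0; i < calls.size(); i++) {
    if (calls[i].is_method_handle_invoke) return true;
  }
  return false;
}

// Number of method handle invokes in the list.
int count_mh_invokes(const std::vector<Call>& calls) {
  int n = 0;
  for (std::size_t i = 0; i < calls.size(); i++) {
    if (calls[i].is_method_handle_invoke) n++;
  }
  return n;
}
```

If the parser saw one method handle invoke that the optimizer later removed, any_mh_invoke() over the parsed calls is true while count_mh_invokes() over the matched calls is 0, which is the kind of mismatch the assert catches. Evaluating any_mh_invoke() over the matched calls always agrees with the count.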

There's some crazy deep inlining in that smalltalk test case.  I think there must be some sort of bug with it.  The PrintInlining output wraps on my screen several times.  I'm looking at it.

tom



From igor.veresov at oracle.com  Sat Aug 27 02:21:23 2011
From: igor.veresov at oracle.com (igor.veresov at oracle.com)
Date: Sat, 27 Aug 2011 09:21:23 +0000
Subject: hg: hsx/hotspot-comp/hotspot: 6591247: C2 cleans up the merge point
	too early during SplitIf
Message-ID: <20110827092125.9CD5147167@hg.openjdk.java.net>

Changeset: 8805f8c1e23e
Author:    iveresov
Date:      2011-08-27 00:23 -0700
URL:       http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/8805f8c1e23e

6591247: C2 cleans up the merge point too early during SplitIf
Summary: Remove region self reference last
Reviewed-by: kvn, never

! src/share/vm/opto/split_if.cpp


From john.r.rose at oracle.com  Sat Aug 27 16:44:47 2011
From: john.r.rose at oracle.com (John Rose)
Date: Sat, 27 Aug 2011 16:44:47 -0700
Subject: Request for reviews (S): 7071709: JSR 292: switchpoint
	invalidation should be pushed not pulled
In-Reply-To: <E90E01D9-DAA5-465A-B553-11842E5ED514@oracle.com>
References: <5F4038AD-6959-480E-9DB8-1DEF17D6C4A6@oracle.com>
	<52852391-3B23-4326-B75C-D2CB502C52AF@oracle.com>
	<E90E01D9-DAA5-465A-B553-11842E5ED514@oracle.com>
Message-ID: <F763B66B-FDC7-4D33-97E4-820D07DD62FB@oracle.com>

On Aug 26, 2011, at 2:16 AM, Christian Thalinger wrote:

>> Also, get_field_by_offset should be called on the actual type of the ciCallSite, not env->CallSite_klass.  Otherwise you might get NULL for the ciField, if the target fields are split out across different call site subclasses.
> 
> Are there plans to do this?  I changed ciField::is_call_site_target to check for subclasses of CallSite.

No, no plans.  Just a move toward robustness.  Right now we have a common field inherited from CS that is faked into final or volatile field semantics in the subclasses.  I'm afraid we could get forced to split the field at some point.

On Aug 26, 2011, at 2:23 AM, Christian Thalinger wrote:

> 
> On Aug 26, 2011, at 1:47 AM, John Rose wrote:
> 
>> On Aug 25, 2011, at 11:32 AM, Tom Rodriguez wrote:
>> 
>>> The docs for VolatileCallSite suggest setTarget can be called whenever you feel like it, so it seems like it can be set many times.  MutableCallSite can be as well but the implication is that it's not updated very often.  I'm actually unclear what distinction is trying to be made with VolatileCallSite.
>> 
>> The MCS and VCS have the same semantics, except for the extra memory barriers on VCS.  These barriers do not affect the validity or applicability of push notification.  Also, either an MCS or VCS might end up being "megamutable", so there has to be some sort of cutoff that will prevent target-prediction for a CS which has been mispredicted too many times.  We need to squeeze a bit of state into the CS, somewhere, which displays how many times the thing has been mispredicted.
> 
> So the question is should we go with what we currently have for invokedynamic (only optimize CCS and MCS) or should we allow all CSs to be optimized and start working on the logic John describes.  I'm fine with both.


I think we need the throttling logic right away.  I'm not comfortable with encouraging users to use VCS just because MCS has a performance bug.
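
A rough sketch of the kind of cutoff meant here, under stated assumptions: CallSiteState, kMispredictCutoff and the two functions are hypothetical names, and the cutoff value of 3 is picked only for illustration. Nothing below is existing HotSpot or JDK code.

```cpp
#include <cassert>

// Hypothetical per-call-site state; the real state would have to be
// squeezed into the CS somewhere.
struct CallSiteState {
  int mispredict_count;
  CallSiteState() : mispredict_count(0) {}
};

// Assumed cutoff, for illustration only.
const int kMispredictCutoff = 3;

// Called when a predicted target turned out to be wrong (the target
// changed after code was compiled against it). The counter saturates
// at the cutoff so it cannot overflow.
void note_misprediction(CallSiteState& cs) {
  if (cs.mispredict_count < kMispredictCutoff) {
    cs.mispredict_count++;
  }
}

// Target prediction stays enabled only while the call site has been
// mispredicted fewer times than the cutoff.
bool should_predict_target(const CallSiteState& cs) {
  return cs.mispredict_count < kMispredictCutoff;
}
```

With this shape a "megamutable" MCS or VCS falls back to the unoptimized path after a few invalidations, and a well-behaved one keeps the optimization regardless of which subclass it is.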

-- John

From christian.thalinger at oracle.com  Sun Aug 28 01:16:23 2011
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Sun, 28 Aug 2011 10:16:23 +0200
Subject: Request for reviews (S): 7071709: JSR 292: switchpoint
	invalidation should be pushed not pulled
In-Reply-To: <F763B66B-FDC7-4D33-97E4-820D07DD62FB@oracle.com>
References: <5F4038AD-6959-480E-9DB8-1DEF17D6C4A6@oracle.com>
	<52852391-3B23-4326-B75C-D2CB502C52AF@oracle.com>
	<E90E01D9-DAA5-465A-B553-11842E5ED514@oracle.com>
	<F763B66B-FDC7-4D33-97E4-820D07DD62FB@oracle.com>
Message-ID: <4CE5B9F0-9357-4714-968B-2F818D0090A6@oracle.com>


On Aug 28, 2011, at 1:44 AM, John Rose wrote:

> On Aug 26, 2011, at 2:16 AM, Christian Thalinger wrote:
> 
>>> Also, get_field_by_offset should be called on the actual type of the ciCallSite, not env->CallSite_klass.  Otherwise you might get NULL for the ciField, if the target fields are split out across different call site subclasses.
>> 
>> Are there plans to do this?  I changed ciField::is_call_site_target to check for subclasses of CallSite.
> 
> No, no plans.  Just a move toward robustness.  Right now we have a common field inherited from CS that is faked into final or volatile field semantics in the subclasses.  I'm afraid we could get forced to split the field at some point.
> 
> On Aug 26, 2011, at 2:23 AM, Christian Thalinger wrote:
> 
>> 
>> On Aug 26, 2011, at 1:47 AM, John Rose wrote:
>> 
>>> On Aug 25, 2011, at 11:32 AM, Tom Rodriguez wrote:
>>> 
>>>> The docs for VolatileCallSite suggest setTarget can be called whenever you feel like it, so it seems like it can be set many times.  MutableCallSite can be as well but the implication is that it's not updated very often.  I'm actually unclear what distinction is trying to be made with VolatileCallSite.
>>> 
>>> The MCS and VCS have the same semantics, except for the extra memory barriers on VCS.  These barriers do not affect the validity or applicability of push notification.  Also, either an MCS or VCS might end up being "megamutable", so there has to be some sort of cutoff that will prevent target-prediction for a CS which has been mispredicted too many times.  We need to squeeze a bit of state into the CS, somewhere, which displays how many times the thing has been mispredicted.
>> 
>> So the question is should we go with what we currently have for invokedynamic (only optimize CCS and MCS) or should we allow all CSs to be optimized and start working on the logic John describes.  I'm fine with both.
> 
> 
> I think we need the throttling logic right away.  I'm not comfortable with encouraging users to use VCS just because MCS has a performance bug.

The other way around (MCS instead of VCS).  Alright, then I'll change the logic in callGenerator and doCall to optimize VCSs too for this patch and start working on the throttling logic.

-- Christian

> 
> -- John


From christian.thalinger at oracle.com  Mon Aug 29 05:45:58 2011
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Mon, 29 Aug 2011 14:45:58 +0200
Subject: review for 7071307: MethodHandle bimorphic inlining should
	consider the frequency
In-Reply-To: <8D08BA0D-D556-463A-BD48-E3800C76A9ED@oracle.com>
References: <DF1ECF39-6119-466F-97E5-3B6DB9C11F70@oracle.com>
	<4B352C37-8DBD-432D-8B12-BE01D04155C4@oracle.com>
	<CA629B01-E95B-45C0-A8F5-FE8EF01E0420@oracle.com>
	<6E9A5D90-666D-436B-BCA9-A781510B70DD@oracle.com>
	<8D08BA0D-D556-463A-BD48-E3800C76A9ED@oracle.com>
Message-ID: <6B7CD75D-B7D9-4F9A-ACE8-BD0718426A59@oracle.com>


On Aug 26, 2011, at 10:47 PM, Tom Rodriguez wrote:

> I needed an is_empty() test in addition to the method_data() != NULL.
> 
>    if (_caller_jvms != NULL && _caller_jvms->method() != NULL &&
>        _caller_jvms->method()->method_data() != NULL &&
>        !_caller_jvms->method()->method_data()->is_empty()) {
>      ciMethodData* mdo = _caller_jvms->method()->method_data();
>      ciProfileData* mha_profile = mdo->bci_to_data(_caller_jvms->bci());
>      assert(mha_profile, "must exist");
>      CounterData* cd = mha_profile->as_CounterData();
>      call_site_count = cd->count();
>    } else {
>      call_site_count = invoke_count;  // use the same value
>    }

Looks good.

> 
> I also hit another unrelated assertion when running the test where he had optimized away all the invokedynamics, so Compile::has_method_handle_invokes was true but we never actually emitted any.  So we failed this assert in nmethod.cpp:
> 
>  assert(has_method_handle_invokes() == (_deoptimize_mh_offset != -1), "must have deopt mh handler");
> 
> The fix is to remove the set_has_method_handle_invokes call in callGenerator.cpp and set them when they are matched.
> 
> diff -r ac8738449b6f src/share/vm/opto/matcher.cpp
> --- a/src/share/vm/opto/matcher.cpp
> +++ b/src/share/vm/opto/matcher.cpp
> @@ -1106,6 +1106,9 @@
>       mcall_java->_optimized_virtual = call_java->is_optimized_virtual();
>       is_method_handle_invoke = call_java->is_method_handle_invoke();
>       mcall_java->_method_handle_invoke = is_method_handle_invoke;
> +      if (is_method_handle_invoke) {
> +        C->set_has_method_handle_invokes(true);
> +      }
>       if( mcall_java->is_MachCallStaticJava() )
>         mcall_java->as_MachCallStaticJava()->_name =
>          call_java->as_CallStaticJava()->_name;

Ahh, good catch.

> 
> There's some crazy deep inlining in that smalltalk test case.  I think there must be some sort of bug with it.  The PrintInlining output wraps on my screen several times.  I'm looking at it.

I haven't printed the inlining tree yet.  I will try...

-- Christian

> 
> tom
> 


From christian.thalinger at oracle.com  Mon Aug 29 08:21:54 2011
From: christian.thalinger at oracle.com (christian.thalinger at oracle.com)
Date: Mon, 29 Aug 2011 15:21:54 +0000
Subject: hg: hsx/hotspot-comp/hotspot: 7083184: JSR 292: don't store context
	class argument with call site dependencies
Message-ID: <20110829152156.726D7471D7@hg.openjdk.java.net>

Changeset: b27c72d69fd1
Author:    twisti
Date:      2011-08-29 05:07 -0700
URL:       http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/b27c72d69fd1

7083184: JSR 292: don't store context class argument with call site dependencies
Reviewed-by: jrose, never

! src/share/vm/ci/ciEnv.cpp
! src/share/vm/ci/ciEnv.hpp
! src/share/vm/code/dependencies.cpp
! src/share/vm/code/dependencies.hpp
! src/share/vm/memory/universe.cpp
! src/share/vm/opto/callGenerator.cpp


From christian.thalinger at oracle.com  Mon Aug 29 09:52:27 2011
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Mon, 29 Aug 2011 18:52:27 +0200
Subject: Request for reviews (S): 7071709: JSR 292: switchpoint
	invalidation should be pushed not pulled
In-Reply-To: <F763B66B-FDC7-4D33-97E4-820D07DD62FB@oracle.com>
References: <5F4038AD-6959-480E-9DB8-1DEF17D6C4A6@oracle.com>
	<52852391-3B23-4326-B75C-D2CB502C52AF@oracle.com>
	<E90E01D9-DAA5-465A-B553-11842E5ED514@oracle.com>
	<F763B66B-FDC7-4D33-97E4-820D07DD62FB@oracle.com>
Message-ID: <9EC6D299-AE3B-44C7-AC71-5526AB810557@oracle.com>


On Aug 28, 2011, at 1:44 AM, John Rose wrote:

> On Aug 26, 2011, at 2:16 AM, Christian Thalinger wrote:
> 
>>> Also, get_field_by_offset should be called on the actual type of the ciCallSite, not env->CallSite_klass.  Otherwise you might get NULL for the ciField, if the target fields are split out across different call site subclasses.
>> 
>> Are there plans to do this?  I changed ciField::is_call_site_target to check for subclasses of CallSite.
> 
> No, no plans.  Just a move toward robustness.  Right now we have a common field inherited from CS that is faked into final or volatile field semantics in the subclasses.  I'm afraid we could get forced to split the field at some point.

I think the point is now.  setTargetVolatile uses Unsafe to fake the volatile field semantics and the compiler doesn't recognize this as a field store to CS.target.  Thus we miss the field stores to a VCS and end up with wrong behavior.

I'm currently preparing something that does this refactoring.

Why was this Unsafe trick used in the first place?

-- Christian

> 
> On Aug 26, 2011, at 2:23 AM, Christian Thalinger wrote:
> 
>> 
>> On Aug 26, 2011, at 1:47 AM, John Rose wrote:
>> 
>>> On Aug 25, 2011, at 11:32 AM, Tom Rodriguez wrote:
>>> 
>>>> The docs for VolatileCallSite suggest setTarget can be called whenever you feel like it, so it seems like it can be set many times.  MutableCallSite can be as well but the implication is that it's not updated very often.  I'm actually unclear what distinction is trying to be made with VolatileCallSite.
>>> 
>>> The MCS and VCS have the same semantics, except for the extra memory barriers on VCS.  These barriers do not affect the validity or applicability of push notification.  Also, either an MCS or VCS might end up being "megamutable", so there has to be some sort of cutoff that will prevent target-prediction for a CS which has been mispredicted too many times.  We need to squeeze a bit of state into the CS, somewhere, which displays how many times the thing has been mispredicted.
>> 
>> So the question is should we go with what we currently have for invokedynamic (only optimize CCS and MCS) or should we allow all CSs to be optimized and start working on the logic John describes.  I'm fine with both.
> 
> 
> I think we need the throttling logic right away.  I'm not comfortable with encouraging users to use VCS just because MCS has a performance bug.
> 
> -- John
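
A minimal sketch of the cutoff John describes above, assuming a small saturating misprediction counter kept with the call site. The class name, method names and threshold are hypothetical and only illustrate the idea; nothing here is taken from the actual patch:

```cpp
#include <cassert>

// Hypothetical model of the "bit of state" squeezed into a call site.
class CallSiteProfile {
 public:
  explicit CallSiteProfile(int max_mispredictions)
      : _max(max_mispredictions) {}

  // Called whenever code that predicted the target had to be invalidated
  // because the target changed.  The counter saturates just past the limit.
  void record_misprediction() {
    if (_count <= _max) {
      _count++;
    }
  }

  // Target prediction stays enabled until the limit has been exceeded.
  bool should_predict_target() const { return _count <= _max; }

  int mispredictions() const { return _count; }

 private:
  int _max;        // allowed number of mispredictions
  int _count = 0;  // observed mispredictions, saturating at _max + 1
};
```

Once the counter passes the limit, the compiler would stop predicting the target for that call site, whether it is an MCS or a VCS.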


From christian.thalinger at oracle.com  Mon Aug 29 10:22:36 2011
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Mon, 29 Aug 2011 19:22:36 +0200
Subject: Request for reviews (S): 7071709: JSR 292: switchpoint
	invalidation should be pushed not pulled
In-Reply-To: <9EC6D299-AE3B-44C7-AC71-5526AB810557@oracle.com>
References: <5F4038AD-6959-480E-9DB8-1DEF17D6C4A6@oracle.com>
	<52852391-3B23-4326-B75C-D2CB502C52AF@oracle.com>
	<E90E01D9-DAA5-465A-B553-11842E5ED514@oracle.com>
	<F763B66B-FDC7-4D33-97E4-820D07DD62FB@oracle.com>
	<9EC6D299-AE3B-44C7-AC71-5526AB810557@oracle.com>
Message-ID: <4A211AB4-019F-4E4A-8E68-F7E3F4CDF2CC@oracle.com>


On Aug 29, 2011, at 6:52 PM, Christian Thalinger wrote:

> 
> On Aug 28, 2011, at 1:44 AM, John Rose wrote:
> 
>> On Aug 26, 2011, at 2:16 AM, Christian Thalinger wrote:
>> 
>>>> Also, get_field_by_offset should be called on the actual type of the ciCallSite, not env->CallSite_klass.  Otherwise you might get NULL for the ciField, if the target fields are split out across different call site subclasses.
>>> 
>>> Are there plans to do this?  I changed ciField::is_call_site_target to check for subclasses of CallSite.
>> 
>> No, no plans.  Just a move toward robustness.  Right now we have a common field inherited from CS that is faked into final or volatile field semantics in the subclasses.  I'm afraid we could get forced to split the field at some point.
> 
> I think the point is now.  setTargetVolatile uses Unsafe to fake the volatile field semantics and the compiler doesn't recognize this as a field store to CS.target.  Thus we miss the field stores to a VCS and end up with wrong behavior.
> 
> I'm currently preparing something that does this refactoring.
> 
> Why was this Unsafe trick used in the first place?

Never mind.  I can see now why.  -- Christian

> 
> -- Christian
> 
>> 
>> On Aug 26, 2011, at 2:23 AM, Christian Thalinger wrote:
>> 
>>> 
>>> On Aug 26, 2011, at 1:47 AM, John Rose wrote:
>>> 
>>>> On Aug 25, 2011, at 11:32 AM, Tom Rodriguez wrote:
>>>> 
>>>>> The docs for VolatileCallSite suggest setTarget can be called whenever you feel like it, so it seems like it can be set many times.  MutableCallSite can be as well but the implication is that it's not updated very often.  I'm actually unclear what distinction is trying to be made with VolatileCallSite.
>>>> 
>>>> The MCS and VCS have the same semantics, except for the extra memory barriers on VCS.  These barriers do not affect the validity or applicability of push notification.  Also, either an MCS or VCS might end up being "megamutable", so there has to be some sort of cutoff that will prevent target-prediction for a CS which has been mispredicted too many times.  We need to squeeze a bit of state into the CS, somewhere, which displays how many times the thing has been mispredicted.
>>> 
>>> So the question is should we go with what we currently have for invokedynamic (only optimize CCS and MCS) or should we allow all CSs to be optimized and start working on the logic John describes.  I'm fine with both.
>> 
>> 
>> I think we need the throttling logic right away.  I'm not comfortable with encouraging users to use VCS just because MCS has a performance bug.
>> 
>> -- John
> 


From tom.rodriguez at oracle.com  Mon Aug 29 11:03:37 2011
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Mon, 29 Aug 2011 11:03:37 -0700
Subject: Request for reviews (S): 7071709: JSR 292: switchpoint
	invalidation should be pushed not pulled
In-Reply-To: <9EC6D299-AE3B-44C7-AC71-5526AB810557@oracle.com>
References: <5F4038AD-6959-480E-9DB8-1DEF17D6C4A6@oracle.com>
	<52852391-3B23-4326-B75C-D2CB502C52AF@oracle.com>
	<E90E01D9-DAA5-465A-B553-11842E5ED514@oracle.com>
	<F763B66B-FDC7-4D33-97E4-820D07DD62FB@oracle.com>
	<9EC6D299-AE3B-44C7-AC71-5526AB810557@oracle.com>
Message-ID: <0EEADB80-E8F9-49B7-BF9C-9FD4A50BD73D@oracle.com>


On Aug 29, 2011, at 9:52 AM, Christian Thalinger wrote:

> 
> On Aug 28, 2011, at 1:44 AM, John Rose wrote:
> 
>> On Aug 26, 2011, at 2:16 AM, Christian Thalinger wrote:
>> 
>>>> Also, get_field_by_offset should be called on the actual type of the ciCallSite, not env->CallSite_klass.  Otherwise you might get NULL for the ciField, if the target fields are split out across different call site subclasses.
>>> 
>>> Are there plans to do this?  I changed ciField::is_call_site_target to check for subclasses of CallSite.
>> 
>> No, no plans.  Just a move toward robustness.  Right now we have a common field inherited from CS that is faked into final or volatile field semantics in the subclasses.  I'm afraid we could get forced to split the field at some point.
> 
> I think the point is now.  setTargetVolatile uses Unsafe to fake the volatile field semantics and the compiler doesn't recognize this as a field store to CS.target.  Thus we miss the field stores to a VCS and end up with wrong behavior.
> 
> I'm currently preparing something that does this refactoring.

I'm not so sure this is a good idea.  A fair amount of code assumes that the structure of all CallSites is the same:

  __ load_heap_oop(rcx_method_handle, Address(rax_callsite, __ delayed_value(java_lang_invoke_CallSite::target_offset_in_bytes, rdx)));
  __ null_check(rcx_method_handle);
  __ verify_oop(rcx_method_handle);
  __ prepare_to_jump_from_interpreted();
  __ jump_to_method_handle_entry(rcx_method_handle, rdx);

I guess we could require/enforce that all call site subclasses have their target field at the same offset but it does seem to break something fairly fundamental.
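
To illustrate the concern with a standalone sketch (the struct names below are hypothetical C++ stand-ins, not the Java classes or the VM code): a generic loader that reads the target through the common CallSite layout does not see a target that a subclass stores in its own field.

```cpp
#include <cassert>

// Hypothetical stand-in for the common layout: one target field in the base.
struct CallSite {
  void* target = nullptr;
};

// Subclass that keeps using the inherited field; generic code works unchanged.
struct MutableCallSite : CallSite {};

// Subclass after a hypothetical split: it keeps its target in its own field.
struct SplitVolatileCallSite : CallSite {
  void* volatile_target = nullptr;
};

// Generic loader, analogous to code that reads the target at one fixed offset.
void* load_target(const CallSite* cs) {
  return cs->target;
}
```

With this layout, load_target() returns the expected handle for a MutableCallSite but still returns the unset base field for a SplitVolatileCallSite whose volatile_target has been written.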

You could trap these writes in the Unsafe machinery instead.  That's fairly ugly but easy enough to do with a few assumptions about how it will be written.  We might have to worry about reflection too, though that should either use Unsafe I think.

Maybe we should move forward with what we have and deal with VCS later?

tom

> 
> Why was this Unsafe trick used in the first place?
> 
> -- Christian
> 
>> 
>> On Aug 26, 2011, at 2:23 AM, Christian Thalinger wrote:
>> 
>>> 
>>> On Aug 26, 2011, at 1:47 AM, John Rose wrote:
>>> 
>>>> On Aug 25, 2011, at 11:32 AM, Tom Rodriguez wrote:
>>>> 
>>>>> The docs for VolatileCallSite suggest setTarget can be called whenever you feel like it, so it seems like it can be set many times.  MutableCallSite can be as well but the implication is that it's not updated very often.  I'm actually unclear what distinction is trying to be made with VolatileCallSite.
>>>> 
>>>> The MCS and VCS have the same semantics, except for the extra memory barriers on VCS.  These barriers do not affect the validity or applicability of push notification.  Also, either an MCS or VCS might end up being "megamutable", so there has to be some sort of cutoff that will prevent target-prediction for a CS which has been mispredicted too many times.  We need to squeeze a bit of state into the CS, somewhere, which displays how many times the thing has been mispredicted.
>>> 
>>> So the question is should we go with what we currently have for invokedynamic (only optimize CCS and MCS) or should we allow all CSs to be optimized and start working on the logic John describes.  I'm fine with both.
>> 
>> I think we need the throttling logic right away.  I'm not comfortable with encouraging users to use VCS just because MCS has a performance bug.
>> 
>> -- John
> 


From christian.thalinger at oracle.com  Mon Aug 29 11:56:32 2011
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Mon, 29 Aug 2011 20:56:32 +0200
Subject: Request for reviews (S): 7071709: JSR 292: switchpoint
	invalidation should be pushed not pulled
In-Reply-To: <0EEADB80-E8F9-49B7-BF9C-9FD4A50BD73D@oracle.com>
References: <5F4038AD-6959-480E-9DB8-1DEF17D6C4A6@oracle.com>
	<52852391-3B23-4326-B75C-D2CB502C52AF@oracle.com>
	<E90E01D9-DAA5-465A-B553-11842E5ED514@oracle.com>
	<F763B66B-FDC7-4D33-97E4-820D07DD62FB@oracle.com>
	<9EC6D299-AE3B-44C7-AC71-5526AB810557@oracle.com>
	<0EEADB80-E8F9-49B7-BF9C-9FD4A50BD73D@oracle.com>
Message-ID: <2F4D4364-320E-4CCD-A6CB-28E1535FBACF@oracle.com>


On Aug 29, 2011, at 8:03 PM, Tom Rodriguez wrote:

> 
> On Aug 29, 2011, at 9:52 AM, Christian Thalinger wrote:
> 
>> 
>> On Aug 28, 2011, at 1:44 AM, John Rose wrote:
>> 
>>> On Aug 26, 2011, at 2:16 AM, Christian Thalinger wrote:
>>> 
>>>>> Also, get_field_by_offset should be called on the actual type of the ciCallSite, not env->CallSite_klass.  Otherwise you might get NULL for the ciField, if the target fields are split out across different call site subclasses.
>>>> 
>>>> Are there plans to do this?  I changed ciField::is_call_site_target to check for subclasses of CallSite.
>>> 
>>> No, no plans.  Just a move toward robustness.  Right now we have a common field inherited from CS that is faked into final or volatile field semantics in the subclasses.  I'm afraid we could get forced to split the field at some point.
>> 
>> I think the point is now.  setTargetVolatile uses Unsafe to fake the volatile field semantics and the compiler doesn't recognize this as a field store to CS.target.  Thus we miss the field stores to a VCS and end up with wrong behavior.
>> 
>> I'm currently preparing something that does this refactoring.
> 
> I'm not so sure this is a good idea.  A fair amount of code assumes that the structure of all CallSites is the same:
> 
>  __ load_heap_oop(rcx_method_handle, Address(rax_callsite, __ delayed_value(java_lang_invoke_CallSite::target_offset_in_bytes, rdx)));
>  __ null_check(rcx_method_handle);
>  __ verify_oop(rcx_method_handle);
>  __ prepare_to_jump_from_interpreted();
>  __ jump_to_method_handle_entry(rcx_method_handle, rdx);
> 
> I guess we could require/enforce that all call site subclasses have their target field at the same offset but it does seem to break something fairly fundamental.

I agree.  I got it working but it's fragile.

> 
> You could trap these writes in the Unsafe machinery instead.  That's fairly ugly but easy enough to do with a few assumptions about how it will be written.  We might have to worry about reflection too, though that should either use Unsafe I think.

I don't like that very much either.

> 
> Maybe we should move forward with what we have and deal with VCS later?

Yes, I think that would be the best approach.  John, what do you think, optimize CCS and MCS for now and deal with VCS later?

-- Christian

> 
> tom
> 
>> 
>> Why was this Unsafe trick used in the first place?
>> 
>> -- Christian
>> 
>>> 
>>> On Aug 26, 2011, at 2:23 AM, Christian Thalinger wrote:
>>> 
>>>> 
>>>> On Aug 26, 2011, at 1:47 AM, John Rose wrote:
>>>> 
>>>>> On Aug 25, 2011, at 11:32 AM, Tom Rodriguez wrote:
>>>>> 
>>>>>> The docs for VolatileCallSite suggest setTarget can be called whenever you feel like it, so it seems like it can be set many times.  MutableCallSite can be as well but the implication is that it's not updated very often.  I'm actually unclear what distinction is trying to be made with VolatileCallSite.
>>>>> 
>>>>> The MCS and VCS have the same semantics, except for the extra memory barriers on VCS.  These barriers do not affect the validity or applicability of push notification.  Also, either an MCS or VCS might end up being "megamutable", so there has to be some sort of cutoff that will prevent target-prediction for a CS which has been mispredicted too many times.  We need to squeeze a bit of state into the CS, somewhere, which displays how many times the thing has been mispredicted.
>>>> 
>>>> So the question is should we go with what we currently have for invokedynamic (only optimize CCS and MCS) or should we allow all CSs to be optimized and start working on the logic John describes.  I'm fine with both.
>>> 
>>> I think we need the throttling logic right away.  I'm not comfortable with encouraging users to use VCS just because MCS has a performance bug.
>>> 
>>> -- John
>> 
> 


From john.r.rose at oracle.com  Mon Aug 29 12:41:56 2011
From: john.r.rose at oracle.com (John Rose)
Date: Mon, 29 Aug 2011 12:41:56 -0700
Subject: Request for reviews (S): 7071709: JSR 292: switchpoint
	invalidation should be pushed not pulled
In-Reply-To: <2F4D4364-320E-4CCD-A6CB-28E1535FBACF@oracle.com>
References: <5F4038AD-6959-480E-9DB8-1DEF17D6C4A6@oracle.com>
	<52852391-3B23-4326-B75C-D2CB502C52AF@oracle.com>
	<E90E01D9-DAA5-465A-B553-11842E5ED514@oracle.com>
	<F763B66B-FDC7-4D33-97E4-820D07DD62FB@oracle.com>
	<9EC6D299-AE3B-44C7-AC71-5526AB810557@oracle.com>
	<0EEADB80-E8F9-49B7-BF9C-9FD4A50BD73D@oracle.com>
	<2F4D4364-320E-4CCD-A6CB-28E1535FBACF@oracle.com>
Message-ID: <A99BE02A-9B40-48F0-A8AA-18E329C56153@oracle.com>

Yes, deal with volatile fields later. 

I do think that VCS should get push notif now. 

-- John  (on my iPhone)

On Aug 29, 2011, at 11:56 AM, Christian Thalinger <christian.thalinger at oracle.com> wrote:

> Yes, I think that would be the best approach.  John, what do you think, optimize CCS and MCS for now and deal with VCS later?

From christian.thalinger at oracle.com  Tue Aug 30 01:07:52 2011
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Tue, 30 Aug 2011 10:07:52 +0200
Subject: Request for reviews (S): 7078382: JSR 292: don't count method
	handle adapters against inlining budgets
In-Reply-To: <E888AE2D-54D3-4D9C-855F-70A555D12385@oracle.com>
References: <E888AE2D-54D3-4D9C-855F-70A555D12385@oracle.com>
Message-ID: <5467AE30-2D5E-4514-B0CD-6FB7F56EE420@oracle.com>

So, the change is so small that nobody cares? :-)

-- Christian

On Aug 23, 2011, at 9:20 PM, Christian Thalinger wrote:

> http://cr.openjdk.java.net/~twisti/7078382/
> 
> 7078382: JSR 292: don't count method handle adapters against inlining budgets
> Reviewed-by:
> 
> Currently the code size of method handle adapters is counted against
> inlining budgets like DesiredMethodLimit.  This results in earlier
> compiler bailouts with method handle call sites than without, leading
> to worse performance.
> 
> The fix is to return an adjusted bytecode size for method handle
> adapters for inlining decisions (the metric we use for now is the
> number of invokes).
> 
> Tested with JRuby benchmarks.
> 


From vladimir.kozlov at oracle.com  Tue Aug 30 07:59:58 2011
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 30 Aug 2011 07:59:58 -0700
Subject: Request for reviews (S): 7078382: JSR 292: don't count method
	handle adapters against inlining budgets
In-Reply-To: <5467AE30-2D5E-4514-B0CD-6FB7F56EE420@oracle.com>
References: <E888AE2D-54D3-4D9C-855F-70A555D12385@oracle.com>
	<5467AE30-2D5E-4514-B0CD-6FB7F56EE420@oracle.com>
Message-ID: <4E5CFAEE.2010006@oracle.com>

+ // (a) Don't fully count method handle adapters against inlining
       ^ you have only one paragraph so (a) is not needed.

"sites of the adapter" --> "sites in the adapter"

Can you avoid assigning inside the loop's condition? You can do this instead:

+     while (iter.next() != ciBytecodeStream::EOBC()) {
+       if (Bytecodes::is_invoke(iter.cur_bc())) {
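
As a self-contained sketch of that loop shape (the Bytecode enum and BytecodeStream class below are simplified, hypothetical stand-ins, not the real ciBytecodeStream API), counting the invokes in an adapter could look roughly like this:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical, simplified bytecode set; EOBC marks the end of the stream.
enum class Bytecode { Load, Store, Invoke, Return, EOBC };

// Hypothetical stand-in for a bytecode stream.
class BytecodeStream {
 public:
  explicit BytecodeStream(std::vector<Bytecode> codes)
      : _codes(std::move(codes)) {}

  // Advances to the next bytecode and returns it, or EOBC at the end.
  Bytecode next() {
    _cur = (_pos < _codes.size()) ? _codes[_pos++] : Bytecode::EOBC;
    return _cur;
  }

  // Returns the bytecode produced by the most recent call to next().
  Bytecode cur_bc() const { return _cur; }

 private:
  std::vector<Bytecode> _codes;
  std::size_t _pos = 0;
  Bytecode _cur = Bytecode::EOBC;
};

// Counts invoke bytecodes.  The loop condition compares the result of
// next() directly instead of assigning it to a local inside the condition.
int count_invokes(BytecodeStream iter) {
  int invokes = 0;
  while (iter.next() != Bytecode::EOBC) {
    if (iter.cur_bc() == Bytecode::Invoke) {
      invokes++;
    }
  }
  return invokes;
}
```

The returned count corresponds to the "number of invokes" metric mentioned in the change description; how the real code turns that into an adjusted size is not shown here.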

Otherwise looks good.

Thanks,
Vladimir


On 8/30/11 1:07 AM, Christian Thalinger wrote:
> So, the change is so small that nobody cares? :-)
>
> -- Christian
>
> On Aug 23, 2011, at 9:20 PM, Christian Thalinger wrote:
>
>> http://cr.openjdk.java.net/~twisti/7078382/
>>
>> 7078382: JSR 292: don't count method handle adapters against inlining budgets
>> Reviewed-by:
>>
>> Currently the code size of method handle adapters is counted against
>> inlining budgets like DesiredMethodLimit.  This results in earlier
>> compiler bailouts with method handle call sites than without, leading
>> to worse performance.
>>
>> The fix is to return an adjusted bytecode size for method handle
>> adapters for inlining decisions (the metric we use for now is the
>> number of invokes).
>>
>> Tested with JRuby benchmarks.
>>
>

From christian.thalinger at oracle.com  Tue Aug 30 08:35:11 2011
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Tue, 30 Aug 2011 17:35:11 +0200
Subject: Request for reviews (S): 7078382: JSR 292: don't count method
	handle adapters against inlining budgets
In-Reply-To: <4E5CFAEE.2010006@oracle.com>
References: <E888AE2D-54D3-4D9C-855F-70A555D12385@oracle.com>
	<5467AE30-2D5E-4514-B0CD-6FB7F56EE420@oracle.com>
	<4E5CFAEE.2010006@oracle.com>
Message-ID: <80C99D6A-8D90-4512-9548-29597313B7FD@oracle.com>


On Aug 30, 2011, at 4:59 PM, Vladimir Kozlov wrote:

> + // (a) Don't fully count method handle adapters against inlining
>      ^ you have only one paragraph so (a) is not needed.

Yeah.  I thought maybe we get more in the future :-)  I removed it.

> 
> "sites of the adapter" --> "sites in the adapter"

Thanks.

> 
> Can you avoid assigning inside the loop's condition? You can do this instead:
> 
> +     while (iter.next() != ciBytecodeStream::EOBC()) {
> +       if (Bytecodes::is_invoke(iter.cur_bc())) {

Yes, I like that better.  I also changed the example in ciStreams.hpp as I got that code from there.

> 
> Otherwise looks good.

Thank you.  I updated the webrev.

-- Christian

> 
> Thanks,
> Vladimir
> 
> 
> On 8/30/11 1:07 AM, Christian Thalinger wrote:
>> So, the change is so small that nobody cares? :-)
>> 
>> -- Christian
>> 
>> On Aug 23, 2011, at 9:20 PM, Christian Thalinger wrote:
>> 
>>> http://cr.openjdk.java.net/~twisti/7078382/
>>> 
>>> 7078382: JSR 292: don't count method handle adapters against inlining budgets
>>> Reviewed-by:
>>> 
>>> Currently the code size of method handle adapters is counted against
>>> inlining budgets like DesiredMethodLimit.  This results in earlier
>>> compiler bailouts with method handle call sites than without, leading
>>> to worse performance.
>>> 
>>> The fix is to return an adjusted bytecode size for method handle
>>> adapters for inlining decisions (the metric we use for now is the
>>> number of invokes).
>>> 
>>> Tested with JRuby benchmarks.
>>> 
>> 


From vladimir.kozlov at oracle.com  Tue Aug 30 08:45:26 2011
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 30 Aug 2011 08:45:26 -0700
Subject: Request for reviews (S): 7078382: JSR 292: don't count method
	handle adapters against inlining budgets
In-Reply-To: <80C99D6A-8D90-4512-9548-29597313B7FD@oracle.com>
References: <E888AE2D-54D3-4D9C-855F-70A555D12385@oracle.com>
	<5467AE30-2D5E-4514-B0CD-6FB7F56EE420@oracle.com>
	<4E5CFAEE.2010006@oracle.com>
	<80C99D6A-8D90-4512-9548-29597313B7FD@oracle.com>
Message-ID: <4E5D0596.6060703@oracle.com>

Looks good.

Thanks,
Vladimir

Christian Thalinger wrote:
> On Aug 30, 2011, at 4:59 PM, Vladimir Kozlov wrote:
> 
>> + // (a) Don't fully count method handle adapters against inlining
>>      ^ you have only one paragraph so (a) is not needed.
> 
> Yeah.  I thought maybe we get more in the future :-)  I removed it.
> 
>> "sites of the adapter" --> "sites in the adapter"
> 
> Thanks.
> 
>> Can you avoid assigning inside the loop's condition? You can do this instead:
>>
>> +     while (iter.next() != ciBytecodeStream::EOBC()) {
>> +       if (Bytecodes::is_invoke(iter.cur_bc())) {
> 
> Yes, I like that better.  I also changed the example in ciStreams.hpp as I got that code from there.
> 
>> Otherwise looks good.
> 
> Thank you.  I updated the webrev.
> 
> -- Christian
> 
>> Thanks,
>> Vladimir
>>
>>
>> On 8/30/11 1:07 AM, Christian Thalinger wrote:
>>> So, the change is so small that nobody cares? :-)
>>>
>>> -- Christian
>>>
>>> On Aug 23, 2011, at 9:20 PM, Christian Thalinger wrote:
>>>
>>>> http://cr.openjdk.java.net/~twisti/7078382/
>>>>
>>>> 7078382: JSR 292: don't count method handle adapters against inlining budgets
>>>> Reviewed-by:
>>>>
>>>> Currently the code size of method handle adapters is counted against
>>>> inlining budgets like DesiredMethodLimit.  This results in earlier
>>>> compiler bailouts with method handle call sites than without, leading
>>>> to worse performance.
>>>>
>>>> The fix is to return an adjusted bytecode size for method handle
>>>> adapters for inlining decisions (the metric we use for now is the
>>>> number of invokes).
>>>>
>>>> Tested with JRuby benchmarks.
>>>>
> 

From christian.thalinger at oracle.com  Tue Aug 30 09:21:21 2011
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Tue, 30 Aug 2011 18:21:21 +0200
Subject: Request for reviews (M): 7079673: JSR 292: C1 should inline bytecoded
	method handle adapters
Message-ID: <454A8FE2-FC6F-450C-9473-942E2CC4CA5D@oracle.com>

http://cr.openjdk.java.net/~twisti/7079673/

7079673: JSR 292: C1 should inline bytecoded method handle adapters
Reviewed-by:

The current JSR 292 support in C1 always does an invoke for method
handle invokes which results in a lot of C2I-I2C transfers.  This
results in very poor performance.

src/share/vm/c1/c1_GraphBuilder.cpp
src/share/vm/c1/c1_GraphBuilder.hpp
src/share/vm/c1/c1_Instruction.cpp
src/share/vm/c1/c1_Instruction.hpp
src/share/vm/classfile/javaClasses.cpp
src/share/vm/classfile/vmSymbols.hpp


From john.cuthbertson at oracle.com  Tue Aug 30 09:54:09 2011
From: john.cuthbertson at oracle.com (John Cuthbertson)
Date: Tue, 30 Aug 2011 09:54:09 -0700
Subject: RFR(S): 7066841: remove MacroAssembler::br_on_reg_cond() on sparc
Message-ID: <4E5D15B1.9010006@oracle.com>

Hi Everyone,

Can I have a couple of volunteers look over these changes? The webrev can 
be found at: http://cr.openjdk.java.net/~johnc/7066841/webrev.0/.

These changes basically remove the macro assembler routine 
br_on_reg_cond and replace the remaining calls to that routine, in the 
G1 barriers, with an equivalent.

Testing: GC test suite and Kitchensink on 32/64 bit sparc with -Xint, 
-client -Xcomp, -XX:+TieredCompilation -XX:TieredStopAtLevel=1, and 
default. VerifyDuringGC and VerifyBeforeGC were also enabled to detect 
missing barriers.

Thanks,

JohnC

From tom.rodriguez at oracle.com  Tue Aug 30 09:56:23 2011
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Tue, 30 Aug 2011 09:56:23 -0700
Subject: Request for reviews (S): 7078382: JSR 292: don't count method
	handle adapters against inlining budgets
In-Reply-To: <4E5D0596.6060703@oracle.com>
References: <E888AE2D-54D3-4D9C-855F-70A555D12385@oracle.com>
	<5467AE30-2D5E-4514-B0CD-6FB7F56EE420@oracle.com>
	<4E5CFAEE.2010006@oracle.com>
	<80C99D6A-8D90-4512-9548-29597313B7FD@oracle.com>
	<4E5D0596.6060703@oracle.com>
Message-ID: <6770529E-2661-4D36-8B1A-3607CF33CAE6@oracle.com>

Yes it looks good.

tom

On Aug 30, 2011, at 8:45 AM, Vladimir Kozlov wrote:

> Looks good.
> 
> Thanks,
> Vladimir
> 
> Christian Thalinger wrote:
>> On Aug 30, 2011, at 4:59 PM, Vladimir Kozlov wrote:
>>> + // (a) Don't fully count method handle adapters against inlining
>>>     ^ you have only one paragraph so (a) is not needed.
>> Yeah.  I thought maybe we get more in the future :-)  I removed it.
>>> "sites of the adapter" --> "sites in the adapter"
>> Thanks.
>>> Can you avoid assigning inside the loop's condition? You can do this instead:
>>> 
>>> +     while (iter.next() != ciBytecodeStream::EOBC()) {
>>> +       if (Bytecodes::is_invoke(iter.cur_bc())) {
>> Yes, I like that better.  I also changed the example in ciStreams.hpp as I got that code from there.
>>> Otherwise looks good.
>> Thank you.  I updated the webrev.
>> -- Christian
>>> Thanks,
>>> Vladimir
>>> 
>>> 
>>> On 8/30/11 1:07 AM, Christian Thalinger wrote:
>>>> So, the change is so small that nobody cares? :-)
>>>> 
>>>> -- Christian
>>>> 
>>>> On Aug 23, 2011, at 9:20 PM, Christian Thalinger wrote:
>>>> 
>>>>> http://cr.openjdk.java.net/~twisti/7078382/
>>>>> 
>>>>> 7078382: JSR 292: don't count method handle adapters against inlining budgets
>>>>> Reviewed-by:
>>>>> 
>>>>> Currently the code size of method handle adapters is counted against
>>>>> inlining budgets like DesiredMethodLimit.  This results in earlier
>>>>> compiler bailouts with method handle call sites than without, leading
>>>>> to worse performance.
>>>>> 
>>>>> The fix is to return an adjusted bytecode size for method handle
>>>>> adapters for inlining decisions (the metric we use for now is the
>>>>> number of invokes).
>>>>> 
>>>>> Tested with JRuby benchmarks.
>>>>> 


From christian.thalinger at oracle.com  Tue Aug 30 10:03:01 2011
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Tue, 30 Aug 2011 19:03:01 +0200
Subject: Request for reviews (S): 7078382: JSR 292: don't count method
	handle adapters against inlining budgets
In-Reply-To: <6770529E-2661-4D36-8B1A-3607CF33CAE6@oracle.com>
References: <E888AE2D-54D3-4D9C-855F-70A555D12385@oracle.com>
	<5467AE30-2D5E-4514-B0CD-6FB7F56EE420@oracle.com>
	<4E5CFAEE.2010006@oracle.com>
	<80C99D6A-8D90-4512-9548-29597313B7FD@oracle.com>
	<4E5D0596.6060703@oracle.com>
	<6770529E-2661-4D36-8B1A-3607CF33CAE6@oracle.com>
Message-ID: <8F1601DB-D389-4451-8BF2-0530028D21B2@oracle.com>

Thanks, Tom and Vladimir.  -- Christian

On Aug 30, 2011, at 6:56 PM, Tom Rodriguez wrote:

> Yes it looks good.
> 
> tom
> 
> On Aug 30, 2011, at 8:45 AM, Vladimir Kozlov wrote:
> 
>> Looks good.
>> 
>> Thanks,
>> Vladimir
>> 
>> Christian Thalinger wrote:
>>> On Aug 30, 2011, at 4:59 PM, Vladimir Kozlov wrote:
>>>> + // (a) Don't fully count method handle adapters against inlining
>>>>    ^ you have only one paragraph so (a) is not needed.
>>> Yeah.  I thought maybe we get more in the future :-)  I removed it.
>>>> "sites of the adapter" --> "sites in the adapter"
>>> Thanks.
>>>> Can you avoid assigning inside the loop's condition? You can do this instead:
>>>> 
>>>> +     while (iter.next() != ciBytecodeStream::EOBC()) {
>>>> +       if (Bytecodes::is_invoke(iter.cur_bc())) {
>>> Yes, I like that better.  I also changed the example in ciStreams.hpp as I got that code from there.
>>>> Otherwise looks good.
>>> Thank you.  I updated the webrev.
>>> -- Christian
>>>> Thanks,
>>>> Vladimir
>>>> 
>>>> 
>>>> On 8/30/11 1:07 AM, Christian Thalinger wrote:
>>>>> So, the change is so small that nobody cares? :-)
>>>>> 
>>>>> -- Christian
>>>>> 
>>>>> On Aug 23, 2011, at 9:20 PM, Christian Thalinger wrote:
>>>>> 
>>>>>> http://cr.openjdk.java.net/~twisti/7078382/
>>>>>> 
>>>>>> 7078382: JSR 292: don't count method handle adapters against inlining budgets
>>>>>> Reviewed-by:
>>>>>> 
>>>>>> Currently the code size of method handle adapters is counted against
>>>>>> inlining budgets like DesiredMethodLimit.  This results in earlier
>>>>>> compiler bailouts with method handle call sites than without, leading
>>>>>> to worse performance.
>>>>>> 
>>>>>> The fix is to return an adjusted bytecode size for method handle
>>>>>> adapters for inlining decisions (the metric we use for now is the
>>>>>> number of invokes).
>>>>>> 
>>>>>> Tested with JRuby benchmarks.
>>>>>> 
> 


From vladimir.kozlov at oracle.com  Tue Aug 30 10:47:28 2011
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 30 Aug 2011 10:47:28 -0700
Subject: RFR(S): 7066841: remove MacroAssembler::br_on_reg_cond() on sparc
In-Reply-To: <4E5D15B1.9010006@oracle.com>
References: <4E5D15B1.9010006@oracle.com>
Message-ID: <4E5D2230.70802@oracle.com>

Nice cleanup. Thank you, John.

Vladimir

John Cuthbertson wrote:
> Hi Everyone,
> 
> Can I have a couple of volunteers look over these changes? The webrev can 
> be found at: http://cr.openjdk.java.net/~johnc/7066841/webrev.0/.
> 
> These changes basically remove the macro assembler routine 
> br_on_reg_cond and replace the remaining calls to that routine, in the 
> G1 barriers, with an equivalent.
> 
> Testing: GC test suite and Kitchensink on 32/64 bit sparc with -Xint, 
> -client -Xcomp, -XX:+TieredCompilation -XX:TieredStopAtLevel=1, and 
> default. VerifyDuringGC and VerifyBeforeGC were also enabled to detect 
> missing barriers.
> 
> Thanks,
> 
> JohnC

From tom.rodriguez at oracle.com  Tue Aug 30 11:50:33 2011
From: tom.rodriguez at oracle.com (tom.rodriguez at oracle.com)
Date: Tue, 30 Aug 2011 18:50:33 +0000
Subject: hg: hsx/hotspot-comp/hotspot: 7082263:
	Reflection::resolve_field/field_get/field_set are broken
Message-ID: <20110830185035.0FA6E47231@hg.openjdk.java.net>

Changeset: 19241ae0d839
Author:    never
Date:      2011-08-30 00:54 -0700
URL:       http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/19241ae0d839

7082263: Reflection::resolve_field/field_get/field_set are broken
Reviewed-by: kvn, dholmes, stefank, coleenp

! make/linux/makefiles/mapfile-vers-debug
! make/linux/makefiles/mapfile-vers-product
! make/solaris/makefiles/debug.make
! make/solaris/makefiles/fastdebug.make
! make/solaris/makefiles/jvmg.make
- make/solaris/makefiles/mapfile-vers-nonproduct
! make/solaris/makefiles/optimized.make
! make/solaris/makefiles/product.make
! src/share/vm/precompiled.hpp
! src/share/vm/prims/jvm.cpp
! src/share/vm/prims/jvm.h
! src/share/vm/prims/unsafe.cpp
! src/share/vm/runtime/reflection.cpp
! src/share/vm/runtime/reflection.hpp
- src/share/vm/runtime/reflectionCompat.hpp


From tom.rodriguez at oracle.com  Tue Aug 30 12:08:23 2011
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Tue, 30 Aug 2011 12:08:23 -0700
Subject: Request for reviews (M): 7079673: JSR 292: C1 should inline
	bytecoded method handle adapters
In-Reply-To: <454A8FE2-FC6F-450C-9473-942E2CC4CA5D@oracle.com>
References: <454A8FE2-FC6F-450C-9473-942E2CC4CA5D@oracle.com>
Message-ID: <1C3853BB-211B-4082-950A-B837A4582775@oracle.com>

c1_GraphBuilder.cpp:

+   } else if (receiver->as_CheckCast()) {

I think this should be more robust.  The as_Phi and operand_count checks should be part of this guard instead of being asserts.
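
A minimal sketch of the guard-versus-assert distinction, using a toy node type rather than the real C1 Instruction classes. The field names and the expected operand count of 2 are assumptions for illustration only. The point is that every precondition of the special case is tested in the branch condition, so an unexpected shape falls back to the generic path instead of tripping an assert:

```cpp
#include <cassert>

// Toy stand-in for a C1 instruction; these fields are hypothetical.
struct ToyReceiver {
  bool is_check_cast;   // models receiver->as_CheckCast() != NULL
  bool input_is_phi;    // models the as_Phi check on its input
  int  operand_count;   // models the operand_count check
};

enum class Path { Specialized, Generic };

// Hypothetical expected shape, chosen only for this example.
constexpr int kExpectedOperands = 2;

// All three conditions are part of the guard, so a CheckCast whose
// input has an unexpected shape simply takes the generic path.
inline Path choose_path(const ToyReceiver& r) {
  if (r.is_check_cast && r.input_is_phi &&
      r.operand_count == kExpectedOperands) {
    return Path::Specialized;
  }
  return Path::Generic;
}

// Example use: the unexpected shapes are handled, not asserted on.
inline void guard_example() {
  assert(choose_path({true, true, 2}) == Path::Specialized);
  assert(choose_path({true, false, 2}) == Path::Generic);
  assert(choose_path({true, true, 3}) == Path::Generic);
}
```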

I assume this will be updated to do the optimization for VCS as well?  Otherwise it looks good.

tom

On Aug 30, 2011, at 9:21 AM, Christian Thalinger wrote:

> http://cr.openjdk.java.net/~twisti/7079673/
> 
> 7079673: JSR 292: C1 should inline bytecoded method handle adapters
> Reviewed-by:
> 
> The current JSR 292 support in C1 always does an invoke for method
> handle invokes which results in a lot of C2I-I2C transfers.  This
> results in very poor performance.
> 
> src/share/vm/c1/c1_GraphBuilder.cpp
> src/share/vm/c1/c1_GraphBuilder.hpp
> src/share/vm/c1/c1_Instruction.cpp
> src/share/vm/c1/c1_Instruction.hpp
> src/share/vm/classfile/javaClasses.cpp
> src/share/vm/classfile/vmSymbols.hpp
> 


From vladimir.kozlov at oracle.com  Tue Aug 30 14:26:24 2011
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 30 Aug 2011 14:26:24 -0700
Subject: Request for reviews (S): 7085137: -XX:+VerifyOops is broken
Message-ID: <4E5D5580.9010604@oracle.com>

http://cr.openjdk.java.net/~kvn/7085137/webrev

7085137: -XX:+VerifyOops is broken

I hit my new assert about different code emit size (7063629) when I specified 
-XX:+VerifyOops on sparc.  It uses set((intptr_t)msg, O0) instruction to set 
address of message which is new each time, as a result set() size could be different.
Replace set() with patchable_set() to generate 8 instructions always.
Add missing case Op_PrefetchAllocation in verification code in emit_form3_mem_reg().
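
A toy model of the size problem, assuming nothing about the real SPARC MacroAssembler beyond what is stated above: a value-dependent set() can emit a different number of instructions for different message addresses, while patchable_set() always emits 8. The thresholds and the names variable_set_length and patchable_set_length are made up for illustration:

```cpp
#include <cstdint>

// Hypothetical cost model: shorter sequences for "small" constants.
// The thresholds are illustrative, not the actual SPARC encoding rules.
constexpr int variable_set_length(std::int64_t value) {
  return (value >= -4096 && value <= 4095)     ? 1   // small immediate
       : (value >= 0 && value <= 0xFFFFFFFFLL) ? 2   // fits in 32 bits
                                               : 8;  // full 64-bit constant
}

// Fixed-length form: always the worst case, whatever the value is.
constexpr int patchable_set_length(std::int64_t /*value*/) {
  return 8;
}

// Two different "message addresses" give two different code sizes with
// the variable form, which is what a size-consistency assert would catch.
static_assert(variable_set_length(0x1000) !=
              variable_set_length(0x7FFF12345678LL),
              "variable form depends on the value");
static_assert(patchable_set_length(0x1000) ==
              patchable_set_length(0x7FFF12345678LL),
              "fixed form does not");
```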

Thanks,
Vladimir

From tom.rodriguez at oracle.com  Tue Aug 30 16:12:02 2011
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Tue, 30 Aug 2011 16:12:02 -0700
Subject: review for 7016881: JSR 292: JDI:
	sun.jvm.hotspot.utilities.AssertionFailure: index out of bounds
Message-ID: <4DE24E60-5CE6-4417-A7D9-B58C5563C8D3@oracle.com>

http://cr.openjdk.java.net/~never/7016881
1 line changed: 0 ins; 0 del; 1 mod; 233 unchg

7016881: JSR 292: JDI: sun.jvm.hotspot.utilities.AssertionFailure: index out of bounds
Reviewed-by:

This was a bug in the 7012081 changes.  A reference to rawIndex wasn't
updated to poolIndex so sometimes the wrong index was used resulting in
exceptions.  Tested with failing test.


From vladimir.kozlov at oracle.com  Tue Aug 30 15:00:48 2011
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 30 Aug 2011 15:00:48 -0700
Subject: Request for reviews (S): 7085137: -XX:+VerifyOops is broken
In-Reply-To: <9429CD3F52A14F59B559311750D61152@oracle.com>
References: <4E5D5580.9010604@oracle.com>
	<9429CD3F52A14F59B559311750D61152@oracle.com>
Message-ID: <4E5D5D90.3010302@oracle.com>

Thank you, Igor

Vladimir

Igor Veresov wrote:
>  Looks good 
> 
> igor
> 
> On Tuesday, August 30, 2011 at 2:26 PM, Vladimir Kozlov wrote:
> 
>> http://cr.openjdk.java.net/~kvn/7085137/webrev
>>
>> 7085137: -XX:+VerifyOops is broken
>>
>> I hit my new assert about different code emit size (7063629) when I specified 
>> -XX:+VerifyOops on sparc. It uses set((intptr_t)msg, O0) instruction to set 
>> address of message which is new each time, as a result set() size could be different.
>> Replace set() with patchable_set() to generate 8 instructions always.
>> Add missing case Op_PrefetchAllocation in verification code in emit_form3_mem_reg().
>>
>> Thanks,
>> Vladimir
> 
> 

From vladimir.kozlov at oracle.com  Tue Aug 30 16:22:32 2011
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 30 Aug 2011 16:22:32 -0700
Subject: review for 7016881: JSR 292: JDI:
	sun.jvm.hotspot.utilities.AssertionFailure: index out of bounds
In-Reply-To: <4DE24E60-5CE6-4417-A7D9-B58C5563C8D3@oracle.com>
References: <4DE24E60-5CE6-4417-A7D9-B58C5563C8D3@oracle.com>
Message-ID: <4E5D70B8.60304@oracle.com>

Good.

Vladimir

Tom Rodriguez wrote:
> http://cr.openjdk.java.net/~never/7016881
> 1 line changed: 0 ins; 0 del; 1 mod; 233 unchg
> 
> 7016881: JSR 292: JDI: sun.jvm.hotspot.utilities.AssertionFailure: index out of bounds
> Reviewed-by:
> 
> This was a bug in the 7012081 changes.  A reference to rawIndex wasn't
> updated to poolIndex so sometimes the wrong index was used resulting in
> exceptions.  Tested with failing test.
> 

From igor.veresov at oracle.com  Tue Aug 30 14:57:00 2011
From: igor.veresov at oracle.com (Igor Veresov)
Date: Tue, 30 Aug 2011 14:57:00 -0700
Subject: Request for reviews (S): 7085137: -XX:+VerifyOops is broken
In-Reply-To: <4E5D5580.9010604@oracle.com>
References: <4E5D5580.9010604@oracle.com>
Message-ID: <9429CD3F52A14F59B559311750D61152@oracle.com>

 Looks good 

igor

On Tuesday, August 30, 2011 at 2:26 PM, Vladimir Kozlov wrote:

> http://cr.openjdk.java.net/~kvn/7085137/webrev
> 
> 7085137: -XX:+VerifyOops is broken
> 
> I hit my new assert about different code emit size (7063629) when I specified 
> -XX:+VerifyOops on sparc. It uses set((intptr_t)msg, O0) instruction to set 
> address of message which is new each time, as a result set() size could be different.
> Replace set() with patchable_set() to generate 8 instructions always.
> Add missing case Op_PrefetchAllocation in verification code in emit_form3_mem_reg().
> 
> Thanks,
> Vladimir


From igor.veresov at oracle.com  Tue Aug 30 17:19:22 2011
From: igor.veresov at oracle.com (Igor Veresov)
Date: Tue, 30 Aug 2011 17:19:22 -0700
Subject: review(XS): 7085279: C1 overflows code buffer with VerifyOops
	and CompressedOops
Message-ID: <A9F5ED6558784E72B679CA5B7761F2F0@oracle.com>

This happens during emission of LIR_OpAllocObj. C1 assumes that a LIR instruction will fit into 1K but in this case it's not true because the allocation code is pretty large by itself and oop verification adds an order of magnitude more of additional code. 

I bumped up the size of the code per LIR operation to 2K. And also increased the size of the NMethodSizeLimit to accommodate all the verification code emitted. 
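
The failure mode can be sketched with a toy buffer model. This is an illustration under assumptions, not the C1 code: CodeBufferModel, emit_with_reserve, and the byte counts are all hypothetical. The emitter checks that a fixed reserve is still free before emitting each operation, which is only safe if no single operation is larger than the reserve:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of a code buffer; not a HotSpot class.
struct CodeBufferModel {
  std::size_t capacity;
  std::size_t used;
  std::size_t remaining() const {
    return used < capacity ? capacity - used : 0;
  }
  bool overflowed() const { return used > capacity; }
};

// Emit one operation of op_size bytes if at least `reserve` bytes are free.
// Returns false (a bailout) when the reserve is not available.
inline bool emit_with_reserve(CodeBufferModel& buf, std::size_t reserve,
                              std::size_t op_size) {
  if (buf.remaining() < reserve) return false;
  buf.used += op_size;  // no further check: the reserve is trusted
  return true;
}

inline void reserve_example() {
  const std::size_t K = 1024;
  // 1200 bytes free; one operation needs 1500 bytes (made-up numbers).
  CodeBufferModel small_reserve{4 * K, 4 * K - 1200};
  assert(emit_with_reserve(small_reserve, 1 * K, 1500));   // check passes...
  assert(small_reserve.overflowed());                      // ...buffer overruns

  CodeBufferModel large_reserve{4 * K, 4 * K - 1200};
  assert(!emit_with_reserve(large_reserve, 2 * K, 1500));  // clean bailout
  assert(!large_reserve.overflowed());
}
```

With the larger reserve the oversized operation is rejected up front, so the method is not compiled rather than the buffer being overrun.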

Webrev: http://cr.openjdk.java.net/~iveresov/7085279/webrev.00/

Thanks,
igor


From tom.rodriguez at oracle.com  Tue Aug 30 17:51:18 2011
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Tue, 30 Aug 2011 17:51:18 -0700
Subject: review(XS): 7085279: C1 overflows code buffer with VerifyOops and
	CompressedOops
In-Reply-To: <A9F5ED6558784E72B679CA5B7761F2F0@oracle.com>
References: <A9F5ED6558784E72B679CA5B7761F2F0@oracle.com>
Message-ID: <99BF7288-C64C-4F1A-93AD-70E668343872@oracle.com>


On Aug 30, 2011, at 5:19 PM, Igor Veresov wrote:

> This happens during emission of LIR_OpAllocObj. C1 assumes that a LIR instruction will fit into 1K but in this case it's not true because the allocation code is pretty large by itself and oop verification adds an order of magnitude more of additional code. 
> 
> I bumped up the size of the code per LIR operation to 2K. And also increased the size of the NMethodSizeLimit to accommodate all the verification code emitted. 

The 2K limit is fine.  I have some memory that the NMethodSizeLimit may be set at 32K because of the reach of branches on some platform.  I can't remember for sure though.

tom

> 
> Webrev: http://cr.openjdk.java.net/~iveresov/7085279/webrev.00/
> 
> Thanks,
> igor
> 


From tom.rodriguez at oracle.com  Tue Aug 30 17:53:48 2011
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Tue, 30 Aug 2011 17:53:48 -0700
Subject: Request for reviews (S): 7085137: -XX:+VerifyOops is broken
In-Reply-To: <4E5D5580.9010604@oracle.com>
References: <4E5D5580.9010604@oracle.com>
Message-ID: <432D4798-1664-4B0A-9C4D-B5A47174D5FD@oracle.com>

Looks good.

tom

On Aug 30, 2011, at 2:26 PM, Vladimir Kozlov wrote:

> http://cr.openjdk.java.net/~kvn/7085137/webrev
> 
> 7085137: -XX:+VerifyOops is broken
> 
> I hit my new assert about different code emit size (7063629) when I specified -XX:+VerifyOops on sparc.  It uses set((intptr_t)msg, O0) instruction to set address of message which is new each time, as a result set() size could be different.
> Replace set() with patchable_set() to generate 8 instructions always.
> Add missing case Op_PrefetchAllocation in verification code in emit_form3_mem_reg().
> 
> Thanks,
> Vladimir


From vladimir.kozlov at oracle.com  Tue Aug 30 17:49:52 2011
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 30 Aug 2011 17:49:52 -0700
Subject: review(XS): 7085279: C1 overflows code buffer with VerifyOops
	and CompressedOops
In-Reply-To: <A9F5ED6558784E72B679CA5B7761F2F0@oracle.com>
References: <A9F5ED6558784E72B679CA5B7761F2F0@oracle.com>
Message-ID: <4E5D8530.5050507@oracle.com>

Igor,

Maybe you need to increase the size only if VerifyOops is specified. What do you think?

Vladimir

Igor Veresov wrote:
> This happens during emission of LIR_OpAllocObj. C1 assumes that a LIR instruction will fit into 1K but in this case it's not true because the allocation code is pretty large by itself and oop verification adds an order of magnitude more of additional code. 
> 
> I bumped up the size of the code per LIR operation to 2K. And also increased the size of the NMethodSizeLimit to accommodate all the verification code emitted. 
> 
> Webrev: http://cr.openjdk.java.net/~iveresov/7085279/webrev.00/
> 
> Thanks,
> igor
> 

From igor.veresov at oracle.com  Tue Aug 30 18:09:13 2011
From: igor.veresov at oracle.com (Igor Veresov)
Date: Tue, 30 Aug 2011 18:09:13 -0700
Subject: review(XS): 7085279: C1 overflows code buffer with
	VerifyOops and CompressedOops
In-Reply-To: <99BF7288-C64C-4F1A-93AD-70E668343872@oracle.com>
References: <A9F5ED6558784E72B679CA5B7761F2F0@oracle.com>
	<99BF7288-C64C-4F1A-93AD-70E668343872@oracle.com>
Message-ID: <6EF543BAD8094CB7AAEBD9A2CED6BE2C@oracle.com>

 I think it's being taken care of here: 

static int desired_max_code_buffer_size() {
#ifndef PPC
  return (int) NMethodSizeLimit;  // default 256K or 512K
#else
  // conditional branches on PPC are restricted to 16 bit signed
  return MIN2((unsigned int)NMethodSizeLimit,32*K);
#endif
}

igor

On Tuesday, August 30, 2011 at 5:51 PM, Tom Rodriguez wrote:

> 
> On Aug 30, 2011, at 5:19 PM, Igor Veresov wrote:
> 
> > This happens during emission of LIR_OpAllocObj. C1 assumes that a LIR instruction will fit into 1K but in this case it's not true because the allocation code is pretty large by itself and oop verification adds an order of magnitude more of additional code. 
> > 
> > I bumped up the size of the code per LIR operation to 2K. And also increased the size of the NMethodSizeLimit to accommodate all the verification code emitted. 
> 
> The 2K limit is fine. I have some memory that the NMethodSizeLimit may be set at 32K because of the reach of branches on some platform. I can't remember for sure though.
> 
> tom
> 
> > 
> > Webrev: http://cr.openjdk.java.net/~iveresov/7085279/webrev.00/
> > 
> > Thanks,
> > igor


From igor.veresov at oracle.com  Tue Aug 30 18:12:29 2011
From: igor.veresov at oracle.com (Igor Veresov)
Date: Tue, 30 Aug 2011 18:12:29 -0700
Subject: review(XS): 7085279: C1 overflows code buffer with
	VerifyOops and CompressedOops
In-Reply-To: <4E5D8530.5050507@oracle.com>
References: <A9F5ED6558784E72B679CA5B7761F2F0@oracle.com>
	<4E5D8530.5050507@oracle.com>
Message-ID: <C39F39C09D4E4946870C9524DAFA92A4@oracle.com>

I just thought that it might need to be adjusted anyway for 64bit. I haven't seen any problems with that (because of the inlining constraints), but intuitively we would hit the limit sooner on 64 bit. 

igor

On Tuesday, August 30, 2011 at 5:49 PM, Vladimir Kozlov wrote:

> Igor,
> 
> Maybe you need to increase the size only if VerifyOops is specified. What do you think?
> 
> Vladimir
> 
> Igor Veresov wrote:
> > This happens during emission of LIR_OpAllocObj. C1 assumes that a LIR instruction will fit into 1K but in this case it's not true because the allocation code is pretty large by itself and oop verification adds an order of magnitude more of additional code. 
> > 
> > I bumped up the size of the code per LIR operation to 2K. And also increased the size of the NMethodSizeLimit to accommodate all the verification code emitted. 
> > 
> > Webrev: http://cr.openjdk.java.net/~iveresov/7085279/webrev.00/
> > 
> > Thanks,
> > igor


From vladimir.kozlov at oracle.com  Tue Aug 30 18:17:40 2011
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 30 Aug 2011 18:17:40 -0700
Subject: review(XS): 7085279: C1 overflows code buffer with VerifyOops
	and CompressedOops
In-Reply-To: <C39F39C09D4E4946870C9524DAFA92A4@oracle.com>
References: <A9F5ED6558784E72B679CA5B7761F2F0@oracle.com>	<4E5D8530.5050507@oracle.com>
	<C39F39C09D4E4946870C9524DAFA92A4@oracle.com>
Message-ID: <4E5D8BB4.7020505@oracle.com>

Then it is fine. Changes look good.

Vladimir

Igor Veresov wrote:
> I just thought that it might need to be adjusted anyway for 64bit. I haven't seen any problems with that (because of the inlining constraints), but intuitively we would hit the limit sooner on 64 bit. 
> 
> igor
> 
> On Tuesday, August 30, 2011 at 5:49 PM, Vladimir Kozlov wrote:
> 
>> Igor,
>>
>> Maybe you need to increase the size only if VerifyOops is specified. What do you think?
>>
>> Vladimir
>>
>> Igor Veresov wrote:
>>> This happens during emission of LIR_OpAllocObj. C1 assumes that a LIR instruction will fit into 1K but in this case it's not true because the allocation code is pretty large by itself and oop verification adds an order of magnitude more of additional code. 
>>>
>>> I bumped up the size of the code per LIR operation to 2K. And also increased the size of the NMethodSizeLimit to accommodate all the verification code emitted. 
>>>
>>> Webrev: http://cr.openjdk.java.net/~iveresov/7085279/webrev.00/
>>>
>>> Thanks,
>>> igor
> 
> 

From tom.rodriguez at oracle.com  Tue Aug 30 18:24:08 2011
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Tue, 30 Aug 2011 18:24:08 -0700
Subject: review(XS): 7085279: C1 overflows code buffer with VerifyOops and
	CompressedOops
In-Reply-To: <C39F39C09D4E4946870C9524DAFA92A4@oracle.com>
References: <A9F5ED6558784E72B679CA5B7761F2F0@oracle.com>
	<4E5D8530.5050507@oracle.com>
	<C39F39C09D4E4946870C9524DAFA92A4@oracle.com>
Message-ID: <431272B6-2C57-49BD-97DD-C8721A54E32C@oracle.com>


On Aug 30, 2011, at 6:12 PM, Igor Veresov wrote:

> I just thought that it might need to be adjusted anyway for 64bit. I haven't seen any problems with that (because of the inlining constraints), but intuitively we would hit the limit sooner on 64 bit. 

It's 32k * wordSize so it's already twice as big on 64 bit.  We might want to revisit these limits for tiered though since profiling generates quite a bit of extra code.
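
As a quick check of that arithmetic, a sketch that assumes only the 32k * wordSize formula above (the helper name nmethod_size_limit is made up here):

```cpp
#include <cstddef>

constexpr std::size_t K = 1024;

// Hypothetical helper: the limit scales with the machine word size.
constexpr std::size_t nmethod_size_limit(std::size_t word_size) {
  return 32 * K * word_size;
}

static_assert(nmethod_size_limit(4) == 128 * K, "32-bit: 128K");
static_assert(nmethod_size_limit(8) == 256 * K, "64-bit: 256K");
static_assert(nmethod_size_limit(8) == 2 * nmethod_size_limit(4),
              "twice as big on 64 bit");
```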

tom

> 
> igor
> 
> On Tuesday, August 30, 2011 at 5:49 PM, Vladimir Kozlov wrote:
> 
>> Igor,
>> 
>> Maybe you need to increase the size only if VerifyOops is specified. What do you think?
>> 
>> Vladimir
>> 
>> Igor Veresov wrote:
>>> This happens during emission of LIR_OpAllocObj. C1 assumes that a LIR instruction will fit into 1K but in this case it's not true because the allocation code is pretty large by itself and oop verification adds an order of magnitude more of additional code. 
>>> 
>>> I bumped up the size of the code per LIR operation to 2K. And also increased the size of the NMethodSizeLimit to accommodate all the verification code emitted. 
>>> 
>>> Webrev: http://cr.openjdk.java.net/~iveresov/7085279/webrev.00/
>>> 
>>> Thanks,
>>> igor
> 
> 


From tom.rodriguez at oracle.com  Tue Aug 30 18:24:19 2011
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Tue, 30 Aug 2011 18:24:19 -0700
Subject: review(XS): 7085279: C1 overflows code buffer with VerifyOops and
	CompressedOops
In-Reply-To: <6EF543BAD8094CB7AAEBD9A2CED6BE2C@oracle.com>
References: <A9F5ED6558784E72B679CA5B7761F2F0@oracle.com>
	<99BF7288-C64C-4F1A-93AD-70E668343872@oracle.com>
	<6EF543BAD8094CB7AAEBD9A2CED6BE2C@oracle.com>
Message-ID: <E210798B-E2F6-449B-B304-DCAC452A3F59@oracle.com>


On Aug 30, 2011, at 6:09 PM, Igor Veresov wrote:

> I think it's being taken care of here: 
> 
> static int desired_max_code_buffer_size() {
> #ifndef PPC
> return (int) NMethodSizeLimit; // default 256K or 512K
> #else
> // conditional branches on PPC are restricted to 16 bit signed
> return MIN2((unsigned int)NMethodSizeLimit,32*K);
> #endif
> }

Ah, that's what I'm thinking of.

tom

> 
> igor
> 
> On Tuesday, August 30, 2011 at 5:51 PM, Tom Rodriguez wrote:
> 
>> 
>> On Aug 30, 2011, at 5:19 PM, Igor Veresov wrote:
>> 
>>> This happens during emission of LIR_OpAllocObj. C1 assumes that a LIR instruction will fit into 1K but in this case it's not true because the allocation code is pretty large by itself and oop verification adds an order of magnitude more of additional code. 
>>> 
>>> I bumped up the size of the code per LIR operation to 2K. And also increased the size of the NMethodSizeLimit to accommodate all the verification code emitted. 
>> 
>> The 2K limit is fine. I have some memory that the NMethodSizeLimit may be set at 32K because of the reach of branches on some platform. I can't remember for sure though.
>> 
>> tom
>> 
>>> 
>>> Webrev: http://cr.openjdk.java.net/~iveresov/7085279/webrev.00/
>>> 
>>> Thanks,
>>> igor
> 
> 


From vladimir.kozlov at ORACLE.COM  Tue Aug 30 18:14:15 2011
From: vladimir.kozlov at ORACLE.COM (Vladimir Kozlov)
Date: Tue, 30 Aug 2011 18:14:15 -0700
Subject: Request for reviews (S): 7085137: -XX:+VerifyOops is broken
In-Reply-To: <432D4798-1664-4B0A-9C4D-B5A47174D5FD@oracle.com>
References: <4E5D5580.9010604@oracle.com>
	<432D4798-1664-4B0A-9C4D-B5A47174D5FD@oracle.com>
Message-ID: <4E5D8AE7.2050404@oracle.com>

Thank you, Tom

Vladimir

Tom Rodriguez wrote:
> Looks good.
> 
> tom
> 
> On Aug 30, 2011, at 2:26 PM, Vladimir Kozlov wrote:
> 
>> http://cr.openjdk.java.net/~kvn/7085137/webrev
>>
>> 7085137: -XX:+VerifyOops is broken
>>
>> I hit my new assert about different code emit size (7063629) when I specified -XX:+VerifyOops on sparc.  It uses set((intptr_t)msg, O0) instruction to set address of message which is new each time, as a result set() size could be different.
>> Replace set() with patchable_set() to generate 8 instructions always.
>> Add missing case Op_PrefetchAllocation in verification code in emit_form3_mem_reg().
>>
>> Thanks,
>> Vladimir
> 

From igor.veresov at oracle.com  Tue Aug 30 18:37:25 2011
From: igor.veresov at oracle.com (Igor Veresov)
Date: Tue, 30 Aug 2011 18:37:25 -0700
Subject: review(XS): 7085279: C1 overflows code buffer with
	VerifyOops and CompressedOops
In-Reply-To: <431272B6-2C57-49BD-97DD-C8721A54E32C@oracle.com>
References: <A9F5ED6558784E72B679CA5B7761F2F0@oracle.com>
	<4E5D8530.5050507@oracle.com>
	<C39F39C09D4E4946870C9524DAFA92A4@oracle.com>
	<431272B6-2C57-49BD-97DD-C8721A54E32C@oracle.com>
Message-ID: <CB569286A2E04D2D8EB0B4F7BA1378EB@oracle.com>

On Tuesday, August 30, 2011 at 6:24 PM, Tom Rodriguez wrote:
> 
> On Aug 30, 2011, at 6:12 PM, Igor Veresov wrote:
> 
> > I just thought that it might need to be adjusted anyway for 64bit. I haven't seen any problems with that (because of the inlining constraints), but intuitively we would hit the limit sooner on 64 bit. 
> 
> It's 32k * wordSize so it's already twice as big on 64 bit. We might want to revisit these limits for tiered though since profiling generates quite a bit of extra code.
> 
Yes, of course you're right. I was thinking about something else when I replied...

I guess we could make the increase predicated upon the verification, but I thought it should be pretty harmless to increase it since those buffers are allocated only once per compiler thread.

igor

> tom
> 
> > 
> > igor
> > 
> > On Tuesday, August 30, 2011 at 5:49 PM, Vladimir Kozlov wrote:
> > 
> > > Igor,
> > > 
> > > Maybe you need to increase the size only if VerifyOops is specified. What do you think?
> > > 
> > > Vladimir
> > > 
> > > Igor Veresov wrote:
> > > > This happens during emission of LIR_OpAllocObj. C1 assumes that a LIR instruction will fit into 1K but in this case it's not true because the allocation code is pretty large by itself and oop verification adds an order of magnitude more of additional code. 
> > > > 
> > > > I bumped up the size of the code per LIR operation to 2K. And also increased the size of the NMethodSizeLimit to accommodate all the verification code emitted. 
> > > > 
> > > > Webrev: http://cr.openjdk.java.net/~iveresov/7085279/webrev.00/
> > > > 
> > > > Thanks,
> > > > igor


From tom.rodriguez at oracle.com  Tue Aug 30 18:38:07 2011
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Tue, 30 Aug 2011 18:38:07 -0700
Subject: review(XS): 7085279: C1 overflows code buffer with VerifyOops and
	CompressedOops
In-Reply-To: <CB569286A2E04D2D8EB0B4F7BA1378EB@oracle.com>
References: <A9F5ED6558784E72B679CA5B7761F2F0@oracle.com>
	<4E5D8530.5050507@oracle.com>
	<C39F39C09D4E4946870C9524DAFA92A4@oracle.com>
	<431272B6-2C57-49BD-97DD-C8721A54E32C@oracle.com>
	<CB569286A2E04D2D8EB0B4F7BA1378EB@oracle.com>
Message-ID: <F0D02B33-A00D-44F7-A34A-F5DA3DA67245@oracle.com>


On Aug 30, 2011, at 6:37 PM, Igor Veresov wrote:

> On Tuesday, August 30, 2011 at 6:24 PM, Tom Rodriguez wrote:
>> 
>> On Aug 30, 2011, at 6:12 PM, Igor Veresov wrote:
>> 
>>> I just thought that it might need to be adjusted anyway for 64bit. I haven't seen any problems with that (because of the inlining constraints), but intuitively we would hit the limit sooner on 64 bit. 
>> 
>> It's 32k * wordSize so it's already twice as big on 64 bit. We might want to revisit these limits for tiered though since profiling generates quite a bit of extra code.
>> 
> Yes, of course you're right. I was thinking about something else when I replied...
> 
> I guess we could make the increase predicated upon the verification, but I thought it should be pretty harmless to increase it since those buffers are allocated only once per compiler thread.

Either way.

tom

> 
> igor
> 
>> tom
>> 
>>> 
>>> igor
>>> 
>>> On Tuesday, August 30, 2011 at 5:49 PM, Vladimir Kozlov wrote:
>>> 
>>>> Igor,
>>>> 
>>>> Maybe you need to increase the size only if VerifyOops is specified. What do you think?
>>>> 
>>>> Vladimir
>>>> 
>>>> Igor Veresov wrote:
>>>>> This happens during emission of LIR_OpAllocObj. C1 assumes that a LIR instruction will fit into 1K but in this case it's not true because the allocation code is pretty large by itself and oop verification adds an order of magnitude more of additional code. 
>>>>> 
>>>>> I bumped up the size of the code per LIR operation to 2K. And also increased the size of the NMethodSizeLimit to accommodate all the verification code emitted. 
>>>>> 
>>>>> Webrev: http://cr.openjdk.java.net/~iveresov/7085279/webrev.00/
>>>>> 
>>>>> Thanks,
>>>>> igor
> 
> 


From igor.veresov at oracle.com  Tue Aug 30 18:42:45 2011
From: igor.veresov at oracle.com (Igor Veresov)
Date: Tue, 30 Aug 2011 18:42:45 -0700
Subject: review(XS): 7085279: C1 overflows code buffer with
	VerifyOops and CompressedOops
In-Reply-To: <F0D02B33-A00D-44F7-A34A-F5DA3DA67245@oracle.com>
References: <A9F5ED6558784E72B679CA5B7761F2F0@oracle.com>
	<4E5D8530.5050507@oracle.com>
	<C39F39C09D4E4946870C9524DAFA92A4@oracle.com>
	<431272B6-2C57-49BD-97DD-C8721A54E32C@oracle.com>
	<CB569286A2E04D2D8EB0B4F7BA1378EB@oracle.com>
	<F0D02B33-A00D-44F7-A34A-F5DA3DA67245@oracle.com>
Message-ID: <AEA5363CB33C4F31AB0B3DF92247E5EA@oracle.com>

 I'll just go with increasing it. Otherwise we'll have to factor in tiered, compressed oops, verification. 

Thanks Tom and Vladimir! 

igor

On Tuesday, August 30, 2011 at 6:38 PM, Tom Rodriguez wrote:

> 
> On Aug 30, 2011, at 6:37 PM, Igor Veresov wrote:
> 
> > On Tuesday, August 30, 2011 at 6:24 PM, Tom Rodriguez wrote:
> > > 
> > > On Aug 30, 2011, at 6:12 PM, Igor Veresov wrote:
> > > 
> > > > I just thought that it might need to be adjusted anyway for 64bit. I haven't seen any problems with that (because of the inlining constraints), but intuitively we would hit the limit sooner on 64 bit. 
> > > 
> > > It's 32k * wordSize so it's already twice as big on 64 bit. We might want to revisit these limits for tiered though since profiling generates quite a bit of extra code.
> > Yes, of course you're right. I was thinking about something else when I replied...
> > 
> > I guess we could make the increase predicated upon the verification, but I thought it should be pretty harmless to increase it since those buffers are allocated only once per compiler thread.
> 
> Either way.
> 
> tom
> 
> > 
> > igor
> > 
> > > tom
> > > 
> > > > 
> > > > igor
> > > > 
> > > > On Tuesday, August 30, 2011 at 5:49 PM, Vladimir Kozlov wrote:
> > > > 
> > > > > Igor,
> > > > > 
> > > > > Maybe you need to increase the size only if VerifyOops is specified. What do you think?
> > > > > 
> > > > > Vladimir
> > > > > 
> > > > > Igor Veresov wrote:
> > > > > > This happens during emission of LIR_OpAllocObj. C1 assumes that a LIR instruction will fit into 1K but in this case it's not true because the allocation code is pretty large by itself and oop verification adds an order of magnitude more of additional code. 
> > > > > > 
> > > > > > I bumped up the size of the code per LIR operation to 2K. And also increased the size of the NMethodSizeLimit to accommodate all the verification code emitted. 
> > > > > > 
> > > > > > Webrev: http://cr.openjdk.java.net/~iveresov/7085279/webrev.00/
> > > > > > 
> > > > > > Thanks,
> > > > > > igor


From igor.veresov at oracle.com  Tue Aug 30 18:47:36 2011
From: igor.veresov at oracle.com (Igor Veresov)
Date: Tue, 30 Aug 2011 18:47:36 -0700
Subject: RFR(S): 7066841: remove MacroAssembler::br_on_reg_cond() on sparc
In-Reply-To: <4E5D15B1.9010006@oracle.com>
References: <4E5D15B1.9010006@oracle.com>
Message-ID: <4EA6DFEB650F440C8DCBD000C1F07B34@oracle.com>

 Looks good. 

igor

On Tuesday, August 30, 2011 at 9:54 AM, John Cuthbertson wrote:

> Hi Everyone,
> 
> Can I have a couple of volunteers look over these changes? The webrev can 
> be found at: http://cr.openjdk.java.net/~johnc/7066841/webrev.0/.
> 
> These changes basically remove the macro assembler routine 
> br_on_reg_cond and replace the remaining calls to that routine, in the 
> G1 barriers, with an equivalent.
> 
> Testing: GC test suite and Kitchensink on 32/64 bit sparc with -Xint, 
> -client -Xcomp, -XX:+TieredCompilation -XX:TieredStopAtLevel=1, and 
> default. VerifyDuringGC and VerifyBeforeGC were also enabled to detect 
> missing barriers.
> 
> Thanks,
> 
> JohnC


From igor.veresov at oracle.com  Tue Aug 30 21:28:26 2011
From: igor.veresov at oracle.com (igor.veresov at oracle.com)
Date: Wed, 31 Aug 2011 04:28:26 +0000
Subject: hg: hsx/hotspot-comp/hotspot: 7085279: C1 overflows code buffer with
	VerifyOops and CompressedOops
Message-ID: <20110831042828.9290C47248@hg.openjdk.java.net>

Changeset: b346f13112d8
Author:    iveresov
Date:      2011-08-30 19:01 -0700
URL:       http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/b346f13112d8

7085279: C1 overflows code buffer with VerifyOops and CompressedOops
Summary: Increase the limit of code emitted per LIR instruction, increase the max size of the nmethod generated by C1
Reviewed-by: never, kvn, johnc

! src/share/vm/c1/c1_LIRAssembler.cpp
! src/share/vm/c1/c1_globals.hpp


From christian.thalinger at oracle.com  Wed Aug 31 03:36:52 2011
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Wed, 31 Aug 2011 12:36:52 +0200
Subject: Request for reviews (M): 7079673: JSR 292: C1 should inline
	bytecoded method handle adapters
In-Reply-To: <1C3853BB-211B-4082-950A-B837A4582775@oracle.com>
References: <454A8FE2-FC6F-450C-9473-942E2CC4CA5D@oracle.com>
	<1C3853BB-211B-4082-950A-B837A4582775@oracle.com>
Message-ID: <CB3CC88E-1E34-49A9-B706-F38DD2D53316@oracle.com>


On Aug 30, 2011, at 9:08 PM, Tom Rodriguez wrote:

> c1_GraphBuilder.cpp:
> 
> +   } else if (receiver->as_CheckCast()) {
> 
> I think this should be more robust.  The as_Phi and operand_count checks should be part of this guard instead of being asserts.

I changed that and updated the webrev.

> 
> I assume this will be updated to do the optimization for VCS as well?  Otherwise it looks good.

For the VCS optimization, I decided to split that off into its own CR since there were a couple of overlaps between C1 and C2.  It's covered by:

7085404: JSR 292: VolatileCallSites should have push notification too

http://cr.openjdk.java.net/~twisti/7085404/

To get this right the order of pushing these related CRs will be:

1. 7079673: JSR 292: C1 should inline bytecoded method handle adapters
2. 7085404: JSR 292: VolatileCallSites should have push notification too
3. 7071709: JSR 292: switchpoint invalidation should be pushed not pulled

-- Christian

> 
> tom
> 
> On Aug 30, 2011, at 9:21 AM, Christian Thalinger wrote:
> 
>> http://cr.openjdk.java.net/~twisti/7079673/
>> 
>> 7079673: JSR 292: C1 should inline bytecoded method handle adapters
>> Reviewed-by:
>> 
>> The current JSR 292 support in C1 always does an invoke for method
>> handle invokes which results in a lot of C2I-I2C transfers.  This
>> results in very poor performance.
>> 
>> src/share/vm/c1/c1_GraphBuilder.cpp
>> src/share/vm/c1/c1_GraphBuilder.hpp
>> src/share/vm/c1/c1_Instruction.cpp
>> src/share/vm/c1/c1_Instruction.hpp
>> src/share/vm/classfile/javaClasses.cpp
>> src/share/vm/classfile/vmSymbols.hpp
>> 
> 


From christian.thalinger at oracle.com  Wed Aug 31 03:42:58 2011
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Wed, 31 Aug 2011 12:42:58 +0200
Subject: review for 7016881: JSR 292: JDI:
	sun.jvm.hotspot.utilities.AssertionFailure: index out of bounds
In-Reply-To: <4DE24E60-5CE6-4417-A7D9-B58C5563C8D3@oracle.com>
References: <4DE24E60-5CE6-4417-A7D9-B58C5563C8D3@oracle.com>
Message-ID: <BFA72CD2-5F7E-4E8E-B92D-FD32B0A53076@oracle.com>

Looks good.  -- Christian

On Aug 31, 2011, at 1:12 AM, Tom Rodriguez wrote:

> http://cr.openjdk.java.net/~never/7016881
> 1 line changed: 0 ins; 0 del; 1 mod; 233 unchg
> 
> 7016881: JSR 292: JDI: sun.jvm.hotspot.utilities.AssertionFailure: index out of bounds
> Reviewed-by:
> 
> This was a bug in the 7012081 changes.  A reference to rawIndex wasn't
> updated to poolIndex so sometimes the wrong index was used resulting in
> exceptions.  Tested with failing test.
> 


From christian.thalinger at oracle.com  Wed Aug 31 03:42:07 2011
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Wed, 31 Aug 2011 12:42:07 +0200
Subject: Request for reviews (S): 7071709: JSR 292: switchpoint
	invalidation should be pushed not pulled
In-Reply-To: <A99BE02A-9B40-48F0-A8AA-18E329C56153@oracle.com>
References: <5F4038AD-6959-480E-9DB8-1DEF17D6C4A6@oracle.com>
	<52852391-3B23-4326-B75C-D2CB502C52AF@oracle.com>
	<E90E01D9-DAA5-465A-B553-11842E5ED514@oracle.com>
	<F763B66B-FDC7-4D33-97E4-820D07DD62FB@oracle.com>
	<9EC6D299-AE3B-44C7-AC71-5526AB810557@oracle.com>
	<0EEADB80-E8F9-49B7-BF9C-9FD4A50BD73D@oracle.com>
	<2F4D4364-320E-4CCD-A6CB-28E1535FBACF@oracle.com>
	<A99BE02A-9B40-48F0-A8AA-18E329C56153@oracle.com>
Message-ID: <DF313D32-4A4A-41F8-AF7F-6D078191B7D9@oracle.com>


On Aug 29, 2011, at 9:41 PM, John Rose wrote:

> Yes, deal with volatile fields later. 
> 
> I do think that VCS should get push notif now. 

They will:

7085404: JSR 292: VolatileCallSites should have push notification too

http://cr.openjdk.java.net/~twisti/7085404/

This patch now only contains the SwitchPoint optimization and will be pushed as the last of my fixes (as stated in an earlier email):

http://cr.openjdk.java.net/~twisti/7071709/
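
For readers less familiar with the API involved, here is a minimal sketch of what SwitchPoint invalidation looks like from the Java side. It only uses the public java.lang.invoke API; the class and method names (SwitchPointSketch, fast, slow, run) are made up for illustration and have nothing to do with the patch itself, which changes how the compilers react to the invalidation.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.SwitchPoint;

// Illustrative only: class and method names are hypothetical.
final class SwitchPointSketch {
    static String fast() { return "fast"; }
    static String slow() { return "slow"; }

    // Returns the results observed before and after invalidation.
    static String[] run() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodType type = MethodType.methodType(String.class);
            MethodHandle fast = lookup.findStatic(SwitchPointSketch.class, "fast", type);
            MethodHandle slow = lookup.findStatic(SwitchPointSketch.class, "slow", type);

            SwitchPoint sp = new SwitchPoint();
            // While sp is valid the guarded handle calls 'fast';
            // once sp is invalidated it calls 'slow' forever after.
            MethodHandle guarded = sp.guardWithTest(fast, slow);

            String before = (String) guarded.invokeExact();
            SwitchPoint.invalidateAll(new SwitchPoint[] { sp });
            String after = (String) guarded.invokeExact();
            return new String[] { before, after };
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        String[] r = run();
        System.out.println(r[0] + " then " + r[1]);
    }
}
```

With "pulled" invalidation, compiled code would have to re-check the SwitchPoint state on each call; with "pushed" invalidation, the call to invalidateAll is what triggers the update of dependent code.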

Tom, John, can you review this again?

-- Christian

> 
> -- John  (on my iPhone)
> 
> On Aug 29, 2011, at 11:56 AM, Christian Thalinger <christian.thalinger at oracle.com> wrote:
> 
>> Yes, I think that would be the best approach.  John, what do you think, optimize CCS and MCS for now and deal with VCS later?


From christian.thalinger at oracle.com  Wed Aug 31 08:18:21 2011
From: christian.thalinger at oracle.com (christian.thalinger at oracle.com)
Date: Wed, 31 Aug 2011 15:18:21 +0000
Subject: hg: hsx/hotspot-comp/hotspot: 7078382: JSR 292: don't count method
	handle adapters against inlining budgets
Message-ID: <20110831151823.8BB7B47261@hg.openjdk.java.net>

Changeset: de847cac9235
Author:    twisti
Date:      2011-08-31 01:40 -0700
URL:       http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/de847cac9235

7078382: JSR 292: don't count method handle adapters against inlining budgets
Reviewed-by: kvn, never

! src/share/vm/c1/c1_GraphBuilder.cpp
! src/share/vm/ci/ciMethod.cpp
! src/share/vm/ci/ciMethod.hpp
! src/share/vm/ci/ciStreams.hpp
! src/share/vm/interpreter/bytecodes.hpp
! src/share/vm/opto/bytecodeInfo.cpp


From tom.rodriguez at oracle.com  Wed Aug 31 10:24:30 2011
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Wed, 31 Aug 2011 10:24:30 -0700
Subject: Request for reviews (M): 7079673: JSR 292: C1 should inline
	bytecoded method handle adapters
In-Reply-To: <CB3CC88E-1E34-49A9-B706-F38DD2D53316@oracle.com>
References: <454A8FE2-FC6F-450C-9473-942E2CC4CA5D@oracle.com>
	<1C3853BB-211B-4082-950A-B837A4582775@oracle.com>
	<CB3CC88E-1E34-49A9-B706-F38DD2D53316@oracle.com>
Message-ID: <E57355D8-EC9F-495A-B980-4EBC8BA331BE@oracle.com>


On Aug 31, 2011, at 3:36 AM, Christian Thalinger wrote:

> 
> On Aug 30, 2011, at 9:08 PM, Tom Rodriguez wrote:
> 
>> c1_GraphBuilder.cpp:
>> 
>> +   } else if (receiver->as_CheckCast()) {
>> 
>> I think this should be more robust.  The as_Phi and operand_count checks should be part of this guard instead of being asserts.
> 
> I changed that and updated the webrev.
> 
>> 
>> I assume this will be updated to do the optimization for VCS as well?  Otherwise it looks good.
> 
> For the VCS optimization, I decided to split that off into its own CR since there were a couple of overlaps between C1 and C2.  It's covered by:
> 
> 7085404: JSR 292: VolatileCallSites should have push notification too
> 
> http://cr.openjdk.java.net/~twisti/7085404/
> 
> To get this right the order of pushing these related CRs will be:
> 
> 1. 7079673: JSR 292: C1 should inline bytecoded method handle adapters
> 2. 7085404: JSR 292: VolatileCallSites should have push notification too
> 3. 7071709: JSR 292: switchpoint invalidation should be pushed not pulled

These all look good.

tom

> 
> -- Christian
> 
>> 
>> tom
>> 
>> On Aug 30, 2011, at 9:21 AM, Christian Thalinger wrote:
>> 
>>> http://cr.openjdk.java.net/~twisti/7079673/
>>> 
>>> 7079673: JSR 292: C1 should inline bytecoded method handle adapters
>>> Reviewed-by:
>>> 
>>> The current JSR 292 support in C1 always does an invoke for method
>>> handle invokes which results in a lot of C2I-I2C transfers.  This
>>> results in very poor performance.
>>> 
>>> src/share/vm/c1/c1_GraphBuilder.cpp
>>> src/share/vm/c1/c1_GraphBuilder.hpp
>>> src/share/vm/c1/c1_Instruction.cpp
>>> src/share/vm/c1/c1_Instruction.hpp
>>> src/share/vm/classfile/javaClasses.cpp
>>> src/share/vm/classfile/vmSymbols.hpp
>>> 
>> 
> 


From christian.thalinger at oracle.com  Wed Aug 31 11:45:14 2011
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Wed, 31 Aug 2011 20:45:14 +0200
Subject: Request for reviews (M): 7079673: JSR 292: C1 should inline
	bytecoded method handle adapters
In-Reply-To: <E57355D8-EC9F-495A-B980-4EBC8BA331BE@oracle.com>
References: <454A8FE2-FC6F-450C-9473-942E2CC4CA5D@oracle.com>
	<1C3853BB-211B-4082-950A-B837A4582775@oracle.com>
	<CB3CC88E-1E34-49A9-B706-F38DD2D53316@oracle.com>
	<E57355D8-EC9F-495A-B980-4EBC8BA331BE@oracle.com>
Message-ID: <ED279DDF-C54F-4C52-BC14-719910955CCD@oracle.com>


On Aug 31, 2011, at 7:24 PM, Tom Rodriguez wrote:

> 
> On Aug 31, 2011, at 3:36 AM, Christian Thalinger wrote:
> 
>> 
>> On Aug 30, 2011, at 9:08 PM, Tom Rodriguez wrote:
>> 
>>> c1_GraphBuilder.cpp:
>>> 
>>> +   } else if (receiver->as_CheckCast()) {
>>> 
>>> I think this should be more robust.  The as_Phi and operand_count checks should be part of this guard instead of being asserts.
>> 
>> I changed that and updated the webrev.
>> 
>>> 
>>> I assume this will be updated to do the optimization for VCS as well?  Otherwise it looks good.
>> 
>> For the VCS optimization, I decided to split that off into its own CR since there were a couple of overlaps between C1 and C2.  It's covered by:
>> 
>> 7085404: JSR 292: VolatileCallSites should have push notification too
>> 
>> http://cr.openjdk.java.net/~twisti/7085404/
>> 
>> To get this right the order of pushing these related CRs will be:
>> 
>> 1. 7079673: JSR 292: C1 should inline bytecoded method handle adapters
>> 2. 7085404: JSR 292: VolatileCallSites should have push notification too
>> 3. 7071709: JSR 292: switchpoint invalidation should be pushed not pulled
> 
> These all look good.

Thanks, Tom.  -- Christian

> 
> tom
> 
>> 
>> -- Christian
>> 
>>> 
>>> tom
>>> 
>>> On Aug 30, 2011, at 9:21 AM, Christian Thalinger wrote:
>>> 
>>>> http://cr.openjdk.java.net/~twisti/7079673/
>>>> 
>>>> 7079673: JSR 292: C1 should inline bytecoded method handle adapters
>>>> Reviewed-by:
>>>> 
>>>> The current JSR 292 support in C1 always does an invoke for method
>>>> handle invokes which results in a lot of C2I-I2C transfers.  This
>>>> results in very poor performance.
>>>> 
>>>> src/share/vm/c1/c1_GraphBuilder.cpp
>>>> src/share/vm/c1/c1_GraphBuilder.hpp
>>>> src/share/vm/c1/c1_Instruction.cpp
>>>> src/share/vm/c1/c1_Instruction.hpp
>>>> src/share/vm/classfile/javaClasses.cpp
>>>> src/share/vm/classfile/vmSymbols.hpp
>>>> 
>>> 
>> 
> 


From vladimir.kozlov at oracle.com  Wed Aug 31 12:08:23 2011
From: vladimir.kozlov at oracle.com (vladimir.kozlov at oracle.com)
Date: Wed, 31 Aug 2011 19:08:23 +0000
Subject: hg: hsx/hotspot-comp/hotspot: 7085137: -XX:+VerifyOops is broken
Message-ID: <20110831190826.731244726B@hg.openjdk.java.net>

Changeset: a64d352d1118
Author:    kvn
Date:      2011-08-31 09:48 -0700
URL:       http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/a64d352d1118

7085137: -XX:+VerifyOops is broken
Summary: Replace set() with patchable_set() to generate 8 instructions always.
Reviewed-by: iveresov, never, roland

! src/cpu/sparc/vm/assembler_sparc.cpp
! src/cpu/sparc/vm/sparc.ad


From tom.rodriguez at oracle.com  Wed Aug 31 12:56:38 2011
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Wed, 31 Aug 2011 12:56:38 -0700
Subject: review for 7051798: SA-JDI: NPE in
	Frame.addressOfStackSlot(Frame.java:244)
Message-ID: <6D60141A-85A4-4762-946E-A4509CDA2CAA@oracle.com>

http://cr.openjdk.java.net/~never/7051798
1346 lines changed: 585 ins; 637 del; 124 mod; 26143 unchg

7051798: SA-JDI: NPE in Frame.addressOfStackSlot(Frame.java:244)
Reviewed-by:

The SA was never updated to handle ricochet frames so stack walking
was broken when they were encountered.  The X86 stack walking code
hadn't been updated in a while so I sync'ed it with the current version
of frame_x86.cpp and eliminated the AMD64 variants of many of these
classes since they should be exactly the same.  All SA related
exceptions in the mlvm test have been fixed.  I had to convert the
PcDesc flags into masks since the SA can't deal with bitfields.
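
As a rough illustration of the bitfield-to-mask conversion, the Java side can decode such flags from a single int as sketched below. The class name, flag names and bit positions here are assumptions made for the example, not the actual PcDesc layout; the point is only that a plain integer field plus masks can be read at a known offset, which presumably is what makes it workable for the SA.

```java
// Hypothetical sketch: names and bit positions are illustrative assumptions,
// not the real PcDesc definition.
final class PcDescFlagsSketch {
    // One mask per flag, all packed into a single int field.
    static final int REEXECUTE_MASK     = 1 << 0;
    static final int METHOD_HANDLE_MASK = 1 << 1;
    static final int RETURN_OOP_MASK    = 1 << 2;

    private final int flags;

    PcDescFlagsSketch(int flags) {
        this.flags = flags;
    }

    // Each accessor tests exactly one bit of the packed field.
    boolean shouldReexecute()      { return (flags & REEXECUTE_MASK) != 0; }
    boolean isMethodHandleInvoke() { return (flags & METHOD_HANDLE_MASK) != 0; }
    boolean returnsOop()           { return (flags & RETURN_OOP_MASK) != 0; }

    public static void main(String[] args) {
        PcDescFlagsSketch d =
            new PcDescFlagsSketch(METHOD_HANDLE_MASK | RETURN_OOP_MASK);
        System.out.println(d.shouldReexecute());
        System.out.println(d.isMethodHandleInvoke());
        System.out.println(d.returnsOop());
    }
}
```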

Because of some JDI features being used by the test I had to fix other
unreported SAJDI issues when asking for locals for optimized and
native frames.  I also hit an unreported assertion failure in C1 with
large frames.

Tested with failing mlvm sajdi tests from report plus the regular
tmtools and sajdi test to stress the stack walking.


From john.r.rose at oracle.com  Wed Aug 31 14:34:26 2011
From: john.r.rose at oracle.com (John Rose)
Date: Wed, 31 Aug 2011 14:34:26 -0700
Subject: Request for reviews (S): 7071709: JSR 292: switchpoint
	invalidation should be pushed not pulled
In-Reply-To: <DF313D32-4A4A-41F8-AF7F-6D078191B7D9@oracle.com>
References: <5F4038AD-6959-480E-9DB8-1DEF17D6C4A6@oracle.com>
	<52852391-3B23-4326-B75C-D2CB502C52AF@oracle.com>
	<E90E01D9-DAA5-465A-B553-11842E5ED514@oracle.com>
	<F763B66B-FDC7-4D33-97E4-820D07DD62FB@oracle.com>
	<9EC6D299-AE3B-44C7-AC71-5526AB810557@oracle.com>
	<0EEADB80-E8F9-49B7-BF9C-9FD4A50BD73D@oracle.com>
	<2F4D4364-320E-4CCD-A6CB-28E1535FBACF@oracle.com>
	<A99BE02A-9B40-48F0-A8AA-18E329C56153@oracle.com>
	<DF313D32-4A4A-41F8-AF7F-6D078191B7D9@oracle.com>
Message-ID: <F52C832D-EEFB-42CB-8004-379C8B56026E@oracle.com>

On Aug 31, 2011, at 3:42 AM, Christian Thalinger wrote:

> This patch now only contains the SwitchPoint optimization and will be pushed as the last of my fixes (as stated in an earlier email):
> 
> http://cr.openjdk.java.net/~twisti/7071709/
> 
> Tom, John, can you review this again?


It is good, but I have a question.  What happens when this line produces a null value for the target (because of -Xcomp etc.):
  ciMethodHandle* target = call_site->get_target();

Shouldn't there be a guard for that edge case, in case Murphy's Law kicks in?
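
To make the question concrete, here is the kind of guard I have in mind, modelled in Java with made-up stand-in types rather than the real ci classes. CallSiteView, getTarget and chooseStrategy are hypothetical names for this sketch only; the actual fallback in the compiler may well look different.

```java
// Hypothetical model of the guard; none of these names exist in HotSpot.
final class CallSiteGuardSketch {
    // Stand-in for the compiler's view of a call site.
    interface CallSiteView {
        String getTarget(); // may return null if the target is not known yet
    }

    // Only inline through the call site when a target is actually available;
    // otherwise fall back to an ordinary out-of-line call.
    static String chooseStrategy(CallSiteView callSite) {
        String target = callSite.getTarget();
        if (target == null) {
            return "out-of-line call";
        }
        return "inline " + target;
    }

    public static void main(String[] args) {
        System.out.println(chooseStrategy(() -> null));
        System.out.println(chooseStrategy(() -> "someTarget"));
    }
}
```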

-- John

From tom.rodriguez at oracle.com  Wed Aug 31 15:32:09 2011
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Wed, 31 Aug 2011 15:32:09 -0700
Subject: review for 7083786: dead various dead chunks of code
Message-ID: <FB1444DA-F717-4DED-98C1-B57AEFE2885E@oracle.com>

http://cr.openjdk.java.net/~never/7083786
180 lines changed: 0 ins; 178 del; 2 mod; 32710 unchg

7083786: dead various dead chunks of code
Reviewed-by:

Delete some dead code.  Tested with JPRT.


From igor.veresov at oracle.com  Wed Aug 31 16:24:55 2011
From: igor.veresov at oracle.com (Igor Veresov)
Date: Wed, 31 Aug 2011 16:24:55 -0700
Subject: review for 7083786: dead various dead chunks of code
In-Reply-To: <FB1444DA-F717-4DED-98C1-B57AEFE2885E@oracle.com>
References: <FB1444DA-F717-4DED-98C1-B57AEFE2885E@oracle.com>
Message-ID: <7CF3CFF69B5042E0B27A86A12214C804@oracle.com>

 Looks good. 

igor

On Wednesday, August 31, 2011 at 3:32 PM, Tom Rodriguez wrote:

> http://cr.openjdk.java.net/~never/7083786
> 180 lines changed: 0 ins; 178 del; 2 mod; 32710 unchg
> 
> 7083786: dead various dead chunks of code
> Reviewed-by:
> 
> Delete some dead code. Tested with JPRT.


From vladimir.kozlov at oracle.com  Wed Aug 31 16:33:37 2011
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 31 Aug 2011 16:33:37 -0700
Subject: review for 7083786: dead various dead chunks of code
In-Reply-To: <FB1444DA-F717-4DED-98C1-B57AEFE2885E@oracle.com>
References: <FB1444DA-F717-4DED-98C1-B57AEFE2885E@oracle.com>
Message-ID: <4E5EC4D1.4010502@oracle.com>

Looks good. How did you find all these cases (except #if 0)?

Thanks,
Vladimir

Tom Rodriguez wrote:
> http://cr.openjdk.java.net/~never/7083786
> 180 lines changed: 0 ins; 178 del; 2 mod; 32710 unchg
> 
> 7083786: dead various dead chunks of code
> Reviewed-by:
> 
> Delete some dead code.  Tested with JPRT.
> 

From tom.rodriguez at oracle.com  Wed Aug 31 16:44:07 2011
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Wed, 31 Aug 2011 16:44:07 -0700
Subject: review for 7083786: dead various dead chunks of code
In-Reply-To: <4E5EC4D1.4010502@oracle.com>
References: <FB1444DA-F717-4DED-98C1-B57AEFE2885E@oracle.com>
	<4E5EC4D1.4010502@oracle.com>
Message-ID: <0B5DF037-D4A4-4D84-BC8B-AF8A7D34346F@oracle.com>

I noticed them when doing various other changes and ended up collecting them.  Volker reported one of them.  Thanks!

tom

On Aug 31, 2011, at 4:33 PM, Vladimir Kozlov wrote:

> Looks good. How did you find all these cases (except #if 0)?
> 
> Thanks,
> Vladimir
> 
> Tom Rodriguez wrote:
>> http://cr.openjdk.java.net/~never/7083786
>> 180 lines changed: 0 ins; 178 del; 2 mod; 32710 unchg
>> 7083786: dead various dead chunks of code
>> Reviewed-by:
>> Delete some dead code.  Tested with JPRT.


From vladimir.kozlov at oracle.com  Wed Aug 31 17:09:29 2011
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 31 Aug 2011 17:09:29 -0700
Subject: review for 7051798: SA-JDI: NPE in
	Frame.addressOfStackSlot(Frame.java:244)
In-Reply-To: <6D60141A-85A4-4762-946E-A4509CDA2CAA@oracle.com>
References: <6D60141A-85A4-4762-946E-A4509CDA2CAA@oracle.com>
Message-ID: <4E5ECD39.9060106@oracle.com>

I think it looks good.

Thanks,
Vladimir

Tom Rodriguez wrote:
> http://cr.openjdk.java.net/~never/7051798
> 1346 lines changed: 585 ins; 637 del; 124 mod; 26143 unchg
> 
> 7051798: SA-JDI: NPE in Frame.addressOfStackSlot(Frame.java:244)
> Reviewed-by:
> 
> The SA was never updated to handle ricochet frames so stack walking
> was broken when they were encountered.  The X86 stack walking code
> hadn't been updated in a while so I sync'ed it with the current version
> of frame_x86.cpp and eliminated the AMD64 variants of many of these
> classes since they should be exactly the same.  All SA related
> exceptions in the mlvm test have been fixed.  I had to convert the
> PcDesc flags into masks since the SA can't deal with bitfields.
> 
> Because of some JDI features being used by the test I had to fix other
> unreported SAJDI issues when asking for locals for optimized and
> native frames.  I also hit an unreported assertion failure in C1 with
> large frames.
> 
> Tested with failing mlvm sajdi tests from report plus the regular
> tmtools and sajdi test to stress the stack walking.
>