From christian.thalinger at oracle.com  Wed Apr  1 00:36:28 2015
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Tue, 31 Mar 2015 17:36:28 -0700
Subject: RFR(L): 8074981 (Integer/FP scalar reduction optimization )
In-Reply-To: <551A0B72.8090208@oracle.com>
References: <C568518E7B433348B114B6A7122D474755DC36A6@FMSMSX102.amr.corp.intel.com>
	<CANQc0nfzDfvTRk3fMA-O5V+tG47jJTEjbZUR=f=xEruqHORy3g@mail.gmail.com>
	<C568518E7B433348B114B6A7122D474755DC3DB3@FMSMSX102.amr.corp.intel.com>
	<CANQc0nfNBYLpQDCsjjQ95o-Zokei6ue+fUDT6mdTtDeV=S+Eqg@mail.gmail.com>
	<C568518E7B433348B114B6A7122D474755DC4141@FMSMSX102.amr.corp.intel.com>
	<D167FEE1-6E45-4A76-821E-1BAD5201873D@oracle.com>
	<C568518E7B433348B114B6A7122D474755DC418F@FMSMSX102.amr.corp.intel.com>
	<C568518E7B433348B114B6A7122D474755DC46D2@FMSMSX102.amr.corp.intel.com>
	<55133E1C.7070803@oracle.com> <5519FC8B.6070602@oracle.com>
	<C568518E7B433348B114B6A7122D474755DCDBCC@FMSMSX102.amr.corp.intel.com>
	<551A0B72.8090208@oracle.com>
Message-ID: <275EB0FD-4915-494F-ABAF-B680AF06E8D6@oracle.com>


> On Mar 30, 2015, at 7:50 PM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
> 
> On 3/30/15 7:20 PM, Berg, Michael C wrote:
>> Almost, it's more than that, there are missing components in long support in AVX2, so we only allow what superword can currently process safely and bypass the question of long support for reductions until AVX3, where support is complete enough to allow those forms of reductions.
> 
> Okay.
> 
>> Nils was the initial reviewer and sponsor, so Nils can you make another pass and comment on the current webrev for the review.
> 
> Nils is out for few days. Christian looked on this too, let him do second review.

The only comment I have is that this opening brace should be on the same line:

+ void SuperWord::packset_sort(int n)
+ {

Otherwise this looks good.  Thanks.

> 
> Thanks,
> Vladimir
> 
>> 
>> Thanks,
>> 
>> -Michael
>> 
>> -----Original Message-----
>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>> Sent: Monday, March 30, 2015 6:47 PM
>> To: Berg, Michael C
>> Cc: 'hotspot-compiler-dev at openjdk.java.net'
>> Subject: Re: RFR(L): 8074981 (Integer/FP scalar reduction optimization )
>> 
>> Here is updated webrev which addressed these and other issues:
>> 
>> http://cr.openjdk.java.net/~kvn/8074981/webrev.01/
>> 
>> Michael, I noticed that .ad file does not have matched instructions for AddReductionVL. I assume it is because there is no avx3 yet. Right?
>> 
>> Otherwise this look good to me. You need second review from an other Reviewer since changes are big.
>> 
>> Thanks,
>> Vladimir
>> 
>> On 3/25/15 4:00 PM, Vladimir Kozlov wrote:
>>> Please, ignore previous email. I screwed up Michael's email address.
>>> 
>>> Hi Michael,
>>> 
>>> I have few major concerns which you need to address.
>>> 
>>> Adding new field _attr to Node class should be avoided - it will
>>> increase significantly memory footprint of graph and not be used
>>> frequently (vectorization is rare case).
>>> 
>>> NodeFlags has only 16 bits and you used 2. And I don't see how
>>> Flag_is_loop_carried_dep is used.
>>> 
>>> All above goes to one question: why mark_reductions() is executed in
>>> loopopts before each unroll and not during superword processing?
>>> If you do mark_reductions() in superword you can use VectorSet to
>>> indicate nodes which are reduction nodes.
>>> 
>>> And the same for _attr. Why to store alignment in Node and not use
>>> _node_info in packset_eval()?
>>> 
>>> Small note. Instead of:
>>> +        Node *defNode = n->in(len - 1);
>>> use:
>>> +        Node *defNode = n->in(LoopNode::LoopBackControl);
>>> 
>>> Thanks,
>>> Vladimir
>>> 
>>> 
>>> On 3/25/15 1:09 PM, Berg, Michael C wrote:
>>>> Christian/Nils: Any additional comments for the review, if not
>>>> Thursday I will upload the final webrev with the requested change.
>>>> 
>>>> Thanks,
>>>> 
>>>> -Michael
>>>> 
>>>> *From:*Berg, Michael C
>>>> *Sent:* Thursday, March 19, 2015 5:55 PM
>>>> *To:* Christian Thalinger
>>>> *Cc:* hotspot-compiler-dev at openjdk.java.net
>>>> *Subject:* RE: RFR(L): 8074981 (Integer/FP scalar reduction
>>>> optimization )
>>>> 
>>>> Christian, yes we could rely on the base class definitions instead,
>>>> since we are not augmenting arguments.
>>>> 
>>>> I will remove the file changes after the review concludes in case
>>>> there are any other modifications.
>>>> 
>>>> Thanks,
>>>> 
>>>> -Michael
>>>> 
>>>> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com]
>>>> *Sent:* Thursday, March 19, 2015 3:52 PM
>>>> *To:* Berg, Michael C
>>>> *Cc:* hotspot-compiler-dev at openjdk.java.net
>>>> <mailto:hotspot-compiler-dev at openjdk.java.net>
>>>> *Subject:* Re: RFR(L): 8074981 (Integer/FP scalar reduction
>>>> optimization )
>>>> 
>>>>     On Mar 19, 2015, at 3:23 PM, Berg, Michael C
>>>>     <michael.c.berg at intel.com <mailto:michael.c.berg at intel.com>> wrote:
>>>> 
>>>>     I have updated the webrev contents after some feedback(with no code
>>>>     changes), and Vladimir has placed it in location everyone can access.
>>>>     Anyone should be able to apply the patch or review the code from
>>>>     this info:
>>>> 
>>>>     http://cr.openjdk.java.net/~kvn/8074981/webrev.00/
>>>> 
>>>> src/cpu/x86/vm/macroAssembler_x86.hpp:
>>>> 
>>>> Why do we need these methods?  MacroAssembler extends Assembler.
>>>> 
>>>> 
>>>>     this replaces the JBS version of the webrev files for 8074981.
>>>> 
>>>>     Thanks,
>>>> 
>>>>     -Michael
>>>> 
>>>>     -----Original Message-----
>>>>     From: Filipp Zhinkin [mailto:filipp.zhinkin at gmail.com]
>>>>     Sent: Thursday, March 19, 2015 12:55 AM
>>>>     To: Berg, Michael C
>>>>     Cc: hotspot-compiler-dev at openjdk.java.net
>>>>     <mailto:hotspot-compiler-dev at openjdk.java.net>
>>>>     Subject: Re: RFR(L): 8074981 (Integer/FP scalar reduction
>>>> optimization )
>>>> 
>>>>     Michael,
>>>> 
>>>>     I've got it, thank you for explanation.
>>>> 
>>>>     Regards,
>>>>     Filipp.
>>>> 
>>>>     On Wed, Mar 18, 2015 at 5:53 PM, Berg, Michael C
>>>>     <michael.c.berg at intel.com <mailto:michael.c.berg at intel.com>> wrote:
>>>> 
>>>>         Filipp, for large iteration loops, if I am taking your meaning
>>>>         correctly, you could not do that without splitting the loop and
>>>>         re-architecting it into a loop nest pair to manage the reduction
>>>>         components.  Seems like the overhead from that scenario could
>>>>         create cost issues where reductions could actually hamper
>>>>         performance in small vector expressions.  Right now we never
>>>>         degrade and generally benefit with the implementation as it
>>>>         stands with the reductions stitched into the vector unit
>>>>         computations directly.
>>>> 
>>>>         Regarding sub/div/etc:
>>>>         For now we have waived off on non-commuting operations like sub
>>>>         and div, they would have to be very strictly managed via
>>>>         pack-set placement.  But the answer is yes we could support them.
>>>> 
>>>>         Thanks,
>>>>         -Michael
>>>> 
>>>>         -----Original Message-----
>>>>         From: Filipp Zhinkin [mailto:filipp.zhinkin at gmail.com]
>>>>         Sent: Wednesday, March 18, 2015 2:20 AM
>>>>         To: Berg, Michael C
>>>>         Cc: hotspot-compiler-dev at openjdk.java.net
>>>>         <mailto:hotspot-compiler-dev at openjdk.java.net>
>>>>         Subject: Re: RFR(L): 8074981 (Integer/FP scalar reduction
>>>>         optimization
>>>>         )
>>>> 
>>>>         Hi Michael,
>>>> 
>>>>         thank you for contributing such a great improvement!
>>>> 
>>>>         Sorry if my question is silly, but I'm curious wouldn't it be
>>>>         better to replace integer scalar reduction variable with a
>>>>         vector "Rv" in loop's prologue, compile loop's body as a regular
>>>>         vectorized addition/multiplication, and reduce "Rv" to a scalar
>>>>         in loop's epilogue?
>>>> 
>>>>         Why you didn't add SubReduction* nodes?
>>>> 
>>>>         Best regards,
>>>>         Filipp.
>>>> 
>>>> 
>>>>         On Tue, Mar 17, 2015 at 12:40 AM, Berg, Michael C
>>>>         <michael.c.berg at intel.com <mailto:michael.c.berg at intel.com>>
>>>> wrote:
>>>> 
>>>>             Hi All,
>>>> 
>>>> 
>>>> 
>>>>             We would like to contribute the Integer/FP scalar reduction
>>>>             optimization from Intel.
>>>> 
>>>>             The contribution is referenced as Bug ID 8074981 as a
>>>>             performance
>>>>             enhancement.
>>>> 
>>>> 
>>>> 
>>>>             Please review this patch:
>>>> 
>>>>             Bug-id: https://bugs.openjdk.java.net/browse/JDK-8074981
>>>> 
>>>>             webrev:
>>>> 
>>>> https://bugs.openjdk.java.net/secure/attachment/26101/webrev.zip
>>>> 
>>>> 
>>>> 
>>>>             The optimization achieves as much as 2.3x on integer
>>>>             reductions and
>>>>             supports float and double precision optimizations
>>>> 
>>>>             which also have significant optimization uplift and obey
>>>>             strict fp
>>>>             constraints.
>>>> 
>>>> 
>>>> 
>>>>             Nils Eliasson has offered to sponsor this patch.
>>>> 
>>>> 
>>>> 
>>>>             Thanks,
>>>> 
>>>> 
>>>> 
>>>>             -Michael
>>>> 
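
For reference, the alternative Filipp asks about above (keep a vector "Rv" of partial results across the loop body and reduce it to a scalar in the epilogue) can be sketched in plain Java. This is only an illustration of the idea under stated assumptions, not code from the webrev: the class name, method names and the lane count of 4 are hypothetical, and the "vector" is modelled as an int array. It relies on int addition wrapping around, which keeps it associative and commutative, so regrouping the sum does not change the result. The same regrouping is not generally safe for float or double sums.

```java
public class ReductionSketch {
    // Scalar reduction: a single loop-carried accumulator.
    static int scalarSum(int[] a) {
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i];
        }
        return sum;
    }

    // Lane-accumulator form: LANES partial sums (the "Rv" of the question),
    // initialised before the loop and combined after it.
    static int laneSum(int[] a) {
        final int LANES = 4;          // hypothetical vector width
        int[] rv = new int[LANES];    // prologue: all lanes start at zero
        int i = 0;
        for (; i + LANES <= a.length; i += LANES) {
            for (int l = 0; l < LANES; l++) {
                rv[l] += a[i + l];    // body: element-wise add per lane
            }
        }
        int sum = 0;
        for (int l = 0; l < LANES; l++) {
            sum += rv[l];             // epilogue: reduce lanes to a scalar
        }
        for (; i < a.length; i++) {
            sum += a[i];              // leftover elements, handled scalar
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] a = new int[1003];
        for (int i = 0; i < a.length; i++) {
            a[i] = i * 31 + 7;
        }
        System.out.println(scalarSum(a) == laneSum(a));
    }
}
```

The leftover-element loop is what makes the sketch work for lengths that are not a multiple of the lane count.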


From vladimir.kozlov at oracle.com  Wed Apr  1 00:37:42 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 31 Mar 2015 17:37:42 -0700
Subject: RFR(L): 8074981 (Integer/FP scalar reduction optimization )
In-Reply-To: <275EB0FD-4915-494F-ABAF-B680AF06E8D6@oracle.com>
References: <C568518E7B433348B114B6A7122D474755DC36A6@FMSMSX102.amr.corp.intel.com>
	<CANQc0nfzDfvTRk3fMA-O5V+tG47jJTEjbZUR=f=xEruqHORy3g@mail.gmail.com>
	<C568518E7B433348B114B6A7122D474755DC3DB3@FMSMSX102.amr.corp.intel.com>
	<CANQc0nfNBYLpQDCsjjQ95o-Zokei6ue+fUDT6mdTtDeV=S+Eqg@mail.gmail.com>
	<C568518E7B433348B114B6A7122D474755DC4141@FMSMSX102.amr.corp.intel.com>
	<D167FEE1-6E45-4A76-821E-1BAD5201873D@oracle.com>
	<C568518E7B433348B114B6A7122D474755DC418F@FMSMSX102.amr.corp.intel.com>
	<C568518E7B433348B114B6A7122D474755DC46D2@FMSMSX102.amr.corp.intel.com>
	<55133E1C.7070803@oracle.com> <5519FC8B.6070602@oracle.com>
	<C568518E7B433348B114B6A7122D474755DCDBCC@FMSMSX102.amr.corp.intel.com>
	<551A0B72.8090208@oracle.com>
	<275EB0FD-4915-494F-ABAF-B680AF06E8D6@oracle.com>
Message-ID: <551B3DD6.9060807@oracle.com>


On 3/31/15 5:36 PM, Christian Thalinger wrote:
>
>> On Mar 30, 2015, at 7:50 PM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
>>
>> On 3/30/15 7:20 PM, Berg, Michael C wrote:
>>> Almost, it's more than that, there are missing components in long support in AVX2, so we only allow what superword can currently process safely and bypass the question of long support for reductions until AVX3, where support is complete enough to allow those forms of reductions.
>>
>> Okay.
>>
>>> Nils was the initial reviewer and sponsor, so Nils can you make another pass and comment on the current webrev for the review.
>>
>> Nils is out for few days. Christian looked on this too, let him do second review.
>
> The only comment I have is that this opening brace should be on the same line:
>
> + void SuperWord::packset_sort(int n)
> + {

I will fix it before push.

Thanks,
Vladimir

>
> Otherwise this looks good.  Thanks.
>
>>
>> Thanks,
>> Vladimir
>>
>>>
>>> Thanks,
>>>
>>> -Michael
>>>
>>> -----Original Message-----
>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>> Sent: Monday, March 30, 2015 6:47 PM
>>> To: Berg, Michael C
>>> Cc: 'hotspot-compiler-dev at openjdk.java.net'
>>> Subject: Re: RFR(L): 8074981 (Integer/FP scalar reduction optimization )
>>>
>>> Here is updated webrev which addressed these and other issues:
>>>
>>> http://cr.openjdk.java.net/~kvn/8074981/webrev.01/
>>>
>>> Michael, I noticed that .ad file does not have matched instructions for AddReductionVL. I assume it is because there is no avx3 yet. Right?
>>>
>>> Otherwise this look good to me. You need second review from an other Reviewer since changes are big.
>>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 3/25/15 4:00 PM, Vladimir Kozlov wrote:
>>>> Please, ignore previous email. I screwed up Michael's email address.
>>>>
>>>> Hi Michael,
>>>>
>>>> I have few major concerns which you need to address.
>>>>
>>>> Adding new field _attr to Node class should be avoided - it will
>>>> increase significantly memory footprint of graph and not be used
>>>> frequently (vectorization is rare case).
>>>>
>>>> NodeFlags has only 16 bits and you used 2. And I don't see how
>>>> Flag_is_loop_carried_dep is used.
>>>>
>>>> All above goes to one question: why mark_reductions() is executed in
>>>> loopopts before each unroll and not during superword processing?
>>>> If you do mark_reductions() in superword you can use VectorSet to
>>>> indicate nodes which are reduction nodes.
>>>>
>>>> And the same for _attr. Why to store alignment in Node and not use
>>>> _node_info in packset_eval()?
>>>>
>>>> Small note. Instead of:
>>>> +        Node *defNode = n->in(len - 1);
>>>> use:
>>>> +        Node *defNode = n->in(LoopNode::LoopBackControl);
>>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>>
>>>> On 3/25/15 1:09 PM, Berg, Michael C wrote:
>>>>> Christian/Nils: Any additional comments for the review, if not
>>>>> Thursday I will upload the final webrev with the requested change.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> -Michael
>>>>>
>>>>> *From:*Berg, Michael C
>>>>> *Sent:* Thursday, March 19, 2015 5:55 PM
>>>>> *To:* Christian Thalinger
>>>>> *Cc:* hotspot-compiler-dev at openjdk.java.net
>>>>> *Subject:* RE: RFR(L): 8074981 (Integer/FP scalar reduction
>>>>> optimization )
>>>>>
>>>>> Christian, yes we could rely on the base class definitions instead,
>>>>> since we are not augmenting arguments.
>>>>>
>>>>> I will remove the file changes after the review concludes in case
>>>>> there are any other modifications.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> -Michael
>>>>>
>>>>> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com]
>>>>> *Sent:* Thursday, March 19, 2015 3:52 PM
>>>>> *To:* Berg, Michael C
>>>>> *Cc:* hotspot-compiler-dev at openjdk.java.net
>>>>> <mailto:hotspot-compiler-dev at openjdk.java.net>
>>>>> *Subject:* Re: RFR(L): 8074981 (Integer/FP scalar reduction
>>>>> optimization )
>>>>>
>>>>>      On Mar 19, 2015, at 3:23 PM, Berg, Michael C
>>>>>      <michael.c.berg at intel.com <mailto:michael.c.berg at intel.com>> wrote:
>>>>>
>>>>>      I have updated the webrev contents after some feedback(with no code
>>>>>      changes), and Vladimir has placed it in location everyone can access.
>>>>>      Anyone should be able to apply the patch or review the code from
>>>>>      this info:
>>>>>
>>>>>      http://cr.openjdk.java.net/~kvn/8074981/webrev.00/
>>>>>
>>>>> src/cpu/x86/vm/macroAssembler_x86.hpp:
>>>>>
>>>>> Why do we need these methods?  MacroAssembler extends Assembler.
>>>>>
>>>>>
>>>>>      this replaces the JBS version of the webrev files for 8074981.
>>>>>
>>>>>      Thanks,
>>>>>
>>>>>      -Michael
>>>>>
>>>>>      -----Original Message-----
>>>>>      From: Filipp Zhinkin [mailto:filipp.zhinkin at gmail.com]
>>>>>      Sent: Thursday, March 19, 2015 12:55 AM
>>>>>      To: Berg, Michael C
>>>>>      Cc: hotspot-compiler-dev at openjdk.java.net
>>>>>      <mailto:hotspot-compiler-dev at openjdk.java.net>
>>>>>      Subject: Re: RFR(L): 8074981 (Integer/FP scalar reduction
>>>>> optimization )
>>>>>
>>>>>      Michael,
>>>>>
>>>>>      I've got it, thank you for explanation.
>>>>>
>>>>>      Regards,
>>>>>      Filipp.
>>>>>
>>>>>      On Wed, Mar 18, 2015 at 5:53 PM, Berg, Michael C
>>>>>      <michael.c.berg at intel.com <mailto:michael.c.berg at intel.com>> wrote:
>>>>>
>>>>>          Filipp, for large iteration loops, if I am taking your meaning
>>>>>          correctly, you could not do that without splitting the loop and
>>>>>          re-architecting it into a loop nest pair to manage the reduction
>>>>>          components.  Seems like the overhead from that scenario could
>>>>>          create cost issues where reductions could actually hamper
>>>>>          performance in small vector expressions.  Right now we never
>>>>>          degrade and generally benefit with the implementation as it
>>>>>          stands with the reductions stitched into the vector unit
>>>>>          computations directly.
>>>>>
>>>>>          Regarding sub/div/etc:
>>>>>          For now we have waived off on non-commuting operations like sub
>>>>>          and div, they would have to be very strictly managed via
>>>>>          pack-set placement.  But the answer is yes we could support them.
>>>>>
>>>>>          Thanks,
>>>>>          -Michael
>>>>>
>>>>>          -----Original Message-----
>>>>>          From: Filipp Zhinkin [mailto:filipp.zhinkin at gmail.com]
>>>>>          Sent: Wednesday, March 18, 2015 2:20 AM
>>>>>          To: Berg, Michael C
>>>>>          Cc: hotspot-compiler-dev at openjdk.java.net
>>>>>          <mailto:hotspot-compiler-dev at openjdk.java.net>
>>>>>          Subject: Re: RFR(L): 8074981 (Integer/FP scalar reduction
>>>>>          optimization
>>>>>          )
>>>>>
>>>>>          Hi Michael,
>>>>>
>>>>>          thank you for contributing such a great improvement!
>>>>>
>>>>>          Sorry if my question is silly, but I'm curious wouldn't it be
>>>>>          better to replace integer scalar reduction variable with a
>>>>>          vector "Rv" in loop's prologue, compile loop's body as a regular
>>>>>          vectorized addition/multiplication, and reduce "Rv" to a scalar
>>>>>          in loop's epilogue?
>>>>>
>>>>>          Why you didn't add SubReduction* nodes?
>>>>>
>>>>>          Best regards,
>>>>>          Filipp.
>>>>>
>>>>>
>>>>>          On Tue, Mar 17, 2015 at 12:40 AM, Berg, Michael C
>>>>>          <michael.c.berg at intel.com <mailto:michael.c.berg at intel.com>>
>>>>> wrote:
>>>>>
>>>>>              Hi All,
>>>>>
>>>>>
>>>>>
>>>>>              We would like to contribute the Integer/FP scalar reduction
>>>>>              optimization from Intel.
>>>>>
>>>>>              The contribution is referenced as Bug ID 8074981 as a
>>>>>              performance
>>>>>              enhancement.
>>>>>
>>>>>
>>>>>
>>>>>              Please review this patch:
>>>>>
>>>>>              Bug-id: https://bugs.openjdk.java.net/browse/JDK-8074981
>>>>>
>>>>>              webrev:
>>>>>
>>>>> https://bugs.openjdk.java.net/secure/attachment/26101/webrev.zip
>>>>>
>>>>>
>>>>>
>>>>>              The optimization achieves as much as 2.3x on integer
>>>>>              reductions and
>>>>>              supports float and double precision optimizations
>>>>>
>>>>>              which also have significant optimization uplift and obey
>>>>>              strict fp
>>>>>              constraints.
>>>>>
>>>>>
>>>>>
>>>>>              Nils Eliasson has offered to sponsor this patch.
>>>>>
>>>>>
>>>>>
>>>>>              Thanks,
>>>>>
>>>>>
>>>>>
>>>>>              -Michael
>>>>>
>

From roland.westrelin at oracle.com  Wed Apr  1 07:39:53 2015
From: roland.westrelin at oracle.com (Roland Westrelin)
Date: Wed, 1 Apr 2015 09:39:53 +0200
Subject: RFR(S): 8075587: Compilation of constant array containing
	different sub classes crashes the JVM
In-Reply-To: <55198DBE.2040203@oracle.com>
References: <B001C9FC-3FA8-4526-9AD2-BEA162F65308@oracle.com>
	<55198DBE.2040203@oracle.com>
Message-ID: <64C6CBF7-5437-4B58-819B-780C32127A76@oracle.com>

Thanks for the review, Vladimir.

Roland.

> On Mar 30, 2015, at 7:54 PM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
> 
> Good.
> 
> Thanks,
> Vladimir
> 
> On 3/27/15 6:05 AM, Roland Westrelin wrote:
>> http://cr.openjdk.java.net/~roland/8075587/webrev.00/
>> 
>> The bug was introduced by:
>> 
>> http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/rev/5231c2210388
>> 
>> which causes the meet of 2 constant arrays to result in an array of elements of type bottom.
>> 
>> Roland.
>> 


From zoltan.majo at oracle.com  Wed Apr  1 13:38:27 2015
From: zoltan.majo at oracle.com (Zoltán Majó)
Date: Wed, 01 Apr 2015 15:38:27 +0200
Subject: [9] RFR(S): 8068945: Use RBP register as proper frame pointer
	in JIT compiled code on x86
In-Reply-To: <5519C29D.8080200@oracle.com>
References: <55156A87.1070607@oracle.com>	<1427706703.1606.22.camel@mylittlepony.linaroharston>	<55196C2C.8080106@oracle.com>	<5519B1AE.8070901@oracle.com>	<5519BC6E.1090504@oracle.com>
	<5519C29D.8080200@oracle.com>
Message-ID: <551BF4D3.90805@oracle.com>

Hi Vladimir,


On 03/30/2015 11:39 PM, Vladimir Kozlov wrote:
> On 3/30/15 2:13 PM, Zoltán Majó wrote:
>> Hi Vladimir,
>>
>>
>> thank you for the feedback!
>>
>> On 03/30/2015 10:27 PM, Vladimir Kozlov wrote:
>>> How about PreserveFramePointer instead of simple FramePointer?
>>>
>>> PreserveFramePointer will mean that compiled (or other) code will use
>>> that register only as Frame pointer.
>>
>> I will change the flag's name to PreserveFramePointer and will also
>> update the description.
>>
>>> Zoltan, x86 flags setting should be in general globals_x86.hpp. You
>>> can #ifdef _LP64 there too. I don't understand why you only set it to
>>> true on linux-x64.
>>
>> I remembered that the original discussion with Brendan Gregg mentioned
>> only Linux's perf tool as a possible use case for "proper" frame
>> pointers. So I was unsure whether to enable proper frame pointers by
>> default on other x64 platforms as well.
>>
>> But if you think it would be better to have proper frame pointers on all
>> x64 platforms, I will change the code to set PreserveFramePointer to
>> true for all x64 platforms. Just please let me know.
>
> Currently compiled code for all x86 platforms is almost the same 
> (win64 has difference in registers usage) and we should keep it that way.
>
> Also the original request was to have flag to enable such behavior 
> (use RBP only as FP). So to have it off by default is acceptable. If 
> performance group or someone find a regression (or bug) due to this 
> change we can switch the flag off by default before jdk9 release.
>
> Try to run pstack on Solaris and jstack on OSX to make sure they 
> report correct call stack with compiled java methods. And JFR.
> Also it would be nice to run SunStudio analyzer to verify that it works.

I ran all the tools you suggested. JFR and jstack are unaffected, and pstack
produces nice stack traces (it did not always do so before).

However, I've encountered a problem with SunStudio: two asserts fail in
the fastdebug build. Both of them are "soft" failures, as neither the VM
nor SunStudio crashes with the product build. I worked on the problem
today and have a partial understanding of the issue, but more
investigation is needed to have a patch that preserves the correct
behavior of SunStudio as well.

So that will put this RFR on hold for a while, unfortunately.

Thank you for the feedback and suggestions so far!

Best regards,


Zoltan


>
> Thanks,
> Vladimir
>
>>
>> Thank you!
>>
>> Best regards,
>>
>>
>> Zoltan
>>
>>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 3/30/15 8:30 AM, Zoltán Majó wrote:
>>>> Hi Ed,
>>>>
>>>>
>>>> thank you for your feedback! Please see comments below.
>>>>
>>>> On 03/30/2015 11:11 AM, Edward Nevill wrote:
>>>>> Hi Zoltán,
>>>>>
>>>>> On Fri, 2015-03-27 at 15:34 +0100, Zoltán Majó wrote:
>>>>>> Full JPRT run, all tests pass. I also ran all hotspot compiler
>>>>>> tests and
>>>>>> the jdk tests in java/lang/invoke on both x86_64 and x86_32. All 
>>>>>> tests
>>>>>> that pass without the patch pass also with the patch.
>>>>>>
>>>>>> I ran the SPEC JVM 2008 benchmarks on our performance
>>>>>> infrastructure for
>>>>>> x86_64. The performance evaluation suggests that there is no
>>>>>> statistically significant performance degradation due to having 
>>>>>> proper
>>>>>> frame pointers. Therefore I propose to have OmitFramePointer set to
>>>>>> false by default on x86_64 (and set to true on all other platforms).
>>>>> This patch looks good, however I think there is a problem with the
>>>>> logic of OmitFramePointer.
>>>>>
>>>>> Here is my test case.
>>>>>
>>>>> --- CUT HERE ---
>>>>> // $Id: fibo.java,v 1.2 2000/12/24 19:10:50 doug Exp $
>>>>> // http://www.bagley.org/~doug/shootout/
>>>>>
>>>>> public class fibo {
>>>>>     public static void main(String args[]) {
>>>>>         int N = Integer.parseInt(args[0]);
>>>>>         System.out.println(fib(N));
>>>>>     }
>>>>>     public static int fib(int n) {
>>>>>         if (n < 2) return(1);
>>>>>         return( fib(n-2) + fib(n-1) );
>>>>>     }
>>>>> }
>>>>> --- CUT HERE ---
>>>>>
>>>>> If I run it as follows on my x86 64 bit linux.
>>>>>
>>>>> /work/images/jdk/bin/java -XX:-TieredCompilation 
>>>>> -XX:+PrintCompilation
>>>>> -XX:CompileOnly=fibo::fib -XX:+UnlockDiagnosticVMOptions
>>>>> -XX:-OmitFramePointer -XX:+PrintAssembly fibo 43
>>>>>
>>>>> I get
>>>>>
>>>>>    # {method} {0x00007fc62c97f388} 'fib' '(I)I' in 'fibo'
>>>>>    # parm0:    rsi       = int
>>>>>    #           [sp+0x30]  (sp of caller)
>>>>>    0x00007fc625071100: mov    %eax,-0x14000(%rsp)
>>>>>    0x00007fc625071107: push   %rbp
>>>>>    0x00007fc625071108: mov    %rsp,%rbp
>>>>>    0x00007f836907110b: sub    $0x20,%rsp ;*synchronization entry
>>>>>
>>>>> which is correct, it is NOT(-) OmitFramePointer, therefore it is 
>>>>> using
>>>>> the frame pointer
>>>>>
>>>>> Now if I try just changing -XX:-OmitFramePointer to
>>>>> -XX:+OmitFramePointer in the above I get
>>>>>
>>>>> /work/images/jdk/bin/java -XX:-TieredCompilation 
>>>>> -XX:+PrintCompilation
>>>>> -XX:CompileOnly=fibo::fib -XX:+UnlockDiagnosticVMOptions
>>>>> -XX:+OmitFramePointer -XX:+PrintAssembly fibo 43
>>>>>
>>>>> I get
>>>>>
>>>>>    # {method} {0x00007f14d3c00388} 'fib' '(I)I' in 'fibo'
>>>>>    # parm0:    rsi       = int
>>>>>    #           [sp+0x30]  (sp of caller)
>>>>>    0x00007f14e1071100: mov    %eax,-0x14000(%rsp)
>>>>>    0x00007f14e1071107: push   %rbp
>>>>>    0x00007f14e1071108: sub    $0x20,%rsp ;*synchronization entry
>>>>>
>>>>> which is correct, it IS(+) OmitFramePointer, therefore it does not
>>>>> use a frame pointer.
>>>>>
>>>>> However, if I now delete the -XX:+/-OmitFramePointer altogether, IE
>>>>>
>>>>> /work/images/jdk/bin/java -XX:-TieredCompilation 
>>>>> -XX:+PrintCompilation
>>>>> -XX:CompileOnly=fibo::fib -XX:+UnlockDiagnosticVMOptions
>>>>> -XX:+PrintAssembly fibo 43
>>>>>
>>>>> I get
>>>>>
>>>>>    # {method} {0x00007f0c4b730388} 'fib' '(I)I' in 'fibo'
>>>>>    # parm0:    rsi       = int
>>>>>    #           [sp+0x30]  (sp of caller)
>>>>>    0x00007f0c75071100: mov    %eax,-0x14000(%rsp)
>>>>>    0x00007f0c75071107: push   %rbp
>>>>>    0x00007f0c75071108: sub    $0x20,%rsp ;*synchronization entry
>>>>>
>>>>> It is not using a frame pointer which is the equivalent of
>>>>> -XX:+OmitFramePointer. However in your description above you say
>>>>>
>>>>>> Therefore I propose to have OmitFramePointer set to false by default
>>>>>> on x86_64 (and set to true on all other platforms).
>>>>> whereas OmitFramePointer actually seems to be set to true on x86_64
>>>>>
>>>>> I think the problem may be with the declaration and definition of
>>>>> OmitFramePointer in globals.hpp and globals_x86.hpp
>>>>>
>>>>> In globals.hpp it does
>>>>>
>>>>> product(bool, OmitFramePointer, true,
>>>>>
>>>>> In globals_x86.hpp it does
>>>>>
>>>>> LP64_ONLY(define_pd_global(bool, OmitFramePointer, false););
>>>>>
>>>>> I am not sure that you can mix product(...) and product_pd(...) like
>>>>> this, so I think it just ends up getting the default from the
>>>>> product(...).
>>>>
>>>> You are right, mixing product and product_pd does not make sense at 
>>>> all.
>>>> Thank you for doing additional testing and for drawing attention to 
>>>> the
>>>> problem.
>>>>
>>>> I updated the code to use product_pd and define_pd_global on all
>>>> relevant platforms.
>>>>
>>>>> Aside: In general, I do not like options which include a negative in
>>>>> them because I have to do a double think when I see something like,
>>>>> -XX:-OmitFramePointer, as in, it is omitting the frame pointer,
>>>>> therefore it is using a frame pointer. How about FramePointer so we
>>>>> have -XX:+FramePointer to say I want frame pointers and
>>>>> -XX:-FramePointer to say I don't.
>>>>
>>>> That is a good idea. Double negation is an unnecessary 
>>>> complication, so
>>>> I changed the name of the flag to FramePointer, just as you suggested.
>>>>
>>>>>
>>>>> I did some timing on the above 'fibo' test
>>>>>
>>>>> [ed at mylittlepony java]$ time /work/images/jdk/bin/java
>>>>> -XX:-OmitFramePointer fibo 43
>>>>> 701408733
>>>>>
>>>>> real    0m1.545s
>>>>> user    0m1.571s
>>>>> sys    0m0.015s
>>>>> [ed at mylittlepony java]$ time /work/images/jdk/bin/java
>>>>> -XX:+OmitFramePointer fibo 43
>>>>> 701408733
>>>>>
>>>>> real    0m1.504s
>>>>> user    0m1.527s
>>>>> sys    0m0.019s
>>>>>
>>>>> which is ~3% difference on this test case. On aarch64, I see ~7%
>>>>> difference on this test case.
>>>>
>>>> Thank you for the performance measurements!
>>>>
>>>>> With the above change to fix the logic of OmitFramePointer (and
>>>>> possibly change its name) the patch looks good to me.
>>>>
>>>> Here is the updated webrev (the same webrev that was already included
>>>> into my reply to Roland):
>>>>
>>>> http://cr.openjdk.java.net/~zmajo/8068945/webrev.01/
>>>>
>>>>> I will prepare a mirror patch for aarch64.
>>>>
>>>> That would be great!
>>>>
>>>> Thank you and best regards,
>>>>
>>>>
>>>> Zoltán
>>>>
>>>>>
>>>>> All the best,
>>>>> Ed.
>>>>>
>>>>>
>>>>>
>>>>
>>


From vladimir.kozlov at oracle.com  Wed Apr  1 17:40:33 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 01 Apr 2015 10:40:33 -0700
Subject: RFR(L): 8074981 (Integer/FP scalar reduction optimization )
In-Reply-To: <551B3DD6.9060807@oracle.com>
References: <C568518E7B433348B114B6A7122D474755DC36A6@FMSMSX102.amr.corp.intel.com>	<CANQc0nfzDfvTRk3fMA-O5V+tG47jJTEjbZUR=f=xEruqHORy3g@mail.gmail.com>	<C568518E7B433348B114B6A7122D474755DC3DB3@FMSMSX102.amr.corp.intel.com>	<CANQc0nfNBYLpQDCsjjQ95o-Zokei6ue+fUDT6mdTtDeV=S+Eqg@mail.gmail.com>	<C568518E7B433348B114B6A7122D474755DC4141@FMSMSX102.amr.corp.intel.com>	<D167FEE1-6E45-4A76-821E-1BAD5201873D@oracle.com>	<C568518E7B433348B114B6A7122D474755DC418F@FMSMSX102.amr.corp.intel.com>	<C568518E7B433348B114B6A7122D474755DC46D2@FMSMSX102.amr.corp.intel.com>	<55133E1C.7070803@oracle.com>
	<5519FC8B.6070602@oracle.com>	<C568518E7B433348B114B6A7122D474755DCDBCC@FMSMSX102.amr.corp.intel.com>	<551A0B72.8090208@oracle.com>	<275EB0FD-4915-494F-ABAF-B680AF06E8D6@oracle.com>
	<551B3DD6.9060807@oracle.com>
Message-ID: <551C2D91.4010806@oracle.com>

The push failed testing on SPARC with the 64-bit fastdebug VM.
Michael, could you look at what could be going wrong there?
I will try to reproduce it on SPARC.


hotspot/test/compiler/codegen/7100757/Test7100757.java

#  Internal Error (/opt/jprt/T/P1/031135.vkozlov/s/hotspot/src/share/vm/opto/superword.cpp:1742), pid=9157, tid=23
#  assert(_stk.length() == 0) failed: stk is empty
#

Stack: [0x0007fffeccc00000,0x0007fffeccd00000],  sp=0x0007fffecccf6780,  free space=985k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x15e78d4]  void VMError::report_and_die()+0x6c4
V  [libjvm.so+0xa762d0]  void report_vm_error(const char*,int,const char*,const char*)+0x70
V  [libjvm.so+0x14acd8c]  bool SuperWord::construct_bb()+0x4c
V  [libjvm.so+0x14a2704]  void SuperWord::SLP_extract()+0xc
V  [libjvm.so+0x10d8b6c]  void PhaseIdealLoop::build_and_optimize(bool,bool)+0x1374
V  [libjvm.so+0x9d9c70]  void Compile::Optimize()+0x24c8
V  [libjvm.so+0x9d0fdc]  Compile::Compile #Nvariant 1(ciEnv*,C2Compiler*,ciMethod*,int,bool,bool,bool)+0x12fc
V  [libjvm.so+0x87595c]  void C2Compiler::compile_method(ciEnv*,ciMethod*,int)+0xf4

Current CompileTask:
C2:   1177  156       4       Test7100757::test (274 bytes)

Thanks,
Vladimir

On 3/31/15 5:37 PM, Vladimir Kozlov wrote:
>
> On 3/31/15 5:36 PM, Christian Thalinger wrote:
>>
>>> On Mar 30, 2015, at 7:50 PM, Vladimir Kozlov
>>> <vladimir.kozlov at oracle.com> wrote:
>>>
>>> On 3/30/15 7:20 PM, Berg, Michael C wrote:
>>>> Almost, it's more than that, there are missing components in long
>>>> support in AVX2, so we only allow what superword can currently
>>>> process safely and bypass the question of long support for
>>>> reductions until AVX3, where support is complete enough to allow
>>>> those forms of reductions.
>>>
>>> Okay.
>>>
>>>> Nils was the initial reviewer and sponsor, so Nils can you make
>>>> another pass and comment on the current webrev for the review.
>>>
>>> Nils is out for few days. Christian looked on this too, let him do
>>> second review.
>>
>> The only comment I have is this opening brace should be on the same
>> line:
>>
>> + void SuperWord::packset_sort(int n)
>> + {
>
> I will fix it before push.
>
> Thanks,
> Vladimir
>
>>
>> Otherwise this looks good.  Thanks.
>>
>>>
>>> Thanks,
>>> Vladimir
>>>
>>>>
>>>> Thanks,
>>>>
>>>> -Michael
>>>>
>>>> -----Original Message-----
>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>>> Sent: Monday, March 30, 2015 6:47 PM
>>>> To: Berg, Michael C
>>>> Cc: 'hotspot-compiler-dev at openjdk.java.net'
>>>> Subject: Re: RFR(L): 8074981 (Integer/FP scalar reduction
>>>> optimization )
>>>>
>>>> Here is updated webrev which addressed these and other issues:
>>>>
>>>> http://cr.openjdk.java.net/~kvn/8074981/webrev.01/
>>>>
>>>> Michael, I noticed that .ad file does not have matched instructions
>>>> for AddReductionVL. I assume it is because there is no avx3 yet. Right?
>>>>
>>>> Otherwise this look good to me. You need second review from an other
>>>> Reviewer since changes are big.
>>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>> On 3/25/15 4:00 PM, Vladimir Kozlov wrote:
>>>>> Please, ignore previous email. I screwed up Michael's email address.
>>>>>
>>>>> Hi Michael,
>>>>>
>>>>> I have few major concerns which you need to address.
>>>>>
>>>>> Adding new field _attr to Node class should be avoided - it will
>>>>> increase significantly memory footprint of graph and not be used
>>>>> frequently (vectorization is rare case).
>>>>>
>>>>> NodeFlags has only 16 bits and you used 2. And I don't see how
>>>>> Flag_is_loop_carried_dep is used.
>>>>>
>>>>> All above goes to one question: why mark_reductions() is executed in
>>>>> loopopts before each unroll and not during superword processing?
>>>>> If you do mark_reductions() in superword you can use VectorSet to
>>>>> indicate nodes which are reduction nodes.
>>>>>
>>>>> And the same for _attr. Why to store alignment in Node and not use
>>>>> _node_info in packset_eval()?
>>>>>
>>>>> Small note. Instead of:
>>>>> +        Node *defNode = n->in(len - 1);
>>>>> use:
>>>>> +        Node *defNode = n->in(LoopNode::LoopBackControl);
>>>>>
>>>>> Thanks,
>>>>> Vladimir
>>>>>
>>>>>
>>>>> On 3/25/15 1:09 PM, Berg, Michael C wrote:
>>>>>> Christian/Nils: Any additional comments for the review, if not
>>>>>> Thursday I will upload the final webrev with the requested change.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> -Michael
>>>>>>
>>>>>> *From:*Berg, Michael C
>>>>>> *Sent:* Thursday, March 19, 2015 5:55 PM
>>>>>> *To:* Christian Thalinger
>>>>>> *Cc:* hotspot-compiler-dev at openjdk.java.net
>>>>>> *Subject:* RE: RFR(L): 8074981 (Integer/FP scalar reduction
>>>>>> optimization )
>>>>>>
>>>>>> Christian, yes we could rely on the base class definitions instead,
>>>>>> since we are not augmenting arguments.
>>>>>>
>>>>>> I will remove the file changes after the review concludes in case
>>>>>> there are any other modifications.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> -Michael
>>>>>>
>>>>>> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com]
>>>>>> *Sent:* Thursday, March 19, 2015 3:52 PM
>>>>>> *To:* Berg, Michael C
>>>>>> *Cc:* hotspot-compiler-dev at openjdk.java.net
>>>>>> <mailto:hotspot-compiler-dev at openjdk.java.net>
>>>>>> *Subject:* Re: RFR(L): 8074981 (Integer/FP scalar reduction
>>>>>> optimization )
>>>>>>
>>>>>>      On Mar 19, 2015, at 3:23 PM, Berg, Michael C
>>>>>>      <michael.c.berg at intel.com <mailto:michael.c.berg at intel.com>>
>>>>>> wrote:
>>>>>>
>>>>>>      I have updated the webrev contents after some feedback(with
>>>>>> no code
>>>>>>      changes), and Vladimir has placed it in location everyone can
>>>>>> access.
>>>>>>      Anyone should be able to apply the patch or review the code from
>>>>>>      this info:
>>>>>>
>>>>>>      http://cr.openjdk.java.net/~kvn/8074981/webrev.00/
>>>>>>
>>>>>> src/cpu/x86/vm/macroAssembler_x86.hpp:
>>>>>>
>>>>>> Why do we need these methods?  MacroAssembler extends Assembler.
>>>>>>
>>>>>>
>>>>>>      this replaces the JBS version of the webrev files for 8074981.
>>>>>>
>>>>>>      Thanks,
>>>>>>
>>>>>>      -Michael
>>>>>>
>>>>>>      -----Original Message-----
>>>>>>      From: Filipp Zhinkin [mailto:filipp.zhinkin at gmail.com]
>>>>>>      Sent: Thursday, March 19, 2015 12:55 AM
>>>>>>      To: Berg, Michael C
>>>>>>      Cc: hotspot-compiler-dev at openjdk.java.net
>>>>>>      <mailto:hotspot-compiler-dev at openjdk.java.net>
>>>>>>      Subject: Re: RFR(L): 8074981 (Integer/FP scalar reduction
>>>>>> optimization )
>>>>>>
>>>>>>      Michael,
>>>>>>
>>>>>>      I've got it, thank you for explanation.
>>>>>>
>>>>>>      Regards,
>>>>>>      Filipp.
>>>>>>
>>>>>>      On Wed, Mar 18, 2015 at 5:53 PM, Berg, Michael C
>>>>>>      <michael.c.berg at intel.com <mailto:michael.c.berg at intel.com>>
>>>>>> wrote:
>>>>>>
>>>>>>          Filipp, for large iteration loops, if I am taking your
>>>>>> meaning
>>>>>>          correctly, you could not do that without splitting the
>>>>>> loop and
>>>>>>          re-architecting it into a loop nest pair to manage the
>>>>>> reduction
>>>>>>          components.  Seems like the overhead from that scenario
>>>>>> could
>>>>>>          create cost issues where reductions could actually hamper
>>>>>>          performance in small vector expressions.  Right now we never
>>>>>>          degrade and generally benefit with the implementation as it
>>>>>>          stands with the reductions stitched into the vector unit
>>>>>>          computations directly.
>>>>>>
>>>>>>          Regarding sub/div/etc:
>>>>>>          For now we have waived off on non-commuting operations
>>>>>> like sub
>>>>>>          and div, they would have to be very strictly managed via
>>>>>>          pack-set placement.  But the answer is yes we could
>>>>>> support them.
>>>>>>
>>>>>>          Thanks,
>>>>>>          -Michael
>>>>>>
>>>>>>          -----Original Message-----
>>>>>>          From: Filipp Zhinkin [mailto:filipp.zhinkin at gmail.com]
>>>>>>          Sent: Wednesday, March 18, 2015 2:20 AM
>>>>>>          To: Berg, Michael C
>>>>>>          Cc: hotspot-compiler-dev at openjdk.java.net
>>>>>>          <mailto:hotspot-compiler-dev at openjdk.java.net>
>>>>>>          Subject: Re: RFR(L): 8074981 (Integer/FP scalar reduction
>>>>>>          optimization
>>>>>>          )
>>>>>>
>>>>>>          Hi Michael,
>>>>>>
>>>>>>          thank you for contributing such a great improvement!
>>>>>>
>>>>>>          Sorry if my question is silly, but I'm curious wouldn't
>>>>>> it be
>>>>>>          better to replace integer scalar reduction variable with a
>>>>>>          vector "Rv" in loop's prologue, compile loop's body as a
>>>>>> regular
>>>>>>          vectorized addition/multiplication, and reduce "Rv" to a
>>>>>> scalar
>>>>>>          in loop's epilogue?
>>>>>>
>>>>>>          Why you didn't add SubReduction* nodes?
>>>>>>
>>>>>>          Best regards,
>>>>>>          Filipp.
>>>>>>
>>>>>>
>>>>>>          On Tue, Mar 17, 2015 at 12:40 AM, Berg, Michael C
>>>>>>          <michael.c.berg at intel.com <mailto:michael.c.berg at intel.com>>
>>>>>> wrote:
>>>>>>
>>>>>>              Hi All,
>>>>>>
>>>>>>
>>>>>>
>>>>>>              We would like to contribute the Integer/FP scalar
>>>>>> reduction
>>>>>>              optimization from Intel.
>>>>>>
>>>>>>              The contribution is referenced as Bug ID 8074981 as a
>>>>>>              performance
>>>>>>              enhancement.
>>>>>>
>>>>>>
>>>>>>
>>>>>>              Please review this patch:
>>>>>>
>>>>>>              Bug-id: https://bugs.openjdk.java.net/browse/JDK-8074981
>>>>>>
>>>>>>              webrev:
>>>>>>
>>>>>> https://bugs.openjdk.java.net/secure/attachment/26101/webrev.zip
>>>>>>
>>>>>>
>>>>>>
>>>>>>              The optimization achieves as much as 2.3x on integer
>>>>>>              reductions and
>>>>>>              supports float and double precision optimizations
>>>>>>
>>>>>>              which also have significant optimization uplift an obey
>>>>>>              strict fp
>>>>>>              constraints.
>>>>>>
>>>>>>
>>>>>>
>>>>>>              Nils Eliasson has offered to sponsor this patch.
>>>>>>
>>>>>>
>>>>>>
>>>>>>              Thanks,
>>>>>>
>>>>>>
>>>>>>
>>>>>>              -Michael
>>>>>>
>>
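
For readers following the quoted thread, a minimal sketch of the two
reduction shapes it discusses, written as plain Java rather than C2 IR.
It models Filipp's question by hand (keep a vector accumulator "Rv"
across the loop, reduce it to a scalar in the epilogue); it is not what
the patch generates. The names ReductionSketch, scalarSum and laneSum,
and the 4-lane width, are assumptions made for illustration.

```java
// Illustration only: models the idea in scalar Java, not the C2 transformation.
final class ReductionSketch {
    static final int LANES = 4; // assumed vector width, for illustration

    // The scalar reduction as written in source: a single accumulator.
    static int scalarSum(int[] a, int[] b) {
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Vector-accumulator variant, modeled by hand with an int[LANES].
    static int laneSum(int[] a, int[] b) {
        int[] rv = new int[LANES];            // prologue: zeroed accumulator
        int i = 0;
        for (; i + LANES <= a.length; i += LANES) {
            for (int l = 0; l < LANES; l++) { // body: lane-wise multiply-add
                rv[l] += a[i + l] * b[i + l];
            }
        }
        int sum = 0;
        for (int l = 0; l < LANES; l++) {     // epilogue: fold lanes to a scalar
            sum += rv[l];
        }
        for (; i < a.length; i++) {           // scalar tail for leftover elements
            sum += a[i] * b[i];
        }
        return sum;
    }
}
```

Both methods return the same value for int inputs because int addition
wraps and is associative. The same regrouping can change the result for
float and double, which is the concern behind the strict fp constraints
mentioned in the original submission.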

From vladimir.x.ivanov at oracle.com  Wed Apr  1 20:56:50 2015
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Wed, 01 Apr 2015 23:56:50 +0300
Subject: [9] RFR (M): 8057967: CallSite dependency tracking scales
	devastatingly poorly
Message-ID: <551C5B92.8060500@oracle.com>

http://cr.openjdk.java.net/~vlivanov/8057967/webrev.00/hotspot/
http://cr.openjdk.java.net/~vlivanov/8057967/webrev.00/jdk/
https://bugs.openjdk.java.net/browse/JDK-8057967

HotSpot JITs inline very aggressively through CallSites. They
optimistically treat the CallSite target as constant, but record an nmethod
dependency to invalidate the compiled code once the CallSite target changes.

Right now, such dependencies have the call site class as a context. This
context is too coarse and it leads to context pollution: if some
CallSite target changes, the VM needs to enumerate all nmethods which
depend on call sites of that type.

As the performance analysis in the bug report shows, it can add up to a
significant amount of work.
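
A toy model of the pollution effect, assuming nothing about HotSpot's
real data structures: a "context" is just a map key and a "dependency"
is the id of a call site some compiled method depends on. The class and
method names (ContextPollutionSketch, record, checkedOnChange) are
hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model, not HotSpot code.
final class ContextPollutionSketch {
    private final Map<String, List<Integer>> depsByContext = new HashMap<>();

    // Store a dependency on the given call site under the given context.
    void record(String context, int callSiteId) {
        depsByContext.computeIfAbsent(context, k -> new ArrayList<>()).add(callSiteId);
    }

    // Number of dependencies that must be examined when one call site's
    // target changes: every entry stored under that context.
    int checkedOnChange(String context) {
        return depsByContext.getOrDefault(context, Collections.<Integer>emptyList()).size();
    }
}
```

If 1000 dependencies are all recorded under one coarse key, a single
target change examines all 1000. If the same dependencies are spread
over 100 finer keys, the same change examines only the 10 that share
the affected key.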

While working on the fix, I investigated 3 approaches:
   (1) unique context per call site
   (2) use CallSite target class
   (3) use a class the CallSite instance is linked to

Considering call sites are ubiquitous (e.g. 10,000s on some Octane
benchmarks), loading a dedicated class for every call site is
overkill (even a VM anonymous one).

CallSite target class 
(MethodHandle.form->LambdaForm.vmentry->MemberName.clazz->Class<?>) is 
also not satisfactory, since it is a compiled LambdaForm VM anonymous 
class, which is heavily shared. It gets context pollution down, but 
still the overhead is quite high.

So, I decided to focus on (3) and ended up with a mixture of (2) & (3).

Compared to the other options, the complications of (3) are:
   - CallSite can stay unlinked (e.g. CallSite.dynamicInvoker()), so
there should be some default context the VM can use;

   - CallSite instances can be shared and they shouldn't keep the context
class from unloading.

It motivated a scheme where the CallSite context is initialized lazily and
can change during its lifetime. When a CallSite is linked with an indy
instruction, its context is initialized. Usually, the JIT sees CallSite
instances with an initialized context (since it reaches them through indy),
but if that's not the case and there's no context yet, the JIT sets it to
"default context", which means "use target call site".

I introduced CallSite$DependencyContext, which represents an nmethod
dependency context and points (indirectly) to a Class<?> used as a context.

The context class is referenced through a phantom reference
(sun.misc.Cleaner to simplify cleanup). Though it's impossible to
extract the referent using Reference.get(), the VM can access it directly by
reading the corresponding field. Unlike other types of references, phantom
references aren't cleared automatically. This allows the VM to access the
context class until cleanup is performed. Cleanup resets the context to
NULL, in addition to invalidating all relevant dependencies.
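
The Reference.get() point can be seen from plain Java. A minimal sketch
using java.lang.ref.PhantomReference directly instead of
sun.misc.Cleaner; PhantomSketch and visibleReferent are hypothetical
names, and the sketch shows only the Java-visible side, not how the VM
reads the field.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;

final class PhantomSketch {
    // Returns what Java code sees through Reference.get() while the
    // referent is still strongly reachable by the caller.
    static Object visibleReferent(Object referent) {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        PhantomReference<Object> ref = new PhantomReference<>(referent, queue);
        return ref.get(); // PhantomReference.get() always returns null
    }
}
```

Java code therefore cannot resurrect the context class through the
reference, while code that reads the referent field directly, as the VM
does here, still can.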

There are 3 context states a CallSite instance can be in:
   (1) NULL: no dependencies
   (2) DependencyContext.DEFAULT_CONTEXT: dependencies are stored in
the call site target class
   (3) DependencyContext for some class: dependencies are stored on the
class the DependencyContext instance points to

Every CallSite starts w/o a context (1) and then lazily gets one ((2) or 
(3) depending on the situation).

State transitions:
   (1->3): When a CallSite w/o a context (1) is linked with some indy
call site, its owner is recorded as a context (3).

   (1->2): When the JIT needs to record a dependency on a target of a
CallSite w/o a context (1), it sets the context to DEFAULT_CONTEXT and
uses the target class to store the dependency.

   (3->1): When the context class becomes unreachable, a cleanup hook
invalidates all dependencies on that CallSite and resets the context to
NULL (1).

Only (3->1) requires dependency invalidation, because there are no
dependencies in (1) and (2->1) isn't performed.

(1->3) is done in Java code (CallSite.initContext) and (1->2) is
performed in the VM (ciCallSite::get_context()). The updates are performed
by CAS, so there's no need for additional synchronization. Other
operations on the VM side are volatile (to play well with Java code) and
performed with Compile_lock held (to avoid races between VM operations).
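
The transitions above can be sketched as a small CAS-driven state
machine. This is a hypothetical model built on
java.util.concurrent.atomic.AtomicReference, not the code in the
webrev: ContextStateSketch and its method names are invented, a plain
Object stands in for DEFAULT_CONTEXT, and dependency invalidation is
left out.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical model of the three context states; not the JDK's CallSite.
final class ContextStateSketch {
    static final Object DEFAULT_CONTEXT = new Object(); // stands in for state (2)

    // null = state (1); DEFAULT_CONTEXT = state (2); a Class = state (3).
    private final AtomicReference<Object> context = new AtomicReference<>();

    // (1->3): linking with an indy instruction records the owner class.
    // Fails if a context has already been set.
    boolean link(Class<?> owner) {
        return context.compareAndSet(null, owner);
    }

    // (1->2): the JIT needs a context; install the default if none is set.
    // Single attempt only: races with cleanup() are not handled here.
    Object contextForJit() {
        Object c = context.get();
        if (c != null) {
            return c;
        }
        return context.compareAndSet(null, DEFAULT_CONTEXT) ? DEFAULT_CONTEXT : context.get();
    }

    // (3->1): cleanup after the owner class becomes unreachable.
    // (2->1) is never performed, so DEFAULT_CONTEXT is left untouched.
    boolean cleanup(Class<?> owner) {
        return context.compareAndSet(owner, null);
    }

    Object current() {
        return context.get();
    }
}
```

Because each transition is a single compareAndSet from an expected
state, a losing thread simply observes the state the winner installed,
which is why no extra locking is needed for the updates themselves.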

Some statistics:
   Box2D, latest jdk9-dev
     - CallSite instances: ~22000

     - invalidated nmethods due to CallSite target changes: ~60

     - checked call_site_target_value dependencies:
       - before the fix: ~1,600,000
       - after the fix:        ~600

Testing:
   - dedicated test which exercises different state transitions
   - jdk/java/lang/invoke, hotspot/test/compiler/jsr292, nashorn

Thanks!

Best regards,
Vladimir Ivanov

From john.r.rose at oracle.com  Thu Apr  2 02:27:00 2015
From: john.r.rose at oracle.com (John Rose)
Date: Wed, 1 Apr 2015 19:27:00 -0700
Subject: [9] RFR (M): 8057967: CallSite dependency tracking scales
	devastatingly poorly
In-Reply-To: <551C5B92.8060500@oracle.com>
References: <551C5B92.8060500@oracle.com>
Message-ID: <C5BCEFCD-B4E4-46C1-AE6B-F7491DC20FCD@oracle.com>

On Apr 1, 2015, at 1:56 PM, Vladimir Ivanov <vladimir.x.ivanov at oracle.com> wrote:
> 
> http://cr.openjdk.java.net/~vlivanov/8057967/webrev.00/hotspot/
> http://cr.openjdk.java.net/~vlivanov/8057967/webrev.00/jdk/
> https://bugs.openjdk.java.net/browse/JDK-8057967

Impressive work.

Question:  How common is state 2 (context-free CS) compared to state 3 (indy-bound CS)?

And is state 2 well tested by Box2D?

I recommend putting CONTEXT_OFFSET into CallSite, not the nested class.
For one thing, your getDeclaredField call will fail (I think) with a security manager installed.
You can load it up where TARGET_OFFSET is initialized.

I haven't looked at the JVM changes yet, and I don't understand the cleaner, yet.

Can a call site target class change as a result of LF recompiling or customization?
If so, won't that cause a risk of dropped dependencies?

-- John

From roland.westrelin at oracle.com  Thu Apr  2 07:39:36 2015
From: roland.westrelin at oracle.com (Roland Westrelin)
Date: Thu, 2 Apr 2015 09:39:36 +0200
Subject: RFR(XS): 8076094: CheckCastPPNode::Value() has outdated logic for
	constants
In-Reply-To: <55142D20.3040407@oracle.com>
References: <C034ADFF-1F90-4F1E-9A1D-4C5445DAAA09@oracle.com>
	<55142D20.3040407@oracle.com>
Message-ID: <B1F28B46-1B76-4BAB-A118-D31EF20BE9D3@oracle.com>

Thanks for the reviews Vladimir & Vladimir.

Roland.

> On Mar 26, 2015, at 5:00 PM, Vladimir Ivanov <vladimir.x.ivanov at oracle.com> wrote:
> 
> Looks good.
> 
> Best regards,
> Vladimir Ivanov
> 
> On 3/26/15 6:02 PM, Roland Westrelin wrote:
>> http://cr.openjdk.java.net/~roland/8076094/webrev.00/
>> 
>> I noticed this logic in CheckCastPPNode::Value() that doesn't seem to make sense and asked John about it. He said it might be outdated. I removed it and had it go through testing and saw no problem. I propose we remove it.
>> 
>> Roland.
>> 


From peter.levart at gmail.com  Thu Apr  2 08:02:17 2015
From: peter.levart at gmail.com (Peter Levart)
Date: Thu, 02 Apr 2015 10:02:17 +0200
Subject: [9] RFR (M): 8057967: CallSite dependency tracking scales
	devastatingly poorly
In-Reply-To: <551C5B92.8060500@oracle.com>
References: <551C5B92.8060500@oracle.com>
Message-ID: <551CF789.9080607@gmail.com>

Hi Vladimir,

Would it be possible for CallSite.context to hold the Cleaner instance 
itself (without indirection through DependencyContext)?

DEFAULT_CONTEXT would then be a Cleaner instance that references some 
"default" Class object (for example DefaultContext.class that serves no 
other purpose).

Regards, Peter

On 04/01/2015 10:56 PM, Vladimir Ivanov wrote:
> http://cr.openjdk.java.net/~vlivanov/8057967/webrev.00/hotspot/
> http://cr.openjdk.java.net/~vlivanov/8057967/webrev.00/jdk/
> https://bugs.openjdk.java.net/browse/JDK-8057967
>
> HotSpot JITs inline very aggressively through CallSites. They
> optimistically treat the CallSite target as constant, but record an nmethod
> dependency to invalidate the compiled code once the CallSite target changes.
>
> Right now, such dependencies have the call site class as a context. This
> context is too coarse and it leads to context pollution: if some
> CallSite target changes, the VM needs to enumerate all nmethods which
> depend on call sites of that type.
>
> As the performance analysis in the bug report shows, it can add up to a
> significant amount of work.
>
> While working on the fix, I investigated 3 approaches:
>   (1) unique context per call site
>   (2) use CallSite target class
>   (3) use a class the CallSite instance is linked to
>
> Considering call sites are ubiquitous (e.g. 10,000s on some Octane
> benchmarks), loading a dedicated class for every call site is
> overkill (even a VM anonymous one).
>
> CallSite target class 
> (MethodHandle.form->LambdaForm.vmentry->MemberName.clazz->Class<?>) is 
> also not satisfactory, since it is a compiled LambdaForm VM anonymous 
> class, which is heavily shared. It gets context pollution down, but 
> still the overhead is quite high.
>
> So, I decided to focus on (3) and ended up with a mixture of (2) & (3).
>
> Compared to the other options, the complications of (3) are:
>   - CallSite can stay unlinked (e.g. CallSite.dynamicInvoker()), so
> there should be some default context the VM can use;
>
>   - CallSite instances can be shared and they shouldn't keep the context
> class from unloading.
>
> It motivated a scheme where the CallSite context is initialized lazily and
> can change during its lifetime. When a CallSite is linked with an indy
> instruction, its context is initialized. Usually, the JIT sees CallSite
> instances with an initialized context (since it reaches them through
> indy), but if that's not the case and there's no context yet, the JIT sets
> it to "default context", which means "use target call site".
>
> I introduced CallSite$DependencyContext, which represents an nmethod
> dependency context and points (indirectly) to a Class<?> used as a
> context.
>
> The context class is referenced through a phantom reference
> (sun.misc.Cleaner to simplify cleanup). Though it's impossible to
> extract the referent using Reference.get(), the VM can access it directly by
> reading the corresponding field. Unlike other types of references, phantom
> references aren't cleared automatically. This allows the VM to access the
> context class until cleanup is performed. Cleanup resets the
> context to NULL, in addition to invalidating all relevant dependencies.
>
> There are 3 context states a CallSite instance can be in:
>   (1) NULL: no dependencies
>   (2) DependencyContext.DEFAULT_CONTEXT: dependencies are stored in
> the call site target class
>   (3) DependencyContext for some class: dependencies are stored on the
> class the DependencyContext instance points to
>
> Every CallSite starts w/o a context (1) and then lazily gets one ((2) 
> or (3) depending on the situation).
>
> State transitions:
>   (1->3): When a CallSite w/o a context (1) is linked with some indy
> call site, its owner is recorded as a context (3).
>
>   (1->2): When the JIT needs to record a dependency on a target of a
> CallSite w/o a context (1), it sets the context to DEFAULT_CONTEXT and
> uses the target class to store the dependency.
>
>   (3->1): When the context class becomes unreachable, a cleanup hook
> invalidates all dependencies on that CallSite and resets the context
> to NULL (1).
>
> Only (3->1) requires dependency invalidation, because there are no
> dependencies in (1) and (2->1) isn't performed.
>
> (1->3) is done in Java code (CallSite.initContext) and (1->2) is
> performed in the VM (ciCallSite::get_context()). The updates are performed
> by CAS, so there's no need for additional synchronization. Other
> operations on the VM side are volatile (to play well with Java code) and
> performed with Compile_lock held (to avoid races between VM operations).
>
> Some statistics:
>   Box2D, latest jdk9-dev
>     - CallSite instances: ~22000
>
>     - invalidated nmethods due to CallSite target changes: ~60
>
>     - checked call_site_target_value dependencies:
>       - before the fix: ~1,600,000
>       - after the fix:        ~600
>
> Testing:
>   - dedicated test which exercises different state transitions
>   - jdk/java/lang/invoke, hotspot/test/compiler/jsr292, nashorn
>
> Thanks!
>
> Best regards,
> Vladimir Ivanov


From aleksey.shipilev at oracle.com  Thu Apr  2 16:10:53 2015
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Thu, 02 Apr 2015 19:10:53 +0300
Subject: [9] RFR (M): 8057967: CallSite dependency tracking scales
	devastatingly poorly
In-Reply-To: <551C5B92.8060500@oracle.com>
References: <551C5B92.8060500@oracle.com>
Message-ID: <551D6A0D.8090500@oracle.com>

On 04/01/2015 11:56 PM, Vladimir Ivanov wrote:
> http://cr.openjdk.java.net/~vlivanov/8057967/webrev.00/hotspot/
> http://cr.openjdk.java.net/~vlivanov/8057967/webrev.00/jdk/
> https://bugs.openjdk.java.net/browse/JDK-8057967

Glad to see this finally addressed, thanks!

I did not look through the code changes, but ran Octane on my
configuration. As expected, Typescript improved substantially. Other
benchmarks are not affected much. This is in line with the performance
analysis done for the original bug report.

Baseline:

Benchmark          Mode  Cnt        Score        Error  Units
Box2D.test           ss   20     4454.677 ±    345.807  ms/op
CodeLoad.test        ss   20     4784.299 ±    370.658  ms/op
Crypto.test          ss   20      878.395 ±     87.918  ms/op
DeltaBlue.test       ss   20      502.182 ±     52.362  ms/op
EarleyBoyer.test     ss   20     2250.508 ±    273.924  ms/op
Gbemu.test           ss   20     5893.102 ±    656.036  ms/op
Mandreel.test        ss   20     9323.484 ±    825.801  ms/op
NavierStokes.test    ss   20      657.608 ±     41.212  ms/op
PdfJS.test           ss   20     3829.534 ±    353.702  ms/op
Raytrace.test        ss   20     1202.826 ±    166.795  ms/op
Regexp.test          ss   20      156.782 ±     20.992  ms/op
Richards.test        ss   20      324.256 ±     35.874  ms/op
Splay.test           ss   20      179.660 ±     34.120  ms/op
Typescript.test      ss   20       40.537 ±      2.457   s/op

Patched:

Benchmark          Mode  Cnt        Score        Error  Units
Box2D.test           ss   20     4306.198 ±    376.030  ms/op
CodeLoad.test        ss   20     4881.635 ±    395.585  ms/op
Crypto.test          ss   20      823.551 ±    106.679  ms/op
DeltaBlue.test       ss   20      490.557 ±     41.705  ms/op
EarleyBoyer.test     ss   20     2299.763 ±    270.961  ms/op
Gbemu.test           ss   20     5612.868 ±    414.052  ms/op
Mandreel.test        ss   20     8616.735 ±    825.813  ms/op
NavierStokes.test    ss   20      640.722 ±     28.035  ms/op
PdfJS.test           ss   20     4139.396 ±    373.580  ms/op
Raytrace.test        ss   20     1227.632 ±    151.088  ms/op
Regexp.test          ss   20      169.246 ±     34.055  ms/op
Richards.test        ss   20      331.824 ±     32.706  ms/op
Splay.test           ss   20      168.479 ±     23.512  ms/op
Typescript.test      ss   20       31.181 ±      1.790   s/op

The offending profile branch (Universe::flush_dependents_on) is also
gone, which explains the performance improvement.
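
A small sketch of how the two tables can be compared, assuming the
Error column is read as a symmetric interval around the Score. The
helper names (ScoreCompareSketch, relativeReduction, intervalsOverlap)
are hypothetical and this is only one simple way to read the numbers,
not a statistical test.

```java
final class ScoreCompareSketch {
    // Relative reduction in time per operation, as a fraction of the baseline.
    static double relativeReduction(double baseline, double patched) {
        return (baseline - patched) / baseline;
    }

    // True when [s1 - e1, s1 + e1] and [s2 - e2, s2 + e2] overlap, i.e. the
    // difference between the two scores is within the reported error.
    static boolean intervalsOverlap(double s1, double e1, double s2, double e2) {
        return s1 - e1 <= s2 + e2 && s2 - e2 <= s1 + e1;
    }
}
```

With the figures above, relativeReduction(40.537, 31.181) is about 0.23
for Typescript and its two intervals do not overlap, while for Box2D
(4454.677 and 4306.198, errors 345.807 and 376.030) they do.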

Thanks,
-Aleksey.


From vladimir.x.ivanov at oracle.com  Thu Apr  2 16:17:25 2015
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Thu, 02 Apr 2015 19:17:25 +0300
Subject: [9] RFR (M): 8057967: CallSite dependency tracking scales
	devastatingly poorly
In-Reply-To: <C5BCEFCD-B4E4-46C1-AE6B-F7491DC20FCD@oracle.com>
References: <551C5B92.8060500@oracle.com>
	<C5BCEFCD-B4E4-46C1-AE6B-F7491DC20FCD@oracle.com>
Message-ID: <551D6B95.5030109@oracle.com>

John, Peter,

Thanks a lot for the feedback!

Updated webrev:
   http://cr.openjdk.java.net/~vlivanov/8057967/webrev.01/hotspot/
   http://cr.openjdk.java.net/~vlivanov/8057967/webrev.01/jdk/

> Question:  How common is state 2 (context-free CS) compared to state 3 (indy-bound CS)?
It's quite rare (<2%). For Box2D the stats are:
    total # of call sites instantiated: 22000
    (1): ~1800 (stay uninitialized)
    (2): ~19900
    (3): ~300

> And is state 2 well tested by Box2D?
No, it's not. But: (1) I wrote a focused test on different context state 
transitions (see test/compiler/jsr292/CallSiteDepContextTest.java); and 
(2) artificially stressed the logic by eagerly initializing the context 
to DEFAULT_CONTEXT.

I had a (2)->(3) transition (DEF_CTX => bound Class context) at some
point, but decided to get rid of it. IMO the price of recompilation
(recorded dependencies should be invalidated during context migration)
is too high for the reduced number of dependencies enumerated.

> I recommend putting CONTEXT_OFFSET into CallSite, not the nested class.
> For one thing, your getDeclaredField call will fail (I think) with a security manager installed.
> You can load it up where TARGET_OFFSET is initialized.
Since I removed DependencyContext, I moved CONTEXT_OFFSET to CallSite.

BTW why do you think security manager was the problem? (1) 
Class.getDeclaredField() is caller-sensitive; and (2) DependencyContext 
was eagerly initialized with CallSite (see 
UNSAFE.ensureClassInitialized() in original version).

>
> I haven't looked at the JVM changes yet, and I don't understand the cleaner, yet.

> Can a call site target class change as a result of LF recompiling or customization?
> If so, won't that cause a risk of dropped dependencies?
Good point! It's definitely a problem I hadn't envisioned. OK, I
completely removed the call site target class logic and now use a
DefaultContext class instead.

On 4/2/15 11:02 AM, Peter Levart wrote:
 > Hi Vladimir,
 >
 > Would it be possible for CallSite.context to hold the Cleaner instance
 > itself (without indirection through DependencyContext)?
 >
 > DEFAULT_CONTEXT would then be a Cleaner instance that references some
 > "default" Class object (for example DefaultContext.class that serves no
 > other purpose).
Good idea! I eliminated the indirection as you suggest.

Best regards,
Vladimir Ivanov

From vladimir.x.ivanov at oracle.com  Thu Apr  2 16:26:20 2015
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Thu, 02 Apr 2015 19:26:20 +0300
Subject: [9] RFR (M): 8057967: CallSite dependency tracking scales
	devastatingly poorly
In-Reply-To: <551D6A0D.8090500@oracle.com>
References: <551C5B92.8060500@oracle.com> <551D6A0D.8090500@oracle.com>
Message-ID: <551D6DAC.8030607@oracle.com>

Aleksey, thanks a lot for the performance evaluation of the fix!

Best regards,
Vladimir Ivanov

On 4/2/15 7:10 PM, Aleksey Shipilev wrote:
> On 04/01/2015 11:56 PM, Vladimir Ivanov wrote:
>> http://cr.openjdk.java.net/~vlivanov/8057967/webrev.00/hotspot/
>> http://cr.openjdk.java.net/~vlivanov/8057967/webrev.00/jdk/
>> https://bugs.openjdk.java.net/browse/JDK-8057967
>
> Glad to see this finally addressed, thanks!
>
> I did not look through the code changes, but ran Octane on my
> configuration. As expected, Typescript improved substantially. Other
> benchmarks are not affected much. This is in line with the performance
> analysis done for the original bug report.
>
> Baseline:
>
> Benchmark          Mode  Cnt        Score        Error  Units
> Box2D.test           ss   20     4454.677 ±    345.807  ms/op
> CodeLoad.test        ss   20     4784.299 ±    370.658  ms/op
> Crypto.test          ss   20      878.395 ±     87.918  ms/op
> DeltaBlue.test       ss   20      502.182 ±     52.362  ms/op
> EarleyBoyer.test     ss   20     2250.508 ±    273.924  ms/op
> Gbemu.test           ss   20     5893.102 ±    656.036  ms/op
> Mandreel.test        ss   20     9323.484 ±    825.801  ms/op
> NavierStokes.test    ss   20      657.608 ±     41.212  ms/op
> PdfJS.test           ss   20     3829.534 ±    353.702  ms/op
> Raytrace.test        ss   20     1202.826 ±    166.795  ms/op
> Regexp.test          ss   20      156.782 ±     20.992  ms/op
> Richards.test        ss   20      324.256 ±     35.874  ms/op
> Splay.test           ss   20      179.660 ±     34.120  ms/op
> Typescript.test      ss   20       40.537 ±      2.457   s/op
>
> Patched:
>
> Benchmark          Mode  Cnt        Score        Error  Units
> Box2D.test           ss   20     4306.198 ±    376.030  ms/op
> CodeLoad.test        ss   20     4881.635 ±    395.585  ms/op
> Crypto.test          ss   20      823.551 ±    106.679  ms/op
> DeltaBlue.test       ss   20      490.557 ±     41.705  ms/op
> EarleyBoyer.test     ss   20     2299.763 ±    270.961  ms/op
> Gbemu.test           ss   20     5612.868 ±    414.052  ms/op
> Mandreel.test        ss   20     8616.735 ±    825.813  ms/op
> NavierStokes.test    ss   20      640.722 ±     28.035  ms/op
> PdfJS.test           ss   20     4139.396 ±    373.580  ms/op
> Raytrace.test        ss   20     1227.632 ±    151.088  ms/op
> Regexp.test          ss   20      169.246 ±     34.055  ms/op
> Richards.test        ss   20      331.824 ±     32.706  ms/op
> Splay.test           ss   20      168.479 ±     23.512  ms/op
> Typescript.test      ss   20       31.181 ±      1.790   s/op
>
> The offending profile branch (Universe::flush_dependents_on) is also
> gone, which explains the performance improvement.
>
> Thanks,
> -Aleksey.
>

From john.r.rose at oracle.com  Thu Apr  2 20:21:03 2015
From: john.r.rose at oracle.com (John Rose)
Date: Thu, 2 Apr 2015 13:21:03 -0700
Subject: [9] RFR (M): 8057967: CallSite dependency tracking scales
	devastatingly poorly
In-Reply-To: <551D6B95.5030109@oracle.com>
References: <551C5B92.8060500@oracle.com>
	<C5BCEFCD-B4E4-46C1-AE6B-F7491DC20FCD@oracle.com>
	<551D6B95.5030109@oracle.com>
Message-ID: <606B777D-C99C-4F28-9E0B-A0A032659B71@oracle.com>

On Apr 2, 2015, at 9:17 AM, Vladimir Ivanov <vladimir.x.ivanov at oracle.com> wrote:
> 
>> 
>> I recommend putting CONTEXT_OFFSET into CallSite, not the nested class.
>> For one thing, your getDeclaredField call will fail (I think) with a security manager installed.
>> You can load it up where TARGET_OFFSET is initialized.
> Since I removed DependencyContext, I moved CONTEXT_OFFSET to CallSite.
> 
> BTW why do you think security manager was the problem? (1) Class.getDeclaredField() is caller-sensitive; and (2) DependencyContext was eagerly initialized with CallSite (see UNSAFE.ensureClassInitialized() in original version).

CallSite$DependencyContext and CallSite are distinct classes.
At the JVM level they cannot access each others' private members.
So if DependencyContext wants to reflect a private field from CallSite,
there will be extra security checks.  These sometimes fail, as in:

https://bugs.openjdk.java.net/browse/JDK-7050328
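
A minimal sketch of the situation being described, using hypothetical
stand-ins (OuterSketch for CallSite, Nested for DependencyContext,
"target" for the private field). It is run without a security manager,
so it shows only that the nested class is a separate class at the JVM
level and that the lookup is ordinary reflection; it does not reproduce
the failure in the linked bug.

```java
import java.lang.reflect.Field;

final class OuterSketch {
    private long target = 42L; // stands in for a private field of the outer class

    static final class Nested {
        // Reflective lookup of the enclosing class's private field from the
        // nested class, which compiles to its own class file (OuterSketch$Nested).
        static Field lookup() {
            try {
                return OuterSketch.class.getDeclaredField("target");
            } catch (NoSuchFieldException e) {
                throw new AssertionError(e);
            }
        }
    }
}
```

Moving the lookup into the class that declares the field, as suggested
for CONTEXT_OFFSET, removes any question of cross-class access for the
reflective call.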

-- John

From vladimir.x.ivanov at oracle.com  Thu Apr  2 22:08:10 2015
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Fri, 03 Apr 2015 01:08:10 +0300
Subject: [9] RFR (M): 8057967: CallSite dependency tracking scales
	devastatingly poorly
In-Reply-To: <606B777D-C99C-4F28-9E0B-A0A032659B71@oracle.com>
References: <551C5B92.8060500@oracle.com>
	<C5BCEFCD-B4E4-46C1-AE6B-F7491DC20FCD@oracle.com>
	<551D6B95.5030109@oracle.com>
	<606B777D-C99C-4F28-9E0B-A0A032659B71@oracle.com>
Message-ID: <551DBDCA.4020100@oracle.com>

John,

Thanks for the clarification!

>> BTW why do you think security manager was the problem? (1)
>> Class.getDeclaredField() is caller-sensitive; and (2)
>> DependencyContext was eagerly initialized with CallSite (see
>> UNSAFE.ensureClassInitialized() in original version).
>
> CallSite$DependencyContext and CallSite are distinct classes.
> At the JVM level they cannot access each others' private members.
> So if DependencyContext wants to reflect a private field from CallSite,
> there will be extra security checks.  These sometimes fail, as in:
Member access permission check isn't performed if caller and member 
owner class are loaded by the same class loader (which is the case with 
CallSite$DependencyContext and CallSite classes).

jdk/src/java.base/share/classes/java/lang/Class.java:
     @CallerSensitive
     public Field getDeclaredField(String name)
         throws NoSuchFieldException, SecurityException {
         checkMemberAccess(Member.DECLARED, Reflection.getCallerClass(), true);
...
     private void checkMemberAccess(int which, Class<?> caller, boolean checkProxyInterfaces) {
         final SecurityManager s = System.getSecurityManager();
         if (s != null) {
             final ClassLoader ccl = ClassLoader.getClassLoader(caller);
             final ClassLoader cl = getClassLoader0();
             if (which != Member.PUBLIC) {
                 if (ccl != cl) {
                     s.checkPermission(SecurityConstants.CHECK_MEMBER_ACCESS_PERMISSION);
                 }

Best regards,
Vladimir Ivanov
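
The point above, that a nested class and its outer class are distinct
classes but normally share a class loader, can be sketched as follows.
This is a minimal illustration, not the CallSite code: the class
OuterWithPrivateField, its field "context" and the helper methods are
hypothetical stand-ins, and no security manager is installed here.

```java
import java.lang.reflect.Field;

// Hypothetical stand-in for CallSite: an outer class with a private field.
class OuterWithPrivateField {
    private int context = 42;

    // Compiled to OuterWithPrivateField$Nested, a distinct class at the JVM level.
    static class Nested {
        static Field lookup() throws NoSuchFieldException {
            // Caller-sensitive: the caller seen by Class.getDeclaredField() is Nested.
            return OuterWithPrivateField.class.getDeclaredField("context");
        }
    }

    // Both classes come from the same class file set, so they share a loader.
    static boolean sameLoader() {
        return Nested.class.getClassLoader() == OuterWithPrivateField.class.getClassLoader();
    }

    public static void main(String[] args) throws Exception {
        Field f = Nested.lookup();
        f.setAccessible(true); // needed to read the private field reflectively
        System.out.println("same loader: " + sameLoader());
        System.out.println(f.getName() + " = " + f.getInt(new OuterWithPrivateField()));
    }
}
```

Because the two loaders compare equal, the `ccl != cl` branch quoted
above would not be taken for this pair of classes.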

From vladimir.kozlov at oracle.com  Thu Apr  2 22:33:45 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 02 Apr 2015 15:33:45 -0700
Subject: RFR(XS) 8076523: assert(((ABS(iv_adjustment_in_bytes) % elt_size)
	== 0)) fails in superword.cpp
Message-ID: <551DC3C9.7030704@oracle.com>

http://cr.openjdk.java.net/~kvn/8076523/webrev/
https://bugs.openjdk.java.net/browse/JDK-8076523

The problem was caused by the JDK-8026049 changes. Vectorization assumes
that an offset into an array is aligned to the size of the memory
operation that accesses the array elements. With UseUnalignedAccesses,
long load/store operations can be used to access a byte[] array without
alignment to sizeof(jlong).

The vectorization code that verifies alignment should be adjusted
to check that offset % mem_oper_size == 0.

Fix tested with the failing tests and JPRT.

Thanks,
Vladimir
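
The difference between checking an offset against the array element
size and checking it against the memory operation size can be sketched
as below. This is only an illustration of the arithmetic, not the
superword.cpp code; the class name, the method isMultipleOf and the
sample offset are assumptions.

```java
class AlignmentCheck {
    // True when the byte offset is a whole multiple of the given size.
    static boolean isMultipleOf(int offsetInBytes, int sizeInBytes) {
        return Math.abs(offsetInBytes) % sizeInBytes == 0;
    }

    public static void main(String[] args) {
        int elementSize = Byte.BYTES;   // element size of a byte[] (1)
        int memOperSize = Long.BYTES;   // size of a long load/store (8)
        int offset = 4;                 // hypothetical byte offset into the array

        // Aligned with respect to the element size...
        System.out.println(isMultipleOf(offset, elementSize));
        // ...but not with respect to the size of the memory operation.
        System.out.println(isMultipleOf(offset, memOperSize));
    }
}
```

With a long-sized access to a byte[] the two checks can disagree, which
is why the size of the memory operation is the one to test against.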

From john.r.rose at oracle.com  Thu Apr  2 22:57:41 2015
From: john.r.rose at oracle.com (John Rose)
Date: Thu, 2 Apr 2015 15:57:41 -0700
Subject: [9] RFR (M): 8057967: CallSite dependency tracking scales
	devastatingly poorly
In-Reply-To: <551DBDCA.4020100@oracle.com>
References: <551C5B92.8060500@oracle.com>
	<C5BCEFCD-B4E4-46C1-AE6B-F7491DC20FCD@oracle.com>
	<551D6B95.5030109@oracle.com>
	<606B777D-C99C-4F28-9E0B-A0A032659B71@oracle.com>
	<551DBDCA.4020100@oracle.com>
Message-ID: <27F0D8DE-FF78-4B41-B5DF-2C985705CC96@oracle.com>

On Apr 2, 2015, at 3:08 PM, Vladimir Ivanov <vladimir.x.ivanov at oracle.com> wrote:
> 
> Member access permission check isn't performed if caller and member owner class are loaded by the same class loader (which is the case with CallSite$DependencyContext and CallSite classes).

Heh!  And I thought I had compiled the reflection logic to gray matter.  -- John

From igor.veresov at oracle.com  Thu Apr  2 22:59:08 2015
From: igor.veresov at oracle.com (Igor Veresov)
Date: Thu, 2 Apr 2015 15:59:08 -0700
Subject: RFR(XS) 8076523: assert(((ABS(iv_adjustment_in_bytes) % elt_size)
	== 0)) fails in superword.cpp
In-Reply-To: <551DC3C9.7030704@oracle.com>
References: <551DC3C9.7030704@oracle.com>
Message-ID: <623F272A-2ACF-4401-BCBC-71B99394C533@oracle.com>

Looks good.

igor

> On Apr 2, 2015, at 3:33 PM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
> 
> http://cr.openjdk.java.net/~kvn/8076523/webrev/
> https://bugs.openjdk.java.net/browse/JDK-8076523
> 
> The problem was caused by JDK-8026049 changes. Vectorization assumes that offset in array is aligned to size of memory operations (which access element of array). With UseUnalignedAccesses Long load/store operations could be used to access byte[] array without alignment to sizeof(jlong).
> 
> Vectorization has code which verifies alignment - it should be adjusted to check that offset % mem_oper_size == 0.
> 
> Fix tested with failed tests and JPRT.
> 
> Thanks,
> Vladimir


From vladimir.kozlov at oracle.com  Thu Apr  2 23:01:37 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 02 Apr 2015 16:01:37 -0700
Subject: RFR(XS) 8076523: assert(((ABS(iv_adjustment_in_bytes) % elt_size)
	== 0)) fails in superword.cpp
In-Reply-To: <623F272A-2ACF-4401-BCBC-71B99394C533@oracle.com>
References: <551DC3C9.7030704@oracle.com>
	<623F272A-2ACF-4401-BCBC-71B99394C533@oracle.com>
Message-ID: <551DCA51.5020708@oracle.com>

Thank you, Igor

Vladimir

On 4/2/15 3:59 PM, Igor Veresov wrote:
> Looks good.
>
> igor
>
>> On Apr 2, 2015, at 3:33 PM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
>>
>> http://cr.openjdk.java.net/~kvn/8076523/webrev/
>> https://bugs.openjdk.java.net/browse/JDK-8076523
>>
>> The problem was caused by JDK-8026049 changes. Vectorization assumes that offset in array is aligned to size of memory operations (which access element of array). With UseUnalignedAccesses Long load/store operations could be used to access byte[] array without alignment to sizeof(jlong).
>>
>> Vectorization has code which verifies alignment - it should be adjusted to check that offset % mem_oper_size == 0.
>>
>> Fix tested with failed tests and JPRT.
>>
>> Thanks,
>> Vladimir
>

From igor.veresov at oracle.com  Mon Apr  6 22:46:02 2015
From: igor.veresov at oracle.com (Igor Veresov)
Date: Mon, 6 Apr 2015 15:46:02 -0700
Subject: RFR(S) 8076968: PICL based initialization of L2 cache line size on
	some SPARC systems is incorrect
Message-ID: <0CE3FCF6-E68E-4011-B102-D041536370AB@oracle.com>

The L2 data cache line size property can be named either "l2-cache-line-size" or "l2-dcache-line-size". We have to try them both.

Webrev: http://cr.openjdk.java.net/~iveresov/8076968/webrev.00/

Thanks,
igor
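
The "try both names" idea, together with remembering which name
worked, can be sketched like this. It is not the HotSpot PICL code: the
Map stands in for the platform property tree, and the class and method
names are hypothetical.

```java
import java.util.Map;
import java.util.OptionalInt;

class CacheLineProbe {
    // The two property names mentioned above, tried in order.
    private static final String[] CANDIDATES = {
        "l2-cache-line-size", "l2-dcache-line-size"
    };

    private String resolvedName; // memoized after the first successful probe

    // 'props' is a hypothetical stand-in for the platform's property lookup.
    OptionalInt lookup(Map<String, Integer> props) {
        if (resolvedName == null) {
            for (String name : CANDIDATES) {
                if (props.containsKey(name)) {
                    resolvedName = name;
                    break;
                }
            }
        }
        if (resolvedName == null) {
            return OptionalInt.empty();
        }
        Integer value = props.get(resolvedName);
        return value == null ? OptionalInt.empty() : OptionalInt.of(value);
    }

    String resolvedName() {
        return resolvedName;
    }

    public static void main(String[] args) {
        CacheLineProbe probe = new CacheLineProbe();
        System.out.println(probe.lookup(Map.of("l2-dcache-line-size", 64)));
        System.out.println(probe.resolvedName());
    }
}
```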

From vladimir.kozlov at oracle.com  Mon Apr  6 22:59:47 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 06 Apr 2015 15:59:47 -0700
Subject: RFR(S) 8076968: PICL based initialization of L2 cache line size
	on some SPARC systems is incorrect
In-Reply-To: <0CE3FCF6-E68E-4011-B102-D041536370AB@oracle.com>
References: <0CE3FCF6-E68E-4011-B102-D041536370AB@oracle.com>
Message-ID: <55230FE3.2070301@oracle.com>

Good.

Thanks,
Vladimir

On 4/6/15 3:46 PM, Igor Veresov wrote:
> The L2 data cache line size property can be named either "l2-cache-line-size" or "l2-dcache-line-size". We have to try them both.
>
> Webrev: http://cr.openjdk.java.net/~iveresov/8076968/webrev.00/
>
> Thanks,
> igor
>

From igor.veresov at oracle.com  Mon Apr  6 23:59:35 2015
From: igor.veresov at oracle.com (Igor Veresov)
Date: Mon, 6 Apr 2015 16:59:35 -0700
Subject: RFR(S) 8076968: PICL based initialization of L2 cache line size
	on some SPARC systems is incorrect
In-Reply-To: <55230FE3.2070301@oracle.com>
References: <0CE3FCF6-E68E-4011-B102-D041536370AB@oracle.com>
	<55230FE3.2070301@oracle.com>
Message-ID: <F7089DEC-DA38-4CB6-BF9C-291DF27DD00E@oracle.com>

Thanks, Vladimir!

igor

> On Apr 6, 2015, at 3:59 PM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
> 
> Good.
> 
> Thanks,
> Vladimir
> 
> On 4/6/15 3:46 PM, Igor Veresov wrote:
>> The L2 data cache line size property can be named either "l2-cache-line-size" or "l2-dcache-line-size". We have to try them both.
>> 
>> Webrev: http://cr.openjdk.java.net/~iveresov/8076968/webrev.00/
>> 
>> Thanks,
>> igor
>> 


From vitalyd at gmail.com  Tue Apr  7 00:13:30 2015
From: vitalyd at gmail.com (Vitaly Davidovich)
Date: Mon, 6 Apr 2015 20:13:30 -0400
Subject: RFR(S) 8076968: PICL based initialization of L2 cache line size
	on some SPARC systems is incorrect
In-Reply-To: <F7089DEC-DA38-4CB6-BF9C-291DF27DD00E@oracle.com>
References: <0CE3FCF6-E68E-4011-B102-D041536370AB@oracle.com>
	<55230FE3.2070301@oracle.com>
	<F7089DEC-DA38-4CB6-BF9C-291DF27DD00E@oracle.com>
Message-ID: <CAHjP37HxER+oz=SQtoZrU1N756jzuLVqg3jmExs-WQBTiwpDRQ@mail.gmail.com>

Hi Igor,

// One the first visit determine the name of the l2 cache line size
property and memoize it

Typo - should be "On the first ..." I presume.

Also, that code doesn't just memoize the property name, it also appears to
actually probe for and set the value - worthwhile to update the comment?

sent from my phone
On Apr 6, 2015 8:00 PM, "Igor Veresov" <igor.veresov at oracle.com> wrote:

> Thanks, Vladimir!
>
> igor
>
> > On Apr 6, 2015, at 3:59 PM, Vladimir Kozlov <vladimir.kozlov at oracle.com>
> wrote:
> >
> > Good.
> >
> > Thanks,
> > Vladimir
> >
> > On 4/6/15 3:46 PM, Igor Veresov wrote:
> >> The L2 data cache line size property can be named either
> "l2-cache-line-size" or "l2-dcache-line-size". We have to try them both.
> >>
> >> Webrev: http://cr.openjdk.java.net/~iveresov/8076968/webrev.00/
> >>
> >> Thanks,
> >> igor
> >>
>
>

From michael.c.berg at intel.com  Tue Apr  7 01:35:37 2015
From: michael.c.berg at intel.com (Berg, Michael C)
Date: Tue, 7 Apr 2015 01:35:37 +0000
Subject: RFR 8076276 support for AVX512 
Message-ID: <C568518E7B433348B114B6A7122D474755DCE552@FMSMSX102.amr.corp.intel.com>

Hi Folks,

We (Intel) would like to contribute initial support for AVX512 (EVEX encoding, new register support, new ISA support, etc) for EVEX enabled microarchitectures.
The contribution is referenced as Bug ID 8076276 as a performance enhancement.

Please review this patch and comment as needed:

Bug-id: https://bugs.openjdk.java.net/browse/JDK-8076276

webrev:
http://cr.openjdk.java.net/~kvn/8076276/webrev

Loops covered by superword optimizations on the vectorization path see as much as a 50% reduction in loop trace instruction count, which makes up the path length of EVEX-encoded, SIMD-optimized loops.

Vladimir Kozlov has offered to sponsor this patch.

From igor.veresov at oracle.com  Tue Apr  7 01:53:20 2015
From: igor.veresov at oracle.com (Igor Veresov)
Date: Mon, 6 Apr 2015 18:53:20 -0700
Subject: RFR(S) 8076968: PICL based initialization of L2 cache line size
	on some SPARC systems is incorrect
In-Reply-To: <CAHjP37HxER+oz=SQtoZrU1N756jzuLVqg3jmExs-WQBTiwpDRQ@mail.gmail.com>
References: <0CE3FCF6-E68E-4011-B102-D041536370AB@oracle.com>
	<55230FE3.2070301@oracle.com>
	<F7089DEC-DA38-4CB6-BF9C-291DF27DD00E@oracle.com>
	<CAHjP37HxER+oz=SQtoZrU1N756jzuLVqg3jmExs-WQBTiwpDRQ@mail.gmail.com>
Message-ID: <6D50B83F-FA80-48B4-BBD8-6A4EC41A3D17@oracle.com>

Vitaly,

Thanks, others already noted the typo.
The comment means that we memoize the result of bruteforcing of the name of the property. I'm not quite sure what the confusion is. What would you like it to say?

igor 

> On Apr 6, 2015, at 5:13 PM, Vitaly Davidovich <vitalyd at gmail.com> wrote:
> 
> Hi Igor,
> 
> // One the first visit determine the name of the l2 cache line size property and memoize it
> 
> Typo - should be "On the first ..." I presume.
> 
> Also, that code doesn't just memoize the property name, it also appears to actually probe for and set the value - worthwhile to update the comment?
> 
> sent from my phone
> 
> On Apr 6, 2015 8:00 PM, "Igor Veresov" <igor.veresov at oracle.com <mailto:igor.veresov at oracle.com>> wrote:
> Thanks, Vladimir!
> 
> igor
> 
> > On Apr 6, 2015, at 3:59 PM, Vladimir Kozlov <vladimir.kozlov at oracle.com <mailto:vladimir.kozlov at oracle.com>> wrote:
> >
> > Good.
> >
> > Thanks,
> > Vladimir
> >
> > On 4/6/15 3:46 PM, Igor Veresov wrote:
> >> The L2 data cache line size property can be named either "l2-cache-line-size" or "l2-dcache-line-size". We have to try them both.
> >>
> >> Webrev: http://cr.openjdk.java.net/~iveresov/8076968/webrev.00/ <http://cr.openjdk.java.net/~iveresov/8076968/webrev.00/>
> >>
> >> Thanks,
> >> igor
> >>
> 


From vitalyd at gmail.com  Tue Apr  7 02:12:41 2015
From: vitalyd at gmail.com (Vitaly Davidovich)
Date: Mon, 6 Apr 2015 22:12:41 -0400
Subject: RFR(S) 8076968: PICL based initialization of L2 cache line size
	on some SPARC systems is incorrect
In-Reply-To: <6D50B83F-FA80-48B4-BBD8-6A4EC41A3D17@oracle.com>
References: <0CE3FCF6-E68E-4011-B102-D041536370AB@oracle.com>
	<55230FE3.2070301@oracle.com>
	<F7089DEC-DA38-4CB6-BF9C-291DF27DD00E@oracle.com>
	<CAHjP37HxER+oz=SQtoZrU1N756jzuLVqg3jmExs-WQBTiwpDRQ@mail.gmail.com>
	<6D50B83F-FA80-48B4-BBD8-6A4EC41A3D17@oracle.com>
Message-ID: <CAHjP37FmaJ-PbJATVhWLAABV40_C9Gf2g-z8XkDHK6YpdfzWdw@mail.gmail.com>

I initially read the comment as meaning it memoizes the name of the
property (and not the value) but re-reading it again, I think it's fine.

sent from my phone
On Apr 6, 2015 9:53 PM, "Igor Veresov" <igor.veresov at oracle.com> wrote:

> Vitaly,
>
> Thanks, others already noted the typo.
> The comment means that we memoize the result of bruteforcing of the name
> of the property. I'm not quite sure what the confusion is. What would you
> like it to say?
>
> igor
>
> On Apr 6, 2015, at 5:13 PM, Vitaly Davidovich <vitalyd at gmail.com> wrote:
>
> Hi Igor,
>
> // One the first visit determine the name of the l2 cache line size
> property and memoize it
>
> Typo - should be "On the first ..." I presume.
>
> Also, that code doesn't just memoize the property name, it also appears to
> actually probe for and set the value - worthwhile to update the comment?
>
> sent from my phone
> On Apr 6, 2015 8:00 PM, "Igor Veresov" <igor.veresov at oracle.com> wrote:
>
>> Thanks, Vladimir!
>>
>> igor
>>
>> > On Apr 6, 2015, at 3:59 PM, Vladimir Kozlov <vladimir.kozlov at oracle.com>
>> wrote:
>> >
>> > Good.
>> >
>> > Thanks,
>> > Vladimir
>> >
>> > On 4/6/15 3:46 PM, Igor Veresov wrote:
>> >> The L2 data cache line size property can be named either
>> "l2-cache-line-size" or "l2-dcache-line-size". We have to try them both.
>> >>
>> >> Webrev: http://cr.openjdk.java.net/~iveresov/8076968/webrev.00/
>> >>
>> >> Thanks,
>> >> igor
>> >>
>>
>>
>

From michael.c.berg at intel.com  Tue Apr  7 18:07:42 2015
From: michael.c.berg at intel.com (Berg, Michael C)
Date: Tue, 7 Apr 2015 18:07:42 +0000
Subject: RFR(L): 8074981 (Integer/FP scalar reduction optimization )
In-Reply-To: <C568518E7B433348B114B6A7122D474755DC3649@FMSMSX102.amr.corp.intel.com>
References: <C568518E7B433348B114B6A7122D474755DC3649@FMSMSX102.amr.corp.intel.com>
Message-ID: <C568518E7B433348B114B6A7122D474755DCE86F@FMSMSX102.amr.corp.intel.com>

Please ignore this one, it's already checked in...

From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Berg, Michael C
Sent: Monday, March 16, 2015 2:18 PM
To: hotspot-compiler-dev at openjdk.java.net
Subject: RFR(L): 8074981 (Integer/FP scalar reduction optimization )

Hi All,

We would like to contribute the Integer/FP scalar reduction optimization from Intel.
The contribution is referenced as Bug ID 8074981 as a performance enhancement.

Please review this patch:
Bug-id: https://bugs.openjdk.java.net/browse/JDK-8074981
webrev: https://bugs.openjdk.java.net/secure/attachment/26101/webrev.zip

The optimization achieves as much as 2.3x on integer reductions and supports float and double precision optimizations
which also have significant optimization uplift and obey strict fp constraints.

Nils Eliasson has offered to sponsor this patch.

Thanks,

-Michael
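
For readers unfamiliar with the term, a scalar reduction is a loop that
folds an array into a single value. The shapes below are a minimal
sketch of such loops; the class and method names are hypothetical, and
whether a given JVM actually vectorizes them depends on the JVM build
and the hardware, which this example does not check.

```java
class ReductionLoops {
    // Integer sum reduction: every iteration updates the same scalar.
    static int sum(int[] a) {
        int s = 0;
        for (int i = 0; i < a.length; i++) {
            s += a[i];
        }
        return s;
    }

    // Integer dot product: a multiply feeding an add reduction.
    static int dot(int[] a, int[] b) {
        int s = 0;
        for (int i = 0; i < a.length; i++) {
            s += a[i] * b[i];
        }
        return s;
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4};
        int[] b = {5, 6, 7, 8};
        System.out.println(sum(a));
        System.out.println(dot(a, b));
    }
}
```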


From vitalyd at gmail.com  Tue Apr  7 18:30:37 2015
From: vitalyd at gmail.com (Vitaly Davidovich)
Date: Tue, 7 Apr 2015 14:30:37 -0400
Subject: RFR(L): 8074981 (Integer/FP scalar reduction optimization )
In-Reply-To: <C568518E7B433348B114B6A7122D474755DCE86F@FMSMSX102.amr.corp.intel.com>
References: <C568518E7B433348B114B6A7122D474755DC3649@FMSMSX102.amr.corp.intel.com>
	<C568518E7B433348B114B6A7122D474755DCE86F@FMSMSX102.amr.corp.intel.com>
Message-ID: <CAHjP37G9HNHi3=P6CP0720Z+CMa147O-i6j51iC0g0q8WTy4JQ@mail.gmail.com>

Hi Michael/Vladimir,

Out of curiosity, is this change and the out-for-review avx512 one going to
be (or planned on being) backported to java 8?

Thanks

On Tue, Apr 7, 2015 at 2:07 PM, Berg, Michael C <michael.c.berg at intel.com>
wrote:

>  Please ignore this one, it's already checked in...
>
>
>
> *From:* hotspot-compiler-dev [mailto:
> hotspot-compiler-dev-bounces at openjdk.java.net] *On Behalf Of *Berg,
> Michael C
> *Sent:* Monday, March 16, 2015 2:18 PM
> *To:* hotspot-compiler-dev at openjdk.java.net
> *Subject:* RFR(L): 8074981 (Integer/FP scalar reduction optimization )
>
>
>
> Hi All,
>
>
>
> We would like to contribute the Integer/FP scalar reduction optimization from
> Intel.
>
> The contribution is referenced as Bug ID 8074981 as a performance
> enhancement.
>
>
>
> Please review this patch:
>
> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8074981
>
> webrev: https://bugs.openjdk.java.net/secure/attachment/26101/webrev.zip
>
>
>
> The optimization achieves as much as 2.3x on integer reductions and
> supports float and double precision optimizations
>
> which also have significant optimization uplift and obey strict fp
> constraints.
>
>
>
> Nils Eliasson has offered to sponsor this patch.
>
>
>
> Thanks,
>
>
>
> -Michael
>
>
>

From vladimir.kozlov at oracle.com  Tue Apr  7 18:38:42 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 07 Apr 2015 11:38:42 -0700
Subject: RFR(L): 8074981 (Integer/FP scalar reduction optimization )
In-Reply-To: <CAHjP37G9HNHi3=P6CP0720Z+CMa147O-i6j51iC0g0q8WTy4JQ@mail.gmail.com>
References: <C568518E7B433348B114B6A7122D474755DC3649@FMSMSX102.amr.corp.intel.com>	<C568518E7B433348B114B6A7122D474755DCE86F@FMSMSX102.amr.corp.intel.com>
	<CAHjP37G9HNHi3=P6CP0720Z+CMa147O-i6j51iC0g0q8WTy4JQ@mail.gmail.com>
Message-ID: <55242432.9010607@oracle.com>

Currently it is only jdk9. There are no plans to backport to 8u.
The thinking is that jdk9 will be released around the time this hardware becomes widely available.

Regards,
Vladimir

On 4/7/15 11:30 AM, Vitaly Davidovich wrote:
> Hi Michael/Vladimir,
>
> Out of curiosity, is this change and the out-for-review avx512 one going to be (or planned on being) backported to java 8?
>
> Thanks
>
> On Tue, Apr 7, 2015 at 2:07 PM, Berg, Michael C <michael.c.berg at intel.com <mailto:michael.c.berg at intel.com>> wrote:
>
>     Please ignore this one, it's already checked in...
>
>     *From:* hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net
>     <mailto:hotspot-compiler-dev-bounces at openjdk.java.net>] *On Behalf Of *Berg, Michael C
>     *Sent:* Monday, March 16, 2015 2:18 PM
>     *To:* hotspot-compiler-dev at openjdk.java.net <mailto:hotspot-compiler-dev at openjdk.java.net>
>     *Subject:* RFR(L): 8074981 (Integer/FP scalar reduction optimization )
>
>     Hi All,
>
>     We would like to contribute the Integer/FP scalar reduction optimization from Intel.
>     The contribution is referenced as Bug ID 8074981 as a performance enhancement.
>
>     Please review this patch:
>     Bug-id: https://bugs.openjdk.java.net/browse/JDK-8074981
>     webrev: https://bugs.openjdk.java.net/secure/attachment/26101/webrev.zip
>
>     The optimization achieves as much as 2.3x on integer reductions and supports float and double precision
>     optimizations
>     which also have significant optimization uplift and obey strict fp constraints.
>
>     Nils Eliasson has offered to sponsor this patch.
>
>     Thanks,
>
>     -Michael
>
>

From vitalyd at gmail.com  Tue Apr  7 18:55:17 2015
From: vitalyd at gmail.com (Vitaly Davidovich)
Date: Tue, 7 Apr 2015 14:55:17 -0400
Subject: RFR(L): 8074981 (Integer/FP scalar reduction optimization )
In-Reply-To: <55242432.9010607@oracle.com>
References: <C568518E7B433348B114B6A7122D474755DC3649@FMSMSX102.amr.corp.intel.com>
	<C568518E7B433348B114B6A7122D474755DCE86F@FMSMSX102.amr.corp.intel.com>
	<CAHjP37G9HNHi3=P6CP0720Z+CMa147O-i6j51iC0g0q8WTy4JQ@mail.gmail.com>
	<55242432.9010607@oracle.com>
Message-ID: <CAHjP37GQTLVKyHkHr3TMXvP3_vvrKkp-rNfjgjbK-WQa=dCkLw@mail.gmail.com>

Ok, thanks.  That makes sense for avx512 support, but I think having
Michael's changes from this thread sooner would be nice as it's quite
likely that users are already running java 8 on hardware where this may
have benefit.  Java 9 is still ways away, and even when it's released, the
migration process is not always quick (depending on the nature of major
changes).  But, if backporting it is messy, it's probably not worth it.

On Tue, Apr 7, 2015 at 2:38 PM, Vladimir Kozlov <vladimir.kozlov at oracle.com>
wrote:

> Currently it is only jdk9. There are no plans to backport to 8u.
> The thinking is that we will get jdk9 released when this hardware will be
> widely available.
>
> Regards,
> Vladimir
>
> On 4/7/15 11:30 AM, Vitaly Davidovich wrote:
>
>> Hi Michael/Vladimir,
>>
>> Out of curiosity, is this change and the out-for-review avx512 one going
>> to be (or planned on being) backported to java 8?
>>
>> Thanks
>>
>> On Tue, Apr 7, 2015 at 2:07 PM, Berg, Michael C <michael.c.berg at intel.com
>> <mailto:michael.c.berg at intel.com>> wrote:
>>
>>     Please ignore this one, it's already checked in...
>>
>>     *From:* hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net
>>     <mailto:hotspot-compiler-dev-bounces at openjdk.java.net>] *On Behalf Of *Berg, Michael C
>>     *Sent:* Monday, March 16, 2015 2:18 PM
>>     *To:* hotspot-compiler-dev at openjdk.java.net <mailto:hotspot-compiler-dev at openjdk.java.net>
>>     *Subject:* RFR(L): 8074981 (Integer/FP scalar reduction optimization )
>>
>>     Hi All,
>>
>>     We would like to contribute the Integer/FP scalar reduction optimization from Intel.
>>     The contribution is referenced as Bug ID 8074981 as a performance enhancement.
>>
>>     Please review this patch:
>>     Bug-id: https://bugs.openjdk.java.net/browse/JDK-8074981
>>     webrev: https://bugs.openjdk.java.net/secure/attachment/26101/webrev.zip
>>
>>     The optimization achieves as much as 2.3x on integer reductions and supports float and double precision
>>     optimizations
>>     which also have significant optimization uplift and obey strict fp constraints.
>>
>>     Nils Eliasson has offered to sponsor this patch.
>>
>>     Thanks,
>>
>>     -Michael
>>
>>
>>

From vladimir.kozlov at oracle.com  Tue Apr  7 19:04:38 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 07 Apr 2015 12:04:38 -0700
Subject: RFR(L): 8074981 (Integer/FP scalar reduction optimization )
In-Reply-To: <CAHjP37GQTLVKyHkHr3TMXvP3_vvrKkp-rNfjgjbK-WQa=dCkLw@mail.gmail.com>
References: <C568518E7B433348B114B6A7122D474755DC3649@FMSMSX102.amr.corp.intel.com>	<C568518E7B433348B114B6A7122D474755DCE86F@FMSMSX102.amr.corp.intel.com>	<CAHjP37G9HNHi3=P6CP0720Z+CMa147O-i6j51iC0g0q8WTy4JQ@mail.gmail.com>	<55242432.9010607@oracle.com>
	<CAHjP37GQTLVKyHkHr3TMXvP3_vvrKkp-rNfjgjbK-WQa=dCkLw@mail.gmail.com>
Message-ID: <55242A46.7020108@oracle.com>

We want to motivate people to migrate to new releases :)
If you mean loop reduction vectorization we can consider it after it is tested for some time in jdk9.

Vladimir

On 4/7/15 11:55 AM, Vitaly Davidovich wrote:
> Ok, thanks.  That makes sense for avx512 support, but I think having Michael's changes from this thread sooner would be
> nice as it's quite likely that users are already running java 8 on hardware where this may have benefit.  Java 9 is
> still ways away, and even when it's released, the migration process is not always quick (depending on the nature of
> major changes).  But, if backporting it is messy, it's probably not worth it.
>
> On Tue, Apr 7, 2015 at 2:38 PM, Vladimir Kozlov <vladimir.kozlov at oracle.com <mailto:vladimir.kozlov at oracle.com>> wrote:
>
>     Currently it is only jdk9. There are no plans to backport to 8u.
>     The thinking is that we will get jdk9 released when this hardware will be widely available.
>
>     Regards,
>     Vladimir
>
>     On 4/7/15 11:30 AM, Vitaly Davidovich wrote:
>
>         Hi Michael/Vladimir,
>
>         Out of curiosity, is this change and the out-for-review avx512 one going to be (or planned on being) backported
>         to java 8?
>
>         Thanks
>
>         On Tue, Apr 7, 2015 at 2:07 PM, Berg, Michael C <michael.c.berg at intel.com <mailto:michael.c.berg at intel.com>
>         <mailto:michael.c.berg at intel.com <mailto:michael.c.berg at intel.com>>> wrote:
>
>              Please ignore this one, it's already checked in...
>
>              *From:* hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net
>         <mailto:hotspot-compiler-dev-bounces at openjdk.java.net>
>              <mailto:hotspot-compiler-dev-bounces at openjdk.java.net
>         <mailto:hotspot-compiler-dev-bounces at openjdk.java.net>>] *On Behalf Of *Berg, Michael C
>              *Sent:* Monday, March 16, 2015 2:18 PM
>              *To:* hotspot-compiler-dev at openjdk.java.net <mailto:hotspot-compiler-dev at openjdk.java.net>
>         <mailto:hotspot-compiler-dev at openjdk.java.net <mailto:hotspot-compiler-dev at openjdk.java.net>>
>              *Subject:* RFR(L): 8074981 (Integer/FP scalar reduction optimization )
>
>              Hi All,
>
>              We would like to contribute the Integer/FP scalar reduction optimization from Intel.
>              The contribution is referenced as Bug ID 8074981 as a performance enhancement.
>
>              Please review this patch:
>              Bug-id: https://bugs.openjdk.java.net/browse/JDK-8074981
>         <https://bugs.openjdk.java.net/browse/JDK-8074981>
>              webrev: https://bugs.openjdk.java.net/secure/attachment/26101/webrev.zip
>         <https://bugs.openjdk.java.net/secure/attachment/26101/webrev.zip>
>
>              The optimization achieves as much as 2.3x on integer reductions and supports float and double precision
>              optimizations
>              which also have significant optimization uplift and obey strict fp constraints.
>
>              Nils Eliasson has offered to sponsor this patch.
>
>              Thanks,
>
>              -Michael
>
>
>

From michael.haupt at oracle.com  Tue Apr  7 19:11:30 2015
From: michael.haupt at oracle.com (Michael Haupt)
Date: Tue, 7 Apr 2015 21:11:30 +0200
Subject: RFR (S): 8076461: JSR292: remove unused native and constants
Message-ID: <4EB3C4DA-C382-4795-A676-6147E863DFF1@oracle.com>

Dear all,

please review and sponsor this change. Cross-posted to hs-comp and core-lib as this is at the JVM/libraries boundary. This is a straightforward refactoring change that removes many constants and unused API from MHNatives, and places some constants used only in MemberName in that class.

RFE: https://bugs.openjdk.java.net/browse/JDK-8076461
Changes: http://cr.openjdk.java.net/~mhaupt/8076461/webrev.00/

Tested with JPRT, HotSpot testset.

Thanks,

Michael

-- 

Dr. Michael Haupt | Principal Member of Technical Staff
Phone: +49 331 200 7277 | Fax: +49 331 200 7561
Oracle Java Platform Group | HotSpot Compiler Team 
Oracle Deutschland B.V. & Co. KG, Schiffbauergasse 14 | 14467 Potsdam, Germany


From vitalyd at gmail.com  Tue Apr  7 19:12:16 2015
From: vitalyd at gmail.com (Vitaly Davidovich)
Date: Tue, 7 Apr 2015 15:12:16 -0400
Subject: RFR(L): 8074981 (Integer/FP scalar reduction optimization )
In-Reply-To: <55242A46.7020108@oracle.com>
References: <C568518E7B433348B114B6A7122D474755DC3649@FMSMSX102.amr.corp.intel.com>
	<C568518E7B433348B114B6A7122D474755DCE86F@FMSMSX102.amr.corp.intel.com>
	<CAHjP37G9HNHi3=P6CP0720Z+CMa147O-i6j51iC0g0q8WTy4JQ@mail.gmail.com>
	<55242432.9010607@oracle.com>
	<CAHjP37GQTLVKyHkHr3TMXvP3_vvrKkp-rNfjgjbK-WQa=dCkLw@mail.gmail.com>
	<55242A46.7020108@oracle.com>
Message-ID: <CAHjP37F0AL6hTBN1b3djHM_Dah=MWkxRw=reia9w-jiRZoJExg@mail.gmail.com>

Oh, the motivation is there! :) However, it's not always a quick process
even if everyone's motivated as there may be changes of consequence.  As a
small example, java 8 virtual memory charge is significantly higher than
java 7 due to metaspace vs permgen differences.  In some cases, this now
requires tweaking java 8 settings in order to keep things running
smoothly.  With a big enough codebase, such migrations are never as quick
as one would hope.

At any rate, yes, I meant loop reduction vectorization.  It seems like a
fairly self-contained change which should be relatively painless to
backport, hence my inquiry.

On Tue, Apr 7, 2015 at 3:04 PM, Vladimir Kozlov <vladimir.kozlov at oracle.com>
wrote:

> We want to motivate people to migrate to new releases :)
> If you mean loop reduction vectorization we can consider it after it is
> tested for some time in jdk9.
>
> Vladimir
>
> On 4/7/15 11:55 AM, Vitaly Davidovich wrote:
>
>> Ok, thanks.  That makes sense for avx512 support, but I think having
>> Michael's changes from this thread sooner would be nice as it's quite
>> likely that users are already running java 8 on hardware where this may
>> have benefit.  Java 9 is still ways away, and even when it's released,
>> the migration process is not always quick (depending on the nature of
>> major changes).  But, if backporting it is messy, it's probably not
>> worth it.
>>
>> On Tue, Apr 7, 2015 at 2:38 PM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
>>
>>     Currently it is only jdk9. There are no plans to backport to 8u.
>>     The thinking is that we will get jdk9 released when this hardware
>>     will be widely available.
>>
>>     Regards,
>>     Vladimir
>>
>>     On 4/7/15 11:30 AM, Vitaly Davidovich wrote:
>>
>>         Hi Michael/Vladimir,
>>
>>         Out of curiosity, is this change and the out-for-review avx512
>>         one going to be (or planned on being) backported to java 8?
>>
>>         Thanks
>>
>>         On Tue, Apr 7, 2015 at 2:07 PM, Berg, Michael C <michael.c.berg at intel.com> wrote:
>>
>>              Please ignore this one; it's already checked in.
>>
>>              *From:* hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] *On Behalf Of* Berg, Michael C
>>              *Sent:* Monday, March 16, 2015 2:18 PM
>>              *To:* hotspot-compiler-dev at openjdk.java.net
>>              *Subject:* RFR(L): 8074981 (Integer/FP scalar reduction optimization )
>>
>>              Hi All,
>>
>>              We would like to contribute the Integer/FP scalar reduction
>>              optimization from Intel.
>>              The contribution is referenced as Bug ID 8074981 as a
>>              performance enhancement.
>>
>>              Please review this patch:
>>
>>              Bug-id: https://bugs.openjdk.java.net/browse/JDK-8074981
>>              webrev: https://bugs.openjdk.java.net/secure/attachment/26101/webrev.zip
>>
>>              The optimization achieves as much as 2.3x on integer
>>              reductions and supports float and double precision
>>              optimizations, which also have significant optimization
>>              uplift and obey strict fp constraints.
>>
>>              Nils Eliasson has offered to sponsor this patch.
>>
>>              Thanks,
>>
>>              -Michael

From michael.haupt at oracle.com  Tue Apr  7 20:10:53 2015
From: michael.haupt at oracle.com (Michael Haupt)
Date: Tue, 7 Apr 2015 22:10:53 +0200
Subject: RFR (S): 8076461: JSR292: remove unused native and constants
In-Reply-To: <4EB3C4DA-C382-4795-A676-6147E863DFF1@oracle.com>
References: <4EB3C4DA-C382-4795-A676-6147E863DFF1@oracle.com>
Message-ID: <8D74B563-185A-40E8-AEA2-A6688E819377@oracle.com>

Hello,

in case anyone was wondering about the empty changeset in the webrev: that's fixed now. Thanks to Vladimir I. for pointing out the glitch in my webrev creation approach. :-)

Best,

Michael

> Am 07.04.2015 um 21:11 schrieb Michael Haupt <michael.haupt at oracle.com>:
> 
> Dear all,
> 
> please review and sponsor this change. Cross-posted to hs-comp and core-lib as this is at the JVM/libraries boundary. This is a straightforward refactoring change that removes many constants and unused API from MHNatives, and places some constants used only in MemberName in that class.
> 
> RFE: https://bugs.openjdk.java.net/browse/JDK-8076461 <https://bugs.openjdk.java.net/browse/JDK-8076461>
> Changes: http://cr.openjdk.java.net/~mhaupt/8076461/webrev.00/ <http://cr.openjdk.java.net/~mhaupt/8076461/webrev.00/>
> 
> Tested with JPRT, HotSpot testset.
> 
> Thanks,
> 
> Michael


-- 

 <http://www.oracle.com/>
Dr. Michael Haupt | Principal Member of Technical Staff
Phone: +49 331 200 7277 | Fax: +49 331 200 7561
Oracle Java Platform Group | HotSpot Compiler Team 
Oracle Deutschland B.V. & Co. KG, Schiffbauergasse 14 | 14467 Potsdam, Germany
 <http://www.oracle.com/commitment>	Oracle is committed to developing practices and products that help protect the environment

From john.r.rose at oracle.com  Tue Apr  7 21:49:59 2015
From: john.r.rose at oracle.com (John Rose)
Date: Tue, 7 Apr 2015 14:49:59 -0700
Subject: RFR (S): 8076461: JSR292: remove unused native and constants
In-Reply-To: <4EB3C4DA-C382-4795-A676-6147E863DFF1@oracle.com>
References: <4EB3C4DA-C382-4795-A676-6147E863DFF1@oracle.com>
Message-ID: <C1C9C217-1448-416E-9862-47C52A4ED7DC@oracle.com>

On Apr 7, 2015, at 12:11 PM, Michael Haupt <michael.haupt at oracle.com> wrote:
> 
> Dear all,
> 
> please review and sponsor this change. Cross-posted to hs-comp and core-lib as this is at the JVM/libraries boundary. This is a straightforward refactoring change that removes many constants and unused API from MHNatives, and places some constants used only in MemberName in that class.

The class MethodHandleNatives.Constants exists to enumerate and cross-check any constants which the JVM and JDK code need to agree about.  Removing a constant from MethodHandleNatives.Constants (moving it to MemberName) may cause failures when MHN.verifyConstants is run (via "java -esa" on a debug build of Java).  If there are no failures, I wonder what would happen if the JVM and JDK got out of sync in their notion of the value of a constant like MN_CALLER_SENSITIVE.  It's important that some part of our release testing detect if MN_CALLER_SENSITIVE (etc.) gets out of sync.

If there is some reason why this testing is no longer needed, I'd like to see the whole Constants class go away, since that's all it's really good for.  But I don't see that reason yet, and moving the constants somewhere either will cause a test failure, or *should* cause a test failure.
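
The kind of cross-check described above can be sketched roughly as follows. This is only an illustration in C++ and not the actual MHN.verifyConstants code: the NamedConstant type, the count_mismatches helper and the constant values used with it are assumptions made for this sketch.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical record: a constant's name and the value one side believes it has.
struct NamedConstant {
  const char* name;
  int value;
};

// Hypothetical helper: counts library-side constants that the VM side either
// does not know about or knows under a different value.
static int count_mismatches(const NamedConstant* lib, int lib_len,
                            const NamedConstant* vm, int vm_len) {
  int mismatches = 0;
  for (int i = 0; i < lib_len; i++) {
    bool ok = false;
    for (int j = 0; j < vm_len; j++) {
      if (std::strcmp(lib[i].name, vm[j].name) == 0) {
        ok = (lib[i].value == vm[j].value);  // same name, values must agree
        break;
      }
    }
    if (!ok) {
      mismatches++;  // missing on the VM side, or out of sync
    }
  }
  return mismatches;
}
```

A constant that only one side still declares shows up as a mismatch here, which is the failure mode the paragraph above worries about when constants are moved out of the checked class.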

I'm happy to see the "GC" guys go away.  They were artifacts of a quickly moving 292 implementation that spanned two repositories with unsynchronized change streams.

- John

> 
> RFE: https://bugs.openjdk.java.net/browse/JDK-8076461
> Changes: http://cr.openjdk.java.net/~mhaupt/8076461/webrev.00/
> 
> Tested with JPRT, HotSpot testset.
> 
> Thanks,
> 
> Michael
> 


From vladimir.x.ivanov at oracle.com  Wed Apr  8 13:06:41 2015
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Wed, 08 Apr 2015 16:06:41 +0300
Subject: [9] RFR (M): 8057967: CallSite dependency tracking scales
	devastatingly poorly
In-Reply-To: <551C5B92.8060500@oracle.com>
References: <551C5B92.8060500@oracle.com>
Message-ID: <552527E1.5060102@oracle.com>

Any volunteers to review VM part?

Latest webrev:
   http://cr.openjdk.java.net/~vlivanov/8057967/webrev.01/hotspot/
   http://cr.openjdk.java.net/~vlivanov/8057967/webrev.01/jdk/

Best regards,
Vladimir Ivanov

On 4/1/15 11:56 PM, Vladimir Ivanov wrote:
> http://cr.openjdk.java.net/~vlivanov/8057967/webrev.00/hotspot/
> http://cr.openjdk.java.net/~vlivanov/8057967/webrev.00/jdk/
> https://bugs.openjdk.java.net/browse/JDK-8057967
>
> HotSpot JITs inline very aggressively through CallSites. They
> optimistically treat CallSite target as constant, but record an nmethod
> dependency to invalidate the compiled code once CallSite target changes.
>
> Right now, such dependencies have call site class as a context. This
> context is too coarse and it leads to context pollution: if some
> CallSite target changes, VM needs to enumerate all nmethods which
> depend on call sites of such type.
>
> As performance analysis in the bug report shows, it can add up to a
> significant amount of work.
>
> While working on the fix, I investigated 3 approaches:
>    (1) unique context per call site
>    (2) use CallSite target class
>    (3) use a class the CallSite instance is linked to
>
> Considering call sites are ubiquitous (e.g. 10,000s on some octane
> benchmarks), loading a dedicated class for every call site is
> overkill (even VM anonymous).
>
> CallSite target class
> (MethodHandle.form->LambdaForm.vmentry->MemberName.clazz->Class<?>) is
> also not satisfactory, since it is a compiled LambdaForm VM anonymous
> class, which is heavily shared. It gets context pollution down, but
> still the overhead is quite high.
>
> So, I decided to focus on (3) and ended up with a mixture of (2) & (3).
>
> Compared to other options, the complications of (3) are:
>    - CallSite can stay unlinked (e.g. CallSite.dynamicInvoker()), so
> there should be some default context VM can use
>
>    - CallSite instances can be shared and it shouldn't keep the context
> class from unloading;
>
> It motivated a scheme where CallSite context is initialized lazily and
> can change during lifetime. When CallSite is linked with an indy
> instruction, its context is initialized. Usually, JIT sees CallSite
> instances with initialized context (since it reaches them through indy),
> but if it's not the case and there's no context yet, JIT sets it to
> "default context", which means "use target call site".
>
> I introduced CallSite$DependencyContext, which represents an nmethod
> dependency context and points (indirectly) to a Class<?> used as a context.
>
> Context class is referenced through a phantom reference
> (sun.misc.Cleaner to simplify cleanup). Though it's impossible to
> extract referent using Reference.get(), VM can access it directly by
> reading corresponding field. Unlike other types of references, phantom
> references aren't cleared automatically. It allows VM to access context
> class until cleanup is performed. And cleanup resets the context to
> NULL, in addition to invalidating all relevant dependencies.
>
> There are 3 context states a CallSite instance can be in:
>    (1) NULL: no dependencies
>    (2) DependencyContext.DEFAULT_CONTEXT: dependencies are stored in
> call site target class
>    (3) DependencyContext for some class: dependencies are stored on the
> class DependencyContext instance points to
>
> Every CallSite starts w/o a context (1) and then lazily gets one ((2) or
> (3) depending on the situation).
>
> State transitions:
>    (1->3): When a CallSite w/o a context (1) is linked with some indy
> call site, its owner is recorded as a context (3).
>
>    (1->2): When JIT needs to record a dependency on a target of a
> CallSite w/o a context(1), it sets the context to DEFAULT_CONTEXT and
> uses target class to store the dependency.
>
>    (3->1): When context class becomes unreachable, a cleanup hook
> invalidates all dependencies on that CallSite and resets the context to
> NULL (1).
>
> Only (3->1) requires dependency invalidation, because there are no
> dependencies in (1) and (2->1) isn't performed.
>
> (1->3) is done in Java code (CallSite.initContext) and (1->2) is
> performed in VM (ciCallSite::get_context()). The updates are performed
> by CAS, so there's no need for additional synchronization. Other
> operations on VM side are volatile (to play well with Java code) and
> performed with Compile_lock held (to avoid races between VM operations).
>
> Some statistics:
>    Box2D, latest jdk9-dev
>      - CallSite instances: ~22000
>
>      - invalidated nmethods due to CallSite target changes: ~60
>
>      - checked call_site_target_value dependencies:
>        - before the fix: ~1,600,000
>        - after the fix:        ~600
>
> Testing:
>    - dedicated test which exercises different state transitions
>    - jdk/java/lang/invoke, hotspot/test/compiler/jsr292, nashorn
>
> Thanks!
>
> Best regards,
> Vladimir Ivanov
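
The context states and CAS-based transitions described in the quoted message can be modelled roughly as below. This is a minimal sketch in C++ and not the code from the webrev: CallSiteModel, its method names and the holder field are assumptions made for illustration, and the nmethod invalidation that accompanies (3->1) is left out.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for CallSite$DependencyContext.
struct DependencyContext {
  const char* holder;  // stands in for the Class<?> the context points to
};

// Sentinel for state (2): dependencies live on the call site target class.
static DependencyContext DEFAULT_CONTEXT = { "<call site target class>" };

// Hypothetical model of a CallSite's context field.
// nullptr is state (1), &DEFAULT_CONTEXT is state (2), anything else is state (3).
struct CallSiteModel {
  std::atomic<DependencyContext*> context{nullptr};  // every call site starts in (1)

  // (1->3): linking with an indy instruction records the owner as context.
  // Returns false if some context was already installed.
  bool init_context(DependencyContext* owner) {
    DependencyContext* expected = nullptr;
    return context.compare_exchange_strong(expected, owner);
  }

  // (1->2): the JIT needs a context and none is set, so it installs the
  // default one. Whichever context won the race is returned.
  DependencyContext* get_context() {
    DependencyContext* expected = nullptr;
    if (context.compare_exchange_strong(expected, &DEFAULT_CONTEXT)) {
      return &DEFAULT_CONTEXT;
    }
    return expected;  // after a failed CAS, expected holds the current value
  }

  // (3->1): cleanup hook once the context class becomes unreachable.
  // A real implementation would also invalidate dependent nmethods here.
  void cleanup() { context.store(nullptr); }
};
```

Because both installing transitions start from the empty state and use compare-and-swap, a call site ends up with exactly one of the two contexts no matter which side gets there first.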

From vladimir.kozlov at oracle.com  Wed Apr  8 19:36:23 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 08 Apr 2015 12:36:23 -0700
Subject: RFR 8076276 support for AVX512
In-Reply-To: <C568518E7B433348B114B6A7122D474755DCE552@FMSMSX102.amr.corp.intel.com>
References: <C568518E7B433348B114B6A7122D474755DCE552@FMSMSX102.amr.corp.intel.com>
Message-ID: <55258337.2050605@oracle.com>

I would suggest removing MoveK and RegK from these changes since they are not used.
We can add them later when you have the use case.

sharedRuntime_x86_64.*: you should have code here, not a comment:
// TODO: add ZMM save code

vm_version_x86.cpp: add code to verify that the system preserves Z registers during interrupts. See the code after this comment:

// Some OSs have a bug when upper 128bits of YMM


I see the following pattern repeated in C1 code. It should be moved to a function in FrameMap:

+        int num_caller_save_xmm_regs = FrameMap::nof_caller_save_xmm_regs;
+#if _LP64
+        if (UseAVX < 3) {
+          num_caller_save_xmm_regs = num_caller_save_xmm_regs / 2;
+        }
+#endif
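
A minimal sketch of what such a FrameMap helper could look like. The function name get_num_caller_save_xmms, the stand-in for the UseAVX flag and the register count of 32 are assumptions for illustration and are not taken from the webrev:

```cpp
#include <cassert>

// Stand-in for the HotSpot UseAVX flag (assumption: 3 means AVX-512 is enabled).
static int UseAVX = 2;

// Simplified stand-in for C1's FrameMap, reduced to what the pattern touches.
struct FrameMap {
  // Assumed value for illustration only.
  enum { nof_caller_save_xmm_regs = 32 };

  // Hypothetical helper that replaces the repeated pattern at each call site.
  static int get_num_caller_save_xmms() {
    int num_caller_save_xmm_regs = nof_caller_save_xmm_regs;
#ifdef _LP64
    if (UseAVX < 3) {
      // Without AVX-512 only the lower half of the register file is usable.
      num_caller_save_xmm_regs = num_caller_save_xmm_regs / 2;
    }
#endif
    return num_caller_save_xmm_regs;
  }
};
```

Each call site would then read the count through FrameMap::get_num_caller_save_xmms() instead of repeating the #if block.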


In general we should avoid using #ifdef X86 in shared code: matcher.cpp. This file will not be an issue if you remove RegK 
from the changes.

c2compiler.cpp - can you move that code to Compile::pd_compiler2_init() which is platform specific?

matcher.cpp - typo 'eno':

+    // For VecZ we need eno alignment and 64 bytes (16 slots) for spills.


Thanks,
Vladimir


On 4/6/15 6:35 PM, Berg, Michael C wrote:
> Hi Folks,
>
> We (Intel) would like to contribute initial support for AVX512 (EVEX encoding, new register support, new ISA support,
> etc) for EVEX enabled microarchitectures.
> The contribution is referenced as Bug ID 8076276 as a performance enhancement.
>
> Please review this patch and comment as needed:
>
> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8076276
>
> webrev:
> http://cr.openjdk.java.net/~kvn/8076276/webrev
>
> Superword optimizations covered on the vectorization path experience as much as 50% reduction in loop trace instruction
> count which make up the path length of EVEX encoded SIMD optimized loops.
>
> Vladimir Kozlov has offered to sponsor this patch.
>

From vladimir.kozlov at oracle.com  Wed Apr  8 19:41:38 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 08 Apr 2015 12:41:38 -0700
Subject: RFR(L): 8074981 (Integer/FP scalar reduction optimization )
In-Reply-To: <CAHjP37F0AL6hTBN1b3djHM_Dah=MWkxRw=reia9w-jiRZoJExg@mail.gmail.com>
References: <C568518E7B433348B114B6A7122D474755DC3649@FMSMSX102.amr.corp.intel.com>	<C568518E7B433348B114B6A7122D474755DCE86F@FMSMSX102.amr.corp.intel.com>	<CAHjP37G9HNHi3=P6CP0720Z+CMa147O-i6j51iC0g0q8WTy4JQ@mail.gmail.com>	<55242432.9010607@oracle.com>	<CAHjP37GQTLVKyHkHr3TMXvP3_vvrKkp-rNfjgjbK-WQa=dCkLw@mail.gmail.com>	<55242A46.7020108@oracle.com>
	<CAHjP37F0AL6hTBN1b3djHM_Dah=MWkxRw=reia9w-jiRZoJExg@mail.gmail.com>
Message-ID: <55258472.4030106@oracle.com>

Note that if we backport loop reduction vectorization, we will backport only the 8074981 changes as they are:

http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/6fff5df5f3d2

There will be no support for MulVL in it which requires avx512.
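
For readers new to the thread, the shape of an integer add reduction and of its lane-wise form can be sketched as below. This is an illustration in C++ only and not how C2's superword pass is implemented: the function names and the lane count of 8 are assumptions, and unsigned arithmetic is used so that overflow wraps the way Java int addition does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar form: a single accumulator carried through every iteration.
static uint32_t sum_scalar(const uint32_t* a, std::size_t n) {
  uint32_t sum = 0;
  for (std::size_t i = 0; i < n; i++) {
    sum += a[i];
  }
  return sum;
}

// Assumed lane count, e.g. eight 32-bit lanes in a 256-bit vector.
static const std::size_t LANES = 8;

// Lane-wise form: independent partial sums, combined once after the loop.
static uint32_t sum_lanes(const uint32_t* a, std::size_t n) {
  uint32_t acc[LANES] = {0};
  std::size_t i = 0;
  for (; i + LANES <= n; i += LANES) {
    for (std::size_t l = 0; l < LANES; l++) {
      acc[l] += a[i + l];  // one vector add in a real SIMD loop
    }
  }
  uint32_t sum = 0;
  for (std::size_t l = 0; l < LANES; l++) {
    sum += acc[l];         // horizontal reduction of the lanes
  }
  for (; i < n; i++) {
    sum += a[i];           // scalar tail for leftover elements
  }
  return sum;
}
```

Both forms give the same result for wrapping integer addition because it is associative and commutative; floating point reductions do not have that freedom, which is why the original RFR mentions strict fp constraints.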

Regards,
Vladimir

On 4/7/15 12:12 PM, Vitaly Davidovich wrote:
> Oh, the motivation is there! :) However, it's not always a quick process even if everyone's motivated as there may be
> changes of consequence.  As a small example, java 8 virtual memory charge is significantly higher than java 7 due to
> metaspace vs permgen differences.  In some cases, this now requires tweaking java 8 settings in order to keep things
> running smoothly.  With a big enough codebase, such migrations are never as quick as one would hope.
>
> At any rate, yes, I meant loop reduction vectorization.  It seems like a fairly self-contained change which should be
> relatively painless to backport, hence my inquiry.

From vitalyd at gmail.com  Wed Apr  8 19:57:46 2015
From: vitalyd at gmail.com (Vitaly Davidovich)
Date: Wed, 8 Apr 2015 15:57:46 -0400
Subject: RFR(L): 8074981 (Integer/FP scalar reduction optimization )
In-Reply-To: <55258472.4030106@oracle.com>
References: <C568518E7B433348B114B6A7122D474755DC3649@FMSMSX102.amr.corp.intel.com>
	<C568518E7B433348B114B6A7122D474755DCE86F@FMSMSX102.amr.corp.intel.com>
	<CAHjP37G9HNHi3=P6CP0720Z+CMa147O-i6j51iC0g0q8WTy4JQ@mail.gmail.com>
	<55242432.9010607@oracle.com>
	<CAHjP37GQTLVKyHkHr3TMXvP3_vvrKkp-rNfjgjbK-WQa=dCkLw@mail.gmail.com>
	<55242A46.7020108@oracle.com>
	<CAHjP37F0AL6hTBN1b3djHM_Dah=MWkxRw=reia9w-jiRZoJExg@mail.gmail.com>
	<55258472.4030106@oracle.com>
Message-ID: <CAHjP37En63HFZrh_RnwqedDxvgM52TMq9ORhXzD0dNP_=4pxhg@mail.gmail.com>

Sounds good to me, I'll take what I can get :)

Thanks

On Wed, Apr 8, 2015 at 3:41 PM, Vladimir Kozlov <vladimir.kozlov at oracle.com>
wrote:

> Note, that if we backport loop reduction vectorization, we backport only
> 8074981 changes as they are:
>
> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/6fff5df5f3d2
>
> There will be no support for MulVL in it which requires avx512.
>
> Regards,
> Vladimir

From vladimir.kozlov at oracle.com  Wed Apr  8 20:32:56 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 08 Apr 2015 13:32:56 -0700
Subject: RFR 8076276 support for AVX512
In-Reply-To: <C568518E7B433348B114B6A7122D474755DCEBDA@FMSMSX102.amr.corp.intel.com>
References: <C568518E7B433348B114B6A7122D474755DCE552@FMSMSX102.amr.corp.intel.com>
	<55258337.2050605@oracle.com>
	<C568518E7B433348B114B6A7122D474755DCEBDA@FMSMSX102.amr.corp.intel.com>
Message-ID: <55259078.1080309@oracle.com>

Michael, please make sure to include the mailing lists in replies - this is a review process.

I understand that the K registers may be important, but I don't see the need to include them in these changes, which are huge 
already. We can do it as separate changes unless you point me to where they are critically needed for avx512 instructions.
I don't see them used in the current changes, which simply widen vectors to 512 bits.

I am concerned that the K reg implementation is incomplete, but it is hard to see and review it in the current changes.

Regards,
Vladimir

On 4/8/15 1:09 PM, Berg, Michael C wrote:
> Vladimir, RegK is needed as it frames the kmov instructions which utilize KRegister and
> the enumerated k registers, which are critically needed and used, although not yet matched (we use k1 and k0 now).  I will look into the rest of
> the comments.  The plan is to register allocate the k registers at some point though.
>
> Thanks,
> Michael
>
> -----Original Message-----
> From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Vladimir Kozlov
> Sent: Wednesday, April 08, 2015 12:36 PM
> To: hotspot-compiler-dev at openjdk.java.net
> Subject: Re: RFR 8076276 support for AVX512
>
> I would suggest to remove MoveK and RegK from these changes since they are not used.
> We can add them later when you have the use case.
>
> sharedRuntime_x86_64.* You should have code and not comment:
> // TODO: add ZMM save code
>
> vm_version_x86.cpp Add code to verify that system preserve Z registers during interrupt. See code after comment :
>
> // Some OSs have a bug when upper 128bits of YMM
>
>
> I see repeated next pattern in C1 code. It should be moved to a function in FrameMap:
>
> +        int num_caller_save_xmm_regs =
> +FrameMap::nof_caller_save_xmm_regs;
> +#if _LP64
> +        if (UseAVX < 3) {
> +          num_caller_save_xmm_regs = num_caller_save_xmm_regs / 2;
> +        }
> +#endif
>
>
> In general we should avoid using #ifdef X86 in shared code: matcher.cpp. This file will not be an issue if you remove RegK from the changes.
>
> c2compiler.cpp - can you move that code to Compile::pd_compiler2_init() which is platform specific?
>
> matcher.cpp - typo 'eno':
>
> +    // For VecZ we need eno alignment and 64 bytes (16 slots) for spills.
>
>
> Thanks,
> Vladimir
>
>
> On 4/6/15 6:35 PM, Berg, Michael C wrote:
>> Hi Folks,
>>
>> We (Intel) would like to contribute initial support for AVX512 (EVEX
>> encoding, new register support, new ISA support,
>> etc) for EVEX enabled microarchitectures.
>> The contribution is referenced as Bug ID 8076276 as a performance enhancement.
>>
>> Please review this patch and comment as needed:
>>
>> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8076276
>>
>> webrev:
>> http://cr.openjdk.java.net/~kvn/8076276/webrev
>>
>> Superword optimizations covered on the vectorization path experience
>> as much as 50% reduction in loop trace instruction count which make up the path length of EVEX encoded SIMD optimized loops.
>>
>> Vladimir Kozlov has offered to sponsor this patch.
>>

From mark.reinhold at oracle.com  Wed Apr  8 23:09:25 2015
From: mark.reinhold at oracle.com (mark.reinhold at oracle.com)
Date: Wed,  8 Apr 2015 16:09:25 -0700 (PDT)
Subject: JEP 243: Java-Level JVM Compiler Interface
Message-ID: <20150408230925.99BBD553C1@eggemoggin.niobe.net>

New JEP Candidate: http://openjdk.java.net/jeps/243

- Mark

From duncan.macgregor at ge.com  Thu Apr  9 09:46:21 2015
From: duncan.macgregor at ge.com (MacGregor, Duncan (GE Energy Management))
Date: Thu, 9 Apr 2015 09:46:21 +0000
Subject: [9] RFR (M): 8057967: CallSite dependency tracking scales
	devastatingly poorly
In-Reply-To: <551D6DAC.8030607@oracle.com>
References: <551C5B92.8060500@oracle.com> <551D6A0D.8090500@oracle.com>
	<551D6DAC.8030607@oracle.com>
Message-ID: <D14C0730.DAFC9%duncan.macgregor@ge.com>

Now I'm back from my Easter break I've done some testing with our
code. Hs-comp is looking good in general, and this code does appear to
give a nice little extra boost. My results are showing a difference at
peak performance, which I found slightly surprising, so I'll need to take a
look at just how often targets are being reset and for what reasons.

Anyway, in general I'm getting about 10% better performance with hs-comp
than 8u40, and that's in code which spends a substantial amount of its
time down in some C libraries.

Keep up the good work Vladimir!

Duncan.

On 02/04/2015 17:26, "Vladimir Ivanov" <vladimir.x.ivanov at oracle.com>
wrote:

>Aleksey, thanks a lot for the performance evaluation of the fix!
>
>Best regards,
>Vladimir Ivanov
>
>On 4/2/15 7:10 PM, Aleksey Shipilev wrote:
>> On 04/01/2015 11:56 PM, Vladimir Ivanov wrote:
>>> http://cr.openjdk.java.net/~vlivanov/8057967/webrev.00/hotspot/
>>> http://cr.openjdk.java.net/~vlivanov/8057967/webrev.00/jdk/
>>> https://bugs.openjdk.java.net/browse/JDK-8057967
>>
>> Glad to see this finally addressed, thanks!
>>
>> I did not look through the code changes, but ran Octane on my
>> configuration. As expected, Typescript had improved substantially. Other
>> benchmarks are not affected much. This is in line with the performance
>> analysis done for the original bug report.
>>
>> Baseline:
>>
>> Benchmark          Mode  Cnt        Score        Error  Units
>> Box2D.test           ss   20     4454.677 ±    345.807  ms/op
>> CodeLoad.test        ss   20     4784.299 ±    370.658  ms/op
>> Crypto.test          ss   20      878.395 ±     87.918  ms/op
>> DeltaBlue.test       ss   20      502.182 ±     52.362  ms/op
>> EarleyBoyer.test     ss   20     2250.508 ±    273.924  ms/op
>> Gbemu.test           ss   20     5893.102 ±    656.036  ms/op
>> Mandreel.test        ss   20     9323.484 ±    825.801  ms/op
>> NavierStokes.test    ss   20      657.608 ±     41.212  ms/op
>> PdfJS.test           ss   20     3829.534 ±    353.702  ms/op
>> Raytrace.test        ss   20     1202.826 ±    166.795  ms/op
>> Regexp.test          ss   20      156.782 ±     20.992  ms/op
>> Richards.test        ss   20      324.256 ±     35.874  ms/op
>> Splay.test           ss   20      179.660 ±     34.120  ms/op
>> Typescript.test      ss   20       40.537 ±      2.457   s/op
>>
>> Patched:
>>
>> Benchmark          Mode  Cnt        Score        Error  Units
>> Box2D.test           ss   20     4306.198 ±    376.030  ms/op
>> CodeLoad.test        ss   20     4881.635 ±    395.585  ms/op
>> Crypto.test          ss   20      823.551 ±    106.679  ms/op
>> DeltaBlue.test       ss   20      490.557 ±     41.705  ms/op
>> EarleyBoyer.test     ss   20     2299.763 ±    270.961  ms/op
>> Gbemu.test           ss   20     5612.868 ±    414.052  ms/op
>> Mandreel.test        ss   20     8616.735 ±    825.813  ms/op
>> NavierStokes.test    ss   20      640.722 ±     28.035  ms/op
>> PdfJS.test           ss   20     4139.396 ±    373.580  ms/op
>> Raytrace.test        ss   20     1227.632 ±    151.088  ms/op
>> Regexp.test          ss   20      169.246 ±     34.055  ms/op
>> Richards.test        ss   20      331.824 ±     32.706  ms/op
>> Splay.test           ss   20      168.479 ±     23.512  ms/op
>> Typescript.test      ss   20       31.181 ±      1.790   s/op
>>
>> The offending profile branch (Universe::flush_dependents_on) is also
>> gone, which explains the performance improvement.
>>
>> Thanks,
>> -Aleksey.
>>


From tobias.hartmann at oracle.com  Thu Apr  9 12:10:22 2015
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Thu, 09 Apr 2015 14:10:22 +0200
Subject: [9] RFR(XS): 8076625: IndexOutOfBoundsException in
	HeapByteBufferTest.java
Message-ID: <55266C2E.3050207@oracle.com>

Hi,

please review the following patch.

https://bugs.openjdk.java.net/browse/JDK-8076625
http://cr.openjdk.java.net/~thartmann/8076625/webrev.00/

Problem:
A random offset to access in a byte array is computed by 

  int randomOffset(SplittableRandom r, MyByteBuffer buf, int size) {
    return abs(r.nextInt()) % (buf.capacity() - size);
  }

The call to r.nextInt() may return Integer.MIN_VALUE (-2147483648) and the corresponding absolute value (+2147483648) does not fit into an int and will overflow back to -2147483648. As a result the returned offset is negative.

Solution:
Use nextInt(int n) to set the limit of the random value.
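
The overflow and the bounded alternative can be reproduced in isolation. The following is only a minimal sketch, not the code from the webrev: the class name RandomOffsetSketch and both helper methods are hypothetical, and plain int capacity/size arguments stand in for the test's MyByteBuffer.

```java
import java.util.SplittableRandom;

public class RandomOffsetSketch {
    // Shape of the failing computation: Math.abs(Integer.MIN_VALUE) overflows
    // and stays negative, so the remainder is negative as well.
    static int buggyOffset(int raw, int capacity, int size) {
        return Math.abs(raw) % (capacity - size);
    }

    // Bounded form: SplittableRandom.nextInt(bound) returns a value in [0, bound).
    // Assumes capacity > size, otherwise nextInt throws IllegalArgumentException.
    static int boundedOffset(SplittableRandom r, int capacity, int size) {
        return r.nextInt(capacity - size);
    }

    public static void main(String[] args) {
        // Feed the problematic value directly instead of waiting for the RNG to produce it.
        System.out.println(Math.abs(Integer.MIN_VALUE) < 0);
        System.out.println(buggyOffset(Integer.MIN_VALUE, 1024, 4) < 0);
        System.out.println(boundedOffset(new SplittableRandom(42), 1024, 4));
    }
}
```

Passing Integer.MIN_VALUE explicitly makes the failure deterministic; with a real random source it only shows up about once in 2^32 draws, which is why the test failed intermittently.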

Testing:
Failing testcase and JPRT.

Thanks,
Tobias

From vladimir.x.ivanov at oracle.com  Thu Apr  9 14:33:48 2015
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Thu, 09 Apr 2015 17:33:48 +0300
Subject: [9] RFR(XS): 8076625: IndexOutOfBoundsException in
	HeapByteBufferTest.java
In-Reply-To: <55266C2E.3050207@oracle.com>
References: <55266C2E.3050207@oracle.com>
Message-ID: <55268DCC.70909@oracle.com>

Looks good.

Best regards,
Vladimir Ivanov

On 4/9/15 3:10 PM, Tobias Hartmann wrote:
> Hi,
>
> please review the following patch.
>
> https://bugs.openjdk.java.net/browse/JDK-8076625
> http://cr.openjdk.java.net/~thartmann/8076625/webrev.00/
>
> Problem:
> A random offset to access in a byte array is computed by
>
>    int randomOffset(SplittableRandom r, MyByteBuffer buf, int size) {
>      return abs(r.nextInt()) % (buf.capacity() - size);
>    }
>
> The call to r.nextInt() may return Integer.MIN_VALUE (-2147483648) and the corresponding absolute value (+2147483648) does not fit into an int and will overflow back to -2147483648. As a result the returned offset is negative.
>
> Solution:
> Use nextInt(int n) to set the limit of the random value.
>
> Testing:
> Failing testcase and JPRT.
>
> Thanks,
> Tobias
>

From tobias.hartmann at oracle.com  Thu Apr  9 14:35:05 2015
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Thu, 09 Apr 2015 16:35:05 +0200
Subject: [9] RFR(XS): 8076625: IndexOutOfBoundsException in
	HeapByteBufferTest.java
In-Reply-To: <55268DCC.70909@oracle.com>
References: <55266C2E.3050207@oracle.com> <55268DCC.70909@oracle.com>
Message-ID: <55268E19.8030404@oracle.com>

Thanks, Vladimir.

Best,
Tobias

On 09.04.2015 16:33, Vladimir Ivanov wrote:
> Looks good.
> 
> Best regards,
> Vladimir Ivanov
> 
> On 4/9/15 3:10 PM, Tobias Hartmann wrote:
>> Hi,
>>
>> please review the following patch.
>>
>> https://bugs.openjdk.java.net/browse/JDK-8076625
>> http://cr.openjdk.java.net/~thartmann/8076625/webrev.00/
>>
>> Problem:
>> A random offset to access in a byte array is computed by
>>
>>    int randomOffset(SplittableRandom r, MyByteBuffer buf, int size) {
>>      return abs(r.nextInt()) % (buf.capacity() - size);
>>    }
>>
>> The call to r.nextInt() may return Integer.MIN_VALUE (-2147483648) and the corresponding absolute value (+2147483648) does not fit into an int and will overflow back to -2147483648. As a result the returned offset is negative.
>>
>> Solution:
>> Use nextInt(int n) to set the limit of the random value.
>>
>> Testing:
>> Failing testcase and JPRT.
>>
>> Thanks,
>> Tobias
>>

From vladimir.kozlov at oracle.com  Thu Apr  9 16:22:43 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 09 Apr 2015 09:22:43 -0700
Subject: [9] RFR(XS): 8076625: IndexOutOfBoundsException in
	HeapByteBufferTest.java
In-Reply-To: <55266C2E.3050207@oracle.com>
References: <55266C2E.3050207@oracle.com>
Message-ID: <5526A753.8040606@oracle.com>

Tobias,

I also asked to use Utils::getRandomInstance() to get reproducible results.

Thanks,
Vladimir


On 4/9/15 5:10 AM, Tobias Hartmann wrote:
> Hi,
>
> please review the following patch.
>
> https://bugs.openjdk.java.net/browse/JDK-8076625
> http://cr.openjdk.java.net/~thartmann/8076625/webrev.00/
>
> Problem:
> A random offset to access in a byte array is computed by
>
>    int randomOffset(SplittableRandom r, MyByteBuffer buf, int size) {
>      return abs(r.nextInt()) % (buf.capacity() - size);
>    }
>
> The call to r.nextInt() may return Integer.MIN_VALUE (-2147483648) and the corresponding absolute value (+2147483648) does not fit into an int and will overflow back to -2147483648. As a result the returned offset is negative.
>
> Solution:
> Use nextInt(int n) to set the limit of the random value.
>
> Testing:
> Failing testcase and JPRT.
>
> Thanks,
> Tobias
>

From michael.c.berg at intel.com  Thu Apr  9 23:02:57 2015
From: michael.c.berg at intel.com (Berg, Michael C)
Date: Thu, 9 Apr 2015 23:02:57 +0000
Subject: RFR 8076276 support for AVX512
In-Reply-To: <55259078.1080309@oracle.com>
References: <C568518E7B433348B114B6A7122D474755DCE552@FMSMSX102.amr.corp.intel.com>
	<55258337.2050605@oracle.com>
	<C568518E7B433348B114B6A7122D474755DCEBDA@FMSMSX102.amr.corp.intel.com>
	<55259078.1080309@oracle.com>
Message-ID: <C568518E7B433348B114B6A7122D474755DCED7C@FMSMSX102.amr.corp.intel.com>

Vladimir, some explanation of the EVEX encoding model is needed:

Some instructions are agnostic to vector length and can take the implicit k0 definition in their encoding.  Other instructions must have a predication definition for applying their mask to SIMD lanes, and those explicitly exclude k0.  The usable range of predication mask registers is therefore k1..k7, each of which is a real definition that code must supply with a mask value.  The EVEX-enabled machine environment does not automatically initialize any of the assignable mask registers (k1..k7), so we must emit kmov instructions which pick up an immediate value from a GPR.  You will see code such as this in the review.  This effectively means KRegister must stay in the
implementation, but I can accommodate the lion's share of what you have indicated.  The places where KRegister is used via the assembler layer are:

src/cpu/x86/vm/stubGenerator_x86_64.cpp: 265, 
src/cpu/x86/vm/stubGenerator_x86_32.cpp: 169 "not there yet, but it needs one too"
src/cpu/x86/vm/macroAssembler_x86.cpp: 4550, 7046

This is in place of formal register allocation for now as well as when we do more extravagant things with SIMD masks.  I will keep the webrev around so I can easily add these pieces back in as we are going to need them.
Also there are many other mask register instructions in the ISA which we will need to make use of in the future.  If this is agreeable, I will look into the other changes and resend the webrev modified accordingly.
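
As a rough illustration of what a predication mask does to a vector operation, here is a conceptual model in plain Java. It is not HotSpot code and does not emit any instructions: the class MaskSketch and the method maskedAdd are hypothetical names, and the int mask merely stands in for the value a kmov would place in one of k1..k7.

```java
public class MaskSketch {
    // Merge-masking model: lane i is written only when bit i of mask is set;
    // all other lanes keep whatever dst already held.
    // Assumes at most 32 lanes so that (1 << i) stays within an int.
    static void maskedAdd(int[] dst, int[] a, int[] b, int mask) {
        for (int i = 0; i < dst.length; i++) {
            if ((mask & (1 << i)) != 0) {
                dst[i] = a[i] + b[i];
            }
        }
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4};
        int[] b = {10, 20, 30, 40};
        int[] dst = new int[4];
        maskedAdd(dst, a, b, 0b0101);   // only lanes 0 and 2 are updated
        System.out.println(java.util.Arrays.toString(dst));
    }
}
```

In this model an all-ones mask updates every lane, which is the behaviour one gets from the implicit-k0 encodings; any other lane selection needs a mask register that has been explicitly loaded first.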

Thanks,
Michael


-----Original Message-----
From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] 
Sent: Wednesday, April 08, 2015 1:33 PM
To: Berg, Michael C
Cc: hotspot-compiler-dev at openjdk.java.net
Subject: Re: RFR 8076276 support for AVX512

Michael, please, make sure to include mailing lists in replies - it is review process.

I understand that K register may be important but I don't see the need to include it in these changes which are huge already. We can do it as separate changes unless you point me where they are critical needed for avx512 instructions.
I don't see the use of it in current changes which simple widen vectors to 512 bits.

I am concern that K reg implementation is incomplete but it is hard to see and review it in current changes.

Regards,
Vladimir

On 4/8/15 1:09 PM, Berg, Michael C wrote:
> Vladimir, RegK is needed as it frames the kmov instructions which 
> utilize KRegister and the enumerated k registers, which are critically 
> needed and used, although not yet matched (we use k1 and k0 now).  I will look into the rest of the comments.  The plan is to register allocate the k registers at some point though.
>
> Thanks,
> Michael
>
> -----Original Message-----
> From: hotspot-compiler-dev 
> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of 
> Vladimir Kozlov
> Sent: Wednesday, April 08, 2015 12:36 PM
> To: hotspot-compiler-dev at openjdk.java.net
> Subject: Re: RFR 8076276 support for AVX512
>
> I would suggest to remove MoveK and RegK from these changes since they are not used.
> We can add them later when you have the use case.
>
> sharedRuntime_x86_64.* You should have code and not comment:
> // TODO: add ZMM save code
>
> vm_version_x86.cpp Add code to verify that system preserve Z registers during interrupt. See code after comment :
>
> // Some OSs have a bug when upper 128bits of YMM
>
>
> I see repeated next pattern in C1 code. It should be moved to a function in FrameMap:
>
> +        int num_caller_save_xmm_regs = FrameMap::nof_caller_save_xmm_regs;
> +#if _LP64
> +        if (UseAVX < 3) {
> +          num_caller_save_xmm_regs = num_caller_save_xmm_regs / 2;
> +        }
> +#endif
>
>
> In general we should avoid using #ifdef X86 in shared code: matcher.cpp. This file will not be issue if you remove RegK from changes.
>
> c2compiler.cpp - can you move that code to Compile::pd_compiler2_init() which is platform specific?
>
> matcher.cpp - typo 'eno':
>
> +    // For VecZ we need eno alignment and 64 bytes (16 slots) for spills.
>
>
> Thanks,
> Vladimir
>
>
> On 4/6/15 6:35 PM, Berg, Michael C wrote:
>> Hi Folks,
>>
>> We (Intel) would like to contribute initial support for AVX512 (EVEX 
>> encoding, new register support, new ISA support,
>> etc) for EVEX enabled microarchitectures.
>> The contribution is referenced as Bug ID 8076276 as a performance enhancement.
>>
>> Please review this patch and comment as needed:
>>
>> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8076276
>>
>> webrev:
>> http://cr.openjdk.java.net/~kvn/8076276/webrev
>>
>> Superword optimizations covered on the vectorization path experience 
>> as much as 50% reduction in loop trace instruction count which make up the path length of EVEX encoded SIMD optimized loops.
>>
>> Vladimir Kozlov has offered to sponsor this patch.
>>

From vladimir.kozlov at oracle.com  Thu Apr  9 23:53:36 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 09 Apr 2015 16:53:36 -0700
Subject: RFR 8076276 support for AVX512
In-Reply-To: <C568518E7B433348B114B6A7122D474755DCED7C@FMSMSX102.amr.corp.intel.com>
References: <C568518E7B433348B114B6A7122D474755DCE552@FMSMSX102.amr.corp.intel.com>
	<55258337.2050605@oracle.com>
	<C568518E7B433348B114B6A7122D474755DCEBDA@FMSMSX102.amr.corp.intel.com>
	<55259078.1080309@oracle.com>
	<C568518E7B433348B114B6A7122D474755DCED7C@FMSMSX102.amr.corp.intel.com>
Message-ID: <55271100.8080203@oracle.com>

Michael,

Thank you for the detailed explanation. I need to clarify my request:

1. I am fine with kmov and KRegister definitions and usage in the assembler,
macroassembler and stubs.

2. I don't want KRegister and Kmove in C2 code (opto/ and .ad files)
until we have full support for them in RA and signal processing.

Thanks,
Vladimir

On 4/9/15 4:02 PM, Berg, Michael C wrote:
> Vladimir, some explanation of the EVEX encoding model is needed:
>
> Some instructions are agnostic to vector length and can take the implicit k0 definition in encoding.  Some instructions must have predication definitions for their mask application to SIMD, which explicitly exclude k0. The range usage of predication mask registers must be k1..k7 as a real definition which code must provide with a mask value.  The EVEX enabled machine environment does not automatically initialize any of the mask assignable registers (k1..k7), so we must emit kmov instructions which gather an immediate value from a gpr register.  You will see code such as this in the review.  This effectively means KRegister must stay in the
> implementation, but I can accommodate the lion share of what you have indicated.  The places where KRegister is used via the assembler layer are:
>
> src/cpu/x86/vm/stubGenerator_x86_64.cpp: 265,
> src/cpu/x86/vm/stubGenerator_x86_32.cpp: 169 "not there yet, but it needs one too"
> src/cpu/x86/vm/macroAssembler_x86.cpp: 4550, 7046
>
> This is in place of formal register allocation for now as well as when we do more extravagant things with SIMD masks.  I will keep the webrev around so I can easily add these pieces back in as we are going to need them.
> Also there are many other mask register instructions in the ISA which we will need to make use of in the future.  If this is amenable I will look into the other changes and resend the webrev accordingly modified.
>
> Thanks,
> Michael
>
>
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Wednesday, April 08, 2015 1:33 PM
> To: Berg, Michael C
> Cc: hotspot-compiler-dev at openjdk.java.net
> Subject: Re: RFR 8076276 support for AVX512
>
> Michael, please, make sure to include mailing lists in replies - it is review process.
>
> I understand that K register may be important but I don't see the need to include it in these changes which are huge already. We can do it as separate changes unless you point me where they are critical needed for avx512 instructions.
> I don't see the use of it in current changes which simple widen vectors to 512 bits.
>
> I am concern that K reg implementation is incomplete but it is hard to see and review it in current changes.
>
> Regards,
> Vladimir
>
> On 4/8/15 1:09 PM, Berg, Michael C wrote:
>> Vladimir, RegK is needed as it frames the kmov instructions which
>> utilize KRegister and the enumerated k registers, which are critically
>> needed and used, although not yet matched (we use k1 and k0 now).  I will look into the rest of the comments.  The plan is to register allocate the k registers at some point though.
>>
>> Thanks,
>> Michael
>>
>> -----Original Message-----
>> From: hotspot-compiler-dev
>> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of
>> Vladimir Kozlov
>> Sent: Wednesday, April 08, 2015 12:36 PM
>> To: hotspot-compiler-dev at openjdk.java.net
>> Subject: Re: RFR 8076276 support for AVX512
>>
>> I would suggest to remove MoveK and RegK from these changes since they are not used.
>> We can add them later when you have the use case.
>>
>> sharedRuntime_x86_64.* You should have code and not comment:
>> // TODO: add ZMM save code
>>
>> vm_version_x86.cpp Add code to verify that system preserve Z registers during interrupt. See code after comment :
>>
>> // Some OSs have a bug when upper 128bits of YMM
>>
>>
>> I see repeated next pattern in C1 code. It should be moved to a function in FrameMap:
>>
>> +        int num_caller_save_xmm_regs = FrameMap::nof_caller_save_xmm_regs;
>> +#if _LP64
>> +        if (UseAVX < 3) {
>> +          num_caller_save_xmm_regs = num_caller_save_xmm_regs / 2;
>> +        }
>> +#endif
>>
>>
>> In general we should avoid using #ifdef X86 in shared code: matcher.cpp. This file will not be issue if you remove RegK from changes.
>>
>> c2compiler.cpp - can you move that code to Compile::pd_compiler2_init() which is platform specific?
>>
>> matcher.cpp - typo 'eno':
>>
>> +    // For VecZ we need eno alignment and 64 bytes (16 slots) for spills.
>>
>>
>> Thanks,
>> Vladimir
>>
>>
>> On 4/6/15 6:35 PM, Berg, Michael C wrote:
>>> Hi Folks,
>>>
>>> We (Intel) would like to contribute initial support for AVX512 (EVEX
>>> encoding, new register support, new ISA support,
>>> etc) for EVEX enabled microarchitectures.
>>> The contribution is referenced as Bug ID 8076276 as a performance enhancement.
>>>
>>> Please review this patch and comment as needed:
>>>
>>> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8076276
>>>
>>> webrev:
>>> http://cr.openjdk.java.net/~kvn/8076276/webrev
>>>
>>> Superword optimizations covered on the vectorization path experience
>>> as much as 50% reduction in loop trace instruction count which make up the path length of EVEX encoded SIMD optimized loops.
>>>
>>> Vladimir Kozlov has offered to sponsor this patch.
>>>

From tobias.hartmann at oracle.com  Fri Apr 10 08:52:18 2015
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Fri, 10 Apr 2015 10:52:18 +0200
Subject: [8u60] RFR of backport for 8066875: VirtualSpace does not use
	large pages
In-Reply-To: <1427374516.3149.37.camel@oracle.com>
References: <1427374516.3149.37.camel@oracle.com>
Message-ID: <55278F42.9030703@oracle.com>

Hi Thomas,

the code cache related changes look good to me (not a reviewer).

Best,
Tobias

On 26.03.2015 13:55, Thomas Schatzl wrote:
> Hi all,
> 
>   can I have reviews for the backport of "8066875: VirtualSpace does not
> use large pages" for 8u60? I also would like to have one review from the
> compiler team (cc'ed) since the change touches some compiler files.
> 
> It did only apply with minor changes, so I need re-reviews. The problem
> is that in jdk9 the code cache sizing has been changed. In particular:
> - dropped the hunk in code/codeCache.cpp because the code to determine
> memory sizes in 8u60 is much simpler i.e. .
> E.g. this change:
> --- a/src/share/vm/code/codeCache.cpp	Thu Jan 15 16:05:20 2015 +0100
> +++ b/src/share/vm/code/codeCache.cpp	Fri Jan 16 10:29:12 2015 +0100
> @@ -233,8 +233,8 @@
>  ReservedCodeSpace CodeCache::reserve_heap_memory(size_t size) {
>    // Determine alignment
>    const size_t page_size = os::can_execute_large_page_memory() ?
> -          MIN2(os::page_size_for_region(InitialCodeCacheSize, 8),
> -               os::page_size_for_region(size, 8)) :
> +          MIN2(os::page_size_for_region_aligned(InitialCodeCacheSize, 8),
> +               os::page_size_for_region_aligned(size, 8)) :
>            os::vm_page_size();
>    const size_t granularity = os::vm_allocation_granularity();
>    const size_t r_align = MAX2(page_size, granularity);
> 
> - fixed the code in heap.cpp because of the same change (JDK-8015774:
> Add support for multiple code heaps) is not in 8u60.
> 
> Note that this change is based on "8049864: TestParallelHeapSizeFlags
> fails with unexpected heap size"
> which is also out for review (on hotspot-gc-dev), and "8053995: Add
> method to WhiteBox to get vm_pagesize" which applies cleanly.
> 
> Full 8u60 changeset:
> http://cr.openjdk.java.net/~tschatzl/8066875-8u60/webrev.8u60/
> Fix changeset:
> http://cr.openjdk.java.net/~tschatzl/8066875-8u60/webrev.8u60-fix/
> 
> CR:
> https://bugs.openjdk.java.net/browse/JDK-8066875
> Original change:
> http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/4321214d5dbc
> 
> Testing: jprt
> 
> With that changeset in place, JDK-8058354 can be merged relatively
> easily, which is the goal of most of the recent backports.
> 
> Thanks,
>   Thomas
> 
> 
> 
> 
> 
> 

From tobias.hartmann at oracle.com  Fri Apr 10 10:42:45 2015
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Fri, 10 Apr 2015 12:42:45 +0200
Subject: [9] RFR(XS): 8076625: IndexOutOfBoundsException in
	HeapByteBufferTest.java
In-Reply-To: <5526A753.8040606@oracle.com>
References: <55266C2E.3050207@oracle.com> <5526A753.8040606@oracle.com>
Message-ID: <5527A925.5060103@oracle.com>

Hi Vladimir,

On 09.04.2015 18:22, Vladimir Kozlov wrote:
> I also asked to use Utils::getRandomInstance() to get reproducible results.

Sorry, I missed that. Here is the new webrev:

http://cr.openjdk.java.net/~thartmann/8076625/webrev.01/

I also noticed that the sizes of short and char reads passed to 'randomOffset' are too large (4 instead of 2). Fixed it.

Best,
Tobias


> Thanks,
> Vladimir
> 
> 
> On 4/9/15 5:10 AM, Tobias Hartmann wrote:
>> Hi,
>>
>> please review the following patch.
>>
>> https://bugs.openjdk.java.net/browse/JDK-8076625
>> http://cr.openjdk.java.net/~thartmann/8076625/webrev.00/
>>
>> Problem:
>> A random offset to access in a byte array is computed by
>>
>>    int randomOffset(SplittableRandom r, MyByteBuffer buf, int size) {
>>      return abs(r.nextInt()) % (buf.capacity() - size);
>>    }
>>
>> The call to r.nextInt() may return Integer.MIN_VALUE (-2147483648) and the corresponding absolute value (+2147483648) does not fit into an int and will overflow back to -2147483648. As a result the returned offset is negative.
>>
>> Solution:
>> Use nextInt(int n) to set the limit of the random value.
>>
>> Testing:
>> Failing testcase and JPRT.
>>
>> Thanks,
>> Tobias
>>

From bengt.rutisson at oracle.com  Fri Apr 10 10:57:37 2015
From: bengt.rutisson at oracle.com (Bengt Rutisson)
Date: Fri, 10 Apr 2015 12:57:37 +0200
Subject: [8u60] RFR of backport for 8066875: VirtualSpace does not use
	large pages
In-Reply-To: <55278F42.9030703@oracle.com>
References: <1427374516.3149.37.camel@oracle.com> <55278F42.9030703@oracle.com>
Message-ID: <5527ACA1.5000907@oracle.com>


Hi Thomas,

On 2015-04-10 10:52, Tobias Hartmann wrote:
> Hi Thomas,
>
> the code cache related changes look good to me (not a reviewer).

The change looks good to me too.

Bengt

>
> Best,
> Tobias
>
> On 26.03.2015 13:55, Thomas Schatzl wrote:
>> Hi all,
>>
>>    can I have reviews for the backport of "8066875: VirtualSpace does not
>> use large pages" for 8u60? I also would like to have one review from the
>> compiler team (cc'ed) since the change touches some compiler files.
>>
>> It did only apply with minor changes, so I need re-reviews. The problem
>> is that in jdk9 the code cache sizing has been changed. In particular:
>> - dropped the hunk in code/codeCache.cpp because the code to determine
>> memory sizes in 8u60 is much simpler i.e. .
>> E.g. this change:
>> --- a/src/share/vm/code/codeCache.cpp	Thu Jan 15 16:05:20 2015 +0100
>> +++ b/src/share/vm/code/codeCache.cpp	Fri Jan 16 10:29:12 2015 +0100
>> @@ -233,8 +233,8 @@
>>   ReservedCodeSpace CodeCache::reserve_heap_memory(size_t size) {
>>     // Determine alignment
>>     const size_t page_size = os::can_execute_large_page_memory() ?
>> -          MIN2(os::page_size_for_region(InitialCodeCacheSize, 8),
>> -               os::page_size_for_region(size, 8)) :
>> +          MIN2(os::page_size_for_region_aligned(InitialCodeCacheSize, 8),
>> +               os::page_size_for_region_aligned(size, 8)) :
>>             os::vm_page_size();
>>     const size_t granularity = os::vm_allocation_granularity();
>>     const size_t r_align = MAX2(page_size, granularity);
>>
>> - fixed the code in heap.cpp because of the same change (JDK-8015774:
>> Add support for multiple code heaps) is not in 8u60.
>>
>> Note that this change is based on "8049864: TestParallelHeapSizeFlags
>> fails with unexpected heap size"
>> which is also out for review (on hotspot-gc-dev), and "8053995: Add
>> method to WhiteBox to get vm_pagesize" which applies cleanly.
>>
>> Full 8u60 changeset:
>> http://cr.openjdk.java.net/~tschatzl/8066875-8u60/webrev.8u60/
>> Fix changeset:
>> http://cr.openjdk.java.net/~tschatzl/8066875-8u60/webrev.8u60-fix/
>>
>> CR:
>> https://bugs.openjdk.java.net/browse/JDK-8066875
>> Original change:
>> http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/4321214d5dbc
>>
>> Testing: jprt
>>
>> With that changeset in place, JDK-8058354 can be merged relatively
>> easily, which is the goal of most of the recent backports.
>>
>> Thanks,
>>    Thomas
>>
>>
>>
>>
>>
>>


From kirill.zhaldybin at oracle.com  Fri Apr 10 13:15:12 2015
From: kirill.zhaldybin at oracle.com (Kirill Zhaldybin)
Date: Fri, 10 Apr 2015 16:15:12 +0300
Subject: RFR(XS): JDK-8071546:
	hotspot/test/compiler/codecache/jmx/PoolsIndependenceTest.java
	has been fixed, but still is in the exclude list
Message-ID: <5527CCE0.2060301@oracle.com>

Dear all,

Could you please review this really small fix?

CR:
https://bugs.openjdk.java.net/browse/JDK-8071546

Webrev:
http://cr.openjdk.java.net/~ppunegov/kzhaldybin/8071546/webrev/
1 line changed: 0 ins; 1 del; 0 mod

Thank you.

Regards, Kirill

From igor.ignatyev at oracle.com  Fri Apr 10 13:22:34 2015
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Fri, 10 Apr 2015 16:22:34 +0300
Subject: RFR(XS): JDK-8071546:
	hotspot/test/compiler/codecache/jmx/PoolsIndependenceTest.java
	has been fixed, but still is in the exclude list
In-Reply-To: <5527CCE0.2060301@oracle.com>
References: <5527CCE0.2060301@oracle.com>
Message-ID: <5527CE9A.6040502@oracle.com>

Hi Kirill,

looks good to me.

Igor

On 04/10/2015 04:15 PM, Kirill Zhaldybin wrote:
> Dear all,
>
> Could you please review this really small fix?
>
> CR:
> https://bugs.openjdk.java.net/browse/JDK-8071546
>
> Webrev:
> http://cr.openjdk.java.net/~ppunegov/kzhaldybin/8071546/webrev/
> 1 line changed: 0 ins; 1 del; 0 mod
>
> Thank you.
>
> Regards, Kirill

From evgeniya.stepanova at oracle.com  Fri Apr 10 14:01:37 2015
From: evgeniya.stepanova at oracle.com (Evgeniya Stepanova)
Date: Fri, 10 Apr 2015 17:01:37 +0300
Subject: [8u60] RFR(s): 8038098: [TESTBUG] remove explicit set build flavor
	from hotspot/test/compiler/* tests
Message-ID: <5527D7C1.9050704@oracle.com>

Hi,

Could you please review the backport of 8038098 to the 8udev repo?
The diff applies cleanly to all tests except
test/compiler/IntegerArithmetic/TestIntegerComparison.java, which
does not exist in the 8u60 repo.
After the fix, the tests pass with 8u60 b09 with the client VM.

webrev for 8u60: 
http://cr.openjdk.java.net/~eistepan/8038098/8u60/webrev.00/
bug: https://bugs.openjdk.java.net/browse/JDK-8038098

Original webrev: 
http://cr.openjdk.java.net/~iignatyev/eistepan/8038098/webrev.02/
mail thread for 9: 
http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2014-September/015540.html
Original change:
http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/662499384b32
http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/662499384b32

Thanks,
Jane
-- 
/Evgeniya Stepanova/

From thomas.schatzl at oracle.com  Fri Apr 10 14:30:19 2015
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Fri, 10 Apr 2015 16:30:19 +0200
Subject: [8u60] RFR of backport for 8066875: VirtualSpace does not use
	large pages
In-Reply-To: <5527ACA1.5000907@oracle.com>
References: <1427374516.3149.37.camel@oracle.com>
	<55278F42.9030703@oracle.com> <5527ACA1.5000907@oracle.com>
Message-ID: <1428676219.3364.14.camel@oracle.com>

Hi Tobias and Bengt,

On Fri, 2015-04-10 at 12:57 +0200, Bengt Rutisson wrote:
> Hi Thomas,
> 
> On 2015-04-10 10:52, Tobias Hartmann wrote:
> > Hi Thomas,
> >
> > the code cache related changes look good to me (not a reviewer).
> 
> The change looks good to me too.

Thanks for the reviews.

Thomas


From daniel.daugherty at oracle.com  Fri Apr 10 15:12:37 2015
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Fri, 10 Apr 2015 09:12:37 -0600
Subject: A strange bit of code in MacroAssembler::multiply_128_x_128_loop
In-Reply-To: <5527E6E6.5000707@redhat.com>
References: <5527E6E6.5000707@redhat.com>
Message-ID: <5527E865.4060709@oracle.com>

Adding in the Compiler team since this is the MacroAssembler...

Dan


On 4/10/15 9:06 AM, Andrew Haley wrote:
> This is for x86:
>
>    addl (idx, 0x2);
>    andl (idx, 0x1);
>    subl(idx, 1);
>    jcc(Assembler::negative, L_post_third_loop_done);
>
> I'm trying to guess what the "addl (idx, 0x2)" instruction was supposed to do.
> I don't think it has any effect now.
>
> Andrew.


From vladimir.kozlov at oracle.com  Fri Apr 10 15:17:54 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 10 Apr 2015 08:17:54 -0700
Subject: [9] RFR(XS): 8076625: IndexOutOfBoundsException in
	HeapByteBufferTest.java
In-Reply-To: <5527A925.5060103@oracle.com>
References: <55266C2E.3050207@oracle.com> <5526A753.8040606@oracle.com>
	<5527A925.5060103@oracle.com>
Message-ID: <5527E9A2.7050709@oracle.com>

Looks good.

Thanks,
Vladimir

On 4/10/15 3:42 AM, Tobias Hartmann wrote:
> Hi Vladimir,
>
> On 09.04.2015 18:22, Vladimir Kozlov wrote:
>> I also asked to use Utils::getRandomInstance() to get reproducible results.
>
> Sorry, I missed that. Here is the new webrev:
>
> http://cr.openjdk.java.net/~thartmann/8076625/webrev.01/
>
> I also noticed that the sizes of short and char reads passed to 'randomOffset' are too large (4 instead of 2). Fixed it.
>
> Best,
> Tobias
>
>
>> Thanks,
>> Vladimir
>>
>>
>> On 4/9/15 5:10 AM, Tobias Hartmann wrote:
>>> Hi,
>>>
>>> please review the following patch.
>>>
>>> https://bugs.openjdk.java.net/browse/JDK-8076625
>>> http://cr.openjdk.java.net/~thartmann/8076625/webrev.00/
>>>
>>> Problem:
>>> A random offset to access in a byte array is computed by
>>>
>>>     int randomOffset(SplittableRandom r, MyByteBuffer buf, int size) {
>>>       return abs(r.nextInt()) % (buf.capacity() - size);
>>>     }
>>>
>>> The call to r.nextInt() may return Integer.MIN_VALUE (-2147483648) and the corresponding absolute value (+2147483648) does not fit into an int and will overflow back to -2147483648. As a result the returned offset is negative.
>>>
>>> Solution:
>>> Use nextInt(int n) to set the limit of the random value.
>>>
>>> Testing:
>>> Failing testcase and JPRT.
>>>
>>> Thanks,
>>> Tobias
>>>
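
The overflow described above can be reproduced in isolation. A minimal sketch, assuming a plain int capacity in place of the test's MyByteBuffer; the class and method names are hypothetical and not taken from the webrev:

```java
import java.util.SplittableRandom;

final class RandomOffsetSketch {
    // Buggy form from the report: Math.abs(Integer.MIN_VALUE) is still
    // Integer.MIN_VALUE, so the remainder can be negative.
    static int buggyOffset(int raw, int capacity, int size) {
        return Math.abs(raw) % (capacity - size);
    }

    // Bounded form: nextInt(bound) returns a value in [0, bound),
    // so no abs() and no modulo are needed.
    static int boundedOffset(SplittableRandom r, int capacity, int size) {
        return r.nextInt(capacity - size);
    }

    public static void main(String[] args) {
        // Prints a negative number: abs() cannot represent +2147483648.
        System.out.println(Math.abs(Integer.MIN_VALUE));
        // Prints a negative offset for the worst-case random value.
        System.out.println(buggyOffset(Integer.MIN_VALUE, 1024, 4));
        // Prints an offset that is always within [0, 1020).
        System.out.println(boundedOffset(new SplittableRandom(42L), 1024, 4));
    }
}
```

The bounded variant also sidesteps the slight bias that a modulo over the full int range would introduce.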

From vladimir.kozlov at oracle.com  Fri Apr 10 15:28:11 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 10 Apr 2015 08:28:11 -0700
Subject: RFR(XS): JDK-8071546:
	hotspot/test/compiler/codecache/jmx/PoolsIndependenceTest.java
	has been fixed, but still is in the exclude list
In-Reply-To: <5527CCE0.2060301@oracle.com>
References: <5527CCE0.2060301@oracle.com>
Message-ID: <5527EC0B.5080102@oracle.com>

Good.

Vladimir

On 4/10/15 6:15 AM, Kirill Zhaldybin wrote:
> Dear all,
>
> Could you please review this really small fix?
>
> CR:
> https://bugs.openjdk.java.net/browse/JDK-8071546
>
> Webrev:
> http://cr.openjdk.java.net/~ppunegov/kzhaldybin/8071546/webrev/
> 1 line changed: 0 ins; 1 del; 0 mod
>
> Thank you.
>
> Regards, Kirill

From vladimir.kozlov at oracle.com  Fri Apr 10 16:30:45 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 10 Apr 2015 09:30:45 -0700
Subject: A strange bit of code in MacroAssembler::multiply_128_x_128_loop
In-Reply-To: <5527E865.4060709@oracle.com>
References: <5527E6E6.5000707@redhat.com> <5527E865.4060709@oracle.com>
Message-ID: <5527FAB5.6090802@oracle.com>

It restores counter after:

   subl(idx, 2);
   jcc(Assembler::negative, L_check_1);

The value could be 1,2,3 before this point (idx & 3 before).
After 'sub 2': -1, 0, 1.

So we have to restore to positive values before subtracting 1.

Vladimir

On 4/10/15 8:12 AM, Daniel D. Daugherty wrote:
> Adding in the Compiler team since this is the MacroAssembler...
>
> Dan
>
>
> On 4/10/15 9:06 AM, Andrew Haley wrote:
>> This is for x86:
>>
>>    addl (idx, 0x2);
>>    andl (idx, 0x1);
>>    subl(idx, 1);
>>    jcc(Assembler::negative, L_post_third_loop_done);
>>
>> I'm trying to guess what the "addl (idx, 0x2)" instruction was supposed to do.
>> I don't think it has any effect now.
>>
>> Andrew.
>

From aph at redhat.com  Fri Apr 10 16:47:02 2015
From: aph at redhat.com (Andrew Haley)
Date: Fri, 10 Apr 2015 17:47:02 +0100
Subject: A strange bit of code in MacroAssembler::multiply_128_x_128_loop
In-Reply-To: <5527FAB5.6090802@oracle.com>
References: <5527E6E6.5000707@redhat.com> <5527E865.4060709@oracle.com>
	<5527FAB5.6090802@oracle.com>
Message-ID: <5527FE86.3060302@redhat.com>

On 04/10/2015 05:30 PM, Vladimir Kozlov wrote:
> It restores counter after:
> 
>    subl(idx, 2);
>    jcc(Assembler::negative, L_check_1);
> 
> The value could be 1,2,3 before this point (idx & 3 before).
> After 'sub 2': -1, 0, 1.
> 
> So we have to restore to positive values before subtracting 1.

But after

>    andl (idx, 0x1);

the only possible values are 0 and 1, and this is true regardless of the
'sub 2'.

>>>    addl (idx, 0x2);
>>>    andl (idx, 0x1);
>>>    subl(idx, 1);
>>>    jcc(Assembler::negative, L_post_third_loop_done);

Andrew.


From vladimir.kozlov at oracle.com  Fri Apr 10 16:47:13 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 10 Apr 2015 09:47:13 -0700
Subject: A strange bit of code in MacroAssembler::multiply_128_x_128_loop
In-Reply-To: <5527FAB5.6090802@oracle.com>
References: <5527E6E6.5000707@redhat.com> <5527E865.4060709@oracle.com>
	<5527FAB5.6090802@oracle.com>
Message-ID: <5527FE91.1040606@oracle.com>

On 4/10/15 9:30 AM, Vladimir Kozlov wrote:
> It restores counter after:
>
>    subl(idx, 2);
>    jcc(Assembler::negative, L_check_1);
>
> The value could be 1,2,3 before this point (idx & 3 before).
> After 'sub 2': -1, 0, 1.
>
> So we have to restore to positive values before subtracting 1.

And you are right, it is not needed. We get the same (correct) result from 'and idx,1' regardless of whether the 'add' is executed.
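
The equivalence is easy to check outside the assembler. A minimal sketch in Java (not the MacroAssembler code itself; the class and method names are made up for illustration) that compares bit 0 with and without the preceding add:

```java
final class AddBeforeAndSketch {
    // Mirrors "addl(idx, 2); andl(idx, 1);"
    static int withAdd(int idx) {
        return (idx + 2) & 1;
    }

    // Mirrors "andl(idx, 1);" on its own
    static int withoutAdd(int idx) {
        return idx & 1;
    }

    public static void main(String[] args) {
        // -1, 0, 1 are the values left in idx after 'sub 2'.
        for (int idx = -1; idx <= 1; idx++) {
            System.out.println(idx + " -> " + withAdd(idx) + " / " + withoutAdd(idx));
        }
    }
}
```

Adding 2 only changes bit 1 and above, so the masked result is the same for every int, including the negative case in two's complement.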

Vladimir

>
> Vladimir
>
> On 4/10/15 8:12 AM, Daniel D. Daugherty wrote:
>> Adding in the Compiler team since this is the MacroAssembler...
>>
>> Dan
>>
>>
>> On 4/10/15 9:06 AM, Andrew Haley wrote:
>>> This is for x86:
>>>
>>>    addl (idx, 0x2);
>>>    andl (idx, 0x1);
>>>    subl(idx, 1);
>>>    jcc(Assembler::negative, L_post_third_loop_done);
>>>
>>> I'm trying to guess what the "addl (idx, 0x2)" instruction was supposed to do.
>>> I don't think it has any effect now.
>>>
>>> Andrew.
>>

From aph at redhat.com  Fri Apr 10 16:49:41 2015
From: aph at redhat.com (Andrew Haley)
Date: Fri, 10 Apr 2015 17:49:41 +0100
Subject: A strange bit of code in MacroAssembler::multiply_128_x_128_loop
In-Reply-To: <5527FE91.1040606@oracle.com>
References: <5527E6E6.5000707@redhat.com> <5527E865.4060709@oracle.com>
	<5527FAB5.6090802@oracle.com> <5527FE91.1040606@oracle.com>
Message-ID: <5527FF25.5030508@redhat.com>

On 04/10/2015 05:47 PM, Vladimir Kozlov wrote:
> On 4/10/15 9:30 AM, Vladimir Kozlov wrote:
>> It restores counter after:
>>
>>    subl(idx, 2);
>>    jcc(Assembler::negative, L_check_1);
>>
>> The value could be 1,2,3 before this point (idx & 3 before).
>> After 'sub 2': -1, 0, 1.
>>
>> So we have to restore to positive values before subtracting 1.
> 
> And you are right, it is not needed. We get the same (correct) result from 'and idx,1' regardless of whether the 'add' is executed.

OK, cool.  It's not important then: I was just wondering if I'd found a
latent bug.

Andrew.


From tobias.hartmann at oracle.com  Mon Apr 13 04:51:06 2015
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Mon, 13 Apr 2015 06:51:06 +0200
Subject: [9] RFR(XS): 8076625: IndexOutOfBoundsException in
	HeapByteBufferTest.java
In-Reply-To: <5527E9A2.7050709@oracle.com>
References: <55266C2E.3050207@oracle.com> <5526A753.8040606@oracle.com>
	<5527A925.5060103@oracle.com> <5527E9A2.7050709@oracle.com>
Message-ID: <552B4B3A.4010609@oracle.com>

Thanks, Vladimir.

Best,
Tobias

On 10.04.2015 17:17, Vladimir Kozlov wrote:
> Looks good.
> 
> Thanks,
> Vladimir
> 
> On 4/10/15 3:42 AM, Tobias Hartmann wrote:
>> Hi Vladimir,
>>
>> On 09.04.2015 18:22, Vladimir Kozlov wrote:
>>> I also asked to use Utils::getRandomInstance() to get reproducible results.
>>
>> Sorry, I missed that. Here is the new webrev:
>>
>> http://cr.openjdk.java.net/~thartmann/8076625/webrev.01/
>>
>> I also noticed that the sizes of short and char reads passed to 'randomOffset' are too large (4 instead of 2). Fixed it.
>>
>> Best,
>> Tobias
>>
>>
>>> Thanks,
>>> Vladimir
>>>
>>>
>>> On 4/9/15 5:10 AM, Tobias Hartmann wrote:
>>>> Hi,
>>>>
>>>> please review the following patch.
>>>>
>>>> https://bugs.openjdk.java.net/browse/JDK-8076625
>>>> http://cr.openjdk.java.net/~thartmann/8076625/webrev.00/
>>>>
>>>> Problem:
>>>> A random offset to access in a byte array is computed by
>>>>
>>>>     int randomOffset(SplittableRandom r, MyByteBuffer buf, int size) {
>>>>       return abs(r.nextInt()) % (buf.capacity() - size);
>>>>     }
>>>>
>>>> The call to r.nextInt() may return Integer.MIN_VALUE (-2147483648) and the corresponding absolute value (+2147483648) does not fit into an int and will overflow back to -2147483648. As a result the returned offset is negative.
>>>>
>>>> Solution:
>>>> Use nextInt(int n) to set the limit of the random value.
>>>>
>>>> Testing:
>>>> Failing testcase and JPRT.
>>>>
>>>> Thanks,
>>>> Tobias
>>>>

From igor.ignatyev at oracle.com  Mon Apr 13 09:22:13 2015
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Mon, 13 Apr 2015 12:22:13 +0300
Subject: [8u60] RFR(s): 8038098: [TESTBUG] remove explicit set build flavor
	from hotspot/test/compiler/* tests
In-Reply-To: <5527D7C1.9050704@oracle.com>
References: <5527D7C1.9050704@oracle.com>
Message-ID: <552B8AC5.6040404@oracle.com>

Evgeniya,

looks good to me.

Igor

On 04/10/2015 05:01 PM, Evgeniya Stepanova wrote:
> Hi,
>
> Could you please review the backport of 8038098 to the 8udev repo?
> The diff applies cleanly to all tests except
> test/compiler/IntegerArithmetic/TestIntegerComparison.java, which
> does not exist in the 8u60 repo.
> With the fix, the tests pass on 8u60 b09 with the client VM.
>
> webrev for 8u60:
> http://cr.openjdk.java.net/~eistepan/8038098/8u60/webrev.00/
> bug: https://bugs.openjdk.java.net/browse/JDK-8038098
>
> Original webrev:
> http://cr.openjdk.java.net/~iignatyev/eistepan/8038098/webrev.02/
> mail thread for 9:
> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2014-September/015540.html
> Original change:
> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/662499384b32
> http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/662499384b32
>
> Thanks,
> Jane
> --
> /Evgeniya Stepanova/

From evgeniya.stepanova at oracle.com  Mon Apr 13 09:28:28 2015
From: evgeniya.stepanova at oracle.com (Evgeniya Stepanova)
Date: Mon, 13 Apr 2015 12:28:28 +0300
Subject: [8u60] RFR(s): 8038098: [TESTBUG] remove explicit set build flavor
	from hotspot/test/compiler/* tests
In-Reply-To: <552B8AC5.6040404@oracle.com>
References: <5527D7C1.9050704@oracle.com> <552B8AC5.6040404@oracle.com>
Message-ID: <552B8C3C.2080403@oracle.com>

Hi Igor,

Thank you for the review!

Jane
On 13.04.2015 12:22, Igor Ignatyev wrote:
> Evgeniya,
>
> looks good to me.
>
> Igor
>
> On 04/10/2015 05:01 PM, Evgeniya Stepanova wrote:
>> Hi,
>>
>> Could you please review the backport of 8038098 to the 8udev repo?
>> The diff applies cleanly to all tests except
>> test/compiler/IntegerArithmetic/TestIntegerComparison.java, which
>> does not exist in the 8u60 repo.
>> With the fix, the tests pass on 8u60 b09 with the client VM.
>>
>> webrev for 8u60:
>> http://cr.openjdk.java.net/~eistepan/8038098/8u60/webrev.00/
>> bug: https://bugs.openjdk.java.net/browse/JDK-8038098
>>
>> Original webrev:
>> http://cr.openjdk.java.net/~iignatyev/eistepan/8038098/webrev.02/
>> mail thread for 9:
>> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2014-September/015540.html 
>>
>> Original change:
>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/662499384b32
>> http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/662499384b32
>>
>> Thanks,
>> Jane
>> -- 
>> /Evgeniya Stepanova/

-- 
/Evgeniya Stepanova/

From michael.haupt at oracle.com  Mon Apr 13 11:40:08 2015
From: michael.haupt at oracle.com (Michael Haupt)
Date: Mon, 13 Apr 2015 13:40:08 +0200
Subject: RFR (S): 8076461: JSR292: remove unused native and constants
In-Reply-To: <C1C9C217-1448-416E-9862-47C52A4ED7DC@oracle.com>
References: <4EB3C4DA-C382-4795-A676-6147E863DFF1@oracle.com>
	<C1C9C217-1448-416E-9862-47C52A4ED7DC@oracle.com>
Message-ID: <3083F107-6D99-4C4F-948C-9326C0E843CE@oracle.com>

Hi John,

thank you very much for your review; keeping the Constants class around for VM/JDK constant value agreement certainly makes sense. I have undone most of the removal work and verified in a slowdebug build that MHN.verifyConstants() works. I've also added a comment on the Constants class to clarify its role a bit. Local tests and JPRT are still happy with this.

Updated webrev: http://cr.openjdk.java.net/~mhaupt/8076461/webrev.01/

Best,

Michael

> Am 07.04.2015 um 23:49 schrieb John Rose <john.r.rose at oracle.com>:
> 
> On Apr 7, 2015, at 12:11 PM, Michael Haupt <michael.haupt at oracle.com> wrote:
>> 
>> Dear all,
>> 
>> please review and sponsor this change. Cross-posted to hs-comp and core-lib as this is at the JVM/libraries boundary. This is a straightforward refactoring change that removes many constants and unused API from MHNatives, and places some constants used only in MemberName in that class.
> 
> The class MethodHandleNatives.Constants exists to enumerate and cross-check any constants which the JVM and JDK code need to agree about.  Removing a constant from MethodHandleNatives.Constants (moving it to MemberName) may cause failures when MHN.verifyConstants is run (via "java -esa" on a debug build of Java).  If there are no failures, I wonder what would happen if the JVM and JDK got out of sync in their notion of the value of a constant like MN_CALLER_SENSITIVE.  It's important that some part of our release testing detect if MN_CALLER_SENSITIVE (etc.) gets out of sync.
> 
> If there is some reason why this testing is no longer needed, I'd like to see the whole Constants class go away, since that's all it's really good for.  But I don't see that reason yet, and moving the constants elsewhere either will cause a test failure, or *should* cause a test failure.
> 
> I'm happy to see the "GC" guys go away.  They were artifacts of a quickly moving 292 implementation that spanned two repositories with unsynchronized change streams.
> 
> - John
> 
>> 
>> RFE: https://bugs.openjdk.java.net/browse/JDK-8076461
>> Changes: http://cr.openjdk.java.net/~mhaupt/8076461/webrev.00/
>> 
>> Tested with JPRT, HotSpot testset.
>> 
>> Thanks,
>> 
>> Michael


-- 

 <http://www.oracle.com/>
Dr. Michael Haupt | Principal Member of Technical Staff
Phone: +49 331 200 7277 | Fax: +49 331 200 7561
Oracle Java Platform Group | HotSpot Compiler Team 
Oracle Deutschland B.V. & Co. KG, Schiffbauergasse 14 | 14467 Potsdam, Germany
 <http://www.oracle.com/commitment>	Oracle is committed to developing practices and products that help protect the environment


From zoltan.majo at oracle.com  Mon Apr 13 11:51:43 2015
From: zoltan.majo at oracle.com (=?UTF-8?B?Wm9sdMOhbiBNYWrDsw==?=)
Date: Mon, 13 Apr 2015 13:51:43 +0200
Subject: [9] RFR(S): 8067648: JVM crashes reproducable with GCM cipher suites
	in GCTR doFinal
Message-ID: <552BADCF.80109@oracle.com>

Hi,


please review the following patch.

Bug: https://bugs.openjdk.java.net/browse/JDK-8067648


Problem: On architectures with hardware support for AES operations, the 
Java version (the version in the JDK sources) of the 
com.sun.crypto.provider.AESCrypt::encryptBlock(byte[], int, byte[], int) 
method is replaced with an intrinsic that uses the CPU's AES instructions.

The Java version of encryptBlock operates on arrays of size 
AES_BLOCK_SIZE=16 and it consequently performs a number of "implicit" 
checks (e.g., null checks and range checks) as required by the Java VM 
specification. The intrinsified version of encryptBlock, however, does 
not perform any of these checks.

Omitting checks results in a VM crash if invalid parameters (e.g., a 
null pointer, as reported in the current case) are passed to the method.
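
To illustrate which checks are meant here, the following is a minimal sketch of the conditions the Java version enforces implicitly, written out as explicit guards. The class and the checkBlock helper are hypothetical, not JDK code; BLOCK_SIZE stands in for AES_BLOCK_SIZE.

```java
final class ExplicitChecksSketch {
    static final int BLOCK_SIZE = 16; // stands in for AES_BLOCK_SIZE

    // Hypothetical guard: the null check and range check that plain Java
    // array accesses would perform for a 16-byte block starting at offset.
    static void checkBlock(byte[] buf, int offset) {
        if (buf == null) {
            throw new NullPointerException("buffer is null");
        }
        if (offset < 0 || offset > buf.length - BLOCK_SIZE) {
            throw new ArrayIndexOutOfBoundsException(
                "block at offset " + offset + " does not fit in " + buf.length + " bytes");
        }
    }

    public static void main(String[] args) {
        checkBlock(new byte[BLOCK_SIZE], 0); // passes silently
        try {
            checkBlock(null, 0);
        } catch (NullPointerException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        try {
            checkBlock(new byte[BLOCK_SIZE], 1);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Code that skips these guards and goes straight to raw memory access has nothing left to turn a bad argument into an exception, which is why the failure shows up as a crash.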


Solution: The failure reported in the current issue appears in the 
com.sun.crypto.provider.GCTR class that calls the intrinsified version 
of encryptBlock. None of the methods of the class are accessible from 
packages other than com.sun.crypto.provider. So, after a private 
discussion with John Rose, Vladimir Kozlov, and Roland Westrelin, I 
propose to solve this problem on the Java-level.

The GCTR::counter field is supposed to be initialized with an array of 
size AES_BLOCK_SIZE so that it is safe to call encryptBlock. The 
'counter' field is never supposed to become NULL during the lifetime of 
a GCTR object (so that encryptBlock can be always called safely).

The GCTR class supports saving and restoring the value of the 'counter' 
field (via the save() and restore() methods). For saving/restoring, the 
class uses the 'counterSave' field as temporary storage.

It is also possible to reset a GCTR object to its initial state by 
calling reset(). Reset sets both the 'counter' and 'counterSave' fields 
to their initial values.

If a call to the method reset() is followed by a call to restore(), the 
field 'counter' is not restored to its original value, but it becomes 
NULL. This is an invalid state, because a GCTR object should always 
contain a valid 'counter' array. This problem has also been described 
(in part) by Chris Ellis.

https://intrbiz.com/post/blog/development/java_8_aes_gcm_nullpointerexception

This patch proposes to restore the contents of 'counter' from 
'counterSave' only if some data has been saved into 'counterSave' before 
(i.e., counterSave is not NULL). The patch also adds a check to the 
constructor of GCTR to verify if the length of 'counter' is 
AES_BLOCK_SIZE. (I checked and JDK code uses this class only with arrays 
of size AES_BLOCK_SIZE, but it is good if the required size is 
documented and enforced by GCTR.)
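
The state problem and the proposed guard can be sketched as follows. This is a simplified stand-in written for illustration, not the GCTR source: the class name, the increment() helper, and BLOCK_SIZE (standing in for AES_BLOCK_SIZE) are assumptions.

```java
final class CounterStateSketch {
    static final int BLOCK_SIZE = 16; // stands in for AES_BLOCK_SIZE

    private final byte[] initial;
    private byte[] counter;
    private byte[] counterSave; // null until save() has been called

    CounterStateSketch(byte[] initialCounter) {
        // Enforce the documented size up front.
        if (initialCounter == null || initialCounter.length != BLOCK_SIZE) {
            throw new IllegalArgumentException("counter must be " + BLOCK_SIZE + " bytes");
        }
        this.initial = initialCounter.clone();
        this.counter = initialCounter.clone();
    }

    void save() {
        counterSave = counter.clone();
    }

    // Back to the initial state: nothing is saved any more.
    void reset() {
        counter = initial.clone();
        counterSave = null;
    }

    // Guarded restore: without the null check, reset() followed by
    // restore() would leave 'counter' null.
    void restore() {
        if (counterSave != null) {
            counter = counterSave.clone();
        }
    }

    // Hypothetical helper so that state changes are observable.
    void increment() {
        counter[BLOCK_SIZE - 1]++;
    }

    byte[] counter() {
        return counter.clone();
    }

    public static void main(String[] args) {
        CounterStateSketch s = new CounterStateSketch(new byte[BLOCK_SIZE]);
        s.reset();
        s.restore(); // no saved state: counter is left untouched
        System.out.println(s.counter().length);
    }
}
```

With the guard in place, restore() is a no-op when nothing has been saved, so 'counter' always refers to a valid array.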

The array to store the output of the encryptBlock method (the third 
parameter) should also be of length AES_BLOCK_SIZE. That is ensured by 
the GCTR class (both in the doFinal and update methods). The input and 
output offsets (the second and fourth parameters) are 0, as required by 
encryptBlock.


Webrev: http://cr.openjdk.java.net/~zmajo/8067648/webrev.00/


Testing:

- JPRT (both with 9 and 8u), all tests in the testsets hotspot pass;
- JTREG tests in jdk_security[1-4] executed locally with the sources 
built with --enable-openjdk-only; all tests that pass without the patch 
pass with the patch as well;
- failure reported in 8067648 can be reproduced with 8u, failure is not 
triggered with patch applied.


Thank you and best regards,


Zoltan


From roland.westrelin at oracle.com  Mon Apr 13 14:39:41 2015
From: roland.westrelin at oracle.com (Roland Westrelin)
Date: Mon, 13 Apr 2015 16:39:41 +0200
Subject: RFR(S): 8069191: moving predicate out of loops may cause array
	accesses to bypass null check
In-Reply-To: <F8959E8E-ECA1-42DD-BC8A-AA7CD750F5C2@oracle.com>
References: <100419DB-199E-489C-B3EA-F104BF0EB203@oracle.com>
	<55086F20.9020305@oracle.com>
	<2ACAAB95-8175-48DB-8BD9-F5BF168A6666@oracle.com>
	<550893F0.9050608@oracle.com>
	<F8959E8E-ECA1-42DD-BC8A-AA7CD750F5C2@oracle.com>
Message-ID: <6572CBFF-8B3F-45CD-A016-FF85D324AC04@oracle.com>

Vladimir,

Do you think this webrev looks ok?

Roland.


> On Mar 24, 2015, at 1:55 PM, Roland Westrelin <roland.westrelin at oracle.com> wrote:
> 
> See inlined.
> 
>>> Thanks for looking at this.
>>> 
>>>> This is what I have waited a long time for! Thank you for doing this, Roland.
>>>> 
>>>> How do you handle the case where a CastPP is an input of a Phi node?
>>> 
>>> The Phi has a control, so any memory node that depends on the Phi output is guaranteed to be "after" the CastPP, right?
>> 
>> Yes. So you simply remove the CastPP in that case. Okay.
>> 
>>> 
>>>> I am worried about how you distinguish precedence edges added for a CastPP from other precedence edges. Can you explain? Maybe there are no problems.
>>> 
>>> The
>>> 
>>> if (m->is_block_proj()) {
>>> 
>>> test guarantees that the precedence edge is a control node. And I assume it's always ok to remove the precedence edge and adjust the control when the precedence edge is a control node. Do you think that could break something?
>> 
>> Only if the control edge came from a CastPP. I know it is additional work, but can you run something (CTW? jvm98) and look at what types of precedence edges GCM can see? Unfortunately I don't remember what we have there.
>> There are a lot of places where we use add_prec(), mostly to add pointers to memory nodes.
>> If control nodes come only from CastPP then I am fine with your code.
> 
> I added debugging code (that I didn't keep in the webrev below) that recorded (memory operation, control from CastPP) pairs in a side table during final graph reshaping, updated the pairs during matching, and checked that all nodes that GCM sees with a control precedence got it from a CastPP. I ran CTW and other tests with that code and all tests passed. During that testing, I noticed that:
> 
> - CastPP nodes don't always have a control
> - some CastPP nodes depend on a Region because the test was moved to the branch of a dominating If
> - the tests for some CastPP nodes are removed during escape analysis
> 
> I updated the code to reflect those cases.
> 
> http://cr.openjdk.java.net/~roland/8069191/webrev.01/
> 
> Roland.
> 
>> 
>> Thanks,
>> Vladimir
>> 
>>> 
>>> Roland.
>>> 
>>>> 
>>>> Thanks,
>>>> Vladimir
>>>> 
>>>> On 3/17/15 3:54 AM, Roland Westrelin wrote:
>>>>> http://cr.openjdk.java.net/~roland/8069191/webrev.00/
>>>>> 
>>>>> In the test (that needs to be run with StressGCM to cause incorrect code generation), a dependency carried by a CastPP is lost when CastPPs are removed after CCP. Detailed description of the bug is in:
>>>>> 
>>>>> https://bugs.openjdk.java.net/browse/JDK-8069191
>>>>> 
>>>>> Vladimir suggested investigating the performance impact of keeping the CastPPs for the entire compilation. I found that this still causes performance regressions as documented in:
>>>>> 
>>>>> https://bugs.openjdk.java.net/browse/JDK-8039999
>>>>> 
>>>>> The fix I suggest is to keep CastPPs during optimizations and remove them during final graph reshaping. To not lose the dependencies they carry, precedence edges are added to memory operations that depend on them. During GCM, the control of the memory operations is adjusted to take into account the current control and the precedence edges.
>>>>> 
>>>>> Experiments show that this scheme doesn't cause performance regressions (I ran promotion testing on x64 and sparc).
>>>>> 
>>>>> Roland.


From vladimir.kozlov at oracle.com  Mon Apr 13 15:26:28 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 13 Apr 2015 08:26:28 -0700
Subject: RFR(S): 8069191: moving predicate out of loops may cause array
	accesses to bypass null check
In-Reply-To: <6572CBFF-8B3F-45CD-A016-FF85D324AC04@oracle.com>
References: <100419DB-199E-489C-B3EA-F104BF0EB203@oracle.com>	<55086F20.9020305@oracle.com>	<2ACAAB95-8175-48DB-8BD9-F5BF168A6666@oracle.com>	<550893F0.9050608@oracle.com>	<F8959E8E-ECA1-42DD-BC8A-AA7CD750F5C2@oracle.com>
	<6572CBFF-8B3F-45CD-A016-FF85D324AC04@oracle.com>
Message-ID: <552BE024.7080509@oracle.com>

Yes, changes look good.

Thanks,
Vladimir

On 4/13/15 7:39 AM, Roland Westrelin wrote:
> Vladimir,
>
> Do you think this webrev looks ok?
>
> Roland.
>
>
>> On Mar 24, 2015, at 1:55 PM, Roland Westrelin <roland.westrelin at oracle.com> wrote:
>>
>> See inlined.
>>
>>>> Thanks for looking at this.
>>>>
>>>>> This is what I have waited a long time for! Thank you for doing this, Roland.
>>>>>
>>>>> How do you handle the case where a CastPP is an input of a Phi node?
>>>>
>>>> The Phi has a control, so any memory node that depends on the Phi output is guaranteed to be "after" the CastPP, right?
>>>
>>> Yes. So you simply remove the CastPP in that case. Okay.
>>>
>>>>
>>>>> I am worried about how you distinguish precedence edges added for a CastPP from other precedence edges. Can you explain? Maybe there are no problems.
>>>>
>>>> The
>>>>
>>>> if (m->is_block_proj()) {
>>>>
>>>> test guarantees that the precedence edge is a control node. And I assume it's always ok to remove the precedence edge and adjust the control when the precedence edge is a control node. Do you think that could break something?
>>>
>>> Only if the control edge came from a CastPP. I know it is additional work, but can you run something (CTW? jvm98) and look at what types of precedence edges GCM can see? Unfortunately I don't remember what we have there.
>>> There are a lot of places where we use add_prec(), mostly to add pointers to memory nodes.
>>> If control nodes come only from CastPP then I am fine with your code.
>>
>> I added debugging code (that I didn't keep in the webrev below) that recorded (memory operation, control from CastPP) pairs in a side table during final graph reshaping, updated the pairs during matching, and checked that all nodes that GCM sees with a control precedence got it from a CastPP. I ran CTW and other tests with that code and all tests passed. During that testing, I noticed that:
>>
>> - CastPP nodes don't always have a control
>> - some CastPP nodes depend on a Region because the test was moved to the branch of a dominating If
>> - the tests for some CastPP nodes are removed during escape analysis
>>
>> I updated the code to reflect those cases.
>>
>> http://cr.openjdk.java.net/~roland/8069191/webrev.01/
>>
>> Roland.
>>
>>>
>>> Thanks,
>>> Vladimir
>>>
>>>>
>>>> Roland.
>>>>
>>>>>
>>>>> Thanks,
>>>>> Vladimir
>>>>>
>>>>> On 3/17/15 3:54 AM, Roland Westrelin wrote:
>>>>>> http://cr.openjdk.java.net/~roland/8069191/webrev.00/
>>>>>>
>>>>>> In the test (that needs to be run with StressGCM to cause incorrect code generation), a dependency carried by a CastPP is lost when CastPPs are removed after CCP. Detailed description of the bug is in:
>>>>>>
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8069191
>>>>>>
>>>>>> Vladimir suggested investigating the performance impact of keeping the CastPPs for the entire compilation. I found that this still causes performance regressions as documented in:
>>>>>>
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8039999
>>>>>>
>>>>>> The fix I suggest is to keep CastPPs during optimizations and remove them during final graph reshaping. To not lose the dependencies they carry, precedence edges are added to memory operations that depend on them. During GCM, the control of the memory operations is adjusted to take into account the current control and the precedence edges.
>>>>>>
>>>>>> Experiments show that this scheme doesn't cause performance regressions (I ran promotion testing on x64 and sparc).
>>>>>>
>>>>>> Roland.
>

From roland.westrelin at oracle.com  Mon Apr 13 15:27:57 2015
From: roland.westrelin at oracle.com (Roland Westrelin)
Date: Mon, 13 Apr 2015 17:27:57 +0200
Subject: RFR(S): 8069191: moving predicate out of loops may cause array
	accesses to bypass null check
In-Reply-To: <552BE024.7080509@oracle.com>
References: <100419DB-199E-489C-B3EA-F104BF0EB203@oracle.com>
	<55086F20.9020305@oracle.com>
	<2ACAAB95-8175-48DB-8BD9-F5BF168A6666@oracle.com>
	<550893F0.9050608@oracle.com>
	<F8959E8E-ECA1-42DD-BC8A-AA7CD750F5C2@oracle.com>
	<6572CBFF-8B3F-45CD-A016-FF85D324AC04@oracle.com>
	<552BE024.7080509@oracle.com>
Message-ID: <D3CAF18B-6616-430D-BEFB-C9504A4B181A@oracle.com>

> Yes, changes look good.

Thanks for the review. Do I need another review for this?

Roland.

> 
> Thanks,
> Vladimir
> 
> On 4/13/15 7:39 AM, Roland Westrelin wrote:
>> Vladimir,
>> 
>> Do you think this webrev looks ok?
>> 
>> Roland.
>> 
>> 
>>> On Mar 24, 2015, at 1:55 PM, Roland Westrelin <roland.westrelin at oracle.com> wrote:
>>> 
>>> See inlined.
>>> 
>>>>> Thanks for looking at this.
>>>>> 
>>>>>> This is what I have waited a long time for! Thank you for doing this, Roland.
>>>>>> 
>>>>>> How do you handle the case where a CastPP is an input of a Phi node?
>>>>> 
>>>>> The Phi has a control, so any memory node that depends on the Phi output is guaranteed to be "after" the CastPP, right?
>>>> 
>>>> Yes. So you simply remove the CastPP in that case. Okay.
>>>> 
>>>>> 
>>>>>> I am worried about how you distinguish precedence edges added for a CastPP from other precedence edges. Can you explain? Maybe there are no problems.
>>>>> 
>>>>> The
>>>>> 
>>>>> if (m->is_block_proj()) {
>>>>> 
>>>>> test guarantees that the precedence edge is a control node. And I assume it's always ok to remove the precedence edge and adjust the control when the precedence edge is a control node. Do you think that could break something?
>>>> 
>>>> Only if the control edge came from a CastPP. I know it is additional work, but can you run something (CTW? jvm98) and look at what types of precedence edges GCM can see? Unfortunately I don't remember what we have there.
>>>> There are a lot of places where we use add_prec(), mostly to add pointers to memory nodes.
>>>> If control nodes come only from CastPP then I am fine with your code.
>>> 
>>> I added debugging code (that I didn't keep in the webrev below) that recorded (memory operation, control from CastPP) pairs in a side table during final graph reshaping, updated the pairs during matching, and checked that all nodes that GCM sees with a control precedence got it from a CastPP. I ran CTW and other tests with that code and all tests passed. During that testing, I noticed that:
>>> 
>>> - CastPP nodes don't always have a control
>>> - some CastPP nodes depend on a Region because the test was moved to the branch of a dominating If
>>> - the tests for some CastPP nodes are removed during escape analysis
>>> 
>>> I updated the code to reflect those cases.
>>> 
>>> http://cr.openjdk.java.net/~roland/8069191/webrev.01/
>>> 
>>> Roland.
>>> 
>>>> 
>>>> Thanks,
>>>> Vladimir
>>>> 
>>>>> 
>>>>> Roland.
>>>>> 
>>>>>> 
>>>>>> Thanks,
>>>>>> Vladimir
>>>>>> 
>>>>>> On 3/17/15 3:54 AM, Roland Westrelin wrote:
>>>>>>> http://cr.openjdk.java.net/~roland/8069191/webrev.00/
>>>>>>> 
>>>>>>> In the test (that needs to be run with StressGCM to cause incorrect code generation), a dependency carried by a CastPP is lost when CastPPs are removed after CCP. Detailed description of the bug is in:
>>>>>>> 
>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8069191
>>>>>>> 
>>>>>>> Vladimir suggested investigating the performance impact of keeping the CastPPs for the entire compilation. I found that this still causes performance regressions as documented in:
>>>>>>> 
>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8039999
>>>>>>> 
>>>>>>> The fix I suggest is to keep CastPPs during optimizations and remove them during final graph reshaping. To not lose the dependencies they carry, precedence edges are added to memory operations that depend on them. During GCM, the control of the memory operations is adjusted to take into account the current control and the precedence edges.
>>>>>>> 
>>>>>>> Experiments show that this scheme doesn't cause performance regressions (I ran promotion testing on x64 and sparc).
>>>>>>> 
>>>>>>> Roland.
>> 


From vladimir.kozlov at oracle.com  Mon Apr 13 15:30:15 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 13 Apr 2015 08:30:15 -0700
Subject: RFR(S): 8069191: moving predicate out of loops may cause array
	accesses to bypass null check
In-Reply-To: <D3CAF18B-6616-430D-BEFB-C9504A4B181A@oracle.com>
References: <100419DB-199E-489C-B3EA-F104BF0EB203@oracle.com>
	<55086F20.9020305@oracle.com>
	<2ACAAB95-8175-48DB-8BD9-F5BF168A6666@oracle.com>
	<550893F0.9050608@oracle.com>
	<F8959E8E-ECA1-42DD-BC8A-AA7CD750F5C2@oracle.com>
	<6572CBFF-8B3F-45CD-A016-FF85D324AC04@oracle.com>
	<552BE024.7080509@oracle.com>
	<D3CAF18B-6616-430D-BEFB-C9504A4B181A@oracle.com>
Message-ID: <552BE107.5050408@oracle.com>

On 4/13/15 8:27 AM, Roland Westrelin wrote:
>> Yes, changes look good.
>
> Thanks for the review. Do I need another review for this?

Yes, please. Ask someone directly.

Vladimir

>
> Roland.
>
>>
>> Thanks,
>> Vladimir
>>
>> On 4/13/15 7:39 AM, Roland Westrelin wrote:
>>> Vladimir,
>>>
>>> Do you think this webrev looks ok?
>>>
>>> Roland.
>>>
>>>
>>>> On Mar 24, 2015, at 1:55 PM, Roland Westrelin <roland.westrelin at oracle.com> wrote:
>>>>
>>>> See inlined.
>>>>
>>>>>> Thanks for looking at this.
>>>>>>
>>>>>>> This is what I waited for long time! Thank you for doing this, Roland.
>>>>>>>
>>>>>>> How you handle case when CastPP is input of Phi node?
>>>>>>
>>>>>> The Phi has a control so any memory node that depends on the Phi output is guaranteed to be "after" the CastPP, right?
>>>>>
>>>>> Yes. So you simply remove the CastPP in such a case. Okay.
>>>>>
>>>>>>
>>>>>>> I am worried how you separate cases when precedence edge added from CastPP and other precedence cases. Can you explain? May be there are no problems.
>>>>>>
>>>>>> The
>>>>>>
>>>>>> if (m->is_block_proj()) {
>>>>>>
>>>>>> test guarantees that the precedence edge is a control node. And I assume it's always ok to remove the precedence edge and adjust the control when the precedence edge is a control node. Do you think that could break something?
>>>>>
>>>>> Only if control edge came from CastPP. I know it is additional work but can you run something (CTW? jvm98) and look what types of precedence edges GCM can see? Unfortunately I don't remember what we have there.
>>>>> There are a lot of places where we use add_prec(), mostly add pointers to memory nodes.
>>>>> If control nodes come only from CastPP then I am fine with your code.
>>>>
>>>> I added debugging code (that I didn't keep in the webrev below) that added (memory operation, control from CastPP) pairs in a side table during final graph reshaping, updated the pairs during matching and checked that all nodes that gcm sees with a control precedence got it from a CastPP. I ran CTW and other tests with that code and all tests passed. During that testing, I noticed that:
>>>>
>>>> - CastPP nodes don't always have a control
>>>> - some CastPP nodes depend on a Region because the test was moved to the branch of a dominating If
>>>> - the test for some CastPP nodes is removed during escape analysis
>>>>
>>>> I updated the code to reflect those cases.
>>>>
>>>> http://cr.openjdk.java.net/~roland/8069191/webrev.01/
>>>>
>>>> Roland.
>>>>
>>>>>
>>>>> Thanks,
>>>>> Vladimir
>>>>>
>>>>>>
>>>>>> Roland.
>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Vladimir
>>>>>>>
>>>>>>> On 3/17/15 3:54 AM, Roland Westrelin wrote:
>>>>>>>> http://cr.openjdk.java.net/~roland/8069191/webrev.00/
>>>>>>>>
>>>>>>>> In the test (that needs to be run with StressGCM to cause incorrect code generation), a dependency carried by a CastPP is lost when CastPPs are removed after CCP. Detailed description of the bug is in:
>>>>>>>>
>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8069191
>>>>>>>>
>>>>>>>> Vladimir suggested investigating the performance impact of keeping the CastPPs for the entire compilation. I found that this still causes performance regressions as documented in:
>>>>>>>>
>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8039999
>>>>>>>>
>>>>>>>> The fix I suggest is to keep CastPPs during optimizations and remove them during final graph reshaping. To not lose the dependencies they carry, precedence edges are added to memory operations that depend on them. During GCM, the control of the memory operations is adjusted to take into account both the current control and the precedence edges.
>>>>>>>>
>>>>>>>> Experiments show that this scheme doesn't cause performance regressions (I ran promotion testing on x64 and sparc).
>>>>>>>>
>>>>>>>> Roland.
>>>
>

From john.r.rose at oracle.com  Mon Apr 13 19:38:55 2015
From: john.r.rose at oracle.com (John Rose)
Date: Mon, 13 Apr 2015 12:38:55 -0700
Subject: RFR (S): 8076461: JSR292: remove unused native and constants
In-Reply-To: <3083F107-6D99-4C4F-948C-9326C0E843CE@oracle.com>
References: <4EB3C4DA-C382-4795-A676-6147E863DFF1@oracle.com>
	<C1C9C217-1448-416E-9862-47C52A4ED7DC@oracle.com>
	<3083F107-6D99-4C4F-948C-9326C0E843CE@oracle.com>
Message-ID: <0C16CFAC-EFD5-41E8-840E-3421FA96F3E8@oracle.com>

That's much better; thanks.  Glad to hear the verifyC's still works.

The MN_* constants are a private interface between C++ and Java code.  Those are the most important to verify.

You can get rid of these lines; we don't look at vtable indexes any more:
        // The JVM uses values of -2 and above for vtable indexes.
        // Field values are simple positive offsets.
        // Ref: src/share/vm/oops/methodOop.hpp
        // This value is negative enough to avoid such numbers,
        // but not too negative.

The other constants are publicly defined in various standards docs (except T_ILLEGAL).

I don't think these constants are used any more, except the MN_* and REF_* ones.  (The REF_* ones are in the JVM standard, so are in some sense pre-verified.)

I suggest also removing the ACC_*, T_*, and CONSTANT_* names, if you can.  We probably stopped using any of those when we started using ASM.

Thanks!

- John

On Apr 13, 2015, at 4:40 AM, Michael Haupt <michael.haupt at oracle.com> wrote:
> 
> Hi John,
> 
> thank you very much for your review; keeping the Constants class around for VM/JDK constant value agreement certainly makes sense. I have undone most of the removal work and verified in a slowdebug build that MHN.verifyConstants() works. I've also added a comment on the Constants class to clarify its role a bit. Local tests and JPRT are still happy with this.
> 
> Updated webrev: http://cr.openjdk.java.net/~mhaupt/8076461/webrev.01/
> 
> Best,
> 
> Michael
> 
>> Am 07.04.2015 um 23:49 schrieb John Rose <john.r.rose at oracle.com>:
>> 
>> On Apr 7, 2015, at 12:11 PM, Michael Haupt <michael.haupt at oracle.com> wrote:
>>> 
>>> Dear all,
>>> 
>>> please review and sponsor this change. Cross-posted to hs-comp and core-lib as this is at the JVM/libraries boundary. This is a straightforward refactoring change that removes many constants and unused API from MHNatives, and places some constants used only in MemberName in that class.
>> 
>> The class MethodHandleNatives.Constants exists to enumerate and cross-check any constants which the JVM and JDK code need to agree about.  Removing a constant from MethodHandleNatives.Constants (moving to MemberName) may cause failures when MHN.verifyConstants is run (via "java -esa" on a debug build of Java).  If there are no failures, I wonder what would happen if the JVM and JDK got out of sync. in their notion of the value of a constant like MN_CALLER_SENSITIVE.  It's important that some part of our release testing detect if MN_CALLER_SENSITIVE (etc.) gets out of sync.
>> 
>> If there is some reason why this testing is no longer needed, I'd like to see the whole Constants class go away, since that's all it's really good for.  But I don't see that reason yet, and moving the constants somewhere either will cause a test failure, or *should* cause a test failure.
>> 
>> I'm happy to see the "GC" guys go away.  They were artifacts of a quickly moving 292 implementation that spanned two repositories with unsynchronized change streams.
>> 
>> - John
>> 
>>> 
>>> RFE: https://bugs.openjdk.java.net/browse/JDK-8076461
>>> Changes: http://cr.openjdk.java.net/~mhaupt/8076461/webrev.00/
>>> 
>>> Tested with JPRT, HotSpot testset.
>>> 
>>> Thanks,
>>> 
>>> Michael
> 
> 
> -- 
> 
> <http://www.oracle.com/>
> Dr. Michael Haupt | Principal Member of Technical Staff
> Phone: +49 331 200 7277 | Fax: +49 331 200 7561
> Oracle Java Platform Group | HotSpot Compiler Team 
> Oracle Deutschland B.V. & Co. KG, Schiffbauergasse 14 | 14467 Potsdam, Germany
> <http://www.oracle.com/commitment>	Oracle is committed to developing practices and products that help protect the environment
> 


From john.r.rose at oracle.com  Mon Apr 13 19:50:33 2015
From: john.r.rose at oracle.com (John Rose)
Date: Mon, 13 Apr 2015 12:50:33 -0700
Subject: [9] RFR(S): 8067648: JVM crashes reproducable with GCM cipher
	suites in GCTR doFinal
In-Reply-To: <552BADCF.80109@oracle.com>
References: <552BADCF.80109@oracle.com>
Message-ID: <4E2B097B-D807-428A-B7FB-DFC63F1A7B63@oracle.com>

On Apr 13, 2015, at 4:51 AM, Zoltán Majó <zoltan.majo at oracle.com> wrote:
> 
> please review the following patch.

Good.  This line has a typo ("encrypBlock" = gang member induction party foul?):
+  * AESCrypt.encrypBlock method can be intrinsified on the HotSpot VM

- John

From zoltan.majo at oracle.com  Mon Apr 13 20:55:06 2015
From: zoltan.majo at oracle.com (=?utf-8?Q?Zolt=C3=A1n_Maj=C3=B3?=)
Date: Mon, 13 Apr 2015 22:55:06 +0200
Subject: [9] RFR(S): 8067648: JVM crashes reproducable with GCM cipher
	suites in GCTR doFinal
In-Reply-To: <E2CD15F2-E1E6-4F93-A555-E4235FA4E640@oracle.com>
References: <552BADCF.80109@oracle.com>
	<4E2B097B-D807-428A-B7FB-DFC63F1A7B63@oracle.com>
	<E2CD15F2-E1E6-4F93-A555-E4235FA4E640@oracle.com>
Message-ID: <28A5E5B5-4B00-45B8-96D9-1D28B0522319@oracle.com>

Hi Tony,


> On 13 Apr 2015, at 22:09, Anthony Scarpino <anthony.scarpino at oracle.com> wrote:
> 
> Hi,
> 
> Could you forward the whole message, with the patch, to the security list.  I have only received  John's response, but not the webrev. 

please find the original RFR below. I've sent it to security-dev at openjdk.java.net at the same time as I did send it to hotspot-compiler-dev at openjdk.java.net. But as security-dev seems to be moderated for non-members, the original message is most likely awaiting moderator approval.

Thank you and best regards,


Zoltan

================

Hi,


please review the following patch.

Bug: https://bugs.openjdk.java.net/browse/JDK-8067648


Problem: On architectures with hardware support for AES operations, the Java version (the version in the JDK sources) of the com.sun.crypto.provider.AESCrypt::encryptBlock(byte[], int, byte[], int) method is replaced with an intrinsic that uses the CPU's AES instructions.

The Java version of encryptBlock operates on arrays of size AES_BLOCK_SIZE=16 and it consequently performs a number of "implicit" checks (e.g., null checks and range checks) as required by the Java VM specification. The intrinsified version of encryptBlock, however, does not perform any of these checks.

Omitting checks results in a VM crash if invalid parameters (e.g., a null pointer, as reported in the current case) are passed to the method.


Solution: The failure reported in the current issue appears in the com.sun.crypto.provider.GCTR class that calls the intrinsified version of encryptBlock. None of the methods of the class are accessible from packages other than com.sun.crypto.provider. So, after a private discussion with John Rose, Vladimir Kozlov, and Roland Westrelin, I propose to solve this problem on the Java level.

The GCTR::counter field is supposed to be initialized with an array of size AES_BLOCK_SIZE so that it is safe to call encryptBlock. The 'counter' field is never supposed to become NULL during the lifetime of a GCTR object (so that encryptBlock can be always called safely).

The GCTR class supports saving and restoring the value of the 'counter' field (via the save() and restore() methods). For saving/restoring, the class uses the 'counterSave' field as temporary storage.

It is also possible to reset a GCTR object to its initial state by calling reset(). Reset sets both the 'counter' and 'counterSave' fields to their initial values.

If a call to the method reset() is followed by a call to restore(), the field 'counter' is not restored to its original value, but it becomes NULL. This is an invalid state, because a GCTR object should always contain a valid 'counter' array. This problem has been also described (in part) by Chris Ellis.

https://intrbiz.com/post/blog/development/java_8_aes_gcm_nullpointerexception

This patch proposes to restore the contents of 'counter' from 'counterSave' only if some data has been saved into 'counterSave' before (i.e., counterSave is not NULL). The patch also adds a check to the constructor of GCTR to verify if the length of 'counter' is AES_BLOCK_SIZE. (I checked and JDK code uses this class only with arrays of size AES_BLOCK_SIZE, but it is good if the required size is documented and enforced by GCTR.)

The array to store the output of the encryptBlock method (the third parameter) should be also of length AES_BLOCK_SIZE. That is ensured by the GCTR class (both in the doFinal and update methods). The input and output offsets (the second and fourth parameters) are 0, as required by encryptBlock.
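
As a minimal sketch of the guard described above (this is not the actual patch; the class name CounterState and its method names are hypothetical stand-ins for GCTR, and only the save/restore/reset bookkeeping is modeled, not the encryption):

```java
// Hypothetical stand-in for the counter bookkeeping in GCTR.
// All names here are assumptions for illustration, not JDK code.
final class CounterState {
    static final int AES_BLOCK_SIZE = 16;

    private final byte[] initialCounter;
    private byte[] counter;      // must never become null
    private byte[] counterSave;  // null until save() has been called

    CounterState(byte[] initialCounterBlk) {
        // Enforce the size that encryptBlock relies on.
        if (initialCounterBlk.length != AES_BLOCK_SIZE) {
            throw new IllegalArgumentException(
                "counter block must be " + AES_BLOCK_SIZE
                + " bytes, got " + initialCounterBlk.length);
        }
        initialCounter = initialCounterBlk.clone();
        counter = initialCounter.clone();
    }

    void save() {
        counterSave = counter.clone();
    }

    void restore() {
        // Only restore if something was saved; otherwise keep 'counter' valid.
        if (counterSave != null) {
            counter = counterSave.clone();
        }
    }

    void reset() {
        counter = initialCounter.clone();
        counterSave = null;
    }

    // Simulates the counter changing while blocks are processed.
    void bump() {
        counter[AES_BLOCK_SIZE - 1]++;
    }

    byte[] counter() {
        return counter;
    }
}
```

With this shape, calling reset() and then restore() leaves 'counter' as a valid 16-byte array instead of null, which is the invariant the intrinsified encryptBlock depends on.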


Webrev: http://cr.openjdk.java.net/~zmajo/8067648/webrev.00/


Testing:

- JPRT (both with 9 and 8u), all tests in the testsets hotspot pass;
- JTREG tests in jdk_security[1-4] executed locally with the sources built with --enable-openjdk-only; all tests that pass without the patch pass with the patch as well;
- failure reported in 8067648 can be reproduced with 8u, failure is not triggered with patch applied.


Thank you and best regards,


Zoltan

> Thanks
> 
> Tony
> 
> 
> 
> On Apr 13, 2015, at 12:50 PM, John Rose <john.r.rose at oracle.com> wrote:
> 
>> On Apr 13, 2015, at 4:51 AM, Zoltán Majó <zoltan.majo at oracle.com> wrote:
>>> 
>>> please review the following patch.
>> 
>> Good.  This line has a typo ("encrypBlock" = gang member induction party foul?):
>> +  * AESCrypt.encrypBlock method can be intrinsified on the HotSpot VM
>> 
>> - John


From jan.civlin at intel.com  Mon Apr 13 22:06:44 2015
From: jan.civlin at intel.com (Civlin, Jan)
Date: Mon, 13 Apr 2015 22:06:44 +0000
Subject: RFR(S): 8076284: Improve vectorization of parallel streams
In-Reply-To: <39F83597C33E5F408096702907E6C450E3E734@ORSMSX104.amr.corp.intel.com>
References: <39F83597C33E5F408096702907E6C450E3E586@ORSMSX104.amr.corp.intel.com>
	<02FCFB8477C4EF43A2AD8E0C60F3DA2B63334516@FMSMSX112.amr.corp.intel.com>
	<39F83597C33E5F408096702907E6C450E3E5A4@ORSMSX104.amr.corp.intel.com>
	<02FCFB8477C4EF43A2AD8E0C60F3DA2B63334531@FMSMSX112.amr.corp.intel.com>
	<39F83597C33E5F408096702907E6C450E3E734@ORSMSX104.amr.corp.intel.com>
Message-ID: <39F83597C33E5F408096702907E6C450E3E839@ORSMSX104.amr.corp.intel.com>

Hi All,

We would like to contribute the improvement of vectorization of parallel streams  from Intel.
The contribution Bug ID:  8076284.

Please review this patch:


Bug-id:     https://bugs.openjdk.java.net/browse/JDK-8076284

webrev:  http://cr.openjdk.java.net/~kvn/8076284/webrev/


Description
Improve vectorization of the unordered parallel streams (by vectorizing forEachRemaining method).
For example, this forEach will be vectorized:
java.util.stream.IntStream iStream = java.util.stream.IntStream.range(0, RANGE - 1).parallel();
iStream.forEach( id -> c[id] = c[id] + c[id+1] );

It also enables on-demand loop vectorization in a given method (by providing more hints to SuperWord optimization).
For example, use -XX:CompileCommand=option,computeCall,Vectorize to vectorize this loop
void computeCall(double[] Call, double puByDf, double pdByDf)
{
    for (int i = timeStep; i > 0; i--)
        for (int j = 0; j <= i - 1; j++)
            Call[j] = puByDf * Call[j + 1] + pdByDf * Call[j];
}
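
For anyone who wants to try the second example, here is a self-contained sketch of the same loop nest. The class name BinomialStep, passing timeStep as a parameter (the original reads it from enclosing state), and the sample values are assumptions for illustration; whether the inner loop is actually vectorized depends on the VM build and flags used.

```java
// Hypothetical harness around the loop nest shown above.
final class BinomialStep {
    // Same computation as computeCall, with timeStep passed explicitly.
    static void computeCall(double[] call, int timeStep,
                            double puByDf, double pdByDf) {
        for (int i = timeStep; i > 0; i--) {
            // Each iteration reads call[j + 1] before it is overwritten,
            // so the inner loop carries no write-before-read dependence.
            for (int j = 0; j <= i - 1; j++) {
                call[j] = puByDf * call[j + 1] + pdByDf * call[j];
            }
        }
    }

    public static void main(String[] args) {
        double[] call = {1.0, 2.0, 3.0};
        computeCall(call, 2, 0.5, 0.5);
        System.out.println(java.util.Arrays.toString(call));
    }
}
```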

This enhancement is contributed by Intel and sponsored by the hotspot compiler team.


From john.r.rose at oracle.com  Tue Apr 14 03:31:26 2015
From: john.r.rose at oracle.com (John Rose)
Date: Mon, 13 Apr 2015 20:31:26 -0700
Subject: RFR(S): 8069191: moving predicate out of loops may cause array
	accesses to bypass null check
In-Reply-To: <F8959E8E-ECA1-42DD-BC8A-AA7CD750F5C2@oracle.com>
References: <100419DB-199E-489C-B3EA-F104BF0EB203@oracle.com>
	<55086F20.9020305@oracle.com>
	<2ACAAB95-8175-48DB-8BD9-F5BF168A6666@oracle.com>
	<550893F0.9050608@oracle.com>
	<F8959E8E-ECA1-42DD-BC8A-AA7CD750F5C2@oracle.com>
Message-ID: <88169234-01DE-470C-B56A-D96AD7C53D50@oracle.com>

Reviewed.

On Mar 24, 2015, at 5:55 AM, Roland Westrelin <roland.westrelin at oracle.com> wrote:
> 
>>> 
>>> test guarantees that the precedence edge is a control node. And I assume it's always ok to remove the precedence edge and adjust the control when the precedence edge is a control node. Do you think that could break something?
>> 
>> Only if control edge came from CastPP. I know it is additional work but can you run something (CTW? jvm98) and look what types of precedence edges GCM can see? Unfortunately I don't remember what we have there.
>> There are a lot of places where we use add_prec(), mostly add pointers to memory nodes.
>> If control nodes come only from CastPP then I am fine with your code.
> 
> I added debugging code (that I didn't keep in the webrev below) that added (memory operation, control from CastPP) pairs in a side table during final graph reshaping, updated the pairs during matching and checked that all nodes that gcm sees with a control precedence got it from a CastPP. I ran CTW and other tests with that code and all tests passed. During that testing, I noticed that:

That's a good testing method.

Precedence edges are a simple way to add miscellaneous node relations but it is easy to forget they are there.  I guess the gcm.cpp code picks them up completely.  And after the extra edges are added, not much happens that could "forget" (drop) an edge.  (Note that copying a node to make a better one has a risk to "forget" precedence edges.)

But, if this technique were to be used in any more expansive way, or if you have lingering doubts about using precedence edges here, I would recommend creating an explicit new node type that captures multiple control dependency edges.  As we have a MergeMem node we could have a MergeControl node, whose input edges (after in(0)) would act like the precedence edges you are adding now.

Two minor comments on code style in compile.cpp:  The new 'switch' is hard to untangle.  Wouldn't it be simpler to put the 'wq.push(use)' call before the 'break', and drop the 'default' case completely?

Also, I really dislike it when block structure ({...}) cuts across #ifdef structure.  This hack would be slightly better:
  #ifdef _LP64
      if (n->in(1)->is_DecodeNarrowPtr() || n->in(2)->is_DecodeNarrowPtr()) {
        ...
      } else
  #endif // _LP64
      {
        ...
      }

Better yet, you could also just delete the #ifdef _LP64 and let the tests go forward.  Or incorporate a manifest constant:
    const bool is_LP64 = LP64_ONLY(true) NOT_LP64(false);
    if (is_LP64 && (...)) { ... } else { ... }

The code in gcm.cpp treats precedence edges asymmetrically.  (The expression is 'n = is_dominator(bn, bm) ? m : n'.)  Do we want to assert that one of them dominates the other, perhaps using 'assert_dom'?

It's great to see all that mysterious old code go away.

- John

From zoltan.majo at oracle.com  Tue Apr 14 07:44:30 2015
From: zoltan.majo at oracle.com (=?UTF-8?B?Wm9sdMOhbiBNYWrDsw==?=)
Date: Tue, 14 Apr 2015 09:44:30 +0200
Subject: [9] RFR(S): 8067648: JVM crashes reproducable with GCM cipher
	suites in GCTR doFinal
In-Reply-To: <4E2B097B-D807-428A-B7FB-DFC63F1A7B63@oracle.com>
References: <552BADCF.80109@oracle.com>
	<4E2B097B-D807-428A-B7FB-DFC63F1A7B63@oracle.com>
Message-ID: <552CC55E.4010702@oracle.com>

Hi John,


thank you for the review!

On 04/13/2015 09:50 PM, John Rose wrote:
> On Apr 13, 2015, at 4:51 AM, Zoltán Majó <zoltan.majo at oracle.com> wrote:
>>
>> please review the following patch.
>
> Good.  This line has a typo ("encrypBlock" = gang member induction 
> party foul?):
> +  * AESCrypt.encrypBlock method can be intrinsified on the HotSpot VM

Thanks for catching that. Here is the new webrev:

http://cr.openjdk.java.net/~zmajo/8067648/webrev.01/

Best regards,


Zoltan

>
> - John


From wolfgang.pedot at finkzeit.at  Tue Apr 14 10:00:56 2015
From: wolfgang.pedot at finkzeit.at (Wolfgang Pedot)
Date: Tue, 14 Apr 2015 12:00:56 +0200
Subject: Java 8 TieredCompilation Blacklist?
Message-ID: <552CE558.5040602@finkzeit.at>

Hello,

I have recently migrated a big-ish application from 7u40 to 8u40 and I 
noticed a quite substantial increase in CPU utilisation.
After doing some research I figured out that the cause of that is 
TieredCompilation which is now on by default, I have deactivated that 
feature and now CPU utilisation is back to normal.
I tested TieredCompilation before on 7u<something> and also had an 
increase in CPU up to the point where the application actually slowed 
down so I ended that test.
A part of the application uses BIRT and that tends to generate a lot of 
short-lived classes to optimize Javascript-code, my guess is that the 
tiered compiler compiles those classes in an attempt to optimize them and
depending on the usage of the system that increases CPU without really 
accelerating anything (according to statistics). I have found 
"CompileOnly" which seems to be something to be used for test and 
development, is there something like a Blacklist I can use to tell the 
compiler NOT to compile classes in a specific package?

The system had been running for ~13h on 8u40 and used 1.5h of CPU-time 
for compilation, the previous version running on 7u40 had been up for 
~62.5days and only used 36min for compilation. I did notice the much 
quicker warmup in the response-times after the switch to 8u40 but I don't 
want the system to spend so much time compiling stuff that does not 
really improve performance.

any help would be appreciated

Wolfgang


From michael.haupt at oracle.com  Tue Apr 14 11:33:42 2015
From: michael.haupt at oracle.com (Michael Haupt)
Date: Tue, 14 Apr 2015 13:33:42 +0200
Subject: RFR (S): 8076461: JSR292: remove unused native and constants
In-Reply-To: <0C16CFAC-EFD5-41E8-840E-3421FA96F3E8@oracle.com>
References: <4EB3C4DA-C382-4795-A676-6147E863DFF1@oracle.com>
	<C1C9C217-1448-416E-9862-47C52A4ED7DC@oracle.com>
	<3083F107-6D99-4C4F-948C-9326C0E843CE@oracle.com>
	<0C16CFAC-EFD5-41E8-840E-3421FA96F3E8@oracle.com>
Message-ID: <46742670-8A71-4026-8ED5-25BE82DD2698@oracle.com>

Hi John,

thanks again; I've applied your suggestions, re-tested as before and uploaded the revision to http://cr.openjdk.java.net/~mhaupt/8076461/webrev.02/.

Best,

Michael

> Am 13.04.2015 um 21:38 schrieb John Rose <john.r.rose at oracle.com>:
> 
> That's much better; thanks.  Glad to hear the verifyC's still works.
> 
> The MN_* constants are a private interface between C++ and Java code.  Those are the most important to verify.
> 
> You can get rid of these lines; we don't look at vtable indexes any more:
>        // The JVM uses values of -2 and above for vtable indexes.
>        // Field values are simple positive offsets.
>        // Ref: src/share/vm/oops/methodOop.hpp
>        // This value is negative enough to avoid such numbers,
>        // but not too negative.
> 
> The other constants are publicly defined in various standards docs (except T_ILLEGAL).
> 
> I don't think these constants are used any more, except the MN_* and REF_* ones.  (The REF_* ones are in the JVM standard, so are in some sense pre-verified.)
> 
> I suggest also removing the ACC_*, T_*, and CONSTANT_* names, if you can.  We probably stopped using any of those when we started using ASM.
> 
> Thanks!
> 
> - John
> 
> On Apr 13, 2015, at 4:40 AM, Michael Haupt <michael.haupt at oracle.com> wrote:
>> 
>> Hi John,
>> 
>> thank you very much for your review; keeping the Constants class around for VM/JDK constant value agreement certainly makes sense. I have undone most of the removal work and verified in a slowdebug build that MHN.verifyConstants() works. I've also added a comment on the Constants class to clarify its role a bit. Local tests and JPRT are still happy with this.
>> 
>> Updated webrev: http://cr.openjdk.java.net/~mhaupt/8076461/webrev.01/
>> 
>> Best,
>> 
>> Michael
>> 
>>> Am 07.04.2015 um 23:49 schrieb John Rose <john.r.rose at oracle.com>:
>>> 
>>> On Apr 7, 2015, at 12:11 PM, Michael Haupt <michael.haupt at oracle.com> wrote:
>>>> 
>>>> Dear all,
>>>> 
>>>> please review and sponsor this change. Cross-posted to hs-comp and core-lib as this is at the JVM/libraries boundary. This is a straightforward refactoring change that removes many constants and unused API from MHNatives, and places some constants used only in MemberName in that class.
>>> 
>>> The class MethodHandleNatives.Constants exists to enumerate and cross-check any constants which the JVM and JDK code need to agree about.  Removing a constant from MethodHandleNatives.Constants (moving to MemberName) may cause failures when MHN.verifyConstants is run (via "java -esa" on a debug build of Java).  If there are no failures, I wonder what would happen if the JVM and JDK got out of sync. in their notion of the value of a constant like MN_CALLER_SENSITIVE.  It's important that some part of our release testing detect if MN_CALLER_SENSITIVE (etc.) gets out of sync.
>>> 
>>> If there is some reason why this testing is no longer needed, I'd like to see the whole Constants class go away, since that's all it's really good for.  But I don't see that reason yet, and moving the constants somewhere either will cause a test failure, or *should* cause a test failure.
>>> 
>>> I'm happy to see the "GC" guys go away.  They were artifacts of a quickly moving 292 implementation that spanned two repositories with unsynchronized change streams.
>>> 
>>> - John
>>> 
>>>> 
>>>> RFE: https://bugs.openjdk.java.net/browse/JDK-8076461
>>>> Changes: http://cr.openjdk.java.net/~mhaupt/8076461/webrev.00/
>>>> 
>>>> Tested with JPRT, HotSpot testset.
>>>> 
>>>> Thanks,
>>>> 
>>>> Michael




From vladimir.x.ivanov at oracle.com  Tue Apr 14 11:47:51 2015
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Tue, 14 Apr 2015 14:47:51 +0300
Subject: RFR (S): 8076461: JSR292: remove unused native and constants
In-Reply-To: <46742670-8A71-4026-8ED5-25BE82DD2698@oracle.com>
References: <4EB3C4DA-C382-4795-A676-6147E863DFF1@oracle.com>	<C1C9C217-1448-416E-9862-47C52A4ED7DC@oracle.com>	<3083F107-6D99-4C4F-948C-9326C0E843CE@oracle.com>	<0C16CFAC-EFD5-41E8-840E-3421FA96F3E8@oracle.com>
	<46742670-8A71-4026-8ED5-25BE82DD2698@oracle.com>
Message-ID: <552CFE67.1070208@oracle.com>

Looks good.

I'll push it for you.

Best regards,
Vladimir Ivanov

On 4/14/15 2:33 PM, Michael Haupt wrote:
> Hi John,
>
> thanks again; I've applied your suggestions, re-tested as before and uploaded the revision to http://cr.openjdk.java.net/~mhaupt/8076461/webrev.02/.
>
> Best,
>
> Michael
>
>> Am 13.04.2015 um 21:38 schrieb John Rose <john.r.rose at oracle.com>:
>>
>> That's much better; thanks.  Glad to hear the verifyC's still works.
>>
>> The MN_* constants are a private interface between C++ and Java code.  Those are the most important to verify.
>>
>> You can get rid of these lines; we don't look at vtable indexes any more:
>>         // The JVM uses values of -2 and above for vtable indexes.
>>         // Field values are simple positive offsets.
>>         // Ref: src/share/vm/oops/methodOop.hpp
>>         // This value is negative enough to avoid such numbers,
>>         // but not too negative.
>>
>> The other constants are publicly defined in various standards docs (except T_ILLEGAL).
>>
>> I don't think these constants are used any more, except the MN_* and REF_* ones.  (The REF_* ones are in the JVM standard, so are in some sense pre-verified.)
>>
>> I suggest also removing the ACC_*, T_*, and CONSTANT_* names, if you can.  We probably stopped using any of those when we started using ASM.
>>
>> Thanks!
>>
>> - John
>>
>> On Apr 13, 2015, at 4:40 AM, Michael Haupt <michael.haupt at oracle.com> wrote:
>>>
>>> Hi John,
>>>
>>> thank you very much for your review; keeping the Constants class around for VM/JDK constant value agreement certainly makes sense. I have undone most of the removal work and verified in a slowdebug build that MHN.verifyConstants() works. I've also added a comment on the Constants class to clarify its role a bit. Local tests and JPRT are still happy with this.
>>>
>>> Updated webrev: http://cr.openjdk.java.net/~mhaupt/8076461/webrev.01/
>>>
>>> Best,
>>>
>>> Michael
>>>
>>>> Am 07.04.2015 um 23:49 schrieb John Rose <john.r.rose at oracle.com>:
>>>>
>>>> On Apr 7, 2015, at 12:11 PM, Michael Haupt <michael.haupt at oracle.com> wrote:
>>>>>
>>>>> Dear all,
>>>>>
>>>>> please review and sponsor this change. Cross-posted to hs-comp and core-lib as this is at the JVM/libraries boundary. This is a straightforward refactoring change that removes many constants and unused API from MHNatives, and places some constants used only in MemberName in that class.
>>>>
>>>> The class MethodHandleNatives.Constants exists to enumerate and cross-check any constants which the JVM and JDK code need to agree about.  Removing a constant from MethodHandleNatives.Constants (moving to MemberName) may cause failures when MHN.verifyConstants is run (via "java -esa" on a debug build of Java).  If there are no failures, I wonder what would happen if the JVM and JDK got out of sync. in their notion of the value of a constant like MN_CALLER_SENSITIVE.  It's important that some part of our release testing detect if MN_CALLER_SENSITIVE (etc.) gets out of sync.
>>>>
>>>> If there is some reason why this testing is no longer needed, I'd like to see the whole Constants class go away, since that's all it's really good for.  But I don't see that reason yet, and moving the constants somewhere either will cause a test failure, or *should* cause a test failure.
>>>>
>>>> I'm happy to see the "GC" guys go away.  They were artifacts of a quickly moving 292 implementation that spanned two repositories with unsynchronized change streams.
>>>>
>>>> - John
>>>>
>>>>>
>>>>> RFE: https://bugs.openjdk.java.net/browse/JDK-8076461
>>>>> Changes: http://cr.openjdk.java.net/~mhaupt/8076461/webrev.00/
>>>>>
>>>>> Tested with JPRT, HotSpot testset.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Michael
>
>

From michael.haupt at oracle.com  Tue Apr 14 11:53:57 2015
From: michael.haupt at oracle.com (Michael Haupt)
Date: Tue, 14 Apr 2015 13:53:57 +0200
Subject: RFR (S): 8076461: JSR292: remove unused native and constants
In-Reply-To: <552CFE67.1070208@oracle.com>
References: <4EB3C4DA-C382-4795-A676-6147E863DFF1@oracle.com>
	<C1C9C217-1448-416E-9862-47C52A4ED7DC@oracle.com>
	<3083F107-6D99-4C4F-948C-9326C0E843CE@oracle.com>
	<0C16CFAC-EFD5-41E8-840E-3421FA96F3E8@oracle.com>
	<46742670-8A71-4026-8ED5-25BE82DD2698@oracle.com>
	<552CFE67.1070208@oracle.com>
Message-ID: <5F0E415F-6453-48E8-923F-25B835EDC08E@oracle.com>

... thank you, Vladimir!

Best,

Michael

> On 14.04.2015 at 13:47, Vladimir Ivanov <vladimir.x.ivanov at oracle.com> wrote:
> 
> Looks good.
> 
> I'll push it for you.
> 
> Best regards,
> Vladimir Ivanov


-- 

Dr. Michael Haupt | Principal Member of Technical Staff
Phone: +49 331 200 7277 | Fax: +49 331 200 7561
Oracle Java Platform Group | HotSpot Compiler Team 
Oracle Deutschland B.V. & Co. KG, Schiffbauergasse 14 | 14467 Potsdam, Germany


From vladimir.kozlov at oracle.com  Tue Apr 14 15:59:06 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 14 Apr 2015 08:59:06 -0700
Subject: [9] RFR(S): 8067648: JVM crashes reproducable with GCM cipher
	suites in GCTR doFinal
In-Reply-To: <552CC55E.4010702@oracle.com>
References: <552BADCF.80109@oracle.com>	<4E2B097B-D807-428A-B7FB-DFC63F1A7B63@oracle.com>
	<552CC55E.4010702@oracle.com>
Message-ID: <552D394A.10309@oracle.com>

Sorry for the late notice. Can you also include the value of initialCounterBlk.length in the exception message?

Thanks,
Vladimir
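
To illustrate the request, a length check of this kind might report the offending value roughly as follows. This is only a sketch: the class and method names, the exception type and the message wording are assumptions made for illustration and are not taken from the webrev; AES_BLOCK_SIZE is assumed to be 16 bytes.

```java
// Hypothetical sketch, not the code under review.
final class CounterBlockCheckSketch {
    // Assumed block size in bytes.
    static final int AES_BLOCK_SIZE = 16;

    // Rejects a counter block of the wrong size and names the actual length
    // in the message, so a failure report shows what was passed in.
    static void checkCounterBlock(byte[] initialCounterBlk) {
        if (initialCounterBlk.length != AES_BLOCK_SIZE) {
            throw new IllegalArgumentException(
                "length of initial counter block (" + initialCounterBlk.length
                + ") not equal to AES_BLOCK_SIZE (" + AES_BLOCK_SIZE + ")");
        }
    }

    public static void main(String[] args) {
        checkCounterBlock(new byte[16]);      // accepted
        try {
            checkCounterBlock(new byte[12]);  // rejected
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```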

On 4/14/15 12:44 AM, Zoltán Majó wrote:
> Hi John,
>
>
> thank you for the review!
>
> On 04/13/2015 09:50 PM, John Rose wrote:
>> On Apr 13, 2015, at 4:51 AM, Zoltán Majó <zoltan.majo at oracle.com> wrote:
>>>
>>> please review the following patch.
>>
>> Good.  This line has a typo ("encrypBlock" = gang member induction party foul?):
>> +  * AESCrypt.encrypBlock method can be intrinsified on the HotSpot VM
>
> Thanks for catching that. Here is the new webrev:
>
> http://cr.openjdk.java.net/~zmajo/8067648/webrev.01/
>
> Best regards,
>
>
> Zoltan
>
>>
>> - John
>

From zoltan.majo at oracle.com  Tue Apr 14 17:54:08 2015
From: zoltan.majo at oracle.com (Zoltán Majó)
Date: Tue, 14 Apr 2015 19:54:08 +0200
Subject: [9] RFR(S): 8067648: JVM crashes reproducable with GCM cipher
	suites in GCTR doFinal
In-Reply-To: <552D394A.10309@oracle.com>
References: <552BADCF.80109@oracle.com>	<4E2B097B-D807-428A-B7FB-DFC63F1A7B63@oracle.com>
	<552CC55E.4010702@oracle.com> <552D394A.10309@oracle.com>
Message-ID: <552D5440.4090005@oracle.com>

Hi Vladimir,


On 04/14/2015 05:59 PM, Vladimir Kozlov wrote:
> Sorry for the late notice. Can you also include the value of
> initialCounterBlk.length in the exception message?

thank you for the feedback!

I extended the error message in the exception, here is the updated webrev:

http://cr.openjdk.java.net/~zmajo/8067648/webrev.02/

Best regards,


Zoltan

>
> Thanks,
> Vladimir
>
> On 4/14/15 12:44 AM, Zoltán Majó wrote:
>> Hi John,
>>
>>
>> thank you for the review!
>>
>> On 04/13/2015 09:50 PM, John Rose wrote:
>>> On Apr 13, 2015, at 4:51 AM, Zoltán Majó <zoltan.majo at oracle.com> wrote:
>>>>
>>>> please review the following patch.
>>>
>>> Good.  This line has a typo ("encrypBlock" = gang member induction 
>>> party foul?):
>>> +  * AESCrypt.encrypBlock method can be intrinsified on the HotSpot VM
>>
>> Thanks for catching that. Here is the new webrev:
>>
>> http://cr.openjdk.java.net/~zmajo/8067648/webrev.01/
>>
>> Best regards,
>>
>>
>> Zoltan
>>
>>>
>>> - John
>>


From vladimir.kozlov at oracle.com  Tue Apr 14 17:59:03 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 14 Apr 2015 10:59:03 -0700
Subject: [9] RFR(S): 8067648: JVM crashes reproducable with GCM cipher
	suites in GCTR doFinal
In-Reply-To: <552D5440.4090005@oracle.com>
References: <552BADCF.80109@oracle.com>	<4E2B097B-D807-428A-B7FB-DFC63F1A7B63@oracle.com>
	<552CC55E.4010702@oracle.com> <552D394A.10309@oracle.com>
	<552D5440.4090005@oracle.com>
Message-ID: <552D5567.0@oracle.com>

Good.

Thanks,
Vladimir

On 4/14/15 10:54 AM, Zoltán Majó wrote:
> Hi Vladimir,
>
>
>
> On 04/14/2015 05:59 PM, Vladimir Kozlov wrote:
>> Sorry for the late notice. Can you also include the value of initialCounterBlk.length in the exception message?
>
> thank you for the feedback!
>
> I extended the error message in the exception, here is the updated webrev:
>
> http://cr.openjdk.java.net/~zmajo/8067648/webrev.02/
>
> Best regards,
>
>
> Zoltan
>
>>
>> Thanks,
>> Vladimir
>>
>> On 4/14/15 12:44 AM, Zoltán Majó wrote:
>>> Hi John,
>>>
>>>
>>> thank you for the review!
>>>
>>> On 04/13/2015 09:50 PM, John Rose wrote:
>>>> On Apr 13, 2015, at 4:51 AM, Zoltán Majó <zoltan.majo at oracle.com> wrote:
>>>>>
>>>>> please review the following patch.
>>>>
>>>> Good.  This line has a typo ("encrypBlock" = gang member induction party foul?):
>>>> +  * AESCrypt.encrypBlock method can be intrinsified on the HotSpot VM
>>>
>>> Thanks for catching that. Here is the new webrev:
>>>
>>> http://cr.openjdk.java.net/~zmajo/8067648/webrev.01/
>>>
>>> Best regards,
>>>
>>>
>>> Zoltan
>>>
>>>>
>>>> - John
>>>
>

From zoltan.majo at oracle.com  Tue Apr 14 19:09:19 2015
From: zoltan.majo at oracle.com (Zoltan Majo)
Date: Tue, 14 Apr 2015 20:09:19 +0100
Subject: [9] RFR(S): 8067648: JVM crashes reproducable with GCM cipher
	suites in GCTR doFinal
In-Reply-To: <552D5567.0@oracle.com>
References: <552BADCF.80109@oracle.com>	<4E2B097B-D807-428A-B7FB-DFC63F1A7B63@oracle.com>
	<552CC55E.4010702@oracle.com> <552D394A.10309@oracle.com>
	<552D5440.4090005@oracle.com> <552D5567.0@oracle.com>
Message-ID: <552D65DF.4020306@oracle.com>

Thank you, John and Vladimir, for the review!

Best regards,


Zoltan

On 14.04.2015 18:59, Vladimir Kozlov wrote:
> Good.
>
> Thanks,
> Vladimir
>
> On 4/14/15 10:54 AM, Zoltán Majó wrote:
>> Hi Vladimir,
>>
>>
>>
>> On 04/14/2015 05:59 PM, Vladimir Kozlov wrote:
>>> Sorry for the late notice. Can you also include the value of
>>> initialCounterBlk.length in the exception message?
>>
>> thank you for the feedback!
>>
>> I extended the error message in the exception, here is the updated 
>> webrev:
>>
>> http://cr.openjdk.java.net/~zmajo/8067648/webrev.02/
>>
>> Best regards,
>>
>>
>> Zoltan
>>
>>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 4/14/15 12:44 AM, Zoltán Majó wrote:
>>>> Hi John,
>>>>
>>>>
>>>> thank you for the review!
>>>>
>>>> On 04/13/2015 09:50 PM, John Rose wrote:
>>>>> On Apr 13, 2015, at 4:51 AM, Zoltán Majó <zoltan.majo at oracle.com> wrote:
>>>>>>
>>>>>> please review the following patch.
>>>>>
>>>>> Good.  This line has a typo ("encrypBlock" = gang member induction 
>>>>> party foul?):
>>>>> +  * AESCrypt.encrypBlock method can be intrinsified on the 
>>>>> HotSpot VM
>>>>
>>>> Thanks for catching that. Here is the new webrev:
>>>>
>>>> http://cr.openjdk.java.net/~zmajo/8067648/webrev.01/
>>>>
>>>> Best regards,
>>>>
>>>>
>>>> Zoltan
>>>>
>>>>>
>>>>> - John
>>>>
>>


From zoltan.majo at oracle.com  Tue Apr 14 19:13:22 2015
From: zoltan.majo at oracle.com (Zoltan Majo)
Date: Tue, 14 Apr 2015 20:13:22 +0100
Subject: [9] RFR(S): 8067648: JVM crashes reproducable with GCM cipher
	suites in GCTR doFinal
In-Reply-To: <552D584E.50201@oracle.com>
References: <552BADCF.80109@oracle.com>	<4E2B097B-D807-428A-B7FB-DFC63F1A7B63@oracle.com>
	<552CC55E.4010702@oracle.com> <552D584E.50201@oracle.com>
Message-ID: <552D66D2.2040805@oracle.com>

Thank you, Tony, for the review!

Best regards,


Zoltan

On 14.04.2015 19:11, Anthony Scarpino wrote:
> The updated changes look good to me..
>
> Tony
>
> On 04/14/2015 12:44 AM, Zoltán Majó wrote:
>> Hi John,
>>
>>
>> thank you for the review!
>>
>> On 04/13/2015 09:50 PM, John Rose wrote:
>>> On Apr 13, 2015, at 4:51 AM, Zoltán Majó <zoltan.majo at oracle.com> wrote:
>>>>
>>>> please review the following patch.
>>>
>>> Good.  This line has a typo ("encrypBlock" = gang member induction
>>> party foul?):
>>> +  * AESCrypt.encrypBlock method can be intrinsified on the HotSpot VM
>>
>> Thanks for catching that. Here is the new webrev:
>>
>> http://cr.openjdk.java.net/~zmajo/8067648/webrev.01/
>>
>> Best regards,
>>
>>
>> Zoltan
>>
>>>
>>> - John
>>
>


From anthony.scarpino at oracle.com  Mon Apr 13 20:09:10 2015
From: anthony.scarpino at oracle.com (Anthony Scarpino)
Date: Mon, 13 Apr 2015 13:09:10 -0700
Subject: [9] RFR(S): 8067648: JVM crashes reproducable with GCM cipher
	suites in GCTR doFinal
In-Reply-To: <4E2B097B-D807-428A-B7FB-DFC63F1A7B63@oracle.com>
References: <552BADCF.80109@oracle.com>
	<4E2B097B-D807-428A-B7FB-DFC63F1A7B63@oracle.com>
Message-ID: <E2CD15F2-E1E6-4F93-A555-E4235FA4E640@oracle.com>

Hi,

Could you forward the whole message, with the patch, to the security list.  I have only received  John's response, but not the webrev. 

Thanks

Tony


> On Apr 13, 2015, at 12:50 PM, John Rose <john.r.rose at oracle.com> wrote:
> 
>> On Apr 13, 2015, at 4:51 AM, Zoltán Majó <zoltan.majo at oracle.com> wrote:
>> 
>> please review the following patch.
> 
> Good.  This line has a typo ("encrypBlock" = gang member induction party foul?):
> +  * AESCrypt.encrypBlock method can be intrinsified on the HotSpot VM
> 
> - John

From anthony.scarpino at oracle.com  Tue Apr 14 18:11:26 2015
From: anthony.scarpino at oracle.com (Anthony Scarpino)
Date: Tue, 14 Apr 2015 11:11:26 -0700
Subject: [9] RFR(S): 8067648: JVM crashes reproducable with GCM cipher
	suites in GCTR doFinal
In-Reply-To: <552CC55E.4010702@oracle.com>
References: <552BADCF.80109@oracle.com>	<4E2B097B-D807-428A-B7FB-DFC63F1A7B63@oracle.com>
	<552CC55E.4010702@oracle.com>
Message-ID: <552D584E.50201@oracle.com>

The updated changes look good to me..

Tony

On 04/14/2015 12:44 AM, Zoltán Majó wrote:
> Hi John,
>
>
> thank you for the review!
>
> On 04/13/2015 09:50 PM, John Rose wrote:
>> On Apr 13, 2015, at 4:51 AM, Zoltán Majó <zoltan.majo at oracle.com> wrote:
>>>
>>> please review the following patch.
>>
>> Good.  This line has a typo ("encrypBlock" = gang member induction
>> party foul?):
>> +  * AESCrypt.encrypBlock method can be intrinsified on the HotSpot VM
>
> Thanks for catching that. Here is the new webrev:
>
> http://cr.openjdk.java.net/~zmajo/8067648/webrev.01/
>
> Best regards,
>
>
> Zoltan
>
>>
>> - John
>


From jan.civlin at intel.com  Mon Apr 13 10:33:09 2015
From: jan.civlin at intel.com (Civlin, Jan)
Date: Mon, 13 Apr 2015 10:33:09 +0000
Subject: RFR(S): 8076284: Improve vectorization of parallel streams
In-Reply-To: <02FCFB8477C4EF43A2AD8E0C60F3DA2B63334531@FMSMSX112.amr.corp.intel.com>
References: <39F83597C33E5F408096702907E6C450E3E586@ORSMSX104.amr.corp.intel.com>
	<02FCFB8477C4EF43A2AD8E0C60F3DA2B63334516@FMSMSX112.amr.corp.intel.com>
	<39F83597C33E5F408096702907E6C450E3E5A4@ORSMSX104.amr.corp.intel.com>
	<02FCFB8477C4EF43A2AD8E0C60F3DA2B63334531@FMSMSX112.amr.corp.intel.com>
Message-ID: <39F83597C33E5F408096702907E6C450E3E734@ORSMSX104.amr.corp.intel.com>

Hi All,

We would like to contribute an improvement from Intel to the vectorization of parallel streams.
The bug ID for this contribution is 8076284.

Please review this patch:


Bug-id:     https://bugs.openjdk.java.net/browse/JDK-8076284

webrev:  http://cr.openjdk.java.net/~kvn/8076284/webrev/


Description
Improve vectorization of unordered parallel streams (by vectorizing the forEachRemaining method).
For example, this forEach will be vectorized:

    java.util.stream.IntStream iStream = java.util.stream.IntStream.range(0, RANGE - 1).parallel();
    iStream.forEach( id -> c[id] = c[id] + c[id+1] );

It also enables on-demand loop vectorization in a given method (by providing more hints to the SuperWord optimization).
For example, use -XX:CompileCommand=option,computeCall,Vectorize to vectorize this loop:

    void computeCall(double [] Call, double  puByDf, double  pdByDf)
    {
        for(int i = timeStep; i > 0; i--)
            for(int j = 0; j <= i - 1; j++)
                Call[j] = puByDf * Call[j + 1] + pdByDf * Call[j];
    }

This enhancement is contributed by Intel and sponsored by the hotspot compiler team.
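
For anyone who wants to try the stream pattern described above, a self-contained variant might look like the sketch below. The class and method names are made up for illustration. Unlike the snippet in the description, it writes into a separate output array so that the result does not depend on how the parallel stream splits the range; that is a change made here for the sake of a deterministic example, not something the patch requires. Whether the loop is actually vectorized depends on the JVM build and flags.

```java
import java.util.stream.IntStream;

// Hypothetical sketch of a parallel forEach over an index range.
final class ParallelStreamSketch {
    // Adds each element to its right-hand neighbour; needs at least one element.
    // Input and output are distinct arrays, so there is no cross-thread hazard.
    static int[] addNeighbours(int[] in) {
        int[] out = new int[in.length - 1];
        IntStream.range(0, in.length - 1)
                 .parallel()
                 .forEach(id -> out[id] = in[id] + in[id + 1]);
        return out;
    }

    public static void main(String[] args) {
        int[] in = IntStream.rangeClosed(1, 1_000).toArray();
        int[] out = addNeighbours(in);
        System.out.println(out[0] + " " + out[out.length - 1]);
    }
}
```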


From roland.westrelin at oracle.com  Wed Apr 15 09:17:24 2015
From: roland.westrelin at oracle.com (Roland Westrelin)
Date: Wed, 15 Apr 2015 11:17:24 +0200
Subject: RFR(XS): 8074676: java.lang.invoke.PermuteArgsTest.java fails with
	"assert(is_Initialize()) failed: invalid node class"
Message-ID: <4D0538B9-FA75-46FE-8466-C6F75791DE0C@oracle.com>

http://cr.openjdk.java.net/~roland/8074676/webrev.00/

The guards that I added in the Arrays.copyOf() intrinsic can cause the control to become top. The code is missing a check for stopped().

Roland.

From roland.westrelin at oracle.com  Wed Apr 15 09:48:49 2015
From: roland.westrelin at oracle.com (Roland Westrelin)
Date: Wed, 15 Apr 2015 11:48:49 +0200
Subject: RFR(S): 8077832: SA's dumpreplaydata,
	dumpcfg and buildreplayjars are broken
Message-ID: <F72C231A-7282-4898-B76B-0A2C18CEF0F0@oracle.com>

http://cr.openjdk.java.net/~roland/8077832/webrev.00/

I found 3 locations where the SA code is out of sync with the hotspot code.

Roland.

From roland.westrelin at oracle.com  Wed Apr 15 10:16:59 2015
From: roland.westrelin at oracle.com (Roland Westrelin)
Date: Wed, 15 Apr 2015 12:16:59 +0200
Subject: [9] RFR (M): 8057967: CallSite dependency tracking scales
	devastatingly poorly
In-Reply-To: <552527E1.5060102@oracle.com>
References: <551C5B92.8060500@oracle.com> <552527E1.5060102@oracle.com>
Message-ID: <F76DEB60-25C3-4CD4-B71F-C29E364CBBB2@oracle.com>

Hi Vladimir,

>  http://cr.openjdk.java.net/~vlivanov/8057967/webrev.01/hotspot/

In ciCallSite::get_context(), is it safe to manipulate a raw oop the way you do it (with 2 different oops)? Can't it be moved concurrently by the GC?

Roland.

>  http://cr.openjdk.java.net/~vlivanov/8057967/webrev.01/jdk/
> 
> Best regards,
> Vladimir Ivanov
> 
> On 4/1/15 11:56 PM, Vladimir Ivanov wrote:
>> http://cr.openjdk.java.net/~vlivanov/8057967/webrev.00/hotspot/
>> http://cr.openjdk.java.net/~vlivanov/8057967/webrev.00/jdk/
>> https://bugs.openjdk.java.net/browse/JDK-8057967
>> 
>> HotSpot JITs inline very aggressively through CallSites. They
>> optimistically treat CallSite target as constant, but record a nmethod
>> dependency to invalidate the compiled code once CallSite target changes.
>> 
>> Right now, such dependencies have call site class as a context. This
>> context is too coarse and it leads to context pollution: if some
>> CallSite target changes, VM needs to enumerate all nmethods which
>> depend on call sites of such type.
>> 
>> As performance analysis in the bug report shows, it can add up to a
>> significant amount of work.
>> 
>> While working on the fix, I investigated 3 approaches:
>>   (1) unique context per call site
>>   (2) use CallSite target class
>>   (3) use a class the CallSite instance is linked to
>> 
>> Considering call sites are ubiquitous (e.g. 10,000s on some octane
>> benchmarks), loading a dedicated class for every call site is an
>> overkill (even VM anonymous).
>> 
>> CallSite target class
>> (MethodHandle.form->LambdaForm.vmentry->MemberName.clazz->Class<?>) is
>> also not satisfactory, since it is a compiled LambdaForm VM anonymous
>> class, which is heavily shared. It gets context pollution down, but
>> still the overhead is quite high.
>> 
>> So, I decided to focus on (3) and ended up with a mixture of (2) & (3).
>> 
>> Compared to the other options, the complications of (3) are:
>>   - CallSite can stay unlinked (e.g. CallSite.dynamicInvoker()), so
>> there should be some default context VM can use
>> 
>>   - CallSite instances can be shared and it shouldn't keep the context
>> class from unloading;
>> 
>> It motivated a scheme where CallSite context is initialized lazily and
>> can change during lifetime. When CallSite is linked with an indy
>> instruction, its context is initialized. Usually, JIT sees CallSite
>> instances with initialized context (since it reaches them through indy),
>> but if it's not the case and there's no context yet, JIT sets it to
>> "default context", which means "use target call site".
>> 
>> I introduced CallSite$DependencyContext, which represents a nmethod
>> dependency context and points (indirectly) to a Class<?> used as a context.
>> 
>> Context class is referenced through a phantom reference
>> (sun.misc.Cleaner to simplify cleanup). Though it's impossible to
>> extract referent using Reference.get(), VM can access it directly by
>> reading corresponding field. Unlike other types of references, phantom
>> references aren't cleared automatically. It allows VM to access context
>> class until cleanup is performed. And cleanup resets the context to
>> NULL, in addition to invalidating all relevant dependencies.
>> 
>> There are 3 context states a CallSite instance can be in:
>>   (1) NULL: no dependencies
>>   (2) DependencyContext.DEFAULT_CONTEXT: dependencies are stored in
>> call site target class
>>   (3) DependencyContext for some class: dependencies are stored on the
>> class DependencyContext instance points to
>> 
>> Every CallSite starts w/o a context (1) and then lazily gets one ((2) or
>> (3) depending on the situation).
>> 
>> State transitions:
>>   (1->3): When a CallSite w/o a context (1) is linked with some indy
>> call site, its owner is recorded as a context (3).
>> 
>>   (1->2): When JIT needs to record a dependency on a target of a
>> CallSite w/o a context(1), it sets the context to DEFAULT_CONTEXT and
>> uses target class to store the dependency.
>> 
>>   (3->1): When context class becomes unreachable, a cleanup hook
>> invalidates all dependencies on that CallSite and resets the context to
>> NULL (1).
>> 
>> Only (3->1) requires dependency invalidation, because there are no
>> dependencies in (1) and (2->1) isn't performed.
>> 
>> (1->3) is done in Java code (CallSite.initContext) and (1->2) is
>> performed in VM (ciCallSite::get_context()). The updates are performed
>> by CAS, so there's no need for additional synchronization. Other
>> operations on VM side are volatile (to play well with Java code) and
>> performed with Compile_lock held (to avoid races between VM operations).
>> 
>> Some statistics:
>>   Box2D, latest jdk9-dev
>>     - CallSite instances: ~22000
>> 
>>     - invalidated nmethods due to CallSite target changes: ~60
>> 
>>     - checked call_site_target_value dependencies:
>>       - before the fix: ~1,600,000
>>       - after the fix:        ~600
>> 
>> Testing:
>>   - dedicated test which exercises different state transitions
>>   - jdk/java/lang/invoke, hotspot/test/compiler/jsr292, nashorn
>> 
>> Thanks!
>> 
>> Best regards,
>> Vladimir Ivanov
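
The three context states and the CAS-based transitions quoted above can be modelled in plain Java roughly as follows. This is a simplified sketch for readers, not the code in the webrev: the class name, the use of AtomicReference and the method names other than initContext are assumptions, the phantom-reference indirection is left out, and dependency invalidation is only hinted at in a comment.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical model of the CallSite context state machine.
final class CallSiteContextSketch {
    // Stand-in for DependencyContext.DEFAULT_CONTEXT, state (2).
    static final Object DEFAULT_CONTEXT = new Object();

    // null models state (1); DEFAULT_CONTEXT models (2); a Class models (3).
    private final AtomicReference<Object> context = new AtomicReference<>(null);

    // (1->3): linking with an indy call site records the owner class.
    // The CAS succeeds only if no context has been set yet.
    boolean initContext(Class<?> owner) {
        return context.compareAndSet(null, owner);
    }

    // (1->2): the compiler needs some context; install the default one if
    // nothing is there, then return whatever is current.
    Object getContext() {
        context.compareAndSet(null, DEFAULT_CONTEXT);
        return context.get();
    }

    // (3->1): cleanup once the context class is unreachable. A real
    // implementation would also invalidate dependent compiled code here.
    boolean cleanup(Class<?> owner) {
        return context.compareAndSet(owner, null);
    }

    public static void main(String[] args) {
        CallSiteContextSketch cs = new CallSiteContextSketch();
        System.out.println(cs.initContext(String.class));        // (1->3) succeeds
        System.out.println(cs.getContext() == String.class);     // stays in (3)
        System.out.println(cs.cleanup(String.class));            // (3->1)
        System.out.println(cs.getContext() == DEFAULT_CONTEXT);  // (1->2)
    }
}
```

Because every update goes through compareAndSet on a single reference, a late initContext call on a site that already fell back to the default context simply fails, which matches the note that (2->1) is never performed.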


From vitalyd at gmail.com  Wed Apr 15 14:10:41 2015
From: vitalyd at gmail.com (Vitaly Davidovich)
Date: Wed, 15 Apr 2015 10:10:41 -0400
Subject: CHA for interfaces in C2 compiler
Message-ID: <CAHjP37FBM9JxaCdZCju_NFcfyF7HGURY5Evjc6+o+8cYfgmsZA@mail.gmail.com>

Hi guys,

So CHA on classes works nicely in the case of only one subtype loaded.
What about interfaces? Currently, it looks like no such
optimization/analysis is done.  In my experience, there's a substantial
amount of code that exposes an interface via some API, but then loads only
one implementation of it.  The interface is used instead of an abstract class to
allow more flexibility in the future.

I fully realize that lots of interfaces have more than 1 implementer loaded
at runtime, but I also think it's worthwhile to attempt CHA for them.

Is this something that's feasible to do? It would require more class
loading dependencies to be tracked, but I'm also fine with having this be
an extra flag that I can use to enable/disable this optimization.
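
To make the pattern concrete, here is a minimal sketch of the kind of code in question: an interface exposed through an API with exactly one implementation loaded. All names are made up for illustration. The call to codec.encode() is an interface call; the question above is whether the compiler could treat it like a direct call to the single implementor as long as no second one is loaded.

```java
// Hypothetical example of an interface with a single loaded implementor.
final class SingleImplementorSketch {
    // The receiver type is only known to be Codec here, so this compiles to
    // an interface call even though XorCodec is the only implementation.
    static long sum(Codec codec, int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += codec.encode(i);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(new XorCodec(5), 4));
    }
}

interface Codec {
    int encode(int value);
}

final class XorCodec implements Codec {
    private final int mask;

    XorCodec(int mask) {
        this.mask = mask;
    }

    @Override
    public int encode(int value) {
        return value ^ mask;
    }
}
```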

Thoughts?

Thanks

From dmitry.samersoff at oracle.com  Wed Apr 15 14:24:11 2015
From: dmitry.samersoff at oracle.com (Dmitry Samersoff)
Date: Wed, 15 Apr 2015 17:24:11 +0300
Subject: RFR(S): 8077832: SA's dumpreplaydata, dumpcfg and buildreplayjars
	are broken
In-Reply-To: <F72C231A-7282-4898-B76B-0A2C18CEF0F0@oracle.com>
References: <F72C231A-7282-4898-B76B-0A2C18CEF0F0@oracle.com>
Message-ID: <552E748B.4090202@oracle.com>

Roland,

Looks good to me.

-Dmitry

On 2015-04-15 12:48, Roland Westrelin wrote:
> http://cr.openjdk.java.net/~roland/8077832/webrev.00/
> 
> I found 3 locations where the SA code is out of sync with the hotspot code.
> 
> Roland.
> 


-- 
Dmitry Samersoff
Oracle Java development team, Saint Petersburg, Russia
* I would love to change the world, but they won't give me the sources.

From forax at univ-mlv.fr  Wed Apr 15 14:24:22 2015
From: forax at univ-mlv.fr (Remi Forax)
Date: Wed, 15 Apr 2015 16:24:22 +0200
Subject: CHA for interfaces in C2 compiler
In-Reply-To: <CAHjP37FBM9JxaCdZCju_NFcfyF7HGURY5Evjc6+o+8cYfgmsZA@mail.gmail.com>
References: <CAHjP37FBM9JxaCdZCju_NFcfyF7HGURY5Evjc6+o+8cYfgmsZA@mail.gmail.com>
Message-ID: <552E7496.3080304@univ-mlv.fr>


On 04/15/2015 04:10 PM, Vitaly Davidovich wrote:
> Hi guys,
>
> So CHA on classes works nicely in the case of only one subtype 
> loaded.  What about interfaces? Currently, it looks like no such 
> optimization/analysis is done.  In my experience, there's a 
> substantial amount of code that exposes an interface via some API, but 
> then loads only one implementation of it.  The interface is used instead 
> of an abstract class to allow more flexibility in the future.
>
> I fully realize that lots of interfaces have more than 1 implementer 
> loaded at runtime, but I also think it's worthwhile to attempt CHA for 
> them.
>
> Is this something that's feasible to do? It would require more class 
> loading dependencies to be tracked, but I'm also fine with having this 
> be an extra flag that I can use to enable/disable this optimization.
>
> Thoughts?
>
> Thanks

I've implemented something like this in a language (which has a special 
syntax for calling Java objects).
To avoid keeping too much metadata, I used a simple heuristic: the 
idea is that an interface with a lot of methods does not have a lot of 
implementations, so the runtime only tried to do CHA, using a 
SwitchPoint, if there were 3 or more methods in the interface.
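
A guard of this kind can be sketched with java.lang.invoke.SwitchPoint as shown below. The interface, the implementing class and the helper names are hypothetical, and the threshold is read here as "3 or more methods"; the runtime mentioned above may have been organized quite differently. The fast path calls the single known implementation directly, and invalidating the switch point (for example when a second implementation shows up) sends every caller to the ordinary interface dispatch.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.SwitchPoint;

// Hypothetical sketch of CHA-style speculation guarded by a SwitchPoint.
final class SwitchPointChaSketch {
    static final int MIN_METHODS = 3; // assumed reading of the threshold

    // Heuristic: only speculate on interfaces with enough methods.
    static boolean worthTrying(Class<?> type) {
        return type.isInterface() && type.getMethods().length >= MIN_METHODS;
    }

    // Handle for Shape.area(): a direct call to Square.area() while the
    // switch point is valid, plain interface dispatch once it is invalidated.
    static MethodHandle areaHandle(SwitchPoint singleImpl) {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodType noArgsToDouble = MethodType.methodType(double.class);
            // The fast path casts the receiver to Square, so it is only safe
            // while Square really is the only implementation in use.
            MethodHandle direct = lookup
                .findVirtual(Square.class, "area", noArgsToDouble)
                .asType(MethodType.methodType(double.class, Shape.class));
            MethodHandle viaInterface =
                lookup.findVirtual(Shape.class, "area", noArgsToDouble);
            return singleImpl.guardWithTest(direct, viaInterface);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Small wrapper so callers do not have to deal with Throwable.
    static double area(MethodHandle handle, Shape shape) {
        try {
            return (double) handle.invoke(shape);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        SwitchPoint singleImpl = new SwitchPoint();
        MethodHandle handle = areaHandle(singleImpl);
        System.out.println(area(handle, new Square(2)));   // fast path
        SwitchPoint.invalidateAll(new SwitchPoint[] { singleImpl });
        System.out.println(area(handle, new Square(2)));   // fallback path
    }
}

interface Shape {
    double area();
    double perimeter();
    String name();
}

final class Square implements Shape {
    private final double side;

    Square(double side) {
        this.side = side;
    }

    @Override public double area() { return side * side; }
    @Override public double perimeter() { return 4 * side; }
    @Override public String name() { return "square"; }
}
```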

cheers,
Rémi


From vitalyd at gmail.com  Wed Apr 15 14:26:51 2015
From: vitalyd at gmail.com (Vitaly Davidovich)
Date: Wed, 15 Apr 2015 10:26:51 -0400
Subject: CHA for interfaces in C2 compiler
In-Reply-To: <552E7496.3080304@univ-mlv.fr>
References: <CAHjP37FBM9JxaCdZCju_NFcfyF7HGURY5Evjc6+o+8cYfgmsZA@mail.gmail.com>
	<552E7496.3080304@univ-mlv.fr>
Message-ID: <CAHjP37EiRiFSYO3h90k2doQcqw6iWFG9g8-dTtSaisdo4ZEJyA@mail.gmail.com>

A heuristic like that would work for most of my cases as well :).

On Wed, Apr 15, 2015 at 10:24 AM, Remi Forax <forax at univ-mlv.fr> wrote:

>
> On 04/15/2015 04:10 PM, Vitaly Davidovich wrote:
>
>> Hi guys,
>>
>> So CHA on classes works nicely in the case of only one subtype loaded.
>> What about interfaces? Currently, it looks like no such
>> optimization/analysis is done.  In my experience, there's a substantial
>> amount of code that exposes an interface via some API, but then loads only
>> one implementation of it.  The interface is used instead of an abstract class to
>> allow more flexibility in the future.
>>
>> I fully realize that lots of interfaces have more than 1 implementer
>> loaded at runtime, but I also think it's worthwhile to attempt CHA for them.
>>
>> Is this something that's feasible to do? It would require more class
>> loading dependencies to be tracked, but I'm also fine with having this be
>> an extra flag that I can use to enable/disable this optimization.
>>
>> Thoughts?
>>
>> Thanks
>>
>
> I've implemented something like this in a language (which has a special
> syntax for calling Java objects).
> To avoid keeping too much metadata, I used a simple heuristic: the idea
> is that an interface with a lot of methods does not have a lot of
> implementations, so the runtime only tried to do CHA, using a SwitchPoint,
> if there were 3 or more methods in the interface.
>
> cheers,
> Rémi
>
>

From staffan.larsen at oracle.com  Wed Apr 15 14:28:06 2015
From: staffan.larsen at oracle.com (Staffan Larsen)
Date: Wed, 15 Apr 2015 16:28:06 +0200
Subject: RFR(S): 8077832: SA's dumpreplaydata,
	dumpcfg and buildreplayjars are broken
In-Reply-To: <F72C231A-7282-4898-B76B-0A2C18CEF0F0@oracle.com>
References: <F72C231A-7282-4898-B76B-0A2C18CEF0F0@oracle.com>
Message-ID: <04E2ECF6-B209-48C8-8B8D-7B28FB1ABCEC@oracle.com>

Looks good!

Thanks,
/Staffan

> On 15 apr 2015, at 11:48, Roland Westrelin <roland.westrelin at oracle.com> wrote:
> 
> http://cr.openjdk.java.net/~roland/8077832/webrev.00/
> 
> I found 3 locations where the SA code is out of sync with the hotspot code.
> 
> Roland.


From vladimir.x.ivanov at oracle.com  Wed Apr 15 14:35:50 2015
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Wed, 15 Apr 2015 17:35:50 +0300
Subject: RFR(XS): 8074676: java.lang.invoke.PermuteArgsTest.java fails
	with "assert(is_Initialize()) failed: invalid node class"
In-Reply-To: <4D0538B9-FA75-46FE-8466-C6F75791DE0C@oracle.com>
References: <4D0538B9-FA75-46FE-8466-C6F75791DE0C@oracle.com>
Message-ID: <552E7746.7050303@oracle.com>

Looks good.

Best regards,
Vladimir Ivanov

On 4/15/15 12:17 PM, Roland Westrelin wrote:
> http://cr.openjdk.java.net/~roland/8074676/webrev.00/
>
> The guards that I added in the Arrays.copyOf() intrinsic can cause the control to become top. The code is missing a check for stopped().
>
> Roland.
>

From gustav.r.akesson at gmail.com  Wed Apr 15 14:53:46 2015
From: gustav.r.akesson at gmail.com (Gustav Åkesson)
Date: Wed, 15 Apr 2015 16:53:46 +0200
Subject: CHA for interfaces in C2 compiler
In-Reply-To: <CAHjP37EiRiFSYO3h90k2doQcqw6iWFG9g8-dTtSaisdo4ZEJyA@mail.gmail.com>
References: <CAHjP37FBM9JxaCdZCju_NFcfyF7HGURY5Evjc6+o+8cYfgmsZA@mail.gmail.com>
	<552E7496.3080304@univ-mlv.fr>
	<CAHjP37EiRiFSYO3h90k2doQcqw6iWFG9g8-dTtSaisdo4ZEJyA@mail.gmail.com>
Message-ID: <CAKEw5+7G398JLVY32iwE1=hVzr1dMi1qTu3mq_CCqMJRa_WbBg@mail.gmail.com>

Hi,

I was surprised by this finding after reading Shipilev's blog. In the huge
Java code base I'm currently working in, we have a significant amount of
interfaces with a single implementing class, and hardly any abstract
classes.

From a use-case perspective I would gladly welcome an attempt to improve
the CHA for interfaces.

Best regards,
Gustav Åkesson
On 15 Apr 2015 at 16:27, "Vitaly Davidovich" <vitalyd at gmail.com> wrote:

> A heuristic like that would work for most of my cases as well :).
>
> On Wed, Apr 15, 2015 at 10:24 AM, Remi Forax <forax at univ-mlv.fr> wrote:
>
>>
>> On 04/15/2015 04:10 PM, Vitaly Davidovich wrote:
>>
>>> Hi guys,
>>>
>>> So CHA on classes works nicely in the case of only one subtype loaded.
>>> What about interfaces? Currently, it looks like no such
>>> optimization/analysis is done.  In my experience, there's a substantial
>>> amount of code that exposes an interface via some API, but then loads only
>>> one implementation of it.  The interface is used instead of an abstract class to
>>> allow more flexibility in the future.
>>>
>>> I fully realize that lots of interfaces have more than 1 implementer
>>> loaded at runtime, but I also think it's worthwhile to attempt CHA for them.
>>>
>>> Is this something that's feasible to do? It would require more class
>>> loading dependencies to be tracked, but I'm also fine with having this be
>>> an extra flag that I can use to enable/disable this optimization.
>>>
>>> Thoughts?
>>>
>>> Thanks
>>>
>>
>> I've implemented something like this in a language (which has a special
>> syntax for calling Java objects).
>> To avoid keeping too much metadata, I used a simple heuristic: the
>> idea is that an interface with a lot of methods does not have a lot of
>> implementations, so the runtime only tried to do CHA, using a SwitchPoint,
>> if there were 3 or more methods in the interface.
>>
>> cheers,
>> Rémi
>>
>>
>

From vladimir.x.ivanov at oracle.com  Wed Apr 15 15:55:24 2015
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Wed, 15 Apr 2015 18:55:24 +0300
Subject: [9] RFR (M): 8057967: CallSite dependency tracking scales
	devastatingly poorly
In-Reply-To: <F76DEB60-25C3-4CD4-B71F-C29E364CBBB2@oracle.com>
References: <551C5B92.8060500@oracle.com> <552527E1.5060102@oracle.com>
	<F76DEB60-25C3-4CD4-B71F-C29E364CBBB2@oracle.com>
Message-ID: <552E89EC.7080900@oracle.com>

Roland, thanks for looking into the fix!

You are right.
I moved VM_ENTRY_MARK to the beginning of the method [1].

Updated webrev in place.
   http://cr.openjdk.java.net/~vlivanov/8057967/webrev.01/

Best regards,
Vladimir Ivanov

[1]
diff --git a/src/share/vm/ci/ciCallSite.cpp b/src/share/vm/ci/ciCallSite.cpp
--- a/src/share/vm/ci/ciCallSite.cpp
+++ b/src/share/vm/ci/ciCallSite.cpp
@@ -55,6 +55,8 @@
  // Return the target MethodHandle of this CallSite.
  ciKlass* ciCallSite::get_context() {
    assert(!is_constant_call_site(), "");
+
+  VM_ENTRY_MARK;
    oop call_site_oop = get_oop();
    InstanceKlass* ctxk = MethodHandles::get_call_site_context(call_site_oop);
    if (ctxk == NULL) {
@@ -63,7 +65,6 @@
      java_lang_invoke_CallSite::set_context_cas(call_site_oop, def_context_oop, /*expected=*/NULL);
      ctxk = MethodHandles::get_call_site_context(call_site_oop);
    }
-  VM_ENTRY_MARK;
    return (CURRENT_ENV->get_metadata(ctxk))->as_klass();
  }


On 4/15/15 1:16 PM, Roland Westrelin wrote:
> Hi Vladimir,
>
>>   http://cr.openjdk.java.net/~vlivanov/8057967/webrev.01/hotspot/
>
> In ciCallSite::get_context(), is it safe to manipulate a raw oop the way you do it (with 2 different oops)? Can't it be moved concurrently by the GC?
>
> Roland.
>
>>   http://cr.openjdk.java.net/~vlivanov/8057967/webrev.01/jdk/
>>
>> Best regards,
>> Vladimir Ivanov
>>
>> On 4/1/15 11:56 PM, Vladimir Ivanov wrote:
>>> http://cr.openjdk.java.net/~vlivanov/8057967/webrev.00/hotspot/
>>> http://cr.openjdk.java.net/~vlivanov/8057967/webrev.00/jdk/
>>> https://bugs.openjdk.java.net/browse/JDK-8057967
>>>
>>> HotSpot JITs inline very aggressively through CallSites. They
>>> optimistically treat CallSite target as constant, but record a nmethod
>>> dependency to invalidate the compiled code once CallSite target changes.
>>>
>>> Right now, such dependencies have call site class as a context. This
>>> context is too coarse and it leads to context pollution: if some
>>> CallSite target changes, VM needs to enumerate all nmethods which
>>> depend on call sites of such type.
>>>
>>> As performance analysis in the bug report shows, it can add up to a
>>> significant amount of work.
>>>
>>> While working on the fix, I investigated 3 approaches:
>>>    (1) unique context per call site
>>>    (2) use CallSite target class
>>>    (3) use a class the CallSite instance is linked to
>>>
>>> Considering call sites are ubiquitous (e.g. 10,000s on some octane
>>> benchmarks), loading a dedicated class for every call site is an
>>> overkill (even VM anonymous).
>>>
>>> CallSite target class
>>> (MethodHandle.form->LambdaForm.vmentry->MemberName.clazz->Class<?>) is
>>> also not satisfactory, since it is a compiled LambdaForm VM anonymous
>>> class, which is heavily shared. It gets context pollution down, but
>>> still the overhead is quite high.
>>>
>>> So, I decided to focus on (3) and ended up with a mixture of (2) & (3).
>>>
>>> Comparing to other options, the complications of (3) are:
>>>    - CallSite can stay unlinked (e.g. CallSite.dynamicInvoker()), so
>>> there should be some default context VM can use
>>>
>>>    - CallSite instances can be shared and it shouldn't keep the context
>>> class from unloading;
>>>
>>> It motivated a scheme where CallSite context is initialized lazily and
>>> can change during lifetime. When CallSite is linked with an indy
>>> instruction, its context is initialized. Usually, JIT sees CallSite
>>> instances with initialized context (since it reaches them through indy),
>>> but if it's not the case and there's no context yet, JIT sets it to
>>> "default context", which means "use target call site".
>>>
>>> I introduced CallSite$DependencyContext, which represents a nmethod
>>> dependency context and points (indirectly) to a Class<?> used as a context.
>>>
>>> Context class is referenced through a phantom reference
>>> (sun.misc.Cleaner to simplify cleanup). Though it's impossible to
>>> extract referent using Reference.get(), VM can access it directly by
>>> reading corresponding field. Unlike other types of references, phantom
>>> references aren't cleared automatically. It allows VM to access context
>>> class until cleanup is performed. And cleanup resets the context to
>>> NULL, in addition to invalidating all relevant dependencies.
>>>
>>> There are 3 context states a CallSite instance can be in:
>>>    (1) NULL: no dependencies
>>>    (2) DependencyContext.DEFAULT_CONTEXT: dependencies are stored in
>>> call site target class
>>>    (3) DependencyContext for some class: dependencies are stored on the
>>> class DependencyContext instance points to
>>>
>>> Every CallSite starts w/o a context (1) and then lazily gets one ((2) or
>>> (3) depending on the situation).
>>>
>>> State transitions:
>>>    (1->3): When a CallSite w/o a context (1) is linked with some indy
>>> call site, its owner is recorded as a context (3).
>>>
>>>    (1->2): When JIT needs to record a dependency on a target of a
>>> CallSite w/o a context(1), it sets the context to DEFAULT_CONTEXT and
>>> uses target class to store the dependency.
>>>
>>>    (3->1): When context class becomes unreachable, a cleanup hook
>>> invalidates all dependencies on that CallSite and resets the context to
>>> NULL (1).
>>>
>>> Only (3->1) requires dependency invalidation, because there are no
>>> dependencies in (1) and (2->1) isn't performed.
>>>
>>> (1->3) is done in Java code (CallSite.initContext) and (1->2) is
>>> performed in VM (ciCallSite::get_context()). The updates are performed
>>> by CAS, so there's no need in additional synchronization. Other
>>> operations on VM side are volatile (to play well with Java code) and
>>> performed with Compile_lock held (to avoid races between VM operations).
>>>
>>> Some statistics:
>>>    Box2D, latest jdk9-dev
>>>      - CallSite instances: ~22000
>>>
>>>      - invalidated nmethods due to CallSite target changes: ~60
>>>
>>>      - checked call_site_target_value dependencies:
>>>        - before the fix: ~1,600,000
>>>        - after the fix:        ~600
>>>
>>> Testing:
>>>    - dedicated test which exercises different state transitions
>>>    - jdk/java/lang/invoke, hotspot/test/compiler/jsr292, nashorn
>>>
>>> Thanks!
>>>
>>> Best regards,
>>> Vladimir Ivanov
>
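
The three context states and the transitions between them, as described in
the quoted proposal, can be sketched in plain Java. This is only an
illustration under assumptions: the class CallSiteContextSketch, its method
names and the use of AtomicReference are hypothetical and are not the code
from the webrev, which uses CallSite$DependencyContext on the Java side and
ciCallSite::get_context() in the VM. Dependency storage and invalidation
are left out entirely.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical model of the lazily initialized CallSite context.
// State (1): null, state (2): DEFAULT_CONTEXT, state (3): an owner Class.
public class CallSiteContextSketch {
    // Stand-in for DependencyContext.DEFAULT_CONTEXT (assumed name).
    static final Object DEFAULT_CONTEXT = new Object();

    private final AtomicReference<Object> context = new AtomicReference<>();

    // (1->3): linking with an indy call site records the owner class.
    // The CAS only succeeds if no context has been set yet.
    boolean linkWithOwner(Class<?> owner) {
        return context.compareAndSet(null, owner);
    }

    // (1->2): the JIT needs a context to record a dependency. If none is
    // set, fall back to the default context; otherwise keep what is there.
    // In this sketch a concurrent cleanup could still reset the value
    // between the CAS and the read; the real code relies on additional
    // locking for VM-side operations.
    Object contextForDependency() {
        context.compareAndSet(null, DEFAULT_CONTEXT);
        return context.get();
    }

    // (3->1): cleanup once the owner class becomes unreachable.
    // Only a context that still points to that owner is reset.
    boolean cleanup(Class<?> owner) {
        return context.compareAndSet(owner, null);
    }

    public static void main(String[] args) {
        CallSiteContextSketch cs = new CallSiteContextSketch();
        System.out.println("linked: " + cs.linkWithOwner(String.class));
        System.out.println("context is owner: "
                + (cs.contextForDependency() == String.class));
        System.out.println("cleaned up: " + cs.cleanup(String.class));
        System.out.println("context is default: "
                + (cs.contextForDependency() == DEFAULT_CONTEXT));
    }
}
```

Because every update is a compare-and-set against an expected value, a
losing racer simply observes the winner's context, which matches the
remark in the proposal that the updates need no additional synchronization.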

From vladimir.x.ivanov at oracle.com  Wed Apr 15 16:26:15 2015
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Wed, 15 Apr 2015 19:26:15 +0300
Subject: CHA for interfaces in C2 compiler
In-Reply-To: <CAHjP37FBM9JxaCdZCju_NFcfyF7HGURY5Evjc6+o+8cYfgmsZA@mail.gmail.com>
References: <CAHjP37FBM9JxaCdZCju_NFcfyF7HGURY5Evjc6+o+8cYfgmsZA@mail.gmail.com>
Message-ID: <552E9127.9030908@oracle.com>

Vitaly,

Type profiling reliably detects single interface implementation cases
and type check overhead is completely eliminated in most cases
(type checks are aggressively commoned).

Do you still think it is worth the effort?

Best regards,
Vladimir Ivanov

On 4/15/15 5:10 PM, Vitaly Davidovich wrote:
> Hi guys,
>
> So CHA on classes works nicely in the case of only one subtype loaded.
> What about interfaces? Currently, it looks like no such
> optimization/analysis is done.  In my experience, there's a substantial
> amount of code that exposes an interface via some API, but then loads
> only one implementation of it.  The interface is used instead of abstract
> class to allow more flexibility in the future.
>
> I fully realize that lots of interfaces have more than 1 implementer
> loaded at runtime, but I also think it's worthwhile to attempt CHA for them.
>
> Is this something that's feasible to do? It would require more class
> loading dependencies to be tracked, but I'm also fine with having this
> be an extra flag that I can use to enable/disable this optimization.
>
> Thoughts?
>
> Thanks

From vitalyd at gmail.com  Wed Apr 15 16:37:50 2015
From: vitalyd at gmail.com (Vitaly Davidovich)
Date: Wed, 15 Apr 2015 12:37:50 -0400
Subject: CHA for interfaces in C2 compiler
In-Reply-To: <552E9127.9030908@oracle.com>
References: <CAHjP37FBM9JxaCdZCju_NFcfyF7HGURY5Evjc6+o+8cYfgmsZA@mail.gmail.com>
	<552E9127.9030908@oracle.com>
Message-ID: <CAHjP37FxcQExHVZ+FGFfnSxfdTcgfaWKXtJM1+6pPNCMc3tK6g@mail.gmail.com>

Hi Vladimir,

Here's what I see on 7u60:

    private static int doIt(final Foo f) {
        return f.num();
    }

    interface Foo
    {
        int num();
    }

    final class FooImpl implements Foo
    {
        @Override
        public int num() {
            return 1;
        }
    }

Running a simple test where only FooImpl is loaded (in fact, it's the only
impl period) produces the following asm (stripped down to essentials):

  0x00007f0b31e14a6c: mov    0x8(%rsi),%r10d    ; implicit exception: dispatches to 0x00007f0b31e14a9d
  0x00007f0b31e14a70: cmp    $0x71c9e068,%r10d  ;   {oop('FooImpl')}
  0x00007f0b31e14a77: jne    0x00007f0b31e14a8a
  0x00007f0b31e14a79: mov    $0x1,%eax
  0x00007f0b31e14a7e: add    $0x10,%rsp
  0x00007f0b31e14a82: pop    %rbp

If I change Foo to be an abstract class, we get this:

  0x00007f0209deb18c: test   %rsi,%rsi
  0x00007f0209deb18f: je     0x00007f0209deb1a2
  0x00007f0209deb191: mov    $0x1,%eax
  0x00007f0209deb196: add    $0x10,%rsp
  0x00007f0209deb19a: pop    %rbp

So there's an explicit null check but no type check.

Did something change in java 8 or 9 that leads you to say "completely
eliminated"?

Thanks

On Wed, Apr 15, 2015 at 12:26 PM, Vladimir Ivanov <
vladimir.x.ivanov at oracle.com> wrote:

> Vitaly,
>
> Type profiling reliably detects single interface implementation cases and
> type check overhead is completely eliminated in most of the cases (type
> checks are aggressively commoned).
>
> Do you still think it is worth an effort?
>
> Best regards,
> Vladimir Ivanov
>
>
> On 4/15/15 5:10 PM, Vitaly Davidovich wrote:
>
>> Hi guys,
>>
>> So CHA on classes works nicely in the case of only one subtype loaded.
>> What about interfaces? Currently, it looks like no such
>> optimization/analysis is done.  In my experience, there's a substantial
>> amount of code that exposes an interface via some API, but then loads
>> only implementation of it.  The interface is used instead of abstract
>> class to allow more flexibility in the future.
>>
>> I fully realize that lots of interfaces have more than 1 implementer
>> loaded at runtime, but I also think it's worthwhile to attempt CHA for
>> them.
>>
>> Is this something that's feasible to do? It would require more class
>> loading dependencies to be tracked, but I'm also fine with having this
>> be an extra flag that I can use to enable/disable this optimization.
>>
>> Thoughts?
>>
>> Thanks
>>
>

From roland.westrelin at oracle.com  Wed Apr 15 16:43:04 2015
From: roland.westrelin at oracle.com (Roland Westrelin)
Date: Wed, 15 Apr 2015 18:43:04 +0200
Subject: [9] RFR (M): 8057967: CallSite dependency tracking scales
	devastatingly poorly
In-Reply-To: <552E89EC.7080900@oracle.com>
References: <551C5B92.8060500@oracle.com> <552527E1.5060102@oracle.com>
	<F76DEB60-25C3-4CD4-B71F-C29E364CBBB2@oracle.com>
	<552E89EC.7080900@oracle.com>
Message-ID: <AB789311-7EEB-47B3-BCD6-F381A70E8386@oracle.com>

>  http://cr.openjdk.java.net/~vlivanov/8057967/webrev.01/

That looks good to me.

Roland.

From vladimir.x.ivanov at oracle.com  Wed Apr 15 17:02:23 2015
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Wed, 15 Apr 2015 20:02:23 +0300
Subject: CHA for interfaces in C2 compiler
In-Reply-To: <CAHjP37FxcQExHVZ+FGFfnSxfdTcgfaWKXtJM1+6pPNCMc3tK6g@mail.gmail.com>
References: <CAHjP37FBM9JxaCdZCju_NFcfyF7HGURY5Evjc6+o+8cYfgmsZA@mail.gmail.com>	<552E9127.9030908@oracle.com>
	<CAHjP37FxcQExHVZ+FGFfnSxfdTcgfaWKXtJM1+6pPNCMc3tK6g@mail.gmail.com>
Message-ID: <552E999F.1080707@oracle.com>

Nothing changed in 8 & 9 in this respect.

You are looking at a microbenchmark, where you have a trivial method
which contains just a single call. My point is that it's a corner case
and you shouldn't notice the difference in a larger application.

Null checks are pervasive on Java level, but for JIT compiler it is
enough to perform it only once on a value to know the value is non-null
afterwards.

The same applies to exact type checks: dominating exact type check
eliminates the need to repeat the type check. It is recorded in C2 type
system and propagated to all usages.

Every place where type profiling for that interface happens, a single
exact type will be recorded.

Please note that CHA is more generic and covers the cases when numerous
classes have a single method implementation. Type profiling is usually
useless in such cases.

But in your example there's a single implementing class, so type profile 
works fine.
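
The effect of a dominating exact type check can be approximated by hand in
Java. The sketch below is an assumption-laden illustration, not C2 output:
the class GuardedInlineSketch and the method sumTwiceGuarded are made up,
and the fallback branch stands in for whatever the compiled code does when
the guard fails. Foo and FooImpl mirror the example from earlier in the
thread.

```java
// Hand-written approximation of profile-guided inlining behind one guard.
public class GuardedInlineSketch {
    interface Foo {
        int num();
    }

    static final class FooImpl implements Foo {
        @Override
        public int num() {
            return 1;
        }
    }

    // What the programmer writes: two interface calls on the same receiver.
    static int sumTwice(Foo f) {
        return f.num() + f.num();
    }

    // Roughly what the thread describes: one exact type check on f
    // dominates both call sites, so both calls can use the inlined body
    // of FooImpl.num() without repeating the check.
    static int sumTwiceGuarded(Foo f) {
        if (f.getClass() == FooImpl.class) { // also acts as the null check
            return 1 + 1;                    // inlined FooImpl.num(), twice
        }
        return f.num() + f.num();            // stand-in for the slow path
    }

    public static void main(String[] args) {
        Foo f = new FooImpl();
        System.out.println(sumTwice(f) == sumTwiceGuarded(f));
    }
}
```

With a single call in the method, as in the microbenchmark, the guard is
the only cost left, which is why it shows up so clearly in the assembly.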

Best regards,
Vladimir Ivanov

On 4/15/15 7:37 PM, Vitaly Davidovich wrote:
> Hi Vladimir,
>
> Here's what I see on 7u60:
>
> private static int doIt(final Foo f) {
> return f.num();
>      }
>
>      interface Foo
>      {
> int num();
>      }
>
>      final class FooImpl implements Foo
>      {
> @Override
> public int num() {
>     return 1;
> }
>      }
>
> Running a simple test where only FooImpl is loaded (in fact, it's the
> only impl period) produces the following asm (stripped down to essentials):
>
>    0x00007f0b31e14a6c: mov    0x8(%rsi),%r10d    ; implicit exception:
> dispatches to 0x00007f0b31e14a9d
>    0x00007f0b31e14a70: cmp    $0x71c9e068,%r10d  ;   {oop('FooImpl')}
>    0x00007f0b31e14a77: jne    0x00007f0b31e14a8a
>    0x00007f0b31e14a79: mov    $0x1,%eax
>    0x00007f0b31e14a7e: add    $0x10,%rsp
>    0x00007f0b31e14a82: pop    %rbp
>
> If I change Foo to be an abstract class, we get this:
>
> 0x00007f0209deb18c: test   %rsi,%rsi
>    0x00007f0209deb18f: je     0x00007f0209deb1a2
>    0x00007f0209deb191: mov    $0x1,%eax
>    0x00007f0209deb196: add    $0x10,%rsp
>    0x00007f0209deb19a: pop    %rbp
>
> So there's an explicit null check but no type check.
>
> Did something change in java 8 or 9 that leads you to say "completely
> eliminated"?
>
> Thanks
>
> On Wed, Apr 15, 2015 at 12:26 PM, Vladimir Ivanov
> <vladimir.x.ivanov at oracle.com <mailto:vladimir.x.ivanov at oracle.com>> wrote:
>
>     Vitaly,
>
>     Type profiling reliably detects single interface implementation
>     cases and type check overhead is completely eliminated in most of
>     the cases (type checks are aggressively commoned).
>
>     Do you still think it is worth an effort?
>
>     Best regards,
>     Vladimir Ivanov
>
>
>     On 4/15/15 5:10 PM, Vitaly Davidovich wrote:
>
>         Hi guys,
>
>         So CHA on classes works nicely in the case of only one subtype
>         loaded.
>         What about interfaces? Currently, it looks like no such
>         optimization/analysis is done.  In my experience, there's a
>         substantial
>         amount of code that exposes an interface via some API, but then
>         loads
>         only implementation of it.  The interface is used instead of
>         abstract
>         class to allow more flexibility in the future.
>
>         I fully realize that lots of interfaces have more than 1 implementer
>         loaded at runtime, but I also think it's worthwhile to attempt
>         CHA for them.
>
>         Is this something that's feasible to do? It would require more class
>         loading dependencies to be tracked, but I'm also fine with
>         having this
>         be an extra flag that I can use to enable/disable this optimization.
>
>         Thoughts?
>
>         Thanks
>
>

From vladimir.kozlov at oracle.com  Wed Apr 15 17:23:02 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 15 Apr 2015 10:23:02 -0700
Subject: RFR(XS): 8074676: java.lang.invoke.PermuteArgsTest.java fails
	with "assert(is_Initialize()) failed: invalid node class"
In-Reply-To: <4D0538B9-FA75-46FE-8466-C6F75791DE0C@oracle.com>
References: <4D0538B9-FA75-46FE-8466-C6F75791DE0C@oracle.com>
Message-ID: <552E9E76.1080006@oracle.com>

Good.

Thanks,
Vladimir

On 4/15/15 2:17 AM, Roland Westrelin wrote:
> http://cr.openjdk.java.net/~roland/8074676/webrev.00/
>
> The guards that I added in the Arrays.copyOf() intrinsic can cause the control to become top. The code is missing a check for stopped().
>
> Roland.
>

From vladimir.kozlov at oracle.com  Wed Apr 15 17:24:01 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 15 Apr 2015 10:24:01 -0700
Subject: RFR(S): 8077832: SA's dumpreplaydata, dumpcfg and buildreplayjars
	are broken
In-Reply-To: <F72C231A-7282-4898-B76B-0A2C18CEF0F0@oracle.com>
References: <F72C231A-7282-4898-B76B-0A2C18CEF0F0@oracle.com>
Message-ID: <552E9EB1.6070607@oracle.com>

Looks good.

Thanks,
Vladimir

On 4/15/15 2:48 AM, Roland Westrelin wrote:
> http://cr.openjdk.java.net/~roland/8077832/webrev.00/
>
> I found 3 locations where the SA code is out of sync with the hotspot code.
>
> Roland.
>

From vitalyd at gmail.com  Wed Apr 15 17:40:18 2015
From: vitalyd at gmail.com (Vitaly Davidovich)
Date: Wed, 15 Apr 2015 13:40:18 -0400
Subject: CHA for interfaces in C2 compiler
In-Reply-To: <552E999F.1080707@oracle.com>
References: <CAHjP37FBM9JxaCdZCju_NFcfyF7HGURY5Evjc6+o+8cYfgmsZA@mail.gmail.com>
	<552E9127.9030908@oracle.com>
	<CAHjP37FxcQExHVZ+FGFfnSxfdTcgfaWKXtJM1+6pPNCMc3tK6g@mail.gmail.com>
	<552E999F.1080707@oracle.com>
Message-ID: <CAHjP37F6XHTsbuxuvFJ7=-2=5+rKO-xh2E+UstQ8zaEEGGYaeQ@mail.gmail.com>

So I'm not worried about null checks because they're actually handled
really well.  They're also typically a quick test against a register if not
using implicit checking via trap.

As for propagating type information, I'm assuming this information is
propagated into the inlined code only -- if anything fails to inline, it
will not receive this information and will perform the same type check, is
that right? It's hard to argue against "this is a microbenchmark, larger
code won't notice the difference", but when you have code that's
"scattered" around (i.e. not all inlined in the same place) then it sounds
like this check will still be performed at each of those places.  In a
complex call graph, it's not realistic to expect the entire thing to inline
(for good reason) -- there are going to be islands.  My thinking here is
that given this analysis exists for classes (and works really well),
extending it to interfaces (using a heuristic like Remi's, a flag, etc)
would be profitable in some places.


On Wed, Apr 15, 2015 at 1:02 PM, Vladimir Ivanov <
vladimir.x.ivanov at oracle.com> wrote:

> Nothing changed in 8 & 9 in this respect.
>
> You are looking at a microbenchmark, where you have a trivial method which
> contains just a single call. My point is that it's a corner case and you
> shouldn't notice the difference in a larger application.
>
> Null checks are pervasive on Java level, but for JIT compiler it is enough
> to perform it only once on a value to know the value is non-null
> afterwards.
>
> The same applies to exact type checks: dominating exact type check
> eliminates the need to repeat the type check. It is recorded in C2 type
> system and propagated to all usages.
>
> Every place where type profiling for that interface happens a single exact
> type will be recorded.
>
> Please, note that CHA is more generic and covers the cases when numerous
> classes have a single method implementation. Type profiling is usually
> useless in such case.
>
> But in your example there's a single implementing class, so type profile
> works fine.
>
> Best regards,
> Vladimir Ivanov
>
>
> On 4/15/15 7:37 PM, Vitaly Davidovich wrote:
>
>> Hi Vladimir,
>>
>> Here's what I see on 7u60:
>>
>> private static int doIt(final Foo f) {
>> return f.num();
>>      }
>>
>>      interface Foo
>>      {
>> int num();
>>      }
>>
>>      final class FooImpl implements Foo
>>      {
>> @Override
>> public int num() {
>>     return 1;
>> }
>>      }
>>
>> Running a simple test where only FooImpl is loaded (in fact, it's the
>> only impl period) produces the following asm (stripped down to
>> essentials):
>>
>>    0x00007f0b31e14a6c: mov    0x8(%rsi),%r10d    ; implicit exception:
>> dispatches to 0x00007f0b31e14a9d
>>    0x00007f0b31e14a70: cmp    $0x71c9e068,%r10d  ;   {oop('FooImpl')}
>>    0x00007f0b31e14a77: jne    0x00007f0b31e14a8a
>>    0x00007f0b31e14a79: mov    $0x1,%eax
>>    0x00007f0b31e14a7e: add    $0x10,%rsp
>>    0x00007f0b31e14a82: pop    %rbp
>>
>> If I change Foo to be an abstract class, we get this:
>>
>> 0x00007f0209deb18c: test   %rsi,%rsi
>>    0x00007f0209deb18f: je     0x00007f0209deb1a2
>>    0x00007f0209deb191: mov    $0x1,%eax
>>    0x00007f0209deb196: add    $0x10,%rsp
>>    0x00007f0209deb19a: pop    %rbp
>>
>> So there's an explicit null check but no type check.
>>
>> Did something change in java 8 or 9 that leads you to say "completely
>> eliminated"?
>>
>> Thanks
>>
>> On Wed, Apr 15, 2015 at 12:26 PM, Vladimir Ivanov
>> <vladimir.x.ivanov at oracle.com <mailto:vladimir.x.ivanov at oracle.com>>
>> wrote:
>>
>>     Vitaly,
>>
>>     Type profiling reliably detects single interface implementation
>>     cases and type check overhead is completely eliminated in most of
>>     the cases (type checks are aggressively commoned).
>>
>>     Do you still think it is worth an effort?
>>
>>     Best regards,
>>     Vladimir Ivanov
>>
>>
>>     On 4/15/15 5:10 PM, Vitaly Davidovich wrote:
>>
>>         Hi guys,
>>
>>         So CHA on classes works nicely in the case of only one subtype
>>         loaded.
>>         What about interfaces? Currently, it looks like no such
>>         optimization/analysis is done.  In my experience, there's a
>>         substantial
>>         amount of code that exposes an interface via some API, but then
>>         loads
>>         only implementation of it.  The interface is used instead of
>>         abstract
>>         class to allow more flexibility in the future.
>>
>>         I fully realize that lots of interfaces have more than 1
>> implementer
>>         loaded at runtime, but I also think it's worthwhile to attempt
>>         CHA for them.
>>
>>         Is this something that's feasible to do? It would require more
>> class
>>         loading dependencies to be tracked, but I'm also fine with
>>         having this
>>         be an extra flag that I can use to enable/disable this
>> optimization.
>>
>>         Thoughts?
>>
>>         Thanks
>>
>>
>>

From vitalyd at gmail.com  Wed Apr 15 18:39:00 2015
From: vitalyd at gmail.com (Vitaly Davidovich)
Date: Wed, 15 Apr 2015 14:39:00 -0400
Subject: CHA for interfaces in C2 compiler
In-Reply-To: <CAHjP37F6XHTsbuxuvFJ7=-2=5+rKO-xh2E+UstQ8zaEEGGYaeQ@mail.gmail.com>
References: <CAHjP37FBM9JxaCdZCju_NFcfyF7HGURY5Evjc6+o+8cYfgmsZA@mail.gmail.com>
	<552E9127.9030908@oracle.com>
	<CAHjP37FxcQExHVZ+FGFfnSxfdTcgfaWKXtJM1+6pPNCMc3tK6g@mail.gmail.com>
	<552E999F.1080707@oracle.com>
	<CAHjP37F6XHTsbuxuvFJ7=-2=5+rKO-xh2E+UstQ8zaEEGGYaeQ@mail.gmail.com>
Message-ID: <CAHjP37GrJH=5y9HvyWw8sc_kLE7SuVu5j40TirtxgbwasAaPcw@mail.gmail.com>

By the way, just to be clear - my main gripe with the type check is that it
loads memory (class pointer in the header) which can take an unnecessary
cache miss if no instance data is used or instance data to be used is at
least cacheline size bytes away from the header; the cmp+jmp is not ideal
but secondary.

sent from my phone
On Apr 15, 2015 1:40 PM, "Vitaly Davidovich" <vitalyd at gmail.com> wrote:

> So I'm not worried about null checks because they're actually handled
> really well.  They're also typically a quick test against a register if not
> using implicit checking via trap.
>
> As for propagating type information, I'm assuming this information is
> propagated into the inlined code only -- if anything fails to inline, it
> will not receive this information and will perform the same type check, is
> that right? It's hard to argue against "this is a microbenchmark, larger
> code won't notice the difference", but when you have code that's
> "scattered" around (i.e. not all inlined in the same place) then it sounds
> like this check will still be performed at each of those places.  In a
> complex call graph, it's not realistic to expect the entire thing to inline
> (for good reason) -- there are going to be islands.  My thinking here is
> that given this analysis exists for classes (and works really well),
> extending it to interfaces (using a heuristic like Remi's, a flag, etc)
> would be profitable in some places.
>
>
> On Wed, Apr 15, 2015 at 1:02 PM, Vladimir Ivanov <
> vladimir.x.ivanov at oracle.com> wrote:
>
>> Nothing changed in 8 & 9 in this respect.
>>
>> You are looking at a microbenchmark, where you have a trivial method which
>> contains just a single call. My point is that it's a corner case and you
>> shouldn't notice the difference in a larger application.
>>
>> Null checks are pervasive on Java level, but for JIT compiler it is
>> enough to perform it only once on a value to know the value is non-null
>> afterwards.
>>
>> The same applies to exact type checks: dominating exact type check
>> eliminates the need to repeat the type check. It is recorded in C2 type
>> system and propagated to all usages.
>>
>> Every place where type profiling for that interface happens a single
>> exact type will be recorded.
>>
>> Please, note that CHA is more generic and covers the cases when numerous
>> classes have a single method implementation. Type profiling is usually
>> useless in such case.
>>
>> But in your example there's a single implementing class, so type profile
>> works fine.
>>
>> Best regards,
>> Vladimir Ivanov
>>
>>
>> On 4/15/15 7:37 PM, Vitaly Davidovich wrote:
>>
>>> Hi Vladimir,
>>>
>>> Here's what I see on 7u60:
>>>
>>> private static int doIt(final Foo f) {
>>> return f.num();
>>>      }
>>>
>>>      interface Foo
>>>      {
>>> int num();
>>>      }
>>>
>>>      final class FooImpl implements Foo
>>>      {
>>> @Override
>>> public int num() {
>>>     return 1;
>>> }
>>>      }
>>>
>>> Running a simple test where only FooImpl is loaded (in fact, it's the
>>> only impl period) produces the following asm (stripped down to
>>> essentials):
>>>
>>>    0x00007f0b31e14a6c: mov    0x8(%rsi),%r10d    ; implicit exception:
>>> dispatches to 0x00007f0b31e14a9d
>>>    0x00007f0b31e14a70: cmp    $0x71c9e068,%r10d  ;   {oop('FooImpl')}
>>>    0x00007f0b31e14a77: jne    0x00007f0b31e14a8a
>>>    0x00007f0b31e14a79: mov    $0x1,%eax
>>>    0x00007f0b31e14a7e: add    $0x10,%rsp
>>>    0x00007f0b31e14a82: pop    %rbp
>>>
>>> If I change Foo to be an abstract class, we get this:
>>>
>>> 0x00007f0209deb18c: test   %rsi,%rsi
>>>    0x00007f0209deb18f: je     0x00007f0209deb1a2
>>>    0x00007f0209deb191: mov    $0x1,%eax
>>>    0x00007f0209deb196: add    $0x10,%rsp
>>>    0x00007f0209deb19a: pop    %rbp
>>>
>>> So there's an explicit null check but no type check.
>>>
>>> Did something change in java 8 or 9 that leads you to say "completely
>>> eliminated"?
>>>
>>> Thanks
>>>
>>> On Wed, Apr 15, 2015 at 12:26 PM, Vladimir Ivanov
>>> <vladimir.x.ivanov at oracle.com <mailto:vladimir.x.ivanov at oracle.com>>
>>> wrote:
>>>
>>>     Vitaly,
>>>
>>>     Type profiling reliably detects single interface implementation
>>>     cases and type check overhead is completely eliminated in most of
>>>     the cases (type checks are aggressively commoned).
>>>
>>>     Do you still think it is worth an effort?
>>>
>>>     Best regards,
>>>     Vladimir Ivanov
>>>
>>>
>>>     On 4/15/15 5:10 PM, Vitaly Davidovich wrote:
>>>
>>>         Hi guys,
>>>
>>>         So CHA on classes works nicely in the case of only one subtype
>>>         loaded.
>>>         What about interfaces? Currently, it looks like no such
>>>         optimization/analysis is done.  In my experience, there's a
>>>         substantial
>>>         amount of code that exposes an interface via some API, but then
>>>         loads
>>>         only implementation of it.  The interface is used instead of
>>>         abstract
>>>         class to allow more flexibility in the future.
>>>
>>>         I fully realize that lots of interfaces have more than 1
>>> implementer
>>>         loaded at runtime, but I also think it's worthwhile to attempt
>>>         CHA for them.
>>>
>>>         Is this something that's feasible to do? It would require more
>>> class
>>>         loading dependencies to be tracked, but I'm also fine with
>>>         having this
>>>         be an extra flag that I can use to enable/disable this
>>> optimization.
>>>
>>>         Thoughts?
>>>
>>>         Thanks
>>>
>>>
>>>
>

From christian.thalinger at oracle.com  Wed Apr 15 19:47:54 2015
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Wed, 15 Apr 2015 12:47:54 -0700
Subject: Java 8 TieredCompilation Blacklist?
In-Reply-To: <552CE558.5040602@finkzeit.at>
References: <552CE558.5040602@finkzeit.at>
Message-ID: <55FB246C-621D-4C5E-AB04-193F4A6BA457@oracle.com>

exclude is what you want:

$ java -XX:CompileCommand=help

The CompileCommand option enables the user of the JVM to control specific
behavior of the dynamic compilers. Many commands require a pattern that defines
the set of methods the command shall be applied to. The CompileCommand
option provides the following commands:

  break,<pattern>       - debug breakpoint in compiler and in generated code
  print,<pattern>       - print assembly
  exclude,<pattern>     - don't compile or inline
  inline,<pattern>      - always inline
  dontinline,<pattern>  - don't inline
  compileonly,<pattern> - compile only
  log,<pattern>         - log compilation
  option,<pattern>,<option type>,<option name>,<value>
                        - set value of custom option
  option,<pattern>,<bool option name>
                        - shorthand for setting boolean flag
  quiet                 - silence the compile command output
  help                  - print this text

The preferred format for the method matching pattern is:
  package/Class.method()

For backward compatibility this form is also allowed:
  package.Class::method()

The signature can be separated by an optional whitespace or comma:
  package/Class.method ()

The class and method identifier can be used together with leading or
trailing *'s for a small amount of wildcarding:
  *ackage/Clas*.*etho*()

It is possible to use more than one CompileCommand on the command line:
  -XX:CompileCommand=exclude,java/*.* -XX:CompileCommand=log,java*.*

The CompileCommands can be loaded from a file with the flag
-XX:CompileCommandFile=<file> or be added to the file '.hotspot_compiler'
Use the same format in the file as the argument to the CompileCommand flag.
Add one command on each line.
  exclude java/*.*
  option java/*.* ReplayInline

The following commands have conflicting behavior: 'exclude', 'inline', 'dontinline',
and 'compileonly'. There is no priority of commands. Applying (a subset of) these
commands to the same method results in undefined behavior.
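
For the question asked below (keeping the compilers away from one package),
the help text suggests an exclude command with a package/Class.method
pattern. The following helper is a minimal sketch that only builds such
flags from Java package names. The class ExcludeCommandSketch and the
package com.example.generated are hypothetical, and whether a pattern of
this shape matches the dynamically generated classes in question would
still have to be verified.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Builds -XX:CompileCommand=exclude flags in the pattern form shown in
// the help text above (for example "exclude,java/*.*").
public class ExcludeCommandSketch {
    // "com.example.generated" -> "exclude,com/example/generated/*.*"
    static String excludeCommand(String packageName) {
        return "exclude," + packageName.replace('.', '/') + "/*.*";
    }

    // One -XX:CompileCommand flag per package, separated by spaces.
    static String toFlags(List<String> packageNames) {
        return packageNames.stream()
                .map(p -> "-XX:CompileCommand=" + excludeCommand(p))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // Hypothetical package name, used only for illustration.
        System.out.println(toFlags(Arrays.asList("com.example.generated")));
    }
}
```

The same commands could instead be written one per line into a file passed
with -XX:CompileCommandFile=<file>, as the help text describes.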

> On Apr 14, 2015, at 3:00 AM, Wolfgang Pedot <wolfgang.pedot at finkzeit.at> wrote:
> 
> Hello,
> 
> I have recently migrated a big-ish application from 7u40 to 8u40 and I noticed a quite substantial increase in CPU utilisation.
> After doing some research I figured out that the cause of that is TieredCompilation which is now on by default, I have deactivated that feature and now CPU utilisation is back to normal.
> I tested TieredCompilation before on 7u<something> and also had an increase in CPU up to the point where the application actually slowed down so I ended that test.
> A part of the application uses BIRT and that tends to generate a lot of short-lived classes to optimize Javascript-code, my guess is that the tiered compiler compiles those classes in an attempt to optimize them and
> depending on the usage of the system that increases CPU without really accelerating anything (according to statistics). I have found "CompileOnly" which seems to be something to be used for test and development, is there something like a Blacklist I can use to tell the compiler NOT to compile classes in a specific package?
> 
> The system had been running for ~13h on 8u40 and used 1.5h of CPU-time for compilation; the previous version running on 7u40 had been up for ~62.5 days and only used 36min for compilation. I did notice the much quicker warmup in the response-times after the switch to 8u40 but I don't want the system to spend so much time compiling stuff that does not really improve performance.
> 
> any help would be appreciated
> 
> Wolfgang
> 


From vitalyd at gmail.com  Thu Apr 16 03:05:13 2015
From: vitalyd at gmail.com (Vitaly Davidovich)
Date: Wed, 15 Apr 2015 23:05:13 -0400
Subject: CHA for interfaces in C2 compiler
In-Reply-To: <552E999F.1080707@oracle.com>
References: <CAHjP37FBM9JxaCdZCju_NFcfyF7HGURY5Evjc6+o+8cYfgmsZA@mail.gmail.com>
	<552E9127.9030908@oracle.com>
	<CAHjP37FxcQExHVZ+FGFfnSxfdTcgfaWKXtJM1+6pPNCMc3tK6g@mail.gmail.com>
	<552E999F.1080707@oracle.com>
Message-ID: <CAHjP37Hda38eeuDh-CnYKS_foSyLupugzF6-NN36cBmaS_9-Tg@mail.gmail.com>

Also, suppose Foo is an interface and I have a Foo[], only one impl of Foo
loaded.  If I loop through this array and invoke an interface method, I'm
assuming there will be a type check guard on each iteration before
proceeding to the inlined code (I'll have to verify tomorrow when I'm at my
dev machine, but I can't see how this would not be the case).  That would
certainly be a place where removing the type check would be beneficial
since it can't be commoned out.
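
The array case can be written out as a small Java sketch. It models the
assumption stated above, which has not been verified against real C2
output: each element is a different receiver, so a guard on one element
says nothing about the next. The class ArrayLoopSketch and the method
sumGuarded are hypothetical names.

```java
// Models a per-element type guard in a loop over an interface array.
public class ArrayLoopSketch {
    interface Foo {
        int num();
    }

    static final class FooImpl implements Foo {
        @Override
        public int num() {
            return 1;
        }
    }

    // What the programmer writes.
    static int sum(Foo[] foos) {
        int total = 0;
        for (Foo f : foos) {
            total += f.num();
        }
        return total;
    }

    // Assumed shape of the compiled loop: the exact type check is repeated
    // for every element because each iteration loads a new receiver.
    static int sumGuarded(Foo[] foos) {
        int total = 0;
        for (Foo f : foos) {
            if (f.getClass() == FooImpl.class) { // reads the object header
                total += 1;                      // inlined FooImpl.num()
            } else {
                total += f.num();                // stand-in for the slow path
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Foo[] foos = { new FooImpl(), new FooImpl(), new FooImpl() };
        System.out.println(sum(foos) == sumGuarded(foos));
    }
}
```

If CHA covered interfaces, the guard inside the loop would be the part
that could go away, at the cost of an extra class loading dependency.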

sent from my phone
On Apr 15, 2015 1:02 PM, "Vladimir Ivanov" <vladimir.x.ivanov at oracle.com>
wrote:

> Nothing changed in 8 & 9 in this respect.
>
> You are looking at a microbenchmark, where you have a trivial method which
> contains just a single call. My point is that it's a corner case and you
> shouldn't notice the difference in a larger application.
>
> Null checks are pervasive on Java level, but for JIT compiler it is enough
> to perform it only once on a value to know the value is non-null
> afterwards.
>
> The same applies to exact type checks: dominating exact type check
> eliminates the need to repeat the type check. It is recorded in C2 type
> system and propagated to all usages.
>
> Every place where type profiling for that interface happens a single exact
> type will be recorded.
>
> Please, note that CHA is more generic and covers the cases when numerous
> classes have a single method implementation. Type profiling is usually
> useless in such case.
>
> But in your example there's a single implementing class, so type profile
> works fine.
>
> Best regards,
> Vladimir Ivanov
>
> On 4/15/15 7:37 PM, Vitaly Davidovich wrote:
>
>> Hi Vladimir,
>>
>> Here's what I see on 7u60:
>>
>> private static int doIt(final Foo f) {
>>     return f.num();
>> }
>>
>> interface Foo {
>>     int num();
>> }
>>
>> final class FooImpl implements Foo {
>>     @Override
>>     public int num() {
>>         return 1;
>>     }
>> }
>>
>> Running a simple test where only FooImpl is loaded (in fact, it's the
>> only impl period) produces the following asm (stripped down to
>> essentials):
>>
>>    0x00007f0b31e14a6c: mov    0x8(%rsi),%r10d    ; implicit exception: dispatches to 0x00007f0b31e14a9d
>>    0x00007f0b31e14a70: cmp    $0x71c9e068,%r10d  ;   {oop('FooImpl')}
>>    0x00007f0b31e14a77: jne    0x00007f0b31e14a8a
>>    0x00007f0b31e14a79: mov    $0x1,%eax
>>    0x00007f0b31e14a7e: add    $0x10,%rsp
>>    0x00007f0b31e14a82: pop    %rbp
>>
>> If I change Foo to be an abstract class, we get this:
>>
>>    0x00007f0209deb18c: test   %rsi,%rsi
>>    0x00007f0209deb18f: je     0x00007f0209deb1a2
>>    0x00007f0209deb191: mov    $0x1,%eax
>>    0x00007f0209deb196: add    $0x10,%rsp
>>    0x00007f0209deb19a: pop    %rbp
>>
>> So there's an explicit null check but no type check.
>>
>> Did something change in java 8 or 9 that leads you to say "completely
>> eliminated"?
>>
>> Thanks
>>
>> On Wed, Apr 15, 2015 at 12:26 PM, Vladimir Ivanov
>> <vladimir.x.ivanov at oracle.com <mailto:vladimir.x.ivanov at oracle.com>>
>> wrote:
>>
>>     Vitaly,
>>
>>     Type profiling reliably detects single interface implementation
>>     cases and type check overhead is completely eliminated in most of
>>     the cases (type checks are aggressively commoned).
>>
>>     Do you still think it is worth an effort?
>>
>>     Best regards,
>>     Vladimir Ivanov
>>
>>
>>     On 4/15/15 5:10 PM, Vitaly Davidovich wrote:
>>
>>         Hi guys,
>>
>>         So CHA on classes works nicely in the case of only one subtype
>>         loaded. What about interfaces? Currently, it looks like no such
>>         optimization/analysis is done.  In my experience, there's a
>>         substantial amount of code that exposes an interface via some
>>         API, but then loads only one implementation of it.  The interface
>>         is used instead of an abstract class to allow more flexibility in
>>         the future.
>>
>>         I fully realize that lots of interfaces have more than 1
>>         implementer loaded at runtime, but I also think it's worthwhile
>>         to attempt CHA for them.
>>
>>         Is this something that's feasible to do? It would require more
>>         class loading dependencies to be tracked, but I'm also fine with
>>         having this be an extra flag that I can use to enable/disable
>>         this optimization.
>>
>>         Thoughts?
>>
>>         Thanks
>>
>>
>>

From john.r.rose at oracle.com  Thu Apr 16 06:56:24 2015
From: john.r.rose at oracle.com (John Rose)
Date: Wed, 15 Apr 2015 23:56:24 -0700
Subject: CHA for interfaces in C2 compiler
In-Reply-To: <CAHjP37Hda38eeuDh-CnYKS_foSyLupugzF6-NN36cBmaS_9-Tg@mail.gmail.com>
References: <CAHjP37FBM9JxaCdZCju_NFcfyF7HGURY5Evjc6+o+8cYfgmsZA@mail.gmail.com>
	<552E9127.9030908@oracle.com>
	<CAHjP37FxcQExHVZ+FGFfnSxfdTcgfaWKXtJM1+6pPNCMc3tK6g@mail.gmail.com>
	<552E999F.1080707@oracle.com>
	<CAHjP37Hda38eeuDh-CnYKS_foSyLupugzF6-NN36cBmaS_9-Tg@mail.gmail.com>
Message-ID: <37DE1BE4-C5F3-4240-9AA4-129ED74A84A9@oracle.com>

The real cost of the type check is a cache line fetch. In this case you have a bunch of objects whose code is the same method(s) and data fields are the only way to vary the behavior. So almost any plausible application of this pattern will need the same cache line as the data fields.

We have yet to see a convincing use case for this CHA (THA) case.  I put some code in the VM to support this once but we never needed it and it was removed. 

(Another opt with a similar flavor would be support for the singleton pattern.)

- John

> On Apr 15, 2015, at 8:05 PM, Vitaly Davidovich <vitalyd at gmail.com> wrote:
> 
> That would certainly be a place where removing the type check would be beneficial

From forax at univ-mlv.fr  Thu Apr 16 07:15:17 2015
From: forax at univ-mlv.fr (Remi Forax)
Date: Thu, 16 Apr 2015 09:15:17 +0200
Subject: CHA for interfaces in C2 compiler
In-Reply-To: <552E999F.1080707@oracle.com>
References: <CAHjP37FBM9JxaCdZCju_NFcfyF7HGURY5Evjc6+o+8cYfgmsZA@mail.gmail.com>	<552E9127.9030908@oracle.com>	<CAHjP37FxcQExHVZ+FGFfnSxfdTcgfaWKXtJM1+6pPNCMc3tK6g@mail.gmail.com>
	<552E999F.1080707@oracle.com>
Message-ID: <552F6185.6090607@univ-mlv.fr>


On 04/15/2015 07:02 PM, Vladimir Ivanov wrote:
> Nothing changed in 8 & 9 in this respect.
>
> You are looking at a microbenchmark, where you have a trivial method 
> which contains just a single call. My point is that it's a corner case 
> and you shouldn't notice the difference in a larger application.
>
> Null checks are pervasive on Java level, but for JIT compiler it is 
> enough to perform it only once on a value to know the value is 
> non-null afterwards.
>
> The same applies to exact type checks: dominating exact type check 
> eliminates the need to repeat the type check. It is recorded in C2 
> type system and propagated to all usages.
>
> Every place where type profiling for that interface happens a single 
> exact type will be recorded.
>
> Please, note that CHA is more generic and covers the cases when 
> numerous classes have a single method implementation. Type profiling 
> is usually useless in such case.

Now that we have default methods on interfaces, we also have several 
classes implementing the same interface that may share a single method 
implementation: methods like Predicate::and or Function::compose, for example.

I have no idea whether implementing CHA on interfaces is worth it or not,
but Java the language is a moving target; something that was true at 
some point in time may not be true anymore.
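
As a rough illustration of that point, the sketch below has two unrelated 
classes implement java.util.function.Predicate and checks by reflection 
where their and method is declared. The class names NonEmpty, ShortString 
and DefaultMethodSketch are hypothetical. It only demonstrates that both 
classes share the one default implementation; it says nothing about how 
the JIT treats the resulting call sites.

```java
import java.util.function.Predicate;

// Two hypothetical, unrelated implementations of the same interface.
final class NonEmpty implements Predicate<String> {
    @Override
    public boolean test(String s) {
        return !s.isEmpty();
    }
}

final class ShortString implements Predicate<String> {
    @Override
    public boolean test(String s) {
        return s.length() < 5;
    }
}

final class DefaultMethodSketch {
    // Returns the type that declares the and(Predicate) method seen by impl.
    static Class<?> declarerOfAnd(Class<?> impl) {
        try {
            return impl.getMethod("and", Predicate.class).getDeclaringClass();
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Neither class overrides and(), so both resolve to Predicate's default.
        System.out.println(declarerOfAnd(NonEmpty.class));
        System.out.println(declarerOfAnd(ShortString.class));
        System.out.println(new NonEmpty().and(new ShortString()).test("abc"));
    }
}
```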

>
> But in your example there's a single implementing class, so type 
> profile works fine.
>
> Best regards,
> Vladimir Ivanov

regards,
Rémi

>
> On 4/15/15 7:37 PM, Vitaly Davidovich wrote:
>> Hi Vladimir,
>>
>> Here's what I see on 7u60:
>>
>> private static int doIt(final Foo f) {
>>     return f.num();
>> }
>>
>> interface Foo {
>>     int num();
>> }
>>
>> final class FooImpl implements Foo {
>>     @Override
>>     public int num() {
>>         return 1;
>>     }
>> }
>>
>> Running a simple test where only FooImpl is loaded (in fact, it's the
>> only impl period) produces the following asm (stripped down to 
>> essentials):
>>
>>    0x00007f0b31e14a6c: mov    0x8(%rsi),%r10d    ; implicit exception: dispatches to 0x00007f0b31e14a9d
>>    0x00007f0b31e14a70: cmp    $0x71c9e068,%r10d  ; {oop('FooImpl')}
>>    0x00007f0b31e14a77: jne    0x00007f0b31e14a8a
>>    0x00007f0b31e14a79: mov    $0x1,%eax
>>    0x00007f0b31e14a7e: add    $0x10,%rsp
>>    0x00007f0b31e14a82: pop    %rbp
>>
>> If I change Foo to be an abstract class, we get this:
>>
>>    0x00007f0209deb18c: test   %rsi,%rsi
>>    0x00007f0209deb18f: je     0x00007f0209deb1a2
>>    0x00007f0209deb191: mov    $0x1,%eax
>>    0x00007f0209deb196: add    $0x10,%rsp
>>    0x00007f0209deb19a: pop    %rbp
>>
>> So there's an explicit null check but no type check.
>>
>> Did something change in java 8 or 9 that leads you to say "completely
>> eliminated"?
>>
>> Thanks
>>
>> On Wed, Apr 15, 2015 at 12:26 PM, Vladimir Ivanov
>> <vladimir.x.ivanov at oracle.com <mailto:vladimir.x.ivanov at oracle.com>> 
>> wrote:
>>
>>     Vitaly,
>>
>>     Type profiling reliably detects single interface implementation
>>     cases and type check overhead is completely eliminated in most of
>>     the cases (type checks are aggressively commoned).
>>
>>     Do you still think it is worth an effort?
>>
>>     Best regards,
>>     Vladimir Ivanov
>>
>>
>>     On 4/15/15 5:10 PM, Vitaly Davidovich wrote:
>>
>>         Hi guys,
>>
>>         So CHA on classes works nicely in the case of only one subtype
>>         loaded. What about interfaces? Currently, it looks like no such
>>         optimization/analysis is done.  In my experience, there's a
>>         substantial amount of code that exposes an interface via some
>>         API, but then loads only one implementation of it.  The interface
>>         is used instead of an abstract class to allow more flexibility in
>>         the future.
>>
>>         I fully realize that lots of interfaces have more than 1
>>         implementer loaded at runtime, but I also think it's worthwhile
>>         to attempt CHA for them.
>>
>>         Is this something that's feasible to do? It would require more
>>         class loading dependencies to be tracked, but I'm also fine with
>>         having this be an extra flag that I can use to enable/disable
>>         this optimization.
>>
>>         Thoughts?
>>
>>         Thanks
>>
>>


From vitalyd at gmail.com  Thu Apr 16 10:24:14 2015
From: vitalyd at gmail.com (Vitaly Davidovich)
Date: Thu, 16 Apr 2015 06:24:14 -0400
Subject: CHA for interfaces in C2 compiler
In-Reply-To: <37DE1BE4-C5F3-4240-9AA4-129ED74A84A9@oracle.com>
References: <CAHjP37FBM9JxaCdZCju_NFcfyF7HGURY5Evjc6+o+8cYfgmsZA@mail.gmail.com>
	<552E9127.9030908@oracle.com>
	<CAHjP37FxcQExHVZ+FGFfnSxfdTcgfaWKXtJM1+6pPNCMc3tK6g@mail.gmail.com>
	<552E999F.1080707@oracle.com>
	<CAHjP37Hda38eeuDh-CnYKS_foSyLupugzF6-NN36cBmaS_9-Tg@mail.gmail.com>
	<37DE1BE4-C5F3-4240-9AA4-129ED74A84A9@oracle.com>
Message-ID: <CAHjP37HpSpR_H8eLJhic3nS8k_zBZY6o=GpKtx67BF3rq52KVQ@mail.gmail.com>

I'd argue that the cacheline fetch is more of an issue when not dealing
with array iteration but with the "islands" I was referring to earlier.  In
the array iteration scenario, the actual method invoked may be less
expensive than the type check.  Yes, the branch will be predicted and all,
but that's still extra code that will execute, extra entry in the BTB, etc.

Maybe you guys aren't convinced, but do realize that there's plenty of code
out there with the "one impl of an interface loaded" situation.  Every
small efficiency gain counts, IMO.

sent from my phone
On Apr 16, 2015 2:56 AM, "John Rose" <john.r.rose at oracle.com> wrote:

> The real cost of the type check is a cache line fetch. In this case you
> have a bunch of objects whose code is the same method(s) and data fields
> are the only way to vary the behavior. So almost any plausible application
> of this pattern will need the same cache line as the data fields.
>
> We have yet to see a convincing use case for this CHA (THA) case.  I put
> some code in the VM to support this once but we never needed it and it was
> removed.
>
> (Another opt with a similar flavor would be support for the singleton
> pattern.)
>
> - John
>
> > On Apr 15, 2015, at 8:05 PM, Vitaly Davidovich <vitalyd at gmail.com> wrote:
> >
> > That would certainly be a place where removing the type check would be beneficial
>

From roland.westrelin at oracle.com  Thu Apr 16 10:56:21 2015
From: roland.westrelin at oracle.com (Roland Westrelin)
Date: Thu, 16 Apr 2015 12:56:21 +0200
Subject: RFR(S): 8069191: moving predicate out of loops may cause array
	accesses to bypass null check
In-Reply-To: <88169234-01DE-470C-B56A-D96AD7C53D50@oracle.com>
References: <100419DB-199E-489C-B3EA-F104BF0EB203@oracle.com>
	<55086F20.9020305@oracle.com>
	<2ACAAB95-8175-48DB-8BD9-F5BF168A6666@oracle.com>
	<550893F0.9050608@oracle.com>
	<F8959E8E-ECA1-42DD-BC8A-AA7CD750F5C2@oracle.com>
	<88169234-01DE-470C-B56A-D96AD7C53D50@oracle.com>
Message-ID: <7BDCA4DA-BB97-436E-B761-6C883EDE955E@oracle.com>

Hi John,

Thanks for the review, comments and suggestions. I followed all suggestions except the one to use assert_dom because I'm not sure how applicable it is here. I added:

assert(is_dominator(bn, bm) || is_dominator(bm, bn), "one must dominate the other");

instead. This is what I intend to push unless I hear an objection:

http://cr.openjdk.java.net/~roland/8069191/webrev.02/

Roland.


> On Apr 14, 2015, at 5:31 AM, John Rose <john.r.rose at oracle.com> wrote:
> 
> Reviewed.
> 
> On Mar 24, 2015, at 5:55 AM, Roland Westrelin <roland.westrelin at oracle.com> wrote:
>> 
>>>> 
>>>> test guarantees that the precedence edge is a control node. And I assume it's always ok to remove the precedence edge and adjust the control when the precedence edge is a control node. Do you think that could break something?
>>> 
>>> Only if control edge came from CastPP. I know it is additional work but can you run something (CTW? jvm98) and look what types of precedence edges GCM can see? Unfortunately I don't remember what we have there.
>>> There are a lot of places where we use add_prec(), mostly add pointers to memory nodes.
>>> If control nodes come only from CastPP then I am fine with your code.
>> 
>> I added debugging code (that I didn't keep in the webrev below) that added (memory operation, control from CastPP) pairs in a side table during final graph reshaping, updated the pairs during matching and checked that all nodes that gcm sees with a control precedence got it from a CastPP. I ran CTW and other tests with that code and all tests passed. During that testing, I noticed that:
> 
> That's a good testing method.
> 
> Precedence edges are a simple way to add miscellaneous node relations but it is easy to forget they are there.  I guess the gcm.cpp code picks them up completely.  And after the extra edges are added, not much happens that could "forget" (drop) an edge.  (Note that copying a node to make a better one has a risk to "forget" precedence edges.)
> 
> But, if this technique were to be used in any more expansive way, or if you have lingering doubts about using precedence edges here, I would recommend creating an explicit new node type that captures multiple control dependency edges.  As we have a MergeMem node we could have a MergeControl node, whose input edges (after in(0)) would act like the precedence edges you are adding now.
> 
> Two minor comments on code style in compile.cpp:  The new 'switch' is hard to untangle.  Wouldn't it be simpler to put the 'wq.push(use)' call before the 'break', and drop the 'default' case completely?
> 
> Also, I really dislike it when block structure ({...}) cuts across #ifdef structure.  This hack would be slightly better:
>   #ifdef _LP64
>   if (n->in(1)->is_DecodeNarrowPtr() || n->in(2)->is_DecodeNarrowPtr()) {
>     ...
>   } else
>   #endif //_LP64
>   {
>     ...
>   }
> 
> Better yet, you could also just delete the #ifdef LP64 and let the tests go forward.  Or incorporate a manifest constant:
>     const bool is_LP64 = LP64_ONLY(true) NOT_LP64(false);
>     if (is_LP64 && (...)) { ... } else { ... }
> 
> The code in gcm.cpp treats precedence edges asymmetrically.  (The expression is 'n = is_dominator(bn, bm) ? m : n'.)  Do we want to assert that one of them dominates the other, perhaps using 'assert_dom'?
> 
> It's great to see all that mysterious old code go away.
> 
> - John


From vitalyd at gmail.com  Thu Apr 16 11:03:15 2015
From: vitalyd at gmail.com (Vitaly Davidovich)
Date: Thu, 16 Apr 2015 07:03:15 -0400
Subject: CHA for interfaces in C2 compiler
In-Reply-To: <37DE1BE4-C5F3-4240-9AA4-129ED74A84A9@oracle.com>
References: <CAHjP37FBM9JxaCdZCju_NFcfyF7HGURY5Evjc6+o+8cYfgmsZA@mail.gmail.com>
	<552E9127.9030908@oracle.com>
	<CAHjP37FxcQExHVZ+FGFfnSxfdTcgfaWKXtJM1+6pPNCMc3tK6g@mail.gmail.com>
	<552E999F.1080707@oracle.com>
	<CAHjP37Hda38eeuDh-CnYKS_foSyLupugzF6-NN36cBmaS_9-Tg@mail.gmail.com>
	<37DE1BE4-C5F3-4240-9AA4-129ED74A84A9@oracle.com>
Message-ID: <CAHjP37EDTa28Ac8=t5Esa459jJsWe72YB7YtOZdOfzm58PCtNA@mail.gmail.com>

John,

I'm also unclear why you say that "every application of this will need the
same cache line as the data fields" - what if the data field fetched is a
cacheline length away from the start of the object? Say the class contains
8 doubles and the line is 64 bytes - if I'm reading the field that got laid
out last, it's going to be a different line from the class pointer on each
instance.
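
The arithmetic behind that objection can be sketched as follows. The numbers
are assumptions, not measurements: a 64-byte cache line, the class pointer at
offset 8 (as in the mov 0x8(%rsi) above), and a 16-byte gap before the first
double field. CacheLineSketch and its methods are hypothetical names; real
field layout is up to the VM.

```java
// Hypothetical layout model; offsets are assumptions, not VM guarantees.
final class CacheLineSketch {
    static final int LINE_BYTES = 64;        // assumed cache line size
    static final int KLASS_OFFSET = 8;       // class pointer offset seen in the asm above
    static final int FIRST_FIELD_OFFSET = 16; // assumed offset of the first double
    static final int DOUBLE_BYTES = 8;

    // Index of the cache line that holds the given absolute address.
    static long lineOf(long address) {
        return address / LINE_BYTES;
    }

    // True if the fieldIndex-th double shares a cache line with the class pointer.
    static boolean sameLineAsKlass(long objectBase, int fieldIndex) {
        long fieldAddress = objectBase + FIRST_FIELD_OFFSET + (long) DOUBLE_BYTES * fieldIndex;
        return lineOf(objectBase + KLASS_OFFSET) == lineOf(fieldAddress);
    }

    public static void main(String[] args) {
        // Object assumed to start on a cache line boundary; 8 double fields.
        for (int i = 0; i < 8; i++) {
            System.out.println("field " + i + " shares line with class pointer: "
                    + sameLineAsKlass(0, i));
        }
    }
}
```

Under these assumptions the last of the 8 doubles sits at offset 72, past the
first 64-byte line, so reading only that field does not touch the line that
holds the class pointer.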

sent from my phone
On Apr 16, 2015 2:56 AM, "John Rose" <john.r.rose at oracle.com> wrote:

> The real cost of the type check is a cache line fetch. In this case you
> have a bunch of objects whose code is the same method(s) and data fields
> are the only way to vary the behavior. So almost any plausible application
> of this pattern will need the same cache line as the data fields.
>
> We have yet to see a convincing use case for this CHA (THA) case.  I put
> some code in the VM to support this once but we never needed it and it was
> removed.
>
> (Another opt with a similar flavor would be support for the singleton
> pattern.)
>
> - John
>
> > On Apr 15, 2015, at 8:05 PM, Vitaly Davidovich <vitalyd at gmail.com> wrote:
> >
> > That would certainly be a place where removing the type check would be beneficial
>

From roland.westrelin at oracle.com  Thu Apr 16 11:57:25 2015
From: roland.westrelin at oracle.com (Roland Westrelin)
Date: Thu, 16 Apr 2015 13:57:25 +0200
Subject: RFR(S) 8077504: Unsafe load can loose control dependency and cause
	crash
Message-ID: <42EDECBF-2347-406A-8835-29A98F1CA153@oracle.com>

http://cr.openjdk.java.net/~roland/8077504/webrev.00/

Because we consider that all loads that have their control set depend only on the test that immediately dominates them (and if we move the test, the load can follow the test), loads from unsafe intrinsics can be moved so that they no longer depend on the conditions that keep them valid (see test cases). The fix I propose is to add a _depends_only_on_test flag to LoadNodes so when we construct a LoadNode for an unsafe load we can record that it shouldn't be moved.

Roland.

From adinn at redhat.com  Thu Apr 16 14:07:18 2015
From: adinn at redhat.com (Andrew Dinn)
Date: Thu, 16 Apr 2015 15:07:18 +0100
Subject: RFR(S) 8077504: Unsafe load can loose control dependency and
	cause crash
In-Reply-To: <42EDECBF-2347-406A-8835-29A98F1CA153@oracle.com>
References: <42EDECBF-2347-406A-8835-29A98F1CA153@oracle.com>
Message-ID: <552FC216.4010503@redhat.com>

Hi Roland,

On 16/04/15 12:57, Roland Westrelin wrote:
> http://cr.openjdk.java.net/~roland/8077504/webrev.00/
> 
> Because we consider that all loads that have their control set only
> depend on the test that immediately dominates them (and if we move the
> test, the load can follow the test), loads from unsafe intrinsics can
> be moved so they don't depend on conditions that keep them valid
> anymore (see test cases). The fix I propose is to add a
> _depends_only_on_test flag to LoadNodes so when we construct a
> LoadNode for an unsafe load we can record that it shouldn't be
> moved.

I must admit I was a tad confused by your use of variable
does_not_depend_only_on_test at library_call.cpp:2635. I understand that
at this point you are trying to emphasise that this call gets passed
false where other calls get passed true. So, in the call we see

  Node* p = make_load(control(), adr, value_type, type, adr_type, mo,
                      does_not_depend_only_on_test, is_volatile);

which is fine when you understand what is going on. However, the
preceding assignment:

  bool does_not_depend_only_on_test = false;

makes it look like you are suggesting that this case does depend only on
the test.

Would it not be clearer if you signalled what is happening by avoiding
the bool var declarations and instead tagging the calls with an
explanatory comment as follows:

  Node* p = make_load(control(), adr, value_type, type, adr_type, mo,
                      false, //does_not_depend_only_on_test
                      is_volatile);

vs

  Node* p = make_load(control(), adr, value_type, type, adr_type, mo,
                      true, //depends_only_on_test
                      is_volatile);

regards,


Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in UK and Wales under Company Registration No. 3798903
Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters
(USA), Michael O'Neill (Ireland)

From wolfgang.pedot at finkzeit.at  Thu Apr 16 14:22:43 2015
From: wolfgang.pedot at finkzeit.at (Wolfgang Pedot)
Date: Thu, 16 Apr 2015 16:22:43 +0200
Subject: Java 8 TieredCompilation Blacklist?
In-Reply-To: <55FB246C-621D-4C5E-AB04-193F4A6BA457@oracle.com>
References: <552CE558.5040602@finkzeit.at>
	<55FB246C-621D-4C5E-AB04-193F4A6BA457@oracle.com>
Message-ID: <552FC5B3.7010902@finkzeit.at>

Thanks, that is exactly what I need.
Tests look very promising, I'll give that a try on the real system soon.

Wolfgang

On 15.04.2015 21:47, Christian Thalinger wrote:
> exclude is what you want:
>
> $ java -XX:CompileCommand=help
>
> The CompileCommand option enables the user of the JVM to control specific
> behavior of the dynamic compilers. Many commands require a pattern that defines
> the set of methods the command shall be applied to. The CompileCommand
> option provides the following commands:
>
>    break,<pattern>       - debug breakpoint in compiler and in generated code
>    print,<pattern>       - print assembly
>    exclude,<pattern>     - don't compile or inline
>    inline,<pattern>      - always inline
>    dontinline,<pattern>  - don't inline
>    compileonly,<pattern> - compile only
>    log,<pattern>         - log compilation
>    option,<pattern>,<option type>,<option name>,<value>
>                          - set value of custom option
>    option,<pattern>,<bool option name>
>                          - shorthand for setting boolean flag
>    quiet                 - silence the compile command output
>    help                  - print this text
>
> The preferred format for the method matching pattern is:
>    package/Class.method()
>
> For backward compatibility this form is also allowed:
>    package.Class::method()
>
> The signature can be separated by an optional whitespace or comma:
>    package/Class.method ()
>
> The class and method identifier can be used together with leading or
> trailing *'s for a small amount of wildcarding:
>    *ackage/Clas*.*etho*()
>
> It is possible to use more than one CompileCommand on the command line:
>    -XX:CompileCommand=exclude,java/*.* -XX:CompileCommand=log,java*.*
>
> The CompileCommands can be loaded from a file with the flag
> -XX:CompileCommandFile=<file> or be added to the file '.hotspot_compiler'
> Use the same format in the file as the argument to the CompileCommand flag.
> Add one command on each line.
>    exclude java/*.*
>    option java/*.* ReplayInline
>
> The following commands have conflicting behavior: 'exclude', 'inline', 'dontinline',
> and 'compileonly'. There is no priority of commands. Applying (a subset of) these
> commands to the same method results in undefined behavior.
>
>> On Apr 14, 2015, at 3:00 AM, Wolfgang Pedot <wolfgang.pedot at finkzeit.at> wrote:
>>
>> Hello,
>>
>> I have recently migrated a big-ish application from 7u40 to 8u40 and I noticed a quite substantial increase in CPU utilisation.
>> After doing some research I figured out that the cause of that is TieredCompilation which is now on by default, I have deactivated that feature and now CPU utilisation is back to normal.
>> I tested TieredCompilation before on 7u<something> and also had an increase in CPU up to the point where the application actually slowed down so I ended that test.
>> A part of the application uses BIRT and that tends to generate a lot of short-lived classes to optimize Javascript-code, my guess is that the tiered compiler compiles those classes in an attempt to optimize them and
>> depending on the usage of the system that increases CPU without really accelerating anything (according to statistics). I have found "CompileOnly" which seems to be something to be used for test and development, is there something like a Blacklist I can use to tell the compiler NOT to compile classes in a specific package?
>>
>> The system had been running for ~13h on 8u40 and used 1.5h of CPU-time for compilation, the previous version running on 7u40 had been up for ~62.5days and only used 36min for compilation. I did notice the much quicker warmup in the response-times after the switch to 8u40 but I don't want the system to spend so much time compiling stuff that does not really improve performance.
>>
>> any help would be appreciated
>>
>> Wolfgang
>>


-- 
Kind regards
Wolfgang Pedot
R&D
Fink Zeitsysteme GmbH | Möslestraße 19-21 | 6844 Altach | Austria
Tel: +43 5576 72388 | Fax: +43 5576 72388 14
wolfgang.pedot at finkzeit.at | www.finkzeit.at

Landesgericht Feldkirch, 72223k | USt.ld: ATU36401407

We provide our services exclusively on the basis of our general terms and conditions (AGB) and our service and usage agreement, which are published on our website at www.finkzeit.at/rechtliches.


From roland.westrelin at oracle.com  Thu Apr 16 15:30:24 2015
From: roland.westrelin at oracle.com (Roland Westrelin)
Date: Thu, 16 Apr 2015 17:30:24 +0200
Subject: RFR(XS): 8074676: java.lang.invoke.PermuteArgsTest.java fails
	with "assert(is_Initialize()) failed: invalid node class"
In-Reply-To: <552E9E76.1080006@oracle.com>
References: <4D0538B9-FA75-46FE-8466-C6F75791DE0C@oracle.com>
	<552E9E76.1080006@oracle.com>
Message-ID: <E0393F02-4702-4582-802C-6A01C28E677A@oracle.com>

Thanks for the reviews, Vladimir & Vladimir.

Roland.

> On Apr 15, 2015, at 7:23 PM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
> 
> Good.
> 
> Thanks,
> Vladimir
> 
> On 4/15/15 2:17 AM, Roland Westrelin wrote:
>> http://cr.openjdk.java.net/~roland/8074676/webrev.00/
>> 
>> The guards that I added in the Arrays.copyOf() intrinsic can cause the control to become top. The code is missing a check for stopped().
>> 
>> Roland.
>> 


From vladimir.kozlov at oracle.com  Thu Apr 16 16:44:49 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 16 Apr 2015 09:44:49 -0700
Subject: RFR(S): 8069191: moving predicate out of loops may cause array
	accesses to bypass null check
In-Reply-To: <7BDCA4DA-BB97-436E-B761-6C883EDE955E@oracle.com>
References: <100419DB-199E-489C-B3EA-F104BF0EB203@oracle.com>	<55086F20.9020305@oracle.com>	<2ACAAB95-8175-48DB-8BD9-F5BF168A6666@oracle.com>	<550893F0.9050608@oracle.com>	<F8959E8E-ECA1-42DD-BC8A-AA7CD750F5C2@oracle.com>	<88169234-01DE-470C-B56A-D96AD7C53D50@oracle.com>
	<7BDCA4DA-BB97-436E-B761-6C883EDE955E@oracle.com>
Message-ID: <552FE701.3070502@oracle.com>

Still good for me.

Thanks,
Vladimir

On 4/16/15 3:56 AM, Roland Westrelin wrote:
> Hi John,
>
> Thanks for the review, comments and suggestions. I followed all suggestions except the one to use assert_dom because I'm not sure how applicable it is here. I added:
>
> assert(is_dominator(bn, bm) || is_dominator(bm, bn), "one must dominate the other");
>
> instead. This is what I intend to push unless I hear an objection:
>
> http://cr.openjdk.java.net/~roland/8069191/webrev.02/
>
> Roland.
>
>
>> On Apr 14, 2015, at 5:31 AM, John Rose <john.r.rose at oracle.com> wrote:
>>
>> Reviewed.
>>
>> On Mar 24, 2015, at 5:55 AM, Roland Westrelin <roland.westrelin at oracle.com> wrote:
>>>
>>>>>
>>>>> test guarantees that the precedence edge is a control node. And I assume it's always ok to remove the precedence edge and adjust the control when the precedence edge is a control node. Do you think that could break something?
>>>>
>>>> Only if control edge came from CastPP. I know it is additional work but can you run something (CTW? jvm98) and look what types of precedence edges GCM can see? Unfortunately I don't remember what we have there.
>>>> There are a lot of places where we use add_prec(), mostly add pointers to memory nodes.
>>>> If control nodes come only from CastPP then I am fine with your code.
>>>
>>> I added debugging code (that I didn't keep in the webrev below) that added (memory operation, control from CastPP) pairs in a side table during final graph reshaping, updated the pairs during matching and checked that all nodes that gcm sees with a control precedence got it from a CastPP. I ran CTW and other tests with that code and all tests passed. During that testing, I noticed that:
>>
>> That's a good testing method.
>>
>> Precedence edges are a simple way to add miscellaneous node relations but it is easy to forget they are there.  I guess the gcm.cpp code picks them up completely.  And after the extra edges are added, not much happens that could "forget" (drop) an edge.  (Note that copying a node to make a better one has a risk to "forget" precedence edges.)
>>
>> But, if this technique were to be used in any more expansive way, or if you have lingering doubts about using precedence edges here, I would recommend creating an explicit new node type that captures multiple control dependency edges.  As we have a MergeMem node we could have a MergeControl node, whose input edges (after in(0)) would act like the precedence edges you are adding now.
>>
>> Two minor comments on code style in compile.cpp:  The new 'switch' is hard to untangle.  Wouldn't it be simpler to put the 'wq.push(use)' call before the 'break', and drop the 'default' case completely?
>>
>> Also, I really dislike it when block structure ({...}) cuts across #ifdef structure.  This hack would be slightly better:
>>   #ifdef _LP64
>>   if (n->in(1)->is_DecodeNarrowPtr() || n->in(2)->is_DecodeNarrowPtr()) {
>>     ...
>>   } else
>>   #endif //_LP64
>>   {
>>     ...
>>   }
>>
>> Better yet, you could also just delete the #ifdef LP64 and let the tests go forward.  Or incorporate a manifest constant:
>>      const bool is_LP64 = LP64_ONLY(true) NOT_LP64(false);
>>      if (is_LP64 && (...)) { ... } else { ... }
>>
>> The code in gcm.cpp treats precedence edges asymmetrically.  (The expression is 'n = is_dominator(bn, bm) ? m : n'.)  Do we want to assert that one of them dominates the other, perhaps using 'assert_dom'?
>>
>> It's great to see all that mysterious old code go away.
>>
>> - John
>

From vladimir.x.ivanov at oracle.com  Thu Apr 16 17:11:12 2015
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Thu, 16 Apr 2015 20:11:12 +0300
Subject: CHA for interfaces in C2 compiler
In-Reply-To: <CAHjP37HpSpR_H8eLJhic3nS8k_zBZY6o=GpKtx67BF3rq52KVQ@mail.gmail.com>
References: <CAHjP37FBM9JxaCdZCju_NFcfyF7HGURY5Evjc6+o+8cYfgmsZA@mail.gmail.com>	<552E9127.9030908@oracle.com>	<CAHjP37FxcQExHVZ+FGFfnSxfdTcgfaWKXtJM1+6pPNCMc3tK6g@mail.gmail.com>	<552E999F.1080707@oracle.com>	<CAHjP37Hda38eeuDh-CnYKS_foSyLupugzF6-NN36cBmaS_9-Tg@mail.gmail.com>	<37DE1BE4-C5F3-4240-9AA4-129ED74A84A9@oracle.com>
	<CAHjP37HpSpR_H8eLJhic3nS8k_zBZY6o=GpKtx67BF3rq52KVQ@mail.gmail.com>
Message-ID: <552FED30.8070404@oracle.com>

There's a more compelling use case for CHA on interfaces [1].
That's the case where type profiling fails.

The other argument to extend CHA for interfaces is to take default 
methods into account [2].

Most likely, the single interface implementer case you are looking for 
will follow nicely from one of these RFEs.

But, as noted by Tom in [1], CHA doesn't allow the type check for 
interfaces to be reliably eliminated (since the bytecode verifier doesn't 
provide type safety guarantees for interfaces). The exception is the 
array case, where the type check happens during the array store and the 
JIT can rely on that.
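
The array case can be illustrated with a minimal sketch. The class and 
interface names (ArrayStoreCheckDemo, Shape, Square) are made up for this 
example and do not come from the VM sources. The sketch only shows that 
the store into an interface-typed array is checked at run time; it does 
not show what the JIT does with that information.

```java
public class ArrayStoreCheckDemo {
    // Hypothetical interface with a single loaded implementer.
    interface Shape { int sides(); }

    static final class Square implements Shape {
        public int sides() { return 4; }
    }

    /** Returns true if storing value into target passes the array store check. */
    static boolean tryStore(Object[] target, Object value) {
        try {
            target[0] = value;  // aastore: the VM checks value against the array's element type
            return true;
        } catch (ArrayStoreException e) {
            return false;       // value does not implement the element type
        }
    }

    public static void main(String[] args) {
        Object[] shapes = new Shape[1];  // static type Object[], dynamic type Shape[]
        System.out.println(tryStore(shapes, new Square()));
        System.out.println(tryStore(shapes, "not a shape"));
    }
}
```

Because every element of a Shape[] has passed this check (or is null), a 
loop reading from the array can trust the element type without checking 
it again.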

Best regards,
Vladimir Ivanov

[1] https://bugs.openjdk.java.net/browse/JDK-6986483
[2] https://bugs.openjdk.java.net/browse/JDK-8036580

On 4/16/15 1:24 PM, Vitaly Davidovich wrote:
> I'd argue that the cacheline fetch is more of an issue when not dealing
> with array iteration but with the "islands" I was referring to earlier.
> In the array iteration scenario, the actual method invoked may be less
> expensive than the type check.  Yes, the branch will be predicted and
> all, but that's still extra code that will execute, extra entry in the
> BTB, etc.
>
> Maybe you guys aren't convinced, but do realize that there's plenty of
> code out there with the "one impl of an interface loaded" situation.
> Every small efficiency gain counts, IMO.
>
> sent from my phone
>
> On Apr 16, 2015 2:56 AM, "John Rose" <john.r.rose at oracle.com
> <mailto:john.r.rose at oracle.com>> wrote:
>
>     The real cost of the type check is a cache line fetch. In this case
>     you have a bunch of objects whose code is the same method(s) and
>     data fields are the only way to vary the behavior. So almost any
>     plausible application of this pattern will need the same cache line
>     as the data fields.
>
>     We have yet to see a convincing use case for this CHA (THA) case.  I
>     put some code in the VM to support this once but we never needed it
>     and it was removed.
>
>     (Another opt with a similar flavor would be support for the
>     singleton pattern.)
>
>     -- John
>
>      > On Apr 15, 2015, at 8:05 PM, Vitaly Davidovich <vitalyd at gmail.com
>     <mailto:vitalyd at gmail.com>> wrote:
>      >
>      > That would certainly be a place where removing the type check
>     would be beneficial
>

From vitalyd at gmail.com  Thu Apr 16 18:11:10 2015
From: vitalyd at gmail.com (Vitaly Davidovich)
Date: Thu, 16 Apr 2015 14:11:10 -0400
Subject: CHA for interfaces in C2 compiler
In-Reply-To: <552FED30.8070404@oracle.com>
References: <CAHjP37FBM9JxaCdZCju_NFcfyF7HGURY5Evjc6+o+8cYfgmsZA@mail.gmail.com>
	<552E9127.9030908@oracle.com>
	<CAHjP37FxcQExHVZ+FGFfnSxfdTcgfaWKXtJM1+6pPNCMc3tK6g@mail.gmail.com>
	<552E999F.1080707@oracle.com>
	<CAHjP37Hda38eeuDh-CnYKS_foSyLupugzF6-NN36cBmaS_9-Tg@mail.gmail.com>
	<37DE1BE4-C5F3-4240-9AA4-129ED74A84A9@oracle.com>
	<CAHjP37HpSpR_H8eLJhic3nS8k_zBZY6o=GpKtx67BF3rq52KVQ@mail.gmail.com>
	<552FED30.8070404@oracle.com>
Message-ID: <CAHjP37G=FOBJ4f5+a3psMbmcAc3RcJ41boA9KgHFjp7XYsqw4A@mail.gmail.com>

Hmm, I didn't know that bc verifier doesn't verify interfaces.  Is that an
open problem at the moment?

Sounds like interfaces just get the short end of the stick all around :).

On Thu, Apr 16, 2015 at 1:11 PM, Vladimir Ivanov <
vladimir.x.ivanov at oracle.com> wrote:

> There's a more compelling use case for CHA on interfaces [1].
> That's the case where type profiling fails.
>
> The other argument to extend CHA for interfaces is to take default methods
> into account [2].
>
> Most likely, single interface implementer case you are looking for will
> follow nicely from one of these RFEs.
>
> But, as noted by Tom in [1], CHA doesn't allow to reliably eliminate the
> type check for interfaces (since bytecode verifier doesn't provide type
> safety guarantees for interfaces). Except the array case, where type check
> happens during array store and JIT can rely on that.
>
> Best regards,
> Vladimir Ivanov
>
> [1] https://bugs.openjdk.java.net/browse/JDK-6986483
> [2] https://bugs.openjdk.java.net/browse/JDK-8036580
>
> On 4/16/15 1:24 PM, Vitaly Davidovich wrote:
>
>> I'd argue that the cacheline fetch is more of an issue when not dealing
>> with array iteration but with the "islands" I was referring to earlier.
>> In the array iteration scenario, the actual method invoked may be less
>> expensive than the type check.  Yes, the branch will be predicted and
>> all, but that's still extra code that will execute, extra entry in the
>> BTB, etc.
>>
>> Maybe you guys aren't convinced, but do realize that there's plenty of
>> code out there with the "one impl of an interface loaded" situation.
>> Every small efficiency gain counts, IMO.
>>
>> sent from my phone
>>
>> On Apr 16, 2015 2:56 AM, "John Rose" <john.r.rose at oracle.com
>> <mailto:john.r.rose at oracle.com>> wrote:
>>
>>     The real cost of the type check is a cache line fetch. In this case
>>     you have a bunch of objects whose code is the same method(s) and
>>     data fields are the only way to vary the behavior. So almost any
>>     plausible application of this pattern will need the same cache line
>>     as the data fields.
>>
>>     We have yet to see a convincing use case for this CHA (THA) case.  I
>>     put some code in the VM to support this once but we never needed it
>>     and it was removed.
>>
>>     (Another opt with a similar flavor would be support for the
>>     singleton pattern.)
>>
>>     -- John
>>
>>      > On Apr 15, 2015, at 8:05 PM, Vitaly Davidovich <vitalyd at gmail.com
>>     <mailto:vitalyd at gmail.com>> wrote:
>>      >
>>      > That would certainly be a place where removing the type check
>>     would be beneficial
>>
>>

From vladimir.x.ivanov at oracle.com  Thu Apr 16 18:51:09 2015
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Thu, 16 Apr 2015 21:51:09 +0300
Subject: CHA for interfaces in C2 compiler
In-Reply-To: <CAHjP37G=FOBJ4f5+a3psMbmcAc3RcJ41boA9KgHFjp7XYsqw4A@mail.gmail.com>
References: <CAHjP37FBM9JxaCdZCju_NFcfyF7HGURY5Evjc6+o+8cYfgmsZA@mail.gmail.com>	<552E9127.9030908@oracle.com>	<CAHjP37FxcQExHVZ+FGFfnSxfdTcgfaWKXtJM1+6pPNCMc3tK6g@mail.gmail.com>	<552E999F.1080707@oracle.com>	<CAHjP37Hda38eeuDh-CnYKS_foSyLupugzF6-NN36cBmaS_9-Tg@mail.gmail.com>	<37DE1BE4-C5F3-4240-9AA4-129ED74A84A9@oracle.com>	<CAHjP37HpSpR_H8eLJhic3nS8k_zBZY6o=GpKtx67BF3rq52KVQ@mail.gmail.com>	<552FED30.8070404@oracle.com>
	<CAHjP37G=FOBJ4f5+a3psMbmcAc3RcJ41boA9KgHFjp7XYsqw4A@mail.gmail.com>
Message-ID: <5530049D.6090207@oracle.com>

I don't think it's a problem, but rather one of the peculiarities of how 
the Java bytecode verification algorithm works. The current approach 
(erasure of interface types to Object) considerably simplifies the 
verification algorithm, but the VM is obliged to do more runtime checks.

[1] (§3.3) has a good overview of the problems with verification of 
interface types in Java bytecode.

Probably, we'll get a chance to reconsider that choice in the future.

Best regards,
Vladimir Ivanov

[1] 
http://www-master.ufr-info-p6.jussieu.fr/2005/IMG/pdf/leroy-bytecode-verification-JAR.pdf

On 4/16/15 9:11 PM, Vitaly Davidovich wrote:
> Hmm, I didn't know that bc verifier doesn't verify interfaces.  Is that
> an open problem at the moment?
>
> Sounds like interfaces just get the short end of the stick all around :).
>
> On Thu, Apr 16, 2015 at 1:11 PM, Vladimir Ivanov
> <vladimir.x.ivanov at oracle.com <mailto:vladimir.x.ivanov at oracle.com>> wrote:
>
>     There's a more compelling use case for CHA on interfaces [1].
>     That's the case where type profiling fails.
>
>     The other argument to extend CHA for interfaces is to take default
>     methods into account [2].
>
>     Most likely, single interface implementer case you are looking for
>     will follow nicely from one of these RFEs.
>
>     But, as noted by Tom in [1], CHA doesn't allow to reliably eliminate
>     the type check for interfaces (since bytecode verifier doesn't
>     provide type safety guarantees for interfaces). Except the array
>     case, where type check happens during array store and JIT can
>     rely on that.
>
>     Best regards,
>     Vladimir Ivanov
>
>     [1] https://bugs.openjdk.java.net/browse/JDK-6986483
>     [2] https://bugs.openjdk.java.net/browse/JDK-8036580
>
>     On 4/16/15 1:24 PM, Vitaly Davidovich wrote:
>
>         I'd argue that the cacheline fetch is more of an issue when not
>         dealing
>         with array iteration but with the "islands" I was referring to
>         earlier.
>         In the array iteration scenario, the actual method invoked may
>         be less
>         expensive than the type check.  Yes, the branch will be
>         predicted and
>         all, but that's still extra code that will execute, extra entry
>         in the
>         BTB, etc.
>
>         Maybe you guys aren't convinced, but do realize that there's
>         plenty of
>         code out there with the "one impl of an interface loaded" situation.
>         Every small efficiency gain counts, IMO.
>
>         sent from my phone
>
>         On Apr 16, 2015 2:56 AM, "John Rose" <john.r.rose at oracle.com> wrote:
>
>              The real cost of the type check is a cache line fetch. In
>         this case
>              you have a bunch of objects whose code is the same
>         method(s) and
>              data fields are the only way to vary the behavior. So
>         almost any
>              plausible application of this pattern will need the same
>         cache line
>              as the data fields.
>
>              We have yet to see a convincing use case for this CHA (THA)
>         case.  I
>              put some code in the VM to support this once but we never
>         needed it
>              and it was removed.
>
>              (Another opt with a similar flavor would be support for the
>              singleton pattern.)
>
>              -- John
>
>               > On Apr 15, 2015, at 8:05 PM, Vitaly Davidovich <vitalyd at gmail.com> wrote:
>               >
>               > That would certainly be a place where removing the type
>         check
>              would be beneficial
>
>

From vladimir.x.ivanov at oracle.com  Thu Apr 16 18:56:23 2015
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Thu, 16 Apr 2015 21:56:23 +0300
Subject: [9] RFR (M): 8057967: CallSite dependency tracking scales
	devastatingly poorly
In-Reply-To: <AB789311-7EEB-47B3-BCD6-F381A70E8386@oracle.com>
References: <551C5B92.8060500@oracle.com> <552527E1.5060102@oracle.com>
	<F76DEB60-25C3-4CD4-B71F-C29E364CBBB2@oracle.com>
	<552E89EC.7080900@oracle.com>
	<AB789311-7EEB-47B3-BCD6-F381A70E8386@oracle.com>
Message-ID: <553005D7.2070102@oracle.com>

Roland, thanks a lot for the review!

Best regards,
Vladimir Ivanov

On 4/15/15 7:43 PM, Roland Westrelin wrote:
>>   http://cr.openjdk.java.net/~vlivanov/8057967/webrev.01/
>
> That looks good to me.
>
> Roland.
>

From john.r.rose at oracle.com  Thu Apr 16 21:19:10 2015
From: john.r.rose at oracle.com (John Rose)
Date: Thu, 16 Apr 2015 14:19:10 -0700
Subject: RFR(S): 8069191: moving predicate out of loops may cause array
	accesses to bypass null check
In-Reply-To: <7BDCA4DA-BB97-436E-B761-6C883EDE955E@oracle.com>
References: <100419DB-199E-489C-B3EA-F104BF0EB203@oracle.com>
	<55086F20.9020305@oracle.com>
	<2ACAAB95-8175-48DB-8BD9-F5BF168A6666@oracle.com>
	<550893F0.9050608@oracle.com>
	<F8959E8E-ECA1-42DD-BC8A-AA7CD750F5C2@oracle.com>
	<88169234-01DE-470C-B56A-D96AD7C53D50@oracle.com>
	<7BDCA4DA-BB97-436E-B761-6C883EDE955E@oracle.com>
Message-ID: <E94CE692-6C2E-440B-A24E-3180E13EDE77@oracle.com>

On Apr 16, 2015, at 3:56 AM, Roland Westrelin <roland.westrelin at oracle.com> wrote:
> 
> Thanks for the review, comments and suggestions. I followed all suggestions except the one to use assert_dom because I'm not sure how applicable it is here. I added:
> 
> assert(is_dominator(bn, bm) || is_dominator(bm, bn), "one must dominate the other");
> 
> instead. This is what I intend to push unless I hear an objection:
> 
> http://cr.openjdk.java.net/~roland/8069191/webrev.02/
I'm happy to see this change go in, because it simplifies the code base significantly.

The code structure and assertions in gcm.cpp and compile.cpp are cleaner; thanks.

(I keep wanting to suggest a little ad hoc de-duplication of control edges.  E.g., if c==n->in(0), don't do n->add_prec(c).  Is that overly picky?  If done awkwardly it could make the code much harder to read.  Could have a subroutine n->ensure_control_or_add_prec(c).)

As we discussed on Skype, "case Op_CheckCastPP" should be accompanied by "case Op_CastPP", since CastPP nodes can cascade (non-trivially).

I think I understand the whole patch except for this line:
+     if (n->in(0) != NULL && (n->in(0)->is_Region() || (n->in(0)->in(0) != NULL && n->in(0)->in(0)->is_If()))) {

I would prefer a named predicate:

+     if (must_retain_control(n->in(0))) {

Then, the local routine can have the necessary comments explaining why some controls can be neglected while others must be retained.

And, along those lines, I'm having trouble imagining examples of controls that are *not* retained, let alone understanding why they are less important than Regions (OK, those are always important), and If-projections.  Why not (for example) catch projections?

Thanks!
-- John

From vladimir.kozlov at oracle.com  Thu Apr 16 22:30:59 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 16 Apr 2015 15:30:59 -0700
Subject: RFR(S): 8076284: Improve vectorization of parallel streams
In-Reply-To: <39F83597C33E5F408096702907E6C450E3E734@ORSMSX104.amr.corp.intel.com>
References: <39F83597C33E5F408096702907E6C450E3E586@ORSMSX104.amr.corp.intel.com>	<02FCFB8477C4EF43A2AD8E0C60F3DA2B63334516@FMSMSX112.amr.corp.intel.com>	<39F83597C33E5F408096702907E6C450E3E5A4@ORSMSX104.amr.corp.intel.com>	<02FCFB8477C4EF43A2AD8E0C60F3DA2B63334531@FMSMSX112.amr.corp.intel.com>
	<39F83597C33E5F408096702907E6C450E3E734@ORSMSX104.amr.corp.intel.com>
Message-ID: <55303823.1020205@oracle.com>

Hi Jan,

You did not describe your changes in detail (what they do).

The IgnoreVectorizeMethod flag should be positive and enabled by default. 
Rename it to AllowVectorizeOnDemand (or something similar):

+  product(bool, AllowVectorizeOnDemand, true,                              \

Instead of the following, you should add an intrinsic definition to 
classfile/vmSymbols.hpp and then check method()->intrinsic_id():

+    if (strcmp("forEachRemaining", method()->name()->as_quoted_ascii()) == 0 && method()->signature() != 0
+      && method()->signature()->as_symbol() != 0 && method()->signature()->as_symbol()->as_quoted_ascii() != 0 ) {
+      if (strstr(method()->signature()->as_symbol()->as_quoted_ascii(),"Ljava/util/function/IntConsumer")) {
+        set_do_vector_loop(true);
+      }
+    }

And that should be under the flag too, because in general forEachRemaining 
should be vectorized only if it is safe.

Can you also utilize changes done by Michael Berg for reduction 
optimization (the code in jdk9/hs-comp already)? I mean marking some 
nodes before unrolling and searching Phis.

Regards,
Vladimir

On 4/13/15 3:33 AM, Civlin, Jan wrote:
> Hi All,
>
>
>   We would like to contribute the improvement of vectorization of
>   parallel streams  from Intel.
>
> The contribution Bug ID: 8076284.
>
> Please review this patch:
>
> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8076284
>
> webrev: http://cr.openjdk.java.net/~kvn/8076284/webrev/
>
>
>       *Description*
>
> Improve vectorization of the unordered parallel streams (by vectorizing
> forEachRemaining method).
>
> For example, this forEach will be vectorized:
>
> java.util.stream.IntStream iStream = java.util.stream.IntStream.range(0,
> RANGE - 1).parallel();
>
> iStream.forEach( id -> c[id] = c[id] + c[id+1] );
>
> It also enables on-demand loop vectorization in a given method (by
> providing more hints to SuperWord optimization).
>
> For example, use -XX:CompileCommand=option,computeCall,Vectorize to
> vectorize this loop
>
> void computeCall(double [] Call, double  puByDf, double  pdByDf)
>
> {
>
> for(int i = timeStep; i > 0; i--)
>
> for(int j = 0; j <= i - 1; j++)
>
> Call[j] = puByDf * Call[j + 1] + pdByDf * Call[j];
>
> }
>
>
> This enhancement is contributed by Intel and sponsored by the hotspot
> compiler team.
>
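
For reference, the forEach example quoted above can be turned into a 
self-contained program along the following lines. This is only a sketch 
under stated assumptions: the class and method names (ParallelForEachDemo, 
pairSums) are invented here, and, unlike the quoted snippet, it writes to 
a separate output array so that the parallel result is deterministic. It 
makes no claim about whether a particular JVM build vectorizes the loop.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

public class ParallelForEachDemo {
    /**
     * Computes dst[id] = src[id] + src[id + 1] with an unordered parallel forEach.
     * Separate input and output arrays are an assumption of this sketch; they
     * avoid the read/write overlap of the in-place form c[id] = c[id] + c[id+1].
     * Requires src.length >= 1.
     */
    static int[] pairSums(int[] src) {
        int[] dst = new int[src.length - 1];
        IntStream.range(0, src.length - 1)
                 .parallel()
                 .forEach(id -> dst[id] = src[id] + src[id + 1]);
        return dst;
    }

    public static void main(String[] args) {
        int[] result = pairSums(new int[] {1, 2, 3, 4});
        System.out.println(Arrays.toString(result));
    }
}
```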

From roland.westrelin at oracle.com  Fri Apr 17 08:16:13 2015
From: roland.westrelin at oracle.com (Roland Westrelin)
Date: Fri, 17 Apr 2015 10:16:13 +0200
Subject: RFR(S): 8069191: moving predicate out of loops may cause array
	accesses to bypass null check
In-Reply-To: <E94CE692-6C2E-440B-A24E-3180E13EDE77@oracle.com>
References: <100419DB-199E-489C-B3EA-F104BF0EB203@oracle.com>
	<55086F20.9020305@oracle.com>
	<2ACAAB95-8175-48DB-8BD9-F5BF168A6666@oracle.com>
	<550893F0.9050608@oracle.com>
	<F8959E8E-ECA1-42DD-BC8A-AA7CD750F5C2@oracle.com>
	<88169234-01DE-470C-B56A-D96AD7C53D50@oracle.com>
	<7BDCA4DA-BB97-436E-B761-6C883EDE955E@oracle.com>
	<E94CE692-6C2E-440B-A24E-3180E13EDE77@oracle.com>
Message-ID: <5D648F8E-8A17-4195-8D77-323D8D6B1AB3@oracle.com>

Thanks for taking another look at this.

> And, along those lines, I'm having trouble imagining examples of controls that are *not* retained, let alone understanding why they are less important than Regions (OK, those are always important), and If-projections.  Why not (for example) catch projections?

I noticed this:

1- CastPP nodes don't always have a control
2- some CastPP nodes depend on a Region because the test was moved to the branch of a dominating If
3- the tests for some CastPP nodes are removed during escape analysis

In compile.cpp:

n->in(0) != NULL is 1-
n->in(0)->is_Region() is 2-
n->in(0)->in(0) != NULL && n->in(0)->in(0)->is_If() is 3-

What we want to avoid, I think, is this: the test for a CastPP is optimized out but the CastPP is not, and it has its control set to something that is no longer a test. If that something is in the middle of a block then during gcm we can't determine whether a precedence edge's control dominates a memory operation's control because they could both be in the same block. That said, gcm has this test:

138         if (m->is_block_proj() || m->is_block_start()) {

which filters out those cases where m would be in the middle of a block. So I guess the test in compile.cpp could simply be n->in(0) != NULL

Roland.

From roland.westrelin at oracle.com  Fri Apr 17 14:35:18 2015
From: roland.westrelin at oracle.com (Roland Westrelin)
Date: Fri, 17 Apr 2015 16:35:18 +0200
Subject: RFR(S): 8077832: SA's dumpreplaydata,
	dumpcfg and buildreplayjars are broken
In-Reply-To: <552E9EB1.6070607@oracle.com>
References: <F72C231A-7282-4898-B76B-0A2C18CEF0F0@oracle.com>
	<552E9EB1.6070607@oracle.com>
Message-ID: <CEEF2886-EFCB-45D3-87ED-957C5B1D7598@oracle.com>

Thanks for the reviews, Vladimir, Dmitry and Staffan.

Roland.

From zoltan.majo at oracle.com  Fri Apr 17 14:38:54 2015
From: zoltan.majo at oracle.com (=?UTF-8?B?Wm9sdMOhbiBNYWrDsw==?=)
Date: Fri, 17 Apr 2015 16:38:54 +0200
Subject: [9] RFR(S): 8076445:  Array accesses using sun.misc.Unsafe cause
	data corruption or SIGSEGV
Message-ID: <55311AFE.4090403@oracle.com>

Hi,


please review the following patch.

Bug: https://bugs.openjdk.java.net/browse/JDK-8076445

Problem: The errors listed in the bug description appear because the C1 
compiler can generate incorrect memory addressing if unsafe intrinsics 
are inlined.

The problem appears if an Unsafe.putInt is followed by an Unsafe.getInt 
to the same location, for example if the following Java code is compiled 
(some details omitted):

1: for (long index = 0; index < arraySize; index++) {
2:     long offset = pointer + (index * SIZE_OF_INT_IN_BYTES);
3:     unsafe.putInt(offset, (int)index);
4:     int readback = unsafe.getInt(offset);
5: }

Here is the C1 compiler's HIR for lines 3 and 4:

_p__bci__use__tid__result____instruction________________________________________ (HIR)
...
  .  20   1    a23  [R194|L]  a17._104 (L) unsafe
     25   1    i24  [R195|I]  l2i(l6)
  .  26   0    a25            null_check(a23)
  .  26   0    v26            UnsafePutRaw.(base l18, index l6, log2_scale 2, value i24)
  .  29   1    a28  [R196|L]  a17._104 (L) unsafe
  .  33   0    a29            null_check(a28)
  .  33   1    i30  [R197|I]  UnsafeGetRaw.(base l18, index l6, log2_scale 2)
...

On the HIR level, both UnsafePutRaw and UnsafeGetRaw use the same (and 
correct) base, index, and scale. On the LIR level, however, the 
following is generated:

_nr___instruction_______________________________________________________________ (LIR)
...
  92   move [Base:[R188|L] Disp: 104|L] [R194|L]
  94   null_check [R194|L]   [bci:26]
  96   convert [l2i] [R180|J] [R195|I]
  98   shift_left [R180|J] [int:2|I] [R180|J]
  100  move [R195|I] [Base:[R189|J] Index:[R180|J] Disp: 0|I]
  102  move [Base:[R188|L] Disp: 104|L] [R196|L]
  104  null_check [R196|L]   [bci:33]
  106  move [Base:[R189|J] Index:[R180|J] * 4 Disp: 0|I] [R197|I]
...

At instruction #96, the long index (R180) is converted to an int and is 
stored into R195.

At instruction #100, the int index (R195) is stored into the memory 
region allocated with unsafe. However, as we store an int, the index 
used to address the destination of the store must be scaled by 4. As a 
result, at instruction #98 the long index (from R180) is scaled by 4, 
stored back into R180, and is then used in instruction #100.

At instruction #106, a load is attempted from the same memory location 
that we stored to before. R180 is used as the index, but the compiler 
scales R180 once more by 4 (in addition to the scaling that was already 
done at #98).

As a result, the UnsafeGetRaw accesses an incorrect memory address, 
which results in data corruption and eventually a SIGSEGV.
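
The address arithmetic described above can be sketched in plain Java. 
This is only an illustration of the double scaling, not compiler code: 
the class and method names (DoubleScalingDemo, storeAddress, 
buggyLoadAddress, fixedLoadAddress) and the sample base address are 
assumptions made for the example.

```java
public class DoubleScalingDemo {
    static final int LOG2_SCALE = 2;  // an int is 4 bytes wide

    /** Address used by the store: base + (index << 2), as in LIR #98 and #100. */
    static long storeAddress(long base, long index) {
        return base + (index << LOG2_SCALE);
    }

    /**
     * Address used by the faulty load: the index register already holds the
     * shifted value left behind by #98, and the addressing mode of #106
     * multiplies it by 4 once more.
     */
    static long buggyLoadAddress(long base, long index) {
        long clobberedIndex = index << LOG2_SCALE;
        return base + clobberedIndex * 4;
    }

    /** Address the load is meant to use: the unscaled index times 4. */
    static long fixedLoadAddress(long base, long index) {
        return base + index * 4;
    }

    public static void main(String[] args) {
        long base = 0x1000L;  // arbitrary sample address
        long index = 3;
        System.out.printf("store 0x%x, buggy load 0x%x, fixed load 0x%x%n",
                storeAddress(base, index),
                buggyLoadAddress(base, index),
                fixedLoadAddress(base, index));
    }
}
```

For index 0 all three addresses coincide, so in this model the first loop 
iteration reads back the right value and the mismatch only shows up from 
index 1 onwards.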

LIR instruction #98 is generated by LIRGenerator::do_UnsafePutRaw (lines 
2190--2193 in c1_LIRGenerator.cpp): index_op is scaled by log2_scale and 
is stored back into index_op.

LIR instruction #106 is generated by LIRGenerator::do_UnsafeGetRaw (line 
2106 in c1_LIRGenerator.cpp): The previously scaled index_op is reused 
and an additional scaling by log2_scale is performed.


Solution:

- change the scaling code in do_UnsafePutRaw to store result into a new 
temporary register tmp and not back into index_op (as index_op is 
possibly reused later);
- change the scaling code in do_UnsafePutRaw to embed the scale directly 
into the instruction on Intel architectures (as it is done by 
do_UnsafeGetRaw);
- do_UnsafePutRaw and do_UnsafeGetRaw function similarly, so refactor 
both methods to make the similarities more obvious from the source code;
- give new names to temporary registers to improve readability of the 
code, for example at places like:

2053     base_op = new_register(T_INT);
2054     __ convert(Bytecodes::_l2i, base.result(), base_op); # base_op is originally base.result()

becomes

2059     LIR_Opr tmp = new_register(T_INT);
2060     __ convert(Bytecodes::_l2i, base_op, tmp);
2061     base_op = tmp;


Webrev: http://cr.openjdk.java.net/~zmajo/8076445/webrev.00/

Testing:
- add new test: CheckC1OptimizedUnsafeIntrinsics.java;
- CheckC1OptimizedUnsafeIntrinsics.java: fails on 5/9 JPRT platforms if 
executed without the fix;
- JPRT: all tests pass (including CheckC1OptimizedUnsafeIntrinsics.java).

Thank you and best regards,


Zoltan


From aleksey.shipilev at oracle.com  Fri Apr 17 17:59:37 2015
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Fri, 17 Apr 2015 20:59:37 +0300
Subject: RFR (S) 8076987: C1 should support conditional card marks
	(UseCondCardMark)
Message-ID: <55314A09.4050906@oracle.com>

Hi,

I would like to propose a tiny enhancement in C1: handling
-XX:+UseCondCardMark. With tiered compilation enabled by default, we
need to do C1 compiles with conditional card marks as well -- otherwise
it collides with C2 compiled code that honestly does it.
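
As a rough model of what a conditional card mark does, consider the 
sketch below. It is plain Java rather than generated code, and every 
name and constant in it (CondCardMarkDemo, a 512-byte card, 0 as the 
dirty value, -1 as the clean value) is an assumption made for 
illustration, not something taken from the webrev.

```java
import java.util.Arrays;

public class CondCardMarkDemo {
    // Illustrative assumptions: 512-byte cards, 0 means dirty, -1 means clean.
    static final int CARD_SHIFT = 9;
    static final byte DIRTY = 0;
    static final byte CLEAN = -1;

    static byte[] newCardTable(int cards) {
        byte[] table = new byte[cards];
        Arrays.fill(table, CLEAN);
        return table;
    }

    /** Unconditional barrier: always stores to the card. Returns true because a store was issued. */
    static boolean markUnconditional(byte[] cardTable, long address) {
        cardTable[(int) (address >>> CARD_SHIFT)] = DIRTY;
        return true;
    }

    /** Conditional barrier: reads the card first and stores only if it is not dirty yet. */
    static boolean markConditional(byte[] cardTable, long address) {
        int card = (int) (address >>> CARD_SHIFT);
        if (cardTable[card] == DIRTY) {
            return false;  // store skipped, the card's cache line is not written again
        }
        cardTable[card] = DIRTY;
        return true;
    }

    public static void main(String[] args) {
        byte[] table = newCardTable(8);
        System.out.println(markConditional(table, 1024)); // first write to this card
        System.out.println(markConditional(table, 1030)); // same card again
    }
}
```

The extra load and branch are the price for skipping repeated writes to 
a card that is already dirty.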

RFE:
  https://bugs.openjdk.java.net/browse/JDK-8076987

Webrev:
  http://cr.openjdk.java.net/~shade/8076987/webrev.00/

Testing:
  - eyeballing benchmark assembly dumps
  - default JPRT
  - JPRT with -XX:+UseCondCardMark

Thanks,
-Aleksey

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: OpenPGP digital signature
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20150417/714cfe06/signature.asc>

From john.r.rose at oracle.com  Fri Apr 17 19:53:29 2015
From: john.r.rose at oracle.com (John Rose)
Date: Fri, 17 Apr 2015 12:53:29 -0700
Subject: RFR(S): 8069191: moving predicate out of loops may cause array
	accesses to bypass null check
In-Reply-To: <5D648F8E-8A17-4195-8D77-323D8D6B1AB3@oracle.com>
References: <100419DB-199E-489C-B3EA-F104BF0EB203@oracle.com>
	<55086F20.9020305@oracle.com>
	<2ACAAB95-8175-48DB-8BD9-F5BF168A6666@oracle.com>
	<550893F0.9050608@oracle.com>
	<F8959E8E-ECA1-42DD-BC8A-AA7CD750F5C2@oracle.com>
	<88169234-01DE-470C-B56A-D96AD7C53D50@oracle.com>
	<7BDCA4DA-BB97-436E-B761-6C883EDE955E@oracle.com>
	<E94CE692-6C2E-440B-A24E-3180E13EDE77@oracle.com>
	<5D648F8E-8A17-4195-8D77-323D8D6B1AB3@oracle.com>
Message-ID: <2F8C06E8-E0C5-42E2-A397-E94C5D7F82C9@oracle.com>

> On Apr 17, 2015, at 1:16 AM, Roland Westrelin <roland.westrelin at oracle.com> wrote:
> ?
> So I guess the test in compile.cpp could simply be n->in(0) != NULL

That would be safer I think. 

-- John


From sandhya.viswanathan at intel.com  Fri Apr 17 20:41:11 2015
From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya)
Date: Fri, 17 Apr 2015 20:41:11 +0000
Subject: Vzeroupper hotspot bug 
Message-ID: <02FCFB8477C4EF43A2AD8E0C60F3DA2B63335BD8@FMSMSX112.amr.corp.intel.com>

Hi Vladimir,

With 32 byte width vectorization the JVM produces wrong results under certain circumstances for x86_64.
The Vzeroupper instruction introduced in String.equals, String.compareTo and OptimizeFill intrinsic/stubs in the following change set is the cause of the bug:

http://hg.openjdk.java.net/jdk7u/jdk7u/hotspot/rev/d59ed8d47aed

For 32 byte vectorization, YMM registers are used by the hotspot compiler and the register allocator can allocate these across intrinsic methods.
Vzeroupper in the intrinsic is clobbering upper 16 bytes in all the other YMM registers that are not touched in these methods and so our customers are seeing unexpected results.
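
The effect can be modelled with a small Java sketch. It is a toy model 
only: each 256-bit YMM register is represented as four 64-bit lanes, and 
the names (VzeroupperDemo, stubWithVzeroupper) and the choice of ymm5 as 
the caller's live register are assumptions made for the example.

```java
import java.util.Arrays;

public class VzeroupperDemo {
    static final int NUM_REGS = 16;  // YMM0..YMM15 in 64-bit mode
    static final int LANES = 4;      // 4 x 64 bits = 256 bits per register

    /** Models vzeroupper: zeroes bits 255:128 of every YMM register. */
    static void vzeroupper(long[][] ymm) {
        for (long[] reg : ymm) {
            reg[2] = 0;
            reg[3] = 0;
        }
    }

    /** Hypothetical stub that only works in ymm0 but finishes with vzeroupper. */
    static void stubWithVzeroupper(long[][] ymm) {
        Arrays.fill(ymm[0], 0x1111L);
        vzeroupper(ymm);
    }

    public static void main(String[] args) {
        long[][] ymm = new long[NUM_REGS][LANES];
        Arrays.fill(ymm[5], 42L);    // caller keeps a live 32-byte vector in ymm5
        stubWithVzeroupper(ymm);
        // The upper two lanes of ymm5 are lost although the stub never used ymm5.
        System.out.println(Arrays.toString(ymm[5]));
    }
}
```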


We need your help to create an RFE for this problem.


I have created a patch that fixes the problem on Linux and I will send you the corresponding webrev to attach to the RFE.


Best Regards,

Sandhya



From sandhya.viswanathan at intel.com  Fri Apr 17 21:34:21 2015
From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya)
Date: Fri, 17 Apr 2015 21:34:21 +0000
Subject: Vzeroupper hotspot bug 
Message-ID: <02FCFB8477C4EF43A2AD8E0C60F3DA2B63335C40@FMSMSX112.amr.corp.intel.com>


RFE:


https://bugs.openjdk.java.net/browse/JDK-8078113


webrev:


http://cr.openjdk.java.net/~kvn/8078113/webrev.00/


Best Regards,
Sandhya


From: Viswanathan, Sandhya
Sent: Friday, April 17, 2015 1:41 PM
To: hotspot-compiler-dev at openjdk.java.net; 'Vladimir Kozlov'
Subject: Vzeroupper hotspot bug

Hi Vladimir,

With 32 byte width vectorization the JVM produces wrong results under certain circumstances for x86_64.
The Vzeroupper instruction introduced in String.equals, String.compareTo and OptimizeFill intrinsic/stubs in the following change set is the cause of the bug:

http://hg.openjdk.java.net/jdk7u/jdk7u/hotspot/rev/d59ed8d47aed

For 32 byte vectorization, YMM registers are used by the hotspot compiler and the register allocator can allocate these across intrinsic methods.
Vzeroupper in the intrinsic is clobbering upper 16 bytes in all the other YMM registers that are not touched in these methods and so our customers are seeing unexpected results.


We need your help to create an RFE for this problem.


I have created a patch that fixes the problem on Linux and I will send you the corresponding webrev to attach to the RFE.


Best Regards,

Sandhya



From sandhya.viswanathan at intel.com  Fri Apr 17 22:05:58 2015
From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya)
Date: Fri, 17 Apr 2015 22:05:58 +0000
Subject: RFR(XS): 8078113: 8011102 changes may cause incorrect results
Message-ID: <02FCFB8477C4EF43A2AD8E0C60F3DA2B63335C8C@FMSMSX112.amr.corp.intel.com>

Hi All,

We would like to contribute a patch for bug 8078113 from Intel.


RFE: https://bugs.openjdk.java.net/browse/JDK-8078113


webrev: http://cr.openjdk.java.net/~kvn/8078113/webrev.00/


With 32 byte width vectorization the JVM produces wrong results under certain circumstances for x86_64.
The Vzeroupper instruction introduced in String.equals, String.compareTo and OptimizeFill intrinsic/stubs in the following change set is the cause of the bug:

http://hg.openjdk.java.net/jdk7u/jdk7u/hotspot/rev/d59ed8d47aed

For 32 byte vectorization, YMM registers are used by the hotspot compiler and the register allocator can allocate these across intrinsic methods.
Vzeroupper in the intrinsic is clobbering upper 16 bytes in all the other YMM registers that are not touched in these methods and so our customers are
seeing unexpected results.


This patch fixes the problem on Linux.


Best Regards,
Sandhya


From vladimir.kozlov at oracle.com  Fri Apr 17 22:13:23 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 17 Apr 2015 15:13:23 -0700
Subject: RFR(XS): 8078113: 8011102 changes may cause incorrect results
In-Reply-To: <02FCFB8477C4EF43A2AD8E0C60F3DA2B63335C8C@FMSMSX112.amr.corp.intel.com>
References: <02FCFB8477C4EF43A2AD8E0C60F3DA2B63335C8C@FMSMSX112.amr.corp.intel.com>
Message-ID: <55318583.2070503@oracle.com>

Looks good. How did you verify the fix?

Thanks,
Vladimir

On 4/17/15 3:05 PM, Viswanathan, Sandhya wrote:
> Hi All,
>
> We would like to contribute a patch for bug 8078113 from Intel.
>
> RFE: https://bugs.openjdk.java.net/browse/JDK-8078113
>
> webrev: http://cr.openjdk.java.net/~kvn/8078113/webrev.00/
>
> With 32 byte width vectorization the JVM produces wrong results under
> certain circumstances for x86_64.
>
> The Vzeroupper instruction introduced in String.equals, String.compareTo
> and OptimizeFill intrinsic/stubs in the following change set is the
> cause of the bug:
>
> http://hg.openjdk.java.net/jdk7u/jdk7u/hotspot/rev/d59ed8d47aed
>
> For 32 byte vectorization, YMM registers are used by the hotspot
> compiler and the register allocator can allocate these across intrinsic
> methods.
>
> Vzeroupper in the intrinsic is clobbering upper 16 bytes in all the
> other YMM registers that are not touched in these methods and so our
> customers are
>
> seeing unexpected results.
>
> This patch fixes the problem on Linux.
>
> Best Regards,
>
> Sandhya
>

From sandhya.viswanathan at intel.com  Fri Apr 17 23:32:32 2015
From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya)
Date: Fri, 17 Apr 2015 23:32:32 +0000
Subject: RFR(XS): 8078113: 8011102 changes may cause incorrect results
In-Reply-To: <55318583.2070503@oracle.com>
References: <02FCFB8477C4EF43A2AD8E0C60F3DA2B63335C8C@FMSMSX112.amr.corp.intel.com>
	<55318583.2070503@oracle.com>
Message-ID: <02FCFB8477C4EF43A2AD8E0C60F3DA2B63335CC7@FMSMSX112.amr.corp.intel.com>

Hi Vladimir,

I verified the fix using code from our customer that I received under NDA, so I am unable to share it.

Best Regards,
Sandhya


-----Original Message-----
From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] 
Sent: Friday, April 17, 2015 3:13 PM
To: Viswanathan, Sandhya; hotspot-compiler-dev at openjdk.java.net
Subject: Re: RFR(XS): 8078113: 8011102 changes may cause incorrect results

Looks good. How did you verify the fix?

Thanks,
Vladimir

On 4/17/15 3:05 PM, Viswanathan, Sandhya wrote:
> Hi All,
>
> We would like to contribute a patch for bug 8078113 from Intel.
>
> RFE: https://bugs.openjdk.java.net/browse/JDK-8078113
>
> webrev: http://cr.openjdk.java.net/~kvn/8078113/webrev.00/
>
> With 32 byte width vectorization the JVM produces wrong results under
> certain circumstances for x86_64.
>
> The Vzeroupper instruction introduced in String.equals, String.compareTo
> and OptimizeFill intrinsic/stubs in the following change set is the
> cause of the bug:
>
> http://hg.openjdk.java.net/jdk7u/jdk7u/hotspot/rev/d59ed8d47aed
>
> For 32 byte vectorization, YMM registers are used by the hotspot
> compiler and the register allocator can allocate these across intrinsic
> methods.
>
> Vzeroupper in the intrinsic is clobbering upper 16 bytes in all the
> other YMM registers that are not touched in these methods and so our
> customers are
>
> seeing unexpected results.
>
> This patch fixes the problem on Linux.
>
> Best Regards,
>
> Sandhya
>

From vladimir.kozlov at oracle.com  Fri Apr 17 23:35:27 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 17 Apr 2015 16:35:27 -0700
Subject: RFR(XS): 8078113: 8011102 changes may cause incorrect results
In-Reply-To: <02FCFB8477C4EF43A2AD8E0C60F3DA2B63335CC7@FMSMSX112.amr.corp.intel.com>
References: <02FCFB8477C4EF43A2AD8E0C60F3DA2B63335C8C@FMSMSX112.amr.corp.intel.com>
	<55318583.2070503@oracle.com>
	<02FCFB8477C4EF43A2AD8E0C60F3DA2B63335CC7@FMSMSX112.amr.corp.intel.com>
Message-ID: <553198BF.7040308@oracle.com>

Okay. I will sponsor this fix.

Thanks,
Vladimir

On 4/17/15 4:32 PM, Viswanathan, Sandhya wrote:
> Hi Vladimir,
>
> I verified the fix using code from our customer that I received under NDA, so I am unable to share it.
>
> Best Regards,
> Sandhya
>
>
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Friday, April 17, 2015 3:13 PM
> To: Viswanathan, Sandhya; hotspot-compiler-dev at openjdk.java.net
> Subject: Re: RFR(XS): 8078113: 8011102 changes may cause incorrect results
>
> Looks good. How did you verify the fix?
>
> Thanks,
> Vladimir
>
> On 4/17/15 3:05 PM, Viswanathan, Sandhya wrote:
>> Hi All,
>>
>> We would like to contribute a patch for bug 8078113 from Intel.
>>
>> RFE: https://bugs.openjdk.java.net/browse/JDK-8078113
>>
>> webrev: http://cr.openjdk.java.net/~kvn/8078113/webrev.00/
>>
>> With 32 byte width vectorization the JVM produces wrong results under
>> certain circumstances for x86_64.
>>
>> The Vzeroupper instruction introduced in String.equals, String.compareTo
>> and OptimizeFill intrinsic/stubs in the following change set is the
>> cause of the bug:
>>
>> http://hg.openjdk.java.net/jdk7u/jdk7u/hotspot/rev/d59ed8d47aed
>>
>> For 32 byte vectorization, YMM registers are used by the hotspot
>> compiler and the register allocator can allocate these across intrinsic
>> methods.
>>
>> Vzeroupper in the intrinsic is clobbering upper 16 bytes in all the
>> other YMM registers that are not touched in these methods and so our
>> customers are
>>
>> seeing unexpected results.
>>
>> This patch fixes the problem on Linux.
>>
>> Best Regards,
>>
>> Sandhya
>>

From sandhya.viswanathan at intel.com  Fri Apr 17 23:46:22 2015
From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya)
Date: Fri, 17 Apr 2015 23:46:22 +0000
Subject: RFR(XS): 8078113: 8011102 changes may cause incorrect results
In-Reply-To: <553198BF.7040308@oracle.com>
References: <02FCFB8477C4EF43A2AD8E0C60F3DA2B63335C8C@FMSMSX112.amr.corp.intel.com>
	<55318583.2070503@oracle.com>
	<02FCFB8477C4EF43A2AD8E0C60F3DA2B63335CC7@FMSMSX112.amr.corp.intel.com>
	<553198BF.7040308@oracle.com>
Message-ID: <02FCFB8477C4EF43A2AD8E0C60F3DA2B63335CE3@FMSMSX112.amr.corp.intel.com>

Thanks a lot!  This problem existed in JDK 8u as well, so it would be good to consider backporting the fix to 8u60 if possible.

Another thing I want to mention is that we need to look into the Windows case separately, maybe as another RFE:
https://msdn.microsoft.com/en-us/library/9z1stfyw.aspx
The Windows calling convention says that only XMM6-XMM15 are SOE (callee-saved), whereas the upper 128 bits of YMM6-YMM15 are SOC (caller-saved).
This is not reflected in the x86.ad file. Also, the AES stubs seem to save only the lower 128 bits on Windows, using movdqu.
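
As a rough illustration of the Windows concern, the sketch below models a stub that saves and restores only the low 128 bits of a register, the way a movdqu-based save would. Everything in it (Ymm, XmmSave, stub_body, call_stub_with_xmm_only_save) is a hypothetical stand-in and not the actual stub code.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

using Ymm = std::array<std::uint8_t, 32>;

// Hypothetical save slot holding only the low 128 bits (the XMM part).
struct XmmSave {
    std::uint8_t bytes[16];
};

inline XmmSave save_low128(const Ymm& r) {
    XmmSave s;
    std::memcpy(s.bytes, r.data(), 16);
    return s;
}

inline void restore_low128(Ymm& r, const XmmSave& s) {
    std::memcpy(r.data(), s.bytes, 16);
}

// Stand-in for a stub body that overwrites the whole 256-bit register.
inline void stub_body(Ymm& r) {
    r.fill(0);
}

// Callee that honors "XMM6 is callee-saved" but nothing more.
inline Ymm call_stub_with_xmm_only_save(Ymm ymm6) {
    const XmmSave saved = save_low128(ymm6);
    stub_body(ymm6);
    restore_low128(ymm6, saved);
    assert(std::memcmp(ymm6.data(), saved.bytes, 16) == 0);  // low half restored
    return ymm6;   // the upper 16 bytes still hold whatever the stub left there
}
```

The low half comes back intact and the upper half does not, so compiled code that keeps a 256-bit value in YMM6-YMM15 across such a call would have to treat the upper halves as clobbered.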

Best Regards,
Sandhya


-----Original Message-----
From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] 
Sent: Friday, April 17, 2015 4:35 PM
To: Viswanathan, Sandhya; hotspot-compiler-dev at openjdk.java.net
Subject: Re: RFR(XS): 8078113: 8011102 changes may cause incorrect results

Okay. I will sponsor this fix.

Thanks,
Vladimir

On 4/17/15 4:32 PM, Viswanathan, Sandhya wrote:
> Hi Vladimir,
>
> I verified the fix using code from our customer that I received under NDA, so I am unable to share it.
>
> Best Regards,
> Sandhya
>
>
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Friday, April 17, 2015 3:13 PM
> To: Viswanathan, Sandhya; hotspot-compiler-dev at openjdk.java.net
> Subject: Re: RFR(XS): 8078113: 8011102 changes may cause incorrect results
>
> Looks good. How did you verify the fix?
>
> Thanks,
> Vladimir
>
> On 4/17/15 3:05 PM, Viswanathan, Sandhya wrote:
>> Hi All,
>>
>> We would like to contribute a patch for bug 8078113 from Intel.
>>
>> RFE: https://bugs.openjdk.java.net/browse/JDK-8078113
>>
>> webrev: http://cr.openjdk.java.net/~kvn/8078113/webrev.00/
>>
>> With 32 byte width vectorization the JVM produces wrong results under
>> certain circumstances for x86_64.
>>
>> The Vzeroupper instruction introduced in String.equals, String.compareTo
>> and OptimizeFill intrinsic/stubs in the following change set is the
>> cause of the bug:
>>
>> http://hg.openjdk.java.net/jdk7u/jdk7u/hotspot/rev/d59ed8d47aed
>>
>> For 32 byte vectorization, YMM registers are used by the hotspot
>> compiler and the register allocator can allocate these across intrinsic
>> methods.
>>
>> Vzeroupper in the intrinsic is clobbering upper 16 bytes in all the
>> other YMM registers that are not touched in these methods and so our
>> customers are
>>
>> seeing unexpected results.
>>
>> This patch fixes the problem on Linux.
>>
>> Best Regards,
>>
>> Sandhya
>>

From vitalyd at gmail.com  Sat Apr 18 01:07:09 2015
From: vitalyd at gmail.com (Vitaly Davidovich)
Date: Fri, 17 Apr 2015 21:07:09 -0400
Subject: TrustNonStaticFinalFields clarification
Message-ID: <CAHjP37H1Sz-C=opfebY82cd7+oELOQCniANOMgikQN-e7K7cSw@mail.gmail.com>

Hi guys,

I'm hoping someone could clarify/confirm my understanding of this
experimental flag's effects:

1) final instance array length is constant propagated? Even if array is
passed in as ctor arg rather than being instantiated in the ctor?
2) final instance fields seen as never null are forever considered as such?
So even if a method call on that object is fully eliminated (e.g. the
method is empty) no null check is left behind?
3) concrete runtime type of the instance field is propagated to uses and no
additional type checks are done? Say the declared type is an
interface/abstract with multiple implementations loaded but only one type
stored in the field - is a type check eliminated and calls are fully
devirtualized?
4) primitive type final fields have their value constant propagated if
compiler sees only one value always stored?
5) do derived classes and base class share field profile or not? For
example subclasses always store concrete type but each subclass stores a
different type from the others.

Also, there's been some talk about doing these optimizations automatically,
with invalidation built in.  Just curious where that stands.

Thanks

sent from my phone

From igor.veresov at oracle.com  Sat Apr 18 05:47:31 2015
From: igor.veresov at oracle.com (Igor Veresov)
Date: Fri, 17 Apr 2015 22:47:31 -0700
Subject: RFR (S) 8076987: C1 should support conditional card marks
	(UseCondCardMark)
In-Reply-To: <55314A09.4050906@oracle.com>
References: <55314A09.4050906@oracle.com>
Message-ID: <6DDD7A84-4108-4DD0-962C-982746F7A421@oracle.com>

Looks good to me.

igor

> On Apr 17, 2015, at 10:59 AM, Aleksey Shipilev <aleksey.shipilev at oracle.com> wrote:
> 
> Hi,
> 
> I would like to propose a tiny enhancement in C1: handling
> -XX:+UseCondCardMark. With tiered compilation enabled by default, we
> need to do C1 compiles with conditional card marks as well -- otherwise
> it collides with C2 compiled code that honestly does it.
> 
> RFE:
>  https://bugs.openjdk.java.net/browse/JDK-8076987
> 
> Webrev:
>  http://cr.openjdk.java.net/~shade/8076987/webrev.00/
> 
> Testing:
>  - eyeballing benchmark assembly dumps
>  - default JPRT
>  - JPRT with -XX:+UseCondCardMark
> 
> Thanks,
> -Aleksey
> 


From roland.westrelin at oracle.com  Mon Apr 20 12:06:48 2015
From: roland.westrelin at oracle.com (Roland Westrelin)
Date: Mon, 20 Apr 2015 14:06:48 +0200
Subject: RFR (S) 8076987: C1 should support conditional card marks
	(UseCondCardMark)
In-Reply-To: <55314A09.4050906@oracle.com>
References: <55314A09.4050906@oracle.com>
Message-ID: <6E61C8FB-0837-449B-BBBE-0A4669746A91@oracle.com>

>  http://cr.openjdk.java.net/~shade/8076987/webrev.00/

That looks good to me.

Roland.

From vitalyd at gmail.com  Mon Apr 20 14:58:20 2015
From: vitalyd at gmail.com (Vitaly Davidovich)
Date: Mon, 20 Apr 2015 10:58:20 -0400
Subject: TrustFinalNonStaticFields clarification
Message-ID: <CAHjP37GF11yoyKWHA=foDp+OO7X_ZxaSa5kUg5n=N2u-TVmr+w@mail.gmail.com>

Fixed the flag name in the subject.

On Fri, Apr 17, 2015 at 9:07 PM, Vitaly Davidovich <vitalyd at gmail.com>
wrote:

> Hi guys,
>
> I'm hoping someone could clarify/confirm my understanding of this
> experimental flag's effects:
>
> 1) final instance array length is constant propagated? Even if array is
> passed in as ctor arg rather than being instantiated in the ctor?
> 2) final instance fields seen as never null are forever considered as
> such? So even if a method call on that object is fully eliminated (e.g. the
> method is empty) no null check is left behind?
> 3) concrete runtime type of the instance field is propagated to uses and
> no additional type checks are done? Say the declared type is an
> interface/abstract with multiple implementations loaded but only one type
> stored in the field - is a type check eliminated and calls are fully
> devirtualized?
> 4) primitive type final fields have their value constant propagated if
> compiler sees only one value always stored?
> 5) do derived classes and base class share field profile or not? For
> example subclasses always store concrete type but each subclass stores a
> different type from the others.
>
> Also, there's been some talk about doing these optimizations automatically,
> with invalidation built in.  Just curious where that stands.
>
> Thanks
>
> sent from my phone
>

From roland.westrelin at oracle.com  Tue Apr 21 10:24:33 2015
From: roland.westrelin at oracle.com (Roland Westrelin)
Date: Tue, 21 Apr 2015 12:24:33 +0200
Subject: RFR(S): 8069191: moving predicate out of loops may cause array
	accesses to bypass null check
In-Reply-To: <2F8C06E8-E0C5-42E2-A397-E94C5D7F82C9@oracle.com>
References: <100419DB-199E-489C-B3EA-F104BF0EB203@oracle.com>
	<55086F20.9020305@oracle.com>
	<2ACAAB95-8175-48DB-8BD9-F5BF168A6666@oracle.com>
	<550893F0.9050608@oracle.com>
	<F8959E8E-ECA1-42DD-BC8A-AA7CD750F5C2@oracle.com>
	<88169234-01DE-470C-B56A-D96AD7C53D50@oracle.com>
	<7BDCA4DA-BB97-436E-B761-6C883EDE955E@oracle.com>
	<E94CE692-6C2E-440B-A24E-3180E13EDE77@oracle.com>
	<5D648F8E-8A17-4195-8D77-323D8D6B1AB3@oracle.com>
	<2F8C06E8-E0C5-42E2-A397-E94C5D7F82C9@oracle.com>
Message-ID: <E5775766-0AC7-4D2E-89DF-C83EEB391B31@oracle.com>

>> So I guess the test in compile.cpp could simply be n->in(0) != NULL
> 
> That would be safer I think. 

I'm making that change and the ensure_control_or_add_prec() change as well and pushing this.

Thanks for the suggestions, discussion and review.

Roland.

From roland.westrelin at oracle.com  Tue Apr 21 13:02:13 2015
From: roland.westrelin at oracle.com (Roland Westrelin)
Date: Tue, 21 Apr 2015 15:02:13 +0200
Subject: RFR(M): 8076188 Optimize arraycopy out for non escaping destination 
Message-ID: <2B8622DD-1DA5-4715-B4BF-3801202B588B@oracle.com>

http://cr.openjdk.java.net/~roland/8076188/webrev.00/

This patch tries to eliminate ArrayCopyNodes (for instance clones, array clones, arraycopy and copyOf) when the destination of the copy doesn't escape:

- during escape analysis, ArrayCopyNodes don't cause the destination of the copy to be marked as escaping anymore
- a load from the destination of a copy may be replaced by a load from the source during IGVN
- during macro expansion, ArrayCopyNodes don't stop allocation from being eliminated and can themselves be eliminated
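
The net effect of the three points above can be pictured at source level with the following sketch. It is only an analogy written in C++, not C2 IR, and both function names are hypothetical: the first function is the shape of code before the optimization, the second is what it effectively becomes once the load is redirected to the source and the non-escaping copy is removed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Before: copy into a fresh destination that never escapes, then load from it.
inline int load_via_copy(const std::vector<int>& src, std::size_t i) {
    assert(i < src.size());
    std::vector<int> dst(src);   // plays the role of the clone/arraycopy
    return dst[i];               // load from the destination
}

// After: the load reads the source directly; the copy and its allocation are gone.
inline int load_from_source(const std::vector<int>& src, std::size_t i) {
    assert(i < src.size());
    return src[i];
}
```

Both functions return the same value for every valid index, which is why the rewrite is legal as long as nothing else can observe or modify the destination.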

Roland.

From michael.haupt at oracle.com  Tue Apr 21 13:15:32 2015
From: michael.haupt at oracle.com (Michael Haupt)
Date: Tue, 21 Apr 2015 15:15:32 +0200
Subject: node hierarchy poster updated
Message-ID: <2C785B0F-4BBF-4FB4-9F1C-774F46D99179@oracle.com>

Hi,

I've updated the C2 node hierarchy poster at http://cr.openjdk.java.net/~mhaupt/pres/IdealNodeHierarchyPoster.pdf to reflect the recent addition of reduction nodes.

Best,

Michael

-- 

 <http://www.oracle.com/>
Dr. Michael Haupt | Principal Member of Technical Staff
Phone: +49 331 200 7277 | Fax: +49 331 200 7561
Oracle Java Platform Group | HotSpot Compiler Team 
Oracle Deutschland B.V. & Co. KG, Schiffbauergasse 14 | 14467 Potsdam, Germany
 <http://www.oracle.com/commitment>	Oracle is committed to developing practices and products that help protect the environment


From vladimir.kozlov at oracle.com  Tue Apr 21 17:12:41 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 21 Apr 2015 10:12:41 -0700
Subject: node hierarchy poster updated
In-Reply-To: <2C785B0F-4BBF-4FB4-9F1C-774F46D99179@oracle.com>
References: <2C785B0F-4BBF-4FB4-9F1C-774F46D99179@oracle.com>
Message-ID: <55368509.1070801@oracle.com>

Thank you, Michael

Vladimir

On 4/21/15 6:15 AM, Michael Haupt wrote:
> Hi,
>
> I've updated the C2 node hierarchy poster at http://cr.openjdk.java.net/~mhaupt/pres/IdealNodeHierarchyPoster.pdf to
> reflect the recent addition of reduction nodes.
>
> Best,
>
> Michael
>
> --
>
> Oracle <http://www.oracle.com/>
> Dr. Michael Haupt | Principal Member of Technical Staff
> Phone: +49 331 200 7277 | Fax: +49 331 200 7561
> OracleJava Platform Group | HotSpot Compiler Team
> Oracle Deutschland B.V. & Co. KG, Schiffbauergasse 14 | 14467 Potsdam, Germany
> Green Oracle <http://www.oracle.com/commitment>	Oracle is committed to developing practices and products that help
> protect the environment
>
>

From vladimir.kozlov at oracle.com  Tue Apr 21 17:29:40 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 21 Apr 2015 10:29:40 -0700
Subject: RFR(M): 8076188 Optimize arraycopy out for non escaping
	destination
In-Reply-To: <2B8622DD-1DA5-4715-B4BF-3801202B588B@oracle.com>
References: <2B8622DD-1DA5-4715-B4BF-3801202B588B@oracle.com>
Message-ID: <55368904.4030002@oracle.com>

Good.  Nice work.

CallLeafNode::may_modify() - does the arraycopy call have dest in different edges? Why are you searching through the inputs for it?

Thanks,
Vladimir

On 4/21/15 6:02 AM, Roland Westrelin wrote:
> http://cr.openjdk.java.net/~roland/8076188/webrev.00/
>
> This patch tries to eliminate ArrayCopyNodes (for instance clones, array clones, arraycopy and copyOf) when the destination of the copy doesn't escape:
>
> - during escape analysis, ArrayCopyNodes don't cause the destination of the copy to be marked as escaping anymore
> - a load from the destination of a copy may be replaced by a load from the source during IGVN
> - during macro expansion, ArrayCopyNodes don't stop allocation from being eliminated and can themselves be eliminated
>
> Roland.
>

From vitalyd at gmail.com  Tue Apr 21 18:02:50 2015
From: vitalyd at gmail.com (Vitaly Davidovich)
Date: Tue, 21 Apr 2015 14:02:50 -0400
Subject: TrustFinalNonStaticFields clarification
In-Reply-To: <CAHjP37GF11yoyKWHA=foDp+OO7X_ZxaSa5kUg5n=N2u-TVmr+w@mail.gmail.com>
References: <CAHjP37GF11yoyKWHA=foDp+OO7X_ZxaSa5kUg5n=N2u-TVmr+w@mail.gmail.com>
Message-ID: <CAHjP37FNeV+heJ6Rze6kK-stP7UtUZiN_yCsdsjTwQzBHRaQcg@mail.gmail.com>

Anyone? :)

In my brief experimentation on 7u60, the only thing I noticed is Enum.ordinal()
being replaced with a constant in the compiled method.  I couldn't, however, get it to
constant-propagate the array length, eliminate the null receiver check, etc.

sent from my phone
On Apr 20, 2015 10:58 AM, "Vitaly Davidovich" <vitalyd at gmail.com> wrote:

> Fixed the flag name in the subject.
>
> On Fri, Apr 17, 2015 at 9:07 PM, Vitaly Davidovich <vitalyd at gmail.com>
> wrote:
>
>> Hi guys,
>>
>> I'm hoping someone could clarify/confirm my understanding of this
>> experimental flag's effects:
>>
>> 1) final instance array length is constant propagated? Even if array is
>> passed in as ctor arg rather than being instantiated in the ctor?
>> 2) final instance fields seen as never null are forever considered as
>> such? So even if a method call on that object is fully eliminated (e.g. the
>> method is empty) no null check is left behind?
>> 3) concrete runtime type of the instance field is propagated to uses and
>> no additional type checks are done? Say the declared type is an
>> interface/abstract with multiple implementations loaded but only one type
>> stored in the field - is a type check eliminated and calls are fully
>> devirtualized?
>> 4) primitive type final fields have their value constant propagated if
>> compiler sees only one value always stored?
>> 5) do derived classes and base class share field profile or not? For
>> example subclasses always store concrete type but each subclass stores a
>> different type from the others.
>>
>> Also, there's been some talk about doing these optimizations
>> automatically, with invalidation built in.  Just curious where that stands.
>>
>> Thanks
>>
>> sent from my phone
>>
>
>

From roland.westrelin at oracle.com  Tue Apr 21 18:09:09 2015
From: roland.westrelin at oracle.com (Roland Westrelin)
Date: Tue, 21 Apr 2015 20:09:09 +0200
Subject: RFR(M): 8076188 Optimize arraycopy out for non escaping
	destination
In-Reply-To: <55368904.4030002@oracle.com>
References: <2B8622DD-1DA5-4715-B4BF-3801202B588B@oracle.com>
	<55368904.4030002@oracle.com>
Message-ID: <70BD862B-7557-4BA6-AF02-E4163D8872A0@oracle.com>

Thanks Vladimir for reviewing this.

> Good.  Nice work.
> 
> CallLeafNode::may_modify() - does the arraycopy call have dest in different edges? Why are you searching through the inputs for it?

Stubs that can be called once an ArrayCopyNode is expanded have different signatures:

// arraycopy stub variations:
enum ArrayCopyType {
  ac_fast,                      // void(ptr, ptr, size_t)
  ac_checkcast,                 //  int(ptr, ptr, size_t, size_t, ptr)
  ac_slow,                      // void(ptr, int, ptr, int, int)
  ac_generic                    //  int(ptr, int, ptr, int, int)
};
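
To illustrate why a fixed argument position does not work, here is a small sketch of a position-independent search. It is not the code from the webrev: Arg and call_may_modify are hypothetical, and the argument lists in the usage note only mirror the signatures listed in the enum comments.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified call argument: either a pointer or an integer.
struct Arg {
    bool is_ptr;
    std::uintptr_t value;
};

// Returns true if any pointer argument of the call equals `dest`.
// ac_fast passes (ptr, ptr, size_t) while ac_slow passes
// (ptr, int, ptr, int, int), so the destination has no fixed position
// and every input has to be inspected.
inline bool call_may_modify(const std::vector<Arg>& args, std::uintptr_t dest) {
    assert(dest != 0);
    for (const Arg& a : args) {
        if (a.is_ptr && a.value == dest) {
            return true;
        }
    }
    return false;
}
```

With an ac_fast-style list the destination is the second argument and with an ac_slow-style list it is the third; the same loop finds it in both cases, and integer arguments that happen to carry the same bit pattern are ignored.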


Roland.

> 
> Thanks,
> Vladimir
> 
> On 4/21/15 6:02 AM, Roland Westrelin wrote:
>> http://cr.openjdk.java.net/~roland/8076188/webrev.00/
>> 
>> This patch tries to eliminate ArrayCopyNodes (for instance clones, array clones, arraycopy and copyOf) when the destination of the copy doesn't escape:
>>
>> - during escape analysis, ArrayCopyNodes don't cause the destination of the copy to be marked as escaping anymore
>> - a load from the destination of a copy may be replaced by a load from the source during IGVN
>> - during macro expansion, ArrayCopyNodes don't stop allocation from being eliminated and can themselves be eliminated
>> 
>> Roland.
>> 


From vladimir.kozlov at oracle.com  Tue Apr 21 18:17:20 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 21 Apr 2015 11:17:20 -0700
Subject: RFR(M): 8076188 Optimize arraycopy out for non escaping
	destination
In-Reply-To: <70BD862B-7557-4BA6-AF02-E4163D8872A0@oracle.com>
References: <2B8622DD-1DA5-4715-B4BF-3801202B588B@oracle.com>	<55368904.4030002@oracle.com>
	<70BD862B-7557-4BA6-AF02-E4163D8872A0@oracle.com>
Message-ID: <55369430.5010606@oracle.com>

On 4/21/15 11:09 AM, Roland Westrelin wrote:
> Thanks Vladimir for reviewing this.
>
>> Good.  Nice work.
>>
>> CallLeafNode::may_modify() - does the arraycopy call have dest in different edges? Why are you searching through the inputs for it?
>
> Stubs that can be called once an ArrayCopyNode is expanded have different signatures:

Okay. Maybe add a comment to may_modify() saying that.

Thanks,
Vladimir

>
> // arraycopy stub variations:
> enum ArrayCopyType {
>    ac_fast,                      // void(ptr, ptr, size_t)
>    ac_checkcast,                 //  int(ptr, ptr, size_t, size_t, ptr)
>    ac_slow,                      // void(ptr, int, ptr, int, int)
>    ac_generic                    //  int(ptr, int, ptr, int, int)
> };
>
>
> Roland.
>
>>
>> Thanks,
>> Vladimir
>>
>> On 4/21/15 6:02 AM, Roland Westrelin wrote:
>>> http://cr.openjdk.java.net/~roland/8076188/webrev.00/
>>>
>>> This patch tries to eliminate ArrayCopyNodes (for instance clones, array clones, arraycopy and copyOf) when the destination of the copy doesn't escape:
>>>
>>> - during escape analysis, ArrayCopyNodes don't cause the destination of the copy to be marked as escaping anymore
>>> - a load from the destination of a copy may be replaced by a load from the source during IGVN
>>> - during macro expansion, ArrayCopyNodes don't stop allocation from being eliminated and can themselves be eliminated
>>>
>>> Roland.
>>>
>

From roland.westrelin at oracle.com  Tue Apr 21 18:19:12 2015
From: roland.westrelin at oracle.com (Roland Westrelin)
Date: Tue, 21 Apr 2015 20:19:12 +0200
Subject: RFR(M): 8076188 Optimize arraycopy out for non escaping
	destination
In-Reply-To: <55369430.5010606@oracle.com>
References: <2B8622DD-1DA5-4715-B4BF-3801202B588B@oracle.com>
	<55368904.4030002@oracle.com>
	<70BD862B-7557-4BA6-AF02-E4163D8872A0@oracle.com>
	<55369430.5010606@oracle.com>
Message-ID: <F4D9E90C-4124-47D9-BCC7-19FE792EE0B5@oracle.com>

>> Thanks Vladimir for reviewing this.
>> 
>>> Good.  Nice work.
>>> 
>>> CallLeafNode::may_modify() - does the arraycopy call have dest in different edges? Why are you searching through the inputs for it?
>> 
>> Stubs that can be called once an ArrayCopyNode is expanded have different signatures:
> 
> Okay. Maybe add a comment to may_modify() saying that.

I will do that. Thanks again for the review.

Roland.

> 
> Thanks,
> Vladimir
> 
>> 
>> // arraycopy stub variations:
>> enum ArrayCopyType {
>>   ac_fast,                      // void(ptr, ptr, size_t)
>>   ac_checkcast,                 //  int(ptr, ptr, size_t, size_t, ptr)
>>   ac_slow,                      // void(ptr, int, ptr, int, int)
>>   ac_generic                    //  int(ptr, int, ptr, int, int)
>> };
>> 
>> 
>> Roland.
>> 
>>> 
>>> Thanks,
>>> Vladimir
>>> 
>>> On 4/21/15 6:02 AM, Roland Westrelin wrote:
>>>> http://cr.openjdk.java.net/~roland/8076188/webrev.00/
>>>> 
>>>> This patch tries to eliminate ArrayCopyNodes (for instance clones, array clones, arraycopy and copyOf) when the destination of the copy doesn't escape:
>>>>
>>>> - during escape analysis, ArrayCopyNodes don't cause the destination of the copy to be marked as escaping anymore
>>>> - a load from the destination of a copy may be replaced by a load from the source during IGVN
>>>> - during macro expansion, ArrayCopyNodes don't stop allocation from being eliminated and can themselves be eliminated
>>>> 
>>>> Roland.
>>>> 
>> 


From evgeniya.stepanova at oracle.com  Wed Apr 22 11:28:41 2015
From: evgeniya.stepanova at oracle.com (Evgeniya Stepanova)
Date: Wed, 22 Apr 2015 14:28:41 +0300
Subject: [8u60] RFR(s): 8038098: [TESTBUG] remove explicit set build flavor
	from hotspot/test/compiler/* tests
In-Reply-To: <552B8C3C.2080403@oracle.com>
References: <5527D7C1.9050704@oracle.com> <552B8AC5.6040404@oracle.com>
	<552B8C3C.2080403@oracle.com>
Message-ID: <553785E9.6070700@oracle.com>

Hi!
I still need a reviewer's approval for this.
Please take a look

Thanks,
Jane
On 13.04.2015 12:28, Evgeniya Stepanova wrote:
> Hi Igor,
>
> Thank you for the review!
>
> Jane
> On 13.04.2015 12:22, Igor Ignatyev wrote:
>> Evgeniya,
>>
>> looks good to me.
>>
>> Igor
>>
>> On 04/10/2015 05:01 PM, Evgeniya Stepanova wrote:
>>> Hi,
>>>
>>> Could you please review the backport of 8038098 to the 8udev repo?
>>> The diff applies cleanly to all tests except
>>> test/compiler/IntegerArithmetic/TestIntegerComparison.java, which
>>> does not exist in the 8u60 repo.
>>> With the fix, the tests pass on 8u60 b09 with the client VM.
>>>
>>> webrev for 8u60:
>>> http://cr.openjdk.java.net/~eistepan/8038098/8u60/webrev.00/
>>> bug: https://bugs.openjdk.java.net/browse/JDK-8038098
>>>
>>> Original webrev:
>>> http://cr.openjdk.java.net/~iignatyev/eistepan/8038098/webrev.02/
>>> mail thread for 9:
>>> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2014-September/015540.html 
>>>
>>> Original change:
>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/662499384b32
>>> http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/662499384b32
>>>
>>> Thanks,
>>> Jane
>>> -- 
>>> /Evgeniya Stepanova/
>
> -- 
> /Evgeniya Stepanova/

-- 
/Evgeniya Stepanova/

From jan.civlin at intel.com  Tue Apr 21 06:45:14 2015
From: jan.civlin at intel.com (Civlin, Jan)
Date: Tue, 21 Apr 2015 06:45:14 +0000
Subject: RFR(S): 8076284: Improve vectorization of parallel streams
In-Reply-To: <39F83597C33E5F408096702907E6C450E3F0AC@ORSMSX104.amr.corp.intel.com>
References: <39F83597C33E5F408096702907E6C450E3E586@ORSMSX104.amr.corp.intel.com>
	<02FCFB8477C4EF43A2AD8E0C60F3DA2B63334516@FMSMSX112.amr.corp.intel.com>
	<39F83597C33E5F408096702907E6C450E3E5A4@ORSMSX104.amr.corp.intel.com>
	<02FCFB8477C4EF43A2AD8E0C60F3DA2B63334531@FMSMSX112.amr.corp.intel.com>
	<39F83597C33E5F408096702907E6C450E3E734@ORSMSX104.amr.corp.intel.com>
	<55303823.1020205@oracle.com>
	<39F83597C33E5F408096702907E6C450E3F0AC@ORSMSX104.amr.corp.intel.com>
Message-ID: <39F83597C33E5F408096702907E6C450E3F413@ORSMSX104.amr.corp.intel.com>

Vladimir, 

Here is the description and the new patch with the changes you recommended (except the last one; see my explanation below).
 

The patch description.

This patch provides on-demand vectorization (SIMD) of a <method> specified on the JVM command line as -XX:CompileCommand=option,<method>,Vectorize.
This optimization can be disabled globally with the flag -XX:-AllowVectorizeOnDemand (it is true by default).

For each method that is specified with the Vectorize option we do the following:

1. On each iteration of loop unrolling for a given method (loopopts.cpp) we generate the next _clone_idx (which is common to all the nodes cloned in this iteration), and on each node cloning we hash the _idx of the origin of the node that is cloned (_idx_clone_orig) together with the _clone_idx (cm.verify_insert_and_clone).
CloneMap belongs to Compile and is created in CompilerWrapper.

2. In the SuperWord optimization, after max_depth has been built, we hoist the loads.
To do this, for each Load_X (the subject of the hoisting) we find some Load_0 that has the same origin as Load_X but belongs to the first iteration; i.e., if the MemNode::Memory input of Load_0 is a memory Phi (collected previously in memory_slice), we also set this Phi as the MemNode::Memory input of Load_X. After this rebuild of the graph we restart the SuperWord optimization.
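
The clone bookkeeping of step 1 can be sketched roughly as follows. This is a simplified stand-in, not the CloneMap from the webrev: the class name, the method names and the packing of generation and origin into one 64-bit value are assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical clone map: node index -> (clone generation, origin index),
// packed into one 64-bit value (generation in the high half, origin in the low half).
class CloneMapSketch {
public:
    // Start a new unroll iteration; all clones made in it share this generation.
    std::uint32_t next_generation() { return ++generation_; }

    // Record that node `new_idx` was cloned from `old_idx` in the current generation.
    // The stored origin is the origin of the node being cloned, so clones of
    // clones still point back to the original node.
    void record_clone(std::uint32_t old_idx, std::uint32_t new_idx) {
        assert(generation_ > 0);
        map_[new_idx] = pack(generation_, origin_of(old_idx));
    }

    // Origin of a node; a node that was never cloned is its own origin.
    std::uint32_t origin_of(std::uint32_t idx) const {
        auto it = map_.find(idx);
        return it == map_.end() ? idx : static_cast<std::uint32_t>(it->second);
    }

    // Generation in which a node was created; 0 means "not a clone".
    std::uint32_t generation_of(std::uint32_t idx) const {
        auto it = map_.find(idx);
        return it == map_.end() ? 0u : static_cast<std::uint32_t>(it->second >> 32);
    }

private:
    static std::uint64_t pack(std::uint32_t gen, std::uint32_t orig) {
        return (static_cast<std::uint64_t>(gen) << 32) | orig;
    }

    std::uint32_t generation_ = 0;
    std::unordered_map<std::uint32_t, std::uint64_t> map_;
};
```

With this in place, "Load_X has the same origin as Load_0" becomes a comparison of origin_of() values, and "belongs to the first iteration" becomes a check on generation_of().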

The major routines are:
- SuperWord::mark_generations: computes _ii_first (the _clone_idx index) of the nodes whose MemNode::Memory input comes from a phi node in some slice, and computes the lists of nodes in the first and last iterations of the loop.
- SuperWord::hoist_loads_in_graph: for each memory slice (a phi node), visits each load that has this phi as its memory input and then, for each other load that has the same origin, makes its memory input come from the phi.  This routine does not use the generation-marking mechanism.
- SuperWord::pack_parallel: this routine is called only if SuperWord fails to produce any pack after extend_packlist(); it is another algorithm for packing instructions into SIMD.
It goes through the list of all instructions in _iteration_first, and if an instruction is a Load, Store, Add or Mul it starts a new pack and adds the instruction to it. The algorithm then iterates over the iterations of the loop (gen < _ii_order.length()) and over the instruction list, finds each node whose origin coincides with the origin of the nodes already in a pack, and adds that node to the pack. Once the packs are built, SuperWord returns to normal processing (combine_packs()).
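
The packing idea in pack_parallel can be sketched as below. This is a toy version under stated assumptions, not the code from the patch: NodeInfo, Op, is_packable and pack_by_origin are hypothetical, and the real routine works on C2 nodes and _ii_order rather than on a flat vector.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

enum class Op { Load, Store, Add, Mul, Other };

// Hypothetical node record: its own index, the index of the node it was
// originally cloned from, and the unroll generation it belongs to.
struct NodeInfo {
    std::uint32_t idx;
    std::uint32_t origin;
    std::uint32_t generation;
    Op op;
};

inline bool is_packable(Op op) {
    return op == Op::Load || op == Op::Store || op == Op::Add || op == Op::Mul;
}

// Starts one pack per packable node of the first generation, then walks the
// remaining generations in order and appends every node whose origin matches
// the origin of an existing pack. No dependence analysis is performed here.
inline std::vector<std::vector<std::uint32_t>> pack_by_origin(
        const std::vector<NodeInfo>& nodes,
        std::uint32_t first_gen, std::uint32_t last_gen) {
    assert(first_gen <= last_gen);
    std::map<std::uint32_t, std::vector<std::uint32_t>> packs;  // keyed by origin
    for (const NodeInfo& n : nodes) {
        if (n.generation == first_gen && is_packable(n.op)) {
            packs[n.origin].push_back(n.idx);
        }
    }
    for (std::uint32_t gen = first_gen + 1; gen <= last_gen; ++gen) {
        for (const NodeInfo& n : nodes) {
            if (n.generation != gen) continue;
            auto it = packs.find(n.origin);
            if (it != packs.end()) {
                it->second.push_back(n.idx);
            }
        }
    }
    std::vector<std::vector<std::uint32_t>> result;
    for (const auto& entry : packs) {
        result.push_back(entry.second);
    }
    return result;
}
```

Each resulting pack holds one node per unrolled iteration, all descended from the same original instruction, which is the shape the later combine_packs() step expects to work on.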
 
Note, that neither 2 or 3 goes thru the data dependency analysis, since the correctness of parallelization was guaranteed by the user.
Note, that some checks in added code could be omitted. But we are not assume that there is no optimization (now or in the future) that can change the graph structure between loop unrolling and the SuperWord, so we prefer to run many (probably unneeded) checks in the SuperWord. 
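
For illustration only, and not taken from the patch: a minimal Java sketch of the bookkeeping described in step 1, assuming a plain map from node index to (origin index, unroll generation). The class name, the method names and the rule that a clone of a clone keeps the first origin are all assumptions of this sketch; the real CloneMap in the webrev may be organized differently.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the clone bookkeeping from step 1; every name here is hypothetical. */
final class CloneMapSketch {
    /** What is remembered for one cloned node. */
    static final class Entry {
        final int origIdx;     // _idx of the node this clone ultimately derives from
        final int generation;  // value shared by all clones made in one unroll step
        Entry(int origIdx, int generation) {
            this.origIdx = origIdx;
            this.generation = generation;
        }
    }

    private final Map<Integer, Entry> byNodeIdx = new HashMap<>();
    private int lastGeneration = 0;
    private int nextNodeIdx;

    CloneMapSketch(int firstFreeNodeIdx) {
        this.nextNodeIdx = firstFreeNodeIdx;
    }

    /** Start one unroll step; all clones made in this step share the returned value. */
    int beginUnrollStep() {
        return ++lastGeneration;
    }

    /** Clone a node, record its origin and generation, and return the clone's index. */
    int cloneNode(int srcIdx, int generation) {
        int cloneIdx = nextNodeIdx++;
        // Assumption of this sketch: a clone of a clone keeps the first origin.
        byNodeIdx.put(cloneIdx, new Entry(originOf(srcIdx), generation));
        return cloneIdx;
    }

    /** Nodes that were never cloned are their own origin, in generation 0. */
    int originOf(int idx) {
        Entry e = byNodeIdx.get(idx);
        return e == null ? idx : e.origIdx;
    }

    int generationOf(int idx) {
        Entry e = byNodeIdx.get(idx);
        return e == null ? 0 : e.generation;
    }

    boolean sameOrigin(int a, int b) {
        return originOf(a) == originOf(b);
    }

    public static void main(String[] args) {
        CloneMapSketch cm = new CloneMapSketch(100);
        int gen = cm.beginUnrollStep();
        int clone = cm.cloneNode(7, gen);
        System.out.println("clone " + clone + " origin " + cm.originOf(clone)
                + " generation " + cm.generationOf(clone));
    }
}
```

With such a map, "Load_X has the same origin as Load_0" becomes a lookup (sameOrigin), and "belongs to the first iteration" becomes a comparison of generations.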


>>Can you also utilize changes done by Michael Berg for reduction 
>>optimization (the code in jdk9/hs-comp already)? I mean marking some 
>>nodes before unrolling and searching Phis.
Michael Berg and I looked at what you suggested (phi marking before unroll?) and we both think this marking is very different than what I do: Michael's phi marking is just one bit per node (in the flags) whereas I collect the _idx_clone_orig and the unroll generation (clone_idx).
   

 -----Original Message-----
From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] 
Sent: Thursday, April 16, 2015 3:31 PM
To: Civlin, Jan; hotspot-compiler-dev at openjdk.java.net
Subject: Re: RFR(S): 8076284: Improve vectorization of parallel streams

Hi Jan,

You did not describe your changes in detail (what they do).

The IgnoreVectorizeMethod flag should be positive and enabled by default. 
Rename it to AllowVectorizeOnDemand (or something similar):

+  product(bool, AllowVectorizeOnDemand, true,                              \

Instead of next you should add intrinsic definition to
and classfile/vmSymbols.hpp and then check method()->intrinsic_id():

+    if (strcmp("forEachRemaining", method()->name()->as_quoted_ascii()) == 0 && method()->signature() != 0
+      && method()->signature()->as_symbol() != 0 && method()->signature()->as_symbol()->as_quoted_ascii() != 0 ) {
+      if (strstr(method()->signature()->as_symbol()->as_quoted_ascii(),"Ljava/util/function/IntConsumer")) {
+        set_do_vector_loop(true);
+      }
+    }

And that should be under flag too because in general forEachRemaining 
should be vectorized only if it is safe.

Can you also utilize changes done by Michael Berg for reduction 
optimization (the code in jdk9/hs-comp already)? I mean marking some 
nodes before unrolling and searching Phis.

Regards,
Vladimir

On 4/13/15 3:33 AM, Civlin, Jan wrote:
> Hi All,
>
>
>   We would like to contribute the improvement of vectorization of
>   parallel streams  from Intel.
>
> The contribution Bug ID: 8076284.
>
> Please review this patch:
>
> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8076284
>
> webrev: http://cr.openjdk.java.net/~kvn/8076284/webrev/
>
>
>       *Description*
>
> Improve vectorization of the unordered parallel streams (by vectorizing
> forEachRemaining method).
>
> For example, this forEach will be vectorized:
>
> java.util.stream.IntStream iStream = java.util.stream.IntStream.range(0,
> RANGE - 1).parallel();
>
> iStream.forEach( id -> c[id] = c[id] + c[id+1] );
>
> It also enables on-demand loop vectorization in a given method (by
> providing more hints to SuperWord optimization).
>
> For example, use -XX:CompileCommand=option,computeCall,Vectorize to
> vectorize this loop
>
> void computeCall(double [] Call, double  puByDf, double  pdByDf)
> {
>   for(int i = timeStep; i > 0; i--)
>     for(int j = 0; j <= i - 1; j++)
>       Call[j] = puByDf * Call[j + 1] + pdByDf * Call[j];
> }
>
>
> This enhancement is contributed by Intel and sponsored by the hotspot
> compiler team.
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: webrev.tar.bz2
Type: application/octet-stream
Size: 932171 bytes
Desc: webrev.tar.bz2
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20150421/86c289fa/webrev.tar-0001.bz2>

From jan.civlin at intel.com  Wed Apr 22 03:26:36 2015
From: jan.civlin at intel.com (Civlin, Jan)
Date: Wed, 22 Apr 2015 03:26:36 +0000
Subject: RFR(S): 8076284: Improve vectorization of parallel streams
In-Reply-To: <39F83597C33E5F408096702907E6C450E3F413@ORSMSX104.amr.corp.intel.com>
References: <39F83597C33E5F408096702907E6C450E3E586@ORSMSX104.amr.corp.intel.com>
	<02FCFB8477C4EF43A2AD8E0C60F3DA2B63334516@FMSMSX112.amr.corp.intel.com>
	<39F83597C33E5F408096702907E6C450E3E5A4@ORSMSX104.amr.corp.intel.com>
	<02FCFB8477C4EF43A2AD8E0C60F3DA2B63334531@FMSMSX112.amr.corp.intel.com>
	<39F83597C33E5F408096702907E6C450E3E734@ORSMSX104.amr.corp.intel.com>
	<55303823.1020205@oracle.com>
	<39F83597C33E5F408096702907E6C450E3F0AC@ORSMSX104.amr.corp.intel.com>
	<39F83597C33E5F408096702907E6C450E3F413@ORSMSX104.amr.corp.intel.com>
Message-ID: <39F83597C33E5F408096702907E6C450E3F76D@ORSMSX104.amr.corp.intel.com>

Vladimir, 

I use MSVS12 (Microsoft (R) C/C++ Optimizing Compiler Version 18.00.21005.1 for x64) but found today that last night's patch may not compile with earlier versions of the Microsoft compiler.
Here is a patch that compiles with both MSVS12 and MSVS10.

Please use this patch.

Thank you,

Jan.

-----Original Message-----
From: Civlin, Jan 
Sent: Monday, April 20, 2015 11:45 PM
To: hotspot-compiler-dev at openjdk.java.net
Cc: Civlin, Jan
Subject: RE: RFR(S): 8076284: Improve vectorization of parallel streams

Vladimir, 

Here is the description and a new patch with the changes you recommended (except the last one; see my explanation below).
 

The patch description.

This patch provides on-demand vectorization/SIMD'ing of a <method> specified on the JVM command line as -XX:CompileCommand=option,<method>,Vectorize.
This optimization may be globally disabled by setting the flag -XX:-AllowVectorizeOnDemand (the flag is true by default). 

For each method that was specified with the Vectorize option we do the following: 

1. On each iteration of loop unroll for a given method (loopopts.cpp) we generate the next _clone_idx (which will be common for all the nodes cloned in this iteration); and on each node cloning we hash _idx of the origin of the node that is cloned (_idx_clone_orig ) and the _clone_idx (cm.verify_insert_and_clone).
CloneMap belongs to Compile and is created in CompilerWrapper.  

2. In the SuperWord optimization, after max_depth has been built, we hoist the loads.
For each Load_X (the load to be hoisted) we find a Load_0 that has the same origin as Load_X but belongs to the first iteration. If the MemNode::Memory input of Load_0 is a memory Phi (collected previously in memory_slice), we set this Phi as the MemNode::Memory input of Load_X as well. After the graph has been rebuilt this way, we restart the SuperWord optimization.

The major routines are:
- SuperWord::mark_generations: computes _ii_first (the _clone_idx index) of the nodes whose MemNode::Memory input comes from a phi node in some slice; computes the list of nodes in the first and last iterations of the loop.
- SuperWord::hoist_loads_in_graph: for each memory slice (a phi node), visits each load that has this phi as its memory input and then, for every other load with the same origin, makes its memory input come from the phi. This routine does not use the generation-marking mechanism.
- SuperWord::pack_parallel: this routine is called only if SuperWord fails to produce any pack after extend_packlist(); it is an alternative algorithm for packing instructions into SIMD.
It goes through the list of all instructions in _iteration_first, and if an instruction is a Load, Store, Add or Mul, it starts a new pack with this instruction. The algorithm then iterates over the iterations of the loop (gen < _ii_order.length()) and over the instruction list, finds the node whose origin coincides with the origin of the nodes already in the pack, and adds this node to the pack. Once the packs are built, SuperWord returns to normal processing (combine_packs()).
 
Note that neither 2 nor 3 goes through the data dependency analysis, since the correctness of the parallelization is guaranteed by the user.
Note that some checks in the added code could be omitted. However, we do not assume that no optimization (now or in the future) can change the graph structure between loop unrolling and SuperWord, so we prefer to run many (probably unneeded) checks in SuperWord.
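
For illustration only, and not taken from the patch: a minimal Java sketch of the packing idea described for pack_parallel, assuming every node already carries its origin index and its unroll generation. The class, the Node fields and the "one node per generation" rule are assumptions of this sketch. As in the description above, no data dependency analysis is performed.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of packing by common origin; every name here is hypothetical. */
final class PackParallelSketch {
    enum Op { LOAD, STORE, ADD, MUL, OTHER }

    static final class Node {
        final int idx;         // node index
        final int origin;      // index of the node this one was cloned from
        final int generation;  // unroll generation, 0 for the first iteration
        final Op op;
        Node(int idx, int origin, int generation, Op op) {
            this.idx = idx;
            this.origin = origin;
            this.generation = generation;
            this.op = op;
        }
    }

    static boolean isPackable(Op op) {
        return op == Op.LOAD || op == Op.STORE || op == Op.ADD || op == Op.MUL;
    }

    /** One pack per packable first-iteration node, extended with same-origin nodes. */
    static List<List<Node>> pack(List<Node> nodes, int generations) {
        List<List<Node>> packs = new ArrayList<>();
        for (Node seed : nodes) {
            if (seed.generation != 0 || !isPackable(seed.op)) {
                continue;
            }
            List<Node> pack = new ArrayList<>();
            pack.add(seed);
            for (int gen = 1; gen < generations; gen++) {
                for (Node n : nodes) {
                    if (n.generation == gen && n.origin == seed.origin) {
                        pack.add(n);
                        break; // at most one node per generation in this sketch
                    }
                }
            }
            packs.add(pack);
        }
        return packs;
    }

    public static void main(String[] args) {
        List<Node> nodes = new ArrayList<>();
        nodes.add(new Node(10, 10, 0, Op.LOAD));
        nodes.add(new Node(20, 10, 1, Op.LOAD));
        for (List<Node> p : pack(nodes, 2)) {
            System.out.println("pack of size " + p.size());
        }
    }
}
```

Each resulting pack holds the copies of one original instruction across the unrolled iterations, which is the shape the normal SuperWord processing expects as input.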


>>Can you also utilize changes done by Michael Berg for reduction 
>>optimization (the code in jdk9/hs-comp already)? I mean marking some 
>>nodes before unrolling and searching Phis.
Michael Berg and I looked at what you suggested (phi marking before unroll?) and we both think this marking is very different than what I do: Michael's phi marking is just one bit per node (in the flags) whereas I collect the _idx_clone_orig and the unroll generation (clone_idx).
   

 -----Original Message-----
From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
Sent: Thursday, April 16, 2015 3:31 PM
To: Civlin, Jan; hotspot-compiler-dev at openjdk.java.net
Subject: Re: RFR(S): 8076284: Improve vectorization of parallel streams

Hi Jan,

You did not describe your changes in detail (what they do).

The IgnoreVectorizeMethod flag should be positive and enabled by default. 
Rename it to AllowVectorizeOnDemand (or something similar):

+  product(bool, AllowVectorizeOnDemand, true,                              \

Instead of next you should add intrinsic definition to
and classfile/vmSymbols.hpp and then check method()->intrinsic_id():

+    if (strcmp("forEachRemaining", method()->name()->as_quoted_ascii()) == 0 && method()->signature() != 0
+      && method()->signature()->as_symbol() != 0 && method()->signature()->as_symbol()->as_quoted_ascii() != 0 ) {
+      if (strstr(method()->signature()->as_symbol()->as_quoted_ascii(),"Ljava/util/function/IntConsumer")) {
+        set_do_vector_loop(true);
+      }
+    }

And that should be under flag too because in general forEachRemaining 
should be vectorized only if it is safe.

Can you also utilize changes done by Michael Berg for reduction 
optimization (the code in jdk9/hs-comp already)? I mean marking some 
nodes before unrolling and searching Phis.

Regards,
Vladimir

On 4/13/15 3:33 AM, Civlin, Jan wrote:
> Hi All,
>
>
>   We would like to contribute the improvement of vectorization of
>   parallel streams  from Intel.
>
> The contribution Bug ID: 8076284.
>
> Please review this patch:
>
> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8076284
>
> webrev: http://cr.openjdk.java.net/~kvn/8076284/webrev/
>
>
>       *Description*
>
> Improve vectorization of the unordered parallel streams (by vectorizing
> forEachRemaining method).
>
> For example, this forEach will be vectorized:
>
> java.util.stream.IntStream iStream = java.util.stream.IntStream.range(0,
> RANGE - 1).parallel();
>
> iStream.forEach( id -> c[id] = c[id] + c[id+1] );
>
> It also enables on-demand loop vectorization in a given method (by
> providing more hints to SuperWord optimization).
>
> For example, use -XX:CompileCommand=option,computeCall,Vectorize to
> vectorize this loop
>
> void computeCall(double [] Call, double  puByDf, double  pdByDf)
> {
>   for(int i = timeStep; i > 0; i--)
>     for(int j = 0; j <= i - 1; j++)
>       Call[j] = puByDf * Call[j + 1] + pdByDf * Call[j];
> }
>
>
> This enhancement is contributed by Intel and sponsored by the hotspot
> compiler team.
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: webrev.042115.1.tar.bz2
Type: application/octet-stream
Size: 935230 bytes
Desc: webrev.042115.1.tar.bz2
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20150422/9c4f65f2/webrev.042115.1.tar-0001.bz2>

From vladimir.x.ivanov at oracle.com  Wed Apr 22 14:52:15 2015
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Wed, 22 Apr 2015 17:52:15 +0300
Subject: [9] RFR (XXS): 8078309: compiler/jsr292/MHInlineTest.java failed
	with java.lang.RuntimeException: 'MHInlineTest$A::protected_x (3 bytes)
	virtual call' found in stdout
Message-ID: <5537B59F.5060606@oracle.com>

http://cr.openjdk.java.net/~vlivanov/8078309/webrev.00/
https://bugs.openjdk.java.net/browse/JDK-8078309

Some checks in the test are too strict.

Depending on the compilation order, DMH LambdaForms can be compiled first:

j.l.i.LF$DMH/1971489295::invokeVirtual_L_L (20 bytes)
    @ 7 DMH::internalMemberName (8 bytes) force inline by annotation
    @ 16 MHInlineTest$A::protected_x (3 bytes) virtual call

Instead of trying to ensure correct compilation order, I propose to 
remove them.

Thanks!

Best regards,
Vladimir Ivanov

From vladimir.kozlov at oracle.com  Wed Apr 22 15:10:12 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 22 Apr 2015 08:10:12 -0700
Subject: [8u60] RFR(s): 8038098: [TESTBUG] remove explicit set build flavor
	from hotspot/test/compiler/* tests
In-Reply-To: <553785E9.6070700@oracle.com>
References: <5527D7C1.9050704@oracle.com>
	<552B8AC5.6040404@oracle.com>	<552B8C3C.2080403@oracle.com>
	<553785E9.6070700@oracle.com>
Message-ID: <5537B9D4.5080604@oracle.com>

Reviewed. Good.

Thanks,
Vladimir

On 4/22/15 4:28 AM, Evgeniya Stepanova wrote:
> Hi!
> I still need a reviewer's approval for this.
> Please take a look
>
> Thanks,
> Jane
> On 13.04.2015 12:28, Evgeniya Stepanova wrote:
>> Hi Igor,
>>
>> Thank you for the review!
>>
>> Jane
>> On 13.04.2015 12:22, Igor Ignatyev wrote:
>>> Evgeniya,
>>>
>>> looks good to me.
>>>
>>> Igor
>>>
>>> On 04/10/2015 05:01 PM, Evgeniya Stepanova wrote:
>>>> Hi,
>>>>
>>>> Could you please review the back-port of 8038098 to the 8udev repo?
>>>> The diff applies cleanly to all tests except the
>>>> test/compiler/IntegerArithmetic/TestIntegerComparison.java test, which
>>>> does not exist in the 8u60 repo.
>>>> After the fix, tests pass with 8u60 b09 with the client VM.
>>>>
>>>> webrev for 8u60:
>>>> http://cr.openjdk.java.net/~eistepan/8038098/8u60/webrev.00/
>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8038098
>>>>
>>>> Original webrev:
>>>> http://cr.openjdk.java.net/~iignatyev/eistepan/8038098/webrev.02/
>>>> mail thread for 9:
>>>> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2014-September/015540.html
>>>> Original change:
>>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/662499384b32
>>>> http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/662499384b32
>>>>
>>>> Thanks,
>>>> Jane
>>>> --
>>>> /Evgeniya Stepanova/
>>
>> --
>> /Evgeniya Stepanova/
>
> --
> /Evgeniya Stepanova/

From aleksey.shipilev at oracle.com  Wed Apr 22 16:12:04 2015
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Wed, 22 Apr 2015 19:12:04 +0300
Subject: RFR (S) 8076987: C1 should support conditional card marks
	(UseCondCardMark)
In-Reply-To: <6E61C8FB-0837-449B-BBBE-0A4669746A91@oracle.com>
References: <55314A09.4050906@oracle.com>
	<6E61C8FB-0837-449B-BBBE-0A4669746A91@oracle.com>
Message-ID: <5537C854.8060604@oracle.com>

On 04/20/2015 03:06 PM, Roland Westrelin wrote:
>>  http://cr.openjdk.java.net/~shade/8076987/webrev.00/
> 
> That looks good to me.

Thanks Roland and Igor!

Please sponsor, here is the changeset:
 http://cr.openjdk.java.net/~shade/8076987/8076987.changeset

-Aleksey



From roland.westrelin at oracle.com  Wed Apr 22 16:18:03 2015
From: roland.westrelin at oracle.com (Roland Westrelin)
Date: Wed, 22 Apr 2015 18:18:03 +0200
Subject: RFR(S) 8077504: Unsafe load can loose control dependency and
	cause crash
In-Reply-To: <552FC216.4010503@redhat.com>
References: <42EDECBF-2347-406A-8835-29A98F1CA153@oracle.com>
	<552FC216.4010503@redhat.com>
Message-ID: <95F149F4-7BE2-45C8-A3C0-DEACFE42ACDF@oracle.com>

Hi Andrew,

Thanks for taking a look at this. See my answer below.

> I must admit I was a tad confused by your use of variable
> does_not_depend_only_on_test at library_call.cpp:2635. I understand that
> at this point you are trying to emphasise that this call gets passed
> false where other calls get passed true. So, in the call we see
> 
>  Node* p = make_load(control(), adr, value_type, type, adr_type, mo,
>                      does_not_depend_only_on_test, is_volatile);
> 
> which is fine when you understand what is going on. However, the
> preceding assignment:
> 
>  bool does_not_depend_only_on_test = false;
> 
> makes it look like you are suggesting that this case does depend only on
> the test.
> 
> Would it not be clearer if you signalled what is happening by avoiding
> the bool var declarations and instead tagging the calls with an
> explanatory comment as follows:
> 
>  Node* p = make_load(control(), adr, value_type, type, adr_type, mo,
>                      false, //does_not_depend_only_on_test
>                      is_volatile);
> 
> vs
> 
>  Node* p = make_load(control(), adr, value_type, type, adr_type, mo,
>                      true, //depends_only_on_test
>                      is_volatile);

I tried to use a code pattern that is used elsewhere in c2:

LoadLNode* LoadLNode::make_atomic(Node* ctl, Node* mem, Node* adr, const TypePtr* adr_type, const Type* rt, MemOrd mo) {
  bool require_atomic = true;
  return new LoadLNode(ctl, mem, adr, adr_type, rt->is_long(), mo, require_atomic);
}

I understand does_not_depend_only_on_test is confusing. I use that pattern because other options didn't feel better and I can't say I find it great.

Vladimir suggested privately to set _depends_only_on_test to true in the constructor and then use an explicit call to a new method set_depends_only_on_test() to set it to false in the rare cases where it's needed. That feels better indeed. What do you think?
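
To make the comparison concrete, here is a small Java analogue of that suggestion (the real code is C++ in C2; this is only a sketch of the calling pattern). The class name, the factory methods and the setter taking a boolean are hypothetical; only the idea of "default to true in the constructor, opt out explicitly in the rare case" comes from the discussion above.

```java
/** Java analogue of the constructor-default plus explicit-setter pattern; names are hypothetical. */
final class LoadNodeSketch {
    // The common case is chosen by the constructor.
    private boolean dependsOnlyOnTest = true;
    private final String description;

    LoadNodeSketch(String description) {
        this.description = description;
    }

    /** Explicit opt-out for the rare loads that need it; returns this for chaining. */
    LoadNodeSketch setDependsOnlyOnTest(boolean value) {
        this.dependsOnlyOnTest = value;
        return this;
    }

    boolean dependsOnlyOnTest() {
        return dependsOnlyOnTest;
    }

    String describe() {
        return description + " dependsOnlyOnTest=" + dependsOnlyOnTest;
    }

    /** Ordinary call sites do not mention the flag at all. */
    static LoadNodeSketch ordinaryLoad() {
        return new LoadNodeSketch("ordinary load");
    }

    /** The rare call site states the exception by name. */
    static LoadNodeSketch unsafeLoad() {
        return new LoadNodeSketch("unsafe load").setDependsOnlyOnTest(false);
    }

    public static void main(String[] args) {
        System.out.println(ordinaryLoad().describe());
        System.out.println(unsafeLoad().describe());
    }
}
```

Compared with passing a bare true/false argument, the exceptional call site reads as a named operation, and no negatively named local variable is needed.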

Roland.

> 
> regards,
> 
> 
> Andrew Dinn
> -----------
> Senior Principal Software Engineer
> Red Hat UK Ltd
> Registered in UK and Wales under Company Registration No. 3798903
> Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters
> (USA), Michael O'Neill (Ireland)


From zoltan.majo at oracle.com  Wed Apr 22 16:26:22 2015
From: zoltan.majo at oracle.com (Zoltán Majó)
Date: Wed, 22 Apr 2015 18:26:22 +0200
Subject: [9] RFR(S): 8068945: Use RBP register as proper frame pointer
	in JIT compiled code on x86
In-Reply-To: <551BF4D3.90805@oracle.com>
References: <55156A87.1070607@oracle.com>	<1427706703.1606.22.camel@mylittlepony.linaroharston>	<55196C2C.8080106@oracle.com>	<5519B1AE.8070901@oracle.com>	<5519BC6E.1090504@oracle.com>	<5519C29D.8080200@oracle.com>
	<551BF4D3.90805@oracle.com>
Message-ID: <5537CBAE.9020500@oracle.com>

Hi Vladimir,


I managed to do some more work on this enhancement. Please see details 
below.

On 04/01/2015 03:38 PM, Zoltán Majó wrote:
> Hi Vladimir,
>
>
> On 03/30/2015 11:39 PM, Vladimir Kozlov wrote:
>> On 3/30/15 2:13 PM, Zoltán Majó wrote:
>>> Hi Vladimir,
>>>
>>>
>>> thank you for the feedback!
>>>
>>> On 03/30/2015 10:27 PM, Vladimir Kozlov wrote:
>>>> How about PreserveFramePointer instead of simple FramePointer?
>>>>
>>>> PreserveFramePointer will mean that compiled (or other) code will use
>>>> that register only as Frame pointer.
>>>
>>> I will change the flag's name to PreserveFramePointer and will also
>>> update the description.

I changed the flag's name to PreserveFramePointer, just as you suggested.

>>>
>>>> Zoltan, x86 flags setting should be in general globals_x86.hpp. You
>>>> can #ifdef _LP64 there too. I don't understand why you only set it to
>>>> true on linux-x64.
>>>
>>> I remembered that the original discussion with Brendan Gregg mentioned
>>> only Linux's perf tool as a possible use case for "proper" frame
>>> pointers. So I was unsure whether to enable proper frame pointers by
>>> default on other x64 platforms as well.
>>>
>>> But if you think it would be better to have proper frame pointers on 
>>> all
>>> x64 platforms, I will change the code to set PreserveFramePointer to
>>> true for all x64 platforms. Just please let me know.

The current webrev sets the PreserveFramePointer flag to true on all 
x86_64 platforms and to false on all other platforms.

>>
>> Currently compiled code for all x86 platforms is almost the same 
>> (win64 has difference in registers usage) and we should keep it that 
>> way.
>>
>> Also the original request was to have flag to enable such behavior 
>> (use RBP only as FP). So to have it off by default is acceptable. If 
>> performance group or someone find a regression (or bug) due to this 
>> change we can switch the flag off by default before jdk9 release.
>>
>> Try to run pstack on Solaris and jstack on OSX to make sure they 
>> report correct call stack with compiled java methods. And JFR.
>> Also it would be nice to run SunStudio analyzer to verify that it works.
>
> I ran all tools you've suggested. JFR and jstack are unaffected; pstack 
> produces nice stack traces (it did not always do so before).

I tested the current webrev with the following setup: I used two tests, 
one that generates a long chain of lambda form invocations and another 
that generates a long chain of "regular" method invocations. Both 
tests were executed on an x64 machine in four configurations: with +/- 
Xcomp and with +/- PreserveFramePointer.
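
As a sketch of that test matrix (illustration only, not the actual test setup): the following Java snippet enumerates the four flag combinations, assuming that "-Xcomp" is either passed or left out and that the flag is spelled -XX:+PreserveFramePointer / -XX:-PreserveFramePointer as proposed in this thread. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Enumerates the four configurations; class and method names are hypothetical. */
final class TestMatrixSketch {
    static List<List<String>> configurations() {
        List<List<String>> result = new ArrayList<>();
        for (boolean xcomp : new boolean[] {false, true}) {
            for (boolean preserve : new boolean[] {false, true}) {
                List<String> flags = new ArrayList<>();
                if (xcomp) {
                    flags.add("-Xcomp"); // assumption: "- Xcomp" means the flag is omitted
                }
                flags.add(preserve ? "-XX:+PreserveFramePointer"
                                   : "-XX:-PreserveFramePointer");
                result.add(flags);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (List<String> flags : configurations()) {
            System.out.println(String.join(" ", flags));
        }
    }
}
```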

Just as before, JFR and jstack stack traces are unaffected for both 
tests; pstack can now produce stack traces with both tests if 
PreserveFramePointer is enabled.

> However, I've encountered a problem with SunStudio: Two asserts fail 
> in the fastdebug build. Both of them are "soft" failures, as neither the 
> VM nor SunStudio crash with the product build. I worked on the problem 
> today and have a partial understanding of the issue, but more 
> investigation is needed to have a patch that preserves the correct 
> behavior of SunStudio as well.

I was able to track down the problems with SunStudio. I had to change 
the code at two places.


Change #1 (in src/cpu/x86/vm/frame_x86.cpp):

*** 222,232 ****
       }

       if (sender_blob->is_nmethod()) {
           nmethod* nm = sender_blob->as_nmethod_or_null();
           if (nm != NULL) {
!             if (nm->is_deopt_mh_entry(sender_pc) || nm->is_deopt_entry(sender_pc)) {
                   return false;
               }
           }
       }

--- 222,233 ----
       }

       if (sender_blob->is_nmethod()) {
           nmethod* nm = sender_blob->as_nmethod_or_null();
           if (nm != NULL) {
!             if (nm->is_deopt_mh_entry(sender_pc) || nm->is_deopt_entry(sender_pc) ||
!                 nm->method()->is_method_handle_intrinsic()) {
                   return false;
               }
           }
       }

The reason for this change is the following. Method handle intrinsics 
(i.e., the intrinsics _invokeBasic, _linkToVirtual, _linkToStatic, 
_linkToSpecial, and _linkToInterface) do not allocate stack space when 
invoked, but they can extend the stack space of their caller "temporarily".

For example, if VerifyMethodHandles is enabled, some stack space is used 
during verification. The temporarily used stack space is released before 
the intrinsic jumps to its target. As a result, the target of a method 
handle intrinsic will have a correct SP when it returns and the 
program's control flow is correct.

Moreover, if the stack is walked synchronously (e.g., at safepoints), no 
problems appear either, because the synchronous interruption cannot happen 
while execution is within the method handle intrinsic.

The problem is that the SunStudio analyzer can interrupt the VM 
asynchronously and walk the stack. If execution of a thread is 
interrupted while the thread is in a method handle intrinsic, the SP 
might contain an invalid value.

The new webrev adds a check that marks the current frame unsafe for 
sender if the frame belongs to a method handle intrinsic 
(frame::safe_for_sender returns false in this case).
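
As a toy model of the condition in the diff above (illustration only; the real check is C++ code in frame_x86.cpp and is one of several checks in that function): the Java sketch below only mirrors the boolean logic. The Sender class and its fields are hypothetical stand-ins for what the VM queries from the nmethod.

```java
/** Toy model of the extended nmethod check; every name here is hypothetical. */
final class SenderCheckSketch {
    static final class Sender {
        final boolean isNmethod;
        final boolean atDeoptEntry;
        final boolean atDeoptMhEntry;
        final boolean methodHandleIntrinsic;
        Sender(boolean isNmethod, boolean atDeoptEntry,
               boolean atDeoptMhEntry, boolean methodHandleIntrinsic) {
            this.isNmethod = isNmethod;
            this.atDeoptEntry = atDeoptEntry;
            this.atDeoptMhEntry = atDeoptMhEntry;
            this.methodHandleIntrinsic = methodHandleIntrinsic;
        }
    }

    /**
     * Returns false when the frame must be rejected. A result of true only
     * means that this particular check did not reject it.
     */
    static boolean passesNmethodCheck(Sender s) {
        if (!s.isNmethod) {
            return true; // the check applies to nmethods only
        }
        // Before the change only the two deopt conditions were tested;
        // the third condition is the one added by the change.
        return !(s.atDeoptMhEntry || s.atDeoptEntry || s.methodHandleIntrinsic);
    }

    public static void main(String[] args) {
        Sender intrinsic = new Sender(true, false, false, true);
        System.out.println("method handle intrinsic accepted: "
                + passesNmethodCheck(intrinsic));
    }
}
```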


Change #2 (in src/share/vm/prims/forte.cpp):

*** 425,435 ****

       RegisterMap map(thd, false);
       initial_Java_frame = initial_Java_frame.sender(&map);
     }

!   vframeStreamForte st(thd, initial_Java_frame, false);

     for (; !st.at_end() && count < depth; st.forte_next(), count++) {
       bci = st.bci();
       method = st.method();

--- 425,435 ----

       RegisterMap map(thd, false);
       initial_Java_frame = initial_Java_frame.sender(&map);
     }

!   vframeStreamForte st(thd, initial_Java_frame, true);

     for (; !st.at_end() && count < depth; st.forte_next(), count++) {
       bci = st.bci();
       method = st.method();

The problem is that the following assert in forte.cpp on line 103

assert(filled_in, "invariant");

fails. The problem appears if we have a stack trace like:

V  [libjvm.so+0x1c98c4a]  void VMError::report(outputStream*)+0xb1a
V  [libjvm.so+0x1c9a3e8]  void VMError::report_and_die()+0x748
V  [libjvm.so+0x1003c8e]  void report_vm_error(const char*,int,const char*,const char*)+0x7e
V  [libjvm.so+0x10efa22]  vframeStreamForte::vframeStreamForte #Nvariant 1(JavaThread*,frame,bool)+0xe2
--> (Frame #5) V  [libjvm.so+0x10f0bb9]  void forte_fill_call_trace_given_top(JavaThread*,ASGCT_CallTrace*,int,frame)+0x789
V  [libjvm.so+0x10f1436]  AsyncGetCallTrace+0x246
C  [libcollector.so+0x272a8]  __collector_ext_jstack_unwind+0xb8
C  [libcollector.so+0x277df]  __collector_get_frame_info+0x27f
C  [libcollector.so+0x2f093]  __collector_getUserCtx+0x13
C  [libcollector.so+0x1abc7]  __collector_ext_profile_handler+0x127
C  [libcollector.so+0x17535]  collector_sigprof_dispatcher+0x85
C  [libc.so.1+0x122476]  __sighndlr+0x6
C  [libc.so.1+0x115972]  call_user_handler+0x2ce
C  [libc.so.1+0x115e1b]  sigacthandler+0xdb
C  0xffffffffffffffff
V  [libjvm.so+0x1959089]  void os::PlatformEvent::park()+0xd9
V  [libjvm.so+0x18a6b34]  int ParkCommon(ParkEvent*,long)+0x34
V  [libjvm.so+0x18a7657]  int Monitor::IWait(Thread*,long)+0xb7
V  [libjvm.so+0x18a8b86]  bool Monitor::wait(bool,long,bool)+0x346
V  [libjvm.so+0xf7073d]  void CompileBroker::wait_for_completion(CompileTask*)+0xad
V  [libjvm.so+0xf6f6b6]  void CompileBroker::compile_method_base(methodHandle,int,int,methodHandle,int,const char*,Thread*)+0x406
V  [libjvm.so+0xf6fd96]  nmethod*CompileBroker::compile_method(methodHandle,int,int,methodHandle,int,const char*,Thread*)+0x586
V  [libjvm.so+0xbbad72]  void AdvancedThresholdPolicy::submit_compile(methodHandle,int,CompLevel,JavaThread*)+0xb2
V  [libjvm.so+0x1aef92f]  void SimpleThresholdPolicy::compile(methodHandle,int,CompLevel,JavaThread*)+0x14f
V  [libjvm.so+0xbbb00f]  void AdvancedThresholdPolicy::method_invocation_event(methodHandle,methodHandle,CompLevel,nmethod*,JavaThread*)+0x1ff
V  [libjvm.so+0x1aef765]  nmethod*SimpleThresholdPolicy::event(methodHandle,methodHandle,int,int,CompLevel,nmethod*,JavaThread*)+0x2e5
V  [libjvm.so+0xdb93bc]  unsigned char*Runtime1::counter_overflow(JavaThread*,int,Method*)+0x31c
v  ~RuntimeStub::counter_overflow Runtime1 stub
--> (Frame #29) J 143 C1 java.net.URLClassLoader$1.run()Ljava/lang/Object; (5 bytes) @ 0xffff80ffacc6962a [0xffff80ffacc69580+0xaa]
--> (Frame #30) v  ~StubRoutines::call_stub
V  [libjvm.so+0x13ca50b]  void JavaCalls::call_helper(JavaValue*,methodHandle*,JavaCallArguments*,Thread*)+0x41b
V  [libjvm.so+0x152a111]  JVM_DoPrivileged+0xfb1
C  [libjava.so+0x12f42]  Java_java_security_AccessController_doPrivileged__Ljava_security_PrivilegedExceptionAction_2Ljava_security_AccessControlContext_2+0x12
J 142 java.security.AccessController.doPrivileged(Ljava/security/PrivilegedExceptionAction;Ljava/security/AccessControlContext;)Ljava/lang/Object; (0 bytes) @ 0xffff80ffb400c57c [0xffff80ffb400c420+0x15c]
J 134 C1 java.net.URLClassLoader.findClass(Ljava/lang/String;)Ljava/lang/Class; (47 bytes) @ 0xffff80ffacc66014 [0xffff80ffacc65ec0+0x154]
... more stack frames

The forte_fill_call_trace_given_top() method (Frame #5) first checks if 
the first Java frame found is fully decipherable (line 395 in 
forte.cpp). In our case the first Java frame is Frame #29 (the 
C1-compiled version of java.net.URLClassLoader$1.run).

In our case Frame #29 is not decipherable, because 
java.net.URLClassLoader$1.run has been made "not entrant" (a C2-compiled 
version of the same method has been produced shortly before).

Afterwards, forte_fill_call_trace_given_top() checks if the method is 
"safe for sender" (line 424 in forte.cpp). The caller of the 
java.net.URLClassLoader$1.run method is ~StubRoutines::call_stub, which 
is considered "safe for sender" by the VM.

Then, initial_Java_frame is set to the ~StubRoutines::call_stub stub 
(line 430). This does not seem to be correct because the stub is not a 
Java method and causes the assert(filled_in, "invariant") in the 
constructor of vframeStreamForte (line 103 in forte.cpp) to fail 
(because the frame cannot be filled from a stub).

To avoid this failure, I propose to call the constructor of 
vframeStreamForte with parameter stop_at_java_call_stub set to true 
(instead of false) so that the VM stops walking the stack if a call stub 
has been reached.
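
The effect of the flag can be pictured with a toy stack walker (illustration only; this is not how vframeStreamForte is implemented and it does not reproduce the assert). The Java sketch below assumes a stack given as a list of frames from youngest to oldest; the Kind enum, the Frame class and the method name are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stack walker with a stop-at-call-stub switch; every name here is hypothetical. */
final class StackWalkSketch {
    enum Kind { JAVA, CALL_STUB, NATIVE }

    static final class Frame {
        final Kind kind;
        final String name;
        Frame(Kind kind, String name) {
            this.kind = kind;
            this.name = name;
        }
    }

    /** Collects the names of Java frames, walking from the youngest frame to the oldest. */
    static List<String> javaFrames(List<Frame> stack, boolean stopAtJavaCallStub) {
        List<String> names = new ArrayList<>();
        for (Frame f : stack) {
            if (f.kind == Kind.CALL_STUB && stopAtJavaCallStub) {
                break; // treat the call stub as the end of the walk
            }
            if (f.kind == Kind.JAVA) {
                names.add(f.name);
            }
            // In this toy, non-Java frames are simply skipped.
        }
        return names;
    }

    public static void main(String[] args) {
        List<Frame> stack = new ArrayList<>();
        stack.add(new Frame(Kind.CALL_STUB, "call_stub"));
        stack.add(new Frame(Kind.NATIVE, "JVM_DoPrivileged"));
        stack.add(new Frame(Kind.JAVA, "doPrivileged"));
        System.out.println("frames reported: " + javaFrames(stack, true).size());
    }
}
```

When the walk starts directly at a call stub and the switch is on, nothing is reported, which corresponds to stopping instead of trying to fill in a frame from the stub.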


Here is the updated webrev:

http://cr.openjdk.java.net/~zmajo/8068945/webrev.02/

In addition to testing the changeset with the tools mentioned before, I 
executed
- all JPRT tests, all pass;
- all java/lang/invoke and compiler JTREG tests; all tests that pass 
with the unmodified source tree pass with the changes as well.

Thank you very much in advance!

Best regards,


Zoltan

>
> So that will put this RFR on hold for a while, unfortunately.
>
> Thank you for the feedback and suggestions so far!
>
> Best regards,
>
>
> Zoltan
>
>
>>
>> Thanks,
>> Vladimir
>>
>>>
>>> Thank you!
>>>
>>> Best regards,
>>>
>>>
>>> Zoltan
>>>
>>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>> On 3/30/15 8:30 AM, Zoltán Majó wrote:
>>>>> Hi Ed,
>>>>>
>>>>>
>>>>> thank you for your feedback! Please see comments below.
>>>>>
>>>>> On 03/30/2015 11:11 AM, Edward Nevill wrote:
>>>>>> Hi Zoltán,
>>>>>>
>>>>>> On Fri, 2015-03-27 at 15:34 +0100, Zoltán Majó wrote:
>>>>>>> Full JPRT run, all tests pass. I also ran all hotspot compiler
>>>>>>> tests and
>>>>>>> the jdk tests in java/lang/invoke on both x86_64 and x86_32. All 
>>>>>>> tests
>>>>>>> that pass without the patch pass also with the patch.
>>>>>>>
>>>>>>> I ran the SPEC JVM 2008 benchmarks on our performance
>>>>>>> infrastructure for
>>>>>>> x86_64. The performance evaluation suggests that there is no
>>>>>>> statistically significant performance degradation due to having 
>>>>>>> proper
>>>>>>> frame pointers. Therefore I propose to have OmitFramePointer set to
>>>>>>> false by default on x86_64 (and set to true on all other 
>>>>>>> platforms).
>>>>>> This patch looks good, however I think there is a problem with the
>>>>>> logic of OmitFramePointer.
>>>>>>
>>>>>> Here is my test case.
>>>>>>
>>>>>> --- CUT HERE ---
>>>>>> // $Id: fibo.java,v 1.2 2000/12/24 19:10:50 doug Exp $
>>>>>> // http://www.bagley.org/~doug/shootout/
>>>>>>
>>>>>> public class fibo {
>>>>>>      public static void main(String args[]) {
>>>>>>     int N = Integer.parseInt(args[0]);
>>>>>>     System.out.println(fib(N));
>>>>>>      }
>>>>>>      public static int fib(int n) {
>>>>>>     if (n < 2) return(1);
>>>>>>     return( fib(n-2) + fib(n-1) );
>>>>>>      }
>>>>>> }
>>>>>> --- CUT HERE ---
>>>>>>
>>>>>> If I run it as follows on my x86 64 bit linux.
>>>>>>
>>>>>> /work/images/jdk/bin/java -XX:-TieredCompilation 
>>>>>> -XX:+PrintCompilation
>>>>>> -XX:CompileOnly=fibo::fib -XX:+UnlockDiagnosticVMOptions
>>>>>> -XX:-OmitFramePointer -XX:+PrintAssembly fibo 43
>>>>>>
>>>>>> I get
>>>>>>
>>>>>>    # {method} {0x00007fc62c97f388} 'fib' '(I)I' in 'fibo'
>>>>>>    # parm0:    rsi       = int
>>>>>>    #           [sp+0x30]  (sp of caller)
>>>>>>    0x00007fc625071100: mov    %eax,-0x14000(%rsp)
>>>>>>    0x00007fc625071107: push   %rbp
>>>>>>    0x00007fc625071108: mov    %rsp,%rbp
>>>>>>    0x00007f836907110b: sub    $0x20,%rsp ;*synchronization entry
>>>>>>
>>>>>> which is correct, it is NOT(-) OmitFramePointer, therefore it is 
>>>>>> using
>>>>>> the frame pointer
>>>>>>
>>>>>> Now if I try just changing -XX:-OmitFramePointer to
>>>>>> -XX:+OmitFramePointer in the above I get
>>>>>>
>>>>>> /work/images/jdk/bin/java -XX:-TieredCompilation 
>>>>>> -XX:+PrintCompilation
>>>>>> -XX:CompileOnly=fibo::fib -XX:+UnlockDiagnosticVMOptions
>>>>>> -XX:+OmitFramePointer -XX:+PrintAssembly fibo 43
>>>>>>
>>>>>> I get
>>>>>>
>>>>>>    # {method} {0x00007f14d3c00388} 'fib' '(I)I' in 'fibo'
>>>>>>    # parm0:    rsi       = int
>>>>>>    #           [sp+0x30]  (sp of caller)
>>>>>>    0x00007f14e1071100: mov    %eax,-0x14000(%rsp)
>>>>>>    0x00007f14e1071107: push   %rbp
>>>>>>    0x00007f14e1071108: sub    $0x20,%rsp ;*synchronization entry
>>>>>>
>>>>>> which is correct, it is IS(+) OmitFramePointer, therefore it does
>>>>>> not
>>>>>> use a frame pointer.
>>>>>>
>>>>>> However, if I now delete the -XX:+/-OmitFramePointer altogether, IE
>>>>>>
>>>>>> /work/images/jdk/bin/java -XX:-TieredCompilation 
>>>>>> -XX:+PrintCompilation
>>>>>> -XX:CompileOnly=fibo::fib -XX:+UnlockDiagnosticVMOptions
>>>>>> -XX:+PrintAssembly fibo 43
>>>>>>
>>>>>> I get
>>>>>>
>>>>>>    # {method} {0x00007f0c4b730388} 'fib' '(I)I' in 'fibo'
>>>>>>    # parm0:    rsi       = int
>>>>>>    #           [sp+0x30]  (sp of caller)
>>>>>>    0x00007f0c75071100: mov    %eax,-0x14000(%rsp)
>>>>>>    0x00007f0c75071107: push   %rbp
>>>>>>    0x00007f0c75071108: sub    $0x20,%rsp ;*synchronization entry
>>>>>>
>>>>>> It is not using a frame pointer which is the equivalent of
>>>>>> -XX:+OmitFramePointer. However in your description above you say
>>>>>>
>>>>>>> Therefore I propose to have OmitFramePointer set to false by 
>>>>>>> default
>>>>>>> on x86_64 (and set to true on all other platforms).
>>>>>> whereas OmitFramePointer actually seems to be set to true on x86_64
>>>>>>
>>>>>> I think the problem may be with the declaration and definition of
>>>>>> OmitFramePointer in globals.hpp and globals_x86.hpp
>>>>>>
>>>>>> In globals.hpp it does
>>>>>>
>>>>>> product(bool, OmitFramePointer, true,
>>>>>>
>>>>>> In globals_x86.hpp it does
>>>>>>
>>>>>> LP64_ONLY(define_pd_global(bool, OmitFramePointer, false););
>>>>>>
>>>>>> I am not sure that you can mix product(...) and product_pd(...) like
>>>>>> this, so I think it just ends up getting the default from the
>>>>>> product(...).
>>>>>
>>>>> You are right, mixing product and product_pd does not make sense 
>>>>> at all.
>>>>> Thank you for doing additional testing and for drawing attention 
>>>>> to the
>>>>> problem.
>>>>>
>>>>> I updated the code to use product_pd and define_pd_global on all
>>>>> relevant platforms.
>>>>>
>>>>>> Aside: In general, I do not like options which include a negative in
>>>>>> them because I have to do a double think when I see something like,
>>>>>> -XX:-OmitFramePointer, as in, it is omitting the frame pointer,
>>>>>> therefore it is using a frame pointer. How about FramePointer so we
>>>>>> have -XX:+FramePointer to say I want frame pointers and
>>>>>> -XX:-FramePointer to say I don't.
>>>>>
>>>>> That is a good idea. Double negation is an unnecessary 
>>>>> complication, so
>>>>> I changed the name of the flag to FramePointer, just as you 
>>>>> suggested.
>>>>>
>>>>>>
>>>>>> I did some timing on the above 'fibo' test
>>>>>>
>>>>>> [ed at mylittlepony java]$ time /work/images/jdk/bin/java
>>>>>> -XX:-OmitFramePointer fibo 43
>>>>>> 701408733
>>>>>>
>>>>>> real    0m1.545s
>>>>>> user    0m1.571s
>>>>>> sys    0m0.015s
>>>>>> [ed at mylittlepony java]$ time /work/images/jdk/bin/java
>>>>>> -XX:+OmitFramePointer fibo 43
>>>>>> 701408733
>>>>>>
>>>>>> real    0m1.504s
>>>>>> user    0m1.527s
>>>>>> sys    0m0.019s
>>>>>>
>>>>>> which is ~3% difference on this test case. On aarch64, I see ~7%
>>>>>> difference on this test case.
>>>>>
>>>>> Thank you for the performance measurements!
>>>>>
>>>>>> With the above change to fix the logic of OmitFramePointer (and
>>>>>> possible change its name) the patch looks good to me.
>>>>>
>>>>> Here is the updated webrev (the same webrev that was already included
>>>>> into my reply to Roland):
>>>>>
>>>>> http://cr.openjdk.java.net/~zmajo/8068945/webrev.01/
>>>>>
>>>>>> I will prepare a mirror patch for aarch64.
>>>>>
>>>>> That would be great!
>>>>>
>>>>> Thank you and best regards,
>>>>>
>>>>>
>>>>> Zoltán
>>>>>
>>>>>>
>>>>>> All the best,
>>>>>> Ed.
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>
>


From adinn at redhat.com  Wed Apr 22 16:29:28 2015
From: adinn at redhat.com (Andrew Dinn)
Date: Wed, 22 Apr 2015 17:29:28 +0100
Subject: RFR(S) 8077504: Unsafe load can loose control dependency and
	cause crash
In-Reply-To: <95F149F4-7BE2-45C8-A3C0-DEACFE42ACDF@oracle.com>
References: <42EDECBF-2347-406A-8835-29A98F1CA153@oracle.com>
	<552FC216.4010503@redhat.com>
	<95F149F4-7BE2-45C8-A3C0-DEACFE42ACDF@oracle.com>
Message-ID: <5537CC68.2020407@redhat.com>

On 22/04/15 17:18, Roland Westrelin wrote:
> Thanks for taking a look at this. See my answer below.
> . . .
> I tried to use a code pattern that is used elsewhere in c2:
> 
> LoadLNode* LoadLNode::make_atomic(Node* ctl, Node* mem, Node* adr, const TypePtr* adr_type, const Type* rt, MemOrd mo) {
>   bool require_atomic = true;
>   return new LoadLNode(ctl, mem, adr, adr_type, rt->is_long(), mo, require_atomic);
> }
> 
> 
> I understand does_not_depend_only_on_test is confusing. I use that
> pattern because other options didn't feel better and I can't say I find
> it great.
> 
> Vladimir suggested privately to set _depends_only_on_test to true in
> the constructor and then use an explicit call to a new method
> set_depends_only_on_test() to set it to false in the rare cases where
> it's needed. That feels better indeed. What do you think?

Yes, I think that would be better.

By the way I should have made it clear I had nothing else to add. Apart
from this detail the patch looks fine as a way to avoid this load (and
/only/ this load) being moved up.
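
A minimal sketch of the pattern agreed on above (default to true in the
constructor, clear it explicitly in the rare case). LoadNodeSketch and
make_pinned_load are hypothetical names for illustration only, not the
C2 classes, and the bool parameter on the setter is an assumption; the
thread only names the method.

```cpp
#include <cassert>

// Hypothetical stand-in for a C2 load node; NOT the real LoadNode class.
class LoadNodeSketch {
 public:
  // Common case: the load depends only on its controlling test,
  // so the constructor sets the field to true.
  LoadNodeSketch() : _depends_only_on_test(true) {}

  // Rare case: the caller clears the flag explicitly after construction.
  // The bool parameter is an assumption made for this sketch.
  void set_depends_only_on_test(bool value) { _depends_only_on_test = value; }

  bool depends_only_on_test() const { return _depends_only_on_test; }

 private:
  bool _depends_only_on_test;
};

// Hypothetical factory for a load that must stay below its control input.
inline LoadNodeSketch make_pinned_load() {
  LoadNodeSketch load;
  load.set_depends_only_on_test(false);
  return load;
}
```

With this shape the unusual case is visible at the call site instead of
being hidden in a long constructor argument list.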

regards,


Andrew Dinn
-----------


From roland.westrelin at oracle.com  Wed Apr 22 16:44:21 2015
From: roland.westrelin at oracle.com (Roland Westrelin)
Date: Wed, 22 Apr 2015 18:44:21 +0200
Subject: RFR (S) 8076987: C1 should support conditional card marks
	(UseCondCardMark)
In-Reply-To: <5537C854.8060604@oracle.com>
References: <55314A09.4050906@oracle.com>
	<6E61C8FB-0837-449B-BBBE-0A4669746A91@oracle.com>
	<5537C854.8060604@oracle.com>
Message-ID: <B9774943-BBBD-40D1-9B5F-EF68063A8E49@oracle.com>


> Please sponsor, here is the changeset:
> http://cr.openjdk.java.net/~shade/8076987/8076987.changeset

I'm pushing it.

Roland.

From aleksey.shipilev at oracle.com  Wed Apr 22 16:46:10 2015
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Wed, 22 Apr 2015 19:46:10 +0300
Subject: RFR (S) 8076987: C1 should support conditional card marks
	(UseCondCardMark)
In-Reply-To: <B9774943-BBBD-40D1-9B5F-EF68063A8E49@oracle.com>
References: <55314A09.4050906@oracle.com>
	<6E61C8FB-0837-449B-BBBE-0A4669746A91@oracle.com>
	<5537C854.8060604@oracle.com>
	<B9774943-BBBD-40D1-9B5F-EF68063A8E49@oracle.com>
Message-ID: <5537D052.4030401@oracle.com>

On 04/22/2015 07:44 PM, Roland Westrelin wrote:
> 
>> Please sponsor, here is the changeset:
>> http://cr.openjdk.java.net/~shade/8076987/8076987.changeset
> 
> I'm pushing it.

Thank you!

-Aleksey.


From john.r.rose at oracle.com  Wed Apr 22 17:16:47 2015
From: john.r.rose at oracle.com (John Rose)
Date: Wed, 22 Apr 2015 10:16:47 -0700
Subject: [9] RFR (XXS): 8078309: compiler/jsr292/MHInlineTest.java failed
	with java.lang.RuntimeException: 'MHInlineTest$A::protected_x
	(3 bytes) virtual call' found in stdout
In-Reply-To: <5537B59F.5060606@oracle.com>
References: <5537B59F.5060606@oracle.com>
Message-ID: <CAD6B59F-407F-4EBE-A9A6-4BB62E86785E@oracle.com>

Reviewed.  -- John

On Apr 22, 2015, at 7:52 AM, Vladimir Ivanov <vladimir.x.ivanov at oracle.com> wrote:
> 
> http://cr.openjdk.java.net/~vlivanov/8078309/webrev.00/
> https://bugs.openjdk.java.net/browse/JDK-8078309
> 
> Some checks in the test are too strict.
> 
> Depending on the compilation order, DMH LambdaForms can be compiled first:
> 
> j.l.i.LF$DMH/1971489295::invokeVirtual_L_L (20 bytes)
>   @ 7 DMH::internalMemberName (8 bytes) force inline by annotation
>   @ 16 MHInlineTest$A::protected_x (3 bytes) virtual call
> 
> Instead of trying to ensure correct compilation order, I propose to remove them.
> 
> Thanks!
> 
> Best regards,
> Vladimir Ivanov


From vladimir.x.ivanov at oracle.com  Wed Apr 22 17:18:28 2015
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Wed, 22 Apr 2015 20:18:28 +0300
Subject: [9] RFR (XXS): 8078309: compiler/jsr292/MHInlineTest.java failed
	with java.lang.RuntimeException: 'MHInlineTest$A::protected_x (3
	bytes) virtual call' found in stdout
In-Reply-To: <CAD6B59F-407F-4EBE-A9A6-4BB62E86785E@oracle.com>
References: <5537B59F.5060606@oracle.com>
	<CAD6B59F-407F-4EBE-A9A6-4BB62E86785E@oracle.com>
Message-ID: <5537D7E4.2030905@oracle.com>

Thanks, John.

Best regards,
Vladimir Ivanov

On 4/22/15 8:16 PM, John Rose wrote:
> Reviewed.  -- John
>
> On Apr 22, 2015, at 7:52 AM, Vladimir Ivanov <vladimir.x.ivanov at oracle.com> wrote:
>>
>> http://cr.openjdk.java.net/~vlivanov/8078309/webrev.00/
>> https://bugs.openjdk.java.net/browse/JDK-8078309
>>
>> Some checks in the test are too strict.
>>
>> Depending on the compilation order, DMH LambdaForms can be compiled first:
>>
>> j.l.i.LF$DMH/1971489295::invokeVirtual_L_L (20 bytes)
>>    @ 7 DMH::internalMemberName (8 bytes) force inline by annotation
>>    @ 16 MHInlineTest$A::protected_x (3 bytes) virtual call
>>
>> Instead of trying to ensure correct compilation order, I propose to remove them.
>>
>> Thanks!
>>
>> Best regards,
>> Vladimir Ivanov
>

From vitalyd at gmail.com  Wed Apr 22 18:42:34 2015
From: vitalyd at gmail.com (Vitaly Davidovich)
Date: Wed, 22 Apr 2015 14:42:34 -0400
Subject: [9] RFR(S): 8068945: Use RBP register as proper frame pointer in
	JIT compiled code on x86
In-Reply-To: <5519C29D.8080200@oracle.com>
References: <55156A87.1070607@oracle.com>
	<1427706703.1606.22.camel@mylittlepony.linaroharston>
	<55196C2C.8080106@oracle.com> <5519B1AE.8070901@oracle.com>
	<5519BC6E.1090504@oracle.com> <5519C29D.8080200@oracle.com>
Message-ID: <CAHjP37GePOrnWQ8hUAQi8puNmgc9Q4-3kbSX7irV8UT_n_jS6w@mail.gmail.com>

Vladimir,

I don't think preserving the frame pointer should default to true.  I realize
you're relying on the performance group to detect regressions, but how
confident are you that they cover every conceivable scenario? Personally,
I'd rather this flag be off by default (keeping that register allocatable), as
most folks won't be running Linux perf or the like regularly, and if they
want nice call stacks, they can opt in.

What do you think?

sent from my phone
On Mar 30, 2015 5:39 PM, "Vladimir Kozlov" <vladimir.kozlov at oracle.com>
wrote:

> On 3/30/15 2:13 PM, Zoltán Majó wrote:
>
>> Hi Vladimir,
>>
>>
>> thank you for the feedback!
>>
>> On 03/30/2015 10:27 PM, Vladimir Kozlov wrote:
>>
>>> How about PreserveFramePointer instead of simple FramePointer?
>>>
>>> PreserveFramePointer will mean that compiled (or other) code will use
>>> that register only as Frame pointer.
>>>
>>
>> I will change the flag's name to PreserveFramePointer and will also
>> update the description.
>>
>>  Zoltan, x86 flags setting should be in general globals_x86.hpp. You
>>> can #ifdef _LP64 there too. I don't understand why you only set it to
>>> true on linux-x64.
>>>
>>
>> I remembered that the original discussion with Brendan Gregg mentioned
>> only Linux's perf tool as a possible use case for "proper" frame
>> pointers. So I was unsure whether to enable proper frame pointers by
>> default on other x64 platforms as well.
>>
>> But if you think it would be better to have proper frame pointers on all
>> x64 platforms, I will change the code to set PreserveFramePointer to
>> true for all x64 platforms. Just please let me know.
>>
>
> Currently compiled code for all x86 platforms is almost the same (win64
> has difference in registers usage) and we should keep it that way.
>
> Also the original request was to have flag to enable such behavior (use
> RBP only as FP). So to have it off by default is acceptable. If performance
> group or someone find a regression (or bug) due to this change we can
> switch the flag off by default before jdk9 release.
>
> Try to run pstack on Solaris and jstack on OSX to make sure they report
> correct call stack with compiled java methods. And JFR.
> Also it would be nice to run SunStudio analyzer to verify that it works.
>
> Thanks,
> Vladimir
>
>
>> Thank you!
>>
>> Best regards,
>>
>>
>> Zoltan
>>
>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 3/30/15 8:30 AM, Zoltán Majó wrote:
>>>
>>>> Hi Ed,
>>>>
>>>>
>>>> thank you for your feedback! Please see comments below.
>>>>
>>>> On 03/30/2015 11:11 AM, Edward Nevill wrote:
>>>>
>>>>> Hi Zoltán,
>>>>>
>>>>> On Fri, 2015-03-27 at 15:34 +0100, Zoltán Majó wrote:
>>>>>
>>>>>> Full JPRT run, all tests pass. I also ran all hotspot compiler
>>>>>> tests and
>>>>>> the jdk tests in java/lang/invoke on both x86_64 and x86_32. All tests
>>>>>> that pass without the patch pass also with the patch.
>>>>>>
>>>>>> I ran the SPEC JVM 2008 benchmarks on our performance
>>>>>> infrastructure for
>>>>>> x86_64. The performance evaluation suggests that there is no
>>>>>> statistically significant performance degradation due to having proper
>>>>>> frame pointers. Therefore I propose to have OmitFramePointer set to
>>>>>> false by default on x86_64 (and set to true on all other platforms).
>>>>>>
>>>>> This patch looks good, however I think there is a problem with the
>>>>> logic of OmitFramePointer.
>>>>>
>>>>> Here is my test case.
>>>>>
>>>>> --- CUT HERE ---
>>>>> // $Id: fibo.java,v 1.2 2000/12/24 19:10:50 doug Exp $
>>>>> // http://www.bagley.org/~doug/shootout/
>>>>>
>>>>> public class fibo {
>>>>>      public static void main(String args[]) {
>>>>>     int N = Integer.parseInt(args[0]);
>>>>>     System.out.println(fib(N));
>>>>>      }
>>>>>      public static int fib(int n) {
>>>>>     if (n < 2) return(1);
>>>>>     return( fib(n-2) + fib(n-1) );
>>>>>      }
>>>>> }
>>>>> --- CUT HERE ---
>>>>>
>>>>> If I run it as follows on my x86 64 bit linux.
>>>>>
>>>>> /work/images/jdk/bin/java -XX:-TieredCompilation -XX:+PrintCompilation
>>>>> -XX:CompileOnly=fibo::fib -XX:+UnlockDiagnosticVMOptions
>>>>> -XX:-OmitFramePointer -XX:+PrintAssembly fibo 43
>>>>>
>>>>> I get
>>>>>
>>>>>    # {method} {0x00007fc62c97f388} 'fib' '(I)I' in 'fibo'
>>>>>    # parm0:    rsi       = int
>>>>>    #           [sp+0x30]  (sp of caller)
>>>>>    0x00007fc625071100: mov    %eax,-0x14000(%rsp)
>>>>>    0x00007fc625071107: push   %rbp
>>>>>    0x00007fc625071108: mov    %rsp,%rbp
>>>>>    0x00007f836907110b: sub    $0x20,%rsp ;*synchronization entry
>>>>>
>>>>> which is correct, it is NOT(-) OmitFramePointer, therefore it is using
>>>>> the frame pointer
>>>>>
>>>>> Now if I try just changing -XX:-OmitFramePointer to
>>>>> -XX:+OmitFramePointer in the above I get
>>>>>
>>>>> /work/images/jdk/bin/java -XX:-TieredCompilation -XX:+PrintCompilation
>>>>> -XX:CompileOnly=fibo::fib -XX:+UnlockDiagnosticVMOptions
>>>>> -XX:+OmitFramePointer -XX:+PrintAssembly fibo 43
>>>>>
>>>>> I get
>>>>>
>>>>>    # {method} {0x00007f14d3c00388} 'fib' '(I)I' in 'fibo'
>>>>>    # parm0:    rsi       = int
>>>>>    #           [sp+0x30]  (sp of caller)
>>>>>    0x00007f14e1071100: mov    %eax,-0x14000(%rsp)
>>>>>    0x00007f14e1071107: push   %rbp
>>>>>    0x00007f14e1071108: sub    $0x20,%rsp ;*synchronization entry
>>>>>
>>>>> which is correct, it is IS(+) OmitFramePointer, therefore it does not
>>>>> use a frame pointer.
>>>>>
>>>>> However, if I now delete the -XX:+/-OmitFramePointer altogether, IE
>>>>>
>>>>> /work/images/jdk/bin/java -XX:-TieredCompilation -XX:+PrintCompilation
>>>>> -XX:CompileOnly=fibo::fib -XX:+UnlockDiagnosticVMOptions
>>>>> -XX:+PrintAssembly fibo 43
>>>>>
>>>>> I get
>>>>>
>>>>>    # {method} {0x00007f0c4b730388} 'fib' '(I)I' in 'fibo'
>>>>>    # parm0:    rsi       = int
>>>>>    #           [sp+0x30]  (sp of caller)
>>>>>    0x00007f0c75071100: mov    %eax,-0x14000(%rsp)
>>>>>    0x00007f0c75071107: push   %rbp
>>>>>    0x00007f0c75071108: sub    $0x20,%rsp ;*synchronization entry
>>>>>
>>>>> It is not using a frame pointer which is the equivalent of
>>>>> -XX:+OmitFramePointer. However in your description above you say
>>>>>
>>>>>  Therefore I propose to have OmitFramePointer set to false by default
>>>>>> on x86_64 (and set to true on all other platforms).
>>>>>>
>>>>> whereas OmitFramePointer actually seems to be set to true on x86_64
>>>>>
>>>>> I think the problem may be with the declaration and definition of
>>>>> OmitFramePointer in globals.hpp and globals_x86.hpp
>>>>>
>>>>> In globals.hpp it does
>>>>>
>>>>> product(bool, OmitFramePointer, true,
>>>>>
>>>>> In globals_x86.hpp it does
>>>>>
>>>>> LP64_ONLY(define_pd_global(bool, OmitFramePointer, false););
>>>>>
>>>>> I am not sure that you can mix product(...) and product_pd(...) like
>>>>> this, so I think it just ends up getting the default from the
>>>>> product(...).
>>>>>
>>>>
>>>> You are right, mixing product and product_pd does not make sense at all.
>>>> Thank you for doing additional testing and for drawing attention to the
>>>> problem.
>>>>
>>>> I updated the code to use product_pd and define_pd_global on all
>>>> relevant platforms.
>>>>
>>>>  Aside: In general, I do not like options which include a negative in
>>>>> them because I have to do a double think when I see something like,
>>>>> -XX:-OmitFramePointer, as in, it is omitting the frame pointer,
>>>>> therefore it is using a frame pointer. How about FramePointer so we
>>>>> have -XX:+FramePointer to say I want frame pointers and
>>>>> -XX:-FramePointer to say I don't.
>>>>>
>>>>
>>>> That is a good idea. Double negation is an unnecessary complication, so
>>>> I changed the name of the flag to FramePointer, just as you suggested.
>>>>
>>>>
>>>>> I did some timing on the above 'fibo' test
>>>>>
>>>>> [ed at mylittlepony java]$ time /work/images/jdk/bin/java
>>>>> -XX:-OmitFramePointer fibo 43
>>>>> 701408733
>>>>>
>>>>> real    0m1.545s
>>>>> user    0m1.571s
>>>>> sys    0m0.015s
>>>>> [ed at mylittlepony java]$ time /work/images/jdk/bin/java
>>>>> -XX:+OmitFramePointer fibo 43
>>>>> 701408733
>>>>>
>>>>> real    0m1.504s
>>>>> user    0m1.527s
>>>>> sys    0m0.019s
>>>>>
>>>>> which is ~3% difference on this test case. On aarch64, I see ~7%
>>>>> difference on this test case.
>>>>>
>>>>
>>>> Thank you for the performance measurements!
>>>>
>>>>  With the above change to fix the logic of OmitFramePointer (and
>>>>> possible change its name) the patch looks good to me.
>>>>>
>>>>
>>>> Here is the updated webrev (the same webrev that was already included
>>>> into my reply to Roland):
>>>>
>>>> http://cr.openjdk.java.net/~zmajo/8068945/webrev.01/
>>>>
>>>>  I will prepare a mirror patch for aarch64.
>>>>>
>>>>
>>>> That would be great!
>>>>
>>>> Thank you and best regards,
>>>>
>>>>
>>>> Zoltán
>>>>
>>>>
>>>>> All the best,
>>>>> Ed.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>

From aleksey.shipilev at oracle.com  Wed Apr 22 20:59:05 2015
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Wed, 22 Apr 2015 23:59:05 +0300
Subject: [9] RFR(S): 8068945: Use RBP register as proper frame pointer
	in JIT compiled code on x86
In-Reply-To: <5519C29D.8080200@oracle.com>
References: <55156A87.1070607@oracle.com>	<1427706703.1606.22.camel@mylittlepony.linaroharston>	<55196C2C.8080106@oracle.com>	<5519B1AE.8070901@oracle.com>	<5519BC6E.1090504@oracle.com>
	<5519C29D.8080200@oracle.com>
Message-ID: <55380B99.6060702@oracle.com>

On 03/31/2015 12:39 AM, Vladimir Kozlov wrote:
> Also the original request was to have flag to enable such behavior (use
> RBP only as FP). So to have it off by default is acceptable. If
> performance group or someone find a regression (or bug) due to this
> change we can switch the flag off by default before jdk9 release.

I'll save you a round-trip!

Given:
 a) enabling PreserveFramePointer can lead to a performance regression
under high register pressure;
 b) there is anecdotal evidence that +PreserveFramePointer *does* lead to
a performance regression (see Ed's fibo timings quoted earlier in this thread);
 c) the frame pointer is required for a specific use case (stack walking by
native profilers), and Java profilers are able to walk through
Java-specific frames anyway (Forte/AsyncGetCallTrace).

...I would say this feature should be opt-in, not opt-out.

If you want it to be opt-out, then please work with someone from
performance team to establish if enabling this by default really is safe.
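
For reference, the "~3%" figure for the fibo test can be reproduced from
the wall-clock ("real") times posted earlier in the thread. The helper
and constant names below are made up for this sketch, and the inputs are
single-run timings, so the result is indicative at best.

```cpp
#include <cassert>

// Relative slowdown of the slower run versus the faster run, in percent.
inline double relative_slowdown_percent(double slow_seconds,
                                        double fast_seconds) {
  return (slow_seconds - fast_seconds) / fast_seconds * 100.0;
}

// "real" times quoted in the thread for `fibo 43` on x86_64 Linux.
const double kWithFramePointerSeconds = 1.545;     // -XX:-OmitFramePointer
const double kWithoutFramePointerSeconds = 1.504;  // -XX:+OmitFramePointer
```

Calling relative_slowdown_percent(kWithFramePointerSeconds,
kWithoutFramePointerSeconds) gives a value a little under 3, which is
consistent with the rounded figure in the thread.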

Thanks,
-Aleksey.

From vladimir.kozlov at oracle.com  Wed Apr 22 21:09:54 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 22 Apr 2015 14:09:54 -0700
Subject: [9] RFR(S): 8068945: Use RBP register as proper frame pointer
	in JIT compiled code on x86
In-Reply-To: <CAHjP37GePOrnWQ8hUAQi8puNmgc9Q4-3kbSX7irV8UT_n_jS6w@mail.gmail.com>
References: <55156A87.1070607@oracle.com>	<1427706703.1606.22.camel@mylittlepony.linaroharston>	<55196C2C.8080106@oracle.com>
	<5519B1AE.8070901@oracle.com>	<5519BC6E.1090504@oracle.com>
	<5519C29D.8080200@oracle.com>
	<CAHjP37GePOrnWQ8hUAQi8puNmgc9Q4-3kbSX7irV8UT_n_jS6w@mail.gmail.com>
Message-ID: <55380E22.6050202@oracle.com>

You can always find a test which will regress with any change :)
We need to enable the flag by default for now to test the new code path.
As I said, we can switch it off later; the current default may not be the
final value that ships.
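
To make the default-value mechanics concrete, here is a simplified sketch
of the difference between a fixed default and a platform-dependent one,
which is what Ed ran into with OmitFramePointer in the quoted text below.
The SKETCH_ macros are hypothetical reductions of the product /
product_pd / define_pd_global idea (the real HotSpot macros take more
arguments), and the value false for PreserveFramePointer is chosen only
for illustration, not taken from the patch.

```cpp
#include <cassert>

// Hypothetical, simplified macros; NOT the real HotSpot definitions.
#define SKETCH_DEFINE_PD_GLOBAL(type, name, value) const type pd_##name = value;
#define SKETCH_PRODUCT(type, name, value)          type name = value;
#define SKETCH_PRODUCT_PD(type, name)              type name = pd_##name;

// Platform file (think globals_x86.hpp): the platform chooses the default.
SKETCH_DEFINE_PD_GLOBAL(bool, PreserveFramePointer, false)

// Shared file (think globals.hpp): the flag picks up the platform default.
SKETCH_PRODUCT_PD(bool, PreserveFramePointer)

// Mixing the two styles: the pd_ value is defined but never consulted,
// because the plain product form carries its own literal default.
SKETCH_DEFINE_PD_GLOBAL(bool, OmitFramePointer, false)
SKETCH_PRODUCT(bool, OmitFramePointer, true)
```

In this sketch, flipping the shipped default later is a one-line change
in the platform file, with no change to the shared declaration.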

Vladimir

On 4/22/15 11:42 AM, Vitaly Davidovich wrote:
> Vladimir,
>
> I don't think preserving frame pointer should default to true.  I
> realize you're relying on performance group to detect regressions, but
> how confident are you that they cover every conceivable scenario?
> Personally, I'd rather this flag be off by default (keep that register
> allocatable) as most folks won't be running linux perf or the like
> regularly, and if they want nice call stacks, they can opt in.
>
> What do you think?
>
> sent from my phone
>
> On Mar 30, 2015 5:39 PM, "Vladimir Kozlov" <vladimir.kozlov at oracle.com> wrote:
>
>     On 3/30/15 2:13 PM, Zoltán Majó wrote:
>
>         Hi Vladimir,
>
>
>         thank you for the feedback!
>
>         On 03/30/2015 10:27 PM, Vladimir Kozlov wrote:
>
>             How about PreserveFramePointer instead of simple FramePointer?
>
>             PreserveFramePointer will mean that compiled (or other) code
>             will use
>             that register only as Frame pointer.
>
>
>         I will change the flag's name to PreserveFramePointer and will also
>         update the description.
>
>             Zoltan, x86 flags setting should be in general
>             globals_x86.hpp. You
>             can #ifdef _LP64 there too. I don't understand why you only
>             set it to
>             true on linux-x64.
>
>
>         I remembered that the original discussion with Brendan Gregg
>         mentioned
>         only Linux's perf tool as a possible use case for "proper" frame
>         pointers. So I was unsure whether to enable proper frame pointers by
>         default on other x64 platforms as well.
>
>         But if you think it would be better to have proper frame
>         pointers on all
>         x64 platforms, I will change the code to set PreserveFramePointer to
>         true for all x64 platforms. Just please let me know.
>
>
>     Currently compiled code for all x86 platforms is almost the same
>     (win64 has difference in registers usage) and we should keep it that
>     way.
>
>     Also the original request was to have flag to enable such behavior
>     (use RBP only as FP). So to have it off by default is acceptable. If
>     performance group or someone find a regression (or bug) due to this
>     change we can switch the flag off by default before jdk9 release.
>
>     Try to run pstack on Solaris and jstack on OSX to make sure they
>     report correct call stack with compiled java methods. And JFR.
>     Also it would be nice to run SunStudio analyzer to verify that it works.
>
>     Thanks,
>     Vladimir
>
>
>         Thank you!
>
>         Best regards,
>
>
>         Zoltan
>
>
>             Thanks,
>             Vladimir
>
>             On 3/30/15 8:30 AM, Zoltán Majó wrote:
>
>                 Hi Ed,
>
>
>                 thank you for your feedback! Please see comments below.
>
>                 On 03/30/2015 11:11 AM, Edward Nevill wrote:
>
>                     Hi Zoltán,
>
>                     On Fri, 2015-03-27 at 15:34 +0100, Zoltán Majó wrote:
>
>                         Full JPRT run, all tests pass. I also ran all
>                         hotspot compiler
>                         tests and
>                         the jdk tests in java/lang/invoke on both x86_64
>                         and x86_32. All tests
>                         that pass without the patch pass also with the
>                         patch.
>
>                         I ran the SPEC JVM 2008 benchmarks on our
>                         performance
>                         infrastructure for
>                         x86_64. The performance evaluation suggests that
>                         there is no
>                         statistically significant performance
>                         degradation due to having proper
>                         frame pointers. Therefore I propose to have
>                         OmitFramePointer set to
>                         false by default on x86_64 (and set to true on
>                         all other platforms).
>
>                     This patch looks good, however I think there is a
>                     problem with the
>                     logic of OmitFramePointer.
>
>                     Here is my test case.
>
>                     --- CUT HERE ---
>                     // $Id: fibo.java,v 1.2 2000/12/24 19:10:50 doug Exp $
>                     // http://www.bagley.org/~doug/shootout/
>
>                     public class fibo {
>                           public static void main(String args[]) {
>                          int N = Integer.parseInt(args[0]);
>                          System.out.println(fib(N));
>                           }
>                           public static int fib(int n) {
>                          if (n < 2) return(1);
>                          return( fib(n-2) + fib(n-1) );
>                           }
>                     }
>                     --- CUT HERE ---
>
>                     If I run it as follows on my x86 64 bit linux.
>
>                     /work/images/jdk/bin/java -XX:-TieredCompilation
>                     -XX:+PrintCompilation
>                     -XX:CompileOnly=fibo::fib -XX:+UnlockDiagnosticVMOptions
>                     -XX:-OmitFramePointer -XX:+PrintAssembly fibo 43
>
>                     I get
>
>                         # {method} {0x00007fc62c97f388} 'fib' '(I)I' in
>                     'fibo'
>                         # parm0:    rsi       = int
>                         #           [sp+0x30]  (sp of caller)
>                         0x00007fc625071100: mov    %eax,-0x14000(%rsp)
>                         0x00007fc625071107: push   %rbp
>                         0x00007fc625071108: mov    %rsp,%rbp
>                         0x00007f836907110b: sub    $0x20,%rsp
>                     ;*synchronization entry
>
>                     which is correct, it is NOT(-) OmitFramePointer,
>                     therefore it is using
>                     the frame pointer
>
>                     Now if I try just changing -XX:-OmitFramePointer to
>                     -XX:+OmitFramePointer in the above I get
>
>                     /work/images/jdk/bin/java -XX:-TieredCompilation
>                     -XX:+PrintCompilation
>                     -XX:CompileOnly=fibo::fib -XX:+UnlockDiagnosticVMOptions
>                     -XX:+OmitFramePointer -XX:+PrintAssembly fibo 43
>
>                     I get
>
>                         # {method} {0x00007f14d3c00388} 'fib' '(I)I' in
>                     'fibo'
>                         # parm0:    rsi       = int
>                         #           [sp+0x30]  (sp of caller)
>                         0x00007f14e1071100: mov    %eax,-0x14000(%rsp)
>                         0x00007f14e1071107: push   %rbp
>                         0x00007f14e1071108: sub    $0x20,%rsp
>                     ;*synchronization entry
>
>                     which is correct, it is IS(+) OmitFramePointer,
>                     therefore it does not
>                     use a frame pointer.
>
>                     However, if I now delete the -XX:+/-OmitFramePointer
>                     altogether, IE
>
>                     /work/images/jdk/bin/java -XX:-TieredCompilation
>                     -XX:+PrintCompilation
>                     -XX:CompileOnly=fibo::fib -XX:+UnlockDiagnosticVMOptions
>                     -XX:+PrintAssembly fibo 43
>
>                     I get
>
>                         # {method} {0x00007f0c4b730388} 'fib' '(I)I' in
>                     'fibo'
>                         # parm0:    rsi       = int
>                         #           [sp+0x30]  (sp of caller)
>                         0x00007f0c75071100: mov    %eax,-0x14000(%rsp)
>                         0x00007f0c75071107: push   %rbp
>                         0x00007f0c75071108: sub    $0x20,%rsp
>                     ;*synchronization entry
>
>                     It is not using a frame pointer which is the
>                     equivalent of
>                     -XX:+OmitFramePointer. However in your description
>                     above you say
>
>                         Therefore I propose to have OmitFramePointer set
>                         to false by default
>                         on x86_64 (and set to true on all other platforms).
>
>                     whereas OmitFramePointer actually seems to be set to
>                     true on x86_64
>
>                     I think the problem may be with the declaration and
>                     definition of
>                     OmitFramePointer in globals.hpp and globals_x86.hpp
>
>                     In globals.hpp it does
>
>                     product(bool, OmitFramePointer, true,
>
>                     In globals_x86.hpp it does
>
>                     LP64_ONLY(define_pd_global(bool, OmitFramePointer,
>                     false););
>
>                     I am not sure that you can mix product(...) and
>                     product_pd(...) like
>                     this, so I think it just ends up getting the default
>                     from the
>                     product(...).
>
>
>                 You are right, mixing product and product_pd does not
>                 make sense at all.
>                 Thank you for doing additional testing and for drawing
>                 attention to the
>                 problem.
>
>                 I updated the code to use product_pd and
>                 define_pd_global on all
>                 relevant platforms.
>
>                     Aside: In general, I do not like options which
>                     include a negative in
>                     them because I have to do a double think when I see
>                     something like,
>                     -XX:-OmitFramePointer, as in, it is omitting the
>                     frame pointer,
>                     therefore it is using a frame pointer. How about
>                     FramePointer so we
>                     have -XX:+FramePointer to say I want frame pointers and
>                     -XX:-FramePointer to say I don't.
>
>
>                 That is a good idea. Double negation is an unnecessary
>                 complication, so
>                 I changed the name of the flag to FramePointer, just as
>                 you suggested.
>
>
>                     I did some timing on the above 'fibo' test
>
>                     [ed at mylittlepony java]$ time /work/images/jdk/bin/java
>                     -XX:-OmitFramePointer fibo 43
>                     701408733
>
>                     real    0m1.545s
>                     user    0m1.571s
>                     sys    0m0.015s
>                     [ed at mylittlepony java]$ time /work/images/jdk/bin/java
>                     -XX:+OmitFramePointer fibo 43
>                     701408733
>
>                     real    0m1.504s
>                     user    0m1.527s
>                     sys    0m0.019s
>
>                     which is ~3% difference on this test case. On
>                     aarch64, I see ~7%
>                     difference on this test case.
>
>
>                 Thank you for the performance measurements!
>
>                     With the above change to fix the logic of
>                     OmitFramePointer (and
>                     possibly change its name) the patch looks good to me.
>
>
>                 Here is the updated webrev (the same webrev that was
>                 already included
>                 into my reply to Roland):
>
>                 http://cr.openjdk.java.net/~zmajo/8068945/webrev.01/
>
>                     I will prepare a mirror patch for aarch64.
>
>
>                 That would be great!
>
>                 Thank you and best regards,
>
>
>                 Zoltán
>
>
>                     All the best,
>                     Ed.
>
>
>
>
>

From vitalyd at gmail.com  Wed Apr 22 21:13:18 2015
From: vitalyd at gmail.com (Vitaly Davidovich)
Date: Wed, 22 Apr 2015 17:13:18 -0400
Subject: [9] RFR(S): 8068945: Use RBP register as proper frame pointer in
	JIT compiled code on x86
In-Reply-To: <55380E22.6050202@oracle.com>
References: <55156A87.1070607@oracle.com>
	<1427706703.1606.22.camel@mylittlepony.linaroharston>
	<55196C2C.8080106@oracle.com> <5519B1AE.8070901@oracle.com>
	<5519BC6E.1090504@oracle.com> <5519C29D.8080200@oracle.com>
	<CAHjP37GePOrnWQ8hUAQi8puNmgc9Q4-3kbSX7irV8UT_n_jS6w@mail.gmail.com>
	<55380E22.6050202@oracle.com>
Message-ID: <CAHjP37FSF=frma0hkp3V2jLKM9ZV6auPTY-2NCtqfJ-WXHY7-g@mail.gmail.com>

I'm ok with that if it makes your testing easier for now, but wouldn't want
that to slip in as default in java 9 :).  But at any rate, if this feature
is going to be supported properly, at some point you'll need to have tests
that run both flavors anyway, irrespective of which is the default, so
perhaps just do that now (create test configs with this explicitly enabled,
and disable it by default).  Anyway, your call.
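
As a rough sketch of what "tests that run both flavors" could look like,
the jtreg-style test below runs the fibo workload from earlier in this
thread once with the flag on and once with it off. It assumes the flag
keeps the name PreserveFramePointer proposed in this thread; the class
name FiboBothFlags and the argument 30 are made up for illustration.

```java
/*
 * @test
 * @summary Hypothetical sketch: run the same workload with the flag on and off.
 * @run main/othervm -XX:+PreserveFramePointer FiboBothFlags 30
 * @run main/othervm -XX:-PreserveFramePointer FiboBothFlags 30
 */
public class FiboBothFlags {
    // Same recurrence as the fibo test quoted in this thread: fib(0) = fib(1) = 1.
    public static int fib(int n) {
        if (n < 2) return 1;
        return fib(n - 2) + fib(n - 1);
    }

    public static void main(String[] args) {
        int n = args.length > 0 ? Integer.parseInt(args[0]) : 30;
        int result = fib(n);
        // The result must not depend on whether RBP is reserved as a frame pointer.
        if (n == 30 && result != 1346269) {
            throw new AssertionError("unexpected fib(30): " + result);
        }
        System.out.println(result);
    }
}
```

Both runs should compute the same value; the test only checks that neither
code path breaks, not which one is faster.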

On Wed, Apr 22, 2015 at 5:09 PM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:

> You can always find a test which will regress with any changes :)
> We need to enable the flag by default for now to test new code path.
> As I said we can switch it off later - it may be not final value which
> will be shipped.
>
> Vladimir
>
> On 4/22/15 11:42 AM, Vitaly Davidovich wrote:
>
>> Vladimir,
>>
>> I don't think preserving frame pointer should default to true.  I
>> realize you're relying on performance group to detect regressions, but
>> how confident are you that they cover every conceivable scenario?
>> Personally, I'd rather this flag be off by default (keep that register
>> allocatable) as most folks won't be running linux perf or the like
>> regularly, and if they want nice call stacks, they can opt in.
>>
>> What do you think?
>>
>> sent from my phone
>>
>> On Mar 30, 2015 5:39 PM, "Vladimir Kozlov" <vladimir.kozlov at oracle.com> wrote:
>>
>>     On 3/30/15 2:13 PM, Zoltán Majó wrote:
>>
>>         Hi Vladimir,
>>
>>
>>         thank you for the feedback!
>>
>>         On 03/30/2015 10:27 PM, Vladimir Kozlov wrote:
>>
>>             How about PreserveFramePointer instead of simple FramePointer?
>>
>>             PreserveFramePointer will mean that compiled (or other) code
>>             will use
>>             that register only as Frame pointer.
>>
>>
>>         I will change the flag's name to PreserveFramePointer and will also
>>         update the description.
>>
>>             Zoltan, x86 flags setting should be in general
>>             globals_x86.hpp. You
>>             can #ifdef _LP64 there too. I don't understand why you only
>>             set it to
>>             true on linux-x64.
>>
>>
>>         I remembered that the original discussion with Brendan Gregg
>>         mentioned
>>         only Linux's perf tool as a possible use case for "proper" frame
>>         pointers. So I was unsure whether to enable proper frame pointers by
>>         default on other x64 platforms as well.
>>
>>         But if you think it would be better to have proper frame
>>         pointers on all
>>         x64 platforms, I will change the code to set PreserveFramePointer to
>>         true for all x64 platforms. Just please let me know.
>>
>>
>>     Currently compiled code for all x86 platforms is almost the same
>>     (win64 has difference in registers usage) and we should keep it that
>>     way.
>>
>>     Also the original request was to have flag to enable such behavior
>>     (use RBP only as FP). So to have it off by default is acceptable. If
>>     performance group or someone find a regression (or bug) due to this
>>     change we can switch the flag off by default before jdk9 release.
>>
>>     Try to run pstack on Solaris and jstack on OSX to make sure they
>>     report correct call stack with compiled java methods. And JFR.
>>     Also it would be nice to run SunStudio analyzer to verify that it works.
>>
>>     Thanks,
>>     Vladimir
>>
>>
>>         Thank you!
>>
>>         Best regards,
>>
>>
>>         Zoltan
>>
>>
>>             Thanks,
>>             Vladimir
>>
>>             On 3/30/15 8:30 AM, Zoltán Majó wrote:
>>
>>                 Hi Ed,
>>
>>
>>                 thank you for your feedback! Please see comments below.
>>
>>                 On 03/30/2015 11:11 AM, Edward Nevill wrote:
>>
>>                     Hi Zoltán,
>>
>>                     On Fri, 2015-03-27 at 15:34 +0100, Zoltán Majó wrote:
>>
>>                         Full JPRT run, all tests pass. I also ran all
>>                         hotspot compiler
>>                         tests and
>>                         the jdk tests in java/lang/invoke on both x86_64
>>                         and x86_32. All tests
>>                         that pass without the patch pass also with the
>>                         patch.
>>
>>                         I ran the SPEC JVM 2008 benchmarks on our
>>                         performance
>>                         infrastructure for
>>                         x86_64. The performance evaluation suggests that
>>                         there is no
>>                         statistically significant performance
>>                         degradation due to having proper
>>                         frame pointers. Therefore I propose to have
>>                         OmitFramePointer set to
>>                         false by default on x86_64 (and set to true on
>>                         all other platforms).
>>
>>                     This patch looks good, however I think there is a
>>                     problem with the
>>                     logic of OmitFramePointer.
>>
>>                     Here is my test case.
>>
>>                     --- CUT HERE ---
>>                     // $Id: fibo.java,v 1.2 2000/12/24 19:10:50 doug Exp $
>>                     // http://www.bagley.org/~doug/shootout/
>>
>>                     public class fibo {
>>                           public static void main(String args[]) {
>>                          int N = Integer.parseInt(args[0]);
>>                          System.out.println(fib(N));
>>                           }
>>                           public static int fib(int n) {
>>                          if (n < 2) return(1);
>>                          return( fib(n-2) + fib(n-1) );
>>                           }
>>                     }
>>                     --- CUT HERE ---
>>
>>                     If I run it as follows on my x86 64 bit linux.
>>
>>                     /work/images/jdk/bin/java -XX:-TieredCompilation
>>                     -XX:+PrintCompilation
>>                     -XX:CompileOnly=fibo::fib -XX:+UnlockDiagnosticVMOptions
>>                     -XX:-OmitFramePointer -XX:+PrintAssembly fibo 43
>>
>>                     I get
>>
>>                         # {method} {0x00007fc62c97f388} 'fib' '(I)I' in
>>                     'fibo'
>>                         # parm0:    rsi       = int
>>                         #           [sp+0x30]  (sp of caller)
>>                         0x00007fc625071100: mov    %eax,-0x14000(%rsp)
>>                         0x00007fc625071107: push   %rbp
>>                         0x00007fc625071108: mov    %rsp,%rbp
>>                         0x00007f836907110b: sub    $0x20,%rsp
>>                     ;*synchronization entry
>>
>>                     which is correct, it is NOT(-) OmitFramePointer,
>>                     therefore it is using
>>                     the frame pointer
>>
>>                     Now if I try just changing -XX:-OmitFramePointer to
>>                     -XX:+OmitFramePointer in the above I get
>>
>>                     /work/images/jdk/bin/java -XX:-TieredCompilation
>>                     -XX:+PrintCompilation
>>                     -XX:CompileOnly=fibo::fib -XX:+UnlockDiagnosticVMOptions
>>                     -XX:+OmitFramePointer -XX:+PrintAssembly fibo 43
>>
>>                     I get
>>
>>                         # {method} {0x00007f14d3c00388} 'fib' '(I)I' in
>>                     'fibo'
>>                         # parm0:    rsi       = int
>>                         #           [sp+0x30]  (sp of caller)
>>                         0x00007f14e1071100: mov    %eax,-0x14000(%rsp)
>>                         0x00007f14e1071107: push   %rbp
>>                         0x00007f14e1071108: sub    $0x20,%rsp
>>                     ;*synchronization entry
>>
>>                     which is correct, it is ID(+) OmitFramePointer,
>>                     therefore it does not
>>                     use a frame pointer.
>>
>>                     However, if I now delete the -XX:+/-OmitFramePointer
>>                     altogether, IE
>>
>>                     /work/images/jdk/bin/java -XX:-TieredCompilation
>>                     -XX:+PrintCompilation
>>                     -XX:CompileOnly=fibo::fib -XX:+UnlockDiagnosticVMOptions
>>                     -XX:+PrintAssembly fibo 43
>>
>>                     I get
>>
>>                         # {method} {0x00007f0c4b730388} 'fib' '(I)I' in
>>                     'fibo'
>>                         # parm0:    rsi       = int
>>                         #           [sp+0x30]  (sp of caller)
>>                         0x00007f0c75071100: mov    %eax,-0x14000(%rsp)
>>                         0x00007f0c75071107: push   %rbp
>>                         0x00007f0c75071108: sub    $0x20,%rsp
>>                     ;*synchronization entry
>>
>>                     It is not using a frame pointer which is the
>>                     equivalent of
>>                     -XX:+OmitFramePointer. However in your description
>>                     above you say
>>
>>                         Therefore I propose to have OmitFramePointer set
>>                         to false by default
>>                         on x86_64 (and set to true on all other platforms).
>>
>>                     whereas OmitFramePointer actually seems to be set to
>>                     true on x86_64
>>
>>                     I think the problem may be with the declaration and
>>                     definition of
>>                     OmitFramePointer in globals.hpp and globals_x86.hpp
>>
>>                     In globals.hpp it does
>>
>>                     product(bool, OmitFramePointer, true,
>>
>>                     In globals_x86.hpp it does
>>
>>                     LP64_ONLY(define_pd_global(bool, OmitFramePointer,
>>                     false););
>>
>>                     I am not sure that you can mix product(...) and
>>                     product_pd(...) like
>>                     this, so I think it just ends up getting the default
>>                     from the
>>                     product(...).
>>
>>
>>                 You are right, mixing product and product_pd does not
>>                 make sense at all.
>>                 Thank you for doing additional testing and for drawing
>>                 attention to the
>>                 problem.
>>
>>                 I updated the code to use product_pd and
>>                 define_pd_global on all
>>                 relevant platforms.
>>
>>                     Aside: In general, I do not like options which
>>                     include a negative in
>>                     them because I have to do a double think when I see
>>                     something like,
>>                     -XX:-OmitFramePointer, as in, it is omitting the
>>                     frame pointer,
>>                     therefore it is using a frame pointer. How about
>>                     FramePointer so we
>>                     have -XX:+FramePointer to say I want frame pointers and
>>                     -XX:-FramePointer to say I don't.
>>
>>
>>                 That is a good idea. Double negation is an unnecessary
>>                 complication, so
>>                 I changed the name of the flag to FramePointer, just as
>>                 you suggested.
>>
>>
>>                     I did some timing on the above 'fibo' test
>>
>>                     [ed at mylittlepony java]$ time /work/images/jdk/bin/java
>>                     -XX:-OmitFramePointer fibo 43
>>                     701408733
>>
>>                     real    0m1.545s
>>                     user    0m1.571s
>>                     sys    0m0.015s
>>                     [ed at mylittlepony java]$ time /work/images/jdk/bin/java
>>                     -XX:+OmitFramePointer fibo 43
>>                     701408733
>>
>>                     real    0m1.504s
>>                     user    0m1.527s
>>                     sys    0m0.019s
>>
>>                     which is ~3% difference on this test case. On
>>                     aarch64, I see ~7%
>>                     difference on this test case.
>>
>>
>>                 Thank you for the performance measurements!
>>
>>                     With the above change to fix the logic of
>>                     OmitFramePointer (and
>>                     possibly change its name) the patch looks good to me.
>>
>>
>>                 Here is the updated webrev (the same webrev that was
>>                 already included
>>                 into my reply to Roland):
>>
>>                 http://cr.openjdk.java.net/~zmajo/8068945/webrev.01/
>>
>>                     I will prepare a mirror patch for aarch64.
>>
>>
>>                 That would be great!
>>
>>                 Thank you and best regards,
>>
>>
>>                 Zoltán
>>
>>
>>                     All the best,
>>                     Ed.
>>
>>
>>
>>
>>
>>

From aleksey.shipilev at oracle.com  Wed Apr 22 21:16:27 2015
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Thu, 23 Apr 2015 00:16:27 +0300
Subject: [9] RFR(S): 8068945: Use RBP register as proper frame pointer
	in JIT compiled code on x86
In-Reply-To: <55380E22.6050202@oracle.com>
References: <55156A87.1070607@oracle.com>	<1427706703.1606.22.camel@mylittlepony.linaroharston>	<55196C2C.8080106@oracle.com>	<5519B1AE.8070901@oracle.com>	<5519BC6E.1090504@oracle.com>	<5519C29D.8080200@oracle.com>	<CAHjP37GePOrnWQ8hUAQi8puNmgc9Q4-3kbSX7irV8UT_n_jS6w@mail.gmail.com>
	<55380E22.6050202@oracle.com>
Message-ID: <55380FAB.1070000@oracle.com>

On 04/23/2015 12:09 AM, Vladimir Kozlov wrote:
> You can always find a test which will regress with any changes :)

Yes, and the job for performance guys is to say which tests are really
important, and which can be disregarded ;)

> We need to enable the flag by default for now to test new code path.
> As I said we can switch it off later - it may be not final value which
> will be shipped.

I fully dig the idea of running the tests with a flag set on.

IMO, if this thing comes in enabled by default, there should be a shadow
high-priority bug saying "Figure out if we need to disable $flag", so
it does not slip from the release radar.

-Aleksey.



From vitalyd at gmail.com  Wed Apr 22 21:21:16 2015
From: vitalyd at gmail.com (Vitaly Davidovich)
Date: Wed, 22 Apr 2015 17:21:16 -0400
Subject: [9] RFR(S): 8068945: Use RBP register as proper frame pointer in
	JIT compiled code on x86
In-Reply-To: <55380FAB.1070000@oracle.com>
References: <55156A87.1070607@oracle.com>
	<1427706703.1606.22.camel@mylittlepony.linaroharston>
	<55196C2C.8080106@oracle.com> <5519B1AE.8070901@oracle.com>
	<5519BC6E.1090504@oracle.com> <5519C29D.8080200@oracle.com>
	<CAHjP37GePOrnWQ8hUAQi8puNmgc9Q4-3kbSX7irV8UT_n_jS6w@mail.gmail.com>
	<55380E22.6050202@oracle.com> <55380FAB.1070000@oracle.com>
Message-ID: <CAHjP37FquPYcpjA9yPPNP_=nsiYKKkeQ0Doc-ndhHQDOW2w7Xg@mail.gmail.com>

>
> IMO, if this thing comes in enabled by default, there should be a shadow
> high-priority bug saying "Figure out if we need to disable $flag", so
> it does not slip from the release radar.


More like "Turn off $flag before release" :) It seems uncontroversial to
say this can only cause regressions, and not improve performance (and if it
somehow does, then perhaps it's a bug elsewhere).  Moreover, the use case
is for performance profiling/tuning, i.e. not an everyday feature you'd
care about.

On Wed, Apr 22, 2015 at 5:16 PM, Aleksey Shipilev <
aleksey.shipilev at oracle.com> wrote:

> On 04/23/2015 12:09 AM, Vladimir Kozlov wrote:
> > You can always find a test which will regress with any changes :)
>
> Yes, and the job for performance guys is to say which tests are really
> important, and which can be disregarded ;)
>
> > We need to enable the flag by default for now to test new code path.
> > As I said we can switch it off later - it may be not final value which
> > will be shipped.
>
> I fully dig the idea of running the tests with a flag set on.
>
> IMO, if this thing comes in enabled by default, there should be a shadow
> high-priority bug saying "Figure out if we need to disable $flag", so
> it does not slip from the release radar.
>
> -Aleksey.
>
>
>

From vladimir.kozlov at oracle.com  Wed Apr 22 21:21:26 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 22 Apr 2015 14:21:26 -0700
Subject: [9] RFR(S): 8068945: Use RBP register as proper frame pointer
	in JIT compiled code on x86
In-Reply-To: <55380FAB.1070000@oracle.com>
References: <55156A87.1070607@oracle.com>	<1427706703.1606.22.camel@mylittlepony.linaroharston>	<55196C2C.8080106@oracle.com>	<5519B1AE.8070901@oracle.com>	<5519BC6E.1090504@oracle.com>	<5519C29D.8080200@oracle.com>	<CAHjP37GePOrnWQ8hUAQi8puNmgc9Q4-3kbSX7irV8UT_n_jS6w@mail.gmail.com>
	<55380E22.6050202@oracle.com> <55380FAB.1070000@oracle.com>
Message-ID: <553810D6.3060409@oracle.com>

Okay. We can leave the flag off by default but ask SQE to add it to 
flags rotation in our Nightly testing. This way we will have test 
coverage for both cases. Igor?

Vladimir

On 4/22/15 2:16 PM, Aleksey Shipilev wrote:
> On 04/23/2015 12:09 AM, Vladimir Kozlov wrote:
>> You can always find a test which will regress with any changes :)
>
> Yes, and the job for performance guys is to say which tests are really
> important, and which can be disregarded ;)
>
>> We need to enable the flag by default for now to test new code path.
>> As I said we can switch it off later - it may be not final value which
>> will be shipped.
>
> I fully dig the idea of running the tests with a flag set on.
>
> IMO, if this thing comes in enabled by default, there should be a shadow
> high-priority bug saying "Figure out if we need to disable $flag", so
> it does not slip from the release radar.
>
> -Aleksey.
>
>

From aleksey.shipilev at oracle.com  Wed Apr 22 21:27:57 2015
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Thu, 23 Apr 2015 00:27:57 +0300
Subject: [9] RFR(S): 8068945: Use RBP register as proper frame pointer
	in JIT compiled code on x86
In-Reply-To: <553810D6.3060409@oracle.com>
References: <55156A87.1070607@oracle.com>	<1427706703.1606.22.camel@mylittlepony.linaroharston>	<55196C2C.8080106@oracle.com>	<5519B1AE.8070901@oracle.com>	<5519BC6E.1090504@oracle.com>	<5519C29D.8080200@oracle.com>	<CAHjP37GePOrnWQ8hUAQi8puNmgc9Q4-3kbSX7irV8UT_n_jS6w@mail.gmail.com>
	<55380E22.6050202@oracle.com> <55380FAB.1070000@oracle.com>
	<553810D6.3060409@oracle.com>
Message-ID: <5538125D.1070800@oracle.com>

On 04/23/2015 12:21 AM, Vladimir Kozlov wrote:
> Okay. We can leave the flag off by default but ask SQE to add it to
> flags rotation in our Nightly testing. This way we will have test
> coverage for both cases. Igor?

That looks like a good move. We still have to test both codepaths,
regardless of what the flag's default value is.

Thanks,
-Aleksey.


From vladimir.kozlov at oracle.com  Wed Apr 22 21:47:37 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 22 Apr 2015 14:47:37 -0700
Subject: [9] RFR(S): 8068945: Use RBP register as proper frame pointer
	in JIT compiled code on x86
In-Reply-To: <5537CBAE.9020500@oracle.com>
References: <55156A87.1070607@oracle.com>	<1427706703.1606.22.camel@mylittlepony.linaroharston>	<55196C2C.8080106@oracle.com>	<5519B1AE.8070901@oracle.com>	<5519BC6E.1090504@oracle.com>	<5519C29D.8080200@oracle.com>	<551BF4D3.90805@oracle.com>
	<5537CBAE.9020500@oracle.com>
Message-ID: <553816F9.1040104@oracle.com>

Looks like 2 issues left.

First, after the discussion on the mailing list, let's set the flag off by
default and ask SQE to add it to the rotation flags for Nightly testing
(after you push the changes).

Second, I am concerned about the vframeStreamForte() call change. Call stack
may have several Call stubs. They should be skipped to get all java 
frames on stack as we do in other places. Otherwise we can get incorrect 
profiling information.
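
To illustrate the concern, here is a toy model in Java (not HotSpot code).
All types and the stopAtFirstCallStub switch are hypothetical stand-ins; the
sketch only shows how a walk that ends at the first call stub reports fewer
Java frames than a walk that skips call stubs and continues.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: these types are hypothetical and do not exist in HotSpot.
class CallStubWalkSketch {
    enum Kind { JAVA, CALL_STUB, NATIVE }

    static final class Frame {
        final Kind kind;
        final String name;
        Frame(Kind kind, String name) { this.kind = kind; this.name = name; }
    }

    // Walks the stack from the youngest frame and collects Java frame names.
    // stopAtFirstCallStub is a hypothetical switch: true ends the walk at the
    // first call stub, false skips call stubs and keeps walking.
    static List<String> javaFrames(List<Frame> stack, boolean stopAtFirstCallStub) {
        List<String> names = new ArrayList<>();
        for (Frame f : stack) {
            if (f.kind == Kind.CALL_STUB && stopAtFirstCallStub) {
                break;
            }
            if (f.kind == Kind.JAVA) {
                names.add(f.name);
            }
        }
        return names;
    }

    // Frame names loosely follow the stack trace quoted further down in this message.
    static List<Frame> sampleStack() {
        return Arrays.asList(
            new Frame(Kind.JAVA, "java.net.URLClassLoader$1.run"),
            new Frame(Kind.CALL_STUB, "StubRoutines::call_stub"),
            new Frame(Kind.NATIVE, "JVM_DoPrivileged"),
            new Frame(Kind.JAVA, "java.security.AccessController.doPrivileged"),
            new Frame(Kind.JAVA, "java.net.URLClassLoader.findClass"));
    }

    public static void main(String[] args) {
        System.out.println(javaFrames(sampleStack(), true));
        System.out.println(javaFrames(sampleStack(), false));
    }
}
```

In this model the first call reports one Java frame and the second reports
three, which is the kind of truncated profile I would like to avoid.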

Do you mean "can NOT happen"?:
 > Moreover, if the stack is walked synchronously (e.g., at safepoints), no
 > problems appear either, because the synchronous interruption can happen
 > while execution is within the method handle intrinsic.

Vladimir

On 4/22/15 9:26 AM, Zoltán Majó wrote:
> Hi Vladimir,
>
>
> I managed to do some more work on this enhancement. Please see details
> below.
>
> On 04/01/2015 03:38 PM, Zoltán Majó wrote:
>> Hi Vladimir,
>>
>>
>> On 03/30/2015 11:39 PM, Vladimir Kozlov wrote:
>>> On 3/30/15 2:13 PM, Zoltán Majó wrote:
>>>> Hi Vladimir,
>>>>
>>>>
>>>> thank you for the feedback!
>>>>
>>>> On 03/30/2015 10:27 PM, Vladimir Kozlov wrote:
>>>>> How about PreserveFramePointer instead of simple FramePointer?
>>>>>
>>>>> PreserveFramePointer will mean that compiled (or other) code will use
>>>>> that register only as Frame pointer.
>>>>
>>>> I will change the flag's name to PreserveFramePointer and will also
>>>> update the description.
>
> I changed the flag's name to PreserveFramePointer, just as you suggested.
>
>>>>
>>>>> Zoltan, x86 flags setting should be in general globals_x86.hpp. You
>>>>> can #ifdef _LP64 there too. I don't understand why you only set it to
>>>>> true on linux-x64.
>>>>
>>>> I remembered that the original discussion with Brendan Gregg mentioned
>>>> only Linux's perf tool as a possible use case for "proper" frame
>>>> pointers. So I was unsure whether to enable proper frame pointers by
>>>> default on other x64 platforms as well.
>>>>
>>>> But if you think it would be better to have proper frame pointers on
>>>> all
>>>> x64 platforms, I will change the code to set PreserveFramePointer to
>>>> true for all x64 platforms. Just please let me know.
>
> The current webrev sets the PreserveFramePointer flag to true on all
> x86_64 platforms and to false on all other platforms.
>
>>>
>>> Currently compiled code for all x86 platforms is almost the same
>>> (win64 has difference in registers usage) and we should keep it that
>>> way.
>>>
>>> Also the original request was to have flag to enable such behavior
>>> (use RBP only as FP). So to have it off by default is acceptable. If
>>> performance group or someone find a regression (or bug) due to this
>>> change we can switch the flag off by default before jdk9 release.
>>>
>>> Try to run pstack on Solaris and jstack on OSX to make sure they
>>> report correct call stack with compiled java methods. And JFR.
>>> Also it would be nice to run SunStudio analyzer to verify that it works.
>>
>> I ran all tools you've suggested. JFR and jstack are unaffected, pstack
>> produces nice stack traces (it did not always do so before).
>
> I tested the current webrev with the following setup: I used two tests,
> one that generates a long chain of lambda form invocations and another
> one that generates a long chain of "regular" method invocations. Both
> tests were executed on an x64 machine in four configurations: with +/-
> Xcomp and with +/- PreserveFramePointer.
>
> Just as before, JFR and jstack stack traces are unaffected for both
> tests, pstack can now produce stack traces with both tests if
> PreserveFramePointer is enabled.
>
>> However, I've encountered a problem with SunStudio: Two asserts fail
>> in the fastdebug build. Both of them are "soft" failures, as neither the
>> VM nor SunStudio crash with the product build. I worked on the problem
>> today and have a partial understanding of the issue, but more
>> investigation is needed to have a patch that preserves the correct
>> behavior of SunStudio as well.
>
> I was able to track down the problems with SunStudio. I had to change
> the code at two places.
>
>
> Change #1 (in src/cpu/x86/vm/frame_x86.cpp):
>
> *** 222,232 ****
>        }
>
>        if (sender_blob->is_nmethod()) {
>            nmethod* nm = sender_blob->as_nmethod_or_null();
>            if (nm != NULL) {
> !             if (nm->is_deopt_mh_entry(sender_pc) ||
> nm->is_deopt_entry(sender_pc)) {
>                    return false;
>                }
>            }
>        }
>
> --- 222,233 ----
>        }
>
>        if (sender_blob->is_nmethod()) {
>            nmethod* nm = sender_blob->as_nmethod_or_null();
>            if (nm != NULL) {
> !             if (nm->is_deopt_mh_entry(sender_pc) ||
> nm->is_deopt_entry(sender_pc) ||
> !                 nm->method()->is_method_handle_intrinsic()) {
>                    return false;
>                }
>            }
>        }
>
> The reason for this change is the following. Method handle intrinsics
> (i.e., the intrinsics _invokeBasic, _linkToVirtual,_linkToStatic,
> _linkToSpecial, and _linkToInterface) do not allocate stack space when
> invoked, but they can extend the stack space of their caller "temporarily".
>
> For example, if VerifyMethodHandles is enabled, some stack space is used
> during verification. The temporarily used stack space is released before
> the intrinsic jumps to its target. As a result, the target of a method
> handle intrinsic will have a correct SP when it returns and the
> program's control flow is correct.
>
> Moreover, if the stack is walked synchronously (e.g., at safepoints), no
> problems appear either, because the synchronous interruption can happen
> while execution is within the method handle intrinsic.
>
> The problem is that the SunStudio analyzer can interrupt the VM
> asynchronously and walk the stack. If execution of a thread is
> interrupted while the thread is in a method handle intrinsic, the SP
> might contain an invalid value.
>
> The new webrev adds a check that marks the current frame unsafe for
> sender if the frame belongs to a method handle intrinsic
> (frame::safe_for_sender returns false in this case).
>
>
> Change #2 (in src/share/vm/prims/forte.cpp):
>
> *** 425,435 ****
>
>        RegisterMap map(thd, false);
>        initial_Java_frame = initial_Java_frame.sender(&map);
>      }
>
> !   vframeStreamForte st(thd, initial_Java_frame, false);
>
>      for (; !st.at_end() && count < depth; st.forte_next(), count++) {
>        bci = st.bci();
>        method = st.method();
>
> --- 425,435 ----
>
>        RegisterMap map(thd, false);
>        initial_Java_frame = initial_Java_frame.sender(&map);
>      }
>
> !   vframeStreamForte st(thd, initial_Java_frame, true);
>
>      for (; !st.at_end() && count < depth; st.forte_next(), count++) {
>        bci = st.bci();
>        method = st.method();
>
> The problem is that the following assert in forte.cpp on line 103
>
> assert(filled_in, "invariant");
>
> fails. The problem appears if we have a stack trace like:
>
> V  [libjvm.so+0x1c98c4a]  void VMError::report(outputStream*)+0xb1a
> V  [libjvm.so+0x1c9a3e8]  void VMError::report_and_die()+0x748
> V  [libjvm.so+0x1003c8e]  void report_vm_error(const char*,int,const
> char*,const char*)+0x7e
> V  [libjvm.so+0x10efa22]  vframeStreamForte::vframeStreamForte #Nvariant
> 1(JavaThread*,frame,bool)+0xe2
> --> (Frame #5) V  [libjvm.so+0x10f0bb9]  void
> forte_fill_call_trace_given_top(JavaThread*,ASGCT_CallTrace*,int,frame)+0x789
>
> V  [libjvm.so+0x10f1436]  AsyncGetCallTrace+0x246
> C  [libcollector.so+0x272a8]  __collector_ext_jstack_unwind+0xb8
> C  [libcollector.so+0x277df]  __collector_get_frame_info+0x27f
> C  [libcollector.so+0x2f093]  __collector_getUserCtx+0x13
> C  [libcollector.so+0x1abc7]  __collector_ext_profile_handler+0x127
> C  [libcollector.so+0x17535]  collector_sigprof_dispatcher+0x85
> C  [libc.so.1+0x122476]  __sighndlr+0x6
> C  [libc.so.1+0x115972]  call_user_handler+0x2ce
> C  [libc.so.1+0x115e1b]  sigacthandler+0xdb
> C  0xffffffffffffffff
> V  [libjvm.so+0x1959089]  void os::PlatformEvent::park()+0xd9
> V  [libjvm.so+0x18a6b34]  int ParkCommon(ParkEvent*,long)+0x34
> V  [libjvm.so+0x18a7657]  int Monitor::IWait(Thread*,long)+0xb7
> V  [libjvm.so+0x18a8b86]  bool Monitor::wait(bool,long,bool)+0x346
> V  [libjvm.so+0xf7073d]  void
> CompileBroker::wait_for_completion(CompileTask*)+0xad
> V  [libjvm.so+0xf6f6b6]  void
> CompileBroker::compile_method_base(methodHandle,int,int,methodHandle,int,const
> char*,Thread*)+0x406
> V  [libjvm.so+0xf6fd96]
> nmethod*CompileBroker::compile_method(methodHandle,int,int,methodHandle,int,const
> char*,Thread*)+0x586
> V  [libjvm.so+0xbbad72]  void
> AdvancedThresholdPolicy::submit_compile(methodHandle,int,CompLevel,JavaThread*)+0xb2
>
> V  [libjvm.so+0x1aef92f]  void
> SimpleThresholdPolicy::compile(methodHandle,int,CompLevel,JavaThread*)+0x14f
>
> V  [libjvm.so+0xbbb00f]  void
> AdvancedThresholdPolicy::method_invocation_event(methodHandle,methodHandle,CompLevel,nmethod*,JavaThread*)+0x1ff
>
> V  [libjvm.so+0x1aef765]
> nmethod*SimpleThresholdPolicy::event(methodHandle,methodHandle,int,int,CompLevel,nmethod*,JavaThread*)+0x2e5
>
> V  [libjvm.so+0xdb93bc]  unsigned
> char*Runtime1::counter_overflow(JavaThread*,int,Method*)+0x31c
> v  ~RuntimeStub::counter_overflow Runtime1 stub
> --> (Frame #29) J 143 C1
> java.net.URLClassLoader$1.run()Ljava/lang/Object; (5 bytes) @
> 0xffff80ffacc6962a [0xffff80ffacc69580+0xaa]
> --> (Frame #30) v  ~StubRoutines::call_stub
> V  [libjvm.so+0x13ca50b]  void
> JavaCalls::call_helper(JavaValue*,methodHandle*,JavaCallArguments*,Thread*)+0x41b
>
> V  [libjvm.so+0x152a111]  JVM_DoPrivileged+0xfb1
> C  [libjava.so+0x12f42]
> Java_java_security_AccessController_doPrivileged__Ljava_security_PrivilegedExceptionAction_2Ljava_security_AccessControlContext_2+0x12
>
> J 142
> java.security.AccessController.doPrivileged(Ljava/security/PrivilegedExceptionAction;Ljava/security/AccessControlContext;)Ljava/lang/Object;
> (0 bytes) @ 0xffff80ffb400c57c [0xffff80ffb400c420+0x15c]
> J 134 C1
> java.net.URLClassLoader.findClass(Ljava/lang/String;)Ljava/lang/Class;
> (47 bytes) @ 0xffff80ffacc66014 [0xffff80ffacc65ec0+0x154]
> ... more stack frames
>
> The forte_fill_call_trace_given_top() method (Frame #5) first checks if
> the first Java frame found is fully decipherable (line 395 in
> forte.cpp). In our case the first Java frame is Frame #29 (the
> C1-compiled version of java.net.URLClassLoader$1.run).
>
> In our case Frame #29 is not decipherable, because
> java.net.URLClassLoader$1.run has been made "not entrant" (a C2-compiled
> version of the same method has been produced shortly before).
>
> Afterwards, forte_fill_call_trace_given_top() checks if the method is
> "safe for sender" (line 424 in forte.cpp). The caller of the
> java.net.URLClassLoader$1.run method is ~StubRoutines::call_stub, which
> is considered "safe for sender" by the VM.
>
> Then, initial_Java_frame is set to the ~StubRoutines::call_stub stub
> (line 430). This does not seem to be correct because the stub is not a
> Java method and causes the assert(filled_in, "invariant") in the
> constructor of vframeStreamForte (line 103 in forte.cpp) to fail
> (because the frame cannot be filled from a stub).
>
> To avoid this failure, I propose to call the constructor of
> vframeStreamForte with parameter stop_at_java_call_stub set to true
> (instead of false) so that the VM stops walking the stack if a call stub
> has been reached.
>
>
> Here is the updated webrev:
>
> http://cr.openjdk.java.net/~zmajo/8068945/webrev.02/
>
> In addition to testing the changeset with the tools mentioned before, I
> executed
> - all JPRT tests, all pass;
> - all java/lang/invoke and compiler JTREG tests; all tests that pass
> with the unmodified source trace pass with the changes as well.
>
> Thank you very much in advance!
>
> Best regards,
>
>
> Zoltan
>
>>
>> So that will put this RFR on hold for a while, unfortunately.
>>
>> Thank you for the feedback and suggestions so far!
>>
>> Best regards,
>>
>>
>> Zoltan
>>
>>
>>>
>>> Thanks,
>>> Vladimir
>>>
>>>>
>>>> Thank you!
>>>>
>>>> Best regards,
>>>>
>>>>
>>>> Zoltan
>>>>
>>>>>
>>>>> Thanks,
>>>>> Vladimir
>>>>>
>>>>> On 3/30/15 8:30 AM, Zoltán Majó wrote:
>>>>>> Hi Ed,
>>>>>>
>>>>>>
>>>>>> thank you for your feedback! Please see comments below.
>>>>>>
>>>>>> On 03/30/2015 11:11 AM, Edward Nevill wrote:
>>>>>>> Hi Zoltán,
>>>>>>>
>>>>>>> On Fri, 2015-03-27 at 15:34 +0100, Zoltán Majó wrote:
>>>>>>>> Full JPRT run, all tests pass. I also ran all hotspot compiler
>>>>>>>> tests and
>>>>>>>> the jdk tests in java/lang/invoke on both x86_64 and x86_32. All
>>>>>>>> tests
>>>>>>>> that pass without the patch pass also with the patch.
>>>>>>>>
>>>>>>>> I ran the SPEC JVM 2008 benchmarks on our performance
>>>>>>>> infrastructure for
>>>>>>>> x86_64. The performance evaluation suggests that there is no
>>>>>>>> statistically significant performance degradation due to having
>>>>>>>> proper
>>>>>>>> frame pointers. Therefore I propose to have OmitFramePointer set to
>>>>>>>> false by default on x86_64 (and set to true on all other
>>>>>>>> platforms).
>>>>>>> This patch looks good, however I think there is a problem with the
>>>>>>> logic of OmitFramePointer.
>>>>>>>
>>>>>>> Here is my test case.
>>>>>>>
>>>>>>> --- CUT HERE ---
>>>>>>> // $Id: fibo.java,v 1.2 2000/12/24 19:10:50 doug Exp $
>>>>>>> // http://www.bagley.org/~doug/shootout/
>>>>>>>
>>>>>>> public class fibo {
>>>>>>>      public static void main(String args[]) {
>>>>>>>     int N = Integer.parseInt(args[0]);
>>>>>>>     System.out.println(fib(N));
>>>>>>>      }
>>>>>>>      public static int fib(int n) {
>>>>>>>     if (n < 2) return(1);
>>>>>>>     return( fib(n-2) + fib(n-1) );
>>>>>>>      }
>>>>>>> }
>>>>>>> --- CUT HERE ---
>>>>>>>
>>>>>>> If I run it as follows on my x86 64 bit linux.
>>>>>>>
>>>>>>> /work/images/jdk/bin/java -XX:-TieredCompilation
>>>>>>> -XX:+PrintCompilation
>>>>>>> -XX:CompileOnly=fibo::fib -XX:+UnlockDiagnosticVMOptions
>>>>>>> -XX:-OmitFramePointer -XX:+PrintAssembly fibo 43
>>>>>>>
>>>>>>> I get
>>>>>>>
>>>>>>>    # {method} {0x00007fc62c97f388} 'fib' '(I)I' in 'fibo'
>>>>>>>    # parm0:    rsi       = int
>>>>>>>    #           [sp+0x30]  (sp of caller)
>>>>>>>    0x00007fc625071100: mov    %eax,-0x14000(%rsp)
>>>>>>>    0x00007fc625071107: push   %rbp
>>>>>>>    0x00007fc625071108: mov    %rsp,%rbp
>>>>>>>    0x00007f836907110b: sub    $0x20,%rsp ;*synchronization entry
>>>>>>>
>>>>>>> which is correct, it is NOT(-) OmitFramePointer, therefore it is
>>>>>>> using
>>>>>>> the frame pointer
>>>>>>>
>>>>>>> Now if I try just changing -XX:-OmitFramePointer to
>>>>>>> -XX:+OmitFramePointer in the above I get
>>>>>>>
>>>>>>> /work/images/jdk/bin/java -XX:-TieredCompilation
>>>>>>> -XX:+PrintCompilation
>>>>>>> -XX:CompileOnly=fibo::fib -XX:+UnlockDiagnosticVMOptions
>>>>>>> -XX:+OmitFramePointer -XX:+PrintAssembly fibo 43
>>>>>>>
>>>>>>> I get
>>>>>>>
>>>>>>>    # {method} {0x00007f14d3c00388} 'fib' '(I)I' in 'fibo'
>>>>>>>    # parm0:    rsi       = int
>>>>>>>    #           [sp+0x30]  (sp of caller)
>>>>>>>    0x00007f14e1071100: mov    %eax,-0x14000(%rsp)
>>>>>>>    0x00007f14e1071107: push   %rbp
>>>>>>>    0x00007f14e1071108: sub    $0x20,%rsp ;*synchronization entry
>>>>>>>
>>>>>>> which is correct, it is ID(+) OmitFramePointer, therefore it does
>>>>>>> not
>>>>>>> use a frame pointer.
>>>>>>>
>>>>>>> However, if I now delete the -XX:+/-OmitFramePointer altogether, IE
>>>>>>>
>>>>>>> /work/images/jdk/bin/java -XX:-TieredCompilation
>>>>>>> -XX:+PrintCompilation
>>>>>>> -XX:CompileOnly=fibo::fib -XX:+UnlockDiagnosticVMOptions
>>>>>>> -XX:+PrintAssembly fibo 43
>>>>>>>
>>>>>>> I get
>>>>>>>
>>>>>>>    # {method} {0x00007f0c4b730388} 'fib' '(I)I' in 'fibo'
>>>>>>>    # parm0:    rsi       = int
>>>>>>>    #           [sp+0x30]  (sp of caller)
>>>>>>>    0x00007f0c75071100: mov    %eax,-0x14000(%rsp)
>>>>>>>    0x00007f0c75071107: push   %rbp
>>>>>>>    0x00007f0c75071108: sub    $0x20,%rsp ;*synchronization entry
>>>>>>>
>>>>>>> It is not using a frame pointer which is the equivalent of
>>>>>>> -XX:+OmitFramePointer. However in your description above you say
>>>>>>>
>>>>>>>> Therefore I propose to have OmitFramePointer set to false by
>>>>>>>> default
>>>>>>>> on x86_64 (and set to true on all other platforms).
>>>>>>> whereas OmitFramePointer actually seems to be set to true on x86_64
>>>>>>>
>>>>>>> I think the problem may be with the declaration and definition of
>>>>>>> OmitFramePointer in globals.hpp and globals_x86.hpp
>>>>>>>
>>>>>>> In globals.hpp it does
>>>>>>>
>>>>>>> product(bool, OmitFramePointer, true,
>>>>>>>
>>>>>>> In globals_x86.hpp it does
>>>>>>>
>>>>>>> LP64_ONLY(define_pd_global(bool, OmitFramePointer, false););
>>>>>>>
>>>>>>> I am not sure that you can mix product(...) and product_pd(...) like
>>>>>>> this, so I think it just ends up getting the default from the
>>>>>>> product(...).
>>>>>>
>>>>>> You are right, mixing product and product_pd does not make sense
>>>>>> at all.
>>>>>> Thank you for doing additional testing and for drawing attention
>>>>>> to the
>>>>>> problem.
>>>>>>
>>>>>> I updated the code to use product_pd and define_pd_global on all
>>>>>> relevant platforms.
>>>>>>
>>>>>>> Aside: In general, I do not like options which include a negative in
>>>>>>> them because I have to do a double think when I see something like,
>>>>>>> -XX:-OmitFramePointer, as in, it is omitting the frame pointer,
>>>>>>> therefore it is using a frame pointer. How about FramePointer so we
>>>>>>> have -XX:+FramePointer to say I want frame pointers and
>>>>>>> -XX:-FramePointer to say I don't.
>>>>>>
>>>>>> That is a good idea. Double negation is an unnecessary
>>>>>> complication, so
>>>>>> I changed the name of the flag to FramePointer, just as you
>>>>>> suggested.
>>>>>>
>>>>>>>
>>>>>>> I did some timing on the above 'fibo' test
>>>>>>>
>>>>>>> [ed at mylittlepony java]$ time /work/images/jdk/bin/java
>>>>>>> -XX:-OmitFramePointer fibo 43
>>>>>>> 701408733
>>>>>>>
>>>>>>> real    0m1.545s
>>>>>>> user    0m1.571s
>>>>>>> sys    0m0.015s
>>>>>>> [ed at mylittlepony java]$ time /work/images/jdk/bin/java
>>>>>>> -XX:+OmitFramePointer fibo 43
>>>>>>> 701408733
>>>>>>>
>>>>>>> real    0m1.504s
>>>>>>> user    0m1.527s
>>>>>>> sys    0m0.019s
>>>>>>>
>>>>>>> which is ~3% difference on this test case. On aarch64, I see ~7%
>>>>>>> difference on this test case.
>>>>>>
>>>>>> Thank you for the performance measurements!
>>>>>>
>>>>>>> With the above change to fix the logic of OmitFramePointer (and
>>>>>>> possibly a change of its name) the patch looks good to me.
>>>>>>
>>>>>> Here is the updated webrev (the same webrev that was already included
>>>>>> into my reply to Roland):
>>>>>>
>>>>>> http://cr.openjdk.java.net/~zmajo/8068945/webrev.01/
>>>>>>
>>>>>>> I will prepare a mirror patch for aarch64.
>>>>>>
>>>>>> That would be great!
>>>>>>
>>>>>> Thank you and best regards,
>>>>>>
>>>>>>
>>>>>> Zoltán
>>>>>>
>>>>>>>
>>>>>>> All the best,
>>>>>>> Ed.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>
>

From zoltan.majo at oracle.com  Thu Apr 23 07:29:28 2015
From: zoltan.majo at oracle.com (=?windows-1252?Q?Zolt=E1n_Maj=F3?=)
Date: Thu, 23 Apr 2015 09:29:28 +0200
Subject: [9] RFR(S): 8068945: Use RBP register as proper frame pointer
	in JIT compiled code on x86
In-Reply-To: <55380B99.6060702@oracle.com>
References: <55156A87.1070607@oracle.com>	<1427706703.1606.22.camel@mylittlepony.linaroharston>	<55196C2C.8080106@oracle.com>	<5519B1AE.8070901@oracle.com>	<5519BC6E.1090504@oracle.com>	<5519C29D.8080200@oracle.com>
	<55380B99.6060702@oracle.com>
Message-ID: <55389F58.3070904@oracle.com>

Hi Aleksey,


On 04/22/2015 10:59 PM, Aleksey Shipilev wrote:
> On 03/31/2015 12:39 AM, Vladimir Kozlov wrote:
>> Also the original request was to have flag to enable such behavior (use
>> RBP only as FP). So to have it off by default is acceptable. If
>> performance group or someone find a regression (or bug) due to this
>> change we can switch the flag off by default before jdk9 release.
> I'll save you a round-trip!

thank you for the feedback!

> Given:
>   a) there is a way enabling PreserveFramePointer can lead to performance
> regression -- under high register pressure;
>   b) there is anecdotal evidence that +PreserveFramePointer *does* lead to
> performance regression (see Zoltan's fib benchmarks);

Just for the record: Benchmarking with Fibonacci on x86_64 was done by 
Ed Nevill and he has seen a 3% performance difference. I ran the SPEC 
JVM 2008 benchmarks for x86_64 and have not seen any statistically 
significant performance degradation.
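
For reference, the ~3% figure can be rechecked from the timings Ed posted earlier in this thread (real 1.545s vs 1.504s, user 1.571s vs 1.527s). The sketch below is only an arithmetic check; the class and method names are made up for illustration. It gives roughly 2.7% for wall-clock time and 2.9% for user time, from a single run of each configuration, so it says nothing about statistical significance.

```java
// Illustrative arithmetic check only; names are hypothetical.
public class FiboTimingSketch {
    // Relative slowdown of 'slower' versus 'faster', in percent.
    static double slowdownPercent(double slower, double faster) {
        return (slower - faster) / faster * 100.0;
    }

    public static void main(String[] args) {
        // Times in seconds from the thread: the first value is the run with
        // -XX:-OmitFramePointer (frame pointer kept), the second the run
        // with -XX:+OmitFramePointer.
        System.out.println("real: " + slowdownPercent(1.545, 1.504));
        System.out.println("user: " + slowdownPercent(1.571, 1.527));
    }
}
```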

A thorough performance evaluation probably would not hurt.

Thank you and best regards,


Zoltán

>   c) frame pointer is required for a specific use case (stack walking by
> native profilers), and Java profilers are able to walk through
> Java-specific frames anyway (Forte/AsyncGetCallTrace).
>
> ...I would say this feature should be opt-in, not opt-out.
>
> If you want it to be opt-out, then please work with someone from
> performance team to establish if enabling this by default really is safe.
>
> Thanks,
> -Aleksey.
>


From evgeniya.stepanova at oracle.com  Thu Apr 23 09:57:24 2015
From: evgeniya.stepanova at oracle.com (Evgeniya Stepanova)
Date: Thu, 23 Apr 2015 12:57:24 +0300
Subject: [8u60] RFR(s): 8038098: [TESTBUG] remove explicit set build flavor
	from hotspot/test/compiler/* tests
In-Reply-To: <5537B9D4.5080604@oracle.com>
References: <5527D7C1.9050704@oracle.com>
	<552B8AC5.6040404@oracle.com>	<552B8C3C.2080403@oracle.com>
	<553785E9.6070700@oracle.com> <5537B9D4.5080604@oracle.com>
Message-ID: <5538C204.9000301@oracle.com>

Hi Vladimir,
Thanks for the review!
Jane
On 22.04.2015 18:10, Vladimir Kozlov wrote:
> Reviewed. Good.
>
> Thanks,
> Vladimir
>
> On 4/22/15 4:28 AM, Evgeniya Stepanova wrote:
>> Hi!
>> I still need a reviewer's approval for this.
>> Please take a look
>>
>> Thanks,
>> Jane
>> On 13.04.2015 12:28, Evgeniya Stepanova wrote:
>>> Hi Igor,
>>>
>>> Thank you for the review!
>>>
>>> Jane
>>> On 13.04.2015 12:22, Igor Ignatyev wrote:
>>>> Evgeniya,
>>>>
>>>> looks good to me.
>>>>
>>>> Igor
>>>>
>>>> On 04/10/2015 05:01 PM, Evgeniya Stepanova wrote:
>>>>> Hi,
>>>>>
>>>>> Could you please review the back-port of 8038098 to the 8udev repo?
>>>>> The diff applies cleanly to all tests except
>>>>> test/compiler/IntegerArithmetic/TestIntegerComparison.java, which
>>>>> does not exist in the 8u60 repo.
>>>>> After the fix, the tests pass with 8u60 b09 with the client VM.
>>>>>
>>>>> webrev for 8u60:
>>>>> http://cr.openjdk.java.net/~eistepan/8038098/8u60/webrev.00/
>>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8038098
>>>>>
>>>>> Original webrev:
>>>>> http://cr.openjdk.java.net/~iignatyev/eistepan/8038098/webrev.02/
>>>>> mail thread for 9:
>>>>> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2014-September/015540.html 
>>>>>
>>>>> Original change:
>>>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/662499384b32
>>>>> http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/662499384b32
>>>>>
>>>>> Thanks,
>>>>> Jane
>>>>> -- 
>>>>> /Evgeniya Stepanova/
>>>
>>> -- 
>>> /Evgeniya Stepanova/
>>
>> -- 
>> /Evgeniya Stepanova/

-- 
/Evgeniya Stepanova/

From vitalyd at gmail.com  Thu Apr 23 15:21:16 2015
From: vitalyd at gmail.com (Vitaly Davidovich)
Date: Thu, 23 Apr 2015 11:21:16 -0400
Subject: TrustFinalNonStaticFields clarification
In-Reply-To: <CAHjP37FNeV+heJ6Rze6kK-stP7UtUZiN_yCsdsjTwQzBHRaQcg@mail.gmail.com>
References: <CAHjP37GF11yoyKWHA=foDp+OO7X_ZxaSa5kUg5n=N2u-TVmr+w@mail.gmail.com>
	<CAHjP37FNeV+heJ6Rze6kK-stP7UtUZiN_yCsdsjTwQzBHRaQcg@mail.gmail.com>
Message-ID: <CAHjP37E3uShXueSFaRoeTp4i2-Bj_A1qNoTZ2zWsQqUk1PHv7g@mail.gmail.com>

Ok last try to get some insight :).

sent from my phone
On Apr 21, 2015 2:02 PM, "Vitaly Davidovich" <vitalyd at gmail.com> wrote:

> Anyone? :)
>
> In my brief experimentation on 7u60, only thing I noticed is
> Enum.ordinal() replaced with constant in compiled method.  I couldn't,
> however, get it to constant propagate array length, eliminate null receiver
> check, etc.
>
> sent from my phone
> On Apr 20, 2015 10:58 AM, "Vitaly Davidovich" <vitalyd at gmail.com> wrote:
>
>> Fixed the flag name in the subject.
>>
>> On Fri, Apr 17, 2015 at 9:07 PM, Vitaly Davidovich <vitalyd at gmail.com>
>> wrote:
>>
>>> Hi guys,
>>>
>>> I'm hoping someone could clarify/confirm my understanding of this
>>> experimental flag's effects:
>>>
>>> 1) final instance array length is constant propagated? Even if array is
>>> passed in as ctor arg rather than being instantiated in the ctor?
>>> 2) final instance fields seen as never null are forever considered as
>>> such? So even if a method call on that object is fully eliminated (e.g. the
>>> method is empty) no null check is left behind?
>>> 3) concrete runtime type of the instance field is propagated to uses and
>>> no additional type checks are done? Say the declared type is an
>>> interface/abstract with multiple implementations loaded but only one type
>>> stored in the field - is a type check eliminated and calls are fully
>>> devirtualized?
>>> 4) primitive type final fields have their value constant propagated if
>>> compiler sees only one value always stored?
>>> 5) do derived classes and base class share field profile or not? For
>>> example subclasses always store concrete type but each subclass stores a
>>> different type from the others.
>>>
>>> Also, there's been some talk about doing these optimizations
>>> automatically with invalidations builtin.  Just curious where that stands.
>>>
>>> Thanks
>>>
>>> sent from my phone
>>>
>>
>>

From roland.westrelin at oracle.com  Thu Apr 23 15:31:07 2015
From: roland.westrelin at oracle.com (Roland Westrelin)
Date: Thu, 23 Apr 2015 17:31:07 +0200
Subject: RFR(XS): 8078444: compiler/arraycopy/TestArrayCopyNoInitDeopt.java
	fails with exception 'm2 not deoptimized'
Message-ID: <C63D5DDA-C19A-4BA6-A026-3A811A331010@oracle.com>

http://cr.openjdk.java.net/~roland/8078444/webrev.00/

Some platforms don't support parameter/argument/return value profiling.

Roland.

From vladimir.kozlov at oracle.com  Thu Apr 23 15:42:05 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 23 Apr 2015 08:42:05 -0700
Subject: RFR(XS): 8078444: compiler/arraycopy/TestArrayCopyNoInitDeopt.java
	fails with exception 'm2 not deoptimized'
In-Reply-To: <C63D5DDA-C19A-4BA6-A026-3A811A331010@oracle.com>
References: <C63D5DDA-C19A-4BA6-A026-3A811A331010@oracle.com>
Message-ID: <553912CD.6010008@oracle.com>

Should you use '&' and check all 3 digits since value of TypeProfileLevel is encoded as 3 digits?

Thanks,
Vladimir

On 4/23/15 8:31 AM, Roland Westrelin wrote:
> http://cr.openjdk.java.net/~roland/8078444/webrev.00/
>
> Some platforms don't support parameter/argument/return value profiling.
>
> Roland.
>

From roland.westrelin at oracle.com  Thu Apr 23 15:43:45 2015
From: roland.westrelin at oracle.com (Roland Westrelin)
Date: Thu, 23 Apr 2015 17:43:45 +0200
Subject: RFR(XS): 8078444:
	compiler/arraycopy/TestArrayCopyNoInitDeopt.java fails with
	exception 'm2 not deoptimized'
In-Reply-To: <553912CD.6010008@oracle.com>
References: <C63D5DDA-C19A-4BA6-A026-3A811A331010@oracle.com>
	<553912CD.6010008@oracle.com>
Message-ID: <AA27978E-8A79-4AC1-BED4-00A77BFF2930@oracle.com>

Thanks for looking at this, Vladimir.

> Should you use '&' and check all 3 digits since value of TypeProfileLevel is encoded as 3 digits?

The test explicitly sets TypeProfileLevel=020 because that's all it needs. So I check that it gets what it wants.
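
To make the "3 digits" point concrete, here is a minimal sketch of how a TypeProfileLevel value splits into its digits. It assumes the usual layout (hundreds digit: parameters, tens digit: return values, ones digit: arguments; each digit 0=off, 1=jsr292 only, 2=all methods) and assumes the flag value 020 is read as decimal 20. The class and method names are hypothetical and are not HotSpot or test code.

```java
// Illustrative only: mirrors the digit layout of -XX:TypeProfileLevel=XYZ.
// Class and method names are hypothetical, not part of HotSpot.
public class TypeProfileLevelSketch {
    // Ones digit (Z): type profiling of arguments at calls.
    static int argumentsDigit(int level) { return level % 10; }

    // Tens digit (Y): type profiling of return values at calls.
    static int returnDigit(int level) { return (level % 100) / 10; }

    // Hundreds digit (X): type profiling of parameters to methods.
    static int parametersDigit(int level) { return level / 100; }

    public static void main(String[] args) {
        // Assumed decimal value of the test's "020". Do not write 020 in
        // Java source: a leading zero makes the literal octal (16).
        int level = 20;
        System.out.println("parameters=" + parametersDigit(level)
                + " return=" + returnDigit(level)
                + " arguments=" + argumentsDigit(level));
    }
}
```

Comparing the whole value against 20 therefore checks all three digits at once, which is sufficient when the test itself fixes the value.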

Roland.

From vladimir.kozlov at oracle.com  Thu Apr 23 16:08:01 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 23 Apr 2015 09:08:01 -0700
Subject: RFR(XS): 8078444: compiler/arraycopy/TestArrayCopyNoInitDeopt.java
	fails with exception 'm2 not deoptimized'
In-Reply-To: <AA27978E-8A79-4AC1-BED4-00A77BFF2930@oracle.com>
References: <C63D5DDA-C19A-4BA6-A026-3A811A331010@oracle.com>	<553912CD.6010008@oracle.com>
	<AA27978E-8A79-4AC1-BED4-00A77BFF2930@oracle.com>
Message-ID: <553918E1.5070403@oracle.com>

Yes, you are right. I did not see that the test has this flag.
Okay then. Fix is good.

Thanks,
Vladimir

On 4/23/15 8:43 AM, Roland Westrelin wrote:
> Thanks for looking at this, Vladimir.
>
>> Should you use '&' and check all 3 digits since value of TypeProfileLevel is encoded as 3 digits?
>
> The test explicitly sets TypeProfileLevel=020 because that's all it needs. So I check that it gets what it wants.
>
> Roland.
>

From roland.westrelin at oracle.com  Thu Apr 23 16:08:50 2015
From: roland.westrelin at oracle.com (Roland Westrelin)
Date: Thu, 23 Apr 2015 18:08:50 +0200
Subject: RFR(XS): 8078444:
	compiler/arraycopy/TestArrayCopyNoInitDeopt.java fails with
	exception 'm2 not deoptimized'
In-Reply-To: <553918E1.5070403@oracle.com>
References: <C63D5DDA-C19A-4BA6-A026-3A811A331010@oracle.com>
	<553912CD.6010008@oracle.com>
	<AA27978E-8A79-4AC1-BED4-00A77BFF2930@oracle.com>
	<553918E1.5070403@oracle.com>
Message-ID: <48D4087F-232D-4CFD-A34A-B09E1BF794AD@oracle.com>

> Yes, you are right. I did not see that the test has this flag.
> Okay then. Fix is good.

Thanks for the review.

Roland.

From vladimir.kozlov at oracle.com  Thu Apr 23 19:24:37 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 23 Apr 2015 12:24:37 -0700
Subject: RFR 8076276 support for AVX512
In-Reply-To: <55271100.8080203@oracle.com>
References: <C568518E7B433348B114B6A7122D474755DCE552@FMSMSX102.amr.corp.intel.com>	<55258337.2050605@oracle.com>	<C568518E7B433348B114B6A7122D474755DCEBDA@FMSMSX102.amr.corp.intel.com>	<55259078.1080309@oracle.com>	<C568518E7B433348B114B6A7122D474755DCED7C@FMSMSX102.amr.corp.intel.com>
	<55271100.8080203@oracle.com>
Message-ID: <553946F5.2090009@oracle.com>

Updated webrev:

http://cr.openjdk.java.net/~kvn/8076276/webrev.02

Passed JPRT testing.

Changes:

The assembler layer now also handles KNL for EVEX; KNL is a target that 
will be available earlier than Skylake server.  This is done by 
carefully managing cpuid information and applying each machine's 
characteristics to its code generation model.

I also added support for 32-bit compilation via the machine description, 
which manages many of the same things as in 64-bit, with some additions 
for instruction size calculations, such as a static function which 
answers the question of displacement size for memory offsets.  You will 
see two versions: one modifies the offset and answers the question of 
size range; the other takes as input the same object data as its dynamic 
counterpart and determines whether the displacement fits the motif.  One 
is made to be run statically and the other runs dynamically, as part of 
assembler processing in its allocated object.

There is also a dummy region in the 32-bit register description of 
floating point registers, which is used to stage regmask alignment for 
the xmm register bank on that target.  I do this so that I can use the 
same code for both compiler models with respect to register mask 
handling of vector components.

Please also note the new long Java tests in superword.  The 
aforementioned zmm save region for OS vector testing was ported to run 
in KNL mode.  The call save regions have been extended for both 
compilation models to handle their respective register banks and are 
working correctly.

Thanks,
Michael

On 4/9/15 4:53 PM, Vladimir Kozlov wrote:
> Michael,
>
> Thank you for the detailed explanation. I need to clarify my request:
>
> 1. I am fine with kmov and KRegister definitions and usage in assembler,
> macroassembler and stubs.
>
> 2. I don't want KRegister and Kmove in C2 code (opto/ and .ad files)
> until we have full support for them in RA and signal processing.
>
> Thanks,
> Vladimir
>
> On 4/9/15 4:02 PM, Berg, Michael C wrote:
>> Vladimir, some explanation of the EVEX encoding model is needed:
>>
>> Some instructions are agnostic to vector length and can take the
>> implicit k0 definition in encoding.  Some instructions must have
>> predication definitions for their mask application to SIMD, which
>> explicitly exclude k0. The range usage of predication mask registers
>> must be k1..k7 as a real definition which code must provide with a
>> mask value.  The EVEX enabled machine environment does not
>> automatically initialize any of the mask assignable registers
>> (k1..k7), so we must emit kmov instructions which gather an immediate
>> value from a gpr register.  You will see code such as this in the
>> review.  This effectively means KRegister must stay in the
>> implementation, but I can accommodate the lion's share of what you have
>> indicated.  The places where KRegister is used via the assembler layer
>> are:
>>
>> src/cpu/x86/vm/stubGenerator_x86_64.cpp: 265,
>> src/cpu/x86/vm/stubGenerator_x86_32.cpp: 169 "not there yet, but it
>> needs one too"
>> src/cpu/x86/vm/macroAssembler_x86.cpp: 4550, 7046
>>
>> This is in place of formal register allocation for now as well as when
>> we do more extravagant things with SIMD masks.  I will keep the webrev
>> around so I can easily add these pieces back in as we are going to
>> need them.
>> Also there are many other mask register instructions in the ISA which
>> we will need to make use of in the future.  If this is amenable I will
>> look into the other changes and resend the webrev accordingly modified.
>>
>> Thanks,
>> Michael
>>
>>
>> -----Original Message-----
>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>> Sent: Wednesday, April 08, 2015 1:33 PM
>> To: Berg, Michael C
>> Cc: hotspot-compiler-dev at openjdk.java.net
>> Subject: Re: RFR 8076276 support for AVX512
>>
>> Michael, please, make sure to include mailing lists in replies - it is
>> review process.
>>
>> I understand that K register may be important but I don't see the need
>> to include it in these changes, which are huge already. We can do it as
>> separate changes unless you point me to where they are critically needed
>> for avx512 instructions.
>> I don't see the use of it in the current changes, which simply widen
>> vectors to 512 bits.
>>
>> I am concerned that the K reg implementation is incomplete, but it is
>> hard to see and review it in the current changes.
>>
>> Regards,
>> Vladimir
>>
>> On 4/8/15 1:09 PM, Berg, Michael C wrote:
>>> Vladimir, RegK is needed as it frames the kmov instructions which
>>> utilize KRegister and the enumerated k registers, which are critically
>>> needed and used, although not yet matched (we use k1 and k0 now).  I
>>> will look into the rest of the comments.  The plan is to register
>>> allocate the k registers at some point though.
>>>
>>> Thanks,
>>> Michael
>>>
>>> -----Original Message-----
>>> From: hotspot-compiler-dev
>>> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of
>>> Vladimir Kozlov
>>> Sent: Wednesday, April 08, 2015 12:36 PM
>>> To: hotspot-compiler-dev at openjdk.java.net
>>> Subject: Re: RFR 8076276 support for AVX512
>>>
>>> I would suggest to remove MoveK and RegK from these changes since
>>> they are not used.
>>> We can add them later when you have the use case.
>>>
>>> sharedRuntime_x86_64.* You should have code and not comment:
>>> // TODO: add ZMM save code
>>>
>>> vm_version_x86.cpp Add code to verify that the system preserves Z
>>> registers during interrupts. See the code after the comment:
>>>
>>> // Some OSs have a bug when upper 128bits of YMM
>>>
>>>
>>> I see repeated next pattern in C1 code. It should be moved to a
>>> function in FrameMap:
>>>
>>> +        int num_caller_save_xmm_regs =
>>> +FrameMap::nof_caller_save_xmm_regs;
>>> +#if _LP64
>>> +        if (UseAVX < 3) {
>>> +          num_caller_save_xmm_regs = num_caller_save_xmm_regs / 2;
>>> +        }
>>> +#endif
>>>
>>>
>>> In general we should avoid using #ifdef X86 in shared code:
>>> matcher.cpp. This file will not be issue if you remove RegK from
>>> changes.
>>>
>>> c2compiler.cpp - can you move that code to
>>> Compile::pd_compiler2_init() which is platform specific?
>>>
>>> matcher.cpp - typo 'eno':
>>>
>>> +    // For VecZ we need eno alignment and 64 bytes (16 slots) for
>>> spills.
>>>
>>>
>>> Thanks,
>>> Vladimir
>>>
>>>
>>> On 4/6/15 6:35 PM, Berg, Michael C wrote:
>>>> Hi Folks,
>>>>
>>>> We (Intel) would like to contribute initial support for AVX512 (EVEX
>>>> encoding, new register support, new ISA support,
>>>> etc) for EVEX enabled microarchitectures.
>>>> The contribution is referenced as Bug ID 8076276 as a performance
>>>> enhancement.
>>>>
>>>> Please review this patch and comment as needed:
>>>>
>>>> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8076276
>>>>
>>>> webrev:
>>>> http://cr.openjdk.java.net/~kvn/8076276/webrev
>>>>
>>>> Superword optimizations covered on the vectorization path experience
>>>> as much as 50% reduction in loop trace instruction count which make
>>>> up the path length of EVEX encoded SIMD optimized loops.
>>>>
>>>> Vladimir Kozlov has offered to sponsor this patch.
>>>>

From john.r.rose at oracle.com  Fri Apr 24 00:08:26 2015
From: john.r.rose at oracle.com (John Rose)
Date: Thu, 23 Apr 2015 17:08:26 -0700
Subject: TrustFinalNonStaticFields clarification
In-Reply-To: <CAHjP37GF11yoyKWHA=foDp+OO7X_ZxaSa5kUg5n=N2u-TVmr+w@mail.gmail.com>
References: <CAHjP37GF11yoyKWHA=foDp+OO7X_ZxaSa5kUg5n=N2u-TVmr+w@mail.gmail.com>
Message-ID: <4F0A070B-B411-4A6E-A4A0-286A0073FB48@oracle.com>

On Apr 20, 2015, at 7:58 AM, Vitaly Davidovich <vitalyd at gmail.com> wrote:
> 
> Fixed the flag name in the subject.
> 
> On Fri, Apr 17, 2015 at 9:07 PM, Vitaly Davidovich <vitalyd at gmail.com <mailto:vitalyd at gmail.com>> wrote:
> Hi guys,
> 
> I'm hoping someone could clarify/confirm my understanding of this experimental flag's effects:
> 
> 1) final instance array length is constant propagated? Even if array is passed in as ctor arg rather than being instantiated in the ctor?
> 
If an array reference is a constant, its length is a constant, regardless of the experimental flag.

The same is true for any object's class, and for a variety of meta-data visible from methods on Class.

The value of a static final field is treated as a constant, regardless of the experimental flag.

If the experimental flag is true, a non-static final field is treated as a constant, *if* the containing object reference is a constant.

I.e., a getfield instruction can fold up if its input (the receiver) is constant, but only for final fields with the flag turned on.

The non-standard @Stable annotation also enables this same constant folding of getfield, but only if the field value is non-zero.

The non-standard @Stable annotation also enables some folding through array elements, again only if the element value is non-zero.
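
As a rough illustration of the "constant receiver" condition above, here is a sketch of the two code shapes. It is only a sketch: the names are hypothetical, and whether any load is actually folded depends on the VM build and on the (experimental) flag being enabled.

```java
// Illustrative shapes only; whether the JIT folds these loads depends on
// the VM version and flags. All names are hypothetical.
public class FinalFoldingSketch {
    static final class Config {
        final int[] table;   // non-static final field
        final int scale;     // non-static final field

        Config(int[] table, int scale) {
            this.table = table;
            this.scale = scale;
        }
    }

    // The value of a static final field is treated as a constant, so CONFIG
    // is a constant receiver for the getfield instructions below.
    static final Config CONFIG = new Config(new int[] {1, 2, 3, 4}, 10);

    // Constant receiver: with the experimental flag on, the loads of
    // CONFIG.table and CONFIG.scale are candidates for folding. If the
    // array reference becomes a constant, so does its length.
    static int viaConstantReceiver() {
        return CONFIG.table.length * CONFIG.scale;
    }

    // Non-constant receiver: c is an ordinary argument, so within this
    // method alone the final fields are loaded normally, flag or no flag.
    static int viaArgument(Config c) {
        return c.table.length * c.scale;
    }

    public static void main(String[] args) {
        System.out.println(viaConstantReceiver());
        System.out.println(viaArgument(new Config(new int[2], 3)));
    }
}
```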
> 2) final instance fields seen as never null are forever considered as such? So even if a method call on that object is fully eliminated (e.g. the method is empty) no null check is left behind?
> 
Non-nullness is logically independent from being a constant, and so would be tracked differently.  We don't track this presently.

If any value is a constant of course null checks on it will fold up.
> 3) concrete runtime type of the instance field is propagated to uses and no additional type checks are done? Say the declared type is an interface/abstract with multiple implementations loaded but only one type stored in the field - is a type check eliminated and calls are fully devirtualized?
> 
Concrete type is also logically independent from being a constant.  Nothing to do with the experimental flag.

We track concrete types (at invoke sites, not at-rest data) using the type profiling mechanism.  This is enough for most purposes.
> 4) primitive type final fields have their value constant propagated if compiler sees only one value always stored?
> 
We do little or no profiling of primitive value ranges, either at operation sites (cmp, add, xor, aaload, etc.) or at rest (fields, array elements).  But if a primitive value is constant, we can fold it at compile time.
> 5) do derived classes and base class share field profile or not? For example subclasses always store concrete type but each subclass stores a different type from the others.
> 
Strictly speaking, field profiles are both fully shared and fully unshared.  This is possible because they are all empty.  That is, we don't have field profiles at present.  (Or am I missing a point here?  Is there a project I have forgotten about??)

> Also, there's been some talk about doing these optimizations automatically with invalidations builtin.  Just curious where that stands.
> 
One thing that folks have been talking about for a long time is "effectively final" optimizations.  This requires some profiling of fields (at least, to detect multiple putfields).  A field is "effectively final" if it could have been written with a "final" keyword, at least with reference to all code paths that are live.  We might reserve the right to make a field stop being "effectively final", because we don't have an airtight analysis that proves the "one putfield before any use" invariant.  (Such analyses are hard to come by in the JVM, given various dynamisms and also reflection.)  In such cases, where an optimization can be invalidated, we track dependencies.

HTH

-- John


From michael.c.berg at intel.com  Fri Apr 24 00:53:09 2015
From: michael.c.berg at intel.com (Berg, Michael C)
Date: Fri, 24 Apr 2015 00:53:09 +0000
Subject: RFR 8078563 - add profitability tests for reductions
Message-ID: <C568518E7B433348B114B6A7122D474755DDE7A4@FMSMSX102.amr.corp.intel.com>

Hi Folks,

We (Intel) would like to add profitability tests to superword to gate scenarios where reduction optimization overhead is roughly equal to the benefit gained by vectorization.

We would like to do this for all x86 enabled microarchitectures that support reductions and superword.  This new constraint was tested on SSE and AVX (1,2) enabled platforms.
The contribution as referenced by RFR 8078563 is defined by the information at the links below.

Please review this bug entry and its code and comment as needed:

https://bugs.openjdk.java.net/browse/JDK-8078563


And its code and test addition (this is a small patch):


http://cr.openjdk.java.net/~kvn/8078563/webrev/


Vladimir Kozlov has offered to sponsor this patch.

Thanks,
Michael

From vitalyd at gmail.com  Fri Apr 24 01:51:47 2015
From: vitalyd at gmail.com (Vitaly Davidovich)
Date: Thu, 23 Apr 2015 21:51:47 -0400
Subject: TrustFinalNonStaticFields clarification
In-Reply-To: <4F0A070B-B411-4A6E-A4A0-286A0073FB48@oracle.com>
References: <CAHjP37GF11yoyKWHA=foDp+OO7X_ZxaSa5kUg5n=N2u-TVmr+w@mail.gmail.com>
	<4F0A070B-B411-4A6E-A4A0-286A0073FB48@oracle.com>
Message-ID: <CAHjP37FhXf7sNP4MP_bTFC5oAm5=x7pq3-S1m81SyD_098_v1A@mail.gmail.com>

Hi John,

Thanks for the reply.  Besides static final fields, what else is considered
constant? For example, my intuitive understanding of this flag implies that
the following:

final class C {
     private final Object[] arr = new Object[10];

     int capacity () { return arr.length; }

     Object getFirst () { return arr [0]; }

}

capacity() should constant fold to return 10 and then be constant propagated
into wherever it's inlined.  Instead, when looking at the generated asm, I see
the array length is loaded from the field.

getFirst () should not have range check but it does.

Both of these cases work as expected if I make arr static final.  So, how
do I make instance final behave in the above manner if not for this flag?
Note that I think the above should hold irrespective of whether the
enclosing instance is constant or not.

Likewise, here's another scenario.   Suppose I have a final field in an
object whose declared type is some abstract class.  I then have a "null
object" subclass (i.e. overriden methods are empty/nop).  At runtime, only
the null object subclass is loaded; CHA kicks in and effectively removes
the empty method invocation - great! Except, it leaves a pesky null check
behind.  But, if this field is final and set inside constructor only, why
still have the null check if I'm asking to trust instance final fields?
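
A minimal sketch of that scenario, with hypothetical names (Handler, NullHandler, Dispatcher). The requireNonNull call in the constructor is an added assumption, used here only to make "the field can never be null" explicit in the source:

```java
import java.util.Objects;

// Hypothetical abstract type; only the null-object subclass below is ever loaded.
abstract class Handler {
    abstract void handle(int value);
}

// "Null object": the overridden method is intentionally empty.
final class NullHandler extends Handler {
    @Override
    void handle(int value) { }
}

final class Dispatcher {
    // Final field, assigned only in the constructor.
    private final Handler handler;

    Dispatcher(Handler handler) {
        // Assumption for this sketch: reject null up front.
        this.handler = Objects.requireNonNull(handler);
    }

    int dispatch(int value) {
        // With a single loaded subclass, CHA can bind this call and the empty
        // body can disappear; the receiver null check is what stays behind.
        handler.handle(value);
        return value;
    }
}
```
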

Thanks

On Apr 23, 2015 8:08 PM, "John Rose" <john.r.rose at oracle.com> wrote:

> On Apr 20, 2015, at 7:58 AM, Vitaly Davidovich <vitalyd at gmail.com> wrote:
>
>
> Fixed the flag name in the subject.
>
> On Fri, Apr 17, 2015 at 9:07 PM, Vitaly Davidovich <vitalyd at gmail.com>
> wrote:
>
>> Hi guys,
>>
>> I'm hoping someone could clarify/confirm my understanding of this
>> experimental flag's effects:
>>
>> 1) final instance array length is constant propagated? Even if array is
>> passed in as ctor arg rather than being instantiated in the ctor?
>>
> If an array reference is a constant, its length is a constant, regardless
> of the experimental flag.
>
> The same is true for any object's class, and for a variety of meta-data
> visible from methods on Class.
>
> The value of a static final field is treated as a constant, regardless of
> the experimental flag.
>
> If the experimental flag is true, a non-static final field is treated as a
> constant, *if* the containing object reference is a constant.
>
> I.e., a getfield instruction can fold up if its input (the receiver) is
> constant, but only for final fields with the flag turned on.
>
> The non-standard @Stable annotation also enables this same constant
> folding of getfield, but only if the field value is non-zero.
>
> The non-standard @Stable annotation also enables some folding through
> array elements, again only if the element value is non-zero.
>
> 2) final instance fields seen as never null are forever considered as
>> such? So even if a method call on that object is fully eliminated (e.g. the
>> method is empty) no null check is left behind?
>>
> Non-nullness is logically independent from being a constant, and so would
> be tracked differently.  We don't track this presently.
>
> If any value is a constant of course null checks on it will fold up.
>
> 3) concrete runtime type of the instance field is propagated to uses and
>> no additional type checks are done? Say the declared type is an
>> interface/abstract with multiple implementations loaded but only one type
>> stored in the field - is a type check eliminated and calls are fully
>> devirtualized?
>>
> Concrete type is also logically independent from being a constant.
> Nothing to do with the experimental flag.
>
> We track concrete types (at invoke sites, not at-rest data) using the type
> profiling mechanism.  This is enough for most purposes.
>
> 4) primitive type final fields have their value constant propagated if
>> compiler sees only one value always stored?
>>
> We do little or no profiling of primitive value ranges, either at
> operation sites (cmp, add, xor, aaload, etc.) or at rest (fields, array
> elements).  But if a primitive value is constant, we can fold it at compile
> time.
>
> 5) do derived classes and base class share field profile or not? For
>> example subclasses always store concrete type but each subclass stores a
>> different type from the others.
>>
> Strictly speaking, field profiles are both fully shared and fully
> unshared.  This is possible because they are all empty.  That is, we don't
> have field profiles at present.  (Or am I missing a point here?  Is there a
> project I have forgotten about??)
>
> Also, there's been some talk about doing these optimizations automatically
>> with invalidations builtin.  Just curious where that stands.
>>
> One thing that folks have been talking about for a long time is
> "effectively final" optimizations.  This requires some profiling of fields
> (at least, to detect multiple putfields).  A field is "effectively final"
> if it could have been written with a "final" keyword, at least with
> reference to all code paths that are live.  We might reserve the right to
> make a field stop being "effectively final", because we don't have an
> airtight analysis that proves the "one putfield before any use" invariant.
>  (Such analyses are hard to come by in the JVM, given various dynamisms and
> also reflection.)  In such cases, where an optimization can be invalidated,
> we track dependencies.
>
> HTH
>
> -- John
>
>

From john.r.rose at oracle.com  Fri Apr 24 03:03:19 2015
From: john.r.rose at oracle.com (John Rose)
Date: Thu, 23 Apr 2015 20:03:19 -0700
Subject: TrustFinalNonStaticFields clarification
In-Reply-To: <CAHjP37FhXf7sNP4MP_bTFC5oAm5=x7pq3-S1m81SyD_098_v1A@mail.gmail.com>
References: <CAHjP37GF11yoyKWHA=foDp+OO7X_ZxaSa5kUg5n=N2u-TVmr+w@mail.gmail.com>
	<4F0A070B-B411-4A6E-A4A0-286A0073FB48@oracle.com>
	<CAHjP37FhXf7sNP4MP_bTFC5oAm5=x7pq3-S1m81SyD_098_v1A@mail.gmail.com>
Message-ID: <51C323F0-7A1A-4D83-AA8A-AF10C8F690B7@oracle.com>

On Apr 23, 2015, at 6:51 PM, Vitaly Davidovich <vitalyd at gmail.com> wrote:
> 
> if this field is final and set inside constructor only, why still have the null check if I'm asking to trust instance final fields?
> 

The answer to all of your questions is that we don't profile fields.

We also don't do class-wide analysis, since reflection can break the results of such analysis.

The JIT only trusts field values that it can observe directly, either by compiling the sets in the same compilation unit, or by folding a constant object reference.
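
A minimal sketch of the two situations named above, next to the case where neither applies. The names are hypothetical, and the comments describe what the JIT is in a position to observe, not guaranteed output:

```java
// Hypothetical names; Box mirrors the shape of the class C discussed earlier in the thread.
final class ObservedSetSketch {
    static final class Box {
        private final Object[] arr = new Object[10];
        int capacity() { return arr.length; }
    }

    // The allocation and the field store happen in this method. If the
    // constructor and capacity() are inlined, the store sits in the same
    // compilation unit as the load, so the length can be observed directly.
    static int freshCapacity() {
        return new Box().capacity();
    }

    // Constant object reference: folding the final instance field behind it
    // additionally needs -XX:+TrustFinalNonStaticFields.
    static final Box SHARED = new Box();
    static int sharedCapacity() {
        return SHARED.capacity();
    }

    // Arbitrary receiver: the JIT sees neither the store nor a constant
    // reference, so the array and its length are loaded at run time.
    static int anyCapacity(Box b) {
        return b.capacity();
    }
}
```
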

-- John

From vitalyd at gmail.com  Fri Apr 24 03:15:34 2015
From: vitalyd at gmail.com (Vitaly Davidovich)
Date: Thu, 23 Apr 2015 23:15:34 -0400
Subject: TrustFinalNonStaticFields clarification
In-Reply-To: <51C323F0-7A1A-4D83-AA8A-AF10C8F690B7@oracle.com>
References: <CAHjP37GF11yoyKWHA=foDp+OO7X_ZxaSa5kUg5n=N2u-TVmr+w@mail.gmail.com>
	<4F0A070B-B411-4A6E-A4A0-286A0073FB48@oracle.com>
	<CAHjP37FhXf7sNP4MP_bTFC5oAm5=x7pq3-S1m81SyD_098_v1A@mail.gmail.com>
	<51C323F0-7A1A-4D83-AA8A-AF10C8F690B7@oracle.com>
Message-ID: <CAHjP37E_XzSjzmfM2bRByiCdu4Di8_00c3jmfN60qLYJfA_0Lw@mail.gmail.com>

Ok, but in the array case what's there to profile? At compile or parse time
isn't the array length known since it's set internally in the class?
Perhaps you're still calling this profiling, but to me it sounds like
keeping simple metadata about the fields.  The only thing that could break
this is reflection, but I naively thought enabling the flag would make
compiler more confident (I see now that this isn't the case).  The other
case here is tracking whether final field is nullable or not; lots of
constructors do a null check and throw if they see a null.  It seems, on
the surface, fairly easy to detect that and then "stamp" the field as known
not null.

How difficult would it be to profile or keep some metadata about final
fields that can be gathered at parse time, taking reflection out of the
equation for a second? It seems like some additional optimization
opportunities are missed now as a result of this.

Thanks

On Apr 23, 2015 11:03 PM, "John Rose" <john.r.rose at oracle.com> wrote:

> On Apr 23, 2015, at 6:51 PM, Vitaly Davidovich <vitalyd at gmail.com> wrote:
>
>
> if this field is final and set inside constructor only, why still have the
> null check if I'm asking to trust instance final fields?
>
>
> The answer to all of your questions is that we don't profile fields.
>
> We also don't do class-wide analysis, since reflection can break the
> results of such analysis.
>
> The JIT only trusts field values that it can observe directly, either by
> compiling the sets in the same compilation unit, or by folding a constant
> object reference.
>
> -- John
>

From roland.westrelin at oracle.com  Fri Apr 24 08:03:26 2015
From: roland.westrelin at oracle.com (Roland Westrelin)
Date: Fri, 24 Apr 2015 10:03:26 +0200
Subject: RFR(S) 8077504: Unsafe load can loose control dependency and
	cause crash
In-Reply-To: <95F149F4-7BE2-45C8-A3C0-DEACFE42ACDF@oracle.com>
References: <42EDECBF-2347-406A-8835-29A98F1CA153@oracle.com>
	<552FC216.4010503@redhat.com>
	<95F149F4-7BE2-45C8-A3C0-DEACFE42ACDF@oracle.com>
Message-ID: <6AC5258A-A40C-4121-8B1F-0076713345D4@oracle.com>


> Vladimir suggested privately to set _depends_only_on_test to true in the constructor and then use an explicit call to a new method set_depends_only_on_test() to set it to false in the rare cases where it's needed. That feels better indeed. What do you think?

Actually, using a set_depends_only_on_test() method doesn't work well. In LibraryCallKit::inline_unsafe_access() the node returned by make_load() may have been transformed already and we could call set_depends_only_on_test() on a node that doesn't need to be pinned. The call to set_depends_only_on_test() would have to be in LoadNode::make(). I went with default parameters instead to keep the change small:

http://cr.openjdk.java.net/~roland/8077504/webrev.01/

Roland.

From zoltan.majo at oracle.com  Fri Apr 24 11:49:28 2015
From: zoltan.majo at oracle.com (=?windows-1252?Q?Zolt=E1n_Maj=F3?=)
Date: Fri, 24 Apr 2015 13:49:28 +0200
Subject: [9] RFR(S): 8068945: Use RBP register as proper frame pointer
	in JIT compiled code on x86
In-Reply-To: <553816F9.1040104@oracle.com>
References: <55156A87.1070607@oracle.com>	<1427706703.1606.22.camel@mylittlepony.linaroharston>	<55196C2C.8080106@oracle.com>	<5519B1AE.8070901@oracle.com>	<5519BC6E.1090504@oracle.com>	<5519C29D.8080200@oracle.com>	<551BF4D3.90805@oracle.com>	<5537CBAE.9020500@oracle.com>
	<553816F9.1040104@oracle.com>
Message-ID: <553A2DC8.3080804@oracle.com>

Hi Vladimir,


thank you for the feedback!

On 04/22/2015 11:47 PM, Vladimir Kozlov wrote:
> Looks like 2 issues left.
>
> First, after discussion on the mailing list, let's set the flag off by
> default and ask SQE to add it to rotation flags for Nightly testing 
> (after you push changes).

OK, I set the flag to off on all architectures and will notify SQE about 
it once the change has been pushed.

> Second, I am concerned about the vframeStreamForte() call change. Call stack
> may have several Call stubs. They should be skipped to get all java 
> frames on stack as we do in other places. Otherwise we can get 
> incorrect profiling information.

I changed vframeStreamForte back to the way it was (to *not* stop at
call stubs and return all Java frames). But I removed the if-block for
the case where the first Java frame found is not decipherable (lines
407--427 in forte.cpp). Here are the reasons for that.

The forte_fill_call_trace_given_top() function returns a list of methods 
to the profiler (the list of Java methods that have been found on the 
stack). For every method, two pieces of information are returned (in an 
ASGCT_CallFrame structure):

- (1) the jmethodID corresponding to that method and
- (2) debug information (the line number where execution was in the 
method when the profiler interrupted the VM).

(1) The jmethodID is *critical* for forte_fill_call_trace_given_top() 
because it provides information about the method that a frame belongs
to. If the jmethodID is not available (because no Method* was obtained 
by looking at that frame or the Method* found is invalid), the VM stops 
walking the stack and returns right away.

(2) The debug information is *not critical*: If there is no BCI 
corresponding to a PC stored within a frame, the line number in
ASGCT_CallFrame is set to -1 for interpreted/compiled frames and to -3 
for native frames (as described on lines 512--518 in forte.cpp) and the 
stack is walked further.
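
The policy described in (1) and (2) can be sketched as follows. This is a hypothetical Java model and not HotSpot code: Frame, Entry, and fill are invented names, a methodId of 0 stands for "no valid jmethodID", and a negative bci stands for "no BCI for this PC".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the stack-walk policy; not the actual forte.cpp logic.
final class CallTraceSketch {
    static final int NO_LINE_JAVA = -1;    // interpreted/compiled frame without a BCI
    static final int NO_LINE_NATIVE = -3;  // native frame without a BCI

    static final class Frame {
        final long methodId;     // 0 models a missing or invalid jmethodID
        final int bci;           // negative models "no BCI available"
        final boolean isNative;
        Frame(long methodId, int bci, boolean isNative) {
            this.methodId = methodId;
            this.bci = bci;
            this.isNative = isNative;
        }
    }

    static final class Entry {
        final long methodId;
        final int lineno;
        Entry(long methodId, int lineno) {
            this.methodId = methodId;
            this.lineno = lineno;
        }
    }

    static List<Entry> fill(List<Frame> stack, int depth) {
        List<Entry> trace = new ArrayList<>();
        for (Frame f : stack) {
            if (trace.size() >= depth) break;
            // (1) Critical: without a method id the walk stops right away.
            if (f.methodId == 0) break;
            // (2) Not critical: a missing BCI only yields a sentinel value.
            int lineno = f.bci >= 0
                    ? f.bci
                    : (f.isNative ? NO_LINE_NATIVE : NO_LINE_JAVA);
            trace.add(new Entry(f.methodId, lineno));
        }
        return trace;
    }
}
```

In this model a frame without a BCI still contributes an entry, while a frame without a method id truncates the trace at that point.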

The stack is walked using vframeStreamForte. The vframeStreamForte 
object is created from the first Java frame available on the stack.

In the original version of the code, if the first Java frame is not 
decipherable (i.e., there is no debug information available for it), the 
first Java frame is *handled separately* (lines 407--427 in forte.cpp) 
and the vframeStreamForte object is created from the *sender* of the 
first Java frame (see reasons below).

The original version of the code assumes that the sender of the first 
Java frame is also a Java frame. But that is not always the case, as the 
calling frame can be also something else, for example a call stub. Using 
a call stub as a Java frame can result in a failure.

The *reason for separately handling* the first Java frame if the frame 
is not decipherable is to avoid triggering the assert in 
found_bad_method_frame() (line 443 in vframe.hpp). The function 
found_bad_method_frame() can be called in two ways:

- vframeStreamForte::vframeStreamForte -> vframeStreamForte::fill_frame 
-> vframeStreamCommon::fill_from_interpreter_frame -> 
found_bad_method_frame()
- vframeStreamForte::vframeStreamForte -> vframeStreamForte::fill_frame 
-> vframeStreamCommon::fill_from_compiled_frame -> found_bad_method_frame()

In a non-product build, the assert fails if no debug information is 
available for an interpreted/compiled frame. In a product build, 
however, a Method* is returned also for interpreted/compiled frames with 
no debug information and filling the stack trace continues.

I think the approach in product builds is better, because critical 
information (the jmethodID of all stack frames) is returned to
SunStudio even if non-critical debug information is not available for
some frames in the trace. The fastdebug build, however, crashes, if it 
encounters a single frame with no debug information.

Here is the newest webrev: 
http://cr.openjdk.java.net/~zmajo/8068945/webrev.03/

Files updated relative to the previous webrev (webrev.02):
- forte.cpp
- globals_x86.hpp

Testing:
- JPRT: all tests pass; tested with PreserveFramePointer enabled on x64
- Repeated all SunStudio experiments described for webrev.02 (two tests 
w/ and w/o lambda forms, executed +/-PreserveFramePointer, w/ and w/o -Xcomp)
- all java/lang/invoke and compiler tests: all tests that pass with the 
unmodified source tree pass with the changes as well.

I ran the experiments *with the assert enabled* in 
vframeStreamCommon::found_bad_method_frame(). The assert was not triggered.

I would be, however, inclined to disable that assert because:
- (1) it is better to have some information in stack traces than no 
information and a crash (even though the crash happens in a fastdebug 
build);
- (2) the assert is there only for SunStudio (according to the comments 
on lines 439--442 in vframe.cpp) and AsyncGetCallTrace (the only place 
where forte_fill_call_trace_given_top() is called) does not expect debug 
information to be available for all frames; it just needs the jmethodID for
every frame.

> Do you mean "can NOT happen"?:
> > Moreover, if the stack is walked synchronously (e.g., at 
> safepoints), no
> > problems appear either, because the synchronous interruption can happen
> > while execution is within the method handle intrinsic.

Yes, I meant that it can *not* happen. Sorry for the confusion.

Thank you and best regards,


Zoltan

>
> Vladimir
>
> On 4/22/15 9:26 AM, Zoltán Majó wrote:
>> Hi Vladimir,
>>
>>
>> I managed to do some more work on this enhancement. Please see details
>> below.
>>
>> On 04/01/2015 03:38 PM, Zoltán Majó wrote:
>>> Hi Vladimir,
>>>
>>>
>>> On 03/30/2015 11:39 PM, Vladimir Kozlov wrote:
>>>> On 3/30/15 2:13 PM, Zoltán Majó wrote:
>>>>> Hi Vladimir,
>>>>>
>>>>>
>>>>> thank you for the feedback!
>>>>>
>>>>> On 03/30/2015 10:27 PM, Vladimir Kozlov wrote:
>>>>>> How about PreserveFramePointer instead of simple FramePointer?
>>>>>>
>>>>>> PreserveFramePointer will mean that compiled (or other) code will 
>>>>>> use
>>>>>> that register only as Frame pointer.
>>>>>
>>>>> I will change the flag's name to PreserveFramePointer and will also
>>>>> update the description.
>>
>> I changed the flag's name to PreserveFramePointer, just as you 
>> suggested.
>>
>>>>>
>>>>>> Zoltan, x86 flags setting should be in general globals_x86.hpp. You
>>>>>> can #ifdef _LP64 there too. I don't understand why you only set 
>>>>>> it to
>>>>>> true on linux-x64.
>>>>>
>>>>> I remembered that the original discussion with Brendan Gregg 
>>>>> mentioned
>>>>> only Linux's perf tool as a possible use case for "proper" frame
>>>>> pointers. So I was unsure whether to enable proper frame pointers by
>>>>> default on other x64 platforms as well.
>>>>>
>>>>> But if you think it would be better to have proper frame pointers on
>>>>> all
>>>>> x64 platforms, I will change the code to set PreserveFramePointer to
>>>>> true for all x64 platforms. Just please let me know.
>>
>> The current webrev sets the PreserveFramePointer flag to true on all
>> x86_64 platforms and to false on all other platforms.
>>
>>>>
>>>> Currently compiled code for all x86 platforms is almost the same
>>>> (win64 has difference in registers usage) and we should keep it that
>>>> way.
>>>>
>>>> Also the original request was to have flag to enable such behavior
>>>> (use RBP only as FP). So to have it off by default is acceptable. If
>>>> performance group or someone find a regression (or bug) due to this
>>>> change we can switch the flag off by default before jdk9 release.
>>>>
>>>> Try to run pstack on Solaris and jstack on OSX to make sure they
>>>> report correct call stack with compiled java methods. And JFR.
>>>> Also it would be nice to run SunStudio analyzer to verify that it 
>>>> works.
>>>
>>> I ran all tools you've suggested. JFR and jstack are unaffected, pstack
>>> produces nice stack traces (it did not always do so before).
>>
>> I tested the current webrev with the following setup: I used two tests,
>> one that generates a long chain of lambda form invocations and another
>> one that generates a long chain of "regular" method invocations. Both
>> tests were executed on an x64 machine in four configurations: with +/-
>> Xcomp and with +/- PreserveFramePointer.
>>
>> Just as before, JFR and jstack stack traces are unaffected for both
>> tests, pstack can now produce stack traces with both tests if
>> PreserveFramePointer is enabled.
>>
>>> However, I've encountered a problem with SunStudio: Two asserts fail
>>> in the fastdebug build. Both of them are "soft" failures, as neither the
>>> VM nor SunStudio crash with the product build. I worked on the problem
>>> today and have a partial understanding of the issue, but more
>>> investigation is needed to have a patch that preserves the correct
>>> behavior of SunStudio as well.
>>
>> I was able to track down the problems with SunStudio. I had to change
>> the code at two places.
>>
>>
>> Change #1 (in src/cpu/x86/vm/frame_x86.cpp):
>>
>> *** 222,232 ****
>>        }
>>
>>        if (sender_blob->is_nmethod()) {
>>            nmethod* nm = sender_blob->as_nmethod_or_null();
>>            if (nm != NULL) {
>> !             if (nm->is_deopt_mh_entry(sender_pc) ||
>> nm->is_deopt_entry(sender_pc)) {
>>                    return false;
>>                }
>>            }
>>        }
>>
>> --- 222,233 ----
>>        }
>>
>>        if (sender_blob->is_nmethod()) {
>>            nmethod* nm = sender_blob->as_nmethod_or_null();
>>            if (nm != NULL) {
>> !             if (nm->is_deopt_mh_entry(sender_pc) ||
>> nm->is_deopt_entry(sender_pc) ||
>> ! nm->method()->is_method_handle_intrinsic()) {
>>                    return false;
>>                }
>>            }
>>        }
>>
>> The reason for this change is the following. Method handle intrinsics
>> (i.e., the intrinsics _invokeBasic, _linkToVirtual,_linkToStatic,
>> _linkToSpecial, and _linkToInterface) do not allocate stack space when
>> invoked, but they can extend the stack space of their caller 
>> "temporarily".
>>
>> For example, if VerifyMethodHandles is enabled, some stack space is used
>> during verification. The temporarily used stack space is released before
>> the intrinsic jumps to its target. As a result, the target of a method
>> handle intrinsic will have a correct SP when it returns and the
>> program's control flow is correct.
>>
>> Moreover, if the stack is walked synchronously (e.g., at safepoints), no
>> problems appear either, because the synchronous interruption can happen
>> while execution is within the method handle intrinsic.
>>
>> The problem is that the SunStudio analyzer can interrupt the VM
>> asynchronously and walk the stack. If execution of a thread is
>> interrupted while the thread is in a method handle intrinsic, the SP
>> might contain an invalid value.
>>
>> The new webrev adds a check that marks the current frame unsafe for
>> sender if the frame belongs to a method handle intrinsic
>> (frame::safe_for_sender returns false in this case).
>>
>>
>> Change #2 (in src/share/vm/prims/forte.cpp):
>>
>> *** 425,435 ****
>>
>>        RegisterMap map(thd, false);
>>        initial_Java_frame = initial_Java_frame.sender(&map);
>>      }
>>
>> !   vframeStreamForte st(thd, initial_Java_frame, false);
>>
>>      for (; !st.at_end() && count < depth; st.forte_next(), count++) {
>>        bci = st.bci();
>>        method = st.method();
>>
>> --- 425,435 ----
>>
>>        RegisterMap map(thd, false);
>>        initial_Java_frame = initial_Java_frame.sender(&map);
>>      }
>>
>> !   vframeStreamForte st(thd, initial_Java_frame, true);
>>
>>      for (; !st.at_end() && count < depth; st.forte_next(), count++) {
>>        bci = st.bci();
>>        method = st.method();
>>
>> The problem is that the following assert in forte.cpp on line 103
>>
>> assert(filled_in, "invariant");
>>
>> fails. The problem appears if we have a stack trace like:
>>
>> V  [libjvm.so+0x1c98c4a]  void VMError::report(outputStream*)+0xb1a
>> V  [libjvm.so+0x1c9a3e8]  void VMError::report_and_die()+0x748
>> V  [libjvm.so+0x1003c8e]  void report_vm_error(const char*,int,const
>> char*,const char*)+0x7e
>> V  [libjvm.so+0x10efa22]  vframeStreamForte::vframeStreamForte #Nvariant
>> 1(JavaThread*,frame,bool)+0xe2
>> --> (Frame #5) V  [libjvm.so+0x10f0bb9]  void
>> forte_fill_call_trace_given_top(JavaThread*,ASGCT_CallTrace*,int,frame)+0x789 
>>
>>
>> V  [libjvm.so+0x10f1436]  AsyncGetCallTrace+0x246
>> C  [libcollector.so+0x272a8]  __collector_ext_jstack_unwind+0xb8
>> C  [libcollector.so+0x277df]  __collector_get_frame_info+0x27f
>> C  [libcollector.so+0x2f093]  __collector_getUserCtx+0x13
>> C  [libcollector.so+0x1abc7] __collector_ext_profile_handler+0x127
>> C  [libcollector.so+0x17535]  collector_sigprof_dispatcher+0x85
>> C  [libc.so.1+0x122476]  __sighndlr+0x6
>> C  [libc.so.1+0x115972]  call_user_handler+0x2ce
>> C  [libc.so.1+0x115e1b]  sigacthandler+0xdb
>> C  0xffffffffffffffff
>> V  [libjvm.so+0x1959089]  void os::PlatformEvent::park()+0xd9
>> V  [libjvm.so+0x18a6b34]  int ParkCommon(ParkEvent*,long)+0x34
>> V  [libjvm.so+0x18a7657]  int Monitor::IWait(Thread*,long)+0xb7
>> V  [libjvm.so+0x18a8b86]  bool Monitor::wait(bool,long,bool)+0x346
>> V  [libjvm.so+0xf7073d]  void
>> CompileBroker::wait_for_completion(CompileTask*)+0xad
>> V  [libjvm.so+0xf6f6b6]  void
>> CompileBroker::compile_method_base(methodHandle,int,int,methodHandle,int,const 
>>
>> char*,Thread*)+0x406
>> V  [libjvm.so+0xf6fd96]
>> nmethod*CompileBroker::compile_method(methodHandle,int,int,methodHandle,int,const 
>>
>> char*,Thread*)+0x586
>> V  [libjvm.so+0xbbad72]  void
>> AdvancedThresholdPolicy::submit_compile(methodHandle,int,CompLevel,JavaThread*)+0xb2 
>>
>>
>> V  [libjvm.so+0x1aef92f]  void
>> SimpleThresholdPolicy::compile(methodHandle,int,CompLevel,JavaThread*)+0x14f 
>>
>>
>> V  [libjvm.so+0xbbb00f]  void
>> AdvancedThresholdPolicy::method_invocation_event(methodHandle,methodHandle,CompLevel,nmethod*,JavaThread*)+0x1ff 
>>
>>
>> V  [libjvm.so+0x1aef765]
>> nmethod*SimpleThresholdPolicy::event(methodHandle,methodHandle,int,int,CompLevel,nmethod*,JavaThread*)+0x2e5 
>>
>>
>> V  [libjvm.so+0xdb93bc]  unsigned
>> char*Runtime1::counter_overflow(JavaThread*,int,Method*)+0x31c
>> v  ~RuntimeStub::counter_overflow Runtime1 stub
>> --> (Frame #29) J 143 C1
>> java.net.URLClassLoader$1.run()Ljava/lang/Object; (5 bytes) @
>> 0xffff80ffacc6962a [0xffff80ffacc69580+0xaa]
>> --> (Frame #30) v  ~StubRoutines::call_stub
>> V  [libjvm.so+0x13ca50b]  void
>> JavaCalls::call_helper(JavaValue*,methodHandle*,JavaCallArguments*,Thread*)+0x41b 
>>
>>
>> V  [libjvm.so+0x152a111]  JVM_DoPrivileged+0xfb1
>> C  [libjava.so+0x12f42]
>> Java_java_security_AccessController_doPrivileged__Ljava_security_PrivilegedExceptionAction_2Ljava_security_AccessControlContext_2+0x12 
>>
>>
>> J 142
>> java.security.AccessController.doPrivileged(Ljava/security/PrivilegedExceptionAction;Ljava/security/AccessControlContext;)Ljava/lang/Object; 
>>
>> (0 bytes) @ 0xffff80ffb400c57c [0xffff80ffb400c420+0x15c\
>> ]
>> J 134 C1
>> java.net.URLClassLoader.findClass(Ljava/lang/String;)Ljava/lang/Class;
>> (47 bytes) @ 0xffff80ffacc66014 [0xffff80ffacc65ec0+0x154]
>> ... more stack frames
>>
>> The forte_fill_call_trace_given_top() method (Frame #5) first checks if
>> the first Java frame found is fully decipherable (line 395 in
>> forte.cpp). In our case the first Java frame is Frame #29 (the
>> C1-compiled version of java.net.URLClassLoader$1.run).
>>
>> In our case Frame #29 is not decipherable, because
>> java.net.URLClassLoader$1.run has been made "not entrant" (a C2-compiled
>> version of the same method has been produced shortly before).
>>
>> Afterwards, forte_fill_call_trace_given_top() checks if the method is
>> "safe for sender" (line 424 in forte.cpp). The caller of the
>> java.net.URLClassLoader$1.run method is ~StubRoutines::call_stub, which
>> is considered "safe for sender" by the VM.
>>
>> Then, initial_Java_frame is set to the ~StubRoutines::call_stub stub
>> (line 430). This does not seem to be correct because the stub is not a
>> Java method and causes the assert(filled_in, "invariant") in the
>> constructor of vframeStreamForte (line 103 in forte.cpp) to fail
>> (because the frame cannot be filled from a stub).
>>
>> To avoid this failure, I propose to call the constructor of
>> vframeStreamForte with parameter stop_at_java_call_stub set to true
>> (instead of false) so that the VM stops walking the stack if a call stub
>> has been reached.
>>
>>
>> Here is the updated webrev:
>>
>> http://cr.openjdk.java.net/~zmajo/8068945/webrev.02/
>>
>> In addition to testing the changeset with the tools mentioned before, I
>> executed
>> - all JPRT tests, all pass;
>> - all java/lang/invoke and compiler JTREG tests; all tests that pass
>> with the unmodified source tree pass with the changes as well.
>>
>> Thank you very much in advance!
>>
>> Best regards,
>>
>>
>> Zoltan
>>
>>>
>>> So that will put this RFR on hold for a while, unfortunately.
>>>
>>> Thank you for the feedback and suggestions so far!
>>>
>>> Best regards,
>>>
>>>
>>> Zoltan
>>>
>>>
>>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>>>
>>>>> Thank you!
>>>>>
>>>>> Best regards,
>>>>>
>>>>>
>>>>> Zoltan
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Vladimir
>>>>>>
>>>>>> On 3/30/15 8:30 AM, Zoltán Majó wrote:
>>>>>>> Hi Ed,
>>>>>>>
>>>>>>>
>>>>>>> thank you for your feedback! Please see comments below.
>>>>>>>
>>>>>>> On 03/30/2015 11:11 AM, Edward Nevill wrote:
>>>>>>>> Hi Zoltán,
>>>>>>>>
>>>>>>>> On Fri, 2015-03-27 at 15:34 +0100, Zoltán Majó wrote:
>>>>>>>>> Full JPRT run, all tests pass. I also ran all hotspot compiler
>>>>>>>>> tests and
>>>>>>>>> the jdk tests in java/lang/invoke on both x86_64 and x86_32. All
>>>>>>>>> tests
>>>>>>>>> that pass without the patch pass also with the patch.
>>>>>>>>>
>>>>>>>>> I ran the SPEC JVM 2008 benchmarks on our performance
>>>>>>>>> infrastructure for
>>>>>>>>> x86_64. The performance evaluation suggests that there is no
>>>>>>>>> statistically significant performance degradation due to having
>>>>>>>>> proper
>>>>>>>>> frame pointers. Therefore I propose to have OmitFramePointer 
>>>>>>>>> set to
>>>>>>>>> false by default on x86_64 (and set to true on all other
>>>>>>>>> platforms).
>>>>>>>> This patch looks good, however I think there is a problem with the
>>>>>>>> logic of OmitFramePointer.
>>>>>>>>
>>>>>>>> Here is my test case.
>>>>>>>>
>>>>>>>> --- CUT HERE ---
>>>>>>>> // $Id: fibo.java,v 1.2 2000/12/24 19:10:50 doug Exp $
>>>>>>>> // http://www.bagley.org/~doug/shootout/
>>>>>>>>
>>>>>>>> public class fibo {
>>>>>>>>     public static void main(String args[]) {
>>>>>>>>         int N = Integer.parseInt(args[0]);
>>>>>>>>         System.out.println(fib(N));
>>>>>>>>     }
>>>>>>>>     public static int fib(int n) {
>>>>>>>>         if (n < 2) return(1);
>>>>>>>>         return( fib(n-2) + fib(n-1) );
>>>>>>>>     }
>>>>>>>> }
>>>>>>>> --- CUT HERE ---
>>>>>>>>
>>>>>>>> If I run it as follows on my x86 64 bit linux.
>>>>>>>>
>>>>>>>> /work/images/jdk/bin/java -XX:-TieredCompilation
>>>>>>>> -XX:+PrintCompilation
>>>>>>>> -XX:CompileOnly=fibo::fib -XX:+UnlockDiagnosticVMOptions
>>>>>>>> -XX:-OmitFramePointer -XX:+PrintAssembly fibo 43
>>>>>>>>
>>>>>>>> I get
>>>>>>>>
>>>>>>>>    # {method} {0x00007fc62c97f388} 'fib' '(I)I' in 'fibo'
>>>>>>>>    # parm0:    rsi       = int
>>>>>>>>    #           [sp+0x30]  (sp of caller)
>>>>>>>>    0x00007fc625071100: mov    %eax,-0x14000(%rsp)
>>>>>>>>    0x00007fc625071107: push   %rbp
>>>>>>>>    0x00007fc625071108: mov    %rsp,%rbp
>>>>>>>>    0x00007f836907110b: sub    $0x20,%rsp ;*synchronization entry
>>>>>>>>
>>>>>>>> which is correct, it is NOT(-) OmitFramePointer, therefore it is
>>>>>>>> using
>>>>>>>> the frame pointer
>>>>>>>>
>>>>>>>> Now if I try just changing -XX:-OmitFramePointer to
>>>>>>>> -XX:+OmitFramePointer in the above I get
>>>>>>>>
>>>>>>>> /work/images/jdk/bin/java -XX:-TieredCompilation
>>>>>>>> -XX:+PrintCompilation
>>>>>>>> -XX:CompileOnly=fibo::fib -XX:+UnlockDiagnosticVMOptions
>>>>>>>> -XX:+OmitFramePointer -XX:+PrintAssembly fibo 43
>>>>>>>>
>>>>>>>> I get
>>>>>>>>
>>>>>>>>    # {method} {0x00007f14d3c00388} 'fib' '(I)I' in 'fibo'
>>>>>>>>    # parm0:    rsi       = int
>>>>>>>>    #           [sp+0x30]  (sp of caller)
>>>>>>>>    0x00007f14e1071100: mov    %eax,-0x14000(%rsp)
>>>>>>>>    0x00007f14e1071107: push   %rbp
>>>>>>>>    0x00007f14e1071108: sub    $0x20,%rsp ;*synchronization entry
>>>>>>>>
>>>>>>>> which is correct, it is IS(+) OmitFramePointer, therefore it does
>>>>>>>> not
>>>>>>>> use a frame pointer.
>>>>>>>>
>>>>>>>> However, if I now delete the -XX:+/-OmitFramePointer 
>>>>>>>> altogether, IE
>>>>>>>>
>>>>>>>> /work/images/jdk/bin/java -XX:-TieredCompilation
>>>>>>>> -XX:+PrintCompilation
>>>>>>>> -XX:CompileOnly=fibo::fib -XX:+UnlockDiagnosticVMOptions
>>>>>>>> -XX:+PrintAssembly fibo 43
>>>>>>>>
>>>>>>>> I get
>>>>>>>>
>>>>>>>>    # {method} {0x00007f0c4b730388} 'fib' '(I)I' in 'fibo'
>>>>>>>>    # parm0:    rsi       = int
>>>>>>>>    #           [sp+0x30]  (sp of caller)
>>>>>>>>    0x00007f0c75071100: mov    %eax,-0x14000(%rsp)
>>>>>>>>    0x00007f0c75071107: push   %rbp
>>>>>>>>    0x00007f0c75071108: sub    $0x20,%rsp ;*synchronization entry
>>>>>>>>
>>>>>>>> It is not using a frame pointer which is the equivalent of
>>>>>>>> -XX:+OmitFramePointer. However in your description above you say
>>>>>>>>
>>>>>>>>> Therefore I propose to have OmitFramePointer set to false by
>>>>>>>>> default
>>>>>>>>> on x86_64 (and set to true on all other platforms).
>>>>>>>> whereas OmitFramePointer actually seems to be set to true on 
>>>>>>>> x86_64
>>>>>>>>
>>>>>>>> I think the problem may be with the declaration and definition of
>>>>>>>> OmitFramePointer in globals.hpp and globals_x86.hpp
>>>>>>>>
>>>>>>>> In globals.hpp it does
>>>>>>>>
>>>>>>>> product(bool, OmitFramePointer, true,
>>>>>>>>
>>>>>>>> In globals_x86.hpp it does
>>>>>>>>
>>>>>>>> LP64_ONLY(define_pd_global(bool, OmitFramePointer, false););
>>>>>>>>
>>>>>>>> I am not sure that you can mix product(...) and product_pd(...) 
>>>>>>>> like
>>>>>>>> this, so I think it just ends up getting the default from the
>>>>>>>> product(...).
>>>>>>>
>>>>>>> You are right, mixing product and product_pd does not make sense
>>>>>>> at all.
>>>>>>> Thank you for doing additional testing and for drawing attention
>>>>>>> to the
>>>>>>> problem.
>>>>>>>
>>>>>>> I updated the code to use product_pd and define_pd_global on all
>>>>>>> relevant platforms.
>>>>>>>
>>>>>>>> Aside: In general, I do not like options which include a 
>>>>>>>> negative in
>>>>>>>> them because I have to do a double think when I see something 
>>>>>>>> like,
>>>>>>>> -XX:-OmitFramePointer, as in, it is omitting the frame pointer,
>>>>>>>> therefore it is using a frame pointer. How about FramePointer 
>>>>>>>> so we
>>>>>>>> have -XX:+FramePointer to say I want frame pointers and
>>>>>>>> -XX:-FramePointer to say I don't.
>>>>>>>
>>>>>>> That is a good idea. Double negation is an unnecessary
>>>>>>> complication, so
>>>>>>> I changed the name of the flag to FramePointer, just as you
>>>>>>> suggested.
>>>>>>>
>>>>>>>>
>>>>>>>> I did some timing on the above 'fibo' test
>>>>>>>>
>>>>>>>> [ed at mylittlepony java]$ time /work/images/jdk/bin/java
>>>>>>>> -XX:-OmitFramePointer fibo 43
>>>>>>>> 701408733
>>>>>>>>
>>>>>>>> real    0m1.545s
>>>>>>>> user    0m1.571s
>>>>>>>> sys    0m0.015s
>>>>>>>> [ed at mylittlepony java]$ time /work/images/jdk/bin/java
>>>>>>>> -XX:+OmitFramePointer fibo 43
>>>>>>>> 701408733
>>>>>>>>
>>>>>>>> real    0m1.504s
>>>>>>>> user    0m1.527s
>>>>>>>> sys    0m0.019s
>>>>>>>>
>>>>>>>> which is ~3% difference on this test case. On aarch64, I see ~7%
>>>>>>>> difference on this test case.
>>>>>>>
>>>>>>> Thank you for the performance measurements!
>>>>>>>
>>>>>>>> With the above change to fix the logic of OmitFramePointer (and
>>>>>>>> possibly change its name) the patch looks good to me.
>>>>>>>
>>>>>>> Here is the updated webrev (the same webrev that was already 
>>>>>>> included
>>>>>>> into my reply to Roland):
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~zmajo/8068945/webrev.01/
>>>>>>>
>>>>>>>> I will prepare a mirror patch for aarch64.
>>>>>>>
>>>>>>> That would be great!
>>>>>>>
>>>>>>> Thank you and best regards,
>>>>>>>
>>>>>>>
>>>>>>> Zoltán
>>>>>>>
>>>>>>>>
>>>>>>>> All the best,
>>>>>>>> Ed.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>
>>>
>>


From vladimir.kozlov at oracle.com  Fri Apr 24 16:26:54 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 24 Apr 2015 09:26:54 -0700
Subject: [9] RFR(S): 8068945: Use RBP register as proper frame pointer
	in JIT compiled code on x86
In-Reply-To: <553A2DC8.3080804@oracle.com>
References: <55156A87.1070607@oracle.com>	<1427706703.1606.22.camel@mylittlepony.linaroharston>	<55196C2C.8080106@oracle.com>	<5519B1AE.8070901@oracle.com>	<5519BC6E.1090504@oracle.com>	<5519C29D.8080200@oracle.com>	<551BF4D3.90805@oracle.com>	<5537CBAE.9020500@oracle.com>	<553816F9.1040104@oracle.com>
	<553A2DC8.3080804@oracle.com>
Message-ID: <553A6ECE.8090908@oracle.com>

Yes, this looks good. Nice work and thank you for testing.

Regards,
Vladimir

On 4/24/15 4:49 AM, Zoltán Majó wrote:
> Hi Vladimir,
>
>
> thank you for the feedback!
>
> On 04/22/2015 11:47 PM, Vladimir Kozlov wrote:
>> Looks like 2 issues left.
>>
>> First, after the discussion on the mailing list, let's set the flag to off
>> by default and ask SQE to add it to the rotation flags for Nightly testing
>> (after you push the changes).
>
> OK, I set the flag to off on all architectures and will notify SQE about
> it once the change has been pushed.
>
>> Second, I am concerned about the vframeStreamForte() call change. The call
>> stack may have several call stubs. They should be skipped to get all Java
>> frames on the stack, as we do in other places. Otherwise we can get
>> incorrect profiling information.
>
> I changed vframeStreamForte back to the way it was (to *not* stop at
> call stubs and return all Java frames). But I removed the if-block for
> the case where the first Java frame found is not decipherable (lines
> 407--427 in forte.cpp). Here are the reasons for that.
>
> The forte_fill_call_trace_given_top() function returns a list of methods
> to the profiler (the list of Java methods that have been found on the
> stack). For every method, two pieces of information are returned (in an
> ASGCT_CallFrame structure):
>
> - (1) the jmethodID corresponding to that method and
> - (2) debug information (the line number where execution was in the
> method when the profiler interrupted the VM).
>
> (1) The jmethodID is *critical* for forte_fill_call_trace_given_top()
> because it provides information about the method that a frame belongs
> to. If the jmethodID is not available (because no Method* was obtained
> by looking at that frame or the Method* found is invalid), the VM stops
> walking the stack and returns right away.
>
> (2) The debug information is *not critical*: If there is no BCI
> corresponding to a PC stored within a frame, the line number in
> ASGCT_CallFrame is set to -1 for interpreted/compiled frames and to -3
> for native frames (as described on lines 512--518 in forte.cpp) and the
> stack is walked further.
>
> The stack is walked using vframeStreamForte. The vframeStreamForte
> object is created from the first Java frame available on the stack.
>
> In the original version of the code, if the first Java frame is not
> decipherable (i.e., there is no debug information available for it), the
> first Java frame is *handled separately* (lines 407--427 in forte.cpp)
> and the vframeStreamForte object is created from the *sender* of the
> first Java frame (see reasons below).
>
> The original version of the code assumes that the sender of the first
> Java frame is also a Java frame. But that is not always the case, as the
> calling frame can be also something else, for example a call stub. Using
> a call stub as a Java frame can result in a failure.
>
> The *reason for separately handling* the first Java frame if the frame
> is not decipherable is to avoid triggering the assert in
> found_bad_method_frame() (line 443 in vframe.hpp). The function
> found_bad_method_frame() can be called in two ways:
>
> - vframeStreamForte::vframeStreamForte -> vframeStreamForte::fill_frame
> -> vframeStreamCommon::fill_from_interpreter_frame ->
> found_bad_method_frame()
> - vframeStreamForte::vframeStreamForte -> vframeStreamForte::fill_frame
> -> vframeStreamCommon::fill_from_compiled_frame -> found_bad_method_frame()
>
> In a non-product build, the assert fails if no debug information is
> available for an interpreted/compiled frame. In a product build,
> however, a Method* is returned also for interpreted/compiled frames with
> no debug information and filling the stack trace continues.
>
> I think the approach in product builds is better, because critical
> information (the jmethodID of all stack frames) is returned to
> SunStudio even if non-critical debug information is not available for
> some frames in the trace. The fastdebug build, however, crashes if it
> encounters a single frame with no debug information.
>
> Here is the newest webrev:
> http://cr.openjdk.java.net/~zmajo/8068945/webrev.03/
>
> Files updated relative to the previous webrev (webrev.02):
> - forte.cpp
> - globals_x86.hpp
>
> Testing:
> - JPRT: all tests pass; tested with PreserveFramePointer enabled on x64
> - Repeated all SunStudio experiments described for webrev.02 (two tests
> w/ and w/o lambda forms, executed +/-PreserveFramePointer, w/ and w/o
> -Xcomp)
> - all java/lang/invoke and compiler tests: all tests that pass with the
> unmodified source tree pass with the changes as well.
>
> I ran the experiments *with the assert enabled* in
> vframeStreamCommon::found_bad_method_frame(). The assert was not triggered.
>
> I would be, however, inclined to disable that assert because:
> - (1) it is better to have some information in stack traces than no
> information and a crash (even though the crash happens in a fastdebug
> build);
> - (2) the assert is there only for SunStudio (according to the comments
> on lines 439--442 in vframe.cpp) and AsyncGetCallTrace (the only place
> where forte_fill_call_trace_given_top() is called) does not expect debug
> information to be available for all frames; it just needs the jmethodID for
> every frame.
>
>> Do you mean "can NOT happen"?:
>> > Moreover, if the stack is walked synchronously (e.g., at safepoints), no
>> > problems appear either, because the synchronous interruption can happen
>> > while execution is within the method handle intrinsic.
>
> Yes, I meant that it can *not* happen. Sorry for the confusion.
>
> Thank you and best regards,
>
>
> Zoltan
>
>>
>> Vladimir
>>
>> On 4/22/15 9:26 AM, Zoltán Majó wrote:
>>> Hi Vladimir,
>>>
>>>
>>> I managed to do some more work on this enhancement. Please see details
>>> below.
>>>
>>> On 04/01/2015 03:38 PM, Zoltán Majó wrote:
>>>> Hi Vladimir,
>>>>
>>>>
>>>> On 03/30/2015 11:39 PM, Vladimir Kozlov wrote:
>>>>> On 3/30/15 2:13 PM, Zoltán Majó wrote:
>>>>>> Hi Vladimir,
>>>>>>
>>>>>>
>>>>>> thank you for the feedback!
>>>>>>
>>>>>> On 03/30/2015 10:27 PM, Vladimir Kozlov wrote:
>>>>>>> How about PreserveFramePointer instead of simple FramePointer?
>>>>>>>
>>>>>>> PreserveFramePointer will mean that compiled (or other) code will
>>>>>>> use
>>>>>>> that register only as Frame pointer.
>>>>>>
>>>>>> I will change the flag's name to PreserveFramePointer and will also
>>>>>> update the description.
>>>
>>> I changed the flag's name to PreserveFramePointer, just as you
>>> suggested.
>>>
>>>>>>
>>>>>>> Zoltan, x86 flags setting should be in general globals_x86.hpp. You
>>>>>>> can #ifdef _LP64 there too. I don't understand why you only set
>>>>>>> it to
>>>>>>> true on linux-x64.
>>>>>>
>>>>>> I remembered that the original discussion with Brendan Gregg
>>>>>> mentioned
>>>>>> only Linux's perf tool as a possible use case for "proper" frame
>>>>>> pointers. So I was unsure whether to enable proper frame pointers by
>>>>>> default on other x64 platforms as well.
>>>>>>
>>>>>> But if you think it would be better to have proper frame pointers on
>>>>>> all
>>>>>> x64 platforms, I will change the code to set PreserveFramePointer to
>>>>>> true for all x64 platforms. Just please let me know.
>>>
>>> The current webrev sets the PreserveFramePointer flag to true on all
>>> x86_64 platforms and to false on all other platforms.
>>>
>>>>>
>>>>> Currently compiled code for all x86 platforms is almost the same
>>>>> (win64 has difference in registers usage) and we should keep it that
>>>>> way.
>>>>>
>>>>> Also the original request was to have a flag to enable such behavior
>>>>> (use RBP only as FP). So having it off by default is acceptable. If the
>>>>> performance group or someone else finds a regression (or bug) due to
>>>>> this change we can switch the flag off by default before the jdk9 release.
>>>>>
>>>>> Try to run pstack on Solaris and jstack on OSX to make sure they
>>>>> report correct call stack with compiled java methods. And JFR.
>>>>> Also it would be nice to run SunStudio analyzer to verify that it
>>>>> works.
>>>>
>>>> I ran all tools you've suggested. JFR and jstack are unaffected; pstack
>>>> produces nice stack traces (it did not always do so before).
>>>
>>> I tested the current webrev with the following setup: I used two tests,
>>> one that generates a long chain of lambda form invocations and another
>>> one that generates a long chain of "regular" method invocations. Both
>>> tests were executed on an x64 machine in four configurations: with +/-
>>> Xcomp and with +/- PreserveFramePointer.
>>>
>>> Just as before, JFR and jstack stack traces are unaffected for both
>>> tests, pstack can now produce stack traces with both tests if
>>> PreserveFramePointer is enabled.
>>>
>>>> However, I've encountered a problem with SunStudio: Two asserts fail
>>>> in the fastdebug build. Both of them are "soft" failures, as neither the
>>>> VM nor SunStudio crashes with the product build. I worked on the problem
>>>> today and have a partial understanding of the issue, but more
>>>> investigation is needed to have a patch that preserves the correct
>>>> behavior of SunStudio as well.
>>>
>>> I was able to track down the problems with SunStudio. I had to change
>>> the code in two places.
>>>
>>>
>>> Change #1 (in src/cpu/x86/vm/frame_x86.cpp):
>>>
>>> *** 222,232 ****
>>>        }
>>>
>>>        if (sender_blob->is_nmethod()) {
>>>            nmethod* nm = sender_blob->as_nmethod_or_null();
>>>            if (nm != NULL) {
>>> !             if (nm->is_deopt_mh_entry(sender_pc) ||
>>> nm->is_deopt_entry(sender_pc)) {
>>>                    return false;
>>>                }
>>>            }
>>>        }
>>>
>>> --- 222,233 ----
>>>        }
>>>
>>>        if (sender_blob->is_nmethod()) {
>>>            nmethod* nm = sender_blob->as_nmethod_or_null();
>>>            if (nm != NULL) {
>>> !             if (nm->is_deopt_mh_entry(sender_pc) ||
>>> nm->is_deopt_entry(sender_pc) ||
>>> ! nm->method()->is_method_handle_intrinsic()) {
>>>                    return false;
>>>                }
>>>            }
>>>        }
>>>
>>> The reason for this change is the following. Method handle intrinsics
>>> (i.e., the intrinsics _invokeBasic, _linkToVirtual,_linkToStatic,
>>> _linkToSpecial, and _linkToInterface) do not allocate stack space when
>>> invoked, but they can extend the stack space of their caller
>>> "temporarily".
>>>
>>> For example, if VerifyMethodHandles is enabled, some stack space is used
>>> during verification. The temporarily used stack space is released before
>>> the intrinsic jumps to its target. As a result, the target of a method
>>> handle intrinsic will have a correct SP when it returns and the
>>> program's control flow is correct.
>>>
>>> Moreover, if the stack is walked synchronously (e.g., at safepoints), no
>>> problems appear either, because the synchronous interruption can happen
>>> while execution is within the method handle intrinsic.
>>>
>>> The problem is that the SunStudio analyzer can interrupt the VM
>>> asynchronously and walk the stack. If execution of a thread is
>>> interrupted while the thread is in a method handle intrinsic, the SP
>>> might contain an invalid value.
>>>
>>> The new webrev adds a check that marks the current frame unsafe for
>>> sender if the frame belongs to a method handle intrinsic
>>> (frame::safe_for_sender returns false in this case).
>>>
>>>
>>> Change #2 (in src/share/vm/prims/forte.cpp):
>>>
>>> *** 425,435 ****
>>>
>>>        RegisterMap map(thd, false);
>>>        initial_Java_frame = initial_Java_frame.sender(&map);
>>>      }
>>>
>>> !   vframeStreamForte st(thd, initial_Java_frame, false);
>>>
>>>      for (; !st.at_end() && count < depth; st.forte_next(), count++) {
>>>        bci = st.bci();
>>>        method = st.method();
>>>
>>> --- 425,435 ----
>>>
>>>        RegisterMap map(thd, false);
>>>        initial_Java_frame = initial_Java_frame.sender(&map);
>>>      }
>>>
>>> !   vframeStreamForte st(thd, initial_Java_frame, true);
>>>
>>>      for (; !st.at_end() && count < depth; st.forte_next(), count++) {
>>>        bci = st.bci();
>>>        method = st.method();
>>>
>>> The problem is that the following assert in forte.cpp on line 103
>>>
>>> assert(filled_in, "invariant");
>>>
>>> fails. The problem appears if we have a stack trace like:
>>>
>>> V  [libjvm.so+0x1c98c4a]  void VMError::report(outputStream*)+0xb1a
>>> V  [libjvm.so+0x1c9a3e8]  void VMError::report_and_die()+0x748
>>> V  [libjvm.so+0x1003c8e]  void report_vm_error(const char*,int,const
>>> char*,const char*)+0x7e
>>> V  [libjvm.so+0x10efa22]  vframeStreamForte::vframeStreamForte #Nvariant
>>> 1(JavaThread*,frame,bool)+0xe2
>>> --> (Frame #5) V  [libjvm.so+0x10f0bb9]  void
>>> forte_fill_call_trace_given_top(JavaThread*,ASGCT_CallTrace*,int,frame)+0x789
>>>
>>>
>>> V  [libjvm.so+0x10f1436]  AsyncGetCallTrace+0x246
>>> C  [libcollector.so+0x272a8]  __collector_ext_jstack_unwind+0xb8
>>> C  [libcollector.so+0x277df]  __collector_get_frame_info+0x27f
>>> C  [libcollector.so+0x2f093]  __collector_getUserCtx+0x13
>>> C  [libcollector.so+0x1abc7] __collector_ext_profile_handler+0x127
>>> C  [libcollector.so+0x17535]  collector_sigprof_dispatcher+0x85
>>> C  [libc.so.1+0x122476]  __sighndlr+0x6
>>> C  [libc.so.1+0x115972]  call_user_handler+0x2ce
>>> C  [libc.so.1+0x115e1b]  sigacthandler+0xdb
>>> C  0xffffffffffffffff
>>> V  [libjvm.so+0x1959089]  void os::PlatformEvent::park()+0xd9
>>> V  [libjvm.so+0x18a6b34]  int ParkCommon(ParkEvent*,long)+0x34
>>> V  [libjvm.so+0x18a7657]  int Monitor::IWait(Thread*,long)+0xb7
>>> V  [libjvm.so+0x18a8b86]  bool Monitor::wait(bool,long,bool)+0x346
>>> V  [libjvm.so+0xf7073d]  void
>>> CompileBroker::wait_for_completion(CompileTask*)+0xad
>>> V  [libjvm.so+0xf6f6b6]  void
>>> CompileBroker::compile_method_base(methodHandle,int,int,methodHandle,int,const
>>>
>>> char*,Thread*)+0x406
>>> V  [libjvm.so+0xf6fd96]
>>> nmethod*CompileBroker::compile_method(methodHandle,int,int,methodHandle,int,const
>>>
>>> char*,Thread*)+0x586
>>> V  [libjvm.so+0xbbad72]  void
>>> AdvancedThresholdPolicy::submit_compile(methodHandle,int,CompLevel,JavaThread*)+0xb2
>>>
>>>
>>> V  [libjvm.so+0x1aef92f]  void
>>> SimpleThresholdPolicy::compile(methodHandle,int,CompLevel,JavaThread*)+0x14f
>>>
>>>
>>> V  [libjvm.so+0xbbb00f]  void
>>> AdvancedThresholdPolicy::method_invocation_event(methodHandle,methodHandle,CompLevel,nmethod*,JavaThread*)+0x1ff
>>>
>>>
>>> V  [libjvm.so+0x1aef765]
>>> nmethod*SimpleThresholdPolicy::event(methodHandle,methodHandle,int,int,CompLevel,nmethod*,JavaThread*)+0x2e5
>>>
>>>
>>> V  [libjvm.so+0xdb93bc]  unsigned
>>> char*Runtime1::counter_overflow(JavaThread*,int,Method*)+0x31c
>>> v  ~RuntimeStub::counter_overflow Runtime1 stub
>>> --> (Frame #29) J 143 C1
>>> java.net.URLClassLoader$1.run()Ljava/lang/Object; (5 bytes) @
>>> 0xffff80ffacc6962a [0xffff80ffacc69580+0xaa]
>>> --> (Frame #30) v  ~StubRoutines::call_stub
>>> V  [libjvm.so+0x13ca50b]  void
>>> JavaCalls::call_helper(JavaValue*,methodHandle*,JavaCallArguments*,Thread*)+0x41b
>>>
>>>
>>> V  [libjvm.so+0x152a111]  JVM_DoPrivileged+0xfb1
>>> C  [libjava.so+0x12f42]
>>> Java_java_security_AccessController_doPrivileged__Ljava_security_PrivilegedExceptionAction_2Ljava_security_AccessControlContext_2+0x12
>>>
>>>
>>> J 142
>>> java.security.AccessController.doPrivileged(Ljava/security/PrivilegedExceptionAction;Ljava/security/AccessControlContext;)Ljava/lang/Object;
>>>
>>> (0 bytes) @ 0xffff80ffb400c57c [0xffff80ffb400c420+0x15c]
>>> J 134 C1
>>> java.net.URLClassLoader.findClass(Ljava/lang/String;)Ljava/lang/Class;
>>> (47 bytes) @ 0xffff80ffacc66014 [0xffff80ffacc65ec0+0x154]
>>> ... more stack frames
>>>
>>> The forte_fill_call_trace_given_top() method (Frame #5) first checks if
>>> the first Java frame found is fully decipherable (line 395 in
>>> forte.cpp). In our case the first Java frame is Frame #29 (the
>>> C1-compiled version of java.net.URLClassLoader$1.run).
>>>
>>> In our case Frame #29 is not decipherable, because
>>> java.net.URLClassLoader$1.run has been made "not entrant" (a C2-compiled
>>> version of the same method has been produced shortly before).
>>>
>>> Afterwards, forte_fill_call_trace_given_top() checks if the method is
>>> "safe for sender" (line 424 in forte.cpp). The caller of the
>>> java.net.URLClassLoader$1.run method is ~StubRoutines::call_stub, which
>>> is considered "safe for sender" by the VM.
>>>
>>> Then, initial_Java_frame is set to the ~StubRoutines::call_stub stub
>>> (line 430). This does not seem to be correct because the stub is not a
>>> Java method and causes the assert(filled_in, "invariant") in the
>>> constructor of vframeStreamForte (line 103 in forte.cpp) to fail
>>> (because the frame cannot be filled from a stub).
>>>
>>> To avoid this failure, I propose to call the constructor of
>>> vframeStreamForte with parameter stop_at_java_call_stub set to true
>>> (instead of false) so that the VM stops walking the stack if a call stub
>>> has been reached.
>>>
>>>
>>> Here is the updated webrev:
>>>
>>> http://cr.openjdk.java.net/~zmajo/8068945/webrev.02/
>>>
>>> In addition to testing the changeset with the tools mentioned before, I
>>> executed
>>> - all JPRT tests, all pass;
>>> - all java/lang/invoke and compiler JTREG tests; all tests that pass
>>> with the unmodified source tree pass with the changes as well.
>>>
>>> Thank you very much in advance!
>>>
>>> Best regards,
>>>
>>>
>>> Zoltan
>>>
>>>>
>>>> So that will put this RFR on hold for a while, unfortunately.
>>>>
>>>> Thank you for the feedback and suggestions so far!
>>>>
>>>> Best regards,
>>>>
>>>>
>>>> Zoltan
>>>>
>>>>
>>>>>
>>>>> Thanks,
>>>>> Vladimir
>>>>>
>>>>>>
>>>>>> Thank you!
>>>>>>
>>>>>> Best regards,
>>>>>>
>>>>>>
>>>>>> Zoltan
>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Vladimir
>>>>>>>
>>>>>>> On 3/30/15 8:30 AM, Zoltán Majó wrote:
>>>>>>>> Hi Ed,
>>>>>>>>
>>>>>>>>
>>>>>>>> thank you for your feedback! Please see comments below.
>>>>>>>>
>>>>>>>> On 03/30/2015 11:11 AM, Edward Nevill wrote:
>>>>>>>>> Hi Zoltán,
>>>>>>>>>
>>>>>>>>> On Fri, 2015-03-27 at 15:34 +0100, Zoltán Majó wrote:
>>>>>>>>>> Full JPRT run, all tests pass. I also ran all hotspot compiler
>>>>>>>>>> tests and
>>>>>>>>>> the jdk tests in java/lang/invoke on both x86_64 and x86_32. All
>>>>>>>>>> tests
>>>>>>>>>> that pass without the patch pass also with the patch.
>>>>>>>>>>
>>>>>>>>>> I ran the SPEC JVM 2008 benchmarks on our performance
>>>>>>>>>> infrastructure for
>>>>>>>>>> x86_64. The performance evaluation suggests that there is no
>>>>>>>>>> statistically significant performance degradation due to having
>>>>>>>>>> proper
>>>>>>>>>> frame pointers. Therefore I propose to have OmitFramePointer
>>>>>>>>>> set to
>>>>>>>>>> false by default on x86_64 (and set to true on all other
>>>>>>>>>> platforms).
>>>>>>>>> This patch looks good, however I think there is a problem with the
>>>>>>>>> logic of OmitFramePointer.
>>>>>>>>>
>>>>>>>>> Here is my test case.
>>>>>>>>>
>>>>>>>>> --- CUT HERE ---
>>>>>>>>> // $Id: fibo.java,v 1.2 2000/12/24 19:10:50 doug Exp $
>>>>>>>>> // http://www.bagley.org/~doug/shootout/
>>>>>>>>>
>>>>>>>>> public class fibo {
>>>>>>>>>     public static void main(String args[]) {
>>>>>>>>>         int N = Integer.parseInt(args[0]);
>>>>>>>>>         System.out.println(fib(N));
>>>>>>>>>     }
>>>>>>>>>     public static int fib(int n) {
>>>>>>>>>         if (n < 2) return(1);
>>>>>>>>>         return( fib(n-2) + fib(n-1) );
>>>>>>>>>     }
>>>>>>>>> }
>>>>>>>>> --- CUT HERE ---
>>>>>>>>>
>>>>>>>>> If I run it as follows on my x86 64 bit linux.
>>>>>>>>>
>>>>>>>>> /work/images/jdk/bin/java -XX:-TieredCompilation
>>>>>>>>> -XX:+PrintCompilation
>>>>>>>>> -XX:CompileOnly=fibo::fib -XX:+UnlockDiagnosticVMOptions
>>>>>>>>> -XX:-OmitFramePointer -XX:+PrintAssembly fibo 43
>>>>>>>>>
>>>>>>>>> I get
>>>>>>>>>
>>>>>>>>>    # {method} {0x00007fc62c97f388} 'fib' '(I)I' in 'fibo'
>>>>>>>>>    # parm0:    rsi       = int
>>>>>>>>>    #           [sp+0x30]  (sp of caller)
>>>>>>>>>    0x00007fc625071100: mov    %eax,-0x14000(%rsp)
>>>>>>>>>    0x00007fc625071107: push   %rbp
>>>>>>>>>    0x00007fc625071108: mov    %rsp,%rbp
>>>>>>>>>    0x00007f836907110b: sub    $0x20,%rsp ;*synchronization entry
>>>>>>>>>
>>>>>>>>> which is correct, it is NOT(-) OmitFramePointer, therefore it is
>>>>>>>>> using
>>>>>>>>> the frame pointer
>>>>>>>>>
>>>>>>>>> Now if I try just changing -XX:-OmitFramePointer to
>>>>>>>>> -XX:+OmitFramePointer in the above I get
>>>>>>>>>
>>>>>>>>> /work/images/jdk/bin/java -XX:-TieredCompilation
>>>>>>>>> -XX:+PrintCompilation
>>>>>>>>> -XX:CompileOnly=fibo::fib -XX:+UnlockDiagnosticVMOptions
>>>>>>>>> -XX:+OmitFramePointer -XX:+PrintAssembly fibo 43
>>>>>>>>>
>>>>>>>>> I get
>>>>>>>>>
>>>>>>>>>    # {method} {0x00007f14d3c00388} 'fib' '(I)I' in 'fibo'
>>>>>>>>>    # parm0:    rsi       = int
>>>>>>>>>    #           [sp+0x30]  (sp of caller)
>>>>>>>>>    0x00007f14e1071100: mov    %eax,-0x14000(%rsp)
>>>>>>>>>    0x00007f14e1071107: push   %rbp
>>>>>>>>>    0x00007f14e1071108: sub    $0x20,%rsp ;*synchronization entry
>>>>>>>>>
>>>>>>>>> which is correct, it is IS(+) OmitFramePointer, therefore it does
>>>>>>>>> not
>>>>>>>>> use a frame pointer.
>>>>>>>>>
>>>>>>>>> However, if I now delete the -XX:+/-OmitFramePointer
>>>>>>>>> altogether, IE
>>>>>>>>>
>>>>>>>>> /work/images/jdk/bin/java -XX:-TieredCompilation
>>>>>>>>> -XX:+PrintCompilation
>>>>>>>>> -XX:CompileOnly=fibo::fib -XX:+UnlockDiagnosticVMOptions
>>>>>>>>> -XX:+PrintAssembly fibo 43
>>>>>>>>>
>>>>>>>>> I get
>>>>>>>>>
>>>>>>>>>    # {method} {0x00007f0c4b730388} 'fib' '(I)I' in 'fibo'
>>>>>>>>>    # parm0:    rsi       = int
>>>>>>>>>    #           [sp+0x30]  (sp of caller)
>>>>>>>>>    0x00007f0c75071100: mov    %eax,-0x14000(%rsp)
>>>>>>>>>    0x00007f0c75071107: push   %rbp
>>>>>>>>>    0x00007f0c75071108: sub    $0x20,%rsp ;*synchronization entry
>>>>>>>>>
>>>>>>>>> It is not using a frame pointer which is the equivalent of
>>>>>>>>> -XX:+OmitFramePointer. However in your description above you say
>>>>>>>>>
>>>>>>>>>> Therefore I propose to have OmitFramePointer set to false by
>>>>>>>>>> default
>>>>>>>>>> on x86_64 (and set to true on all other platforms).
>>>>>>>>> whereas OmitFramePointer actually seems to be set to true on
>>>>>>>>> x86_64
>>>>>>>>>
>>>>>>>>> I think the problem may be with the declaration and definition of
>>>>>>>>> OmitFramePointer in globals.hpp and globals_x86.hpp
>>>>>>>>>
>>>>>>>>> In globals.hpp it does
>>>>>>>>>
>>>>>>>>> product(bool, OmitFramePointer, true,
>>>>>>>>>
>>>>>>>>> In globals_x86.hpp it does
>>>>>>>>>
>>>>>>>>> LP64_ONLY(define_pd_global(bool, OmitFramePointer, false););
>>>>>>>>>
>>>>>>>>> I am not sure that you can mix product(...) and product_pd(...)
>>>>>>>>> like
>>>>>>>>> this, so I think it just ends up getting the default from the
>>>>>>>>> product(...).
>>>>>>>>
>>>>>>>> You are right, mixing product and product_pd does not make sense
>>>>>>>> at all.
>>>>>>>> Thank you for doing additional testing and for drawing attention
>>>>>>>> to the
>>>>>>>> problem.
>>>>>>>>
>>>>>>>> I updated the code to use product_pd and define_pd_global on all
>>>>>>>> relevant platforms.
>>>>>>>>
>>>>>>>>> Aside: In general, I do not like options which include a
>>>>>>>>> negative in
>>>>>>>>> them because I have to do a double think when I see something
>>>>>>>>> like,
>>>>>>>>> -XX:-OmitFramePointer, as in, it is omitting the frame pointer,
>>>>>>>>> therefore it is using a frame pointer. How about FramePointer
>>>>>>>>> so we
>>>>>>>>> have -XX:+FramePointer to say I want frame pointers and
>>>>>>>>> -XX:-FramePointer to say I don't.
>>>>>>>>
>>>>>>>> That is a good idea. Double negation is an unnecessary
>>>>>>>> complication, so
>>>>>>>> I changed the name of the flag to FramePointer, just as you
>>>>>>>> suggested.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> I did some timing on the above 'fibo' test
>>>>>>>>>
>>>>>>>>> [ed at mylittlepony java]$ time /work/images/jdk/bin/java
>>>>>>>>> -XX:-OmitFramePointer fibo 43
>>>>>>>>> 701408733
>>>>>>>>>
>>>>>>>>> real    0m1.545s
>>>>>>>>> user    0m1.571s
>>>>>>>>> sys    0m0.015s
>>>>>>>>> [ed at mylittlepony java]$ time /work/images/jdk/bin/java
>>>>>>>>> -XX:+OmitFramePointer fibo 43
>>>>>>>>> 701408733
>>>>>>>>>
>>>>>>>>> real    0m1.504s
>>>>>>>>> user    0m1.527s
>>>>>>>>> sys    0m0.019s
>>>>>>>>>
>>>>>>>>> which is ~3% difference on this test case. On aarch64, I see ~7%
>>>>>>>>> difference on this test case.
>>>>>>>>
>>>>>>>> Thank you for the performance measurements!
>>>>>>>>
>>>>>>>>> With the above change to fix the logic of OmitFramePointer (and
>>>>>>>>> possibly change its name) the patch looks good to me.
>>>>>>>>
>>>>>>>> Here is the updated webrev (the same webrev that was already
>>>>>>>> included
>>>>>>>> into my reply to Roland):
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~zmajo/8068945/webrev.01/
>>>>>>>>
>>>>>>>>> I will prepare a mirror patch for aarch64.
>>>>>>>>
>>>>>>>> That would be great!
>>>>>>>>
>>>>>>>> Thank you and best regards,
>>>>>>>>
>>>>>>>>
>>>>>>>> Zoltán
>>>>>>>>
>>>>>>>>>
>>>>>>>>> All the best,
>>>>>>>>> Ed.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>>
>

From zoltan.majo at oracle.com  Fri Apr 24 17:21:30 2015
From: zoltan.majo at oracle.com (=?windows-1252?Q?Zolt=E1n_Maj=F3?=)
Date: Fri, 24 Apr 2015 19:21:30 +0200
Subject: [9] RFR(S): 8068945: Use RBP register as proper frame pointer
	in JIT compiled code on x86
In-Reply-To: <553A6ECE.8090908@oracle.com>
References: <55156A87.1070607@oracle.com>	<1427706703.1606.22.camel@mylittlepony.linaroharston>	<55196C2C.8080106@oracle.com>	<5519B1AE.8070901@oracle.com>	<5519BC6E.1090504@oracle.com>	<5519C29D.8080200@oracle.com>	<551BF4D3.90805@oracle.com>	<5537CBAE.9020500@oracle.com>	<553816F9.1040104@oracle.com>	<553A2DC8.3080804@oracle.com>
	<553A6ECE.8090908@oracle.com>
Message-ID: <553A7B9A.5060407@oracle.com>

Hi Vladimir,


On 04/24/2015 06:26 PM, Vladimir Kozlov wrote:
> Yes, this looks good. Nice work and thank you for testing.

Thank you for all the feedback you've provided while I was working on 
this issue!

I would have one final question: Can I disable the assert in 
vframeStreamCommon::found_bad_method_frame() on line 443 in vframe.cpp? 
I would prefer that for the reasons I mentioned in my previous message 
(please see also below).

Here is an updated webrev (with the assert disabled): 
http://cr.openjdk.java.net/~zmajo/8068945/webrev.04/

Could you please let me know which webrev I can push, webrev.03 with the 
assert left enabled or webrev.04 with the assert disabled?

Thank you and best regards,


Zoltan

>> I ran the experiments *with the assert enabled* in
>> vframeStreamCommon::found_bad_method_frame(). The assert was not 
>> triggered.
>>
>> I would be, however, inclined to disable that assert because:
>> - (1) it is better to have some information in stack traces than no
>> information and a crash (even though the crash happens in a fastdebug
>> build);
>> - (2) the assert is there only for SunStudio (according to the comments
>> on lines 439--442 in vframe.cpp) and AsyncGetCallTrace (the only place
>> where forte_fill_call_trace_given_top() is called) does not expect debug
>> information to be available for all frames, it just needs the jmethodID for
>> every frame.
>>
>>> Do you mean "can NOT happen"?:
>>> > Moreover, if the stack is walked synchronously (e.g., at
>>> safepoints), no
>>> > problems appear either, because the synchronous interruption can 
>>> happen
>>> > while execution is within the method handle intrinsic.
>>
>> Yes, I meant that it can *not* happen. Sorry for the confusion.
>>
>> Thank you and best regards,
>>
>>
>> Zoltan
>>
>>>
>>> Vladimir
>>>
>>> On 4/22/15 9:26 AM, Zoltán Majó wrote:
>>>> Hi Vladimir,
>>>>
>>>>
>>>> I managed to do some more work on this enhancement. Please see details
>>>> below.
>>>>
>>>> On 04/01/2015 03:38 PM, Zoltán Majó wrote:
>>>>> Hi Vladimir,
>>>>>
>>>>>
>>>>> On 03/30/2015 11:39 PM, Vladimir Kozlov wrote:
>>>>>> On 3/30/15 2:13 PM, Zoltán Majó wrote:
>>>>>>> Hi Vladimir,
>>>>>>>
>>>>>>>
>>>>>>> thank you for the feedback!
>>>>>>>
>>>>>>> On 03/30/2015 10:27 PM, Vladimir Kozlov wrote:
>>>>>>>> How about PreserveFramePointer instead of simple FramePointer?
>>>>>>>>
>>>>>>>> PreserveFramePointer will mean that compiled (or other) code will
>>>>>>>> use
>>>>>>>> that register only as Frame pointer.
>>>>>>>
>>>>>>> I will change the flag's name to PreserveFramePointer and will also
>>>>>>> update the description.
>>>>
>>>> I changed the flag's name to PreserveFramePointer, just as you
>>>> suggested.
>>>>
>>>>>>>
>>>>>>>> Zoltan, x86 flags setting should be in general globals_x86.hpp. 
>>>>>>>> You
>>>>>>>> can #ifdef _LP64 there too. I don't understand why you only set
>>>>>>>> it to
>>>>>>>> true on linux-x64.
>>>>>>>
>>>>>>> I remembered that the original discussion with Brendan Gregg
>>>>>>> mentioned
>>>>>>> only Linux's perf tool as a possible use case for "proper" frame
>>>>>>> pointers. So I was unsure whether to enable proper frame 
>>>>>>> pointers by
>>>>>>> default on other x64 platforms as well.
>>>>>>>
>>>>>>> But if you think it would be better to have proper frame 
>>>>>>> pointers on
>>>>>>> all
>>>>>>> x64 platforms, I will change the code to set 
>>>>>>> PreserveFramePointer to
>>>>>>> true for all x64 platforms. Just please let me know.
>>>>
>>>> The current webrev sets the PreserveFramePointer flag to true on all
>>>> x86_64 platforms and to false on all other platforms.
>>>>
>>>>>>
>>>>>> Currently compiled code for all x86 platforms is almost the same
>>>>>> (win64 has difference in registers usage) and we should keep it that
>>>>>> way.
>>>>>>
>>>>>> Also the original request was to have flag to enable such behavior
>>>>>> (use RBP only as FP). So to have it off by default is acceptable. If
>>>>>> performance group or someone find a regression (or bug) due to this
>>>>>> change we can switch the flag off by default before jdk9 release.
>>>>>>
>>>>>> Try to run pstack on Solaris and jstack on OSX to make sure they
>>>>>> report correct call stack with compiled java methods. And JFR.
>>>>>> Also it would be nice to run SunStudio analyzer to verify that it
>>>>>> works.
>>>>>
>>>>> I ran all tools you've suggested. JFR and jstack are unaffected;
>>>>> pstack produces nice stack traces (it did not always do so before).
>>>>
>>>> I tested the current webrev with the following setup: I used two 
>>>> tests,
>>>> one that generates a long chain of lambda form invocations and an 
>>>> other
>>>> one that generates a long chain of "regular" method invocations. Both
>>>> tests were executed on an x64 machine in four configurations: with +/-
>>>> Xcomp and with +/- PreserveFramePointer.
>>>>
>>>> Just as before, JFR and jstack stack traces are unaffected for both
>>>> tests, pstack can now produce stack traces with both tests if
>>>> PreserveFramePointer is enabled.
>>>>
>>>>> However, I've encountered a problem with SunStudio: Two asserts fail
>>>>> in the fastdebug build. Both of them are "soft" failures, as neither the
>>>>> VM nor SunStudio crashes with the product build. I worked on the problem
>>>>> today and have a partial understanding of the issue, but more
>>>>> investigation is needed to have a patch that preserves the correct
>>>>> behavior of SunStudio as well.
>>>>
>>>> I was able to track down the problems with SunStudio. I had to change
>>>> the code at two places.
>>>>
>>>>
>>>> Change #1 (in src/cpu/x86/vm/frame_x86.cpp):
>>>>
>>>> *** 222,232 ****
>>>>        }
>>>>
>>>>        if (sender_blob->is_nmethod()) {
>>>>            nmethod* nm = sender_blob->as_nmethod_or_null();
>>>>            if (nm != NULL) {
>>>> !             if (nm->is_deopt_mh_entry(sender_pc) || nm->is_deopt_entry(sender_pc)) {
>>>>                    return false;
>>>>                }
>>>>            }
>>>>        }
>>>>
>>>> --- 222,233 ----
>>>>        }
>>>>
>>>>        if (sender_blob->is_nmethod()) {
>>>>            nmethod* nm = sender_blob->as_nmethod_or_null();
>>>>            if (nm != NULL) {
>>>> !             if (nm->is_deopt_mh_entry(sender_pc) || nm->is_deopt_entry(sender_pc) ||
>>>> !                 nm->method()->is_method_handle_intrinsic()) {
>>>>                    return false;
>>>>                }
>>>>            }
>>>>        }
>>>>
>>>> The reason for this change is the following. Method handle intrinsics
>>>> (i.e., the intrinsics _invokeBasic, _linkToVirtual,_linkToStatic,
>>>> _linkToSpecial, and _linkToInterface) do not allocate stack space when
>>>> invoked, but they can extend the stack space of their caller
>>>> "temporarily".
>>>>
>>>> For example, if VerifyMethodHandles is enabled, some stack space is 
>>>> used
>>>> during verification. The temporarily used stack space is released 
>>>> before
>>>> the intrinsic jumps to its target. As a result, the target of a method
>>>> handle intrinsic will have a correct SP when it returns and the
>>>> program's control flow is correct.
>>>>
>>>> Moreover, if the stack is walked synchronously (e.g., at 
>>>> safepoints), no
>>>> problems appear either, because the synchronous interruption can 
>>>> happen
>>>> while execution is within the method handle intrinsic.
>>>>
>>>> The problem is that the SunStudio analyzer can interrupt the VM
>>>> asynchronously and walk the stack. If execution of a thread is
>>>> interrupted while the thread is in a method handle intrinsic, the SP
>>>> might contain an invalid value.
>>>>
>>>> The new webrev adds a check that marks the current frame unsafe for
>>>> sender if the frame belongs to a method handle intrinsic
>>>> (frame::safe_for_sender returns false in this case).
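
To illustrate the check that Change #1 adds, here is a minimal standalone
sketch. It is not the HotSpot code: MethodSketch, NMethodSketch, and
sender_nmethod_is_safe are hypothetical stand-ins, and the two deopt
predicates are reduced to plain booleans. It only models the decision that
a sender whose nmethod is a method handle intrinsic is treated as unsafe.

```cpp
// Hypothetical stand-in for HotSpot's Method; only the predicate used by
// the check is modeled.
struct MethodSketch {
  bool method_handle_intrinsic;
  bool is_method_handle_intrinsic() const { return method_handle_intrinsic; }
};

// Hypothetical stand-in for HotSpot's nmethod.
struct NMethodSketch {
  const MethodSketch* m;
  bool deopt_entry;     // models nm->is_deopt_entry(sender_pc)
  bool deopt_mh_entry;  // models nm->is_deopt_mh_entry(sender_pc)
  const MethodSketch* method() const { return m; }
};

// Returns false when the sender must not be walked asynchronously.
// A null nmethod models "the sender is not an nmethod", in which case
// this particular check does not apply.
bool sender_nmethod_is_safe(const NMethodSketch* nm) {
  if (nm == nullptr) {
    return true;
  }
  if (nm->deopt_mh_entry || nm->deopt_entry ||
      nm->method()->is_method_handle_intrinsic()) {
    return false;  // SP may be temporarily invalid inside the intrinsic
  }
  return true;
}
```

The point of the extra condition is only to be conservative for
asynchronous walkers; synchronous stack walks never observe the
temporarily extended stack.
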
>>>>
>>>>
>>>> Change #2 (in src/share/vm/prims/forte.cpp):
>>>>
>>>> *** 425,435 ****
>>>>
>>>>        RegisterMap map(thd, false);
>>>>        initial_Java_frame = initial_Java_frame.sender(&map);
>>>>      }
>>>>
>>>> !   vframeStreamForte st(thd, initial_Java_frame, false);
>>>>
>>>>      for (; !st.at_end() && count < depth; st.forte_next(), count++) {
>>>>        bci = st.bci();
>>>>        method = st.method();
>>>>
>>>> --- 425,435 ----
>>>>
>>>>        RegisterMap map(thd, false);
>>>>        initial_Java_frame = initial_Java_frame.sender(&map);
>>>>      }
>>>>
>>>> !   vframeStreamForte st(thd, initial_Java_frame, true);
>>>>
>>>>      for (; !st.at_end() && count < depth; st.forte_next(), count++) {
>>>>        bci = st.bci();
>>>>        method = st.method();
>>>>
>>>> The problem is that the following assert in forte.cpp on line 103
>>>>
>>>> assert(filled_in, "invariant");
>>>>
>>>> fails. The problem appears if we have a stack trace like:
>>>>
>>>> V  [libjvm.so+0x1c98c4a]  void VMError::report(outputStream*)+0xb1a
>>>> V  [libjvm.so+0x1c9a3e8]  void VMError::report_and_die()+0x748
>>>> V  [libjvm.so+0x1003c8e]  void report_vm_error(const char*,int,const char*,const char*)+0x7e
>>>> V  [libjvm.so+0x10efa22]  vframeStreamForte::vframeStreamForte #Nvariant 1(JavaThread*,frame,bool)+0xe2
>>>> --> (Frame #5) V  [libjvm.so+0x10f0bb9]  void forte_fill_call_trace_given_top(JavaThread*,ASGCT_CallTrace*,int,frame)+0x789
>>>> V  [libjvm.so+0x10f1436]  AsyncGetCallTrace+0x246
>>>> C  [libcollector.so+0x272a8]  __collector_ext_jstack_unwind+0xb8
>>>> C  [libcollector.so+0x277df]  __collector_get_frame_info+0x27f
>>>> C  [libcollector.so+0x2f093]  __collector_getUserCtx+0x13
>>>> C  [libcollector.so+0x1abc7]  __collector_ext_profile_handler+0x127
>>>> C  [libcollector.so+0x17535]  collector_sigprof_dispatcher+0x85
>>>> C  [libc.so.1+0x122476]  __sighndlr+0x6
>>>> C  [libc.so.1+0x115972]  call_user_handler+0x2ce
>>>> C  [libc.so.1+0x115e1b]  sigacthandler+0xdb
>>>> C  0xffffffffffffffff
>>>> V  [libjvm.so+0x1959089]  void os::PlatformEvent::park()+0xd9
>>>> V  [libjvm.so+0x18a6b34]  int ParkCommon(ParkEvent*,long)+0x34
>>>> V  [libjvm.so+0x18a7657]  int Monitor::IWait(Thread*,long)+0xb7
>>>> V  [libjvm.so+0x18a8b86]  bool Monitor::wait(bool,long,bool)+0x346
>>>> V  [libjvm.so+0xf7073d]  void CompileBroker::wait_for_completion(CompileTask*)+0xad
>>>> V  [libjvm.so+0xf6f6b6]  void CompileBroker::compile_method_base(methodHandle,int,int,methodHandle,int,const char*,Thread*)+0x406
>>>> V  [libjvm.so+0xf6fd96]  nmethod*CompileBroker::compile_method(methodHandle,int,int,methodHandle,int,const char*,Thread*)+0x586
>>>> V  [libjvm.so+0xbbad72]  void AdvancedThresholdPolicy::submit_compile(methodHandle,int,CompLevel,JavaThread*)+0xb2
>>>> V  [libjvm.so+0x1aef92f]  void SimpleThresholdPolicy::compile(methodHandle,int,CompLevel,JavaThread*)+0x14f
>>>> V  [libjvm.so+0xbbb00f]  void AdvancedThresholdPolicy::method_invocation_event(methodHandle,methodHandle,CompLevel,nmethod*,JavaThread*)+0x1ff
>>>> V  [libjvm.so+0x1aef765]  nmethod*SimpleThresholdPolicy::event(methodHandle,methodHandle,int,int,CompLevel,nmethod*,JavaThread*)+0x2e5
>>>> V  [libjvm.so+0xdb93bc]  unsigned char*Runtime1::counter_overflow(JavaThread*,int,Method*)+0x31c
>>>> v  ~RuntimeStub::counter_overflow Runtime1 stub
>>>> --> (Frame #29) J 143 C1 java.net.URLClassLoader$1.run()Ljava/lang/Object; (5 bytes) @ 0xffff80ffacc6962a [0xffff80ffacc69580+0xaa]
>>>> --> (Frame #30) v  ~StubRoutines::call_stub
>>>> V  [libjvm.so+0x13ca50b]  void JavaCalls::call_helper(JavaValue*,methodHandle*,JavaCallArguments*,Thread*)+0x41b
>>>> V  [libjvm.so+0x152a111]  JVM_DoPrivileged+0xfb1
>>>> C  [libjava.so+0x12f42]  Java_java_security_AccessController_doPrivileged__Ljava_security_PrivilegedExceptionAction_2Ljava_security_AccessControlContext_2+0x12
>>>> J 142 java.security.AccessController.doPrivileged(Ljava/security/PrivilegedExceptionAction;Ljava/security/AccessControlContext;)Ljava/lang/Object; (0 bytes) @ 0xffff80ffb400c57c [0xffff80ffb400c420+0x15c]
>>>> J 134 C1 java.net.URLClassLoader.findClass(Ljava/lang/String;)Ljava/lang/Class; (47 bytes) @ 0xffff80ffacc66014 [0xffff80ffacc65ec0+0x154]
>>>> ... more stack frames
>>>>
>>>> The forte_fill_call_trace_given_top() method (Frame #5) first 
>>>> checks if
>>>> the first Java frame found is fully decipherable (line 395 in
>>>> forte.cpp). In our case the first Java frame is Frame #29 (the
>>>> C1-compiled version of java.net.URLClassLoader$1.run).
>>>>
>>>> In our case Frame #29 is not decipherable, because
>>>> java.net.URLClassLoader$1.run has been made "not entrant" (a 
>>>> C2-compiled
>>>> version of the same method has been produced shortly before).
>>>>
>>>> Afterwards, forte_fill_call_trace_given_top() checks if the method is
>>>> "safe for sender" (line 424 in forte.cpp). The caller of the
>>>> java.net.URLClassLoader$1.run method is ~StubRoutines::call_stub, 
>>>> which
>>>> is considered "safe for sender" by the VM.
>>>>
>>>> Then, initial_Java_frame is set to the ~StubRoutines::call_stub stub
>>>> (line 430). This does not seem to be correct because the stub is not a
>>>> Java method and causes the assert(filled_in, "invariant") in the
>>>> constructor of vframeStreamForte (line 103 in forte.cpp) to fail
>>>> (because the frame cannot be filled from a stub).
>>>>
>>>> To avoid this failure, I propose to call the constructor of
>>>> vframeStreamForte with parameter stop_at_java_call_stub set to true
>>>> (instead of false) so that the VM stops walking the stack if a call 
>>>> stub
>>>> has been reached.
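
The effect of the stop_at_java_call_stub parameter, as described above, can
be modeled with a small standalone sketch. This is a simplified model and
not the HotSpot sources: FrameKind, StreamState, and init_stream are
hypothetical names, and only the initial-frame case from the failing
scenario is covered.

```cpp
// Hypothetical classification of the frame the stream starts at.
enum class FrameKind { JavaMethod, CallStub };

// Hypothetical outcome of constructing the stream.
enum class StreamState {
  AtJavaFrame,  // filled in from a Java frame
  AtEnd,        // walk ended cleanly at the call stub
  NotFilledIn   // the case in which assert(filled_in, "invariant") fires
};

// Models what happens when the stream is constructed on 'initial'.
StreamState init_stream(FrameKind initial, bool stop_at_java_call_stub) {
  if (initial == FrameKind::JavaMethod) {
    return StreamState::AtJavaFrame;
  }
  // The initial frame is a call stub: it cannot be filled in as a Java
  // frame, so the only clean outcome is to end the walk there.
  return stop_at_java_call_stub ? StreamState::AtEnd
                                : StreamState::NotFilledIn;
}
```

With the parameter set to true, starting at ~StubRoutines::call_stub yields
an empty but valid walk instead of a failed invariant.
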
>>>>
>>>>
>>>> Here is the updated webrev:
>>>>
>>>> http://cr.openjdk.java.net/~zmajo/8068945/webrev.02/
>>>>
>>>> In addition to testing the changeset with the tools mentioned 
>>>> before, I
>>>> executed
>>>> - all JPRT tests, all pass;
>>>> - all java/lang/invoke and compiler JTREG tests; all tests that pass
>>>> with the unmodified source tree pass with the changes as well.
>>>>
>>>> Thank you very much in advance!
>>>>
>>>> Best regards,
>>>>
>>>>
>>>> Zoltan
>>>>
>>>>>
>>>>> So that will put this RFR on hold for a while, unfortunately.
>>>>>
>>>>> Thank you for the feedback and suggestions so far!
>>>>>
>>>>> Best regards,
>>>>>
>>>>>
>>>>> Zoltan
>>>>>
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Vladimir
>>>>>>
>>>>>>>
>>>>>>> Thank you!
>>>>>>>
>>>>>>> Best regards,
>>>>>>>
>>>>>>>
>>>>>>> Zoltan
>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Vladimir
>>>>>>>>
>>>>>>>> On 3/30/15 8:30 AM, Zoltán Majó wrote:
>>>>>>>>> Hi Ed,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> thank you for your feedback! Please see comments below.
>>>>>>>>>
>>>>>>>>> On 03/30/2015 11:11 AM, Edward Nevill wrote:
>>>>>>>>>> Hi Zoltán,
>>>>>>>>>>
>>>>>>>>>> On Fri, 2015-03-27 at 15:34 +0100, Zoltán Majó wrote:
>>>>>>>>>>> Full JPRT run, all tests pass. I also ran all hotspot compiler
>>>>>>>>>>> tests and
>>>>>>>>>>> the jdk tests in java/lang/invoke on both x86_64 and x86_32. 
>>>>>>>>>>> All
>>>>>>>>>>> tests
>>>>>>>>>>> that pass without the patch pass also with the patch.
>>>>>>>>>>>
>>>>>>>>>>> I ran the SPEC JVM 2008 benchmarks on our performance
>>>>>>>>>>> infrastructure for
>>>>>>>>>>> x86_64. The performance evaluation suggests that there is no
>>>>>>>>>>> statistically significant performance degradation due to having
>>>>>>>>>>> proper
>>>>>>>>>>> frame pointers. Therefore I propose to have OmitFramePointer
>>>>>>>>>>> set to
>>>>>>>>>>> false by default on x86_64 (and set to true on all other
>>>>>>>>>>> platforms).
>>>>>>>>>> This patch looks good, however I think there is a problem 
>>>>>>>>>> with the
>>>>>>>>>> logic of OmitFramePointer.
>>>>>>>>>>
>>>>>>>>>> Here is my test case.
>>>>>>>>>>
>>>>>>>>>> --- CUT HERE ---
>>>>>>>>>> // $Id: fibo.java,v 1.2 2000/12/24 19:10:50 doug Exp $
>>>>>>>>>> // http://www.bagley.org/~doug/shootout/
>>>>>>>>>>
>>>>>>>>>> public class fibo {
>>>>>>>>>>      public static void main(String args[]) {
>>>>>>>>>>     int N = Integer.parseInt(args[0]);
>>>>>>>>>>     System.out.println(fib(N));
>>>>>>>>>>      }
>>>>>>>>>>      public static int fib(int n) {
>>>>>>>>>>     if (n < 2) return(1);
>>>>>>>>>>     return( fib(n-2) + fib(n-1) );
>>>>>>>>>>      }
>>>>>>>>>> }
>>>>>>>>>> --- CUT HERE ---
>>>>>>>>>>
>>>>>>>>>> If I run it as follows on my x86 64 bit linux.
>>>>>>>>>>
>>>>>>>>>> /work/images/jdk/bin/java -XX:-TieredCompilation
>>>>>>>>>> -XX:+PrintCompilation
>>>>>>>>>> -XX:CompileOnly=fibo::fib -XX:+UnlockDiagnosticVMOptions
>>>>>>>>>> -XX:-OmitFramePointer -XX:+PrintAssembly fibo 43
>>>>>>>>>>
>>>>>>>>>> I get
>>>>>>>>>>
>>>>>>>>>>    # {method} {0x00007fc62c97f388} 'fib' '(I)I' in 'fibo'
>>>>>>>>>>    # parm0:    rsi       = int
>>>>>>>>>>    #           [sp+0x30]  (sp of caller)
>>>>>>>>>>    0x00007fc625071100: mov %eax,-0x14000(%rsp)
>>>>>>>>>>    0x00007fc625071107: push   %rbp
>>>>>>>>>>    0x00007fc625071108: mov    %rsp,%rbp
>>>>>>>>>>    0x00007f836907110b: sub    $0x20,%rsp ;*synchronization entry
>>>>>>>>>>
>>>>>>>>>> which is correct, it is NOT(-) OmitFramePointer, therefore it is
>>>>>>>>>> using
>>>>>>>>>> the frame pointer
>>>>>>>>>>
>>>>>>>>>> Now if I try just changing -XX:-OmitFramePointer to
>>>>>>>>>> -XX:+OmitFramePointer in the above I get
>>>>>>>>>>
>>>>>>>>>> /work/images/jdk/bin/java -XX:-TieredCompilation
>>>>>>>>>> -XX:+PrintCompilation
>>>>>>>>>> -XX:CompileOnly=fibo::fib -XX:+UnlockDiagnosticVMOptions
>>>>>>>>>> -XX:+OmitFramePointer -XX:+PrintAssembly fibo 43
>>>>>>>>>>
>>>>>>>>>> I get
>>>>>>>>>>
>>>>>>>>>>    # {method} {0x00007f14d3c00388} 'fib' '(I)I' in 'fibo'
>>>>>>>>>>    # parm0:    rsi       = int
>>>>>>>>>>    #           [sp+0x30]  (sp of caller)
>>>>>>>>>>    0x00007f14e1071100: mov %eax,-0x14000(%rsp)
>>>>>>>>>>    0x00007f14e1071107: push   %rbp
>>>>>>>>>>    0x00007f14e1071108: sub    $0x20,%rsp ;*synchronization entry
>>>>>>>>>>
>>>>>>>>>> which is correct, it is ID(+) OmitFramePointer, therefore it 
>>>>>>>>>> does
>>>>>>>>>> not
>>>>>>>>>> use a frame pointer.
>>>>>>>>>>
>>>>>>>>>> However, if I now delete the -XX:+/-OmitFramePointer
>>>>>>>>>> altogether, IE
>>>>>>>>>>
>>>>>>>>>> /work/images/jdk/bin/java -XX:-TieredCompilation
>>>>>>>>>> -XX:+PrintCompilation
>>>>>>>>>> -XX:CompileOnly=fibo::fib -XX:+UnlockDiagnosticVMOptions
>>>>>>>>>> -XX:+PrintAssembly fibo 43
>>>>>>>>>>
>>>>>>>>>> I get
>>>>>>>>>>
>>>>>>>>>>    # {method} {0x00007f0c4b730388} 'fib' '(I)I' in 'fibo'
>>>>>>>>>>    # parm0:    rsi       = int
>>>>>>>>>>    #           [sp+0x30]  (sp of caller)
>>>>>>>>>>    0x00007f0c75071100: mov %eax,-0x14000(%rsp)
>>>>>>>>>>    0x00007f0c75071107: push   %rbp
>>>>>>>>>>    0x00007f0c75071108: sub    $0x20,%rsp ;*synchronization entry
>>>>>>>>>>
>>>>>>>>>> It is not using a frame pointer which is the equivalent of
>>>>>>>>>> -XX:+OmitFramePointer. However in your description above you say
>>>>>>>>>>
>>>>>>>>>>> Therefore I propose to have OmitFramePointer set to false by
>>>>>>>>>>> default
>>>>>>>>>>> on x86_64 (and set to true on all other platforms).
>>>>>>>>>> whereas OmitFramePointer actually seems to be set to true on
>>>>>>>>>> x86_64
>>>>>>>>>>
>>>>>>>>>> I think the problem may be with the declaration and 
>>>>>>>>>> definition of
>>>>>>>>>> OmitFramePointer in globals.hpp and globals_x86.hpp
>>>>>>>>>>
>>>>>>>>>> In globals.hpp it does
>>>>>>>>>>
>>>>>>>>>> product(bool, OmitFramePointer, true,
>>>>>>>>>>
>>>>>>>>>> In globals_x86.hpp it does
>>>>>>>>>>
>>>>>>>>>> LP64_ONLY(define_pd_global(bool, OmitFramePointer, false););
>>>>>>>>>>
>>>>>>>>>> I am not sure that you can mix product(...) and product_pd(...)
>>>>>>>>>> like
>>>>>>>>>> this, so I think it just ends up getting the default from the
>>>>>>>>>> product(...).
>>>>>>>>>
>>>>>>>>> You are right, mixing product and product_pd does not make sense
>>>>>>>>> at all.
>>>>>>>>> Thank you for doing additional testing and for drawing attention
>>>>>>>>> to the
>>>>>>>>> problem.
>>>>>>>>>
>>>>>>>>> I updated the code to use product_pd and define_pd_global on all
>>>>>>>>> relevant platforms.
>>>>>>>>>
>>>>>>>>>> Aside: In general, I do not like options which include a
>>>>>>>>>> negative in
>>>>>>>>>> them because I have to do a double think when I see something
>>>>>>>>>> like,
>>>>>>>>>> -XX:-OmitFramePointer, as in, it is omitting the frame pointer,
>>>>>>>>>> therefore it is using a frame pointer. How about FramePointer
>>>>>>>>>> so we
>>>>>>>>>> have -XX:+FramePointer to say I want frame pointers and
>>>>>>>>>> -XX:-FramePointer to say I don't.
>>>>>>>>>
>>>>>>>>> That is a good idea. Double negation is an unnecessary
>>>>>>>>> complication, so
>>>>>>>>> I changed the name of the flag to FramePointer, just as you
>>>>>>>>> suggested.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I did some timing on the above 'fibo' test
>>>>>>>>>>
>>>>>>>>>> [ed at mylittlepony java]$ time /work/images/jdk/bin/java
>>>>>>>>>> -XX:-OmitFramePointer fibo 43
>>>>>>>>>> 701408733
>>>>>>>>>>
>>>>>>>>>> real    0m1.545s
>>>>>>>>>> user    0m1.571s
>>>>>>>>>> sys    0m0.015s
>>>>>>>>>> [ed at mylittlepony java]$ time /work/images/jdk/bin/java
>>>>>>>>>> -XX:+OmitFramePointer fibo 43
>>>>>>>>>> 701408733
>>>>>>>>>>
>>>>>>>>>> real    0m1.504s
>>>>>>>>>> user    0m1.527s
>>>>>>>>>> sys    0m0.019s
>>>>>>>>>>
>>>>>>>>>> which is ~3% difference on this test case. On aarch64, I see ~7%
>>>>>>>>>> difference on this test case.
>>>>>>>>>
>>>>>>>>> Thank you for the performance measurements!
>>>>>>>>>
>>>>>>>>>> With the above change to fix the logic of OmitFramePointer (and
>>>>>>>>>> possibly change its name) the patch looks good to me.
>>>>>>>>>
>>>>>>>>> Here is the updated webrev (the same webrev that was already
>>>>>>>>> included
>>>>>>>>> into my reply to Roland):
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~zmajo/8068945/webrev.01/
>>>>>>>>>
>>>>>>>>>> I will prepare a mirror patch for aarch64.
>>>>>>>>>
>>>>>>>>> That would be great!
>>>>>>>>>
>>>>>>>>> Thank you and best regards,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Zoltán
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> All the best,
>>>>>>>>>> Ed.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>
>>>>
>>


From vladimir.kozlov at oracle.com  Fri Apr 24 17:52:25 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 24 Apr 2015 10:52:25 -0700
Subject: [9] RFR(S): 8068945: Use RBP register as proper frame pointer
	in JIT compiled code on x86
In-Reply-To: <553A7B9A.5060407@oracle.com>
References: <55156A87.1070607@oracle.com>	<1427706703.1606.22.camel@mylittlepony.linaroharston>	<55196C2C.8080106@oracle.com>	<5519B1AE.8070901@oracle.com>	<5519BC6E.1090504@oracle.com>	<5519C29D.8080200@oracle.com>	<551BF4D3.90805@oracle.com>	<5537CBAE.9020500@oracle.com>	<553816F9.1040104@oracle.com>	<553A2DC8.3080804@oracle.com>	<553A6ECE.8090908@oracle.com>
	<553A7B9A.5060407@oracle.com>
Message-ID: <553A82D9.1090500@oracle.com>

 > - (2) the assert is there only for SunStudio (according to the comments
 > on lines 439--442 in vframe.cpp) and AsyncGetCallTrace (the only place
 > where forte_fill_call_trace_given_top() is called) does not expect debug
 > information to be available for all frames, it just needs the jmethodID for
 > every frame.

This is an incorrect interpretation of the comments. The assert is in general
code to catch an incorrect bci or pcoffset during vframe construction. The
comment says that instead of having several asserts we have only one for
all these cases, to simplify disabling it when running the performance analyzer.

So you can only skip it when you run with the performance analyzer, but not
during general execution.

Maybe you can set or pass a flag when vframe construction is called from
forte and skip the assert in that case. I assume you hit it only with the
performance analyzer. Right?
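
As a rough illustration of the flag idea, here is a standalone sketch. It is
not HotSpot code: the class FrameStreamSketch, the called_from_forte
parameter, and the assert message are made up for this example. It only
shows the shape of "keep the assert for general execution, skip it when the
stream was constructed from forte".

```cpp
#include <cassert>

// Hypothetical stand-in for the vframe stream; the flag records who
// constructed it.
class FrameStreamSketch {
 public:
  explicit FrameStreamSketch(bool called_from_forte)
      : _called_from_forte(called_from_forte), _bad_frames(0) {}

  // Models the single cut point for bad bci / pcoffset cases.
  // In general execution the assert still fires (debug builds only);
  // when called from forte the bad frame is just counted.
  void found_bad_method_frame() {
    assert(_called_from_forte && "bad method frame outside of forte");
    ++_bad_frames;
  }

  int bad_frames() const { return _bad_frames; }

 private:
  bool _called_from_forte;
  int _bad_frames;
};
```

A stream constructed with the flag set tolerates a bad frame and can keep
reporting whatever information it has; a stream constructed without it
behaves as before in a fastdebug build.
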

Thanks,
Vladimir

On 4/24/15 10:21 AM, Zoltán Majó wrote:
> Hi Vladimir,
>
>
> On 04/24/2015 06:26 PM, Vladimir Kozlov wrote:
>> Yes, this looks good. Nice work and thank you for testing.
>
> Thank you for all the feedback you've provided while I was working on
> this issue!
>
> I would have one final question: Can I disable the assert in
> vframeStreamCommon::found_bad_method_frame() on line 443 in vframe.cpp?
> I would prefer that for the reasons I mentioned in my previous message
> (please see also below).
>
> Here is an updated webrev (with the assert disabled):
> http://cr.openjdk.java.net/~zmajo/8068945/webrev.04/
>
> Could you please let me know which webrev I can push, webrev.03 with the
> assert left enabled or webrev.04 with the assert disabled?
>
> Thank you and best regards,
>
>
> Zoltan
>
>>> I ran the experiments *with the assert enabled* in
>>> vframeStreamCommon::found_bad_method_frame(). The assert was not
>>> triggered.
>>>
>>> I would be, however, inclined to disable that assert because:
>>> - (1) it is better to have some information in stack traces than no
>>> information and a crash (even though the crash happens in a fastdebug
>>> build);
>>> - (2) the assert is there only for SunStudio (according to the comments
>>> on lines 439--442 in vframe.cpp) and AsyncGetCallTrace (the only place
>>> where forte_fill_call_trace_given_top() is called) does not expect debug
>>> information to be available for all frames, it just needs the jmethodID for
>>> every frame.
>>>
>>>> Do you mean "can NOT happen"?:
>>>> > Moreover, if the stack is walked synchronously (e.g., at
>>>> safepoints), no
>>>> > problems appear either, because the synchronous interruption can
>>>> happen
>>>> > while execution is within the method handle intrinsic.
>>>
>>> Yes, I meant that it can *not* happen. Sorry for the confusion.
>>>
>>> Thank you and best regards,
>>>
>>>
>>> Zoltan
>>>
>>>>
>>>> Vladimir
>>>>
>>>> On 4/22/15 9:26 AM, Zoltán Majó wrote:
>>>>> Hi Vladimir,
>>>>>
>>>>>
>>>>> I managed to do some more work on this enhancement. Please see details
>>>>> below.
>>>>>
>>>>> On 04/01/2015 03:38 PM, Zoltán Majó wrote:
>>>>>> Hi Vladimir,
>>>>>>
>>>>>>
>>>>>> On 03/30/2015 11:39 PM, Vladimir Kozlov wrote:
>>>>>>> On 3/30/15 2:13 PM, Zoltán Majó wrote:
>>>>>>>> Hi Vladimir,
>>>>>>>>
>>>>>>>>
>>>>>>>> thank you for the feedback!
>>>>>>>>
>>>>>>>> On 03/30/2015 10:27 PM, Vladimir Kozlov wrote:
>>>>>>>>> How about PreserveFramePointer instead of simple FramePointer?
>>>>>>>>>
>>>>>>>>> PreserveFramePointer will mean that compiled (or other) code will
>>>>>>>>> use
>>>>>>>>> that register only as Frame pointer.
>>>>>>>>
>>>>>>>> I will change the flag's name to PreserveFramePointer and will also
>>>>>>>> update the description.
>>>>>
>>>>> I changed the flag's name to PreserveFramePointer, just as you
>>>>> suggested.
>>>>>
>>>>>>>>
>>>>>>>>> Zoltan, x86 flags setting should be in general globals_x86.hpp.
>>>>>>>>> You
>>>>>>>>> can #ifdef _LP64 there too. I don't understand why you only set
>>>>>>>>> it to
>>>>>>>>> true on linux-x64.
>>>>>>>>
>>>>>>>> I remembered that the original discussion with Brendan Gregg
>>>>>>>> mentioned
>>>>>>>> only Linux's perf tool as a possible use case for "proper" frame
>>>>>>>> pointers. So I was unsure whether to enable proper frame
>>>>>>>> pointers by
>>>>>>>> default on other x64 platforms as well.
>>>>>>>>
>>>>>>>> But if you think it would be better to have proper frame
>>>>>>>> pointers on
>>>>>>>> all
>>>>>>>> x64 platforms, I will change the code to set
>>>>>>>> PreserveFramePointer to
>>>>>>>> true for all x64 platforms. Just please let me know.
>>>>>
>>>>> The current webrev sets the PreserveFramePointer flag to true on all
>>>>> x86_64 platforms and to false on all other platforms.
>>>>>
>>>>>>>
>>>>>>> Currently compiled code for all x86 platforms is almost the same
>>>>>>> (win64 has difference in registers usage) and we should keep it that
>>>>>>> way.
>>>>>>>
>>>>>>> Also the original request was to have flag to enable such behavior
>>>>>>> (use RBP only as FP). So to have it off by default is acceptable. If
>>>>>>> performance group or someone find a regression (or bug) due to this
>>>>>>> change we can switch the flag off by default before jdk9 release.
>>>>>>>
>>>>>>> Try to run pstack on Solaris and jstack on OSX to make sure they
>>>>>>> report correct call stack with compiled java methods. And JFR.
>>>>>>> Also it would be nice to run SunStudio analyzer to verify that it
>>>>>>> works.
>>>>>>
>>>>>> I ran all tools you've suggested. JFR and jstack are unaffected;
>>>>>> pstack produces nice stack traces (it did not always do so before).
>>>>>
>>>>> I tested the current webrev with the following setup: I used two
>>>>> tests,
>>>>> one that generates a long chain of lambda form invocations and an
>>>>> other
>>>>> one that generates a long chain of "regular" method invocations. Both
>>>>> tests were executed on an x64 machine in four configurations: with +/-
>>>>> Xcomp and with +/- PreserveFramePointer.
>>>>>
>>>>> Just as before, JFR and jstack stack traces are unaffected for both
>>>>> tests, pstack can now produce stack traces with both tests if
>>>>> PreserveFramePointer is enabled.
>>>>>
>>>>>> However, I've encountered a problem with SunStudio: Two asserts fail
>>>>>> in the fastdebug build. Both of them are "soft" failures, as neither the
>>>>>> VM nor SunStudio crashes with the product build. I worked on the problem
>>>>>> today and have a partial understanding of the issue, but more
>>>>>> investigation is needed to have a patch that preserves the correct
>>>>>> behavior of SunStudio as well.
>>>>>
>>>>> I was able to track down the problems with SunStudio. I had to change
>>>>> the code at two places.
>>>>>
>>>>>
>>>>> Change #1 (in src/cpu/x86/vm/frame_x86.cpp):
>>>>>
>>>>> *** 222,232 ****
>>>>>        }
>>>>>
>>>>>        if (sender_blob->is_nmethod()) {
>>>>>            nmethod* nm = sender_blob->as_nmethod_or_null();
>>>>>            if (nm != NULL) {
>>>>> !             if (nm->is_deopt_mh_entry(sender_pc) || nm->is_deopt_entry(sender_pc)) {
>>>>>                    return false;
>>>>>                }
>>>>>            }
>>>>>        }
>>>>>
>>>>> --- 222,233 ----
>>>>>        }
>>>>>
>>>>>        if (sender_blob->is_nmethod()) {
>>>>>            nmethod* nm = sender_blob->as_nmethod_or_null();
>>>>>            if (nm != NULL) {
>>>>> !             if (nm->is_deopt_mh_entry(sender_pc) || nm->is_deopt_entry(sender_pc) ||
>>>>> !                 nm->method()->is_method_handle_intrinsic()) {
>>>>>                    return false;
>>>>>                }
>>>>>            }
>>>>>        }
>>>>>
>>>>> The reason for this change is the following. Method handle intrinsics
>>>>> (i.e., the intrinsics _invokeBasic, _linkToVirtual,_linkToStatic,
>>>>> _linkToSpecial, and _linkToInterface) do not allocate stack space when
>>>>> invoked, but they can extend the stack space of their caller
>>>>> "temporarily".
>>>>>
>>>>> For example, if VerifyMethodHandles is enabled, some stack space is
>>>>> used
>>>>> during verification. The temporarily used stack space is released
>>>>> before
>>>>> the intrinsic jumps to its target. As a result, the target of a method
>>>>> handle intrinsic will have a correct SP when it returns and the
>>>>> program's control flow is correct.
>>>>>
>>>>> Moreover, if the stack is walked synchronously (e.g., at
>>>>> safepoints), no
>>>>> problems appear either, because the synchronous interruption can
>>>>> happen
>>>>> while execution is within the method handle intrinsic.
>>>>>
>>>>> The problem is that the SunStudio analyzer can interrupt the VM
>>>>> asynchronously and walk the stack. If execution of a thread is
>>>>> interrupted while the thread is in a method handle intrinsic, the SP
>>>>> might contain an invalid value.
>>>>>
>>>>> The new webrev adds a check that marks the current frame unsafe for
>>>>> sender if the frame belongs to a method handle intrinsic
>>>>> (frame::safe_for_sender returns false in this case).
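
To make the effect of Change #1 concrete, here is a minimal sketch of the decision it adds. This is a toy model and not the HotSpot code: ToySenderInfo and toy_safe_for_sender are hypothetical names, and the three booleans stand in for the results of nm->is_deopt_entry(sender_pc), nm->is_deopt_mh_entry(sender_pc) and nm->method()->is_method_handle_intrinsic() from the diff above.

```cpp
#include <cassert>

// Toy stand-in for the properties of the sender's nmethod that the check
// inspects. All names here are hypothetical, not HotSpot's.
struct ToySenderInfo {
    bool at_deopt_entry;              // sender_pc is a deopt entry
    bool at_deopt_mh_entry;           // sender_pc is a method handle deopt entry
    bool is_method_handle_intrinsic;  // sender is _invokeBasic, _linkTo*, ...
};

// Returns false when an asynchronous stack walker should not step to the
// sender, mirroring the shape of the extended condition in the diff.
bool toy_safe_for_sender(const ToySenderInfo& s) {
    if (s.at_deopt_mh_entry || s.at_deopt_entry ||
        s.is_method_handle_intrinsic) {
        return false;
    }
    return true;
}
```

With this shape, a sender that is a method handle intrinsic is rejected even when neither deopt condition holds, which is the only behavioral difference the change is meant to introduce.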
>>>>>
>>>>>
>>>>> Change #2 (in src/share/vm/prims/forte.cpp):
>>>>>
>>>>> *** 425,435 ****
>>>>>
>>>>>        RegisterMap map(thd, false);
>>>>>        initial_Java_frame = initial_Java_frame.sender(&map);
>>>>>      }
>>>>>
>>>>> !   vframeStreamForte st(thd, initial_Java_frame, false);
>>>>>
>>>>>      for (; !st.at_end() && count < depth; st.forte_next(), count++) {
>>>>>        bci = st.bci();
>>>>>        method = st.method();
>>>>>
>>>>> --- 425,435 ----
>>>>>
>>>>>        RegisterMap map(thd, false);
>>>>>        initial_Java_frame = initial_Java_frame.sender(&map);
>>>>>      }
>>>>>
>>>>> !   vframeStreamForte st(thd, initial_Java_frame, true);
>>>>>
>>>>>      for (; !st.at_end() && count < depth; st.forte_next(), count++) {
>>>>>        bci = st.bci();
>>>>>        method = st.method();
>>>>>
>>>>> The problem is that the following assert in forte.cpp on line 103
>>>>>
>>>>> assert(filled_in, "invariant");
>>>>>
>>>>> fails. The problem appears if we have a stack trace like:
>>>>>
>>>>> V  [libjvm.so+0x1c98c4a]  void VMError::report(outputStream*)+0xb1a
>>>>> V  [libjvm.so+0x1c9a3e8]  void VMError::report_and_die()+0x748
>>>>> V  [libjvm.so+0x1003c8e]  void report_vm_error(const char*,int,const
>>>>> char*,const char*)+0x7e
>>>>> V  [libjvm.so+0x10efa22] vframeStreamForte::vframeStreamForte
>>>>> #Nvariant
>>>>> 1(JavaThread*,frame,bool)+0xe2
>>>>> --> (Frame #5) V  [libjvm.so+0x10f0bb9]  void
>>>>> forte_fill_call_trace_given_top(JavaThread*,ASGCT_CallTrace*,int,frame)+0x789
>>>>>
>>>>>
>>>>>
>>>>> V  [libjvm.so+0x10f1436]  AsyncGetCallTrace+0x246
>>>>> C  [libcollector.so+0x272a8] __collector_ext_jstack_unwind+0xb8
>>>>> C  [libcollector.so+0x277df] __collector_get_frame_info+0x27f
>>>>> C  [libcollector.so+0x2f093]  __collector_getUserCtx+0x13
>>>>> C  [libcollector.so+0x1abc7] __collector_ext_profile_handler+0x127
>>>>> C  [libcollector.so+0x17535] collector_sigprof_dispatcher+0x85
>>>>> C  [libc.so.1+0x122476]  __sighndlr+0x6
>>>>> C  [libc.so.1+0x115972]  call_user_handler+0x2ce
>>>>> C  [libc.so.1+0x115e1b]  sigacthandler+0xdb
>>>>> C  0xffffffffffffffff
>>>>> V  [libjvm.so+0x1959089]  void os::PlatformEvent::park()+0xd9
>>>>> V  [libjvm.so+0x18a6b34]  int ParkCommon(ParkEvent*,long)+0x34
>>>>> V  [libjvm.so+0x18a7657]  int Monitor::IWait(Thread*,long)+0xb7
>>>>> V  [libjvm.so+0x18a8b86]  bool Monitor::wait(bool,long,bool)+0x346
>>>>> V  [libjvm.so+0xf7073d]  void
>>>>> CompileBroker::wait_for_completion(CompileTask*)+0xad
>>>>> V  [libjvm.so+0xf6f6b6]  void
>>>>> CompileBroker::compile_method_base(methodHandle,int,int,methodHandle,int,const
>>>>>
>>>>>
>>>>> char*,Thread*)+0x406
>>>>> V  [libjvm.so+0xf6fd96]
>>>>> nmethod*CompileBroker::compile_method(methodHandle,int,int,methodHandle,int,const
>>>>>
>>>>>
>>>>> char*,Thread*)+0x586
>>>>> V  [libjvm.so+0xbbad72]  void
>>>>> AdvancedThresholdPolicy::submit_compile(methodHandle,int,CompLevel,JavaThread*)+0xb2
>>>>>
>>>>>
>>>>>
>>>>> V  [libjvm.so+0x1aef92f]  void
>>>>> SimpleThresholdPolicy::compile(methodHandle,int,CompLevel,JavaThread*)+0x14f
>>>>>
>>>>>
>>>>>
>>>>> V  [libjvm.so+0xbbb00f]  void
>>>>> AdvancedThresholdPolicy::method_invocation_event(methodHandle,methodHandle,CompLevel,nmethod*,JavaThread*)+0x1ff
>>>>>
>>>>>
>>>>>
>>>>> V  [libjvm.so+0x1aef765]
>>>>> nmethod*SimpleThresholdPolicy::event(methodHandle,methodHandle,int,int,CompLevel,nmethod*,JavaThread*)+0x2e5
>>>>>
>>>>>
>>>>>
>>>>> V  [libjvm.so+0xdb93bc]  unsigned
>>>>> char*Runtime1::counter_overflow(JavaThread*,int,Method*)+0x31c
>>>>> v  ~RuntimeStub::counter_overflow Runtime1 stub
>>>>> --> (Frame #29) J 143 C1
>>>>> java.net.URLClassLoader$1.run()Ljava/lang/Object; (5 bytes) @
>>>>> 0xffff80ffacc6962a [0xffff80ffacc69580+0xaa]
>>>>> --> (Frame #30) v  ~StubRoutines::call_stub
>>>>> V  [libjvm.so+0x13ca50b]  void
>>>>> JavaCalls::call_helper(JavaValue*,methodHandle*,JavaCallArguments*,Thread*)+0x41b
>>>>>
>>>>>
>>>>>
>>>>> V  [libjvm.so+0x152a111]  JVM_DoPrivileged+0xfb1
>>>>> C  [libjava.so+0x12f42]
>>>>> Java_java_security_AccessController_doPrivileged__Ljava_security_PrivilegedExceptionAction_2Ljava_security_AccessControlContext_2+0x12
>>>>>
>>>>>
>>>>>
>>>>> J 142
>>>>> java.security.AccessController.doPrivileged(Ljava/security/PrivilegedExceptionAction;Ljava/security/AccessControlContext;)Ljava/lang/Object;
>>>>>
>>>>>
>>>>> (0 bytes) @ 0xffff80ffb400c57c [0xffff80ffb400c420+0x15c]
>>>>> J 134 C1
>>>>> java.net.URLClassLoader.findClass(Ljava/lang/String;)Ljava/lang/Class;
>>>>> (47 bytes) @ 0xffff80ffacc66014 [0xffff80ffacc65ec0+0x154]
>>>>> ... more stack frames
>>>>>
>>>>> The forte_fill_call_trace_given_top() method (Frame #5) first
>>>>> checks if
>>>>> the first Java frame found is fully decipherable (line 395 in
>>>>> forte.cpp). In our case the first Java frame is Frame #29 (the
>>>>> C1-compiled version of java.net.URLClassLoader$1.run).
>>>>>
>>>>> In our case Frame #29 is not decipherable, because
>>>>> java.net.URLClassLoader$1.run has been made "not entrant" (a
>>>>> C2-compiled
>>>>> version of the same method has been produced shortly before).
>>>>>
>>>>> Afterwards, forte_fill_call_trace_given_top() checks if the method is
>>>>> "safe for sender" (line 424 in forte.cpp). The caller of the
>>>>> java.net.URLClassLoader$1.run method is ~StubRoutines::call_stub,
>>>>> which
>>>>> is considered "safe for sender" by the VM.
>>>>>
>>>>> Then, initial_Java_frame is set to the ~StubRoutines::call_stub stub
>>>>> (line 430). This does not seem to be correct because the stub is not a
>>>>> Java method and causes the assert(filled_in, "invariant") in the
>>>>> constructor of vframeStreamForte (line 103 in forte.cpp) to fail
>>>>> (because the frame cannot be filled from a stub).
>>>>>
>>>>> To avoid this failure, I propose to call the constructor of
>>>>> vframeStreamForte with parameter stop_at_java_call_stub set to true
>>>>> (instead of false) so that the VM stops walking the stack if a call
>>>>> stub
>>>>> has been reached.
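
The effect of stop_at_java_call_stub can be sketched with a toy stack walk. This is only an illustration under simplifying assumptions: ToyFrame, ToyFrameKind and toy_walk are hypothetical, the stack is a plain vector ordered from callee to caller, and the toy does not reproduce the assert itself. It only shows that a walk which starts at a call stub ends immediately when the flag is true.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical frame classification for the toy walk.
enum class ToyFrameKind { Java, CallStub, Native };

struct ToyFrame {
    ToyFrameKind kind;
    std::string name;
};

// Collects the names of Java frames, starting at index `start` and moving
// toward the callers. When stop_at_call_stub is true, the walk ends as soon
// as a call stub is reached.
std::vector<std::string> toy_walk(const std::vector<ToyFrame>& stack,
                                  std::size_t start,
                                  bool stop_at_call_stub) {
    std::vector<std::string> out;
    for (std::size_t i = start; i < stack.size(); ++i) {
        const ToyFrame& f = stack[i];
        if (f.kind == ToyFrameKind::CallStub && stop_at_call_stub) {
            break;  // nothing more to report beyond the stub
        }
        if (f.kind == ToyFrameKind::Java) {
            out.push_back(f.name);
        }
    }
    return out;
}
```

In the scenario above the initial frame is the call stub itself, so with the flag set the toy returns an empty trace instead of trying to decode the stub as a Java frame.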
>>>>>
>>>>>
>>>>> Here is the updated webrev:
>>>>>
>>>>> http://cr.openjdk.java.net/~zmajo/8068945/webrev.02/
>>>>>
>>>>> In addition to testing the changeset with the tools mentioned
>>>>> before, I
>>>>> executed
>>>>> - all JPRT tests, all pass;
>>>>> - all java/lang/invoke and compiler JTREG tests; all tests that pass
>>>>> with the unmodified source trace pass with the changes as well.
>>>>>
>>>>> Thank you very much in advance!
>>>>>
>>>>> Best regards,
>>>>>
>>>>>
>>>>> Zoltan
>>>>>
>>>>>>
>>>>>> So that will put this RFR on hold for a while, unfortunately.
>>>>>>
>>>>>> Thank you for the feedback and suggestions so far!
>>>>>>
>>>>>> Best regards,
>>>>>>
>>>>>>
>>>>>> Zoltan
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Vladimir
>>>>>>>
>>>>>>>>
>>>>>>>> Thank you!
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>>
>>>>>>>>
>>>>>>>> Zoltan
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Vladimir
>>>>>>>>>
>>>>>>>>> On 3/30/15 8:30 AM, Zoltán Majó wrote:
>>>>>>>>>> Hi Ed,
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> thank you for your feedback! Please see comments below.
>>>>>>>>>>
>>>>>>>>>> On 03/30/2015 11:11 AM, Edward Nevill wrote:
>>>>>>>>>>> Hi Zoltán,
>>>>>>>>>>>
>>>>>>>>>>> On Fri, 2015-03-27 at 15:34 +0100, Zoltán Majó wrote:
>>>>>>>>>>>> Full JPRT run, all tests pass. I also ran all hotspot compiler
>>>>>>>>>>>> tests and
>>>>>>>>>>>> the jdk tests in java/lang/invoke on both x86_64 and x86_32.
>>>>>>>>>>>> All
>>>>>>>>>>>> tests
>>>>>>>>>>>> that pass without the patch pass also with the patch.
>>>>>>>>>>>>
>>>>>>>>>>>> I ran the SPEC JVM 2008 benchmarks on our performance
>>>>>>>>>>>> infrastructure for
>>>>>>>>>>>> x86_64. The performance evaluation suggests that there is no
>>>>>>>>>>>> statistically significant performance degradation due to having
>>>>>>>>>>>> proper
>>>>>>>>>>>> frame pointers. Therefore I propose to have OmitFramePointer
>>>>>>>>>>>> set to
>>>>>>>>>>>> false by default on x86_64 (and set to true on all other
>>>>>>>>>>>> platforms).
>>>>>>>>>>> This patch looks good, however I think there is a problem
>>>>>>>>>>> with the
>>>>>>>>>>> logic of OmitFramePointer.
>>>>>>>>>>>
>>>>>>>>>>> Here is my test case.
>>>>>>>>>>>
>>>>>>>>>>> --- CUT HERE ---
>>>>>>>>>>> // $Id: fibo.java,v 1.2 2000/12/24 19:10:50 doug Exp $
>>>>>>>>>>> // http://www.bagley.org/~doug/shootout/
>>>>>>>>>>>
>>>>>>>>>>> public class fibo {
>>>>>>>>>>>      public static void main(String args[]) {
>>>>>>>>>>>     int N = Integer.parseInt(args[0]);
>>>>>>>>>>>     System.out.println(fib(N));
>>>>>>>>>>>      }
>>>>>>>>>>>      public static int fib(int n) {
>>>>>>>>>>>     if (n < 2) return(1);
>>>>>>>>>>>     return( fib(n-2) + fib(n-1) );
>>>>>>>>>>>      }
>>>>>>>>>>> }
>>>>>>>>>>> --- CUT HERE ---
>>>>>>>>>>>
>>>>>>>>>>> If I run it as follows on my x86 64 bit linux.
>>>>>>>>>>>
>>>>>>>>>>> /work/images/jdk/bin/java -XX:-TieredCompilation
>>>>>>>>>>> -XX:+PrintCompilation
>>>>>>>>>>> -XX:CompileOnly=fibo::fib -XX:+UnlockDiagnosticVMOptions
>>>>>>>>>>> -XX:-OmitFramePointer -XX:+PrintAssembly fibo 43
>>>>>>>>>>>
>>>>>>>>>>> I get
>>>>>>>>>>>
>>>>>>>>>>>    # {method} {0x00007fc62c97f388} 'fib' '(I)I' in 'fibo'
>>>>>>>>>>>    # parm0:    rsi       = int
>>>>>>>>>>>    #           [sp+0x30]  (sp of caller)
>>>>>>>>>>>    0x00007fc625071100: mov %eax,-0x14000(%rsp)
>>>>>>>>>>>    0x00007fc625071107: push   %rbp
>>>>>>>>>>>    0x00007fc625071108: mov    %rsp,%rbp
>>>>>>>>>>>    0x00007f836907110b: sub    $0x20,%rsp ;*synchronization entry
>>>>>>>>>>>
>>>>>>>>>>> which is correct, it is NOT(-) OmitFramePointer, therefore it is
>>>>>>>>>>> using
>>>>>>>>>>> the frame pointer
>>>>>>>>>>>
>>>>>>>>>>> Now if I try just changing -XX:-OmitFramePointer to
>>>>>>>>>>> -XX:+OmitFramePointer in the above I get
>>>>>>>>>>>
>>>>>>>>>>> /work/images/jdk/bin/java -XX:-TieredCompilation
>>>>>>>>>>> -XX:+PrintCompilation
>>>>>>>>>>> -XX:CompileOnly=fibo::fib -XX:+UnlockDiagnosticVMOptions
>>>>>>>>>>> -XX:+OmitFramePointer -XX:+PrintAssembly fibo 43
>>>>>>>>>>>
>>>>>>>>>>> I get
>>>>>>>>>>>
>>>>>>>>>>>    # {method} {0x00007f14d3c00388} 'fib' '(I)I' in 'fibo'
>>>>>>>>>>>    # parm0:    rsi       = int
>>>>>>>>>>>    #           [sp+0x30]  (sp of caller)
>>>>>>>>>>>    0x00007f14e1071100: mov %eax,-0x14000(%rsp)
>>>>>>>>>>>    0x00007f14e1071107: push   %rbp
>>>>>>>>>>>    0x00007f14e1071108: sub    $0x20,%rsp ;*synchronization entry
>>>>>>>>>>>
>>>>>>>>>>> which is correct, it is (+) OmitFramePointer, therefore it
>>>>>>>>>>> does
>>>>>>>>>>> not
>>>>>>>>>>> use a frame pointer.
>>>>>>>>>>>
>>>>>>>>>>> However, if I now delete the -XX:+/-OmitFramePointer
>>>>>>>>>>> altogether, IE
>>>>>>>>>>>
>>>>>>>>>>> /work/images/jdk/bin/java -XX:-TieredCompilation
>>>>>>>>>>> -XX:+PrintCompilation
>>>>>>>>>>> -XX:CompileOnly=fibo::fib -XX:+UnlockDiagnosticVMOptions
>>>>>>>>>>> -XX:+PrintAssembly fibo 43
>>>>>>>>>>>
>>>>>>>>>>> I get
>>>>>>>>>>>
>>>>>>>>>>>    # {method} {0x00007f0c4b730388} 'fib' '(I)I' in 'fibo'
>>>>>>>>>>>    # parm0:    rsi       = int
>>>>>>>>>>>    #           [sp+0x30]  (sp of caller)
>>>>>>>>>>>    0x00007f0c75071100: mov %eax,-0x14000(%rsp)
>>>>>>>>>>>    0x00007f0c75071107: push   %rbp
>>>>>>>>>>>    0x00007f0c75071108: sub    $0x20,%rsp ;*synchronization entry
>>>>>>>>>>>
>>>>>>>>>>> It is not using a frame pointer which is the equivalent of
>>>>>>>>>>> -XX:+OmitFramePointer. However in your description above you say
>>>>>>>>>>>
>>>>>>>>>>>> Therefore I propose to have OmitFramePointer set to false by
>>>>>>>>>>>> default
>>>>>>>>>>>> on x86_64 (and set to true on all other platforms).
>>>>>>>>>>> whereas OmitFramePointer actually seems to be set to true on
>>>>>>>>>>> x86_64
>>>>>>>>>>>
>>>>>>>>>>> I think the problem may be with the declaration and
>>>>>>>>>>> definition of
>>>>>>>>>>> OmitFramePointer in globals.hpp and globals_x86.hpp
>>>>>>>>>>>
>>>>>>>>>>> In globals.hpp it does
>>>>>>>>>>>
>>>>>>>>>>> product(bool, OmitFramePointer, true,
>>>>>>>>>>>
>>>>>>>>>>> In globals_x86.hpp it does
>>>>>>>>>>>
>>>>>>>>>>> LP64_ONLY(define_pd_global(bool, OmitFramePointer, false););
>>>>>>>>>>>
>>>>>>>>>>> I am not sure that you can mix product(...) and product_pd(...)
>>>>>>>>>>> like
>>>>>>>>>>> this, so I think it just ends up getting the default from the
>>>>>>>>>>> product(...).
>>>>>>>>>>
>>>>>>>>>> You are right, mixing product and product_pd does not make sense
>>>>>>>>>> at all.
>>>>>>>>>> Thank you for doing additional testing and for drawing attention
>>>>>>>>>> to the
>>>>>>>>>> problem.
>>>>>>>>>>
>>>>>>>>>> I updated the code to use product_pd and define_pd_global on all
>>>>>>>>>> relevant platforms.
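
A minimal sketch of why the mixed declaration picked up the shared default. The TOY_ macros below are hypothetical simplifications of the flag macros, and the two flag names are made up for the example. The point is only that a product-style flag is initialized from the literal in the shared file, while a product_pd-style flag is initialized from the constant that the platform file defines.

```cpp
#include <cassert>

// Hypothetical, simplified flag macros (not the real HotSpot definitions).
#define TOY_DEFINE_PD_GLOBAL(type, name, value) const type toy_pd_##name = value;
#define TOY_PRODUCT(type, name, value)          type name = value;
#define TOY_PRODUCT_PD(type, name)              type name = toy_pd_##name;

// Mixed case: the platform file asks for false ...
TOY_DEFINE_PD_GLOBAL(bool, ToyOmitFP_Mixed, false)
// ... but the shared file declares a plain product flag with its own default.
TOY_PRODUCT(bool, ToyOmitFP_Mixed, true)

// Consistent case: the shared file defers to the platform default.
TOY_DEFINE_PD_GLOBAL(bool, ToyOmitFP_Pd, false)
TOY_PRODUCT_PD(bool, ToyOmitFP_Pd)

// The platform default of the mixed flag exists but nothing reads it when
// the flag itself is initialized.
bool toy_mixed_platform_default() { return toy_pd_ToyOmitFP_Mixed; }
```

In the mixed case the flag ends up true although the platform constant is false, which matches the behavior Ed observed when no -XX option was given.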
>>>>>>>>>>
>>>>>>>>>>> Aside: In general, I do not like options which include a
>>>>>>>>>>> negative in
>>>>>>>>>>> them because I have to do a double think when I see something
>>>>>>>>>>> like,
>>>>>>>>>>> -XX:-OmitFramePointer, as in, it is omitting the frame pointer,
>>>>>>>>>>> therefore it is using a frame pointer. How about FramePointer
>>>>>>>>>>> so we
>>>>>>>>>>> have -XX:+FramePointer to say I want frame pointers and
>>>>>>>>>>> -XX:-FramePointer to say I don't.
>>>>>>>>>>
>>>>>>>>>> That is a good idea. Double negation is an unnecessary
>>>>>>>>>> complication, so
>>>>>>>>>> I changed the name of the flag to FramePointer, just as you
>>>>>>>>>> suggested.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I did some timing on the above 'fibo' test
>>>>>>>>>>>
>>>>>>>>>>> [ed at mylittlepony java]$ time /work/images/jdk/bin/java
>>>>>>>>>>> -XX:-OmitFramePointer fibo 43
>>>>>>>>>>> 701408733
>>>>>>>>>>>
>>>>>>>>>>> real    0m1.545s
>>>>>>>>>>> user    0m1.571s
>>>>>>>>>>> sys    0m0.015s
>>>>>>>>>>> [ed at mylittlepony java]$ time /work/images/jdk/bin/java
>>>>>>>>>>> -XX:+OmitFramePointer fibo 43
>>>>>>>>>>> 701408733
>>>>>>>>>>>
>>>>>>>>>>> real    0m1.504s
>>>>>>>>>>> user    0m1.527s
>>>>>>>>>>> sys    0m0.019s
>>>>>>>>>>>
>>>>>>>>>>> which is ~3% difference on this test case. On aarch64, I see ~7%
>>>>>>>>>>> difference on this test case.
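
For reference, the ~3% figure follows from the "real" times above (1.545 s with -XX:-OmitFramePointer, 1.504 s with -XX:+OmitFramePointer). A small sketch of the arithmetic, assuming the run without a frame pointer is taken as the baseline; toy_relative_slowdown is a hypothetical helper name.

```cpp
#include <cassert>
#include <cmath>

// Relative slowdown of `slow_seconds` versus `fast_seconds`, expressed as a
// fraction of `fast_seconds`.
double toy_relative_slowdown(double slow_seconds, double fast_seconds) {
    return (slow_seconds - fast_seconds) / fast_seconds;
}
```

toy_relative_slowdown(1.545, 1.504) is 0.041 / 1.504, roughly 0.027, i.e. about 2.7%. This is a single pair of runs, so it should be read as a rough indication only.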
>>>>>>>>>>
>>>>>>>>>> Thank you for the performance measurements!
>>>>>>>>>>
>>>>>>>>>>> With the above change to fix the logic of OmitFramePointer (and
>>>>>>>>>>> possibly change its name) the patch looks good to me.
>>>>>>>>>>
>>>>>>>>>> Here is the updated webrev (the same webrev that was already
>>>>>>>>>> included
>>>>>>>>>> into my reply to Roland):
>>>>>>>>>>
>>>>>>>>>> http://cr.openjdk.java.net/~zmajo/8068945/webrev.01/
>>>>>>>>>>
>>>>>>>>>>> I will prepare a mirror patch for aarch64.
>>>>>>>>>>
>>>>>>>>>> That would be great!
>>>>>>>>>>
>>>>>>>>>> Thank you and best regards,
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Zoltán
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> All the best,
>>>>>>>>>>> Ed.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>
>>>
>

From vladimir.kozlov at oracle.com  Fri Apr 24 18:18:27 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 24 Apr 2015 11:18:27 -0700
Subject: RFR(S) 8077504: Unsafe load can loose control dependency and
	cause crash
In-Reply-To: <6AC5258A-A40C-4121-8B1F-0076713345D4@oracle.com>
References: <42EDECBF-2347-406A-8835-29A98F1CA153@oracle.com>	<552FC216.4010503@redhat.com>	<95F149F4-7BE2-45C8-A3C0-DEACFE42ACDF@oracle.com>
	<6AC5258A-A40C-4121-8B1F-0076713345D4@oracle.com>
Message-ID: <553A88F3.2010700@oracle.com>

I agree that we have to pass a parameter to GraphKit::make_load().
I thought we could avoid it for LoadNode::make(), but it has a transform for
compressed oops. AARGH!

Please add a comment to the code in library_call.cpp explaining why we set the flag to false.

BTW, should we modify LoadNode::hash() to include _depends_only_on_test 
and prevent igvning?
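
To illustrate the question, a minimal sketch of a hash and equality pair that takes the flag into account. This is a toy and not C2 code: ToyLoad, toy_hash and toy_equal are hypothetical, and a load is reduced to an address id plus the flag.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Toy load node: an address identity plus the pinning-related flag.
struct ToyLoad {
    int address_id;
    bool depends_only_on_test;
};

// Mixes the flag into the hash so that two loads of the same address with
// different flag values land in different hash classes.
std::size_t toy_hash(const ToyLoad& n) {
    std::size_t h = std::hash<int>()(n.address_id);
    return h * 31 + (n.depends_only_on_test ? 1 : 0);
}

// Equality must agree with the hash: loads that differ only in the flag
// are not interchangeable.
bool toy_equal(const ToyLoad& a, const ToyLoad& b) {
    return a.address_id == b.address_id &&
           a.depends_only_on_test == b.depends_only_on_test;
}
```

With both functions including the flag, a value-numbering table keyed on them would never replace a pinned load with an unpinned one (or the reverse).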

Thanks,
Vladimir

On 4/24/15 1:03 AM, Roland Westrelin wrote:
>
>> Vladimir suggested privately to set _depends_only_on_test to true in the constructor and then use an explicit call to a new method set_depends_only_on_test() to set it to false in the rare cases where it's needed. That feels better indeed. What do you think?
>
> Actually, using a set_depends_only_on_test() method doesn't work well. In LibraryCallKit::inline_unsafe_access() the node returned by make_load() may have been transformed already and we could call set_depends_only_on_test() on a node that doesn't need to be pinned. The call to set_depends_only_on_test() would have to be in LoadNode::make(). I went with default parameters instead to keep the change small:
>
> http://cr.openjdk.java.net/~roland/8077504/webrev.01/
>
> Roland.
>
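
The difference between a setter and a default parameter can be sketched with a toy value-numbering table. Everything below is hypothetical (ToyNode, ToyGraph, and a make_load that takes just an address id); it is not C2's GraphKit. The sketch only shows that a factory may hand back a node that already existed, so a flag has to be fixed before the lookup rather than changed afterwards.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <utility>
#include <vector>

struct ToyNode {
    int address_id;
    bool depends_only_on_test;
};

class ToyGraph {
public:
    // The flag is part of the request, so it is known before the table
    // lookup that may return an already existing node.
    ToyNode* make_load(int address_id, bool depends_only_on_test = true) {
        std::pair<int, bool> key(address_id, depends_only_on_test);
        std::map<std::pair<int, bool>, ToyNode*>::iterator it = table_.find(key);
        if (it != table_.end()) {
            return it->second;  // reuse the equivalent node
        }
        nodes_.push_back(std::unique_ptr<ToyNode>(
            new ToyNode{address_id, depends_only_on_test}));
        ToyNode* n = nodes_.back().get();
        table_[key] = n;
        return n;
    }

private:
    std::vector<std::unique_ptr<ToyNode>> nodes_;       // owns the nodes
    std::map<std::pair<int, bool>, ToyNode*> table_;    // toy value numbering
};
```

In this toy, two default calls for the same address return the same node. Flipping the flag on that returned node afterwards would silently change it for every user, whereas make_load(addr, false) yields a separate node.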

From zoltan.majo at oracle.com  Fri Apr 24 18:30:37 2015
From: zoltan.majo at oracle.com (Zoltán Majó)
Date: Fri, 24 Apr 2015 20:30:37 +0200
Subject: [9] RFR(S): 8068945: Use RBP register as proper frame pointer
	in JIT compiled code on x86
In-Reply-To: <553A82D9.1090500@oracle.com>
References: <55156A87.1070607@oracle.com>	<1427706703.1606.22.camel@mylittlepony.linaroharston>	<55196C2C.8080106@oracle.com>	<5519B1AE.8070901@oracle.com>	<5519BC6E.1090504@oracle.com>	<5519C29D.8080200@oracle.com>	<551BF4D3.90805@oracle.com>	<5537CBAE.9020500@oracle.com>	<553816F9.1040104@oracle.com>	<553A2DC8.3080804@oracle.com>	<553A6ECE.8090908@oracle.com>	<553A7B9A.5060407@oracle.com>
	<553A82D9.1090500@oracle.com>
Message-ID: <553A8BCD.4010506@oracle.com>

Hi Vladimir,


On 04/24/2015 07:52 PM, Vladimir Kozlov wrote:
> > - (2) the assert is there only for SunStudio (according to the comments
> > on lines 439--442 in vframe.cpp) and AsyncGetCallTrace (the only place
> > where forte_fill_call_trace_given_top() is called) does not expect 
> debug
> > information to be available for all frames, it just needs the jmethodID 
> for
> > every frame.
>
> This is an incorrect interpretation of the comments. The assert is in general 
> code to catch an incorrect bci or pcoffset during vframe construction. 
> The comment says that instead of having several asserts we have only 
> one for all these cases, to simplify disabling it when running the 
> performance analyzer.

Thank you for the clarification!

> So you can only skip it when you run with performance analyzer but not 
> during general execution.
>
> Maybe you can set or pass a flag when vframe construction is called 
> from forte and skip the assert in that case. I assume you hit it only with 
> the performance analyzer. Right?

You are right. The assert can be triggered only if vframe construction 
is called from forte, but not otherwise. So setting the command-line 
flag -XX:SuppressErrorAt=vframe.cpp:443 (as suggested in vframe.cpp 
lines 439--442) is sufficient to skip the assert when running a 
fastdebug VM with the performance analyzer.
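
As an illustration of what a file:line suppression such as vframe.cpp:443 amounts to, here is a minimal sketch of the matching step. This is a toy under stated assumptions and not the VM's implementation: toy_error_is_suppressed is a hypothetical name, it handles a single file:line entry only, and it compares just the base name of the reporting file.

```cpp
#include <cassert>
#include <string>

// Returns true if `spec` (for example "vframe.cpp:443") names the given
// file and line. `file` may be a full path; only its base name is compared.
bool toy_error_is_suppressed(const std::string& spec,
                             const std::string& file, int line) {
    std::string::size_type slash = file.find_last_of("/\\");
    std::string base =
        (slash == std::string::npos) ? file : file.substr(slash + 1);
    return spec == base + ":" + std::to_string(line);
}
```

An assert site would consult such a check before reporting, so the assert stays in the code and is only muted for the one location named on the command line.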

I'll leave the assert enabled and intend to push webrev.03 then.

Thank you and best regards,


Zoltan

>
> Thanks,
> Vladimir
>
> On 4/24/15 10:21 AM, Zoltán Majó wrote:
>> Hi Vladimir,
>>
>>
>> On 04/24/2015 06:26 PM, Vladimir Kozlov wrote:
>>> Yes, this looks good. Nice work and thank you for testing.
>>
>> Thank you for all the feedback you've provided while I was working on
>> this issue!
>>
>> I would have one final question: Can I disable the assert in
>> vframeStreamCommon::found_bad_method_frame() on line 443 in vframe.cpp?
>> I would prefer that for the reasons I mentioned in my previous message
>> (please see also below).
>>
>> Here is an updated webrev (with the assert disabled):
>> http://cr.openjdk.java.net/~zmajo/8068945/webrev.04/
>>
>> Could you please let me know which webrev I can push, webrev.03 with the
>> assert left enabled or webrev.04 with the assert disabled?
>>
>> Thank you and best regards,
>>
>>
>> Zoltan
>>
>>>> I ran the experiments *with the assert enabled* in
>>>> vframeStreamCommon::found_bad_method_frame(). The assert was not
>>>> triggered.
>>>>
>>>> I would be, however, inclined to disable that assert because:
>>>> - (1) it is better to have some information in stack traces than no
>>>> information and a crash (even though the crash happens in a fastdebug
>>>> build);
>>>> - (2) the assert is there only for SunStudio (according to the 
>>>> comments
>>>> on lines 439--442 in vframe.cpp) and AsyncGetCallTrace (the only place
>>>> where forte_fill_call_trace_given_top() is called) does not expect 
>>>> debug
>>>> information to be available for all frames, it just needs the 
>>>> jmethodID for
>>>> every frame.
>>>>
>>>>> Do you mean "can NOT happen"?:
>>>>> > Moreover, if the stack is walked synchronously (e.g., at
>>>>> safepoints), no
>>>>> > problems appear either, because the synchronous interruption can
>>>>> happen
>>>>> > while execution is within the method handle intrinsic.
>>>>
>>>> Yes, I meant that it can *not* happen. Sorry for the confusion.
>>>>
>>>> Thank you and best regards,
>>>>
>>>>
>>>> Zoltan
>>>>
>>>>>
>>>>> Vladimir
>>>>>
>>>>> On 4/22/15 9:26 AM, Zoltán Majó wrote:
>>>>>> Hi Vladimir,
>>>>>>
>>>>>>
>>>>>> I managed to do some more work on this enhancement. Please see 
>>>>>> details
>>>>>> below.
>>>>>>
>>>>>> On 04/01/2015 03:38 PM, Zoltán Majó wrote:
>>>>>>> Hi Vladimir,
>>>>>>>
>>>>>>>
>>>>>>> On 03/30/2015 11:39 PM, Vladimir Kozlov wrote:
>>>>>>>> On 3/30/15 2:13 PM, Zoltán Majó wrote:
>>>>>>>>> Hi Vladimir,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> thank you for the feedback!
>>>>>>>>>
>>>>>>>>> On 03/30/2015 10:27 PM, Vladimir Kozlov wrote:
>>>>>>>>>> How about PreserveFramePointer instead of simple FramePointer?
>>>>>>>>>>
>>>>>>>>>> PreserveFramePointer will mean that compiled (or other) code 
>>>>>>>>>> will
>>>>>>>>>> use
>>>>>>>>>> that register only as Frame pointer.
>>>>>>>>>
>>>>>>>>> I will change the flag's name to PreserveFramePointer and will 
>>>>>>>>> also
>>>>>>>>> update the description.
>>>>>>
>>>>>> I changed the flag's name to PreserveFramePointer, just as you
>>>>>> suggested.
>>>>>>
>>>>>>>>>
>>>>>>>>>> Zoltan, x86 flags setting should be in general globals_x86.hpp.
>>>>>>>>>> You
>>>>>>>>>> can #ifdef _LP64 there too. I don't understand why you only set
>>>>>>>>>> it to
>>>>>>>>>> true on linux-x64.
>>>>>>>>>
>>>>>>>>> I remembered that the original discussion with Brendan Gregg
>>>>>>>>> mentioned
>>>>>>>>> only Linux's perf tool as a possible use case for "proper" frame
>>>>>>>>> pointers. So I was unsure whether to enable proper frame
>>>>>>>>> pointers by
>>>>>>>>> default on other x64 platforms as well.
>>>>>>>>>
>>>>>>>>> But if you think it would be better to have proper frame
>>>>>>>>> pointers on
>>>>>>>>> all
>>>>>>>>> x64 platforms, I will change the code to set
>>>>>>>>> PreserveFramePointer to
>>>>>>>>> true for all x64 platforms. Just please let me know.
>>>>>>
>>>>>> The current webrev sets the PreserveFramePointer flag to true on
>>>>>> all
>>>>>> x86_64 platforms and to false on all other platforms.
>>>>>>
>>>>>>>>
>>>>>>>> Currently compiled code for all x86 platforms is almost the same
>>>>>>>> (win64 has difference in registers usage) and we should keep it 
>>>>>>>> that
>>>>>>>> way.
>>>>>>>>
>>>>>>>> Also the original request was to have flag to enable such behavior
>>>>>>>> (use RBP only as FP). So to have it off by default is 
>>>>>>>> acceptable. If
>>>>>>>> performance group or someone find a regression (or bug) due to 
>>>>>>>> this
>>>>>>>> change we can switch the flag off by default before jdk9 release.
>>>>>>>>
>>>>>>>> Try to run pstack on Solaris and jstack on OSX to make sure they
>>>>>>>> report correct call stack with compiled java methods. And JFR.
>>>>>>>> Also it would be nice to run SunStudio analyzer to verify that it
>>>>>>>> works.
>>>>>>>
>>>>>>> I ran all tools you've suggested. JFR and jstack are unaffected,
>>>>>>> pstack
>>>>>>> produces nice stack traces (it did not always do so before).
>>>>>>
>>>>>> I tested the current webrev with the following setup: I used two
>>>>>> tests,
>>>>>> one that generates a long chain of lambda form invocations and another
>>>>>> one that generates a long chain of "regular" method invocations. 
>>>>>> Both
>>>>>> tests were executed on an x64 machine in four configurations: 
>>>>>> with +/-
>>>>>> Xcomp and with +/- PreserveFramePointer.
>>>>>>
>>>>>> Just as before, JFR and jstack stack traces are unaffected for both
>>>>>> tests, pstack can now produce stack traces with both tests if
>>>>>> PreserveFramePointer is enabled.
>>>>>>
>>>>>>> However, I've encountered a problem with SunStudio: Two asserts 
>>>>>>> fail
>>>>>>> in the fastdebug build. Both of them are "soft" failures, as 
>>>>>>> neither the
>>>>>>> VM nor SunStudio crash with the product build. I worked on the
>>>>>>> problem
>>>>>>> today and have a partial understanding of the issue, but more
>>>>>>> investigation is needed to have a patch that preserves the correct
>>>>>>> behavior of SunStudio as well.
>>>>>>
>>>>>> I was able to track down the problems with SunStudio. I had to 
>>>>>> change
>>>>>> the code at two places.
>>>>>>
>>>>>>
>>>>>> Change #1 (in src/cpu/x86/vm/frame_x86.cpp):
>>>>>>
>>>>>> *** 222,232 ****
>>>>>>        }
>>>>>>
>>>>>>        if (sender_blob->is_nmethod()) {
>>>>>>            nmethod* nm = sender_blob->as_nmethod_or_null();
>>>>>>            if (nm != NULL) {
>>>>>> !             if (nm->is_deopt_mh_entry(sender_pc) ||
>>>>>> nm->is_deopt_entry(sender_pc)) {
>>>>>>                    return false;
>>>>>>                }
>>>>>>            }
>>>>>>        }
>>>>>>
>>>>>> --- 222,233 ----
>>>>>>        }
>>>>>>
>>>>>>        if (sender_blob->is_nmethod()) {
>>>>>>            nmethod* nm = sender_blob->as_nmethod_or_null();
>>>>>>            if (nm != NULL) {
>>>>>> !             if (nm->is_deopt_mh_entry(sender_pc) ||
>>>>>> nm->is_deopt_entry(sender_pc) ||
>>>>>> ! nm->method()->is_method_handle_intrinsic()) {
>>>>>>                    return false;
>>>>>>                }
>>>>>>            }
>>>>>>        }
>>>>>>
>>>>>> The reason for this change is the following. Method handle 
>>>>>> intrinsics
>>>>>> (i.e., the intrinsics _invokeBasic, _linkToVirtual,_linkToStatic,
>>>>>> _linkToSpecial, and _linkToInterface) do not allocate stack space 
>>>>>> when
>>>>>> invoked, but they can extend the stack space of their caller
>>>>>> "temporarily".
>>>>>>
>>>>>> For example, if VerifyMethodHandles is enabled, some stack space is
>>>>>> used
>>>>>> during verification. The temporarily used stack space is released
>>>>>> before
>>>>>> the intrinsic jumps to its target. As a result, the target of a 
>>>>>> method
>>>>>> handle intrinsic will have a correct SP when it returns and the
>>>>>> program's control flow is correct.
>>>>>>
>>>>>> Moreover, if the stack is walked synchronously (e.g., at
>>>>>> safepoints), no
>>>>>> problems appear either, because the synchronous interruption can
>>>>>> happen
>>>>>> while execution is within the method handle intrinsic.
>>>>>>
>>>>>> The problem is that the SunStudio analyzer can interrupt the VM
>>>>>> asynchronously and walk the stack. If execution of a thread is
>>>>>> interrupted while the thread is in a method handle intrinsic, the SP
>>>>>> might contain an invalid value.
>>>>>>
>>>>>> The new webrev adds a check that marks the current frame unsafe for
>>>>>> sender if the frame belongs to a method handle intrinsic
>>>>>> (frame::safe_for_sender returns false in this case).
>>>>>>
>>>>>>
>>>>>> Change #2 (in src/share/vm/prims/forte.cpp):
>>>>>>
>>>>>> *** 425,435 ****
>>>>>>
>>>>>>        RegisterMap map(thd, false);
>>>>>>        initial_Java_frame = initial_Java_frame.sender(&map);
>>>>>>      }
>>>>>>
>>>>>> !   vframeStreamForte st(thd, initial_Java_frame, false);
>>>>>>
>>>>>>      for (; !st.at_end() && count < depth; st.forte_next(), 
>>>>>> count++) {
>>>>>>        bci = st.bci();
>>>>>>        method = st.method();
>>>>>>
>>>>>> --- 425,435 ----
>>>>>>
>>>>>>        RegisterMap map(thd, false);
>>>>>>        initial_Java_frame = initial_Java_frame.sender(&map);
>>>>>>      }
>>>>>>
>>>>>> !   vframeStreamForte st(thd, initial_Java_frame, true);
>>>>>>
>>>>>>      for (; !st.at_end() && count < depth; st.forte_next(), 
>>>>>> count++) {
>>>>>>        bci = st.bci();
>>>>>>        method = st.method();
>>>>>>
>>>>>> The problem is that the following assert in forte.cpp on line 103
>>>>>>
>>>>>> assert(filled_in, "invariant");
>>>>>>
>>>>>> fails. The problem appears if we have a stack trace like:
>>>>>>
>>>>>> V  [libjvm.so+0x1c98c4a]  void VMError::report(outputStream*)+0xb1a
>>>>>> V  [libjvm.so+0x1c9a3e8]  void VMError::report_and_die()+0x748
>>>>>> V  [libjvm.so+0x1003c8e]  void report_vm_error(const char*,int,const
>>>>>> char*,const char*)+0x7e
>>>>>> V  [libjvm.so+0x10efa22] vframeStreamForte::vframeStreamForte
>>>>>> #Nvariant
>>>>>> 1(JavaThread*,frame,bool)+0xe2
>>>>>> --> (Frame #5) V  [libjvm.so+0x10f0bb9]  void
>>>>>> forte_fill_call_trace_given_top(JavaThread*,ASGCT_CallTrace*,int,frame)+0x789 
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> V  [libjvm.so+0x10f1436]  AsyncGetCallTrace+0x246
>>>>>> C  [libcollector.so+0x272a8] __collector_ext_jstack_unwind+0xb8
>>>>>> C  [libcollector.so+0x277df] __collector_get_frame_info+0x27f
>>>>>> C  [libcollector.so+0x2f093] __collector_getUserCtx+0x13
>>>>>> C  [libcollector.so+0x1abc7] __collector_ext_profile_handler+0x127
>>>>>> C  [libcollector.so+0x17535] collector_sigprof_dispatcher+0x85
>>>>>> C  [libc.so.1+0x122476]  __sighndlr+0x6
>>>>>> C  [libc.so.1+0x115972]  call_user_handler+0x2ce
>>>>>> C  [libc.so.1+0x115e1b]  sigacthandler+0xdb
>>>>>> C  0xffffffffffffffff
>>>>>> V  [libjvm.so+0x1959089]  void os::PlatformEvent::park()+0xd9
>>>>>> V  [libjvm.so+0x18a6b34]  int ParkCommon(ParkEvent*,long)+0x34
>>>>>> V  [libjvm.so+0x18a7657]  int Monitor::IWait(Thread*,long)+0xb7
>>>>>> V  [libjvm.so+0x18a8b86]  bool Monitor::wait(bool,long,bool)+0x346
>>>>>> V  [libjvm.so+0xf7073d]  void CompileBroker::wait_for_completion(CompileTask*)+0xad
>>>>>> V  [libjvm.so+0xf6f6b6]  void CompileBroker::compile_method_base(methodHandle,int,int,methodHandle,int,const char*,Thread*)+0x406
>>>>>> V  [libjvm.so+0xf6fd96]  nmethod*CompileBroker::compile_method(methodHandle,int,int,methodHandle,int,const char*,Thread*)+0x586
>>>>>> V  [libjvm.so+0xbbad72]  void AdvancedThresholdPolicy::submit_compile(methodHandle,int,CompLevel,JavaThread*)+0xb2
>>>>>> V  [libjvm.so+0x1aef92f]  void SimpleThresholdPolicy::compile(methodHandle,int,CompLevel,JavaThread*)+0x14f
>>>>>> V  [libjvm.so+0xbbb00f]  void AdvancedThresholdPolicy::method_invocation_event(methodHandle,methodHandle,CompLevel,nmethod*,JavaThread*)+0x1ff
>>>>>> V  [libjvm.so+0x1aef765]  nmethod*SimpleThresholdPolicy::event(methodHandle,methodHandle,int,int,CompLevel,nmethod*,JavaThread*)+0x2e5
>>>>>> V  [libjvm.so+0xdb93bc]  unsigned char*Runtime1::counter_overflow(JavaThread*,int,Method*)+0x31c
>>>>>> v  ~RuntimeStub::counter_overflow Runtime1 stub
>>>>>> --> (Frame #29) J 143 C1 java.net.URLClassLoader$1.run()Ljava/lang/Object; (5 bytes) @ 0xffff80ffacc6962a [0xffff80ffacc69580+0xaa]
>>>>>> --> (Frame #30) v  ~StubRoutines::call_stub
>>>>>> V  [libjvm.so+0x13ca50b]  void JavaCalls::call_helper(JavaValue*,methodHandle*,JavaCallArguments*,Thread*)+0x41b
>>>>>> V  [libjvm.so+0x152a111]  JVM_DoPrivileged+0xfb1
>>>>>> C  [libjava.so+0x12f42]  Java_java_security_AccessController_doPrivileged__Ljava_security_PrivilegedExceptionAction_2Ljava_security_AccessControlContext_2+0x12
>>>>>> J 142 java.security.AccessController.doPrivileged(Ljava/security/PrivilegedExceptionAction;Ljava/security/AccessControlContext;)Ljava/lang/Object; (0 bytes) @ 0xffff80ffb400c57c [0xffff80ffb400c420+0x15c]
>>>>>> J 134 C1 java.net.URLClassLoader.findClass(Ljava/lang/String;)Ljava/lang/Class; (47 bytes) @ 0xffff80ffacc66014 [0xffff80ffacc65ec0+0x154]
>>>>>> ... more stack frames
>>>>>>
>>>>>> The forte_fill_call_trace_given_top() method (Frame #5) first checks if
>>>>>> the first Java frame found is fully decipherable (line 395 in
>>>>>> forte.cpp). In our case the first Java frame is Frame #29 (the
>>>>>> C1-compiled version of java.net.URLClassLoader$1.run).
>>>>>>
>>>>>> In our case Frame #29 is not decipherable, because
>>>>>> java.net.URLClassLoader$1.run has been made "not entrant" (a C2-compiled
>>>>>> version of the same method has been produced shortly before).
>>>>>>
>>>>>> Afterwards, forte_fill_call_trace_given_top() checks if the method is
>>>>>> "safe for sender" (line 424 in forte.cpp). The caller of the
>>>>>> java.net.URLClassLoader$1.run method is ~StubRoutines::call_stub, which
>>>>>> is considered "safe for sender" by the VM.
>>>>>>
>>>>>> Then, initial_Java_frame is set to the ~StubRoutines::call_stub stub
>>>>>> (line 430). This does not seem to be correct because the stub is not a
>>>>>> Java method and causes the assert(filled_in, "invariant") in the
>>>>>> constructor of vframeStreamForte (line 103 in forte.cpp) to fail
>>>>>> (because the frame cannot be filled from a stub).
>>>>>>
>>>>>> To avoid this failure, I propose to call the constructor of
>>>>>> vframeStreamForte with parameter stop_at_java_call_stub set to true
>>>>>> (instead of false) so that the VM stops walking the stack if a call stub
>>>>>> has been reached.
>>>>>>
>>>>>>
>>>>>> Here is the updated webrev:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~zmajo/8068945/webrev.02/
>>>>>>
>>>>>> In addition to testing the changeset with the tools mentioned
>>>>>> before, I
>>>>>> executed
>>>>>> - all JPRT tests, all pass;
>>>>>> - all java/lang/invoke and compiler JTREG tests; all tests that pass
>>>>>> with the unmodified source trace pass with the changes as well.
>>>>>>
>>>>>> Thank you very much in advance!
>>>>>>
>>>>>> Best regards,
>>>>>>
>>>>>>
>>>>>> Zoltan
>>>>>>
>>>>>>>
>>>>>>> So that will put this RFR on hold for a while, unfortunately.
>>>>>>>
>>>>>>> Thank you for the feedback and suggestions so far!
>>>>>>>
>>>>>>> Best regards,
>>>>>>>
>>>>>>>
>>>>>>> Zoltan
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Vladimir
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thank you!
>>>>>>>>>
>>>>>>>>> Best regards,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Zoltan
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Vladimir
>>>>>>>>>>
>>>>>>>>>> On 3/30/15 8:30 AM, Zoltán Majó wrote:
>>>>>>>>>>> Hi Ed,
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> thank you for your feedback! Please see comments below.
>>>>>>>>>>>
>>>>>>>>>>> On 03/30/2015 11:11 AM, Edward Nevill wrote:
>>>>>>>>>>>> Hi Zoltán,
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, 2015-03-27 at 15:34 +0100, Zoltán Majó wrote:
>>>>>>>>>>>>> Full JPRT run, all tests pass. I also ran all hotspot compiler
>>>>>>>>>>>>> tests and the jdk tests in java/lang/invoke on both x86_64 and
>>>>>>>>>>>>> x86_32. All tests that pass without the patch pass also with the
>>>>>>>>>>>>> patch.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I ran the SPEC JVM 2008 benchmarks on our performance
>>>>>>>>>>>>> infrastructure for x86_64. The performance evaluation suggests
>>>>>>>>>>>>> that there is no statistically significant performance
>>>>>>>>>>>>> degradation due to having proper frame pointers. Therefore I
>>>>>>>>>>>>> propose to have OmitFramePointer set to false by default on
>>>>>>>>>>>>> x86_64 (and set to true on all other platforms).
>>>>>>>>>>>> This patch looks good, however I think there is a problem
>>>>>>>>>>>> with the
>>>>>>>>>>>> logic of OmitFramePointer.
>>>>>>>>>>>>
>>>>>>>>>>>> Here is my test case.
>>>>>>>>>>>>
>>>>>>>>>>>> --- CUT HERE ---
>>>>>>>>>>>> // $Id: fibo.java,v 1.2 2000/12/24 19:10:50 doug Exp $
>>>>>>>>>>>> // http://www.bagley.org/~doug/shootout/
>>>>>>>>>>>>
>>>>>>>>>>>> public class fibo {
>>>>>>>>>>>>      public static void main(String args[]) {
>>>>>>>>>>>>     int N = Integer.parseInt(args[0]);
>>>>>>>>>>>>     System.out.println(fib(N));
>>>>>>>>>>>>      }
>>>>>>>>>>>>      public static int fib(int n) {
>>>>>>>>>>>>     if (n < 2) return(1);
>>>>>>>>>>>>     return( fib(n-2) + fib(n-1) );
>>>>>>>>>>>>      }
>>>>>>>>>>>> }
>>>>>>>>>>>> --- CUT HERE ---
>>>>>>>>>>>>
>>>>>>>>>>>> If I run it as follows on my x86 64 bit linux.
>>>>>>>>>>>>
>>>>>>>>>>>> /work/images/jdk/bin/java -XX:-TieredCompilation
>>>>>>>>>>>> -XX:+PrintCompilation
>>>>>>>>>>>> -XX:CompileOnly=fibo::fib -XX:+UnlockDiagnosticVMOptions
>>>>>>>>>>>> -XX:-OmitFramePointer -XX:+PrintAssembly fibo 43
>>>>>>>>>>>>
>>>>>>>>>>>> I get
>>>>>>>>>>>>
>>>>>>>>>>>>    # {method} {0x00007fc62c97f388} 'fib' '(I)I' in 'fibo'
>>>>>>>>>>>>    # parm0:    rsi       = int
>>>>>>>>>>>>    #           [sp+0x30]  (sp of caller)
>>>>>>>>>>>>    0x00007fc625071100: mov %eax,-0x14000(%rsp)
>>>>>>>>>>>>    0x00007fc625071107: push   %rbp
>>>>>>>>>>>>    0x00007fc625071108: mov    %rsp,%rbp
>>>>>>>>>>>>    0x00007f836907110b: sub    $0x20,%rsp ;*synchronization entry
>>>>>>>>>>>>
>>>>>>>>>>>> which is correct, it is NOT(-) OmitFramePointer, therefore it is
>>>>>>>>>>>> using the frame pointer
>>>>>>>>>>>>
>>>>>>>>>>>> Now if I try just changing -XX:-OmitFramePointer to
>>>>>>>>>>>> -XX:+OmitFramePointer in the above I get
>>>>>>>>>>>>
>>>>>>>>>>>> /work/images/jdk/bin/java -XX:-TieredCompilation
>>>>>>>>>>>> -XX:+PrintCompilation
>>>>>>>>>>>> -XX:CompileOnly=fibo::fib -XX:+UnlockDiagnosticVMOptions
>>>>>>>>>>>> -XX:+OmitFramePointer -XX:+PrintAssembly fibo 43
>>>>>>>>>>>>
>>>>>>>>>>>> I get
>>>>>>>>>>>>
>>>>>>>>>>>>    # {method} {0x00007f14d3c00388} 'fib' '(I)I' in 'fibo'
>>>>>>>>>>>>    # parm0:    rsi       = int
>>>>>>>>>>>>    #           [sp+0x30]  (sp of caller)
>>>>>>>>>>>>    0x00007f14e1071100: mov %eax,-0x14000(%rsp)
>>>>>>>>>>>>    0x00007f14e1071107: push   %rbp
>>>>>>>>>>>>    0x00007f14e1071108: sub    $0x20,%rsp ;*synchronization entry
>>>>>>>>>>>>
>>>>>>>>>>>> which is correct, it is ID(+) OmitFramePointer, therefore it does
>>>>>>>>>>>> not use a frame pointer.
>>>>>>>>>>>>
>>>>>>>>>>>> However, if I now delete the -XX:+/-OmitFramePointer
>>>>>>>>>>>> altogether, IE
>>>>>>>>>>>>
>>>>>>>>>>>> /work/images/jdk/bin/java -XX:-TieredCompilation
>>>>>>>>>>>> -XX:+PrintCompilation
>>>>>>>>>>>> -XX:CompileOnly=fibo::fib -XX:+UnlockDiagnosticVMOptions
>>>>>>>>>>>> -XX:+PrintAssembly fibo 43
>>>>>>>>>>>>
>>>>>>>>>>>> I get
>>>>>>>>>>>>
>>>>>>>>>>>>    # {method} {0x00007f0c4b730388} 'fib' '(I)I' in 'fibo'
>>>>>>>>>>>>    # parm0:    rsi       = int
>>>>>>>>>>>>    #           [sp+0x30]  (sp of caller)
>>>>>>>>>>>>    0x00007f0c75071100: mov %eax,-0x14000(%rsp)
>>>>>>>>>>>>    0x00007f0c75071107: push   %rbp
>>>>>>>>>>>>    0x00007f0c75071108: sub    $0x20,%rsp ;*synchronization entry
>>>>>>>>>>>>
>>>>>>>>>>>> It is not using a frame pointer which is the equivalent of
>>>>>>>>>>>> -XX:+OmitFramePointer. However in your description above 
>>>>>>>>>>>> you say
>>>>>>>>>>>>
>>>>>>>>>>>>> Therefore I propose to have OmitFramePointer set to false by
>>>>>>>>>>>>> default
>>>>>>>>>>>>> on x86_64 (and set to true on all other platforms).
>>>>>>>>>>>> whereas OmitFramePointer actually seems to be set to true on
>>>>>>>>>>>> x86_64
>>>>>>>>>>>>
>>>>>>>>>>>> I think the problem may be with the declaration and
>>>>>>>>>>>> definition of
>>>>>>>>>>>> OmitFramePointer in globals.hpp and globals_x86.hpp
>>>>>>>>>>>>
>>>>>>>>>>>> In globals.hpp it does
>>>>>>>>>>>>
>>>>>>>>>>>> product(bool, OmitFramePointer, true,
>>>>>>>>>>>>
>>>>>>>>>>>> In globals_x86.hpp it does
>>>>>>>>>>>>
>>>>>>>>>>>> LP64_ONLY(define_pd_global(bool, OmitFramePointer, false););
>>>>>>>>>>>>
>>>>>>>>>>>> I am not sure that you can mix product(...) and product_pd(...) like
>>>>>>>>>>>> this, so I think it just ends up getting the default from the
>>>>>>>>>>>> product(...).
>>>>>>>>>>>
>>>>>>>>>>> You are right, mixing product and product_pd does not make 
>>>>>>>>>>> sense
>>>>>>>>>>> at all.
>>>>>>>>>>> Thank you for doing additional testing and for drawing 
>>>>>>>>>>> attention
>>>>>>>>>>> to the
>>>>>>>>>>> problem.
>>>>>>>>>>>
>>>>>>>>>>> I updated the code to use product_pd and define_pd_global on 
>>>>>>>>>>> all
>>>>>>>>>>> relevant platforms.
>>>>>>>>>>>
>>>>>>>>>>>> Aside: In general, I do not like options which include a
>>>>>>>>>>>> negative in
>>>>>>>>>>>> them because I have to do a double think when I see something
>>>>>>>>>>>> like,
>>>>>>>>>>>> -XX:-OmitFramePointer, as in, it is omitting the frame 
>>>>>>>>>>>> pointer,
>>>>>>>>>>>> therefore it is using a frame pointer. How about FramePointer
>>>>>>>>>>>> so we
>>>>>>>>>>>> have -XX:+FramePointer to say I want frame pointers and
>>>>>>>>>>>> -XX:-FramePointer to say I don't.
>>>>>>>>>>>
>>>>>>>>>>> That is a good idea. Double negation is an unnecessary
>>>>>>>>>>> complication, so
>>>>>>>>>>> I changed the name of the flag to FramePointer, just as you
>>>>>>>>>>> suggested.
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I did some timing on the above 'fibo' test
>>>>>>>>>>>>
>>>>>>>>>>>> [ed at mylittlepony java]$ time /work/images/jdk/bin/java
>>>>>>>>>>>> -XX:-OmitFramePointer fibo 43
>>>>>>>>>>>> 701408733
>>>>>>>>>>>>
>>>>>>>>>>>> real    0m1.545s
>>>>>>>>>>>> user    0m1.571s
>>>>>>>>>>>> sys    0m0.015s
>>>>>>>>>>>> [ed at mylittlepony java]$ time /work/images/jdk/bin/java
>>>>>>>>>>>> -XX:+OmitFramePointer fibo 43
>>>>>>>>>>>> 701408733
>>>>>>>>>>>>
>>>>>>>>>>>> real    0m1.504s
>>>>>>>>>>>> user    0m1.527s
>>>>>>>>>>>> sys    0m0.019s
>>>>>>>>>>>>
>>>>>>>>>>>> which is ~3% difference on this test case. On aarch64, I 
>>>>>>>>>>>> see ~7%
>>>>>>>>>>>> difference on this test case.
>>>>>>>>>>>
>>>>>>>>>>> Thank you for the performance measurements!
>>>>>>>>>>>
>>>>>>>>>>>> With the above change to fix the logic of OmitFramePointer (and
>>>>>>>>>>>> possibly change its name) the patch looks good to me.
>>>>>>>>>>>
>>>>>>>>>>> Here is the updated webrev (the same webrev that was already
>>>>>>>>>>> included
>>>>>>>>>>> into my reply to Roland):
>>>>>>>>>>>
>>>>>>>>>>> http://cr.openjdk.java.net/~zmajo/8068945/webrev.01/
>>>>>>>>>>>
>>>>>>>>>>>> I will prepare a mirror patch for aarch64.
>>>>>>>>>>>
>>>>>>>>>>> That would be great!
>>>>>>>>>>>
>>>>>>>>>>> Thank you and best regards,
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Zoltán
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> All the best,
>>>>>>>>>>>> Ed.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>


From vladimir.kozlov at oracle.com  Fri Apr 24 19:03:09 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 24 Apr 2015 12:03:09 -0700
Subject: RFR(S): 8076284: Improve vectorization of parallel streams
In-Reply-To: <39F83597C33E5F408096702907E6C450E3F413@ORSMSX104.amr.corp.intel.com>
References: <39F83597C33E5F408096702907E6C450E3E586@ORSMSX104.amr.corp.intel.com>	<02FCFB8477C4EF43A2AD8E0C60F3DA2B63334516@FMSMSX112.amr.corp.intel.com>	<39F83597C33E5F408096702907E6C450E3E5A4@ORSMSX104.amr.corp.intel.com>	<02FCFB8477C4EF43A2AD8E0C60F3DA2B63334531@FMSMSX112.amr.corp.intel.com>	<39F83597C33E5F408096702907E6C450E3E734@ORSMSX104.amr.corp.intel.com>	<55303823.1020205@oracle.com>	<39F83597C33E5F408096702907E6C450E3F0AC@ORSMSX104.amr.corp.intel.com>
	<39F83597C33E5F408096702907E6C450E3F413@ORSMSX104.amr.corp.intel.com>
Message-ID: <553A936D.2020909@oracle.com>

Updated webrev:

http://cr.openjdk.java.net/~kvn/8076284/webrev.01/

Vladimir

On 4/20/15 11:45 PM, Civlin, Jan wrote:
> Vladimir,
>
> Here is the description and new patch with the changes you recommended (except the last one - see below my explanation).
>
>
> The patch description.
>
> This patch provides on-demand vectorization/SIMD'ing of a <method> specified in JVM command as -XX:CompileCommand=option,<method>,Vectorize.
> This optimization may be globally disabled by setting the flag -XX:-AllowVectorizeOnDemand (by default it is true).
>
> For each method that was specified with Vectorize option we do the following:
>
> 1. On each iteration of loop unroll for a given method (loopopts.cpp) we generate the next _clone_idx (which will be common for all the nodes cloned in this iteration); and on each node cloning we hash _idx of the origin of the node that is cloned (_idx_clone_orig ) and the _clone_idx (cm.verify_insert_and_clone).
> CloneMap belongs to Compile and is created in CompilerWrapper.
>
> 2. In SuperWord optimization, after max_depth has been built, we hoist the loads.
> For each Load_X (subject of the hoisting) we find some Load_0 that has the same origin as Load_X but belongs to the first iteration; if the MemNode::Memory input of Load_0 is a memory Phi (collected previously in memory_slice), we set this Phi also as the MemNode::Memory input of Load_X. After this rebuild of the graph we restart the SuperWord optimization.
>
> The major routines here.
> - SuperWord::mark_generations: computes _ii_first (the index _clone_idx) of the nodes that have MemNode::Memory input coming from a phi node in some slice; computes list of the nodes in the first and last iterations of the loop.
> - SuperWord::hoist_loads_in_graph: for each memory slice (a phi node) visits each load that has this phi as a memory input and then for each other load that has the same origin makes the memory input coming from the phi.  This routine does not use marking generations mechanism.
> - SuperWord::pack_parallel - this routine is called only if SuperWord fails to produce any pack after extend_packlist(); it is another algorithm for packing instructions into SIMD.
> It goes thru the list of all instructions in the _iteration_first, and if it is a Load, Store, Add or Mul it starts a new pack and adds this instruction to this pack. Then the algorithm circulates thru the iterations of the loop (gen < _ii_order.length()) and over the instructions list and finds the node with the origin coinciding with the origin of the nodes already in the pack - then it adds this node to the pack. Once packs are built, SuperWord returns to the normal processing (combine_packs()).
>
> Note that neither 2 nor 3 goes through the data dependency analysis, since the correctness of parallelization was guaranteed by the user.
> Note that some checks in the added code could be omitted. But we do not assume that no optimization (now or in the future) can change the graph structure between loop unrolling and SuperWord, so we prefer to run many (probably unneeded) checks in SuperWord.
>
>
>
>>> Can you also utilize changes done by Michael Berg for reduction
>>> optimization (the code in jdk9/hs-comp already)? I mean marking some
>>> nodes before unrolling and searching Phis.
> Michael Berg and I looked at what you suggested (phi marking before unroll?) and we both think this marking is very different than what I do: Michael's phi marking is just one bit per node (in the flags) whereas I collect the _idx_clone_orig and the unroll generation (clone_idx).
>
>
>   -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Thursday, April 16, 2015 3:31 PM
> To: Civlin, Jan; hotspot-compiler-dev at openjdk.java.net
> Subject: Re: RFR(S): 8076284: Improve vectorization of parallel streams
>
> Hi Jan,
>
> You did not describe your changes in detail (what they do).
>
> The IgnoreVectorizeMethod flag should be positive and enabled by default.
> Rename it to AllowVectorizeOnDemand (or something similar):
>
> +  product(bool, AllowVectorizeOnDemand, true,
>        \
>
> Instead of next you should add intrinsic definition to
> and classfile/vmSymbols.hpp and then check method()->intrinsic_id():
>
> +    if (strcmp("forEachRemaining", method()->name()->as_quoted_ascii())
> == 0 && method()->signature() != 0
> +      && method()->signature()->as_symbol() != 0 &&
> method()->signature()->as_symbol()->as_quoted_ascii() != 0 ) {
> +      if
> (strstr(method()->signature()->as_symbol()->as_quoted_ascii(),"Ljava/util/function/IntConsumer"))
> {
> +        set_do_vector_loop(true);
> +      }
> +    }
>
> And that should be under flag too because in general forEachRemaining
> should be vectorized only if it is safe.
>
> Can you also utilize changes done by Michael Berg for reduction
> optimization (the code in jdk9/hs-comp already)? I mean marking some
> nodes before unrolling and searching Phis.
>
> Regards,
> Vladimir
>
> On 4/13/15 3:33 AM, Civlin, Jan wrote:
>> Hi All,
>>
>>
>>    We would like to contribute the improvement of vectorization of
>>    parallel streams  from Intel.
>>
>> The contribution Bug ID: 8076284.
>>
>> Please review this patch:
>>
>> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8076284
>>
>> webrev: http://cr.openjdk.java.net/~kvn/8076284/webrev/
>>
>>
>>        *Description*
>>
>> Improve vectorization of the unordered parallel streams (by vectorizing
>> forEachRemaining method).
>>
>> For example, this forEach will be vectorized:
>>
>> java.util.stream.IntStream iStream = java.util.stream.IntStream.range(0,
>> RANGE - 1).parallel();
>>
>> iStream.forEach( id -> c[id] = c[id] + c[id+1] );
>>
>> It also enables on-demand loop vectorization in a given method (by
>> providing more hints to SuperWord optimization).
>>
>> For example, use -XX:CompileCommand=option,computeCall,Vectorize to
>> vectorize this loop
>>
>> void computeCall(double [] Call, double  puByDf, double  pdByDf)
>> {
>>     for(int i = timeStep; i > 0; i--)
>>         for(int j = 0; j <= i - 1; j++)
>>             Call[j] = puByDf * Call[j + 1] + pdByDf * Call[j];
>> }
>>
>>
>> This enhancement is contributed by Intel and sponsored by the hotspot
>> compiler team.
>>
>
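
For readers who want to try the computeCall loop quoted above, here is a
minimal self-contained sketch. The class name, the explicit timeStep
parameter (the quoted snippet reads it from the enclosing scope), and the
input values are assumptions made for illustration only; the loop body is
the one from the quoted message. The Vectorize option itself is only
available in a VM built with the patch under review.

```java
/** Self-contained version of the quoted loop; names and inputs are illustrative. */
final class ComputeCallSketch {

    /**
     * Backward sweep over the array, as in the quoted example.
     * Requires call.length > timeStep because the loop reads call[j + 1].
     */
    static void computeCall(double[] call, int timeStep, double puByDf, double pdByDf) {
        for (int i = timeStep; i > 0; i--) {
            for (int j = 0; j <= i - 1; j++) {
                // call[j + 1] has not been overwritten yet in this inner sweep
                call[j] = puByDf * call[j + 1] + pdByDf * call[j];
            }
        }
    }

    public static void main(String[] args) {
        double[] call = {1.0, 2.0, 3.0};
        computeCall(call, 2, 0.5, 0.5);
        System.out.println(call[0]);
    }
}
```

The inner loop only reads elements at index j and j + 1 and writes index j,
which is the access pattern the on-demand vectorization is aimed at.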

From vladimir.x.ivanov at oracle.com  Fri Apr 24 23:43:04 2015
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Sat, 25 Apr 2015 02:43:04 +0300
Subject: [9] RFR (S): 8059241: Incremental inlining is too hot when compiling
	Nashorn/Octane
Message-ID: <553AD508.8010609@oracle.com>

http://cr.openjdk.java.net/~vlivanov/8059241/webrev.00
https://bugs.openjdk.java.net/browse/JDK-8059241

According to -XX:+CITime, C2 spends too much time in incremental 
inlining (see the bug for the numbers).

2 observations:
   * PhaseRemoveUseless is performed too frequently (for every 
successful inlining tree) and it becomes more expensive the larger IR is 
(linear complexity);

   * Inlining happens in smaller steps the closer live node count is to 
LiveNodeCountInliningCutoff.

The fix is two-fold:

  (1) Reduce PhaseRemoveUseless frequency: inline in larger chunks until 
IR size reaches LiveNodeCountInliningCutoff, then eliminate dead nodes.

  (2) Have a relatively small (10%) gap between 
LiveNodeCountInliningCutoff and actual limit when inlining step is 
finished to give the algorithm some space to "breath" (hence smallest 
inlining chunk produce at least 10%*LiveNodeCountInliningCutoff nodes).

It leads to significant reduction in incremental inline times (e.g. 
Box2D: Prune Useless: 22s -> 2.2s).
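
The effect of (1) can be illustrated with a toy model. This is only a
sketch, not the C2 code: the class, the method names, and the node counts
are made up for illustration, and the 10% gap from (2) is not modeled. The
one property taken from the description above is that a cleanup pass costs
time linear in the current IR size.

```java
/** Toy model of batching dead-node cleanup during incremental inlining. Not HotSpot code. */
final class InliningBatchSketch {

    /** Outcome of one simulated compilation. */
    static final class Result {
        final int cleanups;      // number of cleanup passes performed
        final long cleanupWork;  // total nodes visited by all cleanup passes
        final int liveNodes;     // live nodes when inlining stopped

        Result(int cleanups, long cleanupWork, int liveNodes) {
            this.cleanups = cleanups;
            this.cleanupWork = cleanupWork;
            this.liveNodes = liveNodes;
        }
    }

    /**
     * Each inlined call site adds liveAdded live nodes and deadAdded dead nodes.
     * With batch == false a cleanup pass runs after every inlining; with
     * batch == true it runs only once live + dead nodes reach the cutoff.
     */
    static Result simulate(int startLive, int callSites, int liveAdded, int deadAdded,
                           int cutoff, boolean batch) {
        int live = startLive;
        int dead = 0;
        int cleanups = 0;
        long work = 0;
        for (int i = 0; i < callSites && live < cutoff; i++) {
            live += liveAdded;
            dead += deadAdded;
            if (!batch || live + dead >= cutoff) {
                work += live + dead;  // a cleanup pass is linear in IR size
                dead = 0;
                cleanups++;
            }
        }
        if (dead > 0) {               // final pass for whatever is left over
            work += live + dead;
            cleanups++;
        }
        return new Result(cleanups, work, live);
    }

    public static void main(String[] args) {
        Result eager = simulate(1000, 10, 100, 50, 100_000, false);
        Result batched = simulate(1000, 10, 100, 50, 100_000, true);
        System.out.println("eager:   " + eager.cleanups + " passes, "
                + eager.cleanupWork + " nodes visited");
        System.out.println("batched: " + batched.cleanups + " passes, "
                + batched.cleanupWork + " nodes visited");
    }
}
```

Both runs end with the same number of live nodes; only the number of
cleanup passes and the total nodes they visit differ.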

Testing: octane/nashorn (w/ -XX:+CITime)

Best regards,
Vladimir Ivanov

From john.r.rose at oracle.com  Fri Apr 24 23:55:51 2015
From: john.r.rose at oracle.com (John Rose)
Date: Fri, 24 Apr 2015 16:55:51 -0700
Subject: [9] RFR (S): 8059241: Incremental inlining is too hot when
	compiling Nashorn/Octane
In-Reply-To: <553AD508.8010609@oracle.com>
References: <553AD508.8010609@oracle.com>
Message-ID: <D43F0717-BE08-464D-96C7-844E2A535A03@oracle.com>

Nice tweak; reads better too.  s/breath/breathe/.  Reviewed.  -- John

On Apr 24, 2015, at 4:43 PM, Vladimir Ivanov <vladimir.x.ivanov at oracle.com> wrote:
> 
> http://cr.openjdk.java.net/~vlivanov/8059241/webrev.00


From aleksey.shipilev at oracle.com  Sat Apr 25 00:13:16 2015
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Sat, 25 Apr 2015 03:13:16 +0300
Subject: [9] RFR (S): 8059241: Incremental inlining is too hot when
	compiling Nashorn/Octane
In-Reply-To: <553AD508.8010609@oracle.com>
References: <553AD508.8010609@oracle.com>
Message-ID: <553ADC1C.6090805@oracle.com>

On 04/25/2015 02:43 AM, Vladimir Ivanov wrote:
> http://cr.openjdk.java.net/~vlivanov/8059241/webrev.00
> https://bugs.openjdk.java.net/browse/JDK-8059241

Thanks, confirmed, seeing the same good improvement in "Pruning" times.
Box2D warmup seems also faster.

-Aleksey.

P.S. See, there are simple hanging fruits. In fact, most new code has
lots of simple things to tweak up.



From roland.westrelin at oracle.com  Mon Apr 27 07:39:43 2015
From: roland.westrelin at oracle.com (Roland Westrelin)
Date: Mon, 27 Apr 2015 09:39:43 +0200
Subject: [9] RFR (S): 8059241: Incremental inlining is too hot when
	compiling Nashorn/Octane
In-Reply-To: <553AD508.8010609@oracle.com>
References: <553AD508.8010609@oracle.com>
Message-ID: <8A6D4EA1-6527-41D9-9DBB-C707C84FECCB@oracle.com>

> http://cr.openjdk.java.net/~vlivanov/8059241/webrev.00

That looks good to me.

Roland.


From roland.westrelin at oracle.com  Mon Apr 27 07:52:43 2015
From: roland.westrelin at oracle.com (Roland Westrelin)
Date: Mon, 27 Apr 2015 09:52:43 +0200
Subject: RFR(S) 8077504: Unsafe load can loose control dependency and
	cause crash
In-Reply-To: <553A88F3.2010700@oracle.com>
References: <42EDECBF-2347-406A-8835-29A98F1CA153@oracle.com>
	<552FC216.4010503@redhat.com>
	<95F149F4-7BE2-45C8-A3C0-DEACFE42ACDF@oracle.com>
	<6AC5258A-A40C-4121-8B1F-0076713345D4@oracle.com>
	<553A88F3.2010700@oracle.com>
Message-ID: <CDF55084-CD74-4195-B596-37E7935FCB62@oracle.com>

Thanks for the review, Vladimir. See below.

> I agree that we have to pass parameter to GraphKit::make_load().
> I thought we can avoid it for LoadNode::make() but it has transform for compressed oops. AARGH!
> 
> Add comment to code in library_call.cpp why we set flag to false.
> 
> BTW, should we modify LoadNode::hash() to include _depends_only_on_test and prevent igvning?

If the graph already has a non-pinned LoadNode and we add a pinned LoadNode with the same inputs, it's safe for GVN to replace the pinned LoadNode by the non-pinned LoadNode, otherwise it wouldn't be safe to have a non-pinned LoadNode with those inputs in the first place. If the graph already has a pinned LoadNode and we add a non-pinned LoadNode with the same inputs, it's safe for GVN to replace the non-pinned LoadNode by the pinned LoadNode. It's also suboptimal but better than 2 LoadNodes?

I guess we could change LoadNode::hash() and then use LoadNode::Identity/Ideal to make sure the pinned LoadNode is always replaced by the non-pinned LoadNode in the scenarios above but that sounds like extra complexity for something that, as far as we know, never happens in practice.
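
To make the two scenarios concrete, here is a toy value-numbering table.
It is only a sketch under stated assumptions, not the C2 implementation:
the Load and ValueTable classes are hypothetical, inputs are modeled as
plain integer ids, and the "pinned" flag stands in for
_depends_only_on_test being false. The key point it models is that the
flag is left out of the hash, so whichever load is registered first wins.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Toy model of value numbering for loads; not HotSpot code. */
final class LoadGvnSketch {

    /** A load identified by its input node ids; 'pinned' is carried but not hashed. */
    static final class Load {
        final int[] inputs;
        final boolean pinned;

        Load(boolean pinned, int... inputs) {
            this.pinned = pinned;
            this.inputs = inputs.clone();
        }

        @Override
        public int hashCode() {
            return Arrays.hashCode(inputs);  // 'pinned' deliberately left out
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Load && Arrays.equals(inputs, ((Load) o).inputs);
        }
    }

    /** Returns the load already registered for the same inputs, or registers n. */
    static final class ValueTable {
        private final Map<Load, Load> table = new HashMap<>();

        Load transform(Load n) {
            Load existing = table.putIfAbsent(n, n);
            return existing != null ? existing : n;
        }
    }

    public static void main(String[] args) {
        ValueTable gvn = new ValueTable();
        Load plain = gvn.transform(new Load(false, 1, 2, 3));
        Load merged = gvn.transform(new Load(true, 1, 2, 3));
        System.out.println("pinned load merged into existing load: " + (merged == plain));
        System.out.println("surviving load is pinned: " + merged.pinned);
    }
}
```

Registering the pinned load first and the non-pinned load second makes the
pinned one survive instead, which is the suboptimal but safe case.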

Roland.


> 
> Thanks,
> Vladimir
> 
> On 4/24/15 1:03 AM, Roland Westrelin wrote:
>> 
>>> Vladimir suggested privately to set _depends_only_on_test to true in the constructor and then use an explicit call to a new method set_depends_only_on_test() to set it to false in the rare cases where it's needed. That feels better indeed. What do you think?
>> 
>> Actually, using a set_depends_only_on_test() method doesn't work well. In LibraryCallKit::inline_unsafe_access() the node returned by make_load() may have been transformed already and we could call set_depends_only_on_test() on a node that doesn't need to be pinned. The call to set_depends_only_on_test() would have to be in LoadNode::make(). I went with default parameters instead to keep the change small:
>> 
>> http://cr.openjdk.java.net/~roland/8077504/webrev.01/
>> 
>> Roland.
>> 


From vladimir.x.ivanov at oracle.com  Mon Apr 27 08:14:49 2015
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Mon, 27 Apr 2015 11:14:49 +0300
Subject: [9] RFR (S): 8059241: Incremental inlining is too hot when
	compiling Nashorn/Octane
In-Reply-To: <8A6D4EA1-6527-41D9-9DBB-C707C84FECCB@oracle.com>
References: <553AD508.8010609@oracle.com>
	<8A6D4EA1-6527-41D9-9DBB-C707C84FECCB@oracle.com>
Message-ID: <553DEFF9.90801@oracle.com>

Thanks for reviews, John, Roland, and Aleksey.

Best regards,
Vladimir Ivanov

On 4/27/15 10:39 AM, Roland Westrelin wrote:
>> http://cr.openjdk.java.net/~vlivanov/8059241/webrev.00
>
> That looks good to me.
>
> Roland.
>

From zoltan.majo at oracle.com  Mon Apr 27 08:45:11 2015
From: zoltan.majo at oracle.com (=?windows-1252?Q?Zolt=E1n_Maj=F3?=)
Date: Mon, 27 Apr 2015 10:45:11 +0200
Subject: [9] RFR(S): 8068945: Use RBP register as proper frame pointer
	in JIT compiled code on x86
In-Reply-To: <553A8BCD.4010506@oracle.com>
References: <55156A87.1070607@oracle.com>	<1427706703.1606.22.camel@mylittlepony.linaroharston>	<55196C2C.8080106@oracle.com>	<5519B1AE.8070901@oracle.com>	<5519BC6E.1090504@oracle.com>	<5519C29D.8080200@oracle.com>	<551BF4D3.90805@oracle.com>	<5537CBAE.9020500@oracle.com>	<553816F9.1040104@oracle.com>	<553A2DC8.3080804@oracle.com>	<553A6ECE.8090908@oracle.com>	<553A7B9A.5060407@oracle.com>	<553A82D9.1090500@oracle.com>
	<553A8BCD.4010506@oracle.com>
Message-ID: <553DF717.8040603@oracle.com>

Thank you for the reviews, Vladimir, Roland, Dean, Ed, Vitaly, and Aleksey!

Best regards,


Zoltan

On 04/24/2015 08:30 PM, Zoltán Majó wrote:
> Hi Vladimir,
>
>
> On 04/24/2015 07:52 PM, Vladimir Kozlov wrote:
>> > - (2) the assert is there only for SunStudio (according to the comments
>> > on lines 439--442 in vframe.cpp) and AsyncGetCallTrace (the only place
>> > where forte_fill_call_trace_given_top() is called) does not expect debug
>> > information to be available for all frames, it just needs the jmethodID for
>> > every frame.
>>
>> This is an incorrect interpretation of the comments. The assert is in 
>> general code to catch an incorrect bci or pcoffset during vframe 
>> construction. The comment says that instead of having several asserts 
>> we have only one for all these cases to simplify disabling it when 
>> running the performance analyzer.
>
> Thank you for the clarification!
>
>> So you can only skip it when you run with performance analyzer but 
>> not during general execution.
>>
>> Maybe you can set or pass a flag when vframe construction is called 
>> from forte and skip the assert in such a case. I assume you hit it only 
>> with the performance analyzer. Right?
>
> You are right. The assert can be triggered only if vframe construction 
> is called from forte, but not otherwise. So setting the command-line 
> flag -XX:SuppressErrorAt=vframe.cpp:443 (as suggested in vframe.cpp 
> lines 439--442) is sufficient to skip the assert when running a 
> fastdebug VM with the performance analyzer.
>
> I'll leave the assert enabled and intend to push webrev.03 then.
>
> Thank you and best regards,
>
>
> Zoltan
>
>>
>> Thanks,
>> Vladimir
>>
>> On 4/24/15 10:21 AM, Zoltán Majó wrote:
>>> Hi Vladimir,
>>>
>>>
>>> On 04/24/2015 06:26 PM, Vladimir Kozlov wrote:
>>>> Yes, this looks good. Nice work and thank you for testing.
>>>
>>> Thank you for all the feedback you've provided while I was working on
>>> this issue!
>>>
>>> I would have one final question: Can I disable the assert in
>>> vframeStreamCommon::found_bad_method_frame() on line 443 in vframe.cpp?
>>> I would prefer that for the reasons I mentioned in my previous message
>>> (please see also below).
>>>
>>> Here is an updated webrev (with the assert disabled):
>>> http://cr.openjdk.java.net/~zmajo/8068945/webrev.04/
>>>
>>> Could you please let me know which webrev I can push, webrev.03 with 
>>> the
>>> assert left enabled or webrev.04 with the assert disabled?
>>>
>>> Thank you and best regards,
>>>
>>>
>>> Zoltan
>>>
>>>>> I ran the experiments *with the assert enabled* in
>>>>> vframeStreamCommon::found_bad_method_frame(). The assert was not
>>>>> triggered.
>>>>>
>>>>> I would be, however, inclined to disable that assert because:
>>>>> - (1) it is better to have some information in stack traces than no
>>>>> information and a crash (even though the crash happens in a fastdebug
>>>>> build);
>>>>> - (2) the assert is there only for SunStudio (according to the comments
>>>>> on lines 439--442 in vframe.cpp) and AsyncGetCallTrace (the only place
>>>>> where forte_fill_call_trace_given_top() is called) does not expect debug
>>>>> information to be available for all frames, it just needs the jmethodID for
>>>>> every frame.
>>>>>
>>>>>> Do you mean "can NOT happen"?:
>>>>>> > Moreover, if the stack is walked synchronously (e.g., at
>>>>>> safepoints), no
>>>>>> > problems appear either, because the synchronous interruption can
>>>>>> happen
>>>>>> > while execution is within the method handle intrinsic.
>>>>>
>>>>> Yes, I meant that it can *not* happen. Sorry for the confusion.
>>>>>
>>>>> Thank you and best regards,
>>>>>
>>>>>
>>>>> Zoltan
>>>>>
>>>>>>
>>>>>> Vladimir
>>>>>>
>>>>>> On 4/22/15 9:26 AM, Zoltán Majó wrote:
>>>>>>> Hi Vladimir,
>>>>>>>
>>>>>>>
>>>>>>> I managed to do some more work on this enhancement. Please see 
>>>>>>> details
>>>>>>> below.
>>>>>>>
>>>>>>> On 04/01/2015 03:38 PM, Zoltán Majó wrote:
>>>>>>>> Hi Vladimir,
>>>>>>>>
>>>>>>>>
>>>>>>>> On 03/30/2015 11:39 PM, Vladimir Kozlov wrote:
>>>>>>>>> On 3/30/15 2:13 PM, Zoltán Majó wrote:
>>>>>>>>>> Hi Vladimir,
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> thank you for the feedback!
>>>>>>>>>>
>>>>>>>>>> On 03/30/2015 10:27 PM, Vladimir Kozlov wrote:
>>>>>>>>>>> How about PreserveFramePointer instead of simple FramePointer?
>>>>>>>>>>>
>>>>>>>>>>> PreserveFramePointer will mean that compiled (or other) code 
>>>>>>>>>>> will
>>>>>>>>>>> use
>>>>>>>>>>> that register only as Frame pointer.
>>>>>>>>>>
>>>>>>>>>> I will change the flag's name to PreserveFramePointer and 
>>>>>>>>>> will also
>>>>>>>>>> update the description.
>>>>>>>
>>>>>>> I changed the flag's name to PreserveFramePointer, just as you
>>>>>>> suggested.
>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Zoltan, x86 flags setting should be in general globals_x86.hpp.
>>>>>>>>>>> You
>>>>>>>>>>> can #ifdef _LP64 there too. I don't understand why you only set
>>>>>>>>>>> it to
>>>>>>>>>>> true on linux-x64.
>>>>>>>>>>
>>>>>>>>>> I remembered that the original discussion with Brendan Gregg
>>>>>>>>>> mentioned
>>>>>>>>>> only Linux's perf tool as a possible use case for "proper" frame
>>>>>>>>>> pointers. So I was unsure whether to enable proper frame
>>>>>>>>>> pointers by
>>>>>>>>>> default on other x64 platforms as well.
>>>>>>>>>>
>>>>>>>>>> But if you think it would be better to have proper frame
>>>>>>>>>> pointers on
>>>>>>>>>> all
>>>>>>>>>> x64 platforms, I will change the code to set
>>>>>>>>>> PreserveFramePointer to
>>>>>>>>>> true for all x64 platforms. Just please let me know.
>>>>>>>
>>>>>>> The current webrev sets the PreserveFramePointer flag to true on all
>>>>>>> x86_64 platforms and to false on all other platforms.
>>>>>>>
>>>>>>>>>
>>>>>>>>> Currently compiled code for all x86 platforms is almost the same
>>>>>>>>> (win64 has difference in registers usage) and we should keep 
>>>>>>>>> it that
>>>>>>>>> way.
>>>>>>>>>
>>>>>>>>> Also the original request was to have flag to enable such 
>>>>>>>>> behavior
>>>>>>>>> (use RBP only as FP). So to have it off by default is 
>>>>>>>>> acceptable. If
>>>>>>>>> performance group or someone find a regression (or bug) due to 
>>>>>>>>> this
>>>>>>>>> change we can switch the flag off by default before jdk9 release.
>>>>>>>>>
>>>>>>>>> Try to run pstack on Solaris and jstack on OSX to make sure they
>>>>>>>>> report correct call stack with compiled java methods. And JFR.
>>>>>>>>> Also it would be nice to run SunStudio analyzer to verify that it
>>>>>>>>> works.
>>>>>>>>
>>>>>>>> I ran all tools you've suggested. JFR and jstack are unaffected,
>>>>>>>> pstack
>>>>>>>> produces nice stack traces (it did not always do so before).
>>>>>>>
>>>>>>> I tested the current webrev with the following setup: I used two
>>>>>>> tests,
>>>>>>> one that generates a long chain of lambda form invocations and
>>>>>>> another
>>>>>>> one that generates a long chain of "regular" method invocations.
>>>>>>> Both
>>>>>>> tests were executed on an x64 machine in four configurations: 
>>>>>>> with +/-
>>>>>>> Xcomp and with +/- PreserveFramePointer.
>>>>>>>
>>>>>>> Just as before, JFR and jstack stack traces are unaffected for both
>>>>>>> tests, pstack can now produce stack traces with both tests if
>>>>>>> PreserveFramePointer is enabled.
>>>>>>>
>>>>>>>> However, I've encountered a problem with SunStudio: Two asserts 
>>>>>>>> fail
>>>>>>>> in the fastdebug build. Both of them are "soft" failures, as
>>>>>>>> neither the
>>>>>>>> VM nor SunStudio crash with the product build. I worked on the
>>>>>>>> problem
>>>>>>>> today and have a partial understanding of the issue, but more
>>>>>>>> investigation is needed to have a patch that preserves the correct
>>>>>>>> behavior of SunStudio as well.
>>>>>>>
>>>>>>> I was able to track down the problems with SunStudio. I had to 
>>>>>>> change
>>>>>>> the code at two places.
>>>>>>>
>>>>>>>
>>>>>>> Change #1 (in src/cpu/x86/vm/frame_x86.cpp):
>>>>>>>
>>>>>>> *** 222,232 ****
>>>>>>>        }
>>>>>>>
>>>>>>>        if (sender_blob->is_nmethod()) {
>>>>>>>            nmethod* nm = sender_blob->as_nmethod_or_null();
>>>>>>>            if (nm != NULL) {
>>>>>>> !             if (nm->is_deopt_mh_entry(sender_pc) ||
>>>>>>> nm->is_deopt_entry(sender_pc)) {
>>>>>>>                    return false;
>>>>>>>                }
>>>>>>>            }
>>>>>>>        }
>>>>>>>
>>>>>>> --- 222,233 ----
>>>>>>>        }
>>>>>>>
>>>>>>>        if (sender_blob->is_nmethod()) {
>>>>>>>            nmethod* nm = sender_blob->as_nmethod_or_null();
>>>>>>>            if (nm != NULL) {
>>>>>>> !             if (nm->is_deopt_mh_entry(sender_pc) ||
>>>>>>> nm->is_deopt_entry(sender_pc) ||
>>>>>>> ! nm->method()->is_method_handle_intrinsic()) {
>>>>>>>                    return false;
>>>>>>>                }
>>>>>>>            }
>>>>>>>        }
>>>>>>>
>>>>>>> The reason for this change is the following. Method handle 
>>>>>>> intrinsics
>>>>>>> (i.e., the intrinsics _invokeBasic, _linkToVirtual,_linkToStatic,
>>>>>>> _linkToSpecial, and _linkToInterface) do not allocate stack 
>>>>>>> space when
>>>>>>> invoked, but they can extend the stack space of their caller
>>>>>>> "temporarily".
>>>>>>>
>>>>>>> For example, if VerifyMethodHandles is enabled, some stack space is
>>>>>>> used
>>>>>>> during verification. The temporarily used stack space is released
>>>>>>> before
>>>>>>> the intrinsic jumps to its target. As a result, the target of a 
>>>>>>> method
>>>>>>> handle intrinsic will have a correct SP when it returns and the
>>>>>>> program's control flow is correct.
>>>>>>>
>>>>>>> Moreover, if the stack is walked synchronously (e.g., at
>>>>>>> safepoints), no
>>>>>>> problems appear either, because the synchronous interruption can
>>>>>>> happen
>>>>>>> while execution is within the method handle intrinsic.
>>>>>>>
>>>>>>> The problem is that the SunStudio analyzer can interrupt the VM
>>>>>>> asynchronously and walk the stack. If execution of a thread is
>>>>>>> interrupted while the thread is in a method handle intrinsic, 
>>>>>>> the SP
>>>>>>> might contain an invalid value.
>>>>>>>
>>>>>>> The new webrev adds a check that marks the current frame unsafe for
>>>>>>> sender if the frame belongs to a method handle intrinsic
>>>>>>> (frame::safe_for_sender returns false in this case).
>>>>>>>
>>>>>>>
>>>>>>> Change #2 (in src/share/vm/prims/forte.cpp):
>>>>>>>
>>>>>>> *** 425,435 ****
>>>>>>>
>>>>>>>        RegisterMap map(thd, false);
>>>>>>>        initial_Java_frame = initial_Java_frame.sender(&map);
>>>>>>>      }
>>>>>>>
>>>>>>> !   vframeStreamForte st(thd, initial_Java_frame, false);
>>>>>>>
>>>>>>>      for (; !st.at_end() && count < depth; st.forte_next(), 
>>>>>>> count++) {
>>>>>>>        bci = st.bci();
>>>>>>>        method = st.method();
>>>>>>>
>>>>>>> --- 425,435 ----
>>>>>>>
>>>>>>>        RegisterMap map(thd, false);
>>>>>>>        initial_Java_frame = initial_Java_frame.sender(&map);
>>>>>>>      }
>>>>>>>
>>>>>>> !   vframeStreamForte st(thd, initial_Java_frame, true);
>>>>>>>
>>>>>>>      for (; !st.at_end() && count < depth; st.forte_next(), 
>>>>>>> count++) {
>>>>>>>        bci = st.bci();
>>>>>>>        method = st.method();
>>>>>>>
>>>>>>> The problem is that the following assert in forte.cpp on line 103
>>>>>>>
>>>>>>> assert(filled_in, "invariant");
>>>>>>>
>>>>>>> fails. The problem appears if we have a stack trace like:
>>>>>>>
>>>>>>> V  [libjvm.so+0x1c98c4a]  void VMError::report(outputStream*)+0xb1a
>>>>>>> V  [libjvm.so+0x1c9a3e8]  void VMError::report_and_die()+0x748
>>>>>>> V  [libjvm.so+0x1003c8e]  void report_vm_error(const char*,int,const char*,const char*)+0x7e
>>>>>>> V  [libjvm.so+0x10efa22]  vframeStreamForte::vframeStreamForte #Nvariant 1(JavaThread*,frame,bool)+0xe2
>>>>>>> --> (Frame #5) V  [libjvm.so+0x10f0bb9]  void forte_fill_call_trace_given_top(JavaThread*,ASGCT_CallTrace*,int,frame)+0x789
>>>>>>> V  [libjvm.so+0x10f1436]  AsyncGetCallTrace+0x246
>>>>>>> C  [libcollector.so+0x272a8] __collector_ext_jstack_unwind+0xb8
>>>>>>> C  [libcollector.so+0x277df] __collector_get_frame_info+0x27f
>>>>>>> C  [libcollector.so+0x2f093] __collector_getUserCtx+0x13
>>>>>>> C  [libcollector.so+0x1abc7] __collector_ext_profile_handler+0x127
>>>>>>> C  [libcollector.so+0x17535] collector_sigprof_dispatcher+0x85
>>>>>>> C  [libc.so.1+0x122476]  __sighndlr+0x6
>>>>>>> C  [libc.so.1+0x115972]  call_user_handler+0x2ce
>>>>>>> C  [libc.so.1+0x115e1b]  sigacthandler+0xdb
>>>>>>> C  0xffffffffffffffff
>>>>>>> V  [libjvm.so+0x1959089]  void os::PlatformEvent::park()+0xd9
>>>>>>> V  [libjvm.so+0x18a6b34]  int ParkCommon(ParkEvent*,long)+0x34
>>>>>>> V  [libjvm.so+0x18a7657]  int Monitor::IWait(Thread*,long)+0xb7
>>>>>>> V  [libjvm.so+0x18a8b86]  bool Monitor::wait(bool,long,bool)+0x346
>>>>>>> V  [libjvm.so+0xf7073d]  void CompileBroker::wait_for_completion(CompileTask*)+0xad
>>>>>>> V  [libjvm.so+0xf6f6b6]  void CompileBroker::compile_method_base(methodHandle,int,int,methodHandle,int,const char*,Thread*)+0x406
>>>>>>> V  [libjvm.so+0xf6fd96]  nmethod*CompileBroker::compile_method(methodHandle,int,int,methodHandle,int,const char*,Thread*)+0x586
>>>>>>> V  [libjvm.so+0xbbad72]  void AdvancedThresholdPolicy::submit_compile(methodHandle,int,CompLevel,JavaThread*)+0xb2
>>>>>>> V  [libjvm.so+0x1aef92f]  void SimpleThresholdPolicy::compile(methodHandle,int,CompLevel,JavaThread*)+0x14f
>>>>>>> V  [libjvm.so+0xbbb00f]  void AdvancedThresholdPolicy::method_invocation_event(methodHandle,methodHandle,CompLevel,nmethod*,JavaThread*)+0x1ff
>>>>>>> V  [libjvm.so+0x1aef765]  nmethod*SimpleThresholdPolicy::event(methodHandle,methodHandle,int,int,CompLevel,nmethod*,JavaThread*)+0x2e5
>>>>>>> V  [libjvm.so+0xdb93bc]  unsigned char*Runtime1::counter_overflow(JavaThread*,int,Method*)+0x31c
>>>>>>> v  ~RuntimeStub::counter_overflow Runtime1 stub
>>>>>>> --> (Frame #29) J 143 C1 java.net.URLClassLoader$1.run()Ljava/lang/Object; (5 bytes) @ 0xffff80ffacc6962a [0xffff80ffacc69580+0xaa]
>>>>>>> --> (Frame #30) v  ~StubRoutines::call_stub
>>>>>>> V  [libjvm.so+0x13ca50b]  void JavaCalls::call_helper(JavaValue*,methodHandle*,JavaCallArguments*,Thread*)+0x41b
>>>>>>> V  [libjvm.so+0x152a111]  JVM_DoPrivileged+0xfb1
>>>>>>> C  [libjava.so+0x12f42]  Java_java_security_AccessController_doPrivileged__Ljava_security_PrivilegedExceptionAction_2Ljava_security_AccessControlContext_2+0x12
>>>>>>> J 142 java.security.AccessController.doPrivileged(Ljava/security/PrivilegedExceptionAction;Ljava/security/AccessControlContext;)Ljava/lang/Object; (0 bytes) @ 0xffff80ffb400c57c [0xffff80ffb400c420+0x15c]
>>>>>>> J 134 C1 java.net.URLClassLoader.findClass(Ljava/lang/String;)Ljava/lang/Class; (47 bytes) @ 0xffff80ffacc66014 [0xffff80ffacc65ec0+0x154]
>>>>>>> ... more stack frames
>>>>>>>
>>>>>>> The forte_fill_call_trace_given_top() method (Frame #5) first
>>>>>>> checks if
>>>>>>> the first Java frame found is fully decipherable (line 395 in
>>>>>>> forte.cpp). In our case the first Java frame is Frame #29 (the
>>>>>>> C1-compiled version of java.net.URLClassLoader$1.run).
>>>>>>>
>>>>>>> In our case Frame #29 is not decipherable, because
>>>>>>> java.net.URLClassLoader$1.run has been made "not entrant" (a
>>>>>>> C2-compiled
>>>>>>> version of the same method has been produced shortly before).
>>>>>>>
>>>>>>> Afterwards, forte_fill_call_trace_given_top() checks if the 
>>>>>>> method is
>>>>>>> "safe for sender" (line 424 in forte.cpp). The caller of the
>>>>>>> java.net.URLClassLoader$1.run method is ~StubRoutines::call_stub,
>>>>>>> which
>>>>>>> is considered "safe for sender" by the VM.
>>>>>>>
>>>>>>> Then, initial_Java_frame is set to the ~StubRoutines::call_stub 
>>>>>>> stub
>>>>>>> (line 430). This does not seem to be correct because the stub is 
>>>>>>> not a
>>>>>>> Java method and causes the assert(filled_in, "invariant") in the
>>>>>>> constructor of vframeStreamForte (line 103 in forte.cpp) to fail
>>>>>>> (because the frame cannot be filled from a stub).
>>>>>>>
>>>>>>> To avoid this failure, I propose to call the constructor of
>>>>>>> vframeStreamForte with parameter stop_at_java_call_stub set to true
>>>>>>> (instead of false) so that the VM stops walking the stack if a call
>>>>>>> stub
>>>>>>> has been reached.
>>>>>>>
>>>>>>>
>>>>>>> Here is the updated webrev:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~zmajo/8068945/webrev.02/
>>>>>>>
>>>>>>> In addition to testing the changeset with the tools mentioned
>>>>>>> before, I
>>>>>>> executed
>>>>>>> - all JPRT tests, all pass;
>>>>>>> - all java/lang/invoke and compiler JTREG tests; all tests that 
>>>>>>> pass
>>>>>>> with the unmodified source tree pass with the changes as well.
>>>>>>>
>>>>>>> Thank you very much in advance!
>>>>>>>
>>>>>>> Best regards,
>>>>>>>
>>>>>>>
>>>>>>> Zoltan
>>>>>>>
>>>>>>>>
>>>>>>>> So that will put this RFR on hold for a while, unfortunately.
>>>>>>>>
>>>>>>>> Thank you for the feedback and suggestions so far!
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>>
>>>>>>>>
>>>>>>>> Zoltan
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Vladimir
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thank you!
>>>>>>>>>>
>>>>>>>>>> Best regards,
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Zoltan
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Vladimir
>>>>>>>>>>>
>>>>>>>>>>> On 3/30/15 8:30 AM, Zoltán Majó wrote:
>>>>>>>>>>>> Hi Ed,
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> thank you for your feedback! Please see comments below.
>>>>>>>>>>>>
>>>>>>>>>>>> On 03/30/2015 11:11 AM, Edward Nevill wrote:
>>>>>>>>>>>>> Hi Zoltán,
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, 2015-03-27 at 15:34 +0100, Zoltán Majó wrote:
>>>>>>>>>>>>>> Full JPRT run, all tests pass. I also ran all hotspot 
>>>>>>>>>>>>>> compiler
>>>>>>>>>>>>>> tests and
>>>>>>>>>>>>>> the jdk tests in java/lang/invoke on both x86_64 and x86_32.
>>>>>>>>>>>>>> All
>>>>>>>>>>>>>> tests
>>>>>>>>>>>>>> that pass without the patch pass also with the patch.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I ran the SPEC JVM 2008 benchmarks on our performance
>>>>>>>>>>>>>> infrastructure for
>>>>>>>>>>>>>> x86_64. The performance evaluation suggests that there is no
>>>>>>>>>>>>>> statistically significant performance degradation due to 
>>>>>>>>>>>>>> having
>>>>>>>>>>>>>> proper
>>>>>>>>>>>>>> frame pointers. Therefore I propose to have OmitFramePointer
>>>>>>>>>>>>>> set to
>>>>>>>>>>>>>> false by default on x86_64 (and set to true on all other
>>>>>>>>>>>>>> platforms).
>>>>>>>>>>>>> This patch looks good, however I think there is a problem
>>>>>>>>>>>>> with the
>>>>>>>>>>>>> logic of OmitFramePointer.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here is my test case.
>>>>>>>>>>>>>
>>>>>>>>>>>>> --- CUT HERE ---
>>>>>>>>>>>>> // $Id: fibo.java,v 1.2 2000/12/24 19:10:50 doug Exp $
>>>>>>>>>>>>> // http://www.bagley.org/~doug/shootout/
>>>>>>>>>>>>>
>>>>>>>>>>>>> public class fibo {
>>>>>>>>>>>>>      public static void main(String args[]) {
>>>>>>>>>>>>>     int N = Integer.parseInt(args[0]);
>>>>>>>>>>>>>     System.out.println(fib(N));
>>>>>>>>>>>>>      }
>>>>>>>>>>>>>      public static int fib(int n) {
>>>>>>>>>>>>>     if (n < 2) return(1);
>>>>>>>>>>>>>     return( fib(n-2) + fib(n-1) );
>>>>>>>>>>>>>      }
>>>>>>>>>>>>> }
>>>>>>>>>>>>> --- CUT HERE ---
>>>>>>>>>>>>>
>>>>>>>>>>>>> If I run it as follows on my x86 64 bit linux.
>>>>>>>>>>>>>
>>>>>>>>>>>>> /work/images/jdk/bin/java -XX:-TieredCompilation
>>>>>>>>>>>>> -XX:+PrintCompilation
>>>>>>>>>>>>> -XX:CompileOnly=fibo::fib -XX:+UnlockDiagnosticVMOptions
>>>>>>>>>>>>> -XX:-OmitFramePointer -XX:+PrintAssembly fibo 43
>>>>>>>>>>>>>
>>>>>>>>>>>>> I get
>>>>>>>>>>>>>
>>>>>>>>>>>>>    # {method} {0x00007fc62c97f388} 'fib' '(I)I' in 'fibo'
>>>>>>>>>>>>>    # parm0:    rsi       = int
>>>>>>>>>>>>>    #           [sp+0x30]  (sp of caller)
>>>>>>>>>>>>>    0x00007fc625071100: mov %eax,-0x14000(%rsp)
>>>>>>>>>>>>>    0x00007fc625071107: push   %rbp
>>>>>>>>>>>>>    0x00007fc625071108: mov    %rsp,%rbp
>>>>>>>>>>>>>    0x00007f836907110b: sub    $0x20,%rsp ;*synchronization 
>>>>>>>>>>>>> entry
>>>>>>>>>>>>>
>>>>>>>>>>>>> which is correct, it is NOT(-) OmitFramePointer, therefore 
>>>>>>>>>>>>> it is
>>>>>>>>>>>>> using
>>>>>>>>>>>>> the frame pointer
>>>>>>>>>>>>>
>>>>>>>>>>>>> Now if I try just changing -XX:-OmitFramePointer to
>>>>>>>>>>>>> -XX:+OmitFramePointer in the above I get
>>>>>>>>>>>>>
>>>>>>>>>>>>> /work/images/jdk/bin/java -XX:-TieredCompilation
>>>>>>>>>>>>> -XX:+PrintCompilation
>>>>>>>>>>>>> -XX:CompileOnly=fibo::fib -XX:+UnlockDiagnosticVMOptions
>>>>>>>>>>>>> -XX:+OmitFramePointer -XX:+PrintAssembly fibo 43
>>>>>>>>>>>>>
>>>>>>>>>>>>> I get
>>>>>>>>>>>>>
>>>>>>>>>>>>>    # {method} {0x00007f14d3c00388} 'fib' '(I)I' in 'fibo'
>>>>>>>>>>>>>    # parm0:    rsi       = int
>>>>>>>>>>>>>    #           [sp+0x30]  (sp of caller)
>>>>>>>>>>>>>    0x00007f14e1071100: mov %eax,-0x14000(%rsp)
>>>>>>>>>>>>>    0x00007f14e1071107: push   %rbp
>>>>>>>>>>>>>    0x00007f14e1071108: sub    $0x20,%rsp ;*synchronization 
>>>>>>>>>>>>> entry
>>>>>>>>>>>>>
>>>>>>>>>>>>> which is correct, it IS(+) OmitFramePointer, therefore it
>>>>>>>>>>>>> does
>>>>>>>>>>>>> not
>>>>>>>>>>>>> use a frame pointer.
>>>>>>>>>>>>>
>>>>>>>>>>>>> However, if I now delete the -XX:+/-OmitFramePointer
>>>>>>>>>>>>> altogether, IE
>>>>>>>>>>>>>
>>>>>>>>>>>>> /work/images/jdk/bin/java -XX:-TieredCompilation
>>>>>>>>>>>>> -XX:+PrintCompilation
>>>>>>>>>>>>> -XX:CompileOnly=fibo::fib -XX:+UnlockDiagnosticVMOptions
>>>>>>>>>>>>> -XX:+PrintAssembly fibo 43
>>>>>>>>>>>>>
>>>>>>>>>>>>> I get
>>>>>>>>>>>>>
>>>>>>>>>>>>>    # {method} {0x00007f0c4b730388} 'fib' '(I)I' in 'fibo'
>>>>>>>>>>>>>    # parm0:    rsi       = int
>>>>>>>>>>>>>    #           [sp+0x30]  (sp of caller)
>>>>>>>>>>>>>    0x00007f0c75071100: mov %eax,-0x14000(%rsp)
>>>>>>>>>>>>>    0x00007f0c75071107: push   %rbp
>>>>>>>>>>>>>    0x00007f0c75071108: sub    $0x20,%rsp ;*synchronization 
>>>>>>>>>>>>> entry
>>>>>>>>>>>>>
>>>>>>>>>>>>> It is not using a frame pointer which is the equivalent of
>>>>>>>>>>>>> -XX:+OmitFramePointer. However in your description above 
>>>>>>>>>>>>> you say
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Therefore I propose to have OmitFramePointer set to false by
>>>>>>>>>>>>>> default
>>>>>>>>>>>>>> on x86_64 (and set to true on all other platforms).
>>>>>>>>>>>>> whereas OmitFramePointer actually seems to be set to true on
>>>>>>>>>>>>> x86_64
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think the problem may be with the declaration and
>>>>>>>>>>>>> definition of
>>>>>>>>>>>>> OmitFramePointer in globals.hpp and globals_x86.hpp
>>>>>>>>>>>>>
>>>>>>>>>>>>> In globals.hpp it does
>>>>>>>>>>>>>
>>>>>>>>>>>>> product(bool, OmitFramePointer, true,
>>>>>>>>>>>>>
>>>>>>>>>>>>> In globals_x86.hpp it does
>>>>>>>>>>>>>
>>>>>>>>>>>>> LP64_ONLY(define_pd_global(bool, OmitFramePointer, false););
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am not sure that you can mix product(...) and 
>>>>>>>>>>>>> product_pd(...)
>>>>>>>>>>>>> like
>>>>>>>>>>>>> this, so I think it just ends up getting the default from the
>>>>>>>>>>>>> product(...).
>>>>>>>>>>>>
>>>>>>>>>>>> You are right, mixing product and product_pd does not make 
>>>>>>>>>>>> sense
>>>>>>>>>>>> at all.
>>>>>>>>>>>> Thank you for doing additional testing and for drawing 
>>>>>>>>>>>> attention
>>>>>>>>>>>> to the
>>>>>>>>>>>> problem.
>>>>>>>>>>>>
>>>>>>>>>>>> I updated the code to use product_pd and define_pd_global 
>>>>>>>>>>>> on all
>>>>>>>>>>>> relevant platforms.
>>>>>>>>>>>>
>>>>>>>>>>>>> Aside: In general, I do not like options which include a
>>>>>>>>>>>>> negative in
>>>>>>>>>>>>> them because I have to do a double think when I see something
>>>>>>>>>>>>> like,
>>>>>>>>>>>>> -XX:-OmitFramePointer, as in, it is omitting the frame 
>>>>>>>>>>>>> pointer,
>>>>>>>>>>>>> therefore it is using a frame pointer. How about FramePointer
>>>>>>>>>>>>> so we
>>>>>>>>>>>>> have -XX:+FramePointer to say I want frame pointers and
>>>>>>>>>>>>> -XX:-FramePointer to say I don't.
>>>>>>>>>>>>
>>>>>>>>>>>> That is a good idea. Double negation is an unnecessary
>>>>>>>>>>>> complication, so
>>>>>>>>>>>> I changed the name of the flag to FramePointer, just as you
>>>>>>>>>>>> suggested.
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> I did some timing on the above 'fibo' test
>>>>>>>>>>>>>
>>>>>>>>>>>>> [ed at mylittlepony java]$ time /work/images/jdk/bin/java
>>>>>>>>>>>>> -XX:-OmitFramePointer fibo 43
>>>>>>>>>>>>> 701408733
>>>>>>>>>>>>>
>>>>>>>>>>>>> real    0m1.545s
>>>>>>>>>>>>> user    0m1.571s
>>>>>>>>>>>>> sys    0m0.015s
>>>>>>>>>>>>> [ed at mylittlepony java]$ time /work/images/jdk/bin/java
>>>>>>>>>>>>> -XX:+OmitFramePointer fibo 43
>>>>>>>>>>>>> 701408733
>>>>>>>>>>>>>
>>>>>>>>>>>>> real    0m1.504s
>>>>>>>>>>>>> user    0m1.527s
>>>>>>>>>>>>> sys    0m0.019s
>>>>>>>>>>>>>
>>>>>>>>>>>>> which is ~3% difference on this test case. On aarch64, I 
>>>>>>>>>>>>> see ~7%
>>>>>>>>>>>>> difference on this test case.
>>>>>>>>>>>>
>>>>>>>>>>>> Thank you for the performance measurements!
>>>>>>>>>>>>
>>>>>>>>>>>>> With the above change to fix the logic of OmitFramePointer 
>>>>>>>>>>>>> (and
>>>>>>>>>>>>> possibly change its name) the patch looks good to me.
>>>>>>>>>>>>
>>>>>>>>>>>> Here is the updated webrev (the same webrev that was already
>>>>>>>>>>>> included
>>>>>>>>>>>> into my reply to Roland):
>>>>>>>>>>>>
>>>>>>>>>>>> http://cr.openjdk.java.net/~zmajo/8068945/webrev.01/
>>>>>>>>>>>>
>>>>>>>>>>>>> I will prepare a mirror patch for aarch64.
>>>>>>>>>>>>
>>>>>>>>>>>> That would be great!
>>>>>>>>>>>>
>>>>>>>>>>>> Thank you and best regards,
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Zoltán
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> All the best,
>>>>>>>>>>>>> Ed.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>
>>>
>


From sgehwolf at redhat.com  Mon Apr 27 14:18:35 2015
From: sgehwolf at redhat.com (Severin Gehwolf)
Date: Mon, 27 Apr 2015 16:18:35 +0200
Subject: RFR(xs): 8078666: JVM fastdebug build compiled with GCC 5 asserts
	with "widen increases"
Message-ID: <1430144315.3349.17.camel@redhat.com>

Hi,

Could somebody please review and sponsor the following patch?

Bug: https://bugs.openjdk.java.net/browse/JDK-8078666
Webrev: http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8078666/webrev.02/

We've discovered this issue in Fedora where we were seeing a strange
memory leak issue of an OpenJDK build with GCC 5. More info in the bug.

As it turns out, current hotspot relies on undefined behaviour in
normalize_int_widen()/normalize_long_widen() where an integer overflow
can occur on some inputs.

The fix is to do the math on the unsigned type where overflows are well
defined.
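
To illustrate the idea (this is a minimal sketch, not the code in the webrev; sketch_range_width is a hypothetical helper): the width of a range [lo, hi] computed as hi - lo on int overflows for inputs such as lo == INT_MIN, hi == INT_MAX, which is undefined behaviour. Converting both operands to the unsigned type first makes the subtraction wrap modulo 2^32 instead.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Hypothetical helper, not the HotSpot function. Returns the width of the
// range [lo, hi]. The subtraction is done on the unsigned type, where
// wrap-around is well defined, instead of on int, where overflow is UB.
std::uint32_t sketch_range_width(int lo, int hi) {
  assert(lo <= hi);
  return static_cast<std::uint32_t>(hi) - static_cast<std::uint32_t>(lo);
}
```

The sketch assumes a 32-bit int, which holds on the platforms discussed here.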

Thanks,
Severin


From christian.thalinger at oracle.com  Mon Apr 27 16:48:53 2015
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Mon, 27 Apr 2015 09:48:53 -0700
Subject: RFR 8078563 - add profitability tests for reductions
In-Reply-To: <C568518E7B433348B114B6A7122D474755DDE7A4@FMSMSX102.amr.corp.intel.com>
References: <C568518E7B433348B114B6A7122D474755DDE7A4@FMSMSX102.amr.corp.intel.com>
Message-ID: <BA406C4C-87EA-43D2-8948-53F3F6045FD7@oracle.com>

+       // Length 2 reductions of INT/LONG do not offer performance benefits
+       if (((arith_type->basic_type() == T_INT) || (arith_type->basic_type() == T_LONG)) && (size == 2)) {

I don't know that code very well but can there be reductions with size == 1?

> On Apr 23, 2015, at 5:53 PM, Berg, Michael C <michael.c.berg at intel.com> wrote:
> 
> Hi Folks,
> 
> We (Intel) would like to add profitability tests to superword to gate scenarios where reduction optimization overhead is roughly equal to the benefit gained by vectorization.
> We would like to do this for all x86 enabled microarchitectures that support reductions and superword.  This new constraint was tested on SSE and AVX (1,2) enabled platforms.
> The contribution as referenced by RFR 8078563 is defined by the information at the links below.
> 
> Please review this bug entry and its code and comment as needed:
> 
> https://bugs.openjdk.java.net/browse/JDK-8078563
>  
> And its code and test addition (this is a small patch):
>  
> http://cr.openjdk.java.net/~kvn/8078563/webrev/
> 
> 
> Vladimir Kozlov has offered to sponsor this patch.
>  
> Thanks,
> Michael


From michael.c.berg at intel.com  Mon Apr 27 19:57:24 2015
From: michael.c.berg at intel.com (Berg, Michael C)
Date: Mon, 27 Apr 2015 19:57:24 +0000
Subject: RFR 8078563 - add profitability tests for reductions
References: <C568518E7B433348B114B6A7122D474755DDE7A4@FMSMSX102.amr.corp.intel.com>
	<BA406C4C-87EA-43D2-8948-53F3F6045FD7@oracle.com> 
Message-ID: <C568518E7B433348B114B6A7122D474755DDECF3@FMSMSX102.amr.corp.intel.com>

Christian:

No, there cannot be such a mapping: that would be scalar code with no vector solution, and superword would fail to find a packset mapping for it.
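
For illustration only, here is a standalone sketch of the gate in the hunk quoted below. It is not the superword code: SketchType and sketch_reduction_profitable are hypothetical names, and the only rule encoded is the one stated in the patch comment (length 2 reductions of INT/LONG are rejected).

```cpp
#include <cassert>

// Hypothetical stand-in for the BasicType values tested in the quoted hunk.
enum class SketchType { Int, Long, Float, Double };

// Hypothetical helper: returns false only for the case the patch comment
// calls unprofitable, i.e. an INT or LONG reduction over exactly two elements.
bool sketch_reduction_profitable(SketchType type, int size) {
  // Assumption taken from this reply: a pack of size 1 never gets here,
  // because superword finds no packset for purely scalar code.
  assert(size >= 2);
  const bool is_integral =
      (type == SketchType::Int) || (type == SketchType::Long);
  return !(is_integral && size == 2);
}
```

Under these assumptions an INT reduction of length 4, or a FLOAT reduction of length 2, still passes the gate.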

Thanks,
Michael

From: Christian Thalinger [mailto:christian.thalinger at oracle.com]
Sent: Monday, April 27, 2015 9:49 AM
To: Berg, Michael C
Cc: hotspot-compiler-dev at openjdk.java.net
Subject: Re: RFR 8078563 - add profitability tests for reductions


+       // Length 2 reductions of INT/LONG do not offer performance benefits

+       if (((arith_type->basic_type() == T_INT) || (arith_type->basic_type() == T_LONG)) && (size == 2)) {

I don't know that code very well but can there be reductions with size == 1?

On Apr 23, 2015, at 5:53 PM, Berg, Michael C <michael.c.berg at intel.com> wrote:

Hi Folks,

We (Intel) would like to add profitability tests to superword to gate scenarios where reduction optimization overhead is roughly equal to the benefit gained by vectorization.
We would like to do this for all x86 enabled microarchitectures that support reductions and superword.  This new constraint was tested on SSE and AVX (1,2) enabled platforms.
The contribution as referenced by RFR 8078563 is defined by the information at the links below.

Please review this bug entry and its code and comment as needed:

https://bugs.openjdk.java.net/browse/JDK-8078563

And its code and test addition (this is a small patch):

http://cr.openjdk.java.net/~kvn/8078563/webrev/


Vladimir Kozlov has offered to sponsor this patch.

Thanks,
Michael


From roland.westrelin at oracle.com  Tue Apr 28 08:38:00 2015
From: roland.westrelin at oracle.com (Roland Westrelin)
Date: Tue, 28 Apr 2015 10:38:00 +0200
Subject: RFR(S): 8078426: mb/jvm/compiler/InterfaceCalls/testAC2 -
	assert(predicate_proj == 0L) failed: only one predicate entry expected
Message-ID: <CDEB8295-F9E7-4152-AC6F-B1FA39501216@oracle.com>

http://cr.openjdk.java.net/~roland/8078426/webrev.00/

See test case: the loop is unswitched, then the loop bodies become empty so the loops are optimized out. The split if optimization then finds predicates it doesn't expect on both branches of the unswitched loop test.

Roland.

From rickard.backman at oracle.com  Tue Apr 28 10:03:37 2015
From: rickard.backman at oracle.com (Rickard Bäckman)
Date: Tue, 28 Apr 2015 12:03:37 +0200
Subject: RFR (M): 8064458 OopMap class could be more compact
Message-ID: <20150428100337.GB31204@rbackman>

Hi all,

can I please have reviews for this change:

RFR: http://cr.openjdk.java.net/~rbackman/8064458.1/
RFE: http://bugs.openjdk.java.net/browse/JDK-8064458

While looking at OopMaps a while ago I noticed that there were a couple
of different fields that were unused after the OopMaps were finalised.

I took some time to investigate and rearrange the OopMaps. Since I
didn't want to change how the OopMaps are built I introduced new data
structures. ImmutableOopMapSet and ImmutableOopMap. The original OopMap
structures are used to build up the OopMaps and when finalised they are
copied into the Immutable variants.

The ImmutableOopMapSet contains a few fields [size, count] and then a
list of [pc, offset]. The offset points to the offset after the list
where the ImmutableOopMap is placed. By moving pc out from OopMap to be
part of the list we can now have multiple pcs with identical OopMaps
point to the same data.

We only keep 1 empty OopMap, and the other compaction that is done in
this change is to check if the OopMap is identical to the previous one
and then reuse that one. So no complete uniqueness check.
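
A rough standalone sketch of the compaction described above, under simplifying assumptions: a map is modelled as a plain vector of ints, and an index into a vector stands in for the byte offset. SketchMap, SketchPair, SketchSet and build_sketch_set are hypothetical names, not the ones in the webrev.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for an OopMap: the slots that hold oops.
using SketchMap = std::vector<int>;

struct SketchPair {
  int pc_offset;          // pc the map applies to
  std::size_t map_index;  // which stored map describes this pc
};

struct SketchSet {
  std::vector<SketchPair> pairs;  // one entry per pc
  std::vector<SketchMap> maps;    // stored map data, possibly shared
};

// Builds the compact set: a single shared empty map, and reuse of the
// previous map when the next one is identical. Deliberately no global
// uniqueness check, matching the description above.
SketchSet build_sketch_set(
    const std::vector<std::pair<int, SketchMap>>& input) {
  SketchSet set;
  bool have_empty = false;
  std::size_t empty_index = 0;
  bool have_last = false;
  std::size_t last_index = 0;
  for (const auto& entry : input) {
    const SketchMap& map = entry.second;
    std::size_t index = 0;
    if (map.empty()) {
      if (!have_empty) {
        set.maps.push_back(map);
        empty_index = set.maps.size() - 1;
        have_empty = true;
      }
      index = empty_index;
    } else if (have_last && set.maps[last_index] == map) {
      index = last_index;  // identical to the previous map: share it
    } else {
      set.maps.push_back(map);
      index = set.maps.size() - 1;
    }
    set.pairs.push_back({entry.first, index});
    last_index = index;
    have_last = true;
  }
  return set;
}
```

Because only the previous map is compared, a non-empty map that reappears after a different one is stored again; that is the trade-off of skipping the complete uniqueness check.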

I ran a couple of small benchmarks and printed the size of the old
OopMaps vs the new. The new layout uses about 20 - 25% of the space on
the benchmarks I've run.

Tested by running through JPRT, running BigApps and NSK.quick.testlist

Thanks
/R

From bertrand.delsart at oracle.com  Tue Apr 28 12:22:45 2015
From: bertrand.delsart at oracle.com (Bertrand Delsart)
Date: Tue, 28 Apr 2015 14:22:45 +0200
Subject: RFR (M): 8064458 OopMap class could be more compact
In-Reply-To: <20150428100337.GB31204@rbackman>
References: <20150428100337.GB31204@rbackman>
Message-ID: <553F7B95.7070103@oracle.com>

Hi,

First, thanks for the change. The additional benefit is that an 
ImmutableOopMapSet no longer contains any absolute references.

A few comments.

The ImmutableOopMapSet.java seems to be missing in the webrev.

There also seem to be a few issues with ImmutableOopMap::print_on
and ImmutableOopMapSet::print_on:

- I did not spot the closing "}" corresponding to
   "ImmutableOopMap{"

- the "map != last" part does not look complete. I assume that you are 
trying to dump the OopMap only once when it is shared by successive pcs, 
but then a "last = map;" line is missing somewhere in the if statement.

- as a minor point, I'd rather print "offs:" or "pc offsets:" instead of 
"pcs:" because OopMapSet uses pc offsets, not absolute pcs. [ Your CR 
might also be the right time to replace "pc" by "pc_offset" in a few 
field or variable names since this is a bit confusing. ]
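
To make the second point concrete, here is a small standalone sketch of the pattern I have in mind. It is not the code from the webrev: SketchEntry, sketch_print and the output format are made up for illustration.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

struct SketchEntry {
  int pc_offset;   // offset of the pc inside the code blob
  int map_offset;  // offset of the (possibly shared) map data
};

// Prints each map once, followed by all successive pc offsets sharing it.
std::string sketch_print(const std::vector<SketchEntry>& entries) {
  std::ostringstream out;
  int last = -1;  // no map printed yet
  for (const SketchEntry& e : entries) {
    assert(e.map_offset >= 0);
    if (e.map_offset != last) {
      if (last != -1) out << "\n";
      out << "map@" << e.map_offset << " pc offsets:";
      last = e.map_offset;  // the update that makes the check effective
    }
    out << " " << e.pc_offset;
  }
  if (!entries.empty()) out << "\n";
  return out.str();
}
```

Without the assignment to last, the comparison is always true and every pc prints its map again.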

Regards,

Bertrand.

On 28/04/2015 12:03, Rickard Bäckman wrote:
> Hi all,
>
> can I please have reviews for this change:
>
> RFR: http://cr.openjdk.java.net/~rbackman/8064458.1/
> RFE: http://bugs.openjdk.java.net/browse/JDK-8064458
>
> While looking at OopMaps a while ago I noticed that there were a couple
> of different fields that were unused after the OopMaps were finalised.
>
> I took some time to investigate and rearrange the OopMaps. Since I
> didn't want to change how the OopMaps are built I introduced new data
> structures. ImmutableOopMapSet and ImmutableOopMap. The original OopMap
> structures are used to build up the OopMaps and when finalised they are
> copied into the Immutable variants.
>
> The ImmutableOopMapSet contains a few fields [size, count] and then a
> list of [pc, offset]. The offset points to the offset after the list
> where the ImmutableOopMap is placed. By moving pc out from OopMap to be
> part of the list we can now have multiple pcs with identical OopMaps
> point to the same data.
>
> We only keep 1 empty OopMap, and the other compaction that is done in
> this change is to check if the OopMap is identical to the previous one
> and then reuse that one. So no complete uniqueness check.
>
> I ran a couple of small benchmarks and printed the size of the old
> OopMaps vs the new. The new layout uses about 20 - 25% of the space on
> the benchmarks I've run.
>
> Tested by running through JPRT, running BigApps and NSK.quick.testlist
>
> Thanks
> /R
>


-- 
Bertrand Delsart,                     Grenoble Engineering Center
Oracle,         180 av. de l'Europe,          ZIRST de Montbonnot
38330 Montbonnot Saint Martin,                             FRANCE
bertrand.delsart at oracle.com             Phone : +33 4 76 18 81 23


From boris.molodenkov at oracle.com  Tue Apr 28 13:31:28 2015
From: boris.molodenkov at oracle.com (Boris Molodenkov)
Date: Tue, 28 Apr 2015 16:31:28 +0300
Subject: [8u60] Request for approval: backport of JDK-8058846
Message-ID: <553F8BB0.7050302@oracle.com>

Hi All,

I would like to backport fix for JDK-8058846 to 8u60.

Bug id: https://bugs.openjdk.java.net/browse/JDK-8058846
Webrev: http://cr.openjdk.java.net/~bmoloden/8058846/webrev.00/
Changeset: http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/4d1463933e28
Review thread for original fix:
http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2014-November/016382.html

testing: manual with jtreg

Thanks,
Boris


From boris.molodenkov at oracle.com  Tue Apr 28 13:32:25 2015
From: boris.molodenkov at oracle.com (Boris Molodenkov)
Date: Tue, 28 Apr 2015 16:32:25 +0300
Subject: [8u60] Request for approval: backport of JDK-8050486
Message-ID: <553F8BE9.7090004@oracle.com>

Hi All,

I would like to backport fix for JDK-8050486 to 8u60.

Bug id: https://bugs.openjdk.java.net/browse/JDK-8050486
Webrev: http://cr.openjdk.java.net/~bmoloden/8050486/webrev.00/
Changeset: http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/2f8520599d39
Review thread for original fix:
http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2014-December/016754.html

testing: manual with jtreg

Thanks,
Boris


From vladimir.x.ivanov at oracle.com  Tue Apr 28 14:42:18 2015
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Tue, 28 Apr 2015 17:42:18 +0300
Subject: RFR(M): 8076188 Optimize arraycopy out for non escaping
	destination
In-Reply-To: <2B8622DD-1DA5-4715-B4BF-3801202B588B@oracle.com>
References: <2B8622DD-1DA5-4715-B4BF-3801202B588B@oracle.com>
Message-ID: <553F9C4A.7040407@oracle.com>

Looks good.

Best regards,
Vladimir Ivanov

On 4/21/15 4:02 PM, Roland Westrelin wrote:
> http://cr.openjdk.java.net/~roland/8076188/webrev.00/
>
> This patch tries to eliminate ArrayCopyNodes (for instance clones, array clones, arraycopy and copyOf) when the destination of the copy doesn't escape:
>
> - during escape analysis, ArrayCopyNodes don't cause the destination of the copy to be marked as escaping anymore
> - a load to the destination of a copy may be replaced by a load from the source during IGVN
> - during macro expansion, ArrayCopyNodes don't stop allocation from being eliminated and can themselves be eliminated
>
> Roland.
>

From vladimir.kozlov at oracle.com  Tue Apr 28 18:45:59 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 28 Apr 2015 11:45:59 -0700
Subject: [8u60] Request for approval: backport of JDK-8058846
In-Reply-To: <553F8BB0.7050302@oracle.com>
References: <553F8BB0.7050302@oracle.com>
Message-ID: <553FD567.7020009@oracle.com>

Looks good.

Thanks,
Vladimir

On 4/28/15 6:31 AM, Boris Molodenkov wrote:
> Hi All,
>
> I would like to backport fix for JDK-8058846 to 8u60.
>
> Bug id: https://bugs.openjdk.java.net/browse/JDK-8058846
> Webrev: http://cr.openjdk.java.net/~bmoloden/8058846/webrev.00/
> Changeset: http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/4d1463933e28
> Review thread for original fix:
> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2014-November/016382.html
>
> testing: manual with jtreg
>
> Thanks,
> Boris
>

From vladimir.kozlov at oracle.com  Tue Apr 28 18:51:10 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 28 Apr 2015 11:51:10 -0700
Subject: [8u60] Request for approval: backport of JDK-8050486
In-Reply-To: <553F8BE9.7090004@oracle.com>
References: <553F8BE9.7090004@oracle.com>
Message-ID: <553FD69E.1040202@oracle.com>

Good. Were there other changes in RTMTestBase.java which were not backported? The changes in that file differ from 
jdk9 (but the resulting flags are the same).

Thanks,
Vladimir

On 4/28/15 6:32 AM, Boris Molodenkov wrote:
> Hi All,
>
> I would like to backport fix for JDK-8050486 to 8u60.
>
> Bug id: https://bugs.openjdk.java.net/browse/JDK-8050486
> Webrev: http://cr.openjdk.java.net/~bmoloden/8050486/webrev.00/
> Changeset: http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/2f8520599d39
> Review thread for original fix:
> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2014-December/016754.html
>
> testing: manual with jtreg
>
> Thanks,
> Boris
>

From vladimir.kozlov at oracle.com  Tue Apr 28 20:39:05 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 28 Apr 2015 13:39:05 -0700
Subject: RFR(S): 8078426: mb/jvm/compiler/InterfaceCalls/testAC2 -
	assert(predicate_proj == 0L) failed: only one predicate entry expected
In-Reply-To: <CDEB8295-F9E7-4152-AC6F-B1FA39501216@oracle.com>
References: <CDEB8295-F9E7-4152-AC6F-B1FA39501216@oracle.com>
Message-ID: <553FEFE9.4040305@oracle.com>

Can we remove predicates when loop is optimized out?
What code eliminates the loop?

Vladimir

On 4/28/15 1:38 AM, Roland Westrelin wrote:
> http://cr.openjdk.java.net/~roland/8078426/webrev.00/
>
> See test case: the loop is unswitched, then the loop bodies become empty so the loops are optimized out. The split if optimization then finds predicates it doesn't expect on both branches of the unswitched loop test.
>
> Roland.
>

From vladimir.kozlov at oracle.com  Tue Apr 28 21:18:38 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 28 Apr 2015 14:18:38 -0700
Subject: RFR (M): 8064458 OopMap class could be more compact
In-Reply-To: <20150428100337.GB31204@rbackman>
References: <20150428100337.GB31204@rbackman>
Message-ID: <553FF92E.3010908@oracle.com>

You need closed SA changes.

Style:

Move fields to the beginning of ImmutableOopMapBuilder and classes.
Add comments describing each field in ALL new classes. Add comments to fields in old class. It will help the next person 
who looks at oop maps later.

Add ResourceMark into ImmutableOopMapSet::build_from() to free memory allocated in ImmutableOopMapBuilder().
Why do you need ImmutableOopMapBuilder to be a friend of class ImmutableOopMap?

I think a simple loop in ImmutableOopMapBuilder::verify() would be faster than calling memcmp.

Field _end is not used.

ImmutableOopMapBuilder() calls reset(), and heap_size(), which is called next, calls reset() again. Maybe move reset() to the end 
of heap_size() so that you don't need to call it in fill().

Thanks,
Vladimir

On 4/28/15 3:03 AM, Rickard Bäckman wrote:
> Hi all,
>
> can I please have reviews for this change:
>
> RFR: http://cr.openjdk.java.net/~rbackman/8064458.1/
> RFE: http://bugs.openjdk.java.net/browse/JDK-8064458
>
> While looking at OopMaps a while ago I noticed that there were a couple
> of different fields that were unused after the OopMaps were finalised.
>
> I took some time to investigate and rearrange the OopMaps. Since I
> didn't want to change how the OopMaps are built I introduced new data
> structures. ImmutableOopMapSet and ImmutableOopMap. The original OopMap
> structures are used to build up the OopMaps and when finalised they are
> copied into the Immutable variants.
>
> The ImmutableOopMapSet contains a few fields [size, count] and then a
> list of [pc, offset]. The offset points to the offset after the list
> where the ImmutableOopMap is placed. By moving pc out from OopMap to be
> part of the list we can now have multiple pcs with identical OopMaps
> point to the same data.
>
> We only keep 1 empty OopMap, and the other compaction that is done in
> this change is to check if the OopMap is identical to the previous one
> and then reuse that one. So no complete uniqueness check.
>
> I ran a couple of small benchmarks and printed the size of the old
> OopMaps vs the new. The new layout uses about 20 - 25% of the space on
> the benchmarks I've run.
>
> Tested by running through JPRT, running BigApps and NSK.quick.testlist
>
> Thanks
> /R
>
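
For readers following the layout described in the quoted text, here is a minimal sketch of the compaction idea: a table of [pc, offset] entries pointing into shared map data, one shared empty map, and reuse only when a map is identical to the previous one. This is not the code from the webrev. The types MutableMap, Entry and CompactSet, the one-byte length prefix and the function names are all assumptions made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not HotSpot types.
struct MutableMap {
  std::uint32_t pc;                 // pc offset this map belongs to
  std::vector<std::uint8_t> data;   // encoded contents; empty means "no oops"
};

struct Entry {                      // one [pc, offset] pair of the table
  std::uint32_t pc;
  std::uint32_t offset;             // where the (possibly shared) map starts in 'blob'
};

struct CompactSet {
  std::vector<Entry> entries;       // one entry per pc
  std::vector<std::uint8_t> blob;   // stored maps, each as [length byte][data...]
};

// Appends one map to the blob and returns the offset it was stored at.
static std::uint32_t append(std::vector<std::uint8_t>& blob,
                            const std::vector<std::uint8_t>& data) {
  assert(data.size() <= 255);       // sketch limitation: one-byte length prefix
  std::uint32_t offset = static_cast<std::uint32_t>(blob.size());
  blob.push_back(static_cast<std::uint8_t>(data.size()));
  blob.insert(blob.end(), data.begin(), data.end());
  return offset;
}

// Builds the compact form. Only two cheap checks are made: "is it empty and
// do we already store an empty map?" and "is it identical to the previous
// map?". There is deliberately no complete uniqueness check.
CompactSet build(const std::vector<MutableMap>& maps) {
  CompactSet set;
  bool have_empty = false;
  std::uint32_t empty_offset = 0;
  const std::vector<std::uint8_t>* last_data = nullptr;
  std::uint32_t last_offset = 0;
  for (const MutableMap& m : maps) {
    std::uint32_t offset;
    if (m.data.empty() && have_empty) {
      offset = empty_offset;                    // single shared empty map
    } else if (last_data != nullptr && *last_data == m.data) {
      offset = last_offset;                     // identical to previous: reuse
    } else {
      offset = append(set.blob, m.data);
      if (m.data.empty()) {
        have_empty = true;
        empty_offset = offset;
      }
    }
    set.entries.push_back({m.pc, offset});
    last_data = &m.data;
    last_offset = offset;
  }
  return set;
}
```

Because only the previous map is compared, a map that reappears after a different one is stored again; that is the trade-off the description calls "no complete uniqueness check".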

From vladimir.kozlov at oracle.com  Tue Apr 28 21:25:24 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 28 Apr 2015 14:25:24 -0700
Subject: RFR(S) 8077504: Unsafe load can loose control dependency and
	cause crash
In-Reply-To: <CDF55084-CD74-4195-B596-37E7935FCB62@oracle.com>
References: <42EDECBF-2347-406A-8835-29A98F1CA153@oracle.com>
	<552FC216.4010503@redhat.com>
	<95F149F4-7BE2-45C8-A3C0-DEACFE42ACDF@oracle.com>
	<6AC5258A-A40C-4121-8B1F-0076713345D4@oracle.com>
	<553A88F3.2010700@oracle.com>
	<CDF55084-CD74-4195-B596-37E7935FCB62@oracle.com>
Message-ID: <553FFAC4.7030107@oracle.com>

On 4/27/15 12:52 AM, Roland Westrelin wrote:
> Thanks for the review, Vladimir. See below.
>
>> I agree that we have to pass parameter to GraphKit::make_load().
>> I thought we can avoid it for LoadNode::make() but it has transform for compressed oops. AARGH!
>>
>> Add comment to code in library_call.cpp why we set flag to false.
>>
>> BTW, should we modify LoadNode::hash() to include _depends_only_on_test and prevent igvning?
>
> If the graph already has a non-pinned LoadNode and we add a pinned LoadNode with the same inputs, it's safe for GVN to replace the pinned LoadNode by the non-pinned LoadNode, otherwise it wouldn't be safe to have a non-pinned LoadNode with those inputs in the first place. If the graph already has a pinned LoadNode and we add a non-pinned LoadNode with the same inputs, it's safe for GVN to replace the non-pinned LoadNode by the pinned LoadNode. It's also suboptimal but better than 2 LoadNodes?

Okay. Add comment to _depends_only_on_test field why we don't use it in hash().

Thanks,
Vladimir

>
> I guess we could change LoadNode::hash() and then use LoadNode::Identity/Ideal to make sure the pinned LoadNode is always replaced by the non-pinned LoadNode in the scenarios above but that sounds like extra complexity for something that, as far as we know, never happens in practice.
>
> Roland.
>
>
>>
>> Thanks,
>> Vladimir
>>
>> On 4/24/15 1:03 AM, Roland Westrelin wrote:
>>>
>>>> Vladimir suggested privately to set _depends_only_on_test to true in the constructor and then use an explicit call to a new method set_depends_only_on_test() to set it to false in the rare cases where it's needed. That feels better indeed. What do you think?
>>>
>>> Actually, using a set_depends_only_on_test() method doesn't work well. In LibraryCallKit::inline_unsafe_access() the node returned by make_load() may have been transformed already and we could call set_depends_only_on_test() on a node that doesn't need to be pinned. The call to set_depends_only_on_test() would have to be in LoadNode::make(). I went with default parameters instead to keep the change small:
>>>
>>> http://cr.openjdk.java.net/~roland/8077504/webrev.01/
>>>
>>> Roland.
>>>
>

From sgehwolf at redhat.com  Wed Apr 29 07:56:43 2015
From: sgehwolf at redhat.com (Severin Gehwolf)
Date: Wed, 29 Apr 2015 09:56:43 +0200
Subject: RFR(xs): 8078666: JVM fastdebug build compiled with GCC 5
	asserts with "widen increases"
In-Reply-To: <1430144315.3349.17.camel@redhat.com>
References: <1430144315.3349.17.camel@redhat.com>
Message-ID: <1430294203.3356.9.camel@redhat.com>

Hi,

Adding hotspot-dev for wider audience. IMHO hotspot should not rely on
undefined behaviour (overflow on signed int/long is undefined) and this
should get fixed.

--Severin

On Mon, 2015-04-27 at 16:18 +0200, Severin Gehwolf wrote:
> Hi,
> 
> Could somebody please review and sponsor the following patch?
> 
> Bug: https://bugs.openjdk.java.net/browse/JDK-8078666
> Webrev: http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8078666/webrev.02/
> 
> We've discovered this issue in Fedora where we were seeing a strange
> memory leak issue of an OpenJDK build with GCC 5. More info in the bug.
> 
> As it turns out, current hotspot relies on undefined behaviour in
> normalize_int_widen()/normalize_long_widen() where an integer overflow
> can occur on some inputs.
> 
> The fix is to do the math on the unsigned type where overflows are well
> defined.
> 
> Thanks,
> Severin
> 
> 
> 
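
To illustrate the approach described in the quoted text (doing the math on the unsigned type), here is a minimal sketch. It is not the patch from the webrev; the function name range_span is hypothetical and only shows how the distance between two signed bounds can be computed without signed overflow.

```cpp
#include <cassert>
#include <cstdint>

// Returns hi - lo for a range [lo, hi] with lo <= hi, computed on the
// unsigned type. Computing 'hi - lo' directly on int32_t would overflow,
// and therefore be undefined behaviour, for inputs such as
// lo = INT32_MIN, hi = INT32_MAX. The conversion to uint32_t and the
// unsigned subtraction are both well defined (arithmetic modulo 2^32).
inline std::uint32_t range_span(std::int32_t lo, std::int32_t hi) {
  assert(lo <= hi);
  return static_cast<std::uint32_t>(hi) - static_cast<std::uint32_t>(lo);
}
```

With the signed version a compiler is free to assume the overflow never happens, which is how different compiler versions can end up producing different results for the same source.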


From aph at redhat.com  Wed Apr 29 08:05:13 2015
From: aph at redhat.com (Andrew Haley)
Date: Wed, 29 Apr 2015 09:05:13 +0100
Subject: RFR(xs): 8078666: JVM fastdebug build compiled with GCC 5 asserts
	with "widen increases"
In-Reply-To: <1430294203.3356.9.camel@redhat.com>
References: <1430144315.3349.17.camel@redhat.com>
	<1430294203.3356.9.camel@redhat.com>
Message-ID: <554090B9.2050306@redhat.com>

On 29/04/15 08:56, Severin Gehwolf wrote:
> Adding hotspot-dev for wider audience. IMHO hotspot should not rely on
> undefined behaviour (overflow on signed int/long is undefined) and this
> should get fixed.

Absolutely so.  This looks like a good patch.

Andrew.


From roland.westrelin at oracle.com  Wed Apr 29 09:26:03 2015
From: roland.westrelin at oracle.com (Roland Westrelin)
Date: Wed, 29 Apr 2015 11:26:03 +0200
Subject: RFR(S): 8078426: mb/jvm/compiler/InterfaceCalls/testAC2 -
	assert(predicate_proj == 0L) failed: only one predicate entry expected
In-Reply-To: <553FEFE9.4040305@oracle.com>
References: <CDEB8295-F9E7-4152-AC6F-B1FA39501216@oracle.com>
	<553FEFE9.4040305@oracle.com>
Message-ID: <914001D5-433D-4B22-9AE7-8672E667DA83@oracle.com>

Hi Vladimir,

Thanks for looking at this.

> Can we remove predicates when loop is optimized out?
> What code eliminates the loop?

For the test case, the loop is found empty by IdealLoopTree::policy_do_remove_empty_loop() and the CountedLoopNode is effectively removed by RegionNode::Ideal() because it has a single input. For the crash in mb/jvm/compiler/InterfaceCalls/testAC2, the loop is also removed by RegionNode::Ideal() but I don't know if it's because the loop was empty or for another reason (I assume the backbranch could be removed also because the loop has a single iteration).

We can't remove all predicates without risking incorrect execution, right? The loop could for instance only perform a null check. Loop predication moves the null check out of the loop. The loop becomes empty so it goes away. But the null check can't go away because the method is still supposed to throw an NPE if the null check fails.

We could remove the predicates that GraphKit::add_predicate() adds and that will eventually go away (the ones that test an opaque node). That doesn't help split_if because PhaseIdealLoop::find_predicate() could still find a predicate like the null check above. So we could remove the predicates that test on an opaque node when the loop goes dead, then change split_if so it looks not for any predicates but only for the predicates that test on an opaque node. But that doesn't help either because AFAIU, split_if can be run after the loop optimizations are over and Opaque1Node::Identity() has removed those predicates.

Roland.

> 
> Vladimir
> 
> On 4/28/15 1:38 AM, Roland Westrelin wrote:
>> http://cr.openjdk.java.net/~roland/8078426/webrev.00/
>> 
>> See test case: the loop is unswitched, then the loop bodies become empty so the loops are optimized out. The split if optimization then finds predicates it doesn't expect on both branches of the unswitched loop test.
>> 
>> Roland.
>> 


From roland.westrelin at oracle.com  Wed Apr 29 09:30:08 2015
From: roland.westrelin at oracle.com (Roland Westrelin)
Date: Wed, 29 Apr 2015 11:30:08 +0200
Subject: RFR(M): 8076188 Optimize arraycopy out for non escaping
	destination
In-Reply-To: <553F9C4A.7040407@oracle.com>
References: <2B8622DD-1DA5-4715-B4BF-3801202B588B@oracle.com>
	<553F9C4A.7040407@oracle.com>
Message-ID: <D5590E7C-C82D-4F9B-A3DC-B69AE805D409@oracle.com>

Thanks for the review, Vladimir.

Roland.

> On Apr 28, 2015, at 4:42 PM, Vladimir Ivanov <vladimir.x.ivanov at oracle.com> wrote:
> 
> Looks good.
> 
> Best regards,
> Vladimir Ivanov
> 
> On 4/21/15 4:02 PM, Roland Westrelin wrote:
>> http://cr.openjdk.java.net/~roland/8076188/webrev.00/
>> 
>> This patch tries to eliminate ArrayCopyNodes (for instance clones, array clones, arraycopy and copyOf) when the destination of the copy doesn't escape:
>> 
>> - during escape analysis, ArrayCopyNodes don't cause the destination of the copy to be marked as escaping anymore
>> - a load to the destination of a copy may be replaced by a load from the source during IGVN
>> - during macro expansion, ArrayCopyNodes don't stop allocation from being eliminated and can themselves be eliminated
>> 
>> Roland.
>> 


From tobias.hartmann at oracle.com  Wed Apr 29 12:44:39 2015
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Wed, 29 Apr 2015 14:44:39 +0200
Subject: [9] RFR(S): 8078497: C2's superword optimization causes unaligned
	memory accesses
Message-ID: <5540D237.8000107@oracle.com>

Hi,

please review the following patch.

https://bugs.openjdk.java.net/browse/JDK-8078497
http://cr.openjdk.java.net/~thartmann/8078497/webrev.00/

Background information (simplified):
After loop optimizations, C2 tries to vectorize memory operations by finding and merging adjacent memory accesses in the loop body. The superword optimization matches MemNodes with the following pattern as address input:

  ptr + k*iv + constant [+ invar]

where iv is the loop induction variable, k is a scaling factor that may be 0 and invar is an optional loop invariant value. C2 then picks the memory operation with most similar references and tries to align it in the main loop by setting the number of pre-loop iterations accordingly. Other adjacent memory operations that conform to this alignment are merged into packs. After extending and filtering these packs, final vector operations are emitted.

The problem is that some architectures (for example, Sparc) require memory accesses to be aligned and in special cases the superword optimization generates code that contains unaligned vector instructions.

Problems:
(1) If two memory operations have different loop invariant offset values and C2 decides to align the main loop to one of them, we can always set the invariant of the other operation such that we break the alignment constraint. For example, if we have a store to a byte array a[i+inv1] and a load from a byte array b[i+inv2] (where i is the induction variable), C2 may decide to align the main loop such that ((i+inv1) % 8) == 0 and replace the adjacent byte stores by an 8-byte double word store. Also the byte loads will be replaced by a double word load. If we then set inv2 = inv1 + 1 at runtime, the load will fail with a SIGBUS because the access is not 8-byte aligned, i.e., ((i+inv2) % 8) != 0. Vectorization should not take place in this case.

(2) If C2 decides to align the main loop to a memory operation, the necessary adjustment of the induction variable by the pre-loop is computed such that the resulting offset in the main loop is aligned to the vector width (see 'SuperWord::get_iv_adjustment'). If the offset of a memory operation is independent of the loop induction variable (i.e., scale k is 0), the iv adjustment should be 0. 

(3) If the loop span is greater than the vector width 'vw' of a memory operation, the superword optimization assumes that the pre-loop is never able to align this operation in the main loop (see 'SuperWord::ref_is_alignable'). This is wrong because if the loop span is a multiple of vw and depending on the initial offset, it may very well be possible to align the operation to vw.
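
To make problem (1) concrete, here is a small sketch of the arithmetic. It is only an illustration, not code from the webrev; the helper names is_aligned and first_aligned_iv are made up, and offsets are assumed to be non-negative byte offsets.

```cpp
#include <cassert>

// True if a non-negative byte offset is a multiple of 'width'.
inline bool is_aligned(long offset, long width) {
  assert(offset >= 0 && width > 0);
  return offset % width == 0;
}

// Smallest induction variable value i >= 0 for which (i + inv) is aligned
// to 'width'. This models the pre-loop choosing the start of the main loop
// so that ONE chosen memory reference is aligned.
inline long first_aligned_iv(long inv, long width) {
  assert(inv >= 0 && width > 0);
  return (width - inv % width) % width;
}
```

For example, with inv1 = 3 and a width of 8 the main loop starts at i = 5, so the store at i + inv1 = 8 is aligned. With inv2 = inv1 + 1 the load is at i + inv2 = 9, which is not 8-byte aligned, and no choice of i can align both references at once.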

These problems originally only showed up in the string density code base (JDK-8054307) where we have a putChar intrinsic that writes a char value to two entries of a byte array. To reproduce the issue with JDK9 we have to make sure that the memory operations:
(i) are independent,
(ii) have different invariants,
(iii) are not too complex (because then vectorization will not take place).

To guarantee (i) we either need two different arrays (for example, byte[] and char[]) for the load and store operations or the same array but different offsets. Since the offsets should include a loop invariant part (ii), the superword optimization will not be able to determine that the runtime offset values do not overlap. We therefore use Unsafe.getChar/putChar to read/write a char value from/to a byte/char array and thereby guarantee independence.

I came up with the following test (see 'TestVectorizationWithInvariant.java'):
 
  byte[] src = new byte[1000];
  char[] dst = new char[1000];

  for (int i = (int) CHAR_ARRAY_OFFSET; i < 100; i = i + 8) {
    // Copy 8 chars from src to dst
    unsafe.putChar(dst, i + 0, unsafe.getChar(src, off + 0));
    [...]
    unsafe.putChar(dst, i + 14, unsafe.getChar(src, off + 14));
  }

Problem (1) shows up since the main loop will be aligned to the StoreC[i + 0] which has no invariant. However, the LoadUS[off + 0] has the loop invariant 'off'. Setting off to BYTE_ARRAY_OFFSET + 2 will break the alignment of the emitted double word load and result in a crash.

The LoadUS[off + 0] in above example is independent of the loop induction variable and therefore has no scale value. Because of problem (2), the iv adjustment is computed as -4 (see 'SuperWord::get_iv_adjustment').

  offset = 16 iv_adjust = -4 elt_size = 2 scale = 0 iv_stride = 16 vect_size 8

The regression test contains additional test cases that also trigger problem (3). 

Solution:
(1) I added a check to 'SuperWord::find_adjacent_refs' to make sure that memory accesses with different invariants are only vectorized if the architecture supports unaligned memory accesses.
(2) 'SuperWord::get_iv_adjustment' is modified such that it returns 0 if the memory operation is independent of iv.
(3) In 'SuperWord::ref_is_alignable' we need to additionally check if span is divisible by vw. If so, the pre-loop will add multiples of vw to the initial offset and if that initial offset is divisible by vw, the final offset (after the pre-loop) will be divisible as well and we should return true. I added comments to the code describing this in detail.
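
The reasoning behind (3) can be sketched as follows. This is a brute-force illustration under simplifying assumptions (non-negative offsets and spans), not the check implemented in the webrev; the function name preloop_can_align is hypothetical.

```cpp
#include <cassert>

// Can some number of pre-loop iterations i >= 0 make the offset
// (init_offset + i * span) a multiple of the vector width vw?
// The value modulo vw repeats with a period that divides vw, so trying
// i = 0 .. vw-1 covers every reachable remainder.
inline bool preloop_can_align(int init_offset, int span, int vw) {
  assert(init_offset >= 0 && span >= 0 && vw > 0);
  for (int i = 0; i < vw; ++i) {
    if ((init_offset + i * span) % vw == 0) {
      return true;
    }
  }
  return false;
}
```

When span is a multiple of vw, every pre-loop iteration leaves the remainder unchanged, so the result is simply whether init_offset is already divisible by vw. That is the case that was previously rejected outright: for example init_offset = 16, span = 16, vw = 8 is alignable even though span is greater than vw.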

Testing:
- original test with string-density codebase
- regression test
- JPRT

Thanks,
Tobias

From benedikt.wedenik at theobroma-systems.com  Wed Apr 29 12:50:06 2015
From: benedikt.wedenik at theobroma-systems.com (Benedikt Wedenik)
Date: Wed, 29 Apr 2015 14:50:06 +0200
Subject: aarch64 AD-file / matching rule
Message-ID: <00D2ABB8-C2DE-45C2-A515-1D6FC222208E@theobroma-systems.com>

Hi!

I'm writing compiler-optimisations for the aarch64 port at the moment and I am using specjbb2005 for benchmarking.
One of the patterns I want to optimise is the following:

  0x0000007f8c2961b4: and	w2, w2, #0x7ffff8
  0x0000007f8c2961b8: cmp	w2, #0x0
  0x0000007f8c2961bc: b.eq	0x0000007f8c2968f4


Here I see an opportunity for ands, b.eq.

I created a new rule in the cpu/aarch64/vm/aarch64.ad file.
My matching looks like this:

instruct and_cmp_branch(cmpOp cmp, immI0 zero, iRegIorL2I src1, immILog src2, label lbl, rFlagsReg cr) %{
  match(If cmp (CmpI (AndI src1 src2) zero) );

  effect(USE lbl);
  ins_cost(0); // is zero at the moment to be sure the rule is triggered.

  ins_encode %{
    Label* L = $lbl$$label;
    Assembler::Condition cond = (Assembler::Condition)$cmp$$cmpcode;
    __ andsw(as_Register($src1$$reg),
        as_Register($src1$$reg),
        (unsigned long)($src2$$constant));
    __ br ((Assembler::Condition)$cmp$$cmpcode, *L);
  %}  

  ins_pipe(pipe_cmp_branch); //TODO but not relevant yet
%}


As I don't know whether my matching rule is wrong or something else stops the rule from being emitted, I wanted to find out which 'and' rule is triggered for this pattern.
I inserted some nops to locate the corresponding rule and found that most of the emitted 'and's were surrounded by nops, except for my pattern and a few others like this one:

0x0000007f984bf568: eor   x1, x0, x1
0x0000007f984bf56c: and   x1, x1, #0xffffffffffffff87
0x0000007f984bf570: cbz   x1, 0x0000007f984bf664
0x0000007f984bf574: and   xscratch1, x1, #0x7
0x0000007f984bf578: cbnz  xscratch1, 0x0000007f984bf5f0
0x0000007f984bf57c: and   xscratch1, x1, #0x300
0x0000007f984bf580: cbnz  xscratch1, 0x0000007f984bf5b8
0x0000007f984bf584: mov   xscratch1, #0x37f                   // #895
0x0000007f984bf588: and   x0, x0, xscratch1
0x0000007f984bf58c: orr   x1, x0, xthread
0x0000007f984bf590: ldaxr xscratch1, [x3]
0x0000007f984bf594: cmp   xscratch1, x0
0x0000007f984bf598: b.ne  0x0000007f984bf5a8


Usually I call the program like this:

----
JAVA=/root/bwedenik/jdk8/jdk8/build/linux-aarch64-normal-server-release/jdk/bin/java

$JAVA -fullversion
$JAVA -server -XX:+AggressiveOpts -XX:+UseFastAccessorMethods -XX:+OptimizeStringConcat -XX:+UseBiasedLocking -XX:+UseParallelGC -XX:ParallelGCThreads=10 -XX:+UseParallelOldGC -XX:SurvivorRatio=8 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=15  -Xms10g -Xmx10g -Xmn4g -Xss64m -XX:+UnlockDiagnosticVMOptions -XX:CompileCommand='print,*DeliveryTransaction.preprocess' spec.jbb.JBBmain -propfile SPECjbb.props
----


I tried to figure out if this problem only occurs with c1, c2 or pure interpretation mode and these are the results (calling java as usual including the given arguments):

* [-Xint] : This gives me neither the inserted nops nor the pattern I am searching for (as expected due to no compilation).
* [-client -Xcomp -XX:-TieredCompilation] : Here the cmp for #0x0 only occurs about 3 times in the whole disassembly, instead of about 200 times without these flags. In addition, none of my inserted nops appear in the disassembly.
* [-server -Xcomp -XX:-TieredCompilation] : Same as -client.


My question is now how to find out why the rule does not match / if the rule is correct and how to find the actual rule which emits the code of my desired pattern.

Thanks in advance,
Benedikt Wedenik, Theobroma-Systems.com

From filipp.zhinkin at gmail.com  Wed Apr 29 13:01:06 2015
From: filipp.zhinkin at gmail.com (Filipp Zhinkin)
Date: Wed, 29 Apr 2015 16:01:06 +0300
Subject: [8u60] Request for approval: backport of JDK-8050486
In-Reply-To: <553F8BE9.7090004@oracle.com>
References: <553F8BE9.7090004@oracle.com>
Message-ID: <CANQc0nfGXu5kPJHTxsFZL5x+JFXkV5XmLxJAByDMBdQWS7=bGQ@mail.gmail.com>

Hi Boris,

you have to backport JDK-8068272 [1] first, because this (JDK-8050486)
fix relies on it.

Regards,
Filipp.

[1] https://bugs.openjdk.java.net/browse/JDK-8068272

On Tue, Apr 28, 2015 at 4:32 PM, Boris Molodenkov
<boris.molodenkov at oracle.com> wrote:
> Hi All,
>
> I would like to backport fix for JDK-8050486 to 8u60.
>
> Bug id: https://bugs.openjdk.java.net/browse/JDK-8050486
> Webrev: http://cr.openjdk.java.net/~bmoloden/8050486/webrev.00/
> Changeset: http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/2f8520599d39
> Review thread for original fix:
> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2014-December/016754.html
>
> testing: manual with jtreg
>
> Thanks,
> Boris
>

From adinn at redhat.com  Wed Apr 29 13:22:50 2015
From: adinn at redhat.com (Andrew Dinn)
Date: Wed, 29 Apr 2015 14:22:50 +0100
Subject: aarch64 AD-file / matching rule
In-Reply-To: <00D2ABB8-C2DE-45C2-A515-1D6FC222208E@theobroma-systems.com>
References: <00D2ABB8-C2DE-45C2-A515-1D6FC222208E@theobroma-systems.com>
Message-ID: <5540DB2A.5030202@redhat.com>

On 29/04/15 13:50, Benedikt Wedenik wrote:
> I'm writing compiler-optimisations for the aarch64 port at the moment
> and I am using specjbb2005 for benchmarking. One of the patterns I
> want to optimise is the following:
> . . .
> My question is now how to find out why the rule does not match / if
> the rule is correct and how to find the actual rule which emits the
> code of my desired pattern.

In the long run you probably need to use gdb, placing breakpoints in the ad
file either in encodings or in inline code in the ins_encode section of
your rule.

One thing you could do before diving into that is to encode a comment
into the code buffer -- do this in all the rules which assemble 'and'
instructions (including your rule) e.g.

  ins_encode %{
    Label* L = $lbl$$label;
    __ block_comment("optimised andw; cmpw; b.cc to andsw; b.cc");
    __ andsw(as_Register($src1$$reg),
        as_Register($src1$$reg),
        (unsigned long)($src2$$constant));
    __ br ((Assembler::Condition)$cmp$$cmpcode, *L);
  %}

When you disassemble the code you will see the comment in the
disassembly telling you where the code came from.

regards,


Andrew Dinn
-----------

From aph at redhat.com  Wed Apr 29 13:35:45 2015
From: aph at redhat.com (Andrew Haley)
Date: Wed, 29 Apr 2015 14:35:45 +0100
Subject: aarch64 AD-file / matching rule
In-Reply-To: <5540DB2A.5030202@redhat.com>
References: <00D2ABB8-C2DE-45C2-A515-1D6FC222208E@theobroma-systems.com>
	<5540DB2A.5030202@redhat.com>
Message-ID: <5540DE31.9080302@redhat.com>

On 04/29/2015 02:22 PM, Andrew Dinn wrote:
> When you disassemble the code you will see the comment in the
> disassembly telling you where the code came from.

Iff you use a debug build.  Note to Benedikt: build a debug VM.

Andrew.


From michael.haupt at oracle.com  Wed Apr 29 13:51:03 2015
From: michael.haupt at oracle.com (Michael Haupt)
Date: Wed, 29 Apr 2015 15:51:03 +0200
Subject: RFR (XL): 8075492: update IGV in hs-comp
Message-ID: <12D47C44-DF22-4DA2-9A30-444FA726D6D3@oracle.com>

Dear all,

please review and sponsor this (large, sorry) change.

RFE: https://bugs.openjdk.java.net/browse/JDK-8075492
Webrev: http://cr.openjdk.java.net/~mhaupt/8075492/webrev.00

The IGV version in the hs-comp repository - and consequently in OpenJDK - currently dates back to 2010. This change brings it up to date; essentially copying the source from the Graal repository over to hs-comp. In the process of doing so, the control flow window and basic block layout feature that existed in another outdated IGV repository on Kenai was also brought back to Graal's IGV, and is now part of this proposed change.

Only manual testing was applied. There are minor changes in HotSpot, which are only in code pertaining to IdealGraphPrinter. Thanks to Roland Westrelin and Doug Simon for helping with the testing.

Best,

Michael

-- 

Dr. Michael Haupt | Principal Member of Technical Staff
Phone: +49 331 200 7277 | Fax: +49 331 200 7561
Oracle Java Platform Group | HotSpot Compiler Team 
Oracle Deutschland B.V. & Co. KG, Schiffbauergasse 14 | 14467 Potsdam, Germany


From boris.molodenkov at oracle.com  Wed Apr 29 13:57:32 2015
From: boris.molodenkov at oracle.com (Boris Molodenkov)
Date: Wed, 29 Apr 2015 16:57:32 +0300
Subject: [8u60] Request for approval: backport of JDK-8050486
In-Reply-To: <CANQc0nfGXu5kPJHTxsFZL5x+JFXkV5XmLxJAByDMBdQWS7=bGQ@mail.gmail.com>
References: <553F8BE9.7090004@oracle.com>
	<CANQc0nfGXu5kPJHTxsFZL5x+JFXkV5XmLxJAByDMBdQWS7=bGQ@mail.gmail.com>
Message-ID: <5540E34C.8060308@oracle.com>

Hi Filipp,

Thank you! I'll do it.

On 04/29/2015 04:01 PM, Filipp Zhinkin wrote:
> Hi Boris,
>
> you have to backport JDK-8068272 [1] first, because this (JDK-8050486)
> fix relies on it.
>
> Regards,
> Filipp.
>
> [1] https://bugs.openjdk.java.net/browse/JDK-8068272
>
> On Tue, Apr 28, 2015 at 4:32 PM, Boris Molodenkov
> <boris.molodenkov at oracle.com> wrote:
>> Hi All,
>>
>> I would like to backport fix for JDK-8050486 to 8u60.
>>
>> Bug id: https://bugs.openjdk.java.net/browse/JDK-8050486
>> Webrev: http://cr.openjdk.java.net/~bmoloden/8050486/webrev.00/
>> Changeset: http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/2f8520599d39
>> Review thread for original fix:
>> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2014-December/016754.html
>>
>> testing: manual with jtreg
>>
>> Thanks,
>> Boris
>>


From roland.westrelin at oracle.com  Wed Apr 29 14:36:01 2015
From: roland.westrelin at oracle.com (Roland Westrelin)
Date: Wed, 29 Apr 2015 16:36:01 +0200
Subject: RFR (XL): 8075492: update IGV in hs-comp
In-Reply-To: <12D47C44-DF22-4DA2-9A30-444FA726D6D3@oracle.com>
References: <12D47C44-DF22-4DA2-9A30-444FA726D6D3@oracle.com>
Message-ID: <5AB09E0F-F065-431C-AB25-0BEFDCF63EF0@oracle.com>

Hi Michael,

> Webrev: http://cr.openjdk.java.net/~mhaupt/8075492/webrev.00

Why do you need the change to BranchData::print_data_on()?

Otherwise I think it's good. I only looked at the hotspot changes and I don't see how the IGV changes themselves can be reviewed. I also gave it a quick try and spotted no problems.

Roland.

From goetz.lindenmaier at sap.com  Wed Apr 29 14:37:46 2015
From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz)
Date: Wed, 29 Apr 2015 14:37:46 +0000
Subject: aarch64 AD-file / matching rule
In-Reply-To: <00D2ABB8-C2DE-45C2-A515-1D6FC222208E@theobroma-systems.com>
References: <00D2ABB8-C2DE-45C2-A515-1D6FC222208E@theobroma-systems.com>
Message-ID: <4295855A5C1DE049A61835A1887419CC2CFC2F71@DEWDFEMB12A.global.corp.sap>

Hi,

I am using PrintOptoAssembly in such cases.  This tells me what the IR looks like after
matching.  Together with PrintAssembly you can locate the block
with the pattern.

With PrintIdeal you can see the graph before matching.  You should find the pattern
you described in the ad rule there.  Hard to read, though.

There is also the PrintIdealGraph flag, printing a graph you can visualize.
I didn't use that, though.  We have instrumented the opto compiler with
our own graph printer.

I could imagine that the AndI node has more than one usage/out edge.
Then it's not a tree-like subgraph, and the matcher can not apply the rule.
This is something you would check in the PrintIdeal output or in the last
Ideal graph before matching.
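
The shared-node condition described above can be sketched in a few lines. This is a simplified model, not HotSpot code: the `Node` struct, its `outcnt` field, `add_input`, `can_subsume` and `and_cmp_is_tree` are hypothetical names chosen for illustration, and the real matcher's handling of shared nodes is more involved.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified IR node: an opcode name, inputs, and a use count.
struct Node {
  std::string op;
  std::vector<Node*> in;
  int outcnt;  // number of users (out edges)
  explicit Node(std::string o) : op(std::move(o)), outcnt(0) {}
};

// Connect 'use' to its input 'def' and keep the use count current.
void add_input(Node& use, Node& def) {
  use.in.push_back(&def);
  def.outcnt++;
}

// An inner node can be folded into a larger match rule only if the rule's
// root is its sole user; otherwise its value must stay available elsewhere.
bool can_subsume(const Node& inner) {
  return inner.outcnt == 1;
}

// Does the pattern (CmpI (AndI a b) zero) form a tree rooted at 'cmp'?
bool and_cmp_is_tree(const Node& cmp) {
  if (cmp.op != "CmpI" || cmp.in.size() != 2) return false;
  const Node* lhs = cmp.in[0];
  assert(lhs != nullptr);
  return lhs->op == "AndI" && can_subsume(*lhs);
}
```

In this model, adding a second user of the AndI node (for example a store of the masked value) makes `and_cmp_is_tree` return false, which is the situation to look for in the PrintIdeal output.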

Best regards,
  Goetz.

From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Benedikt Wedenik
Sent: Wednesday, 29 April 2015 14:50
To: hotspot-compiler-dev at openjdk.java.net
Cc: Dr. Philipp Tomsich; Benedikt Huber
Subject: aarch64 AD-file / matching rule

Hi!

I'm writing compiler-optimisations for the aarch64 port at the moment and I am using specjbb2005 for benchmarking.
One of the patterns I want to optimise is the following:

  0x0000007f8c2961b4: and w2, w2, #0x7ffff8
  0x0000007f8c2961b8: cmp w2, #0x0
  0x0000007f8c2961bc: b.eq     0x0000007f8c2968f4


Here I see an opportunity for ands, b.eq.

I created a new rule in the cpu/aarch64/vm/aarch64.ad file.
My matching looks like this:

instruct and_cmp_branch(cmpOp cmp, immI0 zero, iRegIorL2I src1, immILog src2, label lbl, rFlagsReg cr) %{
  match(If cmp (CmpI (AndI src1 src2) zero) );

  effect(USE lbl);
  ins_cost(0); // is zero at the moment to be sure the rule is triggered.

  ins_encode %{
    Label* L = $lbl$$label;
    Assembler::Condition cond = (Assembler::Condition)$cmp$$cmpcode;
    __ andsw(as_Register($src1$$reg),
        as_Register($src1$$reg),
        (unsigned long)($src2$$constant));
    __ br ((Assembler::Condition)$cmp$$cmpcode, *L);
  %}

  ins_pipe(pipe_cmp_branch); //TODO but not relevant yet
%}


As I don't know whether my matching rule is wrong or something else stops the rule from being emitted, I wanted to find out which "and" rule is triggered for this pattern.
I inserted some nops to locate the corresponding rule and found that most of the emitted "and"s were surrounded by nops, except for my pattern and a few others like this one:

0x0000007f984bf568: eor   x1, x0, x1
0x0000007f984bf56c: and   x1, x1, #0xffffffffffffff87
0x0000007f984bf570: cbz   x1, 0x0000007f984bf664
0x0000007f984bf574: and   xscratch1, x1, #0x7
0x0000007f984bf578: cbnz  xscratch1, 0x0000007f984bf5f0
0x0000007f984bf57c: and   xscratch1, x1, #0x300
0x0000007f984bf580: cbnz  xscratch1, 0x0000007f984bf5b8
0x0000007f984bf584: mov   xscratch1, #0x37f                   // #895
0x0000007f984bf588: and   x0, x0, xscratch1
0x0000007f984bf58c: orr   x1, x0, xthread
0x0000007f984bf590: ldaxr xscratch1, [x3]
0x0000007f984bf594: cmp   xscratch1, x0
0x0000007f984bf598: b.ne  0x0000007f984bf5a8


Usually I call the program like this:

----
JAVA=/root/bwedenik/jdk8/jdk8/build/linux-aarch64-normal-server-release/jdk/bin/java

$JAVA -fullversion
$JAVA -server -XX:+AggressiveOpts -XX:+UseFastAccessorMethods -XX:+OptimizeStringConcat -XX:+UseBiasedLocking -XX:+UseParallelGC -XX:ParallelGCThreads=10 -XX:+UseParallelOldGC -XX:SurvivorRatio=8 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=15  -Xms10g -Xmx10g -Xmn4g -Xss64m -XX:+UnlockDiagnosticVMOptions -XX:CompileCommand='print,*DeliveryTransaction.preprocess' spec.jbb.JBBmain -propfile SPECjbb.props
----


I tried to figure out whether this problem occurs only with c1, c2 or pure interpretation mode, and these are the results (calling java as usual, including the given arguments):

* [-Xint] : This gives me neither the inserted nops nor the pattern I am searching for (as expected, since nothing is compiled).
* [-client -Xcomp -XX:-TieredCompilation] : Here the cmp with #0x0 occurs only about 3 times in the whole disassembly, instead of about 200 times without these flags. In addition, none of my inserted nops appear in the disassembly.
* [-server -Xcomp -XX:-TieredCompilation] : Same as -client.


My question now is how to find out why the rule does not match (or whether the rule is correct at all), and how to find the actual rule that emits the code for my desired pattern.

Thanks in advance,
Benedikt Wedenik, Theobroma-Systems.com<http://Theobroma-Systems.com>

From michael.haupt at oracle.com  Wed Apr 29 14:40:35 2015
From: michael.haupt at oracle.com (Michael Haupt)
Date: Wed, 29 Apr 2015 16:40:35 +0200
Subject: RFR (XL): 8075492: update IGV in hs-comp
In-Reply-To: <5AB09E0F-F065-431C-AB25-0BEFDCF63EF0@oracle.com>
References: <12D47C44-DF22-4DA2-9A30-444FA726D6D3@oracle.com>
	<5AB09E0F-F065-431C-AB25-0BEFDCF63EF0@oracle.com>
Message-ID: <736465FA-F43B-4F66-BBF9-913EE0288FDB@oracle.com>

Hi Roland,

> On 29.04.2015 at 16:36, Roland Westrelin <roland.westrelin at oracle.com> wrote:
>> Webrev: http://cr.openjdk.java.net/~mhaupt/8075492/webrev.00
> 
> Why do you need the change to BranchData::print_data_on()?

the extra line break otherwise emitted would confuse IGV's input parsing logic, generating a plethora of warning messages.

> Otherwise I think it's good. I only looked at the hotspot changes and I don't see how the IGV changes themselves can be reviewed. I also gave it a quick try and spotted no problems.

Thanks,

Michael

-- 

 <http://www.oracle.com/>
Dr. Michael Haupt | Principal Member of Technical Staff
Phone: +49 331 200 7277 | Fax: +49 331 200 7561
Oracle Java Platform Group | HotSpot Compiler Team 
Oracle Deutschland B.V. & Co. KG, Schiffbauergasse 14 | 14467 Potsdam, Germany
 <http://www.oracle.com/commitment>	Oracle is committed to developing practices and products that help protect the environment


From roland.westrelin at oracle.com  Wed Apr 29 14:45:33 2015
From: roland.westrelin at oracle.com (Roland Westrelin)
Date: Wed, 29 Apr 2015 16:45:33 +0200
Subject: RFR (XL): 8075492: update IGV in hs-comp
In-Reply-To: <736465FA-F43B-4F66-BBF9-913EE0288FDB@oracle.com>
References: <12D47C44-DF22-4DA2-9A30-444FA726D6D3@oracle.com>
	<5AB09E0F-F065-431C-AB25-0BEFDCF63EF0@oracle.com>
	<736465FA-F43B-4F66-BBF9-913EE0288FDB@oracle.com>
Message-ID: <97A05047-C96A-487F-BE28-2F8C9BEDD099@oracle.com>

>>> Webrev: http://cr.openjdk.java.net/~mhaupt/8075492/webrev.00
>> 
>> Why do you need the change to BranchData::print_data_on()?
> 
> the extra line break otherwise emitted would confuse IGV's input parsing logic, generating a plethora of warning messages.

BranchData::print_data_on() is also used by PrintMethodData which prints a nicely formatted dump of profiling data. BranchData is not the only profiling data that is printed on multiple lines. Why does this one in particular confuse the IGV?

Roland.

From rickard.backman at oracle.com  Wed Apr 29 14:52:59 2015
From: rickard.backman at oracle.com (Rickard Bäckman)
Date: Wed, 29 Apr 2015 16:52:59 +0200
Subject: RFR (M): 8064458 OopMap class could be more compact
In-Reply-To: <553FF92E.3010908@oracle.com>
References: <20150428100337.GB31204@rbackman>
 <553FF92E.3010908@oracle.com>
Message-ID: <20150429145259.GD31204@rbackman>

On 04/28, Vladimir Kozlov wrote:
> You need closed SA changes.

I have them, just forgot to send the mail. Small changes too.

> 
> Style:
> 
> Move fields to the beginning of ImmutableOopMapBuilder and classes.
> Add comments describing each field in ALL new classes. Add comments
> to fields in old class. It will help the next person who looks at
> oop maps later.

All fields are at the beginning? I am just following the style and keeping
friends above fields, as other classes in the same file do.

> 
> Add ResourceMark into ImmutableOopMapSet::build_from() to free memory allocated in ImmutableOopMapBuilder().
> Why do you need ImmutableOopMapBuilder to be a friend of class ImmutableOopMap?

It makes a call to data_addr() which is private. Only for debug builds
so I wrapped it in DEBUG_ONLY.

> 
> I think a simple loop in ImmutableOopMapBuilder::verify() would be faster than calling memcmp.

Fixed.

> 
> Field _end is not used.

Removed.

> 
> ImmutableOopMapBuilder() calls reset() and heap_size(), called next,
> calls reset() again. Maybe move reset() to the end of heap_size()
> so that you don't need to call it in fill().
> 

Better yet, the reset() method isn't required anymore.

Updated webrev: http://cr.openjdk.java.net/~rbackman/8064458.2

/R

> Thanks,
> Vladimir
> 
> On 4/28/15 3:03 AM, Rickard Bäckman wrote:
> >Hi all,
> >
> >can I please have reviews for this change:
> >
> >RFR: http://cr.openjdk.java.net/~rbackman/8064458.1/
> >RFE: http://bugs.openjdk.java.net/browse/JDK-8064458
> >
> >While looking at OopMaps a while ago I noticed that there were a couple
> >of different fields that were unused after the OopMaps were finalised.
> >
> >I took some time to investigate and rearrange the OopMaps. Since I
> >didn't want to change how the OopMaps are built I introduced new data
> >structures. ImmutableOopMapSet and ImmutableOopMap. The original OopMap
> >structures are used to build up the OopMaps and when finalised they are
> >copied into the Immutable variants.
> >
> >The ImmutableOopMapSet contains a few fields [size, count] and then a
> >list of [pc, offset]. The offset points to the offset after the list
> >where the ImmutableOopMap is placed. By moving pc out from OopMap to be
> >part of the list we can now have multiple pcs with identical OopMaps
> >point to the same data.
> >
> >We only keep 1 empty OopMap, and the other compaction that is done in
> >this change is to check if the OopMap is identical to the previous one
> >and then reuse that one. So no complete uniqueness check.
> >
> >I ran a couple of small benchmarks and printed the size of the old
> >OopMaps vs the new. The new layout uses about 20 - 25% of the space on
> >the benchmarks I've run.
> >
> >Tested by running through JPRT, running BigApps and NSK.quick.testlist
> >
> >Thanks
> >/R
> >
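
The layout and compaction rules described in the quoted text can be modelled roughly as follows. This is a sketch under stated assumptions, not the code from the webrev: `MapBytes`, `PcOffsetPair`, `CompactMapSet` and `build` are hypothetical names, a map is reduced to raw bytes, and a one-byte length prefix stands in for whatever header the real ImmutableOopMap carries.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for one encoded oop map: just its raw bytes.
typedef std::vector<std::uint8_t> MapBytes;

// One entry of the [pc, offset] list.
struct PcOffsetPair {
  int pc_offset;           // code offset the map applies to
  std::size_t map_offset;  // where the map starts in 'data'
};

// Compact set: the list of pairs, followed (conceptually) by the map data.
struct CompactMapSet {
  std::vector<PcOffsetPair> pairs;
  MapBytes data;
};

// Build the compact form. A map is shared when it is empty (only one empty
// map is stored) or identical to the immediately preceding map. There is
// deliberately no full uniqueness check.
CompactMapSet build(const std::vector<std::pair<int, MapBytes> >& maps) {
  CompactMapSet set;
  bool have_empty = false, have_last = false;
  std::size_t empty_offset = 0, last_offset = 0;
  const MapBytes* last = nullptr;

  for (std::size_t i = 0; i < maps.size(); i++) {
    const MapBytes& bytes = maps[i].second;
    std::size_t offset;
    if (bytes.empty() && have_empty) {
      offset = empty_offset;                 // reuse the single empty map
    } else if (have_last && bytes == *last) {
      offset = last_offset;                  // same as previous: share it
    } else {
      assert(bytes.size() < 256);            // length must fit the prefix byte
      offset = set.data.size();
      set.data.push_back(static_cast<std::uint8_t>(bytes.size()));
      set.data.insert(set.data.end(), bytes.begin(), bytes.end());
      if (bytes.empty()) { have_empty = true; empty_offset = offset; }
    }
    PcOffsetPair pair = { maps[i].first, offset };
    set.pairs.push_back(pair);
    last = &bytes; last_offset = offset; have_last = true;
  }
  return set;
}
```

Because only the previous map is compared, a map that reappears after a different one is stored again; that is the trade-off the description calls "no complete uniqueness check".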

From rickard.backman at oracle.com  Wed Apr 29 14:53:09 2015
From: rickard.backman at oracle.com (Rickard Bäckman)
Date: Wed, 29 Apr 2015 16:53:09 +0200
Subject: RFR (M): 8064458 OopMap class could be more compact
In-Reply-To: <553F7B95.7070103@oracle.com>
References: <20150428100337.GB31204@rbackman>
 <553F7B95.7070103@oracle.com>
Message-ID: <20150429145309.GE31204@rbackman>

On 04/28, Bertrand Delsart wrote:
> Hi,
> 
> First, thanks for the change. The additional benefit is that an
> ImmutableOopMapSet no longer contains any absolute references.
> 
> A few comments.
> 
> The ImmutableOopMapSet.java seems to be missing in the webrev.

Added.

> 
> There also seem to be a few issues with ImmutableOopMap::print_on
> and ImmutableOopMapSet::print_on:
> 
> - I did not spot the closing "}" corresponding to
>   "ImmutableOopMap{"

Fixed.

> 
> - the "map != last" part does not look complete. I assume that you
> are trying to dump only once the OopMap when it is shared by
> successive pcs but you are then missing a "last = map;" line
> somewhere in the if statement.

Thanks, missed that. Fixed.
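
The "print a shared map only once" logic discussed above can be sketched like this. It is an illustration only: `Entry`, `print_set` and the output format are hypothetical and do not reproduce ImmutableOopMapSet::print_on; the point is the `last = e.map` update inside the if statement.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical entry: a pc offset plus a pointer to its (possibly shared) map.
struct Entry {
  int pc_offset;
  const std::string* map;  // map text; successive entries may share one map
};

// Print each map once, followed by every pc offset that refers to it.
std::string print_set(const std::vector<Entry>& entries) {
  std::ostringstream out;
  const std::string* last = nullptr;
  for (std::size_t i = 0; i < entries.size(); i++) {
    const Entry& e = entries[i];
    assert(e.map != nullptr);
    if (e.map != last) {
      out << "map " << *e.map << "\n";
      last = e.map;  // without this update every entry would reprint its map
    }
    out << "  pc offset: " << e.pc_offset << "\n";
  }
  return out.str();
}
```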

> 
> - as a minor point, I'd rather print "offs:" or "pc offsets:"
> instead of "pcs:" because OopMapSet use pc offsets, not absolute
> pcs. [ Your CR might also be the right time to replace "pc" by
> "pc_offset" for a few field or variable names since this is a bit
> confusing. ]

Changed to "pc offsets:". Renamed some of the new fields as well. Didn't
want to mess with the old ones.

Updated webrev: http://cr.openjdk.java.net/~rbackman/8064458.2/


Thanks
/R

> 
> Regards,
> 
> Bertrand.
> 
> On 28/04/2015 12:03, Rickard Bäckman wrote:
> >Hi all,
> >
> >can I please have reviews for this change:
> >
> >RFR: http://cr.openjdk.java.net/~rbackman/8064458.1/
> >RFE: http://bugs.openjdk.java.net/browse/JDK-8064458
> >
> >While looking at OopMaps a while ago I noticed that there were a couple
> >of different fields that were unused after the OopMaps were finalised.
> >
> >I took some time to investigate and rearrange the OopMaps. Since I
> >didn't want to change how the OopMaps are built I introduced new data
> >structures. ImmutableOopMapSet and ImmutableOopMap. The original OopMap
> >structures are used to build up the OopMaps and when finalised they are
> >copied into the Immutable variants.
> >
> >The ImmutableOopMapSet contains a few fields [size, count] and then a
> >list of [pc, offset]. The offset points to the offset after the list
> >where the ImmutableOopMap is placed. By moving pc out from OopMap to be
> >part of the list we can now have multiple pcs with identical OopMaps
> >point to the same data.
> >
> >We only keep 1 empty OopMap, and the other compaction that is done in
> >this change is to check if the OopMap is identical to the previous one
> >and then reuse that one. So no complete uniqueness check.
> >
> >I ran a couple of small benchmarks and printed the size of the old
> >OopMaps vs the new. The new layout uses about 20 - 25% of the space on
> >the benchmarks I've run.
> >
> >Tested by running through JPRT, running BigApps and NSK.quick.testlist
> >
> >Thanks
> >/R
> >
> 
> 
> -- 
> Bertrand Delsart,                     Grenoble Engineering Center
> Oracle,         180 av. de l'Europe,          ZIRST de Montbonnot
> 38330 Montbonnot Saint Martin,                             FRANCE
> bertrand.delsart at oracle.com             Phone : +33 4 76 18 81 23
> 
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> NOTICE: This email message is for the sole use of the intended
> recipient(s) and may contain confidential and privileged
> information. Any unauthorized review, use, disclosure or
> distribution is prohibited. If you are not the intended recipient,
> please contact the sender by reply email and destroy all copies of
> the original message.
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/R

From michael.haupt at oracle.com  Wed Apr 29 15:18:17 2015
From: michael.haupt at oracle.com (Michael Haupt)
Date: Wed, 29 Apr 2015 17:18:17 +0200
Subject: RFR (XL): 8075492: update IGV in hs-comp
In-Reply-To: <97A05047-C96A-487F-BE28-2F8C9BEDD099@oracle.com>
References: <12D47C44-DF22-4DA2-9A30-444FA726D6D3@oracle.com>
	<5AB09E0F-F065-431C-AB25-0BEFDCF63EF0@oracle.com>
	<736465FA-F43B-4F66-BBF9-913EE0288FDB@oracle.com>
	<97A05047-C96A-487F-BE28-2F8C9BEDD099@oracle.com>
Message-ID: <D5C7C768-1CFA-4EF3-9400-DA348327DF92@oracle.com>

Hi Roland,

> On 29.04.2015 at 16:45, Roland Westrelin <roland.westrelin at oracle.com> wrote:
>>>> Webrev: http://cr.openjdk.java.net/~mhaupt/8075492/webrev.00
>>> 
>>> Why do you need the change to BranchData::print_data_on()?
>> 
>> the extra line break otherwise emitted would confuse IGV's input parsing logic, generating a plethora of warning messages.
> 
> BranchData::print_data_on() is also used by PrintMethodData which prints a nicely formatted dump of profiling data. BranchData is not the only profiling data that is printed on multiple lines. Why does this one in particular confuse the IGV?


The dump contains CDATA sections with method bytecode listings. One example is this:

9 if_icmple 17
  16  bci: 9    BranchData          taken(6701) displacement(32)
                                    not taken(0)

Note how the "not taken(0)" appears on an extra line. This extra line confuses the IGV parser, and it will issue a warning like "no match: not taken(0)" for each such case.

The point you're making is a good one; I'll take another look at whether the parsing logic could be improved.

Best,

Michael



From boris.molodenkov at oracle.com  Wed Apr 29 15:41:29 2015
From: boris.molodenkov at oracle.com (Boris Molodenkov)
Date: Wed, 29 Apr 2015 18:41:29 +0300
Subject: [8u60] Request for approval: backport of JDK-8068272
Message-ID: <5540FBA9.6060606@oracle.com>

Hi All,

I would like to backport fix for JDK-8068272 to 8u60.

Bug id: https://bugs.openjdk.java.net/browse/JDK-8068272
Webrev: http://cr.openjdk.java.net/~bmoloden/8068272/webrev.00/
Changesets:
http://hg.openjdk.java.net/jdk9/jdk9/rev/a09f9fd80f87
http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/2025390834c6
Review thread for original fix:
http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2014-December/016751.html

testing: jprt

Thanks,
Boris


From boris.molodenkov at oracle.com  Wed Apr 29 15:44:14 2015
From: boris.molodenkov at oracle.com (Boris Molodenkov)
Date: Wed, 29 Apr 2015 18:44:14 +0300
Subject: [8u60] Request for approval: backport of JDK-8050486
In-Reply-To: <553FD69E.1040202@oracle.com>
References: <553F8BE9.7090004@oracle.com> <553FD69E.1040202@oracle.com>
Message-ID: <5540FC4E.4000707@oracle.com>

On 04/28/2015 09:51 PM, Vladimir Kozlov wrote:
> Good. Were there other changes in RTMTestBase.java which were not 
> backported? Changes are different in that file from jdk9 (but the 
> resulting flags are the same).
There are no other changes in RTMTestBase.java. The changes are different
because the initial versions of RTMTestBase.java in jdk9 and jdk8 differ
slightly.

Thanks,
Boris
>
> Thanks,
> Vladimir
>
> On 4/28/15 6:32 AM, Boris Molodenkov wrote:
>> Hi All,
>>
>> I would like to backport fix for JDK-8050486 to 8u60.
>>
>> Bug id: https://bugs.openjdk.java.net/browse/JDK-8050486
>> Webrev: http://cr.openjdk.java.net/~bmoloden/8050486/webrev.00/
>> Changeset: http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/2f8520599d39
>> Review thread for original fix:
>> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2014-December/016754.html 
>>
>>
>> testing: manual with jtreg
>>
>> Thanks,
>> Boris
>>


From vladimir.kozlov at oracle.com  Wed Apr 29 16:10:26 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 29 Apr 2015 09:10:26 -0700
Subject: RFR(S): 8078426: mb/jvm/compiler/InterfaceCalls/testAC2 -
	assert(predicate_proj == 0L) failed: only one predicate entry expected
In-Reply-To: <914001D5-433D-4B22-9AE7-8672E667DA83@oracle.com>
References: <CDEB8295-F9E7-4152-AC6F-B1FA39501216@oracle.com>	<553FEFE9.4040305@oracle.com>
	<914001D5-433D-4B22-9AE7-8672E667DA83@oracle.com>
Message-ID: <55410272.2090205@oracle.com>

Okay. Thank you for explaining. You are right that we can't remove new predicates (null checks, etc).
Changes look good.

Vladimir

On 4/29/15 2:26 AM, Roland Westrelin wrote:
> Hi Vladimir,
>
> Thanks for looking at this.
>
>> Can we remove predicates when loop is optimized out?
>> What code eliminates the loop?
>
> For the test case, the loop is found empty by IdealLoopTree::policy_do_remove_empty_loop() and the CountedLoopNode is effectively removed by RegionNode::Ideal() because it has a single input. For the crash in mb/jvm/compiler/InterfaceCalls/testAC2, the loop is also removed by RegionNode::Ideal() but I don't know if it's because the loop was empty or for another reason (I assume the backbranch could be removed also because the loop has a single iteration).
>
> We can't remove all predicates without risking incorrect execution, right? The loop could for instance only perform a null check. Loop predication moves the null check out of the loop. The loop becomes empty so it goes away. But the null check can't go away because the method is still supposed to throw an NPE if the null check fails.
>
> We could remove the predicates that GraphKit::add_predicate() adds and that will eventually go away (the ones that test an opaque node). That doesn't help split_if because PhaseIdealLoop::find_predicate() could still find a predicate like the null check above. So we could remove the predicates that test on an opaque node when the loop goes dead, then change split_if so it looks not for any predicates but only for the predicates that test on an opaque node. But that doesn't help either because AFAIU, split_if can be run after the loop optimizations are over and Opaque1Node::Identity() has removed those predicates.
>
> Roland.
>
>>
>> Vladimir
>>
>> On 4/28/15 1:38 AM, Roland Westrelin wrote:
>>> http://cr.openjdk.java.net/~roland/8078426/webrev.00/
>>>
>>> See test case: the loop is unswitched, then the loop bodies become empty so the loops are optimized out. The split if optimization then finds predicates it doesn't expect on both branches of the unswitched loop test.
>>>
>>> Roland.
>>>
>

From michael.haupt at oracle.com  Wed Apr 29 16:17:01 2015
From: michael.haupt at oracle.com (Michael Haupt)
Date: Wed, 29 Apr 2015 18:17:01 +0200
Subject: RFR (XL): 8075492: update IGV in hs-comp
In-Reply-To: <D5C7C768-1CFA-4EF3-9400-DA348327DF92@oracle.com>
References: <12D47C44-DF22-4DA2-9A30-444FA726D6D3@oracle.com>
	<5AB09E0F-F065-431C-AB25-0BEFDCF63EF0@oracle.com>
	<736465FA-F43B-4F66-BBF9-913EE0288FDB@oracle.com>
	<97A05047-C96A-487F-BE28-2F8C9BEDD099@oracle.com>
	<D5C7C768-1CFA-4EF3-9400-DA348327DF92@oracle.com>
Message-ID: <45948EB9-34E8-4D6F-9E30-D6062B0A91AE@oracle.com>

Hi Roland,

> On 29.04.2015 at 17:18, Michael Haupt <michael.haupt at oracle.com> wrote:
> The point you're making is a good one; I'll take another look at whether the parsing logic could be improved.

yes, done. The updated webrev is at http://cr.openjdk.java.net/~mhaupt/8075492/webrev.01 - the change in BranchData is no longer necessary, as the parsing logic (see InputNode.setBytecodes() in IGV) now honours indented lines as extra information.
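
The idea of treating indented lines as extra information can be sketched as below. IGV itself is written in Java and this is not the code of InputNode.setBytecodes(); the `Bytecode` struct, `trim_left` and `parse_bytecodes` are hypothetical, and the sketch assumes that a bytecode line starts in column 0 while profile data lines start with whitespace.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical parsed bytecode: the instruction line plus attached extras.
struct Bytecode {
  std::string text;
  std::vector<std::string> extra;  // indented lines that followed it
};

// Strip leading whitespace.
std::string trim_left(const std::string& s) {
  std::size_t i = 0;
  while (i < s.size() && std::isspace(static_cast<unsigned char>(s[i]))) ++i;
  return s.substr(i);
}

// Lines starting in column 0 open a new bytecode; indented lines are kept as
// extra information on the previous bytecode instead of being rejected.
std::vector<Bytecode> parse_bytecodes(const std::string& listing) {
  std::vector<Bytecode> result;
  std::istringstream in(listing);
  std::string line;
  while (std::getline(in, line)) {
    if (trim_left(line).empty()) continue;  // skip blank lines
    assert(!line.empty());
    bool indented = std::isspace(static_cast<unsigned char>(line[0])) != 0;
    if (indented && !result.empty()) {
      result.back().extra.push_back(trim_left(line));
    } else {
      Bytecode bc;
      bc.text = trim_left(line);
      result.push_back(bc);
    }
  }
  return result;
}
```

With the listing from earlier in the thread, the "not taken(0)" line ends up as a second extra entry on "9 if_icmple 17" rather than as an unmatched line.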

Thanks for pointing this out,

Michael



From vladimir.kozlov at oracle.com  Wed Apr 29 16:48:21 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 29 Apr 2015 09:48:21 -0700
Subject: [8u60] Request for approval: backport of JDK-8068272
In-Reply-To: <5540FBA9.6060606@oracle.com>
References: <5540FBA9.6060606@oracle.com>
Message-ID: <55410B55.4000708@oracle.com>

Looks fine. But don't forget to push the jdk changes; you did not point to a webrev for them. Does jdk8u need it?

Thanks,
Vladimir

On 4/29/15 8:41 AM, Boris Molodenkov wrote:
> Hi All,
>
> I would like to backport fix for JDK-8068272 to 8u60.
>
> Bug id: https://bugs.openjdk.java.net/browse/JDK-8068272
> Webrev: http://cr.openjdk.java.net/~bmoloden/8068272/webrev.00/
> Changesets:
> http://hg.openjdk.java.net/jdk9/jdk9/rev/a09f9fd80f87
> http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/2025390834c6
> Review thread for original fix:
> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2014-December/016751.html
>
> testing: jprt
>
> Thanks,
> Boris
>

From vladimir.kozlov at oracle.com  Wed Apr 29 17:08:32 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 29 Apr 2015 10:08:32 -0700
Subject: RFR (XL): 8075492: update IGV in hs-comp
In-Reply-To: <45948EB9-34E8-4D6F-9E30-D6062B0A91AE@oracle.com>
References: <12D47C44-DF22-4DA2-9A30-444FA726D6D3@oracle.com>	<5AB09E0F-F065-431C-AB25-0BEFDCF63EF0@oracle.com>	<736465FA-F43B-4F66-BBF9-913EE0288FDB@oracle.com>	<97A05047-C96A-487F-BE28-2F8C9BEDD099@oracle.com>	<D5C7C768-1CFA-4EF3-9400-DA348327DF92@oracle.com>
	<45948EB9-34E8-4D6F-9E30-D6062B0A91AE@oracle.com>
Message-ID: <55411010.4000003@oracle.com>

Please, update Copyright year in all files which have it. For example:

  * Copyright (c) 2008, 2015, Oracle and/or its affiliates. All rights reserved.

New files can also carry two years, since they were created earlier.

Add copyright header to IdealGraphVisualizer/igv.sh

Otherwise looks fine.

Thanks,
Vladimir

On 4/29/15 9:17 AM, Michael Haupt wrote:
> Hi Roland,
>
>> On 29.04.2015 at 17:18, Michael Haupt <michael.haupt at oracle.com <mailto:michael.haupt at oracle.com>> wrote:
>> The point you're making is a good one; I'll take another look at whether the parsing logic could be improved.
>
> yes, done. The updated webrev is at http://cr.openjdk.java.net/~mhaupt/8075492/webrev.01 - the change in BranchData is
> no longer necessary, as the parsing logic (see InputNode.setBytecodes() in IGV) now honours indented lines as extra
> information.
>
> Thanks for pointing this out,
>
> Michael
>
> --
>
> Oracle <http://www.oracle.com/>
> Dr. Michael Haupt | Principal Member of Technical Staff
> Phone: +49 331 200 7277 | Fax: +49 331 200 7561
> OracleJava Platform Group | HotSpot Compiler Team
> Oracle Deutschland B.V. & Co. KG, Schiffbauergasse 14 | 14467 Potsdam, Germany
> Green Oracle <http://www.oracle.com/commitment>	Oracle is committed to developing practices and products that help
> protect the environment
>
>

From vladimir.kozlov at oracle.com  Wed Apr 29 17:21:48 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 29 Apr 2015 10:21:48 -0700
Subject: RFR (M): 8064458 OopMap class could be more compact
In-Reply-To: <20150429145259.GD31204@rbackman>
References: <20150428100337.GB31204@rbackman> <553FF92E.3010908@oracle.com>
	<20150429145259.GD31204@rbackman>
Message-ID: <5541132C.9020208@oracle.com>

On 4/29/15 7:52 AM, Rickard Bäckman wrote:
> On 04/28, Vladimir Kozlov wrote:
>> You need closed SA changes.
>
> I have them, just forgot to send the mail. Small changes too.
>
>>
>> Style:
>>
>> Move fields to the beginning of ImmutableOopMapBuilder and classes.
>> Add comments describing each field in ALL new classes. Add comments
>> to fields in old class. It will help the next person who looks at
>> oop maps later.
>
> All fields are at the beginning? I am just following the style and keeping
> friends above fields, as other classes in the same file do.

   friends
   fields
   methods

It is better to see fields before the methods which access them. I am not sure what coding style you are talking about. All
classes in oopMap.hpp have the above layout.

I am asking to change the layout of the ImmutableOopMapBuilder and Mapping classes. Also, why is Mapping an inner class?


>
>>
>> Add ResourceMark into ImmutableOopMapSet::build_from() to free memory allocated in ImmutableOopMapBuilder().

no ResourceMark in 8064458.2

>> Why do you need ImmutableOopMapBuilder to be a friend of class ImmutableOopMap?
>
> It makes a call to data_addr() which is private. Only for debug builds
> so I wrapped it in DEBUG_ONLY.

Put ImmutableOopMapBuilder::verify() under #ifdef ASSERT and its declaration in DEBUG_ONLY.

Thanks,
Vladimir

>
>>
>> I think a simple loop in ImmutableOopMapBuilder::verify() would be faster than calling memcmp.
>
> Fixed.
>
>>
>> Field _end is not used.
>
> Removed.
>
>>
>> ImmutableOopMapBuilder() calls reset() and heap_size(), called next,
>> calls reset() again. Maybe move reset() to the end of heap_size()
>> so that you don't need to call it in fill().
>>
>
> Better yet, the reset() method isn't required anymore.
>
> Updated webrev: http://cr.openjdk.java.net/~rbackman/8064458.2
>
> /R
>
>> Thanks,
>> Vladimir
>>
>> On 4/28/15 3:03 AM, Rickard Bäckman wrote:
>>> Hi all,
>>>
>>> can I please have reviews for this change:
>>>
>>> RFR: http://cr.openjdk.java.net/~rbackman/8064458.1/
>>> RFE: http://bugs.openjdk.java.net/browse/JDK-8064458
>>>
>>> While looking at OopMaps a while ago I noticed that there were a couple
>>> of different fields that were unused after the OopMaps were finalised.
>>>
>>> I took some time to investigate and rearrange the OopMaps. Since I
>>> didn't want to change how the OopMaps are built I introduced new data
>>> structures. ImmutableOopMapSet and ImmutableOopMap. The original OopMap
>>> structures are used to build up the OopMaps and when finalised they are
>>> copied into the Immutable variants.
>>>
>>> The ImmutableOopMapSet contains a few fields [size, count] and then a
>>> list of [pc, offset]. The offset points to the offset after the list
>>> where the ImmutableOopMap is placed. By moving pc out from OopMap to be
>>> part of the list we can now have multiple pcs with identical OopMaps
>>> point to the same data.
>>>
>>> We only keep 1 empty OopMap, and the other compaction that is done in
>>> this change is to check if the OopMap is identical to the previous one
>>> and then reuse that one. So no complete uniqueness check.
>>>
>>> I ran a couple of small benchmarks and printed the size of the old
>>> OopMaps vs the new. The new layout uses about 20 - 25% of the space on
>>> the benchmarks I've run.
>>>
>>> Tested by running through JPRT, running BigApps and NSK.quick.testlist
>>>
>>> Thanks
>>> /R
>>>

From michael.c.berg at intel.com  Wed Apr 29 17:23:55 2015
From: michael.c.berg at intel.com (Berg, Michael C)
Date: Wed, 29 Apr 2015 17:23:55 +0000
Subject: RFR 8078563 - add profitability tests for reductions
In-Reply-To: <BA406C4C-87EA-43D2-8948-53F3F6045FD7@oracle.com>
References: <C568518E7B433348B114B6A7122D474755DDE7A4@FMSMSX102.amr.corp.intel.com>
	<BA406C4C-87EA-43D2-8948-53F3F6045FD7@oracle.com>
Message-ID: <C568518E7B433348B114B6A7122D474755DDF1A4@FMSMSX102.amr.corp.intel.com>

Christian, do you have any additional comments or does the code look ok?

Thanks,
Michael

From: Christian Thalinger [mailto:christian.thalinger at oracle.com]
Sent: Monday, April 27, 2015 9:49 AM
To: Berg, Michael C
Cc: hotspot-compiler-dev at openjdk.java.net
Subject: Re: RFR 8078563 - add profitability tests for reductions


+       // Length 2 reductions of INT/LONG do not offer performance benefits

+       if (((arith_type->basic_type() == T_INT) || (arith_type->basic_type() == T_LONG)) && (size == 2)) {

I don't know that code very well but can there be reductions with size == 1?
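
For readers without the webrev at hand, the quoted condition can be read as a small predicate. Only the condition itself comes from the patch excerpt; the `BasicType` subset, the function name `reduction_is_profitable` and the boolean result are assumptions made for this sketch, and whether a size of 1 can occur in superword is left open here.

```cpp
#include <cassert>

// Hypothetical subset of element types, named after HotSpot's BasicType.
enum BasicType { T_INT, T_LONG, T_FLOAT, T_DOUBLE };

// Sketch of the gate: length-2 reductions of INT/LONG are treated as not
// worth vectorizing; everything else passes this particular check.
bool reduction_is_profitable(BasicType type, int size) {
  assert(size > 0);  // a pack size is positive in this model
  bool is_int_or_long = (type == T_INT) || (type == T_LONG);
  if (is_int_or_long && size == 2) {
    return false;
  }
  return true;
}
```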

On Apr 23, 2015, at 5:53 PM, Berg, Michael C <michael.c.berg at intel.com<mailto:michael.c.berg at intel.com>> wrote:

Hi Folks,

We (Intel) would like to add profitability tests to superword to gate scenarios where reduction optimization overhead is roughly equal to the benefit gained by vectorization.
We would like to do this for all x86 enabled microarchitectures that support reductions and superword.  This new constraint was tested on SSE and AVX (1,2) enabled platforms.
The contribution as referenced by RFR 8078563 is defined by the information at the links below.

Please review this bug entry and its code and comment as needed:

https://bugs.openjdk.java.net/browse/JDK-8078563

And its code and test addition (this is a small patch):

http://cr.openjdk.java.net/~kvn/8078563/webrev/


Vladimir Kozlov has offered to sponsor this patch.

Thanks,
Michael
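
For readers unfamiliar with the term, a scalar reduction is a loop that folds every array element into a single accumulator; superword can turn such a loop into vector adds followed by a final horizontal add. The sketch below only illustrates the loop shape under discussion. The class and method names are hypothetical and are not taken from the patch or its test.

```java
// Illustrative only: ReductionSketch and sum() are hypothetical names,
// not part of the webrev for JDK-8078563.
final class ReductionSketch {
    // Integer add reduction: each iteration folds a[i] into the accumulator.
    // Per the comment in the patch under review, int/long reductions that
    // would use only two vector lanes are treated as not profitable.
    static int sum(int[] a) {
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] a = new int[1024];
        for (int i = 0; i < a.length; i++) {
            a[i] = i;
        }
        System.out.println(sum(a));
    }
}
```

Whether the JIT actually vectorizes this loop depends on the platform and flags; the sketch makes no claim about the generated code.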


From christian.thalinger at oracle.com  Wed Apr 29 17:27:01 2015
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Wed, 29 Apr 2015 10:27:01 -0700
Subject: RFR 8078563 - add profitability tests for reductions
In-Reply-To: <C568518E7B433348B114B6A7122D474755DDF1A4@FMSMSX102.amr.corp.intel.com>
References: <C568518E7B433348B114B6A7122D474755DDE7A4@FMSMSX102.amr.corp.intel.com>
	<BA406C4C-87EA-43D2-8948-53F3F6045FD7@oracle.com>
	<C568518E7B433348B114B6A7122D474755DDF1A4@FMSMSX102.amr.corp.intel.com>
Message-ID: <05BF44DD-26A9-4C22-8A0D-ABCEF31FCC86@oracle.com>

I think this looks good but as I said I'm not an expert in this area.  It would be good to have an additional reviewer.

> On Apr 29, 2015, at 10:23 AM, Berg, Michael C <michael.c.berg at intel.com> wrote:
> 
> Christian, do you have any additional comments or does the code look ok?
>  
> Thanks,
> Michael
>  
> From: Christian Thalinger [mailto:christian.thalinger at oracle.com] 
> Sent: Monday, April 27, 2015 9:49 AM
> To: Berg, Michael C
> Cc: hotspot-compiler-dev at openjdk.java.net
> Subject: Re: RFR 8078563 - add profitability tests for reductions
>  
> +       // Length 2 reductions of INT/LONG do not offer performance benefits
> +       if (((arith_type->basic_type() == T_INT) || (arith_type->basic_type() == T_LONG)) && (size == 2)) {
>  
> I don't know that code very well but can there be reductions with size == 1?
>  
> On Apr 23, 2015, at 5:53 PM, Berg, Michael C <michael.c.berg at intel.com <mailto:michael.c.berg at intel.com>> wrote:
>  
> Hi Folks,
> 
> We (Intel) would like to add profitability tests to superword to gate scenarios where reduction optimization overhead is roughly equal to the benefit gained by vectorization.
> We would like to do this for all x86 enabled microarchitectures that support reductions and superword.  This new constraint was tested on SSE and AVX (1,2) enabled platforms.
> The contribution as referenced by RFR 8078563 is defined by the information at the links below.
> 
> Please review this bug entry and its code and comment as needed:
> 
> https://bugs.openjdk.java.net/browse/JDK-8078563 <https://bugs.openjdk.java.net/browse/JDK-8078563>
>  
> And its code and test addition (this is a small patch):
>  
> http://cr.openjdk.java.net/~kvn/8078563/webrev/ <http://cr.openjdk.java.net/~kvn/8078563/webrev/>
> 
> 
> Vladimir Kozlov has offered to sponsor this patch.
>  
> Thanks,
> Michael
>  


From vladimir.kozlov at oracle.com  Wed Apr 29 17:38:53 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 29 Apr 2015 10:38:53 -0700
Subject: [9] RFR(S): 8078497: C2's superword optimization causes unaligned
	memory accesses
In-Reply-To: <5540D237.8000107@oracle.com>
References: <5540D237.8000107@oracle.com>
Message-ID: <5541172D.5010200@oracle.com>

Excellent description.

Consider reducing the number of iterations in the test from 1M so that the test does not time out on slow platforms.

lines 511-515 are duplicates of 486-491.  Consider moving them before vw%span check.

Typo 'iff':
+    // final offset is a multiple of vw iff init_offset is a multiple.

Thanks,
Vladimir

On 4/29/15 5:44 AM, Tobias Hartmann wrote:
> Hi,
>
> please review the following patch.
>
> https://bugs.openjdk.java.net/browse/JDK-8078497
> http://cr.openjdk.java.net/~thartmann/8078497/webrev.00/
>
> Background information (simplified):
> After loop optimizations, C2 tries to vectorize memory operations by finding and merging adjacent memory accesses in the loop body. The superword optimization matches MemNodes with the following pattern as address input:
>
>    ptr + k*iv + constant [+ invar]
>
> where iv is the loop induction variable, k is a scaling factor that may be 0 and invar is an optional loop invariant value. C2 then picks the memory operation with most similar references and tries to align it in the main loop by setting the number of pre-loop iterations accordingly. Other adjacent memory operations that conform to this alignment are merged into packs. After extending and filtering these packs, final vector operations are emitted.
>
> The problem is that some architectures (for example, Sparc) require memory accesses to be aligned and in special cases the superword optimization generates code that contains unaligned vector instructions.
>
> Problems:
> (1) If two memory operations have different loop invariant offset values and C2 decides to align the main loop to one of them, we can always set the invariant of the other operation such that we break the alignment constraint. For example, if we have a store to a byte array a[i+inv1] and a load from a byte array b[i+inv2] (where i is the induction variable), C2 may decide to align the main loop such that ((i+inv1) % 8) == 0 and replace the adjacent byte stores by an 8-byte double word store. Also the byte loads will be replaced by a double word load. If we then set inv2 = inv1 + 1 at runtime, the load will fail with a SIGBUS because the access is not 8-byte aligned, i.e., ((i+inv2) % 8) != 0. Vectorization should not take place in this case.
>
> (2) If C2 decides to align the main loop to a memory operation, the necessary adjustment of the induction variable by the pre-loop is computed such that the resulting offset in the main loop is aligned to the vector width (see 'SuperWord::get_iv_adjustment'). If the offset of a memory operation is independent of the loop induction variable (i.e., scale k is 0), the iv adjustment should be 0.
>
> (3) If the loop span is greater than the vector width 'vw' of a memory operation, the superword optimization assumes that the pre-loop is never able to align this operation in the main loop (see 'SuperWord::ref_is_alignable'). This is wrong because if the loop span is a multiple of vw and depending on the initial offset, it may very well be possible to align the operation to vw.
>
> These problems originally only showed up in the string density code base (JDK-8054307) where we have a putChar intrinsic that writes a char value to two entries of a byte array. To reproduce the issue with JDK9 we have to make sure that the memory operations:
> (i) are independent,
> (ii) have different invariants,
> (iii) are not too complex (because then vectorization will not take place).
>
> To guarantee (i) we either need two different arrays (for example, byte[] and char[]) for the load and store operations or the same array but different offsets. Since the offsets should include a loop invariant part (ii), the superword optimization will not be able to determine that the runtime offset values do not overlap. We therefore use Unsafe.getChar/putChar to read/write a char value from/to a byte/char array and thereby guarantee independence.
>
> I came up with the following test (see 'TestVectorizationWithInvariant.java'):
>
>    byte[] src = new byte[1000];
>    char[] dst = new char[1000];
>
>    for (int i = (int) CHAR_ARRAY_OFFSET; i < 100; i = i + 8) {
>      // Copy 8 chars from src to dst
>      unsafe.putChar(dst, i + 0, unsafe.getChar(src, off + 0));
>      [...]
>      unsafe.putChar(dst, i + 14, unsafe.getChar(src, off + 14));
>    }
>
> Problem (1) shows up since the main loop will be aligned to the StoreC[i + 0] which has no invariant. However, the LoadUS[off + 0] has the loop invariant 'off'. Setting off to BYTE_ARRAY_OFFSET + 2 will break the alignment of the emitted double word load and result in a crash.
>
> The LoadUS[off + 0] in above example is independent of the loop induction variable and therefore has no scale value. Because of problem (2), the iv adjustment is computed as -4 (see 'SuperWord::get_iv_adjustment').
>
>    offset = 16 iv_adjust = -4 elt_size = 2 scale = 0 iv_stride = 16 vect_size 8
>
> The regression test contains additional test cases that also trigger problem (3).
>
> Solution:
> (1) I added a check to 'SuperWord::find_adjacent_refs' to make sure that memory accesses with different invariants are only vectorized if the architecture supports unaligned memory accesses.
> (2) 'SuperWord::get_iv_adjustment' is modified such that it returns 0 if the memory operation is independent of iv.
> (3) In 'SuperWord::ref_is_alignable' we need to additionally check if span is divisible by vw. If so, the pre-loop will add multiples of vw to the initial offset and if that initial offset is divisible by vw, the final offset (after the pre-loop) will be divisible as well and we should return true. I added comments to the code describing this in detail.
>
> Testing:
> - original test with string-density codebase
> - regression test
> - JPRT
>
> Thanks,
> Tobias
>
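
The alignment reasoning behind problems (1) and (3) in the quoted description can be checked with plain modular arithmetic. The following is a minimal sketch of that reasoning only; it is not the HotSpot code, and the class and method names are hypothetical. Offsets and the vector width vw are in bytes.

```java
// Illustrative only: AlignmentSketch and its methods are hypothetical names
// that restate the arithmetic from the description, not SuperWord code.
final class AlignmentSketch {
    // True if the byte offset is a multiple of the vector width vw.
    static boolean isAligned(long offset, int vw) {
        return Math.floorMod(offset, (long) vw) == 0;
    }

    // Problem (1): the pre-loop can choose i so that (i + inv1) is aligned,
    // but (i + inv2) is aligned as well only if inv2 - inv1 is a multiple of vw.
    static boolean bothAligned(long i, long inv1, long inv2, int vw) {
        return isAligned(i + inv1, vw) && isAligned(i + inv2, vw);
    }

    // Problem (3): when span is a multiple of vw, every pre-loop iteration
    // shifts the offset by a multiple of vw, so the final offset is aligned
    // exactly when the initial offset is aligned.
    static boolean alignableWhenSpanIsMultiple(long initOffset, int span, int vw) {
        return span % vw == 0 && isAligned(initOffset, vw);
    }

    public static void main(String[] args) {
        // Main loop aligned for the store: (8 + 0) % 8 == 0.
        System.out.println(isAligned(8 + 0, 8));
        // With inv2 = inv1 + 1 the load is misaligned: (8 + 1) % 8 == 1.
        System.out.println(bothAligned(8, 0, 1, 8));
        // Span 16 is a multiple of vw 8 and the initial offset 16 is aligned.
        System.out.println(alignableWhenSpanIsMultiple(16, 16, 8));
    }
}
```

The first case is why vectorizing accesses with different invariants is only safe where unaligned accesses are supported: no choice of i can compensate for an invariant difference that is not a multiple of vw.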

From vladimir.x.ivanov at oracle.com  Wed Apr 29 18:11:08 2015
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Wed, 29 Apr 2015 21:11:08 +0300
Subject: [9] RFR (S): 8059241: Incremental inlining is too hot when
	compiling Nashorn/Octane
In-Reply-To: <553DEFF9.90801@oracle.com>
References: <553AD508.8010609@oracle.com>
	<8A6D4EA1-6527-41D9-9DBB-C707C84FECCB@oracle.com>
	<553DEFF9.90801@oracle.com>
Message-ID: <55411EBC.8050008@oracle.com>

Sigh, I missed an important case.

Incremental inlining fails miserably when a call site dies while waiting 
to be inlined. It happens when a previously inlined call causes some 
branches to be eliminated, but the info hasn't been propagated yet.

I tried to enhance dead code detection logic, but failed. So, I reverted 
the following part of original fix:
  "(1) Reduce PhaseRemoveUseless frequency: inline in larger chunks 
until IR size LiveNodeCountInliningCutoff, then eliminate dead nodes."

Updated webrev:
http://cr.openjdk.java.net/~vlivanov/8059241/webrev.01

Best regards,
Vladimir Ivanov

On 4/27/15 11:14 AM, Vladimir Ivanov wrote:
> Thanks for reviews, John, Roland, and Aleksey.
>
> Best regards,
> Vladimir Ivanov
>
> On 4/27/15 10:39 AM, Roland Westrelin wrote:
>>> http://cr.openjdk.java.net/~vlivanov/8059241/webrev.00
>>
>> That looks good to me.
>>
>> Roland.
>>

From vladimir.x.ivanov at oracle.com  Wed Apr 29 18:17:16 2015
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Wed, 29 Apr 2015 21:17:16 +0300
Subject: [9] RFR (S): 8059241: Incremental inlining is too hot when
	compiling Nashorn/Octane
In-Reply-To: <55411EBC.8050008@oracle.com>
References: <553AD508.8010609@oracle.com>
	<8A6D4EA1-6527-41D9-9DBB-C707C84FECCB@oracle.com>
	<553DEFF9.90801@oracle.com> <55411EBC.8050008@oracle.com>
Message-ID: <5541202C.40302@oracle.com>

Regarding compilation times, the fix helps only if 
LiveNodeCountInliningCutoff is reached during compilation.

Best regards,
Vladimir Ivanov

On 4/29/15 9:11 PM, Vladimir Ivanov wrote:
> Sigh, I missed an important case.
>
> Incremental inlining fails miserably when a call site dies while waiting
> to be inlined. It happens when a previously inlined call causes some
> branches to be eliminated, but the info hasn't been propagated yet.
>
> I tried to enhance dead code detection logic, but failed. So, I reverted
> the following part of original fix:
>   "(1) Reduce PhaseRemoveUseless frequency: inline in larger chunks
> until IR size LiveNodeCountInliningCutoff, then eliminate dead nodes."
>
> Updated webrev:
> http://cr.openjdk.java.net/~vlivanov/8059241/webrev.01
>
> Best regards,
> Vladimir Ivanov
>
> On 4/27/15 11:14 AM, Vladimir Ivanov wrote:
>> Thanks for reviews, John, Roland, and Aleksey.
>>
>> Best regards,
>> Vladimir Ivanov
>>
>> On 4/27/15 10:39 AM, Roland Westrelin wrote:
>>>> http://cr.openjdk.java.net/~vlivanov/8059241/webrev.00
>>>
>>> That looks good to me.
>>>
>>> Roland.
>>>

From bertrand.delsart at oracle.com  Wed Apr 29 18:17:58 2015
From: bertrand.delsart at oracle.com (Bertrand Delsart)
Date: Wed, 29 Apr 2015 20:17:58 +0200
Subject: RFR (M): 8064458 OopMap class could be more compact
In-Reply-To: <20150429145309.GE31204@rbackman>
References: <20150428100337.GB31204@rbackman> <553F7B95.7070103@oracle.com>
	<20150429145309.GE31204@rbackman>
Message-ID: <55412056.2060908@oracle.com>

On 29/04/2015 16:53, Rickard Bäckman wrote:
> On 04/28, Bertrand Delsart wrote:
>> Hi,
>>
>> First, thanks for the change. The additional benefit is that an
>> ImmutableOopMapSet no longer contains any absolute references.
>>
>> A few comments.
>>
>> The ImmutableOopMapSet.java seems to be missing in the webrev.
>
> Added.

Thanks.

I suppose that ImmutableOopMap.java and ImmutableOopMapSet.java are 
renamings of the old non Immutable code. Did you use 'hg move' ? This 
may help get cleaner history and webrevs, showing what you modified in 
these files.

Note also the missing copyright header in ImmutableOopMapPair.java

Regards,

Bertrand.

>
>>
>> There also seem to be a few issues with ImmutableOopMap::print_on
>> and ImmutableOopMapSet::print_on:
>>
>> - I did not spot the closing "}" corresponding to
>>    "ImmutableOopMap{"
>
> Fixed.
>
>>
>> - the "map != last" part does not look complete. I assume that you
>> are trying to dump only once the OopMap when it is shared by
>> successive pcs but you are then missing a "last = map;" line
>> somewhere in the if statement.
>
> Thanks, missed that. Fixed.
>
>>
>> - as a minor point, I'd rather print "offs:" or "pc offsets:"
>> instead of "pcs:" because OopMapSet use pc offsets, not absolute
>> pcs. [ Your CR might also be the right time to replace "pc" by
>> "pc_offset" for a few field or variable names since this is a bit
>> confusing. ]
>
> Changed to "pc offsets:". Renamed some of the new fields as well. Didn't
> want to mess with the old ones.
>
> Updated webrev: http://cr.openjdk.java.net/~rbackman/8064458.2/
>
>
> Thanks
> /R
>
>>
>> Regards,
>>
>> Bertrand.
>>
>> On 28/04/2015 12:03, Rickard Bäckman wrote:
>>> Hi all,
>>>
>>> can I please have reviews for this change:
>>>
>>> RFR: http://cr.openjdk.java.net/~rbackman/8064458.1/
>>> RFE: http://bugs.openjdk.java.net/browse/JDK-8064458
>>>
>>> While looking at OopMaps a while ago I noticed that there were a couple
>>> of different fields that were unused after the OopMaps were finalised.
>>>
>>> I took some time to investigate and rearrange the OopMaps. Since I
>>> didn't want to change how the OopMaps are built I introduced new data
>>> structures. ImmutableOopMapSet and ImmutableOopMap. The original OopMap
>>> structures are used to build up the OopMaps and when finalised they are
>>> copied into the Immutable variants.
>>>
>>> The ImmutableOopMapSet contains a few fields [size, count] and then a
>>> list of [pc, offset]. The offset points to the offset after the list
>>> where the ImmutableOopMap is placed. By moving pc out from OopMap to be
>>> part of the list we can now have multiple pcs with identical OopMaps
>>> point to the same data.
>>>
>>> We only keep 1 empty OopMap, and the other compaction that is done in
>>> this change is to check if the OopMap is identical to the previous one
>>> and then reuse that one. So no complete uniqueness check.
>>>
>>> I ran a couple of small benchmarks and printed the size of the old
>>> OopMaps vs the new. The new layout uses about 20 - 25% of the space on
>>> the benchmarks I've run.
>>>
>>> Tested by running through JPRT, running BigApps and NSK.quick.testlist
>>>
>>> Thanks
>>> /R
>>>
>>
>>
>> --
>> Bertrand Delsart,                     Grenoble Engineering Center
>> Oracle,         180 av. de l'Europe,          ZIRST de Montbonnot
>> 38330 Montbonnot Saint Martin,                             FRANCE
>> bertrand.delsart at oracle.com             Phone : +33 4 76 18 81 23
>>
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> NOTICE: This email message is for the sole use of the intended
>> recipient(s) and may contain confidential and privileged
>> information. Any unauthorized review, use, disclosure or
>> distribution is prohibited. If you are not the intended recipient,
>> please contact the sender by reply email and destroy all copies of
>> the original message.
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> /R
>
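
The compaction scheme described in the quoted RFR (one shared empty map, plus reuse of a map that is identical to the immediately preceding one) can be sketched as follows. This is a rough model under stated assumptions, not the ImmutableOopMapSet implementation: maps are modelled as int arrays, the [pc, offset] entries store a list index instead of a byte offset, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: a simplified model of the described layout.
final class OopMapCompactionSketch {
    // One [pc offset, map index] entry; several entries may share a map index.
    static final class Pair {
        final int pcOffset;
        final int mapIndex;

        Pair(int pcOffset, int mapIndex) {
            this.pcOffset = pcOffset;
            this.mapIndex = mapIndex;
        }
    }

    final List<Pair> pairs = new ArrayList<>();
    final List<int[]> maps = new ArrayList<>(); // stored map payloads
    private int emptyIndex = -1;                // index of the single empty map
    private int lastIndex = -1;                 // map used by the previous entry

    void add(int pcOffset, int[] map) {
        int index;
        if (map.length == 0) {
            // Keep only one empty map and point every empty entry at it.
            if (emptyIndex < 0) {
                emptyIndex = store(map);
            }
            index = emptyIndex;
        } else if (lastIndex >= 0 && Arrays.equals(maps.get(lastIndex), map)) {
            // Identical to the previous map: reuse it.
            index = lastIndex;
        } else {
            // No complete uniqueness check: older duplicates are not searched.
            index = store(map);
        }
        lastIndex = index;
        pairs.add(new Pair(pcOffset, index));
    }

    private int store(int[] map) {
        maps.add(map.clone());
        return maps.size() - 1;
    }
}
```

Comparing only against the previous map keeps the build step linear in the number of entries, at the cost of storing a duplicate when an identical map reappears after a different one.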


-- 
Bertrand Delsart,                     Grenoble Engineering Center
Oracle,         180 av. de l'Europe,          ZIRST de Montbonnot
38330 Montbonnot Saint Martin,                             FRANCE
bertrand.delsart at oracle.com             Phone : +33 4 76 18 81 23


From john.r.rose at oracle.com  Wed Apr 29 18:47:02 2015
From: john.r.rose at oracle.com (John Rose)
Date: Wed, 29 Apr 2015 11:47:02 -0700
Subject: [9] RFR (S): 8059241: Incremental inlining is too hot when
	compiling Nashorn/Octane
In-Reply-To: <55411EBC.8050008@oracle.com>
References: <553AD508.8010609@oracle.com>
	<8A6D4EA1-6527-41D9-9DBB-C707C84FECCB@oracle.com>
	<553DEFF9.90801@oracle.com> <55411EBC.8050008@oracle.com>
Message-ID: <8BB844D4-742E-4CC8-A709-60524DAB0FB6@oracle.com>

On Apr 29, 2015, at 11:11 AM, Vladimir Ivanov <vladimir.x.ivanov at oracle.com> wrote:
> 
> Incremental inlining fails miserably when a call site dies while waiting to be inlined. It happens when a previously inlined call causes some branches to be eliminated, but the info hasn't been propagated yet.
> 
> I tried to enhance dead code detection logic, but failed. So, I reverted the following part of original fix:
> "(1) Reduce PhaseRemoveUseless frequency: inline in larger chunks until IR size LiveNodeCountInliningCutoff, then eliminate dead nodes."

Oh, that's sad and annoying.

I have a couple of comments.

Since you are cleaning up the contract for inline_incrementally_one (a good idea), remove the cross-call role of the global inlining_progress flag, by having inline_incrementally_one return a boolean sample of inlining_progress, meaning "there is more work to be done".  Then the only reader of that flag will be inline_incrementally_one itself, and the only asserter (writer to true) of the flag will be the call generator.

Some complex optimizations can become buggy when the graph changes due to dead path elimination.  One way we can defend against this is to delay the removal of dead paths to a cleanup phase.  There are several ways to do this trick.  To maintain certain connectivity properties, we introduce not-really-conditional NeverBranch nodes, which are rewritten extremely late.  A "washable" version of NeverBranch nodes (which "washes out" sooner than the current ones) could perhaps be used to delay path-cutting.  More commonly, in loop opts, we use various kinds of opaque nodes to block constant folding when it would be inconvenient.  Perhaps during incremental inlining we could block path-cutting by introducing some kind of opaque data or control node, which would wash out in the PhaseIterGVN that you originally intended to run after several rounds of incremental inlining.

I'm not sure if this would be profitable in this case, but the idea of delaying path-cutting has been useful in the past.

-- John


From boris.molodenkov at oracle.com  Wed Apr 29 19:24:36 2015
From: boris.molodenkov at oracle.com (Boris Molodenkov)
Date: Wed, 29 Apr 2015 22:24:36 +0300
Subject: [8u60] Request for approval: backport of JDK-8068272
In-Reply-To: <55410B55.4000708@oracle.com>
References: <5540FBA9.6060606@oracle.com> <55410B55.4000708@oracle.com>
Message-ID: <55412FF4.8020305@oracle.com>

On 04/29/2015 07:48 PM, Vladimir Kozlov wrote:
> Looks fine. But don't forget to push the jdk changes - you did not point to a
> webrev for them. Does jdk8u need it?
Whitebox.java was not moved to root repository for jdk8.
So all changes are located in hotspot.

Thank you for review!
Boris
>
> Thanks,
> Vladimir
>
> On 4/29/15 8:41 AM, Boris Molodenkov wrote:
>> Hi All,
>>
>> I would like to backport fix for JDK-8068272 to 8u60.
>>
>> Bug id: https://bugs.openjdk.java.net/browse/JDK-8068272
>> Webrev: http://cr.openjdk.java.net/~bmoloden/8068272/webrev.00/
>> Changesets:
>> http://hg.openjdk.java.net/jdk9/jdk9/rev/a09f9fd80f87
>> http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/2025390834c6
>> Review thread for original fix:
>> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2014-December/016751.html 
>>
>>
>> testing: jprt
>>
>> Thanks,
>> Boris
>>


From boris.molodenkov at oracle.com  Wed Apr 29 19:25:48 2015
From: boris.molodenkov at oracle.com (Boris Molodenkov)
Date: Wed, 29 Apr 2015 22:25:48 +0300
Subject: [8u60] Request for approval: backport of JDK-8050486
In-Reply-To: <55410B7A.4020308@oracle.com>
References: <553F8BE9.7090004@oracle.com> <553FD69E.1040202@oracle.com>
	<5540FC4E.4000707@oracle.com> <55410B7A.4020308@oracle.com>
Message-ID: <5541303C.1030603@oracle.com>

Thank you for review!
Boris

On 04/29/2015 07:48 PM, Vladimir Kozlov wrote:
> Okay. Thank you for explaining the difference.
>
> Vladimir
>
> On 4/29/15 8:44 AM, Boris Molodenkov wrote:
>> On 04/28/2015 09:51 PM, Vladimir Kozlov wrote:
>>> Good. Were there other changes in RTMTestBase.java which were not 
>>> backported? Changes are different in that file from
>>> jdk9 (but the resulting flags are the same).
>> There are no other changes in RTMTestBase.java. Changes are 
>> different because initial versions of RTMTestBase.java in
>> jdk9 and jdk8 slightly differ.
>>
>> Thanks,
>> Boris
>>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 4/28/15 6:32 AM, Boris Molodenkov wrote:
>>>> Hi All,
>>>>
>>>> I would like to backport fix for JDK-8050486 to 8u60.
>>>>
>>>> Bug id: https://bugs.openjdk.java.net/browse/JDK-8050486
>>>> Webrev: http://cr.openjdk.java.net/~bmoloden/8050486/webrev.00/
>>>> Changeset: 
>>>> http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/2f8520599d39
>>>> Review thread for original fix:
>>>> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2014-December/016754.html 
>>>>
>>>>
>>>> testing: manual with jtreg
>>>>
>>>> Thanks,
>>>> Boris
>>>>
>>


From boris.molodenkov at oracle.com  Wed Apr 29 19:26:25 2015
From: boris.molodenkov at oracle.com (Boris Molodenkov)
Date: Wed, 29 Apr 2015 22:26:25 +0300
Subject: [8u60] Request for approval: backport of JDK-8058846
In-Reply-To: <553FD567.7020009@oracle.com>
References: <553F8BB0.7050302@oracle.com> <553FD567.7020009@oracle.com>
Message-ID: <55413061.2060104@oracle.com>

Thank you for review!
Boris

On 04/28/2015 09:45 PM, Vladimir Kozlov wrote:
> Looks good.
>
> Thanks,
> Vladimir
>
> On 4/28/15 6:31 AM, Boris Molodenkov wrote:
>> Hi All,
>>
>> I would like to backport fix for JDK-8058846 to 8u60.
>>
>> Bug id: https://bugs.openjdk.java.net/browse/JDK-8058846
>> Webrev: http://cr.openjdk.java.net/~bmoloden/8058846/webrev.00/
>> Changeset: http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/4d1463933e28
>> Review thread for original fix:
>> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2014-November/016382.html 
>>
>>
>> testing: manual with jtreg
>>
>> Thanks,
>> Boris
>>


From vladimir.kozlov at oracle.com  Wed Apr 29 19:28:08 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 29 Apr 2015 12:28:08 -0700
Subject: RFR(xs): 8078666: JVM fastdebug build compiled with GCC 5 asserts
	with "widen increases"
In-Reply-To: <1430294203.3356.9.camel@redhat.com>
References: <1430144315.3349.17.camel@redhat.com>
	<1430294203.3356.9.camel@redhat.com>
Message-ID: <554130C8.8090209@oracle.com>

Looks reasonable. I will test it and push to hs-comp/hotspot.

Thanks,
Vladimir

On 4/29/15 12:56 AM, Severin Gehwolf wrote:
> Hi,
>
> Adding hotspot-dev for wider audience. IMHO hotspot should not rely on
> undefined behaviour (overflow on signed int/long is undefined) and this
> should get fixed.
>
> --Severin
>
> On Mon, 2015-04-27 at 16:18 +0200, Severin Gehwolf wrote:
>> Hi,
>>
>> Could somebody please review and sponsor the following patch?
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8078666
>> Webrev: http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8078666/webrev.02/
>>
>> We've discovered this issue in Fedora where we were seeing a strange
>> memory leak issue of an OpenJDK build with GCC 5. More info in the bug.
>>
>> As it turns out, current hotspot relies on undefined behaviour in
>> normalize_int_widen()/normalize_long_widen() where an integer overflow
>> can occur on some inputs.
>>
>> The fix is to do the math on the unsigned type where overflows are well
>> defined.
>>
>> Thanks,
>> Severin
>>
>>
>>
>
>
>

From vladimir.kozlov at oracle.com  Wed Apr 29 21:11:26 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 29 Apr 2015 14:11:26 -0700
Subject: RFR 8076276 support for AVX512
In-Reply-To: <553946F5.2090009@oracle.com>
References: <C568518E7B433348B114B6A7122D474755DCE552@FMSMSX102.amr.corp.intel.com>	<55258337.2050605@oracle.com>	<C568518E7B433348B114B6A7122D474755DCEBDA@FMSMSX102.amr.corp.intel.com>	<55259078.1080309@oracle.com>	<C568518E7B433348B114B6A7122D474755DCED7C@FMSMSX102.amr.corp.intel.com>	<55271100.8080203@oracle.com>
	<553946F5.2090009@oracle.com>
Message-ID: <554148FE.2010007@oracle.com>

For the record, I reviewed it and I think it is good.

Thanks,
Vladimir

On 4/23/15 12:24 PM, Vladimir Kozlov wrote:
> Updated webrev:
>
> http://cr.openjdk.java.net/~kvn/8076276/webrev.02
>
> Passed JPRT testing.
>
> Changes:
>
> The assembler layer now handles KNL as well for EVEX, it's a target that
> will be available earlier than Skylake server.   This is done by
> carefully managing cpuid information and applying each machines
> characteristics to their code generation model.  I also added support
> for 32-bit compilation via the machine description which manage many of
> the same things in 64-bit with some additions for instruction size
> calculations, such as a static function which answers the question of
> displacement size for memory offsets.  You will see two versions, one
> which modifies the offset and answers the question of size range, another
> which statically takes all the equivalent object data as its dynamic
> counterpart as input to interpret if the displacement fits the motif.
> One is made to be run statically and one as part of assembler processing
> in its allocated object dynamically.  There is also a dummy region in
> 32-bit register description of floating point registers which are used
> to stage regmask alignment for the xmm register bank on that target.  I
> do this so that I can use the same code for both compiler models wrt
> register mask handling of vector components.  Please also note the new
> long java tests in superword.  The aforementioned zmm save region for
> OS vector testing was ported to run in KNL mode.  The call save regions
> have been extended for both compilation models to handle their
> respective register banks and are working correctly.
>
> Thanks,
> Michael
>
> On 4/9/15 4:53 PM, Vladimir Kozlov wrote:
>> Michael,
>>
>> Thank you for the detailed explanation. I need to clarify my request:
>>
>> 1. I am fine with kmov and KRegister definitions and usage in assembler,
>> macroassembler and stubs.
>>
>> 2. I don't want KRegister and Kmove in C2 code (opto/ and .ad files)
>> until we have full support for them in RA and signal processing.
>>
>> Thanks,
>> Vladimir
>>
>> On 4/9/15 4:02 PM, Berg, Michael C wrote:
>>> Vladimir, some explanation of the EVEX encoding model is needed:
>>>
>>> Some instructions are agnostic to vector length and can take the
>>> implicit k0 definition in encoding.  Some instructions must have
>>> predication definitions for their mask application to SIMD, which
>>> explicitly exclude k0. The range usage of predication mask registers
>>> must be k1..k7 as a real definition which code must provide with a
>>> mask value.  The EVEX enabled machine environment does not
>>> automatically initialize any of the mask assignable registers
>>> (k1..k7), so we must emit kmov instructions which gather an immediate
>>> value from a gpr register.  You will see code such as this in the
>>> review.  This effectively means KRegister must stay in the
>>> implementation, but I can accommodate the lion share of what you have
>>> indicated.  The places where KRegister is used via the assembler layer
>>> are:
>>>
>>> src/cpu/x86/vm/stubGenerator_x86_64.cpp: 265,
>>> src/cpu/x86/vm/stubGenerator_x86_32.cpp: 169 "not there yet, but it
>>> needs one too"
>>> src/cpu/x86/vm/macroAssembler_x86.cpp: 4550, 7046
>>>
>>> This is in place of formal register allocation for now as well as when
>>> we do more extravagant things with SIMD masks.  I will keep the webrev
>>> around so I can easily add these pieces back in as we are going to
>>> need them.
>>> Also there are many other mask register instructions in the ISA which
>>> we will need to make use of in the future.  If this is amenable I will
>>> look into the other changes and resend the webrev accordingly modified.
>>>
>>> Thanks,
>>> Michael
>>>
>>>
>>> -----Original Message-----
>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>> Sent: Wednesday, April 08, 2015 1:33 PM
>>> To: Berg, Michael C
>>> Cc: hotspot-compiler-dev at openjdk.java.net
>>> Subject: Re: RFR 8076276 support for AVX512
>>>
>>> Michael, please, make sure to include mailing lists in replies - it is
>>> review process.
>>>
>>> I understand that K register may be important but I don't see the need
>>> to include it in these changes which are huge already. We can do it as
>>> separate changes unless you point me where they are critically needed
>>> for avx512 instructions.
>>> I don't see the use of it in current changes which simply widen
>>> vectors to 512 bits.
>>>
>>> I am concerned that K reg implementation is incomplete but it is hard to
>>> see and review it in current changes.
>>>
>>> Regards,
>>> Vladimir
>>>
>>> On 4/8/15 1:09 PM, Berg, Michael C wrote:
>>>> Vladimir, RegK is needed as it frames the kmov instructions which
>>>> utilize KRegister and the enumerated k registers, which are critically
>>>> needed and used, although not yet matched (we use k1 and k0 now).  I
>>>> will look into the rest of the comments.  The plan is to register
>>>> allocate the k registers at some point though.
>>>>
>>>> Thanks,
>>>> Michael
>>>>
>>>> -----Original Message-----
>>>> From: hotspot-compiler-dev
>>>> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of
>>>> Vladimir Kozlov
>>>> Sent: Wednesday, April 08, 2015 12:36 PM
>>>> To: hotspot-compiler-dev at openjdk.java.net
>>>> Subject: Re: RFR 8076276 support for AVX512
>>>>
>>>> I would suggest to remove MoveK and RegK from these changes since
>>>> they are not used.
>>>> We can add them later when you have the use case.
>>>>
>>>> sharedRuntime_x86_64.* You should have code and not comment:
>>>> // TODO: add ZMM save code
>>>>
>>>> vm_version_x86.cpp Add code to verify that the system preserves Z
>>>> registers during an interrupt. See the code after the comment:
>>>>
>>>> // Some OSs have a bug when upper 128bits of YMM
>>>>
>>>>
>>>> I see the following pattern repeated in C1 code. It should be moved to a
>>>> function in FrameMap:
>>>>
>>>> +        int num_caller_save_xmm_regs =
>>>> +FrameMap::nof_caller_save_xmm_regs;
>>>> +#if _LP64
>>>> +        if (UseAVX < 3) {
>>>> +          num_caller_save_xmm_regs = num_caller_save_xmm_regs / 2;
>>>> +        }
>>>> +#endif
>>>>
>>>>
>>>> In general we should avoid using #ifdef X86 in shared code:
>>>> matcher.cpp. This file will not be an issue if you remove RegK from
>>>> the changes.
>>>>
>>>> c2compiler.cpp - can you move that code to
>>>> Compile::pd_compiler2_init() which is platform specific?
>>>>
>>>> matcher.cpp - typo 'eno':
>>>>
>>>> +    // For VecZ we need eno alignment and 64 bytes (16 slots) for
>>>> spills.
>>>>
>>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>>
>>>> On 4/6/15 6:35 PM, Berg, Michael C wrote:
>>>>> Hi Folks,
>>>>>
>>>>> We (Intel) would like to contribute initial support for AVX512 (EVEX
>>>>> encoding, new register support, new ISA support,
>>>>> etc) for EVEX enabled microarchitectures.
>>>>> The contribution is referenced as Bug ID 8076276 as a performance
>>>>> enhancement.
>>>>>
>>>>> Please review this patch and comment as needed:
>>>>>
>>>>> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8076276
>>>>>
>>>>> webrev:
>>>>> http://cr.openjdk.java.net/~kvn/8076276/webrev
>>>>>
>>>>> Superword optimizations covered on the vectorization path experience
>>>>> as much as 50% reduction in loop trace instruction count which make
>>>>> up the path length of EVEX encoded SIMD optimized loops.
>>>>>
>>>>> Vladimir Kozlov has offered to sponsor this patch.
>>>>>

From michael.c.berg at intel.com  Wed Apr 29 21:38:53 2015
From: michael.c.berg at intel.com (Berg, Michael C)
Date: Wed, 29 Apr 2015 21:38:53 +0000
Subject: RFR 8078563 - add profitability tests for reductions
In-Reply-To: <05BF44DD-26A9-4C22-8A0D-ABCEF31FCC86@oracle.com>
References: <C568518E7B433348B114B6A7122D474755DDE7A4@FMSMSX102.amr.corp.intel.com>
	<BA406C4C-87EA-43D2-8948-53F3F6045FD7@oracle.com>
	<C568518E7B433348B114B6A7122D474755DDF1A4@FMSMSX102.amr.corp.intel.com>
	<05BF44DD-26A9-4C22-8A0D-ABCEF31FCC86@oracle.com>
Message-ID: <C568518E7B433348B114B6A7122D474755DDF327@FMSMSX102.amr.corp.intel.com>

Vladimir, does this patch also look ok to you?
I believe you have reviewed the changes.

Thanks,
Michael

From: Christian Thalinger [mailto:christian.thalinger at oracle.com]
Sent: Wednesday, April 29, 2015 10:27 AM
To: Berg, Michael C
Cc: hotspot-compiler-dev at openjdk.java.net
Subject: Re: RFR 8078563 - add profitability tests for reductions

I think this looks good but as I said I'm not an expert in this area.  It would be good to have an additional reviewer.

On Apr 29, 2015, at 10:23 AM, Berg, Michael C <michael.c.berg at intel.com<mailto:michael.c.berg at intel.com>> wrote:

Christian, do you have any additional comments or does the code look ok?

Thanks,
Michael

From: Christian Thalinger [mailto:christian.thalinger at oracle.com]
Sent: Monday, April 27, 2015 9:49 AM
To: Berg, Michael C
Cc: hotspot-compiler-dev at openjdk.java.net<mailto:hotspot-compiler-dev at openjdk.java.net>
Subject: Re: RFR 8078563 - add profitability tests for reductions


+       // Length 2 reductions of INT/LONG do not offer performance benefits

+       if (((arith_type->basic_type() == T_INT) || (arith_type->basic_type() == T_LONG)) && (size == 2)) {

I don't know that code very well but can there be reductions with size == 1?

On Apr 23, 2015, at 5:53 PM, Berg, Michael C <michael.c.berg at intel.com<mailto:michael.c.berg at intel.com>> wrote:

Hi Folks,

We (Intel) would like to add profitability tests to superword to gate scenarios where reduction optimization overhead is roughly equal to the benefit gained by vectorization.
We would like to do this for all x86 enabled microarchitectures that support reductions and superword.  This new constraint was tested on SSE and AVX (1,2) enabled platforms.
The contribution as referenced by RFR 8078563 is defined by the information at the links below.

Please review this bug entry and its code and comment as needed:

https://bugs.openjdk.java.net/browse/JDK-8078563

And its code and test addition (this is a small patch):

http://cr.openjdk.java.net/~kvn/8078563/webrev/


Vladimir Kozlov has offered to sponsor this patch.

Thanks,
Michael
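
For reference, the gate quoted in the review above can be modelled in a few lines. This is only a sketch in Java of the one condition visible in the quoted diff (length 2 INT/LONG reductions are rejected); the class, enum and method names are hypothetical, and the actual patch in superword may check more than this.

```java
// Hypothetical model of the profitability gate quoted in the review.
// Only the condition shown in the quoted diff is represented here.
class ReductionProfitabilitySketch {
    // Stand-in for HotSpot's BasicType; only the types needed for the sketch.
    enum BasicType { T_INT, T_LONG, T_FLOAT, T_DOUBLE }

    // Returns false for the case the patch comment calls unprofitable.
    static boolean isProfitable(BasicType type, int size) {
        boolean integral = (type == BasicType.T_INT) || (type == BasicType.T_LONG);
        // Length 2 reductions of INT/LONG do not offer performance benefits
        return !(integral && size == 2);
    }
}
```

Under this model a length 2 float reduction or a length 4 int reduction would still be vectorized; the sketch says nothing about size == 1.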



From michael.c.berg at intel.com  Wed Apr 29 21:39:58 2015
From: michael.c.berg at intel.com (Berg, Michael C)
Date: Wed, 29 Apr 2015 21:39:58 +0000
Subject: RFR 8076276 support for AVX512
In-Reply-To: <554148FE.2010007@oracle.com>
References: <C568518E7B433348B114B6A7122D474755DCE552@FMSMSX102.amr.corp.intel.com>
	<55258337.2050605@oracle.com>
	<C568518E7B433348B114B6A7122D474755DCEBDA@FMSMSX102.amr.corp.intel.com>
	<55259078.1080309@oracle.com>
	<C568518E7B433348B114B6A7122D474755DCED7C@FMSMSX102.amr.corp.intel.com>
	<55271100.8080203@oracle.com> <553946F5.2090009@oracle.com>
	<554148FE.2010007@oracle.com>
Message-ID: <C568518E7B433348B114B6A7122D474755DDF334@FMSMSX102.amr.corp.intel.com>

Thanks Vladimir for the review and for sponsoring this set of changes.  
Can a second person please take a look at this patch and comment as needed?

Thanks in advance,
Michael

-----Original Message-----
From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] 
Sent: Wednesday, April 29, 2015 2:11 PM
To: hotspot-compiler-dev at openjdk.java.net
Cc: Berg, Michael C
Subject: Re: RFR 8076276 support for AVX512

For the record, I reviewed it and I think it is good.

Thanks,
Vladimir

On 4/23/15 12:24 PM, Vladimir Kozlov wrote:
> Updated webrev:
>
> http://cr.openjdk.java.net/~kvn/8076276/webrev.02
>
> Passed JPRT testing.
>
> Changes:
>
> The assembler layer now handles KNL as well for EVEX; it is a target that
> will be available earlier than Skylake server.  This is done by
> carefully managing cpuid information and applying each machine's
> characteristics to its code generation model.  I also added support
> for 32-bit compilation via the machine description, which manages many
> of the same things as in 64-bit, with some additions for instruction size
> calculations, such as a static function which answers the question of
> displacement size for memory offsets.  You will see two versions: one
> which modifies the offset and answers the question of size range, and
> another which statically takes as input all the object data its
> dynamic counterpart uses, to decide if the displacement fits the motif.
> One is made to be run statically and one as part of assembler
> processing in its allocated object dynamically.  There is also a dummy
> region in the 32-bit register description of floating point registers
> which is used to stage regmask alignment for the xmm register bank on
> that target.  I do this so that I can use the same code for both
> compiler models wrt register mask handling of vector components.
> Please also note the new long java tests in superword.  The
> aforementioned zmm save region for OS vector testing was ported to run in
> KNL mode.  The call save regions have been extended for both
> compilation models to handle their respective register banks and are
> working correctly.
>
> Thanks,
> Michael
>
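
On the displacement-size question mentioned in the update above: EVEX encodes a short displacement as a signed 8-bit value scaled by a factor N (disp8*N), where N depends on the instruction's tuple type and vector length. The Java sketch below only illustrates that size check. The class and method names are hypothetical and are not the functions in the webrev, N is passed in directly instead of being derived from the instruction, and the zero-displacement case is ignored.

```java
// Hypothetical illustration of the EVEX disp8*N size question.
// n is the scaling factor N for the instruction (an input here, not derived).
class EvexDisplacementSketch {
    // True if disp is a multiple of n and disp / n fits in a signed byte.
    static boolean fitsCompressedDisp8(int disp, int n) {
        if (n <= 0 || disp % n != 0) {
            return false;
        }
        int scaled = disp / n;
        return scaled >= -128 && scaled <= 127;
    }

    // Number of displacement bytes the encoding would need: 1 or 4.
    static int displacementBytes(int disp, int n) {
        return fitsCompressedDisp8(disp, n) ? 1 : 4;
    }
}
```

With N = 64 (a full 512-bit access, as an assumed example) an offset of 128 would need one displacement byte, while an offset of 100 would fall back to a 32-bit displacement because it is not a multiple of 64.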
> On 4/9/15 4:53 PM, Vladimir Kozlov wrote:
>> Michael,
>>
>> Thank you for the detailed explanation. I need to clarify my request:
>>
>> 1. I am fine with kmov and KRegister definitions and usage in
>> assembler, macroassembler and stubs.
>>
>> 2. I don't want KRegister and Kmove in C2 code (opto/ and .ad files) 
>> until we have full support for them in RA and signal processing.
>>
>> Thanks,
>> Vladimir
>>
>> On 4/9/15 4:02 PM, Berg, Michael C wrote:
>>> Vladimir, some explanation of the EVEX encoding model is needed:
>>>
>>> Some instructions are agnostic to vector length and can take the 
>>> implicit k0 definition in encoding.  Some instructions must have 
>>> predication definitions for their mask application to SIMD, which 
>>> explicitly exclude k0. The range usage of predication mask registers 
>>> must be k1..k7 as a real definition which code must provide with a 
>>> mask value.  The EVEX enabled machine environment does not 
>>> automatically initialize any of the mask assignable registers 
>>> (k1..k7), so we must emit kmov instructions which gather an 
>>> immediate value from a gpr register.  You will see code such as this 
>>> in the review.  This effectively means KRegister must stay in the 
>>> implementation, but I can accommodate the lion's share of what you
>>> have indicated.  The places where KRegister is used via the 
>>> assembler layer
>>> are:
>>>
>>> src/cpu/x86/vm/stubGenerator_x86_64.cpp: 265,
>>> src/cpu/x86/vm/stubGenerator_x86_32.cpp: 169 "not there yet, but it 
>>> needs one too"
>>> src/cpu/x86/vm/macroAssembler_x86.cpp: 4550, 7046
>>>
>>> This is in place of formal register allocation for now as well as 
>>> when we do more extravagant things with SIMD masks.  I will keep the 
>>> webrev around so I can easily add these pieces back in as we are 
>>> going to need them.
>>> Also there are many other mask register instructions in the ISA 
>>> which we will need to make use of in the future.  If this is 
>>> amenable I will look into the other changes and resend the webrev accordingly modified.
>>>
>>> Thanks,
>>> Michael
>>>
>>>
>>> -----Original Message-----
>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>> Sent: Wednesday, April 08, 2015 1:33 PM
>>> To: Berg, Michael C
>>> Cc: hotspot-compiler-dev at openjdk.java.net
>>> Subject: Re: RFR 8076276 support for AVX512
>>>
>>> Michael, please make sure to include mailing lists in replies - it
>>> is a review process.
>>>
>>> I understand that the K register may be important but I don't see the
>>> need to include it in these changes, which are huge already. We can
>>> do it as separate changes unless you point me to where they are
>>> critically needed for avx512 instructions.
>>> I don't see the use of it in the current changes, which simply widen
>>> vectors to 512 bits.
>>>
>>> I am concerned that the K reg implementation is incomplete but it is hard
>>> to see and review it in the current changes.
>>>
>>> Regards,
>>> Vladimir
>>>
>>> On 4/8/15 1:09 PM, Berg, Michael C wrote:
>>>> Vladimir, RegK is needed as it frames the kmov instructions which 
>>>> utilize KRegister and the enumerated k registers, which are 
>>>> critically needed and used, although not yet matched (we use k1 and 
>>>> k0 now).  I will look into the rest of the comments.  The plan
>>>> is to register allocate the k registers at some point though.
>>>>
>>>> Thanks,
>>>> Michael
>>>>
>>>> -----Original Message-----
>>>> From: hotspot-compiler-dev
>>>> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of 
>>>> Vladimir Kozlov
>>>> Sent: Wednesday, April 08, 2015 12:36 PM
>>>> To: hotspot-compiler-dev at openjdk.java.net
>>>> Subject: Re: RFR 8076276 support for AVX512
>>>>
>>>> I would suggest to remove MoveK and RegK from these changes since 
>>>> they are not used.
>>>> We can add them later when you have the use case.
>>>>
>>>> sharedRuntime_x86_64.* You should have code and not comment:
>>>> // TODO: add ZMM save code
>>>>
>>>> vm_version_x86.cpp Add code to verify that the system preserves Z
>>>> registers during an interrupt. See the code after the comment:
>>>>
>>>> // Some OSs have a bug when upper 128bits of YMM
>>>>
>>>>
>>>> I see the following pattern repeated in C1 code. It should be moved to a
>>>> function in FrameMap:
>>>>
>>>> +        int num_caller_save_xmm_regs = 
>>>> +FrameMap::nof_caller_save_xmm_regs;
>>>> +#if _LP64
>>>> +        if (UseAVX < 3) {
>>>> +          num_caller_save_xmm_regs = num_caller_save_xmm_regs / 2;
>>>> +        }
>>>> +#endif
>>>>
>>>>
>>>> In general we should avoid using #ifdef X86 in shared code:
>>>> matcher.cpp. This file will not be an issue if you remove RegK from
>>>> the changes.
>>>>
>>>> c2compiler.cpp - can you move that code to
>>>> Compile::pd_compiler2_init() which is platform specific?
>>>>
>>>> matcher.cpp - typo 'eno':
>>>>
>>>> +    // For VecZ we need eno alignment and 64 bytes (16 slots) for
>>>> spills.
>>>>
>>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>>
>>>> On 4/6/15 6:35 PM, Berg, Michael C wrote:
>>>>> Hi Folks,
>>>>>
>>>>> We (Intel) would like to contribute initial support for AVX512 
>>>>> (EVEX encoding, new register support, new ISA support,
>>>>> etc) for EVEX enabled microarchitectures.
>>>>> The contribution is referenced as Bug ID 8076276 as a performance 
>>>>> enhancement.
>>>>>
>>>>> Please review this patch and comment as needed:
>>>>>
>>>>> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8076276
>>>>>
>>>>> webrev:
>>>>> http://cr.openjdk.java.net/~kvn/8076276/webrev
>>>>>
>>>>> Superword optimizations covered on the vectorization path 
>>>>> experience as much as 50% reduction in loop trace instruction 
>>>>> count which make up the path length of EVEX encoded SIMD optimized loops.
>>>>>
>>>>> Vladimir Kozlov has offered to sponsor this patch.
>>>>>

From vladimir.kozlov at oracle.com  Wed Apr 29 21:47:23 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 29 Apr 2015 14:47:23 -0700
Subject: RFR 8078563 - add profitability tests for reductions
In-Reply-To: <C568518E7B433348B114B6A7122D474755DDF327@FMSMSX102.amr.corp.intel.com>
References: <C568518E7B433348B114B6A7122D474755DDE7A4@FMSMSX102.amr.corp.intel.com>
	<BA406C4C-87EA-43D2-8948-53F3F6045FD7@oracle.com>
	<C568518E7B433348B114B6A7122D474755DDF1A4@FMSMSX102.amr.corp.intel.com>
	<05BF44DD-26A9-4C22-8A0D-ABCEF31FCC86@oracle.com>
	<C568518E7B433348B114B6A7122D474755DDF327@FMSMSX102.amr.corp.intel.com>
Message-ID: <5541516B.5080100@oracle.com>

I want to test it with my and your tests when I have time.

Thanks,
Vladimir

On 4/29/15 2:38 PM, Berg, Michael C wrote:
> Vladimir, does this patch also look ok to you?
>
> I believe you have reviewed the changes.
>
> Thanks,
>
> Michael
>
> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com]
> *Sent:* Wednesday, April 29, 2015 10:27 AM
> *To:* Berg, Michael C
> *Cc:* hotspot-compiler-dev at openjdk.java.net
> *Subject:* Re: RFR 8078563 - add profitability tests for reductions
>
> I think this looks good but as I said I'm not an expert in this area.
>   It would be good to have an additional reviewer.
>
>     On Apr 29, 2015, at 10:23 AM, Berg, Michael C
>     <michael.c.berg at intel.com <mailto:michael.c.berg at intel.com>> wrote:
>
>     Christian, do you have any additional comments or does the code look ok?
>
>     Thanks,
>
>     Michael
>
>     *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com]
>     *Sent:* Monday, April 27, 2015 9:49 AM
>     *To:* Berg, Michael C
>     *Cc:* hotspot-compiler-dev at openjdk.java.net
>     <mailto:hotspot-compiler-dev at openjdk.java.net>
>     *Subject:* Re: RFR 8078563 - add profitability tests for reductions
>
>     *+       // Length 2 reductions of INT/LONG do not offer performance benefits*
>
>     *+       if (((arith_type->basic_type() == T_INT) || (arith_type->basic_type() == T_LONG)) && (size == 2)) {*
>
>     I don't know that code very well but can there be reductions with
>     size == 1?
>
>         On Apr 23, 2015, at 5:53 PM, Berg, Michael C
>         <michael.c.berg at intel.com <mailto:michael.c.berg at intel.com>> wrote:
>
>         Hi Folks,
>
>         We (Intel) would like to add profitability tests to superword to
>         gate scenarios where reduction optimization overhead is roughly
>         equal to the benefit gained by vectorization.
>
>         We would like to do this for all x86 enabled microarchitectures
>         that support reductions and superword.  This new constraint was
>         tested on SSE and AVX (1,2) enabled platforms.
>         The contribution as referenced by RFR 8078563 is defined by the
>         information at the links below.
>
>         Please review this bug entry and its code and comment as needed:
>
>         https://bugs.openjdk.java.net/browse/JDK-8078563
>
>         And its code and test addition (this is a small patch):
>
>         http://cr.openjdk.java.net/~kvn/8078563/webrev/
>
>
>
>         Vladimir Kozlov has offered to sponsor this patch.
>
>         Thanks,
>
>         Michael
>

From vladimir.kozlov at oracle.com  Wed Apr 29 22:35:38 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 29 Apr 2015 15:35:38 -0700
Subject: RFR 8078563 - add profitability tests for reductions
In-Reply-To: <5541516B.5080100@oracle.com>
References: <C568518E7B433348B114B6A7122D474755DDE7A4@FMSMSX102.amr.corp.intel.com>	<BA406C4C-87EA-43D2-8948-53F3F6045FD7@oracle.com>	<C568518E7B433348B114B6A7122D474755DDF1A4@FMSMSX102.amr.corp.intel.com>	<05BF44DD-26A9-4C22-8A0D-ABCEF31FCC86@oracle.com>	<C568518E7B433348B114B6A7122D474755DDF327@FMSMSX102.amr.corp.intel.com>
	<5541516B.5080100@oracle.com>
Message-ID: <55415CBA.2050508@oracle.com>

Testing shows no problems so I will push it.

Thanks,
Vladimir

On 4/29/15 2:47 PM, Vladimir Kozlov wrote:
> I want to test it with my and your tests when I have time.
>
> Thanks,
> Vladimir
>
> On 4/29/15 2:38 PM, Berg, Michael C wrote:
>> Vladimir, does this patch also look ok to you?
>>
>> I believe you have reviewed the changes.
>>
>> Thanks,
>>
>> Michael
>>
>> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com]
>> *Sent:* Wednesday, April 29, 2015 10:27 AM
>> *To:* Berg, Michael C
>> *Cc:* hotspot-compiler-dev at openjdk.java.net
>> *Subject:* Re: RFR 8078563 - add profitability tests for reductions
>>
>> I think this looks good but as I said I'm not an expert in this area.
>>   It would be good to have an additional reviewer.
>>
>>     On Apr 29, 2015, at 10:23 AM, Berg, Michael C
>>     <michael.c.berg at intel.com <mailto:michael.c.berg at intel.com>> wrote:
>>
>>     Christian, do you have any additional comments or does the code
>> look ok?
>>
>>     Thanks,
>>
>>     Michael
>>
>>     *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com]
>>     *Sent:* Monday, April 27, 2015 9:49 AM
>>     *To:* Berg, Michael C
>>     *Cc:* hotspot-compiler-dev at openjdk.java.net
>>     <mailto:hotspot-compiler-dev at openjdk.java.net>
>>     *Subject:* Re: RFR 8078563 - add profitability tests for reductions
>>
>>     *+       // Length 2 reductions of INT/LONG do not offer
>> performance benefits*
>>
>>     *+       if (((arith_type->basic_type() == T_INT) ||
>> (arith_type->basic_type() == T_LONG)) && (size == 2)) {*
>>
>>     I don't know that code very well but can there be reductions with
>>     size == 1?
>>
>>         On Apr 23, 2015, at 5:53 PM, Berg, Michael C
>>         <michael.c.berg at intel.com <mailto:michael.c.berg at intel.com>>
>> wrote:
>>
>>         Hi Folks,
>>
>>         We (Intel) would like to add profitability tests to superword to
>>         gate scenarios where reduction optimization overhead is roughly
>>         equal to the benefit gained by vectorization.
>>
>>         We would like to do this for all x86 enabled microarchitectures
>>         that support reductions and superword.  This new constraint was
>>         tested on SSE and AVX (1,2) enabled platforms.
>>         The contribution as referenced by RFR 8078563 is defined by the
>>         information at the links below.
>>
>>         Please review this bug entry and its code and comment as needed:
>>
>>         https://bugs.openjdk.java.net/browse/JDK-8078563
>>
>>         And its code and test addition (this is a small patch):
>>
>>         http://cr.openjdk.java.net/~kvn/8078563/webrev/
>>
>>
>>
>>         Vladimir Kozlov has offered to sponsor this patch.
>>
>>         Thanks,
>>
>>         Michael
>>

From michael.haupt at oracle.com  Thu Apr 30 07:22:53 2015
From: michael.haupt at oracle.com (Michael Haupt)
Date: Thu, 30 Apr 2015 09:22:53 +0200
Subject: RFR (XL): 8075492: update IGV in hs-comp
In-Reply-To: <55411010.4000003@oracle.com>
References: <12D47C44-DF22-4DA2-9A30-444FA726D6D3@oracle.com>
	<5AB09E0F-F065-431C-AB25-0BEFDCF63EF0@oracle.com>
	<736465FA-F43B-4F66-BBF9-913EE0288FDB@oracle.com>
	<97A05047-C96A-487F-BE28-2F8C9BEDD099@oracle.com>
	<D5C7C768-1CFA-4EF3-9400-DA348327DF92@oracle.com>
	<45948EB9-34E8-4D6F-9E30-D6062B0A91AE@oracle.com>
	<55411010.4000003@oracle.com>
Message-ID: <4D940F1A-7A91-4EE0-81C8-F3306E137671@oracle.com>

Hi Vladimir,

thanks, done: http://cr.openjdk.java.net/~mhaupt/8075492/webrev.02

Best,

Michael

> Am 29.04.2015 um 19:08 schrieb Vladimir Kozlov <vladimir.kozlov at oracle.com>:
> 
> Please, update Copyright year in all files which have it. For example:
> 
> * Copyright (c) 2008, 2015, Oracle and/or its affiliates. All rights reserved.
> 
> New files can also have two years, since they were created earlier.
> 
> Add copyright header to IdealGraphVisualizer/igv.sh
> 
> Otherwise looks fine.
> 
> Thanks,
> Vladimir


-- 

Dr. Michael Haupt | Principal Member of Technical Staff
Phone: +49 331 200 7277 | Fax: +49 331 200 7561
Oracle Java Platform Group | HotSpot Compiler Team 
Oracle Deutschland B.V. & Co. KG, Schiffbauergasse 14 | 14467 Potsdam, Germany
Oracle is committed to developing practices and products that help protect the environment


From tobias.hartmann at oracle.com  Thu Apr 30 09:06:57 2015
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Thu, 30 Apr 2015 11:06:57 +0200
Subject: [9] RFR(S): 8078497: C2's superword optimization causes unaligned
	memory accesses
In-Reply-To: <5541172D.5010200@oracle.com>
References: <5540D237.8000107@oracle.com> <5541172D.5010200@oracle.com>
Message-ID: <5541F0B1.50804@oracle.com>

Hi Vladimir,

thanks for the review!

On 29.04.2015 19:38, Vladimir Kozlov wrote: 
> Consider reducing number of iterations in test from 1M so that the test does not timeout on slow platforms.

I reduced the number of iterations to 20k and verified that it is enough to trigger compilation and crash on Sparc.

> lines 511-515 are duplicates of 486-491.  Consider moving them before vw%span check.

Thanks, I refactored that part.

> Typo 'iff':
> +    // final offset is a multiple of vw iff init_offset is a multiple.

I used 'iff' to refer to 'if and only if' [1]. I changed it.

Here is the new webrev:
http://cr.openjdk.java.net/~thartmann/8078497/webrev.01/

Thanks,
Tobias

[1] http://en.wikipedia.org/wiki/If_and_only_if

> 
> Thanks,
> Vladimir
> 
> On 4/29/15 5:44 AM, Tobias Hartmann wrote:
>> Hi,
>>
>> please review the following patch.
>>
>> https://bugs.openjdk.java.net/browse/JDK-8078497
>> http://cr.openjdk.java.net/~thartmann/8078497/webrev.00/
>>
>> Background information (simplified):
>> After loop optimizations, C2 tries to vectorize memory operations by finding and merging adjacent memory accesses in the loop body. The superword optimization matches MemNodes with the following pattern as address input:
>>
>>    ptr + k*iv + constant [+ invar]
>>
>> where iv is the loop induction variable, k is a scaling factor that may be 0 and invar is an optional loop invariant value. C2 then picks the memory operation with most similar references and tries to align it in the main loop by setting the number of pre-loop iterations accordingly. Other adjacent memory operations that conform to this alignment are merged into packs. After extending and filtering these packs, final vector operations are emitted.
>>
>> The problem is that some architectures (for example, Sparc) require memory accesses to be aligned and in special cases the superword optimization generates code that contains unaligned vector instructions.
>>
>> Problems:
>> (1) If two memory operations have different loop invariant offset values and C2 decides to align the main loop to one of them, we can always set the invariant of the other operation such that we break the alignment constraint. For example, if we have a store to a byte array a[i+inv1] and a load from a byte array b[i+inv2] (where i is the induction variable), C2 may decide to align the main loop such that ((i+inv1) % 8) == 0 and replace the adjacent byte stores by an 8-byte double word store. Also the byte loads will be replaced by a double word load. If we then set inv2 = inv1 + 1 at runtime, the load will fail with a SIGBUS because the access is not 8-byte aligned, i.e., ((i+inv2) % 8) != 0. Vectorization should not take place in this case.
>>
>> (2) If C2 decides to align the main loop to a memory operation, the necessary adjustment of the induction variable by the pre-loop is computed such that the resulting offset in the main loop is aligned to the vector width (see 'SuperWord::get_iv_adjustment'). If the offset of a memory operation is independent of the loop induction variable (i.e., scale k is 0), the iv adjustment should be 0.
>>
>> (3) If the loop span is greater than the vector width 'vw' of a memory operation, the superword optimization assumes that the pre-loop is never able to align this operation in the main loop (see 'SuperWord::ref_is_alignable'). This is wrong because if the loop span is a multiple of vw and depending on the initial offset, it may very well be possible to align the operation to vw.
>>
>> These problems originally only showed up in the string density code base (JDK-8054307) where we have a putChar intrinsic that writes a char value to two entries of a byte array. To reproduce the issue with JDK9 we have to make sure that the memory operations:
>> (i) are independent,
>> (ii) have different invariants,
>> (iii) are not too complex (because then vectorization will not take place).
>>
>> To guarantee (i) we either need two different arrays (for example, byte[] and char[]) for the load and store operations or the same array but different offsets. Since the offsets should include a loop invariant part (ii), the superword optimization will not be able to determine that the runtime offset values do not overlap. We therefore use Unsafe.getChar/putChar to read/write a char value from/to a byte/char array and thereby guarantee independence.
>>
>> I came up with the following test (see 'TestVectorizationWithInvariant.java'):
>>
>>    byte[] src = new byte[1000];
>>    char[] dst = new char[1000];
>>
>>    for (int i = (int) CHAR_ARRAY_OFFSET; i < 100; i = i + 8) {
>>      // Copy 8 chars from src to dst
>>      unsafe.putChar(dst, i + 0, unsafe.getChar(src, off + 0));
>>      [...]
>>      unsafe.putChar(dst, i + 14, unsafe.getChar(src, off + 14));
>>    }
>>
>> Problem (1) shows up since the main loop will be aligned to the StoreC[i + 0] which has no invariant. However, the LoadUS[off + 0] has the loop invariant 'off'. Setting off to BYTE_ARRAY_OFFSET + 2 will break the alignment of the emitted double word load and result in a crash.
>>
>> The LoadUS[off + 0] in the above example is independent of the loop induction variable and therefore has no scale value. Because of problem (2), the iv adjustment is computed as -4 (see 'SuperWord::get_iv_adjustment').
>>
>>    offset = 16 iv_adjust = -4 elt_size = 2 scale = 0 iv_stride = 16 vect_size 8
>>
>> The regression test contains additional test cases that also trigger problem (3).
>>
>> Solution:
>> (1) I added a check to 'SuperWord::find_adjacent_refs' to make sure that memory accesses with different invariants are only vectorized if the architecture supports unaligned memory accesses.
>> (2) 'SuperWord::get_iv_adjustment' is modified such that it returns 0 if the memory operation is independent of iv.
>> (3) In 'SuperWord::ref_is_alignable' we need to additionally check if span is divisible by vw. If so, the pre-loop will add multiples of vw to the initial offset and if that initial offset is divisible by vw, the final offset (after the pre-loop) will be divisible as well and we should return true. I added comments to the code describing this in detail.
>>
>> Testing:
>> - original test with string-density codebase
>> - regression test
>> - JPRT
>>
>> Thanks,
>> Tobias
>>
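
The alignment reasoning behind problems (1) and (3) in the quoted description can be checked with a small arithmetic model. The Java sketch below is not the code in 'SuperWord::ref_is_alignable'; the class and method names are hypothetical, offsets are plain byte offsets, and the general case is brute-forced over the possible pre-loop iteration counts instead of being derived the way the compiler derives it.

```java
// Hypothetical arithmetic model of the alignment questions discussed above.
class AlignmentSketch {
    // True if a byte offset is a multiple of the access width.
    static boolean isAligned(int offset, int width) {
        return Math.floorMod(offset, width) == 0;
    }

    // Can some number of pre-loop iterations n make initOffset + n * span
    // a multiple of the vector width vw (vw > 0)?
    static boolean refIsAlignable(int initOffset, int span, int vw) {
        if (Math.abs(span) % vw == 0) {
            // Each pre-loop iteration adds a multiple of vw, so the final
            // offset is aligned exactly when the initial offset is.
            return isAligned(initOffset, vw);
        }
        // (initOffset + n * span) mod vw repeats with a period of at most vw,
        // so trying n = 0 .. vw-1 covers every reachable remainder.
        for (int n = 0; n < vw; n++) {
            if (isAligned(initOffset + n * span, vw)) {
                return true;
            }
        }
        return false;
    }
}
```

For problem (1), take i = 6 and inv1 = 2: isAligned(i + inv1, 8) holds, but with inv2 = inv1 + 1 the load offset i + inv2 = 9 is not 8-byte aligned. For problem (3), a span of 16 with vw = 8 is alignable when the initial offset is 16 and not when it is 18.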

From benedikt.wedenik at theobroma-systems.com  Thu Apr 30 10:06:32 2015
From: benedikt.wedenik at theobroma-systems.com (Benedikt Wedenik)
Date: Thu, 30 Apr 2015 12:06:32 +0200
Subject: aarch64 AD-file / matching rule
In-Reply-To: <4295855A5C1DE049A61835A1887419CC2CFC2F71@DEWDFEMB12A.global.corp.sap>
References: <00D2ABB8-C2DE-45C2-A515-1D6FC222208E@theobroma-systems.com>
	<4295855A5C1DE049A61835A1887419CC2CFC2F71@DEWDFEMB12A.global.corp.sap>
Message-ID: <40E09326-B771-47C6-8174-17BAE8D151D2@theobroma-systems.com>

Hi,

thanks for your quick help!
But I found out that the pattern I was searching for is emitted here:
cpu/aarch64/vm/c1_LIRAssembler_aarch64.cpp

This means my pattern will never match the rule in the AD-file because it is more or less "hardcoded" :)
I wrote a small simulation program to see if the rule would match in JIT-compiled code and it worked.
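
A test program of that kind could look roughly like the sketch below. It is not the program I used; the class and method names are made up, the mask is the one from the disassembly in my original mail, and whether C2 actually produces the If(CmpI(AndI src mask) 0) shape for these methods is an assumption that has to be checked with PrintOptoAssembly or PrintIdeal as Goetz suggested.

```java
// Hypothetical test program for the and/cmp/branch pattern.
class AndCmpBranchDemo {
    static final int MASK = 0x7ffff8;

    // The result of the AND feeds only the comparison with zero.
    static int singleUse(int x) {
        if ((x & MASK) == 0) {
            return 1;
        }
        return 0;
    }

    // The masked value is also returned, so the AND has a second use;
    // per Goetz's remark the matcher may then not apply a combined rule.
    static int multiUse(int x) {
        int masked = x & MASK;
        if (masked == 0) {
            return -1;
        }
        return masked;
    }

    public static void main(String[] args) {
        int sum = 0;
        // Enough iterations that the methods are likely to be compiled.
        for (int i = 0; i < 200_000; i++) {
            sum += singleUse(i) + multiUse(i);
        }
        System.out.println(sum);
    }
}
```

Running it with the CompileCommand print option from my original mail, pointed at these methods, shows which instructions were selected for each variant.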

I'll do some more investigation into how to optimise this pattern in the C++ code because it occurs quite often.

Thanks again, 
Benedikt


On 29 Apr 2015, at 16:37, Lindenmaier, Goetz <goetz.lindenmaier at sap.com> wrote:

> Hi,
>  
> I am using PrintOptoAssembly in such cases.  This tells me what the IR looks like after
> matching.  Together with PrintAssembly you can manage to locate the block
> with the pattern.
>  
> With PrintIdeal you can see the graph before matching.  You should find the pattern
> you described in the ad rule there.  Hard to read, though.
>  
> There is also the PrintIdealGraph flag, printing a graph you can visualize.
> I didn't use that, though.  We have instrumented the opto compiler with
> our own graph printer.
>  
> I could imagine that the AndI node has more than one usage/out edge.
> Then it's not a tree-like subgraph, and the matcher cannot apply the rule.
> This is something you would check in the PrintIdeal output or in the last
> Ideal graph before matching.
>  
> Best regards,
>   Goetz.
>  
> From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Benedikt Wedenik
> Sent: Mittwoch, 29. April 2015 14:50
> To: hotspot-compiler-dev at openjdk.java.net
> Cc: Dr. Philipp Tomsich; Benedikt Huber
> Subject: aarch64 AD-file / matching rule
>  
> Hi!
>  
> I'm writing compiler-optimisations for the aarch64 port at the moment and I am using specjbb2005 for benchmarking.
> One of the patterns I want to optimise is the following:
>  
>   0x0000007f8c2961b4: and w2, w2, #0x7ffff8
>   0x0000007f8c2961b8: cmp w2, #0x0
>   0x0000007f8c2961bc: b.eq     0x0000007f8c2968f4
>  
>  
> Here I see an opportunity for ands, b.eq.
>  
> I created a new rule in the cpu/aarch64/vm/aarch64.ad file.
> My matching looks like this:
>  
> instruct and_cmp_branch(cmpOp cmp, immI0 zero, iRegIorL2I src1, immILog src2, label lbl, rFlagsReg cr) %{
>   match(If cmp (CmpI (AndI src1 src2) zero) );
>  
>   effect(USE lbl);
>   ins_cost(0); // is zero at the moment to be sure the rule is triggered.
>  
>   ins_encode %{
>     Label* L = $lbl$$label;
>     Assembler::Condition cond = (Assembler::Condition)$cmp$$cmpcode;
>     __ andsw(as_Register($src1$$reg),
>         as_Register($src1$$reg),
>         (unsigned long)($src2$$constant));
>     __ br ((Assembler::Condition)$cmp$$cmpcode, *L);
>   %}  
>  
>   ins_pipe(pipe_cmp_branch); //TODO but not relevant yet
> %}
>  
>  
> As I don't know whether my matching-rule is wrong or something else stops the rule from getting emitted I wanted to find out which "and"-rule is triggered for this pattern.
> I inserted some nops to locate the corresponding rule and I found out that most of the emitted "and"s were surrounded by nops except for my pattern and a few other ones like this one:
>  
> 0x0000007f984bf568: eor   x1, x0, x1
> 0x0000007f984bf56c: and   x1, x1, #0xffffffffffffff87
> 0x0000007f984bf570: cbz   x1, 0x0000007f984bf664
> 0x0000007f984bf574: and   xscratch1, x1, #0x7
> 0x0000007f984bf578: cbnz  xscratch1, 0x0000007f984bf5f0
> 0x0000007f984bf57c: and   xscratch1, x1, #0x300
> 0x0000007f984bf580: cbnz  xscratch1, 0x0000007f984bf5b8
> 0x0000007f984bf584: mov   xscratch1, #0x37f                   // #895
> 0x0000007f984bf588: and   x0, x0, xscratch1
> 0x0000007f984bf58c: orr   x1, x0, xthread
> 0x0000007f984bf590: ldaxr xscratch1, [x3]
> 0x0000007f984bf594: cmp   xscratch1, x0
> 0x0000007f984bf598: b.ne  0x0000007f984bf5a8
>  
>  
> Usually I call the program like this:
>  
> ----
> JAVA=/root/bwedenik/jdk8/jdk8/build/linux-aarch64-normal-server-release/jdk/bin/java
>  
> $JAVA -fullversion
> $JAVA -server -XX:+AggressiveOpts -XX:+UseFastAccessorMethods -XX:+OptimizeStringConcat -XX:+UseBiasedLocking -XX:+UseParallelGC -XX:ParallelGCThreads=10 -XX:+UseParallelOldGC -XX:SurvivorRatio=8 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=15  -Xms10g -Xmx10g -Xmn4g -Xss64m -XX:+UnlockDiagnosticVMOptions -XX:CompileCommand='print,*DeliveryTransaction.preprocess' spec.jbb.JBBmain -propfile SPECjbb.props
> ----
>  
>  
> I tried to figure out whether this problem occurs only with C1, C2 or pure interpretation mode, and these are the results (calling java as usual, including the given arguments):
>  
> * [-Xint] : This gives me neither the inserted nops nor the pattern I am searching for (as expected, since nothing is compiled).
> * [-client -Xcomp -XX:-TieredCompilation] : Here the cmp for #0x0 only occurs about 3 times in the whole disassembly, instead of about 200 times without these flags. In addition, none of my inserted nops appear in the disassembly.
> * [-server -Xcomp -XX:-TieredCompilation] : Same as -client.
>  
>  
> My question is now how to find out why the rule does not match / if the rule is correct and how to find the actual rule which emits the code of my desired pattern.
>  
> Thanks in advance,
> Benedikt Wedenik, Theobroma-Systems.com

From aph at redhat.com  Thu Apr 30 10:27:23 2015
From: aph at redhat.com (Andrew Haley)
Date: Thu, 30 Apr 2015 11:27:23 +0100
Subject: aarch64 AD-file / matching rule
In-Reply-To: <40E09326-B771-47C6-8174-17BAE8D151D2@theobroma-systems.com>
References: <00D2ABB8-C2DE-45C2-A515-1D6FC222208E@theobroma-systems.com>	<4295855A5C1DE049A61835A1887419CC2CFC2F71@DEWDFEMB12A.global.corp.sap>
	<40E09326-B771-47C6-8174-17BAE8D151D2@theobroma-systems.com>
Message-ID: <5542038B.30301@redhat.com>

On 30/04/15 11:06, Benedikt Wedenik wrote:
> But I found out, that the pattern I was searching for is emitted here:
> cpu/aarch64/vm/c1_LIRAssembler_aarch64.cpp

That's the C1 (i.e. the client JIT) compiler, which does very little
optimization.  These days the only real purpose of C1 is to generate
profile data for C2.  Of course it is nice if C1 generates code which
is basically decent.

> This means, my pattern will never match the rule in the AD-file
> because it is more or less "hardcoded" :)

Right.  C1 does not use the .ad file, and indeed its patterns are
hard-coded.

> I wrote a small simulation program to see if the rule would match in
> JIT-compiled code and it worked.
> 
> I'll do some more investigation into how to optimise this pattern in
> the C++ code because it occurs quite often.

I would expect C2 to do this:

  and     w3, w12, #0x7ffff
  cbnz    w3, ...

Andrew.
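
For context, the Ideal shape discussed in this thread, If(CmpI(AndI(x, mask), 0)), typically comes from Java source that tests masked bits against zero. The sketch below is illustrative only: the class and method names are made up, and whether C2 really folds the mask and the branch together depends on the matcher rules in the .ad file and on whether the AndI result has other uses. The second method keeps the masked value alive, which corresponds to the non-tree-like case Goetz mentions.

```java
final class MaskBranchShapes {
    // Same immediate as in the disassembly quoted in the thread.
    static final int MASK = 0x7ffff8;

    // The masked value feeds only the comparison: a tree-shaped candidate
    // for a combined "and + branch" rule.
    static int onlyTested(int flags) {
        if ((flags & MASK) == 0) {
            return 0;
        }
        return 1;
    }

    // The masked value is also returned, so the AndI node has a second use
    // and the subgraph is no longer a tree.
    static int testedAndReused(int flags) {
        int masked = flags & MASK;
        if (masked == 0) {
            return -1;
        }
        return masked;
    }

    public static void main(String[] args) {
        System.out.println(onlyTested(0x7));
        System.out.println(testedAndReused(0x18));
    }
}
```

Running such methods under -XX:CompileCommand=print (as in the command line quoted above) is one way to see which instruction sequence each shape ends up with.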

From roland.westrelin at oracle.com  Thu Apr 30 11:30:45 2015
From: roland.westrelin at oracle.com (Roland Westrelin)
Date: Thu, 30 Apr 2015 13:30:45 +0200
Subject: RFR(S): 8078426: mb/jvm/compiler/InterfaceCalls/testAC2 -
	assert(predicate_proj == 0L) failed: only one predicate entry expected
In-Reply-To: <55410272.2090205@oracle.com>
References: <CDEB8295-F9E7-4152-AC6F-B1FA39501216@oracle.com>
	<553FEFE9.4040305@oracle.com>
	<914001D5-433D-4B22-9AE7-8672E667DA83@oracle.com>
	<55410272.2090205@oracle.com>
Message-ID: <9D528997-8343-46F4-81E1-764624164201@oracle.com>

> Okay. Thank you for explaining. You are right that we can't remove new predicates (null checks, etc).
> Changes look good.

Thanks for the review and discussion.

Roland.

> 
> Vladimir
> 
> On 4/29/15 2:26 AM, Roland Westrelin wrote:
>> Hi Vladimir,
>> 
>> Thanks for looking at this.
>> 
>>> Can we remove predicates when loop is optimized out?
>>> What code eliminates the loop?
>> 
>> For the test case, the loop is found empty by IdealLoopTree::policy_do_remove_empty_loop() and the CountedLoopNode is effectively removed by RegionNode::Ideal() because it has a single input. For the crash in mb/jvm/compiler/InterfaceCalls/testAC2, the loop is also removed by RegionNode::Ideal() but I don't know if it's because the loop was empty or for another reason (I assume the backbranch could be removed also because the loop has a single iteration).
>> 
>> We can't remove all predicates without risking incorrect execution, right? The loop could for instance only perform a null check. Loop predication moves the null check out of the loop. The loop becomes empty so it goes away. But the null check can't go away because the method is still supposed to throw an NPE if the null check fails.
>> 
>> We could remove the predicates that GraphKit::add_predicate() adds and that will eventually go away (the ones that test an opaque node). That doesn't help split_if because PhaseIdealLoop::find_predicate() could still find a predicate like the null check above. So we could remove the predicates that test on an opaque node when the loop goes dead, then change split_if so it looks not for any predicates but only for the predicates that test on an opaque node. But that doesn't help either because AFAIU, split_if can be run after the loop optimizations are over and Opaque1Node::Identity() has removed those predicates.
>> 
>> Roland.
>> 
>>> 
>>> Vladimir
>>> 
>>> On 4/28/15 1:38 AM, Roland Westrelin wrote:
>>>> http://cr.openjdk.java.net/~roland/8078426/webrev.00/
>>>> 
>>>> See test case: the loop is unswitched, then the loop bodies become empty so the loops are optimized out. The split if optimization then finds predicates it doesn't expect on both branches of the unswitched loop test.
>>>> 
>>>> Roland.
>>>> 
>> 
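
The point about the null check can be illustrated at the Java level. This is only a sketch with hypothetical names, not the regression test from the webrev: the loop body does nothing observable except dereference its argument, so even if a compiler hoists the check and drops the empty loop, the NullPointerException must still be thrown whenever the loop would have run at least once.

```java
final class EmptyLoopNullCheck {
    static final class Box {
        int value;
    }

    // The only observable effect of this loop is the implicit null check
    // on 'box'. The loaded value itself is never used.
    static void touch(Box box, int n) {
        for (int i = 0; i < n; i++) {
            int ignored = box.value;
        }
    }

    // Reports whether touch() throws a NullPointerException.
    static boolean throwsNpe(Box box, int n) {
        try {
            touch(box, n);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(throwsNpe(null, 10));
        System.out.println(throwsNpe(null, 0));
        System.out.println(throwsNpe(new Box(), 10));
    }
}
```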


From rickard.backman at oracle.com  Thu Apr 30 13:07:32 2015
From: rickard.backman at oracle.com (Rickard =?iso-8859-1?Q?B=E4ckman?=)
Date: Thu, 30 Apr 2015 15:07:32 +0200
Subject: RFR (M): 8064458 OopMap class could be more compact
In-Reply-To: <55412056.2060908@oracle.com>
References: <20150428100337.GB31204@rbackman> <553F7B95.7070103@oracle.com>
	<20150429145309.GE31204@rbackman> <55412056.2060908@oracle.com>
Message-ID: <20150430130732.GF31204@rbackman>

On 04/29, Bertrand Delsart wrote:
> On 29/04/2015 16:53, Rickard Bäckman wrote:
> >On 04/28, Bertrand Delsart wrote:
> >>Hi,
> >>
> >>First, thanks for the change. The additional benefit is that an
> >>ImmutableOopMapSet no longer contains any absolute references.
> >>
> >>A few comments.
> >>
> >>The ImmutableOopMapSet.java seems to be missing in the webrev.
> >
> >Added.
> 
> Thanks.
> 
> I suppose that ImmutableOopMap.java and ImmutableOopMapSet.java are
> renamings of the old non Immutable code. Did you use 'hg move' ?
> This may help get cleaner history and webrevs, showing what you
> modified in these files.

Thanks, no my idea moved the files for me. Probably not using hg move.
I'll see what I can do about it.

> 
> Note also the missing copyright header in ImmutableOopMapPair.java
> 

Thanks.

/R

> Regards,
> 
> Bertrand.
> 
> >
> >>
> >>There also seem to be a few issues with ImmutableOopMap::print_on
> >>and ImmutableOopMapSet::print_on:
> >>
> >>- I did not spot the closing "}" corresponding to
> >>   "ImmutableOopMap{"
> >
> >Fixed.
> >
> >>
> >>- the "map != last" part does not look complete. I assume that you
> >>are trying to dump only once the OopMap when it is shared by
> >>successive pcs but you are then missing a "last = map;" line
> >>somewhere in the if statement.
> >
> >Thanks, missed that. Fixed.
> >
> >>
> >>- as a minor point, I'd rather print "offs:" or "pc offsets:"
> >>instead of "pcs:" because OopMapSet use pc offsets, not absolute
> >>pcs. [ Your CR might also be the right time to replace "pc" by
> >>"pc_offset" for a few field or variable names since this is a bit
> >>confusing. ]
> >
> >Changed to "pc offsets:". Renamed some of the new fields as well. Didn't
> >want to mess with the old ones.
> >
> >Updated webrev: http://cr.openjdk.java.net/~rbackman/8064458.2/
> >
> >
> >Thanks
> >/R
> >
> >>
> >>Regards,
> >>
> >>Bertrand.
> >>
> >>On 28/04/2015 12:03, Rickard Bäckman wrote:
> >>>Hi all,
> >>>
> >>>can I please have reviews for this change:
> >>>
> >>>RFR: http://cr.openjdk.java.net/~rbackman/8064458.1/
> >>>RFE: http://bugs.openjdk.java.net/browse/JDK-8064458
> >>>
> >>>While looking at OopMaps a while ago I noticed that there were a couple
> >>>of different fields that were unused after the OopMaps were finalised.
> >>>
> >>>I took some time to investigate and rearrange the OopMaps. Since I
> >>>didn't want to change how the OopMaps are built I introduced new data
> >>>structures. ImmutableOopMapSet and ImmutableOopMap. The original OopMap
> >>>structures are used to build up the OopMaps and when finalised they are
> >>>copied into the Immutable variants.
> >>>
> >>>The ImmutableOopMapSet contains a few fields [size, count] and then a
> >>>list of [pc, offset]. The offset points to the offset after the list
> >>>where the ImmutableOopMap is placed. By moving pc out from OopMap to be
> >>>part of the list we can now have multiple pcs with identical OopMaps
> >>>point to the same data.
> >>>
> >>>We only keep 1 empty OopMap, and the other compaction that is done in
> >>>this change is to check if the OopMap is identical to the previous one
> >>>and then reuse that one. So no complete uniqueness check.
> >>>
> >>>I ran a couple of small benchmarks and printed the size of the old
> >>>OopMaps vs the new. The new layout uses about 20 - 25% of the space on
> >>>the benchmarks I've run.
> >>>
> >>>Tested by running through JPRT, running BigApps and NSK.quick.testlist
> >>>
> >>>Thanks
> >>>/R
> >>>
> >>
> >>
> >>--
> >>Bertrand Delsart,                     Grenoble Engineering Center
> >>Oracle,         180 av. de l'Europe,          ZIRST de Montbonnot
> >>38330 Montbonnot Saint Martin,                             FRANCE
> >>bertrand.delsart at oracle.com             Phone : +33 4 76 18 81 23
> >>
> >>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >>NOTICE: This email message is for the sole use of the intended
> >>recipient(s) and may contain confidential and privileged
> >>information. Any unauthorized review, use, disclosure or
> >>distribution is prohibited. If you are not the intended recipient,
> >>please contact the sender by reply email and destroy all copies of
> >>the original message.
> >>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >/R
> >
> 
> 
> -- 
> Bertrand Delsart,                     Grenoble Engineering Center
> Oracle,         180 av. de l'Europe,          ZIRST de Montbonnot
> 38330 Montbonnot Saint Martin,                             FRANCE
> bertrand.delsart at oracle.com             Phone : +33 4 76 18 81 23
> 
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> NOTICE: This email message is for the sole use of the intended
> recipient(s) and may contain confidential and privileged
> information. Any unauthorized review, use, disclosure or
> distribution is prohibited. If you are not the intended recipient,
> please contact the sender by reply email and destroy all copies of
> the original message.
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/R
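
The compaction scheme described in the original proposal (a list of [pc, offset] pairs in front of the map data, a single shared empty map, and reuse of a map that is identical to the previous one) can be sketched in a few lines. This is not the HotSpot implementation: the class, its fields, and the one-byte length header are assumptions made purely for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class CompactMapSetSketch {
    // Each entry is {pcOffset, dataOffset}; several entries may share a dataOffset.
    private final List<int[]> pairs = new ArrayList<>();
    // Map payloads, each preceded by a 1-byte length header (sketch only: maps < 256 bytes).
    private final ByteArrayOutputStream data = new ByteArrayOutputStream();
    private int emptyOffset = -1;   // offset of the single empty map, once stored
    private byte[] last = null;     // most recently added map
    private int lastOffset = -1;    // where that map lives in 'data'

    void add(int pcOffset, byte[] map) {
        int offset;
        if (map.length == 0 && emptyOffset >= 0) {
            offset = emptyOffset;                  // keep only one empty map
        } else if (last != null && Arrays.equals(last, map)) {
            offset = lastOffset;                   // identical to the previous map: reuse it
        } else {
            offset = data.size();
            data.write(map.length);
            data.write(map, 0, map.length);
            if (map.length == 0) {
                emptyOffset = offset;
            }
        }
        last = map;
        lastOffset = offset;
        pairs.add(new int[] {pcOffset, offset});
    }

    int dataOffsetAt(int index) {
        return pairs.get(index)[1];
    }

    int dataSize() {
        return data.size();
    }
}
```

Because only the previous map is compared, a map that reappears after a different one is stored again, which matches the "no complete uniqueness check" remark in the proposal.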

From rickard.backman at oracle.com  Thu Apr 30 14:18:18 2015
From: rickard.backman at oracle.com (Rickard =?iso-8859-1?Q?B=E4ckman?=)
Date: Thu, 30 Apr 2015 16:18:18 +0200
Subject: RFR (M): 8064458 OopMap class could be more compact
In-Reply-To: <5541132C.9020208@oracle.com>
References: <20150428100337.GB31204@rbackman> <553FF92E.3010908@oracle.com>
	<20150429145259.GD31204@rbackman> <5541132C.9020208@oracle.com>
Message-ID: <20150430141818.GG31204@rbackman>

On 04/29, Vladimir Kozlov wrote:
> On 4/29/15 7:52 AM, Rickard Bäckman wrote:
> >On 04/28, Vladimir Kozlov wrote:
> >>You need closed SA changes.
> >
> >I have them, just forgot to send the mail. Small changes too.
> >
> >>
> >>Style:
> >>
> >>Move fields to the beginning of ImmutableOopMapBuilder and classes.
> >>Add comments describing each field in ALL new classes. Add comments
> >>to fields in old class. It will help the next person who looks at
> >>oop maps later.
> >
> >All fields are at the beginning? Just following style and keeping
> >friends above fields as other classes in the same file do.
> 
>   friends
>   fields
>   methods
> 
> It is better to see fields before methods which access them. I am
> not sure what coding style you are talking about. All classes in
> oopMap.hpp have the above layout.
> 
> I am asking you to change the layout of the ImmutableOopMapBuilder and Mapping classes. Also, why is Mapping an inner class?

Ah, I was looking at oopMap.hpp and didn't see them. Fixed.
Mapping is an inner class because it is only used by the Builder and I
didn't want to clutter the namespace with something as general (and bad)
as Mapping. I can move it out if that is preferred.

> 
> 
> >
> >>
> >>Add ResourceMark into ImmutableOopMapSet::build_from() to free memory allocated in ImmutableOopMapBuilder().
> 
> no ResourceMark in 8064458.2

Added.

> 
> >>Why do you need ImmutableOopMapBuilder to be a friend of class ImmutableOopMap?
> >
> >It makes a call to data_addr() which is private. Only for debug builds
> >so I wrapped it in DEBUG_ONLY.
> 
> Put ImmutableOopMapBuilder::verify() under #ifdef ASSERT and its declaration in DEBUG_ONLY.
> 

Done.

Updated webrev: http://cr.openjdk.java.net/~rbackman/8064458.4

Thanks
/R

> Thanks,
> Vladimir
> 
> >
> >>
> >>I think a simple loop in ImmutableOopMapBuilder::verify() would be faster than calling memcmp.
> >
> >Fixed.
> >
> >>
> >>Field _end is not used.
> >
> >Removed.
> >
> >>
> >>ImmutableOopMapBuilder() calls reset(), and heap_size(), called next,
> >>calls reset() again. Maybe move reset() to the end of heap_size()
> >>so that you don't need to call it in fill().
> >>
> >
> >Better yet, the reset() method isn't required anymore.
> >
> >Updated webrev: http://cr.openjdk.java.net/~rbackman/8064458.2
> >
> >/R
> >
> >>Thanks,
> >>Vladimir
> >>
> >>On 4/28/15 3:03 AM, Rickard Bäckman wrote:
> >>>Hi all,
> >>>
> >>>can I please have reviews for this change:
> >>>
> >>>RFR: http://cr.openjdk.java.net/~rbackman/8064458.1/
> >>>RFE: http://bugs.openjdk.java.net/browse/JDK-8064458
> >>>
> >>>While looking at OopMaps a while ago I noticed that there were a couple
> >>>of different fields that were unused after the OopMaps were finalised.
> >>>
> >>>I took some time to investigate and rearrange the OopMaps. Since I
> >>>didn't want to change how the OopMaps are built I introduced new data
> >>>structures. ImmutableOopMapSet and ImmutableOopMap. The original OopMap
> >>>structures are used to build up the OopMaps and when finalised they are
> >>>copied into the Immutable variants.
> >>>
> >>>The ImmutableOopMapSet contains a few fields [size, count] and then a
> >>>list of [pc, offset]. The offset points to the offset after the list
> >>>where the ImmutableOopMap is placed. By moving pc out from OopMap to be
> >>>part of the list we can now have multiple pcs with identical OopMaps
> >>>point to the same data.
> >>>
> >>>We only keep 1 empty OopMap, and the other compaction that is done in
> >>>this change is to check if the OopMap is identical to the previous one
> >>>and then reuse that one. So no complete uniqueness check.
> >>>
> >>>I ran a couple of small benchmarks and printed the size of the old
> >>>OopMaps vs the new. The new layout uses about 20 - 25% of the space on
> >>>the benchmarks I've run.
> >>>
> >>>Tested by running through JPRT, running BigApps and NSK.quick.testlist
> >>>
> >>>Thanks
> >>>/R
> >>>

From vladimir.kozlov at oracle.com  Thu Apr 30 16:13:29 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 30 Apr 2015 09:13:29 -0700
Subject: [9] RFR(S): 8078497: C2's superword optimization causes unaligned
	memory accesses
In-Reply-To: <5541F0B1.50804@oracle.com>
References: <5540D237.8000107@oracle.com> <5541172D.5010200@oracle.com>
	<5541F0B1.50804@oracle.com>
Message-ID: <554254A9.6060608@oracle.com>

Looks good.

Thanks,
Vladimir

On 4/30/15 2:06 AM, Tobias Hartmann wrote:
> Hi Vladimir,
>
> thanks for the review!
>
> On 29.04.2015 19:38, Vladimir Kozlov wrote:
>> Consider reducing number of iterations in test from 1M so that the test does not timeout on slow platforms.
>
> I reduced the number of iterations to 20k and verified that it is enough to trigger compilation and crash on Sparc.
>
>> lines 511-515 are duplicates of 486-491.  Consider moving them before vw%span check.
>
> Thanks, I refactored that part.
>
>> Typo 'iff':
>> +    // final offset is a multiple of vw iff init_offset is a multiple.
>
> I used 'iff' to refer to 'if and only if' [1]. I changed it.
>
> Here is the new webrev:
> http://cr.openjdk.java.net/~thartmann/8078497/webrev.01/
>
> Thanks,
> Tobias
>
> [1] http://en.wikipedia.org/wiki/If_and_only_if
>
>>
>> Thanks,
>> Vladimir
>>
>> On 4/29/15 5:44 AM, Tobias Hartmann wrote:
>>> Hi,
>>>
>>> please review the following patch.
>>>
>>> https://bugs.openjdk.java.net/browse/JDK-8078497
>>> http://cr.openjdk.java.net/~thartmann/8078497/webrev.00/
>>>
>>> Background information (simplified):
>>> After loop optimizations, C2 tries to vectorize memory operations by finding and merging adjacent memory accesses in the loop body. The superword optimization matches MemNodes with the following pattern as address input:
>>>
>>>     ptr + k*iv + constant [+ invar]
>>>
>>> where iv is the loop induction variable, k is a scaling factor that may be 0 and invar is an optional loop invariant value. C2 then picks the memory operation with most similar references and tries to align it in the main loop by setting the number of pre-loop iterations accordingly. Other adjacent memory operations that conform to this alignment are merged into packs. After extending and filtering these packs, final vector operations are emitted.
>>>
>>> The problem is that some architectures (for example, Sparc) require memory accesses to be aligned and in special cases the superword optimization generates code that contains unaligned vector instructions.
>>>
>>> Problems:
>>> (1) If two memory operations have different loop invariant offset values and C2 decides to align the main loop to one of them, we can always set the invariant of the other operation such that we break the alignment constraint. For example, if we have a store to a byte array a[i+inv1] and a load from a byte array b[i+inv2] (where i is the induction variable), C2 may decide to align the main loop such that ((i+inv1) % 8) == 0 and replace the adjacent byte stores by an 8-byte double word store. Also the byte loads will be replaced by a double word load. If we then set inv2 = inv1 + 1 at runtime, the load will fail with a SIGBUS because the access is not 8-byte aligned, i.e., ((i+inv2) % 8) != 0. Vectorization should not take place in this case.
>>>
>>> (2) If C2 decides to align the main loop to a memory operation, the necessary adjustment of the induction variable by the pre-loop is computed such that the resulting offset in the main loop is aligned to the vector width (see 'SuperWord::get_iv_adjustment'). If the offset of a memory operation is independent of the loop induction variable (i.e., scale k is 0), the iv adjustment should be 0.
>>>
>>> (3) If the loop span is greater than the vector width 'vw' of a memory operation, the superword optimization assumes that the pre-loop is never able to align this operation in the main loop (see 'SuperWord::ref_is_alignable'). This is wrong because if the loop span is a multiple of vw and depending on the initial offset, it may very well be possible to align the operation to vw.
>>>
>>> These problems originally only showed up in the string density code base (JDK-8054307) where we have a putChar intrinsic that writes a char value to two entries of a byte array. To reproduce the issue with JDK9 we have to make sure that the memory operations:
>>> (i) are independent,
>>> (ii) have different invariants,
>>> (iii) are not too complex (because then vectorization will not take place).
>>>
>>> To guarantee (i) we either need two different arrays (for example, byte[] and char[]) for the load and store operations or the same array but different offsets. Since the offsets should include a loop invariant part (ii), the superword optimization will not be able to determine that the runtime offset values do not overlap. We therefore use Unsafe.getChar/putChar to read/write a char value from/to a byte/char array and thereby guarantee independence.
>>>
>>> I came up with the following test (see 'TestVectorizationWithInvariant.java'):
>>>
>>>     byte[] src = new byte[1000];
>>>     char[] dst = new char[1000];
>>>
>>>     for (int i = (int) CHAR_ARRAY_OFFSET; i < 100; i = i + 8) {
>>>       // Copy 8 chars from src to dst
>>>       unsafe.putChar(dst, i + 0, unsafe.getChar(src, off + 0));
>>>       [...]
>>>       unsafe.putChar(dst, i + 14, unsafe.getChar(src, off + 14));
>>>     }
>>>
>>> Problem (1) shows up since the main loop will be aligned to the StoreC[i + 0] which has no invariant. However, the LoadUS[off + 0] has the loop invariant 'off'. Setting off to BYTE_ARRAY_OFFSET + 2 will break the alignment of the emitted double word load and result in a crash.
>>>
>>> The LoadUS[off + 0] in above example is independent of the loop induction variable and therefore has no scale value. Because of problem (2), the iv adjustment is computed as -4 (see 'SuperWord::get_iv_adjustment').
>>>
>>>     offset = 16 iv_adjust = -4 elt_size = 2 scale = 0 iv_stride = 16 vect_size 8
>>>
>>> The regression test contains additional test cases that also trigger problem (3).
>>>
>>> Solution:
>>> (1) I added a check to 'SuperWord::find_adjacent_refs' to make sure that memory accesses with different invariants are only vectorized if the architecture supports unaligned memory accesses.
>>> (2) 'SuperWord::get_iv_adjustment' is modified such that it returns 0 if the memory operation is independent of iv.
>>> (3) In 'SuperWord::ref_is_alignable' we need to additionally check if span is divisible by vw. If so, the pre-loop will add multiples of vw to the initial offset and if that initial offset is divisible by vw, the final offset (after the pre-loop) will be divisible as well and we should return true. I added comments to the code describing this in detail.
>>>
>>> Testing:
>>> - original test with string-density codebase
>>> - regression test
>>> - JPRT
>>>
>>> Thanks,
>>> Tobias
>>>
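
The alignment arithmetic behind problems (1) and (3) can be checked with a few lines of plain Java. This is only a sketch of the reasoning in the quoted mail, not the SuperWord code: the class and method names are hypothetical, offsets are byte offsets, and non-negative values are assumed.

```java
final class AlignmentSketch {
    // True if a byte offset is a multiple of the vector width vw.
    static boolean aligned(int offset, int vw) {
        return offset % vw == 0;
    }

    // Brute-force check for problem (3): can some number of pre-loop
    // iterations (each advancing the offset by 'span' bytes) make the
    // access vw-aligned when the main loop starts?
    static boolean someIterationCountAligns(int initOffset, int span, int vw, int maxPreIters) {
        for (int n = 0; n <= maxPreIters; n++) {
            if (aligned(initOffset + n * span, vw)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int vw = 8;

        // Problem (1): the main loop is aligned for the store at i + inv1,
        // but the load uses inv2 = inv1 + 1 and is therefore misaligned.
        int i = 3;
        int inv1 = 5;
        int inv2 = inv1 + 1;
        System.out.println(aligned(i + inv1, vw));
        System.out.println(aligned(i + inv2, vw));

        // Problem (3): span (16) is a multiple of vw (8), so the result
        // depends only on the initial offset.
        System.out.println(someIterationCountAligns(16, 16, vw, 8));
        System.out.println(someIterationCountAligns(4, 16, vw, 8));
    }
}
```

When span is a multiple of vw, adding any number of spans cannot change the remainder modulo vw, which is why the alignment of the initial offset decides the outcome.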

From vladimir.kozlov at oracle.com  Thu Apr 30 16:26:46 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 30 Apr 2015 09:26:46 -0700
Subject: RFR (M): 8064458 OopMap class could be more compact
In-Reply-To: <20150430141818.GG31204@rbackman>
References: <20150428100337.GB31204@rbackman>
	<553FF92E.3010908@oracle.com>	<20150429145259.GD31204@rbackman>
	<5541132C.9020208@oracle.com> <20150430141818.GG31204@rbackman>
Message-ID: <554257C6.1010600@oracle.com>

It is better now. I am fine with inner Mapping class.

One thing left. I think only Mapping and fields could be private since only ImmutableOopMapBuilder accesses them:


+ class ImmutableOopMapBuilder {
+ public:
+   class Mapping;
+
+ public:
+   const OopMapSet* _set;

thanks,
Vladimir

On 4/30/15 7:18 AM, Rickard Bäckman wrote:
> On 04/29, Vladimir Kozlov wrote:
>> On 4/29/15 7:52 AM, Rickard Bäckman wrote:
>>> On 04/28, Vladimir Kozlov wrote:
>>>> You need closed SA changes.
>>>
>>> I have them, just forgot to send the mail. Small changes too.
>>>
>>>>
>>>> Style:
>>>>
>>>> Move fields to the beginning of ImmutableOopMapBuilder and classes.
>>>> Add comments describing each field in ALL new classes. Add comments
>>>> to fields in old class. It will help the next person who looks at
>>>> oop maps later.
>>>
>>> All fields are at the beginning? Just following style and keeping
>>> friends above fields as other classes in the same file do.
>>
>>    friends
>>    fields
>>    methods
>>
>> It is better to see fields before methods which access them. I am
>> not sure what coding style you are talking about. All classes in
>> oopMap.hpp have the above layout.
>>
>> I am asking you to change the layout of the ImmutableOopMapBuilder and Mapping classes. Also, why is Mapping an inner class?
>
> Ah, I was looking at oopMap.hpp and didn't see them. Fixed.
> Mapping is an inner class because it is only used by the Builder and I
> didn't want to clutter the namespace with something as general (and bad)
> as Mapping. I can move it out if that is preferred.
>
>>
>>
>>>
>>>>
>>>> Add ResourceMark into ImmutableOopMapSet::build_from() to free memory allocated in ImmutableOopMapBuilder().
>>
>> no ResourceMark in 8064458.2
>
> Added.
>
>>
>>>> Why do you need ImmutableOopMapBuilder to be a friend of class ImmutableOopMap?
>>>
>>> It makes a call to data_addr() which is private. Only for debug builds
>>> so I wrapped it in DEBUG_ONLY.
>>
>> Put ImmutableOopMapBuilder::verify() under #ifdef ASSERT and its declaration in DEBUG_ONLY.
>>
>
> Done.
>
> Updated webrev: http://cr.openjdk.java.net/~rbackman/8064458.4
>
> Thanks
> /R
>
>> Thanks,
>> Vladimir
>>
>>>
>>>>
>>>> I think a simple loop in ImmutableOopMapBuilder::verify() would be faster than calling memcmp.
>>>
>>> Fixed.
>>>
>>>>
>>>> Field _end is not used.
>>>
>>> Removed.
>>>
>>>>
>>>> ImmutableOopMapBuilder() calls reset(), and heap_size(), called next,
>>>> calls reset() again. Maybe move reset() to the end of heap_size()
>>>> so that you don't need to call it in fill().
>>>>
>>>
>>> Better yet, the reset() method isn't required anymore.
>>>
>>> Updated webrev: http://cr.openjdk.java.net/~rbackman/8064458.2
>>>
>>> /R
>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>> On 4/28/15 3:03 AM, Rickard Bäckman wrote:
>>>>> Hi all,
>>>>>
>>>>> can I please have reviews for this change:
>>>>>
>>>>> RFR: http://cr.openjdk.java.net/~rbackman/8064458.1/
>>>>> RFE: http://bugs.openjdk.java.net/browse/JDK-8064458
>>>>>
>>>>> While looking at OopMaps a while ago I noticed that there were a couple
>>>>> of different fields that were unused after the OopMaps were finalised.
>>>>>
>>>>> I took some time to investigate and rearrange the OopMaps. Since I
>>>>> didn't want to change how the OopMaps are built I introduced new data
>>>>> structures. ImmutableOopMapSet and ImmutableOopMap. The original OopMap
>>>>> structures are used to build up the OopMaps and when finalised they are
>>>>> copied into the Immutable variants.
>>>>>
>>>>> The ImmutableOopMapSet contains a few fields [size, count] and then a
>>>>> list of [pc, offset]. The offset points to the offset after the list
>>>>> where the ImmutableOopMap is placed. By moving pc out from OopMap to be
>>>>> part of the list we can now have multiple pcs with identical OopMaps
>>>>> point to the same data.
>>>>>
>>>>> We only keep 1 empty OopMap, and the other compaction that is done in
>>>>> this change is to check if the OopMap is identical to the previous one
>>>>> and then reuse that one. So no complete uniqueness check.
>>>>>
>>>>> I ran a couple of small benchmarks and printed the size of the old
>>>>> OopMaps vs the new. The new layout uses about 20 - 25% of the space on
>>>>> the benchmarks I've run.
>>>>>
>>>>> Tested by running through JPRT, running BigApps and NSK.quick.testlist
>>>>>
>>>>> Thanks
>>>>> /R
>>>>>

From christian.thalinger at oracle.com  Thu Apr 30 16:33:48 2015
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Thu, 30 Apr 2015 09:33:48 -0700
Subject: RFR (M): 8064458 OopMap class could be more compact
In-Reply-To: <554257C6.1010600@oracle.com>
References: <20150428100337.GB31204@rbackman> <553FF92E.3010908@oracle.com>
	<20150429145259.GD31204@rbackman> <5541132C.9020208@oracle.com>
	<20150430141818.GG31204@rbackman> <554257C6.1010600@oracle.com>
Message-ID: <94B29370-0478-43B8-8547-F108D5AD29FF@oracle.com>

One thing I mentioned to Rickard privately is that I would prefer to have the current OopMap be renamed to something else (TempOopMap?) and keep OopMap as the one used.

> On Apr 30, 2015, at 9:26 AM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
> 
> It is better now. I am fine with inner Mapping class.
> 
> One thing left. I think only Mapping and fields could be private since only ImmutableOopMapBuilder accesses them:
> 
> 
> + class ImmutableOopMapBuilder {
> + public:
> +   class Mapping;
> +
> + public:
> +   const OopMapSet* _set;
> 
> thanks,
> Vladimir
> 
> On 4/30/15 7:18 AM, Rickard Bäckman wrote:
>> On 04/29, Vladimir Kozlov wrote:
>>> On 4/29/15 7:52 AM, Rickard Bäckman wrote:
>>>> On 04/28, Vladimir Kozlov wrote:
>>>>> You need closed SA changes.
>>>> 
>>>> I have them, just forgot to send the mail. Small changes too.
>>>> 
>>>>> 
>>>>> Style:
>>>>> 
>>>>> Move fields to the beginning of ImmutableOopMapBuilder and classes.
>>>>> Add comments describing each field in ALL new classes. Add comments
>>>>> to fields in old class. It will help the next person who looks at
>>>>> oop maps later.
>>>> 
>>>> All fields are at the beginning? Just following style and keeping
>>>> friends above fields as other classes in the same file do.
>>> 
>>>   friends
>>>   fields
>>>   methods
>>> 
>>> It is better to see fields before methods which access them. I am
>>> not sure what coding style you are talking about. All classes in
>>> oopMap.hpp have the above layout.
>>> 
>>> I am asking you to change the layout of the ImmutableOopMapBuilder and Mapping classes. Also, why is Mapping an inner class?
>> 
>> Ah, I was looking at oopMap.hpp and didn't see them. Fixed.
>> Mapping is an inner class because it is only used by the Builder and I
>> didn't want to clutter the namespace with something as general (and bad)
>> as Mapping. I can move it out if that is preferred.
>> 
>>> 
>>> 
>>>> 
>>>>> 
>>>>> Add ResourceMark into ImmutableOopMapSet::build_from() to free memory allocated in ImmutableOopMapBuilder().
>>> 
>>> no ResourceMark in 8064458.2
>> 
>> Added.
>> 
>>> 
>>>>> Why do you need ImmutableOopMapBuilder to be a friend of class ImmutableOopMap?
>>>> 
>>>> It makes a call to data_addr() which is private. Only for debug builds
>>>> so I wrapped it in DEBUG_ONLY.
>>> 
>>> Put ImmutableOopMapBuilder::verify() under #ifdef ASSERT and its declaration in DEBUG_ONLY.
>>> 
>> 
>> Done.
>> 
>> Updated webrev: http://cr.openjdk.java.net/~rbackman/8064458.4
>> 
>> Thanks
>> /R
>> 
>>> Thanks,
>>> Vladimir
>>> 
>>>> 
>>>>> 
>>>>> I think a simple loop in ImmutableOopMapBuilder::verify() would be faster than calling memcmp.
>>>> 
>>>> Fixed.
>>>> 
>>>>> 
>>>>> Field _end is not used.
>>>> 
>>>> Removed.
>>>> 
>>>>> 
>>>>> ImmutableOopMapBuilder() calls reset() and next called heap_size()
>>>>> calls reset() again. May be move reset() to the end of heap_size()
>>>>> so that you don't need to call it in fill().
>>>>> 
>>>> 
>>>> Better yet, the reset() method isn't required anymore.
>>>> 
>>>> Updated webrev: http://cr.openjdk.java.net/~rbackman/8064458.2
>>>> 
>>>> /R
>>>> 
>>>>> Thanks,
>>>>> Vladimir
>>>>> 
>>>>> On 4/28/15 3:03 AM, Rickard Bäckman wrote:
>>>>>> Hi all,
>>>>>> 
>>>>>> can I please have reviews for this change:
>>>>>> 
>>>>>> RFR: http://cr.openjdk.java.net/~rbackman/8064458.1/
>>>>>> RFE: http://bugs.openjdk.java.net/browse/JDK-8064458
>>>>>> 
>>>>>> While looking at OopMaps a while ago I noticed that there were a couple
>>>>>> of different fields that were unused after the OopMaps were finalised.
>>>>>> 
>>>>>> I took some time to investigate and rearrange the OopMaps. Since I
>>>>>> didn't want to change how the OopMaps are built I introduced new data
>>>>>> structures. ImmutableOopMapSet and ImmutableOopMap. The original OopMap
>>>>>> structures are used to build up the OopMaps and when finalised they are
>>>>>> copied into the Immutable variants.
>>>>>> 
>>>>>> The ImmutableOopMapSet contains a few fields [size, count] and then a
>>>>>> list of [pc, offset]. The offset points to the offset after the list
>>>>>> where the ImmutableOopMap is placed. By moving pc out from OopMap to be
>>>>>> part of the list we can now have multiple pcs with identical OopMaps
>>>>>> point to the same data.
>>>>>> 
>>>>>> We only keep 1 empty OopMap, and the other compaction that is done in
>>>>>> this change is to check if the OopMap is identical to the previous one
>>>>>> and then reuse that one. So no complete uniqueness check.
>>>>>> 
>>>>>> I ran a couple of small benchmarks and printed the size of the old
>>>>>> OopMaps vs the new. The new layout uses about 20 - 25% of the space on
>>>>>> the benchmarks I've run.
>>>>>> 
>>>>>> Tested by running through JPRT, running BigApps and NSK.quick.testlist
>>>>>> 
>>>>>> Thanks
>>>>>> /R
>>>>>> 
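
The layout described in the original request (a list of [pc, offset] pairs followed by the map data, a single shared empty map, and reuse only when a map is identical to the previous one) can be sketched roughly as below. This is a minimal illustration, not the code in the webrev: MapData, PcMap, Pair, CompactSet and build_compact are hypothetical names, and the one-byte length prefix is an assumption made only so that the empty map occupies a distinct slot.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins, not the HotSpot classes.
struct MapData { std::vector<uint8_t> bytes; };         // encoded contents of one oop map
struct PcMap   { int pc_offset; MapData map; };          // builder side: one map per pc
struct Pair    { int pc_offset; size_t data_offset; };   // final side: [pc, offset]

struct CompactSet {
  std::vector<Pair>    pairs;  // the [pc, offset] list
  std::vector<uint8_t> data;   // map contents placed after the list
};

// Appends one map as <length byte><bytes> and returns its offset.
// Assumes (for this sketch only) that a map is shorter than 256 bytes.
static size_t append_map(CompactSet& out, const MapData& m) {
  size_t off = out.data.size();
  out.data.push_back(static_cast<uint8_t>(m.bytes.size()));
  out.data.insert(out.data.end(), m.bytes.begin(), m.bytes.end());
  return off;
}

static CompactSet build_compact(const std::vector<PcMap>& in) {
  CompactSet out;
  bool have_empty = false;        // only one empty map is ever stored
  size_t empty_offset = 0;
  const MapData* prev = nullptr;  // previous input map, for the "same as previous" check
  size_t prev_offset = 0;
  for (const PcMap& e : in) {
    size_t off;
    if (e.map.bytes.empty()) {
      if (!have_empty) {
        empty_offset = append_map(out, e.map);
        have_empty = true;
      }
      off = empty_offset;
    } else if (prev != nullptr && prev->bytes == e.map.bytes) {
      off = prev_offset;          // identical to the previous map: share its data
    } else {
      off = append_map(out, e.map);  // no complete uniqueness check is attempted
    }
    out.pairs.push_back({e.pc_offset, off});
    prev = &e.map;
    prev_offset = off;
  }
  return out;
}
```

With this scheme two adjacent pcs with identical maps share one offset, while an identical map that appears later, after a different one, is stored again.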


From vladimir.kozlov at oracle.com  Thu Apr 30 16:39:11 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 30 Apr 2015 09:39:11 -0700
Subject: RFR (M): 8064458 OopMap class could be more compact
In-Reply-To: <94B29370-0478-43B8-8547-F108D5AD29FF@oracle.com>
References: <20150428100337.GB31204@rbackman> <553FF92E.3010908@oracle.com>
	<20150429145259.GD31204@rbackman> <5541132C.9020208@oracle.com>
	<20150430141818.GG31204@rbackman> <554257C6.1010600@oracle.com>
	<94B29370-0478-43B8-8547-F108D5AD29FF@oracle.com>
Message-ID: <55425AAF.2020801@oracle.com>

We still use OopMap and OopMapSet when we create oopmap data. ImmutableOopMap* is used for writing and reading final 
metadata. Otherwise changes would be much larger (see for example, sharedRuntime_<arch>.cpp etc). Rickard may correct me 
if I am wrong.

Vladimir

On 4/30/15 9:33 AM, Christian Thalinger wrote:
> One thing I mentioned to Rickard privately is that I would prefer to have the current OopMap be renamed to something else (TempOopMap?) and keep OopMap as the one used.
>
>> On Apr 30, 2015, at 9:26 AM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
>>
>> It is better now. I am fine with inner Mapping class.
>>
>> One thing left. I think only Mapping and fields could be private since only ImmutableOopMapBuilder accesses them:
>>
>>
>> + class ImmutableOopMapBuilder {
>> + public:
>> +   class Mapping;
>> +
>> + public:
>> +   const OopMapSet* _set;
>>
>> thanks,
>> Vladimir
>>
>>> On 4/30/15 7:18 AM, Rickard Bäckman wrote:
>>> On 04/29, Vladimir Kozlov wrote:
>>>> On 4/29/15 7:52 AM, Rickard Bäckman wrote:
>>>>> On 04/28, Vladimir Kozlov wrote:
>>>>>> You need closed SA changes.
>>>>>
>>>>> I have them, just forgot to send the mail. Small changes too.
>>>>>
>>>>>>
>>>>>> Style:
>>>>>>
>>>>>> Move fields to the beginning of ImmutableOopMapBuilder and classes.
>>>>>> Add comments describing each field in ALL new classes. Add comments
>>>>>> to fields in the old class. It will help the next person who looks at
>>>>>> oop maps later.
>>>>>
>>>>> All fields are at the beginning? Just following style and keeping
>>>>> friends above fields as other classes in the same file do.
>>>>
>>>>    friends
>>>>    fields
>>>>    methods
>>>>
>>>> It is better to see fields before methods which access them. I am
>>>> not sure what coding style you are talking about. All classes in
>>>> oopMap.hpp have the above layout.
>>>>
>>>> I am asking to change layout of ImmutableOopMapBuilder and Mapping classes. Also why Mapping is inner class?
>>>
>>> Ah, I was looking at oopMap.hpp and didn't see them. Fixed.
>>> Mapping is an inner class because it is only used by the Builder and I
>>> didn't want to clutter the namespace with something as general (and bad)
>>> as Mapping. I can move it out if that is preferred.
>>>
>>>>
>>>>
>>>>>
>>>>>>
>>>>>> Add ResourceMark into ImmutableOopMapSet::build_from() to free memory allocated in ImmutableOopMapBuilder().
>>>>
>>>> no ResourceMark in 8064458.2
>>>
>>> Added.
>>>
>>>>
>>>>>> Why you need ImmutableOopMapBuilder to be friend of class ImmutableOopMap?
>>>>>
>>>>> It makes a call to data_addr() which is private. Only for debug builds
>>>>> so I wrapped it in DEBUG_ONLY.
>>>>
>>>> Put ImmutableOopMapBuilder::verify() under #ifdef ASSERT and its declaration in DEBUG_ONLY.
>>>>
>>>
>>> Done.
>>>
>>> Updated webrev: http://cr.openjdk.java.net/~rbackman/8064458.4
>>>
>>> Thanks
>>> /R
>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>>>
>>>>>>
>>>>>> I think a simple loop in ImmutableOopMapBuilder::verify() would be faster than calling memcmp.
>>>>>
>>>>> Fixed.
>>>>>
>>>>>>
>>>>>> Field _end is not used.
>>>>>
>>>>> Removed.
>>>>>
>>>>>>
>>>>>> ImmutableOopMapBuilder() calls reset() and next called heap_size()
>>>>>> calls reset() again. May be move reset() to the end of heap_size()
>>>>>> so that you don't need to call it in fill().
>>>>>>
>>>>>
>>>>> Better yet, the reset() method isn't required anymore.
>>>>>
>>>>> Updated webrev: http://cr.openjdk.java.net/~rbackman/8064458.2
>>>>>
>>>>> /R
>>>>>
>>>>>> Thanks,
>>>>>> Vladimir
>>>>>>
>>>>>> On 4/28/15 3:03 AM, Rickard Bäckman wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> can I please have reviews for this change:
>>>>>>>
>>>>>>> RFR: http://cr.openjdk.java.net/~rbackman/8064458.1/
>>>>>>> RFE: http://bugs.openjdk.java.net/browse/JDK-8064458
>>>>>>>
>>>>>>> While looking at OopMaps a while ago I noticed that there were a couple
>>>>>>> of different fields that were unused after the OopMaps were finalised.
>>>>>>>
>>>>>>> I took some time to investigate and rearrange the OopMaps. Since I
>>>>>>> didn't want to change how the OopMaps are built I introduced new data
>>>>>>> structures. ImmutableOopMapSet and ImmutableOopMap. The original OopMap
>>>>>>> structures are used to build up the OopMaps and when finalised they are
>>>>>>> copied into the Immutable variants.
>>>>>>>
>>>>>>> The ImmutableOopMapSet contains a few fields [size, count] and then a
>>>>>>> list of [pc, offset]. The offset points to the offset after the list
>>>>>>> where the ImmutableOopMap is placed. By moving pc out from OopMap to be
>>>>>>> part of the list we can now have multiple pcs with identical OopMaps
>>>>>>> point to the same data.
>>>>>>>
>>>>>>> We only keep 1 empty OopMap, and the other compaction that is done in
>>>>>>> this change is to check if the OopMap is identical to the previous one
>>>>>>> and then reuse that one. So no complete uniqueness check.
>>>>>>>
>>>>>>> I ran a couple of small benchmarks and printed the size of the old
>>>>>>> OopMaps vs the new. The new layout uses about 20 - 25% of the space on
>>>>>>> the benchmarks I've run.
>>>>>>>
>>>>>>> Tested by running through JPRT, running BigApps and NSK.quick.testlist
>>>>>>>
>>>>>>> Thanks
>>>>>>> /R
>>>>>>>
>

From christian.thalinger at oracle.com  Thu Apr 30 16:55:36 2015
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Thu, 30 Apr 2015 09:55:36 -0700
Subject: RFR (M): 8064458 OopMap class could be more compact
In-Reply-To: <55425AAF.2020801@oracle.com>
References: <20150428100337.GB31204@rbackman> <553FF92E.3010908@oracle.com>
	<20150429145259.GD31204@rbackman> <5541132C.9020208@oracle.com>
	<20150430141818.GG31204@rbackman> <554257C6.1010600@oracle.com>
	<94B29370-0478-43B8-8547-F108D5AD29FF@oracle.com>
	<55425AAF.2020801@oracle.com>
Message-ID: <DD1396FD-92E6-453B-B9F1-20CE6803CD83@oracle.com>

The changes would be much bigger, yes.  Just a thought, I don't insist on doing it.

> On Apr 30, 2015, at 9:39 AM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
> 
> We still use OopMap and OopMapSet when we create oopmap data. ImmutableOopMap* is used for writing and reading final metadata. Otherwise changes would be much larger (see for example, sharedRuntime_<arch>.cpp etc). Rickard may correct me if I am wrong.
> 
> Vladimir
> 
> On 4/30/15 9:33 AM, Christian Thalinger wrote:
>> One thing I mentioned to Rickard privately is that I would prefer to have the current OopMap be renamed to something else (TempOopMap?) and keep OopMap as the one used.
>> 
>>> On Apr 30, 2015, at 9:26 AM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
>>> 
>>> It is better now. I am fine with inner Mapping class.
>>> 
>>> One thing left. I think only Mapping and fields could be private since only ImmutableOopMapBuilder accesses them:
>>> 
>>> 
>>> + class ImmutableOopMapBuilder {
>>> + public:
>>> +   class Mapping;
>>> +
>>> + public:
>>> +   const OopMapSet* _set;
>>> 
>>> thanks,
>>> Vladimir
>>> 
>>> On 4/30/15 7:18 AM, Rickard Bäckman wrote:
>>>> On 04/29, Vladimir Kozlov wrote:
>>>>> On 4/29/15 7:52 AM, Rickard Bäckman wrote:
>>>>>> On 04/28, Vladimir Kozlov wrote:
>>>>>>> You need closed SA changes.
>>>>>> 
>>>>>> I have them, just forgot to send the mail. Small changes too.
>>>>>> 
>>>>>>> 
>>>>>>> Style:
>>>>>>> 
>>>>>>> Move fields to the beginning of ImmutableOopMapBuilder and classes.
>>>>>>> Add comments describing each field in ALL new classes. Add comments
>>>>>>> to fields in the old class. It will help the next person who looks at
>>>>>>> oop maps later.
>>>>>> 
>>>>>> All fields are at the beginning? Just following style and keeping
>>>>>> friends above fields as other classes in the same file do.
>>>>> 
>>>>>   friends
>>>>>   fields
>>>>>   methods
>>>>> 
>>>>> It is better to see fields before methods which access them. I am
>>>>> not sure what coding style you are talking about. All classes in
>>>>> oopMap.hpp have the above layout.
>>>>> 
>>>>> I am asking to change layout of ImmutableOopMapBuilder and Mapping classes. Also why Mapping is inner class?
>>>> 
>>>> Ah, I was looking at oopMap.hpp and didn't see them. Fixed.
>>>> Mapping is an inner class because it is only used by the Builder and I
>>>> didn't want to clutter the namespace with something as general (and bad)
>>>> as Mapping. I can move it out if that is preferred.
>>>> 
>>>>> 
>>>>> 
>>>>>> 
>>>>>>> 
>>>>>>> Add ResourceMark into ImmutableOopMapSet::build_from() to free memory allocated in ImmutableOopMapBuilder().
>>>>> 
>>>>> no ResourceMark in 8064458.2
>>>> 
>>>> Added.
>>>> 
>>>>> 
>>>>>>> Why you need ImmutableOopMapBuilder to be friend of class ImmutableOopMap?
>>>>>> 
>>>>>> It makes a call to data_addr() which is private. Only for debug builds
>>>>>> so I wrapped it in DEBUG_ONLY.
>>>>> 
>>>>> Put ImmutableOopMapBuilder::verify() under #ifdef ASSERT and its declaration in DEBUG_ONLY.
>>>>> 
>>>> 
>>>> Done.
>>>> 
>>>> Updated webrev: http://cr.openjdk.java.net/~rbackman/8064458.4
>>>> 
>>>> Thanks
>>>> /R
>>>> 
>>>>> Thanks,
>>>>> Vladimir
>>>>> 
>>>>>> 
>>>>>>> 
>>>>>>> I think a simple loop in ImmutableOopMapBuilder::verify() would be faster than calling memcmp.
>>>>>> 
>>>>>> Fixed.
>>>>>> 
>>>>>>> 
>>>>>>> Field _end is not used.
>>>>>> 
>>>>>> Removed.
>>>>>> 
>>>>>>> 
>>>>>>> ImmutableOopMapBuilder() calls reset() and next called heap_size()
>>>>>>> calls reset() again. May be move reset() to the end of heap_size()
>>>>>>> so that you don't need to call it in fill().
>>>>>>> 
>>>>>> 
>>>>>> Better yet, the reset() method isn't required anymore.
>>>>>> 
>>>>>> Updated webrev: http://cr.openjdk.java.net/~rbackman/8064458.2
>>>>>> 
>>>>>> /R
>>>>>> 
>>>>>>> Thanks,
>>>>>>> Vladimir
>>>>>>> 
>>>>>>> On 4/28/15 3:03 AM, Rickard Bäckman wrote:
>>>>>>>> Hi all,
>>>>>>>> 
>>>>>>>> can I please have reviews for this change:
>>>>>>>> 
>>>>>>>> RFR: http://cr.openjdk.java.net/~rbackman/8064458.1/
>>>>>>>> RFE: http://bugs.openjdk.java.net/browse/JDK-8064458
>>>>>>>> 
>>>>>>>> While looking at OopMaps a while ago I noticed that there were a couple
>>>>>>>> of different fields that were unused after the OopMaps were finalised.
>>>>>>>> 
>>>>>>>> I took some time to investigate and rearrange the OopMaps. Since I
>>>>>>>> didn't want to change how the OopMaps are built I introduced new data
>>>>>>>> structures. ImmutableOopMapSet and ImmutableOopMap. The original OopMap
>>>>>>>> structures are used to build up the OopMaps and when finalised they are
>>>>>>>> copied into the Immutable variants.
>>>>>>>> 
>>>>>>>> The ImmutableOopMapSet contains a few fields [size, count] and then a
>>>>>>>> list of [pc, offset]. The offset points to the offset after the list
>>>>>>>> where the ImmutableOopMap is placed. By moving pc out from OopMap to be
>>>>>>>> part of the list we can now have multiple pcs with identical OopMaps
>>>>>>>> point to the same data.
>>>>>>>> 
>>>>>>>> We only keep 1 empty OopMap, and the other compaction that is done in
>>>>>>>> this change is to check if the OopMap is identical to the previous one
>>>>>>>> and then reuse that one. So no complete uniqueness check.
>>>>>>>> 
>>>>>>>> I ran a couple of small benchmarks and printed the size of the old
>>>>>>>> OopMaps vs the new. The new layout uses about 20 - 25% of the space on
>>>>>>>> the benchmarks I've run.
>>>>>>>> 
>>>>>>>> Tested by running through JPRT, running BigApps and NSK.quick.testlist
>>>>>>>> 
>>>>>>>> Thanks
>>>>>>>> /R
>>>>>>>> 
>> 


From benedikt.wedenik at theobroma-systems.com  Thu Apr 23 13:41:46 2015
From: benedikt.wedenik at theobroma-systems.com (Benedikt Wedenik)
Date: Thu, 23 Apr 2015 13:41:46 -0000
Subject: aarch64 AD-file / matching rule
Message-ID: <E914289F-BD6A-47EB-81D0-3EDCF988A8E4@theobroma-systems.com>

Hi!

I'm writing compiler optimisations for the aarch64 port at the moment and I am using SPECjbb2005 for benchmarking.
One of the patterns I want to optimise is the following:

  0x0000007f8c2961b4: and	w2, w2, #0x7ffff8
  0x0000007f8c2961b8: cmp	w2, #0x0
  0x0000007f8c2961bc: b.eq	0x0000007f8c2968f4


Here I see an opportunity to use ands followed by b.eq instead.

I created a new rule in the cpu/aarch64/vm/aarch64.ad file.
My matching looks like this:

instruct and_cmp_branch(cmpOp cmp, immI0 zero, iRegIorL2I src1, immILog src2, label lbl, rFlagsReg cr) %{
  match(If cmp (CmpI (AndI src1 src2) zero) );

  effect(USE lbl);
  ins_cost(0); // is zero at the moment to be sure the rule is triggered.

  ins_encode %{
    Label* L = $lbl$$label;
    Assembler::Condition cond = (Assembler::Condition)$cmp$$cmpcode;
    __ andsw(as_Register($src1$$reg),
        as_Register($src1$$reg),
        (unsigned long)($src2$$constant));
    __ br ((Assembler::Condition)$cmp$$cmpcode, *L);
  %}

  ins_pipe(pipe_cmp_branch); //TODO but not relevant yet
%}
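
For reference, the equivalence this rule relies on can be modelled in plain C++. This is only a sketch and not HotSpot code: the function names are hypothetical, and it models nothing but the Z flag, so it says nothing about conditions other than eq/ne.

```cpp
#include <cassert>
#include <cstdint>

// Models "and w2, w2, #mask; cmp w2, #0; b.eq".
// Returns true if the branch would be taken.
static bool and_cmp_beq_taken(uint32_t w2, uint32_t mask) {
  uint32_t result = w2 & mask;     // and w2, w2, #mask
  bool z = (result - 0u) == 0u;    // cmp w2, #0 sets Z when the difference is zero
  return z;                        // b.eq branches when Z is set
}

// Models "ands w2, w2, #mask; b.eq".
// ands sets Z directly from the result of the AND, so no cmp is needed.
static bool ands_beq_taken(uint32_t w2, uint32_t mask) {
  uint32_t result = w2 & mask;     // ands w2, w2, #mask
  bool z = (result == 0u);         // Z flag as set by ands
  return z;                        // b.eq branches when Z is set
}
```

Both functions agree for every input, which is why the separate cmp against zero is redundant for an equality test.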


As I don't know whether my matching rule is wrong or something else stops the rule from being emitted, I wanted to find out which "and" rule is triggered for this pattern.
I inserted some nops to locate the corresponding rule and found that most of the emitted "and"s were surrounded by nops, except for my pattern and a few others like this one:

0x0000007f984bf568: eor   x1, x0, x1
0x0000007f984bf56c: and   x1, x1, #0xffffffffffffff87
0x0000007f984bf570: cbz   x1, 0x0000007f984bf664
0x0000007f984bf574: and   xscratch1, x1, #0x7
0x0000007f984bf578: cbnz  xscratch1, 0x0000007f984bf5f0
0x0000007f984bf57c: and   xscratch1, x1, #0x300
0x0000007f984bf580: cbnz  xscratch1, 0x0000007f984bf5b8
0x0000007f984bf584: mov   xscratch1, #0x37f                   // #895
0x0000007f984bf588: and   x0, x0, xscratch1
0x0000007f984bf58c: orr   x1, x0, xthread
0x0000007f984bf590: ldaxr xscratch1, [x3]
0x0000007f984bf594: cmp   xscratch1, x0
0x0000007f984bf598: b.ne  0x0000007f984bf5a8


Usually I call the program like this:

----
JAVA=/root/bwedenik/jdk8/jdk8/build/linux-aarch64-normal-server-release/jdk/bin/java

$JAVA -fullversion
$JAVA -server -XX:+AggressiveOpts -XX:+UseFastAccessorMethods -XX:+OptimizeStringConcat -XX:+UseBiasedLocking -XX:+UseParallelGC -XX:ParallelGCThreads=10 -XX:+UseParallelOldGC -XX:SurvivorRatio=8 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=15  -Xms10g -Xmx10g -Xmn4g -Xss64m -XX:+UnlockDiagnosticVMOptions -XX:CompileCommand='print,*DeliveryTransaction.preprocess' spec.jbb.JBBmain -propfile SPECjbb.props
----


I tried to figure out if this problem only occurs with c1, c2 or pure interpretation mode and these are the results (calling java as usual including the given arguments):

* [-Xint] : This gives me neither the inserted nops nor the pattern I am searching for (as expected, since nothing is compiled).
* [-client -Xcomp -XX:-TieredCompilation] : Here the cmp with #0x0 only occurs about 3 times in the whole disassembly, instead of about 200 times without these flags. In addition, none of my inserted nops appear in the disassembly.
* [-server -Xcomp -XX:-TieredCompilation] : Same as -client.


My question is how to find out why the rule does not match (or whether the rule is correct at all), and how to find the actual rule that emits the code for my desired pattern.

Thanks in advance,
Benedikt Wedenik, Theobroma-Systems.com