From adinn at redhat.com  Mon Aug  3 09:28:48 2015
From: adinn at redhat.com (Andrew Dinn)
Date: Mon, 03 Aug 2015 10:28:48 +0100
Subject: [aarch64-port-dev ] RFR: AArch64: Fix error introduced into AArch64
 CodeCache by commit for 8130309
Message-ID: <55BF3450.5020008@redhat.com>

The following /AArch64-only/ webrev fixes some problems introduced into
the AArch64 codecache routines by the recent fix for JDK-8130309
committed to to hs-comp

  http://cr.openjdk.java.net/~adinn/8132875/webrev.00/

With this patch the hs-comp tree compiles and runs correctly on AArch64.
Reviews welcome.

regards,


Andrew Dinn
-----------


From aph at redhat.com  Mon Aug  3 10:05:12 2015
From: aph at redhat.com (Andrew Haley)
Date: Mon, 03 Aug 2015 11:05:12 +0100
Subject: [aarch64-port-dev ] RFR: AArch64: Fix error introduced into
 AArch64 CodeCache by commit for 8130309
In-Reply-To: <55BF3450.5020008@redhat.com>
References: <55BF3450.5020008@redhat.com>
Message-ID: <55BF3CD8.6020905@redhat.com>

On 03/08/15 10:28, Andrew Dinn wrote:
> With this patch the hs-comp tree compiles and runs correctly on AArch64.
> Reviews welcome.

That looks right to me.

Thanks,
Andrew.


From adinn at redhat.com  Mon Aug  3 11:03:13 2015
From: adinn at redhat.com (Andrew Dinn)
Date: Mon, 03 Aug 2015 12:03:13 +0100
Subject: [aarch64-port-dev ] RFR: AArch64: Fix error introduced into
 AArch64 CodeCache by commit for 8130309
In-Reply-To: <55BF3CD8.6020905@redhat.com>
References: <55BF3450.5020008@redhat.com> <55BF3CD8.6020905@redhat.com>
Message-ID: <55BF4A71.2040308@redhat.com>

On 03/08/15 11:05, Andrew Haley wrote:
> On 03/08/15 10:28, Andrew Dinn wrote:
>> With this patch the hs-comp tree compiles and runs correctly on AArch64.
>> Reviews welcome.
> 
> That looks right to me.

Thanks for the review.

Could someone from the compiler team with the relevant access right also
please review and then sponsor this patch for inclusion into hs-comp? It
would be good to get this fix into that repo before the original patch
goes up into jdk9. Thanks.

regards,


Andrew Dinn
-----------


From tobias.hartmann at oracle.com  Mon Aug  3 11:35:01 2015
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Mon, 03 Aug 2015 13:35:01 +0200
Subject: [aarch64-port-dev ] RFR: AArch64: Fix error introduced into
 AArch64 CodeCache by commit for 8130309
In-Reply-To: <55BF4A71.2040308@redhat.com>
References: <55BF3450.5020008@redhat.com> <55BF3CD8.6020905@redhat.com>
	<55BF4A71.2040308@redhat.com>
Message-ID: <55BF51E5.5090708@oracle.com>

Hi Andrew,

thanks for fixing that! Seems like I forgot the manual aarch64 testing for my latest webrev..

The changes look good. I can sponsor and push them into hs-comp after an official reviewer approved them.

Best,
Tobias

On 03.08.2015 13:03, Andrew Dinn wrote:
> On 03/08/15 11:05, Andrew Haley wrote:
>> On 03/08/15 10:28, Andrew Dinn wrote:
>>> With this patch the hs-comp tree compiles and runs correctly on AArch64.
>>> Reviews welcome.
>>
>> That looks right to me.
> 
> Thanks for the review.
> 
> Could someone from the compiler team with the relevant access right also
> please review and then sponsor this patch for inclusion into hs-comp? It
> would be good to get this fix into that repo before the original patch
> goes up into jdk9. Thanks.
> 
> regards,
> 
> 
> Andrew Dinn
> -----------
> 

From adinn at redhat.com  Mon Aug  3 11:42:02 2015
From: adinn at redhat.com (Andrew Dinn)
Date: Mon, 03 Aug 2015 12:42:02 +0100
Subject: [aarch64-port-dev ] RFR: AArch64: Fix error introduced into
 AArch64 CodeCache by commit for 8130309
In-Reply-To: <55BF51E5.5090708@oracle.com>
References: <55BF3450.5020008@redhat.com> <55BF3CD8.6020905@redhat.com>
	<55BF4A71.2040308@redhat.com> <55BF51E5.5090708@oracle.com>
Message-ID: <55BF538A.9080409@redhat.com>

Hi Tobias,

On 03/08/15 12:35, Tobias Hartmann wrote:
> thanks for fixing that! Seems like I forgot the manual aarch64
> testing for my latest webrev..
> 
> The changes look good. I can sponsor and push them into hs-comp after
> an official reviewer approved them.

Thanks, Tobias.

Do we need another reviewer for an AArch64-only change? If so then could
someone from hs-comp (or a hotspot dev) volunteer -- Ed Nevill is on
holiday so we don't have another AArch64 port dev to review?

Thanks!

regards,


Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in UK and Wales under Company Registration No. 3798903
Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters
(USA), Michael O'Neill (Ireland)

From tobias.hartmann at oracle.com  Mon Aug  3 13:20:46 2015
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Mon, 03 Aug 2015 15:20:46 +0200
Subject: [aarch64-port-dev ] RFR: AArch64: Fix error introduced into
 AArch64 CodeCache by commit for 8130309
In-Reply-To: <55BF538A.9080409@redhat.com>
References: <55BF3450.5020008@redhat.com> <55BF3CD8.6020905@redhat.com>
	<55BF4A71.2040308@redhat.com> <55BF51E5.5090708@oracle.com>
	<55BF538A.9080409@redhat.com>
Message-ID: <55BF6AAE.2030101@oracle.com>


On 03.08.2015 13:42, Andrew Dinn wrote:
> Hi Tobias,
> 
> On 03/08/15 12:35, Tobias Hartmann wrote:
>> thanks for fixing that! Seems like I forgot the manual aarch64
>> testing for my latest webrev..
>>
>> The changes look good. I can sponsor and push them into hs-comp after
>> an official reviewer approved them.
> 
> Thanks, Tobias.
> 
> Do we need another reviewer for an AArch64-only change? If so then could
> someone from hs-comp (or a hotspot dev) volunteer -- Ed Nevill is on
> holiday so we don't have another AArch64 port dev to review?

I think we need at least one JDK 9 reviewer (I'm not an official reviewer).

Best,
Tobias

> 
> Thanks!
> 
> regards,
> 
> 
> Andrew Dinn
> -----------
> Senior Principal Software Engineer
> Red Hat UK Ltd
> Registered in UK and Wales under Company Registration No. 3798903
> Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters
> (USA), Michael O'Neill (Ireland)
> 

From vladimir.kozlov at oracle.com  Mon Aug  3 16:07:03 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 3 Aug 2015 09:07:03 -0700
Subject: [aarch64-port-dev ] RFR: AArch64: Fix error introduced into
 AArch64 CodeCache by commit for 8130309
In-Reply-To: <55BF3450.5020008@redhat.com>
References: <55BF3450.5020008@redhat.com>
Message-ID: <55BF91A7.7030608@oracle.com>

Looks good.

Thanks,
Vladimir

On 8/3/15 2:28 AM, Andrew Dinn wrote:
> The following /AArch64-only/ webrev fixes some problems introduced into
> the AArch64 codecache routines by the recent fix for JDK-8130309
> committed to to hs-comp
>
>    http://cr.openjdk.java.net/~adinn/8132875/webrev.00/
>
> With this patch the hs-comp tree compiles and runs correctly on AArch64.
> Reviews welcome.
>
> regards,
>
>
> Andrew Dinn
> -----------
>

From adinn at redhat.com  Wed Aug  5 19:55:26 2015
From: adinn at redhat.com (Andrew Dinn)
Date: Wed, 05 Aug 2015 20:55:26 +0100
Subject: [aarch64-port-dev ] RFR: AArch64: Fix error introduced into
 AArch64 CodeCache by commit for 8130309
In-Reply-To: <55BF91A7.7030608@oracle.com>
References: <55BF3450.5020008@redhat.com> <55BF91A7.7030608@oracle.com>
Message-ID: <55C26A2E.1060902@redhat.com>

On 03/08/15 17:07, Vladimir Kozlov wrote:
> Looks good.

Thanks for the review Vladimir (and apologies for the delay in replying
-- I was traveling for a meeting).

regards,


Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in UK and Wales under Company Registration No. 3798903
Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters
(USA), Michael O'Neill (Ireland)

From adinn at redhat.com  Thu Aug  6 14:04:49 2015
From: adinn at redhat.com (Andrew Dinn)
Date: Thu, 06 Aug 2015 15:04:49 +0100
Subject: [aarch64-port-dev ] RFR: AArch64: Fix error introduced into
 AArch64 CodeCache by commit for 8130309
In-Reply-To: <20150806082441.GK12948@rbackman>
References: <55BF3450.5020008@redhat.com> <55BF3CD8.6020905@redhat.com>
	<55BF4A71.2040308@redhat.com> <55BF51E5.5090708@oracle.com>
	<55BF538A.9080409@redhat.com> <20150806082441.GK12948@rbackman>
Message-ID: <55C36981.5070001@redhat.com>

On 06/08/15 09:24, Rickard B?ckman wrote:
> Looks good.

Thanks, Rickard!

regards,


Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in UK and Wales under Company Registration No. 3798903
Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters
(USA), Michael O'Neill (Ireland)

From aleksey.shipilev at oracle.com  Mon Aug 10 08:17:08 2015
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Mon, 10 Aug 2015 11:17:08 +0300
Subject: [aarch64-port-dev ] RFR (S) 8131682: C1 should use multibyte
 nops everywhere
In-Reply-To: <55B8FFA0.4070105@oracle.com>
References: <55A9033C.2030302@oracle.com> <55AA0567.6070602@oracle.com>
	<55AD0AE0.3060803@oracle.com> <55AEAB5C.8050307@oracle.com>
	<55AF500B.9000505@oracle.com> <55B20A16.7020300@oracle.com>
	<4295855A5C1DE049A61835A1887419CC2D00A472@DEWDFEMB12A.global.corp.sap>
	<55B5F651.5090509@oracle.com> <55B6063B.3070604@redhat.com>
	<55B8DC9C.7010003@oracle.com> <55B8FFA0.4070105@oracle.com>
Message-ID: <55C85E04.7060002@oracle.com>

On 07/29/2015 07:30 PM, Dean Long wrote:
> On 7/29/2015 7:01 AM, Aleksey Shipilev wrote:
>> On 07/27/2015 01:21 PM, Andrew Haley wrote:
>>> On 27/07/15 10:13, Aleksey Shipilev wrote:
>>>> Thanks Goetz! Fixed the assembler_ppc.inline.hpp.
>>>>
>>>> Andrew/Edward, are you OK with AArch64 part?
>>>>    http://cr.openjdk.java.net/~shade/8131682/webrev.02/
>>> I agree that it looks good.
>> So, we have reviews from Dean Long, Goetz Lindenmaier, Andrew Dinn, and
>> Andrew Haley. Still no Capital (R)eviewers.
>>
>> Otherwise, I think we are good to go. I respinned the JPRT with
>> open+closed sources, and it would seem the changes in closed sources are
>> not required.
> 
> The changes to sparc and ppc may not be required anymore.

Excellent, please sponsor!
  http://cr.openjdk.java.net/~shade/8131682/8131682.changeset

Thanks,
-Aleksey


From dean.long at oracle.com  Mon Aug 10 19:42:48 2015
From: dean.long at oracle.com (Dean)
Date: Mon, 10 Aug 2015 12:42:48 -0700
Subject: [aarch64-port-dev ] aarch64-port-dev ] RFR (S) 8131682: C1
	should use multibyte nops everywhere
Message-ID: <q47s84gly6v3wgscvd76euxl.1439235768858@email.android.com>

I can sponsor this.  How about removing the ppc, aarch64, and sparc changes and making sure it still builds?

dl


Aleksey Shipilev <aleksey.shipilev at oracle.com> wrote:
>On 07/29/2015 07:30 PM, Dean Long wrote:
>> On 7/29/2015 7:01 AM, Aleksey Shipilev wrote:
>>> On 07/27/2015 01:21 PM, Andrew Haley wrote:
>>>> On 27/07/15 10:13, Aleksey Shipilev wrote:
>>>>> Thanks Goetz! Fixed the assembler_ppc.inline.hpp.
>>>>>
>>>>> Andrew/Edward, are you OK with AArch64 part?
>>>>>    http://cr.openjdk.java.net/~shade/8131682/webrev.02/
>>>> I agree that it looks good.
>>> So, we have reviews from Dean Long, Goetz Lindenmaier, Andrew Dinn, and
>>> Andrew Haley. Still no Capital (R)eviewers.
>>>
>>> Otherwise, I think we are good to go. I respinned the JPRT with
>>> open+closed sources, and it would seem the changes in closed sources are
>>> not required.
>> 
>> The changes to sparc and ppc may not be required anymore.
>
>Excellent, please sponsor!
>  http://cr.openjdk.java.net/~shade/8131682/8131682.changeset
>
>Thanks,
>-Aleksey
>
>

From dean.long at oracle.com  Mon Aug 10 19:57:03 2015
From: dean.long at oracle.com (Dean)
Date: Mon, 10 Aug 2015 12:57:03 -0700
Subject: [aarch64-port-dev ] aarch64-port-dev ] aarch64-port-dev ] RFR
	(S) 8131682: C1 should use multibyte nops everywhere
Message-ID: <g6jxe628pno3gl97tqu3sfod.1439236623876@email.android.com>

Did you get a Reviewer yet?

dl


Dean <dean.long at oracle.com> wrote:
>I can sponsor this.  How about removing the ppc, aarch64, and sparc changes and making sure it still builds?
>
>dl
>
>
>Aleksey Shipilev <aleksey.shipilev at oracle.com> wrote:
>>On 07/29/2015 07:30 PM, Dean Long wrote:
>>> On 7/29/2015 7:01 AM, Aleksey Shipilev wrote:
>>>> On 07/27/2015 01:21 PM, Andrew Haley wrote:
>>>>> On 27/07/15 10:13, Aleksey Shipilev wrote:
>>>>>> Thanks Goetz! Fixed the assembler_ppc.inline.hpp.
>>>>>>
>>>>>> Andrew/Edward, are you OK with AArch64 part?
>>>>>>    http://cr.openjdk.java.net/~shade/8131682/webrev.02/
>>>>> I agree that it looks good.
>>>> So, we have reviews from Dean Long, Goetz Lindenmaier, Andrew Dinn, and
>>>> Andrew Haley. Still no Capital (R)eviewers.
>>>>
>>>> Otherwise, I think we are good to go. I respinned the JPRT with
>>>> open+closed sources, and it would seem the changes in closed sources are
>>>> not required.
>>> 
>>> The changes to sparc and ppc may not be required anymore.
>>
>>Excellent, please sponsor!
>>  http://cr.openjdk.java.net/~shade/8131682/8131682.changeset
>>
>>Thanks,
>>-Aleksey
>>
>>

From aleksey.shipilev at oracle.com  Tue Aug 11 09:27:38 2015
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Tue, 11 Aug 2015 12:27:38 +0300
Subject: [aarch64-port-dev ] aarch64-port-dev ] aarch64-port-dev ] RFR
 (S) 8131682: C1 should use multibyte nops everywhere
In-Reply-To: <g6jxe628pno3gl97tqu3sfod.1439236623876@email.android.com>
References: <g6jxe628pno3gl97tqu3sfod.1439236623876@email.android.com>
Message-ID: <55C9C00A.3040302@oracle.com>

Hi Dean,

Ah yes, since we now use MacroAssembler::align to produce the effective
alignment, we can drop the platform-specific changes. ARM and PPC ports
may rewire their own MacroAssemblers if there are potentially better nop
sequences.

New changeset:
  http://cr.openjdk.java.net/~shade/8131682/8131682.changeset

Tested it builds and runs with full JPRT.

See the "Reviewed-by" line there. I think there are Reviewers there...

Thanks,
-Aleksey

On 08/10/2015 10:57 PM, Dean wrote:
> Did you get a Reviewer yet?
> 
> dl
> 
> 
> Dean <dean.long at oracle.com> wrote:
>> I can sponsor this.  How about removing the ppc, aarch64, and sparc changes and making sure it still builds?
>>
>> dl
>>
>>
>> Aleksey Shipilev <aleksey.shipilev at oracle.com> wrote:
>>> On 07/29/2015 07:30 PM, Dean Long wrote:
>>>> On 7/29/2015 7:01 AM, Aleksey Shipilev wrote:
>>>>> On 07/27/2015 01:21 PM, Andrew Haley wrote:
>>>>>> On 27/07/15 10:13, Aleksey Shipilev wrote:
>>>>>>> Thanks Goetz! Fixed the assembler_ppc.inline.hpp.
>>>>>>>
>>>>>>> Andrew/Edward, are you OK with AArch64 part?
>>>>>>>    http://cr.openjdk.java.net/~shade/8131682/webrev.02/
>>>>>> I agree that it looks good.
>>>>> So, we have reviews from Dean Long, Goetz Lindenmaier, Andrew Dinn, and
>>>>> Andrew Haley. Still no Capital (R)eviewers.
>>>>>
>>>>> Otherwise, I think we are good to go. I respinned the JPRT with
>>>>> open+closed sources, and it would seem the changes in closed sources are
>>>>> not required.
>>>>
>>>> The changes to sparc and ppc may not be required anymore.
>>>
>>> Excellent, please sponsor!
>>>  http://cr.openjdk.java.net/~shade/8131682/8131682.changeset
>>>
>>> Thanks,
>>> -Aleksey
>>>
>>>


From edward.nevill at gmail.com  Tue Aug 11 15:57:33 2015
From: edward.nevill at gmail.com (Edward Nevill)
Date: Tue, 11 Aug 2015 16:57:33 +0100
Subject: [aarch64-port-dev ] 8133352: aarch64: generates constrained
	unpredictable instructions
Message-ID: <1439308653.5920.16.camel@mylittlepony.linaroharston>

Hi,

Webrev http://cr.openjdk.java.net/~enevill/8133352/

fixes an issue reported by one of our partners where aarch64 jdk9 is still generating constrained unpredictable instructions.

The two cases being generates are

STXR Rs, Rt, [Rn] where Rs == Rt

and

LDAXP Rt1, Rt2, [Rn] where Rt1 == Rt2 (this case is only generated in the assembler smoke test and is not generated in real code)

On the particular vendors HW the behavior for these instructions is to generate a SIGILL.

Unfortunately the fix for this is non trivial, the reason being that

STXR Rs, Rt, [Rn]

requires Rs != Rt != Rn however we only have 2 scratch registers.

The solution I have adopted is to add an addition temp arg to the routines in macroAssembler and pass down an additional temp from the top level usage, but this involves a non trivial amount of code changes.

The alternative solution would be create a temp by pushing a register on the stack.

I am in the process of testing this but I want to get peoples feedback as to whether this is the right solution or whether there is some better way of doing it.

Thanks for your help,
Ed.


From vladimir.kozlov at oracle.com  Tue Aug 11 16:55:08 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 11 Aug 2015 09:55:08 -0700
Subject: [aarch64-port-dev ] 8133352: aarch64: generates constrained
 unpredictable instructions
In-Reply-To: <1439308653.5920.16.camel@mylittlepony.linaroharston>
References: <1439308653.5920.16.camel@mylittlepony.linaroharston>
Message-ID: <55CA28EC.7060109@oracle.com>

I think it depends how expensive push/pop on arm64.
In c2 generated code you may introduced spills to stack around GetAndAdd code since you use additional register (in 
.ad). So you are saving on stack anyway.
On other hand your changes (third temp) are not so big and I think acceptable.

Thanks,
Vladimir

On 8/11/15 8:57 AM, Edward Nevill wrote:
> Hi,
>
> Webrev http://cr.openjdk.java.net/~enevill/8133352/
>
> fixes an issue reported by one of our partners where aarch64 jdk9 is still generating constrained unpredictable instructions.
>
> The two cases being generates are
>
> STXR Rs, Rt, [Rn] where Rs == Rt
>
> and
>
> LDAXP Rt1, Rt2, [Rn] where Rt1 == Rt2 (this case is only generated in the assembler smoke test and is not generated in real code)
>
> On the particular vendors HW the behavior for these instructions is to generate a SIGILL.
>
> Unfortunately the fix for this is non trivial, the reason being that
>
> STXR Rs, Rt, [Rn]
>
> requires Rs != Rt != Rn however we only have 2 scratch registers.
>
> The solution I have adopted is to add an addition temp arg to the routines in macroAssembler and pass down an additional temp from the top level usage, but this involves a non trivial amount of code changes.
>
> The alternative solution would be create a temp by pushing a register on the stack.
>
> I am in the process of testing this but I want to get peoples feedback as to whether this is the right solution or whether there is some better way of doing it.
>
> Thanks for your help,
> Ed.
>
>

From aph at redhat.com  Wed Aug 12 16:10:00 2015
From: aph at redhat.com (Andrew Haley)
Date: Wed, 12 Aug 2015 17:10:00 +0100
Subject: [aarch64-port-dev ] JDK 8: Remove code in C2 to generate ldar/stlr
Message-ID: <55CB6FD8.1070203@redhat.com>

This pulls out all the old (and probably bogus) code which uses
acquire and release instructions.  I'm hoping Andrew Dinn will follow
up with a backport of his (much better) JDK9 code which requires no
changes to the code in share/ .

It cleanly passes jcstress.

Andrew.


-------------- next part --------------
# HG changeset patch
# User aph
# Date 1439395472 0
#      Wed Aug 12 16:04:32 2015 +0000
# Node ID 8ec803e97a0d578eaeaf8375ee295a5928eb546f
# Parent  4c3f7e682e4849e84211e7c27379900fd0571173
Remove code which uses load acquire and store release.  Revert to
plain old memory fences.

diff -r 4c3f7e682e48 -r 8ec803e97a0d src/cpu/aarch64/vm/aarch64.ad
--- a/src/cpu/aarch64/vm/aarch64.ad	Fri Jul 31 16:29:03 2015 +0100
+++ b/src/cpu/aarch64/vm/aarch64.ad	Wed Aug 12 16:04:32 2015 +0000
@@ -962,35 +962,10 @@
   }
 };
 
-  bool preceded_by_ordered_load(const Node *barrier);
-
 %}
 
 source %{
 
-  // AArch64 has load acquire and store release instructions which we
-  // use for ordered memory accesses, e.g. for volatiles.  The ideal
-  // graph generator also inserts memory barriers around volatile
-  // accesses, and we don't want to generate both barriers and acq/rel
-  // instructions.  So, when we emit a MemBarAcquire we look back in
-  // the ideal graph for an ordered load and only emit the barrier if
-  // we don't find one.
-
-bool preceded_by_ordered_load(const Node *barrier) {
-  Node *x = barrier->lookup(TypeFunc::Parms);
-
-  if (! x)
-    return false;
-
-  if (x->is_DecodeNarrowPtr())
-    x = x->in(1);
-
-  if (x->is_Load())
-    return ! x->as_Load()->is_unordered();
-
-  return false;
-}
-
 #define __ _masm.
 
 // advance declaratuons for helper functions to convert register
@@ -5620,7 +5595,7 @@
 instruct loadB(iRegINoSp dst, memory mem)
 %{
   match(Set dst (LoadB mem));
-  predicate(n->as_Load()->is_unordered());
+  // predicate(n->as_Load()->is_unordered());
 
   ins_cost(4 * INSN_COST);
   format %{ "ldrsbw  $dst, $mem\t# byte" %}
@@ -5634,7 +5609,7 @@
 instruct loadB2L(iRegLNoSp dst, memory mem)
 %{
   match(Set dst (ConvI2L (LoadB mem)));
-  predicate(n->in(1)->as_Load()->is_unordered());
+  // predicate(n->in(1)->as_Load()->is_unordered());
 
   ins_cost(4 * INSN_COST);
   format %{ "ldrsb  $dst, $mem\t# byte" %}
@@ -5648,7 +5623,7 @@
 instruct loadUB(iRegINoSp dst, memory mem)
 %{
   match(Set dst (LoadUB mem));
-  predicate(n->as_Load()->is_unordered());
+  // predicate(n->as_Load()->is_unordered());
 
   ins_cost(4 * INSN_COST);
   format %{ "ldrbw  $dst, $mem\t# byte" %}
@@ -5662,7 +5637,7 @@
 instruct loadUB2L(iRegLNoSp dst, memory mem)
 %{
   match(Set dst (ConvI2L (LoadUB mem)));
-  predicate(n->in(1)->as_Load()->is_unordered());
+  // predicate(n->in(1)->as_Load()->is_unordered());
 
   ins_cost(4 * INSN_COST);
   format %{ "ldrb  $dst, $mem\t# byte" %}
@@ -5676,7 +5651,7 @@
 instruct loadS(iRegINoSp dst, memory mem)
 %{
   match(Set dst (LoadS mem));
-  predicate(n->as_Load()->is_unordered());
+  // predicate(n->as_Load()->is_unordered());
 
   ins_cost(4 * INSN_COST);
   format %{ "ldrshw  $dst, $mem\t# short" %}
@@ -5690,7 +5665,7 @@
 instruct loadS2L(iRegLNoSp dst, memory mem)
 %{
   match(Set dst (ConvI2L (LoadS mem)));
-  predicate(n->in(1)->as_Load()->is_unordered());
+  // predicate(n->in(1)->as_Load()->is_unordered());
 
   ins_cost(4 * INSN_COST);
   format %{ "ldrsh  $dst, $mem\t# short" %}
@@ -5704,7 +5679,7 @@
 instruct loadUS(iRegINoSp dst, memory mem)
 %{
   match(Set dst (LoadUS mem));
-  predicate(n->as_Load()->is_unordered());
+  // predicate(n->as_Load()->is_unordered());
 
   ins_cost(4 * INSN_COST);
   format %{ "ldrh  $dst, $mem\t# short" %}
@@ -5718,7 +5693,7 @@
 instruct loadUS2L(iRegLNoSp dst, memory mem)
 %{
   match(Set dst (ConvI2L (LoadUS mem)));
-  predicate(n->in(1)->as_Load()->is_unordered());
+  // predicate(n->in(1)->as_Load()->is_unordered());
 
   ins_cost(4 * INSN_COST);
   format %{ "ldrh  $dst, $mem\t# short" %}
@@ -5732,7 +5707,7 @@
 instruct loadI(iRegINoSp dst, memory mem)
 %{
   match(Set dst (LoadI mem));
-  predicate(n->as_Load()->is_unordered());
+  // predicate(n->as_Load()->is_unordered());
 
   ins_cost(4 * INSN_COST);
   format %{ "ldrw  $dst, $mem\t# int" %}
@@ -5746,7 +5721,7 @@
 instruct loadI2L(iRegLNoSp dst, memory mem)
 %{
   match(Set dst (ConvI2L (LoadI mem)));
-  predicate(n->in(1)->as_Load()->is_unordered());
+  // predicate(n->in(1)->as_Load()->is_unordered());
 
   ins_cost(4 * INSN_COST);
   format %{ "ldrsw  $dst, $mem\t# int" %}
@@ -5760,7 +5735,7 @@
 instruct loadUI2L(iRegLNoSp dst, memory mem, immL_32bits mask)
 %{
   match(Set dst (AndL (ConvI2L (LoadI mem)) mask));
-  predicate(n->in(1)->in(1)->as_Load()->is_unordered());
+  // predicate(n->in(1)->in(1)->as_Load()->is_unordered());
 
   ins_cost(4 * INSN_COST);
   format %{ "ldrw  $dst, $mem\t# int" %}
@@ -5774,7 +5749,7 @@
 instruct loadL(iRegLNoSp dst, memory mem)
 %{
   match(Set dst (LoadL mem));
-  predicate(n->as_Load()->is_unordered());
+  // predicate(n->as_Load()->is_unordered());
 
   ins_cost(4 * INSN_COST);
   format %{ "ldr  $dst, $mem\t# int" %}
@@ -5801,7 +5776,7 @@
 instruct loadP(iRegPNoSp dst, memory mem)
 %{
   match(Set dst (LoadP mem));
-  predicate(n->as_Load()->is_unordered());
+  // predicate(n->as_Load()->is_unordered());
 
   ins_cost(4 * INSN_COST);
   format %{ "ldr  $dst, $mem\t# ptr" %}
@@ -5815,7 +5790,7 @@
 instruct loadN(iRegNNoSp dst, memory mem)
 %{
   match(Set dst (LoadN mem));
-  predicate(n->as_Load()->is_unordered());
+  // predicate(n->as_Load()->is_unordered());
 
   ins_cost(4 * INSN_COST);
   format %{ "ldrw  $dst, $mem\t# compressed ptr" %}
@@ -5829,7 +5804,7 @@
 instruct loadKlass(iRegPNoSp dst, memory mem)
 %{
   match(Set dst (LoadKlass mem));
-  predicate(n->as_Load()->is_unordered());
+  // predicate(n->as_Load()->is_unordered());
 
   ins_cost(4 * INSN_COST);
   format %{ "ldr  $dst, $mem\t# class" %}
@@ -5843,7 +5818,7 @@
 instruct loadNKlass(iRegNNoSp dst, memory mem)
 %{
   match(Set dst (LoadNKlass mem));
-  predicate(n->as_Load()->is_unordered());
+  // predicate(n->as_Load()->is_unordered());
 
   ins_cost(4 * INSN_COST);
   format %{ "ldrw  $dst, $mem\t# compressed class ptr" %}
@@ -5857,7 +5832,7 @@
 instruct loadF(vRegF dst, memory mem)
 %{
   match(Set dst (LoadF mem));
-  predicate(n->as_Load()->is_unordered());
+  // predicate(n->as_Load()->is_unordered());
 
   ins_cost(4 * INSN_COST);
   format %{ "ldrs  $dst, $mem\t# float" %}
@@ -5871,7 +5846,7 @@
 instruct loadD(vRegD dst, memory mem)
 %{
   match(Set dst (LoadD mem));
-  predicate(n->as_Load()->is_unordered());
+  // predicate(n->as_Load()->is_unordered());
 
   ins_cost(4 * INSN_COST);
   format %{ "ldrd  $dst, $mem\t# double" %}
@@ -6102,7 +6077,7 @@
 instruct storeB(iRegIorL2I src, memory mem)
 %{
   match(Set mem (StoreB mem src));
-  predicate(n->as_Store()->is_unordered());
+//   predicate(n->as_Store()->is_unordered());
 
   ins_cost(INSN_COST);
   format %{ "strb  $src, $mem\t# byte" %}
@@ -6116,7 +6091,7 @@
 instruct storeimmB0(immI0 zero, memory mem)
 %{
   match(Set mem (StoreB mem zero));
-  predicate(n->as_Store()->is_unordered());
+//   predicate(n->as_Store()->is_unordered());
 
   ins_cost(INSN_COST);
   format %{ "strb zr, $mem\t# byte" %}
@@ -6130,7 +6105,7 @@
 instruct storeC(iRegIorL2I src, memory mem)
 %{
   match(Set mem (StoreC mem src));
-  predicate(n->as_Store()->is_unordered());
+//   predicate(n->as_Store()->is_unordered());
 
   ins_cost(INSN_COST);
   format %{ "strh  $src, $mem\t# short" %}
@@ -6143,7 +6118,7 @@
 instruct storeimmC0(immI0 zero, memory mem)
 %{
   match(Set mem (StoreC mem zero));
-  predicate(n->as_Store()->is_unordered());
+//   predicate(n->as_Store()->is_unordered());
 
   ins_cost(INSN_COST);
   format %{ "strh  zr, $mem\t# short" %}
@@ -6158,7 +6133,7 @@
 instruct storeI(iRegIorL2I src, memory mem)
 %{
   match(Set mem(StoreI mem src));
-  predicate(n->as_Store()->is_unordered());
+//   predicate(n->as_Store()->is_unordered());
 
   ins_cost(INSN_COST);
   format %{ "strw  $src, $mem\t# int" %}
@@ -6171,7 +6146,7 @@
 instruct storeimmI0(immI0 zero, memory mem)
 %{
   match(Set mem(StoreI mem zero));
-  predicate(n->as_Store()->is_unordered());
+//   predicate(n->as_Store()->is_unordered());
 
   ins_cost(INSN_COST);
   format %{ "strw  zr, $mem\t# int" %}
@@ -6185,7 +6160,7 @@
 instruct storeL(iRegL src, memory mem)
 %{
   match(Set mem (StoreL mem src));
-  predicate(n->as_Store()->is_unordered());
+//   predicate(n->as_Store()->is_unordered());
 
   ins_cost(INSN_COST);
   format %{ "str  $src, $mem\t# int" %}
@@ -6199,7 +6174,7 @@
 instruct storeimmL0(immL0 zero, memory mem)
 %{
   match(Set mem (StoreL mem zero));
-  predicate(n->as_Store()->is_unordered());
+//   predicate(n->as_Store()->is_unordered());
 
   ins_cost(INSN_COST);
   format %{ "str  zr, $mem\t# int" %}
@@ -6213,7 +6188,7 @@
 instruct storeP(iRegP src, memory mem)
 %{
   match(Set mem (StoreP mem src));
-  predicate(n->as_Store()->is_unordered());
+//   predicate(n->as_Store()->is_unordered());
 
   ins_cost(INSN_COST);
   format %{ "str  $src, $mem\t# ptr" %}
@@ -6227,7 +6202,7 @@
 instruct storeimmP0(immP0 zero, memory mem)
 %{
   match(Set mem (StoreP mem zero));
-  predicate(n->as_Store()->is_unordered());
+//   predicate(n->as_Store()->is_unordered());
 
   ins_cost(INSN_COST);
   format %{ "str zr, $mem\t# ptr" %}
@@ -6286,7 +6261,7 @@
 instruct storeN(iRegN src, memory mem)
 %{
   match(Set mem (StoreN mem src));
-  predicate(n->as_Store()->is_unordered());
+//   predicate(n->as_Store()->is_unordered());
 
   ins_cost(INSN_COST);
   format %{ "strw  $src, $mem\t# compressed ptr" %}
@@ -6300,8 +6275,9 @@
 %{
   match(Set mem (StoreN mem zero));
   predicate(Universe::narrow_oop_base() == NULL &&
-            Universe::narrow_klass_base() == NULL &&
-            n->as_Store()->is_unordered());
+            Universe::narrow_klass_base() == NULL//  &&
+	    // n->as_Store()->is_unordered()
+	    );
 
   ins_cost(INSN_COST);
   format %{ "strw  rheapbase, $mem\t# compressed ptr (rheapbase==0)" %}
@@ -6315,7 +6291,7 @@
 instruct storeF(vRegF src, memory mem)
 %{
   match(Set mem (StoreF mem src));
-  predicate(n->as_Store()->is_unordered());
+//   predicate(n->as_Store()->is_unordered());
 
   ins_cost(INSN_COST);
   format %{ "strs  $src, $mem\t# float" %}
@@ -6332,7 +6308,7 @@
 instruct storeD(vRegD src, memory mem)
 %{
   match(Set mem (StoreD mem src));
-  predicate(n->as_Store()->is_unordered());
+//   predicate(n->as_Store()->is_unordered());
 
   ins_cost(INSN_COST);
   format %{ "strd  $src, $mem\t# double" %}
@@ -6345,7 +6321,7 @@
 // Store Compressed Klass Pointer
 instruct storeNKlass(iRegN src, memory mem)
 %{
-  predicate(n->as_Store()->is_unordered());
+//   predicate(n->as_Store()->is_unordered());
   match(Set mem (StoreNKlass mem src));
 
   ins_cost(INSN_COST);
@@ -6395,312 +6371,6 @@
   ins_pipe(iload_prefetch);
 %}
 
-//  ---------------- volatile loads and stores ----------------
-
-// Load Byte (8 bit signed)
-instruct loadB_volatile(iRegINoSp dst, /* sync_memory*/indirect mem)
-%{
-  match(Set dst (LoadB mem));
-
-  ins_cost(VOLATILE_REF_COST);
-  format %{ "ldarsb  $dst, $mem\t# byte" %}
-
-  ins_encode(aarch64_enc_ldarsb(dst, mem));
-
-  ins_pipe(pipe_serial);
-%}
-
-// Load Byte (8 bit signed) into long
-instruct loadB2L_volatile(iRegLNoSp dst, /* sync_memory*/indirect mem)
-%{
-  match(Set dst (ConvI2L (LoadB mem)));
-
-  ins_cost(VOLATILE_REF_COST);
-  format %{ "ldarsb  $dst, $mem\t# byte" %}
-
-  ins_encode(aarch64_enc_ldarsb(dst, mem));
-
-  ins_pipe(pipe_serial);
-%}
-
-// Load Byte (8 bit unsigned)
-instruct loadUB_volatile(iRegINoSp dst, /* sync_memory*/indirect mem)
-%{
-  match(Set dst (LoadUB mem));
-
-  ins_cost(VOLATILE_REF_COST);
-  format %{ "ldarb  $dst, $mem\t# byte" %}
-
-  ins_encode(aarch64_enc_ldarb(dst, mem));
-
-  ins_pipe(pipe_serial);
-%}
-
-// Load Byte (8 bit unsigned) into long
-instruct loadUB2L_volatile(iRegLNoSp dst, /* sync_memory*/indirect mem)
-%{
-  match(Set dst (ConvI2L (LoadUB mem)));
-
-  ins_cost(VOLATILE_REF_COST);
-  format %{ "ldarb  $dst, $mem\t# byte" %}
-
-  ins_encode(aarch64_enc_ldarb(dst, mem));
-
-  ins_pipe(pipe_serial);
-%}
-
-// Load Short (16 bit signed)
-instruct loadS_volatile(iRegINoSp dst, /* sync_memory*/indirect mem)
-%{
-  match(Set dst (LoadS mem));
-
-  ins_cost(VOLATILE_REF_COST);
-  format %{ "ldarshw  $dst, $mem\t# short" %}
-
-  ins_encode(aarch64_enc_ldarshw(dst, mem));
-
-  ins_pipe(pipe_serial);
-%}
-
-instruct loadUS_volatile(iRegINoSp dst, /* sync_memory*/indirect mem)
-%{
-  match(Set dst (LoadUS mem));
-
-  ins_cost(VOLATILE_REF_COST);
-  format %{ "ldarhw  $dst, $mem\t# short" %}
-
-  ins_encode(aarch64_enc_ldarhw(dst, mem));
-
-  ins_pipe(pipe_serial);
-%}
-
-// Load Short/Char (16 bit unsigned) into long
-instruct loadUS2L_volatile(iRegLNoSp dst, /* sync_memory*/indirect mem)
-%{
-  match(Set dst (ConvI2L (LoadUS mem)));
-
-  ins_cost(VOLATILE_REF_COST);
-  format %{ "ldarh  $dst, $mem\t# short" %}
-
-  ins_encode(aarch64_enc_ldarh(dst, mem));
-
-  ins_pipe(pipe_serial);
-%}
-
-// Load Short/Char (16 bit signed) into long
-instruct loadS2L_volatile(iRegLNoSp dst, /* sync_memory*/indirect mem)
-%{
-  match(Set dst (ConvI2L (LoadS mem)));
-
-  ins_cost(VOLATILE_REF_COST);
-  format %{ "ldarh  $dst, $mem\t# short" %}
-
-  ins_encode(aarch64_enc_ldarsh(dst, mem));
-
-  ins_pipe(pipe_serial);
-%}
-
-// Load Integer (32 bit signed)
-instruct loadI_volatile(iRegINoSp dst, /* sync_memory*/indirect mem)
-%{
-  match(Set dst (LoadI mem));
-
-  ins_cost(VOLATILE_REF_COST);
-  format %{ "ldarw  $dst, $mem\t# int" %}
-
-  ins_encode(aarch64_enc_ldarw(dst, mem));
-
-  ins_pipe(pipe_serial);
-%}
-
-// Load Integer (32 bit unsigned) into long
-instruct loadUI2L_volatile(iRegLNoSp dst, /* sync_memory*/indirect mem, immL_32bits mask)
-%{
-  match(Set dst (AndL (ConvI2L (LoadI mem)) mask));
-
-  ins_cost(VOLATILE_REF_COST);
-  format %{ "ldarw  $dst, $mem\t# int" %}
-
-  ins_encode(aarch64_enc_ldarw(dst, mem));
-
-  ins_pipe(pipe_serial);
-%}
-
-// Load Long (64 bit signed)
-instruct loadL_volatile(iRegLNoSp dst, /* sync_memory*/indirect mem)
-%{
-  match(Set dst (LoadL mem));
-
-  ins_cost(VOLATILE_REF_COST);
-  format %{ "ldar  $dst, $mem\t# int" %}
-
-  ins_encode(aarch64_enc_ldar(dst, mem));
-
-  ins_pipe(pipe_serial);
-%}
-
-// Load Pointer
-instruct loadP_volatile(iRegPNoSp dst, /* sync_memory*/indirect mem)
-%{
-  match(Set dst (LoadP mem));
-
-  ins_cost(VOLATILE_REF_COST);
-  format %{ "ldar  $dst, $mem\t# ptr" %}
-
-  ins_encode(aarch64_enc_ldar(dst, mem));
-
-  ins_pipe(pipe_serial);
-%}
-
-// Load Compressed Pointer
-instruct loadN_volatile(iRegNNoSp dst, /* sync_memory*/indirect mem)
-%{
-  match(Set dst (LoadN mem));
-
-  ins_cost(VOLATILE_REF_COST);
-  format %{ "ldarw  $dst, $mem\t# compressed ptr" %}
-
-  ins_encode(aarch64_enc_ldarw(dst, mem));
-
-  ins_pipe(pipe_serial);
-%}
-
-// Load Float
-instruct loadF_volatile(vRegF dst, /* sync_memory*/indirect mem)
-%{
-  match(Set dst (LoadF mem));
-
-  ins_cost(VOLATILE_REF_COST);
-  format %{ "ldars  $dst, $mem\t# float" %}
-
-  ins_encode( aarch64_enc_fldars(dst, mem) );
-
-  ins_pipe(pipe_serial);
-%}
-
-// Load Double
-instruct loadD_volatile(vRegD dst, /* sync_memory*/indirect mem)
-%{
-  match(Set dst (LoadD mem));
-
-  ins_cost(VOLATILE_REF_COST);
-  format %{ "ldard  $dst, $mem\t# double" %}
-
-  ins_encode( aarch64_enc_fldard(dst, mem) );
-
-  ins_pipe(pipe_serial);
-%}
-
-// Store Byte
-instruct storeB_volatile(iRegIorL2I src, /* sync_memory*/indirect mem)
-%{
-  match(Set mem (StoreB mem src));
-
-  ins_cost(VOLATILE_REF_COST);
-  format %{ "stlrb  $src, $mem\t# byte" %}
-
-  ins_encode(aarch64_enc_stlrb(src, mem));
-
-  ins_pipe(pipe_serial);
-%}
-
-// Store Char/Short
-instruct storeC_volatile(iRegIorL2I src, /* sync_memory*/indirect mem)
-%{
-  match(Set mem (StoreC mem src));
-
-  ins_cost(VOLATILE_REF_COST);
-  format %{ "stlrh  $src, $mem\t# short" %}
-
-  ins_encode(aarch64_enc_stlrh(src, mem));
-
-  ins_pipe(pipe_serial);
-%}
-
-// Store Integer
-
-instruct storeI_volatile(iRegIorL2I src, /* sync_memory*/indirect mem)
-%{
-  match(Set mem(StoreI mem src));
-
-  ins_cost(VOLATILE_REF_COST);
-  format %{ "stlrw  $src, $mem\t# int" %}
-
-  ins_encode(aarch64_enc_stlrw(src, mem));
-
-  ins_pipe(pipe_serial);
-%}
-
-// Store Long (64 bit signed)
-instruct storeL_volatile(iRegL src, /* sync_memory*/indirect mem)
-%{
-  match(Set mem (StoreL mem src));
-
-  ins_cost(VOLATILE_REF_COST);
-  format %{ "stlr  $src, $mem\t# int" %}
-
-  ins_encode(aarch64_enc_stlr(src, mem));
-
-  ins_pipe(pipe_serial);
-%}
-
-// Store Pointer
-instruct storeP_volatile(iRegP src, /* sync_memory*/indirect mem)
-%{
-  match(Set mem (StoreP mem src));
-
-  ins_cost(VOLATILE_REF_COST);
-  format %{ "stlr  $src, $mem\t# ptr" %}
-
-  ins_encode(aarch64_enc_stlr(src, mem));
-
-  ins_pipe(pipe_serial);
-%}
-
-// Store Compressed Pointer
-instruct storeN_volatile(iRegN src, /* sync_memory*/indirect mem)
-%{
-  match(Set mem (StoreN mem src));
-
-  ins_cost(VOLATILE_REF_COST);
-  format %{ "stlrw  $src, $mem\t# compressed ptr" %}
-
-  ins_encode(aarch64_enc_stlrw(src, mem));
-
-  ins_pipe(pipe_serial);
-%}
-
-// Store Float
-instruct storeF_volatile(vRegF src, /* sync_memory*/indirect mem)
-%{
-  match(Set mem (StoreF mem src));
-
-  ins_cost(VOLATILE_REF_COST);
-  format %{ "stlrs  $src, $mem\t# float" %}
-
-  ins_encode( aarch64_enc_fstlrs(src, mem) );
-
-  ins_pipe(pipe_serial);
-%}
-
-// TODO
-// implement storeImmF0 and storeFImmPacked
-
-// Store Double
-instruct storeD_volatile(vRegD src, /* sync_memory*/indirect mem)
-%{
-  match(Set mem (StoreD mem src));
-
-  ins_cost(VOLATILE_REF_COST);
-  format %{ "stlrd  $src, $mem\t# double" %}
-
-  ins_encode( aarch64_enc_fstlrd(src, mem) );
-
-  ins_pipe(pipe_serial);
-%}
-
-//  ---------------- end of volatile loads and stores ----------------
-
 // ============================================================================
 // BSWAP Instructions
 
@@ -6918,20 +6588,6 @@
   ins_pipe(pipe_serial);
 %}
 
-instruct unnecessary_membar_acquire() %{
-  predicate(preceded_by_ordered_load(n));
-  match(MemBarAcquire);
-  ins_cost(0);
-
-  format %{ "membar_acquire (elided)" %}
-
-  ins_encode %{
-    __ block_comment("membar_acquire (elided)");
-  %}
-
-  ins_pipe(pipe_class_empty);
-%}
-
 instruct membar_acquire() %{
   match(MemBarAcquire);
   ins_cost(VOLATILE_REF_COST);
@@ -7016,7 +6672,7 @@
 
   ins_encode %{
     __ membar(Assembler::StoreLoad);
-  %}
+    %}
 
   ins_pipe(pipe_serial);
 %}
diff -r 4c3f7e682e48 -r 8ec803e97a0d src/share/vm/opto/graphKit.cpp
--- a/src/share/vm/opto/graphKit.cpp	Fri Jul 31 16:29:03 2015 +0100
+++ b/src/share/vm/opto/graphKit.cpp	Wed Aug 12 16:04:32 2015 +0000
@@ -3803,8 +3803,7 @@
 
   // Smash zero into card
   if( !UseConcMarkSweepGC ) {
-    __ store(__ ctrl(), card_adr, zero, bt, adr_type,
-	     MemNode:: NOT_AARCH64(release) AARCH64_ONLY(unordered));
+    __ store(__ ctrl(), card_adr, zero, bt, adr_type, MemNode::release);
   } else {
     // Specialized path for CM store barrier
     __ storeCM(__ ctrl(), card_adr, zero, oop_store, adr_idx, bt, adr_type);
diff -r 4c3f7e682e48 -r 8ec803e97a0d src/share/vm/opto/memnode.hpp
--- a/src/share/vm/opto/memnode.hpp	Fri Jul 31 16:29:03 2015 +0100
+++ b/src/share/vm/opto/memnode.hpp	Wed Aug 12 16:04:32 2015 +0000
@@ -540,16 +540,10 @@
   // Conservatively release stores of object references in order to
   // ensure visibility of object initialization.
   static inline MemOrd release_if_reference(const BasicType t) {
-#ifndef AARCH64
     const MemOrd mo = (t == T_ARRAY ||
                        t == T_ADDRESS || // Might be the address of an object reference (`boxing').
                        t == T_OBJECT) ? release : unordered;
     return mo;
-#else
-    // AArch64 doesn't need this because it emits barriers when an
-    // object is initialized.
-    return unordered;
-#endif
   }
 
   // Polymorphic factory method
diff -r 4c3f7e682e48 -r 8ec803e97a0d src/share/vm/opto/parse2.cpp
--- a/src/share/vm/opto/parse2.cpp	Fri Jul 31 16:29:03 2015 +0100
+++ b/src/share/vm/opto/parse2.cpp	Wed Aug 12 16:04:32 2015 +0000
@@ -1717,8 +1717,7 @@
     a = pop();                  // the array itself
     const TypeOopPtr* elemtype  = _gvn.type(a)->is_aryptr()->elem()->make_oopptr();
     const TypeAryPtr* adr_type = TypeAryPtr::OOPS;
-    Node* store = store_oop_to_array(control(), a, d, adr_type, c, elemtype, T_OBJECT,
-				     MemNode:: NOT_AARCH64(release) AARCH64_ONLY(unordered));
+    Node* store = store_oop_to_array(control(), a, d, adr_type, c, elemtype, T_OBJECT, MemNode::release);
     break;
   }
   case Bytecodes::_lastore: {
diff -r 4c3f7e682e48 -r 8ec803e97a0d src/share/vm/opto/parse3.cpp
--- a/src/share/vm/opto/parse3.cpp	Fri Jul 31 16:29:03 2015 +0100
+++ b/src/share/vm/opto/parse3.cpp	Wed Aug 12 16:04:32 2015 +0000
@@ -281,10 +281,7 @@
   // If reference is volatile, prevent following memory ops from
   // floating down past the volatile write.  Also prevents commoning
   // another volatile read.
-  // AArch64 uses store release (which does everything we need to keep
-  // the machine in order) but we still need a compiler barrier here.
-  if (is_vol)
-    insert_mem_bar(NOT_AARCH64(Op_MemBarRelease) AARCH64_ONLY(Op_MemBarCPUOrder));
+  if (is_vol)  insert_mem_bar(Op_MemBarRelease);
 
   // Compute address and memory type.
   int offset = field->offset_in_bytes();
@@ -325,7 +322,7 @@
   if (is_vol) {
     // If not multiple copy atomic, we do the MemBarVolatile before the load.
     if (!support_IRIW_for_not_multiple_copy_atomic_cpu) {
-      insert_mem_bar(NOT_AARCH64(Op_MemBarVolatile) AARCH64_ONLY(Op_MemBarCPUOrder)); // Use fat membar
+      insert_mem_bar(Op_MemBarVolatile); // Use fat membar
     }
     // Remember we wrote a volatile field.
     // For not multiple copy atomic cpu (ppc64) a barrier should be issued

From aph at redhat.com  Wed Aug 12 16:10:33 2015
From: aph at redhat.com (aph at redhat.com)
Date: Wed, 12 Aug 2015 16:10:33 +0000
Subject: [aarch64-port-dev ] hg: aarch64-port/jdk8/hotspot: Remove code
	which uses load acquire	and store release. Revert to
Message-ID: <201508121610.t7CGAXde009443@aojmv0008.oracle.com>

Changeset: 8ec803e97a0d
Author:    aph
Date:      2015-08-12 16:04 +0000
URL:       http://hg.openjdk.java.net/aarch64-port/jdk8/hotspot/rev/8ec803e97a0d

Remove code which uses load acquire and store release.  Revert to
plain old memory fences.

! src/cpu/aarch64/vm/aarch64.ad
! src/share/vm/opto/graphKit.cpp
! src/share/vm/opto/memnode.hpp
! src/share/vm/opto/parse2.cpp
! src/share/vm/opto/parse3.cpp


From adinn at redhat.com  Wed Aug 12 16:15:07 2015
From: adinn at redhat.com (Andrew Dinn)
Date: Wed, 12 Aug 2015 17:15:07 +0100
Subject: [aarch64-port-dev ] hg: aarch64-port/jdk8/hotspot: Remove code
 which uses load acquire and store release. Revert to
In-Reply-To: <201508121610.t7CGAXde009443@aojmv0008.oracle.com>
References: <201508121610.t7CGAXde009443@aojmv0008.oracle.com>
Message-ID: <55CB710B.5060200@redhat.com>

On 12/08/15 17:10, aph at redhat.com wrote:
> Changeset: 8ec803e97a0d
> Author:    aph
> Date:      2015-08-12 16:04 +0000
> URL:       http://hg.openjdk.java.net/aarch64-port/jdk8/hotspot/rev/8ec803e97a0d
> 
> Remove code which uses load acquire and store release.  Revert to
> plain old memory fences.
> 
> ! src/cpu/aarch64/vm/aarch64.ad
> ! src/share/vm/opto/graphKit.cpp
> ! src/share/vm/opto/memnode.hpp
> ! src/share/vm/opto/parse2.cpp
> ! src/share/vm/opto/parse3.cpp

That all looks right to me.

regards,


Andrew Dinn
-----------


From edward.nevill at gmail.com  Wed Aug 12 16:23:32 2015
From: edward.nevill at gmail.com (Edward Nevill)
Date: Wed, 12 Aug 2015 17:23:32 +0100
Subject: [aarch64-port-dev ] 8133352: aarch64: generates constrained
 unpredictable instructions
In-Reply-To: <55CA28EC.7060109@oracle.com>
References: <1439308653.5920.16.camel@mylittlepony.linaroharston>
	<55CA28EC.7060109@oracle.com>
Message-ID: <1439396612.4820.31.camel@mylittlepony.linaroharston>

On Tue, 2015-08-11 at 09:55 -0700, Vladimir Kozlov wrote:
> I think it depends how expensive push/pop on arm64.
> In c2 generated code you may introduced spills to stack around GetAndAdd code since you use additional register (in 
> .ad). So you are saving on stack anyway.
> On other hand your changes (third temp) are not so big and I think acceptable.
> On 8/11/15 8:57 AM, Edward Nevill wrote:
> > Hi,
> >
> > Webrev http://cr.openjdk.java.net/~enevill/8133352/

Hi Vladimir,

Thanks for that. Another possibility is to use the inverse operation to restore the result after it has been corrupted.

EG.

-#define ATOMIC_OP(LDXR, OP, STXR)                                       \
+#define ATOMIC_OP(LDXR, OP, IOP, STXR)                                       \
 void MacroAssembler::atomic_##OP(Register prev, RegisterOrConstant incr, Register addr) { \
   Register result = rscratch2;                                          \
   if (prev->is_valid())                                                 \
@@ -2120,14 +2125,15 @@
   bind(retry_load);                                                     \
   LDXR(result, addr);                                                   \
   OP(rscratch1, result, incr);                                          \
-  STXR(rscratch1, rscratch1, addr);                                     \
-  cbnzw(rscratch1, retry_load);                                         \
-  if (prev->is_valid() && prev != result)                               \
-    mov(prev, result);                                                  \
+  STXR(rscratch2, rscratch1, addr);                                     \
+  cbnzw(rscratch2, retry_load);                                         \
+  if (prev->is_valid() && prev != result) {                             \
+    IOP(prev, rscratch1, incr);                                         \
+  }                                                                     \
 }

-ATOMIC_OP(ldxr, add, stxr)
-ATOMIC_OP(ldxrw, addw, stxrw)
+ATOMIC_OP(ldxr, add, sub, stxr)
+ATOMIC_OP(ldxrw, addw, subw, stxrw)

This essentially creates the extra register we need by using the inverse operation to restore the result.

It doesn't win any beauty contests, but it is probably the most optimal.

All the best,
Ed.


From dean.long at oracle.com  Thu Aug 13 06:25:03 2015
From: dean.long at oracle.com (Dean Long)
Date: Wed, 12 Aug 2015 23:25:03 -0700
Subject: [aarch64-port-dev ] aarch64-port-dev ] aarch64-port-dev ] RFR
 (S) 8131682: C1 should use multibyte nops everywhere
In-Reply-To: <55C9C00A.3040302@oracle.com>
References: <g6jxe628pno3gl97tqu3sfod.1439236623876@email.android.com>
	<55C9C00A.3040302@oracle.com>
Message-ID: <55CC383F.2070105@oracle.com>

OK, I pushed it for you.

dl

On 8/11/2015 2:27 AM, Aleksey Shipilev wrote:
> Hi Dean,
>
> Ah yes, since we now use MacroAssembler::align to produce the effective
> alignment, we can drop the platform-specific changes. ARM and PPC ports
> may rewire their own MacroAssemblers if there are potentially better nop
> sequences.
>
> New changeset:
>    http://cr.openjdk.java.net/~shade/8131682/8131682.changeset
>
> Tested it builds and runs with full JPRT.
>
> See the "Reviewed-by" line there. I think there are Reviewers there...
>
> Thanks,
> -Aleksey
>
> On 08/10/2015 10:57 PM, Dean wrote:
>> Did you get a Reviewer yet?
>>
>> dl
>>
>>
>> Dean <dean.long at oracle.com> wrote:
>>> I can sponsor this.  How about removing the ppc, aarch64, and sparc changes and making sure it still builds?
>>>
>>> dl
>>>
>>>
>>> Aleksey Shipilev <aleksey.shipilev at oracle.com> wrote:
>>>> On 07/29/2015 07:30 PM, Dean Long wrote:
>>>>> On 7/29/2015 7:01 AM, Aleksey Shipilev wrote:
>>>>>> On 07/27/2015 01:21 PM, Andrew Haley wrote:
>>>>>>> On 27/07/15 10:13, Aleksey Shipilev wrote:
>>>>>>>> Thanks Goetz! Fixed the assembler_ppc.inline.hpp.
>>>>>>>>
>>>>>>>> Andrew/Edward, are you OK with AArch64 part?
>>>>>>>>     http://cr.openjdk.java.net/~shade/8131682/webrev.02/
>>>>>>> I agree that it looks good.
>>>>>> So, we have reviews from Dean Long, Goetz Lindenmaier, Andrew Dinn, and
>>>>>> Andrew Haley. Still no Capital (R)eviewers.
>>>>>>
>>>>>> Otherwise, I think we are good to go. I respinned the JPRT with
>>>>>> open+closed sources, and it would seem the changes in closed sources are
>>>>>> not required.
>>>>> The changes to sparc and ppc may not be required anymore.
>>>> Excellent, please sponsor!
>>>>   http://cr.openjdk.java.net/~shade/8131682/8131682.changeset
>>>>
>>>> Thanks,
>>>> -Aleksey
>>>>
>>>>
>


From adinn at redhat.com  Thu Aug 13 08:58:02 2015
From: adinn at redhat.com (Andrew Dinn)
Date: Thu, 13 Aug 2015 09:58:02 +0100
Subject: [aarch64-port-dev ] Fwd: Re: RFR: 8078743: AARCH64: Extend use of
 stlr to cater for volatile object stores
In-Reply-To: <55CB3FEC.1070709@redhat.com>
References: <55CB3FEC.1070709@redhat.com>
Message-ID: <55CC5C1A.4060804@redhat.com>

Apologies, I replied only to compiler-dev when I should also have posted
to aarch64-port-dev. Could someone other than Vladimir please review
this patch? Also, I will need a sponsor to commit it to hs-comp if/when
it passes review. Thanks!

regards,


Andrew Dinn
-----------

-------- Forwarded Message --------
Subject: Re: RFR: 8078743: AARCH64: Extend use of stlr to cater for
volatile object stores
Date: Wed, 12 Aug 2015 13:45:32 +0100
From: Andrew Dinn <adinn at redhat.com>
To: Vladimir Kozlov <vladimir.kozlov at oracle.com>,
hotspot-compiler-dev at openjdk.java.net

Hi Vladimir,

Apologies for the delay in responding to your feedback -- I was
traveling for a team meeting all of last week.

Here is a revised webrev which includes all the code changes you suggested

  http://cr.openjdk.java.net/~adinn/8078743/webrev.04

Also, as requested I did some testing on the two AArch64 machines to
which I have access. Does it help? Short answer: yes it is well worth
doing as it causes no harm on the sort of architecture where you would
expect no benefit and helps a lot on the sort of architecture where you
would expect it to help. More details below.

regards,


Andrew Dinn
-----------

The Tests
---------

I ran some simple tests using the jmh micro-benchmark harness, first
using the old style dmb based implementation (i.e. passing
-XX:+UseBarriersForVolatile) and then using the new style stlr-based
implementation (using -XX:-UseBarriersForVolatile). Each test was run in
each of the 5 relevant GC configs:

  +G1GC
  +CMS +UseCondCardMark
  +CMS -UseCondCardMark
  +Par +UseCondCardMark
  +Par -UseCondCardMark

The tests were derived from Alexey Shipilev's recently posted CAS test,
tweaked to do volatile stores instead of CASes. Each test employs a
single thread which repeatedly writes a volatile field
(AtomicReference<Object>.set). A delay call follows each write
(BlackHole.consumeCPU) with the delay argument varying from 0 to 64. A
single AtomicReference instance is employed throughout the test.

Test one always writes null; test two always writes a fixed object; test
three writes an object newly allocated at each write (example source for
the null write test is included below). This range of tests allows
various elements of the write barrier to be omitted at generate time or
run time, depending upon the GC config.

In each case the result was recorded as the average number of
nanoseconds per write operation (ns/op). I am afraid I am not in a
position to give the actual timings on any specific architecture or,
indeed, name what hardware was used. However, I can give a qualitiative
account of what I found and it pretty much accords with Andrew Haley's
expectations.

Main Results
------------

  With the first (O-O-O CPU) implementation of AArch64 there was no
statistically significant variation in the ns/op.

  With the other (simple pipeline CPU) implementation for most of the
test space there was a very significant improvement (decrease) in ns/op
for the stlr version when compared against the equivalent barrier
implementation

Detailed Results
----------------

The second machine showed some interesting variations in performance
improvement which are worth mentioning:

  - in the best case ns/op was cut by 50% (CMS - UseCondCardMark,
backoff 0, old value write)

  - at backoff 0 in most cases ns/op was cut by ~30-35% for null/old
value write and ~15-20% for young value write

  - at backoff 64 in most cases ns/op was cut by ~5-10%  (n.b. this is
mostly to do with the addition of wasted backoff time -- there was only
a small decrease in the absolute times)

  - with most GC configs greatest improvement was with old value write,
least improvement with young value write

the above general results did not apply for 2 specific data points

  - with CMS + UseCondCardMark no significant %ge change was seen for
old value writes

  - with Par + UseCondCardMark no significant %ge change was seen for
young value writes

These last 2 results are a bit odd.

  For both old and young puts CMS + UseCondCardMark requires a dmb ish
after the stlr to ensure the card read does not float above the volatile
store. For null puts the dmb gets elided (type info tells the compiler
no card mark needed). So, the difference here between old and young
writes is unexpected but must be down to the effect of conditional card
marking rather than the barriers vs stlr.

  Par + UseCondCardMark employs no synchronization for the card mark.
Once again the null write case will not need a card mark but the other
two cases will. So, once again the disparity in the improvement between
these two cases is unexpected but must be down to the effect of
conditional card marking rather than the barriers vs stlr.

Example Test Class
-------------------

package org.openjdk;

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;

import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

@Warmup(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(3)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Benchmark)
public class VolSetNull {

    AtomicReference<Object> ref;

    @Param({"0", "1", "2", "4", "8", "16", "32", "64"})
    int backoff;

    @Setup
    public void setup() {
        ref = new AtomicReference<>();
        ref.set(new Object());
    }

    @Benchmark
    public boolean test() {
        Blackhole.consumeCPU(backoff);
        ref.set(null);
	return true;
    }
}


From adinn at redhat.com  Fri Aug 14 09:01:02 2015
From: adinn at redhat.com (Andrew Dinn)
Date: Fri, 14 Aug 2015 10:01:02 +0100
Subject: [aarch64-port-dev ] RFR: 8078743: AARCH64: Extend use of stlr
 to cater for volatile object stores
In-Reply-To: <55CB3FEC.1070709@redhat.com>
References: <55B8B134.2050803@redhat.com> <55BA78B7.7030300@oracle.com>
	<55CB3FEC.1070709@redhat.com>
Message-ID: <55CDAE4E.5060301@redhat.com>

Any chance of getting my updated patch reviewed by 2 people and
sponsored it to go into hs-comp?

I now have the follow on patch for issue ready for review

  8080293: AARCH64: Remove unnecessary dmbs from generated CAS code

So it would be very helpful if this one could be checked. Thanks!

regards,

Andrew Dinn
-----------


On 12/08/15 13:45, Andrew Dinn wrote:
> Hi Vladimir,
> 
> Apologies for the delay in responding to your feedback -- I was
> traveling for a team meeting all of last week.
> 
> Here is a revised webrev which includes all the code changes you suggested
> 
>   http://cr.openjdk.java.net/~adinn/8078743/webrev.04
> 
> Also, as requested I did some testing on the two AArch64 machines to
> which I have access. Does it help? Short answer: yes it is well worth
> doing as it causes no harm on the sort of architecture where you would
> expect no benefit and helps a lot on the sort of architecture where you
> would expect it to help. More details below.
> 
> regards,
> 
> 
> Andrew Dinn
> -----------
> 
> The Tests
> ---------
> 
> I ran some simple tests using the jmh micro-benchmark harness, first
> using the old style dmb based implementation (i.e. passing
> -XX:+UseBarriersForVolatile) and then using the new style stlr-based
> implementation (using -XX:-UseBarriersForVolatile). Each test was run in
> each of the 5 relevant GC configs:
> 
>   +G1GC
>   +CMS +UseCondCardMark
>   +CMS -UseCondCardMark
>   +Par +UseCondCardMark
>   +Par -UseCondCardMark
> 
> The tests were derived from Alexey Shipilev's recently posted CAS test,
> tweaked to do volatile stores instead of CASes. Each test employs a
> single thread which repeatedly writes a volatile field
> (AtomicReference<Object>.set). A delay call follows each write
> (BlackHole.consumeCPU) with the delay argument varying from 0 to 64. A
> single AtomicReference instance is employed throughout the test.
> 
> Test one always writes null; test two always writes a fixed object; test
> three writes an object newly allocated at each write (example source for
> the null write test is included below). This range of tests allows
> various elements of the write barrier to be omitted at generate time or
> run time, depending upon the GC config.
> 
> In each case the result was recorded as the average number of
> nanoseconds per write operation (ns/op). I am afraid I am not in a
> position to give the actual timings on any specific architecture or,
> indeed, name what hardware was used. However, I can give a qualitiative
> account of what I found and it pretty much accords with Andrew Haley's
> expectations.
> 
> Main Results
> ------------
> 
>   With the first (O-O-O CPU) implementation of AArch64 there was no
> statistically significant variation in the ns/op.
> 
>   With the other (simple pipeline CPU) implementation for most of the
> test space there was a very significant improvement (decrease) in ns/op
> for the stlr version when compared against the equivalent barrier
> implementation
> 
> Detailed Results
> ----------------
> 
> The second machine showed some interesting variations in performance
> improvement which are worth mentioning:
> 
>   - in the best case ns/op was cut by 50% (CMS - UseCondCardMark,
> backoff 0, old value write)
> 
>   - at backoff 0 in most cases ns/op was cut by ~30-35% for null/old
> value write and ~15-20% for young value write
> 
>   - at backoff 64 in most cases ns/op was cut by ~5-10%  (n.b. this is
> mostly to do with the addition of wasted backoff time -- there was only
> a small decrease in the absolute times)
> 
>   - with most GC configs greatest improvement was with old value write,
> least improvement with young value write
> 
> the above general results did not apply for 2 specific data points
> 
>   - with CMS + UseCondCardMark no significant %ge change was seen for
> old value writes
> 
>   - with Par + UseCondCardMark no significant %ge change was seen for
> young value writes
> 
> These last 2 results are a bit odd.
> 
>   For both old and young puts CMS + UseCondCardMark requires a dmb ish
> after the stlr to ensure the card read does not float above the volatile
> store. For null puts the dmb gets elided (type info tells the compiler
> no card mark needed). So, the difference here between old and young
> writes is unexpected but must be down to the effect of conditional card
> marking rather than the barriers vs stlr.
> 
>   Par + UseCondCardMark employs no synchronization for the card mark.
> Once again the null write case will not need a card mark but the other
> two cases will. So, once again the disparity in the improvement between
> these two cases is unexpected but must be down to the effect of
> conditional card marking rather than the barriers vs stlr.
> 
> Example Test Class
> -------------------
> 
> package org.openjdk;
> 
> import org.openjdk.jmh.annotations.*;
> import org.openjdk.jmh.infra.Blackhole;
> 
> import java.util.concurrent.TimeUnit;
> import java.util.concurrent.atomic.AtomicReference;
> 
> @Warmup(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS)
> @Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
> @Fork(3)
> @BenchmarkMode(Mode.AverageTime)
> @OutputTimeUnit(TimeUnit.NANOSECONDS)
> @State(Scope.Benchmark)
> public class VolSetNull {
> 
>     AtomicReference<Object> ref;
> 
>     @Param({"0", "1", "2", "4", "8", "16", "32", "64"})
>     int backoff;
> 
>     @Setup
>     public void setup() {
>         ref = new AtomicReference<>();
>         ref.set(new Object());
>     }
> 
>     @Benchmark
>     public boolean test() {
>         Blackhole.consumeCPU(backoff);
>         ref.set(null);
> 	return true;
>     }
> }
> 

-- 
regards,


Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in UK and Wales under Company Registration No. 3798903
Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters
(USA), Michael O'Neill (Ireland)

From edward.nevill at gmail.com  Tue Aug 18 13:07:11 2015
From: edward.nevill at gmail.com (Edward Nevill)
Date: Tue, 18 Aug 2015 14:07:11 +0100
Subject: [aarch64-port-dev ] 8133352: aarch64: generates constrained
 unpredictable instructions
In-Reply-To: <1439396612.4820.31.camel@mylittlepony.linaroharston>
References: <1439308653.5920.16.camel@mylittlepony.linaroharston>
	<55CA28EC.7060109@oracle.com>
	<1439396612.4820.31.camel@mylittlepony.linaroharston>
Message-ID: <1439903231.5709.5.camel@mylittlepony.linaroharston>

Hi,

Given that there has been no objections to my proposed solution I have prepared a webrev based on this.

http://cr.openjdk.java.net/~enevill/8133352/webrev.01

The original jira issue is here

https://bugs.openjdk.java.net/browse/JDK-8133352

I have tested with jtreg hotspot and langtools. Results before and after were identical.

Hotspot: Test results: passed: 883; failed: 2; error: 10
Langtools: Test results: passed: 3,260; failed: 2

Please review and if OK I will push,

Thanks,
Ed.

On Wed, 2015-08-12 at 17:23 +0100, Edward Nevill wrote:
> On Tue, 2015-08-11 at 09:55 -0700, Vladimir Kozlov wrote:
> > I think it depends how expensive push/pop on arm64.
> > In c2 generated code you may introduced spills to stack around GetAndAdd code since you use additional register (in 
> > .ad). So you are saving on stack anyway.
> > On other hand your changes (third temp) are not so big and I think acceptable.
> > On 8/11/15 8:57 AM, Edward Nevill wrote:
> -#define ATOMIC_OP(LDXR, OP, STXR)                                       \
> +#define ATOMIC_OP(LDXR, OP, IOP, STXR)                                       \
>  void MacroAssembler::atomic_##OP(Register prev, RegisterOrConstant incr, Register addr) { \
>    Register result = rscratch2;                                          \
>    if (prev->is_valid())                                                 \
> @@ -2120,14 +2125,15 @@
>    bind(retry_load);                                                     \
>    LDXR(result, addr);                                                   \
>    OP(rscratch1, result, incr);                                          \
> -  STXR(rscratch1, rscratch1, addr);                                     \
> -  cbnzw(rscratch1, retry_load);                                         \
> -  if (prev->is_valid() && prev != result)                               \
> -    mov(prev, result);                                                  \
> +  STXR(rscratch2, rscratch1, addr);                                     \
> +  cbnzw(rscratch2, retry_load);                                         \
> +  if (prev->is_valid() && prev != result) {                             \
> +    IOP(prev, rscratch1, incr);                                         \
> +  }                                                                     \
>  }
> 
> -ATOMIC_OP(ldxr, add, stxr)
> -ATOMIC_OP(ldxrw, addw, stxrw)
> +ATOMIC_OP(ldxr, add, sub, stxr)
> +ATOMIC_OP(ldxrw, addw, subw, stxrw)
> 
> This essentially creates the extra register we need by using the inverse operation to restore the result.
> 
> It doesn't win any beauty contests, but it is probably the most optimal.


From adinn at redhat.com  Tue Aug 18 14:50:07 2015
From: adinn at redhat.com (Andrew Dinn)
Date: Tue, 18 Aug 2015 15:50:07 +0100
Subject: [aarch64-port-dev ] 8133352: aarch64: generates constrained
 unpredictable instructions
In-Reply-To: <1439903231.5709.5.camel@mylittlepony.linaroharston>
References: <1439308653.5920.16.camel@mylittlepony.linaroharston>
	<55CA28EC.7060109@oracle.com>
	<1439396612.4820.31.camel@mylittlepony.linaroharston>
	<1439903231.5709.5.camel@mylittlepony.linaroharston>
Message-ID: <55D3461F.8070405@redhat.com>

Hi Ed,

On 18/08/15 14:07, Edward Nevill wrote:
> Given that there has been no objections to my proposed solution I
> have prepared a webrev based on this.
> 
> http://cr.openjdk.java.net/~enevill/8133352/webrev.01
> 
> The original jira issue is here
> 
> https://bugs.openjdk.java.net/browse/JDK-8133352
> 
> I have tested with jtreg hotspot and langtools. Results before and
> after were identical.
> 
> Hotspot: Test results: passed: 883; failed: 2; error: 10 Langtools:
> Test results: passed: 3,260; failed: 2
> 
> Please review and if OK I will push,

Your change looks good to me.

regards,


Andrew Dinn
-----------

From aph at redhat.com  Tue Aug 18 17:56:19 2015
From: aph at redhat.com (Andrew Haley)
Date: Tue, 18 Aug 2015 18:56:19 +0100
Subject: [aarch64-port-dev ] 8133352: aarch64: generates constrained
 unpredictable instructions
In-Reply-To: <55D3461F.8070405@redhat.com>
References: <1439308653.5920.16.camel@mylittlepony.linaroharston>	<55CA28EC.7060109@oracle.com>	<1439396612.4820.31.camel@mylittlepony.linaroharston>	<1439903231.5709.5.camel@mylittlepony.linaroharston>
	<55D3461F.8070405@redhat.com>
Message-ID: <55D371C3.6030703@redhat.com>

On 08/18/2015 03:50 PM, Andrew Dinn wrote:
> Your change looks good to me.

Me too.

Andrew.


From c_ejohns at qti.qualcomm.com  Tue Aug 18 19:43:23 2015
From: c_ejohns at qti.qualcomm.com (Johnson, E Andrew)
Date: Tue, 18 Aug 2015 19:43:23 +0000
Subject: [aarch64-port-dev ] aarch64 build is broken
Message-ID: <cc806c23ce1c4111a86f8b7b45a2f0c0@nasanexm02f.na.qualcomm.com>

After updating to the most recent sources (either jdk9/dev or jdk9/hs-comp), I am getting the following error:
macroAssembler_aarch64.o: In function `MacroAssembler::patch_oop(unsigned char*, unsigned char*)':
$HOME/openjdk/jdk9/dev/hotspot/src/cpu/aarch64/vm/macroAssembler_aarch64.cpp:168: undefined reference to `oopDesc::encode_heap_oop(oopDesc*)'
collect2: error: ld returned 1 exit status
make[8]: *** [libjvm.so] Error 1

The source line in question has not changed since the last successful build, so something changed in some other checkin that is causing this error.  Has anyone else encountered this problem?  It is failing on both a native aarch64 build and also on a cross_compile build on x86_64.

-AndyJ


From aph at redhat.com  Wed Aug 19 07:32:49 2015
From: aph at redhat.com (Andrew Haley)
Date: Wed, 19 Aug 2015 08:32:49 +0100
Subject: [aarch64-port-dev ] aarch64 build is broken
In-Reply-To: <cc806c23ce1c4111a86f8b7b45a2f0c0@nasanexm02f.na.qualcomm.com>
References: <cc806c23ce1c4111a86f8b7b45a2f0c0@nasanexm02f.na.qualcomm.com>
Message-ID: <55D43121.3070702@redhat.com>

On 08/18/2015 08:43 PM, Johnson, E Andrew wrote:
> Has anyone else encountered this problem?

I don't think so; you'll have to debug the build to find out.
I guess you've already tried a clean build from scratch.

Andrew.


From edward.nevill at gmail.com  Wed Aug 19 12:10:06 2015
From: edward.nevill at gmail.com (Edward Nevill)
Date: Wed, 19 Aug 2015 13:10:06 +0100
Subject: [aarch64-port-dev ] RFR: 8133935: aarch64: fails to build from
	source
Message-ID: <1439986206.23660.11.camel@mint>

Hi,

http://cr.openjdk.java.net/~enevill/8133935/webrev/

fixes the following error in the build

macroAssembler_aarch64.o: In function `MacroAssembler::patch_oop(unsigned char*, unsigned char*)':
$HOME/openjdk/jdk9/dev/hotspot/src/cpu/aarch64/vm/macroAssembler_aarch64.cpp:168: undefined reference to `oopDesc::encode_heap_oop(oopDesc*)'
collect2: error: ld returned 1 exit status
make[8]: *** [libjvm.so] Error 1

Please review,
Thanks,
Ed.

PS: Both hs-comp and dev are broken, should I push to both? or just to hs-comp?


From edward.nevill at gmail.com  Wed Aug 19 13:24:05 2015
From: edward.nevill at gmail.com (Edward Nevill)
Date: Wed, 19 Aug 2015 14:24:05 +0100
Subject: [aarch64-port-dev ] RFR: 8133935: aarch64: fails to build from
	source
In-Reply-To: <55D47B72.40509@oracle.com>
References: <1439986206.23660.11.camel@mint>  <55D47B72.40509@oracle.com>
Message-ID: <1439990645.23660.64.camel@mint>

Sorry. I wasn't aware of that. New webrev.

http://cr.openjdk.java.net/~enevill/8133935/webrev.01/

Thanks,
Ed.

On Wed, 2015-08-19 at 08:49 -0400, Coleen Phillimore wrote:
> Sorry, I should spend more time on stuff like this.   The #include list 
> should be alphabetized.
> Coleen
> 
> On 8/19/15 8:10 AM, Edward Nevill wrote:
> > Hi,
> >
> > http://cr.openjdk.java.net/~enevill/8133935/webrev/
> >
> > fixes the following error in the build
> >
> > macroAssembler_aarch64.o: In function `MacroAssembler::patch_oop(unsigned char*, unsigned char*)':
> > $HOME/openjdk/jdk9/dev/hotspot/src/cpu/aarch64/vm/macroAssembler_aarch64.cpp:168: undefined reference to `oopDesc::encode_heap_oop(oopDesc*)'
> > collect2: error: ld returned 1 exit status
> > make[8]: *** [libjvm.so] Error 1
> >
> > Please review,
> > Thanks,
> > Ed.
> >
> > PS: Both hs-comp and dev are broken, should I push to both? or just to hs-comp?
> >
> >
> 


From edward.nevill at gmail.com  Wed Aug 19 13:30:13 2015
From: edward.nevill at gmail.com (Edward Nevill)
Date: Wed, 19 Aug 2015 14:30:13 +0100
Subject: [aarch64-port-dev ] RFR: 8078743: AARCH64: Extend use of stlr
 to cater for volatile object stores
In-Reply-To: <55CB3FEC.1070709@redhat.com>
References: <55B8B134.2050803@redhat.com> <55BA78B7.7030300@oracle.com>
	<55CB3FEC.1070709@redhat.com>
Message-ID: <1439991013.23660.72.camel@mint>

Hi Andrew,

I have tested this on two different partner platforms with G1GC, CMS and
Parallel GC with and without UseCondCardMark.

I have also reviewed the code and and happy with the code and comments.

Vladimir, if you are still happy with this may I go ahead and push this
on behalf of Andrew Dinn. The changes only affect aarch64.ad.

Many thanks,
Ed.

On Wed, 2015-08-12 at 13:45 +0100, Andrew Dinn wrote:
> Hi Vladimir,
> 
> Apologies for the delay in responding to your feedback -- I was
> traveling for a team meeting all of last week.
> 
> Here is a revised webrev which includes all the code changes you suggested
> 
>   http://cr.openjdk.java.net/~adinn/8078743/webrev.04
> 


From coleen.phillimore at oracle.com  Wed Aug 19 14:10:31 2015
From: coleen.phillimore at oracle.com (Coleen Phillimore)
Date: Wed, 19 Aug 2015 10:10:31 -0400
Subject: [aarch64-port-dev ] RFR: 8133935: aarch64: fails to build from
	source
In-Reply-To: <1439990645.23660.64.camel@mint>
References: <1439986206.23660.11.camel@mint> <55D47B72.40509@oracle.com>
	<1439990645.23660.64.camel@mint>
Message-ID: <55D48E57.8040805@oracle.com>


yes, this looks good.
Coleen

On 8/19/15 9:24 AM, Edward Nevill wrote:
> Sorry. I wasn't aware of that. New webrev.
>
> http://cr.openjdk.java.net/~enevill/8133935/webrev.01/
>
> Thanks,
> Ed.
>
> On Wed, 2015-08-19 at 08:49 -0400, Coleen Phillimore wrote:
>> Sorry, I should spend more time on stuff like this.   The #include list
>> should be alphabetized.
>> Coleen
>>
>> On 8/19/15 8:10 AM, Edward Nevill wrote:
>>> Hi,
>>>
>>> http://cr.openjdk.java.net/~enevill/8133935/webrev/
>>>
>>> fixes the following error in the build
>>>
>>> macroAssembler_aarch64.o: In function `MacroAssembler::patch_oop(unsigned char*, unsigned char*)':
>>> $HOME/openjdk/jdk9/dev/hotspot/src/cpu/aarch64/vm/macroAssembler_aarch64.cpp:168: undefined reference to `oopDesc::encode_heap_oop(oopDesc*)'
>>> collect2: error: ld returned 1 exit status
>>> make[8]: *** [libjvm.so] Error 1
>>>
>>> Please review,
>>> Thanks,
>>> Ed.
>>>
>>> PS: Both hs-comp and dev are broken, should I push to both? or just to hs-comp?
>>>
>>>
>


From adinn at redhat.com  Wed Aug 19 15:18:12 2015
From: adinn at redhat.com (adinn at redhat.com)
Date: Wed, 19 Aug 2015 15:18:12 +0000
Subject: [aarch64-port-dev ] hg: aarch64-port/jdk8: Added tag
	aarch64-jdk8u60-b24.2 for changeset	6b5ede6157df
Message-ID: <201508191518.t7JFICmY016017@aojmv0008.oracle.com>

Changeset: 9a781f9c1338
Author:    adinn
Date:      2015-08-19 16:16 +0100
URL:       http://hg.openjdk.java.net/aarch64-port/jdk8/rev/9a781f9c1338

Added tag aarch64-jdk8u60-b24.2 for changeset 6b5ede6157df

! .hgtags


From adinn at redhat.com  Wed Aug 19 15:18:16 2015
From: adinn at redhat.com (adinn at redhat.com)
Date: Wed, 19 Aug 2015 15:18:16 +0000
Subject: [aarch64-port-dev ] hg: aarch64-port/jdk8/corba: Added tag
	aarch64-jdk8u60-b24.2 for	changeset a6046ed1e2d5
Message-ID: <201508191518.t7JFIG6g016025@aojmv0008.oracle.com>

Changeset: 63a3a8c3af8c
Author:    adinn
Date:      2015-08-19 16:16 +0100
URL:       http://hg.openjdk.java.net/aarch64-port/jdk8/corba/rev/63a3a8c3af8c

Added tag aarch64-jdk8u60-b24.2 for changeset a6046ed1e2d5

! .hgtags


From adinn at redhat.com  Wed Aug 19 15:18:20 2015
From: adinn at redhat.com (adinn at redhat.com)
Date: Wed, 19 Aug 2015 15:18:20 +0000
Subject: [aarch64-port-dev ] hg: aarch64-port/jdk8/hotspot: Added tag
	aarch64-jdk8u60-b24.2 for	changeset 8ec803e97a0d
Message-ID: <201508191518.t7JFIKbC016035@aojmv0008.oracle.com>

Changeset: a6acc533dfef
Author:    adinn
Date:      2015-08-19 16:16 +0100
URL:       http://hg.openjdk.java.net/aarch64-port/jdk8/hotspot/rev/a6acc533dfef

Added tag aarch64-jdk8u60-b24.2 for changeset 8ec803e97a0d

! .hgtags


From adinn at redhat.com  Wed Aug 19 15:18:24 2015
From: adinn at redhat.com (adinn at redhat.com)
Date: Wed, 19 Aug 2015 15:18:24 +0000
Subject: [aarch64-port-dev ] hg: aarch64-port/jdk8/jaxp: Added tag
	aarch64-jdk8u60-b24.2 for	changeset 57a7047f7875
Message-ID: <201508191518.t7JFIOkj016056@aojmv0008.oracle.com>

Changeset: 46fdf7a8c4a2
Author:    adinn
Date:      2015-08-19 16:16 +0100
URL:       http://hg.openjdk.java.net/aarch64-port/jdk8/jaxp/rev/46fdf7a8c4a2

Added tag aarch64-jdk8u60-b24.2 for changeset 57a7047f7875

! .hgtags


From adinn at redhat.com  Wed Aug 19 15:18:28 2015
From: adinn at redhat.com (adinn at redhat.com)
Date: Wed, 19 Aug 2015 15:18:28 +0000
Subject: [aarch64-port-dev ] hg: aarch64-port/jdk8/jaxws: Added tag
	aarch64-jdk8u60-b24.2 for	changeset afbe78dc93b2
Message-ID: <201508191518.t7JFIS0s016072@aojmv0008.oracle.com>

Changeset: 9b9270adaf2d
Author:    adinn
Date:      2015-08-19 16:16 +0100
URL:       http://hg.openjdk.java.net/aarch64-port/jdk8/jaxws/rev/9b9270adaf2d

Added tag aarch64-jdk8u60-b24.2 for changeset afbe78dc93b2

! .hgtags


From adinn at redhat.com  Wed Aug 19 15:18:32 2015
From: adinn at redhat.com (adinn at redhat.com)
Date: Wed, 19 Aug 2015 15:18:32 +0000
Subject: [aarch64-port-dev ] hg: aarch64-port/jdk8/jdk: Added tag
	aarch64-jdk8u60-b24.2 for	changeset 0b8920048898
Message-ID: <201508191518.t7JFIXQW016082@aojmv0008.oracle.com>

Changeset: f4b06f2bc28d
Author:    adinn
Date:      2015-08-19 16:16 +0100
URL:       http://hg.openjdk.java.net/aarch64-port/jdk8/jdk/rev/f4b06f2bc28d

Added tag aarch64-jdk8u60-b24.2 for changeset 0b8920048898

! .hgtags


From adinn at redhat.com  Wed Aug 19 15:18:37 2015
From: adinn at redhat.com (adinn at redhat.com)
Date: Wed, 19 Aug 2015 15:18:37 +0000
Subject: [aarch64-port-dev ] hg: aarch64-port/jdk8/langtools: Added tag
	aarch64-jdk8u60-b24.2 for	changeset 40a63630eb87
Message-ID: <201508191518.t7JFIceQ016092@aojmv0008.oracle.com>

Changeset: 847cdcb2d035
Author:    adinn
Date:      2015-08-19 16:16 +0100
URL:       http://hg.openjdk.java.net/aarch64-port/jdk8/langtools/rev/847cdcb2d035

Added tag aarch64-jdk8u60-b24.2 for changeset 40a63630eb87

! .hgtags


From adinn at redhat.com  Wed Aug 19 15:18:42 2015
From: adinn at redhat.com (adinn at redhat.com)
Date: Wed, 19 Aug 2015 15:18:42 +0000
Subject: [aarch64-port-dev ] hg: aarch64-port/jdk8/nashorn: Added tag
	aarch64-jdk8u60-b24.2 for	changeset db6228024490
Message-ID: <201508191518.t7JFIgLf016100@aojmv0008.oracle.com>

Changeset: 748cb101e6a0
Author:    adinn
Date:      2015-08-19 16:16 +0100
URL:       http://hg.openjdk.java.net/aarch64-port/jdk8/nashorn/rev/748cb101e6a0

Added tag aarch64-jdk8u60-b24.2 for changeset db6228024490

! .hgtags


From vladimir.kozlov at oracle.com  Wed Aug 19 16:33:49 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 19 Aug 2015 09:33:49 -0700
Subject: [aarch64-port-dev ] 8133352: aarch64: generates constrained
 unpredictable instructions
In-Reply-To: <1439903231.5709.5.camel@mylittlepony.linaroharston>
References: <1439308653.5920.16.camel@mylittlepony.linaroharston>
	<55CA28EC.7060109@oracle.com>
	<1439396612.4820.31.camel@mylittlepony.linaroharston>
	<1439903231.5709.5.camel@mylittlepony.linaroharston>
Message-ID: <55D4AFED.5000305@oracle.com>

Looks fine to me.

I did not see any comments from our colleges (from RH) who works on 
arm64. Are they agree with this change?

Thanks,
Vladimir

On 8/18/15 6:07 AM, Edward Nevill wrote:
> Hi,
>
> Given that there has been no objections to my proposed solution I have prepared a webrev based on this.
>
> http://cr.openjdk.java.net/~enevill/8133352/webrev.01
>
> The original jira issue is here
>
> https://bugs.openjdk.java.net/browse/JDK-8133352
>
> I have tested with jtreg hotspot and langtools. Results before and after were identical.
>
> Hotspot: Test results: passed: 883; failed: 2; error: 10
> Langtools: Test results: passed: 3,260; failed: 2
>
> Please review and if OK I will push,
>
> Thanks,
> Ed.
>
> On Wed, 2015-08-12 at 17:23 +0100, Edward Nevill wrote:
>> On Tue, 2015-08-11 at 09:55 -0700, Vladimir Kozlov wrote:
>>> I think it depends how expensive push/pop on arm64.
>>> In c2 generated code you may introduced spills to stack around GetAndAdd code since you use additional register (in
>>> .ad). So you are saving on stack anyway.
>>> On other hand your changes (third temp) are not so big and I think acceptable.
>>> On 8/11/15 8:57 AM, Edward Nevill wrote:
>> -#define ATOMIC_OP(LDXR, OP, STXR)                                       \
>> +#define ATOMIC_OP(LDXR, OP, IOP, STXR)                                       \
>>   void MacroAssembler::atomic_##OP(Register prev, RegisterOrConstant incr, Register addr) { \
>>     Register result = rscratch2;                                          \
>>     if (prev->is_valid())                                                 \
>> @@ -2120,14 +2125,15 @@
>>     bind(retry_load);                                                     \
>>     LDXR(result, addr);                                                   \
>>     OP(rscratch1, result, incr);                                          \
>> -  STXR(rscratch1, rscratch1, addr);                                     \
>> -  cbnzw(rscratch1, retry_load);                                         \
>> -  if (prev->is_valid() && prev != result)                               \
>> -    mov(prev, result);                                                  \
>> +  STXR(rscratch2, rscratch1, addr);                                     \
>> +  cbnzw(rscratch2, retry_load);                                         \
>> +  if (prev->is_valid() && prev != result) {                             \
>> +    IOP(prev, rscratch1, incr);                                         \
>> +  }                                                                     \
>>   }
>>
>> -ATOMIC_OP(ldxr, add, stxr)
>> -ATOMIC_OP(ldxrw, addw, stxrw)
>> +ATOMIC_OP(ldxr, add, sub, stxr)
>> +ATOMIC_OP(ldxrw, addw, subw, stxrw)
>>
>> This essentially creates the extra register we need by using the inverse operation to restore the result.
>>
>> It doesn't win any beauty contests, but it is probably the most optimal.
>
>
>

From aph at redhat.com  Wed Aug 19 16:37:26 2015
From: aph at redhat.com (Andrew Haley)
Date: Wed, 19 Aug 2015 17:37:26 +0100
Subject: [aarch64-port-dev ] 8133352: aarch64: generates constrained
 unpredictable instructions
In-Reply-To: <55D4AFED.5000305@oracle.com>
References: <1439308653.5920.16.camel@mylittlepony.linaroharston>	<55CA28EC.7060109@oracle.com>	<1439396612.4820.31.camel@mylittlepony.linaroharston>	<1439903231.5709.5.camel@mylittlepony.linaroharston>
	<55D4AFED.5000305@oracle.com>
Message-ID: <55D4B0C6.7050803@redhat.com>

On 08/19/2015 05:33 PM, Vladimir Kozlov wrote:
> I did not see any comments from our colleges (from RH) who works on 
> arm64. Are they agree with this change?

Oh yes, absolutely.

Andrew.


From adinn at redhat.com  Wed Aug 19 16:38:59 2015
From: adinn at redhat.com (Andrew Dinn)
Date: Wed, 19 Aug 2015 17:38:59 +0100
Subject: [aarch64-port-dev ] 8133352: aarch64: generates constrained
 unpredictable instructions
In-Reply-To: <55D4AFED.5000305@oracle.com>
References: <1439308653.5920.16.camel@mylittlepony.linaroharston>
	<55CA28EC.7060109@oracle.com>
	<1439396612.4820.31.camel@mylittlepony.linaroharston>
	<1439903231.5709.5.camel@mylittlepony.linaroharston>
	<55D4AFED.5000305@oracle.com>
Message-ID: <55D4B123.5060403@redhat.com>

On 19/08/15 17:33, Vladimir Kozlov wrote:
> Looks fine to me.
> 
> I did not see any comments from our colleges (from RH) who works on
> arm64. Are they agree with this change?

Hmm, maybe the mail server is being slow. Andrew Haley and I both gave
it a thumbs up earlier today.

regards,


Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in UK and Wales under Company Registration No. 3798903
Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters
(USA), Michael O'Neill (Ireland)

From vladimir.kozlov at oracle.com  Wed Aug 19 16:48:53 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 19 Aug 2015 09:48:53 -0700
Subject: [aarch64-port-dev ] 8133352: aarch64: generates constrained
 unpredictable instructions
In-Reply-To: <55D4B123.5060403@redhat.com>
References: <1439308653.5920.16.camel@mylittlepony.linaroharston>
	<55CA28EC.7060109@oracle.com>
	<1439396612.4820.31.camel@mylittlepony.linaroharston>
	<1439903231.5709.5.camel@mylittlepony.linaroharston>
	<55D4AFED.5000305@oracle.com> <55D4B123.5060403@redhat.com>
Message-ID: <55D4B375.4010307@oracle.com>

Good.

Yes, may be it is our internal Oracle's server.
But I found your comments in openjdk mailing archive:

http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-August/018700.html

Thanks,
Vladimir

On 8/19/15 9:38 AM, Andrew Dinn wrote:
> On 19/08/15 17:33, Vladimir Kozlov wrote:
>> Looks fine to me.
>>
>> I did not see any comments from our colleges (from RH) who works on
>> arm64. Are they agree with this change?
>
> Hmm, maybe the mail server is being slow. Andrew Haley and I both gave
> it a thumbs up earlier today.
>
> regards,
>
>
> Andrew Dinn
> -----------
> Senior Principal Software Engineer
> Red Hat UK Ltd
> Registered in UK and Wales under Company Registration No. 3798903
> Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters
> (USA), Michael O'Neill (Ireland)
>

From vladimir.kozlov at oracle.com  Wed Aug 19 18:22:18 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 19 Aug 2015 11:22:18 -0700
Subject: [aarch64-port-dev ] RFR: 8078743: AARCH64: Extend use of stlr
 to cater for volatile object stores
In-Reply-To: <1439991013.23660.72.camel@mint>
References: <55B8B134.2050803@redhat.com> <55BA78B7.7030300@oracle.com>
	<55CB3FEC.1070709@redhat.com> <1439991013.23660.72.camel@mint>
Message-ID: <55D4C95A.4060308@oracle.com>

Yes, I agree with latest changes.

Thank you, Andrew, for running performance tests!

Regards,
Vladimir

On 8/19/15 6:30 AM, Edward Nevill wrote:
> Hi Andrew,
>
> I have tested this on two different partner platforms with G1GC, CMS and
> Parallel GC with and without UseCondCardMark.
>
> I have also reviewed the code and and happy with the code and comments.
>
> Vladimir, if you are still happy with this may I go ahead and push this
> on behalf of Andrew Dinn. The changes only affect aarch64.ad.
>
> Many thanks,
> Ed.
>
> On Wed, 2015-08-12 at 13:45 +0100, Andrew Dinn wrote:
>> Hi Vladimir,
>>
>> Apologies for the delay in responding to your feedback -- I was
>> traveling for a team meeting all of last week.
>>
>> Here is a revised webrev which includes all the code changes you suggested
>>
>>    http://cr.openjdk.java.net/~adinn/8078743/webrev.04
>>
>
>

From edward.nevill at gmail.com  Thu Aug 20 10:47:44 2015
From: edward.nevill at gmail.com (Edward Nevill)
Date: Thu, 20 Aug 2015 11:47:44 +0100
Subject: [aarch64-port-dev ] RFR: aarch64: 8133842: C2 generates illegal
 instructions with int shifts >=32
Message-ID: <1440067664.5989.21.camel@mylittlepony.linaroharston>

Hi,

The following webrev

http://cr.openjdk.java.net/~enevill/8133842/webrev.01/

fixes a problem reported by one of our partners whereby C2 can generate illegal instructions on certain partners HW.

JIRA issue here

https://bugs.openjdk.java.net/browse/JDK-8133842

The problem occurs when you have a logical or arithmetic instruction with a RHS which is shifted by a constant where (const & 32) != 0, ie the constant is 32..63 or 96..127 etc.

For example the following

  res = i | (j >> 53)

generates the instruction

  orrw Rd, Rn, Rm, ASR #53

This instruction has a 6 bit field for the shift so this would appear to be a legal encoding. However certain partner HW treats this as an undefined instruction generating a SIGILL.

The problem was that the rules in aarch64.ad were always anding the constant with 0x3f for both ints and longs.

The above webrev fixes this to mask with 0x1f for ints and 0x3f for longs.

Tested with hotspot and langtools. Results the same in both cases.

Hotspot: Test results: passed: 863; failed: 2; error: 10
Langtools: Test results: passed: 3,263

Thanks for your help with this review,
Ed.


From aph at redhat.com  Thu Aug 20 10:56:20 2015
From: aph at redhat.com (Andrew Haley)
Date: Thu, 20 Aug 2015 11:56:20 +0100
Subject: [aarch64-port-dev ] RFR: aarch64: 8133842: C2 generates illegal
 instructions with int shifts >=32
In-Reply-To: <1440067664.5989.21.camel@mylittlepony.linaroharston>
References: <1440067664.5989.21.camel@mylittlepony.linaroharston>
Message-ID: <55D5B254.1090303@redhat.com>

Looks good for C2.  OK if you already checked C1!

Thanks,
Andrew.

From adinn at redhat.com  Thu Aug 20 11:24:11 2015
From: adinn at redhat.com (Andrew Dinn)
Date: Thu, 20 Aug 2015 12:24:11 +0100
Subject: [aarch64-port-dev ] RFR: aarch64: 8133842: C2 generates illegal
 instructions with int shifts >=32
In-Reply-To: <55D5B254.1090303@redhat.com>
References: <1440067664.5989.21.camel@mylittlepony.linaroharston>
	<55D5B254.1090303@redhat.com>
Message-ID: <55D5B8DB.8030508@redhat.com>

On 20/08/15 11:56, Andrew Haley wrote:
> Looks good for C2.  OK if you already checked C1!
> 
> Thanks,
> Andrew.

What he said :-)

regards,


Andrew Dinn
-----------


From edward.nevill at gmail.com  Thu Aug 20 13:10:20 2015
From: edward.nevill at gmail.com (Edward Nevill)
Date: Thu, 20 Aug 2015 14:10:20 +0100
Subject: [aarch64-port-dev ] RFR: aarch64: 8133842: C2 generates illegal
 instructions with int shifts >=32
In-Reply-To: <55D5B254.1090303@redhat.com>
References: <1440067664.5989.21.camel@mylittlepony.linaroharston>
	<55D5B254.1090303@redhat.com>
Message-ID: <1440076220.5989.27.camel@mylittlepony.linaroharston>

On Thu, 2015-08-20 at 11:56 +0100, Andrew Haley wrote:
> Looks good for C2.  OK if you already checked C1!

Yes. For my test case

  res += i | (j >> 53);

C1 generates

  0x000003ff90eb0d68: lsr       w3, w1, #21
  0x000003ff90eb0d6c: orr       w3, w0, w3
  0x000003ff90eb0d70: add       w3, w3, w2

IE. It doesn't attempt to merge the shift and the or, and it masks the shift with 0x1f getting 21 instead of 53.

Below is the code in C1 that does it (from c1_LIRGenerator_aarch64.cpp). You can see it ands ints with 0x1f and longs with 0x3f.

All the best,
Ed.

--- cut here ---
  if (right.is_constant()) {
    right.dont_load_item();

    switch (x->op()) {
    case Bytecodes::_ishl: {
      int c = right.get_jint_constant() & 0x1f;
      __ shift_left(left.result(), c, x->operand());
      break;
    }
    case Bytecodes::_ishr: {
      int c = right.get_jint_constant() & 0x1f;
      __ shift_right(left.result(), c, x->operand());
      break;
    }
    case Bytecodes::_iushr: {
      int c = right.get_jint_constant() & 0x1f;
      __ unsigned_shift_right(left.result(), c, x->operand());
      break;
    }
    case Bytecodes::_lshl: {
      int c = right.get_jint_constant() & 0x3f;
      __ shift_left(left.result(), c, x->operand());
      break;
    }
    case Bytecodes::_lshr: {
      int c = right.get_jint_constant() & 0x3f;
      __ shift_right(left.result(), c, x->operand());
      break;
    }
    case Bytecodes::_lushr: {
      int c = right.get_jint_constant() & 0x3f;
      __ unsigned_shift_right(left.result(), c, x->operand());
      break;
    }
    default:
      ShouldNotReachHere();
    }
  } else {
--- cut here ---


From edward.nevill at gmail.com  Thu Aug 20 15:06:35 2015
From: edward.nevill at gmail.com (Edward Nevill)
Date: Thu, 20 Aug 2015 16:06:35 +0100
Subject: [aarch64-port-dev ] Patches to backport to jdk8
Message-ID: <1440083195.10528.4.camel@mylittlepony.linaroharston>

Hi,

Is it OK to backport the following patches from jdk9 to jdk8.

All the best,
Ed.

--- CUT HERE ---
changeset:   8690:e4304d76473f
user:        enevill
date:        Wed Jul 15 16:05:53 2015 +0000
files:       src/cpu/aarch64/vm/aarch64.ad
description:
8131358: aarch64: test compiler/loopopts/superword/ProdRed_Float.java fails when run with debug VM
Summary: fix typo in match rule in vsub2f
Reviewed-by: kvn, aph
----------------
changeset:   8693:aa7220a36fb0
user:        enevill
date:        Thu Jul 16 14:16:44 2015 +0000
files:       src/cpu/aarch64/vm/assembler_aarch64.cpp src/cpu/aarch64/vm/assembler_aarch64.hpp src/cpu/aarch64/vm/macroAssembler_aarch64.cpp
description:
8131483: aarch64: illegal stlxr instructions
Summary: Do not generate stlxX with Ws == Xn
Reviewed-by: kvn, aph
----------------
changeset:   8721:89a220e70e99
user:        enevill
date:        Fri Jul 17 07:50:36 2015 +0000
files:       src/cpu/aarch64/vm/aarch64.ad src/cpu/aarch64/vm/macroAssembler_aarch64.cpp src/cpu/aarch64/vm/macroAssembler_aarch64.hpp
description:
8131362: aarch64: C2 does not handle large stack offsets
Summary: change spill code to allow large offsets
Reviewed-by: kvn, aph
----------------
changeset:   8840:8783515c57ad
user:        enevill
date:        Tue Aug 18 12:40:22 2015 +0000
files:       src/cpu/aarch64/vm/assembler_aarch64.cpp src/cpu/aarch64/vm/assembler_aarch64.hpp src/cpu/aarch64/vm/interp_masm_aarch64.cpp src/cpu/aarch64/vm/macroAssembler_aarch64.cpp src/cpu/aarch64/vm/macroAssembler_aarch64.hpp src/cpu/aarch64/vm/sharedRuntime_aarch64.cpp src/cpu/aarch64/vm/templateInterpreter_aarch64.cpp
description:
8133352: aarch64: generates constrained unpredictable instructions
Summary: Fix generation of unpredictable STXR Rs, Rt, [Rn] with Rs == Rt
Reviewed-by: kvn, aph, adinn
----------------
changeset:   8842:a94519010b13
tag:         tip
user:        enevill
date:        Thu Aug 20 09:40:08 2015 +0000
files:       src/cpu/aarch64/vm/aarch64.ad src/cpu/aarch64/vm/aarch64_ad.m4
description:
8133842: aarch64: C2 generates illegal instructions with int shifts >=32
Summary: Fix logical operatations combined with shifts >= 32
Reviewed-by: duke
--- CUT HERE ---


From vladimir.kozlov at oracle.com  Thu Aug 20 15:55:36 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 20 Aug 2015 08:55:36 -0700
Subject: [aarch64-port-dev ] RFR: aarch64: 8133842: C2 generates illegal
 instructions with int shifts >=32
In-Reply-To: <1440067664.5989.21.camel@mylittlepony.linaroharston>
References: <1440067664.5989.21.camel@mylittlepony.linaroharston>
Message-ID: <55D5F878.6030901@oracle.com>

Changes are fine but I am curious why results will be the same even if you mask less bits - I don't see predicates which 
checks shift's value (to select different instructions for different shift values) and you use general immI operand type 
for it.

Thanks,
Vladimir

On 8/20/15 3:47 AM, Edward Nevill wrote:
> Hi,
>
> The following webrev
>
> http://cr.openjdk.java.net/~enevill/8133842/webrev.01/
>
> fixes a problem reported by one of our partners whereby C2 can generate illegal instructions on certain partners HW.
>
> JIRA issue here
>
> https://bugs.openjdk.java.net/browse/JDK-8133842
>
> The problem occurs when you have a logical or arithmetic instruction with a RHS which is shifted by a constant where (const & 32) != 0, ie the constant is 32..63 or 96..127 etc.
>
> For example the following
>
>    res = i | (j >> 53)
>
> generates the instruction
>
>    orrw Rd, Rn, Rm, ASR #53
>
> This instruction has a 6 bit field for the shift so this would appear to be a legal encoding. However certain partner HW treats this as an undefined instruction generating a SIGILL.
>
> The problem was that the rules in aarch64.ad were always anding the constant with 0x3f for both ints and longs.
>
> The above webrev fixes this to mask with 0x1f for ints and 0x3f for longs.
>
> Tested with hotspot and langtools. Results the same in both cases.
>
> Hotspot: Test results: passed: 863; failed: 2; error: 10
> Langtools: Test results: passed: 3,263
>
> Thanks for your help with this review,
> Ed.
>
>

From aph at redhat.com  Thu Aug 20 16:04:51 2015
From: aph at redhat.com (Andrew Haley)
Date: Thu, 20 Aug 2015 17:04:51 +0100
Subject: [aarch64-port-dev ] RFR: aarch64: 8133842: C2 generates illegal
 instructions with int shifts >=32
In-Reply-To: <55D5F878.6030901@oracle.com>
References: <1440067664.5989.21.camel@mylittlepony.linaroharston>
	<55D5F878.6030901@oracle.com>
Message-ID: <55D5FAA3.4050200@redhat.com>

On 08/20/2015 04:55 PM, Vladimir Kozlov wrote:
> Changes are fine but I am curious why results will be the same even if you mask less bits

Java VM spec says:

The shift distance actually used is always in the range 0 to 31,
inclusive, as if value2 were subjected to a bitwise logical AND with
the mask value 0x1f.

Andrew.

From edward.nevill at gmail.com  Thu Aug 20 16:06:22 2015
From: edward.nevill at gmail.com (Edward Nevill)
Date: Thu, 20 Aug 2015 17:06:22 +0100
Subject: [aarch64-port-dev ] RFR: aarch64: 8133842: C2 generates illegal
 instructions with int shifts >=32
In-Reply-To: <55D5F878.6030901@oracle.com>
References: <1440067664.5989.21.camel@mylittlepony.linaroharston>
	<55D5F878.6030901@oracle.com>
Message-ID: <1440086782.1907.6.camel@mylittlepony.linaroharston>

On Thu, 2015-08-20 at 08:55 -0700, Vladimir Kozlov wrote:
> Changes are fine but I am curious why results will be the same even if you mask less bits - I don't see predicates which 
> checks shift's value (to select different instructions for different shift values) and you use general immI operand type 
> for it.

Not sure I understand the question.

Currently it generate

    orrw Rd, Rn, Rm, ASR #53

Rd, Rn and Rm are 32 bit here (for orrw as opposed to orr).

So on our partners implementation this generates a SIGILL.

The webrev change this to

    orrw Rd, Rn, Rm, ASR #21

which is correct (for Java).

All the best,
Ed.


From vladimir.kozlov at oracle.com  Thu Aug 20 16:34:30 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 20 Aug 2015 09:34:30 -0700
Subject: [aarch64-port-dev ] RFR: aarch64: 8133842: C2 generates illegal
 instructions with int shifts >=32
In-Reply-To: <1440086782.1907.6.camel@mylittlepony.linaroharston>
References: <1440067664.5989.21.camel@mylittlepony.linaroharston>
	<55D5F878.6030901@oracle.com>
	<1440086782.1907.6.camel@mylittlepony.linaroharston>
Message-ID: <55D60196.7000505@oracle.com>

Thank you. Andrew answered my question (only 0-31 values are meaningful).
I thought you had on low level asm instruction masking already.
For example on sparc we use u_field(imm5a, 4, 0):

void sll(  Register s1, int imm5a,   Register d ) { emit_int32( op(arith_op) | rd(d) | op3(sll_op3) | rs1(s1) | sx(0) | 
immed(true) | u_field(imm5a, 4, 0) ); }

It looks like you have to mask on more high level then we do on SPARC.

Again, changes are good.

Thanks,
Vladimir

On 8/20/15 9:06 AM, Edward Nevill wrote:
> On Thu, 2015-08-20 at 08:55 -0700, Vladimir Kozlov wrote:
>> Changes are fine but I am curious why results will be the same even if you mask less bits - I don't see predicates which
>> checks shift's value (to select different instructions for different shift values) and you use general immI operand type
>> for it.
>
> Not sure I understand the question.
>
> Currently it generate
>
>      orrw Rd, Rn, Rm, ASR #53
>
> Rd, Rn and Rm are 32 bit here (for orrw as opposed to orr).
>
> So on our partners implementation this generates a SIGILL.
>
> The webrev change this to
>
>      orrw Rd, Rn, Rm, ASR #21
>
> which is correct (for Java).
>
> All the best,
> Ed.
>
>

From vladimir.kozlov at oracle.com  Fri Aug 21 21:24:17 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 21 Aug 2015 14:24:17 -0700
Subject: [aarch64-port-dev ] aarch64: C2 fast lock/unlock issues
In-Reply-To: <CAF1YaiD9wBDhbrdv8T2nAbegJ-tOoEyeAqM6pMCnOy6vEGHK1A@mail.gmail.com>
References: <CAF1YaiD9wBDhbrdv8T2nAbegJ-tOoEyeAqM6pMCnOy6vEGHK1A@mail.gmail.com>
Message-ID: <55D79701.9090407@oracle.com>

Thank you for report and suggested fixes.

CC to aarch64 port developers.

Thanks,
Vladimir

On 8/21/15 5:21 AM, Hui Shi wrote:
> Hi JIT members,
> Attached fast_lock.patch fixes issues in fast lock/unlock on aarch64
> platform (in both aarch64-jdk8 and jdk9/hs-comp/hotspot). Could someone
> help comments, review or sponsor?
> A small test case and  PrintAssembly log with/without fix are also
> attached for reference.
>
> To reproduce this issue on aarch64, command line is "java
> -XX:-TieredCompilation -XX:+UnlockDiagnosticVMOptions
> -XX:-BackgroundCompilation -XX:CompileCommand="compileonly,TestSync.f*"
> -XX:+PrintAssembly  TestSync"
> There are three Issues in aarch64 fast lock/unlock:
> *1. Duplicated biased lock checking*
> When option UseBiasedLocking and UseOptoBiasInlining are both true, it
> doesn't need emit biased_locking_enter in aarch64_enc_fast_lock. This is
> redundant as biased locking enter check is already inlined in
> PhaseMacroExpand::expand_lock_node. Checking assembly code in orig.asm
>
> [Inlined biased lock check in PhaseMacroExpand::expand_lock_node]
>    0x000003ff88320d94: str       x1, [sp]
>    0x000003ff88320d98: ldr       x10, [x1]
>    0x000003ff88320d9c: and       x11, x10, #0x7
>    0x000003ff88320da0: cmp       x11, #0x5
>    0x000003ff88320da4: b.ne <http://b.ne>      0x000003ff88320e18
> [Biased lock check expanded in aarch64_enc_fast_lock]
>    0x000003ff88320e18: add       x12, sp, #0x10
>    0x000003ff88320e1c: ldr       x10, [x1]
>    0x000003ff88320e20: and       x11, x12, #0x7
>    0x000003ff88320e24: cmp       x11, #0x5
>    0x000003ff88320e28: b.ne <http://b.ne>      0x000003ff88320eec
> *2. Incorrect parameter used in biased_locking_enter in
> aarch64_enc_fast_lock*
> Checking above code [Biased lock check expanded in
> aarch64_enc_fast_lock], x12 is the box register and holding the address
> of the lock record on stack. However it is mis-used as mark word in
> biased lock checking here. As a result, biased pattern check always
> fails because stack pointer is 8 bytes align and x11 must be zero.
> Current implementation in aarch64_enc_fast_lock.
> /biased_locking_enter(disp_hdr, oop, box, tmp, true, cont);/
> Which should be
> /biased_locking_enter(box, oop, disp_hdr, tmp, true, cont);  //swap
> disp_hdr and box register, disp_hdr is already loaded with object mark word/
> This issue might cause problem when running with option
> -XX:-UseOptoBiasInlining in following scenario, let?s check above code
> in [Biased lock check expanded in aarch64_enc_fast_lock], x12 is box and
> x10 is disp_hdr.
> 1. Suppose object?s mark word (loaded into register x10) is in biased
> mode, with content ?[biased_thread |epoch|age| 101]? and biased_thread
> is executing its synchronized block.
> 2. Another thread tries to acquire the same lock. Firstly, it performs
> biased pattern check and fails, because ?mark word? register used here
> is X12 (correct register should be x10).
> 3. As x12 is not ?biased? (least three significant bits of SP + 0x10
> would never be 101),  execution goes to thin lock CAS acquire code
> instead of biased lock revoke/rebias code.
> 4. Thin lock CAS acquire will succeed because x10?s least two
> significant bit is 01 (thin lock CAS code uses disp_hdr (x10) as mark
> word). Two threads acquire same lock at same time and this is incorrect
> behavior.
>
> *3. Inflate monitor code has typo in aarch64_enc_fast_lock*
> Inflated lock test is generated under condition (EmitSync & 0x02), while
> generating inflated lock fast path under condition "if ((EmitSync &
> 0x02) == 0))". At both location, they should be "if ((EmitSync & 0x02)
> == 0) ". In orig.asm, no instruction branches to inflated lock acquire
> fast path at 0x000003ff88320f24.
> Issue #1 and #3 does not impact correctness, they introduce redundant
> code (double biased lock check) and skip inflated lock fast path check
> (_owner is null case).
> Fix is in aarch64_enc_fast_lock/aarch64_enc_fast_unlock, this will not
> impact C1 and interpreter. Attached patch includes:
> 1. Disable generating biased lock handle code in fast_lock/fast_unlock
> when UseOptoBiasInlining is true.
> 2. Adjust biased_locking_enter?s actual parameters, swap disp_hdr and
> box register.
> 3. Fix typo in inflated monitor handling.
>
> Regards
> Shi Hui

From hui.shi at linaro.org  Sat Aug 22 00:37:54 2015
From: hui.shi at linaro.org (Hui Shi)
Date: Sat, 22 Aug 2015 08:37:54 +0800
Subject: [aarch64-port-dev ] aarch64: C2 fast lock/unlock issues
In-Reply-To: <55D79701.9090407@oracle.com>
References: <CAF1YaiD9wBDhbrdv8T2nAbegJ-tOoEyeAqM6pMCnOy6vEGHK1A@mail.gmail.com>
	<55D79701.9090407@oracle.com>
Message-ID: <CAF1YaiD81ZQJcope8_5=FRsYPt-N+uogKV9J=VD=RTY0mhOsew@mail.gmail.com>

Thanks Vladimir!

Regards
Shi Hui

On 22 August 2015 at 05:24, Vladimir Kozlov <vladimir.kozlov at oracle.com>
wrote:

> Thank you for report and suggested fixes.
>
> CC to aarch64 port developers.
>
> Thanks,
> Vladimir
>
> On 8/21/15 5:21 AM, Hui Shi wrote:
>
>> Hi JIT members,
>> Attached fast_lock.patch fixes issues in fast lock/unlock on aarch64
>> platform (in both aarch64-jdk8 and jdk9/hs-comp/hotspot). Could someone
>> help comments, review or sponsor?
>> A small test case and  PrintAssembly log with/without fix are also
>> attached for reference.
>>
>> To reproduce this issue on aarch64, command line is "java
>> -XX:-TieredCompilation -XX:+UnlockDiagnosticVMOptions
>> -XX:-BackgroundCompilation -XX:CompileCommand="compileonly,TestSync.f*"
>> -XX:+PrintAssembly  TestSync"
>> There are three Issues in aarch64 fast lock/unlock:
>> *1. Duplicated biased lock checking*
>> When option UseBiasedLocking and UseOptoBiasInlining are both true, it
>> doesn't need emit biased_locking_enter in aarch64_enc_fast_lock. This is
>> redundant as biased locking enter check is already inlined in
>> PhaseMacroExpand::expand_lock_node. Checking assembly code in orig.asm
>>
>> [Inlined biased lock check in PhaseMacroExpand::expand_lock_node]
>>    0x000003ff88320d94: str       x1, [sp]
>>    0x000003ff88320d98: ldr       x10, [x1]
>>    0x000003ff88320d9c: and       x11, x10, #0x7
>>    0x000003ff88320da0: cmp       x11, #0x5
>>    0x000003ff88320da4: b.ne <http://b.ne>      0x000003ff88320e18
>> [Biased lock check expanded in aarch64_enc_fast_lock]
>>    0x000003ff88320e18: add       x12, sp, #0x10
>>    0x000003ff88320e1c: ldr       x10, [x1]
>>    0x000003ff88320e20: and       x11, x12, #0x7
>>    0x000003ff88320e24: cmp       x11, #0x5
>>    0x000003ff88320e28: b.ne <http://b.ne>      0x000003ff88320eec
>> *2. Incorrect parameter used in biased_locking_enter in
>> aarch64_enc_fast_lock*
>> Checking above code [Biased lock check expanded in
>> aarch64_enc_fast_lock], x12 is the box register and holding the address
>> of the lock record on stack. However it is mis-used as mark word in
>> biased lock checking here. As a result, biased pattern check always
>> fails because stack pointer is 8 bytes align and x11 must be zero.
>> Current implementation in aarch64_enc_fast_lock.
>> /biased_locking_enter(disp_hdr, oop, box, tmp, true, cont);/
>> Which should be
>> /biased_locking_enter(box, oop, disp_hdr, tmp, true, cont);  //swap
>> disp_hdr and box register, disp_hdr is already loaded with object mark
>> word/
>> This issue might cause problem when running with option
>> -XX:-UseOptoBiasInlining in following scenario, let?s check above code
>> in [Biased lock check expanded in aarch64_enc_fast_lock], x12 is box and
>> x10 is disp_hdr.
>> 1. Suppose object?s mark word (loaded into register x10) is in biased
>> mode, with content ?[biased_thread |epoch|age| 101]? and biased_thread
>> is executing its synchronized block.
>> 2. Another thread tries to acquire the same lock. Firstly, it performs
>> biased pattern check and fails, because ?mark word? register used here
>> is X12 (correct register should be x10).
>> 3. As x12 is not ?biased? (least three significant bits of SP + 0x10
>> would never be 101),  execution goes to thin lock CAS acquire code
>> instead of biased lock revoke/rebias code.
>> 4. Thin lock CAS acquire will succeed because x10?s least two
>> significant bit is 01 (thin lock CAS code uses disp_hdr (x10) as mark
>> word). Two threads acquire same lock at same time and this is incorrect
>> behavior.
>>
>> *3. Inflate monitor code has typo in aarch64_enc_fast_lock*
>> Inflated lock test is generated under condition (EmitSync & 0x02), while
>> generating inflated lock fast path under condition "if ((EmitSync &
>> 0x02) == 0))". At both location, they should be "if ((EmitSync & 0x02)
>> == 0) ". In orig.asm, no instruction branches to inflated lock acquire
>> fast path at 0x000003ff88320f24.
>> Issue #1 and #3 does not impact correctness, they introduce redundant
>> code (double biased lock check) and skip inflated lock fast path check
>> (_owner is null case).
>> Fix is in aarch64_enc_fast_lock/aarch64_enc_fast_unlock, this will not
>> impact C1 and interpreter. Attached patch includes:
>> 1. Disable generating biased lock handle code in fast_lock/fast_unlock
>> when UseOptoBiasInlining is true.
>> 2. Adjust biased_locking_enter?s actual parameters, swap disp_hdr and
>> box register.
>> 3. Fix typo in inflated monitor handling.
>>
>> Regards
>> Shi Hui
>>
>

From aph at redhat.com  Mon Aug 24 09:31:59 2015
From: aph at redhat.com (Andrew Haley)
Date: Mon, 24 Aug 2015 10:31:59 +0100
Subject: [aarch64-port-dev ] An interesting bug
Message-ID: <55DAE48F.2030403@redhat.com>

This is JDK8 on AArch64.

I have an interesting bug which causes a crash.  Occasionally, but
rarely, I get a crash when accessing an instance of ForkJoinTask.  The
instance is allocated in the old generation of the parallel scan GC.
However, the "this" pointer of the ForkJoinTask points into the middle
of another allocated object.

I'm trying to figure out how this might happen.  Perhaps the root scan
of the thread running the ForkJoinTask did not happen, or for some
reason the roots it found were not added to the root set.  Or perhaps
they were found, but were not rewritten in the thread.

Has anyone seen anything like this before?  Do you have any advice for
me?

Thanks,
Andrew.

From aph at redhat.com  Mon Aug 24 10:07:49 2015
From: aph at redhat.com (Andrew Haley)
Date: Mon, 24 Aug 2015 11:07:49 +0100
Subject: [aarch64-port-dev ] Patches to backport to jdk8
In-Reply-To: <1440083195.10528.4.camel@mylittlepony.linaroharston>
References: <1440083195.10528.4.camel@mylittlepony.linaroharston>
Message-ID: <55DAECF5.4040303@redhat.com>

On 08/20/2015 04:06 PM, Edward Nevill wrote:
> Is it OK to backport the following patches from jdk9 to jdk8.

Yes, please.

Andrew.


From aph at redhat.com  Mon Aug 24 10:12:29 2015
From: aph at redhat.com (Andrew Haley)
Date: Mon, 24 Aug 2015 11:12:29 +0100
Subject: [aarch64-port-dev ] An interesting bug
In-Reply-To: <55DAE48F.2030403@redhat.com>
References: <55DAE48F.2030403@redhat.com>
Message-ID: <55DAEE0D.1020907@redhat.com>

I suppose another possibility might be that the worker threads promoting
to the old generation somehow (because of a failed compare-and-swap)
overwrote each other's data.

Andrew.

From adinn at redhat.com  Mon Aug 24 14:31:29 2015
From: adinn at redhat.com (Andrew Dinn)
Date: Mon, 24 Aug 2015 15:31:29 +0100
Subject: [aarch64-port-dev ] RFR: 8080293: AARCH64: Remove unnecessary dmbs
 from generated CAS code
Message-ID: <55DB2AC1.50208@redhat.com>

The following webrev against hs-comp head fixes 8080293

  http://cr.openjdk.java.net/~adinn/8080293/webrev.00/

It is a follow on to the prior volatile object patch

  8078743: AARCH64: Extend use of stlr to cater for volatile object stores
  http://cr.openjdk.java.net/~adinn/8078743/webrev.04/

and requires that previous patch to be applied first.

Testing
-------

The patch is sensitive to GC configuration so it was tested against 5
relevant configs

  G1
 CMS+UseCondCardMark
 CMS-UseCondCardMark
 Par+UseCondCardMark
 Par-UseCondCardMark

The validity of the transformation was verified by:

  generating and eyeballing compiled code for simple test programs
  successfully running a fairly large program (netbeans)
  generating and eyeballing HashMap code compiled on a fairly large
program run

The fix was performance tested on 2 implementations of the AArch64
architecture (more details below). On an O-O-O CPU it gave no noticeable
benefit. On a simple pipeline CPU it gave a very significant benefit in
specific cases.

regards,


Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in UK and Wales under Company Registration No. 3798903
Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters
(USA), Michael O'Neill (Ireland)

The Test
--------

As with the prior patch I tested the original vs new code generation
strategy by running a jmh test first with -XX:+UseBarriersForVolatile
and then with -XX:+UseBarriersForVolatile. Four different test programs
ran in all 5 GC configs executing. Each test executed repeated CAS
operations to an object field in a single thread with a BlackHole
backoff between CASes varying from 0 to 64.

Test one performed a CAS guaranteed to fail; test two performed a
successful CAS from a fixed object to null and then back; test three
performed a successful CAS from a fixed object to another fixed object
and then back; test four performed a successful CAS from a fixed
object to a newly allocated object and then back. The average time per
CAS operation (ns/op) -- actually per 2 CAS operations for the latter 3
tests -- was used as a score.

The Results
-----------

On an O-O-O CPU there was no significant difference in the time taken.

On a simple pipeline CPU the optimization gave a very significant
benefit for the Fail tests on all GC configurations except CMS
+ UseCondCardMark. In all other cases there was no significant
measurable benefit.

Example Test
------------

package org.openjdk;

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;

import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

@Warmup(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(3)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Benchmark)
public class CasNull {

    Object tombstone;

    AtomicReference<Object> ref;

    @Param({"0", "1", "2", "4", "8", "16", "32", "64"})
    int backoff;

    @Setup
    public void setup() {
        tombstone = new Object();

        ref = new AtomicReference<>();
        ref.set(tombstone);
    }

    @Benchmark
    public boolean test() {
        Blackhole.consumeCPU(backoff);
        ref.compareAndSet(tombstone, null);
        ref.compareAndSet(null, tombstone);
	return true;
    }
}

From adinn at redhat.com  Mon Aug 24 15:25:08 2015
From: adinn at redhat.com (Andrew Dinn)
Date: Mon, 24 Aug 2015 16:25:08 +0100
Subject: [aarch64-port-dev ] aarch64: C2 fast lock/unlock issues
In-Reply-To: <55D79701.9090407@oracle.com>
References: <CAF1YaiD9wBDhbrdv8T2nAbegJ-tOoEyeAqM6pMCnOy6vEGHK1A@mail.gmail.com>
	<55D79701.9090407@oracle.com>
Message-ID: <55DB3754.9090504@redhat.com>

Thank you for proposing these fixes. Your analysis and patch both look
to me to be completely correct. I applied the patch and it appears to
work fine when running programs with biased locking enabled.

I have raised the following JIRA for this fix:

  https://bugs.openjdk.java.net/browse/JDK-8134322

I will post a webrev containing your patch for review as soon as possible.

regards,


Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in UK and Wales under Company Registration No. 3798903
Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters
(USA), Michael O'Neill (Ireland)

From vladimir.kozlov at oracle.com  Tue Aug 25 00:24:29 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 24 Aug 2015 17:24:29 -0700
Subject: [aarch64-port-dev ] RFR: 8080293: AARCH64: Remove unnecessary
 dmbs from generated CAS code
In-Reply-To: <55DB2AC1.50208@redhat.com>
References: <55DB2AC1.50208@redhat.com>
Message-ID: <55DBB5BD.4040400@oracle.com>

I did not look deep on code logic.
Few comments only:

Use {} for all conditional code (it cause a lot of pain in the past):

       if (is_cas)
         return NULL;

You don't need #ifndef PRODUCT:

+#ifndef PRODUCT
+#ifdef ASSERT

New mach instructs missing predicate:

  predicate(needs_acquiring_load_exclusive(n));

You use higher ins_cost to avoid their generation when predicate is 
false.  So why not explicit predicate?

Thanks,
Vladimir

On 8/24/15 7:31 AM, Andrew Dinn wrote:
> The following webrev against hs-comp head fixes 8080293
>
>    http://cr.openjdk.java.net/~adinn/8080293/webrev.00/
>
> It is a follow on to the prior volatile object patch
>
>    8078743: AARCH64: Extend use of stlr to cater for volatile object stores
>    http://cr.openjdk.java.net/~adinn/8078743/webrev.04/
>
> and requires that previous patch to be applied first.
>
> Testing
> -------
>
> The patch is sensitive to GC configuration so it was tested against 5
> relevant configs
>
>    G1
>   CMS+UseCondCardMark
>   CMS-UseCondCardMark
>   Par+UseCondCardMark
>   Par-UseCondCardMark
>
> The validity of the transformation was verified by:
>
>    generating and eyeballing compiled code for simple test programs
>    successfully running a fairly large program (netbeans)
>    generating and eyeballing HashMap code compiled on a fairly large
> program run
>
> The fix was performance tested on 2 implementations of the AArch64
> architecture (more details below). On an O-O-O CPU it gave no noticeable
> benefit. On a simple pipeline CPU it gave a very significant benefit in
> specific cases.
>
> regards,
>
>
> Andrew Dinn
> -----------
> Senior Principal Software Engineer
> Red Hat UK Ltd
> Registered in UK and Wales under Company Registration No. 3798903
> Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters
> (USA), Michael O'Neill (Ireland)
>
> The Test
> --------
>
> As with the prior patch I tested the original vs new code generation
> strategy by running a jmh test first with -XX:+UseBarriersForVolatile
> and then with -XX:+UseBarriersForVolatile. Four different test programs
> ran in all 5 GC configs executing. Each test executed repeated CAS
> operations to an object field in a single thread with a BlackHole
> backoff between CASes varying from 0 to 64.
>
> Test one performed a CAS guaranteed to fail; test two performed a
> successful CAS from a fixed object to null and then back; test three
> performed a successful CAS from a fixed object to another fixed object
> and then back; test four performed a successful CAS from a fixed
> object to a newly allocated object and then back. The average time per
> CAS operation (ns/op) -- actually per 2 CAS operations for the latter 3
> tests -- was used as a score.
>
> The Results
> -----------
>
> On an O-O-O CPU there was no significant difference in the time taken.
>
> On a simple pipeline CPU the optimization gave a very significant
> benefit for the Fail tests on all GC configurations except CMS
> + UseCondCardMark. In all other cases there was no significant
> measurable benefit.
>
> Example Test
> ------------
>
> package org.openjdk;
>
> import org.openjdk.jmh.annotations.*;
> import org.openjdk.jmh.infra.Blackhole;
>
> import java.util.concurrent.TimeUnit;
> import java.util.concurrent.atomic.AtomicReference;
>
> @Warmup(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS)
> @Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
> @Fork(3)
> @BenchmarkMode(Mode.AverageTime)
> @OutputTimeUnit(TimeUnit.NANOSECONDS)
> @State(Scope.Benchmark)
> public class CasNull {
>
>      Object tombstone;
>
>      AtomicReference<Object> ref;
>
>      @Param({"0", "1", "2", "4", "8", "16", "32", "64"})
>      int backoff;
>
>      @Setup
>      public void setup() {
>          tombstone = new Object();
>
>          ref = new AtomicReference<>();
>          ref.set(tombstone);
>      }
>
>      @Benchmark
>      public boolean test() {
>          Blackhole.consumeCPU(backoff);
>          ref.compareAndSet(tombstone, null);
>          ref.compareAndSet(null, tombstone);
> 	return true;
>      }
> }
>

From adinn at redhat.com  Tue Aug 25 08:20:47 2015
From: adinn at redhat.com (Andrew Dinn)
Date: Tue, 25 Aug 2015 09:20:47 +0100
Subject: [aarch64-port-dev ] RFR: 8080293: AARCH64: Remove unnecessary
 dmbs from generated CAS code
In-Reply-To: <55DBB5BD.4040400@oracle.com>
References: <55DB2AC1.50208@redhat.com> <55DBB5BD.4040400@oracle.com>
Message-ID: <55DC255F.7070903@redhat.com>

Hi Vladimir,

Thank you very much for your review.

On 25/08/15 01:24, Vladimir Kozlov wrote:
> I did not look deep on code logic.
> Few comments only:
> 
> Use {} for all conditional code (it cause a lot of pain in the past):
> 
>       if (is_cas)
>         return NULL;

Ok, I'll correct all of those.

> You don't need #ifndef PRODUCT:
> 
> +#ifndef PRODUCT
> +#ifdef ASSERT

Ah, that's good to know. Thanks.

> New mach instructs missing predicate:
> 
>  predicate(needs_acquiring_load_exclusive(n));
> 
> You use higher ins_cost to avoid their generation when predicate is
> false.  So why not explicit predicate?

I had two separate reasons for not repeating the predicates:

  1 They do quite a lot of work crawling the graph. So, calling the
predicate in the lower cost case and omitting it in the higher cost case
attempts to avoid the expense of executing it twice in some cases.

  2 The ins_cost for the lower and higher cost cases is meant to reflect
a difference in the expected execution cost associated with the instruction.

[n.b. I adopted the same strategy for the new membar generation rules
which were added as part of the volatile put optimization patch --
compare membar_release and unnecessary_membar_release]

However, looking again at the code I believe I have the costs (and hence
the predicates) attached to the wrong rules in each pair. For example,
currently the rules include the following details

  compareAndSwapIAcq -- does not emit dmb instructions
    no predicate
    cost  (2 * VOLATILE_REF_COST )

  compareAndSwapI -- emits dmb instructions
    predicate(!needs_acquiring_load_exclusive(n))
    cost VOLATILE_REF_COST

The patch presumes that the first rule will have a lower (or at least no
higher) cost than the second. So a correct version would be either,
calling the predicate once:

A:

  compareAndSwapIAcq -- does not emit dmb instructions
    predicate(needs_acquiring_load_exclusive(n))
    cost VOLATILE_REF_COST

  compareAndSwapI-- emits dmb instructions
    no predicate
    cost (2 * VOLATILE_REF_COST)

or, calling the predicate twice:

B:

  compareAndSwapIAcq -- does not emit dmb instructions
    predicate(needs_acquiring_load_exclusive(n))
    cost VOLATILE_REF_COST

  compareAndSwapI-- emits dmb instructions
    predicate(!needs_acquiring_load_exclusive(n))
    cost (2 * VOLATILE_REF_COST)

I would prefer to retain A over B. A is guaranteed to give the same
result as B with potentially less execution overhead. However, if you
feel B is required I will adopt that format for all rules.

regards,


Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in UK and Wales under Company Registration No. 3798903
Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters
(USA), Michael O'Neill (Ireland)


From adinn at redhat.com  Tue Aug 25 12:50:43 2015
From: adinn at redhat.com (Andrew Dinn)
Date: Tue, 25 Aug 2015 13:50:43 +0100
Subject: [aarch64-port-dev ] RFR: 8080293: AARCH64: Remove unnecessary
 dmbs from generated CAS code
In-Reply-To: <55DC255F.7070903@redhat.com>
References: <55DB2AC1.50208@redhat.com> <55DBB5BD.4040400@oracle.com>
	<55DC255F.7070903@redhat.com>
Message-ID: <55DC64A3.9000907@redhat.com>

A webrev in the light of Vladimir's feedback has been uploaded at the
following URL

  http://cr.openjdk.java.net/~adinn/8080293/webrev.01/

With this version I have made the changes Vladimir requested except for
the one detail I queried in my previous response. I have only called the
needs_acquiring_load_exclusive predicate in one of each of the pairs of
CompareAndSwapX rules. However, I have corrected the rule costs so that
they actually reflect the expected cost of the alternative generated
code segments (and adjusted the location+sense of the predicate
expression accordingly).

[Vladimir, if you really require the predicate to be called in both
rules I will prepare another webrev]

I still need another reviewer and an hs-comp committer to look at this.
Could someone from the aarch64-port-dev group (or maybe Alexey Shipilev)
please provide a second review and agree to sponsor the patch?

regards,


Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in UK and Wales under Company Registration No. 3798903
Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters
(USA), Michael O'Neill (Ireland)


On 25/08/15 09:20, Andrew Dinn wrote:
> Hi Vladimir,
> 
> Thank you very much for your review.
> 
> On 25/08/15 01:24, Vladimir Kozlov wrote:
>> I did not look deep on code logic.
>> Few comments only:
>>
>> Use {} for all conditional code (it cause a lot of pain in the past):
>>
>>       if (is_cas)
>>         return NULL;
> 
> Ok, I'll correct all of those.
> 
>> You don't need #ifndef PRODUCT:
>>
>> +#ifndef PRODUCT
>> +#ifdef ASSERT
> 
> Ah, that's good to know. Thanks.
> 
>> New mach instructs missing predicate:
>>
>>  predicate(needs_acquiring_load_exclusive(n));
>>
>> You use higher ins_cost to avoid their generation when predicate is
>> false.  So why not explicit predicate?
> 
> I had two separate reasons for not repeating the predicates:
> 
>   1 They do quite a lot of work crawling the graph. So, calling the
> predicate in the lower cost case and omitting it in the higher cost case
> attempts to avoid the expense of executing it twice in some cases.
> 
>   2 The ins_cost for the lower and higher cost cases is meant to reflect
> a difference in the expected execution cost associated with the instruction.
> 
> [n.b. I adopted the same strategy for the new membar generation rules
> which were added as part of the volatile put optimization patch --
> compare membar_release and unnecessary_membar_release]
> 
> However, looking again at the code I believe I have the costs (and hence
> the predicates) attached to the wrong rules in each pair. For example,
> currently the rules include the following details
> 
>   compareAndSwapIAcq -- does not emit dmb instructions
>     no predicate
>     cost  (2 * VOLATILE_REF_COST )
> 
>   compareAndSwapI -- emits dmb instructions
>     predicate(!needs_acquiring_load_exclusive(n))
>     cost VOLATILE_REF_COST
> 
> The patch presumes that the first rule will have a lower (or at least no
> higher) cost than the second. So a correct version would be either,
> calling the predicate once:
> 
> A:
> 
>   compareAndSwapIAcq -- does not emit dmb instructions
>     predicate(needs_acquiring_load_exclusive(n))
>     cost VOLATILE_REF_COST
> 
>   compareAndSwapI-- emits dmb instructions
>     no predicate
>     cost (2 * VOLATILE_REF_COST)
> 
> or, calling the predicate twice:
> 
> B:
> 
>   compareAndSwapIAcq -- does not emit dmb instructions
>     predicate(needs_acquiring_load_exclusive(n))
>     cost VOLATILE_REF_COST
> 
>   compareAndSwapI-- emits dmb instructions
>     predicate(!needs_acquiring_load_exclusive(n))
>     cost (2 * VOLATILE_REF_COST)
> 
> I would prefer to retain A over B. A is guaranteed to give the same
> result as B with potentially less execution overhead. However, if you
> feel B is required I will adopt that format for all rules.
> 
> regards,
> 
> 
> Andrew Dinn
> -----------
> Senior Principal Software Engineer
> Red Hat UK Ltd
> Registered in UK and Wales under Company Registration No. 3798903
> Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters
> (USA), Michael O'Neill (Ireland)
> 
> 

From vladimir.kozlov at oracle.com  Tue Aug 25 17:14:19 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 25 Aug 2015 10:14:19 -0700
Subject: [aarch64-port-dev ] RFR: 8080293: AARCH64: Remove unnecessary
 dmbs from generated CAS code
In-Reply-To: <55DC255F.7070903@redhat.com>
References: <55DB2AC1.50208@redhat.com> <55DBB5BD.4040400@oracle.com>
	<55DC255F.7070903@redhat.com>
Message-ID: <55DCA26B.3020508@oracle.com>

Okay, I agree to have only one predicate. So I am fine with version A).

Thanks,
Vladimir

PS: "first rule will have a lower" - should compareAndSwapI be first then?

 > However, looking again at the code I believe I have the costs (and hence
 > the predicates) attached to the wrong rules in each pair. For example,
 > currently the rules include the following details
 >
 >    compareAndSwapIAcq -- does not emit dmb instructions
 >      no predicate
 >      cost  (2 * VOLATILE_REF_COST )
 >
 >    compareAndSwapI -- emits dmb instructions
 >      predicate(!needs_acquiring_load_exclusive(n))
 >      cost VOLATILE_REF_COST
 >
 > The patch presumes that the first rule will have a lower (or at least no
 > higher) cost than the second. So a correct version would be either,
 > calling the predicate once:


On 8/25/15 1:20 AM, Andrew Dinn wrote:
> Hi Vladimir,
>
> Thank you very much for your review.
>
> On 25/08/15 01:24, Vladimir Kozlov wrote:
>> I did not look deep on code logic.
>> Few comments only:
>>
>> Use {} for all conditional code (it cause a lot of pain in the past):
>>
>>        if (is_cas)
>>          return NULL;
>
> Ok, I'll correct all of those.
>
>> You don't need #ifndef PRODUCT:
>>
>> +#ifndef PRODUCT
>> +#ifdef ASSERT
>
> Ah, that's good to know. Thanks.
>
>> New mach instructs missing predicate:
>>
>>   predicate(needs_acquiring_load_exclusive(n));
>>
>> You use higher ins_cost to avoid their generation when predicate is
>> false.  So why not explicit predicate?
>
> I had two separate reasons for not repeating the predicates:
>
>    1 They do quite a lot of work crawling the graph. So, calling the
> predicate in the lower cost case and omitting it in the higher cost case
> attempts to avoid the expense of executing it twice in some cases.
>
>    2 The ins_cost for the lower and higher cost cases is meant to reflect
> a difference in the expected execution cost associated with the instruction.
>
> [n.b. I adopted the same strategy for the new membar generation rules
> which were added as part of the volatile put optimization patch --
> compare membar_release and unnecessary_membar_release]
>
> However, looking again at the code I believe I have the costs (and hence
> the predicates) attached to the wrong rules in each pair. For example,
> currently the rules include the following details
>
>    compareAndSwapIAcq -- does not emit dmb instructions
>      no predicate
>      cost  (2 * VOLATILE_REF_COST )
>
>    compareAndSwapI -- emits dmb instructions
>      predicate(!needs_acquiring_load_exclusive(n))
>      cost VOLATILE_REF_COST
>
> The patch presumes that the first rule will have a lower (or at least no
> higher) cost than the second. So a correct version would be either,
> calling the predicate once:
>
> A:
>
>    compareAndSwapIAcq -- does not emit dmb instructions
>      predicate(needs_acquiring_load_exclusive(n))
>      cost VOLATILE_REF_COST
>
>    compareAndSwapI-- emits dmb instructions
>      no predicate
>      cost (2 * VOLATILE_REF_COST)
>
> or, calling the predicate twice:
>
> B:
>
>    compareAndSwapIAcq -- does not emit dmb instructions
>      predicate(needs_acquiring_load_exclusive(n))
>      cost VOLATILE_REF_COST
>
>    compareAndSwapI-- emits dmb instructions
>      predicate(!needs_acquiring_load_exclusive(n))
>      cost (2 * VOLATILE_REF_COST)
>
> I would prefer to retain A over B. A is guaranteed to give the same
> result as B with potentially less execution overhead. However, if you
> feel B is required I will adopt that format for all rules.
>
> regards,
>
>
> Andrew Dinn
> -----------
> Senior Principal Software Engineer
> Red Hat UK Ltd
> Registered in UK and Wales under Company Registration No. 3798903
> Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters
> (USA), Michael O'Neill (Ireland)
>

From adinn at redhat.com  Wed Aug 26 09:35:55 2015
From: adinn at redhat.com (Andrew Dinn)
Date: Wed, 26 Aug 2015 10:35:55 +0100
Subject: [aarch64-port-dev ] RFR: 8080293: AARCH64: Remove unnecessary
 dmbs from generated CAS code
In-Reply-To: <55DCA26B.3020508@oracle.com>
References: <55DB2AC1.50208@redhat.com> <55DBB5BD.4040400@oracle.com>
	<55DC255F.7070903@redhat.com> <55DCA26B.3020508@oracle.com>
Message-ID: <55DD887B.5090403@redhat.com>

On 25/08/15 18:14, Vladimir Kozlov wrote:
> Okay, I agree to have only one predicate. So I am fine with version A).

Thanks, Vladimir. So, that is now as provided in the latest posted webrev:

  http://cr.openjdk.java.net/~adinn/8080293/webrev.01/

> PS: "first rule will have a lower" - should compareAndSwapI be first then?

Sorry, I think the problem here is that I explained the status of the
original patch in a rather confusing way. I am not sure it matters all
that much which rule appears first. Or do you really want the lower cost
rule to appear before the higher cost one? . . .

>> However, looking again at the code I believe I have the costs (and hence
>> the predicates) attached to the wrong rules in each pair. For example,
>> currently the rules include the following details
>>
>>    compareAndSwapIAcq -- does not emit dmb instructions
>>      no predicate
>>      cost  (2 * VOLATILE_REF_COST )
>>
>>    compareAndSwapI -- emits dmb instructions
>>      predicate(!needs_acquiring_load_exclusive(n))
>>      cost VOLATILE_REF_COST


. . . what I meant by that comment was that this:

  - The optimization implemented in this patch is based on an assumption
that a generation strategy using dmb -- i.e. the one encoded by
compareAndSwapI -- will execute more slowly, or at least no faster, than
a generation strategy using stlr -- i.e. the one encoded by
compareAndSwapIAcq.

  - The text above displays the original costs and predicates used to
enforce the required rule selection.

  - In that version the /costs/ are the wrong way round with respect to
the /motivating assumption/ i.e. compareAndSwapI has a lower cost than
compareAndSwapIAcq.

In version A the costs reflect the motivating assumption  i.e. for each
X in {I, L, P, N} rule compareAndSwapXAcq has a lower cost than
compareAndSwapX.

However, it is also true that for each X in {I, L, P, N} rule
compareAndSwapX appears earlier than compareAndSwapXAcq.

regards,


Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in UK and Wales under Company Registration No. 3798903
Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters
(USA), Michael O'Neill (Ireland)

From aph at redhat.com  Wed Aug 26 13:04:09 2015
From: aph at redhat.com (Andrew Haley)
Date: Wed, 26 Aug 2015 14:04:09 +0100
Subject: [aarch64-port-dev ] HotSpot debugger for JDK8
Message-ID: <55DDB949.3010404@redhat.com>

http://cr.openjdk.java.net/~aph/jdk8-hsdb/

This is a fairly straightforward backport from JDK9.

Andrew.

From vladimir.kozlov at oracle.com  Wed Aug 26 15:41:14 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 26 Aug 2015 08:41:14 -0700
Subject: [aarch64-port-dev ] RFR: 8080293: AARCH64: Remove unnecessary
 dmbs from generated CAS code
In-Reply-To: <55DD887B.5090403@redhat.com>
References: <55DB2AC1.50208@redhat.com> <55DBB5BD.4040400@oracle.com>
	<55DC255F.7070903@redhat.com> <55DCA26B.3020508@oracle.com>
	<55DD887B.5090403@redhat.com>
Message-ID: <55DDDE1A.7030008@oracle.com>

Looks good. Thank you for clarification.

Vladimir

On 8/26/15 2:35 AM, Andrew Dinn wrote:
> On 25/08/15 18:14, Vladimir Kozlov wrote:
>> Okay, I agree to have only one predicate. So I am fine with version A).
>
> Thanks, Vladimir. So, that is now as provided in the latest posted webrev:
>
>    http://cr.openjdk.java.net/~adinn/8080293/webrev.01/
>
>> PS: "first rule will have a lower" - should compareAndSwapI be first then?
>
> Sorry, I think the problem here is that I explained the status of the
> original patch in a rather confusing way. I am not sure it matters all
> that much which rule appears first. Or do you really want the lower cost
> rule to appear before the higher cost one? . . .
>
>>> However, looking again at the code I believe I have the costs (and hence
>>> the predicates) attached to the wrong rules in each pair. For example,
>>> currently the rules include the following details
>>>
>>>     compareAndSwapIAcq -- does not emit dmb instructions
>>>       no predicate
>>>       cost  (2 * VOLATILE_REF_COST )
>>>
>>>     compareAndSwapI -- emits dmb instructions
>>>       predicate(!needs_acquiring_load_exclusive(n))
>>>       cost VOLATILE_REF_COST
>
>
> . . . what I meant by that comment was that this:
>
>    - The optimization implemented in this patch is based on an assumption
> that a generation strategy using dmb -- i.e. the one encoded by
> compareAndSwapI -- will execute more slowly, or at least no faster, than
> a generation strategy using stlr -- i.e. the one encoded by
> compareAndSwapIAcq.
>
>    - The text above displays the original costs and predicates used to
> enforce the required rule selection.
>
>    - In that version the /costs/ are the wrong way round with respect to
> the /motivating assumption/ i.e. compareAndSwapI has a lower cost than
> compareAndSwapIAcq.
>
> In version A the costs reflect the motivating assumption  i.e. for each
> X in {I, L, P, N} rule compareAndSwapXAcq has a lower cost than
> compareAndSwapX.
>
> However, it is also true that for each X in {I, L, P, N} rule
> compareAndSwapX appears earlier than compareAndSwapXAcq.
>
> regards,
>
>
> Andrew Dinn
> -----------
> Senior Principal Software Engineer
> Red Hat UK Ltd
> Registered in UK and Wales under Company Registration No. 3798903
> Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters
> (USA), Michael O'Neill (Ireland)
>

From adinn at redhat.com  Wed Aug 26 16:32:58 2015
From: adinn at redhat.com (Andrew Dinn)
Date: Wed, 26 Aug 2015 17:32:58 +0100
Subject: [aarch64-port-dev ] RFR: AArch64: 8134322: Fix several errors in C2
 biased locking implementation
In-Reply-To: <55DB3754.9090504@redhat.com>
References: <CAF1YaiD9wBDhbrdv8T2nAbegJ-tOoEyeAqM6pMCnOy6vEGHK1A@mail.gmail.com>
	<55D79701.9090407@oracle.com> <55DB3754.9090504@redhat.com>
Message-ID: <55DDEA3A.8070603@redhat.com>

The following AArch64-only webrev against hs-comp (fix contributed by
Hui Sha of Linaro) fixes several problems with biased locking on AArch64

  http://cr.openjdk.java.net/~adinn/8134322/webrev.00/

I have reviewed the patch. It requires one more AArch64 reviewer who can
also commit it to hs-comp (Andrew Haley?).

When I tested it running netbeans with biased locking enabled it seemed
(just by feel) to improve performance. A follow up to enable biased
locking by default might be worth investigating.

n.b. I built this on hs-comp on top of my (almost but not yet committed)
patch for 8080293 so that patch also gets shown in the webrev. The two
patches are independent and should both commit cleanly with or without
the other.


From aph at redhat.com  Wed Aug 26 16:40:34 2015
From: aph at redhat.com (Andrew Haley)
Date: Wed, 26 Aug 2015 17:40:34 +0100
Subject: [aarch64-port-dev ] RFR: AArch64: 8134322: Fix several errors
 in C2 biased locking implementation
In-Reply-To: <55DDEA3A.8070603@redhat.com>
References: <CAF1YaiD9wBDhbrdv8T2nAbegJ-tOoEyeAqM6pMCnOy6vEGHK1A@mail.gmail.com>	<55D79701.9090407@oracle.com>
	<55DB3754.9090504@redhat.com> <55DDEA3A.8070603@redhat.com>
Message-ID: <55DDEC02.6080502@redhat.com>

On 08/26/2015 05:32 PM, Andrew Dinn wrote:
> The following AArch64-only webrev against hs-comp (fix contributed by
> Hui Sha of Linaro) fixes several problems with biased locking on AArch64
> 
>   http://cr.openjdk.java.net/~adinn/8134322/webrev.00/
> 
> I have reviewed the patch. It requires one more AArch64 reviewer who can
> also commit it to hs-comp (Andrew Haley?).

This is OK.

Andrew.


From aph at redhat.com  Thu Aug 27 15:19:15 2015
From: aph at redhat.com (Andrew Haley)
Date: Thu, 27 Aug 2015 16:19:15 +0100
Subject: [aarch64-port-dev ] aarch64: C2 fast lock/unlock issues
In-Reply-To: <55D79701.9090407@oracle.com>
References: <CAF1YaiD9wBDhbrdv8T2nAbegJ-tOoEyeAqM6pMCnOy6vEGHK1A@mail.gmail.com>
	<55D79701.9090407@oracle.com>
Message-ID: <55DF2A73.2060207@redhat.com>

On 08/21/2015 10:24 PM, Vladimir Kozlov wrote:
> Thank you for report and suggested fixes.
> 
> CC to aarch64 port developers.

May I push this, or do I need you to review it?

Andrew.


From aph at redhat.com  Thu Aug 27 17:06:11 2015
From: aph at redhat.com (aph at redhat.com)
Date: Thu, 27 Aug 2015 17:06:11 +0000
Subject: [aarch64-port-dev ] hg: aarch64-port/jdk8/hotspot: 8078521:
	AARCH64: Add AArch64 SA	support
Message-ID: <201508271706.t7RH6BhB028427@aojmv0008.oracle.com>

Changeset: 1c4ef82d32d1
Author:    aph
Date:      2015-08-20 09:10 +0000
URL:       http://hg.openjdk.java.net/aarch64-port/jdk8/hotspot/rev/1c4ef82d32d1

8078521: AARCH64: Add AArch64 SA support
Summary: Add AArch64 SA support

! agent/make/Makefile
! agent/src/os/linux/LinuxDebuggerLocal.c
! agent/src/os/linux/Makefile
! agent/src/share/classes/sun/jvm/hotspot/HSDB.java
+ agent/src/share/classes/sun/jvm/hotspot/debugger/MachineDescriptionAARCH64.java
+ agent/src/share/classes/sun/jvm/hotspot/debugger/aarch64/AARCH64ThreadContext.java
! agent/src/share/classes/sun/jvm/hotspot/debugger/linux/LinuxCDebugger.java
+ agent/src/share/classes/sun/jvm/hotspot/debugger/linux/aarch64/LinuxAARCH64CFrame.java
+ agent/src/share/classes/sun/jvm/hotspot/debugger/linux/aarch64/LinuxAARCH64ThreadContext.java
! agent/src/share/classes/sun/jvm/hotspot/debugger/proc/ProcDebuggerLocal.java
+ agent/src/share/classes/sun/jvm/hotspot/debugger/proc/aarch64/ProcAARCH64Thread.java
+ agent/src/share/classes/sun/jvm/hotspot/debugger/proc/aarch64/ProcAARCH64ThreadContext.java
+ agent/src/share/classes/sun/jvm/hotspot/debugger/proc/aarch64/ProcAARCH64ThreadFactory.java
+ agent/src/share/classes/sun/jvm/hotspot/debugger/remote/aarch64/RemoteAARCH64Thread.java
+ agent/src/share/classes/sun/jvm/hotspot/debugger/remote/aarch64/RemoteAARCH64ThreadContext.java
+ agent/src/share/classes/sun/jvm/hotspot/debugger/remote/aarch64/RemoteAARCH64ThreadFactory.java
! agent/src/share/classes/sun/jvm/hotspot/runtime/Threads.java
+ agent/src/share/classes/sun/jvm/hotspot/runtime/aarch64/AARCH64CurrentFrameGuess.java
+ agent/src/share/classes/sun/jvm/hotspot/runtime/aarch64/AARCH64Frame.java
+ agent/src/share/classes/sun/jvm/hotspot/runtime/aarch64/AARCH64JavaCallWrapper.java
+ agent/src/share/classes/sun/jvm/hotspot/runtime/aarch64/AARCH64RegisterMap.java
+ agent/src/share/classes/sun/jvm/hotspot/runtime/linux_aarch64/LinuxAARCH64JavaThreadPDAccess.java
! agent/src/share/classes/sun/jvm/hotspot/utilities/PlatformInfo.java
! make/linux/makefiles/defs.make
! make/linux/makefiles/sa.make
! make/linux/makefiles/saproc.make
! make/sa.files


From vladimir.kozlov at oracle.com  Fri Aug 28 02:58:04 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 27 Aug 2015 19:58:04 -0700
Subject: [aarch64-port-dev ] RFR: AArch64: 8134322: Fix several errors
 in C2 biased locking implementation
In-Reply-To: <55DDEA3A.8070603@redhat.com>
References: <CAF1YaiD9wBDhbrdv8T2nAbegJ-tOoEyeAqM6pMCnOy6vEGHK1A@mail.gmail.com>
	<55D79701.9090407@oracle.com> <55DB3754.9090504@redhat.com>
	<55DDEA3A.8070603@redhat.com>
Message-ID: <55DFCE3C.3040201@oracle.com>

Looks good to me.

Vladimir

On 8/26/15 9:32 AM, Andrew Dinn wrote:
> The following AArch64-only webrev against hs-comp (fix contributed by
> Hui Sha of Linaro) fixes several problems with biased locking on AArch64
>
>    http://cr.openjdk.java.net/~adinn/8134322/webrev.00/
>
> I have reviewed the patch. It requires one more AArch64 reviewer who can
> also commit it to hs-comp (Andrew Haley?).
>
> When I tested it running netbeans with biased locking enabled it seemed
> (just by feel) to improve performance. A follow up to enable biased
> locking by default might be worth investigating.
>
> n.b. I built this on hs-comp on top of my (almost but not yet committed)
> patch for 8080293 so that patch also gets shown in the webrev. The two
> patches are independent and should both commit cleanly with or without
> the other.
>
>
>
>
>

From vladimir.kozlov at oracle.com  Fri Aug 28 03:00:56 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 27 Aug 2015 20:00:56 -0700
Subject: [aarch64-port-dev ] aarch64: C2 fast lock/unlock issues
In-Reply-To: <55DF2A73.2060207@redhat.com>
References: <CAF1YaiD9wBDhbrdv8T2nAbegJ-tOoEyeAqM6pMCnOy6vEGHK1A@mail.gmail.com>
	<55D79701.9090407@oracle.com> <55DF2A73.2060207@redhat.com>
Message-ID: <55DFCEE8.3060806@oracle.com>

Reviewed. Push.

We need at least one official *Reviewer* for all changesets.

Thanks,
Vladimir

On 8/27/15 8:19 AM, Andrew Haley wrote:
> On 08/21/2015 10:24 PM, Vladimir Kozlov wrote:
>> Thank you for report and suggested fixes.
>>
>> CC to aarch64 port developers.
>
> May I push this, or do I need you to review it?
>
> Andrew.
>

From aph at redhat.com  Fri Aug 28 15:21:52 2015
From: aph at redhat.com (Andrew Haley)
Date: Fri, 28 Aug 2015 16:21:52 +0100
Subject: [aarch64-port-dev ] RFR: AArch64: 8134322: Fix several errors
 in C2 biased locking implementation
In-Reply-To: <55DDEA3A.8070603@redhat.com>
References: <CAF1YaiD9wBDhbrdv8T2nAbegJ-tOoEyeAqM6pMCnOy6vEGHK1A@mail.gmail.com>	<55D79701.9090407@oracle.com>
	<55DB3754.9090504@redhat.com> <55DDEA3A.8070603@redhat.com>
Message-ID: <55E07C90.6000905@redhat.com>

On 08/26/2015 05:32 PM, Andrew Dinn wrote:
> n.b. I built this on hs-comp on top of my (almost but not yet committed)
> patch for 8080293 so that patch also gets shown in the webrev. The two
> patches are independent and should both commit cleanly with or without
> the other.

Sure, but I have no way to get the changesets.  All that is
in the webrev is one patch.

Please send me the changesets from "hg export"

Andrew.


From adinn at redhat.com  Sat Aug 29 08:53:21 2015
From: adinn at redhat.com (Andrew Dinn)
Date: Sat, 29 Aug 2015 09:53:21 +0100
Subject: [aarch64-port-dev ] RFR: AArch64: 8134322: Fix several errors
 in C2 biased locking implementation
In-Reply-To: <55E07C90.6000905@redhat.com>
References: <CAF1YaiD9wBDhbrdv8T2nAbegJ-tOoEyeAqM6pMCnOy6vEGHK1A@mail.gmail.com>
	<55D79701.9090407@oracle.com> <55DB3754.9090504@redhat.com>
	<55DDEA3A.8070603@redhat.com> <55E07C90.6000905@redhat.com>
Message-ID: <55E17301.2050406@redhat.com>

On 28/08/15 16:21, Andrew Haley wrote:
> On 08/26/2015 05:32 PM, Andrew Dinn wrote:
>> n.b. I built this on hs-comp on top of my (almost but not yet committed)
>> patch for 8080293 so that patch also gets shown in the webrev. The two
>> patches are independent and should both commit cleanly with or without
>> the other.
> 
> Sure, but I have no way to get the changesets.  All that is
> in the webrev is one patch.
> 
> Please send me the changesets from "hg export"

change set below


regards,


Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in UK and Wales under Company Registration No. 3798903
Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters
(USA), Michael O'Neill (Ireland)

----- 8< -------- 8< -------- 8< -------- 8< -------- 8< -------- 8< ---
# HG changeset patch
# User adinn
# Date 1440605639 -3600
#      Wed Aug 26 17:13:59 2015 +0100
# Node ID e699c7a83f8e9630f79ef83066a64d1e38c72ce2
# Parent  d7185fbee7e5e0f40ae4286f29055bed5da6bd58
8134322: AArch64: Fix several errors in C2 biased locking implementation
Summary: Several errors in C2 biased locking require fixing
Reviewed-by: adinn
Contributed-by: Hui Shi (hui.shi at linaro.org)

diff -r d7185fbee7e5 -r e699c7a83f8e src/cpu/aarch64/vm/aarch64.ad
--- a/src/cpu/aarch64/vm/aarch64.ad	Tue Aug 25 08:25:38 2015 -0400
+++ b/src/cpu/aarch64/vm/aarch64.ad	Wed Aug 26 17:13:59 2015 +0100
@@ -4896,12 +4896,12 @@
       return;
     }

-    if (UseBiasedLocking) {
-      __ biased_locking_enter(disp_hdr, oop, box, tmp, true, cont);
+    if (UseBiasedLocking && !UseOptoBiasInlining) {
+      __ biased_locking_enter(box, oop, disp_hdr, tmp, true, cont);
     }

     // Handle existing monitor
-    if (EmitSync & 0x02) {
+    if ((EmitSync & 0x02) == 0) {
       // we can use AArch64's bit test and branch here but
       // markoopDesc does not define a bit index just the bit value
       // so assert in case the bit pos changes
@@ -5041,7 +5041,7 @@
       return;
     }

-    if (UseBiasedLocking) {
+    if (UseBiasedLocking && !UseOptoBiasInlining) {
       __ biased_locking_exit(oop, tmp, cont);
     }
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fastlock.patch
Type: text/x-patch
Size: 1330 bytes
Desc: not available
URL: <http://mail.openjdk.java.net/pipermail/aarch64-port-dev/attachments/20150829/dffcf114/fastlock.patch>

From felix.yang at linaro.org  Sat Aug 29 13:13:27 2015
From: felix.yang at linaro.org (Felix Yang)
Date: Sat, 29 Aug 2015 21:13:27 +0800
Subject: [aarch64-port-dev ] RFR: Disable C2 peephole by default for aarch64
Message-ID: <CACc5Y6SZJgrbb9XDZawvZOXQRJWF9gYcbgrOvtG9+f1vs+DvdQ@mail.gmail.com>

Hi JIT members,

  Currently, the C2 peephole optimization is only enabled by default for
x86 & aarch64 port.
  But we don't have any peephole rules for aarch64 port, scanning the
instruction stream
  in PhasePeephole::do_transform does not make sense but a waste of time
for this port.
  So I am disabling this pass for aarch64 port by default, can anyone
sponsor this if approved?

PATCH:
diff -r a6acc533dfef src/cpu/aarch64/vm/c2_globals_aarch64.hpp
--- a/src/cpu/aarch64/vm/c2_globals_aarch64.hpp Wed Aug 19 16:16:54 2015
+0100
+++ b/src/cpu/aarch64/vm/c2_globals_aarch64.hpp Fri Aug 28 09:28:07 2015
+0800
@@ -69,7 +69,7 @@

 // Peephole and CISC spilling both break the graph, and so makes the
 // scheduler sick.
-define_pd_global(bool, OptoPeephole,                 true);
+define_pd_global(bool, OptoPeephole,                 false);
 define_pd_global(bool, UseCISCSpill,                 true);
 define_pd_global(bool, OptoScheduling,               false);
 define_pd_global(bool, OptoBundling,                 false);