From adinn at redhat.com Mon Aug 3 09:28:48 2015 From: adinn at redhat.com (Andrew Dinn) Date: Mon, 03 Aug 2015 10:28:48 +0100 Subject: [aarch64-port-dev ] RFR: AArch64: Fix error introduced into AArch64 CodeCache by commit for 8130309 Message-ID: <55BF3450.5020008@redhat.com> The following /AArch64-only/ webrev fixes some problems introduced into the AArch64 codecache routines by the recent fix for JDK-8130309 committed to hs-comp http://cr.openjdk.java.net/~adinn/8132875/webrev.00/ With this patch the hs-comp tree compiles and runs correctly on AArch64. Reviews welcome. regards, Andrew Dinn ----------- From aph at redhat.com Mon Aug 3 10:05:12 2015 From: aph at redhat.com (Andrew Haley) Date: Mon, 03 Aug 2015 11:05:12 +0100 Subject: [aarch64-port-dev ] RFR: AArch64: Fix error introduced into AArch64 CodeCache by commit for 8130309 In-Reply-To: <55BF3450.5020008@redhat.com> References: <55BF3450.5020008@redhat.com> Message-ID: <55BF3CD8.6020905@redhat.com> On 03/08/15 10:28, Andrew Dinn wrote: > With this patch the hs-comp tree compiles and runs correctly on AArch64. > Reviews welcome. That looks right to me. Thanks, Andrew. From adinn at redhat.com Mon Aug 3 11:03:13 2015 From: adinn at redhat.com (Andrew Dinn) Date: Mon, 03 Aug 2015 12:03:13 +0100 Subject: [aarch64-port-dev ] RFR: AArch64: Fix error introduced into AArch64 CodeCache by commit for 8130309 In-Reply-To: <55BF3CD8.6020905@redhat.com> References: <55BF3450.5020008@redhat.com> <55BF3CD8.6020905@redhat.com> Message-ID: <55BF4A71.2040308@redhat.com> On 03/08/15 11:05, Andrew Haley wrote: > On 03/08/15 10:28, Andrew Dinn wrote: >> With this patch the hs-comp tree compiles and runs correctly on AArch64. >> Reviews welcome. > > That looks right to me. Thanks for the review. Could someone from the compiler team with the relevant access rights also please review and then sponsor this patch for inclusion into hs-comp? 
It would be good to get this fix into that repo before the original patch goes up into jdk9. Thanks. regards, Andrew Dinn ----------- From tobias.hartmann at oracle.com Mon Aug 3 11:35:01 2015 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 03 Aug 2015 13:35:01 +0200 Subject: [aarch64-port-dev ] RFR: AArch64: Fix error introduced into AArch64 CodeCache by commit for 8130309 In-Reply-To: <55BF4A71.2040308@redhat.com> References: <55BF3450.5020008@redhat.com> <55BF3CD8.6020905@redhat.com> <55BF4A71.2040308@redhat.com> Message-ID: <55BF51E5.5090708@oracle.com> Hi Andrew, thanks for fixing that! Seems like I forgot the manual aarch64 testing for my latest webrev.. The changes look good. I can sponsor and push them into hs-comp after an official reviewer approved them. Best, Tobias On 03.08.2015 13:03, Andrew Dinn wrote: > On 03/08/15 11:05, Andrew Haley wrote: >> On 03/08/15 10:28, Andrew Dinn wrote: >>> With this patch the hs-comp tree compiles and runs correctly on AArch64. >>> Reviews welcome. >> >> That looks right to me. > > Thanks for the review. > > Could someone from the compiler team with the relevant access right also > please review and then sponsor this patch for inclusion into hs-comp? It > would be good to get this fix into that repo before the original patch > goes up into jdk9. Thanks. > > regards, > > > Andrew Dinn > ----------- > From adinn at redhat.com Mon Aug 3 11:42:02 2015 From: adinn at redhat.com (Andrew Dinn) Date: Mon, 03 Aug 2015 12:42:02 +0100 Subject: [aarch64-port-dev ] RFR: AArch64: Fix error introduced into AArch64 CodeCache by commit for 8130309 In-Reply-To: <55BF51E5.5090708@oracle.com> References: <55BF3450.5020008@redhat.com> <55BF3CD8.6020905@redhat.com> <55BF4A71.2040308@redhat.com> <55BF51E5.5090708@oracle.com> Message-ID: <55BF538A.9080409@redhat.com> Hi Tobias, On 03/08/15 12:35, Tobias Hartmann wrote: > thanks for fixing that! Seems like I forgot the manual aarch64 > testing for my latest webrev.. 
> > The changes look good. I can sponsor and push them into hs-comp after > an official reviewer approved them. Thanks, Tobias. Do we need another reviewer for an AArch64-only change? If so then could someone from hs-comp (or a hotspot dev) volunteer -- Ed Nevill is on holiday so we don't have another AArch64 port dev to review? Thanks! regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in UK and Wales under Company Registration No. 3798903 Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters (USA), Michael O'Neill (Ireland) From tobias.hartmann at oracle.com Mon Aug 3 13:20:46 2015 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 03 Aug 2015 15:20:46 +0200 Subject: [aarch64-port-dev ] RFR: AArch64: Fix error introduced into AArch64 CodeCache by commit for 8130309 In-Reply-To: <55BF538A.9080409@redhat.com> References: <55BF3450.5020008@redhat.com> <55BF3CD8.6020905@redhat.com> <55BF4A71.2040308@redhat.com> <55BF51E5.5090708@oracle.com> <55BF538A.9080409@redhat.com> Message-ID: <55BF6AAE.2030101@oracle.com> On 03.08.2015 13:42, Andrew Dinn wrote: > Hi Tobias, > > On 03/08/15 12:35, Tobias Hartmann wrote: >> thanks for fixing that! Seems like I forgot the manual aarch64 >> testing for my latest webrev.. >> >> The changes look good. I can sponsor and push them into hs-comp after >> an official reviewer approved them. > > Thanks, Tobias. > > Do we need another reviewer for an AArch64-only change? If so then could > someone from hs-comp (or a hotspot dev) volunteer -- Ed Nevill is on > holiday so we don't have another AArch64 port dev to review? I think we need at least one JDK 9 reviewer (I'm not an official reviewer). Best, Tobias > > Thanks! > > regards, > > > Andrew Dinn > ----------- > Senior Principal Software Engineer > Red Hat UK Ltd > Registered in UK and Wales under Company Registration No. 
3798903 > Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters > (USA), Michael O'Neill (Ireland) > From vladimir.kozlov at oracle.com Mon Aug 3 16:07:03 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 3 Aug 2015 09:07:03 -0700 Subject: [aarch64-port-dev ] RFR: AArch64: Fix error introduced into AArch64 CodeCache by commit for 8130309 In-Reply-To: <55BF3450.5020008@redhat.com> References: <55BF3450.5020008@redhat.com> Message-ID: <55BF91A7.7030608@oracle.com> Looks good. Thanks, Vladimir On 8/3/15 2:28 AM, Andrew Dinn wrote: > The following /AArch64-only/ webrev fixes some problems introduced into > the AArch64 codecache routines by the recent fix for JDK-8130309 > committed to hs-comp > > http://cr.openjdk.java.net/~adinn/8132875/webrev.00/ > > With this patch the hs-comp tree compiles and runs correctly on AArch64. > Reviews welcome. > > regards, > > > Andrew Dinn > ----------- > From adinn at redhat.com Wed Aug 5 19:55:26 2015 From: adinn at redhat.com (Andrew Dinn) Date: Wed, 05 Aug 2015 20:55:26 +0100 Subject: [aarch64-port-dev ] RFR: AArch64: Fix error introduced into AArch64 CodeCache by commit for 8130309 In-Reply-To: <55BF91A7.7030608@oracle.com> References: <55BF3450.5020008@redhat.com> <55BF91A7.7030608@oracle.com> Message-ID: <55C26A2E.1060902@redhat.com> On 03/08/15 17:07, Vladimir Kozlov wrote: > Looks good. Thanks for the review, Vladimir (and apologies for the delay in replying -- I was traveling for a meeting). regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in UK and Wales under Company Registration No. 
3798903 Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters (USA), Michael O'Neill (Ireland) From adinn at redhat.com Thu Aug 6 14:04:49 2015 From: adinn at redhat.com (Andrew Dinn) Date: Thu, 06 Aug 2015 15:04:49 +0100 Subject: [aarch64-port-dev ] RFR: AArch64: Fix error introduced into AArch64 CodeCache by commit for 8130309 In-Reply-To: <20150806082441.GK12948@rbackman> References: <55BF3450.5020008@redhat.com> <55BF3CD8.6020905@redhat.com> <55BF4A71.2040308@redhat.com> <55BF51E5.5090708@oracle.com> <55BF538A.9080409@redhat.com> <20150806082441.GK12948@rbackman> Message-ID: <55C36981.5070001@redhat.com> On 06/08/15 09:24, Rickard Bäckman wrote: > Looks good. Thanks, Rickard! regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in UK and Wales under Company Registration No. 3798903 Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters (USA), Michael O'Neill (Ireland) From aleksey.shipilev at oracle.com Mon Aug 10 08:17:08 2015 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Mon, 10 Aug 2015 11:17:08 +0300 Subject: [aarch64-port-dev ] RFR (S) 8131682: C1 should use multibyte nops everywhere In-Reply-To: <55B8FFA0.4070105@oracle.com> References: <55A9033C.2030302@oracle.com> <55AA0567.6070602@oracle.com> <55AD0AE0.3060803@oracle.com> <55AEAB5C.8050307@oracle.com> <55AF500B.9000505@oracle.com> <55B20A16.7020300@oracle.com> <4295855A5C1DE049A61835A1887419CC2D00A472@DEWDFEMB12A.global.corp.sap> <55B5F651.5090509@oracle.com> <55B6063B.3070604@redhat.com> <55B8DC9C.7010003@oracle.com> <55B8FFA0.4070105@oracle.com> Message-ID: <55C85E04.7060002@oracle.com> On 07/29/2015 07:30 PM, Dean Long wrote: > On 7/29/2015 7:01 AM, Aleksey Shipilev wrote: >> On 07/27/2015 01:21 PM, Andrew Haley wrote: >>> On 27/07/15 10:13, Aleksey Shipilev wrote: >>>> Thanks Goetz! Fixed the assembler_ppc.inline.hpp. >>>> >>>> Andrew/Edward, are you OK with AArch64 part? 
>>>> http://cr.openjdk.java.net/~shade/8131682/webrev.02/ >>> I agree that it looks good. >> So, we have reviews from Dean Long, Goetz Lindenmaier, Andrew Dinn, and >> Andrew Haley. Still no Capital (R)eviewers. >> >> Otherwise, I think we are good to go. I respinned the JPRT with >> open+closed sources, and it would seem the changes in closed sources are >> not required. > > The changes to sparc and ppc may not be required anymore. Excellent, please sponsor! http://cr.openjdk.java.net/~shade/8131682/8131682.changeset Thanks, -Aleksey From dean.long at oracle.com Mon Aug 10 19:42:48 2015 From: dean.long at oracle.com (Dean) Date: Mon, 10 Aug 2015 12:42:48 -0700 Subject: [aarch64-port-dev ] aarch64-port-dev ] RFR (S) 8131682: C1 should use multibyte nops everywhere Message-ID: I can sponsor this. How about removing the ppc, aarch64, and sparc changes and making sure it still builds? dl Aleksey Shipilev wrote: >On 07/29/2015 07:30 PM, Dean Long wrote: >> On 7/29/2015 7:01 AM, Aleksey Shipilev wrote: >>> On 07/27/2015 01:21 PM, Andrew Haley wrote: >>>> On 27/07/15 10:13, Aleksey Shipilev wrote: >>>>> Thanks Goetz! Fixed the assembler_ppc.inline.hpp. >>>>> >>>>> Andrew/Edward, are you OK with AArch64 part? >>>>> http://cr.openjdk.java.net/~shade/8131682/webrev.02/ >>>> I agree that it looks good. >>> So, we have reviews from Dean Long, Goetz Lindenmaier, Andrew Dinn, and >>> Andrew Haley. Still no Capital (R)eviewers. >>> >>> Otherwise, I think we are good to go. I respinned the JPRT with >>> open+closed sources, and it would seem the changes in closed sources are >>> not required. >> >> The changes to sparc and ppc may not be required anymore. > >Excellent, please sponsor! 
> http://cr.openjdk.java.net/~shade/8131682/8131682.changeset > >Thanks, >-Aleksey > > From dean.long at oracle.com Mon Aug 10 19:57:03 2015 From: dean.long at oracle.com (Dean) Date: Mon, 10 Aug 2015 12:57:03 -0700 Subject: [aarch64-port-dev ] aarch64-port-dev ] aarch64-port-dev ] RFR (S) 8131682: C1 should use multibyte nops everywhere Message-ID: Did you get a Reviewer yet? dl Dean wrote: >I can sponsor this. How about removing the ppc, aarch64, and sparc changes and making sure it still builds? > >dl > > >Aleksey Shipilev wrote: >>On 07/29/2015 07:30 PM, Dean Long wrote: >>> On 7/29/2015 7:01 AM, Aleksey Shipilev wrote: >>>> On 07/27/2015 01:21 PM, Andrew Haley wrote: >>>>> On 27/07/15 10:13, Aleksey Shipilev wrote: >>>>>> Thanks Goetz! Fixed the assembler_ppc.inline.hpp. >>>>>> >>>>>> Andrew/Edward, are you OK with AArch64 part? >>>>>> http://cr.openjdk.java.net/~shade/8131682/webrev.02/ >>>>> I agree that it looks good. >>>> So, we have reviews from Dean Long, Goetz Lindenmaier, Andrew Dinn, and >>>> Andrew Haley. Still no Capital (R)eviewers. >>>> >>>> Otherwise, I think we are good to go. I respinned the JPRT with >>>> open+closed sources, and it would seem the changes in closed sources are >>>> not required. >>> >>> The changes to sparc and ppc may not be required anymore. >> >>Excellent, please sponsor! >> http://cr.openjdk.java.net/~shade/8131682/8131682.changeset >> >>Thanks, >>-Aleksey >> >> From aleksey.shipilev at oracle.com Tue Aug 11 09:27:38 2015 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Tue, 11 Aug 2015 12:27:38 +0300 Subject: [aarch64-port-dev ] aarch64-port-dev ] aarch64-port-dev ] RFR (S) 8131682: C1 should use multibyte nops everywhere In-Reply-To: References: Message-ID: <55C9C00A.3040302@oracle.com> Hi Dean, Ah yes, since we now use MacroAssembler::align to produce the effective alignment, we can drop the platform-specific changes. 
ARM and PPC ports may rewire their own MacroAssemblers if there are potentially better nop sequences. New changeset: http://cr.openjdk.java.net/~shade/8131682/8131682.changeset Tested it builds and runs with full JPRT. See the "Reviewed-by" line there. I think there are Reviewers there... Thanks, -Aleksey On 08/10/2015 10:57 PM, Dean wrote: > Did you get a Reviewer yet? > > dl > > > Dean wrote: >> I can sponsor this. How about removing the ppc, aarch64, and sparc changes and making sure it still builds? >> >> dl >> >> >> Aleksey Shipilev wrote: >>> On 07/29/2015 07:30 PM, Dean Long wrote: >>>> On 7/29/2015 7:01 AM, Aleksey Shipilev wrote: >>>>> On 07/27/2015 01:21 PM, Andrew Haley wrote: >>>>>> On 27/07/15 10:13, Aleksey Shipilev wrote: >>>>>>> Thanks Goetz! Fixed the assembler_ppc.inline.hpp. >>>>>>> >>>>>>> Andrew/Edward, are you OK with AArch64 part? >>>>>>> http://cr.openjdk.java.net/~shade/8131682/webrev.02/ >>>>>> I agree that it looks good. >>>>> So, we have reviews from Dean Long, Goetz Lindenmaier, Andrew Dinn, and >>>>> Andrew Haley. Still no Capital (R)eviewers. >>>>> >>>>> Otherwise, I think we are good to go. I respinned the JPRT with >>>>> open+closed sources, and it would seem the changes in closed sources are >>>>> not required. >>>> >>>> The changes to sparc and ppc may not be required anymore. >>> >>> Excellent, please sponsor! >>> http://cr.openjdk.java.net/~shade/8131682/8131682.changeset >>> >>> Thanks, >>> -Aleksey >>> >>> From edward.nevill at gmail.com Tue Aug 11 15:57:33 2015 From: edward.nevill at gmail.com (Edward Nevill) Date: Tue, 11 Aug 2015 16:57:33 +0100 Subject: [aarch64-port-dev ] 8133352: aarch64: generates constrained unpredictable instructions Message-ID: <1439308653.5920.16.camel@mylittlepony.linaroharston> Hi, Webrev http://cr.openjdk.java.net/~enevill/8133352/ fixes an issue reported by one of our partners where aarch64 jdk9 is still generating constrained unpredictable instructions. 
The two cases being generated are STXR Rs, Rt, [Rn] where Rs == Rt and LDAXP Rt1, Rt2, [Rn] where Rt1 == Rt2 (this case is only generated in the assembler smoke test and is not generated in real code) On the particular vendor's HW the behavior for these instructions is to generate a SIGILL. Unfortunately the fix for this is non-trivial, the reason being that STXR Rs, Rt, [Rn] requires Rs != Rt and Rs != Rn, however we only have 2 scratch registers. The solution I have adopted is to add an additional temp arg to the routines in macroAssembler and pass down an additional temp from the top level usage, but this involves a non-trivial amount of code changes. The alternative solution would be to create a temp by pushing a register on the stack. I am in the process of testing this but I want to get people's feedback as to whether this is the right solution or whether there is some better way of doing it. Thanks for your help, Ed. From vladimir.kozlov at oracle.com Tue Aug 11 16:55:08 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 11 Aug 2015 09:55:08 -0700 Subject: [aarch64-port-dev ] 8133352: aarch64: generates constrained unpredictable instructions In-Reply-To: <1439308653.5920.16.camel@mylittlepony.linaroharston> References: <1439308653.5920.16.camel@mylittlepony.linaroharston> Message-ID: <55CA28EC.7060109@oracle.com> I think it depends on how expensive push/pop is on arm64. In c2 generated code you may have introduced spills to stack around GetAndAdd code since you use an additional register (in .ad). So you are saving on stack anyway. On the other hand your changes (third temp) are not so big and I think acceptable. Thanks, Vladimir On 8/11/15 8:57 AM, Edward Nevill wrote: > Hi, > > Webrev http://cr.openjdk.java.net/~enevill/8133352/ > > fixes an issue reported by one of our partners where aarch64 jdk9 is still generating constrained unpredictable instructions. 
> > The two cases being generates are > > STXR Rs, Rt, [Rn] where Rs == Rt > > and > > LDAXP Rt1, Rt2, [Rn] where Rt1 == Rt2 (this case is only generated in the assembler smoke test and is not generated in real code) > > On the particular vendors HW the behavior for these instructions is to generate a SIGILL. > > Unfortunately the fix for this is non trivial, the reason being that > > STXR Rs, Rt, [Rn] > > requires Rs != Rt != Rn however we only have 2 scratch registers. > > The solution I have adopted is to add an addition temp arg to the routines in macroAssembler and pass down an additional temp from the top level usage, but this involves a non trivial amount of code changes. > > The alternative solution would be create a temp by pushing a register on the stack. > > I am in the process of testing this but I want to get peoples feedback as to whether this is the right solution or whether there is some better way of doing it. > > Thanks for your help, > Ed. > > From aph at redhat.com Wed Aug 12 16:10:00 2015 From: aph at redhat.com (Andrew Haley) Date: Wed, 12 Aug 2015 17:10:00 +0100 Subject: [aarch64-port-dev ] JDK 8: Remove code in C2 to generate ldar/stlr Message-ID: <55CB6FD8.1070203@redhat.com> This pulls out all the old (and probably bogus) code which uses acquire and release instructions. I'm hoping Andrew Dinn will follow up with a backport of his (much better) JDK9 code which requires no changes to the code in share/ . It cleanly passes jcstress. Andrew. -------------- next part -------------- # HG changeset patch # User aph # Date 1439395472 0 # Wed Aug 12 16:04:32 2015 +0000 # Node ID 8ec803e97a0d578eaeaf8375ee295a5928eb546f # Parent 4c3f7e682e4849e84211e7c27379900fd0571173 Remove code which uses load acquire and store release. Revert to plain old memory fences. 
diff -r 4c3f7e682e48 -r 8ec803e97a0d src/cpu/aarch64/vm/aarch64.ad --- a/src/cpu/aarch64/vm/aarch64.ad Fri Jul 31 16:29:03 2015 +0100 +++ b/src/cpu/aarch64/vm/aarch64.ad Wed Aug 12 16:04:32 2015 +0000 @@ -962,35 +962,10 @@ } }; - bool preceded_by_ordered_load(const Node *barrier); - %} source %{ - // AArch64 has load acquire and store release instructions which we - // use for ordered memory accesses, e.g. for volatiles. The ideal - // graph generator also inserts memory barriers around volatile - // accesses, and we don't want to generate both barriers and acq/rel - // instructions. So, when we emit a MemBarAcquire we look back in - // the ideal graph for an ordered load and only emit the barrier if - // we don't find one. - -bool preceded_by_ordered_load(const Node *barrier) { - Node *x = barrier->lookup(TypeFunc::Parms); - - if (! x) - return false; - - if (x->is_DecodeNarrowPtr()) - x = x->in(1); - - if (x->is_Load()) - return ! x->as_Load()->is_unordered(); - - return false; -} - #define __ _masm. 
// advance declaratuons for helper functions to convert register @@ -5620,7 +5595,7 @@ instruct loadB(iRegINoSp dst, memory mem) %{ match(Set dst (LoadB mem)); - predicate(n->as_Load()->is_unordered()); + // predicate(n->as_Load()->is_unordered()); ins_cost(4 * INSN_COST); format %{ "ldrsbw $dst, $mem\t# byte" %} @@ -5634,7 +5609,7 @@ instruct loadB2L(iRegLNoSp dst, memory mem) %{ match(Set dst (ConvI2L (LoadB mem))); - predicate(n->in(1)->as_Load()->is_unordered()); + // predicate(n->in(1)->as_Load()->is_unordered()); ins_cost(4 * INSN_COST); format %{ "ldrsb $dst, $mem\t# byte" %} @@ -5648,7 +5623,7 @@ instruct loadUB(iRegINoSp dst, memory mem) %{ match(Set dst (LoadUB mem)); - predicate(n->as_Load()->is_unordered()); + // predicate(n->as_Load()->is_unordered()); ins_cost(4 * INSN_COST); format %{ "ldrbw $dst, $mem\t# byte" %} @@ -5662,7 +5637,7 @@ instruct loadUB2L(iRegLNoSp dst, memory mem) %{ match(Set dst (ConvI2L (LoadUB mem))); - predicate(n->in(1)->as_Load()->is_unordered()); + // predicate(n->in(1)->as_Load()->is_unordered()); ins_cost(4 * INSN_COST); format %{ "ldrb $dst, $mem\t# byte" %} @@ -5676,7 +5651,7 @@ instruct loadS(iRegINoSp dst, memory mem) %{ match(Set dst (LoadS mem)); - predicate(n->as_Load()->is_unordered()); + // predicate(n->as_Load()->is_unordered()); ins_cost(4 * INSN_COST); format %{ "ldrshw $dst, $mem\t# short" %} @@ -5690,7 +5665,7 @@ instruct loadS2L(iRegLNoSp dst, memory mem) %{ match(Set dst (ConvI2L (LoadS mem))); - predicate(n->in(1)->as_Load()->is_unordered()); + // predicate(n->in(1)->as_Load()->is_unordered()); ins_cost(4 * INSN_COST); format %{ "ldrsh $dst, $mem\t# short" %} @@ -5704,7 +5679,7 @@ instruct loadUS(iRegINoSp dst, memory mem) %{ match(Set dst (LoadUS mem)); - predicate(n->as_Load()->is_unordered()); + // predicate(n->as_Load()->is_unordered()); ins_cost(4 * INSN_COST); format %{ "ldrh $dst, $mem\t# short" %} @@ -5718,7 +5693,7 @@ instruct loadUS2L(iRegLNoSp dst, memory mem) %{ match(Set dst (ConvI2L (LoadUS 
mem))); - predicate(n->in(1)->as_Load()->is_unordered()); + // predicate(n->in(1)->as_Load()->is_unordered()); ins_cost(4 * INSN_COST); format %{ "ldrh $dst, $mem\t# short" %} @@ -5732,7 +5707,7 @@ instruct loadI(iRegINoSp dst, memory mem) %{ match(Set dst (LoadI mem)); - predicate(n->as_Load()->is_unordered()); + // predicate(n->as_Load()->is_unordered()); ins_cost(4 * INSN_COST); format %{ "ldrw $dst, $mem\t# int" %} @@ -5746,7 +5721,7 @@ instruct loadI2L(iRegLNoSp dst, memory mem) %{ match(Set dst (ConvI2L (LoadI mem))); - predicate(n->in(1)->as_Load()->is_unordered()); + // predicate(n->in(1)->as_Load()->is_unordered()); ins_cost(4 * INSN_COST); format %{ "ldrsw $dst, $mem\t# int" %} @@ -5760,7 +5735,7 @@ instruct loadUI2L(iRegLNoSp dst, memory mem, immL_32bits mask) %{ match(Set dst (AndL (ConvI2L (LoadI mem)) mask)); - predicate(n->in(1)->in(1)->as_Load()->is_unordered()); + // predicate(n->in(1)->in(1)->as_Load()->is_unordered()); ins_cost(4 * INSN_COST); format %{ "ldrw $dst, $mem\t# int" %} @@ -5774,7 +5749,7 @@ instruct loadL(iRegLNoSp dst, memory mem) %{ match(Set dst (LoadL mem)); - predicate(n->as_Load()->is_unordered()); + // predicate(n->as_Load()->is_unordered()); ins_cost(4 * INSN_COST); format %{ "ldr $dst, $mem\t# int" %} @@ -5801,7 +5776,7 @@ instruct loadP(iRegPNoSp dst, memory mem) %{ match(Set dst (LoadP mem)); - predicate(n->as_Load()->is_unordered()); + // predicate(n->as_Load()->is_unordered()); ins_cost(4 * INSN_COST); format %{ "ldr $dst, $mem\t# ptr" %} @@ -5815,7 +5790,7 @@ instruct loadN(iRegNNoSp dst, memory mem) %{ match(Set dst (LoadN mem)); - predicate(n->as_Load()->is_unordered()); + // predicate(n->as_Load()->is_unordered()); ins_cost(4 * INSN_COST); format %{ "ldrw $dst, $mem\t# compressed ptr" %} @@ -5829,7 +5804,7 @@ instruct loadKlass(iRegPNoSp dst, memory mem) %{ match(Set dst (LoadKlass mem)); - predicate(n->as_Load()->is_unordered()); + // predicate(n->as_Load()->is_unordered()); ins_cost(4 * INSN_COST); format %{ "ldr 
$dst, $mem\t# class" %} @@ -5843,7 +5818,7 @@ instruct loadNKlass(iRegNNoSp dst, memory mem) %{ match(Set dst (LoadNKlass mem)); - predicate(n->as_Load()->is_unordered()); + // predicate(n->as_Load()->is_unordered()); ins_cost(4 * INSN_COST); format %{ "ldrw $dst, $mem\t# compressed class ptr" %} @@ -5857,7 +5832,7 @@ instruct loadF(vRegF dst, memory mem) %{ match(Set dst (LoadF mem)); - predicate(n->as_Load()->is_unordered()); + // predicate(n->as_Load()->is_unordered()); ins_cost(4 * INSN_COST); format %{ "ldrs $dst, $mem\t# float" %} @@ -5871,7 +5846,7 @@ instruct loadD(vRegD dst, memory mem) %{ match(Set dst (LoadD mem)); - predicate(n->as_Load()->is_unordered()); + // predicate(n->as_Load()->is_unordered()); ins_cost(4 * INSN_COST); format %{ "ldrd $dst, $mem\t# double" %} @@ -6102,7 +6077,7 @@ instruct storeB(iRegIorL2I src, memory mem) %{ match(Set mem (StoreB mem src)); - predicate(n->as_Store()->is_unordered()); +// predicate(n->as_Store()->is_unordered()); ins_cost(INSN_COST); format %{ "strb $src, $mem\t# byte" %} @@ -6116,7 +6091,7 @@ instruct storeimmB0(immI0 zero, memory mem) %{ match(Set mem (StoreB mem zero)); - predicate(n->as_Store()->is_unordered()); +// predicate(n->as_Store()->is_unordered()); ins_cost(INSN_COST); format %{ "strb zr, $mem\t# byte" %} @@ -6130,7 +6105,7 @@ instruct storeC(iRegIorL2I src, memory mem) %{ match(Set mem (StoreC mem src)); - predicate(n->as_Store()->is_unordered()); +// predicate(n->as_Store()->is_unordered()); ins_cost(INSN_COST); format %{ "strh $src, $mem\t# short" %} @@ -6143,7 +6118,7 @@ instruct storeimmC0(immI0 zero, memory mem) %{ match(Set mem (StoreC mem zero)); - predicate(n->as_Store()->is_unordered()); +// predicate(n->as_Store()->is_unordered()); ins_cost(INSN_COST); format %{ "strh zr, $mem\t# short" %} @@ -6158,7 +6133,7 @@ instruct storeI(iRegIorL2I src, memory mem) %{ match(Set mem(StoreI mem src)); - predicate(n->as_Store()->is_unordered()); +// predicate(n->as_Store()->is_unordered()); 
ins_cost(INSN_COST); format %{ "strw $src, $mem\t# int" %} @@ -6171,7 +6146,7 @@ instruct storeimmI0(immI0 zero, memory mem) %{ match(Set mem(StoreI mem zero)); - predicate(n->as_Store()->is_unordered()); +// predicate(n->as_Store()->is_unordered()); ins_cost(INSN_COST); format %{ "strw zr, $mem\t# int" %} @@ -6185,7 +6160,7 @@ instruct storeL(iRegL src, memory mem) %{ match(Set mem (StoreL mem src)); - predicate(n->as_Store()->is_unordered()); +// predicate(n->as_Store()->is_unordered()); ins_cost(INSN_COST); format %{ "str $src, $mem\t# int" %} @@ -6199,7 +6174,7 @@ instruct storeimmL0(immL0 zero, memory mem) %{ match(Set mem (StoreL mem zero)); - predicate(n->as_Store()->is_unordered()); +// predicate(n->as_Store()->is_unordered()); ins_cost(INSN_COST); format %{ "str zr, $mem\t# int" %} @@ -6213,7 +6188,7 @@ instruct storeP(iRegP src, memory mem) %{ match(Set mem (StoreP mem src)); - predicate(n->as_Store()->is_unordered()); +// predicate(n->as_Store()->is_unordered()); ins_cost(INSN_COST); format %{ "str $src, $mem\t# ptr" %} @@ -6227,7 +6202,7 @@ instruct storeimmP0(immP0 zero, memory mem) %{ match(Set mem (StoreP mem zero)); - predicate(n->as_Store()->is_unordered()); +// predicate(n->as_Store()->is_unordered()); ins_cost(INSN_COST); format %{ "str zr, $mem\t# ptr" %} @@ -6286,7 +6261,7 @@ instruct storeN(iRegN src, memory mem) %{ match(Set mem (StoreN mem src)); - predicate(n->as_Store()->is_unordered()); +// predicate(n->as_Store()->is_unordered()); ins_cost(INSN_COST); format %{ "strw $src, $mem\t# compressed ptr" %} @@ -6300,8 +6275,9 @@ %{ match(Set mem (StoreN mem zero)); predicate(Universe::narrow_oop_base() == NULL && - Universe::narrow_klass_base() == NULL && - n->as_Store()->is_unordered()); + Universe::narrow_klass_base() == NULL// && + // n->as_Store()->is_unordered() + ); ins_cost(INSN_COST); format %{ "strw rheapbase, $mem\t# compressed ptr (rheapbase==0)" %} @@ -6315,7 +6291,7 @@ instruct storeF(vRegF src, memory mem) %{ match(Set mem (StoreF 
mem src)); - predicate(n->as_Store()->is_unordered()); +// predicate(n->as_Store()->is_unordered()); ins_cost(INSN_COST); format %{ "strs $src, $mem\t# float" %} @@ -6332,7 +6308,7 @@ instruct storeD(vRegD src, memory mem) %{ match(Set mem (StoreD mem src)); - predicate(n->as_Store()->is_unordered()); +// predicate(n->as_Store()->is_unordered()); ins_cost(INSN_COST); format %{ "strd $src, $mem\t# double" %} @@ -6345,7 +6321,7 @@ // Store Compressed Klass Pointer instruct storeNKlass(iRegN src, memory mem) %{ - predicate(n->as_Store()->is_unordered()); +// predicate(n->as_Store()->is_unordered()); match(Set mem (StoreNKlass mem src)); ins_cost(INSN_COST); @@ -6395,312 +6371,6 @@ ins_pipe(iload_prefetch); %} -// ---------------- volatile loads and stores ---------------- - -// Load Byte (8 bit signed) -instruct loadB_volatile(iRegINoSp dst, /* sync_memory*/indirect mem) -%{ - match(Set dst (LoadB mem)); - - ins_cost(VOLATILE_REF_COST); - format %{ "ldarsb $dst, $mem\t# byte" %} - - ins_encode(aarch64_enc_ldarsb(dst, mem)); - - ins_pipe(pipe_serial); -%} - -// Load Byte (8 bit signed) into long -instruct loadB2L_volatile(iRegLNoSp dst, /* sync_memory*/indirect mem) -%{ - match(Set dst (ConvI2L (LoadB mem))); - - ins_cost(VOLATILE_REF_COST); - format %{ "ldarsb $dst, $mem\t# byte" %} - - ins_encode(aarch64_enc_ldarsb(dst, mem)); - - ins_pipe(pipe_serial); -%} - -// Load Byte (8 bit unsigned) -instruct loadUB_volatile(iRegINoSp dst, /* sync_memory*/indirect mem) -%{ - match(Set dst (LoadUB mem)); - - ins_cost(VOLATILE_REF_COST); - format %{ "ldarb $dst, $mem\t# byte" %} - - ins_encode(aarch64_enc_ldarb(dst, mem)); - - ins_pipe(pipe_serial); -%} - -// Load Byte (8 bit unsigned) into long -instruct loadUB2L_volatile(iRegLNoSp dst, /* sync_memory*/indirect mem) -%{ - match(Set dst (ConvI2L (LoadUB mem))); - - ins_cost(VOLATILE_REF_COST); - format %{ "ldarb $dst, $mem\t# byte" %} - - ins_encode(aarch64_enc_ldarb(dst, mem)); - - ins_pipe(pipe_serial); -%} - -// Load Short 
(16 bit signed) -instruct loadS_volatile(iRegINoSp dst, /* sync_memory*/indirect mem) -%{ - match(Set dst (LoadS mem)); - - ins_cost(VOLATILE_REF_COST); - format %{ "ldarshw $dst, $mem\t# short" %} - - ins_encode(aarch64_enc_ldarshw(dst, mem)); - - ins_pipe(pipe_serial); -%} - -instruct loadUS_volatile(iRegINoSp dst, /* sync_memory*/indirect mem) -%{ - match(Set dst (LoadUS mem)); - - ins_cost(VOLATILE_REF_COST); - format %{ "ldarhw $dst, $mem\t# short" %} - - ins_encode(aarch64_enc_ldarhw(dst, mem)); - - ins_pipe(pipe_serial); -%} - -// Load Short/Char (16 bit unsigned) into long -instruct loadUS2L_volatile(iRegLNoSp dst, /* sync_memory*/indirect mem) -%{ - match(Set dst (ConvI2L (LoadUS mem))); - - ins_cost(VOLATILE_REF_COST); - format %{ "ldarh $dst, $mem\t# short" %} - - ins_encode(aarch64_enc_ldarh(dst, mem)); - - ins_pipe(pipe_serial); -%} - -// Load Short/Char (16 bit signed) into long -instruct loadS2L_volatile(iRegLNoSp dst, /* sync_memory*/indirect mem) -%{ - match(Set dst (ConvI2L (LoadS mem))); - - ins_cost(VOLATILE_REF_COST); - format %{ "ldarh $dst, $mem\t# short" %} - - ins_encode(aarch64_enc_ldarsh(dst, mem)); - - ins_pipe(pipe_serial); -%} - -// Load Integer (32 bit signed) -instruct loadI_volatile(iRegINoSp dst, /* sync_memory*/indirect mem) -%{ - match(Set dst (LoadI mem)); - - ins_cost(VOLATILE_REF_COST); - format %{ "ldarw $dst, $mem\t# int" %} - - ins_encode(aarch64_enc_ldarw(dst, mem)); - - ins_pipe(pipe_serial); -%} - -// Load Integer (32 bit unsigned) into long -instruct loadUI2L_volatile(iRegLNoSp dst, /* sync_memory*/indirect mem, immL_32bits mask) -%{ - match(Set dst (AndL (ConvI2L (LoadI mem)) mask)); - - ins_cost(VOLATILE_REF_COST); - format %{ "ldarw $dst, $mem\t# int" %} - - ins_encode(aarch64_enc_ldarw(dst, mem)); - - ins_pipe(pipe_serial); -%} - -// Load Long (64 bit signed) -instruct loadL_volatile(iRegLNoSp dst, /* sync_memory*/indirect mem) -%{ - match(Set dst (LoadL mem)); - - ins_cost(VOLATILE_REF_COST); - format %{ "ldar 
$dst, $mem\t# int" %} - - ins_encode(aarch64_enc_ldar(dst, mem)); - - ins_pipe(pipe_serial); -%} - -// Load Pointer -instruct loadP_volatile(iRegPNoSp dst, /* sync_memory*/indirect mem) -%{ - match(Set dst (LoadP mem)); - - ins_cost(VOLATILE_REF_COST); - format %{ "ldar $dst, $mem\t# ptr" %} - - ins_encode(aarch64_enc_ldar(dst, mem)); - - ins_pipe(pipe_serial); -%} - -// Load Compressed Pointer -instruct loadN_volatile(iRegNNoSp dst, /* sync_memory*/indirect mem) -%{ - match(Set dst (LoadN mem)); - - ins_cost(VOLATILE_REF_COST); - format %{ "ldarw $dst, $mem\t# compressed ptr" %} - - ins_encode(aarch64_enc_ldarw(dst, mem)); - - ins_pipe(pipe_serial); -%} - -// Load Float -instruct loadF_volatile(vRegF dst, /* sync_memory*/indirect mem) -%{ - match(Set dst (LoadF mem)); - - ins_cost(VOLATILE_REF_COST); - format %{ "ldars $dst, $mem\t# float" %} - - ins_encode( aarch64_enc_fldars(dst, mem) ); - - ins_pipe(pipe_serial); -%} - -// Load Double -instruct loadD_volatile(vRegD dst, /* sync_memory*/indirect mem) -%{ - match(Set dst (LoadD mem)); - - ins_cost(VOLATILE_REF_COST); - format %{ "ldard $dst, $mem\t# double" %} - - ins_encode( aarch64_enc_fldard(dst, mem) ); - - ins_pipe(pipe_serial); -%} - -// Store Byte -instruct storeB_volatile(iRegIorL2I src, /* sync_memory*/indirect mem) -%{ - match(Set mem (StoreB mem src)); - - ins_cost(VOLATILE_REF_COST); - format %{ "stlrb $src, $mem\t# byte" %} - - ins_encode(aarch64_enc_stlrb(src, mem)); - - ins_pipe(pipe_serial); -%} - -// Store Char/Short -instruct storeC_volatile(iRegIorL2I src, /* sync_memory*/indirect mem) -%{ - match(Set mem (StoreC mem src)); - - ins_cost(VOLATILE_REF_COST); - format %{ "stlrh $src, $mem\t# short" %} - - ins_encode(aarch64_enc_stlrh(src, mem)); - - ins_pipe(pipe_serial); -%} - -// Store Integer - -instruct storeI_volatile(iRegIorL2I src, /* sync_memory*/indirect mem) -%{ - match(Set mem(StoreI mem src)); - - ins_cost(VOLATILE_REF_COST); - format %{ "stlrw $src, $mem\t# int" %} - - 
ins_encode(aarch64_enc_stlrw(src, mem)); - - ins_pipe(pipe_serial); -%} - -// Store Long (64 bit signed) -instruct storeL_volatile(iRegL src, /* sync_memory*/indirect mem) -%{ - match(Set mem (StoreL mem src)); - - ins_cost(VOLATILE_REF_COST); - format %{ "stlr $src, $mem\t# int" %} - - ins_encode(aarch64_enc_stlr(src, mem)); - - ins_pipe(pipe_serial); -%} - -// Store Pointer -instruct storeP_volatile(iRegP src, /* sync_memory*/indirect mem) -%{ - match(Set mem (StoreP mem src)); - - ins_cost(VOLATILE_REF_COST); - format %{ "stlr $src, $mem\t# ptr" %} - - ins_encode(aarch64_enc_stlr(src, mem)); - - ins_pipe(pipe_serial); -%} - -// Store Compressed Pointer -instruct storeN_volatile(iRegN src, /* sync_memory*/indirect mem) -%{ - match(Set mem (StoreN mem src)); - - ins_cost(VOLATILE_REF_COST); - format %{ "stlrw $src, $mem\t# compressed ptr" %} - - ins_encode(aarch64_enc_stlrw(src, mem)); - - ins_pipe(pipe_serial); -%} - -// Store Float -instruct storeF_volatile(vRegF src, /* sync_memory*/indirect mem) -%{ - match(Set mem (StoreF mem src)); - - ins_cost(VOLATILE_REF_COST); - format %{ "stlrs $src, $mem\t# float" %} - - ins_encode( aarch64_enc_fstlrs(src, mem) ); - - ins_pipe(pipe_serial); -%} - -// TODO -// implement storeImmF0 and storeFImmPacked - -// Store Double -instruct storeD_volatile(vRegD src, /* sync_memory*/indirect mem) -%{ - match(Set mem (StoreD mem src)); - - ins_cost(VOLATILE_REF_COST); - format %{ "stlrd $src, $mem\t# double" %} - - ins_encode( aarch64_enc_fstlrd(src, mem) ); - - ins_pipe(pipe_serial); -%} - -// ---------------- end of volatile loads and stores ---------------- - // ============================================================================ // BSWAP Instructions @@ -6918,20 +6588,6 @@ ins_pipe(pipe_serial); %} -instruct unnecessary_membar_acquire() %{ - predicate(preceded_by_ordered_load(n)); - match(MemBarAcquire); - ins_cost(0); - - format %{ "membar_acquire (elided)" %} - - ins_encode %{ - __ block_comment("membar_acquire 
(elided)"); - %} - - ins_pipe(pipe_class_empty); -%} - instruct membar_acquire() %{ match(MemBarAcquire); ins_cost(VOLATILE_REF_COST); @@ -7016,7 +6672,7 @@ ins_encode %{ __ membar(Assembler::StoreLoad); - %} + %} ins_pipe(pipe_serial); %} diff -r 4c3f7e682e48 -r 8ec803e97a0d src/share/vm/opto/graphKit.cpp --- a/src/share/vm/opto/graphKit.cpp Fri Jul 31 16:29:03 2015 +0100 +++ b/src/share/vm/opto/graphKit.cpp Wed Aug 12 16:04:32 2015 +0000 @@ -3803,8 +3803,7 @@ // Smash zero into card if( !UseConcMarkSweepGC ) { - __ store(__ ctrl(), card_adr, zero, bt, adr_type, - MemNode:: NOT_AARCH64(release) AARCH64_ONLY(unordered)); + __ store(__ ctrl(), card_adr, zero, bt, adr_type, MemNode::release); } else { // Specialized path for CM store barrier __ storeCM(__ ctrl(), card_adr, zero, oop_store, adr_idx, bt, adr_type); diff -r 4c3f7e682e48 -r 8ec803e97a0d src/share/vm/opto/memnode.hpp --- a/src/share/vm/opto/memnode.hpp Fri Jul 31 16:29:03 2015 +0100 +++ b/src/share/vm/opto/memnode.hpp Wed Aug 12 16:04:32 2015 +0000 @@ -540,16 +540,10 @@ // Conservatively release stores of object references in order to // ensure visibility of object initialization. static inline MemOrd release_if_reference(const BasicType t) { -#ifndef AARCH64 const MemOrd mo = (t == T_ARRAY || t == T_ADDRESS || // Might be the address of an object reference (`boxing'). t == T_OBJECT) ? release : unordered; return mo; -#else - // AArch64 doesn't need this because it emits barriers when an - // object is initialized. 
- return unordered; -#endif } // Polymorphic factory method diff -r 4c3f7e682e48 -r 8ec803e97a0d src/share/vm/opto/parse2.cpp --- a/src/share/vm/opto/parse2.cpp Fri Jul 31 16:29:03 2015 +0100 +++ b/src/share/vm/opto/parse2.cpp Wed Aug 12 16:04:32 2015 +0000 @@ -1717,8 +1717,7 @@ a = pop(); // the array itself const TypeOopPtr* elemtype = _gvn.type(a)->is_aryptr()->elem()->make_oopptr(); const TypeAryPtr* adr_type = TypeAryPtr::OOPS; - Node* store = store_oop_to_array(control(), a, d, adr_type, c, elemtype, T_OBJECT, - MemNode:: NOT_AARCH64(release) AARCH64_ONLY(unordered)); + Node* store = store_oop_to_array(control(), a, d, adr_type, c, elemtype, T_OBJECT, MemNode::release); break; } case Bytecodes::_lastore: { diff -r 4c3f7e682e48 -r 8ec803e97a0d src/share/vm/opto/parse3.cpp --- a/src/share/vm/opto/parse3.cpp Fri Jul 31 16:29:03 2015 +0100 +++ b/src/share/vm/opto/parse3.cpp Wed Aug 12 16:04:32 2015 +0000 @@ -281,10 +281,7 @@ // If reference is volatile, prevent following memory ops from // floating down past the volatile write. Also prevents commoning // another volatile read. - // AArch64 uses store release (which does everything we need to keep - // the machine in order) but we still need a compiler barrier here. - if (is_vol) - insert_mem_bar(NOT_AARCH64(Op_MemBarRelease) AARCH64_ONLY(Op_MemBarCPUOrder)); + if (is_vol) insert_mem_bar(Op_MemBarRelease); // Compute address and memory type. int offset = field->offset_in_bytes(); @@ -325,7 +322,7 @@ if (is_vol) { // If not multiple copy atomic, we do the MemBarVolatile before the load. if (!support_IRIW_for_not_multiple_copy_atomic_cpu) { - insert_mem_bar(NOT_AARCH64(Op_MemBarVolatile) AARCH64_ONLY(Op_MemBarCPUOrder)); // Use fat membar + insert_mem_bar(Op_MemBarVolatile); // Use fat membar } // Remember we wrote a volatile field. 
// For not multiple copy atomic cpu (ppc64) a barrier should be issued From aph at redhat.com Wed Aug 12 16:10:33 2015 From: aph at redhat.com (aph at redhat.com) Date: Wed, 12 Aug 2015 16:10:33 +0000 Subject: [aarch64-port-dev ] hg: aarch64-port/jdk8/hotspot: Remove code which uses load acquire and store release. Revert to Message-ID: <201508121610.t7CGAXde009443@aojmv0008.oracle.com> Changeset: 8ec803e97a0d Author: aph Date: 2015-08-12 16:04 +0000 URL: http://hg.openjdk.java.net/aarch64-port/jdk8/hotspot/rev/8ec803e97a0d Remove code which uses load acquire and store release. Revert to plain old memory fences. ! src/cpu/aarch64/vm/aarch64.ad ! src/share/vm/opto/graphKit.cpp ! src/share/vm/opto/memnode.hpp ! src/share/vm/opto/parse2.cpp ! src/share/vm/opto/parse3.cpp From adinn at redhat.com Wed Aug 12 16:15:07 2015 From: adinn at redhat.com (Andrew Dinn) Date: Wed, 12 Aug 2015 17:15:07 +0100 Subject: [aarch64-port-dev ] hg: aarch64-port/jdk8/hotspot: Remove code which uses load acquire and store release. Revert to In-Reply-To: <201508121610.t7CGAXde009443@aojmv0008.oracle.com> References: <201508121610.t7CGAXde009443@aojmv0008.oracle.com> Message-ID: <55CB710B.5060200@redhat.com> On 12/08/15 17:10, aph at redhat.com wrote: > Changeset: 8ec803e97a0d > Author: aph > Date: 2015-08-12 16:04 +0000 > URL: http://hg.openjdk.java.net/aarch64-port/jdk8/hotspot/rev/8ec803e97a0d > > Remove code which uses load acquire and store release. Revert to > plain old memory fences. > > ! src/cpu/aarch64/vm/aarch64.ad > ! src/share/vm/opto/graphKit.cpp > ! src/share/vm/opto/memnode.hpp > ! src/share/vm/opto/parse2.cpp > ! src/share/vm/opto/parse3.cpp That all looks right to me. 
regards,


Andrew Dinn
-----------

From edward.nevill at gmail.com  Wed Aug 12 16:23:32 2015
From: edward.nevill at gmail.com (Edward Nevill)
Date: Wed, 12 Aug 2015 17:23:32 +0100
Subject: [aarch64-port-dev ] 8133352: aarch64: generates constrained unpredictable instructions
In-Reply-To: <55CA28EC.7060109@oracle.com>
References: <1439308653.5920.16.camel@mylittlepony.linaroharston>
 <55CA28EC.7060109@oracle.com>
Message-ID: <1439396612.4820.31.camel@mylittlepony.linaroharston>

On Tue, 2015-08-11 at 09:55 -0700, Vladimir Kozlov wrote:
> I think it depends how expensive push/pop on arm64.
> In c2 generated code you may introduced spills to stack around GetAndAdd code since you use additional register (in
> .ad). So you are saving on stack anyway.
> On other hand your changes (third temp) are not so big and I think acceptable.
> On 8/11/15 8:57 AM, Edward Nevill wrote:
> > Hi,
> >
> > Webrev http://cr.openjdk.java.net/~enevill/8133352/

Hi Vladimir,

Thanks for that.

Another possibility is to use the inverse operation to restore the result after it has been corrupted.

EG.

-#define ATOMIC_OP(LDXR, OP, STXR) \
+#define ATOMIC_OP(LDXR, OP, IOP, STXR) \
 void MacroAssembler::atomic_##OP(Register prev, RegisterOrConstant incr, Register addr) { \
   Register result = rscratch2; \
   if (prev->is_valid()) \
@@ -2120,14 +2125,15 @@
   bind(retry_load); \
   LDXR(result, addr); \
   OP(rscratch1, result, incr); \
-  STXR(rscratch1, rscratch1, addr); \
-  cbnzw(rscratch1, retry_load); \
-  if (prev->is_valid() && prev != result) \
-    mov(prev, result); \
+  STXR(rscratch2, rscratch1, addr); \
+  cbnzw(rscratch2, retry_load); \
+  if (prev->is_valid() && prev != result) { \
+    IOP(prev, rscratch1, incr); \
+  } \
 }

-ATOMIC_OP(ldxr, add, stxr)
-ATOMIC_OP(ldxrw, addw, stxrw)
+ATOMIC_OP(ldxr, add, sub, stxr)
+ATOMIC_OP(ldxrw, addw, subw, stxrw)

This essentially creates the extra register we need by using the inverse operation to restore the result.
It doesn't win any beauty contests, but it is probably the most optimal. All the best, Ed. From dean.long at oracle.com Thu Aug 13 06:25:03 2015 From: dean.long at oracle.com (Dean Long) Date: Wed, 12 Aug 2015 23:25:03 -0700 Subject: [aarch64-port-dev ] aarch64-port-dev ] aarch64-port-dev ] RFR (S) 8131682: C1 should use multibyte nops everywhere In-Reply-To: <55C9C00A.3040302@oracle.com> References: <55C9C00A.3040302@oracle.com> Message-ID: <55CC383F.2070105@oracle.com> OK, I pushed it for you. dl On 8/11/2015 2:27 AM, Aleksey Shipilev wrote: > Hi Dean, > > Ah yes, since we now use MacroAssembler::align to produce the effective > alignment, we can drop the platform-specific changes. ARM and PPC ports > may rewire their own MacroAssemblers if there are potentially better nop > sequences. > > New changeset: > http://cr.openjdk.java.net/~shade/8131682/8131682.changeset > > Tested it builds and runs with full JPRT. > > See the "Reviewed-by" line there. I think there are Reviewers there... > > Thanks, > -Aleksey > > On 08/10/2015 10:57 PM, Dean wrote: >> Did you get a Reviewer yet? >> >> dl >> >> >> Dean wrote: >>> I can sponsor this. How about removing the ppc, aarch64, and sparc changes and making sure it still builds? >>> >>> dl >>> >>> >>> Aleksey Shipilev wrote: >>>> On 07/29/2015 07:30 PM, Dean Long wrote: >>>>> On 7/29/2015 7:01 AM, Aleksey Shipilev wrote: >>>>>> On 07/27/2015 01:21 PM, Andrew Haley wrote: >>>>>>> On 27/07/15 10:13, Aleksey Shipilev wrote: >>>>>>>> Thanks Goetz! Fixed the assembler_ppc.inline.hpp. >>>>>>>> >>>>>>>> Andrew/Edward, are you OK with AArch64 part? >>>>>>>> http://cr.openjdk.java.net/~shade/8131682/webrev.02/ >>>>>>> I agree that it looks good. >>>>>> So, we have reviews from Dean Long, Goetz Lindenmaier, Andrew Dinn, and >>>>>> Andrew Haley. Still no Capital (R)eviewers. >>>>>> >>>>>> Otherwise, I think we are good to go. 
I respinned the JPRT with >>>>>> open+closed sources, and it would seem the changes in closed sources are >>>>>> not required. >>>>> The changes to sparc and ppc may not be required anymore. >>>> Excellent, please sponsor! >>>> http://cr.openjdk.java.net/~shade/8131682/8131682.changeset >>>> >>>> Thanks, >>>> -Aleksey >>>> >>>> > From adinn at redhat.com Thu Aug 13 08:58:02 2015 From: adinn at redhat.com (Andrew Dinn) Date: Thu, 13 Aug 2015 09:58:02 +0100 Subject: [aarch64-port-dev ] Fwd: Re: RFR: 8078743: AARCH64: Extend use of stlr to cater for volatile object stores In-Reply-To: <55CB3FEC.1070709@redhat.com> References: <55CB3FEC.1070709@redhat.com> Message-ID: <55CC5C1A.4060804@redhat.com> Apologies, I replied only to compiler-dev when I should also have posted to aarch64-port-dev. Could someone other than Vladimir please review this patch? Also, I will need a sponsor to commit it to hs-comp if/when it passes review. Thanks! regards, Andrew Dinn ----------- -------- Forwarded Message -------- Subject: Re: RFR: 8078743: AARCH64: Extend use of stlr to cater for volatile object stores Date: Wed, 12 Aug 2015 13:45:32 +0100 From: Andrew Dinn To: Vladimir Kozlov , hotspot-compiler-dev at openjdk.java.net Hi Vladimir, Apologies for the delay in responding to your feedback -- I was traveling for a team meeting all of last week. Here is a revised webrev which includes all the code changes you suggested http://cr.openjdk.java.net/~adinn/8078743/webrev.04 Also, as requested I did some testing on the two AArch64 machines to which I have access. Does it help? Short answer: yes it is well worth doing as it causes no harm on the sort of architecture where you would expect no benefit and helps a lot on the sort of architecture where you would expect it to help. More details below. regards, Andrew Dinn ----------- The Tests --------- I ran some simple tests using the jmh micro-benchmark harness, first using the old style dmb based implementation (i.e. 
passing -XX:+UseBarriersForVolatile) and then using the new style stlr-based
implementation (using -XX:-UseBarriersForVolatile). Each test was run in
each of the 5 relevant GC configs:

  +G1GC
  +CMS +UseCondCardMark
  +CMS -UseCondCardMark
  +Par +UseCondCardMark
  +Par -UseCondCardMark

The tests were derived from Aleksey Shipilev's recently posted CAS test,
tweaked to do volatile stores instead of CASes. Each test employs a
single thread which repeatedly writes a volatile field
(AtomicReference.set). A delay call follows each write
(Blackhole.consumeCPU) with the delay argument varying from 0 to 64. A
single AtomicReference instance is employed throughout the test.

Test one always writes null; test two always writes a fixed object; test
three writes an object newly allocated at each write (example source for
the null write test is included below). This range of tests allows
various elements of the write barrier to be omitted at generate time or
run time, depending upon the GC config.

In each case the result was recorded as the average number of
nanoseconds per write operation (ns/op). I am afraid I am not in a
position to give the actual timings on any specific architecture or,
indeed, name what hardware was used. However, I can give a qualitative
account of what I found and it pretty much accords with Andrew Haley's
expectations.

Main Results
------------

With the first (O-O-O CPU) implementation of AArch64 there was no
statistically significant variation in the ns/op.
With the other (simple pipeline CPU) implementation for most of the
test space there was a very significant improvement (decrease) in ns/op
for the stlr version when compared against the equivalent barrier
implementation

Detailed Results
----------------

The second machine showed some interesting variations in performance
improvement which are worth mentioning:

  - in the best case ns/op was cut by 50% (CMS - UseCondCardMark,
    backoff 0, old value write)

  - at backoff 0 in most cases ns/op was cut by ~30-35% for null/old
    value write and ~15-20% for young value write

  - at backoff 64 in most cases ns/op was cut by ~5-10% (n.b. this is
    mostly to do with the addition of wasted backoff time -- there was only
    a small decrease in the absolute times)

  - with most GC configs greatest improvement was with old value write,
    least improvement with young value write

the above general results did not apply for 2 specific data points

  - with CMS + UseCondCardMark no significant %ge change was seen for
    old value writes

  - with Par + UseCondCardMark no significant %ge change was seen for
    young value writes

These last 2 results are a bit odd.

For both old and young puts CMS + UseCondCardMark requires a dmb ish
after the stlr to ensure the card read does not float above the volatile
store. For null puts the dmb gets elided (type info tells the compiler
no card mark needed). So, the difference here between old and young
writes is unexpected but must be down to the effect of conditional card
marking rather than the barriers vs stlr.

Par + UseCondCardMark employs no synchronization for the card mark.
Once again the null write case will not need a card mark but the other
two cases will. So, once again the disparity in the improvement between
these two cases is unexpected but must be down to the effect of
conditional card marking rather than the barriers vs stlr.
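
For readers less familiar with the UseCondCardMark flag discussed above, here is a minimal sketch of the difference between unconditional and conditional card marking. It is a toy model in plain Java, not HotSpot code: the class name, the 512-byte card size and the clean/dirty byte values are assumptions chosen for illustration, and the sketch does not model the dmb/stlr ordering at all. The only point it makes is that the conditional variant has to read the card before deciding whether to store to it, which is the "card read" that must not float above the volatile store.

```java
import java.util.Arrays;

public class CondCardMarkSketch {
    // Assumed values, for illustration only.
    public static final int CARD_SHIFT = 9;   // one card per 512 bytes of heap
    public static final byte CLEAN = -1;
    public static final byte DIRTY = 0;

    public final byte[] cards;
    public int cardStores = 0;                // stores actually issued to the card table

    public CondCardMarkSketch(int heapBytes) {
        cards = new byte[(heapBytes >>> CARD_SHIFT) + 1];
        Arrays.fill(cards, CLEAN);
    }

    // Models -UseCondCardMark: always store to the card.
    public void markUnconditional(int heapOffset) {
        cards[heapOffset >>> CARD_SHIFT] = DIRTY;
        cardStores++;
    }

    // Models +UseCondCardMark: load the card first, store only if it is not already dirty.
    public void markConditional(int heapOffset) {
        int index = heapOffset >>> CARD_SHIFT;
        if (cards[index] != DIRTY) {
            cards[index] = DIRTY;
            cardStores++;
        }
    }

    public static void main(String[] args) {
        CondCardMarkSketch uncond = new CondCardMarkSketch(4096);
        CondCardMarkSketch cond = new CondCardMarkSketch(4096);
        int[] offsets = {100, 200, 600, 100};
        for (int off : offsets) {
            uncond.markUnconditional(off);
            cond.markConditional(off);
        }
        System.out.println("unconditional card stores: " + uncond.cardStores);
        System.out.println("conditional card stores:   " + cond.cardStores);
    }
}
```

Repeated writes to an already dirty card cost only a load in the conditional variant, which is one plausible reason the benchmark's repeated stores to a single AtomicReference behave differently under +UseCondCardMark.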
Example Test Class
-------------------

package org.openjdk;

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;

import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

@Warmup(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(3)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Benchmark)
public class VolSetNull {

    AtomicReference ref;

    @Param({"0", "1", "2", "4", "8", "16", "32", "64"})
    int backoff;

    @Setup
    public void setup() {
        ref = new AtomicReference<>();
        ref.set(new Object());
    }

    @Benchmark
    public boolean test() {
        Blackhole.consumeCPU(backoff);
        ref.set(null);
        return true;
    }
}

From adinn at redhat.com  Fri Aug 14 09:01:02 2015
From: adinn at redhat.com (Andrew Dinn)
Date: Fri, 14 Aug 2015 10:01:02 +0100
Subject: [aarch64-port-dev ] RFR: 8078743: AARCH64: Extend use of stlr to cater for volatile object stores
In-Reply-To: <55CB3FEC.1070709@redhat.com>
References: <55B8B134.2050803@redhat.com> <55BA78B7.7030300@oracle.com>
 <55CB3FEC.1070709@redhat.com>
Message-ID: <55CDAE4E.5060301@redhat.com>

Any chance of getting my updated patch reviewed by 2 people and
sponsored to go into hs-comp? I now have the follow-on patch ready for
review, for issue

  8080293: AARCH64: Remove unnecessary dmbs from generated CAS code

So it would be very helpful if this one could be checked. Thanks!

regards,


Andrew Dinn
-----------

On 12/08/15 13:45, Andrew Dinn wrote:
> Hi Vladimir,
>
> Apologies for the delay in responding to your feedback -- I was
> traveling for a team meeting all of last week.
>
> Here is a revised webrev which includes all the code changes you suggested
>
> http://cr.openjdk.java.net/~adinn/8078743/webrev.04
>
> Also, as requested I did some testing on the two AArch64 machines to
> which I have access. Does it help?
> Short answer: yes it is well worth
> doing as it causes no harm on the sort of architecture where you would
> expect no benefit and helps a lot on the sort of architecture where you
> would expect it to help. More details below.
>
> regards,
>
>
> Andrew Dinn
> -----------
>
> The Tests
> ---------
>
> I ran some simple tests using the jmh micro-benchmark harness, first
> using the old style dmb based implementation (i.e. passing
> -XX:+UseBarriersForVolatile) and then using the new style stlr-based
> implementation (using -XX:-UseBarriersForVolatile). Each test was run in
> each of the 5 relevant GC configs:
>
> +G1GC
> +CMS +UseCondCardMark
> +CMS -UseCondCardMark
> +Par +UseCondCardMark
> +Par -UseCondCardMark
>
> The tests were derived from Aleksey Shipilev's recently posted CAS test,
> tweaked to do volatile stores instead of CASes. Each test employs a
> single thread which repeatedly writes a volatile field
> (AtomicReference.set). A delay call follows each write
> (Blackhole.consumeCPU) with the delay argument varying from 0 to 64. A
> single AtomicReference instance is employed throughout the test.
>
> Test one always writes null; test two always writes a fixed object; test
> three writes an object newly allocated at each write (example source for
> the null write test is included below). This range of tests allows
> various elements of the write barrier to be omitted at generate time or
> run time, depending upon the GC config.
>
> In each case the result was recorded as the average number of
> nanoseconds per write operation (ns/op). I am afraid I am not in a
> position to give the actual timings on any specific architecture or,
> indeed, name what hardware was used. However, I can give a qualitative
> account of what I found and it pretty much accords with Andrew Haley's
> expectations.
>
> Main Results
> ------------
>
> With the first (O-O-O CPU) implementation of AArch64 there was no
> statistically significant variation in the ns/op.
> > With the other (simple pipeline CPU) implementation for most of the > test space there was a very significant improvement (decrease) in ns/op > for the stlr version when compared against the equivalent barrier > implementation > > Detailed Results > ---------------- > > The second machine showed some interesting variations in performance > improvement which are worth mentioning: > > - in the best case ns/op was cut by 50% (CMS - UseCondCardMark, > backoff 0, old value write) > > - at backoff 0 in most cases ns/op was cut by ~30-35% for null/old > value write and ~15-20% for young value write > > - at backoff 64 in most cases ns/op was cut by ~5-10% (n.b. this is > mostly to do with the addition of wasted backoff time -- there was only > a small decrease in the absolute times) > > - with most GC configs greatest improvement was with old value write, > least improvement with young value write > > the above general results did not apply for 2 specific data points > > - with CMS + UseCondCardMark no significant %ge change was seen for > old value writes > > - with Par + UseCondCardMark no significant %ge change was seen for > young value writes > > These last 2 results are a bit odd. > > For both old and young puts CMS + UseCondCardMark requires a dmb ish > after the stlr to ensure the card read does not float above the volatile > store. For null puts the dmb gets elided (type info tells the compiler > no card mark needed). So, the difference here between old and young > writes is unexpected but must be down to the effect of conditional card > marking rather than the barriers vs stlr. > > Par + UseCondCardMark employs no synchronization for the card mark. > Once again the null write case will not need a card mark but the other > two cases will. So, once again the disparity in the improvement between > these two cases is unexpected but must be down to the effect of > conditional card marking rather than the barriers vs stlr. 
> > Example Test Class > ------------------- > > package org.openjdk; > > import org.openjdk.jmh.annotations.*; > import org.openjdk.jmh.infra.Blackhole; > > import java.util.concurrent.TimeUnit; > import java.util.concurrent.atomic.AtomicReference; > > @Warmup(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS) > @Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS) > @Fork(3) > @BenchmarkMode(Mode.AverageTime) > @OutputTimeUnit(TimeUnit.NANOSECONDS) > @State(Scope.Benchmark) > public class VolSetNull { > > AtomicReference ref; > > @Param({"0", "1", "2", "4", "8", "16", "32", "64"}) > int backoff; > > @Setup > public void setup() { > ref = new AtomicReference<>(); > ref.set(new Object()); > } > > @Benchmark > public boolean test() { > Blackhole.consumeCPU(backoff); > ref.set(null); > return true; > } > } > -- regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in UK and Wales under Company Registration No. 3798903 Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters (USA), Michael O'Neill (Ireland) From edward.nevill at gmail.com Tue Aug 18 13:07:11 2015 From: edward.nevill at gmail.com (Edward Nevill) Date: Tue, 18 Aug 2015 14:07:11 +0100 Subject: [aarch64-port-dev ] 8133352: aarch64: generates constrained unpredictable instructions In-Reply-To: <1439396612.4820.31.camel@mylittlepony.linaroharston> References: <1439308653.5920.16.camel@mylittlepony.linaroharston> <55CA28EC.7060109@oracle.com> <1439396612.4820.31.camel@mylittlepony.linaroharston> Message-ID: <1439903231.5709.5.camel@mylittlepony.linaroharston> Hi, Given that there has been no objections to my proposed solution I have prepared a webrev based on this. http://cr.openjdk.java.net/~enevill/8133352/webrev.01 The original jira issue is here https://bugs.openjdk.java.net/browse/JDK-8133352 I have tested with jtreg hotspot and langtools. Results before and after were identical. 
Hotspot: Test results: passed: 883; failed: 2; error: 10
Langtools: Test results: passed: 3,260; failed: 2

Please review and if OK I will push,
Thanks,
Ed.

On Wed, 2015-08-12 at 17:23 +0100, Edward Nevill wrote:
> On Tue, 2015-08-11 at 09:55 -0700, Vladimir Kozlov wrote:
> > I think it depends how expensive push/pop on arm64.
> > In c2 generated code you may introduced spills to stack around GetAndAdd code since you use additional register (in
> > .ad). So you are saving on stack anyway.
> > On other hand your changes (third temp) are not so big and I think acceptable.
> > On 8/11/15 8:57 AM, Edward Nevill wrote:
> -#define ATOMIC_OP(LDXR, OP, STXR) \
> +#define ATOMIC_OP(LDXR, OP, IOP, STXR) \
>  void MacroAssembler::atomic_##OP(Register prev, RegisterOrConstant incr, Register addr) { \
>    Register result = rscratch2; \
>    if (prev->is_valid()) \
> @@ -2120,14 +2125,15 @@
>    bind(retry_load); \
>    LDXR(result, addr); \
>    OP(rscratch1, result, incr); \
> -  STXR(rscratch1, rscratch1, addr); \
> -  cbnzw(rscratch1, retry_load); \
> -  if (prev->is_valid() && prev != result) \
> -    mov(prev, result); \
> +  STXR(rscratch2, rscratch1, addr); \
> +  cbnzw(rscratch2, retry_load); \
> +  if (prev->is_valid() && prev != result) { \
> +    IOP(prev, rscratch1, incr); \
> +  } \
>  }
>
> -ATOMIC_OP(ldxr, add, stxr)
> -ATOMIC_OP(ldxrw, addw, stxrw)
> +ATOMIC_OP(ldxr, add, sub, stxr)
> +ATOMIC_OP(ldxrw, addw, subw, stxrw)
>
> This essentially creates the extra register we need by using the inverse operation to restore the result.
>
> It doesn't win any beauty contests, but it is probably the most optimal.
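
The inverse-operation idea in the quoted ATOMIC_OP patch can be illustrated outside the assembler. The following is only a sketch in plain Java, not the HotSpot code: the class and method names are made up, and compareAndSet on an AtomicLong stands in for the LDXR/STXR retry loop. The loaded value is deliberately scoped to the loop body so that, as in the patch, it is no longer available once the store has succeeded, and the previous value has to be rebuilt from the updated value by subtracting the increment.

```java
import java.util.concurrent.atomic.AtomicLong;

public class InverseOpSketch {
    // Hypothetical helper: returns the value held before the add.
    public static long fetchAndAdd(AtomicLong cell, long incr) {
        long updated;
        boolean stored;
        do {
            long loaded = cell.get();                      // plays the role of LDXR
            updated = loaded + incr;                       // OP(rscratch1, result, incr)
            stored = cell.compareAndSet(loaded, updated);  // STXR plus the cbnzw retry
        } while (!stored);
        // 'loaded' is out of scope here, mirroring the clobbered register.
        // Apply the inverse operation to recover the previous value.
        return updated - incr;                             // IOP(prev, rscratch1, incr)
    }

    // Convenience wrapper: returns {previous value, value after the add}.
    public static long[] demo(long initial, long incr) {
        AtomicLong cell = new AtomicLong(initial);
        long prev = fetchAndAdd(cell, incr);
        return new long[] { prev, cell.get() };
    }

    public static void main(String[] args) {
        long[] r = demo(40, 2);
        System.out.println("previous=" + r[0] + " current=" + r[1]);
        long[] w = demo(Long.MAX_VALUE, 1);
        System.out.println("previous=" + w[0] + " current=" + w[1]);
    }
}
```

The recovery stays exact even when the addition overflows, because two's complement add and subtract are inverses modulo 2^64; that is why add/sub and addw/subw can be paired in this way.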
From adinn at redhat.com Tue Aug 18 14:50:07 2015 From: adinn at redhat.com (Andrew Dinn) Date: Tue, 18 Aug 2015 15:50:07 +0100 Subject: [aarch64-port-dev ] 8133352: aarch64: generates constrained unpredictable instructions In-Reply-To: <1439903231.5709.5.camel@mylittlepony.linaroharston> References: <1439308653.5920.16.camel@mylittlepony.linaroharston> <55CA28EC.7060109@oracle.com> <1439396612.4820.31.camel@mylittlepony.linaroharston> <1439903231.5709.5.camel@mylittlepony.linaroharston> Message-ID: <55D3461F.8070405@redhat.com> Hi Ed, On 18/08/15 14:07, Edward Nevill wrote: > Given that there has been no objections to my proposed solution I > have prepared a webrev based on this. > > http://cr.openjdk.java.net/~enevill/8133352/webrev.01 > > The original jira issue is here > > https://bugs.openjdk.java.net/browse/JDK-8133352 > > I have tested with jtreg hotspot and langtools. Results before and > after were identical. > > Hotspot: Test results: passed: 883; failed: 2; error: 10 Langtools: > Test results: passed: 3,260; failed: 2 > > Please review and if OK I will push, Your change looks good to me. regards, Andrew Dinn ----------- From aph at redhat.com Tue Aug 18 17:56:19 2015 From: aph at redhat.com (Andrew Haley) Date: Tue, 18 Aug 2015 18:56:19 +0100 Subject: [aarch64-port-dev ] 8133352: aarch64: generates constrained unpredictable instructions In-Reply-To: <55D3461F.8070405@redhat.com> References: <1439308653.5920.16.camel@mylittlepony.linaroharston> <55CA28EC.7060109@oracle.com> <1439396612.4820.31.camel@mylittlepony.linaroharston> <1439903231.5709.5.camel@mylittlepony.linaroharston> <55D3461F.8070405@redhat.com> Message-ID: <55D371C3.6030703@redhat.com> On 08/18/2015 03:50 PM, Andrew Dinn wrote: > Your change looks good to me. Me too. Andrew. 
From c_ejohns at qti.qualcomm.com Tue Aug 18 19:43:23 2015 From: c_ejohns at qti.qualcomm.com (Johnson, E Andrew) Date: Tue, 18 Aug 2015 19:43:23 +0000 Subject: [aarch64-port-dev ] aarch64 build is broken Message-ID: After updating to the most recent sources (either jdk9/dev or jdk9/hs-comp), I am getting the following error: macroAssembler_aarch64.o: In function `MacroAssembler::patch_oop(unsigned char*, unsigned char*)': $HOME/openjdk/jdk9/dev/hotspot/src/cpu/aarch64/vm/macroAssembler_aarch64.cpp:168: undefined reference to `oopDesc::encode_heap_oop(oopDesc*)' collect2: error: ld returned 1 exit status make[8]: *** [libjvm.so] Error 1 The source line in question has not changed since the last successful build, so something changed in some other checkin that is causing this error. Has anyone else encountered this problem? It is failing on both a native aarch64 build and also on a cross_compile build on x86_64. -AndyJ From aph at redhat.com Wed Aug 19 07:32:49 2015 From: aph at redhat.com (Andrew Haley) Date: Wed, 19 Aug 2015 08:32:49 +0100 Subject: [aarch64-port-dev ] aarch64 build is broken In-Reply-To: References: Message-ID: <55D43121.3070702@redhat.com> On 08/18/2015 08:43 PM, Johnson, E Andrew wrote: > Has anyone else encountered this problem? I don't think so; you'll have to debug the build to find out. I guess you've already tried a clean build from scratch. Andrew. 
From edward.nevill at gmail.com Wed Aug 19 12:10:06 2015 From: edward.nevill at gmail.com (Edward Nevill) Date: Wed, 19 Aug 2015 13:10:06 +0100 Subject: [aarch64-port-dev ] RFR: 8133935: aarch64: fails to build from source Message-ID: <1439986206.23660.11.camel@mint> Hi, http://cr.openjdk.java.net/~enevill/8133935/webrev/ fixes the following error in the build macroAssembler_aarch64.o: In function `MacroAssembler::patch_oop(unsigned char*, unsigned char*)': $HOME/openjdk/jdk9/dev/hotspot/src/cpu/aarch64/vm/macroAssembler_aarch64.cpp:168: undefined reference to `oopDesc::encode_heap_oop(oopDesc*)' collect2: error: ld returned 1 exit status make[8]: *** [libjvm.so] Error 1 Please review, Thanks, Ed. PS: Both hs-comp and dev are broken, should I push to both? or just to hs-comp? From edward.nevill at gmail.com Wed Aug 19 13:24:05 2015 From: edward.nevill at gmail.com (Edward Nevill) Date: Wed, 19 Aug 2015 14:24:05 +0100 Subject: [aarch64-port-dev ] RFR: 8133935: aarch64: fails to build from source In-Reply-To: <55D47B72.40509@oracle.com> References: <1439986206.23660.11.camel@mint> <55D47B72.40509@oracle.com> Message-ID: <1439990645.23660.64.camel@mint> Sorry. I wasn't aware of that. New webrev. http://cr.openjdk.java.net/~enevill/8133935/webrev.01/ Thanks, Ed. On Wed, 2015-08-19 at 08:49 -0400, Coleen Phillimore wrote: > Sorry, I should spend more time on stuff like this. The #include list > should be alphabetized. > Coleen > > On 8/19/15 8:10 AM, Edward Nevill wrote: > > Hi, > > > > http://cr.openjdk.java.net/~enevill/8133935/webrev/ > > > > fixes the following error in the build > > > > macroAssembler_aarch64.o: In function `MacroAssembler::patch_oop(unsigned char*, unsigned char*)': > > $HOME/openjdk/jdk9/dev/hotspot/src/cpu/aarch64/vm/macroAssembler_aarch64.cpp:168: undefined reference to `oopDesc::encode_heap_oop(oopDesc*)' > > collect2: error: ld returned 1 exit status > > make[8]: *** [libjvm.so] Error 1 > > > > Please review, > > Thanks, > > Ed. 
> > > > PS: Both hs-comp and dev are broken, should I push to both? or just to hs-comp? > > > > > From edward.nevill at gmail.com Wed Aug 19 13:30:13 2015 From: edward.nevill at gmail.com (Edward Nevill) Date: Wed, 19 Aug 2015 14:30:13 +0100 Subject: [aarch64-port-dev ] RFR: 8078743: AARCH64: Extend use of stlr to cater for volatile object stores In-Reply-To: <55CB3FEC.1070709@redhat.com> References: <55B8B134.2050803@redhat.com> <55BA78B7.7030300@oracle.com> <55CB3FEC.1070709@redhat.com> Message-ID: <1439991013.23660.72.camel@mint> Hi Andrew, I have tested this on two different partner platforms with G1GC, CMS and Parallel GC with and without UseCondCardMark. I have also reviewed the code and and happy with the code and comments. Vladimir, if you are still happy with this may I go ahead and push this on behalf of Andrew Dinn. The changes only affect aarch64.ad. Many thanks, Ed. On Wed, 2015-08-12 at 13:45 +0100, Andrew Dinn wrote: > Hi Vladimir, > > Apologies for the delay in responding to your feedback -- I was > traveling for a team meeting all of last week. > > Here is a revised webrev which includes all the code changes you suggested > > http://cr.openjdk.java.net/~adinn/8078743/webrev.04 > From coleen.phillimore at oracle.com Wed Aug 19 14:10:31 2015 From: coleen.phillimore at oracle.com (Coleen Phillimore) Date: Wed, 19 Aug 2015 10:10:31 -0400 Subject: [aarch64-port-dev ] RFR: 8133935: aarch64: fails to build from source In-Reply-To: <1439990645.23660.64.camel@mint> References: <1439986206.23660.11.camel@mint> <55D47B72.40509@oracle.com> <1439990645.23660.64.camel@mint> Message-ID: <55D48E57.8040805@oracle.com> yes, this looks good. Coleen On 8/19/15 9:24 AM, Edward Nevill wrote: > Sorry. I wasn't aware of that. New webrev. > > http://cr.openjdk.java.net/~enevill/8133935/webrev.01/ > > Thanks, > Ed. > > On Wed, 2015-08-19 at 08:49 -0400, Coleen Phillimore wrote: >> Sorry, I should spend more time on stuff like this. 
The #include list >> should be alphabetized. >> Coleen >> >> On 8/19/15 8:10 AM, Edward Nevill wrote: >>> Hi, >>> >>> http://cr.openjdk.java.net/~enevill/8133935/webrev/ >>> >>> fixes the following error in the build >>> >>> macroAssembler_aarch64.o: In function `MacroAssembler::patch_oop(unsigned char*, unsigned char*)': >>> $HOME/openjdk/jdk9/dev/hotspot/src/cpu/aarch64/vm/macroAssembler_aarch64.cpp:168: undefined reference to `oopDesc::encode_heap_oop(oopDesc*)' >>> collect2: error: ld returned 1 exit status >>> make[8]: *** [libjvm.so] Error 1 >>> >>> Please review, >>> Thanks, >>> Ed. >>> >>> PS: Both hs-comp and dev are broken, should I push to both? or just to hs-comp? >>> >>> > From adinn at redhat.com Wed Aug 19 15:18:12 2015 From: adinn at redhat.com (adinn at redhat.com) Date: Wed, 19 Aug 2015 15:18:12 +0000 Subject: [aarch64-port-dev ] hg: aarch64-port/jdk8: Added tag aarch64-jdk8u60-b24.2 for changeset 6b5ede6157df Message-ID: <201508191518.t7JFICmY016017@aojmv0008.oracle.com> Changeset: 9a781f9c1338 Author: adinn Date: 2015-08-19 16:16 +0100 URL: http://hg.openjdk.java.net/aarch64-port/jdk8/rev/9a781f9c1338 Added tag aarch64-jdk8u60-b24.2 for changeset 6b5ede6157df ! .hgtags From adinn at redhat.com Wed Aug 19 15:18:16 2015 From: adinn at redhat.com (adinn at redhat.com) Date: Wed, 19 Aug 2015 15:18:16 +0000 Subject: [aarch64-port-dev ] hg: aarch64-port/jdk8/corba: Added tag aarch64-jdk8u60-b24.2 for changeset a6046ed1e2d5 Message-ID: <201508191518.t7JFIG6g016025@aojmv0008.oracle.com> Changeset: 63a3a8c3af8c Author: adinn Date: 2015-08-19 16:16 +0100 URL: http://hg.openjdk.java.net/aarch64-port/jdk8/corba/rev/63a3a8c3af8c Added tag aarch64-jdk8u60-b24.2 for changeset a6046ed1e2d5 ! 
.hgtags From adinn at redhat.com Wed Aug 19 15:18:20 2015 From: adinn at redhat.com (adinn at redhat.com) Date: Wed, 19 Aug 2015 15:18:20 +0000 Subject: [aarch64-port-dev ] hg: aarch64-port/jdk8/hotspot: Added tag aarch64-jdk8u60-b24.2 for changeset 8ec803e97a0d Message-ID: <201508191518.t7JFIKbC016035@aojmv0008.oracle.com> Changeset: a6acc533dfef Author: adinn Date: 2015-08-19 16:16 +0100 URL: http://hg.openjdk.java.net/aarch64-port/jdk8/hotspot/rev/a6acc533dfef Added tag aarch64-jdk8u60-b24.2 for changeset 8ec803e97a0d ! .hgtags From adinn at redhat.com Wed Aug 19 15:18:24 2015 From: adinn at redhat.com (adinn at redhat.com) Date: Wed, 19 Aug 2015 15:18:24 +0000 Subject: [aarch64-port-dev ] hg: aarch64-port/jdk8/jaxp: Added tag aarch64-jdk8u60-b24.2 for changeset 57a7047f7875 Message-ID: <201508191518.t7JFIOkj016056@aojmv0008.oracle.com> Changeset: 46fdf7a8c4a2 Author: adinn Date: 2015-08-19 16:16 +0100 URL: http://hg.openjdk.java.net/aarch64-port/jdk8/jaxp/rev/46fdf7a8c4a2 Added tag aarch64-jdk8u60-b24.2 for changeset 57a7047f7875 ! .hgtags From adinn at redhat.com Wed Aug 19 15:18:28 2015 From: adinn at redhat.com (adinn at redhat.com) Date: Wed, 19 Aug 2015 15:18:28 +0000 Subject: [aarch64-port-dev ] hg: aarch64-port/jdk8/jaxws: Added tag aarch64-jdk8u60-b24.2 for changeset afbe78dc93b2 Message-ID: <201508191518.t7JFIS0s016072@aojmv0008.oracle.com> Changeset: 9b9270adaf2d Author: adinn Date: 2015-08-19 16:16 +0100 URL: http://hg.openjdk.java.net/aarch64-port/jdk8/jaxws/rev/9b9270adaf2d Added tag aarch64-jdk8u60-b24.2 for changeset afbe78dc93b2 ! 
.hgtags From adinn at redhat.com Wed Aug 19 15:18:32 2015 From: adinn at redhat.com (adinn at redhat.com) Date: Wed, 19 Aug 2015 15:18:32 +0000 Subject: [aarch64-port-dev ] hg: aarch64-port/jdk8/jdk: Added tag aarch64-jdk8u60-b24.2 for changeset 0b8920048898 Message-ID: <201508191518.t7JFIXQW016082@aojmv0008.oracle.com> Changeset: f4b06f2bc28d Author: adinn Date: 2015-08-19 16:16 +0100 URL: http://hg.openjdk.java.net/aarch64-port/jdk8/jdk/rev/f4b06f2bc28d Added tag aarch64-jdk8u60-b24.2 for changeset 0b8920048898 ! .hgtags From adinn at redhat.com Wed Aug 19 15:18:37 2015 From: adinn at redhat.com (adinn at redhat.com) Date: Wed, 19 Aug 2015 15:18:37 +0000 Subject: [aarch64-port-dev ] hg: aarch64-port/jdk8/langtools: Added tag aarch64-jdk8u60-b24.2 for changeset 40a63630eb87 Message-ID: <201508191518.t7JFIceQ016092@aojmv0008.oracle.com> Changeset: 847cdcb2d035 Author: adinn Date: 2015-08-19 16:16 +0100 URL: http://hg.openjdk.java.net/aarch64-port/jdk8/langtools/rev/847cdcb2d035 Added tag aarch64-jdk8u60-b24.2 for changeset 40a63630eb87 ! .hgtags From adinn at redhat.com Wed Aug 19 15:18:42 2015 From: adinn at redhat.com (adinn at redhat.com) Date: Wed, 19 Aug 2015 15:18:42 +0000 Subject: [aarch64-port-dev ] hg: aarch64-port/jdk8/nashorn: Added tag aarch64-jdk8u60-b24.2 for changeset db6228024490 Message-ID: <201508191518.t7JFIgLf016100@aojmv0008.oracle.com> Changeset: 748cb101e6a0 Author: adinn Date: 2015-08-19 16:16 +0100 URL: http://hg.openjdk.java.net/aarch64-port/jdk8/nashorn/rev/748cb101e6a0 Added tag aarch64-jdk8u60-b24.2 for changeset db6228024490 ! 
.hgtags From vladimir.kozlov at oracle.com Wed Aug 19 16:33:49 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 19 Aug 2015 09:33:49 -0700 Subject: [aarch64-port-dev ] 8133352: aarch64: generates constrained unpredictable instructions In-Reply-To: <1439903231.5709.5.camel@mylittlepony.linaroharston> References: <1439308653.5920.16.camel@mylittlepony.linaroharston> <55CA28EC.7060109@oracle.com> <1439396612.4820.31.camel@mylittlepony.linaroharston> <1439903231.5709.5.camel@mylittlepony.linaroharston> Message-ID: <55D4AFED.5000305@oracle.com> Looks fine to me. I did not see any comments from our colleagues (from RH) who work on arm64. Do they agree with this change? Thanks, Vladimir On 8/18/15 6:07 AM, Edward Nevill wrote: > Hi, > > Given that there has been no objections to my proposed solution I have prepared a webrev based on this. > > http://cr.openjdk.java.net/~enevill/8133352/webrev.01 > > The original jira issue is here > > https://bugs.openjdk.java.net/browse/JDK-8133352 > > I have tested with jtreg hotspot and langtools. Results before and after were identical. > > Hotspot: Test results: passed: 883; failed: 2; error: 10 > Langtools: Test results: passed: 3,260; failed: 2 > > Please review and if OK I will push, > > Thanks, > Ed. > > On Wed, 2015-08-12 at 17:23 +0100, Edward Nevill wrote: >> On Tue, 2015-08-11 at 09:55 -0700, Vladimir Kozlov wrote: >>> I think it depends how expensive push/pop on arm64. >>> In c2 generated code you may introduced spills to stack around GetAndAdd code since you use additional register (in >>> .ad). So you are saving on stack anyway. >>> On other hand your changes (third temp) are not so big and I think acceptable.
>>> On 8/11/15 8:57 AM, Edward Nevill wrote: >> -#define ATOMIC_OP(LDXR, OP, STXR) \ >> +#define ATOMIC_OP(LDXR, OP, IOP, STXR) \ >> void MacroAssembler::atomic_##OP(Register prev, RegisterOrConstant incr, Register addr) { \ >> Register result = rscratch2; \ >> if (prev->is_valid()) \ >> @@ -2120,14 +2125,15 @@ >> bind(retry_load); \ >> LDXR(result, addr); \ >> OP(rscratch1, result, incr); \ >> - STXR(rscratch1, rscratch1, addr); \ >> - cbnzw(rscratch1, retry_load); \ >> - if (prev->is_valid() && prev != result) \ >> - mov(prev, result); \ >> + STXR(rscratch2, rscratch1, addr); \ >> + cbnzw(rscratch2, retry_load); \ >> + if (prev->is_valid() && prev != result) { \ >> + IOP(prev, rscratch1, incr); \ >> + } \ >> } >> >> -ATOMIC_OP(ldxr, add, stxr) >> -ATOMIC_OP(ldxrw, addw, stxrw) >> +ATOMIC_OP(ldxr, add, sub, stxr) >> +ATOMIC_OP(ldxrw, addw, subw, stxrw) >> >> This essentially creates the extra register we need by using the inverse operation to restore the result. >> >> It doesn't win any beauty contests, but it is probably the most optimal. > > > From aph at redhat.com Wed Aug 19 16:37:26 2015 From: aph at redhat.com (Andrew Haley) Date: Wed, 19 Aug 2015 17:37:26 +0100 Subject: [aarch64-port-dev ] 8133352: aarch64: generates constrained unpredictable instructions In-Reply-To: <55D4AFED.5000305@oracle.com> References: <1439308653.5920.16.camel@mylittlepony.linaroharston> <55CA28EC.7060109@oracle.com> <1439396612.4820.31.camel@mylittlepony.linaroharston> <1439903231.5709.5.camel@mylittlepony.linaroharston> <55D4AFED.5000305@oracle.com> Message-ID: <55D4B0C6.7050803@redhat.com> On 08/19/2015 05:33 PM, Vladimir Kozlov wrote: > I did not see any comments from our colleges (from RH) who works on > arm64. Are they agree with this change? Oh yes, absolutely. Andrew. 
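The inverse-operation trick in the quoted ATOMIC_OP patch can be illustrated outside the assembler. What follows is a minimal C++ sketch, not HotSpot code: the function name `fetch_add_two_scratch` and the local variable names are hypothetical, plain variables stand in for rscratch1 and rscratch2, and a single successful LDXR/STXR iteration is assumed (no retry loop and no real atomicity). It shows only that, once the store status has overwritten the register that held the loaded value, subtracting the increment from the new value recovers the old one.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of one successful iteration of the patched loop.
// Unsigned arithmetic is used so that wrap-around is well defined in C++.
uint64_t fetch_add_two_scratch(uint64_t& memory, uint64_t incr) {
    uint64_t scratch2 = memory;           // LDXR: load the old value
    uint64_t scratch1 = scratch2 + incr;  // OP (add): compute the new value
    memory = scratch1;                    // STXR: store the new value ...
    scratch2 = 0;                         // ... and its status (0 = success) replaces the old value
    assert(scratch2 == 0);                // the old value is no longer held in either scratch variable
    return scratch1 - incr;               // IOP (sub): new value minus increment gives the old value
}
```

With memory holding 40 and an increment of 2, the function returns 40 and leaves 42 in memory. The same identity holds across unsigned wrap-around, which is why no third temporary is needed in this model.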
From adinn at redhat.com Wed Aug 19 16:38:59 2015 From: adinn at redhat.com (Andrew Dinn) Date: Wed, 19 Aug 2015 17:38:59 +0100 Subject: [aarch64-port-dev ] 8133352: aarch64: generates constrained unpredictable instructions In-Reply-To: <55D4AFED.5000305@oracle.com> References: <1439308653.5920.16.camel@mylittlepony.linaroharston> <55CA28EC.7060109@oracle.com> <1439396612.4820.31.camel@mylittlepony.linaroharston> <1439903231.5709.5.camel@mylittlepony.linaroharston> <55D4AFED.5000305@oracle.com> Message-ID: <55D4B123.5060403@redhat.com> On 19/08/15 17:33, Vladimir Kozlov wrote: > Looks fine to me. > > I did not see any comments from our colleges (from RH) who works on > arm64. Are they agree with this change? Hmm, maybe the mail server is being slow. Andrew Haley and I both gave it a thumbs up earlier today. regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in UK and Wales under Company Registration No. 3798903 Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters (USA), Michael O'Neill (Ireland) From vladimir.kozlov at oracle.com Wed Aug 19 16:48:53 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 19 Aug 2015 09:48:53 -0700 Subject: [aarch64-port-dev ] 8133352: aarch64: generates constrained unpredictable instructions In-Reply-To: <55D4B123.5060403@redhat.com> References: <1439308653.5920.16.camel@mylittlepony.linaroharston> <55CA28EC.7060109@oracle.com> <1439396612.4820.31.camel@mylittlepony.linaroharston> <1439903231.5709.5.camel@mylittlepony.linaroharston> <55D4AFED.5000305@oracle.com> <55D4B123.5060403@redhat.com> Message-ID: <55D4B375.4010307@oracle.com> Good. Yes, may be it is our internal Oracle's server. But I found your comments in openjdk mailing archive: http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-August/018700.html Thanks, Vladimir On 8/19/15 9:38 AM, Andrew Dinn wrote: > On 19/08/15 17:33, Vladimir Kozlov wrote: >> Looks fine to me. 
>> >> I did not see any comments from our colleges (from RH) who works on >> arm64. Are they agree with this change? > > Hmm, maybe the mail server is being slow. Andrew Haley and I both gave > it a thumbs up earlier today. > > regards, > > > Andrew Dinn > ----------- > Senior Principal Software Engineer > Red Hat UK Ltd > Registered in UK and Wales under Company Registration No. 3798903 > Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters > (USA), Michael O'Neill (Ireland) > From vladimir.kozlov at oracle.com Wed Aug 19 18:22:18 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 19 Aug 2015 11:22:18 -0700 Subject: [aarch64-port-dev ] RFR: 8078743: AARCH64: Extend use of stlr to cater for volatile object stores In-Reply-To: <1439991013.23660.72.camel@mint> References: <55B8B134.2050803@redhat.com> <55BA78B7.7030300@oracle.com> <55CB3FEC.1070709@redhat.com> <1439991013.23660.72.camel@mint> Message-ID: <55D4C95A.4060308@oracle.com> Yes, I agree with latest changes. Thank you, Andrew, for running performance tests! Regards, Vladimir On 8/19/15 6:30 AM, Edward Nevill wrote: > Hi Andrew, > > I have tested this on two different partner platforms with G1GC, CMS and > Parallel GC with and without UseCondCardMark. > > I have also reviewed the code and and happy with the code and comments. > > Vladimir, if you are still happy with this may I go ahead and push this > on behalf of Andrew Dinn. The changes only affect aarch64.ad. > > Many thanks, > Ed. > > On Wed, 2015-08-12 at 13:45 +0100, Andrew Dinn wrote: >> Hi Vladimir, >> >> Apologies for the delay in responding to your feedback -- I was >> traveling for a team meeting all of last week. 
>> >> Here is a revised webrev which includes all the code changes you suggested >> >> http://cr.openjdk.java.net/~adinn/8078743/webrev.04 >> > > From edward.nevill at gmail.com Thu Aug 20 10:47:44 2015 From: edward.nevill at gmail.com (Edward Nevill) Date: Thu, 20 Aug 2015 11:47:44 +0100 Subject: [aarch64-port-dev ] RFR: aarch64: 8133842: C2 generates illegal instructions with int shifts >=32 Message-ID: <1440067664.5989.21.camel@mylittlepony.linaroharston> Hi, The following webrev http://cr.openjdk.java.net/~enevill/8133842/webrev.01/ fixes a problem reported by one of our partners whereby C2 can generate illegal instructions on certain partners HW. JIRA issue here https://bugs.openjdk.java.net/browse/JDK-8133842 The problem occurs when you have a logical or arithmetic instruction with a RHS which is shifted by a constant where (const & 32) != 0, ie the constant is 32..63 or 96..127 etc. For example the following res = i | (j >> 53) generates the instruction orrw Rd, Rn, Rm, ASR #53 This instruction has a 6 bit field for the shift so this would appear to be a legal encoding. However certain partner HW treats this as an undefined instruction generating a SIGILL. The problem was that the rules in aarch64.ad were always anding the constant with 0x3f for both ints and longs. The above webrev fixes this to mask with 0x1f for ints and 0x3f for longs. Tested with hotspot and langtools. Results the same in both cases. Hotspot: Test results: passed: 863; failed: 2; error: 10 Langtools: Test results: passed: 3,263 Thanks for your help with this review, Ed. 
From aph at redhat.com Thu Aug 20 10:56:20 2015 From: aph at redhat.com (Andrew Haley) Date: Thu, 20 Aug 2015 11:56:20 +0100 Subject: [aarch64-port-dev ] RFR: aarch64: 8133842: C2 generates illegal instructions with int shifts >=32 In-Reply-To: <1440067664.5989.21.camel@mylittlepony.linaroharston> References: <1440067664.5989.21.camel@mylittlepony.linaroharston> Message-ID: <55D5B254.1090303@redhat.com> Looks good for C2. OK if you already checked C1! Thanks, Andrew. From adinn at redhat.com Thu Aug 20 11:24:11 2015 From: adinn at redhat.com (Andrew Dinn) Date: Thu, 20 Aug 2015 12:24:11 +0100 Subject: [aarch64-port-dev ] RFR: aarch64: 8133842: C2 generates illegal instructions with int shifts >=32 In-Reply-To: <55D5B254.1090303@redhat.com> References: <1440067664.5989.21.camel@mylittlepony.linaroharston> <55D5B254.1090303@redhat.com> Message-ID: <55D5B8DB.8030508@redhat.com> On 20/08/15 11:56, Andrew Haley wrote: > Looks good for C2. OK if you already checked C1! > > Thanks, > Andrew. What he said :-) regards, Andrew Dinn ----------- From edward.nevill at gmail.com Thu Aug 20 13:10:20 2015 From: edward.nevill at gmail.com (Edward Nevill) Date: Thu, 20 Aug 2015 14:10:20 +0100 Subject: [aarch64-port-dev ] RFR: aarch64: 8133842: C2 generates illegal instructions with int shifts >=32 In-Reply-To: <55D5B254.1090303@redhat.com> References: <1440067664.5989.21.camel@mylittlepony.linaroharston> <55D5B254.1090303@redhat.com> Message-ID: <1440076220.5989.27.camel@mylittlepony.linaroharston> On Thu, 2015-08-20 at 11:56 +0100, Andrew Haley wrote: > Looks good for C2. OK if you already checked C1! Yes. For my test case res += i | (j >> 53); C1 generates 0x000003ff90eb0d68: lsr w3, w1, #21 0x000003ff90eb0d6c: orr w3, w0, w3 0x000003ff90eb0d70: add w3, w3, w2 IE. It doesn't attempt to merge the shift and the or, and it masks the shift with 0x1f getting 21 instead of 53. Below is the code in C1 that does it (from c1_LIRGenerator_aarch64.cpp). 
You can see it ands ints with 0x1f and longs with 0x3f. All the best, Ed. --- cut here --- if (right.is_constant()) { right.dont_load_item(); switch (x->op()) { case Bytecodes::_ishl: { int c = right.get_jint_constant() & 0x1f; __ shift_left(left.result(), c, x->operand()); break; } case Bytecodes::_ishr: { int c = right.get_jint_constant() & 0x1f; __ shift_right(left.result(), c, x->operand()); break; } case Bytecodes::_iushr: { int c = right.get_jint_constant() & 0x1f; __ unsigned_shift_right(left.result(), c, x->operand()); break; } case Bytecodes::_lshl: { int c = right.get_jint_constant() & 0x3f; __ shift_left(left.result(), c, x->operand()); break; } case Bytecodes::_lshr: { int c = right.get_jint_constant() & 0x3f; __ shift_right(left.result(), c, x->operand()); break; } case Bytecodes::_lushr: { int c = right.get_jint_constant() & 0x3f; __ unsigned_shift_right(left.result(), c, x->operand()); break; } default: ShouldNotReachHere(); } } else { --- cut here --- From edward.nevill at gmail.com Thu Aug 20 15:06:35 2015 From: edward.nevill at gmail.com (Edward Nevill) Date: Thu, 20 Aug 2015 16:06:35 +0100 Subject: [aarch64-port-dev ] Patches to backport to jdk8 Message-ID: <1440083195.10528.4.camel@mylittlepony.linaroharston> Hi, Is it OK to backport the following patches from jdk9 to jdk8. All the best, Ed. 
--- CUT HERE --- changeset: 8690:e4304d76473f user: enevill date: Wed Jul 15 16:05:53 2015 +0000 files: src/cpu/aarch64/vm/aarch64.ad description: 8131358: aarch64: test compiler/loopopts/superword/ProdRed_Float.java fails when run with debug VM Summary: fix typo in match rule in vsub2f Reviewed-by: kvn, aph ---------------- changeset: 8693:aa7220a36fb0 user: enevill date: Thu Jul 16 14:16:44 2015 +0000 files: src/cpu/aarch64/vm/assembler_aarch64.cpp src/cpu/aarch64/vm/assembler_aarch64.hpp src/cpu/aarch64/vm/macroAssembler_aarch64.cpp description: 8131483: aarch64: illegal stlxr instructions Summary: Do not generate stlxX with Ws == Xn Reviewed-by: kvn, aph ---------------- changeset: 8721:89a220e70e99 user: enevill date: Fri Jul 17 07:50:36 2015 +0000 files: src/cpu/aarch64/vm/aarch64.ad src/cpu/aarch64/vm/macroAssembler_aarch64.cpp src/cpu/aarch64/vm/macroAssembler_aarch64.hpp description: 8131362: aarch64: C2 does not handle large stack offsets Summary: change spill code to allow large offsets Reviewed-by: kvn, aph ---------------- changeset: 8840:8783515c57ad user: enevill date: Tue Aug 18 12:40:22 2015 +0000 files: src/cpu/aarch64/vm/assembler_aarch64.cpp src/cpu/aarch64/vm/assembler_aarch64.hpp src/cpu/aarch64/vm/interp_masm_aarch64.cpp src/cpu/aarch64/vm/macroAssembler_aarch64.cpp src/cpu/aarch64/vm/macroAssembler_aarch64.hpp src/cpu/aarch64/vm/sharedRuntime_aarch64.cpp src/cpu/aarch64/vm/templateInterpreter_aarch64.cpp description: 8133352: aarch64: generates constrained unpredictable instructions Summary: Fix generation of unpredictable STXR Rs, Rt, [Rn] with Rs == Rt Reviewed-by: kvn, aph, adinn ---------------- changeset: 8842:a94519010b13 tag: tip user: enevill date: Thu Aug 20 09:40:08 2015 +0000 files: src/cpu/aarch64/vm/aarch64.ad src/cpu/aarch64/vm/aarch64_ad.m4 description: 8133842: aarch64: C2 generates illegal instructions with int shifts >=32 Summary: Fix logical operatations combined with shifts >= 32 Reviewed-by: duke --- CUT HERE --- From 
vladimir.kozlov at oracle.com Thu Aug 20 15:55:36 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 20 Aug 2015 08:55:36 -0700 Subject: [aarch64-port-dev ] RFR: aarch64: 8133842: C2 generates illegal instructions with int shifts >=32 In-Reply-To: <1440067664.5989.21.camel@mylittlepony.linaroharston> References: <1440067664.5989.21.camel@mylittlepony.linaroharston> Message-ID: <55D5F878.6030901@oracle.com> Changes are fine but I am curious why results will be the same even if you mask less bits - I don't see predicates which checks shift's value (to select different instructions for different shift values) and you use general immI operand type for it. Thanks, Vladimir On 8/20/15 3:47 AM, Edward Nevill wrote: > Hi, > > The following webrev > > http://cr.openjdk.java.net/~enevill/8133842/webrev.01/ > > fixes a problem reported by one of our partners whereby C2 can generate illegal instructions on certain partners HW. > > JIRA issue here > > https://bugs.openjdk.java.net/browse/JDK-8133842 > > The problem occurs when you have a logical or arithmetic instruction with a RHS which is shifted by a constant where (const & 32) != 0, ie the constant is 32..63 or 96..127 etc. > > For example the following > > res = i | (j >> 53) > > generates the instruction > > orrw Rd, Rn, Rm, ASR #53 > > This instruction has a 6 bit field for the shift so this would appear to be a legal encoding. However certain partner HW treats this as an undefined instruction generating a SIGILL. > > The problem was that the rules in aarch64.ad were always anding the constant with 0x3f for both ints and longs. > > The above webrev fixes this to mask with 0x1f for ints and 0x3f for longs. > > Tested with hotspot and langtools. Results the same in both cases. > > Hotspot: Test results: passed: 863; failed: 2; error: 10 > Langtools: Test results: passed: 3,263 > > Thanks for your help with this review, > Ed. 
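The condition `(const & 32) != 0` from the quoted message can be checked mechanically. This is a minimal C++ sketch, not code from the webrev; all three function names are hypothetical. It compares the mask the message says was used before the fix (0x3f for both ints and longs) with the int mask used after it (0x1f), and checks that the two disagree exactly for the constants the message lists (32..63, 96..127).

```cpp
#include <cassert>

// Hypothetical helpers, named here for illustration only.
constexpr int old_int_shift(int c) { return c & 0x3f; }  // mask applied to ints before the fix
constexpr int new_int_shift(int c) { return c & 0x1f; }  // mask applied to ints after the fix

// True when the old rule would have emitted a shift amount of 32 or more
// for a 32-bit operation.
constexpr bool shift_out_of_int_range(int c) { return (c & 32) != 0; }

static_assert(old_int_shift(53) == 53, "old rule keeps #53");
static_assert(new_int_shift(53) == 21, "new rule emits #21");
static_assert(!shift_out_of_int_range(31) && shift_out_of_int_range(32), "lower boundary");
static_assert(shift_out_of_int_range(63) && !shift_out_of_int_range(64), "upper boundary");

// Walks the constants 0..127 and asserts that the two masks differ
// exactly when bit 5 of the constant is set.
inline void check_masks_disagree_only_when_bit5_set() {
    for (int c = 0; c < 128; ++c) {
        const bool disagree = old_int_shift(c) != new_int_shift(c);
        assert(disagree == shift_out_of_int_range(c));
    }
}
```

For every other constant the two masks give the same shift amount, so in this model the fix changes the emitted instruction only in the cases that were reported as failing.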
> > From aph at redhat.com Thu Aug 20 16:04:51 2015 From: aph at redhat.com (Andrew Haley) Date: Thu, 20 Aug 2015 17:04:51 +0100 Subject: [aarch64-port-dev ] RFR: aarch64: 8133842: C2 generates illegal instructions with int shifts >=32 In-Reply-To: <55D5F878.6030901@oracle.com> References: <1440067664.5989.21.camel@mylittlepony.linaroharston> <55D5F878.6030901@oracle.com> Message-ID: <55D5FAA3.4050200@redhat.com> On 08/20/2015 04:55 PM, Vladimir Kozlov wrote: > Changes are fine but I am curious why results will be the same even if you mask less bits Java VM spec says: The shift distance actually used is always in the range 0 to 31, inclusive, as if value2 were subjected to a bitwise logical AND with the mask value 0x1f. Andrew. From edward.nevill at gmail.com Thu Aug 20 16:06:22 2015 From: edward.nevill at gmail.com (Edward Nevill) Date: Thu, 20 Aug 2015 17:06:22 +0100 Subject: [aarch64-port-dev ] RFR: aarch64: 8133842: C2 generates illegal instructions with int shifts >=32 In-Reply-To: <55D5F878.6030901@oracle.com> References: <1440067664.5989.21.camel@mylittlepony.linaroharston> <55D5F878.6030901@oracle.com> Message-ID: <1440086782.1907.6.camel@mylittlepony.linaroharston> On Thu, 2015-08-20 at 08:55 -0700, Vladimir Kozlov wrote: > Changes are fine but I am curious why results will be the same even if you mask less bits - I don't see predicates which > checks shift's value (to select different instructions for different shift values) and you use general immI operand type > for it. Not sure I understand the question. Currently it generate orrw Rd, Rn, Rm, ASR #53 Rd, Rn and Rm are 32 bit here (for orrw as opposed to orr). So on our partners implementation this generates a SIGILL. The webrev change this to orrw Rd, Rn, Rm, ASR #21 which is correct (for Java). All the best, Ed. 
From vladimir.kozlov at oracle.com Thu Aug 20 16:34:30 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 20 Aug 2015 09:34:30 -0700 Subject: [aarch64-port-dev ] RFR: aarch64: 8133842: C2 generates illegal instructions with int shifts >=32 In-Reply-To: <1440086782.1907.6.camel@mylittlepony.linaroharston> References: <1440067664.5989.21.camel@mylittlepony.linaroharston> <55D5F878.6030901@oracle.com> <1440086782.1907.6.camel@mylittlepony.linaroharston> Message-ID: <55D60196.7000505@oracle.com> Thank you. Andrew answered my question (only values 0-31 are meaningful). I thought you already had masking at the low-level asm instruction level. For example, on SPARC we use u_field(imm5a, 4, 0): void sll( Register s1, int imm5a, Register d ) { emit_int32( op(arith_op) | rd(d) | op3(sll_op3) | rs1(s1) | sx(0) | immed(true) | u_field(imm5a, 4, 0) ); } It looks like you have to mask at a higher level than we do on SPARC. Again, changes are good. Thanks, Vladimir On 8/20/15 9:06 AM, Edward Nevill wrote: > On Thu, 2015-08-20 at 08:55 -0700, Vladimir Kozlov wrote: >> Changes are fine but I am curious why results will be the same even if you mask less bits - I don't see predicates which >> checks shift's value (to select different instructions for different shift values) and you use general immI operand type >> for it. > > Not sure I understand the question. > > Currently it generate > > orrw Rd, Rn, Rm, ASR #53 > > Rd, Rn and Rm are 32 bit here (for orrw as opposed to orr). > > So on our partners implementation this generates a SIGILL. > > The webrev change this to > > orrw Rd, Rn, Rm, ASR #21 > > which is correct (for Java). > > All the best, > Ed.
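The rule quoted from the Java VM spec earlier in this thread is what makes ASR #53 and ASR #21 equivalent for ints. A minimal C++ sketch of that rule follows. `java_ishr`, `shift_53_matches_shift_21` and `or_with_shift_53` are hypothetical helpers written for this illustration, not HotSpot functions, and they model only the semantics of the ishr bytecode, not any instruction encoding.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the ishr bytecode: only the low five bits of the
// shift distance are used, and the shift is arithmetic (sign-preserving).
inline int32_t java_ishr(int32_t value, int32_t distance) {
    const int32_t s = distance & 0x1f;
    // Right-shifting a negative signed value is implementation-defined
    // before C++20, so negative inputs go through the complement, which
    // is non-negative and therefore shifts portably.
    return value >= 0 ? (value >> s) : ~(~value >> s);
}

// For any int j, a distance of 53 behaves exactly like a distance of 21.
inline void shift_53_matches_shift_21(int32_t j) {
    assert(java_ishr(j, 53) == java_ishr(j, 21));
}

// The expression from the bug report: res = i | (j >> 53)
inline int32_t or_with_shift_53(int32_t i, int32_t j) {
    return i | java_ishr(j, 53);
}
```

Because the result depends only on `distance & 0x1f`, masking the constant when the instruction is selected does not change what a Java program observes.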
> > From vladimir.kozlov at oracle.com Fri Aug 21 21:24:17 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 21 Aug 2015 14:24:17 -0700 Subject: [aarch64-port-dev ] aarch64: C2 fast lock/unlock issues In-Reply-To: References: Message-ID: <55D79701.9090407@oracle.com> Thank you for report and suggested fixes. CC to aarch64 port developers. Thanks, Vladimir On 8/21/15 5:21 AM, Hui Shi wrote: > Hi JIT members, > Attached fast_lock.patch fixes issues in fast lock/unlock on aarch64 > platform (in both aarch64-jdk8 and jdk9/hs-comp/hotspot). Could someone > help comments, review or sponsor? > A small test case and PrintAssembly log with/without fix are also > attached for reference. > > To reproduce this issue on aarch64, command line is "java > -XX:-TieredCompilation -XX:+UnlockDiagnosticVMOptions > -XX:-BackgroundCompilation -XX:CompileCommand="compileonly,TestSync.f*" > -XX:+PrintAssembly TestSync" > There are three Issues in aarch64 fast lock/unlock: > *1. Duplicated biased lock checking* > When option UseBiasedLocking and UseOptoBiasInlining are both true, it > doesn't need emit biased_locking_enter in aarch64_enc_fast_lock. This is > redundant as biased locking enter check is already inlined in > PhaseMacroExpand::expand_lock_node. Checking assembly code in orig.asm > > [Inlined biased lock check in PhaseMacroExpand::expand_lock_node] > 0x000003ff88320d94: str x1, [sp] > 0x000003ff88320d98: ldr x10, [x1] > 0x000003ff88320d9c: and x11, x10, #0x7 > 0x000003ff88320da0: cmp x11, #0x5 > 0x000003ff88320da4: b.ne 0x000003ff88320e18 > [Biased lock check expanded in aarch64_enc_fast_lock] > 0x000003ff88320e18: add x12, sp, #0x10 > 0x000003ff88320e1c: ldr x10, [x1] > 0x000003ff88320e20: and x11, x12, #0x7 > 0x000003ff88320e24: cmp x11, #0x5 > 0x000003ff88320e28: b.ne 0x000003ff88320eec > *2. 
Incorrect parameter used in biased_locking_enter in > aarch64_enc_fast_lock* > Checking above code [Biased lock check expanded in > aarch64_enc_fast_lock], x12 is the box register and holding the address > of the lock record on stack. However it is mis-used as mark word in > biased lock checking here. As a result, biased pattern check always > fails because stack pointer is 8 bytes align and x11 must be zero. > Current implementation in aarch64_enc_fast_lock. > /biased_locking_enter(disp_hdr, oop, box, tmp, true, cont);/ > Which should be > /biased_locking_enter(box, oop, disp_hdr, tmp, true, cont); //swap > disp_hdr and box register, disp_hdr is already loaded with object mark word/ > This issue might cause problem when running with option > -XX:-UseOptoBiasInlining in following scenario, let's check above code > in [Biased lock check expanded in aarch64_enc_fast_lock], x12 is box and > x10 is disp_hdr. > 1. Suppose object's mark word (loaded into register x10) is in biased > mode, with content "[biased_thread |epoch|age| 101]" and biased_thread > is executing its synchronized block. > 2. Another thread tries to acquire the same lock. Firstly, it performs > biased pattern check and fails, because "mark word" register used here > is X12 (correct register should be x10). > 3. As x12 is not "biased" (least three significant bits of SP + 0x10 > would never be 101), execution goes to thin lock CAS acquire code > instead of biased lock revoke/rebias code. > 4. Thin lock CAS acquire will succeed because x10's least two > significant bit is 01 (thin lock CAS code uses disp_hdr (x10) as mark > word). Two threads acquire same lock at same time and this is incorrect > behavior. > > *3. Inflate monitor code has typo in aarch64_enc_fast_lock* > Inflated lock test is generated under condition (EmitSync & 0x02), while > generating inflated lock fast path under condition "if ((EmitSync & > 0x02) == 0))". At both location, they should be "if ((EmitSync & 0x02) > == 0) ".
In orig.asm, no instruction branches to inflated lock acquire > fast path at 0x000003ff88320f24. > Issue #1 and #3 does not impact correctness, they introduce redundant > code (double biased lock check) and skip inflated lock fast path check > (_owner is null case). > Fix is in aarch64_enc_fast_lock/aarch64_enc_fast_unlock, this will not > impact C1 and interpreter. Attached patch includes: > 1. Disable generating biased lock handle code in fast_lock/fast_unlock > when UseOptoBiasInlining is true. > 2. Adjust biased_locking_enter's actual parameters, swap disp_hdr and > box register. > 3. Fix typo in inflated monitor handling. > > Regards > Shi Hui From hui.shi at linaro.org Sat Aug 22 00:37:54 2015 From: hui.shi at linaro.org (Hui Shi) Date: Sat, 22 Aug 2015 08:37:54 +0800 Subject: [aarch64-port-dev ] aarch64: C2 fast lock/unlock issues In-Reply-To: <55D79701.9090407@oracle.com> References: <55D79701.9090407@oracle.com> Message-ID: Thanks Vladimir! Regards Shi Hui On 22 August 2015 at 05:24, Vladimir Kozlov wrote: > Thank you for report and suggested fixes. > > CC to aarch64 port developers. > > Thanks, > Vladimir > > On 8/21/15 5:21 AM, Hui Shi wrote: > >> Hi JIT members, >> Attached fast_lock.patch fixes issues in fast lock/unlock on aarch64 >> platform (in both aarch64-jdk8 and jdk9/hs-comp/hotspot). Could someone >> help comments, review or sponsor? >> A small test case and PrintAssembly log with/without fix are also >> attached for reference. >> >> To reproduce this issue on aarch64, command line is "java >> -XX:-TieredCompilation -XX:+UnlockDiagnosticVMOptions >> -XX:-BackgroundCompilation -XX:CompileCommand="compileonly,TestSync.f*" >> -XX:+PrintAssembly TestSync" >> There are three Issues in aarch64 fast lock/unlock: >> *1. Duplicated biased lock checking* >> When option UseBiasedLocking and UseOptoBiasInlining are both true, it >> doesn't need emit biased_locking_enter in aarch64_enc_fast_lock.
This is >> redundant as biased locking enter check is already inlined in >> PhaseMacroExpand::expand_lock_node. Checking assembly code in orig.asm >> >> [Inlined biased lock check in PhaseMacroExpand::expand_lock_node] >> 0x000003ff88320d94: str x1, [sp] >> 0x000003ff88320d98: ldr x10, [x1] >> 0x000003ff88320d9c: and x11, x10, #0x7 >> 0x000003ff88320da0: cmp x11, #0x5 >> 0x000003ff88320da4: b.ne 0x000003ff88320e18 >> [Biased lock check expanded in aarch64_enc_fast_lock] >> 0x000003ff88320e18: add x12, sp, #0x10 >> 0x000003ff88320e1c: ldr x10, [x1] >> 0x000003ff88320e20: and x11, x12, #0x7 >> 0x000003ff88320e24: cmp x11, #0x5 >> 0x000003ff88320e28: b.ne 0x000003ff88320eec >> *2. Incorrect parameter used in biased_locking_enter in >> aarch64_enc_fast_lock* >> Checking above code [Biased lock check expanded in >> aarch64_enc_fast_lock], x12 is the box register and holding the address >> of the lock record on stack. However it is mis-used as mark word in >> biased lock checking here. As a result, biased pattern check always >> fails because stack pointer is 8 bytes align and x11 must be zero. >> Current implementation in aarch64_enc_fast_lock. >> /biased_locking_enter(disp_hdr, oop, box, tmp, true, cont);/ >> Which should be >> /biased_locking_enter(box, oop, disp_hdr, tmp, true, cont); //swap >> disp_hdr and box register, disp_hdr is already loaded with object mark >> word/ >> This issue might cause problem when running with option >> -XX:-UseOptoBiasInlining in following scenario, let's check above code >> in [Biased lock check expanded in aarch64_enc_fast_lock], x12 is box and >> x10 is disp_hdr. >> 1. Suppose object's mark word (loaded into register x10) is in biased >> mode, with content "[biased_thread |epoch|age| 101]" and biased_thread >> is executing its synchronized block. >> 2. Another thread tries to acquire the same lock. Firstly, it performs >> biased pattern check and fails, because "mark word"
register used here
>> is X12 (correct register should be x10).
>> 3. As x12 is not "biased" (least three significant bits of SP + 0x10
>> would never be 101), execution goes to thin lock CAS acquire code
>> instead of biased lock revoke/rebias code.
>> 4. Thin lock CAS acquire will succeed because x10's least two
>> significant bit is 01 (thin lock CAS code uses disp_hdr (x10) as mark
>> word). Two threads acquire same lock at same time and this is incorrect
>> behavior.
>>
>> *3. Inflate monitor code has typo in aarch64_enc_fast_lock*
>> Inflated lock test is generated under condition (EmitSync & 0x02), while
>> generating inflated lock fast path under condition "if ((EmitSync &
>> 0x02) == 0))". At both location, they should be "if ((EmitSync & 0x02)
>> == 0) ". In orig.asm, no instruction branches to inflated lock acquire
>> fast path at 0x000003ff88320f24.
>> Issue #1 and #3 does not impact correctness, they introduce redundant
>> code (double biased lock check) and skip inflated lock fast path check
>> (_owner is null case).
>> Fix is in aarch64_enc_fast_lock/aarch64_enc_fast_unlock, this will not
>> impact C1 and interpreter. Attached patch includes:
>> 1. Disable generating biased lock handle code in fast_lock/fast_unlock
>> when UseOptoBiasInlining is true.
>> 2. Adjust biased_locking_enter's actual parameters, swap disp_hdr and
>> box register.
>> 3. Fix typo in inflated monitor handling.
>>
>> Regards
>> Shi Hui
>>
>

From aph at redhat.com Mon Aug 24 09:31:59 2015
From: aph at redhat.com (Andrew Haley)
Date: Mon, 24 Aug 2015 10:31:59 +0100
Subject: [aarch64-port-dev ] An interesting bug
Message-ID: <55DAE48F.2030403@redhat.com>

This is JDK8 on AArch64. I have an interesting bug which causes a crash. Occasionally, but rarely, I get a crash when accessing an instance of ForkJoinTask. The instance is allocated in the old generation of the parallel scan GC. However, the "this" pointer of the ForkJoinTask points into the middle of another allocated object.
I'm trying to figure out how this might happen. Perhaps the root scan of the thread running the ForkJoinTask did not happen, or for some reason the roots it found were not added to the root set. Or perhaps they were found, but were not rewritten in the thread. Has anyone seen anything like this before? Do you have any advice for me? Thanks, Andrew. From aph at redhat.com Mon Aug 24 10:07:49 2015 From: aph at redhat.com (Andrew Haley) Date: Mon, 24 Aug 2015 11:07:49 +0100 Subject: [aarch64-port-dev ] Patches to backport to jdk8 In-Reply-To: <1440083195.10528.4.camel@mylittlepony.linaroharston> References: <1440083195.10528.4.camel@mylittlepony.linaroharston> Message-ID: <55DAECF5.4040303@redhat.com> On 08/20/2015 04:06 PM, Edward Nevill wrote: > Is it OK to backport the following patches from jdk9 to jdk8. Yes, please. Andrew. From aph at redhat.com Mon Aug 24 10:12:29 2015 From: aph at redhat.com (Andrew Haley) Date: Mon, 24 Aug 2015 11:12:29 +0100 Subject: [aarch64-port-dev ] An interesting bug In-Reply-To: <55DAE48F.2030403@redhat.com> References: <55DAE48F.2030403@redhat.com> Message-ID: <55DAEE0D.1020907@redhat.com> I suppose another possibility might be that the worker threads promoting to the old generation somehow (because of a failed compare-and-swap) overwrote each other's data. Andrew. From adinn at redhat.com Mon Aug 24 14:31:29 2015 From: adinn at redhat.com (Andrew Dinn) Date: Mon, 24 Aug 2015 15:31:29 +0100 Subject: [aarch64-port-dev ] RFR: 8080293: AARCH64: Remove unnecessary dmbs from generated CAS code Message-ID: <55DB2AC1.50208@redhat.com> The following webrev against hs-comp head fixes 8080293 http://cr.openjdk.java.net/~adinn/8080293/webrev.00/ It is a follow on to the prior volatile object patch 8078743: AARCH64: Extend use of stlr to cater for volatile object stores http://cr.openjdk.java.net/~adinn/8078743/webrev.04/ and requires that previous patch to be applied first. 

Testing
-------

The patch is sensitive to GC configuration so it was tested against 5
relevant configs

G1
CMS+UseCondCardMark
CMS-UseCondCardMark
Par+UseCondCardMark
Par-UseCondCardMark

The validity of the transformation was verified by:

generating and eyeballing compiled code for simple test programs
successfully running a fairly large program (netbeans)
generating and eyeballing HashMap code compiled on a fairly large
program run

The fix was performance tested on 2 implementations of the AArch64
architecture (more details below). On an O-O-O CPU it gave no noticeable
benefit. On a simple pipeline CPU it gave a very significant benefit in
specific cases.

regards,


Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in UK and Wales under Company Registration No. 3798903
Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters
(USA), Michael O'Neill (Ireland)

The Test
--------

As with the prior patch I tested the original vs new code generation
strategy by running a jmh test first with -XX:+UseBarriersForVolatile
and then with -XX:-UseBarriersForVolatile. Four different test programs
were run in all 5 GC configs. Each test executed repeated CAS
operations to an object field in a single thread with a BlackHole
backoff between CASes varying from 0 to 64.

Test one performed a CAS guaranteed to fail; test two performed a
successful CAS from a fixed object to null and then back; test three
performed a successful CAS from a fixed object to another fixed object
and then back; test four performed a successful CAS from a fixed
object to a newly allocated object and then back. The average time per
CAS operation (ns/op) -- actually per 2 CAS operations for the latter 3
tests -- was used as a score.

The Results
-----------

On an O-O-O CPU there was no significant difference in the time taken.
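
For readers who want to see the four CAS patterns from "The Test" outside a JMH harness, the following is a minimal sketch in plain Java. It is not part of the posted benchmark: the class name CasPatterns, the method names and the two fixed objects are hypothetical, chosen only for illustration, and the sketch shows the success/failure behaviour of each pattern rather than measuring anything.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical helper class (not from the posted benchmark): one method per
// CAS pattern described in the test write-up. Each method assumes the
// reference currently holds FIXED_A and leaves it holding FIXED_A.
final class CasPatterns {
    static final Object FIXED_A = new Object();
    static final Object FIXED_B = new Object();

    // Test one: the expected value never matches, so the CAS always fails.
    static boolean failingCas(AtomicReference<Object> ref) {
        return ref.compareAndSet(FIXED_B, FIXED_A);
    }

    // Test two: fixed object -> null, then null -> fixed object.
    static boolean nullRoundTrip(AtomicReference<Object> ref) {
        return ref.compareAndSet(FIXED_A, null)
            && ref.compareAndSet(null, FIXED_A);
    }

    // Test three: fixed object -> another fixed object, then back.
    static boolean fixedRoundTrip(AtomicReference<Object> ref) {
        return ref.compareAndSet(FIXED_A, FIXED_B)
            && ref.compareAndSet(FIXED_B, FIXED_A);
    }

    // Test four: fixed object -> newly allocated object, then back.
    static boolean freshRoundTrip(AtomicReference<Object> ref) {
        Object fresh = new Object();
        return ref.compareAndSet(FIXED_A, fresh)
            && ref.compareAndSet(fresh, FIXED_A);
    }

    public static void main(String[] args) {
        AtomicReference<Object> ref = new AtomicReference<>(FIXED_A);
        System.out.println("fail:  " + failingCas(ref));
        System.out.println("null:  " + nullRoundTrip(ref));
        System.out.println("fixed: " + fixedRoundTrip(ref));
        System.out.println("fresh: " + freshRoundTrip(ref));
    }
}
```

Run single-threaded, as in the benchmark, the first call reports false and the other three report true; the reference holds the original fixed object after every call.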

On a simple pipeline CPU the optimization gave a very significant
benefit for the Fail tests on all GC configurations except CMS
+ UseCondCardMark. In all other cases there was no significant
measurable benefit.

Example Test
------------

package org.openjdk;

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;

import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

@Warmup(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(3)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Benchmark)
public class CasNull {

    Object tombstone;

    AtomicReference ref;

    @Param({"0", "1", "2", "4", "8", "16", "32", "64"})
    int backoff;

    @Setup
    public void setup() {
        tombstone = new Object();

        ref = new AtomicReference<>();
        ref.set(tombstone);
    }

    @Benchmark
    public boolean test() {
        Blackhole.consumeCPU(backoff);
        ref.compareAndSet(tombstone, null);
        ref.compareAndSet(null, tombstone);
        return true;
    }
}

From adinn at redhat.com Mon Aug 24 15:25:08 2015
From: adinn at redhat.com (Andrew Dinn)
Date: Mon, 24 Aug 2015 16:25:08 +0100
Subject: [aarch64-port-dev ] aarch64: C2 fast lock/unlock issues
In-Reply-To: <55D79701.9090407@oracle.com>
References: <55D79701.9090407@oracle.com>
Message-ID: <55DB3754.9090504@redhat.com>

Thank you for proposing these fixes. Your analysis and patch both look to me to be completely correct. I applied the patch and it appears to work fine when running programs with biased locking enabled.

I have raised the following JIRA for this fix:

https://bugs.openjdk.java.net/browse/JDK-8134322

I will post a webrev containing your patch for review as soon as possible.

regards,


Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in UK and Wales under Company Registration No.
3798903 Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters (USA), Michael O'Neill (Ireland) From vladimir.kozlov at oracle.com Tue Aug 25 00:24:29 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 24 Aug 2015 17:24:29 -0700 Subject: [aarch64-port-dev ] RFR: 8080293: AARCH64: Remove unnecessary dmbs from generated CAS code In-Reply-To: <55DB2AC1.50208@redhat.com> References: <55DB2AC1.50208@redhat.com> Message-ID: <55DBB5BD.4040400@oracle.com> I did not look deep on code logic. Few comments only: Use {} for all conditional code (it cause a lot of pain in the past): if (is_cas) return NULL; You don't need #ifndef PRODUCT: +#ifndef PRODUCT +#ifdef ASSERT New mach instructs missing predicate: predicate(needs_acquiring_load_exclusive(n)); You use higher ins_cost to avoid their generation when predicate is false. So why not explicit predicate? Thanks, Vladimir On 8/24/15 7:31 AM, Andrew Dinn wrote: > The following webrev against hs-comp head fixes 8080293 > > http://cr.openjdk.java.net/~adinn/8080293/webrev.00/ > > It is a follow on to the prior volatile object patch > > 8078743: AARCH64: Extend use of stlr to cater for volatile object stores > http://cr.openjdk.java.net/~adinn/8078743/webrev.04/ > > and requires that previous patch to be applied first. > > Testing > ------- > > The patch is sensitive to GC configuration so it was tested against 5 > relevant configs > > G1 > CMS+UseCondCardMark > CMS-UseCondCardMark > Par+UseCondCardMark > Par-UseCondCardMark > > The validity of the transformation was verified by: > > generating and eyeballing compiled code for simple test programs > successfully running a fairly large program (netbeans) > generating and eyeballing HashMap code compiled on a fairly large > program run > > The fix was performance tested on 2 implementations of the AArch64 > architecture (more details below). On an O-O-O CPU it gave no noticeable > benefit. 
On a simple pipeline CPU it gave a very significant benefit in
> specific cases.
>
> regards,
>
>
> Andrew Dinn
> -----------
> Senior Principal Software Engineer
> Red Hat UK Ltd
> Registered in UK and Wales under Company Registration No. 3798903
> Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters
> (USA), Michael O'Neill (Ireland)
>
> The Test
> --------
>
> As with the prior patch I tested the original vs new code generation
> strategy by running a jmh test first with -XX:+UseBarriersForVolatile
> and then with -XX:-UseBarriersForVolatile. Four different test programs
> ran in all 5 GC configs executing. Each test executed repeated CAS
> operations to an object field in a single thread with a BlackHole
> backoff between CASes varying from 0 to 64.
>
> Test one performed a CAS guaranteed to fail; test two performed a
> successful CAS from a fixed object to null and then back; test three
> performed a successful CAS from a fixed object to another fixed object
> and then back; test four performed a successful CAS from a fixed
> object to a newly allocated object and then back. The average time per
> CAS operation (ns/op) -- actually per 2 CAS operations for the latter 3
> tests -- was used as a score.
>
> The Results
> -----------
>
> On an O-O-O CPU there was no significant difference in the time taken.
>
> On a simple pipeline CPU the optimization gave a very significant
> benefit for the Fail tests on all GC configurations except CMS
> + UseCondCardMark. In all other cases there was no significant
> measurable benefit.
> > Example Test > ------------ > > package org.openjdk; > > import org.openjdk.jmh.annotations.*; > import org.openjdk.jmh.infra.Blackhole; > > import java.util.concurrent.TimeUnit; > import java.util.concurrent.atomic.AtomicReference; > > @Warmup(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS) > @Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS) > @Fork(3) > @BenchmarkMode(Mode.AverageTime) > @OutputTimeUnit(TimeUnit.NANOSECONDS) > @State(Scope.Benchmark) > public class CasNull { > > Object tombstone; > > AtomicReference ref; > > @Param({"0", "1", "2", "4", "8", "16", "32", "64"}) > int backoff; > > @Setup > public void setup() { > tombstone = new Object(); > > ref = new AtomicReference<>(); > ref.set(tombstone); > } > > @Benchmark > public boolean test() { > Blackhole.consumeCPU(backoff); > ref.compareAndSet(tombstone, null); > ref.compareAndSet(null, tombstone); > return true; > } > } > From adinn at redhat.com Tue Aug 25 08:20:47 2015 From: adinn at redhat.com (Andrew Dinn) Date: Tue, 25 Aug 2015 09:20:47 +0100 Subject: [aarch64-port-dev ] RFR: 8080293: AARCH64: Remove unnecessary dmbs from generated CAS code In-Reply-To: <55DBB5BD.4040400@oracle.com> References: <55DB2AC1.50208@redhat.com> <55DBB5BD.4040400@oracle.com> Message-ID: <55DC255F.7070903@redhat.com> Hi Vladimir, Thank you very much for your review. On 25/08/15 01:24, Vladimir Kozlov wrote: > I did not look deep on code logic. > Few comments only: > > Use {} for all conditional code (it cause a lot of pain in the past): > > if (is_cas) > return NULL; Ok, I'll correct all of those. > You don't need #ifndef PRODUCT: > > +#ifndef PRODUCT > +#ifdef ASSERT Ah, that's good to know. Thanks. > New mach instructs missing predicate: > > predicate(needs_acquiring_load_exclusive(n)); > > You use higher ins_cost to avoid their generation when predicate is > false. So why not explicit predicate? 
I had two separate reasons for not repeating the predicates: 1 They do quite a lot of work crawling the graph. So, calling the predicate in the lower cost case and omitting it in the higher cost case attempts to avoid the expense of executing it twice in some cases. 2 The ins_cost for the lower and higher cost cases is meant to reflect a difference in the expected execution cost associated with the instruction. [n.b. I adopted the same strategy for the new membar generation rules which were added as part of the volatile put optimization patch -- compare membar_release and unnecessary_membar_release] However, looking again at the code I believe I have the costs (and hence the predicates) attached to the wrong rules in each pair. For example, currently the rules include the following details compareAndSwapIAcq -- does not emit dmb instructions no predicate cost (2 * VOLATILE_REF_COST ) compareAndSwapI -- emits dmb instructions predicate(!needs_acquiring_load_exclusive(n)) cost VOLATILE_REF_COST The patch presumes that the first rule will have a lower (or at least no higher) cost than the second. So a correct version would be either, calling the predicate once: A: compareAndSwapIAcq -- does not emit dmb instructions predicate(needs_acquiring_load_exclusive(n)) cost VOLATILE_REF_COST compareAndSwapI-- emits dmb instructions no predicate cost (2 * VOLATILE_REF_COST) or, calling the predicate twice: B: compareAndSwapIAcq -- does not emit dmb instructions predicate(needs_acquiring_load_exclusive(n)) cost VOLATILE_REF_COST compareAndSwapI-- emits dmb instructions predicate(!needs_acquiring_load_exclusive(n)) cost (2 * VOLATILE_REF_COST) I would prefer to retain A over B. A is guaranteed to give the same result as B with potentially less execution overhead. However, if you feel B is required I will adopt that format for all rules. regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in UK and Wales under Company Registration No. 
3798903 Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters (USA), Michael O'Neill (Ireland) From adinn at redhat.com Tue Aug 25 12:50:43 2015 From: adinn at redhat.com (Andrew Dinn) Date: Tue, 25 Aug 2015 13:50:43 +0100 Subject: [aarch64-port-dev ] RFR: 8080293: AARCH64: Remove unnecessary dmbs from generated CAS code In-Reply-To: <55DC255F.7070903@redhat.com> References: <55DB2AC1.50208@redhat.com> <55DBB5BD.4040400@oracle.com> <55DC255F.7070903@redhat.com> Message-ID: <55DC64A3.9000907@redhat.com> A webrev in the light of Vladimir's feedback has been uploaded at the following URL http://cr.openjdk.java.net/~adinn/8080293/webrev.01/ With this version I have made the changes Vladimir requested except for the one detail I queried in my previous response. I have only called the needs_acquiring_load_exclusive predicate in one of each of the pairs of CompareAndSwapX rules. However, I have corrected the rule costs so that they actually reflect the expected cost of the alternative generated code segments (and adjusted the location+sense of the predicate expression accordingly). [Vladimir, if you really require the predicate to be called in both rules I will prepare another webrev] I still need another reviewer and an hs-comp committer to look at this. Could someone from the aarch64-port-dev group (or maybe Alexey Shipilev) please provide a second review and agree to sponsor the patch? regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in UK and Wales under Company Registration No. 3798903 Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters (USA), Michael O'Neill (Ireland) On 25/08/15 09:20, Andrew Dinn wrote: > Hi Vladimir, > > Thank you very much for your review. > > On 25/08/15 01:24, Vladimir Kozlov wrote: >> I did not look deep on code logic. 
>> Few comments only: >> >> Use {} for all conditional code (it cause a lot of pain in the past): >> >> if (is_cas) >> return NULL; > > Ok, I'll correct all of those. > >> You don't need #ifndef PRODUCT: >> >> +#ifndef PRODUCT >> +#ifdef ASSERT > > Ah, that's good to know. Thanks. > >> New mach instructs missing predicate: >> >> predicate(needs_acquiring_load_exclusive(n)); >> >> You use higher ins_cost to avoid their generation when predicate is >> false. So why not explicit predicate? > > I had two separate reasons for not repeating the predicates: > > 1 They do quite a lot of work crawling the graph. So, calling the > predicate in the lower cost case and omitting it in the higher cost case > attempts to avoid the expense of executing it twice in some cases. > > 2 The ins_cost for the lower and higher cost cases is meant to reflect > a difference in the expected execution cost associated with the instruction. > > [n.b. I adopted the same strategy for the new membar generation rules > which were added as part of the volatile put optimization patch -- > compare membar_release and unnecessary_membar_release] > > However, looking again at the code I believe I have the costs (and hence > the predicates) attached to the wrong rules in each pair. For example, > currently the rules include the following details > > compareAndSwapIAcq -- does not emit dmb instructions > no predicate > cost (2 * VOLATILE_REF_COST ) > > compareAndSwapI -- emits dmb instructions > predicate(!needs_acquiring_load_exclusive(n)) > cost VOLATILE_REF_COST > > The patch presumes that the first rule will have a lower (or at least no > higher) cost than the second. 
So a correct version would be either, > calling the predicate once: > > A: > > compareAndSwapIAcq -- does not emit dmb instructions > predicate(needs_acquiring_load_exclusive(n)) > cost VOLATILE_REF_COST > > compareAndSwapI-- emits dmb instructions > no predicate > cost (2 * VOLATILE_REF_COST) > > or, calling the predicate twice: > > B: > > compareAndSwapIAcq -- does not emit dmb instructions > predicate(needs_acquiring_load_exclusive(n)) > cost VOLATILE_REF_COST > > compareAndSwapI-- emits dmb instructions > predicate(!needs_acquiring_load_exclusive(n)) > cost (2 * VOLATILE_REF_COST) > > I would prefer to retain A over B. A is guaranteed to give the same > result as B with potentially less execution overhead. However, if you > feel B is required I will adopt that format for all rules. > > regards, > > > Andrew Dinn > ----------- > Senior Principal Software Engineer > Red Hat UK Ltd > Registered in UK and Wales under Company Registration No. 3798903 > Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters > (USA), Michael O'Neill (Ireland) > > From vladimir.kozlov at oracle.com Tue Aug 25 17:14:19 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 25 Aug 2015 10:14:19 -0700 Subject: [aarch64-port-dev ] RFR: 8080293: AARCH64: Remove unnecessary dmbs from generated CAS code In-Reply-To: <55DC255F.7070903@redhat.com> References: <55DB2AC1.50208@redhat.com> <55DBB5BD.4040400@oracle.com> <55DC255F.7070903@redhat.com> Message-ID: <55DCA26B.3020508@oracle.com> Okay, I agree to have only one predicate. So I am fine with version A). Thanks, Vladimir PS: "first rule will have a lower" - should compareAndSwapI be first then? > However, looking again at the code I believe I have the costs (and hence > the predicates) attached to the wrong rules in each pair. 
For example, > currently the rules include the following details > > compareAndSwapIAcq -- does not emit dmb instructions > no predicate > cost (2 * VOLATILE_REF_COST ) > > compareAndSwapI -- emits dmb instructions > predicate(!needs_acquiring_load_exclusive(n)) > cost VOLATILE_REF_COST > > The patch presumes that the first rule will have a lower (or at least no > higher) cost than the second. So a correct version would be either, > calling the predicate once: On 8/25/15 1:20 AM, Andrew Dinn wrote: > Hi Vladimir, > > Thank you very much for your review. > > On 25/08/15 01:24, Vladimir Kozlov wrote: >> I did not look deep on code logic. >> Few comments only: >> >> Use {} for all conditional code (it cause a lot of pain in the past): >> >> if (is_cas) >> return NULL; > > Ok, I'll correct all of those. > >> You don't need #ifndef PRODUCT: >> >> +#ifndef PRODUCT >> +#ifdef ASSERT > > Ah, that's good to know. Thanks. > >> New mach instructs missing predicate: >> >> predicate(needs_acquiring_load_exclusive(n)); >> >> You use higher ins_cost to avoid their generation when predicate is >> false. So why not explicit predicate? > > I had two separate reasons for not repeating the predicates: > > 1 They do quite a lot of work crawling the graph. So, calling the > predicate in the lower cost case and omitting it in the higher cost case > attempts to avoid the expense of executing it twice in some cases. > > 2 The ins_cost for the lower and higher cost cases is meant to reflect > a difference in the expected execution cost associated with the instruction. > > [n.b. I adopted the same strategy for the new membar generation rules > which were added as part of the volatile put optimization patch -- > compare membar_release and unnecessary_membar_release] > > However, looking again at the code I believe I have the costs (and hence > the predicates) attached to the wrong rules in each pair. 
For example, > currently the rules include the following details > > compareAndSwapIAcq -- does not emit dmb instructions > no predicate > cost (2 * VOLATILE_REF_COST ) > > compareAndSwapI -- emits dmb instructions > predicate(!needs_acquiring_load_exclusive(n)) > cost VOLATILE_REF_COST > > The patch presumes that the first rule will have a lower (or at least no > higher) cost than the second. So a correct version would be either, > calling the predicate once: > > A: > > compareAndSwapIAcq -- does not emit dmb instructions > predicate(needs_acquiring_load_exclusive(n)) > cost VOLATILE_REF_COST > > compareAndSwapI-- emits dmb instructions > no predicate > cost (2 * VOLATILE_REF_COST) > > or, calling the predicate twice: > > B: > > compareAndSwapIAcq -- does not emit dmb instructions > predicate(needs_acquiring_load_exclusive(n)) > cost VOLATILE_REF_COST > > compareAndSwapI-- emits dmb instructions > predicate(!needs_acquiring_load_exclusive(n)) > cost (2 * VOLATILE_REF_COST) > > I would prefer to retain A over B. A is guaranteed to give the same > result as B with potentially less execution overhead. However, if you > feel B is required I will adopt that format for all rules. > > regards, > > > Andrew Dinn > ----------- > Senior Principal Software Engineer > Red Hat UK Ltd > Registered in UK and Wales under Company Registration No. 3798903 > Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters > (USA), Michael O'Neill (Ireland) > From adinn at redhat.com Wed Aug 26 09:35:55 2015 From: adinn at redhat.com (Andrew Dinn) Date: Wed, 26 Aug 2015 10:35:55 +0100 Subject: [aarch64-port-dev ] RFR: 8080293: AARCH64: Remove unnecessary dmbs from generated CAS code In-Reply-To: <55DCA26B.3020508@oracle.com> References: <55DB2AC1.50208@redhat.com> <55DBB5BD.4040400@oracle.com> <55DC255F.7070903@redhat.com> <55DCA26B.3020508@oracle.com> Message-ID: <55DD887B.5090403@redhat.com> On 25/08/15 18:14, Vladimir Kozlov wrote: > Okay, I agree to have only one predicate. 
So I am fine with version A). Thanks, Vladimir. So, that is now as provided in the latest posted webrev: http://cr.openjdk.java.net/~adinn/8080293/webrev.01/ > PS: "first rule will have a lower" - should compareAndSwapI be first then? Sorry, I think the problem here is that I explained the status of the original patch in a rather confusing way. I am not sure it matters all that much which rule appears first. Or do you really want the lower cost rule to appear before the higher cost one? . . . >> However, looking again at the code I believe I have the costs (and hence >> the predicates) attached to the wrong rules in each pair. For example, >> currently the rules include the following details >> >> compareAndSwapIAcq -- does not emit dmb instructions >> no predicate >> cost (2 * VOLATILE_REF_COST ) >> >> compareAndSwapI -- emits dmb instructions >> predicate(!needs_acquiring_load_exclusive(n)) >> cost VOLATILE_REF_COST . . . what I meant by that comment was that this: - The optimization implemented in this patch is based on an assumption that a generation strategy using dmb -- i.e. the one encoded by compareAndSwapI -- will execute more slowly, or at least no faster, than a generation strategy using stlr -- i.e. the one encoded by compareAndSwapIAcq. - The text above displays the original costs and predicates used to enforce the required rule selection. - In that version the /costs/ are the wrong way round with respect to the /motivating assumption/ i.e. compareAndSwapI has a lower cost than compareAndSwapIAcq. In version A the costs reflect the motivating assumption i.e. for each X in {I, L, P, N} rule compareAndSwapXAcq has a lower cost than compareAndSwapX. However, it is also true that for each X in {I, L, P, N} rule compareAndSwapX appears earlier than compareAndSwapXAcq. regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in UK and Wales under Company Registration No. 
3798903 Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters (USA), Michael O'Neill (Ireland) From aph at redhat.com Wed Aug 26 13:04:09 2015 From: aph at redhat.com (Andrew Haley) Date: Wed, 26 Aug 2015 14:04:09 +0100 Subject: [aarch64-port-dev ] HotSpot debugger for JDK8 Message-ID: <55DDB949.3010404@redhat.com> http://cr.openjdk.java.net/~aph/jdk8-hsdb/ This is a fairly straightforward backport from JDK9. Andrew. From vladimir.kozlov at oracle.com Wed Aug 26 15:41:14 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 26 Aug 2015 08:41:14 -0700 Subject: [aarch64-port-dev ] RFR: 8080293: AARCH64: Remove unnecessary dmbs from generated CAS code In-Reply-To: <55DD887B.5090403@redhat.com> References: <55DB2AC1.50208@redhat.com> <55DBB5BD.4040400@oracle.com> <55DC255F.7070903@redhat.com> <55DCA26B.3020508@oracle.com> <55DD887B.5090403@redhat.com> Message-ID: <55DDDE1A.7030008@oracle.com> Looks good. Thank you for clarification. Vladimir On 8/26/15 2:35 AM, Andrew Dinn wrote: > On 25/08/15 18:14, Vladimir Kozlov wrote: >> Okay, I agree to have only one predicate. So I am fine with version A). > > Thanks, Vladimir. So, that is now as provided in the latest posted webrev: > > http://cr.openjdk.java.net/~adinn/8080293/webrev.01/ > >> PS: "first rule will have a lower" - should compareAndSwapI be first then? > > Sorry, I think the problem here is that I explained the status of the > original patch in a rather confusing way. I am not sure it matters all > that much which rule appears first. Or do you really want the lower cost > rule to appear before the higher cost one? . . . > >>> However, looking again at the code I believe I have the costs (and hence >>> the predicates) attached to the wrong rules in each pair. 
For example, >>> currently the rules include the following details >>> >>> compareAndSwapIAcq -- does not emit dmb instructions >>> no predicate >>> cost (2 * VOLATILE_REF_COST ) >>> >>> compareAndSwapI -- emits dmb instructions >>> predicate(!needs_acquiring_load_exclusive(n)) >>> cost VOLATILE_REF_COST > > > . . . what I meant by that comment was that this: > > - The optimization implemented in this patch is based on an assumption > that a generation strategy using dmb -- i.e. the one encoded by > compareAndSwapI -- will execute more slowly, or at least no faster, than > a generation strategy using stlr -- i.e. the one encoded by > compareAndSwapIAcq. > > - The text above displays the original costs and predicates used to > enforce the required rule selection. > > - In that version the /costs/ are the wrong way round with respect to > the /motivating assumption/ i.e. compareAndSwapI has a lower cost than > compareAndSwapIAcq. > > In version A the costs reflect the motivating assumption i.e. for each > X in {I, L, P, N} rule compareAndSwapXAcq has a lower cost than > compareAndSwapX. > > However, it is also true that for each X in {I, L, P, N} rule > compareAndSwapX appears earlier than compareAndSwapXAcq. > > regards, > > > Andrew Dinn > ----------- > Senior Principal Software Engineer > Red Hat UK Ltd > Registered in UK and Wales under Company Registration No. 
3798903
> Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters
> (USA), Michael O'Neill (Ireland)
>

From adinn at redhat.com Wed Aug 26 16:32:58 2015
From: adinn at redhat.com (Andrew Dinn)
Date: Wed, 26 Aug 2015 17:32:58 +0100
Subject: [aarch64-port-dev ] RFR: AArch64: 8134322: Fix several errors in C2 biased locking implementation
In-Reply-To: <55DB3754.9090504@redhat.com>
References: <55D79701.9090407@oracle.com> <55DB3754.9090504@redhat.com>
Message-ID: <55DDEA3A.8070603@redhat.com>

The following AArch64-only webrev against hs-comp (fix contributed by
Hui Shi of Linaro) fixes several problems with biased locking on AArch64

http://cr.openjdk.java.net/~adinn/8134322/webrev.00/

I have reviewed the patch. It requires one more AArch64 reviewer who can
also commit it to hs-comp (Andrew Haley?).

When I tested it running netbeans with biased locking enabled it seemed
(just by feel) to improve performance. A follow up to enable biased
locking by default might be worth investigating.

n.b. I built this on hs-comp on top of my (almost but not yet committed)
patch for 8080293 so that patch also gets shown in the webrev. The two
patches are independent and should both commit cleanly with or without
the other.

From aph at redhat.com Wed Aug 26 16:40:34 2015
From: aph at redhat.com (Andrew Haley)
Date: Wed, 26 Aug 2015 17:40:34 +0100
Subject: [aarch64-port-dev ] RFR: AArch64: 8134322: Fix several errors in C2 biased locking implementation
In-Reply-To: <55DDEA3A.8070603@redhat.com>
References: <55D79701.9090407@oracle.com> <55DB3754.9090504@redhat.com> <55DDEA3A.8070603@redhat.com>
Message-ID: <55DDEC02.6080502@redhat.com>

On 08/26/2015 05:32 PM, Andrew Dinn wrote:
> The following AArch64-only webrev against hs-comp (fix contributed by
> Hui Shi of Linaro) fixes several problems with biased locking on AArch64
>
> http://cr.openjdk.java.net/~adinn/8134322/webrev.00/
>
> I have reviewed the patch.
It requires one more AArch64 reviewer who can
> also commit it to hs-comp (Andrew Haley?).

This is OK.

Andrew.

From aph at redhat.com Thu Aug 27 15:19:15 2015
From: aph at redhat.com (Andrew Haley)
Date: Thu, 27 Aug 2015 16:19:15 +0100
Subject: [aarch64-port-dev ] aarch64: C2 fast lock/unlock issues
In-Reply-To: <55D79701.9090407@oracle.com>
References: <55D79701.9090407@oracle.com>
Message-ID: <55DF2A73.2060207@redhat.com>

On 08/21/2015 10:24 PM, Vladimir Kozlov wrote:
> Thank you for report and suggested fixes.
>
> CC to aarch64 port developers.

May I push this, or do I need you to review it?

Andrew.

From aph at redhat.com Thu Aug 27 17:06:11 2015
From: aph at redhat.com (aph at redhat.com)
Date: Thu, 27 Aug 2015 17:06:11 +0000
Subject: [aarch64-port-dev ] hg: aarch64-port/jdk8/hotspot: 8078521: AARCH64: Add AArch64 SA support
Message-ID: <201508271706.t7RH6BhB028427@aojmv0008.oracle.com>

Changeset: 1c4ef82d32d1
Author: aph
Date: 2015-08-20 09:10 +0000
URL: http://hg.openjdk.java.net/aarch64-port/jdk8/hotspot/rev/1c4ef82d32d1

8078521: AARCH64: Add AArch64 SA support
Summary: Add AArch64 SA support

! agent/make/Makefile
! agent/src/os/linux/LinuxDebuggerLocal.c
! agent/src/os/linux/Makefile
! agent/src/share/classes/sun/jvm/hotspot/HSDB.java
+ agent/src/share/classes/sun/jvm/hotspot/debugger/MachineDescriptionAARCH64.java
+ agent/src/share/classes/sun/jvm/hotspot/debugger/aarch64/AARCH64ThreadContext.java
! agent/src/share/classes/sun/jvm/hotspot/debugger/linux/LinuxCDebugger.java
+ agent/src/share/classes/sun/jvm/hotspot/debugger/linux/aarch64/LinuxAARCH64CFrame.java
+ agent/src/share/classes/sun/jvm/hotspot/debugger/linux/aarch64/LinuxAARCH64ThreadContext.java
! agent/src/share/classes/sun/jvm/hotspot/debugger/proc/ProcDebuggerLocal.java
+ agent/src/share/classes/sun/jvm/hotspot/debugger/proc/aarch64/ProcAARCH64Thread.java
+ agent/src/share/classes/sun/jvm/hotspot/debugger/proc/aarch64/ProcAARCH64ThreadContext.java
+ agent/src/share/classes/sun/jvm/hotspot/debugger/proc/aarch64/ProcAARCH64ThreadFactory.java
+ agent/src/share/classes/sun/jvm/hotspot/debugger/remote/aarch64/RemoteAARCH64Thread.java
+ agent/src/share/classes/sun/jvm/hotspot/debugger/remote/aarch64/RemoteAARCH64ThreadContext.java
+ agent/src/share/classes/sun/jvm/hotspot/debugger/remote/aarch64/RemoteAARCH64ThreadFactory.java
! agent/src/share/classes/sun/jvm/hotspot/runtime/Threads.java
+ agent/src/share/classes/sun/jvm/hotspot/runtime/aarch64/AARCH64CurrentFrameGuess.java
+ agent/src/share/classes/sun/jvm/hotspot/runtime/aarch64/AARCH64Frame.java
+ agent/src/share/classes/sun/jvm/hotspot/runtime/aarch64/AARCH64JavaCallWrapper.java
+ agent/src/share/classes/sun/jvm/hotspot/runtime/aarch64/AARCH64RegisterMap.java
+ agent/src/share/classes/sun/jvm/hotspot/runtime/linux_aarch64/LinuxAARCH64JavaThreadPDAccess.java
! agent/src/share/classes/sun/jvm/hotspot/utilities/PlatformInfo.java
! make/linux/makefiles/defs.make
! make/linux/makefiles/sa.make
! make/linux/makefiles/saproc.make
! make/sa.files

From vladimir.kozlov at oracle.com Fri Aug 28 02:58:04 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 27 Aug 2015 19:58:04 -0700
Subject: [aarch64-port-dev ] RFR: AArch64: 8134322: Fix several errors in C2 biased locking implementation
In-Reply-To: <55DDEA3A.8070603@redhat.com>
References: <55D79701.9090407@oracle.com> <55DB3754.9090504@redhat.com> <55DDEA3A.8070603@redhat.com>
Message-ID: <55DFCE3C.3040201@oracle.com>

Looks good to me.
Vladimir

On 8/26/15 9:32 AM, Andrew Dinn wrote:
> The following AArch64-only webrev against hs-comp (fix contributed by
> Hui Shi of Linaro) fixes several problems with biased locking on AArch64
>
> http://cr.openjdk.java.net/~adinn/8134322/webrev.00/
>
> I have reviewed the patch. It requires one more AArch64 reviewer who can
> also commit it to hs-comp (Andrew Haley?).
>
> When I tested it running netbeans with biased locking enabled it seemed
> (just by feel) to improve performance. A follow up to enable biased
> locking by default might be worth investigating.
>
> n.b. I built this on hs-comp on top of my (almost but not yet committed)
> patch for 8080293 so that patch also gets shown in the webrev. The two
> patches are independent and should both commit cleanly with or without
> the other.
>

From vladimir.kozlov at oracle.com Fri Aug 28 03:00:56 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 27 Aug 2015 20:00:56 -0700
Subject: [aarch64-port-dev ] aarch64: C2 fast lock/unlock issues
In-Reply-To: <55DF2A73.2060207@redhat.com>
References: <55D79701.9090407@oracle.com> <55DF2A73.2060207@redhat.com>
Message-ID: <55DFCEE8.3060806@oracle.com>

Reviewed. Push.

We need at least one official *Reviewer* for all changesets.

Thanks,
Vladimir

On 8/27/15 8:19 AM, Andrew Haley wrote:
> On 08/21/2015 10:24 PM, Vladimir Kozlov wrote:
>> Thank you for report and suggested fixes.
>>
>> CC to aarch64 port developers.
>
> May I push this, or do I need you to review it?
>
> Andrew.
>

From aph at redhat.com Fri Aug 28 15:21:52 2015
From: aph at redhat.com (Andrew Haley)
Date: Fri, 28 Aug 2015 16:21:52 +0100
Subject: [aarch64-port-dev ] RFR: AArch64: 8134322: Fix several errors in C2 biased locking implementation
In-Reply-To: <55DDEA3A.8070603@redhat.com>
References: <55D79701.9090407@oracle.com> <55DB3754.9090504@redhat.com> <55DDEA3A.8070603@redhat.com>
Message-ID: <55E07C90.6000905@redhat.com>

On 08/26/2015 05:32 PM, Andrew Dinn wrote:
> n.b.
I built this on hs-comp on top of my (almost but not yet committed)
> patch for 8080293 so that patch also gets shown in the webrev. The two
> patches are independent and should both commit cleanly with or without
> the other.

Sure, but I have no way to get the changesets. All that is
in the webrev is one patch.

Please send me the changesets from "hg export"

Andrew.

From adinn at redhat.com Sat Aug 29 08:53:21 2015
From: adinn at redhat.com (Andrew Dinn)
Date: Sat, 29 Aug 2015 09:53:21 +0100
Subject: [aarch64-port-dev ] RFR: AArch64: 8134322: Fix several errors in C2 biased locking implementation
In-Reply-To: <55E07C90.6000905@redhat.com>
References: <55D79701.9090407@oracle.com> <55DB3754.9090504@redhat.com> <55DDEA3A.8070603@redhat.com> <55E07C90.6000905@redhat.com>
Message-ID: <55E17301.2050406@redhat.com>

On 28/08/15 16:21, Andrew Haley wrote:
> On 08/26/2015 05:32 PM, Andrew Dinn wrote:
>> n.b. I built this on hs-comp on top of my (almost but not yet committed)
>> patch for 8080293 so that patch also gets shown in the webrev. The two
>> patches are independent and should both commit cleanly with or without
>> the other.
>
> Sure, but I have no way to get the changesets. All that is
> in the webrev is one patch.
>
> Please send me the changesets from "hg export"

change set below

regards,

Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in UK and Wales under Company Registration No.
3798903
Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters
(USA), Michael O'Neill (Ireland)

----- 8< -------- 8< -------- 8< -------- 8< -------- 8< -------- 8< ---
# HG changeset patch
# User adinn
# Date 1440605639 -3600
#      Wed Aug 26 17:13:59 2015 +0100
# Node ID e699c7a83f8e9630f79ef83066a64d1e38c72ce2
# Parent  d7185fbee7e5e0f40ae4286f29055bed5da6bd58
8134322: AArch64: Fix several errors in C2 biased locking implementation
Summary: Several errors in C2 biased locking require fixing
Reviewed-by: adinn
Contributed-by: Hui Shi (hui.shi at linaro.org)

diff -r d7185fbee7e5 -r e699c7a83f8e src/cpu/aarch64/vm/aarch64.ad
--- a/src/cpu/aarch64/vm/aarch64.ad	Tue Aug 25 08:25:38 2015 -0400
+++ b/src/cpu/aarch64/vm/aarch64.ad	Wed Aug 26 17:13:59 2015 +0100
@@ -4896,12 +4896,12 @@
       return;
     }
 
-    if (UseBiasedLocking) {
-      __ biased_locking_enter(disp_hdr, oop, box, tmp, true, cont);
+    if (UseBiasedLocking && !UseOptoBiasInlining) {
+      __ biased_locking_enter(box, oop, disp_hdr, tmp, true, cont);
     }
 
     // Handle existing monitor
-    if (EmitSync & 0x02) {
+    if ((EmitSync & 0x02) == 0) {
       // we can use AArch64's bit test and branch here but
       // markoopDesc does not define a bit index just the bit value
       // so assert in case the bit pos changes
@@ -5041,7 +5041,7 @@
       return;
     }
 
-    if (UseBiasedLocking) {
+    if (UseBiasedLocking && !UseOptoBiasInlining) {
       __ biased_locking_exit(oop, tmp, cont);
     }
 

From felix.yang at linaro.org Sat Aug 29 13:13:27 2015
From: felix.yang at linaro.org (Felix Yang)
Date: Sat, 29 Aug 2015 21:13:27 +0800
Subject: [aarch64-port-dev ] RFR: Disable C2 peephole by default for aarch64
Message-ID:

Hi JIT members,

Currently, the C2 peephole optimization is only enabled by default for the x86 and aarch64 ports.
But we don't have any peephole rules for the aarch64 port, so scanning the instruction stream in PhasePeephole::do_transform serves no purpose and only wastes time on this port. So I am disabling this pass for the aarch64 port by default. Can anyone sponsor this if approved?

PATCH:

diff -r a6acc533dfef src/cpu/aarch64/vm/c2_globals_aarch64.hpp
--- a/src/cpu/aarch64/vm/c2_globals_aarch64.hpp	Wed Aug 19 16:16:54 2015 +0100
+++ b/src/cpu/aarch64/vm/c2_globals_aarch64.hpp	Fri Aug 28 09:28:07 2015 +0800
@@ -69,7 +69,7 @@
 
 // Peephole and CISC spilling both break the graph, and so makes the
 // scheduler sick.
-define_pd_global(bool, OptoPeephole, true);
+define_pd_global(bool, OptoPeephole, false);
 define_pd_global(bool, UseCISCSpill, true);
 define_pd_global(bool, OptoScheduling, false);
 define_pd_global(bool, OptoBundling, false);
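
[Editor's illustration, not part of the posted patch.] The reasoning behind the change can be sketched with a toy model: when a platform has no peephole rules, an enabled pass still visits every instruction and changes nothing, while a disabled flag skips the scan entirely. Everything below (Instr, PeepholeRule, run_peephole, opto_peephole) is a hypothetical stand-in invented for this sketch; it is not HotSpot's PhasePeephole code and makes no claim about how that pass is actually structured.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration only; not HotSpot types.
struct Instr { int opcode; };
struct PeepholeRule { int match_opcode; int replace_opcode; };

// Toy peephole pass. Returns the number of instructions visited.
// opto_peephole plays the role of a platform default such as OptoPeephole.
std::size_t run_peephole(std::vector<Instr>& code,
                         const std::vector<PeepholeRule>& rules,
                         bool opto_peephole) {
  if (!opto_peephole) {
    return 0;  // pass disabled: the instruction stream is never scanned
  }
  std::size_t visited = 0;
  for (Instr& in : code) {
    ++visited;  // this cost is paid even when rules is empty
    for (const PeepholeRule& r : rules) {
      if (in.opcode == r.match_opcode) {
        in.opcode = r.replace_opcode;  // apply the first matching rule
        break;
      }
    }
  }
  assert(visited == code.size());
  return visited;
}
```

With an empty rule set, calling run_peephole with the flag on walks the whole vector and leaves it unchanged; with the flag off it returns immediately, which is the saving the patch is after.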