From roman at kennke.org Tue Oct 4 08:55:05 2016 From: roman at kennke.org (roman at kennke.org) Date: Tue, 04 Oct 2016 08:55:05 +0000 Subject: hg: shenandoah/jdk9/hotspot: Cleanup Shenandoah arguments. Message-ID: <201610040855.u948t5tP003461@aojmv0008.oracle.com> Changeset: 12ce07b230f7 Author: rkennke Date: 2016-10-04 10:53 +0200 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/12ce07b230f7 Cleanup Shenandoah arguments. ! src/share/vm/gc/shenandoah/shenandoahCollectorPolicy.cpp ! src/share/vm/gc/shenandoah/shenandoahHeap.cpp ! src/share/vm/gc/shenandoah/shenandoahRootProcessor.cpp ! src/share/vm/gc/shenandoah/shenandoah_globals.hpp + test/gc/shenandoah/TestShenandoahArgumentRanges.java From rkennke at redhat.com Tue Oct 4 14:22:05 2016 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 04 Oct 2016 16:22:05 +0200 Subject: RFR: Rewrite Shenandoah logging to use new logging framework Message-ID: <1475590925.2564.26.camel@redhat.com> This change removes all the various ShenandoahTrace* and similar options, and replaces them with appropriate calls to the new logging framework. http://cr.openjdk.java.net/~rkennke/shenandoah-logging/webrev.00/ Ok to commit? Roman From zgu at redhat.com Tue Oct 4 14:57:22 2016 From: zgu at redhat.com (Zhengyu Gu) Date: Tue, 4 Oct 2016 10:57:22 -0400 Subject: RFR: Rewrite Shenandoah logging to use new logging framework In-Reply-To: <1475590925.2564.26.camel@redhat.com> References: <1475590925.2564.26.camel@redhat.com> Message-ID: <745f1685-77a4-9db1-9de9-a1d3dafd4c32@redhat.com> Looks good to me. -Zhengyu On 10/04/2016 10:22 AM, Roman Kennke wrote: > This change removes all the various ShenandoahTrace* and similar > options, and replaces them with appropriate calls to the new logging > framework. > > http://cr.openjdk.java.net/~rkennke/shenandoah-logging/webrev.00/ > > Ok to commit? 
> > Roman From zgu at redhat.com Tue Oct 4 19:10:39 2016 From: zgu at redhat.com (Zhengyu Gu) Date: Tue, 4 Oct 2016 15:10:39 -0400 Subject: Shenandoah crashes when running compiler/c2/cr6865031 test Message-ID: <3926e9de-971c-5cdd-340e-9f556cb7bf55@redhat.com> Hi, It crashes in a compiled frame (C2). What interesting is that, it only crashes when running with Shenandoah GC, but not with G1. Can someone take a look? Thanks, -Zhengyu From zgu at redhat.com Tue Oct 4 20:29:57 2016 From: zgu at redhat.com (Zhengyu Gu) Date: Tue, 4 Oct 2016 16:29:57 -0400 Subject: RFR(XS): Shenandoah support in jvmci Message-ID: Please review the simple changes that fix most of jvmci test failures. Webrev:http://cr.openjdk.java.net/~zgu/jvmci/webrev/ Thanks, -Zhengyu From rkennke at redhat.com Tue Oct 4 21:04:19 2016 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 4 Oct 2016 17:04:19 -0400 (EDT) Subject: RFR(XS): Shenandoah support in jvmci Message-ID: <278240069.1749675.1475615059527.JavaMail.zimbra@zmail12.collab.prod.int.phx2.redhat.com> OK. Roman Am 04.10.2016 10:30 nachm. schrieb Zhengyu Gu : > > Please review the simple changes that fix most of jvmci test failures. > > Webrev:http://cr.openjdk.java.net/~zgu/jvmci/webrev/ > > > > Thanks, > > -Zhengyu > From rwestrel at redhat.com Wed Oct 5 07:18:28 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Wed, 05 Oct 2016 09:18:28 +0200 Subject: Shenandoah crashes when running compiler/c2/cr6865031 test In-Reply-To: <3926e9de-971c-5cdd-340e-9f556cb7bf55@redhat.com> References: <3926e9de-971c-5cdd-340e-9f556cb7bf55@redhat.com> Message-ID: > It crashes in a compiled frame (C2). What interesting is that, it only > crashes when running with Shenandoah GC, but not with G1. > > Can someone take a look? I will take a look. Roland. 
From roman at kennke.org Wed Oct 5 07:42:07 2016 From: roman at kennke.org (roman at kennke.org) Date: Wed, 05 Oct 2016 07:42:07 +0000 Subject: hg: shenandoah/jdk9/hotspot: Rewrite Shenandoah logging to use unified logging framework. Message-ID: <201610050742.u957g8FU009647@aojmv0008.oracle.com> Changeset: 0d9863428c53 Author: rkennke Date: 2016-10-05 09:41 +0200 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/0d9863428c53 Rewrite Shenandoah logging to use unified logging framework. ! src/share/vm/gc/shenandoah/brooksPointer.inline.hpp ! src/share/vm/gc/shenandoah/shenandoahBarrierSet.cpp ! src/share/vm/gc/shenandoah/shenandoahCollectorPolicy.cpp ! src/share/vm/gc/shenandoah/shenandoahCollectorPolicy.hpp ! src/share/vm/gc/shenandoah/shenandoahConcurrentMark.cpp ! src/share/vm/gc/shenandoah/shenandoahConcurrentMark.hpp ! src/share/vm/gc/shenandoah/shenandoahConcurrentMark.inline.hpp ! src/share/vm/gc/shenandoah/shenandoahConcurrentThread.cpp ! src/share/vm/gc/shenandoah/shenandoahHeap.cpp ! src/share/vm/gc/shenandoah/shenandoahHeap.inline.hpp ! src/share/vm/gc/shenandoah/shenandoahHeapRegion.cpp ! src/share/vm/gc/shenandoah/shenandoahHeapRegionSet.cpp ! src/share/vm/gc/shenandoah/shenandoahHeapRegionSet.hpp ! src/share/vm/gc/shenandoah/shenandoah_globals.hpp ! src/share/vm/gc/shenandoah/vm_operations_shenandoah.cpp From rkennke at redhat.com Wed Oct 5 07:45:19 2016 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 05 Oct 2016 09:45:19 +0200 Subject: RFR: Rename enter/exit_critical to pin_object Message-ID: <1475653519.2626.2.camel@redhat.com> Hi, Christine suggested to rename enter_critical() and exit_critical() in CollectedHeap (and related) to pin_object() and unpin_object(). I agree that this is a more useful name and this change implements that: http://cr.openjdk.java.net/~rkennke/rename-critical/webrev.01/ Ok? 
Roman From roman at kennke.org Wed Oct 5 08:38:20 2016 From: roman at kennke.org (roman at kennke.org) Date: Wed, 05 Oct 2016 08:38:20 +0000 Subject: hg: shenandoah/jdk9/hotspot: Remove unnecessary and obsolete cmd line arg from test. Message-ID: <201610050838.u958cKwG021368@aojmv0008.oracle.com> Changeset: bbdfe1eadfed Author: rkennke Date: 2016-10-05 10:38 +0200 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/bbdfe1eadfed Remove unnecessary and obsolete cmd line arg from test. ! test/gc/shenandoah/HumongousRegionReclaimTest/TestHumongous.java From shade at redhat.com Wed Oct 5 10:55:00 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 5 Oct 2016 12:55:00 +0200 Subject: RFR: Rename enter/exit_critical to pin_object In-Reply-To: <1475653519.2626.2.camel@redhat.com> References: <1475653519.2626.2.camel@redhat.com> Message-ID: On 10/05/2016 09:45 AM, Roman Kennke wrote: > Christine suggested to rename enter_critical() and exit_critical() in > CollectedHeap (and related) to pin_object() and unpin_object(). I agree > that this is a more useful name and this change implements that: > > http://cr.openjdk.java.net/~rkennke/rename-critical/webrev.01/ > > Ok? OK. Would not mind have a little comment in collectedHeap.hpp. Thanks, -Aleksey From zgu at redhat.com Wed Oct 5 13:00:52 2016 From: zgu at redhat.com (Zhengyu Gu) Date: Wed, 5 Oct 2016 09:00:52 -0400 Subject: Shenandoah crashes when running compiler/c2/cr6865031 test In-Reply-To: References: <3926e9de-971c-5cdd-340e-9f556cb7bf55@redhat.com> Message-ID: <86d13582-1953-8f02-7bf8-9ae3f0071aee@redhat.com> It appears that it crashes when pointing test jdk to /build/linux-x86_64-normal-server-fastdebug/jdk, but works fine when linux-x86_64-normal-server-fastdebug/images/jdk is used. Sorry for the noise. -Zhengyu On 10/05/2016 07:54 AM, Roland Westrelin wrote: > Hi Zhengyu, > >> It crashes in a compiled frame (C2). 
What interesting is that, it only >> crashes when running with Shenandoah GC, but not with G1. >> >> Can someone take a look? > I can't get it to crash. I ran it 100 times on my laptop and it runs > fine. Do you pass any option? Does it crash every time? Do you run a > debug or product build? > > Roland. From rwestrel at redhat.com Wed Oct 5 16:12:03 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Wed, 05 Oct 2016 18:12:03 +0200 Subject: Shenandoah crashes when running compiler/c2/cr6865031 test In-Reply-To: <86d13582-1953-8f02-7bf8-9ae3f0071aee@redhat.com> References: <3926e9de-971c-5cdd-340e-9f556cb7bf55@redhat.com> <86d13582-1953-8f02-7bf8-9ae3f0071aee@redhat.com> Message-ID: > It appears that it crashes when pointing test jdk to > /build/linux-x86_64-normal-server-fastdebug/jdk, but works fine when > linux-x86_64-normal-server-fastdebug/images/jdk is used. I can reproduce it now. The problem is that the signal handler doesn't recognize a load of the brooks pointer from a null object as an implicit null check when compressed oops are on (subtracting the base doesn't happen because the load of a brooks pointer on a null object hits right below the base): http://cr.openjdk.java.net/~roland/shenandoah/compressedoops-nullcheck/webrev.00/ Roland. From zgu at redhat.com Wed Oct 5 17:29:20 2016 From: zgu at redhat.com (Zhengyu Gu) Date: Wed, 5 Oct 2016 13:29:20 -0400 Subject: Shenandoah crashes when running compiler/c2/cr6865031 test In-Reply-To: References: <3926e9de-971c-5cdd-340e-9f556cb7bf55@redhat.com> <86d13582-1953-8f02-7bf8-9ae3f0071aee@redhat.com> Message-ID: <73c76a1f-107a-b7b8-3808-ef6ad0068b00@redhat.com> Verified. Thanks for fixing it. -Zhengyu On 10/05/2016 12:12 PM, Roland Westrelin wrote: >> It appears that it crashes when pointing test jdk to >> /build/linux-x86_64-normal-server-fastdebug/jdk, but works fine when >> linux-x86_64-normal-server-fastdebug/images/jdk is used. > I can reproduce it now. 
> The problem is that the signal handler doesn't
> recognize a load of the brooks pointer from a null object as an implicit
> null check when compressed oops are on (subtracting the base doesn't
> happen because the load of a brooks pointer on a null object hits right
> below the base):
>
> http://cr.openjdk.java.net/~roland/shenandoah/compressedoops-nullcheck/webrev.00/
>
> Roland.

From rkennke at redhat.com Thu Oct 6 10:13:43 2016
From: rkennke at redhat.com (Roman Kennke)
Date: Thu, 06 Oct 2016 12:13:43 +0200
Subject: RFR: Reuse MacroAssembler-CAS-Obj for C1. Plus some minor/cosmetic changes in cas-obj code
Message-ID: <1475748823.3342.2.camel@redhat.com>

This makes C1 reuse the CAS-Obj impl of MacroAssembler.
It also changes the cas-obj impl so that it uses movl for compressed
oops, and uses Assembler::equal instead of ::zero for the final set
instruction, just to be consistent with the other comparisons in that
code. Should be cosmetic afaict.

http://cr.openjdk.java.net/~rkennke/c1casobj/webrev.00/

Ok?

Roman

From rkennke at redhat.com Thu Oct 6 10:16:48 2016
From: rkennke at redhat.com (Roman Kennke)
Date: Thu, 06 Oct 2016 12:16:48 +0200
Subject: CFV: New Shenandoah Committer: Zhengyu Gu
In-Reply-To: <1854325345.872523.1475247154541.JavaMail.zimbra@redhat.com>
References: <1854325345.872523.1475247154541.JavaMail.zimbra@redhat.com>
Message-ID: <1475749008.3342.3.camel@redhat.com>

Vote: yes

Roman

On Friday, 2016-09-30 at 10:52 -0400, Christine Flood wrote:
> I hereby nominate Zhengyu Gu to Shenandoah Committer.
>
> Zhengyu has made several valuable contributions to the Shenandoah
> code base.
>
> He should be able to commit his changes and future changes himself.
>
> Votes are due by Oct, 7th, 2016.
>
> Only current Shenandoah Committers [1] are eligible to vote
> on this nomination. Votes must be cast in the open by replying
> to this mailing list.
>
> For Lazy Consensus voting instructions, see [2].
> > Christine Flood > > > [1] http://openjdk.java.net/census > [2] http://openjdk.java.net/projects/#committer-vote From shade at redhat.com Thu Oct 6 13:55:33 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 6 Oct 2016 15:55:33 +0200 Subject: RFR: Reuse MacroAssembler-CAS-Obj for C1. Plus some minor/cosmetic changes in cas-obj code In-Reply-To: <1475748823.3342.2.camel@redhat.com> References: <1475748823.3342.2.camel@redhat.com> Message-ID: On 10/06/2016 12:13 PM, Roman Kennke wrote: > This makes C1 reuse the CAS-Obj impl of MacroAssembler. > It also changes the cas-obj impl so that it uses movl for compressed > oops, and use Assembler::equal instead of ::zero for the final set > instruction, just to be consistent with the other comparisons in that > code. Should be cosmetic afaict. > > http://cr.openjdk.java.net/~rkennke/c1casobj/webrev.00/ Okay! -Aleksey From roman at kennke.org Thu Oct 6 13:58:53 2016 From: roman at kennke.org (roman at kennke.org) Date: Thu, 06 Oct 2016 13:58:53 +0000 Subject: hg: shenandoah/jdk9/hotspot: Use MacroAssembler cas-obj code for C1 too. Use movl for compressed oops. Use sete instead of setz to set result reg. Message-ID: <201610061358.u96DwrAH000888@aojmv0008.oracle.com> Changeset: 055a3e68502e Author: rkennke Date: 2016-10-06 11:39 +0200 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/055a3e68502e Use MacroAssembler cas-obj code for C1 too. Use movl for compressed oops. Use sete instead of setz to set result reg. ! src/cpu/x86/vm/c1_LIRAssembler_x86.cpp ! src/cpu/x86/vm/macroAssembler_x86.cpp From roman at kennke.org Thu Oct 6 16:02:54 2016 From: roman at kennke.org (roman at kennke.org) Date: Thu, 06 Oct 2016 16:02:54 +0000 Subject: hg: shenandoah/jdk9/hotspot: Additional assert and debug output. 
Message-ID: <201610061602.u96G2scv002489@aojmv0008.oracle.com> Changeset: fb237a02238e Author: rkennke Date: 2016-10-06 18:02 +0200 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/fb237a02238e Additional assert and debug output. ! src/share/vm/gc/shenandoah/shenandoahBarrierSet.inline.hpp ! src/share/vm/gc/shenandoah/shenandoahConcurrentMark.inline.hpp From rkennke at redhat.com Thu Oct 6 16:09:08 2016 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 06 Oct 2016 18:09:08 +0200 Subject: RFR: Add asserts all over synchronizer Message-ID: <1475770148.3019.5.camel@redhat.com> I've seen an assert in synchronizer, and want to make sure than whenever the mark word is modified by synchronizer, the object really is in to-space and safe to write. This patch adds assert all over the place, wherever the mark word is modified by synchronizer and biased locking. I hope I found all the places... http://cr.openjdk.java.net/~rkennke/assertsynchronizer/webrev.00/ Ok to push? Roman From roman at kennke.org Thu Oct 6 16:43:46 2016 From: roman at kennke.org (roman at kennke.org) Date: Thu, 06 Oct 2016 16:43:46 +0000 Subject: hg: shenandoah/jdk9/hotspot: Use unsafe_equals() instead of == for comparing oops. Message-ID: <201610061643.u96Ghk1v011861@aojmv0008.oracle.com> Changeset: 8a0bf4a83b74 Author: rkennke Date: 2016-10-06 18:43 +0200 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/8a0bf4a83b74 Use unsafe_equals() instead of == for comparing oops. ! 
src/share/vm/gc/shenandoah/shenandoahBarrierSet.inline.hpp From rkennke at redhat.com Thu Oct 6 17:06:21 2016 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 06 Oct 2016 19:06:21 +0200 Subject: RFR: Add asserts all over synchronizer In-Reply-To: <1475770148.3019.5.camel@redhat.com> References: <1475770148.3019.5.camel@redhat.com> Message-ID: <1475773581.3019.7.camel@redhat.com> Aleksey pointed out in private that we could just as well make synchronizer and biasedlocking use oopDesc::cas_set_mark() and then put the asserts there and in oopDesc::set_mark() and oopDesc::release_set_mark(). That should give us better coverage. So here it is: http://cr.openjdk.java.net/~rkennke/assertsynchronizer/webrev.01/ Ok now? Roman Am Donnerstag, den 06.10.2016, 18:09 +0200 schrieb Roman Kennke: > I've seen an assert in synchronizer, and want to make sure than > whenever the mark word is modified by synchronizer, the object really > is in to-space and safe to write. This patch adds assert all over the > place, wherever the mark word is modified by synchronizer and biased > locking. I hope I found all the places... > > http://cr.openjdk.java.net/~rkennke/assertsynchronizer/webrev.00/ > > Ok to push? > > Roman From shade at redhat.com Thu Oct 6 17:11:58 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 6 Oct 2016 19:11:58 +0200 Subject: RFR: Add asserts all over synchronizer In-Reply-To: <1475773581.3019.7.camel@redhat.com> References: <1475770148.3019.5.camel@redhat.com> <1475773581.3019.7.camel@redhat.com> Message-ID: <29e71e12-0959-f0a8-25ad-52e78b2cd245@redhat.com> On 10/06/2016 07:06 PM, Roman Kennke wrote: > Aleksey pointed out in private that we could just as well make > synchronizer and biasedlocking use oopDesc::cas_set_mark() and then put > the asserts there and in oopDesc::set_mark() and > oopDesc::release_set_mark(). That should give us better coverage. So > here it is: > > http://cr.openjdk.java.net/~rkennke/assertsynchronizer/webrev.01/ > > Ok now? 
Yes. Eager to see all those little places where it crashes with Shenandoah. Maybe run TEST="hotspot_all" on Shenandoah/fastdebug before pushing? -Aleksey From rwestrel at redhat.com Fri Oct 7 08:01:59 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 07 Oct 2016 10:01:59 +0200 Subject: ShenandoahBarrierNode::needs_barrier() doesn't fully support compressed oops Message-ID: http://cr.openjdk.java.net/~roland/shenandoah/compressedoops-needs_barrier/webrev.00/ Roland. From rkennke at redhat.com Fri Oct 7 08:06:31 2016 From: rkennke at redhat.com (Roman Kennke) Date: Fri, 07 Oct 2016 10:06:31 +0200 Subject: ShenandoahBarrierNode::needs_barrier() doesn't fully support compressed oops In-Reply-To: References: Message-ID: <1475827591.3019.13.camel@redhat.com> Am Freitag, den 07.10.2016, 10:01 +0200 schrieb Roland Westrelin: > http://cr.openjdk.java.net/~roland/shenandoah/compressedoops-needs_ba > rrier/webrev.00/ This doesn't sound right. Our barriers don't operate on compressed oops. There must always be a Decode*Node in between a, e.g., LoadN and a shenandoah barrier. We shuold never get to a LoadN or similar node. Please correct me if I'm wrong... Roman From rwestrel at redhat.com Fri Oct 7 08:21:25 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 7 Oct 2016 10:21:25 +0200 Subject: ShenandoahBarrierNode::needs_barrier() doesn't fully support compressed oops In-Reply-To: <2130316795.2585882.1475828418021.JavaMail.zimbra@zmail12.collab.prod.int.phx2.redhat.com> References: <2130316795.2585882.1475828418021.JavaMail.zimbra@zmail12.collab.prod.int.phx2.redhat.com> Message-ID: <69679d75-7161-e028-dd97-e1ff5c246387@redhat.com> Forgot to include the list in my answer. Here is the follow up discussion: On 10/07/2016 10:20 AM, Roman Kennke wrote: > > Am 07.10.2016 10:09 vorm. schrieb Roland Westrelin : >> >>> Our barriers don't operate on compressed oops. There must always be a >>> Decode*Node in between a, e.g., LoadN and a shenandoah barrier. 
>>> We should never get to a LoadN or similar node.
>>
>> Because of that change:
>>
>>   if (n->Opcode() == Op_DecodeN ||
>>       n->Opcode() == Op_EncodeP) {
>>     return needs_barrier_impl(phase, orig, n->in(1), rb_mem, allow_fromspace, visited);
>>   }
>>
>> we can reach a LoadN. The rationale is that we might encounter a useless
>> DecodeN (for instance DecodeN->ConN) that's not yet removed. While it
>> doesn't sound likely, it's also pretty easy to check.
>
>
> Ah OK, I get it now. OK!
>
> Roman
>

From rkennke at redhat.com Fri Oct 7 09:06:02 2016
From: rkennke at redhat.com (Roman Kennke)
Date: Fri, 07 Oct 2016 11:06:02 +0200
Subject: RFR: Add asserts all over synchronizer
In-Reply-To: <29e71e12-0959-f0a8-25ad-52e78b2cd245@redhat.com>
References: <1475770148.3019.5.camel@redhat.com> <1475773581.3019.7.camel@redhat.com> <29e71e12-0959-f0a8-25ad-52e78b2cd245@redhat.com>
Message-ID: <1475831162.3019.15.camel@redhat.com>

On Thursday, 2016-10-06 at 19:11 +0200, Aleksey Shipilev wrote:
> On 10/06/2016 07:06 PM, Roman Kennke wrote:
> >
> > Aleksey pointed out in private that we could just as well make
> > synchronizer and biasedlocking use oopDesc::cas_set_mark() and then
> > put the asserts there and in oopDesc::set_mark() and
> > oopDesc::release_set_mark(). That should give us better coverage.
> > So here it is:
> >
> > http://cr.openjdk.java.net/~rkennke/assertsynchronizer/webrev.01/
> >
> > Ok now?
>
> Yes. Eager to see all those little places where it crashes with
> Shenandoah. Maybe run TEST="hotspot_all" on Shenandoah/fastdebug
> before pushing?

I don't know (yet) how to unconditionally use Shenandoah everywhere.
JAVA_TOOL_OPTIONS doesn't seem to do it.

I've run jcstress in sanity mode, and SPECjvm and didn't get any
asserts. I'd like to push it and run more tests on a big machine. Ok?
Roman From shade at redhat.com Fri Oct 7 09:07:13 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 7 Oct 2016 11:07:13 +0200 Subject: RFR: Add asserts all over synchronizer In-Reply-To: <1475831162.3019.15.camel@redhat.com> References: <1475770148.3019.5.camel@redhat.com> <1475773581.3019.7.camel@redhat.com> <29e71e12-0959-f0a8-25ad-52e78b2cd245@redhat.com> <1475831162.3019.15.camel@redhat.com> Message-ID: <896a017d-fc65-e4ee-3f9f-e1f95df3f4d9@redhat.com> On 10/07/2016 11:06 AM, Roman Kennke wrote: > Am Donnerstag, den 06.10.2016, 19:11 +0200 schrieb Aleksey Shipilev: >> On 10/06/2016 07:06 PM, Roman Kennke wrote: >>> >>> Aleksey pointed out in private that we could just as well make >>> synchronizer and biasedlocking use oopDesc::cas_set_mark() and then >>> put >>> the asserts there and in oopDesc::set_mark() and >>> oopDesc::release_set_mark(). That should give us better coverage. >>> So >>> here it is: >>> >>> http://cr.openjdk.java.net/~rkennke/assertsynchronizer/webrev.01/ >>> >>> Ok now? >> >> Yes. Eager to see all those little places where it crashes with >> Shenandoah. Maybe run TEST="hotspot_all" on Shenandoah/fastdebug >> before >> pushing? > > I don't know (yet) how to unconditionally use Shenandoah everywhere. > JAVA_TOOL_OPTIONS doesn't seem to do it. > > I've run jcstress in sanity mode, and SPECjvm and didn't get any > asserts. I'd like to push it and run more tests on a big machine. Ok? Yeah, OK. -Aleksey From roman at kennke.org Fri Oct 7 09:15:19 2016 From: roman at kennke.org (roman at kennke.org) Date: Fri, 07 Oct 2016 09:15:19 +0000 Subject: hg: shenandoah/jdk9/hotspot: Added asserts all over synchronizer code to check that target objects are in to-space. 
Message-ID: <201610070915.u979FJsp004643@aojmv0008.oracle.com>

Changeset: e382d419ca8b
Author:    rkennke
Date:      2016-10-07 11:15 +0200
URL:       http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/e382d419ca8b

Added asserts all over synchronizer code to check that target objects are in to-space.

! src/share/vm/oops/oop.hpp
! src/share/vm/oops/oop.inline.hpp
! src/share/vm/runtime/biasedLocking.cpp
! src/share/vm/runtime/synchronizer.cpp

From rwestrel at redhat.com Fri Oct 7 10:02:26 2016
From: rwestrel at redhat.com (Roland Westrelin)
Date: Fri, 07 Oct 2016 12:02:26 +0200
Subject: implicit null checks with compressed oops may cause crash
Message-ID:

It's the issue with null checks that Zhengyu encountered. Here is a new
webrev with a test case:

http://cr.openjdk.java.net/~roland/shenandoah/compressedoops-nullcheck/webrev.01/

The fix that I sent previously:

http://cr.openjdk.java.net/~roland/shenandoah/compressedoops-nullcheck/webrev.00/

wasn't correct.

With compressed oops enabled there are 2 code shapes for a read barrier:

mov -0x8(%rsi),%r10

or

mov -0x8(%r12,%r11,8),%r10

When an implicit null check fires at a barrier, the offset observed by
the signal handler can then be -8 or base - 8. The

if ((uintptr_t)offset >= base) {

check in MacroAssembler::needs_explicit_null_check() is broken for 2
reasons:

- cast of offset to unsigned causes the test to always succeed if
  offset is -8

- if offset is base - 8, the test never succeeds.

In both cases, the implicit null check is not recognized and the VM
crashes.

Roland.

From rkennke at redhat.com Fri Oct 7 10:05:17 2016
From: rkennke at redhat.com (Roman Kennke)
Date: Fri, 07 Oct 2016 12:05:17 +0200
Subject: implicit null checks with compressed oops may cause crash
In-Reply-To:
References:
Message-ID: <1475834717.3019.16.camel@redhat.com>

Ok.

Roman

On Friday, 2016-10-07 at 12:02 +0200, Roland Westrelin wrote:
> It's the issue with null checks that Zhengyu encountered.
> Here is a new webrev with a test case:
>
> http://cr.openjdk.java.net/~roland/shenandoah/compressedoops-nullcheck/webrev.01/
>
> The fix that I sent previously:
>
> http://cr.openjdk.java.net/~roland/shenandoah/compressedoops-nullcheck/webrev.00/
>
> wasn't correct.
>
> With compressed oops enabled there are 2 code shapes for a read
> barrier:
>
> mov -0x8(%rsi),%r10
>
> or
>
> mov -0x8(%r12,%r11,8),%r10
>
> When an implicit null check fires at a barrier, the offset observed by
> the signal handler can then be -8 or base - 8. The
>
> if ((uintptr_t)offset >= base) {
>
> check in MacroAssembler::needs_explicit_null_check() is broken for 2
> reasons:
>
> - cast of offset to unsigned causes the test to always succeed if
>   offset is -8
>
> - if offset is base - 8, the test never succeeds.
>
> In both cases, the implicit null check is not recognized and the VM
> crashes.
>
> Roland.
>

From roman at kennke.org Fri Oct 7 14:15:26 2016
From: roman at kennke.org (roman at kennke.org)
Date: Fri, 07 Oct 2016 14:15:26 +0000
Subject: hg: shenandoah/jdk9/hotspot: Added assert in biased locking to see where to-space objects might come from.
Message-ID: <201610071415.u97EFQR1012087@aojmv0008.oracle.com>

Changeset: 9cff9ac021c1
Author:    rkennke
Date:      2016-10-07 16:15 +0200
URL:       http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/9cff9ac021c1

Added assert in biased locking to see where to-space objects might come from.

! src/share/vm/runtime/biasedLocking.cpp

From rwestrel at redhat.com Fri Oct 7 15:07:55 2016
From: rwestrel at redhat.com (rwestrel at redhat.com)
Date: Fri, 07 Oct 2016 15:07:55 +0000
Subject: hg: shenandoah/jdk9/hotspot: 2 new changesets
Message-ID: <201610071507.u97F7tgx024352@aojmv0008.oracle.com>

Changeset: 3a1656c0e09d
Author:    roland
Date:      2016-10-07 09:57 +0200
URL:       http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/3a1656c0e09d

ShenandoahBarrierNode::needs_barrier() support for compressed oops

!
src/share/vm/opto/shenandoahSupport.cpp Changeset: 3dd7da8c2d71 Author: roland Date: 2016-10-07 11:37 +0200 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/3dd7da8c2d71 implicit null checks broken with compressed oops ! src/share/vm/asm/assembler.cpp + test/gc/shenandoah/compiler/TestNullCheck.java From roman at kennke.org Fri Oct 7 15:50:49 2016 From: roman at kennke.org (roman at kennke.org) Date: Fri, 07 Oct 2016 15:50:49 +0000 Subject: hg: shenandoah/jdk9/hotspot: More asserts to find where from-space oops come from in biased locking. Message-ID: <201610071550.u97FonAe004093@aojmv0008.oracle.com> Changeset: 8d3c72ecfc70 Author: rkennke Date: 2016-10-07 17:49 +0200 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/8d3c72ecfc70 More asserts to find where from-space oops come from in biased locking. ! src/share/vm/runtime/biasedLocking.cpp From ashipile at redhat.com Fri Oct 7 17:08:41 2016 From: ashipile at redhat.com (ashipile at redhat.com) Date: Fri, 07 Oct 2016 17:08:41 +0000 Subject: hg: shenandoah/jdk9/hotspot: TestWriteBarrierClearControl fails in release build: needs -XX:+UnlockDiagnosticVMOptions. Message-ID: <201610071708.u97H8fCp022672@aojmv0008.oracle.com> Changeset: 907c358bf012 Author: shade Date: 2016-10-07 19:07 +0200 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/907c358bf012 TestWriteBarrierClearControl fails in release build: needs -XX:+UnlockDiagnosticVMOptions. ! test/gc/shenandoah/compiler/TestWriteBarrierClearControl.java From shade at redhat.com Fri Oct 7 17:15:10 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 7 Oct 2016 19:15:10 +0200 Subject: RFR (S): Refactor BrooksPointer utility class Message-ID: <9c8135ed-bef2-aafb-97fa-957c12d89172@redhat.com> Hi, When I was doing the fwdptr experiments, new asserts in BrooksPointer helped to diagnose bugs. Also, the API is unfortunate in the way it provides levity for doing fwdptr transitions we don't want. 
This is the cleanup:
http://cr.openjdk.java.net/~shade/shenandoah/brooks-asserts/webrev.01/

Testing: {fastdebug,release} hotspot_gc_shenandoah, jcstress
tests-custom -m quick (in progress)

Thanks,
-Aleksey

From rkennke at redhat.com Fri Oct 7 17:18:02 2016
From: rkennke at redhat.com (Roman Kennke)
Date: Fri, 07 Oct 2016 19:18:02 +0200
Subject: RFR (S): Refactor BrooksPointer utility class
In-Reply-To: <9c8135ed-bef2-aafb-97fa-957c12d89172@redhat.com>
References: <9c8135ed-bef2-aafb-97fa-957c12d89172@redhat.com>
Message-ID: <1475860682.3019.24.camel@redhat.com>

Looks good to me!

Roman

On Friday, 2016-10-07 at 19:15 +0200, Aleksey Shipilev wrote:
> Hi,
>
> When I was doing the fwdptr experiments, new asserts in BrooksPointer
> helped to diagnose bugs. Also, the API is unfortunate in the way it
> provides levity for doing fwdptr transitions we don't want. This is
> the cleanup:
> http://cr.openjdk.java.net/~shade/shenandoah/brooks-asserts/webrev.01/
>
> Testing: {fastdebug,release} hotspot_gc_shenandoah, jcstress
> tests-custom -m quick (in progress)
>
> Thanks,
> -Aleksey
>
>

From ashipile at redhat.com Fri Oct 7 19:32:11 2016
From: ashipile at redhat.com (ashipile at redhat.com)
Date: Fri, 07 Oct 2016 19:32:11 +0000
Subject: hg: shenandoah/jdk9/hotspot: Refactor BrooksPointer utility class: reflow API, add more asserts, etc.
Message-ID: <201610071932.u97JWB2n028013@aojmv0008.oracle.com>

Changeset: bae5f9792c83
Author:    shade
Date:      2016-10-07 21:31 +0200
URL:       http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/bae5f9792c83

Refactor BrooksPointer utility class: reflow API, add more asserts, etc.

! src/share/vm/gc/shenandoah/brooksPointer.cpp
! src/share/vm/gc/shenandoah/brooksPointer.hpp
! src/share/vm/gc/shenandoah/brooksPointer.inline.hpp
! src/share/vm/gc/shenandoah/shenandoahBarrierSet.hpp
! src/share/vm/gc/shenandoah/shenandoahBarrierSet.inline.hpp
! src/share/vm/gc/shenandoah/shenandoahHeap.cpp
! src/share/vm/gc/shenandoah/shenandoahHeap.hpp
!
src/share/vm/gc/shenandoah/shenandoahHeap.inline.hpp
! src/share/vm/gc/shenandoah/shenandoahHeapRegion.cpp
! src/share/vm/gc/shenandoah/shenandoahMarkCompact.cpp

From roman at kennke.org Mon Oct 10 07:54:08 2016
From: roman at kennke.org (roman at kennke.org)
Date: Mon, 10 Oct 2016 07:54:08 +0000
Subject: hg: shenandoah/jdk9/hotspot: Fix types of oops/oopDesc*/HeapWord* in new asserts to make compiler happy.
Message-ID: <201610100754.u9A7s8f6006621@aojmv0008.oracle.com>

Changeset: 48f12ecaf167
Author:    rkennke
Date:      2016-10-10 09:53 +0200
URL:       http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/48f12ecaf167

Fix types of oops/oopDesc*/HeapWord* in new asserts to make compiler happy.

! src/share/vm/gc/shenandoah/brooksPointer.inline.hpp

From rkennke at redhat.com Mon Oct 10 08:14:16 2016
From: rkennke at redhat.com (Roman Kennke)
Date: Mon, 10 Oct 2016 10:14:16 +0200
Subject: RFR: Barriers for locks rewrite
Message-ID: <1476087256.3628.8.camel@redhat.com>

Hi there,

this is a bigger one.

The initial issue was an assert that complained about the mark word not
being neutral. It happened rarely. The issue was that for locking we
would always do a write barrier fairly early on, then pass that WB'ed
oop down to synchronizer.cpp and biasedLocking.cpp. Some code paths
wrap the oop into a handle, and later pull it out again. In particular,
biased locking can run into a safepoint, after which we do not
guarantee that the oop in the handle is still in to-space. It would
then go on and possibly (and rarely) write to a from-space oop.

I fixed this by moving the barriers to where the mark word is actually
accessed: a read-barrier in oopDesc::mark() and write-barriers for
oopDesc::set_mark(), oopDesc::release_set_mark() and
oopDesc::cas_set_mark().

It means we no longer need write-barriers in many places that somehow
called into synchronizer. Much less changes in shared code!
It means that we potentially don't need to invoke write-barriers for
locks at all: when we try to enter a lock, and the lock is already
biased towards our thread, then we only ever read the mark word, and
therefore only require a read barrier.

I also changed the interpreter and C1 to do the barriers late. Some
nasty code didn't use oopDesc::mark_offset_in_bytes() but just 0. I
fixed that.

I haven't changed this in c2 yet, currently discussing with Roland how
to do that. It does the barriers early still, and this is conservative
and ok.

Last but not least, the patch fixes the assert that got me going
initially :-)

Testing: I've run jcstress in quick mode, and will run more on a bigger
machine as soon as the change is in.

http://cr.openjdk.java.net/~rkennke/lockbarriers/webrev.00/

Ok?

Roman

From rkennke at redhat.com Mon Oct 10 08:17:10 2016
From: rkennke at redhat.com (Roman Kennke)
Date: Mon, 10 Oct 2016 10:17:10 +0200
Subject: RFR: Barriers for locks rewrite
In-Reply-To: <1476087256.3628.8.camel@redhat.com>
References: <1476087256.3628.8.camel@redhat.com>
Message-ID: <1476087430.3628.10.camel@redhat.com>

Oh, and in my opinion, this also makes it more obvious why we need
barriers in those places at all: we read the mark-word -> need
read-barrier. We write the mark-word -> need write barrier.
It also makes the code more consistent with the other field accessors
in oopDesc, and more resilient to future changes: having the barriers
in the accessors ensures that any code that uses those does The Right
Thing.

Roman

On Monday, 2016-10-10 at 10:14 +0200, Roman Kennke wrote:
> Hi there,
>
> this is a bigger one.
>
> The initial issue was an assert that complained about the mark word not
> being neutral. It happened rarely. The issue was that for locking we
> would always do a write barrier fairly early on, then pass that WB'ed
> oop down to synchronizer.cpp and biasedLocking.cpp. Some code paths
> wrap the oop into a handle, and later pull it out again.
In > particular, > biased locking can run into a safepoint, after which we do not > guarantee that the oop in the handle is still in to-space. It would > then go on and possibly (and rarely) write to a from-space oop. > > I fixed this by moving the barriers to where the mark word is > actually > accessed: a read-barrier in oopDesc::mark() and write-barriers for > oopDesc::set_mark(), oopDesc::release_set_mark() and > oopDesc::cas_set_mark(). > > It means we no longer need write-barriers in many places that somehow > called into synchronizer. Much less changes in shared code! > > It means that we potentially don't need to invoke write-barriers for > locks at all: when we try to enter a lock, and the lock is already > biased towards our thread, then we only ever read the mark word, and > therefore only require a read barrier. > > I also changed the interpreter and C1 to do the barriers late. Some > nasty code didn't use oopDesc::mark_offset_in_bytes() but just 0. I > fixed that. > > I haven't changed this in c2 yet, currently discussing with Roland > how > to do that. It does the barriers early still, and this is > conservative > and ok. > > Last but not least, the patch fixes the assert that got me going > initially :-) > > Testing: I've run jcstress in quick mode, and will run more on a > bigger > machine as soon as the change is in. > > http://cr.openjdk.java.net/~rkennke/lockbarriers/webrev.00/ > > Ok? > > Roman > From aph at redhat.com Mon Oct 10 08:28:26 2016 From: aph at redhat.com (Andrew Haley) Date: Mon, 10 Oct 2016 09:28:26 +0100 Subject: RFR: Barriers for locks rewrite In-Reply-To: <1476087256.3628.8.camel@redhat.com> References: <1476087256.3628.8.camel@redhat.com> Message-ID: <6d57dd3e-fcdd-29c2-7ef3-47989aee7cf6@redhat.com> On 10/10/16 09:14, Roman Kennke wrote: > http://cr.openjdk.java.net/~rkennke/lockbarriers/webrev.00/ > > Ok? What to do about AArch64? Andrew. 
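
[Editorial illustration, not part of the original thread.] The idea described in Roman's messages above, placing the barriers inside the mark-word accessors rather than at the call sites, can be sketched with a toy model. This is only a sketch under stated assumptions: ToyObj, read_barrier(), write_barrier(), toy_evacuate() and toy_demo() are hypothetical names invented for this example and are not HotSpot APIs, and the toy write barrier merely resolves a forwarding pointer instead of evacuating anything.

```cpp
#include <cassert>
#include <cstdint>

// Toy model, NOT HotSpot code: each object carries a Brooks-style
// forwarding pointer that refers to itself or to its to-space copy.
struct ToyObj {
  ToyObj*        fwd;        // forwarding pointer
  std::uintptr_t mark_word;  // stand-in for the real mark word

  ToyObj() : fwd(this), mark_word(0) {}

  // Hypothetical read barrier: follow the forwarding pointer.
  static ToyObj* read_barrier(ToyObj* o)  { return o->fwd; }

  // Hypothetical write barrier: in this toy it only resolves the
  // forwarding pointer; a real one may have to evacuate the object.
  static ToyObj* write_barrier(ToyObj* o) { return o->fwd; }

  // The barriers live inside the accessors, so no caller can forget them.
  std::uintptr_t mark()                   { return read_barrier(this)->mark_word; }
  void           set_mark(std::uintptr_t m) { write_barrier(this)->mark_word = m; }
};

// Simulate evacuation: copy the object and install the forwarding pointer.
inline void toy_evacuate(ToyObj* from, ToyObj* to) {
  to->mark_word = from->mark_word;
  to->fwd = to;
  from->fwd = to;
}

// Returns true if a write through a stale from-space reference
// ends up in the to-space copy.
inline bool toy_demo() {
  ToyObj from, to;
  from.set_mark(1);
  ToyObj* stale = &from;     // e.g. an oop pulled out of a handle
  toy_evacuate(&from, &to);  // the object moves, e.g. around a safepoint
  stale->set_mark(42);       // the accessor's barrier redirects the write
  return to.mark_word == 42 && from.mark_word == 1 && stale->mark() == 42;
}
```

Calling toy_demo() exercises the scenario from the thread: a reference that became stale after evacuation still updates the to-space copy, because the barrier is applied at the point where the mark word is touched.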
From rkennke at redhat.com Mon Oct 10 08:34:21 2016
From: rkennke at redhat.com (Roman Kennke)
Date: Mon, 10 Oct 2016 10:34:21 +0200
Subject: RFR: Barriers for locks rewrite
In-Reply-To: <6d57dd3e-fcdd-29c2-7ef3-47989aee7cf6@redhat.com>
References: <1476087256.3628.8.camel@redhat.com> <6d57dd3e-fcdd-29c2-7ef3-47989aee7cf6@redhat.com>
Message-ID: <1476088461.3628.15.camel@redhat.com>

On Monday, 10.10.2016, 09:28 +0100, Andrew Haley wrote:
> On 10/10/16 09:14, Roman Kennke wrote:
> >
> > http://cr.openjdk.java.net/~rkennke/lockbarriers/webrev.00/
> >
> > Ok?
>
> What to do about AArch64?

Good point!

Not doing anything should be safe & conservative.

If we want to do a similar improvement as on x86, seek out all the places where we modify the mark word (e.g. grep for mark_offset_in_bytes... hopefully we haven't used any 0s in there ;-) ) and place an appropriate barrier right before that. In x86, the tricky part was to get the interpreter_write_barrier() code correct for use with c1. (Side note: I would like to use the stub that's used by C2 for C1 and interpreter too... should not be hard). And then remove the early write-barriers in the monitorenter/exit code for interpreter and c1.

Roman

From shade at redhat.com Mon Oct 10 09:26:32 2016
From: shade at redhat.com (Aleksey Shipilev)
Date: Mon, 10 Oct 2016 11:26:32 +0200
Subject: RFR: Barriers for locks rewrite
In-Reply-To: <1476087256.3628.8.camel@redhat.com>
References: <1476087256.3628.8.camel@redhat.com>
Message-ID: <31fbda95-5b22-8d13-6c1a-54bd417d4e83@redhat.com>

On 10/10/2016 10:14 AM, Roman Kennke wrote:
> http://cr.openjdk.java.net/~rkennke/lockbarriers/webrev.00/

I agree, this strikes right at the core where mark word is modified, not spread all around.

Minor nit:

 *) I think jccb(...)
should be replaced with jcc(..., true // maybe_short)

Thanks,
-Aleksey

From aph at redhat.com Mon Oct 10 09:28:57 2016
From: aph at redhat.com (Andrew Haley)
Date: Mon, 10 Oct 2016 10:28:57 +0100
Subject: RFR: Barriers for locks rewrite
In-Reply-To: <1476088461.3628.15.camel@redhat.com>
References: <1476087256.3628.8.camel@redhat.com> <6d57dd3e-fcdd-29c2-7ef3-47989aee7cf6@redhat.com> <1476088461.3628.15.camel@redhat.com>
Message-ID:

On 10/10/16 09:34, Roman Kennke wrote:
> On Monday, 10.10.2016, 09:28 +0100, Andrew Haley wrote:
>> On 10/10/16 09:14, Roman Kennke wrote:
>>>
>>> http://cr.openjdk.java.net/~rkennke/lockbarriers/webrev.00/
>>>
>>> Ok?
>>
>> What to do about AArch64?
>
> Good point!
>
> Not doing anything should be safe & conservative.
>
> If we want to do similar improvement as on x86, seek out all the places
> where we modify the mark word (e.g. grep for mark_offset_in_bytes...
> hopefully we haven't used any 0s in there ;-) ) and place an
> appropriate barrier right before that. In x86, the tricky part was to
> get the interpreter_write_barrier() code correct for use with c1. (Side
> note: I would like to use the stub that's used by C2 for C1 and
> interpreter too... should not be hard). And then remove the early
> write-barriers in the monitorenter/exit code for interpreter and c1.

It's important that AArch64 and x86 don't diverge. It would be best if all arch-dependent Shenandoah patches had AArch64 and x86 code included. Failing that, we need to keep a list of such x86 changes so they can be back-ported.

Andrew.
From rkennke at redhat.com Mon Oct 10 09:30:14 2016
From: rkennke at redhat.com (Roman Kennke)
Date: Mon, 10 Oct 2016 11:30:14 +0200
Subject: RFR: Barriers for locks rewrite
In-Reply-To: <31fbda95-5b22-8d13-6c1a-54bd417d4e83@redhat.com>
References: <1476087256.3628.8.camel@redhat.com> <31fbda95-5b22-8d13-6c1a-54bd417d4e83@redhat.com>
Message-ID: <1476091814.3628.20.camel@redhat.com>

On Monday, 10.10.2016, 11:26 +0200, Aleksey Shipilev wrote:
> On 10/10/2016 10:14 AM, Roman Kennke wrote:
> >
> > http://cr.openjdk.java.net/~rkennke/lockbarriers/webrev.00/
>
> I agree, this strikes right at the core where mark word is modified,
> not spread all around.
>
> Minor nit:
>
>  *) I think jccb(...) should be replaced with jcc(..., true //
> maybe_short)
>

But that only does anything useful if the label is already bound, i.e. backward jumps. Or am I missing something?

Roman

From shade at redhat.com Mon Oct 10 09:39:37 2016
From: shade at redhat.com (Aleksey Shipilev)
Date: Mon, 10 Oct 2016 11:39:37 +0200
Subject: RFR: Barriers for locks rewrite
In-Reply-To: <1476091814.3628.20.camel@redhat.com>
References: <1476087256.3628.8.camel@redhat.com> <31fbda95-5b22-8d13-6c1a-54bd417d4e83@redhat.com> <1476091814.3628.20.camel@redhat.com>
Message-ID:

On 10/10/2016 11:30 AM, Roman Kennke wrote:
> On Monday, 10.10.2016, 11:26 +0200, Aleksey Shipilev wrote:
>>
>> *) I think jccb(...) should be replaced with jcc(..., true //
>> maybe_short)
>
> But that only does anything useful if the label is already bound, i.e.
> backward jumps. Or am I missing something?

Ok, those are forward jumps. No matter then.

-Aleksey

From shade at redhat.com Mon Oct 10 12:18:16 2016
From: shade at redhat.com (Aleksey Shipilev)
Date: Mon, 10 Oct 2016 14:18:16 +0200
Subject: RFR (S): Parallel AlwaysPreTouch
Message-ID:

Hi,

It is not unusual for low-latency customers with large heaps to enable -XX:+AlwaysPreTouch, which touches every page preemptively during the allocation (e.g. on startup).
This pays the upfront cost during startup, instead of during the execution.

However, pre-touching on larger heaps may take a while, see e.g. G1 on 1 TB heap reports 10 minutes (!) to init the heap [1]. So, G1 made the pre-touch parallel. We can do it too:
 http://cr.openjdk.java.net/~shade/shenandoah/always-pretouch/webrev.01/

On my 4-core desktop, -XX:+AlwaysPreTouch HelloWorld with 20 GB heap improves 4.5s -> 1.8s. I would expect this improvement to be larger on mammoth machines.

Note that shared VM changes match the G1 changes [2], which means we should nicely fit during the next merge.

Thanks,
-Aleksey

[1] https://bugs.openjdk.java.net/browse/JDK-8157952
[2] http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/bc2c975bc342

From rkennke at redhat.com Mon Oct 10 12:55:01 2016
From: rkennke at redhat.com (Roman Kennke)
Date: Mon, 10 Oct 2016 14:55:01 +0200
Subject: RFR (S): Parallel AlwaysPreTouch
In-Reply-To:
References:
Message-ID: <1476104101.2764.2.camel@redhat.com>

Great!

Some little comments:
- Why do we need our own flag? Can't we simply use the +AlwaysPreTouch flag?
- Instead of dividing memory into chunks, and adding another flag for it, you could just iterate over shenandoah's regions using ShenandoahHeapRegionSet::claim() and pretouch each region's memory?

Roman

From shade at redhat.com Mon Oct 10 13:05:42 2016
From: shade at redhat.com (Aleksey Shipilev)
Date: Mon, 10 Oct 2016 15:05:42 +0200
Subject: RFR (S): Parallel AlwaysPreTouch
In-Reply-To: <1476104101.2764.2.camel@redhat.com>
References: <1476104101.2764.2.camel@redhat.com>
Message-ID: <3a0772b4-5685-a7bf-aace-df61965bdf2b@redhat.com>

On 10/10/2016 02:55 PM, Roman Kennke wrote:
> Some little comments:
> - Why do we need our own flag? Can't we simply use the +AlwaysPreTouch
> flag?

Ah, I was about to explain that in comments, but forgot. VirtualSpace code has this nasty thing:

static bool commit_expanded(char* start, size_t size, size_t alignment,
                            bool pre_touch, bool executable) {
  if (os::commit_memory(start, size, alignment, executable)) {
    if (pre_touch || AlwaysPreTouch) {
      pretouch_expanded_memory(start, start + size);
    }
    return true;
  }

...notice AlwaysPreTouch is unconditionally checked. This will get memory touched during the initial storage allocation, even before we've got a chance to wind up gang workers to parallelize it. So, if we want to bypass this behavior, there are two options:
 a) Create our own VirtualSpace (like G1 is doing), and manage memory by ourselves;
 b) Turn off AlwaysPreTouch, substitute it with a flag, and do this little thing on our own.

I think (b) is much less hassle.

> - Instead of dividing memory into chunks, and adding another flag for
> it, you could just iterate over shenandoah's regions using
> ShenandoahHeapRegionSet::claim() and pretouch each region's memory?

The flag is coming from G1 changes, it seems we can leverage that.
Also, regions are not yet initialized at this point, and it would seem more straightforward to do this on storage memory.

Thanks,
-Aleksey

From rkennke at redhat.com Mon Oct 10 13:31:25 2016
From: rkennke at redhat.com (Roman Kennke)
Date: Mon, 10 Oct 2016 15:31:25 +0200
Subject: RFR (S): Parallel AlwaysPreTouch
In-Reply-To: <3a0772b4-5685-a7bf-aace-df61965bdf2b@redhat.com>
References: <1476104101.2764.2.camel@redhat.com> <3a0772b4-5685-a7bf-aace-df61965bdf2b@redhat.com>
Message-ID: <1476106285.2764.3.camel@redhat.com>

On Monday, 10.10.2016, 15:05 +0200, Aleksey Shipilev wrote:
> On 10/10/2016 02:55 PM, Roman Kennke wrote:
> >
> > Some little comments:
> > - Why do we need our own flag? Can't we simply use the
> > +AlwaysPreTouch flag?
>
> Ah, I was about to explain that in comments, but forgot. VirtualSpace
> code has this nasty thing:
>
> static bool commit_expanded(char* start, size_t size, size_t alignment,
>                             bool pre_touch, bool executable) {
>   if (os::commit_memory(start, size, alignment, executable)) {
>     if (pre_touch || AlwaysPreTouch) {
>       pretouch_expanded_memory(start, start + size);
>     }
>     return true;
>   }
>
> ...notice AlwaysPreTouch is unconditionally checked. This will get
> memory touched during the initial storage allocation, even before
> we've got a chance to wind up gang workers to parallelize it. So, if
> we want to bypass this behavior, there are two options:
>  a) Create our own VirtualSpace (like G1 is doing), and manage memory
> by ourselves;
>  b) Turn off AlwaysPreTouch, substitute it with a flag, and do this
> little thing on our own.
>
> I think (b) is much less hassle.
>
> > - Instead of dividing memory into chunks, and adding another flag
> > for it, you could just iterate over shenandoah's regions using
> > ShenandoahHeapRegionSet::claim() and pretouch each region's memory?
>
> The flag is coming from G1 changes, it seems we can leverage that.
> Also, regions are not yet initialized at this point, and it would seem
> more straightforward to do this on storage memory.

Ah, ok, that explains it. Ok to go then! Maybe add comments in code..?

Roman

From shade at redhat.com Mon Oct 10 13:45:30 2016
From: shade at redhat.com (Aleksey Shipilev)
Date: Mon, 10 Oct 2016 15:45:30 +0200
Subject: RFR (S): Parallel AlwaysPreTouch
In-Reply-To: <1476106285.2764.3.camel@redhat.com>
References: <1476104101.2764.2.camel@redhat.com> <3a0772b4-5685-a7bf-aace-df61965bdf2b@redhat.com> <1476106285.2764.3.camel@redhat.com>
Message-ID:

On 10/10/2016 03:31 PM, Roman Kennke wrote:
> Ah, ok, that explains it. Ok to go then! Maybe add comments in code..?

Thanks, I also made signature a bit simpler:
 http://cr.openjdk.java.net/~shade/shenandoah/always-pretouch/webrev.02/

Will push soon.

-Aleksey

From ashipile at redhat.com Mon Oct 10 13:56:25 2016
From: ashipile at redhat.com (ashipile at redhat.com)
Date: Mon, 10 Oct 2016 13:56:25 +0000
Subject: hg: shenandoah/jdk9/hotspot: Parallel AlwaysPreTouch: do heap pre-touch operation in parallel.
Message-ID: <201610101356.u9ADuP93021899@aojmv0008.oracle.com>

Changeset: 699db4e3478e
Author: shade
Date: 2016-10-10 15:51 +0200
URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/699db4e3478e

Parallel AlwaysPreTouch: do heap pre-touch operation in parallel.

! src/share/vm/gc/shared/workgroup.hpp
! src/share/vm/gc/shenandoah/shenandoahHeap.cpp
! src/share/vm/gc/shenandoah/shenandoahHeap.hpp
! src/share/vm/gc/shenandoah/shenandoah_globals.hpp
! src/share/vm/runtime/arguments.cpp
! src/share/vm/runtime/globals.hpp
! src/share/vm/runtime/os.cpp
!
src/share/vm/runtime/os.hpp
+ test/gc/shenandoah/AlwaysPreTouch.java

From rwestrel at redhat.com Mon Oct 10 14:20:25 2016
From: rwestrel at redhat.com (rwestrel at redhat.com)
Date: Mon, 10 Oct 2016 14:20:25 +0000
Subject: hg: shenandoah/jdk9/hotspot: C2's barrier verification pass doesn't cover compressed oops correctly
Message-ID: <201610101420.u9AEKPgC027355@aojmv0008.oracle.com>

Changeset: 210512cec44a
Author: roland
Date: 2016-10-10 16:18 +0200
URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/210512cec44a

C2's barrier verification pass doesn't cover compressed oops correctly

! src/share/vm/opto/shenandoahSupport.cpp

From chf at redhat.com Tue Oct 11 12:53:47 2016
From: chf at redhat.com (Christine Flood)
Date: Tue, 11 Oct 2016 08:53:47 -0400 (EDT)
Subject: Result: New Shenandoah Committer Zhengyu Gu
In-Reply-To: <429000562.2403425.1476190012406.JavaMail.zimbra@redhat.com>
Message-ID: <1712708438.2404715.1476190427285.JavaMail.zimbra@redhat.com>

Voting for Zhengyu Gu is now closed.

Yes: 3
Veto: 0
Abstain: 2

According to the Bylaws definition of Lazy Consensus, this is sufficient to approve the nomination.

Christine

[1] http://mail.openjdk.java.net/pipermail/shenandoah-dev/2016-September/000841.html

From shade at redhat.com Tue Oct 11 15:07:38 2016
From: shade at redhat.com (Aleksey Shipilev)
Date: Tue, 11 Oct 2016 17:07:38 +0200
Subject: RFR (S) Improve UseLargePages support
Message-ID: <5ed889fe-de01-d06a-21c9-adfd8fba9a45@redhat.com>

Hi,

I took a brief look at what Shenandoah does with large pages (-XX:+UseLargePages). It seems we are already covered for the heap part, because the shared code already allocates it with large pages when requested.
We only need a few minor touchups:
 a) Allow pretouch to touch with larger steps, since pages are larger -- this improves -XX:+AlwaysPreTouch performance even further;
 b) Make sure the region sizes are at least one page, otherwise, in theory, mprotect granularity would make false alarms in our mprotect-based access verification code;
 c) Make sure we allocate large bitmaps with large pages too;

All done here:
 http://cr.openjdk.java.net/~shade/shenandoah/large-pages-fix/webrev.01/

GC-heavy compiler tests are still happy:

 Benchmark           Mode  Cnt    Score   Error    Units

 # baseline, -XX:-UseLargePages
 Compiler.compiler  thrpt   10   70.635 ± 2.289  ops/min
 Compiler.sunflow   thrpt   10  175.764 ± 4.574  ops/min

 # patched, -XX:-UseLargePages
 Compiler.compiler  thrpt   10   70.639 ± 2.043  ops/min
 Compiler.sunflow   thrpt   10  175.878 ± 4.355  ops/min

 # baseline, -XX:+UseLargePages
 Compiler.compiler  thrpt   10   73.543 ± 2.574  ops/min
 Compiler.sunflow   thrpt   10  183.399 ± 5.432  ops/min

 # patched, -XX:+UseLargePages
 Compiler.compiler  thrpt   10   74.739 ± 2.526  ops/min
 Compiler.sunflow   thrpt   10  186.247 ± 5.963  ops/min

Thanks,
-Aleksey

From rkennke at redhat.com Wed Oct 12 08:18:09 2016
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 12 Oct 2016 10:18:09 +0200
Subject: RFR (S) Improve UseLargePages support
In-Reply-To: <5ed889fe-de01-d06a-21c9-adfd8fba9a45@redhat.com>
References: <5ed889fe-de01-d06a-21c9-adfd8fba9a45@redhat.com>
Message-ID: <1476260289.2926.7.camel@redhat.com>

Looks alright!

Roman

From ashipile at redhat.com Wed Oct 12 08:20:44 2016
From: ashipile at redhat.com (ashipile at redhat.com)
Date: Wed, 12 Oct 2016 08:20:44 +0000
Subject: hg: shenandoah/jdk9/hotspot: Improve UseLargePages support.
Message-ID: <201610120820.u9C8Ki65015452@aojmv0008.oracle.com>

Changeset: 72ad8196ae92
Author: shade
Date: 2016-10-12 10:19 +0200
URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/72ad8196ae92

Improve UseLargePages support.

! src/share/vm/gc/shenandoah/shenandoahHeap.cpp
!
src/share/vm/gc/shenandoah/shenandoahHeapRegion.cpp

From rkennke at redhat.com Wed Oct 12 20:16:57 2016
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 12 Oct 2016 22:16:57 +0200
Subject: RFR: Barriers for locks rewrite
In-Reply-To: <1476087256.3628.8.camel@redhat.com>
References: <1476087256.3628.8.camel@redhat.com>
Message-ID: <1476303417.2683.15.camel@redhat.com>

Ok, this took a little longer to sort out. For now I decided to take out the platform specific parts. This 'only' moves the barriers from outside the locking code to where the mark word is actually touched. This results in far fewer write barriers all over the place, and fixes the problem that when synchronizer or biasedlocking pull out an oop from a Handle, it may no longer be in to-space.

http://cr.openjdk.java.net/~rkennke/lockbarriers/webrev.01/

The platform specific parts will be addressed in a separate change, and in a different way than originally intended.

Ok to commit now?

Roman

From rkennke at redhat.com Thu Oct 13 08:08:45 2016
From: rkennke at redhat.com (Roman Kennke)
Date: Thu, 13 Oct 2016 10:08:45 +0200
Subject: RFR: Use oopDesc::mark_offset_in_bytes() instead of 0 for addressing mark word
Message-ID: <1476346125.2776.4.camel@redhat.com>

There are some places in assembly code where we access the mark word using Address(obj, 0). Not only is this bad style, but it makes it very hard to find all accesses of the mark word (that was my problem). Also, should the object layout ever change in the future (*cough*), this makes it easier to do.

This patch changes hopefully all the places in x86 and aarch64. In aarch64, we're cmpxchg'ing the mark word in some places using the obj_reg as address. I've not put a lea ahead of it, but only an assert that the mark_offset_in_bytes() == 0.

This is not strictly a shenandoah specific patch, but I need it in our repos for upcoming patches. Not sure if it's useful to propose upstream at this point in jdk9 dev?
Also, it would probably require fixing ppc and sparc as well, which I have currently no desire to do ;-)

http://cr.openjdk.java.net/~rkennke/markoffset/webrev.00/

Ok to commit?

Roman

From aph at redhat.com Thu Oct 13 09:23:51 2016
From: aph at redhat.com (Andrew Haley)
Date: Thu, 13 Oct 2016 10:23:51 +0100
Subject: RFR: Use oopDesc::mark_offset_in_bytes() instead of 0 for addressing mark word
In-Reply-To: <1476346125.2776.4.camel@redhat.com>
References: <1476346125.2776.4.camel@redhat.com>
Message-ID: <8035b0e7-310d-f41f-bcdf-20cf28407451@redhat.com>

On 13/10/16 09:08, Roman Kennke wrote:
> This is not strictly a shenandoah specific patch, but I need it in our
> repos for upcoming patches. Not sure if it's useful to propose upstream
> at this point in jdk9 dev?

I don't think it is at all a good idea to change any code which is not strictly required by Shenandoah. The lock word as the first word of an object is a basic assumption. Any non-Shenandoah changes should go upstream first.

Andrew.

From rkennke at redhat.com Thu Oct 13 09:43:27 2016
From: rkennke at redhat.com (Roman Kennke)
Date: Thu, 13 Oct 2016 11:43:27 +0200
Subject: RFR: Use oopDesc::mark_offset_in_bytes() instead of 0 for addressing mark word
In-Reply-To: <8035b0e7-310d-f41f-bcdf-20cf28407451@redhat.com>
References: <1476346125.2776.4.camel@redhat.com> <8035b0e7-310d-f41f-bcdf-20cf28407451@redhat.com>
Message-ID: <1476351807.2776.9.camel@redhat.com>

On Thursday, 13.10.2016, 10:23 +0100, Andrew Haley wrote:
> On 13/10/16 09:08, Roman Kennke wrote:
> >
> > This is not strictly a shenandoah specific patch, but I need it in our
> > repos for upcoming patches. Not sure if it's useful to propose upstream
> > at this point in jdk9 dev?
>
> I don't think it is at all a good idea to change any code which is
> not strictly required by Shenandoah. The lock word as the first word
> of an object is a basic assumption. Any non-Shenandoah changes should
> go upstream first.
>
> Andrew.
>

Ok.
Posted for review there:
http://mail.openjdk.java.net/pipermail/hotspot-dev/2016-October/024889.html

Roman

From roman at kennke.org Thu Oct 13 10:32:12 2016
From: roman at kennke.org (roman at kennke.org)
Date: Thu, 13 Oct 2016 10:32:12 +0000
Subject: hg: shenandoah/jdk9/hotspot: Added includes to fix aarch64 build.
Message-ID: <201610131032.u9DAWCeC001243@aojmv0008.oracle.com>

Changeset: 757506b12adf
Author: rkennke
Date: 2016-10-13 12:31 +0200
URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/757506b12adf

Added includes to fix aarch64 build.

! src/cpu/aarch64/vm/aarch64.ad
! src/share/vm/runtime/globals_extension.hpp

From zgu at redhat.com Thu Oct 13 13:44:43 2016
From: zgu at redhat.com (zgu at redhat.com)
Date: Thu, 13 Oct 2016 13:44:43 +0000
Subject: hg: shenandoah/jdk9/hotspot: Fixed Shenandoah support in jvmci
Message-ID: <201610131344.u9DDihC1011743@aojmv0008.oracle.com>

Changeset: c2620a41ff99
Author: zgu
Date: 2016-10-04 16:24 -0400
URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/c2620a41ff99

Fixed Shenandoah support in jvmci

! src/share/vm/jvmci/jvmciCompilerToVM.cpp

From zgu at redhat.com Thu Oct 13 16:18:31 2016
From: zgu at redhat.com (Zhengyu Gu)
Date: Thu, 13 Oct 2016 12:18:31 -0400
Subject: RFR(S): Parallelize CLDG and CodeCache root scanning and evacuation
Message-ID: <703d8300-5e68-e130-1c9e-9320c27da0e2@redhat.com>

Hi,

This investigation was based on Roman's suggestion. Parallelizing CLDG and CodeCache root scanning and evacuation results in about a 4% improvement of critical-jOPS without much impact on max-jOPS on SPECjbb2015 tests.

Webrev: http://cr.openjdk.java.net/~zgu/par-scan/webrev.01/

Thanks,
-Zhengyu

From roman at kennke.org Fri Oct 14 13:48:25 2016
From: roman at kennke.org (roman at kennke.org)
Date: Fri, 14 Oct 2016 13:48:25 +0000
Subject: hg: shenandoah/jdk9/hotspot: Use non-checking version of mov for 0xdeaddead to avoid assert.
Message-ID: <201610141348.u9EDmP5j014488@aojmv0008.oracle.com>

Changeset: fd87a04d5540
Author: rkennke
Date: 2016-10-14 13:45 +0000
URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/fd87a04d5540

Use non-checking version of mov for 0xdeaddead to avoid assert.

! src/cpu/aarch64/vm/sharedRuntime_aarch64.cpp

From shade at redhat.com Mon Oct 17 14:50:16 2016
From: shade at redhat.com (Aleksey Shipilev)
Date: Mon, 17 Oct 2016 16:50:16 +0200
Subject: RFR (XS): Prune is_in call from in_cset_fast_test
Message-ID: <149fa695-5039-7e3b-3a79-f8d5db6b3874@redhat.com>

Hi,

The profiles on SPECjvm2008 show that ShenandoahHeap::is_in is in hotspots. Further analysis shows the most dominating use is in_cset_fast_test. It seems to me the ::is_in use there is redundant and should be instead turned into the assert -- compiled code that dubs this test blindly trusts it anyway.

Webrev:
 http://cr.openjdk.java.net/~shade/shenandoah/in-cset-opt/webrev.01/

Testing: fastdebug/hotspot_gc_shenandoah

Thanks,
-Aleksey

From rkennke at redhat.com Mon Oct 17 15:00:44 2016
From: rkennke at redhat.com (Roman Kennke)
Date: Mon, 17 Oct 2016 17:00:44 +0200
Subject: RFR (XS): Prune is_in call from in_cset_fast_test
In-Reply-To: <149fa695-5039-7e3b-3a79-f8d5db6b3874@redhat.com>
References: <149fa695-5039-7e3b-3a79-f8d5db6b3874@redhat.com>
Message-ID: <1476716444.2671.13.camel@redhat.com>

Yes. (Duh. How did this happen?)

Cheers,
Roman

From ashipile at redhat.com Mon Oct 17 15:05:28 2016
From: ashipile at redhat.com (ashipile at redhat.com)
Date: Mon, 17 Oct 2016 15:05:28 +0000
Subject: hg: shenandoah/jdk9/hotspot: Prune is_in call from in_cset_fast_test.
Message-ID: <201610171505.u9HF5Sh6020963@aojmv0008.oracle.com>

Changeset: bb1b3ff7f950
Author: shade
Date: 2016-10-17 17:05 +0200
URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/bb1b3ff7f950

Prune is_in call from in_cset_fast_test.

! src/share/vm/gc/shenandoah/shenandoahHeap.inline.hpp

From shade at redhat.com Mon Oct 17 15:19:49 2016
From: shade at redhat.com (Aleksey Shipilev)
Date: Mon, 17 Oct 2016 17:19:49 +0200
Subject: RFR: Barriers for locks rewrite
In-Reply-To: <1476303417.2683.15.camel@redhat.com>
References: <1476087256.3628.8.camel@redhat.com> <1476303417.2683.15.camel@redhat.com>
Message-ID:

On 10/12/2016 10:16 PM, Roman Kennke wrote:
> http://cr.openjdk.java.net/~rkennke/lockbarriers/webrev.01/

Looks okay to me.

Thanks,
-Aleksey

From rkennke at redhat.com Tue Oct 18 10:03:51 2016
From: rkennke at redhat.com (Roman Kennke)
Date: Tue, 18 Oct 2016 12:03:51 +0200
Subject: RFR(S): Parallelize CLDG and CodeCache root scanning and evacuation
In-Reply-To: <703d8300-5e68-e130-1c9e-9320c27da0e2@redhat.com>
References: <703d8300-5e68-e130-1c9e-9320c27da0e2@redhat.com>
Message-ID: <1476785031.2671.17.camel@redhat.com>

On Thursday, 13.10.2016, 12:18 -0400, Zhengyu Gu wrote:
> Hi,
>
> This investigation was based on Roman's suggestion. Parallelizing
> CLDG and CodeCache root scanning and evacuation results in about a
> 4% improvement of critical-jOPS without much impact on max-jOPS on
> SPECjbb2015 tests.
>
> Webrev: http://cr.openjdk.java.net/~zgu/par-scan/webrev.01/

Sorry for late reply, somehow it slipped through my attention ;-)

It's great, please push!
Most of it seems potentially useful for other GCs, so keep it on the list for proposing upstream. Although we'll have to wait for jdk10 to branch off...

Cheers,
Roman

From zgu at redhat.com Tue Oct 18 13:37:41 2016
From: zgu at redhat.com (zgu at redhat.com)
Date: Tue, 18 Oct 2016 13:37:41 +0000
Subject: hg: shenandoah/jdk9/hotspot: Parallelize CLDG and CodeCache root scanning and evacuation
Message-ID: <201610181337.u9IDbgq9012379@aojmv0008.oracle.com>

Changeset: 8c4c934a3fa8
Author: zgu
Date: 2016-10-18 09:31 -0400
URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/8c4c934a3fa8

Parallelize CLDG and CodeCache root scanning and evacuation

! src/share/vm/classfile/classLoaderData.cpp
! src/share/vm/classfile/classLoaderData.hpp
! src/share/vm/code/codeCache.cpp
! src/share/vm/code/codeCache.hpp
! src/share/vm/gc/shenandoah/shenandoahRootProcessor.cpp
! src/share/vm/gc/shenandoah/shenandoahRootProcessor.hpp

From roman at kennke.org Thu Oct 20 09:30:55 2016
From: roman at kennke.org (roman at kennke.org)
Date: Thu, 20 Oct 2016 09:30:55 +0000
Subject: hg: shenandoah/jdk9/hotspot: Rewrite barriers for locks to invoke barriers only when mark word is actually touched.
Message-ID: <201610200930.u9K9UtiV018154@aojmv0008.oracle.com>

Changeset: e44bdff3b9d4
Author: rkennke
Date: 2016-10-20 11:22 +0200
URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/e44bdff3b9d4

Rewrite barriers for locks to invoke barriers only when mark word is actually touched.

! src/share/vm/c1/c1_Runtime1.cpp
! src/share/vm/classfile/classLoaderData.cpp
! src/share/vm/classfile/systemDictionary.cpp
! src/share/vm/gc/shared/referencePendingListLocker.cpp
! src/share/vm/interpreter/interpreterRuntime.cpp
! src/share/vm/oops/cpCache.cpp
! src/share/vm/oops/instanceKlass.cpp
! src/share/vm/oops/oop.hpp
! src/share/vm/oops/oop.inline.hpp
! src/share/vm/opto/runtime.cpp
! src/share/vm/prims/jni.cpp
! src/share/vm/prims/jvm.cpp
! src/share/vm/prims/jvmtiEnv.cpp
!
src/share/vm/runtime/biasedLocking.cpp
! src/share/vm/runtime/deoptimization.cpp
! src/share/vm/runtime/sharedRuntime.cpp
! src/share/vm/runtime/synchronizer.cpp
! src/share/vm/runtime/thread.cpp

From rkennke at redhat.com Thu Oct 20 13:48:39 2016
From: rkennke at redhat.com (Roman Kennke)
Date: Thu, 20 Oct 2016 15:48:39 +0200
Subject: RFR: Store verification
Message-ID: <1476971319.2548.11.camel@redhat.com>

This change implements (extends+improves in case of AArch64) store validation for Shenandoah. The idea is to insert a check at every store at assembly level that verifies (for stores to the heap):

- that the target address is not in the collection set, ever
- in case of oop stores, that the store-value is not in the collection set, but only during marking

This is very useful when the write-barrier is not obviously next to the store, e.g. for c1 and c2. In fact, I left out the store-checks for the interpreter stores because the check would always be right next to the write-barrier. It's also inserted wherever objects are used for locking (e.g. writes to the mark-word).

This is implemented in the interpreter (locks-only), c1 and c2 (x86-only for now, see below), for both AArch64 and x86.

A similar check was already implemented in aarch64. This patch improves it such that it can be inserted anywhere (regardless of whether rscratch1+2 are available).

I have not yet implemented store-checks in x86_64.ad, because I couldn't figure out how/where to insert the necessary MacroAssembler calls. Roland: maybe you could have a look?

In aarch64, I added two methods to the assembler: get_cflags(Register) and set_cflags(Register); they are used to save/restore the condition flags. If we don't do this, we run into failures in tiered level 3. Apparently it mixes condition-setting/using instructions with stores in a funny way.

Store-checks are turned on via -XX:+ShenandoahStoreCheck.

Store-checks are now equivalent in x86 and aarch64.
http://cr.openjdk.java.net/~rkennke/storechecks/webrev/

Ok to push?

Roman

From rwestrel at redhat.com Thu Oct 20 15:25:13 2016
From: rwestrel at redhat.com (Roland Westrelin)
Date: Thu, 20 Oct 2016 17:25:13 +0200
Subject: RFR: Store verification
In-Reply-To: <1476971319.2548.11.camel@redhat.com>
References: <1476971319.2548.11.camel@redhat.com>
Message-ID: <94287fad-77f0-2af4-8ad1-0974b17f485e@redhat.com>

> http://cr.openjdk.java.net/~rkennke/storechecks/webrev/

That looks good to me. I'll take a look at x86_64.ad

Roland.

From aph at redhat.com Thu Oct 20 15:46:48 2016
From: aph at redhat.com (Andrew Haley)
Date: Thu, 20 Oct 2016 16:46:48 +0100
Subject: RFR: Store verification
In-Reply-To: <1476971319.2548.11.camel@redhat.com>
References: <1476971319.2548.11.camel@redhat.com>
Message-ID:

On 20/10/16 14:48, Roman Kennke wrote:
> http://cr.openjdk.java.net/~rkennke/storechecks/webrev/
>
> Ok to push?

I really don't like all those boolean constants "true" and "false" all over the place. Yeah, I know I've done it elsewhere in aarch64.ad and it's bad style. Mea culpa, but I don't want it to spread any further.

Please either define loadInsn() and storeInsn() or define an enum

  enum (isLoad, isStore)

and use it instead of a bool.

+  orr(rval, zr, value);
+  // mov(rval, value);
+  mov(raddr, addr);

What is this change for? Why orr instead of mov?

+  // Macro instructions for accessing and updating the condition flags
+  inline void get_cflags(Register reg)
+  {
+    mrs(0b011, 0b0100, 0b0010, 0b000, reg);
+  }
+
+  inline void set_cflags(Register reg)
+  {
+    msr(0b011, 0b0100, 0b0010, 0b000, reg);
+  }
+

get_nzcv() would be easier for the reader to understand, and matches ARM's assembly language.

Andrew.
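The enum-versus-bool point in the review above can be illustrated with a small stand-alone sketch. This is only an illustration under stated assumptions: `AccessKind`, `emit_access`, `load_insn` and `store_insn` are hypothetical names, the returned strings stand in for emitted instructions, and none of it is taken from the webrev or from aarch64.ad.

```cpp
#include <cassert>
#include <string>

// Hypothetical names, for illustration only (not from the webrev).
// An enum replaces a bare bool so that call sites document themselves.
enum AccessKind { isLoad, isStore };

// With a bool parameter a call site would read emit_access(true, ...),
// which says nothing about what 'true' means. Here the kind is explicit.
// The returned string is a stand-in for the code that would be emitted.
std::string emit_access(AccessKind kind, const std::string& addr) {
  assert(kind == isLoad || kind == isStore);
  if (kind == isStore) {
    // In this sketch a store is preceded by a (pretend) store check.
    return "check " + addr + "; str " + addr;
  }
  return "ldr " + addr;
}

// Thin wrappers, in the spirit of the loadInsn()/storeInsn() alternative:
// the shared body is written once and the flag appears in only two places.
std::string load_insn(const std::string& addr) {
  return emit_access(isLoad, addr);
}

std::string store_insn(const std::string& addr) {
  return emit_access(isStore, addr);
}
```

Either form keeps the multi-line shared body in one place; the enum additionally makes every call site readable without looking up the parameter list.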
From rkennke at redhat.com Thu Oct 20 16:31:07 2016
From: rkennke at redhat.com (Roman Kennke)
Date: Thu, 20 Oct 2016 18:31:07 +0200
Subject: RFR: Store verification
In-Reply-To:
References: <1476971319.2548.11.camel@redhat.com>
Message-ID: <1476981067.2548.14.camel@redhat.com>

On Thursday, 20.10.2016 at 16:46 +0100, Andrew Haley wrote:
> On 20/10/16 14:48, Roman Kennke wrote:
> >
> > http://cr.openjdk.java.net/~rkennke/storechecks/webrev/
> >
> > Ok to push?
>
> I really don't like all those boolean constants "true" and "false"
> all over the place.

Me neither.

> Yeah, I know I've done it elsewhere in aarch64.ad and
> it's bad style. Mea culpa, but I don't want it to spread any
> further.
>
> Please either define loadInsn() and storeInsn()

We've got 3 loadStore() methods, and one MOV_VOLATILE macro, and they are multiline, 2 of them >10 lines, and duplicating those with only 1 line of difference seems ugly...

> or define an enum
>
>   enum (isLoad, isStore)
>
> and use it instead of a bool.

Sounds more useful. Will do that then.

> +  orr(rval, zr, value);
> +  // mov(rval, value);
> +  mov(raddr, addr);
>
> What is this change for? Why orr instead of mov?

A leftover from some debugging. Will turn it back into mov. Good find!

> +  // Macro instructions for accessing and updating the condition flags
> +  inline void get_cflags(Register reg)
> +  {
> +    mrs(0b011, 0b0100, 0b0010, 0b000, reg);
> +  }
> +
> +  inline void set_cflags(Register reg)
> +  {
> +    msr(0b011, 0b0100, 0b0010, 0b000, reg);
> +  }
> +
>
> get_nzcv() would be easier for the reader to understand, and matches
> ARM's assembly language.

Ok. Will post another webrev soon.
Roman

From aph at redhat.com Thu Oct 20 16:41:06 2016
From: aph at redhat.com (Andrew Haley)
Date: Thu, 20 Oct 2016 17:41:06 +0100
Subject: RFR: Store verification
In-Reply-To: <1476981067.2548.14.camel@redhat.com>
References: <1476971319.2548.11.camel@redhat.com> <1476981067.2548.14.camel@redhat.com>
Message-ID: <8b7e374b-2bfa-0d3b-a873-0480d355d71f@redhat.com>

On 20/10/16 17:31, Roman Kennke wrote:
> We've got 3 loadStore() methods, and one MOV_VOLATILE macro, and they
> are multiline, 2 of them >10lines and duplicating those, with only 1
> line difference seems ugly...

No, I meant I didn't mind you defining a method using the true and false but only using it twice, once from a doLoad() and once from a doStore() method. But never mind.

Andrew.

From rwestrel at redhat.com Fri Oct 21 08:27:39 2016
From: rwestrel at redhat.com (Roland Westrelin)
Date: Fri, 21 Oct 2016 10:27:39 +0200
Subject: RFR: Store verification
In-Reply-To: <1476971319.2548.11.camel@redhat.com>
References: <1476971319.2548.11.camel@redhat.com>
Message-ID:

> I have not yet implemented store-checks in x86_64.ad yet, because I
> couldn't figure out how/where to insert the necessary MacroAssembler
> calls. Roland: maybe you could have a look?

Below is an example of how I think you would do it in x86_64.ad

Also it seems you got some conditions wrong.

Roland.

diff --git a/src/cpu/x86/vm/macroAssembler_x86.cpp b/src/cpu/x86/vm/macroAssembler_x86.cpp
--- a/src/cpu/x86/vm/macroAssembler_x86.cpp
+++ b/src/cpu/x86/vm/macroAssembler_x86.cpp
@@ -6123,7 +6123,7 @@
 }
 
 void MacroAssembler::_shenandoah_store_addr_check(Register dst, const char* msg, const char* file, int line) {
-  if (! UseShenandoahGC && ! ShenandoahStoreCheck) return;
+  if (! UseShenandoahGC || ! ShenandoahStoreCheck) return;
   if (dst == rsp) return; // Stack-based target
 
   Register raddr = r9;
@@ -6169,7 +6169,7 @@
 }
 
 void MacroAssembler::_shenandoah_store_check(Register dst, Register value, const char* msg, const char* file, int line) {
-  if (! UseShenandoahGC && ! ShenandoahStoreCheck) return;
+  if (! UseShenandoahGC || ! ShenandoahStoreCheck) return;
   if (dst == rsp) return; // Stack-based target
 
   Register raddr = r8;
diff --git a/src/cpu/x86/vm/x86_64.ad b/src/cpu/x86/vm/x86_64.ad
--- a/src/cpu/x86/vm/x86_64.ad
+++ b/src/cpu/x86/vm/x86_64.ad
@@ -2693,6 +2693,15 @@
                    RELOC_DISP32);
   %}
 
+  enc_class shenandoah_store_check(memory mem, any_RegP src) %{
+    MacroAssembler _masm(&cbuf);
+    __ shenandoah_store_check($mem$$Address, $src$$Register);
+  %}
+
+  enc_class shenandoah_store_addr_check(memory mem) %{
+    MacroAssembler _masm(&cbuf);
+    __ shenandoah_store_addr_check($mem$$Address);
+  %}
 %}
 
 
@@ -5665,7 +5674,7 @@
   ins_cost(125); // XXX
   format %{ "movl $mem, $src\t# int" %}
   opcode(0x89);
-  ins_encode(REX_reg_mem(src, mem), OpcP, reg_mem(src, mem));
+  ins_encode(shenandoah_store_addr_check(mem), REX_reg_mem(src, mem), OpcP, reg_mem(src, mem));
   ins_pipe(ialu_mem_reg);
 %}
 
@@ -5689,7 +5698,7 @@
   ins_cost(125); // XXX
   format %{ "movq $mem, $src\t# ptr" %}
   opcode(0x89);
-  ins_encode(REX_reg_mem_wide(src, mem), OpcP, reg_mem(src, mem));
+  ins_encode(shenandoah_store_check(mem, src), REX_reg_mem_wide(src, mem), OpcP, reg_mem(src, mem));
   ins_pipe(ialu_mem_reg);
 %}
 

From rkennke at redhat.com Fri Oct 21 13:01:15 2016
From: rkennke at redhat.com (Roman Kennke)
Date: Fri, 21 Oct 2016 15:01:15 +0200
Subject: RFR: Store verification
In-Reply-To: <8b7e374b-2bfa-0d3b-a873-0480d355d71f@redhat.com>
References: <1476971319.2548.11.camel@redhat.com> <1476981067.2548.14.camel@redhat.com> <8b7e374b-2bfa-0d3b-a873-0480d355d71f@redhat.com>
Message-ID: <1477054875.2548.18.camel@redhat.com>

On Thursday, 20.10.2016 at 17:41 +0100, Andrew Haley wrote:
> On 20/10/16 17:31, Roman Kennke wrote:
> >
> > We've got 3 loadStore() methods, and one MOV_VOLATILE macro, and
> > they are multiline, 2 of them >10lines and duplicating those, with
> > only 1 line difference seems ugly...
>
> No, I meant I didn't mind you defining a method using the true and
> false but only using it twice, once from a doLoad() and once from a
> doStore() method. But never mind.

ok. I went for the enum. I agree, it does look better/clearer.

I also changed the orr back to mov and renamed the set/get_cflags() to set/get_nzcv(), and incorporated Roland's suggestions about the conditions (oops) and the store-checks in x86_64.ad.

http://cr.openjdk.java.net/~rkennke/storechecks/webrev.01/

Ok now?

Roman

From chf at redhat.com Fri Oct 21 14:22:26 2016
From: chf at redhat.com (Christine Flood)
Date: Fri, 21 Oct 2016 10:22:26 -0400 (EDT)
Subject: RFR: Small change to logging to make -Xlog:gc behave better.
In-Reply-To: <586756530.5098808.1477059701909.JavaMail.zimbra@redhat.com>
Message-ID: <1852921693.5098977.1477059746359.JavaMail.zimbra@redhat.com>

I changed the behavior slightly so that -Xlog:gc now reports:

   init mark pause times
   final mark pause times
   concurrent marking times
   concurrent evacuation times
   and when concurrent gc is canceled.

http://cr.openjdk.java.net/~chf/BetterLogging/webrev.00/

Christine

From rkennke at redhat.com Fri Oct 21 14:25:14 2016
From: rkennke at redhat.com (Roman Kennke)
Date: Fri, 21 Oct 2016 16:25:14 +0200
Subject: RFR: Small change to logging to make -Xlog:gc behave better.
In-Reply-To: <1852921693.5098977.1477059746359.JavaMail.zimbra@redhat.com>
References: <1852921693.5098977.1477059746359.JavaMail.zimbra@redhat.com>
Message-ID: <1477059914.2548.21.camel@redhat.com>

On Friday, 21.10.2016 at 10:22 -0400, Christine Flood wrote:
> I changed the behavior slightly so that -Xlog:gc now
> reports:
>
>    init mark pause times
>    final mark pause times
>    concurrent marking times
>    concurrent evacuation times
>    and when concurrent gc is canceled.
>
>
> http://cr.openjdk.java.net/~chf/BetterLogging/webrev.00/

Very good! Please push!
Roman From rkennke at redhat.com Fri Oct 21 15:31:36 2016 From: rkennke at redhat.com (Roman Kennke) Date: Fri, 21 Oct 2016 17:31:36 +0200 Subject: RFR: Shared write barrier stub (x86 and aarch64) Message-ID: <1477063896.2548.27.camel@redhat.com> This change makes C1 and the interpreter use the assembly write-barrier stub that was so far only used for C2. All 3 now use the same MacroAssembler routine to generate the call to the stub (incl. evacuation-in-progress-check), and all 3 now use that stub for the write-barrier fast path, and all 3 now call into Shenandoah for slow path using the same entry method. Advantages: - More efficient write barriers for interpreter and C1 - Smaller code - Less duplicated code (easier to maintain...) In order to do this, I added a 3rd stub generation phase, because we need the heap to be initialized (happens after phase 1) and before generating the interpreter (happens before phase 2). http://cr.openjdk.java.net/~rkennke/sharedwbstubs/webrev.00/ Ok to go? Roman From aph at redhat.com Fri Oct 21 15:47:57 2016 From: aph at redhat.com (Andrew Haley) Date: Fri, 21 Oct 2016 16:47:57 +0100 Subject: RFR: Shared write barrier stub (x86 and aarch64) In-Reply-To: <1477063896.2548.27.camel@redhat.com> References: <1477063896.2548.27.camel@redhat.com> Message-ID: On 21/10/16 16:31, Roman Kennke wrote: > http://cr.openjdk.java.net/~rkennke/sharedwbstubs/webrev.00/ > > Ok to go? Yes! Any patch which removes that much code will get my vote. Andrew. From chf at redhat.com Fri Oct 21 16:30:28 2016 From: chf at redhat.com (chf at redhat.com) Date: Fri, 21 Oct 2016 16:30:28 +0000 Subject: hg: shenandoah/jdk9/hotspot: 2 new changesets Message-ID: <201610211630.u9LGUSgx029919@aojmv0008.oracle.com> Changeset: 0d090e6562d8 Author: chf Date: 2016-10-21 10:38 -0400 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/0d090e6562d8 Better logging ! src/share/vm/gc/shenandoah/shenandoahCollectorPolicy.cpp ! 
src/share/vm/gc/shenandoah/shenandoahConcurrentThread.cpp ! src/share/vm/gc/shenandoah/shenandoahMarkCompact.cpp ! src/share/vm/gc/shenandoah/vm_operations_shenandoah.cpp Changeset: be389cf2604a Author: chf Date: 2016-10-21 12:29 -0400 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/be389cf2604a Merge From roman at kennke.org Fri Oct 21 16:41:12 2016 From: roman at kennke.org (roman at kennke.org) Date: Fri, 21 Oct 2016 16:41:12 +0000 Subject: hg: shenandoah/jdk9/hotspot: 2 new changesets Message-ID: <201610211641.u9LGfCAa002322@aojmv0008.oracle.com> Changeset: 103124b62abc Author: rkennke Date: 2016-10-21 18:40 +0200 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/103124b62abc Implement/improve Shenandoah store checks. ! src/cpu/aarch64/vm/aarch64.ad ! src/cpu/aarch64/vm/c1_LIRAssembler_aarch64.cpp ! src/cpu/aarch64/vm/c1_MacroAssembler_aarch64.cpp ! src/cpu/aarch64/vm/interp_masm_aarch64.cpp ! src/cpu/aarch64/vm/macroAssembler_aarch64.cpp ! src/cpu/aarch64/vm/macroAssembler_aarch64.hpp ! src/cpu/aarch64/vm/sharedRuntime_aarch64.cpp ! src/cpu/aarch64/vm/stubRoutines_aarch64.hpp ! src/cpu/aarch64/vm/templateInterpreterGenerator_aarch64.cpp ! src/cpu/aarch64/vm/templateTable_aarch64.cpp ! src/cpu/x86/vm/assembler_x86.cpp ! src/cpu/x86/vm/assembler_x86.hpp ! src/cpu/x86/vm/c1_LIRAssembler_x86.cpp ! src/cpu/x86/vm/c1_MacroAssembler_x86.cpp ! src/cpu/x86/vm/interp_masm_x86.cpp ! src/cpu/x86/vm/macroAssembler_x86.cpp ! src/cpu/x86/vm/macroAssembler_x86.hpp ! src/cpu/x86/vm/sharedRuntime_x86_64.cpp ! src/cpu/x86/vm/templateInterpreterGenerator_x86.cpp ! src/cpu/x86/vm/templateTable_x86.cpp ! src/cpu/x86/vm/x86_64.ad ! src/share/vm/gc/shenandoah/shenandoahHeap.cpp ! src/share/vm/gc/shenandoah/shenandoahHeap.hpp ! 
src/share/vm/gc/shenandoah/shenandoah_globals.hpp Changeset: 462814876bed Author: rkennke Date: 2016-10-21 18:40 +0200 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/462814876bed Reuse C2 write barrier stub in interpreter and C1. ! src/cpu/aarch64/vm/aarch64.ad ! src/cpu/aarch64/vm/c1_LIRAssembler_aarch64.cpp ! src/cpu/aarch64/vm/c1_Runtime1_aarch64.cpp ! src/cpu/aarch64/vm/macroAssembler_aarch64.cpp ! src/cpu/aarch64/vm/macroAssembler_aarch64.hpp ! src/cpu/aarch64/vm/sharedRuntime_aarch64.cpp ! src/cpu/aarch64/vm/shenandoahBarrierSet_aarch64.cpp ! src/cpu/aarch64/vm/stubGenerator_aarch64.cpp ! src/cpu/aarch64/vm/stubRoutines_aarch64.hpp ! src/cpu/x86/vm/c1_LIRAssembler_x86.cpp ! src/cpu/x86/vm/c1_Runtime1_x86.cpp ! src/cpu/x86/vm/macroAssembler_x86.cpp ! src/cpu/x86/vm/macroAssembler_x86.hpp ! src/cpu/x86/vm/sharedRuntime_x86_64.cpp ! src/cpu/x86/vm/shenandoahBarrierSet_x86.cpp ! src/cpu/x86/vm/stubGenerator_x86_64.cpp ! src/cpu/x86/vm/stubRoutines_x86.hpp ! src/cpu/x86/vm/x86_64.ad ! src/share/vm/c1/c1_Runtime1.cpp ! src/share/vm/gc/shenandoah/shenandoahBarrierSet.cpp ! src/share/vm/gc/shenandoah/shenandoahBarrierSet.hpp ! src/share/vm/runtime/init.cpp ! src/share/vm/runtime/stubRoutines.cpp ! src/share/vm/runtime/stubRoutines.hpp From roman at kennke.org Tue Oct 25 15:12:00 2016 From: roman at kennke.org (roman at kennke.org) Date: Tue, 25 Oct 2016 15:12:00 +0000 Subject: hg: shenandoah/jdk9/hotspot: Insert load-load fence in obj-eq-barrier, to prevent brooks ptr loads from floating above comparison. Message-ID: <201610251512.u9PFC1bw020250@aojmv0008.oracle.com> Changeset: a5ecc810e4b6 Author: rkennke Date: 2016-10-25 17:11 +0200 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/a5ecc810e4b6 Insert load-load fence in obj-eq-barrier, to prevent brooks ptr loads from floating above comparison. ! 
src/share/vm/gc/shenandoah/shenandoahBarrierSet.cpp

From shade at redhat.com Tue Oct 25 15:22:04 2016
From: shade at redhat.com (Aleksey Shipilev)
Date: Tue, 25 Oct 2016 17:22:04 +0200
Subject: RFR (XS): Native Brooks ptr accesses can be constant-folded
Message-ID: <7ea518d4-cd26-071d-bdb9-9439f1658b92@redhat.com>

Hi,

When I did the bugfix for 16-byte object alignments, I had to make the byte/word offsets non-constant and agreeing with allocated sizes -- otherwise we were crashing with reading garbage. But now I see there is a little inconsistency in C2 that bumps the size based on offset, not the size. When we fix that, we can get our constant offsets back! Native disassembly for, say, ShenandoahBarrierSet::obj_equals looks much better now.

Webrev:
  http://cr.openjdk.java.net/~shade/shenandoah/brooks-cnst-fold/webrev.01/

Testing:
  hs_gc_shenandoah (fastdebug)

Thanks,
-Aleksey

From rkennke at redhat.com Tue Oct 25 15:51:42 2016
From: rkennke at redhat.com (Roman Kennke)
Date: Tue, 25 Oct 2016 17:51:42 +0200
Subject: RFR (XS): Native Brooks ptr accesses can be constant-folded
In-Reply-To: <7ea518d4-cd26-071d-bdb9-9439f1658b92@redhat.com>
References: <7ea518d4-cd26-071d-bdb9-9439f1658b92@redhat.com>
Message-ID: <1477410702.2548.35.camel@redhat.com>

On Tuesday, 25.10.2016 at 17:22 +0200, Aleksey Shipilev wrote:
> Hi,
>
> When I did the bugfix for 16-byte object alignments, I had to make
> the byte/word offsets non-constant and agreeing with allocated sizes --
> otherwise we were crashing with reading garbage. But now I see there
> is a little inconsistency in C2 that bumps the size based on offset, not
> the size. When we fix that, we can get our constant offsets back!
> Native disassembly for, say, ShenandoahBarrierSet::obj_equals looks much
> better now.
>
> Webrev:
>   http://cr.openjdk.java.net/~shade/shenandoah/brooks-cnst-fold/webrev.01/
>
> Testing:
>   hs_gc_shenandoah (fastdebug)

Ok!
Roman From ashipile at redhat.com Tue Oct 25 15:54:03 2016 From: ashipile at redhat.com (ashipile at redhat.com) Date: Tue, 25 Oct 2016 15:54:03 +0000 Subject: hg: shenandoah/jdk9/hotspot: Native Brooks ptr accesses can be constant-folded. Message-ID: <201610251554.u9PFs3UA029622@aojmv0008.oracle.com> Changeset: 6d234e9beadd Author: shade Date: 2016-10-25 17:53 +0200 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/6d234e9beadd Native Brooks ptr accesses can be constant-folded. ! src/share/vm/gc/shenandoah/brooksPointer.hpp ! src/share/vm/opto/macro.cpp From rwestrel at redhat.com Tue Oct 25 16:16:20 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Tue, 25 Oct 2016 18:16:20 +0200 Subject: Expand shenandoah write barrier as C2 IR Message-ID: http://cr.openjdk.java.net/~roland/shenandoah/wb2ir/webrev.00/ This expands the write barrier to c2 IR after most optimizations are over. It also takes care of finding a dominating null check and reshapes the graph to enable implicit null checks. 
This method:

    static void m9(A a) {
        a.f = 0x42;
    }

is compiled to (on x86):

  0x00007fa4bcb4bb4c: movzbl 0x640(%r15),%r11d
  0x00007fa4bcb4bb54: test   %r11d,%r11d
  0x00007fa4bcb4bb57: jne    0x00007fa4bcb4bb70
 ;; B2: # B6 B3 <- B1 Freq: 0.999

  0x00007fa4bcb4bb59: mov    -0x8(%rsi),%rax    ; implicit exception: dispatches to 0x00007fa4bcb4bb83
 ;; B3: # N1 <- B2 B5 Freq: 0.999999

  0x00007fa4bcb4bb5d: movl   $0x42,0x10(%rax)   ;*putfield f {reexecute=0 rethrow=0 return_oop=0}
                                                ; - TestShenandoahBarrier::m9 at 3 (line 66)

  0x00007fa4bcb4bb64: add    $0x10,%rsp
  0x00007fa4bcb4bb68: pop    %rbp
  0x00007fa4bcb4bb69: test   %eax,0x14487491(%rip)        # 0x00007fa4d0fd3000
                                                ;   {poll_return}
  0x00007fa4bcb4bb6f: retq
 ;; B4: # B6 B5 <- B1 Freq: 0.000999987

  0x00007fa4bcb4bb70: mov    -0x8(%rsi),%rdi    ; implicit exception: dispatches to 0x00007fa4bcb4bb83
 ;; B5: # B3 <- B4 Freq: 0.000999986

  0x00007fa4bcb4bb74: movabs $0x7fa4bc94a8e4,%r10
  0x00007fa4bcb4bb7e: callq  *%r10
  0x00007fa4bcb4bb81: jmp    0x00007fa4bcb4bb5d

and on aarch64:

  0x000003ff84b49cd4: ldrb      w11, [xthread,#1600]
  0x000003ff84b49cd8: dmb       ishld
  0x000003ff84b49cdc: cbnz      w11, 0x000003ff84b49d00
 ;; B2: # B6 B3 <- B1 Freq: 0.999

  0x000003ff84b49ce0: ldr       x0, [x1,#-8]    ; implicit exception: dispatches to 0x000003ff84b49d0c
 ;; B3: # N1 <- B2 B5 Freq: 0.999999

 ;; 0x42
  0x000003ff84b49ce4: mov       w10, #0x42                      // #66
  0x000003ff84b49ce8: str       w10, [x0,#16]   ;*synchronization entry
                                                ; - TestShenandoahBarrier::m9 at -1 (line 66)

  0x000003ff84b49cec: ldp       xfp, xlr, [sp,#16]
  0x000003ff84b49cf0: add       sp, sp, #0x20
  0x000003ff84b49cf4: adrp      xscratch1, 0x000003ff976d0000
                                                ;   {poll_return}
  0x000003ff84b49cf8: ldr       wzr, [xscratch1]  ;   {poll_return}
  0x000003ff84b49cfc: ret
 ;; B4: # B6 B5 <- B1 Freq: 0.000999987

  0x000003ff84b49d00: ldr       x0, [x1,#-8]    ; implicit exception: dispatches to 0x000003ff84b49d0c
 ;; B5: # B3 <- B4 Freq: 0.000999986

  0x000003ff84b49d04: bl        Stub::shenandoah_wb+4 0x000003ff849bf994
                                                ;   {runtime_call StubRoutines (3)}
  0x000003ff84b49d08: b 0x000003ff84b49ce4

This is off by default for now as
I'm seeing some hangs on aarch64. Roland. From rwestrel at redhat.com Wed Oct 26 11:24:06 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Wed, 26 Oct 2016 13:24:06 +0200 Subject: missing memory barrier in acmp with C2 Message-ID: http://cr.openjdk.java.net/~roland/shenandoah/membar-acmp/webrev.00/ The code generated for acmp is missing a memory barrier. Should it be a loadstore + loadload as in ShenandoahBarrierSet::asm_acmp_barrier() on aarch64 or simply a loadload? Roland. From rkennke at redhat.com Wed Oct 26 11:27:47 2016 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 26 Oct 2016 13:27:47 +0200 Subject: missing memory barrier in acmp with C2 In-Reply-To: References: Message-ID: <1477481267.2548.42.camel@redhat.com> Am Mittwoch, den 26.10.2016, 13:24 +0200 schrieb Roland Westrelin: > http://cr.openjdk.java.net/~roland/shenandoah/membar-acmp/webrev.00/ > > The code generated for acmp is missing a memory barrier. Great! > Should it be a loadstore + loadload as in > ShenandoahBarrierSet::asm_acmp_barrier() on aarch64 or simply a > loadload? I can come up with a reason for loadload, but not for loadstore, I think loadstore is not necessary there. I'd go for the less restrictive fence unless we come up with a good reason not to. Roman From rkennke at redhat.com Tue Oct 25 19:30:24 2016 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 25 Oct 2016 21:30:24 +0200 Subject: Expand shenandoah write barrier as C2 IR In-Reply-To: References: Message-ID: <1477423824.2548.39.camel@redhat.com> Hi Roland, This looks like an awesome improvement to me! I'm worried about all those not-obviously Shenandoah related changes in there. Do we have a chance to eventually get them accepted upstream? Or can we somehow contain them? I also see you sneaked in a loadload barrier, presumably for the acmp barrier, but I don't see it inserted there. ? 
Roman

On Tuesday, 25.10.2016 at 18:16 +0200, Roland Westrelin wrote:
> http://cr.openjdk.java.net/~roland/shenandoah/wb2ir/webrev.00/
>
> This expands the write barrier to c2 IR after most optimizations are
> over. It also takes care of finding a dominating null check and
> reshapes the graph to enable implicit null checks. This method:
>
>     static void m9(A a) {
>         a.f = 0x42;
>     }
>
> is compiled to (on x86):
>
>   0x00007fa4bcb4bb4c: movzbl 0x640(%r15),%r11d
>   0x00007fa4bcb4bb54: test   %r11d,%r11d
>   0x00007fa4bcb4bb57: jne    0x00007fa4bcb4bb70
>  ;; B2: # B6 B3 <- B1 Freq: 0.999
>
>   0x00007fa4bcb4bb59: mov    -0x8(%rsi),%rax    ; implicit exception: dispatches to 0x00007fa4bcb4bb83
>  ;; B3: # N1 <- B2 B5 Freq: 0.999999
>
>   0x00007fa4bcb4bb5d: movl   $0x42,0x10(%rax)   ;*putfield f {reexecute=0 rethrow=0 return_oop=0}
>                                                 ; - TestShenandoahBarrier::m9 at 3 (line 66)
>
>   0x00007fa4bcb4bb64: add    $0x10,%rsp
>   0x00007fa4bcb4bb68: pop    %rbp
>   0x00007fa4bcb4bb69: test   %eax,0x14487491(%rip)        # 0x00007fa4d0fd3000
>                                                 ;   {poll_return}
>   0x00007fa4bcb4bb6f: retq
>  ;; B4: # B6 B5 <- B1 Freq: 0.000999987
>
>   0x00007fa4bcb4bb70: mov    -0x8(%rsi),%rdi    ; implicit exception: dispatches to 0x00007fa4bcb4bb83
>  ;; B5: # B3 <- B4 Freq: 0.000999986
>
>   0x00007fa4bcb4bb74: movabs $0x7fa4bc94a8e4,%r10
>   0x00007fa4bcb4bb7e: callq  *%r10
>   0x00007fa4bcb4bb81: jmp    0x00007fa4bcb4bb5d
>
> and on aarch64:
>
>   0x000003ff84b49cd4: ldrb      w11, [xthread,#1600]
>   0x000003ff84b49cd8: dmb       ishld
>   0x000003ff84b49cdc: cbnz      w11, 0x000003ff84b49d00
>  ;; B2: # B6 B3 <- B1 Freq: 0.999
>
>   0x000003ff84b49ce0: ldr       x0, [x1,#-8]    ; implicit exception: dispatches to 0x000003ff84b49d0c
>  ;; B3: # N1 <- B2 B5 Freq: 0.999999
>
>  ;; 0x42
>   0x000003ff84b49ce4: mov       w10, #0x42                      // #66
>   0x000003ff84b49ce8: str       w10, [x0,#16]   ;*synchronization entry
>                                                 ; - TestShenandoahBarrier::m9 at -1 (line 66)
>
>   0x000003ff84b49cec: ldp       xfp, xlr, [sp,#16]
>   0x000003ff84b49cf0: add       sp, sp, #0x20
>   0x000003ff84b49cf4: adrp      xscratch1, 0x000003ff976d0000
>                                                 ;   {poll_return}
>   0x000003ff84b49cf8: ldr       wzr, [xscratch1]  ;   {poll_return}
>   0x000003ff84b49cfc: ret
>  ;; B4: # B6 B5 <- B1 Freq: 0.000999987
>
>   0x000003ff84b49d00: ldr       x0, [x1,#-8]    ; implicit exception: dispatches to 0x000003ff84b49d0c
>  ;; B5: # B3 <- B4 Freq: 0.000999986
>
>   0x000003ff84b49d04: bl        Stub::shenandoah_wb+4 0x000003ff849bf994
>                                                 ;   {runtime_call StubRoutines (3)}
>   0x000003ff84b49d08: b 0x000003ff84b49ce4
>
> This is off by default for now as I'm seeing some hangs on aarch64.
>
> Roland.

From rwestrel at redhat.com Wed Oct 26 12:13:00 2016
From: rwestrel at redhat.com (Roland Westrelin)
Date: Wed, 26 Oct 2016 14:13:00 +0200
Subject: Expand shenandoah write barrier as C2 IR
In-Reply-To: <1477423824.2548.39.camel@redhat.com>
References: <1477423824.2548.39.camel@redhat.com>
Message-ID:

Hi Roman,

Thanks for looking at this.

> I'm worried about all those not-obviously Shenandoah related changes in
> there. Do we have a chance to eventually get them accepted upstream? Or
> can we somehow contain them?

Most of the changes are related to Shenandoah. The change in block.hpp/lcm.cpp is something that's fixed upstream but we don't have yet. Adding a new membar is also something that touches many files.
So if you're asking if there's anything we can push upstream ahead of an integration of shenandoah, I don't see anything in that change: things are either shenandoah specific or already upstream or in the case of the new membar, something that has no use upstream yet so hard to justify. > I also see you sneaked in a loadload barrier, presumably for the acmp > barrier, but I don't see it inserted there. ? The write barrier on aarch64 has a load load barrier. So I need a load-load barrier node to properly expand the write barrier to IR. Roland. > > Roman > > Am Dienstag, den 25.10.2016, 18:16 +0200 schrieb Roland Westrelin: >> http://cr.openjdk.java.net/~roland/shenandoah/wb2ir/webrev.00/ >> >> This expands the write barrier to c2 IR after most optimizations are >> over. It also takes care of finding a dominating null check and >> reshapes >> the graph to enable implicit null checks. This method: >> >> static void m9(A a) { >> a.f = 0x42; >> } >> >> is compiled to (on x86): >> >> 0x00007fa4bcb4bb4c: movzbl 0x640(%r15),%r11d >> 0x00007fa4bcb4bb54: test %r11d,%r11d >> 0x00007fa4bcb4bb57: jne 0x00007fa4bcb4bb70 >> ;; B2: # B6 B3 <- B1 Freq: 0.999 >> >> 0x00007fa4bcb4bb59: mov -0x8(%rsi),%rax ; implicit exception: >> dispatches to 0x00007fa4bcb4bb83 >> ;; B3: # N1 <- B2 B5 Freq: 0.999999 >> >> 0x00007fa4bcb4bb5d: movl $0x42,0x10(%rax) ;*putfield f >> {reexecute=0 rethrow=0 return_oop=0} >> ; - >> TestShenandoahBarrier::m9 at 3 (line 66) >> >> 0x00007fa4bcb4bb64: add $0x10,%rsp >> 0x00007fa4bcb4bb68: pop %rbp >> 0x00007fa4bcb4bb69: test %eax,0x14487491(%rip) # >> 0x00007fa4d0fd3000 >> ; {poll_return} >> 0x00007fa4bcb4bb6f: retq >> ;; B4: # B6 B5 <- B1 Freq: 0.000999987 >> >> 0x00007fa4bcb4bb70: mov -0x8(%rsi),%rdi ; implicit exception: >> dispatches to 0x00007fa4bcb4bb83 >> ;; B5: # B3 <- B4 Freq: 0.000999986 >> >> 0x00007fa4bcb4bb74: movabs $0x7fa4bc94a8e4,%r10 >> 0x00007fa4bcb4bb7e: callq *%r10 >> 0x00007fa4bcb4bb81: jmp 0x00007fa4bcb4bb5d >> >> and on aarch64: >> >> 
0x000003ff84b49cd4: ldrb w11, [xthread,#1600] >> 0x000003ff84b49cd8: dmb ishld >> 0x000003ff84b49cdc: cbnz w11, 0x000003ff84b49d00 >> ;; B2: # B6 B3 <- B1 Freq: 0.999 >> >> 0x000003ff84b49ce0: ldr x0, [x1,#-8] ; implicit exception: >> dispatches to 0x000003ff84b49d0c >> ;; B3: # N1 <- B2 B5 Freq: 0.999999 >> >> ;; 0x42 >> 0x000003ff84b49ce4: mov w10, #0x42 // >> #66 >> 0x000003ff84b49ce8: str w10, [x0,#16] ;*synchronization >> entry >> ; - >> TestShenandoahBarrier::m9 at -1 (line 66) >> >> 0x000003ff84b49cec: ldp xfp, xlr, [sp,#16] >> 0x000003ff84b49cf0: add sp, sp, #0x20 >> 0x000003ff84b49cf4: adrp xscratch1, 0x000003ff976d0000 >> ; {poll_return} >> 0x000003ff84b49cf8: ldr wzr, [xscratch1] ; {poll_return} >> 0x000003ff84b49cfc: ret >> ;; B4: # B6 B5 <- B1 Freq: 0.000999987 >> >> 0x000003ff84b49d00: ldr x0, [x1,#-8] ; implicit exception: >> dispatches to 0x000003ff84b49d0c >> ;; B5: # B3 <- B4 Freq: 0.000999986 >> >> 0x000003ff84b49d04: bl Stub::shenandoah_wb+4 >> 0x000003ff849bf994 >> ; {runtime_call >> StubRoutines (3)} >> 0x000003ff84b49d08: b 0x000003ff84b49ce4 >> >> This is off by default for now as I'm seeing some hangs on aarch64. >> >> Roland. From aph at redhat.com Wed Oct 26 12:35:37 2016 From: aph at redhat.com (Andrew Haley) Date: Wed, 26 Oct 2016 13:35:37 +0100 Subject: missing memory barrier in acmp with C2 In-Reply-To: <1477481267.2548.42.camel@redhat.com> References: <1477481267.2548.42.camel@redhat.com> Message-ID: On 26/10/16 12:27, Roman Kennke wrote: > Am Mittwoch, den 26.10.2016, 13:24 +0200 schrieb Roland Westrelin: >> http://cr.openjdk.java.net/~roland/shenandoah/membar-acmp/webrev.00/ >> >> The code generated for acmp is missing a memory barrier. > > Great! > >> Should it be a loadstore + loadload as in >> ShenandoahBarrierSet::asm_acmp_barrier() on aarch64 or simply a >> loadload? > > I can come up with a reason for loadload, but not for loadstore, I > think loadstore is not necessary there. 
I'd go for the less restrictive
> fence unless we come up with a good reason not to.

The general rule is that you can get away with loadload fences if you really know what you are doing, but it is exceedingly subtle.

Imagine this. We have two variables, a boolean x_init and an oop x.

Thread 1:

  x_init.store_release(true);

Thread 2:

  if (x_init.load_acquire())
    x.blah = y

If you replace the load acquire with a loadload fence, the store of x.blah can become visible before the initialization of x. In this particular case you are probably OK, but in general it's not worth the risk of using naked loadload or loadstore fences IMO. This is an optimization that can break things and is very unlikely to result in a significant performance improvement. Default to correct!

http://www.hboehm.info/c++mm/no_write_fences.html

Andrew.

From vitalyd at gmail.com Wed Oct 26 14:02:27 2016
From: vitalyd at gmail.com (Vitaly Davidovich)
Date: Wed, 26 Oct 2016 10:02:27 -0400
Subject: missing memory barrier in acmp with C2
In-Reply-To:
References: <1477481267.2548.42.camel@redhat.com>
Message-ID:

On Wednesday, October 26, 2016, Andrew Haley wrote:
> On 26/10/16 12:27, Roman Kennke wrote:
> > On Wednesday, 26.10.2016 at 13:24 +0200, Roland Westrelin wrote:
> >> http://cr.openjdk.java.net/~roland/shenandoah/membar-acmp/webrev.00/
> >>
> >> The code generated for acmp is missing a memory barrier.
> >
> > Great!
> >
> >> Should it be a loadstore + loadload as in
> >> ShenandoahBarrierSet::asm_acmp_barrier() on aarch64 or simply a
> >> loadload?
> >
> > I can come up with a reason for loadload, but not for loadstore, I
> > think loadstore is not necessary there. I'd go for the less restrictive
> > fence unless we come up with a good reason not to.
>
> The general rule is that you can get away with loadload fences if you
> really know what you are doing, but it is exceedingly subtle.
>
> Imagine this. We have two variables, a boolean x_init and an oop
> x.
> > Thread 1: > > x_init.store_release(true); > > Thread 2: > if (x_init.load_aquire()) > x.blah = y > > If you replace the load acquire with a loadload fence, the store of > x.blah can become visible before the initialization of x. x.blah requires a load of x (which cannot reorder with loadload) and it's data dependent; unless you take something like Alpha into account, but that's unsupported anyway. > In this > particular case you are probably OK, but in general it's not worth the > risk of using naked loadload or laodstore fences IMO. This is an > optimization that can break things and is very unlikely to result in a > significant performance improvement. Default to correct! > > http://www.hboehm.info/c++mm/no_write_fences.html > > Andrew. > -- Sent from my phone From aph at redhat.com Wed Oct 26 15:21:57 2016 From: aph at redhat.com (Andrew Haley) Date: Wed, 26 Oct 2016 16:21:57 +0100 Subject: missing memory barrier in acmp with C2 In-Reply-To: References: <1477481267.2548.42.camel@redhat.com> Message-ID: On 26/10/16 15:02, Vitaly Davidovich wrote: > On Wednesday, October 26, 2016, Andrew Haley wrote: > >> On 26/10/16 12:27, Roman Kennke wrote: >>> Am Mittwoch, den 26.10.2016, 13:24 +0200 schrieb Roland Westrelin: >>>> http://cr.openjdk.java.net/~roland/shenandoah/membar-acmp/webrev.00/ >>>> >>>> The code generated for acmp is missing a memory barrier. >>> >>> Great! >>> >>>> Should it be a loadstore + loadload as in >>>> ShenandoahBarrierSet::asm_acmp_barrier() on aarch64 or simply a >>>> loadload? >>> >>> I can come up with a reason for loadload, but not for loadstore, I >>> think loadstore is not necessary there. I'd go for the less restrictive >>> fence unless we come up with a good reason not to. >> >> The general rule is that you can get away with loadload fences if you >> really know what you are doing, but it is exceedingly subtle. >> >> Imagine this. We have two variables, a boolean x_init and an oop >> x. 
>> >> Thread 1: >> >> x_init.store_release(true); >> >> Thread 2: >> if (x_init.load_aquire()) >> x.blah = y >> >> If you replace the load acquire with a loadload fence, the store of >> x.blah can become visible before the initialization of x. > > x.blah requires a load of x (which cannot reorder with loadload) x is just a local, and it's in a register. Where would you even load it from? > and it's data dependent; unless you take something like Alpha into > account, but that's unsupported anyway. Please explain. And, while you're at it, please explain why Hans is wrong, or why my interpretation is wrong. Andrew. From vitalyd at gmail.com Wed Oct 26 15:31:05 2016 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Wed, 26 Oct 2016 11:31:05 -0400 Subject: missing memory barrier in acmp with C2 In-Reply-To: References: <1477481267.2548.42.camel@redhat.com> Message-ID: On Wednesday, October 26, 2016, Andrew Haley wrote: > On 26/10/16 15:02, Vitaly Davidovich wrote: > > On Wednesday, October 26, 2016, Andrew Haley > wrote: > > > >> On 26/10/16 12:27, Roman Kennke wrote: > >>> Am Mittwoch, den 26.10.2016, 13:24 +0200 schrieb Roland Westrelin: > >>>> http://cr.openjdk.java.net/~roland/shenandoah/membar-acmp/webrev.00/ > >>>> > >>>> The code generated for acmp is missing a memory barrier. > >>> > >>> Great! > >>> > >>>> Should it be a loadstore + loadload as in > >>>> ShenandoahBarrierSet::asm_acmp_barrier() on aarch64 or simply a > >>>> loadload? > >>> > >>> I can come up with a reason for loadload, but not for loadstore, I > >>> think loadstore is not necessary there. I'd go for the less restrictive > >>> fence unless we come up with a good reason not to. > >> > >> The general rule is that you can get away with loadload fences if you > >> really know what you are doing, but it is exceedingly subtle. > >> > >> Imagine this. We have two variables, a boolean x_init and an oop > >> x. 
> >> > >> Thread 1: > >> > >> x_init.store_release(true); > >> > >> Thread 2: > >> if (x_init.load_aquire()) > >> x.blah = y > >> > >> If you replace the load acquire with a loadload fence, the store of > >> x.blah can become visible before the initialization of x. > > > > x.blah requires a load of x (which cannot reorder with loadload) > > x is just a local, and it's in a register. Where would you even load > it from? I don't follow - x is an oop, and x.blah is at (addr of x) + (offset of blah field). You need to load addr of x to figure out dest addr of the store. As written in your snippet, the load of x is after the loadload. So what am I missing? > > > and it's data dependent; unless you take something like Alpha into > > account, but that's unsupported anyway. > > Please explain. And, while you're at it, please explain why Hans is > wrong, or why my interpretation is wrong. As mentioned above, to get x.blah address you need a load of x (or have the address available already) - that's data dependent load. AFAIK, Alpha is the commonly referenced arch that doesn't respect such data dependent loads - it can speculate on the address of x and proceed to compute x.blah ahead of resolving x itself. I'm not saying anyone is wrong, just trying to identify why you think your example is valid on archs other than Alpha and the like. > > Andrew. > > > -- Sent from my phone From shade at redhat.com Wed Oct 26 16:19:37 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 26 Oct 2016 18:19:37 +0200 Subject: missing memory barrier in acmp with C2 In-Reply-To: References: <1477481267.2548.42.camel@redhat.com> Message-ID: <2472e8fd-a021-fb71-0860-52f0bd8fff9f@redhat.com> On 10/26/2016 02:35 PM, Andrew Haley wrote: > On 26/10/16 12:27, Roman Kennke wrote: >> Am Mittwoch, den 26.10.2016, 13:24 +0200 schrieb Roland Westrelin: >>> http://cr.openjdk.java.net/~roland/shenandoah/membar-acmp/webrev.00/ >>> >>> The code generated for acmp is missing a memory barrier. >> >> Great! 
>> >>> Should it be a loadstore + loadload as in >>> ShenandoahBarrierSet::asm_acmp_barrier() on aarch64 or simply a >>> loadload? >> >> I can come up with a reason for loadload, but not for loadstore, I >> think loadstore is not necessary there. I'd go for the less restrictive >> fence unless we come up with a good reason not to. > > The general rule is that you can get away with loadload fences if you > really know what you are doing, but it is exceedingly subtle. > > Imagine this. We have two variables, a boolean x_init and an oop > x. > > Thread 1: > > x_init.store_release(true); > > Thread 2: > if (x_init.load_aquire()) > x.blah = y > > If you replace the load acquire with a loadload fence, the store of > x.blah can become visible before the initialization of x. In this > particular case you are probably OK, but in general it's not worth the > risk of using naked loadload or laodstore fences IMO. This is an > optimization that can break things and is very unlikely to result in a > significant performance improvement. Default to correct! I understand the sentiment, and have nothing against it. However, in the particular case of acmp barrier, loadload seems enough, because we are indeed only ordering the loads. No potential stores are of our interest here, and Hans' example talks about stores. As far as I understood Hans' argument over the years, it was basically about "think about what is happening around too", and we don't care about that for acmp. 
Thanks, -Aleksey From aph at redhat.com Wed Oct 26 18:15:43 2016 From: aph at redhat.com (Andrew Haley) Date: Wed, 26 Oct 2016 19:15:43 +0100 Subject: missing memory barrier in acmp with C2 In-Reply-To: References: <1477481267.2548.42.camel@redhat.com> Message-ID: On 26/10/16 16:31, Vitaly Davidovich wrote: > On Wednesday, October 26, 2016, Andrew Haley wrote: > >> On 26/10/16 15:02, Vitaly Davidovich wrote: >>> On Wednesday, October 26, 2016, Andrew Haley > > wrote: >>> >>>> On 26/10/16 12:27, Roman Kennke wrote: >>>>> Am Mittwoch, den 26.10.2016, 13:24 +0200 schrieb Roland Westrelin: >>>>>> http://cr.openjdk.java.net/~roland/shenandoah/membar-acmp/webrev.00/ >>>>>> >>>>>> The code generated for acmp is missing a memory barrier. >>>>> >>>>> Great! >>>>> >>>>>> Should it be a loadstore + loadload as in >>>>>> ShenandoahBarrierSet::asm_acmp_barrier() on aarch64 or simply a >>>>>> loadload? >>>>> >>>>> I can come up with a reason for loadload, but not for loadstore, I >>>>> think loadstore is not necessary there. I'd go for the less restrictive >>>>> fence unless we come up with a good reason not to. >>>> >>>> The general rule is that you can get away with loadload fences if you >>>> really know what you are doing, but it is exceedingly subtle. >>>> >>>> Imagine this. We have two variables, a boolean x_init and an oop >>>> x. >>>> >>>> Thread 1: >>>> >>>> x_init.store_release(true); >>>> >>>> Thread 2: >>>> if (x_init.load_aquire()) >>>> x.blah = y >>>> >>>> If you replace the load acquire with a loadload fence, the store of >>>> x.blah can become visible before the initialization of x. >>> >>> x.blah requires a load of x (which cannot reorder with loadload) >> >> x is just a local, and it's in a register. Where would you even load >> it from? > > I don't follow - x is an oop, and x.blah is at (addr of x) + (offset of > blah field). You need to load addr of x Where do you suppose the addr of x is being loaded from? The addr of x is in a register already. 
We don't need to read it from a field. It may be an argument, for example. > to figure out dest addr of the store. As written in your snippet, > the load of x is after the loadload. It's not. >>> and it's data dependent; unless you take something like Alpha into >>> account, but that's unsupported anyway. >> >> Please explain. And, while you're at it, please explain why Hans is >> wrong, or why my interpretation is wrong. > > As mentioned above, to get x.blah address you need a load of x (or have the > address available already) - that's data dependent load. Dependent on what? Andrew. From aph at redhat.com Wed Oct 26 18:15:51 2016 From: aph at redhat.com (Andrew Haley) Date: Wed, 26 Oct 2016 19:15:51 +0100 Subject: missing memory barrier in acmp with C2 In-Reply-To: <2472e8fd-a021-fb71-0860-52f0bd8fff9f@redhat.com> References: <1477481267.2548.42.camel@redhat.com> <2472e8fd-a021-fb71-0860-52f0bd8fff9f@redhat.com> Message-ID: <18ba88b4-513f-318c-d828-cf4c04127342@redhat.com> On 26/10/16 17:19, Aleksey Shipilev wrote: > I understand the sentiment, and have nothing against it. > > However, in the particular case of acmp barrier, loadload seems > enough, because we are indeed only ordering the loads. No potential > stores are of our interest here, and Hans' example talks about > stores. As far as I understood Hans' argument over the years, it was > basically about "think about what is happening around too", and we > don't care about that for acmp. Yes, I get that, in this particular case, it's OK. No argument. But the additional cost of loadload|loadstore is close to zero (and may actually be zero) on many architectures. Except in some extraordinary cases we don't need to apply such finicky reasoning. And if we do, we may get it wrong, and we only have to get it wrong once to suffer some major pain. So let's not go there. Andrew. 
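
The x_init example discussed in this thread can be sketched in standard C++. This is only an illustration under stated assumptions, not HotSpot code: `Obj`, `thread1_init`, `thread2_acquire_load`, `thread2_acquire_fence` and `run_once` are hypothetical names, and `std::atomic` stands in for the store_release/load_acquire pseudocode. ISO C++ offers no loadload-only fence, so the weaker variant under debate cannot be written portably; the acquire fence in variant B roughly corresponds to loadload|loadstore.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-in for the oop in the example; not a HotSpot type.
struct Obj {
  int payload = 0;  // written by the "initialization" in thread 1
  int blah = 0;     // written by thread 2 once it has seen x_init
};

Obj x;                             // both threads already know x's address
std::atomic<bool> x_init{false};

// Thread 1: initialize x, then publish that fact with a release store.
void thread1_init() {
  x.payload = 42;
  x.blah = -1;
  x_init.store(true, std::memory_order_release);
}

// Thread 2, variant A: acquire load. The flag load is ordered before later
// loads AND later stores, so the store to x.blah cannot move above it.
bool thread2_acquire_load(int y) {
  if (x_init.load(std::memory_order_acquire)) {
    x.blah = y;
    return true;
  }
  return false;
}

// Thread 2, variant B: relaxed load followed by an acquire fence, which
// gives the same ordering guarantee for the store that follows.
bool thread2_acquire_fence(int y) {
  if (x_init.load(std::memory_order_relaxed)) {
    std::atomic_thread_fence(std::memory_order_acquire);
    x.blah = y;
    return true;
  }
  return false;
}

// Runs both threads once; thread 2 retries until it observes the flag.
// Returns the final value of x.blah.
int run_once(bool use_fence, int y) {
  x = Obj{};
  x_init.store(false);
  std::thread t1(thread1_init);
  std::thread t2([=] {
    while (!(use_fence ? thread2_acquire_fence(y) : thread2_acquire_load(y))) {
      std::this_thread::yield();
    }
  });
  t1.join();
  t2.join();
  return x.blah;
}
```

With either variant, thread 1's initialization of x happens before thread 2's store, so the value written by thread 2 is the one that survives. A loadload-only fence would order the flag load against later loads but not against the later store, which is the gap the example points at.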
From vitalyd at gmail.com Wed Oct 26 18:31:19 2016 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Wed, 26 Oct 2016 14:31:19 -0400 Subject: missing memory barrier in acmp with C2 In-Reply-To: References: <1477481267.2548.42.camel@redhat.com> Message-ID: On Wed, Oct 26, 2016 at 2:15 PM, Andrew Haley wrote: > On 26/10/16 16:31, Vitaly Davidovich wrote: > > On Wednesday, October 26, 2016, Andrew Haley wrote: > > > >> On 26/10/16 15:02, Vitaly Davidovich wrote: > >>> On Wednesday, October 26, 2016, Andrew Haley >> > wrote: > >>> > >>>> On 26/10/16 12:27, Roman Kennke wrote: > >>>>> Am Mittwoch, den 26.10.2016, 13:24 +0200 schrieb Roland Westrelin: > >>>>>> http://cr.openjdk.java.net/~roland/shenandoah/membar-acmp/ > webrev.00/ > >>>>>> > >>>>>> The code generated for acmp is missing a memory barrier. > >>>>> > >>>>> Great! > >>>>> > >>>>>> Should it be a loadstore + loadload as in > >>>>>> ShenandoahBarrierSet::asm_acmp_barrier() on aarch64 or simply a > >>>>>> loadload? > >>>>> > >>>>> I can come up with a reason for loadload, but not for loadstore, I > >>>>> think loadstore is not necessary there. I'd go for the less > restrictive > >>>>> fence unless we come up with a good reason not to. > >>>> > >>>> The general rule is that you can get away with loadload fences if you > >>>> really know what you are doing, but it is exceedingly subtle. > >>>> > >>>> Imagine this. We have two variables, a boolean x_init and an oop > >>>> x. > >>>> > >>>> Thread 1: > >>>> > >>>> x_init.store_release(true); > >>>> > >>>> Thread 2: > >>>> if (x_init.load_aquire()) > >>>> x.blah = y > >>>> > >>>> If you replace the load acquire with a loadload fence, the store of > >>>> x.blah can become visible before the initialization of x. > >>> > >>> x.blah requires a load of x (which cannot reorder with loadload) > >> > >> x is just a local, and it's in a register. Where would you even load > >> it from? 
> > > > I don't follow - x is an oop, and x.blah is at (addr of x) + (offset of > > blah field). You need to load addr of x > > Where do you suppose the addr of x is being loaded from? > > The addr of x is in a register already. We don't need to read it > from a field. It may be an argument, for example. > > to figure out dest addr of the store. As written in your snippet, > > the load of x is after the loadload. > > It's not. > I interpreted your code as pseudocode, but you seem to be implying some other context. So you're saying you constructed x in Thread1, store_release'd the initialization, passed the address of x to Thread2 through memory, Thread2 read it from a field somewhere into a register, and now the snippet you're showing is when 'x' is already in a register? > > >>> and it's data dependent; unless you take something like Alpha into > >>> account, but that's unsupported anyway. > >> > >> Please explain. And, while you're at it, please explain why Hans is > >> wrong, or why my interpretation is wrong. > > > > As mentioned above, to get x.blah address you need a load of x (or have > the > > address available already) - that's data dependent load. > > Dependent on what? > See above - your code looks like pseudocode, and x.blah seemed like shorthand/pseudocode for loading x and writing a new value to .blah > > Andrew. > > From vitalyd at gmail.com Wed Oct 26 18:48:55 2016 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Wed, 26 Oct 2016 14:48:55 -0400 Subject: missing memory barrier in acmp with C2 In-Reply-To: References: <1477481267.2548.42.camel@redhat.com> Message-ID: So I see you took Hans' example, but his example has Thread 1 also reading some state during construction, which can be modified by Thread 2 concurrently. That is a problem, but your example was a bit too slimmed down to illustrate that. 
On Wednesday, October 26, 2016, Vitaly Davidovich wrote: > > > On Wed, Oct 26, 2016 at 2:15 PM, Andrew Haley > wrote: > >> On 26/10/16 16:31, Vitaly Davidovich wrote: >> > On Wednesday, October 26, 2016, Andrew Haley > > wrote: >> > >> >> On 26/10/16 15:02, Vitaly Davidovich wrote: >> >>> On Wednesday, October 26, 2016, Andrew Haley > >> >> > wrote: >> >>> >> >>>> On 26/10/16 12:27, Roman Kennke wrote: >> >>>>> Am Mittwoch, den 26.10.2016, 13:24 +0200 schrieb Roland Westrelin: >> >>>>>> http://cr.openjdk.java.net/~roland/shenandoah/membar-acmp/we >> brev.00/ >> >>>>>> >> >>>>>> The code generated for acmp is missing a memory barrier. >> >>>>> >> >>>>> Great! >> >>>>> >> >>>>>> Should it be a loadstore + loadload as in >> >>>>>> ShenandoahBarrierSet::asm_acmp_barrier() on aarch64 or simply a >> >>>>>> loadload? >> >>>>> >> >>>>> I can come up with a reason for loadload, but not for loadstore, I >> >>>>> think loadstore is not necessary there. I'd go for the less >> restrictive >> >>>>> fence unless we come up with a good reason not to. >> >>>> >> >>>> The general rule is that you can get away with loadload fences if you >> >>>> really know what you are doing, but it is exceedingly subtle. >> >>>> >> >>>> Imagine this. We have two variables, a boolean x_init and an oop >> >>>> x. >> >>>> >> >>>> Thread 1: >> >>>> >> >>>> x_init.store_release(true); >> >>>> >> >>>> Thread 2: >> >>>> if (x_init.load_aquire()) >> >>>> x.blah = y >> >>>> >> >>>> If you replace the load acquire with a loadload fence, the store of >> >>>> x.blah can become visible before the initialization of x. >> >>> >> >>> x.blah requires a load of x (which cannot reorder with loadload) >> >> >> >> x is just a local, and it's in a register. Where would you even load >> >> it from? >> > >> > I don't follow - x is an oop, and x.blah is at (addr of x) + (offset of >> > blah field). You need to load addr of x >> >> Where do you suppose the addr of x is being loaded from? 
>> >> The addr of x is in a register already. We don't need to read it >> from a field. It may be an argument, for example. > > >> > to figure out dest addr of the store. As written in your snippet, >> > the load of x is after the loadload. >> >> It's not. >> > I interpreted your code as pseudocode, but you seem to be implying some > other context. So you're saying you constructed x in Thread1, > store_release'd the initialization, passed the address of x to Thread2 > through memory, Thread2 read it from a field somewhere into a register, and > now the snippet you're showing is when 'x' is already in a register? > >> >> >>> and it's data dependent; unless you take something like Alpha into >> >>> account, but that's unsupported anyway. >> >> >> >> Please explain. And, while you're at it, please explain why Hans is >> >> wrong, or why my interpretation is wrong. >> > >> > As mentioned above, to get x.blah address you need a load of x (or have >> the >> > address available already) - that's data dependent load. >> >> Dependent on what? >> > See above - your code looks like pseudocode, and x.blah seemed like > shorthand/pseudocode for loading x and writing a new value to .blah > >> >> Andrew. >> >> > -- Sent from my phone From shade at redhat.com Wed Oct 26 18:56:04 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 26 Oct 2016 20:56:04 +0200 Subject: missing memory barrier in acmp with C2 In-Reply-To: <18ba88b4-513f-318c-d828-cf4c04127342@redhat.com> References: <1477481267.2548.42.camel@redhat.com> <2472e8fd-a021-fb71-0860-52f0bd8fff9f@redhat.com> <18ba88b4-513f-318c-d828-cf4c04127342@redhat.com> Message-ID: On 10/26/2016 08:15 PM, Andrew Haley wrote: > On 26/10/16 17:19, Aleksey Shipilev wrote: >> I understand the sentiment, and have nothing against it. >> >> However, in the particular case of acmp barrier, loadload seems >> enough, because we are indeed only ordering the loads. 
No potential >> stores are of our interest here, and Hans' example talks about >> stores. As far as I understood Hans' argument over the years, it was >> basically about "think about what is happening around too", and we >> don't care about that for acmp. > > Yes, I get that, in this particular case, it's OK. No argument. But > the additional cost of loadload|loadstore is close to zero (and may > actually be zero) on many architectures. Except in some extraordinary > cases we don't need to apply such finicky reasoning. And if we do, we > may get it wrong, and we only have to get it wrong once to suffer some > major pain. So let's not go there. +1 Thanks, -Aleksey From aph at redhat.com Thu Oct 27 09:17:00 2016 From: aph at redhat.com (Andrew Haley) Date: Thu, 27 Oct 2016 10:17:00 +0100 Subject: missing memory barrier in acmp with C2 In-Reply-To: References: <1477481267.2548.42.camel@redhat.com> Message-ID: On 26/10/16 19:48, Vitaly Davidovich wrote: > So I see you took Hans' example, but his example has Thread 1 also > reading some state during construction, which can be modified by > Thread 2 concurrently. That is a problem, but your example was a > bit too slimmed down to illustrate that. Again, construction is a red herring. All the example says is that the store to x.blah can become visible before the initialization of x. This is not "initialization" in the sense of executing in a constructor, which is safe before the address of the object is published. Andrew. 
From aph at redhat.com Thu Oct 27 09:17:47 2016 From: aph at redhat.com (Andrew Haley) Date: Thu, 27 Oct 2016 10:17:47 +0100 Subject: missing memory barrier in acmp with C2 In-Reply-To: References: <1477481267.2548.42.camel@redhat.com> Message-ID: <97199a8d-0079-5db7-236b-84feb528e1f4@redhat.com> On 26/10/16 19:31, Vitaly Davidovich wrote: > On Wed, Oct 26, 2016 at 2:15 PM, Andrew Haley wrote: > >> On 26/10/16 16:31, Vitaly Davidovich wrote: >>> On Wednesday, October 26, 2016, Andrew Haley wrote: >> >> The addr of x is in a register already. We don't need to read it >> from a field. It may be an argument, for example. > >>> to figure out dest addr of the store. As written in your snippet, >>> the load of x is after the loadload. >> >> It's not. >> > I interpreted your code as pseudocode, but you seem to be implying some > other context. So you're saying you constructed Let's go back to the original code: Thread 1: x_init.store_release(true); Thread 2: if (x_init.load_aquire()) x.blah = y No construction there. > x in Thread1, > store_release'd the initialization, passed the address of x to Thread2 > through memory, No. Thread2 already has the address of x. x was constructed a long time ago. This initialization is some other code which happens later. > See above - your code looks like pseudocode, and x.blah seemed like > shorthand/pseudocode for loading x and writing a new value to .blah It's not. Andrew. From shade at redhat.com Fri Oct 28 09:18:01 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 28 Oct 2016 11:18:01 +0200 Subject: RFR (S): Chunked array processing should first push the continuation Message-ID: <7322b16e-5588-7a68-4903-226bcf66c776@redhat.com> Hi, This is one of those "LOL" performance bugs. If you profile the ArrayFragger test [1] that eventually scans a large array, you will notice that TaskQueues are the hotspots with lots of stealing. 
If you wonder why, this is why: in chunked processing we *first* process our chunk, and then let others know we have more work (of course, next thing you know, pulling that work under their feet).

The solution is to first fork out the continuation, and then process our own chunk in solitude:

http://cr.openjdk.java.net/~shade/shenandoah/concmark-cont-first/webrev.01/

Improves the stress test in question by very much:

Benchmark          (ldsMB)  (objSize)  Mode  Cnt    Score    Error  Units

# Before
ArrayFragger.test      500        100  avgt  100  903.449 ± 23.912  ns/op

# After
ArrayFragger.test      500        100  avgt  100  581.849 ± 53.288  ns/op

Testing: hotspot_gc_shenandoah

Thanks,
-Aleksey

[1] http://cr.openjdk.java.net/~shade/shenandoah/shenandoah-gc-bench/src/main/java/org/openjdk/shenandoah/fragger/ArrayFragger.java

From rkennke at redhat.com Fri Oct 28 10:02:11 2016
From: rkennke at redhat.com (Roman Kennke)
Date: Fri, 28 Oct 2016 12:02:11 +0200
Subject: RFR (S): Chunked array processing should first push the continuation
In-Reply-To: <7322b16e-5588-7a68-4903-226bcf66c776@redhat.com>
References: <7322b16e-5588-7a68-4903-226bcf66c776@redhat.com>
Message-ID: <1477648931.2548.51.camel@redhat.com>

Awesome! Please push!

BTW: Can we collect all those little benchmarks into a proper suite?

Roman

On Friday, 28.10.2016, at 11:18 +0200, Aleksey Shipilev wrote:
> Hi,
>
> This is one of those "LOL" performance bugs. If you profile the
> ArrayFragger test [1] that eventually scans a large array, you will
> notice that TaskQueues are the hotspots with lots of stealing. If you
> wonder why, this is why: in chunked processing we *first* process our
> chunk, and then let others know we have more work (of course, next
> thing you know, pulling that work under their feet).
>
> The solution is to first fork out the continuation, and then process
> our own chunk in solitude:
>
> http://cr.openjdk.java.net/~shade/shenandoah/concmark-cont-first/webrev.01/
>
> Improves the stress test in question by very much:
>
> Benchmark          (ldsMB)  (objSize)  Mode  Cnt    Score    Error  Units
>
> # Before
> ArrayFragger.test      500        100  avgt  100  903.449 ± 23.912  ns/op
>
> # After
> ArrayFragger.test      500        100  avgt  100  581.849 ± 53.288  ns/op
>
>
> Testing: hotspot_gc_shenandoah
>
> Thanks,
> -Aleksey
>
> [1] http://cr.openjdk.java.net/~shade/shenandoah/shenandoah-gc-bench/src/main/java/org/openjdk/shenandoah/fragger/ArrayFragger.java
>

From ashipile at redhat.com Fri Oct 28 10:05:31 2016
From: ashipile at redhat.com (ashipile at redhat.com)
Date: Fri, 28 Oct 2016 10:05:31 +0000
Subject: hg: shenandoah/jdk9/hotspot: Chunked array processing should first push the continuation.
Message-ID: <201610281005.u9SA5VS9026288@aojmv0008.oracle.com>

Changeset: ffaa8941fdce
Author: shade
Date: 2016-10-28 12:04 +0200
URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/ffaa8941fdce

Chunked array processing should first push the continuation.

! src/share/vm/gc/shenandoah/shenandoahConcurrentMark.inline.hpp

From rwestrel at redhat.com Fri Oct 28 11:35:06 2016
From: rwestrel at redhat.com (Roland Westrelin)
Date: Fri, 28 Oct 2016 13:35:06 +0200
Subject: Condition code not set after CAS on aarch64
Message-ID:

http://cr.openjdk.java.net/~roland/shenandoah/aarch64-cas-cc/webrev.00/

Instructions that set the condition code got dropped...

Roland.
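
For the "Chunked array processing should first push the continuation" changeset (ffaa8941fdce) earlier in this digest, the ordering can be sketched as follows. This is a single-threaded illustration only, not the code in shenandoahConcurrentMark.inline.hpp: `ChunkTask`, `process_task` and `scan_array` are hypothetical names, a `std::deque` stands in for the GC task queue, and summing elements stands in for marking through the array.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical task: scan array elements in the half-open range [from, to).
struct ChunkTask {
  std::size_t from;
  std::size_t to;
};

// Processes one task. If the range is larger than `chunk`, the remainder
// (the continuation) is pushed to the queue BEFORE the local chunk is
// scanned, so that in a multi-threaded setting idle workers could take it
// while this worker is busy with its own chunk.
long process_task(std::deque<ChunkTask>& queue, const std::vector<int>& array,
                  ChunkTask task, std::size_t chunk) {
  std::size_t end = task.to;
  if (task.to - task.from > chunk) {
    end = task.from + chunk;
    queue.push_back({end, task.to});  // continuation first
  }
  long sum = 0;
  for (std::size_t i = task.from; i < end; i++) {
    sum += array[i];  // stands in for marking through array[i]
  }
  return sum;
}

// Single-threaded driver: seeds the queue with the whole array and drains it.
// Assumes chunk > 0.
long scan_array(const std::vector<int>& array, std::size_t chunk) {
  assert(chunk > 0);
  std::deque<ChunkTask> queue;
  queue.push_back({0, array.size()});
  long total = 0;
  while (!queue.empty()) {
    ChunkTask t = queue.front();
    queue.pop_front();
    total += process_task(queue, array, t, chunk);
  }
  return total;
}
```

The result is the same whichever order is used; what changes in a real work-stealing setup is when the remaining work becomes visible to other workers.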
From aph at redhat.com Fri Oct 28 11:45:05 2016 From: aph at redhat.com (Andrew Haley) Date: Fri, 28 Oct 2016 12:45:05 +0100 Subject: Condition code not set after CAS on aarch64 In-Reply-To: References: Message-ID: On 28/10/16 12:35, Roland Westrelin wrote: > > http://cr.openjdk.java.net/~roland/shenandoah/aarch64-cas-cc/webrev.00/ > > Instructions that set the condition code got dropped... I'd scratch my head over how that happened, but life's too short... Andrew. From rwestrel at redhat.com Fri Oct 28 11:45:35 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 28 Oct 2016 13:45:35 +0200 Subject: missing memory barrier in acmp with C2 In-Reply-To: <18ba88b4-513f-318c-d828-cf4c04127342@redhat.com> References: <1477481267.2548.42.camel@redhat.com> <2472e8fd-a021-fb71-0860-52f0bd8fff9f@redhat.com> <18ba88b4-513f-318c-d828-cf4c04127342@redhat.com> Message-ID: > Yes, I get that, in this particular case, it's OK. No argument. But > the additional cost of loadload|loadstore is close to zero (and may > actually be zero) on many architectures. Except in some extraordinary > cases we don't need to apply such finicky reasoning. And if we do, we > may get it wrong, and we only have to get it wrong once to suffer some > major pain. So let's not go there. What about the memory barrier in the shenandoah write barrier: Address evacuation_in_progress = Address(rthread, in_bytes(JavaThread::evacuation_in_progress_offset())); ldrb(rscratch1, evacuation_in_progress); membar(Assembler::LoadLoad); // The read-barrier. ldr(dst, Address(dst, BrooksPointer::byte_offset())); Should it be loadload|loadstore? The reason I'm asking, is that to expand the barrier as c2 IR, I had to add support for a loadload memory barrier in the IR. If MemBarAcquire can be used instead then I can remove all the shared changes for the loadload memory barrier from my patch. Roland. 
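
The fast path Roland quotes (flag load, barrier, forwarding-pointer load) can be mirrored in portable C++ with an acquire fence in place of the loadload barrier. This is a sketch under assumptions, not the macro-assembler or C2 code: `Obj`, `Resolved` and `write_barrier_fast_path` are hypothetical names, a global atomic stands in for the per-thread evacuation_in_progress byte, and the slow path (the stub call) is only reported, not implemented.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical object with a Brooks-style forwarding pointer that initially
// points to the object itself.
struct Obj {
  std::atomic<Obj*> forwardee;
  int field;
  explicit Obj(int f) : forwardee(this), field(f) {}
};

// Stands in for the per-thread evacuation_in_progress byte.
std::atomic<bool> evacuation_in_progress{false};

struct Resolved {
  Obj* obj;              // object after following the forwarding pointer
  bool needs_slow_path;  // true if the flag was set when it was read
};

Resolved write_barrier_fast_path(Obj* obj) {
  // Corresponds to: ldrb(rscratch1, evacuation_in_progress)
  bool evac = evacuation_in_progress.load(std::memory_order_relaxed);
  // Acquire fence, i.e. roughly loadload|loadstore instead of loadload only.
  std::atomic_thread_fence(std::memory_order_acquire);
  // The read barrier: ldr(dst, Address(dst, BrooksPointer::byte_offset()))
  Obj* fwd = obj->forwardee.load(std::memory_order_relaxed);
  return Resolved{fwd, evac};
}
```

Using the stronger fence keeps the ordering of the two loads and additionally orders the flag load before any later store, which is the conservative choice argued for in this thread.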
From aph at redhat.com Fri Oct 28 11:48:44 2016 From: aph at redhat.com (Andrew Haley) Date: Fri, 28 Oct 2016 12:48:44 +0100 Subject: missing memory barrier in acmp with C2 In-Reply-To: References: <1477481267.2548.42.camel@redhat.com> <2472e8fd-a021-fb71-0860-52f0bd8fff9f@redhat.com> <18ba88b4-513f-318c-d828-cf4c04127342@redhat.com> Message-ID: On 28/10/16 12:45, Roland Westrelin wrote: > The reason I'm asking, is that to expand the barrier as c2 IR, I had to > add support for a loadload memory barrier in the IR. If MemBarAcquire > can be used instead then I can remove all the shared changes for the > loadload memory barrier from my patch. Excellent! Make it so. Andrew. From rwestrel at redhat.com Fri Oct 28 11:54:39 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 28 Oct 2016 13:54:39 +0200 Subject: Expand shenandoah write barrier as C2 IR In-Reply-To: References: Message-ID: > This is off by default for now as I'm seeing some hangs on aarch64. The hangs on aarch64 are unrelated. I will push this enabled by default. Roland. From rwestrel at redhat.com Fri Oct 28 12:01:57 2016 From: rwestrel at redhat.com (rwestrel at redhat.com) Date: Fri, 28 Oct 2016 12:01:57 +0000 Subject: hg: shenandoah/jdk9/hotspot: Condition code not set after CAS on aarch64 Message-ID: <201610281201.u9SC1vF2023562@aojmv0008.oracle.com> Changeset: 40e322c38a82 Author: roland Date: 2016-10-28 11:30 +0200 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/40e322c38a82 Condition code not set after CAS on aarch64 ! 
src/cpu/aarch64/vm/aarch64.ad From rwestrel at redhat.com Fri Oct 28 12:48:23 2016 From: rwestrel at redhat.com (rwestrel at redhat.com) Date: Fri, 28 Oct 2016 12:48:23 +0000 Subject: hg: shenandoah/jdk9/hotspot: missing memory barrier in acmp Message-ID: <201610281248.u9SCmNeM003541@aojmv0008.oracle.com> Changeset: b329d9c36925 Author: roland Date: 2016-10-28 14:27 +0200 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/b329d9c36925 missing memory barrier in acmp ! src/share/vm/opto/graphKit.cpp ! src/share/vm/opto/library_call.cpp ! src/share/vm/opto/subnode.cpp From rkennke at redhat.com Fri Oct 28 12:49:57 2016 From: rkennke at redhat.com (Roman Kennke) Date: Fri, 28 Oct 2016 14:49:57 +0200 Subject: Expand shenandoah write barrier as C2 IR In-Reply-To: References: Message-ID: <1477658997.2548.52.camel@redhat.com> Am Freitag, den 28.10.2016, 13:54 +0200 schrieb Roland Westrelin: > > > > This is off by default for now as I'm seeing some hangs on aarch64. > > The hangs on aarch64 are unrelated. I will push this enabled by > default. Might be related to the synchronizer bug I'm looking at right now. Roman From rwestrel at redhat.com Fri Oct 28 12:51:18 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 28 Oct 2016 14:51:18 +0200 Subject: Expand shenandoah write barrier as C2 IR In-Reply-To: <1477658997.2548.52.camel@redhat.com> References: <1477658997.2548.52.camel@redhat.com> Message-ID: <0b53459b-08b9-532c-52df-fb7d252c7d5b@redhat.com> >> The hangs on aarch64 are unrelated. I will push this enabled by >> default. > > Might be related to the synchronizer bug I'm looking at right now. CAS was missing the instructions to set the condition flag. I pushed a fix. It was aarch64 specific. Roland. 
From rwestrel at redhat.com Fri Oct 28 13:24:44 2016 From: rwestrel at redhat.com (rwestrel at redhat.com) Date: Fri, 28 Oct 2016 13:24:44 +0000 Subject: hg: shenandoah/jdk9/hotspot: Expand shenandoah write barrier as C2 IR Message-ID: <201610281324.u9SDOiYe011513@aojmv0008.oracle.com> Changeset: 978d7601df14 Author: roland Date: 2016-10-28 14:35 +0200 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/978d7601df14 Expand shenandoah write barrier as C2 IR ! src/cpu/aarch64/vm/macroAssembler_aarch64.cpp ! src/cpu/aarch64/vm/macroAssembler_aarch64.hpp ! src/cpu/aarch64/vm/stubGenerator_aarch64.cpp ! src/cpu/x86/vm/stubGenerator_x86_64.cpp ! src/share/vm/gc/shenandoah/shenandoah_globals.hpp ! src/share/vm/opto/block.hpp ! src/share/vm/opto/compile.cpp ! src/share/vm/opto/compile.hpp ! src/share/vm/opto/graphKit.cpp ! src/share/vm/opto/lcm.cpp ! src/share/vm/opto/loopnode.cpp ! src/share/vm/opto/loopnode.hpp ! src/share/vm/opto/matcher.cpp ! src/share/vm/opto/memnode.hpp ! src/share/vm/opto/node.cpp ! src/share/vm/opto/node.hpp ! src/share/vm/opto/phaseX.cpp ! src/share/vm/opto/runtime.cpp ! src/share/vm/opto/runtime.hpp ! src/share/vm/opto/shenandoahSupport.cpp ! src/share/vm/opto/shenandoahSupport.hpp ! src/share/vm/runtime/stubRoutines.cpp ! src/share/vm/runtime/stubRoutines.hpp From aph at redhat.com Fri Oct 28 14:12:37 2016 From: aph at redhat.com (Andrew Haley) Date: Fri, 28 Oct 2016 15:12:37 +0100 Subject: Expand shenandoah write barrier as C2 IR In-Reply-To: References: Message-ID: <782ac60b-b152-de59-2574-4475bbc7cb23@redhat.com> On 25/10/16 17:16, Roland Westrelin wrote: > > http://cr.openjdk.java.net/~roland/shenandoah/wb2ir/webrev.00/ > > This expands the write barrier to c2 IR after most optimizations are > over. It also takes care of finding a dominating null check and reshapes > the graph to enable implicit null checks. 
I'm slightly baffled by all the + if (!c_abi) { + __ mov(rscratch1, obj); + __ pop_call_clobbered_registers(); + __ mov(obj, rscratch1); + } else { + __ pop_call_clobbered_fp_registers(); + } stuff. What has the C ABI to do with this? And why does the C ABI only save floating-point registers? Andrew. From rwestrel at redhat.com Fri Oct 28 14:23:02 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 28 Oct 2016 16:23:02 +0200 Subject: Expand shenandoah write barrier as C2 IR In-Reply-To: <782ac60b-b152-de59-2574-4475bbc7cb23@redhat.com> References: <782ac60b-b152-de59-2574-4475bbc7cb23@redhat.com> Message-ID: > I'm slightly baffled by all the > > > + if (!c_abi) { > + __ mov(rscratch1, obj); > + __ pop_call_clobbered_registers(); > + __ mov(obj, rscratch1); > + } else { > + __ pop_call_clobbered_fp_registers(); > + } > > stuff. What has the C ABI to do with this? And why does the C ABI > only save floating-point registers? The stub generated with c_abi = true is called directly from compiled code (i.e. C2 CallLeafNoFP node) and the stub should stick to the C ABI: no need to save r1-r4 or other caller-saved registers. It's a CallLeafNoFP and not a CallLeaf so the caller doesn't take care of saving fp registers (the rationale being that it's an uncommon code path). Anyway, the plan is to move more of the stub into the compiled method itself so the stub will see further changes. Roland. From shade at redhat.com Fri Oct 28 16:46:38 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 28 Oct 2016 18:46:38 +0200 Subject: RFR (S): Proper divide-n-conquer in array handling Message-ID: <55961f21-610d-3e6d-72e6-24ab29d3cd1e@redhat.com> Hi, It bugs me our GC workers are following the odd HS tradition of forking out the piece of work, and pushing the rest of the work back on queue. This effectively serializes the balancing work: with N workers, there should be N consecutive cutout-submit-steal interactions.
Worse, this only gets amortized if the chunk work takes more time than queue stealing (which is a dangerous assumption to make). In the fork/join world, we know how to seed the work queues more efficiently: you divide in half, and let others steal the larger half. Then you keep dividing until you hit the leaf task, which you can then execute. The upside of this is that stealers always get "larger" chunks of work, which they break down for themselves. This is what this patch does: http://cr.openjdk.java.net/~shade/shenandoah/concmark-proper-dnc/webrev.01/ (also did a few renames and comments) There are no clear performance improvements on our workloads, because mark costs are completely dominated by cache misses on the "marked" bitset. But there is a chicken-and-egg problem: optimizing for faster marks would trim down the chunk work size, and run into task imbalance because of this issue. So I prefer to nail this down before it hits us. Testing: hs_gc_shenandoah, SPECjvm2008, some microbenchmarks Thanks, -Aleksey From rkennke at redhat.com Fri Oct 28 17:39:25 2016 From: rkennke at redhat.com (Roman Kennke) Date: Fri, 28 Oct 2016 19:39:25 +0200 Subject: RFR (S): Proper divide-n-conquer in array handling In-Reply-To: <55961f21-610d-3e6d-72e6-24ab29d3cd1e@redhat.com> References: <55961f21-610d-3e6d-72e6-24ab29d3cd1e@redhat.com> Message-ID: <1477676365.2548.53.camel@redhat.com> I like it. Please go. Roman On Friday, 28.10.2016, at 18:46 +0200, Aleksey Shipilev wrote: > Hi, > > It bugs me our GC workers are following the odd HS tradition of forking > out the piece of work, and pushing the rest of the work back on queue. > This effectively serializes the balancing work: with N workers, there > should be N consecutive cutout-submit-steal interactions. Worse, this > only gets amortized if the chunk work takes more time than queue > stealing (which is a dangerous assumption to make).
> > In the fork/join world, we know how to seed the work queues more > efficiently: you divide in half, and let others steal the larger half. > Then you keep dividing until you hit the leaf task, which you can then > execute. The upside of this is that stealers always get "larger" chunks > of work, which they break down for themselves. > > This is what this patch does: > http://cr.openjdk.java.net/~shade/shenandoah/concmark-proper-dnc/webrev.01/ > (also did a few renames and comments) > > There are no clear performance improvements on our workloads, because > mark costs are completely dominated by cache misses on the "marked" bitset. > But there is a chicken-and-egg problem: optimizing for faster marks > would trim down the chunk work size, and run into task imbalance > because of this issue. So I prefer to nail this down before it hits us. > > Testing: hs_gc_shenandoah, SPECjvm2008, some microbenchmarks > > Thanks, > -Aleksey > > From shade at redhat.com Fri Oct 28 17:52:07 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 28 Oct 2016 19:52:07 +0200 Subject: Heads-up: ReferenceCAS fails after C2 WB expansion Message-ID: The usual fails: CONF=linux-x86_64-normal-server-fastdebug LOG=info make test images TEST="hotspot_gc_shenandoah" # Internal Error (/home/shade/trunks/shenandoah-jdk9/hotspot/src/share/vm/opto/machnode.cpp:380), pid=22550, tid=22579 # assert(tp->base() != Type::AnyPtr) failed: not a bare pointer Since it is Friday night, let's turn ShenandoahWriteBarrierToIR off until Roland has a chance to look into it next week.
$ hg diff diff -r 978d7601df14 src/share/vm/gc/shenandoah/shenandoah_globals.hpp - experimental(bool, ShenandoahWriteBarrierToIR, true, \ + experimental(bool, ShenandoahWriteBarrierToIR, false, \ Thanks, -Aleksey From rkennke at redhat.com Fri Oct 28 19:24:23 2016 From: rkennke at redhat.com (Roman Kennke) Date: Fri, 28 Oct 2016 21:24:23 +0200 Subject: RFR: Fix store check Message-ID: <1477682663.2548.55.camel@redhat.com> There were two mistakes in the store check assembly: - It was comparing against first_region_bottom instead of last_region_end - It was using a 32-bit immediate compare op. For 64 bit, the operand needs to be copied into a tmp register first, and then compare against that. http://cr.openjdk.java.net/~rkennke/fixstorecheck/webrev.00/ Ok? Roman From rkennke at redhat.com Fri Oct 28 19:30:01 2016 From: rkennke at redhat.com (Roman Kennke) Date: Fri, 28 Oct 2016 21:30:01 +0200 Subject: RFR: Little interpreter optimization in monitorenter/exit Message-ID: <1477683001.2548.58.camel@redhat.com> In monitorenter/exit there are two little loops that search for existing monitors for a given oop. We used to do read-barriers there to ensure we don't get false negatives. However, there's an invariant that we maintain: in BasicObjectLock, we only ever store to-space oops, and across GC pauses, we ensure those oops remain to-space. I took out the read-barrier and added store-checks to verify the invariant. http://cr.openjdk.java.net/~rkennke/interpr-monitors/webrev.01/ Ok? 
Roman From shade at redhat.com Fri Oct 28 20:06:53 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 28 Oct 2016 22:06:53 +0200 Subject: RFR: Fix store check In-Reply-To: <1477682663.2548.55.camel@redhat.com> References: <1477682663.2548.55.camel@redhat.com> Message-ID: <15378fe4-232c-15f1-14f1-0f694b0349bf@redhat.com> On 10/28/2016 09:24 PM, Roman Kennke wrote: > There were two mistakes in the store check assembly: > - It was comparing against first_region_bottom instead of > last_region_end > - It was using a 32-bit immediate compare op. For 64 bit, the operand > needs to be copied into a tmp register first, and then compare against > that. > > http://cr.openjdk.java.net/~rkennke/fixstorecheck/webrev.00/ OK. -Aleksey From shade at redhat.com Fri Oct 28 20:12:54 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 28 Oct 2016 22:12:54 +0200 Subject: RFR: Little interpreter optimization in monitorenter/exit In-Reply-To: <1477683001.2548.58.camel@redhat.com> References: <1477683001.2548.58.camel@redhat.com> Message-ID: <42998437-f50a-5eed-f215-fb86ee91fd5a@redhat.com> On 10/28/2016 09:30 PM, Roman Kennke wrote: > http://cr.openjdk.java.net/~rkennke/interpr-monitors/webrev.01/ OK, makes sense. Thanks, -Aleksey From roman at kennke.org Fri Oct 28 20:19:04 2016 From: roman at kennke.org (roman at kennke.org) Date: Fri, 28 Oct 2016 20:19:04 +0000 Subject: hg: shenandoah/jdk9/hotspot: 2 new changesets Message-ID: <201610282019.u9SKJ45x003981@aojmv0008.oracle.com> Changeset: fcf893b2f7a7 Author: rkennke Date: 2016-10-28 21:22 +0200 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/fcf893b2f7a7 Fix in-heap check in x86 store check. ! src/cpu/x86/vm/macroAssembler_x86.cpp ! src/cpu/x86/vm/macroAssembler_x86.hpp Changeset: 393dd35cec61 Author: rkennke Date: 2016-10-28 21:27 +0200 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/393dd35cec61 Replace read-barrier in monitor-search-loop with store checks. ! 
src/cpu/x86/vm/templateTable_x86.cpp From rkennke at redhat.com Fri Oct 28 20:23:58 2016 From: rkennke at redhat.com (Roman Kennke) Date: Fri, 28 Oct 2016 22:23:58 +0200 Subject: RFR: Fix jmp-if-possible macro assembly Message-ID: <1477686238.2548.61.camel@redhat.com> I introduced jmpb_if_possible and jccb_if_possible to macroAssembler_x86 in order to deal with increased offsets when ShenandoahStoreCheck is enabled. This patch fixes it so that it also works correctly with release build and under correct conditions. http://cr.openjdk.java.net/~rkennke/fix-jmp-if-possible/webrev.00/ Ok? From rkennke at redhat.com Fri Oct 28 20:30:57 2016 From: rkennke at redhat.com (Roman Kennke) Date: Fri, 28 Oct 2016 22:30:57 +0200 Subject: RFR: Fix oop comparisons Message-ID: <1477686657.2548.63.camel@redhat.com> While debugging, I found 2 little issues. In JVM_StopThread we need to add an == barrier for comparing thread objects, and in stackwalk.cpp we have a redundant comparison. http://cr.openjdk.java.net/~rkennke/fixoopcomp/webrev.00/ Ok? Roman From roman at kennke.org Fri Oct 28 20:34:44 2016 From: roman at kennke.org (roman at kennke.org) Date: Fri, 28 Oct 2016 20:34:44 +0000 Subject: hg: shenandoah/jdk9/hotspot: Rename enter/exit_critical to pin_object in CollectedHeap. Message-ID: <201610282034.u9SKYikb007713@aojmv0008.oracle.com> Changeset: 3f6ab1f1bca0 Author: rkennke Date: 2016-10-28 22:34 +0200 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/3f6ab1f1bca0 Rename enter/exit_critical to pin_object in CollectedHeap. ! src/share/vm/gc/shared/collectedHeap.cpp ! src/share/vm/gc/shared/collectedHeap.hpp ! src/share/vm/gc/shenandoah/shenandoahHeap.cpp ! src/share/vm/gc/shenandoah/shenandoahHeap.hpp ! src/share/vm/gc/shenandoah/shenandoahHeapRegion.cpp ! src/share/vm/gc/shenandoah/shenandoahHeapRegion.hpp ! 
src/share/vm/prims/jni.cpp From shade at redhat.com Fri Oct 28 20:35:25 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 28 Oct 2016 22:35:25 +0200 Subject: RFR: Fix jmp-if-possible macro assembly In-Reply-To: <1477686238.2548.61.camel@redhat.com> References: <1477686238.2548.61.camel@redhat.com> Message-ID: <82ca1f38-e61d-0656-890a-dced4bcf0151@redhat.com> On 10/28/2016 10:23 PM, Roman Kennke wrote: > I introduced jmpb_if_possible and jccb_if_possible to > macroAssembler_x86 in order to deal with increased offsets when > ShenandoahStoreCheck is enabled. This patch fixes it so that it also > works correctly with release build and under correct conditions. > > http://cr.openjdk.java.net/~rkennke/fix-jmp-if-possible/webrev.00/ Not very sure about Shenandoah-specific things in Assembler. Can't we rewrite the potentially non-jmpb cases with jmp(Label, bool = maybe_short)? Or, maybe we should do: void Assembler::jmpb_unless(Label& L, bool test) { if (test) { jmp(L, /* maybe_short = */ true); } else { jmpb(L); } } ...then usages would be specific: jmpb_unless(DONE_LABEL, ShenandoahStoreCheck); Thanks, -Aleksey From shade at redhat.com Fri Oct 28 20:38:09 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 28 Oct 2016 22:38:09 +0200 Subject: RFR: Fix oop comparisons In-Reply-To: <1477686657.2548.63.camel@redhat.com> References: <1477686657.2548.63.camel@redhat.com> Message-ID: On 10/28/2016 10:30 PM, Roman Kennke wrote: > While debugging, I found 2 little issues. In JVM_StopThread we need to > add an == barrier for comparing thread objects, and in stackwalk.cpp we > have a redundant comparison. > > http://cr.openjdk.java.net/~rkennke/fixoopcomp/webrev.00/ Looks OK. Can go straight into upstream?
Thanks, -Aleksey From rkennke at redhat.com Fri Oct 28 20:38:59 2016 From: rkennke at redhat.com (Roman Kennke) Date: Fri, 28 Oct 2016 22:38:59 +0200 Subject: RFR: Fix oop comparisons In-Reply-To: References: <1477686657.2548.63.camel@redhat.com> Message-ID: <1477687139.2548.64.camel@redhat.com> On Friday, 28.10.2016, at 22:38 +0200, Aleksey Shipilev wrote: > On 10/28/2016 10:30 PM, Roman Kennke wrote: > > > > While debugging, I found 2 little issues. In JVM_StopThread we need to > > add an == barrier for comparing thread objects, and in stackwalk.cpp we > > have a redundant comparison. > > > > http://cr.openjdk.java.net/~rkennke/fixoopcomp/webrev.00/ > > Looks OK. Can go straight into upstream? Nope. Because oopDesc::unsafe_equals() is introduced by us. Roman From rkennke at redhat.com Fri Oct 28 20:43:52 2016 From: rkennke at redhat.com (Roman Kennke) Date: Fri, 28 Oct 2016 22:43:52 +0200 Subject: RFR: Make ReferenceCAS test more aggressive Message-ID: <1477687432.2548.67.camel@redhat.com> This does the following to the ReferenceCAS testcase: - Make it use -XX:ShenandoahGCHeuristics=aggressive - Use new string objects rather than constants - Verify that CAS does not destroy cmp values (yes, I have seen this...) http://cr.openjdk.java.net/~rkennke/refcas/webrev.00/ Ok? Roman From roman at kennke.org Fri Oct 28 20:45:26 2016 From: roman at kennke.org (roman at kennke.org) Date: Fri, 28 Oct 2016 20:45:26 +0000 Subject: hg: shenandoah/jdk9/hotspot: Fix oop comparison in JVM_StopThread() and remove redundant comparison in stackwalk. Message-ID: <201610282045.u9SKjRID010515@aojmv0008.oracle.com> Changeset: c3df4ac5091c Author: rkennke Date: 2016-10-28 22:45 +0200 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/c3df4ac5091c Fix oop comparison in JVM_StopThread() and remove redundant comparison in stackwalk. ! src/share/vm/prims/jvm.cpp !
src/share/vm/prims/stackwalk.cpp From rkennke at redhat.com Fri Oct 28 20:48:17 2016 From: rkennke at redhat.com (Roman Kennke) Date: Fri, 28 Oct 2016 22:48:17 +0200 Subject: Heads-up: ReferenceCAS fails after C2 WB expansion In-Reply-To: References: Message-ID: <1477687697.2548.68.camel@redhat.com> On Friday, 28.10.2016, at 19:52 +0200, Aleksey Shipilev wrote: > The usual fails: > > CONF=linux-x86_64-normal-server-fastdebug LOG=info make test images > TEST="hotspot_gc_shenandoah" > > # Internal Error > (/home/shade/trunks/shenandoah-jdk9/hotspot/src/share/vm/opto/machnode.cpp:380), > pid=22550, tid=22579 > # assert(tp->base() != Type::AnyPtr) failed: not a bare pointer > > Since it is Friday night, let's turn ShenandoahWriteBarrierToIR off > until Roland has a chance to look into it next week. > > $ hg diff > diff -r 978d7601df14 > src/share/vm/gc/shenandoah/shenandoah_globals.hpp > - experimental(bool, ShenandoahWriteBarrierToIR, true, \ > + experimental(bool, ShenandoahWriteBarrierToIR, false, \ Oh yes, please go ahead and do that! Roman From ashipile at redhat.com Fri Oct 28 20:50:14 2016 From: ashipile at redhat.com (ashipile at redhat.com) Date: Fri, 28 Oct 2016 20:50:14 +0000 Subject: hg: shenandoah/jdk9/hotspot: Turn off C2 WB expansion until the ReferenceCAS test bug is fixed. Message-ID: <201610282050.u9SKoE9R011736@aojmv0008.oracle.com> Changeset: 6f23d8404b9e Author: shade Date: 2016-10-28 22:49 +0200 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/6f23d8404b9e Turn off C2 WB expansion until the ReferenceCAS test bug is fixed. !
src/share/vm/gc/shenandoah/shenandoah_globals.hpp From shade at redhat.com Fri Oct 28 20:51:38 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 28 Oct 2016 22:51:38 +0200 Subject: Heads-up: ReferenceCAS fails after C2 WB expansion In-Reply-To: <1477687697.2548.68.camel@redhat.com> References: <1477687697.2548.68.camel@redhat.com> Message-ID: On 10/28/2016 10:48 PM, Roman Kennke wrote: > On Friday, 28.10.2016, at 19:52 +0200, Aleksey Shipilev wrote: >> The usual fails: >> >> CONF=linux-x86_64-normal-server-fastdebug LOG=info make test images >> TEST="hotspot_gc_shenandoah" >> >> # Internal Error >> (/home/shade/trunks/shenandoah-jdk9/hotspot/src/share/vm/opto/machnode.cpp:380), >> pid=22550, tid=22579 >> # assert(tp->base() != Type::AnyPtr) failed: not a bare pointer >> >> Since it is Friday night, let's turn ShenandoahWriteBarrierToIR off >> until Roland has a chance to look into it next week. >> >> $ hg diff >> diff -r 978d7601df14 >> src/share/vm/gc/shenandoah/shenandoah_globals.hpp >> - experimental(bool, ShenandoahWriteBarrierToIR, true, \ >> + experimental(bool, ShenandoahWriteBarrierToIR, false, \ > > Oh yes, please go ahead and do that! Pushed.
Roland, please fix the failure and re-enable the flag :) Thanks, -Aleksey From shade at redhat.com Fri Oct 28 20:54:31 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 28 Oct 2016 22:54:31 +0200 Subject: Heads-up: ReferenceCAS fails after C2 WB expansion In-Reply-To: References: <1477687697.2548.68.camel@redhat.com> Message-ID: On 10/28/2016 10:51 PM, Aleksey Shipilev wrote: > On 10/28/2016 10:48 PM, Roman Kennke wrote: >> On Friday, 28.10.2016, at 19:52 +0200, Aleksey Shipilev wrote: >>> The usual fails: >>> >>> CONF=linux-x86_64-normal-server-fastdebug LOG=info make test images >>> TEST="hotspot_gc_shenandoah" >>> >>> # Internal Error >>> (/home/shade/trunks/shenandoah-jdk9/hotspot/src/share/vm/opto/machnode.cpp:380), >>> pid=22550, tid=22579 >>> # assert(tp->base() != Type::AnyPtr) failed: not a bare pointer >>> >>> Since it is Friday night, let's turn ShenandoahWriteBarrierToIR off >>> until Roland has a chance to look into it next week. >>> >>> $ hg diff >>> diff -r 978d7601df14 >>> src/share/vm/gc/shenandoah/shenandoah_globals.hpp >>> - experimental(bool, ShenandoahWriteBarrierToIR, true, \ >>> + experimental(bool, ShenandoahWriteBarrierToIR, false, \ >> >> Oh yes, please go ahead and do that! > > Pushed. Roland, please fix the failure and re-enable the flag :) Ouch. This starts failing *again* even with the flag turned off, if you apply Roman's stronger test (sigh): http://cr.openjdk.java.net/~rkennke/refcas/webrev.00/ Thanks, -Aleksey From ashipile at redhat.com Fri Oct 28 21:12:41 2016 From: ashipile at redhat.com (ashipile at redhat.com) Date: Fri, 28 Oct 2016 21:12:41 +0000 Subject: hg: shenandoah/jdk9/hotspot: Efficient divide-and-conquer for chunked array handling during mark.
Message-ID: <201610282112.u9SLCgoB018291@aojmv0008.oracle.com> Changeset: f66cf3bcac8e Author: shade Date: 2016-10-28 23:11 +0200 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/f66cf3bcac8e Efficient divide-and-conquer for chunked array handling during mark. ! src/share/vm/gc/shared/taskqueue.hpp ! src/share/vm/gc/shenandoah/shenandoahConcurrentMark.hpp ! src/share/vm/gc/shenandoah/shenandoahConcurrentMark.inline.hpp From rwestrel at redhat.com Mon Oct 31 08:05:51 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Mon, 31 Oct 2016 09:05:51 +0100 Subject: Heads-up: ReferenceCAS fails after C2 WB expansion In-Reply-To: References: <1477687697.2548.68.camel@redhat.com> Message-ID: It looks like a duplicate of JDK-8167298. Can you verify the bug goes away with the fix (not yet upstream): http://cr.openjdk.java.net/~roland/8167298/webrev.01/ Roland. From shade at redhat.com Mon Oct 31 10:53:55 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 31 Oct 2016 11:53:55 +0100 Subject: Heads-up: ReferenceCAS fails after C2 WB expansion In-Reply-To: References: <1477687697.2548.68.camel@redhat.com> Message-ID: On 10/31/2016 09:05 AM, Roland Westrelin wrote: > It looks like a duplicate of JDK-8167298. Can you verify the bug goes > away with the fix (not yet upstream): > > http://cr.openjdk.java.net/~roland/8167298/webrev.01/ Yes, it passes! 
Let's cherry-pick the change, along with Roman's aggressive test, and re-enable the C2 WB expansion: http://cr.openjdk.java.net/~shade/shenandoah/reference-cas-cherrypick/webrev.01/ -Aleksey From rwestrel at redhat.com Mon Oct 31 15:16:17 2016 From: rwestrel at redhat.com (rwestrel at redhat.com) Date: Mon, 31 Oct 2016 15:16:17 +0000 Subject: hg: shenandoah/jdk9/hotspot: assert in PhaseIdealLoop::shenandoah_fix_memory_uses() is too strong Message-ID: <201610311516.u9VFGH1u019338@aojmv0008.oracle.com> Changeset: b3776237524a Author: roland Date: 2016-10-31 16:16 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/b3776237524a assert in PhaseIdealLoop::shenandoah_fix_memory_uses() is too strong ! src/share/vm/opto/shenandoahSupport.cpp From rkennke at redhat.com Mon Oct 31 15:21:03 2016 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 31 Oct 2016 16:21:03 +0100 Subject: Heads-up: ReferenceCAS fails after C2 WB expansion In-Reply-To: References: <1477687697.2548.68.camel@redhat.com> Message-ID: <1477927263.4215.1.camel@redhat.com> On Monday, 31.10.2016, at 11:53 +0100, Aleksey Shipilev wrote: > On 10/31/2016 09:05 AM, Roland Westrelin wrote: > > > > It looks like a duplicate of JDK-8167298. Can you verify the bug goes > > away with the fix (not yet upstream): > > > > http://cr.openjdk.java.net/~roland/8167298/webrev.01/ > > Yes, it passes! > > Let's cherry-pick the change, along with Roman's aggressive test, > and re-enable the C2 WB expansion: > > http://cr.openjdk.java.net/~shade/shenandoah/reference-cas-cherrypick/webrev.01/ Yes please go ahead! Roman From ashipile at redhat.com Mon Oct 31 15:28:39 2016 From: ashipile at redhat.com (ashipile at redhat.com) Date: Mon, 31 Oct 2016 15:28:39 +0000 Subject: hg: shenandoah/jdk9/hotspot: Cherry-pick JDK-8167298 change, modify ReferenceCAS to be more aggressive, turn back C2 opto.
Message-ID: <201610311528.u9VFSds2022293@aojmv0008.oracle.com> Changeset: 658bdee8b6ab Author: shade Date: 2016-10-31 16:26 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/658bdee8b6ab Cherry-pick JDK-8167298 change, modify ReferenceCAS to be more aggressive, turn back C2 opto. ! src/share/vm/adlc/formssel.cpp ! src/share/vm/gc/shenandoah/shenandoah_globals.hpp ! test/gc/shenandoah/cas/ReferenceCAS.java From rkennke at redhat.com Mon Oct 31 16:28:22 2016 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 31 Oct 2016 17:28:22 +0100 Subject: RFR: Fix inconsistency between global and local evac-in-progress flag Message-ID: <1477931302.4215.9.camel@redhat.com> In rare circumstances, we could get an inconsistency between the global JavaThread::_evacuation_in_progress_global and JavaThread::_evacuation_in_progress flags. This could happen when using jni_AttachCurrentThread. In this situation, the just-attached thread could line up before the Threads_lock with its evac flag turned off, while the safepoint that holds the Threads_lock would turn it on. In this case, the attached thread would not turn evacuation on for that GC cycle, and we'd miss stores. I've reused a testcase that Aleksey has written. It uses shutdown hooks, because that's where we've first observed the bug. And it looks like the only use of AttachCurrentThread inside the VM too. The test is relatively long running, otherwise it wouldn't very likely expose the bug. The fix is to sync the local and global flags in JavaThread::initialize_queues(), this is where the other local/global flags for SATB and dirty card queue are synced too. (In the future we might want to rename this method?) I've also protected the code that initially sets all threads to evacuation on/off by the Threads_lock to avoid races. This is only necessary when turning the flag off, because we turn it on only at safepoints, and we're already holding the Threads_lock there. 
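The race described above can be modeled outside the VM. The following is a minimal sketch, not HotSpot code: ModelThread, ModelThreadList and their members are hypothetical names, and a std::mutex stands in for Threads_lock. It only illustrates the idea of seeding a newly attached thread's local flag from the global flag under the same lock that guards flag changes, so the two cannot disagree.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <vector>

// Toy model, not HotSpot code: every name below is hypothetical.
struct ModelThread {
  bool evacuation_in_progress = false;  // thread-local copy of the flag
};

class ModelThreadList {
 public:
  // Attach a new thread. The local flag is seeded from the global flag while
  // the list lock is held, so a concurrent flag change cannot be missed.
  ModelThread* attach() {
    std::lock_guard<std::mutex> guard(lock_);
    threads_.push_back(std::unique_ptr<ModelThread>(new ModelThread()));
    ModelThread* t = threads_.back().get();
    t->evacuation_in_progress = global_evacuation_in_progress_;
    return t;
  }

  // Flip the global flag and every local copy under the same lock.
  void set_evacuation_in_progress(bool value) {
    std::lock_guard<std::mutex> guard(lock_);
    global_evacuation_in_progress_ = value;
    for (auto& t : threads_) {
      t->evacuation_in_progress = value;
    }
  }

 private:
  std::mutex lock_;  // plays the role of Threads_lock in this model
  bool global_evacuation_in_progress_ = false;
  std::vector<std::unique_ptr<ModelThread>> threads_;
};
```

In this model a thread that attaches after the flag was turned on starts with the flag already set, which is the property the fix aims for.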
http://cr.openjdk.java.net/~rkennke/synclocalglobalevac/webrev.00/ Ok to go? Roman From shade at redhat.com Mon Oct 31 16:36:32 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 31 Oct 2016 17:36:32 +0100 Subject: RFR: Fix inconsistency between global and local evac-in-progress flag In-Reply-To: <1477931302.4215.9.camel@redhat.com> References: <1477931302.4215.9.camel@redhat.com> Message-ID: <66b71a9f-6ed9-78ab-1e16-2396b14827e1@redhat.com> On 10/31/2016 05:28 PM, Roman Kennke wrote: > http://cr.openjdk.java.net/~rkennke/synclocalglobalevac/webrev.00/ Okay. Thanks, -Aleksey From shade at redhat.com Mon Oct 31 16:45:02 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 31 Oct 2016 17:45:02 +0100 Subject: RFR (M): Various cleanups: (un)signed math, method names, leftover code, etc Message-ID: <0bce118a-da04-56c9-accb-e44773a4a3be@redhat.com> Hi, I have managed to open Hotspot in CLion, and it barraged me with warnings I could not resist fixing. These include some (un)signed math problems, inconsistent method names, leftover code, etc. See: http://cr.openjdk.java.net/~shade/shenandoah/cleanups-1/webrev.01/ This is not entirely complete, but fixes some blatantly obvious things. Ok to go? Thanks, -Aleksey From rkennke at redhat.com Mon Oct 31 16:48:33 2016 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 31 Oct 2016 17:48:33 +0100 Subject: RFR (M): Various cleanups: (un)signed math, method names, leftover code, etc In-Reply-To: <0bce118a-da04-56c9-accb-e44773a4a3be@redhat.com> References: <0bce118a-da04-56c9-accb-e44773a4a3be@redhat.com> Message-ID: <1477932513.4215.10.camel@redhat.com> On Monday, 31.10.2016, at 17:45 +0100, Aleksey Shipilev wrote: > Hi, > > I have managed to open Hotspot in CLion, and it barraged me with > warnings I could not resist fixing. These include some (un)signed math > problems, inconsistent method names, leftover code, etc. > > See: >
http://cr.openjdk.java.net/~shade/shenandoah/cleanups-1/webrev.01/ > > This is not entirely complete, but fixes some blatantly obvious > things. > Ok to go? Nice. Go! Roman From ashipile at redhat.com Mon Oct 31 16:56:20 2016 From: ashipile at redhat.com (ashipile at redhat.com) Date: Mon, 31 Oct 2016 16:56:20 +0000 Subject: hg: shenandoah/jdk9/hotspot: Various cleanups: (un)signed math, method names, leftover code, etc Message-ID: <201610311656.u9VGuKAK016113@aojmv0008.oracle.com> Changeset: c7a2d9ce5168 Author: shade Date: 2016-10-31 17:55 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/c7a2d9ce5168 Various cleanups: (un)signed math, method names, leftover code, etc ! src/share/vm/gc/shenandoah/brooksPointer.hpp ! src/share/vm/gc/shenandoah/shenandoahBarrierSet.cpp ! src/share/vm/gc/shenandoah/shenandoahCollectionSet.cpp ! src/share/vm/gc/shenandoah/shenandoahCollectorPolicy.cpp ! src/share/vm/gc/shenandoah/shenandoahConcurrentMark.cpp ! src/share/vm/gc/shenandoah/shenandoahConcurrentMark.inline.hpp ! src/share/vm/gc/shenandoah/shenandoahConcurrentThread.cpp ! src/share/vm/gc/shenandoah/shenandoahHeap.cpp ! src/share/vm/gc/shenandoah/shenandoahHeap.hpp ! src/share/vm/gc/shenandoah/shenandoahHeap.inline.hpp ! src/share/vm/gc/shenandoah/shenandoahHeapRegion.cpp ! src/share/vm/gc/shenandoah/shenandoahHeapRegion.hpp ! src/share/vm/gc/shenandoah/shenandoahHeapRegion.inline.hpp ! src/share/vm/gc/shenandoah/shenandoahHeapRegionSet.cpp ! src/share/vm/gc/shenandoah/shenandoahMarkCompact.cpp From roman at kennke.org Mon Oct 31 17:05:33 2016 From: roman at kennke.org (roman at kennke.org) Date: Mon, 31 Oct 2016 17:05:33 +0000 Subject: hg: shenandoah/jdk9/hotspot: Make sure to sync local and global evac-in-progress flags correctly. 
Message-ID: <201610311705.u9VH5Xa8018690@aojmv0008.oracle.com> Changeset: b24395051cc1 Author: rkennke Date: 2016-10-31 17:45 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/b24395051cc1 Make sure to sync local and global evac-in-progress flags correctly. ! src/share/vm/gc/shenandoah/shenandoahConcurrentThread.cpp ! src/share/vm/runtime/thread.cpp + test/gc/shenandoah/EvilSyncBug.java From ashipile at redhat.com Mon Oct 31 19:02:08 2016 From: ashipile at redhat.com (ashipile at redhat.com) Date: Mon, 31 Oct 2016 19:02:08 +0000 Subject: hg: shenandoah/jdk9/hotspot: Separate Full GC counters in GC stats. Message-ID: <201610311902.u9VJ28Tg027315@aojmv0008.oracle.com> Changeset: e52c6b2eba6d Author: shade Date: 2016-10-31 20:01 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/e52c6b2eba6d Separate Full GC counters in GC stats. ! src/share/vm/gc/shenandoah/shenandoahCollectorPolicy.cpp ! src/share/vm/gc/shenandoah/shenandoahCollectorPolicy.hpp ! src/share/vm/gc/shenandoah/shenandoahConcurrentMark.cpp ! src/share/vm/gc/shenandoah/shenandoahConcurrentMark.hpp ! src/share/vm/gc/shenandoah/shenandoahMarkCompact.cpp
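The splitting scheme from the "Proper divide-n-conquer in array handling" thread can be sketched in a few lines. This is a minimal single-threaded illustration under stated assumptions, not the code from the patch: ChunkTask, kLeafSize, process_task and mark_array are hypothetical names, a std::deque stands in for the work-stealing task queue, and incrementing a counter stands in for marking through an array element.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical task type: a half-open range [from, to) of array indices.
struct ChunkTask {
  std::size_t from;
  std::size_t to;
};

// Hypothetical leaf size; not a value taken from the patch.
static const std::size_t kLeafSize = 4;

// Handles one task the divide-and-conquer way: keep halving, publish one
// half on the queue (where another worker could steal it), and continue with
// the other half until it is small enough to execute directly.
// 'visits' counts how many times each array element was processed.
void process_task(ChunkTask task, std::deque<ChunkTask>& queue,
                  std::vector<int>& visits) {
  while (task.to - task.from > kLeafSize) {
    std::size_t mid = task.from + (task.to - task.from) / 2;
    // Publish the upper half; with an odd length it is the larger one.
    queue.push_back(ChunkTask{mid, task.to});
    task.to = mid;
  }
  for (std::size_t i = task.from; i < task.to; i++) {
    visits[i]++;  // stand-in for marking through element i
  }
}

// Single-threaded driver standing in for a pool of GC workers: it drains the
// queue from the front, the end a stealing worker would take from, so the
// oldest (largest) published chunks are picked up first.
std::vector<int> mark_array(std::size_t length) {
  std::vector<int> visits(length, 0);
  std::deque<ChunkTask> queue;
  queue.push_back(ChunkTask{0, length});
  while (!queue.empty()) {
    ChunkTask task = queue.front();
    queue.pop_front();
    process_task(task, queue, visits);
  }
  return visits;
}
```

Every element ends up processed exactly once, and each published chunk is at least as large as the chunk the publishing worker keeps, which is what lets a stealer continue the subdivision on its own.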