From rwestrel at redhat.com Thu Dec 1 08:12:37 2016 From: rwestrel at redhat.com (rwestrel at redhat.com) Date: Thu, 01 Dec 2016 08:12:37 +0000 Subject: hg: shenandoah/jdk9/hotspot: Couple fixes to write barrier expansion Message-ID: <201612010812.uB18Cbdr018883@aojmv0008.oracle.com> Changeset: 7e4baa0817d1 Author: roland Date: 2016-12-01 08:49 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/7e4baa0817d1 Couple fixes to write barrier expansion ! src/share/vm/classfile/classLoader.cpp ! src/share/vm/opto/shenandoahSupport.cpp From rwestrel at redhat.com Thu Dec 1 08:23:54 2016 From: rwestrel at redhat.com (rwestrel at redhat.com) Date: Thu, 01 Dec 2016 08:23:54 +0000 Subject: hg: shenandoah/jdk9/hotspot: undo change made by mistake to compile the world Message-ID: <201612010823.uB18NsQh022239@aojmv0008.oracle.com> Changeset: dfa629752080 Author: roland Date: 2016-12-01 09:23 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/dfa629752080 undo change made by mistake to compile the world ! src/share/vm/classfile/classLoader.cpp From rwestrel at redhat.com Fri Dec 2 16:33:33 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 02 Dec 2016 17:33:33 +0100 Subject: replace barrier's input with barrier's output in all dominated uses to decrease pressure on register allocator Message-ID: http://cr.openjdk.java.net/~roland/shenandoah/registerpressure/webrev.00/ This implements Roman's suggestion that when we use an oop directly and there's a dominating barrier, we can safely replace the oop by the output of the barrier. So for instance: a' = rb(a); .. call(a); can also be compiled as: a' = rb(a); .. call(a'); and if there's no use of a after the barrier then we don't keep both a and a' live but only a'. This is implemented in the patch: - for write barriers at barrier expansion time. - for read barriers, when read barriers are scheduled. Roland. 
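The rewiring Roland describes can be sketched as a standalone toy outside C2: in straight-line code, every use of `a` positioned after the barrier `a' = rb(a)` is dominated by it, so it can read the barrier's output instead, and `a` no longer has to stay live past the barrier. The `Use` struct and `rewire_dominated_uses` helper below are illustrative names, not HotSpot's real IR API; real C2 code consults the dominator tree rather than simple positions.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One "instruction" in a straight-line block: at position `pos` it reads `input`.
struct Use {
    int pos;
    std::string input;
};

// After the barrier a' = rb(a) at `barrier_pos`, every later use of `in` is
// dominated by the barrier, so it may consume the barrier's output `out`
// instead. Uses before the barrier are left alone.
void rewire_dominated_uses(std::vector<Use>& uses, int barrier_pos,
                           const std::string& in, const std::string& out) {
    for (Use& u : uses) {
        if (u.pos > barrier_pos && u.input == in) {
            u.input = out;
        }
    }
}
```

With uses of `a` at positions 1 and 3 and a barrier at position 2, only the position-3 use is rewired to `a'`; if no use of `a` remains after the barrier, the register allocator only has to keep `a'` live there, which is exactly the register-pressure win described above.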
From rkennke at redhat.com Fri Dec 2 17:34:02 2016 From: rkennke at redhat.com (Roman Kennke) Date: Fri, 02 Dec 2016 18:34:02 +0100 Subject: replace barrier's input with barrier's output in all dominated uses to decrease pressure on register allocator In-Reply-To: References: Message-ID: <1480700042.2597.2.camel@redhat.com> Sounds and looks good. The changes in shenandoahHeap.cpp|hpp seem unrelated though. Roman Am Freitag, den 02.12.2016, 17:33 +0100 schrieb Roland Westrelin: > http://cr.openjdk.java.net/~roland/shenandoah/registerpressure/webrev > .00/ > > This implements Roman's suggestion that when we use an oop directly > and > there's a dominating barrier, we can safely replace the oop by the > output of the barrier. So for instance: > > a' = rb(a); > .. > call(a); > > can also be compiled as: > > a' = rb(a); > .. > call(a'); > > and if there's no use of a after the barrier then we don't keep both > a > and a' live but only a'. > > This is implemented in the patch: > - for write barriers at barrier expansion time. > - for read barriers, when read barriers are scheduled. > > Roland. From rwestrel at redhat.com Fri Dec 2 17:35:07 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 2 Dec 2016 18:35:07 +0100 Subject: replace barrier's input with barrier's output in all dominated uses to decrease pressure on register allocator In-Reply-To: <1480700042.2597.2.camel@redhat.com> References: <1480700042.2597.2.camel@redhat.com> Message-ID: <98eaa9b0-a605-986b-82ab-7237ed017961@redhat.com> > The changes in shenandoahHeap.cpp|hpp seem unrelated though. They are. At some point I tried building an optimized build and those changes were required. Roland. 
From rkennke at redhat.com Fri Dec 2 17:37:55 2016 From: rkennke at redhat.com (Roman Kennke) Date: Fri, 02 Dec 2016 18:37:55 +0100 Subject: replace barrier's input with barrier's output in all dominated uses to decrease pressure on register allocator In-Reply-To: <98eaa9b0-a605-986b-82ab-7237ed017961@redhat.com> References: <1480700042.2597.2.camel@redhat.com> <98eaa9b0-a605-986b-82ab-7237ed017961@redhat.com> Message-ID: <1480700275.2597.3.camel@redhat.com> Am Freitag, den 02.12.2016, 18:35 +0100 schrieb Roland Westrelin: > > The changes in shenandoahHeap.cpp|hpp seem unrelated though. > > They are. At some point I tried building an optimized build and those > changes were required. Hmm, strange. Doesn't matter, please push! Roman From shade at redhat.com Mon Dec 5 15:14:42 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 5 Dec 2016 16:14:42 +0100 Subject: RFR (XS): Cherry-pick 8169261: Fix for JDK-8067744 creates build failures with some versions of gcc and/or linux Message-ID: <324ede33-1036-9ba3-933a-e6c83858e78a@redhat.com> Going to cherry-pick this one, otherwise hotspot_gc_shenandoah does not run: http://hg.openjdk.java.net/jdk9/hs/hotspot/rev/d9e9bc313c5a Ok? Thanks, -Aleksey From shade at redhat.com Mon Dec 5 16:00:52 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 5 Dec 2016 17:00:52 +0100 Subject: RFR (S): Avoid evacuation if concurrent GC was cancelled Message-ID: Hi, Currently, when concurrent GC is canceled, we still enter the VM operation for concurrent evacuation, only to exit it quickly and slide into the full GC. This causes *two* back-to-back safepoints: one short from evac, and another large for full GC. While short one is normally short, it can hit the unlucky scheduling outlier and drag the pause time up. 
This change avoids going to evac if conc GC was canceled: http://cr.openjdk.java.net/~shade/shenandoah/cancel-no-evac/webrev.01/ Additionally, it resets the mark bitmaps before full GC with parallel workers, not concurrent ones, which would be important once Zhengyu trims down the number of concurrent workers. Testing: hotspot_gc_shenandoah, jcstress (tests-all/quick) Thanks, -Aleksey From rkennke at redhat.com Mon Dec 5 16:04:13 2016 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 5 Dec 2016 11:04:13 -0500 (EST) Subject: RFR (XS): Cherry-pick 8169261: Fix for JDK-8067744 creates build failures with some versions of gcc and/or linux Message-ID: <531194503.4583034.1480953853677.JavaMail.zimbra@zmail12.collab.prod.int.phx2.redhat.com> Yes. Generally, I'd say what is approved upstream doesn't need approval here, unless you are unsure for some reason. Roman On 05.12.2016 at 4:15 PM, Aleksey Shipilev wrote: > > Going to cherry-pick this one, otherwise hotspot_gc_shenandoah does not run: > http://hg.openjdk.java.net/jdk9/hs/hotspot/rev/d9e9bc313c5a > > Ok? > > Thanks, > -Aleksey > From ashipile at redhat.com Mon Dec 5 16:05:34 2016 From: ashipile at redhat.com (ashipile at redhat.com) Date: Mon, 05 Dec 2016 16:05:34 +0000 Subject: hg: shenandoah/jdk9/hotspot: Cherry-pick 8169261: Fix for JDK-8067744 creates build failures with some versions of gcc and/or linux Message-ID: <201612051605.uB5G5YTA011291@aojmv0008.oracle.com> Changeset: 5db8e70a5237 Author: shade Date: 2016-12-05 16:26 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/5db8e70a5237 Cherry-pick 8169261: Fix for JDK-8067744 creates build failures with some versions of gcc and/or linux !
make/test/JtregNative.gmk From shade at redhat.com Mon Dec 5 16:07:48 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 5 Dec 2016 17:07:48 +0100 Subject: RFR (XS): Cherry-pick 8169261: Fix for JDK-8067744 creates build failures with some versions of gcc and/or linux In-Reply-To: <531194503.4583034.1480953853677.JavaMail.zimbra@zmail12.collab.prod.int.phx2.redhat.com> References: <531194503.4583034.1480953853677.JavaMail.zimbra@zmail12.collab.prod.int.phx2.redhat.com> Message-ID: <10d07415-7b51-c544-1ada-e44b1772ad1e@redhat.com> Okay, if that does not complicate the merge somehow :) -Aleksey On 12/05/2016 05:04 PM, Roman Kennke wrote: > Yes. > Generally, I'd say what is approved upstream doesn't need approval here, unless you are unsure for some reason. > > Roman > > On 05.12.2016 at 4:15 PM, Aleksey Shipilev wrote: >> >> Going to cherry-pick this one, otherwise hotspot_gc_shenandoah does not run: >> http://hg.openjdk.java.net/jdk9/hs/hotspot/rev/d9e9bc313c5a >> >> Ok? >> >> Thanks, >> -Aleksey >> From rkennke at redhat.com Mon Dec 5 17:20:01 2016 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 05 Dec 2016 18:20:01 +0100 Subject: RFR (S): Avoid evacuation if concurrent GC was cancelled In-Reply-To: References: Message-ID: <1480958401.2597.8.camel@redhat.com> Some comments: - GC can be cancelled during final-mark-pause. Might be worth keeping the check for cancelled-gc after init-mark-pause. Same after evacuation: if evacuation gets cancelled, we don't need to reset the bitmaps because now it's done at start of full-gc. I think. - This here looks wrong:

+  // b. Cancel evacuation, if in progress
+  if (_heap->is_evacuation_in_progress()) {
+    MutexLocker mu(Threads_lock);
+    _heap->set_evacuation_in_progress(false);
+  }

This happens during a safepoint. The VMThread would hold the Threads_lock and the above would deadlock. We need to acquire the Threads_lock only when turning off evacuation outside of a safepoint.
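The locking question raised here — when is it safe to walk the thread list — comes down to the invariant that the thread later agrees to spell as assert_locked_or_safepoint: either the caller owns Threads_lock, or all Java threads are stopped at a safepoint. A minimal standalone sketch of that check (the `Mutex` type, `at_safepoint` flag, and `locked_or_safepoint` function are stand-ins for illustration, not the VM's real types):

```cpp
#include <cassert>

// Stand-in for a VM mutex that records its owner.
struct Mutex {
    const void* owner = nullptr;
    bool owned_by(const void* t) const { return owner == t; }
};

// Stand-in for SafepointSynchronize::is_at_safepoint().
bool at_safepoint = false;

// Walking the thread list is safe if the caller owns Threads_lock, or if all
// Java threads are stopped at a safepoint (nobody can mutate the list).
bool locked_or_safepoint(const Mutex& threads_lock, const void* self) {
    return threads_lock.owned_by(self) || at_safepoint;
}
```

This is why taking a MutexLocker inside a VM operation is redundant at best: at a safepoint the second disjunct already holds, so the iteration is safe without acquiring the lock.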
Roman On Monday, 05.12.2016 at 17:00 +0100, Aleksey Shipilev wrote: > Hi, > > Currently, when concurrent GC is canceled, we still enter the VM > operation for > concurrent evacuation, only to exit it quickly and slide into the > full GC. This > causes *two* back-to-back safepoints: one short from evac, and > another large for > full GC. While short one is normally short, it can hit the unlucky > scheduling > outlier and drag the pause time up. > > This change avoids going to evac if conc GC was canceled: > http://cr.openjdk.java.net/~shade/shenandoah/cancel-no-evac/webrev. > 01/ > > Additionally, it resets the mark bitmaps before full GC with parallel > workers, > not concurrent ones, which would be important once Zhengyu trims down > the number > of concurrent workers. > > Testing: hotspot_gc_shenandoah, jcstress (tests-all/quick) > > Thanks, > -Aleksey > From zgu at redhat.com Mon Dec 5 17:25:58 2016 From: zgu at redhat.com (Zhengyu Gu) Date: Mon, 5 Dec 2016 12:25:58 -0500 Subject: RFR (S): Avoid evacuation if concurrent GC was cancelled In-Reply-To: References: Message-ID: <9ab07a34-8406-481d-870a-7d03688eeae4@redhat.com>

114 // b. Cancel evacuation, if in progress
115 if (_heap->is_evacuation_in_progress()) {
116   MutexLocker mu(Threads_lock);
117   _heap->set_evacuation_in_progress(false);
118 }

I think that we can eliminate Threads_lock above by changing the assertion below:

void JavaThread::set_evacuation_in_progress_all_threads(bool in_prog) {
  assert(Threads_lock->owned_by_self(), "must hold Threads_lock");   <==== assert_locked_or_safepoint(Threads_lock)
  _evacuation_in_progress_global = in_prog;
  for (JavaThread* t = Threads::first(); t != NULL; t = t->next()) {
    t->set_evacuation_in_progress(in_prog);
  }
}

Thanks, -Zhengyu On 12/05/2016 11:00 AM, Aleksey Shipilev wrote: > Hi, > > Currently, when concurrent GC is canceled, we still enter the VM operation for > concurrent evacuation, only to exit it quickly and slide into the full GC.
This > causes *two* back-to-back safepoints: one short from evac, and another large for > full GC. While short one is normally short, it can hit the unlucky scheduling > outlier and drag the pause time up. > > This change avoids going to evac if conc GC was canceled: > http://cr.openjdk.java.net/~shade/shenandoah/cancel-no-evac/webrev.01/ > > Additionally, it resets the mark bitmaps before full GC with parallel workers, > not concurrent ones, which would be important once Zhengyu trims down the number > of concurrent workers. > > Testing: hotspot_gc_shenandoah, jcstress (tests-all/quick) > > Thanks, > -Aleksey > From rkennke at redhat.com Mon Dec 5 17:28:41 2016 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 05 Dec 2016 18:28:41 +0100 Subject: RFR (S): Avoid evacuation if concurrent GC was cancelled In-Reply-To: <9ab07a34-8406-481d-870a-7d03688eeae4@redhat.com> References: <9ab07a34-8406-481d-870a-7d03688eeae4@redhat.com> Message-ID: <1480958921.2597.10.camel@redhat.com> On Monday, 05.12.2016 at 12:25 -0500, Zhengyu Gu wrote:

> 114 // b. Cancel evacuation, if in progress
> 115 if (_heap->is_evacuation_in_progress()) {
> 116   MutexLocker mu(Threads_lock);
> 117   _heap->set_evacuation_in_progress(false);
> 118 }
>
> I think that we can eliminate Threads_lock above by changing the assertion below:
>
> void JavaThread::set_evacuation_in_progress_all_threads(bool in_prog) {
>   assert(Threads_lock->owned_by_self(), "must hold Threads_lock");   <==== assert_locked_or_safepoint(Threads_lock)
>   _evacuation_in_progress_global = in_prog;
>   for (JavaThread* t = Threads::first(); t != NULL; t = t->next()) {
>     t->set_evacuation_in_progress(in_prog);
>   }
> }

No, I don't think so. We're iterating over the threads, so we should hold that lock. However, as I mentioned in that other email, the VMThread should already hold it. Now that I think about it again, it's probably not going to deadlock, it's simply re-entrant.
In any case, acquiring it should not be necessary. Roman > > > Thanks, > > -Zhengyu > > On 12/05/2016 11:00 AM, Aleksey Shipilev wrote: > > Hi, > > > > Currently, when concurrent GC is canceled, we still enter the VM > > operation for > > concurrent evacuation, only to exit it quickly and slide into the > > full GC. This > > causes *two* back-to-back safepoints: one short from evac, and > > another large for > > full GC. While short one is normally short, it can hit the unlucky > > scheduling > > outlier and drag the pause time up. > > > > This change avoids going to evac if conc GC was canceled: > > ???http://cr.openjdk.java.net/~shade/shenandoah/cancel-no-evac/webr > > ev.01/ > > > > Additionally, it resets the mark bitmaps before full GC with > > parallel workers, > > not concurrent ones, which would be important once Zhengyu trims > > down the number > > of concurrent workers. > > > > Testing: hotspot_gc_shenandoah, jcstress (tests-all/quick) > > > > Thanks, > > -Aleksey > > From zgu at redhat.com Mon Dec 5 17:43:59 2016 From: zgu at redhat.com (Zhengyu Gu) Date: Mon, 5 Dec 2016 12:43:59 -0500 Subject: RFR (S): Avoid evacuation if concurrent GC was cancelled In-Reply-To: <1480958921.2597.10.camel@redhat.com> References: <9ab07a34-8406-481d-870a-7d03688eeae4@redhat.com> <1480958921.2597.10.camel@redhat.com> Message-ID: <9f42aaa7-7b00-006f-580d-812ccdd7bb7b@redhat.com> On 12/05/2016 12:28 PM, Roman Kennke wrote: > Am Montag, den 05.12.2016, 12:25 -0500 schrieb Zhengyu Gu: >> 114 // b. 
Cancel evacuation, if in progress >> 115 if (_heap->is_evacuation_in_progress()) { >> 116 MutexLocker mu(Threads_lock); >> 117 _heap->set_evacuation_in_progress(false); >> 118 } >> >> >> I think that we can eliminate Threads_lock above by changing the >> assertion below: >> >> void JavaThread::set_evacuation_in_progress_all_threads(bool in_prog) >> { >> assert(Threads_lock->owned_by_self(), "must hold >> Threads_lock"); <==== assert_locked_or_safepoint(Threads_lock) >> _evacuation_in_progress_global = in_prog; >> for (JavaThread* t = Threads::first(); t != NULL; t = t->next()) { >> t->set_evacuation_in_progress(in_prog); >> } >> } > No, I don't think so. We're iterating over the threads, so we should > hold that lock. However, as I mentioned in that other email, the > VMThread should already hold it. Now that I think about it again, it's > probably not going to deadlock, it's simply re-entrant. In any case, > acquiring it should not be necessary. I think that it is safe to iterate over the thread list without the Thread_lock during safepoint. Check following code: void Threads::threads_do(ThreadClosure* tc) { assert_locked_or_safepoint(Threads_lock); // ALL_JAVA_THREADS iterates through all JavaThreads ALL_JAVA_THREADS(p) { tc->do_thread(p); } .... -Zhengyu > Roman > >> >> Thanks, >> >> -Zhengyu >> >> On 12/05/2016 11:00 AM, Aleksey Shipilev wrote: >>> Hi, >>> >>> Currently, when concurrent GC is canceled, we still enter the VM >>> operation for >>> concurrent evacuation, only to exit it quickly and slide into the >>> full GC. This >>> causes *two* back-to-back safepoints: one short from evac, and >>> another large for >>> full GC. While short one is normally short, it can hit the unlucky >>> scheduling >>> outlier and drag the pause time up. 
>>> >>> This change avoids going to evac if conc GC was canceled: >>> http://cr.openjdk.java.net/~shade/shenandoah/cancel-no-evac/webr >>> ev.01/ >>> >>> Additionally, it resets the mark bitmaps before full GC with >>> parallel workers, >>> not concurrent ones, which would be important once Zhengyu trims >>> down the number >>> of concurrent workers. >>> >>> Testing: hotspot_gc_shenandoah, jcstress (tests-all/quick) >>> >>> Thanks, >>> -Aleksey >>> From rkennke at redhat.com Mon Dec 5 17:47:03 2016 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 5 Dec 2016 12:47:03 -0500 (EST) Subject: RFR (S): Avoid evacuation if concurrent GC was cancelled Message-ID: <1793230632.4612707.1480960023576.JavaMail.zimbra@zmail12.collab.prod.int.phx2.redhat.com> Ah yes I see what you mean. Yes we can change to assert_locked_or_safepoint() there. /Roman Am 05.12.2016 6:44 nachm. schrieb Zhengyu Gu : > > On 12/05/2016 12:28 PM, Roman Kennke wrote: > > > Am Montag, den 05.12.2016, 12:25 -0500 schrieb Zhengyu Gu: > >> 114 // b. Cancel evacuation, if in progress > >> 115 if (_heap->is_evacuation_in_progress()) { > >> 116 MutexLocker mu(Threads_lock); > >> 117 _heap->set_evacuation_in_progress(false); > >> 118 } > >> > >> > >> I think that we can eliminate Threads_lock above by changing the > >> assertion below: > >> > >> void JavaThread::set_evacuation_in_progress_all_threads(bool in_prog) > >> { > >>???? assert(Threads_lock->owned_by_self(), "must hold > >> Threads_lock");?????? <==== assert_locked_or_safepoint(Threads_lock) > >>???? _evacuation_in_progress_global = in_prog; > >>???? for (JavaThread* t = Threads::first(); t != NULL; t = t->next()) { > >>?????? t->set_evacuation_in_progress(in_prog); > >>???? } > >> } > > No, I don't think so. We're iterating over the threads, so we should > > hold that lock. However, as I mentioned in that other email, the > > VMThread should already hold it. 
Now that I think about it again, it's > > probably not going to deadlock, it's simply re-entrant. In any case, > > acquiring it should not be necessary. > > I think that it is safe to iterate over the thread list without the Thread_lock during safepoint. > > Check following code: > > void Threads::threads_do(ThreadClosure* tc) { > ?? assert_locked_or_safepoint(Threads_lock); > ?? // ALL_JAVA_THREADS iterates through all JavaThreads > ?? ALL_JAVA_THREADS(p) { > ???? tc->do_thread(p); > ?? } > > .... > > > -Zhengyu > > > > > > Roman > > > >> > >> Thanks, > >> > >> -Zhengyu > >> > >> On 12/05/2016 11:00 AM, Aleksey Shipilev wrote: > >>> Hi, > >>> > >>> Currently, when concurrent GC is canceled, we still enter the VM > >>> operation for > >>> concurrent evacuation, only to exit it quickly and slide into the > >>> full GC. This > >>> causes *two* back-to-back safepoints: one short from evac, and > >>> another large for > >>> full GC. While short one is normally short, it can hit the unlucky > >>> scheduling > >>> outlier and drag the pause time up. > >>> > >>> This change avoids going to evac if conc GC was canceled: > >>>???? http://cr.openjdk.java.net/~shade/shenandoah/cancel-no-evac/webr > >>> ev.01/ > >>> > >>> Additionally, it resets the mark bitmaps before full GC with > >>> parallel workers, > >>> not concurrent ones, which would be important once Zhengyu trims > >>> down the number > >>> of concurrent workers. > >>> > >>> Testing: hotspot_gc_shenandoah, jcstress (tests-all/quick) > >>> > >>> Thanks, > >>> -Aleksey > >>> > From shade at redhat.com Mon Dec 5 18:09:52 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 5 Dec 2016 19:09:52 +0100 Subject: RFR (S): Avoid evacuation if concurrent GC was cancelled In-Reply-To: <1480958401.2597.8.camel@redhat.com> References: <1480958401.2597.8.camel@redhat.com> Message-ID: <5c695bd3-ea8d-1a5c-02fd-96f32d2570f8@redhat.com> Okay! How about this then? 
http://cr.openjdk.java.net/~shade/shenandoah/cancel-no-evac/webrev.02/ Rewired ShenandoahConcurrentThread to capture cancellation after each phase. Once a phase fails, it will re-spin towards full GC, which will recover. Also dropped a mutex acquire in mark-compact, and changed assert to assert_locked_or_safepoint. Still passes hs_gc_shenandoah, and jcstress run is chugging along. Thanks, -Aleksey On 12/05/2016 06:20 PM, Roman Kennke wrote: > Some comments: > > - GC can be cancelled during final-mark-pause. Might be worth to keep > the check for cancelled-gc after init-mark-pause. Same after > evacuation: if evacuation gets cancelled, we don't need to reset the > bitmaps because now it's done at start of full-gc. I think. > > - This here looks wrong: > > + // b. Cancel evacuation, if in progress > + if (_heap->is_evacuation_in_progress()) { > + MutexLocker mu(Threads_lock); > + _heap->set_evacuation_in_progress(false); > + } > > This happens during safepoint. The VMThread would hold the Threads_lock > and the above would deadlock. > > We need to acquire the Threads_lock only when turning off evacuation > outside of a safepoint. > > Roman > > Am Montag, den 05.12.2016, 17:00 +0100 schrieb Aleksey Shipilev: >> Hi, >> >> Currently, when concurrent GC is canceled, we still enter the VM >> operation for >> concurrent evacuation, only to exit it quickly and slide into the >> full GC. This >> causes *two* back-to-back safepoints: one short from evac, and >> another large for >> full GC. While short one is normally short, it can hit the unlucky >> scheduling >> outlier and drag the pause time up. >> >> This change avoids going to evac if conc GC was canceled: >> http://cr.openjdk.java.net/~shade/shenandoah/cancel-no-evac/webrev. >> 01/ >> >> Additionally, it resets the mark bitmaps before full GC with parallel >> workers, >> not concurrent ones, which would be important once Zhengyu trims down >> the number >> of concurrent workers. 
>> >> Testing: hotspot_gc_shenandoah, jcstress (tests-all/quick) >> >> Thanks, >> -Aleksey >> From rkennke at redhat.com Mon Dec 5 18:44:01 2016 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 05 Dec 2016 19:44:01 +0100 Subject: RFR (S): Avoid evacuation if concurrent GC was cancelled In-Reply-To: <5c695bd3-ea8d-1a5c-02fd-96f32d2570f8@redhat.com> References: <1480958401.2597.8.camel@redhat.com> <5c695bd3-ea8d-1a5c-02fd-96f32d2570f8@redhat.com> Message-ID: <1480963441.2597.12.camel@redhat.com> On Monday, 05.12.2016 at 19:09 +0100, Aleksey Shipilev wrote: > Okay! How about this then? > http://cr.openjdk.java.net/~shade/shenandoah/cancel-no- > evac/webrev.02/ Hmm, you still don't check for cancelled gc after the final-mark pause. Notice how initial-evacuation can, in theory, fail and cause full-gc. (In fact, if that happens, there'd be no need to exit the final-mark safepoint: we could jump right into full-gc. However, that sounds a bit tricky: we would need to ensure that shenandoahConcurrentThread doesn't start evacuation or another full-gc after that 'embedded' full-gc.) I like the comments though! Not your fault, but I find the use of both heap->cancelled_gc() and should_terminate() confusing. Not sure if it can be consolidated somehow? Not necessarily in this patch though. Another crazy pants idea to consider: if GC gets cancelled during marking, we could short-cut the full-gc: instead of throwing away the half-completed mark bitmap, we could have full-gc pick up both the half-completed mark bitmap *and* the current taskqueues from concurrent marking, complete that, and then do the full compact with it. The idea here is that if we fail during marking, in all likelihood we're *almost* done with marking and don't necessarily need to mark everything again. Downside would be that the mark bitmap is slightly pessimistic because of SATB.
Roman From shade at redhat.com Mon Dec 5 19:14:45 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 5 Dec 2016 20:14:45 +0100 Subject: RFR (S): Avoid evacuation if concurrent GC was cancelled In-Reply-To: <1480963441.2597.12.camel@redhat.com> References: <1480958401.2597.8.camel@redhat.com> <5c695bd3-ea8d-1a5c-02fd-96f32d2570f8@redhat.com> <1480963441.2597.12.camel@redhat.com> Message-ID: <3d84b557-c23a-1a5b-840d-0e360599194f@redhat.com> On 12/05/2016 07:44 PM, Roman Kennke wrote: > Am Montag, den 05.12.2016, 19:09 +0100 schrieb Aleksey Shipilev: >> Okay! How about this then? >> http://cr.openjdk.java.net/~shade/shenandoah/cancel-no- >> evac/webrev.02/ > > Hmm, you still don't check for cancelled gc after final-mark pause. > Notice how initial-evacuation can, in theory, fail and cause full-gc. Right. Oops, the code is hairy, and prone to mishaps like that. > Not your fault, but I find the use of both heap->cancelled_gc() and > should_terminate() confusing. Not sure if it can be consolidated > somehow? Not necessarily in this patch though. Yes, let's rehash ShenandoahConcurrentThread::run_service into two methods, so that code is cleaner and early returns make cancellation checks similar to our beloved ParallelTerminator: http://cr.openjdk.java.net/~shade/shenandoah/cancel-no-evac/webrev.03/ Still passes hotspot_gc_shenandoah, and jcstress is running. > The idea here is that if we fail during marking, in all likelyhood we're > *almost* done with marking and don't necessarily need to make > everything again. Downside would be that the mark bitmap is slightly > pessimistic because of SATB. No, I think Full GC should be our "last ditch" collection, and be able to recover from any legitimate heap situation. This mandates starting from scratch, to avoid spamming via e.g. SATB. We can probably do the "optimistic" STW collection that does reuse the concurrent mark data though. 
Thanks, -Aleksey From rkennke at redhat.com Mon Dec 5 19:53:42 2016 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 05 Dec 2016 20:53:42 +0100 Subject: RFR (S): Avoid evacuation if concurrent GC was cancelled In-Reply-To: <3d84b557-c23a-1a5b-840d-0e360599194f@redhat.com> References: <1480958401.2597.8.camel@redhat.com> <5c695bd3-ea8d-1a5c-02fd-96f32d2570f8@redhat.com> <1480963441.2597.12.camel@redhat.com> <3d84b557-c23a-1a5b-840d-0e360599194f@redhat.com> Message-ID: <1480967622.2597.14.camel@redhat.com> Am Montag, den 05.12.2016, 20:14 +0100 schrieb Aleksey Shipilev: > On 12/05/2016 07:44 PM, Roman Kennke wrote: > > Am Montag, den 05.12.2016, 19:09 +0100 schrieb Aleksey Shipilev: > > > Okay! How about this then? > > > ?http://cr.openjdk.java.net/~shade/shenandoah/cancel-no- > > > evac/webrev.02/ > > > > Hmm, you still don't check for cancelled gc after final-mark pause. > > Notice how initial-evacuation can, in theory, fail and cause full- > > gc.? > > Right. Oops, the code is hairy, and prone to mishaps like that. > > > Not your fault, but I find the use of both heap->cancelled_gc() and > > should_terminate() confusing. Not sure if it can be consolidated > > somehow? Not necessarily in this patch though. > > Yes, let's rehash ShenandoahConcurrentThread::run_service into two > methods, so > that code is cleaner and early returns make cancellation checks > similar to our > beloved ParallelTerminator: > ? http://cr.openjdk.java.net/~shade/shenandoah/cancel-no-evac/webrev. > 03/ > > Still passes hotspot_gc_shenandoah, and jcstress is running. Looks great! > > The idea here is that if we fail during marking, in all likelyhood > > we're > > *almost* done with marking and don't necessarily need to make > > everything again. Downside would be that the mark bitmap is > > slightly > > pessimistic because of SATB. > > No, I think Full GC should be our "last ditch" collection, and be > able to > recover from any legitimate heap situation. 
This mandates starting > from scratch, > to avoid spamming via e.g. SATB. Yes ok. Future idea: also compact humongous objects ;-) > We can probably do the "optimistic" STW > collection that does reuse the concurrent mark data though. Not exactly sure what you mean. Green light for the patch! Roman From ashipile at redhat.com Mon Dec 5 20:28:24 2016 From: ashipile at redhat.com (ashipile at redhat.com) Date: Mon, 05 Dec 2016 20:28:24 +0000 Subject: hg: shenandoah/jdk9/hotspot: Avoid evacuation if concurrent GC was cancelled. Make sure Full GC is able to recover. Message-ID: <201612052028.uB5KSPn8015227@aojmv0008.oracle.com> Changeset: 179aba55a53a Author: shade Date: 2016-12-05 21:28 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/179aba55a53a Avoid evacuation if concurrent GC was cancelled. Make sure Full GC is able to recover. ! src/share/vm/gc/shenandoah/shenandoahCollectorPolicy.cpp ! src/share/vm/gc/shenandoah/shenandoahCollectorPolicy.hpp ! src/share/vm/gc/shenandoah/shenandoahConcurrentThread.cpp ! src/share/vm/gc/shenandoah/shenandoahConcurrentThread.hpp ! src/share/vm/gc/shenandoah/shenandoahMarkCompact.cpp ! src/share/vm/runtime/thread.cpp From shade at redhat.com Tue Dec 6 17:17:11 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 6 Dec 2016 18:17:11 +0100 Subject: RFC: TLAB size flapping Message-ID: <8e9458d9-9c24-5b7c-6dd5-017728c81381@redhat.com> Hi, So, if you run allocation tests under -Xlog:gc+tlab, then a funny story unfolds.
The interesting piece of code is below; it is polled by the TLAB allocation machinery to figure out the max TLAB allocatable without hassle:

size_t ShenandoahHeap::unsafe_max_tlab_alloc(Thread *thread) const {
  size_t idx = _free_regions->current_index();
  ShenandoahHeapRegion* current = _free_regions->get(idx);
  if (current == NULL) {
    return 0;
  } else if (current->free() > MinTLABSize) {
    return current->free();
  } else {
    return MinTLABSize;
  }
}

This is what happens next:

// Step 1: TLAB request for allocating, polling Shenandoah about the next free
// region. Shenandoah replies there is a current free region with 256 words
// busy (hm!). Okay, we claim the rest of the region for a TLAB then.
[2.328s][trace][gc,tlab] TLAB: fill thread: 0x00007ffb54594800 ...
[2.328s][trace][gc,tlab] ShenandoahHeap::unsafe_max_tlab_alloc: region = 1019, capacity = 524288, used = 256, free = 524032
[2.328s][trace][gc,tlab] ThreadLocalAllocBuffer::compute_size(3) returns 524032
[2.328s][trace][gc,tlab] allocating new tlab of size 524032 at addr 0x00000006bec00800

// Step 2: Another TLAB request. No more space in current region. But yeah, we
// return MinTLABSize (those 256 words!), and shared infra moves on, asking us
// to allocate a new TLAB of 256 words. Now, the current region is depleted, so
// we allocate those 256 words in the *next* region.
[2.328s][trace][gc,tlab] TLAB: fill thread: 0x00007ffb54594800 ...
[2.329s][trace][gc,tlab] ShenandoahHeap::unsafe_max_tlab_alloc: (failing) region = 1019, capacity = 524288, used = 524288, free = 0
[2.329s][trace][gc,tlab] ThreadLocalAllocBuffer::compute_size(3) returns 256
[2.329s][trace][gc,tlab] allocating new tlab of size 256 at addr 0x00000006bf000000

// Step 1 again. The cycle continues. Another TLAB request, current region has
// 256 words used, claim the rest... goes on and on.
[2.329s][trace][gc,tlab] TLAB: fill thread: 0x00007ffb54594800 ...
[2.329s][trace][gc,tlab] ShenandoahHeap::unsafe_max_tlab_alloc: region = 1020, capacity = 524288, used = 256, free = 524032
[2.329s][trace][gc,tlab] ThreadLocalAllocBuffer::compute_size(3) returns 524032
[2.329s][trace][gc,tlab] allocating new tlab of size 524032 at addr 0x00000006bf000800

So, this flaps TLAB allocations between the region size and MinTLABSize. Oops! We enter the slow path *twice* per region, instead of doing it once. I think returning MinTLABSize is wrong in the code above, and we have two options:
  a) Return 0 on the MinTLABSize branch. If I read the code right, this will bail us from the TLAB allocation path, which is undesirable;
  b) Advance to the next free region, and try to poll its free().
G1 is susceptible to the same problem, as far as I can see. Thanks, -Aleksey From rkennke at redhat.com Tue Dec 6 17:26:27 2016 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 06 Dec 2016 18:26:27 +0100 Subject: RFC: TLAB size flapping In-Reply-To: <8e9458d9-9c24-5b7c-6dd5-017728c81381@redhat.com> References: <8e9458d9-9c24-5b7c-6dd5-017728c81381@redhat.com> Message-ID: <1481045187.2597.19.camel@redhat.com> On Tuesday, 06.12.2016 at 18:17 +0100, Aleksey Shipilev wrote: > Hi, > > So, if you run allocation tests under -Xlog:gc+tlab, then a funny story unfolds. > The interesting piece of code is below; it is polled by the TLAB allocation machinery to figure out the max TLAB allocatable without hassle:
>
> size_t ShenandoahHeap::unsafe_max_tlab_alloc(Thread *thread) const {
>   size_t idx = _free_regions->current_index();
>   ShenandoahHeapRegion* current = _free_regions->get(idx);
>   if (current == NULL) {
>     return 0;
>   } else if (current->free() > MinTLABSize) {
>     return current->free();
>   } else {
>     return MinTLABSize;
>   }
> }
>
> This is what happens next:
>
> // Step 1: TLAB request for allocating, polling Shenandoah about the next free
> // region.
Shenandoah replies there is a current free region with 256 > words > // busy (hm!). Okay, we claim the rest of the region for a TLAB then. > [2.328s][trace][gc,tlab] TLAB: fill thread: 0x00007ffb54594800 ... > [2.328s][trace][gc,tlab] ShenandoahHeap::unsafe_max_tlab_alloc: > region = 1019, > capacity = 524288, used = 256, free = 524032 > [2.328s][trace][gc,tlab] ThreadLocalAllocBuffer::compute_size(3) > returns 524032 > [2.328s][trace][gc,tlab] allocating new tlab of size 524032 at addr > 0x00000006bec00800 > > // Step 2: Another TLAB request. No more space in current region. But > yeah, we > // return MinTLABSize (those 256 words!), and shared infra moves on, > asking us > // to allocate a new TLAB of 256 words. Now, the current region is > depleted, so > // we allocate those 256 words in the *next* region. > [2.328s][trace][gc,tlab] TLAB: fill thread: 0x00007ffb54594800 ... > [2.329s][trace][gc,tlab] ShenandoahHeap::unsafe_max_tlab_alloc: > (failing) region > = 1019, capacity = 524288, used = 524288, free = 0 > [2.329s][trace][gc,tlab] ThreadLocalAllocBuffer::compute_size(3) > returns 256 > [2.329s][trace][gc,tlab] allocating new tlab of size 256 at addr > 0x00000006bf000000 > > // Step 1 again. The cycle continues. Another TLAB request, current > region has > // 256 words used, claim the rest... goes on and on. > [2.329s][trace][gc,tlab] TLAB: fill thread: 0x00007ffb54594800 ... > [2.329s][trace][gc,tlab] ShenandoahHeap::unsafe_max_tlab_alloc: > region = 1020, > capacity = 524288, used = 256, free = 524032 > [2.329s][trace][gc,tlab] ThreadLocalAllocBuffer::compute_size(3) > returns 524032 > [2.329s][trace][gc,tlab] allocating new tlab of size 524032 at addr > 0x00000006bf000800 > > So, this flaps TLAB allocations between the region size and > MinTLABSize. Oops! Oops indeed! :-) > We enter the slow path *twice* per region, instead of doing it once. > I think > returning MinTLABSize is wrong in the code above, and we have two > options: > ? 
a) Return 0 on MinTLABSize branch. If I read the code right, this > will bail us > from TLAB allocation path, which is undesirable; > b) Advance to the next free region, and try to poll its free(). Hmm, a seems undesirable. Do we really need to advance to the next region? Can't we simply return region-size here? I mean, it is inherently racy and it doesn't matter if we advance right now, or a little later when trying to allocate. Returning X here doesn't somehow magically guarantee that we can later allocate X without skipping to the next region. Unless it's somehow done atomically. Which we don't. (Shenandoah does lock-free allocations, maybe other GCs are better off because they allocate under Heap_lock?) Roman From shade at redhat.com Tue Dec 6 17:55:09 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 6 Dec 2016 18:55:09 +0100 Subject: RFR: TLAB size flapping In-Reply-To: <1481045187.2597.19.camel@redhat.com> References: <8e9458d9-9c24-5b7c-6dd5-017728c81381@redhat.com> <1481045187.2597.19.camel@redhat.com> Message-ID: <7ff5d029-768d-1e9a-b093-2c81a379fb5f@redhat.com> On 12/06/2016 06:26 PM, Roman Kennke wrote: >> We enter the slow path *twice* per region, instead of doing it once. >> I think >> returning MinTLABSize is wrong in the code above, and we have two >> options: >> a) Return 0 on MinTLABSize branch. If I read the code right, this >> will bail us >> from TLAB allocation path, which is undesirable; >> b) Advance to the next free region, and try to poll its free(). > > Hmm, a seems undesirable. Do we really need to advance to the next region? > Can't we simply return region-size here? I mean, it is inherently racy > and it doesn't matter if we advance right now, or a little later when > trying to allocate. Returning X here doesn't somehow magically > guarantee that we can later allocate X without skipping to the next region. > Unless it's somehow done atomically. Which we don't.
(Shenandoah does > lock-free allocations, maybe other GCs are better off because they > allocate under Heap_lock?)

Ah, very good, we can return the region size, knowing the next free region is completely free:

http://cr.openjdk.java.net/~shade/shenandoah/tlab-flapping/webrev.01/

It does seem to improve allocation rates when multiple allocating threads are bashing us with requests (caveat emptor: new workload, blah-blah):

http://cr.openjdk.java.net/~shade/shenandoah/tlab-flapping/baseline.txt
http://cr.openjdk.java.net/~shade/shenandoah/tlab-flapping/patched.txt

Thanks,
-Aleksey

From shade at redhat.com Tue Dec 6 18:39:01 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 6 Dec 2016 19:39:01 +0100 Subject: Perf: excess store in allocation fast path? Message-ID: <9d13f3da-424d-1dc6-3e48-75132ec60875@redhat.com>

Hi, (Roland?)

I think we have the excess store at allocation fast path, compare Shenandoah [1] and Parallel [2]. And this is not storing the fwdptr, but seems to be the excess zeroing. In that test, allocating a simple Object yields this:

  mov    %r11,(%rax)            ; mark word
  prefetchnta 0xc0(%r10)
  movl   $0xf80001dd,0x8(%rax)  ; class word
  mov    %rax,-0x8(%rax)        ; fwdptr
  mov    %r12d,0xc(%rax)        ; zeroing last 4 bytes
  mov    %r12,0x10(%rax)        ; <--- hey, what?

I think this happens because allocation fastpath bumps the instance size to "cover" for the upcoming object's fwdptr, and accidentally zeroes it as well? Do we need this? I can imagine the invariant that everything up to top pointer should be zeroed, is this such a case?
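The arithmetic behind that last store can be put into numbers. This is an editorial sketch, not VM code; the sizes are assumptions (a bare java.lang.Object of 16 bytes: 8-byte mark word, 4-byte class word, 4 bytes padding; plus an 8-byte Brooks forwarding-pointer word), and the helper names are invented for the sketch:

```cpp
#include <cassert>
#include <cstddef>

// Assumed sizes for the sketch (not taken from the VM sources).
const size_t kObjectBytes = 16;  // bare Object: mark word + class word + pad
const size_t kFwdPtrBytes = 8;   // Brooks forwarding pointer word

// The fast path bumps the allocation size so the chunk also covers the
// forwarding-pointer slot.
size_t bumped_alloc_bytes(size_t instance_bytes) {
  return instance_bytes + kFwdPtrBytes;
}

// If zeroing runs up to the bumped size instead of the instance size, the
// first excess store lands at this offset from the object base -- 0x10 for
// a plain Object, matching the last 'mov' in the listing above.
size_t excess_store_offset(size_t instance_bytes) {
  return instance_bytes;
}
```

Under these assumptions, a 16-byte Object turns into a 24-byte allocation, and zeroing to the bumped size touches offset 0x10: one word past the object itself.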
The original test is in our suite [3], runnable like this, if you want to poke around it:

$ java -jar target/benchmarks.jar alloc.plain.Objects --jvmArgs "-XX:+UseShenandoahGC -Xmx8g -Xms8g" -f 1 -wi 5 -i 5 -t 1 -prof perfasm

Thanks,
-Aleksey

[1] http://cr.openjdk.java.net/~shade/shenandoah/alloc-excess-store/alloc-shenandoah.txt
[2] http://cr.openjdk.java.net/~shade/shenandoah/alloc-excess-store/alloc-parallel.txt
[3] http://icedtea.classpath.org/people/shade/gc-bench

From roman at kennke.org Tue Dec 6 18:25:39 2016 From: roman at kennke.org (Roman Kennke) Date: Tue, 06 Dec 2016 19:25:39 +0100 Subject: RFR: TLAB size flapping Message-ID: OK!

Sent from my FairPhone. On 06.12.2016 at 6:55 p.m., Aleksey Shipilev wrote: > > On 12/06/2016 06:26 PM, Roman Kennke wrote: > >> We enter the slow path *twice* per region, instead of doing it once. > >> I think > >> returning MinTLABSize is wrong in the code above, and we have two > >> options: > >> a) Return 0 on MinTLABSize branch. If I read the code right, this > >> will bail us > >> from TLAB allocation path, which is undesirable; > >> b) Advance to the next free region, and try to poll its free(). > > > > Hmm, a seems undesirable. Do we really need to advance to the next region? > > Can't we simply return region-size here? I mean, it is inherently racy > > and it doesn't matter if we advance right now, or a little later when > > trying to allocate. Returning X here doesn't somehow magically > > guarantee that we can later allocate X without skipping to the next region. > > Unless it's somehow done atomically. Which we don't. (Shenandoah does > > lock-free allocations, maybe other GCs are better off because they > > allocate under Heap_lock?) > > Ah, very good, we can return the region size, knowing the next free region is > completely free: >
http://cr.openjdk.java.net/~shade/shenandoah/tlab-flapping/webrev.01/ > > It does seem to improve allocation rates when multiple allocating threads are > bashing us with requests (caveat emptor: new workload, blah-blah): > http://cr.openjdk.java.net/~shade/shenandoah/tlab-flapping/baseline.txt > http://cr.openjdk.java.net/~shade/shenandoah/tlab-flapping/patched.txt > > Thanks, > -Aleksey > > From ashipile at redhat.com Tue Dec 6 18:42:07 2016 From: ashipile at redhat.com (ashipile at redhat.com) Date: Tue, 06 Dec 2016 18:42:07 +0000 Subject: hg: shenandoah/jdk9/hotspot: Fix TLAB flapping. Do not reply with MinTLABSize if we have no space left in current region, make allocator to ask for another region. Message-ID: <201612061842.uB6Ig7ro020621@aojmv0008.oracle.com> Changeset: 7009fc6f74b3 Author: shade Date: 2016-12-06 19:41 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/7009fc6f74b3 Fix TLAB flapping. Do not reply with MinTLABSize if we have no space left in current region, make allocator to ask for another region. ! src/share/vm/gc/shenandoah/shenandoahHeap.cpp ! src/share/vm/gc/shenandoah/shenandoahHeapRegion.cpp From chf at redhat.com Tue Dec 6 18:47:53 2016 From: chf at redhat.com (Christine Flood) Date: Tue, 6 Dec 2016 13:47:53 -0500 (EST) Subject: First pass at a connection matrix... In-Reply-To: <34332593.2371335.1481049924996.JavaMail.zimbra@redhat.com> Message-ID: <342735797.2371608.1481050073955.JavaMail.zimbra@redhat.com> This is just experimental for now. The long term plan is to have this matrix built by write barriers and have a smarter metric for choosing connected collection set regions. The matrix is built and printed during concurrent marking if you run with -XX:+ShenandoahMatrix. The somewhat silly heuristic is run via -XX:ShenandoahGCHeuristics=connections. This isn't really integrated with the new region_in_collection_set stuff, but is enough for now. 
http://cr.openjdk.java.net/~chf/connections/webrev.01/ Christine From shade at redhat.com Tue Dec 6 18:53:27 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 6 Dec 2016 19:53:27 +0100 Subject: RFR: TLAB size flapping In-Reply-To: <201612061839.uB6IdOGp025048@int-mx10.intmail.prod.int.phx2.redhat.com> References: <201612061839.uB6IdOGp025048@int-mx10.intmail.prod.int.phx2.redhat.com> Message-ID: <0b3722cd-1a35-bdef-7aee-c9bad9261af7@redhat.com> On 12/06/2016 07:25 PM, Roman Kennke wrote: > OK! Pushed. I know some G1 folks are reading this list (waves), so here is the relevant bug for G1. Maybe there is a better solution there: https://bugs.openjdk.java.net/browse/JDK-8170817 Thanks, -Aleksey From shade at redhat.com Tue Dec 6 19:13:00 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 6 Dec 2016 20:13:00 +0100 Subject: First pass at a connection matrix... In-Reply-To: <342735797.2371608.1481050073955.JavaMail.zimbra@redhat.com> References: <342735797.2371608.1481050073955.JavaMail.zimbra@redhat.com> Message-ID: <9ccde02b-73fd-d7d3-da7b-b2d7c25e1e04@redhat.com> On 12/06/2016 07:47 PM, Christine Flood wrote: > http://cr.openjdk.java.net/~chf/connections/webrev.01/ I don't mind this experimental code in repo, but let's do a few cleanups to match with other experimental hacks we have! *) Change tty->print-s to log_develop_trace(gc); that also fixes tty->print vs. tty->print_cr. *) Prefix new bug comments with FIXME *) Crush bad formatting early: - ConnectionHeuristics::choose_collection_set: indenting, 2 vs 3 spaces? - ShenandoahCollectorPolicy::phase_times() definition, excess space - globals.hpp, align right "\" around new additions *) UseShenandoahOWST in globals.hpp moved accidentally? Otherwise looks okay. Thanks, -Aleksey From rkennke at redhat.com Tue Dec 6 19:25:14 2016 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 06 Dec 2016 20:25:14 +0100 Subject: Perf: excess store in allocation fast path? 
In-Reply-To: <9d13f3da-424d-1dc6-3e48-75132ec60875@redhat.com> References: <9d13f3da-424d-1dc6-3e48-75132ec60875@redhat.com> Message-ID: <1481052314.2597.21.camel@redhat.com> On Tuesday, 06.12.2016 at 19:39 +0100, Aleksey Shipilev wrote:
> Hi, (Roland?)
>
> I think we have the excess store at allocation fast path, compare
> Shenandoah [1] and Parallel [2]. And this is not storing the fwdptr,
> but seems to be the excess zeroing. In that test, allocating a simple
> Object yields this:
>
>   mov    %r11,(%rax)            ; mark word
>   prefetchnta 0xc0(%r10)
>   movl   $0xf80001dd,0x8(%rax)  ; class word
>   mov    %rax,-0x8(%rax)        ; fwdptr
>   mov    %r12d,0xc(%rax)        ; zeroing last 4 bytes
>   mov    %r12,0x10(%rax)        ; <--- hey, what?
>
> I think this happens because allocation fastpath bumps the instance
> size to "cover" for the upcoming object's fwdptr, and accidentally
> zeroes it as well? Do we need this? I can imagine the invariant that
> everything up to top pointer should be zeroed, is this such a case?

It looks like initialization for the first field in the object. Maybe we're failing the c2 opt that eliminates initial zeroing for fields? Maybe our barrier or allocation stuff somehow gets in the way of that and c2 can't see the initialization and therefore cannot optimize it away?

Roman

From shade at redhat.com Tue Dec 6 19:29:28 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 6 Dec 2016 20:29:28 +0100 Subject: Perf: excess store in allocation fast path? In-Reply-To: <1481052314.2597.21.camel@redhat.com> References: <9d13f3da-424d-1dc6-3e48-75132ec60875@redhat.com> <1481052314.2597.21.camel@redhat.com> Message-ID: <66157ae6-7918-fbb9-c4f9-20a287e48786@redhat.com> On 12/06/2016 08:25 PM, Roman Kennke wrote:
> On Tuesday, 06.12.2016 at 19:39 +0100, Aleksey Shipilev wrote:
>> I think we have the excess store at allocation fast path, compare
>> Shenandoah [1] and Parallel [2].
And this is not storing the fwdptr, but >> seems to be the excess zeroing. In that test, allocating a simple Object >> yields this: >> >> mov %r11,(%rax) ; mark word >> prefetchnta 0xc0(%r10) >> movl $0xf80001dd,0x8(%rax) ; class word >> mov %rax,-0x8(%rax) ; fwdptr >> mov %r12d,0xc(%rax) ; zeroing last 4 bytes >> mov %r12,0x10(%rax) ; <--- hey, what? >> >> I think this happens because allocation fastpath bumps the instance size >> to "cover" for the upcoming object's fwdptr, and accidentally zeroes it as >> well? Do we need this? I can imagine the invariant that everything up to >> top pointer should be zeroed, is this such a case? > > It looks like initialization for the first field in the object. Maybe > we're failing the c2 opt that eliminates initial zeroing for fields? > Maybe our barrier or allocation stuff somehow gets in the way of that > and c2 can't see the initialization and therefore cannot optimize it > away? The test allocates new Object(), no fields. The object is 16 bytes long, yet we store something beyond 16 bytes -- which AFAIR is the slot for the next object's forwarding pointer. Thanks, -Aleksey From rkennke at redhat.com Tue Dec 6 19:44:14 2016 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 06 Dec 2016 20:44:14 +0100 Subject: Perf: excess store in allocation fast path? In-Reply-To: <66157ae6-7918-fbb9-c4f9-20a287e48786@redhat.com> References: <9d13f3da-424d-1dc6-3e48-75132ec60875@redhat.com> <1481052314.2597.21.camel@redhat.com> <66157ae6-7918-fbb9-c4f9-20a287e48786@redhat.com> Message-ID: <1481053454.2597.23.camel@redhat.com> Am Dienstag, den 06.12.2016, 20:29 +0100 schrieb Aleksey Shipilev: > On 12/06/2016 08:25 PM, Roman Kennke wrote: > > Am Dienstag, den 06.12.2016, 19:39 +0100 schrieb Aleksey Shipilev: > > > I think we have the excess store at allocation fast path, > > > compare? > > > Shenandoah [1] and Parallel [2]. And this is not storing the > > > fwdptr, but > > > seems to be the excess zeroing. 
In that test, allocating a simple Object yields this:
> > >
> > >   mov    %r11,(%rax)            ; mark word
> > >   prefetchnta 0xc0(%r10)
> > >   movl   $0xf80001dd,0x8(%rax)  ; class word
> > >   mov    %rax,-0x8(%rax)        ; fwdptr
> > >   mov    %r12d,0xc(%rax)        ; zeroing last 4 bytes
> > >   mov    %r12,0x10(%rax)        ; <--- hey, what?
> > >
> > > I think this happens because allocation fastpath bumps the instance size
> > > to "cover" for the upcoming object's fwdptr, and accidentally zeroes it as
> > > well? Do we need this? I can imagine the invariant that everything up to
> > > top pointer should be zeroed, is this such a case?
> >
> > It looks like initialization for the first field in the object. Maybe
> > we're failing the c2 opt that eliminates initial zeroing for fields?
> > Maybe our barrier or allocation stuff somehow gets in the way of that
> > and c2 can't see the initialization and therefore cannot optimize it away?
>
> The test allocates new Object(), no fields. The object is 16 bytes long,
> yet we store something beyond 16 bytes -- which AFAIR is the slot for the
> next object's forwarding pointer.

Try the attached patch. It preserves the obj_size, and passes that to initialize_object().

-------------- next part --------------
diff --git a/src/share/vm/opto/macro.cpp b/src/share/vm/opto/macro.cpp
--- a/src/share/vm/opto/macro.cpp
+++ b/src/share/vm/opto/macro.cpp
@@ -1449,6 +1449,7 @@
     transform_later(old_eden_top);
     // Add to heap top to get a new heap top
+    Node* init_size_in_bytes = size_in_bytes;
     if (UseShenandoahGC) {
       // Allocate several words more for the Shenandoah brooks pointer.
       size_in_bytes = new AddLNode(size_in_bytes, _igvn.MakeConX(BrooksPointer::byte_size()));
@@ -1554,7 +1555,7 @@
     InitializeNode* init = alloc->initialization();
     fast_oop_rawmem = initialize_object(alloc,
                                         fast_oop_ctrl, fast_oop_rawmem, fast_oop,
-                                        klass_node, length, size_in_bytes);
+                                        klass_node, length, init_size_in_bytes);
     // If initialization is performed by an array copy, any required
     // MemBarStoreStore was already added. If the object does not

From shade at redhat.com Tue Dec 6 19:50:57 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 6 Dec 2016 20:50:57 +0100 Subject: Perf: excess store in allocation fast path? In-Reply-To: <1481053454.2597.23.camel@redhat.com> References: <9d13f3da-424d-1dc6-3e48-75132ec60875@redhat.com> <1481052314.2597.21.camel@redhat.com> <66157ae6-7918-fbb9-c4f9-20a287e48786@redhat.com> <1481053454.2597.23.camel@redhat.com> Message-ID: <9f061c20-d6f9-29b9-91d6-5e632c501dcc@redhat.com> On 12/06/2016 08:44 PM, Roman Kennke wrote:
> Try the attached patch. It preserves the obj_size, and passes that to
> initialize_object().

Yea, that works, see:

http://cr.openjdk.java.net/~shade/shenandoah/alloc-excess-store/alloc-shenandoah-rkennke1.txt

Compare with baseline:

http://cr.openjdk.java.net/~shade/shenandoah/alloc-excess-store/alloc-shenandoah.txt

...and have your 50 picoseconds per alloc back!

Now, I want to know if it's okay to skip zeroing memory past the allocation pointer. I think it is safe, because that's how zeroing elimination works in other cases?

Thanks,
-Aleksey

From rkennke at redhat.com Tue Dec 6 19:55:17 2016 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 06 Dec 2016 20:55:17 +0100 Subject: Perf: excess store in allocation fast path?
In-Reply-To: <9f061c20-d6f9-29b9-91d6-5e632c501dcc@redhat.com> References: <9d13f3da-424d-1dc6-3e48-75132ec60875@redhat.com> <1481052314.2597.21.camel@redhat.com> <66157ae6-7918-fbb9-c4f9-20a287e48786@redhat.com> <1481053454.2597.23.camel@redhat.com> <9f061c20-d6f9-29b9-91d6-5e632c501dcc@redhat.com> Message-ID: <1481054117.2597.24.camel@redhat.com> Am Dienstag, den 06.12.2016, 20:50 +0100 schrieb Aleksey Shipilev: > On 12/06/2016 08:44 PM, Roman Kennke wrote: > > Try the attached patch. It preserves the obj_size, and passes that > > to > > initialize_object(). > > Yea, that works, see: > > http://cr.openjdk.java.net/~shade/shenandoah/alloc-excess-store/alloc > -shenandoah-rkennke1.txt > > Compare with baseline: > > http://cr.openjdk.java.net/~shade/shenandoah/alloc-excess-store/alloc > -shenandoah.txt > > ...and have your 50 picoseconds per alloc back! > > Now, I want to know if it's okay to skip zeroing memory past the > allocation > pointer. I think it is safe, because that's how zeroing elimination > works in > other cases? It's not only ok, I think it is a bug to zero past the allocation ptr. Consider what happens when you allocate at the region boundary, and then initialize one word past the object -> we'd wreck the 1st word of the next region. Roman From shade at redhat.com Tue Dec 6 20:07:24 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 6 Dec 2016 21:07:24 +0100 Subject: Perf: excess store in allocation fast path? 
In-Reply-To: <1481054117.2597.24.camel@redhat.com> References: <9d13f3da-424d-1dc6-3e48-75132ec60875@redhat.com> <1481052314.2597.21.camel@redhat.com> <66157ae6-7918-fbb9-c4f9-20a287e48786@redhat.com> <1481053454.2597.23.camel@redhat.com> <9f061c20-d6f9-29b9-91d6-5e632c501dcc@redhat.com> <1481054117.2597.24.camel@redhat.com> Message-ID: <54c8fecd-6bd5-8427-cd11-34466f6d8ed4@redhat.com> On 12/06/2016 08:55 PM, Roman Kennke wrote:
> On Tuesday, 06.12.2016 at 20:50 +0100, Aleksey Shipilev wrote:
>> Now, I want to know if it's okay to skip zeroing memory past the
>> allocation pointer. I think it is safe, because that's how zeroing
>> elimination works in other cases?
>
> It's not only ok, I think it is a bug to zero past the allocation ptr.
> Consider what happens when you allocate at the region boundary, and
> then initialize one word past the object -> we'd wreck the 1st word of
> the next region.

Hrmpf. If I recall our filler object mechanics correctly, we allocate the space at the end of the object, so there is no way to cross into another region?

Anyhow, that one notwithstanding, I meant whether it's okay to have a non-zeroed slot _under_ the allocation top, as in:

  (obj2 header would go here)
  ----------------------------------------- alloc top
  [garbage slot, soon to be obj2 fwdptr]
  [obj1 fields]
  [obj1 header]
  [obj1 fwdptr]
  ...

It's not likely to be parsable, but still.

Thanks,
-Aleksey

From rkennke at redhat.com Tue Dec 6 20:53:19 2016 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 06 Dec 2016 21:53:19 +0100 Subject: Perf: excess store in allocation fast path?
In-Reply-To: <54c8fecd-6bd5-8427-cd11-34466f6d8ed4@redhat.com> References: <9d13f3da-424d-1dc6-3e48-75132ec60875@redhat.com> <1481052314.2597.21.camel@redhat.com> <66157ae6-7918-fbb9-c4f9-20a287e48786@redhat.com> <1481053454.2597.23.camel@redhat.com> <9f061c20-d6f9-29b9-91d6-5e632c501dcc@redhat.com> <1481054117.2597.24.camel@redhat.com> <54c8fecd-6bd5-8427-cd11-34466f6d8ed4@redhat.com> Message-ID: <1481057599.2597.27.camel@redhat.com> On Tuesday, 06.12.2016 at 21:07 +0100, Aleksey Shipilev wrote:
> On 12/06/2016 08:55 PM, Roman Kennke wrote:
> > On Tuesday, 06.12.2016 at 20:50 +0100, Aleksey Shipilev wrote:
> > > Now, I want to know if it's okay to skip zeroing memory past the
> > > allocation pointer. I think it is safe, because that's how zeroing
> > > elimination works in other cases?
> >
> > It's not only ok, I think it is a bug to zero past the allocation ptr.
> > Consider what happens when you allocate at the region boundary, and
> > then initialize one word past the object -> we'd wreck the 1st word of
> > the next region.
>
> Hrmpf. If I recall our filler object mechanics correctly, we allocate the
> space at the end of the object, so there is no way to cross into another region?

Nope, this shouldn't be the case. We *should* always allocate brooks ptr + object of this object, not into the next one.

> Anyhow, that one notwithstanding, I meant whether it's okay to have a
> non-zeroed slot _under_ the allocation top, as in:
>
>   (obj2 header would go here)
>   ----------------------------------------- alloc top
>   [garbage slot, soon to be obj2 fwdptr]
>   [obj1 fields]
>   [obj1 header]
>   [obj1 fwdptr]
>   ...

It should be:

  (obj2 header would go here)
  [garbage slot, soon to be obj2 fwdptr]
  ----------------------------------------- alloc top
  [obj1 fields]
  [obj1 header]
  [obj1 fwdptr]
  ...

if not, I'd say it's a bug.
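The corrected picture corresponds to a simple bump-pointer model: each allocation reserves the forwarding-pointer word *before* the object, and the top ends exactly at the object's end. This is an editorial sketch with made-up word-granularity indices, not the VM's allocator:

```cpp
#include <cassert>
#include <cstddef>

// Sketch of a region in which each allocation is (fwdptr word + object),
// and top ends right after the object. The word at 'top' is the next
// object's fwdptr slot: it stays garbage until that object is allocated.
struct Region {
  size_t top = 0;  // word index of the allocation top

  // Returns the word index of the object base; its fwdptr sits at base - 1.
  size_t allocate(size_t object_words) {
    size_t fwdptr = top;         // fwdptr slot of this object
    size_t base   = fwdptr + 1;  // object header starts here
    top = base + object_words;   // top ends right after the object
    return base;
  }
};
```

After allocating obj1, `top` points exactly at what will become obj2's fwdptr slot, so nothing below `top` is left uninitialized, and nothing above it has been zeroed yet.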
Roman From rkennke at redhat.com Tue Dec 6 21:24:24 2016 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 06 Dec 2016 22:24:24 +0100 Subject: RFR (XL): Backport JDK9 Shenandoah to JDK8u Message-ID: <1481059464.2597.29.camel@redhat.com> This huge change backports the current state of JDK9 (minus the last bunch of patches) to jdk8u: http://cr.openjdk.java.net/~rkennke/backport-jdk8/webrev.00/ Not sure if this can be reasonably reviewed. ;-) I checked this line by line and also compared it to our baseline repo (http://hg.openjdk.java.net/aarch64-port/jdk8u/hotspot/). The one thing missing is changes in src/share/vm/opto and src/share/vm/adlc, but Roland is working on those. I've checked with SPECjvm2008 and jcstress is on the way. Unfortunately, I could not get the jtreg stuff to work. Ok to go in? Roman From rkennke at redhat.com Tue Dec 6 21:29:28 2016 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 06 Dec 2016 22:29:28 +0100 Subject: RFR: Fix object initialization in C2 Message-ID: <1481059768.2597.31.camel@redhat.com> As discussed in the previous thread, we overshoot object initialization by one word in C2-compiled allocation code. Besides generating one extra store, I believe it's very dangerous: an object allocated at the region end would write one word beyond, either thrashing the brooks ptr of the next region's first object, or causing a SEGV at the end of the heap. I'm actually surprised it hasn't happened yet ;-) The fix is relatively simple: keep around the true object size, and pass that to initialize_object() instead of the obj-size + brooksptr-size that we calculated. http://cr.openjdk.java.net/~rkennke/obj-init/webrev.00/ Ok?
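The boundary hazard described in that RFR can be illustrated with assumed numbers (512 KB regions and an 8-byte Brooks pointer word; an editorial sketch, not VM code):

```cpp
#include <cassert>
#include <cstddef>

// Assumed geometry for the sketch: the last chunk in a region holds the
// forwarding pointer followed by the object, ending exactly at the
// region boundary.
const size_t kRegionBytes = 512 * 1024;
const size_t kFwdPtrBytes = 8;

size_t last_obj_base(size_t obj_bytes) {
  return kRegionBytes - obj_bytes;  // its fwdptr sits at base - kFwdPtrBytes
}

// Buggy variant: zeroing runs over obj_bytes + fwdptr bytes and crosses
// the region boundary by one word.
bool zeroing_overshoots(size_t obj_bytes) {
  return last_obj_base(obj_bytes) + obj_bytes + kFwdPtrBytes > kRegionBytes;
}

// Fixed variant: zeroing stops at the true object size and stays inside.
bool zeroing_overshoots_fixed(size_t obj_bytes) {
  return last_obj_base(obj_bytes) + obj_bytes > kRegionBytes;
}
```

With a 16-byte object ending exactly at the boundary, the buggy size writes 8 bytes into the next region, while the fixed size stays within it.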
Roman From shade at redhat.com Wed Dec 7 08:49:59 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 7 Dec 2016 09:49:59 +0100 Subject: RFR: Fix object initialization in C2 In-Reply-To: <1481059768.2597.31.camel@redhat.com> References: <1481059768.2597.31.camel@redhat.com> Message-ID: <8aaa8c2a-183d-5959-99f8-8ecfdc9cea9b@redhat.com> On 12/06/2016 10:29 PM, Roman Kennke wrote: > http://cr.openjdk.java.net/~rkennke/obj-init/webrev.00/ Looks good to me. I would like Roland to OK this. -Aleksey From rwestrel at redhat.com Wed Dec 7 08:55:24 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Wed, 07 Dec 2016 09:55:24 +0100 Subject: RFR: Fix object initialization in C2 In-Reply-To: <1481059768.2597.31.camel@redhat.com> References: <1481059768.2597.31.camel@redhat.com> Message-ID: > http://cr.openjdk.java.net/~rkennke/obj-init/webrev.00/ That looks good to me. Roland. From rkennke at redhat.com Wed Dec 7 10:18:42 2016 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 07 Dec 2016 11:18:42 +0100 Subject: RFR: Fix object initialization in C2 In-Reply-To: References: <1481059768.2597.31.camel@redhat.com> Message-ID: <1481105922.2597.32.camel@redhat.com> Am Mittwoch, den 07.12.2016, 09:55 +0100 schrieb Roland Westrelin: > > http://cr.openjdk.java.net/~rkennke/obj-init/webrev.00/ > > That looks good to me. Thanks. I pushed it. Turns out that we're saved by the prefetch-reserve in ThreadLocalAllocationBuffer: it always allocates a few words more than necessary and thus we're never jumping off the cliff. 
Lucky us :-) Roman From roman at kennke.org Wed Dec 7 10:19:37 2016 From: roman at kennke.org (roman at kennke.org) Date: Wed, 07 Dec 2016 10:19:37 +0000 Subject: hg: shenandoah/jdk9/hotspot: Fix object initialization in C2 Message-ID: <201612071019.uB7AJbp5019698@aojmv0008.oracle.com> Changeset: f6d8d643198e Author: rkennke Date: 2016-12-07 11:17 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/f6d8d643198e Fix object initialization in C2 ! src/share/vm/opto/macro.cpp From shade at redhat.com Wed Dec 7 12:12:50 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 7 Dec 2016 13:12:50 +0100 Subject: RFR (XL): Backport JDK9 Shenandoah to JDK8u In-Reply-To: <1481059464.2597.29.camel@redhat.com> References: <1481059464.2597.29.camel@redhat.com> Message-ID: On 12/06/2016 10:24 PM, Roman Kennke wrote: > This huge change backports the current state of JDK9 (minus the last > bunch of patches) to jdk8u: > > http://cr.openjdk.java.net/~rkennke/backport-jdk8/webrev.00/ Spot-checking: *) src/cpu/aarch64/vm/c1_LIRAssembler_aarch64.cpp These are not conditional for Shenandoah, do we hit these guarantees with other GCs? 2032 guarantee(opr2->type() != T_OBJECT && opr2->type() != T_ARRAY, "need acmp barrier?"); 2033 guarantee(opr1->type() != T_OBJECT && opr1->type() != T_ARRAY, "need acmp barrier?"); *) src/share/vm/c1/c1_Runtime1.cpp Bad indent: 688 Handle h_obj(thread, obj); *) src/share/vm/memory/barrierSet.cpp Why we moved BarrierSet::write_ref_array here? Was that the upstream jdk-9 move? Should probably stay closer to jdk-8 version. *) src/share/vm/runtime/fieldDescriptor.hpp Another leak from jdk-9? 101 bool is_stable() const { return access_flags().is_stable(); } *) src/share/vm/runtime/os.hpp Leak? 56 class methodHandle; *) src/share/vm/utilities/growableArray.hpp Leak? 30 #include "oops/oop.hpp" Otherwise looks good. 
Thanks,
-Aleksey

From rkennke at redhat.com Wed Dec 7 13:08:17 2016 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 07 Dec 2016 14:08:17 +0100 Subject: RFR (XL): Backport JDK9 Shenandoah to JDK8u In-Reply-To: References: <1481059464.2597.29.camel@redhat.com> Message-ID: <1481116097.2597.34.camel@redhat.com> On Wednesday, 07.12.2016 at 13:12 +0100, Aleksey Shipilev wrote:
> On 12/06/2016 10:24 PM, Roman Kennke wrote:
> > This huge change backports the current state of JDK9 (minus the last
> > bunch of patches) to jdk8u:
> >
> > http://cr.openjdk.java.net/~rkennke/backport-jdk8/webrev.00/
>
> Spot-checking:
>
> *) src/cpu/aarch64/vm/c1_LIRAssembler_aarch64.cpp
>
> These are not conditional for Shenandoah, do we hit these guarantees with other GCs?
>
>   2032       guarantee(opr2->type() != T_OBJECT && opr2->type() != T_ARRAY, "need acmp barrier?");
>   2033       guarantee(opr1->type() != T_OBJECT && opr1->type() != T_ARRAY, "need acmp barrier?");

Nope, we don't. Should I remove them? FWIW those are the same as we have in jdk9-shenandoah.

> *) src/share/vm/c1/c1_Runtime1.cpp
>
> Bad indent:
>
>   688 Handle h_obj(thread, obj);

Uh, I was reading 'bad intent' and didn't know what you meant ;-) Will fix it before pushing.

> *) src/share/vm/memory/barrierSet.cpp
>
> Why we moved BarrierSet::write_ref_array here? Was that the upstream jdk-9 move?

No, this was moved in jdk9-shenandoah because we made that method virtual.

> *) src/share/vm/runtime/fieldDescriptor.hpp
>
> Another leak from jdk-9?
>
>   101   bool is_stable() const { return access_flags().is_stable(); }

No. We need it in c2 to identify stable fields. (no read-barrier needed...)

> *) src/share/vm/runtime/os.hpp
>
> Leak?
>
>   56 class methodHandle;

No. It's used some lines down as methodHandle*, and we're changing the order of includes, and this is needed for compilation.

> *) src/share/vm/utilities/growableArray.hpp
>
> Leak?
>
>   30 #include "oops/oop.hpp"

No.
We introduced some code that uses oopDesc::is_safe(). Ok to push after fixing the bad intent ;-) Roman From shade at redhat.com Wed Dec 7 19:50:24 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 7 Dec 2016 20:50:24 +0100 Subject: RFR (XL): Backport JDK9 Shenandoah to JDK8u In-Reply-To: <1481116097.2597.34.camel@redhat.com> References: <1481059464.2597.29.camel@redhat.com> <1481116097.2597.34.camel@redhat.com> Message-ID: <5608f8a7-38be-e499-ae9a-d476cd27172a@redhat.com> On 12/07/2016 02:08 PM, Roman Kennke wrote: > Am Mittwoch, den 07.12.2016, 13:12 +0100 schrieb Aleksey Shipilev: >> On 12/06/2016 10:24 PM, Roman Kennke wrote: > Ok to push after fixing the bad intent ;-) Ok then. Thanks, -Aleksey From roman at kennke.org Wed Dec 7 20:03:22 2016 From: roman at kennke.org (roman at kennke.org) Date: Wed, 07 Dec 2016 20:03:22 +0000 Subject: hg: shenandoah/jdk8u/hotspot: Backport JDK9 Shenandoah to JDK8u Message-ID: <201612072003.uB7K3Mm0023185@aojmv0008.oracle.com> Changeset: 87059e2365be Author: rkennke Date: 2016-12-07 21:03 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/87059e2365be Backport JDK9 Shenandoah to JDK8u ! src/cpu/aarch64/vm/aarch64.ad ! src/cpu/aarch64/vm/c1_LIRAssembler_aarch64.cpp ! src/cpu/aarch64/vm/c1_LIRGenerator_aarch64.cpp ! src/cpu/aarch64/vm/c1_MacroAssembler_aarch64.cpp ! src/cpu/aarch64/vm/c1_Runtime1_aarch64.cpp ! src/cpu/aarch64/vm/interp_masm_aarch64.cpp ! src/cpu/aarch64/vm/macroAssembler_aarch64.cpp ! src/cpu/aarch64/vm/macroAssembler_aarch64.hpp ! src/cpu/aarch64/vm/methodHandles_aarch64.cpp ! src/cpu/aarch64/vm/sharedRuntime_aarch64.cpp ! src/cpu/aarch64/vm/shenandoahBarrierSet_aarch64.cpp ! src/cpu/aarch64/vm/stubGenerator_aarch64.cpp ! src/cpu/aarch64/vm/stubRoutines_aarch64.hpp ! src/cpu/aarch64/vm/templateInterpreter_aarch64.cpp ! src/cpu/aarch64/vm/templateTable_aarch64.cpp ! src/cpu/x86/vm/assembler_x86.cpp ! src/cpu/x86/vm/assembler_x86.hpp ! src/cpu/x86/vm/c1_LIRAssembler_x86.cpp ! 
src/cpu/x86/vm/c1_LIRGenerator_x86.cpp ! src/cpu/x86/vm/c1_MacroAssembler_x86.cpp ! src/cpu/x86/vm/c1_Runtime1_x86.cpp ! src/cpu/x86/vm/interp_masm_x86_64.cpp ! src/cpu/x86/vm/macroAssembler_x86.cpp ! src/cpu/x86/vm/macroAssembler_x86.hpp ! src/cpu/x86/vm/sharedRuntime_x86_64.cpp ! src/cpu/x86/vm/shenandoahBarrierSet_x86.cpp ! src/cpu/x86/vm/stubGenerator_x86_64.cpp ! src/cpu/x86/vm/stubRoutines_x86_64.hpp ! src/cpu/x86/vm/templateInterpreter_x86_64.cpp ! src/cpu/x86/vm/templateTable_x86_64.cpp ! src/cpu/x86/vm/x86_64.ad ! src/share/vm/asm/assembler.cpp ! src/share/vm/c1/c1_LIRGenerator.cpp ! src/share/vm/c1/c1_Runtime1.cpp ! src/share/vm/c1/c1_Runtime1.hpp ! src/share/vm/ci/ciInstanceKlass.cpp ! src/share/vm/classfile/classLoaderData.cpp ! src/share/vm/classfile/classLoaderData.hpp ! src/share/vm/classfile/javaClasses.cpp ! src/share/vm/classfile/systemDictionary.cpp ! src/share/vm/code/codeCache.cpp ! src/share/vm/code/nmethod.cpp ! src/share/vm/gc_implementation/g1/g1CollectedHeap.cpp + src/share/vm/gc_implementation/shared/parallelCleaning.cpp + src/share/vm/gc_implementation/shared/parallelCleaning.hpp - src/share/vm/gc_implementation/shenandoah/brooksPointer.cpp ! src/share/vm/gc_implementation/shenandoah/brooksPointer.hpp ! src/share/vm/gc_implementation/shenandoah/brooksPointer.inline.hpp ! src/share/vm/gc_implementation/shenandoah/shenandoahBarrierSet.cpp ! src/share/vm/gc_implementation/shenandoah/shenandoahBarrierSet.hpp ! src/share/vm/gc_implementation/shenandoah/shenandoahBarrierSet.inline.hpp ! src/share/vm/gc_implementation/shenandoah/shenandoahCollectionSet.cpp ! src/share/vm/gc_implementation/shenandoah/shenandoahCollectorPolicy.cpp ! src/share/vm/gc_implementation/shenandoah/shenandoahCollectorPolicy.hpp ! src/share/vm/gc_implementation/shenandoah/shenandoahConcurrentMark.cpp ! src/share/vm/gc_implementation/shenandoah/shenandoahConcurrentMark.hpp ! src/share/vm/gc_implementation/shenandoah/shenandoahConcurrentMark.inline.hpp ! 
src/share/vm/gc_implementation/shenandoah/shenandoahConcurrentThread.cpp ! src/share/vm/gc_implementation/shenandoah/shenandoahConcurrentThread.hpp ! src/share/vm/gc_implementation/shenandoah/shenandoahFreeSet.cpp ! src/share/vm/gc_implementation/shenandoah/shenandoahHeap.cpp ! src/share/vm/gc_implementation/shenandoah/shenandoahHeap.hpp ! src/share/vm/gc_implementation/shenandoah/shenandoahHeap.inline.hpp ! src/share/vm/gc_implementation/shenandoah/shenandoahHeapRegion.cpp ! src/share/vm/gc_implementation/shenandoah/shenandoahHeapRegion.hpp ! src/share/vm/gc_implementation/shenandoah/shenandoahHeapRegion.inline.hpp ! src/share/vm/gc_implementation/shenandoah/shenandoahHeapRegionSet.cpp ! src/share/vm/gc_implementation/shenandoah/shenandoahHeapRegionSet.hpp - src/share/vm/gc_implementation/shenandoah/shenandoahJNICritical.cpp - src/share/vm/gc_implementation/shenandoah/shenandoahJNICritical.hpp + src/share/vm/gc_implementation/shenandoah/shenandoahLogging.hpp ! src/share/vm/gc_implementation/shenandoah/shenandoahMarkCompact.cpp ! src/share/vm/gc_implementation/shenandoah/shenandoahMarkCompact.hpp ! src/share/vm/gc_implementation/shenandoah/shenandoahMonitoringSupport.cpp ! src/share/vm/gc_implementation/shenandoah/shenandoahOopClosures.hpp ! src/share/vm/gc_implementation/shenandoah/shenandoahOopClosures.inline.hpp + src/share/vm/gc_implementation/shenandoah/shenandoahPhaseTimes.cpp + src/share/vm/gc_implementation/shenandoah/shenandoahPhaseTimes.hpp ! src/share/vm/gc_implementation/shenandoah/shenandoahRootProcessor.cpp ! 
src/share/vm/gc_implementation/shenandoah/shenandoahRootProcessor.hpp + src/share/vm/gc_implementation/shenandoah/shenandoahTaskqueue.cpp + src/share/vm/gc_implementation/shenandoah/shenandoahTaskqueue.hpp + src/share/vm/gc_implementation/shenandoah/shenandoahTaskqueue.inline.hpp + src/share/vm/gc_implementation/shenandoah/shenandoahWorkerDataArray.cpp + src/share/vm/gc_implementation/shenandoah/shenandoahWorkerDataArray.hpp + src/share/vm/gc_implementation/shenandoah/shenandoahWorkerDataArray.inline.hpp ! src/share/vm/gc_implementation/shenandoah/shenandoah_globals.cpp ! src/share/vm/gc_implementation/shenandoah/shenandoah_globals.hpp ! src/share/vm/gc_implementation/shenandoah/vm_operations_shenandoah.cpp ! src/share/vm/gc_implementation/shenandoah/vm_operations_shenandoah.hpp ! src/share/vm/gc_interface/collectedHeap.cpp ! src/share/vm/gc_interface/collectedHeap.hpp ! src/share/vm/memory/barrierSet.cpp ! src/share/vm/memory/barrierSet.hpp ! src/share/vm/memory/barrierSet.inline.hpp ! src/share/vm/memory/genMarkSweep.cpp ! src/share/vm/memory/space.inline.hpp ! src/share/vm/oops/instanceKlass.cpp ! src/share/vm/oops/instanceRefKlass.cpp ! src/share/vm/oops/oop.cpp ! src/share/vm/oops/oop.hpp ! src/share/vm/oops/oop.inline.hpp ! src/share/vm/opto/compile.cpp ! src/share/vm/opto/escape.cpp ! src/share/vm/opto/graphKit.cpp ! src/share/vm/opto/macro.cpp ! src/share/vm/opto/memnode.cpp ! src/share/vm/opto/shenandoahSupport.cpp ! src/share/vm/opto/shenandoahSupport.hpp ! src/share/vm/prims/jni.cpp ! src/share/vm/prims/jvm.cpp ! src/share/vm/prims/jvmtiEnv.cpp ! src/share/vm/prims/unsafe.cpp ! src/share/vm/runtime/arguments.cpp ! src/share/vm/runtime/biasedLocking.cpp ! src/share/vm/runtime/deoptimization.cpp ! src/share/vm/runtime/fieldDescriptor.hpp ! src/share/vm/runtime/init.cpp ! src/share/vm/runtime/mutexLocker.cpp ! src/share/vm/runtime/mutexLocker.hpp ! src/share/vm/runtime/objectMonitor.cpp ! src/share/vm/runtime/objectMonitor.hpp ! 
src/share/vm/runtime/os.hpp ! src/share/vm/runtime/safepoint.cpp ! src/share/vm/runtime/sharedRuntime.cpp ! src/share/vm/runtime/stubRoutines.cpp ! src/share/vm/runtime/stubRoutines.hpp ! src/share/vm/runtime/synchronizer.cpp ! src/share/vm/runtime/synchronizer.hpp ! src/share/vm/runtime/thread.cpp ! src/share/vm/services/attachListener.cpp ! src/share/vm/services/diagnosticCommand.cpp ! src/share/vm/services/heapDumper.cpp ! src/share/vm/services/threadService.cpp ! src/share/vm/utilities/growableArray.hpp ! src/share/vm/utilities/taskqueue.hpp ! test/TEST.groups From rkennke at redhat.com Thu Dec 8 11:00:16 2016 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 08 Dec 2016 12:00:16 +0100 Subject: RFR: C1 cleanup Message-ID: <1481194816.2597.39.camel@redhat.com> This is a cleanup of C1 related code: - Removed tmp1 and tmp2 from the ShenandoahWriteBarrier op (currently not needed) - Removed unused includes - Several whitespace fixes to make code as close as possible to upstream - Removed shenandoah_write_barrier_slow_id stub. we now use the shared WB stub http://cr.openjdk.java.net/~rkennke/c1-cleanup/webrev.00/ Ok? Roman From shade at redhat.com Thu Dec 8 11:05:46 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 8 Dec 2016 12:05:46 +0100 Subject: RFR: C1 cleanup In-Reply-To: <1481194816.2597.39.camel@redhat.com> References: <1481194816.2597.39.camel@redhat.com> Message-ID: <4c504e92-e14d-0e9d-2db7-91259182b88d@redhat.com> On 12/08/2016 12:00 PM, Roman Kennke wrote: > http://cr.openjdk.java.net/~rkennke/c1-cleanup/webrev.00/ *) Shouldn't this assert be on lir_cas_long branch only? Or is it in upstream that (odd) way? 
1600 void LIR_Assembler::emit_compare_and_swap(LIR_OpCompareAndSwap* op) { 1601 assert(VM_Version::supports_cx8(), "wrong machine"); *) Please break this line: 1461 LIR_OpShenandoahWriteBarrier(LIR_Opr obj, LIR_Opr result, CodeEmitInfo* info, bool need_null_check) : LIR_Op1(lir_shenandoah_wb, obj, result, T_OBJECT, lir_patch_none, info), _need_null_check(need_null_check) { Otherwise looks good. Thanks, -Aleksey From roman at kennke.org Thu Dec 8 11:13:00 2016 From: roman at kennke.org (roman at kennke.org) Date: Thu, 08 Dec 2016 11:13:00 +0000 Subject: hg: shenandoah/jdk9/hotspot: C1 cleanup Message-ID: <201612081113.uB8BD0I1029974@aojmv0008.oracle.com> Changeset: e4acea31c079 Author: rkennke Date: 2016-12-08 12:12 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/e4acea31c079 C1 cleanup ! src/cpu/aarch64/vm/c1_LIRAssembler_aarch64.cpp ! src/cpu/aarch64/vm/c1_LIRGenerator_aarch64.cpp ! src/cpu/aarch64/vm/c1_MacroAssembler_aarch64.cpp ! src/cpu/x86/vm/c1_LIRAssembler_x86.cpp ! src/cpu/x86/vm/c1_Runtime1_x86.cpp ! src/share/vm/c1/c1_LIR.cpp ! src/share/vm/c1/c1_LIR.hpp ! src/share/vm/c1/c1_LIRGenerator.cpp ! src/share/vm/c1/c1_Runtime1.cpp ! src/share/vm/c1/c1_Runtime1.hpp From rkennke at redhat.com Thu Dec 8 11:13:23 2016 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 08 Dec 2016 12:13:23 +0100 Subject: RFR: C1 cleanup In-Reply-To: <4c504e92-e14d-0e9d-2db7-91259182b88d@redhat.com> References: <1481194816.2597.39.camel@redhat.com> <4c504e92-e14d-0e9d-2db7-91259182b88d@redhat.com> Message-ID: <1481195603.2597.40.camel@redhat.com> Am Donnerstag, den 08.12.2016, 12:05 +0100 schrieb Aleksey Shipilev: > On 12/08/2016 12:00 PM, Roman Kennke wrote: > > http://cr.openjdk.java.net/~rkennke/c1-cleanup/webrev.00/ > > *) Shouldn't this assert be on lir_cas_long branch only? Or is it in > upstream > that (odd) way? 
> > 1600 void LIR_Assembler::emit_compare_and_swap(LIR_OpCompareAndSwap* op) {
> > 1601   assert(VM_Version::supports_cx8(), "wrong machine");
It's in upstream like this.
> *) Please break this line:
> 1461   LIR_OpShenandoahWriteBarrier(LIR_Opr obj, LIR_Opr result, CodeEmitInfo* info, bool need_null_check) : LIR_Op1(lir_shenandoah_wb, obj, result, T_OBJECT, lir_patch_none, info), _need_null_check(need_null_check) {
>
> Otherwise looks good.
Ok, pushed with that line broken in half. Roman
From shade at redhat.com Thu Dec 8 13:25:03 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 8 Dec 2016 14:25:03 +0100 Subject: RFR (S): Fix shutdown/cancelled races Message-ID:
Hi,
The recent change for early cancellation introduced/exposed a few interesting races in shutdown/cancellation sequence.
First race is on shutdown, and goes like this:
 a) SHHeap::stop() is called.
 b) SHHeap::stop() sets cancelled_gc to "true"
 c) SHConcThread loop detects canceled GC, and tries to exit
 d) SHConcThread fails, because neither full GC nor "terminate" is set
    assert (_do_full_gc || should_terminate(), "Either exiting, or impending Full GC");
 e) SHHeap::stop() eventually calls SHConcThread::stop() to set "terminate", but it is too late.
Fixed by introducing the "graceful shutdown" flag.
Second race is between canceling GC and scheduling a full GC. Goes like this:
 a) ShenandoahHeap::collect() cancels GC
 b) SHConcThread loop detects canceled GC, and tries to exit
 c) SHConcThread fails, because neither full GC nor "terminate" is set
    assert (_do_full_gc || should_terminate(), "Either exiting, or impending Full GC");
 d) ShenandoahHeap::collect() eventually calls into do_full_gc() to set _do_full_gc, but it is too late.
Solved by moving GC cancellation within the do_full_gc method, and canceling after Full GC is scheduled.
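The flag-ordering idea behind the fix can be sketched in isolation. This is a simplified, hypothetical model, not the actual HotSpot code; all names (`ConcThreadModel`, `may_exit_on_cancel`, the `graceful_shutdown` flag) are illustrative, following only the description above:

```cpp
#include <atomic>
#include <cassert>

// Simplified model of the shutdown race: the concurrent GC thread may
// observe a cancelled GC before either the full-GC flag or the terminate
// flag is set, which is exactly when the assert fires. A separate
// "graceful shutdown" flag, raised *before* the GC is cancelled, gives
// the loop an unambiguous reason to exit.
struct ConcThreadModel {
  std::atomic<bool> cancelled_gc{false};
  std::atomic<bool> do_full_gc{false};
  std::atomic<bool> terminate{false};
  std::atomic<bool> graceful_shutdown{false};

  // Models the assert: on a cancelled GC, the loop may only exit if it
  // can tell why the cancellation happened.
  bool may_exit_on_cancel() const {
    return do_full_gc.load() || terminate.load() || graceful_shutdown.load();
  }

  // Racy order (old code): cancel first, explain later. Between the two
  // stores, may_exit_on_cancel() is false and the assert would fire.
  void stop_racy() {
    cancelled_gc.store(true);
    // ... concurrent thread loop may run here and fail the assert ...
    terminate.store(true);
  }

  // Fixed order: announce the shutdown before cancelling the GC.
  void stop_graceful() {
    graceful_shutdown.store(true);
    cancelled_gc.store(true);
    terminate.store(true);
  }
};
```

With `stop_graceful()`, the window between cancelling the GC and setting `terminate` becomes harmless, because `graceful_shutdown` already justifies the exit; the same ordering idea (schedule the Full GC first, cancel afterwards) resolves the second race.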
Both fixes: http://cr.openjdk.java.net/~shade/shenandoah/cancel-races/webrev.01/ Testing: hs_gc_shenandoah (with sleeps in critical places to exacerbate races), jcstress (tests-all) that was failing before. Note that in last week's code both races could have tried to start concurrent mark, or dived to sleep for 10ms, before SHConcThread could not detect it was stopped. It would have exited early by detecting the canceled GC. New code checks that early before doing the GC cycle, in case we slip like that again. Thanks, -Aleksey From rwestrel at redhat.com Thu Dec 8 14:15:02 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 08 Dec 2016 15:15:02 +0100 Subject: Enable optimization of arraycopy as loads/stores with Shenandoah Message-ID: http://cr.openjdk.java.net/~roland/shenandoah/arraycopy/webrev.00/ This re-enables an optimization that was disabled with shenandoah. Roland. From shade at redhat.com Thu Dec 8 14:25:31 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 8 Dec 2016 15:25:31 +0100 Subject: Enable optimization of arraycopy as loads/stores with Shenandoah In-Reply-To: References: Message-ID: On 12/08/2016 03:15 PM, Roland Westrelin wrote: > http://cr.openjdk.java.net/~roland/shenandoah/arraycopy/webrev.00/ *) (sirens sound, line breaking police storming in) Break this line :) 1076 call = make_leaf_call(c, m, OptoRuntime::shenandoah_clone_barrier_Type(), CAST_FROM_FN_PTR(address, SharedRuntime::shenandoah_clone_barrier), "shenandoah_clone_barrier", raw_adr_type, dest->in(AddPNode::Base)); Roman would probably do a more thorough review of this compiler change. 
Thanks, -Aleksey From rkennke at redhat.com Thu Dec 8 14:55:05 2016 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 08 Dec 2016 15:55:05 +0100 Subject: Enable optimization of arraycopy as loads/stores with Shenandoah In-Reply-To: References: Message-ID: <1481208905.2597.45.camel@redhat.com> Am Donnerstag, den 08.12.2016, 15:15 +0100 schrieb Roland Westrelin: > http://cr.openjdk.java.net/~roland/shenandoah/arraycopy/webrev.00/ > > This re-enables an optimization that was disabled with shenandoah. Cool! I like that! Do we have any idea if it does improve performance? That would be arraycopy on smallish arrays only right? Aleksey? This removes the call to SharedRuntime::shenandoah_clone_barrier(). You should also remove that method. I find references of it in : src/share/vm/opto/runtime.cpp src/share/vm/opto/runtime.hpp src/share/vm/opto/escape.cpp src/share/vm/runtime/sharedRuntime.hpp src/share/vm/runtime/sharedRuntime.cpp :-) Also, we really need to trim down shenandoah-specific changes in c2. My idea is to move everything that's more than 2 lines to shenandoahSupport.cpp and have other code in C2 call that. I wanted to do that for the GraphKit::shenandoah_XYZ_barrier() methods, but we seem to be getting more of such stuff :-) That's for another patch though. Roman From rkennke at redhat.com Thu Dec 8 14:57:47 2016 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 08 Dec 2016 15:57:47 +0100 Subject: RFR (S): Fix shutdown/cancelled races In-Reply-To: References: Message-ID: <1481209067.2597.46.camel@redhat.com> Patch looks good. Did you need to change any of the tests? E.g. "with sleeps in critical places to exacerbate races" ?? I can't tell you how often we have 'fixed' this code before... having a test triggering on the bug would be awesome! Roman Am Donnerstag, den 08.12.2016, 14:25 +0100 schrieb Aleksey Shipilev: > Hi, > > The recent change for early cancellation introduced/exposed a few > interesting > races in shutdown/cancellation sequence. 
> > First race is on shutdown, and goes like this:
> a) SHHeap::stop() is called.
> b) SHHeap::stop() sets cancelled_gc to "true"
> c) SHConcThread loop detects canceled GC, and tries to exit
> d) SHConcThread fails, because neither full GC nor "terminate" is set
>    assert (_do_full_gc || should_terminate(), "Either exiting, or impending Full GC");
> e) SHHeap::stop() eventually calls SHConcThread::stop() to set "terminate", but it is too late.
>
> Fixed by introducing the "graceful shutdown" flag.
>
> Second race is between canceling GC and scheduling a full GC. Goes like this:
> a) ShenandoahHeap::collect() cancels GC
> b) SHConcThread loop detects canceled GC, and tries to exit
> c) SHConcThread fails, because neither full GC nor "terminate" is set
>    assert (_do_full_gc || should_terminate(), "Either exiting, or impending Full GC");
> d) ShenandoahHeap::collect() eventually calls into do_full_gc() to set _do_full_gc, but it is too late.
>
> Solved by moving GC cancellation within the do_full_gc method, and canceling after Full GC is scheduled.
>
> Both fixes:
> http://cr.openjdk.java.net/~shade/shenandoah/cancel-races/webrev.01/
>
> Testing: hs_gc_shenandoah (with sleeps in critical places to exacerbate races), jcstress (tests-all) that was failing before.
>
> Note that in last week's code both races could have tried to start concurrent mark, or dived to sleep for 10ms, before SHConcThread could not detect it was stopped. It would have exited early by detecting the canceled GC. New code checks that early before doing the GC cycle, in case we slip like that again.
>
> Thanks,
> -Aleksey
From ashipile at redhat.com Thu Dec 8 15:36:37 2016 From: ashipile at redhat.com (ashipile at redhat.com) Date: Thu, 08 Dec 2016 15:36:37 +0000 Subject: hg: shenandoah/jdk9/hotspot: Fix shutdown/cancelled races.
Message-ID: <201612081536.uB8FabTM018219@aojmv0008.oracle.com> Changeset: 36b281f64016 Author: shade Date: 2016-12-08 16:36 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/36b281f64016 Fix shutdown/cancelled races. ! src/share/vm/gc/shenandoah/shenandoahConcurrentThread.cpp ! src/share/vm/gc/shenandoah/shenandoahConcurrentThread.hpp ! src/share/vm/gc/shenandoah/shenandoahHeap.cpp ! src/share/vm/gc/shenandoah/shenandoahHeap.hpp From shade at redhat.com Thu Dec 8 15:37:40 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 8 Dec 2016 16:37:40 +0100 Subject: RFR (S): Fix shutdown/cancelled races In-Reply-To: <1481209067.2597.46.camel@redhat.com> References: <1481209067.2597.46.camel@redhat.com> Message-ID: <167226b6-3d00-d0c9-c658-e3972f430351@redhat.com> On 12/08/2016 03:57 PM, Roman Kennke wrote: > Patch looks good. Thanks, pushed. > Did you need to change any of the tests? E.g. "with sleeps in critical > places to exacerbate races" ?? I had to put it right at Shenandoah product code to trigger, so not really committable... > I can't tell you how often we have > 'fixed' this code before... having a test triggering on the bug would > be awesome! The "regular" jcstress testing found the races because of new asserts, so I guess we somewhat covered there. Thanks, -Aleksey From shade at redhat.com Thu Dec 8 15:38:27 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 8 Dec 2016 16:38:27 +0100 Subject: Enable optimization of arraycopy as loads/stores with Shenandoah In-Reply-To: <1481208905.2597.45.camel@redhat.com> References: <1481208905.2597.45.camel@redhat.com> Message-ID: <7686428e-154f-4504-afbb-d2272c74633a@redhat.com> On 12/08/2016 03:55 PM, Roman Kennke wrote: > Am Donnerstag, den 08.12.2016, 15:15 +0100 schrieb Roland Westrelin: >> http://cr.openjdk.java.net/~roland/shenandoah/arraycopy/webrev.00/ >> >> This re-enables an optimization that was disabled with shenandoah. > > Cool! I like that! 
> > Do we have any idea if it does improve performance? That would be arraycopy on smallish arrays only right? Aleksey?
Let me find the arraycopy tests (that I swear I did in OpenJDK for the previous Roland's non-Shenandoah patch :) and run them. Thanks, -Aleksey
From rwestrel at redhat.com Thu Dec 8 15:55:12 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 08 Dec 2016 16:55:12 +0100 Subject: Enable optimization of arraycopy as loads/stores with Shenandoah In-Reply-To: <1481208905.2597.45.camel@redhat.com> References: <1481208905.2597.45.camel@redhat.com> Message-ID:
> This removes the call to SharedRuntime::shenandoah_clone_barrier().
It doesn't remove it. It moves it around. It's now only added to those clones that were not converted to loads/stores. Clone when it's not converted to loads/stores uses bulk copies. So it's not obvious to me that we can do better than using the SharedRuntime::shenandoah_clone_barrier() call.
I also added a test > for > any object fields so the call to > SharedRuntime::shenandoah_clone_barrier() is not emitted when it's > obviously not needed. Ah. Oops, my bad :-) Ok to push then. Roman From rwestrel at redhat.com Thu Dec 8 16:01:59 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 08 Dec 2016 17:01:59 +0100 Subject: backport of jdk9 c2 code to 8 Message-ID: http://cr.openjdk.java.net/~roland/shenandoah/jdk9-backport/webrev.00/ I had to import some non shenandoah changes from jdk9 to make my life easier. Roland. From rkennke at redhat.com Thu Dec 8 16:08:04 2016 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 08 Dec 2016 17:08:04 +0100 Subject: Enable optimization of arraycopy as loads/stores with Shenandoah In-Reply-To: References: <1481208905.2597.45.camel@redhat.com> Message-ID: <1481213284.2597.50.camel@redhat.com> Am Donnerstag, den 08.12.2016, 16:55 +0100 schrieb Roland Westrelin: > > This removes the call to SharedRuntime::shenandoah_clone_barrier(). > > It doesn't remove it. It moves it around. It's now only added to > those > clones that were not converted to loads/stores. Clone when it's not > converted to loads/stores uses bulk copies. So it's not obvious to me > that we can do better than using the > SharedRuntime::shenandoah_clone_barrier() call. Now that you say it, I wonder what's done for other GCs. They must be doing something here, to update the card tables. Other arraycopy routines call a special barrier in BarrierSet::static_write_ref_array_post(), this is not suitable for clones, but do they call any barrier for clone too? Or can other GCs ignore it because it's basically initializing stores? 
Roman From rkennke at redhat.com Thu Dec 8 16:09:16 2016 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 08 Dec 2016 17:09:16 +0100 Subject: backport of jdk9 c2 code to 8 In-Reply-To: References: Message-ID: <1481213356.2597.51.camel@redhat.com> Am Donnerstag, den 08.12.2016, 17:01 +0100 schrieb Roland Westrelin: > http://cr.openjdk.java.net/~roland/shenandoah/jdk9-backport/webrev.00 > / > > I had to import some non shenandoah changes from jdk9 to make my life > easier. Looks good to me. Roman From rwestrel at redhat.com Thu Dec 8 16:14:35 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 8 Dec 2016 17:14:35 +0100 Subject: Enable optimization of arraycopy as loads/stores with Shenandoah In-Reply-To: <1481213284.2597.50.camel@redhat.com> References: <1481208905.2597.45.camel@redhat.com> <1481213284.2597.50.camel@redhat.com> Message-ID: <50ca7cab-72a9-5a8d-d1fc-de39279dbffe@redhat.com> > Now that you say it, I wonder what's done for other GCs. They must be > doing something here, to update the card tables. Other arraycopy > routines call a special barrier in > BarrierSet::static_write_ref_array_post(), this is not suitable for > clones, but do they call any barrier for clone too? Or can other GCs > ignore it because it's basically initializing stores? For clone, unless ReduceInitialCardMarks is false, nothing is done AFAICT. Roland. From rkennke at redhat.com Thu Dec 8 16:24:25 2016 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 08 Dec 2016 17:24:25 +0100 Subject: Enable optimization of arraycopy as loads/stores with Shenandoah In-Reply-To: <50ca7cab-72a9-5a8d-d1fc-de39279dbffe@redhat.com> References: <1481208905.2597.45.camel@redhat.com> <1481213284.2597.50.camel@redhat.com> <50ca7cab-72a9-5a8d-d1fc-de39279dbffe@redhat.com> Message-ID: <1481214265.2597.52.camel@redhat.com> Am Donnerstag, den 08.12.2016, 17:14 +0100 schrieb Roland Westrelin: > > Now that you say it, I wonder what's done for other GCs. 
They must > > be > > doing something here, to update the card tables. Other arraycopy > > routines call a special barrier in > > BarrierSet::static_write_ref_array_post(), this is not suitable for > > clones, but do they call any barrier for clone too? Or can other > > GCs > > ignore it because it's basically initializing stores? > > For clone, unless ReduceInitialCardMarks is false, nothing is done > AFAICT. Ok. And what happens when ReduceInitialCardMarks is false? Because this might be what we need. Roman From shade at redhat.com Thu Dec 8 16:37:44 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 8 Dec 2016 17:37:44 +0100 Subject: Enable optimization of arraycopy as loads/stores with Shenandoah In-Reply-To: <7686428e-154f-4504-afbb-d2272c74633a@redhat.com> References: <1481208905.2597.45.camel@redhat.com> <7686428e-154f-4504-afbb-d2272c74633a@redhat.com> Message-ID: <262d5e03-a5b3-43a0-5272-3138fc3da291@redhat.com> On 12/08/2016 04:38 PM, Aleksey Shipilev wrote: > On 12/08/2016 03:55 PM, Roman Kennke wrote: >> Am Donnerstag, den 08.12.2016, 15:15 +0100 schrieb Roland Westrelin: >>> http://cr.openjdk.java.net/~roland/shenandoah/arraycopy/webrev.00/ >>> >>> This re-enables an optimization that was disabled with shenandoah. >> >> Cool! I like that! >> >> Do we have any idea if it does improve performance? That would be arraycopy >> on smallish arrays only right? Aleksey? > > Let me find the arraycopy tests (that I swear I did in OpenJDK for the > previous Roland's non-Shenandoah patch :) and run then. Using this test: http://icedtea.classpath.org/people/shade/gc-bench/file/6d332199876c/src/main/java/org/openjdk/gcbench/runtime/arraycopy/RefArray.java === baseline Benchmark Mode Cnt Score Error Units RefArray.nulls_01 avgt 5 3.987 ? 1.282 ns/op RefArray.nulls_02 avgt 5 4.185 ? 0.145 ns/op RefArray.nulls_04 avgt 5 5.022 ? 0.601 ns/op RefArray.nulls_08 avgt 5 6.421 ? 0.252 ns/op RefArray.nulls_16 avgt 5 8.344 ? 1.012 ns/op RefArray.nulls_32 avgt 5 14.646 ? 
1.486 ns/op
RefArray.nulls_64  avgt  5  28.125 ± 3.523  ns/op
RefArray.objs_01   avgt  5   3.905 ± 0.131  ns/op
RefArray.objs_02   avgt  5   4.267 ± 0.332  ns/op
RefArray.objs_04   avgt  5   4.838 ± 0.064  ns/op
RefArray.objs_08   avgt  5   6.459 ± 0.187  ns/op
RefArray.objs_16   avgt  5   8.610 ± 1.526  ns/op
RefArray.objs_32   avgt  5  14.269 ± 0.536  ns/op
RefArray.objs_64   avgt  5  27.225 ± 0.405  ns/op

=== baseline +UseShenandoahGC
Benchmark          Mode  Cnt   Score   Error  Units
RefArray.nulls_01  avgt  5  16.021 ± 0.379  ns/op
RefArray.nulls_02  avgt  5  15.997 ± 0.137  ns/op
RefArray.nulls_04  avgt  5  16.560 ± 0.342  ns/op
RefArray.nulls_08  avgt  5  16.103 ± 0.070  ns/op
RefArray.nulls_16  avgt  5  17.060 ± 0.285  ns/op
RefArray.nulls_32  avgt  5  18.654 ± 0.092  ns/op
RefArray.nulls_64  avgt  5  30.848 ± 0.948  ns/op
RefArray.objs_01   avgt  5  15.941 ± 0.015  ns/op
RefArray.objs_02   avgt  5  15.953 ± 0.041  ns/op
RefArray.objs_04   avgt  5  16.514 ± 0.059  ns/op
RefArray.objs_08   avgt  5  16.122 ± 0.032  ns/op
RefArray.objs_16   avgt  5  17.110 ± 0.146  ns/op
RefArray.objs_32   avgt  5  19.304 ± 0.622  ns/op
RefArray.objs_64   avgt  5  31.025 ± 0.806  ns/op

=== patched +UseShenandoahGC
Benchmark          Mode  Cnt   Score   Error  Units
RefArray.nulls_01  avgt  5   5.110 ± 0.033  ns/op
RefArray.nulls_02  avgt  5   5.293 ± 0.019  ns/op
RefArray.nulls_04  avgt  5   6.903 ± 0.065  ns/op
RefArray.nulls_08  avgt  5   9.627 ± 0.043  ns/op
RefArray.nulls_16  avgt  5  17.016 ± 0.134  ns/op
RefArray.nulls_32  avgt  5  19.466 ± 2.545  ns/op
RefArray.nulls_64  avgt  5  30.659 ± 0.147  ns/op
RefArray.objs_01   avgt  5   5.171 ± 0.106  ns/op
RefArray.objs_02   avgt  5   5.827 ± 0.013  ns/op
RefArray.objs_04   avgt  5   7.377 ± 0.046  ns/op
RefArray.objs_08   avgt  5   9.353 ± 0.099  ns/op
RefArray.objs_16   avgt  5  17.097 ± 0.434  ns/op
RefArray.objs_32   avgt  5  19.212 ± 0.792  ns/op
RefArray.objs_64   avgt  5  30.818 ± 0.301  ns/op

Good to go. I guess the code quality might be a teeny little better (we've seen this before with null-paths in read barriers being thrown out), but I'll take that too.

 0.82%  1.21%
0x00007f2451477ea1: mov    0x10(%rcx),%r10d
 1.17%  1.04%   0x00007f2451477ea5: test   %r10d,%r10d
                0x00007f2451477ea8: je     0x00007f2451477f01
 2.23%  2.78%   0x00007f2451477eaa: mov    -0x8(%r12,%r10,8),%r10
10.65% 13.44%   0x00007f2451477eaf: mov    %r10,%r11
 0.14%  0.19%   0x00007f2451477eb2: shr    $0x3,%r11
 2.68%  3.36%   0x00007f2451477eb6: mov    %r11d,0x10(%rdx)
 2.46%  3.38%   0x00007f2451477eba: mov    0x14(%rcx),%r11d
 0.05%          0x00007f2451477ebe: test   %r11d,%r11d
                0x00007f2451477ec1: je     0x00007f2451477f06
 0.12%  0.17%   0x00007f2451477ec3: mov    -0x8(%r12,%r11,8),%r10
 0.64%  1.01%   0x00007f2451477ec8: mov    %r10,%r11
 2.42%  2.35%   0x00007f2451477ecb: shr    $0x3,%r11
 0.34%  0.41%   0x00007f2451477ecf: mov    %r11d,0x14(%rdx)
 1.27%  1.26%   0x00007f2451477ed3: mov    0x18(%rcx),%r11d
 0.10%  0.09%   0x00007f2451477ed7: test   %r11d,%r11d
                0x00007f2451477eda: je     0x00007f2451477f0b
 1.77%  1.35%   0x00007f2451477edc: mov    -0x8(%r12,%r11,8),%r10
 0.36%  0.51%   0x00007f2451477ee1: mov    %r10,%r11
 1.10%  1.33%   0x00007f2451477ee4: shr    $0x3,%r11
 0.24%  0.19%   0x00007f2451477ee8: mov    %r11d,0x18(%rdx)
 1.77%  1.36%   0x00007f2451477eec: mov    0x1c(%rcx),%r11d
 0.02%  0.03%   0x00007f2451477ef0: test   %r11d,%r11d
                0x00007f2451477ef3: jne    0x00007f2451477dd1
                0x00007f2451477ef9: xor    %r10,%r10
                0x00007f2451477efc: jmpq   0x00007f2451477dda
                0x00007f2451477f01: xor    %r11,%r11
                0x00007f2451477f04: jmp    0x00007f2451477eb6
                0x00007f2451477f06: xor    %r11,%r11
                0x00007f2451477f09: jmp    0x00007f2451477ecf
                0x00007f2451477f0b: xor    %r11,%r11
0x00007f2451477f0e: jmp 0x00007f2451477ee8 -Aleksey From rwestrel at redhat.com Thu Dec 8 16:41:23 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 08 Dec 2016 17:41:23 +0100 Subject: Enable optimization of arraycopy as loads/stores with Shenandoah In-Reply-To: <1481214265.2597.52.camel@redhat.com> References: <1481208905.2597.45.camel@redhat.com> <1481213284.2597.50.camel@redhat.com> <50ca7cab-72a9-5a8d-d1fc-de39279dbffe@redhat.com> <1481214265.2597.52.camel@redhat.com> Message-ID: > Ok. And what happens when ReduceInitialCardMarks is false? Because this > might be what we need. For instance clone: post_barrier(control(), memory(raw_adr_type), alloc_obj, no_particular_field, raw_adr_idx, no_particular_value, T_OBJECT, false); void GraphKit::post_barrier(Node* ctl, Node* store, Node* obj, Node* adr, uint adr_idx, Node* val, BasicType bt, bool use_precise) { BarrierSet* bs = Universe::heap()->barrier_set(); set_control(ctl); switch (bs->kind()) { case BarrierSet::G1SATBCTLogging: g1_write_barrier_post(store, obj, adr, adr_idx, val, bt, use_precise); break; case BarrierSet::CardTableForRS: case BarrierSet::CardTableExtension: write_barrier_post(store, obj, adr, adr_idx, val, use_precise); break; case BarrierSet::ModRef: case BarrierSet::ShenandoahBarrierSet: break; default : ShouldNotReachHere(); } } For array clone, if I follow the logic correctly arrayof_oop_disjoint_arraycopy stub. The shenandoah clone barrier is a no-op unless ShenandoahBarrierSet::need_update_refs_barrier() is true. If it's false often enough, then it seems a reasonable trade off to do the bulk copy and have an extra call. Roland. From roman at kennke.org Thu Dec 8 16:48:50 2016 From: roman at kennke.org (roman at kennke.org) Date: Thu, 08 Dec 2016 16:48:50 +0000 Subject: hg: shenandoah/jdk8u/hotspot: Added dummy arg consumer to pseudo-logging code to be able to build release. 
Message-ID: <201612081648.uB8Gmol0014349@aojmv0008.oracle.com> Changeset: db98996d26b2 Author: rkennke Date: 2016-12-08 17:48 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/db98996d26b2 Added dummy arg consumer to pseudo-logging code to be able to build release. ! src/share/vm/gc_implementation/shenandoah/shenandoahLogging.hpp
From rkennke at redhat.com Thu Dec 8 16:52:05 2016 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 08 Dec 2016 17:52:05 +0100 Subject: Enable optimization of arraycopy as loads/stores with Shenandoah In-Reply-To: References: <1481208905.2597.45.camel@redhat.com> <1481213284.2597.50.camel@redhat.com> <50ca7cab-72a9-5a8d-d1fc-de39279dbffe@redhat.com> <1481214265.2597.52.camel@redhat.com> Message-ID: <1481215925.2597.54.camel@redhat.com>
Am Donnerstag, den 08.12.2016, 17:41 +0100 schrieb Roland Westrelin:
> > Ok. And what happens when ReduceInitialCardMarks is false? Because this might be what we need.
>
> For instance clone:
>
>     post_barrier(control(),
>                  memory(raw_adr_type),
>                  alloc_obj,
>                  no_particular_field,
>                  raw_adr_idx,
>                  no_particular_value,
>                  T_OBJECT,
>                  false);
>
> void GraphKit::post_barrier(Node* ctl,
>                             Node* store,
>                             Node* obj,
>                             Node* adr,
>                             uint  adr_idx,
>                             Node* val,
>                             BasicType bt,
>                             bool use_precise) {
>   BarrierSet* bs = Universe::heap()->barrier_set();
>   set_control(ctl);
>   
switch (bs->kind()) { > ????case BarrierSet::G1SATBCTLogging: > ??????g1_write_barrier_post(store, obj, adr, adr_idx, val, bt, > use_precise); > ??????break; > > ????case BarrierSet::CardTableForRS: > ????case BarrierSet::CardTableExtension: > ??????write_barrier_post(store, obj, adr, adr_idx, val, use_precise); > ??????break; > > ????case BarrierSet::ModRef: > ????case BarrierSet::ShenandoahBarrierSet: > ??????break; > > ????default??????: > ??????ShouldNotReachHere(); > > ? } > } > > For array clone, if I follow the logic correctly > arrayof_oop_disjoint_arraycopy stub. > > The shenandoah clone barrier is a no-op unless > ShenandoahBarrierSet::need_update_refs_barrier() is true. If it's > false > often enough, then it seems a reasonable trade off to do the bulk > copy > and have an extra call. Ok. I know I went through this a while ago, but needed a refresher ;-) Thanks, Roman From rkennke at redhat.com Thu Dec 8 16:54:52 2016 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 08 Dec 2016 17:54:52 +0100 Subject: RFR (XL): Backport JDK9 Shenandoah to JDK8u In-Reply-To: <1481059464.2597.29.camel@redhat.com> References: <1481059464.2597.29.camel@redhat.com> Message-ID: <1481216092.2597.56.camel@redhat.com> I just pushed this little attendum without review: http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/db98996d26b2 It consumes the arguments for our dummy-logging in release builds. Same hack as in JDK9 logging. Roman Am Dienstag, den 06.12.2016, 22:24 +0100 schrieb Roman Kennke: > This huge change backports the current state of JDK9 (minus the last > bunch of patches) to jdk8u: > > http://cr.openjdk.java.net/~rkennke/backport-jdk8/webrev.00/ > > Not sure if this can be reasonably reviewed. ;-) > > I checked this line by line and also compared it to our baseline repo > ( > http://hg.openjdk.java.net/aarch64-port/jdk8u/hotspot/). > > The one thing missing is changes in src/share/vm/opto and > src/share/vm/adlc, but Roland is working on those. 
>
> I've checked with SPECjvm2008 and jcstress is on the way.
> Unfortunately, I could not get the jtreg stuff to work.
>
> Ok to go in?
>
> Roman

From rkennke at redhat.com Thu Dec 8 17:08:48 2016
From: rkennke at redhat.com (Roman Kennke)
Date: Thu, 08 Dec 2016 18:08:48 +0100
Subject: RFR: More cleanup
Message-ID: <1481216928.2597.59.camel@redhat.com>

This removes some more unnecessary diffs between jdk9 baseline and shenandoah:

http://cr.openjdk.java.net/~rkennke/cleanup/webrev.00/

There's not much of significance here, except that it brings our repo closer to our baseline. The two most interesting ones:

- In ThreadLocalAllocBuffer, we added 1 extra word for the brooks ptr to the end reserve. I believe this was added a long time ago, and probably for the bug we resolved yesterday :-) In any case, it's not needed.

- In JVM_Clone, we're doing a read-barrier when sticking an oop into a Handle. However, there's no guarantee, when we're crossing a safepoint there, that the oop is still valid for reading. Barriers should therefore always be done after pulling the oop out of the Handle. Done with this patch.

Ok?

Roman

From rwestrel at redhat.com Fri Dec 9 08:30:48 2016
From: rwestrel at redhat.com (rwestrel at redhat.com)
Date: Fri, 09 Dec 2016 08:30:48 +0000
Subject: hg: shenandoah/jdk8u/hotspot: backport shenandoah C2 support from jdk9
Message-ID: <201612090830.uB98UmSx026466@aojmv0008.oracle.com>

Changeset: da17b9cffd4f Author: roland Date: 2016-12-08 13:28 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/da17b9cffd4f

backport shenandoah C2 support from jdk9

! src/share/vm/adlc/formssel.cpp ! src/share/vm/adlc/formssel.hpp ! src/share/vm/adlc/output_c.cpp ! src/share/vm/adlc/output_h.cpp ! src/share/vm/ci/ciInstanceKlass.cpp ! src/share/vm/ci/ciInstanceKlass.hpp ! src/share/vm/opto/addnode.cpp ! src/share/vm/opto/callGenerator.cpp ! src/share/vm/opto/callnode.cpp ! src/share/vm/opto/callnode.hpp ! src/share/vm/opto/cfgnode.cpp !
src/share/vm/opto/compile.cpp ! src/share/vm/opto/compile.hpp ! src/share/vm/opto/connode.cpp ! src/share/vm/opto/connode.hpp ! src/share/vm/opto/escape.cpp ! src/share/vm/opto/gcm.cpp ! src/share/vm/opto/graphKit.cpp ! src/share/vm/opto/graphKit.hpp ! src/share/vm/opto/lcm.cpp ! src/share/vm/opto/library_call.cpp ! src/share/vm/opto/loopPredicate.cpp ! src/share/vm/opto/loopTransform.cpp ! src/share/vm/opto/loopnode.cpp ! src/share/vm/opto/loopnode.hpp ! src/share/vm/opto/loopopts.cpp ! src/share/vm/opto/machnode.cpp ! src/share/vm/opto/machnode.hpp ! src/share/vm/opto/macro.cpp ! src/share/vm/opto/matcher.cpp ! src/share/vm/opto/matcher.hpp ! src/share/vm/opto/memnode.cpp ! src/share/vm/opto/memnode.hpp ! src/share/vm/opto/multnode.cpp ! src/share/vm/opto/multnode.hpp ! src/share/vm/opto/node.cpp ! src/share/vm/opto/node.hpp ! src/share/vm/opto/parse2.cpp ! src/share/vm/opto/parse3.cpp ! src/share/vm/opto/phaseX.cpp ! src/share/vm/opto/phaseX.hpp ! src/share/vm/opto/runtime.cpp ! src/share/vm/opto/runtime.hpp ! src/share/vm/opto/shenandoahSupport.cpp ! src/share/vm/opto/shenandoahSupport.hpp ! src/share/vm/opto/stringopts.cpp ! src/share/vm/opto/subnode.cpp ! src/share/vm/opto/superword.cpp ! src/share/vm/opto/superword.hpp ! src/share/vm/opto/type.cpp ! src/share/vm/opto/type.hpp From rwestrel at redhat.com Fri Dec 9 09:04:28 2016 From: rwestrel at redhat.com (rwestrel at redhat.com) Date: Fri, 09 Dec 2016 09:04:28 +0000 Subject: hg: shenandoah/jdk9/hotspot: Enable optimization of arraycopy as loads/stores with Shenandoah Message-ID: <201612090904.uB994Shq005569@aojmv0008.oracle.com> Changeset: f61052a4dd46 Author: roland Date: 2016-12-08 14:45 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/f61052a4dd46 Enable optimization of arraycopy as loads/stores with Shenandoah ! src/share/vm/ci/ciInstanceKlass.cpp ! src/share/vm/ci/ciInstanceKlass.hpp ! src/share/vm/opto/arraycopynode.cpp ! src/share/vm/opto/arraycopynode.hpp ! 
src/share/vm/opto/library_call.cpp ! src/share/vm/opto/macro.hpp ! src/share/vm/opto/macroArrayCopy.cpp ! src/share/vm/opto/shenandoahSupport.cpp From rwestrel at redhat.com Fri Dec 9 09:31:49 2016 From: rwestrel at redhat.com (rwestrel at redhat.com) Date: Fri, 09 Dec 2016 09:31:49 +0000 Subject: hg: shenandoah/jdk9/hotspot: replace barrier's input with barrier's output in all dominated uses to decrease pressure on register allocator Message-ID: <201612090931.uB99Vntp013056@aojmv0008.oracle.com> Changeset: 577da6ba5a48 Author: roland Date: 2016-12-02 16:49 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/577da6ba5a48 replace barrier's input with barrier's output in all dominated uses to decrease pressure on register allocator ! src/share/vm/gc/shenandoah/shenandoahHeap.cpp ! src/share/vm/gc/shenandoah/shenandoahHeap.hpp ! src/share/vm/opto/block.hpp ! src/share/vm/opto/lcm.cpp ! src/share/vm/opto/loopnode.hpp ! src/share/vm/opto/shenandoahSupport.cpp From shade at redhat.com Fri Dec 9 10:58:29 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 9 Dec 2016 11:58:29 +0100 Subject: RFR: More cleanup In-Reply-To: <1481216928.2597.59.camel@redhat.com> References: <1481216928.2597.59.camel@redhat.com> Message-ID: On 12/08/2016 06:08 PM, Roman Kennke wrote: > This removes some more unnecessary diffs between jdk9 baseline and > shenandoah: > > http://cr.openjdk.java.net/~rkennke/cleanup/webrev.00/ Looks okay to me. -Aleksey From roman at kennke.org Fri Dec 9 11:02:11 2016 From: roman at kennke.org (roman at kennke.org) Date: Fri, 09 Dec 2016 11:02:11 +0000 Subject: hg: shenandoah/jdk9/hotspot: More cleanup Message-ID: <201612091102.uB9B2BGv006760@aojmv0008.oracle.com> Changeset: be1010acc2ff Author: rkennke Date: 2016-12-09 12:01 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/be1010acc2ff More cleanup ! src/cpu/aarch64/vm/aarch64.ad ! src/cpu/aarch64/vm/macroAssembler_aarch64.cpp ! 
src/cpu/aarch64/vm/macroAssembler_aarch64.hpp ! src/cpu/aarch64/vm/methodHandles_aarch64.cpp ! src/cpu/aarch64/vm/sharedRuntime_aarch64.cpp ! src/cpu/aarch64/vm/stubGenerator_aarch64.cpp ! src/cpu/aarch64/vm/stubRoutines_aarch64.hpp ! src/cpu/aarch64/vm/templateInterpreterGenerator_aarch64.cpp ! src/share/vm/classfile/systemDictionary.cpp ! src/share/vm/gc/serial/genMarkSweep.cpp ! src/share/vm/gc/shared/gcCause.hpp ! src/share/vm/gc/shared/threadLocalAllocBuffer.cpp ! src/share/vm/oops/cpCache.cpp ! src/share/vm/oops/instanceRefKlass.inline.hpp ! src/share/vm/oops/objArrayOop.hpp ! src/share/vm/oops/oop.cpp ! src/share/vm/prims/jni.cpp ! src/share/vm/prims/jvm.cpp ! src/share/vm/runtime/arguments.cpp ! src/share/vm/runtime/safepoint.cpp ! src/share/vm/services/heapDumper.cpp From shade at redhat.com Mon Dec 12 11:17:20 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 12 Dec 2016 12:17:20 +0100 Subject: RFR (S): Heap dump support Message-ID: Hi, I have been trying to analyze the cause for OOME with Shenandoah, only to figure that Shenandoah does not support heap dumping (d'oh). Solved this by implementing ShenandoahHeap::safe_object_iterate: http://cr.openjdk.java.net/~shade/shenandoah/heapdump-support/webrev.01/ Testing: manual heap dumps with fastdebug/release. 
Thanks, -Aleksey From shade at redhat.com Mon Dec 12 14:36:04 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 12 Dec 2016 15:36:04 +0100 Subject: RFR (S): Fix races on full GC request Message-ID: Hi, There is yet another semi-race in scheduling Full GC, here: 173 void ShenandoahConcurrentThread::do_full_gc(GCCause::Cause cause) { 175 assert(Thread::current()->is_Java_thread(), "expect Java thread here"); 176 177 MonitorLockerEx ml(&_full_gc_lock); 178 schedule_full_gc(); // sets _do_full_gc = true 179 _full_gc_cause = cause; 180 181 // Now that full GC is scheduled, we can abort everything else 182 ShenandoahHeap::heap()->cancel_concgc(cause); 183 184 while (_do_full_gc) { 185 ml.wait(); 186 OrderAccess::storeload(); 187 } 188 assert(!_do_full_gc, "expect full GC to have completed"); 189 } If there is a thread that blocked on _full_gc_lock when Full GC had started, but re-entered after Full GC is completed, it would try to schedule full GC / cancel conc GC again! This mostly happens when full GCs are really short. In our current code, this also fails the assert in Shenandoah control thread that every cancellation should have a reason, like impending full GC. This interesting result is because there are racy unlocked gets of _do_full_gc in assertion code. Both are solved by turning _do_full_gc updates atomic/lock-free, and using the lock only for wait/notifies: http://cr.openjdk.java.net/~shade/shenandoah/cancel-races-again/webrev.02/ Testing: hotspot_gc_shenandoah, jcstress Thanks, -Aleksey P.S. I swear to God, another race there, and I will burn the entire termination protocol thing down. 
From zgu at redhat.com Mon Dec 12 14:48:01 2016 From: zgu at redhat.com (Zhengyu Gu) Date: Mon, 12 Dec 2016 09:48:01 -0500 Subject: RFR (S): Heap dump support In-Reply-To: References: Message-ID: <810fe37d-1965-aa52-8103-a600064bc7a1@redhat.com> Hi Aleksey, ShenandoahSafeObjectIterateAdjustPtrsClosure seems a duplicate of ShenandoahAdjustPointersClosure in shenandoahMarkCompact.cpp. Thanks, -Zhengyu On 12/12/2016 06:17 AM, Aleksey Shipilev wrote: > Hi, > > I have been trying to analyze the cause for OOME with Shenandoah, only to figure > that Shenandoah does not support heap dumping (d'oh). > > Solved this by implementing ShenandoahHeap::safe_object_iterate: > http://cr.openjdk.java.net/~shade/shenandoah/heapdump-support/webrev.01/ > > Testing: manual heap dumps with fastdebug/release. > > Thanks, > -Aleksey > From shade at redhat.com Mon Dec 12 14:58:19 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 12 Dec 2016 15:58:19 +0100 Subject: RFR (S): Heap dump support In-Reply-To: <810fe37d-1965-aa52-8103-a600064bc7a1@redhat.com> References: <810fe37d-1965-aa52-8103-a600064bc7a1@redhat.com> Message-ID: <926cec20-2b26-83b9-bccd-17fe2e78ae65@redhat.com> On 12/12/2016 03:48 PM, Zhengyu Gu wrote: > ShenandoahSafeObjectIterateAdjustPtrsClosure seems a duplicate of > ShenandoahAdjustPointersClosure in shenandoahMarkCompact.cpp. Yes, except that mark-compact bypasses the usual fwdptr verification checks with BrooksPointer::get_raw, which I don't want to do in regular code. Also, I thought copying it would be more straightforward than making it shared. We should clean up all these closures at once in some shared file, like g1OopClosures.* do. 
Thanks, -Aleksey

From zgu at redhat.com Mon Dec 12 15:13:13 2016
From: zgu at redhat.com (Zhengyu Gu)
Date: Mon, 12 Dec 2016 10:13:13 -0500
Subject: RFR (S): Heap dump support
In-Reply-To: <926cec20-2b26-83b9-bccd-17fe2e78ae65@redhat.com>
References: <810fe37d-1965-aa52-8103-a600064bc7a1@redhat.com> <926cec20-2b26-83b9-bccd-17fe2e78ae65@redhat.com>
Message-ID: <18b0f407-b116-5101-cd72-65d9a19da53b@redhat.com>

Okay.

Thanks, -Zhengyu

On 12/12/2016 09:58 AM, Aleksey Shipilev wrote:
> On 12/12/2016 03:48 PM, Zhengyu Gu wrote:
>> ShenandoahSafeObjectIterateAdjustPtrsClosure seems a duplicate of
>> ShenandoahAdjustPointersClosure in shenandoahMarkCompact.cpp.
> Yes, except that mark-compact bypasses the usual fwdptr verification checks with
> BrooksPointer::get_raw, which I don't want to do in regular code.
>
> Also, I thought copying it would be more straightforward than making it shared.
> We should clean up all these closures at once in some shared file, like
> g1OopClosures.* do.
>
> Thanks,
> -Aleksey
>

From rkennke at redhat.com Mon Dec 12 15:16:42 2016
From: rkennke at redhat.com (Roman Kennke)
Date: Mon, 12 Dec 2016 16:16:42 +0100
Subject: RFR (S): Heap dump support
In-Reply-To: References: Message-ID: <1481555802.2597.77.camel@redhat.com>

Hi Aleksey, this would report evacuated objects twice, right? Maybe simply skip cset regions?

Not sure we need to update references. Seems like extra unnecessary work. Calling code should do the appropriate read barriers, or receive opaque JNI handles.

I believe the straightforward way to implement this is to simply delegate to marked_object_iterate() but only for non-cset regions.

Roman

On Monday, 12.12.2016 at 12:17 +0100, Aleksey Shipilev wrote:
> Hi,
>
> I have been trying to analyze the cause for OOME with Shenandoah,
> only to figure that Shenandoah does not support heap dumping (d'oh).
>
> Solved this by implementing ShenandoahHeap::safe_object_iterate:
> 
http://cr.openjdk.java.net/~shade/shenandoah/heapdump-support/webrev.01/
>
> Testing: manual heap dumps with fastdebug/release.
>
> Thanks,
> -Aleksey
>

From rkennke at redhat.com Mon Dec 12 15:20:12 2016
From: rkennke at redhat.com (Roman Kennke)
Date: Mon, 12 Dec 2016 16:20:12 +0100
Subject: RFR (S): Fix races on full GC request
In-Reply-To: References: Message-ID: <1481556012.2597.79.camel@redhat.com>

Hi,

> 173 void ShenandoahConcurrentThread::do_full_gc(GCCause::Cause cause) {
> 175   assert(Thread::current()->is_Java_thread(), "expect Java thread here");
> 176
> 177   MonitorLockerEx ml(&_full_gc_lock);
> 178   schedule_full_gc(); // sets _do_full_gc = true
> 179   _full_gc_cause = cause;
> 180
> 181   // Now that full GC is scheduled, we can abort everything else
> 182   ShenandoahHeap::heap()->cancel_concgc(cause);
> 183
> 184   while (_do_full_gc) {
> 185     ml.wait();
> 186     OrderAccess::storeload();
> 187   }
> 188   assert(!_do_full_gc, "expect full GC to have completed");
> 189 }
>
> If there is a thread that blocked on _full_gc_lock when Full GC had
> started, but re-entered after Full GC is completed, it would try to
> schedule full GC / cancel conc GC again! This mostly happens when full
> GCs are really short.
>
> In our current code, this also fails the assert in Shenandoah control
> thread that every cancellation should have a reason, like impending
> full GC. This interesting result is because there are racy unlocked
> gets of _do_full_gc in assertion code.
>
> Both are solved by turning _do_full_gc updates atomic/lock-free, and
> using the lock only for wait/notifies:
>   http://cr.openjdk.java.net/~shade/shenandoah/cancel-races-again/webrev.02/

Looks good to me.

> P.S. I swear to God, another race there, and I will burn the entire
> termination protocol thing down.

:-D I can't count how many races and strange conditions we already fixed there.
One entire problem class went *puff* away when Zhengyu suggested to simplify JNI critical regions. I seriously hope it's the last one. Otherwise we simply give up on terminating? :-P Roman From shade at redhat.com Mon Dec 12 16:02:14 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 12 Dec 2016 17:02:14 +0100 Subject: RFR (S): Heap dump support In-Reply-To: <1481555802.2597.77.camel@redhat.com> References: <1481555802.2597.77.camel@redhat.com> Message-ID: <492adb14-2bb0-1b2b-8eed-35666eb8e465@redhat.com> On 12/12/2016 04:16 PM, Roman Kennke wrote: > this would report evacuated objects twice, right? Maybe simply skip > cset regions? Right, I missed the double-counting here! > Not sure we need to update references. Seems like extra unnecessary > work. Calling code should do the appropriate read barriers, or receive > opaque JNI handles. Except that HeapDumper does not have read barriers, and does abominable things like accessing field with naked (oop + field_offset). A little more safety for "safe_*" iteration method would not hurt. > I believe the straightforward way to implement this is to simply > delegate to marked_object_iterate() but only for non-cset regions. Right. A little less straightforward way is to reuse flagged heap_region_iterate() for this. See: http://cr.openjdk.java.net/~shade/shenandoah/heapdump-support/webrev.02/ Thanks, -Aleksey From roman at kennke.org Mon Dec 12 16:03:22 2016 From: roman at kennke.org (roman at kennke.org) Date: Mon, 12 Dec 2016 16:03:22 +0000 Subject: hg: shenandoah/jdk8u/hotspot: Added missing include of oop closures. Fixes linking problem. Message-ID: <201612121603.uBCG3MR5027600@aojmv0008.oracle.com> Changeset: 88c8ad7d034b Author: rkennke Date: 2016-12-12 17:03 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/88c8ad7d034b Added missing include of oop closures. Fixes linking problem. ! 
src/share/vm/gc_implementation/shenandoah/shenandoahConcurrentMark.cpp

From rkennke at redhat.com Mon Dec 12 16:04:40 2016
From: rkennke at redhat.com (Roman Kennke)
Date: Mon, 12 Dec 2016 17:04:40 +0100
Subject: RFR (S): Heap dump support
In-Reply-To: <492adb14-2bb0-1b2b-8eed-35666eb8e465@redhat.com>
References: <1481555802.2597.77.camel@redhat.com> <492adb14-2bb0-1b2b-8eed-35666eb8e465@redhat.com>
Message-ID: <1481558680.2597.80.camel@redhat.com>

On Monday, 12.12.2016 at 17:02 +0100, Aleksey Shipilev wrote:
> On 12/12/2016 04:16 PM, Roman Kennke wrote:
> > this would report evacuated objects twice, right? Maybe simply skip
> > cset regions?
>
> Right, I missed the double-counting here!
>
> > Not sure we need to update references. Seems like extra unnecessary
> > work. Calling code should do the appropriate read barriers, or
> > receive opaque JNI handles.
>
> Except that HeapDumper does not have read barriers, and does
> abominable things like accessing field with naked (oop + field_offset).
> A little more safety for "safe_*" iteration method would not hurt.

Yep. And we're iterating over everything anyway, so we can just as well fix the ptrs.

> > I believe the straightforward way to implement this is to simply
> > delegate to marked_object_iterate() but only for non-cset regions.
>
> Right. A little less straightforward way is to reuse flagged
> heap_region_iterate() for this. See:
>   http://cr.openjdk.java.net/~shade/shenandoah/heapdump-support/webrev.02/

Good! Please push!

Roman

From ashipile at redhat.com Mon Dec 12 16:16:56 2016
From: ashipile at redhat.com (ashipile at redhat.com)
Date: Mon, 12 Dec 2016 16:16:56 +0000
Subject: hg: shenandoah/jdk9/hotspot: 2 new changesets
Message-ID: <201612121616.uBCGGuBR001285@aojmv0008.oracle.com>

Changeset: 582651ecf809 Author: shade Date: 2016-12-12 17:06 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/582651ecf809

Heap dump support

!
src/share/vm/gc/shenandoah/shenandoahHeap.cpp

Changeset: aef414e15af5 Author: shade Date: 2016-12-12 17:08 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/aef414e15af5

Fix another Full GC trigger race

! src/share/vm/gc/shenandoah/shenandoahConcurrentThread.cpp ! src/share/vm/gc/shenandoah/shenandoahConcurrentThread.hpp ! src/share/vm/gc/shenandoah/shenandoahHeap.cpp

From shade at redhat.com Mon Dec 12 16:35:29 2016
From: shade at redhat.com (Aleksey Shipilev)
Date: Mon, 12 Dec 2016 17:35:29 +0100
Subject: RFR (XS): Enable -XX:+HeapDump{Before|After}FullGC
Message-ID: <4776529a-0485-6067-8da7-549943d2e39f@redhat.com>

Hi,

Little change: make Shenandoah dump the heap before/after Full GC, if requested, like any diagnosable collector should do:

http://cr.openjdk.java.net/~shade/shenandoah/heapdumps-before-after/webrev.01/

Thanks, -Aleksey

From rkennke at redhat.com Mon Dec 12 17:25:03 2016
From: rkennke at redhat.com (Roman Kennke)
Date: Mon, 12 Dec 2016 18:25:03 +0100
Subject: RFR (XS): Enable -XX:+HeapDump{Before|After}FullGC
In-Reply-To: <4776529a-0485-6067-8da7-549943d2e39f@redhat.com>
References: <4776529a-0485-6067-8da7-549943d2e39f@redhat.com>
Message-ID: <1481563503.2597.82.camel@redhat.com>

Yes

On Monday, 12.12.2016 at 17:35 +0100, Aleksey Shipilev wrote:
> Hi,
>
> Little change: make Shenandoah dump the heap before/after Full GC, if
> requested, like any diagnosable collector should do:
> http://cr.openjdk.java.net/~shade/shenandoah/heapdumps-before-after/webrev.01/
>
> Thanks,
> -Aleksey
>

From ashipile at redhat.com Mon Dec 12 17:26:29 2016
From: ashipile at redhat.com (ashipile at redhat.com)
Date: Mon, 12 Dec 2016 17:26:29 +0000
Subject: hg: shenandoah/jdk9/hotspot: Enable -XX:+HeapDump{Before|After}FullGC.
Message-ID: <201612121726.uBCHQTRl018865@aojmv0008.oracle.com>

Changeset: 6f8831470752 Author: shade Date: 2016-12-12 18:26 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/6f8831470752

Enable -XX:+HeapDump{Before|After}FullGC.

! src/share/vm/gc/shenandoah/shenandoahCollectorPolicy.cpp ! src/share/vm/gc/shenandoah/shenandoahCollectorPolicy.hpp ! src/share/vm/gc/shenandoah/shenandoahMarkCompact.cpp

From shade at redhat.com Tue Dec 13 09:47:17 2016
From: shade at redhat.com (Aleksey Shipilev)
Date: Tue, 13 Dec 2016 10:47:17 +0100
Subject: Perf: wasted region after humongous alloc?
Message-ID: <4673219a-e9f4-fa0e-1d3f-bf5204537b6c@redhat.com>

Hi,

Been playing with tests, and realized we have a peculiar behavior when allocating humongous objects, e.g. in:

public class Alloc {
  static final int SIZE = Integer.getInteger("size", 2_000_000);
  static Object sink;

  public static void main(String... args) throws Exception {
    for (int c = 0; c < 1000000; c++) {
      sink = new int[SIZE];
    }
  }
}

The region logging prints this:

...
region 238, used = 4194304, live = 0, flags =
region 239, used = 4194304, live = 0, flags =
region 240, used = 0, live = 0, flags =
region 241, used = 4194304, live = 0, flags =
region 242, used = 4194304, live = 0, flags =
region 243, used = 0, live = 0, flags =
region 244, used = 4194304, live = 0, flags =
region 245, used = 4194304, live = 0, flags =
region 246, used = 0, live = 0, flags =
...

So there seems to be an empty region right after the humongous allocation. Are we wasting it intentionally, or is it a bug? Seems wasteful either way.

Thanks, -Aleksey

From rkennke at redhat.com Tue Dec 13 09:55:15 2016
From: rkennke at redhat.com (Roman Kennke)
Date: Tue, 13 Dec 2016 10:55:15 +0100
Subject: Perf: wasted region after humongous alloc?
In-Reply-To: <4673219a-e9f4-fa0e-1d3f-bf5204537b6c@redhat.com>
References: <4673219a-e9f4-fa0e-1d3f-bf5204537b6c@redhat.com>
Message-ID: <1481622915.2597.92.camel@redhat.com>

It's probably because we're unconditionally skipping to the next region in ShenandoahFreeSet::claim_contiguous(), assuming that normally the 'current' region is already allocated into. This might not be the case though, especially when commonly allocating region-sized TLABs.

In any case, it is wasteful. Do you want to look into this?

Roman

On Tuesday, 13.12.2016 at 10:47 +0100, Aleksey Shipilev wrote:
> Hi,
>
> Been playing with tests, and realized we have a peculiar behavior
> when allocating humongous objects, e.g. in:
>
> public class Alloc {
>   static final int SIZE = Integer.getInteger("size", 2_000_000);
>   static Object sink;
>
>   public static void main(String... args) throws Exception {
>     for (int c = 0; c < 1000000; c++) {
>       sink = new int[SIZE];
>     }
>   }
> }
>
> The region logging prints this:
>
> ...
> region 238, used = 4194304, live = 0, flags =
> region 239, used = 4194304, live = 0, flags =
> region 240, used = 0, live = 0, flags =
> region 241, used = 4194304, live = 0, flags =
> region 242, used = 4194304, live = 0, flags =
> region 243, used = 0, live = 0, flags =
> region 244, used = 4194304, live = 0, flags =
> region 245, used = 4194304, live = 0, flags =
> region 246, used = 0, live = 0, flags =
> ...
>
> So there seems to be an empty region right after the humongous
> allocation. Are we wasting it intentionally, or is it a bug? Seems
> wasteful either way.
>
> Thanks,
> -Aleksey
>

From shade at redhat.com Tue Dec 13 10:11:06 2016
From: shade at redhat.com (Aleksey Shipilev)
Date: Tue, 13 Dec 2016 11:11:06 +0100
Subject: Perf: wasted region after humongous alloc?
In-Reply-To: <1481622915.2597.92.camel@redhat.com> References: <4673219a-e9f4-fa0e-1d3f-bf5204537b6c@redhat.com> <1481622915.2597.92.camel@redhat.com> Message-ID: <3f85aeb6-062c-11a4-e3c4-ddc5775bc2f5@redhat.com> Aha, this decision is odd, and as the example below shows, it wastes regions. Please take it into your work queue? Thanks, -Aleksey On 12/13/2016 10:55 AM, Roman Kennke wrote: > It's probably because we're unconditionally skipping to the next region > in ShenandoahFreeSet::claim_contiguous(), assuming that normally the > 'current' region is already allocated into. This might not be the case > though, especially when commonly allocating region-sized TLABs. > > In any case, it is wasteful. Do you want to look into this? > > Roman > > Am Dienstag, den 13.12.2016, 10:47 +0100 schrieb Aleksey Shipilev: >> Hi, >> >> Been playing with tests, and realized we have an peculiar behavior >> when >> allocating humongous objects, e.g. in: >> >> public class Alloc { >> static final int SIZE = Integer.getInteger("size", 2_000_000); >> static Object sink; >> >> public static void main(String... args) throws Exception { >> for (int c = 0; c < 1000000; c++) { >> sink = new int[SIZE]; >> } >> } >> } >> >> The region logging prints this: >> >> ... >> region 238, used = 4194304, live = 0, flags = >> region 239, used = 4194304, live = 0, flags = >> region 240, used = 0, live = 0, flags = >> region 241, used = 4194304, live = 0, flags = >> region 242, used = 4194304, live = 0, flags = >> region 243, used = 0, live = 0, flags = >> region 244, used = 4194304, live = 0, flags = >> region 245, used = 4194304, live = 0, flags = >> region 246, used = 0, live = 0, flags = >> ... >> >> So there seems to be an empty region right after the humongous >> allocation. Are >> we wasting it intentionally, or is it a bug? Seems wasteful either >> way. 
>> >> Thanks, >> -Aleksey >> From shade at redhat.com Tue Dec 13 13:23:19 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 13 Dec 2016 14:23:19 +0100 Subject: RFR (S): Do more Full GC tries following the allocation failure Message-ID: Hi, There is another interesting race after full GC: if there are many threads alloc-failing and then requesting full GC, it might so happen that one of those threads unblocks after full GC, tries to allocate only to find other threads have claimed everything, fails, and that is OOME. While the better strategy should be another full GC. See the change and the comment: http://cr.openjdk.java.net/~shade/shenandoah/full-gc-retry/webrev.01/ Additionally, this gives us a tuning knob: with -XX:ShenandoahFullGCTries=0, we say that we would rather fail with OOME than accept the Full GC. Testing: hotspot_gc_shenandoah, gc-bench alloc tests (where it OOMEd before) Thanks, -Aleksey From chf at redhat.com Tue Dec 13 15:19:37 2016 From: chf at redhat.com (Christine Flood) Date: Tue, 13 Dec 2016 10:19:37 -0500 (EST) Subject: RFR (S): Do more Full GC tries following the allocation failure In-Reply-To: References: Message-ID: <1621862713.4202511.1481642377106.JavaMail.zimbra@redhat.com> I suppose three is the magic number... This looks fine to me. ----- Original Message ----- > From: "Aleksey Shipilev" > To: shenandoah-dev at openjdk.java.net > Sent: Tuesday, December 13, 2016 8:23:19 AM > Subject: RFR (S): Do more Full GC tries following the allocation failure > > Hi, > > There is another interesting race after full GC: if there are many threads > alloc-failing and then requesting full GC, it might so happen that one of > those > threads unblocks after full GC, tries to allocate only to find other threads > have claimed everything, fails, and that is OOME. While the better strategy > should be another full GC. 
> > See the change and the comment: > http://cr.openjdk.java.net/~shade/shenandoah/full-gc-retry/webrev.01/ > > Additionally, this gives us a tuning knob: with -XX:ShenandoahFullGCTries=0, > we > say that we would rather fail with OOME than accept the Full GC. > > Testing: hotspot_gc_shenandoah, gc-bench alloc tests (where it OOMEd before) > > Thanks, > -Aleksey > > From ashipile at redhat.com Tue Dec 13 15:51:36 2016 From: ashipile at redhat.com (ashipile at redhat.com) Date: Tue, 13 Dec 2016 15:51:36 +0000 Subject: hg: shenandoah/jdk9/hotspot: Do more Full GC tries following the allocation failure Message-ID: <201612131551.uBDFpaQX028057@aojmv0008.oracle.com> Changeset: 7d3e70252b18 Author: shade Date: 2016-12-13 16:51 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/7d3e70252b18 Do more Full GC tries following the allocation failure ! src/share/vm/gc/shenandoah/shenandoahHeap.cpp ! src/share/vm/gc/shenandoah/shenandoah_globals.hpp From rkennke at redhat.com Tue Dec 13 17:03:17 2016 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 13 Dec 2016 18:03:17 +0100 Subject: RFR: Add remaining unused free space to 'used' counter in free list Message-ID: <1481648597.2597.97.camel@redhat.com> I noticed that when a program allocates many objects that are slightly larger than half a region, we would continuously run into full GC. The reason is that when we skip to next region for allocation, we did not count the remaining unused free space as 'used', and thus barely reported half of heap remaining when running OOM. Oops. Fixed in ShenandoahFreeList by adding last-current-region's remaining free() to the free-lists used. Ok? 
http://cr.openjdk.java.net/~rkennke/fixused/webrev.00/ Roman From zgu at redhat.com Tue Dec 13 17:07:46 2016 From: zgu at redhat.com (Zhengyu Gu) Date: Tue, 13 Dec 2016 12:07:46 -0500 Subject: RFR: Add remaining unused free space to 'used' counter in free list In-Reply-To: <1481648597.2597.97.camel@redhat.com> References: <1481648597.2597.97.camel@redhat.com> Message-ID: <2121c2f2-529e-6ff9-f137-ff80bfd0064a@redhat.com> Look good. -Zhengyu On 12/13/2016 12:03 PM, Roman Kennke wrote: > I noticed that when a program allocates many objects that are slightly > larger than half a region, we would continuously run into full GC. The > reason is that when we skip to next region for allocation, we did not > count the remaining unused free space as 'used', and thus barely > reported half of heap remaining when running OOM. Oops. > > Fixed in ShenandoahFreeList by adding last-current-region's remaining > free() to the free-lists used. > > Ok? > > http://cr.openjdk.java.net/~rkennke/fixused/webrev.00/ > > Roman From shade at redhat.com Tue Dec 13 17:10:51 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 13 Dec 2016 18:10:51 +0100 Subject: RFR: Add remaining unused free space to 'used' counter in free list In-Reply-To: <1481648597.2597.97.camel@redhat.com> References: <1481648597.2597.97.camel@redhat.com> Message-ID: <4203aa39-fdac-37a2-ee80-fb1694a50c52@redhat.com> On 12/13/2016 06:03 PM, Roman Kennke wrote: > I noticed that when a program allocates many objects that are slightly > larger than half a region, we would continuously run into full GC. The > reason is that when we skip to next region for allocation, we did not > count the remaining unused free space as 'used', and thus barely > reported half of heap remaining when running OOM. Oops. > > Fixed in ShenandoahFreeList by adding last-current-region's remaining > free() to the free-lists used. > > Ok? 
> > http://cr.openjdk.java.net/~rkennke/fixused/webrev.00/ I don't see how it matches with the reverse operation, which decrements based on region used size only, not its free size? See: heap->decrease_used(region->used()); _heap->decrease_used(r->used()); Thanks, -Aleksey From rkennke at redhat.com Tue Dec 13 17:11:38 2016 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 13 Dec 2016 18:11:38 +0100 Subject: RFR: Reduce waste in humongous allocations Message-ID: <1481649098.2597.99.camel@redhat.com> as Aleksey has shown, when repeatedly allocating humongous objects, we tend to leave gaps between them. The reason is that we start looking for contiguous regions starting one region after the current (allocation) region, and then discard that alloc region, starting a new one after the humongous object. The fix is two-fold: - Instead of discarding currently active allocation regions, we re-append them to the free-list (together with any free regions that we skipped while searching for a contiguous block). This should be useful, e.g. when we have a not-totally-filled alloc region and then allocate a humongous object. - When searching for contiguous space, also consider the current alloc region. The complication here is that we must prevent concurrent allocations from it. This patch does it by pre-emptively allocating a region-sized chunk, which has two effects: it blocks concurrent allocations and it tells us if the region is free in a concurrency-safe manner. If our search for a contiguous block fails, we revert that by freeing such regions again. It passes jtreg tests and SPECjvm. http://cr.openjdk.java.net/~rkennke/fixhumongousalloc/webrev.00/ Ok?
Roman From rkennke at redhat.com Tue Dec 13 17:16:36 2016 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 13 Dec 2016 18:16:36 +0100 Subject: RFR: Add remaining unused free space to 'used' counter in free list In-Reply-To: <4203aa39-fdac-37a2-ee80-fb1694a50c52@redhat.com> References: <1481648597.2597.97.camel@redhat.com> <4203aa39-fdac-37a2-ee80-fb1694a50c52@redhat.com> Message-ID: <1481649396.2597.100.camel@redhat.com> Am Dienstag, den 13.12.2016, 18:10 +0100 schrieb Aleksey Shipilev: > On 12/13/2016 06:03 PM, Roman Kennke wrote: > > I noticed that when a program allocates many objects that are > > slightly > > larger than half a region, we would continuously run into full GC. > > The > > reason is that when we skip to next region for allocation, we did > > not > > count the remaining unused free space as 'used', and thus barely > > reported half of heap remaining when running OOM. Oops. > > > > Fixed in ShenandoahFreeList by adding last-current-region's > > remaining > > free() to the free-lists used. > > > > Ok? > > > > http://cr.openjdk.java.net/~rkennke/fixused/webrev.00/ > > I don't see how it matches with the reverse operation, which > decrements based on > region used size only, not its free size? > > See: > heap->decrease_used(region->used()); > _heap->decrease_used(r->used()); This is in the heap. The patch addresses the ShenandoahFreeList. I checked it, for heap used counters, decrease and increase do match.
Roman From shade at redhat.com Tue Dec 13 17:17:20 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 13 Dec 2016 18:17:20 +0100 Subject: RFR: Add remaining unused free space to 'used' counter in free list In-Reply-To: <1481649396.2597.100.camel@redhat.com> References: <1481648597.2597.97.camel@redhat.com> <4203aa39-fdac-37a2-ee80-fb1694a50c52@redhat.com> <1481649396.2597.100.camel@redhat.com> Message-ID: <4ab22e2d-45d1-4564-420e-cd0ee7f55a10@redhat.com> On 12/13/2016 06:16 PM, Roman Kennke wrote: > Am Dienstag, den 13.12.2016, 18:10 +0100 schrieb Aleksey Shipilev: >> On 12/13/2016 06:03 PM, Roman Kennke wrote: >>> http://cr.openjdk.java.net/~rkennke/fixused/webrev.00/ >> >> I don't see how it matches with the reverse operation, which >> decrements based on >> region used size only, not its free size? >> >> See: >> heap->decrease_used(region->used()); >> _heap->decrease_used(r->used()); > > This is in the heap. The patch addresses the ShenandoahFreeList. > > I checked it, for heap used counters, decrease and increase do match. Ah, my mistake. Looks good then. -Aleksey From roman at kennke.org Tue Dec 13 17:20:45 2016 From: roman at kennke.org (roman at kennke.org) Date: Tue, 13 Dec 2016 17:20:45 +0000 Subject: hg: shenandoah/jdk9/hotspot: Add remaining unused free space to 'used' counter in free list. Makes heuristics more precise. Message-ID: <201612131720.uBDHKksD024295@aojmv0008.oracle.com> Changeset: 155d04209453 Author: rkennke Date: 2016-12-13 18:20 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/155d04209453 Add remaining unused free space to 'used' counter in free list. Makes heuristics more precise. ! 
src/share/vm/gc/shenandoah/shenandoahFreeSet.cpp From shade at redhat.com Tue Dec 13 18:09:52 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 13 Dec 2016 19:09:52 +0100 Subject: RFR: Reduce waste in humongous allocations In-Reply-To: <1481649098.2597.99.camel@redhat.com> References: <1481649098.2597.99.camel@redhat.com> Message-ID: <2616779e-dc25-c0cf-9659-fcd03fddcefa@redhat.com> On 12/13/2016 06:11 PM, Roman Kennke wrote: > as Aleksey has shown, when repeatedly allocating humongous objects, we > tend to leave gaps between them. The reason is that we start looking > for contigous regions starting one region after the current > (allocation) region, and then discard that alloc region, starting a new > one after the humongous object. > > The fix is two-fold: > - Instead of discarding currently active allocation regions, we re- > append them to the free-list (together with any free regions that we > skipped while searching a contiguous block). This should be useful, > e.g. when we have a not-totally-filled alloc region and then allocate a > humongous object. > - When searching for contigous space, also consider the current alloc > region. The complication here is that we must prevent concurrent > allocations from it. This patch does it by pre-emptively allocating > region-sized chunk, which has two effects: it blocks concurrent > allocations and it tells us if the region is free in a concurrency-safe > manner. If our search for contiguous block fails, we revert that by > freeing such regions again. > > It passes jtreg tests and SPECjvm. > > http://cr.openjdk.java.net/~rkennke/fixhumongousalloc/webrev.00/ Ugh. The code got even more confusing than it was before... At this point I wonder if acquiring a lock when claiming free regions is saner than trying to do this in a lock-free manner. With TLAB allocations, this shouldn't be that painful? Seeing mutations in ShenandoahFreeSet::is_contiguous() makes me all itchy, it should be called differently. 
Also, does the code claim the regions one-by-one? What if we have two competing multi-region humongous allocations? Does it guarantee to allocate both (e.g. are they stepping on each other's toes, preventing global progress?) Thanks, -Aleksey From rkennke at redhat.com Tue Dec 13 18:20:17 2016 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 13 Dec 2016 19:20:17 +0100 Subject: RFR: Reduce waste in humongous allocations In-Reply-To: <2616779e-dc25-c0cf-9659-fcd03fddcefa@redhat.com> References: <1481649098.2597.99.camel@redhat.com> <2616779e-dc25-c0cf-9659-fcd03fddcefa@redhat.com> Message-ID: <1481653217.2597.102.camel@redhat.com> Am Dienstag, den 13.12.2016, 19:09 +0100 schrieb Aleksey Shipilev: > On 12/13/2016 06:11 PM, Roman Kennke wrote: > > as Aleksey has shown, when repeatedly allocating humongous objects, > > we > > tend to leave gaps between them. The reason is that we start > > looking > > for contiguous regions starting one region after the current > > (allocation) region, and then discard that alloc region, starting a > > new > > one after the humongous object. > > > > The fix is two-fold: > > - Instead of discarding currently active allocation regions, we re- > > append them to the free-list (together with any free regions that > > we > > skipped while searching a contiguous block). This should be useful, > > e.g. when we have a not-totally-filled alloc region and then > > allocate a > > humongous object. > > - When searching for contiguous space, also consider the current > > alloc > > region. The complication here is that we must prevent concurrent > > allocations from it. This patch does it by pre-emptively allocating > > region-sized chunk, which has two effects: it blocks concurrent > > allocations and it tells us if the region is free in a concurrency- > > safe > > manner. If our search for contiguous block fails, we revert that by > > freeing such regions again. > > > > It passes jtreg tests and SPECjvm.
> > > > http://cr.openjdk.java.net/~rkennke/fixhumongousalloc/webrev.00/ > > Ugh. The code got even more confusing than it was before... At this > point I > wonder if acquiring a lock when claiming free regions is saner than > trying to do > this in a lock-free manner. With TLAB allocations, this shouldn't be > that painful? It's not painful in terms of performance, but painful in terms of implementation. We cannot easily acquire the Heap_lock on allocations because the allocation might come out of a write barrier, and that Java thread is not-in-VM (they call into the VM via a cheap leaf-call). We could change that (and have been there already) to use regular calls like, e.g. allocations do, but this opens up a whole new class of other problems. For example, we need oopmaps at write-barriers which, iirc, presented us some serious optimization problems in C2 land. With Roland's work, those might have gone away though (seems like we can well live with control inputs to write barriers now..) We have been there, and it might be The Correct Way to do it, but it's not trivial at all. > Seeing mutations in ShenandoahFreeSet::is_contiguous() makes me all > itchy, it > should be called differently. > > Also, does the code claim the regions one-by-one? What if we have two > competing > multi-region humongous allocations? Does it guarantee to allocate > both (e.g. are > they stepping on each other's toes, preventing global progress?) I guess it could happen. How else could we do it? I know this stuff is a bit nightmarish. Accept that as a stop-gap solution, and re-visit locked allocation with non-leaf-write-barriers and all that stuff later?
Roman From shade at redhat.com Tue Dec 13 18:48:38 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 13 Dec 2016 19:48:38 +0100 Subject: RFR: Reduce waste in humongous allocations In-Reply-To: <1481653217.2597.102.camel@redhat.com> References: <1481649098.2597.99.camel@redhat.com> <2616779e-dc25-c0cf-9659-fcd03fddcefa@redhat.com> <1481653217.2597.102.camel@redhat.com> Message-ID: <4d4e6526-a669-8cd0-1fe5-8411a74e6f75@redhat.com> On 12/13/2016 07:20 PM, Roman Kennke wrote: > Am Dienstag, den 13.12.2016, 19:09 +0100 schrieb Aleksey Shipilev: >> On 12/13/2016 06:11 PM, Roman Kennke wrote: >>> http://cr.openjdk.java.net/~rkennke/fixhumongousalloc/webrev.00/ >> >> Ugh. The code got even more confusing than it was before... At this point >> I wonder if acquiring a lock when claiming free regions is saner than >> trying to do this in a lock-free manner. With TLAB allocations, this >> shouldn't be that painful? > > It's not painful in terms of performance, but painful in terms of > implementation. We cannot easily acquire the Heap_lock on allocations because > the allocation might come out of a write barrier, and that Java thread is > not-in-VM (they call into the VM via a cheap leaf-call). We could change that > (and have been there already) to use regular calls like, e.g. allocations do, > but this opens up a whole new class of other problems. For example, we need > oopmaps at write-barriers which, iirc, presented us some serious optimization > problems in C2 land. With Roland's work, those might have gone away though > (seems like we can well live with control inputs to write barriers now..) OUCH. > We have been there, and it might be The Correct Way to do it, but it's not > trivial at all. We don't need Heap_lock specifically, right? I wonder if we can get away with a very short-lived spinlock only in ShenandoahFreeSet to trim the lock-free madness down there. >> Also, does the code claim the regions one-by-one?
What if we have two >> competing multi-region humongous allocations? Does it guarantee to >> allocate both (e.g. are they stepping on each other's toes, preventing >> global progress?) > > I guess it could happen. How else could we do it? > > I know this stuff is a bit nightmarish. Accept that as a stop-gap solution, > and re-visit locked allocation with non-leaf-write-barriers and all that > stuff later? No, because I think those competing multi-region allocs are very real, and will bite us. Let's push something that is not affected by that. So far the cure is worse than the disease :) Thanks, -Aleksey From rwestrel at redhat.com Wed Dec 14 08:06:19 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Wed, 14 Dec 2016 09:06:19 +0100 Subject: RFR: Reduce waste in humongous allocations In-Reply-To: <1481653217.2597.102.camel@redhat.com> References: <1481649098.2597.99.camel@redhat.com> <2616779e-dc25-c0cf-9659-fcd03fddcefa@redhat.com> <1481653217.2597.102.camel@redhat.com> Message-ID: > It's not painful in terms of performance, but painful in terms of > implementation. We cannot easily acquire the Heap_lock on allocations > because the allocation might come out of a write barrier, and that Java > thread is not-in-VM (they call into the VM via a cheap leaf-call). We > could change that (and have been there already) to use regular calls > like, e.g. allocations do, but this opens up a whole new class of other > problems. For example, we need oopmaps at write-barriers which, iirc, > presented us some serious optimization problems in C2 land. With > Roland's work, those might have gone away though (seems like we can > well live with control inputs to write barriers now..) If we have a blocking runtime call at a write barrier then deoptimization at a write barrier is possible and we need debug info at the write barrier. Having debug info and allowing the write barrier to move around would be quite complicated. Roland.
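The short-lived spin lock discussed in this thread — one that a Java thread can take from a write barrier via a cheap leaf call, because it never blocks in the VM sense and so needs no oop map or debug info at the barrier — can be sketched in a few lines. The following is an illustrative, self-contained sketch only: the names (SpinLock, locked_count) are made up, and std::atomic stands in for HotSpot's Atomic::cmpxchg; it is not the actual Shenandoah code.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Minimal CAS-based spin lock. Hypothetical sketch, not HotSpot code.
class SpinLock {
  std::atomic<int> _state{0};   // 0 = unlocked, 1 = locked
public:
  void lock() {
    int expected = 0;
    // Spin, retrying the CAS 0 -> 1 until it succeeds.
    while (!_state.compare_exchange_weak(expected, 1,
                                         std::memory_order_acquire)) {
      expected = 0;  // the failed CAS wrote the observed value back
    }
  }
  void unlock() {
    _state.store(0, std::memory_order_release);
  }
};

// Demo: several threads bump a shared counter under the lock.
long locked_count(int nthreads, int iters) {
  SpinLock lock;
  long counter = 0;
  std::vector<std::thread> threads;
  for (int t = 0; t < nthreads; t++) {
    threads.emplace_back([&]() {
      for (int i = 0; i < iters; i++) {
        lock.lock();
        counter++;          // protected by the lock
        lock.unlock();
      }
    });
  }
  for (auto& th : threads) th.join();
  return counter;
}
```

The design point is that the lock holder only ever does a bounded amount of work (claiming regions in the free set), so spinning waiters make progress quickly without any thread-state transition.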
From rkennke at redhat.com Wed Dec 14 09:30:57 2016 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 14 Dec 2016 10:30:57 +0100 Subject: RFR: Reduce waste in humongous allocations In-Reply-To: References: <1481649098.2597.99.camel@redhat.com> <2616779e-dc25-c0cf-9659-fcd03fddcefa@redhat.com> <1481653217.2597.102.camel@redhat.com> Message-ID: <1481707857.2597.105.camel@redhat.com> Am Mittwoch, den 14.12.2016, 09:06 +0100 schrieb Roland Westrelin: > > It's not painful in terms of performance, but painful in terms of > > implemention. We cannot easily acquire the Heap_lock on allocations > > because the allocation might come out of a write barrier, and that > > Java > > thread is not-in-VM (they call into the VM via a cheap leaf-call). > > We > > could change that (and have been there already) to use regular > > calls > > like, e.g. allocations do, but this opens up a whole new class of > > other > > problems. For example, we need oopmaps at write-barriers which, > > iirc, > > presented us some serious optimization problems in C2 land. With > > Roland's work, those might have gone away though (seems like we can > > well live with control inputs to write barriers now..) > > If we have a blocking runtime call at a write barrier then > deoptimization at a a write barrier is possible and we need debug > info > at the write barrier. Having debug info and allowing the write > barrier > to move around would be quite complicated. Yeah that's the issues we had last time we tried that. I am currently working on a different approach: instead of using a Hotspot Mutex or such, I'm now protecting the allocation code path by a little CAS-based spin lock. Kind of like what we already do for growing the heap, only better :-) It seems to work, only needs some more testing before I propose it for review. 
Stay tuned :-) Roman From shade at redhat.com Wed Dec 14 11:04:48 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 14 Dec 2016 12:04:48 +0100 Subject: RFR (S): Fix MXBean Full GC notifications Message-ID: <1b62c009-b692-269d-6ebb-db5539a690dc@redhat.com> Hi, In JMH gc profiler, we have both "alloc" (actual allocations) and "churn" (space freed by collections) counters. For Shenandoah, these counters disagree wildly, because Shenandoah borks notifying MXBeans about Full GCs. Fix: http://cr.openjdk.java.net/~shade/shenandoah/mx-fullgc-notify/webrev.01/ Thanks, -Aleksey From rkennke at redhat.com Wed Dec 14 11:38:06 2016 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 14 Dec 2016 12:38:06 +0100 Subject: RFR (S): Fix MXBean Full GC notifications In-Reply-To: <1b62c009-b692-269d-6ebb-db5539a690dc@redhat.com> References: <1b62c009-b692-269d-6ebb-db5539a690dc@redhat.com> Message-ID: <1481715486.2597.106.camel@redhat.com> Am Mittwoch, den 14.12.2016, 12:04 +0100 schrieb Aleksey Shipilev: > Hi, > > In JMH gc profiler, we have both "alloc" (actual allocations) and > "churn" (space > freed by collections) counters. For Shenandoah, these counters > disagree wildly, > because Shenandoah borks notifying MXBeans about Full GCs. > > Fix: > http://cr.openjdk.java.net/~shade/shenandoah/mx-fullgc- > notify/webrev.01/ Yep. Roman From ashipile at redhat.com Wed Dec 14 11:56:35 2016 From: ashipile at redhat.com (ashipile at redhat.com) Date: Wed, 14 Dec 2016 11:56:35 +0000 Subject: hg: shenandoah/jdk9/hotspot: Fix MXBean Full GC notifications. Message-ID: <201612141156.uBEBuZ72007037@aojmv0008.oracle.com> Changeset: a2d3be7f08ad Author: shade Date: 2016-12-14 12:56 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/a2d3be7f08ad Fix MXBean Full GC notifications. ! src/share/vm/services/memoryManager.cpp ! src/share/vm/services/memoryManager.hpp ! src/share/vm/services/memoryService.cpp !
test/TEST.groups + test/gc/shenandoah/MXNotificationsFullGC.java From shade at redhat.com Wed Dec 14 12:45:41 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 14 Dec 2016 13:45:41 +0100 Subject: RFR (S): Fix MXBean Full GC notifications In-Reply-To: <1481715486.2597.106.camel@redhat.com> References: <1b62c009-b692-269d-6ebb-db5539a690dc@redhat.com> <1481715486.2597.106.camel@redhat.com> Message-ID: <764d4117-3f41-900c-a1e3-77755f5d21fa@redhat.com> On 12/14/2016 12:38 PM, Roman Kennke wrote: > Am Mittwoch, den 14.12.2016, 12:04 +0100 schrieb Aleksey Shipilev: >> Hi, >> >> In JMH gc profiler, we have both "alloc" (actual allocations) and >> "churn" (space >> freed by collections) counters. For Shenandoah, these counters >> disagree wildly, >> because Shenandoah borks notifying MXBeans about Full GCs. >> >> Fix: >> http://cr.openjdk.java.net/~shade/shenandoah/mx-fullgc- >> notify/webrev.01/ > > > Yep. Of course the test started failing intermittently after I pushed it... This is a follow-up: diff -r a2d3be7f08ad test/gc/shenandoah/MXNotificationsFullGC.java --- a/test/gc/shenandoah/MXNotificationsFullGC.java Wed Dec 14 12:56:20 2016 +0100 +++ b/test/gc/shenandoah/MXNotificationsFullGC.java Wed Dec 14 13:26:22 2016 +0100 @@ -54,6 +54,9 @@ sink = new int[100_000]; } + // GC notifications are asynchronous, wait a little + Thread.sleep(1000); + if (!notified) { throw new IllegalStateException("Should have been notified"); } Does not fail after 50 runs. Ok?
Thanks, -Aleksey From rkennke at redhat.com Wed Dec 14 12:55:00 2016 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 14 Dec 2016 13:55:00 +0100 Subject: RFR (S): Fix MXBean Full GC notifications In-Reply-To: <764d4117-3f41-900c-a1e3-77755f5d21fa@redhat.com> References: <1b62c009-b692-269d-6ebb-db5539a690dc@redhat.com> <1481715486.2597.106.camel@redhat.com> <764d4117-3f41-900c-a1e3-77755f5d21fa@redhat.com> Message-ID: <1481720100.2597.107.camel@redhat.com> Am Mittwoch, den 14.12.2016, 13:45 +0100 schrieb Aleksey Shipilev: > On 12/14/2016 12:38 PM, Roman Kennke wrote: > > Am Mittwoch, den 14.12.2016, 12:04 +0100 schrieb Aleksey Shipilev: > > > Hi, > > > > > > In JMH gc profiler, we have both "alloc" (actual allocations) and > > > "churn" (space > > > freed by collections) counters. For Shenandoah, these counters > > > disagree wildly, > > > because Shenandoah borks notifying MXBeans about Full GCs. > > > > > > Fix: > > > http://cr.openjdk.java.net/~shade/shenandoah/mx-fullgc- > > > notify/webrev.01/ > > > > > > Yep. > > Of course the test started failing intermittently after I pushed > it... This is a > follow-up: > > diff -r a2d3be7f08ad test/gc/shenandoah/MXNotificationsFullGC.java > --- a/test/gc/shenandoah/MXNotificationsFullGC.java Wed Dec 14 > 12:56:20 2016 +0100 > +++ b/test/gc/shenandoah/MXNotificationsFullGC.java Wed Dec 14 > 13:26:22 2016 +0100 > @@ -54,6 +54,9 @@ > sink = new int[100_000]; > } > > + // GC notifications are asynchronous, wait a little + Thread.sleep(1000); > + > if (!notified) { > throw new IllegalStateException("Should have been notified"); > } > > Does not fail after 50 runs. > > Ok? Sure. From ashipile at redhat.com Wed Dec 14 12:56:14 2016 From: ashipile at redhat.com (ashipile at redhat.com) Date: Wed, 14 Dec 2016 12:56:14 +0000 Subject: hg: shenandoah/jdk9/hotspot: Workaround GC notification asynchronicity in test/gc/shenandoah/MXNotificationsFullGC.
Message-ID: <201612141256.uBECuEox024913@aojmv0008.oracle.com> Changeset: a09a9979e356 Author: shade Date: 2016-12-14 13:56 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/a09a9979e356 Workaround GC notification asynchronicity in test/gc/shenandoah/MXNotificationsFullGC. ! test/gc/shenandoah/MXNotificationsFullGC.java From rkennke at redhat.com Wed Dec 14 15:29:26 2016 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 14 Dec 2016 16:29:26 +0100 Subject: RFR: JVMStat heap region counters Message-ID: <1481729366.2597.111.camel@redhat.com> This adds some infrastructure to monitor each heap region via JVMStat. It currently exposes for each region the number of used and live bytes, plus information about whether the region is humongous, in the collection set, or unused (i.e. not yet allocated, when heap is growable). In addition, it provides the number of regions and their size as constants, and flags that tell if marking and evacuation is in progress. For the region data, it uses a packed format so that all info per region fits in one jlong counter. Should save bandwidth, especially when monitoring via network. The names of the counters and their format are documented in the header file. It's subject to change, especially in the near future. It ups the PerfDataMemorySize for Shenandoah, so that we can fit in all those counters. In order to use it, one must provide -XX:+UsePerfData to turn on JVMStat, and -XX:+ShenandoahRegionSampling to provide live region data. The sampling rate can be set via -XX:ShenandoahRegionSamplingRate=$MS, which sets the number of milliseconds between samples. The latter two flags can also be turned on via JMX (i.e. writable(Always)), which is especially useful for temporarily turning on monitoring from a tool. http://cr.openjdk.java.net/~rkennke/regioncounters/webrev.00/ Ok?
Roman From shade at redhat.com Wed Dec 14 15:44:37 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 14 Dec 2016 16:44:37 +0100 Subject: RFR: JVMStat heap region counters In-Reply-To: <1481729366.2597.111.camel@redhat.com> References: <1481729366.2597.111.camel@redhat.com> Message-ID: <85320b26-6071-1fdd-35a0-b26f8ec9d74f@redhat.com> On 12/14/2016 04:29 PM, Roman Kennke wrote: > http://cr.openjdk.java.net/~rkennke/regioncounters/webrev.00/ *) I kinda crammed things into ShenandoahConcurrentThread. We can do: heap->monitoring_support()->update_counters(); ...once after the if. Otherwise looks good. Thanks, -Aleksey From roman at kennke.org Wed Dec 14 16:25:11 2016 From: roman at kennke.org (roman at kennke.org) Date: Wed, 14 Dec 2016 16:25:11 +0000 Subject: hg: shenandoah/jdk9/hotspot: JVMStat heap region counters Message-ID: <201612141625.uBEGPBIB025647@aojmv0008.oracle.com> Changeset: 1785c83977e3 Author: rkennke Date: 2016-12-14 17:23 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/1785c83977e3 JVMStat heap region counters ! src/share/vm/gc/shenandoah/shenandoahConcurrentThread.cpp + src/share/vm/gc/shenandoah/shenandoahHeapRegionCounters.cpp + src/share/vm/gc/shenandoah/shenandoahHeapRegionCounters.hpp ! src/share/vm/gc/shenandoah/shenandoahMonitoringSupport.cpp ! src/share/vm/gc/shenandoah/shenandoahMonitoringSupport.hpp ! src/share/vm/gc/shenandoah/shenandoah_globals.hpp ! 
src/share/vm/runtime/arguments.cpp From rkennke at redhat.com Wed Dec 14 16:25:23 2016 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 14 Dec 2016 17:25:23 +0100 Subject: RFR: JVMStat heap region counters In-Reply-To: <85320b26-6071-1fdd-35a0-b26f8ec9d74f@redhat.com> References: <1481729366.2597.111.camel@redhat.com> <85320b26-6071-1fdd-35a0-b26f8ec9d74f@redhat.com> Message-ID: <1481732723.2597.112.camel@redhat.com> Am Mittwoch, den 14.12.2016, 16:44 +0100 schrieb Aleksey Shipilev: > On 12/14/2016 04:29 PM, Roman Kennke wrote: > > http://cr.openjdk.java.net/~rkennke/regioncounters/webrev.00/ > > *) I kinda crammed things into ShenandoahConcurrentThread. We can do: > > heap->monitoring_support()->update_counters(); > > ...once after the if. Ok, I pushed it with the suggested change. Roman From rkennke at redhat.com Wed Dec 14 16:36:29 2016 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 14 Dec 2016 17:36:29 +0100 Subject: RFR: Locked allocation Message-ID: <1481733389.2597.114.camel@redhat.com> This patch throws out all the lockfree allocation madness, and implements a much simpler locked allocation. Since we can't easily use Mutex and friends, and also don't need most of their functionality (wait/notify, nesting, etc), I implemented a very simple (simple as in, can read-and-understand it in one glance) CAS-based spin-lock. This is wrapped around the normal allocation path, the humongous allocation path and the heap growing path. It is not locking around the call to full-gc, as this involves other locks and as CHF says, there are alligators there ;-) This does immensely simplify ShenandoahFreeSet, especially the racy humongous allocation path. It does fix the bug that some people have encountered, where 'used' was not consistent with capacity. I've tested it using gc-bench (no regression in allocation throughput), SPECjvm and jtreg tests. Looks all fine. When reviewing, please pay special attention to the lock in ShenandoahHeap::allocate_memory()!
http://cr.openjdk.java.net/~rkennke/lockedalloc/webrev.00/ Ok? Roman From shade at redhat.com Wed Dec 14 17:33:15 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 14 Dec 2016 18:33:15 +0100 Subject: RFR: Locked allocation In-Reply-To: <1481733389.2597.114.camel@redhat.com> References: <1481733389.2597.114.camel@redhat.com> Message-ID: <7fc26a67-09e8-c5d1-b59d-f825aee6411b@redhat.com> On 12/14/2016 05:36 PM, Roman Kennke wrote: > http://cr.openjdk.java.net/~rkennke/lockedalloc/webrev.00/ Impressive! Comments: *) Double/long assert in ShenandoahFreeSet::increase_used. At least break the line, or better yet, combine two asserts in one? *) Outdated comment: 90 // The modulo will take care of wrapping around. *) Also, where *does* it wrap around now? Or we don't need it now, because we guarantee all the previous regions are finally claimed, and no holes left? *) Can we write this: while (_active_end - next > num) { ... as this? while (next + num < _active_end) { ... I think it is a tad more readable: the bound is on the right. *) In RecycleDirtyRegionsClosure, there is no more add_region, why? Was that call superfluous before? 864 _heap->free_regions()->add_region(r); Thanks, -Aleksey From zgu at redhat.com Wed Dec 14 18:10:06 2016 From: zgu at redhat.com (Zhengyu Gu) Date: Wed, 14 Dec 2016 13:10:06 -0500 Subject: RFR: Locked allocation In-Reply-To: <1481733389.2597.114.camel@redhat.com> References: <1481733389.2597.114.camel@redhat.com> Message-ID: <5aad0960-9108-f409-5f5c-eda2c286867c@redhat.com> Great job! It simplifies the logic a lot! A few minor suggestions: - ShenandoahFreeSet::clear() I only see one path to this method, and it is from a safepoint, so replacing the fence with a safepoint assertion should be appropriate. - asserting on _heap_lock == 1 on code paths that are protected by the lock makes code more readable. - Will this lock be hot? And do you want to check for safepoints during spinning?
I wonder if it has an impact on TTSP Thanks, -Zhengyu On 12/14/2016 11:36 AM, Roman Kennke wrote: > This patch throws out all the lockfree allocation madness, and > implements a much simpler locked allocation. Since we can't easily use > Mutex and friends, and also don't need most of their functionality > (wait/notify, nesting, etc), I implemented a very simple (simple as in, > can read-and-understand it in one glance) CAS based spin-lock. This is > wrapped around the normal allocation path, the humongous allocation > path and the heap growing path. It is not locking around the call to > full-gc, as this involves other locks and as CHF says, there are > alligators there ;-) > > This does immensely simplify ShenandoahFreeSet, especially the racy > humongous allocation path. It does fix the bug that some people have > encountered about used not consistent with capacity. > > I've tested it using gc-bench (no regression in allocation throughput), > SPECjvm and jtreg tests. Looks all fine. > > When reviewing, please pay special attention to the lock in > ShenandoahHeap::allocate_memory()! > > http://cr.openjdk.java.net/~rkennke/lockedalloc/webrev.00/ > > Ok? > > Roman From zgu at redhat.com Wed Dec 14 18:39:04 2016 From: zgu at redhat.com (Zhengyu Gu) Date: Wed, 14 Dec 2016 13:39:04 -0500 Subject: RFR: Locked allocation In-Reply-To: <5aad0960-9108-f409-5f5c-eda2c286867c@redhat.com> References: <1481733389.2597.114.camel@redhat.com> <5aad0960-9108-f409-5f5c-eda2c286867c@redhat.com> Message-ID: On 12/14/2016 01:10 PM, Zhengyu Gu wrote: > Great job! It simplifies the logic a lot! > > A few minor suggestions: > > - ShenandoahFreeSet::clear() > > I only see one path to this method and it is from safepoint. so > replacing fence with safepoint assertion should be appropriate. > > - asserting on _heap_lock == 1 on code paths that are protected by the > lock > makes code more readable.
Or make _heap_lock an opaque object and store owner thread pointer, so can have assertion like assert(owned_by_self() ...), at least for debug mode. -Zhengyu > > - Will this lock be hot? and you want to check safepoint during spinning? > I wonder if it has impact on TTSP > > Thanks, > > -Zhengyu > > On 12/14/2016 11:36 AM, Roman Kennke wrote: >> This patch throws out all the lockfree allocation madness, and >> implements a much simpler locked allocation. Since we can't easily use >> Mutex and friends, and also don't need most of their functionality >> (wait/notify, nesting, etc), I implemented a very simple (simple as in, >> can read-and-understand it in one glance) CAS based spin-lock. This is >> wrapped around the normal allocation path, the humongous allocation >> path and the heap growing path. It is not locking around the call to >> full-gc, as this involves other locks and as CHF says, there are >> alligators there ;-) >> >> This does immensely simplify ShenandoahFreeSet, especially the racy >> humongous allocation path. It does fix the bug that some people have >> encountered about used not consistent with capacity. >> >> I've tested it using gc-bench (no regression in allocation throughput), >> SPECjvm and jtreg tests. Looks all fine. >> >> When reviewing, please pay special attention to the lock in >> ShenandoahHeap::allocate_memory()! >> >> http://cr.openjdk.java.net/~rkennke/lockedalloc/webrev.00/ >> >> Ok? 
>> >> Roman > From rkennke at redhat.com Wed Dec 14 18:52:10 2016 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 14 Dec 2016 19:52:10 +0100 Subject: RFR: Locked allocation In-Reply-To: <7fc26a67-09e8-c5d1-b59d-f825aee6411b@redhat.com> References: <1481733389.2597.114.camel@redhat.com> <7fc26a67-09e8-c5d1-b59d-f825aee6411b@redhat.com> Message-ID: <1481741530.2597.116.camel@redhat.com> On Wednesday, 14.12.2016 at 18:33 +0100, Aleksey Shipilev wrote: > On 12/14/2016 05:36 PM, Roman Kennke wrote: > > http://cr.openjdk.java.net/~rkennke/lockedalloc/webrev.00/ > > Impressive! > > Comments: > > *) Double/long assert in ShenandoahFreeSet::increase_used. At least > break the > line, or better yet, combine two asserts in one? It's a sort of pre- and post-condition, hence the two checks. This guy was driving me nuts (and probably still is), so I'll leave it for now. I'll break the line though. > *) Outdated comment: > 90   // The modulo will take care of wrapping around. Oops. Will remove it. > *) Also, where *does* it wrap around now? Or we don't need it now, > because we > guarantee all the previous regions are finally claimed, and no holes > left? We used a ring-buffer when claiming humongous regions. When we found a region starting at index X away from 'current', then we would re-append all regions between current and X to the end of the list. We couldn't reasonably skip humongous regions concurrently. Now that it's single-threaded, we can simply ignore any humongous regions on the list. No more ring buffer needed, and we can never exceed _max_regions length. > *) Can we write this: > > while (_active_end - next > num) { ... > > as this? > > while (next + num < _active_end) { ... > > I think it is a tad more readable: the bound is on the right. Yep. Thanks for reminding me of good practices! :-) > *) In RecycleDirtyRegionsClosure, there is no more add_region, why? > Was that > call superfluous before? Yes.
Right after recycling regions, we will clear the free list. This was bogus. Will come with an updated patch shortly. Roman From rkennke at redhat.com Wed Dec 14 18:57:05 2016 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 14 Dec 2016 19:57:05 +0100 Subject: RFR: Locked allocation In-Reply-To: <5aad0960-9108-f409-5f5c-eda2c286867c@redhat.com> References: <1481733389.2597.114.camel@redhat.com> <5aad0960-9108-f409-5f5c-eda2c286867c@redhat.com> Message-ID: <1481741825.2597.118.camel@redhat.com> On Wednesday, 14.12.2016 at 13:10 -0500, Zhengyu Gu wrote: > Great job! It simplifies the logic a lot! > > A few minor suggestions: > > - ShenandoahFreeSet::clear() > > I only see one path to this method and it is from safepoint. so > replacing fence with safepoint assertion should be appropriate. Ah yes. I was thinking it solved the assert that you and others were facing. My reasoning was that other threads within the same safepoint would need to see the update. However, now that I think about it, those other threads would need to go through our new-fangled lock, and thus a CAS, and thus a fence... hmmm. Will need to try again. You may be right and this fence is bogus. > - asserting on _heap_lock == 1 on code paths that are protected by > the lock > makes code more readable. Yes. I was actually having the same idea as you: store the locking thread for debug checking, do an opaque lock object, and even a scoped locker. All that should contribute to sanity. > - Will this lock be hot? I don't think it's very hot. > and you want to check safepoint during spinning? Nope. The whole point of this exercise was to avoid potentially safepointing (and thus requiring oopmap, debug-info, etc blah blah at write barriers) :-) > I wonder if it has impact on TTSP I doubt. gc-bench didn't show any such thing. In fact, it might be better than before now, at least when you've got threads racing to allocate humongous objects.
The previous code was not even guaranteed to complete (could interleave claiming regions, never finding a contiguous block). Will come up with a patch later. Need food first. ;-) Roman > > Thanks, > > -Zhengyu > > On 12/14/2016 11:36 AM, Roman Kennke wrote: > > This patch throws out all the lockfree allocation madness, and > > implements a much simpler locked allocation. Since we can't easily > > use > > Mutex and friends, and also don't need most of their functionality > > (wait/notify, nesting, etc), I implemented a very simple (simple as > > in, > > can read-and-understand it in one glance) CAS based spin-lock. This > > is > > wrapped around the normal allocation path, the humongous allocation > > path and the heap growing path. It is not locking around the call > > to > > full-gc, as this involves other locks and as CHF says, there are > > alligators there ;-) > > > > This does immensely simplify ShenandoahFreeSet, especially the racy > > humongous allocation path. It does fix the bug that some people > > have > > encountered about used not consistent with capacity. > > > > I've tested it using gc-bench (no regression in allocation > > throughput), > > SPECjvm and jtreg tests. Looks all fine. > > > > When reviewing, please pay special attention to the lock in > > ShenandoahHeap::allocate_memory()! > > > > http://cr.openjdk.java.net/~rkennke/lockedalloc/webrev.00/ > > > > Ok? > > > > Roman > > From chf at redhat.com Wed Dec 14 19:04:53 2016 From: chf at redhat.com (Christine Flood) Date: Wed, 14 Dec 2016 14:04:53 -0500 (EST) Subject: I believe I fixed the issues, can I push this? Message-ID: <2019104040.4515651.1481742293293.JavaMail.zimbra@redhat.com> http://cr.openjdk.java.net/~chf/connections/webrev.02/ From rkennke at redhat.com Wed Dec 14 19:09:13 2016 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 14 Dec 2016 20:09:13 +0100 Subject: I believe I fixed the issues, can I push this? 
In-Reply-To: <2019104040.4515651.1481742293293.JavaMail.zimbra@redhat.com> References: <2019104040.4515651.1481742293293.JavaMail.zimbra@redhat.com> Message-ID: <1481742553.2597.122.camel@redhat.com> I have a half-finished patch that would make a connection matrix during marking, and maintain it using barriers. (I also have the infrastructure to do partial marking... stitching this together will probably give us what we want. Soon.) Other than that, I am fine with you pushing it ;-) Roman > http://cr.openjdk.java.net/~chf/connections/webrev.02/ From zgu at redhat.com Wed Dec 14 19:14:18 2016 From: zgu at redhat.com (Zhengyu Gu) Date: Wed, 14 Dec 2016 14:14:18 -0500 Subject: I believe I fixed the issues, can I push this? In-Reply-To: <2019104040.4515651.1481742293293.JavaMail.zimbra@redhat.com> References: <2019104040.4515651.1481742293293.JavaMail.zimbra@redhat.com> Message-ID: <44f4598a-c256-1c38-b0e4-4603864835d6@redhat.com> 1. There is still a line of debugging code in ShenandoahHeap::calculate_matrix() 2. ShenandoahMatrix -> UseShenandoahMatrix to follow the convention. Otherwise, looks good. -Zhengyu On 12/14/2016 02:04 PM, Christine Flood wrote: > http://cr.openjdk.java.net/~chf/connections/webrev.02/ From zgu at redhat.com Wed Dec 14 19:41:09 2016 From: zgu at redhat.com (Zhengyu Gu) Date: Wed, 14 Dec 2016 14:41:09 -0500 Subject: RFR: Locked allocation In-Reply-To: <1481741825.2597.118.camel@redhat.com> References: <1481733389.2597.114.camel@redhat.com> <5aad0960-9108-f409-5f5c-eda2c286867c@redhat.com> <1481741825.2597.118.camel@redhat.com> Message-ID: <7a0b2754-c194-4680-fc8f-29dcff0a1213@redhat.com> >> and you want to check safepoint during spinning? > Nope. The whole point of this exercise was to avoid potentially > safepointing (and thus requiring oopmap, debug-info, etc blah blah at > write barriers) :-) Yes, I forgot about the safepointing problem. -Zhengyu >> I wonder if it has impact on TTSP > I doubt. gc-bench didn't show any such thing.
In fact, it might be > better than before now, at least when you've got threads racing to > allocate humongous objects. The previous code was not even guaranteed > to complete (could interleave claiming regions, never finding a > contiguous block). > > Will come up with a patch later. Need food first. ;-) > > Roman > >> Thanks, >> >> -Zhengyu >> >> On 12/14/2016 11:36 AM, Roman Kennke wrote: >>> This patch throws out all the lockfree allocation madness, and >>> implements a much simpler locked allocation. Since we can't easily >>> use >>> Mutex and friends, and also don't need most of their functionality >>> (wait/notify, nesting, etc), I implemented a very simple (simple as >>> in, >>> can read-and-understand it in one glance) CAS based spin-lock. This >>> is >>> wrapped around the normal allocation path, the humongous allocation >>> path and the heap growing path. It is not locking around the call >>> to >>> full-gc, as this involves other locks and as CHF says, there are >>> alligators there ;-) >>> >>> This does immensely simplify ShenandoahFreeSet, especially the racy >>> humongous allocation path. It does fix the bug that some people >>> have >>> encountered about used not consistent with capacity. >>> >>> I've tested it using gc-bench (no regression in allocation >>> throughput), >>> SPECjvm and jtreg tests. Looks all fine. >>> >>> When reviewing, please pay special attention to the lock in >>> ShenandoahHeap::allocate_memory()! >>> >>> http://cr.openjdk.java.net/~rkennke/lockedalloc/webrev.00/ >>> >>> Ok? 
>>> >>> Roman >> From aph at redhat.com Thu Dec 15 10:15:07 2016 From: aph at redhat.com (Andrew Haley) Date: Thu, 15 Dec 2016 10:15:07 +0000 Subject: RFR: Locked allocation In-Reply-To: <1481733389.2597.114.camel@redhat.com> References: <1481733389.2597.114.camel@redhat.com> Message-ID: <955f480e-ea87-a37d-1049-f9d7346b99a4@redhat.com> On 14/12/16 16:36, Roman Kennke wrote: > When reviewing, please pay special attention to the lock in > ShenandoahHeap::allocate_memory()! I'm always rather nervous about anybody who invents their own spinlocks. It's a code smell: that doesn't mean it's wrong here, but it does deserve attention. I presume the idea here is that the native allocation is going to be fairly rare because threads will usually allocate inline from their own TLABs. However, please consider the situation where a thread holding the lock is descheduled. Andrew. From rkennke at redhat.com Thu Dec 15 11:31:10 2016 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 15 Dec 2016 12:31:10 +0100 Subject: RFR: Locked allocation In-Reply-To: <955f480e-ea87-a37d-1049-f9d7346b99a4@redhat.com> References: <1481733389.2597.114.camel@redhat.com> <955f480e-ea87-a37d-1049-f9d7346b99a4@redhat.com> Message-ID: <1481801470.2807.1.camel@redhat.com> On Thursday, 15.12.2016 at 10:15 +0000, Andrew Haley wrote: > On 14/12/16 16:36, Roman Kennke wrote: > > When reviewing, please pay special attention to the lock in > > ShenandoahHeap::allocate_memory()! > > I'm always rather nervous about anybody who invents their own > spinlocks. Yeah, understandable. We are too, which is why we went to great efforts to implement a lock-free allocation scheme a while ago. But it was always buggy and very complex and hard to understand+debug. And humongous allocation was inherently racy: how would you deal with multiple regions in one go, without taking a lock, and while other threads are taking regions from under your feet? The same goes for expanding the heap.
And since we couldn't use Mutex (and don't need most of their functionality), the next best way to do it was to implement a small CAS-based spinlock. Besides, we already have been doing it, for heap expansion, but now it's better (using the right fences, etc). With my upcoming patch, it will also provide a scoped locker, and additional checks, for our sanity. > I presume the idea here is that the native allocation is going to be > fairly rare because threads will usually allocate inline from their > own TLABs. Yes. > However, please consider the situation where a thread > holding the lock is descheduled. Yes. We're doing a SpinPause() when spinning; this should get us back to the thread holding the lock quickly. If you have an idea how to improve this, let me know! gc-bench provides a couple of tests that bash the allocation code with multiple threads, and it did not find performance regressions or bugs. Roman From shade at redhat.com Thu Dec 15 11:32:02 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 15 Dec 2016 12:32:02 +0100 Subject: Bug: ReferenceProcessor does from-space writes? Message-ID: <8cbbae41-13e9-8e8e-6128-b2cbc9d061ca@redhat.com> Hi, Our CI brought us this assert: [VM ERROR] o.o.j.t.tearing.buffers.DirectByteBufferInterleaveTest (JVM args: [-Xmx16g, -XX:+ShenandoahStoreCheck, -XX:ShenandoahGCHeuristics=aggressive, -XX:+ShenandoahVerifyOptoBarriers, -XX:+VerifyStrictOopOperations, -XX:+UseShenandoahGC, -XX:-UseCompressedOops, -Xint]) Observed state Occurrences Expectation Interpretation 0, 128, 128 0 ACCEPTABLE Seeing all updates intact.
Messages: # To suppress the following error report, specify this argument # after -XX: or in .hotspotrc: SuppressErrorAt=/shenandoahBarrierSet.cpp:272 # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (/opt/jenkins/workspace/jdk9-shenandoah-fastdebug/hotspot/src/share/vm/gc/shenandoah/shenandoahBarrierSet.cpp:272), pid=29026, tid=29028 # assert(o == __null || oopDesc::unsafe_equals(o, resolve_oop_static(o))) failed: only write to-space values hs_err shows this stack: V [libjvm.so+0x15bc01f] VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x15f V [libjvm.so+0x15bcdda] VMError::report_and_die(Thread*, char const*, int, char const*, char const*, __va_list_tag*)+0x4a V [libjvm.so+0xa4e87a] report_vm_error(char const*, int, char const*, char const*, ...)+0xea V [libjvm.so+0x13b374f] ShenandoahBarrierSet::write_ref_field_work(void*, oop, bool)+0x11f V [libjvm.so+0x132f0e7] BarrierSet::write_ref_field(void*, oop, bool)+0x57 V [libjvm.so+0x132c6dd] ReferenceProcessor::enqueue_discovered_reflist(DiscoveredList&)+0x71d V [libjvm.so+0x1331582] RefProcEnqueueTask::work(unsigned int)+0xa2 V [libjvm.so+0x162b935] GangWorker::loop()+0xc5 V [libjvm.so+0x12315c2] thread_native_entry(Thread*)+0x112 The code is: ... next_d = java_lang_ref_Reference::discovered(obj); // RB here ... java_lang_ref_Reference::set_next_raw(obj, obj); if (! oopDesc::safe_equals(next_d, obj)) { oopDesc::bs()->write_ref_field( // !!! Oops, re-reading without RB here? java_lang_ref_Reference::discovered_addr(obj), next_d); Most uses of Reference::*_addr seem suspicious to me. Thanks, -Aleksey From rkennke at redhat.com Thu Dec 15 11:36:21 2016 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 15 Dec 2016 12:36:21 +0100 Subject: Bug: ReferenceProcessor does from-space writes? 
In-Reply-To: <8cbbae41-13e9-8e8e-6128-b2cbc9d061ca@redhat.com> References: <8cbbae41-13e9-8e8e-6128-b2cbc9d061ca@redhat.com> Message-ID: <1481801781.2807.3.camel@redhat.com> This is odd. During marking, we should only enqueue Reference objects that are in to-space. Adding read-barriers into ReferenceProcessor is most likely only hiding the real bug. The most likely cause is failing to mark a Reference object in the previous cycle, thus not evacuating it.... Is this reproducible? Roman On Thursday, 15.12.2016 at 12:32 +0100, Aleksey Shipilev wrote: > Hi, > > Our CI brought us this assert: > > [VM ERROR] o.o.j.t.tearing.buffers.DirectByteBufferInterleaveTest > (JVM args: [-Xmx16g, -XX:+ShenandoahStoreCheck, > -XX:ShenandoahGCHeuristics=aggressive, > -XX:+ShenandoahVerifyOptoBarriers, > -XX:+VerifyStrictOopOperations, -XX:+UseShenandoahGC, -XX:- > UseCompressedOops, > -Xint]) > Observed state   Occurrences   Expectation  Interpretation > > 0, 128, 128             0    ACCEPTABLE  Seeing all updates > intact.
> > > Messages: > # To suppress the following error report, specify this > argument > # after -XX: or in .hotspotrc: > SuppressErrorAt=/shenandoahBarrierSet.cpp:272 > # > # A fatal error has been detected by the Java Runtime > Environment: > # > # Internal Error > (/opt/jenkins/workspace/jdk9-shenandoah- > fastdebug/hotspot/src/share/vm/gc/shenandoah/shenandoahBarrierSet.cpp > :272), > pid=29026, tid=29028 > # assert(o == __null || oopDesc::unsafe_equals(o, > resolve_oop_static(o))) failed: only write to-space values > > > hs_err shows this stack: > > V [libjvm.so+0x15bc01f] VMError::report_and_die(int, char const*, > char const*, > __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, > int, > unsigned long)+0x15f > V [libjvm.so+0x15bcdda] VMError::report_and_die(Thread*, char > const*, int, > char const*, char const*, __va_list_tag*)+0x4a > V [libjvm.so+0xa4e87a] report_vm_error(char const*, int, char > const*, char > const*, ...)+0xea > V [libjvm.so+0x13b374f] ShenandoahBarrierSet::write_ref_field_work( > void*, oop, > bool)+0x11f > V [libjvm.so+0x132f0e7] BarrierSet::write_ref_field(void*, oop, > bool)+0x57 > V [libjvm.so+0x132c6dd] > ReferenceProcessor::enqueue_discovered_reflist(DiscoveredList&)+0x71d > V [libjvm.so+0x1331582] RefProcEnqueueTask::work(unsigned int)+0xa2 > V [libjvm.so+0x162b935] GangWorker::loop()+0xc5 > V [libjvm.so+0x12315c2] thread_native_entry(Thread*)+0x112 > > The code is: > > ... > next_d = java_lang_ref_Reference::discovered(obj); // RB here > ... > java_lang_ref_Reference::set_next_raw(obj, obj); > if (! oopDesc::safe_equals(next_d, obj)) { > oopDesc::bs()->write_ref_field( > // !!! Oops, re-reading without RB here? > java_lang_ref_Reference::discovered_addr(obj), > next_d); > > Most uses of Reference::*_addr seem suspicious to me.
> > Thanks, > -Aleksey > From shade at redhat.com Thu Dec 15 11:43:21 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 15 Dec 2016 12:43:21 +0100 Subject: Bug: ReferenceProcessor does from-space writes? In-Reply-To: <1481801781.2807.3.camel@redhat.com> References: <8cbbae41-13e9-8e8e-6128-b2cbc9d061ca@redhat.com> <1481801781.2807.3.camel@redhat.com> Message-ID: <9435ace2-37b1-1429-3e1c-862bcbdb8741@redhat.com> Reproduced in CI two times, failed to reproduce locally. -Aleksey On 12/15/2016 12:36 PM, Roman Kennke wrote: > This is odd. > > During marking, we should only enqueue Reference objects that are in > to-space. Adding read-barriers into ReferenceProcessor is most likely > only hiding the real bug. The most likely cause is failing to mark a > Reference object in previous cycle, thus not evacuating it.... > > Is this reproducible? > > Roman > > Am Donnerstag, den 15.12.2016, 12:32 +0100 schrieb Aleksey Shipilev: >> Hi, >> >> Our CI brought us this assert: >> >> [VM ERROR] o.o.j.t.tearing.buffers.DirectByteBufferInterleaveTest >> (JVM args: [-Xmx16g, -XX:+ShenandoahStoreCheck, >> -XX:ShenandoahGCHeuristics=aggressive, >> -XX:+ShenandoahVerifyOptoBarriers, >> -XX:+VerifyStrictOopOperations, -XX:+UseShenandoahGC, -XX:- >> UseCompressedOops, >> -Xint]) >> Observed state Occurrences Expectation Interpretation >> >> 0, 128, 128 0 ACCEPTABLE Seeing all updates >> intact. 
>> >> >> Messages: >> # To suppress the following error report, specify this >> argument >> # after -XX: or in .hotspotrc: >> SuppressErrorAt=/shenandoahBarrierSet.cpp:272 >> # >> # A fatal error has been detected by the Java Runtime >> Environment: >> # >> # Internal Error >> (/opt/jenkins/workspace/jdk9-shenandoah- >> fastdebug/hotspot/src/share/vm/gc/shenandoah/shenandoahBarrierSet.cpp >> :272), >> pid=29026, tid=29028 >> # assert(o == __null || oopDesc::unsafe_equals(o, >> resolve_oop_static(o))) failed: only write to-space values >> >> >> hs_err shows this stack: >> >> V [libjvm.so+0x15bc01f] VMError::report_and_die(int, char const*, >> char const*, >> __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, >> int, >> unsigned long)+0x15f >> V [libjvm.so+0x15bcdda] VMError::report_and_die(Thread*, char >> const*, int, >> char const*, char const*, __va_list_tag*)+0x4a >> V [libjvm.so+0xa4e87a] report_vm_error(char const*, int, char >> const*, char >> const*, ...)+0xea >> V [libjvm.so+0x13b374f] ShenandoahBarrierSet::write_ref_field_work( >> void*, oop, >> bool)+0x11f >> V [libjvm.so+0x132f0e7] BarrierSet::write_ref_field(void*, oop, >> bool)+0x57 >> V [libjvm.so+0x132c6dd] >> ReferenceProcessor::enqueue_discovered_reflist(DiscoveredList&)+0x71d >> V [libjvm.so+0x1331582] RefProcEnqueueTask::work(unsigned int)+0xa2 >> V [libjvm.so+0x162b935] GangWorker::loop()+0xc5 >> V [libjvm.so+0x12315c2] thread_native_entry(Thread*)+0x112 >> >> The code is: >> >> ... >> next_d = java_lang_ref_Reference::discovered(obj); // RB here >> ... >> java_lang_ref_Reference::set_next_raw(obj, obj); >> if (! oopDesc::safe_equals(next_d, obj)) { >> oopDesc::bs()->write_ref_field( >> // !!! Oops, re-reading without RB here? >> java_lang_ref_Reference::discovered_addr(obj), >> next_d); >> >> Most uses of Reference::*_addr seem suspicious to me. 
>> >> Thanks, >> -Aleksey >> From aph at redhat.com Thu Dec 15 11:44:37 2016 From: aph at redhat.com (Andrew Haley) Date: Thu, 15 Dec 2016 11:44:37 +0000 Subject: RFR: Locked allocation In-Reply-To: <1481801470.2807.1.camel@redhat.com> References: <1481733389.2597.114.camel@redhat.com> <955f480e-ea87-a37d-1049-f9d7346b99a4@redhat.com> <1481801470.2807.1.camel@redhat.com> Message-ID: On 15/12/16 11:31, Roman Kennke wrote: > On Thursday, 15.12.2016 at 10:15 +0000, Andrew Haley wrote: >> On 14/12/16 16:36, Roman Kennke wrote: >>> When reviewing, please pay special attention to the lock in ShenandoahHeap::allocate_memory()! > >> However, please consider the situation where a thread holding the lock is descheduled. > > Yes. We're doing a SpinPause() when spinning, this should get us back to the thread holding the lock quickly. If you have an idea how to improve this, let me know! Please have a look at the way SpinPause() is defined! Maybe it's worth looking at backoff after spinning for a while. But it's very hard to test for consistent behaviour under extreme conditions. Allocating very large objects is quite likely to result in page faults, and therefore quite likely to cause a thread to be descheduled. On a heavily loaded system I would expect long delays for page faults, while the lock is held. I fear that it's very tempting to design Shenandoah so that it behaves extremely well when it's not being "abused". > gc-bench provides a couple of tests that bash the allocation code with multiple threads, and it did not find performance regressions or bugs. Sure, but I'm thinking about systems which are overloaded. I don't know if gc-bench would help there. I presume that you have considered allocating humongous objects outside of Shenandoah's regions altogether. But even mentioning such a thing takes me way outside my area of expertise, so... Andrew.
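The backoff Andrew raises is commonly sketched as bounded spinning followed by yielding to the scheduler, so that a descheduled lock holder gets CPU time to finish and unlock. This is an illustrative sketch with hypothetical names and thresholds, not what the patch under review actually does (it just calls SpinPause() in the loop):

```cpp
#include <atomic>
#include <thread>

// Hypothetical spin-then-yield lock acquisition. Spin a bounded number of
// CAS attempts; past that, yield the CPU so a descheduled lock holder can
// run. The threshold of 1000 is illustrative only.
void lock_with_backoff(std::atomic<int>& state) {
  int expected = 0;
  int spins = 0;
  while (!state.compare_exchange_weak(expected, 1, std::memory_order_acquire)) {
    expected = 0;
    if (++spins < 1000) {
      // busy-wait phase: HotSpot would call SpinPause() here
    } else {
      std::this_thread::yield();  // back off to the scheduler
    }
  }
}

void unlock(std::atomic<int>& state) {
  state.store(0, std::memory_order_release);
}
```

The trade-off is latency versus wasted cycles: pure spinning reacts fastest when the holder is running on another core, while yielding helps exactly in the overloaded, descheduled-holder scenario described above.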
From shade at redhat.com Thu Dec 15 11:55:07 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 15 Dec 2016 12:55:07 +0100 Subject: RFR: Locked allocation In-Reply-To: References: <1481733389.2597.114.camel@redhat.com> <955f480e-ea87-a37d-1049-f9d7346b99a4@redhat.com> <1481801470.2807.1.camel@redhat.com> Message-ID: <07f6da55-a37c-efd4-a14f-d3113b336a9d@redhat.com> On 12/15/2016 12:44 PM, Andrew Haley wrote: > Maybe it's worth looking at backoff after spinning for a while. But > it's very hard to test for consistent behaviour under extreme > conditions. Allocating very large objects is quite likely to > result in page faults, and therefore quite likely to cause a > thread to be descheduled. On a heavily loaded system I would > expect long delays for page faults, while the lock is held. Generally true. But I think current change only covers the freelist/region manipulation work, which should complete very quickly. The initialization (which is the hard part of doing "new" on large Java objects) should and will happen outside the spinlocked path. Think about this as coarsening the current juggling-the-knives lock-free mechanics with a spinlocked entry to the small critical section. We are not expected to do any heavy-lifting while holding that lock. This minimizes the need for sophisticated backoffs, etc. 
Pretty much how we don't usually think about backoffs with lock-free update code :) Thanks, -Aleksey From rkennke at redhat.com Thu Dec 15 12:01:05 2016 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 15 Dec 2016 13:01:05 +0100 Subject: RFR: Locked allocation In-Reply-To: References: <1481733389.2597.114.camel@redhat.com> <955f480e-ea87-a37d-1049-f9d7346b99a4@redhat.com> <1481801470.2807.1.camel@redhat.com> Message-ID: <1481803265.2807.5.camel@redhat.com> On Thursday, 15.12.2016 at 11:44 +0000, Andrew Haley wrote: > On 15/12/16 11:31, Roman Kennke wrote: > > On Thursday, 15.12.2016 at 10:15 +0000, Andrew Haley wrote: > > > On 14/12/16 16:36, Roman Kennke wrote: > > > > When reviewing, please pay special attention to the lock in > > > > ShenandoahHeap::allocate_memory()! > > > However, please consider the situation where a thread holding the > > > lock is descheduled. > > > > Yes. We're doing a SpinPause() when spinning, this should get us > > back to the thread holding the lock quickly. If you have an idea > > how to improve this, let me know! > > Please have a look at the way SpinPause() is defined! I did. > Maybe it's worth looking at backoff after spinning for a while. But > it's very hard to test for consistent behaviour under extreme > conditions. Allocating very large objects is quite likely to > result in page faults, and therefore quite likely to cause a > thread to be descheduled. On a heavily loaded system I would > expect long delays for page faults, while the lock is held. > > I fear that it's very tempting to design Shenandoah so that it > behaves extremely well when it's not being "abused". > > > gc-bench provides a couple of tests that bash the allocation code > > with multiple threads, and it did not find performance regressions > > or bugs. > > Sure, but I'm thinking about systems which are overloaded. I don't > know if gc-bench would help there. I think it's specifically designed to abuse the GC as much as we can.
;-) Aleksey even wrote a test that allocates arrays without initializing them, cranking out alloc rates in the 100s of GB/sec ... cannot really do that with ordinary Java code, but should abuse the GC quite a lot. :-D And I firmly believe that doing a simple lock around the allocation code is much more resistant to abuse than the previous implementation, where multiple threads racing to allocate humongous objects could lock-step each other; I think it couldn't even guarantee to complete... it's much better now I think. Also, speaking of code smell... the previous lock-free code, well, 'code smell' is not the right word for it ;-) stinking pile of.. well, you get the idea ;-) > I presume that you have considered allocating humongous object > outside > of Shenandoah's regions altogether. But even mentioning such a thing > takes me way outside my area of expertise, so... yeah... nope ;-) http://replycandy.com/wp-content/uploads/Godzilla-Nope-Response-Meme.jpg (thanks shade for pointing me to the picture ;-) ) Roman From aph at redhat.com Thu Dec 15 12:01:56 2016 From: aph at redhat.com (Andrew Haley) Date: Thu, 15 Dec 2016 12:01:56 +0000 Subject: RFR: Locked allocation In-Reply-To: <07f6da55-a37c-efd4-a14f-d3113b336a9d@redhat.com> References: <1481733389.2597.114.camel@redhat.com> <955f480e-ea87-a37d-1049-f9d7346b99a4@redhat.com> <1481801470.2807.1.camel@redhat.com> <07f6da55-a37c-efd4-a14f-d3113b336a9d@redhat.com> Message-ID: On 15/12/16 11:55, Aleksey Shipilev wrote: > Think about this as coarsening the current juggling-the-knives > lock-free mechanics with a spinlocked entry to the small critical > section. We are not expected to do any heavy-lifting while holding > that lock. This minimizes the need for sophisticated backoffs, > etc. Pretty much how we don't usually think about backoffs with > lock-free update code :) OK. Andrew.
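Aleksey's point about keeping the critical section small can be sketched as: hold the lock only for the free-list bookkeeping (pointer bumps and accounting), and do the potentially slow, page-faulting initialization of the new object after releasing it. Everything below is illustrative with hypothetical names, not the actual patch:

```cpp
#include <atomic>
#include <cstddef>
#include <cstring>

// Illustrative only: the spin-lock guards just the bookkeeping; the heavy
// lifting (zeroing a possibly huge allocation) happens outside the lock.
struct FreeList {
  std::atomic<int> lock{0};
  size_t top = 0;
  char heap[1 << 20];  // stand-in for the region storage

  void* allocate(size_t size) {
    int expected = 0;
    while (!lock.compare_exchange_weak(expected, 1, std::memory_order_acquire)) {
      expected = 0;  // spin: the critical section below is short
    }
    void* mem = nullptr;
    if (top + size <= sizeof(heap)) {  // short critical section:
      mem = heap + top;                // just a pointer bump and accounting
      top += size;
    }
    lock.store(0, std::memory_order_release);
    if (mem != nullptr) {
      std::memset(mem, 0, size);       // heavy lifting outside the lock
    }
    return mem;
  }
};
```

Because the holder never page-faults or blocks inside the critical section, the window in which another thread can observe the lock taken stays tiny, which is what makes the plain spin (without backoff) defensible here.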
From rwestrel at redhat.com Thu Dec 15 12:04:55 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 15 Dec 2016 13:04:55 +0100 Subject: RFR: Locked allocation In-Reply-To: <1481801470.2807.1.camel@redhat.com> References: <1481733389.2597.114.camel@redhat.com> <955f480e-ea87-a37d-1049-f9d7346b99a4@redhat.com> <1481801470.2807.1.camel@redhat.com> Message-ID: > And since we couldn't use Mutex (and don't need most of their > functionality), the next best way to do it was implement a small cas- > based spinlock. Even a VM Mutex with the no_safepoint_check_flag? Roland. From rkennke at redhat.com Thu Dec 15 12:10:57 2016 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 15 Dec 2016 13:10:57 +0100 Subject: RFR: Locked allocation In-Reply-To: References: <1481733389.2597.114.camel@redhat.com> <955f480e-ea87-a37d-1049-f9d7346b99a4@redhat.com> <1481801470.2807.1.camel@redhat.com> Message-ID: <1481803857.2807.7.camel@redhat.com> On Thursday, 15.12.2016 at 13:04 +0100, Roland Westrelin wrote: > > And since we couldn't use Mutex (and don't need most of their > > functionality), the next best way to do it was implement a small > > cas- > > based spinlock. > > Even a VM Mutex with the no_safepoint_check_flag? One issue was that Mutex was expecting the thread in VM, unless it's rank special. We can only be in VM when we have a non-leaf call at the write barrier. If I make the lock ranked 'special' I run into asserts that check correct lock ordering. We need to allocate stuff when evacuating roots, and this is holding the CodeCache_lock which is also ranked 'special' etc pp. We could probably add some extra code for Shenandoah to Mutex that avoids all this stuff, but would that be better than implementing the simple lock as I did?
Roman From rwestrel at redhat.com Thu Dec 15 12:23:49 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 15 Dec 2016 13:23:49 +0100 Subject: RFR: Locked allocation In-Reply-To: <1481803857.2807.7.camel@redhat.com> References: <1481733389.2597.114.camel@redhat.com> <955f480e-ea87-a37d-1049-f9d7346b99a4@redhat.com> <1481801470.2807.1.camel@redhat.com> <1481803857.2807.7.camel@redhat.com> Message-ID: > One issue was that Mutex was expecting the thread in VM, unless it's > rank special. We can only be in VM when we have a non-leaf call at the > write barrier. > > If I make the lock ranked 'special' I run into asserts that check > correct lock ordering. We need to allocate stuff when evacuating roots, > and this is holding the CodeCache_lock which is also ranked 'special' > etc pp. Ok. Thanks. Roland. From rkennke at redhat.com Thu Dec 15 14:40:49 2016 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 15 Dec 2016 15:40:49 +0100 Subject: RFR: Locked allocation In-Reply-To: <1481741825.2597.118.camel@redhat.com> References: <1481733389.2597.114.camel@redhat.com> <5aad0960-9108-f409-5f5c-eda2c286867c@redhat.com> <1481741825.2597.118.camel@redhat.com> Message-ID: <1481812849.2807.10.camel@redhat.com> So, here comes the update. - I improved the lock to be a scoped locker (similar to MutexLocker), this should help to keep things in order. - It also keeps track of the locking thread in debug builds, and provides asserts that the current thread holds the lock. - I added this check in a few places in ShenandoahFreeSet, and then realized that consequently I should also require the same lock in any code that reads or modifies the ShenandoahFreeSet structure. Therefore I added locking to the few places that build the free list. While strictly speaking this is overkill, it doesn't hurt either. 
- I also found the reason for the assert: the implementations of current() and next() have been a little inconsistent, which led to the allocating thread seeing the same region 2x when hitting the upper boundary (i.e. shortly before OOM), and therefore accounting the remaining free space 2x. I changed current() to only return the region at the current ptr, and next() to only advance that ptr (but not returning anything), and adjusted calling code. - I fixed the few things that Aleksey and Zhengyu mentioned too. Tested with specjvm, jmh-specjvm, gc-bench, jtreg in release and fastdebug. http://cr.openjdk.java.net/~rkennke/lockedalloc/webrev.01 Ok? Roman On Wednesday, 14.12.2016 at 19:57 +0100, Roman Kennke wrote: > On Wednesday, 14.12.2016 at 13:10 -0500, Zhengyu Gu wrote: > > Great job! It simplifies the logic a lot! > > > > A few minor suggestions: > > > > - ShenandoahFreeSet::clear() > > > > I only see one path to this method and it is from safepoint. so > > replacing fence with safepoint assertion should be appropriate. > > Ah yes. I was thinking it solved the assert that you and others were > facing. My reasoning was that other threads within the same safepoint > would need to see the update. However, now that I think about it, > those > other threads would need to go through our new-fangled lock, and thus > a > CAS, and thus a fence... hmmm. Will need to try again. You may be > right and this fence is bogus. > > > - asserting on _heap_lock == 1 on code paths that are protected by > > the lock > > makes code more readable. > > Yes. I was actually having the same idea as you and store the locking > thread for debug checking, and do an opaque lock object, and even a > scoped locker. All that should contribute to sanity. > > > - Will this lock be hot? > > I don't think it's very hot. > > > and you want to check safepoint during spinning? > > Nope.
The whole point of this exercise was to avoid potentially > safepointing (and thus requiring oopmap, debug-info, etc blah blah at > write barriers) :-) > > > I wonder if it has impact on TTSP > I doubt. gc-bench didn't show any such thing. In fact, it might be > better than before now, at least when you've got threads racing to > allocate humongous objects. The previous code was not even guaranteed > to complete (could interleave claiming regions, never finding a > contiguous block). > > Will come up with a patch later. Need food first. ;-) > > Roman > > > > > Thanks, > > > > -Zhengyu > > > > On 12/14/2016 11:36 AM, Roman Kennke wrote: > > > This patch throws out all the lockfree allocation madness, and > > > implements a much simpler locked allocation. Since we can't > > > easily > > > use > > > Mutex and friends, and also don't need most of their > > > functionality > > > (wait/notify, nesting, etc), I implemented a very simple (simple > > > as > > > in, > > > can read-and-understand it in one glance) CAS based spin-lock. > > > This > > > is > > > wrapped around the normal allocation path, the humongous > > > allocation > > > path and the heap growing path. It is not locking around the call > > > to > > > full-gc, as this involves other locks and as CHF says, there are > > > alligators there ;-) > > > > > > This does immensely simplify ShenandoahFreeSet, especially the > > > racy > > > humongous allocation path. It does fix the bug that some people > > > have > > > encountered about used not consistent with capacity. > > > > > > I've tested it using gc-bench (no regression in allocation > > > throughput), > > > SPECjvm and jtreg tests. Looks all fine. > > > > > > When reviewing, please pay special attention to the lock in > > > ShenandoahHeap::allocate_memory()! > > > > > > http://cr.openjdk.java.net/~rkennke/lockedalloc/webrev.00/ > > > > > > Ok? 
> > > > > > Roman > > > > From shade at redhat.com Thu Dec 15 14:53:10 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 15 Dec 2016 15:53:10 +0100 Subject: RFR: Locked allocation In-Reply-To: <1481812849.2807.10.camel@redhat.com> References: <1481733389.2597.114.camel@redhat.com> <5aad0960-9108-f409-5f5c-eda2c286867c@redhat.com> <1481741825.2597.118.camel@redhat.com> <1481812849.2807.10.camel@redhat.com> Message-ID: <3f866843-01a9-564c-a300-02a256cdc5b8@redhat.com> On 12/15/2016 03:40 PM, Roman Kennke wrote: > http://cr.openjdk.java.net/~rkennke/lockedalloc/webrev.01 Looks good. Minor nit: asserts in ShenandoahHeapLock may use ShenandoahHeap::assert_heaplock_owned_by_current_thread? Thanks, -Aleksey From zgu at redhat.com Thu Dec 15 15:11:39 2016 From: zgu at redhat.com (Zhengyu Gu) Date: Thu, 15 Dec 2016 10:11:39 -0500 Subject: RFR: Locked allocation In-Reply-To: <1481812849.2807.10.camel@redhat.com> References: <1481733389.2597.114.camel@redhat.com> <5aad0960-9108-f409-5f5c-eda2c286867c@redhat.com> <1481741825.2597.118.camel@redhat.com> <1481812849.2807.10.camel@redhat.com> Message-ID: <6481f1c3-9e7c-226e-55b7-3e8eb87d41c0@redhat.com> Looks good! One minor thing: ShenandoahHeap::assert_heaplock_owned_by_current_thread() can be debug_only method. Thanks, -Zhengyu On 12/15/2016 09:40 AM, Roman Kennke wrote: > So, here comes the update. > > - I improved the lock to be a scoped locker (similar to MutexLocker), > this should help to keep things in order. > - It also keeps track of the locking thread in debug builds, and > provides asserts that the current thread holds the lock. > - I added this check in a few places in ShenandoahFreeSet, and then > realized that consequently I should also require the same lock in any > code that reads or modifies the ShenandoahFreeSet structure. Therefore > I added locking to the few places that build the free list. While > strictly speaking this is overkill, it doesn't hurt either. 
> > - I also found the reason for the assert: the implementations of > current() and next() have been a little inconsistent, which lead to > allocating thread seeing the same region 2x when hitting the upper > boundary (i.e. shortly before OOM), and therefore accounting the > remaining free space 2x. I changed current() to only return the region > at the current ptr, and next() to only advance that ptr (but not > returning anything), and adjusted calling code. > > - I fixed the few things that Aleksey and Zhengyu mentioned too. > > Tested with specjvm, jmh-specjvm, gc-bench, jtreg in release and > fastdebug. > > http://cr.openjdk.java.net/~rkennke/lockedalloc/webrev.01 > > Ok? > > Roman > > > Am Mittwoch, den 14.12.2016, 19:57 +0100 schrieb Roman Kennke: >> Am Mittwoch, den 14.12.2016, 13:10 -0500 schrieb Zhengyu Gu: >>> Great job! It simplifies the logic a lot! >>> >>> A few minor suggestions: >>> >>> - ShenandoahFreeSet::clear() >>> >>> I only see one path to this method and it is from safepoint. so >>> replacing fence with safepoint assertion should be appropriate. >> Ah yes. I was thinking it solved the assert that you and others were >> facing. My reasoning was that other threads within the same safepoint >> would need to see the update. However, now that I think about it, >> those >> other threads would need to go through our new-fangled lock, and thus >> a >> CAS, and thus a fence... hmmm. Will need to try again. You may be >> right and this fence is bogus. >> >>> - asserting on _heap_lock == 1 on code paths that are protected by >>> the lock >>> makes code more readable. >> Yes. I was actually having the same idea as you and store the locking >> thread for debug checking, and do an opaque lock object, and even a >> scoped locker. All that should contribute to sanity. >> >>> - Will this lock be hot? >> I don't think it's very hot. >> >>> and you want to check safepoint during spinning? >> Nope. 
The whole point of this excerise was to avoid potentially >> safepointing (and thus requiring oopmap, debug-info, etc blah blah at >> write barriers) :-) >> >>> I wonder if it has impact on TTSP >> I doubt. gc-bench didn't show any such thing. In fact, it might be >> better than before now, at least when you've got threads racing to >> allocate humongous objects. The previous code was not even guaranteed >> to complete (could interleave claiming regions, never finding a >> contiguous block). >> >> Will come up with a patch later. Need food first. ;-) >> >> Roman >> >>> Thanks, >>> >>> -Zhengyu >>> >>> On 12/14/2016 11:36 AM, Roman Kennke wrote: >>>> This patch throws out all the lockfree allocation madness, and >>>> implements a much simpler locked allocation. Since we can't >>>> easily >>>> use >>>> Mutex and friends, and also don't need most of their >>>> functionality >>>> (wait/notify, nesting, etc), I implemented a very simple (simple >>>> as >>>> in, >>>> can read-and-understand it in one glance) CAS based spin-lock. >>>> This >>>> is >>>> wrapped around the normal allocation path, the humongous >>>> allocation >>>> path and the heap growing path. It is not locking around the call >>>> to >>>> full-gc, as this involves other locks and as CHF says, there are >>>> alligators there ;-) >>>> >>>> This does immensely simplify ShenandoahFreeSet, especially the >>>> racy >>>> humongous allocation path. It does fix the bug that some people >>>> have >>>> encountered about used not consistent with capacity. >>>> >>>> I've tested it using gc-bench (no regression in allocation >>>> throughput), >>>> SPECjvm and jtreg tests. Looks all fine. >>>> >>>> When reviewing, please pay special attention to the lock in >>>> ShenandoahHeap::allocate_memory()! >>>> >>>> http://cr.openjdk.java.net/~rkennke/lockedalloc/webrev.00/ >>>> >>>> Ok? 
>>>> >>>> Roman >>> From roman at kennke.org Thu Dec 15 15:51:24 2016 From: roman at kennke.org (roman at kennke.org) Date: Thu, 15 Dec 2016 15:51:24 +0000 Subject: hg: shenandoah/jdk9/hotspot: Locked allocation Message-ID: <201612151551.uBFFpO76003765@aojmv0008.oracle.com> Changeset: 9fc91ebeb858 Author: rkennke Date: 2016-12-15 16:50 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/9fc91ebeb858 Locked allocation ! src/share/vm/gc/shenandoah/shenandoahFreeSet.cpp ! src/share/vm/gc/shenandoah/shenandoahFreeSet.hpp ! src/share/vm/gc/shenandoah/shenandoahHeap.cpp ! src/share/vm/gc/shenandoah/shenandoahHeap.hpp ! src/share/vm/gc/shenandoah/shenandoahHeapRegionSet.cpp ! src/share/vm/gc/shenandoah/shenandoahHeapRegionSet.hpp ! src/share/vm/gc/shenandoah/shenandoahMarkCompact.cpp ! src/share/vm/gc/shenandoah/shenandoah_globals.hpp From rkennke at redhat.com Thu Dec 15 15:54:32 2016 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 15 Dec 2016 16:54:32 +0100 Subject: RFR: JDK8 C2 fixes Message-ID: <1481817272.2807.12.camel@redhat.com> This change fixes two problems in library_call.cpp: - in inline_unsafe_access(), read-barriers should be moved up, otherwise we'd have one store in the else branch that does not have a read-barrier on its value. - for arraycopies, we must not turn oop-copies into int-copies, this would bypass the post-barrier that updates our references. With those changes, derby passes again without crashing. Ok? http://cr.openjdk.java.net/~rkennke/jdk8-c2-fix/webrev.00/ Roman From rwestrel at redhat.com Thu Dec 15 15:57:16 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 15 Dec 2016 16:57:16 +0100 Subject: RFR: JDK8 C2 fixes In-Reply-To: <1481817272.2807.12.camel@redhat.com> References: <1481817272.2807.12.camel@redhat.com> Message-ID: > - in inline_unsafe_access(), read-barriers should be moved up, > otherwise we'd have one store in the else branch that does not have a > read-barrier on its value. Is this one required? 
The else branch stores outside the heap as I understand. > - for arraycopies, we must not turn oop-copies into int-copies, this > would bypass the post-barrier that updates our references. Ok. Roland. From rkennke at redhat.com Thu Dec 15 15:58:00 2016 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 15 Dec 2016 16:58:00 +0100 Subject: RFR: Fix freeze when running OOM during write barrier Message-ID: <1481817480.2807.14.camel@redhat.com> We sometimes freeze when a write-barrier runs out of memory. Reason is the recent refactoring in our driver thread: we would skip turning off evacuation, however Java threads are waiting for this to happen. They'll wait indefinitely, and thus never return to a safepoint. http://cr.openjdk.java.net/~rkennke/fixfreeze/webrev.00/ Ok to push? Roman From shade at redhat.com Thu Dec 15 15:59:58 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 15 Dec 2016 16:59:58 +0100 Subject: RFR: Fix freeze when running OOM during write barrier In-Reply-To: <1481817480.2807.14.camel@redhat.com> References: <1481817480.2807.14.camel@redhat.com> Message-ID: On 12/15/2016 04:58 PM, Roman Kennke wrote: > We sometimes freeze when a write-barrier runs out of memory. Reason is > the recent refactoring in our driver thread: we would skip turning off > evacuation, however Java threads are waiting for this to happen. > They'll wait indefinitely, and thus never return to a safepoint. > > http://cr.openjdk.java.net/~rkennke/fixfreeze/webrev.00/ Yes. Sorry about this. -Aleksey From roman at kennke.org Thu Dec 15 16:01:15 2016 From: roman at kennke.org (roman at kennke.org) Date: Thu, 15 Dec 2016 16:01:15 +0000 Subject: hg: shenandoah/jdk9/hotspot: Fix freeze when running OOM during write barrier Message-ID: <201612151601.uBFG1F7Z006451@aojmv0008.oracle.com> Changeset: 9935fc55ebc2 Author: rkennke Date: 2016-12-15 17:00 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/9935fc55ebc2 Fix freeze when running OOM during write barrier ! 
src/share/vm/gc/shenandoah/shenandoahConcurrentThread.cpp From rkennke at redhat.com Thu Dec 15 16:09:42 2016 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 15 Dec 2016 17:09:42 +0100 Subject: RFR: JDK8 C2 fixes In-Reply-To: References: <1481817272.2807.12.camel@redhat.com> Message-ID: <1481818182.2807.15.camel@redhat.com> Am Donnerstag, den 15.12.2016, 16:57 +0100 schrieb Roland Westrelin: > > - in inline_unsafe_access(), read-barriers should be moved up, > > otherwise we'd have one store in the else branch that does not have > > a > > read-barrier on its value. > > Is this one required? The else branch stores outside the heap as I > understand. You are right, it's not needed. The problem goes away just with the arraycopy fix: http://cr.openjdk.java.net/~rkennke/jdk8-c2-fix/webrev.01/ Will push that then... Roman From roman at kennke.org Thu Dec 15 16:10:45 2016 From: roman at kennke.org (roman at kennke.org) Date: Thu, 15 Dec 2016 16:10:45 +0000 Subject: hg: shenandoah/jdk8u/hotspot: Prevent C2 optimization that turns oop arraycopy into int arraycopy and elide the required post-barrier. Message-ID: <201612151610.uBFGAjqw008754@aojmv0008.oracle.com> Changeset: cb8a8ef885c3 Author: rkennke Date: 2016-12-15 17:10 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/cb8a8ef885c3 Prevent C2 optimization that turns oop arraycopy into int arraycopy and elide the required post-barrier. ! 
src/share/vm/opto/library_call.cpp From rkennke at redhat.com Thu Dec 15 16:34:11 2016 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 15 Dec 2016 17:34:11 +0100 Subject: RFR: Fix ReferenceProcessor related assert Message-ID: <1481819651.2807.17.camel@redhat.com> Aleksey recently found an assert: # Internal Error (/opt/jenkins/workspace/jdk9-shenandoah-fastdebug/hotspot/src/share/vm/gc/shenandoah/shenandoahBarrierSet.cpp:272), pid=29026, tid=29028 # assert(o == __null || oopDesc::unsafe_equals(o, resolve_oop_static(o))) failed: only write to-space values coming from: V [libjvm.so+0xa4e87a] report_vm_error(char const*, int, char const*, char const*, ...)+0xea V [libjvm.so+0x13b374f] ShenandoahBarrierSet::write_ref_field_work(void*, oop, bool)+0x11f V [libjvm.so+0x132f0e7] BarrierSet::write_ref_field(void*, oop, bool)+0x57 V [libjvm.so+0x132c6dd] ReferenceProcessor::enqueue_discovered_reflist(DiscoveredList&)+0x71d I think this is harmless, but needs some treatment. What happens is this: in enqueue_discovered_reflist() it calls swap_reference_pending_list() which can give us a from-space reference (GC roots in Universe get updated after weakref processing!). Then it stores that in set_discovered_raw() which is ok, because that does the correct read-barrier before storing, but then goes on to call write_ref_field() which, for Shenandoah, only asserts a few things, and blows up when it gets a from-space reference. The cheapest fix is to do a read-barrier in debug build. http://cr.openjdk.java.net/~rkennke/fixrefproc/webrev.00/ Ok to push? 
Roman From shade at redhat.com Thu Dec 15 16:40:48 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 15 Dec 2016 17:40:48 +0100 Subject: RFR: Fix ReferenceProcessor related assert In-Reply-To: <1481819651.2807.17.camel@redhat.com> References: <1481819651.2807.17.camel@redhat.com> Message-ID: <0830ab17-4c99-6728-78cc-cf3b9a4cdc5d@redhat.com> On 12/15/2016 05:34 PM, Roman Kennke wrote: > http://cr.openjdk.java.net/~rkennke/fixrefproc/webrev.00/ Looks okay. -Aleksey From roman at kennke.org Thu Dec 15 16:41:51 2016 From: roman at kennke.org (roman at kennke.org) Date: Thu, 15 Dec 2016 16:41:51 +0000 Subject: hg: shenandoah/jdk9/hotspot: Fix assert coming from ReferenceProcessor. Message-ID: <201612151641.uBFGfqxm016867@aojmv0008.oracle.com> Changeset: d9e673adfa1c Author: rkennke Date: 2016-12-15 17:41 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/d9e673adfa1c Fix assert coming from ReferenceProcessor. ! src/share/vm/gc/shared/referenceProcessor.cpp From zgu at redhat.com Thu Dec 15 17:43:31 2016 From: zgu at redhat.com (Zhengyu Gu) Date: Thu, 15 Dec 2016 12:43:31 -0500 Subject: RFR: Use heuristics to determine the number of conc threads for each conc gc cycle Message-ID: <8bd11a25-988c-3862-51ae-c23c683ae41a@redhat.com> This is an experimental heuristics that determines the number of concurrent threads for each concurrent GC cycle. SPECjbb runs do not show obvious improvement, it seems to ramp up load quickly, so conc thread count stays high. The change set also contains some cleanup. http://cr.openjdk.java.net/~zgu/shenandoah/conc-worker-heuristics/webrev.00/ Test: SPECjbb, some of SPECjvm benchmarks. 
Thanks, -Zhengyu From chf at redhat.com Fri Dec 16 13:36:48 2016 From: chf at redhat.com (chf at redhat.com) Date: Fri, 16 Dec 2016 13:36:48 +0000 Subject: hg: shenandoah/jdk9/hotspot: Connection Matrix Message-ID: <201612161336.uBGDamuI003692@aojmv0008.oracle.com> Changeset: c5cd9ee7a881 Author: chf Date: 2016-12-15 14:32 -0500 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/c5cd9ee7a881 Connection Matrix ! src/share/vm/gc/shenandoah/shenandoahCollectorPolicy.cpp ! src/share/vm/gc/shenandoah/shenandoahCollectorPolicy.hpp ! src/share/vm/gc/shenandoah/shenandoahHeap.cpp ! src/share/vm/gc/shenandoah/shenandoahHeap.hpp ! src/share/vm/gc/shenandoah/shenandoahHeapRegionSet.cpp ! src/share/vm/gc/shenandoah/shenandoahHeapRegionSet.hpp ! src/share/vm/gc/shenandoah/shenandoah_globals.hpp From zgu at redhat.com Fri Dec 16 14:40:34 2016 From: zgu at redhat.com (Zhengyu Gu) Date: Fri, 16 Dec 2016 09:40:34 -0500 Subject: RFR:(XS): Small enhancement for large allocation Message-ID: When a large allocation fails, the current implementation only grows the heap by one region and retries. This is slightly inefficient. We can grow the heap by the required number of regions at once, to avoid the unnecessary loop. http://cr.openjdk.java.net/~zgu/shenandoah/large_alloc/webrev.00/ Thanks, -Zhengyu From shade at redhat.com Fri Dec 16 14:48:42 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 16 Dec 2016 15:48:42 +0100 Subject: RFR:(XS): Small enhancement for large allocation In-Reply-To: References: Message-ID: <53563771-6b45-eb00-0f50-7df1720ba013@redhat.com> On 12/16/2016 03:40 PM, Zhengyu Gu wrote: > When a large allocation fails, the current implementation only grows the heap by one region and > retries. This is slightly inefficient. > We can grow the heap by the required number of regions at once, to avoid the unnecessary loop. > > http://cr.openjdk.java.net/~zgu/shenandoah/large_alloc/webrev.00/ Been meaning to fix that! 
Shouldn't we instead fix the logic in ShenandoahHeap::allocate_memory_work, and not try to do another grow_heap_by in downcall to allocate_memory_under_lock -> allocate_large_memory? HeapWord* ShenandoahHeap::allocate_memory_work(size_t word_size) { ShenandoahHeapLock heap_lock(this); HeapWord* result = allocate_memory_under_lock(word_size); while (result == NULL && _num_regions < _max_regions) { grow_heap_by(1); // <--- depend on word_size here result = allocate_memory_under_lock(word_size); } return result; } Thanks, -Aleksey From rkennke at redhat.com Fri Dec 16 14:55:06 2016 From: rkennke at redhat.com (Roman Kennke) Date: Fri, 16 Dec 2016 15:55:06 +0100 Subject: RFR: Degenerating concurrent marking Message-ID: <1481900106.2807.20.camel@redhat.com> This patch implements what I call 'degenerating concurrent marking'. If, during concurrent mark, we run out of memory, instead of stopping, throwing away all marking data and doing a full-gc, it gracefully hands over all existing marking work to the subsequent final-mark pause, finishes marking there, and kicks off normal marking. The idea being that in most cases, the OOM is not happening because we got into a bad situation (fragmented heap or such) but only temporary alloc bursts or such, *and* chances are high that we're almost done marking anyway. I made it such that existing mark bitmaps, task queues, SATB buffers and weakref-queues are left intact; if the heuristics decide to go into degenerated concurrent marking, then the final-mark pause carries on where concurrent marking left off. Interestingly, the code for this is mostly in place already ... in final marking we already finish off marking in the way that we need. I needed to tweak the termination protocol in the taskqueue for that, and not clear task queues on cancellation. Instead I added a 'shortcut' in the case we need to terminate without draining the task queues. Please look at this carefully, I am not totally sure I got that right. 
In addition, I also re-wrote adaptive heuristics. It will start out with 10% free threshold (i.e. we start marking when 10% available space is left), and lower that if we have 5 successful markings in a row, and bump that up if we fail to complete concurrent marking. We limit the free threshold 30 References: <8bd11a25-988c-3862-51ae-c23c683ae41a@redhat.com> Message-ID: <1481901040.2807.22.camel@redhat.com> Am Donnerstag, den 15.12.2016, 12:43 -0500 schrieb Zhengyu Gu: > This is an experimental heuristics that determines the number of > concurrent threads for each concurrent GC cycle. > > SPECjbb runs do not show obvious improvement, it seems to ramp up > load quickly, so conc thread count stays high. > > > The change set also contains some cleanup. > > > http://cr.openjdk.java.net/~zgu/shenandoah/conc-worker-heuristics/web > rev.00/ > > > Test: > SPECjbb, some of SPECjvm benchmarks. > > > Thanks, > > -Zhengyu > Interesting. I think the patch is ok. However, under which situation do you expect an improvement? Can we construct a benchmark for this? I think that applications with high alloc pressure (like SPECjbb) will push us to maximum threads. Low alloc pressure would let us stay lower too, but those apps would likely not be dominated by GC work anyway. Roman From zgu at redhat.com Fri Dec 16 15:11:44 2016 From: zgu at redhat.com (Zhengyu Gu) Date: Fri, 16 Dec 2016 10:11:44 -0500 Subject: RFR:(XS): Small enhancement for large allocation In-Reply-To: <53563771-6b45-eb00-0f50-7df1720ba013@redhat.com> References: <53563771-6b45-eb00-0f50-7df1720ba013@redhat.com> Message-ID: Agree! http://cr.openjdk.java.net/~zgu/shenandoah/large_alloc/webrev.01/ -Zhengyu On 12/16/2016 09:48 AM, Aleksey Shipilev wrote: > On 12/16/2016 03:40 PM, Zhengyu Gu wrote: >> When large allocation fails, current implementation only grows heap by 1 and >> retry. This is slightly inefficient. >> We can grow the heap by required regions at once, to avoid unnecessary loop. 
>> >> http://cr.openjdk.java.net/~zgu/shenandoah/large_alloc/webrev.00/ > Been meaning to fix that! Shouldn't we instead fix the logic in > ShenandoahHeap::allocate_memory_work, and not try to do another grow_heap_by in > downcall to allocate_memory_under_lock -> allocate_large_memory? > > > HeapWord* ShenandoahHeap::allocate_memory_work(size_t word_size) { > ShenandoahHeapLock heap_lock(this); > > HeapWord* result = allocate_memory_under_lock(word_size); > while (result == NULL && _num_regions < _max_regions) { > grow_heap_by(1); // <--- depend on word_size here > result = allocate_memory_under_lock(word_size); > } > > return result; > } > > Thanks, > -Aleksey > From zgu at redhat.com Fri Dec 16 15:16:26 2016 From: zgu at redhat.com (Zhengyu Gu) Date: Fri, 16 Dec 2016 10:16:26 -0500 Subject: RFR: Use heuristics to determine the number of conc threads for each conc gc cycle In-Reply-To: <1481901040.2807.22.camel@redhat.com> References: <8bd11a25-988c-3862-51ae-c23c683ae41a@redhat.com> <1481901040.2807.22.camel@redhat.com> Message-ID: <874f388d-a504-6cd8-098e-2b33328d5f45@redhat.com> I am not sure either, I withdraw it for now and try to find some "real" applications. I think benchmarks distort the heuristics. I will separate clean up and send RFR for that part only. Thanks, -Zhengyu On 12/16/2016 10:10 AM, Roman Kennke wrote: > Am Donnerstag, den 15.12.2016, 12:43 -0500 schrieb Zhengyu Gu: >> This is an experimental heuristics that determines the number of >> concurrent threads for each concurrent GC cycle. >> >> SPECjbb runs do not show obvious improvement, it seems to ramp up >> load quickly, so conc thread count stays high. >> >> >> The change set also contains some cleanup. >> >> >> http://cr.openjdk.java.net/~zgu/shenandoah/conc-worker-heuristics/web >> rev.00/ >> >> >> Test: >> SPECjbb, some of SPECjvm benchmarks. >> >> >> Thanks, >> >> -Zhengyu >> > Interesting. I think the patch is ok. > > However, under which situation do you expect an improvement? 
Can we > construct a benchmark for this? > > I think that applications with high alloc pressure (like SPECjbb) will > push us to maximum threads. Low alloc pressure would let us stay lower > too, but those apps would likely not be dominated by GC work anyway. > > Roman From shade at redhat.com Fri Dec 16 15:17:04 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 16 Dec 2016 16:17:04 +0100 Subject: RFR:(XS): Small enhancement for large allocation In-Reply-To: References: <53563771-6b45-eb00-0f50-7df1720ba013@redhat.com> Message-ID: <1b67413c-1837-1cf7-4872-7b835a4b52c5@redhat.com> On 12/16/2016 04:11 PM, Zhengyu Gu wrote: > http://cr.openjdk.java.net/~zgu/shenandoah/large_alloc/webrev.01/ Yup. Why not the usual macro? align_size_up(word_size * HeapWordSize, ShenandoahHeapRegion::RegionSizeBytes) Thanks, -Aleksey From shade at redhat.com Fri Dec 16 15:18:07 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 16 Dec 2016 16:18:07 +0100 Subject: RFR:(XS): Small enhancement for large allocation In-Reply-To: <1b67413c-1837-1cf7-4872-7b835a4b52c5@redhat.com> References: <53563771-6b45-eb00-0f50-7df1720ba013@redhat.com> <1b67413c-1837-1cf7-4872-7b835a4b52c5@redhat.com> Message-ID: <34d59aea-3a79-473f-5332-13b4e27e3fae@redhat.com> On 12/16/2016 04:17 PM, Aleksey Shipilev wrote: > On 12/16/2016 04:11 PM, Zhengyu Gu wrote: >> http://cr.openjdk.java.net/~zgu/shenandoah/large_alloc/webrev.01/ > > Yup. > > Why not the usual macro? > align_size_up(word_size * HeapWordSize, ShenandoahHeapRegion::RegionSizeBytes) Nevermind :) -Aleksey From zgu at redhat.com Fri Dec 16 15:54:49 2016 From: zgu at redhat.com (zgu at redhat.com) Date: Fri, 16 Dec 2016 15:54:49 +0000 Subject: hg: shenandoah/jdk9/hotspot: 2 new changesets Message-ID: <201612161554.uBGFsnck007533@aojmv0008.oracle.com> Changeset: 0638df313dc4 Author: zgu Date: 2016-12-16 10:33 -0500 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/0638df313dc4 More efficient heap expansion ! 
src/share/vm/gc/shenandoah/shenandoahHeap.cpp Changeset: eb5f5b74878d Author: zgu Date: 2016-12-16 10:34 -0500 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/eb5f5b74878d Merge ! src/share/vm/gc/shenandoah/shenandoahHeap.cpp From zgu at redhat.com Fri Dec 16 17:02:44 2016 From: zgu at redhat.com (Zhengyu Gu) Date: Fri, 16 Dec 2016 12:02:44 -0500 Subject: RFR: Degenerating concurrent marking In-Reply-To: <1481900106.2807.20.camel@redhat.com> References: <1481900106.2807.20.camel@redhat.com> Message-ID: <338d66f1-a46b-dfb3-9a89-03010387c36b@redhat.com> Hi Roman, - taskqueue Adding force termination to TerminatorTerminator seems more logical to me class TerminatorTerminator: public CHeapObj { public: virtual bool should_exit_termination() = 0; virtual bool should_force_termination() = 0; }; - shenandoahConcurrentMark.cpp #392 Please update assert message. Otherwise, look good to me. Thanks, -Zhengyu On 12/16/2016 09:55 AM, Roman Kennke wrote: > This patch implements what I call 'degenerating concurrent marking'. > If, during concurrent mark, we run out of memory, instead of stopping, > throwing away all marking data and doing a full-gc, it gracefully hands > over all existing marking work to the subsequent final-mark pause, > finishes marking there, and kicks of normal marking. The idea being > that in most cases, the OOM is not happening because we got into a bad > situation (fragmented heap or such) but only temporary alloc bursts or > such, *and* chances are high that we're almost done marking anyway. > > I made it such that existing mark bitmaps, task queues, SATB buffers > and weakref-queues are left intact, if the heuristics decide to go into > degenerated concurrent marking, then the final-mark pause carries on > where concurrent marking left. Interestingly, the code for this is > mostly in place already ... in final marking we already finish off > marking in the way that we need. 
> > I needed to tweak the termination protocol in the taskqueue for that, > and not clear task queues on cancellation. Instead I added a 'shortcut' > in the case we need to terminate without draining the task queues. > Please look at this carefully, I am not totally sure I got that right. > > In addition, I also re-wrote adaptive heuristics. It will start out > with 10% free threshold (i.e. we start marking when 10% available space > is left), and lower that if we have 5 successful markings in a row, and > bump that up if we fail to complete concurrent marking. We limit the > free threshold 30 > This adaptive heuristics work very well for me, and I'm tempted to make > this default soon. It makes much better use of headroom, which means > fewer GC cycles, and thus better throughput. > > > http://cr.openjdk.java.net/~rkennke/degen-marking/webrev.00/ > > Ok? Opinions? > > Roman > From rkennke at redhat.com Fri Dec 16 19:33:38 2016 From: rkennke at redhat.com (Roman Kennke) Date: Fri, 16 Dec 2016 20:33:38 +0100 Subject: RFR: Degenerating concurrent marking In-Reply-To: <338d66f1-a46b-dfb3-9a89-03010387c36b@redhat.com> References: <1481900106.2807.20.camel@redhat.com> <338d66f1-a46b-dfb3-9a89-03010387c36b@redhat.com> Message-ID: <1481916818.2807.27.camel@redhat.com> Am Freitag, den 16.12.2016, 12:02 -0500 schrieb Zhengyu Gu: > Hi Roman, > > - taskqueue > ?? > ???Adding force termination to TerminatorTerminator seems more > logical to me > > class TerminatorTerminator: public CHeapObj { > public: > ???virtual bool should_exit_termination() = 0; > ???virtual bool should_force_termination() = 0; > }; > > - shenandoahConcurrentMark.cpp #392 > > ???Please update assert message. Ok. Like this: http://cr.openjdk.java.net/~rkennke/degen-marking/webrev.01/ Roman > > > Otherwise, look good to me. > > Thanks, > > -Zhengyu > > > On 12/16/2016 09:55 AM, Roman Kennke wrote: > > This patch implements what I call 'degenerating concurrent > > marking'. 
> > If, during concurrent mark, we run out of memory, instead of > > stopping, > > throwing away all marking data and doing a full-gc, it gracefully > > hands > > over all existing marking work to the subsequent final-mark pause, > > finishes marking there, and kicks of normal marking. The idea being > > that in most cases, the OOM is not happening because we got into a > > bad > > situation (fragmented heap or such) but only temporary alloc bursts > > or > > such, *and* chances are high that we're almost done marking anyway. > > > > I made it such that existing mark bitmaps, task queues, SATB > > buffers > > and weakref-queues are left intact, if the heuristics decide to go > > into > > degenerated concurrent marking, then the final-mark pause carries > > on > > where concurrent marking left. Interestingly, the code for this is > > mostly in place already ... in final marking we already finish off > > marking in the way that we need. > > > > I needed to tweak the termination protocol in the taskqueue for > > that, > > and not clear task queues on cancellation. Instead I added a > > 'shortcut' > > in the case we need to terminate without draining the task queues. > > Please look at this carefully, I am not totally sure I got that > > right. > > > > In addition, I also re-wrote adaptive heuristics. It will start out > > with 10% free threshold (i.e. we start marking when 10% available > > space > > is left), and lower that if we have 5 successful markings in a row, > > and > > bump that up if we fail to complete concurrent marking. We limit > > the > > free threshold 30 > configured. > > > > This adaptive heuristics work very well for me, and I'm tempted to > > make > > this default soon. It makes much better use of headroom, which > > means > > fewer GC cycles, and thus better throughput. > > > > > > http://cr.openjdk.java.net/~rkennke/degen-marking/webrev.00/ > > > > Ok? Opinions? 
> > > > Roman > > > > From roman at kennke.org Sat Dec 17 13:30:55 2016 From: roman at kennke.org (roman at kennke.org) Date: Sat, 17 Dec 2016 13:30:55 +0000 Subject: hg: shenandoah/jdk9/hotspot: Ensure metadata alive for Shenandoah too. Message-ID: <201612171330.uBHDUtJQ028583@aojmv0008.oracle.com> Changeset: baec38f7a7e5 Author: rkennke Date: 2016-12-17 14:30 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/baec38f7a7e5 Ensure metadata alive for Shenandoah too. ! src/share/vm/ci/ciObjectFactory.cpp From rkennke at redhat.com Sat Dec 17 13:32:18 2016 From: rkennke at redhat.com (Roman Kennke) Date: Sat, 17 Dec 2016 14:32:18 +0100 Subject: FYI: Ensure metadata alive for Shenandoah too Message-ID: <1481981538.2807.29.camel@redhat.com> I pushed the following fix. It fixes an occasional assert about a root object not being marked. diff --git a/src/share/vm/ci/ciObjectFactory.cpp b/src/share/vm/ci/ciObjectFactory.cpp --- a/src/share/vm/ci/ciObjectFactory.cpp +++ b/src/share/vm/ci/ciObjectFactory.cpp @@ -413,7 +413,7 @@ ???ASSERT_IN_VM; // We're handling raw oops here. ? ?#if INCLUDE_ALL_GCS -??if (!UseG1GC) { +??if (!(UseG1GC || UseShenandoahGC)) { ?????return; ???} ???Klass* metadata_owner_klass; From rkennke at redhat.com Sat Dec 17 13:41:23 2016 From: rkennke at redhat.com (Roman Kennke) Date: Sat, 17 Dec 2016 14:41:23 +0100 Subject: RFR: Degenerating concurrent marking In-Reply-To: <338d66f1-a46b-dfb3-9a89-03010387c36b@redhat.com> References: <1481900106.2807.20.camel@redhat.com> <338d66f1-a46b-dfb3-9a89-03010387c36b@redhat.com> Message-ID: <1481982083.2807.31.camel@redhat.com> As suggested by Zhengyu on IRC, I now changed it to: ????if (terminator != NULL && terminator->should_force_termination()) { ??????return true; ????} makes the code more readable. The assert that I observed was not caused by this change and is already fixed. Ok to go now? 
http://cr.openjdk.java.net/~rkennke/degen-marking/webrev.03 Roman Am Freitag, den 16.12.2016, 12:02 -0500 schrieb Zhengyu Gu: > Hi Roman, > > - taskqueue > > Adding force termination to TerminatorTerminator seems more > logical to me > > class TerminatorTerminator: public CHeapObj { > public: >   virtual bool should_exit_termination() = 0; >   virtual bool should_force_termination() = 0; > }; > > - shenandoahConcurrentMark.cpp #392 > > Please update assert message. > > > Otherwise, look good to me. > > Thanks, > > -Zhengyu > > > On 12/16/2016 09:55 AM, Roman Kennke wrote: > > This patch implements what I call 'degenerating concurrent > > marking'. > > If, during concurrent mark, we run out of memory, instead of > > stopping, > > throwing away all marking data and doing a full-gc, it gracefully > > hands > > over all existing marking work to the subsequent final-mark pause, > > finishes marking there, and kicks off normal marking. The idea being > > that in most cases, the OOM is not happening because we got into a > > bad > > situation (fragmented heap or such) but only temporary alloc bursts > > or > > such, *and* chances are high that we're almost done marking anyway. > > > > I made it such that existing mark bitmaps, task queues, SATB > > buffers > > and weakref-queues are left intact; if the heuristics decide to go > > into > > degenerated concurrent marking, then the final-mark pause carries > > on > > where concurrent marking left off. Interestingly, the code for this is > > mostly in place already ... in final marking we already finish off > > marking in the way that we need. > > > > I needed to tweak the termination protocol in the taskqueue for > > that, > > and not clear task queues on cancellation. Instead I added a > > 'shortcut' > > in the case we need to terminate without draining the task queues. > > Please look at this carefully, I am not totally sure I got that > > right. > > > > In addition, I also re-wrote adaptive heuristics. 
It will start out > > with 10% free threshold (i.e. we start marking when 10% available > > space > > is left), and lower that if we have 5 successful markings in a row, > > and > > bump that up if we fail to complete concurrent marking. We limit > > the > > free threshold 30 > configured. > > > > These adaptive heuristics work very well for me, and I'm tempted to > > make > > this the default soon. It makes much better use of headroom, which > > means > > fewer GC cycles, and thus better throughput. > > > > > > http://cr.openjdk.java.net/~rkennke/degen-marking/webrev.00/ > > > > Ok? Opinions? > > > > Roman > > > > From rkennke at redhat.com Sat Dec 17 15:52:34 2016 From: rkennke at redhat.com (Roman Kennke) Date: Sat, 17 Dec 2016 16:52:34 +0100 Subject: RFR: Degenerating concurrent marking In-Reply-To: <1481982083.2807.31.camel@redhat.com> References: <1481900106.2807.20.camel@redhat.com> <338d66f1-a46b-dfb3-9a89-03010387c36b@redhat.com> <1481982083.2807.31.camel@redhat.com> Message-ID: <1481989954.2807.32.camel@redhat.com> Am Samstag, den 17.12.2016, 14:41 +0100 schrieb Roman Kennke: > As suggested by Zhengyu on IRC, I now changed it to: > >     if (terminator != NULL && terminator->should_force_termination()) > { >       return true; >     } > > makes the code more readable. Hmm, no, this didn't work. We need the spinning as was proposed before: http://cr.openjdk.java.net/~rkennke/degen-marking/webrev.04 This passes all tests that I throw at it. ok to push? Roman > > The assert that I observed was not caused by this change and is > already > fixed. > > Ok to go now? > > http://cr.openjdk.java.net/~rkennke/degen-marking/webrev.03 > > Roman > > > Am Freitag, den 16.12.2016, 12:02 -0500 schrieb Zhengyu Gu: > > Hi Roman, > > - taskqueue > > 
> > Adding force termination to TerminatorTerminator seems more > > logical to me > > > > class TerminatorTerminator: public CHeapObj { > > public: > >   virtual bool should_exit_termination() = 0; > >   virtual bool should_force_termination() = 0; > > }; > > > > - shenandoahConcurrentMark.cpp #392 > > > > Please update assert message. > > > > > > Otherwise, look good to me. > > > > Thanks, > > > > -Zhengyu > > > > > > On 12/16/2016 09:55 AM, Roman Kennke wrote: > > > This patch implements what I call 'degenerating concurrent > > > marking'. > > > If, during concurrent mark, we run out of memory, instead of > > > stopping, > > > throwing away all marking data and doing a full-gc, it gracefully > > > hands > > > over all existing marking work to the subsequent final-mark > > > pause, > > > finishes marking there, and kicks off normal marking. The idea > > > being > > > that in most cases, the OOM is not happening because we got into > > > a > > > bad > > > situation (fragmented heap or such) but only temporary alloc > > > bursts > > > or > > > such, *and* chances are high that we're almost done marking > > > anyway. > > > > > > I made it such that existing mark bitmaps, task queues, SATB > > > buffers > > > and weakref-queues are left intact, if the heuristics decide to > > > go > > > into > > > degenerated concurrent marking, then the final-mark pause carries > > > on > > > where concurrent marking left. Interestingly, the code for this > > > is > > > mostly in place already ... in final marking we already finish > > > off > > > marking in the way that we need. > > > > > > I needed to tweak the termination protocol in the taskqueue for > > > that, > > > and not clear task queues on cancellation. Instead I added a > > > 'shortcut' > > > in the case we need to terminate without draining the task > > > queues. > > > Please look at this carefully, I am not totally sure I got that > > > right. > > > > > > In addition, I also re-wrote adaptive heuristics. 
It will start > > > out > > > with 10% free threshold (i.e. we start marking when 10% available > > > space > > > is left), and lower that if we have 5 successful markings in a > > > row, > > > and > > > bump that up if we fail to complete concurrent marking. We limit > > > the > > > free threshold 30 > > configured. > > > > > > This adaptive heuristics work very well for me, and I'm tempted > > > to > > > make > > > this default soon. It makes much better use of headroom, which > > > means > > > fewer GC cycles, and thus better throughput. > > > > > > > > > http://cr.openjdk.java.net/~rkennke/degen-marking/webrev.00/ > > > > > > Ok? Opinions? > > > > > > Roman > > > > > > > From rkennke at redhat.com Sun Dec 18 13:24:46 2016 From: rkennke at redhat.com (Roman Kennke) Date: Sun, 18 Dec 2016 14:24:46 +0100 Subject: RFR: Fix freeze on OOM during evacuation Message-ID: <1482067486.2807.35.camel@redhat.com> The run_service() loop in ShenandoahConcurrentThread can still deadlock when OOM happens during evacuation: when we get out of final-mark, but have not yet started the GC threads, a Java thread could OOM and the ShenandoahConcurrentThread never get to resetting the evacuation-in- progress flag. The Java thread would wait forever and not get to a safepoint, while the GC waits for Java threads to get to safepoint for the next pause. The change fixes it by always resetting the evac flag when coming out of service_normal_cycle(). Tested by running SPECjvm in a loop 24hours and jcstress. http://cr.openjdk.java.net/~rkennke/fix-oom-evac/webrev.00 Ok to push? Roman From roman at kennke.org Mon Dec 19 11:05:42 2016 From: roman at kennke.org (roman at kennke.org) Date: Mon, 19 Dec 2016 11:05:42 +0000 Subject: hg: shenandoah/jdk8u/hotspot: Ensure metadata alive for Shenandoah too. 
Message-ID: <201612191105.uBJB5gl4024641@aojmv0008.oracle.com> Changeset: 91b6e4811a5f Author: rkennke Date: 2016-12-19 12:05 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/91b6e4811a5f Ensure metadata alive for Shenandoah too. ! src/share/vm/ci/ciObjectFactory.cpp From roman at kennke.org Mon Dec 19 14:17:55 2016 From: roman at kennke.org (roman at kennke.org) Date: Mon, 19 Dec 2016 14:17:55 +0000 Subject: hg: shenandoah/jdk8u/hotspot: Added missing read-barrier to inline_unsafe_ordered_store() in C2 intrinsics. Message-ID: <201612191417.uBJEHtTI017325@aojmv0008.oracle.com> Changeset: c7ccb4a2b360 Author: rkennke Date: 2016-12-19 15:17 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/c7ccb4a2b360 Added missing read-barrier to inline_unsafe_ordered_store() in C2 intrinsics. ! src/share/vm/opto/library_call.cpp From rkennke at redhat.com Mon Dec 19 14:21:08 2016 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 19 Dec 2016 15:21:08 +0100 Subject: FYI: (Re-)Added missing read barrier in C2's inline_unsafe_ordered_store() intrinsic Message-ID: <1482157268.2807.42.camel@redhat.com> This one fell under the table, probably because it's not present in jdk9. http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/c7ccb4a2b360 Roman From zgu at redhat.com Mon Dec 19 14:25:54 2016 From: zgu at redhat.com (Zhengyu Gu) Date: Mon, 19 Dec 2016 09:25:54 -0500 Subject: FYI: (Re-)Added missing read barrier in C2's inline_unsafe_ordered_store() intrinsic In-Reply-To: <1482157268.2807.42.camel@redhat.com> References: <1482157268.2807.42.camel@redhat.com> Message-ID: <4f19513a-e4b0-a606-8dad-a114e41a610c@redhat.com> Should the read barrier be shenandoah only? -Zhengyu On 12/19/2016 09:21 AM, Roman Kennke wrote: > This one fell under the table, probably because it's not present in > jdk9. 
> > http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/c7ccb4a2b360 > > Roman > From rkennke at redhat.com Mon Dec 19 14:28:20 2016 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 19 Dec 2016 15:28:20 +0100 Subject: FYI: (Re-)Added missing read barrier in C2's inline_unsafe_ordered_store() intrinsic In-Reply-To: <4f19513a-e4b0-a606-8dad-a114e41a610c@redhat.com> References: <1482157268.2807.42.camel@redhat.com> <4f19513a-e4b0-a606-8dad-a114e41a610c@redhat.com> Message-ID: <1482157700.2807.43.camel@redhat.com> Am Montag, den 19.12.2016, 09:25 -0500 schrieb Zhengyu Gu: > Should the read barrier be shenandoah only? It already is. Yes, we should refactor this to be more obvious. Roman > > -Zhengyu > > > On 12/19/2016 09:21 AM, Roman Kennke wrote: > > This one fell under the table, probably because it's not present in > > jdk9. > > > > http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/c7ccb4a2b36 > > 0 > > > > Roman > > > > From zgu at redhat.com Mon Dec 19 14:30:00 2016 From: zgu at redhat.com (Zhengyu Gu) Date: Mon, 19 Dec 2016 09:30:00 -0500 Subject: FYI: (Re-)Added missing read barrier in C2's inline_unsafe_ordered_store() intrinsic In-Reply-To: <1482157700.2807.43.camel@redhat.com> References: <1482157268.2807.42.camel@redhat.com> <4f19513a-e4b0-a606-8dad-a114e41a610c@redhat.com> <1482157700.2807.43.camel@redhat.com> Message-ID: <4867f1d8-e97f-68fb-c6cb-9c29af4ed317@redhat.com> Okay, Thanks, -Zhengyu On 12/19/2016 09:28 AM, Roman Kennke wrote: > Am Montag, den 19.12.2016, 09:25 -0500 schrieb Zhengyu Gu: >> Should the read barrier be shenandoah only? > It already is. > > Yes, we should refactor this to be more obvious. > > Roman > >> -Zhengyu >> >> >> On 12/19/2016 09:21 AM, Roman Kennke wrote: >>> This one fell under the table, probably because it's not present in >>> jdk9. 
>>> >>> http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/c7ccb4a2b36 >>> 0 >>> >>> Roman >>> >> From rwestrel at redhat.com Mon Dec 19 14:35:30 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Mon, 19 Dec 2016 15:35:30 +0100 Subject: FYI: (Re-)Added missing read barrier in C2's inline_unsafe_ordered_store() intrinsic In-Reply-To: <1482157268.2807.42.camel@redhat.com> References: <1482157268.2807.42.camel@redhat.com> Message-ID: > This one fell under the table, probably because it's not present in > jdk9. > > http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/c7ccb4a2b360 Thanks for fixing that. Roland. From shade at redhat.com Mon Dec 19 15:43:01 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 19 Dec 2016 16:43:01 +0100 Subject: RFR: Fix freeze on OOM during evacuation In-Reply-To: <1482067486.2807.35.camel@redhat.com> References: <1482067486.2807.35.camel@redhat.com> Message-ID: On 12/18/2016 02:24 PM, Roman Kennke wrote: > http://cr.openjdk.java.net/~rkennke/fix-oom-evac/webrev.00 This webrev is contaminated with degenerate conc mark patch? -Aleksey From rkennke at redhat.com Mon Dec 19 16:25:52 2016 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 19 Dec 2016 17:25:52 +0100 Subject: RFR: Fix freeze on OOM during evacuation In-Reply-To: References: <1482067486.2807.35.camel@redhat.com> Message-ID: <1482164752.2807.45.camel@redhat.com> Am Montag, den 19.12.2016, 16:43 +0100 schrieb Aleksey Shipilev: > On 12/18/2016 02:24 PM, Roman Kennke wrote: > > http://cr.openjdk.java.net/~rkennke/fix-oom-evac/webrev.00 > > This webrev is contaminated with degenerate conc mark patch? Duh. http://cr.openjdk.java.net/~rkennke/fix-oom-evac/webrev.01/ Roman From roman at kennke.org Mon Dec 19 16:33:31 2016 From: roman at kennke.org (roman at kennke.org) Date: Mon, 19 Dec 2016 16:33:31 +0000 Subject: hg: shenandoah/jdk8u/hotspot: Add missing eq barrier in opto runtime. 
Message-ID: <201612191633.uBJGXV2r022457@aojmv0008.oracle.com> Changeset: eb39f84890cb Author: rkennke Date: 2016-12-19 17:33 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/eb39f84890cb Add missing eq barrier in opto runtime. ! src/share/vm/opto/runtime.cpp From rkennke at redhat.com Mon Dec 19 16:34:33 2016 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 19 Dec 2016 17:34:33 +0100 Subject: FYI: (Re-) add object eq barrier in OptoRuntime Message-ID: <1482165273.2807.46.camel@redhat.com> Another one that probably got lost because it did not exist in jdk9... http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/eb39f84890cb Roman From shade at redhat.com Mon Dec 19 18:41:05 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 19 Dec 2016 19:41:05 +0100 Subject: RFR: Fix freeze on OOM during evacuation In-Reply-To: <1482164752.2807.45.camel@redhat.com> References: <1482067486.2807.35.camel@redhat.com> <1482164752.2807.45.camel@redhat.com> Message-ID: <64c2ba03-e1c2-8d1f-9003-2781d9893639@redhat.com> On 12/19/2016 05:25 PM, Roman Kennke wrote: > Am Montag, den 19.12.2016, 16:43 +0100 schrieb Aleksey Shipilev: >> On 12/18/2016 02:24 PM, Roman Kennke wrote: >>> http://cr.openjdk.java.net/~rkennke/fix-oom-evac/webrev.00 >> >> This webrev is contaminated with degenerate conc mark patch? > > Duh. > > http://cr.openjdk.java.net/~rkennke/fix-oom-evac/webrev.01/ Okay. -Aleksey From rwestrel at redhat.com Mon Dec 19 20:10:00 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Mon, 19 Dec 2016 21:10:00 +0100 Subject: FYI: (Re-) add object eq barrier in OptoRuntime In-Reply-To: <1482165273.2807.46.camel@redhat.com> References: <1482165273.2807.46.camel@redhat.com> Message-ID: > Another one that probably got lost because it did not exist in jdk9... > > http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/eb39f84890cb Thanks for fixing that one too. Roland. 
From zgu at redhat.com Mon Dec 19 21:23:30 2016 From: zgu at redhat.com (Zhengyu Gu) Date: Mon, 19 Dec 2016 16:23:30 -0500 Subject: RFR: Degenerating concurrent marking In-Reply-To: <1481989954.2807.32.camel@redhat.com> References: <1481900106.2807.20.camel@redhat.com> <338d66f1-a46b-dfb3-9a89-03010387c36b@redhat.com> <1481982083.2807.31.camel@redhat.com> <1481989954.2807.32.camel@redhat.com> Message-ID: Okay. -Zhengyu On 12/17/2016 10:52 AM, Roman Kennke wrote: > Am Samstag, den 17.12.2016, 14:41 +0100 schrieb Roman Kennke: >> As suggested by Zhengyu on IRC, I now changed it to: >> >> if (terminator != NULL && terminator->should_force_termination()) >> { >> return true; >> } >> >> makes the code more readable. > Hmm, no, this didn't work. We need the spinning as was proposed before: > > http://cr.openjdk.java.net/~rkennke/degen-marking/webrev.04 > > This passes all tests that I throw at it. > > ok to push? > > Roman > >> The assert that I observed was not caused by this change and is >> already >> fixed. >> >> Ok to go now? >> >> http://cr.openjdk.java.net/~rkennke/degen-marking/webrev.03 >> >> Roman >> >> >> Am Freitag, den 16.12.2016, 12:02 -0500 schrieb Zhengyu Gu: >>> Hi Roman, >>> >>> - taskqueue >>> >>> Adding force termination to TerminatorTerminator seems more >>> logical to me >>> >>> class TerminatorTerminator: public CHeapObj { >>> public: >>> virtual bool should_exit_termination() = 0; >>> virtual bool should_force_termination() = 0; >>> }; >>> >>> - shenandoahConcurrentMark.cpp #392 >>> >>> Please update assert message. >>> >>> >>> Otherwise, look good to me. >>> >>> Thanks, >>> >>> -Zhengyu >>> >>> >>> On 12/16/2016 09:55 AM, Roman Kennke wrote: >>>> This patch implements what I call 'degenerating concurrent >>>> marking'. 
>>>> If, during concurrent mark, we run out of memory, instead of >>>> stopping, >>>> throwing away all marking data and doing a full-gc, it gracefully >>>> hands >>>> over all existing marking work to the subsequent final-mark >>>> pause, >>>> finishes marking there, and kicks of normal marking. The idea >>>> being >>>> that in most cases, the OOM is not happening because we got into >>>> a >>>> bad >>>> situation (fragmented heap or such) but only temporary alloc >>>> bursts >>>> or >>>> such, *and* chances are high that we're almost done marking >>>> anyway. >>>> >>>> I made it such that existing mark bitmaps, task queues, SATB >>>> buffers >>>> and weakref-queues are left intact, if the heuristics decide to >>>> go >>>> into >>>> degenerated concurrent marking, then the final-mark pause carries >>>> on >>>> where concurrent marking left. Interestingly, the code for this >>>> is >>>> mostly in place already ... in final marking we already finish >>>> off >>>> marking in the way that we need. >>>> >>>> I needed to tweak the termination protocol in the taskqueue for >>>> that, >>>> and not clear task queues on cancellation. Instead I added a >>>> 'shortcut' >>>> in the case we need to terminate without draining the task >>>> queues. >>>> Please look at this carefully, I am not totally sure I got that >>>> right. >>>> >>>> In addition, I also re-wrote adaptive heuristics. It will start >>>> out >>>> with 10% free threshold (i.e. we start marking when 10% available >>>> space >>>> is left), and lower that if we have 5 successful markings in a >>>> row, >>>> and >>>> bump that up if we fail to complete concurrent marking. We limit >>>> the >>>> free threshold 30>>> configured. >>>> >>>> This adaptive heuristics work very well for me, and I'm tempted >>>> to >>>> make >>>> this default soon. It makes much better use of headroom, which >>>> means >>>> fewer GC cycles, and thus better throughput. 
>>>> >>>> >>>> http://cr.openjdk.java.net/~rkennke/degen-marking/webrev.00/ >>>> >>>> Ok? Opinions? >>>> >>>> Roman >>>> >>> From roman at kennke.org Mon Dec 19 21:48:36 2016 From: roman at kennke.org (roman at kennke.org) Date: Mon, 19 Dec 2016 21:48:36 +0000 Subject: hg: shenandoah/jdk9/hotspot: 2 new changesets Message-ID: <201612192148.uBJLmaZB012487@aojmv0008.oracle.com> Changeset: fc0c2ad9497d Author: rkennke Date: 2016-12-19 22:09 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/fc0c2ad9497d Fix freeze on OOM during evacuation ! src/share/vm/gc/shenandoah/shenandoahConcurrentThread.cpp ! src/share/vm/gc/shenandoah/shenandoahHeap.cpp ! src/share/vm/gc/shenandoah/shenandoahHeap.inline.hpp Changeset: 84363ca14be9 Author: rkennke Date: 2016-12-19 22:48 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/84363ca14be9 Degenerating concurrent marking ! src/share/vm/gc/shared/taskqueue.cpp ! src/share/vm/gc/shared/taskqueue.hpp ! src/share/vm/gc/shenandoah/shenandoahCollectorPolicy.cpp ! src/share/vm/gc/shenandoah/shenandoahCollectorPolicy.hpp ! src/share/vm/gc/shenandoah/shenandoahConcurrentMark.cpp ! src/share/vm/gc/shenandoah/shenandoahConcurrentThread.cpp ! src/share/vm/gc/shenandoah/shenandoahHeap.hpp ! src/share/vm/gc/shenandoah/shenandoahTaskqueue.cpp ! src/share/vm/gc/shenandoah/shenandoahTaskqueue.hpp ! src/share/vm/gc/shenandoah/shenandoah_globals.hpp From rwestrel at redhat.com Tue Dec 20 10:04:08 2016 From: rwestrel at redhat.com (rwestrel at redhat.com) Date: Tue, 20 Dec 2016 10:04:08 +0000 Subject: hg: shenandoah/jdk8u/hotspot: null check bypasses read barrier Message-ID: <201612201004.uBKA48KQ012542@aojmv0008.oracle.com> Changeset: 05f696d8443b Author: roland Date: 2016-12-20 11:03 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/05f696d8443b null check bypasses read barrier ! 
src/share/vm/opto/compile.cpp From shade at redhat.com Tue Dec 20 10:57:21 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 20 Dec 2016 11:57:21 +0100 Subject: RFR (XS): Enable UseCountedLoopSafepoints with Shenandoah Message-ID: <9833e763-9b3b-8246-7584-20a8e033f4c6@redhat.com> Hi, Since we care mostly about pause times, and not the raw throughput, it makes sense to enable safepoints in counted loops. This makes us much more responsive (as in, TTSP is lower) in many interesting scenarios. Change: http://cr.openjdk.java.net/~shade/shenandoah/counted-loops/webrev.01/ The easiest example that is present in any workload of interest is looping through a large array/ArrayList. SPECjvm2008 throughput does appear affected where tight loops are present:

Benchmark                Mode  Cnt      Score     Error    Units

# -XX:-UseCountedLoopSafepoints
Compiler.compiler       thrpt   30    217.169 ±   5.166  ops/min
Compiler.sunflow        thrpt   30    473.940 ±  20.246  ops/min
Compress.test           thrpt   15    647.552 ±   3.528  ops/min
CryptoAes.test          thrpt   15     44.367 ±   2.402  ops/min
CryptoRsa.test          thrpt   15   2066.495 ±  11.809  ops/min
CryptoSignVerify.test   thrpt   15  10372.019 ±  50.713  ops/min
Derby.test              thrpt   30    375.954 ±  13.539  ops/min
MpegAudio.test          thrpt   15    197.299 ±   2.411  ops/min
ScimarkFFT.large        thrpt   15     55.618 ±   0.142  ops/min
ScimarkFFT.small        thrpt   15    664.370 ±   7.304  ops/min
ScimarkLU.large         thrpt   15     14.767 ±   0.082  ops/min
ScimarkLU.small         thrpt   15    926.435 ±   8.790  ops/min
ScimarkMonteCarlo.test  thrpt   15   4508.333 ±  68.869  ops/min
ScimarkSOR.large        thrpt   15     74.596 ±   0.052  ops/min
ScimarkSOR.small        thrpt   15    466.186 ±   1.308  ops/min
ScimarkSparse.large     thrpt   15     48.932 ±  11.991  ops/min
ScimarkSparse.small     thrpt   15    360.907 ±   6.739  ops/min
Serial.test             thrpt   30   8779.857 ±  77.717    ops/s
Sunflow.test            thrpt   15    124.546 ±   2.110  ops/min
XmlTransform.test       thrpt   20    429.422 ±  24.964  ops/min
XmlValidation.test      thrpt   30    773.254 ±   8.561  ops/min

# -XX:+UseCountedLoopSafepoints
Compiler.compiler       thrpt   20    213.199 ±   8.146  ops/min
Compiler.sunflow        thrpt   27    486.745 ±  21.118  ops/min
Compress.test           thrpt   15    637.303 ±   4.800  ops/min  <---  -1.5%
CryptoAes.test          thrpt   15     46.943 ±   0.345  ops/min
CryptoRsa.test          thrpt   15   2042.072 ±  12.379  ops/min  <---  -1.1%
CryptoSignVerify.test   thrpt   15  10240.459 ±  63.095  ops/min
Derby.test              thrpt   30    406.943 ±  12.625  ops/min
MpegAudio.test          thrpt   15    193.173 ±   1.414  ops/min
ScimarkFFT.large        thrpt   15     55.629 ±   0.104  ops/min
ScimarkFFT.small        thrpt   15    669.153 ±   6.683  ops/min
ScimarkLU.large         thrpt   15     13.510 ±   0.075  ops/min  <---  -8.5%
ScimarkLU.small         thrpt   15    581.737 ±   6.539  ops/min  <--- -37.3%
ScimarkMonteCarlo.test  thrpt   15   4485.049 ±  11.864  ops/min
ScimarkSOR.large        thrpt   15     74.594 ±   0.045  ops/min
ScimarkSOR.small        thrpt   15    421.046 ±   0.456  ops/min  <---  -9.6%
ScimarkSparse.large     thrpt   15     40.995 ±   0.283  ops/min
ScimarkSparse.small     thrpt   15    319.079 ±   1.391  ops/min  <--- -11.3%
Serial.test             thrpt   30   8717.823 ±  81.147    ops/s
Sunflow.test            thrpt   15    127.221 ±   1.578  ops/min
XmlTransform.test       thrpt   20    445.762 ±   8.278  ops/min
XmlValidation.test      thrpt   30    760.121 ±   9.963  ops/min

Note that Scimark are expected to regress that much: they do have very tight loops, and that's our problem: the TTSP times there are in multi-second range! The difference is explained by different code generation. For example, in the most dramatic ScimarkLU.small case: Hottest loop uses AVX2 (vmovdqu and friends): http://cr.openjdk.java.net/~shade/shenandoah/counted-loops/scimark-lu-shenandoah-minus.perfasm Hottest loop uses AVX (vmovsd and friends): http://cr.openjdk.java.net/~shade/shenandoah/counted-loops/scimark-lu-shenandoah-plus.perfasm As such, I believe enabling this by default, and figuring out code quality issues as we go forward is the sane tactics. 
Thanks, -Aleksey From rkennke at redhat.com Tue Dec 20 11:02:11 2016 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 20 Dec 2016 12:02:11 +0100 Subject: RFR (XS): Enable UseCountedLoopSafepoints with Shenandoah In-Reply-To: <9833e763-9b3b-8246-7584-20a8e033f4c6@redhat.com> References: <9833e763-9b3b-8246-7584-20a8e033f4c6@redhat.com> Message-ID: <1482231731.2807.54.camel@redhat.com> Am Dienstag, den 20.12.2016, 11:57 +0100 schrieb Aleksey Shipilev: > Hi, > > Since we care mostly about pause times, and not the raw throughput, > it makes > sense to enable safepoints in counted loops. This makes us much more > responsive > (as in, TTSP is lower) in many interesting scenarios. > > Change: > http://cr.openjdk.java.net/~shade/shenandoah/counted-loops/webrev.0 > 1/ > > The easiest example that is present in any workload of interest is > looping > through a large array/ArrayList. > > SPECjvm2008 throughput does appear affected where tight loops are > present: >
> Benchmark                Mode  Cnt      Score     Error    Units
>
> # -XX:-UseCountedLoopSafepoints
> Compiler.compiler       thrpt   30    217.169 ±   5.166  ops/min
> Compiler.sunflow        thrpt   30    473.940 ±  20.246  ops/min
> Compress.test           thrpt   15    647.552 ±   3.528  ops/min
> CryptoAes.test          thrpt   15     44.367 ±   2.402  ops/min
> CryptoRsa.test          thrpt   15   2066.495 ±  11.809  ops/min
> CryptoSignVerify.test   thrpt   15  10372.019 ±  50.713  ops/min
> Derby.test              thrpt   30    375.954 ±  13.539  ops/min
> MpegAudio.test          thrpt   15    197.299 ±   2.411  ops/min
> ScimarkFFT.large        thrpt   15     55.618 ±   0.142  ops/min
> ScimarkFFT.small        thrpt   15    664.370 ±   7.304  ops/min
> ScimarkLU.large         thrpt   15     14.767 ±   0.082  ops/min
> ScimarkLU.small         thrpt   15    926.435 ±   8.790  ops/min
> ScimarkMonteCarlo.test  thrpt   15   4508.333 ±  68.869  ops/min
> ScimarkSOR.large        thrpt   15     74.596 ±   0.052  ops/min
> ScimarkSOR.small        thrpt   15    466.186 ±   1.308  ops/min
> ScimarkSparse.large     thrpt   15     48.932 ±  11.991  ops/min
> ScimarkSparse.small     thrpt   15    360.907 ±   6.739  ops/min
> Serial.test             thrpt   30   8779.857 ±  77.717    ops/s
> Sunflow.test            thrpt   15    124.546 ±   2.110  ops/min
> XmlTransform.test       thrpt   20    429.422 ±  24.964  ops/min
> XmlValidation.test      thrpt   30    773.254 ±   8.561  ops/min
>
> # -XX:+UseCountedLoopSafepoints
> Compiler.compiler       thrpt   20    213.199 ±   8.146  ops/min
> Compiler.sunflow        thrpt   27    486.745 ±  21.118  ops/min
> Compress.test           thrpt   15    637.303 ±   4.800  ops/min  <---  -1.5%
> CryptoAes.test          thrpt   15     46.943 ±   0.345  ops/min
> CryptoRsa.test          thrpt   15   2042.072 ±  12.379  ops/min  <---  -1.1%
> CryptoSignVerify.test   thrpt   15  10240.459 ±  63.095  ops/min
> Derby.test              thrpt   30    406.943 ±  12.625  ops/min
> MpegAudio.test          thrpt   15    193.173 ±   1.414  ops/min
> ScimarkFFT.large        thrpt   15     55.629 ±   0.104  ops/min
> ScimarkFFT.small        thrpt   15    669.153 ±   6.683  ops/min
> ScimarkLU.large         thrpt   15     13.510 ±   0.075  ops/min  <---  -8.5%
> ScimarkLU.small         thrpt   15    581.737 ±   6.539  ops/min  <--- -37.3%
> ScimarkMonteCarlo.test  thrpt   15   4485.049 ±  11.864  ops/min
> ScimarkSOR.large        thrpt   15     74.594 ±   0.045  ops/min
> ScimarkSOR.small        thrpt   15    421.046 ±   0.456  ops/min  <---  -9.6%
> ScimarkSparse.large     thrpt   15     40.995 ±   0.283  ops/min
> ScimarkSparse.small     thrpt   15    319.079 ±   1.391  ops/min  <--- -11.3%
> Serial.test             thrpt   30   8717.823 ±  81.147    ops/s
> Sunflow.test            thrpt   15    127.221 ±   1.578  ops/min
> XmlTransform.test       thrpt   20    445.762 ±   8.278  ops/min
> XmlValidation.test      thrpt   30    760.121 ±   9.963  ops/min
>
> Note that Scimark are expected to regress that much: they do have > very tight > loops, and that's our problem: the TTSP times there are in multi- > second range! > The difference is explained by different code generation. For > example, in most > dramatic ScimarkLU.small case: > > Hottest loop uses AVX2 (vmovdqu and friends): > > http://cr.openjdk.java.net/~shade/shenandoah/counted-loops/scimark-lu > -shenandoah-minus.perfasm > > Hottest loop uses AVX (vmovsd and friends): > > http://cr.openjdk.java.net/~shade/shenandoah/counted-loops/scimark-lu > -shenandoah-plus.perfasm > > As such, I believe enabling this by default, and figuring out code > quality > issues as we go forward is the sane tactics. Yes. The regressions, especially in scimark.lu are bad, but as you say, the ones that regress are also the ones that show extreme TTSP. The patch is ok for me. Folks who prefer raw throughput and can live with multisecond pause times can still turn the option off :-) In the long run, we should look at strip mining the loops. Roman From ashipile at redhat.com Tue Dec 20 11:09:59 2016 From: ashipile at redhat.com (ashipile at redhat.com) Date: Tue, 20 Dec 2016 11:09:59 +0000 Subject: hg: shenandoah/jdk9/hotspot: Enable UseCountedLoopSafepoints with Shenandoah. Message-ID: <201612201109.uBKB9xPF029971@aojmv0008.oracle.com> Changeset: c2fd76aa8981 Author: shade Date: 2016-12-20 12:09 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/c2fd76aa8981 Enable UseCountedLoopSafepoints with Shenandoah. ! 
src/share/vm/runtime/arguments.cpp From rkennke at redhat.com Tue Dec 20 11:10:56 2016 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 20 Dec 2016 12:10:56 +0100 Subject: RFR: Improve AryEq Message-ID: <1482232256.2807.56.camel@redhat.com> This adds a cmp-barrier to the code generated by AryEq. A false negative in the array ptr comparison would result in the slow-path being taken, even though it's not necessary. The barrier should get us on the fast path more often. Ok? http://cr.openjdk.java.net/~rkennke/aryeq/webrev.00/ Roman From shade at redhat.com Tue Dec 20 11:15:33 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 20 Dec 2016 12:15:33 +0100 Subject: RFR: Improve AryEq In-Reply-To: <1482232256.2807.56.camel@redhat.com> References: <1482232256.2807.56.camel@redhat.com> Message-ID: <743b2c9e-cf5a-f7ac-ef69-1c8c6a47b54a@redhat.com> On 12/20/2016 12:10 PM, Roman Kennke wrote: > This adds a cmp-barrier to the code generated by AryEq. A false > negative in the array ptr comparison would result in the slow-path > being taken, even though it's not necessary. The barrier should get us > on the fast path more often. > > Ok? > > http://cr.openjdk.java.net/~rkennke/aryeq/webrev.00/ Looks okay, but would be interesting to see if we can merge null-checking paths with acmp barrier? 8603 oopDesc::bs()->asm_acmp_barrier(this, ary1, ary2); 8604 jcc(Assembler::equal, TRUE_LABEL); 8605 8606 // Need additional checks for arrays_equals. 
8607 testptr(ary1, ary1); 8608 jcc(Assembler::zero, FALSE_LABEL); 8609 testptr(ary2, ary2); 8610 jcc(Assembler::zero, FALSE_LABEL); Thanks, -Aleksey From rwestrel at redhat.com Tue Dec 20 11:44:13 2016 From: rwestrel at redhat.com (rwestrel at redhat.com) Date: Tue, 20 Dec 2016 11:44:13 +0000 Subject: hg: shenandoah/jdk8u/hotspot: read barrier in unsafe can break C2 graph Message-ID: <201612201144.uBKBiDEO009069@aojmv0008.oracle.com> Changeset: b9bba0d6458d Author: roland Date: 2016-12-20 12:44 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/b9bba0d6458d read barrier in unsafe can break C2 graph ! src/share/vm/opto/library_call.cpp From rwestrel at redhat.com Tue Dec 20 13:29:52 2016 From: rwestrel at redhat.com (rwestrel at redhat.com) Date: Tue, 20 Dec 2016 13:29:52 +0000 Subject: hg: shenandoah/jdk8u/hotspot: add back accidentally dropped write barriers in GraphKit::store_String_* Message-ID: <201612201329.uBKDTqw2009839@aojmv0008.oracle.com> Changeset: 4ba3e50858e2 Author: roland Date: 2016-12-20 14:29 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/4ba3e50858e2 add back accidentally dropped write barriers in GraphKit::store_String_* ! src/share/vm/opto/graphKit.cpp From aph at redhat.com Tue Dec 20 13:32:49 2016 From: aph at redhat.com (Andrew Haley) Date: Tue, 20 Dec 2016 13:32:49 +0000 Subject: RFR (XS): Enable UseCountedLoopSafepoints with Shenandoah In-Reply-To: <9833e763-9b3b-8246-7584-20a8e033f4c6@redhat.com> References: <9833e763-9b3b-8246-7584-20a8e033f4c6@redhat.com> Message-ID: <6c6c262b-c3a2-486f-3f06-19b811fd87c1@redhat.com> On 20/12/16 10:57, Aleksey Shipilev wrote: > Since we care mostly about pause times, and not the raw throughput, it makes > sense to enable safepoints in counted loops. This makes us much more responsive > (as in, TTSP is lower) in many interesting scenarios. True, but I have seen some very interesting cases where we beat G1 in throughput. 
Let's not overdo this: at the very least we need to know how to restore throughput when running Shenandoah; all this business of one flag affecting others can be surprising. Andrew. From rkennke at redhat.com Tue Dec 20 14:01:38 2016 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 20 Dec 2016 15:01:38 +0100 Subject: RFR (XS): Enable UseCountedLoopSafepoints with Shenandoah In-Reply-To: <6c6c262b-c3a2-486f-3f06-19b811fd87c1@redhat.com> References: <9833e763-9b3b-8246-7584-20a8e033f4c6@redhat.com> <6c6c262b-c3a2-486f-3f06-19b811fd87c1@redhat.com> Message-ID: <1482242498.2807.60.camel@redhat.com> On Tuesday, 20.12.2016, 13:32 +0000, Andrew Haley wrote: > On 20/12/16 10:57, Aleksey Shipilev wrote: > > Since we care mostly about pause times, and not the raw throughput, > > it makes > > sense to enable safepoints in counted loops. This makes us much > > more responsive > > (as in, TTSP is lower) in many interesting scenarios. > > True, but I have seen some very interesting cases where we beat G1 in > throughput. Yes. As far as I can see, those are not affected by this (e.g. compiler benchmarks). And multiple seconds (!) just to get to a safepoint seems way too much, and it's more than one program that is affected by this. > Let's not overdo this: at the very least we need to know > how to restore throughput when running Shenandoah; easy: -XX:-UseCountedLoopSafepoints In fact, I've been thinking for a while about a sort of 'priority' setting for Shenandoah, where one could choose between 'throughput' and 'pausetime', and we would turn on or off specific options to improve one or the other, e.g. this UseCountedLoopSafepoints flag, some heuristics settings, and so on. Kind of like the -XX:+AggressiveOpts setting, but towards one or the other priority. However, so far there are not that many settings in this regard, and our priority is always leaning towards pause times anyway... > all this business > of one flag affecting others can be surprising. Indeed.
I would be most worried about turning on code paths that are not used otherwise, and thus run into bugs that are not ours, but in this case it seems to be simple enough. Roman From lennart.borjeson at cinnober.com Tue Dec 20 15:18:31 2016 From: lennart.borjeson at cinnober.com (Lennart Börjeson) Date: Tue, 20 Dec 2016 15:18:31 +0000 Subject: RFR (XS): Enable UseCountedLoopSafepoints with Shenandoah In-Reply-To: <1482242498.2807.60.camel@redhat.com> References: <9833e763-9b3b-8246-7584-20a8e033f4c6@redhat.com> <6c6c262b-c3a2-486f-3f06-19b811fd87c1@redhat.com> <1482242498.2807.60.camel@redhat.com> Message-ID: <9F7DB362-A9F0-4860-965B-ED02714B8026@cinnober.com> I feel I must chip in here. (I'm continuously testing Shenandoah, as well as other JVM variants, with our products, as part of my work.) We have recently encountered an issue with a commercial JVM which had elected to skip safepoint checks for all counted loops. This broke our product, since we have a crucial spin wait in a long-indexed loop. (As you know, the JVM normally inserts safepoint checks in long-indexed, but not in int-indexed, counted loops.) Such a change in behaviour is extremely hard to track down, and I regard it as a significant functional change. I urge you, as I've urged the vendor in question, to keep the "standard" behaviour as default. And BTW, Shenandoah is starting to perform very well in my tests. Our primary metric is transaction roundtrip time, and outlier elimination is important. In my latest tests (of a week ago), Shenandoah had much shorter maximum times than our baseline (which uses Hotspot+ParNew+CMS). You really have done fantastic work this year! Best regards, /Lennart > On 20 Dec 2016, at 15:01, Roman Kennke wrote: > > On Tuesday, 20.12.2016, 13:32 +0000, Andrew Haley wrote: >> On 20/12/16 10:57, Aleksey Shipilev wrote: >>> Since we care mostly about pause times, and not the raw throughput, >>> it makes >>> sense to enable safepoints in counted loops.
This makes us much >>> more responsive >>> (as in, TTSP is lower) in many interesting scenarios. >> >> True, but I have seen some very interesting cases where we beat G1 in >> throughput. > > Yes. As far as I can see, those are not affected by this (e.g. compiler > benchmarks). And multiple seconds (!) just to get to a safepoint seems > way too much, and it's more than one program that is affected by this. > >> Let's not overdo this: at the very least we need to know >> how to restore throughput when running Shenandoah; > > easy: -XX:-UseCountedLoopSafepoints > > In fact, I've been thinking for a while about a sort of 'priority' > setting for Shenandoah, where one could choose between 'throughput' and > 'pausetime', and we would turn on or off specific options to improve > one or the other, e.g. this UseCountedLoopSafepoints flag, some > heuristics settings, and so on. Kind of like the -XX:+AggressiveOpts > setting, but towards one or the other priority. > > However, so far there are not that many settings in this regard, and > our priority is always leaning towards pause times anyway... > >> all this business >> of one flag affecting others can be surprising. > > Indeed. I would be most worried about turning on code paths that are > not used otherwise, and thus run into bugs that are not ours, but in > this case it seems to be simple enough.
> > Roman From shade at redhat.com Tue Dec 20 15:26:09 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 20 Dec 2016 16:26:09 +0100 Subject: RFR (XS): Enable UseCountedLoopSafepoints with Shenandoah In-Reply-To: <9F7DB362-A9F0-4860-965B-ED02714B8026@cinnober.com> References: <9833e763-9b3b-8246-7584-20a8e033f4c6@redhat.com> <6c6c262b-c3a2-486f-3f06-19b811fd87c1@redhat.com> <1482242498.2807.60.camel@redhat.com> <9F7DB362-A9F0-4860-965B-ED02714B8026@cinnober.com> Message-ID: <8b1b531c-62b4-a1c6-f19f-d6411cb4626b@redhat.com> Hi Lennart, On 12/20/2016 04:18 PM, Lennart Börjeson wrote: > We have recently encountered an issue with a commercial JVM which had elected > to skip safepoint checks for all counted loops. This broke our product, since > we have a crucial spin wait in a long-indexed loop. > > (As you know, the JVM normally inserts safepoint checks in long-indexed, but > not in int-indexed, counted loops.) > > Such a change in behaviour is extremely hard to track down, and I regard it > as a significant functional change. > > I urge you, as I've urged the vendor in question, to keep the "standard" > behaviour as default. I am a bit confused about the notion of "standard behavior". There is no standard that mandates either putting safepoint checks into loops, or skipping them. This Shenandoah change _inserts_ more safepoint checks, not eliminates them, so this seems like something you want?
Thanks, -Aleksey From simone.bordet at gmail.com Tue Dec 20 15:27:32 2016 From: simone.bordet at gmail.com (Simone Bordet) Date: Tue, 20 Dec 2016 16:27:32 +0100 Subject: RFR (XS): Enable UseCountedLoopSafepoints with Shenandoah In-Reply-To: <9F7DB362-A9F0-4860-965B-ED02714B8026@cinnober.com> References: <9833e763-9b3b-8246-7584-20a8e033f4c6@redhat.com> <6c6c262b-c3a2-486f-3f06-19b811fd87c1@redhat.com> <1482242498.2807.60.camel@redhat.com> <9F7DB362-A9F0-4860-965B-ED02714B8026@cinnober.com> Message-ID: Hi, On Tue, Dec 20, 2016 at 4:18 PM, Lennart Börjeson wrote: > I feel I must chip in here. > > (I'm continuously testing Shenandoah, as well as other JVM variants, with our products, as part of my work.) > > We have recently encountered an issue with a commercial JVM which had elected to skip safepoint checks for all counted loops. This broke our product, since we have a crucial spin wait in a long-indexed loop. Wow. Can you detail how your product makes use of the fact that the JVM is polling (or not) for a safepoint ? I'm guessing you are doing this from native code ? Custom JVM modifications ? > (As you know, the JVM normally inserts safepoint checks in long-indexed, but not in int-indexed, counted loops.) > > Such a change in behaviour is extremely hard to track down, and I regard it as a significant functional change. Wouldn't it be the opposite, i.e. your product relying on a very specific implementation detail of how the JVM works ? Thanks ! -- Simone Bordet http://bordet.blogspot.com --- Finally, no matter how good the architecture and design are, to deliver bug-free software with optimal performance and reliability, the implementation technique must be flawless.
Victoria Livschitz From lennart.borjeson at cinnober.com Tue Dec 20 15:50:47 2016 From: lennart.borjeson at cinnober.com (Lennart Börjeson) Date: Tue, 20 Dec 2016 15:50:47 +0000 Subject: RFR (XS): Enable UseCountedLoopSafepoints with Shenandoah In-Reply-To: References: <9833e763-9b3b-8246-7584-20a8e033f4c6@redhat.com> <6c6c262b-c3a2-486f-3f06-19b811fd87c1@redhat.com> <1482242498.2807.60.camel@redhat.com> <9F7DB362-A9F0-4860-965B-ED02714B8026@cinnober.com> Message-ID: <6F4CD75A-122F-44B8-B88A-CFB2742FA447@cinnober.com> > On 20 Dec 2016, at 16:27, Simone Bordet wrote: > > Hi, > > On Tue, Dec 20, 2016 at 4:18 PM, Lennart Börjeson > wrote: >> I feel I must chip in here. >> >> (I'm continuously testing Shenandoah, as well as other JVM variants, with our products, as part of my work.) >> >> We have recently encountered an issue with a commercial JVM which had elected to skip safepoint checks for all counted loops. This broke our product, since we have a crucial spin wait in a long-indexed loop. > > Wow. Can you detail how your product makes use of the fact that the > JVM is polling (or not) for a safepoint ? > I'm guessing you are doing this from native code ? Custom JVM modifications ? No, just standard Java. And I wouldn't say we *made use* of it, we just had some code which worked in one JVM and not in the next. In our case, we had a while-loop testing a long variable, which somehow was deemed to be a counted loop, and consequently not checked under the new behaviour. Very tricky to identify. > >> (As you know, the JVM normally inserts safepoint checks in long-indexed, but not in int-indexed, counted loops.) >> >> Such a change in behaviour is extremely hard to track down, and I regard it as a significant functional change. > > Wouldn't it be the opposite, i.e. your product relying on a very specific > implementation detail of how the JVM works ? > Well, you're *always* dependent on how the JVM works, aren't you.
;-) From lennart.borjeson at cinnober.com Tue Dec 20 15:54:36 2016 From: lennart.borjeson at cinnober.com (Lennart Börjeson) Date: Tue, 20 Dec 2016 15:54:36 +0000 Subject: RFR (XS): Enable UseCountedLoopSafepoints with Shenandoah In-Reply-To: <8b1b531c-62b4-a1c6-f19f-d6411cb4626b@redhat.com> References: <9833e763-9b3b-8246-7584-20a8e033f4c6@redhat.com> <6c6c262b-c3a2-486f-3f06-19b811fd87c1@redhat.com> <1482242498.2807.60.camel@redhat.com> <9F7DB362-A9F0-4860-965B-ED02714B8026@cinnober.com> <8b1b531c-62b4-a1c6-f19f-d6411cb4626b@redhat.com> Message-ID: > On 20 Dec 2016, at 16:26, Aleksey Shipilev wrote: > > I am a bit confused about the notion of "standard behavior". There is no > standard that mandates either putting safepoint checks into loops, or skipping them. > > This Shenandoah change _inserts_ more safepoint checks, not eliminates them, so > this seems like something you want? > I was thinking about the flag UseCountedLoopSafepoints.
The current default is "false", and I gathered you were discussing changing this to "true"? Yes. "true" means Hotspot will emit safepoint checks in counted loops, thus improving time-to-safepoint, and therefore improving pause time. Isn't that the behavior you want for your product? -Aleksey From aph at redhat.com Tue Dec 20 16:52:09 2016 From: aph at redhat.com (Andrew Haley) Date: Tue, 20 Dec 2016 16:52:09 +0000 Subject: RFR (XS): Enable UseCountedLoopSafepoints with Shenandoah In-Reply-To: <1482242498.2807.60.camel@redhat.com> References: <9833e763-9b3b-8246-7584-20a8e033f4c6@redhat.com> <6c6c262b-c3a2-486f-3f06-19b811fd87c1@redhat.com> <1482242498.2807.60.camel@redhat.com> Message-ID: On 20/12/16 14:01, Roman Kennke wrote: > On Tuesday, 20.12.2016, 13:32 +0000, Andrew Haley wrote: >> On 20/12/16 10:57, Aleksey Shipilev wrote: >>> Since we care mostly about pause times, and not the raw throughput, >>> it makes >>> sense to enable safepoints in counted loops. This makes us much >>> more responsive >>> (as in, TTSP is lower) in many interesting scenarios. >> >> True, but I have seen some very interesting cases where we beat G1 in >> throughput. > > Yes. As far as I can see, those are not affected by this (e.g. compiler > benchmarks). And multiple seconds (!) just to get to a safepoint seems > way too much, and it's more than one program that is affected by this. Can you tell me which program delays so long? I'd like to see it. I suspect that's a bug. And, of course, people are capable of using -XX:-UseCountedLoopSafepoints themselves. >> Let's not overdo this: at the very least we need to know >> how to restore throughput when running Shenandoah; > > easy: -XX:-UseCountedLoopSafepoints Right, so we know for sure that enabling Shenandoah only affects one other flag. Good! Andrew.
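The int-indexed vs long-indexed distinction discussed in this thread can be made concrete with a small sketch. All class and method names below are illustrative (not from the thread), and the compilation behavior described in the comments is the default HotSpot behavior of this era; it varies with JVM version and flags:

```java
// With default HotSpot settings, C2 treats the int-indexed loop below as a
// "counted loop" and omits the safepoint poll from its body, so a thread
// stuck in it can delay a GC pause (high time-to-safepoint, TTSP). The
// long-indexed loop keeps its safepoint poll. -XX:+UseCountedLoopSafepoints
// forces a poll into counted loops too, trading throughput for lower TTSP.
public class LoopShapes {

    // int-indexed counted loop: may be compiled without a safepoint poll.
    public static long sumInt(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    // long-indexed loop: HotSpot keeps a safepoint poll here, which is why
    // a spin-wait on a long variable keeps responding to safepoint requests
    // -- until a JVM changes its definition of "counted loop".
    public static long sumLong(long n) {
        long sum = 0;
        for (long i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumInt(1000));
        System.out.println(sumLong(1000));
    }
}
```

Comparing the generated code with `-XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly`, with and without `-XX:+UseCountedLoopSafepoints`, shows the poll appearing in the int-indexed loop only in the latter case.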
From shade at redhat.com Tue Dec 20 17:08:31 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 20 Dec 2016 18:08:31 +0100 Subject: RFR (S): Refactor concurrent mark to be more inlineable Message-ID: <847fc65d-9814-26d1-7c38-ee46bc7d2627@redhat.com> Hi, I would like to refactor the concurrent mark to make it more inlineable, prepare it for conc mark prefetch, etc: http://cr.openjdk.java.net/~shade/shenandoah/concmark-inline/webrev.01/ In that patch: a) Peeled concurrent_process_queues before the hot loop; b) Inlined try_* methods to call a very fat do_object_or_array once. It also helps to pinpoint a single place where we get the tasks, so that future work on buffering and prefetching would capitalize on this; c) Optimized SATB draining code: poll the local queue immediately after draining SATB, do not do stealing which will bypass the local queue; d) Marked a few important closures "inline", and added headers where needed; Testing: hs_gc_shenandoah, SPECjvm/Derby. Thanks, -Aleksey From lennart.borjeson at cinnober.com Tue Dec 20 17:12:19 2016 From: lennart.borjeson at cinnober.com (Lennart Börjeson) Date: Tue, 20 Dec 2016 17:12:19 +0000 Subject: RFR (XS): Enable UseCountedLoopSafepoints with Shenandoah In-Reply-To: References: <9833e763-9b3b-8246-7584-20a8e033f4c6@redhat.com> <6c6c262b-c3a2-486f-3f06-19b811fd87c1@redhat.com> <1482242498.2807.60.camel@redhat.com> <9F7DB362-A9F0-4860-965B-ED02714B8026@cinnober.com> <8b1b531c-62b4-a1c6-f19f-d6411cb4626b@redhat.com> , Message-ID: > On 20 Dec 2016, at 16:58, Aleksey Shipilev wrote: > > On 12/20/2016 04:54 PM, Lennart Börjeson wrote: >>> On 20 Dec 2016, at 16:26, Aleksey Shipilev wrote: >>> >>> I am a bit confused about the notion of "standard behavior". There is no >>> standard that mandates either putting safepoint checks into loops, or >>> skipping them. >>> >>> This Shenandoah change _inserts_ more safepoint checks, not eliminates >>> them, so this seems like something you want?
>>> >> >> I was thinking about the flag UseCountedLoopSafepoints. The current default >> is "false", and I gathered you were discussing changing this to "true"? > > Yes. "true" means Hotspot will emit safepoint checks in counted loops, thus > improving time-to-safepoint, and therefore improving pause time. Isn't that the > behavior you want for your product? > Well, if there were to be a safepoint check in every counted loop, I fear overall performance would suffer too much. But I would of course need to test that. Note that the problem we had with the other JVM was more that the definition of "counted loop" had changed, than a change of a default value for a flag. From rkennke at redhat.com Tue Dec 20 17:15:19 2016 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 20 Dec 2016 18:15:19 +0100 Subject: RFR (S): Refactor concurrent mark to be more inlineable In-Reply-To: <847fc65d-9814-26d1-7c38-ee46bc7d2627@redhat.com> References: <847fc65d-9814-26d1-7c38-ee46bc7d2627@redhat.com> Message-ID: <1482254119.2807.61.camel@redhat.com> Looks good to me! Roman On Tuesday, 20.12.2016, 18:08 +0100, Aleksey Shipilev wrote: > Hi, > > I would like to refactor the concurrent mark to make it more > inlineable, prepare > it for conc mark prefetch, etc: > http://cr.openjdk.java.net/~shade/shenandoah/concmark-inline/webrev > .01/ > > In that patch: > a) Peeled concurrent_process_queues before the hot loop; > b) Inlined try_* methods to call a very fat do_object_or_array > once. It also > helps to pinpoint a single place where we get the tasks, so that > future work on > buffering and prefetching would capitalize on this; > c) Optimized SATB draining code: poll the local queue immediately > after > draining SATB, do not do stealing which will bypass the local queue; > d) Marked a few important closures "inline", and added headers > where needed; > > Testing: hs_gc_shenandoah, SPECjvm/Derby.
> > Thanks, > -Aleksey > From rkennke at redhat.com Tue Dec 20 17:48:31 2016 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 20 Dec 2016 18:48:31 +0100 Subject: RFR: Improve AryEq In-Reply-To: <743b2c9e-cf5a-f7ac-ef69-1c8c6a47b54a@redhat.com> References: <1482232256.2807.56.camel@redhat.com> <743b2c9e-cf5a-f7ac-ef69-1c8c6a47b54a@redhat.com> Message-ID: <1482256111.2807.62.camel@redhat.com> Am Dienstag, den 20.12.2016, 12:15 +0100 schrieb Aleksey Shipilev: > On 12/20/2016 12:10 PM, Roman Kennke wrote: > > This adds an cmp-barrier to the code generated by AryEq. A false > > negative in the array ptr comparison would result in the slow-path > > being taken, even though it's not necessary. The barrier should get > > us > > on the fast path more often. > > > > Ok? > > > > http://cr.openjdk.java.net/~rkennke/aryeq/webrev.00/ > > Looks okay, but would be interesting to see if we can merge null- > checking paths > with acmp barrier? That would be complicated. Would need build special code just for this intrinsics... Doesn't seem worth for now. I'm pushing as is. Roman From roman at kennke.org Tue Dec 20 17:49:24 2016 From: roman at kennke.org (roman at kennke.org) Date: Tue, 20 Dec 2016 17:49:24 +0000 Subject: hg: shenandoah/jdk9/hotspot: Improve AryEq instruction by avoiding false negatives with a Shenandoah cmp barrier Message-ID: <201612201749.uBKHnOxR023662@aojmv0008.oracle.com> Changeset: 0d30308cdc65 Author: rkennke Date: 2016-12-20 18:49 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/0d30308cdc65 Improve AryEq instruction by avoiding false negatives with a Shenandoah cmp barrier ! src/cpu/aarch64/vm/macroAssembler_aarch64.cpp ! 
src/cpu/x86/vm/macroAssembler_x86.cpp From shade at redhat.com Tue Dec 20 17:56:17 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 20 Dec 2016 18:56:17 +0100 Subject: RFR (XS): Enable UseCountedLoopSafepoints with Shenandoah In-Reply-To: References: <9833e763-9b3b-8246-7584-20a8e033f4c6@redhat.com> <6c6c262b-c3a2-486f-3f06-19b811fd87c1@redhat.com> <1482242498.2807.60.camel@redhat.com> Message-ID: <89058fec-99f8-6e29-1ca9-45ec0b72b444@redhat.com> On 12/20/2016 05:52 PM, Andrew Haley wrote: > On 20/12/16 14:01, Roman Kennke wrote: >> On Tuesday, 20.12.2016, 13:32 +0000, Andrew Haley wrote: >>> On 20/12/16 10:57, Aleksey Shipilev wrote: >>>> Since we care mostly about pause times, and not the raw throughput, >>>> it makes >>>> sense to enable safepoints in counted loops. This makes us much >>>> more responsive >>>> (as in, TTSP is lower) in many interesting scenarios. >>> >>> True, but I have seen some very interesting cases where we beat G1 in >>> throughput. >> >> Yes. As far as I can see, those are not affected by this (e.g. compiler >> benchmarks). And multiple seconds (!) just to get to a safepoint seems >> way too much, and it's more than one program that is affected by this. > > Can you tell me which program delays so long? I'd like to see it. > > I suspect that's a bug. And, of course, people are capable of using > -XX:-UseCountedLoopSafepoints themselves. This is not a bug, it is a well-known Hotspot issue: http://psy-lob-saw.blogspot.de/2015/12/safepoints.html http://psy-lob-saw.blogspot.de/2016/02/wait-for-it-counteduncounted-loops.html If you want a contrived example, here's one: http://icedtea.classpath.org/people/shade/gc-bench/file/5b77fb55a8b6/src/main/java/org/openjdk/gcbench/yield/ArrayIteration.java With a 100M array, on my high-end i7 we have 300ms TTSP, which completely dominates Shenandoah pause time. With safepoints in the loop TTSP is down to 1-5ms.
Another one: http://icedtea.classpath.org/people/shade/gc-bench/file/4c32eb6c67b0/src/main/java/org/openjdk/gcbench/yield/MonteCarloPI.java With 100M samples one MonteCarlo run takes 1s, and that's the TTSP on my desktop as well. With safepoints in the loop TTSP is down to 1-5ms. Another one: http://icedtea.classpath.org/people/shade/gc-bench/file/5b77fb55a8b6/src/main/java/org/openjdk/gcbench/fragger/LinkedListFragger.java If you do LinkedList.get(index), it does a counted loop inside, stepping r->r.next N times. But since the whole thing is cache-hostile, you have a problem. On a large machine with 32 slow cores and slow memory TTSPs are in the 1+ second range. This completely blows "ultra low pause" targets. There is an alternative solution: loop mining, i.e. replacing one big loop with two nested loops, and safepointing the outer one. This requires heavy changes in C2. Roland wanted to take on this after the Xmas break. Thanks, -Aleksey From ashipile at redhat.com Tue Dec 20 18:20:23 2016 From: ashipile at redhat.com (ashipile at redhat.com) Date: Tue, 20 Dec 2016 18:20:23 +0000 Subject: hg: shenandoah/jdk9/hotspot: Refactor concurrent mark to be more inlineable. Message-ID: <201612201820.uBKIKN1Z001903@aojmv0008.oracle.com> Changeset: 5c7176fd9317 Author: shade Date: 2016-12-20 19:18 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/5c7176fd9317 Refactor concurrent mark to be more inlineable. ! src/share/vm/gc/shenandoah/shenandoahConcurrentMark.cpp ! src/share/vm/gc/shenandoah/shenandoahConcurrentMark.hpp ! src/share/vm/gc/shenandoah/shenandoahConcurrentMark.inline.hpp ! src/share/vm/gc/shenandoah/shenandoahConcurrentThread.cpp ! src/share/vm/gc/shenandoah/shenandoahHeap.cpp ! src/share/vm/gc/shenandoah/shenandoahMarkCompact.cpp !
src/share/vm/gc/shenandoah/vm_operations_shenandoah.cpp From rwestrel at redhat.com Tue Dec 20 20:37:39 2016 From: rwestrel at redhat.com (rwestrel at redhat.com) Date: Tue, 20 Dec 2016 20:37:39 +0000 Subject: hg: shenandoah/jdk9/hotspot: C2: the result of an implicit null check read barrier may be used when the check fails Message-ID: <201612202037.uBKKben6006314@aojmv0008.oracle.com> Changeset: 307980ea8e60 Author: roland Date: 2016-12-19 11:22 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/307980ea8e60 C2: the result of an implicit null check read barrier may be used when the check fails ! src/share/vm/opto/shenandoahSupport.cpp From aph at redhat.com Tue Dec 20 21:22:27 2016 From: aph at redhat.com (Andrew Haley) Date: Tue, 20 Dec 2016 21:22:27 +0000 Subject: RFR (XS): Enable UseCountedLoopSafepoints with Shenandoah In-Reply-To: <89058fec-99f8-6e29-1ca9-45ec0b72b444@redhat.com> References: <9833e763-9b3b-8246-7584-20a8e033f4c6@redhat.com> <6c6c262b-c3a2-486f-3f06-19b811fd87c1@redhat.com> <1482242498.2807.60.camel@redhat.com> <89058fec-99f8-6e29-1ca9-45ec0b72b444@redhat.com> Message-ID: <27d1f284-abd1-8b62-9be5-ee78df361d46@redhat.com> On 20/12/16 17:56, Aleksey Shipilev wrote: > On 12/20/2016 05:52 PM, Andrew Haley wrote: >> On 20/12/16 14:01, Roman Kennke wrote: >>> >>> Yes. As far as I can see, those are not affected by this (e.g. compiler >>> benchmarks). And multiple seconds (!) just to get to a safepoint seems >>> way too much, and it's more than 1 program that is affected by this. >> >> Can you tell me which program delays so long? I'd like to see it. >> >> I suspect that's a bug. And, of course, people are capable of using >> -XX:-UseCountedLoopSafepoints themselves. > > This is not a bug, it is a very known Hotspot issue: > http://psy-lob-saw.blogspot.de/2015/12/safepoints.html > http://psy-lob-saw.blogspot.de/2016/02/wait-for-it-counteduncounted-loops.html Yes, yes, I know about counted loop safepoints. 
:-) > If you want a contrived example, here's one: > > http://icedtea.classpath.org/people/shade/gc-bench/file/5b77fb55a8b6/src/main/java/org/openjdk/gcbench/yield/ArrayIteration.java > > With 100M array, on my high-end i7 we have 300ms TTSP, which completely > dominates Shenandoah pause time. With safepoints in the loop TTSP is down to 1-5ms. Sure, but I was asking about a *program* which was affected by a multiple-second safepoint delay. I've never seen such a bad one. I know that it's possible in theory. > Another one: > http://icedtea.classpath.org/people/shade/gc-bench/file/4c32eb6c67b0/src/main/java/org/openjdk/gcbench/yield/MonteCarloPI.java > > With 100M samples one MonteCarlo run takes 1s, and that's the TTSP on my desktop > as well. With safepoints in the loop TTSP is down to 1-5ms. OK, right. So I take it that MonteCarloPI is an example of a real program which is affected in this way. > There is an alternative solution: loop mining, i.e. replacing one big loop with > two nested loops, and safepointing the outer one. This requires heavy changes in > C2. Roland wanted to take on this after the Xmas break. I can see the sense in that. Andrew. From rkennke at redhat.com Wed Dec 21 17:56:29 2016 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 21 Dec 2016 18:56:29 +0100 Subject: RFR (JDK8-only): Fix freeze on OOM-on-evac regarding the PLL Message-ID: <1482342989.2807.77.camel@redhat.com> This is a complicated one. We may get a freeze under the following situation: when the final-mark pause is left, the ShenandoahConcurrentThread sends a message to the SurrogateLockerThread to release the pending-list-lock (see VM_ShenandoahReferenceOperation::doit_epilogue()). The SurrogateLockerThread is a Java thread that gets kicked off right after the pause. It attempts to acquire the PLL (a Java lock) and thus employs a write-barrier on it.
When that write-barrier runs out-of- memory, it ends up in our oom_during_evacuation() loop and is waiting for the _evacuation_in_progress flag to get cleared. However, since the ShenandoahConcurrentThread is waiting for the SLT to finish, we never get to where we clear that flag (we don't even kick off evacuation yet). The proposed solution attempts to evacuate the PLL during the pause. If it succeeds, then the write-barrier will simply pick up the to-space object. If it fails, we schedule a full-gc, and turn off evacuation before leaving the pause. In no case can the write-barrier on the PLL run into OOM, and in all cases will it be correctly unlocked. Luckily for us, the whole PLL madness has been changed in a very positive way in JDK9, so this change does not apply there. http://cr.openjdk.java.net/~rkennke/fixoomevacpllfreeze/webrev.00/ Ok to push? Roman From rkennke at redhat.com Wed Dec 21 18:16:03 2016 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 21 Dec 2016 19:16:03 +0100 Subject: RFR (JDK8-only): Fix freeze on OOM-on-evac regarding the PLL In-Reply-To: <1482342989.2807.77.camel@redhat.com> References: <1482342989.2807.77.camel@redhat.com> Message-ID: <1482344163.2807.78.camel@redhat.com> Oh btw, I tested that by running specjvm with aggressive heuristics, this used to sometimes freeze before. Roman Am Mittwoch, den 21.12.2016, 18:56 +0100 schrieb Roman Kennke: > This is a complicated one. We may get a freeze under the following > situation: > > when the final-mark pause is left, the ShenandoahConcurrentThread > sends > a message to the SurrogateLockerThread to release the pending-list- > lock? > (see VM_ShenandoahReferenceOperation::doit_epilogue()). The > SurrogateLockerThread is a Java thread that gets kicked off right > after > the pause. It attempts to acquire the PLL (a Java lock) and thus > employs a write-barrier on it. 
When that write-barrier runs out-of- > memory, it ends up in our oom_during_evacuation() loop and is waiting > for the _evacuation_in_progress flag to get cleared. However, since > the > ShenandoahConcurrentThread is waiting for the SLT to finish, we never > get to where we clear that flag (we don't even kick off evacuation > yet). > > The proposed solution attempts to evacuate the PLL during the pause. > If > it succeeds, then the write-barrier will simply pick up the to-space > object. If it fails, we schedule a full-gc, and turn off evacuation > before leaving the pause. In no case can the write-barrier on the PLL > run into OOM, and in all cases will it be correctly unlocked. > > Luckily for us, the whole PLL madness has been changed in a very > positive way in JDK9, so this change does not apply there. > > http://cr.openjdk.java.net/~rkennke/fixoomevacpllfreeze/webrev.00/ > > Ok to push? > > Roman From zgu at redhat.com Wed Dec 21 18:18:24 2016 From: zgu at redhat.com (Zhengyu Gu) Date: Wed, 21 Dec 2016 13:18:24 -0500 Subject: RFR (JDK8-only): Fix freeze on OOM-on-evac regarding the PLL In-Reply-To: <1482342989.2807.77.camel@redhat.com> References: <1482342989.2807.77.camel@redhat.com> Message-ID: Okay. -Zhengyu On 12/21/2016 12:56 PM, Roman Kennke wrote: > This is a complicated one. We may get a freeze under the following > situation: > > when the final-mark pause is left, the ShenandoahConcurrentThread sends > a message to the SurrogateLockerThread to release the pending-list-lock > (see VM_ShenandoahReferenceOperation::doit_epilogue()). The > SurrogateLockerThread is a Java thread that gets kicked off right after > the pause. It attempts to acquire the PLL (a Java lock) and thus > employs a write-barrier on it. When that write-barrier runs out-of- > memory, it ends up in our oom_during_evacuation() loop and is waiting > for the _evacuation_in_progress flag to get cleared. 
However, since the > ShenandoahConcurrentThread is waiting for the SLT to finish, we never > get to where we clear that flag (we don't even kick off evacuation > yet). > > The proposed solution attempts to evacuate the PLL during the pause. If > it succeeds, then the write-barrier will simply pick up the to-space > object. If it fails, we schedule a full-gc, and turn off evacuation > before leaving the pause. In no case can the write-barrier on the PLL > run into OOM, and in all cases will it be correctly unlocked. > > Luckily for us, the whole PLL madness has been changed in a very > positive way in JDK9, so this change does not apply there. > > http://cr.openjdk.java.net/~rkennke/fixoomevacpllfreeze/webrev.00/ > > Ok to push? > > Roman From roman at kennke.org Wed Dec 21 18:28:05 2016 From: roman at kennke.org (roman at kennke.org) Date: Wed, 21 Dec 2016 18:28:05 +0000 Subject: hg: shenandoah/jdk8u/hotspot: Fix freeze on OOM-on-evac regarding the PLL. Message-ID: <201612211828.uBLIS5HB011037@aojmv0008.oracle.com> Changeset: 9ba353933d12 Author: rkennke Date: 2016-12-21 19:27 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/9ba353933d12 Fix freeze on OOM-on-evac regarding the PLL. ! src/share/vm/gc_implementation/shenandoah/shenandoahRootProcessor.cpp ! src/share/vm/gc_implementation/shenandoah/shenandoahRootProcessor.hpp ! src/share/vm/gc_implementation/shenandoah/vm_operations_shenandoah.cpp From lennart.borjeson at cinnober.com Thu Dec 22 15:44:10 2016 From: lennart.borjeson at cinnober.com (Lennart Börjeson) Date: Thu, 22 Dec 2016 15:44:10 +0000 Subject: Shenandoah and Graal? Message-ID: <02840F81-337E-426F-BDE1-593D2B4F8F89@cinnober.com> I've noticed that Graal seems to have been integrated in the openjdk9 sources as of build 150.
I've already mentioned I'm getting better and better results with Shenandoah, but since I have got encouraging results when testing with the Graal compiler, I'd like to eventually try out Graal+Shenandoah. Will that be possible? I've understood you've made Shenandoah-related updates to C2, so I'd like to ask if Shenandoah is currently dependent on C2 only? Best regards, /Lennart Börjeson From rkennke at redhat.com Thu Dec 22 15:48:20 2016 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 22 Dec 2016 16:48:20 +0100 Subject: Shenandoah and Graal? In-Reply-To: <02840F81-337E-426F-BDE1-593D2B4F8F89@cinnober.com> References: <02840F81-337E-426F-BDE1-593D2B4F8F89@cinnober.com> Message-ID: <1482421700.2807.93.camel@redhat.com> Hi Lennart, > I've noticed that Graal seems to have been integrated in the openjdk9 > sources as of build 150. > > I've already mentioned I'm getting better and better results with > Shenandoah, but since I have got encouraging results when testing > with the Graal compiler, I'd like to eventually try out > Graal+Shenandoah. > > Will that be possible? I've understood you've made Shenandoah-related > updates to C2, so I'd like to ask if Shenandoah is currently > dependent on C2 only? Graal currently does not compile the barriers that are required for Shenandoah. It's on our to-do list, but currently it's not possible. Best regards, Roman
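The barriers Roman refers to — which a compiler must emit before Shenandoah can run under it — can be modeled in plain Java. This is a toy sketch, not the actual HotSpot code: all names are illustrative, and the real read barrier and AryEq cmp-barrier are emitted as machine code by C2. Shenandoah in this era keeps a forwarding pointer per object, so a raw reference comparison can see a "false negative" when one operand is the from-space copy and the other the to-space copy of the same object — exactly the case the AryEq cmp-barrier discussed earlier in this archive avoids:

```java
// Toy model of a Brooks-style forwarding pointer and the two barriers
// discussed in this thread (illustrative names, not HotSpot code).
final class Cell {
    Cell forwardee = this; // points to itself until the object is evacuated
    int value;
    Cell(int value) { this.value = value; }
}

public class BarrierModel {
    // Read barrier: always dereference through the forwarding pointer.
    static Cell rb(Cell c) {
        return c == null ? null : c.forwardee;
    }

    // cmp/acmp barrier: if the raw compare fails, retry through the read
    // barrier so from-space and to-space copies of one object compare equal.
    static boolean acmp(Cell a, Cell b) {
        return a == b || rb(a) == rb(b);
    }

    public static void main(String[] args) {
        Cell fromSpace = new Cell(42);
        Cell toSpace = new Cell(42);
        fromSpace.forwardee = toSpace; // simulate concurrent evacuation
        System.out.println(fromSpace == toSpace);       // raw compare: false negative
        System.out.println(acmp(fromSpace, toSpace));   // barrier resolves it: true
    }
}
```

Without the retry in `acmp`, an intrinsic like AryEq would fall into its slow path on the first comparison even when both references name the same logical object, which is the false negative the patch eliminates.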