From rwestrel at redhat.com Thu Dec 1 08:12:37 2016 From: rwestrel at redhat.com (rwestrel at redhat.com) Date: Thu, 01 Dec 2016 08:12:37 +0000 Subject: hg: shenandoah/jdk9/hotspot: Couple fixes to write barrier expansion Message-ID: <201612010812.uB18Cbdr018883@aojmv0008.oracle.com> Changeset: 7e4baa0817d1 Author: roland Date: 2016-12-01 08:49 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/7e4baa0817d1 Couple fixes to write barrier expansion ! src/share/vm/classfile/classLoader.cpp ! src/share/vm/opto/shenandoahSupport.cpp From rwestrel at redhat.com Thu Dec 1 08:23:54 2016 From: rwestrel at redhat.com (rwestrel at redhat.com) Date: Thu, 01 Dec 2016 08:23:54 +0000 Subject: hg: shenandoah/jdk9/hotspot: undo change made by mistake to compile the world Message-ID: <201612010823.uB18NsQh022239@aojmv0008.oracle.com> Changeset: dfa629752080 Author: roland Date: 2016-12-01 09:23 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/dfa629752080 undo change made by mistake to compile the world ! src/share/vm/classfile/classLoader.cpp From rwestrel at redhat.com Fri Dec 2 16:33:33 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 02 Dec 2016 17:33:33 +0100 Subject: replace barrier's input with barrier's output in all dominated uses to decrease pressure on register allocator Message-ID: http://cr.openjdk.java.net/~roland/shenandoah/registerpressure/webrev.00/ This implements Roman's suggestion that when we use an oop directly and there's a dominating barrier, we can safely replace the oop by the output of the barrier. So for instance: a' = rb(a); .. call(a); can also be compiled as: a' = rb(a); .. call(a'); and if there's no use of a after the barrier then we don't keep both a and a' live but only a'. This is implemented in the patch: - for write barriers at barrier expansion time. - for read barriers, when read barriers are scheduled. Roland. 
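The rewiring Roland describes can be sketched as a standalone toy outside C2: in straight-line code, every use of `a` positioned after the barrier `a' = rb(a)` is dominated by it, so it can read the barrier's output instead, and `a` no longer has to stay live past the barrier. The `Use` struct and `rewire_dominated_uses` helper below are illustrative names, not HotSpot's real IR API; real C2 code consults the dominator tree rather than simple positions.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One "instruction" in a straight-line block: at position `pos` it reads `input`.
struct Use {
    int pos;
    std::string input;
};

// After the barrier a' = rb(a) at `barrier_pos`, every later use of `in` is
// dominated by the barrier, so it may consume the barrier's output `out`
// instead. Uses before the barrier are left alone.
void rewire_dominated_uses(std::vector<Use>& uses, int barrier_pos,
                           const std::string& in, const std::string& out) {
    for (Use& u : uses) {
        if (u.pos > barrier_pos && u.input == in) {
            u.input = out;
        }
    }
}
```

With uses of `a` at positions 1 and 3 and a barrier at position 2, only the position-3 use is rewired to `a'`; if no use of `a` remains after the barrier, the register allocator only has to keep `a'` live there, which is exactly the register-pressure win described above.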
From rkennke at redhat.com Fri Dec 2 17:34:02 2016 From: rkennke at redhat.com (Roman Kennke) Date: Fri, 02 Dec 2016 18:34:02 +0100 Subject: replace barrier's input with barrier's output in all dominated uses to decrease pressure on register allocator In-Reply-To: References: Message-ID: <1480700042.2597.2.camel@redhat.com> Sounds and looks good. The changes in shenandoahHeap.cpp|hpp seem unrelated though. Roman Am Freitag, den 02.12.2016, 17:33 +0100 schrieb Roland Westrelin: > http://cr.openjdk.java.net/~roland/shenandoah/registerpressure/webrev > .00/ > > This implements Roman's suggestion that when we use an oop directly > and > there's a dominating barrier, we can safely replace the oop by the > output of the barrier. So for instance: > > a' = rb(a); > .. > call(a); > > can also be compiled as: > > a' = rb(a); > .. > call(a'); > > and if there's no use of a after the barrier then we don't keep both > a > and a' live but only a'. > > This is implemented in the patch: > - for write barriers at barrier expansion time. > - for read barriers, when read barriers are scheduled. > > Roland. From rwestrel at redhat.com Fri Dec 2 17:35:07 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 2 Dec 2016 18:35:07 +0100 Subject: replace barrier's input with barrier's output in all dominated uses to decrease pressure on register allocator In-Reply-To: <1480700042.2597.2.camel@redhat.com> References: <1480700042.2597.2.camel@redhat.com> Message-ID: <98eaa9b0-a605-986b-82ab-7237ed017961@redhat.com> > The changes in shenandoahHeap.cpp|hpp seem unrelated though. They are. At some point I tried building an optimized build and those changes were required. Roland. 
From rkennke at redhat.com Fri Dec 2 17:37:55 2016 From: rkennke at redhat.com (Roman Kennke) Date: Fri, 02 Dec 2016 18:37:55 +0100 Subject: replace barrier's input with barrier's output in all dominated uses to decrease pressure on register allocator In-Reply-To: <98eaa9b0-a605-986b-82ab-7237ed017961@redhat.com> References: <1480700042.2597.2.camel@redhat.com> <98eaa9b0-a605-986b-82ab-7237ed017961@redhat.com> Message-ID: <1480700275.2597.3.camel@redhat.com> Am Freitag, den 02.12.2016, 18:35 +0100 schrieb Roland Westrelin: > > The changes in shenandoahHeap.cpp|hpp seem unrelated though. > > They are. At some point I tried building an optimized build and those > changes were required. Hmm, strange. Doesn't matter, please push! Roman From shade at redhat.com Mon Dec 5 15:14:42 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 5 Dec 2016 16:14:42 +0100 Subject: RFR (XS): Cherry-pick 8169261: Fix for JDK-8067744 creates build failures with some versions of gcc and/or linux Message-ID: <324ede33-1036-9ba3-933a-e6c83858e78a@redhat.com> Going to cherry-pick this one, otherwise hotspot_gc_shenandoah does not run: http://hg.openjdk.java.net/jdk9/hs/hotspot/rev/d9e9bc313c5a Ok? Thanks, -Aleksey From shade at redhat.com Mon Dec 5 16:00:52 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 5 Dec 2016 17:00:52 +0100 Subject: RFR (S): Avoid evacuation if concurrent GC was cancelled Message-ID: Hi, Currently, when concurrent GC is canceled, we still enter the VM operation for concurrent evacuation, only to exit it quickly and slide into the full GC. This causes *two* back-to-back safepoints: one short from evac, and another large for full GC. While short one is normally short, it can hit the unlucky scheduling outlier and drag the pause time up. 
This change avoids going to evac if conc GC was canceled: http://cr.openjdk.java.net/~shade/shenandoah/cancel-no-evac/webrev.01/ Additionally, it resets the mark bitmaps before full GC with parallel workers, not concurrent ones, which would be important once Zhengyu trims down the number of concurrent workers. Testing: hotspot_gc_shenandoah, jcstress (tests-all/quick) Thanks, -Aleksey From rkennke at redhat.com Mon Dec 5 16:04:13 2016 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 5 Dec 2016 11:04:13 -0500 (EST) Subject: RFR (XS): Cherry-pick 8169261: Fix for JDK-8067744 creates build failures with some versions of gcc and/or linux Message-ID: <531194503.4583034.1480953853677.JavaMail.zimbra@zmail12.collab.prod.int.phx2.redhat.com> Yes. Generally, I'd say what is approved upstream doesn't need approval here, unless you are unsure for some reason. Roman On 05.12.2016 at 4:15 PM, Aleksey Shipilev wrote: > > Going to cherry-pick this one, otherwise hotspot_gc_shenandoah does not run: > http://hg.openjdk.java.net/jdk9/hs/hotspot/rev/d9e9bc313c5a > > Ok? > > Thanks, > -Aleksey > From ashipile at redhat.com Mon Dec 5 16:05:34 2016 From: ashipile at redhat.com (ashipile at redhat.com) Date: Mon, 05 Dec 2016 16:05:34 +0000 Subject: hg: shenandoah/jdk9/hotspot: Cherry-pick 8169261: Fix for JDK-8067744 creates build failures with some versions of gcc and/or linux Message-ID: <201612051605.uB5G5YTA011291@aojmv0008.oracle.com> Changeset: 5db8e70a5237 Author: shade Date: 2016-12-05 16:26 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/5db8e70a5237 Cherry-pick 8169261: Fix for JDK-8067744 creates build failures with some versions of gcc and/or linux !
make/test/JtregNative.gmk From shade at redhat.com Mon Dec 5 16:07:48 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 5 Dec 2016 17:07:48 +0100 Subject: RFR (XS): Cherry-pick 8169261: Fix for JDK-8067744 creates build failures with some versions of gcc and/or linux In-Reply-To: <531194503.4583034.1480953853677.JavaMail.zimbra@zmail12.collab.prod.int.phx2.redhat.com> References: <531194503.4583034.1480953853677.JavaMail.zimbra@zmail12.collab.prod.int.phx2.redhat.com> Message-ID: <10d07415-7b51-c544-1ada-e44b1772ad1e@redhat.com> Okay, if that does not complicate the merge somehow :) -Aleksey On 12/05/2016 05:04 PM, Roman Kennke wrote: > Yes. > Generally, I'd say what is approved upstream doesn't need approval here, unless you are unsure for some reason. > > Roman > > On 05.12.2016 at 4:15 PM, Aleksey Shipilev wrote: >> >> Going to cherry-pick this one, otherwise hotspot_gc_shenandoah does not run: >> http://hg.openjdk.java.net/jdk9/hs/hotspot/rev/d9e9bc313c5a >> >> Ok? >> >> Thanks, >> -Aleksey >> From rkennke at redhat.com Mon Dec 5 17:20:01 2016 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 05 Dec 2016 18:20:01 +0100 Subject: RFR (S): Avoid evacuation if concurrent GC was cancelled In-Reply-To: References: Message-ID: <1480958401.2597.8.camel@redhat.com> Some comments: - GC can be cancelled during final-mark-pause. Might be worth keeping the check for cancelled-gc after init-mark-pause. Same after evacuation: if evacuation gets cancelled, we don't need to reset the bitmaps because now it's done at start of full-gc. I think. - This here looks wrong:

+  // b. Cancel evacuation, if in progress
+  if (_heap->is_evacuation_in_progress()) {
+    MutexLocker mu(Threads_lock);
+    _heap->set_evacuation_in_progress(false);
+  }

This happens during a safepoint. The VMThread would hold the Threads_lock and the above would deadlock. We need to acquire the Threads_lock only when turning off evacuation outside of a safepoint.
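The locking question raised here — when is it safe to walk the thread list — comes down to the invariant that the thread later agrees to spell as assert_locked_or_safepoint: either the caller owns Threads_lock, or all Java threads are stopped at a safepoint. A minimal standalone sketch of that check (the `Mutex` type, `at_safepoint` flag, and `locked_or_safepoint` function are stand-ins for illustration, not the VM's real types):

```cpp
#include <cassert>

// Stand-in for a VM mutex that records its owner.
struct Mutex {
    const void* owner = nullptr;
    bool owned_by(const void* t) const { return owner == t; }
};

// Stand-in for SafepointSynchronize::is_at_safepoint().
bool at_safepoint = false;

// Walking the thread list is safe if the caller owns Threads_lock, or if all
// Java threads are stopped at a safepoint (nobody can mutate the list).
bool locked_or_safepoint(const Mutex& threads_lock, const void* self) {
    return threads_lock.owned_by(self) || at_safepoint;
}
```

This is why taking a MutexLocker inside a VM operation is redundant at best: at a safepoint the second disjunct already holds, so the iteration is safe without acquiring the lock.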
Roman On Monday, 05.12.2016 at 17:00 +0100, Aleksey Shipilev wrote: > Hi, > > Currently, when concurrent GC is canceled, we still enter the VM > operation for > concurrent evacuation, only to exit it quickly and slide into the > full GC. This > causes *two* back-to-back safepoints: one short from evac, and > another large for > full GC. While short one is normally short, it can hit the unlucky > scheduling > outlier and drag the pause time up. > > This change avoids going to evac if conc GC was canceled: > http://cr.openjdk.java.net/~shade/shenandoah/cancel-no-evac/webrev. > 01/ > > Additionally, it resets the mark bitmaps before full GC with parallel > workers, > not concurrent ones, which would be important once Zhengyu trims down > the number > of concurrent workers. > > Testing: hotspot_gc_shenandoah, jcstress (tests-all/quick) > > Thanks, > -Aleksey > From zgu at redhat.com Mon Dec 5 17:25:58 2016 From: zgu at redhat.com (Zhengyu Gu) Date: Mon, 5 Dec 2016 12:25:58 -0500 Subject: RFR (S): Avoid evacuation if concurrent GC was cancelled In-Reply-To: References: Message-ID: <9ab07a34-8406-481d-870a-7d03688eeae4@redhat.com>

114 // b. Cancel evacuation, if in progress
115 if (_heap->is_evacuation_in_progress()) {
116   MutexLocker mu(Threads_lock);
117   _heap->set_evacuation_in_progress(false);
118 }

I think that we can eliminate Threads_lock above by changing the assertion below:

void JavaThread::set_evacuation_in_progress_all_threads(bool in_prog) {
  assert(Threads_lock->owned_by_self(), "must hold Threads_lock");   <==== assert_locked_or_safepoint(Threads_lock)
  _evacuation_in_progress_global = in_prog;
  for (JavaThread* t = Threads::first(); t != NULL; t = t->next()) {
    t->set_evacuation_in_progress(in_prog);
  }
}

Thanks, -Zhengyu On 12/05/2016 11:00 AM, Aleksey Shipilev wrote: > Hi, > > Currently, when concurrent GC is canceled, we still enter the VM operation for > concurrent evacuation, only to exit it quickly and slide into the full GC.
This > causes *two* back-to-back safepoints: one short from evac, and another large for > full GC. While short one is normally short, it can hit the unlucky scheduling > outlier and drag the pause time up. > > This change avoids going to evac if conc GC was canceled: > http://cr.openjdk.java.net/~shade/shenandoah/cancel-no-evac/webrev.01/ > > Additionally, it resets the mark bitmaps before full GC with parallel workers, > not concurrent ones, which would be important once Zhengyu trims down the number > of concurrent workers. > > Testing: hotspot_gc_shenandoah, jcstress (tests-all/quick) > > Thanks, > -Aleksey > From rkennke at redhat.com Mon Dec 5 17:28:41 2016 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 05 Dec 2016 18:28:41 +0100 Subject: RFR (S): Avoid evacuation if concurrent GC was cancelled In-Reply-To: <9ab07a34-8406-481d-870a-7d03688eeae4@redhat.com> References: <9ab07a34-8406-481d-870a-7d03688eeae4@redhat.com> Message-ID: <1480958921.2597.10.camel@redhat.com> On Monday, 05.12.2016 at 12:25 -0500, Zhengyu Gu wrote:

> 114 // b. Cancel evacuation, if in progress
> 115 if (_heap->is_evacuation_in_progress()) {
> 116   MutexLocker mu(Threads_lock);
> 117   _heap->set_evacuation_in_progress(false);
> 118 }
>
> I think that we can eliminate Threads_lock above by changing the assertion below:
>
> void JavaThread::set_evacuation_in_progress_all_threads(bool in_prog) {
>   assert(Threads_lock->owned_by_self(), "must hold Threads_lock");   <==== assert_locked_or_safepoint(Threads_lock)
>   _evacuation_in_progress_global = in_prog;
>   for (JavaThread* t = Threads::first(); t != NULL; t = t->next()) {
>     t->set_evacuation_in_progress(in_prog);
>   }
> }

No, I don't think so. We're iterating over the threads, so we should hold that lock. However, as I mentioned in that other email, the VMThread should already hold it. Now that I think about it again, it's probably not going to deadlock, it's simply re-entrant.
In any case, acquiring it should not be necessary. Roman > > > Thanks, > > -Zhengyu > > On 12/05/2016 11:00 AM, Aleksey Shipilev wrote: > > Hi, > > > > Currently, when concurrent GC is canceled, we still enter the VM > > operation for > > concurrent evacuation, only to exit it quickly and slide into the > > full GC. This > > causes *two* back-to-back safepoints: one short from evac, and > > another large for > > full GC. While short one is normally short, it can hit the unlucky > > scheduling > > outlier and drag the pause time up. > > > > This change avoids going to evac if conc GC was canceled: > > ???http://cr.openjdk.java.net/~shade/shenandoah/cancel-no-evac/webr > > ev.01/ > > > > Additionally, it resets the mark bitmaps before full GC with > > parallel workers, > > not concurrent ones, which would be important once Zhengyu trims > > down the number > > of concurrent workers. > > > > Testing: hotspot_gc_shenandoah, jcstress (tests-all/quick) > > > > Thanks, > > -Aleksey > > From zgu at redhat.com Mon Dec 5 17:43:59 2016 From: zgu at redhat.com (Zhengyu Gu) Date: Mon, 5 Dec 2016 12:43:59 -0500 Subject: RFR (S): Avoid evacuation if concurrent GC was cancelled In-Reply-To: <1480958921.2597.10.camel@redhat.com> References: <9ab07a34-8406-481d-870a-7d03688eeae4@redhat.com> <1480958921.2597.10.camel@redhat.com> Message-ID: <9f42aaa7-7b00-006f-580d-812ccdd7bb7b@redhat.com> On 12/05/2016 12:28 PM, Roman Kennke wrote: > Am Montag, den 05.12.2016, 12:25 -0500 schrieb Zhengyu Gu: >> 114 // b. 
Cancel evacuation, if in progress >> 115 if (_heap->is_evacuation_in_progress()) { >> 116 MutexLocker mu(Threads_lock); >> 117 _heap->set_evacuation_in_progress(false); >> 118 } >> >> >> I think that we can eliminate Threads_lock above by changing the >> assertion below: >> >> void JavaThread::set_evacuation_in_progress_all_threads(bool in_prog) >> { >> assert(Threads_lock->owned_by_self(), "must hold >> Threads_lock"); <==== assert_locked_or_safepoint(Threads_lock) >> _evacuation_in_progress_global = in_prog; >> for (JavaThread* t = Threads::first(); t != NULL; t = t->next()) { >> t->set_evacuation_in_progress(in_prog); >> } >> } > No, I don't think so. We're iterating over the threads, so we should > hold that lock. However, as I mentioned in that other email, the > VMThread should already hold it. Now that I think about it again, it's > probably not going to deadlock, it's simply re-entrant. In any case, > acquiring it should not be necessary. I think that it is safe to iterate over the thread list without the Thread_lock during safepoint. Check following code: void Threads::threads_do(ThreadClosure* tc) { assert_locked_or_safepoint(Threads_lock); // ALL_JAVA_THREADS iterates through all JavaThreads ALL_JAVA_THREADS(p) { tc->do_thread(p); } .... -Zhengyu > Roman > >> >> Thanks, >> >> -Zhengyu >> >> On 12/05/2016 11:00 AM, Aleksey Shipilev wrote: >>> Hi, >>> >>> Currently, when concurrent GC is canceled, we still enter the VM >>> operation for >>> concurrent evacuation, only to exit it quickly and slide into the >>> full GC. This >>> causes *two* back-to-back safepoints: one short from evac, and >>> another large for >>> full GC. While short one is normally short, it can hit the unlucky >>> scheduling >>> outlier and drag the pause time up. 
>>> >>> This change avoids going to evac if conc GC was canceled: >>> http://cr.openjdk.java.net/~shade/shenandoah/cancel-no-evac/webr >>> ev.01/ >>> >>> Additionally, it resets the mark bitmaps before full GC with >>> parallel workers, >>> not concurrent ones, which would be important once Zhengyu trims >>> down the number >>> of concurrent workers. >>> >>> Testing: hotspot_gc_shenandoah, jcstress (tests-all/quick) >>> >>> Thanks, >>> -Aleksey >>> From rkennke at redhat.com Mon Dec 5 17:47:03 2016 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 5 Dec 2016 12:47:03 -0500 (EST) Subject: RFR (S): Avoid evacuation if concurrent GC was cancelled Message-ID: <1793230632.4612707.1480960023576.JavaMail.zimbra@zmail12.collab.prod.int.phx2.redhat.com> Ah yes I see what you mean. Yes we can change to assert_locked_or_safepoint() there. /Roman Am 05.12.2016 6:44 nachm. schrieb Zhengyu Gu : > > On 12/05/2016 12:28 PM, Roman Kennke wrote: > > > Am Montag, den 05.12.2016, 12:25 -0500 schrieb Zhengyu Gu: > >> 114 // b. Cancel evacuation, if in progress > >> 115 if (_heap->is_evacuation_in_progress()) { > >> 116 MutexLocker mu(Threads_lock); > >> 117 _heap->set_evacuation_in_progress(false); > >> 118 } > >> > >> > >> I think that we can eliminate Threads_lock above by changing the > >> assertion below: > >> > >> void JavaThread::set_evacuation_in_progress_all_threads(bool in_prog) > >> { > >>???? assert(Threads_lock->owned_by_self(), "must hold > >> Threads_lock");?????? <==== assert_locked_or_safepoint(Threads_lock) > >>???? _evacuation_in_progress_global = in_prog; > >>???? for (JavaThread* t = Threads::first(); t != NULL; t = t->next()) { > >>?????? t->set_evacuation_in_progress(in_prog); > >>???? } > >> } > > No, I don't think so. We're iterating over the threads, so we should > > hold that lock. However, as I mentioned in that other email, the > > VMThread should already hold it. 
Now that I think about it again, it's > > probably not going to deadlock, it's simply re-entrant. In any case, > > acquiring it should not be necessary. > > I think that it is safe to iterate over the thread list without the Thread_lock during safepoint. > > Check following code: > > void Threads::threads_do(ThreadClosure* tc) { > ?? assert_locked_or_safepoint(Threads_lock); > ?? // ALL_JAVA_THREADS iterates through all JavaThreads > ?? ALL_JAVA_THREADS(p) { > ???? tc->do_thread(p); > ?? } > > .... > > > -Zhengyu > > > > > > Roman > > > >> > >> Thanks, > >> > >> -Zhengyu > >> > >> On 12/05/2016 11:00 AM, Aleksey Shipilev wrote: > >>> Hi, > >>> > >>> Currently, when concurrent GC is canceled, we still enter the VM > >>> operation for > >>> concurrent evacuation, only to exit it quickly and slide into the > >>> full GC. This > >>> causes *two* back-to-back safepoints: one short from evac, and > >>> another large for > >>> full GC. While short one is normally short, it can hit the unlucky > >>> scheduling > >>> outlier and drag the pause time up. > >>> > >>> This change avoids going to evac if conc GC was canceled: > >>>???? http://cr.openjdk.java.net/~shade/shenandoah/cancel-no-evac/webr > >>> ev.01/ > >>> > >>> Additionally, it resets the mark bitmaps before full GC with > >>> parallel workers, > >>> not concurrent ones, which would be important once Zhengyu trims > >>> down the number > >>> of concurrent workers. > >>> > >>> Testing: hotspot_gc_shenandoah, jcstress (tests-all/quick) > >>> > >>> Thanks, > >>> -Aleksey > >>> > From shade at redhat.com Mon Dec 5 18:09:52 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 5 Dec 2016 19:09:52 +0100 Subject: RFR (S): Avoid evacuation if concurrent GC was cancelled In-Reply-To: <1480958401.2597.8.camel@redhat.com> References: <1480958401.2597.8.camel@redhat.com> Message-ID: <5c695bd3-ea8d-1a5c-02fd-96f32d2570f8@redhat.com> Okay! How about this then? 
http://cr.openjdk.java.net/~shade/shenandoah/cancel-no-evac/webrev.02/ Rewired ShenandoahConcurrentThread to capture cancellation after each phase. Once a phase fails, it will re-spin towards full GC, which will recover. Also dropped a mutex acquire in mark-compact, and changed assert to assert_locked_or_safepoint. Still passes hs_gc_shenandoah, and jcstress run is chugging along. Thanks, -Aleksey On 12/05/2016 06:20 PM, Roman Kennke wrote: > Some comments: > > - GC can be cancelled during final-mark-pause. Might be worth to keep > the check for cancelled-gc after init-mark-pause. Same after > evacuation: if evacuation gets cancelled, we don't need to reset the > bitmaps because now it's done at start of full-gc. I think. > > - This here looks wrong: > > + // b. Cancel evacuation, if in progress > + if (_heap->is_evacuation_in_progress()) { > + MutexLocker mu(Threads_lock); > + _heap->set_evacuation_in_progress(false); > + } > > This happens during safepoint. The VMThread would hold the Threads_lock > and the above would deadlock. > > We need to acquire the Threads_lock only when turning off evacuation > outside of a safepoint. > > Roman > > Am Montag, den 05.12.2016, 17:00 +0100 schrieb Aleksey Shipilev: >> Hi, >> >> Currently, when concurrent GC is canceled, we still enter the VM >> operation for >> concurrent evacuation, only to exit it quickly and slide into the >> full GC. This >> causes *two* back-to-back safepoints: one short from evac, and >> another large for >> full GC. While short one is normally short, it can hit the unlucky >> scheduling >> outlier and drag the pause time up. >> >> This change avoids going to evac if conc GC was canceled: >> http://cr.openjdk.java.net/~shade/shenandoah/cancel-no-evac/webrev. >> 01/ >> >> Additionally, it resets the mark bitmaps before full GC with parallel >> workers, >> not concurrent ones, which would be important once Zhengyu trims down >> the number >> of concurrent workers. 
>> >> Testing: hotspot_gc_shenandoah, jcstress (tests-all/quick) >> >> Thanks, >> -Aleksey >> From rkennke at redhat.com Mon Dec 5 18:44:01 2016 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 05 Dec 2016 19:44:01 +0100 Subject: RFR (S): Avoid evacuation if concurrent GC was cancelled In-Reply-To: <5c695bd3-ea8d-1a5c-02fd-96f32d2570f8@redhat.com> References: <1480958401.2597.8.camel@redhat.com> <5c695bd3-ea8d-1a5c-02fd-96f32d2570f8@redhat.com> Message-ID: <1480963441.2597.12.camel@redhat.com> On Monday, 05.12.2016 at 19:09 +0100, Aleksey Shipilev wrote: > Okay! How about this then? > http://cr.openjdk.java.net/~shade/shenandoah/cancel-no- > evac/webrev.02/ Hmm, you still don't check for cancelled gc after the final-mark pause. Notice how initial-evacuation can, in theory, fail and cause full-gc. (In fact, if that happens, there'd be no need to exit the final-mark safepoint: we could jump right into full-gc. However, that sounds a bit tricky: we would need to ensure that shenandoahConcurrentThread doesn't start evacuation or another full-gc after that 'embedded' full-gc.) I like the comments though! Not your fault, but I find the use of both heap->cancelled_gc() and should_terminate() confusing. Not sure if it can be consolidated somehow? Not necessarily in this patch though. Another crazy pants idea to consider: if GC gets cancelled during marking, we could short-cut the full-gc: instead of throwing away the half-completed mark bitmap, we could have full-gc pick up both the half-completed mark bitmap *and* the current taskqueues from concurrent marking, complete that, and then do the full compact with it. The idea here is that if we fail during marking, in all likelihood we're *almost* done with marking and don't necessarily need to mark everything again. Downside would be that the mark bitmap is slightly pessimistic because of SATB.
Roman From shade at redhat.com Mon Dec 5 19:14:45 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 5 Dec 2016 20:14:45 +0100 Subject: RFR (S): Avoid evacuation if concurrent GC was cancelled In-Reply-To: <1480963441.2597.12.camel@redhat.com> References: <1480958401.2597.8.camel@redhat.com> <5c695bd3-ea8d-1a5c-02fd-96f32d2570f8@redhat.com> <1480963441.2597.12.camel@redhat.com> Message-ID: <3d84b557-c23a-1a5b-840d-0e360599194f@redhat.com> On 12/05/2016 07:44 PM, Roman Kennke wrote: > Am Montag, den 05.12.2016, 19:09 +0100 schrieb Aleksey Shipilev: >> Okay! How about this then? >> http://cr.openjdk.java.net/~shade/shenandoah/cancel-no- >> evac/webrev.02/ > > Hmm, you still don't check for cancelled gc after final-mark pause. > Notice how initial-evacuation can, in theory, fail and cause full-gc. Right. Oops, the code is hairy, and prone to mishaps like that. > Not your fault, but I find the use of both heap->cancelled_gc() and > should_terminate() confusing. Not sure if it can be consolidated > somehow? Not necessarily in this patch though. Yes, let's rehash ShenandoahConcurrentThread::run_service into two methods, so that code is cleaner and early returns make cancellation checks similar to our beloved ParallelTerminator: http://cr.openjdk.java.net/~shade/shenandoah/cancel-no-evac/webrev.03/ Still passes hotspot_gc_shenandoah, and jcstress is running. > The idea here is that if we fail during marking, in all likelyhood we're > *almost* done with marking and don't necessarily need to make > everything again. Downside would be that the mark bitmap is slightly > pessimistic because of SATB. No, I think Full GC should be our "last ditch" collection, and be able to recover from any legitimate heap situation. This mandates starting from scratch, to avoid spamming via e.g. SATB. We can probably do the "optimistic" STW collection that does reuse the concurrent mark data though. 
Thanks, -Aleksey From rkennke at redhat.com Mon Dec 5 19:53:42 2016 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 05 Dec 2016 20:53:42 +0100 Subject: RFR (S): Avoid evacuation if concurrent GC was cancelled In-Reply-To: <3d84b557-c23a-1a5b-840d-0e360599194f@redhat.com> References: <1480958401.2597.8.camel@redhat.com> <5c695bd3-ea8d-1a5c-02fd-96f32d2570f8@redhat.com> <1480963441.2597.12.camel@redhat.com> <3d84b557-c23a-1a5b-840d-0e360599194f@redhat.com> Message-ID: <1480967622.2597.14.camel@redhat.com> Am Montag, den 05.12.2016, 20:14 +0100 schrieb Aleksey Shipilev: > On 12/05/2016 07:44 PM, Roman Kennke wrote: > > Am Montag, den 05.12.2016, 19:09 +0100 schrieb Aleksey Shipilev: > > > Okay! How about this then? > > > ?http://cr.openjdk.java.net/~shade/shenandoah/cancel-no- > > > evac/webrev.02/ > > > > Hmm, you still don't check for cancelled gc after final-mark pause. > > Notice how initial-evacuation can, in theory, fail and cause full- > > gc.? > > Right. Oops, the code is hairy, and prone to mishaps like that. > > > Not your fault, but I find the use of both heap->cancelled_gc() and > > should_terminate() confusing. Not sure if it can be consolidated > > somehow? Not necessarily in this patch though. > > Yes, let's rehash ShenandoahConcurrentThread::run_service into two > methods, so > that code is cleaner and early returns make cancellation checks > similar to our > beloved ParallelTerminator: > ? http://cr.openjdk.java.net/~shade/shenandoah/cancel-no-evac/webrev. > 03/ > > Still passes hotspot_gc_shenandoah, and jcstress is running. Looks great! > > The idea here is that if we fail during marking, in all likelyhood > > we're > > *almost* done with marking and don't necessarily need to make > > everything again. Downside would be that the mark bitmap is > > slightly > > pessimistic because of SATB. > > No, I think Full GC should be our "last ditch" collection, and be > able to > recover from any legitimate heap situation. 
This mandates starting > from scratch, > to avoid spamming via e.g. SATB. Yes ok. Future idea: also compact humongous objects ;-) > We can probably do the "optimistic" STW > collection that does reuse the concurrent mark data though. Not exactly sure what you mean. Green light for the patch! Roman From ashipile at redhat.com Mon Dec 5 20:28:24 2016 From: ashipile at redhat.com (ashipile at redhat.com) Date: Mon, 05 Dec 2016 20:28:24 +0000 Subject: hg: shenandoah/jdk9/hotspot: Avoid evacuation if concurrent GC was cancelled. Make sure Full GC is able to recover. Message-ID: <201612052028.uB5KSPn8015227@aojmv0008.oracle.com> Changeset: 179aba55a53a Author: shade Date: 2016-12-05 21:28 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/179aba55a53a Avoid evacuation if concurrent GC was cancelled. Make sure Full GC is able to recover. ! src/share/vm/gc/shenandoah/shenandoahCollectorPolicy.cpp ! src/share/vm/gc/shenandoah/shenandoahCollectorPolicy.hpp ! src/share/vm/gc/shenandoah/shenandoahConcurrentThread.cpp ! src/share/vm/gc/shenandoah/shenandoahConcurrentThread.hpp ! src/share/vm/gc/shenandoah/shenandoahMarkCompact.cpp ! src/share/vm/runtime/thread.cpp From shade at redhat.com Tue Dec 6 17:17:11 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 6 Dec 2016 18:17:11 +0100 Subject: RFC: TLAB size flapping Message-ID: <8e9458d9-9c24-5b7c-6dd5-017728c81381@redhat.com> Hi, So, if you run allocation tests under -Xlog:gc+tlab, then a funny story unfolds.
The interesting piece of code is below; it is polled by the TLAB allocation machinery to figure out the max TLAB allocatable without hassle:

size_t ShenandoahHeap::unsafe_max_tlab_alloc(Thread *thread) const {
  size_t idx = _free_regions->current_index();
  ShenandoahHeapRegion* current = _free_regions->get(idx);
  if (current == NULL) {
    return 0;
  } else if (current->free() > MinTLABSize) {
    return current->free();
  } else {
    return MinTLABSize;
  }
}

This is what happens next:

// Step 1: TLAB request for allocating, polling Shenandoah about the next free
// region. Shenandoah replies there is a current free region with 256 words
// busy (hm!). Okay, we claim the rest of the region for a TLAB then.
[2.328s][trace][gc,tlab] TLAB: fill thread: 0x00007ffb54594800 ...
[2.328s][trace][gc,tlab] ShenandoahHeap::unsafe_max_tlab_alloc: region = 1019, capacity = 524288, used = 256, free = 524032
[2.328s][trace][gc,tlab] ThreadLocalAllocBuffer::compute_size(3) returns 524032
[2.328s][trace][gc,tlab] allocating new tlab of size 524032 at addr 0x00000006bec00800

// Step 2: Another TLAB request. No more space in current region. But yeah, we
// return MinTLABSize (those 256 words!), and shared infra moves on, asking us
// to allocate a new TLAB of 256 words. Now, the current region is depleted, so
// we allocate those 256 words in the *next* region.
[2.328s][trace][gc,tlab] TLAB: fill thread: 0x00007ffb54594800 ...
[2.329s][trace][gc,tlab] ShenandoahHeap::unsafe_max_tlab_alloc: (failing) region = 1019, capacity = 524288, used = 524288, free = 0
[2.329s][trace][gc,tlab] ThreadLocalAllocBuffer::compute_size(3) returns 256
[2.329s][trace][gc,tlab] allocating new tlab of size 256 at addr 0x00000006bf000000

// Step 1 again. The cycle continues. Another TLAB request, current region has
// 256 words used, claim the rest... goes on and on.
[2.329s][trace][gc,tlab] TLAB: fill thread: 0x00007ffb54594800 ...
[2.329s][trace][gc,tlab] ShenandoahHeap::unsafe_max_tlab_alloc: region = 1020, capacity = 524288, used = 256, free = 524032
[2.329s][trace][gc,tlab] ThreadLocalAllocBuffer::compute_size(3) returns 524032
[2.329s][trace][gc,tlab] allocating new tlab of size 524032 at addr 0x00000006bf000800

So, this flaps TLAB allocations between the region size and MinTLABSize. Oops! We enter the slow path *twice* per region, instead of doing it once. I think returning MinTLABSize is wrong in the code above, and we have two options:
  a) Return 0 on the MinTLABSize branch. If I read the code right, this will bail us from the TLAB allocation path, which is undesirable;
  b) Advance to the next free region, and try to poll its free().
G1 is susceptible to the same problem, as far as I can see. Thanks, -Aleksey From rkennke at redhat.com Tue Dec 6 17:26:27 2016 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 06 Dec 2016 18:26:27 +0100 Subject: RFC: TLAB size flapping In-Reply-To: <8e9458d9-9c24-5b7c-6dd5-017728c81381@redhat.com> References: <8e9458d9-9c24-5b7c-6dd5-017728c81381@redhat.com> Message-ID: <1481045187.2597.19.camel@redhat.com> On Tuesday, 06.12.2016 at 18:17 +0100, Aleksey Shipilev wrote: > Hi, > > So, if you run allocation tests under -Xlog:gc+tlab, then a funny story unfolds. > The interesting piece of code is below; it is polled by the TLAB allocation machinery to figure out the max TLAB allocatable without hassle:
>
> size_t ShenandoahHeap::unsafe_max_tlab_alloc(Thread *thread) const {
>   size_t idx = _free_regions->current_index();
>   ShenandoahHeapRegion* current = _free_regions->get(idx);
>   if (current == NULL) {
>     return 0;
>   } else if (current->free() > MinTLABSize) {
>     return current->free();
>   } else {
>     return MinTLABSize;
>   }
> }
>
> This is what happens next:
>
> // Step 1: TLAB request for allocating, polling Shenandoah about the next free
> // region.
Shenandoah replies there is a current free region with 256 > words > // busy (hm!). Okay, we claim the rest of the region for a TLAB then. > [2.328s][trace][gc,tlab] TLAB: fill thread: 0x00007ffb54594800 ... > [2.328s][trace][gc,tlab] ShenandoahHeap::unsafe_max_tlab_alloc: > region = 1019, > capacity = 524288, used = 256, free = 524032 > [2.328s][trace][gc,tlab] ThreadLocalAllocBuffer::compute_size(3) > returns 524032 > [2.328s][trace][gc,tlab] allocating new tlab of size 524032 at addr > 0x00000006bec00800 > > // Step 2: Another TLAB request. No more space in current region. But > yeah, we > // return MinTLABSize (those 256 words!), and shared infra moves on, > asking us > // to allocate a new TLAB of 256 words. Now, the current region is > depleted, so > // we allocate those 256 words in the *next* region. > [2.328s][trace][gc,tlab] TLAB: fill thread: 0x00007ffb54594800 ... > [2.329s][trace][gc,tlab] ShenandoahHeap::unsafe_max_tlab_alloc: > (failing) region > = 1019, capacity = 524288, used = 524288, free = 0 > [2.329s][trace][gc,tlab] ThreadLocalAllocBuffer::compute_size(3) > returns 256 > [2.329s][trace][gc,tlab] allocating new tlab of size 256 at addr > 0x00000006bf000000 > > // Step 1 again. The cycle continues. Another TLAB request, current > region has > // 256 words used, claim the rest... goes on and on. > [2.329s][trace][gc,tlab] TLAB: fill thread: 0x00007ffb54594800 ... > [2.329s][trace][gc,tlab] ShenandoahHeap::unsafe_max_tlab_alloc: > region = 1020, > capacity = 524288, used = 256, free = 524032 > [2.329s][trace][gc,tlab] ThreadLocalAllocBuffer::compute_size(3) > returns 524032 > [2.329s][trace][gc,tlab] allocating new tlab of size 524032 at addr > 0x00000006bf000800 > > So, this flaps TLAB allocations between the region size and > MinTLABSize. Oops! Oops indeed! :-) > We enter the slow path *twice* per region, instead of doing it once. > I think > returning MinTLABSize is wrong in the code above, and we have two > options: > ? 
a) Return 0 on MinTLABSize branch. If I read the code right, this > will bail us > from TLAB allocation path, which is undesirable; > b) Advance to the next free region, and try to poll its free(). Hmm, a seems undesirable. Do we really need to advance to the next region? Can't we simply return region-size here? I mean, it is inherently racy and it doesn't matter if we advance right now, or a little later when trying to allocate. Returning X here doesn't somehow magically guarantee that we can later allocate X without skipping to the next region. Unless it's somehow done atomically. Which we don't. (Shenandoah does lock-free allocations, maybe other GCs are better off because they allocate under Heap_lock?) Roman From shade at redhat.com Tue Dec 6 17:55:09 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 6 Dec 2016 18:55:09 +0100 Subject: RFR: TLAB size flapping In-Reply-To: <1481045187.2597.19.camel@redhat.com> References: <8e9458d9-9c24-5b7c-6dd5-017728c81381@redhat.com> <1481045187.2597.19.camel@redhat.com> Message-ID: <7ff5d029-768d-1e9a-b093-2c81a379fb5f@redhat.com> On 12/06/2016 06:26 PM, Roman Kennke wrote: >> We enter the slow path *twice* per region, instead of doing it once. >> I think >> returning MinTLABSize is wrong in the code above, and we have two >> options: >> a) Return 0 on MinTLABSize branch. If I read the code right, this >> will bail us >> from TLAB allocation path, which is undesirable; >> b) Advance to the next free region, and try to poll its free(). > > Hmm, a seems undesirable. Do we really need to advance to the next region? > Can't we simply return region-size here? I mean, it is inherently racy > and it doesn't matter if we advance right now, or a little later when > trying to allocate. Returning X here doesn't somehow magically > guarantee that we can later allocate X without skipping to the next region. > Unless it's somehow done atomically. Which we don't.
(Shenandoah does > lock-free allocations, maybe other GCs are better off because they > allocate under Heap_lock?)

Ah, very good, we can return the region size, knowing the next free region is completely free:

http://cr.openjdk.java.net/~shade/shenandoah/tlab-flapping/webrev.01/

It does seem to improve allocation rates when multiple allocating threads are bashing us with requests (caveat emptor: new workload, blah-blah):

http://cr.openjdk.java.net/~shade/shenandoah/tlab-flapping/baseline.txt
http://cr.openjdk.java.net/~shade/shenandoah/tlab-flapping/patched.txt

Thanks,
-Aleksey

From shade at redhat.com Tue Dec 6 18:39:01 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 6 Dec 2016 19:39:01 +0100 Subject: Perf: excess store in allocation fast path? Message-ID: <9d13f3da-424d-1dc6-3e48-75132ec60875@redhat.com>

Hi, (Roland?)

I think we have the excess store at allocation fast path, compare Shenandoah [1] and Parallel [2]. And this is not storing the fwdptr, but seems to be the excess zeroing. In that test, allocating a simple Object yields this:

  mov    %r11,(%rax)            ; mark word
  prefetchnta 0xc0(%r10)
  movl   $0xf80001dd,0x8(%rax)  ; class word
  mov    %rax,-0x8(%rax)        ; fwdptr
  mov    %r12d,0xc(%rax)        ; zeroing last 4 bytes
  mov    %r12,0x10(%rax)        ; <--- hey, what?

I think this happens because allocation fastpath bumps the instance size to "cover" for the upcoming object's fwdptr, and accidentally zeroes it as well? Do we need this? I can imagine the invariant that everything up to top pointer should be zeroed, is this such a case?
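The arithmetic behind that last store can be put into numbers. This is an editorial sketch, not VM code; the sizes are assumptions (a bare java.lang.Object of 16 bytes: 8-byte mark word, 4-byte class word, 4 bytes padding; plus an 8-byte Brooks forwarding-pointer word), and the helper names are invented for the sketch:

```cpp
#include <cassert>
#include <cstddef>

// Assumed sizes for the sketch (not taken from the VM sources).
const size_t kObjectBytes = 16;  // bare Object: mark word + class word + pad
const size_t kFwdPtrBytes = 8;   // Brooks forwarding pointer word

// The fast path bumps the allocation size so the chunk also covers the
// forwarding-pointer slot.
size_t bumped_alloc_bytes(size_t instance_bytes) {
  return instance_bytes + kFwdPtrBytes;
}

// If zeroing runs up to the bumped size instead of the instance size, the
// first excess store lands at this offset from the object base -- 0x10 for
// a plain Object, matching the last 'mov' in the listing above.
size_t excess_store_offset(size_t instance_bytes) {
  return instance_bytes;
}
```

Under these assumptions, a 16-byte Object turns into a 24-byte allocation, and zeroing to the bumped size touches offset 0x10: one word past the object itself.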
The original test is in our suite [3], runnable like this, if you want to poke around it:

$ java -jar target/benchmarks.jar alloc.plain.Objects --jvmArgs "-XX:+UseShenandoahGC -Xmx8g -Xms8g" -f 1 -wi 5 -i 5 -t 1 -prof perfasm

Thanks,
-Aleksey

[1] http://cr.openjdk.java.net/~shade/shenandoah/alloc-excess-store/alloc-shenandoah.txt
[2] http://cr.openjdk.java.net/~shade/shenandoah/alloc-excess-store/alloc-parallel.txt
[3] http://icedtea.classpath.org/people/shade/gc-bench

From roman at kennke.org Tue Dec 6 18:25:39 2016 From: roman at kennke.org (Roman Kennke) Date: Tue, 06 Dec 2016 19:25:39 +0100 Subject: RFR: TLAB size flapping Message-ID: OK!

Sent from my FairPhone. On 06.12.2016 at 6:55 p.m., Aleksey Shipilev wrote: > > On 12/06/2016 06:26 PM, Roman Kennke wrote: > >> We enter the slow path *twice* per region, instead of doing it once. > >> I think > >> returning MinTLABSize is wrong in the code above, and we have two > >> options: > >> a) Return 0 on MinTLABSize branch. If I read the code right, this > >> will bail us > >> from TLAB allocation path, which is undesirable; > >> b) Advance to the next free region, and try to poll its free(). > > > > Hmm, a seems undesirable. Do we really need to advance to the next region? > > Can't we simply return region-size here? I mean, it is inherently racy > > and it doesn't matter if we advance right now, or a little later when > > trying to allocate. Returning X here doesn't somehow magically > > guarantee that we can later allocate X without skipping to the next region. > > Unless it's somehow done atomically. Which we don't. (Shenandoah does > > lock-free allocations, maybe other GCs are better off because they > > allocate under Heap_lock?) > > Ah, very good, we can return the region size, knowing the next free region is > completely free: >
http://cr.openjdk.java.net/~shade/shenandoah/tlab-flapping/webrev.01/ > > It does seem to improve allocation rates when multiple allocating threads are > bashing us with requests (caveat emptor: new workload, blah-blah): > http://cr.openjdk.java.net/~shade/shenandoah/tlab-flapping/baseline.txt > http://cr.openjdk.java.net/~shade/shenandoah/tlab-flapping/patched.txt > > Thanks, > -Aleksey > > From ashipile at redhat.com Tue Dec 6 18:42:07 2016 From: ashipile at redhat.com (ashipile at redhat.com) Date: Tue, 06 Dec 2016 18:42:07 +0000 Subject: hg: shenandoah/jdk9/hotspot: Fix TLAB flapping. Do not reply with MinTLABSize if we have no space left in current region, make allocator to ask for another region. Message-ID: <201612061842.uB6Ig7ro020621@aojmv0008.oracle.com> Changeset: 7009fc6f74b3 Author: shade Date: 2016-12-06 19:41 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/7009fc6f74b3 Fix TLAB flapping. Do not reply with MinTLABSize if we have no space left in current region, make allocator to ask for another region. ! src/share/vm/gc/shenandoah/shenandoahHeap.cpp ! src/share/vm/gc/shenandoah/shenandoahHeapRegion.cpp From chf at redhat.com Tue Dec 6 18:47:53 2016 From: chf at redhat.com (Christine Flood) Date: Tue, 6 Dec 2016 13:47:53 -0500 (EST) Subject: First pass at a connection matrix... In-Reply-To: <34332593.2371335.1481049924996.JavaMail.zimbra@redhat.com> Message-ID: <342735797.2371608.1481050073955.JavaMail.zimbra@redhat.com> This is just experimental for now. The long term plan is to have this matrix built by write barriers and have a smarter metric for choosing connected collection set regions. The matrix is built and printed during concurrent marking if you run with -XX:+ShenandoahMatrix. The somewhat silly heuristic is run via -XX:ShenandoahGCHeuristics=connections. This isn't really integrated with the new region_in_collection_set stuff, but is enough for now. 
http://cr.openjdk.java.net/~chf/connections/webrev.01/ Christine From shade at redhat.com Tue Dec 6 18:53:27 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 6 Dec 2016 19:53:27 +0100 Subject: RFR: TLAB size flapping In-Reply-To: <201612061839.uB6IdOGp025048@int-mx10.intmail.prod.int.phx2.redhat.com> References: <201612061839.uB6IdOGp025048@int-mx10.intmail.prod.int.phx2.redhat.com> Message-ID: <0b3722cd-1a35-bdef-7aee-c9bad9261af7@redhat.com> On 12/06/2016 07:25 PM, Roman Kennke wrote: > OK! Pushed. I know some G1 folks are reading this list (waves), so here is the relevant bug for G1. Maybe there is a better solution there: https://bugs.openjdk.java.net/browse/JDK-8170817 Thanks, -Aleksey From shade at redhat.com Tue Dec 6 19:13:00 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 6 Dec 2016 20:13:00 +0100 Subject: First pass at a connection matrix... In-Reply-To: <342735797.2371608.1481050073955.JavaMail.zimbra@redhat.com> References: <342735797.2371608.1481050073955.JavaMail.zimbra@redhat.com> Message-ID: <9ccde02b-73fd-d7d3-da7b-b2d7c25e1e04@redhat.com> On 12/06/2016 07:47 PM, Christine Flood wrote: > http://cr.openjdk.java.net/~chf/connections/webrev.01/ I don't mind this experimental code in repo, but let's do a few cleanups to match with other experimental hacks we have! *) Change tty->print-s to log_develop_trace(gc); that also fixes tty->print vs. tty->print_cr. *) Prefix new bug comments with FIXME *) Crush bad formatting early: - ConnectionHeuristics::choose_collection_set: indenting, 2 vs 3 spaces? - ShenandoahCollectorPolicy::phase_times() definition, excess space - globals.hpp, align right "\" around new additions *) UseShenandoahOWST in globals.hpp moved accidentally? Otherwise looks okay. Thanks, -Aleksey From rkennke at redhat.com Tue Dec 6 19:25:14 2016 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 06 Dec 2016 20:25:14 +0100 Subject: Perf: excess store in allocation fast path? 
In-Reply-To: <9d13f3da-424d-1dc6-3e48-75132ec60875@redhat.com> References: <9d13f3da-424d-1dc6-3e48-75132ec60875@redhat.com> Message-ID: <1481052314.2597.21.camel@redhat.com> On Tuesday, 06.12.2016 at 19:39 +0100, Aleksey Shipilev wrote:
> Hi, (Roland?)
>
> I think we have the excess store at allocation fast path, compare
> Shenandoah [1] and Parallel [2]. And this is not storing the fwdptr,
> but seems to be the excess zeroing. In that test, allocating a simple
> Object yields this:
>
>   mov    %r11,(%rax)            ; mark word
>   prefetchnta 0xc0(%r10)
>   movl   $0xf80001dd,0x8(%rax)  ; class word
>   mov    %rax,-0x8(%rax)        ; fwdptr
>   mov    %r12d,0xc(%rax)        ; zeroing last 4 bytes
>   mov    %r12,0x10(%rax)        ; <--- hey, what?
>
> I think this happens because allocation fastpath bumps the instance
> size to "cover" for the upcoming object's fwdptr, and accidentally
> zeroes it as well? Do we need this? I can imagine the invariant that
> everything up to top pointer should be zeroed, is this such a case?

It looks like initialization for the first field in the object. Maybe we're failing the c2 opt that eliminates initial zeroing for fields? Maybe our barrier or allocation stuff somehow gets in the way of that and c2 can't see the initialization and therefore cannot optimize it away?

Roman

From shade at redhat.com Tue Dec 6 19:29:28 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 6 Dec 2016 20:29:28 +0100 Subject: Perf: excess store in allocation fast path? In-Reply-To: <1481052314.2597.21.camel@redhat.com> References: <9d13f3da-424d-1dc6-3e48-75132ec60875@redhat.com> <1481052314.2597.21.camel@redhat.com> Message-ID: <66157ae6-7918-fbb9-c4f9-20a287e48786@redhat.com> On 12/06/2016 08:25 PM, Roman Kennke wrote:
> On Tuesday, 06.12.2016 at 19:39 +0100, Aleksey Shipilev wrote:
>> I think we have the excess store at allocation fast path, compare
>> Shenandoah [1] and Parallel [2].
And this is not storing the fwdptr, but >> seems to be the excess zeroing. In that test, allocating a simple Object >> yields this: >> >> mov %r11,(%rax) ; mark word >> prefetchnta 0xc0(%r10) >> movl $0xf80001dd,0x8(%rax) ; class word >> mov %rax,-0x8(%rax) ; fwdptr >> mov %r12d,0xc(%rax) ; zeroing last 4 bytes >> mov %r12,0x10(%rax) ; <--- hey, what? >> >> I think this happens because allocation fastpath bumps the instance size >> to "cover" for the upcoming object's fwdptr, and accidentally zeroes it as >> well? Do we need this? I can imagine the invariant that everything up to >> top pointer should be zeroed, is this such a case? > > It looks like initialization for the first field in the object. Maybe > we're failing the c2 opt that eliminates initial zeroing for fields? > Maybe our barrier or allocation stuff somehow gets in the way of that > and c2 can't see the initialization and therefore cannot optimize it > away? The test allocates new Object(), no fields. The object is 16 bytes long, yet we store something beyond 16 bytes -- which AFAIR is the slot for the next object's forwarding pointer. Thanks, -Aleksey From rkennke at redhat.com Tue Dec 6 19:44:14 2016 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 06 Dec 2016 20:44:14 +0100 Subject: Perf: excess store in allocation fast path? In-Reply-To: <66157ae6-7918-fbb9-c4f9-20a287e48786@redhat.com> References: <9d13f3da-424d-1dc6-3e48-75132ec60875@redhat.com> <1481052314.2597.21.camel@redhat.com> <66157ae6-7918-fbb9-c4f9-20a287e48786@redhat.com> Message-ID: <1481053454.2597.23.camel@redhat.com> Am Dienstag, den 06.12.2016, 20:29 +0100 schrieb Aleksey Shipilev: > On 12/06/2016 08:25 PM, Roman Kennke wrote: > > Am Dienstag, den 06.12.2016, 19:39 +0100 schrieb Aleksey Shipilev: > > > I think we have the excess store at allocation fast path, > > > compare? > > > Shenandoah [1] and Parallel [2]. And this is not storing the > > > fwdptr, but > > > seems to be the excess zeroing. 
In that test, allocating a simple Object yields this:
> > >
> > >   mov    %r11,(%rax)            ; mark word
> > >   prefetchnta 0xc0(%r10)
> > >   movl   $0xf80001dd,0x8(%rax)  ; class word
> > >   mov    %rax,-0x8(%rax)        ; fwdptr
> > >   mov    %r12d,0xc(%rax)        ; zeroing last 4 bytes
> > >   mov    %r12,0x10(%rax)        ; <--- hey, what?
> > >
> > > I think this happens because allocation fastpath bumps the instance size
> > > to "cover" for the upcoming object's fwdptr, and accidentally zeroes it as
> > > well? Do we need this? I can imagine the invariant that everything up to
> > > top pointer should be zeroed, is this such a case?
> >
> > It looks like initialization for the first field in the object. Maybe
> > we're failing the c2 opt that eliminates initial zeroing for fields?
> > Maybe our barrier or allocation stuff somehow gets in the way of that
> > and c2 can't see the initialization and therefore cannot optimize it away?
>
> The test allocates new Object(), no fields. The object is 16 bytes long,
> yet we store something beyond 16 bytes -- which AFAIR is the slot for the
> next object's forwarding pointer.

Try the attached patch. It preserves the obj_size, and passes that to initialize_object().

-------------- next part --------------
diff --git a/src/share/vm/opto/macro.cpp b/src/share/vm/opto/macro.cpp
--- a/src/share/vm/opto/macro.cpp
+++ b/src/share/vm/opto/macro.cpp
@@ -1449,6 +1449,7 @@
     transform_later(old_eden_top);
     // Add to heap top to get a new heap top
+    Node* init_size_in_bytes = size_in_bytes;
     if (UseShenandoahGC) {
       // Allocate several words more for the Shenandoah brooks pointer.
       size_in_bytes = new AddLNode(size_in_bytes, _igvn.MakeConX(BrooksPointer::byte_size()));
@@ -1554,7 +1555,7 @@
     InitializeNode* init = alloc->initialization();
     fast_oop_rawmem = initialize_object(alloc,
                                         fast_oop_ctrl, fast_oop_rawmem, fast_oop,
-                                        klass_node, length, size_in_bytes);
+                                        klass_node, length, init_size_in_bytes);
     // If initialization is performed by an array copy, any required
     // MemBarStoreStore was already added. If the object does not

From shade at redhat.com Tue Dec 6 19:50:57 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 6 Dec 2016 20:50:57 +0100 Subject: Perf: excess store in allocation fast path? In-Reply-To: <1481053454.2597.23.camel@redhat.com> References: <9d13f3da-424d-1dc6-3e48-75132ec60875@redhat.com> <1481052314.2597.21.camel@redhat.com> <66157ae6-7918-fbb9-c4f9-20a287e48786@redhat.com> <1481053454.2597.23.camel@redhat.com> Message-ID: <9f061c20-d6f9-29b9-91d6-5e632c501dcc@redhat.com> On 12/06/2016 08:44 PM, Roman Kennke wrote:
> Try the attached patch. It preserves the obj_size, and passes that to
> initialize_object().

Yea, that works, see:

http://cr.openjdk.java.net/~shade/shenandoah/alloc-excess-store/alloc-shenandoah-rkennke1.txt

Compare with baseline:

http://cr.openjdk.java.net/~shade/shenandoah/alloc-excess-store/alloc-shenandoah.txt

...and have your 50 picoseconds per alloc back!

Now, I want to know if it's okay to skip zeroing memory past the allocation pointer. I think it is safe, because that's how zeroing elimination works in other cases?

Thanks,
-Aleksey

From rkennke at redhat.com Tue Dec 6 19:55:17 2016 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 06 Dec 2016 20:55:17 +0100 Subject: Perf: excess store in allocation fast path?
In-Reply-To: <9f061c20-d6f9-29b9-91d6-5e632c501dcc@redhat.com> References: <9d13f3da-424d-1dc6-3e48-75132ec60875@redhat.com> <1481052314.2597.21.camel@redhat.com> <66157ae6-7918-fbb9-c4f9-20a287e48786@redhat.com> <1481053454.2597.23.camel@redhat.com> <9f061c20-d6f9-29b9-91d6-5e632c501dcc@redhat.com> Message-ID: <1481054117.2597.24.camel@redhat.com> Am Dienstag, den 06.12.2016, 20:50 +0100 schrieb Aleksey Shipilev: > On 12/06/2016 08:44 PM, Roman Kennke wrote: > > Try the attached patch. It preserves the obj_size, and passes that > > to > > initialize_object(). > > Yea, that works, see: > > http://cr.openjdk.java.net/~shade/shenandoah/alloc-excess-store/alloc > -shenandoah-rkennke1.txt > > Compare with baseline: > > http://cr.openjdk.java.net/~shade/shenandoah/alloc-excess-store/alloc > -shenandoah.txt > > ...and have your 50 picoseconds per alloc back! > > Now, I want to know if it's okay to skip zeroing memory past the > allocation > pointer. I think it is safe, because that's how zeroing elimination > works in > other cases? It's not only ok, I think it is a bug to zero past the allocation ptr. Consider what happens when you allocate at the region boundary, and then initialize one word past the object -> we'd wreck the 1st word of the next region. Roman From shade at redhat.com Tue Dec 6 20:07:24 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 6 Dec 2016 21:07:24 +0100 Subject: Perf: excess store in allocation fast path? 
In-Reply-To: <1481054117.2597.24.camel@redhat.com> References: <9d13f3da-424d-1dc6-3e48-75132ec60875@redhat.com> <1481052314.2597.21.camel@redhat.com> <66157ae6-7918-fbb9-c4f9-20a287e48786@redhat.com> <1481053454.2597.23.camel@redhat.com> <9f061c20-d6f9-29b9-91d6-5e632c501dcc@redhat.com> <1481054117.2597.24.camel@redhat.com> Message-ID: <54c8fecd-6bd5-8427-cd11-34466f6d8ed4@redhat.com> On 12/06/2016 08:55 PM, Roman Kennke wrote:
> On Tuesday, 06.12.2016 at 20:50 +0100, Aleksey Shipilev wrote:
>> Now, I want to know if it's okay to skip zeroing memory past the
>> allocation pointer. I think it is safe, because that's how zeroing
>> elimination works in other cases?
>
> It's not only ok, I think it is a bug to zero past the allocation ptr.
> Consider what happens when you allocate at the region boundary, and
> then initialize one word past the object -> we'd wreck the 1st word of
> the next region.

Hrmpf. If I recall our filler object mechanics correctly, we allocate the space at the end of the object, so there is no way to cross into another region?

Anyhow, that one notwithstanding, I meant whether it's okay to have a non-zeroed slot _under_ the allocation top, as in:

  (obj2 header would go here)
  ----------------------------------------- alloc top
  [garbage slot, soon to be obj2 fwdptr]
  [obj1 fields]
  [obj1 header]
  [obj1 fwdptr]
  ...

It's not likely to be parsable, but still.

Thanks,
-Aleksey

From rkennke at redhat.com Tue Dec 6 20:53:19 2016 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 06 Dec 2016 21:53:19 +0100 Subject: Perf: excess store in allocation fast path?
In-Reply-To: <54c8fecd-6bd5-8427-cd11-34466f6d8ed4@redhat.com> References: <9d13f3da-424d-1dc6-3e48-75132ec60875@redhat.com> <1481052314.2597.21.camel@redhat.com> <66157ae6-7918-fbb9-c4f9-20a287e48786@redhat.com> <1481053454.2597.23.camel@redhat.com> <9f061c20-d6f9-29b9-91d6-5e632c501dcc@redhat.com> <1481054117.2597.24.camel@redhat.com> <54c8fecd-6bd5-8427-cd11-34466f6d8ed4@redhat.com> Message-ID: <1481057599.2597.27.camel@redhat.com> On Tuesday, 06.12.2016 at 21:07 +0100, Aleksey Shipilev wrote:
> On 12/06/2016 08:55 PM, Roman Kennke wrote:
> > On Tuesday, 06.12.2016 at 20:50 +0100, Aleksey Shipilev wrote:
> > > Now, I want to know if it's okay to skip zeroing memory past the
> > > allocation pointer. I think it is safe, because that's how zeroing
> > > elimination works in other cases?
> >
> > It's not only ok, I think it is a bug to zero past the allocation ptr.
> > Consider what happens when you allocate at the region boundary, and
> > then initialize one word past the object -> we'd wreck the 1st word of
> > the next region.
>
> Hrmpf. If I recall our filler object mechanics correctly, we allocate the
> space at the end of the object, so there is no way to cross into another region?

Nope, this shouldn't be the case. We *should* always allocate brooks ptr + object of this object, not into the next one.

> Anyhow, that one notwithstanding, I meant whether it's okay to have a
> non-zeroed slot _under_ the allocation top, as in:
>
>   (obj2 header would go here)
>   ----------------------------------------- alloc top
>   [garbage slot, soon to be obj2 fwdptr]
>   [obj1 fields]
>   [obj1 header]
>   [obj1 fwdptr]
>   ...

It should be:

  (obj2 header would go here)
  [garbage slot, soon to be obj2 fwdptr]
  ----------------------------------------- alloc top
  [obj1 fields]
  [obj1 header]
  [obj1 fwdptr]
  ...

if not, I'd say it's a bug.
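The corrected picture corresponds to a simple bump-pointer model: each allocation reserves the forwarding-pointer word *before* the object, and the top ends exactly at the object's end. This is an editorial sketch with made-up word-granularity indices, not the VM's allocator:

```cpp
#include <cassert>
#include <cstddef>

// Sketch of a region in which each allocation is (fwdptr word + object),
// and top ends right after the object. The word at 'top' is the next
// object's fwdptr slot: it stays garbage until that object is allocated.
struct Region {
  size_t top = 0;  // word index of the allocation top

  // Returns the word index of the object base; its fwdptr sits at base - 1.
  size_t allocate(size_t object_words) {
    size_t fwdptr = top;         // fwdptr slot of this object
    size_t base   = fwdptr + 1;  // object header starts here
    top = base + object_words;   // top ends right after the object
    return base;
  }
};
```

After allocating obj1, `top` points exactly at what will become obj2's fwdptr slot, so nothing below `top` is left uninitialized, and nothing above it has been zeroed yet.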
Roman From rkennke at redhat.com Tue Dec 6 21:24:24 2016 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 06 Dec 2016 22:24:24 +0100 Subject: RFR (XL): Backport JDK9 Shenandoah to JDK8u Message-ID: <1481059464.2597.29.camel@redhat.com> This huge change backports the current state of JDK9 (minus the last bunch of patches) to jdk8u: http://cr.openjdk.java.net/~rkennke/backport-jdk8/webrev.00/ Not sure if this can be reasonably reviewed. ;-) I checked this line by line and also compared it to our baseline repo (http://hg.openjdk.java.net/aarch64-port/jdk8u/hotspot/). The one thing missing is changes in src/share/vm/opto and src/share/vm/adlc, but Roland is working on those. I've checked with SPECjvm2008 and jcstress is on the way. Unfortunately, I could not get the jtreg stuff to work. Ok to go in? Roman From rkennke at redhat.com Tue Dec 6 21:29:28 2016 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 06 Dec 2016 22:29:28 +0100 Subject: RFR: Fix object initialization in C2 Message-ID: <1481059768.2597.31.camel@redhat.com> As discussed in the previous thread, we overshoot object initialization by one word in C2-compiled allocation code. Besides generating one extra store, I believe it's very dangerous: an object allocated at the region end would write one word beyond, either thrashing the brooks ptr of the next region's first object, or causing a SEGV at the end of the heap. I'm actually surprised it hasn't happened yet ;-) The fix is relatively simple: keep around the true object size, and pass that to initialize_object() instead of the obj-size + brooksptr-size that we calculated. http://cr.openjdk.java.net/~rkennke/obj-init/webrev.00/ Ok?
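The boundary hazard described in that RFR can be illustrated with assumed numbers (512 KB regions and an 8-byte Brooks pointer word; an editorial sketch, not VM code):

```cpp
#include <cassert>
#include <cstddef>

// Assumed geometry for the sketch: the last chunk in a region holds the
// forwarding pointer followed by the object, ending exactly at the
// region boundary.
const size_t kRegionBytes = 512 * 1024;
const size_t kFwdPtrBytes = 8;

size_t last_obj_base(size_t obj_bytes) {
  return kRegionBytes - obj_bytes;  // its fwdptr sits at base - kFwdPtrBytes
}

// Buggy variant: zeroing runs over obj_bytes + fwdptr bytes and crosses
// the region boundary by one word.
bool zeroing_overshoots(size_t obj_bytes) {
  return last_obj_base(obj_bytes) + obj_bytes + kFwdPtrBytes > kRegionBytes;
}

// Fixed variant: zeroing stops at the true object size and stays inside.
bool zeroing_overshoots_fixed(size_t obj_bytes) {
  return last_obj_base(obj_bytes) + obj_bytes > kRegionBytes;
}
```

With a 16-byte object ending exactly at the boundary, the buggy size writes 8 bytes into the next region, while the fixed size stays within it.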
Roman From shade at redhat.com Wed Dec 7 08:49:59 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 7 Dec 2016 09:49:59 +0100 Subject: RFR: Fix object initialization in C2 In-Reply-To: <1481059768.2597.31.camel@redhat.com> References: <1481059768.2597.31.camel@redhat.com> Message-ID: <8aaa8c2a-183d-5959-99f8-8ecfdc9cea9b@redhat.com> On 12/06/2016 10:29 PM, Roman Kennke wrote: > http://cr.openjdk.java.net/~rkennke/obj-init/webrev.00/ Looks good to me. I would like Roland to OK this. -Aleksey From rwestrel at redhat.com Wed Dec 7 08:55:24 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Wed, 07 Dec 2016 09:55:24 +0100 Subject: RFR: Fix object initialization in C2 In-Reply-To: <1481059768.2597.31.camel@redhat.com> References: <1481059768.2597.31.camel@redhat.com> Message-ID: > http://cr.openjdk.java.net/~rkennke/obj-init/webrev.00/ That looks good to me. Roland. From rkennke at redhat.com Wed Dec 7 10:18:42 2016 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 07 Dec 2016 11:18:42 +0100 Subject: RFR: Fix object initialization in C2 In-Reply-To: References: <1481059768.2597.31.camel@redhat.com> Message-ID: <1481105922.2597.32.camel@redhat.com> Am Mittwoch, den 07.12.2016, 09:55 +0100 schrieb Roland Westrelin: > > http://cr.openjdk.java.net/~rkennke/obj-init/webrev.00/ > > That looks good to me. Thanks. I pushed it. Turns out that we're saved by the prefetch-reserve in ThreadLocalAllocationBuffer: it always allocates a few words more than necessary and thus we're never jumping off the cliff. 
Lucky us :-) Roman From roman at kennke.org Wed Dec 7 10:19:37 2016 From: roman at kennke.org (roman at kennke.org) Date: Wed, 07 Dec 2016 10:19:37 +0000 Subject: hg: shenandoah/jdk9/hotspot: Fix object initialization in C2 Message-ID: <201612071019.uB7AJbp5019698@aojmv0008.oracle.com> Changeset: f6d8d643198e Author: rkennke Date: 2016-12-07 11:17 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/f6d8d643198e Fix object initialization in C2 ! src/share/vm/opto/macro.cpp From shade at redhat.com Wed Dec 7 12:12:50 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 7 Dec 2016 13:12:50 +0100 Subject: RFR (XL): Backport JDK9 Shenandoah to JDK8u In-Reply-To: <1481059464.2597.29.camel@redhat.com> References: <1481059464.2597.29.camel@redhat.com> Message-ID: On 12/06/2016 10:24 PM, Roman Kennke wrote: > This huge change backports the current state of JDK9 (minus the last > bunch of patches) to jdk8u: > > http://cr.openjdk.java.net/~rkennke/backport-jdk8/webrev.00/ Spot-checking: *) src/cpu/aarch64/vm/c1_LIRAssembler_aarch64.cpp These are not conditional for Shenandoah, do we hit these guarantees with other GCs? 2032 guarantee(opr2->type() != T_OBJECT && opr2->type() != T_ARRAY, "need acmp barrier?"); 2033 guarantee(opr1->type() != T_OBJECT && opr1->type() != T_ARRAY, "need acmp barrier?"); *) src/share/vm/c1/c1_Runtime1.cpp Bad indent: 688 Handle h_obj(thread, obj); *) src/share/vm/memory/barrierSet.cpp Why we moved BarrierSet::write_ref_array here? Was that the upstream jdk-9 move? Should probably stay closer to jdk-8 version. *) src/share/vm/runtime/fieldDescriptor.hpp Another leak from jdk-9? 101 bool is_stable() const { return access_flags().is_stable(); } *) src/share/vm/runtime/os.hpp Leak? 56 class methodHandle; *) src/share/vm/utilities/growableArray.hpp Leak? 30 #include "oops/oop.hpp" Otherwise looks good. 
Thanks,
-Aleksey

From rkennke at redhat.com Wed Dec 7 13:08:17 2016 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 07 Dec 2016 14:08:17 +0100 Subject: RFR (XL): Backport JDK9 Shenandoah to JDK8u In-Reply-To: References: <1481059464.2597.29.camel@redhat.com> Message-ID: <1481116097.2597.34.camel@redhat.com> On Wednesday, 07.12.2016 at 13:12 +0100, Aleksey Shipilev wrote:
> On 12/06/2016 10:24 PM, Roman Kennke wrote:
> > This huge change backports the current state of JDK9 (minus the last
> > bunch of patches) to jdk8u:
> >
> > http://cr.openjdk.java.net/~rkennke/backport-jdk8/webrev.00/
>
> Spot-checking:
>
> *) src/cpu/aarch64/vm/c1_LIRAssembler_aarch64.cpp
>
> These are not conditional for Shenandoah, do we hit these guarantees with other GCs?
>
>   2032       guarantee(opr2->type() != T_OBJECT && opr2->type() != T_ARRAY, "need acmp barrier?");
>   2033       guarantee(opr1->type() != T_OBJECT && opr1->type() != T_ARRAY, "need acmp barrier?");

Nope, we don't. Should I remove them? FWIW those are the same as we have in jdk9-shenandoah.

> *) src/share/vm/c1/c1_Runtime1.cpp
>
> Bad indent:
>
>   688 Handle h_obj(thread, obj);

Uh, I was reading 'bad intent' and didn't know what you meant ;-) Will fix it before pushing.

> *) src/share/vm/memory/barrierSet.cpp
>
> Why we moved BarrierSet::write_ref_array here? Was that the upstream jdk-9 move?

No, this was moved in jdk9-shenandoah because we made that method virtual.

> *) src/share/vm/runtime/fieldDescriptor.hpp
>
> Another leak from jdk-9?
>
>   101   bool is_stable() const { return access_flags().is_stable(); }

No. We need it in c2 to identify stable fields. (no read-barrier needed...)

> *) src/share/vm/runtime/os.hpp
>
> Leak?
>
>   56 class methodHandle;

No. It's used some lines down as methodHandle*, and we're changing the order of includes, and this is needed for compilation.

> *) src/share/vm/utilities/growableArray.hpp
>
> Leak?
>
>   30 #include "oops/oop.hpp"

No.
We introduced some code that uses oopDesc::is_safe(). Ok to push after fixing the bad intent ;-) Roman From shade at redhat.com Wed Dec 7 19:50:24 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 7 Dec 2016 20:50:24 +0100 Subject: RFR (XL): Backport JDK9 Shenandoah to JDK8u In-Reply-To: <1481116097.2597.34.camel@redhat.com> References: <1481059464.2597.29.camel@redhat.com> <1481116097.2597.34.camel@redhat.com> Message-ID: <5608f8a7-38be-e499-ae9a-d476cd27172a@redhat.com> On 12/07/2016 02:08 PM, Roman Kennke wrote: > Am Mittwoch, den 07.12.2016, 13:12 +0100 schrieb Aleksey Shipilev: >> On 12/06/2016 10:24 PM, Roman Kennke wrote: > Ok to push after fixing the bad intent ;-) Ok then. Thanks, -Aleksey From roman at kennke.org Wed Dec 7 20:03:22 2016 From: roman at kennke.org (roman at kennke.org) Date: Wed, 07 Dec 2016 20:03:22 +0000 Subject: hg: shenandoah/jdk8u/hotspot: Backport JDK9 Shenandoah to JDK8u Message-ID: <201612072003.uB7K3Mm0023185@aojmv0008.oracle.com> Changeset: 87059e2365be Author: rkennke Date: 2016-12-07 21:03 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/87059e2365be Backport JDK9 Shenandoah to JDK8u ! src/cpu/aarch64/vm/aarch64.ad ! src/cpu/aarch64/vm/c1_LIRAssembler_aarch64.cpp ! src/cpu/aarch64/vm/c1_LIRGenerator_aarch64.cpp ! src/cpu/aarch64/vm/c1_MacroAssembler_aarch64.cpp ! src/cpu/aarch64/vm/c1_Runtime1_aarch64.cpp ! src/cpu/aarch64/vm/interp_masm_aarch64.cpp ! src/cpu/aarch64/vm/macroAssembler_aarch64.cpp ! src/cpu/aarch64/vm/macroAssembler_aarch64.hpp ! src/cpu/aarch64/vm/methodHandles_aarch64.cpp ! src/cpu/aarch64/vm/sharedRuntime_aarch64.cpp ! src/cpu/aarch64/vm/shenandoahBarrierSet_aarch64.cpp ! src/cpu/aarch64/vm/stubGenerator_aarch64.cpp ! src/cpu/aarch64/vm/stubRoutines_aarch64.hpp ! src/cpu/aarch64/vm/templateInterpreter_aarch64.cpp ! src/cpu/aarch64/vm/templateTable_aarch64.cpp ! src/cpu/x86/vm/assembler_x86.cpp ! src/cpu/x86/vm/assembler_x86.hpp ! src/cpu/x86/vm/c1_LIRAssembler_x86.cpp ! 
src/cpu/x86/vm/c1_LIRGenerator_x86.cpp ! src/cpu/x86/vm/c1_MacroAssembler_x86.cpp ! src/cpu/x86/vm/c1_Runtime1_x86.cpp ! src/cpu/x86/vm/interp_masm_x86_64.cpp ! src/cpu/x86/vm/macroAssembler_x86.cpp ! src/cpu/x86/vm/macroAssembler_x86.hpp ! src/cpu/x86/vm/sharedRuntime_x86_64.cpp ! src/cpu/x86/vm/shenandoahBarrierSet_x86.cpp ! src/cpu/x86/vm/stubGenerator_x86_64.cpp ! src/cpu/x86/vm/stubRoutines_x86_64.hpp ! src/cpu/x86/vm/templateInterpreter_x86_64.cpp ! src/cpu/x86/vm/templateTable_x86_64.cpp ! src/cpu/x86/vm/x86_64.ad ! src/share/vm/asm/assembler.cpp ! src/share/vm/c1/c1_LIRGenerator.cpp ! src/share/vm/c1/c1_Runtime1.cpp ! src/share/vm/c1/c1_Runtime1.hpp ! src/share/vm/ci/ciInstanceKlass.cpp ! src/share/vm/classfile/classLoaderData.cpp ! src/share/vm/classfile/classLoaderData.hpp ! src/share/vm/classfile/javaClasses.cpp ! src/share/vm/classfile/systemDictionary.cpp ! src/share/vm/code/codeCache.cpp ! src/share/vm/code/nmethod.cpp ! src/share/vm/gc_implementation/g1/g1CollectedHeap.cpp + src/share/vm/gc_implementation/shared/parallelCleaning.cpp + src/share/vm/gc_implementation/shared/parallelCleaning.hpp - src/share/vm/gc_implementation/shenandoah/brooksPointer.cpp ! src/share/vm/gc_implementation/shenandoah/brooksPointer.hpp ! src/share/vm/gc_implementation/shenandoah/brooksPointer.inline.hpp ! src/share/vm/gc_implementation/shenandoah/shenandoahBarrierSet.cpp ! src/share/vm/gc_implementation/shenandoah/shenandoahBarrierSet.hpp ! src/share/vm/gc_implementation/shenandoah/shenandoahBarrierSet.inline.hpp ! src/share/vm/gc_implementation/shenandoah/shenandoahCollectionSet.cpp ! src/share/vm/gc_implementation/shenandoah/shenandoahCollectorPolicy.cpp ! src/share/vm/gc_implementation/shenandoah/shenandoahCollectorPolicy.hpp ! src/share/vm/gc_implementation/shenandoah/shenandoahConcurrentMark.cpp ! src/share/vm/gc_implementation/shenandoah/shenandoahConcurrentMark.hpp ! src/share/vm/gc_implementation/shenandoah/shenandoahConcurrentMark.inline.hpp ! 
src/share/vm/gc_implementation/shenandoah/shenandoahConcurrentThread.cpp ! src/share/vm/gc_implementation/shenandoah/shenandoahConcurrentThread.hpp ! src/share/vm/gc_implementation/shenandoah/shenandoahFreeSet.cpp ! src/share/vm/gc_implementation/shenandoah/shenandoahHeap.cpp ! src/share/vm/gc_implementation/shenandoah/shenandoahHeap.hpp ! src/share/vm/gc_implementation/shenandoah/shenandoahHeap.inline.hpp ! src/share/vm/gc_implementation/shenandoah/shenandoahHeapRegion.cpp ! src/share/vm/gc_implementation/shenandoah/shenandoahHeapRegion.hpp ! src/share/vm/gc_implementation/shenandoah/shenandoahHeapRegion.inline.hpp ! src/share/vm/gc_implementation/shenandoah/shenandoahHeapRegionSet.cpp ! src/share/vm/gc_implementation/shenandoah/shenandoahHeapRegionSet.hpp - src/share/vm/gc_implementation/shenandoah/shenandoahJNICritical.cpp - src/share/vm/gc_implementation/shenandoah/shenandoahJNICritical.hpp + src/share/vm/gc_implementation/shenandoah/shenandoahLogging.hpp ! src/share/vm/gc_implementation/shenandoah/shenandoahMarkCompact.cpp ! src/share/vm/gc_implementation/shenandoah/shenandoahMarkCompact.hpp ! src/share/vm/gc_implementation/shenandoah/shenandoahMonitoringSupport.cpp ! src/share/vm/gc_implementation/shenandoah/shenandoahOopClosures.hpp ! src/share/vm/gc_implementation/shenandoah/shenandoahOopClosures.inline.hpp + src/share/vm/gc_implementation/shenandoah/shenandoahPhaseTimes.cpp + src/share/vm/gc_implementation/shenandoah/shenandoahPhaseTimes.hpp ! src/share/vm/gc_implementation/shenandoah/shenandoahRootProcessor.cpp ! 
src/share/vm/gc_implementation/shenandoah/shenandoahRootProcessor.hpp + src/share/vm/gc_implementation/shenandoah/shenandoahTaskqueue.cpp + src/share/vm/gc_implementation/shenandoah/shenandoahTaskqueue.hpp + src/share/vm/gc_implementation/shenandoah/shenandoahTaskqueue.inline.hpp + src/share/vm/gc_implementation/shenandoah/shenandoahWorkerDataArray.cpp + src/share/vm/gc_implementation/shenandoah/shenandoahWorkerDataArray.hpp + src/share/vm/gc_implementation/shenandoah/shenandoahWorkerDataArray.inline.hpp ! src/share/vm/gc_implementation/shenandoah/shenandoah_globals.cpp ! src/share/vm/gc_implementation/shenandoah/shenandoah_globals.hpp ! src/share/vm/gc_implementation/shenandoah/vm_operations_shenandoah.cpp ! src/share/vm/gc_implementation/shenandoah/vm_operations_shenandoah.hpp ! src/share/vm/gc_interface/collectedHeap.cpp ! src/share/vm/gc_interface/collectedHeap.hpp ! src/share/vm/memory/barrierSet.cpp ! src/share/vm/memory/barrierSet.hpp ! src/share/vm/memory/barrierSet.inline.hpp ! src/share/vm/memory/genMarkSweep.cpp ! src/share/vm/memory/space.inline.hpp ! src/share/vm/oops/instanceKlass.cpp ! src/share/vm/oops/instanceRefKlass.cpp ! src/share/vm/oops/oop.cpp ! src/share/vm/oops/oop.hpp ! src/share/vm/oops/oop.inline.hpp ! src/share/vm/opto/compile.cpp ! src/share/vm/opto/escape.cpp ! src/share/vm/opto/graphKit.cpp ! src/share/vm/opto/macro.cpp ! src/share/vm/opto/memnode.cpp ! src/share/vm/opto/shenandoahSupport.cpp ! src/share/vm/opto/shenandoahSupport.hpp ! src/share/vm/prims/jni.cpp ! src/share/vm/prims/jvm.cpp ! src/share/vm/prims/jvmtiEnv.cpp ! src/share/vm/prims/unsafe.cpp ! src/share/vm/runtime/arguments.cpp ! src/share/vm/runtime/biasedLocking.cpp ! src/share/vm/runtime/deoptimization.cpp ! src/share/vm/runtime/fieldDescriptor.hpp ! src/share/vm/runtime/init.cpp ! src/share/vm/runtime/mutexLocker.cpp ! src/share/vm/runtime/mutexLocker.hpp ! src/share/vm/runtime/objectMonitor.cpp ! src/share/vm/runtime/objectMonitor.hpp ! 
src/share/vm/runtime/os.hpp ! src/share/vm/runtime/safepoint.cpp ! src/share/vm/runtime/sharedRuntime.cpp ! src/share/vm/runtime/stubRoutines.cpp ! src/share/vm/runtime/stubRoutines.hpp ! src/share/vm/runtime/synchronizer.cpp ! src/share/vm/runtime/synchronizer.hpp ! src/share/vm/runtime/thread.cpp ! src/share/vm/services/attachListener.cpp ! src/share/vm/services/diagnosticCommand.cpp ! src/share/vm/services/heapDumper.cpp ! src/share/vm/services/threadService.cpp ! src/share/vm/utilities/growableArray.hpp ! src/share/vm/utilities/taskqueue.hpp ! test/TEST.groups From rkennke at redhat.com Thu Dec 8 11:00:16 2016 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 08 Dec 2016 12:00:16 +0100 Subject: RFR: C1 cleanup Message-ID: <1481194816.2597.39.camel@redhat.com> This is a cleanup of C1 related code: - Removed tmp1 and tmp2 from the ShenandoahWriteBarrier op (currently not needed) - Removed unused includes - Several whitespace fixes to make code as close as possible to upstream - Removed shenandoah_write_barrier_slow_id stub. we now use the shared WB stub http://cr.openjdk.java.net/~rkennke/c1-cleanup/webrev.00/ Ok? Roman From shade at redhat.com Thu Dec 8 11:05:46 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 8 Dec 2016 12:05:46 +0100 Subject: RFR: C1 cleanup In-Reply-To: <1481194816.2597.39.camel@redhat.com> References: <1481194816.2597.39.camel@redhat.com> Message-ID: <4c504e92-e14d-0e9d-2db7-91259182b88d@redhat.com> On 12/08/2016 12:00 PM, Roman Kennke wrote: > http://cr.openjdk.java.net/~rkennke/c1-cleanup/webrev.00/ *) Shouldn't this assert be on lir_cas_long branch only? Or is it in upstream that (odd) way? 
1600 void LIR_Assembler::emit_compare_and_swap(LIR_OpCompareAndSwap* op) { 1601 assert(VM_Version::supports_cx8(), "wrong machine"); *) Please break this line: 1461 LIR_OpShenandoahWriteBarrier(LIR_Opr obj, LIR_Opr result, CodeEmitInfo* info, bool need_null_check) : LIR_Op1(lir_shenandoah_wb, obj, result, T_OBJECT, lir_patch_none, info), _need_null_check(need_null_check) { Otherwise looks good. Thanks, -Aleksey From roman at kennke.org Thu Dec 8 11:13:00 2016 From: roman at kennke.org (roman at kennke.org) Date: Thu, 08 Dec 2016 11:13:00 +0000 Subject: hg: shenandoah/jdk9/hotspot: C1 cleanup Message-ID: <201612081113.uB8BD0I1029974@aojmv0008.oracle.com> Changeset: e4acea31c079 Author: rkennke Date: 2016-12-08 12:12 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/e4acea31c079 C1 cleanup ! src/cpu/aarch64/vm/c1_LIRAssembler_aarch64.cpp ! src/cpu/aarch64/vm/c1_LIRGenerator_aarch64.cpp ! src/cpu/aarch64/vm/c1_MacroAssembler_aarch64.cpp ! src/cpu/x86/vm/c1_LIRAssembler_x86.cpp ! src/cpu/x86/vm/c1_Runtime1_x86.cpp ! src/share/vm/c1/c1_LIR.cpp ! src/share/vm/c1/c1_LIR.hpp ! src/share/vm/c1/c1_LIRGenerator.cpp ! src/share/vm/c1/c1_Runtime1.cpp ! src/share/vm/c1/c1_Runtime1.hpp From rkennke at redhat.com Thu Dec 8 11:13:23 2016 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 08 Dec 2016 12:13:23 +0100 Subject: RFR: C1 cleanup In-Reply-To: <4c504e92-e14d-0e9d-2db7-91259182b88d@redhat.com> References: <1481194816.2597.39.camel@redhat.com> <4c504e92-e14d-0e9d-2db7-91259182b88d@redhat.com> Message-ID: <1481195603.2597.40.camel@redhat.com> Am Donnerstag, den 08.12.2016, 12:05 +0100 schrieb Aleksey Shipilev: > On 12/08/2016 12:00 PM, Roman Kennke wrote: > > http://cr.openjdk.java.net/~rkennke/c1-cleanup/webrev.00/ > > *) Shouldn't this assert be on lir_cas_long branch only? Or is it in > upstream > that (odd) way? 
> > 1600 void LIR_Assembler::emit_compare_and_swap(LIR_OpCompareAndSwap* op) {
> > 1601   assert(VM_Version::supports_cx8(), "wrong machine");
It's in upstream like this.
> *) Please break this line:
> 1461   LIR_OpShenandoahWriteBarrier(LIR_Opr obj, LIR_Opr result, CodeEmitInfo* info, bool need_null_check) : LIR_Op1(lir_shenandoah_wb, obj, result, T_OBJECT, lir_patch_none, info), _need_null_check(need_null_check) {
>
> Otherwise looks good.
Ok, pushed with that line broken in half. Roman
From shade at redhat.com Thu Dec 8 13:25:03 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 8 Dec 2016 14:25:03 +0100 Subject: RFR (S): Fix shutdown/cancelled races Message-ID:
Hi,
The recent change for early cancellation introduced/exposed a few interesting races in shutdown/cancellation sequence.
First race is on shutdown, and goes like this:
 a) SHHeap::stop() is called.
 b) SHHeap::stop() sets cancelled_gc to "true"
 c) SHConcThread loop detects canceled GC, and tries to exit
 d) SHConcThread fails, because neither full GC nor "terminate" is set
    assert (_do_full_gc || should_terminate(), "Either exiting, or impending Full GC");
 e) SHHeap::stop() eventually calls SHConcThread::stop() to set "terminate", but it is too late.
Fixed by introducing the "graceful shutdown" flag.
Second race is between canceling GC and scheduling a full GC. Goes like this:
 a) ShenandoahHeap::collect() cancels GC
 b) SHConcThread loop detects canceled GC, and tries to exit
 c) SHConcThread fails, because neither full GC nor "terminate" is set
    assert (_do_full_gc || should_terminate(), "Either exiting, or impending Full GC");
 d) ShenandoahHeap::collect() eventually calls into do_full_gc() to set _do_full_gc, but it is too late.
Solved by moving GC cancellation within the do_full_gc method, and canceling after Full GC is scheduled.
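The flag-ordering idea behind the fix can be sketched in isolation. This is a simplified, hypothetical model, not the actual HotSpot code; all names (`ConcThreadModel`, `may_exit_on_cancel`, the `graceful_shutdown` flag) are illustrative, following only the description above:

```cpp
#include <atomic>
#include <cassert>

// Simplified model of the shutdown race: the concurrent GC thread may
// observe a cancelled GC before either the full-GC flag or the terminate
// flag is set, which is exactly when the assert fires. A separate
// "graceful shutdown" flag, raised *before* the GC is cancelled, gives
// the loop an unambiguous reason to exit.
struct ConcThreadModel {
  std::atomic<bool> cancelled_gc{false};
  std::atomic<bool> do_full_gc{false};
  std::atomic<bool> terminate{false};
  std::atomic<bool> graceful_shutdown{false};

  // Models the assert: on a cancelled GC, the loop may only exit if it
  // can tell why the cancellation happened.
  bool may_exit_on_cancel() const {
    return do_full_gc.load() || terminate.load() || graceful_shutdown.load();
  }

  // Racy order (old code): cancel first, explain later. Between the two
  // stores, may_exit_on_cancel() is false and the assert would fire.
  void stop_racy() {
    cancelled_gc.store(true);
    // ... concurrent thread loop may run here and fail the assert ...
    terminate.store(true);
  }

  // Fixed order: announce the shutdown before cancelling the GC.
  void stop_graceful() {
    graceful_shutdown.store(true);
    cancelled_gc.store(true);
    terminate.store(true);
  }
};
```

With `stop_graceful()`, the window between cancelling the GC and setting `terminate` becomes harmless, because `graceful_shutdown` already justifies the exit; the same ordering idea (schedule the Full GC first, cancel afterwards) resolves the second race.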
Both fixes: http://cr.openjdk.java.net/~shade/shenandoah/cancel-races/webrev.01/ Testing: hs_gc_shenandoah (with sleeps in critical places to exacerbate races), jcstress (tests-all) that was failing before. Note that in last week's code both races could have tried to start concurrent mark, or dived to sleep for 10ms, before SHConcThread could not detect it was stopped. It would have exited early by detecting the canceled GC. New code checks that early before doing the GC cycle, in case we slip like that again. Thanks, -Aleksey From rwestrel at redhat.com Thu Dec 8 14:15:02 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 08 Dec 2016 15:15:02 +0100 Subject: Enable optimization of arraycopy as loads/stores with Shenandoah Message-ID: http://cr.openjdk.java.net/~roland/shenandoah/arraycopy/webrev.00/ This re-enables an optimization that was disabled with shenandoah. Roland. From shade at redhat.com Thu Dec 8 14:25:31 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 8 Dec 2016 15:25:31 +0100 Subject: Enable optimization of arraycopy as loads/stores with Shenandoah In-Reply-To: References: Message-ID: On 12/08/2016 03:15 PM, Roland Westrelin wrote: > http://cr.openjdk.java.net/~roland/shenandoah/arraycopy/webrev.00/ *) (sirens sound, line breaking police storming in) Break this line :) 1076 call = make_leaf_call(c, m, OptoRuntime::shenandoah_clone_barrier_Type(), CAST_FROM_FN_PTR(address, SharedRuntime::shenandoah_clone_barrier), "shenandoah_clone_barrier", raw_adr_type, dest->in(AddPNode::Base)); Roman would probably do a more thorough review of this compiler change. 
Thanks, -Aleksey From rkennke at redhat.com Thu Dec 8 14:55:05 2016 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 08 Dec 2016 15:55:05 +0100 Subject: Enable optimization of arraycopy as loads/stores with Shenandoah In-Reply-To: References: Message-ID: <1481208905.2597.45.camel@redhat.com> Am Donnerstag, den 08.12.2016, 15:15 +0100 schrieb Roland Westrelin: > http://cr.openjdk.java.net/~roland/shenandoah/arraycopy/webrev.00/ > > This re-enables an optimization that was disabled with shenandoah. Cool! I like that! Do we have any idea if it does improve performance? That would be arraycopy on smallish arrays only right? Aleksey? This removes the call to SharedRuntime::shenandoah_clone_barrier(). You should also remove that method. I find references of it in : src/share/vm/opto/runtime.cpp src/share/vm/opto/runtime.hpp src/share/vm/opto/escape.cpp src/share/vm/runtime/sharedRuntime.hpp src/share/vm/runtime/sharedRuntime.cpp :-) Also, we really need to trim down shenandoah-specific changes in c2. My idea is to move everything that's more than 2 lines to shenandoahSupport.cpp and have other code in C2 call that. I wanted to do that for the GraphKit::shenandoah_XYZ_barrier() methods, but we seem to be getting more of such stuff :-) That's for another patch though. Roman From rkennke at redhat.com Thu Dec 8 14:57:47 2016 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 08 Dec 2016 15:57:47 +0100 Subject: RFR (S): Fix shutdown/cancelled races In-Reply-To: References: Message-ID: <1481209067.2597.46.camel@redhat.com> Patch looks good. Did you need to change any of the tests? E.g. "with sleeps in critical places to exacerbate races" ?? I can't tell you how often we have 'fixed' this code before... having a test triggering on the bug would be awesome! Roman Am Donnerstag, den 08.12.2016, 14:25 +0100 schrieb Aleksey Shipilev: > Hi, > > The recent change for early cancellation introduced/exposed a few > interesting > races in shutdown/cancellation sequence. 
> > First race is on shutdown, and goes like this:
> a) SHHeap::stop() is called.
> b) SHHeap::stop() sets cancelled_gc to "true"
> c) SHConcThread loop detects canceled GC, and tries to exit
> d) SHConcThread fails, because neither full GC nor "terminate" is set
>    assert (_do_full_gc || should_terminate(), "Either exiting, or impending Full GC");
> e) SHHeap::stop() eventually calls SHConcThread::stop() to set "terminate", but it is too late.
>
> Fixed by introducing the "graceful shutdown" flag.
>
> Second race is between canceling GC and scheduling a full GC. Goes like this:
> a) ShenandoahHeap::collect() cancels GC
> b) SHConcThread loop detects canceled GC, and tries to exit
> c) SHConcThread fails, because neither full GC nor "terminate" is set
>    assert (_do_full_gc || should_terminate(), "Either exiting, or impending Full GC");
> d) ShenandoahHeap::collect() eventually calls into do_full_gc() to set _do_full_gc, but it is too late.
>
> Solved by moving GC cancellation within the do_full_gc method, and canceling after Full GC is scheduled.
>
> Both fixes:
> http://cr.openjdk.java.net/~shade/shenandoah/cancel-races/webrev.01/
>
> Testing: hs_gc_shenandoah (with sleeps in critical places to exacerbate races), jcstress (tests-all) that was failing before.
>
> Note that in last week's code both races could have tried to start concurrent mark, or dived to sleep for 10ms, before SHConcThread could not detect it was stopped. It would have exited early by detecting the canceled GC. New code checks that early before doing the GC cycle, in case we slip like that again.
>
> Thanks,
> -Aleksey
From ashipile at redhat.com Thu Dec 8 15:36:37 2016 From: ashipile at redhat.com (ashipile at redhat.com) Date: Thu, 08 Dec 2016 15:36:37 +0000 Subject: hg: shenandoah/jdk9/hotspot: Fix shutdown/cancelled races.
Message-ID: <201612081536.uB8FabTM018219@aojmv0008.oracle.com> Changeset: 36b281f64016 Author: shade Date: 2016-12-08 16:36 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/36b281f64016 Fix shutdown/cancelled races. ! src/share/vm/gc/shenandoah/shenandoahConcurrentThread.cpp ! src/share/vm/gc/shenandoah/shenandoahConcurrentThread.hpp ! src/share/vm/gc/shenandoah/shenandoahHeap.cpp ! src/share/vm/gc/shenandoah/shenandoahHeap.hpp From shade at redhat.com Thu Dec 8 15:37:40 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 8 Dec 2016 16:37:40 +0100 Subject: RFR (S): Fix shutdown/cancelled races In-Reply-To: <1481209067.2597.46.camel@redhat.com> References: <1481209067.2597.46.camel@redhat.com> Message-ID: <167226b6-3d00-d0c9-c658-e3972f430351@redhat.com> On 12/08/2016 03:57 PM, Roman Kennke wrote: > Patch looks good. Thanks, pushed. > Did you need to change any of the tests? E.g. "with sleeps in critical > places to exacerbate races" ?? I had to put it right at Shenandoah product code to trigger, so not really committable... > I can't tell you how often we have > 'fixed' this code before... having a test triggering on the bug would > be awesome! The "regular" jcstress testing found the races because of new asserts, so I guess we somewhat covered there. Thanks, -Aleksey From shade at redhat.com Thu Dec 8 15:38:27 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 8 Dec 2016 16:38:27 +0100 Subject: Enable optimization of arraycopy as loads/stores with Shenandoah In-Reply-To: <1481208905.2597.45.camel@redhat.com> References: <1481208905.2597.45.camel@redhat.com> Message-ID: <7686428e-154f-4504-afbb-d2272c74633a@redhat.com> On 12/08/2016 03:55 PM, Roman Kennke wrote: > Am Donnerstag, den 08.12.2016, 15:15 +0100 schrieb Roland Westrelin: >> http://cr.openjdk.java.net/~roland/shenandoah/arraycopy/webrev.00/ >> >> This re-enables an optimization that was disabled with shenandoah. > > Cool! I like that! 
> > Do we have any idea if it does improve performance? That would be arraycopy on smallish arrays only right? Aleksey?
Let me find the arraycopy tests (that I swear I did in OpenJDK for the previous Roland's non-Shenandoah patch :) and run them. Thanks, -Aleksey
From rwestrel at redhat.com Thu Dec 8 15:55:12 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 08 Dec 2016 16:55:12 +0100 Subject: Enable optimization of arraycopy as loads/stores with Shenandoah In-Reply-To: <1481208905.2597.45.camel@redhat.com> References: <1481208905.2597.45.camel@redhat.com> Message-ID:
> This removes the call to SharedRuntime::shenandoah_clone_barrier().
It doesn't remove it. It moves it around. It's now only added to those clones that were not converted to loads/stores. Clone when it's not converted to loads/stores uses bulk copies. So it's not obvious to me that we can do better than using the SharedRuntime::shenandoah_clone_barrier() call.
I also added a test > for > any object fields so the call to > SharedRuntime::shenandoah_clone_barrier() is not emitted when it's > obviously not needed. Ah. Oops, my bad :-) Ok to push then. Roman From rwestrel at redhat.com Thu Dec 8 16:01:59 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 08 Dec 2016 17:01:59 +0100 Subject: backport of jdk9 c2 code to 8 Message-ID: http://cr.openjdk.java.net/~roland/shenandoah/jdk9-backport/webrev.00/ I had to import some non shenandoah changes from jdk9 to make my life easier. Roland. From rkennke at redhat.com Thu Dec 8 16:08:04 2016 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 08 Dec 2016 17:08:04 +0100 Subject: Enable optimization of arraycopy as loads/stores with Shenandoah In-Reply-To: References: <1481208905.2597.45.camel@redhat.com> Message-ID: <1481213284.2597.50.camel@redhat.com> Am Donnerstag, den 08.12.2016, 16:55 +0100 schrieb Roland Westrelin: > > This removes the call to SharedRuntime::shenandoah_clone_barrier(). > > It doesn't remove it. It moves it around. It's now only added to > those > clones that were not converted to loads/stores. Clone when it's not > converted to loads/stores uses bulk copies. So it's not obvious to me > that we can do better than using the > SharedRuntime::shenandoah_clone_barrier() call. Now that you say it, I wonder what's done for other GCs. They must be doing something here, to update the card tables. Other arraycopy routines call a special barrier in BarrierSet::static_write_ref_array_post(), this is not suitable for clones, but do they call any barrier for clone too? Or can other GCs ignore it because it's basically initializing stores? 
Roman From rkennke at redhat.com Thu Dec 8 16:09:16 2016 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 08 Dec 2016 17:09:16 +0100 Subject: backport of jdk9 c2 code to 8 In-Reply-To: References: Message-ID: <1481213356.2597.51.camel@redhat.com> Am Donnerstag, den 08.12.2016, 17:01 +0100 schrieb Roland Westrelin: > http://cr.openjdk.java.net/~roland/shenandoah/jdk9-backport/webrev.00 > / > > I had to import some non shenandoah changes from jdk9 to make my life > easier. Looks good to me. Roman From rwestrel at redhat.com Thu Dec 8 16:14:35 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 8 Dec 2016 17:14:35 +0100 Subject: Enable optimization of arraycopy as loads/stores with Shenandoah In-Reply-To: <1481213284.2597.50.camel@redhat.com> References: <1481208905.2597.45.camel@redhat.com> <1481213284.2597.50.camel@redhat.com> Message-ID: <50ca7cab-72a9-5a8d-d1fc-de39279dbffe@redhat.com> > Now that you say it, I wonder what's done for other GCs. They must be > doing something here, to update the card tables. Other arraycopy > routines call a special barrier in > BarrierSet::static_write_ref_array_post(), this is not suitable for > clones, but do they call any barrier for clone too? Or can other GCs > ignore it because it's basically initializing stores? For clone, unless ReduceInitialCardMarks is false, nothing is done AFAICT. Roland. From rkennke at redhat.com Thu Dec 8 16:24:25 2016 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 08 Dec 2016 17:24:25 +0100 Subject: Enable optimization of arraycopy as loads/stores with Shenandoah In-Reply-To: <50ca7cab-72a9-5a8d-d1fc-de39279dbffe@redhat.com> References: <1481208905.2597.45.camel@redhat.com> <1481213284.2597.50.camel@redhat.com> <50ca7cab-72a9-5a8d-d1fc-de39279dbffe@redhat.com> Message-ID: <1481214265.2597.52.camel@redhat.com> Am Donnerstag, den 08.12.2016, 17:14 +0100 schrieb Roland Westrelin: > > Now that you say it, I wonder what's done for other GCs. 
They must > > be > > doing something here, to update the card tables. Other arraycopy > > routines call a special barrier in > > BarrierSet::static_write_ref_array_post(), this is not suitable for > > clones, but do they call any barrier for clone too? Or can other > > GCs > > ignore it because it's basically initializing stores? > > For clone, unless ReduceInitialCardMarks is false, nothing is done > AFAICT. Ok. And what happens when ReduceInitialCardMarks is false? Because this might be what we need. Roman From shade at redhat.com Thu Dec 8 16:37:44 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 8 Dec 2016 17:37:44 +0100 Subject: Enable optimization of arraycopy as loads/stores with Shenandoah In-Reply-To: <7686428e-154f-4504-afbb-d2272c74633a@redhat.com> References: <1481208905.2597.45.camel@redhat.com> <7686428e-154f-4504-afbb-d2272c74633a@redhat.com> Message-ID: <262d5e03-a5b3-43a0-5272-3138fc3da291@redhat.com> On 12/08/2016 04:38 PM, Aleksey Shipilev wrote: > On 12/08/2016 03:55 PM, Roman Kennke wrote: >> Am Donnerstag, den 08.12.2016, 15:15 +0100 schrieb Roland Westrelin: >>> http://cr.openjdk.java.net/~roland/shenandoah/arraycopy/webrev.00/ >>> >>> This re-enables an optimization that was disabled with shenandoah. >> >> Cool! I like that! >> >> Do we have any idea if it does improve performance? That would be arraycopy >> on smallish arrays only right? Aleksey? > > Let me find the arraycopy tests (that I swear I did in OpenJDK for the > previous Roland's non-Shenandoah patch :) and run then. Using this test: http://icedtea.classpath.org/people/shade/gc-bench/file/6d332199876c/src/main/java/org/openjdk/gcbench/runtime/arraycopy/RefArray.java === baseline Benchmark Mode Cnt Score Error Units RefArray.nulls_01 avgt 5 3.987 ? 1.282 ns/op RefArray.nulls_02 avgt 5 4.185 ? 0.145 ns/op RefArray.nulls_04 avgt 5 5.022 ? 0.601 ns/op RefArray.nulls_08 avgt 5 6.421 ? 0.252 ns/op RefArray.nulls_16 avgt 5 8.344 ? 1.012 ns/op RefArray.nulls_32 avgt 5 14.646 ? 
1.486 ns/op
RefArray.nulls_64  avgt  5  28.125 ± 3.523  ns/op
RefArray.objs_01   avgt  5   3.905 ± 0.131  ns/op
RefArray.objs_02   avgt  5   4.267 ± 0.332  ns/op
RefArray.objs_04   avgt  5   4.838 ± 0.064  ns/op
RefArray.objs_08   avgt  5   6.459 ± 0.187  ns/op
RefArray.objs_16   avgt  5   8.610 ± 1.526  ns/op
RefArray.objs_32   avgt  5  14.269 ± 0.536  ns/op
RefArray.objs_64   avgt  5  27.225 ± 0.405  ns/op

=== baseline +UseShenandoahGC
Benchmark          Mode  Cnt   Score   Error  Units
RefArray.nulls_01  avgt  5  16.021 ± 0.379  ns/op
RefArray.nulls_02  avgt  5  15.997 ± 0.137  ns/op
RefArray.nulls_04  avgt  5  16.560 ± 0.342  ns/op
RefArray.nulls_08  avgt  5  16.103 ± 0.070  ns/op
RefArray.nulls_16  avgt  5  17.060 ± 0.285  ns/op
RefArray.nulls_32  avgt  5  18.654 ± 0.092  ns/op
RefArray.nulls_64  avgt  5  30.848 ± 0.948  ns/op
RefArray.objs_01   avgt  5  15.941 ± 0.015  ns/op
RefArray.objs_02   avgt  5  15.953 ± 0.041  ns/op
RefArray.objs_04   avgt  5  16.514 ± 0.059  ns/op
RefArray.objs_08   avgt  5  16.122 ± 0.032  ns/op
RefArray.objs_16   avgt  5  17.110 ± 0.146  ns/op
RefArray.objs_32   avgt  5  19.304 ± 0.622  ns/op
RefArray.objs_64   avgt  5  31.025 ± 0.806  ns/op

=== patched +UseShenandoahGC
Benchmark          Mode  Cnt   Score   Error  Units
RefArray.nulls_01  avgt  5   5.110 ± 0.033  ns/op
RefArray.nulls_02  avgt  5   5.293 ± 0.019  ns/op
RefArray.nulls_04  avgt  5   6.903 ± 0.065  ns/op
RefArray.nulls_08  avgt  5   9.627 ± 0.043  ns/op
RefArray.nulls_16  avgt  5  17.016 ± 0.134  ns/op
RefArray.nulls_32  avgt  5  19.466 ± 2.545  ns/op
RefArray.nulls_64  avgt  5  30.659 ± 0.147  ns/op
RefArray.objs_01   avgt  5   5.171 ± 0.106  ns/op
RefArray.objs_02   avgt  5   5.827 ± 0.013  ns/op
RefArray.objs_04   avgt  5   7.377 ± 0.046  ns/op
RefArray.objs_08   avgt  5   9.353 ± 0.099  ns/op
RefArray.objs_16   avgt  5  17.097 ± 0.434  ns/op
RefArray.objs_32   avgt  5  19.212 ± 0.792  ns/op
RefArray.objs_64   avgt  5  30.818 ± 0.301  ns/op

Good to go. I guess the code quality might be a teeny little better (we've seen this before with null-paths in read barriers being thrown out), but I'll take that too.

 0.82%  1.21%
0x00007f2451477ea1: mov    0x10(%rcx),%r10d
 1.17%  1.04%   0x00007f2451477ea5: test   %r10d,%r10d
                0x00007f2451477ea8: je     0x00007f2451477f01
 2.23%  2.78%   0x00007f2451477eaa: mov    -0x8(%r12,%r10,8),%r10
10.65% 13.44%   0x00007f2451477eaf: mov    %r10,%r11
 0.14%  0.19%   0x00007f2451477eb2: shr    $0x3,%r11
 2.68%  3.36%   0x00007f2451477eb6: mov    %r11d,0x10(%rdx)
 2.46%  3.38%   0x00007f2451477eba: mov    0x14(%rcx),%r11d
 0.05%          0x00007f2451477ebe: test   %r11d,%r11d
                0x00007f2451477ec1: je     0x00007f2451477f06
 0.12%  0.17%   0x00007f2451477ec3: mov    -0x8(%r12,%r11,8),%r10
 0.64%  1.01%   0x00007f2451477ec8: mov    %r10,%r11
 2.42%  2.35%   0x00007f2451477ecb: shr    $0x3,%r11
 0.34%  0.41%   0x00007f2451477ecf: mov    %r11d,0x14(%rdx)
 1.27%  1.26%   0x00007f2451477ed3: mov    0x18(%rcx),%r11d
 0.10%  0.09%   0x00007f2451477ed7: test   %r11d,%r11d
                0x00007f2451477eda: je     0x00007f2451477f0b
 1.77%  1.35%   0x00007f2451477edc: mov    -0x8(%r12,%r11,8),%r10
 0.36%  0.51%   0x00007f2451477ee1: mov    %r10,%r11
 1.10%  1.33%   0x00007f2451477ee4: shr    $0x3,%r11
 0.24%  0.19%   0x00007f2451477ee8: mov    %r11d,0x18(%rdx)
 1.77%  1.36%   0x00007f2451477eec: mov    0x1c(%rcx),%r11d
 0.02%  0.03%   0x00007f2451477ef0: test   %r11d,%r11d
                0x00007f2451477ef3: jne    0x00007f2451477dd1
                0x00007f2451477ef9: xor    %r10,%r10
                0x00007f2451477efc: jmpq   0x00007f2451477dda
                0x00007f2451477f01: xor    %r11,%r11
                0x00007f2451477f04: jmp    0x00007f2451477eb6
                0x00007f2451477f06: xor    %r11,%r11
                0x00007f2451477f09: jmp    0x00007f2451477ecf
                0x00007f2451477f0b: xor    %r11,%r11
0x00007f2451477f0e: jmp 0x00007f2451477ee8 -Aleksey From rwestrel at redhat.com Thu Dec 8 16:41:23 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 08 Dec 2016 17:41:23 +0100 Subject: Enable optimization of arraycopy as loads/stores with Shenandoah In-Reply-To: <1481214265.2597.52.camel@redhat.com> References: <1481208905.2597.45.camel@redhat.com> <1481213284.2597.50.camel@redhat.com> <50ca7cab-72a9-5a8d-d1fc-de39279dbffe@redhat.com> <1481214265.2597.52.camel@redhat.com> Message-ID: > Ok. And what happens when ReduceInitialCardMarks is false? Because this > might be what we need. For instance clone: post_barrier(control(), memory(raw_adr_type), alloc_obj, no_particular_field, raw_adr_idx, no_particular_value, T_OBJECT, false); void GraphKit::post_barrier(Node* ctl, Node* store, Node* obj, Node* adr, uint adr_idx, Node* val, BasicType bt, bool use_precise) { BarrierSet* bs = Universe::heap()->barrier_set(); set_control(ctl); switch (bs->kind()) { case BarrierSet::G1SATBCTLogging: g1_write_barrier_post(store, obj, adr, adr_idx, val, bt, use_precise); break; case BarrierSet::CardTableForRS: case BarrierSet::CardTableExtension: write_barrier_post(store, obj, adr, adr_idx, val, use_precise); break; case BarrierSet::ModRef: case BarrierSet::ShenandoahBarrierSet: break; default : ShouldNotReachHere(); } } For array clone, if I follow the logic correctly arrayof_oop_disjoint_arraycopy stub. The shenandoah clone barrier is a no-op unless ShenandoahBarrierSet::need_update_refs_barrier() is true. If it's false often enough, then it seems a reasonable trade off to do the bulk copy and have an extra call. Roland. From roman at kennke.org Thu Dec 8 16:48:50 2016 From: roman at kennke.org (roman at kennke.org) Date: Thu, 08 Dec 2016 16:48:50 +0000 Subject: hg: shenandoah/jdk8u/hotspot: Added dummy arg consumer to pseudo-logging code to be able to build release. 
Message-ID: <201612081648.uB8Gmol0014349@aojmv0008.oracle.com> Changeset: db98996d26b2 Author: rkennke Date: 2016-12-08 17:48 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/db98996d26b2 Added dummy arg consumer to pseudo-logging code to be able to build release. ! src/share/vm/gc_implementation/shenandoah/shenandoahLogging.hpp
From rkennke at redhat.com Thu Dec 8 16:52:05 2016 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 08 Dec 2016 17:52:05 +0100 Subject: Enable optimization of arraycopy as loads/stores with Shenandoah In-Reply-To: References: <1481208905.2597.45.camel@redhat.com> <1481213284.2597.50.camel@redhat.com> <50ca7cab-72a9-5a8d-d1fc-de39279dbffe@redhat.com> <1481214265.2597.52.camel@redhat.com> Message-ID: <1481215925.2597.54.camel@redhat.com>
Am Donnerstag, den 08.12.2016, 17:41 +0100 schrieb Roland Westrelin:
> > Ok. And what happens when ReduceInitialCardMarks is false? Because this might be what we need.
>
> For instance clone:
>
>     post_barrier(control(),
>                  memory(raw_adr_type),
>                  alloc_obj,
>                  no_particular_field,
>                  raw_adr_idx,
>                  no_particular_value,
>                  T_OBJECT,
>                  false);
>
> void GraphKit::post_barrier(Node* ctl,
>                             Node* store,
>                             Node* obj,
>                             Node* adr,
>                             uint  adr_idx,
>                             Node* val,
>                             BasicType bt,
>                             bool use_precise) {
>   BarrierSet* bs = Universe::heap()->barrier_set();
>   set_control(ctl);
>   
switch (bs->kind()) { > ????case BarrierSet::G1SATBCTLogging: > ??????g1_write_barrier_post(store, obj, adr, adr_idx, val, bt, > use_precise); > ??????break; > > ????case BarrierSet::CardTableForRS: > ????case BarrierSet::CardTableExtension: > ??????write_barrier_post(store, obj, adr, adr_idx, val, use_precise); > ??????break; > > ????case BarrierSet::ModRef: > ????case BarrierSet::ShenandoahBarrierSet: > ??????break; > > ????default??????: > ??????ShouldNotReachHere(); > > ? } > } > > For array clone, if I follow the logic correctly > arrayof_oop_disjoint_arraycopy stub. > > The shenandoah clone barrier is a no-op unless > ShenandoahBarrierSet::need_update_refs_barrier() is true. If it's > false > often enough, then it seems a reasonable trade off to do the bulk > copy > and have an extra call. Ok. I know I went through this a while ago, but needed a refresher ;-) Thanks, Roman From rkennke at redhat.com Thu Dec 8 16:54:52 2016 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 08 Dec 2016 17:54:52 +0100 Subject: RFR (XL): Backport JDK9 Shenandoah to JDK8u In-Reply-To: <1481059464.2597.29.camel@redhat.com> References: <1481059464.2597.29.camel@redhat.com> Message-ID: <1481216092.2597.56.camel@redhat.com> I just pushed this little attendum without review: http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/db98996d26b2 It consumes the arguments for our dummy-logging in release builds. Same hack as in JDK9 logging. Roman Am Dienstag, den 06.12.2016, 22:24 +0100 schrieb Roman Kennke: > This huge change backports the current state of JDK9 (minus the last > bunch of patches) to jdk8u: > > http://cr.openjdk.java.net/~rkennke/backport-jdk8/webrev.00/ > > Not sure if this can be reasonably reviewed. ;-) > > I checked this line by line and also compared it to our baseline repo > ( > http://hg.openjdk.java.net/aarch64-port/jdk8u/hotspot/). > > The one thing missing is changes in src/share/vm/opto and > src/share/vm/adlc, but Roland is working on those. 
>
> I've checked with SPECjvm2008 and jcstress is on the way.
> Unfortunately, I could not get the jtreg stuff to work.
>
> Ok to go in?
>
> Roman

From rkennke at redhat.com Thu Dec 8 17:08:48 2016
From: rkennke at redhat.com (Roman Kennke)
Date: Thu, 08 Dec 2016 18:08:48 +0100
Subject: RFR: More cleanup
Message-ID: <1481216928.2597.59.camel@redhat.com>

This removes some more unnecessary diffs between jdk9 baseline and shenandoah:

http://cr.openjdk.java.net/~rkennke/cleanup/webrev.00/

There's not much of significance here, except that it brings our repo closer to our baseline. The two most interesting ones:

- In ThreadLocalAllocBuffer, we added 1 extra word for the brooks ptr to the end reserve. I believe this was added a long time ago, and probably for the bug we resolved yesterday :-) In any case, it's not needed.

- In JVM_Clone, we're doing a read-barrier when sticking an oop into a Handle. However, there's no guarantee, when we're crossing a safepoint there, that the oop is still valid for reading. Barriers should therefore always be done after pulling the oop out of the Handle. Done with this patch.

Ok?

Roman

From rwestrel at redhat.com Fri Dec 9 08:30:48 2016
From: rwestrel at redhat.com (rwestrel at redhat.com)
Date: Fri, 09 Dec 2016 08:30:48 +0000
Subject: hg: shenandoah/jdk8u/hotspot: backport shenandoah C2 support from jdk9
Message-ID: <201612090830.uB98UmSx026466@aojmv0008.oracle.com>

Changeset: da17b9cffd4f Author: roland Date: 2016-12-08 13:28 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/da17b9cffd4f

backport shenandoah C2 support from jdk9

! src/share/vm/adlc/formssel.cpp ! src/share/vm/adlc/formssel.hpp ! src/share/vm/adlc/output_c.cpp ! src/share/vm/adlc/output_h.cpp ! src/share/vm/ci/ciInstanceKlass.cpp ! src/share/vm/ci/ciInstanceKlass.hpp ! src/share/vm/opto/addnode.cpp ! src/share/vm/opto/callGenerator.cpp ! src/share/vm/opto/callnode.cpp ! src/share/vm/opto/callnode.hpp ! src/share/vm/opto/cfgnode.cpp !
src/share/vm/opto/compile.cpp ! src/share/vm/opto/compile.hpp ! src/share/vm/opto/connode.cpp ! src/share/vm/opto/connode.hpp ! src/share/vm/opto/escape.cpp ! src/share/vm/opto/gcm.cpp ! src/share/vm/opto/graphKit.cpp ! src/share/vm/opto/graphKit.hpp ! src/share/vm/opto/lcm.cpp ! src/share/vm/opto/library_call.cpp ! src/share/vm/opto/loopPredicate.cpp ! src/share/vm/opto/loopTransform.cpp ! src/share/vm/opto/loopnode.cpp ! src/share/vm/opto/loopnode.hpp ! src/share/vm/opto/loopopts.cpp ! src/share/vm/opto/machnode.cpp ! src/share/vm/opto/machnode.hpp ! src/share/vm/opto/macro.cpp ! src/share/vm/opto/matcher.cpp ! src/share/vm/opto/matcher.hpp ! src/share/vm/opto/memnode.cpp ! src/share/vm/opto/memnode.hpp ! src/share/vm/opto/multnode.cpp ! src/share/vm/opto/multnode.hpp ! src/share/vm/opto/node.cpp ! src/share/vm/opto/node.hpp ! src/share/vm/opto/parse2.cpp ! src/share/vm/opto/parse3.cpp ! src/share/vm/opto/phaseX.cpp ! src/share/vm/opto/phaseX.hpp ! src/share/vm/opto/runtime.cpp ! src/share/vm/opto/runtime.hpp ! src/share/vm/opto/shenandoahSupport.cpp ! src/share/vm/opto/shenandoahSupport.hpp ! src/share/vm/opto/stringopts.cpp ! src/share/vm/opto/subnode.cpp ! src/share/vm/opto/superword.cpp ! src/share/vm/opto/superword.hpp ! src/share/vm/opto/type.cpp ! src/share/vm/opto/type.hpp From rwestrel at redhat.com Fri Dec 9 09:04:28 2016 From: rwestrel at redhat.com (rwestrel at redhat.com) Date: Fri, 09 Dec 2016 09:04:28 +0000 Subject: hg: shenandoah/jdk9/hotspot: Enable optimization of arraycopy as loads/stores with Shenandoah Message-ID: <201612090904.uB994Shq005569@aojmv0008.oracle.com> Changeset: f61052a4dd46 Author: roland Date: 2016-12-08 14:45 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/f61052a4dd46 Enable optimization of arraycopy as loads/stores with Shenandoah ! src/share/vm/ci/ciInstanceKlass.cpp ! src/share/vm/ci/ciInstanceKlass.hpp ! src/share/vm/opto/arraycopynode.cpp ! src/share/vm/opto/arraycopynode.hpp ! 
src/share/vm/opto/library_call.cpp ! src/share/vm/opto/macro.hpp ! src/share/vm/opto/macroArrayCopy.cpp ! src/share/vm/opto/shenandoahSupport.cpp From rwestrel at redhat.com Fri Dec 9 09:31:49 2016 From: rwestrel at redhat.com (rwestrel at redhat.com) Date: Fri, 09 Dec 2016 09:31:49 +0000 Subject: hg: shenandoah/jdk9/hotspot: replace barrier's input with barrier's output in all dominated uses to decrease pressure on register allocator Message-ID: <201612090931.uB99Vntp013056@aojmv0008.oracle.com> Changeset: 577da6ba5a48 Author: roland Date: 2016-12-02 16:49 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/577da6ba5a48 replace barrier's input with barrier's output in all dominated uses to decrease pressure on register allocator ! src/share/vm/gc/shenandoah/shenandoahHeap.cpp ! src/share/vm/gc/shenandoah/shenandoahHeap.hpp ! src/share/vm/opto/block.hpp ! src/share/vm/opto/lcm.cpp ! src/share/vm/opto/loopnode.hpp ! src/share/vm/opto/shenandoahSupport.cpp From shade at redhat.com Fri Dec 9 10:58:29 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 9 Dec 2016 11:58:29 +0100 Subject: RFR: More cleanup In-Reply-To: <1481216928.2597.59.camel@redhat.com> References: <1481216928.2597.59.camel@redhat.com> Message-ID: On 12/08/2016 06:08 PM, Roman Kennke wrote: > This removes some more unnecessary diffs between jdk9 baseline and > shenandoah: > > http://cr.openjdk.java.net/~rkennke/cleanup/webrev.00/ Looks okay to me. -Aleksey From roman at kennke.org Fri Dec 9 11:02:11 2016 From: roman at kennke.org (roman at kennke.org) Date: Fri, 09 Dec 2016 11:02:11 +0000 Subject: hg: shenandoah/jdk9/hotspot: More cleanup Message-ID: <201612091102.uB9B2BGv006760@aojmv0008.oracle.com> Changeset: be1010acc2ff Author: rkennke Date: 2016-12-09 12:01 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/be1010acc2ff More cleanup ! src/cpu/aarch64/vm/aarch64.ad ! src/cpu/aarch64/vm/macroAssembler_aarch64.cpp ! 
src/cpu/aarch64/vm/macroAssembler_aarch64.hpp ! src/cpu/aarch64/vm/methodHandles_aarch64.cpp ! src/cpu/aarch64/vm/sharedRuntime_aarch64.cpp ! src/cpu/aarch64/vm/stubGenerator_aarch64.cpp ! src/cpu/aarch64/vm/stubRoutines_aarch64.hpp ! src/cpu/aarch64/vm/templateInterpreterGenerator_aarch64.cpp ! src/share/vm/classfile/systemDictionary.cpp ! src/share/vm/gc/serial/genMarkSweep.cpp ! src/share/vm/gc/shared/gcCause.hpp ! src/share/vm/gc/shared/threadLocalAllocBuffer.cpp ! src/share/vm/oops/cpCache.cpp ! src/share/vm/oops/instanceRefKlass.inline.hpp ! src/share/vm/oops/objArrayOop.hpp ! src/share/vm/oops/oop.cpp ! src/share/vm/prims/jni.cpp ! src/share/vm/prims/jvm.cpp ! src/share/vm/runtime/arguments.cpp ! src/share/vm/runtime/safepoint.cpp ! src/share/vm/services/heapDumper.cpp From shade at redhat.com Mon Dec 12 11:17:20 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 12 Dec 2016 12:17:20 +0100 Subject: RFR (S): Heap dump support Message-ID: Hi, I have been trying to analyze the cause for OOME with Shenandoah, only to figure that Shenandoah does not support heap dumping (d'oh). Solved this by implementing ShenandoahHeap::safe_object_iterate: http://cr.openjdk.java.net/~shade/shenandoah/heapdump-support/webrev.01/ Testing: manual heap dumps with fastdebug/release. 
Thanks, -Aleksey From shade at redhat.com Mon Dec 12 14:36:04 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 12 Dec 2016 15:36:04 +0100 Subject: RFR (S): Fix races on full GC request Message-ID: Hi, There is yet another semi-race in scheduling Full GC, here: 173 void ShenandoahConcurrentThread::do_full_gc(GCCause::Cause cause) { 175 assert(Thread::current()->is_Java_thread(), "expect Java thread here"); 176 177 MonitorLockerEx ml(&_full_gc_lock); 178 schedule_full_gc(); // sets _do_full_gc = true 179 _full_gc_cause = cause; 180 181 // Now that full GC is scheduled, we can abort everything else 182 ShenandoahHeap::heap()->cancel_concgc(cause); 183 184 while (_do_full_gc) { 185 ml.wait(); 186 OrderAccess::storeload(); 187 } 188 assert(!_do_full_gc, "expect full GC to have completed"); 189 } If there is a thread that blocked on _full_gc_lock when Full GC had started, but re-entered after Full GC is completed, it would try to schedule full GC / cancel conc GC again! This mostly happens when full GCs are really short. In our current code, this also fails the assert in Shenandoah control thread that every cancellation should have a reason, like impending full GC. This interesting result is because there are racy unlocked gets of _do_full_gc in assertion code. Both are solved by turning _do_full_gc updates atomic/lock-free, and using the lock only for wait/notifies: http://cr.openjdk.java.net/~shade/shenandoah/cancel-races-again/webrev.02/ Testing: hotspot_gc_shenandoah, jcstress Thanks, -Aleksey P.S. I swear to God, another race there, and I will burn the entire termination protocol thing down. 
From zgu at redhat.com Mon Dec 12 14:48:01 2016 From: zgu at redhat.com (Zhengyu Gu) Date: Mon, 12 Dec 2016 09:48:01 -0500 Subject: RFR (S): Heap dump support In-Reply-To: References: Message-ID: <810fe37d-1965-aa52-8103-a600064bc7a1@redhat.com> Hi Aleksey, ShenandoahSafeObjectIterateAdjustPtrsClosure seems a duplicate of ShenandoahAdjustPointersClosure in shenandoahMarkCompact.cpp. Thanks, -Zhengyu On 12/12/2016 06:17 AM, Aleksey Shipilev wrote: > Hi, > > I have been trying to analyze the cause for OOME with Shenandoah, only to figure > that Shenandoah does not support heap dumping (d'oh). > > Solved this by implementing ShenandoahHeap::safe_object_iterate: > http://cr.openjdk.java.net/~shade/shenandoah/heapdump-support/webrev.01/ > > Testing: manual heap dumps with fastdebug/release. > > Thanks, > -Aleksey > From shade at redhat.com Mon Dec 12 14:58:19 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 12 Dec 2016 15:58:19 +0100 Subject: RFR (S): Heap dump support In-Reply-To: <810fe37d-1965-aa52-8103-a600064bc7a1@redhat.com> References: <810fe37d-1965-aa52-8103-a600064bc7a1@redhat.com> Message-ID: <926cec20-2b26-83b9-bccd-17fe2e78ae65@redhat.com> On 12/12/2016 03:48 PM, Zhengyu Gu wrote: > ShenandoahSafeObjectIterateAdjustPtrsClosure seems a duplicate of > ShenandoahAdjustPointersClosure in shenandoahMarkCompact.cpp. Yes, except that mark-compact bypasses the usual fwdptr verification checks with BrooksPointer::get_raw, which I don't want to do in regular code. Also, I thought copying it would be more straightforward than making it shared. We should clean up all these closures at once in some shared file, like g1OopClosures.* do. 
Thanks, -Aleksey

From zgu at redhat.com Mon Dec 12 15:13:13 2016
From: zgu at redhat.com (Zhengyu Gu)
Date: Mon, 12 Dec 2016 10:13:13 -0500
Subject: RFR (S): Heap dump support
In-Reply-To: <926cec20-2b26-83b9-bccd-17fe2e78ae65@redhat.com>
References: <810fe37d-1965-aa52-8103-a600064bc7a1@redhat.com> <926cec20-2b26-83b9-bccd-17fe2e78ae65@redhat.com>
Message-ID: <18b0f407-b116-5101-cd72-65d9a19da53b@redhat.com>

Okay.

Thanks, -Zhengyu

On 12/12/2016 09:58 AM, Aleksey Shipilev wrote:
> On 12/12/2016 03:48 PM, Zhengyu Gu wrote:
>> ShenandoahSafeObjectIterateAdjustPtrsClosure seems a duplicate of
>> ShenandoahAdjustPointersClosure in shenandoahMarkCompact.cpp.
> Yes, except that mark-compact bypasses the usual fwdptr verification checks with
> BrooksPointer::get_raw, which I don't want to do in regular code.
>
> Also, I thought copying it would be more straightforward than making it shared.
> We should clean up all these closures at once in some shared file, like
> g1OopClosures.* do.
>
> Thanks,
> -Aleksey
>

From rkennke at redhat.com Mon Dec 12 15:16:42 2016
From: rkennke at redhat.com (Roman Kennke)
Date: Mon, 12 Dec 2016 16:16:42 +0100
Subject: RFR (S): Heap dump support
In-Reply-To: References: Message-ID: <1481555802.2597.77.camel@redhat.com>

Hi Aleksey, this would report evacuated objects twice, right? Maybe simply skip cset regions?

Not sure we need to update references. Seems like extra unnecessary work. Calling code should do the appropriate read barriers, or receive opaque JNI handles.

I believe the straightforward way to implement this is to simply delegate to marked_object_iterate() but only for non-cset regions.

Roman

On Monday, 12.12.2016 at 12:17 +0100, Aleksey Shipilev wrote:
> Hi,
>
> I have been trying to analyze the cause for OOME with Shenandoah,
> only to figure that Shenandoah does not support heap dumping (d'oh).
>
> Solved this by implementing ShenandoahHeap::safe_object_iterate:
> 
http://cr.openjdk.java.net/~shade/shenandoah/heapdump-support/webrev.01/
>
> Testing: manual heap dumps with fastdebug/release.
>
> Thanks,
> -Aleksey
>

From rkennke at redhat.com Mon Dec 12 15:20:12 2016
From: rkennke at redhat.com (Roman Kennke)
Date: Mon, 12 Dec 2016 16:20:12 +0100
Subject: RFR (S): Fix races on full GC request
In-Reply-To: References: Message-ID: <1481556012.2597.79.camel@redhat.com>

Hi,

> 173 void ShenandoahConcurrentThread::do_full_gc(GCCause::Cause cause) {
> 175   assert(Thread::current()->is_Java_thread(), "expect Java thread here");
> 176
> 177   MonitorLockerEx ml(&_full_gc_lock);
> 178   schedule_full_gc(); // sets _do_full_gc = true
> 179   _full_gc_cause = cause;
> 180
> 181   // Now that full GC is scheduled, we can abort everything else
> 182   ShenandoahHeap::heap()->cancel_concgc(cause);
> 183
> 184   while (_do_full_gc) {
> 185     ml.wait();
> 186     OrderAccess::storeload();
> 187   }
> 188   assert(!_do_full_gc, "expect full GC to have completed");
> 189 }
>
> If there is a thread that blocked on _full_gc_lock when Full GC had
> started, but re-entered after Full GC is completed, it would try to
> schedule full GC / cancel conc GC again! This mostly happens when full
> GCs are really short.
>
> In our current code, this also fails the assert in Shenandoah control
> thread that every cancellation should have a reason, like impending
> full GC. This interesting result is because there are racy unlocked
> gets of _do_full_gc in assertion code.
>
> Both are solved by turning _do_full_gc updates atomic/lock-free, and
> using the lock only for wait/notifies:
>   http://cr.openjdk.java.net/~shade/shenandoah/cancel-races-again/webrev.02/

Looks good to me.

> P.S. I swear to God, another race there, and I will burn the entire
> termination protocol thing down.

:-D I can't count how many races and strange conditions we already fixed there.
One entire problem class went *puff* away when Zhengyu suggested to simplify JNI critical regions. I seriously hope it's the last one. Otherwise we simply give up on terminating? :-P Roman From shade at redhat.com Mon Dec 12 16:02:14 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 12 Dec 2016 17:02:14 +0100 Subject: RFR (S): Heap dump support In-Reply-To: <1481555802.2597.77.camel@redhat.com> References: <1481555802.2597.77.camel@redhat.com> Message-ID: <492adb14-2bb0-1b2b-8eed-35666eb8e465@redhat.com> On 12/12/2016 04:16 PM, Roman Kennke wrote: > this would report evacuated objects twice, right? Maybe simply skip > cset regions? Right, I missed the double-counting here! > Not sure we need to update references. Seems like extra unnecessary > work. Calling code should do the appropriate read barriers, or receive > opaque JNI handles. Except that HeapDumper does not have read barriers, and does abominable things like accessing field with naked (oop + field_offset). A little more safety for "safe_*" iteration method would not hurt. > I believe the straightforward way to implement this is to simply > delegate to marked_object_iterate() but only for non-cset regions. Right. A little less straightforward way is to reuse flagged heap_region_iterate() for this. See: http://cr.openjdk.java.net/~shade/shenandoah/heapdump-support/webrev.02/ Thanks, -Aleksey From roman at kennke.org Mon Dec 12 16:03:22 2016 From: roman at kennke.org (roman at kennke.org) Date: Mon, 12 Dec 2016 16:03:22 +0000 Subject: hg: shenandoah/jdk8u/hotspot: Added missing include of oop closures. Fixes linking problem. Message-ID: <201612121603.uBCG3MR5027600@aojmv0008.oracle.com> Changeset: 88c8ad7d034b Author: rkennke Date: 2016-12-12 17:03 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/88c8ad7d034b Added missing include of oop closures. Fixes linking problem. ! 
src/share/vm/gc_implementation/shenandoah/shenandoahConcurrentMark.cpp

From rkennke at redhat.com Mon Dec 12 16:04:40 2016
From: rkennke at redhat.com (Roman Kennke)
Date: Mon, 12 Dec 2016 17:04:40 +0100
Subject: RFR (S): Heap dump support
In-Reply-To: <492adb14-2bb0-1b2b-8eed-35666eb8e465@redhat.com>
References: <1481555802.2597.77.camel@redhat.com> <492adb14-2bb0-1b2b-8eed-35666eb8e465@redhat.com>
Message-ID: <1481558680.2597.80.camel@redhat.com>

On Monday, 12.12.2016 at 17:02 +0100, Aleksey Shipilev wrote:
> On 12/12/2016 04:16 PM, Roman Kennke wrote:
> > this would report evacuated objects twice, right? Maybe simply skip
> > cset regions?
>
> Right, I missed the double-counting here!
>
> > Not sure we need to update references. Seems like extra unnecessary
> > work. Calling code should do the appropriate read barriers, or
> > receive opaque JNI handles.
>
> Except that HeapDumper does not have read barriers, and does
> abominable things like accessing field with naked (oop + field_offset).
> A little more safety for "safe_*" iteration method would not hurt.

Yep. And we're iterating over everything anyway, so we can just as well fix the ptrs.

> > I believe the straightforward way to implement this is to simply
> > delegate to marked_object_iterate() but only for non-cset regions.
>
> Right. A little less straightforward way is to reuse flagged
> heap_region_iterate() for this. See:
>   http://cr.openjdk.java.net/~shade/shenandoah/heapdump-support/webrev.02/

Good! Please push!

Roman

From ashipile at redhat.com Mon Dec 12 16:16:56 2016
From: ashipile at redhat.com (ashipile at redhat.com)
Date: Mon, 12 Dec 2016 16:16:56 +0000
Subject: hg: shenandoah/jdk9/hotspot: 2 new changesets
Message-ID: <201612121616.uBCGGuBR001285@aojmv0008.oracle.com>

Changeset: 582651ecf809 Author: shade Date: 2016-12-12 17:06 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/582651ecf809

Heap dump support

!
src/share/vm/gc/shenandoah/shenandoahHeap.cpp

Changeset: aef414e15af5 Author: shade Date: 2016-12-12 17:08 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/aef414e15af5

Fix another Full GC trigger race

! src/share/vm/gc/shenandoah/shenandoahConcurrentThread.cpp ! src/share/vm/gc/shenandoah/shenandoahConcurrentThread.hpp ! src/share/vm/gc/shenandoah/shenandoahHeap.cpp

From shade at redhat.com Mon Dec 12 16:35:29 2016
From: shade at redhat.com (Aleksey Shipilev)
Date: Mon, 12 Dec 2016 17:35:29 +0100
Subject: RFR (XS): Enable -XX:+HeapDump{Before|After}FullGC
Message-ID: <4776529a-0485-6067-8da7-549943d2e39f@redhat.com>

Hi,

Little change: make Shenandoah dump the heap before/after Full GC, if requested, like any diagnosable collector should do:

http://cr.openjdk.java.net/~shade/shenandoah/heapdumps-before-after/webrev.01/

Thanks, -Aleksey

From rkennke at redhat.com Mon Dec 12 17:25:03 2016
From: rkennke at redhat.com (Roman Kennke)
Date: Mon, 12 Dec 2016 18:25:03 +0100
Subject: RFR (XS): Enable -XX:+HeapDump{Before|After}FullGC
In-Reply-To: <4776529a-0485-6067-8da7-549943d2e39f@redhat.com>
References: <4776529a-0485-6067-8da7-549943d2e39f@redhat.com>
Message-ID: <1481563503.2597.82.camel@redhat.com>

Yes

On Monday, 12.12.2016 at 17:35 +0100, Aleksey Shipilev wrote:
> Hi,
>
> Little change: make Shenandoah dump the heap before/after Full GC, if
> requested, like any diagnosable collector should do:
> http://cr.openjdk.java.net/~shade/shenandoah/heapdumps-before-after/webrev.01/
>
> Thanks,
> -Aleksey
>

From ashipile at redhat.com Mon Dec 12 17:26:29 2016
From: ashipile at redhat.com (ashipile at redhat.com)
Date: Mon, 12 Dec 2016 17:26:29 +0000
Subject: hg: shenandoah/jdk9/hotspot: Enable -XX:+HeapDump{Before|After}FullGC.
Message-ID: <201612121726.uBCHQTRl018865@aojmv0008.oracle.com>

Changeset: 6f8831470752 Author: shade Date: 2016-12-12 18:26 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/6f8831470752

Enable -XX:+HeapDump{Before|After}FullGC.

! src/share/vm/gc/shenandoah/shenandoahCollectorPolicy.cpp ! src/share/vm/gc/shenandoah/shenandoahCollectorPolicy.hpp ! src/share/vm/gc/shenandoah/shenandoahMarkCompact.cpp

From shade at redhat.com Tue Dec 13 09:47:17 2016
From: shade at redhat.com (Aleksey Shipilev)
Date: Tue, 13 Dec 2016 10:47:17 +0100
Subject: Perf: wasted region after humongous alloc?
Message-ID: <4673219a-e9f4-fa0e-1d3f-bf5204537b6c@redhat.com>

Hi,

Been playing with tests, and realized we have a peculiar behavior when allocating humongous objects, e.g. in:

public class Alloc {
  static final int SIZE = Integer.getInteger("size", 2_000_000);
  static Object sink;

  public static void main(String... args) throws Exception {
    for (int c = 0; c < 1000000; c++) {
      sink = new int[SIZE];
    }
  }
}

The region logging prints this:

...
region 238, used = 4194304, live = 0, flags =
region 239, used = 4194304, live = 0, flags =
region 240, used = 0, live = 0, flags =
region 241, used = 4194304, live = 0, flags =
region 242, used = 4194304, live = 0, flags =
region 243, used = 0, live = 0, flags =
region 244, used = 4194304, live = 0, flags =
region 245, used = 4194304, live = 0, flags =
region 246, used = 0, live = 0, flags =
...

So there seems to be an empty region right after the humongous allocation. Are we wasting it intentionally, or is it a bug? Seems wasteful either way.

Thanks, -Aleksey

From rkennke at redhat.com Tue Dec 13 09:55:15 2016
From: rkennke at redhat.com (Roman Kennke)
Date: Tue, 13 Dec 2016 10:55:15 +0100
Subject: Perf: wasted region after humongous alloc?
In-Reply-To: <4673219a-e9f4-fa0e-1d3f-bf5204537b6c@redhat.com>
References: <4673219a-e9f4-fa0e-1d3f-bf5204537b6c@redhat.com>
Message-ID: <1481622915.2597.92.camel@redhat.com>

It's probably because we're unconditionally skipping to the next region in ShenandoahFreeSet::claim_contiguous(), assuming that normally the 'current' region is already allocated into. This might not be the case though, especially when commonly allocating region-sized TLABs.

In any case, it is wasteful. Do you want to look into this?

Roman

On Tuesday, 13.12.2016 at 10:47 +0100, Aleksey Shipilev wrote:
> Hi,
>
> Been playing with tests, and realized we have a peculiar behavior
> when allocating humongous objects, e.g. in:
>
> public class Alloc {
>   static final int SIZE = Integer.getInteger("size", 2_000_000);
>   static Object sink;
>
>   public static void main(String... args) throws Exception {
>     for (int c = 0; c < 1000000; c++) {
>       sink = new int[SIZE];
>     }
>   }
> }
>
> The region logging prints this:
>
> ...
> region 238, used = 4194304, live = 0, flags =
> region 239, used = 4194304, live = 0, flags =
> region 240, used = 0, live = 0, flags =
> region 241, used = 4194304, live = 0, flags =
> region 242, used = 4194304, live = 0, flags =
> region 243, used = 0, live = 0, flags =
> region 244, used = 4194304, live = 0, flags =
> region 245, used = 4194304, live = 0, flags =
> region 246, used = 0, live = 0, flags =
> ...
>
> So there seems to be an empty region right after the humongous
> allocation. Are we wasting it intentionally, or is it a bug? Seems
> wasteful either way.
>
> Thanks,
> -Aleksey
>

From shade at redhat.com Tue Dec 13 10:11:06 2016
From: shade at redhat.com (Aleksey Shipilev)
Date: Tue, 13 Dec 2016 11:11:06 +0100
Subject: Perf: wasted region after humongous alloc?
In-Reply-To: <1481622915.2597.92.camel@redhat.com> References: <4673219a-e9f4-fa0e-1d3f-bf5204537b6c@redhat.com> <1481622915.2597.92.camel@redhat.com> Message-ID: <3f85aeb6-062c-11a4-e3c4-ddc5775bc2f5@redhat.com> Aha, this decision is odd, and as the example below shows, it wastes regions. Please take it into your work queue? Thanks, -Aleksey On 12/13/2016 10:55 AM, Roman Kennke wrote: > It's probably because we're unconditionally skipping to the next region > in ShenandoahFreeSet::claim_contiguous(), assuming that normally the > 'current' region is already allocated into. This might not be the case > though, especially when commonly allocating region-sized TLABs. > > In any case, it is wasteful. Do you want to look into this? > > Roman > > Am Dienstag, den 13.12.2016, 10:47 +0100 schrieb Aleksey Shipilev: >> Hi, >> >> Been playing with tests, and realized we have an peculiar behavior >> when >> allocating humongous objects, e.g. in: >> >> public class Alloc { >> static final int SIZE = Integer.getInteger("size", 2_000_000); >> static Object sink; >> >> public static void main(String... args) throws Exception { >> for (int c = 0; c < 1000000; c++) { >> sink = new int[SIZE]; >> } >> } >> } >> >> The region logging prints this: >> >> ... >> region 238, used = 4194304, live = 0, flags = >> region 239, used = 4194304, live = 0, flags = >> region 240, used = 0, live = 0, flags = >> region 241, used = 4194304, live = 0, flags = >> region 242, used = 4194304, live = 0, flags = >> region 243, used = 0, live = 0, flags = >> region 244, used = 4194304, live = 0, flags = >> region 245, used = 4194304, live = 0, flags = >> region 246, used = 0, live = 0, flags = >> ... >> >> So there seems to be an empty region right after the humongous >> allocation. Are >> we wasting it intentionally, or is it a bug? Seems wasteful either >> way. 
>> >> Thanks, >> -Aleksey >> From shade at redhat.com Tue Dec 13 13:23:19 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 13 Dec 2016 14:23:19 +0100 Subject: RFR (S): Do more Full GC tries following the allocation failure Message-ID: Hi, There is another interesting race after full GC: if there are many threads alloc-failing and then requesting full GC, it might so happen that one of those threads unblocks after full GC, tries to allocate only to find other threads have claimed everything, fails, and that is OOME. While the better strategy should be another full GC. See the change and the comment: http://cr.openjdk.java.net/~shade/shenandoah/full-gc-retry/webrev.01/ Additionally, this gives us a tuning knob: with -XX:ShenandoahFullGCTries=0, we say that we would rather fail with OOME than accept the Full GC. Testing: hotspot_gc_shenandoah, gc-bench alloc tests (where it OOMEd before) Thanks, -Aleksey From chf at redhat.com Tue Dec 13 15:19:37 2016 From: chf at redhat.com (Christine Flood) Date: Tue, 13 Dec 2016 10:19:37 -0500 (EST) Subject: RFR (S): Do more Full GC tries following the allocation failure In-Reply-To: References: Message-ID: <1621862713.4202511.1481642377106.JavaMail.zimbra@redhat.com> I suppose three is the magic number... This looks fine to me. ----- Original Message ----- > From: "Aleksey Shipilev" > To: shenandoah-dev at openjdk.java.net > Sent: Tuesday, December 13, 2016 8:23:19 AM > Subject: RFR (S): Do more Full GC tries following the allocation failure > > Hi, > > There is another interesting race after full GC: if there are many threads > alloc-failing and then requesting full GC, it might so happen that one of > those > threads unblocks after full GC, tries to allocate only to find other threads > have claimed everything, fails, and that is OOME. While the better strategy > should be another full GC. 
> > See the change and the comment: > http://cr.openjdk.java.net/~shade/shenandoah/full-gc-retry/webrev.01/ > > Additionally, this gives us a tuning knob: with -XX:ShenandoahFullGCTries=0, > we > say that we would rather fail with OOME than accept the Full GC. > > Testing: hotspot_gc_shenandoah, gc-bench alloc tests (where it OOMEd before) > > Thanks, > -Aleksey > > From ashipile at redhat.com Tue Dec 13 15:51:36 2016 From: ashipile at redhat.com (ashipile at redhat.com) Date: Tue, 13 Dec 2016 15:51:36 +0000 Subject: hg: shenandoah/jdk9/hotspot: Do more Full GC tries following the allocation failure Message-ID: <201612131551.uBDFpaQX028057@aojmv0008.oracle.com> Changeset: 7d3e70252b18 Author: shade Date: 2016-12-13 16:51 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/7d3e70252b18 Do more Full GC tries following the allocation failure ! src/share/vm/gc/shenandoah/shenandoahHeap.cpp ! src/share/vm/gc/shenandoah/shenandoah_globals.hpp From rkennke at redhat.com Tue Dec 13 17:03:17 2016 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 13 Dec 2016 18:03:17 +0100 Subject: RFR: Add remaining unused free space to 'used' counter in free list Message-ID: <1481648597.2597.97.camel@redhat.com> I noticed that when a program allocates many objects that are slightly larger than half a region, we would continuously run into full GC. The reason is that when we skip to next region for allocation, we did not count the remaining unused free space as 'used', and thus barely reported half of heap remaining when running OOM. Oops. Fixed in ShenandoahFreeList by adding last-current-region's remaining free() to the free-lists used. Ok? 
http://cr.openjdk.java.net/~rkennke/fixused/webrev.00/ Roman From zgu at redhat.com Tue Dec 13 17:07:46 2016 From: zgu at redhat.com (Zhengyu Gu) Date: Tue, 13 Dec 2016 12:07:46 -0500 Subject: RFR: Add remaining unused free space to 'used' counter in free list In-Reply-To: <1481648597.2597.97.camel@redhat.com> References: <1481648597.2597.97.camel@redhat.com> Message-ID: <2121c2f2-529e-6ff9-f137-ff80bfd0064a@redhat.com> Look good. -Zhengyu On 12/13/2016 12:03 PM, Roman Kennke wrote: > I noticed that when a program allocates many objects that are slightly > larger than half a region, we would continuously run into full GC. The > reason is that when we skip to next region for allocation, we did not > count the remaining unused free space as 'used', and thus barely > reported half of heap remaining when running OOM. Oops. > > Fixed in ShenandoahFreeList by adding last-current-region's remaining > free() to the free-lists used. > > Ok? > > http://cr.openjdk.java.net/~rkennke/fixused/webrev.00/ > > Roman From shade at redhat.com Tue Dec 13 17:10:51 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 13 Dec 2016 18:10:51 +0100 Subject: RFR: Add remaining unused free space to 'used' counter in free list In-Reply-To: <1481648597.2597.97.camel@redhat.com> References: <1481648597.2597.97.camel@redhat.com> Message-ID: <4203aa39-fdac-37a2-ee80-fb1694a50c52@redhat.com> On 12/13/2016 06:03 PM, Roman Kennke wrote: > I noticed that when a program allocates many objects that are slightly > larger than half a region, we would continuously run into full GC. The > reason is that when we skip to next region for allocation, we did not > count the remaining unused free space as 'used', and thus barely > reported half of heap remaining when running OOM. Oops. > > Fixed in ShenandoahFreeList by adding last-current-region's remaining > free() to the free-lists used. > > Ok? 
> > http://cr.openjdk.java.net/~rkennke/fixused/webrev.00/ I don't see how it matches with the reverse operation, which decrements based on region used size only, not its free size? See: heap->decrease_used(region->used()); _heap->decrease_used(r->used()); Thanks, -Aleksey From rkennke at redhat.com Tue Dec 13 17:11:38 2016 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 13 Dec 2016 18:11:38 +0100 Subject: RFR: Reduce waste in humongous allocations Message-ID: <1481649098.2597.99.camel@redhat.com> as Aleksey has shown, when repeatedly allocating humongous objects, we tend to leave gaps between them. The reason is that we start looking for contiguous regions starting one region after the current (allocation) region, and then discard that alloc region, starting a new one after the humongous object. The fix is two-fold: - Instead of discarding currently active allocation regions, we re-append them to the free-list (together with any free regions that we skipped while searching for a contiguous block). This should be useful, e.g. when we have a not-totally-filled alloc region and then allocate a humongous object. - When searching for contiguous space, also consider the current alloc region. The complication here is that we must prevent concurrent allocations from it. This patch does it by pre-emptively allocating a region-sized chunk, which has two effects: it blocks concurrent allocations and it tells us if the region is free in a concurrency-safe manner. If our search for a contiguous block fails, we revert that by freeing such regions again. It passes jtreg tests and SPECjvm. http://cr.openjdk.java.net/~rkennke/fixhumongousalloc/webrev.00/ Ok?
Roman From rkennke at redhat.com Tue Dec 13 17:16:36 2016 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 13 Dec 2016 18:16:36 +0100 Subject: RFR: Add remaining unused free space to 'used' counter in free list In-Reply-To: <4203aa39-fdac-37a2-ee80-fb1694a50c52@redhat.com> References: <1481648597.2597.97.camel@redhat.com> <4203aa39-fdac-37a2-ee80-fb1694a50c52@redhat.com> Message-ID: <1481649396.2597.100.camel@redhat.com> Am Dienstag, den 13.12.2016, 18:10 +0100 schrieb Aleksey Shipilev: > On 12/13/2016 06:03 PM, Roman Kennke wrote: > > I noticed that when a program allocates many objects that are > > slightly > > larger than half a region, we would continuously run into full GC. > > The > > reason is that when we skip to next region for allocation, we did > > not > > count the remaining unused free space as 'used', and thus barely > > reported half of heap remaining when running OOM. Oops. > > > > Fixed in ShenandoahFreeList by adding last-current-region's > > remaining > > free() to the free-lists used. > > > > Ok? > > > > http://cr.openjdk.java.net/~rkennke/fixused/webrev.00/ > > I don't see how it matches with the reverse operation, which > decrements based on > region used size only, not its free size? > > See: > heap->decrease_used(region->used()); > _heap->decrease_used(r->used()); This is in the heap. The patch addresses the ShenandoahFreeList. I checked it, for heap used counters, decrease and increase do match.
Roman From shade at redhat.com Tue Dec 13 17:17:20 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 13 Dec 2016 18:17:20 +0100 Subject: RFR: Add remaining unused free space to 'used' counter in free list In-Reply-To: <1481649396.2597.100.camel@redhat.com> References: <1481648597.2597.97.camel@redhat.com> <4203aa39-fdac-37a2-ee80-fb1694a50c52@redhat.com> <1481649396.2597.100.camel@redhat.com> Message-ID: <4ab22e2d-45d1-4564-420e-cd0ee7f55a10@redhat.com> On 12/13/2016 06:16 PM, Roman Kennke wrote: > Am Dienstag, den 13.12.2016, 18:10 +0100 schrieb Aleksey Shipilev: >> On 12/13/2016 06:03 PM, Roman Kennke wrote: >>> http://cr.openjdk.java.net/~rkennke/fixused/webrev.00/ >> >> I don't see how it matches with the reverse operation, which >> decrements based on >> region used size only, not its free size? >> >> See: >> heap->decrease_used(region->used()); >> _heap->decrease_used(r->used()); > > This is in the heap. The patch addresses the ShenandoahFreeList. > > I checked it, for heap used counters, decrease and increase do match. Ah, my mistake. Looks good then. -Aleksey From roman at kennke.org Tue Dec 13 17:20:45 2016 From: roman at kennke.org (roman at kennke.org) Date: Tue, 13 Dec 2016 17:20:45 +0000 Subject: hg: shenandoah/jdk9/hotspot: Add remaining unused free space to 'used' counter in free list. Makes heuristics more precise. Message-ID: <201612131720.uBDHKksD024295@aojmv0008.oracle.com> Changeset: 155d04209453 Author: rkennke Date: 2016-12-13 18:20 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/155d04209453 Add remaining unused free space to 'used' counter in free list. Makes heuristics more precise. ! 
src/share/vm/gc/shenandoah/shenandoahFreeSet.cpp From shade at redhat.com Tue Dec 13 18:09:52 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 13 Dec 2016 19:09:52 +0100 Subject: RFR: Reduce waste in humongous allocations In-Reply-To: <1481649098.2597.99.camel@redhat.com> References: <1481649098.2597.99.camel@redhat.com> Message-ID: <2616779e-dc25-c0cf-9659-fcd03fddcefa@redhat.com> On 12/13/2016 06:11 PM, Roman Kennke wrote: > as Aleksey has shown, when repeatedly allocating humongous objects, we > tend to leave gaps between them. The reason is that we start looking > for contigous regions starting one region after the current > (allocation) region, and then discard that alloc region, starting a new > one after the humongous object. > > The fix is two-fold: > - Instead of discarding currently active allocation regions, we re- > append them to the free-list (together with any free regions that we > skipped while searching a contiguous block). This should be useful, > e.g. when we have a not-totally-filled alloc region and then allocate a > humongous object. > - When searching for contigous space, also consider the current alloc > region. The complication here is that we must prevent concurrent > allocations from it. This patch does it by pre-emptively allocating > region-sized chunk, which has two effects: it blocks concurrent > allocations and it tells us if the region is free in a concurrency-safe > manner. If our search for contiguous block fails, we revert that by > freeing such regions again. > > It passes jtreg tests and SPECjvm. > > http://cr.openjdk.java.net/~rkennke/fixhumongousalloc/webrev.00/ Ugh. The code got even more confusing than it was before... At this point I wonder if acquiring a lock when claiming free regions is saner than trying to do this in a lock-free manner. With TLAB allocations, this shouldn't be that painful? Seeing mutations in ShenandoahFreeSet::is_contiguous() makes me all itchy, it should be called differently. 
Also, does the code claim the regions one-by-one? What if we have two competing multi-region humongous allocations? Does it guarantee to allocate both (e.g. are they stepping on each other's toes, preventing global progress?) Thanks, -Aleksey From rkennke at redhat.com Tue Dec 13 18:20:17 2016 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 13 Dec 2016 19:20:17 +0100 Subject: RFR: Reduce waste in humongous allocations In-Reply-To: <2616779e-dc25-c0cf-9659-fcd03fddcefa@redhat.com> References: <1481649098.2597.99.camel@redhat.com> <2616779e-dc25-c0cf-9659-fcd03fddcefa@redhat.com> Message-ID: <1481653217.2597.102.camel@redhat.com> Am Dienstag, den 13.12.2016, 19:09 +0100 schrieb Aleksey Shipilev: > On 12/13/2016 06:11 PM, Roman Kennke wrote: > > as Aleksey has shown, when repeatedly allocating humongous objects, > > we > > tend to leave gaps between them. The reason is that we start > > looking > > for contiguous regions starting one region after the current > > (allocation) region, and then discard that alloc region, starting a > > new > > one after the humongous object. > > > > The fix is two-fold: > > - Instead of discarding currently active allocation regions, we re- > > append them to the free-list (together with any free regions that > > we > > skipped while searching a contiguous block). This should be useful, > > e.g. when we have a not-totally-filled alloc region and then > > allocate a > > humongous object. > > - When searching for contiguous space, also consider the current > > alloc > > region. The complication here is that we must prevent concurrent > > allocations from it. This patch does it by pre-emptively allocating > > region-sized chunk, which has two effects: it blocks concurrent > > allocations and it tells us if the region is free in a concurrency- > > safe > > manner. If our search for contiguous block fails, we revert that by > > freeing such regions again. > > > > It passes jtreg tests and SPECjvm.
> > > > http://cr.openjdk.java.net/~rkennke/fixhumongousalloc/webrev.00/ > > Ugh. The code got even more confusing than it was before... At this > point I > wonder if acquiring a lock when claiming free regions is saner than > trying to do > this in a lock-free manner. With TLAB allocations, this shouldn't be > that painful? It's not painful in terms of performance, but painful in terms of implementation. We cannot easily acquire the Heap_lock on allocations because the allocation might come out of a write barrier, and that Java thread is not-in-VM (they call into the VM via a cheap leaf-call). We could change that (and have been there already) to use regular calls like, e.g. allocations do, but this opens up a whole new class of other problems. For example, we need oopmaps at write-barriers which, iirc, presented us some serious optimization problems in C2 land. With Roland's work, those might have gone away though (seems like we can well live with control inputs to write barriers now..) We have been there, and it might be The Correct Way to do it, but it's not trivial at all. > Seeing mutations in ShenandoahFreeSet::is_contiguous() makes me all > itchy, it > should be called differently. > > Also, does the code claim the regions one-by-one? What if we have two > competing > multi-region humongous allocations? Does it guarantee to allocate > both (e.g. are > they stepping on each other's toes, preventing global progress?) I guess it could happen. How else could we do it? I know this stuff is a bit nightmarish. Accept that as a stop-gap solution, and re-visit locked allocation with non-leaf-write-barriers and all that stuff later?
Roman From shade at redhat.com Tue Dec 13 18:48:38 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 13 Dec 2016 19:48:38 +0100 Subject: RFR: Reduce waste in humongous allocations In-Reply-To: <1481653217.2597.102.camel@redhat.com> References: <1481649098.2597.99.camel@redhat.com> <2616779e-dc25-c0cf-9659-fcd03fddcefa@redhat.com> <1481653217.2597.102.camel@redhat.com> Message-ID: <4d4e6526-a669-8cd0-1fe5-8411a74e6f75@redhat.com> On 12/13/2016 07:20 PM, Roman Kennke wrote: > Am Dienstag, den 13.12.2016, 19:09 +0100 schrieb Aleksey Shipilev: >> On 12/13/2016 06:11 PM, Roman Kennke wrote: >>> http://cr.openjdk.java.net/~rkennke/fixhumongousalloc/webrev.00/ >> >> Ugh. The code got even more confusing than it was before... At this point >> I wonder if acquiring a lock when claiming free regions is saner than >> trying to do this in a lock-free manner. With TLAB allocations, this >> shouldn't be that painful? > > It's not painful in terms of performance, but painful in terms of > implementation. We cannot easily acquire the Heap_lock on allocations because > the allocation might come out of a write barrier, and that Java thread is > not-in-VM (they call into the VM via a cheap leaf-call). We could change that > (and have been there already) to use regular calls like, e.g. allocations do, > but this opens up a whole new class of other problems. For example, we need > oopmaps at write-barriers which, iirc, presented us some serious optimization > problems in C2 land. With Roland's work, those might have gone away though > (seems like we can well live with control inputs to write barriers now..) OUCH. > We have been there, and it might be The Correct Way to do it, but it's not > trivial at all. We don't need Heap_lock specifically, right? I wonder if we can get away with a very short-lived spinlock only in ShenandoahFreeSet to trim the lock-free madness down there. >> Also, does the code claim the regions one-by-one?
What if we have two >> competing multi-region humongous allocations? Does it guarantee to >> allocate both (e.g. are they stepping on each other's toes, preventing >> global progress?) > > I guess it could happen. How else could we do it? > > I know this stuff is a bit nightmarish. Accept that as a stop-gap solution, > and re-visit locked allocation with non-leaf-write-barriers and all that > stuff later? No, because I think those competing multi-region allocs are very real, and will bite us. Let's push something that is not affected by that. So far the cure is worse than the disease :) Thanks, -Aleksey From rwestrel at redhat.com Wed Dec 14 08:06:19 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Wed, 14 Dec 2016 09:06:19 +0100 Subject: RFR: Reduce waste in humongous allocations In-Reply-To: <1481653217.2597.102.camel@redhat.com> References: <1481649098.2597.99.camel@redhat.com> <2616779e-dc25-c0cf-9659-fcd03fddcefa@redhat.com> <1481653217.2597.102.camel@redhat.com> Message-ID: > It's not painful in terms of performance, but painful in terms of > implementation. We cannot easily acquire the Heap_lock on allocations > because the allocation might come out of a write barrier, and that Java > thread is not-in-VM (they call into the VM via a cheap leaf-call). We > could change that (and have been there already) to use regular calls > like, e.g. allocations do, but this opens up a whole new class of other > problems. For example, we need oopmaps at write-barriers which, iirc, > presented us some serious optimization problems in C2 land. With > Roland's work, those might have gone away though (seems like we can > well live with control inputs to write barriers now..) If we have a blocking runtime call at a write barrier then deoptimization at a write barrier is possible and we need debug info at the write barrier. Having debug info and allowing the write barrier to move around would be quite complicated. Roland.
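The short-lived spin lock discussed in this thread — one that a Java thread can take from a write barrier via a cheap leaf call, because it never blocks in the VM sense and so needs no oop map or debug info at the barrier — can be sketched in a few lines. The following is an illustrative, self-contained sketch only: the names (SpinLock, locked_count) are made up, and std::atomic stands in for HotSpot's Atomic::cmpxchg; it is not the actual Shenandoah code.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Minimal CAS-based spin lock. Hypothetical sketch, not HotSpot code.
class SpinLock {
  std::atomic<int> _state{0};   // 0 = unlocked, 1 = locked
public:
  void lock() {
    int expected = 0;
    // Spin, retrying the CAS 0 -> 1 until it succeeds.
    while (!_state.compare_exchange_weak(expected, 1,
                                         std::memory_order_acquire)) {
      expected = 0;  // the failed CAS wrote the observed value back
    }
  }
  void unlock() {
    _state.store(0, std::memory_order_release);
  }
};

// Demo: several threads bump a shared counter under the lock.
long locked_count(int nthreads, int iters) {
  SpinLock lock;
  long counter = 0;
  std::vector<std::thread> threads;
  for (int t = 0; t < nthreads; t++) {
    threads.emplace_back([&]() {
      for (int i = 0; i < iters; i++) {
        lock.lock();
        counter++;          // protected by the lock
        lock.unlock();
      }
    });
  }
  for (auto& th : threads) th.join();
  return counter;
}
```

The design point is that the lock holder only ever does a bounded amount of work (claiming regions in the free set), so spinning waiters make progress quickly without any thread-state transition.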
From rkennke at redhat.com Wed Dec 14 09:30:57 2016 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 14 Dec 2016 10:30:57 +0100 Subject: RFR: Reduce waste in humongous allocations In-Reply-To: References: <1481649098.2597.99.camel@redhat.com> <2616779e-dc25-c0cf-9659-fcd03fddcefa@redhat.com> <1481653217.2597.102.camel@redhat.com> Message-ID: <1481707857.2597.105.camel@redhat.com> Am Mittwoch, den 14.12.2016, 09:06 +0100 schrieb Roland Westrelin: > > It's not painful in terms of performance, but painful in terms of > > implemention. We cannot easily acquire the Heap_lock on allocations > > because the allocation might come out of a write barrier, and that > > Java > > thread is not-in-VM (they call into the VM via a cheap leaf-call). > > We > > could change that (and have been there already) to use regular > > calls > > like, e.g. allocations do, but this opens up a whole new class of > > other > > problems. For example, we need oopmaps at write-barriers which, > > iirc, > > presented us some serious optimization problems in C2 land. With > > Roland's work, those might have gone away though (seems like we can > > well live with control inputs to write barriers now..) > > If we have a blocking runtime call at a write barrier then > deoptimization at a a write barrier is possible and we need debug > info > at the write barrier. Having debug info and allowing the write > barrier > to move around would be quite complicated. Yeah that's the issues we had last time we tried that. I am currently working on a different approach: instead of using a Hotspot Mutex or such, I'm now protecting the allocation code path by a little CAS-based spin lock. Kind of like what we already do for growing the heap, only better :-) It seems to work, only needs some more testing before I propose it for review. 
Stay tuned :-) Roman From shade at redhat.com Wed Dec 14 11:04:48 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 14 Dec 2016 12:04:48 +0100 Subject: RFR (S): Fix MXBean Full GC notifications Message-ID: <1b62c009-b692-269d-6ebb-db5539a690dc@redhat.com> Hi, In JMH gc profiler, we have both "alloc" (actual allocations) and "churn" (space freed by collections) counters. For Shenandoah, these counters disagree wildly, because Shenandoah borks notifying MXBeans about Full GCs. Fix: http://cr.openjdk.java.net/~shade/shenandoah/mx-fullgc-notify/webrev.01/ Thanks, -Aleksey From rkennke at redhat.com Wed Dec 14 11:38:06 2016 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 14 Dec 2016 12:38:06 +0100 Subject: RFR (S): Fix MXBean Full GC notifications In-Reply-To: <1b62c009-b692-269d-6ebb-db5539a690dc@redhat.com> References: <1b62c009-b692-269d-6ebb-db5539a690dc@redhat.com> Message-ID: <1481715486.2597.106.camel@redhat.com> Am Mittwoch, den 14.12.2016, 12:04 +0100 schrieb Aleksey Shipilev: > Hi, > > In JMH gc profiler, we have both "alloc" (actual allocations) and > "churn" (space > freed by collections) counters. For Shenandoah, these counters > disagree wildly, > because Shenandoah borks notifying MXBeans about Full GCs. > > Fix: > http://cr.openjdk.java.net/~shade/shenandoah/mx-fullgc- > notify/webrev.01/ Yep. Roman From ashipile at redhat.com Wed Dec 14 11:56:35 2016 From: ashipile at redhat.com (ashipile at redhat.com) Date: Wed, 14 Dec 2016 11:56:35 +0000 Subject: hg: shenandoah/jdk9/hotspot: Fix MXBean Full GC notifications. Message-ID: <201612141156.uBEBuZ72007037@aojmv0008.oracle.com> Changeset: a2d3be7f08ad Author: shade Date: 2016-12-14 12:56 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/a2d3be7f08ad Fix MXBean Full GC notifications. ! src/share/vm/services/memoryManager.cpp ! src/share/vm/services/memoryManager.hpp ! src/share/vm/services/memoryService.cpp !
test/TEST.groups + test/gc/shenandoah/MXNotificationsFullGC.java From shade at redhat.com Wed Dec 14 12:45:41 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 14 Dec 2016 13:45:41 +0100 Subject: RFR (S): Fix MXBean Full GC notifications In-Reply-To: <1481715486.2597.106.camel@redhat.com> References: <1b62c009-b692-269d-6ebb-db5539a690dc@redhat.com> <1481715486.2597.106.camel@redhat.com> Message-ID: <764d4117-3f41-900c-a1e3-77755f5d21fa@redhat.com> On 12/14/2016 12:38 PM, Roman Kennke wrote: > Am Mittwoch, den 14.12.2016, 12:04 +0100 schrieb Aleksey Shipilev: >> Hi, >> >> In JMH gc profiler, we have both "alloc" (actual allocations) and >> "churn" (space >> freed by collections) counters. For Shenandoah, these counters >> disagree wildly, >> because Shenandoah borks notifying MXBeans about Full GCs. >> >> Fix: >> http://cr.openjdk.java.net/~shade/shenandoah/mx-fullgc- >> notify/webrev.01/ > > > Yep. Of course the test started failing intermittently after I pushed it... This is a follow-up: diff -r a2d3be7f08ad test/gc/shenandoah/MXNotificationsFullGC.java --- a/test/gc/shenandoah/MXNotificationsFullGC.java Wed Dec 14 12:56:20 2016 +0100 +++ b/test/gc/shenandoah/MXNotificationsFullGC.java Wed Dec 14 13:26:22 2016 +0100 @@ -54,6 +54,9 @@ sink = new int[100_000]; } + // GC notifications are asynchronous, wait a little + Thread.sleep(1000); + if (!notified) { throw new IllegalStateException("Should have been notified"); } Does not fail after 50 runs. Ok?
Thanks, -Aleksey From rkennke at redhat.com Wed Dec 14 12:55:00 2016 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 14 Dec 2016 13:55:00 +0100 Subject: RFR (S): Fix MXBean Full GC notifications In-Reply-To: <764d4117-3f41-900c-a1e3-77755f5d21fa@redhat.com> References: <1b62c009-b692-269d-6ebb-db5539a690dc@redhat.com> <1481715486.2597.106.camel@redhat.com> <764d4117-3f41-900c-a1e3-77755f5d21fa@redhat.com> Message-ID: <1481720100.2597.107.camel@redhat.com> Am Mittwoch, den 14.12.2016, 13:45 +0100 schrieb Aleksey Shipilev: > On 12/14/2016 12:38 PM, Roman Kennke wrote: > > Am Mittwoch, den 14.12.2016, 12:04 +0100 schrieb Aleksey Shipilev: > > > Hi, > > > > > > In JMH gc profiler, we have both "alloc" (actual allocations) and > > > "churn" (space > > > freed by collections) counters. For Shenandoah, these counters > > > disagree wildly, > > > because Shenandoah borks notifying MXBeans about Full GCs. > > > > > > Fix: > > > http://cr.openjdk.java.net/~shade/shenandoah/mx-fullgc- > > > notify/webrev.01/ > > > > > > Yep. > > Of course the test started failing intermittently after I pushed > it... This is a > follow-up: > > diff -r a2d3be7f08ad test/gc/shenandoah/MXNotificationsFullGC.java > --- a/test/gc/shenandoah/MXNotificationsFullGC.java Wed Dec 14 > 12:56:20 2016 +0100 > +++ b/test/gc/shenandoah/MXNotificationsFullGC.java Wed Dec 14 > 13:26:22 2016 +0100 > @@ -54,6 +54,9 @@ > sink = new int[100_000]; > } > > + // GC notifications are asynchronous, wait a little + Thread.sleep(1000); > + > if (!notified) { > throw new IllegalStateException("Should have been notified"); > } > > Does not fail after 50 runs. > > Ok? Sure. From ashipile at redhat.com Wed Dec 14 12:56:14 2016 From: ashipile at redhat.com (ashipile at redhat.com) Date: Wed, 14 Dec 2016 12:56:14 +0000 Subject: hg: shenandoah/jdk9/hotspot: Workaround GC notification asynchronicity in test/gc/shenandoah/MXNotificationsFullGC.
Message-ID: <201612141256.uBECuEox024913@aojmv0008.oracle.com> Changeset: a09a9979e356 Author: shade Date: 2016-12-14 13:56 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/a09a9979e356 Workaround GC notification asynchronicity in test/gc/shenandoah/MXNotificationsFullGC. ! test/gc/shenandoah/MXNotificationsFullGC.java From rkennke at redhat.com Wed Dec 14 15:29:26 2016 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 14 Dec 2016 16:29:26 +0100 Subject: RFR: JVMStat heap region counters Message-ID: <1481729366.2597.111.camel@redhat.com> This adds some infrastructure to monitor each heap region via JVMStat. It currently exposes for each region the number of used and live bytes, plus information about whether the region is humongous, in the collection set, or unused (i.e. not yet allocated, when heap is growable). In addition, it provides the number of regions and their size as constants, and flags that tell if marking and evacuation is in progress. For the region data, it uses a packed format so that all info per region fits in one jlong counter. Should save bandwidth, especially when monitoring via network. The names of the counters and their format are documented in the header file. It's subject to change, especially in the near future. It ups the PerfDataMemorySize for Shenandoah, so that we can fit in all those counters. In order to use it, one must provide -XX:+UsePerfData to turn on JVMStat, and -XX:+ShenandoahRegionSampling to provide live region data. The sampling rate can be set via -XX:ShenandoahRegionSamplingRate=$MS, which sets the number of milliseconds between samples. The latter two flags can also be turned on via JMX (i.e. writable(Always)), which is especially useful for temporarily turning on monitoring from a tool. http://cr.openjdk.java.net/~rkennke/regioncounters/webrev.00/ Ok?
Roman From shade at redhat.com Wed Dec 14 15:44:37 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 14 Dec 2016 16:44:37 +0100 Subject: RFR: JVMStat heap region counters In-Reply-To: <1481729366.2597.111.camel@redhat.com> References: <1481729366.2597.111.camel@redhat.com> Message-ID: <85320b26-6071-1fdd-35a0-b26f8ec9d74f@redhat.com> On 12/14/2016 04:29 PM, Roman Kennke wrote: > http://cr.openjdk.java.net/~rkennke/regioncounters/webrev.00/ *) I kinda crammed things into ShenandoahConcurrentThread. We can do: heap->monitoring_support()->update_counters(); ...once after the if. Otherwise looks good. Thanks, -Aleksey From roman at kennke.org Wed Dec 14 16:25:11 2016 From: roman at kennke.org (roman at kennke.org) Date: Wed, 14 Dec 2016 16:25:11 +0000 Subject: hg: shenandoah/jdk9/hotspot: JVMStat heap region counters Message-ID: <201612141625.uBEGPBIB025647@aojmv0008.oracle.com> Changeset: 1785c83977e3 Author: rkennke Date: 2016-12-14 17:23 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/1785c83977e3 JVMStat heap region counters ! src/share/vm/gc/shenandoah/shenandoahConcurrentThread.cpp + src/share/vm/gc/shenandoah/shenandoahHeapRegionCounters.cpp + src/share/vm/gc/shenandoah/shenandoahHeapRegionCounters.hpp ! src/share/vm/gc/shenandoah/shenandoahMonitoringSupport.cpp ! src/share/vm/gc/shenandoah/shenandoahMonitoringSupport.hpp ! src/share/vm/gc/shenandoah/shenandoah_globals.hpp ! 
src/share/vm/runtime/arguments.cpp From rkennke at redhat.com Wed Dec 14 16:25:23 2016 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 14 Dec 2016 17:25:23 +0100 Subject: RFR: JVMStat heap region counters In-Reply-To: <85320b26-6071-1fdd-35a0-b26f8ec9d74f@redhat.com> References: <1481729366.2597.111.camel@redhat.com> <85320b26-6071-1fdd-35a0-b26f8ec9d74f@redhat.com> Message-ID: <1481732723.2597.112.camel@redhat.com> Am Mittwoch, den 14.12.2016, 16:44 +0100 schrieb Aleksey Shipilev: > On 12/14/2016 04:29 PM, Roman Kennke wrote: > > http://cr.openjdk.java.net/~rkennke/regioncounters/webrev.00/ > > *) I kinda crammed things into ShenandoahConcurrentThread. We can do: > > heap->monitoring_support()->update_counters(); > > ...once after the if. Ok, I pushed it with the suggested change. Roman From rkennke at redhat.com Wed Dec 14 16:36:29 2016 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 14 Dec 2016 17:36:29 +0100 Subject: RFR: Locked allocation Message-ID: <1481733389.2597.114.camel@redhat.com> This patch throws out all the lockfree allocation madness, and implements a much simpler locked allocation. Since we can't easily use Mutex and friends, and also don't need most of their functionality (wait/notify, nesting, etc), I implemented a very simple (simple as in, can read-and-understand it in one glance) CAS-based spin-lock. This is wrapped around the normal allocation path, the humongous allocation path and the heap growing path. It is not locking around the call to full-gc, as this involves other locks and as CHF says, there are alligators there ;-) This does immensely simplify ShenandoahFreeSet, especially the racy humongous allocation path. It does fix the bug that some people have encountered, where 'used' was not consistent with capacity. I've tested it using gc-bench (no regression in allocation throughput), SPECjvm and jtreg tests. Looks all fine. When reviewing, please pay special attention to the lock in ShenandoahHeap::allocate_memory()!
http://cr.openjdk.java.net/~rkennke/lockedalloc/webrev.00/ Ok? Roman From shade at redhat.com Wed Dec 14 17:33:15 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 14 Dec 2016 18:33:15 +0100 Subject: RFR: Locked allocation In-Reply-To: <1481733389.2597.114.camel@redhat.com> References: <1481733389.2597.114.camel@redhat.com> Message-ID: <7fc26a67-09e8-c5d1-b59d-f825aee6411b@redhat.com> On 12/14/2016 05:36 PM, Roman Kennke wrote: > http://cr.openjdk.java.net/~rkennke/lockedalloc/webrev.00/ Impressive! Comments: *) Double/long assert in ShenandoahFreeSet::increase_used. At least break the line, or better yet, combine two asserts in one? *) Outdated comment: 90 // The modulo will take care of wrapping around. *) Also, where *does* it wrap around now? Or we don't need it now, because we guarantee all the previous regions are finally claimed, and no holes left? *) Can we write this: while (_active_end - next > num) { ... as this? while (next + num < _active_end) { ... I think it is a tad more readable: the bound is on the right. *) In RecycleDirtyRegionsClosure, there is no more add_region, why? Was that call superfluous before? 864 _heap->free_regions()->add_region(r); Thanks, -Aleksey From zgu at redhat.com Wed Dec 14 18:10:06 2016 From: zgu at redhat.com (Zhengyu Gu) Date: Wed, 14 Dec 2016 13:10:06 -0500 Subject: RFR: Locked allocation In-Reply-To: <1481733389.2597.114.camel@redhat.com> References: <1481733389.2597.114.camel@redhat.com> Message-ID: <5aad0960-9108-f409-5f5c-eda2c286867c@redhat.com> Great job! It simplifies the logic a lot! A few minor suggestions: - ShenandoahFreeSet::clear() I only see one path to this method, and it is from a safepoint, so replacing the fence with a safepoint assertion should be appropriate. - asserting on _heap_lock == 1 on code paths that are protected by the lock makes code more readable. - Will this lock be hot? And do you want to check for safepoints during spinning?
I wonder if it has an impact on TTSP Thanks, -Zhengyu On 12/14/2016 11:36 AM, Roman Kennke wrote: > This patch throws out all the lockfree allocation madness, and > implements a much simpler locked allocation. Since we can't easily use > Mutex and friends, and also don't need most of their functionality > (wait/notify, nesting, etc), I implemented a very simple (simple as in, > can read-and-understand it in one glance) CAS based spin-lock. This is > wrapped around the normal allocation path, the humongous allocation > path and the heap growing path. It is not locking around the call to > full-gc, as this involves other locks and as CHF says, there are > alligators there ;-) > > This does immensely simplify ShenandoahFreeSet, especially the racy > humongous allocation path. It does fix the bug that some people have > encountered about used not consistent with capacity. > > I've tested it using gc-bench (no regression in allocation throughput), > SPECjvm and jtreg tests. Looks all fine. > > When reviewing, please pay special attention to the lock in > ShenandoahHeap::allocate_memory()! > > http://cr.openjdk.java.net/~rkennke/lockedalloc/webrev.00/ > > Ok? > > Roman From zgu at redhat.com Wed Dec 14 18:39:04 2016 From: zgu at redhat.com (Zhengyu Gu) Date: Wed, 14 Dec 2016 13:39:04 -0500 Subject: RFR: Locked allocation In-Reply-To: <5aad0960-9108-f409-5f5c-eda2c286867c@redhat.com> References: <1481733389.2597.114.camel@redhat.com> <5aad0960-9108-f409-5f5c-eda2c286867c@redhat.com> Message-ID: On 12/14/2016 01:10 PM, Zhengyu Gu wrote: > Great job! It simplifies the logic a lot! > > A few minor suggestions: > > - ShenandoahFreeSet::clear() > > I only see one path to this method and it is from safepoint. so > replacing fence with safepoint assertion should be appropriate. > > - asserting on _heap_lock == 1 on code paths that are protected by the > lock > makes code more readable.
Or make _heap_lock an opaque object and store owner thread pointer, so can have assertion like assert(owned_by_self() ...), at least for debug mode. -Zhengyu > > - Will this lock be hot? and you want to check safepoint during spinning? > I wonder if it has impact on TTSP > > Thanks, > > -Zhengyu > > On 12/14/2016 11:36 AM, Roman Kennke wrote: >> This patch throws out all the lockfree allocation madness, and >> implements a much simpler locked allocation. Since we can't easily use >> Mutex and friends, and also don't need most of their functionality >> (wait/notify, nesting, etc), I implemented a very simple (simple as in, >> can read-and-understand it in one glance) CAS based spin-lock. This is >> wrapped around the normal allocation path, the humongous allocation >> path and the heap growing path. It is not locking around the call to >> full-gc, as this involves other locks and as CHF says, there are >> alligators there ;-) >> >> This does immensely simplify ShenandoahFreeSet, especially the racy >> humongous allocation path. It does fix the bug that some people have >> encountered about used not consistent with capacity. >> >> I've tested it using gc-bench (no regression in allocation throughput), >> SPECjvm and jtreg tests. Looks all fine. >> >> When reviewing, please pay special attention to the lock in >> ShenandoahHeap::allocate_memory()! >> >> http://cr.openjdk.java.net/~rkennke/lockedalloc/webrev.00/ >> >> Ok? 
>> >> Roman > From rkennke at redhat.com Wed Dec 14 18:52:10 2016 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 14 Dec 2016 19:52:10 +0100 Subject: RFR: Locked allocation In-Reply-To: <7fc26a67-09e8-c5d1-b59d-f825aee6411b@redhat.com> References: <1481733389.2597.114.camel@redhat.com> <7fc26a67-09e8-c5d1-b59d-f825aee6411b@redhat.com> Message-ID: <1481741530.2597.116.camel@redhat.com> On Wednesday, 14.12.2016 at 18:33 +0100, Aleksey Shipilev wrote: > On 12/14/2016 05:36 PM, Roman Kennke wrote: > > http://cr.openjdk.java.net/~rkennke/lockedalloc/webrev.00/ > > Impressive! > > Comments: > > *) Double/long assert in ShenandoahFreeSet::increase_used. At least > break the > line, or better yet, combine two asserts in one? It's a sort of pre- and post-condition, hence the two checks. This guy was driving me nuts (and probably still is), so I'll leave it for now. I'll break the line though. > *) Outdated comment: > 90   // The modulo will take care of wrapping around. Oops. Will remove it. > *) Also, where *does* it wrap around now? Or we don't need it now, > because we > guarantee all the previous regions are finally claimed, and no holes > left? We used a ring-buffer when claiming humongous regions. When we found a region starting at index X away from 'current', then we would re-append all regions between current and X to the end of the list. We couldn't reasonably skip humongous regions concurrently. Now that it's single-threaded, we can simply ignore any humongous regions on the list. No more ring buffer needed, and we can never exceed _max_regions length. > *) Can we write this: > > while (_active_end - next > num) { ... > > as this? > > while (next + num < _active_end) { ... > > I think it is a tad more readable: the bound is on the right. Yep. Thanks for reminding me of good practices! :-) > *) In RecycleDirtyRegionsClosure, there is no more add_region, why? > Was that > call superfluous before? Yes.
Right after recycling regions, we will clear the free list. This was bogus. Will come with an updated patch shortly. Roman From rkennke at redhat.com Wed Dec 14 18:57:05 2016 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 14 Dec 2016 19:57:05 +0100 Subject: RFR: Locked allocation In-Reply-To: <5aad0960-9108-f409-5f5c-eda2c286867c@redhat.com> References: <1481733389.2597.114.camel@redhat.com> <5aad0960-9108-f409-5f5c-eda2c286867c@redhat.com> Message-ID: <1481741825.2597.118.camel@redhat.com> On Wednesday, 14.12.2016 at 13:10 -0500, Zhengyu Gu wrote: > Great job! It simplifies the logic a lot! > > A few minor suggestions: > > - ShenandoahFreeSet::clear() > > I only see one path to this method and it is from safepoint. so > replacing fence with safepoint assertion should be appropriate. Ah yes. I was thinking it solved the assert that you and others were facing. My reasoning was that other threads within the same safepoint would need to see the update. However, now that I think about it, those other threads would need to go through our new-fangled lock, and thus a CAS, and thus a fence... hmmm. Will need to try again. You may be right and this fence is bogus. > - asserting on _heap_lock == 1 on code paths that are protected by > the lock > makes code more readable. Yes. I was actually having the same idea as you: store the locking thread for debug checking, do an opaque lock object, and even a scoped locker. All that should contribute to sanity. > - Will this lock be hot? I don't think it's very hot. > and you want to check safepoint during spinning? Nope. The whole point of this exercise was to avoid potentially safepointing (and thus requiring oopmap, debug-info, etc blah blah at write barriers) :-) > I wonder if it has impact on TTSP I doubt. gc-bench didn't show any such thing. In fact, it might be better than before now, at least when you've got threads racing to allocate humongous objects.
The previous code was not even guaranteed to complete (could interleave claiming regions, never finding a contiguous block). Will come up with a patch later. Need food first. ;-) Roman > > Thanks, > > -Zhengyu > > On 12/14/2016 11:36 AM, Roman Kennke wrote: > > This patch throws out all the lockfree allocation madness, and > > implements a much simpler locked allocation. Since we can't easily > > use > > Mutex and friends, and also don't need most of their functionality > > (wait/notify, nesting, etc), I implemented a very simple (simple as > > in, > > can read-and-understand it in one glance) CAS based spin-lock. This > > is > > wrapped around the normal allocation path, the humongous allocation > > path and the heap growing path. It is not locking around the call > > to > > full-gc, as this involves other locks and as CHF says, there are > > alligators there ;-) > > > > This does immensely simplify ShenandoahFreeSet, especially the racy > > humongous allocation path. It does fix the bug that some people > > have > > encountered about used not consistent with capacity. > > > > I've tested it using gc-bench (no regression in allocation > > throughput), > > SPECjvm and jtreg tests. Looks all fine. > > > > When reviewing, please pay special attention to the lock in > > ShenandoahHeap::allocate_memory()! > > > > http://cr.openjdk.java.net/~rkennke/lockedalloc/webrev.00/ > > > > Ok? > > > > Roman > > From chf at redhat.com Wed Dec 14 19:04:53 2016 From: chf at redhat.com (Christine Flood) Date: Wed, 14 Dec 2016 14:04:53 -0500 (EST) Subject: I believe I fixed the issues, can I push this? Message-ID: <2019104040.4515651.1481742293293.JavaMail.zimbra@redhat.com> http://cr.openjdk.java.net/~chf/connections/webrev.02/ From rkennke at redhat.com Wed Dec 14 19:09:13 2016 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 14 Dec 2016 20:09:13 +0100 Subject: I believe I fixed the issues, can I push this? 
In-Reply-To: <2019104040.4515651.1481742293293.JavaMail.zimbra@redhat.com> References: <2019104040.4515651.1481742293293.JavaMail.zimbra@redhat.com> Message-ID: <1481742553.2597.122.camel@redhat.com> I have a half-finished patch that would make a connection matrix during marking, and maintain it using barriers. (I also have the infrastructure to do partial marking... stitching this together will probably give us what we want. Soon.) Other than that, I am fine with you pushing it ;-) Roman > http://cr.openjdk.java.net/~chf/connections/webrev.02/ From zgu at redhat.com Wed Dec 14 19:14:18 2016 From: zgu at redhat.com (Zhengyu Gu) Date: Wed, 14 Dec 2016 14:14:18 -0500 Subject: I believe I fixed the issues, can I push this? In-Reply-To: <2019104040.4515651.1481742293293.JavaMail.zimbra@redhat.com> References: <2019104040.4515651.1481742293293.JavaMail.zimbra@redhat.com> Message-ID: <44f4598a-c256-1c38-b0e4-4603864835d6@redhat.com> 1. There is still a line of debugging code in ShenandoahHeap::calculate_matrix() 2. ShenandoahMatrix -> UseShenandoahMatrix to follow the convention. Otherwise, looks good. -Zhengyu On 12/14/2016 02:04 PM, Christine Flood wrote: > http://cr.openjdk.java.net/~chf/connections/webrev.02/ From zgu at redhat.com Wed Dec 14 19:41:09 2016 From: zgu at redhat.com (Zhengyu Gu) Date: Wed, 14 Dec 2016 14:41:09 -0500 Subject: RFR: Locked allocation In-Reply-To: <1481741825.2597.118.camel@redhat.com> References: <1481733389.2597.114.camel@redhat.com> <5aad0960-9108-f409-5f5c-eda2c286867c@redhat.com> <1481741825.2597.118.camel@redhat.com> Message-ID: <7a0b2754-c194-4680-fc8f-29dcff0a1213@redhat.com> >> and you want to check safepoint during spinning? > Nope. The whole point of this exercise was to avoid potentially > safepointing (and thus requiring oopmap, debug-info, etc blah blah at > write barriers) :-) Yes, I forgot about the safepointing problem. -Zhengyu >> I wonder if it has impact on TTSP > I doubt. gc-bench didn't show any such thing.
In fact, it might be > better than before now, at least when you've got threads racing to > allocate humongous objects. The previous code was not even guaranteed > to complete (could interleave claiming regions, never finding a > contiguous block). > > Will come up with a patch later. Need food first. ;-) > > Roman > >> Thanks, >> >> -Zhengyu >> >> On 12/14/2016 11:36 AM, Roman Kennke wrote: >>> This patch throws out all the lockfree allocation madness, and >>> implements a much simpler locked allocation. Since we can't easily >>> use >>> Mutex and friends, and also don't need most of their functionality >>> (wait/notify, nesting, etc), I implemented a very simple (simple as >>> in, >>> can read-and-understand it in one glance) CAS based spin-lock. This >>> is >>> wrapped around the normal allocation path, the humongous allocation >>> path and the heap growing path. It is not locking around the call >>> to >>> full-gc, as this involves other locks and as CHF says, there are >>> alligators there ;-) >>> >>> This does immensely simplify ShenandoahFreeSet, especially the racy >>> humongous allocation path. It does fix the bug that some people >>> have >>> encountered about used not consistent with capacity. >>> >>> I've tested it using gc-bench (no regression in allocation >>> throughput), >>> SPECjvm and jtreg tests. Looks all fine. >>> >>> When reviewing, please pay special attention to the lock in >>> ShenandoahHeap::allocate_memory()! >>> >>> http://cr.openjdk.java.net/~rkennke/lockedalloc/webrev.00/ >>> >>> Ok? 
>>> >>> Roman >> From aph at redhat.com Thu Dec 15 10:15:07 2016 From: aph at redhat.com (Andrew Haley) Date: Thu, 15 Dec 2016 10:15:07 +0000 Subject: RFR: Locked allocation In-Reply-To: <1481733389.2597.114.camel@redhat.com> References: <1481733389.2597.114.camel@redhat.com> Message-ID: <955f480e-ea87-a37d-1049-f9d7346b99a4@redhat.com> On 14/12/16 16:36, Roman Kennke wrote: > When reviewing, please pay special attention to the lock in > ShenandoahHeap::allocate_memory()! I'm always rather nervous about anybody who invents their own spinlocks. It's a code smell: that doesn't mean it's wrong here, but it does deserve attention. I presume the idea here is that the native allocation is going to be fairly rare because threads will usually allocate inline from their own TLABs. However, please consider the situation where a thread holding the lock is descheduled. Andrew. From rkennke at redhat.com Thu Dec 15 11:31:10 2016 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 15 Dec 2016 12:31:10 +0100 Subject: RFR: Locked allocation In-Reply-To: <955f480e-ea87-a37d-1049-f9d7346b99a4@redhat.com> References: <1481733389.2597.114.camel@redhat.com> <955f480e-ea87-a37d-1049-f9d7346b99a4@redhat.com> Message-ID: <1481801470.2807.1.camel@redhat.com> On Thursday, 15.12.2016 at 10:15 +0000, Andrew Haley wrote: > On 14/12/16 16:36, Roman Kennke wrote: > > When reviewing, please pay special attention to the lock in > > ShenandoahHeap::allocate_memory()! > > I'm always rather nervous about anybody who invents their own > spinlocks. Yeah, understandable. We are too, which is why we went to great efforts to implement a lock-free allocation scheme a while ago. But it was always buggy and very complex and hard to understand+debug. And humongous allocation was inherently racy: how would you deal with multiple regions in one go, without taking a lock, and while other threads are taking regions from under your feet? The same goes for expanding the heap.
And since we couldn't use Mutex (and don't need most of their functionality), the next best way to do it was to implement a small CAS-based spinlock. Besides, we already have been doing it, for heap expansion, but now it's better (using the right fences, etc). With my upcoming patch, it will also provide a scoped locker, and additional checks, for our sanity. > I presume the idea here is that the native allocation is going to be > fairly rare because threads will usually allocate inline from their > own TLABs. Yes. > However, please consider the situation where a thread > holding the lock is descheduled. Yes. We're doing a SpinPause() when spinning; this should get us back to the thread holding the lock quickly. If you have an idea how to improve this, let me know! gc-bench provides a couple of tests that bash the allocation code with multiple threads, and it did not find performance regressions or bugs. Roman From shade at redhat.com Thu Dec 15 11:32:02 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 15 Dec 2016 12:32:02 +0100 Subject: Bug: ReferenceProcessor does from-space writes? Message-ID: <8cbbae41-13e9-8e8e-6128-b2cbc9d061ca@redhat.com> Hi, Our CI brought us this assert: [VM ERROR] o.o.j.t.tearing.buffers.DirectByteBufferInterleaveTest (JVM args: [-Xmx16g, -XX:+ShenandoahStoreCheck, -XX:ShenandoahGCHeuristics=aggressive, -XX:+ShenandoahVerifyOptoBarriers, -XX:+VerifyStrictOopOperations, -XX:+UseShenandoahGC, -XX:-UseCompressedOops, -Xint]) Observed state Occurrences Expectation Interpretation 0, 128, 128 0 ACCEPTABLE Seeing all updates intact.
Messages: # To suppress the following error report, specify this argument # after -XX: or in .hotspotrc: SuppressErrorAt=/shenandoahBarrierSet.cpp:272 # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (/opt/jenkins/workspace/jdk9-shenandoah-fastdebug/hotspot/src/share/vm/gc/shenandoah/shenandoahBarrierSet.cpp:272), pid=29026, tid=29028 # assert(o == __null || oopDesc::unsafe_equals(o, resolve_oop_static(o))) failed: only write to-space values hs_err shows this stack: V [libjvm.so+0x15bc01f] VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x15f V [libjvm.so+0x15bcdda] VMError::report_and_die(Thread*, char const*, int, char const*, char const*, __va_list_tag*)+0x4a V [libjvm.so+0xa4e87a] report_vm_error(char const*, int, char const*, char const*, ...)+0xea V [libjvm.so+0x13b374f] ShenandoahBarrierSet::write_ref_field_work(void*, oop, bool)+0x11f V [libjvm.so+0x132f0e7] BarrierSet::write_ref_field(void*, oop, bool)+0x57 V [libjvm.so+0x132c6dd] ReferenceProcessor::enqueue_discovered_reflist(DiscoveredList&)+0x71d V [libjvm.so+0x1331582] RefProcEnqueueTask::work(unsigned int)+0xa2 V [libjvm.so+0x162b935] GangWorker::loop()+0xc5 V [libjvm.so+0x12315c2] thread_native_entry(Thread*)+0x112 The code is: ... next_d = java_lang_ref_Reference::discovered(obj); // RB here ... java_lang_ref_Reference::set_next_raw(obj, obj); if (! oopDesc::safe_equals(next_d, obj)) { oopDesc::bs()->write_ref_field( // !!! Oops, re-reading without RB here? java_lang_ref_Reference::discovered_addr(obj), next_d); Most uses of Reference::*_addr seem suspicious to me. Thanks, -Aleksey From rkennke at redhat.com Thu Dec 15 11:36:21 2016 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 15 Dec 2016 12:36:21 +0100 Subject: Bug: ReferenceProcessor does from-space writes? 
In-Reply-To: <8cbbae41-13e9-8e8e-6128-b2cbc9d061ca@redhat.com> References: <8cbbae41-13e9-8e8e-6128-b2cbc9d061ca@redhat.com> Message-ID: <1481801781.2807.3.camel@redhat.com> This is odd. During marking, we should only enqueue Reference objects that are in to-space. Adding read-barriers into ReferenceProcessor is most likely only hiding the real bug. The most likely cause is failing to mark a Reference object in the previous cycle, thus not evacuating it.... Is this reproducible? Roman On Thursday, 15.12.2016 at 12:32 +0100, Aleksey Shipilev wrote: > Hi, > > Our CI brought us this assert: > > [VM ERROR] o.o.j.t.tearing.buffers.DirectByteBufferInterleaveTest > (JVM args: [-Xmx16g, -XX:+ShenandoahStoreCheck, > -XX:ShenandoahGCHeuristics=aggressive, > -XX:+ShenandoahVerifyOptoBarriers, > -XX:+VerifyStrictOopOperations, -XX:+UseShenandoahGC, -XX:- > UseCompressedOops, > -Xint]) > Observed state   Occurrences   Expectation  Interpretation > > 0, 128, 128             0    ACCEPTABLE  Seeing all updates > intact.
> > > Messages: > # To suppress the following error report, specify this > argument > # after -XX: or in .hotspotrc: > SuppressErrorAt=/shenandoahBarrierSet.cpp:272 > # > # A fatal error has been detected by the Java Runtime > Environment: > # > # Internal Error > (/opt/jenkins/workspace/jdk9-shenandoah- > fastdebug/hotspot/src/share/vm/gc/shenandoah/shenandoahBarrierSet.cpp > :272), > pid=29026, tid=29028 > # assert(o == __null || oopDesc::unsafe_equals(o, > resolve_oop_static(o))) failed: only write to-space values > > > hs_err shows this stack: > > V [libjvm.so+0x15bc01f] VMError::report_and_die(int, char const*, > char const*, > __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, > int, > unsigned long)+0x15f > V [libjvm.so+0x15bcdda] VMError::report_and_die(Thread*, char > const*, int, > char const*, char const*, __va_list_tag*)+0x4a > V [libjvm.so+0xa4e87a] report_vm_error(char const*, int, char > const*, char > const*, ...)+0xea > V [libjvm.so+0x13b374f] ShenandoahBarrierSet::write_ref_field_work( > void*, oop, > bool)+0x11f > V [libjvm.so+0x132f0e7] BarrierSet::write_ref_field(void*, oop, > bool)+0x57 > V [libjvm.so+0x132c6dd] > ReferenceProcessor::enqueue_discovered_reflist(DiscoveredList&)+0x71d > V [libjvm.so+0x1331582] RefProcEnqueueTask::work(unsigned int)+0xa2 > V [libjvm.so+0x162b935] GangWorker::loop()+0xc5 > V [libjvm.so+0x12315c2] thread_native_entry(Thread*)+0x112 > > The code is: > > ... > next_d = java_lang_ref_Reference::discovered(obj); // RB here > ... > java_lang_ref_Reference::set_next_raw(obj, obj); > if (! oopDesc::safe_equals(next_d, obj)) { > oopDesc::bs()->write_ref_field( > // !!! Oops, re-reading without RB here? > java_lang_ref_Reference::discovered_addr(obj), > next_d); > > Most uses of Reference::*_addr seem suspicious to me.
> > Thanks, > -Aleksey > From shade at redhat.com Thu Dec 15 11:43:21 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 15 Dec 2016 12:43:21 +0100 Subject: Bug: ReferenceProcessor does from-space writes? In-Reply-To: <1481801781.2807.3.camel@redhat.com> References: <8cbbae41-13e9-8e8e-6128-b2cbc9d061ca@redhat.com> <1481801781.2807.3.camel@redhat.com> Message-ID: <9435ace2-37b1-1429-3e1c-862bcbdb8741@redhat.com> Reproduced in CI two times, failed to reproduce locally. -Aleksey On 12/15/2016 12:36 PM, Roman Kennke wrote: > This is odd. > > During marking, we should only enqueue Reference objects that are in > to-space. Adding read-barriers into ReferenceProcessor is most likely > only hiding the real bug. The most likely cause is failing to mark a > Reference object in previous cycle, thus not evacuating it.... > > Is this reproducible? > > Roman > > Am Donnerstag, den 15.12.2016, 12:32 +0100 schrieb Aleksey Shipilev: >> Hi, >> >> Our CI brought us this assert: >> >> [VM ERROR] o.o.j.t.tearing.buffers.DirectByteBufferInterleaveTest >> (JVM args: [-Xmx16g, -XX:+ShenandoahStoreCheck, >> -XX:ShenandoahGCHeuristics=aggressive, >> -XX:+ShenandoahVerifyOptoBarriers, >> -XX:+VerifyStrictOopOperations, -XX:+UseShenandoahGC, -XX:- >> UseCompressedOops, >> -Xint]) >> Observed state Occurrences Expectation Interpretation >> >> 0, 128, 128 0 ACCEPTABLE Seeing all updates >> intact. 
>> >> >> Messages: >> # To suppress the following error report, specify this >> argument >> # after -XX: or in .hotspotrc: >> SuppressErrorAt=/shenandoahBarrierSet.cpp:272 >> # >> # A fatal error has been detected by the Java Runtime >> Environment: >> # >> # Internal Error >> (/opt/jenkins/workspace/jdk9-shenandoah- >> fastdebug/hotspot/src/share/vm/gc/shenandoah/shenandoahBarrierSet.cpp >> :272), >> pid=29026, tid=29028 >> # assert(o == __null || oopDesc::unsafe_equals(o, >> resolve_oop_static(o))) failed: only write to-space values >> >> >> hs_err shows this stack: >> >> V [libjvm.so+0x15bc01f] VMError::report_and_die(int, char const*, >> char const*, >> __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, >> int, >> unsigned long)+0x15f >> V [libjvm.so+0x15bcdda] VMError::report_and_die(Thread*, char >> const*, int, >> char const*, char const*, __va_list_tag*)+0x4a >> V [libjvm.so+0xa4e87a] report_vm_error(char const*, int, char >> const*, char >> const*, ...)+0xea >> V [libjvm.so+0x13b374f] ShenandoahBarrierSet::write_ref_field_work( >> void*, oop, >> bool)+0x11f >> V [libjvm.so+0x132f0e7] BarrierSet::write_ref_field(void*, oop, >> bool)+0x57 >> V [libjvm.so+0x132c6dd] >> ReferenceProcessor::enqueue_discovered_reflist(DiscoveredList&)+0x71d >> V [libjvm.so+0x1331582] RefProcEnqueueTask::work(unsigned int)+0xa2 >> V [libjvm.so+0x162b935] GangWorker::loop()+0xc5 >> V [libjvm.so+0x12315c2] thread_native_entry(Thread*)+0x112 >> >> The code is: >> >> ... >> next_d = java_lang_ref_Reference::discovered(obj); // RB here >> ... >> java_lang_ref_Reference::set_next_raw(obj, obj); >> if (! oopDesc::safe_equals(next_d, obj)) { >> oopDesc::bs()->write_ref_field( >> // !!! Oops, re-reading without RB here? >> java_lang_ref_Reference::discovered_addr(obj), >> next_d); >> >> Most uses of Reference::*_addr seem suspicious to me. 
>> >> Thanks, >> -Aleksey >> From aph at redhat.com Thu Dec 15 11:44:37 2016 From: aph at redhat.com (Andrew Haley) Date: Thu, 15 Dec 2016 11:44:37 +0000 Subject: RFR: Locked allocation In-Reply-To: <1481801470.2807.1.camel@redhat.com> References: <1481733389.2597.114.camel@redhat.com> <955f480e-ea87-a37d-1049-f9d7346b99a4@redhat.com> <1481801470.2807.1.camel@redhat.com> Message-ID: On 15/12/16 11:31, Roman Kennke wrote: > On Thursday, 15.12.2016 at 10:15 +0000, Andrew Haley wrote: >> On 14/12/16 16:36, Roman Kennke wrote: >>> When reviewing, please pay special attention to the lock in ShenandoahHeap::allocate_memory()! > >> However, please consider the situation where a thread holding the lock is descheduled. > > Yes. We're doing a SpinPause() when spinning, this should get us back to the thread holding the lock quickly. If you have an idea how to improve this, let me know! Please have a look at the way SpinPause() is defined! Maybe it's worth looking at backoff after spinning for a while. But it's very hard to test for consistent behaviour under extreme conditions. Allocating very large objects is quite likely to result in page faults, and therefore quite likely to cause a thread to be descheduled. On a heavily loaded system I would expect long delays for page faults, while the lock is held. I fear that it's very tempting to design Shenandoah so that it behaves extremely well when it's not being "abused". > gc-bench provides a couple of tests that bash the allocation code with multiple threads, and it did not find performance regressions or bugs. Sure, but I'm thinking about systems which are overloaded. I don't know if gc-bench would help there. I presume that you have considered allocating humongous objects outside of Shenandoah's regions altogether. But even mentioning such a thing takes me way outside my area of expertise, so... Andrew.
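The backoff Andrew raises is commonly sketched as bounded spinning followed by yielding to the scheduler, so that a descheduled lock holder gets CPU time to finish and unlock. This is an illustrative sketch with hypothetical names and thresholds, not what the patch under review actually does (it just calls SpinPause() in the loop):

```cpp
#include <atomic>
#include <thread>

// Hypothetical spin-then-yield lock acquisition. Spin a bounded number of
// CAS attempts; past that, yield the CPU so a descheduled lock holder can
// run. The threshold of 1000 is illustrative only.
void lock_with_backoff(std::atomic<int>& state) {
  int expected = 0;
  int spins = 0;
  while (!state.compare_exchange_weak(expected, 1, std::memory_order_acquire)) {
    expected = 0;
    if (++spins < 1000) {
      // busy-wait phase: HotSpot would call SpinPause() here
    } else {
      std::this_thread::yield();  // back off to the scheduler
    }
  }
}

void unlock(std::atomic<int>& state) {
  state.store(0, std::memory_order_release);
}
```

The trade-off is latency versus wasted cycles: pure spinning reacts fastest when the holder is running on another core, while yielding helps exactly in the overloaded, descheduled-holder scenario described above.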
From shade at redhat.com Thu Dec 15 11:55:07 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 15 Dec 2016 12:55:07 +0100 Subject: RFR: Locked allocation In-Reply-To: References: <1481733389.2597.114.camel@redhat.com> <955f480e-ea87-a37d-1049-f9d7346b99a4@redhat.com> <1481801470.2807.1.camel@redhat.com> Message-ID: <07f6da55-a37c-efd4-a14f-d3113b336a9d@redhat.com> On 12/15/2016 12:44 PM, Andrew Haley wrote: > Maybe it's worth looking at backoff after spinning for a while. But > it's very hard to test for consistent behaviour under extreme > conditions. Allocating very large objects is quite likely to > result in page faults, and therefore quite likely to cause a > thread to be descheduled. On a heavily loaded system I would > expect long delays for page faults, while the lock is held. Generally true. But I think current change only covers the freelist/region manipulation work, which should complete very quickly. The initialization (which is the hard part of doing "new" on large Java objects) should and will happen outside the spinlocked path. Think about this as coarsening the current juggling-the-knives lock-free mechanics with a spinlocked entry to the small critical section. We are not expected to do any heavy-lifting while holding that lock. This minimizes the need for sophisticated backoffs, etc. 
Pretty much how we don't usually think about backoffs with lock-free update code :) Thanks, -Aleksey From rkennke at redhat.com Thu Dec 15 12:01:05 2016 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 15 Dec 2016 13:01:05 +0100 Subject: RFR: Locked allocation In-Reply-To: References: <1481733389.2597.114.camel@redhat.com> <955f480e-ea87-a37d-1049-f9d7346b99a4@redhat.com> <1481801470.2807.1.camel@redhat.com> Message-ID: <1481803265.2807.5.camel@redhat.com> On Thursday, 15.12.2016 at 11:44 +0000, Andrew Haley wrote: > On 15/12/16 11:31, Roman Kennke wrote: > > On Thursday, 15.12.2016 at 10:15 +0000, Andrew Haley wrote: > > > On 14/12/16 16:36, Roman Kennke wrote: > > > > When reviewing, please pay special attention to the lock in > > > > ShenandoahHeap::allocate_memory()! > > > However, please consider the situation where a thread holding the > > > lock is descheduled. > > > > Yes. We're doing a SpinPause() when spinning, this should get us > > back to the thread holding the lock quickly. If you have an idea > > how to improve this, let me know! > > Please have a look at the way SpinPause() is defined! I did. > Maybe it's worth looking at backoff after spinning for a while. But > it's very hard to test for consistent behaviour under extreme > conditions. Allocating very large objects is quite likely to > result in page faults, and therefore quite likely to cause a > thread to be descheduled. On a heavily loaded system I would > expect long delays for page faults, while the lock is held. > > I fear that it's very tempting to design Shenandoah so that it > behaves extremely well when it's not being "abused". > > > gc-bench provides a couple of tests that bash the allocation code > > with multiple threads, and it did not find performance regressions > > or bugs. > > Sure, but I'm thinking about systems which are overloaded. I don't > know if gc-bench would help there. I think it's specifically designed to abuse the GC as much as we can.
;-) Aleksey even wrote a test that allocates arrays without initializing them, cranking out alloc rates in the 100s of GB/sec ... cannot really do that with ordinary Java code, but should abuse the GC quite a lot. :-D And I firmly believe that doing a simple lock around the allocation code is much more resistant to abuse than the previous implementation, where multiple threads racing to allocate humongous objects could lock-step each other; I think it couldn't even guarantee to complete... it's much better now I think. Also, speaking of code smell... the previous lock-free code, well, 'code smell' is not the right word for it ;-) stinking pile of.. well, you get the idea ;-) > I presume that you have considered allocating humongous object > outside > of Shenandoah's regions altogether. But even mentioning such a thing > takes me way outside my area of expertise, so... yeah... nope ;-) http://replycandy.com/wp-content/uploads/Godzilla-Nope-Response-Meme.jpg (thanks shade for pointing me to the picture ;-) ) Roman From aph at redhat.com Thu Dec 15 12:01:56 2016 From: aph at redhat.com (Andrew Haley) Date: Thu, 15 Dec 2016 12:01:56 +0000 Subject: RFR: Locked allocation In-Reply-To: <07f6da55-a37c-efd4-a14f-d3113b336a9d@redhat.com> References: <1481733389.2597.114.camel@redhat.com> <955f480e-ea87-a37d-1049-f9d7346b99a4@redhat.com> <1481801470.2807.1.camel@redhat.com> <07f6da55-a37c-efd4-a14f-d3113b336a9d@redhat.com> Message-ID: On 15/12/16 11:55, Aleksey Shipilev wrote: > Think about this as coarsening the current juggling-the-knives > lock-free mechanics with a spinlocked entry to the small critical > section. We are not expected to do any heavy-lifting while holding > that lock. This minimizes the need for sophisticated backoffs, > etc. Pretty much how we don't usually think about backoffs with > lock-free update code :) OK. Andrew.
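Aleksey's point about keeping the critical section small can be sketched as: hold the lock only for the free-list bookkeeping (pointer bumps and accounting), and do the potentially slow, page-faulting initialization of the new object after releasing it. Everything below is illustrative with hypothetical names, not the actual patch:

```cpp
#include <atomic>
#include <cstddef>
#include <cstring>

// Illustrative only: the spin-lock guards just the bookkeeping; the heavy
// lifting (zeroing a possibly huge allocation) happens outside the lock.
struct FreeList {
  std::atomic<int> lock{0};
  size_t top = 0;
  char heap[1 << 20];  // stand-in for the region storage

  void* allocate(size_t size) {
    int expected = 0;
    while (!lock.compare_exchange_weak(expected, 1, std::memory_order_acquire)) {
      expected = 0;  // spin: the critical section below is short
    }
    void* mem = nullptr;
    if (top + size <= sizeof(heap)) {  // short critical section:
      mem = heap + top;                // just a pointer bump and accounting
      top += size;
    }
    lock.store(0, std::memory_order_release);
    if (mem != nullptr) {
      std::memset(mem, 0, size);       // heavy lifting outside the lock
    }
    return mem;
  }
};
```

Because the holder never page-faults or blocks inside the critical section, the window in which another thread can observe the lock taken stays tiny, which is what makes the plain spin (without backoff) defensible here.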
From rwestrel at redhat.com Thu Dec 15 12:04:55 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 15 Dec 2016 13:04:55 +0100 Subject: RFR: Locked allocation In-Reply-To: <1481801470.2807.1.camel@redhat.com> References: <1481733389.2597.114.camel@redhat.com> <955f480e-ea87-a37d-1049-f9d7346b99a4@redhat.com> <1481801470.2807.1.camel@redhat.com> Message-ID: > And since we couldn't use Mutex (and don't need most of their > functionality), the next best way to do it was implement a small cas- > based spinlock. Even a VM Mutex with the no_safepoint_check_flag? Roland. From rkennke at redhat.com Thu Dec 15 12:10:57 2016 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 15 Dec 2016 13:10:57 +0100 Subject: RFR: Locked allocation In-Reply-To: References: <1481733389.2597.114.camel@redhat.com> <955f480e-ea87-a37d-1049-f9d7346b99a4@redhat.com> <1481801470.2807.1.camel@redhat.com> Message-ID: <1481803857.2807.7.camel@redhat.com> On Thursday, 15.12.2016 at 13:04 +0100, Roland Westrelin wrote: > > And since we couldn't use Mutex (and don't need most of their > > functionality), the next best way to do it was implement a small > > cas- > > based spinlock. > > Even a VM Mutex with the no_safepoint_check_flag? One issue was that Mutex was expecting the thread in VM, unless it's rank special. We can only be in VM when we have a non-leaf call at the write barrier. If I make the lock ranked 'special' I run into asserts that check correct lock ordering. We need to allocate stuff when evacuating roots, and this is holding the CodeCache_lock which is also ranked 'special' etc pp. We could probably add some extra code for Shenandoah to Mutex that avoids all this stuff, but would that be better than implementing the simple lock as I did?
Roman From rwestrel at redhat.com Thu Dec 15 12:23:49 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 15 Dec 2016 13:23:49 +0100 Subject: RFR: Locked allocation In-Reply-To: <1481803857.2807.7.camel@redhat.com> References: <1481733389.2597.114.camel@redhat.com> <955f480e-ea87-a37d-1049-f9d7346b99a4@redhat.com> <1481801470.2807.1.camel@redhat.com> <1481803857.2807.7.camel@redhat.com> Message-ID: > One issue was that Mutex was expecting the thread in VM, unless it's > rank special. We can only be in VM when we have a non-leaf call at the > write barrier. > > If I make the lock ranked 'special' I run into asserts that check > correct lock ordering. We need to allocate stuff when evacuating roots, > and this is holding the CodeCache_lock which is also ranked 'special' > etc pp. Ok. Thanks. Roland. From rkennke at redhat.com Thu Dec 15 14:40:49 2016 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 15 Dec 2016 15:40:49 +0100 Subject: RFR: Locked allocation In-Reply-To: <1481741825.2597.118.camel@redhat.com> References: <1481733389.2597.114.camel@redhat.com> <5aad0960-9108-f409-5f5c-eda2c286867c@redhat.com> <1481741825.2597.118.camel@redhat.com> Message-ID: <1481812849.2807.10.camel@redhat.com> So, here comes the update. - I improved the lock to be a scoped locker (similar to MutexLocker), this should help to keep things in order. - It also keeps track of the locking thread in debug builds, and provides asserts that the current thread holds the lock. - I added this check in a few places in ShenandoahFreeSet, and then realized that consequently I should also require the same lock in any code that reads or modifies the ShenandoahFreeSet structure. Therefore I added locking to the few places that build the free list. While strictly speaking this is overkill, it doesn't hurt either. 
- I also found the reason for the assert: the implementations of current() and next() have been a little inconsistent, which led to the allocating thread seeing the same region 2x when hitting the upper boundary (i.e. shortly before OOM), and therefore accounting the remaining free space 2x. I changed current() to only return the region at the current ptr, and next() to only advance that ptr (but not returning anything), and adjusted calling code. - I fixed the few things that Aleksey and Zhengyu mentioned too. Tested with specjvm, jmh-specjvm, gc-bench, jtreg in release and fastdebug. http://cr.openjdk.java.net/~rkennke/lockedalloc/webrev.01 Ok? Roman On Wednesday, 14.12.2016 at 19:57 +0100, Roman Kennke wrote: > On Wednesday, 14.12.2016 at 13:10 -0500, Zhengyu Gu wrote: > > Great job! It simplifies the logic a lot! > > > > A few minor suggestions: > > > > - ShenandoahFreeSet::clear() > > > > I only see one path to this method and it is from safepoint. so > > replacing fence with safepoint assertion should be appropriate. > > Ah yes. I was thinking it solved the assert that you and others were > facing. My reasoning was that other threads within the same safepoint > would need to see the update. However, now that I think about it, > those > other threads would need to go through our new-fangled lock, and thus > a > CAS, and thus a fence... hmmm. Will need to try again. You may be > right and this fence is bogus. > > > - asserting on _heap_lock == 1 on code paths that are protected by > > the lock > > makes code more readable. > > Yes. I was actually having the same idea as you and store the locking > thread for debug checking, and do an opaque lock object, and even a > scoped locker. All that should contribute to sanity. > > > - Will this lock be hot? > > I don't think it's very hot. > > > and you want to check safepoint during spinning? > > Nope.
The whole point of this exercise was to avoid potentially > safepointing (and thus requiring oopmap, debug-info, etc blah blah at > write barriers) :-) > > > I wonder if it has impact on TTSP > I doubt. gc-bench didn't show any such thing. In fact, it might be > better than before now, at least when you've got threads racing to > allocate humongous objects. The previous code was not even guaranteed > to complete (could interleave claiming regions, never finding a > contiguous block). > > Will come up with a patch later. Need food first. ;-) > > Roman > > > > > Thanks, > > > > -Zhengyu > > > > On 12/14/2016 11:36 AM, Roman Kennke wrote: > > > This patch throws out all the lockfree allocation madness, and > > > implements a much simpler locked allocation. Since we can't > > > easily > > > use > > > Mutex and friends, and also don't need most of their > > > functionality > > > (wait/notify, nesting, etc), I implemented a very simple (simple > > > as > > > in, > > > can read-and-understand it in one glance) CAS based spin-lock. > > > This > > > is > > > wrapped around the normal allocation path, the humongous > > > allocation > > > path and the heap growing path. It is not locking around the call > > > to > > > full-gc, as this involves other locks and as CHF says, there are > > > alligators there ;-) > > > > > > This does immensely simplify ShenandoahFreeSet, especially the > > > racy > > > humongous allocation path. It does fix the bug that some people > > > have > > > encountered about used not consistent with capacity. > > > > > > I've tested it using gc-bench (no regression in allocation > > > throughput), > > > SPECjvm and jtreg tests. Looks all fine. > > > > > > When reviewing, please pay special attention to the lock in > > > ShenandoahHeap::allocate_memory()! > > > > > > http://cr.openjdk.java.net/~rkennke/lockedalloc/webrev.00/ > > > > > > Ok? 
> > > > > > Roman > > > > From shade at redhat.com Thu Dec 15 14:53:10 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 15 Dec 2016 15:53:10 +0100 Subject: RFR: Locked allocation In-Reply-To: <1481812849.2807.10.camel@redhat.com> References: <1481733389.2597.114.camel@redhat.com> <5aad0960-9108-f409-5f5c-eda2c286867c@redhat.com> <1481741825.2597.118.camel@redhat.com> <1481812849.2807.10.camel@redhat.com> Message-ID: <3f866843-01a9-564c-a300-02a256cdc5b8@redhat.com> On 12/15/2016 03:40 PM, Roman Kennke wrote: > http://cr.openjdk.java.net/~rkennke/lockedalloc/webrev.01 Looks good. Minor nit: asserts in ShenandoahHeapLock may use ShenandoahHeap::assert_heaplock_owned_by_current_thread? Thanks, -Aleksey From zgu at redhat.com Thu Dec 15 15:11:39 2016 From: zgu at redhat.com (Zhengyu Gu) Date: Thu, 15 Dec 2016 10:11:39 -0500 Subject: RFR: Locked allocation In-Reply-To: <1481812849.2807.10.camel@redhat.com> References: <1481733389.2597.114.camel@redhat.com> <5aad0960-9108-f409-5f5c-eda2c286867c@redhat.com> <1481741825.2597.118.camel@redhat.com> <1481812849.2807.10.camel@redhat.com> Message-ID: <6481f1c3-9e7c-226e-55b7-3e8eb87d41c0@redhat.com> Looks good! One minor thing: ShenandoahHeap::assert_heaplock_owned_by_current_thread() can be debug_only method. Thanks, -Zhengyu On 12/15/2016 09:40 AM, Roman Kennke wrote: > So, here comes the update. > > - I improved the lock to be a scoped locker (similar to MutexLocker), > this should help to keep things in order. > - It also keeps track of the locking thread in debug builds, and > provides asserts that the current thread holds the lock. > - I added this check in a few places in ShenandoahFreeSet, and then > realized that consequently I should also require the same lock in any > code that reads or modifies the ShenandoahFreeSet structure. Therefore > I added locking to the few places that build the free list. While > strictly speaking this is overkill, it doesn't hurt either. 
> > - I also found the reason for the assert: the implementations of > current() and next() have been a little inconsistent, which lead to > allocating thread seeing the same region 2x when hitting the upper > boundary (i.e. shortly before OOM), and therefore accounting the > remaining free space 2x. I changed current() to only return the region > at the current ptr, and next() to only advance that ptr (but not > returning anything), and adjusted calling code. > > - I fixed the few things that Aleksey and Zhengyu mentioned too. > > Tested with specjvm, jmh-specjvm, gc-bench, jtreg in release and > fastdebug. > > http://cr.openjdk.java.net/~rkennke/lockedalloc/webrev.01 > > Ok? > > Roman > > > Am Mittwoch, den 14.12.2016, 19:57 +0100 schrieb Roman Kennke: >> Am Mittwoch, den 14.12.2016, 13:10 -0500 schrieb Zhengyu Gu: >>> Great job! It simplifies the logic a lot! >>> >>> A few minor suggestions: >>> >>> - ShenandoahFreeSet::clear() >>> >>> I only see one path to this method and it is from safepoint. so >>> replacing fence with safepoint assertion should be appropriate. >> Ah yes. I was thinking it solved the assert that you and others were >> facing. My reasoning was that other threads within the same safepoint >> would need to see the update. However, now that I think about it, >> those >> other threads would need to go through our new-fangled lock, and thus >> a >> CAS, and thus a fence... hmmm. Will need to try again. You may be >> right and this fence is bogus. >> >>> - asserting on _heap_lock == 1 on code paths that are protected by >>> the lock >>> makes code more readable. >> Yes. I was actually having the same idea as you and store the locking >> thread for debug checking, and do an opaque lock object, and even a >> scoped locker. All that should contribute to sanity. >> >>> - Will this lock be hot? >> I don't think it's very hot. >> >>> and you want to check safepoint during spinning? >> Nope. 
The whole point of this excerise was to avoid potentially >> safepointing (and thus requiring oopmap, debug-info, etc blah blah at >> write barriers) :-) >> >>> I wonder if it has impact on TTSP >> I doubt. gc-bench didn't show any such thing. In fact, it might be >> better than before now, at least when you've got threads racing to >> allocate humongous objects. The previous code was not even guaranteed >> to complete (could interleave claiming regions, never finding a >> contiguous block). >> >> Will come up with a patch later. Need food first. ;-) >> >> Roman >> >>> Thanks, >>> >>> -Zhengyu >>> >>> On 12/14/2016 11:36 AM, Roman Kennke wrote: >>>> This patch throws out all the lockfree allocation madness, and >>>> implements a much simpler locked allocation. Since we can't >>>> easily >>>> use >>>> Mutex and friends, and also don't need most of their >>>> functionality >>>> (wait/notify, nesting, etc), I implemented a very simple (simple >>>> as >>>> in, >>>> can read-and-understand it in one glance) CAS based spin-lock. >>>> This >>>> is >>>> wrapped around the normal allocation path, the humongous >>>> allocation >>>> path and the heap growing path. It is not locking around the call >>>> to >>>> full-gc, as this involves other locks and as CHF says, there are >>>> alligators there ;-) >>>> >>>> This does immensely simplify ShenandoahFreeSet, especially the >>>> racy >>>> humongous allocation path. It does fix the bug that some people >>>> have >>>> encountered about used not consistent with capacity. >>>> >>>> I've tested it using gc-bench (no regression in allocation >>>> throughput), >>>> SPECjvm and jtreg tests. Looks all fine. >>>> >>>> When reviewing, please pay special attention to the lock in >>>> ShenandoahHeap::allocate_memory()! >>>> >>>> http://cr.openjdk.java.net/~rkennke/lockedalloc/webrev.00/ >>>> >>>> Ok? 
>>>> >>>> Roman >>> From roman at kennke.org Thu Dec 15 15:51:24 2016 From: roman at kennke.org (roman at kennke.org) Date: Thu, 15 Dec 2016 15:51:24 +0000 Subject: hg: shenandoah/jdk9/hotspot: Locked allocation Message-ID: <201612151551.uBFFpO76003765@aojmv0008.oracle.com> Changeset: 9fc91ebeb858 Author: rkennke Date: 2016-12-15 16:50 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/9fc91ebeb858 Locked allocation ! src/share/vm/gc/shenandoah/shenandoahFreeSet.cpp ! src/share/vm/gc/shenandoah/shenandoahFreeSet.hpp ! src/share/vm/gc/shenandoah/shenandoahHeap.cpp ! src/share/vm/gc/shenandoah/shenandoahHeap.hpp ! src/share/vm/gc/shenandoah/shenandoahHeapRegionSet.cpp ! src/share/vm/gc/shenandoah/shenandoahHeapRegionSet.hpp ! src/share/vm/gc/shenandoah/shenandoahMarkCompact.cpp ! src/share/vm/gc/shenandoah/shenandoah_globals.hpp From rkennke at redhat.com Thu Dec 15 15:54:32 2016 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 15 Dec 2016 16:54:32 +0100 Subject: RFR: JDK8 C2 fixes Message-ID: <1481817272.2807.12.camel@redhat.com> This change fixes two problems in library_call.cpp: - in inline_unsafe_access(), read-barriers should be moved up, otherwise we'd have one store in the else branch that does not have a read-barrier on its value. - for arraycopies, we must not turn oop-copies into int-copies, this would bypass the post-barrier that updates our references. With those changes, derby passes again without crashing. Ok? http://cr.openjdk.java.net/~rkennke/jdk8-c2-fix/webrev.00/ Roman From rwestrel at redhat.com Thu Dec 15 15:57:16 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 15 Dec 2016 16:57:16 +0100 Subject: RFR: JDK8 C2 fixes In-Reply-To: <1481817272.2807.12.camel@redhat.com> References: <1481817272.2807.12.camel@redhat.com> Message-ID: > - in inline_unsafe_access(), read-barriers should be moved up, > otherwise we'd have one store in the else branch that does not have a > read-barrier on its value. Is this one required? 
The else branch stores outside the heap as I understand. > - for arraycopies, we must not turn oop-copies into int-copies, this > would bypass the post-barrier that updates our references. Ok. Roland. From rkennke at redhat.com Thu Dec 15 15:58:00 2016 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 15 Dec 2016 16:58:00 +0100 Subject: RFR: Fix freeze when running OOM during write barrier Message-ID: <1481817480.2807.14.camel@redhat.com> We sometimes freeze when a write-barrier runs out of memory. Reason is the recent refactoring in our driver thread: we would skip turning off evacuation, however Java threads are waiting for this to happen. They'll wait indefinitely, and thus never return to a safepoint. http://cr.openjdk.java.net/~rkennke/fixfreeze/webrev.00/ Ok to push? Roman From shade at redhat.com Thu Dec 15 15:59:58 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 15 Dec 2016 16:59:58 +0100 Subject: RFR: Fix freeze when running OOM during write barrier In-Reply-To: <1481817480.2807.14.camel@redhat.com> References: <1481817480.2807.14.camel@redhat.com> Message-ID: On 12/15/2016 04:58 PM, Roman Kennke wrote: > We sometimes freeze when a write-barrier runs out of memory. Reason is > the recent refactoring in our driver thread: we would skip turning off > evacuation, however Java threads are waiting for this to happen. > They'll wait indefinitely, and thus never return to a safepoint. > > http://cr.openjdk.java.net/~rkennke/fixfreeze/webrev.00/ Yes. Sorry about this. -Aleksey From roman at kennke.org Thu Dec 15 16:01:15 2016 From: roman at kennke.org (roman at kennke.org) Date: Thu, 15 Dec 2016 16:01:15 +0000 Subject: hg: shenandoah/jdk9/hotspot: Fix freeze when running OOM during write barrier Message-ID: <201612151601.uBFG1F7Z006451@aojmv0008.oracle.com> Changeset: 9935fc55ebc2 Author: rkennke Date: 2016-12-15 17:00 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/9935fc55ebc2 Fix freeze when running OOM during write barrier ! 
src/share/vm/gc/shenandoah/shenandoahConcurrentThread.cpp From rkennke at redhat.com Thu Dec 15 16:09:42 2016 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 15 Dec 2016 17:09:42 +0100 Subject: RFR: JDK8 C2 fixes In-Reply-To: References: <1481817272.2807.12.camel@redhat.com> Message-ID: <1481818182.2807.15.camel@redhat.com> Am Donnerstag, den 15.12.2016, 16:57 +0100 schrieb Roland Westrelin: > > - in inline_unsafe_access(), read-barriers should be moved up, > > otherwise we'd have one store in the else branch that does not have > > a > > read-barrier on its value. > > Is this one required? The else branch stores outside the heap as I > understand. You are right, it's not needed. The problem goes away just with the arraycopy fix: http://cr.openjdk.java.net/~rkennke/jdk8-c2-fix/webrev.01/ Will push that then... Roman From roman at kennke.org Thu Dec 15 16:10:45 2016 From: roman at kennke.org (roman at kennke.org) Date: Thu, 15 Dec 2016 16:10:45 +0000 Subject: hg: shenandoah/jdk8u/hotspot: Prevent C2 optimization that turns oop arraycopy into int arraycopy and elide the required post-barrier. Message-ID: <201612151610.uBFGAjqw008754@aojmv0008.oracle.com> Changeset: cb8a8ef885c3 Author: rkennke Date: 2016-12-15 17:10 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/cb8a8ef885c3 Prevent C2 optimization that turns oop arraycopy into int arraycopy and elide the required post-barrier. ! 
src/share/vm/opto/library_call.cpp From rkennke at redhat.com Thu Dec 15 16:34:11 2016 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 15 Dec 2016 17:34:11 +0100 Subject: RFR: Fix ReferenceProcessor related assert Message-ID: <1481819651.2807.17.camel@redhat.com> Aleksey recently found an assert: # Internal Error (/opt/jenkins/workspace/jdk9-shenandoah-fastdebug/hotspot/src/share/vm/gc/shenandoah/shenandoahBarrierSet.cpp:272), pid=29026, tid=29028 # assert(o == __null || oopDesc::unsafe_equals(o, resolve_oop_static(o))) failed: only write to-space values coming from: V [libjvm.so+0xa4e87a] report_vm_error(char const*, int, char const*, char const*, ...)+0xea V [libjvm.so+0x13b374f] ShenandoahBarrierSet::write_ref_field_work(void*, oop, bool)+0x11f V [libjvm.so+0x132f0e7] BarrierSet::write_ref_field(void*, oop, bool)+0x57 V [libjvm.so+0x132c6dd] ReferenceProcessor::enqueue_discovered_reflist(DiscoveredList&)+0x71d I think this is harmless, but needs some treatment. What happens is this: in enqueue_discovered_reflist() it calls swap_reference_pending_list() which can give us a from-space reference (GC roots in Universe get updated after weakref processing!). Then it stores that in set_discovered_raw() which is ok, because that does the correct read-barrier before storing, but then goes on to call write_ref_field() which, for Shenandoah, only asserts a few things, and blows up when it gets a from-space reference. The cheapest fix is to do a read-barrier in debug build. http://cr.openjdk.java.net/~rkennke/fixrefproc/webrev.00/ Ok to push? 
Roman From shade at redhat.com Thu Dec 15 16:40:48 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 15 Dec 2016 17:40:48 +0100 Subject: RFR: Fix ReferenceProcessor related assert In-Reply-To: <1481819651.2807.17.camel@redhat.com> References: <1481819651.2807.17.camel@redhat.com> Message-ID: <0830ab17-4c99-6728-78cc-cf3b9a4cdc5d@redhat.com> On 12/15/2016 05:34 PM, Roman Kennke wrote: > http://cr.openjdk.java.net/~rkennke/fixrefproc/webrev.00/ Looks okay. -Aleksey From roman at kennke.org Thu Dec 15 16:41:51 2016 From: roman at kennke.org (roman at kennke.org) Date: Thu, 15 Dec 2016 16:41:51 +0000 Subject: hg: shenandoah/jdk9/hotspot: Fix assert coming from ReferenceProcessor. Message-ID: <201612151641.uBFGfqxm016867@aojmv0008.oracle.com> Changeset: d9e673adfa1c Author: rkennke Date: 2016-12-15 17:41 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/d9e673adfa1c Fix assert coming from ReferenceProcessor. ! src/share/vm/gc/shared/referenceProcessor.cpp From zgu at redhat.com Thu Dec 15 17:43:31 2016 From: zgu at redhat.com (Zhengyu Gu) Date: Thu, 15 Dec 2016 12:43:31 -0500 Subject: RFR: Use heuristics to determine the number of conc threads for each conc gc cycle Message-ID: <8bd11a25-988c-3862-51ae-c23c683ae41a@redhat.com> This is an experimental heuristics that determines the number of concurrent threads for each concurrent GC cycle. SPECjbb runs do not show obvious improvement, it seems to ramp up load quickly, so conc thread count stays high. The change set also contains some cleanup. http://cr.openjdk.java.net/~zgu/shenandoah/conc-worker-heuristics/webrev.00/ Test: SPECjbb, some of SPECjvm benchmarks. 
Thanks, -Zhengyu From chf at redhat.com Fri Dec 16 13:36:48 2016 From: chf at redhat.com (chf at redhat.com) Date: Fri, 16 Dec 2016 13:36:48 +0000 Subject: hg: shenandoah/jdk9/hotspot: Connection Matrix Message-ID: <201612161336.uBGDamuI003692@aojmv0008.oracle.com> Changeset: c5cd9ee7a881 Author: chf Date: 2016-12-15 14:32 -0500 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/c5cd9ee7a881 Connection Matrix ! src/share/vm/gc/shenandoah/shenandoahCollectorPolicy.cpp ! src/share/vm/gc/shenandoah/shenandoahCollectorPolicy.hpp ! src/share/vm/gc/shenandoah/shenandoahHeap.cpp ! src/share/vm/gc/shenandoah/shenandoahHeap.hpp ! src/share/vm/gc/shenandoah/shenandoahHeapRegionSet.cpp ! src/share/vm/gc/shenandoah/shenandoahHeapRegionSet.hpp ! src/share/vm/gc/shenandoah/shenandoah_globals.hpp From zgu at redhat.com Fri Dec 16 14:40:34 2016 From: zgu at redhat.com (Zhengyu Gu) Date: Fri, 16 Dec 2016 09:40:34 -0500 Subject: RFR:(XS): Small enhancement for large allocation Message-ID: When a large allocation fails, the current implementation only grows the heap by one region and retries. This is slightly inefficient. We can grow the heap by the required number of regions at once, to avoid the unnecessary loop. http://cr.openjdk.java.net/~zgu/shenandoah/large_alloc/webrev.00/ Thanks, -Zhengyu From shade at redhat.com Fri Dec 16 14:48:42 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 16 Dec 2016 15:48:42 +0100 Subject: RFR:(XS): Small enhancement for large allocation In-Reply-To: References: Message-ID: <53563771-6b45-eb00-0f50-7df1720ba013@redhat.com> On 12/16/2016 03:40 PM, Zhengyu Gu wrote: > When a large allocation fails, the current implementation only grows the heap by one region and > retries. This is slightly inefficient. > We can grow the heap by the required number of regions at once, to avoid the unnecessary loop. > > http://cr.openjdk.java.net/~zgu/shenandoah/large_alloc/webrev.00/ Been meaning to fix that! 
Shouldn't we instead fix the logic in ShenandoahHeap::allocate_memory_work, and not try to do another grow_heap_by in downcall to allocate_memory_under_lock -> allocate_large_memory? HeapWord* ShenandoahHeap::allocate_memory_work(size_t word_size) { ShenandoahHeapLock heap_lock(this); HeapWord* result = allocate_memory_under_lock(word_size); while (result == NULL && _num_regions < _max_regions) { grow_heap_by(1); // <--- depend on word_size here result = allocate_memory_under_lock(word_size); } return result; } Thanks, -Aleksey From rkennke at redhat.com Fri Dec 16 14:55:06 2016 From: rkennke at redhat.com (Roman Kennke) Date: Fri, 16 Dec 2016 15:55:06 +0100 Subject: RFR: Degenerating concurrent marking Message-ID: <1481900106.2807.20.camel@redhat.com> This patch implements what I call 'degenerating concurrent marking'. If, during concurrent mark, we run out of memory, instead of stopping, throwing away all marking data and doing a full-gc, it gracefully hands over all existing marking work to the subsequent final-mark pause, finishes marking there, and kicks off normal marking. The idea being that in most cases, the OOM is not happening because we got into a bad situation (fragmented heap or such) but only temporary alloc bursts or such, *and* chances are high that we're almost done marking anyway. I made it such that existing mark bitmaps, task queues, SATB buffers and weakref-queues are left intact; if the heuristics decide to go into degenerated concurrent marking, then the final-mark pause carries on where concurrent marking left off. Interestingly, the code for this is mostly in place already ... in final marking we already finish off marking in the way that we need. I needed to tweak the termination protocol in the taskqueue for that, and not clear task queues on cancellation. Instead I added a 'shortcut' in the case we need to terminate without draining the task queues. Please look at this carefully, I am not totally sure I got that right. 
In addition, I also re-wrote adaptive heuristics. It will start out with 10% free threshold (i.e. we start marking when 10% available space is left), and lower that if we have 5 successful markings in a row, and bump that up if we fail to complete concurrent marking. We limit the free threshold 30 References: <8bd11a25-988c-3862-51ae-c23c683ae41a@redhat.com> Message-ID: <1481901040.2807.22.camel@redhat.com> Am Donnerstag, den 15.12.2016, 12:43 -0500 schrieb Zhengyu Gu: > This is an experimental heuristics that determines the number of > concurrent threads for each concurrent GC cycle. > > SPECjbb runs do not show obvious improvement, it seems to ramp up > load quickly, so conc thread count stays high. > > > The change set also contains some cleanup. > > > http://cr.openjdk.java.net/~zgu/shenandoah/conc-worker-heuristics/web > rev.00/ > > > Test: > SPECjbb, some of SPECjvm benchmarks. > > > Thanks, > > -Zhengyu > Interesting. I think the patch is ok. However, under which situation do you expect an improvement? Can we construct a benchmark for this? I think that applications with high alloc pressure (like SPECjbb) will push us to maximum threads. Low alloc pressure would let us stay lower too, but those apps would likely not be dominated by GC work anyway. Roman From zgu at redhat.com Fri Dec 16 15:11:44 2016 From: zgu at redhat.com (Zhengyu Gu) Date: Fri, 16 Dec 2016 10:11:44 -0500 Subject: RFR:(XS): Small enhancement for large allocation In-Reply-To: <53563771-6b45-eb00-0f50-7df1720ba013@redhat.com> References: <53563771-6b45-eb00-0f50-7df1720ba013@redhat.com> Message-ID: Agree! http://cr.openjdk.java.net/~zgu/shenandoah/large_alloc/webrev.01/ -Zhengyu On 12/16/2016 09:48 AM, Aleksey Shipilev wrote: > On 12/16/2016 03:40 PM, Zhengyu Gu wrote: >> When large allocation fails, current implementation only grows heap by 1 and >> retry. This is slightly inefficient. >> We can grow the heap by required regions at once, to avoid unnecessary loop. 
>> >> http://cr.openjdk.java.net/~zgu/shenandoah/large_alloc/webrev.00/ > Been meaning to fix that! Shouldn't we instead fix the logic in > ShenandoahHeap::allocate_memory_work, and not try to do another grow_heap_by in > downcall to allocate_memory_under_lock -> allocate_large_memory? > > > HeapWord* ShenandoahHeap::allocate_memory_work(size_t word_size) { > ShenandoahHeapLock heap_lock(this); > > HeapWord* result = allocate_memory_under_lock(word_size); > while (result == NULL && _num_regions < _max_regions) { > grow_heap_by(1); // <--- depend on word_size here > result = allocate_memory_under_lock(word_size); > } > > return result; > } > > Thanks, > -Aleksey > From zgu at redhat.com Fri Dec 16 15:16:26 2016 From: zgu at redhat.com (Zhengyu Gu) Date: Fri, 16 Dec 2016 10:16:26 -0500 Subject: RFR: Use heuristics to determine the number of conc threads for each conc gc cycle In-Reply-To: <1481901040.2807.22.camel@redhat.com> References: <8bd11a25-988c-3862-51ae-c23c683ae41a@redhat.com> <1481901040.2807.22.camel@redhat.com> Message-ID: <874f388d-a504-6cd8-098e-2b33328d5f45@redhat.com> I am not sure either, I withdraw it for now and try to find some "real" applications. I think benchmarks distort the heuristics. I will separate clean up and send RFR for that part only. Thanks, -Zhengyu On 12/16/2016 10:10 AM, Roman Kennke wrote: > Am Donnerstag, den 15.12.2016, 12:43 -0500 schrieb Zhengyu Gu: >> This is an experimental heuristics that determines the number of >> concurrent threads for each concurrent GC cycle. >> >> SPECjbb runs do not show obvious improvement, it seems to ramp up >> load quickly, so conc thread count stays high. >> >> >> The change set also contains some cleanup. >> >> >> http://cr.openjdk.java.net/~zgu/shenandoah/conc-worker-heuristics/web >> rev.00/ >> >> >> Test: >> SPECjbb, some of SPECjvm benchmarks. >> >> >> Thanks, >> >> -Zhengyu >> > Interesting. I think the patch is ok. > > However, under which situation do you expect an improvement? 
Can we > construct a benchmark for this? > > I think that applications with high alloc pressure (like SPECjbb) will > push us to maximum threads. Low alloc pressure would let us stay lower > too, but those apps would likely not be dominated by GC work anyway. > > Roman From shade at redhat.com Fri Dec 16 15:17:04 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 16 Dec 2016 16:17:04 +0100 Subject: RFR:(XS): Small enhancement for large allocation In-Reply-To: References: <53563771-6b45-eb00-0f50-7df1720ba013@redhat.com> Message-ID: <1b67413c-1837-1cf7-4872-7b835a4b52c5@redhat.com> On 12/16/2016 04:11 PM, Zhengyu Gu wrote: > http://cr.openjdk.java.net/~zgu/shenandoah/large_alloc/webrev.01/ Yup. Why not the usual macro? align_size_up(word_size * HeapWordSize, ShenandoahHeapRegion::RegionSizeBytes) Thanks, -Aleksey From shade at redhat.com Fri Dec 16 15:18:07 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 16 Dec 2016 16:18:07 +0100 Subject: RFR:(XS): Small enhancement for large allocation In-Reply-To: <1b67413c-1837-1cf7-4872-7b835a4b52c5@redhat.com> References: <53563771-6b45-eb00-0f50-7df1720ba013@redhat.com> <1b67413c-1837-1cf7-4872-7b835a4b52c5@redhat.com> Message-ID: <34d59aea-3a79-473f-5332-13b4e27e3fae@redhat.com> On 12/16/2016 04:17 PM, Aleksey Shipilev wrote: > On 12/16/2016 04:11 PM, Zhengyu Gu wrote: >> http://cr.openjdk.java.net/~zgu/shenandoah/large_alloc/webrev.01/ > > Yup. > > Why not the usual macro? > align_size_up(word_size * HeapWordSize, ShenandoahHeapRegion::RegionSizeBytes) Nevermind :) -Aleksey From zgu at redhat.com Fri Dec 16 15:54:49 2016 From: zgu at redhat.com (zgu at redhat.com) Date: Fri, 16 Dec 2016 15:54:49 +0000 Subject: hg: shenandoah/jdk9/hotspot: 2 new changesets Message-ID: <201612161554.uBGFsnck007533@aojmv0008.oracle.com> Changeset: 0638df313dc4 Author: zgu Date: 2016-12-16 10:33 -0500 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/0638df313dc4 More efficient heap expansion ! 
src/share/vm/gc/shenandoah/shenandoahHeap.cpp Changeset: eb5f5b74878d Author: zgu Date: 2016-12-16 10:34 -0500 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/eb5f5b74878d Merge ! src/share/vm/gc/shenandoah/shenandoahHeap.cpp From zgu at redhat.com Fri Dec 16 17:02:44 2016 From: zgu at redhat.com (Zhengyu Gu) Date: Fri, 16 Dec 2016 12:02:44 -0500 Subject: RFR: Degenerating concurrent marking In-Reply-To: <1481900106.2807.20.camel@redhat.com> References: <1481900106.2807.20.camel@redhat.com> Message-ID: <338d66f1-a46b-dfb3-9a89-03010387c36b@redhat.com> Hi Roman, - taskqueue Adding force termination to TerminatorTerminator seems more logical to me class TerminatorTerminator: public CHeapObj { public: virtual bool should_exit_termination() = 0; virtual bool should_force_termination() = 0; }; - shenandoahConcurrentMark.cpp #392 Please update assert message. Otherwise, look good to me. Thanks, -Zhengyu On 12/16/2016 09:55 AM, Roman Kennke wrote: > This patch implements what I call 'degenerating concurrent marking'. > If, during concurrent mark, we run out of memory, instead of stopping, > throwing away all marking data and doing a full-gc, it gracefully hands > over all existing marking work to the subsequent final-mark pause, > finishes marking there, and kicks of normal marking. The idea being > that in most cases, the OOM is not happening because we got into a bad > situation (fragmented heap or such) but only temporary alloc bursts or > such, *and* chances are high that we're almost done marking anyway. > > I made it such that existing mark bitmaps, task queues, SATB buffers > and weakref-queues are left intact, if the heuristics decide to go into > degenerated concurrent marking, then the final-mark pause carries on > where concurrent marking left. Interestingly, the code for this is > mostly in place already ... in final marking we already finish off > marking in the way that we need. 
> > I needed to tweak the termination protocol in the taskqueue for that, > and not clear task queues on cancellation. Instead I added a 'shortcut' > in the case we need to terminate without draining the task queues. > Please look at this carefully, I am not totally sure I got that right. > > In addition, I also re-wrote adaptive heuristics. It will start out > with 10% free threshold (i.e. we start marking when 10% available space > is left), and lower that if we have 5 successful markings in a row, and > bump that up if we fail to complete concurrent marking. We limit the > free threshold 30 > This adaptive heuristics work very well for me, and I'm tempted to make > this default soon. It makes much better use of headroom, which means > fewer GC cycles, and thus better throughput. > > > http://cr.openjdk.java.net/~rkennke/degen-marking/webrev.00/ > > Ok? Opinions? > > Roman > From rkennke at redhat.com Fri Dec 16 19:33:38 2016 From: rkennke at redhat.com (Roman Kennke) Date: Fri, 16 Dec 2016 20:33:38 +0100 Subject: RFR: Degenerating concurrent marking In-Reply-To: <338d66f1-a46b-dfb3-9a89-03010387c36b@redhat.com> References: <1481900106.2807.20.camel@redhat.com> <338d66f1-a46b-dfb3-9a89-03010387c36b@redhat.com> Message-ID: <1481916818.2807.27.camel@redhat.com> Am Freitag, den 16.12.2016, 12:02 -0500 schrieb Zhengyu Gu: > Hi Roman, > > - taskqueue > ?? > ???Adding force termination to TerminatorTerminator seems more > logical to me > > class TerminatorTerminator: public CHeapObj { > public: > ???virtual bool should_exit_termination() = 0; > ???virtual bool should_force_termination() = 0; > }; > > - shenandoahConcurrentMark.cpp #392 > > ???Please update assert message. Ok. Like this: http://cr.openjdk.java.net/~rkennke/degen-marking/webrev.01/ Roman > > > Otherwise, look good to me. > > Thanks, > > -Zhengyu > > > On 12/16/2016 09:55 AM, Roman Kennke wrote: > > This patch implements what I call 'degenerating concurrent > > marking'. 
> > If, during concurrent mark, we run out of memory, instead of > > stopping, > > throwing away all marking data and doing a full-gc, it gracefully > > hands > > over all existing marking work to the subsequent final-mark pause, > > finishes marking there, and kicks of normal marking. The idea being > > that in most cases, the OOM is not happening because we got into a > > bad > > situation (fragmented heap or such) but only temporary alloc bursts > > or > > such, *and* chances are high that we're almost done marking anyway. > > > > I made it such that existing mark bitmaps, task queues, SATB > > buffers > > and weakref-queues are left intact, if the heuristics decide to go > > into > > degenerated concurrent marking, then the final-mark pause carries > > on > > where concurrent marking left. Interestingly, the code for this is > > mostly in place already ... in final marking we already finish off > > marking in the way that we need. > > > > I needed to tweak the termination protocol in the taskqueue for > > that, > > and not clear task queues on cancellation. Instead I added a > > 'shortcut' > > in the case we need to terminate without draining the task queues. > > Please look at this carefully, I am not totally sure I got that > > right. > > > > In addition, I also re-wrote adaptive heuristics. It will start out > > with 10% free threshold (i.e. we start marking when 10% available > > space > > is left), and lower that if we have 5 successful markings in a row, > > and > > bump that up if we fail to complete concurrent marking. We limit > > the > > free threshold 30 > configured. > > > > This adaptive heuristics work very well for me, and I'm tempted to > > make > > this default soon. It makes much better use of headroom, which > > means > > fewer GC cycles, and thus better throughput. > > > > > > http://cr.openjdk.java.net/~rkennke/degen-marking/webrev.00/ > > > > Ok? Opinions? 
> > > > Roman > > > > From roman at kennke.org Sat Dec 17 13:30:55 2016 From: roman at kennke.org (roman at kennke.org) Date: Sat, 17 Dec 2016 13:30:55 +0000 Subject: hg: shenandoah/jdk9/hotspot: Ensure metadata alive for Shenandoah too. Message-ID: <201612171330.uBHDUtJQ028583@aojmv0008.oracle.com> Changeset: baec38f7a7e5 Author: rkennke Date: 2016-12-17 14:30 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/baec38f7a7e5 Ensure metadata alive for Shenandoah too. ! src/share/vm/ci/ciObjectFactory.cpp From rkennke at redhat.com Sat Dec 17 13:32:18 2016 From: rkennke at redhat.com (Roman Kennke) Date: Sat, 17 Dec 2016 14:32:18 +0100 Subject: FYI: Ensure metadata alive for Shenandoah too Message-ID: <1481981538.2807.29.camel@redhat.com> I pushed the following fix. It fixes an occasional assert about a root object not being marked. diff --git a/src/share/vm/ci/ciObjectFactory.cpp b/src/share/vm/ci/ciObjectFactory.cpp --- a/src/share/vm/ci/ciObjectFactory.cpp +++ b/src/share/vm/ci/ciObjectFactory.cpp @@ -413,7 +413,7 @@ ???ASSERT_IN_VM; // We're handling raw oops here. ? ?#if INCLUDE_ALL_GCS -??if (!UseG1GC) { +??if (!(UseG1GC || UseShenandoahGC)) { ?????return; ???} ???Klass* metadata_owner_klass; From rkennke at redhat.com Sat Dec 17 13:41:23 2016 From: rkennke at redhat.com (Roman Kennke) Date: Sat, 17 Dec 2016 14:41:23 +0100 Subject: RFR: Degenerating concurrent marking In-Reply-To: <338d66f1-a46b-dfb3-9a89-03010387c36b@redhat.com> References: <1481900106.2807.20.camel@redhat.com> <338d66f1-a46b-dfb3-9a89-03010387c36b@redhat.com> Message-ID: <1481982083.2807.31.camel@redhat.com> As suggested by Zhengyu on IRC, I now changed it to: ????if (terminator != NULL && terminator->should_force_termination()) { ??????return true; ????} makes the code more readable. The assert that I observed was not caused by this change and is already fixed. Ok to go now? 
http://cr.openjdk.java.net/~rkennke/degen-marking/webrev.03 Roman Am Freitag, den 16.12.2016, 12:02 -0500 schrieb Zhengyu Gu: > Hi Roman, > > - taskqueue > > Adding force termination to TerminatorTerminator seems more > logical to me > > class TerminatorTerminator: public CHeapObj { > public: >   virtual bool should_exit_termination() = 0; >   virtual bool should_force_termination() = 0; > }; > > - shenandoahConcurrentMark.cpp #392 > > Please update assert message. > > > Otherwise, look good to me. > > Thanks, > > -Zhengyu > > > On 12/16/2016 09:55 AM, Roman Kennke wrote: > > This patch implements what I call 'degenerating concurrent > > marking'. > > If, during concurrent mark, we run out of memory, instead of > > stopping, > > throwing away all marking data and doing a full-gc, it gracefully > > hands > > over all existing marking work to the subsequent final-mark pause, > > finishes marking there, and kicks off normal marking. The idea being > > that in most cases, the OOM is not happening because we got into a > > bad > > situation (fragmented heap or such) but only temporary alloc bursts > > or > > such, *and* chances are high that we're almost done marking anyway. > > > > I made it such that existing mark bitmaps, task queues, SATB > > buffers > > and weakref-queues are left intact; if the heuristics decide to go > > into > > degenerated concurrent marking, then the final-mark pause carries > > on > > where concurrent marking left off. Interestingly, the code for this is > > mostly in place already ... in final marking we already finish off > > marking in the way that we need. > > > > I needed to tweak the termination protocol in the taskqueue for > > that, > > and not clear task queues on cancellation. Instead I added a > > 'shortcut' > > in the case we need to terminate without draining the task queues. > > Please look at this carefully, I am not totally sure I got that > > right. > > > > In addition, I also re-wrote adaptive heuristics. 
It will start out > > with 10% free threshold (i.e. we start marking when 10% available > > space > > is left), and lower that if we have 5 successful markings in a row, > > and > > bump that up if we fail to complete concurrent marking. We limit > > the > > free threshold 30 > configured. > > > > These adaptive heuristics work very well for me, and I'm tempted to > > make > > this the default soon. It makes much better use of headroom, which > > means > > fewer GC cycles, and thus better throughput. > > > > > > http://cr.openjdk.java.net/~rkennke/degen-marking/webrev.00/ > > > > Ok? Opinions? > > > > Roman > > > > From rkennke at redhat.com Sat Dec 17 15:52:34 2016 From: rkennke at redhat.com (Roman Kennke) Date: Sat, 17 Dec 2016 16:52:34 +0100 Subject: RFR: Degenerating concurrent marking In-Reply-To: <1481982083.2807.31.camel@redhat.com> References: <1481900106.2807.20.camel@redhat.com> <338d66f1-a46b-dfb3-9a89-03010387c36b@redhat.com> <1481982083.2807.31.camel@redhat.com> Message-ID: <1481989954.2807.32.camel@redhat.com> Am Samstag, den 17.12.2016, 14:41 +0100 schrieb Roman Kennke: > As suggested by Zhengyu on IRC, I now changed it to: > >     if (terminator != NULL && terminator->should_force_termination()) > { >       return true; >     } > > makes the code more readable. Hmm, no, this didn't work. We need the spinning as was proposed before: http://cr.openjdk.java.net/~rkennke/degen-marking/webrev.04 This passes all tests that I throw at it. ok to push? Roman > > The assert that I observed was not caused by this change and is > already > fixed. > > Ok to go now? > > http://cr.openjdk.java.net/~rkennke/degen-marking/webrev.03 > > Roman > > > Am Freitag, den 16.12.2016, 12:02 -0500 schrieb Zhengyu Gu: > > Hi Roman, > > - taskqueue > > 
> > Adding force termination to TerminatorTerminator seems more > > logical to me > > > > class TerminatorTerminator: public CHeapObj { > > public: > >   virtual bool should_exit_termination() = 0; > >   virtual bool should_force_termination() = 0; > > }; > > > > - shenandoahConcurrentMark.cpp #392 > > > > Please update assert message. > > > > > > Otherwise, look good to me. > > > > Thanks, > > > > -Zhengyu > > > > > > On 12/16/2016 09:55 AM, Roman Kennke wrote: > > > This patch implements what I call 'degenerating concurrent > > > marking'. > > > If, during concurrent mark, we run out of memory, instead of > > > stopping, > > > throwing away all marking data and doing a full-gc, it gracefully > > > hands > > > over all existing marking work to the subsequent final-mark > > > pause, > > > finishes marking there, and kicks off normal marking. The idea > > > being > > > that in most cases, the OOM is not happening because we got into > > > a > > > bad > > > situation (fragmented heap or such) but only temporary alloc > > > bursts > > > or > > > such, *and* chances are high that we're almost done marking > > > anyway. > > > > > > I made it such that existing mark bitmaps, task queues, SATB > > > buffers > > > and weakref-queues are left intact, if the heuristics decide to > > > go > > > into > > > degenerated concurrent marking, then the final-mark pause carries > > > on > > > where concurrent marking left. Interestingly, the code for this > > > is > > > mostly in place already ... in final marking we already finish > > > off > > > marking in the way that we need. > > > > > > I needed to tweak the termination protocol in the taskqueue for > > > that, > > > and not clear task queues on cancellation. Instead I added a > > > 'shortcut' > > > in the case we need to terminate without draining the task > > > queues. > > > Please look at this carefully, I am not totally sure I got that > > > right. > > > > > > In addition, I also re-wrote adaptive heuristics. 
It will start > > > out > > > with 10% free threshold (i.e. we start marking when 10% available > > > space > > > is left), and lower that if we have 5 successful markings in a > > > row, > > > and > > > bump that up if we fail to complete concurrent marking. We limit > > > the > > > free threshold 30 > > configured. > > > > > > This adaptive heuristics work very well for me, and I'm tempted > > > to > > > make > > > this default soon. It makes much better use of headroom, which > > > means > > > fewer GC cycles, and thus better throughput. > > > > > > > > > http://cr.openjdk.java.net/~rkennke/degen-marking/webrev.00/ > > > > > > Ok? Opinions? > > > > > > Roman > > > > > > > From rkennke at redhat.com Sun Dec 18 13:24:46 2016 From: rkennke at redhat.com (Roman Kennke) Date: Sun, 18 Dec 2016 14:24:46 +0100 Subject: RFR: Fix freeze on OOM during evacuation Message-ID: <1482067486.2807.35.camel@redhat.com> The run_service() loop in ShenandoahConcurrentThread can still deadlock when OOM happens during evacuation: when we get out of final-mark, but have not yet started the GC threads, a Java thread could OOM and the ShenandoahConcurrentThread never get to resetting the evacuation-in- progress flag. The Java thread would wait forever and not get to a safepoint, while the GC waits for Java threads to get to safepoint for the next pause. The change fixes it by always resetting the evac flag when coming out of service_normal_cycle(). Tested by running SPECjvm in a loop 24hours and jcstress. http://cr.openjdk.java.net/~rkennke/fix-oom-evac/webrev.00 Ok to push? Roman From roman at kennke.org Mon Dec 19 11:05:42 2016 From: roman at kennke.org (roman at kennke.org) Date: Mon, 19 Dec 2016 11:05:42 +0000 Subject: hg: shenandoah/jdk8u/hotspot: Ensure metadata alive for Shenandoah too. 
Message-ID: <201612191105.uBJB5gl4024641@aojmv0008.oracle.com> Changeset: 91b6e4811a5f Author: rkennke Date: 2016-12-19 12:05 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/91b6e4811a5f Ensure metadata alive for Shenandoah too. ! src/share/vm/ci/ciObjectFactory.cpp From roman at kennke.org Mon Dec 19 14:17:55 2016 From: roman at kennke.org (roman at kennke.org) Date: Mon, 19 Dec 2016 14:17:55 +0000 Subject: hg: shenandoah/jdk8u/hotspot: Added missing read-barrier to inline_unsafe_ordered_store() in C2 intrinsics. Message-ID: <201612191417.uBJEHtTI017325@aojmv0008.oracle.com> Changeset: c7ccb4a2b360 Author: rkennke Date: 2016-12-19 15:17 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/c7ccb4a2b360 Added missing read-barrier to inline_unsafe_ordered_store() in C2 intrinsics. ! src/share/vm/opto/library_call.cpp From rkennke at redhat.com Mon Dec 19 14:21:08 2016 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 19 Dec 2016 15:21:08 +0100 Subject: FYI: (Re-)Added missing read barrier in C2's inline_unsafe_ordered_store() intrinsic Message-ID: <1482157268.2807.42.camel@redhat.com> This one fell under the table, probably because it's not present in jdk9. http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/c7ccb4a2b360 Roman From zgu at redhat.com Mon Dec 19 14:25:54 2016 From: zgu at redhat.com (Zhengyu Gu) Date: Mon, 19 Dec 2016 09:25:54 -0500 Subject: FYI: (Re-)Added missing read barrier in C2's inline_unsafe_ordered_store() intrinsic In-Reply-To: <1482157268.2807.42.camel@redhat.com> References: <1482157268.2807.42.camel@redhat.com> Message-ID: <4f19513a-e4b0-a606-8dad-a114e41a610c@redhat.com> Should the read barrier be shenandoah only? -Zhengyu On 12/19/2016 09:21 AM, Roman Kennke wrote: > This one fell under the table, probably because it's not present in > jdk9. 
> > http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/c7ccb4a2b360 > > Roman > From rkennke at redhat.com Mon Dec 19 14:28:20 2016 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 19 Dec 2016 15:28:20 +0100 Subject: FYI: (Re-)Added missing read barrier in C2's inline_unsafe_ordered_store() intrinsic In-Reply-To: <4f19513a-e4b0-a606-8dad-a114e41a610c@redhat.com> References: <1482157268.2807.42.camel@redhat.com> <4f19513a-e4b0-a606-8dad-a114e41a610c@redhat.com> Message-ID: <1482157700.2807.43.camel@redhat.com> Am Montag, den 19.12.2016, 09:25 -0500 schrieb Zhengyu Gu: > Should the read barrier be shenandoah only? It already is. Yes, we should refactor this to be more obvious. Roman > > -Zhengyu > > > On 12/19/2016 09:21 AM, Roman Kennke wrote: > > This one fell under the table, probably because it's not present in > > jdk9. > > > > http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/c7ccb4a2b36 > > 0 > > > > Roman > > > > From zgu at redhat.com Mon Dec 19 14:30:00 2016 From: zgu at redhat.com (Zhengyu Gu) Date: Mon, 19 Dec 2016 09:30:00 -0500 Subject: FYI: (Re-)Added missing read barrier in C2's inline_unsafe_ordered_store() intrinsic In-Reply-To: <1482157700.2807.43.camel@redhat.com> References: <1482157268.2807.42.camel@redhat.com> <4f19513a-e4b0-a606-8dad-a114e41a610c@redhat.com> <1482157700.2807.43.camel@redhat.com> Message-ID: <4867f1d8-e97f-68fb-c6cb-9c29af4ed317@redhat.com> Okay, Thanks, -Zhengyu On 12/19/2016 09:28 AM, Roman Kennke wrote: > Am Montag, den 19.12.2016, 09:25 -0500 schrieb Zhengyu Gu: >> Should the read barrier be shenandoah only? > It already is. > > Yes, we should refactor this to be more obvious. > > Roman > >> -Zhengyu >> >> >> On 12/19/2016 09:21 AM, Roman Kennke wrote: >>> This one fell under the table, probably because it's not present in >>> jdk9. 
>>> >>> http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/c7ccb4a2b36 >>> 0 >>> >>> Roman >>> >> From rwestrel at redhat.com Mon Dec 19 14:35:30 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Mon, 19 Dec 2016 15:35:30 +0100 Subject: FYI: (Re-)Added missing read barrier in C2's inline_unsafe_ordered_store() intrinsic In-Reply-To: <1482157268.2807.42.camel@redhat.com> References: <1482157268.2807.42.camel@redhat.com> Message-ID: > This one fell under the table, probably because it's not present in > jdk9. > > http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/c7ccb4a2b360 Thanks for fixing that. Roland. From shade at redhat.com Mon Dec 19 15:43:01 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 19 Dec 2016 16:43:01 +0100 Subject: RFR: Fix freeze on OOM during evacuation In-Reply-To: <1482067486.2807.35.camel@redhat.com> References: <1482067486.2807.35.camel@redhat.com> Message-ID: On 12/18/2016 02:24 PM, Roman Kennke wrote: > http://cr.openjdk.java.net/~rkennke/fix-oom-evac/webrev.00 This webrev is contaminated with degenerate conc mark patch? -Aleksey From rkennke at redhat.com Mon Dec 19 16:25:52 2016 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 19 Dec 2016 17:25:52 +0100 Subject: RFR: Fix freeze on OOM during evacuation In-Reply-To: References: <1482067486.2807.35.camel@redhat.com> Message-ID: <1482164752.2807.45.camel@redhat.com> Am Montag, den 19.12.2016, 16:43 +0100 schrieb Aleksey Shipilev: > On 12/18/2016 02:24 PM, Roman Kennke wrote: > > http://cr.openjdk.java.net/~rkennke/fix-oom-evac/webrev.00 > > This webrev is contaminated with degenerate conc mark patch? Duh. http://cr.openjdk.java.net/~rkennke/fix-oom-evac/webrev.01/ Roman From roman at kennke.org Mon Dec 19 16:33:31 2016 From: roman at kennke.org (roman at kennke.org) Date: Mon, 19 Dec 2016 16:33:31 +0000 Subject: hg: shenandoah/jdk8u/hotspot: Add missing eq barrier in opto runtime. 
Message-ID: <201612191633.uBJGXV2r022457@aojmv0008.oracle.com> Changeset: eb39f84890cb Author: rkennke Date: 2016-12-19 17:33 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/eb39f84890cb Add missing eq barrier in opto runtime. ! src/share/vm/opto/runtime.cpp From rkennke at redhat.com Mon Dec 19 16:34:33 2016 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 19 Dec 2016 17:34:33 +0100 Subject: FYI: (Re-) add object eq barrier in OptoRuntime Message-ID: <1482165273.2807.46.camel@redhat.com> Another one that probably got lost because it did not exist in jdk9... http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/eb39f84890cb Roman From shade at redhat.com Mon Dec 19 18:41:05 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 19 Dec 2016 19:41:05 +0100 Subject: RFR: Fix freeze on OOM during evacuation In-Reply-To: <1482164752.2807.45.camel@redhat.com> References: <1482067486.2807.35.camel@redhat.com> <1482164752.2807.45.camel@redhat.com> Message-ID: <64c2ba03-e1c2-8d1f-9003-2781d9893639@redhat.com> On 12/19/2016 05:25 PM, Roman Kennke wrote: > Am Montag, den 19.12.2016, 16:43 +0100 schrieb Aleksey Shipilev: >> On 12/18/2016 02:24 PM, Roman Kennke wrote: >>> http://cr.openjdk.java.net/~rkennke/fix-oom-evac/webrev.00 >> >> This webrev is contaminated with degenerate conc mark patch? > > Duh. > > http://cr.openjdk.java.net/~rkennke/fix-oom-evac/webrev.01/ Okay. -Aleksey From rwestrel at redhat.com Mon Dec 19 20:10:00 2016 From: rwestrel at redhat.com (Roland Westrelin) Date: Mon, 19 Dec 2016 21:10:00 +0100 Subject: FYI: (Re-) add object eq barrier in OptoRuntime In-Reply-To: <1482165273.2807.46.camel@redhat.com> References: <1482165273.2807.46.camel@redhat.com> Message-ID: > Another one that probably got lost because it did not exist in jdk9... > > http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/eb39f84890cb Thanks for fixing that one too. Roland. 
From zgu at redhat.com Mon Dec 19 21:23:30 2016 From: zgu at redhat.com (Zhengyu Gu) Date: Mon, 19 Dec 2016 16:23:30 -0500 Subject: RFR: Degenerating concurrent marking In-Reply-To: <1481989954.2807.32.camel@redhat.com> References: <1481900106.2807.20.camel@redhat.com> <338d66f1-a46b-dfb3-9a89-03010387c36b@redhat.com> <1481982083.2807.31.camel@redhat.com> <1481989954.2807.32.camel@redhat.com> Message-ID: Okay. -Zhengyu On 12/17/2016 10:52 AM, Roman Kennke wrote: > Am Samstag, den 17.12.2016, 14:41 +0100 schrieb Roman Kennke: >> As suggested by Zhengyu on IRC, I now changed it to: >> >> if (terminator != NULL && terminator->should_force_termination()) >> { >> return true; >> } >> >> makes the code more readable. > Hmm, no, this didn't work. We need the spinning as was proposed before: > > http://cr.openjdk.java.net/~rkennke/degen-marking/webrev.04 > > This passes all tests that I throw at it. > > ok to push? > > Roman > >> The assert that I observed was not caused by this change and is >> already >> fixed. >> >> Ok to go now? >> >> http://cr.openjdk.java.net/~rkennke/degen-marking/webrev.03 >> >> Roman >> >> >> Am Freitag, den 16.12.2016, 12:02 -0500 schrieb Zhengyu Gu: >>> Hi Roman, >>> >>> - taskqueue >>> >>> Adding force termination to TerminatorTerminator seems more >>> logical to me >>> >>> class TerminatorTerminator: public CHeapObj { >>> public: >>> virtual bool should_exit_termination() = 0; >>> virtual bool should_force_termination() = 0; >>> }; >>> >>> - shenandoahConcurrentMark.cpp #392 >>> >>> Please update assert message. >>> >>> >>> Otherwise, look good to me. >>> >>> Thanks, >>> >>> -Zhengyu >>> >>> >>> On 12/16/2016 09:55 AM, Roman Kennke wrote: >>>> This patch implements what I call 'degenerating concurrent >>>> marking'. 
>>>> If, during concurrent mark, we run out of memory, instead of >>>> stopping, >>>> throwing away all marking data and doing a full-gc, it gracefully >>>> hands >>>> over all existing marking work to the subsequent final-mark >>>> pause, >>>> finishes marking there, and kicks of normal marking. The idea >>>> being >>>> that in most cases, the OOM is not happening because we got into >>>> a >>>> bad >>>> situation (fragmented heap or such) but only temporary alloc >>>> bursts >>>> or >>>> such, *and* chances are high that we're almost done marking >>>> anyway. >>>> >>>> I made it such that existing mark bitmaps, task queues, SATB >>>> buffers >>>> and weakref-queues are left intact, if the heuristics decide to >>>> go >>>> into >>>> degenerated concurrent marking, then the final-mark pause carries >>>> on >>>> where concurrent marking left. Interestingly, the code for this >>>> is >>>> mostly in place already ... in final marking we already finish >>>> off >>>> marking in the way that we need. >>>> >>>> I needed to tweak the termination protocol in the taskqueue for >>>> that, >>>> and not clear task queues on cancellation. Instead I added a >>>> 'shortcut' >>>> in the case we need to terminate without draining the task >>>> queues. >>>> Please look at this carefully, I am not totally sure I got that >>>> right. >>>> >>>> In addition, I also re-wrote adaptive heuristics. It will start >>>> out >>>> with 10% free threshold (i.e. we start marking when 10% available >>>> space >>>> is left), and lower that if we have 5 successful markings in a >>>> row, >>>> and >>>> bump that up if we fail to complete concurrent marking. We limit >>>> the >>>> free threshold 30>>> configured. >>>> >>>> This adaptive heuristics work very well for me, and I'm tempted >>>> to >>>> make >>>> this default soon. It makes much better use of headroom, which >>>> means >>>> fewer GC cycles, and thus better throughput. 
>>>> >>>> >>>> http://cr.openjdk.java.net/~rkennke/degen-marking/webrev.00/ >>>> >>>> Ok? Opinions? >>>> >>>> Roman >>>> >>> From roman at kennke.org Mon Dec 19 21:48:36 2016 From: roman at kennke.org (roman at kennke.org) Date: Mon, 19 Dec 2016 21:48:36 +0000 Subject: hg: shenandoah/jdk9/hotspot: 2 new changesets Message-ID: <201612192148.uBJLmaZB012487@aojmv0008.oracle.com> Changeset: fc0c2ad9497d Author: rkennke Date: 2016-12-19 22:09 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/fc0c2ad9497d Fix freeze on OOM during evacuation ! src/share/vm/gc/shenandoah/shenandoahConcurrentThread.cpp ! src/share/vm/gc/shenandoah/shenandoahHeap.cpp ! src/share/vm/gc/shenandoah/shenandoahHeap.inline.hpp Changeset: 84363ca14be9 Author: rkennke Date: 2016-12-19 22:48 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/84363ca14be9 Degenerating concurrent marking ! src/share/vm/gc/shared/taskqueue.cpp ! src/share/vm/gc/shared/taskqueue.hpp ! src/share/vm/gc/shenandoah/shenandoahCollectorPolicy.cpp ! src/share/vm/gc/shenandoah/shenandoahCollectorPolicy.hpp ! src/share/vm/gc/shenandoah/shenandoahConcurrentMark.cpp ! src/share/vm/gc/shenandoah/shenandoahConcurrentThread.cpp ! src/share/vm/gc/shenandoah/shenandoahHeap.hpp ! src/share/vm/gc/shenandoah/shenandoahTaskqueue.cpp ! src/share/vm/gc/shenandoah/shenandoahTaskqueue.hpp ! src/share/vm/gc/shenandoah/shenandoah_globals.hpp From rwestrel at redhat.com Tue Dec 20 10:04:08 2016 From: rwestrel at redhat.com (rwestrel at redhat.com) Date: Tue, 20 Dec 2016 10:04:08 +0000 Subject: hg: shenandoah/jdk8u/hotspot: null check bypasses read barrier Message-ID: <201612201004.uBKA48KQ012542@aojmv0008.oracle.com> Changeset: 05f696d8443b Author: roland Date: 2016-12-20 11:03 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/05f696d8443b null check bypasses read barrier ! 
src/share/vm/opto/compile.cpp From shade at redhat.com Tue Dec 20 10:57:21 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 20 Dec 2016 11:57:21 +0100 Subject: RFR (XS): Enable UseCountedLoopSafepoints with Shenandoah Message-ID: <9833e763-9b3b-8246-7584-20a8e033f4c6@redhat.com> Hi, Since we care mostly about pause times, and not the raw throughput, it makes sense to enable safepoints in counted loops. This makes us much more responsive (as in, TTSP is lower) in many interesting scenarios. Change: http://cr.openjdk.java.net/~shade/shenandoah/counted-loops/webrev.01/ The easiest example that is present in any workload of interest is looping through a large array/ArrayList. SPECjvm2008 throughput does appear affected where tight loops are present:

Benchmark                Mode  Cnt      Score     Error    Units

# -XX:-UseCountedLoopSafepoints
Compiler.compiler       thrpt   30    217.169 ±   5.166  ops/min
Compiler.sunflow        thrpt   30    473.940 ±  20.246  ops/min
Compress.test           thrpt   15    647.552 ±   3.528  ops/min
CryptoAes.test          thrpt   15     44.367 ±   2.402  ops/min
CryptoRsa.test          thrpt   15   2066.495 ±  11.809  ops/min
CryptoSignVerify.test   thrpt   15  10372.019 ±  50.713  ops/min
Derby.test              thrpt   30    375.954 ±  13.539  ops/min
MpegAudio.test          thrpt   15    197.299 ±   2.411  ops/min
ScimarkFFT.large        thrpt   15     55.618 ±   0.142  ops/min
ScimarkFFT.small        thrpt   15    664.370 ±   7.304  ops/min
ScimarkLU.large         thrpt   15     14.767 ±   0.082  ops/min
ScimarkLU.small         thrpt   15    926.435 ±   8.790  ops/min
ScimarkMonteCarlo.test  thrpt   15   4508.333 ±  68.869  ops/min
ScimarkSOR.large        thrpt   15     74.596 ±   0.052  ops/min
ScimarkSOR.small        thrpt   15    466.186 ±   1.308  ops/min
ScimarkSparse.large     thrpt   15     48.932 ±  11.991  ops/min
ScimarkSparse.small     thrpt   15    360.907 ±   6.739  ops/min
Serial.test             thrpt   30   8779.857 ±  77.717    ops/s
Sunflow.test            thrpt   15    124.546 ±   2.110  ops/min
XmlTransform.test       thrpt   20    429.422 ±  24.964  ops/min
XmlValidation.test      thrpt   30    773.254 ±   8.561  ops/min

# -XX:+UseCountedLoopSafepoints
Compiler.compiler       thrpt   20    213.199 ±   8.146  ops/min
Compiler.sunflow        thrpt   27    486.745 ±  21.118  ops/min
Compress.test           thrpt   15    637.303 ±   4.800  ops/min  <---  -1.5%
CryptoAes.test          thrpt   15     46.943 ±   0.345  ops/min
CryptoRsa.test          thrpt   15   2042.072 ±  12.379  ops/min  <---  -1.1%
CryptoSignVerify.test   thrpt   15  10240.459 ±  63.095  ops/min
Derby.test              thrpt   30    406.943 ±  12.625  ops/min
MpegAudio.test          thrpt   15    193.173 ±   1.414  ops/min
ScimarkFFT.large        thrpt   15     55.629 ±   0.104  ops/min
ScimarkFFT.small        thrpt   15    669.153 ±   6.683  ops/min
ScimarkLU.large         thrpt   15     13.510 ±   0.075  ops/min  <---  -8.5%
ScimarkLU.small         thrpt   15    581.737 ±   6.539  ops/min  <--- -37.3%
ScimarkMonteCarlo.test  thrpt   15   4485.049 ±  11.864  ops/min
ScimarkSOR.large        thrpt   15     74.594 ±   0.045  ops/min
ScimarkSOR.small        thrpt   15    421.046 ±   0.456  ops/min  <---  -9.6%
ScimarkSparse.large     thrpt   15     40.995 ±   0.283  ops/min
ScimarkSparse.small     thrpt   15    319.079 ±   1.391  ops/min  <--- -11.3%
Serial.test             thrpt   30   8717.823 ±  81.147    ops/s
Sunflow.test            thrpt   15    127.221 ±   1.578  ops/min
XmlTransform.test       thrpt   20    445.762 ±   8.278  ops/min
XmlValidation.test      thrpt   30    760.121 ±   9.963  ops/min

Note that Scimark are expected to regress that much: they do have very tight loops, and that's our problem: the TTSP times there are in multi-second range! The difference is explained by different code generation. For example, in the most dramatic ScimarkLU.small case: Hottest loop uses AVX2 (vmovdqu and friends): http://cr.openjdk.java.net/~shade/shenandoah/counted-loops/scimark-lu-shenandoah-minus.perfasm Hottest loop uses AVX (vmovsd and friends): http://cr.openjdk.java.net/~shade/shenandoah/counted-loops/scimark-lu-shenandoah-plus.perfasm As such, I believe enabling this by default, and figuring out code quality issues as we go forward is the sane tactics. 
Thanks, -Aleksey From rkennke at redhat.com Tue Dec 20 11:02:11 2016 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 20 Dec 2016 12:02:11 +0100 Subject: RFR (XS): Enable UseCountedLoopSafepoints with Shenandoah In-Reply-To: <9833e763-9b3b-8246-7584-20a8e033f4c6@redhat.com> References: <9833e763-9b3b-8246-7584-20a8e033f4c6@redhat.com> Message-ID: <1482231731.2807.54.camel@redhat.com> Am Dienstag, den 20.12.2016, 11:57 +0100 schrieb Aleksey Shipilev: > Hi, > > Since we care mostly about pause times, and not the raw throughput, > it makes > sense to enable safepoints in counted loops. This makes us much more > responsive > (as in, TTSP is lower) in many interesting scenarios. > > Change: > http://cr.openjdk.java.net/~shade/shenandoah/counted-loops/webrev.0 > 1/ > > The easiest example that is present in any workload of interest is > looping > through a large array/ArrayList. > > SPECjvm2008 throughput does appear affected where tight loops are > present: >
> Benchmark                Mode  Cnt      Score     Error    Units
>
> # -XX:-UseCountedLoopSafepoints
> Compiler.compiler       thrpt   30    217.169 ±   5.166  ops/min
> Compiler.sunflow        thrpt   30    473.940 ±  20.246  ops/min
> Compress.test           thrpt   15    647.552 ±   3.528  ops/min
> CryptoAes.test          thrpt   15     44.367 ±   2.402  ops/min
> CryptoRsa.test          thrpt   15   2066.495 ±  11.809  ops/min
> CryptoSignVerify.test   thrpt   15  10372.019 ±  50.713  ops/min
> Derby.test              thrpt   30    375.954 ±  13.539  ops/min
> MpegAudio.test          thrpt   15    197.299 ±   2.411  ops/min
> ScimarkFFT.large        thrpt   15     55.618 ±   0.142  ops/min
> ScimarkFFT.small        thrpt   15    664.370 ±   7.304  ops/min
> ScimarkLU.large         thrpt   15     14.767 ±   0.082  ops/min
> ScimarkLU.small         thrpt   15    926.435 ±   8.790  ops/min
> ScimarkMonteCarlo.test  thrpt   15   4508.333 ±  68.869  ops/min
> ScimarkSOR.large        thrpt   15     74.596 ±   0.052  ops/min
> ScimarkSOR.small        thrpt   15    466.186 ±   1.308  ops/min
> ScimarkSparse.large     thrpt   15     48.932 ±  11.991  ops/min
> ScimarkSparse.small     thrpt   15    360.907 ±   6.739  ops/min
> Serial.test             thrpt   30   8779.857 ±  77.717    ops/s
> Sunflow.test            thrpt   15    124.546 ±   2.110  ops/min
> XmlTransform.test       thrpt   20    429.422 ±  24.964  ops/min
> XmlValidation.test      thrpt   30    773.254 ±   8.561  ops/min
>
> # -XX:+UseCountedLoopSafepoints
> Compiler.compiler       thrpt   20    213.199 ±   8.146  ops/min
> Compiler.sunflow        thrpt   27    486.745 ±  21.118  ops/min
> Compress.test           thrpt   15    637.303 ±   4.800  ops/min  <---  -1.5%
> CryptoAes.test          thrpt   15     46.943 ±   0.345  ops/min
> CryptoRsa.test          thrpt   15   2042.072 ±  12.379  ops/min  <---  -1.1%
> CryptoSignVerify.test   thrpt   15  10240.459 ±  63.095  ops/min
> Derby.test              thrpt   30    406.943 ±  12.625  ops/min
> MpegAudio.test          thrpt   15    193.173 ±   1.414  ops/min
> ScimarkFFT.large        thrpt   15     55.629 ±   0.104  ops/min
> ScimarkFFT.small        thrpt   15    669.153 ±   6.683  ops/min
> ScimarkLU.large         thrpt   15     13.510 ±   0.075  ops/min  <---  -8.5%
> ScimarkLU.small         thrpt   15    581.737 ±   6.539  ops/min  <--- -37.3%
> ScimarkMonteCarlo.test  thrpt   15   4485.049 ±  11.864  ops/min
> ScimarkSOR.large        thrpt   15     74.594 ±   0.045  ops/min
> ScimarkSOR.small        thrpt   15    421.046 ±   0.456  ops/min  <---  -9.6%
> ScimarkSparse.large     thrpt   15     40.995 ±   0.283  ops/min
> ScimarkSparse.small     thrpt   15    319.079 ±   1.391  ops/min  <--- -11.3%
> Serial.test             thrpt   30   8717.823 ±  81.147    ops/s
> Sunflow.test            thrpt   15    127.221 ±   1.578  ops/min
> XmlTransform.test       thrpt   20    445.762 ±   8.278  ops/min
> XmlValidation.test      thrpt   30    760.121 ±   9.963  ops/min
>
> Note that Scimark are expected to regress that much: they do have > very tight > loops, and that's our problem: the TTSP times there are in multi- > second range! > The difference is explained by different code generation. For > example, in most > dramatic ScimarkLU.small case: > > Hottest loop uses AVX2 (vmovdqu and friends): > > http://cr.openjdk.java.net/~shade/shenandoah/counted-loops/scimark-lu > -shenandoah-minus.perfasm > > Hottest loop uses AVX (vmovsd and friends): > > http://cr.openjdk.java.net/~shade/shenandoah/counted-loops/scimark-lu > -shenandoah-plus.perfasm > > As such, I believe enabling this by default, and figuring out code > quality > issues as we go forward is the sane tactics. Yes. The regressions, especially in scimark.lu are bad, but as you say, the ones that regress are also the ones that show extreme TTSP. The patch is ok for me. Folks who prefer raw throughput and can live with multisecond pause times can still turn the option off :-) In the long run, we should look at strip mining the loops. Roman From ashipile at redhat.com Tue Dec 20 11:09:59 2016 From: ashipile at redhat.com (ashipile at redhat.com) Date: Tue, 20 Dec 2016 11:09:59 +0000 Subject: hg: shenandoah/jdk9/hotspot: Enable UseCountedLoopSafepoints with Shenandoah. Message-ID: <201612201109.uBKB9xPF029971@aojmv0008.oracle.com> Changeset: c2fd76aa8981 Author: shade Date: 2016-12-20 12:09 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/c2fd76aa8981 Enable UseCountedLoopSafepoints with Shenandoah. ! 
src/share/vm/runtime/arguments.cpp From rkennke at redhat.com Tue Dec 20 11:10:56 2016 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 20 Dec 2016 12:10:56 +0100 Subject: RFR: Improve AryEq Message-ID: <1482232256.2807.56.camel@redhat.com> This adds a cmp-barrier to the code generated by AryEq. A false negative in the array ptr comparison would result in the slow-path being taken, even though it's not necessary. The barrier should get us on the fast path more often. Ok? http://cr.openjdk.java.net/~rkennke/aryeq/webrev.00/ Roman From shade at redhat.com Tue Dec 20 11:15:33 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 20 Dec 2016 12:15:33 +0100 Subject: RFR: Improve AryEq In-Reply-To: <1482232256.2807.56.camel@redhat.com> References: <1482232256.2807.56.camel@redhat.com> Message-ID: <743b2c9e-cf5a-f7ac-ef69-1c8c6a47b54a@redhat.com> On 12/20/2016 12:10 PM, Roman Kennke wrote: > This adds a cmp-barrier to the code generated by AryEq. A false > negative in the array ptr comparison would result in the slow-path > being taken, even though it's not necessary. The barrier should get us > on the fast path more often. > > Ok? > > http://cr.openjdk.java.net/~rkennke/aryeq/webrev.00/ Looks okay, but would be interesting to see if we can merge null-checking paths with acmp barrier? 8603 oopDesc::bs()->asm_acmp_barrier(this, ary1, ary2); 8604 jcc(Assembler::equal, TRUE_LABEL); 8605 8606 // Need additional checks for arrays_equals. 
8607 testptr(ary1, ary1); 8608 jcc(Assembler::zero, FALSE_LABEL); 8609 testptr(ary2, ary2); 8610 jcc(Assembler::zero, FALSE_LABEL); Thanks, -Aleksey From rwestrel at redhat.com Tue Dec 20 11:44:13 2016 From: rwestrel at redhat.com (rwestrel at redhat.com) Date: Tue, 20 Dec 2016 11:44:13 +0000 Subject: hg: shenandoah/jdk8u/hotspot: read barrier in unsafe can break C2 graph Message-ID: <201612201144.uBKBiDEO009069@aojmv0008.oracle.com> Changeset: b9bba0d6458d Author: roland Date: 2016-12-20 12:44 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/b9bba0d6458d read barrier in unsafe can break C2 graph ! src/share/vm/opto/library_call.cpp From rwestrel at redhat.com Tue Dec 20 13:29:52 2016 From: rwestrel at redhat.com (rwestrel at redhat.com) Date: Tue, 20 Dec 2016 13:29:52 +0000 Subject: hg: shenandoah/jdk8u/hotspot: add back accidentally dropped write barriers in GraphKit::store_String_* Message-ID: <201612201329.uBKDTqw2009839@aojmv0008.oracle.com> Changeset: 4ba3e50858e2 Author: roland Date: 2016-12-20 14:29 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/4ba3e50858e2 add back accidentally dropped write barriers in GraphKit::store_String_* ! src/share/vm/opto/graphKit.cpp From aph at redhat.com Tue Dec 20 13:32:49 2016 From: aph at redhat.com (Andrew Haley) Date: Tue, 20 Dec 2016 13:32:49 +0000 Subject: RFR (XS): Enable UseCountedLoopSafepoints with Shenandoah In-Reply-To: <9833e763-9b3b-8246-7584-20a8e033f4c6@redhat.com> References: <9833e763-9b3b-8246-7584-20a8e033f4c6@redhat.com> Message-ID: <6c6c262b-c3a2-486f-3f06-19b811fd87c1@redhat.com> On 20/12/16 10:57, Aleksey Shipilev wrote: > Since we care mostly about pause times, and not the raw throughput, it makes > sense to enable safepoints in counted loops. This makes us much more responsive > (as in, TTSP is lower) in many interesting scenarios. True, but I have seen some very interesting cases where we beat G1 in throughput. 
Let's not overdo this: at the very least we need to know how to restore throughput when running Shenandoah; all this business of one flag affecting others can be surprising. Andrew. From rkennke at redhat.com Tue Dec 20 14:01:38 2016 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 20 Dec 2016 15:01:38 +0100 Subject: RFR (XS): Enable UseCountedLoopSafepoints with Shenandoah In-Reply-To: <6c6c262b-c3a2-486f-3f06-19b811fd87c1@redhat.com> References: <9833e763-9b3b-8246-7584-20a8e033f4c6@redhat.com> <6c6c262b-c3a2-486f-3f06-19b811fd87c1@redhat.com> Message-ID: <1482242498.2807.60.camel@redhat.com> On Tuesday, 20.12.2016, 13:32 +0000, Andrew Haley wrote: > On 20/12/16 10:57, Aleksey Shipilev wrote: > > Since we care mostly about pause times, and not the raw throughput, > > it makes > > sense to enable safepoints in counted loops. This makes us much > > more responsive > > (as in, TTSP is lower) in many interesting scenarios. > > True, but I have seen some very interesting cases where we beat G1 in > throughput. Yes. As far as I can see, those are not affected by this (e.g. compiler benchmarks). And multiple seconds (!) just to get to a safepoint seems way too much, and it's more than one program that is affected by this. > Let's not overdo this: at the very least we need to know > how to restore throughput when running Shenandoah; easy: -XX:-UseCountedLoopSafepoints In fact, I've been thinking for a while about a sort of 'priority' setting for Shenandoah, where one could choose between 'throughput' and 'pausetime', and we would turn on or off specific options to improve one or the other, e.g. this UseCountedLoopSafepoints flag, some heuristics settings, and so on. Kind of like the -XX:+AggressiveOpts setting, but towards one or the other priority. However, so far there are not that many settings in this regard, and our priority is always leaning towards pause times anyway... > all this business > of one flag affecting others can be surprising. Indeed.
I would be most worried about turning on code paths that are not used otherwise, and thus run into bugs that are not ours, but in this case it seems to be simple enough. Roman From lennart.borjeson at cinnober.com Tue Dec 20 15:18:31 2016 From: lennart.borjeson at cinnober.com (Lennart Börjeson) Date: Tue, 20 Dec 2016 15:18:31 +0000 Subject: RFR (XS): Enable UseCountedLoopSafepoints with Shenandoah In-Reply-To: <1482242498.2807.60.camel@redhat.com> References: <9833e763-9b3b-8246-7584-20a8e033f4c6@redhat.com> <6c6c262b-c3a2-486f-3f06-19b811fd87c1@redhat.com> <1482242498.2807.60.camel@redhat.com> Message-ID: <9F7DB362-A9F0-4860-965B-ED02714B8026@cinnober.com> I feel I must chip in here. (I'm continuously testing Shenandoah, as well as other JVM variants, with our products, as part of my work.) We have recently encountered an issue with a commercial JVM which had elected to skip safepoint checks for all counted loops. This broke our product, since we have a crucial spin wait in a long-indexed loop. (As you know, the JVM normally inserts safepoint checks in long-indexed, but not in int-indexed, counted loops.) Such a change in behaviour is extremely hard to track down, and I regard it as a significant functional change. I urge you, as I've urged the vendor in question, to keep the "standard" behaviour as default. And BTW, Shenandoah is starting to perform very well in my tests. Our primary metric is transaction roundtrip time, and outlier elimination is important. In my latest tests (of a week ago), Shenandoah had much shorter maximum times than our baseline (which uses Hotspot+ParNew+CMS). You really have done fantastic work this year! Best regards, /Lennart > On 20 Dec 2016, at 15:01, Roman Kennke wrote: > > On Tuesday, 20.12.2016, 13:32 +0000, Andrew Haley wrote: >> On 20/12/16 10:57, Aleksey Shipilev wrote: >>> Since we care mostly about pause times, and not the raw throughput, >>> it makes >>> sense to enable safepoints in counted loops.
This makes us much >>> more responsive >>> (as in, TTSP is lower) in many interesting scenarios. >> >> True, but I have seen some very interesting cases where we beat G1 in >> throughput. > > Yes. As far as I can see, those are not affected by this (e.g. compiler > benchmarks). And multiple seconds (!) just to get to a safepoint seems > way too much, and it's more than one program that is affected by this. > >> Let's not overdo this: at the very least we need to know >> how to restore throughput when running Shenandoah; > > easy: -XX:-UseCountedLoopSafepoints > > In fact, I've been thinking for a while about a sort of 'priority' > setting for Shenandoah, where one could choose between 'throughput' and > 'pausetime', and we would turn on or off specific options to improve > one or the other, e.g. this UseCountedLoopSafepoints flag, some > heuristics settings, and so on. Kind of like the -XX:+AggressiveOpts > setting, but towards one or the other priority. > > However, so far there are not that many settings in this regard, and > our priority is always leaning towards pause times anyway... > >> all this business >> of one flag affecting others can be surprising. > > Indeed. I would be most worried about turning on code paths that are > not used otherwise, and thus run into bugs that are not ours, but in > this case it seems to be simple enough.
> > Roman From shade at redhat.com Tue Dec 20 15:26:09 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 20 Dec 2016 16:26:09 +0100 Subject: RFR (XS): Enable UseCountedLoopSafepoints with Shenandoah In-Reply-To: <9F7DB362-A9F0-4860-965B-ED02714B8026@cinnober.com> References: <9833e763-9b3b-8246-7584-20a8e033f4c6@redhat.com> <6c6c262b-c3a2-486f-3f06-19b811fd87c1@redhat.com> <1482242498.2807.60.camel@redhat.com> <9F7DB362-A9F0-4860-965B-ED02714B8026@cinnober.com> Message-ID: <8b1b531c-62b4-a1c6-f19f-d6411cb4626b@redhat.com> Hi Lennart, On 12/20/2016 04:18 PM, Lennart Börjeson wrote: > We have recently encountered an issue with a commercial JVM which had elected > to skip safepoint checks for all counted loops. This broke our product, since > we have a crucial spin wait in a long-indexed loop. > > (As you know, the JVM normally inserts safepoint checks in long-indexed, but > not in int-indexed, counted loops.) > > Such a change in behaviour is extremely hard to track down, and I regard it > as a significant functional change. > > I urge you, as I've urged the vendor in question, to keep the "standard" > behaviour as default. I am a bit confused about the notion of "standard behavior". There is no standard that mandates either putting safepoint checks into loops, or skipping them. This Shenandoah change _inserts_ more safepoint checks, not eliminates them, so this seems like something you want?
Thanks, -Aleksey From simone.bordet at gmail.com Tue Dec 20 15:27:32 2016 From: simone.bordet at gmail.com (Simone Bordet) Date: Tue, 20 Dec 2016 16:27:32 +0100 Subject: RFR (XS): Enable UseCountedLoopSafepoints with Shenandoah In-Reply-To: <9F7DB362-A9F0-4860-965B-ED02714B8026@cinnober.com> References: <9833e763-9b3b-8246-7584-20a8e033f4c6@redhat.com> <6c6c262b-c3a2-486f-3f06-19b811fd87c1@redhat.com> <1482242498.2807.60.camel@redhat.com> <9F7DB362-A9F0-4860-965B-ED02714B8026@cinnober.com> Message-ID: Hi, On Tue, Dec 20, 2016 at 4:18 PM, Lennart Börjeson wrote: > I feel I must chip in here. > > (I'm continuously testing Shenandoah, as well as other JVM variants, with our products, as part of my work.) > > We have recently encountered an issue with a commercial JVM which had elected to skip safepoint checks for all counted loops. This broke our product, since we have a crucial spin wait in a long-indexed loop. Wow. Can you detail how your product makes use of the fact that the JVM is polling (or not) for a safepoint ? I'm guessing you are doing this from native code ? Custom JVM modifications ? > (As you know, the JVM normally inserts safepoint checks in long-indexed, but not in int-indexed, counted loops.) > > Such a change in behaviour is extremely hard to track down, and I regard it as a significant functional change. Wouldn't it be the opposite, i.e. your product relying on a very specific implementation detail of how the JVM works ? Thanks ! -- Simone Bordet http://bordet.blogspot.com --- Finally, no matter how good the architecture and design are, to deliver bug-free software with optimal performance and reliability, the implementation technique must be flawless.
Victoria Livschitz From lennart.borjeson at cinnober.com Tue Dec 20 15:50:47 2016 From: lennart.borjeson at cinnober.com (Lennart Börjeson) Date: Tue, 20 Dec 2016 15:50:47 +0000 Subject: RFR (XS): Enable UseCountedLoopSafepoints with Shenandoah In-Reply-To: References: <9833e763-9b3b-8246-7584-20a8e033f4c6@redhat.com> <6c6c262b-c3a2-486f-3f06-19b811fd87c1@redhat.com> <1482242498.2807.60.camel@redhat.com> <9F7DB362-A9F0-4860-965B-ED02714B8026@cinnober.com> Message-ID: <6F4CD75A-122F-44B8-B88A-CFB2742FA447@cinnober.com> > On 20 Dec 2016, at 16:27, Simone Bordet wrote: > > Hi, > > On Tue, Dec 20, 2016 at 4:18 PM, Lennart Börjeson > wrote: >> I feel I must chip in here. >> >> (I'm continuously testing Shenandoah, as well as other JVM variants, with our products, as part of my work.) >> >> We have recently encountered an issue with a commercial JVM which had elected to skip safepoint checks for all counted loops. This broke our product, since we have a crucial spin wait in a long-indexed loop. > > Wow. Can you detail how your product makes use of the fact that the > JVM is polling (or not) for a safepoint ? > I'm guessing you are doing this from native code ? Custom JVM modifications ? No, just standard Java. And I wouldn't say we *made use* of it, we just had some code which worked in one JVM and not in the next. In our case, we had a while-loop testing a long variable, which somehow was deemed to be a counted loop, and consequently not checked under the new behaviour. Very tricky to identify. > >> (As you know, the JVM normally inserts safepoint checks in long-indexed, but not in int-indexed, counted loops.) >> >> Such a change in behaviour is extremely hard to track down, and I regard it as a significant functional change. > > Wouldn't it be the opposite, i.e. your product relying on a very specific > implementation detail of how the JVM works ? > Well, you're *always* dependent on how the JVM works, aren't you.
;-) From lennart.borjeson at cinnober.com Tue Dec 20 15:54:36 2016 From: lennart.borjeson at cinnober.com (Lennart Börjeson) Date: Tue, 20 Dec 2016 15:54:36 +0000 Subject: RFR (XS): Enable UseCountedLoopSafepoints with Shenandoah In-Reply-To: <8b1b531c-62b4-a1c6-f19f-d6411cb4626b@redhat.com> References: <9833e763-9b3b-8246-7584-20a8e033f4c6@redhat.com> <6c6c262b-c3a2-486f-3f06-19b811fd87c1@redhat.com> <1482242498.2807.60.camel@redhat.com> <9F7DB362-A9F0-4860-965B-ED02714B8026@cinnober.com> <8b1b531c-62b4-a1c6-f19f-d6411cb4626b@redhat.com> Message-ID: > On 20 Dec 2016, at 16:26, Aleksey Shipilev wrote: > > I am a bit confused about the notion of "standard behavior". There is no > standard that mandates either putting safepoint checks into loops, or skipping them. > > This Shenandoah change _inserts_ more safepoint checks, not eliminates them, so > this seems like something you want? > I was thinking about the flag UseCountedLoopSafepoints.
The current default is "false", and I gathered you were discussing changing this to "true"? Yes. "true" means Hotspot will emit safepoint checks in counted loops, thus improving time-to-safepoint, and therefore improving pause time. Isn't that the behavior you want for your product? -Aleksey From aph at redhat.com Tue Dec 20 16:52:09 2016 From: aph at redhat.com (Andrew Haley) Date: Tue, 20 Dec 2016 16:52:09 +0000 Subject: RFR (XS): Enable UseCountedLoopSafepoints with Shenandoah In-Reply-To: <1482242498.2807.60.camel@redhat.com> References: <9833e763-9b3b-8246-7584-20a8e033f4c6@redhat.com> <6c6c262b-c3a2-486f-3f06-19b811fd87c1@redhat.com> <1482242498.2807.60.camel@redhat.com> Message-ID: On 20/12/16 14:01, Roman Kennke wrote: > On Tuesday, 20.12.2016, 13:32 +0000, Andrew Haley wrote: >> On 20/12/16 10:57, Aleksey Shipilev wrote: >>> Since we care mostly about pause times, and not the raw throughput, >>> it makes >>> sense to enable safepoints in counted loops. This makes us much >>> more responsive >>> (as in, TTSP is lower) in many interesting scenarios. >> >> True, but I have seen some very interesting cases where we beat G1 in >> throughput. > > Yes. As far as I can see, those are not affected by this (e.g. compiler > benchmarks). And multiple seconds (!) just to get to a safepoint seems > way too much, and it's more than one program that is affected by this. Can you tell me which program delays so long? I'd like to see it. I suspect that's a bug. And, of course, people are capable of using -XX:-UseCountedLoopSafepoints themselves. >> Let's not overdo this: at the very least we need to know >> how to restore throughput when running Shenandoah; > > easy: -XX:-UseCountedLoopSafepoints Right, so we know for sure that enabling Shenandoah only affects one other flag. Good! Andrew.
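The int-indexed vs long-indexed distinction discussed in this thread can be made concrete with a small sketch. All class and method names below are illustrative (not from the thread), and the compilation behavior described in the comments is the default HotSpot behavior of this era; it varies with JVM version and flags:

```java
// With default HotSpot settings, C2 treats the int-indexed loop below as a
// "counted loop" and omits the safepoint poll from its body, so a thread
// stuck in it can delay a GC pause (high time-to-safepoint, TTSP). The
// long-indexed loop keeps its safepoint poll. -XX:+UseCountedLoopSafepoints
// forces a poll into counted loops too, trading throughput for lower TTSP.
public class LoopShapes {

    // int-indexed counted loop: may be compiled without a safepoint poll.
    public static long sumInt(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    // long-indexed loop: HotSpot keeps a safepoint poll here, which is why
    // a spin-wait on a long variable keeps responding to safepoint requests
    // -- until a JVM changes its definition of "counted loop".
    public static long sumLong(long n) {
        long sum = 0;
        for (long i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumInt(1000));
        System.out.println(sumLong(1000));
    }
}
```

Comparing the generated code with `-XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly`, with and without `-XX:+UseCountedLoopSafepoints`, shows the poll appearing in the int-indexed loop only in the latter case.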
From shade at redhat.com Tue Dec 20 17:08:31 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 20 Dec 2016 18:08:31 +0100 Subject: RFR (S): Refactor concurrent mark to be more inlineable Message-ID: <847fc65d-9814-26d1-7c38-ee46bc7d2627@redhat.com> Hi, I would like to refactor the concurrent mark to make it more inlineable, prepare it for conc mark prefetch, etc: http://cr.openjdk.java.net/~shade/shenandoah/concmark-inline/webrev.01/ In that patch: a) Peeled concurrent_process_queues before the hot loop; b) Inlined try_* methods to call a very fat do_object_or_array once. It also helps to pinpoint a single place where we get the tasks, so that future work on buffering and prefetching would capitalize on this; c) Optimized SATB draining code: poll the local queue immediately after draining SATB, do not do stealing which will bypass the local queue; d) Marked a few important closures "inline", and added headers where needed; Testing: hs_gc_shenandoah, SPECjvm/Derby. Thanks, -Aleksey From lennart.borjeson at cinnober.com Tue Dec 20 17:12:19 2016 From: lennart.borjeson at cinnober.com (Lennart Börjeson) Date: Tue, 20 Dec 2016 17:12:19 +0000 Subject: RFR (XS): Enable UseCountedLoopSafepoints with Shenandoah In-Reply-To: References: <9833e763-9b3b-8246-7584-20a8e033f4c6@redhat.com> <6c6c262b-c3a2-486f-3f06-19b811fd87c1@redhat.com> <1482242498.2807.60.camel@redhat.com> <9F7DB362-A9F0-4860-965B-ED02714B8026@cinnober.com> <8b1b531c-62b4-a1c6-f19f-d6411cb4626b@redhat.com> , Message-ID: > On 20 Dec 2016, at 16:58, Aleksey Shipilev wrote: > > On 12/20/2016 04:54 PM, Lennart Börjeson wrote: >>> On 20 Dec 2016, at 16:26, Aleksey Shipilev wrote: >>> >>> I am a bit confused about the notion of "standard behavior". There is no >>> standard that mandates either putting safepoint checks into loops, or >>> skipping them. >>> >>> This Shenandoah change _inserts_ more safepoint checks, not eliminates >>> them, so this seems like something you want?
>>> >> >> I was thinking about the flag UseCountedLoopSafepoints. The current default >> is "false", and I gathered you were discussing changing this to "true"? > > Yes. "true" means Hotspot will emit safepoint checks in counted loops, thus > improving time-to-safepoint, and therefore improving pause time. Isn't that the > behavior you want for your product? > Well, if there were to be a safepoint check in every counted loop, I fear overall performance would suffer too much. But I would of course need to test that. Note that the problem we had with the other JVM was more that the definition of "counted loop" had changed, than a change of a default value for a flag. From rkennke at redhat.com Tue Dec 20 17:15:19 2016 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 20 Dec 2016 18:15:19 +0100 Subject: RFR (S): Refactor concurrent mark to be more inlineable In-Reply-To: <847fc65d-9814-26d1-7c38-ee46bc7d2627@redhat.com> References: <847fc65d-9814-26d1-7c38-ee46bc7d2627@redhat.com> Message-ID: <1482254119.2807.61.camel@redhat.com> Looks good to me! Roman On Tuesday, 20.12.2016, 18:08 +0100, Aleksey Shipilev wrote: > Hi, > > I would like to refactor the concurrent mark to make it more > inlineable, prepare > it for conc mark prefetch, etc: > http://cr.openjdk.java.net/~shade/shenandoah/concmark-inline/webrev > .01/ > > In that patch: > a) Peeled concurrent_process_queues before the hot loop; > b) Inlined try_* methods to call a very fat do_object_or_array > once. It also > helps to pinpoint a single place where we get the tasks, so that > future work on > buffering and prefetching would capitalize on this; > c) Optimized SATB draining code: poll the local queue immediately > after > draining SATB, do not do stealing which will bypass the local queue; > d) Marked a few important closures "inline", and added headers > where needed; > > Testing: hs_gc_shenandoah, SPECjvm/Derby.
> > Thanks, > -Aleksey > From rkennke at redhat.com Tue Dec 20 17:48:31 2016 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 20 Dec 2016 18:48:31 +0100 Subject: RFR: Improve AryEq In-Reply-To: <743b2c9e-cf5a-f7ac-ef69-1c8c6a47b54a@redhat.com> References: <1482232256.2807.56.camel@redhat.com> <743b2c9e-cf5a-f7ac-ef69-1c8c6a47b54a@redhat.com> Message-ID: <1482256111.2807.62.camel@redhat.com> Am Dienstag, den 20.12.2016, 12:15 +0100 schrieb Aleksey Shipilev: > On 12/20/2016 12:10 PM, Roman Kennke wrote: > > This adds an cmp-barrier to the code generated by AryEq. A false > > negative in the array ptr comparison would result in the slow-path > > being taken, even though it's not necessary. The barrier should get > > us > > on the fast path more often. > > > > Ok? > > > > http://cr.openjdk.java.net/~rkennke/aryeq/webrev.00/ > > Looks okay, but would be interesting to see if we can merge null- > checking paths > with acmp barrier? That would be complicated. Would need build special code just for this intrinsics... Doesn't seem worth for now. I'm pushing as is. Roman From roman at kennke.org Tue Dec 20 17:49:24 2016 From: roman at kennke.org (roman at kennke.org) Date: Tue, 20 Dec 2016 17:49:24 +0000 Subject: hg: shenandoah/jdk9/hotspot: Improve AryEq instruction by avoiding false negatives with a Shenandoah cmp barrier Message-ID: <201612201749.uBKHnOxR023662@aojmv0008.oracle.com> Changeset: 0d30308cdc65 Author: rkennke Date: 2016-12-20 18:49 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/0d30308cdc65 Improve AryEq instruction by avoiding false negatives with a Shenandoah cmp barrier ! src/cpu/aarch64/vm/macroAssembler_aarch64.cpp ! 
src/cpu/x86/vm/macroAssembler_x86.cpp From shade at redhat.com Tue Dec 20 17:56:17 2016 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 20 Dec 2016 18:56:17 +0100 Subject: RFR (XS): Enable UseCountedLoopSafepoints with Shenandoah In-Reply-To: References: <9833e763-9b3b-8246-7584-20a8e033f4c6@redhat.com> <6c6c262b-c3a2-486f-3f06-19b811fd87c1@redhat.com> <1482242498.2807.60.camel@redhat.com> Message-ID: <89058fec-99f8-6e29-1ca9-45ec0b72b444@redhat.com> On 12/20/2016 05:52 PM, Andrew Haley wrote: > On 20/12/16 14:01, Roman Kennke wrote: >> On Tuesday, 20.12.2016, 13:32 +0000, Andrew Haley wrote: >>> On 20/12/16 10:57, Aleksey Shipilev wrote: >>>> Since we care mostly about pause times, and not the raw throughput, >>>> it makes >>>> sense to enable safepoints in counted loops. This makes us much >>>> more responsive >>>> (as in, TTSP is lower) in many interesting scenarios. >>> >>> True, but I have seen some very interesting cases where we beat G1 in >>> throughput. >> >> Yes. As far as I can see, those are not affected by this (e.g. compiler >> benchmarks). And multiple seconds (!) just to get to a safepoint seems >> way too much, and it's more than one program that is affected by this. > > Can you tell me which program delays so long? I'd like to see it. > > I suspect that's a bug. And, of course, people are capable of using > -XX:-UseCountedLoopSafepoints themselves. This is not a bug, it is a well-known Hotspot issue: http://psy-lob-saw.blogspot.de/2015/12/safepoints.html http://psy-lob-saw.blogspot.de/2016/02/wait-for-it-counteduncounted-loops.html If you want a contrived example, here's one: http://icedtea.classpath.org/people/shade/gc-bench/file/5b77fb55a8b6/src/main/java/org/openjdk/gcbench/yield/ArrayIteration.java With a 100M array, on my high-end i7 we have 300ms TTSP, which completely dominates Shenandoah pause time. With safepoints in the loop TTSP is down to 1-5ms.
Another one: http://icedtea.classpath.org/people/shade/gc-bench/file/4c32eb6c67b0/src/main/java/org/openjdk/gcbench/yield/MonteCarloPI.java With 100M samples one MonteCarlo run takes 1s, and that's the TTSP on my desktop as well. With safepoints in the loop TTSP is down to 1-5ms. Another one: http://icedtea.classpath.org/people/shade/gc-bench/file/5b77fb55a8b6/src/main/java/org/openjdk/gcbench/fragger/LinkedListFragger.java If you do LinkedList.get(index), it does a counted loop inside, stepping r->r.next N times. But since the whole thing is cache-hostile, you have a problem. On a large machine with 32 slow cores and slow memory TTSPs are in the 1+ second range. This completely blows "ultra low pause" targets. There is an alternative solution: loop mining, i.e. replacing one big loop with two nested loops, and safepointing the outer one. This requires heavy changes in C2. Roland wanted to take on this after the Xmas break. Thanks, -Aleksey From ashipile at redhat.com Tue Dec 20 18:20:23 2016 From: ashipile at redhat.com (ashipile at redhat.com) Date: Tue, 20 Dec 2016 18:20:23 +0000 Subject: hg: shenandoah/jdk9/hotspot: Refactor concurrent mark to be more inlineable. Message-ID: <201612201820.uBKIKN1Z001903@aojmv0008.oracle.com> Changeset: 5c7176fd9317 Author: shade Date: 2016-12-20 19:18 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/5c7176fd9317 Refactor concurrent mark to be more inlineable. ! src/share/vm/gc/shenandoah/shenandoahConcurrentMark.cpp ! src/share/vm/gc/shenandoah/shenandoahConcurrentMark.hpp ! src/share/vm/gc/shenandoah/shenandoahConcurrentMark.inline.hpp ! src/share/vm/gc/shenandoah/shenandoahConcurrentThread.cpp ! src/share/vm/gc/shenandoah/shenandoahHeap.cpp ! src/share/vm/gc/shenandoah/shenandoahMarkCompact.cpp !
src/share/vm/gc/shenandoah/vm_operations_shenandoah.cpp From rwestrel at redhat.com Tue Dec 20 20:37:39 2016 From: rwestrel at redhat.com (rwestrel at redhat.com) Date: Tue, 20 Dec 2016 20:37:39 +0000 Subject: hg: shenandoah/jdk9/hotspot: C2: the result of an implicit null check read barrier may be used when the check fails Message-ID: <201612202037.uBKKben6006314@aojmv0008.oracle.com> Changeset: 307980ea8e60 Author: roland Date: 2016-12-19 11:22 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/307980ea8e60 C2: the result of an implicit null check read barrier may be used when the check fails ! src/share/vm/opto/shenandoahSupport.cpp From aph at redhat.com Tue Dec 20 21:22:27 2016 From: aph at redhat.com (Andrew Haley) Date: Tue, 20 Dec 2016 21:22:27 +0000 Subject: RFR (XS): Enable UseCountedLoopSafepoints with Shenandoah In-Reply-To: <89058fec-99f8-6e29-1ca9-45ec0b72b444@redhat.com> References: <9833e763-9b3b-8246-7584-20a8e033f4c6@redhat.com> <6c6c262b-c3a2-486f-3f06-19b811fd87c1@redhat.com> <1482242498.2807.60.camel@redhat.com> <89058fec-99f8-6e29-1ca9-45ec0b72b444@redhat.com> Message-ID: <27d1f284-abd1-8b62-9be5-ee78df361d46@redhat.com> On 20/12/16 17:56, Aleksey Shipilev wrote: > On 12/20/2016 05:52 PM, Andrew Haley wrote: >> On 20/12/16 14:01, Roman Kennke wrote: >>> >>> Yes. As far as I can see, those are not affected by this (e.g. compiler >>> benchmarks). And multiple seconds (!) just to get to a safepoint seems >>> way too much, and it's more than 1 program that is affected by this. >> >> Can you tell me which program delays so long? I'd like to see it. >> >> I suspect that's a bug. And, of course, people are capable of using >> -XX:-UseCountedLoopSafepoints themselves. > > This is not a bug, it is a very known Hotspot issue: > http://psy-lob-saw.blogspot.de/2015/12/safepoints.html > http://psy-lob-saw.blogspot.de/2016/02/wait-for-it-counteduncounted-loops.html Yes, yes, I know about counted loop safepoints. 
:-) > If you want a contrived example, here's one: > > http://icedtea.classpath.org/people/shade/gc-bench/file/5b77fb55a8b6/src/main/java/org/openjdk/gcbench/yield/ArrayIteration.java > > With 100M array, on my high-end i7 we have 300ms TTSP, which completely > dominates Shenandoah pause time. With safepoints in the loop TTSP is down to 1-5ms. Sure, but I was asking about a *program* which was affected by a multiple-second safepoint delay. I've never seen such a bad one. I know that it's possible in theory. > Another one: > http://icedtea.classpath.org/people/shade/gc-bench/file/4c32eb6c67b0/src/main/java/org/openjdk/gcbench/yield/MonteCarloPI.java > > With 100M samples one MonteCarlo run takes 1s, and that's the TTSP on my desktop > as well. With safepoints in the loop TTSP is down to 1-5ms. OK, right. So I take it that MonteCarloPI is an example of a real program which is affected in this way. > There is an alternative solution: loop mining, i.e. replacing one big loop with > two nested loops, and safepointing the outer one. This requires heavy changes in > C2. Roland wanted to take on this after the Xmas break. I can see the sense in that. Andrew. From rkennke at redhat.com Wed Dec 21 17:56:29 2016 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 21 Dec 2016 18:56:29 +0100 Subject: RFR (JDK8-only): Fix freeze on OOM-on-evac regarding the PLL Message-ID: <1482342989.2807.77.camel@redhat.com> This is a complicated one. We may get a freeze under the following situation: when the final-mark pause is left, the ShenandoahConcurrentThread sends a message to the SurrogateLockerThread to release the pending-list-lock (see VM_ShenandoahReferenceOperation::doit_epilogue()). The SurrogateLockerThread is a Java thread that gets kicked off right after the pause. It attempts to acquire the PLL (a Java lock) and thus employs a write-barrier on it.
When that write-barrier runs out-of- memory, it ends up in our oom_during_evacuation() loop and is waiting for the _evacuation_in_progress flag to get cleared. However, since the ShenandoahConcurrentThread is waiting for the SLT to finish, we never get to where we clear that flag (we don't even kick off evacuation yet). The proposed solution attempts to evacuate the PLL during the pause. If it succeeds, then the write-barrier will simply pick up the to-space object. If it fails, we schedule a full-gc, and turn off evacuation before leaving the pause. In no case can the write-barrier on the PLL run into OOM, and in all cases will it be correctly unlocked. Luckily for us, the whole PLL madness has been changed in a very positive way in JDK9, so this change does not apply there. http://cr.openjdk.java.net/~rkennke/fixoomevacpllfreeze/webrev.00/ Ok to push? Roman From rkennke at redhat.com Wed Dec 21 18:16:03 2016 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 21 Dec 2016 19:16:03 +0100 Subject: RFR (JDK8-only): Fix freeze on OOM-on-evac regarding the PLL In-Reply-To: <1482342989.2807.77.camel@redhat.com> References: <1482342989.2807.77.camel@redhat.com> Message-ID: <1482344163.2807.78.camel@redhat.com> Oh btw, I tested that by running specjvm with aggressive heuristics, this used to sometimes freeze before. Roman Am Mittwoch, den 21.12.2016, 18:56 +0100 schrieb Roman Kennke: > This is a complicated one. We may get a freeze under the following > situation: > > when the final-mark pause is left, the ShenandoahConcurrentThread > sends > a message to the SurrogateLockerThread to release the pending-list- > lock? > (see VM_ShenandoahReferenceOperation::doit_epilogue()). The > SurrogateLockerThread is a Java thread that gets kicked off right > after > the pause. It attempts to acquire the PLL (a Java lock) and thus > employs a write-barrier on it. 
When that write-barrier runs out-of- > memory, it ends up in our oom_during_evacuation() loop and is waiting > for the _evacuation_in_progress flag to get cleared. However, since > the > ShenandoahConcurrentThread is waiting for the SLT to finish, we never > get to where we clear that flag (we don't even kick off evacuation > yet). > > The proposed solution attempts to evacuate the PLL during the pause. > If > it succeeds, then the write-barrier will simply pick up the to-space > object. If it fails, we schedule a full-gc, and turn off evacuation > before leaving the pause. In no case can the write-barrier on the PLL > run into OOM, and in all cases will it be correctly unlocked. > > Luckily for us, the whole PLL madness has been changed in a very > positive way in JDK9, so this change does not apply there. > > http://cr.openjdk.java.net/~rkennke/fixoomevacpllfreeze/webrev.00/ > > Ok to push? > > Roman From zgu at redhat.com Wed Dec 21 18:18:24 2016 From: zgu at redhat.com (Zhengyu Gu) Date: Wed, 21 Dec 2016 13:18:24 -0500 Subject: RFR (JDK8-only): Fix freeze on OOM-on-evac regarding the PLL In-Reply-To: <1482342989.2807.77.camel@redhat.com> References: <1482342989.2807.77.camel@redhat.com> Message-ID: Okay. -Zhengyu On 12/21/2016 12:56 PM, Roman Kennke wrote: > This is a complicated one. We may get a freeze under the following > situation: > > when the final-mark pause is left, the ShenandoahConcurrentThread sends > a message to the SurrogateLockerThread to release the pending-list-lock > (see VM_ShenandoahReferenceOperation::doit_epilogue()). The > SurrogateLockerThread is a Java thread that gets kicked off right after > the pause. It attempts to acquire the PLL (a Java lock) and thus > employs a write-barrier on it. When that write-barrier runs out-of- > memory, it ends up in our oom_during_evacuation() loop and is waiting > for the _evacuation_in_progress flag to get cleared. 
However, since the > ShenandoahConcurrentThread is waiting for the SLT to finish, we never > get to where we clear that flag (we don't even kick off evacuation > yet). > > The proposed solution attempts to evacuate the PLL during the pause. If > it succeeds, then the write-barrier will simply pick up the to-space > object. If it fails, we schedule a full-gc, and turn off evacuation > before leaving the pause. In no case can the write-barrier on the PLL > run into OOM, and in all cases will it be correctly unlocked. > > Luckily for us, the whole PLL madness has been changed in a very > positive way in JDK9, so this change does not apply there. > > http://cr.openjdk.java.net/~rkennke/fixoomevacpllfreeze/webrev.00/ > > Ok to push? > > Roman From roman at kennke.org Wed Dec 21 18:28:05 2016 From: roman at kennke.org (roman at kennke.org) Date: Wed, 21 Dec 2016 18:28:05 +0000 Subject: hg: shenandoah/jdk8u/hotspot: Fix freeze on OOM-on-evac regarding the PLL. Message-ID: <201612211828.uBLIS5HB011037@aojmv0008.oracle.com> Changeset: 9ba353933d12 Author: rkennke Date: 2016-12-21 19:27 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/9ba353933d12 Fix freeze on OOM-on-evac regarding the PLL. ! src/share/vm/gc_implementation/shenandoah/shenandoahRootProcessor.cpp ! src/share/vm/gc_implementation/shenandoah/shenandoahRootProcessor.hpp ! src/share/vm/gc_implementation/shenandoah/vm_operations_shenandoah.cpp From lennart.borjeson at cinnober.com Thu Dec 22 15:44:10 2016 From: lennart.borjeson at cinnober.com (Lennart Börjeson) Date: Thu, 22 Dec 2016 15:44:10 +0000 Subject: Shenandoah and Graal? Message-ID: <02840F81-337E-426F-BDE1-593D2B4F8F89@cinnober.com> I've noticed that Graal seems to have been integrated in the openjdk9 sources as of build 150.
I've already mentioned I'm getting better and better results with Shenandoah, but since I have got encouraging results when testing with the Graal compiler, I'd like to eventually try out Graal+Shenandoah. Will that be possible? I've understood you've made Shenandoah-related updates to C2, so I'd like to ask if Shenandoah is currently dependent on C2 only? Best regards, /Lennart Börjeson From rkennke at redhat.com Thu Dec 22 15:48:20 2016 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 22 Dec 2016 16:48:20 +0100 Subject: Shenandoah and Graal? In-Reply-To: <02840F81-337E-426F-BDE1-593D2B4F8F89@cinnober.com> References: <02840F81-337E-426F-BDE1-593D2B4F8F89@cinnober.com> Message-ID: <1482421700.2807.93.camel@redhat.com> Hi Lennart, > I've noticed that Graal seems to have been integrated in the openjdk9 > sources as of build 150. > > I've already mentioned I'm getting better and better results with > Shenandoah, but since I have got encouraging results when testing > with the Graal compiler, I'd like to eventually try out > Graal+Shenandoah. > > Will that be possible? I've understood you've made Shenandoah-related > updates to C2, so I'd like to ask if Shenandoah is currently > dependent on C2 only? Graal currently does not compile the barriers that are required for Shenandoah. It's on our to-do list, but currently it's not possible. Best regards, Roman
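The barriers Roman refers to — which a compiler must emit before Shenandoah can run under it — can be modeled in plain Java. This is a toy sketch, not the actual HotSpot code: all names are illustrative, and the real read barrier and AryEq cmp-barrier are emitted as machine code by C2. Shenandoah in this era keeps a forwarding pointer per object, so a raw reference comparison can see a "false negative" when one operand is the from-space copy and the other the to-space copy of the same object — exactly the case the AryEq cmp-barrier discussed earlier in this archive avoids:

```java
// Toy model of a Brooks-style forwarding pointer and the two barriers
// discussed in this thread (illustrative names, not HotSpot code).
final class Cell {
    Cell forwardee = this; // points to itself until the object is evacuated
    int value;
    Cell(int value) { this.value = value; }
}

public class BarrierModel {
    // Read barrier: always dereference through the forwarding pointer.
    static Cell rb(Cell c) {
        return c == null ? null : c.forwardee;
    }

    // cmp/acmp barrier: if the raw compare fails, retry through the read
    // barrier so from-space and to-space copies of one object compare equal.
    static boolean acmp(Cell a, Cell b) {
        return a == b || rb(a) == rb(b);
    }

    public static void main(String[] args) {
        Cell fromSpace = new Cell(42);
        Cell toSpace = new Cell(42);
        fromSpace.forwardee = toSpace; // simulate concurrent evacuation
        System.out.println(fromSpace == toSpace);       // raw compare: false negative
        System.out.println(acmp(fromSpace, toSpace));   // barrier resolves it: true
    }
}
```

Without the retry in `acmp`, an intrinsic like AryEq would fall into its slow path on the first comparison even when both references name the same logical object, which is the false negative the patch eliminates.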