From maaartinus at gmail.com  Mon Jan  1 20:29:22 2018
From: maaartinus at gmail.com (Martin Grajcar)
Date: Mon, 1 Jan 2018 21:29:22 +0100
Subject: Using the Klass gap (Was: Master Thesis on Shenandoah)
Message-ID: <CAGsWfGiVDMDEHkyq=xMT41ancBJUFC86zPKCzDrf7Bj-Zj4Kqg@mail.gmail.com>

> On 08.11.2017 at 19:07, Dominik Inführ wrote:
>
> I was pondering the idea to squeeze the fwd ptr into the so-called
> Klass-gap. This is 32 unused bits when the Klass* is compressed. It's
> only available for non-arrays, because for arrays, the array-length is
> squeezed into those 32 bits.


A possibly stupid question, but shouldn't it be the other way round?

Currently, array length gets packed in a gap and you're thinking about
using the gap -- when available -- for the fwd ptr. This sounds slow and
complicated.

Can't you instead *always* use this gap for non-arrays and use a new slot
for the array length? This saves memory for non-arrays in exactly the same
way and needs no new conditional logic (I guess arrays can already deal
with the case where they need a new slot for their length).
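
One reading of this suggestion, sketched as C++ structs. The struct and
field names and the 32-bit compressed forwarding value are assumptions
made for illustration, not HotSpot's actual object header definitions:
the forwarding slot always sits in the former gap, and arrays carry
their length in an additional slot.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical layouts; names and field types are illustrative only.
struct PlainHeaderSketch {
  std::uint64_t mark;          // mark word
  std::uint32_t narrow_klass;  // compressed Klass*
  std::uint32_t narrow_fwd;    // forwarding value stored in the former gap
};

struct ArrayHeaderSketch {
  std::uint64_t mark;
  std::uint32_t narrow_klass;
  std::uint32_t narrow_fwd;    // same offset as in PlainHeaderSketch
  std::uint32_t length;        // array length moved to a new slot
};

// The forwarding slot is found at one fixed offset for every object,
// so a barrier would not need to ask "is this an array?" first.
static_assert(offsetof(PlainHeaderSketch, narrow_fwd) ==
              offsetof(ArrayHeaderSketch, narrow_fwd),
              "forwarding slot must not depend on the object kind");
```

The cost in this sketch falls on arrays, whose header grows by the
extra length slot.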

From rkennke at redhat.com  Tue Jan  2 11:23:23 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Tue, 2 Jan 2018 12:23:23 +0100
Subject: Using the Klass gap (Was: Master Thesis on Shenandoah)
In-Reply-To: <CAGsWfGiVDMDEHkyq=xMT41ancBJUFC86zPKCzDrf7Bj-Zj4Kqg@mail.gmail.com>
References: <CAGsWfGiVDMDEHkyq=xMT41ancBJUFC86zPKCzDrf7Bj-Zj4Kqg@mail.gmail.com>
Message-ID: <a84a616d-6cd4-84e4-8f9a-daefd067b42f@redhat.com>

On 01.01.2018 at 21:29, Martin Grajcar wrote:
>> On 08.11.2017 at 19:07, Dominik Inführ wrote:
>>
>> I was pondering the idea to squeeze the fwd ptr into the so-called
>> Klass-gap. This is 32 unused bits when the Klass* is compressed. It's
>> only available for non-arrays, because for arrays, the array-length is
>> squeezed into those 32 bits.
>
>
> A possibly stupid question, but shouldn't it be the other way round?
>
> Currently, array length gets packed in a gap and you're thinking about
> using the gap -- when available -- for the fwd ptr. This sounds slow and
> complicated.
>
> Can't you instead *always* use this gap for non-arrays and use a new slot
> for the array length? This saves memory for non-arrays in exactly the same
> way and needs no new conditional logic (I guess arrays can already deal
> with the case where they need a new slot for their length).

Yes, this sounds like an attractive possibility. :-)

Thanks,
Roman


From zgu at redhat.com  Tue Jan  2 13:52:50 2018
From: zgu at redhat.com (Zhengyu Gu)
Date: Tue, 2 Jan 2018 08:52:50 -0500
Subject: Using the Klass gap (Was: Master Thesis on Shenandoah)
In-Reply-To: <a84a616d-6cd4-84e4-8f9a-daefd067b42f@redhat.com>
References: <CAGsWfGiVDMDEHkyq=xMT41ancBJUFC86zPKCzDrf7Bj-Zj4Kqg@mail.gmail.com>
 <a84a616d-6cd4-84e4-8f9a-daefd067b42f@redhat.com>
Message-ID: <af3649b3-76df-7a59-ecb1-a237820c9109@redhat.com>


On 01/02/2018 06:23 AM, Roman Kennke wrote:
> On 01.01.2018 at 21:29, Martin Grajcar wrote:
>>> On 08.11.2017 at 19:07, Dominik Inführ wrote:
>>>
>>> I was pondering the idea to squeeze the fwd ptr into the so-called
>>> Klass-gap. This is 32 unused bits when the Klass* is compressed. It's
>>> only available for non-arrays, because for arrays, the array-length is
>>> squeezed into those 32 bits.
>>
>>
>> A possibly stupid question, but shouldn't it be the other way round?
>>
>> Currently, array length gets packed in a gap and you're thinking about
>> using the gap -- when available -- for the fwd ptr. This sounds slow and
>> complicated.
>>
>> Can't you instead *always* use this gap for non-arrays and use a new slot
>> for the array length? This saves memory for non-arrays in exactly the same
>> way and needs no new conditional logic (I guess arrays can already deal
>> with the case where they need a new slot for their length).
> 
> Yes, this sounds like an attractive possibility. :-)

Agree. I think Java will have to switch to 64-bit array index at some point.

-Zhengyu

> 
> Thanks,
> Roman
> 

From aph at redhat.com  Tue Jan  2 14:07:11 2018
From: aph at redhat.com (Andrew Haley)
Date: Tue, 2 Jan 2018 14:07:11 +0000
Subject: Using the Klass gap (Was: Master Thesis on Shenandoah)
In-Reply-To: <a84a616d-6cd4-84e4-8f9a-daefd067b42f@redhat.com>
References: <CAGsWfGiVDMDEHkyq=xMT41ancBJUFC86zPKCzDrf7Bj-Zj4Kqg@mail.gmail.com>
 <a84a616d-6cd4-84e4-8f9a-daefd067b42f@redhat.com>
Message-ID: <5f0c8175-b04c-2b13-d54b-c16d4dc45804@redhat.com>

On 02/01/18 11:23, Roman Kennke wrote:
> On 01.01.2018 at 21:29, Martin Grajcar wrote:
>>> On 08.11.2017 at 19:07, Dominik Inführ wrote:
>>>
>>> I was pondering the idea to squeeze the fwd ptr into the so-called
>>> Klass-gap. This is 32 unused bits when the Klass* is compressed. It's
>>> only available for non-arrays, because for arrays, the array-length is
>>> squeezed into those 32 bits.
>>
>>
>> A possibly stupid question, but shouldn't it be the other way round?
>>
>> Currently, array length gets packed in a gap and you're thinking about
>> using the gap -- when available -- for the fwd ptr. This sounds slow and
>> complicated.
>>
>> Can't you instead *always* use this gap for non-arrays and use a new slot
>> for the array length? This saves memory for non-arrays in exactly the same
>> way and needs no new conditional logic (I guess arrays can already deal
>> with the case where they need a new slot for their length).
> 
> Yes, this sounds like an attractive possibility. :-)

Certainly.  That negative index thing on the read barrier side is
rather icky.
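
For context, a minimal sketch of the "negative index" being referred
to: the forwarding pointer sits in the word just before the object, so
the read barrier loads from a negative offset. The function names and
the use of raw void* slots are assumptions for illustration, not
Shenandoah's code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical layout: [fwd ptr][object payload ...];
// a reference points at the payload, not at the fwd ptr.
inline void* read_barrier_sketch(void* obj) {
  // Load the word stored immediately before the object.
  char* fwd_slot = static_cast<char*>(obj) - sizeof(void*);
  return *reinterpret_cast<void**>(fwd_slot);
}

// Builds two fake objects in one buffer and forwards the first to the second.
inline bool read_barrier_demo() {
  void* heap[4] = {nullptr, nullptr, nullptr, nullptr};
  void* from = &heap[1];  // object A; its fwd slot is heap[0]
  void* to   = &heap[3];  // object B (the copy); its fwd slot is heap[2]
  heap[0] = from;         // initially A forwards to itself
  heap[2] = to;
  bool self_forwarded = read_barrier_sketch(from) == from;
  heap[0] = to;           // after evacuation A forwards to B
  return self_forwarded && read_barrier_sketch(from) == to;
}
```

With a forwarding slot inside the header, the same load would use a
small positive offset instead.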

-- 
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From aph at redhat.com  Tue Jan  2 14:21:30 2018
From: aph at redhat.com (Andrew Haley)
Date: Tue, 2 Jan 2018 14:21:30 +0000
Subject: RFR: Check BS type in immByteMapBase predicate
In-Reply-To: <07ece366-e8b1-96f5-2539-cbe07edd8a6d@redhat.com>
References: <66f43903-b1ed-a8e5-0283-91afc81a5222@redhat.com>
 <e6bf9d8b-72de-c062-f277-1b2def61cb25@redhat.com>
 <5d41f45a-aca8-c74c-a4dd-37e327b586d3@kennke.org>
 <07ece366-e8b1-96f5-2539-cbe07edd8a6d@redhat.com>
Message-ID: <29ec0b2c-e74d-3421-6051-f5675efe98fb@redhat.com>

On 05/12/17 12:19, Aleksey Shipilev wrote:
> On 12/05/2017 01:11 PM, Roman Kennke wrote:
>> On 05.12.2017 at 11:55, Aleksey Shipilev wrote:
>>> On 12/05/2017 11:50 AM, Roman Kennke wrote:
>>> What would happen if code uses that operand, but the new predicate mismatches it (e.g. in Shenandoah)?
>> It cannot be used in Shenandoah because we don't use the CardTableModRefBS. Checking for the BS
>> type seems the safest way to prevent the bug.
> 
> Oh, okay.
> 
>>>> I intend to push backports of this to 9 and 8 too. Do I need extra reviews for those?
>>> Since this is not 9- or 8u-specific, I think you just push to sh/jdk10, and then regular backports
>>> process handles the propagation to sh/jdk9 and sh/jdk10.
>>
>> Ok.
> 
> This is okay to go to sh/jdk10. Can you give aarch64 maintainers a heads-up about this fix? It
> probably warrants the fix in upstream for other collectors' benefit, like Epsilon.

It looks OK.  If we have a bug report we can push it to all live
repos.

-- 
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From rkennke at redhat.com  Tue Jan  2 17:08:46 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Tue, 2 Jan 2018 18:08:46 +0100
Subject: RFR: Check BS type in immByteMapBase predicate
In-Reply-To: <29ec0b2c-e74d-3421-6051-f5675efe98fb@redhat.com>
References: <66f43903-b1ed-a8e5-0283-91afc81a5222@redhat.com>
 <e6bf9d8b-72de-c062-f277-1b2def61cb25@redhat.com>
 <5d41f45a-aca8-c74c-a4dd-37e327b586d3@kennke.org>
 <07ece366-e8b1-96f5-2539-cbe07edd8a6d@redhat.com>
 <29ec0b2c-e74d-3421-6051-f5675efe98fb@redhat.com>
Message-ID: <80ae30ef-b73c-af85-e193-f9a43c6b6764@redhat.com>

On 02.01.2018 at 15:21, Andrew Haley wrote:
> On 05/12/17 12:19, Aleksey Shipilev wrote:
>> On 12/05/2017 01:11 PM, Roman Kennke wrote:
>>> On 05.12.2017 at 11:55, Aleksey Shipilev wrote:
>>>> On 12/05/2017 11:50 AM, Roman Kennke wrote:
>>>> What would happen if code uses that operand, but the new predicate mismatches it (e.g. in Shenandoah)?
>>> It cannot be used in Shenandoah because we don't use the CardTableModRefBS. Checking for the BS
>>> type seems the safest way to prevent the bug.
>>
>> Oh, okay.
>>
>>>>> I intend to push backports of this to 9 and 8 too. Do I need extra reviews for those?
>>>> Since this is not 9- or 8u-specific, I think you just push to sh/jdk10, and then regular backports
>>>> process handles the propagation to sh/jdk9 and sh/jdk10.
>>>
>>> Ok.
>>
>> This is okay to go to sh/jdk10. Can you give aarch64 maintainers a heads-up about this fix? It
>> probably warrants the fix in upstream for other collectors' benefit, like Epsilon.
> 
> It looks OK.  If we have a bug report we can push it to all live
> repos.
> 

Thanks for sending a heads-up. The bug is here:
https://bugs.openjdk.java.net/browse/JDK-8193193

The review thread here:
http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2017-December/027858.html

And the commit is here now:
http://hg.openjdk.java.net/jdk/hs/rev/9ca19ebea22d

Thanks, Roman

From aph at redhat.com  Wed Jan  3 10:45:16 2018
From: aph at redhat.com (Andrew Haley)
Date: Wed, 3 Jan 2018 10:45:16 +0000
Subject: RFR: Missing enter/leave around keep_alive_barrier in AArch64
In-Reply-To: <19b41b84-c0bc-7ca8-ba95-2553fe5f0aad@redhat.com>
References: <19b41b84-c0bc-7ca8-ba95-2553fe5f0aad@redhat.com>
Message-ID: <346df7f3-a4b4-eb7e-8743-8fd6c3b90d5d@redhat.com>

On 07/12/17 13:18, Roman Kennke wrote:
> I've been missing enter/leave calls around the SATB pre barrier call in 
> MacroAssembler::keep_alive_barrier() for Shenandoah. This has been 
> sending EvilSyncBug (and possibly some other tests) into endless loops.
> 
> The cleanest place to have them is in the (only) user of it in 
> generate_Reference_get():
> 
> http://cr.openjdk.java.net/~rkennke/aarch64-enter-leave/webrev.00/
> 
> Test: EvilSyncBug terminates now (aarch64). Running other tests right now
> 
> Ok?

All this saving and restoring of registers looks fantastically inefficient.
Is it that this does not matter because it is very rare?

-- 
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From rkennke at redhat.com  Wed Jan  3 12:08:55 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 3 Jan 2018 13:08:55 +0100
Subject: RFR: Missing enter/leave around keep_alive_barrier in AArch64
In-Reply-To: <346df7f3-a4b4-eb7e-8743-8fd6c3b90d5d@redhat.com>
References: <19b41b84-c0bc-7ca8-ba95-2553fe5f0aad@redhat.com>
 <346df7f3-a4b4-eb7e-8743-8fd6c3b90d5d@redhat.com>
Message-ID: <f181244b-979c-fb7c-356d-a887f139c6eb@redhat.com>

On 03.01.2018 at 11:45, Andrew Haley wrote:
> On 07/12/17 13:18, Roman Kennke wrote:
>> I've been missing enter/leave calls around the SATB pre barrier call in
>> MacroAssembler::keep_alive_barrier() for Shenandoah. This has been
>> sending EvilSyncBug (and possibly some other tests) into endless loops.
>>
>> The cleanest place to have them is in the (only) user of it in
>> generate_Reference_get():
>>
>> http://cr.openjdk.java.net/~rkennke/aarch64-enter-leave/webrev.00/
>>
>> Test: EvilSyncBug terminates now (aarch64). Running other tests right now
>>
>> Ok?
> 
> All this saving and restoring of registers looks fantastically inefficient.
> Is it that this does not matter because it is very rare?
> 

Are you referring to enter()/leave() around calling the 
keep-alive-barriers? I think this is ok: it only pushes/pops a stack 
frame, and it is only needed and done in the Reference_get() interpreter 
'intrinsic', because it doesn't have a stack frame on its own.

Roman

From aph at redhat.com  Wed Jan  3 12:16:45 2018
From: aph at redhat.com (Andrew Haley)
Date: Wed, 3 Jan 2018 12:16:45 +0000
Subject: RFR: Missing enter/leave around keep_alive_barrier in AArch64
In-Reply-To: <f181244b-979c-fb7c-356d-a887f139c6eb@redhat.com>
References: <19b41b84-c0bc-7ca8-ba95-2553fe5f0aad@redhat.com>
 <346df7f3-a4b4-eb7e-8743-8fd6c3b90d5d@redhat.com>
 <f181244b-979c-fb7c-356d-a887f139c6eb@redhat.com>
Message-ID: <7938b616-d257-eb96-2288-8e17ff83e4fe@redhat.com>

On 03/01/18 12:08, Roman Kennke wrote:
> Are you referring to enter()/leave() around calling the 
> keep-alive-barriers? I think this is ok: it only pushes/pops a stack 
> frame, and it is only needed and done in the Reference_get() interpreter 
> 'intrinsic', because it doesn't have a stack frame on its own.

I'm thinking about the SATB barrier.  Calling into the runtime
clobbers all call-clobbered registers, and that's a lot, just to
push one pointer onto a list.

-- 
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From rkennke at redhat.com  Wed Jan  3 12:34:10 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 3 Jan 2018 13:34:10 +0100
Subject: RFR: Missing enter/leave around keep_alive_barrier in AArch64
In-Reply-To: <7938b616-d257-eb96-2288-8e17ff83e4fe@redhat.com>
References: <19b41b84-c0bc-7ca8-ba95-2553fe5f0aad@redhat.com>
 <346df7f3-a4b4-eb7e-8743-8fd6c3b90d5d@redhat.com>
 <f181244b-979c-fb7c-356d-a887f139c6eb@redhat.com>
 <7938b616-d257-eb96-2288-8e17ff83e4fe@redhat.com>
Message-ID: <1aa12595-2582-e730-3614-e94851534284@redhat.com>

On 03.01.2018 at 13:16, Andrew Haley wrote:
> On 03/01/18 12:08, Roman Kennke wrote:
>> Are you referring to enter()/leave() around calling the
>> keep-alive-barriers? I think this is ok: it only pushes/pops a stack
>> frame, and it is only needed and done in the Reference_get() interpreter
>> 'intrinsic', because it doesn't have a stack frame on its own.
> 
> I'm thinking about the SATB barrier.  Calling into the runtime
> clobbers all call-clobbered registers, and that's a lot, just to
> push one pointer onto a list.
> 

Ah. This is OK. There is an assembly fast path that checks whether SATB
is active and, if it is, pushes the pointer onto the list. It calls into
the slow path/runtime only when the buffer is full, and only then does it
need to push/pop the registers. And only in interpreted code.
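
A rough C++ rendering of that fast path/slow path split, as a sketch
only. The type, its fields, the tiny buffer size and the null check are
assumptions for illustration, not the actual HotSpot SATB queue or the
generated assembly.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kSatbBufferSlots = 4;  // tiny, for illustration

// Hypothetical per-thread SATB queue; not the HotSpot data structure.
struct SatbQueueSketch {
  bool active = false;                   // "SATB active" flag
  std::size_t index = kSatbBufferSlots;  // counts down; 0 means the buffer is full
  void* buffer[kSatbBufferSlots] = {};
  int slow_path_calls = 0;

  // Slow path: stands in for the runtime call that hands off a full buffer.
  void flush_full_buffer() {
    ++slow_path_calls;
    index = kSatbBufferSlots;
  }

  // Pre-barrier: record the previous value of a reference field.
  void enqueue(void* previous_value) {
    if (!active) return;                    // fast exit: SATB is not active
    if (previous_value == nullptr) return;  // nothing to keep alive
    if (index == 0) flush_full_buffer();    // rare: buffer is full
    buffer[--index] = previous_value;       // common: a plain store
  }
};
```

Only flush_full_buffer() corresponds to the call that clobbers
call-clobbered registers; the common case is a flag test plus one store.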

Roman


From zgu at redhat.com  Wed Jan  3 18:29:26 2018
From: zgu at redhat.com (Zhengyu Gu)
Date: Wed, 3 Jan 2018 13:29:26 -0500
Subject: RFR: Minor cleanup, uses latest Atomic API
Message-ID: <5bc3c6bf-c222-b391-dda3-39f1d6a8a2a3@redhat.com>

Minor cleanup. Uses Atomic::sub() and Atomic::replace_if_null() APIs.
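
For readers unfamiliar with these two operations, rough std::atomic
analogs are sketched below. This is standard C++, not the HotSpot Atomic
class, and the helper names are made up for illustration; the webrev has
the actual changes.

```cpp
#include <atomic>
#include <cassert>

// Installs `value` only if `dest` currently holds nullptr.
// Returns true when this call performed the installation.
template <typename T>
bool replace_if_null_sketch(std::atomic<T*>& dest, T* value) {
  T* expected = nullptr;
  return dest.compare_exchange_strong(expected, value);
}

// Atomically subtracts `delta` and returns the updated value.
inline int sub_sketch(std::atomic<int>& dest, int delta) {
  return dest.fetch_sub(delta) - delta;
}
```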

Webrev: http://cr.openjdk.java.net/~zgu/shenandoah/atomic_cleanup/webrev.00/


Test:
   hotspot_gc_shenandoah (fastdebug + release)


Thanks,

-Zhengyu

From rkennke at redhat.com  Wed Jan  3 18:40:37 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 03 Jan 2018 19:40:37 +0100
Subject: RFR: Minor cleanup, uses latest Atomic API
In-Reply-To: <5bc3c6bf-c222-b391-dda3-39f1d6a8a2a3@redhat.com>
References: <5bc3c6bf-c222-b391-dda3-39f1d6a8a2a3@redhat.com>
Message-ID: <27EE6BAC-3C0C-4DF9-8663-EB540D4F604E@redhat.com>

Looks good! Thanks!

On 3 January 2018 19:29:26 CET, Zhengyu Gu <zgu at redhat.com> wrote:
>Minor cleanup. Uses Atomic::sub() and Atomic::replace_if_null() APIs.
>
>Webrev:
>http://cr.openjdk.java.net/~zgu/shenandoah/atomic_cleanup/webrev.00/
>
>
>Test:
>   hotspot_gc_shenandoah (fastdebug + release)
>
>
>Thanks,
>
>-Zhengyu

-- 
This message was sent from my Android device with K-9 Mail.

From zgu at redhat.com  Wed Jan  3 18:48:58 2018
From: zgu at redhat.com (zgu at redhat.com)
Date: Wed, 03 Jan 2018 18:48:58 +0000
Subject: hg: shenandoah/jdk10: Minor cleanup, uses latest Atomic API
Message-ID: <201801031848.w03ImxWc027341@aojmv0008.oracle.com>

Changeset: 1819ee64325f
Author:    zgu
Date:      2018-01-03 13:44 -0500
URL:       http://hg.openjdk.java.net/shenandoah/jdk10/rev/1819ee64325f

Minor cleanup, uses latest Atomic API

! src/hotspot/share/gc/shenandoah/shenandoahCodeRoots.hpp
! src/hotspot/share/gc/shenandoah/shenandoahStrDedupTable.cpp


From shade at redhat.com  Tue Jan  9 15:28:52 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Tue, 9 Jan 2018 16:28:52 +0100
Subject: RFR: Match barrier fastpath checks better
Message-ID: <ded2e6e0-be54-ece7-ca4b-a6f480deb1d0@redhat.com>

http://cr.openjdk.java.net/~shade/shenandoah/match-barrier-checks/webrev.01/
(Roland made the draft revision of this patch last year)

Current barrier fastpath checks the flags like this:

   0x0: movzbl 0x3d8(%r15),%r10d ; check evac-in-progress
  +0x8: test   %r10d,%r10d
  +0xB: jne    SLOW-PATH
 +0x11: ...

This wastes a register (%r10d above), which is bad when barriers are back-to-back and register
pressure is high. The fix trivially folds the checks against memory with byte-sized immediates into
cmpb, so the resulting code is register-less and shorter:

   0x0: cmpb   $0x0,0x3d8(%r15)
  +0x8: jne    SLOW-PATH
  +0xE: ...

This follows similar .ad patterns that fold particular cmp shapes, and the fix would be upstreamed
separately. We would like to have this in Shenandoah repos for more thorough testing. "Unsigned"
shape covers Shenandoah WB checks, and "signed" covers SATB checks. (Amusingly, this affects C2, but
not C1, which generates cmpb for cases like these.) We actually need only tests against zeroes, but
there is nothing that prevents us from checking against the entire range of bytes.

Regular benchmarks are affected very little, with some tiny improvements -- because barriers there
are already well-optimized. But in cases where barriers are not optimized(-able), the improvement is
substantial. For example, in recent SPSCQueue benchmarks [1], the score improved by around 50%.

Testing: hotspot_gc_shenandoah {fastdebug|release}, specjvm

Thanks,
-Aleksey

[1] http://cr.openjdk.java.net/~shade/shenandoah/jctools-QueueThroughputBackoffNone.txt


From rkennke at redhat.com  Tue Jan  9 15:57:05 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Tue, 9 Jan 2018 16:57:05 +0100
Subject: RFR: Match barrier fastpath checks better
In-Reply-To: <ded2e6e0-be54-ece7-ca4b-a6f480deb1d0@redhat.com>
References: <ded2e6e0-be54-ece7-ca4b-a6f480deb1d0@redhat.com>
Message-ID: <47f6a86d-b7f8-8a87-d359-93af58cd69de@redhat.com>

On 09.01.2018 at 16:28, Aleksey Shipilev wrote:
> http://cr.openjdk.java.net/~shade/shenandoah/match-barrier-checks/webrev.01/
> (Roland made the draft revision of this patch last year)
> 
> Current barrier fastpath checks the flags like this:
> 
>     0x0: movzbl 0x3d8(%r15),%r10d ; check evac-in-progress
>    +0x8: test   %r10d,%r10d
>    +0xB: jne    SLOW-PATH
>   +0x11: ...
> 
> This wastes a register (%r10d above), which is bad when barriers are back-to-back and register
> pressure is high. The fix trivially folds the checks against memory with byte-sized immediates into
> cmpb, so the resulting code is register-less and shorter:
> 
>     0x0: cmpb   $0x0,0x3d8(%r15)
>    +0x8: jne    SLOW-PATH
>    +0xE: ...
> 
> This follows similar .ad patterns that fold particular cmp shapes, and the fix would be upstreamed
> separately. We would like to have this in Shenandoah repos for more thorough testing. "Unsigned"
> shape covers Shenandoah WB checks, and "signed" covers SATB checks. (Amusingly, this affects C2, but
> not C1, which generates cmpb for cases like these.) We actually need only tests against zeroes, but
> there is nothing that prevents us from checking against the entire range of bytes.
>
> Regular benchmarks are affected very little, with some tiny improvements -- because barriers there
> are already well-optimized. But in cases where barriers are not optimized(-able), the improvement is
> substantial. For example, in recent SPSCQueue benchmarks [1], the score improved by around 50%.
> 
> Testing: hotspot_gc_shenandoah {fastdebug|release}, specjvm
> 
> Thanks,
> -Aleksey
> 
> [1] http://cr.openjdk.java.net/~shade/shenandoah/jctools-QueueThroughputBackoffNone.txt
> 

Looks good to me. Will test it later with traversal heuristics.

From rwestrel at redhat.com  Wed Jan 10 07:51:48 2018
From: rwestrel at redhat.com (Roland Westrelin)
Date: Wed, 10 Jan 2018 08:51:48 +0100
Subject: RFR: Match barrier fastpath checks better
In-Reply-To: <ded2e6e0-be54-ece7-ca4b-a6f480deb1d0@redhat.com>
References: <ded2e6e0-be54-ece7-ca4b-a6f480deb1d0@redhat.com>
Message-ID: <dk61siy2lt7.fsf@rwestrel.remote.csb>


> http://cr.openjdk.java.net/~shade/shenandoah/match-barrier-checks/webrev.01/

Good. Thanks for taking care of that.

Roland.

From ashipile at redhat.com  Wed Jan 10 09:25:51 2018
From: ashipile at redhat.com (ashipile at redhat.com)
Date: Wed, 10 Jan 2018 09:25:51 +0000
Subject: hg: shenandoah/jdk10: Match barrier fastpath checks better
Message-ID: <201801100925.w0A9PpjI015263@aojmv0008.oracle.com>

Changeset: 5eee46621175
Author:    shade
Date:      2018-01-09 16:05 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk10/rev/5eee46621175

Match barrier fastpath checks better

! src/hotspot/cpu/x86/x86_64.ad


From shade at redhat.com  Wed Jan 10 09:45:26 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 10 Jan 2018 10:45:26 +0100
Subject: Perf: SATB and WB coalescing
Message-ID: <5d967fd8-1cf3-d42d-6ad0-a3401fa23fbb@redhat.com>

If you do a few back-to-back reference stores, like this:

http://icedtea.classpath.org/hg/gc-bench/file/6ec38e1bea7a/src/main/java/org/openjdk/gcbench/wip/BarriersMultiple.java

Then you will find that WB coalescing breaks because of the SATB barriers in-between. See:

*) No WB, no SATB -> back-to-back stores:
  http://cr.openjdk.java.net/~shade/shenandoah/perf-wb-satb/noWB-noSATB.perfasm

*) WB, but no SATB -> initial evac-in-progress check, then back-to-back stores with RBs:
  http://cr.openjdk.java.net/~shade/shenandoah/perf-wb-satb/WB-noSATB.perfasm

*) WB with SATB -> interleaved evac-in-progress and conc-mark-in-progress checks:
  http://cr.openjdk.java.net/~shade/shenandoah/perf-wb-satb/WB-SATB.perfasm

It seems the impact of the non-coalesced SATB barriers alone is the culprit, and WB coalescing is
the second-order effect:

Benchmark                                 Mode  Cnt   Score    Error  Units

# Base
BarriersMultiple.test                     avgt   15   2.739 ±  0.003  ns/op
BarriersMultiple.test:L1-dcache-loads     avgt    3  13.128 ±  0.475   #/op
BarriersMultiple.test:L1-dcache-stores    avgt    3   8.103 ±  0.133   #/op
BarriersMultiple.test:branches            avgt    3   4.039 ±  0.213   #/op
BarriersMultiple.test:cycles              avgt    3  10.344 ±  0.413   #/op
BarriersMultiple.test:instructions        avgt    3  30.273 ±  1.280   #/op

# +WB
BarriersMultiple.test                     avgt   15   3.459 ±  0.011  ns/op
BarriersMultiple.test:L1-dcache-loads     avgt    3  19.195 ±  0.638   #/op // +6
BarriersMultiple.test:L1-dcache-stores    avgt    3   8.080 ±  0.539   #/op
BarriersMultiple.test:branches            avgt    3   4.045 ±  0.118   #/op
BarriersMultiple.test:cycles              avgt    3  13.031 ±  0.324   #/op // +3
BarriersMultiple.test:instructions        avgt    3  40.426 ±  1.133   #/op

# +SATB
BarriersMultiple.test                     avgt   15   3.620 ±  0.005  ns/op
BarriersMultiple.test:L1-dcache-loads     avgt    3  18.148 ±  0.519   #/op // +5
BarriersMultiple.test:L1-dcache-stores    avgt    3   8.065 ±  0.409   #/op
BarriersMultiple.test:branches            avgt    3  13.115 ±  0.423   #/op
BarriersMultiple.test:cycles              avgt    3  13.628 ±  0.471   #/op // +3.5
BarriersMultiple.test:instructions        avgt    3  49.421 ±  1.880   #/op

# +SATB +WB
BarriersMultiple.test                     avgt   15   4.923 ±  0.040  ns/op
BarriersMultiple.test:L1-dcache-loads     avgt    3  28.269 ±  1.519   #/op // +15 (should be +11)
BarriersMultiple.test:L1-dcache-stores    avgt    3   8.112 ±  1.161   #/op
BarriersMultiple.test:branches            avgt    3  13.134 ±  1.134   #/op
BarriersMultiple.test:cycles              avgt    3  18.561 ±  1.198   #/op // +8 (should be +6.5)
BarriersMultiple.test:instructions        avgt    3  56.577 ±  4.024   #/op

I wonder if that means we need to go forward with tracking the GC state in one single flag, and
polling it with different masks, then coalescing the paths when masks are similar?
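
A minimal sketch of what "one flag, different masks" could look like.
The bit assignments and function names are hypothetical, chosen only to
illustrate the idea:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit assignments for one shared GC state byte.
constexpr std::uint8_t kMarking    = 1u << 0;
constexpr std::uint8_t kEvacuation = 1u << 1;
constexpr std::uint8_t kUpdateRefs = 1u << 2;

// A barrier takes its slow path when any bit it cares about is set.
inline bool needs_slow_path(std::uint8_t gc_state, std::uint8_t mask) {
  return (gc_state & mask) != 0;
}

// Back-to-back SATB and WB checks could share one load and one test
// once their masks are merged.
inline bool store_needs_slow_path(std::uint8_t gc_state) {
  return needs_slow_path(
      gc_state, static_cast<std::uint8_t>(kMarking | kEvacuation));
}
```

With separate flags, each barrier needs its own load and branch; with
one byte, checks whose masks are compatible can be folded together.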

Thanks,
-Aleksey


From rkennke at redhat.com  Wed Jan 10 11:12:41 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 10 Jan 2018 12:12:41 +0100
Subject: Perf: SATB and WB coalescing
In-Reply-To: <5d967fd8-1cf3-d42d-6ad0-a3401fa23fbb@redhat.com>
References: <5d967fd8-1cf3-d42d-6ad0-a3401fa23fbb@redhat.com>
Message-ID: <4bde5ea8-4d85-ca74-dc76-6214f23d5823@redhat.com>

On 10.01.2018 at 10:45, Aleksey Shipilev wrote:
> If you do a few back-to-back reference stores, like this:
> 
> http://icedtea.classpath.org/hg/gc-bench/file/6ec38e1bea7a/src/main/java/org/openjdk/gcbench/wip/BarriersMultiple.java
> 
> Then you will find that WB coalescing breaks because of the SATB barriers in-between. See:
> 
> *) No WB, no SATB -> back-to-back stores:
>    http://cr.openjdk.java.net/~shade/shenandoah/perf-wb-satb/noWB-noSATB.perfasm
> 
> *) WB, but no SATB -> initial evac-in-progress check, then back-to-back stores with RBs:
>    http://cr.openjdk.java.net/~shade/shenandoah/perf-wb-satb/WB-noSATB.perfasm
> 
> *) WB with SATB -> interleaved evac-in-progress and conc-mark-in-progress checks:
>    http://cr.openjdk.java.net/~shade/shenandoah/perf-wb-satb/WB-SATB.perfasm
> 
> It seems the impact of the non-coalesced SATB barriers alone is the culprit, and WB coalescing is
> the second-order effect:
> 
> Benchmark                                 Mode  Cnt   Score    Error  Units
> 
> # Base
> BarriersMultiple.test                     avgt   15   2.739 ±  0.003  ns/op
> BarriersMultiple.test:L1-dcache-loads     avgt    3  13.128 ±  0.475   #/op
> BarriersMultiple.test:L1-dcache-stores    avgt    3   8.103 ±  0.133   #/op
> BarriersMultiple.test:branches            avgt    3   4.039 ±  0.213   #/op
> BarriersMultiple.test:cycles              avgt    3  10.344 ±  0.413   #/op
> BarriersMultiple.test:instructions        avgt    3  30.273 ±  1.280   #/op
>
> # +WB
> BarriersMultiple.test                     avgt   15   3.459 ±  0.011  ns/op
> BarriersMultiple.test:L1-dcache-loads     avgt    3  19.195 ±  0.638   #/op // +6
> BarriersMultiple.test:L1-dcache-stores    avgt    3   8.080 ±  0.539   #/op
> BarriersMultiple.test:branches            avgt    3   4.045 ±  0.118   #/op
> BarriersMultiple.test:cycles              avgt    3  13.031 ±  0.324   #/op // +3
> BarriersMultiple.test:instructions        avgt    3  40.426 ±  1.133   #/op
>
> # +SATB
> BarriersMultiple.test                     avgt   15   3.620 ±  0.005  ns/op
> BarriersMultiple.test:L1-dcache-loads     avgt    3  18.148 ±  0.519   #/op // +5
> BarriersMultiple.test:L1-dcache-stores    avgt    3   8.065 ±  0.409   #/op
> BarriersMultiple.test:branches            avgt    3  13.115 ±  0.423   #/op
> BarriersMultiple.test:cycles              avgt    3  13.628 ±  0.471   #/op // +3.5
> BarriersMultiple.test:instructions        avgt    3  49.421 ±  1.880   #/op
>
> # +SATB +WB
> BarriersMultiple.test                     avgt   15   4.923 ±  0.040  ns/op
> BarriersMultiple.test:L1-dcache-loads     avgt    3  28.269 ±  1.519   #/op // +15 (should be +11)
> BarriersMultiple.test:L1-dcache-stores    avgt    3   8.112 ±  1.161   #/op
> BarriersMultiple.test:branches            avgt    3  13.134 ±  1.134   #/op
> BarriersMultiple.test:cycles              avgt    3  18.561 ±  1.198   #/op // +8 (should be +6.5)
> BarriersMultiple.test:instructions        avgt    3  56.577 ±  4.024   #/op
> 
> I wonder if that means we need to go forward with tracking the GC state in one single flag, and
> polling it with different masks, then coalescing the paths when masks are similar?
> 
> Thanks,
> -Aleksey
> 

That confirms what I have suspected for a while. And I also sorta hope that
the traversal GC will solve it, because it only ever polls a single
flag. We might even want to wrap RBs into evac-flag-checks initially, so
that the optimizer can coalesce them too, and remove lone
evac-checks-around-RBs after optimization.

Another related issue may be that both the GC barriers and a bunch of
other stuff pollute the raw memory slice. This means that an
interleaving allocation (among other stuff) in between barriers may
prevent coalescing and optimization. I wonder if it makes sense to put
all GC barriers on a separate memory slice instead? We basically need a
memory slice that says 'stuff on this slice only ever changes at
safepoints'.

Roman



From shade at redhat.com  Wed Jan 10 11:16:37 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 10 Jan 2018 12:16:37 +0100
Subject: Perf: SATB and WB coalescing
In-Reply-To: <4bde5ea8-4d85-ca74-dc76-6214f23d5823@redhat.com>
References: <5d967fd8-1cf3-d42d-6ad0-a3401fa23fbb@redhat.com>
 <4bde5ea8-4d85-ca74-dc76-6214f23d5823@redhat.com>
Message-ID: <219f8962-fc13-765a-4239-c8f4b92bdf22@redhat.com>

On 01/10/2018 12:12 PM, Roman Kennke wrote:
> That confirms what I have suspected for a while. And I also sorta hope that the traversal GC will solve
> it, because it only ever polls a single flag. We might even want to wrap RBs into evac-flag-checks
> initially, so that the optimizer can coalesce them too, and remove lone evac-checks-around-RBs after
> optimization.

Let's not conflate this with traversal GC: flag handling and coalescing barriers is important even
for our regular cycle. So I'd rather improve that part of the story, and then build traversal GC on top.

Do you have a separate patch that introduces a single flag instead of the assortment of
{mark,evac,updaterefs}-in-progress and fixes all the uses around? That would be a base for further
compiler optimizations, I think.

Thanks,
-Aleksey


From rkennke at redhat.com  Wed Jan 10 11:20:25 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 10 Jan 2018 12:20:25 +0100
Subject: Perf: SATB and WB coalescing
In-Reply-To: <219f8962-fc13-765a-4239-c8f4b92bdf22@redhat.com>
References: <5d967fd8-1cf3-d42d-6ad0-a3401fa23fbb@redhat.com>
 <4bde5ea8-4d85-ca74-dc76-6214f23d5823@redhat.com>
 <219f8962-fc13-765a-4239-c8f4b92bdf22@redhat.com>
Message-ID: <59e36329-c6d2-0ce7-ba76-29319274072e@redhat.com>

On 10.01.2018 at 12:16, Aleksey Shipilev wrote:
> On 01/10/2018 12:12 PM, Roman Kennke wrote:
>> That confirms what I have suspected for a while. And I also sorta hope that the traversal GC will solve
>> it, because it only ever polls a single flag. We might even want to wrap RBs into evac-flag-checks
>> initially, so that the optimizer can coalesce them too, and remove lone evac-checks-around-RBs after
>> optimization.
>
> Let's not conflate this with traversal GC: flag handling and coalescing barriers is important even
> for our regular cycle. So I'd rather improve that part of the story, and then build traversal GC on top.
> 

Sure, that makes sense.

> Do you have a separate patch that introduces a single flag instead of the assortment of
> {mark,evac,updaterefs}-in-progress and fixes all the uses around? That would be a base for further
> compiler optimizations, I think.

No, for traversal GC I simply picked one flag (evac) and barriers use 
only that.

How would you use a single flag, if we need to check 2 or 3 different 
phases?

Roman

From aph at redhat.com  Wed Jan 10 11:22:05 2018
From: aph at redhat.com (Andrew Haley)
Date: Wed, 10 Jan 2018 11:22:05 +0000
Subject: Perf: SATB and WB coalescing
In-Reply-To: <4bde5ea8-4d85-ca74-dc76-6214f23d5823@redhat.com>
References: <5d967fd8-1cf3-d42d-6ad0-a3401fa23fbb@redhat.com>
 <4bde5ea8-4d85-ca74-dc76-6214f23d5823@redhat.com>
Message-ID: <9c5043ae-c437-af9e-649b-9ad46a19c4b8@redhat.com>

On 10/01/18 11:12, Roman Kennke wrote:
> We basically need a 
> memory slice that says 'stuff on this slice only ever changes at 
> safepoints'.

I need something very similar for unmappable ByteBuffers.

-- 
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From shade at redhat.com  Wed Jan 10 11:30:22 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 10 Jan 2018 12:30:22 +0100
Subject: Perf: SATB and WB coalescing
In-Reply-To: <59e36329-c6d2-0ce7-ba76-29319274072e@redhat.com>
References: <5d967fd8-1cf3-d42d-6ad0-a3401fa23fbb@redhat.com>
 <4bde5ea8-4d85-ca74-dc76-6214f23d5823@redhat.com>
 <219f8962-fc13-765a-4239-c8f4b92bdf22@redhat.com>
 <59e36329-c6d2-0ce7-ba76-29319274072e@redhat.com>
Message-ID: <c7d9ce14-cbde-8223-33f0-40b036595fcd@redhat.com>

On 01/10/2018 12:20 PM, Roman Kennke wrote:
> Am 10.01.2018 um 12:16 schrieb Aleksey Shipilev:
>> On 01/10/2018 12:12 PM, Roman Kennke wrote:
>>> That confirms what I suspected since a while. And I also sorta hope that the traversal GC will solve
>>> it, because it only ever polls a single flag. We might even want to wrap RBs into evac-flag-checks
>>> initially, so that the optimizer can coalesce them too, and remove lone evac-checks-around-RBs after
>>> optimization.
>>
>> Let's not conflate this with traversal GC: flag handling and coalescing barriers is important even
>> for our regular cycle. So I'd rather improve that part of the story, and then build traversal GC
>> on top.
>>
> 
> Sure, that makes sense.
> 
>> Do you have a separate patch that introduces a single flag instead of the assortment of
>> {mark,evac,updaterefs}-in-progress and fixes all the uses around? That would be a base for further
>> compiler optimizations, I think.
> 
> No, for traversal GC I simply picked one flag (evac) and barriers use only that.
> 
> How would you use a single flag, if we need to check 2 or 3 different phases?

Flag is int, and then bitmask it?

Something like:

  // Describes the current global GC state
  enum ShenandoahCollectorState {
    // Heap is not stable: there are forwarded objects.
    _heap_unstable,

    // Heap is under stabilization: do not introduce new forwarded objects.
    _heap_updating,

    // Heap is under evacuation: new forwarded objects are introduced.
    _heap_evacuating,

    // Heap is under marking.
    _heap_marking,
  };

  enum ShenandoahCollectorStateMask {
    _mask_heap_marking = 1 << _heap_marking,
    _mask_heap_evacuating = 1 << _heap_evacuating,
    ...
  };

Later:

  movptr(tmp, (intptr_t) ShenandoahHeap::gc_state_addr());
  movb(tmp, Address(tmp, 0));
  testb(tmp, ShenandoahHeap::_mask_heap_evacuating);

I think phases really have different bit patterns then:

  mark:                          _heap_marking
  mark + UR:                     _heap_marking + _heap_updating + _heap_unstable
  evac:                          _heap_evacuating + _heap_unstable
  update-refs:                   _heap_updating + _heap_unstable
  idle:                          0
  idle + waiting for mark to UR: _heap_unstable
  partial:                       _heap_evacuating + _heap_updating + _heap_unstable

Barriers mapping:
  RB, CAS, ACMP if _heap_unstable
  WB if _heap_evacuating
  SVRB if _heap_updating, but not _heap_evacuating
  SWRB if _heap_updating, and _heap_evacuating
  SATB if _heap_marking

Something like that...

Then a happy path in compiler-specialized code basically checks the GC state for 0, which means no
barriers are required whatsoever until the safepoint hits, or _heap_unstable, which means only RBs
are required on that path until the safepoint.
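
To make the phase table and the barrier mapping above concrete, here is a minimal
standalone sketch. The enum names follow this mail; the needs_*() helpers, the phase_*
constants and check_phase_table() are hypothetical and only illustrate the idea, they
are not taken from any patch:

```cpp
#include <cassert>
#include <cstdint>

// Bit positions of the global GC state (names as proposed in this mail).
enum ShenandoahCollectorState : uint8_t {
  _heap_unstable,    // there are forwarded objects
  _heap_updating,    // do not introduce new forwarded objects
  _heap_evacuating,  // new forwarded objects are introduced
  _heap_marking      // marking is in progress
};

enum ShenandoahCollectorStateMask : uint8_t {
  _mask_heap_unstable   = 1 << _heap_unstable,
  _mask_heap_updating   = 1 << _heap_updating,
  _mask_heap_evacuating = 1 << _heap_evacuating,
  _mask_heap_marking    = 1 << _heap_marking
};

// Hypothetical helpers: which barriers are required for a given state byte.
constexpr bool needs_rb(uint8_t s)   { return (s & _mask_heap_unstable) != 0; }
constexpr bool needs_wb(uint8_t s)   { return (s & _mask_heap_evacuating) != 0; }
constexpr bool needs_satb(uint8_t s) { return (s & _mask_heap_marking) != 0; }
constexpr bool needs_svrb(uint8_t s) {
  return (s & _mask_heap_updating) != 0 && (s & _mask_heap_evacuating) == 0;
}
constexpr bool needs_swrb(uint8_t s) {
  return (s & _mask_heap_updating) != 0 && (s & _mask_heap_evacuating) != 0;
}

// Some of the phase encodings from the table (hypothetical constant names).
constexpr uint8_t phase_idle        = 0;
constexpr uint8_t phase_mark        = _mask_heap_marking;
constexpr uint8_t phase_evac        = _mask_heap_evacuating | _mask_heap_unstable;
constexpr uint8_t phase_update_refs = _mask_heap_updating | _mask_heap_unstable;
constexpr uint8_t phase_partial     =
    _mask_heap_evacuating | _mask_heap_updating | _mask_heap_unstable;

// Happy path: a zero state byte means no barrier is required at all.
constexpr bool no_barriers_needed(uint8_t s) { return s == 0; }

// Spot-checks a few rows of the table against the barrier mapping.
inline void check_phase_table() {
  assert(no_barriers_needed(phase_idle));
  assert(needs_satb(phase_mark) && !needs_rb(phase_mark));
  assert(needs_wb(phase_evac) && needs_rb(phase_evac));
  assert(needs_svrb(phase_update_refs) && !needs_swrb(phase_update_refs));
  assert(needs_swrb(phase_partial) && !needs_svrb(phase_partial));
}
```

Each helper corresponds to a single "test mask" (or two) against the same byte, which is
what would let the compiler coalesce the checks.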

Thanks,
-Aleksey


From rkennke at redhat.com  Wed Jan 10 11:35:46 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 10 Jan 2018 12:35:46 +0100
Subject: Perf: SATB and WB coalescing
In-Reply-To: <c7d9ce14-cbde-8223-33f0-40b036595fcd@redhat.com>
References: <5d967fd8-1cf3-d42d-6ad0-a3401fa23fbb@redhat.com>
 <4bde5ea8-4d85-ca74-dc76-6214f23d5823@redhat.com>
 <219f8962-fc13-765a-4239-c8f4b92bdf22@redhat.com>
 <59e36329-c6d2-0ce7-ba76-29319274072e@redhat.com>
 <c7d9ce14-cbde-8223-33f0-40b036595fcd@redhat.com>
Message-ID: <93b69a79-a645-1bd0-bd0f-1fffbd32aaec@redhat.com>

Am 10.01.2018 um 12:30 schrieb Aleksey Shipilev:
> On 01/10/2018 12:20 PM, Roman Kennke wrote:
>> Am 10.01.2018 um 12:16 schrieb Aleksey Shipilev:
>>> On 01/10/2018 12:12 PM, Roman Kennke wrote:
>>>> That confirms what I suspected since a while. And I also sorta hope that the traversal GC will solve
>>>> it, because it only ever polls a single flag. We might even want to wrap RBs into evac-flag-checks
>>>> initially, so that the optimizer can coalesce them too, and remove lone evac-checks-around-RBs after
>>>> optimization.
>>>
>>> Let's not conflate this with traversal GC: flag handling and coalescing barriers is important even
>>> for our regular cycle. So I'd rather improve that part of the story, and then build traversal GC
>>> on top.
>>>
>>
>> Sure, that makes sense.
>>
>>> Do you have a separate patch that introduces a single flag instead of the assortment of
>>> {mark,evac,updaterefs}-in-progress and fixes all the uses around? That would be a base for further
>>> compiler optimizations, I think.
>>
>> No, for traversal GC I simply picked one flag (evac) and barriers use only that.
>>
>> How would you use a single flag, if we need to check 2 or 3 different phases?
> 
> Flag is int, and then bitmask it?
> 
> Something like:
> 
>    // Describes the current global GC state
>    enum ShenandoahCollectorState {
>      // Heap is not stable: there are forwarded objects.
>      _heap_unstable,
> 
>      // Heap is under stabilization: do not introduce new forwarded objects.
>      _heap_updating,
> 
>      // Heap is under evacuation: new forwarded objects are introduced.
>      _heap_evacuating,
> 
>      // Heap is under marking.
>      _heap_marking,
>    };
> 
>    enum ShenandoahCollectorStateMask {
>      _mask_heap_marking = 1 << _heap_marking,
>      _mask_heap_evacuating = 1 << _heap_evacuating,
>      ...
>    };
> 
> Later:
> 
>    movptr(tmp, (intptr_t) ShenandoahHeap::gc_state_addr());
>    movb(tmp, Address(tmp, 0));
>    testb(tmp, ShenandoahHeap::_mask_heap_evacuating);
> 
> I think phases really have different bit patterns then:
> 
>    mark:                          _heap_marking
>    mark + UR:                     _heap_marking + _heap_updating + _heap_unstable
>    evac:                          _heap_evacuating + _heap_unstable
>    update-refs:                   _heap_updating + _heap_unstable
>    idle:                          0
>    idle + waiting for mark to UR: _heap_unstable
>    partial:                       _heap_evacuating + _heap_updating + _heap_unstable
> 
> Barriers mapping:
>    RB, CAS, ACMP if _heap_unstable
>    WB if _heap_evacuating
>    SVRB if _heap_updating, but not _heap_evacuating
>    SWRB if _heap_updating, and _heap_evacuating
>    SATB if _heap_marking
> 
> Something like that...
> 
> Then a happy path in compiler-specialized code basically checks the GC state for 0, which means no
> barriers are required whatsoever until the safepoint hits, or _heap_unstable, which means only RBs
> are required on that path until the safepoint.
> 
> Thanks,
> -Aleksey
> 

Ah!
I made something like this a while ago, but it did not go in back then:
http://cr.openjdk.java.net/~rkennke/gc-phase-flag/webrev.01/

Roman


From shade at redhat.com  Wed Jan 10 11:43:10 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 10 Jan 2018 12:43:10 +0100
Subject: Perf: SATB and WB coalescing
In-Reply-To: <93b69a79-a645-1bd0-bd0f-1fffbd32aaec@redhat.com>
References: <5d967fd8-1cf3-d42d-6ad0-a3401fa23fbb@redhat.com>
 <4bde5ea8-4d85-ca74-dc76-6214f23d5823@redhat.com>
 <219f8962-fc13-765a-4239-c8f4b92bdf22@redhat.com>
 <59e36329-c6d2-0ce7-ba76-29319274072e@redhat.com>
 <c7d9ce14-cbde-8223-33f0-40b036595fcd@redhat.com>
 <93b69a79-a645-1bd0-bd0f-1fffbd32aaec@redhat.com>
Message-ID: <0acac6bd-fddc-3325-fe0d-8e60e466dbc3@redhat.com>

On 01/10/2018 12:35 PM, Roman Kennke wrote:
> Ah!
> I made something like this a while ago and it hasn't gone in back then:
> http://cr.openjdk.java.net/~rkennke/gc-phase-flag/webrev.01/

Okay! That looks like a good start. Now we "only" need to cover all other phases, and fix up the
codegen to make use of "test 0xOFF(TLS), mask". :)

I still think the phases themselves are inconvenient to encode, because they don't say everything
about the heap. For example, you would want to disambiguate the idle phase that has forwarded
objects waiting for CM-with-UR to fix stuff up, and idle phase where everything is fixed up. Maybe
just introducing separate "idle" and "idle-need-fixup" phases would be enough?

Then we can approach compiler checking for "idle" state, and optimize the happy path accordingly.

Thanks,
-Aleksey


From rkennke at redhat.com  Wed Jan 10 11:45:37 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 10 Jan 2018 12:45:37 +0100
Subject: RFR: Match barrier fastpath checks better
In-Reply-To: <ded2e6e0-be54-ece7-ca4b-a6f480deb1d0@redhat.com>
References: <ded2e6e0-be54-ece7-ca4b-a6f480deb1d0@redhat.com>
Message-ID: <36120d11-e479-e174-2529-d5d66f0b40d1@redhat.com>

Am 09.01.2018 um 16:28 schrieb Aleksey Shipilev:
> http://cr.openjdk.java.net/~shade/shenandoah/match-barrier-checks/webrev.01/
> (Roland made the draft revision of this patch last year)
> 
> Current barrier fastpath checks the flags like this:
> 
>     0x0: movzbl 0x3d8(%r15),%r10d ; check evac-in-progress
>    +0x8: test   %r10d,%r10d
>    +0xB: jne    SLOW-PATH
>   +0x11: ...
> 
> This wastes the register %reg, which is bad when barriers are back-to-back and register pressure is
> high. The fix trivially folds the checks against memory with byte-sized immediates with cmpb, so the
> resulting code is register-less and shorter:
> 
>     0x0: cmpb   $0x0,0x3d8(%r15)
>    +0x8: jne    SLOW-PATH
>    +0xE: ...
> 
> This follows similar .ad patterns that fold particular cmp shapes, and the fix would be upstreamed
> separately. We would like to have this in Shenandoah repos for more thorough testing. "Unsigned"
> shape covers Shenandoah WB checks, and "signed" covers SATB checks. (Amusingly, this affects C2, but
> not C1, which generates cmpb for cases like these.) We actually need only tests against zero-es, but
> there is nothing that prevents us to check for the entire range of bytes.
> 
> Regular benchmarks are affected very little, with some tiny improvements -- because barriers there
> are already well-optimized. But in cases where barriers are not optimized(-able), the improvement is
> substantial. For example, in recent SPSCQueue benchmarks [1], the score improved around +50%.
> 
> Testing: hotspot_gc_shenandoah {fastdebug|release}, specjvm
> 
> Thanks,
> -Aleksey
> 
> [1] http://cr.openjdk.java.net/~shade/shenandoah/jctools-QueueThroughputBackoffNone.txt
> 

I tested it with traversal GC. It works and doesn't crash. It doesn't 
seem faster. But traversal GC is handicapped anyway until we get some 
proper optimizations.

Roman


From shade at redhat.com  Wed Jan 10 12:08:40 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 10 Jan 2018 13:08:40 +0100
Subject: RFR: ShenandoahWriteBarrierRB flag to conditionally disable RB on WB
 fastpath
Message-ID: <f8b64e73-f5d6-9f82-35bb-959f9f56d308@redhat.com>

http://cr.openjdk.java.net/~shade/shenandoah/barrier-disable-wb-fastpath-rb/webrev.01/

I keep reimplementing this patch during performance investigations. It is sometimes useful to
dissect the WB cost by measuring whether the evac-in-progress check itself or the RB on the
fastpath is the reason for the performance penalty. The new flag allows us to do that.

Testing: hotspot_gc_shenandoah, eyeballing assembly

Thanks,
-Aleksey


From rkennke at redhat.com  Wed Jan 10 12:22:43 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 10 Jan 2018 13:22:43 +0100
Subject: Perf: SATB and WB coalescing
In-Reply-To: <4bde5ea8-4d85-ca74-dc76-6214f23d5823@redhat.com>
References: <5d967fd8-1cf3-d42d-6ad0-a3401fa23fbb@redhat.com>
 <4bde5ea8-4d85-ca74-dc76-6214f23d5823@redhat.com>
Message-ID: <91984108-4922-3756-93d5-e57bbd28fce4@redhat.com>


> That confirms what I suspected since a while. And I also sorta hope that 
> the traversal GC will solve it, because it only ever polls a single 
> flag. We might even want to wrap RBs into evac-flag-checks initially, so 
> that the optimizer can coalesce them too, and remove lone 
> evac-checks-around-RBs after optimization.
> 
> Another related issue may be that both the GC barriers and a bunch of 
> other stuff pollutes the raw memory slice. Which means that an 
> interleaving allocation (among other stuff) in between barriers may 
> prevent coalescing and optimization. I wonder if it makes sense to put 
> all GC barriers on a separate memory slice instead? We basically need a 
> memory slice that says 'stuff on this slice only ever changes at 
> safepoints'.

Allocations are probably a bad example, because allocations *can* 
trigger safepoints (on slowpath). Not sure if we could possibly generate 
barrier-free-paths on paths with allocations but without alloc-slow-paths?

A better example is indeed SATB barriers: they currently consume and 
produce the raw memory slice, which means that they disturb optimizations 
of other barriers. I.e. they force a re-load and re-check of the 
-in-progress-flags (and thus prevent coalescing them). As you noted, SATB 
barriers are particularly bad because they tend to interleave with RBs 
and WBs.

There are other things that produce raw memory but cannot cause a 
safepoint, and they would disturb us similarly (e.g. monitorexit).

Ideally, when the new GC interface arrives, we'll get to generate the 
whole blob for 'store-oop-to-heap' in which case we can generate one 
gc-phase-check to begin with, and put all relevant barriers inside that 
check (...and still be subject to further coalescing, path-splitting 
and loop hoisting in later optimization phases).

Roman

From rkennke at redhat.com  Wed Jan 10 12:23:14 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 10 Jan 2018 13:23:14 +0100
Subject: RFR: ShenandoahWriteBarrierRB flag to conditionally disable RB on
 WB fastpath
In-Reply-To: <f8b64e73-f5d6-9f82-35bb-959f9f56d308@redhat.com>
References: <f8b64e73-f5d6-9f82-35bb-959f9f56d308@redhat.com>
Message-ID: <fad20bba-2134-831f-260d-f0f5f4ce51ef@redhat.com>

Am 10.01.2018 um 13:08 schrieb Aleksey Shipilev:
> http://cr.openjdk.java.net/~shade/shenandoah/barrier-disable-wb-fastpath-rb/webrev.01/
> 
> I keep reimplementing this patch during performance investigations. It is sometimes useful to
> dissect the WB cost by measuring if the evac-in-progress check itself, or the RB on the fastpath is
> the reason for performance penalty. New flag allows to do that.
> 
> Testing: hotspot_gc_shenandoah, eyeballing assembly
> 
> Thanks,
> -Aleksey
> 
Ok

From ashipile at redhat.com  Wed Jan 10 20:12:06 2018
From: ashipile at redhat.com (ashipile at redhat.com)
Date: Wed, 10 Jan 2018 20:12:06 +0000
Subject: hg: shenandoah/jdk10: ShenandoahWriteBarrierRB flag to conditionally
 disable RB on WB fastpath
Message-ID: <201801102012.w0AKC6EG020627@aojmv0008.oracle.com>

Changeset: 92710862e1a5
Author:    shade
Date:      2018-01-10 13:05 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk10/rev/92710862e1a5

ShenandoahWriteBarrierRB flag to conditionally disable RB on WB fastpath

! src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp
! src/hotspot/cpu/x86/macroAssembler_x86.cpp
! src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp
! src/hotspot/share/opto/shenandoahSupport.cpp


From shade at redhat.com  Wed Jan 10 20:29:16 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 10 Jan 2018 21:29:16 +0100
Subject: Perf: SATB and WB coalescing
In-Reply-To: <0acac6bd-fddc-3325-fe0d-8e60e466dbc3@redhat.com>
References: <5d967fd8-1cf3-d42d-6ad0-a3401fa23fbb@redhat.com>
 <4bde5ea8-4d85-ca74-dc76-6214f23d5823@redhat.com>
 <219f8962-fc13-765a-4239-c8f4b92bdf22@redhat.com>
 <59e36329-c6d2-0ce7-ba76-29319274072e@redhat.com>
 <c7d9ce14-cbde-8223-33f0-40b036595fcd@redhat.com>
 <93b69a79-a645-1bd0-bd0f-1fffbd32aaec@redhat.com>
 <0acac6bd-fddc-3325-fe0d-8e60e466dbc3@redhat.com>
Message-ID: <259d13bd-3543-2de2-028b-42d855797ffc@redhat.com>

On 01/10/2018 12:43 PM, Aleksey Shipilev wrote:
> On 01/10/2018 12:35 PM, Roman Kennke wrote:
>> Ah!
>> I made something like this a while ago and it hasn't gone in back then:
>> http://cr.openjdk.java.net/~rkennke/gc-phase-flag/webrev.01/
> 
> I still think the phases themselves are inconvenient to encode, because they don't say everything
> about the heap. For example, you would want to disambiguate the idle phase that has forwarded
> objects waiting for CM-with-UR to fix stuff up, and idle phase where everything is fixed up. Maybe
> just introducing separate "idle" and "idle-need-fixup" phases would be enough?

Ah, that is probably solved by treating need_update_refs specially.

> Then we can approach compiler checking for "idle" state, and optimize the happy path accordingly.

Okay, so the dirty patch for the idea:
  http://cr.openjdk.java.net/~shade/shenandoah/single-flag/webrev.00/

perfasm for the offending test:
  http://cr.openjdk.java.net/~shade/shenandoah/single-flag/single-flag.perfasm

Both SATB and WB are checking off the same TLS flag.

Now, two ideas:

 *) The way the patch is structured now, successful testb $0x0, 0x3d8(%r15) means no barriers are
required until the next safepoint poll (e.g. no marking, no evac, no update-refs, no partial, and
*no need to update refs*) -- which means the heap is as stable as it gets;

 *) Can we instruct compiler to trust the value of 0x3d8(%r15) until the next safepoint poll? I
think that would eliminate excessive L1 accesses for that TLS field at expense of wasting a register
-- which might be the lesser evil;

-Aleksey


From rkennke at redhat.com  Wed Jan 10 20:42:26 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 10 Jan 2018 21:42:26 +0100
Subject: Perf: SATB and WB coalescing
In-Reply-To: <259d13bd-3543-2de2-028b-42d855797ffc@redhat.com>
References: <5d967fd8-1cf3-d42d-6ad0-a3401fa23fbb@redhat.com>
 <4bde5ea8-4d85-ca74-dc76-6214f23d5823@redhat.com>
 <219f8962-fc13-765a-4239-c8f4b92bdf22@redhat.com>
 <59e36329-c6d2-0ce7-ba76-29319274072e@redhat.com>
 <c7d9ce14-cbde-8223-33f0-40b036595fcd@redhat.com>
 <93b69a79-a645-1bd0-bd0f-1fffbd32aaec@redhat.com>
 <0acac6bd-fddc-3325-fe0d-8e60e466dbc3@redhat.com>
 <259d13bd-3543-2de2-028b-42d855797ffc@redhat.com>
Message-ID: <55f7bf32-7bfd-4247-21ad-6a49ee87d728@redhat.com>

Am 10.01.2018 um 21:29 schrieb Aleksey Shipilev:
> On 01/10/2018 12:43 PM, Aleksey Shipilev wrote:
>> On 01/10/2018 12:35 PM, Roman Kennke wrote:
>>> Ah!
>>> I made something like this a while ago and it hasn't gone in back then:
>>> http://cr.openjdk.java.net/~rkennke/gc-phase-flag/webrev.01/
>>
>> I still think the phases themselves are inconvenient to encode, because they don't say everything
>> about the heap. For example, you would want to disambiguate the idle phase that has forwarded
>> objects waiting for CM-with-UR to fix stuff up, and idle phase where everything is fixed up. Maybe
>> just introducing separate "idle" and "idle-need-fixup" phases would be enough?
> 
> Ah, that is probably solved by treating need_update_refs specially.
> 
>> Then we can approach compiler checking for "idle" state, and optimize the happy path accordingly.
> 
> Okay, so the dirty patch for the idea:
>    http://cr.openjdk.java.net/~shade/shenandoah/single-flag/webrev.00/
> 
> perfasm for the offending test:
>    http://cr.openjdk.java.net/~shade/shenandoah/single-flag/single-flag.perfasm
> 
> Both SATB and WB are checking off the same TLS flag.
> 
> Now, two ideas:
> 
>   *) The way the patch is structured now, successful testb $0x0, 0x3d8(%r15) means no barriers are
> required until the next safepoint poll (e.g. no marking, no evac, no update-refs, no partial, and
> *no need to update refs*) -- which means the heap is as stable as it gets;
> 
>   *) Can we instruct compiler to trust the value of 0x3d8(%r15) until the next safepoint poll? I
> think that would eliminate excessive L1 accesses for that TLS field at expense of wasting a register
> -- which might be the lesser evil;
> 
> -Aleksey
> 

I have been discussing this with Roland since before Xmas. There seem to 
be ways to do that, but all of them are rather complex.

  This could lead to split-ifs and versioned-loops that generate code 
paths completely without barriers. E.g.: code shaped like this:

while (..) { // Assuming no SP inside loop
   if (evac-in-progress) {
     barrier();
   }
   store();
}

Could be:
if (evac-in-progress) {
   while (..) {
     barrier();
     store();
   }
} else {
   while (..) {
     store();
   }
}
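
As a rough standalone illustration of the two loop shapes above (not C2 code): the
Counters struct and both copy functions are hypothetical, and the flag is passed as a
plain bool to model "cannot change inside the loop because there is no safepoint":

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical bookkeeping so the two shapes can be compared.
struct Counters {
  std::size_t flag_checks = 0;  // how often evac-in-progress was tested
  std::size_t barriers = 0;     // how often the barrier was executed
};

// Original shape: the flag is checked on every iteration.
Counters copy_checked(bool evac_in_progress,
                      const std::vector<int>& src, std::vector<int>& dst) {
  assert(dst.size() == src.size());
  Counters c;
  for (std::size_t i = 0; i < src.size(); ++i) {
    ++c.flag_checks;
    if (evac_in_progress) {
      ++c.barriers;  // stands in for barrier()
    }
    dst[i] = src[i];  // stands in for store()
  }
  return c;
}

// Versioned shape: one check up front, then a barrier-full or barrier-free loop.
Counters copy_unswitched(bool evac_in_progress,
                         const std::vector<int>& src, std::vector<int>& dst) {
  assert(dst.size() == src.size());
  Counters c;
  ++c.flag_checks;
  if (evac_in_progress) {
    for (std::size_t i = 0; i < src.size(); ++i) {
      ++c.barriers;
      dst[i] = src[i];
    }
  } else {
    for (std::size_t i = 0; i < src.size(); ++i) {
      dst[i] = src[i];
    }
  }
  return c;
}
```

Both versions perform the same stores and the same number of barriers. The versioned one
tests the flag once instead of once per iteration, which is only valid as long as nothing
inside the loop can be a safepoint.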

Currently we also suffer from other problems: since all evac- and 
satb-checks consume the raw memory slice, and things like SATB barriers 
produce the raw memory slice (for no really good reason, except that we 
store to some non-Java memory), we constantly pollute raw memory, which 
leads the compiler to not trust the evac-flags across multiple barriers 
or other code that produces raw memory!


Roland proposed to implement compiler optimization passes that 
specifically optimize gc-phase-checks with respect to safepoints.

I was thinking in a different direction: we could introduce a new 
special memory slice, e.g. Compile::SafepointIdx, with the meaning 
'stuff on this slice only ever changes at safepoints'. I.e. any node 
that is a safepoint or could trigger a safepoint (e.g. calls, allocs, 
etc), would produce a new state on that slice. GC-phase-checks would 
consume it. This way, I think we could automatically get what we want by 
exploiting C2's memory aliasing model. According to Roland, this is not 
trivial either, though: currently SafepointNode (and its sub-classes) 
does not produce any memory state. This might need lots of work to get right.
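
A toy model of the difference, just to illustrate the intent. This is not C2 code: the
Node enum and flag_loads_needed() are hypothetical, and "cached" simply stands for "the
compiler may still reuse the previously loaded flag value":

```cpp
#include <cassert>
#include <vector>

// Hypothetical node kinds in a straight-line piece of code.
enum class Node { FlagCheck, SatbStore, Safepoint };

// Counts how many flag loads remain after commoning. With flag_on_raw_slice == true
// the SATB store clobbers the slice the flag lives on (the current situation); with
// false the flag lives on a dedicated slice that only safepoints produce.
int flag_loads_needed(const std::vector<Node>& nodes, bool flag_on_raw_slice) {
  bool cached = false;
  int loads = 0;
  for (Node n : nodes) {
    switch (n) {
      case Node::FlagCheck:
        if (!cached) {
          ++loads;
          cached = true;
        }
        break;
      case Node::SatbStore:
        if (flag_on_raw_slice) {
          cached = false;  // raw memory changed, the flag must be re-loaded
        }
        break;
      case Node::Safepoint:
        cached = false;  // the GC phase may change here in either model
        break;
    }
  }
  assert(loads <= static_cast<int>(nodes.size()));
  return loads;
}
```

For the sequence check, SATB store, check, SATB store, check, safepoint, check, the
raw-slice model needs four loads and the dedicated-slice model needs two.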

Roman

From zgu at redhat.com  Wed Jan 10 21:18:36 2018
From: zgu at redhat.com (Zhengyu Gu)
Date: Wed, 10 Jan 2018 16:18:36 -0500
Subject: RFR: [9 Backport] Shenandoah string deduplication support
Message-ID: <cae7b600-019e-60df-b376-430b12a57b5b@redhat.com>

This is jdk9 backport of latest string deduplication implementation.


Webrev: 
http://cr.openjdk.java.net/~zgu/shenandoah/sh_strdedup/backport_jdk9/webrev.00/

Test:
   hotspot_gc_shenandoah (release + fastdebug)


Thanks,

-Zhengyu

From shade at redhat.com  Wed Jan 10 21:24:55 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 10 Jan 2018 22:24:55 +0100
Subject: RFR: [9 Backport] Shenandoah string deduplication support
In-Reply-To: <cae7b600-019e-60df-b376-430b12a57b5b@redhat.com>
References: <cae7b600-019e-60df-b376-430b12a57b5b@redhat.com>
Message-ID: <734df127-06b4-d3c9-10ab-2d075bf5a41b@redhat.com>

On 01/10/2018 10:18 PM, Zhengyu Gu wrote:
> Webrev: http://cr.openjdk.java.net/~zgu/shenandoah/sh_strdedup/backport_jdk9/webrev.00/

Thank you, this looks good.

How much was changed compared to sh/jdk10, and are there places we should take a special look at?
I assume G1 changes are the retraction to the upstream jdk9 state?

Thanks,
-Aleksey


From zgu at redhat.com  Wed Jan 10 21:31:10 2018
From: zgu at redhat.com (Zhengyu Gu)
Date: Wed, 10 Jan 2018 16:31:10 -0500
Subject: RFR: [9 Backport] Shenandoah string deduplication support
In-Reply-To: <734df127-06b4-d3c9-10ab-2d075bf5a41b@redhat.com>
References: <cae7b600-019e-60df-b376-430b12a57b5b@redhat.com>
 <734df127-06b4-d3c9-10ab-2d075bf5a41b@redhat.com>
Message-ID: <211e8fe2-cffd-a713-a42c-c95452dce4ba@redhat.com>


On 01/10/2018 04:24 PM, Aleksey Shipilev wrote:
> On 01/10/2018 10:18 PM, Zhengyu Gu wrote:
>> Webrev: http://cr.openjdk.java.net/~zgu/shenandoah/sh_strdedup/backport_jdk9/webrev.00/
> 
> Thank you, this looks good.
> 
> How much was changed compared to sh/jdk10, and are there places we should take a special look at?
> I assume G1 changes are the retraction to the upstream jdk9 state?
> 

The jdk10 patch applied pretty cleanly. The only file that got mismerged was 
shenandoahRootProcessor.cpp.

Yes, the G1 changes revert earlier changes back to the upstream state.

Thanks,

-Zhengyu


> Thanks,
> -Aleksey
> 

From rkennke at redhat.com  Wed Jan 10 22:04:20 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 10 Jan 2018 23:04:20 +0100
Subject: RFR: Simplify and optimize ShenandoahHeap::requires_marking()
Message-ID: <a60662cb-0cc5-c36e-23ff-2ed7728f7e79@redhat.com>

When I implemented partial-GC, I complicated 
ShenandoahHeap::requires_marking(). This method basically decides which 
oops on SATB queues to keep and which to discard when filtering SATB 
queues (before processing them). The idea behind this was that we need 
all oops on the queues for partial-GC, but only not-yet-marked oops 
during concurrent marking.

Work on traversal GC led to a similar problem, and then it occurred to 
me that we don't need to complicate that code at all: during partial GC, 
we never use the bitmap. Simply returning !is_marked_next(obj) does the 
same as return true, and is probably. This should restore a little bit 
of performance of the regular Shenandoah mode, at the potential cost of 
a little bit of performance for partial GC. However, I am not even sure 
that this code is performance critical at all. I couldn't see any 
performance changes.
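
A toy sketch of why the simplification is safe, under the assumption stated above that
partial GC never sets bits in the next bitmap. ToyHeap, its object indices and
filter_satb() are hypothetical stand-ins, not the real ShenandoahHeap code:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the heap: objects are indices, the "next" bitmap is a
// vector<bool>.
struct ToyHeap {
  std::vector<bool> next_bitmap;

  explicit ToyHeap(std::size_t objects) : next_bitmap(objects, false) {}

  bool is_marked_next(std::size_t obj) const {
    assert(obj < next_bitmap.size());
    return next_bitmap[obj];
  }

  void mark_next(std::size_t obj) {
    assert(obj < next_bitmap.size());
    next_bitmap[obj] = true;
  }

  // Simplified predicate: keep only not-yet-marked objects.
  bool requires_marking(std::size_t obj) const { return !is_marked_next(obj); }
};

// Filters a SATB queue: keeps the entries that still require marking.
std::vector<std::size_t> filter_satb(const ToyHeap& heap,
                                     const std::vector<std::size_t>& queue) {
  std::vector<std::size_t> kept;
  for (std::size_t obj : queue) {
    if (heap.requires_marking(obj)) {
      kept.push_back(obj);
    }
  }
  return kept;
}
```

With an untouched bitmap (the partial GC case) the filter keeps every queue entry, which
is the same result as "return true". Once objects are marked (the concurrent marking
case), they are dropped from the queue.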

http://cr.openjdk.java.net/~rkennke/req-marking/webrev.00/

Testing: hotspot_gc_shenandoah

Ok to push?

Roman

From shade at redhat.com  Wed Jan 10 22:23:25 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 10 Jan 2018 23:23:25 +0100
Subject: RFR: [9 Backport] Shenandoah string deduplication support
In-Reply-To: <211e8fe2-cffd-a713-a42c-c95452dce4ba@redhat.com>
References: <cae7b600-019e-60df-b376-430b12a57b5b@redhat.com>
 <734df127-06b4-d3c9-10ab-2d075bf5a41b@redhat.com>
 <211e8fe2-cffd-a713-a42c-c95452dce4ba@redhat.com>
Message-ID: <91279636-d93e-7232-9384-467d93f04c4a@redhat.com>

On 01/10/2018 10:31 PM, Zhengyu Gu wrote:
> 
> On 01/10/2018 04:24 PM, Aleksey Shipilev wrote:
>> On 01/10/2018 10:18 PM, Zhengyu Gu wrote:
>>> Webrev: http://cr.openjdk.java.net/~zgu/shenandoah/sh_strdedup/backport_jdk9/webrev.00/
>>
>> Thank you, this looks good.
>>
>> How much was changed compared to sh/jdk10, and are there places we should take a special look at?
>> I assume G1 changes are the retraction to the upstream jdk9 state?
>>
> 
> jdk10 patch applied pretty clean. The only file got mismerged was shenandoahRootProcessor.cpp.
> 
> Yes, G1 changes reverted early changes back to upstream state.

Good for me then!

-Aleksey


From shade at redhat.com  Wed Jan 10 22:29:07 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 10 Jan 2018 23:29:07 +0100
Subject: RFR: Simplify and optimize ShenandoahHeap::requires_marking()
In-Reply-To: <a60662cb-0cc5-c36e-23ff-2ed7728f7e79@redhat.com>
References: <a60662cb-0cc5-c36e-23ff-2ed7728f7e79@redhat.com>
Message-ID: <14f0d81d-db04-6af4-3630-c5ddb631d9e0@redhat.com>

On 01/10/2018 11:04 PM, Roman Kennke wrote:
> When I implemented partial-GC, I complicated ShenandoahHeap::requires_marking(). This method
> basically decides which oops on SATB queues to keep and which to discard when filtering SATB queues
> (before processing them). The idea behind this was that we need all oops on the queues for
> partial-GC, but only not-yet-marked oops during concurrent marking.
> 
> Work on traversal GC lead to a similar problem, and then it occurred to me that we don't need to
> complicate that code at all: during partial GC, we never use the bitmap. Simply returning
> !is_marked_next(obj) does the same as return true, and is probably. This should restore a little bit
> of performance of the regular Shenandoah mode, at the potential cost of a little bit of performance
> for partial GC. However, I am not even sure that this code is performance critical at all. I
> couldn't see any performance changes.
> 
> http://cr.openjdk.java.net/~rkennke/req-marking/webrev.00/
> 
> Testing: hotspot_gc_shenandoah
> 
> Ok to push?

That makes sense. Looks good.

Thanks,
-Aleksey


From shade at redhat.com  Thu Jan 11 10:51:24 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Thu, 11 Jan 2018 11:51:24 +0100
Subject: Perf: SATB and WB coalescing
In-Reply-To: <259d13bd-3543-2de2-028b-42d855797ffc@redhat.com>
References: <5d967fd8-1cf3-d42d-6ad0-a3401fa23fbb@redhat.com>
 <4bde5ea8-4d85-ca74-dc76-6214f23d5823@redhat.com>
 <219f8962-fc13-765a-4239-c8f4b92bdf22@redhat.com>
 <59e36329-c6d2-0ce7-ba76-29319274072e@redhat.com>
 <c7d9ce14-cbde-8223-33f0-40b036595fcd@redhat.com>
 <93b69a79-a645-1bd0-bd0f-1fffbd32aaec@redhat.com>
 <0acac6bd-fddc-3325-fe0d-8e60e466dbc3@redhat.com>
 <259d13bd-3543-2de2-028b-42d855797ffc@redhat.com>
Message-ID: <d25f5b04-9d1f-e070-9c35-50702867e462@redhat.com>

On 01/10/2018 09:29 PM, Aleksey Shipilev wrote:
> Okay, so the dirty patch for the idea:
>   http://cr.openjdk.java.net/~shade/shenandoah/single-flag/webrev.00/
> 
> perfasm for the offending test:
>   http://cr.openjdk.java.net/~shade/shenandoah/single-flag/single-flag.perfasm
> 
>  *) Can we instruct compiler to trust the value of 0x3d8(%r15) until the next safepoint poll? I
> think that would eliminate excessive L1 accesses for that TLS field at expense of wasting a register
> -- which might be the lesser evil;

Hey, this one works with the dirty hack like this:
  http://cr.openjdk.java.net/~shade/shenandoah/perf-wb-satb/common-single-flag.patch

It now commons the GC state loads (and keeps the value in a register):
  http://cr.openjdk.java.net/~shade/shenandoah/perf-wb-satb/WB-SATB-commonTLS.perfasm

...and this eliminates around 8 L1 reads, which recovers about 50% of the overhead:

Benchmark                               Mode  Cnt   Score    Error  Units

# -WB -SATB
BarriersMultiple.test                   avgt   15   2.760 ±  0.081  ns/op
BarriersMultiple.test:L1-dcache-loads   avgt    3  13.121 ±  0.444   #/op
BarriersMultiple.test:L1-dcache-stores  avgt    3   8.089 ±  0.141   #/op
BarriersMultiple.test:branches          avgt    3   4.039 ±  0.220   #/op
BarriersMultiple.test:cycles            avgt    3  10.429 ±  2.041   #/op
BarriersMultiple.test:instructions      avgt    3  30.306 ±  2.414   #/op

# +WB +SATB
BarriersMultiple.test                   avgt   15   4.897 ±  0.003  ns/op
BarriersMultiple.test:L1-dcache-loads   avgt    3  28.195 ±  0.838   #/op
BarriersMultiple.test:L1-dcache-stores  avgt    3   8.102 ±  0.274   #/op
BarriersMultiple.test:branches          avgt    3  13.074 ±  0.344   #/op
BarriersMultiple.test:cycles            avgt    3  18.492 ±  2.365   #/op
BarriersMultiple.test:instructions      avgt    3  56.423 ±  1.681   #/op

# +WB +SATB +TLS commoning
BarriersMultiple.test                   avgt   15   3.884 ±  0.003  ns/op
BarriersMultiple.test:L1-dcache-loads   avgt    3  20.221 ±  0.602   #/op  // -8!
BarriersMultiple.test:L1-dcache-stores  avgt    3   8.093 ±  0.264   #/op
BarriersMultiple.test:branches          avgt    3  13.133 ±  0.395   #/op
BarriersMultiple.test:cycles            avgt    3  14.668 ±  0.771   #/op  // -4!
BarriersMultiple.test:instructions      avgt    3  58.636 ±  2.368   #/op
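
For reference, the "about 50%" figure can be recomputed from the scores above. The
helper below is a hypothetical one-liner, shown only to make the arithmetic explicit:

```cpp
#include <cassert>

// Fraction of the barrier overhead (with_barriers - baseline) that the optimized
// variant wins back.
double recovered_fraction(double baseline, double with_barriers, double optimized) {
  assert(with_barriers > baseline);
  return (with_barriers - optimized) / (with_barriers - baseline);
}
```

With the ns/op scores, recovered_fraction(2.760, 4.897, 3.884) is 1.013 / 2.137, or
roughly 0.47. The cycles counters (10.429, 18.492, 14.668) give about the same ratio.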


Thanks,
-Aleksey


From roman at kennke.org  Thu Jan 11 11:17:03 2018
From: roman at kennke.org (roman at kennke.org)
Date: Thu, 11 Jan 2018 11:17:03 +0000
Subject: hg: shenandoah/jdk10: Simplify and optimize
 ShenandoahHeap::requires_marking()
Message-ID: <201801111117.w0BBH3fb027627@aojmv0008.oracle.com>

Changeset: 2795496dbaf3
Author:    rkennke
Date:      2018-01-11 12:12 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk10/rev/2795496dbaf3

Simplify and optimize ShenandoahHeap::requires_marking()

! src/hotspot/share/gc/shenandoah/shenandoahHeap.inline.hpp
! src/hotspot/share/gc/shenandoah/shenandoahPartialGC.cpp


From rkennke at redhat.com  Thu Jan 11 11:19:50 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Thu, 11 Jan 2018 12:19:50 +0100
Subject: Perf: SATB and WB coalescing
In-Reply-To: <d25f5b04-9d1f-e070-9c35-50702867e462@redhat.com>
References: <5d967fd8-1cf3-d42d-6ad0-a3401fa23fbb@redhat.com>
 <4bde5ea8-4d85-ca74-dc76-6214f23d5823@redhat.com>
 <219f8962-fc13-765a-4239-c8f4b92bdf22@redhat.com>
 <59e36329-c6d2-0ce7-ba76-29319274072e@redhat.com>
 <c7d9ce14-cbde-8223-33f0-40b036595fcd@redhat.com>
 <93b69a79-a645-1bd0-bd0f-1fffbd32aaec@redhat.com>
 <0acac6bd-fddc-3325-fe0d-8e60e466dbc3@redhat.com>
 <259d13bd-3543-2de2-028b-42d855797ffc@redhat.com>
 <d25f5b04-9d1f-e070-9c35-50702867e462@redhat.com>
Message-ID: <cf1d2a99-f5a2-67fa-a5cc-4892ad3d9e2d@redhat.com>

On 11.01.2018 at 11:51, Aleksey Shipilev wrote:
> On 01/10/2018 09:29 PM, Aleksey Shipilev wrote:
>> Okay, so the dirty patch for the idea:
>>    http://cr.openjdk.java.net/~shade/shenandoah/single-flag/webrev.00/
>>

>> perfasm for the offending test:
>>    http://cr.openjdk.java.net/~shade/shenandoah/single-flag/single-flag.perfasm
>>
>>   *) Can we instruct compiler to trust the value of 0x3d8(%r15) until the next safepoint poll? I
>> think that would eliminate excessive L1 accesses for that TLS field at expense of wasting a register
>> -- which might be the lesser evil;
> 
> Hey, this one works with the dirty hack like this:
>    http://cr.openjdk.java.net/~shade/shenandoah/perf-wb-satb/common-single-flag.patch
> 
> It now commons the GC state loads (and keeps the value in a register):
>    http://cr.openjdk.java.net/~shade/shenandoah/perf-wb-satb/WB-SATB-commonTLS.perfasm
> 
> ...and this eliminates around 8 L1 reads, which recovers about 50% of the overhead:
> 
> Benchmark                               Mode  Cnt   Score    Error  Units
> 
> # -WB -SATB
> BarriersMultiple.test                   avgt   15   2.760 ±  0.081  ns/op
> BarriersMultiple.test:L1-dcache-loads   avgt    3  13.121 ±  0.444   #/op
> BarriersMultiple.test:L1-dcache-stores  avgt    3   8.089 ±  0.141   #/op
> BarriersMultiple.test:branches          avgt    3   4.039 ±  0.220   #/op
> BarriersMultiple.test:cycles            avgt    3  10.429 ±  2.041   #/op
> BarriersMultiple.test:instructions      avgt    3  30.306 ±  2.414   #/op
> 
> # +WB +SATB
> BarriersMultiple.test                   avgt   15   4.897 ±  0.003  ns/op
> BarriersMultiple.test:L1-dcache-loads   avgt    3  28.195 ±  0.838   #/op
> BarriersMultiple.test:L1-dcache-stores  avgt    3   8.102 ±  0.274   #/op
> BarriersMultiple.test:branches          avgt    3  13.074 ±  0.344   #/op
> BarriersMultiple.test:cycles            avgt    3  18.492 ±  2.365   #/op
> BarriersMultiple.test:instructions      avgt    3  56.423 ±  1.681   #/op
> 
> # +WB +SATB +TLS commoning
> BarriersMultiple.test                   avgt   15   3.884 ±  0.003  ns/op
> BarriersMultiple.test:L1-dcache-loads   avgt    3  20.221 ±  0.602   #/op  // -8!
> BarriersMultiple.test:L1-dcache-stores  avgt    3   8.093 ±  0.264   #/op
> BarriersMultiple.test:branches          avgt    3  13.133 ±  0.395   #/op
> BarriersMultiple.test:cycles            avgt    3  14.668 ±  0.771   #/op  // -4!
> BarriersMultiple.test:instructions      avgt    3  58.636 ±  2.368   #/op
> 
> 
> Thanks,
> -Aleksey
> 

Ok, this basically makes the load of the flag appear to access immutable 
memory, so it can now float freely above or below safepoints. We need to 
ensure that this cannot happen, otherwise we'll see the wrong flag 
state. But it seems to be step #1. Maybe restoring the control input on 
the LoadUBNode is enough to keep it on the right side of safepoints?
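
To make the hazard concrete, here is a minimal C++ sketch. It is a toy model, not
HotSpot code: ModelThread, safepoint_poll and the run_* helpers are invented for
illustration, and it assumes the GC only changes the flag while the thread is
stopped at a safepoint. Commoning the load saves the repeated reads of the TLS
field, but a value cached before a poll may be stale after it.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a mutator thread with a thread-local GC state byte.
struct ModelThread {
    uint8_t gc_state = 0;
    int flag_loads = 0;  // counts simulated loads of the TLS flag

    uint8_t load_gc_state() { ++flag_loads; return gc_state; }
};

// One "safepoint poll": in this model the GC may change the state only here.
inline void safepoint_poll(ModelThread& t, uint8_t new_state) {
    t.gc_state = new_state;
}

// Naive barriers: every barrier reloads the flag from the thread.
inline int run_naive(ModelThread& t, int barriers) {
    assert(barriers >= 0);
    int slow = 0;
    for (int i = 0; i < barriers; ++i) {
        if (t.load_gc_state() != 0) ++slow;
    }
    return slow;
}

// Commoned barriers: one load, reused until the next safepoint poll.
inline int run_commoned(ModelThread& t, int barriers) {
    assert(barriers >= 0);
    const uint8_t cached = t.load_gc_state();
    int slow = 0;
    for (int i = 0; i < barriers; ++i) {
        if (cached != 0) ++slow;
    }
    return slow;
}

// The hazard: a value cached before a poll can differ from the value after it.
inline bool cached_value_is_stale(ModelThread& t, uint8_t new_state) {
    const uint8_t cached = t.load_gc_state();
    safepoint_poll(t, new_state);
    return cached != t.load_gc_state();
}
```

In this model the commoned variant is only correct if no poll sits between the
load and its uses, which is the constraint the compiler would have to respect.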

Roman


From shade at redhat.com  Thu Jan 11 11:35:04 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Thu, 11 Jan 2018 12:35:04 +0100
Subject: Perf: SATB and WB coalescing
In-Reply-To: <cf1d2a99-f5a2-67fa-a5cc-4892ad3d9e2d@redhat.com>
References: <5d967fd8-1cf3-d42d-6ad0-a3401fa23fbb@redhat.com>
 <4bde5ea8-4d85-ca74-dc76-6214f23d5823@redhat.com>
 <219f8962-fc13-765a-4239-c8f4b92bdf22@redhat.com>
 <59e36329-c6d2-0ce7-ba76-29319274072e@redhat.com>
 <c7d9ce14-cbde-8223-33f0-40b036595fcd@redhat.com>
 <93b69a79-a645-1bd0-bd0f-1fffbd32aaec@redhat.com>
 <0acac6bd-fddc-3325-fe0d-8e60e466dbc3@redhat.com>
 <259d13bd-3543-2de2-028b-42d855797ffc@redhat.com>
 <d25f5b04-9d1f-e070-9c35-50702867e462@redhat.com>
 <cf1d2a99-f5a2-67fa-a5cc-4892ad3d9e2d@redhat.com>
Message-ID: <12e9d1a4-c039-9ab8-214f-205306759ad4@redhat.com>

On 01/11/2018 12:19 PM, Roman Kennke wrote:
> On 11.01.2018 at 11:51, Aleksey Shipilev wrote:
>> On 01/10/2018 09:29 PM, Aleksey Shipilev wrote:
>>> Okay, so the dirty patch for the idea:
>>>    http://cr.openjdk.java.net/~shade/shenandoah/single-flag/webrev.00/
>>>
> 
>>> perfasm for the offending test:
>>>    http://cr.openjdk.java.net/~shade/shenandoah/single-flag/single-flag.perfasm
>>>
>>>   *) Can we instruct compiler to trust the value of 0x3d8(%r15) until the next safepoint poll? I
>>> think that would eliminate excessive L1 accesses for that TLS field at expense of wasting a register
>>> -- which might be the lesser evil;
>>
>> Hey, this one works with the dirty hack like this:
>>    http://cr.openjdk.java.net/~shade/shenandoah/perf-wb-satb/common-single-flag.patch
>>
>> It now commons the GC state loads (and keeps the value in a register):
>>    http://cr.openjdk.java.net/~shade/shenandoah/perf-wb-satb/WB-SATB-commonTLS.perfasm
>>
> 
> Ok, this basically makes the load of the flag appear to access immutable memory. It can now
> basically freely float above or below safepoints. We need to ensure that this cannot happen,
> otherwise we'll see the wrong flag state. But it seems to be step #1. Maybe restore the control into
> the LoadUBNode is enough to keep it at the right side of safepoints?

That was basically a hack to see if the idea is profitable. It appears profitable. In addition to
that safepoint caveat, I had to disable WB coalescing, because the hack produces a broken graph
otherwise, and C2 asserts. Roland said he can sketch the real patch some time later. Meanwhile, I'll
go and prepare the base patch for single-flag that the TLS coalescing implicitly relies on. We can
try other hacks after the base patch is done, if Roland has no cycles to look at it.

-Aleksey


From rkennke at redhat.com  Thu Jan 11 11:51:58 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Thu, 11 Jan 2018 12:51:58 +0100
Subject: Perf: SATB and WB coalescing
In-Reply-To: <12e9d1a4-c039-9ab8-214f-205306759ad4@redhat.com>
References: <5d967fd8-1cf3-d42d-6ad0-a3401fa23fbb@redhat.com>
 <4bde5ea8-4d85-ca74-dc76-6214f23d5823@redhat.com>
 <219f8962-fc13-765a-4239-c8f4b92bdf22@redhat.com>
 <59e36329-c6d2-0ce7-ba76-29319274072e@redhat.com>
 <c7d9ce14-cbde-8223-33f0-40b036595fcd@redhat.com>
 <93b69a79-a645-1bd0-bd0f-1fffbd32aaec@redhat.com>
 <0acac6bd-fddc-3325-fe0d-8e60e466dbc3@redhat.com>
 <259d13bd-3543-2de2-028b-42d855797ffc@redhat.com>
 <d25f5b04-9d1f-e070-9c35-50702867e462@redhat.com>
 <cf1d2a99-f5a2-67fa-a5cc-4892ad3d9e2d@redhat.com>
 <12e9d1a4-c039-9ab8-214f-205306759ad4@redhat.com>
Message-ID: <2697760b-840b-5290-496e-084b589a9459@redhat.com>

On 11.01.2018 at 12:35, Aleksey Shipilev wrote:
> On 01/11/2018 12:19 PM, Roman Kennke wrote:
>> On 11.01.2018 at 11:51, Aleksey Shipilev wrote:
>>> On 01/10/2018 09:29 PM, Aleksey Shipilev wrote:
>>>> Okay, so the dirty patch for the idea:
>>>>     http://cr.openjdk.java.net/~shade/shenandoah/single-flag/webrev.00/
>>>>
>>
>>>> perfasm for the offending test:
>>>>     http://cr.openjdk.java.net/~shade/shenandoah/single-flag/single-flag.perfasm
>>>>
>>>>    *) Can we instruct compiler to trust the value of 0x3d8(%r15) until the next safepoint poll? I
>>>> think that would eliminate excessive L1 accesses for that TLS field at expense of wasting a register
>>>> -- which might be the lesser evil;
>>>
>>> Hey, this one works with the dirty hack like this:
>>>     http://cr.openjdk.java.net/~shade/shenandoah/perf-wb-satb/common-single-flag.patch
>>>
>>> It now commons the GC state loads (and keeps the value in a register):
>>>     http://cr.openjdk.java.net/~shade/shenandoah/perf-wb-satb/WB-SATB-commonTLS.perfasm
>>>
>>
>> Ok, this basically makes the load of the flag appear to access immutable memory. It can now
>> basically freely float above or below safepoints. We need to ensure that this cannot happen,
>> otherwise we'll see the wrong flag state. But it seems to be step #1. Maybe restore the control into
>> the LoadUBNode is enough to keep it at the right side of safepoints?
> 
> That was basically a hack to see if the idea is profitable. It appears profitable. In addition to
> that safepoint caveat, I had to disable WB coalescing, because the hack produces a broken graph
> otherwise, and C2 asserts. Roland said he can sketch the real patch some time later. Meanwhile, I'd
> go and prepare the base patch for single-flag that TLS coalescing thing implicitly relies on. We can
> try other hacks if Roland has no cycles to look at it, after the base patch is done.
> 
> -Aleksey
> 

Yeah ok. I tried your hack with traversal GC. It does work, and I think 
I see a small improvement, but I guess the disabled optimization 
offsets it a little.

I'll clean up the traversal GC and propose it soon-ish. It's not useful 
to have it wait in limbo until all possible optimizations are in place. 
Performance is already quite good (it exceeds default Shenandoah on 
some workloads, and loses on some others).

Thanks and cheers,
Roman

From zgu at redhat.com  Thu Jan 11 15:09:31 2018
From: zgu at redhat.com (Zhengyu Gu)
Date: Thu, 11 Jan 2018 10:09:31 -0500
Subject: RFR: [9 Backport] Shenandoah string deduplication support
In-Reply-To: <91279636-d93e-7232-9384-467d93f04c4a@redhat.com>
References: <cae7b600-019e-60df-b376-430b12a57b5b@redhat.com>
 <734df127-06b4-d3c9-10ab-2d075bf5a41b@redhat.com>
 <211e8fe2-cffd-a713-a42c-c95452dce4ba@redhat.com>
 <91279636-d93e-7232-9384-467d93f04c4a@redhat.com>
Message-ID: <47c6d051-8440-5af8-4961-e735e922f6c1@redhat.com>

Hi Roman,

Could you review this backport?

Thanks,

-Zhengyu

On 01/10/2018 05:23 PM, Aleksey Shipilev wrote:
> On 01/10/2018 10:31 PM, Zhengyu Gu wrote:
>>
>> On 01/10/2018 04:24 PM, Aleksey Shipilev wrote:
>>> On 01/10/2018 10:18 PM, Zhengyu Gu wrote:
>>>> Webrev: http://cr.openjdk.java.net/~zgu/shenandoah/sh_strdedup/backport_jdk9/webrev.00/
>>>
>>> Thank you, this looks good.
>>>
>>> How much was changed compared to sh/jdk10, and are there places we should take a special look at?
>>> I assume G1 changes are the retraction to the upstream jdk9 state?
>>>
>>
>> The jdk10 patch applied pretty cleanly. The only file that got mismerged was shenandoahRootProcessor.cpp.
>>
>> Yes, the G1 changes revert the earlier changes back to the upstream state.
> 
> Good for me then!
> 
> -Aleksey
> 
> 

From roman at kennke.org  Thu Jan 11 15:43:16 2018
From: roman at kennke.org (Roman Kennke)
Date: Thu, 11 Jan 2018 16:43:16 +0100
Subject: RFR: [9 Backport] Shenandoah string deduplication support
In-Reply-To: <47c6d051-8440-5af8-4961-e735e922f6c1@redhat.com>
References: <cae7b600-019e-60df-b376-430b12a57b5b@redhat.com>
 <734df127-06b4-d3c9-10ab-2d075bf5a41b@redhat.com>
 <211e8fe2-cffd-a713-a42c-c95452dce4ba@redhat.com>
 <91279636-d93e-7232-9384-467d93f04c4a@redhat.com>
 <47c6d051-8440-5af8-4961-e735e922f6c1@redhat.com>
Message-ID: <37D74C17-294C-4096-8C7C-B160417922A9@kennke.org>

Yes, in a few hours (I hope...)

On 11 January 2018 16:09:31 CET, Zhengyu Gu <zgu at redhat.com> wrote:
>Hi Roman,
>
>Could you review this backport?
>
>Thanks,
>
>-Zhengyu
>
>On 01/10/2018 05:23 PM, Aleksey Shipilev wrote:
>> On 01/10/2018 10:31 PM, Zhengyu Gu wrote:
>>>
>>> On 01/10/2018 04:24 PM, Aleksey Shipilev wrote:
>>>> On 01/10/2018 10:18 PM, Zhengyu Gu wrote:
>>>>> Webrev:
>http://cr.openjdk.java.net/~zgu/shenandoah/sh_strdedup/backport_jdk9/webrev.00/
>>>>
>>>> Thank you, this looks good.
>>>>
>>>> How much was changed compared to sh/jdk10, and are there places we
>should take a special look at?
>>>> I assume G1 changes are the retraction to the upstream jdk9 state?
>>>>
>>>
>>> The jdk10 patch applied pretty cleanly. The only file that got mismerged was
>shenandoahRootProcessor.cpp.
>>>
>>> Yes, the G1 changes revert the earlier changes back to the upstream state.
>> 
>> Good for me then!
>> 
>> -Aleksey
>> 
>> 

-- 
This message was sent from my Android device with K-9 Mail.

From rkennke at redhat.com  Thu Jan 11 20:59:29 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Thu, 11 Jan 2018 21:59:29 +0100
Subject: RFR: [9 Backport] Shenandoah string deduplication support
In-Reply-To: <47c6d051-8440-5af8-4961-e735e922f6c1@redhat.com>
References: <cae7b600-019e-60df-b376-430b12a57b5b@redhat.com>
 <734df127-06b4-d3c9-10ab-2d075bf5a41b@redhat.com>
 <211e8fe2-cffd-a713-a42c-c95452dce4ba@redhat.com>
 <91279636-d93e-7232-9384-467d93f04c4a@redhat.com>
 <47c6d051-8440-5af8-4961-e735e922f6c1@redhat.com>
Message-ID: <46d2f90a-29ff-a729-42f9-672f33e93568@redhat.com>

Looks good to me.

Thank you for doing this!

Roman

> Hi Roman,
> 
> Could you review this backport?
> 
> Thanks,
> 
> -Zhengyu
> 
> On 01/10/2018 05:23 PM, Aleksey Shipilev wrote:
>> On 01/10/2018 10:31 PM, Zhengyu Gu wrote:
>>>
>>> On 01/10/2018 04:24 PM, Aleksey Shipilev wrote:
>>>> On 01/10/2018 10:18 PM, Zhengyu Gu wrote:
>>>>> Webrev: 
>>>>> http://cr.openjdk.java.net/~zgu/shenandoah/sh_strdedup/backport_jdk9/webrev.00/ 
>>>>>
>>>>
>>>> Thank you, this looks good.
>>>>
>>>> How much was changed compared to sh/jdk10, and are there places we 
>>>> should take a special look at?
>>>> I assume G1 changes are the retraction to the upstream jdk9 state?
>>>>
>>>
>>> The jdk10 patch applied pretty cleanly. The only file that got mismerged was 
>>> shenandoahRootProcessor.cpp.
>>>
>>> Yes, the G1 changes revert the earlier changes back to the upstream state.
>>
>> Good for me then!
>>
>> -Aleksey
>>
>>


From zgu at redhat.com  Thu Jan 11 21:14:38 2018
From: zgu at redhat.com (zgu at redhat.com)
Date: Thu, 11 Jan 2018 21:14:38 +0000
Subject: hg: shenandoah/jdk9/hotspot: [Backport] Shenandoah string
 deduplication support
Message-ID: <201801112114.w0BLEc9f022904@aojmv0008.oracle.com>

Changeset: 917523f492d2
Author:    zgu
Date:      2018-01-11 16:10 -0500
URL:       http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/917523f492d2

[Backport] Shenandoah string deduplication support

! src/cpu/aarch64/vm/stubGenerator_aarch64.cpp
! src/cpu/x86/vm/stubGenerator_x86_64.cpp
! src/share/vm/classfile/stringTable.cpp
! src/share/vm/gc/g1/g1StringDedup.hpp
! src/share/vm/gc/g1/g1StringDedupQueue.cpp
! src/share/vm/gc/g1/g1StringDedupQueue.hpp
! src/share/vm/gc/g1/g1StringDedupTable.cpp
! src/share/vm/gc/g1/g1StringDedupThread.cpp
! src/share/vm/gc/g1/g1StringDedupThread.hpp
! src/share/vm/gc/shenandoah/shenandoahConcurrentMark.cpp
! src/share/vm/gc/shenandoah/shenandoahConcurrentMark.hpp
! src/share/vm/gc/shenandoah/shenandoahConcurrentMark.inline.hpp
! src/share/vm/gc/shenandoah/shenandoahHeap.cpp
! src/share/vm/gc/shenandoah/shenandoahHeap.inline.hpp
! src/share/vm/gc/shenandoah/shenandoahMarkCompact.cpp
! src/share/vm/gc/shenandoah/shenandoahOopClosures.hpp
! src/share/vm/gc/shenandoah/shenandoahOopClosures.inline.hpp
! src/share/vm/gc/shenandoah/shenandoahPhaseTimings.cpp
! src/share/vm/gc/shenandoah/shenandoahPhaseTimings.hpp
! src/share/vm/gc/shenandoah/shenandoahRootProcessor.cpp
+ src/share/vm/gc/shenandoah/shenandoahStrDedupQueue.cpp
+ src/share/vm/gc/shenandoah/shenandoahStrDedupQueue.hpp
+ src/share/vm/gc/shenandoah/shenandoahStrDedupQueue.inline.hpp
+ src/share/vm/gc/shenandoah/shenandoahStrDedupTable.cpp
+ src/share/vm/gc/shenandoah/shenandoahStrDedupTable.hpp
+ src/share/vm/gc/shenandoah/shenandoahStrDedupThread.cpp
+ src/share/vm/gc/shenandoah/shenandoahStrDedupThread.hpp
! src/share/vm/gc/shenandoah/shenandoahStringDedup.cpp
! src/share/vm/gc/shenandoah/shenandoahStringDedup.hpp
! src/share/vm/runtime/arguments.cpp
! src/share/vm/runtime/mutexLocker.cpp
! test/gc/shenandoah/ShenandoahStrDedupStress.java
! test/gc/shenandoah/TestShenandoahStrDedup.java


From shade at redhat.com  Fri Jan 12 11:11:49 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Fri, 12 Jan 2018 12:11:49 +0100
Subject: RFR: Single thread-local GC state flag for all barriers
Message-ID: <bff00cc7-a555-d24a-e0d4-469ff1d406e1@redhat.com>

http://cr.openjdk.java.net/~shade/shenandoah/single-flag/webrev.01/

Please review this carefully. I made the AArch64 change blindly, mirroring x86. Taking care of the SATB
check requires some specialization in g1_wb_pre, and the relevant compiler matching changes. This
will become convenient once we common the gc-state load between safepoints, because that would then
not touch the G1 SATB barriers.

Example disassembly (0x3d8(%r15) is our flag):
  http://cr.openjdk.java.net/~shade/shenandoah/single-flag/single-flag.perfasm

Testing: hotspot_gc_shenandoah, eyeballing generated code

Thanks,
-Aleksey


From rwestrel at redhat.com  Fri Jan 12 14:27:13 2018
From: rwestrel at redhat.com (Roland Westrelin)
Date: Fri, 12 Jan 2018 15:27:13 +0100
Subject: RFR: leverage profiling for tableswitch/lookup switch
Message-ID: <dk6vag7ywxq.fsf@rwestrel.remote.csb>


http://cr.openjdk.java.net/~roland/shenandoah/switch-profiling/webrev.00/

This change is independent of shenandoah but the plan is to have it bake
for a bit here before it's proposed upstream. This is a follow up to:

http://mail.openjdk.java.net/pipermail/shenandoah-dev/2017-December/004535.html

1) profile collection is fixed in C1
2) C2 uses profiling to set frequencies of the branches of the switch
3) the tree of choices is trimmed down (if some branches are never taken)
4) the backend uses frequencies from profiling so scheduling is
not messed up

We saw that not having 4) messes up loop strip mining.
3) totally flies with the microbenchmarks:

before with -XX:+UseShenandoahGC:
WriteBarrierTableSwitch.common      1000  avgt   15  1109.139 ±   9.030  ns/op
WriteBarrierTableSwitch.separate    1000  avgt   15  2383.219 ± 229.815  ns/op

after with -XX:+UseShenandoahGC:
WriteBarrierTableSwitch.common      1000  avgt   15  514.100 ± 20.067  ns/op
WriteBarrierTableSwitch.separate    1000  avgt   15  505.883 ± 14.498  ns/op

I have another patch coming that should help this microbenchmark when
more than one branch of the switch is taken.
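
As an illustration only, points 2) and 3) can be sketched in a few lines of C++.
This is not the C2 code: CaseProfile, WeightedCase and weigh_and_trim are
hypothetical names, and the even split used when no profile is available is an
assumed fallback. Counts become relative frequencies, and cases that were never
taken drop out of the decision tree.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-case profile record: match value, branch target, hit count.
struct CaseProfile { int match; int dest; long count; };

// Hypothetical result record: the same case with a relative frequency.
struct WeightedCase { int match; int dest; double freq; };

inline std::vector<WeightedCase> weigh_and_trim(const std::vector<CaseProfile>& cases) {
    long total = 0;
    for (const auto& c : cases) total += c.count;

    std::vector<WeightedCase> out;
    if (total == 0) {
        // No profile at all: assumed fallback, split the frequency evenly.
        for (const auto& c : cases) {
            out.push_back({c.match, c.dest, 1.0 / cases.size()});
        }
        return out;
    }
    for (const auto& c : cases) {
        if (c.count == 0) continue;  // never taken: trimmed from the tree
        out.push_back({c.match, c.dest, static_cast<double>(c.count) / total});
    }
    return out;
}
```

In a real compiler a trimmed case would still need a safe exit (for example a
deoptimization path) in case it is taken later; the sketch leaves that out.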

Roland.

From shade at redhat.com  Fri Jan 12 15:20:22 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Fri, 12 Jan 2018 16:20:22 +0100
Subject: RFR: leverage profiling for tableswitch/lookup switch
In-Reply-To: <dk6vag7ywxq.fsf@rwestrel.remote.csb>
References: <dk6vag7ywxq.fsf@rwestrel.remote.csb>
Message-ID: <e6babe0b-d0de-ec3c-450d-71bed5eba4dc@redhat.com>

On 01/12/2018 03:27 PM, Roland Westrelin wrote:
> http://cr.openjdk.java.net/~roland/shenandoah/switch-profiling/webrev.00/

I am not qualified to judge the intricate details of C2, so this is a cursory review:

*) Maybe we should guard the feature with a "chicken" diagnostic flag, like
ShenandoahTableSwitchProfiling? This would also mark the paths we would need to remove/refresh once
the change trickles down from upstream.

*) gcm.cpp, comment is outdated:
 1870     // Divide the frequency between all successors evenly

*) parse2.cpp, this one is just table[3*j + 0], etc:

 498       table[j+j+j+0] = iter().get_int_table(2+j+j);
 499       table[j+j+j+1] = iter().get_dest_table(2+j+j+1);
 500       table[j+j+j+2] = profile == NULL ? 1 : profile->count_at(j);

 527     jint match_int   = table[j+j+j+0];
 528     int  dest        = table[j+j+j+1];
 529     int  cnt         = table[j+j+j+2];

Thanks,
-Aleksey


From shade at redhat.com  Fri Jan 12 15:57:59 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Fri, 12 Jan 2018 16:57:59 +0100
Subject: RFR: Single thread-local GC state flag for all barriers
In-Reply-To: <bff00cc7-a555-d24a-e0d4-469ff1d406e1@redhat.com>
References: <bff00cc7-a555-d24a-e0d4-469ff1d406e1@redhat.com>
Message-ID: <a1c8e224-cf07-9f64-6854-1556850a02aa@redhat.com>

On 01/12/2018 12:11 PM, Aleksey Shipilev wrote:
> http://cr.openjdk.java.net/~shade/shenandoah/single-flag/webrev.01/
> 
> Please review this carefully. I did AArch64 change blindly, symmetric with x86. Taking care of SATB
> check requires some specialization in g1_wb_pre, and the relevant compiler matching changes. This
> would get convenient as we common the gc-state load between the safepoints, and that would not touch
> G1 SATB barriers then.
> 
> Example disassembly, (0x3d8(%r15) is our flag):
>   http://cr.openjdk.java.net/~shade/shenandoah/single-flag/single-flag.perfasm
> 
> Testing: hotspot_gc_shenandoah, eyeballing generated code

Renamed need_update_refs to is_unstable -- this captures the intent better, and updated comments a
little:
  http://cr.openjdk.java.net/~shade/shenandoah/single-flag/webrev.02/

Thanks,
-Aleksey


From rwestrel at redhat.com  Fri Jan 12 16:05:17 2018
From: rwestrel at redhat.com (Roland Westrelin)
Date: Fri, 12 Jan 2018 17:05:17 +0100
Subject: RFR: leverage profiling for tableswitch/lookup switch
In-Reply-To: <e6babe0b-d0de-ec3c-450d-71bed5eba4dc@redhat.com>
References: <dk6vag7ywxq.fsf@rwestrel.remote.csb>
 <e6babe0b-d0de-ec3c-450d-71bed5eba4dc@redhat.com>
Message-ID: <dk6mv1jysea.fsf@rwestrel.remote.csb>


> *) Maybe we should guard the feature with "chicken" diagnostic flag, like
> ShenandoahTableSwitchProfiling? This would also mark the paths we would need to remove/refresh once
> the change trickles down from upstream.

Sure but would you want all code paths that were changed to be guarded
by a new flag? That sounds a bit overkill to me.

> *) gcm.cpp, comment is outdated:
>  1870     // Divide the frequency between all successors evenly

Right.

> *) parse2.cpp, this one is just table[3*j + 0], etc:
>
>  498       table[j+j+j+0] = iter().get_int_table(2+j+j);
>  499       table[j+j+j+1] = iter().get_dest_table(2+j+j+1);
>  500       table[j+j+j+2] = profile == NULL ? 1 : profile->count_at(j);
>
>  527     jint match_int   = table[j+j+j+0];
>  528     int  dest        = table[j+j+j+1];
>  529     int  cnt         = table[j+j+j+2];

The original code insists on using j+j instead of 2*j, I suppose because
it emphasizes that each element is really a different field or something
like that. I followed along. Anyway I can change it if you like.

Roland.

From shade at redhat.com  Fri Jan 12 16:10:40 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Fri, 12 Jan 2018 17:10:40 +0100
Subject: RFR: leverage profiling for tableswitch/lookup switch
In-Reply-To: <dk6mv1jysea.fsf@rwestrel.remote.csb>
References: <dk6vag7ywxq.fsf@rwestrel.remote.csb>
 <e6babe0b-d0de-ec3c-450d-71bed5eba4dc@redhat.com>
 <dk6mv1jysea.fsf@rwestrel.remote.csb>
Message-ID: <56642b43-0540-7f35-65b8-37e51f5b6a80@redhat.com>

On 01/12/2018 05:05 PM, Roland Westrelin wrote:
> 
>> *) Maybe we should guard the feature with "chicken" diagnostic flag, like
>> ShenandoahTableSwitchProfiling? This would also mark the paths we would need to remove/refresh once
>> the change trickles down from upstream.
> 
> Sure but would you want all code paths that were changed to be guarded
> by a new flag? That sounds a bit overkill to me.

I guess guarding the entry points in LIRGenerator::do_TableSwitch, LIRGenerator::do_LookupSwitch and
Parse::create_jump_tables would be enough?


>> *) parse2.cpp, this one is just table[3*j + 0], etc:
>>
>>  498       table[j+j+j+0] = iter().get_int_table(2+j+j);
>>  499       table[j+j+j+1] = iter().get_dest_table(2+j+j+1);
>>  500       table[j+j+j+2] = profile == NULL ? 1 : profile->count_at(j);
>>
>>  527     jint match_int   = table[j+j+j+0];
>>  528     int  dest        = table[j+j+j+1];
>>  529     int  cnt         = table[j+j+j+2];
> 
> The original code insists on using j+j instead of 2*j, I suppose because
> it emphasizes that each element is really a different field or something
> like that. I followed along. Anyway I can change it if you like.

This looks like a plain Array-of-Structs, and the usual idiom is len*sIndex + off, so yeah, "j+j+j"
looks very odd. I suspect "j+j" was an attempt at micro-optimization? I would guess upstream would
suggest the same change.
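
For reference, a small C++ sketch of that idiom. The names (kMatch, kDest, kCount,
kStride, slot, set_case) are made up for illustration and are not from parse2.cpp;
the only assumption taken from the webrev is that each switch case occupies three
consecutive int slots.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical field offsets within one three-slot record, plus the stride.
enum : std::size_t { kMatch = 0, kDest = 1, kCount = 2, kStride = 3 };

// len*sIndex + off: index of a field of record j in the flat table.
inline std::size_t slot(std::size_t j, std::size_t field) {
    return kStride * j + field;  // same value as j+j+j+field
}

// Flat table with room for the given number of records, zero-initialized.
inline std::vector<int> make_table(std::size_t cases) {
    return std::vector<int>(kStride * cases, 0);
}

// Fill record j; mirrors the three stores in the reviewed code.
inline void set_case(std::vector<int>& table, std::size_t j,
                     int match, int dest, int count) {
    table[slot(j, kMatch)] = match;
    table[slot(j, kDest)]  = dest;
    table[slot(j, kCount)] = count;
}
```

Naming the stride and offsets keeps the record layout in one place, so adding a
fourth field later only changes the enum.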

Thanks,
-Aleksey


From rwestrel at redhat.com  Fri Jan 12 16:54:42 2018
From: rwestrel at redhat.com (Roland Westrelin)
Date: Fri, 12 Jan 2018 17:54:42 +0100
Subject: RFR: improve profiled predicates
Message-ID: <dk6k1wnyq3x.fsf@rwestrel.remote.csb>


http://cr.openjdk.java.net/~roland/shenandoah/improved-profiled-predicates/webrev.00/

This change should make profiled predicates more robust:

- they now handle more control flow constructs
- instead of bailing out when profiling is missing, they assume it's an
  untaken path which allows frequencies to be computed on all
  paths
- they now support profiling data from lookupswitch/tableswitch (on Jump nodes)
- if a profiled predicate traps, the trap is recorded separately from
  regular predicate traps. The fallback will then be to recompile
  without profiled predicates but with regular predicates (instead of
  disabling predicates entirely). That change requires changing how
  traps are recorded and that's why there are small changes spread over
  so many files.

Roland.

From rkennke at redhat.com  Fri Jan 12 22:06:51 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Fri, 12 Jan 2018 23:06:51 +0100
Subject: RFR: Single thread-local GC state flag for all barriers
In-Reply-To: <a1c8e224-cf07-9f64-6854-1556850a02aa@redhat.com>
References: <bff00cc7-a555-d24a-e0d4-469ff1d406e1@redhat.com>
 <a1c8e224-cf07-9f64-6854-1556850a02aa@redhat.com>
Message-ID: <b90d9156-029f-2c52-fdb0-ee76117af4c9@redhat.com>

On 12.01.2018 at 16:57, Aleksey Shipilev wrote:
> On 01/12/2018 12:11 PM, Aleksey Shipilev wrote:
>> http://cr.openjdk.java.net/~shade/shenandoah/single-flag/webrev.01/
>>
>> Please review this carefully. I did AArch64 change blindly, symmetric with x86. Taking care of SATB
>> check requires some specialization in g1_wb_pre, and the relevant compiler matching changes. This
>> would get convenient as we common the gc-state load between the safepoints, and that would not touch
>> G1 SATB barriers then.
>>
>> Example disassembly, (0x3d8(%r15) is our flag):
>>    http://cr.openjdk.java.net/~shade/shenandoah/single-flag/single-flag.perfasm
>>
>> Testing: hotspot_gc_shenandoah, eyeballing generated code
> 
> Renamed need_update_refs to is_unstable -- this captures the intent better, and updated comments a
> little:
>    http://cr.openjdk.java.net/~shade/shenandoah/single-flag/webrev.02/
> 
> Thanks,
> -Aleksey
> 
> 

This is great stuff!

One little note: traversal GC will only have one state, and thus doesn't 
need the masking. I need to see how that fits with this new code :-)

I only have fairly minor comments:

src/hotspot/cpu/x86/macroAssembler_x86.cpp:

    if (ShenandoahConditionalSATBBarrier) {
      Label done;
-    movptr(tmp, (intptr_t) ShenandoahHeap::concurrent_mark_in_progress_addr());
-    testb(Address(tmp, 0), 1);
+    movptr(tmp, (intptr_t) ShenandoahHeap::gc_state_addr());
+    testb(Address(tmp, 0), ShenandoahHeap::MARKING);

Can't we use the thread-local flag here? There are several occurrences of 
the pattern in that file.

src/hotspot/share/c1/c1_LIRGenerator.cpp:

same as above?

src/hotspot/share/opto/graphKit.cpp

same as above?

This stuff probably qualifies for a separate patch, as it diverges from 
previous code.

-------------

need_update_refs() -> is_unstable():

I don't think this captures the intent better. The former, I know right 
away what it means, the latter means I need to look up what 'unstable' 
means in our context.

---------------

C2 changes look ok from afar, but Roland should look at them too.

-------------

src/hotspot/share/runtime/thread.hpp:

+  // Support for Shenandoah barriers
+  static char _gc_state_global;
+  char _gc_state;

Little side-note: upstream got rid of the static stuff for SATB state 
and moved it into the collector. It's not merged into Shenandoah yet (we 
really need to update our codebase!!) Maybe we should do the same? Keep 
global state in ShenandoahHeap and ThreadLocal state in Thread, and 
nowhere else.
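
A rough C++ sketch of that split, as an assumption of how it could look rather
than the actual Shenandoah code: ModelHeap, MutatorThread, attach and
set_gc_state_mask are hypothetical names. The heap owns the global state and
pushes it into each thread's local copy, so barriers only ever read the
thread-local byte.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical mutator thread: holds only the thread-local copy of the state.
struct MutatorThread {
    uint8_t gc_state = 0;
};

// Hypothetical heap: owns the global state and propagates every change.
class ModelHeap {
    uint8_t gc_state_ = 0;
    std::vector<MutatorThread*> threads_;

public:
    // A newly attached thread starts with the current global state.
    void attach(MutatorThread* t) {
        t->gc_state = gc_state_;
        threads_.push_back(t);
    }

    // Set or clear one state bit, then update every thread-local copy.
    // In a real VM this would have to happen while threads are stopped.
    void set_gc_state_mask(uint8_t mask, bool value) {
        gc_state_ = value ? static_cast<uint8_t>(gc_state_ | mask)
                          : static_cast<uint8_t>(gc_state_ & ~mask);
        for (MutatorThread* t : threads_) t->gc_state = gc_state_;
    }

    uint8_t gc_state() const { return gc_state_; }
};
```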

-------------

AArch64 looks reasonable, but should probably be built and tested once? 
I don't have resources right now to do that though.


Everything else looks great!

From shade at redhat.com  Sat Jan 13 09:51:00 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Sat, 13 Jan 2018 10:51:00 +0100
Subject: Perf: WB without RB on fastpath
Message-ID: <821a291c-73d2-36d2-3f43-ee70b43ab0ae@redhat.com>

The single flag change opens up an interesting opportunity for us: we can check for the GC state to
be zero, and that means no barriers are required whatsoever. So, instead of doing:

     testb $0x4, 0x3d8(TLS)
     jnz EVAC-IN-PROGRESS
     mov %r, -0x8(%r)
DONE:
     ...
(later)
EVAC-IN-PROGRESS:
     <test against cset>
     <jump to slowpath>

...we can do:

     cmpb $0x0, 0x3d8(TLS)
     jne NON-STABLE-HEAP
DONE:
     ...
(later)
NON-STABLE-HEAP:
     test $0x4, 0x3d8(TLS)
     jz DONE
     <test against cset>
     <jump to slowpath>

So the fastpath is the same, we just test against a different value. The slowpath gets a bit slower. The
performance improvement can be estimated with passive, -XX:+ShWB and -XX:(+|-)ShWriteBarrierRB.
Overnight runs translate to:

Compiler.compiler: +1.0%
Compiler.sunflow:  +1.2%
Compress:          +2.6%
CryptoSignVerify:  +0.3%
MpegAudio:         +1.9%
ScimarkLU.large:   +4.8%
ScimarkLU.small:   +9.5%
XmlTransform:      +1.6%
XmlValidation:     +2.5%

...and no regressions!

Roman mentions separately that Traversal GC does not require RB at all on fastpath, which seems to
be a special case of this generic optimization.
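
The same decision logic as a minimal C++ sketch, for readers who prefer it to the
pseudo-assembly. Only the 0x4 mask comes from the listings above; the constant and
function names are hypothetical. Both shapes enter the cset check for exactly the
same states, the second one just answers the common "state is zero" case with a
single compare.

```cpp
#include <cassert>
#include <cstdint>

// Mask tested by `testb $0x4` in the listings; the name is an assumption.
constexpr uint8_t EVACUATION = 0x4;

// First shape: test the evacuation bit directly on the fast path.
inline bool enters_cset_check_bit_test(uint8_t gc_state) {
    return (gc_state & EVACUATION) != 0;
}

// Second shape: compare the whole byte against zero on the fast path,
// and test the evacuation bit only on the out-of-line path.
inline bool enters_cset_check_zero_test(uint8_t gc_state) {
    if (gc_state == 0) return false;      // cmpb $0x0: stable heap, no barrier work
    return (gc_state & EVACUATION) != 0;  // test $0x4 off the fast path
}

// True when the heap is stable, i.e. the fast path needs no barriers at all.
inline bool is_stable(uint8_t gc_state) {
    return gc_state == 0;
}
```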

Thanks,
-Aleksey


From shade at redhat.com  Mon Jan 15 11:05:02 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Mon, 15 Jan 2018 12:05:02 +0100
Subject: RFR: Single thread-local GC state flag for all barriers
In-Reply-To: <b90d9156-029f-2c52-fdb0-ee76117af4c9@redhat.com>
References: <bff00cc7-a555-d24a-e0d4-469ff1d406e1@redhat.com>
 <a1c8e224-cf07-9f64-6854-1556850a02aa@redhat.com>
 <b90d9156-029f-2c52-fdb0-ee76117af4c9@redhat.com>
Message-ID: <4c0cc38f-5a4c-a604-b574-20c3af9078ab@redhat.com>

Updated patch:
  http://cr.openjdk.java.net/~shade/shenandoah/single-flag/webrev.03/


On 01/12/2018 11:06 PM, Roman Kennke wrote:
> One little note: traversal GC will only have one state, and thus doesn't need the masking. I need to
> see how that fits with this new code :-)

So does partial now. You can add another constant to ShenandoahHeap::GCState, and act accordingly.

> I only have fairly minor comments:
> 
> src/hotspot/cpu/x86/macroAssembler_x86.cpp:
> 
>    if (ShenandoahConditionalSATBBarrier) {
>      Label done;
> -    movptr(tmp, (intptr_t) ShenandoahHeap::concurrent_mark_in_progress_addr());
> -    testb(Address(tmp, 0), 1);
> +    movptr(tmp, (intptr_t) ShenandoahHeap::gc_state_addr());
> +    testb(Address(tmp, 0), ShenandoahHeap::MARKING);
> 
> Can't we use the thread-local flag here? There are several occurrences of the pattern in that file.

We can use it here, fixed.

> src/hotspot/share/c1/c1_LIRGenerator.cpp:
> 
> same as above?

Not really: the TLS access is complicated there. WB does it differently: it is accessing TLS flags
after the actual lowering. Kept intact.

> src/hotspot/share/opto/graphKit.cpp
> 
> same as above?

Tried to, but the change gets uncomfortably large.


> need_update_refs() -> is_unstable():
> 
> I don't think this captures the intent better. The former, I know right away what it means, the
> latter means I need to look up what 'unstable' means in our context.

I still think need_update_refs is a bad name: it describes "what corrective action we should do",
not "what the heap state is". This gets awkward when we choose not to do RB when need_update_refs is
false. But I guess "is_unstable" is too generic, how about "has_forwarded_objects"? This clearly
states what the heap state is, and makes it trivial to comprehend.

> ---------------
> 
> C2 changes look ok from afar, but Roland should look at them too.
> 
> -------------

Roland, can you take a look?

> src/hotspot/share/runtime/thread.hpp:
> 
> +  // Support for Shenandoah barriers
> +  static char _gc_state_global;
> +  char _gc_state;
> 
> Little side-note: upstream got rid of the static stuff for SATB state and moved it into the
> collector. It's not merged into Shenandoah yet (we really need to update our codebase!!) Maybe we
> should do the same? Keep global state in ShenandoahHeap and ThreadLocal state in Thread, and nowhere
> else.

This patch follows what we have with evac_in_progress -- we have a static field in Thread; let's keep
it that way for the time being.

> AArch64 looks reasonable, but should probably be built and tested once? I don't have resources
> right now to do that though.

I have cross-compiled it to AArch64 and ran basic tests on RPi 3. It failed, and I discovered a few
AArch64-specific bugs in the patch. Fixed them, and the basic tests run fine now.


Thanks,
-Aleksey


From rkennke at redhat.com  Mon Jan 15 11:33:34 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Mon, 15 Jan 2018 12:33:34 +0100
Subject: RFR: Single thread-local GC state flag for all barriers
In-Reply-To: <4c0cc38f-5a4c-a604-b574-20c3af9078ab@redhat.com>
References: <bff00cc7-a555-d24a-e0d4-469ff1d406e1@redhat.com>
 <a1c8e224-cf07-9f64-6854-1556850a02aa@redhat.com>
 <b90d9156-029f-2c52-fdb0-ee76117af4c9@redhat.com>
 <4c0cc38f-5a4c-a604-b574-20c3af9078ab@redhat.com>
Message-ID: <017badf7-f155-7642-25eb-d673879deb65@redhat.com>

On 15.01.2018 at 12:05, Aleksey Shipilev wrote:
> Updated patch:
>    http://cr.openjdk.java.net/~shade/shenandoah/single-flag/webrev.03/
> 
> 
> On 01/12/2018 11:06 PM, Roman Kennke wrote:
>> One little note: traversal GC will only have one state, and thus doesn't need the masking. I need to
>> see how that fits with this new code :-)
> 
> So does partial now. You can add another constant to ShenandoahHeap::GCState, and act accordingly.

Yes. I believe traversal is still different: with partial we can fall 
back to regular Shenandoah (intermediate GC), and thus need to check 
different states, and thus need the masking check. With traversal we'd 
never do that, and thus can use a simple zero/not-zero check. I suspect 
this would be a tiny little bit faster. But I'll figure this out once 
your patch is in.
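
To make the distinction concrete, here is a minimal sketch of the two kinds of fast-path check. The bit assignments and the helper names are made up for illustration and are not taken from the patch; only the idea of a GCState bitmask with a MARKING bit comes from the discussion above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit layout; the real ShenandoahHeap::GCState values may differ.
enum GCState : uint8_t {
  HAS_FORWARDED = 1 << 0,
  MARKING       = 1 << 1,
  EVACUATION    = 1 << 2
};

// Masked check: the barrier fires only if one of its own phase bits is set.
// Needed when several phases share one state byte (regular/partial GC).
inline bool barrier_needed_masked(uint8_t gc_state, uint8_t mask) {
  assert(mask != 0 && "a barrier must test at least one bit");
  return (gc_state & mask) != 0;
}

// Zero/non-zero check: enough when the collector has a single active state,
// as suggested for traversal GC.
inline bool barrier_needed_any(uint8_t gc_state) {
  return gc_state != 0;
}
```

With the masked form, a SATB barrier stays dormant during evacuation; with the zero/non-zero form, any active state triggers it, which is only correct if there is just one such state.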

>> I only have fairly minor comments:
>>
>> src/hotspot/cpu/x86/macroAssembler_x86.cpp:
>>
>>    if (ShenandoahConditionalSATBBarrier) {
>>      Label done;
>> -    movptr(tmp, (intptr_t) ShenandoahHeap::concurrent_mark_in_progress_addr());
>> -    testb(Address(tmp, 0), 1);
>> +    movptr(tmp, (intptr_t) ShenandoahHeap::gc_state_addr());
>> +    testb(Address(tmp, 0), ShenandoahHeap::MARKING);
>>
>> Can't we use the thread-local flag here? There are several occurrences of the pattern in that file.
> 
> We can use it here, fixed.

Thanks!

>> src/hotspot/share/c1/c1_LIRGenerator.cpp:
>>
>> same as above?
> 
> Not really: the TLS access is complicated there. WB does it differently: it is accessing TLS flags
> after the actual lowering. Kept intact.

Ok.

>> src/hotspot/share/opto/graphKit.cpp
>>
>> same as above?
> 
> Tried to, but the change gets uncomfortably large.

Ok, that is fine. Should do that later. (But I suspect it only affects 
partial GC stuff, and is thus not high priority.)

>> need_update_refs() -> is_unstable():
>>
>> I don't think this captures the intent better. The former, I know right away what it means, the
>> latter means I need to look up what 'unstable' means in our context.
> 
> I still think need_update_refs is a bad name: it describes "what corrective action we should do",
> not "what the heap state is". This gets awkward when we choose not to do RB when need_update_refs is
> false. But I guess "is_unstable" is too generic, how about "has_forwarded_objects"? This clearly
> states what the heap state is, and makes it trivial to comprehend.

Very good!

> 
>> ---------------
>>
>> C2 changes look ok from afar, but Roland should look at them too.
>>
>> -------------
> 
> Roland, can you take a look?
> 
>> src/hotspot/share/runtime/thread.hpp:
>>
>> +  // Support for Shenandoah barriers
>> +  static char _gc_state_global;
>> +  char _gc_state;
>>
>> Little side-note: upstream got rid of the static stuff for SATB state and moved it into the
>> collector. It's not merged into Shenandoah yet (we really need to update our codebase!!) Maybe we
>> should do the same? Keep global state in ShenandoahHeap and ThreadLocal state in Thread, and nowhere
>> else.
> 
> This patch follows what we have with evac_in_progress -- we have static field in Thread, let's keep
> it that way for a time being.

Yes. Let's change this once the upstream stuff arrives, and make it 
consistent then. I actually see a chance to upstream our stuff, and make 
G1 use the generic flag, or maybe even generify it even more and make 
room for generic thread local GC data structures.

>> AArch64 looks reasonable, but should probably be built and tested once? I don't have resources
>> right now to do that though.
> I have cross-compiled it to AArch64 and ran basic tests on RPi 3. It failed, and I discovered a few
> AArch64-specific bugs in the patch. Fixed them, and the basic tests run fine now.

Very good. Patch looks ok for me now.

Roman


From shade at redhat.com  Mon Jan 15 12:23:19 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Mon, 15 Jan 2018 13:23:19 +0100
Subject: RFR: Common TLS access to GC state, where possible
Message-ID: <05a517dd-9293-b351-0406-8a6e7aa2ca3a@redhat.com>

http://cr.openjdk.java.net/~shade/shenandoah/c2-common-gc-state/webrev.01/
(The initial version of this patch was drafted by Roland)

This patch is based on the single GC state flag patch. That enables us to match that load at once, and
common all the loads of GC state between the safepoints, thus avoiding excess L1 cache accesses.
This covers the cases where we cannot move the barriers themselves, and thus improves the
worst-case scenario.

It sure helps targeted back-to-back store benchmarks:

Benchmark                                    Mode  Cnt   Score    Error  Units

# default
BarriersMultiple.test                        avgt   15   5.935 ±  0.003  ns/op
BarriersMultiple.test:L1-dcache-loads        avgt    3  35.420 ±  2.116   #/op
BarriersMultiple.test:L1-dcache-stores       avgt    3   9.082 ±  0.603   #/op
BarriersMultiple.test:branches               avgt    3  18.187 ±  1.005   #/op
BarriersMultiple.test:cycles                 avgt    3  22.401 ±  1.249   #/op
BarriersMultiple.test:instructions           avgt    3  83.810 ±  4.297   #/op

# -XX:+ShenandoahCommonGCStateLoads
BarriersMultiple.test                        avgt   15   5.392 ±  0.116  ns/op
BarriersMultiple.test:L1-dcache-loads        avgt    3  26.302 ±  0.456   #/op  // -9!
BarriersMultiple.test:L1-dcache-stores       avgt    3   9.078 ±  1.174   #/op
BarriersMultiple.test:branches               avgt    3  18.218 ±  0.092   #/op
BarriersMultiple.test:cycles                 avgt    3  20.368 ±  3.023   #/op  // -2
BarriersMultiple.test:instructions           avgt    3  86.984 ±  1.127   #/op

...but comes with a caveat: the increased register pressure (?) seems to penalize some of the
bigger workloads. To avoid bitrot, and get the matchers for GC state loads into our codebase, I
propose pushing this under a disabled experimental flag. A new test validates that the feature is not
completely broken.

Testing: hotspot_gc_shenandoah

Thanks,
-Aleksey


From rkennke at redhat.com  Mon Jan 15 12:27:56 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Mon, 15 Jan 2018 13:27:56 +0100
Subject: RFR: Common TLS access to GC state, where possible
In-Reply-To: <05a517dd-9293-b351-0406-8a6e7aa2ca3a@redhat.com>
References: <05a517dd-9293-b351-0406-8a6e7aa2ca3a@redhat.com>
Message-ID: <45ce3760-b6c1-5d31-bff6-52b41db0af99@redhat.com>

On 15.01.2018 at 13:23, Aleksey Shipilev wrote:
> http://cr.openjdk.java.net/~shade/shenandoah/c2-common-gc-state/webrev.01/
> (The initial version of this patch was drafted by Roland)
> 
> This patch bases on single GC state flag patch. This enables us to match that load at once, and
> common all the loads of GC state between the safepoints, thus avoiding excess L1 cache accesses.
> This covers for the cases where we cannot move the barriers themselves, and thus improves the
> worst-case scenario.
> 
> It sure helps targeted back-to-back store benchmarks:
> 
> Benchmark                                    Mode  Cnt   Score    Error  Units
> 
> # default
> BarriersMultiple.test                        avgt   15   5.935 ±  0.003  ns/op
> BarriersMultiple.test:L1-dcache-loads        avgt    3  35.420 ±  2.116   #/op
> BarriersMultiple.test:L1-dcache-stores       avgt    3   9.082 ±  0.603   #/op
> BarriersMultiple.test:branches               avgt    3  18.187 ±  1.005   #/op
> BarriersMultiple.test:cycles                 avgt    3  22.401 ±  1.249   #/op
> BarriersMultiple.test:instructions           avgt    3  83.810 ±  4.297   #/op
> 
> # -XX:+ShenandoahCommonGCStateLoads
> BarriersMultiple.test                        avgt   15   5.392 ±  0.116  ns/op
> BarriersMultiple.test:L1-dcache-loads        avgt    3  26.302 ±  0.456   #/op  // -9!
> BarriersMultiple.test:L1-dcache-stores       avgt    3   9.078 ±  1.174   #/op
> BarriersMultiple.test:branches               avgt    3  18.218 ±  0.092   #/op
> BarriersMultiple.test:cycles                 avgt    3  20.368 ±  3.023   #/op  // -2
> BarriersMultiple.test:instructions           avgt    3  86.984 ±  1.127   #/op
> 
> ...but comes with the caveat: the increased register pressure (?) seems to penalize some of the
> bigger workloads. To avoid bitrot, and get the matchers for GC state loads into our codebase, I
> propose pushing this under disabled experimental flag. New test validates the feature is not
> completely broken.
> 
> Testing: hotspot_gc_shenandoah
> 
> Thanks,
> -Aleksey
> 

I tried the initial Roland patch with traversal GC (against the then 
evac-in-progress flag), and have seen occurrences of back-to-back 
evac-load checks that have not been commoned. Roland is looking at it. 
I suggest holding it back at least until this is resolved or confirmed 
to be a separate issue.

Roman

From rkennke at redhat.com  Mon Jan 15 13:25:07 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Mon, 15 Jan 2018 14:25:07 +0100
Subject: RFR: Common TLS access to GC state, where possible
In-Reply-To: <05a517dd-9293-b351-0406-8a6e7aa2ca3a@redhat.com>
References: <05a517dd-9293-b351-0406-8a6e7aa2ca3a@redhat.com>
Message-ID: <4ac30fe4-c9f2-14a6-0e90-3365e272bfd5@redhat.com>

On 15.01.2018 at 13:23, Aleksey Shipilev wrote:
> http://cr.openjdk.java.net/~shade/shenandoah/c2-common-gc-state/webrev.01/
> (The initial version of this patch was drafted by Roland)
> 
> This patch bases on single GC state flag patch. This enables us to match that load at once, and
> common all the loads of GC state between the safepoints, thus avoiding excess L1 cache accesses.
> This covers for the cases where we cannot move the barriers themselves, and thus improves the
> worst-case scenario.
> 
> It sure helps targeted back-to-back store benchmarks:
> 
> Benchmark                                    Mode  Cnt   Score    Error  Units
> 
> # default
> BarriersMultiple.test                        avgt   15   5.935 ±  0.003  ns/op
> BarriersMultiple.test:L1-dcache-loads        avgt    3  35.420 ±  2.116   #/op
> BarriersMultiple.test:L1-dcache-stores       avgt    3   9.082 ±  0.603   #/op
> BarriersMultiple.test:branches               avgt    3  18.187 ±  1.005   #/op
> BarriersMultiple.test:cycles                 avgt    3  22.401 ±  1.249   #/op
> BarriersMultiple.test:instructions           avgt    3  83.810 ±  4.297   #/op
> 
> # -XX:+ShenandoahCommonGCStateLoads
> BarriersMultiple.test                        avgt   15   5.392 ±  0.116  ns/op
> BarriersMultiple.test:L1-dcache-loads        avgt    3  26.302 ±  0.456   #/op  // -9!
> BarriersMultiple.test:L1-dcache-stores       avgt    3   9.078 ±  1.174   #/op
> BarriersMultiple.test:branches               avgt    3  18.218 ±  0.092   #/op
> BarriersMultiple.test:cycles                 avgt    3  20.368 ±  3.023   #/op  // -2
> BarriersMultiple.test:instructions           avgt    3  86.984 ±  1.127   #/op
> 
> ...but comes with the caveat: the increased register pressure (?) seems to penalize some of the
> bigger workloads. To avoid bitrot, and get the matchers for GC state loads into our codebase, I
> propose pushing this under disabled experimental flag. New test validates the feature is not
> completely broken.
> 
> Testing: hotspot_gc_shenandoah
> 
> Thanks,
> -Aleksey
> 

Also, I am not sure if the patch already does it: what about also moving 
up the actual tests? And thus creating longer paths with/without 
barriers? I suspect it would be slightly trickier now because of the 
different masks that it needs to check? It might not be very useful with 
default heuristics because we tend to interleave different barriers 
(SATB vs. evac), but may be tremendously useful for traversal GC, where 
we only have one phase and can thus group all the barriers into one path 
(enqueue, WBs, *hopefully* even RBs and acmp barriers), and remain 
barrier-free in another?

Roman


From shade at redhat.com  Mon Jan 15 13:38:51 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Mon, 15 Jan 2018 14:38:51 +0100
Subject: RFR: Common TLS access to GC state, where possible
In-Reply-To: <45ce3760-b6c1-5d31-bff6-52b41db0af99@redhat.com>
References: <05a517dd-9293-b351-0406-8a6e7aa2ca3a@redhat.com>
 <45ce3760-b6c1-5d31-bff6-52b41db0af99@redhat.com>
Message-ID: <bc4a99d5-0d86-b9e4-c528-8503ad6b7250@redhat.com>

On 01/15/2018 01:27 PM, Roman Kennke wrote:
> On 15.01.2018 at 13:23, Aleksey Shipilev wrote:
> I tried the initial Roland patch with traversal GC (against the then evac-in-progress flag), and
> have seen occurrences of back-to-back evac-loads-checks that have not been common-ed. Roland is
> looking at it. I suggest to at least hold it back until this is resolved or confirmed to be a
> separate issue.

This is a separate issue, having nothing to do with barrier moves. This is about commoning the TLS
access, so that this:

 testb $0x2, 0x3d8(TLS)
 jne SLOW
 ...
 testb $0x2, 0x3d8(TLS)
 jne SLOW
 ...

becomes:

 mov 0x3d8(TLS), %r11
 and $0x2, %r11
 test %r11, %r11
 jne SLOW
 ...
 test %r11, %r11
 jne SLOW
 ...

...saving the TLS access on back-to-back barriers, which are dormant anyhow.
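
The same transformation can be sketched at source level. This is only an illustration, not the C2 implementation: FakeThread, its load counter and the function names are hypothetical, and the commoned variant is valid only as long as the state cannot change between the checks, i.e. there is no safepoint in between.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for a thread whose gc_state byte lives in thread-local storage.
// The load counter is purely illustrative instrumentation.
struct FakeThread {
  uint8_t gc_state;
  int loads;
  uint8_t load_gc_state() { ++loads; return gc_state; }
};

const uint8_t kMarkingMask = 0x2;  // same mask as in the listing above

// Uncommoned: each of the n back-to-back barriers re-reads the flag.
int slow_paths_uncommoned(FakeThread& t, int n) {
  assert(n >= 0);
  int slow = 0;
  for (int i = 0; i < n; i++) {
    if (t.load_gc_state() & kMarkingMask) slow++;  // testb $0x2, off(TLS)
  }
  return slow;
}

// Commoned: the flag is read and masked once, then only re-tested.
int slow_paths_commoned(FakeThread& t, int n) {
  assert(n >= 0);
  const uint8_t masked = static_cast<uint8_t>(t.load_gc_state() & kMarkingMask);
  int slow = 0;
  for (int i = 0; i < n; i++) {
    if (masked) slow++;                            // test %r11, %r11
  }
  return slow;
}
```

Both variants take the same branches; only the number of flag loads differs (n versus 1).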

> Also, I am not sure if the patch already does it: what about also moving up the actual tests? And
> thus creating longer paths with/without barriers? I suspect it would be slightly trickier now
> because of the different masks that it needs to check? It might not be very useful with default
> heuristics because we tend to interleave different barriers (SATB vs. evac), but may be
> tremendously useful for traversal GC, where we only have one phase and can thus group all the
> barriers into one path (enqueue, WBs, *hopefully* even RBs and acmp barriers), and remain
> barrier-free in another?

Let's have some perspective, and not put all our eggs in one basket, okay? This patch helps the
cases where (multiple) barriers cannot be optimized. It does not move the barriers around --
instead, it makes their fastpaths faster by not accessing the TLS every time.

The whole machinery actually helps both SATB and WB checks, because after the recent GC state change both SATB
and WB are checking against the same flag. It also aids future work, because it brings forward the
matchers for generic GC state loads, not only evac-in-progress loads. If you want to have the
barrier-free paths, you have to care about the generic GC state, not just evac-in-progress.

Please note the optimization is disabled by default, but we want the C2 scaffolding anyway.

-Aleksey


From rkennke at redhat.com  Mon Jan 15 14:07:51 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Mon, 15 Jan 2018 15:07:51 +0100
Subject: RFR: Common TLS access to GC state, where possible
In-Reply-To: <bc4a99d5-0d86-b9e4-c528-8503ad6b7250@redhat.com>
References: <05a517dd-9293-b351-0406-8a6e7aa2ca3a@redhat.com>
 <45ce3760-b6c1-5d31-bff6-52b41db0af99@redhat.com>
 <bc4a99d5-0d86-b9e4-c528-8503ad6b7250@redhat.com>
Message-ID: <456d81bb-c16c-9812-a6f5-3396e39daaec@redhat.com>

On 15.01.2018 at 14:38, Aleksey Shipilev wrote:
> On 01/15/2018 01:27 PM, Roman Kennke wrote:
>> On 15.01.2018 at 13:23, Aleksey Shipilev wrote:
>> I tried the initial Roland patch with traversal GC (against the then evac-in-progress flag), and
>> have seen occurrences of back-to-back evac-loads-checks that have not been common-ed. Roland is
>> looking at it. I suggest to at least hold it back until this is resolved or confirmed to be a
>> separate issue.
> 
> This is a separate issue, having nothing to do with barrier moves. This is about commoning the TLS
> access, so that this:
> 
>   testb $0x2, 0x3d8(TLS)
>   jne SLOW
>   ...
>   testb $0x2, 0x3d8(TLS)
>   jne SLOW
>   ...
> 
> becomes:
> 
>   mov 0x3d8(TLS), %r11
>   and $0x2, %r11
>   test %r11, %r11
>   jne SLOW
>   ...
>   test %r11, %r11
>   jne SLOW
>   ...
> 
> ...saving the TLS access on back-to-back barriers, which are dormant anyhow.

Yes, this is what I was talking about, and I have still seen exactly 
those patterns after Roland's patch (at least for some cases).

>> Also, I am not sure if the patch already does it: what about also moving up the actual tests? And
>> thus creating longer paths with/without barriers? I suspect it would be slightly trickier now
>> because of the different masks that it needs to check? It might not be very useful with default
>> heuristics because we tend to interleave different barriers (SATB vs. evac), but may be
>> tremendously useful for traversal GC, where we only have one phase and can thus group all the
>> barriers into one path (enqueue, WBs, *hopefully* even RBs and acmp barriers), and remain
>> barrier-free in another?
> 
> Let's have some perspective, and not put all our eggs in one basket, okay? This patch helps the
> cases where (multiple) barriers cannot be optimized. It does not move the barriers around --
> instead, it makes their fastpaths faster by not accessing the TLS every time.
> 
> The whole machinery actually helps both SATB and WB checks, because after recent GC state both SATB
> and WB are checking against the same flag. It also aids future work, because it brings forward the
> matchers for generic GC state loads, not only evac-in-progress loads. If you want to have the
> barrier-free paths, you have to care about the generic GC state, not just evac-in-progress.
> 
> Please note the optimization is disabled by default, but we want the C2 scaffolding anyway.

Ok. This is not so separate though. What I was suggesting in this last 
comment was to also common the actual checks, so your above example 
could become (assuming same flags):

    mov 0x3d8(TLS), %r11
    and $0x2, %r11
    test %r11, %r11
    jne SLOW
    ...

I am not (yet) suggesting to move any barriers around. All I care about 
for now is commoning the loads, and when that works, also commoning the 
tests. This alone should lead to nice groups of barriers under one 
flag-load-test, and a fast path without barriers. Or not?

Roman

From shade at redhat.com  Mon Jan 15 14:53:51 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Mon, 15 Jan 2018 15:53:51 +0100
Subject: RFR: Common TLS access to GC state, where possible
In-Reply-To: <456d81bb-c16c-9812-a6f5-3396e39daaec@redhat.com>
References: <05a517dd-9293-b351-0406-8a6e7aa2ca3a@redhat.com>
 <45ce3760-b6c1-5d31-bff6-52b41db0af99@redhat.com>
 <bc4a99d5-0d86-b9e4-c528-8503ad6b7250@redhat.com>
 <456d81bb-c16c-9812-a6f5-3396e39daaec@redhat.com>
Message-ID: <3abdbb2b-f6c0-1836-4c68-d68493b9bd6c@redhat.com>

On 01/15/2018 03:07 PM, Roman Kennke wrote:
> Ok. This is not so separate though. What I was suggesting in this last comment was to also common
> the actual checks, so your above example could become (assuming same flags):
> 
>    mov 0x3d8(TLS), %r11
>    and $0x2, %r11
>    test %r11, %r11
>    jne SLOW
>    ...
> 
> I am not (yet) suggesting to move any barriers around. All I care about for now is commoning the
> loads, and when that works, also commoning the tests. This alone should lead to nice groups of
> barriers under one flag-load-test, and a fast path without barriers. Or not?

It would, but it requires rewiring the control flow (that is what I meant by "moving the barriers",
probably confusingly), while this particular change just commons the accesses to the flag itself. In
my mind, this is orthogonal to rewiring the control flow, and it caters for cases where rewiring is
not possible due to structural reasons.

In other words, you want three things:
 a) Detect the GC state load;
 b) Common the GC state loads over multiple branches;
 c) Try to rewire branches so that huge happy paths are present under single branch;

The patch in this RFR does (a) [provides scaffolding] and (b) [experimentally, disabled by default,
as a proof-of-concept that such commoning is possible and available for performance testing if needed]. (b)
would work even if (c) is not possible in a particular case. It seems odd to wait for (c) before
pushing (a)+(b) out, right?
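
As a rough source-level picture of the difference between (b) and (c), consider a run of stores guarded by the same state bit. This is a sketch under assumptions, not what C2 does: Trace, the function names and the mask value are hypothetical, and the "barrier" is reduced to a counter.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

const uint8_t kStateMask = 0x2;  // hypothetical state bit guarding the barrier

struct Trace {
  int branches;  // number of gc-state branches executed
  int barriers;  // number of barrier slow paths executed
};

// (b) only: the state is loaded once, but every store keeps its own branch.
Trace run_commoned(uint8_t gc_state, std::vector<int>& heap, const std::vector<int>& vals) {
  assert(heap.size() == vals.size());
  Trace t = {0, 0};
  const bool active = (gc_state & kStateMask) != 0;
  for (std::size_t i = 0; i < vals.size(); i++) {
    t.branches++;
    if (active) t.barriers++;  // barrier slow path would go here
    heap[i] = vals[i];
  }
  return t;
}

// (b) + (c): one branch selects between a path with all barriers
// and a barrier-free "happy" path.
Trace run_rewired(uint8_t gc_state, std::vector<int>& heap, const std::vector<int>& vals) {
  assert(heap.size() == vals.size());
  Trace t = {1, 0};
  if ((gc_state & kStateMask) != 0) {
    for (std::size_t i = 0; i < vals.size(); i++) {
      t.barriers++;
      heap[i] = vals[i];
    }
  } else {
    for (std::size_t i = 0; i < vals.size(); i++) {
      heap[i] = vals[i];
    }
  }
  return t;
}
```

Both functions produce the same heap contents; the rewired one pays for a single branch but duplicates the code, which is why it depends on the control flow allowing such duplication.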

Thanks,
-Aleksey


From rwestrel at redhat.com  Mon Jan 15 15:44:36 2018
From: rwestrel at redhat.com (Roland Westrelin)
Date: Mon, 15 Jan 2018 16:44:36 +0100
Subject: RFR: Common TLS access to GC state, where possible
In-Reply-To: <05a517dd-9293-b351-0406-8a6e7aa2ca3a@redhat.com>
References: <05a517dd-9293-b351-0406-8a6e7aa2ca3a@redhat.com>
Message-ID: <dk6vag3xh23.fsf@rwestrel.remote.csb>


> http://cr.openjdk.java.net/~shade/shenandoah/c2-common-gc-state/webrev.01/

C2 code looks ok to me.

Roland.

From rwestrel at redhat.com  Mon Jan 15 16:32:41 2018
From: rwestrel at redhat.com (Roland Westrelin)
Date: Mon, 15 Jan 2018 17:32:41 +0100
Subject: RFR: Single thread-local GC state flag for all barriers
In-Reply-To: <4c0cc38f-5a4c-a604-b574-20c3af9078ab@redhat.com>
References: <bff00cc7-a555-d24a-e0d4-469ff1d406e1@redhat.com>
 <a1c8e224-cf07-9f64-6854-1556850a02aa@redhat.com>
 <b90d9156-029f-2c52-fdb0-ee76117af4c9@redhat.com>
 <4c0cc38f-5a4c-a604-b574-20c3af9078ab@redhat.com>
Message-ID: <dk6shb7xety.fsf@rwestrel.remote.csb>


>   http://cr.openjdk.java.net/~shade/shenandoah/single-flag/webrev.03/

C2 code looks ok.

Roland.

From shade at redhat.com  Mon Jan 15 16:40:39 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Mon, 15 Jan 2018 17:40:39 +0100
Subject: RFR: Common TLS access to GC state, where possible
In-Reply-To: <dk6vag3xh23.fsf@rwestrel.remote.csb>
References: <05a517dd-9293-b351-0406-8a6e7aa2ca3a@redhat.com>
 <dk6vag3xh23.fsf@rwestrel.remote.csb>
Message-ID: <3454039e-2e2c-32c5-4a25-4848e58d3b86@redhat.com>

On 01/15/2018 04:44 PM, Roland Westrelin wrote:
>> http://cr.openjdk.java.net/~shade/shenandoah/c2-common-gc-state/webrev.01/
> 
> C2 code looks ok to me.

Does it interfere/help your pending work? I think that is Roman's concern.

Thanks,
-Aleksey


From rwestrel at redhat.com  Mon Jan 15 16:45:47 2018
From: rwestrel at redhat.com (Roland Westrelin)
Date: Mon, 15 Jan 2018 17:45:47 +0100
Subject: RFR: Common TLS access to GC state, where possible
In-Reply-To: <3454039e-2e2c-32c5-4a25-4848e58d3b86@redhat.com>
References: <05a517dd-9293-b351-0406-8a6e7aa2ca3a@redhat.com>
 <dk6vag3xh23.fsf@rwestrel.remote.csb>
 <3454039e-2e2c-32c5-4a25-4848e58d3b86@redhat.com>
Message-ID: <dk6po6bxe84.fsf@rwestrel.remote.csb>


> Does it interfere/help your pending work? I think that is Roman's concern.

It's fine AFAICT.

Roland.

From shade at redhat.com  Mon Jan 15 16:58:15 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Mon, 15 Jan 2018 17:58:15 +0100
Subject: RFR: Common TLS access to GC state, where possible
In-Reply-To: <dk6po6bxe84.fsf@rwestrel.remote.csb>
References: <05a517dd-9293-b351-0406-8a6e7aa2ca3a@redhat.com>
 <dk6vag3xh23.fsf@rwestrel.remote.csb>
 <3454039e-2e2c-32c5-4a25-4848e58d3b86@redhat.com>
 <dk6po6bxe84.fsf@rwestrel.remote.csb>
Message-ID: <1379a4f5-8333-3cea-d316-3cd414e2037b@redhat.com>

On 01/15/2018 05:45 PM, Roland Westrelin wrote:
> 
>> Does it interfere/help your pending work? I think that is Roman's concern.
> 
> It's fine AFAICT.

Roman, does this resolve your concern? Or do you still want to hold this patch off?

Thanks,
-Aleksey


From shade at redhat.com  Mon Jan 15 17:18:27 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Mon, 15 Jan 2018 18:18:27 +0100
Subject: RFR: [9] Bulk backports to sh/jdk9
Message-ID: <2eacce66-6153-b8c8-1352-906990d19080@redhat.com>

http://cr.openjdk.java.net/~shade/shenandoah/backports/jdk9-20180115/webrev.01/

This backports all outstanding work to sh/jdk9. This passes a few nightlies.

Changes include:

 [backport] Increase test timeouts
 [backport] Report fwdptr size in JNI GetObjectSize
 [backport] Disable verification from non-Shenandoah VMOps.
 [backport] Cleanup reset_{next|complete}_mark_bitmap
 [backport] Verifier should check klass pointers before attempting to reach for object size
 [backport] TestSelectiveBarrierFlags times out due to too aggressive compilation mode
 [backport] Shenandoah SA implementation
 [backport] Allow use of fp spills around write barrier
 [backport] Rehash VMOperations and cycle driver mechanics for consistency
 [backport] Minor cleanup, uses latest Atomic API
 [backport] Match barrier fastpath checks better
 [backport] ShenandoahWriteBarrierRB flag to conditionally disable RB on WB fastpath

String deduplication, NIO checkIndex fix, and assorted Windows compilation fixes were already
backported by Zhengyu and Roman.

Testing: hotspot_gc_shenandoah {fastdebug|release}, some benchmarks

Thanks,
-Aleksey


From zgu at redhat.com  Mon Jan 15 17:21:36 2018
From: zgu at redhat.com (Zhengyu Gu)
Date: Mon, 15 Jan 2018 12:21:36 -0500
Subject: RFR: Hint unused regions instead of uncommit them
Message-ID: <537527af-1e46-c834-4f1b-36cd8f148666@redhat.com>

This patch adds a new experimental flag, ShenandoahIdleRegions (default 
false), to hint to the kernel that the regions are not needed (via 
madvise(MADV_DONTNEED)), instead of proactively uncommitting them.

It appears that this does have an advantage over uncommitting regions, although 
not by as much as I expected.
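
For readers unfamiliar with the two mechanisms, here is a minimal Linux-oriented sketch of the difference. It is not the code from the webrev: the function names are hypothetical, and modelling "uncommit" as remapping the range with PROT_NONE is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>

// Hint: tell the kernel the pages are not needed right now. The mapping and
// its protection stay in place, so the region can be written again without
// any further system call.
bool hint_region_idle(void* base, size_t size) {
  assert(base != nullptr);
  return madvise(base, size, MADV_DONTNEED) == 0;
}

// "Uncommit", modelled here (an assumption) as replacing the range with an
// inaccessible, unreserved mapping. The region must be committed again
// before it can be touched.
bool uncommit_region(void* base, size_t size) {
  assert(base != nullptr);
  void* res = mmap(base, size, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED | MAP_NORESERVE, -1, 0);
  return res == base;
}
```

The hinted region stays usable immediately, while the uncommitted one needs an explicit commit step first.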

SPECjbb2015:

Baseline:
RUN RESULT: hbIR (max attempted) = 59167, hbIR (settled) = 51984, 
max-jOPS = 47925, critical-jOPS = 19108

-XX:ShenandoahUncommitDelay=0 -XX:-ShenandoahIdleRegions
RUN RESULT: hbIR (max attempted) = 41119, hbIR (settled) = 36501, 
max-jOPS = 30839, critical-jOPS = 8841

-XX:ShenandoahUncommitDelay=0 -XX:+ShenandoahIdleRegions
RUN RESULT: hbIR (max attempted) = 49322, hbIR (settled) = 42968, 
max-jOPS = 35019, critical-jOPS = 9283


Webrev: http://cr.openjdk.java.net/~zgu/shenandoah/idle_region/webrev.00/

Test:

   hotspot_gc_shenandoah (fastdebug + release)


Thanks,

-Zhengyu

From rkennke at redhat.com  Mon Jan 15 17:27:35 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Mon, 15 Jan 2018 18:27:35 +0100
Subject: RFR: Common TLS access to GC state, where possible
In-Reply-To: <1379a4f5-8333-3cea-d316-3cd414e2037b@redhat.com>
References: <05a517dd-9293-b351-0406-8a6e7aa2ca3a@redhat.com>
 <dk6vag3xh23.fsf@rwestrel.remote.csb>
 <3454039e-2e2c-32c5-4a25-4848e58d3b86@redhat.com>
 <dk6po6bxe84.fsf@rwestrel.remote.csb>
 <1379a4f5-8333-3cea-d316-3cd414e2037b@redhat.com>
Message-ID: <96db164d-1508-e6b5-0cb2-7c07183d80d1@redhat.com>

On 15.01.2018 at 17:58, Aleksey Shipilev wrote:
> On 01/15/2018 05:45 PM, Roland Westrelin wrote:
>>
>>> Does it interfere/help your pending work? I think that is Roman's concern.
>>
>> It's fine AFAICT.
> 
> Roman, does this resolve your concern? Or you still want to hold this patch off?
> 
> Thanks,
> -Aleksey
> 

It's fine for me.

Roman


From ashipile at redhat.com  Mon Jan 15 17:35:07 2018
From: ashipile at redhat.com (ashipile at redhat.com)
Date: Mon, 15 Jan 2018 17:35:07 +0000
Subject: hg: shenandoah/jdk10: 2 new changesets
Message-ID: <201801151735.w0FHZ82l018554@aojmv0008.oracle.com>

Changeset: 8735773ec619
Author:    shade
Date:      2018-01-15 12:19 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk10/rev/8735773ec619

Single thread-local GC state flag for all barriers

! src/hotspot/cpu/aarch64/c1_Runtime1_aarch64.cpp
! src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp
! src/hotspot/cpu/aarch64/shenandoahBarrierSet_aarch64.cpp
! src/hotspot/cpu/x86/c1_Runtime1_x86.cpp
! src/hotspot/cpu/x86/macroAssembler_x86.cpp
! src/hotspot/cpu/x86/shenandoahBarrierSet_x86.cpp
! src/hotspot/cpu/x86/x86_64.ad
! src/hotspot/share/c1/c1_LIRGenerator.cpp
! src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.cpp
! src/hotspot/share/gc/shenandoah/shenandoahCollectorPolicy.cpp
! src/hotspot/share/gc/shenandoah/shenandoahConcurrentMark.cpp
! src/hotspot/share/gc/shenandoah/shenandoahConcurrentThread.cpp
! src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp
! src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp
! src/hotspot/share/gc/shenandoah/shenandoahHeap.inline.hpp
! src/hotspot/share/gc/shenandoah/shenandoahMarkCompact.cpp
! src/hotspot/share/gc/shenandoah/shenandoahSharedVariables.hpp
! src/hotspot/share/gc/shenandoah/shenandoahVerifier.cpp
! src/hotspot/share/opto/cfgnode.hpp
! src/hotspot/share/opto/compile.cpp
! src/hotspot/share/opto/graphKit.cpp
! src/hotspot/share/opto/ifnode.cpp
! src/hotspot/share/opto/memnode.hpp
! src/hotspot/share/opto/node.hpp
! src/hotspot/share/opto/shenandoahSupport.cpp
! src/hotspot/share/runtime/thread.cpp
! src/hotspot/share/runtime/thread.hpp
! src/hotspot/share/runtime/thread.inline.hpp

Changeset: d55c6d5216d1
Author:    shade
Date:      2018-01-15 12:32 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk10/rev/d55c6d5216d1

Common TLS access to GC state, where possible

! src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp
! src/hotspot/share/opto/graphKit.cpp
! src/hotspot/share/opto/loopnode.cpp
! src/hotspot/share/opto/loopnode.hpp
! src/hotspot/share/opto/shenandoahSupport.cpp
! src/hotspot/share/opto/shenandoahSupport.hpp
+ test/hotspot/jtreg/gc/shenandoah/compiler/TestCommonGCLoads.java


From shade at redhat.com  Mon Jan 15 17:43:03 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Mon, 15 Jan 2018 18:43:03 +0100
Subject: RFR: improve profiled predicates
In-Reply-To: <dk6k1wnyq3x.fsf@rwestrel.remote.csb>
References: <dk6k1wnyq3x.fsf@rwestrel.remote.csb>
Message-ID: <5637e356-3c95-6768-05d4-83e4b1cd4fe6@redhat.com>

On 01/12/2018 05:54 PM, Roland Westrelin wrote:
> http://cr.openjdk.java.net/~roland/shenandoah/improved-profiled-predicates/webrev.00/

Help me understand why we are pushing this to sh/jdk10? Is this for pre-stabilization until we
upstream this separately? We don't backport this at all to sh/jdk9 and sh/jdk8?

Nits:

loopPredicate.cpp
 *) indenting is off starting line 331, also see lines 334 and 339
 *) fenv.h/math.h includes in the middle of the file?
 *) indenting at line 1208

deoptimization.hpp:
 *) Comment for the reason here?
  65     Reason_profile_predicate,

DataLayout.java:
 *) Comment is outdated:
  96   // 4 bits of trap history (none/one reason/many reasons),

 *) Indenting, and also the condition looks reversed. Cell size is the size of ptr, right? And we
have the union with u8 inside, which takes 2 slots on 32-bit VM?

 120   static int headerSizeInCells() {
 121       return VM.getVM().isLP64() ? 2 : 1;
 122   }
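
The arithmetic behind that remark, as a small sketch. It assumes, as the questions above do, that a cell is pointer-sized and that the header is one 8-byte unit; the function names are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Number of cells needed to hold header_bytes, rounding up.
inline int cells_for(size_t header_bytes, size_t cell_bytes) {
  assert(cell_bytes > 0);
  return static_cast<int>((header_bytes + cell_bytes - 1) / cell_bytes);
}

// Header modelled as 8 bytes (the union with a u8 member);
// cell size assumed equal to pointer size.
inline int header_size_in_cells(bool is_lp64) {
  const size_t cell_bytes = is_lp64 ? 8 : 4;
  return cells_for(sizeof(uint64_t), cell_bytes);
}
```

Under these assumptions the result is 1 cell on a 64-bit VM and 2 cells on a 32-bit VM, i.e. the opposite of `isLP64() ? 2 : 1`.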


Thanks,
-Aleksey


From shade at redhat.com  Mon Jan 15 23:10:01 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Tue, 16 Jan 2018 00:10:01 +0100
Subject: RFR: [8u] Bulk backports to sh/jdk8u
Message-ID: <6b294bb5-bb6c-2bff-b879-6515fd5970b7@redhat.com>

http://cr.openjdk.java.net/~shade/shenandoah/backports/jdk8u-20180116/webrev.01/

This backports all outstanding work to sh/jdk8u. This passes a few nightlies in sh/jdk10. Some
changes, notably moving the VM operations around, required some fiddling to match the code in sh/jdk8u.

Changes include:

 [backport] Increase test timeouts
 [backport] Report fwdptr size in JNI GetObjectSize
 [backport] Disable verification from non-Shenandoah VMOps.
 [backport] Cleanup reset_{next|complete}_mark_bitmap
 [backport] Verifier should check klass pointers before attempting to reach for object size
 [backport] TestSelectiveBarrierFlags times out due to too aggressive compilation mode
 [backport] Shenandoah SA implementation
 [backport] Allow use of fp spills around write barrier
 [backport] Rehash VMOperations and cycle driver mechanics for consistency
 [backport] Minor cleanup, uses latest Atomic API
 [backport] Match barrier fastpath checks better
 [backport] ShenandoahWriteBarrierRB flag to conditionally disable RB on WB fastpath

NIO checkIndex fix, and assorted Windows compilation fixes were already
backported by Zhengyu and Roman.

Testing: hotspot_gc_shenandoah {fastdebug|release}, some benchmarks

Thanks,
-Aleksey


From shade at redhat.com  Tue Jan 16 11:26:54 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Tue, 16 Jan 2018 12:26:54 +0100
Subject: RFR: Make degenerated update-refs use region-set cursor to hand
 over work
In-Reply-To: <73df3b1a-5926-1b7a-2194-cb3649bdf456@redhat.com>
References: <1a915cfc-4f78-242d-e528-3ce6b0729a1c@redhat.com>
 <73df3b1a-5926-1b7a-2194-cb3649bdf456@redhat.com>
Message-ID: <773dce1c-ef79-dbec-944b-3210ec72cda4@redhat.com>

On 12/14/2017 10:49 PM, Roman Kennke wrote:
> Am 14.12.2017 um 19:06 schrieb Aleksey Shipilev:
>> http://cr.openjdk.java.net/~shade/shenandoah/ur-degen-cursor/webrev.01/
>>
>> This is based on previous RFR that cleans up operations. For Degenerate GC to work, we want to drop
>> cancellation flag right away, and do init-update-refs, followed by final-update-refs to finish the
>> update refs work. But, final-update-refs would not finish work when cancellation is cleared.
>>
>> Since work handover is tracked by regions cursor anyway, why don't we use that to signal available
>> work? This also handles the case where cancellation is called when all threads have processed all
>> regions during conc-update-refs, and reacted on cancellation at the end of the phase. Current code
>> would make a futile attempt to whip up workers during final-update-refs, when we know there is no
>> work left.
> 
> Ok

Forgot to push this one! Re-testing and pushing today...

Thanks,
-Aleksey


From shade at redhat.com  Tue Jan 16 11:49:53 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Tue, 16 Jan 2018 12:49:53 +0100
Subject: RFR: Hint unused regions instead of uncommit them
In-Reply-To: <537527af-1e46-c834-4f1b-36cd8f148666@redhat.com>
References: <537527af-1e46-c834-4f1b-36cd8f148666@redhat.com>
Message-ID: <9bed336a-34df-5192-24da-db675b22cc45@redhat.com>

On 01/15/2018 06:21 PM, Zhengyu Gu wrote:
> This patch adds a new experimental flag, ShenandoahIdleRegions (default: false), to hint to the
> kernel that the regions are not needed (via madvise(MADV_DONTNEED)) instead of proactively
> uncommitting them.
> 
> It appears that this does have an advantage over uncommitting regions, although not by as much as
> I expected.
> 
> SPECjbb2015:
> 
> Baseline:
> RUN RESULT: hbIR (max attempted) = 59167, hbIR (settled) = 51984, max-jOPS = 47925, critical-jOPS =
> 19108
> 
> -XX:ShenandoahUncommitDelay=0 -XX:-ShenandoahIdleRegions
> RUN RESULT: hbIR (max attempted) = 41119, hbIR (settled) = 36501, max-jOPS = 30839, critical-jOPS =
> 8841
> 
> -XX:ShenandoahUncommitDelay=0 -XX:+ShenandoahIdleRegions
> RUN RESULT: hbIR (max attempted) = 49322, hbIR (settled) = 42968, max-jOPS = 35019, critical-jOPS =
> 9283
> 
> 
> Webrev: http://cr.openjdk.java.net/~zgu/shenandoah/idle_region/webrev.00/

As I read MADV_DONTNEED man page and the explanations of different kernel people, I am getting
uneasy using this. madvise call that basically corrupts memory, say what? And it also does not
support large pages...

It _maybe_ makes sense to optionally support this, but only if we make the code changes minimal. It
looks like a fair bit of complexity comes from the attempt to fall back to commit/uncommit when
idling fails. Could we just test that idle/activate_memory works, and select one of the options
without fallback? E.g. when ShenandoahIdleRegions is true, LargePages is false, and idling works,
make do_commit/do_uncommit only do idle_memory/activate_memory, and fail hard when idle_memory
returns false. You would not need the _idle_region flag too then.
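
To make the suggestion concrete, here is a rough C++ sketch of the "select once, no fallback" shape. It is not taken from the webrev: ReclaimMode, Options, select_reclaim_mode and the simplified do_uncommit signature are hypothetical names, and the probe result is passed in as a plain bool.

```cpp
// Hypothetical model of the one-time choice; none of these names are from the patch.
enum class ReclaimMode { CommitUncommit, IdleActivate };

struct Options {
  bool shenandoah_idle_regions;  // stands in for -XX:+ShenandoahIdleRegions
  bool large_pages;              // stands in for large pages being enabled
};

// idle_probe_ok: outcome of trying idle/activate once at startup.
ReclaimMode select_reclaim_mode(const Options& opts, bool idle_probe_ok) {
  if (opts.shenandoah_idle_regions && !opts.large_pages && idle_probe_ok) {
    return ReclaimMode::IdleActivate;
  }
  return ReclaimMode::CommitUncommit;
}

// After selection there is no fallback: in IdleActivate mode a failed idle
// is reported as a failure instead of being retried as an uncommit.
bool do_uncommit(ReclaimMode mode, bool idle_succeeded, bool uncommit_succeeded) {
  return mode == ReclaimMode::IdleActivate ? idle_succeeded : uncommit_succeeded;
}
```

With the mode fixed up front, a per-region flag that remembers which mechanism was used would not be needed in this model.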

Thanks,
-Aleksey


From zgu at redhat.com  Tue Jan 16 13:13:59 2018
From: zgu at redhat.com (Zhengyu Gu)
Date: Tue, 16 Jan 2018 08:13:59 -0500
Subject: RFR: Hint unused regions instead of uncommit them
In-Reply-To: <9bed336a-34df-5192-24da-db675b22cc45@redhat.com>
References: <537527af-1e46-c834-4f1b-36cd8f148666@redhat.com>
 <9bed336a-34df-5192-24da-db675b22cc45@redhat.com>
Message-ID: <6c57108d-5a93-c33f-e102-4bd7ec571e17@redhat.com>


On 01/16/2018 06:49 AM, Aleksey Shipilev wrote:
> On 01/15/2018 06:21 PM, Zhengyu Gu wrote:
>> This patch adds a new experimental flag, ShenandoahIdleRegions (default: false), to hint to the
>> kernel that the regions are not needed (via madvise(MADV_DONTNEED)) instead of proactively
>> uncommitting them.
>>
>> It appears that this does have an advantage over uncommitting regions, although not by as much as
>> I expected.
>>
>> SPECjbb2015:
>>
>> Baseline:
>> RUN RESULT: hbIR (max attempted) = 59167, hbIR (settled) = 51984, max-jOPS = 47925, critical-jOPS =
>> 19108
>>
>> -XX:ShenandoahUncommitDelay=0 -XX:-ShenandoahIdleRegions
>> RUN RESULT: hbIR (max attempted) = 41119, hbIR (settled) = 36501, max-jOPS = 30839, critical-jOPS =
>> 8841
>>
>> -XX:ShenandoahUncommitDelay=0 -XX:+ShenandoahIdleRegions
>> RUN RESULT: hbIR (max attempted) = 49322, hbIR (settled) = 42968, max-jOPS = 35019, critical-jOPS =
>> 9283
>>
>>
>> Webrev: http://cr.openjdk.java.net/~zgu/shenandoah/idle_region/webrev.00/
> 
> As I read MADV_DONTNEED man page and the explanations of different kernel people, I am getting
> uneasy using this. madvise call that basically corrupts memory, say what? And it also does not
> support large pages...
Hmm ... can you point me to how it can corrupt memory? It is, after all, 
the way thread stacks are released.

> 
> It _maybe_ makes sense to optionally support this, but only if we make the code changes minimal. It
> looks like a fair bit of complexity comes from the attempt to fall back to commit/uncommit when
> idling fails. Could we just test that idle/activate_memory works, and select one of the options
> without fallback? E.g. when ShenandoahIdleRegions is true, LargePages is false, and idling works,
> make do_commit/do_uncommit only do idle_memory/activate_memory, and fail hard when idle_memory
> returns false. You would not need the _idle_region flag too then.

Sure.

-Zhengyu

> 
> Thanks,
> -Aleksey
> 

From rkennke at redhat.com  Tue Jan 16 17:33:55 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Tue, 16 Jan 2018 18:33:55 +0100
Subject: RFR: [9] Bulk backports to sh/jdk9
In-Reply-To: <2eacce66-6153-b8c8-1352-906990d19080@redhat.com>
References: <2eacce66-6153-b8c8-1352-906990d19080@redhat.com>
Message-ID: <d4ee6d7f-3d5a-8924-6dd7-93e817a1a39a@redhat.com>

Am 15.01.2018 um 18:18 schrieb Aleksey Shipilev:
> http://cr.openjdk.java.net/~shade/shenandoah/backports/jdk9-20180115/webrev.01/
> 
> This backports all outstanding work to sh/jdk9. This passes a few nightlies.
> 
> Changes include:
> 
>   [backport] Increase test timeouts
>   [backport] Report fwdptr size in JNI GetObjectSize
>   [backport] Disable verification from non-Shenandoah VMOps.
>   [backport] Cleanup reset_{next|complete}_mark_bitmap
>   [backport] Verifier should check klass pointers before attempting to reach for object size
>   [backport] TestSelectiveBarrierFlags times out due to too aggressive compilation mode
>   [backport] Shenandoah SA implementation
>   [backport] Allow use of fp spills around write barrier
>   [backport] Rehash VMOperations and cycle driver mechanics for consistency
>   [backport] Minor cleanup, uses latest Atomic API
>   [backport] Match barrier fastpath checks better
>   [backport] ShenandoahWriteBarrierRB flag to conditionally disable RB on WB fastpath
> 
> String deduplication, NIO checkIndex fix, and assorted Windows compilation fixes were already
> backported by Zhengyu and Roman.
> 
> Testing: hotspot_gc_shenandoah {fastdebug|release}, some benchmarks
> 
> Thanks,
> -Aleksey
> 

One thing that struck me, which must have slipped past my previous jdk10 review 
(but doesn't stop this backport):

-  void start_concurrent_marking();
    void stop_concurrent_marking();

Why is start_concurrent_marking() gone, but not stop_concurrent_marking() ?


Can't say about C2 changes.

Other than that, it's good for me.

Roman

From shade at redhat.com  Tue Jan 16 17:35:47 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Tue, 16 Jan 2018 18:35:47 +0100
Subject: RFR: [9] Bulk backports to sh/jdk9
In-Reply-To: <d4ee6d7f-3d5a-8924-6dd7-93e817a1a39a@redhat.com>
References: <2eacce66-6153-b8c8-1352-906990d19080@redhat.com>
 <d4ee6d7f-3d5a-8924-6dd7-93e817a1a39a@redhat.com>
Message-ID: <0b9acb5d-8b80-e17f-8aa2-234dad37fe9b@redhat.com>

On 01/16/2018 06:33 PM, Roman Kennke wrote:
> One thing that struck me that must have slipped my previous jdk10 review (but doesn't stop this
> backport):
> 
> -  void start_concurrent_marking();
>    void stop_concurrent_marking();
> 
> Why is start_concurrent_marking() gone, but not stop_concurrent_marking() ?

Because start* and stop* were not really symmetric. start_concurrent_marking() was the alias for
init-mark, while stop_concurrent_marking() is the method that cleans up mark mess, either in
concurrent or Full GC cycle. The naming choice was misleading.

Thanks,
-Aleksey


From rkennke at redhat.com  Tue Jan 16 17:36:58 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Tue, 16 Jan 2018 18:36:58 +0100
Subject: RFR: [9] Bulk backports to sh/jdk9
In-Reply-To: <0b9acb5d-8b80-e17f-8aa2-234dad37fe9b@redhat.com>
References: <2eacce66-6153-b8c8-1352-906990d19080@redhat.com>
 <d4ee6d7f-3d5a-8924-6dd7-93e817a1a39a@redhat.com>
 <0b9acb5d-8b80-e17f-8aa2-234dad37fe9b@redhat.com>
Message-ID: <6d669c8a-0c8d-f15c-2706-c0ce346a7f48@redhat.com>

Am 16.01.2018 um 18:35 schrieb Aleksey Shipilev:
> On 01/16/2018 06:33 PM, Roman Kennke wrote:
>> One thing that struck me that must have slipped my previous jdk10 review (but doesn't stop this
>> backport):
>>
>> -  void start_concurrent_marking();
>>    void stop_concurrent_marking();
>>
>> Why is start_concurrent_marking() gone, but not stop_concurrent_marking() ?
> 
> Because start* and stop* were not really symmetric. start_concurrent_marking() was the alias for
> init-mark, while stop_concurrent_marking() is the method that cleans up mark mess, either in
> concurrent or Full GC cycle. The naming choice was misleading.

Ok

From rkennke at redhat.com  Tue Jan 16 17:38:59 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Tue, 16 Jan 2018 18:38:59 +0100
Subject: RFR: [8u] Bulk backports to sh/jdk8u
In-Reply-To: <6b294bb5-bb6c-2bff-b879-6515fd5970b7@redhat.com>
References: <6b294bb5-bb6c-2bff-b879-6515fd5970b7@redhat.com>
Message-ID: <1193e25c-addc-c158-d6ff-63148e2e3ddd@redhat.com>

Am 16.01.2018 um 00:10 schrieb Aleksey Shipilev:
> http://cr.openjdk.java.net/~shade/shenandoah/backports/jdk8u-20180116/webrev.01/
> 
> This backports all outstanding work to sh/jdk8u. This passes a few nightlies in sh/jdk10. Some
> changes, notably moving the VM operations around required some fiddling to match the code in sh/jdk8u.
> 
> Changes include:
> 
>   [backport] Increase test timeouts
>   [backport] Report fwdptr size in JNI GetObjectSize
>   [backport] Disable verification from non-Shenandoah VMOps.
>   [backport] Cleanup reset_{next|complete}_mark_bitmap
>   [backport] Verifier should check klass pointers before attempting to reach for object size
>   [backport] TestSelectiveBarrierFlags times out due to too aggressive compilation mode
>   [backport] Shenandoah SA implementation
>   [backport] Allow use of fp spills around write barrier
>   [backport] Rehash VMOperations and cycle driver mechanics for consistency
>   [backport] Minor cleanup, uses latest Atomic API
>   [backport] Match barrier fastpath checks better
>   [backport] ShenandoahWriteBarrierRB flag to conditionally disable RB on WB fastpath
> 
> NIO checkIndex fix, and assorted Windows compilation fixes were already
> backported by Zhengyu and Roman.
> 
> Testing: hotspot_gc_shenandoah {fastdebug|release}, some benchmarks
> 
> Thanks,
> -Aleksey
> 

Looks good to me. Can't say for sure about C2 changes.

Roman


From rkennke at redhat.com  Tue Jan 16 17:43:23 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Tue, 16 Jan 2018 18:43:23 +0100
Subject: RFR: Guard interpreter keep alive barrier with
 ShenandoahKeepAliveBarrier
Message-ID: <d2fa5322-3305-ea05-6cc1-cfae30f2b33a@redhat.com>

One thing that I found in traversal GC work: with 
-ShenandoahKeepAliveBarrier we still generate some code in the 
interpreter that is only used for the keep-alive-barrier. This patch 
avoids this. I realize that this should require some better refactoring 
(to move more of that code into keep_alive_barrier() to begin with), but 
I suspect that can wait until upstream codegens arrive, then we need to 
refactor it (big time) anyway.

http://cr.openjdk.java.net/~rkennke/interpreter_keep_alive_barrier/webrev.00/

Tests: hotspot_gc_shenandoah

Ok?

From shade at redhat.com  Tue Jan 16 17:47:32 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Tue, 16 Jan 2018 18:47:32 +0100
Subject: RFR: Guard interpreter keep alive barrier with
 ShenandoahKeepAliveBarrier
In-Reply-To: <d2fa5322-3305-ea05-6cc1-cfae30f2b33a@redhat.com>
References: <d2fa5322-3305-ea05-6cc1-cfae30f2b33a@redhat.com>
Message-ID: <06a9dbdb-ecc3-9af2-5494-dd81afc06af9@redhat.com>

On 01/16/2018 06:43 PM, Roman Kennke wrote:
> One thing that I found in traversal GC work: with -ShenandoahKeepAliveBarrier we still generate some
> code in the interpreter that is only used for the keep-alive-barrier. This patch avoids this. I
> realize that this should require some better refactoring (to move more of that code into
> keep_alive_barrier() to begin with), but I suspect that can wait until upstream codegens arrive,
> then we need to refactor it (big time) anyway.
> 
> http://cr.openjdk.java.net/~rkennke/interpreter_keep_alive_barrier/webrev.00/

I think this is already handled inside MacroAssembler::keep_alive_barrier, that this block
eventually calls into:

void MacroAssembler::keep_alive_barrier(Register val,
                                        Register thread,
                                        Register tmp) {

  if (UseG1GC) {
    // Generate the G1 pre-barrier code to log the value of
    // the referent field in an SATB buffer.
    g1_write_barrier_pre(noreg,
                         rax /* pre_val */,
                         thread /* thread */,
                         tmp,
                         true /* tosca_live */,
                         true /* expand_call */);
  } else if (UseShenandoahGC && ShenandoahKeepAliveBarrier) {
    shenandoah_write_barrier_pre(noreg,
                                 rax /* pre_val */,
                                 thread /* thread */,
                                 tmp,
                                 true /* tosca_live */,
                                 true /* expand_call */);
  }
}

So the better fix would probably be to revisit all uses of keep_alive_barrier and protect their
relevant blocks, then put the assert(ShenandoahKeepAliveBarrier) in MacroAssembler::keep_alive_barrier?
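
Roughly, the shape I have in mind is sketched below. This is a toy C++ model, not HotSpot code: MockAssembler, Flags and emit_reference_get are hypothetical, only the Shenandoah branch is modeled (the G1 branch is left out), and "emitting code" is just appending strings.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for the VM flags involved.
struct Flags {
  bool use_shenandoah;
  bool keep_alive_barrier;  // models ShenandoahKeepAliveBarrier
};

struct MockAssembler {
  Flags flags;
  std::vector<std::string> emitted;

  // Callee only asserts: callers are expected to have checked the flag already.
  void keep_alive_barrier() {
    assert(flags.use_shenandoah && flags.keep_alive_barrier);
    emitted.push_back("satb_pre_barrier");
  }
};

// Call site: the whole barrier-only block (register saves included) is guarded,
// so nothing barrier-related is generated when the flag is off.
void emit_reference_get(MockAssembler& masm) {
  masm.emitted.push_back("load_referent");
  if (masm.flags.use_shenandoah && masm.flags.keep_alive_barrier) {
    masm.emitted.push_back("save_registers");
    masm.keep_alive_barrier();
    masm.emitted.push_back("restore_registers");
  }
}
```

The point of the assert in the callee is to catch any call site that was missed when adding the guards.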

-Aleksey


From rkennke at redhat.com  Tue Jan 16 17:49:41 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Tue, 16 Jan 2018 18:49:41 +0100
Subject: RFR: Guard interpreter keep alive barrier with
 ShenandoahKeepAliveBarrier
In-Reply-To: <06a9dbdb-ecc3-9af2-5494-dd81afc06af9@redhat.com>
References: <d2fa5322-3305-ea05-6cc1-cfae30f2b33a@redhat.com>
 <06a9dbdb-ecc3-9af2-5494-dd81afc06af9@redhat.com>
Message-ID: <4745f18f-4952-74c5-81e9-dc5b33740d9d@redhat.com>

Am 16.01.2018 um 18:47 schrieb Aleksey Shipilev:
> On 01/16/2018 06:43 PM, Roman Kennke wrote:
>> One thing that I found in traversal GC work: with -ShenandoahKeepAliveBarrier we still generate some
>> code in the interpreter that is only used for the keep-alive-barrier. This patch avoids this. I
>> realize that this should require some better refactoring (to move more of that code into
>> keep_alive_barrier() to begin with), but I suspect that can wait until upstream codegens arrive,
>> then we need to refactor it (big time) anyway.
>>
>> http://cr.openjdk.java.net/~rkennke/interpreter_keep_alive_barrier/webrev.00/
> 
> I think this is already handled inside MacroAssembler::keep_alive_barrier, that this block
> eventually calls into:
> 
> void MacroAssembler::keep_alive_barrier(Register val,
>                                          Register thread,
>                                          Register tmp) {
> 
>    if (UseG1GC) {
>      // Generate the G1 pre-barrier code to log the value of
>      // the referent field in an SATB buffer.
>      g1_write_barrier_pre(noreg,
>                           rax /* pre_val */,
>                           thread /* thread */,
>                           tmp,
>                           true /* tosca_live */,
>                           true /* expand_call */);
>    } else if (UseShenandoahGC && ShenandoahKeepAliveBarrier) {
>      shenandoah_write_barrier_pre(noreg,
>                                   rax /* pre_val */,
>                                   thread /* thread */,
>                                   tmp,
>                                   true /* tosca_live */,
>                                   true /* expand_call */);
>    }
> }
> 
> So the better fix would probably revisit all uses of keep_alive_barrier, and protect their relevant
> blocks, then putting the assert(ShenandoahKeepAliveBarrier) in MacroAssembler::keep_alive_barrier?
> 
> -Aleksey
> 

Yes, that is what I mean with 'this should need more refactoring' ;-) 
Only the code that I touched uses it, so we should in fact move all that 
code under keep_alive_barrier() instead. Want me to do that now? Or wait 
until codegen for interpreter arrives and do it really properly?

Roman


From shade at redhat.com  Tue Jan 16 17:49:50 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Tue, 16 Jan 2018 18:49:50 +0100
Subject: RFR: Hint unused regions instead of uncommit them
In-Reply-To: <6c57108d-5a93-c33f-e102-4bd7ec571e17@redhat.com>
References: <537527af-1e46-c834-4f1b-36cd8f148666@redhat.com>
 <9bed336a-34df-5192-24da-db675b22cc45@redhat.com>
 <6c57108d-5a93-c33f-e102-4bd7ec571e17@redhat.com>
Message-ID: <2168843c-7581-b8ba-2e5a-ea7577579e3e@redhat.com>

On 01/16/2018 02:13 PM, Zhengyu Gu wrote:
>> As I read MADV_DONTNEED man page and the explanations of different kernel people, I am getting
>> uneasy using this. madvise call that basically corrupts memory, say what? And it also does not
>> support large pages...
> Hmm ... can you point me to how it can corrupt memory? It is, after all, the way thread stacks are released.

Ah, I meant that it is very surprising to have madvise do anything that affects correctness.
MADV_DONTNEED basically destroys the page contents, as far as the application is concerned. Awkward API...

-Aleksey


From shade at redhat.com  Tue Jan 16 17:52:44 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Tue, 16 Jan 2018 18:52:44 +0100
Subject: RFR: Guard interpreter keep alive barrier with
 ShenandoahKeepAliveBarrier
In-Reply-To: <4745f18f-4952-74c5-81e9-dc5b33740d9d@redhat.com>
References: <d2fa5322-3305-ea05-6cc1-cfae30f2b33a@redhat.com>
 <06a9dbdb-ecc3-9af2-5494-dd81afc06af9@redhat.com>
 <4745f18f-4952-74c5-81e9-dc5b33740d9d@redhat.com>
Message-ID: <8fa031cc-ca9e-ce4c-d5b8-f4f3110f2b93@redhat.com>

On 01/16/2018 06:49 PM, Roman Kennke wrote:
> Am 16.01.2018 um 18:47 schrieb Aleksey Shipilev:
>> So the better fix would probably revisit all uses of keep_alive_barrier, and protect their relevant
>> blocks, then putting the assert(ShenandoahKeepAliveBarrier) in MacroAssembler::keep_alive_barrier?
>>
>> -Aleksey
>>
> 
> Yes, that is what I mean with 'this should need more refactoring' ;-) Only the code that I touched
> uses it, so we should in fact move all that code under keep_alive_barrier() instead. Want me to do
> that now? Or wait until codegen for interpreter arrives and do it really properly?

I think it does not matter at this point. We usually use Shenandoah*Barrier as the performance
investigation tool, which means we do care about what compilers do. We are not really interested in
what interpreters do perf-wise. So, a better move resource-wise would be to make it right once,
after codegen interfaces arrive.

Or, does it affect Traversal GC perf?

Thanks,
-Aleksey


From rkennke at redhat.com  Tue Jan 16 17:54:13 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Tue, 16 Jan 2018 18:54:13 +0100
Subject: RFR: Guard interpreter keep alive barrier with
 ShenandoahKeepAliveBarrier
In-Reply-To: <8fa031cc-ca9e-ce4c-d5b8-f4f3110f2b93@redhat.com>
References: <d2fa5322-3305-ea05-6cc1-cfae30f2b33a@redhat.com>
 <06a9dbdb-ecc3-9af2-5494-dd81afc06af9@redhat.com>
 <4745f18f-4952-74c5-81e9-dc5b33740d9d@redhat.com>
 <8fa031cc-ca9e-ce4c-d5b8-f4f3110f2b93@redhat.com>
Message-ID: <0d0c50ac-5e39-d2c1-a9d1-891ccf545cf7@redhat.com>

Am 16.01.2018 um 18:52 schrieb Aleksey Shipilev:
> On 01/16/2018 06:49 PM, Roman Kennke wrote:
>> Am 16.01.2018 um 18:47 schrieb Aleksey Shipilev:
>>> So the better fix would probably revisit all uses of keep_alive_barrier, and protect their relevant
>>> blocks, then putting the assert(ShenandoahKeepAliveBarrier) in MacroAssembler::keep_alive_barrier?
>>>
>>> -Aleksey
>>>
>>
>> Yes, that is what I mean with 'this should need more refactoring' ;-) Only the code that I touched
>> uses it, so we should in fact move all that code under keep_alive_barrier() instead. Want me to do
>> that now? Or wait until codegen for interpreter arrives and do it really properly?
> 
> I think it does not matter at this point. We usually use Shenandoah*Barrier as the performance
> investigation tool, which means we do care about what compilers do. We are not really interested in
> what interpreters do perf-wise. So, a better move resource-wise would be to make it right once,
> after codegen interfaces arrive.
> 
> Or, does it affect Traversal GC perf?
> 

No, not really. It just means we keep alive more weakrefs than we need to.

Ok, let's drop it for now.

Roman


From rkennke at redhat.com  Tue Jan 16 17:59:47 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Tue, 16 Jan 2018 18:59:47 +0100
Subject: RFR: Defer cleaning of system dictionary and friends to parallel
 cleaning phase
Message-ID: <85219d9c-2c55-edfd-76f2-282b01c1e2fd@redhat.com>

Found this during traversal GC work: when cleaning the system dictionary 
and friends, we clean it in the first pass, *single-threaded*, and 
then do the cleaning again, multi-threaded. We should defer 
cleaning to the parallel phase to begin with. That's what G1 does too.

http://cr.openjdk.java.net/~rkennke/defer_cleaning/webrev.00/

Ok?

Roman

From shade at redhat.com  Tue Jan 16 18:09:50 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Tue, 16 Jan 2018 19:09:50 +0100
Subject: RFR: Defer cleaning of system dictionary and friends to parallel
 cleaning phase
In-Reply-To: <85219d9c-2c55-edfd-76f2-282b01c1e2fd@redhat.com>
References: <85219d9c-2c55-edfd-76f2-282b01c1e2fd@redhat.com>
Message-ID: <551fb0d8-da01-cfe2-ceaf-eb502bf4460f@redhat.com>

On 01/16/2018 06:59 PM, Roman Kennke wrote:
> Found this during traversal GC work: when cleaning the system dictionary and friends, we do clean it
> in the first pass, *single threaded* and then do the cleaning stuff again, but multi-threaded. We
> shall defer cleaning to the parallel phase to begin with. That's what G1 does too.
> 
> http://cr.openjdk.java.net/~rkennke/defer_cleaning/webrev.00/
> 

Awwwwwwww.

Note that in G1, there are two calls to do_unloading: one from weak_refs_work with "false", and
another from mark_sweep_phase1 with default "true".

Are you saying that doing this once with "false" is enough? It looks like the ParallelCleaning stuff
purges ResolvedMethodTable, but does it do ClassLoaderDataGraph::do_unloading with
clean_previous_versions? Maybe we should cautiously say "full_gc", not "false" in the patch, so
last-ditch can still do it?

Thanks,
-Aleksey


From rkennke at redhat.com  Tue Jan 16 18:15:27 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Tue, 16 Jan 2018 19:15:27 +0100
Subject: RFR: Defer cleaning of system dictionary and friends to parallel
 cleaning phase
In-Reply-To: <551fb0d8-da01-cfe2-ceaf-eb502bf4460f@redhat.com>
References: <85219d9c-2c55-edfd-76f2-282b01c1e2fd@redhat.com>
 <551fb0d8-da01-cfe2-ceaf-eb502bf4460f@redhat.com>
Message-ID: <5f6bb6b8-edfc-61e5-121e-d30a0370f372@redhat.com>

Am 16.01.2018 um 19:09 schrieb Aleksey Shipilev:
> On 01/16/2018 06:59 PM, Roman Kennke wrote:
>> Found this during traversal GC work: when cleaning the system dictionary and friends, we do clean it
>> in the first pass, *single threaded* and then do the cleaning stuff again, but multi-threaded. We
>> shall defer cleaning to the parallel phase to begin with. That's what G1 does too.
>>
>> http://cr.openjdk.java.net/~rkennke/defer_cleaning/webrev.00/
>>
> 
> Awwwwwwww.
> 
> Note that in G1, there are two calls to do_unloading: one from weak_refs_work with "false", and
> another from mark_sweep_phase1 with default "true".

For ordinary concurrent GCs, it cleans everything in parallel phase, and 
thus passes 'false' to do_unloading(). For full-GC, I guess they don't 
care and do everything single-threaded.

> Are you saying that doing this once with "false" is enough? It looks that ParallelCleaning stuff
> purges ResolvedMethodTable, but does it do ClassLoaderDataGraph::do_unloading with
> clean_previous_versions? Maybe we should cautiously say "full_gc", not "false" in the patch, so
> last-ditch can still do it?

I believe the ParallelCleaning handles everything. Zhengyu?

Roman

From rkennke at redhat.com  Tue Jan 16 18:38:42 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Tue, 16 Jan 2018 19:38:42 +0100
Subject: RFR: Traveral GC heuristics
Message-ID: <f0715fa7-b31f-8d1c-8015-22d1c64056ab@redhat.com>

This started out as a smallish partial-GC experiment, then turned into a clone 
of partial GC, and ended up as a standalone GC mode for Shenandoah, 
which is a frankensteinization of partial+concurrent-marking, with some 
goodies :-)

The idea is to do everything, marking+evacuation+update-refs, in one 
single phase. This is not very difficult to do: while traversing, 
evacuate objects that are in the Cset, and update references as we go. I 
chose to traverse the heap using an incremental-update approach, mostly 
because this is what partial GC does, and as said above, this started 
out as a clone of partial :-)
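
A toy C++ model of the single-pass idea described above, in case it helps reading the patch. It is only a sketch under heavy assumptions: it is single-threaded, has no barriers and no concurrent mutators, and Obj, ToyHeap, resolve and traverse are made-up names, not what the webrev uses.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

struct Obj {
  bool in_cset = false;       // object lives in a collection-set region
  bool marked = false;        // visited during this traversal
  Obj* forwardee = nullptr;   // to-space copy, once evacuated
  std::vector<Obj*> fields;   // outgoing references
};

struct ToyHeap {
  std::deque<Obj> storage;    // deque keeps element addresses stable on growth

  Obj* allocate(bool in_cset) {
    storage.emplace_back();
    storage.back().in_cset = in_cset;
    return &storage.back();
  }

  // Copy the object out of the cset and install the forwarding pointer.
  Obj* evacuate(Obj* from) {
    Obj* to = allocate(false);
    to->fields = from->fields;
    from->forwardee = to;
    return to;
  }
};

// Return the to-space version of ref, evacuating it on first contact.
Obj* resolve(ToyHeap& heap, Obj* ref) {
  if (ref == nullptr) return nullptr;
  if (ref->forwardee != nullptr) return ref->forwardee;
  return ref->in_cset ? heap.evacuate(ref) : ref;
}

// One pass: mark, evacuate cset objects, and update references as we go.
// Returns the number of live objects found.
std::size_t traverse(ToyHeap& heap, std::vector<Obj*>& roots) {
  std::vector<Obj*> worklist;
  std::size_t live = 0;
  auto visit = [&](Obj*& slot) {
    Obj* obj = resolve(heap, slot);
    slot = obj;  // update the reference in place
    if (obj != nullptr && !obj->marked) {
      obj->marked = true;
      ++live;
      worklist.push_back(obj);
    }
  };
  for (Obj*& root : roots) visit(root);
  while (!worklist.empty()) {
    Obj* obj = worklist.back();
    worklist.pop_back();
    for (Obj*& field : obj->fields) visit(field);
  }
  return live;
}
```

Unreachable cset objects are never touched in this model, so they are neither copied nor marked, and their region could be reclaimed wholesale afterwards.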

The tricky part is to choose the Cset: I made it such that each GC cycle 
collects liveness information, and bases the decision about Cset in the 
next cycle on that liveness information. Yes, this means the first cycle 
does not collect anything (except immediate garbage).
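
The Cset choice can be pictured with the following C++ sketch. It is an illustration only: RegionInfo and choose_cset are hypothetical names, the threshold is assumed to be a percentage of region capacity, and immediate-garbage handling is left out.

```cpp
#include <cstddef>
#include <vector>

struct RegionInfo {
  std::size_t capacity_bytes;
  std::size_t live_bytes_prev_cycle;  // liveness gathered by the previous cycle
  bool has_liveness;                  // false until some cycle has traversed the heap
};

// Pick regions whose garbage, as measured by the previous cycle, exceeds the
// given percentage of the region capacity. Returns region indices.
// Assumes live_bytes_prev_cycle <= capacity_bytes.
std::vector<std::size_t> choose_cset(const std::vector<RegionInfo>& regions,
                                     std::size_t garbage_threshold_percent) {
  std::vector<std::size_t> cset;
  for (std::size_t i = 0; i < regions.size(); ++i) {
    const RegionInfo& r = regions[i];
    if (!r.has_liveness) continue;  // nothing known yet, so nothing to base a decision on
    std::size_t garbage = r.capacity_bytes - r.live_bytes_prev_cycle;
    if (garbage * 100 > r.capacity_bytes * garbage_threshold_percent) {
      cset.push_back(i);
    }
  }
  return cset;
}
```

In this model the very first cycle sees no liveness data at all and therefore ends up with an empty Cset, which matches the behavior described above.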

Advantages:
- obviously, touching all live objects only once means less time spent 
in GC. Measurements show that traversing the heap and doing everything 
is only slightly longer than Shenandoah's marking phase, and this might 
actually be because we also need to mark through newly allocated objects.
- Traversal-order evacuation gives us a 10x increase in an ordering-sensitive 
microbenchmark: https://shipilev.net/jvm-anatomy-park/11-moving-gc-locality/

- Simpler barriers: i-u style barriers don't need to load the pre-value, 
and can be optimized much better (hoisted out of hot paths, etc). Some 
of it is already done in this patch, but there are plenty of 
opportunities to make it even better.
- Possibly less floating garbage because we trace through newly 
allocated objects too, and don't treat them as implicitly live.
- we don't need a keep-alive-barrier for Reference.get() which means we 
keep fewer referents alive just because they happen to be accessed 
during GC.
- MWF is only a switch away (if I understand MWF correctly): 
-XX:+ShenandoahMWF
- It does not need RBs in the WB fast-path, because outside of the 
single phase, nothing is ever forwarded.
- It does not need the membar stuff in the WBs because we turn on/off 
the phase during safepoint

Disadvantages:
- Store-value barrier needs to be a WB, RB is not sufficient. The 
storeval barrier is there to ensure only to-space values ever get 
written to fields during update-refs. 3-phase Shenandoah doesn't 
evacuate during update-refs, and therefore RB is enough. We need WB 
here. (I believe this is offset by optimization opportunities, see above)
- Known I-U problem: mutators can outrun the GC with allocations and 
keep us from terminating.
- It needs barriers for constants (need to check this).

Stuff left to do:
- Implement sane degeneration: if we hit OOM, we simply restart and go 
into full-GC.
- Depending on degen: make heuristics adaptive. Currently it requires 
manual tweaking of thresholds.

Relevant knobs:
- ShenandoahGarbageThreshold: regions with more garbage than this go 
into the Cset. Notice that this is based on the *previous* cycle, so we 
may actually have much more garbage (but not less).
- ShenandoahFreeThreshold: start GC when we have less than that much 
free heap.

I'll not go into all the details for now and give you the code:
http://cr.openjdk.java.net/~rkennke/traversal/webrev.00/


Roman

From zgu at redhat.com  Tue Jan 16 19:09:12 2018
From: zgu at redhat.com (Zhengyu Gu)
Date: Tue, 16 Jan 2018 14:09:12 -0500
Subject: RFR: Hint unused regions instead of uncommit them
In-Reply-To: <2168843c-7581-b8ba-2e5a-ea7577579e3e@redhat.com>
References: <537527af-1e46-c834-4f1b-36cd8f148666@redhat.com>
 <9bed336a-34df-5192-24da-db675b22cc45@redhat.com>
 <6c57108d-5a93-c33f-e102-4bd7ec571e17@redhat.com>
 <2168843c-7581-b8ba-2e5a-ea7577579e3e@redhat.com>
Message-ID: <183f0c80-eac7-bb0a-caf9-c0db0d25cd5e@redhat.com>


On 01/16/2018 12:49 PM, Aleksey Shipilev wrote:
> On 01/16/2018 02:13 PM, Zhengyu Gu wrote:
>>> As I read MADV_DONTNEED man page and the explanations of different kernel people, I am getting
>>> uneasy using this. madvise call that basically corrupts memory, say what? And it also does not
>>> support large pages...
>> Hmm ... can you point me to how it can corrupt memory? It is, after all, the way thread stacks are released.
> 
> Ah, I meant that it is very surprising to have madvise do anything that affects correctness.
> MADV_DONTNEED basically destroys the page contents, as far as the application is concerned. Awkward API...

Well, unmapping also destroys page contents. In fact, after 
MADV_DONTNEED a later access repopulates the content from the underlying 
mapped file, if there is a backing file, or gets zero-fill-on-demand 
pages (which do not do us any good) for mappings without an underlying file.
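
The zero-fill behavior for anonymous mappings can be checked with a small Linux-only C++ probe like the one below. It is a sketch, not part of the patch; dontneed_discards_contents is a made-up helper name, and other operating systems may treat MADV_DONTNEED differently.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cstddef>

// Returns true if a private anonymous page reads back as zero after
// madvise(MADV_DONTNEED). This relies on Linux semantics.
bool dontneed_discards_contents() {
  const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
  void* mem = mmap(nullptr, page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (mem == MAP_FAILED) return false;

  volatile unsigned char* bytes = static_cast<volatile unsigned char*>(mem);
  bytes[0] = 42;  // dirty the page

  // The mapping stays valid, but the old contents are dropped; the next
  // access gets a fresh zero-filled page.
  bool ok = (madvise(mem, page, MADV_DONTNEED) == 0) && (bytes[0] == 0);

  munmap(mem, page);
  return ok;
}
```

Unlike after munmap, the address range is still mapped afterwards, so touching it does not fault fatally; it just no longer holds what was written before.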

-Zhengyu


> 
> -Aleksey
> 

From zgu at redhat.com  Tue Jan 16 19:20:57 2018
From: zgu at redhat.com (Zhengyu Gu)
Date: Tue, 16 Jan 2018 14:20:57 -0500
Subject: RFR: [8u] Bulk backports to sh/jdk8u
In-Reply-To: <1193e25c-addc-c158-d6ff-63148e2e3ddd@redhat.com>
References: <6b294bb5-bb6c-2bff-b879-6515fd5970b7@redhat.com>
 <1193e25c-addc-c158-d6ff-63148e2e3ddd@redhat.com>
Message-ID: <fddbd68c-abf7-82b9-2640-a1ac3b7c1854@redhat.com>

Looks good to me. Cannot say about the barrier stuff.

-Zhengyu

On 01/16/2018 12:38 PM, Roman Kennke wrote:
> Am 16.01.2018 um 00:10 schrieb Aleksey Shipilev:
>> http://cr.openjdk.java.net/~shade/shenandoah/backports/jdk8u-20180116/webrev.01/ 
>>
>>
>> This backports all outstanding work to sh/jdk8u. This passes a few 
>> nightlies in sh/jdk10. Some
>> changes, notably moving the VM operations around required some 
>> fiddling to match the code in sh/jdk8u.
>>
>> Changes include:
>>
>>   [backport] Increase test timeouts
>>   [backport] Report fwdptr size in JNI GetObjectSize
>>   [backport] Disable verification from non-Shenandoah VMOps.
>>   [backport] Cleanup reset_{next|complete}_mark_bitmap
>>   [backport] Verifier should check klass pointers before attempting to 
>> reach for object size
>>   [backport] TestSelectiveBarrierFlags times out due to too aggressive 
>> compilation mode
>>   [backport] Shenandoah SA implementation
>>   [backport] Allow use of fp spills around write barrier
>>   [backport] Rehash VMOperations and cycle driver mechanics for 
>> consistency
>>   [backport] Minor cleanup, uses latest Atomic API
>>   [backport] Match barrier fastpath checks better
>>   [backport] ShenandoahWriteBarrierRB flag to conditionally disable RB 
>> on WB fastpath
>>
>> NIO checkIndex fix, and assorted Windows compilation fixes were already
>> backported by Zhengyu and Roman.
>>
>> Testing: hotspot_gc_shenandoah {fastdebug|release}, some benchmarks
>>
>> Thanks,
>> -Aleksey
>>
> 
> Looks good to me. Can't say for sure about C2 changes.
> 
> Roman
> 

From shade at redhat.com  Tue Jan 16 19:24:06 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Tue, 16 Jan 2018 20:24:06 +0100
Subject: RFR: ShConcurrentThread races with set_gc_state_bit
Message-ID: <4aaa7fbf-27b4-7531-02e8-3a11e8c501d8@redhat.com>

http://cr.openjdk.java.net/~shade/shenandoah/single-flag-races/webrev.01/

Zhengyu found this peculiar race:

When ShConcurrentThread sets {evac,update_refs}_in_progress, set_gc_state_bit checks for the
safepoint. It turns out that after we have checked for the safepoint and entered the
Threads_lock-free branch, the safepoint may already be over. The way out is to restore the
*_concurrent family of methods, and acquire Threads_lock there unconditionally.
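
As a rough sketch of the "acquire the lock unconditionally" shape, assuming toy stand-ins rather than
HotSpot types: ToyHeap, its std::mutex and the per-thread state vector below are all hypothetical,
and only the method name mirrors the webrev. The point is that the concurrent-path setter never asks
"are we at a safepoint?", so there is no answer that can go stale before the per-thread update.

```cpp
#include <atomic>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical stand-ins for illustration; these are not HotSpot types.
class ToyHeap {
  std::atomic<unsigned> _gc_state{0};   // global GC state bits
  std::mutex _threads_lock;             // plays the role of Threads_lock
  std::vector<unsigned> _thread_state;  // per-thread copies of the state

public:
  explicit ToyHeap(std::size_t threads) : _thread_state(threads, 0) {}

  // Concurrent-path setter: no safepoint check, the lock is always taken.
  void set_gc_state_bit_concurrently(unsigned bit, bool value) {
    if (value) {
      _gc_state.fetch_or(1u << bit);
    } else {
      _gc_state.fetch_and(~(1u << bit));
    }
    std::lock_guard<std::mutex> guard(_threads_lock);
    const unsigned raw = _gc_state.load();
    for (unsigned& s : _thread_state) s = raw;  // propagate to every thread
  }

  unsigned global_state() const { return _gc_state.load(); }

  unsigned thread_state(std::size_t i) {
    std::lock_guard<std::mutex> guard(_threads_lock);
    return _thread_state[i];
  }
};
```

After heap.set_gc_state_bit_concurrently(1, true) on a ToyHeap with three threads, both the global
state and every per-thread copy hold the value 2.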

Testing: hotspot_gc_shenandoah

Thanks,
-Aleksey


From zgu at redhat.com  Tue Jan 16 19:33:23 2018
From: zgu at redhat.com (Zhengyu Gu)
Date: Tue, 16 Jan 2018 14:33:23 -0500
Subject: RFR: ShConcurrentThread races with set_gc_state_bit
In-Reply-To: <4aaa7fbf-27b4-7531-02e8-3a11e8c501d8@redhat.com>
References: <4aaa7fbf-27b4-7531-02e8-3a11e8c501d8@redhat.com>
Message-ID: <0567b57b-d152-be85-8189-4f4d8f7f31a2@redhat.com>


void ShenandoahHeap::set_gc_state_bit_concurrently(uint bit, bool value) {
  _gc_state.set_cond(bit, value);
  MutexLocker mu(Threads_lock);
  JavaThread::set_gc_state_all_threads(_gc_state.raw_value());


I wonder if you want to move _gc_state.set_cond(bit, value) into the
locked section? Otherwise the global state gets set, and then we may
hit a safepoint ... not sure if it matters.

Otherwise, it looks good.

Thanks,

-Zhengyu


From shade at redhat.com  Tue Jan 16 19:34:37 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Tue, 16 Jan 2018 20:34:37 +0100
Subject: RFR: ShConcurrentThread races with set_gc_state_bit
In-Reply-To: <0567b57b-d152-be85-8189-4f4d8f7f31a2@redhat.com>
References: <4aaa7fbf-27b4-7531-02e8-3a11e8c501d8@redhat.com>
 <0567b57b-d152-be85-8189-4f4d8f7f31a2@redhat.com>
Message-ID: <bb7d52d4-7fdf-4632-11ac-8733293d6659@redhat.com>

On 01/16/2018 08:33 PM, Zhengyu Gu wrote:
> void ShenandoahHeap::set_gc_state_bit_concurrently(uint bit, bool value) {
>   _gc_state.set_cond(bit, value);
>   MutexLocker mu(Threads_lock);
>   JavaThread::set_gc_state_all_threads(_gc_state.raw_value());
> 
> 
> I wonder if you want to move _gc_state.set_cond(bit, value) into locked section? In case that global
> state is set, then we hit a safepoint ... not sure if it is matter.

Does not really matter: the Threads_lock is here to capture all threads. The GC state manipulation
is MT-safe in itself.

Thanks,
-Aleksey


From zgu at redhat.com  Tue Jan 16 19:38:49 2018
From: zgu at redhat.com (Zhengyu Gu)
Date: Tue, 16 Jan 2018 14:38:49 -0500
Subject: RFR: ShConcurrentThread races with set_gc_state_bit
In-Reply-To: <bb7d52d4-7fdf-4632-11ac-8733293d6659@redhat.com>
References: <4aaa7fbf-27b4-7531-02e8-3a11e8c501d8@redhat.com>
 <0567b57b-d152-be85-8189-4f4d8f7f31a2@redhat.com>
 <bb7d52d4-7fdf-4632-11ac-8733293d6659@redhat.com>
Message-ID: <97d969a1-5930-6805-5a67-932e7ef35d2c@redhat.com>


On 01/16/2018 02:34 PM, Aleksey Shipilev wrote:
> On 01/16/2018 08:33 PM, Zhengyu Gu wrote:
>> void ShenandoahHeap::set_gc_state_bit_concurrently(uint bit, bool value) {
>>   _gc_state.set_cond(bit, value);
>>   MutexLocker mu(Threads_lock);
>>   JavaThread::set_gc_state_all_threads(_gc_state.raw_value());
>>
>>
>> I wonder if you want to move _gc_state.set_cond(bit, value) into locked section? In case that global
>> state is set, then we hit a safepoint ... not sure if it is matter.
> 
> Does not really matter: the Threads_lock is here to capture all threads. The GC state manipulation
> is MT-safe in itself.
OK.

Thanks,

-Zhengyu

> 
> Thanks,
> -Aleksey
> 

From ashipile at redhat.com  Tue Jan 16 19:43:55 2018
From: ashipile at redhat.com (ashipile at redhat.com)
Date: Tue, 16 Jan 2018 19:43:55 +0000
Subject: hg: shenandoah/jdk10: ShConcurrentThread races with set_gc_state_bit
Message-ID: <201801161943.w0GJhtEk016595@aojmv0008.oracle.com>

Changeset: 544322604347
Author:    shade
Date:      2018-01-16 20:23 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk10/rev/544322604347

ShConcurrentThread races with set_gc_state_bit

! src/hotspot/share/gc/shenandoah/shenandoahConcurrentThread.cpp
! src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp
! src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp


From zgu at redhat.com  Tue Jan 16 19:48:25 2018
From: zgu at redhat.com (Zhengyu Gu)
Date: Tue, 16 Jan 2018 14:48:25 -0500
Subject: RFR: Defer cleaning of system dictionary and friends to parallel
 cleaning phase
In-Reply-To: <5f6bb6b8-edfc-61e5-121e-d30a0370f372@redhat.com>
References: <85219d9c-2c55-edfd-76f2-282b01c1e2fd@redhat.com>
 <551fb0d8-da01-cfe2-ceaf-eb502bf4460f@redhat.com>
 <5f6bb6b8-edfc-61e5-121e-d30a0370f372@redhat.com>
Message-ID: <deb6eae2-1a1e-2a42-ae02-03f7bc62d670@redhat.com>


On 01/16/2018 01:15 PM, Roman Kennke wrote:
> On 16.01.2018 at 19:09, Aleksey Shipilev wrote:
>> On 01/16/2018 06:59 PM, Roman Kennke wrote:
>>> Found this during traversal GC work: when cleaning the system 
>>> dictionary and friends, we do clean it
>>> in the first pass, *single threaded* and then do the cleaning stuff 
>>> again, but multi-threaded. We
>>> shall defer cleaning to the parallel phase to begin with. That's what 
>>> G1 does too.
>>>
>>> http://cr.openjdk.java.net/~rkennke/defer_cleaning/webrev.00/
>>>
>>
>> Awwwwwwww.
>>
>> Note that in G1, there are two calls to do_unloading: one from 
>> weak_refs_work with "false", and
>> another from mark_sweep_phase1 with default "true".
> 
> For ordinary concurrent GCs, it cleans everything in parallel phase, and 
> thus passes 'false' to do_unloading(). For full-GC, I guess they don't 
> care and do everything single-threaded.
> 
>> Are you saying that doing this once with "false" is enough? It looks 
>> that ParallelCleaning stuff
>> purges ResolvedMethodTable, but does it do 
>> ClassLoaderDataGraph::do_unloading with
>> clean_previous_versions? Maybe we should cautiously say "full_gc", not 
>> "false" in the patch, so
>> last-ditch can still do it?
> 
> I believe the ParallelCleaning handles everything. Zhengyu?
ParallelCleaning does handle ResolvedMethodTable ...

-Zhengyu


> 
> Roman

From shade at redhat.com  Tue Jan 16 20:28:51 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Tue, 16 Jan 2018 21:28:51 +0100
Subject: RFR: Defer cleaning of system dictionary and friends to parallel
 cleaning phase
In-Reply-To: <deb6eae2-1a1e-2a42-ae02-03f7bc62d670@redhat.com>
References: <85219d9c-2c55-edfd-76f2-282b01c1e2fd@redhat.com>
 <551fb0d8-da01-cfe2-ceaf-eb502bf4460f@redhat.com>
 <5f6bb6b8-edfc-61e5-121e-d30a0370f372@redhat.com>
 <deb6eae2-1a1e-2a42-ae02-03f7bc62d670@redhat.com>
Message-ID: <174d5523-ed9b-8f00-e570-102cbbf186c7@redhat.com>

On 01/16/2018 08:48 PM, Zhengyu Gu wrote:
> 
> 
> On 01/16/2018 01:15 PM, Roman Kennke wrote:
>> On 16.01.2018 at 19:09, Aleksey Shipilev wrote:
>>> On 01/16/2018 06:59 PM, Roman Kennke wrote:
>>>> Found this during traversal GC work: when cleaning the system dictionary and friends, we do
>>>> clean it
>>>> in the first pass, *single threaded* and then do the cleaning stuff again, but multi-threaded. We
>>>> shall defer cleaning to the parallel phase to begin with. That's what G1 does too.
>>>>
>>>> http://cr.openjdk.java.net/~rkennke/defer_cleaning/webrev.00/
>>>>
>>>
>>> Awwwwwwww.
>>>
>>> Note that in G1, there are two calls to do_unloading: one from weak_refs_work with "false", and
>>> another from mark_sweep_phase1 with default "true".
>>
>> For ordinary concurrent GCs, it cleans everything in parallel phase, and thus passes 'false' to
>> do_unloading(). For full-GC, I guess they don't care and do everything single-threaded.
>>
>>> Are you saying that doing this once with "false" is enough? It looks that ParallelCleaning stuff
>>> purges ResolvedMethodTable, but does it do ClassLoaderDataGraph::do_unloading with
>>> clean_previous_versions? Maybe we should cautiously say "full_gc", not "false" in the patch, so
>>> last-ditch can still do it?
>>
>> I believe the ParallelCleaning handles everything. Zhengyu?
> ParallelCleaning does handle ResolvedMethodTable ...

Ok then!

-Aleksey


From shade at redhat.com  Tue Jan 16 21:36:58 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Tue, 16 Jan 2018 22:36:58 +0100
Subject: RFR: Refactor allocation failure and explicit GC handling
Message-ID: <c3ca4026-951c-fbbd-adf2-6ac3d156af28@redhat.com>

http://cr.openjdk.java.net/~shade/shenandoah/refactor-af-explicit-gc/webrev.01/

This refactors the allocation failure and explicit GC handling, and prepares the code for the
arrival of STW Degenerate GC.

Tour of changes:

1. For historical reasons, we used to have the full_gc_* members in ShConcThread to handle the
allocation failure, because Full GC was the only option available to us. With the advent of
degenerated CM (concurrent mark) and UR (update refs), it came to mean just "allocation failure".
With Degenerated GC, it would depart further from its original meaning. So, renaming full_gc_* to
alloc_failure_* to capture the real intent and rewiring accordingly is one part of the refactoring.

Behavioral change: Alloc-failed threads are not immediately kicked after degenerated CM and
degenerate UR, and instead they wait for the end of the cycle. This avoids a bad race against the
alloc-failed threads that are coming with cancellation at the same time, and it keeps us away from
OOM-during-evac when after-CM cleanup cannot regain enough space. This would be the behavior of the
upcoming Degenerated GC anyway.


2. There is also the path that invokes explicit GCs. Again, for historical reasons, that originally
meant only Full GC. With the advent of ExplicitGCInvokesConcurrent support, it means both concurrent
and Full GC cycles! So, renaming conc_gc_* to explicit_gc_* and rewiring accordingly is the second
part of refactoring.

Behavioral change: Explicit GC no longer cancels the concurrent cycle, instead it waits for another
control loop iteration to start explicit GC. This is for the best, because it both simplifies our
handling logic, and allows requesters to wait for their own cycle. This is interesting when
concurrent cycle is running, ExplicitGCInvokesConcurrent is enabled and System.gc() is called: the
requesting thread would wait for one complete GC cycle to start and finish.


3. The logic in the main control loop used to handle weird paths from cancellations back to Full GC.
Having proper designations for alloc failure and explicit GCs helps to write out the proper
priorities for these events. This also allows us to potentially plug in Degenerated GC for the
out-of-cycle Allocation Failures, instead of unconditionally doing the Full GC.


4. Pulling the code out of ShenandoahHeap back into ShenandoahConcurrentThread reduces coupling.
Also, ShenandoahGCCause is eliminated in favor of the proper GCCause, which simplifies the logic
further.


5. Additionally, gc+stats now tells things like these:

----- 8< ----------------------------------------------------------------------------------------

 Under allocation pressure, concurrent cycles will cancel, and either continue phase under
 stop-the-world pause or result in stop-the-world Full GC. Increase heap size, tune GC heuristics,
 or lower allocation rate to avoid degenerated and Full GC cycles.

   85 successful concurrent GC cycles
   27 cancelled concurrent GC cycles (5 degenerated marks, 10 degenerated update refs, 12 Full GCs)
   11 out-of-cycle allocation failures (11 Full GCs)
    0 explicitly requested GC cycles (0 Full GCs)

----- 8< ----------------------------------------------------------------------------------------
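
To make the "proper priorities" from item 3 concrete, here is a hedged sketch of one control loop
decision. Everything in it is hypothetical (GCMode, PendingRequests, choose_mode are illustration
names, not the webrev's code), and the priority order shown (allocation failure first, then explicit
GC, then heuristics) is an assumption, since the tour above does not spell the order out.

```cpp
// Hypothetical types for illustration; not taken from the webrev.
enum class GCMode { None, ConcurrentGC, FullGC };

struct PendingRequests {
  bool alloc_failure;  // an out-of-cycle allocation failure was reported
  bool explicit_gc;    // System.gc() or similar was requested
  bool heuristics_go;  // heuristics want a regular concurrent cycle
};

// One control loop iteration: pick what to run next.
// Assumed priority: allocation failure, then explicit GC, then heuristics.
GCMode choose_mode(const PendingRequests& r, bool explicit_gc_invokes_concurrent) {
  if (r.alloc_failure) {
    // Today this is unconditionally Full GC; this branch is where a
    // Degenerated GC could be plugged in later.
    return GCMode::FullGC;
  }
  if (r.explicit_gc) {
    return explicit_gc_invokes_concurrent ? GCMode::ConcurrentGC
                                          : GCMode::FullGC;
  }
  if (r.heuristics_go) {
    return GCMode::ConcurrentGC;
  }
  return GCMode::None;
}
```

With separate alloc_failure and explicit_gc inputs, the loop no longer has to infer from a
cancellation why it is heading to Full GC.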


Testing: hotspot_gc_shenandoah

Thanks,
-Aleksey


From rkennke at redhat.com  Tue Jan 16 22:16:04 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Tue, 16 Jan 2018 23:16:04 +0100
Subject: RFR: Refactor allocation failure and explicit GC handling
In-Reply-To: <c3ca4026-951c-fbbd-adf2-6ac3d156af28@redhat.com>
References: <c3ca4026-951c-fbbd-adf2-6ac3d156af28@redhat.com>
Message-ID: <12c87930-1d42-56e8-a502-90b545a8b7a8@redhat.com>

Hi Aleksey,

I like it. This mess was long overdue for some refactoring ;-)

I am unsure about:

> Behavioral change: Explicit GC no longer cancels the concurrent cycle, instead it waits for another
> control loop iteration to start explicit GC. This is for the best, because it both simplifies our
> handling logic, and allows requesters to wait for their own cycle. This is interesting when
> concurrent cycle is running, ExplicitGCInvokesConcurrent is enabled and System.gc() is called: the
> requesting thread would wait for one complete GC cycle to start and finish.

Does that mean that System.gc() at the beginning of marking would wait
until marking+evac(+updaterefs?) finishes, then do the full-gc, and
only then is the Java thread allowed to progress? I guess it does not
matter very much, but what is the point of waiting for the current
cycle to complete if it goes into full-gc anyway? I guess it is more
relevant with ExplicitGCInvokesConcurrent (as you point out).

Other than that, it is good for me.

Thanks, Roman

From shade at redhat.com  Tue Jan 16 23:25:28 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 17 Jan 2018 00:25:28 +0100
Subject: RFR: Refactor allocation failure and explicit GC handling
In-Reply-To: <12c87930-1d42-56e8-a502-90b545a8b7a8@redhat.com>
References: <c3ca4026-951c-fbbd-adf2-6ac3d156af28@redhat.com>
 <12c87930-1d42-56e8-a502-90b545a8b7a8@redhat.com>
Message-ID: <48267858-d451-946b-24a4-048d868942cd@redhat.com>

On 01/16/2018 11:16 PM, Roman Kennke wrote:
> I am unsure about:
> 
>> Behavioral change: Explicit GC no longer cancels the concurrent cycle, instead it waits for another
>> control loop iteration to start explicit GC. This is for the best, because it both simplifies our
>> handling logic, and allows requesters to wait for their own cycle. This is interesting when
>> concurrent cycle is running, ExplicitGCInvokesConcurrent is enabled and System.gc() is called: the
>> requesting thread would wait for one complete GC cycle to start and finish.
> 
> That does mean that System.gc() at the beginning of marking would wait until
> marking+evac(+updaterefs?) finishes, then does the full-gc, and only then is the Java thread allowed
> to progress? 

Yes. Think about it like an event loop, where the Full GC request gets queued while the Conc GC is
being processed, and the Full GC requester waits for its place in line. Basically, this shifts
System.gc() from being "OMG, drop everything" to being "Noted, take a number, we shall do this at
our convenience".
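
The "take a number" idea can be modeled with two counters. This is only a sketch under assumptions:
CycleTickets and its methods are hypothetical names, the model is single-threaded, and the real
driver would block the requester on a monitor rather than poll.

```cpp
#include <cstdint>

// Hypothetical model of the driver's cycle bookkeeping (not HotSpot code).
class CycleTickets {
  std::uint64_t _started = 0;   // number of cycles that have started
  std::uint64_t _finished = 0;  // number of cycles that have finished

public:
  void start_cycle()  { ++_started; }
  void finish_cycle() { _finished = _started; }

  // "Take a number": the requester is served by the first cycle that
  // starts after the request, so a cycle already in flight does not count.
  std::uint64_t request_gc() const { return _started + 1; }

  // True once the requester's own cycle has both started and finished.
  bool is_served(std::uint64_t ticket) const { return _finished >= ticket; }
};
```

A request made while cycle 1 is running gets ticket 2, so it is not served when cycle 1 finishes,
only when cycle 2 has started and finished. Two requesters arriving during the same in-flight cycle
get the same ticket and are served by the same cycle.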

> I guess it does not really matter very much, but what is the point to wait for current
> cycle completion if it goes into full-gc anyway? 

There is little performance point, I guess, and there are no performance guarantees for System.gc
either :) There are two off-the-bat considerations: a) an abrupt explicit GC in the middle of a
regular cycle can wreck the in-flight compaction decisions of smarter relocation heuristics; b) as
the concurrent GC cycle runs, we have more chances to coalesce explicit GCs from multiple threads
and do one Full GC at once, not many quick back-to-back cancellations.

But ultimately, this is really an implementation convenience: it makes cancellations happen *only*
during allocation failures, which simplifies reasoning about the whole thing. This helps a
lot with Degenerated GC, because cancellation is then the sole route to Degenerated GC (which can
then be upgraded to Full GC), without the need to figure out if that cancellation was due to
explicit GC.

-Aleksey


From zgu at redhat.com  Wed Jan 17 02:45:24 2018
From: zgu at redhat.com (Zhengyu Gu)
Date: Tue, 16 Jan 2018 21:45:24 -0500
Subject: RFR: Refactor allocation failure and explicit GC handling
In-Reply-To: <c3ca4026-951c-fbbd-adf2-6ac3d156af28@redhat.com>
References: <c3ca4026-951c-fbbd-adf2-6ac3d156af28@redhat.com>
Message-ID: <ddb16c55-c9f0-242f-e562-d521f26a226e@redhat.com>

ShenandoahConcurrentThread::handle_alloc_failure() now takes a monitor
lock, and this method is on the ShenandoahHeap::allocate_memory() path,
which in turn can be called inside the write barrier ... this seems to
be the scenario we talked about before, the one we *cannot* allow.

Maybe I missed something?

Thanks,

-Zhengyu


From rwestrel at redhat.com  Wed Jan 17 08:23:09 2018
From: rwestrel at redhat.com (Roland Westrelin)
Date: Wed, 17 Jan 2018 09:23:09 +0100
Subject: RFR: [9] Bulk backports to sh/jdk9
In-Reply-To: <2eacce66-6153-b8c8-1352-906990d19080@redhat.com>
References: <2eacce66-6153-b8c8-1352-906990d19080@redhat.com>
Message-ID: <dk6h8rkyjv6.fsf@rwestrel.remote.csb>


> http://cr.openjdk.java.net/~shade/shenandoah/backports/jdk9-20180115/webrev.01/

C2 changes look ok.

Roland.

From rwestrel at redhat.com  Wed Jan 17 08:23:46 2018
From: rwestrel at redhat.com (Roland Westrelin)
Date: Wed, 17 Jan 2018 09:23:46 +0100
Subject: RFR: [8u] Bulk backports to sh/jdk8u
In-Reply-To: <6b294bb5-bb6c-2bff-b879-6515fd5970b7@redhat.com>
References: <6b294bb5-bb6c-2bff-b879-6515fd5970b7@redhat.com>
Message-ID: <dk6efmoyju5.fsf@rwestrel.remote.csb>


> http://cr.openjdk.java.net/~shade/shenandoah/backports/jdk8u-20180116/webrev.01/

C2 changes look ok.

Roland.

From shade at redhat.com  Wed Jan 17 08:32:11 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 17 Jan 2018 09:32:11 +0100
Subject: RFR: Refactor allocation failure and explicit GC handling
In-Reply-To: <ddb16c55-c9f0-242f-e562-d521f26a226e@redhat.com>
References: <c3ca4026-951c-fbbd-adf2-6ac3d156af28@redhat.com>
 <ddb16c55-c9f0-242f-e562-d521f26a226e@redhat.com>
Message-ID: <561d2a3a-c580-4911-7ba5-dbe38bee7415@redhat.com>

On 01/17/2018 03:45 AM, Zhengyu Gu wrote:
> ShenandoahConcurrentThread::handle_alloc_failure() now takes monitor lock, and this method is in
> ShenandoahHeap::allocate_memory() path, in turn, can be called inside write barrier ... seems to be
> the scenario we talked before, that we *can not* do.

Note that allocate_memory on the *shared/TLAB* allocation path was taking a lock in the old code
too: see the path ShHeap::allocate_memory -> ShHeap::collect(_allocation_failure) ->
ShConcThread::do_full_gc. The trick here is not to lock when a shared_gc/GCLAB allocation fails, and
this is why we have a separate ::handle_alloc_failure_evac().

Thanks,
-Aleksey


From rkennke at redhat.com  Wed Jan 17 09:01:32 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 17 Jan 2018 10:01:32 +0100
Subject: RFR: Refactor allocation failure and explicit GC handling
In-Reply-To: <561d2a3a-c580-4911-7ba5-dbe38bee7415@redhat.com>
References: <c3ca4026-951c-fbbd-adf2-6ac3d156af28@redhat.com>
 <ddb16c55-c9f0-242f-e562-d521f26a226e@redhat.com>
 <561d2a3a-c580-4911-7ba5-dbe38bee7415@redhat.com>
Message-ID: <040313a8-7c7d-ce4f-3d75-78b050dc58aa@redhat.com>

On 17.01.2018 at 09:32, Aleksey Shipilev wrote:
> On 01/17/2018 03:45 AM, Zhengyu Gu wrote:
>> ShenandoahConcurrentThread::handle_alloc_failure() now takes monitor lock, and this method is in
>> ShenandoahHeap::allocate_memory() path, in turn, can be called inside write barrier ... seems to be
>> the scenario we talked before, that we *can not* do.
> 
> Note that allocate_memory on the *shared/TLAB* allocation path was taking a lock in the old code
> too: see the path in ShHeap::allocate_memory -> ShHeap::collect(_allocation_failure) ->
> ShConcThread::do_full_gc. The trick here is not to lock when shared_gc/GCLAB allocation fails, and
> this is why we have separate ::handle_alloc_failure_evac().
> 

Yes. We must not take locks under the write barrier, because that is a
leaf call and must never reach a safepoint. It's ok to take locks
in the allocation(-failure) path, because that is a non-leaf call and
may take safepoints.

In fact, this is what happens with other GCs: allocation failure goes
straight to VMThread::execute() ... I wonder if we could also do this
and avoid the locking? But then, how would we communicate with the
ShConcThread?

Roman


From shade at redhat.com  Wed Jan 17 09:03:50 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 17 Jan 2018 10:03:50 +0100
Subject: RFR: Refactor allocation failure and explicit GC handling
In-Reply-To: <040313a8-7c7d-ce4f-3d75-78b050dc58aa@redhat.com>
References: <c3ca4026-951c-fbbd-adf2-6ac3d156af28@redhat.com>
 <ddb16c55-c9f0-242f-e562-d521f26a226e@redhat.com>
 <561d2a3a-c580-4911-7ba5-dbe38bee7415@redhat.com>
 <040313a8-7c7d-ce4f-3d75-78b050dc58aa@redhat.com>
Message-ID: <ef73d93e-da9f-d1b9-69ea-7a162e982b97@redhat.com>

On 01/17/2018 10:01 AM, Roman Kennke wrote:
> In fact, this is what happens with other GCs: allocation failure goes straight to
> VMThread::execute() ... I wonder if we could also do this and avoid the locking? But then, how to
> communicate with the ShConcThread?

No, we should not do the VMOp right away. Our ShConcThread is really a Driver, and we need to tell
the Driver we have experienced allocation failure. Then it could decide what to do: Full GC,
Degenerated GC, continue with Conc GC, fail hard...

Thanks,
-Aleksey


From ashipile at redhat.com  Wed Jan 17 09:52:46 2018
From: ashipile at redhat.com (ashipile at redhat.com)
Date: Wed, 17 Jan 2018 09:52:46 +0000
Subject: hg: shenandoah/jdk9/hotspot: 12 new changesets
Message-ID: <201801170952.w0H9qk5B000404@aojmv0008.oracle.com>

Changeset: d0ad502cc3a0
Author:    rkennke
Date:      2018-01-15 16:29 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/d0ad502cc3a0

[backport] Increase test timeouts

! test/gc/shenandoah/EvilSyncBug.java
! test/gc/shenandoah/jvmti/TestHeapDump.java

Changeset: e18143c303e9
Author:    shade
Date:      2018-01-15 16:32 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/e18143c303e9

[backport] Report fwdptr size in JNI GetObjectSize

! src/share/vm/prims/jvmtiEnv.cpp
! src/share/vm/prims/whitebox.cpp

Changeset: a0be695501fe
Author:    rkennke
Date:      2018-01-15 16:33 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/a0be695501fe

[backport] Disable verification from non-Shenandoah VMOps.

! src/share/vm/gc/shenandoah/shenandoahHeap.cpp

Changeset: 9da7354496dd
Author:    shade
Date:      2018-01-15 16:37 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/9da7354496dd

[backport] Cleanup reset_{next|complete}_mark_bitmap

! src/share/vm/gc/shenandoah/shenandoahConcurrentThread.cpp
! src/share/vm/gc/shenandoah/shenandoahHeap.cpp
! src/share/vm/gc/shenandoah/shenandoahHeap.hpp
! src/share/vm/gc/shenandoah/shenandoahMarkCompact.cpp

Changeset: 3b3dbadb82eb
Author:    shade
Date:      2018-01-15 16:39 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/3b3dbadb82eb

[backport] Verifier should check klass pointers before attempting to reach for object size

! src/share/vm/gc/shenandoah/shenandoahVerifier.cpp

Changeset: 8a3aef24b983
Author:    shade
Date:      2018-01-15 16:39 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/8a3aef24b983

[backport] TestSelectiveBarrierFlags times out due to too aggressive compilation mode

! test/gc/shenandoah/TestSelectiveBarrierFlags.java

Changeset: 2bca755bd2e5
Author:    zgu
Date:      2018-01-15 16:52 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/2bca755bd2e5

[backport] Shenandoah SA implementation

! src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/gc/shared/CollectedHeap.java
! src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/gc/shared/CollectedHeapName.java
+ src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/gc/shenandoah/ShenandoahHeap.java
+ src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/gc/shenandoah/ShenandoahHeapRegion.java
+ src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/gc/shenandoah/ShenandoahHeapRegionSet.java
! src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/memory/Universe.java
! src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/ObjectHeap.java
! src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/tools/HeapSummary.java
! src/share/vm/gc/shenandoah/shenandoahHeap.hpp
! src/share/vm/gc/shenandoah/shenandoahHeapRegion.hpp
! src/share/vm/gc/shenandoah/shenandoahHeapRegionSet.hpp
+ src/share/vm/gc/shenandoah/vmStructs_shenandoah.hpp
! src/share/vm/runtime/vmStructs.cpp

Changeset: 2f34f1efc3e1
Author:    roland
Date:      2018-01-15 17:03 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/2f34f1efc3e1

[backport] Allow use of fp spills around write barrier

! src/share/vm/opto/lcm.cpp

Changeset: e1bdfc09b91a
Author:    shade
Date:      2018-01-15 17:24 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/e1bdfc09b91a

[backport] Rehash VMOperations and cycle driver mechanics for consistency

! src/share/vm/gc/shenandoah/shenandoahConcurrentThread.cpp
! src/share/vm/gc/shenandoah/shenandoahHeap.cpp
! src/share/vm/gc/shenandoah/shenandoahHeap.hpp
! src/share/vm/gc/shenandoah/shenandoahMarkCompact.cpp
! src/share/vm/gc/shenandoah/shenandoahPhaseTimings.cpp
! src/share/vm/gc/shenandoah/shenandoahPhaseTimings.hpp
! src/share/vm/gc/shenandoah/shenandoahUtils.cpp
! src/share/vm/gc/shenandoah/shenandoahUtils.hpp
! src/share/vm/gc/shenandoah/shenandoahWorkerPolicy.cpp
! src/share/vm/gc/shenandoah/shenandoahWorkerPolicy.hpp
! src/share/vm/gc/shenandoah/vm_operations_shenandoah.cpp

Changeset: a335541ed527
Author:    zgu
Date:      2018-01-15 17:28 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/a335541ed527

[backport] Minor cleanup, uses latest Atomic API

! src/share/vm/gc/shenandoah/shenandoahCodeRoots.hpp

Changeset: fcf4e5e7b36f
Author:    shade
Date:      2018-01-15 17:29 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/fcf4e5e7b36f

[backport] Match barrier fastpath checks better

! src/cpu/x86/vm/x86_64.ad

Changeset: b2bc1c1c6fd7
Author:    shade
Date:      2018-01-15 17:32 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/b2bc1c1c6fd7

[backport] ShenandoahWriteBarrierRB flag to conditionally disable RB on WB fastpath

! src/cpu/aarch64/vm/macroAssembler_aarch64.cpp
! src/cpu/x86/vm/macroAssembler_x86.cpp
! src/share/vm/gc/shenandoah/shenandoah_globals.hpp
! src/share/vm/opto/shenandoahSupport.cpp


From ashipile at redhat.com  Wed Jan 17 10:33:54 2018
From: ashipile at redhat.com (ashipile at redhat.com)
Date: Wed, 17 Jan 2018 10:33:54 +0000
Subject: hg: shenandoah/jdk8u/hotspot: 12 new changesets
Message-ID: <201801171033.w0HAXtRT016624@aojmv0008.oracle.com>

Changeset: c580b405b19c
Author:    rkennke
Date:      2018-01-15 18:56 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/c580b405b19c

[backport] Increase test timeouts

! test/gc/shenandoah/EvilSyncBug.java
! test/gc/shenandoah/jvmti/TestHeapDump.sh

Changeset: 889331b172e1
Author:    shade
Date:      2018-01-15 18:56 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/889331b172e1

[backport] Report fwdptr size in JNI GetObjectSize

! src/share/vm/prims/jvmtiEnv.cpp
! src/share/vm/prims/whitebox.cpp

Changeset: 229a50c88055
Author:    rkennke
Date:      2018-01-15 18:56 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/229a50c88055

[backport] Disable verification from non-Shenandoah VMOps.

! src/share/vm/gc_implementation/shenandoah/shenandoahHeap.cpp

Changeset: 8459d5e19134
Author:    shade
Date:      2018-01-15 18:56 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/8459d5e19134

[backport] Cleanup reset_{next|complete}_mark_bitmap

! src/share/vm/gc_implementation/shenandoah/shenandoahConcurrentThread.cpp
! src/share/vm/gc_implementation/shenandoah/shenandoahHeap.cpp
! src/share/vm/gc_implementation/shenandoah/shenandoahHeap.hpp
! src/share/vm/gc_implementation/shenandoah/shenandoahMarkCompact.cpp

Changeset: 1a1daa04a9ca
Author:    shade
Date:      2018-01-15 18:56 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/1a1daa04a9ca

[backport] Verifier should check klass pointers before attempting to reach for object size

! src/share/vm/gc_implementation/shenandoah/shenandoahVerifier.cpp

Changeset: a53bcb78b95d
Author:    shade
Date:      2018-01-15 18:56 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/a53bcb78b95d

[backport] TestSelectiveBarrierFlags times out due to too aggressive compilation mode

! test/gc/shenandoah/TestSelectiveBarrierFlags.java

Changeset: b9559ebe9575
Author:    zgu
Date:      2018-01-15 19:21 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/b9559ebe9575

[backport] Shenandoah SA implementation

+ agent/src/share/classes/sun/jvm/hotspot/gc_implementation/shenandoah/ShenandoahHeap.java
+ agent/src/share/classes/sun/jvm/hotspot/gc_implementation/shenandoah/ShenandoahHeapRegion.java
+ agent/src/share/classes/sun/jvm/hotspot/gc_implementation/shenandoah/ShenandoahHeapRegionSet.java
! agent/src/share/classes/sun/jvm/hotspot/gc_interface/CollectedHeap.java
! agent/src/share/classes/sun/jvm/hotspot/gc_interface/CollectedHeapName.java
! agent/src/share/classes/sun/jvm/hotspot/memory/Universe.java
! agent/src/share/classes/sun/jvm/hotspot/oops/ObjectHeap.java
! agent/src/share/classes/sun/jvm/hotspot/tools/HeapSummary.java
! src/share/vm/gc_implementation/shenandoah/shenandoahHeap.hpp
! src/share/vm/gc_implementation/shenandoah/shenandoahHeapRegion.hpp
! src/share/vm/gc_implementation/shenandoah/shenandoahHeapRegionSet.hpp
+ src/share/vm/gc_implementation/shenandoah/vmStructs_shenandoah.hpp
! src/share/vm/runtime/vmStructs.cpp

Changeset: 2310d6a52d04
Author:    roland
Date:      2018-01-17 10:28 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/2310d6a52d04

[backport] Allow use of fp spills around write barrier

! src/share/vm/opto/lcm.cpp

Changeset: 6d265ee073d5
Author:    shade
Date:      2018-01-17 10:28 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/6d265ee073d5

[backport] Rehash VMOperations and cycle driver mechanics for consistency

! src/share/vm/gc_implementation/shenandoah/shenandoahConcurrentThread.cpp
! src/share/vm/gc_implementation/shenandoah/shenandoahHeap.cpp
! src/share/vm/gc_implementation/shenandoah/shenandoahHeap.hpp
! src/share/vm/gc_implementation/shenandoah/shenandoahMarkCompact.cpp
! src/share/vm/gc_implementation/shenandoah/shenandoahPhaseTimings.cpp
! src/share/vm/gc_implementation/shenandoah/shenandoahPhaseTimings.hpp
! src/share/vm/gc_implementation/shenandoah/shenandoahUtils.cpp
! src/share/vm/gc_implementation/shenandoah/shenandoahUtils.hpp
! src/share/vm/gc_implementation/shenandoah/shenandoahWorkerPolicy.cpp
! src/share/vm/gc_implementation/shenandoah/shenandoahWorkerPolicy.hpp
! src/share/vm/gc_implementation/shenandoah/vm_operations_shenandoah.cpp

Changeset: 65ff5f8ac60f
Author:    zgu
Date:      2018-01-17 10:28 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/65ff5f8ac60f

[backport] Minor cleanup, uses latest Atomic API

! src/share/vm/gc_implementation/shenandoah/shenandoahCodeRoots.hpp

Changeset: 755e302d100e
Author:    shade
Date:      2018-01-17 10:28 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/755e302d100e

[backport] Match barrier fastpath checks better

! src/cpu/x86/vm/x86_64.ad

Changeset: 32480cdd3a60
Author:    shade
Date:      2018-01-17 10:28 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/32480cdd3a60

[backport] ShenandoahWriteBarrierRB flag to conditionally disable RB on WB fastpath

! src/cpu/aarch64/vm/macroAssembler_aarch64.cpp
! src/cpu/x86/vm/macroAssembler_x86.cpp
! src/share/vm/gc_implementation/shenandoah/shenandoah_globals.hpp
! src/share/vm/opto/shenandoahSupport.cpp


From zgu at redhat.com  Wed Jan 17 13:35:47 2018
From: zgu at redhat.com (Zhengyu Gu)
Date: Wed, 17 Jan 2018 08:35:47 -0500
Subject: RFR: Refactor allocation failure and explicit GC handling
In-Reply-To: <561d2a3a-c580-4911-7ba5-dbe38bee7415@redhat.com>
References: <c3ca4026-951c-fbbd-adf2-6ac3d156af28@redhat.com>
 <ddb16c55-c9f0-242f-e562-d521f26a226e@redhat.com>
 <561d2a3a-c580-4911-7ba5-dbe38bee7415@redhat.com>
Message-ID: <5d1c1ddf-cac2-1e16-859c-6b30b733ca46@redhat.com>


On 01/17/2018 03:32 AM, Aleksey Shipilev wrote:
> On 01/17/2018 03:45 AM, Zhengyu Gu wrote:
>> ShenandoahConcurrentThread::handle_alloc_failure() now takes a monitor lock, and this method is on
>> the ShenandoahHeap::allocate_memory() path, which in turn can be called inside a write barrier. This
>> seems to be the scenario we talked about before, the one that we *cannot* do.
> 
> Note that allocate_memory on the *shared/TLAB* allocation path was taking a lock in the old code
> too: see the path in ShHeap::allocate_memory -> ShHeap::collect(_allocation_failure) ->
> ShConcThread::do_full_gc. The trick here is not to lock when shared_gc/GCLAB allocation fails, and
> this is why we have separate ::handle_alloc_failure_evac().

Ah, it is a bit hard to read, could you add some comments like:

ShenandoahHeap:
  726   if (type == _alloc_tlab || type == _alloc_shared) {
    ....
  } else {
     assert(type == _alloc_gclab || type == _alloc_shared_gc, ...");
     // OOM handled by ....
  }


Thanks,

-Zhengyu


> 
> Thanks,
> -Aleksey
> 

From shade at redhat.com  Wed Jan 17 14:12:21 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 17 Jan 2018 15:12:21 +0100
Subject: RFR: Refactor allocation failure and explicit GC handling
In-Reply-To: <5d1c1ddf-cac2-1e16-859c-6b30b733ca46@redhat.com>
References: <c3ca4026-951c-fbbd-adf2-6ac3d156af28@redhat.com>
 <ddb16c55-c9f0-242f-e562-d521f26a226e@redhat.com>
 <561d2a3a-c580-4911-7ba5-dbe38bee7415@redhat.com>
 <5d1c1ddf-cac2-1e16-859c-6b30b733ca46@redhat.com>
Message-ID: <74baaf19-1715-5f8a-f700-7d4f0cc08441@redhat.com>

On 01/17/2018 02:35 PM, Zhengyu Gu wrote:
> 
> 
> On 01/17/2018 03:32 AM, Aleksey Shipilev wrote:
>> On 01/17/2018 03:45 AM, Zhengyu Gu wrote:
>>> ShenandoahConcurrentThread::handle_alloc_failure() now takes a monitor lock, and this method is on
>>> the ShenandoahHeap::allocate_memory() path, which in turn can be called inside a write barrier. This
>>> seems to be the scenario we talked about before, the one that we *cannot* do.
>>
>> Note that allocate_memory on the *shared/TLAB* allocation path was taking a lock in the old code
>> too: see the path in ShHeap::allocate_memory -> ShHeap::collect(_allocation_failure) ->
>> ShConcThread::do_full_gc. The trick here is not to lock when shared_gc/GCLAB allocation fails, and
>> this is why we have separate ::handle_alloc_failure_evac().
> 
> Ah, it is a bit hard to read, could you add some comments like:
> 
> ShenandoahHeap:
>   726   if (type == _alloc_tlab || type == _alloc_shared) {
>     ....
>   } else {
>      assert(type == _alloc_gclab || type == _alloc_shared_gc, ...");
>      // OOM handled by ....
>   }
> 

That makes sense, added:

diff -r 898a5ca31274 src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp
--- a/src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp	Tue Jan 16 22:15:34 2018 +0100
+++ b/src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp	Wed Jan 17 15:11:55 2018 +0100
@@ -741,6 +741,10 @@
       concurrent_thread()->handle_alloc_failure();
       result = allocate_memory_under_lock(word_size, type, in_new_region);
     }
+  } else {
+    assert(type == _alloc_gclab || type == _alloc_shared_gc, "Can only accept these types here");
+    // Do not call handle_alloc_failure() here, because we cannot block.
+    // The allocation failure would be handled by the WB slowpath with handle_alloc_failure_evac().
   }

   if (in_new_region) {
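
A minimal, self-contained sketch of the split that the added comment describes: mutator allocations (TLAB/shared) may block and retry after a failure, while GC allocations (GCLAB/shared_gc) must not block and leave the failure to the evacuation path. The enum and function names below are hypothetical stand-ins, not the actual ShenandoahHeap code.

```cpp
#include <cassert>

// Hypothetical stand-ins for the allocation types named in this thread.
enum class AllocType { Shared, Tlab, SharedGc, Gclab };

// What the allocation path does once an allocation has failed.
enum class FailureAction {
  BlockAndRetry,       // mutator path: may take a lock, wait for GC, retry
  DeferToEvacHandler   // GC path: must not block, handled by the caller
};

// True for allocations made on behalf of mutator (Java) threads.
inline bool is_mutator_alloc(AllocType type) {
  return type == AllocType::Tlab || type == AllocType::Shared;
}

// Decide how to react to a failed allocation of the given type.
inline FailureAction on_alloc_failure(AllocType type) {
  if (is_mutator_alloc(type)) {
    return FailureAction::BlockAndRetry;
  } else {
    // Only GC-side allocation types can reach this branch.
    assert(type == AllocType::Gclab || type == AllocType::SharedGc);
    // Blocking here could happen inside a write barrier, so defer instead.
    return FailureAction::DeferToEvacHandler;
  }
}
```

The point of the explicit else branch is that a reader sees both halves of the decision in one place, rather than having to infer the non-blocking case from the absence of code.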

Thanks,
-Aleksey


From zgu at redhat.com  Wed Jan 17 14:21:33 2018
From: zgu at redhat.com (Zhengyu Gu)
Date: Wed, 17 Jan 2018 09:21:33 -0500
Subject: RFR: Refactor allocation failure and explicit GC handling
In-Reply-To: <74baaf19-1715-5f8a-f700-7d4f0cc08441@redhat.com>
References: <c3ca4026-951c-fbbd-adf2-6ac3d156af28@redhat.com>
 <ddb16c55-c9f0-242f-e562-d521f26a226e@redhat.com>
 <561d2a3a-c580-4911-7ba5-dbe38bee7415@redhat.com>
 <5d1c1ddf-cac2-1e16-859c-6b30b733ca46@redhat.com>
 <74baaf19-1715-5f8a-f700-7d4f0cc08441@redhat.com>
Message-ID: <a142f706-0b7b-ac69-c8f1-be368a2b3787@redhat.com>


On 01/17/2018 09:12 AM, Aleksey Shipilev wrote:
> On 01/17/2018 02:35 PM, Zhengyu Gu wrote:
>>
>>
>> On 01/17/2018 03:32 AM, Aleksey Shipilev wrote:
>>> On 01/17/2018 03:45 AM, Zhengyu Gu wrote:
>>>> ShenandoahConcurrentThread::handle_alloc_failure() now takes a monitor lock, and this method is on
>>>> the ShenandoahHeap::allocate_memory() path, which in turn can be called inside a write barrier. This
>>>> seems to be the scenario we talked about before, the one that we *cannot* do.
>>>
>>> Note that allocate_memory on the *shared/TLAB* allocation path was taking a lock in the old code
>>> too: see the path in ShHeap::allocate_memory -> ShHeap::collect(_allocation_failure) ->
>>> ShConcThread::do_full_gc. The trick here is not to lock when shared_gc/GCLAB allocation fails, and
>>> this is why we have separate ::handle_alloc_failure_evac().
>>
>> Ah, it is a bit hard to read, could you add some comments like:
>>
>> ShenandoahHeap:
>>   726   if (type == _alloc_tlab || type == _alloc_shared) {
>>     ....
>>   } else {
>>      assert(type == _alloc_gclab || type == _alloc_shared_gc, ...");
>>      // OOM handled by ....
>>   }
>>
> 
> That makes sense, added:
> 
> diff -r 898a5ca31274 src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp
> --- a/src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp	Tue Jan 16 22:15:34 2018 +0100
> +++ b/src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp	Wed Jan 17 15:11:55 2018 +0100
> @@ -741,6 +741,10 @@
>         concurrent_thread()->handle_alloc_failure();
>         result = allocate_memory_under_lock(word_size, type, in_new_region);
>       }
> +  } else {
> +    assert(type == _alloc_gclab || type == _alloc_shared_gc, "Can only accept these types here");
> +    // Do not call handle_alloc_failure() here, because we cannot block.
> +    // The allocation failure would be handled by the WB slowpath with handle_alloc_failure_evac().
>     }
> 
>     if (in_new_region) {
>

Great! Looks good to me.

Thanks,

-Zhengyu


> Thanks,
> -Aleksey
> 

From zgu at redhat.com  Wed Jan 17 14:28:55 2018
From: zgu at redhat.com (Zhengyu Gu)
Date: Wed, 17 Jan 2018 09:28:55 -0500
Subject: RFR: [8u] Bulk backports to sh/jdk8u
In-Reply-To: <dk6efmoyju5.fsf@rwestrel.remote.csb>
References: <6b294bb5-bb6c-2bff-b879-6515fd5970b7@redhat.com>
 <dk6efmoyju5.fsf@rwestrel.remote.csb>
Message-ID: <4b28c064-9fab-4386-7c3c-d7d92245b28e@redhat.com>

SA and cleanup look good.

-Zhengyu

On 01/17/2018 03:23 AM, Roland Westrelin wrote:
> 
>> http://cr.openjdk.java.net/~shade/shenandoah/backports/jdk8u-20180116/webrev.01/
> 
> C2 changes look ok.
> 
> Roland.
> 

From roman at kennke.org  Wed Jan 17 14:37:57 2018
From: roman at kennke.org (roman at kennke.org)
Date: Wed, 17 Jan 2018 14:37:57 +0000
Subject: hg: shenandoah/jdk10: Defer cleaning of system dictionary and friends
 to parallel cleaning phase
Message-ID: <201801171437.w0HEbvcs028801@aojmv0008.oracle.com>

Changeset: 1d1238a0603b
Author:    rkennke
Date:      2018-01-17 15:33 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk10/rev/1d1238a0603b

Defer cleaning of system dictionary and friends to parallel cleaning phase

! src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp


From rkennke at redhat.com  Wed Jan 17 14:37:55 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 17 Jan 2018 15:37:55 +0100
Subject: RFR: Traveral GC heuristics
In-Reply-To: <f0715fa7-b31f-8d1c-8015-22d1c64056ab@redhat.com>
References: <f0715fa7-b31f-8d1c-8015-22d1c64056ab@redhat.com>
Message-ID: <e119d322-384a-9d30-35ef-c9dd352854c9@redhat.com>

Testing showed up some regressions in non-traversal code and two issues
that I introduced (or haven't fixed) when the single-flag patch arrived.

The following now passes hotspot_gc_shenandoah tests and runs of specjvm
with fastdebug with -XX:+ShenandoahVerify
-XX:ShenandoahGCHeuristics=traversal, with -XX:TieredStopAtLevel=0|1|4

Differential:
http://cr.openjdk.java.net/~rkennke/traversal/webrev.01.diff/
Full:
http://cr.openjdk.java.net/~rkennke/traversal/webrev.01/

Please review, test, comment, etc. :-)

Cheers, Roman

> This started out as a smallish partial-GC experiment, then turned into a
> clone of partial GC, and ended up as a standalone GC mode for Shenandoah,
> which is a frankensteinization of partial+concurrent-marking, with some
> goodies :-)
> 
> The idea is to do everything, marking+evacuation+update-refs, in one 
> single phase. This is not very difficult to do: while traversing, 
> evacuate objects that are in the Cset, and update references as we go. I 
> chose to traverse the heap using an incremental-update approach, mostly 
> because this is what partial GC does, and as said above, this started 
> out as a clone of partial :-)
> 
> The tricky part is to choose the Cset: I made it such that each GC cycle 
> collects liveness information, and bases the decision about Cset in the 
> next cycle on that liveness information. Yes, this means the first cycle 
> does not collect anything (except immediate garbage).
> 
> Advantages:
> - obviously, touching all live objects only once means less time spent 
> in GC. Measurements show that traversing the heap and doing everything 
> is only slightly longer than Shenandoah's marking phase, and this might 
> actually be because we also need to mark through newly allocated objects.
> - Traversal-order evacuation gives us 10x increase in ordering-sensitive 
> microbenchmark: 
> https://shipilev.net/jvm-anatomy-park/11-moving-gc-locality/
> 
> - Simpler barriers: i-u style barriers don't need to load the pre-value, 
> and can be optimized much better (hoisted out of hot paths, etc). Some 
> of it is already done in this patch, but there are plenty of 
> opportunities to make it even better.
> - Possibly less floating garbage because we trace through newly
> allocated objects too, and don't treat them as implicitly live.
> - we don't need a keep-alive-barrier for Reference.get() which means we 
> keep fewer referents alive just because they happen to be accessed 
> during GC.
> - MWF is only a switch away (if I understand MWF correctly): 
> -XX:+ShenandoahMWF
> - It does not need RBs in the WB fast-path, because outside of the 
> single phase, nothing is ever forwarded.
> - It does not need the membar stuff in the WBs because we turn on/off 
> the phase during safepoint
> 
> Disadvantages:
> - Store-value barrier needs to be a WB, RB is not sufficient. The
> storeval barrier is there to ensure only to-space values ever get
> written to fields during update-refs. 3-phase Shenandoah doesn't
> evacuate during update-refs, and therefore RB is enough. We need WB
> here. (I believe this is offset by optimization opportunities, see above)
> - Known I-U problem: mutators can outrun the GC with allocations and
> keep us from terminating.
> - It needs barriers for constants (need to check this).
> 
> Stuff left to do:
> - Implement sane degeneration: if we hit OOM, we simply restart and go 
> into full-GC.
> - Depending on degen: make heuristics adaptive. Currently it requires 
> manual tweaking of thresholds.
> 
> Relevant knobs:
> - ShenandoahGarbageThreshold: regions with more garbage than this go 
> into the Cset. Notice that this is based on the *previous* cycle, so we 
> may actually have much more garbage (but not less).
> - ShenandoahFreeThreshold: start GC when we have less than that much 
> free heap.
> 
> I'll not go into all the details for now and give you the code:
> http://cr.openjdk.java.net/~rkennke/traversal/webrev.00/
> 
> 
> Roman
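
A rough sketch of the Cset selection described above, under stated assumptions: each region remembers the liveness measured by the *previous* cycle, and a region enters the Cset when its garbage exceeds a threshold percentage of the region capacity. The struct, the field names and the interpretation of the threshold are assumptions made for illustration; this is not the code in the webrev.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-region bookkeeping.
struct RegionInfo {
  std::size_t capacity_bytes;   // total size of the region
  std::size_t used_bytes;       // bytes currently allocated in the region
  std::size_t live_bytes_prev;  // live bytes as measured by the previous cycle
  bool liveness_known;          // false until a cycle has traversed the region
};

// Returns the indices of regions that should go into the collection set.
// The threshold is taken as a percentage of region capacity (an assumption).
inline std::vector<std::size_t> choose_cset(const std::vector<RegionInfo>& regions,
                                            unsigned garbage_threshold_percent) {
  assert(garbage_threshold_percent <= 100);
  std::vector<std::size_t> cset;
  for (std::size_t i = 0; i < regions.size(); ++i) {
    const RegionInfo& r = regions[i];
    if (!r.liveness_known) {
      // First cycle: no liveness data yet, so nothing can be selected.
      continue;
    }
    assert(r.live_bytes_prev <= r.used_bytes && r.used_bytes <= r.capacity_bytes);
    // Garbage estimate based on stale data; the real amount may be higher.
    std::size_t garbage = r.used_bytes - r.live_bytes_prev;
    // Integer comparison of garbage/capacity against threshold/100.
    if (garbage * 100 > r.capacity_bytes * garbage_threshold_percent) {
      cset.push_back(i);
    }
  }
  return cset;
}
```

With a threshold of 60, a fully used 1000-byte region with 100 live bytes is selected, one with 900 live bytes is not, and a region without liveness data is skipped, which mirrors the "first cycle does not collect anything" remark.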


From shade at redhat.com  Wed Jan 17 14:44:21 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 17 Jan 2018 15:44:21 +0100
Subject: RFR: Traveral GC heuristics
In-Reply-To: <e119d322-384a-9d30-35ef-c9dd352854c9@redhat.com>
References: <f0715fa7-b31f-8d1c-8015-22d1c64056ab@redhat.com>
 <e119d322-384a-9d30-35ef-c9dd352854c9@redhat.com>
Message-ID: <b5e1b09e-514c-e344-a48c-e1cdf0c0726c@redhat.com>

On 01/17/2018 03:37 PM, Roman Kennke wrote:
> Testing showed up some regressions in non-traversal code and two issues that I introduced (or
> haven't fixed) when the single-flag patch arrived.
>
> The following now passes hotspot_gc_shenandoah tests and runs of specjvm with fastdebug with
> -XX:+ShenandoahVerify -XX:ShenandoahGCHeuristics=traversal, with -XX:TieredStopAtLevel=0|1|4
> 
> Differential:
> http://cr.openjdk.java.net/~rkennke/traversal/webrev.01.diff/

Small-ish questions:

*) This solves some Partial GC bug, not Traversal GC bug? If so, can you RFR and push it separately?

--- old/src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.cpp	2018-01-17 15:32:54.756247073 +0100
+++ new/src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.cpp	2018-01-17 15:32:54.391251897 +0100
@@ -169,7 +169,7 @@
 }

 bool ShenandoahBarrierSet::need_update_refs_barrier() {
-  if (_heap->is_concurrent_partial_in_progress() || _heap->is_concurrent_traversal_in_progress()) {
+  if (UseShenandoahMatrix || _heap->is_concurrent_traversal_in_progress()) {
     return true;
   }
   if (_heap->shenandoahPolicy()->update_refs()) {


*) I think we have discussed the RFR for this -- does it turn out to be needed after all?

--- old/src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp	2018-01-17 15:32:54.135255280 +0100
+++ new/src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp	2018-01-17 15:32:53.869258796 +0100
@@ -727,7 +727,7 @@
   const int referent_offset = java_lang_ref_Reference::referent_offset;
   guarantee(referent_offset > 0, "referent offset not initialized");

-  if (UseG1GC || UseShenandoahGC) {
+  if (UseG1GC || (UseShenandoahGC && ShenandoahKeepAliveBarrier)) {
     Label slow_path;
     // rbx: method

Thanks,
-Aleksey


From rkennke at redhat.com  Wed Jan 17 15:08:19 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 17 Jan 2018 16:08:19 +0100
Subject: RFR: Traveral GC heuristics
In-Reply-To: <b5e1b09e-514c-e344-a48c-e1cdf0c0726c@redhat.com>
References: <f0715fa7-b31f-8d1c-8015-22d1c64056ab@redhat.com>
 <e119d322-384a-9d30-35ef-c9dd352854c9@redhat.com>
 <b5e1b09e-514c-e344-a48c-e1cdf0c0726c@redhat.com>
Message-ID: <9e18ba8d-593f-57a7-12cc-99cffe164a88@redhat.com>

Am 17.01.2018 um 15:44 schrieb Aleksey Shipilev:
> On 01/17/2018 03:37 PM, Roman Kennke wrote:
>> Testing showed up some regressions in non-traversal code and two issues that I introduced (or
>> haven't fixed) when the single-flag patch arrived.
>>
>> The following now passes hotspot_gc_shenandoah tests and runs of specjvm with fastdebug with
>> -XX:+ShenandoahVerify -XX:ShenandoahGCHeuristics=traversal, with -XX:TieredStopAtLevel=0|1|4
>>
>> Differential:
>> http://cr.openjdk.java.net/~rkennke/traversal/webrev.01.diff/
> 
> Small-ish questions:
> 
> *) This solves some Partial GC bug, not Traversal GC bug? If so, can you RFR and push it separately?
> 
> --- old/src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.cpp	2018-01-17 15:32:54.756247073 +0100
> +++ new/src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.cpp	2018-01-17 15:32:54.391251897 +0100
> @@ -169,7 +169,7 @@
>   }
> 
>   bool ShenandoahBarrierSet::need_update_refs_barrier() {
> -  if (_heap->is_concurrent_partial_in_progress() || _heap->is_concurrent_traversal_in_progress()) {
> +  if (UseShenandoahMatrix || _heap->is_concurrent_traversal_in_progress()) {
>       return true;
>     }
>     if (_heap->shenandoahPolicy()->update_refs()) {
> 
> 

No, this is a bug that I introduced with webrev.00 and reverted with
webrev.01. When using the matrix, we always need to do the
update-matrix-stuff, not only when partial GC is in progress. With
traversal, we only need to go into the barrier when the traversal GC is
in progress.

> *) I think we have discussed the RFR for this -- does it turn out to be needed after all?
> 
> --- old/src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp	2018-01-17 15:32:54.135255280 +0100
> +++ new/src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp	2018-01-17 15:32:53.869258796 +0100
> @@ -727,7 +727,7 @@
>     const int referent_offset = java_lang_ref_Reference::referent_offset;
>     guarantee(referent_offset > 0, "referent offset not initialized");
> 
> -  if (UseG1GC || UseShenandoahGC) {
> +  if (UseG1GC || (UseShenandoahGC && ShenandoahKeepAliveBarrier)) {
>       Label slow_path;
>       // rbx: method

Oops. Reverted here:
Diff:
http://cr.openjdk.java.net/~rkennke/traversal/webrev.02.diff/
Full:
http://cr.openjdk.java.net/~rkennke/traversal/webrev.02/

(give it some seconds to upload)

Better?

Roman

From zgu at redhat.com  Wed Jan 17 17:10:41 2018
From: zgu at redhat.com (Zhengyu Gu)
Date: Wed, 17 Jan 2018 12:10:41 -0500
Subject: RFR: Traveral GC heuristics
In-Reply-To: <e119d322-384a-9d30-35ef-c9dd352854c9@redhat.com>
References: <f0715fa7-b31f-8d1c-8015-22d1c64056ab@redhat.com>
 <e119d322-384a-9d30-35ef-c9dd352854c9@redhat.com>
Message-ID: <694e454c-dd0f-ceb5-5258-a89fbd9690a4@redhat.com>

shenandoahOopClosures.hpp:
   Missing string dedup version

shenandoahSupport.cpp
L#615 - 656
L#3537 - 3556
L#3981 - 4056
   indent

sharedRuntime.cpp

  213   assert(oopDesc::is_oop(orig, true /* ignore mark word */), "Error");
  214   // store the original value that was in the field reference
  215 if (UseShenandoahGC) { ShenandoahBarrierSet::enqueue(orig); }
  216 return;
  217   thread->satb_mark_queue().enqueue(orig);
  218 JRT_END

L#216: does not look right. Should it be inside the UseShenandoahGC block?
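
To illustrate the concern: with the `return` outside the braces, the SATB enqueue on the following line can never run, whichever collector is in use. A small standalone sketch of the presumably intended control flow follows; the `Queues` struct and the function name are hypothetical and only model the two enqueue targets.

```cpp
#include <vector>

// Hypothetical model of the two places a pre-value can be enqueued.
struct Queues {
  std::vector<int> shenandoah;  // stands in for ShenandoahBarrierSet::enqueue
  std::vector<int> satb;        // stands in for the thread-local SATB queue
};

// Presumed intent: Shenandoah takes its own path and returns early,
// every other collector falls through to the SATB queue.
inline void enqueue_pre_value(bool use_shenandoah, int orig, Queues& q) {
  if (use_shenandoah) {
    q.shenandoah.push_back(orig);
    return;  // the return belongs inside the block
  }
  q.satb.push_back(orig);
}
```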

Thanks,

-Zhengyu


On 01/17/2018 09:37 AM, Roman Kennke wrote:
> Testing showed up some regressions in non-traversal code and two issues
> that I introduced (or haven't fixed) when the single-flag patch arrived.
>
> The following now passes hotspot_gc_shenandoah tests and runs of specjvm
> with fastdebug with -XX:+ShenandoahVerify
> -XX:ShenandoahGCHeuristics=traversal, with -XX:TieredStopAtLevel=0|1|4
> 
> Differential:
> http://cr.openjdk.java.net/~rkennke/traversal/webrev.01.diff/
> Full:
> http://cr.openjdk.java.net/~rkennke/traversal/webrev.01/
> 
> Please review, test, comment, etc. :-)
> 
> Cheers, Roman
> 
>> This started out as a smallish partial-GC experiment, then turned into a
>> clone of partial GC, and ended up as a standalone GC mode for
>> Shenandoah, which is a frankensteinization of
>> partial+concurrent-marking, with some goodies :-)
>>
>> The idea is to do everything, marking+evacuation+update-refs, in one 
>> single phase. This is not very difficult to do: while traversing, 
>> evacuate objects that are in the Cset, and update references as we go. 
>> I chose to traverse the heap using an incremental-update approach, 
>> mostly because this is what partial GC does, and as said above, this 
>> started out as a clone of partial :-)
>>
>> The tricky part is to choose the Cset: I made it such that each GC 
>> cycle collects liveness information, and bases the decision about Cset 
>> in the next cycle on that liveness information. Yes, this means the 
>> first cycle does not collect anything (except immediate garbage).
>>
>> Advantages:
>> - obviously, touching all live objects only once means less time spent 
>> in GC. Measurements show that traversing the heap and doing everything 
>> is only slightly longer than Shenandoah's marking phase, and this 
>> might actually be because we also need to mark through newly allocated 
>> objects.
>> - Traversal-order evacuation gives us 10x increase in 
>> ordering-sensitive microbenchmark: 
>> https://shipilev.net/jvm-anatomy-park/11-moving-gc-locality/
>>
>> - Simpler barriers: i-u style barriers don't need to load the 
>> pre-value, and can be optimized much better (hoisted out of hot paths, 
>> etc). Some of it is already done in this patch, but there are plenty 
>> of opportunities to make it even better.
>> - Possibly less floating garbage because we trace through newly
>> allocated objects too, and don't treat them as implicitly live.
>> - we don't need a keep-alive-barrier for Reference.get() which means 
>> we keep fewer referents alive just because they happen to be accessed 
>> during GC.
>> - MWF is only a switch away (if I understand MWF correctly): 
>> -XX:+ShenandoahMWF
>> - It does not need RBs in the WB fast-path, because outside of the 
>> single phase, nothing is ever forwarded.
>> - It does not need the membar stuff in the WBs because we turn on/off 
>> the phase during safepoint
>>
>> Disadvantages:
>> - Store-value barrier needs to be a WB, RB is not sufficient. The
>> storeval barrier is there to ensure only to-space values ever get
>> written to fields during update-refs. 3-phase Shenandoah doesn't
>> evacuate during update-refs, and therefore RB is enough. We need WB
>> here. (I believe this is offset by optimization opportunities, see
>> above)
>> - Known I-U problem: mutators can outrun the GC with allocations and
>> keep us from terminating.
>> - It needs barriers for constants (need to check this).
>>
>> Stuff left to do:
>> - Implement sane degeneration: if we hit OOM, we simply restart and go 
>> into full-GC.
>> - Depending on degen: make heuristics adaptive. Currently it requires 
>> manual tweaking of thresholds.
>>
>> Relevant knobs:
>> - ShenandoahGarbageThreshold: regions with more garbage than this go 
>> into the Cset. Notice that this is based on the *previous* cycle, so 
>> we may actually have much more garbage (but not less).
>> - ShenandoahFreeThreshold: start GC when we have less than that much 
>> free heap.
>>
>> I'll not go into all the details for now and give you the code:
>> http://cr.openjdk.java.net/~rkennke/traversal/webrev.00/
>>
>>
>> Roman
> 

From shade at redhat.com  Wed Jan 17 17:54:26 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 17 Jan 2018 18:54:26 +0100
Subject: RFR: Traveral GC heuristics
In-Reply-To: <9e18ba8d-593f-57a7-12cc-99cffe164a88@redhat.com>
References: <f0715fa7-b31f-8d1c-8015-22d1c64056ab@redhat.com>
 <e119d322-384a-9d30-35ef-c9dd352854c9@redhat.com>
 <b5e1b09e-514c-e344-a48c-e1cdf0c0726c@redhat.com>
 <9e18ba8d-593f-57a7-12cc-99cffe164a88@redhat.com>
Message-ID: <c092efbe-d95d-3d0b-d1f2-adcb7c15414c@redhat.com>

On 01/17/2018 04:08 PM, Roman Kennke wrote:
> Full:
> http://cr.openjdk.java.net/~rkennke/traversal/webrev.02/

Exciting!

c1_Runtime1_x86.cpp:

*) Let's rewrite this:

   if (bs->kind() == BarrierSet::ShenandoahBarrierSet && !ShenandoahSATBBarrier &&
!ShenandoahConditionalSATBBarrier && !ShenandoahStoreValEnqueueBarrier) {

into:

    if (bs->kind() == BarrierSet::ShenandoahBarrierSet && !(ShenandoahSATBBarrier ||
ShenandoahConditionalSATBBarrier || ShenandoahStoreValEnqueueBarrier)) {

*) Re:

1644           __ testb(gc_state, ShenandoahHeap::MARKING | ShenandoahHeap::TRAVERSAL);

So, set_concurrent_traversal_in_progress activates the SATB queues, and this is good. Why don't we
set the ShenandoahHeap::MARKING bit to gc_state there, and avoid "| TRAVERSAL" all around the
arch-specific code?


*) shenandoahBarrierSet_x86.cpp: Pushes/pops around the call to g1_write_barrier_pre seem
suspicious. How do we know we need to caller-save rbx, rcx, rdx, c_rarg1? Deserves a comment, maybe?

*) ShenandoahStoreValReadBarrier, ShenandoahStoreValWriteBarrier, ShenandoahStoreValReadBarrier
exclusion tests

*) shenandoahBarrierSet.cpp: branches are the same, which looks like a typo. Should this be a
compound boolean predicate?

  42       if (ALWAYS_ENQUEUE && !oopDesc::is_null(o)) {
  43         ShenandoahBarrierSet::enqueue(o);
  44       } else if (evac) {
  45         ShenandoahBarrierSet::enqueue(o);
  46       }
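
For reference, a small standalone sketch of what folding the two identical branches into one predicate could look like, assuming the current behavior is what is wanted. The function names and the boolean parameters are placeholders for ALWAYS_ENQUEUE, the null check and the evac flag; whether the evac case should also check for null is a separate question that this sketch does not answer.

```cpp
// Shape of the code under review: two branches with the same body.
inline bool should_enqueue_branches(bool always_enqueue, bool is_null, bool evac) {
  if (always_enqueue && !is_null) {
    return true;
  } else if (evac) {
    return true;
  }
  return false;
}

// Equivalent single predicate.
inline bool should_enqueue_compound(bool always_enqueue, bool is_null, bool evac) {
  return (always_enqueue && !is_null) || evac;
}

// Exhaustively compares both forms over all eight input combinations.
inline bool forms_agree() {
  for (int bits = 0; bits < 8; ++bits) {
    bool always_enqueue = (bits & 1) != 0;
    bool is_null = (bits & 2) != 0;
    bool evac = (bits & 4) != 0;
    if (should_enqueue_branches(always_enqueue, is_null, evac) !=
        should_enqueue_compound(always_enqueue, is_null, evac)) {
      return false;
    }
  }
  return true;
}
```

Note that both forms still enqueue a null value when evac is set and the first condition is false, so the rewrite preserves behavior rather than fixing it.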

shenandoahCollectorPolicy.cpp

 *) Stray debugging lines:

1367     // tty->print_cr("CSET regions:");
1376         // r->print_on(tty);

 *) Heuristics need work: I think it runs into the problem that adaptive cset selection solves: it
chooses either too big or too small a cset. I wonder if you can actually reuse that in the traversal
heuristics.

shenandoahConcurrentThread.cpp:

 *) I think you want to introduce ShenandoahHeap::{vmop_entry,entry,op}_traversal family of methods,
and call them, as we do with the rest of VM ops.

 *) This is not needed anymore:

 207   // TODO: Call this properly with Shenandoah*CycleMark
 208   heap->set_used_at_last_gc();

shenandoahHeap.cpp:

 *) As mentioned above, this:

1683 void ShenandoahHeap::set_concurrent_traversal_in_progress(bool in_progress) {
1684   set_gc_state_bit(TRAVERSAL_BITPOS, in_progress);
1685   JavaThread::satb_mark_queue_set().set_active_all_threads(in_progress, !in_progress);
1686   set_evacuation_in_progress_at_safepoint(in_progress);
1687   set_has_forwarded_objects(in_progress);
1688 }

is probably just:

 void ShenandoahHeap::set_concurrent_traversal_in_progress(bool in_progress) {
   set_gc_state_bit(TRAVERSAL_BITPOS, in_progress);
   set_gc_state_bit(MARKING_BITPOS, in_progress);
   set_gc_state_bit(HAS_FORWARDED_OBJECTS_BITPOS, in_progress);
   set_gc_state_bit_at_safepoint(_gc_state.raw_value());
   JavaThread::satb_mark_queue_set().set_active_all_threads(in_progress, !in_progress);
 }
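
A toy model of the suggestion, to make the intent concrete: if entering traversal also raises the MARKING bit, generated barrier code only ever needs to test MARKING, and the "| TRAVERSAL" masks can go away. The bit values and the class below are hypothetical and do not reflect the actual ShenandoahHeap layout. The sketch also assumes that traversal and regular concurrent marking are never active at the same time, since leaving traversal clears MARKING.

```cpp
#include <cstdint>

// Hypothetical bit assignments, for illustration only.
enum GCStateBit : std::uint8_t {
  HAS_FORWARDED = 1 << 0,
  MARKING       = 1 << 1,
  EVACUATION    = 1 << 2,
  TRAVERSAL     = 1 << 3
};

class GCState {
  std::uint8_t _bits = 0;

  void set_bit(std::uint8_t mask, bool on) {
    if (on) {
      _bits = static_cast<std::uint8_t>(_bits | mask);
    } else {
      _bits = static_cast<std::uint8_t>(_bits & ~mask);
    }
  }

 public:
  // Entering traversal raises MARKING and HAS_FORWARDED as well,
  // leaving it clears all three (assumes no overlapping marking cycle).
  void set_traversal_in_progress(bool in_progress) {
    set_bit(TRAVERSAL, in_progress);
    set_bit(MARKING, in_progress);
    set_bit(HAS_FORWARDED, in_progress);
  }

  // What a barrier fast path would check: a single-bit test.
  bool needs_enqueue_barrier() const { return (_bits & MARKING) != 0; }

  bool is_traversal() const { return (_bits & TRAVERSAL) != 0; }
  std::uint8_t raw_value() const { return _bits; }
};
```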

shenandoahVerifier.cpp:

*) Why are we testing for "next" bitmap here? _verify_liveness_complete and the comment seem to
disagree? Comments still mention "partial"?

 void ShenandoahVerifier::verify_after_traversal() {
   verify_at_safepoint(
         "After Traversal",
         _verify_forwarded_none,      // cannot have forwarded objects
         _verify_marked_next,         // bitmaps might be stale, but alloc-after-mark should be well
         _verify_matrix_disable,      // matrix is conservatively consistent
         _verify_cset_none,           // no cset references left after partial
         _verify_liveness_complete,   // no reliable liveness data anymore
         _verify_regions_nocset       // no cset regions, trash regions allowed
  );
 }

shenandoah_globals.hpp:

 *) Comment is duplicated:

 316                                                                             \
 317   diagnostic(bool, ShenandoahStoreValEnqueueBarrier, false,                 \
 318           "Turn on/off enqueuing of oops after write barriers (MWF)")       \
 319                                                                             \
 320   diagnostic(bool, ShenandoahMWF, false,                                    \
 321           "Turn on/off enqueuing of oops after write barriers (MWF)")       \


graphKit.cpp:

 *) So we predicate shenandoah_enqueue_barrier with !ShenandoahMWF here:

4887     if (ShenandoahStoreValEnqueueBarrier && !ShenandoahMWF) {
4888       shenandoah_enqueue_barrier(obj);
4889     }

 ...but not around other uses of ShenandoahStoreValEnqueueBarrier, e.g. in c1_LIRGenerator?

Other C2:

 *) Roland should take a look, but I find it uncomfortable to change do_unswitching,
find_unswitching_candidate with new arguments...

sharedRuntime.cpp:

 *) Bug due to bad indentation and braces?

 215 if (UseShenandoahGC) { ShenandoahBarrierSet::enqueue(orig); }
 216 return;

shenandoahTraversalGC*:

 *) Really, really unfortunate to duplicate a lot from shenandoahConcurrentMark. Maybe we should
massage the codebase so that we could reuse significant chunks of the code?

Thanks,
-Aleksey


From ashipile at redhat.com  Wed Jan 17 18:08:32 2018
From: ashipile at redhat.com (ashipile at redhat.com)
Date: Wed, 17 Jan 2018 18:08:32 +0000
Subject: hg: shenandoah/jdk10: 2 new changesets
Message-ID: <201801171808.w0HI8WD2018594@aojmv0008.oracle.com>

Changeset: fd9724b26fdd
Author:    shade
Date:      2018-01-17 15:37 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk10/rev/fd9724b26fdd

Refactor allocation failure and explicit GC handling

! src/hotspot/share/gc/shared/gcCause.cpp
! src/hotspot/share/gc/shared/gcCause.hpp
! src/hotspot/share/gc/shenandoah/shenandoahCollectorPolicy.cpp
! src/hotspot/share/gc/shenandoah/shenandoahCollectorPolicy.hpp
! src/hotspot/share/gc/shenandoah/shenandoahConcurrentThread.cpp
! src/hotspot/share/gc/shenandoah/shenandoahConcurrentThread.hpp
! src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp
! src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp
! src/hotspot/share/gc/shenandoah/shenandoahHeap.inline.hpp
! src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp

Changeset: 26b9048c042a
Author:    shade
Date:      2018-01-17 16:08 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk10/rev/26b9048c042a

Make degenerated update-refs use region-set cursor to hand over work

! src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp


From rkennke at redhat.com  Wed Jan 17 20:54:19 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 17 Jan 2018 21:54:19 +0100
Subject: RFR: Traveral GC heuristics
In-Reply-To: <694e454c-dd0f-ceb5-5258-a89fbd9690a4@redhat.com>
References: <f0715fa7-b31f-8d1c-8015-22d1c64056ab@redhat.com>
 <e119d322-384a-9d30-35ef-c9dd352854c9@redhat.com>
 <694e454c-dd0f-ceb5-5258-a89fbd9690a4@redhat.com>
Message-ID: <adff3ec9-887e-9464-7db5-57ed40d649f2@redhat.com>

Am 17.01.2018 um 18:10 schrieb Zhengyu Gu:
> shenandoahOopClosures.hpp:
>    Missing string dedup version

I am not sure what needs to be done for strdedup. Add support for it in 
a followup patch?

> shenandoahSupport.cpp
> L#615 - 656
> L#3537 - 3556
> L#3981 - 4056
>    indent

Fixed.

> sharedRuntime.cpp
> 
>   213   assert(oopDesc::is_oop(orig, true /* ignore mark word */), 
> "Error");
>   214   // store the original value that was in the field reference
>   215 if (UseShenandoahGC) { ShenandoahBarrierSet::enqueue(orig); }
>   216 return;
>   217   thread->satb_mark_queue().enqueue(orig);
>   218 JRT_END
>  ?218 JRT_END
> 
> L#216: does not look right. Should it be inside UseShenandoahGC block?

It's not needed and can go away.

You'll find the updated patch in reply to Aleksey's review that I'll 
post shortly (after testing).

Thanks, Roman

> Thanks,
> 
> -Zhengyu
> 
> 
> On 01/17/2018 09:37 AM, Roman Kennke wrote:
>> Testing showed up some regressions in non-traversal code and two 
>> issues that I introduced (or haven't fixed) when single-flag patch 
>> arrived.
>>
>> The following now passes hotspot_gc_shenandoah tests and runs of 
>> specjvm with fastdebug with -XX:+ShenandoahVerify 
>> -XX:ShenandoahGCHeuristics=traversal, with -XX:TieredStopAtLevel=0|1|4
>>
>> Differential:
>> http://cr.openjdk.java.net/~rkennke/traversal/webrev.01.diff/
>> Full:
>> http://cr.openjdk.java.net/~rkennke/traversal/webrev.01/
>>
>> Please review, test, comment, etc. :-)
>>
>> Cheers, Roman
>>
>>> This started out as a smallish partial-GC experiment, then into a 
>>> clone of partial GC, and ended up as a standalone GC mode for 
>>> Shenandoah, which is a frankensteinization of 
>>> partial+concurrent-marking, with some goodies :-)
>>>
>>> The idea is to do everything, marking+evacuation+update-refs, in one 
>>> single phase. This is not very difficult to do: while traversing, 
>>> evacuate objects that are in the Cset, and update references as we 
>>> go. I chose to traverse the heap using an incremental-update 
>>> approach, mostly because this is what partial GC does, and as said 
>>> above, this started out as a clone of partial :-)
>>>
>>> The tricky part is to choose the Cset: I made it such that each GC 
>>> cycle collects liveness information, and bases the decision about 
>>> Cset in the next cycle on that liveness information. Yes, this means 
>>> the first cycle does not collect anything (except immediate garbage).
>>>
>>> Advantages:
>>> - obviously, touching all live objects only once means less time 
>>> spent in GC. Measurements show that traversing the heap and doing 
>>> everything is only slightly longer than Shenandoah's marking phase, 
>>> and this might actually be because we also need to mark through newly 
>>> allocated objects.
>>> - Traversal-order evacuation gives us 10x increase in 
>>> ordering-sensitive microbenchmark: 
>>> https://shipilev.net/jvm-anatomy-park/11-moving-gc-locality/
>>>
>>> - Simpler barriers: i-u style barriers don't need to load the 
>>> pre-value, and can be optimized much better (hoisted out of hot 
>>> paths, etc). Some of it is already done in this patch, but there are 
>>> plenty of opportunities to make it even better.
>>> - Possibly less floating garbage because we trace through newly 
>>> allocated objects too, and don't treat them as implicitly live.
>>> - we don't need a keep-alive-barrier for Reference.get() which means 
>>> we keep fewer referents alive just because they happen to be accessed 
>>> during GC.
>>> - MWF is only a switch away (if I understand MWF correctly): 
>>> -XX:+ShenandoahMWF
>>> - It does not need RBs in the WB fast-path, because outside of the 
>>> single phase, nothing is ever forwarded.
>>> - It does not need the membar stuff in the WBs because we turn on/off 
>>> the phase during safepoint
>>>
>>> Disadvantages:
>>> - Store-value barrier needs to be a WB, RB is not sufficient. The 
>>> storeval barrier is there to ensure only to-space values ever get 
>>> written to fields during update-refs. 3-phase Shenandoah doesn't 
>>> evacuate during update-refs, and therefore RB is enough. We need WB 
>>> here. (I believe this is offset by optimization opportunities, see 
>>> above)
>>> - Known I-U problem: mutators can outrun the GC with allocations and 
>>> let us not terminate.
>>> - It needs barriers for constants (need to check this).
>>>
>>> Stuff left to do:
>>> - Implement sane degeneration: if we hit OOM, we simply restart and 
>>> go into full-GC.
>>> - Depending on degen: make heuristics adaptive. Currently it requires 
>>> manual tweaking of thresholds.
>>>
>>> Relevant knobs:
>>> - ShenandoahGarbageThreshold: regions with more garbage than this go 
>>> into the Cset. Notice that this is based on the *previous* cycle, so 
>>> we may actually have much more garbage (but not less).
>>> - ShenandoahFreeThreshold: start GC when we have less than that much 
>>> free heap.
>>>
>>> I'll not go into all the details for now and give you the code:
>>> http://cr.openjdk.java.net/~rkennke/traversal/webrev.00/
>>>
>>>
>>> Roman
>>


From zgu at redhat.com  Wed Jan 17 20:56:45 2018
From: zgu at redhat.com (Zhengyu Gu)
Date: Wed, 17 Jan 2018 15:56:45 -0500
Subject: RFR: Traveral GC heuristics
In-Reply-To: <adff3ec9-887e-9464-7db5-57ed40d649f2@redhat.com>
References: <f0715fa7-b31f-8d1c-8015-22d1c64056ab@redhat.com>
 <e119d322-384a-9d30-35ef-c9dd352854c9@redhat.com>
 <694e454c-dd0f-ceb5-5258-a89fbd9690a4@redhat.com>
 <adff3ec9-887e-9464-7db5-57ed40d649f2@redhat.com>
Message-ID: <ba97c69f-f90b-994b-bc7a-024c2e9196d0@redhat.com>


On 01/17/2018 03:54 PM, Roman Kennke wrote:
> Am 17.01.2018 um 18:10 schrieb Zhengyu Gu:
>> shenandoahOopClosures.hpp:
>>    Missing string dedup version
> 
> I am not sure what needs to be done for strdedup. Add support for it in 
> a followup patch?

Sure. I can add the support afterward.

Thanks,

-Zhengyu


> 
>> shenandoahSupport.cpp
>> L#615 - 656
>> L#3537 - 3556
>> L#3981 - 4056
>>    indent
> 
> Fixed.
> 
>> sharedRuntime.cpp
>>
>>   213   assert(oopDesc::is_oop(orig, true /* ignore mark word */), 
>> "Error");
>>   214   // store the original value that was in the field reference
>>   215 if (UseShenandoahGC) { ShenandoahBarrierSet::enqueue(orig); }
>>   216 return;
>>   217   thread->satb_mark_queue().enqueue(orig);
>>   218 JRT_END
>>
>> L#216: does not look right. Should it be inside UseShenandoahGC block?
> 
> It's not needed and can go away.
> 
> You'll find the updated patch in reply to Aleksey's review that I'll 
> post shortly (after testing).
> 
> Thanks, Roman
> 
>> Thanks,
>>
>> -Zhengyu
>>
>>
>> On 01/17/2018 09:37 AM, Roman Kennke wrote:
>>> Testing showed up some regressions in non-traversal code and two 
>>> issues that I introduced (or haven't fixed) when single-flag patch 
>>> arrived.
>>>
>>> The following now passes hotspot_gc_shenandoah tests and runs of 
>>> specjvm with fastdebug with -XX:+ShenandoahVerify 
>>> -XX:ShenandoahGCHeuristics=traversal, with -XX:TieredStopAtLevel=0|1|4
>>>
>>> Differential:
>>> http://cr.openjdk.java.net/~rkennke/traversal/webrev.01.diff/
>>> Full:
>>> http://cr.openjdk.java.net/~rkennke/traversal/webrev.01/
>>>
>>> Please review, test, comment, etc. :-)
>>>
>>> Cheers, Roman
>>>
>>>> This started out as a smallish partial-GC experiment, then into a 
>>>> clone of partial GC, and ended up as a standalone GC mode for 
>>>> Shenandoah, which is a frankensteinization of 
>>>> partial+concurrent-marking, with some goodies :-)
>>>>
>>>> The idea is to do everything, marking+evacuation+update-refs, in one 
>>>> single phase. This is not very difficult to do: while traversing, 
>>>> evacuate objects that are in the Cset, and update references as we 
>>>> go. I chose to traverse the heap using an incremental-update 
>>>> approach, mostly because this is what partial GC does, and as said 
>>>> above, this started out as a clone of partial :-)
>>>>
>>>> The tricky part is to choose the Cset: I made it such that each GC 
>>>> cycle collects liveness information, and bases the decision about 
>>>> Cset in the next cycle on that liveness information. Yes, this means 
>>>> the first cycle does not collect anything (except immediate garbage).
>>>>
>>>> Advantages:
>>>> - obviously, touching all live objects only once means less time 
>>>> spent in GC. Measurements show that traversing the heap and doing 
>>>> everything is only slightly longer than Shenandoah's marking phase, 
>>>> and this might actually be because we also need to mark through 
>>>> newly allocated objects.
>>>> - Traversal-order evacuation gives us 10x increase in 
>>>> ordering-sensitive microbenchmark: 
>>>> https://shipilev.net/jvm-anatomy-park/11-moving-gc-locality/
>>>>
>>>> - Simpler barriers: i-u style barriers don't need to load the 
>>>> pre-value, and can be optimized much better (hoisted out of hot 
>>>> paths, etc). Some of it is already done in this patch, but there are 
>>>> plenty of opportunities to make it even better.
>>>> - Possibly less floating garbage because we trace through newly 
>>>> allocated objects too, and don't treat them as implicitly live.
>>>> - we don't need a keep-alive-barrier for Reference.get() which means 
>>>> we keep fewer referents alive just because they happen to be 
>>>> accessed during GC.
>>>> - MWF is only a switch away (if I understand MWF correctly): 
>>>> -XX:+ShenandoahMWF
>>>> - It does not need RBs in the WB fast-path, because outside of the 
>>>> single phase, nothing is ever forwarded.
>>>> - It does not need the membar stuff in the WBs because we turn 
>>>> on/off the phase during safepoint
>>>>
>>>> Disadvantages:
>>>> - Store-value barrier needs to be a WB, RB is not sufficient. The 
>>>> storeval barrier is there to ensure only to-space values ever get 
>>>> written to fields during update-refs. 3-phase Shenandoah doesn't 
>>>> evacuate during update-refs, and therefore RB is enough. We need WB 
>>>> here. (I believe this is offset by optimization opportunities, see 
>>>> above)
>>>> - Known I-U problem: mutators can outrun the GC with allocations and 
>>>> let us not terminate.
>>>> - It needs barriers for constants (need to check this).
>>>>
>>>> Stuff left to do:
>>>> - Implement sane degeneration: if we hit OOM, we simply restart and 
>>>> go into full-GC.
>>>> - Depending on degen: make heuristics adaptive. Currently it 
>>>> requires manual tweaking of thresholds.
>>>>
>>>> Relevant knobs:
>>>> - ShenandoahGarbageThreshold: regions with more garbage than this go 
>>>> into the Cset. Notice that this is based on the *previous* cycle, so 
>>>> we may actually have much more garbage (but not less).
>>>> - ShenandoahFreeThreshold: start GC when we have less than that much 
>>>> free heap.
>>>>
>>>> I'll not go into all the details for now and give you the code:
>>>> http://cr.openjdk.java.net/~rkennke/traversal/webrev.00/
>>>>
>>>>
>>>> Roman
>>>
> 

From rkennke at redhat.com  Wed Jan 17 21:58:52 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 17 Jan 2018 22:58:52 +0100
Subject: RFR: Traveral GC heuristics
In-Reply-To: <c092efbe-d95d-3d0b-d1f2-adcb7c15414c@redhat.com>
References: <f0715fa7-b31f-8d1c-8015-22d1c64056ab@redhat.com>
 <e119d322-384a-9d30-35ef-c9dd352854c9@redhat.com>
 <b5e1b09e-514c-e344-a48c-e1cdf0c0726c@redhat.com>
 <9e18ba8d-593f-57a7-12cc-99cffe164a88@redhat.com>
 <c092efbe-d95d-3d0b-d1f2-adcb7c15414c@redhat.com>
Message-ID: <46b64ff3-d2df-b2da-0805-5fa7cf8593f0@redhat.com>

>> Full:
>> http://cr.openjdk.java.net/~rkennke/traversal/webrev.02/
> 
> Exciting!
> 
> c1_Runtime1_x86.cpp:
> 
> *) Let's rewrite this:
> 
>     if (bs->kind() == BarrierSet::ShenandoahBarrierSet && !ShenandoahSATBBarrier &&
> !ShenandoahConditionalSATBBarrier && !ShenandoahStoreValEnqueueBarrier) {
> 
> into:
> 
>      if (bs->kind() == BarrierSet::ShenandoahBarrierSet && !(ShenandoahSATBBarrier ||
> ShenandoahConditionalSATBBarrier || ShenandoahStoreValEnqueueBarrier)) {

Done.

> *) Re:
> 
> 1644           __ testb(gc_state, ShenandoahHeap::MARKING | ShenandoahHeap::TRAVERSAL);
> 
> So, set_concurrent_traversal_in_progress activates the SATB queues, and this is good. Why don't we
> set the ShenandoahHeap::MARKING bit to gc_state there, and avoid "| TRAVERSAL" all around the
> arch-specific code?

Done.

> *) shenandoahBarrierSet_x86.cpp: Pushes/pops around the call to g1_write_barrier_pre seem
> suspicious. How do we know we need to caller-save rbx, rcx, rdx, c_rarg1? Deserves a comment, maybe?

It's the same set of regs that need to be saved+restored in the write 
barrier, a few lines above.

> *) ShenandoahStoreValReadBarrier, ShenandoahStoreValWriteBarrier, ShenandoahStoreValEnqueueBarrier
> exclusion tests

ShStoreValWB and ShStoreValEnq are not exclusive. I need them both in 
tandem. I added exclusion test for ShStoreValEnq against ShStoreValRB in 
shenandoahCollectorPolicy.cpp, but don't know how to 'encode' that in 
TestSelectiveBarrierFlags.java

> *) shenandoahBarrierSet.cpp: branches are the same, which looks like typo. Should be compound
> boolean predicate?
> 
>    42       if (ALWAYS_ENQUEUE && !oopDesc::is_null(o)) {
>    43         ShenandoahBarrierSet::enqueue(o);
>    44       } else if (evac) {
>    45         ShenandoahBarrierSet::enqueue(o);
>    46       }

Done.
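
For reference, a minimal sketch of what folding the two identical branches
into one compound predicate looks like. The function and parameter names
below are hypothetical stand-ins, not the identifiers from
shenandoahBarrierSet.cpp, and the sketch only shows the literal merge of the
quoted branches; whether the evac case should also test for null is a
separate question that the sketch does not answer.

```cpp
#include <cassert>

// Hypothetical stand-ins for the quoted snippet: each function returns true
// where the original code would call ShenandoahBarrierSet::enqueue(o).

// Original shape: two branches with the same body.
inline bool enqueues_original(bool always_enqueue, bool non_null, bool evac) {
  if (always_enqueue && non_null) {
    return true;   // enqueue(o)
  } else if (evac) {
    return true;   // enqueue(o), same body as above
  }
  return false;
}

// Merged shape: one compound predicate guarding a single call.
inline bool enqueues_merged(bool always_enqueue, bool non_null, bool evac) {
  return (always_enqueue && non_null) || evac;
}
```

Both shapes agree on all eight input combinations, so the merge is purely a
readability change.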

> shenandoahCollectorPolicy.cpp
> 
>   *) Stray debugging lines:
> 
> 1367     // tty->print_cr("CSET regions:");
> 1376         // r->print_on(tty);

Removed.

>   *) Heuristics need work: I think it runs into problem that adaptive cset selection solves: it
> chooses either too big or too small cset. I wonder if you can actually reuse that in traversal
> heuristics

Yes, but we need degenerate GC for traversal first :-)

> shenandoahConcurrentThread.cpp:
> 
>   *) I think you want to introduce ShenandoahHeap::{vmop_entry,entry,op}_traversal family of methods,
> and call them, as we do with the rest of VM ops.

Done.

>   *) This is not needed anymore:
> 
>   207   // TODO: Call this properly with Shenandoah*CycleMark
>   208   heap->set_used_at_last_gc();

Removed.

> shenandoahHeap.cpp:
> 
>   *) As mentioned above, this:
> 
> 1683 void ShenandoahHeap::set_concurrent_traversal_in_progress(bool in_progress) {
> 1684   set_gc_state_bit(TRAVERSAL_BITPOS, in_progress);
> 1685   JavaThread::satb_mark_queue_set().set_active_all_threads(in_progress, !in_progress);
> 1686   set_evacuation_in_progress_at_safepoint(in_progress);
> 1687   set_has_forwarded_objects(in_progress);
> 1688 }
> 
> is probably just:
> 
>   void ShenandoahHeap::set_concurrent_traversal_in_progress(bool in_progress) {
>     set_gc_state_bit(TRAVERSAL_BITPOS, in_progress);
>     set_gc_state_bit(MARKING_BITPOS, in_progress);
>     set_gc_state_bit(HAS_FORWARDED_OBJECTS_BITPOS, in_progress);
>     set_gc_state_bit_at_safepoint(_gc_state.raw_value());
>     JavaThread::satb_mark_queue_set().set_active_all_threads(in_progress, !in_progress);
>   }

Done.
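
To illustrate why setting MARKING together with TRAVERSAL lets the
arch-specific barriers test a single bit, here is a small self-contained
sketch. The bit positions and the GCStateSketch type are assumptions made
for illustration; the real layout is in shenandoahHeap.hpp and may differ,
and the safepoint propagation and SATB queue activation are left out.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative bit positions only; not taken from shenandoahHeap.hpp.
enum GCStateBitPos {
  HAS_FORWARDED_OBJECTS_BITPOS = 0,
  MARKING_BITPOS               = 1,
  TRAVERSAL_BITPOS             = 2
};

// Hypothetical model of the gc_state byte.
struct GCStateSketch {
  uint8_t bits = 0;

  void set_bit(int pos, bool value) {
    uint8_t mask = static_cast<uint8_t>(1u << pos);
    if (value) {
      bits = static_cast<uint8_t>(bits | mask);
    } else {
      bits = static_cast<uint8_t>(bits & ~mask);
    }
  }

  bool is_set(int pos) const { return (bits & (1u << pos)) != 0; }

  // Traversal implies marking and forwarded objects, so all three bits move
  // together.
  void set_concurrent_traversal_in_progress(bool in_progress) {
    set_bit(TRAVERSAL_BITPOS, in_progress);
    set_bit(MARKING_BITPOS, in_progress);
    set_bit(HAS_FORWARDED_OBJECTS_BITPOS, in_progress);
  }

  // What a barrier fast path would check: MARKING alone is sufficient,
  // no "| TRAVERSAL" needed.
  bool barrier_should_enqueue() const { return is_set(MARKING_BITPOS); }
};
```

With this shape, code that already tests MARKING needs no traversal-specific
change, which is the point of the review comment.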

> shenandoahVerifier.cpp:
> 
> *) Why are we testing for "next" bitmap here?

Because traversal uses the next bitmap, and only this, and I don't care 
to swap with complete, but I want to verify it. Good?

> _verify_liveness_complete and the comment seem to
> disagree? Comments still mention "partial"?

Fixed.

> shenandoah_globals.hpp:
> 
>   *) Comment is duplicated:
> 
>   316                                                                             \
>   317   diagnostic(bool, ShenandoahStoreValEnqueueBarrier, false,                 \
>   318           "Turn on/off enqueuing of oops after write barriers (MWF)")       \
>   319                                                                             \
>   320   diagnostic(bool, ShenandoahMWF, false,                                    \
>   321           "Turn on/off enqueuing of oops after write barriers (MWF)")       \
> 
>

Fixed.

> graphKit.cpp:
> 
>   *) So we predicate shenandoah_enqueue_barrier with !ShenandoahMWF here:
> 
> 4887     if (ShenandoahStoreValEnqueueBarrier && !ShenandoahMWF) {
> 4888       shenandoah_enqueue_barrier(obj);
> 4889     }
> 
>   ...but not around other uses of ShenandoahStoreValEnqueueBarrier, e.g. in c1_LIRGenerator?
> 

I only implemented this sketchy MWF thing in C2 for now. This definitely 
needs more work, and I don't even know if it is correct.

> Other C2:
> 
>   *) Roland should take a look, but I find it uncomfortable to change do_unswitching,
> find_unswitching_candidate with new arguments...

This was actually done by Roland to get the new barriers to work and 
optimize well enough.

> sharedRuntime.cpp:
> 
>   *) Bug due to bad indentation and braces?
> 
>   215 if (UseShenandoahGC) { ShenandoahBarrierSet::enqueue(orig); }
>   216 return;

Yeah, this is not needed. I removed it.

> shenandoahTraversalGC*:
> 
>   *) Really, really unfortunate to duplicate a lot from shenandoahConcurrentMark. Maybe we should
> massage the codebase so that we could reuse significant chunks of the code?

Yes, maybe. But for the start, I did not want it to interfere with 
existing code if I can avoid it. For this reason, this looks like a 
copy+paste job from conc-mark and partial for some parts.

Thanks for reviewing and spotting all the issues. I could not really 
make a diff webrev, because I first had to pull -u your latest work, and 
this messed up my differential webrev... sorry. Only full webrev now:

http://cr.openjdk.java.net/~rkennke/traversal/webrev.03/


From zgu at redhat.com  Wed Jan 17 21:59:17 2018
From: zgu at redhat.com (Zhengyu Gu)
Date: Wed, 17 Jan 2018 16:59:17 -0500
Subject: RFR: Hint unused regions instead of uncommit them
In-Reply-To: <9bed336a-34df-5192-24da-db675b22cc45@redhat.com>
References: <537527af-1e46-c834-4f1b-36cd8f148666@redhat.com>
 <9bed336a-34df-5192-24da-db675b22cc45@redhat.com>
Message-ID: <c3d4bade-928b-fa06-95b5-ec54de0d2ac4@redhat.com>

On 01/16/2018 06:49 AM, Aleksey Shipilev wrote:
> On 01/15/2018 06:21 PM, Zhengyu Gu wrote:
>> This patch adds a new experimental flag, ShenandoahIdleRegions (default: false), to hint to the kernel
>> via madvise(MADV_DONTNEED) that the regions are not needed, instead of proactively uncommitting them.
>>
>> It appears that this does have an advantage over uncommitting regions, although not by as much as I had
>> expected.
>>
>> SPECjbb2015:
>>
>> Baseline:
>> RUN RESULT: hbIR (max attempted) = 59167, hbIR (settled) = 51984, max-jOPS = 47925, critical-jOPS =
>> 19108
>>
>> -XX:ShenandoahUncommitDelay=0 -XX:-ShenandoahIdleRegions
>> RUN RESULT: hbIR (max attempted) = 41119, hbIR (settled) = 36501, max-jOPS = 30839, critical-jOPS =
>> 8841
>>
>> -XX:ShenandoahUncommitDelay=0 -XX:+ShenandoahIdleRegions
>> RUN RESULT: hbIR (max attempted) = 49322, hbIR (settled) = 42968, max-jOPS = 35019, critical-jOPS =
>> 9283
>>
>>
>> Webrev: http://cr.openjdk.java.net/~zgu/shenandoah/idle_region/webrev.00/
> 
> As I read MADV_DONTNEED man page and the explanations of different kernel people, I am getting
> uneasy using this. madvise call that basically corrupts memory, say what? And it also does not
> support large pages...
> 
> It _maybe_ makes sense to optionally support this, but only if we make the code changes minimal. It
> looks like a fair bit of complexity comes from the attempt to fall back to commit/uncommit when
> idling fails. Could we just test that idle/activate_memory works, and select one of the options
> without fallback? E.g. when ShenandoahIdleRegions is true, LargePages is false, and idling works,
> make do_commit/do_uncommit only do idle_memory/activate_memory, and fail hard when idle_memory
> returns false. You would not need the _idle_region flag too then.
Okay, made it fatal if the region cannot be idled.

Updated webrev: 
http://cr.openjdk.java.net/~zgu/shenandoah/idle_region/webrev.01/
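
As background for the MADV_DONTNEED concern raised above, the following
Linux-only sketch shows the behavior in question on a private anonymous
mapping: after the hint, the old contents are gone and the page reads back
as zero. This is not the code from the webrev; read_after_dontneed is a
hypothetical helper written only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>
#include <unistd.h>

// Writes to a private anonymous page, hints it away with MADV_DONTNEED and
// returns the value read back afterwards. Returns -1 if a system call fails.
// Linux-specific sketch; other systems may treat the advice differently.
inline int read_after_dontneed() {
  const size_t page = static_cast<size_t>(sysconf(_SC_PAGESIZE));
  void* mem = mmap(nullptr, page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (mem == MAP_FAILED) {
    return -1;
  }
  volatile unsigned char* p = static_cast<unsigned char*>(mem);
  p[0] = 42;                                  // page is now backed and dirty
  if (madvise(mem, page, MADV_DONTNEED) != 0) {
    munmap(mem, page);
    return -1;
  }
  int value = p[0];                           // refaults a zero-filled page
  munmap(mem, page);
  return value;
}
```

The mapping stays valid after the hint, so no re-commit is needed before the
next use, but anything stored in the region beforehand is lost.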

Test:

   hotspot_gc_shenandoah (fastdebug + release)

   Manual test to verify large pages are actually used.

Thanks,

-Zhengyu


> 
> Thanks,
> -Aleksey
> 

From zgu at redhat.com  Thu Jan 18 00:39:34 2018
From: zgu at redhat.com (Zhengyu Gu)
Date: Wed, 17 Jan 2018 19:39:34 -0500
Subject: RFR: Bitmap size might not be page aligned when large page is used
Message-ID: <68242bad-5451-e508-cd83-9d9b71ef2bde@redhat.com>

I discovered this when running tests for idling regions.

#  Out of Memory Error 
(/home/zgu/workspace/shenandoah-jdk10/src/hotspot/os/linux/os_linux.cpp:2598), 
pid=23493, tid=23494
#
# JRE version:  (10.0) (fastdebug build )
# Java VM: OpenJDK 64-Bit Server VM (fastdebug 
10-internal+0-adhoc.zgu.shenandoah-jdk10, mixed mode, aot, tiered, 
compressed oops, Shenandoah gc, linux-amd64)
# Core dump will be written. Default location: 
/home/zgu/workspace/shenandoah-jdk10/test/hotspot/jtreg/gc/arguments/core.%p
#

This bug is only reproducible when large pages are actually used (when 
system has enough large pages)

http://cr.openjdk.java.net/~zgu/shenandoah/bitmap_size_large_page/webrev.00/
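
The webrev is not reproduced here, so the following is only a sketch of the
general idea behind this kind of fix: rounding a size up to the page size in
use before reserving or committing memory. align_up_sketch is a hypothetical
helper, and the 2M large-page size in the usage note is an assumption.

```cpp
#include <cassert>
#include <cstddef>

// Rounds size up to the next multiple of alignment.
// alignment must be a non-zero power of two.
inline size_t align_up_sketch(size_t size, size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (size + alignment - 1) & ~(alignment - 1);
}
```

For example, with an assumed 2M large page, a 5M bitmap would be rounded up
to 6M, while a size that is already a multiple of the page size is left
unchanged.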


Test:

   hotspot_gc_shenandoah (fastdebug + release)


Thanks,

-Zhengyu


From shade at redhat.com  Thu Jan 18 08:58:32 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Thu, 18 Jan 2018 09:58:32 +0100
Subject: RFR: Bitmap size might not be page aligned when large page is used
In-Reply-To: <68242bad-5451-e508-cd83-9d9b71ef2bde@redhat.com>
References: <68242bad-5451-e508-cd83-9d9b71ef2bde@redhat.com>
Message-ID: <15adbdff-f98e-3bc4-a01c-ad42e77446e4@redhat.com>

On 01/18/2018 01:39 AM, Zhengyu Gu wrote:
> I discovered this when running tests for idling regions.
> 
> #  Out of Memory Error
> (/home/zgu/workspace/shenandoah-jdk10/src/hotspot/os/linux/os_linux.cpp:2598), pid=23493, tid=23494
> #
> # JRE version:  (10.0) (fastdebug build )
> # Java VM: OpenJDK 64-Bit Server VM (fastdebug 10-internal+0-adhoc.zgu.shenandoah-jdk10, mixed mode,
> aot, tiered, compressed oops, Shenandoah gc, linux-amd64)
> # Core dump will be written. Default location:
> /home/zgu/workspace/shenandoah-jdk10/test/hotspot/jtreg/gc/arguments/core.%p
> #
> 
> This bug is only reproducible when large pages are actually used (when system has enough large pages)
> 
> http://cr.openjdk.java.net/~zgu/shenandoah/bitmap_size_large_page/webrev.00/

D'uh. Looks good!

-Aleksey


From rwestrel at redhat.com  Thu Jan 18 13:20:24 2018
From: rwestrel at redhat.com (Roland Westrelin)
Date: Thu, 18 Jan 2018 14:20:24 +0100
Subject: RFR: Traveral GC heuristics
In-Reply-To: <46b64ff3-d2df-b2da-0805-5fa7cf8593f0@redhat.com>
References: <f0715fa7-b31f-8d1c-8015-22d1c64056ab@redhat.com>
 <e119d322-384a-9d30-35ef-c9dd352854c9@redhat.com>
 <b5e1b09e-514c-e344-a48c-e1cdf0c0726c@redhat.com>
 <9e18ba8d-593f-57a7-12cc-99cffe164a88@redhat.com>
 <c092efbe-d95d-3d0b-d1f2-adcb7c15414c@redhat.com>
 <46b64ff3-d2df-b2da-0805-5fa7cf8593f0@redhat.com>
Message-ID: <dk6po67wbfr.fsf@rwestrel.remote.csb>


>> Other C2:
>> 
>>   *) Roland should take a look, but I find it uncomfortable to change do_unswitching,
>> find_unswitching_candidate with new arguments...
>
> This was actually done by Roland to get the new barriers to work and 
> optimize well enough.

C2 stuff is ok.

Roland.

From zgu at redhat.com  Thu Jan 18 13:48:46 2018
From: zgu at redhat.com (zgu at redhat.com)
Date: Thu, 18 Jan 2018 13:48:46 +0000
Subject: hg: shenandoah/jdk10: Bitmap size might not be page aligned when
 large page is used
Message-ID: <201801181348.w0IDmkTh010708@aojmv0008.oracle.com>

Changeset: 1a6a9f288dd2
Author:    zgu
Date:      2018-01-18 08:23 -0500
URL:       http://hg.openjdk.java.net/shenandoah/jdk10/rev/1a6a9f288dd2

Bitmap size might not be page aligned when large page is used

! src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp


From shade at redhat.com  Thu Jan 18 15:18:15 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Thu, 18 Jan 2018 16:18:15 +0100
Subject: Degenerated GC
Message-ID: <e12ebc32-eb0d-355f-1fe5-74df60622135@redhat.com>

http://cr.openjdk.java.net/~shade/shenandoah/degenerated-gc/webrev.01/

This patch implements Degenerate GC, a better way to handle allocation failures. We have pushed
bits and pieces of the infrastructure it needs over the past few weeks.

Our current scheme roughly approximates the same thing: if allocation failure is raised during the
concurrent mark or concurrent update-refs, we immediately STW and complete the phase under the
pause. There are major caveats in that scheme though: it only works reliably for the phases that
have final-STWs, it complicates the control code significantly, and it tries to continue the
concurrent cycle afterwards, even though we know something is fishy.

Degenerate GC is basically the STW continuation of the concurrent cycle. When concurrent cycle
degenerates, we invoke a single VM operation ("dive into STW"), and complete the same cycle there.
In most cases, we degenerate at the end of concurrent cycle when the majority of work is already done.

If Degenerate GC experiences the second allocation failure during that STW cycle (e.g. during evac),
it upgrades to Full GC. It stands to reason that Degenerate GC is cheaper than Full GC, but here is
how they compare most of the time:

# Degenerated at evacuation, upgraded to Full GC:
[46.755s][info][gc] GC(109) Cancelling concurrent GC: Allocation Failure
[46.755s][info][gc] GC(109) Cannot finish degeneration, upgrading to Full GC
[46.994s][info][gc] GC(109) Pause Degenerated GC (Evacuation) 4054M->527M(4096M) 239.331ms

# Degenerated at update-refs
[52.145s][info][gc] Cancelling concurrent GC: Allocation Failure
[52.147s][info][gc] GC(123) Concurrent update references 3360M->3946M(4096M) 218.713ms
[52.177s][info][gc] GC(124) Pause Degenerated GC (Update Refs) 3946M->1725M(4096M) 20.201ms

So, degeneration can be seen as the softer graceful degradation step before full-stop full-heap
full-moving Full GC.
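
The escalation described above can be summarized as a small decision sketch.
The enum and function names are hypothetical and only model the policy as
stated in this mail (concurrent cycle, then STW continuation, then Full GC);
they are not the control code from the webrev.

```cpp
#include <cassert>

// Hypothetical outcome of one GC cycle.
enum class Outcome { Concurrent, Degenerated, Full };

// concurrent_alloc_failure: an allocation failure cancelled the concurrent
//                           cycle, so it continues under STW.
// stw_alloc_failure:        a second failure happened during that STW
//                           continuation (e.g. during evacuation).
inline Outcome finish_cycle(bool concurrent_alloc_failure,
                            bool stw_alloc_failure) {
  if (!concurrent_alloc_failure) {
    return Outcome::Concurrent;    // cycle completed concurrently
  }
  if (!stw_alloc_failure) {
    return Outcome::Degenerated;   // same cycle, finished under the pause
  }
  return Outcome::Full;            // cannot finish degeneration, upgrade
}
```

In the two log excerpts above, the first corresponds to the Full outcome and
the second to the Degenerated outcome.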

Degenerate GC brings several major improvements over our usual degenerate scheme:

 a) When allocation failure is raised, we stop *all* threads, not just that allocator thread. This
makes sense because it is very likely that other threads would experience the allocation failure
shortly. This is our failure mode, and GC log would register the GC pause that would correlate with
the actual stalls experienced by application threads.

 b) When degenerate STW is running, it uses ParallelGCThreads count, completing the cycle as fast as
it possibly can. Otherwise, if we degenerated the concurrent cycle, most mutator threads would
probably be stuck waiting for allocation to succeed, but the concurrent cycle would still run with
ConcGCThreads (which is realistically lower than ParallelGCThreads), wasting precious wall time.

 c) It handles out-of-cycle allocation failure. When ShConcurrentThread cannot catch up with issuing
the GC cycles fast enough, or when the heuristics misses the allocation spike, our current code just
Full GCs. Current change runs the Degenerate GC, in hope that mark would identify enough immediate
garbage to proceed with the cycle. (This would get better once we give the GC a stash of "reserved"
regions for evacuation!)

 d) It allows easier future handling of partial, traversal, and evac degeneration: we are already at
STW, and we can do whatever at that point.


Degenerate GC seems to improve the survivability on densely populated heaps. This could be modeled
roughly by having a normal heavily-allocating and heavily-threaded workload with a very tight heap.
Current gc+stats would tell that most allocation failures are handled by Degenerated GCs then:

-Xmx16g

[140.227s][info][gc,stats]   48 successful concurrent GCs
[140.227s][info][gc,stats]      0 invoked explicitly
[140.227s][info][gc,stats]
[140.227s][info][gc,stats]    2 Degenerated GCs
[140.227s][info][gc,stats]      2 caused by allocation failure
[140.227s][info][gc,stats]      0 upgraded to Full GC
[140.227s][info][gc,stats]
[140.227s][info][gc,stats]    0 Full GCs
[140.227s][info][gc,stats]      0 invoked explicitly
[140.227s][info][gc,stats]      0 caused by allocation failure
[140.227s][info][gc,stats]      0 upgraded from Degenerated GC

-Xmx2g

[197.491s][info][gc,stats]  379 successful concurrent GCs
[197.491s][info][gc,stats]      0 invoked explicitly
[197.491s][info][gc,stats]
[197.491s][info][gc,stats]  120 Degenerated GCs
[197.491s][info][gc,stats]    120 caused by allocation failure
[197.491s][info][gc,stats]     47 upgraded to Full GC
[197.491s][info][gc,stats]
[197.491s][info][gc,stats]   49 Full GCs
[197.491s][info][gc,stats]      0 invoked explicitly
[197.491s][info][gc,stats]      2 caused by allocation failure
[197.491s][info][gc,stats]     47 upgraded from Degenerated GC

(Full GC upgrades are from evac OOME-s, and alloc-failure Full GCs are the heuristics chickening out
from multiple back-to-back Degenerated GCs into Full GC).

Still fully testing it, but early reviews are welcome.

Testing: hotspot_gc_shenandoah, benchmarks

Thanks,
-Aleksey


From shade at redhat.com  Thu Jan 18 19:51:34 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Thu, 18 Jan 2018 20:51:34 +0100
Subject: Degenerated GC
In-Reply-To: <e12ebc32-eb0d-355f-1fe5-74df60622135@redhat.com>
References: <e12ebc32-eb0d-355f-1fe5-74df60622135@redhat.com>
Message-ID: <b9c6364e-2fb1-500a-d024-90470a5643ba@redhat.com>

On 01/18/2018 04:18 PM, Aleksey Shipilev wrote:
> http://cr.openjdk.java.net/~shade/shenandoah/degenerated-gc/webrev.01/

Amped up alloc-failure injection, and that exposed a few bugs. Fixed them:
  http://cr.openjdk.java.net/~shade/shenandoah/degenerated-gc/webrev.02/

GCBasher runs for half an hour now without problems. Running further...

-Aleksey


From cflood at redhat.com  Thu Jan 18 23:21:00 2018
From: cflood at redhat.com (Christine Flood)
Date: Thu, 18 Jan 2018 18:21:00 -0500
Subject: RFR: Traveral GC heuristics
In-Reply-To: <dk6po67wbfr.fsf@rwestrel.remote.csb>
References: <f0715fa7-b31f-8d1c-8015-22d1c64056ab@redhat.com>
 <e119d322-384a-9d30-35ef-c9dd352854c9@redhat.com>
 <b5e1b09e-514c-e344-a48c-e1cdf0c0726c@redhat.com>
 <9e18ba8d-593f-57a7-12cc-99cffe164a88@redhat.com>
 <c092efbe-d95d-3d0b-d1f2-adcb7c15414c@redhat.com>
 <46b64ff3-d2df-b2da-0805-5fa7cf8593f0@redhat.com>
 <dk6po67wbfr.fsf@rwestrel.remote.csb>
Message-ID: <CALKUemzXLwU86fEmZ-WckZn-_8K0xodFYPDO6WiGKKfzBLP6wQ@mail.gmail.com>

Can we at least include a number of comments that we are using SATB
queues for convenience but this isn't using an SATB algorithm.
Otherwise future developers will curse us for misleading them.

Is there some way to come up with a common abstraction for partial gc
and traversal gc so we don't have to have all those duplicate timings?

You have the MWF flag, but I don't see the implementation.  You need
something in ShenandoahBarrierSet to see if the object being written
to was allocated after TAMS and if so, both the object and the field
need to be marked.

Christine


On Thu, Jan 18, 2018 at 8:20 AM, Roland Westrelin <rwestrel at redhat.com> wrote:
>
>>> Other C2:
>>>
>>>   *) Roland should take a look, but I find it uncomfortable to change do_unswitching,
>>> find_unswitching_candidate with new arguments...
>>
>> This was actually done by Roland to get the new barriers to work and
>> optimize well enough.
>
> C2 stuff is ok.
>
> Roland.

From shade at redhat.com  Fri Jan 19 07:55:46 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Fri, 19 Jan 2018 08:55:46 +0100
Subject: Degenerated GC
In-Reply-To: <b9c6364e-2fb1-500a-d024-90470a5643ba@redhat.com>
References: <e12ebc32-eb0d-355f-1fe5-74df60622135@redhat.com>
 <b9c6364e-2fb1-500a-d024-90470a5643ba@redhat.com>
Message-ID: <e3bba57c-dee1-2176-3142-e69e92743ebe@redhat.com>

On 01/18/2018 08:51 PM, Aleksey Shipilev wrote:
> On 01/18/2018 04:18 PM, Aleksey Shipilev wrote:
>> http://cr.openjdk.java.net/~shade/shenandoah/degenerated-gc/webrev.01/
> 
> Amped up alloc-failure injection, and that exposed a few bugs. Fixed them:
>   http://cr.openjdk.java.net/~shade/shenandoah/degenerated-gc/webrev.02/
> 
> GCBasher runs for half an hour now without problems. Running further...

8-hour GCBasher passes with:

$ -Xmx1g -XX:+UseShenandoahGC -XX:+UnlockDiagnosticVMOptions -XX:+UnlockExperimentalVMOptions
-XX:ShenandoahGCHeuristics=aggressive -XX:+ShenandoahDegenerateALot TestGCBasherWithShenandoah 28800000

[27665.812s][info][gc,stats       ] 85556 successful concurrent GCs
[27665.812s][info][gc,stats       ]      0 invoked explicitly
[27665.812s][info][gc,stats       ]
[27665.812s][info][gc,stats       ] 44995 Degenerated GCs
[27665.812s][info][gc,stats       ]   44995 caused by allocation failure
[27665.812s][info][gc,stats       ]   8628 upgraded to Full GC
[27665.812s][info][gc,stats       ]
[27665.812s][info][gc,stats       ] 8758 Full GCs
[27665.812s][info][gc,stats       ]      0 invoked explicitly
[27665.812s][info][gc,stats       ]    130 caused by allocation failure
[27665.812s][info][gc,stats       ]   8628 upgraded from Degenerated GC

So, I am pretty sure it works :)

-Aleksey


From shade at redhat.com  Fri Jan 19 09:10:04 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Fri, 19 Jan 2018 10:10:04 +0100
Subject: RFC: Pick up 9.0.4 to sh/jdk9
Message-ID: <575a7f9c-6ee3-cae9-1b17-a5ae9c871ad3@redhat.com>

Upstream jdk-updates/jdk9u has pushed the changesets for 9.0.4:
  http://hg.openjdk.java.net/jdk-updates/jdk9u

Let's pick them up! A few trivial merges were needed.

Testing: hotspot_gc_shenandoah {fastdebug|release}

Thanks,
-Aleksey


From shade at redhat.com  Fri Jan 19 10:05:59 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Fri, 19 Jan 2018 11:05:59 +0100
Subject: RFR: Hint unused regions instead of uncommit them
In-Reply-To: <c3d4bade-928b-fa06-95b5-ec54de0d2ac4@redhat.com>
References: <537527af-1e46-c834-4f1b-36cd8f148666@redhat.com>
 <9bed336a-34df-5192-24da-db675b22cc45@redhat.com>
 <c3d4bade-928b-fa06-95b5-ec54de0d2ac4@redhat.com>
Message-ID: <8998368d-e6c1-2218-7ef5-5d9fe152c9c9@redhat.com>

On 01/17/2018 10:59 PM, Zhengyu Gu wrote:
> Updated webrev: http://cr.openjdk.java.net/~zgu/shenandoah/idle_region/webrev.01/

All right, good!

-Aleksey


From shade at redhat.com  Fri Jan 19 10:24:40 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Fri, 19 Jan 2018 11:24:40 +0100
Subject: RFR: Traveral GC heuristics
In-Reply-To: <46b64ff3-d2df-b2da-0805-5fa7cf8593f0@redhat.com>
References: <f0715fa7-b31f-8d1c-8015-22d1c64056ab@redhat.com>
 <e119d322-384a-9d30-35ef-c9dd352854c9@redhat.com>
 <b5e1b09e-514c-e344-a48c-e1cdf0c0726c@redhat.com>
 <9e18ba8d-593f-57a7-12cc-99cffe164a88@redhat.com>
 <c092efbe-d95d-3d0b-d1f2-adcb7c15414c@redhat.com>
 <46b64ff3-d2df-b2da-0805-5fa7cf8593f0@redhat.com>
Message-ID: <5d9ea546-927b-28eb-0dd8-1ee8f4192862@redhat.com>

On 01/17/2018 10:58 PM, Roman Kennke wrote:
> Yes, maybe. But for the start, I did not want it to interfere with existing code if I can avoid
> it. For this reason, this looks like a copy+paste job from conc-mark and partial for some parts.

Okay, but please plan to common these things right away. We cannot have two copy-pasted 1000+ LOC
blocks and hope for the best ;)

> Thanks for reviewing and spotting all the issues. I could not really make a diff webrev, because I
> first had to pull -u your latest work, and this messed up my differential webrev... sorry. Only full
> webrev now:
> 
> http://cr.openjdk.java.net/~rkennke/traversal/webrev.03/

Sorry to be a PITA about this, but the change is quite large, and I think we want to be more
forward-looking to backports and stability.

Another sweep through the code:

*) GCCause::to_string misses the to_string case for _shenandoah_traversal_gc?

*) So, wait. SBS::interpreter_write_barrier_impl caller-saves registers when they are not equal to
dst. New code in SBS::interpreter_storeval_barrier just does it unconditionally. Is WB too cautious,
or is SVB too lax about this?

*) I think with minimal changes, we can make ShenandoahStoreValEnqueueBarrier exclusive, which will
make testing much easier (encoding this in TestSelectiveBarriers would be trivial). E.g. say:

   if (UseShenandoahGC) {
     if (ShenandoahStoreValWriteBarrier || ShenandoahStoreValEnqueueBarrier) {
       // perform WB
     }
     if (ShenandoahStoreValEnqueueBarrier) {
       // enqueue
     }
     if (ShenandoahStoreValReadBarrier) {
       // RB
     }
   }

*) Minor nit: please indent second arguments like this:

     FLAG_SET_DEFAULT(UseShenandoahMatrix,              false);
     FLAG_SET_DEFAULT(ShenandoahSATBBarrier,            false);
     FLAG_SET_DEFAULT(ShenandoahConditionalSATBBarrier, false);
     FLAG_SET_DEFAULT(ShenandoahStoreValReadBarrier,    false);
     FLAG_SET_DEFAULT(ShenandoahStoreValWriteBarrier,   true);
     FLAG_SET_DEFAULT(ShenandoahStoreValEnqueueBarrier, true);
     FLAG_SET_DEFAULT(ShenandoahKeepAliveBarrier,       false);
     FLAG_SET_DEFAULT(ShenandoahAsmWB,                  true);
     FLAG_SET_DEFAULT(ShenandoahBarriersForConst,       true);
     FLAG_SET_DEFAULT(ShenandoahWBWithMemBar,           false);
     FLAG_SET_DEFAULT(ShenandoahWriteBarrierRB,         false);

*) shenandoahOopClosures.hpp, indenting is a bit off here:

 240       _thread(Thread::current()), _queue(q) {}

...

 273   virtual bool do_metadata() { return true; }

*) I wonder if we want to pull out ShenandoahWBWithMemBar changes into a separate changeset? This
looks potentially backportable, and usable outside of Traversal GC.
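
To spell out the StoreVal flag suggestion from the sweep above, here is a toy Java model of the proposed decision logic. It is only a sketch under my reading of "exclusive" (enabling the enqueue flag alone is enough, because it implies the WB); the class, method, and string labels are hypothetical and this is not HotSpot code.

```java
import java.util.ArrayList;
import java.util.List;

class StoreValBarrierSketch {
    // Mirrors the if-chain proposed above: the enqueue flag implies the
    // write barrier, so each flag can be tested on its own.
    static List<String> barriers(boolean useShenandoahGC,
                                 boolean storeValWrite,
                                 boolean storeValEnqueue,
                                 boolean storeValRead) {
        List<String> out = new ArrayList<>();
        if (useShenandoahGC) {
            if (storeValWrite || storeValEnqueue) {
                out.add("WB");
            }
            if (storeValEnqueue) {
                out.add("ENQUEUE");
            }
            if (storeValRead) {
                out.add("RB");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Only the enqueue flag is set; the write barrier still gets emitted.
        System.out.println(barriers(true, false, true, false));
    }
}
```

With this shape, a test that turns on at most one flag per group still covers the enqueue path without having to enable the WB flag alongside it.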

Thanks,
-Aleksey


From rkennke at redhat.com  Fri Jan 19 10:43:23 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Fri, 19 Jan 2018 11:43:23 +0100
Subject: Degenerated GC
In-Reply-To: <b9c6364e-2fb1-500a-d024-90470a5643ba@redhat.com>
References: <e12ebc32-eb0d-355f-1fe5-74df60622135@redhat.com>
 <b9c6364e-2fb1-500a-d024-90470a5643ba@redhat.com>
Message-ID: <b4dbc34e-e242-b44e-04c1-0e8600ccdca4@redhat.com>

Am 18.01.2018 um 20:51 schrieb Aleksey Shipilev:
> On 01/18/2018 04:18 PM, Aleksey Shipilev wrote:
>> http://cr.openjdk.java.net/~shade/shenandoah/degenerated-gc/webrev.01/
> 
> Amped up alloc-failure injection, and that exposed a few bugs. Fixed them:
>    http://cr.openjdk.java.net/~shade/shenandoah/degenerated-gc/webrev.02/
> 
> GCBasher runs for half an hour now without problems. Running further...
> 
> -Aleksey
> 

I have no complaints about it. I like it!

Roman


From shade at redhat.com  Fri Jan 19 10:52:11 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Fri, 19 Jan 2018 11:52:11 +0100
Subject: RFR: Demote warning message about OOM-during-evac to informational
Message-ID: <a4154e11-2b93-154b-0b15-ef8df4089ca7@redhat.com>

Let's finally do this:

diff -r 8e52377a090e src/hotspot/share/gc/shenandoah/shenandoahConcurrentThread.cpp
--- a/src/hotspot/share/gc/shenandoah/shenandoahConcurrentThread.cpp	Fri Jan 19 11:38:51 2018 +0100
+++ b/src/hotspot/share/gc/shenandoah/shenandoahConcurrentThread.cpp	Fri Jan 19 11:50:16 2018 +0100
@@ -396,7 +396,9 @@
   if ((! Thread::current()->is_GC_task_thread()) && (! Thread::current()->is_ConcurrentGC_thread())) {
     assert(! Threads_lock->owned_by_self()
            || SafepointSynchronize::is_at_safepoint(), "must not hold Threads_lock here");
-    log_info(gc)("%s. Let Java thread wait until evacuation finishes.", GCCause::to_string(GCCause::_shenandoah_allocation_failure_evac));
+    log_info(gc)("%s. Thread \"%s\" waits until evacuation finishes.",
+                 GCCause::to_string(GCCause::_shenandoah_allocation_failure_evac),
+                 Thread::current()->name());
     while (heap->is_evacuation_in_progress()) { // wait.
       Thread::current()->_ParkEvent->park(1);
     }

User has nothing to do with that warning, and it is non-user-actionable. So, no point in putting
scary messages in the GC log. It now prints:

[info][gc] GC(63) Concurrent cleanup 611M->611M(1024M) 0.202ms
[info][gc] GC(63) Cancelling concurrent GC: Allocation Failure During Evac
[info][gc] Allocation Failure During Evac. Thread "MyShinyThread" waits until evacuation finishes.
[info][gc] GC(63) Concurrent evacuation 612M->994M(1024M) 315.488ms
[info][gc] GC(64) Pause Full (Allocation Failure) 994M->541M(1024M) 312.493ms

Testing: hotspot_fast_gc_shenandoah

Thanks,
-Aleksey


From rkennke at redhat.com  Fri Jan 19 10:52:53 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Fri, 19 Jan 2018 11:52:53 +0100
Subject: RFR: Traveral GC heuristics
In-Reply-To: <CALKUemzXLwU86fEmZ-WckZn-_8K0xodFYPDO6WiGKKfzBLP6wQ@mail.gmail.com>
References: <f0715fa7-b31f-8d1c-8015-22d1c64056ab@redhat.com>
 <e119d322-384a-9d30-35ef-c9dd352854c9@redhat.com>
 <b5e1b09e-514c-e344-a48c-e1cdf0c0726c@redhat.com>
 <9e18ba8d-593f-57a7-12cc-99cffe164a88@redhat.com>
 <c092efbe-d95d-3d0b-d1f2-adcb7c15414c@redhat.com>
 <46b64ff3-d2df-b2da-0805-5fa7cf8593f0@redhat.com>
 <dk6po67wbfr.fsf@rwestrel.remote.csb>
 <CALKUemzXLwU86fEmZ-WckZn-_8K0xodFYPDO6WiGKKfzBLP6wQ@mail.gmail.com>
Message-ID: <0f37e367-1c6d-c694-a6f0-91bcb380012a@redhat.com>

Am 19.01.2018 um 00:21 schrieb Christine Flood:
> Can we at least include a number of comments that we are using SATB
> queues for convenience but this isn't using an SATB algorithm.
> Otherwise future developers will curse us for misleading them.

I've added this note on top of shenandoahTraversalGC.hpp:

/**
  * NOTE: We are using the SATB buffer in thread.hpp and satbMarkQueue.hpp, however, it is not an
  * SATB algorithm. We're using the buffer as generic oop buffer to enqueue new values in
  * concurrent oop stores, IOW, the algorithm is incremental-update-based.
  */
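
For readers who have not met the distinction: in the usual textbook sense, an SATB store barrier remembers the value being overwritten, while an incremental-update barrier remembers the value being installed. The following toy Java model shows just that difference. It is a sketch, not the Shenandoah barrier code; all names in it are hypothetical, and the queue merely stands in for the per-thread buffer mentioned in the note.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class StoreBarrierSketch {
    static final class Node {
        final String name;
        Node field;
        Node(String name) { this.name = name; }
    }

    // Stand-in for the per-thread buffer; in this toy it is a plain queue.
    final Deque<Node> buffer = new ArrayDeque<>();

    // SATB flavor: enqueue the previous value before it is overwritten.
    void satbStore(Node holder, Node newValue) {
        Node previous = holder.field;
        if (previous != null) {
            buffer.add(previous);
        }
        holder.field = newValue;
    }

    // Incremental-update flavor: enqueue the new value being stored.
    void incrementalUpdateStore(Node holder, Node newValue) {
        if (newValue != null) {
            buffer.add(newValue);
        }
        holder.field = newValue;
    }

    public static void main(String[] args) {
        Node holder = new Node("holder");
        Node a = new Node("a");
        Node b = new Node("b");

        holder.field = a;
        StoreBarrierSketch satb = new StoreBarrierSketch();
        satb.satbStore(holder, b);
        System.out.println("SATB enqueued: " + satb.buffer.peek().name);

        holder.field = a;
        StoreBarrierSketch iu = new StoreBarrierSketch();
        iu.incrementalUpdateStore(holder, b);
        System.out.println("Incremental update enqueued: " + iu.buffer.peek().name);
    }
}
```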

> Is there some way to come up with a common abstraction for partial gc
> and traversal gc so we don't have to have all those duplicate timings?

Aleksey also noted this with regards to conc-mark. I wanted it to not 
impact existing code for the start. How about I look into refactoring 
and commoning the code after the initial change is in and has had some 
testing and play time?

> You have the MWF flag, but I don't see the implementation.  You need
> something in ShenandoahBarrierSet to see if the object being written
> to was allocated after TAMS and if so, both the object and the field
> need to be marked.

I've only implemented it in C2. It's not checking TAMS (because I don't 
really maintain a usable TAMS) but instead enqueues the target object 
unconditionally. I've probably not understood MWF correctly? Should I 
rip it out and put it back in later, hopefully correct then?

I will post a revised changeset with the above comment added later in 
this thread.

Roman

From rkennke at redhat.com  Fri Jan 19 10:54:40 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Fri, 19 Jan 2018 11:54:40 +0100
Subject: RFR: Demote warning message about OOM-during-evac to informational
In-Reply-To: <a4154e11-2b93-154b-0b15-ef8df4089ca7@redhat.com>
References: <a4154e11-2b93-154b-0b15-ef8df4089ca7@redhat.com>
Message-ID: <c3d718de-81d8-903c-f78c-15ad4fd74927@redhat.com>

Am 19.01.2018 um 11:52 schrieb Aleksey Shipilev:
> Let's finally do this:
> 
> diff -r 8e52377a090e src/hotspot/share/gc/shenandoah/shenandoahConcurrentThread.cpp
> --- a/src/hotspot/share/gc/shenandoah/shenandoahConcurrentThread.cpp	Fri Jan 19 11:38:51 2018 +0100
> +++ b/src/hotspot/share/gc/shenandoah/shenandoahConcurrentThread.cpp	Fri Jan 19 11:50:16 2018 +0100
> @@ -396,7 +396,9 @@
>     if ((! Thread::current()->is_GC_task_thread()) && (! Thread::current()->is_ConcurrentGC_thread())) {
>       assert(! Threads_lock->owned_by_self()
>              || SafepointSynchronize::is_at_safepoint(), "must not hold Threads_lock here");
> -    log_info(gc)("%s. Let Java thread wait until evacuation finishes.", GCCause::to_string(GCCause::_shenandoah_allocation_failure_evac));
> +    log_info(gc)("%s. Thread \"%s\" waits until evacuation finishes.",
> +                 GCCause::to_string(GCCause::_shenandoah_allocation_failure_evac),
> +                 Thread::current()->name());
>       while (heap->is_evacuation_in_progress()) { // wait.
>         Thread::current()->_ParkEvent->park(1);
>       }
> 
> User has nothing to do with that warning, and it is non-user-actionable. So, no point in putting
> scary messages in the GC log. It now prints:
> 
> [info][gc] GC(63) Concurrent cleanup 611M->611M(1024M) 0.202ms
> [info][gc] GC(63) Cancelling concurrent GC: Allocation Failure During Evac
> [info][gc] Allocation Failure During Evac. Thread "MyShinyThread" waits until evacuation finishes.
> [info][gc] GC(63) Concurrent evacuation 612M->994M(1024M) 315.488ms
> [info][gc] GC(64) Pause Full (Allocation Failure) 994M->541M(1024M) 312.493ms
> 
> Testing: hotspot_fast_gc_shenandoah
> 
> Thanks,
> -Aleksey
> 

Yeah ok.

Do we still have any desire to fix this for real? I've pursued a couple 
of possible implementations, and all of them seemed overly complex or 
performance-impacting...

Roman


From ashipile at redhat.com  Fri Jan 19 11:00:52 2018
From: ashipile at redhat.com (ashipile at redhat.com)
Date: Fri, 19 Jan 2018 11:00:52 +0000
Subject: hg: shenandoah/jdk10: Demote warning message about OOM-during-evac to
 informational
Message-ID: <201801191100.w0JB0qmZ003346@aojmv0008.oracle.com>

Changeset: 12654193e434
Author:    shade
Date:      2018-01-19 11:52 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk10/rev/12654193e434

Demote warning message about OOM-during-evac to informational

! src/hotspot/share/gc/shenandoah/shenandoahConcurrentThread.cpp


From shade at redhat.com  Fri Jan 19 14:15:56 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Fri, 19 Jan 2018 15:15:56 +0100
Subject: RFR: TestSelectiveBarrierFlags should accept multi-element flag
 selections
Message-ID: <2658f693-856c-418e-f6a9-4ed521abe200@redhat.com>

Roman's test changes need this: it fixes the bug that breaks the test when more than 2 flags per
group are present, and also rewrites the selection loop for clarity:

diff -r 12654193e434 test/hotspot/jtreg/gc/shenandoah/TestSelectiveBarrierFlags.java
--- a/test/hotspot/jtreg/gc/shenandoah/TestSelectiveBarrierFlags.java	Fri Jan 19 11:52:40 2018 +0100
+++ b/test/hotspot/jtreg/gc/shenandoah/TestSelectiveBarrierFlags.java	Fri Jan 19 15:14:50 2018 +0100
@@ -69,10 +69,11 @@

             StringBuilder sb = new StringBuilder();
             for (String[] l : opts) {
-                int f = t % (l.length + 1);
-                conf.add("-XX:" + ((f & 1) == 1 ? "+" : "-") + l[0]);
-                if (l.length > 1) {
-                    conf.add("-XX:" + ((f & 2) == 2 ? "+" : "-") + l[1]);
+                // Make a choice which flag to select from the group.
+                // Zero means no flag is selected from the group.
+                int choice = t % (l.length + 1);
+                for (int e = 0; e < l.length; e++) {
+                  conf.add("-XX:" + ((choice == (e + 1)) ? "+" : "-") + l[e]);
                 }
                 t = t / (l.length + 1);
             }

Testing: TestSelectiveBarrierFlags {fastdebug,release}
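
For illustration, the new selection loop can be read as decoding the configuration index t as a mixed-radix number, one digit per flag group. The standalone Java sketch below shows that reading; the class name, the count() helper, and the example grouping are hypothetical, not taken from the test. As far as I can tell from the removed lines, the old code treated the digit as a bitmask over l[0] and l[1] only, so a third flag in a group was never emitted.

```java
import java.util.ArrayList;
import java.util.List;

class FlagSelectionSketch {
    // Decodes configuration index t into one "-XX:+Flag" or "-XX:-Flag" entry
    // per flag. Each group contributes one digit in [0, group.length]:
    // 0 selects nothing, k enables the k-th flag (1-based) and disables the rest.
    static List<String> decode(int t, String[][] groups) {
        List<String> conf = new ArrayList<>();
        for (String[] group : groups) {
            int choice = t % (group.length + 1);
            for (int e = 0; e < group.length; e++) {
                conf.add("-XX:" + (choice == e + 1 ? "+" : "-") + group[e]);
            }
            t = t / (group.length + 1);
        }
        return conf;
    }

    // Number of distinct configurations: the product of (group.length + 1).
    static int count(String[][] groups) {
        int n = 1;
        for (String[] group : groups) {
            n *= group.length + 1;
        }
        return n;
    }

    public static void main(String[] args) {
        // Hypothetical grouping, chosen only to show a three-flag group.
        String[][] groups = {
            {"ShenandoahStoreValReadBarrier",
             "ShenandoahStoreValWriteBarrier",
             "ShenandoahStoreValEnqueueBarrier"},
            {"ShenandoahKeepAliveBarrier"},
        };
        for (int t = 0; t < count(groups); t++) {
            System.out.println(t + ": " + decode(t, groups));
        }
    }
}
```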

Thanks,
-Aleksey


From zgu at redhat.com  Fri Jan 19 14:41:05 2018
From: zgu at redhat.com (Zhengyu Gu)
Date: Fri, 19 Jan 2018 09:41:05 -0500
Subject: RFR(XXS) Missing resource mark
Message-ID: <04976088-7470-8852-5629-50a7d4fe5b5e@redhat.com>

Crashed inside handle_alloc_failure_evac() due to missing ResourceMark 
when calling thread->name().

#
#  Internal Error (/home/zgu/workspace/shenandoah-jdk10/src/hotspot/share/memory/resourceArea.hpp:63), pid=1230, tid=1232
#  fatal error: memory leak: allocating without ResourceMark
#

Webrev: 
http://cr.openjdk.java.net/~zgu/shenandoah/handle_alloc_evac_rm/webrev.00/

Test:

   hotspot_gc_shenandoah (fastdebug)


Thanks,

-Zhengyu

From shade at redhat.com  Fri Jan 19 14:42:18 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Fri, 19 Jan 2018 15:42:18 +0100
Subject: RFR(XXS) Missing resource mark
In-Reply-To: <04976088-7470-8852-5629-50a7d4fe5b5e@redhat.com>
References: <04976088-7470-8852-5629-50a7d4fe5b5e@redhat.com>
Message-ID: <c729f775-500e-7a9a-c0e8-4ae642d115c8@redhat.com>

On 01/19/2018 03:41 PM, Zhengyu Gu wrote:
> Crashed inside handle_alloc_failure_evac() due to missing ResourceMark when calling thread->name().
> 
> #
> #  Internal Error
> (/home/zgu/workspace/shenandoah-jdk10/src/hotspot/share/memory/resourceArea.hpp:63), pid=1230, tid=1232
> #  fatal error: memory leak: allocating without ResourceMark
> #
> 
> Webrev: http://cr.openjdk.java.net/~zgu/shenandoah/handle_alloc_evac_rm/webrev.00/

Looks good!

-Aleksey


From zgu at redhat.com  Fri Jan 19 14:58:24 2018
From: zgu at redhat.com (zgu at redhat.com)
Date: Fri, 19 Jan 2018 14:58:24 +0000
Subject: hg: shenandoah/jdk10: Missing resource mark in
 SH::handle_alloc_failure_evac()
Message-ID: <201801191458.w0JEwOqi002428@aojmv0008.oracle.com>

Changeset: d791ef88cdff
Author:    zgu
Date:      2018-01-19 09:54 -0500
URL:       http://hg.openjdk.java.net/shenandoah/jdk10/rev/d791ef88cdff

Missing resource mark in SH::handle_alloc_failure_evac()

! src/hotspot/share/gc/shenandoah/shenandoahConcurrentThread.cpp


From rkennke at redhat.com  Fri Jan 19 15:26:02 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Fri, 19 Jan 2018 16:26:02 +0100
Subject: RFR: TestSelectiveBarrierFlags should accept multi-element flag
 selections
In-Reply-To: <2658f693-856c-418e-f6a9-4ed521abe200@redhat.com>
References: <2658f693-856c-418e-f6a9-4ed521abe200@redhat.com>
Message-ID: <cc96b645-2a31-9550-c8fb-7ba6e15b0e13@redhat.com>

Good. This makes my test work :-) Push it!

Roman

> Roman's test changes need this: Fixed the bug that breaks when more than 2 flags per group are
> present, and also rewritten for clarity:
> 
> diff -r 12654193e434 test/hotspot/jtreg/gc/shenandoah/TestSelectiveBarrierFlags.java
> --- a/test/hotspot/jtreg/gc/shenandoah/TestSelectiveBarrierFlags.java	Fri Jan 19 11:52:40 2018 +0100
> +++ b/test/hotspot/jtreg/gc/shenandoah/TestSelectiveBarrierFlags.java	Fri Jan 19 15:14:50 2018 +0100
> @@ -69,10 +69,11 @@
> 
>               StringBuilder sb = new StringBuilder();
>               for (String[] l : opts) {
> -                int f = t % (l.length + 1);
> -                conf.add("-XX:" + ((f & 1) == 1 ? "+" : "-") + l[0]);
> -                if (l.length > 1) {
> -                    conf.add("-XX:" + ((f & 2) == 2 ? "+" : "-") + l[1]);
> +                // Make a choice which flag to select from the group.
> +                // Zero means no flag is selected from the group.
> +                int choice = t % (l.length + 1);
> +                for (int e = 0; e < l.length; e++) {
> +                  conf.add("-XX:" + ((choice == (e + 1)) ? "+" : "-") + l[e]);
>                   }
>                   t = t / (l.length + 1);
>               }
> 
> Testing: TestSelectiveBarrierFlags {fastdebug,release}
> 
> Thanks,
> -Aleksey
> 


From shade at redhat.com  Fri Jan 19 15:27:59 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Fri, 19 Jan 2018 16:27:59 +0100
Subject: RFC: Pick up 9.0.4 to sh/jdk9
In-Reply-To: <575a7f9c-6ee3-cae9-1b17-a5ae9c871ad3@redhat.com>
References: <575a7f9c-6ee3-cae9-1b17-a5ae9c871ad3@redhat.com>
Message-ID: <6b4285eb-92db-a479-d132-c7c40928d3f9@redhat.com>

On 01/19/2018 10:10 AM, Aleksey Shipilev wrote:
> Upstream jdk-updates/jdk9u had pushed the changesets for 9.0.4:
>   http://hg.openjdk.java.net/jdk-updates/jdk9u
> 
> Let's pick them up! A few trivial merges were needed.

Ah, upstream seems to have borked AArch64!
  https://bugs.openjdk.java.net/browse/JDK-8195685

Let's wait a little then...

-Aleksey


From ashipile at redhat.com  Fri Jan 19 15:36:37 2018
From: ashipile at redhat.com (ashipile at redhat.com)
Date: Fri, 19 Jan 2018 15:36:37 +0000
Subject: hg: shenandoah/jdk10: TestSelectiveBarrierFlags should accept
 multi-element flag selections
Message-ID: <201801191536.w0JFab77015699@aojmv0008.oracle.com>

Changeset: 67294a38c0c7
Author:    shade
Date:      2018-01-19 16:27 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk10/rev/67294a38c0c7

TestSelectiveBarrierFlags should accept multi-element flag selections

! test/hotspot/jtreg/gc/shenandoah/TestSelectiveBarrierFlags.java


From zgu at redhat.com  Fri Jan 19 16:46:28 2018
From: zgu at redhat.com (Zhengyu Gu)
Date: Fri, 19 Jan 2018 11:46:28 -0500
Subject: Degenerated GC
In-Reply-To: <e3bba57c-dee1-2176-3142-e69e92743ebe@redhat.com>
References: <e12ebc32-eb0d-355f-1fe5-74df60622135@redhat.com>
 <b9c6364e-2fb1-500a-d024-90470a5643ba@redhat.com>
 <e3bba57c-dee1-2176-3142-e69e92743ebe@redhat.com>
Message-ID: <f74190b8-61ba-6ace-059e-c4a9aa554e37@redhat.com>


shenandoahHeap.cpp:

1600
1601     // Allocations happen during concurrent preclean, record peak after the phase:
1602     shenandoahPolicy()->record_peak_occupancy();
1603   }
1604
1605   // Allocations happen during bitmap cleanup, record peak after the phase:
1606   shenandoahPolicy()->record_peak_occupancy();

This may call record_peak_occupancy() twice.


Otherwise, looks good.

-Zhengyu


On 01/19/2018 02:55 AM, Aleksey Shipilev wrote:
> On 01/18/2018 08:51 PM, Aleksey Shipilev wrote:
>> On 01/18/2018 04:18 PM, Aleksey Shipilev wrote:
>>> http://cr.openjdk.java.net/~shade/shenandoah/degenerated-gc/webrev.01/
>>
>> Amped up alloc-failure injection, and that exposed a few bugs. Fixed them:
>>    http://cr.openjdk.java.net/~shade/shenandoah/degenerated-gc/webrev.02/
>>
>> GCBasher runs for half an hour now without problems. Running further...
> 
> 8-hour GCBasher passes with:
> 
> $ -Xmx1g -XX:+UseShenandoahGC -XX:+UnlockDiagnosticVMOptions -XX:+UnlockExperimentalVMOptions
> -XX:ShenandoahGCHeuristics=aggressive -XX:+ShenandoahDegenerateALot TestGCBasherWithShenandoah 28800000
> 
> [27665.812s][info][gc,stats       ] 85556 successful concurrent GCs
> [27665.812s][info][gc,stats       ]      0 invoked explicitly
> [27665.812s][info][gc,stats       ]
> [27665.812s][info][gc,stats       ] 44995 Degenerated GCs
> [27665.812s][info][gc,stats       ]   44995 caused by allocation failure
> [27665.812s][info][gc,stats       ]   8628 upgraded to Full GC
> [27665.812s][info][gc,stats       ]
> [27665.812s][info][gc,stats       ] 8758 Full GCs
> [27665.812s][info][gc,stats       ]      0 invoked explicitly
> [27665.812s][info][gc,stats       ]    130 caused by allocation failure
> [27665.812s][info][gc,stats       ]   8628 upgraded from Degenerated GC
> 
> So, I am pretty sure it works :)
> 
> -Aleksey
> 

From rkennke at redhat.com  Fri Jan 19 16:53:54 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Fri, 19 Jan 2018 17:53:54 +0100
Subject: RFR: Implement flag to generate write-barriers without membars
Message-ID: <72dc574c-54c5-c5fc-81ff-5f3028817f20@redhat.com>

I extracted this from Traversal GC because it may be useful in other 
situations too. It introduces a flag ShenandoahWBWithMembar which 
makes it possible to avoid generating the load-load membar in the 
write barrier. This membar is not needed when evacuation is always 
turned off at safepoints (e.g. partial).

http://cr.openjdk.java.net/~rkennke/wbwithmembar/webrev.00/

Test: hotspot_gc_shenandoah passed

Roman

From shade at redhat.com  Fri Jan 19 16:59:59 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Fri, 19 Jan 2018 17:59:59 +0100
Subject: RFR: Implement flag to generate write-barriers without membars
In-Reply-To: <72dc574c-54c5-c5fc-81ff-5f3028817f20@redhat.com>
References: <72dc574c-54c5-c5fc-81ff-5f3028817f20@redhat.com>
Message-ID: <1afef25c-8d6e-e0a1-8e37-ea010f03a666@redhat.com>

On 01/19/2018 05:53 PM, Roman Kennke wrote:
> I extracted this from Traversal GC because it may be useful for other situations too. It introduces
> a flag ShenandoahWBWithMembar which enables to avoid generation of the load-load-membar in the
> write-barrier. This membar is not needed when evacuation is always turned off at safepoints (e.g.
> partial).
> 
> http://cr.openjdk.java.net/~rkennke/wbwithmembar/webrev.00/

Looks good to me, assuming nothing changed since Traversal GC code that Roland reviewed :)

Minor nit: can we make the option name consistent with other selective options? E.g. we have
ShenandoahWriteBarrierRB, so this one would be ShenandoahWriteBarrierMembar?

Thanks,
-Aleksey


From shade at redhat.com  Fri Jan 19 17:05:39 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Fri, 19 Jan 2018 18:05:39 +0100
Subject: RFR: Allocation failure injection machinery
Message-ID: <b67750d0-a613-acb0-8bbe-a24c8ed1edef@redhat.com>

http://cr.openjdk.java.net/~shade/shenandoah/inject-alloc-failure/webrev.01/

Found many bugs in Degenerated GC with this machinery. But it is separate from the rest of the code,
and is useful to have for general testing: for example, to test if baseline without Degenerated GC
fails the same way. Therefore, this patch splits the machinery out.

Testing: hotspot_gc_shenandoah

Thanks,
-Aleksey


From zgu at redhat.com  Fri Jan 19 17:35:11 2018
From: zgu at redhat.com (zgu at redhat.com)
Date: Fri, 19 Jan 2018 17:35:11 +0000
Subject: hg: shenandoah/jdk10: Hint unused regions instead of uncommit them
Message-ID: <201801191735.w0JHZB9d026722@aojmv0008.oracle.com>

Changeset: 46c3360b6623
Author:    zgu
Date:      2018-01-19 11:37 -0500
URL:       http://hg.openjdk.java.net/shenandoah/jdk10/rev/46c3360b6623

Hint unused regions instead of uncommit them

! src/hotspot/os/linux/os_linux.cpp
! src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp
! src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp
! src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.cpp
! src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.hpp
! src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp
! src/hotspot/share/runtime/os.cpp
! src/hotspot/share/runtime/os.hpp
! test/hotspot/jtreg/gc/shenandoah/acceptance/HeapUncommit.java


From rkennke at redhat.com  Fri Jan 19 17:43:44 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Fri, 19 Jan 2018 18:43:44 +0100
Subject: RFR: Implement flag to generate write-barriers without membars
In-Reply-To: <1afef25c-8d6e-e0a1-8e37-ea010f03a666@redhat.com>
References: <72dc574c-54c5-c5fc-81ff-5f3028817f20@redhat.com>
 <1afef25c-8d6e-e0a1-8e37-ea010f03a666@redhat.com>
Message-ID: <ebf49f1f-2a3d-bc30-e151-e372668afa52@redhat.com>

Am 19.01.2018 um 17:59 schrieb Aleksey Shipilev:
> On 01/19/2018 05:53 PM, Roman Kennke wrote:
>> I extracted this from Traversal GC because it may be useful for other situations too. It introduces
>> a flag ShenandoahWBWithMembar which enables to avoid generation of the load-load-membar in the
>> write-barrier. This membar is not needed when evacuation is always turned off at safepoints (e.g.
>> partial).
>>
>> http://cr.openjdk.java.net/~rkennke/wbwithmembar/webrev.00/
> 
> Looks good to me, assuming nothing changed since Traversal GC code that Roland reviewed :)
> 
> Minor nit: can we make the option name consistent with other selective options. E.g. we have
> ShenandoahWriteBarrierRB. So this one seems to be ShenandoahWriteBarrierMembar?
> 
> Thanks,
> -Aleksey
> 

Nothing should have changed since Roland's review.

http://cr.openjdk.java.net/~rkennke/wbwithmembar/webrev.01/

Ok to go?

Roman

From shade at redhat.com  Fri Jan 19 17:44:36 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Fri, 19 Jan 2018 18:44:36 +0100
Subject: RFR: Implement flag to generate write-barriers without membars
In-Reply-To: <ebf49f1f-2a3d-bc30-e151-e372668afa52@redhat.com>
References: <72dc574c-54c5-c5fc-81ff-5f3028817f20@redhat.com>
 <1afef25c-8d6e-e0a1-8e37-ea010f03a666@redhat.com>
 <ebf49f1f-2a3d-bc30-e151-e372668afa52@redhat.com>
Message-ID: <486374dc-7bde-7a42-911f-a41f374ed729@redhat.com>

On 01/19/2018 06:43 PM, Roman Kennke wrote:
> Nothing should have changed since Roland's review.
> 
> http://cr.openjdk.java.net/~rkennke/wbwithmembar/webrev.01/
> 
> Ok to go?

OK for me.

-Aleksey


From roman at kennke.org  Fri Jan 19 17:52:47 2018
From: roman at kennke.org (roman at kennke.org)
Date: Fri, 19 Jan 2018 17:52:47 +0000
Subject: hg: shenandoah/jdk10: Implement flag to generate write-barriers
 without membars.
Message-ID: <201801191752.w0JHqliZ003447@aojmv0008.oracle.com>

Changeset: ecb87af5e0d8
Author:    rkennke
Date:      2018-01-19 18:40 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk10/rev/ecb87af5e0d8

Implement flag to generate write-barriers without membars.

! src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp
! src/hotspot/share/opto/compile.cpp
! src/hotspot/share/opto/shenandoahSupport.cpp


From rkennke at redhat.com  Sat Jan 20 13:46:21 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Sat, 20 Jan 2018 14:46:21 +0100
Subject: Race and double-counting objects in task balancing code?
Message-ID: <d1d4c617-8960-1e5f-8fb2-85da055b96bf@redhat.com>

Hi there,

I'm currently chasing a failure of TestGCThreadGroups.java with 
Traversal GC. I'm getting objects double counted and liveness going off 
the rails. It only seems to happen with ConcGCThreads > ParallelGCThreads.

I am wondering what prevents GC workers from stealing oops off of queues 
that are currently being transferred to 'regular' queues. Might we have a 
race there? Is this transferral thread-safe wrt stealing? Or am I 
missing something? Please have a look at the last patch:

http://cr.openjdk.java.net/~rkennke/traversal/webrev.03/

around:

shenandoahTraversalGC.cpp mark_loop_work()

The code is almost 100% identical to what we do in 
shenandoahConcurrentMark.cpp

I wonder if simply letting fewer GC threads steal from extra queues 
might be the safer way to transfer work from extra queues?

Roman

From shade at redhat.com  Mon Jan 22 09:16:22 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Mon, 22 Jan 2018 10:16:22 +0100
Subject: RFR: Make concurrent precleaning log message optional again
Message-ID: <e2bb7f6b-6b4c-0be3-0d17-97aa8019c8b5@redhat.com>

http://cr.openjdk.java.net/~shade/shenandoah/preclean-optional/webrev.01/

This fixes a UX regression after the recent refactoring: even when precleaning and/or reference
processing is not enabled, we still print the "Concurrent precleaning" message in the log.

Testing: hotspot_gc_shenandoah

Thanks,
-Aleksey


From rkennke at redhat.com  Mon Jan 22 09:48:43 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Mon, 22 Jan 2018 10:48:43 +0100
Subject: Race and double-counting objects in task balancing code?
In-Reply-To: <d1d4c617-8960-1e5f-8fb2-85da055b96bf@redhat.com>
References: <d1d4c617-8960-1e5f-8fb2-85da055b96bf@redhat.com>
Message-ID: <bfad60ae-7731-23d5-46ec-b5b5f3571e24@redhat.com>

I think I know what it is. This is the first GC mode in Shenandoah that 
also traces newly allocated objects. I believe what I am seeing is that 
the GC thread doesn't see the updated region top yet, and thus fails the 
assertion live <= used.
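
A minimal sketch of the suspected visibility problem, assuming the race really
is between a mutator bumping the region top and a GC worker accounting
liveness. RegionSketch and its members are hypothetical stand-ins, not the
real ShenandoahHeapRegion. The idea: if the top update is ordered before the
object becomes reachable (release), and the worker picks the object up with
an acquire load, the worker cannot then read a stale top.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical model of one region; not the real ShenandoahHeapRegion.
struct RegionSketch {
  std::atomic<std::size_t> top{0};        // allocation frontier in words ("used")
  std::atomic<std::size_t> published{0};  // end offset of the newest reachable object
  std::atomic<std::size_t> live{0};       // words accounted as live by the GC worker

  // Mutator side (a single allocating thread is assumed): bump top first,
  // then publish the object with a release store, so that the top update
  // is ordered before the publication.
  void allocate_and_publish(std::size_t words) {
    std::size_t end = top.load(std::memory_order_relaxed) + words;
    top.store(end, std::memory_order_relaxed);
    published.store(end, std::memory_order_release);
  }

  // GC worker side: the acquire load pairs with the release store above, so
  // the subsequent read of top cannot be older than the value written before
  // the publication. Returns whether "live <= used" holds.
  bool trace_and_check() {
    std::size_t end = published.load(std::memory_order_acquire);
    live.store(end, std::memory_order_relaxed);
    std::size_t used = top.load(std::memory_order_relaxed);
    return end <= used;
  }
};
```

With relaxed ordering on both sides, nothing would stop the worker from
seeing the new object before the new top on a weakly ordered machine.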

Roman

> Hi there,
> 
> I'm currently chasing a failure of TestGCThreadGroups.java with 
> Traversal GC. I'm getting objects double counted and liveness going off 
> the rails. It only seems to happen with ConcGCThreads > ParallelGCThreads.
> 
> I am wondering what prevents GC workers from stealing oops off of queues 
> that are currently being transferred to 'regular' queues. Might we have a 
> race there? Is this transfer thread-safe with respect to stealing? Or am I 
> missing something? Please have a look at the last patch:
> 
> http://cr.openjdk.java.net/~rkennke/traversal/webrev.03/
> 
> around:
> 
> shenandoahTraversalGC.cpp mark_loop_work()
> 
> The code is almost 100% identical to what we do in 
> shenandoahConcurrentMark.cpp
> 
> I wonder if simply letting fewer GC threads steal from extra queues 
> might be the safer way to transfer work from extra queues?
> 
> Roman


From shade at redhat.com  Mon Jan 22 09:51:32 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Mon, 22 Jan 2018 10:51:32 +0100
Subject: RFR: Log message on ref processing and class unload for mark events
Message-ID: <55fbd69b-cb79-6920-979c-1693a5dc5800@redhat.com>

http://cr.openjdk.java.net/~shade/shenandoah/mark-message/webrev.01/

Another UX improvement: print the marking cycle flavor in the log. Helps to diagnose whether a
failing/slower marking cycle was somehow special.

[14.017s][info][gc] GC(65) Pause Init Mark 0.266ms
[14.047s][info][gc] GC(65) Concurrent marking (class unload) 900M->961M(1024M) 30.572ms
[14.053s][info][gc] GC(65) Pause Final Mark (class unload) 5.529ms
...
[14.135s][info][gc] GC(66) Pause Init Mark 0.697ms
[14.195s][info][gc] GC(66) Concurrent marking (ref process) 869M->927M(1024M) 60.646ms
[14.196s][info][gc] GC(66) Concurrent precleaning 927M->928M(1024M) 0.431ms
[14.200s][info][gc] GC(66) Pause Final Mark (ref process) 4.355ms
...
[14.378s][info][gc] GC(67) Pause Init Mark 0.633ms
[14.453s][info][gc] GC(67) Concurrent marking 911M->988M(1024M) 75.755ms
[14.456s][info][gc] GC(67) Pause Final Mark 2.735ms

Testing: hotspot_gc_shenandoah

Thanks,
-Aleksey


From rkennke at redhat.com  Mon Jan 22 09:59:00 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Mon, 22 Jan 2018 10:59:00 +0100
Subject: RFR: Make concurrent precleaning log message optional again
In-Reply-To: <e2bb7f6b-6b4c-0be3-0d17-97aa8019c8b5@redhat.com>
References: <e2bb7f6b-6b4c-0be3-0d17-97aa8019c8b5@redhat.com>
Message-ID: <7ad35698-0f1c-7ad9-8689-6fd61f6e9597@redhat.com>

On 22.01.2018 at 10:16, Aleksey Shipilev wrote:
> http://cr.openjdk.java.net/~shade/shenandoah/preclean-optional/webrev.01/
> 
> This is the fix for a UX regression after the recent refactoring: even when precleaning and/or
> reference processing is not enabled, we still print the "Concurrent precleaning" message in the log.
> 
> Testing: hotspot_gc_shenandoah
> 
> Thanks,
> -Aleksey
> 

Ok

From rkennke at redhat.com  Mon Jan 22 09:59:38 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Mon, 22 Jan 2018 10:59:38 +0100
Subject: RFR: Log message on ref processing and class unload for mark
 events
In-Reply-To: <55fbd69b-cb79-6920-979c-1693a5dc5800@redhat.com>
References: <55fbd69b-cb79-6920-979c-1693a5dc5800@redhat.com>
Message-ID: <39519b76-9268-bc75-d996-9fe39d97d468@redhat.com>

On 22.01.2018 at 10:51, Aleksey Shipilev wrote:
> http://cr.openjdk.java.net/~shade/shenandoah/mark-message/webrev.01/
> 
> Another UX improvement: print the marking cycle flavor in the log. Helps to diagnose whether a
> failing/slower marking cycle was somehow special.
> 
> [14.017s][info][gc] GC(65) Pause Init Mark 0.266ms
> [14.047s][info][gc] GC(65) Concurrent marking (class unload) 900M->961M(1024M) 30.572ms
> [14.053s][info][gc] GC(65) Pause Final Mark (class unload) 5.529ms
> ...
> [14.135s][info][gc] GC(66) Pause Init Mark 0.697ms
> [14.195s][info][gc] GC(66) Concurrent marking (ref process) 869M->927M(1024M) 60.646ms
> [14.196s][info][gc] GC(66) Concurrent precleaning 927M->928M(1024M) 0.431ms
> [14.200s][info][gc] GC(66) Pause Final Mark (ref process) 4.355ms
> ...
> [14.378s][info][gc] GC(67) Pause Init Mark 0.633ms
> [14.453s][info][gc] GC(67) Concurrent marking 911M->988M(1024M) 75.755ms
> [14.456s][info][gc] GC(67) Pause Final Mark 2.735ms
> 
> Testing: hotspot_gc_shenandoah
> 
> Thanks,
> -Aleksey
> 

Very good. Go!

Roman

From rkennke at redhat.com  Mon Jan 22 10:05:08 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Mon, 22 Jan 2018 11:05:08 +0100
Subject: RFR: Allocation failure injection machinery
In-Reply-To: <b67750d0-a613-acb0-8bbe-a24c8ed1edef@redhat.com>
References: <b67750d0-a613-acb0-8bbe-a24c8ed1edef@redhat.com>
Message-ID: <9bcce6f4-3b35-1486-ed27-8f08b6485c00@redhat.com>

On 19.01.2018 at 18:05, Aleksey Shipilev wrote:
> http://cr.openjdk.java.net/~shade/shenandoah/inject-alloc-failure/webrev.01/
> 
> Found many bugs in Degenerated GC with this machinery. But it is separate from the rest of the code,
> and is useful to have for general testing: for example, to test if baseline without Degenerated GC
> fails the same way. Therefore, this patch splits the machinery out.
> 
> Testing: hotspot_gc_shenandoah
> 
> Thanks,
> -Aleksey
> 

Looks good.


From ashipile at redhat.com  Mon Jan 22 10:15:59 2018
From: ashipile at redhat.com (ashipile at redhat.com)
Date: Mon, 22 Jan 2018 10:15:59 +0000
Subject: hg: shenandoah/jdk10: 3 new changesets
Message-ID: <201801221015.w0MAFxVp016646@aojmv0008.oracle.com>

Changeset: 820129a799b1
Author:    shade
Date:      2018-01-19 18:49 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk10/rev/820129a799b1

Allocation failure injection machinery

! src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp
! src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp
! src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp
! test/hotspot/jtreg/gc/shenandoah/LotsOfCycles.java
! test/hotspot/jtreg/gc/shenandoah/acceptance/AllocIntArrays.java
! test/hotspot/jtreg/gc/shenandoah/acceptance/AllocObjectArrays.java
! test/hotspot/jtreg/gc/shenandoah/acceptance/AllocObjects.java
! test/hotspot/jtreg/gc/shenandoah/acceptance/RetainObjects.java
! test/hotspot/jtreg/gc/shenandoah/acceptance/SieveObjects.java
! test/hotspot/jtreg/gc/stress/gcbasher/TestGCBasherWithShenandoah.java
! test/hotspot/jtreg/gc/stress/gclocker/TestGCLockerWithShenandoah.java
! test/hotspot/jtreg/gc/stress/gcold/TestGCOldWithShenandoah.java

Changeset: e5398dce6e7b
Author:    shade
Date:      2018-01-22 10:10 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk10/rev/e5398dce6e7b

Make concurrent precleaning log message optional again

! src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp

Changeset: b8c39bdc0dac
Author:    shade
Date:      2018-01-22 10:47 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk10/rev/b8c39bdc0dac

Log message on ref processing and class unload for mark events

! src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp
! src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp


From shade at redhat.com  Mon Jan 22 11:23:58 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Mon, 22 Jan 2018 12:23:58 +0100
Subject: RFR: Do not put down update-refs-in-progress flag concurrently
Message-ID: <e95b2584-e64b-5f9d-3332-6659ffc7c391@redhat.com>

http://cr.openjdk.java.net/~shade/shenandoah/no-concurrent-ur-flag/webrev.01/

There is a race in update-refs-in-progress flag handling that is reliably reproducible with the
Degenerated GC patch and AllocFailureALot. On the cancellation path, ShConcThread sets u-r-in-p to false
(this was added to handle partial GC failure, IIRC). But this opens enough of a race window for a *native*
thread to skip the StoreValBarrier that is sensed by ShBarrierSet::need_update_refs_barrier, and then
ShBarrierSet::write_ref_array silently corrupts the heap by not fixing up from-space ptrs.

The way out is to handle it properly, at safepoint. No thread is waiting for that flag to get down,
so there is no reason at all to do this concurrently. Full GC code has to clean up the flag instead.
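
A rough model of "only flip the flag at a safepoint", as a sketch under loud
assumptions: the mutex below merely stands in for the guarantee that no
mutator is mid-operation while the flag changes. It is not how HotSpot
implements safepoints, and UpdateRefsFlagSketch is a hypothetical name.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical model: holding the lock means either "a mutator operation is
// running" or "the world is stopped". The flag can therefore never change in
// the middle of a barrier-plus-store sequence.
class UpdateRefsFlagSketch {
  std::mutex _safepoint_lock;
  bool _update_refs_in_progress;

public:
  UpdateRefsFlagSketch() : _update_refs_in_progress(false) {}

  // GC side: flips the flag only while no mutator operation is running.
  void set_at_safepoint(bool value) {
    std::lock_guard<std::mutex> stopped(_safepoint_lock);
    _update_refs_in_progress = value;
  }

  // Mutator side: op receives a flag value that stays stable until op returns,
  // so the decision "do I need to fix up from-space pointers?" cannot go stale.
  template <typename Op>
  void run_mutator_op(Op op) {
    std::lock_guard<std::mutex> running(_safepoint_lock);
    op(_update_refs_in_progress);
  }
};
```

The concurrent variant corresponds to writing the flag without taking the
lock, which is exactly where a store could observe "not in progress" while
from-space pointers are still around.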

There is a similar but significantly more complicated patch for evac-in-progress, which is better kept
separate from this.

Testing: hotspot_gc_shenandoah, Degenerate GC tests

Thanks,
-Aleksey


From shade at redhat.com  Mon Jan 22 11:51:45 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Mon, 22 Jan 2018 12:51:45 +0100
Subject: Bug: -XX:-ShenandoahWriteBarrierMemBar crashes XmlTransform
Message-ID: <3c4a3faf-63a4-fa32-3f89-6718c4a6d459@redhat.com>

Run XmlTransform with:
-XX:ShenandoahGCHeuristics=passive -XX:+ShenandoahWriteBarrier -XX:-ShenandoahWriteBarrierMemBar

Fails with:
#  Internal Error
(/home/shade/trunks/shenandoah-jdk10/src/hotspot/share/opto/shenandoahSupport.cpp:4062), pid=5060,
tid=5085
#  assert(load->Opcode() == Op_LoadUB) failed: inconsistent

Stack: [0x00007f25c462d000,0x00007f25c472e000],  sp=0x00007f25c4725ad0,  free space=994k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native
code)
V  [libjvm.so+0x1969e5e]  VMError::report_and_die(int, char const*, char const*, __va_list_tag*,
Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x4ce
V  [libjvm.so+0x196a9cf]  VMError::report_and_die(Thread*, char const*, int, char const*, char
const*, __va_list_tag*)+0x2f
V  [libjvm.so+0xaf7d82]  report_vm_error(char const*, int, char const*, char const*, ...)+0x112
V  [libjvm.so+0x17b10ea]  ShenandoahWriteBarrierNode::move_evacuation_test_out_of_loop(IfNode*,
PhaseIdealLoop*)+0xc2a
V  [libjvm.so+0x1149679]  PhaseIdealLoop::do_unswitching(IdealLoopTree*, Node_List&)+0x2ac9
V  [libjvm.so+0x17b0397]  ShenandoahWriteBarrierNode::optimize_after_expansion(Node_List const&,
Node_List const&, Node_List&, PhaseIdealLoop*)+0x3c7
V  [libjvm.so+0x115e95d]  PhaseIdealLoop::build_and_optimize(LoopOptsMode)+0x11bd
V  [libjvm.so+0xa4bc3b]  Compile::optimize_loops(int&, PhaseIterGVN&, LoopOptsMode)+0x58b
V  [libjvm.so+0x17a3b78]  ShenandoahWriteBarrierNode::expand(Compile*, PhaseIterGVN&, int&)+0x648

I put the additional printing in the assert, and it is now:

assert(load->Opcode() == Op_LoadUB) failed: inconsistent: AndI

I believe that AndI is the mask from GC state load. Not sure if that entire branch matters for
correctness, or we just assert wrong things. Removing the assert makes the compiler fail with "Bad
graph detected in build_loop_late".

Can you guys understand what is going on there, and fix it? I think Traversal GC is broken because
of that.

Thanks,
-Aleksey


From shade at redhat.com  Mon Jan 22 11:56:31 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Mon, 22 Jan 2018 12:56:31 +0100
Subject: Degenerated GC
In-Reply-To: <f74190b8-61ba-6ace-059e-c4a9aa554e37@redhat.com>
References: <e12ebc32-eb0d-355f-1fe5-74df60622135@redhat.com>
 <b9c6364e-2fb1-500a-d024-90470a5643ba@redhat.com>
 <e3bba57c-dee1-2176-3142-e69e92743ebe@redhat.com>
 <f74190b8-61ba-6ace-059e-c4a9aa554e37@redhat.com>
Message-ID: <ce3e2e2c-646b-99d2-e18d-b7c2b44d20d6@redhat.com>

On 01/19/2018 05:46 PM, Zhengyu Gu wrote:
> shenandoahHeap.cpp:
> 1600
> 1601     // Allocations happen during concurrent preclean, record peak after the phase:
> 1602     shenandoahPolicy()->record_peak_occupancy();
> 1603   }
> 1604
> 1605?? // Allocations happen during bitmap cleanup, record peak after the phase:
> 1606?? shenandoahPolicy()->record_peak_occupancy();
> 
> May call twice.

Yup, that one is fixed, thanks!

I have been chasing a weird bug in Degenerated GC, which turns out to be a separate issue, see
update-refs-in-progress race RFR on this list. That bugfix should be pushed before Degenerated GC,
otherwise tests start to reliably fail.

Updated patch for Degenerated GC:
  http://cr.openjdk.java.net/~shade/shenandoah/degenerated-gc/webrev.03/

Thanks,
-Aleksey


From rkennke at redhat.com  Mon Jan 22 12:20:46 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Mon, 22 Jan 2018 13:20:46 +0100
Subject: RFR: Do not put down update-refs-in-progress flag concurrently
In-Reply-To: <e95b2584-e64b-5f9d-3332-6659ffc7c391@redhat.com>
References: <e95b2584-e64b-5f9d-3332-6659ffc7c391@redhat.com>
Message-ID: <4e7d7fd1-8fc6-efc9-223b-8fc6d437da33@redhat.com>

On 22.01.2018 at 12:23, Aleksey Shipilev wrote:
> http://cr.openjdk.java.net/~shade/shenandoah/no-concurrent-ur-flag/webrev.01/
> 
> There is a race in update-refs-in-progress flag handling that is reliably reproducible with the
> Degenerated GC patch and AllocFailureALot. On the cancellation path, ShConcThread sets u-r-in-p to false
> (this was added to handle partial GC failure, IIRC). But this opens enough of a race window for a *native*
> thread to skip the StoreValBarrier that is sensed by ShBarrierSet::need_update_refs_barrier, and then
> ShBarrierSet::write_ref_array silently corrupts the heap by not fixing up from-space ptrs.
> 
> The way out is to handle it properly, at safepoint. No thread is waiting for that flag to get down,
> so there is no reason at all to do this concurrently. Full GC code has to clean up the flag instead.
> 
> There is a similar but significantly more complicated patch for evac-in-progress, which is better kept
> separate from this.
> 
> Testing: hotspot_gc_shenandoah, Degenerate GC tests
> 
> Thanks,
> -Aleksey
> 

Sounds good. We've had enough issues with concurrently putting down 
-in-progress flags. Let's just do it at a proper safepoint.

Roman


From ashipile at redhat.com  Mon Jan 22 15:48:01 2018
From: ashipile at redhat.com (ashipile at redhat.com)
Date: Mon, 22 Jan 2018 15:48:01 +0000
Subject: hg: shenandoah/jdk10: Do not put down update-refs-in-progress flag
 concurrently
Message-ID: <201801221548.w0MFm1PE003003@aojmv0008.oracle.com>

Changeset: dc779781dd5e
Author:    shade
Date:      2018-01-22 12:04 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk10/rev/dc779781dd5e

Do not put down update-refs-in-progress flag concurrently

! src/hotspot/share/gc/shenandoah/shenandoahConcurrentThread.cpp
! src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp
! src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp
! src/hotspot/share/gc/shenandoah/shenandoahMarkCompact.cpp


From ashipile at redhat.com  Mon Jan 22 15:48:16 2018
From: ashipile at redhat.com (ashipile at redhat.com)
Date: Mon, 22 Jan 2018 15:48:16 +0000
Subject: hg: shenandoah/jdk10: Degenerated GC
Message-ID: <201801221548.w0MFmGMX003143@aojmv0008.oracle.com>

Changeset: 45d471869b73
Author:    shade
Date:      2018-01-22 12:52 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk10/rev/45d471869b73

Degenerated GC

! src/hotspot/share/gc/shenandoah/shenandoahCollectorPolicy.cpp
! src/hotspot/share/gc/shenandoah/shenandoahCollectorPolicy.hpp
! src/hotspot/share/gc/shenandoah/shenandoahConcurrentThread.cpp
! src/hotspot/share/gc/shenandoah/shenandoahConcurrentThread.hpp
! src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp
! src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp
! src/hotspot/share/gc/shenandoah/shenandoahPhaseTimings.cpp
! src/hotspot/share/gc/shenandoah/shenandoahPhaseTimings.hpp
! src/hotspot/share/gc/shenandoah/shenandoahUtils.hpp
! src/hotspot/share/gc/shenandoah/shenandoahVerifier.cpp
! src/hotspot/share/gc/shenandoah/shenandoahVerifier.hpp
! src/hotspot/share/gc/shenandoah/shenandoahWorkerPolicy.cpp
! src/hotspot/share/gc/shenandoah/shenandoahWorkerPolicy.hpp
! src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp
! src/hotspot/share/gc/shenandoah/vm_operations_shenandoah.cpp
! src/hotspot/share/gc/shenandoah/vm_operations_shenandoah.hpp
! src/hotspot/share/runtime/vm_operations.hpp


From rkennke at redhat.com  Mon Jan 22 22:16:05 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Mon, 22 Jan 2018 23:16:05 +0100
Subject: Bug: -XX:-ShenandoahWriteBarrierMemBar crashes XmlTransform
In-Reply-To: <3c4a3faf-63a4-fa32-3f89-6718c4a6d459@redhat.com>
References: <3c4a3faf-63a4-fa32-3f89-6718c4a6d459@redhat.com>
Message-ID: <c48141a0-a6f4-53da-dd7e-24eaa3fc09f7@redhat.com>

On 22.01.2018 at 12:51, Aleksey Shipilev wrote:
> Run XmlTransform with:
> -XX:ShenandoahGCHeuristics=passive -XX:+ShenandoahWriteBarrier -XX:-ShenandoahWriteBarrierMemBar
> 
> Fails with:
> #  Internal Error
> (/home/shade/trunks/shenandoah-jdk10/src/hotspot/share/opto/shenandoahSupport.cpp:4062), pid=5060,
> tid=5085
> #  assert(load->Opcode() == Op_LoadUB) failed: inconsistent
> 
> Stack: [0x00007f25c462d000,0x00007f25c472e000],  sp=0x00007f25c4725ad0,  free space=994k
> Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native
> code)
> V  [libjvm.so+0x1969e5e]  VMError::report_and_die(int, char const*, char const*, __va_list_tag*,
> Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x4ce
> V  [libjvm.so+0x196a9cf]  VMError::report_and_die(Thread*, char const*, int, char const*, char
> const*, __va_list_tag*)+0x2f
> V  [libjvm.so+0xaf7d82]  report_vm_error(char const*, int, char const*, char const*, ...)+0x112
> V  [libjvm.so+0x17b10ea]  ShenandoahWriteBarrierNode::move_evacuation_test_out_of_loop(IfNode*,
> PhaseIdealLoop*)+0xc2a
> V  [libjvm.so+0x1149679]  PhaseIdealLoop::do_unswitching(IdealLoopTree*, Node_List&)+0x2ac9
> V  [libjvm.so+0x17b0397]  ShenandoahWriteBarrierNode::optimize_after_expansion(Node_List const&,
> Node_List const&, Node_List&, PhaseIdealLoop*)+0x3c7
> V  [libjvm.so+0x115e95d]  PhaseIdealLoop::build_and_optimize(LoopOptsMode)+0x11bd
> V  [libjvm.so+0xa4bc3b]  Compile::optimize_loops(int&, PhaseIterGVN&, LoopOptsMode)+0x58b
> V  [libjvm.so+0x17a3b78]  ShenandoahWriteBarrierNode::expand(Compile*, PhaseIterGVN&, int&)+0x648
> 
> I put the additional printing in the assert, and it is now:
> 
> assert(load->Opcode() == Op_LoadUB) failed: inconsistent: AndI
> 
> I believe that AndI is the mask from GC state load. Not sure if that entire branch matters for
> correctness, or we just assert wrong things. Removing the assert makes the compiler fail with "Bad
> graph detected in build_loop_late".
> 
> Can you guys understand what is going on there, and fix it? I think Traversal GC is broken because
> of that.
> 
> Thanks,
> -Aleksey
> 

Interestingly, I don't see it with the traversal patch. So maybe 
something in it fixes it, or the different graph shapes generated by 
traversal don't trigger it. Maybe try with the latest patch from the 
'Traversal GC' thread?

Roman


From rkennke at redhat.com  Mon Jan 22 22:17:23 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Mon, 22 Jan 2018 23:17:23 +0100
Subject: RFR: Traveral GC heuristics
In-Reply-To: <5d9ea546-927b-28eb-0dd8-1ee8f4192862@redhat.com>
References: <f0715fa7-b31f-8d1c-8015-22d1c64056ab@redhat.com>
 <e119d322-384a-9d30-35ef-c9dd352854c9@redhat.com>
 <b5e1b09e-514c-e344-a48c-e1cdf0c0726c@redhat.com>
 <9e18ba8d-593f-57a7-12cc-99cffe164a88@redhat.com>
 <c092efbe-d95d-3d0b-d1f2-adcb7c15414c@redhat.com>
 <46b64ff3-d2df-b2da-0805-5fa7cf8593f0@redhat.com>
 <5d9ea546-927b-28eb-0dd8-1ee8f4192862@redhat.com>
Message-ID: <75c6db4a-3236-dcd8-a4f4-67b78103129c@redhat.com>

On 19.01.2018 at 11:24, Aleksey Shipilev wrote:
> On 01/17/2018 10:58 PM, Roman Kennke wrote:
>> Yes, maybe. But for the start, I did not want it to interfere with existing code if I can avoid
>> it. For this reason, this looks like a copy+paste job from conc-mark and partial for some parts.
> 
> Okay, but please plan to common these things right away. We cannot have two copy-pasted 1000+ LOC
> blocks and hope for the best ;)

Well the plan is to get rid of all the other stuff and make traversal 
the GC to rule them all ;-)


>> Thanks for reviewing and spotting all the issues. I could not really make a diff webrev, because I
>> first had to pull -u your latest work, and this messed up my differential webrev... sorry. Only full
>> webrev now:
>>
>> http://cr.openjdk.java.net/~rkennke/traversal/webrev.03/
> 
> Sorry to be a PITA about this, but the change is quite large, and I think we want to be more
> forward-looking to backports and stability.
> 
> Another sweep through the code:
> 
> *) GCCause::to_string misses the to_string case for _shenandoah_traversal_gc?

Added.

> *) So, wait. SBS::interpreter_write_barrier_impl caller-saves registers when they are not equal to
> dst. New code in SBS::interpreter_storeval_barrier just does it unconditionally. Is WB too cautious,
> or is SVB too lax about this?

Neither. The WB returns a value into the same register as the input 
value. We don't want to trash this when returning. The enqueuing barrier 
is a one-way street.

> *) I think with minimal changes, we can make ShenandoahStoreValEnqueueBarrier exclusive, which will
> make testing much easier (encoding this in TestSelectiveBarriers would be trivial). E.g. say:
> 
>     if (UseShenandoahGC) {
>       if (ShenandoahStoreValWriteBarrier || ShenandoahStoreValEnqueueBarrier) {
>         // perform WB
>       }
>       if (ShenandoahStoreValEnqueueBarrier) {
>         // enqueue
>       }
>       if (ShenandoahStoreValReadBarrier) {
>         // RB
>       }
>     }

Done. Although it turned out to be not so minimal. Needed to add checks 
all over the place.
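
For reference, the flag logic suggested in the quoted review can be written
down as a small pure function. This is only a sketch: StoreValFlags,
StoreValPlan and plan_storeval are made-up names for illustration, and the
real checks are spread across the interpreter, C1 and C2.

```cpp
#include <cassert>

// Hypothetical mirror of the relevant VM flags.
struct StoreValFlags {
  bool use_shenandoah;    // UseShenandoahGC
  bool write_barrier;     // ShenandoahStoreValWriteBarrier
  bool enqueue_barrier;   // ShenandoahStoreValEnqueueBarrier
  bool read_barrier;      // ShenandoahStoreValReadBarrier
};

// Which actions a storeval barrier would perform under those flags.
struct StoreValPlan {
  bool do_wb;
  bool do_enqueue;
  bool do_rb;
};

// Follows the pseudocode from the review: the enqueue barrier implies the
// write barrier, so it can be enabled on its own.
inline StoreValPlan plan_storeval(const StoreValFlags& f) {
  StoreValPlan p = {false, false, false};
  if (f.use_shenandoah) {
    p.do_wb      = f.write_barrier || f.enqueue_barrier;
    p.do_enqueue = f.enqueue_barrier;
    p.do_rb      = f.read_barrier;
  }
  return p;
}
```

Having the decision in one place like this is what would make it easy to
cover in something like TestSelectiveBarriers.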

> *) Minor nit: please indent second arguments like this:
> 
>       FLAG_SET_DEFAULT(UseShenandoahMatrix,              false);
>       FLAG_SET_DEFAULT(ShenandoahSATBBarrier,            false);
>       FLAG_SET_DEFAULT(ShenandoahConditionalSATBBarrier, false);
>       FLAG_SET_DEFAULT(ShenandoahStoreValReadBarrier,    false);
>       FLAG_SET_DEFAULT(ShenandoahStoreValWriteBarrier,   true);
>       FLAG_SET_DEFAULT(ShenandoahStoreValEnqueueBarrier, true);
>       FLAG_SET_DEFAULT(ShenandoahKeepAliveBarrier,       false);
>       FLAG_SET_DEFAULT(ShenandoahAsmWB,                  true);
>       FLAG_SET_DEFAULT(ShenandoahBarriersForConst,       true);
>       FLAG_SET_DEFAULT(ShenandoahWBWithMemBar,           false);
>       FLAG_SET_DEFAULT(ShenandoahWriteBarrierRB,         false);

Done.

> *) shenandoahOopClosures.hpp, indenting is a bit off here:
> 
>   240       _thread(Thread::current()), _queue(q) {}
> 
> ...
> 
>   273   virtual bool do_metadata() { return true; }

Fixed.

> *) I wonder if we want to pull out ShenandoahWBWithMemBar changes into a separate changeset? This
> looks potentially backportable, and usable outside of Traversal GC.

Already done and pushed.

Also, I have added tests that exercise traversal heuristics just as we 
do for other heuristics. This turned up a number of bugs and 
improvements that I fixed:

- when growing the heap, we must make sure that the TAMS points to end 
for the new regions, otherwise we'd treat them as implicitly marked.
- added periodic GC
- folded 'SATB' queue processing into thread stack scanning. The problem 
here is that iterating the threads 2x is cumbersome because of the 
claiming protocol: we need to fire the task/workers 2x: once for the 
SATB queues, once for the thread scanning. I folded it into one pass. 
This required a (trivial) extension in the upstream parallel thread 
scanning/iteration protocol.
- I tripped an assert in SHR::increase_live_data(). I think the reason 
is that we have a race here: a GC thread might not yet see the updated 
SHR::_top but already accounts for the updated live data. I excluded 
conc-traversal from that check. This could probably be fixed by doing 
the proper concurrency membars, but do we care? For assertion code?

- interesting bug: in mark-compact, we first check for 
stuff-in-progress, and turn it off. When checking for 
marking-in-progress first, we turn that off and also turn off 
SATB. Notice the overlap of MARKING with TRAVERSAL. We then go on to 
check for TRAVERSAL, see that it's also ON, turn it off, which also 
turns off SATB again, and trip an assert because it checks the correct 
SATB active state. Reordering the checks fixes this.
- I had a little index-out-of-bounds in humongous-checking code. 
Trivially fixed by bounds-checking.

- Updated patch to match current head (some conflicts with degen)
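
The mark-compact ordering bug from the list above can be modelled with a few
lines. This is a sketch under assumptions: it supposes the TRAVERSAL state
includes the MARKING bit, and it counts a "double SATB deactivation" where
the real code would trip an assert. All names here are hypothetical and do
not match the actual ShenandoahHeap state encoding.

```cpp
#include <cassert>

// Assumed encoding: TRAVERSAL overlaps with MARKING.
enum : unsigned {
  MARKING        = 1u << 0,
  TRAVERSAL_ONLY = 1u << 1,
  TRAVERSAL      = MARKING | TRAVERSAL_ONLY
};

struct StateSketch {
  unsigned gc_state = 0;
  bool satb_active = false;
  int double_deactivations = 0;  // stands in for the tripped assert

  void deactivate_satb() {
    if (!satb_active) { ++double_deactivations; return; }
    satb_active = false;
  }
  bool marking_in_progress() const   { return (gc_state & MARKING) != 0; }
  bool traversal_in_progress() const { return (gc_state & TRAVERSAL_ONLY) != 0; }
  void stop_marking()   { gc_state &= ~MARKING;   deactivate_satb(); }
  void stop_traversal() { gc_state &= ~TRAVERSAL; deactivate_satb(); }
};

// Original order: marking is checked first, so a traversal cycle gets SATB
// turned off twice.
inline int cancel_marking_first(StateSketch s) {
  if (s.marking_in_progress())   s.stop_marking();
  if (s.traversal_in_progress()) s.stop_traversal();
  return s.double_deactivations;
}

// Reordered: traversal clears both bits, so the marking check no longer fires.
inline int cancel_traversal_first(StateSketch s) {
  if (s.traversal_in_progress()) s.stop_traversal();
  if (s.marking_in_progress())   s.stop_marking();
  return s.double_deactivations;
}
```

In this model, plain concurrent marking behaves the same under both orders;
only the overlapping state is sensitive to the order of the checks.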

Differential:
http://cr.openjdk.java.net/~rkennke/traversal/webrev.04.diff/
Full:
http://cr.openjdk.java.net/~rkennke/traversal/webrev.04/

Testing: hotspot_gc_shenandoah passes

Roman

From shade at redhat.com  Tue Jan 23 10:36:34 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Tue, 23 Jan 2018 11:36:34 +0100
Subject: Bug: -XX:-ShenandoahWriteBarrierMemBar crashes XmlTransform
In-Reply-To: <c48141a0-a6f4-53da-dd7e-24eaa3fc09f7@redhat.com>
References: <3c4a3faf-63a4-fa32-3f89-6718c4a6d459@redhat.com>
 <c48141a0-a6f4-53da-dd7e-24eaa3fc09f7@redhat.com>
Message-ID: <33a5abbc-5547-909f-f7af-a9754163a52e@redhat.com>

On 01/22/2018 11:16 PM, Roman Kennke wrote:
> Interestingly, I don't see it with the traversal patch. So maybe something in it fixes it, or the
> different graph shapes generated by traversal don't trigger it. Maybe try with the latest patch
> from the 'Traversal GC' thread?

Actually, it fails with the Traversal GC patch too, although much less often (intermittently). I see that
Traversal GC disables some WB-related optimizations with do_evac flags, but it seems the graph is
still incorrect and it fails.

#  Internal Error (/home/shade/trunks/shenandoah-jdk10/src/hotspot/share/opto/loopopts.cpp:1537),
pid=61675, tid=61700
#  Error: assert(b->is_Bool()) failed

V  [libjvm.so+0x1169dc6]  PhaseIdealLoop::clone_iff(PhiNode*, IdealLoopTree*)+0x86
V  [libjvm.so+0x116e10c]  PhaseIdealLoop::clone_loop(IdealLoopTree*, Node_List&, int,
PhaseIdealLoop::CloneLoopMode, Node*)+0x10ec
V  [libjvm.so+0x11444bc]  PhaseIdealLoop::create_slow_version_of_loop(IdealLoopTree*, Node_List&,
int, PhaseIdealLoop::CloneLoopMode)+0xcac
V  [libjvm.so+0x1149735]  PhaseIdealLoop::do_unswitching(IdealLoopTree*, Node_List&, bool)+0x125
V  [libjvm.so+0x113f263]  IdealLoopTree::iteration_split(PhaseIdealLoop*, Node_List&)+0x163
V  [libjvm.so+0x113f176]  IdealLoopTree::iteration_split(PhaseIdealLoop*, Node_List&)+0x76


Anyhow, it should be fixed before Traversal GC arrives, because the ShWBMemBar should be
independently backportable.

-Aleksey


From shade at redhat.com  Tue Jan 23 11:05:15 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Tue, 23 Jan 2018 12:05:15 +0100
Subject: RFR: Use properly-scoped FormatBuffers instead of err_msg when
 message is retained
Message-ID: <a342934e-e789-7eb9-3c3b-1b80d942644a@redhat.com>

http://cr.openjdk.java.net/~shade/shenandoah/formatbuffers/webrev.01/

It seems we cannot use err_msg the way we do it now, when its result is retained for later. In this
case, we have to use a properly-scoped FormatBuffer that has a clear lifetime. Otherwise, on some
platforms and compilers we have this:

[0.620s][info][gc] GC(0) Pause Init Mark 5.032ms
[0.661s][info][gc] GC(0) \u0008 1M->1M(7912M) 40.084ms
[0.708s][info][gc] GC(0) \u0008 47.164ms
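
A small illustration of the lifetime issue, as a sketch: FormatBufSketch is a
hypothetical stand-in for a fixed-size formatting buffer and is not the
HotSpot FormatBuffer class. The broken pattern is shown only as a comment,
because actually running it would be undefined behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical fixed-size formatting buffer that owns its storage.
class FormatBufSketch {
  char _buf[64];

public:
  FormatBufSketch(const char* phase, int cycle) {
    std::snprintf(_buf, sizeof(_buf), "%s %d", phase, cycle);
  }
  const char* buffer() const { return _buf; }
};

// BROKEN (kept as a comment on purpose):
//   const char* msg = FormatBufSketch("Concurrent marking", 1).buffer();
//   use(msg);  // the temporary is gone after the previous statement; msg dangles

// Scoped variant: the named buffer outlives every use of its contents.
inline std::size_t log_with_scoped_buffer(char* out, std::size_t out_len) {
  FormatBufSketch msg("Concurrent marking", 1);
  std::snprintf(out, out_len, "GC(0) %s", msg.buffer());
  return std::strlen(out);
}
```

Whether the dangling pointer still happens to show the old text depends on
what the compiler does with the dead stack slot, which would fit the "on some
platforms and compilers" observation.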

Testing: hotspot_gc_shenandoah, eyeballing the GC logs on failing configs

Thanks,
-Aleksey


From rkennke at redhat.com  Tue Jan 23 11:23:40 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Tue, 23 Jan 2018 12:23:40 +0100
Subject: RFR: Use properly-scoped FormatBuffers instead of err_msg when
 message is retained
In-Reply-To: <a342934e-e789-7eb9-3c3b-1b80d942644a@redhat.com>
References: <a342934e-e789-7eb9-3c3b-1b80d942644a@redhat.com>
Message-ID: <1AA42139-11CC-490C-B1D8-550CBBDD5D26@redhat.com>

Yes please

On 23 January 2018 12:05:15 CET, Aleksey Shipilev <shade at redhat.com> wrote:
>http://cr.openjdk.java.net/~shade/shenandoah/formatbuffers/webrev.01/
>
>It seems we cannot use err_msg the way we do it now, when its result is
>retained for later. In this case, we have to use a properly-scoped
>FormatBuffer that has a clear lifetime. Otherwise, on some platforms and
>compilers we have this:
>
>[0.620s][info][gc] GC(0) Pause Init Mark 5.032ms
>[0.661s][info][gc] GC(0) \u0008 1M->1M(7912M) 40.084ms
>[0.708s][info][gc] GC(0) \u0008 47.164ms
>
>Testing: hotspot_gc_shenandoah, eyeballing the GC logs on failing
>configs
>
>Thanks,
>-Aleksey

-- 
This message was sent from my Android device with K-9 Mail.

From ashipile at redhat.com  Tue Jan 23 11:39:30 2018
From: ashipile at redhat.com (ashipile at redhat.com)
Date: Tue, 23 Jan 2018 11:39:30 +0000
Subject: hg: shenandoah/jdk10: Use properly-scoped FormatBuffers instead of
 err_msg when message is retained
Message-ID: <201801231139.w0NBdVOk011467@aojmv0008.oracle.com>

Changeset: 6b22dfb1ca65
Author:    shade
Date:      2018-01-23 11:56 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk10/rev/6b22dfb1ca65

Use properly-scoped FormatBuffers instead of err_msg when message is retained

! src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp


From shade at redhat.com  Tue Jan 23 16:06:20 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Tue, 23 Jan 2018 17:06:20 +0100
Subject: RFR: Traveral GC heuristics
In-Reply-To: <75c6db4a-3236-dcd8-a4f4-67b78103129c@redhat.com>
References: <f0715fa7-b31f-8d1c-8015-22d1c64056ab@redhat.com>
 <e119d322-384a-9d30-35ef-c9dd352854c9@redhat.com>
 <b5e1b09e-514c-e344-a48c-e1cdf0c0726c@redhat.com>
 <9e18ba8d-593f-57a7-12cc-99cffe164a88@redhat.com>
 <c092efbe-d95d-3d0b-d1f2-adcb7c15414c@redhat.com>
 <46b64ff3-d2df-b2da-0805-5fa7cf8593f0@redhat.com>
 <5d9ea546-927b-28eb-0dd8-1ee8f4192862@redhat.com>
 <75c6db4a-3236-dcd8-a4f4-67b78103129c@redhat.com>
Message-ID: <ecd8e996-165e-476e-4e94-7720c6768cb3@redhat.com>

On 01/22/2018 11:17 PM, Roman Kennke wrote:
> - when growing the heap, we must make sure that the TAMS points to end for the new regions, 
> otherwise we'd treat them as implicitly marked.

Is this the source of this spooky change?

   inline ShenandoahHeapRegion* get(size_t i) const {
-    assert (i < _active_end, "sanity");
+    assert (i < _reserved_end, "sanity");
     return _regions[i];
   }

get() is supposed to only return added regions. I think if you return something beyond
_active_end, you read garbage.


> - folded 'SATB' queue processing into thread stack scanning. The problem here is that iterating 
> the threads 2x is cumbersome because of the claiming protocol: we need to fire the task/workers
> 2x: once for the SATB queues, once for the thread scanning. I folded it into one pass. This
> required a (trivial) extension in the upstream parallel thread scanning/iteration protocol.

So this is the source for extension of RootProcessor and Thread methods? I am a bit uneasy about
this (mostly because it raises backporting questions). I'd rather include this into RootProcessor
right away, and assert nothing passes non-NULL ThreadClosure there. Then, sh/jdk8u, sh/jdk9 and
sh/jdk10 versions would agree on the shape of RootProcessor methods and the calls to it, while
sh/jdk10 would call RootProcessor with non-NULL ThreadClosure, and it would *also* implement the
relevant parts in Thread.


> - I tripped an assert in SHR::increase_live_data(). I think the reason is that we have a race 
> here: a GC thread might not yet see the updated SHR::_top but already accounts for the updated 
> live data. I excluded conc-traversal from that check. This could probably be fixed by doing the
> proper concurrency membars, but do we care? For assertion code?

But wait, this change means "s" is greater than max_jint on some paths during Traversal?!

 inline void ShenandoahHeapRegion::increase_live_data_words(size_t s) {
-  assert (s <= (size_t)max_jint, "sanity");
+  assert (s <= (size_t)max_jint || _heap->is_concurrent_traversal_in_progress(), "sanity");
   increase_live_data_words((int)s);
 }
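
To make the concern concrete: the size_t overload narrows its argument to int, so any relaxation of
the assert lets an oversized value reach the cast. Below is a minimal standalone sketch of the range
check. INT_MAX stands in for max_jint, and the helper names fits_in_jint and narrow_live_words are
hypothetical, not part of the patch.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// Stand-in for HotSpot's max_jint; assumed equal to INT_MAX in this sketch.
static const std::size_t kMaxJint = static_cast<std::size_t>(INT_MAX);

// Hypothetical helper: true when s can be narrowed to int without loss.
inline bool fits_in_jint(std::size_t s) {
  return s <= kMaxJint;
}

// Narrow only after the range check has passed, mirroring the original assert.
inline int narrow_live_words(std::size_t s) {
  assert(fits_in_jint(s) && "sanity");
  return static_cast<int>(s);
}
```

With the "|| is_concurrent_traversal_in_progress()" escape hatch, the equivalent of fits_in_jint()
could be false while the cast still runs.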

Also, I am confused where Traversal calls increase_live_data_words(size_t), because both call sites
are already protected:

    if (!sh->is_concurrent_traversal_in_progress()) {
      r->increase_live_data_words(used_words);
    }

...

    if (!ShenandoahHeap::heap()->is_concurrent_traversal_in_progress()) {
      r->increase_live_data_words(word_size);
    }

> Full:
> http://cr.openjdk.java.net/~rkennke/traversal/webrev.04/

Since we have "adaptive" failures with C2 and/or -ShWBMemBar, I propose we chicken out, and drop all
C2 changes (apart from the actual enqueue_barrier) from this change, then follow up on optimization
story in subsequent changesets. This way we could integrate Traversal GC, and not risk immediate
regression in non-Traversal code.

This is done, along with other minor touchups here (apply over webrev.04):
  http://cr.openjdk.java.net/~shade/shenandoah/traversal-shade-updates-1.patch

Thanks,
-Aleksey


From rkennke at redhat.com  Tue Jan 23 17:23:49 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Tue, 23 Jan 2018 18:23:49 +0100
Subject: RFR: Traveral GC heuristics
In-Reply-To: <ecd8e996-165e-476e-4e94-7720c6768cb3@redhat.com>
References: <f0715fa7-b31f-8d1c-8015-22d1c64056ab@redhat.com>
 <e119d322-384a-9d30-35ef-c9dd352854c9@redhat.com>
 <b5e1b09e-514c-e344-a48c-e1cdf0c0726c@redhat.com>
 <9e18ba8d-593f-57a7-12cc-99cffe164a88@redhat.com>
 <c092efbe-d95d-3d0b-d1f2-adcb7c15414c@redhat.com>
 <46b64ff3-d2df-b2da-0805-5fa7cf8593f0@redhat.com>
 <5d9ea546-927b-28eb-0dd8-1ee8f4192862@redhat.com>
 <75c6db4a-3236-dcd8-a4f4-67b78103129c@redhat.com>
 <ecd8e996-165e-476e-4e94-7720c6768cb3@redhat.com>
Message-ID: <CEBE3BCC-7CC0-4F30-B6CA-08BD84F6E874@redhat.com>


On 23 January 2018 17:06:20 CET, Aleksey Shipilev <shade at redhat.com> wrote:
>On 01/22/2018 11:17 PM, Roman Kennke wrote:
>> - when growing the heap, we must make sure that the TAMS points to end for the new regions,
>> otherwise we'd treat them as implicitly marked.
>
>Is this the source of this spooky change?
>
>   inline ShenandoahHeapRegion* get(size_t i) const {
>-    assert (i < _active_end, "sanity");
>+    assert (i < _reserved_end, "sanity");
>     return _regions[i];
>   }
>
>get() is supposed to only return added regions. I think if you return
>something farther than _active_end, you read garbage.

I think all regions are initialized, but no memory allocated?


>> - folded 'SATB' queue processing into thread stack scanning. The problem here is that iterating
>> the threads 2x is cumbersome because of the claiming protocol: we need to fire the task/workers
>> 2x: once for the SATB queues, once for the thread scanning. I folded it into one pass. This
>> required a (trivial) extension in the upstream parallel thread scanning/iteration protocol.
>
>So this is the source for extension of RootProcessor and Thread methods?

Yes.

>I am a bit uneasy about this (mostly because it raises backporting questions). I'd rather
>include this into RootProcessor right away, and assert nothing passes non-NULL ThreadClosure there.

OK, can break that out of the patch.

>Then, sh/jdk8u, sh/jdk9 and sh/jdk10 versions would agree on the shape of RootProcessor methods
>and the calls to it, while sh/jdk10 would call RootProcessor with non-NULL ThreadClosure, and it
>would *also* implement the relevant parts in Thread.

Makes sense.

>> - I tripped an assert in SHR::increase_live_data(). I think the reason is that we have a race
>> here: a GC thread might not yet see the updated SHR::_top but already accounts for the updated
>> live data. I excluded conc-traversal from that check. This could probably be fixed by doing the
>> proper concurrency membars, but do we care? For assertion code?
>
>But wait, this change means "s" is greater than max_jint on some paths during Traversal?!
>
> inline void ShenandoahHeapRegion::increase_live_data_words(size_t s) {
>-  assert (s <= (size_t)max_jint, "sanity");
>+  assert (s <= (size_t)max_jint || _heap->is_concurrent_traversal_in_progress(), "sanity");
>   increase_live_data_words((int)s);
> }

Gah. No. Will fix it.

Stay tuned for updated patch.


>
>Also, I am confused where Traversal calls increase_live_data_words(size_t), because both call
>sites are already protected:
>
>    if (!sh->is_concurrent_traversal_in_progress()) {
>      r->increase_live_data_words(used_words);
>    }
>
>...
>
>    if (!ShenandoahHeap::heap()->is_concurrent_traversal_in_progress()) {
>      r->increase_live_data_words(word_size);
>    }
>
>> Full:
>> http://cr.openjdk.java.net/~rkennke/traversal/webrev.04/
>
>Since we have "adaptive" failures with C2 and/or -ShWBMemBar, I propose we chicken out, and drop
>all C2 changes (apart from the actual enqueue_barrier) from this change, then follow up on
>optimization story in subsequent changesets. This way we could integrate Traversal GC, and not
>risk immediate regression in non-Traversal code.
>
>This is done, along with other minor touchups here (apply over webrev.04):
>http://cr.openjdk.java.net/~shade/shenandoah/traversal-shade-updates-1.patch
>
>Thanks,
>-Aleksey

-- 
This message was sent from my Android device with K-9 Mail.

From rkennke at redhat.com  Tue Jan 23 20:24:03 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Tue, 23 Jan 2018 21:24:03 +0100
Subject: RFR: Add ShenandoahRootProcessor API to report threads while scanning
 roots
Message-ID: <430edf0d-8135-577b-2beb-629ac078d64e@redhat.com>

As discussed in the Traversal GC thread, this breaks out the 
ShenandoahRootProcessor API to report threads while scanning roots. It 
is not implemented here, and only asserts that the ThreadClosure* is 
NULL. All call-sites are updated to pass NULL.

The idea is to make backporting easier/less conflict-prone.
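
For readers following along, the shape of such a placeholder API might look roughly like the
sketch below. All types here are simplified stand-ins and RootProcessorSketch is a hypothetical
name; the real ShenandoahRootProcessor signatures are in the webrev and are not reproduced here.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in types; the real HotSpot closures are far richer.
struct OopClosure { int visited = 0; };
struct ThreadClosure { int threads_seen = 0; };

// Hypothetical root processor: the thread-reporting parameter is already
// part of the signature, but nothing implements it yet.
class RootProcessorSketch {
public:
  void process_roots(OopClosure* oops, ThreadClosure* thread_cl) {
    // Placeholder contract: callers must pass NULL until the feature lands.
    assert(thread_cl == nullptr && "thread reporting not implemented here");
    (void)thread_cl;   // silence unused-parameter warnings in release builds
    oops->visited++;   // stands in for the actual root scan
  }
};
```

Because every branch then shares the same method shape, a later change that passes a non-NULL
closure only touches the call sites that need it.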

http://cr.openjdk.java.net/~rkennke/root-proc-threads/webrev.00/

Test: hotspot_gc_shenandoah

Ok?

From shade at redhat.com  Tue Jan 23 20:32:28 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Tue, 23 Jan 2018 21:32:28 +0100
Subject: RFR: Add ShenandoahRootProcessor API to report threads while
 scanning roots
In-Reply-To: <430edf0d-8135-577b-2beb-629ac078d64e@redhat.com>
References: <430edf0d-8135-577b-2beb-629ac078d64e@redhat.com>
Message-ID: <eb6d4d24-4ce5-29fb-0337-220c9e5a4271@redhat.com>

On 01/23/2018 09:24 PM, Roman Kennke wrote:
> As discussed in the Traversal GC thread, this breaks out the ShenandoahRootProcessor API to report
> threads while scanning roots. It is not implemented here, and only asserts that the ThreadClosure*
> is NULL. All call-sites are updated to pass NULL.
> 
> The idea is to make backporting easier/less conflict-prone.
> 
> http://cr.openjdk.java.net/~rkennke/root-proc-threads/webrev.00/
> 
> Test: hotspot_gc_shenandoah
> 
> Ok?

OK!

-Aleksey


From roman at kennke.org  Tue Jan 23 20:38:24 2018
From: roman at kennke.org (roman at kennke.org)
Date: Tue, 23 Jan 2018 20:38:24 +0000
Subject: hg: shenandoah/jdk10: Add ShenandoahRootProcessor API to report
 threads while scanning roots
Message-ID: <201801232038.w0NKcOWu006989@aojmv0008.oracle.com>

Changeset: bd01b07ba0d7
Author:    rkennke
Date:      2018-01-23 21:20 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk10/rev/bd01b07ba0d7

Add ShenandoahRootProcessor API to report threads while scanning roots

! src/hotspot/share/gc/shenandoah/shenandoahConcurrentMark.cpp
! src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp
! src/hotspot/share/gc/shenandoah/shenandoahMarkCompact.cpp
! src/hotspot/share/gc/shenandoah/shenandoahPartialGC.cpp
! src/hotspot/share/gc/shenandoah/shenandoahRootProcessor.cpp
! src/hotspot/share/gc/shenandoah/shenandoahRootProcessor.hpp


From rkennke at redhat.com  Tue Jan 23 20:41:10 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Tue, 23 Jan 2018 21:41:10 +0100
Subject: RFR: Traveral GC heuristics
In-Reply-To: <ecd8e996-165e-476e-4e94-7720c6768cb3@redhat.com>
References: <f0715fa7-b31f-8d1c-8015-22d1c64056ab@redhat.com>
 <e119d322-384a-9d30-35ef-c9dd352854c9@redhat.com>
 <b5e1b09e-514c-e344-a48c-e1cdf0c0726c@redhat.com>
 <9e18ba8d-593f-57a7-12cc-99cffe164a88@redhat.com>
 <c092efbe-d95d-3d0b-d1f2-adcb7c15414c@redhat.com>
 <46b64ff3-d2df-b2da-0805-5fa7cf8593f0@redhat.com>
 <5d9ea546-927b-28eb-0dd8-1ee8f4192862@redhat.com>
 <75c6db4a-3236-dcd8-a4f4-67b78103129c@redhat.com>
 <ecd8e996-165e-476e-4e94-7720c6768cb3@redhat.com>
Message-ID: <a91ae285-5b2a-9d3f-61c3-e7ebd86ae233@redhat.com>

On 23.01.2018 at 17:06, Aleksey Shipilev wrote:
> On 01/22/2018 11:17 PM, Roman Kennke wrote:
>> - when growing the heap, we must make sure that the TAMS points to end for the new regions,
>> otherwise we'd treat them as implicitly marked.
> 
> Is this the source of this spooky change?
> 
>     inline ShenandoahHeapRegion* get(size_t i) const {
> -    assert (i < _active_end, "sanity");
> +    assert (i < _reserved_end, "sanity");
>       return _regions[i];
>     }
> 
> get() is supposed to only return added regions. I think if you return something farther than
> _active_end, you read garbage.

All SHRs are created and added to the regions list at startup, which
means iterating up to _active_end already does what I wanted. I reverted
that change.
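
A rough sketch of that arrangement, under the assumption that region objects exist for the whole
reserved range while only a prefix is active. RegionSetSketch, RegionSketch and the "committed"
flag are hypothetical names for illustration, not the actual ShenandoahHeapRegionSet code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified stand-in for a heap region descriptor.
struct RegionSketch { std::size_t index; bool committed; };

class RegionSetSketch {
  std::vector<RegionSketch> _regions;  // one entry per reserved region, built up front
  std::size_t _active_end;             // regions in [0, _active_end) are in use
public:
  RegionSetSketch(std::size_t reserved, std::size_t active) : _active_end(active) {
    assert(active <= reserved);
    for (std::size_t i = 0; i < reserved; i++) {
      _regions.push_back(RegionSketch{i, i < active});
    }
  }
  std::size_t active_regions() const { return _active_end; }
  std::size_t reserved_regions() const { return _regions.size(); }
  // Same bound as the original assert: only active regions are handed out.
  const RegionSketch* get(std::size_t i) const {
    assert(i < _active_end && "sanity");
    return &_regions[i];
  }
  // Growing the heap only moves the active bound; the descriptors already exist.
  void grow(std::size_t n) {
    assert(_active_end + n <= _regions.size());
    for (std::size_t i = _active_end; i < _active_end + n; i++) {
      _regions[i].committed = true;
    }
    _active_end += n;
  }
};
```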

>> - folded 'SATB' queue processing into thread stack scanning. The problem here is that iterating
>> the threads 2x is cumbersome because of the claiming protocol: we need to fire the task/workers
>> 2x: once for the SATB queues, once for the thread scanning. I folded it into one pass. This
>> required a (trivial) extension in the upstream parallel thread scanning/iteration protocol.
> 
> So this is the source for extension of RootProcessor and Thread methods? I am a bit uneasy about
> this (mostly because it raises backporting questions). I'd rather include this into RootProcessor
> right away, and assert nothing passes non-NULL ThreadClosure there. Then, sh/jdk8u, sh/jdk9 and
> sh/jdk10 versions would agree on the shape of RootProcessor methods and the calls to it, while
> sh/jdk10 would call RootProcessor with non-NULL ThreadClosure, and it would *also* implement the
> relevant parts in Thread.

Done in separate patch.

>> - I tripped an assert in SHR::increase_live_data(). I think the reason is that we have a race
>> here: a GC thread might not yet see the updated SHR::_top but already accounts for the updated
>> live data. I excluded conc-traversal from that check. This could probably be fixed by doing the
>> proper concurrency membars, but do we care? For assertion code?
> 
> But wait, this change means "s" is greater than max_jint on some paths during Traversal?!
> 
>   inline void ShenandoahHeapRegion::increase_live_data_words(size_t s) {
> -  assert (s <= (size_t)max_jint, "sanity");
> +  assert (s <= (size_t)max_jint || _heap->is_concurrent_traversal_in_progress(), "sanity");
>     increase_live_data_words((int)s);
>   }

I removed the change to that assert. We only need the other one.

> Also, I am confused where Traversal calls increase_live_data_words(size_t), because both call sites
> are already protected:
> 
>      if (!sh->is_concurrent_traversal_in_progress()) {
>        r->increase_live_data_words(used_words);
>      }
> 
> ...
> 
>      if (!ShenandoahHeap::heap()->is_concurrent_traversal_in_progress()) {
>        r->increase_live_data_words(word_size);
>      }

It's called from Traversal's own code.

> Since we have "adaptive" failures with C2 and/or -ShWBMemBar, I propose we chicken out, and drop all
> C2 changes (apart from the actual enqueue_barrier) from this change, then follow up on optimization
> story in subsequent changesets. This way we could integrate Traversal GC, and not risk immediate
> regression in non-Traversal code.
> 
> This is done, along with other minor touchups here (apply over webrev.04):
>    http://cr.openjdk.java.net/~shade/shenandoah/traversal-shade-updates-1.patch

Cool, thanks.

Differential patch:
http://cr.openjdk.java.net/~rkennke/traversal/webrev.05.diff/
Full patch, including your changes:
http://cr.openjdk.java.net/~rkennke/traversal/webrev.05/

(give it some seconds to fully upload)

Roman

From shade at redhat.com  Wed Jan 24 10:11:11 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 24 Jan 2018 11:11:11 +0100
Subject: RFR: Traveral GC heuristics
In-Reply-To: <a91ae285-5b2a-9d3f-61c3-e7ebd86ae233@redhat.com>
References: <f0715fa7-b31f-8d1c-8015-22d1c64056ab@redhat.com>
 <e119d322-384a-9d30-35ef-c9dd352854c9@redhat.com>
 <b5e1b09e-514c-e344-a48c-e1cdf0c0726c@redhat.com>
 <9e18ba8d-593f-57a7-12cc-99cffe164a88@redhat.com>
 <c092efbe-d95d-3d0b-d1f2-adcb7c15414c@redhat.com>
 <46b64ff3-d2df-b2da-0805-5fa7cf8593f0@redhat.com>
 <5d9ea546-927b-28eb-0dd8-1ee8f4192862@redhat.com>
 <75c6db4a-3236-dcd8-a4f4-67b78103129c@redhat.com>
 <ecd8e996-165e-476e-4e94-7720c6768cb3@redhat.com>
 <a91ae285-5b2a-9d3f-61c3-e7ebd86ae233@redhat.com>
Message-ID: <fee092d6-2754-5b30-4cbb-2f2ef8efde5b@redhat.com>

On 01/23/2018 09:41 PM, Roman Kennke wrote:
> Differential patch:
> http://cr.openjdk.java.net/~rkennke/traversal/webrev.05.diff/
> Full patch, including your changes:
> http://cr.openjdk.java.net/~rkennke/traversal/webrev.05/

Okay! This looks safe enough to push.

I have a minor question about why this is needed:

    case _degenerated_outside_cycle:
      if (shenandoahPolicy()->can_do_traversal_gc()) {
        // Not possible to degenerate from here, upgrade to Full GC right away.
        cancel_concgc(GCCause::_allocation_failure);
        op_degenerated_fail();
        return;
      }

Aren't we good with the usual Degenerated GC cycle here?

-Aleksey


From shade at redhat.com  Wed Jan 24 10:26:22 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 24 Jan 2018 11:26:22 +0100
Subject: RFR: Degenerated GC: shortcut cycles, upgrade futile cycles
Message-ID: <3283e088-cc9e-fe58-ae0e-10e40d41538b@redhat.com>

http://cr.openjdk.java.net/~shade/shenandoah/degenerated-gc-shortcuts/webrev.01/

This makes Degenerated GCs much less painful: they shortcut like concurrent cycle does, and they do
not try to do back-to-back degens when memory is not reclaimed.
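
The "upgrade futile cycles" part can be pictured roughly as below. This is only a sketch: the
function name, the NextGC enum and the min_progress threshold are hypothetical and are not taken
from the webrev, which may decide this differently.

```cpp
#include <cassert>
#include <cstddef>

enum class NextGC { None, Degenerated, Full };

// Hypothetical policy: if a degenerated cycle reclaimed less than
// min_progress bytes and allocation still fails, go to Full GC instead of
// running another degenerated cycle back-to-back.
inline NextGC after_degenerated(std::size_t used_before,
                                std::size_t used_after,
                                std::size_t min_progress,
                                bool allocation_still_failing) {
  if (!allocation_still_failing) {
    return NextGC::None;
  }
  // Guard against underflow when usage grew during the cycle.
  std::size_t reclaimed = used_before > used_after ? used_before - used_after : 0;
  return reclaimed < min_progress ? NextGC::Full : NextGC::Degenerated;
}
```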

Testing: hotspot_gc_shenandoah

Thanks,
-Aleksey


From shade at redhat.com  Wed Jan 24 11:01:23 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 24 Jan 2018 12:01:23 +0100
Subject: RFR: Log concurrent mark that updates references
Message-ID: <fc053ef8-e38c-4334-2d6a-786148194da5@redhat.com>

http://cr.openjdk.java.net/~shade/shenandoah/mark-message-ur/webrev.01/

Small follow-up, we can actually print if we are running CM-with-UR or not.

Testing: hotspot_fast_gc_shenandoah, eyeballing logs

Thanks,
-Aleksey


From rkennke at redhat.com  Wed Jan 24 11:12:38 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 24 Jan 2018 12:12:38 +0100
Subject: RFR: Traveral GC heuristics
In-Reply-To: <fee092d6-2754-5b30-4cbb-2f2ef8efde5b@redhat.com>
References: <f0715fa7-b31f-8d1c-8015-22d1c64056ab@redhat.com>
 <e119d322-384a-9d30-35ef-c9dd352854c9@redhat.com>
 <b5e1b09e-514c-e344-a48c-e1cdf0c0726c@redhat.com>
 <9e18ba8d-593f-57a7-12cc-99cffe164a88@redhat.com>
 <c092efbe-d95d-3d0b-d1f2-adcb7c15414c@redhat.com>
 <46b64ff3-d2df-b2da-0805-5fa7cf8593f0@redhat.com>
 <5d9ea546-927b-28eb-0dd8-1ee8f4192862@redhat.com>
 <75c6db4a-3236-dcd8-a4f4-67b78103129c@redhat.com>
 <ecd8e996-165e-476e-4e94-7720c6768cb3@redhat.com>
 <a91ae285-5b2a-9d3f-61c3-e7ebd86ae233@redhat.com>
 <fee092d6-2754-5b30-4cbb-2f2ef8efde5b@redhat.com>
Message-ID: <be0808fb-81d7-eb42-9cf0-e41da28fe808@redhat.com>

On 24.01.2018 at 11:11, Aleksey Shipilev wrote:
> On 01/23/2018 09:41 PM, Roman Kennke wrote:
>> Differential patch:
>> http://cr.openjdk.java.net/~rkennke/traversal/webrev.05.diff/
>> Full patch, including your changes:
>> http://cr.openjdk.java.net/~rkennke/traversal/webrev.05/
> 
> Okay! This looks safe enough to push.
> 
> I have a minor question about why this is needed:
> 
>     case _degenerated_outside_cycle:
>       if (shenandoahPolicy()->can_do_traversal_gc()) {
>         // Not possible to degenerate from here, upgrade to Full GC right away.
>         cancel_concgc(GCCause::_allocation_failure);
>         op_degenerated_fail();
>         return;
>       }
> 
> Aren't we good with the usual Degenerated GC cycle here?
> 
> -Aleksey
> 
> 
> 

The problem is that degen_outside_cycles goes into normal marking, and 
something's not up for that. I'm hitting asserts when I go there.

To be honest, I am also not happy to have all this heuristics-specific 
code/branches all over the place. Could this stuff be abstracted into 
heuristics API? I.e. driver thread calls into heuristics to do stuff 
(e.g. normal-degen, degen-outside-cycle, but also other stuff that is 
currently sprinkled over different places), and heuristics calls the 
right thing to take care of it?
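
A minimal sketch of what such a heuristics API could look like, assuming a virtual hook that the
driver thread calls. Every name here (HeuristicsSketch, on_degenerated, DegenPoint, Action) is
hypothetical; nothing like this exists in the patch under review.

```cpp
#include <cassert>

// Hypothetical points at which a cycle may degenerate.
enum class DegenPoint { OutsideCycle, Mark, Evacuation, UpdateRefs };
enum class Action { RunDegenerated, UpgradeToFull };

// Hypothetical interface: the driver thread asks the heuristics what to do.
class HeuristicsSketch {
public:
  virtual ~HeuristicsSketch() {}
  virtual Action on_degenerated(DegenPoint point) const {
    (void)point;
    return Action::RunDegenerated;  // default: the usual degenerated cycle
  }
};

class TraversalHeuristicsSketch : public HeuristicsSketch {
public:
  Action on_degenerated(DegenPoint point) const override {
    // Mirrors the special case discussed here: no degeneration from outside a cycle.
    return point == DegenPoint::OutsideCycle ? Action::UpgradeToFull
                                             : Action::RunDegenerated;
  }
};

// The driver stays free of heuristics-specific branches.
inline Action drive(const HeuristicsSketch& h, DegenPoint p) {
  return h.on_degenerated(p);
}
```

The design choice is simply to move the "if traversal then ..." branch behind a virtual call, so
the driver code does not need to know which heuristics is active.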

Roman


From rkennke at redhat.com  Wed Jan 24 11:14:16 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 24 Jan 2018 12:14:16 +0100
Subject: RFR: Degenerated GC: shortcut cycles, upgrade futile cycles
In-Reply-To: <3283e088-cc9e-fe58-ae0e-10e40d41538b@redhat.com>
References: <3283e088-cc9e-fe58-ae0e-10e40d41538b@redhat.com>
Message-ID: <a5750b9c-9cb8-f7dc-f29c-81032fb19974@redhat.com>

On 24.01.2018 at 11:26, Aleksey Shipilev wrote:
> http://cr.openjdk.java.net/~shade/shenandoah/degenerated-gc-shortcuts/webrev.01/
> 
> This makes Degenerated GCs much less painful: they shortcut like concurrent cycle does, and they do
> not try to do back-to-back degens when memory is not reclaimed.
> 
> Testing: hotspot_gc_shenandoah
> 
> Thanks,
> -Aleksey
> 
Ok

From rkennke at redhat.com  Wed Jan 24 11:14:25 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 24 Jan 2018 12:14:25 +0100
Subject: RFR: Log concurrent mark that updates references
In-Reply-To: <fc053ef8-e38c-4334-2d6a-786148194da5@redhat.com>
References: <fc053ef8-e38c-4334-2d6a-786148194da5@redhat.com>
Message-ID: <dc3279f6-519b-e631-b13f-e0a80b7c522d@redhat.com>

On 24.01.2018 at 12:01, Aleksey Shipilev wrote:
> http://cr.openjdk.java.net/~shade/shenandoah/mark-message-ur/webrev.01/
> 
> Small follow-up, we can actually print if we are running CM-with-UR or not.
> 
> Testing: hotspot_fast_gc_shenandoah, eyeballing logs
> 
> Thanks,
> -Aleksey
> 
Ok


From shade at redhat.com  Wed Jan 24 11:14:47 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 24 Jan 2018 12:14:47 +0100
Subject: RFR: Traveral GC heuristics
In-Reply-To: <be0808fb-81d7-eb42-9cf0-e41da28fe808@redhat.com>
References: <f0715fa7-b31f-8d1c-8015-22d1c64056ab@redhat.com>
 <e119d322-384a-9d30-35ef-c9dd352854c9@redhat.com>
 <b5e1b09e-514c-e344-a48c-e1cdf0c0726c@redhat.com>
 <9e18ba8d-593f-57a7-12cc-99cffe164a88@redhat.com>
 <c092efbe-d95d-3d0b-d1f2-adcb7c15414c@redhat.com>
 <46b64ff3-d2df-b2da-0805-5fa7cf8593f0@redhat.com>
 <5d9ea546-927b-28eb-0dd8-1ee8f4192862@redhat.com>
 <75c6db4a-3236-dcd8-a4f4-67b78103129c@redhat.com>
 <ecd8e996-165e-476e-4e94-7720c6768cb3@redhat.com>
 <a91ae285-5b2a-9d3f-61c3-e7ebd86ae233@redhat.com>
 <fee092d6-2754-5b30-4cbb-2f2ef8efde5b@redhat.com>
 <be0808fb-81d7-eb42-9cf0-e41da28fe808@redhat.com>
Message-ID: <a450ec3b-b67c-7453-74df-5fbb7e4b8ccb@redhat.com>

On 01/24/2018 12:12 PM, Roman Kennke wrote:
> On 24.01.2018 at 11:11, Aleksey Shipilev wrote:
>> On 01/23/2018 09:41 PM, Roman Kennke wrote:
>>> Differential patch:
>>> http://cr.openjdk.java.net/~rkennke/traversal/webrev.05.diff/
>>> Full patch, including your changes:
>>> http://cr.openjdk.java.net/~rkennke/traversal/webrev.05/
>>
>> Okay! This looks safe enough to push.
>>
>> I have a minor question about why this is needed:
>>
>>     case _degenerated_outside_cycle:
>>       if (shenandoahPolicy()->can_do_traversal_gc()) {
>>         // Not possible to degenerate from here, upgrade to Full GC right away.
>>         cancel_concgc(GCCause::_allocation_failure);
>>         op_degenerated_fail();
>>         return;
>>       }
>>
>> Aren't we good with the usual Degenerated GC cycle here?
>>
> 
> The problem is that degen_outside_cycles goes into normal marking, and something's not up for that.
> I'm hitting asserts when I go there.

That probably indicates a bug? Traversal GC ought to leave the heap in a state that is ready
for the usual concurrent cycle, no?

> To be honest, I am also not happy to have all this heuristics-specific code/branches all over the
> place. Could this stuff be abstracted into heuristics API? I.e. driver thread calls into heuristics
> to do stuff (e.g. normal-degen, degen-outside-cycle, but also other stuff that is currently
> sprinkled over different places), and heuristics calls the right thing to take care of it?

Baby steps...

-Aleksey


From rkennke at redhat.com  Wed Jan 24 11:17:05 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 24 Jan 2018 12:17:05 +0100
Subject: RFR: Traveral GC heuristics
In-Reply-To: <a450ec3b-b67c-7453-74df-5fbb7e4b8ccb@redhat.com>
References: <f0715fa7-b31f-8d1c-8015-22d1c64056ab@redhat.com>
 <e119d322-384a-9d30-35ef-c9dd352854c9@redhat.com>
 <b5e1b09e-514c-e344-a48c-e1cdf0c0726c@redhat.com>
 <9e18ba8d-593f-57a7-12cc-99cffe164a88@redhat.com>
 <c092efbe-d95d-3d0b-d1f2-adcb7c15414c@redhat.com>
 <46b64ff3-d2df-b2da-0805-5fa7cf8593f0@redhat.com>
 <5d9ea546-927b-28eb-0dd8-1ee8f4192862@redhat.com>
 <75c6db4a-3236-dcd8-a4f4-67b78103129c@redhat.com>
 <ecd8e996-165e-476e-4e94-7720c6768cb3@redhat.com>
 <a91ae285-5b2a-9d3f-61c3-e7ebd86ae233@redhat.com>
 <fee092d6-2754-5b30-4cbb-2f2ef8efde5b@redhat.com>
 <be0808fb-81d7-eb42-9cf0-e41da28fe808@redhat.com>
 <a450ec3b-b67c-7453-74df-5fbb7e4b8ccb@redhat.com>
Message-ID: <7273cfa6-7b01-af08-db88-43724cd4570f@redhat.com>

On 24.01.2018 at 12:14, Aleksey Shipilev wrote:
> On 01/24/2018 12:12 PM, Roman Kennke wrote:
>> On 24.01.2018 at 11:11, Aleksey Shipilev wrote:
>>> On 01/23/2018 09:41 PM, Roman Kennke wrote:
>>>> Differential patch:
>>>> http://cr.openjdk.java.net/~rkennke/traversal/webrev.05.diff/
>>>> Full patch, including your changes:
>>>> http://cr.openjdk.java.net/~rkennke/traversal/webrev.05/
>>>
>>> Okay! This looks safe enough to push.
>>>
>>> I have a minor question about why this is needed:
>>>
>>>     case _degenerated_outside_cycle:
>>>       if (shenandoahPolicy()->can_do_traversal_gc()) {
>>>         // Not possible to degenerate from here, upgrade to Full GC right away.
>>>         cancel_concgc(GCCause::_allocation_failure);
>>>         op_degenerated_fail();
>>>         return;
>>>       }
>>>
>>> Aren't we good with the usual Degenerated GC cycle here?
>>>
>>
>> The problem is that degen_outside_cycles goes into normal marking, and something's not up for that.
>> I'm hitting asserts when I go there.
> 
> That probably indicates a bug? Traversal GC ought to leave the heap in a state that is ready
> for the usual concurrent cycle, no?

In the middle of traversal GC? I don't know... Also, we cannot, in any
case, expect a *concurrent* cycle to work: we don't have the barriers for
that. Theoretically, we could do a STW normal cycle, but what would be the
point? I'd rather have a STW degen traversal pickup.

>> To be honest, I am also not happy to have all this heuristics-specific code/branches all over the
>> place. Could this stuff be abstracted into heuristics API? I.e. driver thread calls into heuristics
>> to do stuff (e.g. normal-degen, degen-outside-cycle, but also other stuff that is currently
>> sprinkled over different places), and heuristics calls the right thing to take care of it?
> 
> Baby steps...

;-)

From shade at redhat.com  Wed Jan 24 11:19:29 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 24 Jan 2018 12:19:29 +0100
Subject: RFR: Traveral GC heuristics
In-Reply-To: <7273cfa6-7b01-af08-db88-43724cd4570f@redhat.com>
References: <f0715fa7-b31f-8d1c-8015-22d1c64056ab@redhat.com>
 <e119d322-384a-9d30-35ef-c9dd352854c9@redhat.com>
 <b5e1b09e-514c-e344-a48c-e1cdf0c0726c@redhat.com>
 <9e18ba8d-593f-57a7-12cc-99cffe164a88@redhat.com>
 <c092efbe-d95d-3d0b-d1f2-adcb7c15414c@redhat.com>
 <46b64ff3-d2df-b2da-0805-5fa7cf8593f0@redhat.com>
 <5d9ea546-927b-28eb-0dd8-1ee8f4192862@redhat.com>
 <75c6db4a-3236-dcd8-a4f4-67b78103129c@redhat.com>
 <ecd8e996-165e-476e-4e94-7720c6768cb3@redhat.com>
 <a91ae285-5b2a-9d3f-61c3-e7ebd86ae233@redhat.com>
 <fee092d6-2754-5b30-4cbb-2f2ef8efde5b@redhat.com>
 <be0808fb-81d7-eb42-9cf0-e41da28fe808@redhat.com>
 <a450ec3b-b67c-7453-74df-5fbb7e4b8ccb@redhat.com>
 <7273cfa6-7b01-af08-db88-43724cd4570f@redhat.com>
Message-ID: <407a3385-6981-549f-6aff-f0362cb5d3f3@redhat.com>

On 01/24/2018 12:17 PM, Roman Kennke wrote:
> On 24.01.2018 at 12:14, Aleksey Shipilev wrote:
>> On 01/24/2018 12:12 PM, Roman Kennke wrote:
>>> On 24.01.2018 at 11:11, Aleksey Shipilev wrote:
>>>> On 01/23/2018 09:41 PM, Roman Kennke wrote:
>>>>> Differential patch:
>>>>> http://cr.openjdk.java.net/~rkennke/traversal/webrev.05.diff/
>>>>> Full patch, including your changes:
>>>>> http://cr.openjdk.java.net/~rkennke/traversal/webrev.05/
>>>>
>>>> Okay! This looks safe enough to push.
>>>>
>>>> I have a minor question about why this is needed:
>>>>
>>>>     case _degenerated_outside_cycle:
>>>>       if (shenandoahPolicy()->can_do_traversal_gc()) {
>>>>         // Not possible to degenerate from here, upgrade to Full GC right away.
>>>>         cancel_concgc(GCCause::_allocation_failure);
>>>>         op_degenerated_fail();
>>>>         return;
>>>>       }
>>>>
>>>> Aren't we good with the usual Degenerated GC cycle here?
>>>>
>>>
>>> The problem is that degen_outside_cycles goes into normal marking, and something's not up for that.
>>> I'm hitting asserts when I go there.
>>
>> That probably indicates a bug? Traversal GC ought to leave the heap in a state that is ready
>> for the usual concurrent cycle, no?
> 
> In the middle of traversal GC? I don't know... Also, we cannot, in any case, expect a *concurrent*
> cycle to work: we don't have the barriers for that. Theoretically, we could do a STW normal cycle,
> but what would be the point? I'd rather have a STW degen traversal pickup.

"outside cycle" means you are out of Traversal GC already -- that means outside the *complete*
cycle. So, here is where Traversal differs from Partial? Partial may be followed by the normal
concurrent cycle, and Traversal can only run Traversals?

-Aleksey


From roman at kennke.org  Wed Jan 24 11:21:29 2018
From: roman at kennke.org (Roman Kennke)
Date: Wed, 24 Jan 2018 12:21:29 +0100
Subject: RFR: Traveral GC heuristics
In-Reply-To: <407a3385-6981-549f-6aff-f0362cb5d3f3@redhat.com>
References: <f0715fa7-b31f-8d1c-8015-22d1c64056ab@redhat.com>
 <e119d322-384a-9d30-35ef-c9dd352854c9@redhat.com>
 <b5e1b09e-514c-e344-a48c-e1cdf0c0726c@redhat.com>
 <9e18ba8d-593f-57a7-12cc-99cffe164a88@redhat.com>
 <c092efbe-d95d-3d0b-d1f2-adcb7c15414c@redhat.com>
 <46b64ff3-d2df-b2da-0805-5fa7cf8593f0@redhat.com>
 <5d9ea546-927b-28eb-0dd8-1ee8f4192862@redhat.com>
 <75c6db4a-3236-dcd8-a4f4-67b78103129c@redhat.com>
 <ecd8e996-165e-476e-4e94-7720c6768cb3@redhat.com>
 <a91ae285-5b2a-9d3f-61c3-e7ebd86ae233@redhat.com>
 <fee092d6-2754-5b30-4cbb-2f2ef8efde5b@redhat.com>
 <be0808fb-81d7-eb42-9cf0-e41da28fe808@redhat.com>
 <a450ec3b-b67c-7453-74df-5fbb7e4b8ccb@redhat.com>
 <7273cfa6-7b01-af08-db88-43724cd4570f@redhat.com>
 <407a3385-6981-549f-6aff-f0362cb5d3f3@redhat.com>
Message-ID: <E4CA6C68-CFB7-4A2A-BEE0-196563BBEF03@kennke.org>


On 24 January 2018 12:19:29 CET, Aleksey Shipilev <shade at redhat.com> wrote:
>On 01/24/2018 12:17 PM, Roman Kennke wrote:
>> On 24.01.2018 at 12:14, Aleksey Shipilev wrote:
>>> On 01/24/2018 12:12 PM, Roman Kennke wrote:
>>>> On 24.01.2018 at 11:11, Aleksey Shipilev wrote:
>>>>> On 01/23/2018 09:41 PM, Roman Kennke wrote:
>>>>>> Differential patch:
>>>>>> http://cr.openjdk.java.net/~rkennke/traversal/webrev.05.diff/
>>>>>> Full patch, including your changes:
>>>>>> http://cr.openjdk.java.net/~rkennke/traversal/webrev.05/
>>>>>
>>>>> Okay! This looks safe enough to push.
>>>>>
>>>>> I have a minor question about why this is needed:
>>>>>
>>>>>     case _degenerated_outside_cycle:
>>>>>       if (shenandoahPolicy()->can_do_traversal_gc()) {
>>>>>         // Not possible to degenerate from here, upgrade to Full GC right away.
>>>>>         cancel_concgc(GCCause::_allocation_failure);
>>>>>         op_degenerated_fail();
>>>>>         return;
>>>>>       }
>>>>>
>>>>> Aren't we good with the usual Degenerated GC cycle here?
>>>>>
>>>>
>>>> The problem is that degen_outside_cycles goes into normal marking, and something's not up for
>>>> that. I'm hitting asserts when I go there.
>>>
>>> That probably indicates a bug? Traversal GC ought to leave the heap in a state that is ready
>>> for the usual concurrent cycle, no?
>> 
>> In the middle of traversal GC? I don't know... Also, we cannot, in any case, expect a
>> *concurrent* cycle to work: we don't have the barriers for that. Theoretically, we could
>> do a STW normal cycle, but what would be the point? I'd rather have a STW degen traversal pickup.
>
>"outside cycle" means you are out of Traversal GC already -- that means
>outside the *complete*
>cycle. So, here is where Traversal differs from Partial? Partial may be
>followed by the normal
>concurrent cycle, and Traversal can only run Traversals?

Yes, exactly. Traversal is not a minor GC like partial would be. It is much like the normal concurrent GC.

-- 
This message was sent from my Android device with K-9 Mail.

From roman at kennke.org  Wed Jan 24 13:03:20 2018
From: roman at kennke.org (roman at kennke.org)
Date: Wed, 24 Jan 2018 13:03:20 +0000
Subject: hg: shenandoah/jdk10: Traversal GC heuristics
Message-ID: <201801241303.w0OD3K3C000435@aojmv0008.oracle.com>

Changeset: 36640d8dec5f
Author:    rkennke
Date:      2018-01-24 13:57 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk10/rev/36640d8dec5f

Traversal GC heuristics

! make/hotspot/lib/JvmOverrideFiles.gmk
! src/hotspot/cpu/x86/c1_Runtime1_x86.cpp
! src/hotspot/cpu/x86/macroAssembler_x86.cpp
! src/hotspot/cpu/x86/shenandoahBarrierSet_x86.cpp
! src/hotspot/cpu/x86/stubGenerator_x86_64.cpp
! src/hotspot/cpu/x86/templateTable_x86.cpp
! src/hotspot/share/c1/c1_LIR.hpp
! src/hotspot/share/c1/c1_LIRGenerator.cpp
! src/hotspot/share/gc/shared/barrierSet.hpp
! src/hotspot/share/gc/shared/gcCause.cpp
! src/hotspot/share/gc/shared/gcCause.hpp
! src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.cpp
! src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.hpp
! src/hotspot/share/gc/shenandoah/shenandoahCollectorPolicy.cpp
! src/hotspot/share/gc/shenandoah/shenandoahCollectorPolicy.hpp
! src/hotspot/share/gc/shenandoah/shenandoahConcurrentMark.cpp
! src/hotspot/share/gc/shenandoah/shenandoahConcurrentThread.cpp
! src/hotspot/share/gc/shenandoah/shenandoahConcurrentThread.hpp
! src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp
! src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp
! src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp
! src/hotspot/share/gc/shenandoah/shenandoahHeap.inline.hpp
! src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.inline.hpp
! src/hotspot/share/gc/shenandoah/shenandoahMarkCompact.cpp
! src/hotspot/share/gc/shenandoah/shenandoahOopClosures.hpp
! src/hotspot/share/gc/shenandoah/shenandoahOopClosures.inline.hpp
! src/hotspot/share/gc/shenandoah/shenandoahPhaseTimings.cpp
! src/hotspot/share/gc/shenandoah/shenandoahPhaseTimings.hpp
! src/hotspot/share/gc/shenandoah/shenandoahRootProcessor.cpp
! src/hotspot/share/gc/shenandoah/shenandoahRootProcessor.hpp
+ src/hotspot/share/gc/shenandoah/shenandoahTraversalGC.cpp
+ src/hotspot/share/gc/shenandoah/shenandoahTraversalGC.hpp
+ src/hotspot/share/gc/shenandoah/shenandoahTraversalGC.inline.hpp
! src/hotspot/share/gc/shenandoah/shenandoahUtils.hpp
! src/hotspot/share/gc/shenandoah/shenandoahVerifier.cpp
! src/hotspot/share/gc/shenandoah/shenandoahVerifier.hpp
! src/hotspot/share/gc/shenandoah/shenandoahWorkerPolicy.cpp
! src/hotspot/share/gc/shenandoah/shenandoahWorkerPolicy.hpp
! src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp
! src/hotspot/share/gc/shenandoah/shenandoah_specialized_oop_closures.hpp
! src/hotspot/share/gc/shenandoah/vm_operations_shenandoah.cpp
! src/hotspot/share/gc/shenandoah/vm_operations_shenandoah.hpp
! src/hotspot/share/opto/graphKit.cpp
! src/hotspot/share/opto/graphKit.hpp
! src/hotspot/share/runtime/sharedRuntime.cpp
! src/hotspot/share/runtime/thread.cpp
! src/hotspot/share/runtime/thread.hpp
! src/hotspot/share/runtime/vm_operations.hpp
! test/hotspot/jtreg/gc/shenandoah/LotsOfCycles.java
! test/hotspot/jtreg/gc/shenandoah/ShenandoahStrDedupStress.java
! test/hotspot/jtreg/gc/shenandoah/TestGCThreadGroups.java
! test/hotspot/jtreg/gc/shenandoah/TestPeriodicGC.java
! test/hotspot/jtreg/gc/shenandoah/TestRegionSampling.java
! test/hotspot/jtreg/gc/shenandoah/TestSelectiveBarrierFlags.java
! test/hotspot/jtreg/gc/shenandoah/TestShenandoahStrDedup.java
! test/hotspot/jtreg/gc/shenandoah/acceptance/AllocHumongousFragment.java
! test/hotspot/jtreg/gc/shenandoah/acceptance/AllocIntArrays.java
! test/hotspot/jtreg/gc/shenandoah/acceptance/AllocObjectArrays.java
! test/hotspot/jtreg/gc/shenandoah/acceptance/AllocObjects.java
! test/hotspot/jtreg/gc/shenandoah/acceptance/HeapUncommit.java
! test/hotspot/jtreg/gc/shenandoah/acceptance/RetainObjects.java
! test/hotspot/jtreg/gc/shenandoah/acceptance/SieveObjects.java
! test/hotspot/jtreg/gc/shenandoah/acceptance/StringInternCleanup.java
! test/hotspot/jtreg/gc/shenandoah/options/TestHeuristicsUnlock.java
! test/hotspot/jtreg/gc/stress/gcbasher/TestGCBasherWithShenandoah.java
! test/hotspot/jtreg/gc/stress/gcold/TestGCOldWithShenandoah.java


From rkennke at redhat.com  Wed Jan 24 14:12:05 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 24 Jan 2018 15:12:05 +0100
Subject: RFR: Relax assert in SBS::is_safe()
Message-ID: <45c52e7d-8251-def1-3277-50494d7ed94e@redhat.com>

With traversal I am hitting the assert in SBS::is_safe() (through 
weakref discovery) because GC got cancelled and the obj is not in 
to-space. It is not a problem with conc-mark because there we don't evac 
during marking.

The fix is to fall through the in_cset() check when GC got cancelled, 
and check whether there is an actual copy.

http://cr.openjdk.java.net/~rkennke/fixissafe/webrev.00/

Testing: hotspot_gc_shenandoah now passes without the occasional failure described above.

Good?

Roman

From shade at redhat.com  Wed Jan 24 14:18:07 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 24 Jan 2018 15:18:07 +0100
Subject: RFR: Relax assert in SBS::is_safe()
In-Reply-To: <45c52e7d-8251-def1-3277-50494d7ed94e@redhat.com>
References: <45c52e7d-8251-def1-3277-50494d7ed94e@redhat.com>
Message-ID: <51e3ff18-cada-b81f-bffd-0c10b5e90e96@redhat.com>

On 01/24/2018 03:12 PM, Roman Kennke wrote:
> With traversal I am hitting the assert in SBS::is_safe() (through weakref discovery) because GC got
> cancelled and the obj is not in to-space. It is not a problem with conc-mark because there we don't
> evac during marking.
> 
> The fix is to fall-through the in_cset() check when GC got cancelled, and check if there is an
> actual copy.
> 
> http://cr.openjdk.java.net/~rkennke/fixissafe/webrev.00/

Makes sense.

Thanks,
-Aleksey


From roman at kennke.org  Wed Jan 24 14:26:14 2018
From: roman at kennke.org (roman at kennke.org)
Date: Wed, 24 Jan 2018 14:26:14 +0000
Subject: hg: shenandoah/jdk10: Relax assert in SBS::is_safe()
Message-ID: <201801241426.w0OEQE2i029361@aojmv0008.oracle.com>

Changeset: 3a6457fecc72
Author:    rkennke
Date:      2018-01-24 15:09 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk10/rev/3a6457fecc72

Relax assert in SBS::is_safe()

! src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.cpp


From ashipile at redhat.com  Wed Jan 24 14:35:13 2018
From: ashipile at redhat.com (ashipile at redhat.com)
Date: Wed, 24 Jan 2018 14:35:13 +0000
Subject: hg: shenandoah/jdk10: 2 new changesets
Message-ID: <201801241435.w0OEZDwH002674@aojmv0008.oracle.com>

Changeset: 15261c4a6adf
Author:    shade
Date:      2018-01-24 15:30 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk10/rev/15261c4a6adf

Degenerated GC: shortcut cycles, upgrade futile cycles

! src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp
! src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp

Changeset: 351efe4f6d40
Author:    shade
Date:      2018-01-24 15:30 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk10/rev/351efe4f6d40

Log concurrent mark that updates references

! src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp
! src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp


From shade at redhat.com  Wed Jan 24 16:31:53 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 24 Jan 2018 17:31:53 +0100
Subject: RFR: Fix Traversal GC regression
Message-ID: <67cd6e2f-7c1f-1281-ac35-1d2c0651274f@redhat.com>

After the Traversal GC commit, a normal cycle on Compiler.compiler fails within:

V  [libjvm.so+0x153c713]  oop ShenandoahHeap::evac_update_oop_ref<unsigned int>(unsigned int*,
bool&)+0x333
V  [libjvm.so+0x1539082]  ShenandoahBarrierSet::write_ref_array(HeapWord*, unsigned long)+0x852
V  [libjvm.so+0x133be02]  void ObjArrayKlass::do_copy<unsigned int>(arrayOop, unsigned int*,
arrayOop, unsigned int*, int, Thread*)+0x142
V  [libjvm.so+0x133932b]  ObjArrayKlass::copy_array(arrayOop, int, arrayOop, int, int, Thread*)+0x72b
V  [libjvm.so+0xf23325]  JVM_ArrayCopy+0x1e5

The troubling bit is why we even get here:

  inline void do_oop_work(T* p) {
    oop o;
    if (STOREVAL_WRITE_BARRIER) {
      bool evac;
      o = _heap->evac_update_oop_ref(p, evac); <--- ????
      if ((ALWAYS_ENQUEUE || evac) && !oopDesc::is_null(o)) {
        ShenandoahBarrierSet::enqueue(o);
      }
    } else {
      o = _heap->maybe_update_oop_ref(p);
    }
    if (UPDATE_MATRIX && !oopDesc::is_null(o)) {
      _heap->connection_matrix()->set_connected(p, o);
    }
  }

It happens because the condition in the selector is wrong:
  http://cr.openjdk.java.net/~shade/shenandoah/traversal-regr-1/webrev.01/

(Note the symmetry against the branch at L223.)

Testing: failing Compiler.compiler
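
For readers without the webrev at hand, here is a minimal standalone sketch of how such a selector works: a runtime decision picks one compile-time variant of the closure. Everything in it (Mode, Counts, UpdateRefsClosure, update_refs) is hypothetical, and the mapping "only traversal picks the storeval variant" is an assumption made for illustration. The actual condition is whatever the webrev contains.

```cpp
#include <cassert>

// Hypothetical GC modes; the real selector inspects heap state instead.
enum class Mode { Normal, Traversal };

// Counters standing in for the two heap calls in do_oop_work().
struct Counts {
  int evac_update = 0;   // models _heap->evac_update_oop_ref(p, evac)
  int maybe_update = 0;  // models _heap->maybe_update_oop_ref(p)
};

// The flag is a template parameter, so each instantiation keeps one branch.
template <bool STOREVAL_WRITE_BARRIER>
struct UpdateRefsClosure {
  Counts counts;
  void do_oop_work() {
    if (STOREVAL_WRITE_BARRIER) {
      ++counts.evac_update;
    } else {
      ++counts.maybe_update;
    }
  }
};

// Runs one closure variant over `slots` reference slots.
template <class Closure>
Counts apply(int slots) {
  Closure cl;
  for (int i = 0; i < slots; ++i) {
    cl.do_oop_work();
  }
  return cl.counts;
}

// The selector: maps the runtime mode to a compile-time variant.
// Assumed mapping for this sketch only: traversal gets the storeval variant.
inline Counts update_refs(Mode mode, int slots) {
  if (mode == Mode::Traversal) {
    return apply<UpdateRefsClosure<true> >(slots);
  }
  return apply<UpdateRefsClosure<false> >(slots);
}
```

With a wrong condition in update_refs(), a normal cycle would land in the evac_update path, which is the kind of symptom the stack trace shows.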

-Aleksey


From shade at redhat.com  Wed Jan 24 16:42:56 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 24 Jan 2018 17:42:56 +0100
Subject: RFR: Unsafe comparison in ShenandoahHeap::evac_update_oop_ref
Message-ID: <ca539b28-c69e-9faa-5bfb-cc17ebed0d42@redhat.com>

Traversal GC fails in ShenandoahHeap::evac_update_oop_ref when -XX:+VerifyStrictOopOperations is
enabled, because we need:

$ hg qdiff
diff -r 21c595539121 src/hotspot/share/gc/shenandoah/shenandoahHeap.inline.hpp
--- a/src/hotspot/share/gc/shenandoah/shenandoahHeap.inline.hpp	Wed Jan 24 17:32:27 2018 +0100
+++ b/src/hotspot/share/gc/shenandoah/shenandoahHeap.inline.hpp	Wed Jan 24 17:40:35 2018 +0100
@@ -151,7 +151,7 @@
         forwarded_oop = evacuate_object(heap_oop, Thread::current(), evac);
       }
       oop prev = atomic_compare_exchange_oop(forwarded_oop, p, heap_oop);
-      if (prev == heap_oop) {
+      if (oopDesc::unsafe_equals(prev, heap_oop)) {
         return forwarded_oop;
       } else {
         return NULL;

This actually affects partial too, which calls this method in SVWB.

Testing: failing test
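
A minimal sketch of the CAS-update pattern in the hunk above, written against std::atomic rather than HotSpot's atomic_compare_exchange_oop(). Obj, update_ref and demo_update_ref are hypothetical names. The sketch compares raw pointer values, which is my reading of what oopDesc::unsafe_equals() is there for, so treat that as an assumption.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for a heap object; not a HotSpot type.
struct Obj {
  int payload;
};

// Try to swing `slot` from `from` to `to`.
// Returns `to` if this call installed it, or nullptr if the slot no longer
// held `from` (someone else updated it first).
inline Obj* update_ref(std::atomic<Obj*>& slot, Obj* from, Obj* to) {
  Obj* expected = from;
  if (slot.compare_exchange_strong(expected, to)) {
    return to;
  }
  return nullptr;
}

// Self-check: the first update wins, the second one sees a changed slot.
inline bool demo_update_ref() {
  Obj old_copy{1}, new_copy{1}, other{2};
  std::atomic<Obj*> slot{&old_copy};
  bool first = (update_ref(slot, &old_copy, &new_copy) == &new_copy);
  bool second = (update_ref(slot, &old_copy, &other) == nullptr);
  return first && second && slot.load() == &new_copy;
}
```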

Thanks,
-Aleksey


From zgu at redhat.com  Wed Jan 24 16:45:26 2018
From: zgu at redhat.com (Zhengyu Gu)
Date: Wed, 24 Jan 2018 11:45:26 -0500
Subject: RFR: Unsafe comparison in ShenandoahHeap::evac_update_oop_ref
In-Reply-To: <ca539b28-c69e-9faa-5bfb-cc17ebed0d42@redhat.com>
References: <ca539b28-c69e-9faa-5bfb-cc17ebed0d42@redhat.com>
Message-ID: <ff4fda05-81ad-22cf-82c5-d9361adfa3ee@redhat.com>

Looks good.

-Zhengyu

On 01/24/2018 11:42 AM, Aleksey Shipilev wrote:
> Traversal GC fails in ShenandoahHeap::evac_update_oop_ref when -XX:+VerifyStrictOopOperations is
> enabled, because we need:
> 
> $ hg qdiff
> diff -r 21c595539121 src/hotspot/share/gc/shenandoah/shenandoahHeap.inline.hpp
> --- a/src/hotspot/share/gc/shenandoah/shenandoahHeap.inline.hpp	Wed Jan 24 17:32:27 2018 +0100
> +++ b/src/hotspot/share/gc/shenandoah/shenandoahHeap.inline.hpp	Wed Jan 24 17:40:35 2018 +0100
> @@ -151,7 +151,7 @@
>           forwarded_oop = evacuate_object(heap_oop, Thread::current(), evac);
>         }
>         oop prev = atomic_compare_exchange_oop(forwarded_oop, p, heap_oop);
> -      if (prev == heap_oop) {
> +      if (oopDesc::unsafe_equals(prev, heap_oop)) {
>           return forwarded_oop;
>         } else {
>           return NULL;
> 
> This actually affects partial too, which call this method in SVWB.
> 
> Testing: failing test
> 
> Thanks,
> -Aleksey
> 

From shade at redhat.com  Wed Jan 24 17:03:14 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 24 Jan 2018 18:03:14 +0100
Subject: RFR: Fix and rewrite update-refs barrier selector
In-Reply-To: <67cd6e2f-7c1f-1281-ac35-1d2c0651274f@redhat.com>
References: <67cd6e2f-7c1f-1281-ac35-1d2c0651274f@redhat.com>
Message-ID: <029d10b3-0e87-b192-c29d-f6aa2a86b6d4@redhat.com>

On 01/24/2018 05:31 PM, Aleksey Shipilev wrote:
> After Traversal GC commit, normal cycle on Compiler.compiler fails within:
> 
> V  [libjvm.so+0x153c713]  oop ShenandoahHeap::evac_update_oop_ref<unsigned int>(unsigned int*,
> bool&)+0x333
> V  [libjvm.so+0x1539082]  ShenandoahBarrierSet::write_ref_array(HeapWord*, unsigned long)+0x852
> V  [libjvm.so+0x133be02]  void ObjArrayKlass::do_copy<unsigned int>(arrayOop, unsigned int*,
> arrayOop, unsigned int*, int, Thread*)+0x142
> V  [libjvm.so+0x133932b]  ObjArrayKlass::copy_array(arrayOop, int, arrayOop, int, int, Thread*)+0x72b
> V  [libjvm.so+0xf23325]  JVM_ArrayCopy+0x1e5
> 
> The troubling bit is why do we even get here:
> 
>   inline void do_oop_work(T* p) {
>     oop o;
>     if (STOREVAL_WRITE_BARRIER) {
>       bool evac;
>       o = _heap->evac_update_oop_ref(p, evac); <--- ????
>       if ((ALWAYS_ENQUEUE || evac) && !oopDesc::is_null(o)) {
>         ShenandoahBarrierSet::enqueue(o);
>       }
>     } else {
>       o = _heap->maybe_update_oop_ref(p);
>     }
>     if (UPDATE_MATRIX && !oopDesc::is_null(o)) {
>       _heap->connection_matrix()->set_connected(p, o);
>     }
>   }
> 
> It happens because the condition in selector is wrong:
>   http://cr.openjdk.java.net/~shade/shenandoah/traversal-regr-1/webrev.01/

Actually, let's rewrite the damn fragile thing:
  http://cr.openjdk.java.net/~shade/shenandoah/traversal-regr-1/webrev.02/

-Aleksey


From shade at redhat.com  Wed Jan 24 18:31:58 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 24 Jan 2018 19:31:58 +0100
Subject: RFR: VerifyJCStressTest should test all heuristics
Message-ID: <8b4d3794-a0b3-4467-559e-ad55570572d8@redhat.com>

http://cr.openjdk.java.net/~shade/shenandoah/verify-jcstress-all/webrev.01/

We missed an unsafe oopDesc operation with traversal heuristics, because no test validates it.
This extends VerifyJCStressTest to all heuristics. (Passive excludes -XX:+ShVerifyOptoBarriers,
because the barrier config is odd there.)

Testing: hotspot_gc_shenandoah

Thanks,
-Aleksey


From rkennke at redhat.com  Wed Jan 24 20:54:27 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 24 Jan 2018 21:54:27 +0100
Subject: RFR: VerifyJCStressTest should test all heuristics
In-Reply-To: <8b4d3794-a0b3-4467-559e-ad55570572d8@redhat.com>
References: <8b4d3794-a0b3-4467-559e-ad55570572d8@redhat.com>
Message-ID: <CAAN-Kyg0RZjW+Zbz6ZU6jWMqKjwWar2kDiU0eg9RdfVEOh9PYg@mail.gmail.com>

On 24.01.2018 at 19:31, Aleksey Shipilev wrote:
> http://cr.openjdk.java.net/~shade/shenandoah/verify-jcstress-all/webrev.01/
>
> We have missed unsafe oopDesc operation with traversal heuristics, because no test validates it.
> Extended VerifyJCStressTest with all heuristics. (Passive excludes -XX:+ShVerifyOptoBarriers,
> because barrier config is odd there).
>
> Testing: hotspot_gc_shenandoah
>
> Thanks,
> -Aleksey
>

Yup

From rkennke at redhat.com  Wed Jan 24 20:55:17 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 24 Jan 2018 21:55:17 +0100
Subject: RFR: Fix and rewrite update-refs barrier selector
In-Reply-To: <029d10b3-0e87-b192-c29d-f6aa2a86b6d4@redhat.com>
References: <67cd6e2f-7c1f-1281-ac35-1d2c0651274f@redhat.com>
 <029d10b3-0e87-b192-c29d-f6aa2a86b6d4@redhat.com>
Message-ID: <CAAN-KyiH=q62_ST7Vw+uMNXh5xSKK8rdtJZA9MwZdc21Et65Ww@mail.gmail.com>

Good. Sorry for breaking it. Thanks for fixing!

On Wed, Jan 24, 2018 at 6:03 PM, Aleksey Shipilev <shade at redhat.com> wrote:

> On 01/24/2018 05:31 PM, Aleksey Shipilev wrote:
> > After Traversal GC commit, normal cycle on Compiler.compiler fails
> within:
> >
> > V  [libjvm.so+0x153c713]  oop ShenandoahHeap::evac_update_oop_ref<unsigned
> int>(unsigned int*,
> > bool&)+0x333
> > V  [libjvm.so+0x1539082]  ShenandoahBarrierSet::write_ref_array(HeapWord*,
> unsigned long)+0x852
> > V  [libjvm.so+0x133be02]  void ObjArrayKlass::do_copy<unsigned
> int>(arrayOop, unsigned int*,
> > arrayOop, unsigned int*, int, Thread*)+0x142
> > V  [libjvm.so+0x133932b]  ObjArrayKlass::copy_array(arrayOop, int,
> arrayOop, int, int, Thread*)+0x72b
> > V  [libjvm.so+0xf23325]  JVM_ArrayCopy+0x1e5
> >
> > The troubling bit is why do we even get here:
> >
> >   inline void do_oop_work(T* p) {
> >     oop o;
> >     if (STOREVAL_WRITE_BARRIER) {
> >       bool evac;
> >       o = _heap->evac_update_oop_ref(p, evac); <--- ????
> >       if ((ALWAYS_ENQUEUE || evac) && !oopDesc::is_null(o)) {
> >         ShenandoahBarrierSet::enqueue(o);
> >       }
> >     } else {
> >       o = _heap->maybe_update_oop_ref(p);
> >     }
> >     if (UPDATE_MATRIX && !oopDesc::is_null(o)) {
> >       _heap->connection_matrix()->set_connected(p, o);
> >     }
> >   }
> >
> > It happens because the condition in selector is wrong:
> >   http://cr.openjdk.java.net/~shade/shenandoah/traversal-
> regr-1/webrev.01/
>
> Actually, let's rewrite the damn fragile thing:
>   http://cr.openjdk.java.net/~shade/shenandoah/traversal-regr-1/webrev.02/
>
> -Aleksey
>
>
>

From rkennke at redhat.com  Wed Jan 24 20:55:47 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 24 Jan 2018 21:55:47 +0100
Subject: RFR: Unsafe comparison in ShenandoahHeap::evac_update_oop_ref
In-Reply-To: <ca539b28-c69e-9faa-5bfb-cc17ebed0d42@redhat.com>
References: <ca539b28-c69e-9faa-5bfb-cc17ebed0d42@redhat.com>
Message-ID: <CAAN-KygJF5+PVk3Co+=Gro=twQ-b1-S2oRS7NypFjpGmvD-gVg@mail.gmail.com>

Ugh. Please push it. Thanks for fixing.

On Wed, Jan 24, 2018 at 5:42 PM, Aleksey Shipilev <shade at redhat.com> wrote:

> Traversal GC fails in ShenandoahHeap::evac_update_oop_ref when
> -XX:+VerifyStrictOopOperations is
> enabled, because we need:
>
> $ hg qdiff
> diff -r 21c595539121 src/hotspot/share/gc/shenandoah/shenandoahHeap.
> inline.hpp
> --- a/src/hotspot/share/gc/shenandoah/shenandoahHeap.inline.hpp Wed Jan
> 24 17:32:27 2018 +0100
> +++ b/src/hotspot/share/gc/shenandoah/shenandoahHeap.inline.hpp Wed Jan
> 24 17:40:35 2018 +0100
> @@ -151,7 +151,7 @@
>          forwarded_oop = evacuate_object(heap_oop, Thread::current(),
> evac);
>        }
>        oop prev = atomic_compare_exchange_oop(forwarded_oop, p, heap_oop);
> -      if (prev == heap_oop) {
> +      if (oopDesc::unsafe_equals(prev, heap_oop)) {
>          return forwarded_oop;
>        } else {
>          return NULL;
>
> This actually affects partial too, which call this method in SVWB.
>
> Testing: failing test
>
> Thanks,
> -Aleksey
>
>

From ashipile at redhat.com  Wed Jan 24 21:01:02 2018
From: ashipile at redhat.com (ashipile at redhat.com)
Date: Wed, 24 Jan 2018 21:01:02 +0000
Subject: hg: shenandoah/jdk10: 3 new changesets
Message-ID: <201801242101.w0OL12wP018895@aojmv0008.oracle.com>

Changeset: d8a9b5bfb1bd
Author:    shade
Date:      2018-01-24 18:02 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk10/rev/d8a9b5bfb1bd

Fix and rewrite update-refs barrier selector

! src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.cpp

Changeset: 8437e22953c0
Author:    shade
Date:      2018-01-24 18:03 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk10/rev/8437e22953c0

Unsafe comparison in ShenandoahHeap::evac_update_oop_ref

! src/hotspot/share/gc/shenandoah/shenandoahHeap.inline.hpp

Changeset: 30e8ba6e2794
Author:    shade
Date:      2018-01-24 19:14 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk10/rev/30e8ba6e2794

VerifyJCStressTest should test all heuristics

! test/hotspot/jtreg/gc/shenandoah/acceptance/VerifyJCStressTest.java


From shade at redhat.com  Thu Jan 25 10:27:18 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Thu, 25 Jan 2018 11:27:18 +0100
Subject: RFR: ShBS::interpreter_storeval_barrier signature fix and cleanup
Message-ID: <39e49d42-8d3d-b59e-b7d3-4edf8020dae1@redhat.com>

http://cr.openjdk.java.net/~shade/shenandoah/shbs-storeval-fix/webrev.01/

sh/jdk10 aarch64 build fails with:

/pool/buildbot/slaves/sobornost/shenandoah-jdk10/build/src/hotspot/cpu/aarch64/shenandoahBarrierSet_aarch64.cpp:110:6:
error: prototype for ‘void ShenandoahBarrierSet::interpreter_storeval_barrier(MacroAssembler*,
Register)’ does not match any in class ‘ShenandoahBarrierSet’
 void ShenandoahBarrierSet::interpreter_storeval_barrier(MacroAssembler* masm, Register dst) {
      ^~~~~~~~~~~~~~~~~~~~

/pool/buildbot/slaves/sobornost/shenandoah-jdk10/build/src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.hpp:129:8:
error: candidate is: virtual void
ShenandoahBarrierSet::interpreter_storeval_barrier(MacroAssembler*, Register, Register, Register)
   void interpreter_storeval_barrier(MacroAssembler* masm, Register dst, Register tmp, Register thread);
        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~

This is because the argument lists for interpreter_storeval_barrier are messed up.

Testing: hotspot_fast_gc_shenandoah, builds on x86_64 and aarch64

Thanks,
-Aleksey


From rkennke at redhat.com  Thu Jan 25 12:29:09 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Thu, 25 Jan 2018 13:29:09 +0100
Subject: RFR: ShBS::interpreter_storeval_barrier signature fix and cleanup
In-Reply-To: <39e49d42-8d3d-b59e-b7d3-4edf8020dae1@redhat.com>
References: <39e49d42-8d3d-b59e-b7d3-4edf8020dae1@redhat.com>
Message-ID: <E1CE210C-DF75-4278-BE23-D25B0E6330A8@redhat.com>

Oops. I forgot to check aarch64 when doing traversal. Sorry.

The patch is fine. Thanks for fixing it!

Cheers, Roman

On 25 January 2018 11:27:18 CET, Aleksey Shipilev <shade at redhat.com> wrote:
>http://cr.openjdk.java.net/~shade/shenandoah/shbs-storeval-fix/webrev.01/
>
>sh/jdk10 aarch64 build fails with:
>
>/pool/buildbot/slaves/sobornost/shenandoah-jdk10/build/src/hotspot/cpu/aarch64/shenandoahBarrierSet_aarch64.cpp:110:6:
>error: prototype for ‘void
>ShenandoahBarrierSet::interpreter_storeval_barrier(MacroAssembler*,
>Register)’ does not match any in class ‘ShenandoahBarrierSet’
>void ShenandoahBarrierSet::interpreter_storeval_barrier(MacroAssembler*
>masm, Register dst) {
>      ^~~~~~~~~~~~~~~~~~~~
>
>/pool/buildbot/slaves/sobornost/shenandoah-jdk10/build/src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.hpp:129:8:
>error: candidate is: virtual void
>ShenandoahBarrierSet::interpreter_storeval_barrier(MacroAssembler*,
>Register, Register, Register)
>void interpreter_storeval_barrier(MacroAssembler* masm, Register dst,
>Register tmp, Register thread);
>        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
>This is because the argument lists for interpreter_storeval_barrier are
>messed up.
>
>Testing: hotspot_fast_gc_shenandoah, builds on x86_64 and aarch64
>
>Thanks,
>-Aleksey

-- 
This message was sent from my Android device with K-9 Mail.

From ashipile at redhat.com  Thu Jan 25 14:28:29 2018
From: ashipile at redhat.com (ashipile at redhat.com)
Date: Thu, 25 Jan 2018 14:28:29 +0000
Subject: hg: shenandoah/jdk10: ShBS::interpreter_storeval_barrier signature
 fix and cleanup
Message-ID: <201801251428.w0PESTXn015379@aojmv0008.oracle.com>

Changeset: 6183a72bd5c2
Author:    shade
Date:      2018-01-25 11:24 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk10/rev/6183a72bd5c2

ShBS::interpreter_storeval_barrier signature fix and cleanup

! src/hotspot/cpu/aarch64/shenandoahBarrierSet_aarch64.cpp
! src/hotspot/cpu/aarch64/templateTable_aarch64.cpp
! src/hotspot/cpu/x86/shenandoahBarrierSet_x86.cpp
! src/hotspot/cpu/x86/templateTable_x86.cpp
! src/hotspot/share/gc/shared/barrierSet.hpp
! src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.hpp


From zgu at redhat.com  Thu Jan 25 17:15:03 2018
From: zgu at redhat.com (Zhengyu Gu)
Date: Thu, 25 Jan 2018 12:15:03 -0500
Subject: RFR: Hole in CAS barrier when using traversal heuristics
Message-ID: <1163aca7-397b-0fe7-1f48-0eae2662ef4c@redhat.com>

I am not completely sure this is the right fix. There is a hole in the CAS barrier 
when using traversal heuristics.

E.g. Unsafe_CompareAndSetObject() evacuates the target and exchange objects, 
but not the field, so it may hit the assertion in ShenandoahBarrierSet::enqueue().

I could not come up with a reliable reproducer, but I have seen this a few 
times with specjvm ScimarkLU with options:

"-Xmx1g -Xms1g -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions 
-XX:+UnlockDiagnosticVMOptions -XX:ShenandoahGCHeuristics=traversal 
-Xlog:gc+stats"

Webrev: http://cr.openjdk.java.net/~zgu/shenandoah/cas_traversal/webrev.00/


Thanks,

-Zhengyu

From rkennke at redhat.com  Thu Jan 25 17:26:37 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Thu, 25 Jan 2018 18:26:37 +0100
Subject: RFR: Hole in CAS barrier when using traversal heuristics
In-Reply-To: <1163aca7-397b-0fe7-1f48-0eae2662ef4c@redhat.com>
References: <1163aca7-397b-0fe7-1f48-0eae2662ef4c@redhat.com>
Message-ID: <9c60eedc-e502-a9e0-c8ec-5409ba7c05e8@redhat.com>

On 25.01.2018 at 18:15, Zhengyu Gu wrote:
> I am not complete sure this is right fix. There is hole in CAS barrier 
> when using traversal heuristics.
> 
> E.g. Unsafe_CompareAndSetObject() evacuates target and exchange object, 
> but not the field, so it may hit assertion in ShenandoahBarrier::enqueue().
> 
> I could not come up a reliable reproducer, but I have seen this a few 
> time with specjvm ScimarkLU with options:
> 
> "-Xmx1g -Xms1g -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions 
> -XX:+UnlockDiagnosticVMOptions -XX:ShenandoahGCHeuristics=traversal 
> -Xlog:gc+stats"
> 
> Webrev: http://cr.openjdk.java.net/~zgu/shenandoah/cas_traversal/webrev.00/
> 
> 
> Thanks,
> 
> -Zhengyu

Hi Zhengyu,

I am not sure what you mean by 'evacuates target and exchange object,
but not the field' .. clearly the target object needs to be evacuated, 
because we only write to to-space (write-barrier). Also, the exchange 
object needs to be evacuated, to ensure we end up only with to-space 
references in fields (storeval-barrier). What do you mean by evacuation 
of 'the field' ? The target field is part of the target object.

The issue here seems to be that the usual 'pre-barrier' (i.e. 
SATB-barrier) should not be called at all. However, since we do set the 
MARKING bit, we still get into this code path. We might just want to 
check for traversal-in-progress and return right at the start of the method.
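
A rough sketch of what that early return might look like, as a standalone model rather than the real barrier code. GCState, pre_barrier and the queue type are hypothetical; only the idea (bail out first when traversal is in progress, even though the MARKING bit is set) comes from the paragraph above.

```cpp
#include <cassert>
#include <vector>

// Hypothetical GC state flags; traversal sets `marking` as well.
struct GCState {
  bool marking = false;
  bool traversal_in_progress = false;
};

// Models the SATB pre-barrier: records the previous field value while
// marking. Returns true if the value was enqueued.
inline bool pre_barrier(const GCState& state,
                        std::vector<const void*>& satb_queue,
                        const void* previous) {
  if (state.traversal_in_progress) {
    return false;  // suggested early return: no SATB work under traversal
  }
  if (!state.marking || previous == nullptr) {
    return false;
  }
  satb_queue.push_back(previous);
  return true;
}
```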

Roman

From zgu at redhat.com  Thu Jan 25 17:35:01 2018
From: zgu at redhat.com (Zhengyu Gu)
Date: Thu, 25 Jan 2018 12:35:01 -0500
Subject: RFR: Hole in CAS barrier when using traversal heuristics
In-Reply-To: <9c60eedc-e502-a9e0-c8ec-5409ba7c05e8@redhat.com>
References: <1163aca7-397b-0fe7-1f48-0eae2662ef4c@redhat.com>
 <9c60eedc-e502-a9e0-c8ec-5409ba7c05e8@redhat.com>
Message-ID: <acbff464-0b1d-a9c5-aa55-02de627fda24@redhat.com>


On 01/25/2018 12:26 PM, Roman Kennke wrote:
> On 25.01.2018 at 18:15, Zhengyu Gu wrote:
>> I am not complete sure this is right fix. There is hole in CAS barrier 
>> when using traversal heuristics.
>>
>> E.g. Unsafe_CompareAndSetObject() evacuates target and exchange 
>> object, but not the field, so it may hit assertion in 
>> ShenandoahBarrier::enqueue().
>>
>> I could not come up a reliable reproducer, but I have seen this a few 
>> time with specjvm ScimarkLU with options:
>>
>> "-Xmx1g -Xms1g -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions 
>> -XX:+UnlockDiagnosticVMOptions -XX:ShenandoahGCHeuristics=traversal 
>> -Xlog:gc+stats"
>>
>> Webrev: 
>> http://cr.openjdk.java.net/~zgu/shenandoah/cas_traversal/webrev.00/
>>
>>
>> Thanks,
>>
>> -Zhengyu
> 
> Hi Zhengyu,
> 
> I am not sure what you mean by 'evacuates target and exchange object,
> but not the field' .. clearly the target object needs to be evacuated, 
> because we only write to to-space (write-barrier). Also, the exchange 
> object needs to be evacuated, to ensure we end up only with to-space 
> references in fields (storeval-barrier). What do you mean by evacuation 
> of 'the field' ? The target field is part of the target object.

There is an example:
http://hg.openjdk.java.net/shenandoah/jdk10/file/6183a72bd5c2/src/hotspot/share/prims/unsafe.cpp#l1020

addr points to a field in the object, which might not be evacuated, and I 
think you do have to enqueue it, as the object may be gray.

Thanks,

-Zhengyu


> 
> The issue here seems to be that the usual 'pre-barrier' (i.e. 
> SATB-barrier) should not be called at all. However, since we do set the 
> MARKING bit, we still get into this code path. We might just want to 
> check for traversal-in-progress and return right at the start of the 
> method.
> 
> Roman

From shade at redhat.com  Thu Jan 25 17:41:30 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Thu, 25 Jan 2018 18:41:30 +0100
Subject: RFR: Fix 32-bit build by ifdef-ing non-implemented store-val barrier
Message-ID: <ee9bddf3-b72b-8173-1657-05cff83a34f1@redhat.com>

http://cr.openjdk.java.net/~shade/shenandoah/build/storeval-i586/webrev.01/

x86_32 build is broken because Traversal GC references 64-bit registers:

/home/shade/shenandoah-jdk10/src/hotspot/cpu/x86/shenandoahBarrierSet_x86.cpp: In member function
‘virtual void ShenandoahBarrierSet::interpreter_storeval_barrier(MacroAssembler*, Register, Register)’:
/home/shade/shenandoah-jdk10/src/hotspot/cpu/x86/shenandoahBarrierSet_x86.cpp:167:13: error:
‘c_rarg1’ was not declared in this scope
     __ push(c_rarg1);
             ^~~~~~~
/home/shade/shenandoah-jdk10/src/hotspot/cpu/x86/shenandoahBarrierSet_x86.cpp:171:41: error:
‘r15_thread’ was not declared in this scope
     __ g1_write_barrier_pre(noreg, dst, r15_thread, tmp, true, false);
                                         ^~~~~~~~~~

The way out is to ifdef the barrier, like we did with the interpreter_write_barrier_impl a few
blocks above.
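
The shape of such a guard, as a standalone sketch. FakeAssembler and emit_storeval_barrier are hypothetical, the recorded strings only echo the two lines from the error output, and what the real 32-bit branch does is not shown in this mail, so the "report not implemented" fallback here is an assumption.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for MacroAssembler: it only records what was emitted.
struct FakeAssembler {
  std::vector<std::string> code;
};

// Emits the barrier on 64-bit builds only. Returns false where the barrier
// is not implemented, so the 64-bit-only registers never reach the compiler
// on 32-bit builds.
inline bool emit_storeval_barrier(FakeAssembler& masm) {
#ifdef _LP64
  masm.code.push_back("push c_rarg1");
  masm.code.push_back("g1_write_barrier_pre via r15_thread");
  return true;
#else
  (void)masm;  // nothing emitted on 32-bit in this sketch
  return false;
#endif
}
```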

Testing: failing build

-Aleksey


From rkennke at redhat.com  Thu Jan 25 17:51:13 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Thu, 25 Jan 2018 18:51:13 +0100
Subject: RFR: Hole in CAS barrier when using traversal heuristics
In-Reply-To: <acbff464-0b1d-a9c5-aa55-02de627fda24@redhat.com>
References: <1163aca7-397b-0fe7-1f48-0eae2662ef4c@redhat.com>
 <9c60eedc-e502-a9e0-c8ec-5409ba7c05e8@redhat.com>
 <acbff464-0b1d-a9c5-aa55-02de627fda24@redhat.com>
Message-ID: <59070a56-11c2-b75b-6eae-21cdea01974e@redhat.com>

On 25.01.2018 at 18:35, Zhengyu Gu wrote:
> 
> 
> On 01/25/2018 12:26 PM, Roman Kennke wrote:
>> On 25.01.2018 at 18:15, Zhengyu Gu wrote:
>>> I am not complete sure this is right fix. There is hole in CAS 
>>> barrier when using traversal heuristics.
>>>
>>> E.g. Unsafe_CompareAndSetObject() evacuates target and exchange 
>>> object, but not the field, so it may hit assertion in 
>>> ShenandoahBarrier::enqueue().
>>>
>>> I could not come up a reliable reproducer, but I have seen this a few 
>>> time with specjvm ScimarkLU with options:
>>>
>>> "-Xmx1g -Xms1g -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions 
>>> -XX:+UnlockDiagnosticVMOptions -XX:ShenandoahGCHeuristics=traversal 
>>> -Xlog:gc+stats"
>>>
>>> Webrev: 
>>> http://cr.openjdk.java.net/~zgu/shenandoah/cas_traversal/webrev.00/
>>>
>>>
>>> Thanks,
>>>
>>> -Zhengyu
>>
>> Hi Zhengyu,
>>
>> I am not sure what you mean by 'evacuates target and exchange object,
>> but not the field' .. clearly the target object needs to be evacuated, 
>> because we only write to to-space (write-barrier). Also, the exchange 
>> object needs to be evacuated, to ensure we end up only with to-space 
>> references in fields (storeval-barrier). What do you mean by 
>> evacuation of 'the field' ? The target field is part of the target 
>> object.
> 
> There is an example:
> http://hg.openjdk.java.net/shenandoah/jdk10/file/6183a72bd5c2/src/hotspot/share/prims/unsafe.cpp#l1020 
> 
> 
> the addr points a field in object, that might not be evacuated and I 
> think you do have to enqueue it, as the object may be gray.

addr is a field in p, which is in to-space by the WB a few lines above. 
This should be good. x is the exchange value, also evacuated by the 
storeval barrier. So all should be fine.

What error/assert/crash are you seeing? Is it something in 
SBS::is_safe()? Then it may already be fixed by my subsequent changeset?

It may happen that traversal GC gets cancelled, and then we hit an 
overly strict assert like that.

Roman

From rkennke at redhat.com  Thu Jan 25 17:51:33 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Thu, 25 Jan 2018 18:51:33 +0100
Subject: RFR: Fix 32-bit build by ifdef-ing non-implemented store-val
 barrier
In-Reply-To: <ee9bddf3-b72b-8173-1657-05cff83a34f1@redhat.com>
References: <ee9bddf3-b72b-8173-1657-05cff83a34f1@redhat.com>
Message-ID: <d145bed7-65dd-1342-2b90-1a61f968b272@redhat.com>

On 25.01.2018 at 18:41, Aleksey Shipilev wrote:
> http://cr.openjdk.java.net/~shade/shenandoah/build/storeval-i586/webrev.01/
> 
> x86_32 build is broken because Traversal GC references 64-bit registers:
> 
> /home/shade/shenandoah-jdk10/src/hotspot/cpu/x86/shenandoahBarrierSet_x86.cpp: In member function
> ‘virtual void ShenandoahBarrierSet::interpreter_storeval_barrier(MacroAssembler*, Register, Register)’:
> /home/shade/shenandoah-jdk10/src/hotspot/cpu/x86/shenandoahBarrierSet_x86.cpp:167:13: error:
> ‘c_rarg1’ was not declared in this scope
>       __ push(c_rarg1);
>               ^~~~~~~
> /home/shade/shenandoah-jdk10/src/hotspot/cpu/x86/shenandoahBarrierSet_x86.cpp:171:41: error:
> ‘r15_thread’ was not declared in this scope
>       __ g1_write_barrier_pre(noreg, dst, r15_thread, tmp, true, false);
>                                           ^~~~~~~~~~
> 
> The way out is to ifdef the barrier, like we did with the interpreter_write_barrier_impl a few
> blocks above.
> 
> Testing: failing build
> 
> -Aleksey
> 

Oh my. Yes, please push. Thanks for fixing it!

Roman


From ashipile at redhat.com  Thu Jan 25 18:05:21 2018
From: ashipile at redhat.com (ashipile at redhat.com)
Date: Thu, 25 Jan 2018 18:05:21 +0000
Subject: hg: shenandoah/jdk10: Fix 32-bit build by ifdef-ing non-implemented
 storeval barrier
Message-ID: <201801251805.w0PI5Msh003251@aojmv0008.oracle.com>

Changeset: 3c12448ec444
Author:    shade
Date:      2018-01-25 18:44 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk10/rev/3c12448ec444

Fix 32-bit build by ifdef-ing non-implemented storeval barrier

! src/hotspot/cpu/x86/shenandoahBarrierSet_x86.cpp


From zgu at redhat.com  Thu Jan 25 19:05:38 2018
From: zgu at redhat.com (Zhengyu Gu)
Date: Thu, 25 Jan 2018 14:05:38 -0500
Subject: RFR: Hole in CAS barrier when using traversal heuristics
In-Reply-To: <59070a56-11c2-b75b-6eae-21cdea01974e@redhat.com>
References: <1163aca7-397b-0fe7-1f48-0eae2662ef4c@redhat.com>
 <9c60eedc-e502-a9e0-c8ec-5409ba7c05e8@redhat.com>
 <acbff464-0b1d-a9c5-aa55-02de627fda24@redhat.com>
 <59070a56-11c2-b75b-6eae-21cdea01974e@redhat.com>
Message-ID: <f597a96a-a0de-9aef-e5a7-71fc580083c3@redhat.com>


>>>
>>> Hi Zhengyu,
>>>
>>> I am not sure what you mean by 'evacuates target and exchange object,
>>> but not the field' .. clearly the target object needs to be 
>>> evacuated, because we only write to to-space (write-barrier). Also, 
>>> the exchange object needs to be evacuated, to ensure we end up only 
>>> with to-space references in fields (storeval-barrier). What do you 
>>> mean by evacuation of 'the field' ? The target field is part of the 
>>> target object.
>>
>> There is an example:
>> http://hg.openjdk.java.net/shenandoah/jdk10/file/6183a72bd5c2/src/hotspot/share/prims/unsafe.cpp#l1020 
>>
>>
>> the addr points a field in object, that might not be evacuated and I 
>> think you do have to enqueue it, as the object may be gray.
> 
> addr is a field in p, which is in to-space by the WB a few lines above. 
> This should be good. x is the exchange value, also evacuated by the 
> storeval barrier. So all should be fine.

Yes, p and x are fine, but the field (e.g. an object) to be swapped out
may still be in the cset. It is enqueued by
oopDesc::atomic_compare_exchange_oop()
(http://hg.openjdk.java.net/shenandoah/jdk10/file/3c12448ec444/src/hotspot/share/oops/oop.inline.hpp#l407),
and then hits an assertion failure here:

http://hg.openjdk.java.net/shenandoah/jdk10/file/3c12448ec444/src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.cpp#l464

hs_err: https://paste.fedoraproject.org/paste/isPRNXG6PqaPgx9a18IA4w


BTW, what's the reason it has to be a to-space object? Can it be evacuated
while processing SATB buffers, or by storeval_barrier()?


-Zhengyu


> 
> What error/assert/crash are you seeing? Is it something in 
> SBS::is_safe()? Then it may already be fixed by my subsequent changeset?
> 
> It may happen that traversal GC gets cancelled, and then we hit an 
> overly strict assert like that.
> 
> Roman

From rkennke at redhat.com  Thu Jan 25 19:29:25 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Thu, 25 Jan 2018 20:29:25 +0100
Subject: RFR: Hole in CAS barrier when using traversal heuristics
In-Reply-To: <f597a96a-a0de-9aef-e5a7-71fc580083c3@redhat.com>
References: <1163aca7-397b-0fe7-1f48-0eae2662ef4c@redhat.com>
 <9c60eedc-e502-a9e0-c8ec-5409ba7c05e8@redhat.com>
 <acbff464-0b1d-a9c5-aa55-02de627fda24@redhat.com>
 <59070a56-11c2-b75b-6eae-21cdea01974e@redhat.com>
 <f597a96a-a0de-9aef-e5a7-71fc580083c3@redhat.com>
Message-ID: <60ffe9ce-8ff7-b358-a8c0-c8b26329b48d@redhat.com>

On 25.01.2018 at 20:05, Zhengyu Gu wrote:
> 
>>>>
>>>> Hi Zhengyu,
>>>>
>>>> I am not sure what you mean by 'evacuates target and exchange object,
>>>> but not the field' .. clearly the target object needs to be 
>>>> evacuated, because we only write to to-space (write-barrier). Also, 
>>>> the exchange object needs to be evacuated, to ensure we end up only 
>>>> with to-space references in fields (storeval-barrier). What do you 
>>>> mean by evacuation of 'the field' ? The target field is part of the 
>>>> target object.
>>>
>>> There is an example:
>>> http://hg.openjdk.java.net/shenandoah/jdk10/file/6183a72bd5c2/src/hotspot/share/prims/unsafe.cpp#l1020 
>>>
>>>
>>> the addr points a field in object, that might not be evacuated and I 
>>> think you do have to enqueue it, as the object may be gray.
>>
>> addr is a field in p, which is in to-space by the WB a few lines 
>> above. This should be good. x is the exchange value, also evacuated by 
>> the storeval barrier. So all should be fine.
> 
> Yes, p and x are fine, but the field (e.g. an object) to be swapped out, 
> may still in cset,

No. The field is not an object. The field is a reference, and belongs to 
p, and points to another object (e.g. x). It is not an object by itself 
and thus cannot be evacuated or such.

> and it is enqueued by
> oopDesc::atomic_compare_exchange_oop() 
> (http://hg.openjdk.java.net/shenandoah/jdk10/file/3c12448ec444/src/hotspot/share/oops/oop.inline.hpp#l407), 
> then hit assertion failure here:
> 
> http://hg.openjdk.java.net/shenandoah/jdk10/file/3c12448ec444/src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.cpp#l464 
> 
> 
> hs_err: https://paste.fedoraproject.org/paste/isPRNXG6PqaPgx9a18IA4w

Hmm, ok. This is a problem in oopDesc::atomic_compare_exchange_oop().
It calls write_ref_field_pre(), which it shouldn't do. By our design, it 
should call storeval_barrier() instead, which does the right thing. 
However, this is going to change once we merge from upstream...

> BTW, what's reason it has to be to-space object? can it be evacuated 
> during processing SATB buffers, or by storeval_barrier()?

We have two reasons for forcing to-space objects:
- We must only ever write to to-space objects for consistency
- We must only ever store to-space objects into fields, because the GC 
threads that concurrently update fields may already have visited it. If 
Java threads were writing from-space objects we may end up with 
pointers/fields to from-space objects after GC.

Roman
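
A minimal sketch of the two to-space invariants listed above, written as
a toy model and not as HotSpot code. Every name in it (Obj, fwd,
evacuate, cas_field and the two barrier helpers) is an assumption made
for illustration; only the rule "write to to-space, store to-space" comes
from the message.

```cpp
#include <atomic>
#include <cassert>

// Toy object. 'fwd' points at the to-space copy, or at the object itself
// while it has not been evacuated. All names here are hypothetical.
struct Obj {
    Obj* fwd;
    std::atomic<Obj*> field;
    Obj() : fwd(this), field(nullptr) {}
};

// Toy evacuation: copy the field and install the forwarding pointer.
inline void evacuate(Obj* from, Obj* to) {
    to->field.store(from->field.load());
    from->fwd = to;
}

// Invariant 1: only ever write to the to-space copy of the target.
inline Obj* write_barrier(Obj* o) { return o->fwd; }

// Invariant 2: only ever store to-space references into fields.
inline Obj* storeval_barrier(Obj* v) { return v == nullptr ? nullptr : v->fwd; }

// CAS on p->field that resolves both the target and the new value first.
inline bool cas_field(Obj* p, Obj* expected, Obj* x) {
    p = write_barrier(p);
    x = storeval_barrier(x);
    return p->field.compare_exchange_strong(expected, x);
}
```

Calling cas_field with from-space pointers for both p and x ends up
writing the to-space copy of x into the to-space copy of p, which is the
state the two rules are meant to guarantee.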


From zgu at redhat.com  Thu Jan 25 19:55:06 2018
From: zgu at redhat.com (Zhengyu Gu)
Date: Thu, 25 Jan 2018 14:55:06 -0500
Subject: RFR: Hole in CAS barrier when using traversal heuristics
In-Reply-To: <60ffe9ce-8ff7-b358-a8c0-c8b26329b48d@redhat.com>
References: <1163aca7-397b-0fe7-1f48-0eae2662ef4c@redhat.com>
 <9c60eedc-e502-a9e0-c8ec-5409ba7c05e8@redhat.com>
 <acbff464-0b1d-a9c5-aa55-02de627fda24@redhat.com>
 <59070a56-11c2-b75b-6eae-21cdea01974e@redhat.com>
 <f597a96a-a0de-9aef-e5a7-71fc580083c3@redhat.com>
 <60ffe9ce-8ff7-b358-a8c0-c8b26329b48d@redhat.com>
Message-ID: <09b2edec-6113-e0c5-c2ba-297eecd39cda@redhat.com>

>> BTW, what's reason it has to be to-space object? can it be evacuated 
>> during processing SATB buffers, or by storeval_barrier()?
Sorry, my question was not clear. It should have been: why should we only
enqueue to-space objects during concurrent traversal GC?

> 
> We have two reasons for forcing to-space objects:
> - We must only ever write to to-space objects for consistency
> - We must only ever store to-space objects into fields, because the GC 
> threads that concurrently update fields may already have visited it. If 
> Java threads were writing from-space objects we may end up with 
> pointers/fields to from-space objects after GC.

I understand the above reasons.

But these do not apply to an object that is enqueued to satisfy the SATB
protocol, since we do not write to or update this object; we just make
sure it gets marked. If I understand correctly, this object has to be in
to-space at the end of the GC cycle with traversal GC. However, I don't
see why it has to be a to-space object when it is enqueued. Can it be
evacuated when it is processed?

BTW, do you want to take over this one?

Thanks,

-Zhengyu


> 
> Roman
> 

From zgu at redhat.com  Thu Jan 25 20:02:44 2018
From: zgu at redhat.com (Zhengyu Gu)
Date: Thu, 25 Jan 2018 15:02:44 -0500
Subject: RFR: Hole in CAS barrier when using traversal heuristics
In-Reply-To: <60ffe9ce-8ff7-b358-a8c0-c8b26329b48d@redhat.com>
References: <1163aca7-397b-0fe7-1f48-0eae2662ef4c@redhat.com>
 <9c60eedc-e502-a9e0-c8ec-5409ba7c05e8@redhat.com>
 <acbff464-0b1d-a9c5-aa55-02de627fda24@redhat.com>
 <59070a56-11c2-b75b-6eae-21cdea01974e@redhat.com>
 <f597a96a-a0de-9aef-e5a7-71fc580083c3@redhat.com>
 <60ffe9ce-8ff7-b358-a8c0-c8b26329b48d@redhat.com>
Message-ID: <5886ba13-5457-358f-071b-8082ea3742bc@redhat.com>

> 
> No. The field is not an object. The field is a reference, and belongs to 
> p, and points to another object (e.g. x). It is not an object by itself 
> and thus cannot be evacuated or such.
Sorry, my bad writing: it is a reference to an object that may still be
in from-space.

> 
>   and it is enqueued by
>> oopDesc::atomic_compare_exchange_oop() 
>> (http://hg.openjdk.java.net/shenandoah/jdk10/file/3c12448ec444/src/hotspot/share/oops/oop.inline.hpp#l407), 
>> then hit assertion failure here:
>>
>> http://hg.openjdk.java.net/shenandoah/jdk10/file/3c12448ec444/src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.cpp#l464 
>>
>>
>> hs_err: https://paste.fedoraproject.org/paste/isPRNXG6PqaPgx9a18IA4w
> 
> Hmm, ok. This is a problem in oopDesc::atomic_compare_exchange_oop().
> It calls write_ref_field_pre(), which it shouldn't do. By our design, it 
> should call storeval_barrier() instead, which does the right thing. 
> However, this is going to change once we merge from upstream...

Sorry, call storeval_barrier() on what? My understanding is that you
have to apply the SATB barrier to the swapped-out *old* value, which is
what write_ref_field_pre() does, no?


Thanks,

-Zhengyu

> 
>> BTW, what's reason it has to be to-space object? can it be evacuated 
>> during processing SATB buffers, or by storeval_barrier()?
> 
> We have two reasons for forcing to-space objects:
> - We must only ever write to to-space objects for consistency
> - We must only ever store to-space objects into fields, because the GC 
> threads that concurrently update fields may already have visited it. If 
> Java threads were writing from-space objects we may end up with 
> pointers/fields to from-space objects after GC.
> 
> Roman
> 

From rkennke at redhat.com  Thu Jan 25 20:06:32 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Thu, 25 Jan 2018 21:06:32 +0100
Subject: RFR: Hole in CAS barrier when using traversal heuristics
In-Reply-To: <09b2edec-6113-e0c5-c2ba-297eecd39cda@redhat.com>
References: <1163aca7-397b-0fe7-1f48-0eae2662ef4c@redhat.com>
 <9c60eedc-e502-a9e0-c8ec-5409ba7c05e8@redhat.com>
 <acbff464-0b1d-a9c5-aa55-02de627fda24@redhat.com>
 <59070a56-11c2-b75b-6eae-21cdea01974e@redhat.com>
 <f597a96a-a0de-9aef-e5a7-71fc580083c3@redhat.com>
 <60ffe9ce-8ff7-b358-a8c0-c8b26329b48d@redhat.com>
 <09b2edec-6113-e0c5-c2ba-297eecd39cda@redhat.com>
Message-ID: <2378a348-ff67-24f7-6995-848e2dd15a3a@redhat.com>

On 25.01.2018 at 20:55, Zhengyu Gu wrote:
>>> BTW, what's reason it has to be to-space object? can it be evacuated 
>>> during processing SATB buffers, or by storeval_barrier()?
> Sorry, I am not clear on my question, which should be: why should only 
> enqueue to-space object during conc-traversal gc?
> 
>>
>> We have two reasons for forcing to-space objects:
>> - We must only ever write to to-space objects for consistency
>> - We must only ever store to-space objects into fields, because the GC 
>> threads that concurrently update fields may already have visited it. 
>> If Java threads were writing from-space objects we may end up with 
>> pointers/fields to from-space objects after GC.
> 
> I understand above reasons.
> 
> But these do not apply to object to be enqueued to satisfy SATB 
> protocol, since we do not write or update this object, but just to make 
> sure it should be marked. If I understand correctly, this object has to 
> be in to-space at the end of GC cycle with traversal gc. however, I 
> don't see why it has to be to-space object to be enqueued, can it be 
> evacuated when it is processed?

The storeval barrier has two purposes: one is to ensure consistency vs. 
'update-refs' (traversal updates references). The other is to ensure 
consistency vs traversal of the heap (e.g. 'marking'). If it needs to 
write-barrier the object anyway (to ensure update-refs consistency), 
then we can just as well make this an invariant. Then we can avoid 
reading the fwd ptrs in the GC thread.

> BTW, do you want to take over this one?

Ok, can do it. Do you happen to have a reproducer?

Roman

From rkennke at redhat.com  Thu Jan 25 20:09:39 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Thu, 25 Jan 2018 21:09:39 +0100
Subject: RFR: Hole in CAS barrier when using traversal heuristics
In-Reply-To: <5886ba13-5457-358f-071b-8082ea3742bc@redhat.com>
References: <1163aca7-397b-0fe7-1f48-0eae2662ef4c@redhat.com>
 <9c60eedc-e502-a9e0-c8ec-5409ba7c05e8@redhat.com>
 <acbff464-0b1d-a9c5-aa55-02de627fda24@redhat.com>
 <59070a56-11c2-b75b-6eae-21cdea01974e@redhat.com>
 <f597a96a-a0de-9aef-e5a7-71fc580083c3@redhat.com>
 <60ffe9ce-8ff7-b358-a8c0-c8b26329b48d@redhat.com>
 <5886ba13-5457-358f-071b-8082ea3742bc@redhat.com>
Message-ID: <2bd76355-c636-dfc4-e1a1-031640999c66@redhat.com>

On 25.01.2018 at 21:02, Zhengyu Gu wrote:
>>
>> No. The field is not an object. The field is a reference, and belongs 
>> to p, and points to another object (e.g. x). It is not an object by 
>> itself and thus cannot be evacuated or such.
> Sorry, my bad writing, it is a reference to an object that may still in 
> from-space.

Ok. Yes.

>> and it is enqueued by
>>> oopDesc::atomic_compare_exchange_oop() 
>>> (http://hg.openjdk.java.net/shenandoah/jdk10/file/3c12448ec444/src/hotspot/share/oops/oop.inline.hpp#l407), 
>>> then hit assertion failure here:
>>>
>>> http://hg.openjdk.java.net/shenandoah/jdk10/file/3c12448ec444/src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.cpp#l464 
>>>
>>>
>>> hs_err: https://paste.fedoraproject.org/paste/isPRNXG6PqaPgx9a18IA4w
>>
>> Hmm, ok. This is a problem in 
>> oopDesc::atomic_compare_exchange_oop(). It calls
>> write_ref_field_pre(), which it shouldn't do. By our design, it should 
>> call storeval_barrier() instead, which does the right thing. However, 
>> this is going to change once we merge from upstream...
> 
> Sorry, call storeval_barrier() on what? my understanding this that, you 
> have to apply SATB barrier on swapped out *old* value, which is this 
> write_ref_field_pre() does, no?

We have a little naming problem here. While we're using G1's SATB buffer
to enqueue objects, the traversal GC algorithm is *not* SATB-based. It
is incremental-update-based, which is kind of the opposite of SATB (one
could call it 'snapshot-at-the-end' (-of-traversal)). Instead of enqueuing
the previous values on stores, it enqueues the *new* values on stores.
This is why the storeval barrier can do both enqueue (for i-u) and WB
(for update-refs-consistency) in one swoop.

I hope this clarifies it.

Roman
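
The difference between the two barrier flavours described above can be
sketched in a few lines. This is a toy model, not the Shenandoah
implementation: Obj, Queue and both store functions are names made up
for illustration, and the std::vector merely stands in for the shared
enqueue buffer.

```cpp
#include <cassert>
#include <vector>

struct Obj { Obj* field = nullptr; };   // toy object with one reference field

using Queue = std::vector<Obj*>;        // stands in for the enqueue buffer

// SATB-style pre-barrier: remember the value that is being overwritten.
inline void store_satb(Obj* p, Obj* new_val, Queue& q) {
    if (p->field != nullptr) q.push_back(p->field);
    p->field = new_val;
}

// Incremental-update-style barrier: remember the value that is being stored.
inline void store_incremental_update(Obj* p, Obj* new_val, Queue& q) {
    if (new_val != nullptr) q.push_back(new_val);
    p->field = new_val;
}
```

With the same store, the SATB variant enqueues the old referent while
the incremental-update variant enqueues the new one, so applying the
SATB pre-barrier during traversal hands the collector an object it did
not ask for.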

From zgu at redhat.com  Thu Jan 25 20:18:02 2018
From: zgu at redhat.com (Zhengyu Gu)
Date: Thu, 25 Jan 2018 15:18:02 -0500
Subject: RFR: Hole in CAS barrier when using traversal heuristics
In-Reply-To: <2378a348-ff67-24f7-6995-848e2dd15a3a@redhat.com>
References: <1163aca7-397b-0fe7-1f48-0eae2662ef4c@redhat.com>
 <9c60eedc-e502-a9e0-c8ec-5409ba7c05e8@redhat.com>
 <acbff464-0b1d-a9c5-aa55-02de627fda24@redhat.com>
 <59070a56-11c2-b75b-6eae-21cdea01974e@redhat.com>
 <f597a96a-a0de-9aef-e5a7-71fc580083c3@redhat.com>
 <60ffe9ce-8ff7-b358-a8c0-c8b26329b48d@redhat.com>
 <09b2edec-6113-e0c5-c2ba-297eecd39cda@redhat.com>
 <2378a348-ff67-24f7-6995-848e2dd15a3a@redhat.com>
Message-ID: <dae7bb4b-2f38-3dbf-0095-f9c5ca5445c4@redhat.com>

> Ok, can do it. Do you happen to have a reproducer?

No simple reproducer. I got this by running the ScimarkLU benchmark; it
may take a few runs.

${JAVA_HOME}/bin/java -jar jmh-specjvm2016.jar ScimarkLU --jvmArgs 
"-Xmx1g -Xms1g -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions 
-XX:+UnlockDiagnosticVMOptions -XX:ShenandoahGCHeuristics=traversal 
-Xlog:gc+stats" -f 1


Thanks,

-Zhengyu


> 
> Roman

From zgu at redhat.com  Thu Jan 25 20:19:42 2018
From: zgu at redhat.com (Zhengyu Gu)
Date: Thu, 25 Jan 2018 15:19:42 -0500
Subject: RFR: Hole in CAS barrier when using traversal heuristics
In-Reply-To: <2bd76355-c636-dfc4-e1a1-031640999c66@redhat.com>
References: <1163aca7-397b-0fe7-1f48-0eae2662ef4c@redhat.com>
 <9c60eedc-e502-a9e0-c8ec-5409ba7c05e8@redhat.com>
 <acbff464-0b1d-a9c5-aa55-02de627fda24@redhat.com>
 <59070a56-11c2-b75b-6eae-21cdea01974e@redhat.com>
 <f597a96a-a0de-9aef-e5a7-71fc580083c3@redhat.com>
 <60ffe9ce-8ff7-b358-a8c0-c8b26329b48d@redhat.com>
 <5886ba13-5457-358f-071b-8082ea3742bc@redhat.com>
 <2bd76355-c636-dfc4-e1a1-031640999c66@redhat.com>
Message-ID: <e46c0693-cc03-668b-46b0-39a75e549c6c@redhat.com>

> 
> We have a little naming problem here. While we're using G1's SATB buffer
> to enqueue objects, the traversal GC algorithm is *not* SATB-based. It
> is incremental-update-based, which is kind of the opposite of SATB (one
> could call it 'snapshot-at-the-end' (-of-traversal)). Instead of enqueuing
> the previous values on stores, it enqueues the *new* values on stores.
> This is why the storeval barrier can do both enqueue (for i-u) and WB
> (for update-refs-consistency) in one swoop.
>
> I hope this clarifies it.
Okay, I guess I have to catch up on this :-(

Thanks,

-Zhengyu

> 
> Roman

From rkennke at redhat.com  Fri Jan 26 10:43:43 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Fri, 26 Jan 2018 11:43:43 +0100
Subject: RFR: Don't enter SATB pre-barrier when in traversal.
Message-ID: <8dbb6ada-3e0c-b4fe-9b1f-e03af6a7580e@redhat.com>

This is the fix for the problem that Zhengyu found. It's another side
effect of turning on MARKING during traversal. We need to make sure we
do not enter the SATB pre-barrier during traversal.

http://cr.openjdk.java.net/~rkennke/traversal-no-pre-barrier/webrev.00/

Passes hotspot_gc_shenandoah

Good?

From shade at redhat.com  Fri Jan 26 11:17:51 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Fri, 26 Jan 2018 12:17:51 +0100
Subject: RFR: Don't enter SATB pre-barrier when in traversal.
In-Reply-To: <8dbb6ada-3e0c-b4fe-9b1f-e03af6a7580e@redhat.com>
References: <8dbb6ada-3e0c-b4fe-9b1f-e03af6a7580e@redhat.com>
Message-ID: <577cbe94-e051-0387-6c91-00d066cded4f@redhat.com>

On 01/26/2018 11:43 AM, Roman Kennke wrote:
> This is the fix for the problem that Zhengyu found. It's another side effect of turning on MARKING
> during traversal. We need to ensure to not enter the SATB pre-barrier during traversal.
> 
> http://cr.openjdk.java.net/~rkennke/traversal-no-pre-barrier/webrev.00/

Okay.

So maybe it is wrong to turn off MARKING during Traversal GC? Let GC state MARKING only mean regular
concurrent marking cycle?

Thanks,
-Aleksey


From shade at redhat.com  Fri Jan 26 12:00:04 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Fri, 26 Jan 2018 13:00:04 +0100
Subject: RFR: [9] Bulk backports to sh/jdk9
Message-ID: <819f0073-b0b9-0708-cec0-d0b55452c0eb@redhat.com>

http://cr.openjdk.java.net/~shade/shenandoah/backports/jdk9-20180126/webrev.01/

Changes include:

8735773ec619: Single thread-local GC state flag for all barriers
544322604347: ShConcurrentThread races with set_gc_state_bit
dc779781dd5e: Do not put down update-refs-in-progress flag concurrently
d55c6d5216d1: Common TLS access to GC state, where possible
1d1238a0603b: Defer cleaning of system dictionary and friends to parallel cleaning phase
fd9724b26fdd: Refactor allocation failure and explicit GC handling
e5398dce6e7b: Make concurrent precleaning log message optional again
26b9048c042a: Make degenerated update-refs use region-set cursor to hand over work
1a6a9f288dd2: Bitmap size might not be page aligned when large page is used
12654193e434: Demote warning message about OOM-during-evac to informational
67294a38c0c7: TestSelectiveBarrierFlags should accept multi-element flag selections
ecb87af5e0d8: Implement flag to generate write-barriers without membars.
820129a799b1: Allocation failure injection machinery
b8c39bdc0dac: Log message on ref processing, class unload, update refs for mark events
45d471869b73: Degenerated GC
15261c4a6adf: Degenerated GC: shortcut cycles, upgrade futile cycles
bd01b07ba0d7: Add ShenandoahRootProcessor API to report threads while scanning roots
3a6457fecc72: Relax assert in SBS::is_safe()
30e8ba6e2794: VerifyJCStressTest should test all heuristics
6183a72bd5c2: ShBS::interpreter_storeval_barrier signature fix and cleanup
3c12448ec444: Fix 32-bit build by ifdef-ing non-implemented storeval barrier

Testing: hotspot_gc_shenandoah {fastdebug|release}, specjvm

Thanks,
-Aleksey


From rkennke at redhat.com  Fri Jan 26 11:53:33 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Fri, 26 Jan 2018 12:53:33 +0100
Subject: RFR: Don't enter SATB pre-barrier when in traversal.
In-Reply-To: <577cbe94-e051-0387-6c91-00d066cded4f@redhat.com>
References: <8dbb6ada-3e0c-b4fe-9b1f-e03af6a7580e@redhat.com>
 <577cbe94-e051-0387-6c91-00d066cded4f@redhat.com>
Message-ID: <E180D80E-116D-405E-B36B-08EA1FFD39F3@redhat.com>


On 26 January 2018 12:17:51 CET, Aleksey Shipilev <shade at redhat.com> wrote:
>On 01/26/2018 11:43 AM, Roman Kennke wrote:
>> This is the fix for the problem that Zhengyu found. It's another side
>> effect of turning on MARKING during traversal. We need to ensure to not
>> enter the SATB pre-barrier during traversal.
>>
>> http://cr.openjdk.java.net/~rkennke/traversal-no-pre-barrier/webrev.00/
>
>Okay.
>
>So maybe it is wrong to turn off MARKING during Traversal GC? Let GC
>state MARKING only mean regular concurrent marking cycle?


Yes, I think so. Major GC phases (concmark, evac, uprefs, partial and
traversal) should not overlap, and barrier code should positively select
the phases it wants rather than exclude the ones it doesn't want.

Let me rewrite this stuff.


Roman

-- 
This message was sent from my Android device with K-9 Mail.

From shade at redhat.com  Fri Jan 26 16:48:29 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Fri, 26 Jan 2018 17:48:29 +0100
Subject: RFR: [8u] Critical backports to sh/jdk8u
Message-ID: <8a69240e-8838-4726-ef8c-a14d7befd0b3@redhat.com>

http://cr.openjdk.java.net/~shade/shenandoah/backports/jdk8u-20180126-crit/webrev.01/

We do not have much time to complete bulk backports, so let us backport only the critical
bug/perf/test fixes:

dc779781dd5e: Do not put down update-refs-in-progress flag concurrently
1d1238a0603b: Defer cleaning of system dictionary and friends to parallel cleaning phase
1a6a9f288dd2: Bitmap size might not be page aligned when large page is used
30e8ba6e2794: VerifyJCStressTest should test all heuristics
820129a799b1: Allocation failure injection machinery

Let's do these right now. We shall backport other things as time allows.

Testing: hotspot_gc_shenandoah {fastdebug|release}

Thanks,
-Aleksey


From shade at redhat.com  Fri Jan 26 17:00:20 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Fri, 26 Jan 2018 18:00:20 +0100
Subject: RFR: Conditionalize PerfDataMemorySize on enabled heap sampling
Message-ID: <b81d620e-a2d3-78d2-4222-d0fc82973112@redhat.com>

Saves some memory when sampling is not enabled (the default case), and boosts the PerfData memory
when we deal with lots of Shenandoah sampling data:

diff -r 3c12448ec444 src/hotspot/share/runtime/arguments.cpp
--- a/src/hotspot/share/runtime/arguments.cpp	Thu Jan 25 18:44:13 2018 +0100
+++ b/src/hotspot/share/runtime/arguments.cpp	Fri Jan 26 17:58:21 2018 +0100
@@ -2035,8 +2035,10 @@
     FLAG_SET_DEFAULT(ParallelRefProcEnabled, true);
   }

-  if (FLAG_IS_DEFAULT(PerfDataMemorySize)) {
-    FLAG_SET_DEFAULT(PerfDataMemorySize, 512*K);
+  if (ShenandoahRegionSampling && FLAG_IS_DEFAULT(PerfDataMemorySize)) {
+    // When sampling is enabled, max out the PerfData memory to get more
+    // Shenandoah data in, including Matrix.
+    FLAG_SET_DEFAULT(PerfDataMemorySize, 2048*K);
   }

 #ifdef COMPILER2

Testing: hotspot_gc_shenandoah

Thanks,
-Aleksey
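
The effect of the patched condition can be sketched with a toy flag
type. This is not the HotSpot flag machinery: Flag, set_by_user and
adjust_perf_data_size are hypothetical names, and only the 2048*K value
and the "sampling enabled and flag still at its default" condition are
taken from the diff.

```cpp
#include <cassert>
#include <cstddef>

const std::size_t K = 1024;

// Toy stand-in for a VM flag: a value plus whether the user set it explicitly.
struct Flag {
    std::size_t value;
    bool set_by_user;
};

// Mirrors the patched condition: only raise the size when region sampling
// is enabled and the user did not choose a size themselves.
inline void adjust_perf_data_size(Flag& perf_data_memory_size, bool region_sampling) {
    if (region_sampling && !perf_data_memory_size.set_by_user) {
        perf_data_memory_size.value = 2048 * K;
    }
}
```

Without sampling the flag keeps whatever built-in default it started
with, and an explicit user setting is never overridden.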


From cflood at redhat.com  Fri Jan 26 17:03:40 2018
From: cflood at redhat.com (Christine Flood)
Date: Fri, 26 Jan 2018 12:03:40 -0500
Subject: RFR: Conditionalize PerfDataMemorySize on enabled heap sampling
In-Reply-To: <b81d620e-a2d3-78d2-4222-d0fc82973112@redhat.com>
References: <b81d620e-a2d3-78d2-4222-d0fc82973112@redhat.com>
Message-ID: <CALKUemydsRVs4Usw-ne0X3j-VZH4iTFyAQszzR3QOTYSOSpk1A@mail.gmail.com>

Looks good,

Thanks!

Christine


On Fri, Jan 26, 2018 at 12:00 PM, Aleksey Shipilev <shade at redhat.com> wrote:
> Saves some memory when sampling is not enabled (default case), and boosts up when we deal with lots
> of Shenandoah sampling data:
>
> diff -r 3c12448ec444 src/hotspot/share/runtime/arguments.cpp
> --- a/src/hotspot/share/runtime/arguments.cpp   Thu Jan 25 18:44:13 2018 +0100
> +++ b/src/hotspot/share/runtime/arguments.cpp   Fri Jan 26 17:58:21 2018 +0100
> @@ -2035,8 +2035,10 @@
>      FLAG_SET_DEFAULT(ParallelRefProcEnabled, true);
>    }
>
> -  if (FLAG_IS_DEFAULT(PerfDataMemorySize)) {
> -    FLAG_SET_DEFAULT(PerfDataMemorySize, 512*K);
> +  if (ShenandoahRegionSampling && FLAG_IS_DEFAULT(PerfDataMemorySize)) {
> +    // When sampling is enabled, max out the PerfData memory to get more
> +    // Shenandoah data in, including Matrix.
> +    FLAG_SET_DEFAULT(PerfDataMemorySize, 2048*K);
>    }
>
>  #ifdef COMPILER2
>
> Testing: hotspot_gc_shenandoah
>
> Thanks,
> -Aleksey
>

From zgu at redhat.com  Fri Jan 26 17:10:20 2018
From: zgu at redhat.com (Zhengyu Gu)
Date: Fri, 26 Jan 2018 12:10:20 -0500
Subject: RFR: [8u] Critical backports to sh/jdk8u
In-Reply-To: <8a69240e-8838-4726-ef8c-a14d7befd0b3@redhat.com>
References: <8a69240e-8838-4726-ef8c-a14d7befd0b3@redhat.com>
Message-ID: <82090529-a7db-d446-a773-399de7e36ff1@redhat.com>

Backport looks good.

-Zhengyu

On 01/26/2018 11:48 AM, Aleksey Shipilev wrote:
> http://cr.openjdk.java.net/~shade/shenandoah/backports/jdk8u-20180126-crit/webrev.01/
> 
> We do not have much time to complete bulk backports, so let us backport only the critical
> bug/perf/test fixes:
> 
> dc779781dd5e: Do not put down update-refs-in-progress flag concurrently
> 1d1238a0603b: Defer cleaning of system dictionary and friends to parallel cleaning phase
> 1a6a9f288dd2: Bitmap size might not be page aligned when large page is used
> 30e8ba6e2794: VerifyJCStressTest should test all heuristics
> 820129a799b1: Allocation failure injection machinery
> 
> Let's do these right now. We shall backport other things as time allows.
> 
> Testing: hotspot_gc_shenandoah {fastdebug|release}
> 
> Thanks,
> -Aleksey
> 

From cflood at redhat.com  Fri Jan 26 17:37:35 2018
From: cflood at redhat.com (Christine Flood)
Date: Fri, 26 Jan 2018 12:37:35 -0500
Subject: RFR: [8u] Critical backports to sh/jdk8u
In-Reply-To: <8a69240e-8838-4726-ef8c-a14d7befd0b3@redhat.com>
References: <8a69240e-8838-4726-ef8c-a14d7befd0b3@redhat.com>
Message-ID: <CALKUemxZK0oNbOzXkYGXjVFmR-AZidECoXJZT1sE7=84-LEFMg@mail.gmail.com>

void ShenandoahHeap::try_inject_alloc_failure() {
+   if (ShenandoahAllocFailureALot && !cancelled_concgc() &&
((os::random() % 1000) > 950)) {
+     _inject_alloc_failure.set();
+     Thread::current()->_ParkEvent->park(1);
+     if (cancelled_concgc()) {
+       log_info(gc)("Allocation failure was successfully injected");
+     }
+   }
+ }

Is it possible that there is a race and we get to the test for
cancelled_concgc before it actually gets set?

Is there any reason not to try JCStressTests with frequent allocation
failures?

I'm just curious.  I don't see any reason to stop the patch moving forward.


Christine


On Fri, Jan 26, 2018 at 11:48 AM, Aleksey Shipilev <shade at redhat.com> wrote:
> http://cr.openjdk.java.net/~shade/shenandoah/backports/jdk8u-20180126-crit/webrev.01/
>
> We do not have much time to complete bulk backports, so let us backport only the critical
> bug/perf/test fixes:
>
> dc779781dd5e: Do not put down update-refs-in-progress flag concurrently
> 1d1238a0603b: Defer cleaning of system dictionary and friends to parallel cleaning phase
> 1a6a9f288dd2: Bitmap size might not be page aligned when large page is used
> 30e8ba6e2794: VerifyJCStressTest should test all heuristics
> 820129a799b1: Allocation failure injection machinery
>
> Let's do these right now. We shall backport other things as time allows.
>
> Testing: hotspot_gc_shenandoah {fastdebug|release}
>
> Thanks,
> -Aleksey
>

From shade at redhat.com  Fri Jan 26 17:40:30 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Fri, 26 Jan 2018 18:40:30 +0100
Subject: RFR: [8u] Critical backports to sh/jdk8u
In-Reply-To: <CALKUemxZK0oNbOzXkYGXjVFmR-AZidECoXJZT1sE7=84-LEFMg@mail.gmail.com>
References: <8a69240e-8838-4726-ef8c-a14d7befd0b3@redhat.com>
 <CALKUemxZK0oNbOzXkYGXjVFmR-AZidECoXJZT1sE7=84-LEFMg@mail.gmail.com>
Message-ID: <f7cbd618-624c-2b98-2827-f0d3bbd98c13@redhat.com>

On 01/26/2018 06:37 PM, Christine Flood wrote:
> void ShenandoahHeap::try_inject_alloc_failure() {
> +   if (ShenandoahAllocFailureALot && !cancelled_concgc() &&
> ((os::random() % 1000) > 950)) {
> +     _inject_alloc_failure.set();
> +     Thread::current()->_ParkEvent->park(1);
> +     if (cancelled_concgc()) {
> +       log_info(gc)("Allocation failure was successfully injected");
> +     }
> +   }
> + }
> 
> Is it possible that there is a race and we get to the test for
> cancelled_concgc before it actually gets set?

Yes, but we don't care. This is mostly to observe whether any thread has reacted to this failure
within the time set -- a debugging measure.

> Is there any reason not to try JCStressTests with frequent allocation
> failures?

JCStressTests are mostly for testing internal compiler/oop verification, and are supposed to be
fast. Other tests figure out what happens on alloc failures.

-Aleksey
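
A small sketch of why the race discussed above is benign. It models the
quoted function with plain atomics and replaces the 1 ms park with a
callback, so the outcome is deterministic. Heap, short_wait and the
explicit random_value parameter are assumptions for illustration; the
ShenandoahAllocFailureALot flag is assumed to be on, and some other
thread is assumed to react to the injected flag by cancelling the
concurrent GC.

```cpp
#include <atomic>
#include <cassert>
#include <functional>

struct Heap {
    std::atomic<bool> inject_alloc_failure{false};
    std::atomic<bool> cancelled_concgc{false};
};

// Returns true when the cancellation was observed after the short wait,
// i.e. when the quoted code would print its log message.
inline bool try_inject_alloc_failure(Heap& h, int random_value,
                                     const std::function<void()>& short_wait) {
    if (!h.cancelled_concgc.load() && (random_value % 1000) > 950) {
        h.inject_alloc_failure.store(true);
        short_wait();  // stands in for the 1 ms park
        return h.cancelled_concgc.load();
    }
    return false;
}
```

If nobody reacts within the wait, the only consequence is a missing log
line; the injection flag stays set and is still picked up later.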


From ashipile at redhat.com  Fri Jan 26 17:45:08 2018
From: ashipile at redhat.com (ashipile at redhat.com)
Date: Fri, 26 Jan 2018 17:45:08 +0000
Subject: hg: shenandoah/jdk10: Conditionalize PerfDataMemorySize on enabled
 heap sampling
Message-ID: <201801261745.w0QHj8Qe001342@aojmv0008.oracle.com>

Changeset: 16198c705496
Author:    shade
Date:      2018-01-26 17:56 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk10/rev/16198c705496

Conditionalize PerfDataMemorySize on enabled heap sampling

! src/hotspot/share/runtime/arguments.cpp


From zgu at redhat.com  Fri Jan 26 17:54:03 2018
From: zgu at redhat.com (Zhengyu Gu)
Date: Fri, 26 Jan 2018 12:54:03 -0500
Subject: RFR: Missing cancelled concgc check results assertion failure
Message-ID: <028cd960-5aaa-7e5f-6137-2deaee117166@redhat.com>

Diving into weak reference work without checking for cancelled concgc
results in an assertion failure about task queues that were not emptied.

http://cr.openjdk.java.net/~zgu/shenandoah/tq_cancelled_gc/hs_err.txt

Webrev:
http://cr.openjdk.java.net/~zgu/shenandoah/tq_cancelled_gc/webrev.00/


Thanks,

-Zhengyu

From ashipile at redhat.com  Fri Jan 26 18:04:24 2018
From: ashipile at redhat.com (ashipile at redhat.com)
Date: Fri, 26 Jan 2018 18:04:24 +0000
Subject: hg: shenandoah/jdk8u/hotspot: 5 new changesets
Message-ID: <201801261804.w0QI4OY8009963@aojmv0008.oracle.com>

Changeset: daa774ac0d72
Author:    shade
Date:      2018-01-22 12:04 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/daa774ac0d72

Do not put down update-refs-in-progress flag concurrently

! src/share/vm/gc_implementation/shenandoah/shenandoahConcurrentThread.cpp
! src/share/vm/gc_implementation/shenandoah/shenandoahMarkCompact.cpp

Changeset: 04b591f74de6
Author:    rkennke
Date:      2018-01-17 15:33 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/04b591f74de6

Defer cleaning of system dictionary and friends to parallel cleaning phase

! src/share/vm/gc_implementation/shenandoah/shenandoahHeap.cpp

Changeset: 5ae425989ac9
Author:    zgu
Date:      2018-01-18 08:23 -0500
URL:       http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/5ae425989ac9

Bitmap size might not be page aligned when large page is used

! src/share/vm/gc_implementation/shenandoah/shenandoahHeap.cpp

Changeset: 08632a44a72e
Author:    shade
Date:      2018-01-24 19:14 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/08632a44a72e

VerifyJCStressTest should test all heuristics

! test/gc/shenandoah/acceptance/VerifyJCStressTest.java

Changeset: 0020bc4708fc
Author:    shade
Date:      2018-01-19 18:49 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/0020bc4708fc

Allocation failure injection machinery

! src/share/vm/gc_implementation/shenandoah/shenandoahHeap.cpp
! src/share/vm/gc_implementation/shenandoah/shenandoahHeap.hpp
! src/share/vm/gc_implementation/shenandoah/shenandoah_globals.hpp
! test/gc/shenandoah/LotsOfCycles.java
! test/gc/shenandoah/acceptance/AllocIntArrays.java
! test/gc/shenandoah/acceptance/AllocObjectArrays.java
! test/gc/shenandoah/acceptance/AllocObjects.java
! test/gc/shenandoah/acceptance/RetainObjects.java
! test/gc/shenandoah/acceptance/SieveObjects.java


From rkennke at redhat.com  Tue Jan 30 09:54:18 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Tue, 30 Jan 2018 10:54:18 +0100
Subject: RFR: Make major GC phases exclusive from each other
Message-ID: <CAAN-KygNkkbTQL1ucuK3Jm+VVoxMn=Mwkc+8gFYxGGsT+kJkDA@mail.gmail.com>

Currently, partial and traversal use overlapping GC phase bits: partial
also activates evac, traversal activates everything. This causes a little
mess when selecting barriers, as observed by Zhengyu last week.

This patch makes all the major phases exclusive (marking, evac,
update-refs, partial and traversal). Barriers are always included, and
never excluded. This seems cleaner and easier to understand to me.

The state bit for 'has-forwarded' is still overlapping. Not sure what to do
with that.

Bits in the gc-state bitmask are now addressed via mask, and not via
position. This allows checking for groups of phases in one check. E.g.
write-barriers are now checking for EVACUATION | PARTIAL | TRAVERSAL
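
As a rough illustration of the mask-based check (a standalone sketch, not the
code in the webrev; the enum name, the bit values and the helper names below
are made up for the example):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit layout, for illustration only. The real values are
// defined in shenandoahHeap.hpp and may differ.
enum GCStateMask : uint8_t {
  HAS_FORWARDED = 1 << 0,  // deliberately overlaps with the phases below
  MARKING       = 1 << 1,
  EVACUATION    = 1 << 2,
  UPDATEREFS    = 1 << 3,
  PARTIAL       = 1 << 4,
  TRAVERSAL     = 1 << 5
};

// True if any of the phases named in 'mask' is currently active.
inline bool is_gc_state_set(uint8_t state, uint8_t mask) {
  return (state & mask) != 0;
}

// One test covers the whole group of phases that need write barriers.
inline bool need_write_barrier(uint8_t state) {
  return is_gc_state_set(state, EVACUATION | PARTIAL | TRAVERSAL);
}
```

With position-based addressing the same question would take one test per
phase; with masks the group is OR-ed together once and tested in one go.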

Passes hotspot_gc_shenandoah

http://cr.openjdk.java.net/~rkennke/exclusive-gc-phases/webrev.00/

Ok?

Some observations while I did this:
- ShenandoahConditionalSATBBarrier can now be greatly simplified or even
eliminated
- Partial can use machinery from Traversal for speed boost: e.g.
ShenandoahEnqueueBarrier
- Traversal still has a liveness accounting problem

... all of which I will address in followup patches

Roman

From shade at redhat.com  Tue Jan 30 10:07:49 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Tue, 30 Jan 2018 11:07:49 +0100
Subject: RFR: Make major GC phases exclusive from each other
In-Reply-To: <CAAN-KygNkkbTQL1ucuK3Jm+VVoxMn=Mwkc+8gFYxGGsT+kJkDA@mail.gmail.com>
References: <CAAN-KygNkkbTQL1ucuK3Jm+VVoxMn=Mwkc+8gFYxGGsT+kJkDA@mail.gmail.com>
Message-ID: <fe149a00-2f04-52d5-00a3-d09dd474b2d8@redhat.com>

On 01/30/2018 10:54 AM, Roman Kennke wrote:
> This patch makes all the major phases exclusive (marking, evac,
> update-refs, partial and traversal). Barriers are always included, and
> never excluded. This seems cleaner and easier to understand to me.

Yup. Definitely looks better.

> The state bit for 'has-forwarded' is still overlapping. Not sure what to do
> with that.

Nothing, it should be like that by design.

> http://cr.openjdk.java.net/~rkennke/exclusive-gc-phases/webrev.00/

*) The change in shenandoahBarrierSet.cpp is not needed anymore, as the two bits are now exclusive?

 283   if (_heap->is_concurrent_mark_in_progress() && ! _heap->is_concurrent_traversal_in_progress()) {

*) set_gc_state_bit is now a misnomer, I think: should it be set_gc_state_mask?

*) It also seems possible to put the mask exactly once now?

Instead of:

+   set_gc_state_bit(TRAVERSAL, in_progress);
+   set_gc_state_bit(HAS_FORWARDED, in_progress);

Do:

   set_gc_state_bit(HAS_FORWARDED | TRAVERSAL, in_progress);

Thanks,
-Aleksey


From rkennke at redhat.com  Tue Jan 30 10:51:40 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Tue, 30 Jan 2018 11:51:40 +0100
Subject: RFR: Make major GC phases exclusive from each other
In-Reply-To: <fe149a00-2f04-52d5-00a3-d09dd474b2d8@redhat.com>
References: <CAAN-KygNkkbTQL1ucuK3Jm+VVoxMn=Mwkc+8gFYxGGsT+kJkDA@mail.gmail.com>
 <fe149a00-2f04-52d5-00a3-d09dd474b2d8@redhat.com>
Message-ID: <CAAN-KyhYFF1XVtDk-vJuf9PYa72O3RFEw8tt9pLqF8FTdGyfNg@mail.gmail.com>

On Tue, Jan 30, 2018 at 11:07 AM, Aleksey Shipilev <shade at redhat.com> wrote:

> On 01/30/2018 10:54 AM, Roman Kennke wrote:
> > This patch makes all the major phases exclusive (marking, evac,
> > update-refs, partial and traversal). Barriers are always included, and
> > never excluded. This seems cleaner and easier to understand to me.
>
> Yup. Definitely looks better.
>
> > The state bit for 'has-forwarded' is still overlapping. Not sure what to
> do
> > with that.
>
> Nothing, it should be like that by design.
>
>
Ok, good.


> > http://cr.openjdk.java.net/~rkennke/exclusive-gc-phases/webrev.00/
>
> *) The change in shenandoahBarrierSet.cpp is not needed anymore, as the
> two bits are now exclusive?
>
>  283   if (_heap->is_concurrent_mark_in_progress() && !
> _heap->is_concurrent_traversal_in_progress()) {
>
Right. Good catch.


> *) set_gc_state_bit is now misnomer, I think: it is set_gc_state_mask?
>
>
Fixed.


> *) It also seems possible to put the mask exactly once now?
>
> Instead of:
>
> +   set_gc_state_bit(TRAVERSAL, in_progress);
> +   set_gc_state_bit(HAS_FORWARDED, in_progress);
>
> Do:
>
>    set_gc_state_bit(HAS_FORWARDED | TRAVERSAL, in_progress);
>
>
Fixed.

Differential:
http://cr.openjdk.java.net/~rkennke/exclusive-gc-phases/webrev.01.diff/
Full:
http://cr.openjdk.java.net/~rkennke/exclusive-gc-phases/webrev.01/

Ok now?

Roman

From shade at redhat.com  Tue Jan 30 10:55:58 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Tue, 30 Jan 2018 11:55:58 +0100
Subject: RFR: Make major GC phases exclusive from each other
In-Reply-To: <CAAN-KyhYFF1XVtDk-vJuf9PYa72O3RFEw8tt9pLqF8FTdGyfNg@mail.gmail.com>
References: <CAAN-KygNkkbTQL1ucuK3Jm+VVoxMn=Mwkc+8gFYxGGsT+kJkDA@mail.gmail.com>
 <fe149a00-2f04-52d5-00a3-d09dd474b2d8@redhat.com>
 <CAAN-KyhYFF1XVtDk-vJuf9PYa72O3RFEw8tt9pLqF8FTdGyfNg@mail.gmail.com>
Message-ID: <743dbd83-cc87-bd83-e4c6-6f28c9d3338f@redhat.com>

On 01/30/2018 11:51 AM, Roman Kennke wrote:
> Full:
> http://cr.openjdk.java.net/~rkennke/exclusive-gc-phases/webrev.01/

Good. I'd still do a few tune-ups:

 *) Keep the asserts in set/unset in ShenandoahSharedBitmap that the incoming value fits into byte.

 *) Arguments should be "mask", not "bit"
  void ShenandoahHeap::set_gc_state_mask_concurrently(uint bit, bool value) {
  void ShenandoahHeap::set_gc_state_mask(uint bit, bool value) {
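
To make both points concrete, here is a minimal sketch of what I mean. It is
not the actual ShenandoahSharedBitmap: SketchSharedBitmap and the free
function are hypothetical stand-ins, and the real class updates the byte
atomically, which this sketch does not.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, non-atomic stand-in for the shared state byte.
struct SketchSharedBitmap {
  uint8_t value = 0;

  void set(unsigned mask) {
    // Keep the guard: the state lives in a single byte.
    assert(mask <= 0xFF && "mask must fit into a byte");
    value = static_cast<uint8_t>(value | mask);
  }

  void unset(unsigned mask) {
    assert(mask <= 0xFF && "mask must fit into a byte");
    value = static_cast<uint8_t>(value & ~mask);
  }

  bool is_set(unsigned mask) const {
    return (value & mask) != 0;
  }
};

// The argument is named "mask" because callers may pass several bits at once.
inline void set_gc_state_mask(SketchSharedBitmap& state, unsigned mask, bool on) {
  if (on) {
    state.set(mask);
  } else {
    state.unset(mask);
  }
}
```

A caller can then flip HAS_FORWARDED | TRAVERSAL in one call instead of two.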

No need for re-review.

Thanks,
-Aleksey


From roman at kennke.org  Tue Jan 30 11:25:04 2018
From: roman at kennke.org (roman at kennke.org)
Date: Tue, 30 Jan 2018 11:25:04 +0000
Subject: hg: shenandoah/jdk10: Make major GC phases exclusive from each other
Message-ID: <201801301125.w0UBP4ch005284@aojmv0008.oracle.com>

Changeset: dd1b2cd3c66e
Author:    rkennke
Date:      2018-01-30 12:20 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk10/rev/dd1b2cd3c66e

Make major GC phases exclusive from each other

! src/hotspot/cpu/x86/c1_Runtime1_x86.cpp
! src/hotspot/cpu/x86/macroAssembler_x86.cpp
! src/hotspot/cpu/x86/shenandoahBarrierSet_x86.cpp
! src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.cpp
! src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp
! src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp
! src/hotspot/share/gc/shenandoah/shenandoahHeap.inline.hpp
! src/hotspot/share/gc/shenandoah/shenandoahSharedVariables.hpp
! src/hotspot/share/opto/shenandoahSupport.cpp


From rkennke at redhat.com  Tue Jan 30 11:21:49 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Tue, 30 Jan 2018 12:21:49 +0100
Subject: RFR: Make major GC phases exclusive from each other
In-Reply-To: <743dbd83-cc87-bd83-e4c6-6f28c9d3338f@redhat.com>
References: <CAAN-KygNkkbTQL1ucuK3Jm+VVoxMn=Mwkc+8gFYxGGsT+kJkDA@mail.gmail.com>
 <fe149a00-2f04-52d5-00a3-d09dd474b2d8@redhat.com>
 <CAAN-KyhYFF1XVtDk-vJuf9PYa72O3RFEw8tt9pLqF8FTdGyfNg@mail.gmail.com>
 <743dbd83-cc87-bd83-e4c6-6f28c9d3338f@redhat.com>
Message-ID: <CAAN-KyigQEK-0u-PFt7Vr5=SCCuGLAX1_TcK_rLweNDk5HmQdQ@mail.gmail.com>

On Tue, Jan 30, 2018 at 11:55 AM, Aleksey Shipilev <shade at redhat.com> wrote:

> On 01/30/2018 11:51 AM, Roman Kennke wrote:
> > Full:
> > http://cr.openjdk.java.net/~rkennke/exclusive-gc-phases/webrev.01/
>
> Good. I'd still do a few tune-ups:
>
>  *) Keep the asserts in set/unset in ShenandoahSharedBitmap that the
> incoming value fits into byte.
>
>  *) Arguments should be "mask", not "bit"
>   void ShenandoahHeap::set_gc_state_mask_concurrently(uint bit, bool
> value) {
>   void ShenandoahHeap::set_gc_state_mask(uint bit, bool value) {
>
> No need for re-review.
>
Ok. I'm pushing:

Differential:
http://cr.openjdk.java.net/~rkennke/exclusive-gc-phases/webrev.02.diff/
Full:
http://cr.openjdk.java.net/~rkennke/exclusive-gc-phases/webrev.02/

Thanks, Roman

From shade at redhat.com  Tue Jan 30 15:07:16 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Tue, 30 Jan 2018 16:07:16 +0100
Subject: Idea: aliased heap for checking to-space invariant
Message-ID: <06d4e36d-262c-07f4-c131-5eccc63aa7f2@redhat.com>

So I have been walking and muttering to myself how we cannot mprotect(PROT_READ) the collection set,
because we have to accept the fwdptr update in the same page. We used to mprotect cset for
verification, but that code basically mprotect(PROT_WRITE)-ed the page when fwdptr write had
faulted, restarted the fwdptr update, accepting everything else after that too. Thus it became
too racy to be useful. This was the reason for us to ditch that verification part, and instead rely
on explicit ShenandoahStoreCheck machinery.

Then it hit me: the memory protection is enforced on virtual pages, not on physical pages, which
means we can use the aliased heap to accept the fwdptr stores, while normal heap cset is protected
from writes! I.e. have the normal heap WRITE|READ as usual, have the alias heap WRITE|READ as usual,
then when cset is selected WRITE-protect the cset, and watch out for failures. The fwdptr updates
from WB code should instead go via the aliased heap that is WRITE-enabled.
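
A minimal sketch of the mechanism, assuming a POSIX system and using a
temporary file as the shared backing store (the JVM would reserve and map the
heap differently; AliasedRegion, map_aliased and protect_normal_view are
hypothetical names for this example only):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

struct AliasedRegion {
  uint64_t* normal;  // the view everybody else uses
  uint64_t* alias;   // second view of the same pages, for fwdptr-style stores
  size_t    size;
};

// Maps one page of shared memory twice. Returns false on any failure.
inline bool map_aliased(AliasedRegion& r) {
  long page = sysconf(_SC_PAGESIZE);
  if (page <= 0) return false;
  r.size = static_cast<size_t>(page);

  FILE* f = tmpfile();
  if (f == nullptr) return false;
  int fd = fileno(f);
  if (ftruncate(fd, static_cast<off_t>(r.size)) != 0) {
    fclose(f);
    return false;
  }

  // Two MAP_SHARED mappings of the same file offset share physical pages.
  void* a = mmap(nullptr, r.size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  void* b = mmap(nullptr, r.size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  fclose(f);  // the mappings stay valid after the descriptor is closed
  if (a == MAP_FAILED || b == MAP_FAILED) return false;

  r.normal = static_cast<uint64_t*>(a);
  r.alias  = static_cast<uint64_t*>(b);
  return true;
}

// Write-protects only the normal view; the alias stays writable because
// protection is a property of the virtual mapping, not of the physical page.
inline bool protect_normal_view(const AliasedRegion& r) {
  return mprotect(r.normal, r.size, PROT_READ) == 0;
}
```

After protect_normal_view(), a store through r.alias is still accepted and is
visible through r.normal, while any store through r.normal faults.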

This gives us several advantages:
  *) We capture all bad writes mechanically, instead of hoping we covered all ShStoreCheck cases
  *) The upstream exposure in .ad and platform-specific macro-assemblers goes away
  *) Roman's work on aliased heaps is not in vain :)
  *) We don't arrive at the mess with "differently-shaped" pointers to both normal and aliased heap,
because we never leak aliased heap pointers anywhere: we just use that as the location for the
fwdptr CAS.

We can (and probably should) only enable this for verification, so we don't have any ill effects for
non-verification modes (which would just do the same thing they do today).

Thanks,
-Aleksey


From shade at redhat.com  Tue Jan 30 18:25:25 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Tue, 30 Jan 2018 19:25:25 +0100
Subject: RFR: [9] Bulk backports to sh/jdk9
In-Reply-To: <819f0073-b0b9-0708-cec0-d0b55452c0eb@redhat.com>
References: <819f0073-b0b9-0708-cec0-d0b55452c0eb@redhat.com>
Message-ID: <f0a56e43-e8f5-2d83-bc55-ad2c06e8162e@redhat.com>

On 01/26/2018 01:00 PM, Aleksey Shipilev wrote:
> http://cr.openjdk.java.net/~shade/shenandoah/backports/jdk9-20180126/webrev.01/
> 
> Changes include:
> 
> 8735773ec619: Single thread-local GC state flag for all barriers
> 544322604347: ShConcurrentThread races with set_gc_state_bit
> dc779781dd5e: Do not put down update-refs-in-progress flag concurrently
> d55c6d5216d1: Common TLS access to GC state, where possible
> 1d1238a0603b: Defer cleaning of system dictionary and friends to parallel cleaning phase
> fd9724b26fdd: Refactor allocation failure and explicit GC handling
> e5398dce6e7b: Make concurrent precleaning log message optional again
> 26b9048c042a: Make degenerated update-refs use region-set cursor to hand over work
> 1a6a9f288dd2: Bitmap size might not be page aligned when large page is used
> 12654193e434: Demote warning message about OOM-during-evac to informational
> 67294a38c0c7: TestSelectiveBarrierFlags should accept multi-element flag selections
> ecb87af5e0d8: Implement flag to generate write-barriers without membars.
> 820129a799b1: Allocation failure injection machinery
> b8c39bdc0dac: Log message on ref processing, class unload, update refs for mark events
> 45d471869b73: Degenerated GC
> 15261c4a6adf: Degenerated GC: shortcut cycles, upgrade futile cycles
> bd01b07ba0d7: Add ShenandoahRootProcessor API to report threads while scanning roots
> 3a6457fecc72: Relax assert in SBS::is_safe()
> 30e8ba6e2794: VerifyJCStressTest should test all heuristics
> 6183a72bd5c2: ShBS::interpreter_storeval_barrier signature fix and cleanup
> 3c12448ec444: Fix 32-bit build by ifdef-ing non-implemented storeval barrier
> 
> Testing: hotspot_gc_shenandoah {fastdebug|release}, specjvm

Ping.

-Aleksey


From shade at redhat.com  Tue Jan 30 18:26:03 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Tue, 30 Jan 2018 19:26:03 +0100
Subject: RFR: Missing cancelled concgc check results assertion failure
In-Reply-To: <028cd960-5aaa-7e5f-6137-2deaee117166@redhat.com>
References: <028cd960-5aaa-7e5f-6137-2deaee117166@redhat.com>
Message-ID: <7f7f5173-669f-225d-1e8d-e7cd1f70703b@redhat.com>

On 01/26/2018 06:54 PM, Zhengyu Gu wrote:
> Diving into weak reference work without checking for cancelled concgc results in an assertion
> failure about non-empty task queues.
> 
> http://cr.openjdk.java.net/~zgu/shenandoah/tq_cancelled_gc/hs_err.txt
> 
> Webrev:
> http://cr.openjdk.java.net/~zgu/shenandoah/tq_cancelled_gc/webrev.00/

Sounds reasonable to me.

-Aleksey


From rkennke at redhat.com  Tue Jan 30 18:46:17 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Tue, 30 Jan 2018 19:46:17 +0100
Subject: Idea: aliased heap for checking to-space invariant
In-Reply-To: <06d4e36d-262c-07f4-c131-5eccc63aa7f2@redhat.com>
References: <06d4e36d-262c-07f4-c131-5eccc63aa7f2@redhat.com>
Message-ID: <a1f9cd03-2cc4-00b1-f878-c93d062578d6@redhat.com>

On 30.01.2018 at 16:07, Aleksey Shipilev wrote:
> So I have been walking and muttering to myself how we cannot mprotect(PROT_READ) the collection set,
> because we have to accept the fwdptr update in the same page. We used to mprotect cset for
> verification, but that code basically mprotect(PROT_WRITE)-ed the page when fwdptr write had
> faulted, restarted the fwdptr update, accepting everything else after that too. Thus it became
> too racy to be useful. This was the reason for us to ditch that verification part, and instead rely
> on explicit ShenandoahStoreCheck machinery.
> 
> Then it hit me: the memory protection is enforced on virtual pages, not on physical pages, which
> means we can use the aliased heap to accept the fwdptr stores, while normal heap cset is protected
> from writes! I.e. have the normal heap WRITE|READ as usual, have the alias heap WRITE|READ as usual,
> then when cset is selected WRITE-protect the cset, and watch out for failures. The fwdptr updates
> from WB code should instead go via the aliased heap that is WRITE-enabled.
> 
> This gives us several advantages:
>    *) We capture all bad writes mechanically, instead of hoping we covered all ShStoreCheck cases
>    *) The upstream exposure in .ad and platform-specific macro-assemblers goes away
>    *) Roman's work on aliased heaps is not in vain :)
>    *) We don't arrive at the mess with "differently-shaped" pointers to both normal and aliased heap,
> because we never leak aliased heap pointers anywhere: we just use that as the location for the
> fwdptr CAS.
> 
> We can (and probably should) only enable this for verification, so we don't have any ill effects for
> non-verification modes (which would just do the same thing they do today).
> 

This sounds like a great idea! I think it would work.

If we are going to introduce the machinery for multi-mapping (and we 
might eventually get it through ZGC anyway), we might want to think 
about finishing off the safe-oom-during-evac issue. Your inspiration 
inspired me: what if we create the 2nd mapping only on demand? I.e. when 
we hit oom-during-evac, we create the 2nd mapping, use it to safely flip 
the blocking bit in the forwarding pointer, and when the whole 
oom-during-evac sequence is over, we unmap the 2nd mapping. This way 
we'd keep memory counters in the kernel normal until we hit the very 
unlikely error case, and then only for a limited time. WDYT?

Roman


From zgu at redhat.com  Tue Jan 30 19:07:23 2018
From: zgu at redhat.com (Zhengyu Gu)
Date: Tue, 30 Jan 2018 14:07:23 -0500
Subject: RFR: [9] Bulk backports to sh/jdk9
In-Reply-To: <f0a56e43-e8f5-2d83-bc55-ad2c06e8162e@redhat.com>
References: <819f0073-b0b9-0708-cec0-d0b55452c0eb@redhat.com>
 <f0a56e43-e8f5-2d83-bc55-ad2c06e8162e@redhat.com>
Message-ID: <4ac5ddf7-ede8-8ca6-79f1-ebe907ef50a8@redhat.com>

Looks ok to me.

-Zhengyu

On 01/30/2018 01:25 PM, Aleksey Shipilev wrote:
> On 01/26/2018 01:00 PM, Aleksey Shipilev wrote:
>> http://cr.openjdk.java.net/~shade/shenandoah/backports/jdk9-20180126/webrev.01/
>>
>> Changes include:
>>
>> 8735773ec619: Single thread-local GC state flag for all barriers
>> 544322604347: ShConcurrentThread races with set_gc_state_bit
>> dc779781dd5e: Do not put down update-refs-in-progress flag concurrently
>> d55c6d5216d1: Common TLS access to GC state, where possible
>> 1d1238a0603b: Defer cleaning of system dictionary and friends to parallel cleaning phase
>> fd9724b26fdd: Refactor allocation failure and explicit GC handling
>> e5398dce6e7b: Make concurrent precleaning log message optional again
>> 26b9048c042a: Make degenerated update-refs use region-set cursor to hand over work
>> 1a6a9f288dd2: Bitmap size might not be page aligned when large page is used
>> 12654193e434: Demote warning message about OOM-during-evac to informational
>> 67294a38c0c7: TestSelectiveBarrierFlags should accept multi-element flag selections
>> ecb87af5e0d8: Implement flag to generate write-barriers without membars.
>> 820129a799b1: Allocation failure injection machinery
>> b8c39bdc0dac: Log message on ref processing, class unload, update refs for mark events
>> 45d471869b73: Degenerated GC
>> 15261c4a6adf: Degenerated GC: shortcut cycles, upgrade futile cycles
>> bd01b07ba0d7: Add ShenandoahRootProcessor API to report threads while scanning roots
>> 3a6457fecc72: Relax assert in SBS::is_safe()
>> 30e8ba6e2794: VerifyJCStressTest should test all heuristics
>> 6183a72bd5c2: ShBS::interpreter_storeval_barrier signature fix and cleanup
>> 3c12448ec444: Fix 32-bit build by ifdef-ing non-implemented storeval barrier
>>
>> Testing: hotspot_gc_shenandoah {fastdebug|release}, specjvm
> 
> Ping.
> 
> -Aleksey
> 
> 

From zgu at redhat.com  Tue Jan 30 19:40:06 2018
From: zgu at redhat.com (Zhengyu Gu)
Date: Tue, 30 Jan 2018 14:40:06 -0500
Subject: RFR: String deduplication for traversal GC
Message-ID: <4fdbbdc7-65bc-23a5-3b65-054b1c5ec28d@redhat.com>

Please review the implementation of string deduplication for traversal GC.


Webrev: 
http://cr.openjdk.java.net/~zgu/shenandoah/traversal_dedup/webrev.00/


Test:

   hotspot_gc_shenandoah (fastdebug + release)
   specJVM with -XX:+UseStringDeduplication (fastdebug)


Thanks,

-Zhengyu

From rkennke at redhat.com  Tue Jan 30 19:46:16 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Tue, 30 Jan 2018 20:46:16 +0100
Subject: RFR: String deduplication for traversal GC
In-Reply-To: <4fdbbdc7-65bc-23a5-3b65-054b1c5ec28d@redhat.com>
References: <4fdbbdc7-65bc-23a5-3b65-054b1c5ec28d@redhat.com>
Message-ID: <aca8cf03-cd7b-78de-d915-715908d71ffa@redhat.com>

On 30.01.2018 at 20:40, Zhengyu Gu wrote:
> Please review the implementation of string deduplication for traversal GC.
> 
> 
> Webrev: 
> http://cr.openjdk.java.net/~zgu/shenandoah/traversal_dedup/webrev.00/
> 
> 
> Test:
> 
>    hotspot_gc_shenandoah (fastdebug + release)
>    specJVM with -XX:+UseStringDeduplication (fastdebug)
> 
> 
> Thanks,
> 
> -Zhengyu

I wonder if it should be possible to make the closure templated instead 
of making multiple explicit classes, like this:

template <bool STRDEDUP>
class ShenandoahTraversalSuperClosure .. {

..
    template <class T>
    void work(T* p);
}

and then something like:

template <bool STRDEDUP>
class ShenandoahTraversalDedupClosure : public 
ShenandoahTraversalSuperClosure<STRDEDUP> {

I am not totally sure about how to stitch it together, but something 
like this should work? Or maybe it's not worth all the hassle.

(In fact, I suspect something like the above would be possible for the
metadata flag too...)

Roman

From rkennke at redhat.com  Tue Jan 30 19:48:00 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Tue, 30 Jan 2018 20:48:00 +0100
Subject: RFR: String deduplication for traversal GC
In-Reply-To: <aca8cf03-cd7b-78de-d915-715908d71ffa@redhat.com>
References: <4fdbbdc7-65bc-23a5-3b65-054b1c5ec28d@redhat.com>
 <aca8cf03-cd7b-78de-d915-715908d71ffa@redhat.com>
Message-ID: <fc01dd44-9d98-d16a-e5df-f8f8b89ad142@redhat.com>

On 30.01.2018 at 20:46, Roman Kennke wrote:
> On 30.01.2018 at 20:40, Zhengyu Gu wrote:
>> Please review the implementation of string deduplication for traversal 
>> GC.
>>
>>
>> Webrev: 
>> http://cr.openjdk.java.net/~zgu/shenandoah/traversal_dedup/webrev.00/
>>
>>
>> Test:
>>
>>    hotspot_gc_shenandoah (fastdebug + release)
>>    specJVM with -XX:+UseStringDeduplication (fastdebug)
>>
>>
>> Thanks,
>>
>> -Zhengyu
> 
> I wonder if it should be possible to make the closure templated instead 
> of making multiple explicit classes, like this:
> 
> template <bool STRDEDUP>
> class ShenandoahTraversalSuperClosure .. {
> 
> ..
>     template <class T>
>     void work(T* p);
> }
> 
> and then something like:
> 
> template <bool STRDEDUP>
> class ShenandoahTraversalDedupClosure : public 
> ShenandoahTraversalSuperClosure<STRDEDUP> {
> 
> I am not totally sure about how to stitch it together, but something 
> like this should work? Or maybe it's not worth all the hassle.
> 
> (In fact, I suspect something like the above would be possible for the 
> metadata flag too...)
> 
> Roman
> 


Ah, one weirdo in this scheme is the definition of work(), which would 
look something like:

template <bool STRDEDUP>
template <class T>
inline void ShenandoahTraversalSuperClosure<STRDEDUP>::work(T* p) {
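
Stitched together, a self-contained sketch could look like the following. The
Sketch* names, the counters and the do_oop() wrappers are hypothetical and
only there to make it compile on its own (C++11); the real closures carry
heap, queue and dedup state instead. The derived classes here bind the flag
directly rather than being templates themselves.

```cpp
#include <cassert>

// Hypothetical stand-in for the shared super closure.
template <bool STRDEDUP>
class SketchTraversalSuperClosure {
public:
  int visited = 0;
  int dedup_candidates = 0;

  template <class T>
  void work(T* p);
};

// Out-of-class definition of a member template of a class template:
// the class template header comes first, then the member template header.
template <bool STRDEDUP>
template <class T>
inline void SketchTraversalSuperClosure<STRDEDUP>::work(T* p) {
  if (p == nullptr) return;
  visited++;
  if (STRDEDUP) {
    // Compile-time constant: this branch is dropped when STRDEDUP is false.
    dedup_candidates++;
  }
}

// The concrete closures only pick the flag value.
class SketchTraversalClosure : public SketchTraversalSuperClosure<false> {
public:
  template <class T>
  void do_oop(T* p) { work(p); }
};

class SketchTraversalDedupClosure : public SketchTraversalSuperClosure<true> {
public:
  template <class T>
  void do_oop(T* p) { work(p); }
};
```

Calling do_oop() on both closures with the same pointer bumps visited in both,
but dedup_candidates only in the dedup variant.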

Roman

From zgu at redhat.com  Tue Jan 30 20:07:20 2018
From: zgu at redhat.com (Zhengyu Gu)
Date: Tue, 30 Jan 2018 15:07:20 -0500
Subject: RFR: String deduplication for traversal GC
In-Reply-To: <fc01dd44-9d98-d16a-e5df-f8f8b89ad142@redhat.com>
References: <4fdbbdc7-65bc-23a5-3b65-054b1c5ec28d@redhat.com>
 <aca8cf03-cd7b-78de-d915-715908d71ffa@redhat.com>
 <fc01dd44-9d98-d16a-e5df-f8f8b89ad142@redhat.com>
Message-ID: <837ac0a0-08c7-e7c8-8ebd-a611ab50c06e@redhat.com>


On 01/30/2018 02:48 PM, Roman Kennke wrote:
> On 30.01.2018 at 20:46, Roman Kennke wrote:
>> On 30.01.2018 at 20:40, Zhengyu Gu wrote:
>>> Please review the implementation of string deduplication for 
>>> traversal GC.
>>>
>>>
>>> Webrev: 
>>> http://cr.openjdk.java.net/~zgu/shenandoah/traversal_dedup/webrev.00/
>>>
>>>
>>> Test:
>>>
>>>    hotspot_gc_shenandoah (fastdebug + release)
>>>    specJVM with -XX:+UseStringDeduplication (fastdebug)
>>>
>>>
>>> Thanks,
>>>
>>> -Zhengyu
>>
>> I wonder if it should be possible to make the closure templated 
>> instead of making multiple explicit classes, like this:
>>
>> template <bool STRDEDUP>
>> class ShenandoahTraversalSuperClosure .. {
>>
>> ..
>>     template <class T>
>>     void work(T* p);
>> }
>>
>> and then something like:
>>
>> template <bool STRDEDUP>
>> class ShenandoahTraversalDedupClosure : public 
>> ShenandoahTraversalSuperClosure<STRDEDUP> {
>>
>> I am not totally sure about how to stitch it together, but something 
>> like this should work? Or maybe it's not worth all the hassle.
>>
>> (In fact, I suspect something like the above would be possible for the 
>> metadata flag too...)
>>
>> Roman
>>
> 
> 
> Ah, one weirdo in this scheme is the definition of work(), which would 
> look something like:
> 
> template <bool STRDEDUP>
> template <class T>
> inline void ShenandoahTraversalSuperClosure::work(T* p) {

What's the advantage of this style?

Thanks,

-Zhengyu


> 
> Roman

From rkennke at redhat.com  Tue Jan 30 20:08:37 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Tue, 30 Jan 2018 21:08:37 +0100
Subject: RFR: String deduplication for traversal GC
In-Reply-To: <837ac0a0-08c7-e7c8-8ebd-a611ab50c06e@redhat.com>
References: <4fdbbdc7-65bc-23a5-3b65-054b1c5ec28d@redhat.com>
 <aca8cf03-cd7b-78de-d915-715908d71ffa@redhat.com>
 <fc01dd44-9d98-d16a-e5df-f8f8b89ad142@redhat.com>
 <837ac0a0-08c7-e7c8-8ebd-a611ab50c06e@redhat.com>
Message-ID: <A34A165F-CA1D-4C71-AEC8-78594981C0F1@redhat.com>

Less clutter/boilerplate code? ;-) I guess it's mostly a matter of taste.

On 30 January 2018 21:07:20 CET, Zhengyu Gu <zgu at redhat.com> wrote:
>
>
>On 01/30/2018 02:48 PM, Roman Kennke wrote:
>> On 30.01.2018 at 20:46, Roman Kennke wrote:
>>> On 30.01.2018 at 20:40, Zhengyu Gu wrote:
>>>> Please review the implementation of string deduplication for 
>>>> traversal GC.
>>>>
>>>>
>>>> Webrev: 
>>>>
>http://cr.openjdk.java.net/~zgu/shenandoah/traversal_dedup/webrev.00/
>>>>
>>>>
>>>> Test:
>>>>
>>>>    hotspot_gc_shenandoah (fastdebug + release)
>>>>    specJVM with -XX:+UseStringDeduplication (fastdebug)
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> -Zhengyu
>>>
>>> I wonder if it should be possible to make the closure templated 
>>> instead of making multiple explicit classes, like this:
>>>
>>> template <bool STRDEDUP>
>>> class ShenandoahTraversalSuperClosure .. {
>>>
>>> ..
>>>     template <class T>
>>>     void work(T* p);
>>> }
>>>
>>> and then something like:
>>>
>>> template <bool STRDEDUP>
>>> class ShenandoahTraversalDedupClosure : public 
>>> ShenandoahTraversalSuperClosure<STRDEDUP> {
>>>
>>> I am not totally sure about how to stitch it together, but something
>
>>> like this should work? Or maybe it's not worth all the hassle.
>>>
>>> (In fact, I suspect something like the above would be possible for
>the 
>>> metadata flag too...)
>>>
>>> Roman
>>>
>> 
>> 
>> Ah, one weirdo in this scheme is the definition of work(), which
>would 
>> look something like:
>> 
>> template <bool STRDEDUP>
>> template <class T>
>> inline void ShenandoahTraversalSuperClosure::work(T* p) {
>
>What's advantage of this style?
>
>Thanks,
>
>-Zhengyu
>
>
>> 
>> Roman

-- 
This message was sent from my Android device with K-9 Mail.

From zgu at redhat.com  Tue Jan 30 20:12:22 2018
From: zgu at redhat.com (Zhengyu Gu)
Date: Tue, 30 Jan 2018 15:12:22 -0500
Subject: RFR: String deduplication for traversal GC
In-Reply-To: <A34A165F-CA1D-4C71-AEC8-78594981C0F1@redhat.com>
References: <4fdbbdc7-65bc-23a5-3b65-054b1c5ec28d@redhat.com>
 <aca8cf03-cd7b-78de-d915-715908d71ffa@redhat.com>
 <fc01dd44-9d98-d16a-e5df-f8f8b89ad142@redhat.com>
 <837ac0a0-08c7-e7c8-8ebd-a611ab50c06e@redhat.com>
 <A34A165F-CA1D-4C71-AEC8-78594981C0F1@redhat.com>
Message-ID: <415b37b8-3d93-db29-8237-a2a28bc9a7e7@redhat.com>

Let's follow existing style. If we decide to change, we should change 
them all together.

Thanks,

-Zhengyu

On 01/30/2018 03:08 PM, Roman Kennke wrote:
> Less clutter/boilerplate code? ;-) I guess it's mostly a matter of taste.
> 
> On 30 January 2018 21:07:20 CET, Zhengyu Gu <zgu at redhat.com> wrote:
>>
>>
>> On 01/30/2018 02:48 PM, Roman Kennke wrote:
>>> On 30.01.2018 at 20:46, Roman Kennke wrote:
>>>> On 30.01.2018 at 20:40, Zhengyu Gu wrote:
>>>>> Please review the implementation of string deduplication for
>>>>> traversal GC.
>>>>>
>>>>>
>>>>> Webrev:
>>>>>
>> http://cr.openjdk.java.net/~zgu/shenandoah/traversal_dedup/webrev.00/
>>>>>
>>>>>
>>>>> Test:
>>>>>
>>>>>     hotspot_gc_shenandoah (fastdebug + release)
>>>>>     specJVM with -XX:+UseStringDeduplication (fastdebug)
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> -Zhengyu
>>>>
>>>> I wonder if it should be possible to make the closure templated
>>>> instead of making multiple explicit classes, like this:
>>>>
>>>> template <bool STRDEDUP>
>>>> class ShenandoahTraversalSuperClosure .. {
>>>>
>>>> ..
>>>>      template <class T>
>>>>      void work(T* p);
>>>> }
>>>>
>>>> and then something like:
>>>>
>>>> template <bool STRDEDUP>
>>>> class ShenandoahTraversalDedupClosure : public
>>>> ShenandoahTraversalSuperClosure<STRDEDUP> {
>>>>
>>>> I am not totally sure about how to stitch it together, but something
>>
>>>> like this should work? Or maybe it's not worth all the hassle.
>>>>
>>>> (In fact, I suspect something like the above would be possible for
>> the
>>>> metadata flag too...)
>>>>
>>>> Roman
>>>>
>>>
>>>
>>> Ah, one weirdo in this scheme is the definition of work(), which
>> would
>>> look something like:
>>>
>>> template <bool STRDEDUP>
>>> template <class T>
>>> inline void ShenandoahTraversalSuperClosure::work(T* p) {
>>
>> What's advantage of this style?
>>
>> Thanks,
>>
>> -Zhengyu
>>
>>
>>>
>>> Roman
> 

From zgu at redhat.com  Tue Jan 30 20:19:37 2018
From: zgu at redhat.com (Zhengyu Gu)
Date: Tue, 30 Jan 2018 15:19:37 -0500
Subject: RFR: String deduplication for traversal GC
In-Reply-To: <aca8cf03-cd7b-78de-d915-715908d71ffa@redhat.com>
References: <4fdbbdc7-65bc-23a5-3b65-054b1c5ec28d@redhat.com>
 <aca8cf03-cd7b-78de-d915-715908d71ffa@redhat.com>
Message-ID: <3a707760-906c-2f2d-6cd3-7e238269b36b@redhat.com>


On 01/30/2018 02:46 PM, Roman Kennke wrote:
> On 30.01.2018 at 20:40, Zhengyu Gu wrote:
>> Please review the implementation of string deduplication for traversal 
>> GC.
>>
>>
>> Webrev: 
>> http://cr.openjdk.java.net/~zgu/shenandoah/traversal_dedup/webrev.00/
>>
>>
>> Test:
>>
>>    hotspot_gc_shenandoah (fastdebug + release)
>>    specJVM with -XX:+UseStringDeduplication (fastdebug)
>>
>>
>> Thanks,
>>
>> -Zhengyu
> 
> I wonder if it should be possible to make the closure templated instead 
> of making multiple explicit classes, like this:
> 
> template <bool STRDEDUP>
> class ShenandoahTraversalSuperClosure .. {
> 
> ..
>     template <class T>
>     void work(T* p);
> }
> 
> and then something like:

I had a version like this for the early dedup closures, but changed to 
the current style based on shade's comments.

-Zhengyu


> 
> template <bool STRDEDUP>
> class ShenandoahTraversalDedupClosure : public 
> ShenandoahTraversalSuperClosure<STRDEDUP> {
> 
> I am not totally sure about how to stitch it together, but something 
> like this should work? Or maybe it's not worth all the hassle.
> 
> (In fact, I suspect something like the above would be possible for the 
> metadata flag too...)
> 
> Roman

From rkennke at redhat.com  Tue Jan 30 21:21:46 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Tue, 30 Jan 2018 22:21:46 +0100
Subject: RFR: String deduplication for traversal GC
In-Reply-To: <415b37b8-3d93-db29-8237-a2a28bc9a7e7@redhat.com>
References: <4fdbbdc7-65bc-23a5-3b65-054b1c5ec28d@redhat.com>
 <aca8cf03-cd7b-78de-d915-715908d71ffa@redhat.com>
 <fc01dd44-9d98-d16a-e5df-f8f8b89ad142@redhat.com>
 <837ac0a0-08c7-e7c8-8ebd-a611ab50c06e@redhat.com>
 <A34A165F-CA1D-4C71-AEC8-78594981C0F1@redhat.com>
 <415b37b8-3d93-db29-8237-a2a28bc9a7e7@redhat.com>
Message-ID: <00D13E3E-5B3D-408B-BF5E-19BD77B6423D@redhat.com>

Yes, that is fine. The rest of the patch looks good to me too.

On 30 January 2018 21:12:22 CET, Zhengyu Gu <zgu at redhat.com> wrote:
>Let's follow existing style. If we decide to change, we should change 
>them all together.
>
>Thanks,
>
>-Zhengyu
>
>On 01/30/2018 03:08 PM, Roman Kennke wrote:
>> Less clutter/boilerplate code? ;-) I guess it's mostly a matter of taste.
>> 
>> On 30 January 2018 21:07:20 CET, Zhengyu Gu <zgu at redhat.com> wrote:
>>>
>>>
>>> On 01/30/2018 02:48 PM, Roman Kennke wrote:
>>>> On 30.01.2018 at 20:46, Roman Kennke wrote:
>>>>> On 30.01.2018 at 20:40, Zhengyu Gu wrote:
>>>>>> Please review the implementation of string deduplication for
>>>>>> traversal GC.
>>>>>>
>>>>>>
>>>>>> Webrev:
>>>>>>
>>>
>http://cr.openjdk.java.net/~zgu/shenandoah/traversal_dedup/webrev.00/
>>>>>>
>>>>>>
>>>>>> Test:
>>>>>>
>>>>>>     hotspot_gc_shenandoah (fastdebug + release)
>>>>>>     specJVM with -XX:+UseStringDeduplication (fastdebug)
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> -Zhengyu
>>>>>
>>>>> I wonder if it should be possible to make the closure templated
>>>>> instead of making multiple explicit classes, like this:
>>>>>
>>>>> template <bool STRDEDUP>
>>>>> class ShenandoahTraversalSuperClosure .. {
>>>>>
>>>>> ..
>>>>>      template <class T>
>>>>>      void work(T* p);
>>>>> }
>>>>>
>>>>> and then something like:
>>>>>
>>>>> template <bool STRDEDUP>
>>>>> class ShenandoahTraversalDedupClosure : public
>>>>> ShenandoahTraversalSuperClosure<STRDEDUP> {
>>>>>
>>>>> I am not totally sure about how to stitch it together, but
>something
>>>
>>>>> like this should work? Or maybe it's not worth all the hassle.
>>>>>
>>>>> (In fact, I suspect something like the above would be possible for
>>> the
>>>>> metadata flag too...)
>>>>>
>>>>> Roman
>>>>>
>>>>
>>>>
>>>> Ah, one oddity in this scheme is the definition of work(), which would
>>>> look something like:
>>>>
>>>> template <bool STRDEDUP>
>>>> template <class T>
>>>> inline void ShenandoahTraversalSuperClosure<STRDEDUP>::work(T* p) {
>>>
>>> What's the advantage of this style?
>>>
>>> Thanks,
>>>
>>> -Zhengyu
>>>
>>>
>>>>
>>>> Roman
>> 
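A minimal sketch of the templated-closure idea discussed in the quoted thread, assuming simplified stand-ins for the HotSpot closure hierarchy. The names `TraversalSuperClosure`, `TraversalClosure`, `TraversalDedupClosure` and the two counters are hypothetical and are not the actual Shenandoah code. The sketch only shows the `template <bool STRDEDUP>` parameter and the out-of-class definition of `work()`, which needs one template header per level and the `<STRDEDUP>` argument on the class name.

```cpp
#include <cassert>

// Hypothetical stand-in for the shared closure base; not the real HotSpot class.
template <bool STRDEDUP>
class TraversalSuperClosure {
public:
  int visited = 0;       // number of slots processed
  int dedup_checks = 0;  // number of slots offered to string deduplication

  template <class T>
  inline void work(T* p);
};

// Out-of-class definition of a member template of a class template:
// one template header per level, and the class name carries <STRDEDUP>.
template <bool STRDEDUP>
template <class T>
inline void TraversalSuperClosure<STRDEDUP>::work(T* p) {
  assert(p != nullptr);
  visited++;
  if (STRDEDUP) {        // compile-time constant, so the compiler can drop the branch
    dedup_checks++;
  }
}

// The concrete closures only select the flag value.
class TraversalClosure : public TraversalSuperClosure<false> {
public:
  template <class T> void do_oop(T* p) { work(p); }
};

class TraversalDedupClosure : public TraversalSuperClosure<true> {
public:
  template <class T> void do_oop(T* p) { work(p); }
};
```

Whether this is less clutter than explicit classes is, as the thread says, mostly a matter of taste; the existing style in the code base uses explicit classes.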

-- 
Sent from my Android device with K-9 Mail.

From rkennke at redhat.com  Wed Jan 31 10:09:22 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 31 Jan 2018 11:09:22 +0100
Subject: RFR: [9] Bulk backports to sh/jdk9
In-Reply-To: <4ac5ddf7-ede8-8ca6-79f1-ebe907ef50a8@redhat.com>
References: <819f0073-b0b9-0708-cec0-d0b55452c0eb@redhat.com>
 <f0a56e43-e8f5-2d83-bc55-ad2c06e8162e@redhat.com>
 <4ac5ddf7-ede8-8ca6-79f1-ebe907ef50a8@redhat.com>
Message-ID: <CAAN-Kygz7Sn4Lrm6QKV_G7HKUdTEfDS=8breP_EfPLnV2Z8WGA@mail.gmail.com>

Looks good to me too

On Tue, Jan 30, 2018 at 8:07 PM, Zhengyu Gu <zgu at redhat.com> wrote:

> Looks ok to me.
>
> -Zhengyu
>
>
> On 01/30/2018 01:25 PM, Aleksey Shipilev wrote:
>
>> On 01/26/2018 01:00 PM, Aleksey Shipilev wrote:
>>
>>> http://cr.openjdk.java.net/~shade/shenandoah/backports/jdk9-
>>> 20180126/webrev.01/
>>>
>>> Changes include:
>>>
>>> 8735773ec619: Single thread-local GC state flag for all barriers
>>> 544322604347: ShConcurrentThread races with set_gc_state_bit
>>> dc779781dd5e: Do not put down update-refs-in-progress flag concurrently
>>> d55c6d5216d1: Common TLS access to GC state, where possible
>>> 1d1238a0603b: Defer cleaning of system dictionary and friends to
>>> parallel cleaning phase
>>> fd9724b26fdd: Refactor allocation failure and explicit GC handling
>>> e5398dce6e7b: Make concurrent precleaning log message optional again
>>> 26b9048c042a: Make degenerated update-refs use region-set cursor to hand
>>> over work
>>> 1a6a9f288dd2: Bitmap size might not be page aligned when large page is
>>> used
>>> 12654193e434: Demote warning message about OOM-during-evac to
>>> informational
>>> 67294a38c0c7: TestSelectiveBarrierFlags should accept multi-element flag
>>> selections
>>> ecb87af5e0d8: Implement flag to generate write-barriers without membars.
>>> 820129a799b1: Allocation failure injection machinery
>>> b8c39bdc0dac: Log message on ref processing, class unload, update refs
>>> for mark events
>>> 45d471869b73: Degenerated GC
>>> 15261c4a6adf: Degenerated GC: shortcut cycles, upgrade futile cycles
>>> bd01b07ba0d7: Add ShenandoahRootProcessor API to report threads while
>>> scanning roots
>>> 3a6457fecc72: Relax assert in SBS::is_safe()
>>> 30e8ba6e2794: VerifyJCStressTest should test all heuristics
>>> 6183a72bd5c2: ShBS::interpreter_storeval_barrier signature fix and
>>> cleanup
>>> 3c12448ec444: Fix 32-bit build by ifdef-ing non-implemented storeval
>>> barrier
>>>
>>> Testing: hotspot_gc_shenandoah {fastdebug|release}, specjvm
>>>
>>
>> Ping.
>>
>> -Aleksey
>>
>>
>>

From shade at redhat.com  Wed Jan 31 11:28:58 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 31 Jan 2018 12:28:58 +0100
Subject: RFR: Single GCTimer shared by all operations
Message-ID: <da12c226-58d5-8aaa-7fa8-14a283b2a4ab@redhat.com>

http://cr.openjdk.java.net/~shade/shenandoah/single-gc-timer/webrev.01/

Degenerated GC exposed a wrinkle in our GCTimer handling. Full GC has a separate GCTimer (for
legacy reasons?). All other operations run with the GCTimer from ShHeap. Degenerated GC is peculiar: it
may start as a usual operation, but then *continue* as upgraded to Full GC. GCTimers then start to
misbehave with asserts like:

#  assert(_phases->length() <= 1000) failed: Too many recored phases?

The solution/cleanup is to use a single GCTimer, basically letting Full GC use the GCTimer from
ShHeap.
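
A minimal sketch of the ownership change described above, assuming hypothetical stand-in types (`PhaseTimer`, `Heap`, `FullGC`). The real GCTimer classes and the exact reason the assert fires are not shown in this mail, so this only illustrates Full GC borrowing the heap's timer instead of owning a second one; a degenerated cycle that upgrades to Full GC then keeps recording into the same object.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a GC timer that records named phases.
struct PhaseTimer {
  std::vector<std::string> phases;
  void record(const std::string& name) { phases.push_back(name); }
};

// Stand-in for the heap, which owns the single shared timer.
struct Heap {
  PhaseTimer timer;
  PhaseTimer* gc_timer() { return &timer; }
};

// Stand-in for Full GC: it borrows the heap's timer rather than owning one.
struct FullGC {
  void run(Heap& heap) { heap.gc_timer()->record("full-gc"); }
};

// A degenerated cycle that starts normally and may then upgrade to Full GC.
// Both halves record into the same timer.
inline void degenerated_cycle(Heap& heap, FullGC& full_gc, bool upgrade) {
  heap.gc_timer()->record("degenerated-gc");
  if (upgrade) {
    full_gc.run(heap);
  }
}
```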

Testing: hotspot_gc_shenandoah

Thanks,
-Aleksey


From rkennke at redhat.com  Wed Jan 31 11:40:17 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 31 Jan 2018 12:40:17 +0100
Subject: RFR: Single GCTimer shared by all operations
In-Reply-To: <da12c226-58d5-8aaa-7fa8-14a283b2a4ab@redhat.com>
References: <da12c226-58d5-8aaa-7fa8-14a283b2a4ab@redhat.com>
Message-ID: <d5b0f589-a54a-9d7a-eca2-98c5a249b399@redhat.com>

On 31.01.2018 at 12:28, Aleksey Shipilev wrote:
> http://cr.openjdk.java.net/~shade/shenandoah/single-gc-timer/webrev.01/
> 
> Degenerated GC exposed a wrinkle in our GCTimer handling. Full GC has a separate GCTimer (for
> legacy reasons?). All other operations run with the GCTimer from ShHeap. Degenerated GC is peculiar: it
> may start as a usual operation, but then *continue* as upgraded to Full GC. GCTimers then start to
> misbehave with asserts like:
> 
> #  assert(_phases->length() <= 1000) failed: Too many recored phases?
> 
> The solution/cleanup is to use a single GCTimer, basically letting Full GC use the GCTimer from
> ShHeap.
> 
> Testing: hotspot_gc_shenandoah
> 
> Thanks,
> -Aleksey
> 

Seems good. Is there a difference between STWGCTimer and
ConcurrentGCTimer that might come back to bite us?

Roman

From shade at redhat.com  Wed Jan 31 11:42:49 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 31 Jan 2018 12:42:49 +0100
Subject: RFR: Single GCTimer shared by all operations
In-Reply-To: <d5b0f589-a54a-9d7a-eca2-98c5a249b399@redhat.com>
References: <da12c226-58d5-8aaa-7fa8-14a283b2a4ab@redhat.com>
 <d5b0f589-a54a-9d7a-eca2-98c5a249b399@redhat.com>
Message-ID: <b2f9d09d-e6b2-0a09-b962-d532e37cdf9d@redhat.com>

On 01/31/2018 12:40 PM, Roman Kennke wrote:
> On 31.01.2018 at 12:28, Aleksey Shipilev wrote:
>> http://cr.openjdk.java.net/~shade/shenandoah/single-gc-timer/webrev.01/
>>
>> Degenerated GC exposed a wrinkle in our GCTimer handling. Full GC has a separate GCTimer (for
>> legacy reasons?). All other operations run with the GCTimer from ShHeap. Degenerated GC is peculiar: it
>> may start as a usual operation, but then *continue* as upgraded to Full GC. GCTimers then start to
>> misbehave with asserts like:
>>
>> #  assert(_phases->length() <= 1000) failed: Too many recored phases?
>>
>> The solution/cleanup is to use a single GCTimer, basically letting Full GC use the GCTimer from
>> ShHeap.
>>
>> Testing: hotspot_gc_shenandoah
>>
>> Thanks,
>> -Aleksey
>>
> 
> Seems good. Is there a difference between STWGCTimer and ConcurrentGCTimer that might come back to bite us?

I don't think so.

The bigger question is whether that fixes the failures that you see in tests, because I cannot
reproduce the failure on my local machine.

-Aleksey


From rkennke at redhat.com  Wed Jan 31 12:12:46 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 31 Jan 2018 13:12:46 +0100
Subject: RFR: Single GCTimer shared by all operations
In-Reply-To: <b2f9d09d-e6b2-0a09-b962-d532e37cdf9d@redhat.com>
References: <da12c226-58d5-8aaa-7fa8-14a283b2a4ab@redhat.com>
 <d5b0f589-a54a-9d7a-eca2-98c5a249b399@redhat.com>
 <b2f9d09d-e6b2-0a09-b962-d532e37cdf9d@redhat.com>
Message-ID: <164c6ad9-82c3-a836-9362-af8112a8cab5@redhat.com>

On 31.01.2018 at 12:42, Aleksey Shipilev wrote:
> On 01/31/2018 12:40 PM, Roman Kennke wrote:
>> On 31.01.2018 at 12:28, Aleksey Shipilev wrote:
>>> http://cr.openjdk.java.net/~shade/shenandoah/single-gc-timer/webrev.01/
>>>
>>> Degenerated GC exposed a wrinkle in our GCTimer handling. Full GC has a separate GCTimer (for
>>> legacy reasons?). All other operations run with the GCTimer from ShHeap. Degenerated GC is peculiar: it
>>> may start as a usual operation, but then *continue* as upgraded to Full GC. GCTimers then start to
>>> misbehave with asserts like:
>>>
>>> #  assert(_phases->length() <= 1000) failed: Too many recored phases?
>>>
>>> The solution/cleanup is to use a single GCTimer, basically letting Full GC use the GCTimer from
>>> ShHeap.
>>>
>>> Testing: hotspot_gc_shenandoah
>>>
>>> Thanks,
>>> -Aleksey
>>>
>>
>> Seems good. Is there a difference between STWGCTimer and ConcurrentGCTimer that might come back to bite us?
> 
> I don't think so.
> 
> The bigger question is whether that fixes the failures that you see in tests, because I cannot
> reproduce the failure on my local machine.
> 
> -Aleksey
> 

Seems good in the few runs I could make. Let's push it and do more
testing once it's in.

Roman

From zgu at redhat.com  Wed Jan 31 13:24:51 2018
From: zgu at redhat.com (zgu at redhat.com)
Date: Wed, 31 Jan 2018 13:24:51 +0000
Subject: hg: shenandoah/jdk10: String deduplication for traversal GC
Message-ID: <201801311324.w0VDOpgc017315@aojmv0008.oracle.com>

Changeset: 6657b88f3a63
Author:    zgu
Date:      2018-01-31 08:19 -0500
URL:       http://hg.openjdk.java.net/shenandoah/jdk10/rev/6657b88f3a63

String deduplication for traversal GC

! src/hotspot/share/gc/shenandoah/shenandoahOopClosures.hpp
! src/hotspot/share/gc/shenandoah/shenandoahOopClosures.inline.hpp
! src/hotspot/share/gc/shenandoah/shenandoahTraversalGC.cpp
! src/hotspot/share/gc/shenandoah/shenandoahTraversalGC.hpp
! src/hotspot/share/gc/shenandoah/shenandoahTraversalGC.inline.hpp
! src/hotspot/share/gc/shenandoah/shenandoah_specialized_oop_closures.hpp
! test/hotspot/jtreg/gc/shenandoah/ShenandoahStrDedupStress.java


From zgu at redhat.com  Wed Jan 31 13:26:33 2018
From: zgu at redhat.com (zgu at redhat.com)
Date: Wed, 31 Jan 2018 13:26:33 +0000
Subject: hg: shenandoah/jdk10: Missing cancelled concgc check results
 assertion failure
Message-ID: <201801311326.w0VDQXHD017862@aojmv0008.oracle.com>

Changeset: 3ef7ac462979
Author:    zgu
Date:      2018-01-31 08:22 -0500
URL:       http://hg.openjdk.java.net/shenandoah/jdk10/rev/3ef7ac462979

Missing cancelled concgc check results assertion failure

! src/hotspot/share/gc/shenandoah/shenandoahTraversalGC.cpp


From ashipile at redhat.com  Wed Jan 31 13:49:48 2018
From: ashipile at redhat.com (ashipile at redhat.com)
Date: Wed, 31 Jan 2018 13:49:48 +0000
Subject: hg: shenandoah/jdk10: Single GCTimer shared by all operations
Message-ID: <201801311349.w0VDnm35025225@aojmv0008.oracle.com>

Changeset: 4050463704a4
Author:    shade
Date:      2018-01-31 12:29 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk10/rev/4050463704a4

Single GCTimer shared by all operations

! src/hotspot/share/gc/shenandoah/shenandoahConcurrentMark.cpp
! src/hotspot/share/gc/shenandoah/shenandoahConcurrentThread.cpp
! src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp
! src/hotspot/share/gc/shenandoah/shenandoahMarkCompact.cpp
! src/hotspot/share/gc/shenandoah/shenandoahMarkCompact.hpp
! src/hotspot/share/gc/shenandoah/shenandoahUtils.cpp
! src/hotspot/share/gc/shenandoah/shenandoahUtils.hpp


From ashipile at redhat.com  Wed Jan 31 15:26:42 2018
From: ashipile at redhat.com (ashipile at redhat.com)
Date: Wed, 31 Jan 2018 15:26:42 +0000
Subject: hg: shenandoah/jdk9/hotspot: 21 new changesets
Message-ID: <201801311526.w0VFQhEN028358@aojmv0008.oracle.com>

Changeset: 489bec20624c
Author:    shade
Date:      2018-01-15 12:19 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/489bec20624c

[backport] Single thread-local GC state flag for all barriers

! src/cpu/aarch64/vm/c1_Runtime1_aarch64.cpp
! src/cpu/aarch64/vm/macroAssembler_aarch64.cpp
! src/cpu/aarch64/vm/shenandoahBarrierSet_aarch64.cpp
! src/cpu/x86/vm/c1_Runtime1_x86.cpp
! src/cpu/x86/vm/macroAssembler_x86.cpp
! src/cpu/x86/vm/shenandoahBarrierSet_x86.cpp
! src/cpu/x86/vm/x86_64.ad
! src/share/vm/gc/shenandoah/shenandoahBarrierSet.cpp
! src/share/vm/gc/shenandoah/shenandoahConcurrentMark.cpp
! src/share/vm/gc/shenandoah/shenandoahConcurrentThread.cpp
! src/share/vm/gc/shenandoah/shenandoahHeap.cpp
! src/share/vm/gc/shenandoah/shenandoahHeap.hpp
! src/share/vm/gc/shenandoah/shenandoahHeap.inline.hpp
! src/share/vm/gc/shenandoah/shenandoahMarkCompact.cpp
! src/share/vm/gc/shenandoah/shenandoahSharedVariables.hpp
! src/share/vm/gc/shenandoah/shenandoahVerifier.cpp
! src/share/vm/opto/cfgnode.hpp
! src/share/vm/opto/compile.cpp
! src/share/vm/opto/graphKit.cpp
! src/share/vm/opto/ifnode.cpp
! src/share/vm/opto/memnode.hpp
! src/share/vm/opto/node.hpp
! src/share/vm/opto/shenandoahSupport.cpp
! src/share/vm/runtime/thread.cpp
! src/share/vm/runtime/thread.hpp
! src/share/vm/runtime/thread.inline.hpp

Changeset: 447b871ee85b
Author:    shade
Date:      2018-01-16 20:23 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/447b871ee85b

[backport] ShConcurrentThread races with set_gc_state_bit

! src/share/vm/gc/shenandoah/shenandoahConcurrentThread.cpp
! src/share/vm/gc/shenandoah/shenandoahHeap.cpp
! src/share/vm/gc/shenandoah/shenandoahHeap.hpp

Changeset: f667c875b72d
Author:    shade
Date:      2018-01-22 12:04 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/f667c875b72d

[backport] Do not put down update-refs-in-progress flag concurrently

! src/share/vm/gc/shenandoah/shenandoahConcurrentThread.cpp
! src/share/vm/gc/shenandoah/shenandoahHeap.cpp
! src/share/vm/gc/shenandoah/shenandoahHeap.hpp
! src/share/vm/gc/shenandoah/shenandoahMarkCompact.cpp

Changeset: ba8a39b9672d
Author:    shade
Date:      2018-01-15 12:32 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/ba8a39b9672d

[backport] Common TLS access to GC state, where possible

! src/share/vm/gc/shenandoah/shenandoah_globals.hpp
! src/share/vm/opto/graphKit.cpp
! src/share/vm/opto/loopnode.cpp
! src/share/vm/opto/loopnode.hpp
! src/share/vm/opto/shenandoahSupport.cpp
! src/share/vm/opto/shenandoahSupport.hpp
+ test/gc/shenandoah/compiler/TestCommonGCLoads.java

Changeset: 2ed987e64f80
Author:    rkennke
Date:      2018-01-17 15:33 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/2ed987e64f80

[backport] Defer cleaning of system dictionary and friends to parallel cleaning phase

! src/share/vm/gc/shenandoah/shenandoahHeap.cpp

Changeset: 4c58342d9fc1
Author:    shade
Date:      2018-01-17 15:37 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/4c58342d9fc1

[backport] Refactor allocation failure and explicit GC handling

! src/share/vm/gc/shared/gcCause.cpp
! src/share/vm/gc/shared/gcCause.hpp
! src/share/vm/gc/shenandoah/shenandoahCollectorPolicy.cpp
! src/share/vm/gc/shenandoah/shenandoahCollectorPolicy.hpp
! src/share/vm/gc/shenandoah/shenandoahConcurrentThread.cpp
! src/share/vm/gc/shenandoah/shenandoahConcurrentThread.hpp
! src/share/vm/gc/shenandoah/shenandoahHeap.cpp
! src/share/vm/gc/shenandoah/shenandoahHeap.hpp
! src/share/vm/gc/shenandoah/shenandoahHeap.inline.hpp
! src/share/vm/gc/shenandoah/shenandoah_globals.hpp

Changeset: 417fb8d6c4d0
Author:    shade
Date:      2018-01-22 10:10 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/417fb8d6c4d0

[backport] Make concurrent precleaning log message optional again

! src/share/vm/gc/shenandoah/shenandoahHeap.cpp

Changeset: d6298f7d7545
Author:    shade
Date:      2018-01-17 16:08 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/d6298f7d7545

[backport] Make degenerated update-refs use region-set cursor to hand over work

! src/share/vm/gc/shenandoah/shenandoahHeap.cpp

Changeset: 260edcc9f8a2
Author:    zgu
Date:      2018-01-18 08:23 -0500
URL:       http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/260edcc9f8a2

[backport] Bitmap size might not be page aligned when large page is used

! src/share/vm/gc/shenandoah/shenandoahHeap.cpp

Changeset: fd14b29d82d7
Author:    shade
Date:      2018-01-19 11:52 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/fd14b29d82d7

[backport] Demote warning message about OOM-during-evac to informational

! src/share/vm/gc/shenandoah/shenandoahConcurrentThread.cpp

Changeset: 939b89fc6bd3
Author:    shade
Date:      2018-01-19 16:27 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/939b89fc6bd3

[backport] TestSelectiveBarrierFlags should accept multi-element flag selections

! test/gc/shenandoah/TestSelectiveBarrierFlags.java

Changeset: 18f77577944a
Author:    rkennke
Date:      2018-01-19 18:40 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/18f77577944a

[backport] Implement flag to generate write-barriers without membars.

! src/share/vm/gc/shenandoah/shenandoah_globals.hpp
! src/share/vm/opto/compile.cpp
! src/share/vm/opto/shenandoahSupport.cpp

Changeset: 882e15472997
Author:    shade
Date:      2018-01-19 18:49 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/882e15472997

[backport] Allocation failure injection machinery

! src/share/vm/gc/shenandoah/shenandoahHeap.cpp
! src/share/vm/gc/shenandoah/shenandoahHeap.hpp
! src/share/vm/gc/shenandoah/shenandoah_globals.hpp
! test/gc/shenandoah/LotsOfCycles.java
! test/gc/shenandoah/acceptance/AllocIntArrays.java
! test/gc/shenandoah/acceptance/AllocObjectArrays.java
! test/gc/shenandoah/acceptance/AllocObjects.java
! test/gc/shenandoah/acceptance/RetainObjects.java
! test/gc/shenandoah/acceptance/SieveObjects.java
! test/gc/stress/TestGCOldWithShenandoah.java
! test/gc/stress/gcbasher/TestGCBasherWithShenandoah.java
! test/gc/stress/gclocker/TestGCLockerWithShenandoah.java

Changeset: 93865bd554e1
Author:    shade
Date:      2018-01-22 10:47 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/93865bd554e1

[backport] Log message on ref processing, class unload, update refs for mark events

! src/share/vm/gc/shenandoah/shenandoahHeap.cpp

Changeset: 5cfc9680da7d
Author:    shade
Date:      2018-01-22 12:52 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/5cfc9680da7d

[backport] Degenerated GC

! src/share/vm/gc/shenandoah/shenandoahCollectorPolicy.cpp
! src/share/vm/gc/shenandoah/shenandoahCollectorPolicy.hpp
! src/share/vm/gc/shenandoah/shenandoahConcurrentThread.cpp
! src/share/vm/gc/shenandoah/shenandoahConcurrentThread.hpp
! src/share/vm/gc/shenandoah/shenandoahHeap.cpp
! src/share/vm/gc/shenandoah/shenandoahHeap.hpp
! src/share/vm/gc/shenandoah/shenandoahPhaseTimings.cpp
! src/share/vm/gc/shenandoah/shenandoahPhaseTimings.hpp
! src/share/vm/gc/shenandoah/shenandoahUtils.hpp
! src/share/vm/gc/shenandoah/shenandoahVerifier.cpp
! src/share/vm/gc/shenandoah/shenandoahVerifier.hpp
! src/share/vm/gc/shenandoah/shenandoahWorkerPolicy.cpp
! src/share/vm/gc/shenandoah/shenandoahWorkerPolicy.hpp
! src/share/vm/gc/shenandoah/shenandoah_globals.hpp
! src/share/vm/gc/shenandoah/vm_operations_shenandoah.cpp
! src/share/vm/gc/shenandoah/vm_operations_shenandoah.hpp
! src/share/vm/runtime/vm_operations.hpp

Changeset: 9240f42fb9d1
Author:    shade
Date:      2018-01-24 15:30 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/9240f42fb9d1

[backport] Degenerated GC: shortcut cycles, upgrade futile cycles

! src/share/vm/gc/shenandoah/shenandoahHeap.cpp
! src/share/vm/gc/shenandoah/shenandoahHeap.hpp

Changeset: fd4837b82b06
Author:    rkennke
Date:      2018-01-23 21:20 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/fd4837b82b06

[backport] Add ShenandoahRootProcessor API to report threads while scanning roots

! src/share/vm/gc/shenandoah/shenandoahConcurrentMark.cpp
! src/share/vm/gc/shenandoah/shenandoahHeap.cpp
! src/share/vm/gc/shenandoah/shenandoahMarkCompact.cpp
! src/share/vm/gc/shenandoah/shenandoahRootProcessor.cpp
! src/share/vm/gc/shenandoah/shenandoahRootProcessor.hpp

Changeset: bfa5f2485433
Author:    rkennke
Date:      2018-01-24 15:09 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/bfa5f2485433

[backport] Relax assert in SBS::is_safe()

! src/share/vm/gc/shenandoah/shenandoahBarrierSet.cpp

Changeset: 1be91cb7a447
Author:    shade
Date:      2018-01-24 19:14 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/1be91cb7a447

[backport] VerifyJCStressTest should test all heuristics

! test/gc/shenandoah/acceptance/VerifyJCStressTest.java

Changeset: 4c7ca6405439
Author:    shade
Date:      2018-01-25 11:24 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/4c7ca6405439

[backport] ShBS::interpreter_storeval_barrier signature fix and cleanup

! src/cpu/aarch64/vm/shenandoahBarrierSet_aarch64.cpp
! src/cpu/aarch64/vm/templateTable_aarch64.cpp
! src/cpu/x86/vm/shenandoahBarrierSet_x86.cpp
! src/cpu/x86/vm/templateTable_x86.cpp
! src/share/vm/gc/shared/barrierSet.hpp
! src/share/vm/gc/shenandoah/shenandoahBarrierSet.hpp

Changeset: a5e7ea380dc5
Author:    shade
Date:      2018-01-25 18:44 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/a5e7ea380dc5

[backport] Fix 32-bit build by ifdef-ing non-implemented storeval barrier

! src/cpu/x86/vm/shenandoahBarrierSet_x86.cpp


From rwestrel at redhat.com  Wed Jan 31 15:31:43 2018
From: rwestrel at redhat.com (Roland Westrelin)
Date: Wed, 31 Jan 2018 16:31:43 +0100
Subject: RFR: backport of 8191887
Message-ID: <dk67eryrqn4.fsf@rwestrel.remote.csb>


I hit 8191887 when running specjvm with Shenandoah. This was fixed
upstream so I propose we cherry pick it. The fix doesn't apply
cleanly so here it is on top of the current shenandoah repo:

http://cr.openjdk.java.net/~roland/shenandoah/8191887/webrev.00/

Roland.

From rwestrel at redhat.com  Wed Jan 31 15:34:08 2018
From: rwestrel at redhat.com (Roland Westrelin)
Date: Wed, 31 Jan 2018 16:34:08 +0100
Subject: RFR: fix loop unswitching with -XX:-ShenandoahWriteBarrierMemBar
Message-ID: <dk64ln2rqj3.fsf@rwestrel.remote.csb>


http://cr.openjdk.java.net/~roland/shenandoah/loop_unswitching%2b-ShenandoahWriteBarrierMemBar/webrev.00/

This fixes:

http://mail.openjdk.java.net/pipermail/shenandoah-dev/2018-January/004738.html

Part of the logic required for this to work (the code added by the
patch) also got lost at some point.

Roland.

From shade at redhat.com  Wed Jan 31 15:43:26 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 31 Jan 2018 16:43:26 +0100
Subject: RFR: backport of 8191887
In-Reply-To: <dk67eryrqn4.fsf@rwestrel.remote.csb>
References: <dk67eryrqn4.fsf@rwestrel.remote.csb>
Message-ID: <4b18ee64-8e92-0d72-fa4c-002fbdf48d9f@redhat.com>

On 01/31/2018 04:31 PM, Roland Westrelin wrote:
> I hit 8191887 when running specjvm with Shenandoah. This was fixed
> upstream so I propose we cherry pick it. The fix doesn't apply
> cleanly so here it is on top of the current shenandoah repo:
> 
> http://cr.openjdk.java.net/~roland/shenandoah/8191887/webrev.00/

Yes please, anything that helps resolve conflicts during the merges is cool.

Please push it as:
 "Cherry-pick 8191887: assert(b->is_Bool()) in PhaseIdealLoop::clone_iff() due to Opaque4 node"

-Aleksey


From shade at redhat.com  Wed Jan 31 15:45:31 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 31 Jan 2018 16:45:31 +0100
Subject: RFR: fix loop unswitching with -XX:-ShenandoahWriteBarrierMemBar
In-Reply-To: <dk64ln2rqj3.fsf@rwestrel.remote.csb>
References: <dk64ln2rqj3.fsf@rwestrel.remote.csb>
Message-ID: <689fd86b-a6e2-fefd-29ac-548cc5e1cce8@redhat.com>

On 01/31/2018 04:34 PM, Roland Westrelin wrote:
> http://cr.openjdk.java.net/~roland/shenandoah/loop_unswitching%2b-ShenandoahWriteBarrierMemBar/webrev.00/

Looks good to me!

4059     Node* load = iff->in(1)->in(1)->in(1)->in(1);

Do we care anywhere else about the deeper chain from that iff to the actual load? I.e., is no other
code broken by the new graph shape?
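
To illustrate the concern about `iff->in(1)->in(1)->in(1)->in(1)`: a hard-coded chain assumes one exact graph shape. The sketch below is only an illustration under stated assumptions. It uses a hypothetical minimal `Node` type (not C2's `Node` class) and a hypothetical helper `input_chain` that follows a fixed number of `in(1)` edges and returns `nullptr` when the chain is shorter than expected.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal graph node; in(i) returns input i, or nullptr if absent.
struct Node {
  std::vector<Node*> inputs;
  Node* in(std::size_t i) const {
    return i < inputs.size() ? inputs[i] : nullptr;
  }
};

// Follow `depth` consecutive in(1) edges starting at n.
// Returns nullptr if any link along the way is missing.
inline Node* input_chain(Node* n, int depth) {
  for (int i = 0; i < depth && n != nullptr; i++) {
    n = n->in(1);
  }
  return n;
}
```

Code that expects the shorter chain would get a different node (or none) from the same walk, which is why the question about other users of this shape matters.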

> Part of the logic required for this to work (the code added by the
> patch) also got lost at some point.

That makes sense.

-Aleksey


From rwestrel at redhat.com  Wed Jan 31 16:03:16 2018
From: rwestrel at redhat.com (rwestrel at redhat.com)
Date: Wed, 31 Jan 2018 16:03:16 +0000
Subject: hg: shenandoah/jdk10: Cherry-pick 8191887: assert(b->is_Bool()) in
 PhaseIdealLoop::clone_iff() due to Opaque4 node
Message-ID: <201801311603.w0VG3Hcu012590@aojmv0008.oracle.com>

Changeset: e3d076dce734
Author:    roland
Date:      2018-01-31 16:26 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk10/rev/e3d076dce734

Cherry-pick 8191887: assert(b->is_Bool()) in PhaseIdealLoop::clone_iff() due to Opaque4 node
Summary: add special handling for graph shape If->Opaque4->Bool->CmpP
Reviewed-by: kvn

! src/hotspot/share/opto/loopnode.hpp
! src/hotspot/share/opto/loopopts.cpp
+ test/hotspot/jtreg/compiler/unsafe/TestLoopUnswitching.java


From rwestrel at redhat.com  Wed Jan 31 16:04:36 2018
From: rwestrel at redhat.com (Roland Westrelin)
Date: Wed, 31 Jan 2018 17:04:36 +0100
Subject: RFR: fix loop unswitching with -XX:-ShenandoahWriteBarrierMemBar
In-Reply-To: <689fd86b-a6e2-fefd-29ac-548cc5e1cce8@redhat.com>
References: <dk64ln2rqj3.fsf@rwestrel.remote.csb>
 <689fd86b-a6e2-fefd-29ac-548cc5e1cce8@redhat.com>
Message-ID: <dk61si6rp4b.fsf@rwestrel.remote.csb>


> Do we care anywhere else about the deeper chain from that iff to the actual load? I.e., is no other
> code broken by the new graph shape?

Not as far as I can tell.

Roland.

From shade at redhat.com  Wed Jan 31 16:05:21 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 31 Jan 2018 17:05:21 +0100
Subject: RFR: fix loop unswitching with -XX:-ShenandoahWriteBarrierMemBar
In-Reply-To: <dk61si6rp4b.fsf@rwestrel.remote.csb>
References: <dk64ln2rqj3.fsf@rwestrel.remote.csb>
 <689fd86b-a6e2-fefd-29ac-548cc5e1cce8@redhat.com>
 <dk61si6rp4b.fsf@rwestrel.remote.csb>
Message-ID: <2b5cfaa0-5809-9732-bcbf-f7ab746f1d9e@redhat.com>

On 01/31/2018 05:04 PM, Roland Westrelin wrote:
>> Do we care anywhere else about the deeper chain from that iff to the actual load? I.e., is no other
>> code broken by the new graph shape?
> 
> Not as far as I can tell.

All good then.

-Aleksey


From rwestrel at redhat.com  Wed Jan 31 16:17:31 2018
From: rwestrel at redhat.com (rwestrel at redhat.com)
Date: Wed, 31 Jan 2018 16:17:31 +0000
Subject: hg: shenandoah/jdk10: fix -ShenandoahWriteBarrierMemBar and loop
 unswitching
Message-ID: <201801311617.w0VGHVsL017798@aojmv0008.oracle.com>

Changeset: af9272163588
Author:    roland
Date:      2018-01-31 16:17 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk10/rev/af9272163588

fix -ShenandoahWriteBarrierMemBar and loop unswitching

! src/hotspot/share/opto/shenandoahSupport.cpp


From shade at redhat.com  Wed Jan 31 16:49:21 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 31 Jan 2018 17:49:21 +0100
Subject: RFR: Make major GC phases exclusive from each other
In-Reply-To: <CAAN-KyigQEK-0u-PFt7Vr5=SCCuGLAX1_TcK_rLweNDk5HmQdQ@mail.gmail.com>
References: <CAAN-KygNkkbTQL1ucuK3Jm+VVoxMn=Mwkc+8gFYxGGsT+kJkDA@mail.gmail.com>
 <fe149a00-2f04-52d5-00a3-d09dd474b2d8@redhat.com>
 <CAAN-KyhYFF1XVtDk-vJuf9PYa72O3RFEw8tt9pLqF8FTdGyfNg@mail.gmail.com>
 <743dbd83-cc87-bd83-e4c6-6f28c9d3338f@redhat.com>
 <CAAN-KyigQEK-0u-PFt7Vr5=SCCuGLAX1_TcK_rLweNDk5HmQdQ@mail.gmail.com>
Message-ID: <675232a7-9eeb-8ff4-5c6e-038d7b137857@redhat.com>

On 01/30/2018 12:21 PM, Roman Kennke wrote:
> Differential:
> http://cr.openjdk.java.net/~rkennke/exclusive-gc-phases/webrev.02.diff/
> <http://cr.openjdk.java.net/%7Erkennke/exclusive-gc-phases/webrev.02.diff/>
> Full:
> http://cr.openjdk.java.net/~rkennke/exclusive-gc-phases/webrev.02/

Oh wait, where are AArch64 parts?

-Aleksey


From shade at redhat.com  Wed Jan 31 17:37:11 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 31 Jan 2018 18:37:11 +0100
Subject: RFR [9] 2018-02-01: Bulk backports to sh/jdk9
Message-ID: <5ab32b43-3d02-60f7-4a2e-4f611b1b8f4b@redhat.com>

http://cr.openjdk.java.net/~shade/shenandoah/backports/jdk9-20180201/webrev.01/

This backports the follow-up bugfixes we have recently found to sh/jdk9:

16198c705496: [backport] Conditionalize PerfDataMemorySize on enabled heap sampling
dd1b2cd3c66e: [backport] Make major GC phases exclusive from each other
4050463704a4: [backport] Single GCTimer shared by all operations
af9272163588: [backport] fix -ShenandoahWriteBarrierMemBar and loop unswitching

sh/jdk10 nightly is running with them now to verify separately.

Testing: hotspot_gc_shenandoah {fastdebug|release}

Thanks,
-Aleksey


From rkennke at redhat.com  Wed Jan 31 17:39:05 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 31 Jan 2018 18:39:05 +0100
Subject: RFR: Make major GC phases exclusive from each other
In-Reply-To: <675232a7-9eeb-8ff4-5c6e-038d7b137857@redhat.com>
References: <CAAN-KygNkkbTQL1ucuK3Jm+VVoxMn=Mwkc+8gFYxGGsT+kJkDA@mail.gmail.com>
 <fe149a00-2f04-52d5-00a3-d09dd474b2d8@redhat.com>
 <CAAN-KyhYFF1XVtDk-vJuf9PYa72O3RFEw8tt9pLqF8FTdGyfNg@mail.gmail.com>
 <743dbd83-cc87-bd83-e4c6-6f28c9d3338f@redhat.com>
 <CAAN-KyigQEK-0u-PFt7Vr5=SCCuGLAX1_TcK_rLweNDk5HmQdQ@mail.gmail.com>
 <675232a7-9eeb-8ff4-5c6e-038d7b137857@redhat.com>
Message-ID: <ED4A24C2-7BF9-4C76-9929-2087B0943A29@redhat.com>

Will do them as soon as possible.

On 31 January 2018 17:49:21 CET, Aleksey Shipilev <shade at redhat.com> wrote:
>On 01/30/2018 12:21 PM, Roman Kennke wrote:
>> Differential:
>>
>http://cr.openjdk.java.net/~rkennke/exclusive-gc-phases/webrev.02.diff/
>>
><http://cr.openjdk.java.net/%7Erkennke/exclusive-gc-phases/webrev.02.diff/>
>> Full:
>> http://cr.openjdk.java.net/~rkennke/exclusive-gc-phases/webrev.02/
>
>Oh wait, where are AArch64 parts?
>
>-Aleksey

-- 
Sent from my Android device with K-9 Mail.

From rkennke at redhat.com  Wed Jan 31 19:14:54 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 31 Jan 2018 20:14:54 +0100
Subject: RFR: fix loop unswitching with -XX:-ShenandoahWriteBarrierMemBar
In-Reply-To: <dk61si6rp4b.fsf@rwestrel.remote.csb>
References: <dk64ln2rqj3.fsf@rwestrel.remote.csb>
 <689fd86b-a6e2-fefd-29ac-548cc5e1cce8@redhat.com>
 <dk61si6rp4b.fsf@rwestrel.remote.csb>
Message-ID: <d07295f5-ba76-e8a2-4cc2-8ca614ba4151@redhat.com>

On 31.01.2018 at 17:04, Roland Westrelin wrote:
> 
>> Do we care anywhere else about the deeper chain from that iff to the actual load? I.e., is no other
>> code broken by the new graph shape?
> 
> Not as far as I can tell.
> 
> Roland.
> 


With the patch, -XX:-ShenandoahWriteBarrierMemBar does not crash 
anymore, but it's significantly slower than with membar... like 75% 
slower. Which seems illogical.

tried with:
-XX:+UseShenandoahGC -XX:+UnlockDiagnosticVMOptions 
-XX:+UnlockExperimentalVMOptions -Xms4g -Xmx4g 
-XX:ShenandoahGCHeuristics=traversal -XX:ShenandoahFreeThreshold=17 
-XX:-ShenandoahWriteBarrierMemBar

on specjvm compiler.

Roman

From rkennke at redhat.com  Wed Jan 31 19:59:26 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 31 Jan 2018 20:59:26 +0100
Subject: RFR: Don't treat allocation regions implicitely live during traversal
 GC
Message-ID: <d4c2d842-b4c8-08f4-ed3c-f10489cf9f7f@redhat.com>

Until now, we treated regions allocated between GC cycles as all live
in traversal GC. This seems inconsistent: we are not treating alloc
regions as live during the cycle. This means that all the allocated garbage
has to pass through one complete cycle to count its liveness, and
then another cycle to clear it up. This patch changes this to
traverse+clear alloc regions in the next cycle.

This gives some applications a huge boost. E.g. compiler.compiler goes
from 180ops/m to around 205ops/m in my tests.

http://cr.openjdk.java.net/~rkennke/better-traversal-heuristics/webrev.00/

Ok?

Roman

From shade at redhat.com  Wed Jan 31 20:03:50 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 31 Jan 2018 21:03:50 +0100
Subject: RFR: Don't treat allocation regions implicitely live during
 traversal GC
In-Reply-To: <d4c2d842-b4c8-08f4-ed3c-f10489cf9f7f@redhat.com>
References: <d4c2d842-b4c8-08f4-ed3c-f10489cf9f7f@redhat.com>
Message-ID: <c8742a23-2d36-be0b-4487-75778c48ec90@redhat.com>

On 01/31/2018 08:59 PM, Roman Kennke wrote:
> Until now, we treated regions allocated between GC cycles as all live in traversal GC. This seems
> inconsistent: we are not treating alloc regions as live during the cycle. This means that all the
> allocated garbage has to pass through one complete cycle to count its liveness, and then
> another cycle to clear it up. This patch changes this to traverse+clear alloc regions in the next
> cycle.
> 
> This gives some applications a huge boost. E.g. compiler.compiler goes from 180ops/m to around
> 205ops/m in my tests.
> 
> http://cr.openjdk.java.net/~rkennke/better-traversal-heuristics/webrev.00/

Um. Doesn't it break non-Traversal GCs? Shared/TLAB allocs would not be counted as live with e.g.
"adaptive"?

-Aleksey


From shade at redhat.com  Wed Jan 31 20:15:10 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 31 Jan 2018 21:15:10 +0100
Subject: RFR: Don't treat allocation regions implicitely live during
 traversal GC
In-Reply-To: <c8742a23-2d36-be0b-4487-75778c48ec90@redhat.com>
References: <d4c2d842-b4c8-08f4-ed3c-f10489cf9f7f@redhat.com>
 <c8742a23-2d36-be0b-4487-75778c48ec90@redhat.com>
Message-ID: <263fedae-8aa3-ca77-3c4b-38710184f97b@redhat.com>

On 01/31/2018 09:03 PM, Aleksey Shipilev wrote:
> On 01/31/2018 08:59 PM, Roman Kennke wrote:
>> Until now, we treated allocation regions from between GC cycles all live in traversal GC. This seems
>> inconsequential: we are not treating alloc regions live during the cycle. This means that all the
>> allocated garbage will have to pass through one complete cycle to count its liveness, and then
>> another cycle to clear it up. This patch changes this to traverse+clear alloc regions in the next
>> cycle.
>>
>> This gives some application a huge boost. E.g. compiler.compiler goes from 180ops/m to around
>> 205ops/m in my tests.
>>
>> http://cr.openjdk.java.net/~rkennke/better-traversal-heuristics/webrev.00/
> 
> Um. Doesn't it break non-Traversal GCs? Shared/TLAB allocs would not be counted as live with e.g.
> "adaptive"?

Yes, it does break half of hotspot_gc_shenandoah. Should be e.g.:

bool ShenandoahFreeSet::implicit_live(ShenandoahHeap::AllocType type) const {
  ShenandoahHeap* heap = ShenandoahHeap::heap();
  if (heap->shenandoahPolicy()->can_do_traversal_gc()) {
    if (heap->is_concurrent_traversal_in_progress()) {
      return false;
    }
    switch (type) {
      case ShenandoahHeap::_alloc_tlab:
      case ShenandoahHeap::_alloc_shared:
        return false;
      case ShenandoahHeap::_alloc_gclab:
      case ShenandoahHeap::_alloc_shared_gc:
        return true;
      default:
        ShouldNotReachHere();
    }
  }
  return true;
}

-Aleksey


From rkennke at redhat.com  Wed Jan 31 20:15:48 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 31 Jan 2018 21:15:48 +0100
Subject: RFR: Don't treat allocation regions implicitely live during
 traversal GC
In-Reply-To: <c8742a23-2d36-be0b-4487-75778c48ec90@redhat.com>
References: <d4c2d842-b4c8-08f4-ed3c-f10489cf9f7f@redhat.com>
 <c8742a23-2d36-be0b-4487-75778c48ec90@redhat.com>
Message-ID: <b47cd381-5678-cd02-d3ce-b6c5df34548a@redhat.com>

Am 31.01.2018 um 21:03 schrieb Aleksey Shipilev:
> On 01/31/2018 08:59 PM, Roman Kennke wrote:
>> Until now, we treated allocation regions from between GC cycles all live in traversal GC. This seems
>> inconsequential: we are not treating alloc regions live during the cycle. This means that all the
>> allocated garbage will have to pass through one complete cycle to count its liveness, and then
>> another cycle to clear it up. This patch changes this to traverse+clear alloc regions in the next
>> cycle.
>>
>> This gives some application a huge boost. E.g. compiler.compiler goes from 180ops/m to around
>> 205ops/m in my tests.
>>
>> http://cr.openjdk.java.net/~rkennke/better-traversal-heuristics/webrev.00/
> 
> Um. Doesn't it break non-Traversal GCs? Shared/TLAB allocs would not be counted as live with e.g.
> "adaptive"?
> 

Grr. See, this is what happens when you rush out a change while your 
brain is pudding ;-) In my tests I guarded this with UseNewCode...

http://cr.openjdk.java.net/~rkennke/better-traversal-heuristics/webrev.01/

Better?

Roman


From shade at redhat.com  Wed Jan 31 20:19:15 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 31 Jan 2018 21:19:15 +0100
Subject: RFR: Don't treat allocation regions implicitely live during
 traversal GC
In-Reply-To: <b47cd381-5678-cd02-d3ce-b6c5df34548a@redhat.com>
References: <d4c2d842-b4c8-08f4-ed3c-f10489cf9f7f@redhat.com>
 <c8742a23-2d36-be0b-4487-75778c48ec90@redhat.com>
 <b47cd381-5678-cd02-d3ce-b6c5df34548a@redhat.com>
Message-ID: <8d9eb1aa-c67c-1aac-a99e-9ea8757da364@redhat.com>

On 01/31/2018 09:15 PM, Roman Kennke wrote:
> Am 31.01.2018 um 21:03 schrieb Aleksey Shipilev:
>> On 01/31/2018 08:59 PM, Roman Kennke wrote:
>>> Until now, we treated allocation regions from between GC cycles all live in traversal GC. This seems
>>> inconsequential: we are not treating alloc regions live during the cycle. This means that all the
>>> allocated garbage will have to pass through one complete cycle to count its liveness, and then
>>> another cycle to clear it up. This patch changes this to traverse+clear alloc regions in the next
>>> cycle.
>>>
>>> This gives some application a huge boost. E.g. compiler.compiler goes from 180ops/m to around
>>> 205ops/m in my tests.
>>>
>>> http://cr.openjdk.java.net/~rkennke/better-traversal-heuristics/webrev.00/
>>
>> Um. Doesn't it break non-Traversal GCs? Shared/TLAB allocs would not be counted as live with e.g.
>> "adaptive"?
>>
> 
> Grr. See this is what happens when you want to rush out a change when brain is pudding ;-) In my
> tests I guarded this by UseNewCode...
> 
> http://cr.openjdk.java.net/~rkennke/better-traversal-heuristics/webrev.01/
> 
> Better?

Yes, that seems okay. I'd still suggest a switch to guard against accidentally unhandled enum values:

bool ShenandoahFreeSet::implicit_live(ShenandoahHeap::AllocType type) const {
  if (ShenandoahHeap::heap()->is_concurrent_traversal_in_progress()) {
    return false;
  }
  switch (type) {
    case ShenandoahHeap::_alloc_tlab:
    case ShenandoahHeap::_alloc_shared:
      return ShenandoahAllocImplicitLive;
    case ShenandoahHeap::_alloc_gclab:
    case ShenandoahHeap::_alloc_shared_gc:
      return true;
    default:
      ShouldNotReachHere();
      return true;
  }
}

-Aleksey


From rkennke at redhat.com  Wed Jan 31 20:19:34 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 31 Jan 2018 21:19:34 +0100
Subject: RFR: Don't treat allocation regions implicitely live during
 traversal GC
In-Reply-To: <263fedae-8aa3-ca77-3c4b-38710184f97b@redhat.com>
References: <d4c2d842-b4c8-08f4-ed3c-f10489cf9f7f@redhat.com>
 <c8742a23-2d36-be0b-4487-75778c48ec90@redhat.com>
 <263fedae-8aa3-ca77-3c4b-38710184f97b@redhat.com>
Message-ID: <8dee2e31-d368-c788-16ec-863270ea9c7e@redhat.com>

Am 31.01.2018 um 21:15 schrieb Aleksey Shipilev:
> On 01/31/2018 09:03 PM, Aleksey Shipilev wrote:
>> On 01/31/2018 08:59 PM, Roman Kennke wrote:
>>> Until now, we treated allocation regions from between GC cycles all live in traversal GC. This seems
>>> inconsequential: we are not treating alloc regions live during the cycle. This means that all the
>>> allocated garbage will have to pass through one complete cycle to count its liveness, and then
>>> another cycle to clear it up. This patch changes this to traverse+clear alloc regions in the next
>>> cycle.
>>>
>>> This gives some application a huge boost. E.g. compiler.compiler goes from 180ops/m to around
>>> 205ops/m in my tests.
>>>
>>> http://cr.openjdk.java.net/~rkennke/better-traversal-heuristics/webrev.00/
>>
>> Um. Doesn't it break non-Traversal GCs? Shared/TLAB allocs would not be counted as live with e.g.
>> "adaptive"?
> 
> Yes, it does break half of hotspot_gc_shenandoah. Should be e.g.:
> 
> bool ShenandoahFreeSet::implicit_live(ShenandoahHeap::AllocType type) const {
>    ShenandoahHeap* heap = ShenandoahHeap::heap();
>    if (heap->shenandoahPolicy()->can_do_traversal_gc()) {
>      if (heap->is_concurrent_traversal_in_progress()) {
>        return false;
>      }
>      switch (type) {
>        case ShenandoahHeap::_alloc_tlab:
>        case ShenandoahHeap::_alloc_shared:
>          return false;
>        case ShenandoahHeap::_alloc_gclab:
>        case ShenandoahHeap::_alloc_shared_gc:
>          return true;
>        default:
>          ShouldNotReachHere();
>      }
>    }
>    return true;
> }
> 
> -Aleksey
> 


This seems even better. I'm going to push this then?

http://cr.openjdk.java.net/~rkennke/better-traversal-heuristics/webrev.02/

Good?

Roman


From shade at redhat.com  Wed Jan 31 20:20:39 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 31 Jan 2018 21:20:39 +0100
Subject: RFR: Don't treat allocation regions implicitely live during
 traversal GC
In-Reply-To: <8d9eb1aa-c67c-1aac-a99e-9ea8757da364@redhat.com>
References: <d4c2d842-b4c8-08f4-ed3c-f10489cf9f7f@redhat.com>
 <c8742a23-2d36-be0b-4487-75778c48ec90@redhat.com>
 <b47cd381-5678-cd02-d3ce-b6c5df34548a@redhat.com>
 <8d9eb1aa-c67c-1aac-a99e-9ea8757da364@redhat.com>
Message-ID: <b5b06140-3501-68f8-ecd5-f934a8440c80@redhat.com>

On 01/31/2018 09:19 PM, Aleksey Shipilev wrote:
> On 01/31/2018 09:15 PM, Roman Kennke wrote:
>> Am 31.01.2018 um 21:03 schrieb Aleksey Shipilev:
>>> On 01/31/2018 08:59 PM, Roman Kennke wrote:
>>>> Until now, we treated allocation regions from between GC cycles all live in traversal GC. This seems
>>>> inconsequential: we are not treating alloc regions live during the cycle. This means that all the
>>>> allocated garbage will have to pass through one complete cycle to count its liveness, and then
>>>> another cycle to clear it up. This patch changes this to traverse+clear alloc regions in the next
>>>> cycle.
>>>>
>>>> This gives some application a huge boost. E.g. compiler.compiler goes from 180ops/m to around
>>>> 205ops/m in my tests.
>>>>
>>>> http://cr.openjdk.java.net/~rkennke/better-traversal-heuristics/webrev.00/
>>>
>>> Um. Doesn't it break non-Traversal GCs? Shared/TLAB allocs would not be counted as live with e.g.
>>> "adaptive"?
>>>
>>
>> Grr. See this is what happens when you want to rush out a change when brain is pudding ;-) In my
>> tests I guarded this by UseNewCode...
>>
>> http://cr.openjdk.java.net/~rkennke/better-traversal-heuristics/webrev.01/
>>
>> Better?
> 
> Yes, that seems okay. I'd still suggest a switch to guard from accidental enums:
> 
> bool ShenandoahFreeSet::implicit_live(ShenandoahHeap::AllocType type) const {
>   if (ShenandoahHeap::heap()->is_concurrent_traversal_in_progress()) {
>     return false;
>   }
>   switch (type) {
>     case ShenandoahHeap::_alloc_tlab:
>     case ShenandoahHeap::_alloc_shared:
>       return ShenandoahAllocImplicitLive;
>     case ShenandoahHeap::_alloc_gclab:
>     case ShenandoahHeap::_alloc_shared_gc:
>       return true;
>     default:
>       ShouldNotReachHere();
>       return true;
>   }
> }
> 
> -Aleksey

I like the ShenandoahAllocImplicitLive flag better, because it avoids the virtual call to
collectionPolicy() on the allocation path. It also reduces coupling between components.
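
A self-contained sketch of the flag-based variant, for readers who want
to try the truth table outside HotSpot. AllocType, AllocImplicitLive and
FreeSetModel are stand-ins invented here; the real flag and heap state
live in the VM, and how the flag gets set for traversal heuristics is
not shown in this thread.

```cpp
#include <cstdlib>

// Stand-in for ShenandoahHeap::AllocType.
enum AllocType { alloc_tlab, alloc_shared, alloc_gclab, alloc_shared_gc };

// Stand-in for the ShenandoahAllocImplicitLive flag: a plain global
// that is read directly, so no virtual call is needed on this path.
static bool AllocImplicitLive = true;

// Stand-in for ShenandoahFreeSet with just enough state for the check.
struct FreeSetModel {
  bool traversal_in_progress = false;

  bool implicit_live(AllocType type) const {
    if (traversal_in_progress) {
      return false;  // during traversal, liveness comes from the traversal
    }
    switch (type) {
      case alloc_tlab:
      case alloc_shared:
        return AllocImplicitLive;  // mutator allocations follow the flag
      case alloc_gclab:
      case alloc_shared_gc:
        return true;               // GC allocations are always live
    }
    std::abort();  // stands in for ShouldNotReachHere()
  }
};
```

With the flag off, mutator allocations (TLAB/shared) are no longer
counted as live, while GC allocations still are unless a traversal is
in progress.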

-Aleksey


From rkennke at redhat.com  Wed Jan 31 20:26:52 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 31 Jan 2018 21:26:52 +0100
Subject: RFR: Don't treat allocation regions implicitely live during
 traversal GC
In-Reply-To: <b5b06140-3501-68f8-ecd5-f934a8440c80@redhat.com>
References: <d4c2d842-b4c8-08f4-ed3c-f10489cf9f7f@redhat.com>
 <c8742a23-2d36-be0b-4487-75778c48ec90@redhat.com>
 <b47cd381-5678-cd02-d3ce-b6c5df34548a@redhat.com>
 <8d9eb1aa-c67c-1aac-a99e-9ea8757da364@redhat.com>
 <b5b06140-3501-68f8-ecd5-f934a8440c80@redhat.com>
Message-ID: <d23eb688-70b1-1490-f95e-c1e986bcc010@redhat.com>

Am 31.01.2018 um 21:20 schrieb Aleksey Shipilev:
> On 01/31/2018 09:19 PM, Aleksey Shipilev wrote:
>> On 01/31/2018 09:15 PM, Roman Kennke wrote:
>>> Am 31.01.2018 um 21:03 schrieb Aleksey Shipilev:
>>>> On 01/31/2018 08:59 PM, Roman Kennke wrote:
>>>>> Until now, we treated allocation regions from between GC cycles all live in traversal GC. This seems
>>>>> inconsequential: we are not treating alloc regions live during the cycle. This means that all the
>>>>> allocated garbage will have to pass through one complete cycle to count its liveness, and then
>>>>> another cycle to clear it up. This patch changes this to traverse+clear alloc regions in the next
>>>>> cycle.
>>>>>
>>>>> This gives some application a huge boost. E.g. compiler.compiler goes from 180ops/m to around
>>>>> 205ops/m in my tests.
>>>>>
>>>>> http://cr.openjdk.java.net/~rkennke/better-traversal-heuristics/webrev.00/
>>>>
>>>> Um. Doesn't it break non-Traversal GCs? Shared/TLAB allocs would not be counted as live with e.g.
>>>> "adaptive"?
>>>>
>>>
>>> Grr. See this is what happens when you want to rush out a change when brain is pudding ;-) In my
>>> tests I guarded this by UseNewCode...
>>>
>>> http://cr.openjdk.java.net/~rkennke/better-traversal-heuristics/webrev.01/
>>>
>>> Better?
>>
>> Yes, that seems okay. I'd still suggest a switch to guard from accidental enums:
>>
>> bool ShenandoahFreeSet::implicit_live(ShenandoahHeap::AllocType type) const {
>>    if (ShenandoahHeap::heap()->is_concurrent_traversal_in_progress()) {
>>      return false;
>>    }
>>    switch (type) {
>>      case ShenandoahHeap::_alloc_tlab:
>>      case ShenandoahHeap::_alloc_shared:
>>        return ShenandoahAllocImplicitLive;
>>      case ShenandoahHeap::_alloc_gclab:
>>      case ShenandoahHeap::_alloc_shared_gc:
>>        return true;
>>      default:
>>        ShouldNotReachHere();
>>        return true;
>>    }
>> }
>>
>> -Aleksey
> 
> I like the ShenandoahAllocImplicitLive flag better, because it avoids the v-call to
> collectionPolicy() on allocation path. And it reduces coupling between components.
> 
> -Aleksey
> 
> 
> 

Ok. Then this:

http://cr.openjdk.java.net/~rkennke/better-traversal-heuristics/webrev.03

?

Roman

From shade at redhat.com  Wed Jan 31 20:28:38 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 31 Jan 2018 21:28:38 +0100
Subject: RFR: Don't treat allocation regions implicitely live during
 traversal GC
In-Reply-To: <d23eb688-70b1-1490-f95e-c1e986bcc010@redhat.com>
References: <d4c2d842-b4c8-08f4-ed3c-f10489cf9f7f@redhat.com>
 <c8742a23-2d36-be0b-4487-75778c48ec90@redhat.com>
 <b47cd381-5678-cd02-d3ce-b6c5df34548a@redhat.com>
 <8d9eb1aa-c67c-1aac-a99e-9ea8757da364@redhat.com>
 <b5b06140-3501-68f8-ecd5-f934a8440c80@redhat.com>
 <d23eb688-70b1-1490-f95e-c1e986bcc010@redhat.com>
Message-ID: <5acd32d8-04a1-4b3b-51b2-72fa59089aae@redhat.com>

On 01/31/2018 09:26 PM, Roman Kennke wrote:
> http://cr.openjdk.java.net/~rkennke/better-traversal-heuristics/webrev.03

Yes, seems good.

-Aleksey


From roman at kennke.org  Wed Jan 31 20:37:53 2018
From: roman at kennke.org (roman at kennke.org)
Date: Wed, 31 Jan 2018 20:37:53 +0000
Subject: hg: shenandoah/jdk10: Don't treat allocation regions implicitely live
 during traversal GC
Message-ID: <201801312037.w0VKbsx0024851@aojmv0008.oracle.com>

Changeset: 207591c5122b
Author:    rkennke
Date:      2018-01-31 21:14 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk10/rev/207591c5122b

Don't treat allocation regions implicitely live during traversal GC

! src/hotspot/share/gc/shenandoah/shenandoahCollectorPolicy.cpp
! src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp
! src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp
! src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp


From zgu at redhat.com  Wed Jan 31 20:45:06 2018
From: zgu at redhat.com (Zhengyu Gu)
Date: Wed, 31 Jan 2018 15:45:06 -0500
Subject: RFR: More cancelled concgc check and bailout
Message-ID: <ea9b416c-70d8-0ada-45ac-e08de61acfa7@redhat.com>

This adds more cancelled-concgc checks and bailouts. With this patch, 
traversal GC passed specJVM with string deduplication on.


Webrev: 
http://cr.openjdk.java.net/~zgu/shenandoah/traversal_cancelled_gc/webrev.00/


Test:
   hotspot_gc_shenandoah (fastdebug + release)

Thanks,

-Zhengyu


From rkennke at redhat.com  Wed Jan 31 21:41:43 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 31 Jan 2018 22:41:43 +0100
Subject: RFR: More cancelled concgc check and bailout
In-Reply-To: <ea9b416c-70d8-0ada-45ac-e08de61acfa7@redhat.com>
References: <ea9b416c-70d8-0ada-45ac-e08de61acfa7@redhat.com>
Message-ID: <8d99af81-652b-958e-6d3b-7411bf6af76d@redhat.com>


> More cancelled concgc check and bailout. With this patch, traversal GC 
> passed specJVM with string deduplication on.
> 
> 
> Webrev: 
> http://cr.openjdk.java.net/~zgu/shenandoah/traversal_cancelled_gc/webrev.00/ 
> 
> 
> 
> Test:
>    hotspot_gc_shenandoah (fastdebug + release)
> 
> Thanks,
> 
> -Zhengyu
> 

Looks good to me. Thanks!

From rkennke at redhat.com  Wed Jan 31 21:58:44 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 31 Jan 2018 22:58:44 +0100
Subject: RFR: Don't count evacs double in traversal GC
Message-ID: <1c2666da-2b69-11eb-f722-75891c90374d@redhat.com>

I think this improved liveness work just led me to find the liveness 
accounting bug that I have observed occasionally. It seems we are 
counting evacs twice: once when allocating from the gclab/shared-gc, 
and once through the usual GC mechanics: we evacuate cset objects, push 
them to the queue, and when an object is popped, we count liveness for 
it, regardless of which region it is in. Let's never count any liveness 
on allocation, and let GC traversal count it. This is also more precise 
(it does not count any GCLAB waste).
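
The suspected double count can be pictured with a toy model. This only
illustrates the mechanism described above and is not taken from the
patch; Obj and evacuate_and_count are hypothetical names.

```cpp
#include <cstddef>
#include <queue>
#include <vector>

// Toy object: only its size matters for liveness accounting.
struct Obj {
  std::size_t size;
};

// Returns the liveness recorded after evacuating every object in cset.
// If count_on_alloc is true, each copy is counted once at GC allocation
// time and once more when it is popped from the traversal queue.
std::size_t evacuate_and_count(const std::vector<Obj>& cset,
                               bool count_on_alloc) {
  std::size_t live = 0;
  std::queue<Obj> queue;
  for (const Obj& o : cset) {
    if (count_on_alloc) {
      live += o.size;  // counted when the gclab/shared-gc space is allocated
    }
    queue.push(o);     // the evacuated copy is pushed for traversal
  }
  while (!queue.empty()) {
    live += queue.front().size;  // counted when popped and traversed
    queue.pop();
  }
  return live;
}
```

For two objects of 16 and 32 bytes, counting on allocation reports 96
bytes of liveness in this model, while counting only during traversal
reports the actual 48.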

http://cr.openjdk.java.net/~rkennke/traversal-liveness-accounting/webrev.00/

Test: hotspot_gc_shenandoah

Ok?

Roman

From zgu at redhat.com  Wed Jan 31 22:08:15 2018
From: zgu at redhat.com (zgu at redhat.com)
Date: Wed, 31 Jan 2018 22:08:15 +0000
Subject: hg: shenandoah/jdk10: Cancelled congc check and bailout to avoid
 assertion failure
Message-ID: <201801312208.w0VM8Ff0025280@aojmv0008.oracle.com>

Changeset: 29e22a0191fa
Author:    zgu
Date:      2018-01-31 17:03 -0500
URL:       http://hg.openjdk.java.net/shenandoah/jdk10/rev/29e22a0191fa

Cancelled congc check and bailout to avoid assertion failure

! src/hotspot/share/gc/shenandoah/shenandoahTraversalGC.cpp


From rkennke at redhat.com  Wed Jan 31 22:18:39 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 31 Jan 2018 23:18:39 +0100
Subject: RFR: Don't count evacs double in traversal GC
In-Reply-To: <1c2666da-2b69-11eb-f722-75891c90374d@redhat.com>
References: <1c2666da-2b69-11eb-f722-75891c90374d@redhat.com>
Message-ID: <055cde79-e44f-261c-e858-271aa2a83244@redhat.com>

Am 31.01.2018 um 22:58 schrieb Roman Kennke:
> I think this improved liveness work just led me to find the liveness 
> accounting bug that I have observed occasionally. It seems we are 
> counting evacs double: once when allocating the gclab/shared-gc, and 
> once by the usual GC mechanics: we evac cset objects, then push them to 
> the queue, and when it's popped, we count liveness for the object, 
> regardless in which region it is. Let's never count any liveness on 
> allocation, and do GC traversal count it. This is more precise (not 
> counting any GCLAB waste).
> 
> http://cr.openjdk.java.net/~rkennke/traversal-liveness-accounting/webrev.00/ 
> 
> 
> Test: hotspot_gc_shenandoah
> 
> Ok?
> 
> Roman

Ok, this is probably nonsense. The traversal-in-progress check should 
already have caught this. However, outside of the traversal phase, there 
are no evacs either, so the whole switch-block is useless. The patch 
should still be useful as a cleanup and simplification.

Ok?

Roman