From rkennke at redhat.com Thu Jun 1 09:29:44 2017
From: rkennke at redhat.com (Roman Kennke)
Date: Thu, 1 Jun 2017 11:29:44 +0200
Subject: RFR: Parallelize safepoint cleanup
In-Reply-To: <46ad874e-eb41-7927-265a-40dea92dfe1e@oracle.com>
References: <486b5a72-bef8-4ebc-2729-3fe3aa3ab3b9@oracle.com> <5c80f8df-27c9-f9a9-dc6d-47f9c6019a61@redhat.com> <46ad874e-eb41-7927-265a-40dea92dfe1e@oracle.com>
Message-ID: <5711258b-99b0-e06f-ba6e-0b6b55d88345@redhat.com>

On 31.05.2017 at 22:06, Robbin Ehn wrote:
> Hi Roman, I agree that this is really needed, but:
>
> On 05/31/2017 10:27 AM, Roman Kennke wrote:
>> I realized that sharing workers with the GC is not so easy.
>>
>> We need to be able to use the workers at a safepoint during concurrent GC work (which also uses the same workers). This does not only require that those workers be suspended, like e.g. SuspendibleThreadSet::yield(); they need to be idle, i.e. have finished their tasks. This needs some careful handling to work without races: it requires a SuspendibleThreadSetJoiner around the corresponding run_task() call, and the tasks themselves need to join the STS and handle requests for safepoints not by yielding, but by leaving the task. This is far too peculiar for me to make the call to hook up GC workers for safepoint cleanup, and I thus removed those parts. I left the API in CollectedHeap in place. I think GC devs who know better about G1 and CMS should make that call, or else just use a separate thread pool.
>>
>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.05/
>>
>> Is it ok now?
>
> I still think you should put the "Parallel Safepoint Cleanup" workers inside Shenandoah, so the SafepointSynchronizer only calls get_safepoint_workers, e.g.:
>
> _cleanup_workers = heap->get_safepoint_workers();
> _num_cleanup_workers = _cleanup_workers != NULL ? _cleanup_workers->total_workers() : 1;
> ParallelSPCleanupTask cleanup(_cleanup_subtasks);
> StrongRootsScope srs(_num_cleanup_workers);
> if (_cleanup_workers != NULL) {
>   _cleanup_workers->run_task(&cleanup, _num_cleanup_workers);
> } else {
>   cleanup.work(0);
> }
>
> That way you don't even need your new flags, but it will be up to the other GCs to make their workers available or cheat with a separate workgang.

I can do that, I don't mind. The question is, do we want that? I wouldn't call it 'cheating with a separate workgang' though. I see that both G1 and CMS suspend their worker threads at a safepoint. However:
- Do they finish their work, stop, and then restart work after the safepoint? Or do the workers simply call STS::yield() to suspend and later resume their work where they left off? If they only call yield() (or whatever the equivalent is in CMS), then this is not enough: the workers need to be truly idle in order to be used by the safepoint cleaners.
- Parallel and serial GC don't have workgangs of their own.

So, as far as I can tell, this means that parallel safepoint cleanup would only be supported by GCs for which we explicitly implement it, after having carefully checked if/how workgangs are suspended at safepoints, or by providing GC-internal thread pools. Do we really want that?

Roman
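For illustration, a minimal self-contained sketch of the hook shape that Robbin's snippet assumes. This is not the actual webrev: the class shapes, names, and the NULL-means-serial convention are assumptions.

```cpp
// Sketch only: the GC optionally donates a WorkGang for safepoint cleanup.
#include <cstddef>

class AbstractGangTask {
public:
  virtual void work(unsigned worker_id) = 0;
  virtual ~AbstractGangTask() {}
};

class WorkGang {
  unsigned _num_workers;
public:
  explicit WorkGang(unsigned n) : _num_workers(n) {}
  unsigned total_workers() const { return _num_workers; }
  // Real HotSpot dispatches to gang threads; a serial loop stands in here.
  void run_task(AbstractGangTask* task, unsigned num_workers) {
    for (unsigned i = 0; i < num_workers; i++) {
      task->work(i);
    }
  }
};

class CollectedHeap {
public:
  // Default: the GC donates no workers, so cleanup stays single-threaded.
  // A collector that can guarantee idle workers at a safepoint overrides this.
  virtual WorkGang* get_safepoint_workers() { return NULL; }
  virtual ~CollectedHeap() {}
};

// Call-site shape from the snippet above: run in parallel if workers were
// donated, otherwise fall back to doing all the work on one thread.
void run_cleanup(CollectedHeap* heap, AbstractGangTask* cleanup) {
  WorkGang* workers = heap->get_safepoint_workers();
  if (workers != NULL) {
    workers->run_task(cleanup, workers->total_workers());
  } else {
    cleanup->work(0);
  }
}
```

The point of this shape is that the policy of whether and how many threads to donate stays entirely with the GC, which is the decision being argued over in this thread.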
From erik.helin at oracle.com Thu Jun 1 09:35:31 2017
From: erik.helin at oracle.com (Erik Helin)
Date: Thu, 1 Jun 2017 11:35:31 +0200
Subject: RFR (7xS): 8071280: Specialize HeapRegion::oops_on_card_seq_iterate_careful() for use during concurrent refinement and updating the rset
In-Reply-To: <1496218454.3287.2.camel@oracle.com>
References: <1491910205.2754.31.camel@oracle.com> <1493985823.2777.52.camel@oracle.com> <3098f31e-3301-362d-8c9c-b06f27e7133c@oracle.com> <1494847168.2707.17.camel@oracle.com> <1495538626.2781.3.camel@oracle.com> <1496218454.3287.2.camel@oracle.com>
Message-ID:

On 05/31/2017 10:14 AM, Thomas Schatzl wrote:
> Hi all,
>
> On Tue, 2017-05-23 at 13:23 +0200, Thomas Schatzl wrote:
>> Hi all,
>>
>> Erik Helin had a few comments regarding naming etc. that this new webrev incorporates:
>>
>> Webrevs:
>> http://cr.openjdk.java.net/~tschatzl/8071280/webrev.2_to_3/ (diff)
>> http://cr.openjdk.java.net/~tschatzl/8071280/webrev.3/ (full)
>
> Erik and I did some more investigation on this and found that the is_gc_active parameter for HeapRegion::is_obj_dead_with_size() is actually not required any more, due to the addition of the ClassUnloadingWithConcurrentMark clause inside.
>
> I removed that and updated the webrev.
>
> Sorry for another change - Erik promised me that it's good now :)

Yep, this is good to go now. As we've discussed, there are some further improvements that can be done to this code, but we can do those in a later patch. Thanks a lot for cleaning up this code!

Erik

> Thanks,
> Thomas

From erik.helin at oracle.com Thu Jun 1 11:59:58 2017
From: erik.helin at oracle.com (Erik Helin)
Date: Thu, 1 Jun 2017 13:59:58 +0200
Subject: RFR (7xS): 8177707: Specialize G1RemSet::refine_card for concurrent/during safepoint refinement
In-Reply-To: <1496218627.3287.5.camel@oracle.com>
References: <1491910205.2754.31.camel@oracle.com> <3E94A9B0-D0AB-4521-9727-D4B1D0954BAA@oracle.com> <1493985121.2777.42.camel@oracle.com> <192c34a3-c869-599d-0661-2ca9c524b626@oracle.com> <1496218627.3287.5.camel@oracle.com>
Message-ID:

On 05/31/2017 10:17 AM, Thomas Schatzl wrote:
> Hi Erik,
>
> On Tue, 2017-05-30 at 11:43 +0200, Erik Helin wrote:
>> On 05/09/2017 01:31 AM, Kim Barrett wrote:
>>>> On May 5, 2017, at 7:52 AM, Thomas Schatzl wrote:
>>>>
>>>> New webrevs:
>>>> http://cr.openjdk.java.net/~tschatzl/8177707/webrev.0_to_1/ (diff)
>>>> http://cr.openjdk.java.net/~tschatzl/8177707/webrev.1/ (full)
>>>>
>>>> Thanks,
>>>> Thomas
>>> Looks good.
>> Looks good to me as well! I've really tried to go through the patch with a magnifying glass, and AFAICS all the code has been duplicated correctly.
>
> thanks.
>
> Unfortunately, with these reviews taking their time, a change in the jdk10 repo required an update. In particular, G1RootRegionScanClosure::apply_to_weak_ref_discovered_field() had to be replaced by G1RootRegionScanClosure::reference_iteration_mode(), as introduced lately.
>
> The full patch:
>
> --- old/src/share/vm/gc/g1/g1OopClosures.hpp 2017-05-16 09:57:27.140974921 +0200
> +++ new/src/share/vm/gc/g1/g1OopClosures.hpp 2017-05-16 09:57:27.035971738 +0200
> @@ -181,7 +181,8 @@
>      _worker_i(worker_i) {
>    }
>
> -  bool apply_to_weak_ref_discovered_field() { return true; }
> +  // This closure needs special handling for InstanceRefKlass.
> +  virtual ReferenceIterationMode reference_iteration_mode() { return DO_DISCOVERED_AND_DISCOVERY; }
>
>    template <class T> void do_oop_nv(T* p);
>    virtual void do_oop(narrowOop* p) { do_oop_nv(p); }

Looks good! Now push it before something else changes :)

Erik

> Thanks,
> Thomas

From robbin.ehn at oracle.com Thu Jun 1 12:18:23 2017
From: robbin.ehn at oracle.com (Robbin Ehn)
Date: Thu, 1 Jun 2017 14:18:23 +0200
Subject: RFR: Parallelize safepoint cleanup
In-Reply-To: <5711258b-99b0-e06f-ba6e-0b6b55d88345@redhat.com>
References: <486b5a72-bef8-4ebc-2729-3fe3aa3ab3b9@oracle.com> <5c80f8df-27c9-f9a9-dc6d-47f9c6019a61@redhat.com> <46ad874e-eb41-7927-265a-40dea92dfe1e@oracle.com> <5711258b-99b0-e06f-ba6e-0b6b55d88345@redhat.com>
Message-ID: <0e1e2779-9316-b756-6cc8-e0c8add14a94@oracle.com>

Hi Roman,

On 06/01/2017 11:29 AM, Roman Kennke wrote:
> On 31.05.2017 at 22:06, Robbin Ehn wrote:
>> Hi Roman, I agree that this is really needed, but:
>>
>> On 05/31/2017 10:27 AM, Roman Kennke wrote:
>>> [...]
>>
>> I still think you should put the "Parallel Safepoint Cleanup" workers inside Shenandoah, so the SafepointSynchronizer only calls get_safepoint_workers [...]
>>
>> That way you don't even need your new flags, but it will be up to the other GCs to make their workers available or cheat with a separate workgang.
> I can do that, I don't mind. The question is, do we want that?

The problem is that we do not want to hasten such a decision; we believe there is a better solution. I think you also would want another solution. But it seems like such a solution, with one 'global' thread pool owned either by the GC or by the VM itself, is quite an undertaking. Since this probably will not be done any time soon, my suggestion, so as to not hold you back (we also want this), is to just make the code parallel and, as an intermediate step, ask the GC if it minds sharing its threads.

Now when Shenandoah is merged, it's possible that e.g. G1 will share the code for a separate thread pool, do something of its own, or wait until the bigger question about thread pool(s) has been resolved.
By adding a thread pool directly to the SafepointSynchronizer, and flags for it, we might limit our future options.

> I wouldn't call it 'cheating with a separate workgang' though. I see that both G1 and CMS suspend their worker threads at a safepoint. However:

Yes, it's not cheating, but I want decent heuristics between e.g. the number of concurrent marking threads and parallel safepoint threads, since they compete for CPU time. As the code looks now, I think those decisions must be made by the GC.

> - Do they finish their work, stop, and then restart work after the safepoint? Or do the workers simply call STS::yield() to suspend and later resume their work where they left off? If they only call yield() (or whatever the equivalent is in CMS), then this is not enough: the workers need to be truly idle in order to be used by the safepoint cleaners.
> - Parallel and serial GC don't have workgangs of their own.

I know Erik Ö has been doing some prototyping here; he can probably fill you in.

> So, as far as I can tell, this means that parallel safepoint cleanup would only be supported by GCs for which we explicitly implement it, after having carefully checked if/how workgangs are suspended at safepoints, or by providing GC-internal thread pools. Do we really want that?

We know we probably don't want a thread pool in the SafepointSynchronizer, and probably not more pools either, but we need to think about it. Do you agree?

Still, thanks for doing this!

/Robbin

> Roman

From rkennke at redhat.com Thu Jun 1 15:50:59 2017
From: rkennke at redhat.com (Roman Kennke)
Date: Thu, 1 Jun 2017 17:50:59 +0200
Subject: RFR: Parallelize safepoint cleanup
In-Reply-To: <0e1e2779-9316-b756-6cc8-e0c8add14a94@oracle.com>
References: <486b5a72-bef8-4ebc-2729-3fe3aa3ab3b9@oracle.com> <5c80f8df-27c9-f9a9-dc6d-47f9c6019a61@redhat.com> <46ad874e-eb41-7927-265a-40dea92dfe1e@oracle.com> <5711258b-99b0-e06f-ba6e-0b6b55d88345@redhat.com> <0e1e2779-9316-b756-6cc8-e0c8add14a94@oracle.com>
Message-ID: <1910961c-11bd-0e86-dd03-4fce66b9969f@redhat.com>

On 01.06.2017 at 14:18, Robbin Ehn wrote:
> Hi Roman,
>
> On 06/01/2017 11:29 AM, Roman Kennke wrote:
>> [...]
>>> I still think you should put the "Parallel Safepoint Cleanup" workers inside Shenandoah, so the SafepointSynchronizer only calls get_safepoint_workers, e.g.:
>>> [...]
>>> That way you don't even need your new flags, but it will be up to the other GCs to make their workers available or cheat with a separate workgang.
>> I can do that, I don't mind. The question is, do we want that?
>
> The problem is that we do not want to hasten such a decision; we believe there is a better solution. I think you also would want another solution. But it seems like such a solution, with one 'global' thread pool owned either by the GC or by the VM itself, is quite an undertaking. Since this probably will not be done any time soon, my suggestion, so as to not hold you back (we also want this), is to just make the code parallel and, as an intermediate step, ask the GC if it minds sharing its threads.
>
> Now when Shenandoah is merged, it's possible that e.g. G1 will share the code for a separate thread pool, do something of its own, or wait until the bigger question about thread pool(s) has been resolved.
>
> By adding a thread pool directly to the SafepointSynchronizer, and flags for it, we might limit our future options.
>
>> I wouldn't call it 'cheating with a separate workgang' though. I see that both G1 and CMS suspend their worker threads at a safepoint. However:
> Yes, it's not cheating, but I want decent heuristics between e.g. the number of concurrent marking threads and parallel safepoint threads, since they compete for CPU time. As the code looks now, I think those decisions must be made by the GC.

Ok, I see your point. I updated the proposed patch accordingly:

http://cr.openjdk.java.net/~rkennke/8180932/webrev.06/

But it means that parallel safepoint cleanup is not really available unless it's implemented by a GC. There's one little change compared to the current state, even with serial cleanup: nmethod marking and monitor deflation are now done in one single pass.

I am curious what you're thinking of when you say you 'want another solution'. I have another solution in mind too: concurrent monitor deflation. I am currently drafting a JEP, but it's not ready yet.

So what do you think of the latest iteration of that patch?

Roman
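As a sketch of what "one single pass" can mean here (this is not Roman's actual patch; the subtask names are placeholders): the gang runs a single task, and each worker claims whole cleanup subtasks off a shared atomic counter, so nmethod marking and monitor deflation happen in the same gang activation.

```cpp
#include <atomic>

// Hypothetical subtasks; the real set and names may differ.
enum CleanupSubtask {
  CLEANUP_DEFLATE_IDLE_MONITORS = 0,
  CLEANUP_MARK_NMETHODS,
  CLEANUP_UPDATE_INLINE_CACHES,
  CLEANUP_SUBTASK_COUNT
};

class ParallelSPCleanupTask {
  std::atomic<int> _next_subtask;   // next unclaimed subtask index
public:
  ParallelSPCleanupTask() : _next_subtask(0) {}

  // Called by every worker; correct whether 1 or N threads run it.
  void work(unsigned worker_id) {
    for (int t = _next_subtask.fetch_add(1);
         t < CLEANUP_SUBTASK_COUNT;
         t = _next_subtask.fetch_add(1)) {
      switch (t) {
        case CLEANUP_DEFLATE_IDLE_MONITORS:
          // e.g. deflate idle monitors here
          break;
        case CLEANUP_MARK_NMETHODS:
          // e.g. mark active nmethods here
          break;
        case CLEANUP_UPDATE_INLINE_CACHES:
          // e.g. update inline caches here
          break;
      }
    }
  }
};
```

With one thread, the loop simply claims all subtasks in order, which is why the serial path and the parallel path can share the same code.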
From rkennke at redhat.com Thu Jun 1 16:21:06 2017
From: rkennke at redhat.com (Roman Kennke)
Date: Thu, 1 Jun 2017 18:21:06 +0200
Subject: RFR: Parallelize safepoint cleanup
In-Reply-To: <1910961c-11bd-0e86-dd03-4fce66b9969f@redhat.com>
References: <486b5a72-bef8-4ebc-2729-3fe3aa3ab3b9@oracle.com> <5c80f8df-27c9-f9a9-dc6d-47f9c6019a61@redhat.com> <46ad874e-eb41-7927-265a-40dea92dfe1e@oracle.com> <5711258b-99b0-e06f-ba6e-0b6b55d88345@redhat.com> <0e1e2779-9316-b756-6cc8-e0c8add14a94@oracle.com> <1910961c-11bd-0e86-dd03-4fce66b9969f@redhat.com>
Message-ID:

On 01.06.2017 at 17:50, Roman Kennke wrote:
> On 01.06.2017 at 14:18, Robbin Ehn wrote:
>> [...]
> Ok, I see your point. I updated the proposed patch accordingly:
>
> http://cr.openjdk.java.net/~rkennke/8180932/webrev.06/

Oops. Minor mistake there. Correction:
http://cr.openjdk.java.net/~rkennke/8180932/webrev.07/

(Removed 'class WorkGang' from safepoint.hpp, and forgot to add it into collectedHeap.hpp, resulting in a build failure...)

Roman

From rkennke at redhat.com Thu Jun 1 20:50:22 2017
From: rkennke at redhat.com (Roman Kennke)
Date: Thu, 1 Jun 2017 22:50:22 +0200
Subject: RFR: 8179387: Factor out CMS specific code from GenCollectedHeap into its own subclass
Message-ID:

What $SUBJECT says.

I went over genCollectedHeap.[hpp|cpp] and moved everything that I could find that is CMS-only into a new CMSHeap class.

http://cr.openjdk.java.net/~rkennke/8179387/webrev.00/

It is possible that I overlooked something there. There may be code in there that doesn't shout "CMS" at me, but is still intrinsically CMS stuff.

Also note that I have not removed this little part:

always_do_update_barrier = UseConcMarkSweepGC;

because I expect it to go away with Erik Ö's big refactoring.

What do you think?

Testing: hotspot_gc, specjvm, some little apps with -XX:+UseConcMarkSweepGC

Roman

From david.holmes at oracle.com Fri Jun 2 01:54:36 2017
From: david.holmes at oracle.com (David Holmes)
Date: Fri, 2 Jun 2017 11:54:36 +1000
Subject: RFR: Parallelize safepoint cleanup
In-Reply-To:
References: <486b5a72-bef8-4ebc-2729-3fe3aa3ab3b9@oracle.com> <5c80f8df-27c9-f9a9-dc6d-47f9c6019a61@redhat.com> <46ad874e-eb41-7927-265a-40dea92dfe1e@oracle.com> <5711258b-99b0-e06f-ba6e-0b6b55d88345@redhat.com> <0e1e2779-9316-b756-6cc8-e0c8add14a94@oracle.com> <1910961c-11bd-0e86-dd03-4fce66b9969f@redhat.com>
Message-ID: <2b466176-b688-53a8-bef9-c7ec2c8c745b@oracle.com>

Hi Roman,

I am about to disappear on an extended vacation so will let others pursue this. IIUC this is no longer an opt-in by the user at runtime, but an opt-in by the particular GC developers. Okay. My only concern with that is that if Shenandoah is the only GC that currently opts in, then this code is not going to get much testing and will be more prone to incidental breakage.

Cheers,
David

On 2/06/2017 2:21 AM, Roman Kennke wrote:
> [...]
From erik.helin at oracle.com Fri Jun 2 08:23:28 2017
From: erik.helin at oracle.com (Erik Helin)
Date: Fri, 2 Jun 2017 10:23:28 +0200
Subject: RFR (7xS): 8177044: Remove _scan_top from HeapRegion
In-Reply-To: <1496218781.3287.9.camel@oracle.com>
References: <1491910205.2754.31.camel@oracle.com> <1493986396.2777.61.camel@oracle.com> <90aae39c-5314-e051-02a1-67e717be538b@oracle.com> <1494512288.3120.2.camel@oracle.com> <1496218781.3287.9.camel@oracle.com>
Message-ID: <4faeb89e-257b-11cd-aabc-4a007263b594@oracle.com>

On 05/31/2017 10:19 AM, Thomas Schatzl wrote:
> Hi all,

Hi Thomas!

> Erik had a look at these changes and had minor comments:
>
> I.e. he asked about not removing one default value for a parameter in the declaration of the constructor of G1UpdateRSOrPushRefOopClosure (and doing that later).
>
> Also, one call to memset() was redundant and has been removed.
>
> Here are the changes again:
>
> http://cr.openjdk.java.net/~tschatzl/8177044/webrev.1_to_2/ (diff)
> http://cr.openjdk.java.net/~tschatzl/8177044/webrev.2/ (full)

Looks good to me, thanks for cleaning this up!

Erik

> Thanks,
> thomas
>
> On Thu, 2017-05-11 at 16:18 +0200, Thomas Schatzl wrote:
>> Hi Kim and Sangheon,
>>
>> On Tue, 2017-05-09 at 11:12 -0700, sangheon wrote:
>>> Hi Thomas,
>>>
>>> On 05/05/2017 05:13 AM, Thomas Schatzl wrote:
>>>> Hi all,
>>>>
>>>> recent reviews have made changes necessary to parts of the changeset chain.
>>>>
>>>> Here is a list of links to updated webrevs. Since they have apparently not been reviewed yet, I simply overwrote the old webrevs.
>>>>
>>>> JDK-8177044: Remove _scan_top from HeapRegion
>>>> http://cr.openjdk.java.net/~tschatzl/8177044/webrev/
>>> Looks good to me.
>>> And I agree with Kim about retaining the comment.
>>>
>>> src/share/vm/gc/g1/g1RemSet.cpp
>>> 765   if (scan_limit <= start) {
>>> 766     // If the trimmed region is empty, the card must be stale.
>>> 767     return false;
>>
>> thanks for your review.
>>
>> For reference, here is the comment I intend to push:
>>
>> --- old/src/share/vm/gc/g1/g1RemSet.cpp 2017-05-11 16:14:56.054517736 +0200
>> +++ new/src/share/vm/gc/g1/g1RemSet.cpp 2017-05-11 16:14:55.951514554 +0200
>> @@ -735,6 +735,7 @@
>>
>>    HeapWord* scan_limit = _scan_state->scan_top(r->hrm_index());
>>    if (scan_limit <= start) {
>> +    // If the card starts above the area in the region containing objects to scan, skip it.
>>      return false;
>>    }
>>
>> because the original comment is wrong now.
>>
>> New Webrevs:
>> http://cr.openjdk.java.net/~tschatzl/8177044/webrev.0_to_1/ (diff)
>> http://cr.openjdk.java.net/~tschatzl/8177044/webrev.1/ (full)
>>
>> Thanks again for your reviews,
>> Thomas

From rkennke at redhat.com Fri Jun 2 08:55:15 2017
From: rkennke at redhat.com (Roman Kennke)
Date: Fri, 2 Jun 2017 10:55:15 +0200
Subject: RFR: 8179387: Factor out CMS specific code from GenCollectedHeap into its own subclass
In-Reply-To:
References:
Message-ID: <3521009f-6fab-4f8e-2375-b9d665a4c70b@redhat.com>

Take this patch. It #ifdef ASSERT's a call to check_gen_kinds() that is only present in debug builds.

http://cr.openjdk.java.net/~rkennke/8179387/webrev.01/

Roman

On 01.06.2017 at 22:50, Roman Kennke wrote:
> What $SUBJECT says.
> [...]
From rkennke at redhat.com Fri Jun 2 09:41:47 2017
From: rkennke at redhat.com (Roman Kennke)
Date: Fri, 2 Jun 2017 11:41:47 +0200
Subject: RFR: Parallelize safepoint cleanup
In-Reply-To: <2b466176-b688-53a8-bef9-c7ec2c8c745b@oracle.com>
References: <486b5a72-bef8-4ebc-2729-3fe3aa3ab3b9@oracle.com> <5c80f8df-27c9-f9a9-dc6d-47f9c6019a61@redhat.com> <46ad874e-eb41-7927-265a-40dea92dfe1e@oracle.com> <5711258b-99b0-e06f-ba6e-0b6b55d88345@redhat.com> <0e1e2779-9316-b756-6cc8-e0c8add14a94@oracle.com> <1910961c-11bd-0e86-dd03-4fce66b9969f@redhat.com> <2b466176-b688-53a8-bef9-c7ec2c8c745b@oracle.com>
Message-ID: <42872a15-d26c-9798-c6a2-f3f7c945baf7@redhat.com>

Hi David,
thanks for reviewing. I'll be on vacation the next two weeks too, with only sporadic access to work stuff.

Yes, exposure will not be as good as otherwise, but it's not totally untested either: the serial code path is the same as the parallel one, the only difference is that it's not actually called by multiple threads. It's ok, I think.

I found two more issues that I think should be addressed:
- There are some counters in deflate_idle_monitors() and I'm not sure I correctly handle them in the split-up and MT'ed thread-local/global list deflation.
- nmethod marking seems to unconditionally poke true or something like that into nmethod fields. This doesn't hurt correctness-wise, but it's probably worth checking if it's already true, especially when doing this with multiple threads concurrently.

I'll send an updated patch around later, I hope I can get to it today...

Roman

> Hi Roman,
>
> I am about to disappear on an extended vacation so will let others pursue this. IIUC this is no longer an opt-in by the user at runtime, but an opt-in by the particular GC developers. Okay. My only concern with that is that if Shenandoah is the only GC that currently opts in, then this code is not going to get much testing and will be more prone to incidental breakage.
>
> Cheers,
> David
>
> On 2/06/2017 2:21 AM, Roman Kennke wrote:
>> [...]
>> >> Roman >> From robbin.ehn at oracle.com Fri Jun 2 10:39:11 2017 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Fri, 2 Jun 2017 12:39:11 +0200 Subject: RFR: Parallelize safepoint cleanup In-Reply-To: <42872a15-d26c-9798-c6a2-f3f7c945baf7@redhat.com> References: <486b5a72-bef8-4ebc-2729-3fe3aa3ab3b9@oracle.com> <5c80f8df-27c9-f9a9-dc6d-47f9c6019a61@redhat.com> <46ad874e-eb41-7927-265a-40dea92dfe1e@oracle.com> <5711258b-99b0-e06f-ba6e-0b6b55d88345@redhat.com> <0e1e2779-9316-b756-6cc8-e0c8add14a94@oracle.com> <1910961c-11bd-0e86-dd03-4fce66b9969f@redhat.com> <2b466176-b688-53a8-bef9-c7ec2c8c745b@oracle.com> <42872a15-d26c-9798-c6a2-f3f7c945baf7@redhat.com> Message-ID: <5e7c7d00-4acd-bea3-3525-33dbd9159efb@oracle.com> Hi Roman, On 06/02/2017 11:41 AM, Roman Kennke wrote: > Hi David, > thanks for reviewing. I'll be on vacation the next two weeks too, with > only sporadic access to work stuff. > Yes, exposure will not be as good as otherwise, but it's not totally > untested either: the serial code path is the same as the parallel, the > only difference is that it's not actually called by multiple threads. > It's ok I think. > > I found two more issues that I think should be addressed: > - There are some counters in deflate_idle_monitors() and I'm not sure I > correctly handle them in the split-up and MT'ed thread-local/ global > list deflation > - nmethod marking seems to unconditionally poke true or something like > that in nmethod fields. This doesn't hurt correctness-wise, but it's > probably worth checking if it's already true, especially when doing this > with multiple threads concurrently. > > I'll send an updated patch around later, I hope I can get to it today... I'll review that when you get it out. I think this looks as a reasonable step before we tackle this with a major effort, such as the JEP you and Carsten doing. And another effort to 'fix' nmethods marking. Internal discussion yesterday lead us to conclude that the runtime will probably need more threads. This would be a good driver to do a 'global' worker pool which serves both gc, runtime and safepoints with threads. > > Roman > >> Hi Roman, >> >> I am about to disappear on an extended vacation so will let others >> pursue this. IIUC this is longer an opt-in by the user at runtime, but >> an opt-in by the particular GC developers. Okay. My only concern with >> that is if Shenandoah is the only GC that currently opts in then this >> code is not going to get much testing and will be more prone to >> incidental breakage. As I mentioned before, it seem like Erik ? have some idea, maybe he can do this after his barrier patch. Thanks! /Robbin >> >> Cheers, >> David >> >> On 2/06/2017 2:21 AM, Roman Kennke wrote: >>> Am 01.06.2017 um 17:50 schrieb Roman Kennke: >>>> Am 01.06.2017 um 14:18 schrieb Robbin Ehn: >>>>> Hi Roman, >>>>> >>>>> On 06/01/2017 11:29 AM, Roman Kennke wrote: >>>>>> Am 31.05.2017 um 22:06 schrieb Robbin Ehn: >>>>>>> Hi Roman, I agree that is really needed but: >>>>>>> >>>>>>> On 05/31/2017 10:27 AM, Roman Kennke wrote: >>>>>>>> I realized that sharing workers with GC is not so easy. >>>>>>>> >>>>>>>> We need to be able to use the workers at a safepoint during >>>>>>>> concurrent >>>>>>>> GC work (which also uses the same workers). This does not only >>>>>>>> require >>>>>>>> that those workers be suspended, like e.g. >>>>>>>> SuspendibleThreadSet::yield(), but they need to be idle, i.e. have >>>>>>>> finished their tasks. 
This needs some careful handling to work >>>>>>>> without >>>>>>>> races: it requires a SuspendibleThreadSetJoiner around the >>>>>>>> corresponding >>>>>>>> run_task() call and also the tasks themselves need to join the >>>>>>>> STS and >>>>>>>> handle requests for safepoints not by yielding, but by leaving the >>>>>>>> task. >>>>>>>> This is far too peculiar for me to make the call to hook up GC >>>>>>>> workers >>>>>>>> for safepoint cleanup, and I thus removed those parts. I left the >>>>>>>> API in >>>>>>>> CollectedHeap in place. I think GC devs who know better about G1 >>>>>>>> and CMS >>>>>>>> should make that call, or else just use a separate thread pool. >>>>>>>> >>>>>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.05/ >>>>>>>> >>>>>>>> >>>>>>>> Is it ok now? >>>>>>> I still think you should put the "Parallel Safepoint Cleanup" >>>>>>> workers >>>>>>> inside Shenandoah, >>>>>>> so the SafepointSynchronizer only calls get_safepoint_workers, e.g.: >>>>>>> >>>>>>> _cleanup_workers = heap->get_safepoint_workers(); >>>>>>> _num_cleanup_workers = _cleanup_workers != NULL ? >>>>>>> _cleanup_workers->total_workers() : 1; >>>>>>> ParallelSPCleanupTask cleanup(_cleanup_subtasks); >>>>>>> StrongRootsScope srs(_num_cleanup_workers); >>>>>>> if (_cleanup_workers != NULL) { >>>>>>> _cleanup_workers->run_task(&cleanup, _num_cleanup_workers); >>>>>>> } else { >>>>>>> cleanup.work(0); >>>>>>> } >>>>>>> >>>>>>> That way you don't even need your new flags, but it will be up to >>>>>>> the >>>>>>> other GCs to make their worker available >>>>>>> or cheat with a separate workgang. >>>>>> I can do that, I don't mind. The question is, do we want that? >>>>> The problem is that we do not want to haste such decision, we believe >>>>> there is a better solution. >>>>> I think you also would want another solution. >>>>> But it's seems like such solution with 1 'global' thread pool either >>>>> own by GC or the VM it self is quite the undertaking. >>>>> Since this probably will not be done any time soon my suggestion is, >>>>> to not hold you back (we also want this), just to make >>>>> the code parallel and as an intermediate step ask the GC if it minds >>>>> sharing it's thread. >>>>> >>>>> Now when Shenandoah is merged it's possible that e.g. G1 will share >>>>> the code for a separate thread pool, do something of it's own or >>>>> wait until the bigger question about thread pool(s) have been >>>>> resolved. >>>>> >>>>> By adding a thread pool directly to the SafepointSynchronizer and >>>>> flags for it we might limit our future options. >>>>> >>>>>> I wouldn't call it 'cheating with a separate workgang' though. I see >>>>>> that both G1 and CMS suspend their worker threads at a safepoint. >>>>>> However: >>>>> Yes it's not cheating but I want decent heuristics between e.g. number >>>>> of concurrent marking threads and parallel safepoint threads since >>>>> they compete for cpu time. >>>>> As the code looks now, I think that decisions must be made by the GC. >>>> Ok, I see your point. I updated the proposed patch accordingly: >>>> >>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.06/ >>>> >>> Oops. Minor mistake there. Correction: >>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.07/ >>> >>> >>> (Removed 'class WorkGang' from safepoint.hpp, and forgot to add it into >>> collectedHeap.hpp, resulting in build failure...) 
>>> >>> Roman >>> > From calvinrsmith at hotmail.com Fri Jun 2 16:20:00 2017 From: calvinrsmith at hotmail.com (Calvin Smith) Date: Fri, 2 Jun 2017 16:20:00 +0000 Subject: Revew Proposal for transactional aware GC Message-ID: TransactionGC: Less GC for Transactional Systems Similar to proposal: JEP draft: Epsilon GC: The Arbitrarily Low Overhead Garbage (Non-)Collector Summary Start as normal and run with the chosen garbage collector. A class/method may be registered and once it starts to run and until the method exits all memory is allocated in a thread local allocation buffer and is not available for other threads. In addition there is no garbage collection for these objects during this time. Once the buffer is exhausted then an OutOfMemory will be raised and the stack unwind Goals When running in a request / response transaction environment do not collect objects created during the transaction. Non-Goals There is no goal to require any special Java code. The same code should work with or without this feature. Motivation When running in a request / response transaction environment there may be a need to create a lot of objects, however, none of these objects are required after the transaction has completed and if the transaction is short enough (single digit milliseconds) any garbage collection at all will delay the system. Some of this can be mitigated by carefully tracing and being aware of the objects that are created, however, due to presence of third-party code (some of which may be built into the JDK) this is not always possible. Description Standard JDK by default; enabled with special option: -XX:+UseTransactionGC. With this option the JDK works as normal until a given configured method is invoked. Once invoked all future allocations from that point on, on that thread, until the method exits is different. All allocations occur in a thread local allocation buffer where an allocation is simply updating a pointer to account for the allocation. There is no garbage tracking or collection within this method. Instead the memory buffer gets more filled up with each allocation. Once the method exits then the pointer is updated again to be reset at the place it was at the start of the method, thus de-allocating all the objects at one time. The goal is to avoid a GC pause to collect objects created during a transaction. A global GC may still run and may pause the thread. As there is no GC of this buffer all of the objects created during the transaction must fit within the TLAB at one time. Issues to solve / be aware of when the configured method is executing: * Any memory allocated may only be referenced by other objects also created during this time. For example a cache created before the method starts may not access any of these objects * Any memory allocated may only be referenced by the allocating thread. * Any attempt to break the prior two rules will result in a VM error * Due to first two rules there is no need for GC to occur, the memory may be de-allocated simply by setting the TLAB pointer back to it's original location. * finalize must be called on these objects prior to de-alloction, however, execution of the finalize must still honor the first two rules. * Any uncaught exceptions that pass out of the method must be moved prior to de-allocation such that the exception is no longer in the area to be cleared. Furthermore, any references that the exception contains must also be moved. 
* Any exception thrown and caught during this time may be treated like any other object and will be de-allocated at the exit of the method.
* If an object that is created in this method is required to be referenced from the outside, then it must be created outside the method. One such way is to place the data for the object on a queue and have a background thread process the queue and create the object. Adding to the queue must not create any objects, or the background thread cannot access them. Instead, pre-allocated entries may be used.
* The objects will still show up during a heap dump; perhaps they can be marked in some fashion to make it easier to recognize that they are not GC'able objects.
* JNI. Any JNI call cannot access the created objects after the method has ended.

Further enhancements:

* Synchronization - As none of the objects created in this mode can be accessed by other threads, there is no need for synchronization, so all synchronization can be removed / stubbed out.

Alternatives

Some of this may be done without changing the JVM, instead using a Java agent. When the method is invoked, the referenced classes are re-transformed: the methods of the referenced classes are converted to static, and all objects are created as bytes in a byte array. All field references are updated to read/write this byte array. An object reference is then just a number which is an index into the array. When an instance method is invoked, two extra parameters are passed: a) the byte array, and b) the index into the byte array that is the start of the instance.

From kirk at kodewerk.com Tue Jun 6 08:26:24 2017
From: kirk at kodewerk.com (Kirk Pepperdine)
Date: Tue, 6 Jun 2017 10:26:24 +0200
Subject: Parallel reference processing
Message-ID: <2C20D7DA-2E0C-44B4-B5CB-54201A670279@kodewerk.com>

Hi,

I keep running into cases where reference processing dominates the pause time budget (no matter which collector is configured). In all cases, configuring parallel reference processing helped enormously. Reference processing is single-threaded by default. I'm wondering if there is a reason why reference processing could not be parallel by default, or parallelized if the workload exceeds a reasonable threshold.

Kind regards,
Kirk Pepperdine

From shade at redhat.com Tue Jun 6 08:50:25 2017
From: shade at redhat.com (Aleksey Shipilev)
Date: Tue, 6 Jun 2017 10:50:25 +0200
Subject: Parallel reference processing
In-Reply-To: <2C20D7DA-2E0C-44B4-B5CB-54201A670279@kodewerk.com>
References: <2C20D7DA-2E0C-44B4-B5CB-54201A670279@kodewerk.com>
Message-ID: <1290fc41-01fd-767a-59c8-768162d1a98b@redhat.com>

On 06/06/2017 10:26 AM, Kirk Pepperdine wrote:
> Hi,
>
> I keep running into cases where reference processing dominates the pause time budget (no matter which collector is configured). In all cases, configuring parallel reference processing helped enormously. Reference processing is single-threaded by default. I'm wondering if there is a reason why reference processing could not be parallel by default, or parallelized if the workload exceeds a reasonable threshold.

See: https://bugs.openjdk.java.net/browse/JDK-8043575

-Aleksey
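The dynamic switching that JDK-8043575 proposes might take roughly the following shape; this is a guess under stated assumptions, not the actual patch, and the threshold value is made up.

```cpp
#include <cstddef>

// Decide serial vs. parallel reference processing per GC cycle from the
// number of discovered references, so that small workloads skip the
// worker start/stop overhead entirely.
bool use_parallel_ref_processing(bool mt_allowed, size_t discovered_refs) {
  const size_t parallel_threshold = 1000;  // hypothetical tuning knob
  return mt_allowed && discovered_refs > parallel_threshold;
}
```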
From sangheon.kim at oracle.com Tue Jun 6 17:44:06 2017
From: sangheon.kim at oracle.com (sangheon)
Date: Tue, 6 Jun 2017 10:44:06 -0700
Subject: Parallel reference processing
In-Reply-To: <2C20D7DA-2E0C-44B4-B5CB-54201A670279@kodewerk.com>
References: <2C20D7DA-2E0C-44B4-B5CB-54201A670279@kodewerk.com>
Message-ID: <2df56a2d-b219-9b84-0e84-967953f122dc@oracle.com>

Hi Kirk,

On 06/06/2017 01:26 AM, Kirk Pepperdine wrote:
> Hi,
>
> I keep running into cases where reference processing dominates the pause time budget (no matter which collector is configured). In all cases, configuring parallel reference processing helped enormously. Reference processing is single-threaded by default. I'm wondering if there is a reason why reference processing could not be parallel by default, or parallelized if the workload exceeds a reasonable threshold.

The biggest reason, I think, is that in some cases - if there are not many references [1] - the single-thread case is faster. Of course, this is controversial, as choosing a benchmark will show different results. Probably big enough applications tend to have many references. But this is why we don't set 'ParallelRefProcEnabled=true' as a default.

The current implementation spends some time on starting/stopping worker threads.
We start and stop worker threads 9 times (3 for SoftReference and 2 for each of the other types) for reference processing, and this makes it slower than the single-thread case in some cases.

JDK-8043575 proposes to dynamically switch between MT and single thread, and there are other CRs to enhance reference processing. I have a prototype but it needs more refining. Please keep an eye on this if you are interested. (Thanks, Aleksey, for the link in the other email thread.)

[1]: e.g. Most of the Specjvm2008 sub-tests don't use references. Derby is an exceptional case that shows over 12k FinalReferences. So a single thread is faster except in the Derby case.

Thanks,
Sangheon

> Kind regards,
> Kirk Pepperdine

From kirk at kodewerk.com Tue Jun 6 19:26:30 2017
From: kirk at kodewerk.com (Kirk Pepperdine)
Date: Tue, 6 Jun 2017 21:26:30 +0200
Subject: Parallel reference processing
In-Reply-To: <2df56a2d-b219-9b84-0e84-967953f122dc@oracle.com>
References: <2C20D7DA-2E0C-44B4-B5CB-54201A670279@kodewerk.com> <2df56a2d-b219-9b84-0e84-967953f122dc@oracle.com>
Message-ID: <05FF4DD7-5C2A-4061-AD55-E7AB30FAE1C1@kodewerk.com>

> On Jun 6, 2017, at 7:44 PM, sangheon wrote:
>
> [...]
>
> [1]: e.g. Most of the Specjvm2008 sub-tests don't use references. Derby is an exceptional case that shows over 12k FinalReferences. So a single thread is faster except in the Derby case.

SpecJVM doesn't represent the real world. In the real world most applications use weak, soft and final references with a sprinkling of Phantom. I think Aleksey's link was most interesting; my bad for not searching the bug database prior to posting.

Anyways, I don't mind charging clients a fee to tell them to turn on this flag but...

Kind regards,
Kirk

From sangheon.kim at oracle.com Tue Jun 6 19:40:15 2017
From: sangheon.kim at oracle.com (sangheon)
Date: Tue, 6 Jun 2017 12:40:15 -0700
Subject: Parallel reference processing
In-Reply-To: <05FF4DD7-5C2A-4061-AD55-E7AB30FAE1C1@kodewerk.com>
References: <2C20D7DA-2E0C-44B4-B5CB-54201A670279@kodewerk.com> <2df56a2d-b219-9b84-0e84-967953f122dc@oracle.com> <05FF4DD7-5C2A-4061-AD55-E7AB30FAE1C1@kodewerk.com>
Message-ID: <440ac23c-6f34-51a5-9bf6-c62696dc2d73@oracle.com>

On 06/06/2017 12:26 PM, Kirk Pepperdine wrote:
>> [...]
>
> SpecJVM doesn't represent the real world.

Absolutely! I was trying to answer the reason why ParallelRefProcEnabled is set to false as a default.

> In the real world most applications use weak, soft and final references with a sprinkling of Phantom.
> I think Aleksey's link was most interesting; my bad for not searching the bug database prior to posting.
>
> Anyways, I don't mind charging clients a fee to tell them to turn on this flag but...

Okay. I hope JDK-8043575 would help them.

Thanks,
Sangheon

> Kind regards,
> Kirk

From kirk at kodewerk.com Tue Jun 6 19:51:41 2017
From: kirk at kodewerk.com (Kirk Pepperdine)
Date: Tue, 6 Jun 2017 21:51:41 +0200
Subject: Parallel reference processing
In-Reply-To: <440ac23c-6f34-51a5-9bf6-c62696dc2d73@oracle.com>
References: <2C20D7DA-2E0C-44B4-B5CB-54201A670279@kodewerk.com> <2df56a2d-b219-9b84-0e84-967953f122dc@oracle.com> <05FF4DD7-5C2A-4061-AD55-E7AB30FAE1C1@kodewerk.com> <440ac23c-6f34-51a5-9bf6-c62696dc2d73@oracle.com>
Message-ID: <7C8B47AA-21B2-4253-90F5-96EAE8CA4574@kodewerk.com>

> On Jun 6, 2017, at 9:40 PM, sangheon wrote:
>
> On 06/06/2017 12:26 PM, Kirk Pepperdine wrote:
>> [...]
>
> Absolutely!
> I was trying to answer the reason why ParallelRefProcEnabled is set to false as a default.

I got that. I was trying to suggest that basing this decision on that benchmark isn't a great idea.

Kind regards,
Kirk
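A schematic of the gang start/stop cost sangheon describes above; this is not HotSpot source, just an illustration of why parallel reference processing can lose when few references are pending. Reference processing runs as a sequence of phases (3 for SoftReference, 2 for each other type, 9 gang activations in total), and in the parallel case every phase pays a full worker start/stop barrier.

```cpp
#include <cstddef>

typedef void (*RefPhaseFn)(unsigned worker_id);

static void run_phase(RefPhaseFn phase, bool parallel, unsigned num_workers) {
  if (parallel) {
    // Stand-in for WorkGang::run_task(): wake the gang, run the phase on
    // every worker, then wait for all of them. One start/stop per phase.
    for (unsigned i = 0; i < num_workers; i++) {
      phase(i);
    }
  } else {
    phase(0);  // serial: no coordination overhead at all
  }
}

void process_references(RefPhaseFn* phases, size_t num_phases,
                        bool parallel, unsigned num_workers) {
  for (size_t i = 0; i < num_phases; i++) {
    // With ParallelRefProcEnabled, each iteration is one gang activation;
    // with few pending references the barriers dominate the actual work.
    run_phase(phases[i], parallel, num_workers);
  }
}
```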
From sangheon.kim at oracle.com Tue Jun 6 20:36:56 2017
From: sangheon.kim at oracle.com (sangheon)
Date: Tue, 6 Jun 2017 13:36:56 -0700
Subject: Parallel reference processing
In-Reply-To: <7C8B47AA-21B2-4253-90F5-96EAE8CA4574@kodewerk.com>
References: <2C20D7DA-2E0C-44B4-B5CB-54201A670279@kodewerk.com> <2df56a2d-b219-9b84-0e84-967953f122dc@oracle.com> <05FF4DD7-5C2A-4061-AD55-E7AB30FAE1C1@kodewerk.com> <440ac23c-6f34-51a5-9bf6-c62696dc2d73@oracle.com> <7C8B47AA-21B2-4253-90F5-96EAE8CA4574@kodewerk.com>
Message-ID: <73255ea9-6ed2-2a66-f3cb-4ac4325af070@oracle.com>

On 06/06/2017 12:51 PM, Kirk Pepperdine wrote:
>> On Jun 6, 2017, at 9:40 PM, sangheon wrote:
>>
>> On 06/06/2017 12:26 PM, Kirk Pepperdine wrote:
>>>> On Jun 6, 2017, at 7:44 PM, sangheon wrote:
>>>>
>>>> Hi Kirk,
>>>>
>>>> On 06/06/2017 01:26 AM, Kirk Pepperdine wrote:
>>>>> Hi,
>>>>>
>>>>> I keep running into cases where reference processing dominates the pause time budget (no matter which collector is configured). In all cases configuring parallel reference processing helped enormously. Reference processing is single-threaded by default. I'm wondering if there is a reason reference processing couldn't be parallel by default, or be parallelized if the workload exceeds a reasonable threshold.
>>>> The biggest reason, I think, is that in some cases - if there are not many references [1] - the single-threaded case is faster. Of course, this is debatable, as different benchmarks will show different results. Big enough applications probably tend to have many references. But this is why we don't set 'ParallelRefProcEnabled=true' as the default.
>>>>
>>>> The current implementation spends some time on starting/stopping worker threads. We start and stop worker threads 9 times (3 times for SoftReference and 2 times for each of the other types) for reference processing, and in some cases this makes it slower than the single-threaded case.
>>>>
>>>> JDK-8043575 is proposing to dynamically switch between MT and single thread, and there are other CRs to enhance reference processing. I have a prototype but it needs more refining. Please keep an eye on this if you are interested. (Thanks, Aleksey, for the link in the other email thread.)
>>>>
>>>> [1]: e.g. most of the SPECjvm2008 sub-tests don't use references. Derby is the exceptional case, showing over 12k FinalReferences. So single thread is faster except in the Derby case.
>>>
>>> SpecJVM doesn't represent the real world.
>> Absolutely!
>> I was trying to explain why ParallelRefProcEnabled is set to false as the default.
>
> I got that. I was trying to suggest that basing this decision on that benchmark isn't a great idea.
Probably my explanation was incomplete.

The ParallelRefProcEnabled command-line option was introduced a long time ago with false as the default, and my previous answer about SPECjvm2008 was my guess from recent data gathered while investigating JDK-8043575. I was saying that if we don't have enough references to process, single thread is the better choice, so that could be the reason for the current default value. Or my guess could simply be wrong. :)

Probably you are saying that we have to use other benchmarks to decide the default value. May I ask what your recommendation for the benchmarks would be? I will not try to change the default value, but your recommendation would be helpful for further investigation.
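For completeness, turning it on by hand looks like this - MyApp is a placeholder, and -Xlog:gc+ref is the JDK 9 unified logging syntax; on JDK 8 the closest equivalent output comes from -XX:+PrintReferenceGC:

    java -XX:+UseG1GC -XX:+ParallelRefProcEnabled -Xlog:gc+ref=debug MyApp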
Thanks,
Sangheon

> Kind regards,
> Kirk

From stuart.monteith at linaro.org Wed Jun 7 16:20:55 2017
From: stuart.monteith at linaro.org (Stuart Monteith)
Date: Wed, 7 Jun 2017 17:20:55 +0100
Subject: Review Proposal for transactional aware GC
In-Reply-To:
References:
Message-ID:

Hello,

This is similar to how the Realtime Specification for Java (RTSJ - JSR-1) manages memory. If you aren't aware, see: http://www.rtsj.org/specjavadoc/book_index.html . The concept they use is called "ScopedMemory". It is expected that the VM will throw an appropriate exception when references to objects in ScopedMemory are stored in objects outside - I think it is a javax.realtime.MemoryAccessError. Is this what you mean by a "VM error"? They also solve the problem of how exceptions are allocated by use of a javax.realtime.ThrowBoundaryError, which is thrown outside of the scope of the ScopedMemory where the exception was raised.

There is precedent for resetting the whole VM between transactions, but I won't say more on that.

BR,
Stuart

On 2 June 2017 at 17:20, Calvin Smith wrote:
> TransactionGC: Less GC for Transactional Systems
>
> Similar to proposal: JEP draft: Epsilon GC: The Arbitrarily Low Overhead Garbage (Non-)Collector
>
> Summary
>
> Start as normal and run with the chosen garbage collector. A class/method may be registered; from when it starts to run until the method exits, all memory is allocated in a thread-local allocation buffer and is not available to other threads. In addition, there is no garbage collection for these objects during this time. Once the buffer is exhausted, an OutOfMemoryError will be raised and the stack will unwind.
>
> Goals
>
> When running in a request/response transaction environment, do not collect objects created during the transaction.
>
> Non-Goals
>
> There is no goal to require any special Java code. The same code should work with or without this feature.
>
> Motivation
>
> When running in a request/response transaction environment there may be a need to create a lot of objects; however, none of these objects are required after the transaction has completed, and if the transaction is short enough (single-digit milliseconds) any garbage collection at all will delay the system. Some of this can be mitigated by carefully tracing and being aware of the objects that are created; however, due to the presence of third-party code (some of which may be built into the JDK) this is not always possible.
>
> Description
>
> Standard JDK by default; enabled with a special option: -XX:+UseTransactionGC.
>
> With this option the JDK works as normal until a given configured method is invoked. Once it is invoked, all allocations from that point on, on that thread, until the method exits, are handled differently. All allocations occur in a thread-local allocation buffer, where an allocation simply updates a pointer to account for the allocation. There is no garbage tracking or collection within this method. Instead, the buffer fills up with each allocation. Once the method exits, the pointer is reset to where it was at the start of the method, thus de-allocating all the objects at once.
>
> The goal is to avoid a GC pause to collect objects created during a transaction. A global GC may still run and may pause the thread. As there is no GC of this buffer, all of the objects created during the transaction must fit within the TLAB at one time.
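A minimal sketch of the core mechanism described above - all names are illustrative, and only ThreadLocalAllocBuffer resembles a real HotSpot class:

    // Illustrative only: O(1) de-allocation by saving and restoring the
    // TLAB bump pointer around the registered method.
    class TransactionScope : public StackObj {
      ThreadLocalAllocBuffer* _tlab;
      HeapWord*               _saved_top;
    public:
      TransactionScope(ThreadLocalAllocBuffer* tlab)
        : _tlab(tlab), _saved_top(tlab->top()) { }
      ~TransactionScope() {
        // Everything allocated inside the scope dies here at once.
        _tlab->set_top(_saved_top);
      }
    };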
> Issues to solve / be aware of when the configured method is executing:
>
> * Any memory allocated may only be referenced by other objects also created during this time. For example, a cache created before the method starts may not access any of these objects.
> * Any memory allocated may only be referenced by the allocating thread.
> * Any attempt to break the prior two rules will result in a VM error.
> * Due to the first two rules there is no need for GC to occur; the memory may be de-allocated simply by setting the TLAB pointer back to its original location.
> * finalize must be called on these objects prior to de-allocation; however, execution of the finalize must still honor the first two rules.
> * Any uncaught exceptions that pass out of the method must be moved prior to de-allocation, such that the exception is no longer in the area to be cleared. Furthermore, any references that the exception contains must also be moved.
> * Any exception thrown and caught during this time may be treated like any other object and will be de-allocated at the exit of the method.
> * If an object that is created in this method is required to be referenced from the outside, then it must be created outside the method. One such way is to place the data for the object on a queue and have a background thread process the queue and create the object. Adding to the queue must not create any objects, since the background thread could not access them; instead, pre-allocated entries may be used.
> * The objects will still show up during a heap dump; perhaps they can be marked in some fashion to make it easier to recognize that they are not GC'able objects.
> * JNI: no JNI call may access the created objects after the method has ended.
>
> Further enhancements:
>
> * Synchronization - As none of the objects created in this mode can be accessed by other threads there is no need for synchronization, so all synchronization can be removed / stubbed out.
>
> Alternatives
>
> Some of this may be done without changing the JVM, using a Java agent instead. When the method is invoked, the classes referenced are re-transformed: the methods of the referenced classes are converted to static, and all objects are created as bytes in a byte array. All field references are updated to read/write this byte array. An object reference is then just a number which is an index into the array. When an instance method is invoked, two extra parameters are passed: a) the byte array, and b) the index into the byte array that is the start of the instance.

From kirk at kodewerk.com Wed Jun 7 06:17:19 2017
From: kirk at kodewerk.com (Kirk Pepperdine)
Date: Wed, 7 Jun 2017 08:17:19 +0200
Subject: Parallel reference processing
In-Reply-To: <73255ea9-6ed2-2a66-f3cb-4ac4325af070@oracle.com>
References: <2C20D7DA-2E0C-44B4-B5CB-54201A670279@kodewerk.com> <2df56a2d-b219-9b84-0e84-967953f122dc@oracle.com> <05FF4DD7-5C2A-4061-AD55-E7AB30FAE1C1@kodewerk.com> <440ac23c-6f34-51a5-9bf6-c62696dc2d73@oracle.com> <7C8B47AA-21B2-4253-90F5-96EAE8CA4574@kodewerk.com> <73255ea9-6ed2-2a66-f3cb-4ac4325af070@oracle.com>
Message-ID:

>>>>> JDK-8043575 is proposing to dynamically switch between MT and single thread, and there are other CRs to enhance reference processing.
>>>>> I have a prototype but it needs more refining. Please keep an eye on this if you are interested. (Thanks, Aleksey, for the link in the other email thread.)
>>>>>
>>>>> [1]: e.g.
>>>>> most of the SPECjvm2008 sub-tests don't use references. Derby is the exceptional case, showing over 12k FinalReferences. So single thread is faster except in the Derby case.
>>>>
>>>> SpecJVM doesn't represent the real world.
>>> Absolutely!
>>> I was trying to explain why ParallelRefProcEnabled is set to false as the default.
>>
>> I got that. I was trying to suggest that basing this decision on that benchmark isn't a great idea.
> Probably my explanation was incomplete.

I think we're talking past each other; my apologies for being a bit too terse. The referenced bug report seems to cover all of my concerns.

> The ParallelRefProcEnabled command-line option was introduced a long time ago with false as the default, and my previous answer about SPECjvm2008 was my guess from recent data gathered while investigating JDK-8043575. I was saying that if we don't have enough references to process, single thread is the better choice, so that could be the reason for the current default value. Or my guess could simply be wrong. :)

Right, hence my comment that SpecJVM isn't a great benchmark: it doesn't represent enterprise applications and hence has nothing useful to say about reference processing, which is commonly seen in enterprise applications.

> Probably you are saying that we have to use other benchmarks to decide the default value.
> May I ask what your recommendation for the benchmarks would be?

I don't have a specific benchmark for this. I'm relying on observations made from customers' applications. I don't know if the SPEC application server benchmark addresses this question; I've not run it in quite some time, so I don't recall. At any rate, it is very clear that real-world applications use many frameworks that rely heavily on the use of reference types - for example, Hibernate with secondary caching turned on. CMS is sensitive, but G1's remark phase appears to be exceptionally sensitive to the number of references it processes. I've attached a pause time chart and a G1 breakout of the other phases that is very typical of what I'm seeing.

Kind regards,
Kirk

(Attachments: PastedGraphic-2.tiff, PastedGraphic-1.tiff - the pause time and phase breakout charts mentioned above)

From sangheon.kim at oracle.com Wed Jun 7 20:28:56 2017
From: sangheon.kim at oracle.com (sangheon)
Date: Wed, 7 Jun 2017 13:28:56 -0700
Subject: Parallel reference processing
In-Reply-To:
References: <2C20D7DA-2E0C-44B4-B5CB-54201A670279@kodewerk.com> <2df56a2d-b219-9b84-0e84-967953f122dc@oracle.com> <05FF4DD7-5C2A-4061-AD55-E7AB30FAE1C1@kodewerk.com> <440ac23c-6f34-51a5-9bf6-c62696dc2d73@oracle.com> <7C8B47AA-21B2-4253-90F5-96EAE8CA4574@kodewerk.com> <73255ea9-6ed2-2a66-f3cb-4ac4325af070@oracle.com>
Message-ID: <70472822-8015-cd9f-97ec-40d377f40aef@oracle.com>

On 06/06/2017 11:17 PM, Kirk Pepperdine wrote:
>>>>>> Derby is the exceptional case, showing over 12k FinalReferences. So single thread is faster except in the Derby case.
>>>>>
>>>>> SpecJVM doesn't represent the real world.
>>>> Absolutely!
>>>> I was trying to explain why ParallelRefProcEnabled is set to false as the default.
>>>
>>> I got that. I was trying to suggest that basing this decision on that benchmark isn't a great idea.
>> Probably my explanation was incomplete.
>
> I think we're talking past each other; my apologies for being a bit too terse.
Same here. :)
> The referenced bug report seems to cover all of my concerns.
Good to hear that the CR would cover your concerns.
>
>> The ParallelRefProcEnabled command-line option was introduced a long time ago with false as the default, and my previous answer about SPECjvm2008 was my guess from recent data gathered while investigating JDK-8043575. I was saying that if we don't have enough references to process, single thread is the better choice, so that could be the reason for the current default value. Or my guess could simply be wrong. :)
>
> Right, hence my comment that SpecJVM isn't a great benchmark: it doesn't represent enterprise applications and hence has nothing useful to say about reference processing, which is commonly seen in enterprise applications.
Yes, SpecJVM doesn't represent enterprise applications. I was trying to say that the MTness of reference processing is mostly affected by the total number of references; mentioning the benchmark name just added noise.

JDK-8043575 is mostly about dynamically choosing MTness for reference processing. Focusing on the switch (turning the option on/off), any application that shows the relevant aspects (many references, limited references, etc.) seems okay to me. In my case, as SPECjvm2008-Derby has many final references (over 12k), my prototype should show almost the same result as "baseline, +ParallelRefProcEnabled". And for the other sub-tests of SPECjvm2008, my prototype should show almost the same as "baseline, -ParallelRefProcEnabled", because with limited references single thread shows better results. Initially I worked with a micro-benchmark, but I also wanted to test with a known benchmark as well. I hope this explains why I used SPECjvm2008.

>
>> Probably you are saying that we have to use other benchmarks to decide the default value.
>> May I ask what your recommendation for the benchmarks would be?
>
> I don't have a specific benchmark for this. I'm relying on observations made from customers' applications. I don't know if the SPEC application server benchmark addresses this question; I've not run it in quite some time, so I don't recall. At any rate, it is very clear that real-world applications use many frameworks that rely heavily on the use of reference types - for example, Hibernate with secondary caching turned on. CMS is sensitive, but G1's remark phase appears to be exceptionally sensitive to the number of references it processes. I've attached a pause time chart and a G1 breakout of the other phases that is very typical of what I'm seeing.
Thank you for the attachments. I will analyze a bit more.

Thanks,
Sangheon

> Kind regards,
> Kirk
From mark.reinhold at oracle.com Wed Jun 7 22:42:05 2017
From: mark.reinhold at oracle.com (mark.reinhold at oracle.com)
Date: Wed, 7 Jun 2017 15:42:05 -0700 (PDT)
Subject: JEP 189: Shenandoah: An Ultra-Low-Pause-Time Garbage Collector
Message-ID: <20170607224205.4A45DF977C@eggemoggin.niobe.net>

New JEP Candidate: http://openjdk.java.net/jeps/189

- Mark

From mark.reinhold at oracle.com Wed Jun 7 23:12:59 2017
From: mark.reinhold at oracle.com (mark.reinhold at oracle.com)
Date: Wed, 7 Jun 2017 16:12:59 -0700 (PDT)
Subject: JEP 304: Garbage-Collector Interface
Message-ID: <20170607231259.1EE66F978A@eggemoggin.niobe.net>

New JEP Candidate: http://openjdk.java.net/jeps/304

- Mark

From stefan.johansson at oracle.com Thu Jun 8 12:35:58 2017
From: stefan.johansson at oracle.com (Stefan Johansson)
Date: Thu, 8 Jun 2017 14:35:58 +0200
Subject: RFR: 8177544: Restructure G1 Full GC code
Message-ID: <62d1f02b-1fc0-ffcf-b8e0-e88ebacecebe@oracle.com>

Hi,

Please review this enhancement:
https://bugs.openjdk.java.net/browse/JDK-8177544

Webrev:
http://cr.openjdk.java.net/~sjohanss/8177544/hotspot.00/

Summary:
This is more or less only code moving around. The function do_full_collection in G1CollectedHeap is very large; breaking it up into smaller parts and grouping together some of the stack objects helps readability.

In addition to splitting the large function into smaller ones I've introduced two new classes:
- G1FullGCScope, which groups most of the previously spread-out stack objects.
- G1SerialCollector, which handles the interaction with G1MarkSweep.

Making this change will simplify future changes to the full GC.
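To give a feel for the shape of the first class - a simplified sketch of the idea only, not the actual webrev code:

    // Sketch: one stack object owning the marks/timers that
    // do_full_collection previously created as individual locals.
    class G1FullGCScope : public StackObj {
      ResourceMark   _rm;
      IsGCActiveMark _active_mark;
      GCTraceCPUTime _cpu_time;
      // ... plus timer, tracer and soft-ref policy state
    public:
      G1FullGCScope() { }
    };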
Testing:
* Locally run JTREG tests
* RBT hotspot tier 2 & 3

Thanks,
Stefan

From Milan.Mimica at infobip.com Thu Jun 8 12:57:27 2017
From: Milan.Mimica at infobip.com (Milan Mimica)
Date: Thu, 8 Jun 2017 12:57:27 +0000
Subject: [PATCH] JDK-8176571: Fine bitmaps should be allocated as belonging to mtGC
In-Reply-To: <6110539E-B4E2-475D-8199-7C30449960D2@oracle.com>
References: <1495365159435.54025@infobip.com> <1495434908161.70459@infobip.com> <1495466592.2573.76.camel@oracle.com> <1495734990075.28893@infobip.com> <6110539E-B4E2-475D-8199-7C30449960D2@oracle.com>
Message-ID: <1496926647107.27811@infobip.com>

Milan Mimica, Senior Software Engineer / Division Lead

> From: Kim Barrett
> Sent: Saturday, May 27, 2017 01:00
>
> We appreciate your interest in working on this and are happy to help, but we really do need to check off this process item before going any deeper.

Filled out and sent.

> Yes, some refactoring seems required in order to properly fix JDK-8176571. That's what I meant by:
>
>> On May 22, 2017, at 4:41 PM, Kim Barrett wrote:
>>> There doesn't seem to be a good path into the functionality provided by ArrayAllocator<> that has such a runtime MEMFLAGS value [...] The lack of a runtime-only path for propagating that information would need to be added [sic. fixed].

Okay. There are two patches.

- refactor_array_allocator.diff
Pass MEMFLAGS down to the concrete allocator via the call stack instead of using a template parameter. This does feel right, especially because some methods don't even use MEMFLAGS.

- heapBitMap_nmt.diff
Changed CHeapBitMap to use a configurable NMT pool. Changed all (not just 'fine' bitmaps) G1 usages of CHeapBitMap to specify mtGC. Had to add a pair of CHeapBitMap constructors to make test_bitMap.cpp happy. That doesn't feel right, but I don't know what else to do, except to rewrite the test.
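In outline, the first patch moves the interface in this direction - a simplified sketch, not the actual diff:

    // Before: the NMT category was baked in at compile time, roughly
    //   template <class E, MEMFLAGS F> class ArrayAllocator { ... };
    // After: the category is a runtime argument, so callers that only
    // know their MEMFLAGS value at run time can pass it through.
    template <class E>
    class ArrayAllocator : public AllStatic {
    public:
      static E* allocate(size_t length, MEMFLAGS flags);
      static void free(E* addr, size_t length);
    };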
Btw, what are your plans for C++ >= 11? For example, I could use delegating constructors here, or std::is_same.

From kim.barrett at oracle.com Fri Jun 9 01:04:33 2017
From: kim.barrett at oracle.com (Kim Barrett)
Date: Thu, 8 Jun 2017 21:04:33 -0400
Subject: [PATCH] JDK-8176571: Fine bitmaps should be allocated as belonging to mtGC
In-Reply-To: <1496926647107.27811@infobip.com>
References: <1495365159435.54025@infobip.com> <1495434908161.70459@infobip.com> <1495466592.2573.76.camel@oracle.com> <1495734990075.28893@infobip.com> <6110539E-B4E2-475D-8199-7C30449960D2@oracle.com> <1496926647107.27811@infobip.com>
Message-ID:

> On Jun 8, 2017, at 8:57 AM, Milan Mimica wrote:
>
> Milan Mimica, Senior Software Engineer / Division Lead
>> From: Kim Barrett
>> Sent: Saturday, May 27, 2017 01:00
>>
>> We appreciate your interest in working on this and are happy to help, but we really do need to check off this process item before going any deeper.
>
> Filled out and sent.

Thanks.

>> Yes, some refactoring seems required in order to properly fix JDK-8176571. That's what I meant by:
>>
>> On May 22, 2017, at 4:41 PM, Kim Barrett wrote:
>>> There doesn't seem to be a good path into the functionality provided by ArrayAllocator<> that has such a runtime MEMFLAGS value [...] The lack of a runtime-only path for propagating that information would need to be added [sic. fixed].
>
> Okay. There are two patches.

Thanks for splitting this into a refactoring followed by the use of that refactoring. That should make reviewing and discussion easier. I'm looking at the refactoring part, but am running out of time today.

We should probably have a new RFE for the refactoring. I'll take care of that tomorrow.

Note that the refactoring patch doesn't apply cleanly to jdk10/hs tip. There's a merge conflict with the fix for JDK-8168467 (resolved 2017/03/15). After dealing with that, the refactoring looks good to me on an initial pass. I want to take a more careful look tomorrow.

You'll need an Oracle sponsor, since you are not (yet) a committer, and also because these changes affect hotspot, so they need to eventually be pushed via jprt. I can be the sponsor for the refactoring.

What testing has been done? And are there any tests you can point to that are directly affected? (I already know about TestArrayAllocatorMallocLimit.java.) I'll probably want to run some tests using our internal test facilities as part of sponsoring.

I haven't looked at the second patch at all yet.

> - refactor_array_allocator.diff
> Pass MEMFLAGS down to the concrete allocator via the call stack instead of using a template parameter. This does feel right, especially because some methods don't even use MEMFLAGS.
>
> - heapBitMap_nmt.diff
> Changed CHeapBitMap to use a configurable NMT pool. Changed all (not just 'fine' bitmaps) G1 usages of CHeapBitMap to specify mtGC. Had to add a pair of CHeapBitMap constructors to make test_bitMap.cpp happy. That doesn't feel right, but I don't know what else to do, except to rewrite the test.
>
> Btw, what are your plans for C++ >= 11? For example I could use delegating constructors here, or std::is_same.

I think that's unlikely to happen soon, though there is interest. But there's also a fair amount of work involved, which needs to be balanced against other tasks. I think going beyond C++11 isn't even feasible right now, as some of the relevant compilers don't yet have a version that supports C++14.

There's in-progress work to add some metaprogramming utilities, including IsSame, as we keep encountering places where such things would be useful. I'm expecting that to show up pretty soon.

From kirk at kodewerk.com Fri Jun 9 15:17:30 2017
From: kirk at kodewerk.com (Kirk Pepperdine)
Date: Fri, 9 Jun 2017 17:17:30 +0200
Subject: Parallel reference processing
In-Reply-To: <70472822-8015-cd9f-97ec-40d377f40aef@oracle.com>
References: <2C20D7DA-2E0C-44B4-B5CB-54201A670279@kodewerk.com> <2df56a2d-b219-9b84-0e84-967953f122dc@oracle.com> <05FF4DD7-5C2A-4061-AD55-E7AB30FAE1C1@kodewerk.com> <440ac23c-6f34-51a5-9bf6-c62696dc2d73@oracle.com> <7C8B47AA-21B2-4253-90F5-96EAE8CA4574@kodewerk.com> <73255ea9-6ed2-2a66-f3cb-4ac4325af070@oracle.com> <70472822-8015-cd9f-97ec-40d377f40aef@oracle.com>
Message-ID: <50ABF569-87D4-4A1A-9619-935D54880062@kodewerk.com>

> I was trying to say that the MTness of reference processing is mostly affected by the total number of references; mentioning the benchmark name just added noise.
> JDK-8043575 is mostly about dynamically choosing MTness for reference processing. Focusing on the switch (turning the option on/off), any application that shows the relevant aspects (many references, limited references, etc.) seems okay to me. In my case, as SPECjvm2008-Derby has many final references (over 12k), my prototype should show almost the same result as "baseline, +ParallelRefProcEnabled". And for the other sub-tests of SPECjvm2008, my prototype should show almost the same as "baseline, -ParallelRefProcEnabled", because with limited references single thread shows better results.

Right, and this is an issue in that I often see a range of 100-250K references processed.

Kind regards,
Kirk

From sangheon.kim at oracle.com Fri Jun 9 23:57:54 2017
From: sangheon.kim at oracle.com (sangheon)
Date: Fri, 9 Jun 2017 16:57:54 -0700
Subject: RFR(S): 8173335: Improve logging for j.l.ref.reference processing
Message-ID:

Hi all,

Can I have some reviews for changes to improve logging for j.l.ref.Reference processing?

This patch is proposing to add logs for balance queue time, phase 1~3 times of reference processing, and enqueue time for discovered references at debug level, and worker distribution stats at trace level. It also includes trace events for those cases.

The log will be changed like below:

* Before
[debug][gc,ref] GC(9) SoftReference 0.581ms
[debug][gc,ref] GC(9) WeakReference 1.066ms
[debug][gc,ref] GC(9) FinalReference 0.376ms
[debug][gc,ref] GC(9) PhantomReference 0.468ms
[debug][gc,ref] GC(9) JNI Weak Reference 0.005ms
[debug][gc,ref] GC(9) Ref Counts: Soft: 0 Weak: 0 Final: 0 Phantom: 0

* After
[debug][gc,ref] GC(5) SoftReference 0.895ms
[debug][gc,ref] GC(5) Balance queues: 0.001ms
[debug][gc,ref] GC(5) Phase1: 0.456ms
[trace][gc,ref] GC(5) Process lists (ms) Min: 0.0, Avg: 0.3, Max: 0.3, Diff: 0.3, Sum: 5.8, Workers: 23
[debug][gc,ref] GC(5) Phase2: 0.059ms
[trace][gc,ref] GC(5) Process lists (ms) Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0, Workers: 23
[debug][gc,ref] GC(5) Phase3: 0.374ms
[trace][gc,ref] GC(5) Process lists (ms) Min: 0.0, Avg: 0.2, Max: 0.3, Diff: 0.3, Sum: 4.0, Workers: 23
[debug][gc,ref] GC(5) Cleared: 0
[debug][gc,ref] GC(5) Discovered: 0
...
[debug][gc,ref] GC(5) JNI Weak Reference 0.003ms
[debug][gc,ref] GC(5) Enqueue reference lists 0.081ms
[debug][gc,ref] GC(5) Counts: Soft: 0 Weak: 0 Final: 0 Phantom: 0

CR: https://bugs.openjdk.java.net/browse/JDK-8173335
webrev: http://cr.openjdk.java.net/~sangheki/8173335/webrev.0

Testing: JPRT and local tests with combinations of +/-ParallelRefProcEnabled and GC types.

Thanks,
Sangheon

From Milan.Mimica at infobip.com Sat Jun 10 15:07:29 2017
From: Milan.Mimica at infobip.com (Milan Mimica)
Date: Sat, 10 Jun 2017 15:07:29 +0000
Subject: [PATCH] JDK-8176571: Fine bitmaps should be allocated as belonging to mtGC
In-Reply-To: <...>
References: <1495365159435.54025@infobip.com> <1495434908161.70459@infobip.com> <1495466592.2573.76.camel@oracle.com> <1495734990075.28893@infobip.com> <6110539E-B4E2-475D-8199-7C30449960D2@oracle.com> <1496926647107.27811@infobip.com>
Message-ID: <1497107249632.26184@infobip.com>

Milan Mimica, Senior Software Engineer / Division Lead

> From: Kim Barrett
> Sent: Friday, June 9, 2017 03:04
>
> Note that the refactoring patch doesn't apply cleanly to jdk10/hs tip. There's a merge conflict with the fix for JDK-8168467 (resolved 2017/03/15).

Oh, I've developed it against jdk10/jdk10. Care to explain the difference a bit? Attached are patches against jdk10/hs.

> What testing has been done? And are there any tests you can point to that are directly affected? (I already know about TestArrayAllocatorMallocLimit.java.) I'll probably want to run some tests using our internal test facilities as part of sponsoring.

I have run jtreg on my laptop. There are some failures actually, but they happen on a clean repo as well. I'm not aware of anything else.

From thomas.schatzl at oracle.com Mon Jun 12 11:34:38 2017
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Mon, 12 Jun 2017 13:34:38 +0200
Subject: RFR (7xS): 8178148: Log more detailed information about scan rs phase
In-Reply-To: <04ab36fb-afaf-0ed9-6480-b955474d4bee@oracle.com>
References: <1491910205.2754.31.camel@oracle.com> <1493986396.2777.61.camel@oracle.com> <41d0b773-06fd-dfef-beda-2d62797210d9@oracle.com> <1494849002.2707.20.camel@oracle.com> <1495543832.2781.37.camel@oracle.com> <04ab36fb-afaf-0ed9-6480-b955474d4bee@oracle.com>
Message-ID: <1497267278.2777.2.camel@oracle.com>

Hi all,

sorry for another round of reviews: Erik asked me to add a gtest test for the linked subitems, both for the (existing) set() and (new) add() methods.
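The new test is roughly of this flavor - a sketch only; the actual gtest is in the webrev below:

    // set() overwrites a worker's value, add() accumulates into it.
    TEST(WorkerDataArray, set_and_add) {
      WorkerDataArray<size_t> data(2, "Test (ms):");
      data.set(0, 5);
      data.add(0, 3);
      ASSERT_EQ((size_t)8, data.get(0));
    }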
Webrev: http://cr.openjdk.java.net/~tschatzl/8178148/webrev.2_to_3/ (diff), http://cr.openjdk.java.net/~tschatzl/8178148/webrev.3/ (full)

Testing:
local testing, jprt

Thanks,
  Thomas

On Tue, 2017-05-23 at 12:15 -0700, sangheon wrote:
> Hi Thomas,
>
> On 05/23/2017 05:50 AM, Thomas Schatzl wrote:
>> Hi all,
>>
>> unfortunately, to support some code there is need for one more public method in the g1gcphasetimes class.
>>
>> Webrev:
>> http://cr.openjdk.java.net/~tschatzl/8178148/webrev.1_to_2/ (diff)
>> http://cr.openjdk.java.net/~tschatzl/8178148/webrev.2/ (full)
> Webrev.2 still looks good to me.
>
> Thanks,
> Sangheon
>
>> Sorry for the issue.
>>
>> Thanks,
>>   Thomas

From shade at redhat.com Mon Jun 12 16:06:43 2017
From: shade at redhat.com (Aleksey Shipilev)
Date: Mon, 12 Jun 2017 18:06:43 +0200
Subject: RFR(S): 8173335: Improve logging for j.l.ref.reference processing
In-Reply-To:
References:
Message-ID:

On 06/10/2017 01:57 AM, sangheon wrote:
> CR: https://bugs.openjdk.java.net/browse/JDK-8173335
> webrev: http://cr.openjdk.java.net/~sangheki/8173335/webrev.0

Oh, good! I had to instrument these by hand when optimizing RP paths.

Comments after a brief look:

*) So, the path with a NULL executor is also not handling the timer? E.g. CMS:

  5262   if (rp->processing_is_mt()) {
  5263     rp->balance_all_queues();
  5264     CMSRefProcTaskExecutor task_executor(*this);
  5265     rp->enqueue_discovered_references(&task_executor, _gc_timer_cm);
  5266   } else {
  5267     rp->enqueue_discovered_references(NULL);
  5268   }

*) I would leave the "Ref Counts" line as usual for compatibility reasons. Changing it to "Counts" would force GC log parsers to handle that corner case too.

*) This may reuse Indents?

    95       out->print("%s", "    ");

*) Probably makes sense to "hg mv -A" the workerDataArray files to preserve the Mercurial history -- webrev should say something like "copied from ...", IIRC.

Thanks,
-Aleksey

From sangheon.kim at oracle.com Tue Jun 13 00:13:21 2017
From: sangheon.kim at oracle.com (sangheon)
Date: Mon, 12 Jun 2017 17:13:21 -0700
Subject: RFR(S): 8173335: Improve logging for j.l.ref.reference processing
In-Reply-To:
References:
Message-ID:

Hi Aleksey,

Thanks for the review.

On 06/12/2017 09:06 AM, Aleksey Shipilev wrote:
> On 06/10/2017 01:57 AM, sangheon wrote:
>> CR: https://bugs.openjdk.java.net/browse/JDK-8173335
>> webrev: http://cr.openjdk.java.net/~sangheki/8173335/webrev.0
> Oh, good! I had to instrument these by hand when optimizing RP paths.
>
> Comments after a brief look:
>
> *) So, the path with a NULL executor is also not handling the timer? E.g. CMS:
>
>   5262   if (rp->processing_is_mt()) {
>   5263     rp->balance_all_queues();
>   5264     CMSRefProcTaskExecutor task_executor(*this);
>   5265     rp->enqueue_discovered_references(&task_executor, _gc_timer_cm);
>   5266   } else {
>   5267     rp->enqueue_discovered_references(NULL);
>   5268   }
Fixed to use timers for the similar cases that you pointed out. Thanks for catching this!
I started this CR as a part of MT ref. processing (JDK-8043575), so I only added it to that path. But this should be fixed.

> *) I would leave the "Ref Counts" line as usual for compatibility reasons. Changing it to "Counts" would force GC log parsers to handle that corner case too.
Changed, 'Counts -> Ref Counts'.

> *) This may reuse Indents?
>
>     95       out->print("%s", "    ");
Fixed to use Indents[2].

> *) Probably makes sense to "hg mv -A" the workerDataArray files to preserve the Mercurial history -- webrev should say something like "copied from ...", IIRC.
Fixed.
webrev:
http://cr.openjdk.java.net/~sangheki/8173335/webrev.1/
http://cr.openjdk.java.net/~sangheki/8173335/webrev.1_to_0

Thanks,
Sangheon

> Thanks,
> -Aleksey

From thomas.schatzl at oracle.com Tue Jun 13 09:36:06 2017
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Tue, 13 Jun 2017 11:36:06 +0200
Subject: RFR: 8177544: Restructure G1 Full GC code
In-Reply-To: <62d1f02b-1fc0-ffcf-b8e0-e88ebacecebe@oracle.com>
References: <62d1f02b-1fc0-ffcf-b8e0-e88ebacecebe@oracle.com>
Message-ID: <1497346566.2829.33.camel@oracle.com>

Hi,

thanks for your hard work on the parallel full gc that starts with this refactoring :)

On Thu, 2017-06-08 at 14:35 +0200, Stefan Johansson wrote:
> Hi,
>
> Please review this enhancement:
> https://bugs.openjdk.java.net/browse/JDK-8177544
>
> Webrev:
> http://cr.openjdk.java.net/~sjohanss/8177544/hotspot.00/
>
> Summary:
> This is more or less only code moving around. The function do_full_collection in G1CollectedHeap is very large; breaking it up into smaller parts and grouping together some of the stack objects helps readability.
>
> In addition to splitting the large function into smaller ones I've introduced two new classes:
> - G1FullGCScope, which groups most of the previously spread-out stack objects.
> - G1SerialCollector, which handles the interaction with G1MarkSweep.
>
> Making this change will simplify future changes to the full GC.
>
> Testing:
> * Locally run JTREG tests
> * RBT hotspot tier 2 & 3

Some initial thoughts on the change, mostly to start a discussion:

- G1FullGCScope class: please add a line describing what the purpose of the class is.

- A better name for G1CollectedHeap::reset_card_cache_and_queue() could be abort_refinement().

- G1CollectedHeap.cpp:1145: please remove the word "stale" in that comment. It confuses me because at that point "stale" cards are defined for a particular context that does not fit here.

- Can you move all the printing after collection (g1CollectedHeap.cpp:1239-1249) into an extra method too? Something like "print_heap_after_full_collection()"? (I think there is some argument for also having a print_heap_before_full_collection() method.)

- G1SerialCollector is actually a "G1SerialFullCollector". I do not remember whether the follow-up change removes it again anyway, but it seems to be a simple renaming.

- G1SerialCollector interface: while I could live with the prepare/do/complete naming of the methods, the typical sequence is (unfortunately) gc_prologue(), collect(), gc_epilogue().

- Previously, printing and verifying the heap has been outside the "Pause Full" GCTraceTime. I am okay with that.

- Could we put the code from g1CollectedHeap.cpp:1215-1232 into a "prepare_for_regular_collection" method?

- The order of the gc_epilogue() and g1_policy->record_full_collection_end() calls is different.

Actually, if it were up to me, I would put the whole full gc setup and teardown into a separate class/file, with public gc_prologue()/collect()/gc_epilogue() methods, where gc_prologue() is the first part of do_full_collection_inner() until the application of the G1SerialCollector, collect() is the instantiation and application of G1SerialCollector, and gc_epilogue() is the remainder. E.g. in G1CollectedHeap we would only have the calls to these three methods (there is no need to have all three). At least I think it would help a lot if all that full gc code were physically separate from the do-it-all G1CollectedHeap. With the G1FullGCScope there is almost no reference to G1CollectedHeap afaics.
(There is the _allocator->init_mutator_alloc_region() call; a rough sketch of this split follows below.)

- g1CollectedHeap.hpp: please try to sort the definitions of the new methods in the order in which they are called.
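Roughly, the split suggested above - a sketch of the idea only, not a concrete proposal:

    // Hypothetical: all full-gc setup/teardown in one place, with
    // G1CollectedHeap::do_full_collection() reduced to three calls.
    class G1SerialFullCollector : public StackObj {
    public:
      void gc_prologue();   // former head of do_full_collection_inner()
      void collect();       // create and apply the serial mark-sweep
      void gc_epilogue();   // the remainder: cleanup, resizing, reporting
    };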
*) I would leave "Ref Counts" line as usual for compatibility > > reasons. Changing > > it to "Counts" would force GC log parsers to handle that corner > > case too. > Changed, 'Counts -> Ref Counts'. > > > > > > ? *) This may reuse Indents? > > > > ????95???????out->print("%s", "????"); > Fixed to use Indents[2]. > > > > > > > ? *) Probably makes sense to "hg mv -A" the workerDataArray files > > to preserve the > > Mercurial history -- webrev should say something like "copied from > > ...", IIRC. > Fixed. > > webrev: > http://cr.openjdk.java.net/~sangheki/8173335/webrev.1/ > http://cr.openjdk.java.net/~sangheki/8173335/webrev.1_to_0 > > Thanks, > Sangheon > > > > > > > > Thanks, > > -Aleksey > > From erik.helin at oracle.com Tue Jun 13 12:21:22 2017 From: erik.helin at oracle.com (Erik Helin) Date: Tue, 13 Jun 2017 14:21:22 +0200 Subject: RFR (7xS): 8178148: Log more detailed information about scan rs phase In-Reply-To: <1497267278.2777.2.camel@oracle.com> References: <1491910205.2754.31.camel@oracle.com> <1493986396.2777.61.camel@oracle.com> <41d0b773-06fd-dfef-beda-2d62797210d9@oracle.com> <1494849002.2707.20.camel@oracle.com> <1495543832.2781.37.camel@oracle.com> <04ab36fb-afaf-0ed9-6480-b955474d4bee@oracle.com> <1497267278.2777.2.camel@oracle.com> Message-ID: On 06/12/2017 01:34 PM, Thomas Schatzl wrote: > Hi all, > > sorry for another round of reviews: Erik asked me to add a gtest test > for the linked subitems, both for the (existing) set() and (new) add() > methods. > > Webrev: http://cr.openjdk.java.net/~tschatzl/8178148/webrev.2_to_3/ (di > ff): http://cr.openjdk.java.net/~tschatzl/8178148/webrev.3/ (full) > > Testing: > local testing, jprt Looks good, Reviewed. Thanks, Erik > Thanks, > Thomas > > On Tue, 2017-05-23 at 12:15 -0700, sangheon wrote: >> Hi Thomas, >> >> On 05/23/2017 05:50 AM, Thomas Schatzl wrote: >>> >>> Hi all, >>> >>> unfortunately, for support of some code there is need for one >>> more >>> public method in the g1gcphasetimes class. >>> >>> Webrev: >>> http://cr.openjdk.java.net/~tschatzl/8178148/webrev.1_to_2/ (diff) >>> http://cr.openjdk.java.net/~tschatzl/8178148/webrev.2/ (full) >> Webrev.2 still looks good to me. >> >> Thanks, >> Sangheon >> >> >>> >>> >>> Sorry for the issue. >>> >>> Thanks, >>> Thomas >>> From SL at elp-consult.co.uk Tue Jun 13 16:35:36 2017 From: SL at elp-consult.co.uk (Shi Lu) Date: Tue, 13 Jun 2017 16:35:36 +0000 Subject: JVM expert lead opportunity in Bay Area In-Reply-To: References: Message-ID: Hello there, This is Jay from ELP Consult Ltd, greetings from London! I am a global strategic hiring partner with Alibaba Group, and would like to take this opportunity to introduce a great opportunity here: Alibaba Chief Technology Officer recently announced a re-structuring plan for Alibaba CTO group, separate the AIS business group to numbers of different divisions, but still work together to create a globally competitive combined hardware and software infrastructure for Alibaba eco-system. Set up system software division by the platform architecture team, JVM, Linux Kernel OS team together. therefore they need high-end technical Leader with strong JVM OpenJDK experience especially on GC to build up a new team in US and work together with the existing team in China HQ Hangzhou. you will be responsible the development of new technologies, be able to lead a team to conduct in-depth research and innovation. 
as one of world's largest users of Java, Alibaba will provide you with the extreme technical challenges, which you will never find it from anywhere else. If you are interested to explore more, please do not hesitate to reply me in order to have a confidential chat, I am looking forward to hear from you soon, thank you. ????????????????????? Kind Regards/Mit freundlichen Gr?ssen/???? Jay Lu Tel: +44 208 8996136 Mobile: +44 7917405668 ELP Consult Ltd. ???? [ELP Logo] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.jpg Type: image/jpeg Size: 2670 bytes Desc: image001.jpg URL: From sangheon.kim at oracle.com Tue Jun 13 21:21:29 2017 From: sangheon.kim at oracle.com (sangheon) Date: Tue, 13 Jun 2017 14:21:29 -0700 Subject: RFR(S): 8173335: Improve logging for j.l.ref.reference processing In-Reply-To: <1497352882.2829.65.camel@oracle.com> References: <1497352882.2829.65.camel@oracle.com> Message-ID: <054513b1-4ff2-6656-fa3a-9c6e6736c32f@oracle.com> Hi Thomas, Thank you for reviewing this. On 06/13/2017 04:21 AM, Thomas Schatzl wrote: > Hi Sangheon, > > > On Mon, 2017-06-12 at 17:13 -0700, sangheon wrote: >> Hi Aleksey, >> >> Thanks for the review. >> >> On 06/12/2017 09:06 AM, Aleksey Shipilev wrote: >>> On 06/10/2017 01:57 AM, sangheon wrote: >>>> CR: https://bugs.openjdk.java.net/browse/JDK-8173335 >>>> webrev: http://cr.openjdk.java.net/~sangheki/8173335/webrev.0 > - There should be a destructor in ReferenceProcessor cleaning up the > dynamically allocated memory. Thomas and I had some discussion about this and agreed to file a separate CR for freeing issue. I noticed that there's no destructor when I wrote this, but this is how we usually implement. However as this seems incorrect, I will add a destructor for newly added class but it will not be used in this patch. It will be used in the following CR( https://bugs.openjdk.java.net/browse/JDK-8182120 ) which fixes not-freeing issue in ReferenceProcessor. FYI, ReferenceProcessor has heap allocated members of ReferencePolicy(and its friends) but it is not freed too. So instead of extending this patch, I propose to separate this freeing issue. > > - the change should move gc+ref output to something else: there is so > much additional junk printed with gc+ref=trace so that the phase > logging is drowned out with real trace information and unusable for > regular consumption. Okay, I will add it. But I asked introducing 'gc+ref+phases' before but you didn't like it. :) Probably I didn't provide much details?! > > Also I would prefer to have this detailed log output interspersed > within the (existing) gc+phases output. Like under the "Reference > Processing" and "Reference Enqueuing" sections for G1 in particular. Frankly speaking, I'm not much interested now. When I started investigating this CR, you mentioned about this too. (But you were okay for either way. i.e. current one or interspersing into G1 logging. :) ) I also tried in that way(interspersing one) and my feeling is that I don't see much benefit to have ref logs in G1 phases section. It looks better organized but it doesn't mean current log style is worse. Ref. logs are printed separately for long time and other shared codes also print logs immediately. On the other hand, current implementation(re-use and print immediately) seems simpler to implement. 
In addition, ReferenceProcessor::process_discovered_reflist() is repeatedly called for different type of References so re-using log printer seems natural to me. :) > > Maybe with gc+phases+ref=debug/trace so that "everything" could be > enabled using "gc+phases*=debug/trace"? Yes, good idea. > > I can see that the code throws away the previous information about > reference processing after every use (the phasetimes reused). This is > does not allow printing of the data at convenient times and places. > > I.e. I would prefer if the data were aggregated (not only from one > particular phase) and later printed together. If we don't intersperse with existing G1 log, do you still think printing later is needed? Probably printing after Phanthom Ref. processed or different location? > > I kind of disagree with Aleksey about need for backwards compatibility > of log messages. This is such a big breaking change in the amount of > information shown that existing users will want to adapt their log > readers anyway. True that log parsers already should be updated, but I understood Aleksey's comment something like preferring 'Ref Counts' instead of 'Counts'. > As mentioned, due to real trace code here, gc+ref=trace is unusable. FYI, probably you tested with fastdebug because there are many debug/trace logs for debug build. It doesn't bother from product build actually. But as I said, I will change current new logs' channel from 'gc+ref' to 'gc+phases+ref'. > > We could still provide minimal backwards compatible output under > gc+ref=debug if needed. I'm don't see much value on this. gc+phases+ref seems better. > > - I would prefer if resetting the reference phase times logger wouldn't > be kind of an afterthought of printing :) > > Also it might be useful to keep the data around for somewhat longer > (not throw it away after every phase). Don't we need the data for > further analysis? I don't have strong opinion on this. I didn't consider keeping log data for further analysis. This could a minor reason for supporting keeping log data longer but I think interspersing with existing G1 log would be the main reason of keeping it. > > This would also allow printing it later using different log tags (with > different formatting). > > - I like the split of phasetimes into data storage and printing. I do > not like that basically the timing data is created twice, once for the > phasetimes, once for the GCTimer (for JFR basically). No, currently timing data is created once and used for both phase log and GCTimer. Or am I missing something? So in summary, mostly I agree with your comments except below 2: 1. Interspersing with G1 log. 2. Keeping log data longer. (This should be done if we go with interspersing idea) Let me post updated webrev, after making all decision. Thanks, Sangheon > Or the gctimer is > passed everywhere. But that is another issue I guess. > > Thanks, > Thomas > > >>> Oh, good! I had to instrument these by hand when optimizing RP >>> paths. >>> >>> Comments after brief look: >>> >>> *) So, the path with NULL executor are also not handling the >>> timer? E.g. CMS: >>> >>> 5262 if (rp->processing_is_mt()) { >>> 5263 rp->balance_all_queues(); >>> 5264 CMSRefProcTaskExecutor task_executor(*this); >>> 5265 rp->enqueue_discovered_references(&task_executor, >>> _gc_timer_cm); >>> 5266 } else { >>> 5267 rp->enqueue_discovered_references(NULL); >>> 5268 } >> Fixed to use timers for similar cases that you pointed. Thanks for >> catching up this! >> I started this CR as a part of MT ref. 
>> processing (JDK-8043575), so I only added it to that path. But this should be fixed.
>>> *) I would leave the "Ref Counts" line as usual for compatibility reasons. Changing it to "Counts" would force GC log parsers to handle that corner case too.
>> Changed, 'Counts -> Ref Counts'.
>>>
>>> *) This may reuse Indents?
>>>
>>>     95       out->print("%s", "    ");
>> Fixed to use Indents[2].
>>
>>> *) Probably makes sense to "hg mv -A" the workerDataArray files to preserve the Mercurial history -- webrev should say something like "copied from ...", IIRC.
>> Fixed.
>>
>> webrev:
>> http://cr.openjdk.java.net/~sangheki/8173335/webrev.1/
>> http://cr.openjdk.java.net/~sangheki/8173335/webrev.1_to_0
>>
>> Thanks,
>> Sangheon
>>
>>> Thanks,
>>> -Aleksey

From sangheon.kim at oracle.com Tue Jun 13 21:29:06 2017
From: sangheon.kim at oracle.com (sangheon)
Date: Tue, 13 Jun 2017 14:29:06 -0700
Subject: RFR (7xS): 8178148: Log more detailed information about scan rs phase
In-Reply-To:
References: <1491910205.2754.31.camel@oracle.com> <1493986396.2777.61.camel@oracle.com> <41d0b773-06fd-dfef-beda-2d62797210d9@oracle.com> <1494849002.2707.20.camel@oracle.com> <1495543832.2781.37.camel@oracle.com> <04ab36fb-afaf-0ed9-6480-b955474d4bee@oracle.com> <1497267278.2777.2.camel@oracle.com>
Message-ID: <78237c97-13b7-505e-35d0-acf9a58fa17f@oracle.com>

Hi Thomas,

On 06/13/2017 05:21 AM, Erik Helin wrote:
> On 06/12/2017 01:34 PM, Thomas Schatzl wrote:
>> Hi all,
>>
>> sorry for another round of reviews: Erik asked me to add a gtest test for the linked subitems, both for the (existing) set() and (new) add() methods.
>>
>> Webrev: http://cr.openjdk.java.net/~tschatzl/8178148/webrev.2_to_3/ (diff), http://cr.openjdk.java.net/~tschatzl/8178148/webrev.3/ (full)
>>
>> Testing:
>> local testing, jprt
>
> Looks good, Reviewed.
Looks good to me too.

Thanks,
Sangheon

> Thanks,
> Erik
>
>> Thanks,
>>   Thomas
>>
>> On Tue, 2017-05-23 at 12:15 -0700, sangheon wrote:
>>> Hi Thomas,
>>>
>>> On 05/23/2017 05:50 AM, Thomas Schatzl wrote:
>>>> Hi all,
>>>>
>>>> unfortunately, to support some code there is need for one more public method in the g1gcphasetimes class.
>>>>
>>>> Webrev:
>>>> http://cr.openjdk.java.net/~tschatzl/8178148/webrev.1_to_2/ (diff)
>>>> http://cr.openjdk.java.net/~tschatzl/8178148/webrev.2/ (full)
>>> Webrev.2 still looks good to me.
>>>
>>> Thanks,
>>> Sangheon
>>>
>>>> Sorry for the issue.
>>>>
>>>> Thanks,
>>>>   Thomas

From yasuenag at gmail.com Wed Jun 14 04:22:56 2017
From: yasuenag at gmail.com (Yasumasa Suenaga)
Date: Wed, 14 Jun 2017 13:22:56 +0900
Subject: JDK-8153333: [REDO] STW phases at Concurrent GC should count in PerfCounter
Message-ID:

Hi all,

I changed the PerfCounters to show CGC STW phases in jstat in JDK-8151674. However, it caused several jtreg test failures, so it was backed out.

I want to resume work on this issue.

http://cr.openjdk.java.net/~ysuenaga/JDK-8153333/webrev.03/hotspot/
http://cr.openjdk.java.net/~ysuenaga/JDK-8153333/webrev.03/jdk/

These changes pass the jtreg tests below:

hotspot/test/serviceability/tmtools/jstat
jdk/test/sun/tools

Since JDK 9 the default GC algorithm is G1, so I think this change is useful for watching GC behavior through jstat.
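For example (with <pid> a placeholder for the target VM's process id):

    $ jstat -gcutil <pid> 1000

The CGC and CGCT columns of the JDK 9 output (concurrent GC events and their time) are the counters this change feeds.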
I cannot access JPRT. Could you help?

Thanks,
Yasumasa

From sangheon.kim at oracle.com Wed Jun 14 07:52:55 2017
From: sangheon.kim at oracle.com (sangheon)
Date: Wed, 14 Jun 2017 00:52:55 -0700
Subject: RFR(S): 8173335: Improve logging for j.l.ref.reference processing
In-Reply-To: <054513b1-4ff2-6656-fa3a-9c6e6736c32f@oracle.com>
References: <1497352882.2829.65.camel@oracle.com> <054513b1-4ff2-6656-fa3a-9c6e6736c32f@oracle.com>
Message-ID: <0353babf-03eb-21cd-b286-9b0149dfb718@oracle.com>

Hi Thomas again,

On 06/13/2017 02:21 PM, sangheon wrote:
> Hi Thomas,
>
> Thank you for reviewing this.
>
> On 06/13/2017 04:21 AM, Thomas Schatzl wrote:
>> Hi Sangheon,
>>
>> On Mon, 2017-06-12 at 17:13 -0700, sangheon wrote:
>>> Hi Aleksey,
>>>
>>> Thanks for the review.
>>>
>>> On 06/12/2017 09:06 AM, Aleksey Shipilev wrote:
>>>> On 06/10/2017 01:57 AM, sangheon wrote:
>>>>> CR: https://bugs.openjdk.java.net/browse/JDK-8173335
>>>>> webrev: http://cr.openjdk.java.net/~sangheki/8173335/webrev.0
>> - There should be a destructor in ReferenceProcessor cleaning up the dynamically allocated memory.
> Thomas and I had some discussion about this and agreed to file a separate CR for the freeing issue.
>
> I noticed that there was no destructor when I wrote this, but this is how we usually implement it. However, as this seems incorrect, I will add a destructor for the newly added class, but it will not be used in this patch. It will be used in the following CR (https://bugs.openjdk.java.net/browse/JDK-8182120), which fixes the not-freeing issue in ReferenceProcessor. FYI, ReferenceProcessor has heap-allocated members of ReferencePolicy (and its friends) which are not freed either. So instead of extending this patch, I propose to handle this freeing issue separately.
>
>> - The change should move the gc+ref output to something else: there is so much additional junk printed with gc+ref=trace that the phase logging is drowned out by real trace information and unusable for regular consumption.
> Okay, I will add it. But I asked about introducing 'gc+ref+phases' before and you didn't like it. :) Probably I didn't provide enough details?!
>
>> Also I would prefer to have this detailed log output interspersed within the (existing) gc+phases output - like under the "Reference Processing" and "Reference Enqueuing" sections for G1 in particular.
> Frankly speaking, I'm not much interested in that now. When I started investigating this CR, you mentioned this too. (But you were okay with either way, i.e. the current one or interspersing into the G1 logging. :) ) I also tried it that way (the interspersing one), and my feeling is that I don't see much benefit in having the ref logs in the G1 phases section. It looks better organized, but that doesn't mean the current log style is worse. Ref. logs have been printed separately for a long time, and other shared code also prints logs immediately.
>
> On the other hand, the current implementation (re-use and print immediately) seems simpler to implement. In addition, ReferenceProcessor::process_discovered_reflist() is called repeatedly for the different types of References, so re-using the log printer seems natural to me. :)
>
>> Maybe with gc+phases+ref=debug/trace, so that "everything" could be enabled using "gc+phases*=debug/trace"?
> Yes, good idea.
>
>> I can see that the code throws away the previous information about reference processing after every use (the phasetimes are reused). This does not allow printing of the data at convenient times and places.
>>
>> I.e.
I would prefer if the data were aggregated (not only from one >> particular phase) and later printed together. > If we don't intersperse with existing G1 log, do you still think > printing later is needed? > Probably printing after Phanthom Ref. processed or different location? > >> >> I kind of disagree with Aleksey about need for backwards compatibility >> of log messages. This is such a big breaking change in the amount of >> information shown that existing users will want to adapt their log >> readers anyway. > True that log parsers already should be updated, but I understood > Aleksey's comment something like preferring 'Ref Counts' instead of > 'Counts'. > >> As mentioned, due to real trace code here, gc+ref=trace is unusable. > FYI, probably you tested with fastdebug because there are many > debug/trace logs for debug build. It doesn't bother from product build > actually. > But as I said, I will change current new logs' channel from 'gc+ref' > to 'gc+phases+ref'. > >> >> We could still provide minimal backwards compatible output under >> gc+ref=debug if needed. > I'm don't see much value on this. > gc+phases+ref seems better. > >> >> - I would prefer if resetting the reference phase times logger wouldn't >> be kind of an afterthought of printing :) >> >> Also it might be useful to keep the data around for somewhat longer >> (not throw it away after every phase). Don't we need the data for >> further analysis? > I don't have strong opinion on this. > > I didn't consider keeping log data for further analysis. This could a > minor reason for supporting keeping log data longer but I think > interspersing with existing G1 log would be the main reason of keeping > it. > >> >> This would also allow printing it later using different log tags (with >> different formatting). >> >> - I like the split of phasetimes into data storage and printing. I do >> not like that basically the timing data is created twice, once for the >> phasetimes, once for the GCTimer (for JFR basically). > No, currently timing data is created once and used for both phase log > and GCTimer. > Or am I missing something? > > So in summary, mostly I agree with your comments except below 2: > 1. Interspersing with G1 log. > 2. Keeping log data longer. (This should be done if we go with > interspersing idea) I started working on above 2 items. :) I will update webrev when I'm ready. Thanks, Sangheon > > Let me post updated webrev, after making all decision. > > Thanks, > Sangheon > > >> Or the gctimer is >> passed everywhere. But that is another issue I guess. >> >> Thanks, >> Thomas >> >> >>>> Oh, good! I had to instrument these by hand when optimizing RP >>>> paths. >>>> >>>> Comments after brief look: >>>> >>>> *) So, the path with NULL executor are also not handling the >>>> timer? E.g. CMS: >>>> >>>> 5262 if (rp->processing_is_mt()) { >>>> 5263 rp->balance_all_queues(); >>>> 5264 CMSRefProcTaskExecutor task_executor(*this); >>>> 5265 rp->enqueue_discovered_references(&task_executor, >>>> _gc_timer_cm); >>>> 5266 } else { >>>> 5267 rp->enqueue_discovered_references(NULL); >>>> 5268 } >>> Fixed to use timers for similar cases that you pointed. Thanks for >>> catching up this! >>> I started this CR as a part of MT ref. processing(JDK-8043575), so I >>> only added to that path. But this should be fixed. >>>> >>>> *) I would leave "Ref Counts" line as usual for compatibility >>>> reasons. Changing >>>> it to "Counts" would force GC log parsers to handle that corner >>>> case too. >>> Changed, 'Counts -> Ref Counts'. 
>>>> >>>> *) This may reuse Indents? >>>> >>>> 95 out->print("%s", " "); >>> Fixed to use Indents[2]. >>> >>>> >>>> *) Probably makes sense to "hg mv -A" the workerDataArray files >>>> to preserve the >>>> Mercurial history -- webrev should say something like "copied from >>>> ...", IIRC. >>> Fixed. >>> >>> webrev: >>> http://cr.openjdk.java.net/~sangheki/8173335/webrev.1/ >>> http://cr.openjdk.java.net/~sangheki/8173335/webrev.1_to_0 >>> >>> Thanks, >>> Sangheon >>> >>> >>>> >>>> Thanks, >>>> -Aleksey >>>> > From thomas.schatzl at oracle.com Wed Jun 14 12:58:57 2017 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Wed, 14 Jun 2017 14:58:57 +0200 Subject: [PATCH] JDK-8176571: Fine bitmaps should be allocated as belonging to mtGC In-Reply-To: <1497107249632.26184@infobip.com> References: <1495365159435.54025@infobip.com> <1495434908161.70459@infobip.com> <1495466592.2573.76.camel@oracle.com> <1495734990075.28893@infobip.com> <6110539E-B4E2-475D-8199-7C30449960D2@oracle.com> <1496926647107.27811@infobip.com> , <1497107249632.26184@infobip.com> Message-ID: <1497445137.2785.7.camel@oracle.com> Hi, On Sat, 2017-06-10 at 15:07 +0000, Milan Mimica wrote: > Milan Mimica, Senior Software Engineer / Division Lead > > > > From: Kim Barrett > > Sent: Friday, June 9, 2017 03:04 > > > > Note that the refactoring patch doesn't apply cleanly to jdk10/hs > > tip. > > There's a merge conflict with the fix for JDK-8168467 (resolved > > 2017/03/15). > Oh, I 've developed it against jdk10/jdk10. Care to explain a bit the > difference? Attached are patches against jdk10/hs. jdk10/hs is the current development tree where development happens and new changes pushed into. hs10/hs10 is the tree where public builds are made from. Changes are regularly (at least in the typical case) merged to jdk10/jdk10 after some additional regression testing. > > What testing has been done???And are there any tests you can point > > to > > that are directly affected???(I already know about > > TestArrayAllocatorMallocLimit.java.)??I'll probably want to run > > some > > tests using our internal test facilities as part of sponsoring. > I have run jtreg on my laptop. There are some failures actually, > but happens also on clean repo. I'm not aware of anything else. > I created?https://bugs.openjdk.java.net/browse/JDK-8182169?for the ArrayAllocator refactoring as I could not find an existing issue. I uploaded a webrev of that change to http://cr.openjdk.java.net/~tschatzl/8182169/webrev/ Looks good to me. I also uploaded a webrev for JDK-8176571 based on the above to http://cr.openjdk.java.net/~tschatzl/8176571/webrev Looks good to me too, but is there a reason to not use default parameters for the CHeapBitmap constructors? As for testing I am going to move it through JPRT (our build and test system). Thanks, ? 
Thomas From Milan.Mimica at infobip.com Wed Jun 14 13:59:14 2017 From: Milan.Mimica at infobip.com (Milan Mimica) Date: Wed, 14 Jun 2017 13:59:14 +0000 Subject: [PATCH] JDK-8176571: Fine bitmaps should be allocated as belonging to mtGC In-Reply-To: <1497445137.2785.7.camel@oracle.com> References: <1495365159435.54025@infobip.com> <1495434908161.70459@infobip.com> <1495466592.2573.76.camel@oracle.com> <1495734990075.28893@infobip.com> <6110539E-B4E2-475D-8199-7C30449960D2@oracle.com> <1496926647107.27811@infobip.com> , <1497107249632.26184@infobip.com>,<1497445137.2785.7.camel@oracle.com> Message-ID: <1497448754310.80745@infobip.com> Hi Milan Mimica, Senior Software Engineer / Division Lead > From: Thomas Schatzl > Sent: Wednesday, June 14, 2017 14:58 > To: Milan Mimica; hotspot-gc-dev at openjdk.java.net > Subject: Re: [PATCH] JDK-8176571: Fine bitmaps should be allocated as belonging to mtGC > >> Oh, I 've developed it against jdk10/jdk10. Care to explain a bit the >> difference? Attached are patches against jdk10/hs. > > jdk10/hs is the current development tree where development happens and > new changes pushed into. hs10/hs10 is the tree where public builds are > made from. Changes are regularly (at least in the typical case) merged > to jdk10/jdk10 after some additional regression testing. I see. Thanks. > I created https://bugs.openjdk.java.net/browse/JDK-8182169 for the > ArrayAllocator refactoring as I could not find an existing issue. > > I uploaded a webrev of that change to > http://cr.openjdk.java.net/~tschatzl/8182169/webrev/ > > Looks good to me. > > I also uploaded a webrev for JDK-8176571 based on the above to > http://cr.openjdk.java.net/~tschatzl/8176571/webrev > > Looks good to me too, but is there a reason to not use default parameters for the CHeapBitmap constructors? I'd rather somehow make the argument mandatory, to force people to choose a memory category. From stefan.johansson at oracle.com Wed Jun 14 14:45:57 2017 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Wed, 14 Jun 2017 16:45:57 +0200 Subject: RFR: 8177544: Restructure G1 Full GC code In-Reply-To: <1497346566.2829.33.camel@oracle.com> References: <62d1f02b-1fc0-ffcf-b8e0-e88ebacecebe@oracle.com> <1497346566.2829.33.camel@oracle.com> Message-ID: Thanks Thomas for reviewing, On 2017-06-13 11:36, Thomas Schatzl wrote: > Hi, > > thanks for your hard work on the parallel full gc that starts with > this refactoring :) :) > On Thu, 2017-06-08 at 14:35 +0200, Stefan Johansson wrote: >> Hi, >> >> Please review this enhancement: >> https://bugs.openjdk.java.net/browse/JDK-8177544 >> >> Webrev: >> http://cr.openjdk.java.net/~sjohanss/8177544/hotspot.00/ >> >> Summary: >> This is more or less only code moving around. The function >> do_full_collection in G1CollectedHeap is very large and breaking it >> up to smaller parts and grouping together some of the stack objects >> help readability. >> >> In addition to splitting the large function to smaller ones I've >> introduced two new classes: >> - G1FullGCScope that groups most of the previously spread out stack >> objects. >> - G1SerialCollector that handles the interaction with G1MarkSweep. >> >> Doing this change will simplify future changes to the full GC. >> >> Testing: >> * Locally run JTREG tests >> * RBT hotspot tier 2 & 3 >> > Some initial thoughts of the change, mostly to start a discussion: > > - G1FullGCScope class: please add a line what the purpose of the > class is. Added a sentence, do you want more? 
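As a side note for readers following the thread: the G1FullGCScope under discussion is an instance of the standard RAII idiom. A minimal self-contained sketch of the pattern, with made-up names for illustration only (this is not the webrev code):

#include <cstdio>

// Hypothetical stand-in for a full-GC scope object: the constructor does
// the setup that was previously spread out over do_full_collection(), and
// the destructor guarantees the matching teardown, even on early return.
class FullGCScopeSketch {
  bool _explicit_gc;
 public:
  explicit FullGCScopeSketch(bool explicit_gc) : _explicit_gc(explicit_gc) {
    std::printf("full gc setup (explicit=%d)\n", _explicit_gc ? 1 : 0);
  }
  ~FullGCScopeSketch() {
    std::printf("full gc teardown\n");
  }
};

void do_full_collection_sketch() {
  FullGCScopeSketch scope(true /* explicit_gc */);
  // ... the actual collection work runs here, bracketed by the scope ...
}

int main() {
  do_full_collection_sketch();
  return 0;
}

Grouping the previously free-standing stack objects into one such scope is what makes the setup/teardown pairing visible at a glance.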
> - a better name for G1CollectedHeap::reset_card_cache_and_queue() > could be abort_refinement(). Sounds good, fixed. > > - G1CollectedHeap.cpp:1145: please remove the "stale" word in that > comment. It's confusing me because at that point because "stale" cards > are kind of defined for a particular context and does not fit here. Fixed. > - can you move all the printing after collection > (g1CollectedHeap.cpp:1239 - 1249) into an extra method too? Something > like "print_heap_after_full_collection()"? (I think there is some > argument to also have a print_heap_before_full_collection() method). Done, since this needed me to pass around the heap_transition I decided to move it into the G1FullGCScope and I think that was an improvement in it self. > - G1SerialCollector is actually a "G1SerialFullCollector". I do not > remember whether the follow-up change removes it again anyway, but it > seems to be a simple renaming. Yes, it will be removed. And yes I can do the rename. > > - G1SerialCollector interface: while I could live with the > prepare/do/complete naming of the methods, the typical sequence is > (unfortunately gc_prologue(), collect(), gc_epilogue()) I'm a bit hesitant about re-using or gc_*logue and moving stuff into them if that's what you mean. And if you can live with the current proposal I think I will stick with it. > > - previously printing and verifying the heap has been outside the > "Pause Full" GCTraceTime. I am okay with that. I see, and I also think the new way is ok. > > - could we put the code from g1CollectedHeap.cpp:1215-1232 into a > "prepare_for_regular_collection" method? Yes, will group them together and also include the above assert. I'll also move the MemoryService::track_memory_usage() call into gc_epilogue as it is called at a similar point for the YC. I called the new method prepare_heap_for_mutators. > > - the order of the gc_epilogue() and g1_policy- >> record_full_collection_end() calls is different. The reason I moved them around is that increment_old_marking_cycles_completed has been moved into the epilogue. I was uncertain if the policy needed to see that update before recording the end. Digging into the policy I think this is not the case, I'll reorder them again. > Actually, if it were for me, I would put the whole full gc setup and > teardown into a separate class/file. > > Have public gc_prologue()/collect()/gc_epilogue() methods where > gc_prologue() is the first part of do_full_collection_inner() until > application of the G1SerialCollector, collect() the instantiation and > application of G1SerialCollector, and gc_epilogue() the remainder. > > E.g. in G1CollectedHeap we only have the calls to these three methods > (there is no need to have all three). > > At least I think it would help a lot if all that full gc stuff would be > separate physically from do-all-G1CollectedHeap. > With the G1FullGCScope there is almost no reference to G1CollectedHeap > afaics. > > (There is _allocator->init_mutator_alloc_region() call) I see your point and I think it would be good. But as we discussed over chat, might be something to look at once everything else in this area is done. Will create a RFE for this. > - g1CollectedHeap.hpp: please try to sort the definitions of the new > methods in order of calling them. Done. 
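For illustration, the gc_prologue()/collect()/gc_epilogue() shape debated above would look roughly like this; a hypothetical sketch only, not the actual webrev code:

// Hypothetical full-GC driver split into three phases; the caller in
// G1CollectedHeap would then only invoke these three methods in order.
class SerialFullCollectorSketch {
 public:
  void prepare()  { /* verification, heap printing, abort refinement */ }
  void collect()  { /* the actual serial mark-compact work */ }
  void complete() { /* prepare heap for mutators, epilogue, printing */ }
};

void run_full_collection(SerialFullCollectorSketch& c) {
  c.prepare();
  c.collect();
  c.complete();
}

Whether the setup and teardown live in such a driver or stay in G1CollectedHeap is exactly the trade-off discussed above.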
Here are updated webrevs: Full: http://cr.openjdk.java.net/~sjohanss/8177544/hotspot.01/ Inc: http://cr.openjdk.java.net/~sjohanss/8177544/hotspot.00-01/ Thanks, Stefan > > Thanks, > Thomas From kim.barrett at oracle.com Wed Jun 14 17:39:03 2017 From: kim.barrett at oracle.com (Kim Barrett) Date: Wed, 14 Jun 2017 13:39:03 -0400 Subject: [PATCH] JDK-8176571: Fine bitmaps should be allocated as belonging to mtGC In-Reply-To: <1497445137.2785.7.camel@oracle.com> References: <1495365159435.54025@infobip.com> <1495434908161.70459@infobip.com> <1495466592.2573.76.camel@oracle.com> <1495734990075.28893@infobip.com> <6110539E-B4E2-475D-8199-7C30449960D2@oracle.com> <1496926647107.27811@infobip.com> <1497107249632.26184@infobip.com> <1497445137.2785.7.camel@oracle.com> Message-ID: > On Jun 14, 2017, at 8:58 AM, Thomas Schatzl wrote: > > Hi, Thomas - Thanks for picking this up! I got distracted? > On Sat, 2017-06-10 at 15:07 +0000, Milan Mimica wrote: >> Milan Mimica, Senior Software Engineer / Division Lead >>> >>> From: Kim Barrett >>> Sent: Friday, June 9, 2017 03:04 >>> >>> Note that the refactoring patch doesn't apply cleanly to jdk10/hs >>> tip. >>> There's a merge conflict with the fix for JDK-8168467 (resolved >>> 2017/03/15). >> Oh, I 've developed it against jdk10/jdk10. Care to explain a bit the >> difference? Attached are patches against jdk10/hs. > > jdk10/hs is the current development tree where development happens and > new changes pushed into. hs10/hs10 is the tree where public builds are > made from. Changes are regularly (at least in the typical case) merged > to jdk10/jdk10 after some additional regression testing. > >>> What testing has been done? And are there any tests you can point >>> to >>> that are directly affected? (I already know about >>> TestArrayAllocatorMallocLimit.java.) I'll probably want to run >>> some >>> tests using our internal test facilities as part of sponsoring. >> I have run jtreg on my laptop. There are some failures actually, >> but happens also on clean repo. I'm not aware of anything else. >> > > I created https://bugs.openjdk.java.net/browse/JDK-8182169 for the > ArrayAllocator refactoring as I could not find an existing issue. > > I uploaded a webrev of that change to > http://cr.openjdk.java.net/~tschatzl/8182169/webrev/ > > Looks good to me. Looks good to me too. > I also uploaded a webrev for JDK-8176571 based on the above to > http://cr.openjdk.java.net/~tschatzl/8176571/webrev > > Looks good to me too, but is there a reason to not use default parameters for the CHeapBitmap constructors? Agreed. Defaulting the flags to mtInternal (which is effectively what?s being done the hard way) would simplify things. > As for testing I am going to move it through JPRT (our build and test system). 
> > Thanks, > Thomas From kim.barrett at oracle.com Wed Jun 14 17:48:04 2017 From: kim.barrett at oracle.com (Kim Barrett) Date: Wed, 14 Jun 2017 13:48:04 -0400 Subject: [PATCH] JDK-8176571: Fine bitmaps should be allocated as belonging to mtGC In-Reply-To: <1497448754310.80745@infobip.com> References: <1495365159435.54025@infobip.com> <1495434908161.70459@infobip.com> <1495466592.2573.76.camel@oracle.com> <1495734990075.28893@infobip.com> <6110539E-B4E2-475D-8199-7C30449960D2@oracle.com> <1496926647107.27811@infobip.com> <1497107249632.26184@infobip.com> <1497445137.2785.7.camel@oracle.com> <1497448754310.80745@infobip.com> Message-ID: <1DDCE6C7-B5CC-4878-85E8-44D497B41E7C@oracle.com> > On Jun 14, 2017, at 9:59 AM, Milan Mimica wrote: >> I also uploaded a webrev for JDK-8176571 based on the above to >> http://cr.openjdk.java.net/~tschatzl/8176571/webrev >> >> Looks good to me too, but is there a reason to not use default parameters for the CHeapBitmap constructors? > > I'd rather somehow make the argument mandatory, to force people to choose a memory category. I would support making the argument mandatory. All non-test callers are presently in g1, and are already being touched to change them to explicitly use mtGC, so would not be affected by such a change. But there are some callers in test_bitMap and test_bitMap_search native tests that would need to be fixed. From aph at redhat.com Thu Jun 15 10:49:59 2017 From: aph at redhat.com (Andrew Haley) Date: Thu, 15 Jun 2017 11:49:59 +0100 Subject: RFR: Parallelize safepoint cleanup In-Reply-To: <9c019b11-0c78-2649-d3bd-cd02fd999e68@redhat.com> References: <486b5a72-bef8-4ebc-2729-3fe3aa3ab3b9@oracle.com> <0ccc1ec4-9866-cd2a-c6bc-a32c2da8cb3d@oracle.com> <9c019b11-0c78-2649-d3bd-cd02fd999e68@redhat.com> Message-ID: On 29/05/17 15:16, Roman Kennke wrote: > I agree that having a single pool would be good. The current WorkGang > doesn't do it though, because we can't borrow threads while the GC is > doing work, at least not in a way that is GC agnostic (see my reply to > Robbin). It would be nice, though, not to have to share worker threads between GC and other jobs if we didn't need todo so. If we're on a large-scale multicore machine, we don't want to be trashing warm caches in idle cores unless it's forced on us by the lack of hardware resources. We're not always short of cores, and the massively- scalable multi-core world is nearly upon us. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From robbin.ehn at oracle.com Thu Jun 15 12:10:03 2017 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Thu, 15 Jun 2017 14:10:03 +0200 Subject: RFR: Parallelize safepoint cleanup In-Reply-To: References: <486b5a72-bef8-4ebc-2729-3fe3aa3ab3b9@oracle.com> <0ccc1ec4-9866-cd2a-c6bc-a32c2da8cb3d@oracle.com> <9c019b11-0c78-2649-d3bd-cd02fd999e68@redhat.com> Message-ID: On 06/15/2017 12:49 PM, Andrew Haley wrote: > On 29/05/17 15:16, Roman Kennke wrote: >> I agree that having a single pool would be good. The current WorkGang >> doesn't do it though, because we can't borrow threads while the GC is >> doing work, at least not in a way that is GC agnostic (see my reply to >> Robbin). > > It would be nice, though, not to have to share worker threads between > GC and other jobs if we didn't need todo so. If we're on a > large-scale multicore machine, we don't want to be trashing warm > caches in idle cores unless it's forced on us by the lack of hardware > resources. 
We're not always short of cores, and the massively- > scalable multi-core world is nearly upon us. > I agree with your point: just because we have a single pool doesn't necessarily mean we need to share the threads, but we should share heuristics. We have some stuff in the pipeline that we would like to have done in parallel, during STW and/or concurrently, like concurrent monitor deflation. JFR has already moved most of its logic out of STW, so consider JFR + Shenandoah + concurrent deflation and invoking Arrays.parallelSort() (don't forget about compiler threads running also). Now we may end up trashing caches just because we have no heuristics. The solution is not obvious to me, that's why, at least, I need to think about it. (JEP) Thanks, Robbin From thomas.schatzl at oracle.com Fri Jun 16 10:23:54 2017 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Fri, 16 Jun 2017 12:23:54 +0200 Subject: [PATCH] JDK-8176571: Fine bitmaps should be allocated as belonging to mtGC In-Reply-To: <1DDCE6C7-B5CC-4878-85E8-44D497B41E7C@oracle.com> References: <1495365159435.54025@infobip.com> <1495434908161.70459@infobip.com> <1495466592.2573.76.camel@oracle.com> <1495734990075.28893@infobip.com> <6110539E-B4E2-475D-8199-7C30449960D2@oracle.com> <1496926647107.27811@infobip.com> <1497107249632.26184@infobip.com> <1497445137.2785.7.camel@oracle.com> <1497448754310.80745@infobip.com> <1DDCE6C7-B5CC-4878-85E8-44D497B41E7C@oracle.com> Message-ID: <1497608634.3282.10.camel@oracle.com> Hi, On Wed, 2017-06-14 at 13:48 -0400, Kim Barrett wrote: > > > > On Jun 14, 2017, at 9:59 AM, Milan Mimica > > wrote: > > > > > > I also uploaded a webrev for JDK-8176571 based on the above to > > > http://cr.openjdk.java.net/~tschatzl/8176571/webrev > > > > > > Looks good to me too, but is there a reason to not use default > > > parameters for the CHeapBitmap constructors? > > I'd rather somehow make the argument mandatory, to force people to > > choose a memory category. > > I would support making the argument mandatory. All non-test callers > are presently in g1, and are already being touched to change them to > explicitly use mtGC, so would not be affected by such a change. But > there are some callers in test_bitMap and test_bitMap_search native > tests that would need to be fixed. I would not block such an idea. The user of the bitmap should know what it is going to be used for :) However the tests heavily use templates to test all types of bitmaps, so they expect the constructors to be the same. The best I could come up with to make this work would be having a test-private wrapper class for CHeapBitmap that adds mtInternal to the constructor automatically - and use that one for CHeapBitmap tests. I am sure you C++ wizards immediately find something better though :) Not sure if it is worth the effort, but feel free to convince me with a changeset :) Otherwise I just recommend using a default mtInternal value for the MEMFLAGS parameter instead of the manual constructor duplication. Note that we still have some time, as Milan's name does not show up on the OCA signatory list yet (http://www.oracle.com/technetwork/community/oca-486395.html). Thanks,
Thomas From thomas.schatzl at oracle.com Tue Jun 20 08:05:47 2017 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 20 Jun 2017 10:05:47 +0200 Subject: RFR (7xS): 8175554: Improve G1UpdateRSOrPushRefClosure In-Reply-To: <3f492193-275d-3d4e-1a91-2d7e07fdaafb@oracle.com> References: <1491910205.2754.31.camel@oracle.com> <1493986396.2777.61.camel@oracle.com> <3f492193-275d-3d4e-1a91-2d7e07fdaafb@oracle.com> Message-ID: <1497945947.2784.6.camel@oracle.com> Hi Sangheon, others, On Tue, 2017-05-30 at 15:15 -0700, sangheon wrote: > Hi Thomas, > > On 05/05/2017 05:13 AM, Thomas Schatzl wrote: > > > > Hi all, > > > > ???recent reviews have made changes necessary to parts of the > > changeset chain. > > > > Here is a list of links to updated webrevs. Since they have > > apparently not been reviewed yet, I simply overwrote the old > > webrevs. > > > > JDK-8177044: Remove _scan_top from HeapRegion > > http://cr.openjdk.java.net/~tschatzl/8177044/webrev/ > > > > JDK-8178148: Log more detailed information about scan rs phase > > http://cr.openjdk.java.net/~tschatzl/8178148/webrev/ > > > > JDK-8175554: Improve G1UpdateRSOrPushRefClosure > > http://cr.openjdk.java.net/~tschatzl/8175554/webrev/ > Looks good to me. > I only have minor nits. > > ------------------------------------------------------ > src/share/vm/gc/g1/g1OopClosures.hpp > ???78???virtual void do_oop(oop* p) { do_oop_nv(p); } > Misaligned with above line. > > ------------------------------------------------------ > src/share/vm/gc/g1/g1RemSet.hpp > ? 204???????????????????G1UpdateOrScanRSClosure* push_heap_cl, > Rename to reflect new closure name? > > ------------------------------------------------------ > src/share/vm/gc/g1/g1RootProcessor.hpp > Copyright update. > > ------------------------------------------------------ > src/share/vm/gc/g1/g1_specialized_oop_closures.hpp > ???45???????f(G1UpdateOrScanRSClosure,_nv)?????????\ > Misaligned '\'. > ? I fixed all this in addition to incorporating ErikD's comments that asked for factoring out two parts of the G1ParScanClosure and G1UpdateOrScanRSClosure that were equal now. I did some performance testing again due to that, and also found that the check to filter out non-cross-region references in?G1UpdateOrScanRSClosure::do_oop_nv() seemed faster, so I also reverted it to the old code. Also in this change G1UpdateOrScanRSClosure::do_oop_nv() did not update _has_refs_into_cset as before. Fixed that as well. Thanks, ? Thomas From thomas.schatzl at oracle.com Tue Jun 20 08:07:58 2017 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 20 Jun 2017 10:07:58 +0200 Subject: RFR (7xS): 8175554: Improve G1UpdateRSOrPushRefClosure In-Reply-To: <1497945947.2784.6.camel@oracle.com> References: <1491910205.2754.31.camel@oracle.com> <1493986396.2777.61.camel@oracle.com> <3f492193-275d-3d4e-1a91-2d7e07fdaafb@oracle.com> <1497945947.2784.6.camel@oracle.com> Message-ID: <1497946078.2784.7.camel@oracle.com> Hi again, ? webrev links: http://cr.openjdk.java.net/~tschatzl/8175554/webrev.0_to_1/?(diff) http://cr.openjdk.java.net/~tschatzl/8175554/webrev.1/?(full) Thomas ? On Tue, 2017-06-20 at 10:05 +0200, Thomas Schatzl wrote: > Hi Sangheon, others, > > On Tue, 2017-05-30 at 15:15 -0700, sangheon wrote: > > > > Hi Thomas, > > > > On 05/05/2017 05:13 AM, Thomas Schatzl wrote: > > > > > > > > > Hi all, > > > > > > ???recent reviews have made changes necessary to parts of the > > > changeset chain. > > > > > > Here is a list of links to updated webrevs. 
Since they have > > > apparently not been reviewed yet, I simply overwrote the old > > > webrevs. > > > > > > JDK-8177044: Remove _scan_top from HeapRegion > > > http://cr.openjdk.java.net/~tschatzl/8177044/webrev/ > > > > > > JDK-8178148: Log more detailed information about scan rs phase > > > http://cr.openjdk.java.net/~tschatzl/8178148/webrev/ > > > > > > JDK-8175554: Improve G1UpdateRSOrPushRefClosure > > > http://cr.openjdk.java.net/~tschatzl/8175554/webrev/ > > Looks good to me. > > I only have minor nits. > > > > ------------------------------------------------------ > > src/share/vm/gc/g1/g1OopClosures.hpp > > ???78???virtual void do_oop(oop* p) { do_oop_nv(p); } > > Misaligned with above line. > > > > ------------------------------------------------------ > > src/share/vm/gc/g1/g1RemSet.hpp > > ? 204???????????????????G1UpdateOrScanRSClosure* push_heap_cl, > > Rename to reflect new closure name? > > > > ------------------------------------------------------ > > src/share/vm/gc/g1/g1RootProcessor.hpp > > Copyright update. > > > > ------------------------------------------------------ > > src/share/vm/gc/g1/g1_specialized_oop_closures.hpp > > ???45???????f(G1UpdateOrScanRSClosure,_nv)?????????\ > > Misaligned '\'. > > > ? I fixed all this in addition to incorporating ErikD's comments that > asked for factoring out two parts of the G1ParScanClosure and > G1UpdateOrScanRSClosure that were equal now. > > I did some performance testing again due to that, and also found that > the check to filter out non-cross-region references > in?G1UpdateOrScanRSClosure::do_oop_nv() seemed faster, so I also > reverted it to the old code. > > Also in this change G1UpdateOrScanRSClosure::do_oop_nv() did not > update > _has_refs_into_cset as before. Fixed that as well. > > Thanks, > ? Thomas > From sangheon.kim at oracle.com Tue Jun 20 23:15:37 2017 From: sangheon.kim at oracle.com (sangheon) Date: Tue, 20 Jun 2017 16:15:37 -0700 Subject: RFR(S): 8173335: Improve logging for j.l.ref.reference processing In-Reply-To: <0353babf-03eb-21cd-b286-9b0149dfb718@oracle.com> References: <1497352882.2829.65.camel@oracle.com> <054513b1-4ff2-6656-fa3a-9c6e6736c32f@oracle.com> <0353babf-03eb-21cd-b286-9b0149dfb718@oracle.com> Message-ID: <001e2f5b-4d5f-327f-79bc-9287045179e1@oracle.com> Hi Thomas, On 06/14/2017 12:52 AM, sangheon wrote: > Hi Thomas again, > > On 06/13/2017 02:21 PM, sangheon wrote: >> Hi Thomas, >> >> Thank you for reviewing this. >> >> On 06/13/2017 04:21 AM, Thomas Schatzl wrote: >>> Hi Sangheon, >>> >>> >>> On Mon, 2017-06-12 at 17:13 -0700, sangheon wrote: >>>> Hi Aleksey, >>>> >>>> Thanks for the review. >>>> >>>> On 06/12/2017 09:06 AM, Aleksey Shipilev wrote: >>>>> On 06/10/2017 01:57 AM, sangheon wrote: >>>>>> CR: https://bugs.openjdk.java.net/browse/JDK-8173335 >>>>>> webrev: http://cr.openjdk.java.net/~sangheki/8173335/webrev.0 >>> - There should be a destructor in ReferenceProcessor cleaning up the >>> dynamically allocated memory. >> Thomas and I had some discussion about this and agreed to file a >> separate CR for freeing issue. >> >> I noticed that there's no destructor when I wrote this, but this is >> how we usually implement. >> However as this seems incorrect, I will add a destructor for newly >> added class but it will not be used in this patch. >> It will be used in the following CR( >> https://bugs.openjdk.java.net/browse/JDK-8182120 ) which fixes >> not-freeing issue in ReferenceProcessor. 
>> FYI, ReferenceProcessor has heap allocated members of >> ReferencePolicy(and its friends) but it is not freed too. So instead >> of extending this patch, I propose to separate this freeing issue. >> >>> >>> - the change should move gc+ref output to something else: there is so >>> much additional junk printed with gc+ref=trace so that the phase >>> logging is drowned out with real trace information and unusable for >>> regular consumption. >> Okay, I will add it. >> But I asked introducing 'gc+ref+phases' before but you didn't like >> it. :) Probably I didn't provide much details?! >> >>> >>> Also I would prefer to have this detailed log output interspersed >>> within the (existing) gc+phases output. Like under the "Reference >>> Processing" and "Reference Enqueuing" sections for G1 in particular. >> Frankly speaking, I'm not much interested now. >> When I started investigating this CR, you mentioned about this too. >> (But you were okay for either way. i.e. current one or interspersing >> into G1 logging. :) ) >> I also tried in that way(interspersing one) and my feeling is that I >> don't see much benefit to have ref logs in G1 phases section. It >> looks better organized but it doesn't mean current log style is worse. >> Ref. logs are printed separately for long time and other shared codes >> also print logs immediately. >> >> On the other hand, current implementation(re-use and print >> immediately) seems simpler to implement. >> In addition, ReferenceProcessor::process_discovered_reflist() is >> repeatedly called for different type of References so re-using log >> printer seems natural to me. :) >> >>> >>> Maybe with gc+phases+ref=debug/trace so that "everything" could be >>> enabled using "gc+phases*=debug/trace"? >> Yes, good idea. >> >>> >>> I can see that the code throws away the previous information about >>> reference processing after every use (the phasetimes reused). This is >>> does not allow printing of the data at convenient times and places. >>> >>> I.e. I would prefer if the data were aggregated (not only from one >>> particular phase) and later printed together. >> If we don't intersperse with existing G1 log, do you still think >> printing later is needed? >> Probably printing after Phanthom Ref. processed or different location? >> >>> >>> I kind of disagree with Aleksey about need for backwards compatibility >>> of log messages. This is such a big breaking change in the amount of >>> information shown that existing users will want to adapt their log >>> readers anyway. >> True that log parsers already should be updated, but I understood >> Aleksey's comment something like preferring 'Ref Counts' instead of >> 'Counts'. >> >>> As mentioned, due to real trace code here, gc+ref=trace is unusable. >> FYI, probably you tested with fastdebug because there are many >> debug/trace logs for debug build. It doesn't bother from product >> build actually. >> But as I said, I will change current new logs' channel from 'gc+ref' >> to 'gc+phases+ref'. >> >>> >>> We could still provide minimal backwards compatible output under >>> gc+ref=debug if needed. >> I'm don't see much value on this. >> gc+phases+ref seems better. >> >>> >>> - I would prefer if resetting the reference phase times logger wouldn't >>> be kind of an afterthought of printing :) >>> >>> Also it might be useful to keep the data around for somewhat longer >>> (not throw it away after every phase). Don't we need the data for >>> further analysis? >> I don't have strong opinion on this. 
>> >> I didn't consider keeping log data for further analysis. This could be a >> minor reason for supporting keeping log data longer but I think >> interspersing with existing G1 log would be the main reason of keeping >> it. >> >>> >>> This would also allow printing it later using different log tags (with >>> different formatting). >>> >>> - I like the split of phasetimes into data storage and printing. I do >>> not like that basically the timing data is created twice, once for the >>> phasetimes, once for the GCTimer (for JFR basically). >> No, currently timing data is created once and used for both phase log >> and GCTimer. >> Or am I missing something? >> >> So in summary, mostly I agree with your comments except below 2: >> 1. Interspersing with G1 log. >> 2. Keeping log data longer. (This should be done if we go with >> interspersing idea) I started working on the above 2 items. :) I will update webrev when I'm ready. Here are the updated webrevs, which apply the changes below:
1. Added a destructor for ReferenceProcessorPhaseTimes.
2. Added 'gc+phases+ref' for the newly added logs.
3. Interspersed reference logs into the G1 young GC log.
   - Logs for other cases are still printed immediately.
4. All timing information has its own storage.
5. A total time is added.

Current reference logs will be:

1. New logs (except G1 young GC)

[1.541s][debug][gc,phases ] GC(7) Finalize Marking 4.802ms
[1.541s][debug][gc,phases,start] GC(7) Reference Processing <-- [1]
[1.543s][debug][gc,phases,ref ] GC(7) Reference Processing: 1.8ms <-- [2]
[1.543s][debug][gc,phases,ref ] GC(7) SoftReference: 0.3ms
[1.543s][debug][gc,phases,ref ] GC(7) Balance queues: 0.0ms
[1.543s][debug][gc,phases,ref ] GC(7) Phase1: 0.3ms
[1.543s][trace][gc,phases,ref ] GC(7) Process lists (ms) Min: 1541.3, Avg: 1541.3, Max: 1541.3, Diff: 0.0, Sum: 35450.0, Workers: 23
[1.543s][debug][gc,phases,ref ] GC(7) Phase2: 0.2ms
[1.543s][trace][gc,phases,ref ] GC(7) Process lists (ms) Min: 1541.5, Avg: 1541.5, Max: 1541.5, Diff: 0.0, Sum: 35454.5, Workers: 23
[1.543s][debug][gc,phases,ref ] GC(7) Phase3: 0.3ms
[1.543s][trace][gc,phases,ref ] GC(7) Process lists (ms) Min: 1541.7, Avg: 1541.8, Max: 1541.8, Diff: 0.0, Sum: 35460.5, Workers: 23
[1.543s][debug][gc,phases,ref ] GC(7) Discovered: 0
[1.543s][debug][gc,phases,ref ] GC(7) Cleared: 0
...
[1.543s][debug][gc,phases,ref ] GC(7) Reference Enqueuing 0.1ms
[1.543s][trace][gc,phases,ref ] GC(7) Process lists (ms) Min: 1543.4, Avg: 1543.4, Max: 1543.4, Diff: 0.1, Sum: 35498.4, Workers: 23
[1.543s][debug][gc,phases,ref ] GC(7) Ref Counts: Soft: 0 Weak: 0 Final: 0 Phantom: 0
[1.544s][debug][gc,phases ] GC(7) Reference Processing 2.445ms
[1.544s][debug][gc,phases,start] GC(7) Class Unloading
[1.544s][debug][gc,phases,start] GC(7) ClassLoaderData
[1.544s][debug][gc,phases ] GC(7) ClassLoaderData 0.467ms

2. New logs for G1 young GC: -Xlog:gc+phases*=trace

[1.470s][info ][gc,phases ] GC(6) Post Evacuate Collection Set: 4.1ms
[1.470s][debug][gc,phases ] GC(6) Code Roots Fixup: 0.0ms
[1.470s][debug][gc,phases ] GC(6) Preserve CM Refs: 0.0ms
[1.470s][trace][gc,phases ] GC(6) Parallel Preserve CM Refs (ms): skipped
[1.470s][trace][gc,phases,task] GC(6) - - - - - - - - - - - - - - - - - - - - - - -
[1.470s][debug][gc,phases ] GC(6) Reference Processing: 1.4ms
[1.470s][debug][gc,phases,ref ] GC(6) SoftReference: 0.2ms
[1.470s][debug][gc,phases,ref ] GC(6) Balance queues: 0.0ms
[1.470s][debug][gc,phases,ref ] GC(6) Phase1: 0.2ms
[1.470s][trace][gc,phases,ref ] GC(6) Process lists (ms) Min: 1463.2, Avg: 1463.2, Max: 1463.2, Diff: 0.0, Sum: 33653.2, Workers: 23
[1.470s][debug][gc,phases,ref ] GC(6) Phase2: 0.1ms
[1.470s][trace][gc,phases,ref ] GC(6) Process lists (ms) Min: 1463.3, Avg: 1463.3, Max: 1463.4, Diff: 0.0, Sum: 33656.8, Workers: 23
[1.470s][debug][gc,phases,ref ] GC(6) Phase3: 0.2ms
[1.470s][trace][gc,phases,ref ] GC(6) Process lists (ms) Min: 1463.5, Avg: 1463.5, Max: 1463.6, Diff: 0.0, Sum: 33661.6, Workers: 23
[1.470s][debug][gc,phases,ref ] GC(6) Discovered: 0
[1.470s][debug][gc,phases,ref ] GC(6) Cleared: 0
...
[1.471s][debug][gc,phases ] GC(6) Clear Card Table: 0.0ms
[1.471s][debug][gc,phases ] GC(6) Reference Enqueuing: 0.1ms
[1.471s][debug][gc,phases,ref ] GC(6) Ref Counts: Soft: 0 Weak: 0 Final: 0 Phantom: 0
[1.471s][debug][gc,phases ] GC(6) Merge Per-Thread State: 0.2ms

3. New logs for G1 young GC: -Xlog:gc+phases+ref=trace

[1.335s][debug][gc,phases,ref] GC(4) Reference Processing: 9.4ms <- This is still printed.
[1.335s][debug][gc,phases,ref] GC(4) SoftReference: 7.0ms
[1.335s][debug][gc,phases,ref] GC(4) Balance queues: 0.0ms
[1.335s][debug][gc,phases,ref] GC(4) Phase1: 7.0ms
[1.335s][trace][gc,phases,ref] GC(4) Process lists (ms) Min: 1329.0, Avg: 1329.0, Max: 1329.1, Diff: 0.1, Sum: 30568.1, Workers: 23
[1.335s][debug][gc,phases,ref] GC(4) Phase2: 0.1ms
[1.335s][trace][gc,phases,ref] GC(4) Process lists (ms) Min: 1329.1, Avg: 1329.1, Max: 1329.1, Diff: 0.0, Sum: 30569.7, Workers: 23
[1.335s][debug][gc,phases,ref] GC(4) Phase3: 0.3ms
[1.335s][trace][gc,phases,ref] GC(4) Process lists (ms) Min: 1329.4, Avg: 1329.4, Max: 1329.5, Diff: 0.1, Sum: 30576.7, Workers: 23
[1.335s][debug][gc,phases,ref] GC(4) Discovered: 0
[1.335s][debug][gc,phases,ref] GC(4) Cleared: 0
...
[1.335s][debug][gc,phases,ref] GC(4) Reference Enqueuing: 0.1ms
[1.335s][debug][gc,phases,ref] GC(4) Ref Counts: Soft: 0 Weak: 0 Final: 0 Phantom: 0

[1]: Implementations measure the 'Reference Processing' GCTraceTime differently: some include ReferenceProcessor::enqueue_discovered_references() while others don't, i.e. the former measures all reference-processing related work rather than just process_discovered_references(). As having its own total time seems right to me, I added it [2].

PS) There was some concern about exposing WorkerDataArray into the 'shared' directory. But as there's no alternative to use, I'm hoping to share it now.

Webrev:
http://cr.openjdk.java.net/~sangheki/8173335/webrev.2 (full)
http://cr.openjdk.java.net/~sangheki/8173335/webrev.2_to_1/ (incremental)

Thanks,
Sangheon

> > Thanks, > Sangheon > > >> >> Let me post updated webrev, after making all decision. >> >> Thanks, >> Sangheon >> >> >>> Or the gctimer is >>> passed everywhere. But that is another issue I guess. >>> >>> Thanks, >>> Thomas >>> >>> >>>>> Oh, good!
I had to instrument these by hand when optimizing RP paths. >>>>> >>>>> Comments after brief look: >>>>> >>>>> *) So, the path with NULL executor are also not handling the >>>>> timer? E.g. CMS: >>>>> >>>>> 5262 if (rp->processing_is_mt()) { >>>>> 5263 rp->balance_all_queues(); >>>>> 5264 CMSRefProcTaskExecutor task_executor(*this); >>>>> 5265 rp->enqueue_discovered_references(&task_executor, >>>>> _gc_timer_cm); >>>>> 5266 } else { >>>>> 5267 rp->enqueue_discovered_references(NULL); >>>>> 5268 } >>>> Fixed to use timers for similar cases that you pointed. Thanks for >>>> catching this! >>>> I started this CR as a part of MT ref. processing(JDK-8043575), so I >>>> only added to that path. But this should be fixed. >>>>> >>>>> *) I would leave "Ref Counts" line as usual for compatibility >>>>> reasons. Changing >>>>> it to "Counts" would force GC log parsers to handle that corner >>>>> case too. >>>> Changed, 'Counts -> Ref Counts'. >>>>> >>>>> *) This may reuse Indents? >>>>> >>>>> 95 out->print("%s", " "); >>>> Fixed to use Indents[2]. >>>> >>>>> >>>>> *) Probably makes sense to "hg mv -A" the workerDataArray files >>>>> to preserve the >>>>> Mercurial history -- webrev should say something like "copied from >>>>> ...", IIRC. >>>> Fixed. >>>> >>>> webrev: >>>> http://cr.openjdk.java.net/~sangheki/8173335/webrev.1/ >>>> http://cr.openjdk.java.net/~sangheki/8173335/webrev.1_to_0 >>>> >>>> Thanks, >>>> Sangheon >>>> >>>> >>>>> >>>>> Thanks, >>>>> -Aleksey >>>>> >> > From email.sundarms at gmail.com Wed Jun 21 06:45:09 2017 From: email.sundarms at gmail.com (Sundara Mohan M) Date: Tue, 20 Jun 2017 23:45:09 -0700 Subject: G1OldCSetRegionThresholdPercent under ExperimentalFlag Message-ID: Hi, Can someone shed more light on why the G1OldCSetRegionThresholdPercent flag is experimental (you need to add -XX:+UnlockExperimentalVMOptions to modify it)? Thanks, Sundar From erik.helin at oracle.com Thu Jun 22 08:12:04 2017 From: erik.helin at oracle.com (Erik Helin) Date: Thu, 22 Jun 2017 10:12:04 +0200 Subject: RFR: 8179387: Factor out CMS specific code from GenCollectedHeap into its own subclass In-Reply-To: <3521009f-6fab-4f8e-2375-b9d665a4c70b@redhat.com> References: <3521009f-6fab-4f8e-2375-b9d665a4c70b@redhat.com> Message-ID: Hi Roman, thanks for putting this patch together, it is a great step forward! One thing that (in my mind) would improve it even further is if we embed a GenCollectedHeap in CMSHeap and then make CMSHeap inherit directly from CollectedHeap. With this solution, the definition of CMSHeap would look like something along the lines of:

class CMSHeap : public CollectedHeap {
  WorkGang* _wg;
  GenCollectedHeap _gch;

public:
  CMSHeap(GenCollectorPolicy* policy) :
      _wg(new WorkGang("GC Thread", ParallelGCThreads, true, true)),
      _gch(policy) {
    _wg->initialize_workers();
  }

  // a bunch of "facade" methods
  virtual bool supports_tlab_allocation() const {
    return _gch.supports_tlab_allocation();
  }

  virtual size_t tlab_capacity(Thread* t) const {
    return _gch.tlab_capacity(t);
  }
};

With this approach, you would have to implement a bunch of "facade" methods that just delegate to _gch, such as the methods supports_tlab_allocation and tlab_capacity above. There are two reasons why I prefer this approach: 1. In the end we want CMSHeap to inherit from CollectedHeap anyway :) 2.
It makes it very clear which methods we gradually have to re-implement in CMSHeap to eventually get rid of the _gch field (the end goal). This is much harder to see if CMSHeap inherits from GenCollectedHeap (see more below). The second point will most likely cause some initial problems with `protected` code in GenCollectedHeap. For example, as you noticed when creating this patch, CMSHeap make use of a few `protected` fields and methods from GenCollectedHeap, most notably: - _process_strong_tasks - process_roots() - process_string_table_roots() It would be much better (IMO) to share this code via composition rather than inheritance. In this particular case, I would prefer to create a class StrongRootsProcessor that encapsulates the root processing logic. Then GenCollectedHeap and CMSHeap can both contain an instance of StrongRootsProcessor. What do you think of this approach? Do you have some spare cycles to try this approach out? Thanks, Erik On 06/02/2017 10:55 AM, Roman Kennke wrote: > Take this patch. It #ifdef ASSERT's a call to check_gen_kinds() that is > only present in debug builds. > > > http://cr.openjdk.java.net/~rkennke/8179387/webrev.01/ > > > Roman > > Am 01.06.2017 um 22:50 schrieb Roman Kennke: >> What $SUBJECT says. >> >> I went over genCollectedHeap.[hpp|cpp] and moved everything that I could >> find that is CMS-only into a new CMSHeap class. >> >> http://cr.openjdk.java.net/~rkennke/8179387/webrev.00/ >> >> >> It is possible that I overlooked something there. There may be code in >> there that doesn't shout "CMS" at me, but is still intrinsically CMS stuff. >> >> Also not that I have not removed that little part: >> >> always_do_update_barrier = UseConcMarkSweepGC; >> >> because I expect it to go away with Erik ?'s big refactoring. >> >> What do you think? >> >> Testing: hotspot_gc, specjvm, some little apps with -XX:+UseConcMarkSweepGC >> >> Roman >> > From stefan.karlsson at oracle.com Thu Jun 22 08:59:32 2017 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Thu, 22 Jun 2017 10:59:32 +0200 Subject: RFR: 818269: Remove gcTrace.hpp include from referenceProcessor.hpp In-Reply-To: <30ae52d3-67da-bda4-b25c-e9ed0cb079ee@oracle.com> References: <30ae52d3-67da-bda4-b25c-e9ed0cb079ee@oracle.com> Message-ID: This mail was supposed to go to hotspot-gc-dev (To:ed) not to jdk10-dev (BCC:ed). Thanks, StefanK On 2017-06-22 10:46, Stefan Karlsson wrote: > Hi all, > > Please review this trivial change to remove an include of gcTrace.hpp in > referenceProcessor.hpp, and changes needed to get the code to compile > after that. > > http://cr.openjdk.java.net/~stefank/8182696/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8182696 > > I was prototyping ways to get more type safe time durations in HotSpot > and found that whenever I changed my header file, that almost all > HotSpot cpp files were recompiled. I tracked it down to come from the > unused include of gcTrace.hpp in referenceProcessor.hpp. > > We could probably also try to figure out why changes > referenceProcessor.hpp triggers recompiles of the entire source code, > but I'd like to leave that exercise for another day. > > Thanks, > StefanK From rkennke at redhat.com Thu Jun 22 08:59:53 2017 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 22 Jun 2017 10:59:53 +0200 Subject: RFR: 8179387: Factor out CMS specific code from GenCollectedHeap into its own subclass In-Reply-To: References: <3521009f-6fab-4f8e-2375-b9d665a4c70b@redhat.com> Message-ID: That sounds like a good idea. I'll give it a try. 
Roman > Hi Roman, > > thanks for putting this patch together, it is a great step forward! One > thung that (in my mind) would improve it even further is if we embed a > GenCollectedHeap in CMSHeap and then make CMSHeap inherit directly from > CollectedHeap. > > With this solution, the definition of CMSHeap would look like something > along the lines of: > > class CMSHeap : public CollectedHeap { > WorkGang* _wg; > GenCollectedHeap _gch; > > public: > CMSHeap(GenCollectorPolicy* policy) : > _wg(new WorkGang("GC Thread", ParallelGCThreads, true, true), > _gch(policy) { > _wg->initialize_workers(); > } > > // a bunch of "facade" methods > virtual bool supports_tlab_allocation() const { > return _gch->supports_tlab_allocation(); > } > > virtual size_t tlab_capacity(Thread* t) const { > return _gch->tlab_capacity(t); > } > }; > > With this approach, you would have to implement a bunch of "facade" > methods that just delegates to _gch, such as the methods > supports_tlab_allocation and tlab_capacity above. There are two reasons > why I prefer this approach: > 1. In the end we want CMSHeap to inherit from CollectedHeap anyway :) > 2. It makes it very clear which methods we gradually have to > re-implement in CMSHeap to eventually get rid of the _gch field (the > end goal). This is much harder to see if CMSHeap inherits from > GenCollectedHeap (see more below). > > The second point will most likely cause some initial problems with > `protected` code in GenCollectedHeap. For example, as you noticed when > creating this patch, CMSHeap make use of a few `protected` fields and > methods from GenCollectedHeap, most notably: > - _process_strong_tasks > - process_roots() > - process_string_table_roots() > > It would be much better (IMO) to share this code via composition rather > than inheritance. In this particular case, I would prefer to create a > class StrongRootsProcessor that encapsulates the root processing logic. > Then GenCollectedHeap and CMSHeap can both contain an instance of > StrongRootsProcessor. > > What do you think of this approach? Do you have some spare cycles to try > this approach out? > > Thanks, > Erik > > On 06/02/2017 10:55 AM, Roman Kennke wrote: >> Take this patch. It #ifdef ASSERT's a call to check_gen_kinds() that is >> only present in debug builds. >> >> >> http://cr.openjdk.java.net/~rkennke/8179387/webrev.01/ >> >> >> Roman >> >> Am 01.06.2017 um 22:50 schrieb Roman Kennke: >>> What $SUBJECT says. >>> >>> I went over genCollectedHeap.[hpp|cpp] and moved everything that I could >>> find that is CMS-only into a new CMSHeap class. >>> >>> http://cr.openjdk.java.net/~rkennke/8179387/webrev.00/ >>> >>> >>> It is possible that I overlooked something there. There may be code in >>> there that doesn't shout "CMS" at me, but is still intrinsically CMS stuff. >>> >>> Also not that I have not removed that little part: >>> >>> always_do_update_barrier = UseConcMarkSweepGC; >>> >>> because I expect it to go away with Erik ?'s big refactoring. >>> >>> What do you think? 
>>> >>> Testing: hotspot_gc, specjvm, some little apps with -XX:+UseConcMarkSweepGC >>> >>> Roman >>> From stefan.karlsson at oracle.com Thu Jun 22 09:16:45 2017 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Thu, 22 Jun 2017 11:16:45 +0200 Subject: RFR: 8178490: Usages of is_object_aligned with pointers are broken Message-ID: Hi all, Please review this patch to fix and strengthen is_object_aligned checks when pointers are passed in: http://cr.openjdk.java.net/~stefank/8178490/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8178490 is_object_aligned only works correctly for sizes measured in words. When a pointer is passed into:

inline bool is_object_aligned(intptr_t addr) {
  return addr == align_object_size(addr);
}

inline intptr_t align_object_size(intptr_t size) {
  return align_size_up(size, MinObjAlignment);
}

the pointer is incorrectly interpreted as a word size and the alignment is checked against MinObjAlignment instead of MinObjAlignmentInBytes. Tested with JPRT together with different patches for: 8178489 Make align functions more type safe and consistent Thanks, StefanK From thomas.schatzl at oracle.com Thu Jun 22 09:44:59 2017 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Thu, 22 Jun 2017 11:44:59 +0200 Subject: RFR: 8182696: Remove gcTrace.hpp include from referenceProcessor.hpp In-Reply-To: <30ae52d3-67da-bda4-b25c-e9ed0cb079ee@oracle.com> References: <30ae52d3-67da-bda4-b25c-e9ed0cb079ee@oracle.com> Message-ID: <1498124699.2831.18.camel@oracle.com> Hi, On Thu, 2017-06-22 at 10:46 +0200, Stefan Karlsson wrote: > Hi all, > > Please review this trivial change to remove an include of gcTrace.hpp > in referenceProcessor.hpp, and changes needed to get the code to > compile after that. > > http://cr.openjdk.java.net/~stefank/8182696/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8182696 ship it. Thomas From thomas.schatzl at oracle.com Thu Jun 22 10:00:12 2017 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Thu, 22 Jun 2017 12:00:12 +0200 Subject: RFR: 8178490: Usages of is_object_aligned with pointers are broken In-Reply-To: References: Message-ID: <1498125612.2831.19.camel@oracle.com> Hi Stefan, On Thu, 2017-06-22 at 11:16 +0200, Stefan Karlsson wrote: > Hi all, > > Please review this patch to fix and strengthen is_object_aligned > checks when pointers are passed in: > > http://cr.openjdk.java.net/~stefank/8178490/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8178490 > > is_object_aligned only works correctly for sizes measured in words. > looks good. Thomas From thomas.schatzl at oracle.com Thu Jun 22 10:18:19 2017 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Thu, 22 Jun 2017 12:18:19 +0200 Subject: G1OldCSetRegionThresholdPercent under ExperimentalFlag In-Reply-To: References: Message-ID: <1498126699.2831.29.camel@oracle.com> Hi, On Tue, 2017-06-20 at 23:45 -0700, Sundara Mohan M wrote: > Hi, > Can someone shed more light on why G1OldCSetRegionThresholdPercent > flag is under experimental (Need to add - > XX:+UnlockExperimentalVMOptions to modify it.) in my view, -XX:+UnlockExperimentalVMOptions mostly serves as a "I really want to do that and I know what I am doing" confirmation from the user that he is aware that using this option (in this case, to influence the set of regions taken in during mixed gc) might lead to surprising behavior. Also, I think there has been no official documentation for it - also because it should be very rarely needed.
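(For reference: an experimental option has to be unlocked before it can be set, and the unlock flag must precede it on the command line. With a purely illustrative value:

java -XX:+UnlockExperimentalVMOptions -XX:G1OldCSetRegionThresholdPercent=15 ...

)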
In particular, I am curious about the case when it would be useful to change it. Could you give some log files showing that there is an issue with the upper bound for the number of old gen regions to take during GC? (i.e. the amount of old gen regions taken is too small and there is ample pause time left and it matters to clean up more regions in a single mixed gc?) Sometimes there are problems with the lower bound that is controlled by the -XX:G1MixedGCCountTarget (product level) option. Hth, ? Thomas From thomas.schatzl at oracle.com Thu Jun 22 10:44:09 2017 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Thu, 22 Jun 2017 12:44:09 +0200 Subject: RFR (7xS): 8175554: Improve G1UpdateRSOrPushRefClosure In-Reply-To: <1497945947.2784.6.camel@oracle.com> References: <1491910205.2754.31.camel@oracle.com> <1493986396.2777.61.camel@oracle.com> <3f492193-275d-3d4e-1a91-2d7e07fdaafb@oracle.com> <1497945947.2784.6.camel@oracle.com> Message-ID: <1498128249.2831.38.camel@oracle.com> Hi all, ? after discussion with Erik, I removed one comment, and renamed the closures to something that resembles their use. Also I had to reintroduce the G1ParPushRefClosure removed in the initial patch due to performance regressions. G1UpdateOrScanRSClosure -> G1ScanObjsDuringUpdateRSClosure G1ParPushRefClosure -> G1ScanObjsDuringScanRSClosure G1ParScanClosure -> G1ScanEvacuatedObjClosure We also found that the mechanism to collect cards that contain references into the collection set to not lose any remembered set entries during update RS if there is an evacuation failure is basically superfluous. Other, existing mechanism make sure that all required remembered sets are (re-)created in other stages of the GC. Removal of this code has been decided to be out of scope here. Webrev: http://cr.openjdk.java.net/~tschatzl/8175554/webrev.1_to_2/?(diff) http://cr.openjdk.java.net/~tschatzl/8175554/webrev.2/?(full) Testing: jprt, local testing Thanks, ? Thomas On Tue, 2017-06-20 at 10:05 +0200, Thomas Schatzl wrote: > Hi Sangheon, others, > > On Tue, 2017-05-30 at 15:15 -0700, sangheon wrote: > > > > Hi Thomas, > > > > On 05/05/2017 05:13 AM, Thomas Schatzl wrote: > > > > > > > > > Hi all, > > > > > > ???recent reviews have made changes necessary to parts of the > > > changeset chain. > > > > > > Here is a list of links to updated webrevs. Since they have > > > apparently not been reviewed yet, I simply overwrote the old > > > webrevs. > > > > > > JDK-8177044: Remove _scan_top from HeapRegion > > > http://cr.openjdk.java.net/~tschatzl/8177044/webrev/ > > > > > > JDK-8178148: Log more detailed information about scan rs phase > > > http://cr.openjdk.java.net/~tschatzl/8178148/webrev/ > > > > > > JDK-8175554: Improve G1UpdateRSOrPushRefClosure > > > http://cr.openjdk.java.net/~tschatzl/8175554/webrev/ > > Looks good to me. > > I only have minor nits. > > > > ------------------------------------------------------ > > src/share/vm/gc/g1/g1OopClosures.hpp > > ???78???virtual void do_oop(oop* p) { do_oop_nv(p); } > > Misaligned with above line. > > > > ------------------------------------------------------ > > src/share/vm/gc/g1/g1RemSet.hpp > > ? 204???????????????????G1UpdateOrScanRSClosure* push_heap_cl, > > Rename to reflect new closure name? > > > > ------------------------------------------------------ > > src/share/vm/gc/g1/g1RootProcessor.hpp > > Copyright update. 
> > > > ------------------------------------------------------ > > src/share/vm/gc/g1/g1_specialized_oop_closures.hpp > > ???45???????f(G1UpdateOrScanRSClosure,_nv)?????????\ > > Misaligned '\'. > > > ? I fixed all this in addition to incorporating ErikD's comments that > asked for factoring out two parts of the G1ParScanClosure and > G1UpdateOrScanRSClosure that were equal now. > > I did some performance testing again due to that, and also found that > the check to filter out non-cross-region references > in?G1UpdateOrScanRSClosure::do_oop_nv() seemed faster, so I also > reverted it to the old code. > > Also in this change G1UpdateOrScanRSClosure::do_oop_nv() did not > update > _has_refs_into_cset as before. Fixed that as well. > > Thanks, > ? Thomas > From stefan.karlsson at oracle.com Thu Jun 22 13:19:46 2017 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Thu, 22 Jun 2017 15:19:46 +0200 Subject: RFR: 8178490: Usages of is_object_aligned with pointers are broken In-Reply-To: <1498125612.2831.19.camel@oracle.com> References: <1498125612.2831.19.camel@oracle.com> Message-ID: Thanks, Thomas. StefanK On 2017-06-22 12:00, Thomas Schatzl wrote: > Hi Stefan, > > On Thu, 2017-06-22 at 11:16 +0200, Stefan Karlsson wrote: >> Hi all, >> >> Please review this patch to fix and strengthen is_object_aligned >> checks >> when pointers are passed in: >> >> http://cr.openjdk.java.net/~stefank/8178490/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8178490 >> >> is_object_aligned only works correctly for sizes measured in words. >> > looks good. > > Thomas > From stefan.karlsson at oracle.com Thu Jun 22 13:20:10 2017 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Thu, 22 Jun 2017 15:20:10 +0200 Subject: RFR: 818269: Remove gcTrace.hpp include from referenceProcessor.hpp In-Reply-To: <1498124699.2831.18.camel@oracle.com> References: <30ae52d3-67da-bda4-b25c-e9ed0cb079ee@oracle.com> <1498124699.2831.18.camel@oracle.com> Message-ID: <3994480b-f286-9f35-0189-413600605c89@oracle.com> Thanks, Thomas. StefanK On 2017-06-22 11:44, Thomas Schatzl wrote: > Hi, > > On Thu, 2017-06-22 at 10:46 +0200, Stefan Karlsson wrote: >> Hi all, >> >> Please review this trivial change to remove an include of gcTrace.hpp >> in >> referenceProcessor.hpp, and changes needed to get the code to >> compile >> after that. >> >> http://cr.openjdk.java.net/~stefank/8182696/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8182696 > ship it. > > Thomas From kim.barrett at oracle.com Thu Jun 22 15:19:28 2017 From: kim.barrett at oracle.com (Kim Barrett) Date: Thu, 22 Jun 2017 11:19:28 -0400 Subject: RFR: 818269: Remove gcTrace.hpp include from referenceProcessor.hpp In-Reply-To: References: <30ae52d3-67da-bda4-b25c-e9ed0cb079ee@oracle.com> Message-ID: <086C5847-9FD6-4B8E-BE64-913BB87D3F23@oracle.com> > On Jun 22, 2017, at 4:59 AM, Stefan Karlsson wrote: > > This mail was supposed to go to hotspot-gc-dev (To:ed) not to jdk10-dev (BCC:ed). > > Thanks, > StefanK > > On 2017-06-22 10:46, Stefan Karlsson wrote: >> Hi all, >> Please review this trivial change to remove an include of gcTrace.hpp in referenceProcessor.hpp, and changes needed to get the code to compile after that. >> http://cr.openjdk.java.net/~stefank/8182696/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8182696 >> I was prototyping ways to get more type safe time durations in HotSpot and found that whenever I changed my header file, that almost all HotSpot cpp files were recompiled. 
>> I tracked it down to the unused include of gcTrace.hpp in referenceProcessor.hpp.
>> We could probably also try to figure out why changes to referenceProcessor.hpp trigger recompiles of the entire source code, but I'd like to leave that exercise for another day.
>> Thanks,
>> StefanK

Looks good.

There's potential for interaction between this and 8181449, but we can sort that out if it happens.

From stefan.karlsson at oracle.com Thu Jun 22 15:40:22 2017
From: stefan.karlsson at oracle.com (Stefan Karlsson)
Date: Thu, 22 Jun 2017 17:40:22 +0200
Subject: RFR: 8182696: Remove gcTrace.hpp include from referenceProcessor.hpp
In-Reply-To: <086C5847-9FD6-4B8E-BE64-913BB87D3F23@oracle.com>
References: <30ae52d3-67da-bda4-b25c-e9ed0cb079ee@oracle.com> <086C5847-9FD6-4B8E-BE64-913BB87D3F23@oracle.com>
Message-ID: <2d2c3eb8-b5f7-e258-e07b-26fd20f1cd9c@oracle.com>

On 2017-06-22 17:19, Kim Barrett wrote:
>> On Jun 22, 2017, at 4:59 AM, Stefan Karlsson wrote:
>>
>> This mail was supposed to go to hotspot-gc-dev (To:ed) not to jdk10-dev (BCC:ed).
>>
>> Thanks,
>> StefanK
>>
>> On 2017-06-22 10:46, Stefan Karlsson wrote:
>>> Hi all,
>>> Please review this trivial change to remove an include of gcTrace.hpp in referenceProcessor.hpp, and the changes needed to get the code to compile after that.
>>> http://cr.openjdk.java.net/~stefank/8182696/webrev.00/
>>> https://bugs.openjdk.java.net/browse/JDK-8182696
>>> I was prototyping ways to get more type-safe time durations in HotSpot and found that whenever I changed my header file, almost all HotSpot cpp files were recompiled. I tracked it down to the unused include of gcTrace.hpp in referenceProcessor.hpp.
>>> We could probably also try to figure out why changes to referenceProcessor.hpp trigger recompiles of the entire source code, but I'd like to leave that exercise for another day.
>>> Thanks,
>>> StefanK
> Looks good.
>
> There's potential for interaction between this and 8181449, but we can sort that out if it happens.

Thanks, Kim. I'll wait until your change has been pushed, and will resolve any conflicts.

StefanK

From email.sundarms at gmail.com Thu Jun 22 16:49:16 2017
From: email.sundarms at gmail.com (Sundara Mohan M)
Date: Thu, 22 Jun 2017 09:49:16 -0700
Subject: G1OldCSetRegionThresholdPercent under ExperimentalFlag
In-Reply-To: <1498126699.2831.29.camel@oracle.com>
References: <1498126699.2831.29.camel@oracle.com>
Message-ID: 

Hi Thomas,
Thanks for the explanation.

I was trying to debug why it is not including some old regions even though it had ~100 ms left (though the Ergo logs say it has accommodated all the regions it could within the given 500 ms). Adding some log snippets here, and attaching the entire logs in case that helps.

Running the app with a 31G heap.

CommandLine flags: -XX:GCLogFileSize=20971520 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=out-of-memory-heap-dump -XX:InitialHeapSize=33285996544 -XX:MaxGCPauseMillis=500 -XX:MaxHeapSize=33285996544 -XX:MetaspaceSize=536870912 -XX:NumberOfGCLogFiles=20 -XX:+ParallelRefProcEnabled -XX:+PrintAdaptiveSizePolicy -XX:+PrintGC -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseG1GC -XX:+UseGCLogFileRotation -XX:+UseStringDeduplication

...
2017-06-19T22:54:05.488+0000: 9345.322: [GC pause (G1 Evacuation Pause) (mixed) Desired survivor size 104857600 bytes, new threshold 1 (max 15) - age 1: 131296848 bytes, 131296848 total - age 2: 237559952 bytes, 368856800 total - age 3: 137259376 bytes, 506116176 total 9345.322: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 130042, predicted base time: 171.58 ms, remaining time: 328.42 ms, target pause time: 500.00 ms] 9345.322: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 121 regions, survivors: 77 regions, predicted young region time: 249.33 ms] * 9345.322: [G1Ergonomics (CSet Construction) finish adding old regions to CSet, reason: predicted time is too high, predicted time: 0.44 ms, remaining time: 0.00 ms, old: 204 regions, min: 204 regions]* 9345.322: [G1Ergonomics (CSet Construction) added expensive regions to CSet, reason: old CSet region num not reached min, old: 204 regions, expensive: 11 regions, min: 204 regions, remaining time: 0.00 ms] 9345.322: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 121 regions, survivors: 77 regions, old: 204 regions, predicted pause time: 504.35 ms, target pause time: 500.00 ms] 9345.691: [G1Ergonomics (Mixed GCs) continue mixed GCs, reason: candidate old regions available, candidate old regions: 1425 regions, reclaimable: 11364516952 bytes (34.14 %), threshold: 5.00 %] , 0.3691404 secs] [Parallel Time: 301.4 ms, GC Workers: 13] [GC Worker Start (ms): Min: 9345323.0, Avg: 9345323.3, Max: 9345323.6, Diff: 0.6] [Ext Root Scanning (ms): Min: 0.9, Avg: 1.2, Max: 1.6, Diff: 0.6, Sum: 15.9] [Update RS (ms): Min: 62.1, Avg: 62.3, Max: 63.0, Diff: 0.9, Sum: 809.4] [Processed Buffers: Min: 35, Avg: 51.8, Max: 91, Diff: 56, Sum: 674] [Scan RS (ms): Min: 11.3, Avg: 12.1, Max: 14.8, Diff: 3.6, Sum: 157.5] [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1] [Object Copy (ms): Min: 222.2, Avg: 224.8, Max: 225.3, Diff: 3.1, Sum: 2922.8] [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.4] [Termination Attempts: Min: 1, Avg: 15.6, Max: 24, Diff: 23, Sum: 203] [GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.2, Sum: 1.1] [GC Worker Total (ms): Min: 300.3, Avg: 300.6, Max: 300.8, Diff: 0.5, Sum: 3907.2] [GC Worker End (ms): Min: 9345623.8, Avg: 9345623.9, Max: 9345624.0, Diff: 0.2] [Code Root Fixup: 0.1 ms] [Code Root Purge: 0.0 ms] [String Dedup Fixup: 43.9 ms, GC Workers: 13] [Queue Fixup (ms): Min: 0.4, Avg: 2.2, Max: 3.7, Diff: 3.3, Sum: 28.6] [Table Fixup (ms): Min: 39.8, Avg: 41.2, Max: 42.9, Diff: 3.2, Sum: 535.8] [Clear CT: 3.4 ms] [Other: 20.2 ms] [Choose CSet: 0.3 ms] [Ref Proc: 13.4 ms] [Ref Enq: 1.0 ms] [Redirty Cards: 2.0 ms] [Humongous Register: 0.2 ms] [Humongous Reclaim: 0.1 ms] [Free CSet: 2.1 ms] [Eden: 968.0M(968.0M)->0.0B(1472.0M) Survivors: 616.0M->112.0M Heap: 15.6G(31.0G)->13.1G(31.0G)] * [Times: user=4.53 sys=0.00, real=0.36 secs]* .... 
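(To make the CSet ergonomics of the snippet above concrete - this arithmetic is an annotation, not part of the log: the target pause is 500.00 ms and the predicted base time 171.58 ms, leaving 500.00 - 171.58 = 328.42 ms, exactly the "remaining time" printed. The 121 eden + 77 survivor = 198 young regions are predicted to take 249.33 ms, leaving 328.42 - 249.33 = 79.09 ms for old regions. That budget runs out before the minimum of 204 old regions is reached, so the last 11 regions are added as "expensive" regions beyond the budget, which is why the predicted pause ends up at 504.35 ms, slightly above the 500 ms target.)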
2017-06-19T22:54:47.655+0000: 9387.489: [GC pause (G1 Evacuation Pause) (mixed) Desired survivor size 104857600 bytes, new threshold 15 (max 15) - age 1: 31749256 bytes, 31749256 total 9387.489: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 127449, predicted base time: 168.88 ms, remaining time: 331.12 ms, target pause time: 500.00 ms] 9387.489: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 184 regions, survivors: 14 regions, predicted young region time: 62.79 ms] * 9387.490: [G1Ergonomics (CSet Construction) finish adding old regions to CSet, reason: old CSet region num reached max, old: 397 regions, max: 397 regions]* 9387.490: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 184 regions, survivors: 14 regions, old: 397 regions, predicted pause time: 390.18 ms, target pause time: 500.00 ms] * 9387.659: [G1Ergonomics (Mixed GCs) continue mixed GCs, reason: candidate old regions available, candidate old regions: 1028 regions, reclaimable: 8047410104 bytes (24.18 %), threshold: 5.00 %]* *, 0.1700662 secs]* [Parallel Time: 101.4 ms, GC Workers: 13] [GC Worker Start (ms): Min: 9387490.4, Avg: 9387490.8, Max: 9387491.1, Diff: 0.6] [Ext Root Scanning (ms): Min: 0.7, Avg: 1.1, Max: 1.6, Diff: 0.9, Sum: 14.3] [Update RS (ms): Min: 27.0, Avg: 27.8, Max: 28.9, Diff: 1.8, Sum: 361.9] [Processed Buffers: Min: 34, Avg: 51.4, Max: 88, Diff: 54, Sum: 668] [Scan RS (ms): Min: 25.8, Avg: 27.1, Max: 27.4, Diff: 1.6, Sum: 352.2] [Code Root Scanning (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.2, Sum: 0.7] [Object Copy (ms): Min: 42.8, Avg: 43.8, Max: 44.5, Diff: 1.8, Sum: 569.9] [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1] [Termination Attempts: Min: 1, Avg: 9.5, Max: 14, Diff: 13, Sum: 124] [GC Worker Other (ms): Min: 0.0, Avg: 0.2, Max: 0.4, Diff: 0.4, Sum: 2.3] [GC Worker Total (ms): Min: 99.7, Avg: 100.1, Max: 100.6, Diff: 0.9, Sum: 1301.4] [GC Worker End (ms): Min: 9387590.7, Avg: 9387590.9, Max: 9387591.1, Diff: 0.4] [Code Root Fixup: 0.3 ms] [Code Root Purge: 0.0 ms] [String Dedup Fixup: 43.5 ms, GC Workers: 13] [Queue Fixup (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0] [Table Fixup (ms): Min: 43.0, Avg: 43.2, Max: 43.4, Diff: 0.3, Sum: 561.3] [Clear CT: 3.9 ms] [Other: 21.1 ms] [Choose CSet: 0.8 ms] [Ref Proc: 12.8 ms] [Ref Enq: 0.9 ms] [Redirty Cards: 0.9 ms] [Humongous Register: 0.2 ms] [Humongous Reclaim: 0.1 ms] [Free CSet: 4.2 ms] [Eden: 1472.0M(1472.0M)->0.0B(1424.0M) Survivors: 112.0M->160.0M Heap: 14.5G(31.0G)->10.1G(31.0G)] * [Times: user=1.93 sys=0.00, real=0.17 secs]* ..... 
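(A back-of-envelope check on where "min: 204 regions" and "max: 397 regions" come from; the region size is an inference, not something printed in the log: for a ~31 GiB heap G1 would pick an 8 MB region size (heap size divided by the 2048-region target, rounded down to a power of two), giving roughly 3968 regions. The default G1OldCSetRegionThresholdPercent=10 then caps a single mixed collection at ceil(3968 * 0.10) = 397 old regions, matching the "old CSet region num reached max" line above. The minimum is roughly the number of candidate regions at the end of marking divided by G1MixedGCCountTarget (default 8), e.g. around 1630 / 8 = 204.)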
2017-06-19T22:55:29.656+0000: 9429.490: [GC pause (G1 Evacuation Pause) (mixed) Desired survivor size 104857600 bytes, new threshold 15 (max 15) - age 1: 44204040 bytes, 44204040 total - age 2: 31422896 bytes, 75626936 total 9429.490: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 64391, predicted base time: 130.82 ms, remaining time: 369.18 ms, target pause time: 500.00 ms] 9429.490: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 178 regions, survivors: 20 regions, predicted young region time: 69.26 ms] * 9429.491: [G1Ergonomics (CSet Construction) finish adding old regions to CSet, reason: predicted time is too high, predicted time: 2.12 ms, remaining time: 0.00 ms, old: 204 regions, min: 204 regions]* 9429.491: [G1Ergonomics (CSet Construction) added expensive regions to CSet, reason: old CSet region num not reached min, old: 204 regions, expensive: 72 regions, min: 204 regions, remaining time: 0.00 ms] 9429.491: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 178 regions, survivors: 20 regions, old: 204 regions, predicted pause time: 684.25 ms, target pause time: 500.00 ms] 9429.663: [G1Ergonomics (Mixed GCs) continue mixed GCs, reason: candidate old regions available, candidate old regions: 824 regions, reclaimable: 6351099672 bytes (19.08 %), threshold: 5.00 %] , 0.1729571 secs] [Parallel Time: 102.6 ms, GC Workers: 13] [GC Worker Start (ms): Min: 9429491.3, Avg: 9429491.6, Max: 9429491.9, Diff: 0.6] [Ext Root Scanning (ms): Min: 0.9, Avg: 1.3, Max: 1.8, Diff: 0.9, Sum: 16.9] [Update RS (ms): Min: 18.7, Avg: 19.1, Max: 20.9, Diff: 2.2, Sum: 248.9] [Processed Buffers: Min: 18, Avg: 32.6, Max: 58, Diff: 40, Sum: 424] [Scan RS (ms): Min: 15.5, Avg: 17.1, Max: 18.5, Diff: 2.9, Sum: 222.8] [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.5] [Object Copy (ms): Min: 62.3, Avg: 63.9, Max: 64.4, Diff: 2.2, Sum: 831.3] [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1] [Termination Attempts: Min: 1, Avg: 2.6, Max: 5, Diff: 4, Sum: 34] [GC Worker Other (ms): Min: 0.0, Avg: 0.2, Max: 0.4, Diff: 0.4, Sum: 2.2] [GC Worker Total (ms): Min: 101.4, Avg: 101.7, Max: 102.1, Diff: 0.7, Sum: 1322.7] [GC Worker End (ms): Min: 9429593.3, Avg: 9429593.4, Max: 9429593.6, Diff: 0.4] [Code Root Fixup: 0.2 ms] [Code Root Purge: 0.0 ms] [String Dedup Fixup: 45.4 ms, GC Workers: 13] [Queue Fixup (ms): Min: 0.0, Avg: 0.1, Max: 0.5, Diff: 0.5, Sum: 1.5] [Table Fixup (ms): Min: 43.9, Avg: 44.1, Max: 44.2, Diff: 0.4, Sum: 573.4] [Clear CT: 4.3 ms] [Other: 20.5 ms] [Choose CSet: 0.5 ms] [Ref Proc: 14.3 ms] [Ref Enq: 1.2 ms] [Redirty Cards: 0.7 ms] [Humongous Register: 0.2 ms] [Humongous Reclaim: 0.1 ms] [Free CSet: 2.4 ms] [Eden: 1424.0M(1424.0M)->0.0B(1392.0M) Survivors: 160.0M->192.0M Heap: 11.5G(31.0G)->8796.0M(31.0G)] * [Times: user=1.95 sys=0.00, real=0.17 secs]* On Thu, Jun 22, 2017 at 3:18 AM, Thomas Schatzl wrote: > Hi, > > On Tue, 2017-06-20 at 23:45 -0700, Sundara Mohan M wrote: > > Hi, > > Can someone shed more light on why G1OldCSetRegionThresholdPercent > > flag is under experimental (Need to add - > > XX:+UnlockExperimentalVMOptions to modify it.) > > in my view -XX:+UnlockExperimentalVMOptions mostly serves mostly as a > "I really want to do that and I know what I am doing" confirmation from > the user that he is aware that using this (in this case) option to > influence the set of regions taken in during mixed gc you might get > surprising behavior. 
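(For completeness, the unlock option Thomas mentions above has to precede the experimental flag on the command line; a sketch, where the value 20 is purely illustrative and not a recommendation:

java -XX:+UseG1GC -XX:MaxGCPauseMillis=500 -XX:+UnlockExperimentalVMOptions -XX:G1OldCSetRegionThresholdPercent=20 ... MyApp
)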
> > Also, I think there has been no official documentation for it - also
> because it should be very rarely needed.
> In particular, I am curious about the case when it would be useful to
> change it. Could you give some log files showing that there is an issue
> with the upper bound for the number of old gen regions to take during
> GC? (i.e. the amount of old gen regions taken is too small and there is
> ample pause time left and it matters to clean up more regions in a
> single mixed gc?)
>
> Sometimes there are problems with the lower bound that is controlled by
> the -XX:G1MixedGCCountTarget (product level) option.
>
> Hth,
> Thomas

From rkennke at redhat.com Thu Jun 22 20:19:31 2017
From: rkennke at redhat.com (Roman Kennke)
Date: Thu, 22 Jun 2017 22:19:31 +0200
Subject: RFR: Parallelize safepoint cleanup
In-Reply-To: <5e7c7d00-4acd-bea3-3525-33dbd9159efb@oracle.com>
References: <486b5a72-bef8-4ebc-2729-3fe3aa3ab3b9@oracle.com> <5c80f8df-27c9-f9a9-dc6d-47f9c6019a61@redhat.com> <46ad874e-eb41-7927-265a-40dea92dfe1e@oracle.com> <5711258b-99b0-e06f-ba6e-0b6b55d88345@redhat.com> <0e1e2779-9316-b756-6cc8-e0c8add14a94@oracle.com> <1910961c-11bd-0e86-dd03-4fce66b9969f@redhat.com> <2b466176-b688-53a8-bef9-c7ec2c8c745b@oracle.com> <42872a15-d26c-9798-c6a2-f3f7c945baf7@redhat.com> <5e7c7d00-4acd-bea3-3525-33dbd9159efb@oracle.com>
Message-ID: <6f2c6de7-298b-bf14-ab1f-430c4acd43c9@redhat.com>

So here's the latest iteration of that patch:

http://cr.openjdk.java.net/~rkennke/8180932/webrev.08/

I checked and fixed all the counters. The problem here is that they are not updated in a single place (deflate_idle_monitors()) but in several places, potentially by multiple threads. I split up deflation into prepare_... and finish_... methods that initialize the local counters and update the global counters, respectively, and pass around a counters object (allocated on the stack) to the various code paths that use it. Updating the counters always happens under a lock, so there's no need to do anything special with regards to concurrency.

I also checked the nmethod marking, but there doesn't seem to be anything in that code that looks problematic under concurrency. The worst that can happen is that two threads write the same value into an nmethod field. I think we can live with that ;-)

Good to go?

Tested by running specjvm and jcstress fastdebug+release without issues.

Roman

Am 02.06.2017 um 12:39 schrieb Robbin Ehn:
> Hi Roman,
>
> On 06/02/2017 11:41 AM, Roman Kennke wrote:
>> Hi David,
>> thanks for reviewing. I'll be on vacation the next two weeks too, with
>> only sporadic access to work stuff.
>> Yes, exposure will not be as good as otherwise, but it's not totally
>> untested either: the serial code path is the same as the parallel, the
>> only difference is that it's not actually called by multiple threads.
>> It's ok I think.
>>
>> I found two more issues that I think should be addressed:
>> - There are some counters in deflate_idle_monitors() and I'm not sure I
>> correctly handle them in the split-up and MT'ed thread-local/ global
>> list deflation
>> - nmethod marking seems to unconditionally poke true or something like
>> that in nmethod fields.
This doesn't hurt correctness-wise, but it's >> probably worth checking if it's already true, especially when doing this >> with multiple threads concurrently. >> >> I'll send an updated patch around later, I hope I can get to it today... > > I'll review that when you get it out. > I think this looks as a reasonable step before we tackle this with a > major effort, such as the JEP you and Carsten doing. > And another effort to 'fix' nmethods marking. > > Internal discussion yesterday lead us to conclude that the runtime > will probably need more threads. > This would be a good driver to do a 'global' worker pool which serves > both gc, runtime and safepoints with threads. > >> >> Roman >> >>> Hi Roman, >>> >>> I am about to disappear on an extended vacation so will let others >>> pursue this. IIUC this is longer an opt-in by the user at runtime, but >>> an opt-in by the particular GC developers. Okay. My only concern with >>> that is if Shenandoah is the only GC that currently opts in then this >>> code is not going to get much testing and will be more prone to >>> incidental breakage. > > As I mentioned before, it seem like Erik ? have some idea, maybe he > can do this after his barrier patch. > > Thanks! > > /Robbin > >>> >>> Cheers, >>> David >>> >>> On 2/06/2017 2:21 AM, Roman Kennke wrote: >>>> Am 01.06.2017 um 17:50 schrieb Roman Kennke: >>>>> Am 01.06.2017 um 14:18 schrieb Robbin Ehn: >>>>>> Hi Roman, >>>>>> >>>>>> On 06/01/2017 11:29 AM, Roman Kennke wrote: >>>>>>> Am 31.05.2017 um 22:06 schrieb Robbin Ehn: >>>>>>>> Hi Roman, I agree that is really needed but: >>>>>>>> >>>>>>>> On 05/31/2017 10:27 AM, Roman Kennke wrote: >>>>>>>>> I realized that sharing workers with GC is not so easy. >>>>>>>>> >>>>>>>>> We need to be able to use the workers at a safepoint during >>>>>>>>> concurrent >>>>>>>>> GC work (which also uses the same workers). This does not only >>>>>>>>> require >>>>>>>>> that those workers be suspended, like e.g. >>>>>>>>> SuspendibleThreadSet::yield(), but they need to be idle, i.e. >>>>>>>>> have >>>>>>>>> finished their tasks. This needs some careful handling to work >>>>>>>>> without >>>>>>>>> races: it requires a SuspendibleThreadSetJoiner around the >>>>>>>>> corresponding >>>>>>>>> run_task() call and also the tasks themselves need to join the >>>>>>>>> STS and >>>>>>>>> handle requests for safepoints not by yielding, but by leaving >>>>>>>>> the >>>>>>>>> task. >>>>>>>>> This is far too peculiar for me to make the call to hook up GC >>>>>>>>> workers >>>>>>>>> for safepoint cleanup, and I thus removed those parts. I left the >>>>>>>>> API in >>>>>>>>> CollectedHeap in place. I think GC devs who know better about G1 >>>>>>>>> and CMS >>>>>>>>> should make that call, or else just use a separate thread pool. >>>>>>>>> >>>>>>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.05/ >>>>>>>>> >>>>>>>>> >>>>>>>>> Is it ok now? >>>>>>>> I still think you should put the "Parallel Safepoint Cleanup" >>>>>>>> workers >>>>>>>> inside Shenandoah, >>>>>>>> so the SafepointSynchronizer only calls get_safepoint_workers, >>>>>>>> e.g.: >>>>>>>> >>>>>>>> _cleanup_workers = heap->get_safepoint_workers(); >>>>>>>> _num_cleanup_workers = _cleanup_workers != NULL ? 
>>>>>>>> _cleanup_workers->total_workers() : 1;
>>>>>>>> ParallelSPCleanupTask cleanup(_cleanup_subtasks);
>>>>>>>> StrongRootsScope srs(_num_cleanup_workers);
>>>>>>>> if (_cleanup_workers != NULL) {
>>>>>>>> _cleanup_workers->run_task(&cleanup, _num_cleanup_workers);
>>>>>>>> } else {
>>>>>>>> cleanup.work(0);
>>>>>>>> }
>>>>>>>>
>>>>>>>> That way you don't even need your new flags, but it will be up to the
>>>>>>>> other GCs to make their worker available
>>>>>>>> or cheat with a separate workgang.
>>>>>>> I can do that, I don't mind. The question is, do we want that?
>>>>>> The problem is that we do not want to haste such decision, we believe
>>>>>> there is a better solution.
>>>>>> I think you also would want another solution.
>>>>>> But it's seems like such solution with 1 'global' thread pool either
>>>>>> own by GC or the VM it self is quite the undertaking.
>>>>>> Since this probably will not be done any time soon my suggestion is,
>>>>>> to not hold you back (we also want this), just to make
>>>>>> the code parallel and as an intermediate step ask the GC if it minds
>>>>>> sharing it's thread.
>>>>>>
>>>>>> Now when Shenandoah is merged it's possible that e.g. G1 will share
>>>>>> the code for a separate thread pool, do something of it's own or
>>>>>> wait until the bigger question about thread pool(s) have been
>>>>>> resolved.
>>>>>>
>>>>>> By adding a thread pool directly to the SafepointSynchronizer and
>>>>>> flags for it we might limit our future options.
>>>>>>
>>>>>>> I wouldn't call it 'cheating with a separate workgang' though. I see
>>>>>>> that both G1 and CMS suspend their worker threads at a safepoint.
>>>>>>> However:
>>>>>> Yes it's not cheating but I want decent heuristics between e.g. number
>>>>>> of concurrent marking threads and parallel safepoint threads since
>>>>>> they compete for cpu time.
>>>>>> As the code looks now, I think that decisions must be made by the GC.
>>>>> Ok, I see your point. I updated the proposed patch accordingly:
>>>>>
>>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.06/
>>>>>
>>>> Oops. Minor mistake there. Correction:
>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.07/
>>>>
>>>> (Removed 'class WorkGang' from safepoint.hpp, and forgot to add it into
>>>> collectedHeap.hpp, resulting in build failure...)
>>>>
>>>> Roman

From thomas.schatzl at oracle.com Thu Jun 22 21:16:19 2017
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Thu, 22 Jun 2017 23:16:19 +0200
Subject: G1OldCSetRegionThresholdPercent under ExperimentalFlag
In-Reply-To: 
References: <1498126699.2831.29.camel@oracle.com>
Message-ID: <1498166179.2710.44.camel@oracle.com>

Hi Sundara,

On Thu, 2017-06-22 at 09:49 -0700, Sundara Mohan M wrote:
> Hi Thomas,
> Thanks for the explanation.
>
> I was trying to debug why it is not including some old region even
> though it had ~100ms (though Ergo logs say it has accommodated all
> regions to cover given 500ms).

Ergo is self-training, but it takes some time to adapt to the situation.

For as long-running a run as the log shows (thanks!), the number of mixed gcs is relatively small, and they are pretty far apart (in the range of hours between mixed gc phases). The distribution of young gc occurrences is far from equal (even considering differences in used young gen size), so it seems that the application is quite bursty from time to time.
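As a rough illustration of why such a self-training predictor lags a bursty workload - this is a made-up Java sketch, not HotSpot's actual cost model, and the weight 0.3 and the sample values are invented:

public class DecayingAverage {
    private final double alpha; // weight given to the newest sample
    private double estimate;
    private boolean seeded;

    DecayingAverage(double alpha) { this.alpha = alpha; }

    void sample(double value) {
        // Exponentially decaying average: older observations fade out gradually.
        estimate = seeded ? (1.0 - alpha) * estimate + alpha * value : value;
        seeded = true;
    }

    double prediction() { return estimate; }

    public static void main(String[] args) {
        DecayingAverage regionCostMs = new DecayingAverage(0.3);
        // Trained during a calm phase (cheap regions), then a burst arrives:
        for (double v : new double[] {0.4, 0.5, 0.4, 2.0, 2.2}) {
            regionCostMs.sample(v);
            System.out.printf("sample=%.1f ms, prediction=%.2f ms%n",
                              v, regionCostMs.prediction());
        }
        // The prediction trails the burst: several samples are needed before
        // the estimated per-region cost reflects the new behavior, which is
        // the kind of lag described in the analysis below.
    }
}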
The different mixed gc/old gen space reclamation phases are never particularly long either, so my best guess would be that the values used for how long particular regions take to evacuate are messed up.

I.e., from the graphs it roughly looks like there is a mixed gc phase at the start of every bursty phase (as far as I could identify them by looking at the graphs), and one during the phase, typically near the end.

So depending on when that mixed gc occurs (at the start of such a burst or within it), g1 trains itself on application behavior that may differ from the behavior it later applies these values to. The prediction is always some kind of moving average, which does not necessarily reflect reality.

Very good adaptation to this behavior seems beyond what g1 can do at the moment.

One could in theory force G1 to give much more weight to recent observations to make adaptation quicker (i.e. change some factors in that average calculation); but there is no user option for that, and it may open a separate can of worms (currently it seems not to discount older observations very eagerly compared to more recent ones, if I read the code correctly).

But that is just something I made up right now by staring at your log graphs, I may be wrong :)

It is unfortunately impossible to determine the exact values for these predictions in a product VM (e.g. comparing the actual/predicted detail values the per-region prediction is made of) at this time, as there is no way to get these relevant values out of the VM.

Back to your problem (if there is one, you did not state any ;)): the log actually shows a few issues with mixed gc: the one you explained, about not taking enough old gen regions because G1OldCSetRegionThresholdPercent is too low, as you suspected (still not reaching max pause time; case 1), and the cases where the number of old gen regions taken within the budget is too low, so the collection set is filled up with "expensive" old gen regions. However, I am seeing the actual time taken being both too low and too high there (cases 2 and 3).

Not sure what your goals are here, and what the actual issue is, but

- you can probably fix case 1 by increasing the mentioned -XX:G1OldCSetRegionThresholdPercent option if that behavior annoys you.

- fix either case 2 or case 3 by decreasing or increasing -XX:G1MixedGCCountTarget (one direction increases the minimum number of regions to take, the other decreases it).

All in all an interesting case to look at :)

Thanks a lot,
  Thomas

> Adding some log snippets here and attaching entire logs in case if
> that helps.
>
> Running app with 31G
>
> CommandLine flags: -XX:GCLogFileSize=20971520
> -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=out-of-memory-heap-
> dump -XX:InitialHeapSize=33285996544 -XX:MaxGCPauseMillis=500
> -XX:MaxHeapSize=33285996544 -XX:MetaspaceSize=536870912
> -XX:NumberOfGCLogFiles=20 -XX:+ParallelRefProcEnabled
> -XX:+PrintAdaptiveSizePolicy -XX:+PrintGC
> -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDateStamps
> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
> -XX:+PrintTenuringDistribution -XX:+UseCompressedClassPointers
> -XX:+UseCompressedOops -XX:+UseG1GC -XX:+UseGCLogFileRotation
> -XX:+UseStringDeduplication
>
> ...
> 2017-06-19T22:54:05.488+0000: 9345.322: [GC pause (G1 Evacuation
> Pause) (mixed)
> Desired survivor size 104857600 bytes, new threshold 1 (max 15)
> - age ? 1: ?131296848 bytes, ?131296848 total
> - age ? 2: ?237559952 bytes, ?368856800 total
> - age ?
3: ?137259376 bytes, ?506116176 total > ?9345.322: [G1Ergonomics (CSet Construction) start choosing CSet, > _pending_cards: 130042, predicted base time: 171.58 ms, remaining > time: 328.42 ms, target pause time: 500.00 ms] > ?9345.322: [G1Ergonomics (CSet Construction) add young regions to > CSet, eden: 121 regions, survivors: 77 regions, predicted young > region time: 249.33 ms] > ?9345.322: [G1Ergonomics (CSet Construction) finish adding old > regions to CSet, reason: predicted time is too high, predicted time: > 0.44 ms, remaining time: 0.00 ms, old: 204 regions, min: 204 regions] > ?9345.322: [G1Ergonomics (CSet Construction) added expensive regions > to CSet, reason: old CSet region num not reached min, old: 204 > regions, expensive: 11 regions, min: 204 regions, remaining time: > 0.00 ms] > ?9345.322: [G1Ergonomics (CSet Construction) finish choosing CSet, > eden: 121 regions, survivors: 77 regions, old: 204 regions, predicted > pause time: 504.35 ms, target pause time: 500.00 ms] > ?9345.691: [G1Ergonomics (Mixed GCs) continue mixed GCs, reason: > candidate old regions available, candidate old regions: 1425 regions, > reclaimable: 11364516952 bytes (34.14 %), threshold: 5.00 %] > , 0.3691404 secs] > ? ?[Parallel Time: 301.4 ms, GC Workers: 13] > ? ? ? [GC Worker Start (ms): Min: 9345323.0, Avg: 9345323.3, Max: > 9345323.6, Diff: 0.6] > ? ? ? [Ext Root Scanning (ms): Min: 0.9, Avg: 1.2, Max: 1.6, Diff: > 0.6, Sum: 15.9] > ? ? ? [Update RS (ms): Min: 62.1, Avg: 62.3, Max: 63.0, Diff: 0.9, > Sum: 809.4] > ? ? ? ? ?[Processed Buffers: Min: 35, Avg: 51.8, Max: 91, Diff: 56, > Sum: 674] > ? ? ? [Scan RS (ms): Min: 11.3, Avg: 12.1, Max: 14.8, Diff: 3.6, Sum: > 157.5] > ? ? ? [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: > 0.0, Sum: 0.1] > ? ? ? [Object Copy (ms): Min: 222.2, Avg: 224.8, Max: 225.3, Diff: > 3.1, Sum: 2922.8] > ? ? ? [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, > Sum: 0.4] > ? ? ? ? ?[Termination Attempts: Min: 1, Avg: 15.6, Max: 24, Diff: 23, > Sum: 203] > ? ? ? [GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.2, > Sum: 1.1] > ? ? ? [GC Worker Total (ms): Min: 300.3, Avg: 300.6, Max: 300.8, > Diff: 0.5, Sum: 3907.2] > ? ? ? [GC Worker End (ms): Min: 9345623.8, Avg: 9345623.9, Max: > 9345624.0, Diff: 0.2] > ? ?[Code Root Fixup: 0.1 ms] > ? ?[Code Root Purge: 0.0 ms] > ? ?[String Dedup Fixup: 43.9 ms, GC Workers: 13] > ? ? ? [Queue Fixup (ms): Min: 0.4, Avg: 2.2, Max: 3.7, Diff: 3.3, > Sum: 28.6] > ? ? ? [Table Fixup (ms): Min: 39.8, Avg: 41.2, Max: 42.9, Diff: 3.2, > Sum: 535.8] > ? ?[Clear CT: 3.4 ms] > ? ?[Other: 20.2 ms] > ? ? ? [Choose CSet: 0.3 ms] > ? ? ? [Ref Proc: 13.4 ms] > ? ? ? [Ref Enq: 1.0 ms] > ? ? ? [Redirty Cards: 2.0 ms] > ? ? ? [Humongous Register: 0.2 ms] > ? ? ? [Humongous Reclaim: 0.1 ms] > ? ? ? [Free CSet: 2.1 ms] > ? ?[Eden: 968.0M(968.0M)->0.0B(1472.0M) Survivors: 616.0M->112.0M > Heap: 15.6G(31.0G)->13.1G(31.0G)] > ?[Times: user=4.53 sys=0.00, real=0.36 secs] > .... > 2017-06-19T22:54:47.655+0000: 9387.489: [GC pause (G1 Evacuation > Pause) (mixed) > Desired survivor size 104857600 bytes, new threshold 15 (max 15) > - age ? 1: ? 31749256 bytes, ? 
31749256 total > ?9387.489: [G1Ergonomics (CSet Construction) start choosing CSet, > _pending_cards: 127449, predicted base time: 168.88 ms, remaining > time: 331.12 ms, target pause time: 500.00 ms] > ?9387.489: [G1Ergonomics (CSet Construction) add young regions to > CSet, eden: 184 regions, survivors: 14 regions, predicted young > region time: 62.79 ms] > ?9387.490: [G1Ergonomics (CSet Construction) finish adding old > regions to CSet, reason: old CSet region num reached max, old: 397 > regions, max: 397 regions] > ?9387.490: [G1Ergonomics (CSet Construction) finish choosing CSet, > eden: 184 regions, survivors: 14 regions, old: 397 regions, predicted > pause time: 390.18 ms, target pause time: 500.00 ms] > ?9387.659: [G1Ergonomics (Mixed GCs) continue mixed GCs, reason: > candidate old regions available, candidate old regions: 1028 regions, > reclaimable: 8047410104 bytes (24.18 %), threshold: 5.00 %] > , 0.1700662 secs] > ? ?[Parallel Time: 101.4 ms, GC Workers: 13] > ? ? ? [GC Worker Start (ms): Min: 9387490.4, Avg: 9387490.8, Max: > 9387491.1, Diff: 0.6] > ? ? ? [Ext Root Scanning (ms): Min: 0.7, Avg: 1.1, Max: 1.6, Diff: > 0.9, Sum: 14.3] > ? ? ? [Update RS (ms): Min: 27.0, Avg: 27.8, Max: 28.9, Diff: 1.8, > Sum: 361.9] > ? ? ? ? ?[Processed Buffers: Min: 34, Avg: 51.4, Max: 88, Diff: 54, > Sum: 668] > ? ? ? [Scan RS (ms): Min: 25.8, Avg: 27.1, Max: 27.4, Diff: 1.6, Sum: > 352.2] > ? ? ? [Code Root Scanning (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: > 0.2, Sum: 0.7] > ? ? ? [Object Copy (ms): Min: 42.8, Avg: 43.8, Max: 44.5, Diff: 1.8, > Sum: 569.9] > ? ? ? [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, > Sum: 0.1] > ? ? ? ? ?[Termination Attempts: Min: 1, Avg: 9.5, Max: 14, Diff: 13, > Sum: 124] > ? ? ? [GC Worker Other (ms): Min: 0.0, Avg: 0.2, Max: 0.4, Diff: 0.4, > Sum: 2.3] > ? ? ? [GC Worker Total (ms): Min: 99.7, Avg: 100.1, Max: 100.6, Diff: > 0.9, Sum: 1301.4] > ? ? ? [GC Worker End (ms): Min: 9387590.7, Avg: 9387590.9, Max: > 9387591.1, Diff: 0.4] > ? ?[Code Root Fixup: 0.3 ms] > ? ?[Code Root Purge: 0.0 ms] > ? ?[String Dedup Fixup: 43.5 ms, GC Workers: 13] > ? ? ? [Queue Fixup (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, > Sum: 0.0] > ? ? ? [Table Fixup (ms): Min: 43.0, Avg: 43.2, Max: 43.4, Diff: 0.3, > Sum: 561.3] > ? ?[Clear CT: 3.9 ms] > ? ?[Other: 21.1 ms] > ? ? ? [Choose CSet: 0.8 ms] > ? ? ? [Ref Proc: 12.8 ms] > ? ? ? [Ref Enq: 0.9 ms] > ? ? ? [Redirty Cards: 0.9 ms] > ? ? ? [Humongous Register: 0.2 ms] > ? ? ? [Humongous Reclaim: 0.1 ms] > ? ? ? [Free CSet: 4.2 ms] > ? ?[Eden: 1472.0M(1472.0M)->0.0B(1424.0M) Survivors: 112.0M->160.0M > Heap: 14.5G(31.0G)->10.1G(31.0G)] > ?[Times: user=1.93 sys=0.00, real=0.17 secs] > ..... > 2017-06-19T22:55:29.656+0000: 9429.490: [GC pause (G1 Evacuation > Pause) (mixed) > Desired survivor size 104857600 bytes, new threshold 15 (max 15) > - age ? 1: ? 44204040 bytes, ? 44204040 total > - age ? 2: ? 31422896 bytes, ? 
75626936 total > ?9429.490: [G1Ergonomics (CSet Construction) start choosing CSet, > _pending_cards: 64391, predicted base time: 130.82 ms, remaining > time: 369.18 ms, target pause time: 500.00 ms] > ?9429.490: [G1Ergonomics (CSet Construction) add young regions to > CSet, eden: 178 regions, survivors: 20 regions, predicted young > region time: 69.26 ms] > ?9429.491: [G1Ergonomics (CSet Construction) finish adding old > regions to CSet, reason: predicted time is too high, predicted time: > 2.12 ms, remaining time: 0.00 ms, old: 204 regions, min: 204 regions] > ?9429.491: [G1Ergonomics (CSet Construction) added expensive regions > to CSet, reason: old CSet region num not reached min, old: 204 > regions, expensive: 72 regions, min: 204 regions, remaining time: > 0.00 ms] > ?9429.491: [G1Ergonomics (CSet Construction) finish choosing CSet, > eden: 178 regions, survivors: 20 regions, old: 204 regions, predicted > pause time: 684.25 ms, target pause time: 500.00 ms] > ?9429.663: [G1Ergonomics (Mixed GCs) continue mixed GCs, reason: > candidate old regions available, candidate old regions: 824 regions, > reclaimable: 6351099672 bytes (19.08 %), threshold: 5.00 %] > , 0.1729571 secs] > ? ?[Parallel Time: 102.6 ms, GC Workers: 13] > ? ? ? [GC Worker Start (ms): Min: 9429491.3, Avg: 9429491.6, Max: > 9429491.9, Diff: 0.6] > ? ? ? [Ext Root Scanning (ms): Min: 0.9, Avg: 1.3, Max: 1.8, Diff: > 0.9, Sum: 16.9] > ? ? ? [Update RS (ms): Min: 18.7, Avg: 19.1, Max: 20.9, Diff: 2.2, > Sum: 248.9] > ? ? ? ? ?[Processed Buffers: Min: 18, Avg: 32.6, Max: 58, Diff: 40, > Sum: 424] > ? ? ? [Scan RS (ms): Min: 15.5, Avg: 17.1, Max: 18.5, Diff: 2.9, Sum: > 222.8] > ? ? ? [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: > 0.1, Sum: 0.5] > ? ? ? [Object Copy (ms): Min: 62.3, Avg: 63.9, Max: 64.4, Diff: 2.2, > Sum: 831.3] > ? ? ? [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, > Sum: 0.1] > ? ? ? ? ?[Termination Attempts: Min: 1, Avg: 2.6, Max: 5, Diff: 4, > Sum: 34] > ? ? ? [GC Worker Other (ms): Min: 0.0, Avg: 0.2, Max: 0.4, Diff: 0.4, > Sum: 2.2] > ? ? ? [GC Worker Total (ms): Min: 101.4, Avg: 101.7, Max: 102.1, > Diff: 0.7, Sum: 1322.7] > ? ? ? [GC Worker End (ms): Min: 9429593.3, Avg: 9429593.4, Max: > 9429593.6, Diff: 0.4] > ? ?[Code Root Fixup: 0.2 ms] > ? ?[Code Root Purge: 0.0 ms] > ? ?[String Dedup Fixup: 45.4 ms, GC Workers: 13] > ? ? ? [Queue Fixup (ms): Min: 0.0, Avg: 0.1, Max: 0.5, Diff: 0.5, > Sum: 1.5] > ? ? ? [Table Fixup (ms): Min: 43.9, Avg: 44.1, Max: 44.2, Diff: 0.4, > Sum: 573.4] > ? ?[Clear CT: 4.3 ms] > ? ?[Other: 20.5 ms] > ? ? ? [Choose CSet: 0.5 ms] > ? ? ? [Ref Proc: 14.3 ms] > ? ? ? [Ref Enq: 1.2 ms] > ? ? ? [Redirty Cards: 0.7 ms] > ? ? ? [Humongous Register: 0.2 ms] > ? ? ? [Humongous Reclaim: 0.1 ms] > ? ? ? [Free CSet: 2.4 ms] > ? ?[Eden: 1424.0M(1424.0M)->0.0B(1392.0M) Survivors: 160.0M->192.0M > Heap: 11.5G(31.0G)->8796.0M(31.0G)] > ?[Times: user=1.95 sys=0.00, real=0.17 secs] > > > On Thu, Jun 22, 2017 at 3:18 AM, Thomas Schatzl e.com> wrote: > > Hi, > > > > On Tue, 2017-06-20 at 23:45 -0700, Sundara Mohan M wrote: > > > Hi, > > > ? Can someone shed more light on > > why?G1OldCSetRegionThresholdPercent > > > flag is under experimental (Need to add??- > > > XX:+UnlockExperimentalVMOptions to modify it.) > > > > ? 
in my view -XX:+UnlockExperimentalVMOptions mostly serves mostly
> > as a
> > "I really want to do that and I know what I am doing" confirmation from
> > the user that he is aware that using this (in this case) option to
> > influence the set of regions taken in during mixed gc you might get
> > surprising behavior.
> >
> > Also, I think there has been no official documentation for it - also
> > because it should be very rarely needed.
> > In particular, I am curious about the case when it would be useful to
> > change it. Could you give some log files showing that there is an issue
> > with the upper bound for the number of old gen regions to take during
> > GC? (i.e. the amount of old gen regions taken is too small and there is
> > ample pause time left and it matters to clean up more regions in a
> > single mixed gc?)
> >
> > Sometimes there are problems with the lower bound that is controlled by
> > the -XX:G1MixedGCCountTarget (product level) option.
> >
> > Hth,
> > Thomas
> >

From email.sundarms at gmail.com Thu Jun 22 22:11:48 2017
From: email.sundarms at gmail.com (Sundara Mohan M)
Date: Thu, 22 Jun 2017 15:11:48 -0700
Subject: G1OldCSetRegionThresholdPercent under ExperimentalFlag
In-Reply-To: <1498166179.2710.44.camel@oracle.com>
References: <1498126699.2831.29.camel@oracle.com> <1498166179.2710.44.camel@oracle.com>
Message-ID: 

Thanks for the insights on Ergo.

I was trying to migrate from CMS to G1GC; the app has a low memory handler (a thread that checks memory utilization via Runtime.freeMemory() and removes some data from memory if it exceeds a threshold). With CMS this handler was not invoked frequently (for example, with 60K objects it would kick in, remove ~5K LRU objects, and continue regular operation). When I moved to G1GC this handler started kicking in frequently (e.g., with 60K objects it would remove 5K LRU objects, then kick in again shortly afterwards and remove another 5K, and so on until only 10K objects are left). So I was trying to find out why mixed GC doesn't clean up quickly enough before my low memory handler kicks in, though I do see the number of young gen collections, and the time they take, has come down by ~40%.

Another issue (maybe this is expected): after increasing G1OldCSetRegionThresholdPercent to 20% from 10%, I started seeing a few mixed GCs taking 1s (most of the time is spent in UpdateRS; MaxPause=500ms).

Will get back once I have more understanding of what is happening.

Thanks,
Sundar

On Thu, Jun 22, 2017 at 2:16 PM, Thomas Schatzl wrote:
> Hi Sundara,
>
> On Thu, 2017-06-22 at 09:49 -0700, Sundara Mohan M wrote:
>> Hi Thomas,
>> Thanks for the explanation.
>>
>> I was trying to debug why it is not including some old region even
>> though it had ~100ms (though Ergo logs say it has accommodated all
>> regions to cover given 500ms).
>
> Ergo is self-training, but it takes some time to adapt to the
> situation.
>
> As long running a run that log shows (thanks!), the number of mixed gcs
> is relatively small, and they are pretty far apart (in the range of
> hours between mixed gc phases). Young gc occurrences distribution is
> far from equal (even considering differences in used young gen size),
> so it seems that the application is quite bursty from time to time.
>
> The different mixed gc/old gen space reclamation phases are never
> particularly long either, so my best guess would be that the values
> used for how long particular regions take to evacuate are messed up.
>
> I.e.
from some graphs it roughly looks like is that there is roughly a > mixed gc phase at the start of every bursty phase (as far as I could > identify them looking at graphs), and one during the phase, typically > near the end. > > So depending on when that mixed gc occurs (at the start of such a burst > or within), g1 trains itself on different application behavior that it > later uses these values on. This is always some kind of moving average, > which does not necessarily reflect reality. > > Very good adaptation to this behavior seems beyond what g1 can do at the moment. > > One could in theory force G1 to give much more weight to recent observations to make adaptations quicker (i.e. change some factors in that average calculation); but there is no user option for that, and it may open a separate can of worms (currently it seems to not too eagerly discount older observations compared to more recent ones if I read the code correctly). > > But that is just something I made up right now by staring at your log > graphs, I may be wrong :) > > It is unfortunately impossible to determine the exact values for these > predictions in a product VM (e.g. comparing actual/predicted detail > values the per-region prediction is made of) at this time as there is > no way to get these relevant values out of the VM. > > Back to your problem (if there is one, you did not state any ;)): the > log shows a few issues with mixed gc actually: the one you explained > about not taking enough old gen regions because > the G1OldCSetRegionThresholdPercent is too low as you suspected (still > not reaching max pause time; case 1), and the cases where the number of > old gen regions taken is too low so these are filled up with "expensive > old gen regions". However I am seeing both the actual time taken being > too low and too high (case 2 and 3) > > Not sure what your goals are here, and what the actual issue is, but > > - you can probably fix case 1 with increasing the mentioned > -XX:G1OldCSetRegionThresholdPercent option if that behavior annoys you. > > - fix either case 2 or case 3 with decreasing or increasing > -XX:G1MixedGCCountTarget (one direction increases the minimum number of > regions to take, the other decreases it). > > All in all an interesting case to look at :) > > Thanks a lot, > Thomas > >> Adding some log snippets here and attaching entire logs in case if >> that helps. >> >> Running app with 31G >> >> CommandLine flags: -XX:GCLogFileSize=20971520 >> -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=out-of-memory-heap- >> dump -XX:InitialHeapSize=33285996544 -XX:MaxGCPauseMillis=500 >> -XX:MaxHeapSize=33285996544 -XX:MetaspaceSize=536870912 >> -XX:NumberOfGCLogFiles=20 -XX:+ParallelRefProcEnabled >> -XX:+PrintAdaptiveSizePolicy -XX:+PrintGC >> -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDateStamps >> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps >> -XX:+PrintTenuringDistribution -XX:+UseCompressedClassPointers >> -XX:+UseCompressedOops -XX:+UseG1GC -XX:+UseGCLogFileRotation >> -XX:+UseStringDeduplication >> >> ... 
>> 2017-06-19T22:54:05.488+0000: 9345.322: [GC pause (G1 Evacuation >> Pause) (mixed) >> Desired survivor size 104857600 bytes, new threshold 1 (max 15) >> - age 1: 131296848 bytes, 131296848 total >> - age 2: 237559952 bytes, 368856800 total >> - age 3: 137259376 bytes, 506116176 total >> 9345.322: [G1Ergonomics (CSet Construction) start choosing CSet, >> _pending_cards: 130042, predicted base time: 171.58 ms, remaining >> time: 328.42 ms, target pause time: 500.00 ms] >> 9345.322: [G1Ergonomics (CSet Construction) add young regions to >> CSet, eden: 121 regions, survivors: 77 regions, predicted young >> region time: 249.33 ms] >> 9345.322: [G1Ergonomics (CSet Construction) finish adding old >> regions to CSet, reason: predicted time is too high, predicted time: >> 0.44 ms, remaining time: 0.00 ms, old: 204 regions, min: 204 regions] >> 9345.322: [G1Ergonomics (CSet Construction) added expensive regions >> to CSet, reason: old CSet region num not reached min, old: 204 >> regions, expensive: 11 regions, min: 204 regions, remaining time: >> 0.00 ms] >> 9345.322: [G1Ergonomics (CSet Construction) finish choosing CSet, >> eden: 121 regions, survivors: 77 regions, old: 204 regions, predicted >> pause time: 504.35 ms, target pause time: 500.00 ms] >> 9345.691: [G1Ergonomics (Mixed GCs) continue mixed GCs, reason: >> candidate old regions available, candidate old regions: 1425 regions, >> reclaimable: 11364516952 bytes (34.14 %), threshold: 5.00 %] >> , 0.3691404 secs] >> [Parallel Time: 301.4 ms, GC Workers: 13] >> [GC Worker Start (ms): Min: 9345323.0, Avg: 9345323.3, Max: >> 9345323.6, Diff: 0.6] >> [Ext Root Scanning (ms): Min: 0.9, Avg: 1.2, Max: 1.6, Diff: >> 0.6, Sum: 15.9] >> [Update RS (ms): Min: 62.1, Avg: 62.3, Max: 63.0, Diff: 0.9, >> Sum: 809.4] >> [Processed Buffers: Min: 35, Avg: 51.8, Max: 91, Diff: 56, >> Sum: 674] >> [Scan RS (ms): Min: 11.3, Avg: 12.1, Max: 14.8, Diff: 3.6, Sum: >> 157.5] >> [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: >> 0.0, Sum: 0.1] >> [Object Copy (ms): Min: 222.2, Avg: 224.8, Max: 225.3, Diff: >> 3.1, Sum: 2922.8] >> [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, >> Sum: 0.4] >> [Termination Attempts: Min: 1, Avg: 15.6, Max: 24, Diff: 23, >> Sum: 203] >> [GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.2, >> Sum: 1.1] >> [GC Worker Total (ms): Min: 300.3, Avg: 300.6, Max: 300.8, >> Diff: 0.5, Sum: 3907.2] >> [GC Worker End (ms): Min: 9345623.8, Avg: 9345623.9, Max: >> 9345624.0, Diff: 0.2] >> [Code Root Fixup: 0.1 ms] >> [Code Root Purge: 0.0 ms] >> [String Dedup Fixup: 43.9 ms, GC Workers: 13] >> [Queue Fixup (ms): Min: 0.4, Avg: 2.2, Max: 3.7, Diff: 3.3, >> Sum: 28.6] >> [Table Fixup (ms): Min: 39.8, Avg: 41.2, Max: 42.9, Diff: 3.2, >> Sum: 535.8] >> [Clear CT: 3.4 ms] >> [Other: 20.2 ms] >> [Choose CSet: 0.3 ms] >> [Ref Proc: 13.4 ms] >> [Ref Enq: 1.0 ms] >> [Redirty Cards: 2.0 ms] >> [Humongous Register: 0.2 ms] >> [Humongous Reclaim: 0.1 ms] >> [Free CSet: 2.1 ms] >> [Eden: 968.0M(968.0M)->0.0B(1472.0M) Survivors: 616.0M->112.0M >> Heap: 15.6G(31.0G)->13.1G(31.0G)] >> [Times: user=4.53 sys=0.00, real=0.36 secs] >> .... 
>> 2017-06-19T22:54:47.655+0000: 9387.489: [GC pause (G1 Evacuation >> Pause) (mixed) >> Desired survivor size 104857600 bytes, new threshold 15 (max 15) >> - age 1: 31749256 bytes, 31749256 total >> 9387.489: [G1Ergonomics (CSet Construction) start choosing CSet, >> _pending_cards: 127449, predicted base time: 168.88 ms, remaining >> time: 331.12 ms, target pause time: 500.00 ms] >> 9387.489: [G1Ergonomics (CSet Construction) add young regions to >> CSet, eden: 184 regions, survivors: 14 regions, predicted young >> region time: 62.79 ms] >> 9387.490: [G1Ergonomics (CSet Construction) finish adding old >> regions to CSet, reason: old CSet region num reached max, old: 397 >> regions, max: 397 regions] >> 9387.490: [G1Ergonomics (CSet Construction) finish choosing CSet, >> eden: 184 regions, survivors: 14 regions, old: 397 regions, predicted >> pause time: 390.18 ms, target pause time: 500.00 ms] >> 9387.659: [G1Ergonomics (Mixed GCs) continue mixed GCs, reason: >> candidate old regions available, candidate old regions: 1028 regions, >> reclaimable: 8047410104 bytes (24.18 %), threshold: 5.00 %] >> , 0.1700662 secs] >> [Parallel Time: 101.4 ms, GC Workers: 13] >> [GC Worker Start (ms): Min: 9387490.4, Avg: 9387490.8, Max: >> 9387491.1, Diff: 0.6] >> [Ext Root Scanning (ms): Min: 0.7, Avg: 1.1, Max: 1.6, Diff: >> 0.9, Sum: 14.3] >> [Update RS (ms): Min: 27.0, Avg: 27.8, Max: 28.9, Diff: 1.8, >> Sum: 361.9] >> [Processed Buffers: Min: 34, Avg: 51.4, Max: 88, Diff: 54, >> Sum: 668] >> [Scan RS (ms): Min: 25.8, Avg: 27.1, Max: 27.4, Diff: 1.6, Sum: >> 352.2] >> [Code Root Scanning (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: >> 0.2, Sum: 0.7] >> [Object Copy (ms): Min: 42.8, Avg: 43.8, Max: 44.5, Diff: 1.8, >> Sum: 569.9] >> [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, >> Sum: 0.1] >> [Termination Attempts: Min: 1, Avg: 9.5, Max: 14, Diff: 13, >> Sum: 124] >> [GC Worker Other (ms): Min: 0.0, Avg: 0.2, Max: 0.4, Diff: 0.4, >> Sum: 2.3] >> [GC Worker Total (ms): Min: 99.7, Avg: 100.1, Max: 100.6, Diff: >> 0.9, Sum: 1301.4] >> [GC Worker End (ms): Min: 9387590.7, Avg: 9387590.9, Max: >> 9387591.1, Diff: 0.4] >> [Code Root Fixup: 0.3 ms] >> [Code Root Purge: 0.0 ms] >> [String Dedup Fixup: 43.5 ms, GC Workers: 13] >> [Queue Fixup (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, >> Sum: 0.0] >> [Table Fixup (ms): Min: 43.0, Avg: 43.2, Max: 43.4, Diff: 0.3, >> Sum: 561.3] >> [Clear CT: 3.9 ms] >> [Other: 21.1 ms] >> [Choose CSet: 0.8 ms] >> [Ref Proc: 12.8 ms] >> [Ref Enq: 0.9 ms] >> [Redirty Cards: 0.9 ms] >> [Humongous Register: 0.2 ms] >> [Humongous Reclaim: 0.1 ms] >> [Free CSet: 4.2 ms] >> [Eden: 1472.0M(1472.0M)->0.0B(1424.0M) Survivors: 112.0M->160.0M >> Heap: 14.5G(31.0G)->10.1G(31.0G)] >> [Times: user=1.93 sys=0.00, real=0.17 secs] >> ..... 
>> 2017-06-19T22:55:29.656+0000: 9429.490: [GC pause (G1 Evacuation >> Pause) (mixed) >> Desired survivor size 104857600 bytes, new threshold 15 (max 15) >> - age 1: 44204040 bytes, 44204040 total >> - age 2: 31422896 bytes, 75626936 total >> 9429.490: [G1Ergonomics (CSet Construction) start choosing CSet, >> _pending_cards: 64391, predicted base time: 130.82 ms, remaining >> time: 369.18 ms, target pause time: 500.00 ms] >> 9429.490: [G1Ergonomics (CSet Construction) add young regions to >> CSet, eden: 178 regions, survivors: 20 regions, predicted young >> region time: 69.26 ms] >> 9429.491: [G1Ergonomics (CSet Construction) finish adding old >> regions to CSet, reason: predicted time is too high, predicted time: >> 2.12 ms, remaining time: 0.00 ms, old: 204 regions, min: 204 regions] >> 9429.491: [G1Ergonomics (CSet Construction) added expensive regions >> to CSet, reason: old CSet region num not reached min, old: 204 >> regions, expensive: 72 regions, min: 204 regions, remaining time: >> 0.00 ms] >> 9429.491: [G1Ergonomics (CSet Construction) finish choosing CSet, >> eden: 178 regions, survivors: 20 regions, old: 204 regions, predicted >> pause time: 684.25 ms, target pause time: 500.00 ms] >> 9429.663: [G1Ergonomics (Mixed GCs) continue mixed GCs, reason: >> candidate old regions available, candidate old regions: 824 regions, >> reclaimable: 6351099672 bytes (19.08 %), threshold: 5.00 %] >> , 0.1729571 secs] >> [Parallel Time: 102.6 ms, GC Workers: 13] >> [GC Worker Start (ms): Min: 9429491.3, Avg: 9429491.6, Max: >> 9429491.9, Diff: 0.6] >> [Ext Root Scanning (ms): Min: 0.9, Avg: 1.3, Max: 1.8, Diff: >> 0.9, Sum: 16.9] >> [Update RS (ms): Min: 18.7, Avg: 19.1, Max: 20.9, Diff: 2.2, >> Sum: 248.9] >> [Processed Buffers: Min: 18, Avg: 32.6, Max: 58, Diff: 40, >> Sum: 424] >> [Scan RS (ms): Min: 15.5, Avg: 17.1, Max: 18.5, Diff: 2.9, Sum: >> 222.8] >> [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: >> 0.1, Sum: 0.5] >> [Object Copy (ms): Min: 62.3, Avg: 63.9, Max: 64.4, Diff: 2.2, >> Sum: 831.3] >> [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, >> Sum: 0.1] >> [Termination Attempts: Min: 1, Avg: 2.6, Max: 5, Diff: 4, >> Sum: 34] >> [GC Worker Other (ms): Min: 0.0, Avg: 0.2, Max: 0.4, Diff: 0.4, >> Sum: 2.2] >> [GC Worker Total (ms): Min: 101.4, Avg: 101.7, Max: 102.1, >> Diff: 0.7, Sum: 1322.7] >> [GC Worker End (ms): Min: 9429593.3, Avg: 9429593.4, Max: >> 9429593.6, Diff: 0.4] >> [Code Root Fixup: 0.2 ms] >> [Code Root Purge: 0.0 ms] >> [String Dedup Fixup: 45.4 ms, GC Workers: 13] >> [Queue Fixup (ms): Min: 0.0, Avg: 0.1, Max: 0.5, Diff: 0.5, >> Sum: 1.5] >> [Table Fixup (ms): Min: 43.9, Avg: 44.1, Max: 44.2, Diff: 0.4, >> Sum: 573.4] >> [Clear CT: 4.3 ms] >> [Other: 20.5 ms] >> [Choose CSet: 0.5 ms] >> [Ref Proc: 14.3 ms] >> [Ref Enq: 1.2 ms] >> [Redirty Cards: 0.7 ms] >> [Humongous Register: 0.2 ms] >> [Humongous Reclaim: 0.1 ms] >> [Free CSet: 2.4 ms] >> [Eden: 1424.0M(1424.0M)->0.0B(1392.0M) Survivors: 160.0M->192.0M >> Heap: 11.5G(31.0G)->8796.0M(31.0G)] >> [Times: user=1.95 sys=0.00, real=0.17 secs] >> >> >> On Thu, Jun 22, 2017 at 3:18 AM, Thomas Schatzl > e.com> wrote: >> > Hi, >> > >> > On Tue, 2017-06-20 at 23:45 -0700, Sundara Mohan M wrote: >> > > Hi, >> > > Can someone shed more light on >> > why G1OldCSetRegionThresholdPercent >> > > flag is under experimental (Need to add - >> > > XX:+UnlockExperimentalVMOptions to modify it.) 
>> >
>> > in my view -XX:+UnlockExperimentalVMOptions mostly serves mostly
>> > as a
>> > "I really want to do that and I know what I am doing" confirmation from
>> > the user that he is aware that using this (in this case) option to
>> > influence the set of regions taken in during mixed gc you might get
>> > surprising behavior.
>> >
>> > Also, I think there has been no official documentation for it - also
>> > because it should be very rarely needed.
>> > In particular, I am curious about the case when it would be useful to
>> > change it. Could you give some log files showing that there is an issue
>> > with the upper bound for the number of old gen regions to take during
>> > GC? (i.e. the amount of old gen regions taken is too small and there is
>> > ample pause time left and it matters to clean up more regions in a
>> > single mixed gc?)
>> >
>> > Sometimes there are problems with the lower bound that is controlled by
>> > the -XX:G1MixedGCCountTarget (product level) option.
>> >
>> > Hth,
>> > Thomas
>> >

From ecki at zusammenkunft.net Thu Jun 22 22:30:18 2017
From: ecki at zusammenkunft.net (Bernd Eckenfels)
Date: Thu, 22 Jun 2017 22:30:18 +0000
Subject: G1OldCSetRegionThresholdPercent under ExperimentalFlag
In-Reply-To: 
References: <1498126699.2831.29.camel@oracle.com> <1498166179.2710.44.camel@oracle.com>
Message-ID: 

Looking at used memory like this is a bit problematic, since the Java heap tends to hold on to memory - only when the GC runs and tries to free memory is it known how much memory is really in use. In case of CMS the collection is triggered regularly in the background; this is why the used memory metric is not that bad there. With G1, however (and even worse with the throughput collector), you often see a larger usage than the actually referenced memory. (This is a bit of an oversimplification, as it does not address soft references.)

What I typically recommend is to not look at the used memory metric at fixed intervals, but to wait for a GC event and look at the 'usage after GC'. This also has problems (it gives you the usage late), but it will avoid the false positives you have observed.

Gruss
Bernd
--
http://bernd.eckenfels.net

_____________________________
From: Sundara Mohan M >
Sent: Friday, June 23, 2017 12:23 AM
Subject: Re: G1OldCSetRegionThresholdPercent under ExperimentalFlag
To: Thomas Schatzl >
Cc: >

Thanks for the insights on Ergo.

I was trying to migrate from CMS to G1GC; the app has a low memory handler (a thread that checks memory utilization via Runtime.freeMemory() and removes some data from memory if it exceeds a threshold). With CMS this handler was not invoked frequently (for example, with 60K objects it would kick in, remove ~5K LRU objects, and continue regular operation). When I moved to G1GC this handler started kicking in frequently (e.g., with 60K objects it would remove 5K LRU objects, then kick in again shortly afterwards and remove another 5K, and so on until only 10K objects are left). So I was trying to find out why mixed GC doesn't clean up quickly enough before my low memory handler kicks in, though I do see the number of young gen collections, and the time they take, has come down by ~40%.

Another issue (maybe this is expected): after increasing G1OldCSetRegionThresholdPercent to 20% from 10%, I started seeing a few mixed GCs taking 1s (most of the time is spent in UpdateRS; MaxPause=500ms).

Will get back once I have more understanding of what is happening.
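A minimal sketch of the 'usage after GC' approach Bernd recommends, using the JDK's GC notifications - this is illustrative, not from the thread: it relies on the HotSpot-specific com.sun.management API, the 0.85 threshold is a made-up value, and summing every pool with a defined max is rough (a real handler would pick exactly the heap pools it cares about):

import com.sun.management.GarbageCollectionNotificationInfo;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import javax.management.Notification;
import javax.management.NotificationEmitter;
import javax.management.openmbean.CompositeData;

public class AfterGcWatcher {
    static final double THRESHOLD = 0.85; // made-up value for the example

    public static void main(String[] args) throws Exception {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // On HotSpot the collector beans also implement NotificationEmitter.
            ((NotificationEmitter) gc).addNotificationListener((Notification n, Object hb) -> {
                if (!GarbageCollectionNotificationInfo.GARBAGE_COLLECTION_NOTIFICATION
                        .equals(n.getType())) {
                    return;
                }
                GarbageCollectionNotificationInfo info =
                        GarbageCollectionNotificationInfo.from((CompositeData) n.getUserData());
                long used = 0, max = 0;
                // Occupancy *after* this collection, summed over pools with a defined max.
                for (MemoryUsage u : info.getGcInfo().getMemoryUsageAfterGc().values()) {
                    if (u.getMax() > 0) {
                        used += u.getUsed();
                        max += u.getMax();
                    }
                }
                if (max > 0 && (double) used / max > THRESHOLD) {
                    // Only a high post-GC occupancy justifies shedding cache entries.
                    System.out.println("High occupancy after GC: " + used + " / " + max);
                }
            }, null, null);
        }
        Thread.sleep(60_000); // keep the demo process alive while GCs happen
    }
}

Because the listener fires only after a collection has actually run, it avoids the false positives that polling Runtime.freeMemory() between collections produces.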
Thanks, Sundar On Thu, Jun 22, 2017 at 2:16 PM, Thomas Schatzl > wrote: > Hi Sundara, > > On Thu, 2017-06-22 at 09:49 -0700, Sundara Mohan M wrote: >> Hi Thomas, >> Thanks for the explanation. >> >> >> I was trying to debug why it is not including some old region even >> though it had ~100ms (though Ergo logs say it has accommodated all >> regions to cover given 500ms). > > Ergo is self-training, but it takes some time to adapt to the > situation. > > As long running a run that log shows (thanks!), the number of mixed gcs > is relatively small, and they are pretty far apart (in the range of > hours between mixed gc phases). Young gc occurrences distribution is > far from equal (even considering differences in used young gen size), > so it seems that the application is quite bursty from time to time. > > The different mixed gc/old gen space reclamation phases are never > particularly long either, so my best guess would be that the values > used for how long particular regions take to evacuate are messed up. > > I.e. from some graphs it roughly looks like is that there is roughly a > mixed gc phase at the start of every bursty phase (as far as I could > identify them looking at graphs), and one during the phase, typically > near the end. > > So depending on when that mixed gc occurs (at the start of such a burst > or within), g1 trains itself on different application behavior that it > later uses these values on. This is always some kind of moving average, > which does not necessarily reflect reality. > > Very good adaptation to this behavior seems beyond what g1 can do at the moment. > > One could in theory force G1 to give much more weight to recent observations to make adaptations quicker (i.e. change some factors in that average calculation); but there is no user option for that, and it may open a separate can of worms (currently it seems to not too eagerly discount older observations compared to more recent ones if I read the code correctly). > > But that is just something I made up right now by staring at your log > graphs, I may be wrong :) > > It is unfortunately impossible to determine the exact values for these > predictions in a product VM (e.g. comparing actual/predicted detail > values the per-region prediction is made of) at this time as there is > no way to get these relevant values out of the VM. > > Back to your problem (if there is one, you did not state any ;)): the > log shows a few issues with mixed gc actually: the one you explained > about not taking enough old gen regions because > the G1OldCSetRegionThresholdPercent is too low as you suspected (still > not reaching max pause time; case 1), and the cases where the number of > old gen regions taken is too low so these are filled up with "expensive > old gen regions". However I am seeing both the actual time taken being > too low and too high (case 2 and 3) > > Not sure what your goals are here, and what the actual issue is, but > > - you can probably fix case 1 with increasing the mentioned > -XX:G1OldCSetRegionThresholdPercent option if that behavior annoys you. > > - fix either case 2 or case 3 with decreasing or increasing > -XX:G1MixedGCCountTarget (one direction increases the minimum number of > regions to take, the other decreases it). > > All in all an interesting case to look at :) > > Thanks a lot, > Thomas > >> Adding some log snippets here and attaching entire logs in case if >> that helps. 
>> >> Running app with 31G >> >> CommandLine flags: -XX:GCLogFileSize=20971520 >> -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=out-of-memory-heap- >> dump -XX:InitialHeapSize=33285996544 -XX:MaxGCPauseMillis=500 >> -XX:MaxHeapSize=33285996544 -XX:MetaspaceSize=536870912 >> -XX:NumberOfGCLogFiles=20 -XX:+ParallelRefProcEnabled >> -XX:+PrintAdaptiveSizePolicy -XX:+PrintGC >> -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDateStamps >> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps >> -XX:+PrintTenuringDistribution -XX:+UseCompressedClassPointers >> -XX:+UseCompressedOops -XX:+UseG1GC -XX:+UseGCLogFileRotation >> -XX:+UseStringDeduplication >> >> ... >> 2017-06-19T22:54:05.488+0000: 9345.322: [GC pause (G1 Evacuation >> Pause) (mixed) >> Desired survivor size 104857600 bytes, new threshold 1 (max 15) >> - age 1: 131296848 bytes, 131296848 total >> - age 2: 237559952 bytes, 368856800 total >> - age 3: 137259376 bytes, 506116176 total >> 9345.322: [G1Ergonomics (CSet Construction) start choosing CSet, >> _pending_cards: 130042, predicted base time: 171.58 ms, remaining >> time: 328.42 ms, target pause time: 500.00 ms] >> 9345.322: [G1Ergonomics (CSet Construction) add young regions to >> CSet, eden: 121 regions, survivors: 77 regions, predicted young >> region time: 249.33 ms] >> 9345.322: [G1Ergonomics (CSet Construction) finish adding old >> regions to CSet, reason: predicted time is too high, predicted time: >> 0.44 ms, remaining time: 0.00 ms, old: 204 regions, min: 204 regions] >> 9345.322: [G1Ergonomics (CSet Construction) added expensive regions >> to CSet, reason: old CSet region num not reached min, old: 204 >> regions, expensive: 11 regions, min: 204 regions, remaining time: >> 0.00 ms] >> 9345.322: [G1Ergonomics (CSet Construction) finish choosing CSet, >> eden: 121 regions, survivors: 77 regions, old: 204 regions, predicted >> pause time: 504.35 ms, target pause time: 500.00 ms] >> 9345.691: [G1Ergonomics (Mixed GCs) continue mixed GCs, reason: >> candidate old regions available, candidate old regions: 1425 regions, >> reclaimable: 11364516952 bytes (34.14 %), threshold: 5.00 %] >> , 0.3691404 secs] >> [Parallel Time: 301.4 ms, GC Workers: 13] >> [GC Worker Start (ms): Min: 9345323.0, Avg: 9345323.3, Max: >> 9345323.6, Diff: 0.6] >> [Ext Root Scanning (ms): Min: 0.9, Avg: 1.2, Max: 1.6, Diff: >> 0.6, Sum: 15.9] >> [Update RS (ms): Min: 62.1, Avg: 62.3, Max: 63.0, Diff: 0.9, >> Sum: 809.4] >> [Processed Buffers: Min: 35, Avg: 51.8, Max: 91, Diff: 56, >> Sum: 674] >> [Scan RS (ms): Min: 11.3, Avg: 12.1, Max: 14.8, Diff: 3.6, Sum: >> 157.5] >> [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: >> 0.0, Sum: 0.1] >> [Object Copy (ms): Min: 222.2, Avg: 224.8, Max: 225.3, Diff: >> 3.1, Sum: 2922.8] >> [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, >> Sum: 0.4] >> [Termination Attempts: Min: 1, Avg: 15.6, Max: 24, Diff: 23, >> Sum: 203] >> [GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.2, >> Sum: 1.1] >> [GC Worker Total (ms): Min: 300.3, Avg: 300.6, Max: 300.8, >> Diff: 0.5, Sum: 3907.2] >> [GC Worker End (ms): Min: 9345623.8, Avg: 9345623.9, Max: >> 9345624.0, Diff: 0.2] >> [Code Root Fixup: 0.1 ms] >> [Code Root Purge: 0.0 ms] >> [String Dedup Fixup: 43.9 ms, GC Workers: 13] >> [Queue Fixup (ms): Min: 0.4, Avg: 2.2, Max: 3.7, Diff: 3.3, >> Sum: 28.6] >> [Table Fixup (ms): Min: 39.8, Avg: 41.2, Max: 42.9, Diff: 3.2, >> Sum: 535.8] >> [Clear CT: 3.4 ms] >> [Other: 20.2 ms] >> [Choose CSet: 0.3 ms] >> [Ref Proc: 13.4 ms] >> [Ref Enq: 1.0 ms] >> [Redirty 
Cards: 2.0 ms] >> [Humongous Register: 0.2 ms] >> [Humongous Reclaim: 0.1 ms] >> [Free CSet: 2.1 ms] >> [Eden: 968.0M(968.0M)->0.0B(1472.0M) Survivors: 616.0M->112.0M >> Heap: 15.6G(31.0G)->13.1G(31.0G)] >> [Times: user=4.53 sys=0.00, real=0.36 secs] >> .... >> 2017-06-19T22:54:47.655+0000: 9387.489: [GC pause (G1 Evacuation >> Pause) (mixed) >> Desired survivor size 104857600 bytes, new threshold 15 (max 15) >> - age 1: 31749256 bytes, 31749256 total >> 9387.489: [G1Ergonomics (CSet Construction) start choosing CSet, >> _pending_cards: 127449, predicted base time: 168.88 ms, remaining >> time: 331.12 ms, target pause time: 500.00 ms] >> 9387.489: [G1Ergonomics (CSet Construction) add young regions to >> CSet, eden: 184 regions, survivors: 14 regions, predicted young >> region time: 62.79 ms] >> 9387.490: [G1Ergonomics (CSet Construction) finish adding old >> regions to CSet, reason: old CSet region num reached max, old: 397 >> regions, max: 397 regions] >> 9387.490: [G1Ergonomics (CSet Construction) finish choosing CSet, >> eden: 184 regions, survivors: 14 regions, old: 397 regions, predicted >> pause time: 390.18 ms, target pause time: 500.00 ms] >> 9387.659: [G1Ergonomics (Mixed GCs) continue mixed GCs, reason: >> candidate old regions available, candidate old regions: 1028 regions, >> reclaimable: 8047410104 bytes (24.18 %), threshold: 5.00 %] >> , 0.1700662 secs] >> [Parallel Time: 101.4 ms, GC Workers: 13] >> [GC Worker Start (ms): Min: 9387490.4, Avg: 9387490.8, Max: >> 9387491.1, Diff: 0.6] >> [Ext Root Scanning (ms): Min: 0.7, Avg: 1.1, Max: 1.6, Diff: >> 0.9, Sum: 14.3] >> [Update RS (ms): Min: 27.0, Avg: 27.8, Max: 28.9, Diff: 1.8, >> Sum: 361.9] >> [Processed Buffers: Min: 34, Avg: 51.4, Max: 88, Diff: 54, >> Sum: 668] >> [Scan RS (ms): Min: 25.8, Avg: 27.1, Max: 27.4, Diff: 1.6, Sum: >> 352.2] >> [Code Root Scanning (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: >> 0.2, Sum: 0.7] >> [Object Copy (ms): Min: 42.8, Avg: 43.8, Max: 44.5, Diff: 1.8, >> Sum: 569.9] >> [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, >> Sum: 0.1] >> [Termination Attempts: Min: 1, Avg: 9.5, Max: 14, Diff: 13, >> Sum: 124] >> [GC Worker Other (ms): Min: 0.0, Avg: 0.2, Max: 0.4, Diff: 0.4, >> Sum: 2.3] >> [GC Worker Total (ms): Min: 99.7, Avg: 100.1, Max: 100.6, Diff: >> 0.9, Sum: 1301.4] >> [GC Worker End (ms): Min: 9387590.7, Avg: 9387590.9, Max: >> 9387591.1, Diff: 0.4] >> [Code Root Fixup: 0.3 ms] >> [Code Root Purge: 0.0 ms] >> [String Dedup Fixup: 43.5 ms, GC Workers: 13] >> [Queue Fixup (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, >> Sum: 0.0] >> [Table Fixup (ms): Min: 43.0, Avg: 43.2, Max: 43.4, Diff: 0.3, >> Sum: 561.3] >> [Clear CT: 3.9 ms] >> [Other: 21.1 ms] >> [Choose CSet: 0.8 ms] >> [Ref Proc: 12.8 ms] >> [Ref Enq: 0.9 ms] >> [Redirty Cards: 0.9 ms] >> [Humongous Register: 0.2 ms] >> [Humongous Reclaim: 0.1 ms] >> [Free CSet: 4.2 ms] >> [Eden: 1472.0M(1472.0M)->0.0B(1424.0M) Survivors: 112.0M->160.0M >> Heap: 14.5G(31.0G)->10.1G(31.0G)] >> [Times: user=1.93 sys=0.00, real=0.17 secs] >> ..... 
>> 2017-06-19T22:55:29.656+0000: 9429.490: [GC pause (G1 Evacuation >> Pause) (mixed) >> Desired survivor size 104857600 bytes, new threshold 15 (max 15) >> - age 1: 44204040 bytes, 44204040 total >> - age 2: 31422896 bytes, 75626936 total >> 9429.490: [G1Ergonomics (CSet Construction) start choosing CSet, >> _pending_cards: 64391, predicted base time: 130.82 ms, remaining >> time: 369.18 ms, target pause time: 500.00 ms] >> 9429.490: [G1Ergonomics (CSet Construction) add young regions to >> CSet, eden: 178 regions, survivors: 20 regions, predicted young >> region time: 69.26 ms] >> 9429.491: [G1Ergonomics (CSet Construction) finish adding old >> regions to CSet, reason: predicted time is too high, predicted time: >> 2.12 ms, remaining time: 0.00 ms, old: 204 regions, min: 204 regions] >> 9429.491: [G1Ergonomics (CSet Construction) added expensive regions >> to CSet, reason: old CSet region num not reached min, old: 204 >> regions, expensive: 72 regions, min: 204 regions, remaining time: >> 0.00 ms] >> 9429.491: [G1Ergonomics (CSet Construction) finish choosing CSet, >> eden: 178 regions, survivors: 20 regions, old: 204 regions, predicted >> pause time: 684.25 ms, target pause time: 500.00 ms] >> 9429.663: [G1Ergonomics (Mixed GCs) continue mixed GCs, reason: >> candidate old regions available, candidate old regions: 824 regions, >> reclaimable: 6351099672 bytes (19.08 %), threshold: 5.00 %] >> , 0.1729571 secs] >> [Parallel Time: 102.6 ms, GC Workers: 13] >> [GC Worker Start (ms): Min: 9429491.3, Avg: 9429491.6, Max: >> 9429491.9, Diff: 0.6] >> [Ext Root Scanning (ms): Min: 0.9, Avg: 1.3, Max: 1.8, Diff: >> 0.9, Sum: 16.9] >> [Update RS (ms): Min: 18.7, Avg: 19.1, Max: 20.9, Diff: 2.2, >> Sum: 248.9] >> [Processed Buffers: Min: 18, Avg: 32.6, Max: 58, Diff: 40, >> Sum: 424] >> [Scan RS (ms): Min: 15.5, Avg: 17.1, Max: 18.5, Diff: 2.9, Sum: >> 222.8] >> [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: >> 0.1, Sum: 0.5] >> [Object Copy (ms): Min: 62.3, Avg: 63.9, Max: 64.4, Diff: 2.2, >> Sum: 831.3] >> [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, >> Sum: 0.1] >> [Termination Attempts: Min: 1, Avg: 2.6, Max: 5, Diff: 4, >> Sum: 34] >> [GC Worker Other (ms): Min: 0.0, Avg: 0.2, Max: 0.4, Diff: 0.4, >> Sum: 2.2] >> [GC Worker Total (ms): Min: 101.4, Avg: 101.7, Max: 102.1, >> Diff: 0.7, Sum: 1322.7] >> [GC Worker End (ms): Min: 9429593.3, Avg: 9429593.4, Max: >> 9429593.6, Diff: 0.4] >> [Code Root Fixup: 0.2 ms] >> [Code Root Purge: 0.0 ms] >> [String Dedup Fixup: 45.4 ms, GC Workers: 13] >> [Queue Fixup (ms): Min: 0.0, Avg: 0.1, Max: 0.5, Diff: 0.5, >> Sum: 1.5] >> [Table Fixup (ms): Min: 43.9, Avg: 44.1, Max: 44.2, Diff: 0.4, >> Sum: 573.4] >> [Clear CT: 4.3 ms] >> [Other: 20.5 ms] >> [Choose CSet: 0.5 ms] >> [Ref Proc: 14.3 ms] >> [Ref Enq: 1.2 ms] >> [Redirty Cards: 0.7 ms] >> [Humongous Register: 0.2 ms] >> [Humongous Reclaim: 0.1 ms] >> [Free CSet: 2.4 ms] >> [Eden: 1424.0M(1424.0M)->0.0B(1392.0M) Survivors: 160.0M->192.0M >> Heap: 11.5G(31.0G)->8796.0M(31.0G)] >> [Times: user=1.95 sys=0.00, real=0.17 secs] >> >> >> On Thu, Jun 22, 2017 at 3:18 AM, Thomas Schatzl wrote: >> > Hi, >> > >> > On Tue, 2017-06-20 at 23:45 -0700, Sundara Mohan M wrote: >> > > Hi, >> > > Can someone shed more light on >> > why G1OldCSetRegionThresholdPercent >> > > flag is under experimental (Need to add >> > > -XX:+UnlockExperimentalVMOptions to modify it.)
>> > >> > in my view -XX:+UnlockExperimentalVMOptions mostly serves >> > as a >> > "I really want to do that and I know what I am doing" confirmation >> > from >> > the user that he is aware that by using this option (in this case to >> > influence the set of regions taken in during mixed gc) he might get >> > surprising behavior. >> > >> > Also, I think there has been no official documentation for it - >> > also >> > because it should be very rarely needed. >> > In particular, I am curious about the case when it would be useful >> > to >> > change it. Could you give some log files showing that there is an >> > issue >> > with the upper bound for the number of old gen regions to take >> > during >> > GC? (i.e. the amount of old gen regions taken is too small and >> > there is >> > ample pause time left and it matters to clean up more regions in a >> > single mixed gc?) >> > >> > Sometimes there are problems with the lower bound that is >> > controlled by >> > the -XX:G1MixedGCCountTarget (product level) option. >> > >> > Hth, >> > Thomas >> > From thomas.schatzl at oracle.com Fri Jun 23 12:29:17 2017 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Fri, 23 Jun 2017 14:29:17 +0200 Subject: G1OldCSetRegionThresholdPercent under ExperimentalFlag In-Reply-To: References: <1498126699.2831.29.camel@oracle.com> <1498166179.2710.44.camel@oracle.com> Message-ID: <1498220957.2741.68.camel@oracle.com> Hi, On Thu, 2017-06-22 at 15:11 -0700, Sundara Mohan M wrote: > Thanks for the insights on Ergo. > > I was trying to migrate from CMS to G1GC; the app has a low memory > handler (a thread which computes memory utilization from > Runtime.getRuntime().freeMemory() and removes some data from memory if it > exceeds the threshold). > > In CMS this handler was not invoked frequently (for example: when I have > 60K objects it will kick in, remove ~5K LRU objects and continue > regular operation); when I moved to G1GC this handler started kicking > in frequently (example: when I have 60K objects it will remove 5K LRU > objects, and shortly afterwards it will kick in again and remove > another 5K, and so on until 10K objects are left). > > So, I was trying to find out why the mixed GC doesn't clean up quickly > enough before my low memory handler kicks in. As Bernd mentioned, G1 is only very lazily reclaiming space containing dead objects, so such an approach has its limits. I think CMS has this CMSTriggerInterval option that starts a background collection, which immediately reclaims space in the end (updating its freelist) afaik. One could currently get updated liveness information via jcmd/System.gc() with -XX:+ExplicitGCInvokesConcurrent, starting marking regularly, but it has a few drawbacks of its own: - it starts liveness analysis/marking immediately, potentially messing with your pause time requirements - unknown impact on prediction - it does not do space reclamation on its own, as reclamation will be piggy-backed on the next few gcs - it will interrupt a currently running space reclamation (mixed gc) phase, i.e. if you spam these, g1 will never reclaim any memory - it "creatively" reuses System.gc(), which might not be possible or advisable in many cases - all of the above is implementation defined behavior. There may be other caveats. In a VM where, by design, you do not have a lot of control over memory management, it is definitely problematic to have another memory manager on top when one of them does not know anything about the other.
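For completeness, Bernd's "usage after GC" suggestion from earlier in this thread can be implemented directly against the java.lang.management API. A minimal sketch (illustrative only - the class name is made up, and MemoryPoolMXBean.getCollectionUsage() returns null for pools that do not support it):

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

public class AfterGcUsage {
    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            // Usage as of the end of the most recent GC of this pool, rather
            // than the lazily-reclaimed value that Runtime.getRuntime().freeMemory()
            // reflects between collections.
            MemoryUsage afterGc = pool.getCollectionUsage();
            if (afterGc != null) {
                System.out.println(pool.getName() + ": used after last GC = "
                        + afterGc.getUsed() + " bytes");
            }
        }
    }
}

Driving the low memory handler from these values (or from per-collection notifications on the GarbageCollectorMXBeans) avoids the false positives from sampling free memory between collections, but the fundamental issue of layering your own memory manager on top of the GC remains.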
Such an algorithm may also interact badly with future changes, e.g. the adaptive IHOP [1] feature in jdk9. > Though I see the number of young gen collections and the time taken to clean > have come down by ~40%. > > Another issue (maybe this is expected) is that after increasing > G1OldCSetRegionThresholdPercent to 20% from 10% I started seeing a > few mixed GCs taking 1s (most of the time is spent on UpdateRS, > MaxPause=500ms). Will get back once I have more understanding of what > is happening. The option allows G1 to add more regions to the set of regions to be collected. This implies potentially longer pauses if the predictions are incorrect in the first place. That is one reason why this is an experimental option. Thanks, Thomas [1] https://docs.oracle.com/javase/9/gctuning/garbage-first-garbage-collector.htm#GUID-572C9203-AB27-46F1-9D33-42BA4F3C6BF3 From kim.barrett at oracle.com Sat Jun 24 18:03:52 2017 From: kim.barrett at oracle.com (Kim Barrett) Date: Sat, 24 Jun 2017 14:03:52 -0400 Subject: RFR: 8178490: Usages of is_object_aligned with pointers are broken In-Reply-To: References: Message-ID: <463199D0-EF95-4399-AA9A-D741002FA43C@oracle.com> > On Jun 22, 2017, at 5:16 AM, Stefan Karlsson wrote: > > Hi all, > > Please review this patch to fix and strengthen is_object_aligned checks when pointers are passed in: > > http://cr.openjdk.java.net/~stefank/8178490/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8178490 > > is_object_aligned only works correctly for sizes measured in words. > > When a pointer is passed into: > > inline bool is_object_aligned(intptr_t addr) { > return addr == align_object_size(addr); > } > > inline intptr_t align_object_size(intptr_t size) { > return align_size_up(size, MinObjAlignment); > } > > the pointer is incorrectly interpreted as a word size and the alignment is checked against MinObjAlignment instead of MinObjAlignmentInBytes > > Tested with JPRT together with different patches for: > 8178489 Make align functions more type safe and consistent > > Thanks, > StefanK Looks good. From stefan.karlsson at oracle.com Mon Jun 26 07:08:40 2017 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Mon, 26 Jun 2017 09:08:40 +0200 Subject: RFR: 8178490: Usages of is_object_aligned with pointers are broken In-Reply-To: <463199D0-EF95-4399-AA9A-D741002FA43C@oracle.com> References: <463199D0-EF95-4399-AA9A-D741002FA43C@oracle.com> Message-ID: <2d57a062-83cc-0c79-f525-650f50d59e95@oracle.com> Thanks, Kim. StefanK On 2017-06-24 20:03, Kim Barrett wrote: >> On Jun 22, 2017, at 5:16 AM, Stefan Karlsson wrote: >> >> Hi all, >> >> Please review this patch to fix and strengthen is_object_aligned checks when pointers are passed in: >> >> http://cr.openjdk.java.net/~stefank/8178490/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8178490 >> >> is_object_aligned only works correctly for sizes measured in words. >> >> When a pointer is passed into: >> >> inline bool is_object_aligned(intptr_t addr) { >> return addr == align_object_size(addr); >> } >> >> inline intptr_t align_object_size(intptr_t size) { >> return align_size_up(size, MinObjAlignment); >> } >> >> the pointer is incorrectly interpreted as a word size and the alignment is checked against MinObjAlignment instead of MinObjAlignmentInBytes >> >> Tested with JPRT together with different patches for: >> 8178489 Make align functions more type safe and consistent >> >> Thanks, >> StefanK > Looks good.
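To make the word-vs-byte mix-up above concrete: with the usual 8-byte heap words, MinObjAlignment is 1 (a word count), so aligning an address by it is a no-op and the broken check accepts any pointer. A self-contained sketch of the arithmetic (in Java purely for illustration - the real code is HotSpot C++, and the constants below are assumptions):

public class AlignmentDemo {
    // Assumed for illustration: 8-byte heap words, 1-word minimum object alignment.
    static final long MIN_OBJ_ALIGNMENT = 1;           // in words
    static final long MIN_OBJ_ALIGNMENT_IN_BYTES = 8;  // in bytes

    static long alignUp(long value, long alignment) {  // alignment must be a power of two
        return (value + alignment - 1) & ~(alignment - 1);
    }

    // Mirrors the broken check: a byte address aligned by a word count.
    // With alignment 1 this is the identity, so it accepts everything.
    static boolean isObjectAlignedBroken(long addr) {
        return addr == alignUp(addr, MIN_OBJ_ALIGNMENT);
    }

    // The intended check for pointers, against the byte alignment.
    static boolean isObjectAlignedBytes(long addr) {
        return addr == alignUp(addr, MIN_OBJ_ALIGNMENT_IN_BYTES);
    }

    public static void main(String[] args) {
        long unalignedAddr = 0x1003;
        System.out.println(isObjectAlignedBroken(unalignedAddr)); // true  (wrong)
        System.out.println(isObjectAlignedBytes(unalignedAddr));  // false (correct)
    }
}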
From erik.osterlund at oracle.com Mon Jun 26 13:34:22 2017 From: erik.osterlund at oracle.com (Erik Österlund) Date: Mon, 26 Jun 2017 15:34:22 +0200 Subject: RFR (S): 8182703: Correct G1 barrier queue lock orderings Message-ID: <59510D5E.10009@oracle.com> Hi, Webrev: http://cr.openjdk.java.net/~eosterlund/8182703/webrev.02/ Bug: https://bugs.openjdk.java.net/browse/JDK-8182703 The G1 barrier queues have very awkward lock orderings for the following reasons: 1) These queues may queue up things when performing a reference write or resolving a jweak (intentionally, or one that just happens to be a jweak even though it looks like a jobject), which can happen in a lot of places in the code. We resolve JNIHandles while holding special locks in many places. We also perform reference writes in many places. Now the unsuspecting hotspot developer might think that it is okay to resolve a JNIHandle or perform a reference write while possibly holding a special lock. But no. In some cases, object writes have been moved out of locks and replaced with lock-free CAS, only to dodge the G1 write barrier locks. I don't think the G1 lock ordering issues should shape the shared code; it should rather be the other way around. 2) There is an issue that the shared queue locks have a "special" rank, which is below the lock ranks used by the cbl monitor and free list monitor. This leads to an issue when these locks have to be taken while holding the shared queue locks. The current solution is to drop the shared queue locks temporarily, introducing nasty data races. These races are guarded, but the whole race seems very unnecessary. I argue that if the G1 write barrier queue locks were simply set appropriately in the first place, by analyzing what ranks they should have, none of the above issues would exist. Therefore I propose this new ordering. Specifically, I recognize that locks required for performing memory accesses and resolving JNIHandles are more special than the "special" rank. Therefore, this change introduces a new lock ordering category called "access", which is to be used by barriers required to perform memory accesses. In other words, by recognizing the rank is more special than "special", we can remove "special" code that walks around making its rank more "special". That seems desirable to me. The access locks need to comply with the same constraints as the special locks: they may not perform safepoint checks. The old lock ranks were: SATB_Q_FL_lock: special SATB_Q_CBL_mon: leaf - 1 Shared_SATB_Q_lock: leaf - 1 DirtyCardQ_FL_lock: special DirtyCardQ_CBL_mon: leaf - 1 Shared_DirtyCardQ_lock: leaf - 1 The new lock ranks are: SATB_Q_FL_lock: access (special - 2) SATB_Q_CBL_mon: access (special - 2) Shared_SATB_Q_lock: access + 1 (special - 1) DirtyCardQ_FL_lock: access (special - 2) DirtyCardQ_CBL_mon: access (special - 2) Shared_DirtyCardQ_lock: access + 1 (special - 1) Analysis: Each PtrQueue and PtrQueueSet group, SATB or DirtyCardQ, has the same group of locks: the free list lock, the completed buffer list monitor and the shared queue lock. Observations: 1) The free list lock and completed buffer list monitors (members of PtrQueueSet) are disjoint. We never hold both of them at the same time. Rationale: The free list lock is only used from PtrQueueSet::allocate_buffer, PtrQueueSet::deallocate_buffer and PtrQueueSet::reduce_free_list, and expanding the callsites from there never reaches a place where the cbl monitor is acquired. Therefore it is impossible to acquire the cbl monitor while holding the free list lock.
The opposite case of acquiring the free list lock while holding the cbl monitor is also not possible; only the following places acquire the cbl monitor: PtrQueueSet::enqueue_complete_buffer, PtrQueueSet::merge_bufferlists, PtrQueueSet::assert_completed_buffer_list_len_correct, PtrQueueSet::notify_if_necessary, FreeIdSet::claim_par_id, FreeIdSet::release_par_id, DirtyCardQueueSet::get_completed_buffer, DirtyCardQueueSet::clear, SATBMarkQueueSet::apply_closure_to_completed_buffer and SATBMarkQueueSet::abandon_partial_marking. Again, none of these paths where the cbl monitor is held can expand callsites to a place where the free list locks are held. Therefore it holds that the cbl monitor can not be held while the free list lock is held, and the free list lock can not be held while the cbl monitor is held. Therefore they are held disjointly. 2) We might hold the shared queue locks before acquiring the completed buffer list monitor. (Today we drop the shared queue lock then and reacquire it later as a hack, as already described.) 3) We do not acquire a shared queue lock while holding the free list lock or completed buffer list monitor, as there is no reference from a PtrQueueSet to its shared queue, so those code paths do not know how to reach the shared PtrQueue to acquire its lock. The derived classes are exceptions, but they never use the shared queue lock while holding the completed buffer list monitor or free list lock. DirtyCardQueueSet uses the shared queue for concatenating logs (in a safepoint, without holding those locks). The SATBMarkQueueSet uses the shared queue for filtering the buffers, fiddling with activeness, printing and resetting, all without grabbing any locks. 4) We do not acquire any other lock (above the event rank) while holding the free list lock or completed buffer list monitors. This was discovered by manually expanding the call graphs from where these two locks are held. Derived constraints: a) Because of observation 1, the free list lock and completed buffer list monitors can have the same rank. b) Because of observations 1 and 2, the shared queue lock ought to have a rank higher than the ranks of the free list lock and the completed buffer list monitors (not the case today). c) Because of observations 3 and 2, the free list lock and completed buffer list monitors ought to have a rank lower than the rank of the shared queue lock. d) Because of observation 4 (and constraints a-c), all the barrier locks should be below the "special" rank, without violating any existing ranks. The proposed new lock ranks conform to the constraints derived from my observations. It is worth noting that the potential relationships that could break (and why they do not) are: 1) If a lock were acquired from within the barriers that does not involve the shared queue lock, the free list lock or the completed buffer list monitor, we would now have inverted their relationship, as that other lock would probably have a rank higher than or equal to "special". But due to observation 4, there are no such cases. 2) The relationship between the shared queue lock and the completed buffer list monitor has been changed so both can be held at the same time if the shared queue lock is acquired first (which it is). This is arguably the way it should have been in the first place, and the old solution had ugly hacks where we would drop the shared queue lock to not run into the lock order assert (and only not to run into the lock order assert, i.e.
not to avoid potential deadlock) to ensure the locks are not held at the same time. That code has now been removed, so that the shared queue lock is still held when enqueueing completed buffers (no dodgy dropping and reclaiming), and the code for handling the races due to multiple concurrent enqueuers has also been removed and replaced with an assertion that there simply should not be multiple concurrent enqueuers. Since the shared queue lock is now held throughout the whole operation, there will be no concurrent enqueuers. 3) The completed buffer list monitor used to have a higher rank than the free list lock. Now they have the same rank. Previously they were therefore allowed to be held at the same time if the cbl monitor was acquired first. However, as discussed, there is no such case, and they ought to have the same rank so as not to confuse their true disjointness. If anyone insists we should not break this relationship despite the true disjointness, I could consent to adding another access lock rank, like this: http://cr.openjdk.java.net/~eosterlund/8182703/webrev.01/ but I think it seems better to have the same rank, since they are actually truly disjoint and should remain disjoint. I do recognize that long term we *might* want a lock-free solution or something (not saying we do or do not). But until then, the ranks ought to be corrected so that they do not cause these problems that make everyone bash their heads against the awkward G1 lock ranks throughout the code and build hacks around them. Testing: JPRT with hotspot all and lots of local testing. Thanks, /Erik From thomas.schatzl at oracle.com Mon Jun 26 14:42:13 2017 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Mon, 26 Jun 2017 16:42:13 +0200 Subject: RFR(S): 8173335: Improve logging for j.l.ref.reference processing In-Reply-To: <0353babf-03eb-21cd-b286-9b0149dfb718@oracle.com> References: <1497352882.2829.65.camel@oracle.com> <054513b1-4ff2-6656-fa3a-9c6e6736c32f@oracle.com> <0353babf-03eb-21cd-b286-9b0149dfb718@oracle.com> Message-ID: <1498488133.2665.37.camel@oracle.com> Hi Sangheon, thanks for all your changes, and sorry a bit for the delay... On Wed, 2017-06-14 at 00:52 -0700, sangheon wrote: > Hi Thomas again, > On 06/13/2017 02:21 PM, sangheon wrote: > > > > Hi Thomas, > > > > Thank you for reviewing this. > > > > On 06/13/2017 04:21 AM, Thomas Schatzl wrote: > > > > > > Hi Sangheon, > > > > > > > > > On Mon, 2017-06-12 at 17:13 -0700, sangheon wrote: > > > > > > > > Hi Aleksey, > > > > > > > > Thanks for the review. > > > > > > > > On 06/12/2017 09:06 AM, Aleksey Shipilev wrote: > > > > > > > > > > On 06/10/2017 01:57 AM, sangheon wrote: > > > > > > > > > > > > CR: https://bugs.openjdk.java.net/browse/JDK-8173335 > > > > > > webrev: http://cr.openjdk.java.net/~sangheki/8173335/webrev.0 > > > - There should be a destructor in ReferenceProcessor cleaning up > > > the dynamically allocated memory. > > Thomas and I had some discussion about this and agreed to file a > > separate CR for the freeing issue. > > > > I noticed that there's no destructor when I wrote this, but this is > > how we usually implement it. > > However, as this seems incorrect, I will add a destructor for the newly > > added class, but it will not be used in this patch. > > It will be used in the following CR ( https://bugs.openjdk.java.net/browse/JDK-8182120 ) which fixes the > > not-freeing issue in ReferenceProcessor. > > FYI, ReferenceProcessor has heap allocated members of > > ReferencePolicy (and its friends) but they are not freed either.
> > So instead of extending this patch, I propose to separate this freeing > > issue. That's fine, thanks. > > > > > - the change should move gc+ref output to something else: there > > > is so much additional junk printed with gc+ref=trace that the > > > phase logging is drowned out by real trace information and > > > unusable for regular consumption. > > Okay, I will add it. > > But I asked about introducing 'gc+ref+phases' before and you didn't like > > it. :) Probably I didn't provide much detail?! Yes. In the example you showed me earlier with gc+ref=trace, the examples did not contain the other gc+ref=trace output. That's why I thought it would be fine. :) > > > - I would prefer if resetting the reference phase times logger > > > wouldn't be kind of an afterthought of printing :) > > > > > > Also it might be useful to keep the data around for somewhat > > > longer (not throw it away after every phase). Don't we need the > > > data for further analysis? > > I don't have a strong opinion on this. > > > > I didn't consider keeping log data for further analysis. This could be > > a minor reason for supporting keeping log data longer, but I think > > interspersing with the existing G1 log would be the main reason for > > keeping it. > > > This would also allow printing it later using different log tags > > > (with different formatting). > > > - I like the split of phasetimes into data storage and printing. > > > I do not like that basically the timing data is created twice, > > > once for the phasetimes, once for the GCTimer (for JFR > > > basically). > > No, currently timing data is created once and used > > for both the phase log and the GCTimer. Or am I missing something? > > So in summary, mostly I agree with your comments except the below 2: > > 1. Interspersing with the G1 log. > > 2. Keeping log data longer. (This should be done if we go with the > > interspersing idea.) > I started working on the above 2 items. :) > I will update the webrev when I'm ready. Thanks a lot for considering all my comments. I think the output is much nicer now :) Some more notes: - In the current change (webrev.2) the method of using the "direct_print()" getter seems a bit forced, only there to keep the current structure of the code, i.e. printing within the ReferenceProcessor::process_references() method. What do you think about moving the printing outside of that method for all collectors, just passing a (properly initialized - that allows moving the reset() method into gc specific code as well) ReferenceProcessorPhaseTimes* that is then later used for printing, either directly or deferred? At the location where the reference processing is done we know whether we need to print directly or deferred. This also hides pretty specific information about printing (like indentation level) from the reference processing itself. Also, that would maybe allow storing the GCTimer reference somewhere in the ReferenceProcessorPhaseTimes so that we only need to pass a single container for timing information around. Overall that may reduce the code quite a bit, keeps similar components (GCTimer and ReferenceProcessorPhaseTimes) together without ReferenceProcessor needing to know about both of them, and removes the ReferenceProcessor "global" reference to the ReferenceProcessorPhaseTimes, which is easier to keep track of when looking at the code (instead of having the GCTimer passed in and the ReferenceProcessorPhaseTimes as a member).
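Roughly the shape I have in mind, sketched here in Java only for brevity (all names invented; the real code is the HotSpot C++ ReferenceProcessor/ReferenceProcessorPhaseTimes pair):

import java.util.LinkedHashMap;
import java.util.Map;

class PhaseTimes {
    private final Map<String, Double> phaseMs = new LinkedHashMap<>();
    void record(String phase, double ms) { phaseMs.put(phase, ms); }
    void print() { phaseMs.forEach((p, ms) -> System.out.println(p + ": " + ms + "ms")); }
}

class RefProcessor {
    // Reference processing only records into the caller-supplied container;
    // it knows nothing about when, how, or at what indentation it is printed.
    void processReferences(PhaseTimes times) {
        times.record("SoftReference", 0.3);  // placeholder timings
        times.record("WeakReference", 0.2);
    }
}

class Collector {
    public static void main(String[] args) {
        PhaseTimes times = new PhaseTimes();   // stack-allocated in the C++ case
        new RefProcessor().processReferences(times);
        times.print();                         // caller decides: print now or defer
    }
}

The point being that processing only fills the container, and each caller decides whether to print immediately or hand it off for deferred printing.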
The collectors that print immediately probably also can get away with a stack-allocated local ReferenceProcessorPhaseTimes, which somewhat simplifies their lifecycle management. - could you please tighten the visibility of the ReferenceProcessorPhaseTimes methods a bit? The getters of that class are only ever used in the print* methods, and even some of these print* methods are only ever called from class-local methods. I think this would drastically decrease the surface of that class. - there seems to be a bug in printing per-thread per-phase worker times; the values seem to contain the absolute time at which the list has been processed, not a duration (with -XX:+ParallelRefProcEnabled and gc+phases+ref=trace): [1512.286s][debug][gc,phases,ref] GC(834) Reference Processing: 2.5ms [1512.286s][debug][gc,phases,ref] GC(834) SoftReference: 0.3ms [1512.286s][debug][gc,phases,ref] GC(834) Balance queues: 0.0ms [1512.286s][debug][gc,phases,ref] GC(834) Phase1: 0.3ms [1512.286s][trace][gc,phases,ref] GC(834) Process lists (ms) Min: 1512283.9, Avg: 1512283.9, Max: 1512283.9, Diff: 0.0, Sum: 34782529.1, Workers: 23 [1512.286s][debug][gc,phases,ref] GC(834) Phase2: 0.3ms [1512.286s][trace][gc,phases,ref] GC(834) Process lists (ms) Min: 1512284.2, Avg: 1512284.2, Max: 1512284.2, Diff: 0.0, Sum: 34782535.9, Workers: 23 - in referenceProcessorPhaseTimes.cpp:35: the code reads if (_worker_time != NULL) { ... } with _worker_time being set to NULL just one line above (same with the other constructor). Not sure. - in RefProcWorkerTimeTracker::~RefProcWorkerTimeTracker: how is it possible that _worker_time is NULL? ReferenceProcessorPhaseTimes seems to always allocate memory for it. - RefProcPhaseTimesTracker takes the DiscoveredList array as a parameter, but only ever uses it to determine how many total entries this DiscoveredList[] has. So it seems to me that it would be better, in the name of information hiding, if the ReferenceProcessor, which already has a total_count() method, would just pass this total instead of the entire list. This would also remove the need for the max_gc_counts() getter in ReferenceProcessorPhaseTimes afaics too. - "Ref Counts" vs. "Reference Counts" vs. something else in the output of the enqueue phase: I would prefer to not use abbreviations. Since we already mess up the logging output in a big way, we might also just go all the way :P Thanks, Thomas From thomas.schatzl at oracle.com Tue Jun 27 13:34:07 2017 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 27 Jun 2017 15:34:07 +0200 Subject: RFR (XXS): 8183002: Remove unused concurrent parameter in HeapRegionManager::par_iterate Message-ID: <1498570447.2750.9.camel@oracle.com> Hi all, can I have a review for this change that removes an unused parameter in HeapRegionManager, and propagates this change to the callers? I think one Reviewer should be sufficient for this change. CR: https://bugs.openjdk.java.net/browse/JDK-8183002 Webrev: http://cr.openjdk.java.net/~tschatzl/8183002/webrev/ Testing: jprt Thanks, Thomas From thomas.schatzl at oracle.com Tue Jun 27 13:34:08 2017 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 27 Jun 2017 15:34:08 +0200 Subject: RFR (XXS): 8183006: Remove unused IterateOopClosureRegionClosure Message-ID: <1498570448.2750.10.camel@oracle.com> Hi all, subject says it all. I think one Reviewer should be sufficient for this change.
CR: https://bugs.openjdk.java.net/browse/JDK-8183006 Webrev: http://cr.openjdk.java.net/~tschatzl/8183006/webrev/ Testing: local compilation Thanks, Thomas From erik.helin at oracle.com Tue Jun 27 13:44:56 2017 From: erik.helin at oracle.com (Erik Helin) Date: Tue, 27 Jun 2017 15:44:56 +0200 Subject: RFR (XXS): 8183002: Remove unused concurrent parameter in HeapRegionManager::par_iterate In-Reply-To: <1498570447.2750.9.camel@oracle.com> References: <1498570447.2750.9.camel@oracle.com> Message-ID: <79e3a710-7f32-d058-a481-0ecdbf8f3b50@oracle.com> On 06/27/2017 03:34 PM, Thomas Schatzl wrote: > Hi all, > > can I have a review for this change that removes an unused parameter > in HeapRegionManager, and propagates this change to the callers? > > I think one Reviewer should be sufficient for this change. > > CR: > https://bugs.openjdk.java.net/browse/JDK-8183002 > Webrev: > http://cr.openjdk.java.net/~tschatzl/8183002/webrev/ Thank you Thomas, looks good, Reviewed! Erik > Testing: > jprt > > Thanks, > Thomas > From erik.helin at oracle.com Tue Jun 27 13:45:27 2017 From: erik.helin at oracle.com (Erik Helin) Date: Tue, 27 Jun 2017 15:45:27 +0200 Subject: RFR (XXS): 8183006: Remove unused IterateOopClosureRegionClosure In-Reply-To: <1498570448.2750.10.camel@oracle.com> References: <1498570448.2750.10.camel@oracle.com> Message-ID: <0cf994b8-967b-2dd3-b386-253dd7dcc036@oracle.com> On 06/27/2017 03:34 PM, Thomas Schatzl wrote: > Hi all, > > subject says it all. > > I think one Reviewer should be sufficient for this change. > > CR: > https://bugs.openjdk.java.net/browse/JDK-8183006 > Webrev: > http://cr.openjdk.java.net/~tschatzl/8183006/webrev/ > Testing: > local compilation Looks good, Reviewed. Thanks, Erik > Thanks, > Thomas > From thomas.schatzl at oracle.com Tue Jun 27 13:54:36 2017 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 27 Jun 2017 15:54:36 +0200 Subject: RFR (XXS): 8183002: Remove unused concurrent parameter in HeapRegionManager::par_iterate In-Reply-To: <79e3a710-7f32-d058-a481-0ecdbf8f3b50@oracle.com> References: <1498570447.2750.9.camel@oracle.com> <79e3a710-7f32-d058-a481-0ecdbf8f3b50@oracle.com> Message-ID: <1498571676.2750.12.camel@oracle.com> Hi Erik, On Tue, 2017-06-27 at 15:44 +0200, Erik Helin wrote: > On 06/27/2017 03:34 PM, Thomas Schatzl wrote: > > > > Hi all, > > > > > > [...] > > CR: > > https://bugs.openjdk.java.net/browse/JDK-8183002 > > Webrev: > > http://cr.openjdk.java.net/~tschatzl/8183002/webrev/ > Thank you Thomas, looks good, Reviewed! thanks for your review. Thomas From thomas.schatzl at oracle.com Tue Jun 27 13:54:57 2017 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 27 Jun 2017 15:54:57 +0200 Subject: RFR (XXS): 8183006: Remove unused IterateOopClosureRegionClosure In-Reply-To: <0cf994b8-967b-2dd3-b386-253dd7dcc036@oracle.com> References: <1498570448.2750.10.camel@oracle.com> <0cf994b8-967b-2dd3-b386-253dd7dcc036@oracle.com> Message-ID: <1498571697.2750.13.camel@oracle.com> Hi, On Tue, 2017-06-27 at 15:45 +0200, Erik Helin wrote: > On 06/27/2017 03:34 PM, Thomas Schatzl wrote: > > > > Hi all, > > > > [...] > > > > CR: > > https://bugs.openjdk.java.net/browse/JDK-8183006 > > Webrev: > > http://cr.openjdk.java.net/~tschatzl/8183006/webrev/ > > Testing: > > local compilation > Looks good, Reviewed. thanks for your review.
Thomas From robbin.ehn at oracle.com Tue Jun 27 14:51:45 2017 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Tue, 27 Jun 2017 16:51:45 +0200 Subject: RFR: Parallelize safepoint cleanup In-Reply-To: <6f2c6de7-298b-bf14-ab1f-430c4acd43c9@redhat.com> References: <486b5a72-bef8-4ebc-2729-3fe3aa3ab3b9@oracle.com> <5c80f8df-27c9-f9a9-dc6d-47f9c6019a61@redhat.com> <46ad874e-eb41-7927-265a-40dea92dfe1e@oracle.com> <5711258b-99b0-e06f-ba6e-0b6b55d88345@redhat.com> <0e1e2779-9316-b756-6cc8-e0c8add14a94@oracle.com> <1910961c-11bd-0e86-dd03-4fce66b9969f@redhat.com> <2b466176-b688-53a8-bef9-c7ec2c8c745b@oracle.com> <42872a15-d26c-9798-c6a2-f3f7c945baf7@redhat.com> <5e7c7d00-4acd-bea3-3525-33dbd9159efb@oracle.com> <6f2c6de7-298b-bf14-ab1f-430c4acd43c9@redhat.com> Message-ID: <5cd676de-872d-6d4a-691b-da561173f7d0@oracle.com> Hi Roman, There is something wrong in calculations: INFO: Deflate: InCirc=43 InUse=18 Scavenged=25 ForceMonitorScavenge=0 : pop=27051 free=215487 free is larger than population, have not had the time to dig into this. Thanks, Robbin On 06/22/2017 10:19 PM, Roman Kennke wrote: > So here's the latest iteration of that patch: > > http://cr.openjdk.java.net/~rkennke/8180932/webrev.08/ > > > I checked and fixed all the counters. The problem here is that they are > not updated in a single place (deflate_idle_monitors() ) but in several > places, potentially by multiple threads. I split up deflation into > prepare_.. and a finish_.. methods to initialize local and update global > counters respectively, and pass around a counters object (allocated on > stack) to the various code paths that use it. Updating the counters > always happen under a lock, there's no need to do anything special with > regards to concurrency. > > I also checked the nmethod marking, but there doesn't seem to be > anything in that code that looks problematic under concurrency. The > worst that can happen is that two threads write the same value into an > nmethod field. I think we can live with that ;-) > > Good to go? > > Tested by running specjvm and jcstress fastdebug+release without issues. > > Roman > > Am 02.06.2017 um 12:39 schrieb Robbin Ehn: >> Hi Roman, >> >> On 06/02/2017 11:41 AM, Roman Kennke wrote: >>> Hi David, >>> thanks for reviewing. I'll be on vacation the next two weeks too, with >>> only sporadic access to work stuff. >>> Yes, exposure will not be as good as otherwise, but it's not totally >>> untested either: the serial code path is the same as the parallel, the >>> only difference is that it's not actually called by multiple threads. >>> It's ok I think. >>> >>> I found two more issues that I think should be addressed: >>> - There are some counters in deflate_idle_monitors() and I'm not sure I >>> correctly handle them in the split-up and MT'ed thread-local/ global >>> list deflation >>> - nmethod marking seems to unconditionally poke true or something like >>> that in nmethod fields. This doesn't hurt correctness-wise, but it's >>> probably worth checking if it's already true, especially when doing this >>> with multiple threads concurrently. >>> >>> I'll send an updated patch around later, I hope I can get to it today... >> >> I'll review that when you get it out. >> I think this looks as a reasonable step before we tackle this with a >> major effort, such as the JEP you and Carsten doing. >> And another effort to 'fix' nmethods marking. >> >> Internal discussion yesterday lead us to conclude that the runtime >> will probably need more threads. 
>> This would be a good driver to do a 'global' worker pool which serves >> both gc, runtime and safepoints with threads. >> >>> >>> Roman >>> >>>> Hi Roman, >>>> >>>> I am about to disappear on an extended vacation so will let others >>>> pursue this. IIUC this is longer an opt-in by the user at runtime, but >>>> an opt-in by the particular GC developers. Okay. My only concern with >>>> that is if Shenandoah is the only GC that currently opts in then this >>>> code is not going to get much testing and will be more prone to >>>> incidental breakage. >> >> As I mentioned before, it seem like Erik ? have some idea, maybe he >> can do this after his barrier patch. >> >> Thanks! >> >> /Robbin >> >>>> >>>> Cheers, >>>> David >>>> >>>> On 2/06/2017 2:21 AM, Roman Kennke wrote: >>>>> Am 01.06.2017 um 17:50 schrieb Roman Kennke: >>>>>> Am 01.06.2017 um 14:18 schrieb Robbin Ehn: >>>>>>> Hi Roman, >>>>>>> >>>>>>> On 06/01/2017 11:29 AM, Roman Kennke wrote: >>>>>>>> Am 31.05.2017 um 22:06 schrieb Robbin Ehn: >>>>>>>>> Hi Roman, I agree that is really needed but: >>>>>>>>> >>>>>>>>> On 05/31/2017 10:27 AM, Roman Kennke wrote: >>>>>>>>>> I realized that sharing workers with GC is not so easy. >>>>>>>>>> >>>>>>>>>> We need to be able to use the workers at a safepoint during >>>>>>>>>> concurrent >>>>>>>>>> GC work (which also uses the same workers). This does not only >>>>>>>>>> require >>>>>>>>>> that those workers be suspended, like e.g. >>>>>>>>>> SuspendibleThreadSet::yield(), but they need to be idle, i.e. >>>>>>>>>> have >>>>>>>>>> finished their tasks. This needs some careful handling to work >>>>>>>>>> without >>>>>>>>>> races: it requires a SuspendibleThreadSetJoiner around the >>>>>>>>>> corresponding >>>>>>>>>> run_task() call and also the tasks themselves need to join the >>>>>>>>>> STS and >>>>>>>>>> handle requests for safepoints not by yielding, but by leaving >>>>>>>>>> the >>>>>>>>>> task. >>>>>>>>>> This is far too peculiar for me to make the call to hook up GC >>>>>>>>>> workers >>>>>>>>>> for safepoint cleanup, and I thus removed those parts. I left the >>>>>>>>>> API in >>>>>>>>>> CollectedHeap in place. I think GC devs who know better about G1 >>>>>>>>>> and CMS >>>>>>>>>> should make that call, or else just use a separate thread pool. >>>>>>>>>> >>>>>>>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.05/ >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Is it ok now? >>>>>>>>> I still think you should put the "Parallel Safepoint Cleanup" >>>>>>>>> workers >>>>>>>>> inside Shenandoah, >>>>>>>>> so the SafepointSynchronizer only calls get_safepoint_workers, >>>>>>>>> e.g.: >>>>>>>>> >>>>>>>>> _cleanup_workers = heap->get_safepoint_workers(); >>>>>>>>> _num_cleanup_workers = _cleanup_workers != NULL ? >>>>>>>>> _cleanup_workers->total_workers() : 1; >>>>>>>>> ParallelSPCleanupTask cleanup(_cleanup_subtasks); >>>>>>>>> StrongRootsScope srs(_num_cleanup_workers); >>>>>>>>> if (_cleanup_workers != NULL) { >>>>>>>>> _cleanup_workers->run_task(&cleanup, _num_cleanup_workers); >>>>>>>>> } else { >>>>>>>>> cleanup.work(0); >>>>>>>>> } >>>>>>>>> >>>>>>>>> That way you don't even need your new flags, but it will be up to >>>>>>>>> the >>>>>>>>> other GCs to make their worker available >>>>>>>>> or cheat with a separate workgang. >>>>>>>> I can do that, I don't mind. The question is, do we want that? >>>>>>> The problem is that we do not want to haste such decision, we >>>>>>> believe >>>>>>> there is a better solution. >>>>>>> I think you also would want another solution. 
>>>>>>> But it's seems like such solution with 1 'global' thread pool either >>>>>>> own by GC or the VM it self is quite the undertaking. >>>>>>> Since this probably will not be done any time soon my suggestion is, >>>>>>> to not hold you back (we also want this), just to make >>>>>>> the code parallel and as an intermediate step ask the GC if it minds >>>>>>> sharing it's thread. >>>>>>> >>>>>>> Now when Shenandoah is merged it's possible that e.g. G1 will share >>>>>>> the code for a separate thread pool, do something of it's own or >>>>>>> wait until the bigger question about thread pool(s) have been >>>>>>> resolved. >>>>>>> >>>>>>> By adding a thread pool directly to the SafepointSynchronizer and >>>>>>> flags for it we might limit our future options. >>>>>>> >>>>>>>> I wouldn't call it 'cheating with a separate workgang' though. I >>>>>>>> see >>>>>>>> that both G1 and CMS suspend their worker threads at a safepoint. >>>>>>>> However: >>>>>>> Yes it's not cheating but I want decent heuristics between e.g. >>>>>>> number >>>>>>> of concurrent marking threads and parallel safepoint threads since >>>>>>> they compete for cpu time. >>>>>>> As the code looks now, I think that decisions must be made by the >>>>>>> GC. >>>>>> Ok, I see your point. I updated the proposed patch accordingly: >>>>>> >>>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.06/ >>>>>> >>>>> Oops. Minor mistake there. Correction: >>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.07/ >>>>> >>>>> >>>>> (Removed 'class WorkGang' from safepoint.hpp, and forgot to add it >>>>> into >>>>> collectedHeap.hpp, resulting in build failure...) >>>>> >>>>> Roman >>>>> >>> > From alexander.harlap at oracle.com Tue Jun 27 16:28:31 2017 From: alexander.harlap at oracle.com (Alexander Harlap) Date: Tue, 27 Jun 2017 12:28:31 -0400 Subject: Need sponsor to push attached 8178507 into jdk10/hs/hostspt Message-ID: <1fa400ef-1dcd-d43b-2646-053c68b0ab1f@oracle.com> I need a sponsor to push attached 8178507.patch - co-locate nsk.regression.gc tests. Patch should go into jdk10/hs/hotspot Reviewed by Leonid Mesnik and Igor Ignatiev Thank you, Alex -------------- next part -------------- # HG changeset patch # User aharlap # Date 1498580135 14400 # Node ID f8228472bcdc7ba4b184a3b7e9f5f571e95fe8b4 # Parent 7d3478491210390556a9f34210bc9bc8d9f5ebd1 8178507: co-locate nsk.regression.gc tests Summary: convert four tonga tests into jtreg Reviewed-by: lmesnik, iignatyev diff -r 7d3478491210 -r f8228472bcdc make/test/JtregNative.gmk --- a/make/test/JtregNative.gmk Tue Jun 27 12:27:27 2017 +0000 +++ b/make/test/JtregNative.gmk Tue Jun 27 12:15:35 2017 -0400 @@ -45,6 +45,7 @@ BUILD_HOTSPOT_JTREG_NATIVE_SRC += \ $(HOTSPOT_TOPDIR)/test/gc/g1/TestJNIWeakG1 \ $(HOTSPOT_TOPDIR)/test/gc/stress/gclocker \ + $(HOTSPOT_TOPDIR)/test/gc/cslocker \ $(HOTSPOT_TOPDIR)/test/native_sanity \ $(HOTSPOT_TOPDIR)/test/runtime/jni/8025979 \ $(HOTSPOT_TOPDIR)/test/runtime/jni/8033445 \ diff -r 7d3478491210 -r f8228472bcdc test/gc/TestFullGCALot.java --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test/gc/TestFullGCALot.java Tue Jun 27 12:15:35 2017 -0400 @@ -0,0 +1,38 @@ +/* + * Copyright (c) 2017 Oracle and/or its affiliates. All rights reserved. + * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. + * + * This code is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 only, as + * published by the Free Software Foundation. 
+ * + * This code is distributed in the hope that it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License + * version 2 for more details (a copy is included in the LICENSE file that + * accompanied this code). + * + * You should have received a copy of the GNU General Public License version + * 2 along with this work; if not, write to the Free Software Foundation, + * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA. + * + * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA + * or visit www.oracle.com if you need additional information or have any + * questions. + */ + +/* + * @test TestFullGCALot + * @key gc + * @bug 4187687 + * @summary Ensure no acess violation when using FullGCALot + * @run main/othervm -XX:+FullGCALot TestFullGCALot + */ + +public class TestFullGCALot { + + public static void main(String argv[]) { + System.out.println("Hello world!"); + } +} + diff -r 7d3478491210 -r f8228472bcdc test/gc/TestMemoryInitialization.java --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test/gc/TestMemoryInitialization.java Tue Jun 27 12:15:35 2017 -0400 @@ -0,0 +1,48 @@ + +/* + * Copyright (c) 2002, 2017 Oracle and/or its affiliates. All rights reserved. + * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. + * + * This code is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 only, as + * published by the Free Software Foundation. + * + * This code is distributed in the hope that it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License + * version 2 for more details (a copy is included in the LICENSE file that + * accompanied this code). + * + * You should have received a copy of the GNU General Public License version + * 2 along with this work; if not, write to the Free Software Foundation, + * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA. + * + * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA + * or visit www.oracle.com if you need additional information or have any + * questions. + */ + +/* + * @test TestMemoryInitialization + * @key gc + * @bug 4668531 + * @summary Simple test for -XX:+CheckMemoryInitialization doesn't crash VM + * @run main/othervm -XX:+CheckMemoryInitialization TestMemoryInitialization + */ + +public class TestMemoryInitialization { + final static int LOOP_LENGTH = 10; + final static int CHUNK_SIZE = 1500000; + + public static byte[] buffer; + + public static void main(String args[]) { + + for (int i = 0; i < LOOP_LENGTH; i++) { + for (int j = 0; j < LOOP_LENGTH; j++) { + buffer = new byte[CHUNK_SIZE]; + buffer = null; + } + } + } +} diff -r 7d3478491210 -r f8228472bcdc test/gc/TestStackOverflow.java --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test/gc/TestStackOverflow.java Tue Jun 27 12:15:35 2017 -0400 @@ -0,0 +1,60 @@ +/* + * Copyright (c) 2002, 2017 Oracle and/or its affiliates. All rights reserved. + * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. + * + * This code is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 only, as + * published by the Free Software Foundation. 
+ * + * This code is distributed in the hope that it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License + * version 2 for more details (a copy is included in the LICENSE file that + * accompanied this code). + * + * You should have received a copy of the GNU General Public License version + * 2 along with this work; if not, write to the Free Software Foundation, + * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA. + * + * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA + * or visit www.oracle.com if you need additional information or have any + * questions. + */ + +/* + * @test TestStackOverflow + * @key gc + * @bug 4396719 + * @summary Test verifies only that VM doesn't crash but throw expected Error. + * @run main/othervm TestStackOverflow + */ + +public class TestStackOverflow { + final static int LOOP_LENGTH = 1000000; + final static int LOGGING_STEP = 10000; + + public static void main(String args[]) { + Object object = null; + + for (int i = 0; i < LOOP_LENGTH; i++) { + + // Check progress + if (i % LOGGING_STEP == 0) { + System.out.println(i); + } + try { + Object array[] = {object, object, object, object, object}; + object = array; + } catch (OutOfMemoryError e) { + object = null; + System.out.println("Caught OutOfMemoryError."); + return; + } catch (StackOverflowError e) { + object = null; + System.out.println("Caught StackOverflowError."); + return; + } + } + } +} + diff -r 7d3478491210 -r f8228472bcdc test/gc/cslocker/TestCSLocker.java --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test/gc/cslocker/TestCSLocker.java Tue Jun 27 12:15:35 2017 -0400 @@ -0,0 +1,98 @@ +/* + * Copyright (c) 2007, 2017 Oracle and/or its affiliates. All rights reserved. + * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. + * + * This code is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 only, as + * published by the Free Software Foundation. + * + * This code is distributed in the hope that it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License + * version 2 for more details (a copy is included in the LICENSE file that + * accompanied this code). + * + * You should have received a copy of the GNU General Public License version + * 2 along with this work; if not, write to the Free Software Foundation, + * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA. + * + * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA + * or visit www.oracle.com if you need additional information or have any + * questions. + */ + +/* + * @test TestCSLocker + * @key gc + * @bug 6186200 + * @summary This short test check RFE 6186200 changes. One thread locked + * @summary completely in JNI CS, while other is trying to allocate memory + * @summary provoking GC. OOM means FAIL, deadlock means PASS. 
+ * @run main/native/othervm -Xmx256m TestCSLocker + */ + +public class TestCSLocker extends Thread +{ + static int timeout = 5000; + public static void main(String args[]) throws Exception { + long startTime = System.currentTimeMillis(); + + // start garbage producer thread + GarbageProducer garbageProducer = new GarbageProducer(1000000, 10); + garbageProducer.start(); + + // start CS locker thread + CSLocker csLocker = new CSLocker(); + csLocker.start(); + + // check timeout to success deadlocking + while(System.currentTimeMillis() < startTime + timeout) { + System.out.println("sleeping..."); + sleep(1000); + } + + csLocker.unlock(); + garbageProducer.interrupt(); + } +} + +class GarbageProducer extends Thread +{ + private int size; + private int sleepTime; + + GarbageProducer(int size, int sleepTime) { + this.size = size; + this.sleepTime = sleepTime; + } + + public void run() { + boolean isRunning = true; + + while (isRunning) { + try { + int[] arr = null; + arr = new int[size]; + sleep(sleepTime); + } catch (InterruptedException e) { + isRunning = false; + } + } + } +} + +class CSLocker extends Thread +{ + static { System.loadLibrary("TestCSLocker"); } + + public void run() { + int[] a = new int[10]; + a[0] = 1; + if (!lock(a)) { + throw new RuntimeException("failed to acquire CSLock"); + } + } + + native boolean lock(int[] array); + native void unlock(); +} diff -r 7d3478491210 -r f8228472bcdc test/gc/cslocker/libTestCSLocker.c --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test/gc/cslocker/libTestCSLocker.c Tue Jun 27 12:15:35 2017 -0400 @@ -0,0 +1,49 @@ +/* + * Copyright (c) 2007, 2017 Oracle and/or its affiliates. All rights reserved. + * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. + * + * This code is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 only, as + * published by the Free Software Foundation. + * + * This code is distributed in the hope that it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License + * version 2 for more details (a copy is included in the LICENSE file that + * accompanied this code). + * + * You should have received a copy of the GNU General Public License version + * 2 along with this work; if not, write to the Free Software Foundation, + * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA. + * + * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA + * or visit www.oracle.com if you need additional information or have any + * questions. 
+ */
+
+#include <jni.h>
+
+static volatile int release_critical = 0;
+
+JNIEXPORT jboolean JNICALL Java_CSLocker_lock
+  (JNIEnv *env, jobject obj, jintArray array)
+{
+    jboolean retval = JNI_TRUE;
+    void *nativeArray = (*env)->GetPrimitiveArrayCritical(env, array, 0);
+
+    if (nativeArray == NULL) {
+        retval = JNI_FALSE;
+    }
+
+    // deadlock on purpose: busy-wait until unlock() sets release_critical
+    while (!release_critical) /* empty */;
+
+    (*env)->ReleasePrimitiveArrayCritical(env, array, nativeArray, 0);
+    return retval;
+}
+
+JNIEXPORT void JNICALL Java_CSLocker_unlock
+  (JNIEnv *env, jobject obj)
+{
+    release_critical = 1;
+}

From sangheon.kim at oracle.com  Tue Jun 27 17:08:09 2017
From: sangheon.kim at oracle.com (sangheon)
Date: Tue, 27 Jun 2017 10:08:09 -0700
Subject: Need sponsor to push attached 8178507 into jdk10/hs/hotspot
In-Reply-To: <1fa400ef-1dcd-d43b-2646-053c68b0ab1f@oracle.com>
References: <1fa400ef-1dcd-d43b-2646-053c68b0ab1f@oracle.com>
Message-ID: <24ade58f-becd-29ba-1775-122b54b42d35@oracle.com>

Hi Alex,

I can sponsor this.

Thanks,
Sangheon

On 06/27/2017 09:28 AM, Alexander Harlap wrote:
> I need a sponsor to push attached 8178507.patch - co-locate
> nsk.regression.gc tests.
>
> Patch should go into jdk10/hs/hotspot
>
> Reviewed by Leonid Mesnik and Igor Ignatiev
>
> Thank you,
>
> Alex
>

From email.sundarms at gmail.com  Tue Jun 27 19:44:51 2017
From: email.sundarms at gmail.com (Sundara Mohan M)
Date: Tue, 27 Jun 2017 12:44:51 -0700
Subject: Any idea why max = -1(-1K) in G1GC
Message-ID: 

When I try to get pool.getUsage() and print it, I am getting

G1 Eden Space
init = 27262976(26624K) used = 0(0K) committed = 0(0K) max = -1(-1K)
G1 Survivor Space
init = 0(0K) used = 0(0K) committed = 0(0K) max = -1(-1K)
G1 Old Gen
init = 241172480(235520K) used = 0(0K) committed = 0(0K) max = 524288000(512000K)

With ConcMarkSweepGC

Par Eden Space
init = 71630848(69952K) used = 0(0K) committed = 0(0K) max = 139853824(136576K)
Par Survivor Space
init = 8912896(8704K) used = 0(0K) committed = 0(0K) max = 17432576(17024K)
CMS Old Gen
init = 178978816(174784K) used = 0(0K) committed = 0(0K) max = 349569024(341376K)

code:
for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
    System.out.println(pool.getUsage());
}

Thanks,
Sundar

From rkennke at redhat.com  Tue Jun 27 19:47:21 2017
From: rkennke at redhat.com (Roman Kennke)
Date: Tue, 27 Jun 2017 21:47:21 +0200
Subject: RFR: Parallelize safepoint cleanup
In-Reply-To: <5cd676de-872d-6d4a-691b-da561173f7d0@oracle.com>
References: <486b5a72-bef8-4ebc-2729-3fe3aa3ab3b9@oracle.com>
 <5c80f8df-27c9-f9a9-dc6d-47f9c6019a61@redhat.com>
 <46ad874e-eb41-7927-265a-40dea92dfe1e@oracle.com>
 <5711258b-99b0-e06f-ba6e-0b6b55d88345@redhat.com>
 <0e1e2779-9316-b756-6cc8-e0c8add14a94@oracle.com>
 <1910961c-11bd-0e86-dd03-4fce66b9969f@redhat.com>
 <2b466176-b688-53a8-bef9-c7ec2c8c745b@oracle.com>
 <42872a15-d26c-9798-c6a2-f3f7c945baf7@redhat.com>
 <5e7c7d00-4acd-bea3-3525-33dbd9159efb@oracle.com>
 <6f2c6de7-298b-bf14-ab1f-430c4acd43c9@redhat.com>
 <5cd676de-872d-6d4a-691b-da561173f7d0@oracle.com>
Message-ID: <61d80e98-275f-b2b8-4ac7-6d5d03b047de@redhat.com>

Hi Robbin,

Ugh. Thanks for catching this.
Problem was that I was accounting the thread-local deflations twice:
once in thread-local processing (basically a leftover from my earlier
attempt to implement this accounting) and then again in
finish_deflate_idle_monitors(). Should be fixed here:

http://cr.openjdk.java.net/~rkennke/8180932/webrev.09/

Side question: which jtreg targets do you usually run?
Trying: make test TEST=hotspot_all gives me *lots* of failures due to missing jcstress stuff (?!) And even other subsets seem to depend on several bits and pieces that I have no idea about. Roman Am 27.06.2017 um 16:51 schrieb Robbin Ehn: > Hi Roman, > > There is something wrong in calculations: > INFO: Deflate: InCirc=43 InUse=18 Scavenged=25 ForceMonitorScavenge=0 > : pop=27051 free=215487 > > free is larger than population, have not had the time to dig into this. > > Thanks, Robbin > > On 06/22/2017 10:19 PM, Roman Kennke wrote: >> So here's the latest iteration of that patch: >> >> http://cr.openjdk.java.net/~rkennke/8180932/webrev.08/ >> >> >> I checked and fixed all the counters. The problem here is that they are >> not updated in a single place (deflate_idle_monitors() ) but in several >> places, potentially by multiple threads. I split up deflation into >> prepare_.. and a finish_.. methods to initialize local and update global >> counters respectively, and pass around a counters object (allocated on >> stack) to the various code paths that use it. Updating the counters >> always happen under a lock, there's no need to do anything special with >> regards to concurrency. >> >> I also checked the nmethod marking, but there doesn't seem to be >> anything in that code that looks problematic under concurrency. The >> worst that can happen is that two threads write the same value into an >> nmethod field. I think we can live with that ;-) >> >> Good to go? >> >> Tested by running specjvm and jcstress fastdebug+release without issues. >> >> Roman >> >> Am 02.06.2017 um 12:39 schrieb Robbin Ehn: >>> Hi Roman, >>> >>> On 06/02/2017 11:41 AM, Roman Kennke wrote: >>>> Hi David, >>>> thanks for reviewing. I'll be on vacation the next two weeks too, with >>>> only sporadic access to work stuff. >>>> Yes, exposure will not be as good as otherwise, but it's not totally >>>> untested either: the serial code path is the same as the parallel, the >>>> only difference is that it's not actually called by multiple threads. >>>> It's ok I think. >>>> >>>> I found two more issues that I think should be addressed: >>>> - There are some counters in deflate_idle_monitors() and I'm not >>>> sure I >>>> correctly handle them in the split-up and MT'ed thread-local/ global >>>> list deflation >>>> - nmethod marking seems to unconditionally poke true or something like >>>> that in nmethod fields. This doesn't hurt correctness-wise, but it's >>>> probably worth checking if it's already true, especially when doing >>>> this >>>> with multiple threads concurrently. >>>> >>>> I'll send an updated patch around later, I hope I can get to it >>>> today... >>> >>> I'll review that when you get it out. >>> I think this looks as a reasonable step before we tackle this with a >>> major effort, such as the JEP you and Carsten doing. >>> And another effort to 'fix' nmethods marking. >>> >>> Internal discussion yesterday lead us to conclude that the runtime >>> will probably need more threads. >>> This would be a good driver to do a 'global' worker pool which serves >>> both gc, runtime and safepoints with threads. >>> >>>> >>>> Roman >>>> >>>>> Hi Roman, >>>>> >>>>> I am about to disappear on an extended vacation so will let others >>>>> pursue this. IIUC this is longer an opt-in by the user at runtime, >>>>> but >>>>> an opt-in by the particular GC developers. Okay. 
My only concern with >>>>> that is if Shenandoah is the only GC that currently opts in then this >>>>> code is not going to get much testing and will be more prone to >>>>> incidental breakage. >>> >>> As I mentioned before, it seem like Erik ? have some idea, maybe he >>> can do this after his barrier patch. >>> >>> Thanks! >>> >>> /Robbin >>> >>>>> >>>>> Cheers, >>>>> David >>>>> >>>>> On 2/06/2017 2:21 AM, Roman Kennke wrote: >>>>>> Am 01.06.2017 um 17:50 schrieb Roman Kennke: >>>>>>> Am 01.06.2017 um 14:18 schrieb Robbin Ehn: >>>>>>>> Hi Roman, >>>>>>>> >>>>>>>> On 06/01/2017 11:29 AM, Roman Kennke wrote: >>>>>>>>> Am 31.05.2017 um 22:06 schrieb Robbin Ehn: >>>>>>>>>> Hi Roman, I agree that is really needed but: >>>>>>>>>> >>>>>>>>>> On 05/31/2017 10:27 AM, Roman Kennke wrote: >>>>>>>>>>> I realized that sharing workers with GC is not so easy. >>>>>>>>>>> >>>>>>>>>>> We need to be able to use the workers at a safepoint during >>>>>>>>>>> concurrent >>>>>>>>>>> GC work (which also uses the same workers). This does not only >>>>>>>>>>> require >>>>>>>>>>> that those workers be suspended, like e.g. >>>>>>>>>>> SuspendibleThreadSet::yield(), but they need to be idle, i.e. >>>>>>>>>>> have >>>>>>>>>>> finished their tasks. This needs some careful handling to work >>>>>>>>>>> without >>>>>>>>>>> races: it requires a SuspendibleThreadSetJoiner around the >>>>>>>>>>> corresponding >>>>>>>>>>> run_task() call and also the tasks themselves need to join the >>>>>>>>>>> STS and >>>>>>>>>>> handle requests for safepoints not by yielding, but by leaving >>>>>>>>>>> the >>>>>>>>>>> task. >>>>>>>>>>> This is far too peculiar for me to make the call to hook up GC >>>>>>>>>>> workers >>>>>>>>>>> for safepoint cleanup, and I thus removed those parts. I >>>>>>>>>>> left the >>>>>>>>>>> API in >>>>>>>>>>> CollectedHeap in place. I think GC devs who know better >>>>>>>>>>> about G1 >>>>>>>>>>> and CMS >>>>>>>>>>> should make that call, or else just use a separate thread pool. >>>>>>>>>>> >>>>>>>>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.05/ >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Is it ok now? >>>>>>>>>> I still think you should put the "Parallel Safepoint Cleanup" >>>>>>>>>> workers >>>>>>>>>> inside Shenandoah, >>>>>>>>>> so the SafepointSynchronizer only calls get_safepoint_workers, >>>>>>>>>> e.g.: >>>>>>>>>> >>>>>>>>>> _cleanup_workers = heap->get_safepoint_workers(); >>>>>>>>>> _num_cleanup_workers = _cleanup_workers != NULL ? >>>>>>>>>> _cleanup_workers->total_workers() : 1; >>>>>>>>>> ParallelSPCleanupTask cleanup(_cleanup_subtasks); >>>>>>>>>> StrongRootsScope srs(_num_cleanup_workers); >>>>>>>>>> if (_cleanup_workers != NULL) { >>>>>>>>>> _cleanup_workers->run_task(&cleanup, >>>>>>>>>> _num_cleanup_workers); >>>>>>>>>> } else { >>>>>>>>>> cleanup.work(0); >>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> That way you don't even need your new flags, but it will be >>>>>>>>>> up to >>>>>>>>>> the >>>>>>>>>> other GCs to make their worker available >>>>>>>>>> or cheat with a separate workgang. >>>>>>>>> I can do that, I don't mind. The question is, do we want that? >>>>>>>> The problem is that we do not want to haste such decision, we >>>>>>>> believe >>>>>>>> there is a better solution. >>>>>>>> I think you also would want another solution. >>>>>>>> But it's seems like such solution with 1 'global' thread pool >>>>>>>> either >>>>>>>> own by GC or the VM it self is quite the undertaking. 
>>>>>>>> Since this probably will not be done any time soon my >>>>>>>> suggestion is, >>>>>>>> to not hold you back (we also want this), just to make >>>>>>>> the code parallel and as an intermediate step ask the GC if it >>>>>>>> minds >>>>>>>> sharing it's thread. >>>>>>>> >>>>>>>> Now when Shenandoah is merged it's possible that e.g. G1 will >>>>>>>> share >>>>>>>> the code for a separate thread pool, do something of it's own or >>>>>>>> wait until the bigger question about thread pool(s) have been >>>>>>>> resolved. >>>>>>>> >>>>>>>> By adding a thread pool directly to the SafepointSynchronizer and >>>>>>>> flags for it we might limit our future options. >>>>>>>> >>>>>>>>> I wouldn't call it 'cheating with a separate workgang' though. I >>>>>>>>> see >>>>>>>>> that both G1 and CMS suspend their worker threads at a safepoint. >>>>>>>>> However: >>>>>>>> Yes it's not cheating but I want decent heuristics between e.g. >>>>>>>> number >>>>>>>> of concurrent marking threads and parallel safepoint threads since >>>>>>>> they compete for cpu time. >>>>>>>> As the code looks now, I think that decisions must be made by the >>>>>>>> GC. >>>>>>> Ok, I see your point. I updated the proposed patch accordingly: >>>>>>> >>>>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.06/ >>>>>>> >>>>>> Oops. Minor mistake there. Correction: >>>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.07/ >>>>>> >>>>>> >>>>>> (Removed 'class WorkGang' from safepoint.hpp, and forgot to add it >>>>>> into >>>>>> collectedHeap.hpp, resulting in build failure...) >>>>>> >>>>>> Roman >>>>>> >>>> >> From thomas.schatzl at oracle.com Wed Jun 28 08:25:57 2017 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Wed, 28 Jun 2017 10:25:57 +0200 Subject: RFR (7xS): 8175554: Improve G1UpdateRSOrPushRefClosure In-Reply-To: <1498128249.2831.38.camel@oracle.com> References: <1491910205.2754.31.camel@oracle.com> <1493986396.2777.61.camel@oracle.com> <3f492193-275d-3d4e-1a91-2d7e07fdaafb@oracle.com> <1497945947.2784.6.camel@oracle.com> <1498128249.2831.38.camel@oracle.com> Message-ID: <1498638357.2874.6.camel@oracle.com> Hi all, ? Erik suggested a few more refactorings: - rename G1ParClosureSuper ->?G1ScanClosureBase - rename a few "oops_in_heap_closure" parameter -> "update_rs_cl" - move instantiation of closures from oops_into_collection_set_do() into scan_rem_set()/update_rem_set() methods. I assume these are the final ones :) Webrevs: http://cr.openjdk.java.net/~tschatzl/8175554/webrev.3/?(full) http://cr.openjdk.java.net/~tschatzl/8175554/webrev.2_to_3/?(diff) Testing: jprt Thanks, ? Thomas On Thu, 2017-06-22 at 12:44 +0200, Thomas Schatzl wrote: > Hi all, > > ? after discussion with Erik, I removed one comment, and renamed the > closures to something that resembles their use. Also I had to > reintroduce the G1ParPushRefClosure removed in the initial patch due > to > performance regressions. > > G1UpdateOrScanRSClosure -> G1ScanObjsDuringUpdateRSClosure > G1ParPushRefClosure -> G1ScanObjsDuringScanRSClosure > G1ParScanClosure -> G1ScanEvacuatedObjClosure > > We also found that the mechanism to collect cards that contain > references into the collection set to not lose any remembered set > entries during update RS if there is an evacuation failure is > basically > superfluous. Other, existing mechanism make sure that all required > remembered sets are (re-)created in other stages of the GC. > > Removal of this code has been decided to be out of scope here. 
> > Webrev: > http://cr.openjdk.java.net/~tschatzl/8175554/webrev.1_to_2/?(diff) > http://cr.openjdk.java.net/~tschatzl/8175554/webrev.2/?(full) > Testing: > jprt, local testing > > Thanks, > ? Thomas > > > On Tue, 2017-06-20 at 10:05 +0200, Thomas Schatzl wrote: > > > > Hi Sangheon, others, > > > > On Tue, 2017-05-30 at 15:15 -0700, sangheon wrote: > > > > > > > > > Hi Thomas, > > > > > > On 05/05/2017 05:13 AM, Thomas Schatzl wrote: > > > > > > > > > > > > > > > > Hi all, > > > > > > > > ???recent reviews have made changes necessary to parts of the > > > > changeset chain. > > > > > > > > Here is a list of links to updated webrevs. Since they have > > > > apparently not been reviewed yet, I simply overwrote the old > > > > webrevs. > > > > > > > > JDK-8177044: Remove _scan_top from HeapRegion > > > > http://cr.openjdk.java.net/~tschatzl/8177044/webrev/ > > > > > > > > JDK-8178148: Log more detailed information about scan rs phase > > > > http://cr.openjdk.java.net/~tschatzl/8178148/webrev/ > > > > > > > > JDK-8175554: Improve G1UpdateRSOrPushRefClosure > > > > http://cr.openjdk.java.net/~tschatzl/8175554/webrev/ > > > Looks good to me. > > > I only have minor nits. > > > > > > ------------------------------------------------------ > > > src/share/vm/gc/g1/g1OopClosures.hpp > > > ???78???virtual void do_oop(oop* p) { do_oop_nv(p); } > > > Misaligned with above line. > > > > > > ------------------------------------------------------ > > > src/share/vm/gc/g1/g1RemSet.hpp > > > ? 204???????????????????G1UpdateOrScanRSClosure* push_heap_cl, > > > Rename to reflect new closure name? > > > > > > ------------------------------------------------------ > > > src/share/vm/gc/g1/g1RootProcessor.hpp > > > Copyright update. > > > > > > ------------------------------------------------------ > > > src/share/vm/gc/g1/g1_specialized_oop_closures.hpp > > > ???45???????f(G1UpdateOrScanRSClosure,_nv)?????????\ > > > Misaligned '\'. > > > > > ? I fixed all this in addition to incorporating ErikD's comments > > that > > asked for factoring out two parts of the G1ParScanClosure and > > G1UpdateOrScanRSClosure that were equal now. > > > > I did some performance testing again due to that, and also found > > that > > the check to filter out non-cross-region references > > in?G1UpdateOrScanRSClosure::do_oop_nv() seemed faster, so I also > > reverted it to the old code. > > > > Also in this change G1UpdateOrScanRSClosure::do_oop_nv() did not > > update > > _has_refs_into_cset as before. Fixed that as well. > > > > Thanks, > > ? Thomas > > From erik.helin at oracle.com Wed Jun 28 08:28:04 2017 From: erik.helin at oracle.com (Erik Helin) Date: Wed, 28 Jun 2017 10:28:04 +0200 Subject: RFR (7xS): 8175554: Improve G1UpdateRSOrPushRefClosure In-Reply-To: <1498638357.2874.6.camel@oracle.com> References: <1491910205.2754.31.camel@oracle.com> <1493986396.2777.61.camel@oracle.com> <3f492193-275d-3d4e-1a91-2d7e07fdaafb@oracle.com> <1497945947.2784.6.camel@oracle.com> <1498128249.2831.38.camel@oracle.com> <1498638357.2874.6.camel@oracle.com> Message-ID: <4eaee6f1-8cad-4f9c-262e-c047db21debc@oracle.com> On 06/28/2017 10:25 AM, Thomas Schatzl wrote: > Hi all, > > Erik suggested a few more refactorings: > > - rename G1ParClosureSuper -> G1ScanClosureBase > - rename a few "oops_in_heap_closure" parameter -> "update_rs_cl" > - move instantiation of closures from oops_into_collection_set_do() > into scan_rem_set()/update_rem_set() methods. 
> > I assume these are the final ones :) > > Webrevs: > http://cr.openjdk.java.net/~tschatzl/8175554/webrev.3/ (full) > http://cr.openjdk.java.net/~tschatzl/8175554/webrev.2_to_3/ (diff) Thank you Thomas, this looks really nice now! Reviewed and ready to go :) Thanks, Erik > Testing: > jprt > > Thanks, > Thomas > > On Thu, 2017-06-22 at 12:44 +0200, Thomas Schatzl wrote: >> Hi all, >> >> after discussion with Erik, I removed one comment, and renamed the >> closures to something that resembles their use. Also I had to >> reintroduce the G1ParPushRefClosure removed in the initial patch due >> to >> performance regressions. >> >> G1UpdateOrScanRSClosure -> G1ScanObjsDuringUpdateRSClosure >> G1ParPushRefClosure -> G1ScanObjsDuringScanRSClosure >> G1ParScanClosure -> G1ScanEvacuatedObjClosure >> >> We also found that the mechanism to collect cards that contain >> references into the collection set to not lose any remembered set >> entries during update RS if there is an evacuation failure is >> basically >> superfluous. Other, existing mechanism make sure that all required >> remembered sets are (re-)created in other stages of the GC. >> >> Removal of this code has been decided to be out of scope here. >> >> Webrev: >> http://cr.openjdk.java.net/~tschatzl/8175554/webrev.1_to_2/ (diff) >> http://cr.openjdk.java.net/~tschatzl/8175554/webrev.2/ (full) >> Testing: >> jprt, local testing >> >> Thanks, >> Thomas >> >> >> On Tue, 2017-06-20 at 10:05 +0200, Thomas Schatzl wrote: >>> >>> Hi Sangheon, others, >>> >>> On Tue, 2017-05-30 at 15:15 -0700, sangheon wrote: >>>> >>>> >>>> Hi Thomas, >>>> >>>> On 05/05/2017 05:13 AM, Thomas Schatzl wrote: >>>>> >>>>> >>>>> >>>>> Hi all, >>>>> >>>>> recent reviews have made changes necessary to parts of the >>>>> changeset chain. >>>>> >>>>> Here is a list of links to updated webrevs. Since they have >>>>> apparently not been reviewed yet, I simply overwrote the old >>>>> webrevs. >>>>> >>>>> JDK-8177044: Remove _scan_top from HeapRegion >>>>> http://cr.openjdk.java.net/~tschatzl/8177044/webrev/ >>>>> >>>>> JDK-8178148: Log more detailed information about scan rs phase >>>>> http://cr.openjdk.java.net/~tschatzl/8178148/webrev/ >>>>> >>>>> JDK-8175554: Improve G1UpdateRSOrPushRefClosure >>>>> http://cr.openjdk.java.net/~tschatzl/8175554/webrev/ >>>> Looks good to me. >>>> I only have minor nits. >>>> >>>> ------------------------------------------------------ >>>> src/share/vm/gc/g1/g1OopClosures.hpp >>>> 78 virtual void do_oop(oop* p) { do_oop_nv(p); } >>>> Misaligned with above line. >>>> >>>> ------------------------------------------------------ >>>> src/share/vm/gc/g1/g1RemSet.hpp >>>> 204 G1UpdateOrScanRSClosure* push_heap_cl, >>>> Rename to reflect new closure name? >>>> >>>> ------------------------------------------------------ >>>> src/share/vm/gc/g1/g1RootProcessor.hpp >>>> Copyright update. >>>> >>>> ------------------------------------------------------ >>>> src/share/vm/gc/g1/g1_specialized_oop_closures.hpp >>>> 45 f(G1UpdateOrScanRSClosure,_nv) \ >>>> Misaligned '\'. >>>> >>> I fixed all this in addition to incorporating ErikD's comments >>> that >>> asked for factoring out two parts of the G1ParScanClosure and >>> G1UpdateOrScanRSClosure that were equal now. >>> >>> I did some performance testing again due to that, and also found >>> that >>> the check to filter out non-cross-region references >>> in G1UpdateOrScanRSClosure::do_oop_nv() seemed faster, so I also >>> reverted it to the old code. 
>>> >>> Also in this change G1UpdateOrScanRSClosure::do_oop_nv() did not >>> update >>> _has_refs_into_cset as before. Fixed that as well. >>> >>> Thanks, >>> Thomas >>> From stefan.johansson at oracle.com Wed Jun 28 08:56:55 2017 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Wed, 28 Jun 2017 10:56:55 +0200 Subject: Any idea why max = -1(-1K) in G1GC In-Reply-To: References: Message-ID: <808e42d4-ad73-e25c-2fd9-955bb3e83d6f@oracle.com> Hi Sundar, I understand that this might be a bit confusing. The -1 means undefined and the reason the max is undefined for Eden and Survivor is that they are logical spaces within the G1 heap. Technically the same is true for the Old Gen, but to not lose information the heap capacity is used as the max for Old Gen. Some more detailed information from comments in the code: g1MemoryPool.hpp 35 // This file contains the three classes that represent the memory0 36 // pools of the G1 spaces: G1EdenPool, G1SurvivorPool, and 37 // G1OldGenPool. In G1, unlike our other GCs, we do not have a 38 // physical space for each of those spaces. Instead, we allocate 39 // regions for all three spaces out of a single pool of regions (that 40 // pool basically covers the entire heap). As a result, the eden, 41 // survivor, and old gen are considered logical spaces in G1, as each 42 // is a set of non-contiguous regions. This is also reflected in the 43 // way we map them to memory pools here. The easiest way to have done 44 // this would have been to map the entire G1 heap to a single memory 45 // pool. However, it's helpful to show how large the eden and survivor 46 // get, as this does affect the performance and behavior of G1. Which 47 // is why we introduce the three memory pools implemented here. 48 // 49 // See comments in g1MonitoringSupport.hpp for additional details 50 // on this model. g1MonitoringSupport.hpp 94 // * Max Capacity 95 // 96 // For jstat, we set the max capacity of all spaces to heap_capacity, 97 // given that we don't always have a reasonable upper bound on how big 98 // each space can grow. For the memory pools, we make the max 99 // capacity undefined with the exception of the old memory pool for 100 // which we make the max capacity same as the max heap capacity. Cheers, Stefan On 2017-06-27 21:44, Sundara Mohan M wrote: > When i try to get pool.getUsage() and print it i am getting > > G1 Eden Space > init = 27262976(26624K) used = 0(0K) committed = 0(0K) max = -1(-1K) > G1 Survivor Space > init = 0(0K) used = 0(0K) committed = 0(0K) max = -1(-1K) > G1 Old Gen > init = 241172480(235520K) used = 0(0K) committed = 0(0K) max = > 524288000(512000K) > > With ConcMarkSweepGC > > Par Eden Space > init = 71630848(69952K) used = 0(0K) committed = 0(0K) max = 139853824(136576K) > Par Survivor Space > init = 8912896(8704K) used = 0(0K) committed = 0(0K) max = 17432576(17024K) > CMS Old Gen > init = 178978816(174784K) used = 0(0K) committed = 0(0K) max = > 349569024(341376K) > > > code > for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) { > System.out.println(pool.getUsage()) > } > > Thanks, > Sundar From stefan.johansson at oracle.com Wed Jun 28 08:59:25 2017 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Wed, 28 Jun 2017 10:59:25 +0200 Subject: RFR (XXS): 8183006: Remove unused IterateOopClosureRegionClosure In-Reply-To: <1498570448.2750.10.camel@oracle.com> References: <1498570448.2750.10.camel@oracle.com> Message-ID: On 2017-06-27 15:34, Thomas Schatzl wrote: > Hi all, > > subject says it all. 
> > I think one Reviewer should be sufficient for this change. > > CR: > https://bugs.openjdk.java.net/browse/JDK-8183006 > Webrev: > http://cr.openjdk.java.net/~tschatzl/8183006/webrev/ Good! Thanks, Stefan > Testing: > local compilation > > Thanks, > Thomas From stefan.johansson at oracle.com Wed Jun 28 09:02:10 2017 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Wed, 28 Jun 2017 11:02:10 +0200 Subject: RFR (XXS): 8183002: Remove unused concurrent parameter in HeapRegionManager::par_iterate In-Reply-To: <1498570447.2750.9.camel@oracle.com> References: <1498570447.2750.9.camel@oracle.com> Message-ID: <452c263b-8d01-21a9-0dc1-5b87170658e4@oracle.com> Hi Thomas, On 2017-06-27 15:34, Thomas Schatzl wrote: > Hi all, > > can I have a review for this change that removes an unused parameter > in HeapRegionManager, and propagating this change to the callers? > > I think one Reviewer should be sufficient for this change. > > CR: > https://bugs.openjdk.java.net/browse/JDK-8183002 > Webrev: > http://cr.openjdk.java.net/~tschatzl/8183002/webrev/ Nice cleanup, looks good! Thanks, Stefan > Testing: > jprt > > Thanks, > Thomas From thomas.schatzl at oracle.com Wed Jun 28 09:14:51 2017 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Wed, 28 Jun 2017 11:14:51 +0200 Subject: RFR (XXS): 8183006: Remove unused IterateOopClosureRegionClosure In-Reply-To: References: <1498570448.2750.10.camel@oracle.com> Message-ID: <1498641291.2874.12.camel@oracle.com> Hi Stefan, On Wed, 2017-06-28 at 10:59 +0200, Stefan Johansson wrote: > On 2017-06-27 15:34, Thomas Schatzl wrote: > > > > Hi all, > > > > ???subject says it all. > > > > I think one Reviewer should be sufficient for this change. > > > > CR: > > https://bugs.openjdk.java.net/browse/JDK-8183006 > > Webrev: > > http://cr.openjdk.java.net/~tschatzl/8183006/webrev/ > Good! > ? thanks for your review ;) Thomas From thomas.schatzl at oracle.com Wed Jun 28 09:15:59 2017 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Wed, 28 Jun 2017 11:15:59 +0200 Subject: RFR (XXS): 8183002: Remove unused concurrent parameter in HeapRegionManager::par_iterate In-Reply-To: <452c263b-8d01-21a9-0dc1-5b87170658e4@oracle.com> References: <1498570447.2750.9.camel@oracle.com> <452c263b-8d01-21a9-0dc1-5b87170658e4@oracle.com> Message-ID: <1498641359.2874.13.camel@oracle.com> Hi Stefan, On Wed, 2017-06-28 at 11:02 +0200, Stefan Johansson wrote: > Hi Thomas, > > On 2017-06-27 15:34, Thomas Schatzl wrote: > > > > Hi all, > > > > ???can I have a review for this change that removes an unused > > parameter > > in HeapRegionManager, and propagating this change to the callers? > > > > I think one Reviewer should be sufficient for this change. > > > > CR: > > https://bugs.openjdk.java.net/browse/JDK-8183002 > > Webrev: > > http://cr.openjdk.java.net/~tschatzl/8183002/webrev/ > Nice cleanup, looks good! ? thanks for your review. Thomas From thomas.schatzl at oracle.com Wed Jun 28 10:34:38 2017 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Wed, 28 Jun 2017 12:34:38 +0200 Subject: RFR (S): 8178151: Clean up G1RemSet files Message-ID: <1498646078.2874.16.camel@oracle.com> Hi all, ? can I have reviews for this small change that is supposed to tighten the interface and clean up documentation for the g1Remset* files including some better naming? [Note: I already sent this out for review two months ago in that big RFR thread with 7 changes. 
However, so much time has been elapsed since then, and everything based on this has been pushed, so I figured it would be simpler to just make an extra RFR request] CR: https://bugs.openjdk.java.net/browse/JDK-8178151 Webrev: http://cr.openjdk.java.net/~tschatzl/8178151/webrev.1/ Testing: jprt Thanks, ? Thomas From erik.helin at oracle.com Wed Jun 28 11:44:38 2017 From: erik.helin at oracle.com (Erik Helin) Date: Wed, 28 Jun 2017 13:44:38 +0200 Subject: RFR (S): 8178151: Clean up G1RemSet files In-Reply-To: <1498646078.2874.16.camel@oracle.com> References: <1498646078.2874.16.camel@oracle.com> Message-ID: <7869f721-8736-cedd-b28d-a5869b409a6a@oracle.com> On 06/28/2017 12:34 PM, Thomas Schatzl wrote: > Hi all, > > can I have reviews for this small change that is supposed to tighten > the interface and clean up documentation for the g1Remset* files > including some better naming? > > [Note: I already sent this out for review two months ago in that big > RFR thread with 7 changes. However, so much time has been elapsed since > then, and everything based on this has been pushed, so I figured it > would be simpler to just make an extra RFR request] > > CR: > https://bugs.openjdk.java.net/browse/JDK-8178151 > Webrev: > http://cr.openjdk.java.net/~tschatzl/8178151/webrev.1/ Looks good, Reviewed. Thanks, Erik > Testing: > jprt > > Thanks, > Thomas > > From stefan.johansson at oracle.com Wed Jun 28 11:57:27 2017 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Wed, 28 Jun 2017 13:57:27 +0200 Subject: RFR (S): 8178151: Clean up G1RemSet files In-Reply-To: <1498646078.2874.16.camel@oracle.com> References: <1498646078.2874.16.camel@oracle.com> Message-ID: <3d6f1488-0958-1ac3-7405-020e11dd9395@oracle.com> Hi Thomas, On 2017-06-28 12:34, Thomas Schatzl wrote: > Hi all, > > can I have reviews for this small change that is supposed to tighten > the interface and clean up documentation for the g1Remset* files > including some better naming? > > [Note: I already sent this out for review two months ago in that big > RFR thread with 7 changes. However, so much time has been elapsed since > then, and everything based on this has been pushed, so I figured it > would be simpler to just make an extra RFR request] > > CR: > https://bugs.openjdk.java.net/browse/JDK-8178151 > Webrev: > http://cr.openjdk.java.net/~tschatzl/8178151/webrev.1/ Looks good. Thanks, Stefan > Testing: > jprt > > Thanks, > Thomas > > From thomas.schatzl at oracle.com Wed Jun 28 12:18:32 2017 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Wed, 28 Jun 2017 14:18:32 +0200 Subject: RFR (S): 8178151: Clean up G1RemSet files In-Reply-To: <3d6f1488-0958-1ac3-7405-020e11dd9395@oracle.com> References: <1498646078.2874.16.camel@oracle.com> <3d6f1488-0958-1ac3-7405-020e11dd9395@oracle.com> Message-ID: <1498652312.2874.22.camel@oracle.com> Hi Stefan, Erik, On Wed, 2017-06-28 at 13:57 +0200, Stefan Johansson wrote: > Hi Thomas, > > On 2017-06-28 12:34, Thomas Schatzl wrote: > > > > Hi all, > > > > ???can I have reviews for this small change that is supposed to > > tighten > > the interface and clean up documentation for the g1Remset* files > > including some better naming? > > > > [Note: I already sent this out for review two months ago in that > > big RFR thread with 7 changes. 
However, so much time has been > > elapsed since then, and everything based on this has been pushed, > > so I figured it would be simpler to just make an extra RFR request] > > > > CR: > > https://bugs.openjdk.java.net/browse/JDK-8178151 > > Webrev: > > http://cr.openjdk.java.net/~tschatzl/8178151/webrev.1/ > Looks good. ? thanks for the reviews. Since this is a renaming/brushing up code quality only changeset I will push asap (and it's been hanging around out for review for >8 weeks). Thanks, ? Thomas From erik.helin at oracle.com Wed Jun 28 12:26:53 2017 From: erik.helin at oracle.com (Erik Helin) Date: Wed, 28 Jun 2017 14:26:53 +0200 Subject: RFR: Rename RefineRecordRefsIntoCSCardTableEntryClosure to G1RefineCardClosure Message-ID: <9c02a3e8-fa01-8329-3b01-011971ebcf94@oracle.com> Hi all, please see the below inlined patch that just renames RefineRecordRefsIntoCSCardTableEntryClosure to more sensible G1RefineCardClosure. Bug: https://bugs.openjdk.java.net/browse/JDK-8183122 Testing: make hotspot Thanks, Erik # HG changeset patch # User ehelin # Date 1498652248 -7200 # Wed Jun 28 14:17:28 2017 +0200 # Node ID f6b845d54277ff9232578fee4ba9f80c85aab0ac # Parent 46d3ce319f37d2996fb0393a4f54f7759148bd1d 8183122: Rename RefineRecordRefsIntoCSCardTableEntryClosure to G1RefineCardClosure diff -r 46d3ce319f37 -r f6b845d54277 src/share/vm/gc/g1/g1RemSet.cpp --- a/src/share/vm/gc/g1/g1RemSet.cpp Wed Jun 28 12:11:55 2017 +0200 +++ b/src/share/vm/gc/g1/g1RemSet.cpp Wed Jun 28 14:17:28 2017 +0200 @@ -438,15 +438,14 @@ // Closure used for updating RSets and recording references that // point into the collection set. Only called during an // evacuation pause. - -class RefineRecordRefsIntoCSCardTableEntryClosure: public CardTableEntryClosure { +class G1RefineCardClosure: public CardTableEntryClosure { G1RemSet* _g1rs; DirtyCardQueue* _into_cset_dcq; G1ScanObjsDuringUpdateRSClosure* _update_rs_cl; public: - RefineRecordRefsIntoCSCardTableEntryClosure(G1CollectedHeap* g1h, - DirtyCardQueue* into_cset_dcq, - G1ScanObjsDuringUpdateRSClosure* update_rs_cl) : + G1RefineCardClosure(G1CollectedHeap* g1h, + DirtyCardQueue* into_cset_dcq, + G1ScanObjsDuringUpdateRSClosure* update_rs_cl) : _g1rs(g1h->g1_rem_set()), _into_cset_dcq(into_cset_dcq), _update_rs_cl(update_rs_cl) {} @@ -474,16 +473,16 @@ G1ParScanThreadState* pss, uint worker_i) { G1ScanObjsDuringUpdateRSClosure update_rs_cl(_g1, pss, worker_i); - RefineRecordRefsIntoCSCardTableEntryClosure into_cset_update_rs_cl(_g1, into_cset_dcq, &update_rs_cl); + G1RefineCardClosure refine_card_cl(_g1, into_cset_dcq, &update_rs_cl); G1GCParPhaseTimesTracker x(_g1p->phase_times(), G1GCPhaseTimes::UpdateRS, worker_i); if (G1HotCardCache::default_use_cache()) { // Apply the closure to the entries of the hot card cache. G1GCParPhaseTimesTracker y(_g1p->phase_times(), G1GCPhaseTimes::ScanHCC, worker_i); - _g1->iterate_hcc_closure(&into_cset_update_rs_cl, worker_i); + _g1->iterate_hcc_closure(&refine_card_cl, worker_i); } // Apply the closure to all remaining log entries. 
- _g1->iterate_dirty_card_closure(&into_cset_update_rs_cl, worker_i); + _g1->iterate_dirty_card_closure(&refine_card_cl, worker_i); } void G1RemSet::cleanupHRRS() { From thomas.schatzl at oracle.com Wed Jun 28 12:41:13 2017 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Wed, 28 Jun 2017 14:41:13 +0200 Subject: RFR: 8183122: Rename RefineRecordRefsIntoCSCardTableEntryClosure to G1RefineCardClosure In-Reply-To: <9c02a3e8-fa01-8329-3b01-011971ebcf94@oracle.com> References: <9c02a3e8-fa01-8329-3b01-011971ebcf94@oracle.com> Message-ID: <1498653673.2874.25.camel@oracle.com> On Wed, 2017-06-28 at 14:26 +0200, Erik Helin wrote: > Hi all, > > please see the below inlined patch that just renames? > RefineRecordRefsIntoCSCardTableEntryClosure to more sensible? > G1RefineCardClosure. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8183122 > Testing: make hotspot ? it's a bit hard to read (an attachment would have been better imho), but... looks good :) Thomas > > Thanks, > Erik > > # HG changeset patch > # User ehelin > # Date 1498652248 -7200 > #??????Wed Jun 28 14:17:28 2017 +0200 > # Node ID f6b845d54277ff9232578fee4ba9f80c85aab0ac > # Parent??46d3ce319f37d2996fb0393a4f54f7759148bd1d > 8183122: Rename RefineRecordRefsIntoCSCardTableEntryClosure to? > G1RefineCardClosure > > diff -r 46d3ce319f37 -r f6b845d54277 src/share/vm/gc/g1/g1RemSet.cpp > --- a/src/share/vm/gc/g1/g1RemSet.cpp???Wed Jun 28 12:11:55 2017 > +0200 > +++ b/src/share/vm/gc/g1/g1RemSet.cpp???Wed Jun 28 14:17:28 2017 > +0200 > @@ -438,15 +438,14 @@ > ? // Closure used for updating RSets and recording references that > ? // point into the collection set. Only called during an > ? // evacuation pause. > - > -class RefineRecordRefsIntoCSCardTableEntryClosure: public? > CardTableEntryClosure { > +class G1RefineCardClosure: public CardTableEntryClosure { > ????G1RemSet* _g1rs; > ????DirtyCardQueue* _into_cset_dcq; > ????G1ScanObjsDuringUpdateRSClosure* _update_rs_cl; > ? public: > -??RefineRecordRefsIntoCSCardTableEntryClosure(G1CollectedHeap* g1h, > -??????????????????????????????????????????????DirtyCardQueue*? > into_cset_dcq, > -? > G1ScanObjsDuringUpdateRSClosure* update_rs_cl) : > +??G1RefineCardClosure(G1CollectedHeap* g1h, > +??????????????????????DirtyCardQueue* into_cset_dcq, > +??????????????????????G1ScanObjsDuringUpdateRSClosure* update_rs_cl) > : > ??????_g1rs(g1h->g1_rem_set()), _into_cset_dcq(into_cset_dcq),? > _update_rs_cl(update_rs_cl) > ????{} > > @@ -474,16 +473,16 @@ > ????????????????????????????????G1ParScanThreadState* pss, > ????????????????????????????????uint worker_i) { > ????G1ScanObjsDuringUpdateRSClosure update_rs_cl(_g1, pss, worker_i); > -??RefineRecordRefsIntoCSCardTableEntryClosure? > into_cset_update_rs_cl(_g1, into_cset_dcq, &update_rs_cl); > +??G1RefineCardClosure refine_card_cl(_g1, into_cset_dcq, > &update_rs_cl); > > ????G1GCParPhaseTimesTracker x(_g1p->phase_times(),? > G1GCPhaseTimes::UpdateRS, worker_i); > ????if (G1HotCardCache::default_use_cache()) { > ??????// Apply the closure to the entries of the hot card cache. > ??????G1GCParPhaseTimesTracker y(_g1p->phase_times(),? > G1GCPhaseTimes::ScanHCC, worker_i); > -????_g1->iterate_hcc_closure(&into_cset_update_rs_cl, worker_i); > +????_g1->iterate_hcc_closure(&refine_card_cl, worker_i); > ????} > ????// Apply the closure to all remaining log entries. > -??_g1->iterate_dirty_card_closure(&into_cset_update_rs_cl, > worker_i); > +??_g1->iterate_dirty_card_closure(&refine_card_cl, worker_i); > ? } > > ? 
void G1RemSet::cleanupHRRS() { From erik.helin at oracle.com Wed Jun 28 12:59:35 2017 From: erik.helin at oracle.com (Erik Helin) Date: Wed, 28 Jun 2017 14:59:35 +0200 Subject: RFR: Message-ID: <76d67504-cb9e-c42e-cff9-e12085421bf7@oracle.com> Hi all, this small patch removes the class OopsInHeapRegionClosure. OopsInHeapRegionClosure only contains a protected _from field and the public method set_from, and there are only two other classes inheriting from OopsInHeapRegionClosure (G1ScanClosureBase and UpdareRsetDeferred). This patch gets rid of the class OopsInHeapRegionClosure and adds the corresponding field and method to the classes inheriting from OopsInHeapRegionClosure. Patch: http://cr.openjdk.java.net/~ehelin/8183124/00/ Bug: https://bugs.openjdk.java.net/browse/JDK-8183124 Testing: make jprt Thanks, Erik From erik.helin at oracle.com Wed Jun 28 13:30:53 2017 From: erik.helin at oracle.com (Erik Helin) Date: Wed, 28 Jun 2017 15:30:53 +0200 Subject: RFR: 8183124: Remove OopsInHeapRegionClosure In-Reply-To: <76d67504-cb9e-c42e-cff9-e12085421bf7@oracle.com> References: <76d67504-cb9e-c42e-cff9-e12085421bf7@oracle.com> Message-ID: <156d030c-05ef-4773-9485-a391d09eb0f4@oracle.com> ...and now with subject as well :) Erik On 06/28/2017 02:59 PM, Erik Helin wrote: > Hi all, > > this small patch removes the class OopsInHeapRegionClosure. > OopsInHeapRegionClosure only contains a protected _from field and the > public method set_from, and there are only two other classes inheriting > from OopsInHeapRegionClosure (G1ScanClosureBase and UpdareRsetDeferred). > > This patch gets rid of the class OopsInHeapRegionClosure and adds the > corresponding field and method to the classes inheriting from > OopsInHeapRegionClosure. > > Patch: http://cr.openjdk.java.net/~ehelin/8183124/00/ > > Bug: https://bugs.openjdk.java.net/browse/JDK-8183124 > > Testing: make jprt > > Thanks, > Erik From stefan.johansson at oracle.com Wed Jun 28 13:36:36 2017 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Wed, 28 Jun 2017 15:36:36 +0200 Subject: RFR: 8183124: Remove OopsInHeapRegionClosure In-Reply-To: <156d030c-05ef-4773-9485-a391d09eb0f4@oracle.com> References: <76d67504-cb9e-c42e-cff9-e12085421bf7@oracle.com> <156d030c-05ef-4773-9485-a391d09eb0f4@oracle.com> Message-ID: On 2017-06-28 15:30, Erik Helin wrote: > ...and now with subject as well :) > > Erik > > On 06/28/2017 02:59 PM, Erik Helin wrote: >> Hi all, >> >> this small patch removes the class OopsInHeapRegionClosure. >> OopsInHeapRegionClosure only contains a protected _from field and the >> public method set_from, and there are only two other classes inheriting >> from OopsInHeapRegionClosure (G1ScanClosureBase and UpdareRsetDeferred). >> >> This patch gets rid of the class OopsInHeapRegionClosure and adds the >> corresponding field and method to the classes inheriting from >> OopsInHeapRegionClosure. 
>> >> Patch: http://cr.openjdk.java.net/~ehelin/8183124/00/ >> Looks good, StefanJ >> Bug: https://bugs.openjdk.java.net/browse/JDK-8183124 >> >> Testing: make jprt >> >> Thanks, >> Erik From stefan.johansson at oracle.com Wed Jun 28 13:39:10 2017 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Wed, 28 Jun 2017 15:39:10 +0200 Subject: RFR: Rename RefineRecordRefsIntoCSCardTableEntryClosure to G1RefineCardClosure In-Reply-To: <9c02a3e8-fa01-8329-3b01-011971ebcf94@oracle.com> References: <9c02a3e8-fa01-8329-3b01-011971ebcf94@oracle.com> Message-ID: <784fc363-85e2-fb2a-2fcc-5f62e9c58d5c@oracle.com> Good, StefanJ On 2017-06-28 14:26, Erik Helin wrote: > Hi all, > > please see the below inlined patch that just renames > RefineRecordRefsIntoCSCardTableEntryClosure to more sensible > G1RefineCardClosure. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8183122 > Testing: make hotspot > > Thanks, > Erik > > # HG changeset patch > # User ehelin > # Date 1498652248 -7200 > # Wed Jun 28 14:17:28 2017 +0200 > # Node ID f6b845d54277ff9232578fee4ba9f80c85aab0ac > # Parent 46d3ce319f37d2996fb0393a4f54f7759148bd1d > 8183122: Rename RefineRecordRefsIntoCSCardTableEntryClosure to > G1RefineCardClosure > > diff -r 46d3ce319f37 -r f6b845d54277 src/share/vm/gc/g1/g1RemSet.cpp > --- a/src/share/vm/gc/g1/g1RemSet.cpp Wed Jun 28 12:11:55 2017 +0200 > +++ b/src/share/vm/gc/g1/g1RemSet.cpp Wed Jun 28 14:17:28 2017 +0200 > @@ -438,15 +438,14 @@ > // Closure used for updating RSets and recording references that > // point into the collection set. Only called during an > // evacuation pause. > - > -class RefineRecordRefsIntoCSCardTableEntryClosure: public > CardTableEntryClosure { > +class G1RefineCardClosure: public CardTableEntryClosure { > G1RemSet* _g1rs; > DirtyCardQueue* _into_cset_dcq; > G1ScanObjsDuringUpdateRSClosure* _update_rs_cl; > public: > - RefineRecordRefsIntoCSCardTableEntryClosure(G1CollectedHeap* g1h, > - DirtyCardQueue* > into_cset_dcq, > - G1ScanObjsDuringUpdateRSClosure* update_rs_cl) : > + G1RefineCardClosure(G1CollectedHeap* g1h, > + DirtyCardQueue* into_cset_dcq, > + G1ScanObjsDuringUpdateRSClosure* update_rs_cl) : > _g1rs(g1h->g1_rem_set()), _into_cset_dcq(into_cset_dcq), > _update_rs_cl(update_rs_cl) > {} > > @@ -474,16 +473,16 @@ > G1ParScanThreadState* pss, > uint worker_i) { > G1ScanObjsDuringUpdateRSClosure update_rs_cl(_g1, pss, worker_i); > - RefineRecordRefsIntoCSCardTableEntryClosure > into_cset_update_rs_cl(_g1, into_cset_dcq, &update_rs_cl); > + G1RefineCardClosure refine_card_cl(_g1, into_cset_dcq, &update_rs_cl); > > G1GCParPhaseTimesTracker x(_g1p->phase_times(), > G1GCPhaseTimes::UpdateRS, worker_i); > if (G1HotCardCache::default_use_cache()) { > // Apply the closure to the entries of the hot card cache. > G1GCParPhaseTimesTracker y(_g1p->phase_times(), > G1GCPhaseTimes::ScanHCC, worker_i); > - _g1->iterate_hcc_closure(&into_cset_update_rs_cl, worker_i); > + _g1->iterate_hcc_closure(&refine_card_cl, worker_i); > } > // Apply the closure to all remaining log entries. 
> - _g1->iterate_dirty_card_closure(&into_cset_update_rs_cl, worker_i); > + _g1->iterate_dirty_card_closure(&refine_card_cl, worker_i); > } > > void G1RemSet::cleanupHRRS() { From erik.helin at oracle.com Wed Jun 28 14:19:01 2017 From: erik.helin at oracle.com (Erik Helin) Date: Wed, 28 Jun 2017 16:19:01 +0200 Subject: RFR: 8183122: Rename RefineRecordRefsIntoCSCardTableEntryClosure to G1RefineCardClosure In-Reply-To: <1498653673.2874.25.camel@oracle.com> References: <9c02a3e8-fa01-8329-3b01-011971ebcf94@oracle.com> <1498653673.2874.25.camel@oracle.com> Message-ID: On 06/28/2017 02:41 PM, Thomas Schatzl wrote: > On Wed, 2017-06-28 at 14:26 +0200, Erik Helin wrote: >> Hi all, >> >> please see the below inlined patch that just renames >> RefineRecordRefsIntoCSCardTableEntryClosure to more sensible >> G1RefineCardClosure. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8183122 >> Testing: make hotspot > > it's a bit hard to read (an attachment would have been better imho), > but... looks good :) Yeah, I wasn't sure if this list accepted attachments :/ Anyways, thanks for reviewing! Erik > Thomas > >> >> Thanks, >> Erik >> >> # HG changeset patch >> # User ehelin >> # Date 1498652248 -7200 >> # Wed Jun 28 14:17:28 2017 +0200 >> # Node ID f6b845d54277ff9232578fee4ba9f80c85aab0ac >> # Parent 46d3ce319f37d2996fb0393a4f54f7759148bd1d >> 8183122: Rename RefineRecordRefsIntoCSCardTableEntryClosure to >> G1RefineCardClosure >> >> diff -r 46d3ce319f37 -r f6b845d54277 src/share/vm/gc/g1/g1RemSet.cpp >> --- a/src/share/vm/gc/g1/g1RemSet.cpp Wed Jun 28 12:11:55 2017 >> +0200 >> +++ b/src/share/vm/gc/g1/g1RemSet.cpp Wed Jun 28 14:17:28 2017 >> +0200 >> @@ -438,15 +438,14 @@ >> // Closure used for updating RSets and recording references that >> // point into the collection set. Only called during an >> // evacuation pause. >> - >> -class RefineRecordRefsIntoCSCardTableEntryClosure: public >> CardTableEntryClosure { >> +class G1RefineCardClosure: public CardTableEntryClosure { >> G1RemSet* _g1rs; >> DirtyCardQueue* _into_cset_dcq; >> G1ScanObjsDuringUpdateRSClosure* _update_rs_cl; >> public: >> - RefineRecordRefsIntoCSCardTableEntryClosure(G1CollectedHeap* g1h, >> - DirtyCardQueue* >> into_cset_dcq, >> - >> G1ScanObjsDuringUpdateRSClosure* update_rs_cl) : >> + G1RefineCardClosure(G1CollectedHeap* g1h, >> + DirtyCardQueue* into_cset_dcq, >> + G1ScanObjsDuringUpdateRSClosure* update_rs_cl) >> : >> _g1rs(g1h->g1_rem_set()), _into_cset_dcq(into_cset_dcq), >> _update_rs_cl(update_rs_cl) >> {} >> >> @@ -474,16 +473,16 @@ >> G1ParScanThreadState* pss, >> uint worker_i) { >> G1ScanObjsDuringUpdateRSClosure update_rs_cl(_g1, pss, worker_i); >> - RefineRecordRefsIntoCSCardTableEntryClosure >> into_cset_update_rs_cl(_g1, into_cset_dcq, &update_rs_cl); >> + G1RefineCardClosure refine_card_cl(_g1, into_cset_dcq, >> &update_rs_cl); >> >> G1GCParPhaseTimesTracker x(_g1p->phase_times(), >> G1GCPhaseTimes::UpdateRS, worker_i); >> if (G1HotCardCache::default_use_cache()) { >> // Apply the closure to the entries of the hot card cache. >> G1GCParPhaseTimesTracker y(_g1p->phase_times(), >> G1GCPhaseTimes::ScanHCC, worker_i); >> - _g1->iterate_hcc_closure(&into_cset_update_rs_cl, worker_i); >> + _g1->iterate_hcc_closure(&refine_card_cl, worker_i); >> } >> // Apply the closure to all remaining log entries. 
>> - _g1->iterate_dirty_card_closure(&into_cset_update_rs_cl, >> worker_i); >> + _g1->iterate_dirty_card_closure(&refine_card_cl, worker_i); >> } >> >> void G1RemSet::cleanupHRRS() { From erik.helin at oracle.com Wed Jun 28 14:19:26 2017 From: erik.helin at oracle.com (Erik Helin) Date: Wed, 28 Jun 2017 16:19:26 +0200 Subject: RFR: Rename RefineRecordRefsIntoCSCardTableEntryClosure to G1RefineCardClosure In-Reply-To: <784fc363-85e2-fb2a-2fcc-5f62e9c58d5c@oracle.com> References: <9c02a3e8-fa01-8329-3b01-011971ebcf94@oracle.com> <784fc363-85e2-fb2a-2fcc-5f62e9c58d5c@oracle.com> Message-ID: <13cf23c3-ffb9-a507-62aa-901c1483fa77@oracle.com> On 06/28/2017 03:39 PM, Stefan Johansson wrote: > Good, > StefanJ Thanks Stefan! Erik > On 2017-06-28 14:26, Erik Helin wrote: >> Hi all, >> >> please see the below inlined patch that just renames >> RefineRecordRefsIntoCSCardTableEntryClosure to more sensible >> G1RefineCardClosure. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8183122 >> Testing: make hotspot >> >> Thanks, >> Erik >> >> # HG changeset patch >> # User ehelin >> # Date 1498652248 -7200 >> # Wed Jun 28 14:17:28 2017 +0200 >> # Node ID f6b845d54277ff9232578fee4ba9f80c85aab0ac >> # Parent 46d3ce319f37d2996fb0393a4f54f7759148bd1d >> 8183122: Rename RefineRecordRefsIntoCSCardTableEntryClosure to >> G1RefineCardClosure >> >> diff -r 46d3ce319f37 -r f6b845d54277 src/share/vm/gc/g1/g1RemSet.cpp >> --- a/src/share/vm/gc/g1/g1RemSet.cpp Wed Jun 28 12:11:55 2017 +0200 >> +++ b/src/share/vm/gc/g1/g1RemSet.cpp Wed Jun 28 14:17:28 2017 +0200 >> @@ -438,15 +438,14 @@ >> // Closure used for updating RSets and recording references that >> // point into the collection set. Only called during an >> // evacuation pause. >> - >> -class RefineRecordRefsIntoCSCardTableEntryClosure: public >> CardTableEntryClosure { >> +class G1RefineCardClosure: public CardTableEntryClosure { >> G1RemSet* _g1rs; >> DirtyCardQueue* _into_cset_dcq; >> G1ScanObjsDuringUpdateRSClosure* _update_rs_cl; >> public: >> - RefineRecordRefsIntoCSCardTableEntryClosure(G1CollectedHeap* g1h, >> - DirtyCardQueue* >> into_cset_dcq, >> - G1ScanObjsDuringUpdateRSClosure* update_rs_cl) : >> + G1RefineCardClosure(G1CollectedHeap* g1h, >> + DirtyCardQueue* into_cset_dcq, >> + G1ScanObjsDuringUpdateRSClosure* update_rs_cl) : >> _g1rs(g1h->g1_rem_set()), _into_cset_dcq(into_cset_dcq), >> _update_rs_cl(update_rs_cl) >> {} >> >> @@ -474,16 +473,16 @@ >> G1ParScanThreadState* pss, >> uint worker_i) { >> G1ScanObjsDuringUpdateRSClosure update_rs_cl(_g1, pss, worker_i); >> - RefineRecordRefsIntoCSCardTableEntryClosure >> into_cset_update_rs_cl(_g1, into_cset_dcq, &update_rs_cl); >> + G1RefineCardClosure refine_card_cl(_g1, into_cset_dcq, &update_rs_cl); >> >> G1GCParPhaseTimesTracker x(_g1p->phase_times(), >> G1GCPhaseTimes::UpdateRS, worker_i); >> if (G1HotCardCache::default_use_cache()) { >> // Apply the closure to the entries of the hot card cache. >> G1GCParPhaseTimesTracker y(_g1p->phase_times(), >> G1GCPhaseTimes::ScanHCC, worker_i); >> - _g1->iterate_hcc_closure(&into_cset_update_rs_cl, worker_i); >> + _g1->iterate_hcc_closure(&refine_card_cl, worker_i); >> } >> // Apply the closure to all remaining log entries. 
>> - _g1->iterate_dirty_card_closure(&into_cset_update_rs_cl, worker_i); >> + _g1->iterate_dirty_card_closure(&refine_card_cl, worker_i); >> } >> >> void G1RemSet::cleanupHRRS() { > From erik.helin at oracle.com Wed Jun 28 14:20:58 2017 From: erik.helin at oracle.com (Erik Helin) Date: Wed, 28 Jun 2017 16:20:58 +0200 Subject: RFR: 8183124: Remove OopsInHeapRegionClosure In-Reply-To: References: <76d67504-cb9e-c42e-cff9-e12085421bf7@oracle.com> <156d030c-05ef-4773-9485-a391d09eb0f4@oracle.com> Message-ID: On 06/28/2017 03:36 PM, Stefan Johansson wrote: > > > On 2017-06-28 15:30, Erik Helin wrote: >> ...and now with subject as well :) >> >> Erik >> >> On 06/28/2017 02:59 PM, Erik Helin wrote: >>> Hi all, >>> >>> this small patch removes the class OopsInHeapRegionClosure. >>> OopsInHeapRegionClosure only contains a protected _from field and the >>> public method set_from, and there are only two other classes inheriting >>> from OopsInHeapRegionClosure (G1ScanClosureBase and UpdareRsetDeferred). >>> >>> This patch gets rid of the class OopsInHeapRegionClosure and adds the >>> corresponding field and method to the classes inheriting from >>> OopsInHeapRegionClosure. >>> >>> Patch: http://cr.openjdk.java.net/~ehelin/8183124/00/ >>> > Looks good, Thanks Stefan, appreciate the quick review! Erik > StefanJ >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8183124 >>> >>> Testing: make jprt >>> >>> Thanks, >>> Erik > From thomas.schatzl at oracle.com Wed Jun 28 14:53:59 2017 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Wed, 28 Jun 2017 16:53:59 +0200 Subject: RFR: 8183124: Remove OopsInHeapRegionClosure In-Reply-To: <156d030c-05ef-4773-9485-a391d09eb0f4@oracle.com> References: <76d67504-cb9e-c42e-cff9-e12085421bf7@oracle.com> <156d030c-05ef-4773-9485-a391d09eb0f4@oracle.com> Message-ID: <1498661639.2755.10.camel@oracle.com> Hi, On Wed, 2017-06-28 at 15:30 +0200, Erik Helin wrote: > ...and now with subject as well :) > > Erik > > On 06/28/2017 02:59 PM, Erik Helin wrote: > > > > Hi all, > > > > this small patch removes the class OopsInHeapRegionClosure. > > OopsInHeapRegionClosure only contains a protected _from field and > > the > > public method set_from, and there are only two other classes > > inheriting > > from OopsInHeapRegionClosure (G1ScanClosureBase and > > UpdareRsetDeferred). > > > > This patch gets rid of the class OopsInHeapRegionClosure and adds > > the > > corresponding field and method to the classes inheriting from > > OopsInHeapRegionClosure. > > > > Patch: http://cr.openjdk.java.net/~ehelin/8183124/00/ > > > > Bug: https://bugs.openjdk.java.net/browse/JDK-8183124 > > > > Testing: make jprt from my POV there are two reasons here: - the additional class only for that field adds more overhead in various aspects than just duplicating the member. - using _from in UpdateRSDeferred is due to current way of checking for cross-region pointers, using the _from value. It kind of saves us from recreating it. However there are better options here that will fix both JDK-8183127 and remove the need for the _from pointer completely. So overall, this change seems good to me. Thanks, ? 
Thomas From erik.helin at oracle.com Wed Jun 28 15:04:50 2017 From: erik.helin at oracle.com (Erik Helin) Date: Wed, 28 Jun 2017 17:04:50 +0200 Subject: RFR: 8183124: Remove OopsInHeapRegionClosure In-Reply-To: <1498661639.2755.10.camel@oracle.com> References: <76d67504-cb9e-c42e-cff9-e12085421bf7@oracle.com> <156d030c-05ef-4773-9485-a391d09eb0f4@oracle.com> <1498661639.2755.10.camel@oracle.com> Message-ID: <69a3e0e0-30a5-77ad-d9c2-dd2c6be95859@oracle.com> On 06/28/2017 04:53 PM, Thomas Schatzl wrote: > Hi, > > On Wed, 2017-06-28 at 15:30 +0200, Erik Helin wrote: >> ...and now with subject as well :) >> >> Erik >> >> On 06/28/2017 02:59 PM, Erik Helin wrote: >>> >>> Hi all, >>> >>> this small patch removes the class OopsInHeapRegionClosure. >>> OopsInHeapRegionClosure only contains a protected _from field and >>> the >>> public method set_from, and there are only two other classes >>> inheriting >>> from OopsInHeapRegionClosure (G1ScanClosureBase and >>> UpdareRsetDeferred). >>> >>> This patch gets rid of the class OopsInHeapRegionClosure and adds >>> the >>> corresponding field and method to the classes inheriting from >>> OopsInHeapRegionClosure. >>> >>> Patch: http://cr.openjdk.java.net/~ehelin/8183124/00/ >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8183124 >>> >>> Testing: make jprt > > from my POV there are two reasons here: > > - the additional class only for that field adds more overhead in > various aspects than just duplicating the member. Yeah, thanks for clarifying. My motivation for this patch was mainly to get rid of the awkward inheritance. Having OopsInHeapRegionClosure is kind of like we would have a ClosureWithG1FieldClosure with a G1CollectedHeap* _g1h field, because many G1 closures has a _g1h field. This sort of code de-duplication is IMO worse than just having the field in multiple closures. > - using _from in UpdateRSDeferred is due to current way of checking for > cross-region pointers, using the _from value. It kind of saves us from > recreating it. However there are better options here that will fix both > JDK-8183127 and remove the need for the _from pointer completely. This is a great idea, we should use HeapRegion::is_in_same_region (this will also make the code more similar to G1ParScanThreadState::update_rs). Thanks, Erik > So overall, this change seems good to me. > > Thanks, > Thomas > From email.sundarms at gmail.com Wed Jun 28 18:54:36 2017 From: email.sundarms at gmail.com (Sundara Mohan M) Date: Wed, 28 Jun 2017 11:54:36 -0700 Subject: Why is G1GC collection usage threshold not updated early? Message-ID: I am trying to estimate the free memory using metrics from MemoryPoolMxBean.getCollectionUsage(). 
I am observing the following behavior with G1GC:

iteration= 0 - G1 Old Gen: u= 0% cu= 0% uth=75% cuth=75%
iteration= 100 - G1 Old Gen: u= 0% cu= 0% uth=75% cuth=75%
iteration= 200 - G1 Old Gen: u= 0% cu= 0% uth=75% cuth=75%
iteration= 300 - G1 Old Gen: u= 0% cu= 0% uth=75% cuth=75%
iteration= 400 - G1 Old Gen: u= 0% cu= 0% uth=75% cuth=75%
iteration= 500 - G1 Old Gen: u= 1% cu= 0% uth=75% cuth=75%
iteration= 600 - G1 Old Gen: u= 4% cu= 0% uth=75% cuth=75%
iteration= 700 - G1 Old Gen: u= 9% cu= 0% uth=75% cuth=75%
iteration= 800 - G1 Old Gen: u= 16% cu= 0% uth=75% cuth=75%
iteration= 900 - G1 Old Gen: u= 25% cu= 0% uth=75% cuth=75%
iteration= 1000 - G1 Old Gen: u= 34% cu= 0% uth=75% cuth=75%
iteration= 1100 - G1 Old Gen: u= 45% cu= 0% uth=75% cuth=75%
iteration= 1200 - G1 Old Gen: u= 38% cu= 0% uth=75% cuth=75%
iteration= 1300 - G1 Old Gen: u= 46% cu= 0% uth=75% cuth=75%
iteration= 1400 - G1 Old Gen: u= 52% cu= 0% uth=75% cuth=75%
iteration= 1500 - G1 Old Gen: u= 45% cu= 0% uth=75% cuth=75%
iteration= 1600 - G1 Old Gen: u= 67% cu= 0% uth=75% cuth=75%
iteration= 1700 - G1 Old Gen: u= 56% cu= 0% uth=75% cuth=75%
iteration= 1800 - G1 Old Gen: u= 55% cu= 0% uth=75% cuth=75%
iteration= 1900 - G1 Old Gen: u= 61% cu= 0% uth=75% cuth=75%
iteration= 2000 - G1 Old Gen: u= 56% cu= 0% uth=75% cuth=75%
iteration= 2100 - G1 Old Gen: u= 76% cu= 0% uth=75% cuth=75%
iteration= 2200 - G1 Old Gen: u= 65% cu= 0% uth=75% cuth=75%
iteration= 2300 - G1 Old Gen: u= 62% cu= 0% uth=75% cuth=75%
iteration= 2400 - G1 Old Gen: u= 75% cu= 0% uth=75% cuth=75%
iteration= 2500 - G1 Old Gen: u= 75% cu= 0% uth=75% cuth=75%
iteration= 2600 - G1 Old Gen: u= 72% cu= 0% uth=75% cuth=75%
iteration= 2700 - G1 Old Gen: u= 69% cu= 0% uth=75% cuth=75%
iteration= 2800 - G1 Old Gen: u= 74% cu= 0% uth=75% cuth=75%
iteration= 2900 - G1 Old Gen: u= 80% cu= 0% uth=75% cuth=75%
iteration= 3000 - G1 Old Gen: u= 83% cu= 0% uth=75% cuth=75%
*iteration= 3100 - G1 Old Gen: u= 89% cu= 0% uth=75% cuth=75%*
*iteration= 3200 - G1 Old Gen: u= 71% cu= 59% uth=75% cuth=75%*
iteration= 3300 - G1 Old Gen: u= 90% cu= 59% uth=75% cuth=75%
iteration= 3400 - G1 Old Gen: u= 76% cu= 62% uth=75% cuth=75%
iteration= 3500 - G1 Old Gen: u= 65% cu= 65% uth=75% cuth=75%

With CMS GC:

iteration= 0 - CMS Old Gen: u= 0% cu= 0% uth=75% cuth=75%
iteration= 100 - CMS Old Gen: u= 0% cu= 0% uth=75% cuth=75%
iteration= 200 - CMS Old Gen: u= 1% cu= 0% uth=75% cuth=75%
iteration= 300 - CMS Old Gen: u= 3% cu= 0% uth=75% cuth=75%
iteration= 400 - CMS Old Gen: u= 12% cu= 0% uth=75% cuth=75%
iteration= 500 - CMS Old Gen: u= 19% cu= 0% uth=75% cuth=75%
iteration= 600 - CMS Old Gen: u= 34% cu= 0% uth=75% cuth=75%
iteration= 700 - CMS Old Gen: u= 43% cu= 0% uth=75% cuth=75%
*iteration= 800 - CMS Old Gen: u= 63% cu= 0% uth=75% cuth=75%*
*iteration= 900 - CMS Old Gen: u= 48% cu= 37% uth=75% cuth=75%*
*iteration= 1000 - CMS Old Gen: u= 60% cu= 37% uth=75% cuth=75%*
iteration= 1100 - CMS Old Gen: u= 58% cu= 45% uth=75% cuth=75%
iteration= 1200 - CMS Old Gen: u= 71% cu= 45% uth=75% cuth=75%
iteration= 1300 - CMS Old Gen: u= 66% cu= 53% uth=75% cuth=75%
iteration= 1400 - CMS Old Gen: u= 80% cu= 53% uth=75% cuth=75%

u = usage (getUsage), cu = collectionUsage (getCollectionUsage),
uth = usage threshold %, cuth = collection usage threshold %

My program just keeps allocating strings and freeing some of them.

1. Why doesn't G1GC update its collection usage until 59%, whereas in
CMS GC it is already updated at 37%? Can someone shed more light on this?
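An aside for readers with the same question: besides polling, the platform MemoryMXBean can push a notification whenever a collection leaves a pool above an armed collection-usage threshold. A hedged sketch — the class name is illustrative, and the 75% arming merely mirrors the cuth above:

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryNotificationInfo;
import java.lang.management.MemoryPoolMXBean;
import javax.management.Notification;
import javax.management.NotificationEmitter;
import javax.management.openmbean.CompositeData;

public class CollectionThresholdWatch {
    public static void main(String[] args) throws InterruptedException {
        // Arm a 75% collection-usage threshold on every pool that both
        // supports one and reports a defined max (G1 eden/survivor do not).
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            long max = pool.getUsage().getMax();
            if (pool.isCollectionUsageThresholdSupported() && max > 0) {
                pool.setCollectionUsageThreshold(max * 3 / 4);
            }
        }
        // MemoryMXBean is a NotificationEmitter; it pushes an event when a
        // GC finishes with a pool still above its armed threshold.
        NotificationEmitter emitter =
                (NotificationEmitter) ManagementFactory.getMemoryMXBean();
        emitter.addNotificationListener((Notification n, Object handback) -> {
            if (MemoryNotificationInfo.MEMORY_COLLECTION_THRESHOLD_EXCEEDED
                    .equals(n.getType())) {
                MemoryNotificationInfo info =
                        MemoryNotificationInfo.from((CompositeData) n.getUserData());
                System.out.println(info.getPoolName() + " after GC: "
                        + info.getUsage());
            }
        }, null, null);
        Thread.sleep(60_000); // keep the JVM alive while allocating elsewhere
    }
}

Both getCollectionUsage() and this notification are refreshed only when the pool's own collector runs, which is presumably part of why cu stays at 0% for G1 Old Gen until a collection of that pool has been reported — though the thread itself leaves the exact trigger open.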
Thanks,
Sundar

From robbin.ehn at oracle.com Wed Jun 28 18:55:57 2017
From: robbin.ehn at oracle.com (Robbin Ehn)
Date: Wed, 28 Jun 2017 20:55:57 +0200
Subject: RFR: Parallelize safepoint cleanup
In-Reply-To: <61d80e98-275f-b2b8-4ac7-6d5d03b047de@redhat.com>
References: <486b5a72-bef8-4ebc-2729-3fe3aa3ab3b9@oracle.com> <5c80f8df-27c9-f9a9-dc6d-47f9c6019a61@redhat.com> <46ad874e-eb41-7927-265a-40dea92dfe1e@oracle.com> <5711258b-99b0-e06f-ba6e-0b6b55d88345@redhat.com> <0e1e2779-9316-b756-6cc8-e0c8add14a94@oracle.com> <1910961c-11bd-0e86-dd03-4fce66b9969f@redhat.com> <2b466176-b688-53a8-bef9-c7ec2c8c745b@oracle.com> <42872a15-d26c-9798-c6a2-f3f7c945baf7@redhat.com> <5e7c7d00-4acd-bea3-3525-33dbd9159efb@oracle.com> <6f2c6de7-298b-bf14-ab1f-430c4acd43c9@redhat.com> <5cd676de-872d-6d4a-691b-da561173f7d0@oracle.com> <61d80e98-275f-b2b8-4ac7-6d5d03b047de@redhat.com>
Message-ID:

Hi Roman

On 06/27/2017 09:47 PM, Roman Kennke wrote:
> Hi Robbin,
>
> Ugh. Thanks for catching this.
> Problem was that I was accounting the thread-local deflations twice:
> once in thread-local processing (basically a leftover from my earlier
> attempt to implement this accounting) and then again in
> finish_deflate_idle_monitors(). Should be fixed here:
>
> http://cr.openjdk.java.net/~rkennke/8180932/webrev.09/
>

Nit:
safepoint.cpp : ParallelSPCleanupTask
"const char* name = " is not needed and 1 is unused

>
> Side question: which jtreg targets do you usually run?

Right now I cherry-pick directories from: hotspot/test/

I'm going to add a decent test group for local testing.

>
> Trying: make test TEST=hotspot_all
> gives me *lots* of failures due to missing jcstress stuff (?!)
> And even other subsets seem to depend on several bits and pieces that I
> have no idea about.

Yes, you need to use the internal tool 'jib' (Java Integrated Build) to get that to work, or you can set some environment where the jcstress application stuff is...

I have a regression on ClassLoaderData root scanning, this should not be related, but I only have 3 patches which could cause this, if it's not something in the environment that has changed.

Also I do not see any immediate performance gains (off vs 4 threads), it might be http://hg.openjdk.java.net/jdk10/hs/hotspot/rev/06994badeb24 , but I need to do some more testing. I know you often run with non-default GSI.

I'll get back to you.

Thanks, Robbin

>
> Roman
>
> Am 27.06.2017 um 16:51 schrieb Robbin Ehn:
>> Hi Roman,
>>
>> There is something wrong in calculations:
>> INFO: Deflate: InCirc=43 InUse=18 Scavenged=25 ForceMonitorScavenge=0
>> : pop=27051 free=215487
>>
>> free is larger than population, have not had the time to dig into this.
>>
>> Thanks, Robbin
>>
>> On 06/22/2017 10:19 PM, Roman Kennke wrote:
>>> So here's the latest iteration of that patch:
>>>
>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.08/
>>>
>>>
>>> I checked and fixed all the counters. The problem here is that they are
>>> not updated in a single place (deflate_idle_monitors() ) but in several
>>> places, potentially by multiple threads. I split up deflation into
>>> prepare_.. and a finish_.. methods to initialize local and update global
>>> counters respectively, and pass around a counters object (allocated on
>>> stack) to the various code paths that use it. Updating the counters
>>> always happen under a lock, there's no need to do anything special with
>>> regards to concurrency.
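As a minimal sketch of the counters scheme described in that last quoted paragraph (plain C++ with illustrative names; the actual types and counters in the webrev differ):

#include <mutex>

// Stack-allocated by the caller of the deflation code; each code path
// only ever updates its own local instance.
struct DeflateMonitorCounters {
  int n_in_circulation = 0;  // monitors walked
  int n_in_use = 0;          // monitors found in use
  int n_scavenged = 0;       // monitors deflated
};

static std::mutex list_lock;  // stands in for the global monitor list lock
static int g_free_count = 0;  // global counter, updated in one place only

// Per-path worker: deflates one in-use list, touches only local counters.
void deflate_monitor_list(DeflateMonitorCounters* counters) {
  // ... walk the list and deflate idle monitors, then e.g.:
  counters->n_scavenged += 1;  // placeholder for the real count
}

// Called once after all paths are done: the single serialized update.
void finish_deflate_idle_monitors(const DeflateMonitorCounters* counters) {
  std::lock_guard<std::mutex> guard(list_lock);
  g_free_count += counters->n_scavenged;
}

However many threads run the per-path function concurrently, the global counters are only ever touched in the finish step, under the lock.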
>>> >>> I also checked the nmethod marking, but there doesn't seem to be >>> anything in that code that looks problematic under concurrency. The >>> worst that can happen is that two threads write the same value into an >>> nmethod field. I think we can live with that ;-) >>> >>> Good to go? >>> >>> Tested by running specjvm and jcstress fastdebug+release without issues. >>> >>> Roman >>> >>> Am 02.06.2017 um 12:39 schrieb Robbin Ehn: >>>> Hi Roman, >>>> >>>> On 06/02/2017 11:41 AM, Roman Kennke wrote: >>>>> Hi David, >>>>> thanks for reviewing. I'll be on vacation the next two weeks too, with >>>>> only sporadic access to work stuff. >>>>> Yes, exposure will not be as good as otherwise, but it's not totally >>>>> untested either: the serial code path is the same as the parallel, the >>>>> only difference is that it's not actually called by multiple threads. >>>>> It's ok I think. >>>>> >>>>> I found two more issues that I think should be addressed: >>>>> - There are some counters in deflate_idle_monitors() and I'm not >>>>> sure I >>>>> correctly handle them in the split-up and MT'ed thread-local/ global >>>>> list deflation >>>>> - nmethod marking seems to unconditionally poke true or something like >>>>> that in nmethod fields. This doesn't hurt correctness-wise, but it's >>>>> probably worth checking if it's already true, especially when doing >>>>> this >>>>> with multiple threads concurrently. >>>>> >>>>> I'll send an updated patch around later, I hope I can get to it >>>>> today... >>>> >>>> I'll review that when you get it out. >>>> I think this looks as a reasonable step before we tackle this with a >>>> major effort, such as the JEP you and Carsten doing. >>>> And another effort to 'fix' nmethods marking. >>>> >>>> Internal discussion yesterday lead us to conclude that the runtime >>>> will probably need more threads. >>>> This would be a good driver to do a 'global' worker pool which serves >>>> both gc, runtime and safepoints with threads. >>>> >>>>> >>>>> Roman >>>>> >>>>>> Hi Roman, >>>>>> >>>>>> I am about to disappear on an extended vacation so will let others >>>>>> pursue this. IIUC this is longer an opt-in by the user at runtime, >>>>>> but >>>>>> an opt-in by the particular GC developers. Okay. My only concern with >>>>>> that is if Shenandoah is the only GC that currently opts in then this >>>>>> code is not going to get much testing and will be more prone to >>>>>> incidental breakage. >>>> >>>> As I mentioned before, it seem like Erik ? have some idea, maybe he >>>> can do this after his barrier patch. >>>> >>>> Thanks! >>>> >>>> /Robbin >>>> >>>>>> >>>>>> Cheers, >>>>>> David >>>>>> >>>>>> On 2/06/2017 2:21 AM, Roman Kennke wrote: >>>>>>> Am 01.06.2017 um 17:50 schrieb Roman Kennke: >>>>>>>> Am 01.06.2017 um 14:18 schrieb Robbin Ehn: >>>>>>>>> Hi Roman, >>>>>>>>> >>>>>>>>> On 06/01/2017 11:29 AM, Roman Kennke wrote: >>>>>>>>>> Am 31.05.2017 um 22:06 schrieb Robbin Ehn: >>>>>>>>>>> Hi Roman, I agree that is really needed but: >>>>>>>>>>> >>>>>>>>>>> On 05/31/2017 10:27 AM, Roman Kennke wrote: >>>>>>>>>>>> I realized that sharing workers with GC is not so easy. >>>>>>>>>>>> >>>>>>>>>>>> We need to be able to use the workers at a safepoint during >>>>>>>>>>>> concurrent >>>>>>>>>>>> GC work (which also uses the same workers). This does not only >>>>>>>>>>>> require >>>>>>>>>>>> that those workers be suspended, like e.g. >>>>>>>>>>>> SuspendibleThreadSet::yield(), but they need to be idle, i.e. >>>>>>>>>>>> have >>>>>>>>>>>> finished their tasks. 
This needs some careful handling to work >>>>>>>>>>>> without >>>>>>>>>>>> races: it requires a SuspendibleThreadSetJoiner around the >>>>>>>>>>>> corresponding >>>>>>>>>>>> run_task() call and also the tasks themselves need to join the >>>>>>>>>>>> STS and >>>>>>>>>>>> handle requests for safepoints not by yielding, but by leaving >>>>>>>>>>>> the >>>>>>>>>>>> task. >>>>>>>>>>>> This is far too peculiar for me to make the call to hook up GC >>>>>>>>>>>> workers >>>>>>>>>>>> for safepoint cleanup, and I thus removed those parts. I >>>>>>>>>>>> left the >>>>>>>>>>>> API in >>>>>>>>>>>> CollectedHeap in place. I think GC devs who know better >>>>>>>>>>>> about G1 >>>>>>>>>>>> and CMS >>>>>>>>>>>> should make that call, or else just use a separate thread pool. >>>>>>>>>>>> >>>>>>>>>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.05/ >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Is it ok now? >>>>>>>>>>> I still think you should put the "Parallel Safepoint Cleanup" >>>>>>>>>>> workers >>>>>>>>>>> inside Shenandoah, >>>>>>>>>>> so the SafepointSynchronizer only calls get_safepoint_workers, >>>>>>>>>>> e.g.: >>>>>>>>>>> >>>>>>>>>>> _cleanup_workers = heap->get_safepoint_workers(); >>>>>>>>>>> _num_cleanup_workers = _cleanup_workers != NULL ? >>>>>>>>>>> _cleanup_workers->total_workers() : 1; >>>>>>>>>>> ParallelSPCleanupTask cleanup(_cleanup_subtasks); >>>>>>>>>>> StrongRootsScope srs(_num_cleanup_workers); >>>>>>>>>>> if (_cleanup_workers != NULL) { >>>>>>>>>>> _cleanup_workers->run_task(&cleanup, >>>>>>>>>>> _num_cleanup_workers); >>>>>>>>>>> } else { >>>>>>>>>>> cleanup.work(0); >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> That way you don't even need your new flags, but it will be >>>>>>>>>>> up to >>>>>>>>>>> the >>>>>>>>>>> other GCs to make their worker available >>>>>>>>>>> or cheat with a separate workgang. >>>>>>>>>> I can do that, I don't mind. The question is, do we want that? >>>>>>>>> The problem is that we do not want to haste such decision, we >>>>>>>>> believe >>>>>>>>> there is a better solution. >>>>>>>>> I think you also would want another solution. >>>>>>>>> But it's seems like such solution with 1 'global' thread pool >>>>>>>>> either >>>>>>>>> own by GC or the VM it self is quite the undertaking. >>>>>>>>> Since this probably will not be done any time soon my >>>>>>>>> suggestion is, >>>>>>>>> to not hold you back (we also want this), just to make >>>>>>>>> the code parallel and as an intermediate step ask the GC if it >>>>>>>>> minds >>>>>>>>> sharing it's thread. >>>>>>>>> >>>>>>>>> Now when Shenandoah is merged it's possible that e.g. G1 will >>>>>>>>> share >>>>>>>>> the code for a separate thread pool, do something of it's own or >>>>>>>>> wait until the bigger question about thread pool(s) have been >>>>>>>>> resolved. >>>>>>>>> >>>>>>>>> By adding a thread pool directly to the SafepointSynchronizer and >>>>>>>>> flags for it we might limit our future options. >>>>>>>>> >>>>>>>>>> I wouldn't call it 'cheating with a separate workgang' though. I >>>>>>>>>> see >>>>>>>>>> that both G1 and CMS suspend their worker threads at a safepoint. >>>>>>>>>> However: >>>>>>>>> Yes it's not cheating but I want decent heuristics between e.g. >>>>>>>>> number >>>>>>>>> of concurrent marking threads and parallel safepoint threads since >>>>>>>>> they compete for cpu time. >>>>>>>>> As the code looks now, I think that decisions must be made by the >>>>>>>>> GC. >>>>>>>> Ok, I see your point. 
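As an aside, the yield-versus-idle constraint in the quoted paragraphs comes down to the following sketch (written against HotSpot's SuspendibleThreadSet API; has_more_work() and do_unit_of_work() are made-up helpers):

void concurrent_gc_task_loop() {
  SuspendibleThreadSetJoiner sts_join;
  while (has_more_work()) {
    if (sts_join.should_yield()) {
      // The usual GC pattern: block here until the safepoint is over.
      // The worker is suspended, but it still owns this task, so the
      // work gang is NOT reusable for safepoint cleanup at this point.
      sts_join.yield();
      // What sharing would require instead: return from the task
      // entirely, leaving the gang idle so it can run the cleanup task.
    }
    do_unit_of_work();
  }
}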
I updated the proposed patch accordingly: >>>>>>>> >>>>>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.06/ >>>>>>>> >>>>>>> Oops. Minor mistake there. Correction: >>>>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.07/ >>>>>>> >>>>>>> >>>>>>> (Removed 'class WorkGang' from safepoint.hpp, and forgot to add it >>>>>>> into >>>>>>> collectedHeap.hpp, resulting in build failure...) >>>>>>> >>>>>>> Roman >>>>>>> >>>>> >>> > From ecki at zusammenkunft.net Wed Jun 28 20:21:38 2017 From: ecki at zusammenkunft.net (Bernd Eckenfels) Date: Wed, 28 Jun 2017 20:21:38 +0000 Subject: Why is G1GC collection usage threshold not updated early? In-Reply-To: References: Message-ID: I guess G1 started much later with Mixed Collections compared to CMS. And when no GC happens the cu is not 0. you should maybe log the collection count as well. There is BTW A GC User mailinglist as well. Gruss Bernd -- http://bernd.eckenfels.net ________________________________ From: hotspot-gc-dev on behalf of Sundara Mohan M Sent: Wednesday, June 28, 2017 8:54:36 PM To: hotspot-gc-dev at openjdk.java.net Subject: Why is G1GC collection usage threshold not updated early? I am trying to estimate the free memory using metrics from MemoryPoolMxBean.getCollectionUsage(). I am observing following behavior with G1GC iteration= 0 - G1 Old Gen: u= 0% cu= 0% uth=75% cuth=75% iteration= 100 - G1 Old Gen: u= 0% cu= 0% uth=75% cuth=75% iteration= 200 - G1 Old Gen: u= 0% cu= 0% uth=75% cuth=75% iteration= 300 - G1 Old Gen: u= 0% cu= 0% uth=75% cuth=75% iteration= 400 - G1 Old Gen: u= 0% cu= 0% uth=75% cuth=75% iteration= 500 - G1 Old Gen: u= 1% cu= 0% uth=75% cuth=75% iteration= 600 - G1 Old Gen: u= 4% cu= 0% uth=75% cuth=75% iteration= 700 - G1 Old Gen: u= 9% cu= 0% uth=75% cuth=75% iteration= 800 - G1 Old Gen: u= 16% cu= 0% uth=75% cuth=75% iteration= 900 - G1 Old Gen: u= 25% cu= 0% uth=75% cuth=75% iteration= 1000 - G1 Old Gen: u= 34% cu= 0% uth=75% cuth=75% iteration= 1100 - G1 Old Gen: u= 45% cu= 0% uth=75% cuth=75% iteration= 1200 - G1 Old Gen: u= 38% cu= 0% uth=75% cuth=75% iteration= 1300 - G1 Old Gen: u= 46% cu= 0% uth=75% cuth=75% iteration= 1400 - G1 Old Gen: u= 52% cu= 0% uth=75% cuth=75% iteration= 1500 - G1 Old Gen: u= 45% cu= 0% uth=75% cuth=75% iteration= 1600 - G1 Old Gen: u= 67% cu= 0% uth=75% cuth=75% iteration= 1700 - G1 Old Gen: u= 56% cu= 0% uth=75% cuth=75% iteration= 1800 - G1 Old Gen: u= 55% cu= 0% uth=75% cuth=75% iteration= 1900 - G1 Old Gen: u= 61% cu= 0% uth=75% cuth=75% iteration= 2000 - G1 Old Gen: u= 56% cu= 0% uth=75% cuth=75% iteration= 2100 - G1 Old Gen: u= 76% cu= 0% uth=75% cuth=75% iteration= 2200 - G1 Old Gen: u= 65% cu= 0% uth=75% cuth=75% iteration= 2300 - G1 Old Gen: u= 62% cu= 0% uth=75% cuth=75% iteration= 2400 - G1 Old Gen: u= 75% cu= 0% uth=75% cuth=75% iteration= 2500 - G1 Old Gen: u= 75% cu= 0% uth=75% cuth=75% iteration= 2600 - G1 Old Gen: u= 72% cu= 0% uth=75% cuth=75% iteration= 2700 - G1 Old Gen: u= 69% cu= 0% uth=75% cuth=75% iteration= 2800 - G1 Old Gen: u= 74% cu= 0% uth=75% cuth=75% iteration= 2900 - G1 Old Gen: u= 80% cu= 0% uth=75% cuth=75% iteration= 3000 - G1 Old Gen: u= 83% cu= 0% uth=75% cuth=75% iteration= 3100 - G1 Old Gen: u= 89% cu= 0% uth=75% cuth=75% iteration= 3200 - G1 Old Gen: u= 71% cu= 59% uth=75% cuth=75% iteration= 3300 - G1 Old Gen: u= 90% cu= 59% uth=75% cuth=75% iteration= 3400 - G1 Old Gen: u= 76% cu= 62% uth=75% cuth=75% iteration= 3500 - G1 Old Gen: u= 65% cu= 65% uth=75% cuth=75% CMS GC iteration= 0 - CMS Old Gen: u= 0% cu= 0% uth=75% cuth=75% 
iteration= 100 - CMS Old Gen: u= 0% cu= 0% uth=75% cuth=75% iteration= 200 - CMS Old Gen: u= 1% cu= 0% uth=75% cuth=75% iteration= 300 - CMS Old Gen: u= 3% cu= 0% uth=75% cuth=75% iteration= 400 - CMS Old Gen: u= 12% cu= 0% uth=75% cuth=75% iteration= 500 - CMS Old Gen: u= 19% cu= 0% uth=75% cuth=75% iteration= 600 - CMS Old Gen: u= 34% cu= 0% uth=75% cuth=75% iteration= 700 - CMS Old Gen: u= 43% cu= 0% uth=75% cuth=75% iteration= 800 - CMS Old Gen: u= 63% cu= 0% uth=75% cuth=75% iteration= 900 - CMS Old Gen: u= 48% cu= 37% uth=75% cuth=75% iteration= 1000 - CMS Old Gen: u= 60% cu= 37% uth=75% cuth=75% iteration= 1100 - CMS Old Gen: u= 58% cu= 45% uth=75% cuth=75% iteration= 1200 - CMS Old Gen: u= 71% cu= 45% uth=75% cuth=75% iteration= 1300 - CMS Old Gen: u= 66% cu= 53% uth=75% cuth=75% iteration= 1400 - CMS Old Gen: u= 80% cu= 53% uth=75% cuth=75% u = usage(getUsage), cu = collectionUsage (getCollectionUsage), uth = usage threshold %, cuth = collection usage threshold % my program just keeps allocating string and frees some strings. 1. Why does G1GC doesn't update it's collection usage till 59% whereas in CMSGC it is updated at 37% itself? Can someone shed more light on this? Thanks, Sundar -------------- next part -------------- An HTML attachment was scrubbed... URL: From rkennke at redhat.com Wed Jun 28 20:23:37 2017 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 28 Jun 2017 22:23:37 +0200 Subject: RFR: Parallelize safepoint cleanup In-Reply-To: References: <486b5a72-bef8-4ebc-2729-3fe3aa3ab3b9@oracle.com> <5c80f8df-27c9-f9a9-dc6d-47f9c6019a61@redhat.com> <46ad874e-eb41-7927-265a-40dea92dfe1e@oracle.com> <5711258b-99b0-e06f-ba6e-0b6b55d88345@redhat.com> <0e1e2779-9316-b756-6cc8-e0c8add14a94@oracle.com> <1910961c-11bd-0e86-dd03-4fce66b9969f@redhat.com> <2b466176-b688-53a8-bef9-c7ec2c8c745b@oracle.com> <42872a15-d26c-9798-c6a2-f3f7c945baf7@redhat.com> <5e7c7d00-4acd-bea3-3525-33dbd9159efb@oracle.com> <6f2c6de7-298b-bf14-ab1f-430c4acd43c9@redhat.com> <5cd676de-872d-6d4a-691b-da561173f7d0@oracle.com> <61d80e98-275f-b2b8-4ac7-6d5d03b047de@redhat.com> Message-ID: <676d3b56-cee0-b68a-d700-e43695355148@redhat.com> > > On 06/27/2017 09:47 PM, Roman Kennke wrote: >> Hi Robbin, >> >> Ugh. Thanks for catching this. >> Problem was that I was accounting the thread-local deflations twice: >> once in thread-local processing (basically a leftover from my earlier >> attempt to implement this accounting) and then again in >> finish_deflate_idle_monitors(). Should be fixed here: >> >> http://cr.openjdk.java.net/~rkennke/8180932/webrev.09/ >> > > Nit: > safepoint.cpp : ParallelSPCleanupTask > "const char* name = " is not needed and 1 is unused > Sorry, I don't understand what you mean by this. I see code like this: const char* name = "deflating idle monitors"; and it is used a few lines below, even 2x. What's '1 is unused' ? >> >> Side question: which jtreg targets do you usually run? > > Right now I cherry pick directories from: hotspot/test/ > > I'm going to add a decent test group for local testing. That would be good! > >> >> Trying: make test TEST=hotspot_all >> gives me *lots* of failures due to missing jcstress stuff (?!) >> And even other subsets seem to depend on several bits and pieces >> that I >> have no idea about. > > Yes, you need to use internal tool 'jib' java integrate build to get > that work or you can set some environment where the jcstress > application stuff is... Uhhh. 
We really do want a subset of tests that we can run reliably and that are self-contained, how else are people (without that jib thingy) supposed to do some sanity checking with their patches? ;-) > I have a regression on ClassLoaderData root scanning, this should not > be related, > but I only have 3 patches which could cause this, if it's not > something in the environment that have changed. Let me know if it's my patch :-) > > Also do not see any immediate performance gains (off vs 4 threads), it > might be http://hg.openjdk.java.net/jdk10/hs/hotspot/rev/06994badeb24 > , but I need to-do some more testing. I know you often run with none > default GSI. First of all, during the course of this review I reduced the change from an actual implementation to a kind of framework, and it needs some separate changes in the GC to make use of it. Not sure if you added corresponding code in (e.g.) G1? Also, this is only really visible in code that makes excessive use of monitors, i.e. the one linked by Carsten's original patch, or the test org.openjdk.gcbench.roots.Synchronizers.test in gc-bench: http://icedtea.classpath.org/hg/gc-bench/ There are also some popular real-world apps that tend to do this. From the top off my head, Cassandra is such an application. Thanks, Roman > > I'll get back to you. > > Thanks, Robbin > >> >> Roman >> >> Am 27.06.2017 um 16:51 schrieb Robbin Ehn: >>> Hi Roman, >>> >>> There is something wrong in calculations: >>> INFO: Deflate: InCirc=43 InUse=18 Scavenged=25 ForceMonitorScavenge=0 >>> : pop=27051 free=215487 >>> >>> free is larger than population, have not had the time to dig into this. >>> >>> Thanks, Robbin >>> >>> On 06/22/2017 10:19 PM, Roman Kennke wrote: >>>> So here's the latest iteration of that patch: >>>> >>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.08/ >>>> >>>> >>>> I checked and fixed all the counters. The problem here is that they >>>> are >>>> not updated in a single place (deflate_idle_monitors() ) but in >>>> several >>>> places, potentially by multiple threads. I split up deflation into >>>> prepare_.. and a finish_.. methods to initialize local and update >>>> global >>>> counters respectively, and pass around a counters object (allocated on >>>> stack) to the various code paths that use it. Updating the counters >>>> always happen under a lock, there's no need to do anything special >>>> with >>>> regards to concurrency. >>>> >>>> I also checked the nmethod marking, but there doesn't seem to be >>>> anything in that code that looks problematic under concurrency. The >>>> worst that can happen is that two threads write the same value into an >>>> nmethod field. I think we can live with that ;-) >>>> >>>> Good to go? >>>> >>>> Tested by running specjvm and jcstress fastdebug+release without >>>> issues. >>>> >>>> Roman >>>> >>>> Am 02.06.2017 um 12:39 schrieb Robbin Ehn: >>>>> Hi Roman, >>>>> >>>>> On 06/02/2017 11:41 AM, Roman Kennke wrote: >>>>>> Hi David, >>>>>> thanks for reviewing. I'll be on vacation the next two weeks too, >>>>>> with >>>>>> only sporadic access to work stuff. >>>>>> Yes, exposure will not be as good as otherwise, but it's not totally >>>>>> untested either: the serial code path is the same as the >>>>>> parallel, the >>>>>> only difference is that it's not actually called by multiple >>>>>> threads. >>>>>> It's ok I think. 
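Concretely, the per-GC opt-in this framework relies on could look like the following sketch; the get_safepoint_workers() name follows the snippets quoted in this thread, while the rest is illustrative and may differ from the webrev:

#include <cstddef>

class WorkGang;

class CollectedHeap {
public:
  // Default: no gang is offered, so safepoint cleanup stays single-threaded.
  virtual WorkGang* get_safepoint_workers() { return NULL; }
};

class G1CollectedHeap : public CollectedHeap {
  WorkGang* _workers;
public:
  G1CollectedHeap() : _workers(NULL) {}
  // A GC opts in by exposing its existing gang. This is only safe if the
  // GC can guarantee the workers are truly idle (not merely yielded) at
  // a safepoint.
  virtual WorkGang* get_safepoint_workers() { return _workers; }
};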
>>>>>> >>>>>> I found two more issues that I think should be addressed: >>>>>> - There are some counters in deflate_idle_monitors() and I'm not >>>>>> sure I >>>>>> correctly handle them in the split-up and MT'ed thread-local/ global >>>>>> list deflation >>>>>> - nmethod marking seems to unconditionally poke true or something >>>>>> like >>>>>> that in nmethod fields. This doesn't hurt correctness-wise, but it's >>>>>> probably worth checking if it's already true, especially when doing >>>>>> this >>>>>> with multiple threads concurrently. >>>>>> >>>>>> I'll send an updated patch around later, I hope I can get to it >>>>>> today... >>>>> >>>>> I'll review that when you get it out. >>>>> I think this looks as a reasonable step before we tackle this with a >>>>> major effort, such as the JEP you and Carsten doing. >>>>> And another effort to 'fix' nmethods marking. >>>>> >>>>> Internal discussion yesterday lead us to conclude that the runtime >>>>> will probably need more threads. >>>>> This would be a good driver to do a 'global' worker pool which serves >>>>> both gc, runtime and safepoints with threads. >>>>> >>>>>> >>>>>> Roman >>>>>> >>>>>>> Hi Roman, >>>>>>> >>>>>>> I am about to disappear on an extended vacation so will let others >>>>>>> pursue this. IIUC this is longer an opt-in by the user at runtime, >>>>>>> but >>>>>>> an opt-in by the particular GC developers. Okay. My only concern >>>>>>> with >>>>>>> that is if Shenandoah is the only GC that currently opts in then >>>>>>> this >>>>>>> code is not going to get much testing and will be more prone to >>>>>>> incidental breakage. >>>>> >>>>> As I mentioned before, it seem like Erik ? have some idea, maybe he >>>>> can do this after his barrier patch. >>>>> >>>>> Thanks! >>>>> >>>>> /Robbin >>>>> >>>>>>> >>>>>>> Cheers, >>>>>>> David >>>>>>> >>>>>>> On 2/06/2017 2:21 AM, Roman Kennke wrote: >>>>>>>> Am 01.06.2017 um 17:50 schrieb Roman Kennke: >>>>>>>>> Am 01.06.2017 um 14:18 schrieb Robbin Ehn: >>>>>>>>>> Hi Roman, >>>>>>>>>> >>>>>>>>>> On 06/01/2017 11:29 AM, Roman Kennke wrote: >>>>>>>>>>> Am 31.05.2017 um 22:06 schrieb Robbin Ehn: >>>>>>>>>>>> Hi Roman, I agree that is really needed but: >>>>>>>>>>>> >>>>>>>>>>>> On 05/31/2017 10:27 AM, Roman Kennke wrote: >>>>>>>>>>>>> I realized that sharing workers with GC is not so easy. >>>>>>>>>>>>> >>>>>>>>>>>>> We need to be able to use the workers at a safepoint during >>>>>>>>>>>>> concurrent >>>>>>>>>>>>> GC work (which also uses the same workers). This does not >>>>>>>>>>>>> only >>>>>>>>>>>>> require >>>>>>>>>>>>> that those workers be suspended, like e.g. >>>>>>>>>>>>> SuspendibleThreadSet::yield(), but they need to be idle, i.e. >>>>>>>>>>>>> have >>>>>>>>>>>>> finished their tasks. This needs some careful handling to >>>>>>>>>>>>> work >>>>>>>>>>>>> without >>>>>>>>>>>>> races: it requires a SuspendibleThreadSetJoiner around the >>>>>>>>>>>>> corresponding >>>>>>>>>>>>> run_task() call and also the tasks themselves need to join >>>>>>>>>>>>> the >>>>>>>>>>>>> STS and >>>>>>>>>>>>> handle requests for safepoints not by yielding, but by >>>>>>>>>>>>> leaving >>>>>>>>>>>>> the >>>>>>>>>>>>> task. >>>>>>>>>>>>> This is far too peculiar for me to make the call to hook >>>>>>>>>>>>> up GC >>>>>>>>>>>>> workers >>>>>>>>>>>>> for safepoint cleanup, and I thus removed those parts. I >>>>>>>>>>>>> left the >>>>>>>>>>>>> API in >>>>>>>>>>>>> CollectedHeap in place. 
I think GC devs who know better >>>>>>>>>>>>> about G1 >>>>>>>>>>>>> and CMS >>>>>>>>>>>>> should make that call, or else just use a separate thread >>>>>>>>>>>>> pool. >>>>>>>>>>>>> >>>>>>>>>>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.05/ >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Is it ok now? >>>>>>>>>>>> I still think you should put the "Parallel Safepoint Cleanup" >>>>>>>>>>>> workers >>>>>>>>>>>> inside Shenandoah, >>>>>>>>>>>> so the SafepointSynchronizer only calls get_safepoint_workers, >>>>>>>>>>>> e.g.: >>>>>>>>>>>> >>>>>>>>>>>> _cleanup_workers = heap->get_safepoint_workers(); >>>>>>>>>>>> _num_cleanup_workers = _cleanup_workers != NULL ? >>>>>>>>>>>> _cleanup_workers->total_workers() : 1; >>>>>>>>>>>> ParallelSPCleanupTask cleanup(_cleanup_subtasks); >>>>>>>>>>>> StrongRootsScope srs(_num_cleanup_workers); >>>>>>>>>>>> if (_cleanup_workers != NULL) { >>>>>>>>>>>> _cleanup_workers->run_task(&cleanup, >>>>>>>>>>>> _num_cleanup_workers); >>>>>>>>>>>> } else { >>>>>>>>>>>> cleanup.work(0); >>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>>> That way you don't even need your new flags, but it will be >>>>>>>>>>>> up to >>>>>>>>>>>> the >>>>>>>>>>>> other GCs to make their worker available >>>>>>>>>>>> or cheat with a separate workgang. >>>>>>>>>>> I can do that, I don't mind. The question is, do we want that? >>>>>>>>>> The problem is that we do not want to haste such decision, we >>>>>>>>>> believe >>>>>>>>>> there is a better solution. >>>>>>>>>> I think you also would want another solution. >>>>>>>>>> But it's seems like such solution with 1 'global' thread pool >>>>>>>>>> either >>>>>>>>>> own by GC or the VM it self is quite the undertaking. >>>>>>>>>> Since this probably will not be done any time soon my >>>>>>>>>> suggestion is, >>>>>>>>>> to not hold you back (we also want this), just to make >>>>>>>>>> the code parallel and as an intermediate step ask the GC if it >>>>>>>>>> minds >>>>>>>>>> sharing it's thread. >>>>>>>>>> >>>>>>>>>> Now when Shenandoah is merged it's possible that e.g. G1 will >>>>>>>>>> share >>>>>>>>>> the code for a separate thread pool, do something of it's own or >>>>>>>>>> wait until the bigger question about thread pool(s) have been >>>>>>>>>> resolved. >>>>>>>>>> >>>>>>>>>> By adding a thread pool directly to the SafepointSynchronizer >>>>>>>>>> and >>>>>>>>>> flags for it we might limit our future options. >>>>>>>>>> >>>>>>>>>>> I wouldn't call it 'cheating with a separate workgang' >>>>>>>>>>> though. I >>>>>>>>>>> see >>>>>>>>>>> that both G1 and CMS suspend their worker threads at a >>>>>>>>>>> safepoint. >>>>>>>>>>> However: >>>>>>>>>> Yes it's not cheating but I want decent heuristics between e.g. >>>>>>>>>> number >>>>>>>>>> of concurrent marking threads and parallel safepoint threads >>>>>>>>>> since >>>>>>>>>> they compete for cpu time. >>>>>>>>>> As the code looks now, I think that decisions must be made by >>>>>>>>>> the >>>>>>>>>> GC. >>>>>>>>> Ok, I see your point. I updated the proposed patch accordingly: >>>>>>>>> >>>>>>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.06/ >>>>>>>>> >>>>>>>> Oops. Minor mistake there. Correction: >>>>>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.07/ >>>>>>>> >>>>>>>> >>>>>>>> (Removed 'class WorkGang' from safepoint.hpp, and forgot to add it >>>>>>>> into >>>>>>>> collectedHeap.hpp, resulting in build failure...) 
>>>>>>>> >>>>>>>> Roman >>>>>>>> >>>>>> >>>> >> From robbin.ehn at oracle.com Wed Jun 28 21:08:54 2017 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Wed, 28 Jun 2017 23:08:54 +0200 Subject: RFR: Parallelize safepoint cleanup In-Reply-To: <676d3b56-cee0-b68a-d700-e43695355148@redhat.com> References: <486b5a72-bef8-4ebc-2729-3fe3aa3ab3b9@oracle.com> <5c80f8df-27c9-f9a9-dc6d-47f9c6019a61@redhat.com> <46ad874e-eb41-7927-265a-40dea92dfe1e@oracle.com> <5711258b-99b0-e06f-ba6e-0b6b55d88345@redhat.com> <0e1e2779-9316-b756-6cc8-e0c8add14a94@oracle.com> <1910961c-11bd-0e86-dd03-4fce66b9969f@redhat.com> <2b466176-b688-53a8-bef9-c7ec2c8c745b@oracle.com> <42872a15-d26c-9798-c6a2-f3f7c945baf7@redhat.com> <5e7c7d00-4acd-bea3-3525-33dbd9159efb@oracle.com> <6f2c6de7-298b-bf14-ab1f-430c4acd43c9@redhat.com> <5cd676de-872d-6d4a-691b-da561173f7d0@oracle.com> <61d80e98-275f-b2b8-4ac7-6d5d03b047de@redhat.com> <676d3b56-cee0-b68a-d700-e43695355148@redhat.com> Message-ID: <39baaa4e-7e9e-6ef4-749c-7429078d23d8@oracle.com> Hi Roman On 06/28/2017 10:23 PM, Roman Kennke wrote: > >> >> On 06/27/2017 09:47 PM, Roman Kennke wrote: >>> Hi Robbin, >>> >>> Ugh. Thanks for catching this. >>> Problem was that I was accounting the thread-local deflations twice: >>> once in thread-local processing (basically a leftover from my earlier >>> attempt to implement this accounting) and then again in >>> finish_deflate_idle_monitors(). Should be fixed here: >>> >>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.09/ >>> >> >> Nit: >> safepoint.cpp : ParallelSPCleanupTask >> "const char* name = " is not needed and 1 is unused >> > Sorry, I don't understand what you mean by this. I see code like this: > > const char* name = "deflating idle monitors"; > > and it is used a few lines below, even 2x. > > What's '1 is unused' ? Yes I didn't see name was at two places, so it's only the one that is used only once. 598 const char* name = "compilation policy safepoint handler"; 599 EventSafepointCleanupTask event; 600 TraceTime timer("compilation policy safepoint handler", TRACETIME_LOG(Info, safepoint, cleanup)); 601 CompilationPolicy::policy()->do_safepoint_work(); 602 event_safepoint_cleanup_task_commit(event, name); (you do not need webrev this :) ) > >>> >>> Side question: which jtreg targets do you usually run? >> >> Right now I cherry pick directories from: hotspot/test/ >> >> I'm going to add a decent test group for local testing. > That would be good! > > >> >>> >>> Trying: make test TEST=hotspot_all >>> gives me *lots* of failures due to missing jcstress stuff (?!) >>> And even other subsets seem to depend on several bits and pieces >>> that I >>> have no idea about. >> >> Yes, you need to use internal tool 'jib' java integrate build to get >> that work or you can set some environment where the jcstress >> application stuff is... > Uhhh. We really do want a subset of tests that we can run reliably and > that are self-contained, how else are people (without that jib thingy) > supposed to do some sanity checking with their patches? ;-) Yes! >> I have a regression on ClassLoaderData root scanning, this should not >> be related, >> but I only have 3 patches which could cause this, if it's not >> something in the environment that have changed. 
> Let me know if it's my patch :-) No it seems to be an experimental numa patch that seem to makes -XX:-UseNUMA worse :) Adding -XX:+UseNUMA numbers come back, worse 12ms -> 2ms and avg goes 0.44ms to 0.27 ms >> >> Also do not see any immediate performance gains (off vs 4 threads), it >> might be http://hg.openjdk.java.net/jdk10/hs/hotspot/rev/06994badeb24 >> , but I need to-do some more testing. I know you often run with none >> default GSI. > > First of all, during the course of this review I reduced the change from > an actual implementation to a kind of framework, and it needs some > separate changes in the GC to make use of it. Not sure if you added > corresponding code in (e.g.) G1? I added the stuff directly in collectedheap just for testing. > > Also, this is only really visible in code that makes excessive use of > monitors, i.e. the one linked by Carsten's original patch, or the test > org.openjdk.gcbench.roots.Synchronizers.test in gc-bench: > > http://icedtea.classpath.org/hg/gc-bench/ > > There are also some popular real-world apps that tend to do this. From > the top off my head, Cassandra is such an application. I'll look at that. My test burns ~13k monitors per second, not sure what that level counts as. I just want to verify some more testing, I'll get back to you tomorrow! Thanks for bearing with me! /Robbin > > Thanks, Roman > >> >> I'll get back to you. >> >> Thanks, Robbin >> >>> >>> Roman >>> >>> Am 27.06.2017 um 16:51 schrieb Robbin Ehn: >>>> Hi Roman, >>>> >>>> There is something wrong in calculations: >>>> INFO: Deflate: InCirc=43 InUse=18 Scavenged=25 ForceMonitorScavenge=0 >>>> : pop=27051 free=215487 >>>> >>>> free is larger than population, have not had the time to dig into this. >>>> >>>> Thanks, Robbin >>>> >>>> On 06/22/2017 10:19 PM, Roman Kennke wrote: >>>>> So here's the latest iteration of that patch: >>>>> >>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.08/ >>>>> >>>>> >>>>> I checked and fixed all the counters. The problem here is that they >>>>> are >>>>> not updated in a single place (deflate_idle_monitors() ) but in >>>>> several >>>>> places, potentially by multiple threads. I split up deflation into >>>>> prepare_.. and a finish_.. methods to initialize local and update >>>>> global >>>>> counters respectively, and pass around a counters object (allocated on >>>>> stack) to the various code paths that use it. Updating the counters >>>>> always happen under a lock, there's no need to do anything special >>>>> with >>>>> regards to concurrency. >>>>> >>>>> I also checked the nmethod marking, but there doesn't seem to be >>>>> anything in that code that looks problematic under concurrency. The >>>>> worst that can happen is that two threads write the same value into an >>>>> nmethod field. I think we can live with that ;-) >>>>> >>>>> Good to go? >>>>> >>>>> Tested by running specjvm and jcstress fastdebug+release without >>>>> issues. >>>>> >>>>> Roman >>>>> >>>>> Am 02.06.2017 um 12:39 schrieb Robbin Ehn: >>>>>> Hi Roman, >>>>>> >>>>>> On 06/02/2017 11:41 AM, Roman Kennke wrote: >>>>>>> Hi David, >>>>>>> thanks for reviewing. I'll be on vacation the next two weeks too, >>>>>>> with >>>>>>> only sporadic access to work stuff. >>>>>>> Yes, exposure will not be as good as otherwise, but it's not totally >>>>>>> untested either: the serial code path is the same as the >>>>>>> parallel, the >>>>>>> only difference is that it's not actually called by multiple >>>>>>> threads. >>>>>>> It's ok I think. 
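For completeness, here is roughly what the task side of this looks like, following the AbstractGangTask run_task()/work(worker_id) protocol visible in the snippets quoted in this thread. SubTasksDone and is_task_claimed() are HotSpot's claim-once primitives of this time; the real ParallelSPCleanupTask in the webrev differs in detail:

class ParallelSPCleanupTask : public AbstractGangTask {
  SubTasksDone _subtasks;  // claim-once tickets, one per cleanup subtask
public:
  ParallelSPCleanupTask(uint num_subtasks)
    : AbstractGangTask("Parallel Safepoint Cleanup"),
      _subtasks(num_subtasks) {}

  void work(uint worker_id) {
    // Every worker races to claim each subtask; exactly one worker wins
    // each claim.
    if (!_subtasks.is_task_claimed(0)) {
      // ... deflate idle monitors, filling in a local counters object ...
    }
    if (!_subtasks.is_task_claimed(1)) {
      // ... e.g. update inline caches ...
    }
    // ... more subtasks, then the usual SubTasksDone completion handshake.
  }
};

Because every subtask is claimed exactly once, the same task is correct both when run by a gang and when run as cleanup.work(0) by the VM thread.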
>>>>>>> >>>>>>> I found two more issues that I think should be addressed: >>>>>>> - There are some counters in deflate_idle_monitors() and I'm not >>>>>>> sure I >>>>>>> correctly handle them in the split-up and MT'ed thread-local/ global >>>>>>> list deflation >>>>>>> - nmethod marking seems to unconditionally poke true or something >>>>>>> like >>>>>>> that in nmethod fields. This doesn't hurt correctness-wise, but it's >>>>>>> probably worth checking if it's already true, especially when doing >>>>>>> this >>>>>>> with multiple threads concurrently. >>>>>>> >>>>>>> I'll send an updated patch around later, I hope I can get to it >>>>>>> today... >>>>>> >>>>>> I'll review that when you get it out. >>>>>> I think this looks as a reasonable step before we tackle this with a >>>>>> major effort, such as the JEP you and Carsten doing. >>>>>> And another effort to 'fix' nmethods marking. >>>>>> >>>>>> Internal discussion yesterday lead us to conclude that the runtime >>>>>> will probably need more threads. >>>>>> This would be a good driver to do a 'global' worker pool which serves >>>>>> both gc, runtime and safepoints with threads. >>>>>> >>>>>>> >>>>>>> Roman >>>>>>> >>>>>>>> Hi Roman, >>>>>>>> >>>>>>>> I am about to disappear on an extended vacation so will let others >>>>>>>> pursue this. IIUC this is longer an opt-in by the user at runtime, >>>>>>>> but >>>>>>>> an opt-in by the particular GC developers. Okay. My only concern >>>>>>>> with >>>>>>>> that is if Shenandoah is the only GC that currently opts in then >>>>>>>> this >>>>>>>> code is not going to get much testing and will be more prone to >>>>>>>> incidental breakage. >>>>>> >>>>>> As I mentioned before, it seem like Erik ? have some idea, maybe he >>>>>> can do this after his barrier patch. >>>>>> >>>>>> Thanks! >>>>>> >>>>>> /Robbin >>>>>> >>>>>>>> >>>>>>>> Cheers, >>>>>>>> David >>>>>>>> >>>>>>>> On 2/06/2017 2:21 AM, Roman Kennke wrote: >>>>>>>>> Am 01.06.2017 um 17:50 schrieb Roman Kennke: >>>>>>>>>> Am 01.06.2017 um 14:18 schrieb Robbin Ehn: >>>>>>>>>>> Hi Roman, >>>>>>>>>>> >>>>>>>>>>> On 06/01/2017 11:29 AM, Roman Kennke wrote: >>>>>>>>>>>> Am 31.05.2017 um 22:06 schrieb Robbin Ehn: >>>>>>>>>>>>> Hi Roman, I agree that is really needed but: >>>>>>>>>>>>> >>>>>>>>>>>>> On 05/31/2017 10:27 AM, Roman Kennke wrote: >>>>>>>>>>>>>> I realized that sharing workers with GC is not so easy. >>>>>>>>>>>>>> >>>>>>>>>>>>>> We need to be able to use the workers at a safepoint during >>>>>>>>>>>>>> concurrent >>>>>>>>>>>>>> GC work (which also uses the same workers). This does not >>>>>>>>>>>>>> only >>>>>>>>>>>>>> require >>>>>>>>>>>>>> that those workers be suspended, like e.g. >>>>>>>>>>>>>> SuspendibleThreadSet::yield(), but they need to be idle, i.e. >>>>>>>>>>>>>> have >>>>>>>>>>>>>> finished their tasks. This needs some careful handling to >>>>>>>>>>>>>> work >>>>>>>>>>>>>> without >>>>>>>>>>>>>> races: it requires a SuspendibleThreadSetJoiner around the >>>>>>>>>>>>>> corresponding >>>>>>>>>>>>>> run_task() call and also the tasks themselves need to join >>>>>>>>>>>>>> the >>>>>>>>>>>>>> STS and >>>>>>>>>>>>>> handle requests for safepoints not by yielding, but by >>>>>>>>>>>>>> leaving >>>>>>>>>>>>>> the >>>>>>>>>>>>>> task. >>>>>>>>>>>>>> This is far too peculiar for me to make the call to hook >>>>>>>>>>>>>> up GC >>>>>>>>>>>>>> workers >>>>>>>>>>>>>> for safepoint cleanup, and I thus removed those parts. I >>>>>>>>>>>>>> left the >>>>>>>>>>>>>> API in >>>>>>>>>>>>>> CollectedHeap in place. 
I think GC devs who know better >>>>>>>>>>>>>> about G1 >>>>>>>>>>>>>> and CMS >>>>>>>>>>>>>> should make that call, or else just use a separate thread >>>>>>>>>>>>>> pool. >>>>>>>>>>>>>> >>>>>>>>>>>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.05/ >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Is it ok now? >>>>>>>>>>>>> I still think you should put the "Parallel Safepoint Cleanup" >>>>>>>>>>>>> workers >>>>>>>>>>>>> inside Shenandoah, >>>>>>>>>>>>> so the SafepointSynchronizer only calls get_safepoint_workers, >>>>>>>>>>>>> e.g.: >>>>>>>>>>>>> >>>>>>>>>>>>> _cleanup_workers = heap->get_safepoint_workers(); >>>>>>>>>>>>> _num_cleanup_workers = _cleanup_workers != NULL ? >>>>>>>>>>>>> _cleanup_workers->total_workers() : 1; >>>>>>>>>>>>> ParallelSPCleanupTask cleanup(_cleanup_subtasks); >>>>>>>>>>>>> StrongRootsScope srs(_num_cleanup_workers); >>>>>>>>>>>>> if (_cleanup_workers != NULL) { >>>>>>>>>>>>> _cleanup_workers->run_task(&cleanup, >>>>>>>>>>>>> _num_cleanup_workers); >>>>>>>>>>>>> } else { >>>>>>>>>>>>> cleanup.work(0); >>>>>>>>>>>>> } >>>>>>>>>>>>> >>>>>>>>>>>>> That way you don't even need your new flags, but it will be >>>>>>>>>>>>> up to >>>>>>>>>>>>> the >>>>>>>>>>>>> other GCs to make their worker available >>>>>>>>>>>>> or cheat with a separate workgang. >>>>>>>>>>>> I can do that, I don't mind. The question is, do we want that? >>>>>>>>>>> The problem is that we do not want to haste such decision, we >>>>>>>>>>> believe >>>>>>>>>>> there is a better solution. >>>>>>>>>>> I think you also would want another solution. >>>>>>>>>>> But it's seems like such solution with 1 'global' thread pool >>>>>>>>>>> either >>>>>>>>>>> own by GC or the VM it self is quite the undertaking. >>>>>>>>>>> Since this probably will not be done any time soon my >>>>>>>>>>> suggestion is, >>>>>>>>>>> to not hold you back (we also want this), just to make >>>>>>>>>>> the code parallel and as an intermediate step ask the GC if it >>>>>>>>>>> minds >>>>>>>>>>> sharing it's thread. >>>>>>>>>>> >>>>>>>>>>> Now when Shenandoah is merged it's possible that e.g. G1 will >>>>>>>>>>> share >>>>>>>>>>> the code for a separate thread pool, do something of it's own or >>>>>>>>>>> wait until the bigger question about thread pool(s) have been >>>>>>>>>>> resolved. >>>>>>>>>>> >>>>>>>>>>> By adding a thread pool directly to the SafepointSynchronizer >>>>>>>>>>> and >>>>>>>>>>> flags for it we might limit our future options. >>>>>>>>>>> >>>>>>>>>>>> I wouldn't call it 'cheating with a separate workgang' >>>>>>>>>>>> though. I >>>>>>>>>>>> see >>>>>>>>>>>> that both G1 and CMS suspend their worker threads at a >>>>>>>>>>>> safepoint. >>>>>>>>>>>> However: >>>>>>>>>>> Yes it's not cheating but I want decent heuristics between e.g. >>>>>>>>>>> number >>>>>>>>>>> of concurrent marking threads and parallel safepoint threads >>>>>>>>>>> since >>>>>>>>>>> they compete for cpu time. >>>>>>>>>>> As the code looks now, I think that decisions must be made by >>>>>>>>>>> the >>>>>>>>>>> GC. >>>>>>>>>> Ok, I see your point. I updated the proposed patch accordingly: >>>>>>>>>> >>>>>>>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.06/ >>>>>>>>>> >>>>>>>>> Oops. Minor mistake there. Correction: >>>>>>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.07/ >>>>>>>>> >>>>>>>>> >>>>>>>>> (Removed 'class WorkGang' from safepoint.hpp, and forgot to add it >>>>>>>>> into >>>>>>>>> collectedHeap.hpp, resulting in build failure...) 
>>>>>>>>> >>>>>>>>> Roman >>>>>>>>> >>>>>>> >>>>> >>> > From sangheon.kim at oracle.com Thu Jun 29 07:56:04 2017 From: sangheon.kim at oracle.com (sangheon) Date: Thu, 29 Jun 2017 00:56:04 -0700 Subject: RFR(S): 8173335: Improve logging for j.l.ref.reference processing In-Reply-To: <1498488133.2665.37.camel@oracle.com> References: <1497352882.2829.65.camel@oracle.com> <054513b1-4ff2-6656-fa3a-9c6e6736c32f@oracle.com> <0353babf-03eb-21cd-b286-9b0149dfb718@oracle.com> <1498488133.2665.37.camel@oracle.com> Message-ID: Hi Thomas, Thank you very much for the thorough review. On 06/26/2017 07:42 AM, Thomas Schatzl wrote: > Hi Sangheon, > > thanks for all your changes, and sorry a bit for the delay... > > On Wed, 2017-06-14 at 00:52 -0700, sangheon wrote: >> Hi Thomas again, >> On 06/13/2017 02:21 PM, sangheon wrote: >>> Hi Thomas, >>> >>> Thank you for reviewing this. >>> >>> On 06/13/2017 04:21 AM, Thomas Schatzl wrote: >>>> Hi Sangheon, >>>> >>>> >>>> On Mon, 2017-06-12 at 17:13 -0700, sangheon wrote: >>>>> Hi Aleksey, >>>>> >>>>> Thanks for the review. >>>>> >>>>> On 06/12/2017 09:06 AM, Aleksey Shipilev wrote: >>>>>> On 06/10/2017 01:57 AM, sangheon wrote: >>>>>>> CR:https://bugs.openjdk.java.net/browse/JDK-8173335 >>>>>>> webrev:http://cr.openjdk.java.net/~sangheki/8173335/webrev >>>>>>> .0 >>>> - There should be a destructor in ReferenceProcessor cleaning up >>>> the dynamically allocated memory. >>> Thomas and I had some discussion about this and agreed to file a >>> separate CR for freeing issue. >>> >>> I noticed that there's no destructor when I wrote this, but this is >>> how we usually implement. >>> However as this seems incorrect, I will add a destructor for newly >>> added class but it will not be used in this patch. >>> It will be used in the following CR( >>> https://bugs.openjdk.java.net/browse/JDK-8182120 ) which fixes >>> not-freeing issue in ReferenceProcessor. >>> FYI, ReferenceProcessor has heap allocated members of >>> ReferencePolicy(and its friends) but it is not freed too. So >>> instead of extending this patch, I propose to separate this freeing >>> issue. > That's fine, thanks. > >>>> - the change should move gc+ref output to something else: there >>>> is so much additional junk printed with gc+ref=trace so that the >>>> phase logging is drowned out with real trace information and >>>> unusable for regular consumption. >>> Okay, I will add it. >>> But I asked introducing 'gc+ref+phases' before but you didn't like >>> it. :) Probably I didn't provide much details?! > Yes. In the example you showed me earlier with gc+ref=trace the > examples did not contain the other gc+ref=trace output. That's why I > thought it would be fine. :) :) >>>> - I would prefer if resetting the reference phase times logger >>>> wouldn't be kind of an afterthought of printing :) >>>> >>>> Also it might be useful to keep the data around for somewhat >>>> longer (not throw it away after every phase). Don't we need the >>>> data for further analysis? >>> I don't have strong opinion on this. >>> >>> I didn't consider keeping log data for further analysis. This could >>> a minor reason for supporting keeping log data longer but I think >>> interspersing with existing G1 log would be the main reason of >>> keeping it. >>> >>>> This would also allow printing it later using different log tags >>>> (with different formatting). >>>> >>>> - I like the split of phasetimes into data storage and printing. 
>>>> I do not like that basically the timing data is created twice,
>>>> once for the phasetimes, once for the GCTimer (for JFR basically).
No, currently the timing data is created once and used for both the phase log and the GCTimer.
>>> Or am I missing something?
>>>
>>> So in summary, mostly I agree with your comments except below 2:
>>> 1. Interspersing with G1 log.
>>> 2. Keeping log data longer. (This should be done if we go with the interspersing idea)
>> I started working on above 2 items. :)
>> I will update webrev when I'm ready.
>>
> Thanks a lot for considering all my comments.
>
> I think the output is much nicer now :)
Thanks!
> Some more notes:
>
> - In the current change (webrev.2) the approach of using the "direct_print()" getter seems a bit forced, only to keep the current structure of the code, i.e. printing within the ReferenceProcessor::process_references() method.
Right.
> What do you think about moving the printing outside of that method for all collectors, just passing a (properly initialized - that allows moving the reset() method into gc specific code as well) ReferenceProcessorPhaseTimes* that is then later used for printing, either directly, or deferred?
Okay, this seems better than the current one. While applying your suggestion I tweaked it a little bit, because giving the responsibility of printing logs to the callers seems not that natural to me. (I also prepared an additional webrev for your original suggestion [1])
> At the location where the reference processing is done we know whether we need to print directly or deferred. This also hides pretty specific information about printing (like indentation level) from the reference processing itself.
>
> Also that would maybe allow storing the GCTimer reference somewhere in the ReferenceProcessorPhaseTimes so that we only need to pass a single container for timing information around.
Good idea, the GCTimer is now included in ReferenceProcessorPhaseTimes.
> Overall that may reduce the code quite a bit, keeps similar components (GCTimer and ReferenceProcessorPhaseTimes) together without ReferenceProcessor needing to know about both of them, and removes the ReferenceProcessor "global" reference to the ReferenceProcessorPhaseTimes, which is easier to keep track of when looking at the code (instead of having the GCTimer passed in and the ReferenceProcessorPhaseTimes as a member).
>
> The collectors that print immediately probably also can get away with a stack-allocated local ReferenceProcessorPhaseTimes, which somewhat simplifies their lifecycle management.
Right. Mostly, ReferenceProcessorPhaseTimes will be stack-allocated at the time of calling process_discovered_references() or enqueue_discovered_references(), except for the G1 young GC case. ~ReferenceProcessorPhaseTimes() will not be added in the destructor of G1CollectedHeap as we don't have one now. This can be addressed in a separate CR if needed.
> - could you please tighten the visibility of ReferenceProcessorPhaseTimes methods a bit? The getters of that class are only ever used in the print* methods, and even some of these print* methods are only ever called from class-local methods.
>
> I think this would drastically decrease the surface of that class.
You are right. I tried to move as many of them to 'private' as possible.
> - there seems to be a bug in printing per-thread per-phase worker times, the values seem to contain the absolute time at which the list has been processed, not a duration.
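This is the classic timestamp-versus-duration mixup: the reported values (~1512283.9 ms) are simply the current time, matching the 1512.286s log timestamp in the excerpt below. A tracker along these lines records a duration instead (illustrative names only; the webrev's RefProcWorkerTimeTracker presumably uses HotSpot's own time source rather than <chrono>):

#include <chrono>

class WorkerDurationTracker {
  double* _out_ms;
  std::chrono::steady_clock::time_point _start;
public:
  explicit WorkerDurationTracker(double* out_ms)
    : _out_ms(out_ms), _start(std::chrono::steady_clock::now()) {}

  ~WorkerDurationTracker() {
    auto end = std::chrono::steady_clock::now();
    // Record the elapsed time. Recording 'end' itself is what produces
    // absolute values like the 1512283.9 ms entries in the excerpt below.
    *_out_ms =
        std::chrono::duration<double, std::milli>(end - _start).count();
  }
};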
(with -XX:+ParallelRefProcEnabled and gc+phases+ref=trace)
>
> [1512.286s][debug][gc,phases,ref] GC(834) Reference Processing: 2.5ms
> [1512.286s][debug][gc,phases,ref] GC(834) SoftReference: 0.3ms
> [1512.286s][debug][gc,phases,ref] GC(834) Balance queues: 0.0ms
> [1512.286s][debug][gc,phases,ref] GC(834) Phase1: 0.3ms
> [1512.286s][trace][gc,phases,ref] GC(834) Process lists (ms) Min: 1512283.9, Avg: 1512283.9, Max: 1512283.9, Diff: 0.0, Sum: 34782529.1, Workers: 23
> [1512.286s][debug][gc,phases,ref] GC(834) Phase2: 0.3ms
> [1512.286s][trace][gc,phases,ref] GC(834) Process lists (ms) Min: 1512284.2, Avg: 1512284.2, Max: 1512284.2, Diff: 0.0, Sum: 34782535.9, Workers: 23
>
> - in referenceProcessorPhaseTimes.cpp:35: the code reads
>
> if (_worker_time != NULL) {
> ...
> }
>
> with _worker_time being set to NULL just one line above (same with the other constructor).
Not sure. The _worker_time check was a leftover from a previous change and resulted in the bug you pointed out above. Thanks for catching this. Fixed.
> - in RefProcWorkerTimeTracker::~RefProcWorkerTimeTracker: how is it possible that _worker_time is NULL? ReferenceProcessorPhaseTimes seems to always allocate memory for it.
Fixed. _worker_time can't be NULL.
> - RefProcPhaseTimesTracker takes the DiscoveredList array as parameter, but only ever uses it to determine how many total entries this DiscoveredList[] has. So it seems to me that it would be better in the name of information hiding if the ReferenceProcessor, which already has a total_count() method, would just pass this total instead of the entire list.
The problem is that the 'before/after' counts have to be gathered in the constructor and the destructor. By passing a parameter, the constructor could get the total, but that is impossible for the destructor. But I agree with your point that passing a DiscoveredList just to get the total count can be improved. So I changed it to add a new method on ReferenceProcessor that returns the total count (per ReferenceType). With this new approach, we can simplify a bit more, e.g. eliminate ReferenceProcessorPhaseTimes::max_gc_threads() and total_count_from_list() etc.
> This would also remove the need for the max_gc_counts() getter in ReferenceProcessorPhaseTimes afaics too.
[...]
> - "Ref Counts" vs. "Reference Counts" vs. something else in the output of the enqueue phase: I would prefer to not use abbreviations. Since we already mess up the logging output in a big way, we might also just go all the way :P
Changed to use 'Reference Counts'.

Updated webrev:
http://cr.openjdk.java.net/~sangheki/8173335/webrev.3
http://cr.openjdk.java.net/~sangheki/8173335/webrev.3_to_2/

Testing: JPRT and local test with all collectors.

[1]: with your suggestion, callers will stack-allocate ReferenceProcessorPhaseTimes, and the GCTimer can be included in ReferenceProcessorPhaseTimes, i.e. it applies only your suggestion:
http://cr.openjdk.java.net/~sangheki/8173335/webrev.3b
http://cr.openjdk.java.net/~sangheki/8173335/webrev.3b_to_2/

Thanks,
Sangheon

> Thanks,
> Thomas

From erik.helin at oracle.com Thu Jun 29 09:11:19 2017
From: erik.helin at oracle.com (Erik Helin)
Date: Thu, 29 Jun 2017 11:11:19 +0200
Subject: RFR: G1HRRSFlushLogBuffersOnVerify with remembered set verification does not work
Message-ID:

Hi all,

this patch removes the developer flag -XX:G1HRRSFlushLogBuffersOnVerify. This flag has been broken for some time and I don't see any reason for keeping it.
The flag is `false` by default so I guess this code isn't exercised all that much :/

I assume that the original intent of the flag was to perform an "update rs" phase before doing remembered set (rem set) verification. Due to the "update rs" phase, all rem sets would be complete, so verification would verify more rem set entries. However, since this code was added, update_rs has changed quite a bit, and this code hasn't kept up. It is no longer possible to call update_rs in the way this code expects.

Instead of spending time on trying to get this code up to date, I suggest we just remove it. During verification after a collection (young or mixed) we already do this kind of rem set verification (since all rem sets must then be complete since all collections currently do update_rs). If we are worried that we verify too few rem set entries during e.g. remark and cleanup, then we could for example run with very aggressive concurrent refinement.

Bug: https://bugs.openjdk.java.net/browse/JDK-8153360

Patch: http://cr.openjdk.java.net/~ehelin/8153360/00/

Test: make hotspot - this is "just" removal of code

Thanks,
Erik

From thomas.schatzl at oracle.com Thu Jun 29 09:37:36 2017
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Thu, 29 Jun 2017 11:37:36 +0200
Subject: RFR: G1HRRSFlushLogBuffersOnVerify with remembered set verification does not work
In-Reply-To:
References:
Message-ID: <1498729056.2900.4.camel@oracle.com>

Hi Erik,

On Thu, 2017-06-29 at 11:11 +0200, Erik Helin wrote:
> Hi all,
>
> this patch removes the developer flag
> -XX:G1HRRSFlushLogBuffersOnVerify.
> This flag has been broken for some time and I don't see any reason
> for keeping it. The flag is `false` by default so I guess this code
> isn't exercised all that much :/
>
> I assume that the original intent of the flag was to perform an
> "update rs" phase before doing remembered set (rem set) verification.
> Due to the "update rs" phase, all rem sets would be complete, so
> verification would verify more rem set entries. However, since this
> code was added, update_rs has changed quite a bit, and this code
> hasn't kept up. It is no longer possible to call update_rs in the way
> this code expects.
>
> Instead of spending time on trying to get this code up to date, I
> suggest we just remove it. During verification after a collection
> (young or mixed) we already do this kind of rem set verification
> (since all rem sets must then be complete since all collections
> currently do update_rs). If we are worried that we verify too few rem
> set entries during e.g. remark and cleanup, then we could for example
> run with very aggressive concurrent refinement.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8153360
>
> Patch: http://cr.openjdk.java.net/~ehelin/8153360/00/
>
> Test: make hotspot - this is "just" removal of code

Looks good. Please add a comment about what the last clause in the verification code actually means (heapRegion.cpp:584). Something like:

// Reference may not have been refined into the remembered sets yet.
// Instead of looking into all dirty card queues, we take a shortcut
// by looking at whether the corresponding card is dirty.
// ObjArrays may either be marked on the object header or exactly.

(Actually I would guess the "correct" clause here would be is_array() and not is_objArray(), but primitive type arrays are never marked as they do not contain references)

I do not need a re-review for the comment change.

Thanks,
Thomas

From robbin.ehn at oracle.com Thu Jun 29 10:49:58 2017
From: robbin.ehn at oracle.com (Robbin Ehn)
Date: Thu, 29 Jun 2017 12:49:58 +0200
Subject: RFR: Parallelize safepoint cleanup
In-Reply-To: <676d3b56-cee0-b68a-d700-e43695355148@redhat.com>
References: <486b5a72-bef8-4ebc-2729-3fe3aa3ab3b9@oracle.com> <5c80f8df-27c9-f9a9-dc6d-47f9c6019a61@redhat.com> <46ad874e-eb41-7927-265a-40dea92dfe1e@oracle.com> <5711258b-99b0-e06f-ba6e-0b6b55d88345@redhat.com> <0e1e2779-9316-b756-6cc8-e0c8add14a94@oracle.com> <1910961c-11bd-0e86-dd03-4fce66b9969f@redhat.com> <2b466176-b688-53a8-bef9-c7ec2c8c745b@oracle.com> <42872a15-d26c-9798-c6a2-f3f7c945baf7@redhat.com> <5e7c7d00-4acd-bea3-3525-33dbd9159efb@oracle.com> <6f2c6de7-298b-bf14-ab1f-430c4acd43c9@redhat.com> <5cd676de-872d-6d4a-691b-da561173f7d0@oracle.com> <61d80e98-275f-b2b8-4ac7-6d5d03b047de@redhat.com> <676d3b56-cee0-b68a-d700-e43695355148@redhat.com>
Message-ID: <1fbd2b4a-9aef-d6db-726e-929b6b466e4c@oracle.com>

Hi Roman,

I haven't had the time to test all scenarios, and the numbers are just an indication:

Do it in the VM thread, MonitorUsedDeflationThreshold=0: 0.002782s avg, avg of 10 worst cleanups 0.0173s
Do it with 4 workers, MonitorUsedDeflationThreshold=0: 0.002923s avg, avg of 10 worst cleanups 0.0199s
Do it in the VM thread, MonitorUsedDeflationThreshold=1: 0.001889s avg, avg of 10 worst cleanups 0.0066s

When MonitorUsedDeflationThreshold=0 we are talking about 120000 free monitors to deflate. And I get worse numbers doing the cleanup in 4 threads. Any idea why I see these numbers?

Thanks, Robbin

On 06/28/2017 10:23 PM, Roman Kennke wrote:
>
>> On 06/27/2017 09:47 PM, Roman Kennke wrote:
>>> Hi Robbin,
>>>
>>> Ugh. Thanks for catching this.
>>> Problem was that I was accounting the thread-local deflations twice:
>>> once in thread-local processing (basically a leftover from my earlier
>>> attempt to implement this accounting) and then again in
>>> finish_deflate_idle_monitors(). Should be fixed here:
>>>
>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.09/
>>>
>>
>> Nit:
>> safepoint.cpp : ParallelSPCleanupTask
>> "const char* name = " is not needed and 1 is unused
>>
> Sorry, I don't understand what you mean by this. I see code like this:
>
> const char* name = "deflating idle monitors";
>
> and it is used a few lines below, even 2x.
>
> What's '1 is unused' ?
>
>>>
>>> Side question: which jtreg targets do you usually run?
>>
>> Right now I cherry-pick directories from: hotspot/test/
>>
>> I'm going to add a decent test group for local testing.
> That would be good!
>
>
>>
>>>
>>> Trying: make test TEST=hotspot_all
>>> gives me *lots* of failures due to missing jcstress stuff (?!)
>>> And even other subsets seem to depend on several bits and pieces
>>> that I
>>> have no idea about.
>>
>> Yes, you need to use the internal tool 'jib' (Java Integrated Build) to get
>> that to work, or you can set some environment where the jcstress
>> application stuff is...
> Uhhh. We really do want a subset of tests that we can run reliably and
> that are self-contained, how else are people (without that jib thingy)
> supposed to do some sanity checking with their patches? ;-)
>> I have a regression on ClassLoaderData root scanning, this should not
>> be related,
>> but I only have 3 patches which could cause this, if it's not
>> something in the environment that has changed.
> Let me know if it's my patch :-) >> >> Also do not see any immediate performance gains (off vs 4 threads), it >> might be http://hg.openjdk.java.net/jdk10/hs/hotspot/rev/06994badeb24 >> , but I need to-do some more testing. I know you often run with none >> default GSI. > > First of all, during the course of this review I reduced the change from > an actual implementation to a kind of framework, and it needs some > separate changes in the GC to make use of it. Not sure if you added > corresponding code in (e.g.) G1? > > Also, this is only really visible in code that makes excessive use of > monitors, i.e. the one linked by Carsten's original patch, or the test > org.openjdk.gcbench.roots.Synchronizers.test in gc-bench: > > http://icedtea.classpath.org/hg/gc-bench/ > > There are also some popular real-world apps that tend to do this. From > the top off my head, Cassandra is such an application. > > Thanks, Roman > >> >> I'll get back to you. >> >> Thanks, Robbin >> >>> >>> Roman >>> >>> Am 27.06.2017 um 16:51 schrieb Robbin Ehn: >>>> Hi Roman, >>>> >>>> There is something wrong in calculations: >>>> INFO: Deflate: InCirc=43 InUse=18 Scavenged=25 ForceMonitorScavenge=0 >>>> : pop=27051 free=215487 >>>> >>>> free is larger than population, have not had the time to dig into this. >>>> >>>> Thanks, Robbin >>>> >>>> On 06/22/2017 10:19 PM, Roman Kennke wrote: >>>>> So here's the latest iteration of that patch: >>>>> >>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.08/ >>>>> >>>>> >>>>> I checked and fixed all the counters. The problem here is that they >>>>> are >>>>> not updated in a single place (deflate_idle_monitors() ) but in >>>>> several >>>>> places, potentially by multiple threads. I split up deflation into >>>>> prepare_.. and a finish_.. methods to initialize local and update >>>>> global >>>>> counters respectively, and pass around a counters object (allocated on >>>>> stack) to the various code paths that use it. Updating the counters >>>>> always happen under a lock, there's no need to do anything special >>>>> with >>>>> regards to concurrency. >>>>> >>>>> I also checked the nmethod marking, but there doesn't seem to be >>>>> anything in that code that looks problematic under concurrency. The >>>>> worst that can happen is that two threads write the same value into an >>>>> nmethod field. I think we can live with that ;-) >>>>> >>>>> Good to go? >>>>> >>>>> Tested by running specjvm and jcstress fastdebug+release without >>>>> issues. >>>>> >>>>> Roman >>>>> >>>>> Am 02.06.2017 um 12:39 schrieb Robbin Ehn: >>>>>> Hi Roman, >>>>>> >>>>>> On 06/02/2017 11:41 AM, Roman Kennke wrote: >>>>>>> Hi David, >>>>>>> thanks for reviewing. I'll be on vacation the next two weeks too, >>>>>>> with >>>>>>> only sporadic access to work stuff. >>>>>>> Yes, exposure will not be as good as otherwise, but it's not totally >>>>>>> untested either: the serial code path is the same as the >>>>>>> parallel, the >>>>>>> only difference is that it's not actually called by multiple >>>>>>> threads. >>>>>>> It's ok I think. >>>>>>> >>>>>>> I found two more issues that I think should be addressed: >>>>>>> - There are some counters in deflate_idle_monitors() and I'm not >>>>>>> sure I >>>>>>> correctly handle them in the split-up and MT'ed thread-local/ global >>>>>>> list deflation >>>>>>> - nmethod marking seems to unconditionally poke true or something >>>>>>> like >>>>>>> that in nmethod fields. 
This doesn't hurt correctness-wise, but it's >>>>>>> probably worth checking if it's already true, especially when doing >>>>>>> this >>>>>>> with multiple threads concurrently. >>>>>>> >>>>>>> I'll send an updated patch around later, I hope I can get to it >>>>>>> today... >>>>>> >>>>>> I'll review that when you get it out. >>>>>> I think this looks as a reasonable step before we tackle this with a >>>>>> major effort, such as the JEP you and Carsten doing. >>>>>> And another effort to 'fix' nmethods marking. >>>>>> >>>>>> Internal discussion yesterday lead us to conclude that the runtime >>>>>> will probably need more threads. >>>>>> This would be a good driver to do a 'global' worker pool which serves >>>>>> both gc, runtime and safepoints with threads. >>>>>> >>>>>>> >>>>>>> Roman >>>>>>> >>>>>>>> Hi Roman, >>>>>>>> >>>>>>>> I am about to disappear on an extended vacation so will let others >>>>>>>> pursue this. IIUC this is longer an opt-in by the user at runtime, >>>>>>>> but >>>>>>>> an opt-in by the particular GC developers. Okay. My only concern >>>>>>>> with >>>>>>>> that is if Shenandoah is the only GC that currently opts in then >>>>>>>> this >>>>>>>> code is not going to get much testing and will be more prone to >>>>>>>> incidental breakage. >>>>>> >>>>>> As I mentioned before, it seem like Erik ? have some idea, maybe he >>>>>> can do this after his barrier patch. >>>>>> >>>>>> Thanks! >>>>>> >>>>>> /Robbin >>>>>> >>>>>>>> >>>>>>>> Cheers, >>>>>>>> David >>>>>>>> >>>>>>>> On 2/06/2017 2:21 AM, Roman Kennke wrote: >>>>>>>>> Am 01.06.2017 um 17:50 schrieb Roman Kennke: >>>>>>>>>> Am 01.06.2017 um 14:18 schrieb Robbin Ehn: >>>>>>>>>>> Hi Roman, >>>>>>>>>>> >>>>>>>>>>> On 06/01/2017 11:29 AM, Roman Kennke wrote: >>>>>>>>>>>> Am 31.05.2017 um 22:06 schrieb Robbin Ehn: >>>>>>>>>>>>> Hi Roman, I agree that is really needed but: >>>>>>>>>>>>> >>>>>>>>>>>>> On 05/31/2017 10:27 AM, Roman Kennke wrote: >>>>>>>>>>>>>> I realized that sharing workers with GC is not so easy. >>>>>>>>>>>>>> >>>>>>>>>>>>>> We need to be able to use the workers at a safepoint during >>>>>>>>>>>>>> concurrent >>>>>>>>>>>>>> GC work (which also uses the same workers). This does not >>>>>>>>>>>>>> only >>>>>>>>>>>>>> require >>>>>>>>>>>>>> that those workers be suspended, like e.g. >>>>>>>>>>>>>> SuspendibleThreadSet::yield(), but they need to be idle, i.e. >>>>>>>>>>>>>> have >>>>>>>>>>>>>> finished their tasks. This needs some careful handling to >>>>>>>>>>>>>> work >>>>>>>>>>>>>> without >>>>>>>>>>>>>> races: it requires a SuspendibleThreadSetJoiner around the >>>>>>>>>>>>>> corresponding >>>>>>>>>>>>>> run_task() call and also the tasks themselves need to join >>>>>>>>>>>>>> the >>>>>>>>>>>>>> STS and >>>>>>>>>>>>>> handle requests for safepoints not by yielding, but by >>>>>>>>>>>>>> leaving >>>>>>>>>>>>>> the >>>>>>>>>>>>>> task. >>>>>>>>>>>>>> This is far too peculiar for me to make the call to hook >>>>>>>>>>>>>> up GC >>>>>>>>>>>>>> workers >>>>>>>>>>>>>> for safepoint cleanup, and I thus removed those parts. I >>>>>>>>>>>>>> left the >>>>>>>>>>>>>> API in >>>>>>>>>>>>>> CollectedHeap in place. I think GC devs who know better >>>>>>>>>>>>>> about G1 >>>>>>>>>>>>>> and CMS >>>>>>>>>>>>>> should make that call, or else just use a separate thread >>>>>>>>>>>>>> pool. >>>>>>>>>>>>>> >>>>>>>>>>>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.05/ >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Is it ok now? 
>>>>>>>>>>>>> I still think you should put the "Parallel Safepoint Cleanup" >>>>>>>>>>>>> workers >>>>>>>>>>>>> inside Shenandoah, >>>>>>>>>>>>> so the SafepointSynchronizer only calls get_safepoint_workers, >>>>>>>>>>>>> e.g.: >>>>>>>>>>>>> >>>>>>>>>>>>> _cleanup_workers = heap->get_safepoint_workers(); >>>>>>>>>>>>> _num_cleanup_workers = _cleanup_workers != NULL ? >>>>>>>>>>>>> _cleanup_workers->total_workers() : 1; >>>>>>>>>>>>> ParallelSPCleanupTask cleanup(_cleanup_subtasks); >>>>>>>>>>>>> StrongRootsScope srs(_num_cleanup_workers); >>>>>>>>>>>>> if (_cleanup_workers != NULL) { >>>>>>>>>>>>> _cleanup_workers->run_task(&cleanup, >>>>>>>>>>>>> _num_cleanup_workers); >>>>>>>>>>>>> } else { >>>>>>>>>>>>> cleanup.work(0); >>>>>>>>>>>>> } >>>>>>>>>>>>> >>>>>>>>>>>>> That way you don't even need your new flags, but it will be >>>>>>>>>>>>> up to >>>>>>>>>>>>> the >>>>>>>>>>>>> other GCs to make their worker available >>>>>>>>>>>>> or cheat with a separate workgang. >>>>>>>>>>>> I can do that, I don't mind. The question is, do we want that? >>>>>>>>>>> The problem is that we do not want to haste such decision, we >>>>>>>>>>> believe >>>>>>>>>>> there is a better solution. >>>>>>>>>>> I think you also would want another solution. >>>>>>>>>>> But it's seems like such solution with 1 'global' thread pool >>>>>>>>>>> either >>>>>>>>>>> own by GC or the VM it self is quite the undertaking. >>>>>>>>>>> Since this probably will not be done any time soon my >>>>>>>>>>> suggestion is, >>>>>>>>>>> to not hold you back (we also want this), just to make >>>>>>>>>>> the code parallel and as an intermediate step ask the GC if it >>>>>>>>>>> minds >>>>>>>>>>> sharing it's thread. >>>>>>>>>>> >>>>>>>>>>> Now when Shenandoah is merged it's possible that e.g. G1 will >>>>>>>>>>> share >>>>>>>>>>> the code for a separate thread pool, do something of it's own or >>>>>>>>>>> wait until the bigger question about thread pool(s) have been >>>>>>>>>>> resolved. >>>>>>>>>>> >>>>>>>>>>> By adding a thread pool directly to the SafepointSynchronizer >>>>>>>>>>> and >>>>>>>>>>> flags for it we might limit our future options. >>>>>>>>>>> >>>>>>>>>>>> I wouldn't call it 'cheating with a separate workgang' >>>>>>>>>>>> though. I >>>>>>>>>>>> see >>>>>>>>>>>> that both G1 and CMS suspend their worker threads at a >>>>>>>>>>>> safepoint. >>>>>>>>>>>> However: >>>>>>>>>>> Yes it's not cheating but I want decent heuristics between e.g. >>>>>>>>>>> number >>>>>>>>>>> of concurrent marking threads and parallel safepoint threads >>>>>>>>>>> since >>>>>>>>>>> they compete for cpu time. >>>>>>>>>>> As the code looks now, I think that decisions must be made by >>>>>>>>>>> the >>>>>>>>>>> GC. >>>>>>>>>> Ok, I see your point. I updated the proposed patch accordingly: >>>>>>>>>> >>>>>>>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.06/ >>>>>>>>>> >>>>>>>>> Oops. Minor mistake there. Correction: >>>>>>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.07/ >>>>>>>>> >>>>>>>>> >>>>>>>>> (Removed 'class WorkGang' from safepoint.hpp, and forgot to add it >>>>>>>>> into >>>>>>>>> collectedHeap.hpp, resulting in build failure...) 
>>>>>>>>>
>>>>>>>>> Roman
>>>>>>>>>

From rkennke at redhat.com Thu Jun 29 11:42:42 2017
From: rkennke at redhat.com (Roman Kennke)
Date: Thu, 29 Jun 2017 13:42:42 +0200
Subject: RFR: Parallelize safepoint cleanup
In-Reply-To: <1fbd2b4a-9aef-d6db-726e-929b6b466e4c@oracle.com>
References: <486b5a72-bef8-4ebc-2729-3fe3aa3ab3b9@oracle.com> <5c80f8df-27c9-f9a9-dc6d-47f9c6019a61@redhat.com> <46ad874e-eb41-7927-265a-40dea92dfe1e@oracle.com> <5711258b-99b0-e06f-ba6e-0b6b55d88345@redhat.com> <0e1e2779-9316-b756-6cc8-e0c8add14a94@oracle.com> <1910961c-11bd-0e86-dd03-4fce66b9969f@redhat.com> <2b466176-b688-53a8-bef9-c7ec2c8c745b@oracle.com> <42872a15-d26c-9798-c6a2-f3f7c945baf7@redhat.com> <5e7c7d00-4acd-bea3-3525-33dbd9159efb@oracle.com> <6f2c6de7-298b-bf14-ab1f-430c4acd43c9@redhat.com> <5cd676de-872d-6d4a-691b-da561173f7d0@oracle.com> <61d80e98-275f-b2b8-4ac7-6d5d03b047de@redhat.com> <676d3b56-cee0-b68a-d700-e43695355148@redhat.com> <1fbd2b4a-9aef-d6db-726e-929b6b466e4c@oracle.com>
Message-ID: <08391C19-4675-475C-A30D-F10B364B5AF3@redhat.com>

How many Java threads are involved in monitor inflation? Parallelization is spread by Java threads (i.e. each worker claims and deflates the monitors of one Java thread per step).

Roman

Am 29. Juni 2017 12:49:58 MESZ schrieb Robbin Ehn:
>Hi Roman,
>
>I haven't had the time to test all scenarios, and the numbers are just
>an indication:
>
>Do it VM thread, MonitorUsedDeflationThreshold=0, 0.002782s avg, avg of 10 worst cleanups 0.0173s
>Do it 4 workers, MonitorUsedDeflationThreshold=0, 0.002923s avg, avg of 10 worst cleanups 0.0199s
>Do it VM thread, MonitorUsedDeflationThreshold=1, 0.001889s avg, avg of 10 worst cleanups 0.0066s
>
>When MonitorUsedDeflationThreshold=0 we are talking about 120000 free
>monitors to deflate.
>And I get worse numbers doing the cleanup in 4 threads.
>
>Any idea why I see these numbers?
>
>Thanks, Robbin
>
>On 06/28/2017 10:23 PM, Roman Kennke wrote:
>>
>>> On 06/27/2017 09:47 PM, Roman Kennke wrote:
>>>> Hi Robbin,
>>>>
>>>> Ugh. Thanks for catching this.
>>>> Problem was that I was accounting the thread-local deflations twice:
>>>> once in thread-local processing (basically a leftover from my earlier
>>>> attempt to implement this accounting) and then again in
>>>> finish_deflate_idle_monitors(). Should be fixed here:
>>>>
>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.09/
>>>>
>>> Nit:
>>> safepoint.cpp : ParallelSPCleanupTask
>>> "const char* name = " is not needed and 1 is unused
>>>
>> Sorry, I don't understand what you mean by this. I see code like this:
>>
>> const char* name = "deflating idle monitors";
>>
>> and it is used a few lines below, even 2x.
>>
>> What's '1 is unused' ?
>>
>>>> Side question: which jtreg targets do you usually run?
>>>
>>> Right now I cherry pick directories from: hotspot/test/
>>>
>>> I'm going to add a decent test group for local testing.
>> That would be good!
>>
>>>> Trying: make test TEST=hotspot_all
>>>> gives me *lots* of failures due to missing jcstress stuff (?!)
>>>> And even other subsets seem to depend on several bits and pieces
>>>> that I have no idea about.
>>>
>>> Yes, you need to use internal tool 'jib' java integrate build to get
>>> that work or you can set some environment where the jcstress
>>> application stuff is...
>> Uhhh.
We really do want a subset of tests that we can run reliably >and >> that are self-contained, how else are people (without that jib >thingy) >> supposed to do some sanity checking with their patches? ;-) >>> I have a regression on ClassLoaderData root scanning, this should >not >>> be related, >>> but I only have 3 patches which could cause this, if it's not >>> something in the environment that have changed. >> Let me know if it's my patch :-) >>> >>> Also do not see any immediate performance gains (off vs 4 threads), >it >>> might be >http://hg.openjdk.java.net/jdk10/hs/hotspot/rev/06994badeb24 >>> , but I need to-do some more testing. I know you often run with none >>> default GSI. >> >> First of all, during the course of this review I reduced the change >from >> an actual implementation to a kind of framework, and it needs some >> separate changes in the GC to make use of it. Not sure if you added >> corresponding code in (e.g.) G1? >> >> Also, this is only really visible in code that makes excessive use of >> monitors, i.e. the one linked by Carsten's original patch, or the >test >> org.openjdk.gcbench.roots.Synchronizers.test in gc-bench: >> >> http://icedtea.classpath.org/hg/gc-bench/ >> >> There are also some popular real-world apps that tend to do this. >From >> the top off my head, Cassandra is such an application. >> >> Thanks, Roman >> >>> >>> I'll get back to you. >>> >>> Thanks, Robbin >>> >>>> >>>> Roman >>>> >>>> Am 27.06.2017 um 16:51 schrieb Robbin Ehn: >>>>> Hi Roman, >>>>> >>>>> There is something wrong in calculations: >>>>> INFO: Deflate: InCirc=43 InUse=18 Scavenged=25 >ForceMonitorScavenge=0 >>>>> : pop=27051 free=215487 >>>>> >>>>> free is larger than population, have not had the time to dig into >this. >>>>> >>>>> Thanks, Robbin >>>>> >>>>> On 06/22/2017 10:19 PM, Roman Kennke wrote: >>>>>> So here's the latest iteration of that patch: >>>>>> >>>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.08/ >>>>>> >>>>>> >>>>>> I checked and fixed all the counters. The problem here is that >they >>>>>> are >>>>>> not updated in a single place (deflate_idle_monitors() ) but in >>>>>> several >>>>>> places, potentially by multiple threads. I split up deflation >into >>>>>> prepare_.. and a finish_.. methods to initialize local and update >>>>>> global >>>>>> counters respectively, and pass around a counters object >(allocated on >>>>>> stack) to the various code paths that use it. Updating the >counters >>>>>> always happen under a lock, there's no need to do anything >special >>>>>> with >>>>>> regards to concurrency. >>>>>> >>>>>> I also checked the nmethod marking, but there doesn't seem to be >>>>>> anything in that code that looks problematic under concurrency. >The >>>>>> worst that can happen is that two threads write the same value >into an >>>>>> nmethod field. I think we can live with that ;-) >>>>>> >>>>>> Good to go? >>>>>> >>>>>> Tested by running specjvm and jcstress fastdebug+release without >>>>>> issues. >>>>>> >>>>>> Roman >>>>>> >>>>>> Am 02.06.2017 um 12:39 schrieb Robbin Ehn: >>>>>>> Hi Roman, >>>>>>> >>>>>>> On 06/02/2017 11:41 AM, Roman Kennke wrote: >>>>>>>> Hi David, >>>>>>>> thanks for reviewing. I'll be on vacation the next two weeks >too, >>>>>>>> with >>>>>>>> only sporadic access to work stuff. 
>>>>>>>> Yes, exposure will not be as good as otherwise, but it's not >totally >>>>>>>> untested either: the serial code path is the same as the >>>>>>>> parallel, the >>>>>>>> only difference is that it's not actually called by multiple >>>>>>>> threads. >>>>>>>> It's ok I think. >>>>>>>> >>>>>>>> I found two more issues that I think should be addressed: >>>>>>>> - There are some counters in deflate_idle_monitors() and I'm >not >>>>>>>> sure I >>>>>>>> correctly handle them in the split-up and MT'ed thread-local/ >global >>>>>>>> list deflation >>>>>>>> - nmethod marking seems to unconditionally poke true or >something >>>>>>>> like >>>>>>>> that in nmethod fields. This doesn't hurt correctness-wise, but >it's >>>>>>>> probably worth checking if it's already true, especially when >doing >>>>>>>> this >>>>>>>> with multiple threads concurrently. >>>>>>>> >>>>>>>> I'll send an updated patch around later, I hope I can get to it >>>>>>>> today... >>>>>>> >>>>>>> I'll review that when you get it out. >>>>>>> I think this looks as a reasonable step before we tackle this >with a >>>>>>> major effort, such as the JEP you and Carsten doing. >>>>>>> And another effort to 'fix' nmethods marking. >>>>>>> >>>>>>> Internal discussion yesterday lead us to conclude that the >runtime >>>>>>> will probably need more threads. >>>>>>> This would be a good driver to do a 'global' worker pool which >serves >>>>>>> both gc, runtime and safepoints with threads. >>>>>>> >>>>>>>> >>>>>>>> Roman >>>>>>>> >>>>>>>>> Hi Roman, >>>>>>>>> >>>>>>>>> I am about to disappear on an extended vacation so will let >others >>>>>>>>> pursue this. IIUC this is longer an opt-in by the user at >runtime, >>>>>>>>> but >>>>>>>>> an opt-in by the particular GC developers. Okay. My only >concern >>>>>>>>> with >>>>>>>>> that is if Shenandoah is the only GC that currently opts in >then >>>>>>>>> this >>>>>>>>> code is not going to get much testing and will be more prone >to >>>>>>>>> incidental breakage. >>>>>>> >>>>>>> As I mentioned before, it seem like Erik ? have some idea, maybe >he >>>>>>> can do this after his barrier patch. >>>>>>> >>>>>>> Thanks! >>>>>>> >>>>>>> /Robbin >>>>>>> >>>>>>>>> >>>>>>>>> Cheers, >>>>>>>>> David >>>>>>>>> >>>>>>>>> On 2/06/2017 2:21 AM, Roman Kennke wrote: >>>>>>>>>> Am 01.06.2017 um 17:50 schrieb Roman Kennke: >>>>>>>>>>> Am 01.06.2017 um 14:18 schrieb Robbin Ehn: >>>>>>>>>>>> Hi Roman, >>>>>>>>>>>> >>>>>>>>>>>> On 06/01/2017 11:29 AM, Roman Kennke wrote: >>>>>>>>>>>>> Am 31.05.2017 um 22:06 schrieb Robbin Ehn: >>>>>>>>>>>>>> Hi Roman, I agree that is really needed but: >>>>>>>>>>>>>> >>>>>>>>>>>>>> On 05/31/2017 10:27 AM, Roman Kennke wrote: >>>>>>>>>>>>>>> I realized that sharing workers with GC is not so easy. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> We need to be able to use the workers at a safepoint >during >>>>>>>>>>>>>>> concurrent >>>>>>>>>>>>>>> GC work (which also uses the same workers). This does >not >>>>>>>>>>>>>>> only >>>>>>>>>>>>>>> require >>>>>>>>>>>>>>> that those workers be suspended, like e.g. >>>>>>>>>>>>>>> SuspendibleThreadSet::yield(), but they need to be idle, >i.e. >>>>>>>>>>>>>>> have >>>>>>>>>>>>>>> finished their tasks. 
This needs some careful handling >to >>>>>>>>>>>>>>> work >>>>>>>>>>>>>>> without >>>>>>>>>>>>>>> races: it requires a SuspendibleThreadSetJoiner around >the >>>>>>>>>>>>>>> corresponding >>>>>>>>>>>>>>> run_task() call and also the tasks themselves need to >join >>>>>>>>>>>>>>> the >>>>>>>>>>>>>>> STS and >>>>>>>>>>>>>>> handle requests for safepoints not by yielding, but by >>>>>>>>>>>>>>> leaving >>>>>>>>>>>>>>> the >>>>>>>>>>>>>>> task. >>>>>>>>>>>>>>> This is far too peculiar for me to make the call to hook >>>>>>>>>>>>>>> up GC >>>>>>>>>>>>>>> workers >>>>>>>>>>>>>>> for safepoint cleanup, and I thus removed those parts. I >>>>>>>>>>>>>>> left the >>>>>>>>>>>>>>> API in >>>>>>>>>>>>>>> CollectedHeap in place. I think GC devs who know better >>>>>>>>>>>>>>> about G1 >>>>>>>>>>>>>>> and CMS >>>>>>>>>>>>>>> should make that call, or else just use a separate >thread >>>>>>>>>>>>>>> pool. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.05/ >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Is it ok now? >>>>>>>>>>>>>> I still think you should put the "Parallel Safepoint >Cleanup" >>>>>>>>>>>>>> workers >>>>>>>>>>>>>> inside Shenandoah, >>>>>>>>>>>>>> so the SafepointSynchronizer only calls >get_safepoint_workers, >>>>>>>>>>>>>> e.g.: >>>>>>>>>>>>>> >>>>>>>>>>>>>> _cleanup_workers = heap->get_safepoint_workers(); >>>>>>>>>>>>>> _num_cleanup_workers = _cleanup_workers != NULL ? >>>>>>>>>>>>>> _cleanup_workers->total_workers() : 1; >>>>>>>>>>>>>> ParallelSPCleanupTask cleanup(_cleanup_subtasks); >>>>>>>>>>>>>> StrongRootsScope srs(_num_cleanup_workers); >>>>>>>>>>>>>> if (_cleanup_workers != NULL) { >>>>>>>>>>>>>> _cleanup_workers->run_task(&cleanup, >>>>>>>>>>>>>> _num_cleanup_workers); >>>>>>>>>>>>>> } else { >>>>>>>>>>>>>> cleanup.work(0); >>>>>>>>>>>>>> } >>>>>>>>>>>>>> >>>>>>>>>>>>>> That way you don't even need your new flags, but it will >be >>>>>>>>>>>>>> up to >>>>>>>>>>>>>> the >>>>>>>>>>>>>> other GCs to make their worker available >>>>>>>>>>>>>> or cheat with a separate workgang. >>>>>>>>>>>>> I can do that, I don't mind. The question is, do we want >that? >>>>>>>>>>>> The problem is that we do not want to haste such decision, >we >>>>>>>>>>>> believe >>>>>>>>>>>> there is a better solution. >>>>>>>>>>>> I think you also would want another solution. >>>>>>>>>>>> But it's seems like such solution with 1 'global' thread >pool >>>>>>>>>>>> either >>>>>>>>>>>> own by GC or the VM it self is quite the undertaking. >>>>>>>>>>>> Since this probably will not be done any time soon my >>>>>>>>>>>> suggestion is, >>>>>>>>>>>> to not hold you back (we also want this), just to make >>>>>>>>>>>> the code parallel and as an intermediate step ask the GC if >it >>>>>>>>>>>> minds >>>>>>>>>>>> sharing it's thread. >>>>>>>>>>>> >>>>>>>>>>>> Now when Shenandoah is merged it's possible that e.g. G1 >will >>>>>>>>>>>> share >>>>>>>>>>>> the code for a separate thread pool, do something of it's >own or >>>>>>>>>>>> wait until the bigger question about thread pool(s) have >been >>>>>>>>>>>> resolved. >>>>>>>>>>>> >>>>>>>>>>>> By adding a thread pool directly to the >SafepointSynchronizer >>>>>>>>>>>> and >>>>>>>>>>>> flags for it we might limit our future options. >>>>>>>>>>>> >>>>>>>>>>>>> I wouldn't call it 'cheating with a separate workgang' >>>>>>>>>>>>> though. I >>>>>>>>>>>>> see >>>>>>>>>>>>> that both G1 and CMS suspend their worker threads at a >>>>>>>>>>>>> safepoint. 
>>>>>>>>>>>>> However:
>>>>>>>>>>>> Yes it's not cheating but I want decent heuristics between e.g.
>>>>>>>>>>>> number of concurrent marking threads and parallel safepoint
>>>>>>>>>>>> threads since they compete for cpu time.
>>>>>>>>>>>> As the code looks now, I think that decisions must be made by
>>>>>>>>>>>> the GC.
>>>>>>>>>>> Ok, I see your point. I updated the proposed patch accordingly:
>>>>>>>>>>>
>>>>>>>>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.06/
>>>>>>>>>>>
>>>>>>>>>> Oops. Minor mistake there. Correction:
>>>>>>>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.07/
>>>>>>>>>>
>>>>>>>>>> (Removed 'class WorkGang' from safepoint.hpp, and forgot to add it
>>>>>>>>>> into collectedHeap.hpp, resulting in build failure...)
>>>>>>>>>>
>>>>>>>>>> Roman

--
Diese Nachricht wurde von meinem Android-Gerät mit K-9 Mail gesendet.

From robbin.ehn at oracle.com Thu Jun 29 12:17:07 2017
From: robbin.ehn at oracle.com (Robbin Ehn)
Date: Thu, 29 Jun 2017 14:17:07 +0200
Subject: RFR: Parallelize safepoint cleanup
In-Reply-To: <08391C19-4675-475C-A30D-F10B364B5AF3@redhat.com>
References: <5c80f8df-27c9-f9a9-dc6d-47f9c6019a61@redhat.com> <46ad874e-eb41-7927-265a-40dea92dfe1e@oracle.com> <5711258b-99b0-e06f-ba6e-0b6b55d88345@redhat.com> <0e1e2779-9316-b756-6cc8-e0c8add14a94@oracle.com> <1910961c-11bd-0e86-dd03-4fce66b9969f@redhat.com> <2b466176-b688-53a8-bef9-c7ec2c8c745b@oracle.com> <42872a15-d26c-9798-c6a2-f3f7c945baf7@redhat.com> <5e7c7d00-4acd-bea3-3525-33dbd9159efb@oracle.com> <6f2c6de7-298b-bf14-ab1f-430c4acd43c9@redhat.com> <5cd676de-872d-6d4a-691b-da561173f7d0@oracle.com> <61d80e98-275f-b2b8-4ac7-6d5d03b047de@redhat.com> <676d3b56-cee0-b68a-d700-e43695355148@redhat.com> <1fbd2b4a-9aef-d6db-726e-929b6b466e4c@oracle.com> <08391C19-4675-475C-A30D-F10B364B5AF3@redhat.com>
Message-ID: <9a882506-282a-ec74-27de-5b22e258e352@oracle.com>

The test is using 24 threads (whatever that means), total number of Java threads is 57 (including compiler, etc...).
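(For reference, the dump below comes from a quick local hack, roughly along these lines - this is only a sketch, not part of any webrev, and the exact names are from memory:)

   // Rough sketch of the instrumentation that produced the dump below:
   // at the safepoint, walk all JavaThreads and print the length of
   // each thread's in-use monitor list (omInUseCount). Logged on error
   // level only so that it is easy to grep out of the output.
   int n = 0;
   for (JavaThread* t = Threads::first(); t != NULL; t = t->next()) {
     n++;
   }
   log_error(os)("Num threads:%d", n);
   for (JavaThread* t = Threads::first(); t != NULL; t = t->next()) {
     log_error(os)("omInUseCount:%d", t->omInUseCount);
   }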
[29.186s][error][os ] Num threads:57
[29.186s][error][os ] omInUseCount:0
[29.186s][error][os ] omInUseCount:2064
[29.187s][error][os ] omInUseCount:1861
[29.188s][error][os ] omInUseCount:1058
[29.188s][error][os ] omInUseCount:2
[29.188s][error][os ] omInUseCount:577
[29.189s][error][os ] omInUseCount:1443
[29.189s][error][os ] omInUseCount:122
[29.189s][error][os ] omInUseCount:47
[29.189s][error][os ] omInUseCount:497
[29.189s][error][os ] omInUseCount:16
[29.189s][error][os ] omInUseCount:113
[29.189s][error][os ] omInUseCount:5
[29.189s][error][os ] omInUseCount:678
[29.190s][error][os ] omInUseCount:105
[29.190s][error][os ] omInUseCount:609
[29.190s][error][os ] omInUseCount:286
[29.190s][error][os ] omInUseCount:228
[29.190s][error][os ] omInUseCount:1391
[29.191s][error][os ] omInUseCount:1652
[29.191s][error][os ] omInUseCount:325
[29.191s][error][os ] omInUseCount:439
[29.192s][error][os ] omInUseCount:994
[29.192s][error][os ] omInUseCount:103
[29.192s][error][os ] omInUseCount:2337
[29.193s][error][os ] omInUseCount:0
[29.193s][error][os ] omInUseCount:0
[29.193s][error][os ] omInUseCount:0
[29.193s][error][os ] omInUseCount:1
[29.193s][error][os ] omInUseCount:1
[29.193s][error][os ] omInUseCount:0
[29.193s][error][os ] omInUseCount:1
[29.193s][error][os ] omInUseCount:2
[29.193s][error][os ] omInUseCount:1
[29.193s][error][os ] omInUseCount:1
[29.193s][error][os ] omInUseCount:0
[29.193s][error][os ] omInUseCount:1
[29.193s][error][os ] omInUseCount:0
[29.193s][error][os ] omInUseCount:0
[29.193s][error][os ] omInUseCount:0
[29.193s][error][os ] omInUseCount:0
[29.193s][error][os ] omInUseCount:0
[29.193s][error][os ] omInUseCount:0
[29.193s][error][os ] omInUseCount:0
[29.193s][error][os ] omInUseCount:0
[29.193s][error][os ] omInUseCount:0
[29.193s][error][os ] omInUseCount:0
[29.193s][error][os ] omInUseCount:0
[29.193s][error][os ] omInUseCount:0
[29.193s][error][os ] omInUseCount:0
[29.193s][error][os ] omInUseCount:0
[29.193s][error][os ] omInUseCount:0
[29.193s][error][os ] omInUseCount:0
[29.193s][error][os ] omInUseCount:0
[29.193s][error][os ] omInUseCount:1
[29.193s][error][os ] omInUseCount:0
[29.193s][error][os ] omInUseCount:0

So in my setup, even if you parallelize the per-thread in-use monitor work, the synchronization overhead is still larger.

/Robbin

On 06/29/2017 01:42 PM, Roman Kennke wrote:
> How many Java threads are involved in monitor inflation? Parallelization is spread by Java threads (i.e. each worker claims and deflates the monitors of one Java thread per step).
>
> Roman
>
> Am 29. Juni 2017 12:49:58 MESZ schrieb Robbin Ehn:
>
> Hi Roman,
>
> I haven't had the time to test all scenarios, and the numbers are just an indication:
>
> Do it VM thread, MonitorUsedDeflationThreshold=0, 0.002782s avg, avg of 10 worst cleanups 0.0173s
> Do it 4 workers, MonitorUsedDeflationThreshold=0, 0.002923s avg, avg of 10 worst cleanups 0.0199s
> Do it VM thread, MonitorUsedDeflationThreshold=1, 0.001889s avg, avg of 10 worst cleanups 0.0066s
>
> When MonitorUsedDeflationThreshold=0 we are talking about 120000 free monitors to deflate.
> And I get worse numbers doing the cleanup in 4 threads.
>
> Any idea why I see these numbers?
>
> Thanks, Robbin
>
> On 06/28/2017 10:23 PM, Roman Kennke wrote:
>
> On 06/27/2017 09:47 PM, Roman Kennke wrote:
> Hi Robbin,
>
> Ugh. Thanks for catching this.
> Problem was that I was accounting the thread-local deflations twice: > once in thread-local processing (basically a leftover from my earlier > attempt to implement this accounting) and then again in > finish_deflate_idle_monitors(). Should be fixed here: > > http://cr.openjdk.java.net/~rkennke/8180932/webrev.09/ > > > > Nit: > safepoint.cpp : ParallelSPCleanupTask > "const char* name = " is not needed and 1 is unused > > > Sorry, I don't understand what you mean by this. I see code like this: > > const char* name = "deflating idle monitors"; > > and it is used a few lines below, even 2x. > > What's '1 is unused' ? > > > Side question: which jtreg targets do you usually run? > > > Right now I cherry pick directories from: hotspot/test/ > > I'm going to add a decent test group for local testing. > > That would be good! > > > > > Trying: make test TEST=hotspot_all > gives me *lots* of failures due to missing jcstress stuff (?!) > And even other subsets seem to depend on several bits and pieces > that I > have no idea about. > > > Yes, you need to use internal tool 'jib' java integrate build to get > that work or you can set some environment where the jcstress > application stuff is... > > Uhhh. We really do want a subset of tests that we can run reliably and > that are self-contained, how else are people (without that jib thingy) > supposed to do some sanity checking with their patches? ;-) > > I have a regression on ClassLoaderData root scanning, this should not > be related, > but I only have 3 patches which could cause this, if it's not > something in the environment that have changed. > > Let me know if it's my patch :-) > > > Also do not see any immediate performance gains (off vs 4 threads), it > might be http://hg.openjdk.java.net/jdk10/hs/hotspot/rev/06994badeb24 > , but I need to-do some more testing. I know you often run with none > default GSI. > > > First of all, during the course of this review I reduced the change from > an actual implementation to a kind of framework, and it needs some > separate changes in the GC to make use of it. Not sure if you added > corresponding code in (e.g.) G1? > > Also, this is only really visible in code that makes excessive use of > monitors, i.e. the one linked by Carsten's original patch, or the test > org.openjdk.gcbench.roots.Synchronizers.test in gc-bench: > > http://icedtea.classpath.org/hg/gc-bench/ > > There are also some popular real-world apps that tend to do this. From > the top off my head, Cassandra is such an application. > > Thanks, Roman > > > I'll get back to you. > > Thanks, Robbin > > > Roman > > Am 27.06.2017 um 16:51 schrieb Robbin Ehn: > > Hi Roman, > > There is something wrong in calculations: > INFO: Deflate: InCirc=43 InUse=18 Scavenged=25 ForceMonitorScavenge=0 > : pop=27051 free=215487 > > free is larger than population, have not had the time to dig into this. > > Thanks, Robbin > > On 06/22/2017 10:19 PM, Roman Kennke wrote: > > So here's the latest iteration of that patch: > > http://cr.openjdk.java.net/~rkennke/8180932/webrev.08/ > > > I checked and fixed all the counters. The problem here is that they > are > not updated in a single place (deflate_idle_monitors() ) but in > several > places, potentially by multiple threads. I split up deflation into > prepare_.. and a finish_.. methods to initialize local and update > global > counters respectively, and pass around a counters object (allocated on > stack) to the various code paths that use it. 
Updating the counters > always happen under a lock, there's no need to do anything special > with > regards to concurrency. > > I also checked the nmethod marking, but there doesn't seem to be > anything in that code that looks problematic under concurrency. The > worst that can happen is that two threads write the same value into an > nmethod field. I think we can live with that ;-) > > Good to go? > > Tested by running specjvm and jcstress fastdebug+release without > issues. > > Roman > > Am 02.06.2017 um 12:39 schrieb Robbin Ehn: > > Hi Roman, > > On 06/02/2017 11:41 AM, Roman Kennke wrote: > > Hi David, > thanks for reviewing. I'll be on vacation the next two weeks too, > with > only sporadic access to work stuff. > Yes, exposure will not be as good as otherwise, but it's not totally > untested either: the serial code path is the same as the > parallel, the > only difference is that it's not actually called by multiple > threads. > It's ok I think. > > I found two more issues that I think should be addressed: > - There are some counters in deflate_idle_monitors() and I'm not > sure I > correctly handle them in the split-up and MT'ed thread-local/ global > list deflation > - nmethod marking seems to unconditionally poke true or something > like > that in nmethod fields. This doesn't hurt correctness-wise, but it's > probably worth checking if it's already true, especially when doing > this > with multiple threads concurrently. > > I'll send an updated patch around later, I hope I can get to it > today... > > > I'll review that when you get it out. > I think this looks as a reasonable step before we tackle this with a > major effort, such as the JEP you and Carsten doing. > And another effort to 'fix' nmethods marking. > > Internal discussion yesterday lead us to conclude that the runtime > will probably need more threads. > This would be a good driver to do a 'global' worker pool which serves > both gc, runtime and safepoints with threads. > > > Roman > > Hi Roman, > > I am about to disappear on an extended vacation so will let others > pursue this. IIUC this is longer an opt-in by the user at runtime, > but > an opt-in by the particular GC developers. Okay. My only concern > with > that is if Shenandoah is the only GC that currently opts in then > this > code is not going to get much testing and will be more prone to > incidental breakage. > > > As I mentioned before, it seem like Erik ? have some idea, maybe he > can do this after his barrier patch. > > Thanks! > > /Robbin > > > Cheers, > David > > On 2/06/2017 2:21 AM, Roman Kennke wrote: > > Am 01.06.2017 um 17:50 schrieb Roman Kennke: > > Am 01.06.2017 um 14:18 schrieb Robbin Ehn: > > Hi Roman, > > On 06/01/2017 11:29 AM, Roman Kennke wrote: > > Am 31.05.2017 um 22:06 schrieb Robbin Ehn: > > Hi Roman, I agree that is really needed but: > > On 05/31/2017 10:27 AM, Roman Kennke wrote: > > I realized that sharing workers with GC is not so easy. > > We need to be able to use the workers at a safepoint during > concurrent > GC work (which also uses the same workers). This does not > only > require > that those workers be suspended, like e.g. > SuspendibleThreadSet::yield(), but they need to be idle, i.e. > have > finished their tasks. This needs some careful handling to > work > without > races: it requires a SuspendibleThreadSetJoiner around the > corresponding > run_task() call and also the tasks themselves need to join > the > STS and > handle requests for safepoints not by yielding, but by > leaving > the > task. 
> This is far too peculiar for me to make the call to hook > up GC > workers > for safepoint cleanup, and I thus removed those parts. I > left the > API in > CollectedHeap in place. I think GC devs who know better > about G1 > and CMS > should make that call, or else just use a separate thread > pool. > > http://cr.openjdk.java.net/~rkennke/8180932/webrev.05/ > > > Is it ok now? > > I still think you should put the "Parallel Safepoint Cleanup" > workers > inside Shenandoah, > so the SafepointSynchronizer only calls get_safepoint_workers, > e.g.: > > _cleanup_workers = heap->get_safepoint_workers(); > _num_cleanup_workers = _cleanup_workers != NULL ? > _cleanup_workers->total_workers() : 1; > ParallelSPCleanupTask cleanup(_cleanup_subtasks); > StrongRootsScope srs(_num_cleanup_workers); > if (_cleanup_workers != NULL) { > _cleanup_workers->run_task(&cleanup, > _num_cleanup_workers); > } else { > cleanup.work (0); > } > > That way you don't even need your new flags, but it will be > up to > the > other GCs to make their worker available > or cheat with a separate workgang. > > I can do that, I don't mind. The question is, do we want that? > > The problem is that we do not want to haste such decision, we > believe > there is a better solution. > I think you also would want another solution. > But it's seems like such solution with 1 'global' thread pool > either > own by GC or the VM it self is quite the undertaking. > Since this probably will not be done any time soon my > suggestion is, > to not hold you back (we also want this), just to make > the code parallel and as an intermediate step ask the GC if it > minds > sharing it's thread. > > Now when Shenandoah is merged it's possible that e.g. G1 will > share > the code for a separate thread pool, do something of it's own or > wait until the bigger question about thread pool(s) have been > resolved. > > By adding a thread pool directly to the SafepointSynchronizer > and > flags for it we might limit our future options. > > I wouldn't call it 'cheating with a separate workgang' > though. I > see > that both G1 and CMS suspend their worker threads at a > safepoint. > However: > > Yes it's not cheating but I want decent heuristics between e.g. > number > of concurrent marking threads and parallel safepoint threads > since > they compete for cpu time. > As the code looks now, I think that decisions must be made by > the > GC. > > Ok, I see your point. I updated the proposed patch accordingly: > > http://cr.openjdk.java.net/~rkennke/8180932/webrev.06/ > > > Oops. Minor mistake there. Correction: > http://cr.openjdk.java.net/~rkennke/8180932/webrev.07/ > > > (Removed 'class WorkGang' from safepoint.hpp, and forgot to add it > into > collectedHeap.hpp, resulting in build failure...) > > Roman > > > > > > > > -- > Diese Nachricht wurde von meinem Android-Ger?t mit K-9 Mail gesendet. 
From rkennke at redhat.com Thu Jun 29 18:25:58 2017
From: rkennke at redhat.com (Roman Kennke)
Date: Thu, 29 Jun 2017 20:25:58 +0200
Subject: RFR: Parallelize safepoint cleanup
In-Reply-To: <9a882506-282a-ec74-27de-5b22e258e352@oracle.com>
References: <5c80f8df-27c9-f9a9-dc6d-47f9c6019a61@redhat.com> <46ad874e-eb41-7927-265a-40dea92dfe1e@oracle.com> <5711258b-99b0-e06f-ba6e-0b6b55d88345@redhat.com> <0e1e2779-9316-b756-6cc8-e0c8add14a94@oracle.com> <1910961c-11bd-0e86-dd03-4fce66b9969f@redhat.com> <2b466176-b688-53a8-bef9-c7ec2c8c745b@oracle.com> <42872a15-d26c-9798-c6a2-f3f7c945baf7@redhat.com> <5e7c7d00-4acd-bea3-3525-33dbd9159efb@oracle.com> <6f2c6de7-298b-bf14-ab1f-430c4acd43c9@redhat.com> <5cd676de-872d-6d4a-691b-da561173f7d0@oracle.com> <61d80e98-275f-b2b8-4ac7-6d5d03b047de@redhat.com> <676d3b56-cee0-b68a-d700-e43695355148@redhat.com> <1fbd2b4a-9aef-d6db-726e-929b6b466e4c@oracle.com> <08391C19-4675-475C-A30D-F10B364B5AF3@redhat.com> <9a882506-282a-ec74-27de-5b22e258e352@oracle.com>
Message-ID: <47667919-0786-56a0-ebf9-d7c1b48766c2@redhat.com>

I just did a run with gcbench. I am running:

build/linux-x86_64-normal-server-release/images/jdk/bin/java -jar target/benchmarks.jar roots.Sync --jvmArgs "-Xmx8g -Xms8g -XX:ParallelSafepointCleanupThreads=1 -XX:-UseBiasedLocking --add-opens java.base/jdk.internal.misc=ALL-UNNAMED -XX:+PrintSafepointStatistics" -p size=500000 -wi 5 -i 5 -f 1

i.e. I am giving it 500,000 monitors per thread on 8 Java threads.

With the VMThread I am getting:

vmop [ threads: total initially_running wait_to_block ][ time: spin block sync cleanup vmop ] page_trap_count
0,646: G1IncCollectionPause [ 19 4 6 ][ 0 0 0 158 225 ] 4
1,073: G1IncCollectionPause [ 19 5 6 ][ 1 0 1 159 174 ] 5
1,961: G1IncCollectionPause [ 19 2 6 ][ 0 0 0 130 66 ] 2
2,202: G1IncCollectionPause [ 19 5 6 ][ 1 0 1 127 70 ] 5
2,445: G1IncCollectionPause [ 19 7 7 ][ 1 0 1 127 66 ] 7
2,684: G1IncCollectionPause [ 19 7 7 ][ 1 0 1 127 66 ] 7
3,371: G1IncCollectionPause [ 19 5 7 ][ 1 0 1 127 74 ] 5
3,619: G1IncCollectionPause [ 19 5 6 ][ 1 0 1 127 66 ] 5
3,857: G1IncCollectionPause [ 19 6 6 ][ 1 0 1 126 68 ] 6

I.e. it gets to fairly consistent >120us for cleanup.

With 4 safepoint cleanup threads I get:

vmop [ threads: total initially_running wait_to_block ][ time: spin block sync cleanup vmop ] page_trap_count
0,650: G1IncCollectionPause [ 19 4 6 ][ 0 0 0 63 197 ] 4
0,951: G1IncCollectionPause [ 19 0 1 ][ 0 0 0 64 151 ] 0
1,214: G1IncCollectionPause [ 19 7 8 ][ 0 0 0 62 93 ] 6
1,942: G1IncCollectionPause [ 19 4 6 ][ 1 0 1 59 71 ] 4
2,118: G1IncCollectionPause [ 19 6 6 ][ 1 0 1 59 72 ] 6
2,296: G1IncCollectionPause [ 19 5 6 ][ 0 0 0 59 69 ] 5

i.e. fairly consistently around 60 us (I think it's us?!)

I grant you that I'm throwing way way more monitors at it. With just 12000 monitors per thread I get columns of 0s under cleanup. :-)
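Just to recap why the extra cleanup threads help so much in this setup: the work is split per Java thread - each worker claims one Java thread at a time and deflates that thread's in-use monitors - so with 8 Java threads at 500,000 monitors each there are 8 sizeable, independent chunks of work. Roughly this shape (a simplified sketch, not the literal webrev code; try_claim_for_deflation() stands in for whatever the claiming mechanism ends up being, e.g. a CAS on a per-thread flag):

   // Each worker scans the thread list and tries to claim one JavaThread
   // at a time; the winner of a claim deflates that thread's in-use
   // monitor list. The claiming helper here is hypothetical.
   void work(uint worker_id) {
     for (JavaThread* t = Threads::first(); t != NULL; t = t->next()) {
       if (t->try_claim_for_deflation()) {
         ObjectSynchronizer::deflate_thread_local_monitors(t, _counters);
       }
     }
   }

Roman

Am 29.06.2017 um 14:17 schrieb Robbin Ehn:
> The test is using 24 threads (whatever that means), total number of
> Java threads is 57 (including compiler, etc...).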
> > [29.186s][error][os ] Num threads:57 > [29.186s][error][os ] omInUseCount:0 > [29.186s][error][os ] omInUseCount:2064 > [29.187s][error][os ] omInUseCount:1861 > [29.188s][error][os ] omInUseCount:1058 > [29.188s][error][os ] omInUseCount:2 > [29.188s][error][os ] omInUseCount:577 > [29.189s][error][os ] omInUseCount:1443 > [29.189s][error][os ] omInUseCount:122 > [29.189s][error][os ] omInUseCount:47 > [29.189s][error][os ] omInUseCount:497 > [29.189s][error][os ] omInUseCount:16 > [29.189s][error][os ] omInUseCount:113 > [29.189s][error][os ] omInUseCount:5 > [29.189s][error][os ] omInUseCount:678 > [29.190s][error][os ] omInUseCount:105 > [29.190s][error][os ] omInUseCount:609 > [29.190s][error][os ] omInUseCount:286 > [29.190s][error][os ] omInUseCount:228 > [29.190s][error][os ] omInUseCount:1391 > [29.191s][error][os ] omInUseCount:1652 > [29.191s][error][os ] omInUseCount:325 > [29.191s][error][os ] omInUseCount:439 > [29.192s][error][os ] omInUseCount:994 > [29.192s][error][os ] omInUseCount:103 > [29.192s][error][os ] omInUseCount:2337 > [29.193s][error][os ] omInUseCount:0 > [29.193s][error][os ] omInUseCount:0 > [29.193s][error][os ] omInUseCount:0 > [29.193s][error][os ] omInUseCount:1 > [29.193s][error][os ] omInUseCount:1 > [29.193s][error][os ] omInUseCount:0 > [29.193s][error][os ] omInUseCount:1 > [29.193s][error][os ] omInUseCount:2 > [29.193s][error][os ] omInUseCount:1 > [29.193s][error][os ] omInUseCount:1 > [29.193s][error][os ] omInUseCount:0 > [29.193s][error][os ] omInUseCount:1 > [29.193s][error][os ] omInUseCount:0 > [29.193s][error][os ] omInUseCount:0 > [29.193s][error][os ] omInUseCount:0 > [29.193s][error][os ] omInUseCount:0 > [29.193s][error][os ] omInUseCount:0 > [29.193s][error][os ] omInUseCount:0 > [29.193s][error][os ] omInUseCount:0 > [29.193s][error][os ] omInUseCount:0 > [29.193s][error][os ] omInUseCount:0 > [29.193s][error][os ] omInUseCount:0 > [29.193s][error][os ] omInUseCount:0 > [29.193s][error][os ] omInUseCount:0 > [29.193s][error][os ] omInUseCount:0 > [29.193s][error][os ] omInUseCount:0 > [29.193s][error][os ] omInUseCount:0 > [29.193s][error][os ] omInUseCount:0 > [29.193s][error][os ] omInUseCount:0 > [29.193s][error][os ] omInUseCount:1 > [29.193s][error][os ] omInUseCount:0 > [29.193s][error][os ] omInUseCount:0 > > So in my setup even if you parallel the per thread in use monitors > work the synchronization overhead is still larger. > > /Robbin > > On 06/29/2017 01:42 PM, Roman Kennke wrote: >> How many Java threads are involved in monitor Inflation ? >> Parallelization is spread by Java threads (i.e. each worker claims >> and deflates monitors of 1 java thread per step). >> >> Roman >> >> Am 29. Juni 2017 12:49:58 MESZ schrieb Robbin Ehn >> : >> >> Hi Roman, >> >> I haven't had the time to test all scenarios, and the numbers are >> just an indication: >> >> Do it VM thread, MonitorUsedDeflationThreshold=0, 0.002782s avg, >> avg of 10 worsed cleanups 0.0173s >> Do it 4 workers, MonitorUsedDeflationThreshold=0, 0.002923s avg, >> avg of 10 worsed cleanups 0.0199s >> Do it VM thread, MonitorUsedDeflationThreshold=1, 0.001889s avg, >> avg of 10 worsed cleanups 0.0066s >> >> When MonitorUsedDeflationThreshold=0 we are talking about 120000 >> free monitors to deflate. >> And I get worse numbers doing the cleanup in 4 threads. >> >> Any idea why I see these numbers? >> >> Thanks, Robbin >> >> On 06/28/2017 10:23 PM, Roman Kennke wrote: >> >> >> >> On 06/27/2017 09:47 PM, Roman Kennke wrote: >> >> Hi Robbin, >> >> Ugh. 
Thanks for catching this. >> Problem was that I was accounting the thread-local >> deflations twice: >> once in thread-local processing (basically a leftover >> from my earlier >> attempt to implement this accounting) and then again in >> finish_deflate_idle_monitors(). Should be fixed here: >> >> http://cr.openjdk.java.net/~rkennke/8180932/webrev.09/ >> >> >> >> >> Nit: >> safepoint.cpp : ParallelSPCleanupTask >> "const char* name = " is not needed and 1 is unused >> >> >> Sorry, I don't understand what you mean by this. I see code >> like this: >> >> const char* name = "deflating idle monitors"; >> >> and it is used a few lines below, even 2x. >> >> What's '1 is unused' ? >> >> >> Side question: which jtreg targets do you usually run? >> >> >> Right now I cherry pick directories from: hotspot/test/ >> >> I'm going to add a decent test group for local testing. >> >> That would be good! >> >> >> >> >> Trying: make test TEST=hotspot_all >> gives me *lots* of failures due to missing jcstress >> stuff (?!) >> And even other subsets seem to depend on several bits >> and pieces >> that I >> have no idea about. >> >> >> Yes, you need to use internal tool 'jib' java integrate >> build to get >> that work or you can set some environment where the jcstress >> application stuff is... >> >> Uhhh. We really do want a subset of tests that we can run >> reliably and >> that are self-contained, how else are people (without that >> jib thingy) >> supposed to do some sanity checking with their patches? ;-) >> >> I have a regression on ClassLoaderData root scanning, >> this should not >> be related, >> but I only have 3 patches which could cause this, if it's >> not >> something in the environment that have changed. >> >> Let me know if it's my patch :-) >> >> >> Also do not see any immediate performance gains (off vs 4 >> threads), it >> might be >> http://hg.openjdk.java.net/jdk10/hs/hotspot/rev/06994badeb24 >> , but I need to-do some more testing. I know you often >> run with none >> default GSI. >> >> >> First of all, during the course of this review I reduced the >> change from >> an actual implementation to a kind of framework, and it needs >> some >> separate changes in the GC to make use of it. Not sure if you >> added >> corresponding code in (e.g.) G1? >> >> Also, this is only really visible in code that makes >> excessive use of >> monitors, i.e. the one linked by Carsten's original patch, or >> the test >> org.openjdk.gcbench.roots.Synchronizers.test in gc-bench: >> >> http://icedtea.classpath.org/hg/gc-bench/ >> >> There are also some popular real-world apps that tend to do >> this. From >> the top off my head, Cassandra is such an application. >> >> Thanks, Roman >> >> >> I'll get back to you. >> >> Thanks, Robbin >> >> >> Roman >> >> Am 27.06.2017 um 16:51 schrieb Robbin Ehn: >> >> Hi Roman, >> >> There is something wrong in calculations: >> INFO: Deflate: InCirc=43 InUse=18 Scavenged=25 >> ForceMonitorScavenge=0 >> : pop=27051 free=215487 >> >> free is larger than population, have not had the >> time to dig into this. >> >> Thanks, Robbin >> >> On 06/22/2017 10:19 PM, Roman Kennke wrote: >> >> So here's the latest iteration of that patch: >> >> >> http://cr.openjdk.java.net/~rkennke/8180932/webrev.08/ >> >> >> >> I checked and fixed all the counters. The >> problem here is that they >> are >> not updated in a single place >> (deflate_idle_monitors() ) but in >> several >> places, potentially by multiple threads. I >> split up deflation into >> prepare_.. and a finish_.. 
methods to >> initialize local and update >> global >> counters respectively, and pass around a >> counters object (allocated on >> stack) to the various code paths that use it. >> Updating the counters >> always happen under a lock, there's no need >> to do anything special >> with >> regards to concurrency. >> >> I also checked the nmethod marking, but there >> doesn't seem to be >> anything in that code that looks problematic >> under concurrency. The >> worst that can happen is that two threads >> write the same value into an >> nmethod field. I think we can live with that ;-) >> >> Good to go? >> >> Tested by running specjvm and jcstress >> fastdebug+release without >> issues. >> >> Roman >> >> Am 02.06.2017 um 12:39 schrieb Robbin Ehn: >> >> Hi Roman, >> >> On 06/02/2017 11:41 AM, Roman Kennke wrote: >> >> Hi David, >> thanks for reviewing. I'll be on >> vacation the next two weeks too, >> with >> only sporadic access to work stuff. >> Yes, exposure will not be as good as >> otherwise, but it's not totally >> untested either: the serial code path >> is the same as the >> parallel, the >> only difference is that it's not >> actually called by multiple >> threads. >> It's ok I think. >> >> I found two more issues that I think >> should be addressed: >> - There are some counters in >> deflate_idle_monitors() and I'm not >> sure I >> correctly handle them in the split-up >> and MT'ed thread-local/ global >> list deflation >> - nmethod marking seems to >> unconditionally poke true or something >> like >> that in nmethod fields. This doesn't >> hurt correctness-wise, but it's >> probably worth checking if it's >> already true, especially when doing >> this >> with multiple threads concurrently. >> >> I'll send an updated patch around >> later, I hope I can get to it >> today... >> >> >> I'll review that when you get it out. >> I think this looks as a reasonable step >> before we tackle this with a >> major effort, such as the JEP you and >> Carsten doing. >> And another effort to 'fix' nmethods >> marking. >> >> Internal discussion yesterday lead us to >> conclude that the runtime >> will probably need more threads. >> This would be a good driver to do a >> 'global' worker pool which serves >> both gc, runtime and safepoints with >> threads. >> >> >> Roman >> >> Hi Roman, >> >> I am about to disappear on an >> extended vacation so will let others >> pursue this. IIUC this is longer >> an opt-in by the user at runtime, >> but >> an opt-in by the particular GC >> developers. Okay. My only concern >> with >> that is if Shenandoah is the only >> GC that currently opts in then >> this >> code is not going to get much >> testing and will be more prone to >> incidental breakage. >> >> >> As I mentioned before, it seem like Erik >> ? have some idea, maybe he >> can do this after his barrier patch. >> >> Thanks! >> >> /Robbin >> >> >> Cheers, >> David >> >> On 2/06/2017 2:21 AM, Roman >> Kennke wrote: >> >> Am 01.06.2017 um 17:50 >> schrieb Roman Kennke: >> >> Am 01.06.2017 um 14:18 >> schrieb Robbin Ehn: >> >> Hi Roman, >> >> On 06/01/2017 11:29 >> AM, Roman Kennke wrote: >> >> Am 31.05.2017 um >> 22:06 schrieb Robbin Ehn: >> >> Hi Roman, I >> agree that is really needed but: >> >> On 05/31/2017 >> 10:27 AM, Roman Kennke wrote: >> >> I >> realized that sharing workers with GC is not so easy. >> >> We need >> to be able to use the workers at a safepoint during >> concurrent >> GC work >> (which also uses the same workers). 
This does not >> only >> require >> that >> those workers be suspended, like e.g. >> >> SuspendibleThreadSet::yield(), but they need to be idle, i.e. >> have >> finished >> their tasks. This needs some careful handling to >> work >> without >> races: it >> requires a SuspendibleThreadSetJoiner around the >> >> corresponding >> >> run_task() call and also the tasks themselves need to join >> the >> STS and >> handle >> requests for safepoints not by yielding, but by >> leaving >> the >> task. >> This is >> far too peculiar for me to make the call to hook >> up GC >> workers >> for >> safepoint cleanup, and I thus removed those parts. I >> left the >> API in >> >> CollectedHeap in place. I think GC devs who know better >> about G1 >> and CMS >> should >> make that call, or else just use a separate thread >> pool. >> >> >> http://cr.openjdk.java.net/~rkennke/8180932/webrev.05/ >> >> >> >> Is it ok >> now? >> >> I still think >> you should put the "Parallel Safepoint Cleanup" >> workers >> inside >> Shenandoah, >> so the >> SafepointSynchronizer only calls get_safepoint_workers, >> e.g.: >> >> >> _cleanup_workers = heap->get_safepoint_workers(); >> >> _num_cleanup_workers = _cleanup_workers != NULL ? >> >> _cleanup_workers->total_workers() : 1; >> >> ParallelSPCleanupTask cleanup(_cleanup_subtasks); >> >> StrongRootsScope srs(_num_cleanup_workers); >> if >> (_cleanup_workers != NULL) { >> >> _cleanup_workers->run_task(&cleanup, >> >> _num_cleanup_workers); >> } else { >> cleanup.work >> (0); >> } >> >> That way you >> don't even need your new flags, but it will be >> up to >> the >> other GCs to >> make their worker available >> or cheat with >> a separate workgang. >> >> I can do that, I >> don't mind. The question is, do we want that? >> >> The problem is that >> we do not want to haste such decision, we >> believe >> there is a better >> solution. >> I think you also >> would want another solution. >> But it's seems like >> such solution with 1 'global' thread pool >> either >> own by GC or the VM >> it self is quite the undertaking. >> Since this probably >> will not be done any time soon my >> suggestion is, >> to not hold you back >> (we also want this), just to make >> the code parallel and >> as an intermediate step ask the GC if it >> minds >> sharing it's thread. >> >> Now when Shenandoah >> is merged it's possible that e.g. G1 will >> share >> the code for a >> separate thread pool, do something of it's own or >> wait until the bigger >> question about thread pool(s) have been >> resolved. >> >> By adding a thread >> pool directly to the SafepointSynchronizer >> and >> flags for it we might >> limit our future options. >> >> I wouldn't call >> it 'cheating with a separate workgang' >> though. I >> see >> that both G1 and >> CMS suspend their worker threads at a >> safepoint. >> However: >> >> Yes it's not cheating >> but I want decent heuristics between e.g. >> number >> of concurrent marking >> threads and parallel safepoint threads >> since >> they compete for cpu >> time. >> As the code looks >> now, I think that decisions must be made by >> the >> GC. >> >> Ok, I see your point. I >> updated the proposed patch accordingly: >> >> >> http://cr.openjdk.java.net/~rkennke/8180932/webrev.06/ >> >> >> >> Oops. Minor mistake there. >> Correction: >> >> http://cr.openjdk.java.net/~rkennke/8180932/webrev.07/ >> >> >> >> (Removed 'class WorkGang' >> from safepoint.hpp, and forgot to add it >> into >> collectedHeap.hpp, resulting >> in build failure...) 
>>
>> Roman
>>
>> --
>> Diese Nachricht wurde von meinem Android-Gerät mit K-9 Mail gesendet.

From robbin.ehn at oracle.com Thu Jun 29 19:27:15 2017
From: robbin.ehn at oracle.com (Robbin Ehn)
Date: Thu, 29 Jun 2017 21:27:15 +0200
Subject: RFR: Parallelize safepoint cleanup
In-Reply-To: <47667919-0786-56a0-ebf9-d7c1b48766c2@redhat.com>
References: <5c80f8df-27c9-f9a9-dc6d-47f9c6019a61@redhat.com> <46ad874e-eb41-7927-265a-40dea92dfe1e@oracle.com> <5711258b-99b0-e06f-ba6e-0b6b55d88345@redhat.com> <0e1e2779-9316-b756-6cc8-e0c8add14a94@oracle.com> <1910961c-11bd-0e86-dd03-4fce66b9969f@redhat.com> <2b466176-b688-53a8-bef9-c7ec2c8c745b@oracle.com> <42872a15-d26c-9798-c6a2-f3f7c945baf7@redhat.com> <5e7c7d00-4acd-bea3-3525-33dbd9159efb@oracle.com> <6f2c6de7-298b-bf14-ab1f-430c4acd43c9@redhat.com> <5cd676de-872d-6d4a-691b-da561173f7d0@oracle.com> <61d80e98-275f-b2b8-4ac7-6d5d03b047de@redhat.com> <676d3b56-cee0-b68a-d700-e43695355148@redhat.com> <1fbd2b4a-9aef-d6db-726e-929b6b466e4c@oracle.com> <08391C19-4675-475C-A30D-F10B364B5AF3@redhat.com> <9a882506-282a-ec74-27de-5b22e258e352@oracle.com> <47667919-0786-56a0-ebf9-d7c1b48766c2@redhat.com>
Message-ID: <72d197f7-a99b-84bc-26f7-c9a84da26ccd@oracle.com>

Hi Roman,

Thanks. There seems to be a performance gain vs the old code just running in the VM thread (again shaky numbers, but an indication):

Old code,        MonitorUsedDeflationThreshold=0, 0.003099s, avg of 10 worst cleanups 0.0213s
Do it VM thread, MonitorUsedDeflationThreshold=0, 0.002782s, avg of 10 worst cleanups 0.0173s

I'm assuming that combining deflation and nmethod marking in the same pass is the reason for this.

Great! I'm happy, looks good! Thanks for fixing!

/Robbin

On 06/29/2017 08:25 PM, Roman Kennke wrote:
> I just did a run with gcbench. I am running:
>
> build/linux-x86_64-normal-server-release/images/jdk/bin/java -jar target/benchmarks.jar roots.Sync --jvmArgs "-Xmx8g -Xms8g -XX:ParallelSafepointCleanupThreads=1 -XX:-UseBiasedLocking --add-opens java.base/jdk.internal.misc=ALL-UNNAMED -XX:+PrintSafepointStatistics" -p size=500000 -wi 5 -i 5 -f 1
>
> i.e. I am giving it 500,000 monitors per thread on 8 Java threads.
>
> With the VMThread I am getting:
>
> vmop [ threads: total initially_running wait_to_block ][ time: spin block sync cleanup vmop ] page_trap_count
> 0,646: G1IncCollectionPause [ 19 4 6 ][ 0 0 0 158 225 ] 4
> 1,073: G1IncCollectionPause [ 19 5 6 ][ 1 0 1 159 174 ] 5
> 1,961: G1IncCollectionPause [ 19 2 6 ][ 0 0 0 130 66 ] 2
> 2,202: G1IncCollectionPause [ 19 5 6 ][ 1 0 1 127 70 ] 5
> 2,445: G1IncCollectionPause [ 19 7 7 ][ 1 0 1 127 66 ] 7
> 2,684: G1IncCollectionPause [ 19 7 7 ][ 1 0 1 127 66 ] 7
> 3,371: G1IncCollectionPause [ 19 5 7 ][ 1 0 1 127 74 ] 5
> 3,619: G1IncCollectionPause [ 19 5 6 ][ 1 0 1 127 66 ] 5
> 3,857: G1IncCollectionPause [ 19 6 6 ][ 1 0 1 126 68 ] 6
>
> I.e. it gets to fairly consistent >120us for cleanup.
>
> With 4 safepoint cleanup threads I get:
>
> vmop [ threads: total initially_running wait_to_block ][ time: spin block sync cleanup vmop ] page_trap_count
> 0,650: G1IncCollectionPause [ 19 4 6 ][ 0 0 0 63 197 ] 4
> 0,951: G1IncCollectionPause [ 19 0 1 ][ 0 0 0 64 151 ] 0
> 1,214: G1IncCollectionPause [ 19 7 8 ][ 0 0 0 62 93 ] 6
> 1,942: G1IncCollectionPause [ 19 4 6 ][ 1 0 1 59 71 ] 4
> 2,118: G1IncCollectionPause [ 19 6 6 ][ 1 0 1 59 72 ] 6
> 2,296: G1IncCollectionPause [ 19 5 6 ][ 0 0 0 59 69 ] 5
>
> i.e. fairly consistently around 60 us (I think it's us?!)
> I grant you that I'm throwing way way more monitors at it. With just
> 12000 monitors per thread I get columns of 0s under cleanup. :-)
>
> Roman
>
> Here's with 1 t

On 29.06.2017 at 14:17, Robbin Ehn wrote:
>> The test is using 24 threads (whatever that means), the total number of
>> java threads is 57 (including compiler, etc...).
>>
>> [29.186s][error][os ] Num threads:57
>> [29.186s][error][os ] omInUseCount:0
>> [29.186s][error][os ] omInUseCount:2064
>> [29.187s][error][os ] omInUseCount:1861
>> [29.188s][error][os ] omInUseCount:1058
>> [29.188s][error][os ] omInUseCount:2
>> [29.188s][error][os ] omInUseCount:577
>> [29.189s][error][os ] omInUseCount:1443
>> [29.189s][error][os ] omInUseCount:122
>> [29.189s][error][os ] omInUseCount:47
>> [29.189s][error][os ] omInUseCount:497
>> [29.189s][error][os ] omInUseCount:16
>> [29.189s][error][os ] omInUseCount:113
>> [29.189s][error][os ] omInUseCount:5
>> [29.189s][error][os ] omInUseCount:678
>> [29.190s][error][os ] omInUseCount:105
>> [29.190s][error][os ] omInUseCount:609
>> [29.190s][error][os ] omInUseCount:286
>> [29.190s][error][os ] omInUseCount:228
>> [29.190s][error][os ] omInUseCount:1391
>> [29.191s][error][os ] omInUseCount:1652
>> [29.191s][error][os ] omInUseCount:325
>> [29.191s][error][os ] omInUseCount:439
>> [29.192s][error][os ] omInUseCount:994
>> [29.192s][error][os ] omInUseCount:103
>> [29.192s][error][os ] omInUseCount:2337
>> [29.193s][error][os ] omInUseCount:0
>> [29.193s][error][os ] omInUseCount:0
>> [29.193s][error][os ] omInUseCount:0
>> [29.193s][error][os ] omInUseCount:1
>> [29.193s][error][os ] omInUseCount:1
>> [29.193s][error][os ] omInUseCount:0
>> [29.193s][error][os ] omInUseCount:1
>> [29.193s][error][os ] omInUseCount:2
>> [29.193s][error][os ] omInUseCount:1
>> [29.193s][error][os ] omInUseCount:1
>> [29.193s][error][os ] omInUseCount:0
>> [29.193s][error][os ] omInUseCount:1
>> [29.193s][error][os ] omInUseCount:0
>> [29.193s][error][os ] omInUseCount:0
>> [29.193s][error][os ] omInUseCount:0
>> [29.193s][error][os ] omInUseCount:0
>> [29.193s][error][os ] omInUseCount:0
>> [29.193s][error][os ] omInUseCount:0
>> [29.193s][error][os ] omInUseCount:0
>> [29.193s][error][os ] omInUseCount:0
>> [29.193s][error][os ] omInUseCount:0
>> [29.193s][error][os ] omInUseCount:0
>> [29.193s][error][os ] omInUseCount:0
>> [29.193s][error][os ] omInUseCount:0
>> [29.193s][error][os ] omInUseCount:0
>> [29.193s][error][os ] omInUseCount:0
>> [29.193s][error][os ] omInUseCount:0
>> [29.193s][error][os ] omInUseCount:0
>> [29.193s][error][os ] omInUseCount:1
>> [29.193s][error][os ] omInUseCount:0
>> [29.193s][error][os ] omInUseCount:0
>>
>> So in my setup, even if you parallelize the per-thread in-use monitor
>> work, the synchronization overhead is still larger.
>>
>> /Robbin
>>
>> On 06/29/2017 01:42 PM, Roman Kennke wrote:
>>> How many Java threads are involved in monitor inflation?
>>> Parallelization is spread by Java threads (i.e. each worker claims
>>> and deflates the monitors of 1 java thread per step).
>>>
>>> Roman
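To make that claiming scheme concrete, here is a rough sketch: one worker
atomically claims one Java thread at a time and deflates that thread's
in-use monitors. The task class, the thread snapshot, and the exact entry
point are assumptions for illustration, not the actual webrev code:

class DeflateThreadLocalMonitorsTask : public AbstractGangTask { // hypothetical name
  GrowableArray<JavaThread*>* _threads;  // snapshot of all Java threads
  DeflateMonitorCounters*     _counters; // shared counters, updated under a lock
  volatile jint               _claim;    // index of the next unclaimed thread
public:
  DeflateThreadLocalMonitorsTask(GrowableArray<JavaThread*>* threads,
                                 DeflateMonitorCounters* counters) :
    AbstractGangTask("Deflate thread-local monitors"),
    _threads(threads), _counters(counters), _claim(0) { }

  virtual void work(uint worker_id) {
    for (;;) {
      jint i = Atomic::add(1, &_claim) - 1;  // claim one Java thread per step
      if (i >= _threads->length()) return;   // all threads claimed; worker is done
      // deflate the idle monitors on this thread's in-use list (omInUseList)
      ObjectSynchronizer::deflate_thread_local_monitors(_threads->at(i), _counters);
    }
  }
};

Since each thread's omInUseList is claimed by exactly one worker, the
per-list walk itself needs no synchronization; only the shared counters do.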
>>>
>>> On 29 June 2017 12:49:58 CEST, Robbin Ehn wrote:
>>>
>>> Hi Roman,
>>>
>>> I haven't had the time to test all scenarios, and the numbers are
>>> just an indication:
>>>
>>> Do it in the VM thread, MonitorUsedDeflationThreshold=0, 0.002782s avg,
>>> avg of 10 worst cleanups 0.0173s
>>> Do it in 4 workers, MonitorUsedDeflationThreshold=0, 0.002923s avg,
>>> avg of 10 worst cleanups 0.0199s
>>> Do it in the VM thread, MonitorUsedDeflationThreshold=1, 0.001889s avg,
>>> avg of 10 worst cleanups 0.0066s
>>>
>>> When MonitorUsedDeflationThreshold=0 we are talking about 120000
>>> free monitors to deflate.
>>> And I get worse numbers doing the cleanup in 4 threads.
>>>
>>> Any idea why I see these numbers?
>>>
>>> Thanks, Robbin
>>>
>>> On 06/28/2017 10:23 PM, Roman Kennke wrote:
>>>
>>> On 06/27/2017 09:47 PM, Roman Kennke wrote:
>>>
>>> Hi Robbin,
>>>
>>> Ugh. Thanks for catching this.
>>> Problem was that I was accounting the thread-local deflations twice:
>>> once in thread-local processing (basically a leftover from my earlier
>>> attempt to implement this accounting) and then again in
>>> finish_deflate_idle_monitors(). Should be fixed here:
>>>
>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.09/
>>>
>>> Nit:
>>> safepoint.cpp : ParallelSPCleanupTask
>>> "const char* name = " is not needed and 1 is unused
>>>
>>> Sorry, I don't understand what you mean by this. I see code like this:
>>>
>>> const char* name = "deflating idle monitors";
>>>
>>> and it is used a few lines below, even 2x.
>>>
>>> What's '1 is unused'?
>>>
>>> Side question: which jtreg targets do you usually run?
>>>
>>> Right now I cherry-pick directories from: hotspot/test/
>>>
>>> I'm going to add a decent test group for local testing.
>>>
>>> That would be good!
>>>
>>> Trying: make test TEST=hotspot_all
>>> gives me *lots* of failures due to missing jcstress stuff (?!)
>>> And even other subsets seem to depend on several bits and pieces that
>>> I have no idea about.
>>>
>>> Yes, you need to use the internal tool 'jib' (java integrate build) to
>>> get that to work, or you can set up some environment where the
>>> jcstress application stuff is...
>>>
>>> Uhhh. We really do want a subset of tests that we can run reliably and
>>> that are self-contained, how else are people (without that jib thingy)
>>> supposed to do some sanity checking with their patches? ;-)
>>>
>>> I have a regression on ClassLoaderData root scanning; this should not
>>> be related, but I only have 3 patches which could cause this, if it's
>>> not something in the environment that has changed.
>>>
>>> Let me know if it's my patch :-)
>>>
>>> Also, I do not see any immediate performance gains (off vs 4 threads);
>>> it might be
>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/rev/06994badeb24
>>> , but I need to do some more testing. I know you often run with
>>> non-default GSI.
>>>
>>> First of all, during the course of this review I reduced the change
>>> from an actual implementation to a kind of framework, and it needs
>>> some separate changes in the GC to make use of it. Not sure if you
>>> added corresponding code in (e.g.) G1?
>>>
>>> Also, this is only really visible in code that makes excessive use of
>>> monitors, i.e.
>>> the one linked by Carsten's original patch, or the test
>>> org.openjdk.gcbench.roots.Synchronizers.test in gc-bench:
>>>
>>> http://icedtea.classpath.org/hg/gc-bench/
>>>
>>> There are also some popular real-world apps that tend to do this. From
>>> the top of my head, Cassandra is such an application.
>>>
>>> Thanks, Roman
>>>
>>> I'll get back to you.
>>>
>>> Thanks, Robbin
>>>
>>> Roman
>>>
>>> On 27.06.2017 at 16:51, Robbin Ehn wrote:
>>>
>>> Hi Roman,
>>>
>>> There is something wrong in the calculations:
>>> INFO: Deflate: InCirc=43 InUse=18 Scavenged=25 ForceMonitorScavenge=0
>>> : pop=27051 free=215487
>>>
>>> free is larger than population, have not had the time to dig into this.
>>>
>>> Thanks, Robbin
>>>
>>> On 06/22/2017 10:19 PM, Roman Kennke wrote:
>>>
>>> So here's the latest iteration of that patch:
>>>
>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.08/
>>>
>>> I checked and fixed all the counters. The problem here is that they
>>> are not updated in a single place (deflate_idle_monitors()) but in
>>> several places, potentially by multiple threads. I split up deflation
>>> into prepare_.. and finish_.. methods to initialize local and update
>>> global counters respectively, and pass around a counters object
>>> (allocated on the stack) to the various code paths that use it.
>>> Updating the counters always happens under a lock, so there's no need
>>> to do anything special with regards to concurrency.
>>>
>>> I also checked the nmethod marking, but there doesn't seem to be
>>> anything in that code that looks problematic under concurrency. The
>>> worst that can happen is that two threads write the same value into an
>>> nmethod field. I think we can live with that ;-)
>>>
>>> Good to go?
>>>
>>> Tested by running specjvm and jcstress, fastdebug+release, without
>>> issues.
>>>
>>> Roman
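In outline, the stack-allocated counters object and the prepare_/finish_
split described above could look like this. This is a sketch in the spirit
of the webrev, not its exact code; the gMonitor* globals stand in for the
synchronizer's global monitor statistics:

struct DeflateMonitorCounters {
  int nInuse;          // monitors currently associated with objects
  int nInCirculation;  // extant monitors
  int nScavenged;      // monitors reclaimed during this cleanup
};

void ObjectSynchronizer::prepare_deflate_idle_monitors(DeflateMonitorCounters* counters) {
  // the caller allocates the counters on its stack; local counting starts at zero
  counters->nInuse = 0;
  counters->nInCirculation = 0;
  counters->nScavenged = 0;
}

void ObjectSynchronizer::finish_deflate_idle_monitors(DeflateMonitorCounters* counters) {
  // fold the locally accumulated numbers into the global statistics
  // exactly once, after all (possibly parallel) deflation paths have run
  gMonitorFreeCount += counters->nScavenged;
  gMonitorPopulation = counters->nInCirculation;
}

Each deflation path then updates the same counters object under the list
lock, so a single global update at the end suffices.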
>>> On 02.06.2017 at 12:39, Robbin Ehn wrote:
>>>
>>> Hi Roman,
>>>
>>> On 06/02/2017 11:41 AM, Roman Kennke wrote:
>>>
>>> Hi David,
>>> thanks for reviewing. I'll be on vacation the next two weeks too, with
>>> only sporadic access to work stuff.
>>> Yes, exposure will not be as good as otherwise, but it's not totally
>>> untested either: the serial code path is the same as the parallel one,
>>> the only difference is that it's not actually called by multiple
>>> threads. It's ok I think.
>>>
>>> I found two more issues that I think should be addressed:
>>> - There are some counters in deflate_idle_monitors() and I'm not sure
>>> I correctly handle them in the split-up and MT'ed thread-local/global
>>> list deflation
>>> - nmethod marking seems to unconditionally poke true or something like
>>> that into nmethod fields. This doesn't hurt correctness-wise, but it's
>>> probably worth checking if it's already true, especially when doing
>>> this with multiple threads concurrently.
>>>
>>> I'll send an updated patch around later, I hope I can get to it
>>> today...
>>>
>>> I'll review that when you get it out.
>>> I think this looks like a reasonable step before we tackle this with a
>>> major effort, such as the JEP you and Carsten are doing.
>>> And another effort to 'fix' nmethod marking.
>>>
>>> Internal discussion yesterday led us to conclude that the runtime will
>>> probably need more threads. This would be a good driver to do a
>>> 'global' worker pool which serves gc, runtime and safepoints alike
>>> with threads.
>>>
>>> Roman
>>>
>>> Hi Roman,
>>>
>>> I am about to disappear on an extended vacation so will let others
>>> pursue this. IIUC this is no longer an opt-in by the user at runtime,
>>> but an opt-in by the particular GC developers. Okay. My only concern
>>> with that is if Shenandoah is the only GC that currently opts in, then
>>> this code is not going to get much testing and will be more prone to
>>> incidental breakage.
>>>
>>> As I mentioned before, it seems like Erik Ö has some idea, maybe he
>>> can do this after his barrier patch.
>>>
>>> Thanks!
>>>
>>> /Robbin
>>>
>>> Cheers,
>>> David
>>>
>>> On 2/06/2017 2:21 AM, Roman Kennke wrote:
>>> On 01.06.2017 at 17:50, Roman Kennke wrote:
>>> On 01.06.2017 at 14:18, Robbin Ehn wrote:
>>> On 06/01/2017 11:29 AM, Roman Kennke wrote:
>>> On 31.05.2017 at 22:06, Robbin Ehn wrote:
>>> On 05/31/2017 10:27 AM, Roman Kennke wrote:
>>> [...]
>>> Roman
>>>
>>> --
>>> This message was sent from my Android device with K-9 Mail.

From rkennke at redhat.com  Thu Jun 29 20:04:18 2017
From: rkennke at redhat.com (Roman Kennke)
Date: Thu, 29 Jun 2017 22:04:18 +0200
Subject: RFR: Parallelize safepoint cleanup
In-Reply-To: <72d197f7-a99b-84bc-26f7-c9a84da26ccd@oracle.com>
References: <46ad874e-eb41-7927-265a-40dea92dfe1e@oracle.com>
 <5711258b-99b0-e06f-ba6e-0b6b55d88345@redhat.com>
 <0e1e2779-9316-b756-6cc8-e0c8add14a94@oracle.com>
 <1910961c-11bd-0e86-dd03-4fce66b9969f@redhat.com>
 <2b466176-b688-53a8-bef9-c7ec2c8c745b@oracle.com>
 <42872a15-d26c-9798-c6a2-f3f7c945baf7@redhat.com>
 <5e7c7d00-4acd-bea3-3525-33dbd9159efb@oracle.com>
 <6f2c6de7-298b-bf14-ab1f-430c4acd43c9@redhat.com>
 <5cd676de-872d-6d4a-691b-da561173f7d0@oracle.com>
 <61d80e98-275f-b2b8-4ac7-6d5d03b047de@redhat.com>
 <676d3b56-cee0-b68a-d700-e43695355148@redhat.com>
 <1fbd2b4a-9aef-d6db-726e-929b6b466e4c@oracle.com>
 <08391C19-4675-475C-A30D-F10B364B5AF3@redhat.com>
 <9a882506-282a-ec74-27de-5b22e258e352@oracle.com>
 <47667919-0786-56a0-ebf9-d7c1b48766c2@redhat.com>
 <72d197f7-a99b-84bc-26f7-c9a84da26ccd@oracle.com>
Message-ID: <8dfc2752-36f1-4444-243a-975818c7dc92@redhat.com>

On 29.06.2017 at 21:27, Robbin Ehn wrote:
> Hi Roman,
>
> Thanks,
>
> There seems to be a performance gain vs the old way of just running the
> VM thread (again shaky numbers, but an indication):
>
> Old code, MonitorUsedDeflationThreshold=0: 0.003099s, avg of 10 worst
> cleanups 0.0213s
> VM thread, MonitorUsedDeflationThreshold=0: 0.002782s, avg of 10 worst
> cleanups 0.0173s
>
> I'm assuming that combining deflation and nmethod marking in the same
> pass is the reason for this.
> Great!

Yes, that seems likely. Thanks for your patient reviewing and testing!
:-)

Also, the real winner (for me) was merging deflation and nmethod marking
into the GC pass (as proposed in the very first patch). This parallelizes
much better, because the GC can do other (root marking) work in parallel.
Unfortunately, this is currently not possible with the other OpenJDK GCs,
because they all use preserve_marks() and restore_marks() before/after GC
to store away the mark words... but we need those mark words for
deflation. (Shenandoah doesn't do this, and can thus benefit from this
optimization, and I suppose G1 could do it too -- after all, it shouldn't
require the object header for marking, right?)

Roman
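For context, a simplified sketch of why deflation depends on the mark
words that preserve_marks()/restore_marks() squirrel away. The names
follow hotspot's markOop/ObjectMonitor types, but this is a hedged
illustration, not the actual synchronizer.cpp code:

// An inflated lock's object header encodes a pointer to its ObjectMonitor;
// deflation swings the original (displaced) header back into the object.
markOop mark = obj->mark();
if (mark->has_monitor()) {
  ObjectMonitor* mid = mark->monitor();  // header points at the monitor
  markOop dmw = mid->header();           // displaced mark word, saved at inflation
  obj->release_set_mark(dmw);            // restoring the header == deflating
}
// If the GC has preserved and overwritten the header (preserve_marks()),
// the monitor pointer is not there to find, so deflation cannot run
// inside that window.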
From erik.helin at oracle.com  Fri Jun 30 09:37:12 2017
From: erik.helin at oracle.com (Erik Helin)
Date: Fri, 30 Jun 2017 11:37:12 +0200
Subject: RFR: 8183281: Remove unnecessary call to increment_gc_time_stamp
Message-ID: <646a5d9a-b6d3-82c9-3937-027c3193d4c0@oracle.com>

Hi all,

the following small patch removes an unnecessary call to
increment_gc_time_stamp from
G1CollectedHeap::do_collection_pause_at_safepoint (and the long, wrong
comment above the call).

We already call increment_gc_time_stamp much earlier in
do_collection_pause_at_safepoint, which is enough. The reasons outlined
in the comment motivating a second call are no longer true; the code has
changed (but the comment has not).

Bug: https://bugs.openjdk.java.net/browse/JDK-8183281
Patch: see below
Testing: make hotspot

Thanks,
Erik

# HG changeset patch
# User ehelin
# Date 1498814642 -7200
#      Fri Jun 30 11:24:02 2017 +0200
# Node ID 62400b3cbec4e0d06e0d6c21c9486070d8c906a4
# Parent  10ccf0a5f63fdca04d9eda2c774ccdd0e12bc1a1
8183281: Remove unnecessary call to increment_gc_time_stamp

diff -r 10ccf0a5f63f -r 62400b3cbec4 src/share/vm/gc/g1/g1CollectedHeap.cpp
--- a/src/share/vm/gc/g1/g1CollectedHeap.cpp	Thu Jun 29 19:09:04 2017 +0000
+++ b/src/share/vm/gc/g1/g1CollectedHeap.cpp	Fri Jun 30 11:24:02 2017 +0200
@@ -3266,29 +3266,6 @@

   MemoryService::track_memory_usage();

-  // In prepare_for_verify() below we'll need to scan the deferred
-  // update buffers to bring the RSets up-to-date if
-  // G1HRRSFlushLogBuffersOnVerify has been set. While scanning
-  // the update buffers we'll probably need to scan cards on the
-  // regions we just allocated to (i.e., the GC alloc
-  // regions). However, during the last GC we called
-  // set_saved_mark() on all the GC alloc regions, so card
-  // scanning might skip the [saved_mark_word()...top()] area of
-  // those regions (i.e., the area we allocated objects into
-  // during the last GC). But it shouldn't. Given that
-  // saved_mark_word() is conditional on whether the GC time stamp
-  // on the region is current or not, by incrementing the GC time
-  // stamp here we invalidate all the GC time stamps on all the
-  // regions and saved_mark_word() will simply return top() for
-  // all the regions. This is a nicer way of ensuring this rather
-  // than iterating over the regions and fixing them. In fact, the
-  // GC time stamp increment here also ensures that
-  // saved_mark_word() will return top() between pauses, i.e.,
-  // during concurrent refinement. So we don't need the
-  // is_gc_active() check to decided which top to use when
-  // scanning cards (see CR 7039627).
-  increment_gc_time_stamp();
-
   if (VerifyRememberedSets) {
     log_info(gc, verify)("[Verifying RemSets after GC]");
     VerifyRegionRemSetClosure v_cl;

From erik.helin at oracle.com  Fri Jun 30 09:47:37 2017
From: erik.helin at oracle.com (Erik Helin)
Date: Fri, 30 Jun 2017 11:47:37 +0200
Subject: RFR: G1HRRSFlushLogBuffersOnVerify with remembered set verification does not work
In-Reply-To: <1498729056.2900.4.camel@oracle.com>
References: <1498729056.2900.4.camel@oracle.com>
Message-ID: <7e6c57ef-7e02-0b0d-5a35-3ea395089e30@oracle.com>

On 06/29/2017 11:37 AM, Thomas Schatzl wrote:
>> Patch: http://cr.openjdk.java.net/~ehelin/8153360/00/
>>
>> Test: make hotspot - this is "just" removal of code
>
> looks good.
>
> Please add a comment about what the last clause in the verification
> code actually means (heapRegion.cpp:584). Something like:
>
> // Reference may not have been refined into the remembered sets yet.
> // Instead of looking into all dirty card queues, we take a shortcut
> // by looking at whether the corresponding card is dirty.
> // ObjArrays may either be marked on the object header or exactly.
>
> (Actually I would guess the "correct" clause here would be is_array()
> and not is_objArray(), but primitive type arrays are never marked as
> they do not contain references.)
>
> I do not need a re-review for the comment change.

Thanks for the review Thomas, I will add the comment before I push!

Erik

> Thanks,
> Thomas
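To make the clause concrete, the shortcut Thomas describes amounts to
something along these lines. The helper names are illustrative, not the
actual heapRegion.cpp:584 code, and the is_objArray() special case
discussed above is omitted:

// A reference missing from the remembered set is still acceptable during
// verification if its card is dirty: the reference is sitting in a dirty
// card queue and simply has not been refined into the remembered set yet.
bool entry_ok = hrrs->contains_reference(from)  // already refined
             || is_card_dirty(from);            // still pending refinement

Here is_card_dirty() stands in for the card table lookup; checking the
card avoids scanning every dirty card queue during verification.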
From stefan.johansson at oracle.com  Fri Jun 30 11:53:00 2017
From: stefan.johansson at oracle.com (Stefan Johansson)
Date: Fri, 30 Jun 2017 13:53:00 +0200
Subject: RFR: 8183281: Remove unnecessary call to increment_gc_time_stamp
In-Reply-To: <646a5d9a-b6d3-82c9-3937-027c3193d4c0@oracle.com>
References: <646a5d9a-b6d3-82c9-3937-027c3193d4c0@oracle.com>
Message-ID: <5b2dff36-0a55-feb8-7e80-52e4562a5651@oracle.com>

Hi Erik,

On 2017-06-30 11:37, Erik Helin wrote:
> Hi all,
>
> the following small patch removes an unnecessary call to
> increment_gc_time_stamp from
> G1CollectedHeap::do_collection_pause_at_safepoint (and the long, wrong
> comment above the call).
>
> We already call increment_gc_time_stamp much earlier in
> do_collection_pause_at_safepoint, which is enough. The reasons
> outlined in the comment motivating a second call are no longer true;
> the code has changed (but the comment has not).
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8183281
> Patch: see below
> Testing: make hotspot

Patch looks good, but I would like to see some more testing than just
building hotspot. Running the gc jtreg tests for example.

Thanks for cleaning up the code,
Stefan

> Thanks,
> Erik
>
> [quoted patch trimmed]

From stefan.johansson at oracle.com  Fri Jun 30 11:56:15 2017
From: stefan.johansson at oracle.com (Stefan Johansson)
Date: Fri, 30 Jun 2017 13:56:15 +0200
Subject: RFR: G1HRRSFlushLogBuffersOnVerify with remembered set verification does not work
In-Reply-To: <7e6c57ef-7e02-0b0d-5a35-3ea395089e30@oracle.com>
References: <1498729056.2900.4.camel@oracle.com>
 <7e6c57ef-7e02-0b0d-5a35-3ea395089e30@oracle.com>
Message-ID:

Hi Erik,

On 2017-06-30 11:47, Erik Helin wrote:
> On 06/29/2017 11:37 AM, Thomas Schatzl wrote:
>> [...]
>
> Thanks for the review Thomas, I will add the comment before I push!

After the comment removal in JDK-8183281, which referenced the flag, I
say: Ship it.

Thanks,
Stefan

> Erik
>
>> Thanks,
>> Thomas

From erik.helin at oracle.com  Fri Jun 30 15:34:08 2017
From: erik.helin at oracle.com (Erik Helin)
Date: Fri, 30 Jun 2017 17:34:08 +0200
Subject: RFR: 8183281: Remove unnecessary call to increment_gc_time_stamp
In-Reply-To: <5b2dff36-0a55-feb8-7e80-52e4562a5651@oracle.com>
References: <646a5d9a-b6d3-82c9-3937-027c3193d4c0@oracle.com>
 <5b2dff36-0a55-feb8-7e80-52e4562a5651@oracle.com>
Message-ID:

On 06/30/2017 01:53 PM, Stefan Johansson wrote:
> Hi Erik,
>
> On 2017-06-30 11:37, Erik Helin wrote:
>> Hi all,
>>
>> the following small patch removes an unnecessary call to
>> increment_gc_time_stamp from
>> G1CollectedHeap::do_collection_pause_at_safepoint (and the long,
>> wrong comment above the call).
>>
>> We already call increment_gc_time_stamp much earlier in
>> do_collection_pause_at_safepoint, which is enough. The reasons
>> outlined in the comment motivating a second call are no longer true;
>> the code has changed (but the comment has not).
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8183281
>> Patch: see below
>> Testing: make hotspot
>
> Patch looks good, but I would like to see some more testing than just
> building hotspot. Running the gc jtreg tests for example.

Thanks for reviewing! All pass for both fastdebug and product when
running `make test TEST=hotspot_gc` on my Linux workstation.

Thanks,
Erik

> Thanks for cleaning up the code,
> Stefan
>
>> [quoted patch trimmed]

From rkennke at redhat.com  Fri Jun 30 16:32:14 2017
From: rkennke at redhat.com (Roman Kennke)
Date: Fri, 30 Jun 2017 18:32:14 +0200
Subject: RFR: 8179387: Factor out CMS specific code from GenCollectedHeap into its own subclass
In-Reply-To:
References: <3521009f-6fab-4f8e-2375-b9d665a4c70b@redhat.com>
Message-ID: <3d8b55a2-a787-3051-b351-ab9b0a24f5e0@redhat.com>

I came across one problem using this approach: We will have 2 instances
of CollectedHeap around, where there's usually only 1, and some code
expects only 1. For example, in the CollectedHeap constructor, we create
new PerfData variables, and we now create them 2x, which leads to an
assert being thrown. I suspect there is more code like that.
I will attempt to refactor this a little more; maybe it's not that bad,
but it's probably not worth spending too much time on it.

Roman

> Hi Roman,
>
> thanks for putting this patch together, it is a great step forward! One
> thing that (in my mind) would improve it even further is if we embed a
> GenCollectedHeap in CMSHeap and then make CMSHeap inherit directly from
> CollectedHeap.
>
> With this solution, the definition of CMSHeap would look something
> along the lines of:
>
> class CMSHeap : public CollectedHeap {
>   WorkGang* _wg;
>   GenCollectedHeap _gch;
>
> public:
>   CMSHeap(GenCollectorPolicy* policy) :
>     _wg(new WorkGang("GC Thread", ParallelGCThreads, true, true)),
>     _gch(policy) {
>     _wg->initialize_workers();
>   }
>
>   // a bunch of "facade" methods
>   virtual bool supports_tlab_allocation() const {
>     return _gch.supports_tlab_allocation();
>   }
>
>   virtual size_t tlab_capacity(Thread* t) const {
>     return _gch.tlab_capacity(t);
>   }
> };
>
> With this approach, you would have to implement a bunch of "facade"
> methods that just delegate to _gch, such as supports_tlab_allocation
> and tlab_capacity above. There are two reasons why I prefer this
> approach:
> 1. In the end we want CMSHeap to inherit from CollectedHeap anyway :)
> 2. It makes it very clear which methods we gradually have to
>    re-implement in CMSHeap to eventually get rid of the _gch field
>    (the end goal). This is much harder to see if CMSHeap inherits from
>    GenCollectedHeap (see more below).
>
> The second point will most likely cause some initial problems with
> `protected` code in GenCollectedHeap. For example, as you noticed when
> creating this patch, CMSHeap makes use of a few `protected` fields and
> methods from GenCollectedHeap, most notably:
> - _process_strong_tasks
> - process_roots()
> - process_string_table_roots()
>
> It would be much better (IMO) to share this code via composition rather
> than inheritance. In this particular case, I would prefer to create a
> class StrongRootsProcessor that encapsulates the root processing logic.
> Then GenCollectedHeap and CMSHeap can both contain an instance of
> StrongRootsProcessor.
>
> What do you think of this approach? Do you have some spare cycles to
> try this approach out?
>
> Thanks,
> Erik
>
> On 06/02/2017 10:55 AM, Roman Kennke wrote:
>> Take this patch. It #ifdef ASSERT's a call to check_gen_kinds() that
>> is only present in debug builds.
>>
>> http://cr.openjdk.java.net/~rkennke/8179387/webrev.01/
>>
>> Roman
>>
>> On 01.06.2017 at 22:50, Roman Kennke wrote:
>>> What $SUBJECT says.
>>>
>>> I went over genCollectedHeap.[hpp|cpp] and moved everything that I
>>> could find that is CMS-only into a new CMSHeap class.
>>>
>>> http://cr.openjdk.java.net/~rkennke/8179387/webrev.00/
>>>
>>> It is possible that I overlooked something there. There may be code
>>> in there that doesn't shout "CMS" at me, but is still intrinsically
>>> CMS stuff.
>>>
>>> Also note that I have not removed this little part:
>>>
>>> always_do_update_barrier = UseConcMarkSweepGC;
>>>
>>> because I expect it to go away with Erik Ö's big refactoring.
>>>
>>> What do you think?
>>>
>>> Testing: hotspot_gc, specjvm, some little apps with
>>> -XX:+UseConcMarkSweepGC
>>>
>>> Roman
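Concretely, the composition Erik proposes could be sketched like this.
StrongRootsProcessor does not exist in the tree; its shape here assumes
that the current process_roots() logic and the _process_strong_tasks
field move into it, so treat this as an illustration rather than actual
repository code:

class StrongRootsProcessor {
  SubTasksDone _process_strong_tasks;  // moved out of GenCollectedHeap
public:
  StrongRootsProcessor(uint n_tasks) : _process_strong_tasks(n_tasks) { }

  // the root-scanning logic currently in GenCollectedHeap would move here
  void process_roots(StrongRootsScope* scope, OopClosure* strong_roots);
  void process_string_table_roots(StrongRootsScope* scope, OopClosure* weak_roots);
};

class GenCollectedHeap : public CollectedHeap {
  StrongRootsProcessor _root_processor;  // root processing shared via composition
  // remaining GenCollectedHeap members unchanged
};

class CMSHeap : public CollectedHeap {
  GenCollectedHeap    _gch;              // embedded, per Erik's sketch above
  StrongRootsProcessor _root_processor;  // same logic, no `protected` access needed
  // facade methods delegating to _gch, as shown earlier
};

With this split, neither heap class needs `protected` access to the
other's internals: both simply own a StrongRootsProcessor.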