From rkennke at redhat.com Thu Jun 1 09:29:44 2017
From: rkennke at redhat.com (Roman Kennke)
Date: Thu, 1 Jun 2017 11:29:44 +0200
Subject: RFR: Parallelize safepoint cleanup
In-Reply-To: <46ad874e-eb41-7927-265a-40dea92dfe1e@oracle.com>
References: <486b5a72-bef8-4ebc-2729-3fe3aa3ab3b9@oracle.com> <5c80f8df-27c9-f9a9-dc6d-47f9c6019a61@redhat.com> <46ad874e-eb41-7927-265a-40dea92dfe1e@oracle.com>
Message-ID: <5711258b-99b0-e06f-ba6e-0b6b55d88345@redhat.com>

On 31.05.2017 at 22:06, Robbin Ehn wrote:
> Hi Roman, I agree that this is really needed, but:
>
> On 05/31/2017 10:27 AM, Roman Kennke wrote:
>> I realized that sharing workers with the GC is not so easy.
>>
>> We need to be able to use the workers at a safepoint during concurrent GC work (which also uses the same workers). This does not only require that those workers be suspended, like e.g. SuspendibleThreadSet::yield(); they need to be idle, i.e. have finished their tasks. This needs some careful handling to work without races: it requires a SuspendibleThreadSetJoiner around the corresponding run_task() call, and the tasks themselves need to join the STS and handle requests for safepoints not by yielding, but by leaving the task. This is far too peculiar for me to make the call to hook up GC workers for safepoint cleanup, and I thus removed those parts. I left the API in CollectedHeap in place. I think GC devs who know better about G1 and CMS should make that call, or else just use a separate thread pool.
>>
>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.05/
>>
>> Is it ok now?
>
> I still think you should put the "Parallel Safepoint Cleanup" workers inside Shenandoah, so the SafepointSynchronizer only calls get_safepoint_workers, e.g.:
>
> _cleanup_workers = heap->get_safepoint_workers();
> _num_cleanup_workers = _cleanup_workers != NULL ? _cleanup_workers->total_workers() : 1;
> ParallelSPCleanupTask cleanup(_cleanup_subtasks);
> StrongRootsScope srs(_num_cleanup_workers);
> if (_cleanup_workers != NULL) {
>   _cleanup_workers->run_task(&cleanup, _num_cleanup_workers);
> } else {
>   cleanup.work(0);
> }
>
> That way you don't even need your new flags, but it will be up to the other GCs to make their workers available or cheat with a separate workgang.

I can do that, I don't mind. The question is, do we want that? I wouldn't call it 'cheating with a separate workgang' though. I see that both G1 and CMS suspend their worker threads at a safepoint. However:
- Do they finish their work, stop, and then restart work after the safepoint? Or do the workers simply call STS::yield() to suspend and later resume their work where they left off? If they only call yield() (or whatever the equivalent is in CMS), then this is not enough: the workers need to be truly idle in order to be used by the safepoint cleaners.
- Parallel and serial GC don't have workgangs of their own.

So, as far as I can tell, this means that parallel safepoint cleanup would only be supported by GCs for which we explicitly implement it, after having carefully checked if/how workgangs are suspended at safepoints, or by providing GC-internal thread pools. Do we really want that?

Roman
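For illustration, a minimal self-contained sketch of the hook shape that Robbin's snippet assumes. This is not the actual webrev: the class shapes, names, and the NULL-means-serial convention are assumptions.

```cpp
// Sketch only: the GC optionally donates a WorkGang for safepoint cleanup.
#include <cstddef>

class AbstractGangTask {
public:
  virtual void work(unsigned worker_id) = 0;
  virtual ~AbstractGangTask() {}
};

class WorkGang {
  unsigned _num_workers;
public:
  explicit WorkGang(unsigned n) : _num_workers(n) {}
  unsigned total_workers() const { return _num_workers; }
  // Real HotSpot dispatches to gang threads; a serial loop stands in here.
  void run_task(AbstractGangTask* task, unsigned num_workers) {
    for (unsigned i = 0; i < num_workers; i++) {
      task->work(i);
    }
  }
};

class CollectedHeap {
public:
  // Default: the GC donates no workers, so cleanup stays single-threaded.
  // A collector that can guarantee idle workers at a safepoint overrides this.
  virtual WorkGang* get_safepoint_workers() { return NULL; }
  virtual ~CollectedHeap() {}
};

// Call-site shape from the snippet above: run in parallel if workers were
// donated, otherwise fall back to doing all the work on one thread.
void run_cleanup(CollectedHeap* heap, AbstractGangTask* cleanup) {
  WorkGang* workers = heap->get_safepoint_workers();
  if (workers != NULL) {
    workers->run_task(cleanup, workers->total_workers());
  } else {
    cleanup->work(0);
  }
}
```

The point of this shape is that the policy of whether and how many threads to donate stays entirely with the GC, which is the decision being argued over in this thread.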
From erik.helin at oracle.com Thu Jun 1 09:35:31 2017
From: erik.helin at oracle.com (Erik Helin)
Date: Thu, 1 Jun 2017 11:35:31 +0200
Subject: RFR (7xS): 8071280: Specialize HeapRegion::oops_on_card_seq_iterate_careful() for use during concurrent refinement and updating the rset
In-Reply-To: <1496218454.3287.2.camel@oracle.com>
References: <1491910205.2754.31.camel@oracle.com> <1493985823.2777.52.camel@oracle.com> <3098f31e-3301-362d-8c9c-b06f27e7133c@oracle.com> <1494847168.2707.17.camel@oracle.com> <1495538626.2781.3.camel@oracle.com> <1496218454.3287.2.camel@oracle.com>
Message-ID:

On 05/31/2017 10:14 AM, Thomas Schatzl wrote:
> Hi all,
>
> On Tue, 2017-05-23 at 13:23 +0200, Thomas Schatzl wrote:
>> Hi all,
>>
>> Erik Helin had a few comments regarding naming etc. that this new webrev incorporates:
>>
>> Webrevs:
>> http://cr.openjdk.java.net/~tschatzl/8071280/webrev.2_to_3/ (diff)
>> http://cr.openjdk.java.net/~tschatzl/8071280/webrev.3/ (full)
>
> Erik and I did some more investigation on this and found that the is_gc_active parameter for HeapRegion::is_obj_dead_with_size() is actually not required any more, due to the addition of the ClassUnloadingWithConcurrentMark clause inside.
>
> I removed that and updated the webrev.
>
> Sorry for another change - Erik promised me that it's good now :)

Yep, this is good to go now. As we've discussed, there are some further improvements that can be done to this code, but we can do those in a later patch. Thanks a lot for cleaning up this code!

Erik

> Thanks,
> Thomas

From erik.helin at oracle.com Thu Jun 1 11:59:58 2017
From: erik.helin at oracle.com (Erik Helin)
Date: Thu, 1 Jun 2017 13:59:58 +0200
Subject: RFR (7xS): 8177707: Specialize G1RemSet::refine_card for concurrent/during safepoint refinement
In-Reply-To: <1496218627.3287.5.camel@oracle.com>
References: <1491910205.2754.31.camel@oracle.com> <3E94A9B0-D0AB-4521-9727-D4B1D0954BAA@oracle.com> <1493985121.2777.42.camel@oracle.com> <192c34a3-c869-599d-0661-2ca9c524b626@oracle.com> <1496218627.3287.5.camel@oracle.com>
Message-ID:

On 05/31/2017 10:17 AM, Thomas Schatzl wrote:
> Hi Erik,
>
> On Tue, 2017-05-30 at 11:43 +0200, Erik Helin wrote:
>> On 05/09/2017 01:31 AM, Kim Barrett wrote:
>>>> On May 5, 2017, at 7:52 AM, Thomas Schatzl wrote:
>>>>
>>>> New webrevs:
>>>> http://cr.openjdk.java.net/~tschatzl/8177707/webrev.0_to_1/ (diff)
>>>> http://cr.openjdk.java.net/~tschatzl/8177707/webrev.1/ (full)
>>>>
>>>> Thanks,
>>>> Thomas
>>> Looks good.
>> Looks good to me as well! I've really tried to go through the patch with a magnifying glass, and AFAICS all the code has been duplicated correctly.
>
> thanks.
>
> Unfortunately, with these reviews taking their time, a change in the jdk10 repo required an update. In particular, G1RootRegionScanClosure::apply_to_weak_ref_discovered_field() had to be replaced by G1RootRegionScanClosure::reference_iteration_mode(), as introduced lately.
>
> The full patch:
>
> --- old/src/share/vm/gc/g1/g1OopClosures.hpp 2017-05-16 09:57:27.140974921 +0200
> +++ new/src/share/vm/gc/g1/g1OopClosures.hpp 2017-05-16 09:57:27.035971738 +0200
> @@ -181,7 +181,8 @@
>      _worker_i(worker_i) {
>    }
>
> -  bool apply_to_weak_ref_discovered_field() { return true; }
> +  // This closure needs special handling for InstanceRefKlass.
> +  virtual ReferenceIterationMode reference_iteration_mode() { return DO_DISCOVERED_AND_DISCOVERY; }
>
>    template <class T> void do_oop_nv(T* p);
>    virtual void do_oop(narrowOop* p) { do_oop_nv(p); }

Looks good! Now push it before something else changes :)

Erik

> Thanks,
> Thomas

From robbin.ehn at oracle.com Thu Jun 1 12:18:23 2017
From: robbin.ehn at oracle.com (Robbin Ehn)
Date: Thu, 1 Jun 2017 14:18:23 +0200
Subject: RFR: Parallelize safepoint cleanup
In-Reply-To: <5711258b-99b0-e06f-ba6e-0b6b55d88345@redhat.com>
References: <486b5a72-bef8-4ebc-2729-3fe3aa3ab3b9@oracle.com> <5c80f8df-27c9-f9a9-dc6d-47f9c6019a61@redhat.com> <46ad874e-eb41-7927-265a-40dea92dfe1e@oracle.com> <5711258b-99b0-e06f-ba6e-0b6b55d88345@redhat.com>
Message-ID: <0e1e2779-9316-b756-6cc8-e0c8add14a94@oracle.com>

Hi Roman,

On 06/01/2017 11:29 AM, Roman Kennke wrote:
> On 31.05.2017 at 22:06, Robbin Ehn wrote:
>> Hi Roman, I agree that this is really needed, but:
>>
>> On 05/31/2017 10:27 AM, Roman Kennke wrote:
>>> [...]
>>
>> I still think you should put the "Parallel Safepoint Cleanup" workers inside Shenandoah, so the SafepointSynchronizer only calls get_safepoint_workers [...]
>>
>> That way you don't even need your new flags, but it will be up to the other GCs to make their workers available or cheat with a separate workgang.
> I can do that, I don't mind. The question is, do we want that?

The problem is that we do not want to hasten such a decision; we believe there is a better solution. I think you also would want another solution. But it seems like such a solution, with one 'global' thread pool owned either by the GC or by the VM itself, is quite an undertaking. Since this probably will not be done any time soon, my suggestion, so as to not hold you back (we also want this), is to just make the code parallel and, as an intermediate step, ask the GC if it minds sharing its threads.

Now when Shenandoah is merged, it's possible that e.g. G1 will share the code for a separate thread pool, do something of its own, or wait until the bigger question about thread pool(s) has been resolved.
By adding a thread pool directly to the SafepointSynchronizer, and flags for it, we might limit our future options.

> I wouldn't call it 'cheating with a separate workgang' though. I see that both G1 and CMS suspend their worker threads at a safepoint. However:

Yes, it's not cheating, but I want decent heuristics between e.g. the number of concurrent marking threads and parallel safepoint threads, since they compete for CPU time. As the code looks now, I think those decisions must be made by the GC.

> - Do they finish their work, stop, and then restart work after the safepoint? Or do the workers simply call STS::yield() to suspend and later resume their work where they left off? If they only call yield() (or whatever the equivalent is in CMS), then this is not enough: the workers need to be truly idle in order to be used by the safepoint cleaners.
> - Parallel and serial GC don't have workgangs of their own.

I know Erik Ö has been doing some prototyping here; he can probably fill you in.

> So, as far as I can tell, this means that parallel safepoint cleanup would only be supported by GCs for which we explicitly implement it, after having carefully checked if/how workgangs are suspended at safepoints, or by providing GC-internal thread pools. Do we really want that?

We know we probably don't want a thread pool in the SafepointSynchronizer, and probably not more pools either, but we need to think about it. Do you agree?

Still, thanks for doing this!

/Robbin

> Roman

From rkennke at redhat.com Thu Jun 1 15:50:59 2017
From: rkennke at redhat.com (Roman Kennke)
Date: Thu, 1 Jun 2017 17:50:59 +0200
Subject: RFR: Parallelize safepoint cleanup
In-Reply-To: <0e1e2779-9316-b756-6cc8-e0c8add14a94@oracle.com>
References: <486b5a72-bef8-4ebc-2729-3fe3aa3ab3b9@oracle.com> <5c80f8df-27c9-f9a9-dc6d-47f9c6019a61@redhat.com> <46ad874e-eb41-7927-265a-40dea92dfe1e@oracle.com> <5711258b-99b0-e06f-ba6e-0b6b55d88345@redhat.com> <0e1e2779-9316-b756-6cc8-e0c8add14a94@oracle.com>
Message-ID: <1910961c-11bd-0e86-dd03-4fce66b9969f@redhat.com>

On 01.06.2017 at 14:18, Robbin Ehn wrote:
> Hi Roman,
>
> On 06/01/2017 11:29 AM, Roman Kennke wrote:
>> [...]
>>> I still think you should put the "Parallel Safepoint Cleanup" workers inside Shenandoah, so the SafepointSynchronizer only calls get_safepoint_workers, e.g.:
>>> [...]
>>> That way you don't even need your new flags, but it will be up to the other GCs to make their workers available or cheat with a separate workgang.
>> I can do that, I don't mind. The question is, do we want that?
>
> The problem is that we do not want to hasten such a decision; we believe there is a better solution. I think you also would want another solution. But it seems like such a solution, with one 'global' thread pool owned either by the GC or by the VM itself, is quite an undertaking. Since this probably will not be done any time soon, my suggestion, so as to not hold you back (we also want this), is to just make the code parallel and, as an intermediate step, ask the GC if it minds sharing its threads.
>
> Now when Shenandoah is merged, it's possible that e.g. G1 will share the code for a separate thread pool, do something of its own, or wait until the bigger question about thread pool(s) has been resolved.
>
> By adding a thread pool directly to the SafepointSynchronizer, and flags for it, we might limit our future options.
>
>> I wouldn't call it 'cheating with a separate workgang' though. I see that both G1 and CMS suspend their worker threads at a safepoint. However:
> Yes, it's not cheating, but I want decent heuristics between e.g. the number of concurrent marking threads and parallel safepoint threads, since they compete for CPU time. As the code looks now, I think those decisions must be made by the GC.

Ok, I see your point. I updated the proposed patch accordingly:

http://cr.openjdk.java.net/~rkennke/8180932/webrev.06/

But it means that parallel safepoint cleanup is not really available unless it's implemented by a GC. There's one little change compared to the current state, even with serial cleanup: nmethod marking and monitor deflation are now done in one single pass.

I am curious what you're thinking of when you say you 'want another solution'. I have another solution in mind too: concurrent monitor deflation. I am currently drafting a JEP, but it's not ready yet.

So what do you think of the latest iteration of that patch?

Roman
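As a sketch of what "one single pass" can mean here (this is not Roman's actual patch; the subtask names are placeholders): the gang runs a single task, and each worker claims whole cleanup subtasks off a shared atomic counter, so nmethod marking and monitor deflation happen in the same gang activation.

```cpp
#include <atomic>

// Hypothetical subtasks; the real set and names may differ.
enum CleanupSubtask {
  CLEANUP_DEFLATE_IDLE_MONITORS = 0,
  CLEANUP_MARK_NMETHODS,
  CLEANUP_UPDATE_INLINE_CACHES,
  CLEANUP_SUBTASK_COUNT
};

class ParallelSPCleanupTask {
  std::atomic<int> _next_subtask;   // next unclaimed subtask index
public:
  ParallelSPCleanupTask() : _next_subtask(0) {}

  // Called by every worker; correct whether 1 or N threads run it.
  void work(unsigned worker_id) {
    for (int t = _next_subtask.fetch_add(1);
         t < CLEANUP_SUBTASK_COUNT;
         t = _next_subtask.fetch_add(1)) {
      switch (t) {
        case CLEANUP_DEFLATE_IDLE_MONITORS:
          // e.g. deflate idle monitors here
          break;
        case CLEANUP_MARK_NMETHODS:
          // e.g. mark active nmethods here
          break;
        case CLEANUP_UPDATE_INLINE_CACHES:
          // e.g. update inline caches here
          break;
      }
    }
  }
};
```

With one thread, the loop simply claims all subtasks in order, which is why the serial path and the parallel path can share the same code.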
From rkennke at redhat.com Thu Jun 1 16:21:06 2017
From: rkennke at redhat.com (Roman Kennke)
Date: Thu, 1 Jun 2017 18:21:06 +0200
Subject: RFR: Parallelize safepoint cleanup
In-Reply-To: <1910961c-11bd-0e86-dd03-4fce66b9969f@redhat.com>
References: <486b5a72-bef8-4ebc-2729-3fe3aa3ab3b9@oracle.com> <5c80f8df-27c9-f9a9-dc6d-47f9c6019a61@redhat.com> <46ad874e-eb41-7927-265a-40dea92dfe1e@oracle.com> <5711258b-99b0-e06f-ba6e-0b6b55d88345@redhat.com> <0e1e2779-9316-b756-6cc8-e0c8add14a94@oracle.com> <1910961c-11bd-0e86-dd03-4fce66b9969f@redhat.com>
Message-ID:

On 01.06.2017 at 17:50, Roman Kennke wrote:
> On 01.06.2017 at 14:18, Robbin Ehn wrote:
>> [...]
> Ok, I see your point. I updated the proposed patch accordingly:
>
> http://cr.openjdk.java.net/~rkennke/8180932/webrev.06/

Oops. Minor mistake there. Correction:
http://cr.openjdk.java.net/~rkennke/8180932/webrev.07/

(Removed 'class WorkGang' from safepoint.hpp, and forgot to add it into collectedHeap.hpp, resulting in a build failure...)

Roman

From rkennke at redhat.com Thu Jun 1 20:50:22 2017
From: rkennke at redhat.com (Roman Kennke)
Date: Thu, 1 Jun 2017 22:50:22 +0200
Subject: RFR: 8179387: Factor out CMS specific code from GenCollectedHeap into its own subclass
Message-ID:

What $SUBJECT says.

I went over genCollectedHeap.[hpp|cpp] and moved everything that I could find that is CMS-only into a new CMSHeap class.

http://cr.openjdk.java.net/~rkennke/8179387/webrev.00/

It is possible that I overlooked something there. There may be code in there that doesn't shout "CMS" at me, but is still intrinsically CMS stuff.

Also note that I have not removed this little part:

always_do_update_barrier = UseConcMarkSweepGC;

because I expect it to go away with Erik Ö's big refactoring.

What do you think?

Testing: hotspot_gc, specjvm, some little apps with -XX:+UseConcMarkSweepGC

Roman

From david.holmes at oracle.com Fri Jun 2 01:54:36 2017
From: david.holmes at oracle.com (David Holmes)
Date: Fri, 2 Jun 2017 11:54:36 +1000
Subject: RFR: Parallelize safepoint cleanup
In-Reply-To:
References: <486b5a72-bef8-4ebc-2729-3fe3aa3ab3b9@oracle.com> <5c80f8df-27c9-f9a9-dc6d-47f9c6019a61@redhat.com> <46ad874e-eb41-7927-265a-40dea92dfe1e@oracle.com> <5711258b-99b0-e06f-ba6e-0b6b55d88345@redhat.com> <0e1e2779-9316-b756-6cc8-e0c8add14a94@oracle.com> <1910961c-11bd-0e86-dd03-4fce66b9969f@redhat.com>
Message-ID: <2b466176-b688-53a8-bef9-c7ec2c8c745b@oracle.com>

Hi Roman,

I am about to disappear on an extended vacation so will let others pursue this. IIUC this is no longer an opt-in by the user at runtime, but an opt-in by the particular GC developers. Okay. My only concern with that is that if Shenandoah is the only GC that currently opts in, then this code is not going to get much testing and will be more prone to incidental breakage.

Cheers,
David

On 2/06/2017 2:21 AM, Roman Kennke wrote:
> [...]
From erik.helin at oracle.com Fri Jun 2 08:23:28 2017
From: erik.helin at oracle.com (Erik Helin)
Date: Fri, 2 Jun 2017 10:23:28 +0200
Subject: RFR (7xS): 8177044: Remove _scan_top from HeapRegion
In-Reply-To: <1496218781.3287.9.camel@oracle.com>
References: <1491910205.2754.31.camel@oracle.com> <1493986396.2777.61.camel@oracle.com> <90aae39c-5314-e051-02a1-67e717be538b@oracle.com> <1494512288.3120.2.camel@oracle.com> <1496218781.3287.9.camel@oracle.com>
Message-ID: <4faeb89e-257b-11cd-aabc-4a007263b594@oracle.com>

On 05/31/2017 10:19 AM, Thomas Schatzl wrote:
> Hi all,

Hi Thomas!

> Erik had a look at these changes and had minor comments:
>
> I.e. he asked about not removing one default value for a parameter in the declaration of the constructor of G1UpdateRSOrPushRefOopClosure (and doing that later).
>
> Also, one call to memset() was redundant and has been removed.
>
> Here are the changes again:
>
> http://cr.openjdk.java.net/~tschatzl/8177044/webrev.1_to_2/ (diff)
> http://cr.openjdk.java.net/~tschatzl/8177044/webrev.2/ (full)

Looks good to me, thanks for cleaning this up!

Erik

> Thanks,
> thomas
>
> On Thu, 2017-05-11 at 16:18 +0200, Thomas Schatzl wrote:
>> Hi Kim and Sangheon,
>>
>> On Tue, 2017-05-09 at 11:12 -0700, sangheon wrote:
>>> Hi Thomas,
>>>
>>> On 05/05/2017 05:13 AM, Thomas Schatzl wrote:
>>>> Hi all,
>>>>
>>>> recent reviews have made changes necessary to parts of the changeset chain.
>>>>
>>>> Here is a list of links to updated webrevs. Since they have apparently not been reviewed yet, I simply overwrote the old webrevs.
>>>>
>>>> JDK-8177044: Remove _scan_top from HeapRegion
>>>> http://cr.openjdk.java.net/~tschatzl/8177044/webrev/
>>> Looks good to me.
>>> And I agree with Kim about retaining the comment.
>>>
>>> src/share/vm/gc/g1/g1RemSet.cpp
>>> 765   if (scan_limit <= start) {
>>> 766     // If the trimmed region is empty, the card must be stale.
>>> 767     return false;
>>
>> thanks for your review.
>>
>> For reference, here is the comment I intend to push:
>>
>> --- old/src/share/vm/gc/g1/g1RemSet.cpp 2017-05-11 16:14:56.054517736 +0200
>> +++ new/src/share/vm/gc/g1/g1RemSet.cpp 2017-05-11 16:14:55.951514554 +0200
>> @@ -735,6 +735,7 @@
>>
>>    HeapWord* scan_limit = _scan_state->scan_top(r->hrm_index());
>>    if (scan_limit <= start) {
>> +    // If the card starts above the area in the region containing objects to scan, skip it.
>>      return false;
>>    }
>>
>> because the original comment is wrong now.
>>
>> New Webrevs:
>> http://cr.openjdk.java.net/~tschatzl/8177044/webrev.0_to_1/ (diff)
>> http://cr.openjdk.java.net/~tschatzl/8177044/webrev.1/ (full)
>>
>> Thanks again for your reviews,
>> Thomas

From rkennke at redhat.com Fri Jun 2 08:55:15 2017
From: rkennke at redhat.com (Roman Kennke)
Date: Fri, 2 Jun 2017 10:55:15 +0200
Subject: RFR: 8179387: Factor out CMS specific code from GenCollectedHeap into its own subclass
In-Reply-To:
References:
Message-ID: <3521009f-6fab-4f8e-2375-b9d665a4c70b@redhat.com>

Take this patch. It #ifdef ASSERT's a call to check_gen_kinds() that is only present in debug builds.

http://cr.openjdk.java.net/~rkennke/8179387/webrev.01/

Roman

On 01.06.2017 at 22:50, Roman Kennke wrote:
> What $SUBJECT says.
> [...]
From rkennke at redhat.com Fri Jun 2 09:41:47 2017
From: rkennke at redhat.com (Roman Kennke)
Date: Fri, 2 Jun 2017 11:41:47 +0200
Subject: RFR: Parallelize safepoint cleanup
In-Reply-To: <2b466176-b688-53a8-bef9-c7ec2c8c745b@oracle.com>
References: <486b5a72-bef8-4ebc-2729-3fe3aa3ab3b9@oracle.com> <5c80f8df-27c9-f9a9-dc6d-47f9c6019a61@redhat.com> <46ad874e-eb41-7927-265a-40dea92dfe1e@oracle.com> <5711258b-99b0-e06f-ba6e-0b6b55d88345@redhat.com> <0e1e2779-9316-b756-6cc8-e0c8add14a94@oracle.com> <1910961c-11bd-0e86-dd03-4fce66b9969f@redhat.com> <2b466176-b688-53a8-bef9-c7ec2c8c745b@oracle.com>
Message-ID: <42872a15-d26c-9798-c6a2-f3f7c945baf7@redhat.com>

Hi David,
thanks for reviewing. I'll be on vacation the next two weeks too, with only sporadic access to work stuff.

Yes, exposure will not be as good as otherwise, but it's not totally untested either: the serial code path is the same as the parallel one, the only difference is that it's not actually called by multiple threads. It's ok, I think.

I found two more issues that I think should be addressed:
- There are some counters in deflate_idle_monitors() and I'm not sure I correctly handle them in the split-up and MT'ed thread-local/global list deflation.
- nmethod marking seems to unconditionally poke true or something like that into nmethod fields. This doesn't hurt correctness-wise, but it's probably worth checking if it's already true, especially when doing this with multiple threads concurrently.

I'll send an updated patch around later, I hope I can get to it today...

Roman

> Hi Roman,
>
> I am about to disappear on an extended vacation so will let others pursue this. IIUC this is no longer an opt-in by the user at runtime, but an opt-in by the particular GC developers. Okay. My only concern with that is that if Shenandoah is the only GC that currently opts in, then this code is not going to get much testing and will be more prone to incidental breakage.
>
> Cheers,
> David
>
> On 2/06/2017 2:21 AM, Roman Kennke wrote:
>> [...]
>> >> Roman >> From robbin.ehn at oracle.com Fri Jun 2 10:39:11 2017 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Fri, 2 Jun 2017 12:39:11 +0200 Subject: RFR: Parallelize safepoint cleanup In-Reply-To: <42872a15-d26c-9798-c6a2-f3f7c945baf7@redhat.com> References: <486b5a72-bef8-4ebc-2729-3fe3aa3ab3b9@oracle.com> <5c80f8df-27c9-f9a9-dc6d-47f9c6019a61@redhat.com> <46ad874e-eb41-7927-265a-40dea92dfe1e@oracle.com> <5711258b-99b0-e06f-ba6e-0b6b55d88345@redhat.com> <0e1e2779-9316-b756-6cc8-e0c8add14a94@oracle.com> <1910961c-11bd-0e86-dd03-4fce66b9969f@redhat.com> <2b466176-b688-53a8-bef9-c7ec2c8c745b@oracle.com> <42872a15-d26c-9798-c6a2-f3f7c945baf7@redhat.com> Message-ID: <5e7c7d00-4acd-bea3-3525-33dbd9159efb@oracle.com> Hi Roman, On 06/02/2017 11:41 AM, Roman Kennke wrote: > Hi David, > thanks for reviewing. I'll be on vacation the next two weeks too, with > only sporadic access to work stuff. > Yes, exposure will not be as good as otherwise, but it's not totally > untested either: the serial code path is the same as the parallel, the > only difference is that it's not actually called by multiple threads. > It's ok I think. > > I found two more issues that I think should be addressed: > - There are some counters in deflate_idle_monitors() and I'm not sure I > correctly handle them in the split-up and MT'ed thread-local/ global > list deflation > - nmethod marking seems to unconditionally poke true or something like > that in nmethod fields. This doesn't hurt correctness-wise, but it's > probably worth checking if it's already true, especially when doing this > with multiple threads concurrently. > > I'll send an updated patch around later, I hope I can get to it today... I'll review that when you get it out. I think this looks as a reasonable step before we tackle this with a major effort, such as the JEP you and Carsten doing. And another effort to 'fix' nmethods marking. Internal discussion yesterday lead us to conclude that the runtime will probably need more threads. This would be a good driver to do a 'global' worker pool which serves both gc, runtime and safepoints with threads. > > Roman > >> Hi Roman, >> >> I am about to disappear on an extended vacation so will let others >> pursue this. IIUC this is longer an opt-in by the user at runtime, but >> an opt-in by the particular GC developers. Okay. My only concern with >> that is if Shenandoah is the only GC that currently opts in then this >> code is not going to get much testing and will be more prone to >> incidental breakage. As I mentioned before, it seem like Erik ? have some idea, maybe he can do this after his barrier patch. Thanks! /Robbin >> >> Cheers, >> David >> >> On 2/06/2017 2:21 AM, Roman Kennke wrote: >>> Am 01.06.2017 um 17:50 schrieb Roman Kennke: >>>> Am 01.06.2017 um 14:18 schrieb Robbin Ehn: >>>>> Hi Roman, >>>>> >>>>> On 06/01/2017 11:29 AM, Roman Kennke wrote: >>>>>> Am 31.05.2017 um 22:06 schrieb Robbin Ehn: >>>>>>> Hi Roman, I agree that is really needed but: >>>>>>> >>>>>>> On 05/31/2017 10:27 AM, Roman Kennke wrote: >>>>>>>> I realized that sharing workers with GC is not so easy. >>>>>>>> >>>>>>>> We need to be able to use the workers at a safepoint during >>>>>>>> concurrent >>>>>>>> GC work (which also uses the same workers). This does not only >>>>>>>> require >>>>>>>> that those workers be suspended, like e.g. >>>>>>>> SuspendibleThreadSet::yield(), but they need to be idle, i.e. have >>>>>>>> finished their tasks. 
This needs some careful handling to work >>>>>>>> without >>>>>>>> races: it requires a SuspendibleThreadSetJoiner around the >>>>>>>> corresponding >>>>>>>> run_task() call and also the tasks themselves need to join the >>>>>>>> STS and >>>>>>>> handle requests for safepoints not by yielding, but by leaving the >>>>>>>> task. >>>>>>>> This is far too peculiar for me to make the call to hook up GC >>>>>>>> workers >>>>>>>> for safepoint cleanup, and I thus removed those parts. I left the >>>>>>>> API in >>>>>>>> CollectedHeap in place. I think GC devs who know better about G1 >>>>>>>> and CMS >>>>>>>> should make that call, or else just use a separate thread pool. >>>>>>>> >>>>>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.05/ >>>>>>>> >>>>>>>> >>>>>>>> Is it ok now? >>>>>>> I still think you should put the "Parallel Safepoint Cleanup" >>>>>>> workers >>>>>>> inside Shenandoah, >>>>>>> so the SafepointSynchronizer only calls get_safepoint_workers, e.g.: >>>>>>> >>>>>>> _cleanup_workers = heap->get_safepoint_workers(); >>>>>>> _num_cleanup_workers = _cleanup_workers != NULL ? >>>>>>> _cleanup_workers->total_workers() : 1; >>>>>>> ParallelSPCleanupTask cleanup(_cleanup_subtasks); >>>>>>> StrongRootsScope srs(_num_cleanup_workers); >>>>>>> if (_cleanup_workers != NULL) { >>>>>>> _cleanup_workers->run_task(&cleanup, _num_cleanup_workers); >>>>>>> } else { >>>>>>> cleanup.work(0); >>>>>>> } >>>>>>> >>>>>>> That way you don't even need your new flags, but it will be up to >>>>>>> the >>>>>>> other GCs to make their worker available >>>>>>> or cheat with a separate workgang. >>>>>> I can do that, I don't mind. The question is, do we want that? >>>>> The problem is that we do not want to haste such decision, we believe >>>>> there is a better solution. >>>>> I think you also would want another solution. >>>>> But it's seems like such solution with 1 'global' thread pool either >>>>> own by GC or the VM it self is quite the undertaking. >>>>> Since this probably will not be done any time soon my suggestion is, >>>>> to not hold you back (we also want this), just to make >>>>> the code parallel and as an intermediate step ask the GC if it minds >>>>> sharing it's thread. >>>>> >>>>> Now when Shenandoah is merged it's possible that e.g. G1 will share >>>>> the code for a separate thread pool, do something of it's own or >>>>> wait until the bigger question about thread pool(s) have been >>>>> resolved. >>>>> >>>>> By adding a thread pool directly to the SafepointSynchronizer and >>>>> flags for it we might limit our future options. >>>>> >>>>>> I wouldn't call it 'cheating with a separate workgang' though. I see >>>>>> that both G1 and CMS suspend their worker threads at a safepoint. >>>>>> However: >>>>> Yes it's not cheating but I want decent heuristics between e.g. number >>>>> of concurrent marking threads and parallel safepoint threads since >>>>> they compete for cpu time. >>>>> As the code looks now, I think that decisions must be made by the GC. >>>> Ok, I see your point. I updated the proposed patch accordingly: >>>> >>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.06/ >>>> >>> Oops. Minor mistake there. Correction: >>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.07/ >>> >>> >>> (Removed 'class WorkGang' from safepoint.hpp, and forgot to add it into >>> collectedHeap.hpp, resulting in build failure...) 
>>> >>> Roman >>> > From calvinrsmith at hotmail.com Fri Jun 2 16:20:00 2017 From: calvinrsmith at hotmail.com (Calvin Smith) Date: Fri, 2 Jun 2017 16:20:00 +0000 Subject: Revew Proposal for transactional aware GC Message-ID: TransactionGC: Less GC for Transactional Systems Similar to proposal: JEP draft: Epsilon GC: The Arbitrarily Low Overhead Garbage (Non-)Collector Summary Start as normal and run with the chosen garbage collector. A class/method may be registered and once it starts to run and until the method exits all memory is allocated in a thread local allocation buffer and is not available for other threads. In addition there is no garbage collection for these objects during this time. Once the buffer is exhausted then an OutOfMemory will be raised and the stack unwind Goals When running in a request / response transaction environment do not collect objects created during the transaction. Non-Goals There is no goal to require any special Java code. The same code should work with or without this feature. Motivation When running in a request / response transaction environment there may be a need to create a lot of objects, however, none of these objects are required after the transaction has completed and if the transaction is short enough (single digit milliseconds) any garbage collection at all will delay the system. Some of this can be mitigated by carefully tracing and being aware of the objects that are created, however, due to presence of third-party code (some of which may be built into the JDK) this is not always possible. Description Standard JDK by default; enabled with special option: -XX:+UseTransactionGC. With this option the JDK works as normal until a given configured method is invoked. Once invoked all future allocations from that point on, on that thread, until the method exits is different. All allocations occur in a thread local allocation buffer where an allocation is simply updating a pointer to account for the allocation. There is no garbage tracking or collection within this method. Instead the memory buffer gets more filled up with each allocation. Once the method exits then the pointer is updated again to be reset at the place it was at the start of the method, thus de-allocating all the objects at one time. The goal is to avoid a GC pause to collect objects created during a transaction. A global GC may still run and may pause the thread. As there is no GC of this buffer all of the objects created during the transaction must fit within the TLAB at one time. Issues to solve / be aware of when the configured method is executing: * Any memory allocated may only be referenced by other objects also created during this time. For example a cache created before the method starts may not access any of these objects * Any memory allocated may only be referenced by the allocating thread. * Any attempt to break the prior two rules will result in a VM error * Due to first two rules there is no need for GC to occur, the memory may be de-allocated simply by setting the TLAB pointer back to it's original location. * finalize must be called on these objects prior to de-alloction, however, execution of the finalize must still honor the first two rules. * Any uncaught exceptions that pass out of the method must be moved prior to de-allocation such that the exception is no longer in the area to be cleared. Furthermore, any references that the exception contains must also be moved. 
* Any exception thrown and caught during this time may be treated like any other object and will be de-allocated at the exit of the method.
* If an object that is created in this method is required to be referenced from the outside, then it must be created outside the method. One such way is to place the data for the object on a queue and have a background thread process the queue and create the object. Adding to the queue must not create any objects, or the background thread cannot access them. Instead, pre-allocated entries may be used.
* The objects will still show up during a heap dump; perhaps they can be marked in some fashion to make it easier to recognize that they are not GC'able objects.
* JNI. Any JNI call cannot access the created objects after the method has ended.

Further enhancements:

* Synchronization - As none of the objects created in this mode can be accessed by other threads, there is no need for synchronization, so all synchronization can be removed / stubbed out.

Alternatives

Some of this may be done without changing the JVM, instead using a Java agent. When the method is invoked, the referenced classes are re-transformed: the methods of the referenced classes are converted to static, and all objects are created as bytes in a byte array. All field references are updated to read/write this byte array. An object reference is then just a number which is an index into the array. When an instance method is invoked, two extra parameters are passed: a) the byte array, and b) the index into the byte array that is the start of the instance.

From kirk at kodewerk.com Tue Jun 6 08:26:24 2017
From: kirk at kodewerk.com (Kirk Pepperdine)
Date: Tue, 6 Jun 2017 10:26:24 +0200
Subject: Parallel reference processing
Message-ID: <2C20D7DA-2E0C-44B4-B5CB-54201A670279@kodewerk.com>

Hi,

I keep running into cases where reference processing dominates the pause time budget (no matter which collector is configured). In all cases, configuring parallel reference processing helped enormously. Reference processing is single-threaded by default. I'm wondering if there is a reason why reference processing could not be parallel by default, or parallelized if the workload exceeds a reasonable threshold.

Kind regards,
Kirk Pepperdine

From shade at redhat.com Tue Jun 6 08:50:25 2017
From: shade at redhat.com (Aleksey Shipilev)
Date: Tue, 6 Jun 2017 10:50:25 +0200
Subject: Parallel reference processing
In-Reply-To: <2C20D7DA-2E0C-44B4-B5CB-54201A670279@kodewerk.com>
References: <2C20D7DA-2E0C-44B4-B5CB-54201A670279@kodewerk.com>
Message-ID: <1290fc41-01fd-767a-59c8-768162d1a98b@redhat.com>

On 06/06/2017 10:26 AM, Kirk Pepperdine wrote:
> Hi,
>
> I keep running into cases where reference processing dominates the pause time budget (no matter which collector is configured). In all cases, configuring parallel reference processing helped enormously. Reference processing is single-threaded by default. I'm wondering if there is a reason why reference processing could not be parallel by default, or parallelized if the workload exceeds a reasonable threshold.

See: https://bugs.openjdk.java.net/browse/JDK-8043575

-Aleksey
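The dynamic switching that JDK-8043575 proposes might take roughly the following shape; this is a guess under stated assumptions, not the actual patch, and the threshold value is made up.

```cpp
#include <cstddef>

// Decide serial vs. parallel reference processing per GC cycle from the
// number of discovered references, so that small workloads skip the
// worker start/stop overhead entirely.
bool use_parallel_ref_processing(bool mt_allowed, size_t discovered_refs) {
  const size_t parallel_threshold = 1000;  // hypothetical tuning knob
  return mt_allowed && discovered_refs > parallel_threshold;
}
```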
From sangheon.kim at oracle.com Tue Jun 6 17:44:06 2017
From: sangheon.kim at oracle.com (sangheon)
Date: Tue, 6 Jun 2017 10:44:06 -0700
Subject: Parallel reference processing
In-Reply-To: <2C20D7DA-2E0C-44B4-B5CB-54201A670279@kodewerk.com>
References: <2C20D7DA-2E0C-44B4-B5CB-54201A670279@kodewerk.com>
Message-ID: <2df56a2d-b219-9b84-0e84-967953f122dc@oracle.com>

Hi Kirk,

On 06/06/2017 01:26 AM, Kirk Pepperdine wrote:
> Hi,
>
> I keep running into cases where reference processing dominates the pause time budget (no matter which collector is configured). In all cases, configuring parallel reference processing helped enormously. Reference processing is single-threaded by default. I'm wondering if there is a reason why reference processing could not be parallel by default, or parallelized if the workload exceeds a reasonable threshold.

The biggest reason, I think, is that in some cases - if there are not many references [1] - the single-thread case is faster. Of course, this is controversial, as choosing a benchmark will show different results. Probably big enough applications tend to have many references. But this is why we don't set 'ParallelRefProcEnabled=true' as a default.

The current implementation spends some time on starting/stopping worker threads.
We start and stop worker threads 9 times (3 for SoftReference and 2 for each of the other types) for reference processing, and this makes it slower than the single-thread case in some cases.

JDK-8043575 proposes to dynamically switch between MT and single thread, and there are other CRs to enhance reference processing. I have a prototype but it needs more refining. Please keep an eye on this if you are interested. (Thanks, Aleksey, for the link in the other email thread.)

[1]: e.g. Most of the Specjvm2008 sub-tests don't use references. Derby is an exceptional case that shows over 12k FinalReferences. So a single thread is faster except in the Derby case.

Thanks,
Sangheon

> Kind regards,
> Kirk Pepperdine

From kirk at kodewerk.com Tue Jun 6 19:26:30 2017
From: kirk at kodewerk.com (Kirk Pepperdine)
Date: Tue, 6 Jun 2017 21:26:30 +0200
Subject: Parallel reference processing
In-Reply-To: <2df56a2d-b219-9b84-0e84-967953f122dc@oracle.com>
References: <2C20D7DA-2E0C-44B4-B5CB-54201A670279@kodewerk.com> <2df56a2d-b219-9b84-0e84-967953f122dc@oracle.com>
Message-ID: <05FF4DD7-5C2A-4061-AD55-E7AB30FAE1C1@kodewerk.com>

> On Jun 6, 2017, at 7:44 PM, sangheon wrote:
>
> [...]
>
> [1]: e.g. Most of the Specjvm2008 sub-tests don't use references. Derby is an exceptional case that shows over 12k FinalReferences. So a single thread is faster except in the Derby case.

SpecJVM doesn't represent the real world. In the real world most applications use weak, soft and final references with a sprinkling of Phantom. I think Aleksey's link was most interesting; my bad for not searching the bug database prior to posting.

Anyways, I don't mind charging clients a fee to tell them to turn on this flag but...

Kind regards,
Kirk

From sangheon.kim at oracle.com Tue Jun 6 19:40:15 2017
From: sangheon.kim at oracle.com (sangheon)
Date: Tue, 6 Jun 2017 12:40:15 -0700
Subject: Parallel reference processing
In-Reply-To: <05FF4DD7-5C2A-4061-AD55-E7AB30FAE1C1@kodewerk.com>
References: <2C20D7DA-2E0C-44B4-B5CB-54201A670279@kodewerk.com> <2df56a2d-b219-9b84-0e84-967953f122dc@oracle.com> <05FF4DD7-5C2A-4061-AD55-E7AB30FAE1C1@kodewerk.com>
Message-ID: <440ac23c-6f34-51a5-9bf6-c62696dc2d73@oracle.com>

On 06/06/2017 12:26 PM, Kirk Pepperdine wrote:
>> [...]
>
> SpecJVM doesn't represent the real world.

Absolutely! I was trying to answer the reason why ParallelRefProcEnabled is set to false as a default.

> In the real world most applications use weak, soft and final references with a sprinkling of Phantom.
> I think Aleksey's link was most interesting; my bad for not searching the bug database prior to posting.
>
> Anyways, I don't mind charging clients a fee to tell them to turn on this flag but...

Okay. I hope JDK-8043575 would help them.

Thanks,
Sangheon

> Kind regards,
> Kirk

From kirk at kodewerk.com Tue Jun 6 19:51:41 2017
From: kirk at kodewerk.com (Kirk Pepperdine)
Date: Tue, 6 Jun 2017 21:51:41 +0200
Subject: Parallel reference processing
In-Reply-To: <440ac23c-6f34-51a5-9bf6-c62696dc2d73@oracle.com>
References: <2C20D7DA-2E0C-44B4-B5CB-54201A670279@kodewerk.com> <2df56a2d-b219-9b84-0e84-967953f122dc@oracle.com> <05FF4DD7-5C2A-4061-AD55-E7AB30FAE1C1@kodewerk.com> <440ac23c-6f34-51a5-9bf6-c62696dc2d73@oracle.com>
Message-ID: <7C8B47AA-21B2-4253-90F5-96EAE8CA4574@kodewerk.com>

> On Jun 6, 2017, at 9:40 PM, sangheon wrote:
>
> On 06/06/2017 12:26 PM, Kirk Pepperdine wrote:
>> [...]
>
> Absolutely!
> I was trying to answer the reason why ParallelRefProcEnabled is set to false as a default.

I got that. I was trying to suggest that basing this decision on that benchmark isn't a great idea.

Kind regards,
Kirk
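A schematic of the gang start/stop cost sangheon describes above; this is not HotSpot source, just an illustration of why parallel reference processing can lose when few references are pending. Reference processing runs as a sequence of phases (3 for SoftReference, 2 for each other type, 9 gang activations in total), and in the parallel case every phase pays a full worker start/stop barrier.

```cpp
#include <cstddef>

typedef void (*RefPhaseFn)(unsigned worker_id);

static void run_phase(RefPhaseFn phase, bool parallel, unsigned num_workers) {
  if (parallel) {
    // Stand-in for WorkGang::run_task(): wake the gang, run the phase on
    // every worker, then wait for all of them. One start/stop per phase.
    for (unsigned i = 0; i < num_workers; i++) {
      phase(i);
    }
  } else {
    phase(0);  // serial: no coordination overhead at all
  }
}

void process_references(RefPhaseFn* phases, size_t num_phases,
                        bool parallel, unsigned num_workers) {
  for (size_t i = 0; i < num_phases; i++) {
    // With ParallelRefProcEnabled, each iteration is one gang activation;
    // with few pending references the barriers dominate the actual work.
    run_phase(phases[i], parallel, num_workers);
  }
}
```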
From sangheon.kim at oracle.com Tue Jun 6 20:36:56 2017
From: sangheon.kim at oracle.com (sangheon)
Date: Tue, 6 Jun 2017 13:36:56 -0700
Subject: Parallel reference processing
In-Reply-To: <7C8B47AA-21B2-4253-90F5-96EAE8CA4574@kodewerk.com>
References: <2C20D7DA-2E0C-44B4-B5CB-54201A670279@kodewerk.com> <2df56a2d-b219-9b84-0e84-967953f122dc@oracle.com> <05FF4DD7-5C2A-4061-AD55-E7AB30FAE1C1@kodewerk.com> <440ac23c-6f34-51a5-9bf6-c62696dc2d73@oracle.com> <7C8B47AA-21B2-4253-90F5-96EAE8CA4574@kodewerk.com>
Message-ID: <73255ea9-6ed2-2a66-f3cb-4ac4325af070@oracle.com>

On 06/06/2017 12:51 PM, Kirk Pepperdine wrote:
>> On Jun 6, 2017, at 9:40 PM, sangheon wrote:
>>
>> On 06/06/2017 12:26 PM, Kirk Pepperdine wrote:
>>>> On Jun 6, 2017, at 7:44 PM, sangheon wrote:
>>>>
>>>> Hi Kirk,
>>>>
>>>> On 06/06/2017 01:26 AM, Kirk Pepperdine wrote:
>>>>> Hi,
>>>>>
>>>>> I keep running into cases where reference processing dominates the pause time budget (no matter which collector is configured). In all cases configuring parallel reference processing helped enormously. Reference processing is single-threaded by default. I'm wondering if there is a reason reference processing couldn't be parallel by default, or be parallelized if the workload exceeds a reasonable threshold.
>>>> The biggest reason, I think, is that in some cases - if there are not many references [1] - the single-threaded case is faster. Of course, this is debatable, as different benchmarks will show different results. Big enough applications probably tend to have many references. But this is why we don't set 'ParallelRefProcEnabled=true' as the default.
>>>>
>>>> The current implementation spends some time on starting/stopping worker threads. We start and stop worker threads 9 times (3 times for SoftReference and 2 times for each of the other types) for reference processing, and in some cases this makes it slower than the single-threaded case.
>>>>
>>>> JDK-8043575 is proposing to dynamically switch between MT and single thread, and there are other CRs to enhance reference processing. I have a prototype but it needs more refining. Please keep an eye on this if you are interested. (Thanks, Aleksey, for the link in the other email thread.)
>>>>
>>>> [1]: e.g. most of the SPECjvm2008 sub-tests don't use references. Derby is the exceptional case, showing over 12k FinalReferences. So single thread is faster except in the Derby case.
>>>
>>> SpecJVM doesn't represent the real world.
>> Absolutely!
>> I was trying to explain why ParallelRefProcEnabled is set to false as the default.
>
> I got that. I was trying to suggest that basing this decision on that benchmark isn't a great idea.
Probably my explanation was incomplete.

The ParallelRefProcEnabled command-line option was introduced a long time ago with false as the default, and my previous answer about SPECjvm2008 was my guess from recent data gathered while investigating JDK-8043575. I was saying that if we don't have enough references to process, single thread is the better choice, so that could be the reason for the current default value. Or my guess could simply be wrong. :)

Probably you are saying that we have to use other benchmarks to decide the default value. May I ask what your recommendation for the benchmarks would be? I will not try to change the default value, but your recommendation would be helpful for further investigation.
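For completeness, turning it on by hand looks like this - MyApp is a placeholder, and -Xlog:gc+ref is the JDK 9 unified logging syntax; on JDK 8 the closest equivalent output comes from -XX:+PrintReferenceGC:

    java -XX:+UseG1GC -XX:+ParallelRefProcEnabled -Xlog:gc+ref=debug MyApp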
Thanks,
Sangheon

> Kind regards,
> Kirk

From stuart.monteith at linaro.org Wed Jun 7 16:20:55 2017
From: stuart.monteith at linaro.org (Stuart Monteith)
Date: Wed, 7 Jun 2017 17:20:55 +0100
Subject: Review Proposal for transactional aware GC
In-Reply-To:
References:
Message-ID:

Hello,

This is similar to how the Realtime Specification for Java (RTSJ - JSR-1) manages memory. If you aren't aware, see: http://www.rtsj.org/specjavadoc/book_index.html . The concept they use is called "ScopedMemory". It is expected that the VM will throw an appropriate exception when references to objects in ScopedMemory are stored in objects outside - I think it is a javax.realtime.MemoryAccessError. Is this what you mean by a "VM error"? They also solve the problem of how exceptions are allocated by use of a javax.realtime.ThrowBoundaryError, which is thrown outside of the scope of the ScopedMemory where the exception was raised.

There is precedent for resetting the whole VM between transactions, but I won't say more on that.

BR,
Stuart

On 2 June 2017 at 17:20, Calvin Smith wrote:
> TransactionGC: Less GC for Transactional Systems
>
> Similar to proposal: JEP draft: Epsilon GC: The Arbitrarily Low Overhead Garbage (Non-)Collector
>
> Summary
>
> Start as normal and run with the chosen garbage collector. A class/method may be registered; from when it starts to run until the method exits, all memory is allocated in a thread-local allocation buffer and is not available to other threads. In addition, there is no garbage collection for these objects during this time. Once the buffer is exhausted, an OutOfMemoryError will be raised and the stack will unwind.
>
> Goals
>
> When running in a request/response transaction environment, do not collect objects created during the transaction.
>
> Non-Goals
>
> There is no goal to require any special Java code. The same code should work with or without this feature.
>
> Motivation
>
> When running in a request/response transaction environment there may be a need to create a lot of objects; however, none of these objects are required after the transaction has completed, and if the transaction is short enough (single-digit milliseconds) any garbage collection at all will delay the system. Some of this can be mitigated by carefully tracing and being aware of the objects that are created; however, due to the presence of third-party code (some of which may be built into the JDK) this is not always possible.
>
> Description
>
> Standard JDK by default; enabled with a special option: -XX:+UseTransactionGC.
>
> With this option the JDK works as normal until a given configured method is invoked. Once it is invoked, all allocations from that point on, on that thread, until the method exits, are handled differently. All allocations occur in a thread-local allocation buffer, where an allocation simply updates a pointer to account for the allocation. There is no garbage tracking or collection within this method. Instead, the buffer fills up with each allocation. Once the method exits, the pointer is reset to where it was at the start of the method, thus de-allocating all the objects at once.
>
> The goal is to avoid a GC pause to collect objects created during a transaction. A global GC may still run and may pause the thread. As there is no GC of this buffer, all of the objects created during the transaction must fit within the TLAB at one time.
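A minimal sketch of the core mechanism described above - all names are illustrative, and only ThreadLocalAllocBuffer resembles a real HotSpot class:

    // Illustrative only: O(1) de-allocation by saving and restoring the
    // TLAB bump pointer around the registered method.
    class TransactionScope : public StackObj {
      ThreadLocalAllocBuffer* _tlab;
      HeapWord*               _saved_top;
    public:
      TransactionScope(ThreadLocalAllocBuffer* tlab)
        : _tlab(tlab), _saved_top(tlab->top()) { }
      ~TransactionScope() {
        // Everything allocated inside the scope dies here at once.
        _tlab->set_top(_saved_top);
      }
    };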
> Issues to solve / be aware of when the configured method is executing:
>
> * Any memory allocated may only be referenced by other objects also created during this time. For example, a cache created before the method starts may not access any of these objects.
> * Any memory allocated may only be referenced by the allocating thread.
> * Any attempt to break the prior two rules will result in a VM error.
> * Due to the first two rules there is no need for GC to occur; the memory may be de-allocated simply by setting the TLAB pointer back to its original location.
> * finalize must be called on these objects prior to de-allocation; however, execution of the finalize must still honor the first two rules.
> * Any uncaught exceptions that pass out of the method must be moved prior to de-allocation, such that the exception is no longer in the area to be cleared. Furthermore, any references that the exception contains must also be moved.
> * Any exception thrown and caught during this time may be treated like any other object and will be de-allocated at the exit of the method.
> * If an object that is created in this method is required to be referenced from the outside, then it must be created outside the method. One such way is to place the data for the object on a queue and have a background thread process the queue and create the object. Adding to the queue must not create any objects, since the background thread could not access them; instead, pre-allocated entries may be used.
> * The objects will still show up during a heap dump; perhaps they can be marked in some fashion to make it easier to recognize that they are not GC'able objects.
> * JNI: no JNI call may access the created objects after the method has ended.
>
> Further enhancements:
>
> * Synchronization - As none of the objects created in this mode can be accessed by other threads there is no need for synchronization, so all synchronization can be removed / stubbed out.
>
> Alternatives
>
> Some of this may be done without changing the JVM, using a Java agent instead. When the method is invoked, the classes referenced are re-transformed: the methods of the referenced classes are converted to static, and all objects are created as bytes in a byte array. All field references are updated to read/write this byte array. An object reference is then just a number which is an index into the array. When an instance method is invoked, two extra parameters are passed: a) the byte array, and b) the index into the byte array that is the start of the instance.

From kirk at kodewerk.com Wed Jun 7 06:17:19 2017
From: kirk at kodewerk.com (Kirk Pepperdine)
Date: Wed, 7 Jun 2017 08:17:19 +0200
Subject: Parallel reference processing
In-Reply-To: <73255ea9-6ed2-2a66-f3cb-4ac4325af070@oracle.com>
References: <2C20D7DA-2E0C-44B4-B5CB-54201A670279@kodewerk.com> <2df56a2d-b219-9b84-0e84-967953f122dc@oracle.com> <05FF4DD7-5C2A-4061-AD55-E7AB30FAE1C1@kodewerk.com> <440ac23c-6f34-51a5-9bf6-c62696dc2d73@oracle.com> <7C8B47AA-21B2-4253-90F5-96EAE8CA4574@kodewerk.com> <73255ea9-6ed2-2a66-f3cb-4ac4325af070@oracle.com>
Message-ID:

>>>>> JDK-8043575 is proposing to dynamically switch between MT and single thread, and there are other CRs to enhance reference processing.
>>>>> I have a prototype but it needs more refining. Please keep an eye on this if you are interested. (Thanks, Aleksey, for the link in the other email thread.)
>>>>>
>>>>> [1]: e.g.
>>>>> most of the SPECjvm2008 sub-tests don't use references. Derby is the exceptional case, showing over 12k FinalReferences. So single thread is faster except in the Derby case.
>>>>
>>>> SpecJVM doesn't represent the real world.
>>> Absolutely!
>>> I was trying to explain why ParallelRefProcEnabled is set to false as the default.
>>
>> I got that. I was trying to suggest that basing this decision on that benchmark isn't a great idea.
> Probably my explanation was incomplete.

I think we're talking past each other; my apologies for being a bit too terse. The referenced bug report seems to cover all of my concerns.

> The ParallelRefProcEnabled command-line option was introduced a long time ago with false as the default, and my previous answer about SPECjvm2008 was my guess from recent data gathered while investigating JDK-8043575. I was saying that if we don't have enough references to process, single thread is the better choice, so that could be the reason for the current default value. Or my guess could simply be wrong. :)

Right, hence my comment that SpecJVM isn't a great benchmark: it doesn't represent enterprise applications and hence has nothing useful to say about reference processing, which is commonly seen in enterprise applications.

> Probably you are saying that we have to use other benchmarks to decide the default value.
> May I ask what your recommendation for the benchmarks would be?

I don't have a specific benchmark for this. I'm relying on observations made from customers' applications. I don't know if the SPEC application server benchmark addresses this question; I've not run it in quite some time, so I don't recall. At any rate, it is very clear that real-world applications use many frameworks that rely heavily on the use of reference types - for example, Hibernate with secondary caching turned on. CMS is sensitive, but G1's remark phase appears to be exceptionally sensitive to the number of references it processes. I've attached a pause time chart and a G1 breakout of the other phases that is very typical of what I'm seeing.

Kind regards,
Kirk

(Attachments: PastedGraphic-2.tiff, PastedGraphic-1.tiff - the pause time and phase breakout charts mentioned above)

From sangheon.kim at oracle.com Wed Jun 7 20:28:56 2017
From: sangheon.kim at oracle.com (sangheon)
Date: Wed, 7 Jun 2017 13:28:56 -0700
Subject: Parallel reference processing
In-Reply-To:
References: <2C20D7DA-2E0C-44B4-B5CB-54201A670279@kodewerk.com> <2df56a2d-b219-9b84-0e84-967953f122dc@oracle.com> <05FF4DD7-5C2A-4061-AD55-E7AB30FAE1C1@kodewerk.com> <440ac23c-6f34-51a5-9bf6-c62696dc2d73@oracle.com> <7C8B47AA-21B2-4253-90F5-96EAE8CA4574@kodewerk.com> <73255ea9-6ed2-2a66-f3cb-4ac4325af070@oracle.com>
Message-ID: <70472822-8015-cd9f-97ec-40d377f40aef@oracle.com>

On 06/06/2017 11:17 PM, Kirk Pepperdine wrote:
>>>>>> Derby is the exceptional case, showing over 12k FinalReferences. So single thread is faster except in the Derby case.
>>>>>
>>>>> SpecJVM doesn't represent the real world.
>>>> Absolutely!
>>>> I was trying to explain why ParallelRefProcEnabled is set to false as the default.
>>>
>>> I got that. I was trying to suggest that basing this decision on that benchmark isn't a great idea.
>> Probably my explanation was incomplete.
>
> I think we're talking past each other; my apologies for being a bit too terse.
Same here. :)
> The referenced bug report seems to cover all of my concerns.
Good to hear that the CR would cover your concerns.
>
>> The ParallelRefProcEnabled command-line option was introduced a long time ago with false as the default, and my previous answer about SPECjvm2008 was my guess from recent data gathered while investigating JDK-8043575. I was saying that if we don't have enough references to process, single thread is the better choice, so that could be the reason for the current default value. Or my guess could simply be wrong. :)
>
> Right, hence my comment that SpecJVM isn't a great benchmark: it doesn't represent enterprise applications and hence has nothing useful to say about reference processing, which is commonly seen in enterprise applications.
Yes, SpecJVM doesn't represent enterprise applications. I was trying to say that the MTness of reference processing is mostly affected by the total number of references; mentioning the benchmark name just added noise.

JDK-8043575 is mostly about dynamically choosing MTness for reference processing. Focusing on the switch (turning the option on/off), any application that shows the relevant aspects (many references, limited references, etc.) seems okay to me. In my case, as SPECjvm2008-Derby has many final references (over 12k), my prototype should show almost the same result as "baseline, +ParallelRefProcEnabled". And for the other sub-tests of SPECjvm2008, my prototype should show almost the same as "baseline, -ParallelRefProcEnabled", because with limited references single thread shows better results. Initially I worked with a micro-benchmark, but I also wanted to test with a known benchmark as well. I hope this explains why I used SPECjvm2008.

>
>> Probably you are saying that we have to use other benchmarks to decide the default value.
>> May I ask what your recommendation for the benchmarks would be?
>
> I don't have a specific benchmark for this. I'm relying on observations made from customers' applications. I don't know if the SPEC application server benchmark addresses this question; I've not run it in quite some time, so I don't recall. At any rate, it is very clear that real-world applications use many frameworks that rely heavily on the use of reference types - for example, Hibernate with secondary caching turned on. CMS is sensitive, but G1's remark phase appears to be exceptionally sensitive to the number of references it processes. I've attached a pause time chart and a G1 breakout of the other phases that is very typical of what I'm seeing.
Thank you for the attachments. I will analyze a bit more.

Thanks,
Sangheon

> Kind regards,
> Kirk
From mark.reinhold at oracle.com Wed Jun 7 22:42:05 2017
From: mark.reinhold at oracle.com (mark.reinhold at oracle.com)
Date: Wed, 7 Jun 2017 15:42:05 -0700 (PDT)
Subject: JEP 189: Shenandoah: An Ultra-Low-Pause-Time Garbage Collector
Message-ID: <20170607224205.4A45DF977C@eggemoggin.niobe.net>

New JEP Candidate: http://openjdk.java.net/jeps/189

- Mark

From mark.reinhold at oracle.com Wed Jun 7 23:12:59 2017
From: mark.reinhold at oracle.com (mark.reinhold at oracle.com)
Date: Wed, 7 Jun 2017 16:12:59 -0700 (PDT)
Subject: JEP 304: Garbage-Collector Interface
Message-ID: <20170607231259.1EE66F978A@eggemoggin.niobe.net>

New JEP Candidate: http://openjdk.java.net/jeps/304

- Mark

From stefan.johansson at oracle.com Thu Jun 8 12:35:58 2017
From: stefan.johansson at oracle.com (Stefan Johansson)
Date: Thu, 8 Jun 2017 14:35:58 +0200
Subject: RFR: 8177544: Restructure G1 Full GC code
Message-ID: <62d1f02b-1fc0-ffcf-b8e0-e88ebacecebe@oracle.com>

Hi,

Please review this enhancement:
https://bugs.openjdk.java.net/browse/JDK-8177544

Webrev:
http://cr.openjdk.java.net/~sjohanss/8177544/hotspot.00/

Summary:
This is more or less only code moving around. The function do_full_collection in G1CollectedHeap is very large; breaking it up into smaller parts and grouping together some of the stack objects helps readability.

In addition to splitting the large function into smaller ones I've introduced two new classes:
- G1FullGCScope, which groups most of the previously spread-out stack objects.
- G1SerialCollector, which handles the interaction with G1MarkSweep.

Making this change will simplify future changes to the full GC.
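To give a feel for the shape of the first class - a simplified sketch of the idea only, not the actual webrev code:

    // Sketch: one stack object owning the marks/timers that
    // do_full_collection previously created as individual locals.
    class G1FullGCScope : public StackObj {
      ResourceMark   _rm;
      IsGCActiveMark _active_mark;
      GCTraceCPUTime _cpu_time;
      // ... plus timer, tracer and soft-ref policy state
    public:
      G1FullGCScope() { }
    };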
Testing:
* Locally run JTREG tests
* RBT hotspot tier 2 & 3

Thanks,
Stefan

From Milan.Mimica at infobip.com Thu Jun 8 12:57:27 2017
From: Milan.Mimica at infobip.com (Milan Mimica)
Date: Thu, 8 Jun 2017 12:57:27 +0000
Subject: [PATCH] JDK-8176571: Fine bitmaps should be allocated as belonging to mtGC
In-Reply-To: <6110539E-B4E2-475D-8199-7C30449960D2@oracle.com>
References: <1495365159435.54025@infobip.com> <1495434908161.70459@infobip.com> <1495466592.2573.76.camel@oracle.com> <1495734990075.28893@infobip.com> <6110539E-B4E2-475D-8199-7C30449960D2@oracle.com>
Message-ID: <1496926647107.27811@infobip.com>

Milan Mimica, Senior Software Engineer / Division Lead

> From: Kim Barrett
> Sent: Saturday, May 27, 2017 01:00
>
> We appreciate your interest in working on this and are happy to help, but we really do need to check off this process item before going any deeper.

Filled out and sent.

> Yes, some refactoring seems required in order to properly fix JDK-8176571. That's what I meant by:
>
>> On May 22, 2017, at 4:41 PM, Kim Barrett wrote:
>>> There doesn't seem to be a good path into the functionality provided by ArrayAllocator<> that has such a runtime MEMFLAGS value [...] The lack of a runtime-only path for propagating that information would need to be added [sic. fixed].

Okay. There are two patches.

- refactor_array_allocator.diff
Pass MEMFLAGS down to the concrete allocator via the call stack instead of using a template parameter. This does feel right, especially because some methods don't even use MEMFLAGS.

- heapBitMap_nmt.diff
Changed CHeapBitMap to use a configurable NMT pool. Changed all (not just 'fine' bitmaps) G1 usages of CHeapBitMap to specify mtGC. Had to add a pair of CHeapBitMap constructors to make test_bitMap.cpp happy. That doesn't feel right, but I don't know what else to do, except to rewrite the test.
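In outline, the first patch moves the interface in this direction - a simplified sketch, not the actual diff:

    // Before: the NMT category was baked in at compile time, roughly
    //   template <class E, MEMFLAGS F> class ArrayAllocator { ... };
    // After: the category is a runtime argument, so callers that only
    // know their MEMFLAGS value at run time can pass it through.
    template <class E>
    class ArrayAllocator : public AllStatic {
    public:
      static E* allocate(size_t length, MEMFLAGS flags);
      static void free(E* addr, size_t length);
    };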
Btw, what are your plans for C++ >= 11? For example, I could use delegating constructors here, or std::is_same.

From kim.barrett at oracle.com Fri Jun 9 01:04:33 2017
From: kim.barrett at oracle.com (Kim Barrett)
Date: Thu, 8 Jun 2017 21:04:33 -0400
Subject: [PATCH] JDK-8176571: Fine bitmaps should be allocated as belonging to mtGC
In-Reply-To: <1496926647107.27811@infobip.com>
References: <1495365159435.54025@infobip.com> <1495434908161.70459@infobip.com> <1495466592.2573.76.camel@oracle.com> <1495734990075.28893@infobip.com> <6110539E-B4E2-475D-8199-7C30449960D2@oracle.com> <1496926647107.27811@infobip.com>
Message-ID:

> On Jun 8, 2017, at 8:57 AM, Milan Mimica wrote:
>
> Milan Mimica, Senior Software Engineer / Division Lead
>> From: Kim Barrett
>> Sent: Saturday, May 27, 2017 01:00
>>
>> We appreciate your interest in working on this and are happy to help, but we really do need to check off this process item before going any deeper.
>
> Filled out and sent.

Thanks.

>> Yes, some refactoring seems required in order to properly fix JDK-8176571. That's what I meant by:
>>
>> On May 22, 2017, at 4:41 PM, Kim Barrett wrote:
>>> There doesn't seem to be a good path into the functionality provided by ArrayAllocator<> that has such a runtime MEMFLAGS value [...] The lack of a runtime-only path for propagating that information would need to be added [sic. fixed].
>
> Okay. There are two patches.

Thanks for splitting this into a refactoring followed by the use of that refactoring. That should make reviewing and discussion easier. I'm looking at the refactoring part, but am running out of time today.

We should probably have a new RFE for the refactoring. I'll take care of that tomorrow.

Note that the refactoring patch doesn't apply cleanly to jdk10/hs tip. There's a merge conflict with the fix for JDK-8168467 (resolved 2017/03/15). After dealing with that, the refactoring looks good to me on an initial pass. I want to take a more careful look tomorrow.

You'll need an Oracle sponsor, since you are not (yet) a committer, and also because these changes affect hotspot, so they need to eventually be pushed via jprt. I can be the sponsor for the refactoring.

What testing has been done? And are there any tests you can point to that are directly affected? (I already know about TestArrayAllocatorMallocLimit.java.) I'll probably want to run some tests using our internal test facilities as part of sponsoring.

I haven't looked at the second patch at all yet.

> - refactor_array_allocator.diff
> Pass MEMFLAGS down to the concrete allocator via the call stack instead of using a template parameter. This does feel right, especially because some methods don't even use MEMFLAGS.
>
> - heapBitMap_nmt.diff
> Changed CHeapBitMap to use a configurable NMT pool. Changed all (not just 'fine' bitmaps) G1 usages of CHeapBitMap to specify mtGC. Had to add a pair of CHeapBitMap constructors to make test_bitMap.cpp happy. That doesn't feel right, but I don't know what else to do, except to rewrite the test.
>
> Btw, what are your plans for C++ >= 11? For example I could use delegating constructors here, or std::is_same.

I think that's unlikely to happen soon, though there is interest. But there's also a fair amount of work involved, which needs to be balanced against other tasks. I think going beyond C++11 isn't even feasible right now, as some of the relevant compilers don't yet have a version that supports C++14.

There's in-progress work to add some metaprogramming utilities, including IsSame, as we keep encountering places where such things would be useful. I'm expecting that to show up pretty soon.

From kirk at kodewerk.com Fri Jun 9 15:17:30 2017
From: kirk at kodewerk.com (Kirk Pepperdine)
Date: Fri, 9 Jun 2017 17:17:30 +0200
Subject: Parallel reference processing
In-Reply-To: <70472822-8015-cd9f-97ec-40d377f40aef@oracle.com>
References: <2C20D7DA-2E0C-44B4-B5CB-54201A670279@kodewerk.com> <2df56a2d-b219-9b84-0e84-967953f122dc@oracle.com> <05FF4DD7-5C2A-4061-AD55-E7AB30FAE1C1@kodewerk.com> <440ac23c-6f34-51a5-9bf6-c62696dc2d73@oracle.com> <7C8B47AA-21B2-4253-90F5-96EAE8CA4574@kodewerk.com> <73255ea9-6ed2-2a66-f3cb-4ac4325af070@oracle.com> <70472822-8015-cd9f-97ec-40d377f40aef@oracle.com>
Message-ID: <50ABF569-87D4-4A1A-9619-935D54880062@kodewerk.com>

> I was trying to say that the MTness of reference processing is mostly affected by the total number of references; mentioning the benchmark name just added noise.
> JDK-8043575 is mostly about dynamically choosing MTness for reference processing. Focusing on the switch (turning the option on/off), any application that shows the relevant aspects (many references, limited references, etc.) seems okay to me. In my case, as SPECjvm2008-Derby has many final references (over 12k), my prototype should show almost the same result as "baseline, +ParallelRefProcEnabled". And for the other sub-tests of SPECjvm2008, my prototype should show almost the same as "baseline, -ParallelRefProcEnabled", because with limited references single thread shows better results.

Right, and this is an issue in that I often see a range of 100-250K references processed.

Kind regards,
Kirk

From sangheon.kim at oracle.com Fri Jun 9 23:57:54 2017
From: sangheon.kim at oracle.com (sangheon)
Date: Fri, 9 Jun 2017 16:57:54 -0700
Subject: RFR(S): 8173335: Improve logging for j.l.ref.reference processing
Message-ID:

Hi all,

Can I have some reviews for changes to improve logging for j.l.ref.Reference processing?

This patch is proposing to add logs for balance queue time, phase 1~3 times of reference processing, and enqueue time for discovered references at debug level, and worker distribution stats at trace level. It also includes trace events for those cases.

The log will be changed like below:

* Before
[debug][gc,ref] GC(9) SoftReference 0.581ms
[debug][gc,ref] GC(9) WeakReference 1.066ms
[debug][gc,ref] GC(9) FinalReference 0.376ms
[debug][gc,ref] GC(9) PhantomReference 0.468ms
[debug][gc,ref] GC(9) JNI Weak Reference 0.005ms
[debug][gc,ref] GC(9) Ref Counts: Soft: 0 Weak: 0 Final: 0 Phantom: 0

* After
[debug][gc,ref] GC(5) SoftReference 0.895ms
[debug][gc,ref] GC(5) Balance queues: 0.001ms
[debug][gc,ref] GC(5) Phase1: 0.456ms
[trace][gc,ref] GC(5) Process lists (ms) Min: 0.0, Avg: 0.3, Max: 0.3, Diff: 0.3, Sum: 5.8, Workers: 23
[debug][gc,ref] GC(5) Phase2: 0.059ms
[trace][gc,ref] GC(5) Process lists (ms) Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0, Workers: 23
[debug][gc,ref] GC(5) Phase3: 0.374ms
[trace][gc,ref] GC(5) Process lists (ms) Min: 0.0, Avg: 0.2, Max: 0.3, Diff: 0.3, Sum: 4.0, Workers: 23
[debug][gc,ref] GC(5) Cleared: 0
[debug][gc,ref] GC(5) Discovered: 0
...
[debug][gc,ref] GC(5) JNI Weak Reference 0.003ms
[debug][gc,ref] GC(5) Enqueue reference lists 0.081ms
[debug][gc,ref] GC(5) Counts: Soft: 0 Weak: 0 Final: 0 Phantom: 0

CR: https://bugs.openjdk.java.net/browse/JDK-8173335
webrev: http://cr.openjdk.java.net/~sangheki/8173335/webrev.0

Testing: JPRT and local tests with combinations of +/-ParallelRefProcEnabled and GC types.

Thanks,
Sangheon

From Milan.Mimica at infobip.com Sat Jun 10 15:07:29 2017
From: Milan.Mimica at infobip.com (Milan Mimica)
Date: Sat, 10 Jun 2017 15:07:29 +0000
Subject: [PATCH] JDK-8176571: Fine bitmaps should be allocated as belonging to mtGC
In-Reply-To: <...>
References: <1495365159435.54025@infobip.com> <1495434908161.70459@infobip.com> <1495466592.2573.76.camel@oracle.com> <1495734990075.28893@infobip.com> <6110539E-B4E2-475D-8199-7C30449960D2@oracle.com> <1496926647107.27811@infobip.com>
Message-ID: <1497107249632.26184@infobip.com>

Milan Mimica, Senior Software Engineer / Division Lead

> From: Kim Barrett
> Sent: Friday, June 9, 2017 03:04
>
> Note that the refactoring patch doesn't apply cleanly to jdk10/hs tip. There's a merge conflict with the fix for JDK-8168467 (resolved 2017/03/15).

Oh, I've developed it against jdk10/jdk10. Care to explain the difference a bit? Attached are patches against jdk10/hs.

> What testing has been done? And are there any tests you can point to that are directly affected? (I already know about TestArrayAllocatorMallocLimit.java.) I'll probably want to run some tests using our internal test facilities as part of sponsoring.

I have run jtreg on my laptop. There are some failures actually, but they happen on a clean repo as well. I'm not aware of anything else.

From thomas.schatzl at oracle.com Mon Jun 12 11:34:38 2017
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Mon, 12 Jun 2017 13:34:38 +0200
Subject: RFR (7xS): 8178148: Log more detailed information about scan rs phase
In-Reply-To: <04ab36fb-afaf-0ed9-6480-b955474d4bee@oracle.com>
References: <1491910205.2754.31.camel@oracle.com> <1493986396.2777.61.camel@oracle.com> <41d0b773-06fd-dfef-beda-2d62797210d9@oracle.com> <1494849002.2707.20.camel@oracle.com> <1495543832.2781.37.camel@oracle.com> <04ab36fb-afaf-0ed9-6480-b955474d4bee@oracle.com>
Message-ID: <1497267278.2777.2.camel@oracle.com>

Hi all,

sorry for another round of reviews: Erik asked me to add a gtest test for the linked subitems, both for the (existing) set() and (new) add() methods.
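The new test is roughly of this flavor - a sketch only; the actual gtest is in the webrev below:

    // set() overwrites a worker's value, add() accumulates into it.
    TEST(WorkerDataArray, set_and_add) {
      WorkerDataArray<size_t> data(2, "Test (ms):");
      data.set(0, 5);
      data.add(0, 3);
      ASSERT_EQ((size_t)8, data.get(0));
    }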
Webrev: http://cr.openjdk.java.net/~tschatzl/8178148/webrev.2_to_3/ (diff), http://cr.openjdk.java.net/~tschatzl/8178148/webrev.3/ (full)

Testing:
local testing, jprt

Thanks,
  Thomas

On Tue, 2017-05-23 at 12:15 -0700, sangheon wrote:
> Hi Thomas,
>
> On 05/23/2017 05:50 AM, Thomas Schatzl wrote:
>> Hi all,
>>
>> unfortunately, to support some code there is need for one more public method in the g1gcphasetimes class.
>>
>> Webrev:
>> http://cr.openjdk.java.net/~tschatzl/8178148/webrev.1_to_2/ (diff)
>> http://cr.openjdk.java.net/~tschatzl/8178148/webrev.2/ (full)
> Webrev.2 still looks good to me.
>
> Thanks,
> Sangheon
>
>> Sorry for the issue.
>>
>> Thanks,
>>   Thomas

From shade at redhat.com Mon Jun 12 16:06:43 2017
From: shade at redhat.com (Aleksey Shipilev)
Date: Mon, 12 Jun 2017 18:06:43 +0200
Subject: RFR(S): 8173335: Improve logging for j.l.ref.reference processing
In-Reply-To:
References:
Message-ID:

On 06/10/2017 01:57 AM, sangheon wrote:
> CR: https://bugs.openjdk.java.net/browse/JDK-8173335
> webrev: http://cr.openjdk.java.net/~sangheki/8173335/webrev.0

Oh, good! I had to instrument these by hand when optimizing RP paths.

Comments after a brief look:

*) So, the path with a NULL executor is also not handling the timer? E.g. CMS:

  5262   if (rp->processing_is_mt()) {
  5263     rp->balance_all_queues();
  5264     CMSRefProcTaskExecutor task_executor(*this);
  5265     rp->enqueue_discovered_references(&task_executor, _gc_timer_cm);
  5266   } else {
  5267     rp->enqueue_discovered_references(NULL);
  5268   }

*) I would leave the "Ref Counts" line as usual for compatibility reasons. Changing it to "Counts" would force GC log parsers to handle that corner case too.

*) This may reuse Indents?

    95       out->print("%s", "    ");

*) Probably makes sense to "hg mv -A" the workerDataArray files to preserve the Mercurial history -- webrev should say something like "copied from ...", IIRC.

Thanks,
-Aleksey

From sangheon.kim at oracle.com Tue Jun 13 00:13:21 2017
From: sangheon.kim at oracle.com (sangheon)
Date: Mon, 12 Jun 2017 17:13:21 -0700
Subject: RFR(S): 8173335: Improve logging for j.l.ref.reference processing
In-Reply-To:
References:
Message-ID:

Hi Aleksey,

Thanks for the review.

On 06/12/2017 09:06 AM, Aleksey Shipilev wrote:
> On 06/10/2017 01:57 AM, sangheon wrote:
>> CR: https://bugs.openjdk.java.net/browse/JDK-8173335
>> webrev: http://cr.openjdk.java.net/~sangheki/8173335/webrev.0
> Oh, good! I had to instrument these by hand when optimizing RP paths.
>
> Comments after a brief look:
>
> *) So, the path with a NULL executor is also not handling the timer? E.g. CMS:
>
>   5262   if (rp->processing_is_mt()) {
>   5263     rp->balance_all_queues();
>   5264     CMSRefProcTaskExecutor task_executor(*this);
>   5265     rp->enqueue_discovered_references(&task_executor, _gc_timer_cm);
>   5266   } else {
>   5267     rp->enqueue_discovered_references(NULL);
>   5268   }
Fixed to use timers for the similar cases that you pointed out. Thanks for catching this!
I started this CR as a part of MT ref. processing (JDK-8043575), so I only added it to that path. But this should be fixed.

> *) I would leave the "Ref Counts" line as usual for compatibility reasons. Changing it to "Counts" would force GC log parsers to handle that corner case too.
Changed, 'Counts -> Ref Counts'.

> *) This may reuse Indents?
>
>     95       out->print("%s", "    ");
Fixed to use Indents[2].

> *) Probably makes sense to "hg mv -A" the workerDataArray files to preserve the Mercurial history -- webrev should say something like "copied from ...", IIRC.
Fixed.
webrev:
http://cr.openjdk.java.net/~sangheki/8173335/webrev.1/
http://cr.openjdk.java.net/~sangheki/8173335/webrev.1_to_0

Thanks,
Sangheon

> Thanks,
> -Aleksey

From thomas.schatzl at oracle.com Tue Jun 13 09:36:06 2017
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Tue, 13 Jun 2017 11:36:06 +0200
Subject: RFR: 8177544: Restructure G1 Full GC code
In-Reply-To: <62d1f02b-1fc0-ffcf-b8e0-e88ebacecebe@oracle.com>
References: <62d1f02b-1fc0-ffcf-b8e0-e88ebacecebe@oracle.com>
Message-ID: <1497346566.2829.33.camel@oracle.com>

Hi,

thanks for your hard work on the parallel full gc that starts with this refactoring :)

On Thu, 2017-06-08 at 14:35 +0200, Stefan Johansson wrote:
> Hi,
>
> Please review this enhancement:
> https://bugs.openjdk.java.net/browse/JDK-8177544
>
> Webrev:
> http://cr.openjdk.java.net/~sjohanss/8177544/hotspot.00/
>
> Summary:
> This is more or less only code moving around. The function do_full_collection in G1CollectedHeap is very large; breaking it up into smaller parts and grouping together some of the stack objects helps readability.
>
> In addition to splitting the large function into smaller ones I've introduced two new classes:
> - G1FullGCScope, which groups most of the previously spread-out stack objects.
> - G1SerialCollector, which handles the interaction with G1MarkSweep.
>
> Making this change will simplify future changes to the full GC.
>
> Testing:
> * Locally run JTREG tests
> * RBT hotspot tier 2 & 3

Some initial thoughts on the change, mostly to start a discussion:

- G1FullGCScope class: please add a line describing what the purpose of the class is.

- A better name for G1CollectedHeap::reset_card_cache_and_queue() could be abort_refinement().

- G1CollectedHeap.cpp:1145: please remove the word "stale" in that comment. It confuses me because at that point "stale" cards are defined for a particular context that does not fit here.

- Can you move all the printing after collection (g1CollectedHeap.cpp:1239-1249) into an extra method too? Something like "print_heap_after_full_collection()"? (I think there is some argument for also having a print_heap_before_full_collection() method.)

- G1SerialCollector is actually a "G1SerialFullCollector". I do not remember whether the follow-up change removes it again anyway, but it seems to be a simple renaming.

- G1SerialCollector interface: while I could live with the prepare/do/complete naming of the methods, the typical sequence is (unfortunately) gc_prologue(), collect(), gc_epilogue().

- Previously, printing and verifying the heap has been outside the "Pause Full" GCTraceTime. I am okay with that.

- Could we put the code from g1CollectedHeap.cpp:1215-1232 into a "prepare_for_regular_collection" method?

- The order of the gc_epilogue() and g1_policy->record_full_collection_end() calls is different.

Actually, if it were up to me, I would put the whole full gc setup and teardown into a separate class/file, with public gc_prologue()/collect()/gc_epilogue() methods, where gc_prologue() is the first part of do_full_collection_inner() until the application of the G1SerialCollector, collect() is the instantiation and application of G1SerialCollector, and gc_epilogue() is the remainder. E.g. in G1CollectedHeap we would only have the calls to these three methods (there is no need to have all three). At least I think it would help a lot if all that full gc code were physically separate from the do-it-all G1CollectedHeap. With the G1FullGCScope there is almost no reference to G1CollectedHeap afaics.
(There is the _allocator->init_mutator_alloc_region() call; a rough sketch of this split follows below.)

- g1CollectedHeap.hpp: please try to sort the definitions of the new methods in the order in which they are called.
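Roughly, the split suggested above - a sketch of the idea only, not a concrete proposal:

    // Hypothetical: all full-gc setup/teardown in one place, with
    // G1CollectedHeap::do_full_collection() reduced to three calls.
    class G1SerialFullCollector : public StackObj {
    public:
      void gc_prologue();   // former head of do_full_collection_inner()
      void collect();       // create and apply the serial mark-sweep
      void gc_epilogue();   // the remainder: cleanup, resizing, reporting
    };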
*) I would leave "Ref Counts" line as usual for compatibility > > reasons. Changing > > it to "Counts" would force GC log parsers to handle that corner > > case too. > Changed, 'Counts -> Ref Counts'. > > > > > > ? *) This may reuse Indents? > > > > ????95???????out->print("%s", "????"); > Fixed to use Indents[2]. > > > > > > > ? *) Probably makes sense to "hg mv -A" the workerDataArray files > > to preserve the > > Mercurial history -- webrev should say something like "copied from > > ...", IIRC. > Fixed. > > webrev: > http://cr.openjdk.java.net/~sangheki/8173335/webrev.1/ > http://cr.openjdk.java.net/~sangheki/8173335/webrev.1_to_0 > > Thanks, > Sangheon > > > > > > > > Thanks, > > -Aleksey > > From erik.helin at oracle.com Tue Jun 13 12:21:22 2017 From: erik.helin at oracle.com (Erik Helin) Date: Tue, 13 Jun 2017 14:21:22 +0200 Subject: RFR (7xS): 8178148: Log more detailed information about scan rs phase In-Reply-To: <1497267278.2777.2.camel@oracle.com> References: <1491910205.2754.31.camel@oracle.com> <1493986396.2777.61.camel@oracle.com> <41d0b773-06fd-dfef-beda-2d62797210d9@oracle.com> <1494849002.2707.20.camel@oracle.com> <1495543832.2781.37.camel@oracle.com> <04ab36fb-afaf-0ed9-6480-b955474d4bee@oracle.com> <1497267278.2777.2.camel@oracle.com> Message-ID: On 06/12/2017 01:34 PM, Thomas Schatzl wrote: > Hi all, > > sorry for another round of reviews: Erik asked me to add a gtest test > for the linked subitems, both for the (existing) set() and (new) add() > methods. > > Webrev: http://cr.openjdk.java.net/~tschatzl/8178148/webrev.2_to_3/ (di > ff): http://cr.openjdk.java.net/~tschatzl/8178148/webrev.3/ (full) > > Testing: > local testing, jprt Looks good, Reviewed. Thanks, Erik > Thanks, > Thomas > > On Tue, 2017-05-23 at 12:15 -0700, sangheon wrote: >> Hi Thomas, >> >> On 05/23/2017 05:50 AM, Thomas Schatzl wrote: >>> >>> Hi all, >>> >>> unfortunately, for support of some code there is need for one >>> more >>> public method in the g1gcphasetimes class. >>> >>> Webrev: >>> http://cr.openjdk.java.net/~tschatzl/8178148/webrev.1_to_2/ (diff) >>> http://cr.openjdk.java.net/~tschatzl/8178148/webrev.2/ (full) >> Webrev.2 still looks good to me. >> >> Thanks, >> Sangheon >> >> >>> >>> >>> Sorry for the issue. >>> >>> Thanks, >>> Thomas >>> From SL at elp-consult.co.uk Tue Jun 13 16:35:36 2017 From: SL at elp-consult.co.uk (Shi Lu) Date: Tue, 13 Jun 2017 16:35:36 +0000 Subject: JVM expert lead opportunity in Bay Area In-Reply-To: References: Message-ID: Hello there, This is Jay from ELP Consult Ltd, greetings from London! I am a global strategic hiring partner with Alibaba Group, and would like to take this opportunity to introduce a great opportunity here: Alibaba Chief Technology Officer recently announced a re-structuring plan for Alibaba CTO group, separate the AIS business group to numbers of different divisions, but still work together to create a globally competitive combined hardware and software infrastructure for Alibaba eco-system. Set up system software division by the platform architecture team, JVM, Linux Kernel OS team together. therefore they need high-end technical Leader with strong JVM OpenJDK experience especially on GC to build up a new team in US and work together with the existing team in China HQ Hangzhou. you will be responsible the development of new technologies, be able to lead a team to conduct in-depth research and innovation. 
as one of world's largest users of Java, Alibaba will provide you with the extreme technical challenges, which you will never find it from anywhere else. If you are interested to explore more, please do not hesitate to reply me in order to have a confidential chat, I am looking forward to hear from you soon, thank you. ????????????????????? Kind Regards/Mit freundlichen Gr?ssen/???? Jay Lu Tel: +44 208 8996136 Mobile: +44 7917405668 ELP Consult Ltd. ???? [ELP Logo] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.jpg Type: image/jpeg Size: 2670 bytes Desc: image001.jpg URL: From sangheon.kim at oracle.com Tue Jun 13 21:21:29 2017 From: sangheon.kim at oracle.com (sangheon) Date: Tue, 13 Jun 2017 14:21:29 -0700 Subject: RFR(S): 8173335: Improve logging for j.l.ref.reference processing In-Reply-To: <1497352882.2829.65.camel@oracle.com> References: <1497352882.2829.65.camel@oracle.com> Message-ID: <054513b1-4ff2-6656-fa3a-9c6e6736c32f@oracle.com> Hi Thomas, Thank you for reviewing this. On 06/13/2017 04:21 AM, Thomas Schatzl wrote: > Hi Sangheon, > > > On Mon, 2017-06-12 at 17:13 -0700, sangheon wrote: >> Hi Aleksey, >> >> Thanks for the review. >> >> On 06/12/2017 09:06 AM, Aleksey Shipilev wrote: >>> On 06/10/2017 01:57 AM, sangheon wrote: >>>> CR: https://bugs.openjdk.java.net/browse/JDK-8173335 >>>> webrev: http://cr.openjdk.java.net/~sangheki/8173335/webrev.0 > - There should be a destructor in ReferenceProcessor cleaning up the > dynamically allocated memory. Thomas and I had some discussion about this and agreed to file a separate CR for freeing issue. I noticed that there's no destructor when I wrote this, but this is how we usually implement. However as this seems incorrect, I will add a destructor for newly added class but it will not be used in this patch. It will be used in the following CR( https://bugs.openjdk.java.net/browse/JDK-8182120 ) which fixes not-freeing issue in ReferenceProcessor. FYI, ReferenceProcessor has heap allocated members of ReferencePolicy(and its friends) but it is not freed too. So instead of extending this patch, I propose to separate this freeing issue. > > - the change should move gc+ref output to something else: there is so > much additional junk printed with gc+ref=trace so that the phase > logging is drowned out with real trace information and unusable for > regular consumption. Okay, I will add it. But I asked introducing 'gc+ref+phases' before but you didn't like it. :) Probably I didn't provide much details?! > > Also I would prefer to have this detailed log output interspersed > within the (existing) gc+phases output. Like under the "Reference > Processing" and "Reference Enqueuing" sections for G1 in particular. Frankly speaking, I'm not much interested now. When I started investigating this CR, you mentioned about this too. (But you were okay for either way. i.e. current one or interspersing into G1 logging. :) ) I also tried in that way(interspersing one) and my feeling is that I don't see much benefit to have ref logs in G1 phases section. It looks better organized but it doesn't mean current log style is worse. Ref. logs are printed separately for long time and other shared codes also print logs immediately. On the other hand, current implementation(re-use and print immediately) seems simpler to implement. 
In addition, ReferenceProcessor::process_discovered_reflist() is repeatedly called for different type of References so re-using log printer seems natural to me. :) > > Maybe with gc+phases+ref=debug/trace so that "everything" could be > enabled using "gc+phases*=debug/trace"? Yes, good idea. > > I can see that the code throws away the previous information about > reference processing after every use (the phasetimes reused). This is > does not allow printing of the data at convenient times and places. > > I.e. I would prefer if the data were aggregated (not only from one > particular phase) and later printed together. If we don't intersperse with existing G1 log, do you still think printing later is needed? Probably printing after Phanthom Ref. processed or different location? > > I kind of disagree with Aleksey about need for backwards compatibility > of log messages. This is such a big breaking change in the amount of > information shown that existing users will want to adapt their log > readers anyway. True that log parsers already should be updated, but I understood Aleksey's comment something like preferring 'Ref Counts' instead of 'Counts'. > As mentioned, due to real trace code here, gc+ref=trace is unusable. FYI, probably you tested with fastdebug because there are many debug/trace logs for debug build. It doesn't bother from product build actually. But as I said, I will change current new logs' channel from 'gc+ref' to 'gc+phases+ref'. > > We could still provide minimal backwards compatible output under > gc+ref=debug if needed. I'm don't see much value on this. gc+phases+ref seems better. > > - I would prefer if resetting the reference phase times logger wouldn't > be kind of an afterthought of printing :) > > Also it might be useful to keep the data around for somewhat longer > (not throw it away after every phase). Don't we need the data for > further analysis? I don't have strong opinion on this. I didn't consider keeping log data for further analysis. This could a minor reason for supporting keeping log data longer but I think interspersing with existing G1 log would be the main reason of keeping it. > > This would also allow printing it later using different log tags (with > different formatting). > > - I like the split of phasetimes into data storage and printing. I do > not like that basically the timing data is created twice, once for the > phasetimes, once for the GCTimer (for JFR basically). No, currently timing data is created once and used for both phase log and GCTimer. Or am I missing something? So in summary, mostly I agree with your comments except below 2: 1. Interspersing with G1 log. 2. Keeping log data longer. (This should be done if we go with interspersing idea) Let me post updated webrev, after making all decision. Thanks, Sangheon > Or the gctimer is > passed everywhere. But that is another issue I guess. > > Thanks, > Thomas > > >>> Oh, good! I had to instrument these by hand when optimizing RP >>> paths. >>> >>> Comments after brief look: >>> >>> *) So, the path with NULL executor are also not handling the >>> timer? E.g. CMS: >>> >>> 5262 if (rp->processing_is_mt()) { >>> 5263 rp->balance_all_queues(); >>> 5264 CMSRefProcTaskExecutor task_executor(*this); >>> 5265 rp->enqueue_discovered_references(&task_executor, >>> _gc_timer_cm); >>> 5266 } else { >>> 5267 rp->enqueue_discovered_references(NULL); >>> 5268 } >> Fixed to use timers for similar cases that you pointed. Thanks for >> catching up this! >> I started this CR as a part of MT ref. 
>> processing (JDK-8043575), so I only added it to that path. But this should be fixed.
>>> *) I would leave the "Ref Counts" line as usual for compatibility reasons. Changing it to "Counts" would force GC log parsers to handle that corner case too.
>> Changed, 'Counts -> Ref Counts'.
>>>
>>> *) This may reuse Indents?
>>>
>>>     95       out->print("%s", "    ");
>> Fixed to use Indents[2].
>>
>>> *) Probably makes sense to "hg mv -A" the workerDataArray files to preserve the Mercurial history -- webrev should say something like "copied from ...", IIRC.
>> Fixed.
>>
>> webrev:
>> http://cr.openjdk.java.net/~sangheki/8173335/webrev.1/
>> http://cr.openjdk.java.net/~sangheki/8173335/webrev.1_to_0
>>
>> Thanks,
>> Sangheon
>>
>>> Thanks,
>>> -Aleksey

From sangheon.kim at oracle.com Tue Jun 13 21:29:06 2017
From: sangheon.kim at oracle.com (sangheon)
Date: Tue, 13 Jun 2017 14:29:06 -0700
Subject: RFR (7xS): 8178148: Log more detailed information about scan rs phase
In-Reply-To:
References: <1491910205.2754.31.camel@oracle.com> <1493986396.2777.61.camel@oracle.com> <41d0b773-06fd-dfef-beda-2d62797210d9@oracle.com> <1494849002.2707.20.camel@oracle.com> <1495543832.2781.37.camel@oracle.com> <04ab36fb-afaf-0ed9-6480-b955474d4bee@oracle.com> <1497267278.2777.2.camel@oracle.com>
Message-ID: <78237c97-13b7-505e-35d0-acf9a58fa17f@oracle.com>

Hi Thomas,

On 06/13/2017 05:21 AM, Erik Helin wrote:
> On 06/12/2017 01:34 PM, Thomas Schatzl wrote:
>> Hi all,
>>
>> sorry for another round of reviews: Erik asked me to add a gtest test for the linked subitems, both for the (existing) set() and (new) add() methods.
>>
>> Webrev: http://cr.openjdk.java.net/~tschatzl/8178148/webrev.2_to_3/ (diff), http://cr.openjdk.java.net/~tschatzl/8178148/webrev.3/ (full)
>>
>> Testing:
>> local testing, jprt
>
> Looks good, Reviewed.
Looks good to me too.

Thanks,
Sangheon

> Thanks,
> Erik
>
>> Thanks,
>>   Thomas
>>
>> On Tue, 2017-05-23 at 12:15 -0700, sangheon wrote:
>>> Hi Thomas,
>>>
>>> On 05/23/2017 05:50 AM, Thomas Schatzl wrote:
>>>> Hi all,
>>>>
>>>> unfortunately, to support some code there is need for one more public method in the g1gcphasetimes class.
>>>>
>>>> Webrev:
>>>> http://cr.openjdk.java.net/~tschatzl/8178148/webrev.1_to_2/ (diff)
>>>> http://cr.openjdk.java.net/~tschatzl/8178148/webrev.2/ (full)
>>> Webrev.2 still looks good to me.
>>>
>>> Thanks,
>>> Sangheon
>>>
>>>> Sorry for the issue.
>>>>
>>>> Thanks,
>>>>   Thomas

From yasuenag at gmail.com Wed Jun 14 04:22:56 2017
From: yasuenag at gmail.com (Yasumasa Suenaga)
Date: Wed, 14 Jun 2017 13:22:56 +0900
Subject: JDK-8153333: [REDO] STW phases at Concurrent GC should count in PerfCounter
Message-ID:

Hi all,

I changed the PerfCounters to show CGC STW phases in jstat in JDK-8151674. However, it caused several jtreg test failures, so it was backed out.

I want to resume work on this issue.

http://cr.openjdk.java.net/~ysuenaga/JDK-8153333/webrev.03/hotspot/
http://cr.openjdk.java.net/~ysuenaga/JDK-8153333/webrev.03/jdk/

These changes pass the jtreg tests below:

hotspot/test/serviceability/tmtools/jstat
jdk/test/sun/tools

Since JDK 9 the default GC algorithm is G1, so I think this change is useful for watching GC behavior through jstat.
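For example (with <pid> a placeholder for the target VM's process id):

    $ jstat -gcutil <pid> 1000

The CGC and CGCT columns of the JDK 9 output (concurrent GC events and their time) are the counters this change feeds.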
I cannot access JPRT. Could you help?

Thanks,
Yasumasa

From sangheon.kim at oracle.com Wed Jun 14 07:52:55 2017
From: sangheon.kim at oracle.com (sangheon)
Date: Wed, 14 Jun 2017 00:52:55 -0700
Subject: RFR(S): 8173335: Improve logging for j.l.ref.reference processing
In-Reply-To: <054513b1-4ff2-6656-fa3a-9c6e6736c32f@oracle.com>
References: <1497352882.2829.65.camel@oracle.com> <054513b1-4ff2-6656-fa3a-9c6e6736c32f@oracle.com>
Message-ID: <0353babf-03eb-21cd-b286-9b0149dfb718@oracle.com>

Hi Thomas again,

On 06/13/2017 02:21 PM, sangheon wrote:
> Hi Thomas,
>
> Thank you for reviewing this.
>
> On 06/13/2017 04:21 AM, Thomas Schatzl wrote:
>> Hi Sangheon,
>>
>> On Mon, 2017-06-12 at 17:13 -0700, sangheon wrote:
>>> Hi Aleksey,
>>>
>>> Thanks for the review.
>>>
>>> On 06/12/2017 09:06 AM, Aleksey Shipilev wrote:
>>>> On 06/10/2017 01:57 AM, sangheon wrote:
>>>>> CR: https://bugs.openjdk.java.net/browse/JDK-8173335
>>>>> webrev: http://cr.openjdk.java.net/~sangheki/8173335/webrev.0
>> - There should be a destructor in ReferenceProcessor cleaning up the dynamically allocated memory.
> Thomas and I had some discussion about this and agreed to file a separate CR for the freeing issue.
>
> I noticed that there was no destructor when I wrote this, but this is how we usually implement it. However, as this seems incorrect, I will add a destructor for the newly added class, but it will not be used in this patch. It will be used in the following CR (https://bugs.openjdk.java.net/browse/JDK-8182120), which fixes the not-freeing issue in ReferenceProcessor. FYI, ReferenceProcessor has heap-allocated members of ReferencePolicy (and its friends) which are not freed either. So instead of extending this patch, I propose to handle this freeing issue separately.
>
>> - The change should move the gc+ref output to something else: there is so much additional junk printed with gc+ref=trace that the phase logging is drowned out by real trace information and unusable for regular consumption.
> Okay, I will add it. But I asked about introducing 'gc+ref+phases' before and you didn't like it. :) Probably I didn't provide enough details?!
>
>> Also I would prefer to have this detailed log output interspersed within the (existing) gc+phases output - like under the "Reference Processing" and "Reference Enqueuing" sections for G1 in particular.
> Frankly speaking, I'm not much interested in that now. When I started investigating this CR, you mentioned this too. (But you were okay with either way, i.e. the current one or interspersing into the G1 logging. :) ) I also tried it that way (the interspersing one), and my feeling is that I don't see much benefit in having the ref logs in the G1 phases section. It looks better organized, but that doesn't mean the current log style is worse. Ref. logs have been printed separately for a long time, and other shared code also prints logs immediately.
>
> On the other hand, the current implementation (re-use and print immediately) seems simpler to implement. In addition, ReferenceProcessor::process_discovered_reflist() is called repeatedly for the different types of References, so re-using the log printer seems natural to me. :)
>
>> Maybe with gc+phases+ref=debug/trace, so that "everything" could be enabled using "gc+phases*=debug/trace"?
> Yes, good idea.
>
>> I can see that the code throws away the previous information about reference processing after every use (the phasetimes are reused). This does not allow printing of the data at convenient times and places.
>>
>> I.e.
I would prefer if the data were aggregated (not only from one >> particular phase) and later printed together. > If we don't intersperse with existing G1 log, do you still think > printing later is needed? > Probably printing after Phanthom Ref. processed or different location? > >> >> I kind of disagree with Aleksey about need for backwards compatibility >> of log messages. This is such a big breaking change in the amount of >> information shown that existing users will want to adapt their log >> readers anyway. > True that log parsers already should be updated, but I understood > Aleksey's comment something like preferring 'Ref Counts' instead of > 'Counts'. > >> As mentioned, due to real trace code here, gc+ref=trace is unusable. > FYI, probably you tested with fastdebug because there are many > debug/trace logs for debug build. It doesn't bother from product build > actually. > But as I said, I will change current new logs' channel from 'gc+ref' > to 'gc+phases+ref'. > >> >> We could still provide minimal backwards compatible output under >> gc+ref=debug if needed. > I'm don't see much value on this. > gc+phases+ref seems better. > >> >> - I would prefer if resetting the reference phase times logger wouldn't >> be kind of an afterthought of printing :) >> >> Also it might be useful to keep the data around for somewhat longer >> (not throw it away after every phase). Don't we need the data for >> further analysis? > I don't have strong opinion on this. > > I didn't consider keeping log data for further analysis. This could a > minor reason for supporting keeping log data longer but I think > interspersing with existing G1 log would be the main reason of keeping > it. > >> >> This would also allow printing it later using different log tags (with >> different formatting). >> >> - I like the split of phasetimes into data storage and printing. I do >> not like that basically the timing data is created twice, once for the >> phasetimes, once for the GCTimer (for JFR basically). > No, currently timing data is created once and used for both phase log > and GCTimer. > Or am I missing something? > > So in summary, mostly I agree with your comments except below 2: > 1. Interspersing with G1 log. > 2. Keeping log data longer. (This should be done if we go with > interspersing idea) I started working on above 2 items. :) I will update webrev when I'm ready. Thanks, Sangheon > > Let me post updated webrev, after making all decision. > > Thanks, > Sangheon > > >> Or the gctimer is >> passed everywhere. But that is another issue I guess. >> >> Thanks, >> Thomas >> >> >>>> Oh, good! I had to instrument these by hand when optimizing RP >>>> paths. >>>> >>>> Comments after brief look: >>>> >>>> *) So, the path with NULL executor are also not handling the >>>> timer? E.g. CMS: >>>> >>>> 5262 if (rp->processing_is_mt()) { >>>> 5263 rp->balance_all_queues(); >>>> 5264 CMSRefProcTaskExecutor task_executor(*this); >>>> 5265 rp->enqueue_discovered_references(&task_executor, >>>> _gc_timer_cm); >>>> 5266 } else { >>>> 5267 rp->enqueue_discovered_references(NULL); >>>> 5268 } >>> Fixed to use timers for similar cases that you pointed. Thanks for >>> catching up this! >>> I started this CR as a part of MT ref. processing(JDK-8043575), so I >>> only added to that path. But this should be fixed. >>>> >>>> *) I would leave "Ref Counts" line as usual for compatibility >>>> reasons. Changing >>>> it to "Counts" would force GC log parsers to handle that corner >>>> case too. >>> Changed, 'Counts -> Ref Counts'. 
>>>> >>>> *) This may reuse Indents? >>>> >>>> 95 out->print("%s", " "); >>> Fixed to use Indents[2]. >>> >>>> >>>> *) Probably makes sense to "hg mv -A" the workerDataArray files >>>> to preserve the >>>> Mercurial history -- webrev should say something like "copied from >>>> ...", IIRC. >>> Fixed. >>> >>> webrev: >>> http://cr.openjdk.java.net/~sangheki/8173335/webrev.1/ >>> http://cr.openjdk.java.net/~sangheki/8173335/webrev.1_to_0 >>> >>> Thanks, >>> Sangheon >>> >>> >>>> >>>> Thanks, >>>> -Aleksey >>>> > From thomas.schatzl at oracle.com Wed Jun 14 12:58:57 2017 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Wed, 14 Jun 2017 14:58:57 +0200 Subject: [PATCH] JDK-8176571: Fine bitmaps should be allocated as belonging to mtGC In-Reply-To: <1497107249632.26184@infobip.com> References: <1495365159435.54025@infobip.com> <1495434908161.70459@infobip.com> <1495466592.2573.76.camel@oracle.com> <1495734990075.28893@infobip.com> <6110539E-B4E2-475D-8199-7C30449960D2@oracle.com> <1496926647107.27811@infobip.com> , <1497107249632.26184@infobip.com> Message-ID: <1497445137.2785.7.camel@oracle.com> Hi, On Sat, 2017-06-10 at 15:07 +0000, Milan Mimica wrote: > Milan Mimica, Senior Software Engineer / Division Lead > > > > From: Kim Barrett > > Sent: Friday, June 9, 2017 03:04 > > > > Note that the refactoring patch doesn't apply cleanly to jdk10/hs > > tip. > > There's a merge conflict with the fix for JDK-8168467 (resolved > > 2017/03/15). > Oh, I 've developed it against jdk10/jdk10. Care to explain a bit the > difference? Attached are patches against jdk10/hs. jdk10/hs is the current development tree where development happens and new changes pushed into. hs10/hs10 is the tree where public builds are made from. Changes are regularly (at least in the typical case) merged to jdk10/jdk10 after some additional regression testing. > > What testing has been done???And are there any tests you can point > > to > > that are directly affected???(I already know about > > TestArrayAllocatorMallocLimit.java.)??I'll probably want to run > > some > > tests using our internal test facilities as part of sponsoring. > I have run jtreg on my laptop. There are some failures actually, > but happens also on clean repo. I'm not aware of anything else. > I created?https://bugs.openjdk.java.net/browse/JDK-8182169?for the ArrayAllocator refactoring as I could not find an existing issue. I uploaded a webrev of that change to http://cr.openjdk.java.net/~tschatzl/8182169/webrev/ Looks good to me. I also uploaded a webrev for JDK-8176571 based on the above to http://cr.openjdk.java.net/~tschatzl/8176571/webrev Looks good to me too, but is there a reason to not use default parameters for the CHeapBitmap constructors? As for testing I am going to move it through JPRT (our build and test system). Thanks, ? 
Thomas From Milan.Mimica at infobip.com Wed Jun 14 13:59:14 2017 From: Milan.Mimica at infobip.com (Milan Mimica) Date: Wed, 14 Jun 2017 13:59:14 +0000 Subject: [PATCH] JDK-8176571: Fine bitmaps should be allocated as belonging to mtGC In-Reply-To: <1497445137.2785.7.camel@oracle.com> References: <1495365159435.54025@infobip.com> <1495434908161.70459@infobip.com> <1495466592.2573.76.camel@oracle.com> <1495734990075.28893@infobip.com> <6110539E-B4E2-475D-8199-7C30449960D2@oracle.com> <1496926647107.27811@infobip.com> , <1497107249632.26184@infobip.com>,<1497445137.2785.7.camel@oracle.com> Message-ID: <1497448754310.80745@infobip.com> Hi Milan Mimica, Senior Software Engineer / Division Lead > From: Thomas Schatzl > Sent: Wednesday, June 14, 2017 14:58 > To: Milan Mimica; hotspot-gc-dev at openjdk.java.net > Subject: Re: [PATCH] JDK-8176571: Fine bitmaps should be allocated as belonging to mtGC > >> Oh, I 've developed it against jdk10/jdk10. Care to explain a bit the >> difference? Attached are patches against jdk10/hs. > > jdk10/hs is the current development tree where development happens and > new changes pushed into. hs10/hs10 is the tree where public builds are > made from. Changes are regularly (at least in the typical case) merged > to jdk10/jdk10 after some additional regression testing. I see. Thanks. > I created https://bugs.openjdk.java.net/browse/JDK-8182169 for the > ArrayAllocator refactoring as I could not find an existing issue. > > I uploaded a webrev of that change to > http://cr.openjdk.java.net/~tschatzl/8182169/webrev/ > > Looks good to me. > > I also uploaded a webrev for JDK-8176571 based on the above to > http://cr.openjdk.java.net/~tschatzl/8176571/webrev > > Looks good to me too, but is there a reason to not use default parameters for the CHeapBitmap constructors? I'd rather somehow make the argument mandatory, to force people to choose a memory category. From stefan.johansson at oracle.com Wed Jun 14 14:45:57 2017 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Wed, 14 Jun 2017 16:45:57 +0200 Subject: RFR: 8177544: Restructure G1 Full GC code In-Reply-To: <1497346566.2829.33.camel@oracle.com> References: <62d1f02b-1fc0-ffcf-b8e0-e88ebacecebe@oracle.com> <1497346566.2829.33.camel@oracle.com> Message-ID: Thanks Thomas for reviewing, On 2017-06-13 11:36, Thomas Schatzl wrote: > Hi, > > thanks for your hard work on the parallel full gc that starts with > this refactoring :) :) > On Thu, 2017-06-08 at 14:35 +0200, Stefan Johansson wrote: >> Hi, >> >> Please review this enhancement: >> https://bugs.openjdk.java.net/browse/JDK-8177544 >> >> Webrev: >> http://cr.openjdk.java.net/~sjohanss/8177544/hotspot.00/ >> >> Summary: >> This is more or less only code moving around. The function >> do_full_collection in G1CollectedHeap is very large and breaking it >> up to smaller parts and grouping together some of the stack objects >> help readability. >> >> In addition to splitting the large function to smaller ones I've >> introduced two new classes: >> - G1FullGCScope that groups most of the previously spread out stack >> objects. >> - G1SerialCollector that handles the interaction with G1MarkSweep. >> >> Doing this change will simplify future changes to the full GC. >> >> Testing: >> * Locally run JTREG tests >> * RBT hotspot tier 2 & 3 >> > Some initial thoughts of the change, mostly to start a discussion: > > - G1FullGCScope class: please add a line what the purpose of the > class is. Added a sentence, do you want more? 
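As a side note for readers following the thread: the G1FullGCScope under discussion is an instance of the standard RAII idiom. A minimal self-contained sketch of the pattern, with made-up names for illustration only (this is not the webrev code):

#include <cstdio>

// Hypothetical stand-in for a full-GC scope object: the constructor does
// the setup that was previously spread out over do_full_collection(), and
// the destructor guarantees the matching teardown, even on early return.
class FullGCScopeSketch {
  bool _explicit_gc;
 public:
  explicit FullGCScopeSketch(bool explicit_gc) : _explicit_gc(explicit_gc) {
    std::printf("full gc setup (explicit=%d)\n", _explicit_gc ? 1 : 0);
  }
  ~FullGCScopeSketch() {
    std::printf("full gc teardown\n");
  }
};

void do_full_collection_sketch() {
  FullGCScopeSketch scope(true /* explicit_gc */);
  // ... the actual collection work runs here, bracketed by the scope ...
}

int main() {
  do_full_collection_sketch();
  return 0;
}

Grouping the previously free-standing stack objects into one such scope is what makes the setup/teardown pairing visible at a glance.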
> - a better name for G1CollectedHeap::reset_card_cache_and_queue() > could be abort_refinement(). Sounds good, fixed. > > - G1CollectedHeap.cpp:1145: please remove the "stale" word in that > comment. It's confusing me because at that point because "stale" cards > are kind of defined for a particular context and does not fit here. Fixed. > - can you move all the printing after collection > (g1CollectedHeap.cpp:1239 - 1249) into an extra method too? Something > like "print_heap_after_full_collection()"? (I think there is some > argument to also have a print_heap_before_full_collection() method). Done, since this needed me to pass around the heap_transition I decided to move it into the G1FullGCScope and I think that was an improvement in it self. > - G1SerialCollector is actually a "G1SerialFullCollector". I do not > remember whether the follow-up change removes it again anyway, but it > seems to be a simple renaming. Yes, it will be removed. And yes I can do the rename. > > - G1SerialCollector interface: while I could live with the > prepare/do/complete naming of the methods, the typical sequence is > (unfortunately gc_prologue(), collect(), gc_epilogue()) I'm a bit hesitant about re-using or gc_*logue and moving stuff into them if that's what you mean. And if you can live with the current proposal I think I will stick with it. > > - previously printing and verifying the heap has been outside the > "Pause Full" GCTraceTime. I am okay with that. I see, and I also think the new way is ok. > > - could we put the code from g1CollectedHeap.cpp:1215-1232 into a > "prepare_for_regular_collection" method? Yes, will group them together and also include the above assert. I'll also move the MemoryService::track_memory_usage() call into gc_epilogue as it is called at a similar point for the YC. I called the new method prepare_heap_for_mutators. > > - the order of the gc_epilogue() and g1_policy- >> record_full_collection_end() calls is different. The reason I moved them around is that increment_old_marking_cycles_completed has been moved into the epilogue. I was uncertain if the policy needed to see that update before recording the end. Digging into the policy I think this is not the case, I'll reorder them again. > Actually, if it were for me, I would put the whole full gc setup and > teardown into a separate class/file. > > Have public gc_prologue()/collect()/gc_epilogue() methods where > gc_prologue() is the first part of do_full_collection_inner() until > application of the G1SerialCollector, collect() the instantiation and > application of G1SerialCollector, and gc_epilogue() the remainder. > > E.g. in G1CollectedHeap we only have the calls to these three methods > (there is no need to have all three). > > At least I think it would help a lot if all that full gc stuff would be > separate physically from do-all-G1CollectedHeap. > With the G1FullGCScope there is almost no reference to G1CollectedHeap > afaics. > > (There is _allocator->init_mutator_alloc_region() call) I see your point and I think it would be good. But as we discussed over chat, might be something to look at once everything else in this area is done. Will create a RFE for this. > - g1CollectedHeap.hpp: please try to sort the definitions of the new > methods in order of calling them. Done. 
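For illustration, the gc_prologue()/collect()/gc_epilogue() shape debated above would look roughly like this; a hypothetical sketch only, not the actual webrev code:

// Hypothetical full-GC driver split into three phases; the caller in
// G1CollectedHeap would then only invoke these three methods in order.
class SerialFullCollectorSketch {
 public:
  void prepare()  { /* verification, heap printing, abort refinement */ }
  void collect()  { /* the actual serial mark-compact work */ }
  void complete() { /* prepare heap for mutators, epilogue, printing */ }
};

void run_full_collection(SerialFullCollectorSketch& c) {
  c.prepare();
  c.collect();
  c.complete();
}

Whether the setup and teardown live in such a driver or stay in G1CollectedHeap is exactly the trade-off discussed above.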
Here are updated webrevs: Full: http://cr.openjdk.java.net/~sjohanss/8177544/hotspot.01/ Inc: http://cr.openjdk.java.net/~sjohanss/8177544/hotspot.00-01/ Thanks, Stefan > > Thanks, > Thomas From kim.barrett at oracle.com Wed Jun 14 17:39:03 2017 From: kim.barrett at oracle.com (Kim Barrett) Date: Wed, 14 Jun 2017 13:39:03 -0400 Subject: [PATCH] JDK-8176571: Fine bitmaps should be allocated as belonging to mtGC In-Reply-To: <1497445137.2785.7.camel@oracle.com> References: <1495365159435.54025@infobip.com> <1495434908161.70459@infobip.com> <1495466592.2573.76.camel@oracle.com> <1495734990075.28893@infobip.com> <6110539E-B4E2-475D-8199-7C30449960D2@oracle.com> <1496926647107.27811@infobip.com> <1497107249632.26184@infobip.com> <1497445137.2785.7.camel@oracle.com> Message-ID: > On Jun 14, 2017, at 8:58 AM, Thomas Schatzl wrote: > > Hi, Thomas - Thanks for picking this up! I got distracted? > On Sat, 2017-06-10 at 15:07 +0000, Milan Mimica wrote: >> Milan Mimica, Senior Software Engineer / Division Lead >>> >>> From: Kim Barrett >>> Sent: Friday, June 9, 2017 03:04 >>> >>> Note that the refactoring patch doesn't apply cleanly to jdk10/hs >>> tip. >>> There's a merge conflict with the fix for JDK-8168467 (resolved >>> 2017/03/15). >> Oh, I 've developed it against jdk10/jdk10. Care to explain a bit the >> difference? Attached are patches against jdk10/hs. > > jdk10/hs is the current development tree where development happens and > new changes pushed into. hs10/hs10 is the tree where public builds are > made from. Changes are regularly (at least in the typical case) merged > to jdk10/jdk10 after some additional regression testing. > >>> What testing has been done? And are there any tests you can point >>> to >>> that are directly affected? (I already know about >>> TestArrayAllocatorMallocLimit.java.) I'll probably want to run >>> some >>> tests using our internal test facilities as part of sponsoring. >> I have run jtreg on my laptop. There are some failures actually, >> but happens also on clean repo. I'm not aware of anything else. >> > > I created https://bugs.openjdk.java.net/browse/JDK-8182169 for the > ArrayAllocator refactoring as I could not find an existing issue. > > I uploaded a webrev of that change to > http://cr.openjdk.java.net/~tschatzl/8182169/webrev/ > > Looks good to me. Looks good to me too. > I also uploaded a webrev for JDK-8176571 based on the above to > http://cr.openjdk.java.net/~tschatzl/8176571/webrev > > Looks good to me too, but is there a reason to not use default parameters for the CHeapBitmap constructors? Agreed. Defaulting the flags to mtInternal (which is effectively what?s being done the hard way) would simplify things. > As for testing I am going to move it through JPRT (our build and test system). 
> > Thanks, > Thomas From kim.barrett at oracle.com Wed Jun 14 17:48:04 2017 From: kim.barrett at oracle.com (Kim Barrett) Date: Wed, 14 Jun 2017 13:48:04 -0400 Subject: [PATCH] JDK-8176571: Fine bitmaps should be allocated as belonging to mtGC In-Reply-To: <1497448754310.80745@infobip.com> References: <1495365159435.54025@infobip.com> <1495434908161.70459@infobip.com> <1495466592.2573.76.camel@oracle.com> <1495734990075.28893@infobip.com> <6110539E-B4E2-475D-8199-7C30449960D2@oracle.com> <1496926647107.27811@infobip.com> <1497107249632.26184@infobip.com> <1497445137.2785.7.camel@oracle.com> <1497448754310.80745@infobip.com> Message-ID: <1DDCE6C7-B5CC-4878-85E8-44D497B41E7C@oracle.com> > On Jun 14, 2017, at 9:59 AM, Milan Mimica wrote: >> I also uploaded a webrev for JDK-8176571 based on the above to >> http://cr.openjdk.java.net/~tschatzl/8176571/webrev >> >> Looks good to me too, but is there a reason to not use default parameters for the CHeapBitmap constructors? > > I'd rather somehow make the argument mandatory, to force people to choose a memory category. I would support making the argument mandatory. All non-test callers are presently in g1, and are already being touched to change them to explicitly use mtGC, so would not be affected by such a change. But there are some callers in test_bitMap and test_bitMap_search native tests that would need to be fixed. From aph at redhat.com Thu Jun 15 10:49:59 2017 From: aph at redhat.com (Andrew Haley) Date: Thu, 15 Jun 2017 11:49:59 +0100 Subject: RFR: Parallelize safepoint cleanup In-Reply-To: <9c019b11-0c78-2649-d3bd-cd02fd999e68@redhat.com> References: <486b5a72-bef8-4ebc-2729-3fe3aa3ab3b9@oracle.com> <0ccc1ec4-9866-cd2a-c6bc-a32c2da8cb3d@oracle.com> <9c019b11-0c78-2649-d3bd-cd02fd999e68@redhat.com> Message-ID: On 29/05/17 15:16, Roman Kennke wrote: > I agree that having a single pool would be good. The current WorkGang > doesn't do it though, because we can't borrow threads while the GC is > doing work, at least not in a way that is GC agnostic (see my reply to > Robbin). It would be nice, though, not to have to share worker threads between GC and other jobs if we didn't need todo so. If we're on a large-scale multicore machine, we don't want to be trashing warm caches in idle cores unless it's forced on us by the lack of hardware resources. We're not always short of cores, and the massively- scalable multi-core world is nearly upon us. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From robbin.ehn at oracle.com Thu Jun 15 12:10:03 2017 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Thu, 15 Jun 2017 14:10:03 +0200 Subject: RFR: Parallelize safepoint cleanup In-Reply-To: References: <486b5a72-bef8-4ebc-2729-3fe3aa3ab3b9@oracle.com> <0ccc1ec4-9866-cd2a-c6bc-a32c2da8cb3d@oracle.com> <9c019b11-0c78-2649-d3bd-cd02fd999e68@redhat.com> Message-ID: On 06/15/2017 12:49 PM, Andrew Haley wrote: > On 29/05/17 15:16, Roman Kennke wrote: >> I agree that having a single pool would be good. The current WorkGang >> doesn't do it though, because we can't borrow threads while the GC is >> doing work, at least not in a way that is GC agnostic (see my reply to >> Robbin). > > It would be nice, though, not to have to share worker threads between > GC and other jobs if we didn't need todo so. If we're on a > large-scale multicore machine, we don't want to be trashing warm > caches in idle cores unless it's forced on us by the lack of hardware > resources. 
We're not always short of cores, and the massively- > scalable multi-core world is nearly upon us. > I agree with your point: just because we have a single pool doesn't necessarily mean we need to share the threads, but we should share heuristics. We have some stuff in the pipeline that we would like to have done in parallel, during STW and/or concurrently, like concurrent monitor deflation. JFR has already moved most of its logic out of STW, so consider JFR + Shenandoah + concurrent deflation and invoking Arrays.parallelSort() (don't forget about compiler threads running also). Now we may end up trashing caches just because we have no heuristics. The solution is not obvious to me, that's why, at least, I need to think about it. (JEP) Thanks, Robbin From thomas.schatzl at oracle.com Fri Jun 16 10:23:54 2017 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Fri, 16 Jun 2017 12:23:54 +0200 Subject: [PATCH] JDK-8176571: Fine bitmaps should be allocated as belonging to mtGC In-Reply-To: <1DDCE6C7-B5CC-4878-85E8-44D497B41E7C@oracle.com> References: <1495365159435.54025@infobip.com> <1495434908161.70459@infobip.com> <1495466592.2573.76.camel@oracle.com> <1495734990075.28893@infobip.com> <6110539E-B4E2-475D-8199-7C30449960D2@oracle.com> <1496926647107.27811@infobip.com> <1497107249632.26184@infobip.com> <1497445137.2785.7.camel@oracle.com> <1497448754310.80745@infobip.com> <1DDCE6C7-B5CC-4878-85E8-44D497B41E7C@oracle.com> Message-ID: <1497608634.3282.10.camel@oracle.com> Hi, On Wed, 2017-06-14 at 13:48 -0400, Kim Barrett wrote: > > > > On Jun 14, 2017, at 9:59 AM, Milan Mimica > > wrote: > > > > > > I also uploaded a webrev for JDK-8176571 based on the above to > > > http://cr.openjdk.java.net/~tschatzl/8176571/webrev > > > > > > Looks good to me too, but is there a reason to not use default > > > parameters for the CHeapBitmap constructors? > > I'd rather somehow make the argument mandatory, to force people to > > choose a memory category. > > I would support making the argument mandatory. All non-test callers > are presently in g1, and are already being touched to change them to > explicitly use mtGC, so would not be affected by such a change. But > there are some callers in test_bitMap and test_bitMap_search native > tests that would need to be fixed. I would not block such an idea. The user of the bitmap should know what it is going to be used for :) However the tests heavily use templates to test all types of bitmaps, so they expect the constructors to be the same. The best I could come up with to make this work would be having a test-private wrapper class for CHeapBitmap that adds mtInternal to the constructor automatically - and use that one for CHeapBitmap tests. I am sure you C++ wizards immediately find something better though :) Not sure if it is worth the effort, but feel free to convince me with a changeset :) Otherwise I just recommend using a default mtInternal value for the MEMFLAGS parameter instead of the manual constructor duplication. Note that we still have some time, as Milan's name does not show up on the OCA signatory list yet (http://www.oracle.com/technetwork/community/oca-486395.html). Thanks,
Thomas From thomas.schatzl at oracle.com Tue Jun 20 08:05:47 2017 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 20 Jun 2017 10:05:47 +0200 Subject: RFR (7xS): 8175554: Improve G1UpdateRSOrPushRefClosure In-Reply-To: <3f492193-275d-3d4e-1a91-2d7e07fdaafb@oracle.com> References: <1491910205.2754.31.camel@oracle.com> <1493986396.2777.61.camel@oracle.com> <3f492193-275d-3d4e-1a91-2d7e07fdaafb@oracle.com> Message-ID: <1497945947.2784.6.camel@oracle.com> Hi Sangheon, others, On Tue, 2017-05-30 at 15:15 -0700, sangheon wrote: > Hi Thomas, > > On 05/05/2017 05:13 AM, Thomas Schatzl wrote: > > > > Hi all, > > > > ???recent reviews have made changes necessary to parts of the > > changeset chain. > > > > Here is a list of links to updated webrevs. Since they have > > apparently not been reviewed yet, I simply overwrote the old > > webrevs. > > > > JDK-8177044: Remove _scan_top from HeapRegion > > http://cr.openjdk.java.net/~tschatzl/8177044/webrev/ > > > > JDK-8178148: Log more detailed information about scan rs phase > > http://cr.openjdk.java.net/~tschatzl/8178148/webrev/ > > > > JDK-8175554: Improve G1UpdateRSOrPushRefClosure > > http://cr.openjdk.java.net/~tschatzl/8175554/webrev/ > Looks good to me. > I only have minor nits. > > ------------------------------------------------------ > src/share/vm/gc/g1/g1OopClosures.hpp > ???78???virtual void do_oop(oop* p) { do_oop_nv(p); } > Misaligned with above line. > > ------------------------------------------------------ > src/share/vm/gc/g1/g1RemSet.hpp > ? 204???????????????????G1UpdateOrScanRSClosure* push_heap_cl, > Rename to reflect new closure name? > > ------------------------------------------------------ > src/share/vm/gc/g1/g1RootProcessor.hpp > Copyright update. > > ------------------------------------------------------ > src/share/vm/gc/g1/g1_specialized_oop_closures.hpp > ???45???????f(G1UpdateOrScanRSClosure,_nv)?????????\ > Misaligned '\'. > ? I fixed all this in addition to incorporating ErikD's comments that asked for factoring out two parts of the G1ParScanClosure and G1UpdateOrScanRSClosure that were equal now. I did some performance testing again due to that, and also found that the check to filter out non-cross-region references in?G1UpdateOrScanRSClosure::do_oop_nv() seemed faster, so I also reverted it to the old code. Also in this change G1UpdateOrScanRSClosure::do_oop_nv() did not update _has_refs_into_cset as before. Fixed that as well. Thanks, ? Thomas From thomas.schatzl at oracle.com Tue Jun 20 08:07:58 2017 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 20 Jun 2017 10:07:58 +0200 Subject: RFR (7xS): 8175554: Improve G1UpdateRSOrPushRefClosure In-Reply-To: <1497945947.2784.6.camel@oracle.com> References: <1491910205.2754.31.camel@oracle.com> <1493986396.2777.61.camel@oracle.com> <3f492193-275d-3d4e-1a91-2d7e07fdaafb@oracle.com> <1497945947.2784.6.camel@oracle.com> Message-ID: <1497946078.2784.7.camel@oracle.com> Hi again, ? webrev links: http://cr.openjdk.java.net/~tschatzl/8175554/webrev.0_to_1/?(diff) http://cr.openjdk.java.net/~tschatzl/8175554/webrev.1/?(full) Thomas ? On Tue, 2017-06-20 at 10:05 +0200, Thomas Schatzl wrote: > Hi Sangheon, others, > > On Tue, 2017-05-30 at 15:15 -0700, sangheon wrote: > > > > Hi Thomas, > > > > On 05/05/2017 05:13 AM, Thomas Schatzl wrote: > > > > > > > > > Hi all, > > > > > > ???recent reviews have made changes necessary to parts of the > > > changeset chain. > > > > > > Here is a list of links to updated webrevs. 
Since they have > > > apparently not been reviewed yet, I simply overwrote the old > > > webrevs. > > > > > > JDK-8177044: Remove _scan_top from HeapRegion > > > http://cr.openjdk.java.net/~tschatzl/8177044/webrev/ > > > > > > JDK-8178148: Log more detailed information about scan rs phase > > > http://cr.openjdk.java.net/~tschatzl/8178148/webrev/ > > > > > > JDK-8175554: Improve G1UpdateRSOrPushRefClosure > > > http://cr.openjdk.java.net/~tschatzl/8175554/webrev/ > > Looks good to me. > > I only have minor nits. > > > > ------------------------------------------------------ > > src/share/vm/gc/g1/g1OopClosures.hpp > > ???78???virtual void do_oop(oop* p) { do_oop_nv(p); } > > Misaligned with above line. > > > > ------------------------------------------------------ > > src/share/vm/gc/g1/g1RemSet.hpp > > ? 204???????????????????G1UpdateOrScanRSClosure* push_heap_cl, > > Rename to reflect new closure name? > > > > ------------------------------------------------------ > > src/share/vm/gc/g1/g1RootProcessor.hpp > > Copyright update. > > > > ------------------------------------------------------ > > src/share/vm/gc/g1/g1_specialized_oop_closures.hpp > > ???45???????f(G1UpdateOrScanRSClosure,_nv)?????????\ > > Misaligned '\'. > > > ? I fixed all this in addition to incorporating ErikD's comments that > asked for factoring out two parts of the G1ParScanClosure and > G1UpdateOrScanRSClosure that were equal now. > > I did some performance testing again due to that, and also found that > the check to filter out non-cross-region references > in?G1UpdateOrScanRSClosure::do_oop_nv() seemed faster, so I also > reverted it to the old code. > > Also in this change G1UpdateOrScanRSClosure::do_oop_nv() did not > update > _has_refs_into_cset as before. Fixed that as well. > > Thanks, > ? Thomas > From sangheon.kim at oracle.com Tue Jun 20 23:15:37 2017 From: sangheon.kim at oracle.com (sangheon) Date: Tue, 20 Jun 2017 16:15:37 -0700 Subject: RFR(S): 8173335: Improve logging for j.l.ref.reference processing In-Reply-To: <0353babf-03eb-21cd-b286-9b0149dfb718@oracle.com> References: <1497352882.2829.65.camel@oracle.com> <054513b1-4ff2-6656-fa3a-9c6e6736c32f@oracle.com> <0353babf-03eb-21cd-b286-9b0149dfb718@oracle.com> Message-ID: <001e2f5b-4d5f-327f-79bc-9287045179e1@oracle.com> Hi Thomas, On 06/14/2017 12:52 AM, sangheon wrote: > Hi Thomas again, > > On 06/13/2017 02:21 PM, sangheon wrote: >> Hi Thomas, >> >> Thank you for reviewing this. >> >> On 06/13/2017 04:21 AM, Thomas Schatzl wrote: >>> Hi Sangheon, >>> >>> >>> On Mon, 2017-06-12 at 17:13 -0700, sangheon wrote: >>>> Hi Aleksey, >>>> >>>> Thanks for the review. >>>> >>>> On 06/12/2017 09:06 AM, Aleksey Shipilev wrote: >>>>> On 06/10/2017 01:57 AM, sangheon wrote: >>>>>> CR: https://bugs.openjdk.java.net/browse/JDK-8173335 >>>>>> webrev: http://cr.openjdk.java.net/~sangheki/8173335/webrev.0 >>> - There should be a destructor in ReferenceProcessor cleaning up the >>> dynamically allocated memory. >> Thomas and I had some discussion about this and agreed to file a >> separate CR for freeing issue. >> >> I noticed that there's no destructor when I wrote this, but this is >> how we usually implement. >> However as this seems incorrect, I will add a destructor for newly >> added class but it will not be used in this patch. >> It will be used in the following CR( >> https://bugs.openjdk.java.net/browse/JDK-8182120 ) which fixes >> not-freeing issue in ReferenceProcessor. 
>> FYI, ReferenceProcessor has heap allocated members of >> ReferencePolicy(and its friends) but it is not freed too. So instead >> of extending this patch, I propose to separate this freeing issue. >> >>> >>> - the change should move gc+ref output to something else: there is so >>> much additional junk printed with gc+ref=trace so that the phase >>> logging is drowned out with real trace information and unusable for >>> regular consumption. >> Okay, I will add it. >> But I asked introducing 'gc+ref+phases' before but you didn't like >> it. :) Probably I didn't provide much details?! >> >>> >>> Also I would prefer to have this detailed log output interspersed >>> within the (existing) gc+phases output. Like under the "Reference >>> Processing" and "Reference Enqueuing" sections for G1 in particular. >> Frankly speaking, I'm not much interested now. >> When I started investigating this CR, you mentioned about this too. >> (But you were okay for either way. i.e. current one or interspersing >> into G1 logging. :) ) >> I also tried in that way(interspersing one) and my feeling is that I >> don't see much benefit to have ref logs in G1 phases section. It >> looks better organized but it doesn't mean current log style is worse. >> Ref. logs are printed separately for long time and other shared codes >> also print logs immediately. >> >> On the other hand, current implementation(re-use and print >> immediately) seems simpler to implement. >> In addition, ReferenceProcessor::process_discovered_reflist() is >> repeatedly called for different type of References so re-using log >> printer seems natural to me. :) >> >>> >>> Maybe with gc+phases+ref=debug/trace so that "everything" could be >>> enabled using "gc+phases*=debug/trace"? >> Yes, good idea. >> >>> >>> I can see that the code throws away the previous information about >>> reference processing after every use (the phasetimes reused). This is >>> does not allow printing of the data at convenient times and places. >>> >>> I.e. I would prefer if the data were aggregated (not only from one >>> particular phase) and later printed together. >> If we don't intersperse with existing G1 log, do you still think >> printing later is needed? >> Probably printing after Phanthom Ref. processed or different location? >> >>> >>> I kind of disagree with Aleksey about need for backwards compatibility >>> of log messages. This is such a big breaking change in the amount of >>> information shown that existing users will want to adapt their log >>> readers anyway. >> True that log parsers already should be updated, but I understood >> Aleksey's comment something like preferring 'Ref Counts' instead of >> 'Counts'. >> >>> As mentioned, due to real trace code here, gc+ref=trace is unusable. >> FYI, probably you tested with fastdebug because there are many >> debug/trace logs for debug build. It doesn't bother from product >> build actually. >> But as I said, I will change current new logs' channel from 'gc+ref' >> to 'gc+phases+ref'. >> >>> >>> We could still provide minimal backwards compatible output under >>> gc+ref=debug if needed. >> I'm don't see much value on this. >> gc+phases+ref seems better. >> >>> >>> - I would prefer if resetting the reference phase times logger wouldn't >>> be kind of an afterthought of printing :) >>> >>> Also it might be useful to keep the data around for somewhat longer >>> (not throw it away after every phase). Don't we need the data for >>> further analysis? >> I don't have strong opinion on this. 
>> >> I didn't consider keeping log data for further analysis. This could be a >> minor reason for supporting keeping log data longer but I think >> interspersing with existing G1 log would be the main reason of keeping >> it. >> >>> >>> This would also allow printing it later using different log tags (with >>> different formatting). >>> >>> - I like the split of phasetimes into data storage and printing. I do >>> not like that basically the timing data is created twice, once for the >>> phasetimes, once for the GCTimer (for JFR basically). >> No, currently timing data is created once and used for both phase log >> and GCTimer. >> Or am I missing something? >> >> So in summary, mostly I agree with your comments except below 2: >> 1. Interspersing with G1 log. >> 2. Keeping log data longer. (This should be done if we go with >> interspersing idea) I started working on the above 2 items. :) I will update webrev when I'm ready. Here are the updated webrevs, which apply the changes below:
1. Added a destructor for ReferenceProcessorPhaseTimes.
2. Added 'gc+phases+ref' for the newly added logs.
3. Interspersed reference logs into the G1 young GC log.
   - Logs for other cases are still printed immediately.
4. All timing information has its own storage.
5. A total time is added.

Current reference logs will be:

1. New logs (except G1 young GC)

[1.541s][debug][gc,phases ] GC(7) Finalize Marking 4.802ms
[1.541s][debug][gc,phases,start] GC(7) Reference Processing <-- [1]
[1.543s][debug][gc,phases,ref ] GC(7) Reference Processing: 1.8ms <-- [2]
[1.543s][debug][gc,phases,ref ] GC(7) SoftReference: 0.3ms
[1.543s][debug][gc,phases,ref ] GC(7) Balance queues: 0.0ms
[1.543s][debug][gc,phases,ref ] GC(7) Phase1: 0.3ms
[1.543s][trace][gc,phases,ref ] GC(7) Process lists (ms) Min: 1541.3, Avg: 1541.3, Max: 1541.3, Diff: 0.0, Sum: 35450.0, Workers: 23
[1.543s][debug][gc,phases,ref ] GC(7) Phase2: 0.2ms
[1.543s][trace][gc,phases,ref ] GC(7) Process lists (ms) Min: 1541.5, Avg: 1541.5, Max: 1541.5, Diff: 0.0, Sum: 35454.5, Workers: 23
[1.543s][debug][gc,phases,ref ] GC(7) Phase3: 0.3ms
[1.543s][trace][gc,phases,ref ] GC(7) Process lists (ms) Min: 1541.7, Avg: 1541.8, Max: 1541.8, Diff: 0.0, Sum: 35460.5, Workers: 23
[1.543s][debug][gc,phases,ref ] GC(7) Discovered: 0
[1.543s][debug][gc,phases,ref ] GC(7) Cleared: 0
...
[1.543s][debug][gc,phases,ref ] GC(7) Reference Enqueuing 0.1ms
[1.543s][trace][gc,phases,ref ] GC(7) Process lists (ms) Min: 1543.4, Avg: 1543.4, Max: 1543.4, Diff: 0.1, Sum: 35498.4, Workers: 23
[1.543s][debug][gc,phases,ref ] GC(7) Ref Counts: Soft: 0 Weak: 0 Final: 0 Phantom: 0
[1.544s][debug][gc,phases ] GC(7) Reference Processing 2.445ms
[1.544s][debug][gc,phases,start] GC(7) Class Unloading
[1.544s][debug][gc,phases,start] GC(7) ClassLoaderData
[1.544s][debug][gc,phases ] GC(7) ClassLoaderData 0.467ms

2. New logs for G1 young GC: -Xlog:gc+phases*=trace

[1.470s][info ][gc,phases ] GC(6) Post Evacuate Collection Set: 4.1ms
[1.470s][debug][gc,phases ] GC(6) Code Roots Fixup: 0.0ms
[1.470s][debug][gc,phases ] GC(6) Preserve CM Refs: 0.0ms
[1.470s][trace][gc,phases ] GC(6) Parallel Preserve CM Refs (ms): skipped
[1.470s][trace][gc,phases,task] GC(6) - - - - - - - - - - - - - - - - - - - - - - -
[1.470s][debug][gc,phases ] GC(6) Reference Processing: 1.4ms
[1.470s][debug][gc,phases,ref ] GC(6) SoftReference: 0.2ms
[1.470s][debug][gc,phases,ref ] GC(6) Balance queues: 0.0ms
[1.470s][debug][gc,phases,ref ] GC(6) Phase1: 0.2ms
[1.470s][trace][gc,phases,ref ] GC(6) Process lists (ms) Min: 1463.2, Avg: 1463.2, Max: 1463.2, Diff: 0.0, Sum: 33653.2, Workers: 23
[1.470s][debug][gc,phases,ref ] GC(6) Phase2: 0.1ms
[1.470s][trace][gc,phases,ref ] GC(6) Process lists (ms) Min: 1463.3, Avg: 1463.3, Max: 1463.4, Diff: 0.0, Sum: 33656.8, Workers: 23
[1.470s][debug][gc,phases,ref ] GC(6) Phase3: 0.2ms
[1.470s][trace][gc,phases,ref ] GC(6) Process lists (ms) Min: 1463.5, Avg: 1463.5, Max: 1463.6, Diff: 0.0, Sum: 33661.6, Workers: 23
[1.470s][debug][gc,phases,ref ] GC(6) Discovered: 0
[1.470s][debug][gc,phases,ref ] GC(6) Cleared: 0
...
[1.471s][debug][gc,phases ] GC(6) Clear Card Table: 0.0ms
[1.471s][debug][gc,phases ] GC(6) Reference Enqueuing: 0.1ms
[1.471s][debug][gc,phases,ref ] GC(6) Ref Counts: Soft: 0 Weak: 0 Final: 0 Phantom: 0
[1.471s][debug][gc,phases ] GC(6) Merge Per-Thread State: 0.2ms

3. New logs for G1 young GC: -Xlog:gc+phases+ref=trace

[1.335s][debug][gc,phases,ref] GC(4) Reference Processing: 9.4ms <- This is still printed.
[1.335s][debug][gc,phases,ref] GC(4) SoftReference: 7.0ms
[1.335s][debug][gc,phases,ref] GC(4) Balance queues: 0.0ms
[1.335s][debug][gc,phases,ref] GC(4) Phase1: 7.0ms
[1.335s][trace][gc,phases,ref] GC(4) Process lists (ms) Min: 1329.0, Avg: 1329.0, Max: 1329.1, Diff: 0.1, Sum: 30568.1, Workers: 23
[1.335s][debug][gc,phases,ref] GC(4) Phase2: 0.1ms
[1.335s][trace][gc,phases,ref] GC(4) Process lists (ms) Min: 1329.1, Avg: 1329.1, Max: 1329.1, Diff: 0.0, Sum: 30569.7, Workers: 23
[1.335s][debug][gc,phases,ref] GC(4) Phase3: 0.3ms
[1.335s][trace][gc,phases,ref] GC(4) Process lists (ms) Min: 1329.4, Avg: 1329.4, Max: 1329.5, Diff: 0.1, Sum: 30576.7, Workers: 23
[1.335s][debug][gc,phases,ref] GC(4) Discovered: 0
[1.335s][debug][gc,phases,ref] GC(4) Cleared: 0
...
[1.335s][debug][gc,phases,ref] GC(4) Reference Enqueuing: 0.1ms
[1.335s][debug][gc,phases,ref] GC(4) Ref Counts: Soft: 0 Weak: 0 Final: 0 Phantom: 0

[1]: Implementations measure the 'Reference Processing' GCTraceTime differently: some include ReferenceProcessor::enqueue_discovered_references() while others don't, i.e. the former measures all reference-processing related work rather than just process_discovered_references(). As having its own total time seems right to me, I added it [2].

PS) There was some concern about exposing WorkerDataArray into the 'shared' directory. But as there's no alternative to use, I'm hoping to share it now.

Webrev:
http://cr.openjdk.java.net/~sangheki/8173335/webrev.2 (full)
http://cr.openjdk.java.net/~sangheki/8173335/webrev.2_to_1/ (incremental)

Thanks,
Sangheon

> > Thanks, > Sangheon > > >> >> Let me post updated webrev, after making all decision. >> >> Thanks, >> Sangheon >> >> >>> Or the gctimer is >>> passed everywhere. But that is another issue I guess. >>> >>> Thanks, >>> Thomas >>> >>> >>>>> Oh, good!
I had to instrument these by hand when optimizing RP paths. >>>>> >>>>> Comments after brief look: >>>>> >>>>> *) So, the path with NULL executor are also not handling the >>>>> timer? E.g. CMS: >>>>> >>>>> 5262 if (rp->processing_is_mt()) { >>>>> 5263 rp->balance_all_queues(); >>>>> 5264 CMSRefProcTaskExecutor task_executor(*this); >>>>> 5265 rp->enqueue_discovered_references(&task_executor, >>>>> _gc_timer_cm); >>>>> 5266 } else { >>>>> 5267 rp->enqueue_discovered_references(NULL); >>>>> 5268 } >>>> Fixed to use timers for similar cases that you pointed. Thanks for >>>> catching this! >>>> I started this CR as a part of MT ref. processing(JDK-8043575), so I >>>> only added to that path. But this should be fixed. >>>>> >>>>> *) I would leave "Ref Counts" line as usual for compatibility >>>>> reasons. Changing >>>>> it to "Counts" would force GC log parsers to handle that corner >>>>> case too. >>>> Changed, 'Counts -> Ref Counts'. >>>>> >>>>> *) This may reuse Indents? >>>>> >>>>> 95 out->print("%s", " "); >>>> Fixed to use Indents[2]. >>>> >>>>> >>>>> *) Probably makes sense to "hg mv -A" the workerDataArray files >>>>> to preserve the >>>>> Mercurial history -- webrev should say something like "copied from >>>>> ...", IIRC. >>>> Fixed. >>>> >>>> webrev: >>>> http://cr.openjdk.java.net/~sangheki/8173335/webrev.1/ >>>> http://cr.openjdk.java.net/~sangheki/8173335/webrev.1_to_0 >>>> >>>> Thanks, >>>> Sangheon >>>> >>>> >>>>> >>>>> Thanks, >>>>> -Aleksey >>>>> >> > From email.sundarms at gmail.com Wed Jun 21 06:45:09 2017 From: email.sundarms at gmail.com (Sundara Mohan M) Date: Tue, 20 Jun 2017 23:45:09 -0700 Subject: G1OldCSetRegionThresholdPercent under ExperimentalFlag Message-ID: Hi, Can someone shed more light on why the G1OldCSetRegionThresholdPercent flag is experimental (you need to add -XX:+UnlockExperimentalVMOptions to modify it)? Thanks, Sundar From erik.helin at oracle.com Thu Jun 22 08:12:04 2017 From: erik.helin at oracle.com (Erik Helin) Date: Thu, 22 Jun 2017 10:12:04 +0200 Subject: RFR: 8179387: Factor out CMS specific code from GenCollectedHeap into its own subclass In-Reply-To: <3521009f-6fab-4f8e-2375-b9d665a4c70b@redhat.com> References: <3521009f-6fab-4f8e-2375-b9d665a4c70b@redhat.com> Message-ID: Hi Roman, thanks for putting this patch together, it is a great step forward! One thing that (in my mind) would improve it even further is if we embed a GenCollectedHeap in CMSHeap and then make CMSHeap inherit directly from CollectedHeap. With this solution, the definition of CMSHeap would look like something along the lines of:

class CMSHeap : public CollectedHeap {
  WorkGang* _wg;
  GenCollectedHeap _gch;

public:
  CMSHeap(GenCollectorPolicy* policy) :
      _wg(new WorkGang("GC Thread", ParallelGCThreads, true, true)),
      _gch(policy) {
    _wg->initialize_workers();
  }

  // a bunch of "facade" methods
  virtual bool supports_tlab_allocation() const {
    return _gch.supports_tlab_allocation();
  }

  virtual size_t tlab_capacity(Thread* t) const {
    return _gch.tlab_capacity(t);
  }
};

With this approach, you would have to implement a bunch of "facade" methods that just delegate to _gch, such as the methods supports_tlab_allocation and tlab_capacity above. There are two reasons why I prefer this approach: 1. In the end we want CMSHeap to inherit from CollectedHeap anyway :) 2.
It makes it very clear which methods we gradually have to re-implement in CMSHeap to eventually get rid of the _gch field (the end goal). This is much harder to see if CMSHeap inherits from GenCollectedHeap (see more below). The second point will most likely cause some initial problems with `protected` code in GenCollectedHeap. For example, as you noticed when creating this patch, CMSHeap make use of a few `protected` fields and methods from GenCollectedHeap, most notably: - _process_strong_tasks - process_roots() - process_string_table_roots() It would be much better (IMO) to share this code via composition rather than inheritance. In this particular case, I would prefer to create a class StrongRootsProcessor that encapsulates the root processing logic. Then GenCollectedHeap and CMSHeap can both contain an instance of StrongRootsProcessor. What do you think of this approach? Do you have some spare cycles to try this approach out? Thanks, Erik On 06/02/2017 10:55 AM, Roman Kennke wrote: > Take this patch. It #ifdef ASSERT's a call to check_gen_kinds() that is > only present in debug builds. > > > http://cr.openjdk.java.net/~rkennke/8179387/webrev.01/ > > > Roman > > Am 01.06.2017 um 22:50 schrieb Roman Kennke: >> What $SUBJECT says. >> >> I went over genCollectedHeap.[hpp|cpp] and moved everything that I could >> find that is CMS-only into a new CMSHeap class. >> >> http://cr.openjdk.java.net/~rkennke/8179387/webrev.00/ >> >> >> It is possible that I overlooked something there. There may be code in >> there that doesn't shout "CMS" at me, but is still intrinsically CMS stuff. >> >> Also not that I have not removed that little part: >> >> always_do_update_barrier = UseConcMarkSweepGC; >> >> because I expect it to go away with Erik ?'s big refactoring. >> >> What do you think? >> >> Testing: hotspot_gc, specjvm, some little apps with -XX:+UseConcMarkSweepGC >> >> Roman >> > From stefan.karlsson at oracle.com Thu Jun 22 08:59:32 2017 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Thu, 22 Jun 2017 10:59:32 +0200 Subject: RFR: 818269: Remove gcTrace.hpp include from referenceProcessor.hpp In-Reply-To: <30ae52d3-67da-bda4-b25c-e9ed0cb079ee@oracle.com> References: <30ae52d3-67da-bda4-b25c-e9ed0cb079ee@oracle.com> Message-ID: This mail was supposed to go to hotspot-gc-dev (To:ed) not to jdk10-dev (BCC:ed). Thanks, StefanK On 2017-06-22 10:46, Stefan Karlsson wrote: > Hi all, > > Please review this trivial change to remove an include of gcTrace.hpp in > referenceProcessor.hpp, and changes needed to get the code to compile > after that. > > http://cr.openjdk.java.net/~stefank/8182696/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8182696 > > I was prototyping ways to get more type safe time durations in HotSpot > and found that whenever I changed my header file, that almost all > HotSpot cpp files were recompiled. I tracked it down to come from the > unused include of gcTrace.hpp in referenceProcessor.hpp. > > We could probably also try to figure out why changes > referenceProcessor.hpp triggers recompiles of the entire source code, > but I'd like to leave that exercise for another day. > > Thanks, > StefanK From rkennke at redhat.com Thu Jun 22 08:59:53 2017 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 22 Jun 2017 10:59:53 +0200 Subject: RFR: 8179387: Factor out CMS specific code from GenCollectedHeap into its own subclass In-Reply-To: References: <3521009f-6fab-4f8e-2375-b9d665a4c70b@redhat.com> Message-ID: That sounds like a good idea. I'll give it a try. 
Roman > Hi Roman, > > thanks for putting this patch together, it is a great step forward! One > thung that (in my mind) would improve it even further is if we embed a > GenCollectedHeap in CMSHeap and then make CMSHeap inherit directly from > CollectedHeap. > > With this solution, the definition of CMSHeap would look like something > along the lines of: > > class CMSHeap : public CollectedHeap { > WorkGang* _wg; > GenCollectedHeap _gch; > > public: > CMSHeap(GenCollectorPolicy* policy) : > _wg(new WorkGang("GC Thread", ParallelGCThreads, true, true), > _gch(policy) { > _wg->initialize_workers(); > } > > // a bunch of "facade" methods > virtual bool supports_tlab_allocation() const { > return _gch->supports_tlab_allocation(); > } > > virtual size_t tlab_capacity(Thread* t) const { > return _gch->tlab_capacity(t); > } > }; > > With this approach, you would have to implement a bunch of "facade" > methods that just delegates to _gch, such as the methods > supports_tlab_allocation and tlab_capacity above. There are two reasons > why I prefer this approach: > 1. In the end we want CMSHeap to inherit from CollectedHeap anyway :) > 2. It makes it very clear which methods we gradually have to > re-implement in CMSHeap to eventually get rid of the _gch field (the > end goal). This is much harder to see if CMSHeap inherits from > GenCollectedHeap (see more below). > > The second point will most likely cause some initial problems with > `protected` code in GenCollectedHeap. For example, as you noticed when > creating this patch, CMSHeap make use of a few `protected` fields and > methods from GenCollectedHeap, most notably: > - _process_strong_tasks > - process_roots() > - process_string_table_roots() > > It would be much better (IMO) to share this code via composition rather > than inheritance. In this particular case, I would prefer to create a > class StrongRootsProcessor that encapsulates the root processing logic. > Then GenCollectedHeap and CMSHeap can both contain an instance of > StrongRootsProcessor. > > What do you think of this approach? Do you have some spare cycles to try > this approach out? > > Thanks, > Erik > > On 06/02/2017 10:55 AM, Roman Kennke wrote: >> Take this patch. It #ifdef ASSERT's a call to check_gen_kinds() that is >> only present in debug builds. >> >> >> http://cr.openjdk.java.net/~rkennke/8179387/webrev.01/ >> >> >> Roman >> >> Am 01.06.2017 um 22:50 schrieb Roman Kennke: >>> What $SUBJECT says. >>> >>> I went over genCollectedHeap.[hpp|cpp] and moved everything that I could >>> find that is CMS-only into a new CMSHeap class. >>> >>> http://cr.openjdk.java.net/~rkennke/8179387/webrev.00/ >>> >>> >>> It is possible that I overlooked something there. There may be code in >>> there that doesn't shout "CMS" at me, but is still intrinsically CMS stuff. >>> >>> Also not that I have not removed that little part: >>> >>> always_do_update_barrier = UseConcMarkSweepGC; >>> >>> because I expect it to go away with Erik ?'s big refactoring. >>> >>> What do you think? 
>>> >>> Testing: hotspot_gc, specjvm, some little apps with -XX:+UseConcMarkSweepGC >>> >>> Roman >>> From stefan.karlsson at oracle.com Thu Jun 22 09:16:45 2017 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Thu, 22 Jun 2017 11:16:45 +0200 Subject: RFR: 8178490: Usages of is_object_aligned with pointers are broken Message-ID: Hi all, Please review this patch to fix and strengthen is_object_aligned checks when pointers are passed in: http://cr.openjdk.java.net/~stefank/8178490/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8178490 is_object_aligned only works correctly for sizes measured in words. When a pointer is passed into:

inline bool is_object_aligned(intptr_t addr) {
  return addr == align_object_size(addr);
}

inline intptr_t align_object_size(intptr_t size) {
  return align_size_up(size, MinObjAlignment);
}

the pointer is incorrectly interpreted as a word size and the alignment is checked against MinObjAlignment instead of MinObjAlignmentInBytes. Tested with JPRT together with different patches for: 8178489 Make align functions more type safe and consistent Thanks, StefanK From thomas.schatzl at oracle.com Thu Jun 22 09:44:59 2017 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Thu, 22 Jun 2017 11:44:59 +0200 Subject: RFR: 8182696: Remove gcTrace.hpp include from referenceProcessor.hpp In-Reply-To: <30ae52d3-67da-bda4-b25c-e9ed0cb079ee@oracle.com> References: <30ae52d3-67da-bda4-b25c-e9ed0cb079ee@oracle.com> Message-ID: <1498124699.2831.18.camel@oracle.com> Hi, On Thu, 2017-06-22 at 10:46 +0200, Stefan Karlsson wrote: > Hi all, > > Please review this trivial change to remove an include of gcTrace.hpp > in referenceProcessor.hpp, and changes needed to get the code to > compile after that. > > http://cr.openjdk.java.net/~stefank/8182696/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8182696 ship it. Thomas From thomas.schatzl at oracle.com Thu Jun 22 10:00:12 2017 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Thu, 22 Jun 2017 12:00:12 +0200 Subject: RFR: 8178490: Usages of is_object_aligned with pointers are broken In-Reply-To: References: Message-ID: <1498125612.2831.19.camel@oracle.com> Hi Stefan, On Thu, 2017-06-22 at 11:16 +0200, Stefan Karlsson wrote: > Hi all, > > Please review this patch to fix and strengthen is_object_aligned > checks when pointers are passed in: > > http://cr.openjdk.java.net/~stefank/8178490/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8178490 > > is_object_aligned only works correctly for sizes measured in words. > looks good. Thomas From thomas.schatzl at oracle.com Thu Jun 22 10:18:19 2017 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Thu, 22 Jun 2017 12:18:19 +0200 Subject: G1OldCSetRegionThresholdPercent under ExperimentalFlag In-Reply-To: References: Message-ID: <1498126699.2831.29.camel@oracle.com> Hi, On Tue, 2017-06-20 at 23:45 -0700, Sundara Mohan M wrote: > Hi, > Can someone shed more light on why G1OldCSetRegionThresholdPercent > flag is under experimental (Need to add - > XX:+UnlockExperimentalVMOptions to modify it.) in my view, -XX:+UnlockExperimentalVMOptions mostly serves as a "I really want to do that and I know what I am doing" confirmation from the user that he is aware that using this option (in this case, to influence the set of regions taken in during mixed gc) might lead to surprising behavior. Also, I think there has been no official documentation for it - also because it should be very rarely needed.
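(For reference: an experimental option has to be unlocked before it can be set, and the unlock flag must precede it on the command line. With a purely illustrative value:

java -XX:+UnlockExperimentalVMOptions -XX:G1OldCSetRegionThresholdPercent=15 ...

)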
In particular, I am curious about the case when it would be useful to change it. Could you give some log files showing that there is an issue with the upper bound for the number of old gen regions to take during GC? (i.e. the amount of old gen regions taken is too small and there is ample pause time left and it matters to clean up more regions in a single mixed gc?) Sometimes there are problems with the lower bound that is controlled by the -XX:G1MixedGCCountTarget (product level) option. Hth, ? Thomas From thomas.schatzl at oracle.com Thu Jun 22 10:44:09 2017 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Thu, 22 Jun 2017 12:44:09 +0200 Subject: RFR (7xS): 8175554: Improve G1UpdateRSOrPushRefClosure In-Reply-To: <1497945947.2784.6.camel@oracle.com> References: <1491910205.2754.31.camel@oracle.com> <1493986396.2777.61.camel@oracle.com> <3f492193-275d-3d4e-1a91-2d7e07fdaafb@oracle.com> <1497945947.2784.6.camel@oracle.com> Message-ID: <1498128249.2831.38.camel@oracle.com> Hi all, ? after discussion with Erik, I removed one comment, and renamed the closures to something that resembles their use. Also I had to reintroduce the G1ParPushRefClosure removed in the initial patch due to performance regressions. G1UpdateOrScanRSClosure -> G1ScanObjsDuringUpdateRSClosure G1ParPushRefClosure -> G1ScanObjsDuringScanRSClosure G1ParScanClosure -> G1ScanEvacuatedObjClosure We also found that the mechanism to collect cards that contain references into the collection set to not lose any remembered set entries during update RS if there is an evacuation failure is basically superfluous. Other, existing mechanism make sure that all required remembered sets are (re-)created in other stages of the GC. Removal of this code has been decided to be out of scope here. Webrev: http://cr.openjdk.java.net/~tschatzl/8175554/webrev.1_to_2/?(diff) http://cr.openjdk.java.net/~tschatzl/8175554/webrev.2/?(full) Testing: jprt, local testing Thanks, ? Thomas On Tue, 2017-06-20 at 10:05 +0200, Thomas Schatzl wrote: > Hi Sangheon, others, > > On Tue, 2017-05-30 at 15:15 -0700, sangheon wrote: > > > > Hi Thomas, > > > > On 05/05/2017 05:13 AM, Thomas Schatzl wrote: > > > > > > > > > Hi all, > > > > > > ???recent reviews have made changes necessary to parts of the > > > changeset chain. > > > > > > Here is a list of links to updated webrevs. Since they have > > > apparently not been reviewed yet, I simply overwrote the old > > > webrevs. > > > > > > JDK-8177044: Remove _scan_top from HeapRegion > > > http://cr.openjdk.java.net/~tschatzl/8177044/webrev/ > > > > > > JDK-8178148: Log more detailed information about scan rs phase > > > http://cr.openjdk.java.net/~tschatzl/8178148/webrev/ > > > > > > JDK-8175554: Improve G1UpdateRSOrPushRefClosure > > > http://cr.openjdk.java.net/~tschatzl/8175554/webrev/ > > Looks good to me. > > I only have minor nits. > > > > ------------------------------------------------------ > > src/share/vm/gc/g1/g1OopClosures.hpp > > ???78???virtual void do_oop(oop* p) { do_oop_nv(p); } > > Misaligned with above line. > > > > ------------------------------------------------------ > > src/share/vm/gc/g1/g1RemSet.hpp > > ? 204???????????????????G1UpdateOrScanRSClosure* push_heap_cl, > > Rename to reflect new closure name? > > > > ------------------------------------------------------ > > src/share/vm/gc/g1/g1RootProcessor.hpp > > Copyright update. 
> > > > ------------------------------------------------------ > > src/share/vm/gc/g1/g1_specialized_oop_closures.hpp > > ???45???????f(G1UpdateOrScanRSClosure,_nv)?????????\ > > Misaligned '\'. > > > ? I fixed all this in addition to incorporating ErikD's comments that > asked for factoring out two parts of the G1ParScanClosure and > G1UpdateOrScanRSClosure that were equal now. > > I did some performance testing again due to that, and also found that > the check to filter out non-cross-region references > in?G1UpdateOrScanRSClosure::do_oop_nv() seemed faster, so I also > reverted it to the old code. > > Also in this change G1UpdateOrScanRSClosure::do_oop_nv() did not > update > _has_refs_into_cset as before. Fixed that as well. > > Thanks, > ? Thomas > From stefan.karlsson at oracle.com Thu Jun 22 13:19:46 2017 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Thu, 22 Jun 2017 15:19:46 +0200 Subject: RFR: 8178490: Usages of is_object_aligned with pointers are broken In-Reply-To: <1498125612.2831.19.camel@oracle.com> References: <1498125612.2831.19.camel@oracle.com> Message-ID: Thanks, Thomas. StefanK On 2017-06-22 12:00, Thomas Schatzl wrote: > Hi Stefan, > > On Thu, 2017-06-22 at 11:16 +0200, Stefan Karlsson wrote: >> Hi all, >> >> Please review this patch to fix and strengthen is_object_aligned >> checks >> when pointers are passed in: >> >> http://cr.openjdk.java.net/~stefank/8178490/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8178490 >> >> is_object_aligned only works correctly for sizes measured in words. >> > looks good. > > Thomas > From stefan.karlsson at oracle.com Thu Jun 22 13:20:10 2017 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Thu, 22 Jun 2017 15:20:10 +0200 Subject: RFR: 818269: Remove gcTrace.hpp include from referenceProcessor.hpp In-Reply-To: <1498124699.2831.18.camel@oracle.com> References: <30ae52d3-67da-bda4-b25c-e9ed0cb079ee@oracle.com> <1498124699.2831.18.camel@oracle.com> Message-ID: <3994480b-f286-9f35-0189-413600605c89@oracle.com> Thanks, Thomas. StefanK On 2017-06-22 11:44, Thomas Schatzl wrote: > Hi, > > On Thu, 2017-06-22 at 10:46 +0200, Stefan Karlsson wrote: >> Hi all, >> >> Please review this trivial change to remove an include of gcTrace.hpp >> in >> referenceProcessor.hpp, and changes needed to get the code to >> compile >> after that. >> >> http://cr.openjdk.java.net/~stefank/8182696/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8182696 > ship it. > > Thomas From kim.barrett at oracle.com Thu Jun 22 15:19:28 2017 From: kim.barrett at oracle.com (Kim Barrett) Date: Thu, 22 Jun 2017 11:19:28 -0400 Subject: RFR: 818269: Remove gcTrace.hpp include from referenceProcessor.hpp In-Reply-To: References: <30ae52d3-67da-bda4-b25c-e9ed0cb079ee@oracle.com> Message-ID: <086C5847-9FD6-4B8E-BE64-913BB87D3F23@oracle.com> > On Jun 22, 2017, at 4:59 AM, Stefan Karlsson wrote: > > This mail was supposed to go to hotspot-gc-dev (To:ed) not to jdk10-dev (BCC:ed). > > Thanks, > StefanK > > On 2017-06-22 10:46, Stefan Karlsson wrote: >> Hi all, >> Please review this trivial change to remove an include of gcTrace.hpp in referenceProcessor.hpp, and changes needed to get the code to compile after that. >> http://cr.openjdk.java.net/~stefank/8182696/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8182696 >> I was prototyping ways to get more type safe time durations in HotSpot and found that whenever I changed my header file, that almost all HotSpot cpp files were recompiled. 
>> I tracked it down to the unused include of gcTrace.hpp in referenceProcessor.hpp.
>> We could probably also try to figure out why changes to referenceProcessor.hpp trigger recompiles of the entire source code, but I'd like to leave that exercise for another day.
>> Thanks,
>> StefanK

Looks good.

There's potential for interaction between this and 8181449, but we can sort that out if it happens.

From stefan.karlsson at oracle.com Thu Jun 22 15:40:22 2017
From: stefan.karlsson at oracle.com (Stefan Karlsson)
Date: Thu, 22 Jun 2017 17:40:22 +0200
Subject: RFR: 8182696: Remove gcTrace.hpp include from referenceProcessor.hpp
In-Reply-To: <086C5847-9FD6-4B8E-BE64-913BB87D3F23@oracle.com>
References: <30ae52d3-67da-bda4-b25c-e9ed0cb079ee@oracle.com> <086C5847-9FD6-4B8E-BE64-913BB87D3F23@oracle.com>
Message-ID: <2d2c3eb8-b5f7-e258-e07b-26fd20f1cd9c@oracle.com>

On 2017-06-22 17:19, Kim Barrett wrote:
>> On Jun 22, 2017, at 4:59 AM, Stefan Karlsson wrote:
>>
>> This mail was supposed to go to hotspot-gc-dev (To:ed) not to jdk10-dev (BCC:ed).
>>
>> Thanks,
>> StefanK
>>
>> On 2017-06-22 10:46, Stefan Karlsson wrote:
>>> Hi all,
>>> Please review this trivial change to remove an include of gcTrace.hpp in referenceProcessor.hpp, and the changes needed to get the code to compile after that.
>>> http://cr.openjdk.java.net/~stefank/8182696/webrev.00/
>>> https://bugs.openjdk.java.net/browse/JDK-8182696
>>> I was prototyping ways to get more type-safe time durations in HotSpot and found that whenever I changed my header file, almost all HotSpot cpp files were recompiled. I tracked it down to the unused include of gcTrace.hpp in referenceProcessor.hpp.
>>> We could probably also try to figure out why changes to referenceProcessor.hpp trigger recompiles of the entire source code, but I'd like to leave that exercise for another day.
>>> Thanks,
>>> StefanK
> Looks good.
>
> There's potential for interaction between this and 8181449, but we can sort that out if it happens.

Thanks, Kim. I'll wait until your change has been pushed, and will resolve any conflicts.

StefanK

From email.sundarms at gmail.com Thu Jun 22 16:49:16 2017
From: email.sundarms at gmail.com (Sundara Mohan M)
Date: Thu, 22 Jun 2017 09:49:16 -0700
Subject: G1OldCSetRegionThresholdPercent under ExperimentalFlag
In-Reply-To: <1498126699.2831.29.camel@oracle.com>
References: <1498126699.2831.29.camel@oracle.com>
Message-ID: 

Hi Thomas,
Thanks for the explanation.

I was trying to debug why it is not including some old regions even though it had ~100 ms left (though the Ergo logs say it has accommodated all the regions it could within the given 500 ms). Adding some log snippets here, and attaching the entire logs in case that helps.

Running the app with a 31G heap.

CommandLine flags: -XX:GCLogFileSize=20971520 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=out-of-memory-heap-dump -XX:InitialHeapSize=33285996544 -XX:MaxGCPauseMillis=500 -XX:MaxHeapSize=33285996544 -XX:MetaspaceSize=536870912 -XX:NumberOfGCLogFiles=20 -XX:+ParallelRefProcEnabled -XX:+PrintAdaptiveSizePolicy -XX:+PrintGC -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseG1GC -XX:+UseGCLogFileRotation -XX:+UseStringDeduplication

...
2017-06-19T22:54:05.488+0000: 9345.322: [GC pause (G1 Evacuation Pause) (mixed) Desired survivor size 104857600 bytes, new threshold 1 (max 15) - age 1: 131296848 bytes, 131296848 total - age 2: 237559952 bytes, 368856800 total - age 3: 137259376 bytes, 506116176 total 9345.322: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 130042, predicted base time: 171.58 ms, remaining time: 328.42 ms, target pause time: 500.00 ms] 9345.322: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 121 regions, survivors: 77 regions, predicted young region time: 249.33 ms] * 9345.322: [G1Ergonomics (CSet Construction) finish adding old regions to CSet, reason: predicted time is too high, predicted time: 0.44 ms, remaining time: 0.00 ms, old: 204 regions, min: 204 regions]* 9345.322: [G1Ergonomics (CSet Construction) added expensive regions to CSet, reason: old CSet region num not reached min, old: 204 regions, expensive: 11 regions, min: 204 regions, remaining time: 0.00 ms] 9345.322: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 121 regions, survivors: 77 regions, old: 204 regions, predicted pause time: 504.35 ms, target pause time: 500.00 ms] 9345.691: [G1Ergonomics (Mixed GCs) continue mixed GCs, reason: candidate old regions available, candidate old regions: 1425 regions, reclaimable: 11364516952 bytes (34.14 %), threshold: 5.00 %] , 0.3691404 secs] [Parallel Time: 301.4 ms, GC Workers: 13] [GC Worker Start (ms): Min: 9345323.0, Avg: 9345323.3, Max: 9345323.6, Diff: 0.6] [Ext Root Scanning (ms): Min: 0.9, Avg: 1.2, Max: 1.6, Diff: 0.6, Sum: 15.9] [Update RS (ms): Min: 62.1, Avg: 62.3, Max: 63.0, Diff: 0.9, Sum: 809.4] [Processed Buffers: Min: 35, Avg: 51.8, Max: 91, Diff: 56, Sum: 674] [Scan RS (ms): Min: 11.3, Avg: 12.1, Max: 14.8, Diff: 3.6, Sum: 157.5] [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1] [Object Copy (ms): Min: 222.2, Avg: 224.8, Max: 225.3, Diff: 3.1, Sum: 2922.8] [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.4] [Termination Attempts: Min: 1, Avg: 15.6, Max: 24, Diff: 23, Sum: 203] [GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.2, Sum: 1.1] [GC Worker Total (ms): Min: 300.3, Avg: 300.6, Max: 300.8, Diff: 0.5, Sum: 3907.2] [GC Worker End (ms): Min: 9345623.8, Avg: 9345623.9, Max: 9345624.0, Diff: 0.2] [Code Root Fixup: 0.1 ms] [Code Root Purge: 0.0 ms] [String Dedup Fixup: 43.9 ms, GC Workers: 13] [Queue Fixup (ms): Min: 0.4, Avg: 2.2, Max: 3.7, Diff: 3.3, Sum: 28.6] [Table Fixup (ms): Min: 39.8, Avg: 41.2, Max: 42.9, Diff: 3.2, Sum: 535.8] [Clear CT: 3.4 ms] [Other: 20.2 ms] [Choose CSet: 0.3 ms] [Ref Proc: 13.4 ms] [Ref Enq: 1.0 ms] [Redirty Cards: 2.0 ms] [Humongous Register: 0.2 ms] [Humongous Reclaim: 0.1 ms] [Free CSet: 2.1 ms] [Eden: 968.0M(968.0M)->0.0B(1472.0M) Survivors: 616.0M->112.0M Heap: 15.6G(31.0G)->13.1G(31.0G)] * [Times: user=4.53 sys=0.00, real=0.36 secs]* .... 
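(To make the CSet ergonomics of the snippet above concrete - this arithmetic is an annotation, not part of the log: the target pause is 500.00 ms and the predicted base time 171.58 ms, leaving 500.00 - 171.58 = 328.42 ms, exactly the "remaining time" printed. The 121 eden + 77 survivor = 198 young regions are predicted to take 249.33 ms, leaving 328.42 - 249.33 = 79.09 ms for old regions. That budget runs out before the minimum of 204 old regions is reached, so the last 11 regions are added as "expensive" regions beyond the budget, which is why the predicted pause ends up at 504.35 ms, slightly above the 500 ms target.)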
2017-06-19T22:54:47.655+0000: 9387.489: [GC pause (G1 Evacuation Pause) (mixed) Desired survivor size 104857600 bytes, new threshold 15 (max 15) - age 1: 31749256 bytes, 31749256 total 9387.489: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 127449, predicted base time: 168.88 ms, remaining time: 331.12 ms, target pause time: 500.00 ms] 9387.489: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 184 regions, survivors: 14 regions, predicted young region time: 62.79 ms] * 9387.490: [G1Ergonomics (CSet Construction) finish adding old regions to CSet, reason: old CSet region num reached max, old: 397 regions, max: 397 regions]* 9387.490: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 184 regions, survivors: 14 regions, old: 397 regions, predicted pause time: 390.18 ms, target pause time: 500.00 ms] * 9387.659: [G1Ergonomics (Mixed GCs) continue mixed GCs, reason: candidate old regions available, candidate old regions: 1028 regions, reclaimable: 8047410104 bytes (24.18 %), threshold: 5.00 %]* *, 0.1700662 secs]* [Parallel Time: 101.4 ms, GC Workers: 13] [GC Worker Start (ms): Min: 9387490.4, Avg: 9387490.8, Max: 9387491.1, Diff: 0.6] [Ext Root Scanning (ms): Min: 0.7, Avg: 1.1, Max: 1.6, Diff: 0.9, Sum: 14.3] [Update RS (ms): Min: 27.0, Avg: 27.8, Max: 28.9, Diff: 1.8, Sum: 361.9] [Processed Buffers: Min: 34, Avg: 51.4, Max: 88, Diff: 54, Sum: 668] [Scan RS (ms): Min: 25.8, Avg: 27.1, Max: 27.4, Diff: 1.6, Sum: 352.2] [Code Root Scanning (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.2, Sum: 0.7] [Object Copy (ms): Min: 42.8, Avg: 43.8, Max: 44.5, Diff: 1.8, Sum: 569.9] [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1] [Termination Attempts: Min: 1, Avg: 9.5, Max: 14, Diff: 13, Sum: 124] [GC Worker Other (ms): Min: 0.0, Avg: 0.2, Max: 0.4, Diff: 0.4, Sum: 2.3] [GC Worker Total (ms): Min: 99.7, Avg: 100.1, Max: 100.6, Diff: 0.9, Sum: 1301.4] [GC Worker End (ms): Min: 9387590.7, Avg: 9387590.9, Max: 9387591.1, Diff: 0.4] [Code Root Fixup: 0.3 ms] [Code Root Purge: 0.0 ms] [String Dedup Fixup: 43.5 ms, GC Workers: 13] [Queue Fixup (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0] [Table Fixup (ms): Min: 43.0, Avg: 43.2, Max: 43.4, Diff: 0.3, Sum: 561.3] [Clear CT: 3.9 ms] [Other: 21.1 ms] [Choose CSet: 0.8 ms] [Ref Proc: 12.8 ms] [Ref Enq: 0.9 ms] [Redirty Cards: 0.9 ms] [Humongous Register: 0.2 ms] [Humongous Reclaim: 0.1 ms] [Free CSet: 4.2 ms] [Eden: 1472.0M(1472.0M)->0.0B(1424.0M) Survivors: 112.0M->160.0M Heap: 14.5G(31.0G)->10.1G(31.0G)] * [Times: user=1.93 sys=0.00, real=0.17 secs]* ..... 
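(A back-of-envelope check on where "min: 204 regions" and "max: 397 regions" come from; the region size is an inference, not something printed in the log: for a ~31 GiB heap G1 would pick an 8 MB region size (heap size divided by the 2048-region target, rounded down to a power of two), giving roughly 3968 regions. The default G1OldCSetRegionThresholdPercent=10 then caps a single mixed collection at ceil(3968 * 0.10) = 397 old regions, matching the "old CSet region num reached max" line above. The minimum is roughly the number of candidate regions at the end of marking divided by G1MixedGCCountTarget (default 8), e.g. around 1630 / 8 = 204.)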
2017-06-19T22:55:29.656+0000: 9429.490: [GC pause (G1 Evacuation Pause) (mixed) Desired survivor size 104857600 bytes, new threshold 15 (max 15) - age 1: 44204040 bytes, 44204040 total - age 2: 31422896 bytes, 75626936 total 9429.490: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 64391, predicted base time: 130.82 ms, remaining time: 369.18 ms, target pause time: 500.00 ms] 9429.490: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 178 regions, survivors: 20 regions, predicted young region time: 69.26 ms] * 9429.491: [G1Ergonomics (CSet Construction) finish adding old regions to CSet, reason: predicted time is too high, predicted time: 2.12 ms, remaining time: 0.00 ms, old: 204 regions, min: 204 regions]* 9429.491: [G1Ergonomics (CSet Construction) added expensive regions to CSet, reason: old CSet region num not reached min, old: 204 regions, expensive: 72 regions, min: 204 regions, remaining time: 0.00 ms] 9429.491: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 178 regions, survivors: 20 regions, old: 204 regions, predicted pause time: 684.25 ms, target pause time: 500.00 ms] 9429.663: [G1Ergonomics (Mixed GCs) continue mixed GCs, reason: candidate old regions available, candidate old regions: 824 regions, reclaimable: 6351099672 bytes (19.08 %), threshold: 5.00 %] , 0.1729571 secs] [Parallel Time: 102.6 ms, GC Workers: 13] [GC Worker Start (ms): Min: 9429491.3, Avg: 9429491.6, Max: 9429491.9, Diff: 0.6] [Ext Root Scanning (ms): Min: 0.9, Avg: 1.3, Max: 1.8, Diff: 0.9, Sum: 16.9] [Update RS (ms): Min: 18.7, Avg: 19.1, Max: 20.9, Diff: 2.2, Sum: 248.9] [Processed Buffers: Min: 18, Avg: 32.6, Max: 58, Diff: 40, Sum: 424] [Scan RS (ms): Min: 15.5, Avg: 17.1, Max: 18.5, Diff: 2.9, Sum: 222.8] [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.5] [Object Copy (ms): Min: 62.3, Avg: 63.9, Max: 64.4, Diff: 2.2, Sum: 831.3] [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1] [Termination Attempts: Min: 1, Avg: 2.6, Max: 5, Diff: 4, Sum: 34] [GC Worker Other (ms): Min: 0.0, Avg: 0.2, Max: 0.4, Diff: 0.4, Sum: 2.2] [GC Worker Total (ms): Min: 101.4, Avg: 101.7, Max: 102.1, Diff: 0.7, Sum: 1322.7] [GC Worker End (ms): Min: 9429593.3, Avg: 9429593.4, Max: 9429593.6, Diff: 0.4] [Code Root Fixup: 0.2 ms] [Code Root Purge: 0.0 ms] [String Dedup Fixup: 45.4 ms, GC Workers: 13] [Queue Fixup (ms): Min: 0.0, Avg: 0.1, Max: 0.5, Diff: 0.5, Sum: 1.5] [Table Fixup (ms): Min: 43.9, Avg: 44.1, Max: 44.2, Diff: 0.4, Sum: 573.4] [Clear CT: 4.3 ms] [Other: 20.5 ms] [Choose CSet: 0.5 ms] [Ref Proc: 14.3 ms] [Ref Enq: 1.2 ms] [Redirty Cards: 0.7 ms] [Humongous Register: 0.2 ms] [Humongous Reclaim: 0.1 ms] [Free CSet: 2.4 ms] [Eden: 1424.0M(1424.0M)->0.0B(1392.0M) Survivors: 160.0M->192.0M Heap: 11.5G(31.0G)->8796.0M(31.0G)] * [Times: user=1.95 sys=0.00, real=0.17 secs]* On Thu, Jun 22, 2017 at 3:18 AM, Thomas Schatzl wrote: > Hi, > > On Tue, 2017-06-20 at 23:45 -0700, Sundara Mohan M wrote: > > Hi, > > Can someone shed more light on why G1OldCSetRegionThresholdPercent > > flag is under experimental (Need to add - > > XX:+UnlockExperimentalVMOptions to modify it.) > > in my view -XX:+UnlockExperimentalVMOptions mostly serves mostly as a > "I really want to do that and I know what I am doing" confirmation from > the user that he is aware that using this (in this case) option to > influence the set of regions taken in during mixed gc you might get > surprising behavior. 
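(For completeness, the unlock option Thomas mentions above has to precede the experimental flag on the command line; a sketch, where the value 20 is purely illustrative and not a recommendation:

java -XX:+UseG1GC -XX:MaxGCPauseMillis=500 -XX:+UnlockExperimentalVMOptions -XX:G1OldCSetRegionThresholdPercent=20 ... MyApp
)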
> > Also, I think there has been no official documentation for it - also
> because it should be very rarely needed.
> In particular, I am curious about the case when it would be useful to
> change it. Could you give some log files showing that there is an issue
> with the upper bound for the number of old gen regions to take during
> GC? (i.e. the amount of old gen regions taken is too small and there is
> ample pause time left and it matters to clean up more regions in a
> single mixed gc?)
>
> Sometimes there are problems with the lower bound that is controlled by
> the -XX:G1MixedGCCountTarget (product level) option.
>
> Hth,
> Thomas

From rkennke at redhat.com Thu Jun 22 20:19:31 2017
From: rkennke at redhat.com (Roman Kennke)
Date: Thu, 22 Jun 2017 22:19:31 +0200
Subject: RFR: Parallelize safepoint cleanup
In-Reply-To: <5e7c7d00-4acd-bea3-3525-33dbd9159efb@oracle.com>
References: <486b5a72-bef8-4ebc-2729-3fe3aa3ab3b9@oracle.com> <5c80f8df-27c9-f9a9-dc6d-47f9c6019a61@redhat.com> <46ad874e-eb41-7927-265a-40dea92dfe1e@oracle.com> <5711258b-99b0-e06f-ba6e-0b6b55d88345@redhat.com> <0e1e2779-9316-b756-6cc8-e0c8add14a94@oracle.com> <1910961c-11bd-0e86-dd03-4fce66b9969f@redhat.com> <2b466176-b688-53a8-bef9-c7ec2c8c745b@oracle.com> <42872a15-d26c-9798-c6a2-f3f7c945baf7@redhat.com> <5e7c7d00-4acd-bea3-3525-33dbd9159efb@oracle.com>
Message-ID: <6f2c6de7-298b-bf14-ab1f-430c4acd43c9@redhat.com>

So here's the latest iteration of that patch:

http://cr.openjdk.java.net/~rkennke/8180932/webrev.08/

I checked and fixed all the counters. The problem here is that they are not updated in a single place (deflate_idle_monitors()) but in several places, potentially by multiple threads. I split up deflation into prepare_... and finish_... methods that initialize the local counters and update the global counters, respectively, and pass around a counters object (allocated on the stack) to the various code paths that use it. Updating the counters always happens under a lock, so there's no need to do anything special with regards to concurrency.

I also checked the nmethod marking, but there doesn't seem to be anything in that code that looks problematic under concurrency. The worst that can happen is that two threads write the same value into an nmethod field. I think we can live with that ;-)

Good to go?

Tested by running specjvm and jcstress fastdebug+release without issues.

Roman

Am 02.06.2017 um 12:39 schrieb Robbin Ehn:
> Hi Roman,
>
> On 06/02/2017 11:41 AM, Roman Kennke wrote:
>> Hi David,
>> thanks for reviewing. I'll be on vacation the next two weeks too, with
>> only sporadic access to work stuff.
>> Yes, exposure will not be as good as otherwise, but it's not totally
>> untested either: the serial code path is the same as the parallel, the
>> only difference is that it's not actually called by multiple threads.
>> It's ok I think.
>>
>> I found two more issues that I think should be addressed:
>> - There are some counters in deflate_idle_monitors() and I'm not sure I
>> correctly handle them in the split-up and MT'ed thread-local/ global
>> list deflation
>> - nmethod marking seems to unconditionally poke true or something like
>> that in nmethod fields.
This doesn't hurt correctness-wise, but it's >> probably worth checking if it's already true, especially when doing this >> with multiple threads concurrently. >> >> I'll send an updated patch around later, I hope I can get to it today... > > I'll review that when you get it out. > I think this looks as a reasonable step before we tackle this with a > major effort, such as the JEP you and Carsten doing. > And another effort to 'fix' nmethods marking. > > Internal discussion yesterday lead us to conclude that the runtime > will probably need more threads. > This would be a good driver to do a 'global' worker pool which serves > both gc, runtime and safepoints with threads. > >> >> Roman >> >>> Hi Roman, >>> >>> I am about to disappear on an extended vacation so will let others >>> pursue this. IIUC this is longer an opt-in by the user at runtime, but >>> an opt-in by the particular GC developers. Okay. My only concern with >>> that is if Shenandoah is the only GC that currently opts in then this >>> code is not going to get much testing and will be more prone to >>> incidental breakage. > > As I mentioned before, it seem like Erik ? have some idea, maybe he > can do this after his barrier patch. > > Thanks! > > /Robbin > >>> >>> Cheers, >>> David >>> >>> On 2/06/2017 2:21 AM, Roman Kennke wrote: >>>> Am 01.06.2017 um 17:50 schrieb Roman Kennke: >>>>> Am 01.06.2017 um 14:18 schrieb Robbin Ehn: >>>>>> Hi Roman, >>>>>> >>>>>> On 06/01/2017 11:29 AM, Roman Kennke wrote: >>>>>>> Am 31.05.2017 um 22:06 schrieb Robbin Ehn: >>>>>>>> Hi Roman, I agree that is really needed but: >>>>>>>> >>>>>>>> On 05/31/2017 10:27 AM, Roman Kennke wrote: >>>>>>>>> I realized that sharing workers with GC is not so easy. >>>>>>>>> >>>>>>>>> We need to be able to use the workers at a safepoint during >>>>>>>>> concurrent >>>>>>>>> GC work (which also uses the same workers). This does not only >>>>>>>>> require >>>>>>>>> that those workers be suspended, like e.g. >>>>>>>>> SuspendibleThreadSet::yield(), but they need to be idle, i.e. >>>>>>>>> have >>>>>>>>> finished their tasks. This needs some careful handling to work >>>>>>>>> without >>>>>>>>> races: it requires a SuspendibleThreadSetJoiner around the >>>>>>>>> corresponding >>>>>>>>> run_task() call and also the tasks themselves need to join the >>>>>>>>> STS and >>>>>>>>> handle requests for safepoints not by yielding, but by leaving >>>>>>>>> the >>>>>>>>> task. >>>>>>>>> This is far too peculiar for me to make the call to hook up GC >>>>>>>>> workers >>>>>>>>> for safepoint cleanup, and I thus removed those parts. I left the >>>>>>>>> API in >>>>>>>>> CollectedHeap in place. I think GC devs who know better about G1 >>>>>>>>> and CMS >>>>>>>>> should make that call, or else just use a separate thread pool. >>>>>>>>> >>>>>>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.05/ >>>>>>>>> >>>>>>>>> >>>>>>>>> Is it ok now? >>>>>>>> I still think you should put the "Parallel Safepoint Cleanup" >>>>>>>> workers >>>>>>>> inside Shenandoah, >>>>>>>> so the SafepointSynchronizer only calls get_safepoint_workers, >>>>>>>> e.g.: >>>>>>>> >>>>>>>> _cleanup_workers = heap->get_safepoint_workers(); >>>>>>>> _num_cleanup_workers = _cleanup_workers != NULL ? 
>>>>>>>> _cleanup_workers->total_workers() : 1;
>>>>>>>> ParallelSPCleanupTask cleanup(_cleanup_subtasks);
>>>>>>>> StrongRootsScope srs(_num_cleanup_workers);
>>>>>>>> if (_cleanup_workers != NULL) {
>>>>>>>> _cleanup_workers->run_task(&cleanup, _num_cleanup_workers);
>>>>>>>> } else {
>>>>>>>> cleanup.work(0);
>>>>>>>> }
>>>>>>>>
>>>>>>>> That way you don't even need your new flags, but it will be up to the
>>>>>>>> other GCs to make their worker available
>>>>>>>> or cheat with a separate workgang.
>>>>>>> I can do that, I don't mind. The question is, do we want that?
>>>>>> The problem is that we do not want to haste such decision, we believe
>>>>>> there is a better solution.
>>>>>> I think you also would want another solution.
>>>>>> But it's seems like such solution with 1 'global' thread pool either
>>>>>> own by GC or the VM it self is quite the undertaking.
>>>>>> Since this probably will not be done any time soon my suggestion is,
>>>>>> to not hold you back (we also want this), just to make
>>>>>> the code parallel and as an intermediate step ask the GC if it minds
>>>>>> sharing it's thread.
>>>>>>
>>>>>> Now when Shenandoah is merged it's possible that e.g. G1 will share
>>>>>> the code for a separate thread pool, do something of it's own or
>>>>>> wait until the bigger question about thread pool(s) have been
>>>>>> resolved.
>>>>>>
>>>>>> By adding a thread pool directly to the SafepointSynchronizer and
>>>>>> flags for it we might limit our future options.
>>>>>>
>>>>>>> I wouldn't call it 'cheating with a separate workgang' though. I see
>>>>>>> that both G1 and CMS suspend their worker threads at a safepoint.
>>>>>>> However:
>>>>>> Yes it's not cheating but I want decent heuristics between e.g. number
>>>>>> of concurrent marking threads and parallel safepoint threads since
>>>>>> they compete for cpu time.
>>>>>> As the code looks now, I think that decisions must be made by the GC.
>>>>> Ok, I see your point. I updated the proposed patch accordingly:
>>>>>
>>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.06/
>>>>>
>>>> Oops. Minor mistake there. Correction:
>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.07/
>>>>
>>>> (Removed 'class WorkGang' from safepoint.hpp, and forgot to add it into
>>>> collectedHeap.hpp, resulting in build failure...)
>>>>
>>>> Roman

From thomas.schatzl at oracle.com Thu Jun 22 21:16:19 2017
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Thu, 22 Jun 2017 23:16:19 +0200
Subject: G1OldCSetRegionThresholdPercent under ExperimentalFlag
In-Reply-To: 
References: <1498126699.2831.29.camel@oracle.com>
Message-ID: <1498166179.2710.44.camel@oracle.com>

Hi Sundara,

On Thu, 2017-06-22 at 09:49 -0700, Sundara Mohan M wrote:
> Hi Thomas,
> Thanks for the explanation.
>
> I was trying to debug why it is not including some old region even
> though it had ~100ms (though Ergo logs say it has accommodated all
> regions to cover given 500ms).

Ergo is self-training, but it takes some time to adapt to the situation.

For as long-running a run as the log shows (thanks!), the number of mixed gcs is relatively small, and they are pretty far apart (in the range of hours between mixed gc phases). The distribution of young gc occurrences is far from equal (even considering differences in used young gen size), so it seems that the application is quite bursty from time to time.
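As a rough illustration of why such a self-training predictor lags a bursty workload - this is a made-up Java sketch, not HotSpot's actual cost model, and the weight 0.3 and the sample values are invented:

public class DecayingAverage {
    private final double alpha; // weight given to the newest sample
    private double estimate;
    private boolean seeded;

    DecayingAverage(double alpha) { this.alpha = alpha; }

    void sample(double value) {
        // Exponentially decaying average: older observations fade out gradually.
        estimate = seeded ? (1.0 - alpha) * estimate + alpha * value : value;
        seeded = true;
    }

    double prediction() { return estimate; }

    public static void main(String[] args) {
        DecayingAverage regionCostMs = new DecayingAverage(0.3);
        // Trained during a calm phase (cheap regions), then a burst arrives:
        for (double v : new double[] {0.4, 0.5, 0.4, 2.0, 2.2}) {
            regionCostMs.sample(v);
            System.out.printf("sample=%.1f ms, prediction=%.2f ms%n",
                              v, regionCostMs.prediction());
        }
        // The prediction trails the burst: several samples are needed before
        // the estimated per-region cost reflects the new behavior, which is
        // the kind of lag described in the analysis below.
    }
}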
The different mixed gc/old gen space reclamation phases are never particularly long either, so my best guess would be that the values used for how long particular regions take to evacuate are messed up.

I.e., from the graphs it roughly looks like there is a mixed gc phase at the start of every bursty phase (as far as I could identify them by looking at the graphs), and one during the phase, typically near the end.

So depending on when that mixed gc occurs (at the start of such a burst or within it), g1 trains itself on application behavior that may differ from the behavior it later applies these values to. The prediction is always some kind of moving average, which does not necessarily reflect reality.

Very good adaptation to this behavior seems beyond what g1 can do at the moment.

One could in theory force G1 to give much more weight to recent observations to make adaptation quicker (i.e. change some factors in that average calculation); but there is no user option for that, and it may open a separate can of worms (currently it seems not to discount older observations very eagerly compared to more recent ones, if I read the code correctly).

But that is just something I made up right now by staring at your log graphs, I may be wrong :)

It is unfortunately impossible to determine the exact values for these predictions in a product VM (e.g. comparing the actual/predicted detail values the per-region prediction is made of) at this time, as there is no way to get these relevant values out of the VM.

Back to your problem (if there is one, you did not state any ;)): the log actually shows a few issues with mixed gc: the one you explained, about not taking enough old gen regions because G1OldCSetRegionThresholdPercent is too low, as you suspected (still not reaching max pause time; case 1), and the cases where the number of old gen regions taken within the budget is too low, so the collection set is filled up with "expensive" old gen regions. However, I am seeing the actual time taken being both too low and too high there (cases 2 and 3).

Not sure what your goals are here, and what the actual issue is, but

- you can probably fix case 1 by increasing the mentioned -XX:G1OldCSetRegionThresholdPercent option if that behavior annoys you.

- fix either case 2 or case 3 by decreasing or increasing -XX:G1MixedGCCountTarget (one direction increases the minimum number of regions to take, the other decreases it).

All in all an interesting case to look at :)

Thanks a lot,
  Thomas

> Adding some log snippets here and attaching entire logs in case if
> that helps.
>
> Running app with 31G
>
> CommandLine flags: -XX:GCLogFileSize=20971520
> -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=out-of-memory-heap-
> dump -XX:InitialHeapSize=33285996544 -XX:MaxGCPauseMillis=500
> -XX:MaxHeapSize=33285996544 -XX:MetaspaceSize=536870912
> -XX:NumberOfGCLogFiles=20 -XX:+ParallelRefProcEnabled
> -XX:+PrintAdaptiveSizePolicy -XX:+PrintGC
> -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDateStamps
> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
> -XX:+PrintTenuringDistribution -XX:+UseCompressedClassPointers
> -XX:+UseCompressedOops -XX:+UseG1GC -XX:+UseGCLogFileRotation
> -XX:+UseStringDeduplication
>
> ...
> 2017-06-19T22:54:05.488+0000: 9345.322: [GC pause (G1 Evacuation
> Pause) (mixed)
> Desired survivor size 104857600 bytes, new threshold 1 (max 15)
> - age ? 1: ?131296848 bytes, ?131296848 total
> - age ? 2: ?237559952 bytes, ?368856800 total
> - age ?
3: ?137259376 bytes, ?506116176 total > ?9345.322: [G1Ergonomics (CSet Construction) start choosing CSet, > _pending_cards: 130042, predicted base time: 171.58 ms, remaining > time: 328.42 ms, target pause time: 500.00 ms] > ?9345.322: [G1Ergonomics (CSet Construction) add young regions to > CSet, eden: 121 regions, survivors: 77 regions, predicted young > region time: 249.33 ms] > ?9345.322: [G1Ergonomics (CSet Construction) finish adding old > regions to CSet, reason: predicted time is too high, predicted time: > 0.44 ms, remaining time: 0.00 ms, old: 204 regions, min: 204 regions] > ?9345.322: [G1Ergonomics (CSet Construction) added expensive regions > to CSet, reason: old CSet region num not reached min, old: 204 > regions, expensive: 11 regions, min: 204 regions, remaining time: > 0.00 ms] > ?9345.322: [G1Ergonomics (CSet Construction) finish choosing CSet, > eden: 121 regions, survivors: 77 regions, old: 204 regions, predicted > pause time: 504.35 ms, target pause time: 500.00 ms] > ?9345.691: [G1Ergonomics (Mixed GCs) continue mixed GCs, reason: > candidate old regions available, candidate old regions: 1425 regions, > reclaimable: 11364516952 bytes (34.14 %), threshold: 5.00 %] > , 0.3691404 secs] > ? ?[Parallel Time: 301.4 ms, GC Workers: 13] > ? ? ? [GC Worker Start (ms): Min: 9345323.0, Avg: 9345323.3, Max: > 9345323.6, Diff: 0.6] > ? ? ? [Ext Root Scanning (ms): Min: 0.9, Avg: 1.2, Max: 1.6, Diff: > 0.6, Sum: 15.9] > ? ? ? [Update RS (ms): Min: 62.1, Avg: 62.3, Max: 63.0, Diff: 0.9, > Sum: 809.4] > ? ? ? ? ?[Processed Buffers: Min: 35, Avg: 51.8, Max: 91, Diff: 56, > Sum: 674] > ? ? ? [Scan RS (ms): Min: 11.3, Avg: 12.1, Max: 14.8, Diff: 3.6, Sum: > 157.5] > ? ? ? [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: > 0.0, Sum: 0.1] > ? ? ? [Object Copy (ms): Min: 222.2, Avg: 224.8, Max: 225.3, Diff: > 3.1, Sum: 2922.8] > ? ? ? [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, > Sum: 0.4] > ? ? ? ? ?[Termination Attempts: Min: 1, Avg: 15.6, Max: 24, Diff: 23, > Sum: 203] > ? ? ? [GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.2, > Sum: 1.1] > ? ? ? [GC Worker Total (ms): Min: 300.3, Avg: 300.6, Max: 300.8, > Diff: 0.5, Sum: 3907.2] > ? ? ? [GC Worker End (ms): Min: 9345623.8, Avg: 9345623.9, Max: > 9345624.0, Diff: 0.2] > ? ?[Code Root Fixup: 0.1 ms] > ? ?[Code Root Purge: 0.0 ms] > ? ?[String Dedup Fixup: 43.9 ms, GC Workers: 13] > ? ? ? [Queue Fixup (ms): Min: 0.4, Avg: 2.2, Max: 3.7, Diff: 3.3, > Sum: 28.6] > ? ? ? [Table Fixup (ms): Min: 39.8, Avg: 41.2, Max: 42.9, Diff: 3.2, > Sum: 535.8] > ? ?[Clear CT: 3.4 ms] > ? ?[Other: 20.2 ms] > ? ? ? [Choose CSet: 0.3 ms] > ? ? ? [Ref Proc: 13.4 ms] > ? ? ? [Ref Enq: 1.0 ms] > ? ? ? [Redirty Cards: 2.0 ms] > ? ? ? [Humongous Register: 0.2 ms] > ? ? ? [Humongous Reclaim: 0.1 ms] > ? ? ? [Free CSet: 2.1 ms] > ? ?[Eden: 968.0M(968.0M)->0.0B(1472.0M) Survivors: 616.0M->112.0M > Heap: 15.6G(31.0G)->13.1G(31.0G)] > ?[Times: user=4.53 sys=0.00, real=0.36 secs] > .... > 2017-06-19T22:54:47.655+0000: 9387.489: [GC pause (G1 Evacuation > Pause) (mixed) > Desired survivor size 104857600 bytes, new threshold 15 (max 15) > - age ? 1: ? 31749256 bytes, ? 
31749256 total > ?9387.489: [G1Ergonomics (CSet Construction) start choosing CSet, > _pending_cards: 127449, predicted base time: 168.88 ms, remaining > time: 331.12 ms, target pause time: 500.00 ms] > ?9387.489: [G1Ergonomics (CSet Construction) add young regions to > CSet, eden: 184 regions, survivors: 14 regions, predicted young > region time: 62.79 ms] > ?9387.490: [G1Ergonomics (CSet Construction) finish adding old > regions to CSet, reason: old CSet region num reached max, old: 397 > regions, max: 397 regions] > ?9387.490: [G1Ergonomics (CSet Construction) finish choosing CSet, > eden: 184 regions, survivors: 14 regions, old: 397 regions, predicted > pause time: 390.18 ms, target pause time: 500.00 ms] > ?9387.659: [G1Ergonomics (Mixed GCs) continue mixed GCs, reason: > candidate old regions available, candidate old regions: 1028 regions, > reclaimable: 8047410104 bytes (24.18 %), threshold: 5.00 %] > , 0.1700662 secs] > ? ?[Parallel Time: 101.4 ms, GC Workers: 13] > ? ? ? [GC Worker Start (ms): Min: 9387490.4, Avg: 9387490.8, Max: > 9387491.1, Diff: 0.6] > ? ? ? [Ext Root Scanning (ms): Min: 0.7, Avg: 1.1, Max: 1.6, Diff: > 0.9, Sum: 14.3] > ? ? ? [Update RS (ms): Min: 27.0, Avg: 27.8, Max: 28.9, Diff: 1.8, > Sum: 361.9] > ? ? ? ? ?[Processed Buffers: Min: 34, Avg: 51.4, Max: 88, Diff: 54, > Sum: 668] > ? ? ? [Scan RS (ms): Min: 25.8, Avg: 27.1, Max: 27.4, Diff: 1.6, Sum: > 352.2] > ? ? ? [Code Root Scanning (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: > 0.2, Sum: 0.7] > ? ? ? [Object Copy (ms): Min: 42.8, Avg: 43.8, Max: 44.5, Diff: 1.8, > Sum: 569.9] > ? ? ? [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, > Sum: 0.1] > ? ? ? ? ?[Termination Attempts: Min: 1, Avg: 9.5, Max: 14, Diff: 13, > Sum: 124] > ? ? ? [GC Worker Other (ms): Min: 0.0, Avg: 0.2, Max: 0.4, Diff: 0.4, > Sum: 2.3] > ? ? ? [GC Worker Total (ms): Min: 99.7, Avg: 100.1, Max: 100.6, Diff: > 0.9, Sum: 1301.4] > ? ? ? [GC Worker End (ms): Min: 9387590.7, Avg: 9387590.9, Max: > 9387591.1, Diff: 0.4] > ? ?[Code Root Fixup: 0.3 ms] > ? ?[Code Root Purge: 0.0 ms] > ? ?[String Dedup Fixup: 43.5 ms, GC Workers: 13] > ? ? ? [Queue Fixup (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, > Sum: 0.0] > ? ? ? [Table Fixup (ms): Min: 43.0, Avg: 43.2, Max: 43.4, Diff: 0.3, > Sum: 561.3] > ? ?[Clear CT: 3.9 ms] > ? ?[Other: 21.1 ms] > ? ? ? [Choose CSet: 0.8 ms] > ? ? ? [Ref Proc: 12.8 ms] > ? ? ? [Ref Enq: 0.9 ms] > ? ? ? [Redirty Cards: 0.9 ms] > ? ? ? [Humongous Register: 0.2 ms] > ? ? ? [Humongous Reclaim: 0.1 ms] > ? ? ? [Free CSet: 4.2 ms] > ? ?[Eden: 1472.0M(1472.0M)->0.0B(1424.0M) Survivors: 112.0M->160.0M > Heap: 14.5G(31.0G)->10.1G(31.0G)] > ?[Times: user=1.93 sys=0.00, real=0.17 secs] > ..... > 2017-06-19T22:55:29.656+0000: 9429.490: [GC pause (G1 Evacuation > Pause) (mixed) > Desired survivor size 104857600 bytes, new threshold 15 (max 15) > - age ? 1: ? 44204040 bytes, ? 44204040 total > - age ? 2: ? 31422896 bytes, ? 
75626936 total > ?9429.490: [G1Ergonomics (CSet Construction) start choosing CSet, > _pending_cards: 64391, predicted base time: 130.82 ms, remaining > time: 369.18 ms, target pause time: 500.00 ms] > ?9429.490: [G1Ergonomics (CSet Construction) add young regions to > CSet, eden: 178 regions, survivors: 20 regions, predicted young > region time: 69.26 ms] > ?9429.491: [G1Ergonomics (CSet Construction) finish adding old > regions to CSet, reason: predicted time is too high, predicted time: > 2.12 ms, remaining time: 0.00 ms, old: 204 regions, min: 204 regions] > ?9429.491: [G1Ergonomics (CSet Construction) added expensive regions > to CSet, reason: old CSet region num not reached min, old: 204 > regions, expensive: 72 regions, min: 204 regions, remaining time: > 0.00 ms] > ?9429.491: [G1Ergonomics (CSet Construction) finish choosing CSet, > eden: 178 regions, survivors: 20 regions, old: 204 regions, predicted > pause time: 684.25 ms, target pause time: 500.00 ms] > ?9429.663: [G1Ergonomics (Mixed GCs) continue mixed GCs, reason: > candidate old regions available, candidate old regions: 824 regions, > reclaimable: 6351099672 bytes (19.08 %), threshold: 5.00 %] > , 0.1729571 secs] > ? ?[Parallel Time: 102.6 ms, GC Workers: 13] > ? ? ? [GC Worker Start (ms): Min: 9429491.3, Avg: 9429491.6, Max: > 9429491.9, Diff: 0.6] > ? ? ? [Ext Root Scanning (ms): Min: 0.9, Avg: 1.3, Max: 1.8, Diff: > 0.9, Sum: 16.9] > ? ? ? [Update RS (ms): Min: 18.7, Avg: 19.1, Max: 20.9, Diff: 2.2, > Sum: 248.9] > ? ? ? ? ?[Processed Buffers: Min: 18, Avg: 32.6, Max: 58, Diff: 40, > Sum: 424] > ? ? ? [Scan RS (ms): Min: 15.5, Avg: 17.1, Max: 18.5, Diff: 2.9, Sum: > 222.8] > ? ? ? [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: > 0.1, Sum: 0.5] > ? ? ? [Object Copy (ms): Min: 62.3, Avg: 63.9, Max: 64.4, Diff: 2.2, > Sum: 831.3] > ? ? ? [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, > Sum: 0.1] > ? ? ? ? ?[Termination Attempts: Min: 1, Avg: 2.6, Max: 5, Diff: 4, > Sum: 34] > ? ? ? [GC Worker Other (ms): Min: 0.0, Avg: 0.2, Max: 0.4, Diff: 0.4, > Sum: 2.2] > ? ? ? [GC Worker Total (ms): Min: 101.4, Avg: 101.7, Max: 102.1, > Diff: 0.7, Sum: 1322.7] > ? ? ? [GC Worker End (ms): Min: 9429593.3, Avg: 9429593.4, Max: > 9429593.6, Diff: 0.4] > ? ?[Code Root Fixup: 0.2 ms] > ? ?[Code Root Purge: 0.0 ms] > ? ?[String Dedup Fixup: 45.4 ms, GC Workers: 13] > ? ? ? [Queue Fixup (ms): Min: 0.0, Avg: 0.1, Max: 0.5, Diff: 0.5, > Sum: 1.5] > ? ? ? [Table Fixup (ms): Min: 43.9, Avg: 44.1, Max: 44.2, Diff: 0.4, > Sum: 573.4] > ? ?[Clear CT: 4.3 ms] > ? ?[Other: 20.5 ms] > ? ? ? [Choose CSet: 0.5 ms] > ? ? ? [Ref Proc: 14.3 ms] > ? ? ? [Ref Enq: 1.2 ms] > ? ? ? [Redirty Cards: 0.7 ms] > ? ? ? [Humongous Register: 0.2 ms] > ? ? ? [Humongous Reclaim: 0.1 ms] > ? ? ? [Free CSet: 2.4 ms] > ? ?[Eden: 1424.0M(1424.0M)->0.0B(1392.0M) Survivors: 160.0M->192.0M > Heap: 11.5G(31.0G)->8796.0M(31.0G)] > ?[Times: user=1.95 sys=0.00, real=0.17 secs] > > > On Thu, Jun 22, 2017 at 3:18 AM, Thomas Schatzl e.com> wrote: > > Hi, > > > > On Tue, 2017-06-20 at 23:45 -0700, Sundara Mohan M wrote: > > > Hi, > > > ? Can someone shed more light on > > why?G1OldCSetRegionThresholdPercent > > > flag is under experimental (Need to add??- > > > XX:+UnlockExperimentalVMOptions to modify it.) > > > > ? 
in my view -XX:+UnlockExperimentalVMOptions mostly serves mostly
> > as a
> > "I really want to do that and I know what I am doing" confirmation from
> > the user that he is aware that using this (in this case) option to
> > influence the set of regions taken in during mixed gc you might get
> > surprising behavior.
> >
> > Also, I think there has been no official documentation for it - also
> > because it should be very rarely needed.
> > In particular, I am curious about the case when it would be useful to
> > change it. Could you give some log files showing that there is an issue
> > with the upper bound for the number of old gen regions to take during
> > GC? (i.e. the amount of old gen regions taken is too small and there is
> > ample pause time left and it matters to clean up more regions in a
> > single mixed gc?)
> >
> > Sometimes there are problems with the lower bound that is controlled by
> > the -XX:G1MixedGCCountTarget (product level) option.
> >
> > Hth,
> > Thomas
> >

From email.sundarms at gmail.com Thu Jun 22 22:11:48 2017
From: email.sundarms at gmail.com (Sundara Mohan M)
Date: Thu, 22 Jun 2017 15:11:48 -0700
Subject: G1OldCSetRegionThresholdPercent under ExperimentalFlag
In-Reply-To: <1498166179.2710.44.camel@oracle.com>
References: <1498126699.2831.29.camel@oracle.com> <1498166179.2710.44.camel@oracle.com>
Message-ID: 

Thanks for the insights on Ergo.

I was trying to migrate from CMS to G1GC; the app has a low memory handler (a thread that checks memory utilization via Runtime.freeMemory() and removes some data from memory if it exceeds a threshold). With CMS this handler was not invoked frequently (for example, with 60K objects it would kick in, remove ~5K LRU objects, and continue regular operation). When I moved to G1GC this handler started kicking in frequently (e.g., with 60K objects it would remove 5K LRU objects, then kick in again shortly afterwards and remove another 5K, and so on until only 10K objects are left). So I was trying to find out why mixed GC doesn't clean up quickly enough before my low memory handler kicks in, though I do see the number of young gen collections, and the time they take, has come down by ~40%.

Another issue (maybe this is expected): after increasing G1OldCSetRegionThresholdPercent to 20% from 10%, I started seeing a few mixed GCs taking 1s (most of the time is spent in UpdateRS; MaxPause=500ms).

Will get back once I have more understanding of what is happening.

Thanks,
Sundar

On Thu, Jun 22, 2017 at 2:16 PM, Thomas Schatzl wrote:
> Hi Sundara,
>
> On Thu, 2017-06-22 at 09:49 -0700, Sundara Mohan M wrote:
>> Hi Thomas,
>> Thanks for the explanation.
>>
>> I was trying to debug why it is not including some old region even
>> though it had ~100ms (though Ergo logs say it has accommodated all
>> regions to cover given 500ms).
>
> Ergo is self-training, but it takes some time to adapt to the
> situation.
>
> As long running a run that log shows (thanks!), the number of mixed gcs
> is relatively small, and they are pretty far apart (in the range of
> hours between mixed gc phases). Young gc occurrences distribution is
> far from equal (even considering differences in used young gen size),
> so it seems that the application is quite bursty from time to time.
>
> The different mixed gc/old gen space reclamation phases are never
> particularly long either, so my best guess would be that the values
> used for how long particular regions take to evacuate are messed up.
>
> I.e.
from some graphs it roughly looks like is that there is roughly a > mixed gc phase at the start of every bursty phase (as far as I could > identify them looking at graphs), and one during the phase, typically > near the end. > > So depending on when that mixed gc occurs (at the start of such a burst > or within), g1 trains itself on different application behavior that it > later uses these values on. This is always some kind of moving average, > which does not necessarily reflect reality. > > Very good adaptation to this behavior seems beyond what g1 can do at the moment. > > One could in theory force G1 to give much more weight to recent observations to make adaptations quicker (i.e. change some factors in that average calculation); but there is no user option for that, and it may open a separate can of worms (currently it seems to not too eagerly discount older observations compared to more recent ones if I read the code correctly). > > But that is just something I made up right now by staring at your log > graphs, I may be wrong :) > > It is unfortunately impossible to determine the exact values for these > predictions in a product VM (e.g. comparing actual/predicted detail > values the per-region prediction is made of) at this time as there is > no way to get these relevant values out of the VM. > > Back to your problem (if there is one, you did not state any ;)): the > log shows a few issues with mixed gc actually: the one you explained > about not taking enough old gen regions because > the G1OldCSetRegionThresholdPercent is too low as you suspected (still > not reaching max pause time; case 1), and the cases where the number of > old gen regions taken is too low so these are filled up with "expensive > old gen regions". However I am seeing both the actual time taken being > too low and too high (case 2 and 3) > > Not sure what your goals are here, and what the actual issue is, but > > - you can probably fix case 1 with increasing the mentioned > -XX:G1OldCSetRegionThresholdPercent option if that behavior annoys you. > > - fix either case 2 or case 3 with decreasing or increasing > -XX:G1MixedGCCountTarget (one direction increases the minimum number of > regions to take, the other decreases it). > > All in all an interesting case to look at :) > > Thanks a lot, > Thomas > >> Adding some log snippets here and attaching entire logs in case if >> that helps. >> >> Running app with 31G >> >> CommandLine flags: -XX:GCLogFileSize=20971520 >> -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=out-of-memory-heap- >> dump -XX:InitialHeapSize=33285996544 -XX:MaxGCPauseMillis=500 >> -XX:MaxHeapSize=33285996544 -XX:MetaspaceSize=536870912 >> -XX:NumberOfGCLogFiles=20 -XX:+ParallelRefProcEnabled >> -XX:+PrintAdaptiveSizePolicy -XX:+PrintGC >> -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDateStamps >> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps >> -XX:+PrintTenuringDistribution -XX:+UseCompressedClassPointers >> -XX:+UseCompressedOops -XX:+UseG1GC -XX:+UseGCLogFileRotation >> -XX:+UseStringDeduplication >> >> ... 
>> 2017-06-19T22:54:05.488+0000: 9345.322: [GC pause (G1 Evacuation >> Pause) (mixed) >> Desired survivor size 104857600 bytes, new threshold 1 (max 15) >> - age 1: 131296848 bytes, 131296848 total >> - age 2: 237559952 bytes, 368856800 total >> - age 3: 137259376 bytes, 506116176 total >> 9345.322: [G1Ergonomics (CSet Construction) start choosing CSet, >> _pending_cards: 130042, predicted base time: 171.58 ms, remaining >> time: 328.42 ms, target pause time: 500.00 ms] >> 9345.322: [G1Ergonomics (CSet Construction) add young regions to >> CSet, eden: 121 regions, survivors: 77 regions, predicted young >> region time: 249.33 ms] >> 9345.322: [G1Ergonomics (CSet Construction) finish adding old >> regions to CSet, reason: predicted time is too high, predicted time: >> 0.44 ms, remaining time: 0.00 ms, old: 204 regions, min: 204 regions] >> 9345.322: [G1Ergonomics (CSet Construction) added expensive regions >> to CSet, reason: old CSet region num not reached min, old: 204 >> regions, expensive: 11 regions, min: 204 regions, remaining time: >> 0.00 ms] >> 9345.322: [G1Ergonomics (CSet Construction) finish choosing CSet, >> eden: 121 regions, survivors: 77 regions, old: 204 regions, predicted >> pause time: 504.35 ms, target pause time: 500.00 ms] >> 9345.691: [G1Ergonomics (Mixed GCs) continue mixed GCs, reason: >> candidate old regions available, candidate old regions: 1425 regions, >> reclaimable: 11364516952 bytes (34.14 %), threshold: 5.00 %] >> , 0.3691404 secs] >> [Parallel Time: 301.4 ms, GC Workers: 13] >> [GC Worker Start (ms): Min: 9345323.0, Avg: 9345323.3, Max: >> 9345323.6, Diff: 0.6] >> [Ext Root Scanning (ms): Min: 0.9, Avg: 1.2, Max: 1.6, Diff: >> 0.6, Sum: 15.9] >> [Update RS (ms): Min: 62.1, Avg: 62.3, Max: 63.0, Diff: 0.9, >> Sum: 809.4] >> [Processed Buffers: Min: 35, Avg: 51.8, Max: 91, Diff: 56, >> Sum: 674] >> [Scan RS (ms): Min: 11.3, Avg: 12.1, Max: 14.8, Diff: 3.6, Sum: >> 157.5] >> [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: >> 0.0, Sum: 0.1] >> [Object Copy (ms): Min: 222.2, Avg: 224.8, Max: 225.3, Diff: >> 3.1, Sum: 2922.8] >> [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, >> Sum: 0.4] >> [Termination Attempts: Min: 1, Avg: 15.6, Max: 24, Diff: 23, >> Sum: 203] >> [GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.2, >> Sum: 1.1] >> [GC Worker Total (ms): Min: 300.3, Avg: 300.6, Max: 300.8, >> Diff: 0.5, Sum: 3907.2] >> [GC Worker End (ms): Min: 9345623.8, Avg: 9345623.9, Max: >> 9345624.0, Diff: 0.2] >> [Code Root Fixup: 0.1 ms] >> [Code Root Purge: 0.0 ms] >> [String Dedup Fixup: 43.9 ms, GC Workers: 13] >> [Queue Fixup (ms): Min: 0.4, Avg: 2.2, Max: 3.7, Diff: 3.3, >> Sum: 28.6] >> [Table Fixup (ms): Min: 39.8, Avg: 41.2, Max: 42.9, Diff: 3.2, >> Sum: 535.8] >> [Clear CT: 3.4 ms] >> [Other: 20.2 ms] >> [Choose CSet: 0.3 ms] >> [Ref Proc: 13.4 ms] >> [Ref Enq: 1.0 ms] >> [Redirty Cards: 2.0 ms] >> [Humongous Register: 0.2 ms] >> [Humongous Reclaim: 0.1 ms] >> [Free CSet: 2.1 ms] >> [Eden: 968.0M(968.0M)->0.0B(1472.0M) Survivors: 616.0M->112.0M >> Heap: 15.6G(31.0G)->13.1G(31.0G)] >> [Times: user=4.53 sys=0.00, real=0.36 secs] >> .... 
>> 2017-06-19T22:54:47.655+0000: 9387.489: [GC pause (G1 Evacuation >> Pause) (mixed) >> Desired survivor size 104857600 bytes, new threshold 15 (max 15) >> - age 1: 31749256 bytes, 31749256 total >> 9387.489: [G1Ergonomics (CSet Construction) start choosing CSet, >> _pending_cards: 127449, predicted base time: 168.88 ms, remaining >> time: 331.12 ms, target pause time: 500.00 ms] >> 9387.489: [G1Ergonomics (CSet Construction) add young regions to >> CSet, eden: 184 regions, survivors: 14 regions, predicted young >> region time: 62.79 ms] >> 9387.490: [G1Ergonomics (CSet Construction) finish adding old >> regions to CSet, reason: old CSet region num reached max, old: 397 >> regions, max: 397 regions] >> 9387.490: [G1Ergonomics (CSet Construction) finish choosing CSet, >> eden: 184 regions, survivors: 14 regions, old: 397 regions, predicted >> pause time: 390.18 ms, target pause time: 500.00 ms] >> 9387.659: [G1Ergonomics (Mixed GCs) continue mixed GCs, reason: >> candidate old regions available, candidate old regions: 1028 regions, >> reclaimable: 8047410104 bytes (24.18 %), threshold: 5.00 %] >> , 0.1700662 secs] >> [Parallel Time: 101.4 ms, GC Workers: 13] >> [GC Worker Start (ms): Min: 9387490.4, Avg: 9387490.8, Max: >> 9387491.1, Diff: 0.6] >> [Ext Root Scanning (ms): Min: 0.7, Avg: 1.1, Max: 1.6, Diff: >> 0.9, Sum: 14.3] >> [Update RS (ms): Min: 27.0, Avg: 27.8, Max: 28.9, Diff: 1.8, >> Sum: 361.9] >> [Processed Buffers: Min: 34, Avg: 51.4, Max: 88, Diff: 54, >> Sum: 668] >> [Scan RS (ms): Min: 25.8, Avg: 27.1, Max: 27.4, Diff: 1.6, Sum: >> 352.2] >> [Code Root Scanning (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: >> 0.2, Sum: 0.7] >> [Object Copy (ms): Min: 42.8, Avg: 43.8, Max: 44.5, Diff: 1.8, >> Sum: 569.9] >> [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, >> Sum: 0.1] >> [Termination Attempts: Min: 1, Avg: 9.5, Max: 14, Diff: 13, >> Sum: 124] >> [GC Worker Other (ms): Min: 0.0, Avg: 0.2, Max: 0.4, Diff: 0.4, >> Sum: 2.3] >> [GC Worker Total (ms): Min: 99.7, Avg: 100.1, Max: 100.6, Diff: >> 0.9, Sum: 1301.4] >> [GC Worker End (ms): Min: 9387590.7, Avg: 9387590.9, Max: >> 9387591.1, Diff: 0.4] >> [Code Root Fixup: 0.3 ms] >> [Code Root Purge: 0.0 ms] >> [String Dedup Fixup: 43.5 ms, GC Workers: 13] >> [Queue Fixup (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, >> Sum: 0.0] >> [Table Fixup (ms): Min: 43.0, Avg: 43.2, Max: 43.4, Diff: 0.3, >> Sum: 561.3] >> [Clear CT: 3.9 ms] >> [Other: 21.1 ms] >> [Choose CSet: 0.8 ms] >> [Ref Proc: 12.8 ms] >> [Ref Enq: 0.9 ms] >> [Redirty Cards: 0.9 ms] >> [Humongous Register: 0.2 ms] >> [Humongous Reclaim: 0.1 ms] >> [Free CSet: 4.2 ms] >> [Eden: 1472.0M(1472.0M)->0.0B(1424.0M) Survivors: 112.0M->160.0M >> Heap: 14.5G(31.0G)->10.1G(31.0G)] >> [Times: user=1.93 sys=0.00, real=0.17 secs] >> ..... 
>> 2017-06-19T22:55:29.656+0000: 9429.490: [GC pause (G1 Evacuation >> Pause) (mixed) >> Desired survivor size 104857600 bytes, new threshold 15 (max 15) >> - age 1: 44204040 bytes, 44204040 total >> - age 2: 31422896 bytes, 75626936 total >> 9429.490: [G1Ergonomics (CSet Construction) start choosing CSet, >> _pending_cards: 64391, predicted base time: 130.82 ms, remaining >> time: 369.18 ms, target pause time: 500.00 ms] >> 9429.490: [G1Ergonomics (CSet Construction) add young regions to >> CSet, eden: 178 regions, survivors: 20 regions, predicted young >> region time: 69.26 ms] >> 9429.491: [G1Ergonomics (CSet Construction) finish adding old >> regions to CSet, reason: predicted time is too high, predicted time: >> 2.12 ms, remaining time: 0.00 ms, old: 204 regions, min: 204 regions] >> 9429.491: [G1Ergonomics (CSet Construction) added expensive regions >> to CSet, reason: old CSet region num not reached min, old: 204 >> regions, expensive: 72 regions, min: 204 regions, remaining time: >> 0.00 ms] >> 9429.491: [G1Ergonomics (CSet Construction) finish choosing CSet, >> eden: 178 regions, survivors: 20 regions, old: 204 regions, predicted >> pause time: 684.25 ms, target pause time: 500.00 ms] >> 9429.663: [G1Ergonomics (Mixed GCs) continue mixed GCs, reason: >> candidate old regions available, candidate old regions: 824 regions, >> reclaimable: 6351099672 bytes (19.08 %), threshold: 5.00 %] >> , 0.1729571 secs] >> [Parallel Time: 102.6 ms, GC Workers: 13] >> [GC Worker Start (ms): Min: 9429491.3, Avg: 9429491.6, Max: >> 9429491.9, Diff: 0.6] >> [Ext Root Scanning (ms): Min: 0.9, Avg: 1.3, Max: 1.8, Diff: >> 0.9, Sum: 16.9] >> [Update RS (ms): Min: 18.7, Avg: 19.1, Max: 20.9, Diff: 2.2, >> Sum: 248.9] >> [Processed Buffers: Min: 18, Avg: 32.6, Max: 58, Diff: 40, >> Sum: 424] >> [Scan RS (ms): Min: 15.5, Avg: 17.1, Max: 18.5, Diff: 2.9, Sum: >> 222.8] >> [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: >> 0.1, Sum: 0.5] >> [Object Copy (ms): Min: 62.3, Avg: 63.9, Max: 64.4, Diff: 2.2, >> Sum: 831.3] >> [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, >> Sum: 0.1] >> [Termination Attempts: Min: 1, Avg: 2.6, Max: 5, Diff: 4, >> Sum: 34] >> [GC Worker Other (ms): Min: 0.0, Avg: 0.2, Max: 0.4, Diff: 0.4, >> Sum: 2.2] >> [GC Worker Total (ms): Min: 101.4, Avg: 101.7, Max: 102.1, >> Diff: 0.7, Sum: 1322.7] >> [GC Worker End (ms): Min: 9429593.3, Avg: 9429593.4, Max: >> 9429593.6, Diff: 0.4] >> [Code Root Fixup: 0.2 ms] >> [Code Root Purge: 0.0 ms] >> [String Dedup Fixup: 45.4 ms, GC Workers: 13] >> [Queue Fixup (ms): Min: 0.0, Avg: 0.1, Max: 0.5, Diff: 0.5, >> Sum: 1.5] >> [Table Fixup (ms): Min: 43.9, Avg: 44.1, Max: 44.2, Diff: 0.4, >> Sum: 573.4] >> [Clear CT: 4.3 ms] >> [Other: 20.5 ms] >> [Choose CSet: 0.5 ms] >> [Ref Proc: 14.3 ms] >> [Ref Enq: 1.2 ms] >> [Redirty Cards: 0.7 ms] >> [Humongous Register: 0.2 ms] >> [Humongous Reclaim: 0.1 ms] >> [Free CSet: 2.4 ms] >> [Eden: 1424.0M(1424.0M)->0.0B(1392.0M) Survivors: 160.0M->192.0M >> Heap: 11.5G(31.0G)->8796.0M(31.0G)] >> [Times: user=1.95 sys=0.00, real=0.17 secs] >> >> >> On Thu, Jun 22, 2017 at 3:18 AM, Thomas Schatzl > e.com> wrote: >> > Hi, >> > >> > On Tue, 2017-06-20 at 23:45 -0700, Sundara Mohan M wrote: >> > > Hi, >> > > Can someone shed more light on >> > why G1OldCSetRegionThresholdPercent >> > > flag is under experimental (Need to add - >> > > XX:+UnlockExperimentalVMOptions to modify it.) 
>> >
>> > in my view -XX:+UnlockExperimentalVMOptions mostly serves mostly
>> > as a
>> > "I really want to do that and I know what I am doing" confirmation from
>> > the user that he is aware that using this (in this case) option to
>> > influence the set of regions taken in during mixed gc you might get
>> > surprising behavior.
>> >
>> > Also, I think there has been no official documentation for it - also
>> > because it should be very rarely needed.
>> > In particular, I am curious about the case when it would be useful to
>> > change it. Could you give some log files showing that there is an issue
>> > with the upper bound for the number of old gen regions to take during
>> > GC? (i.e. the amount of old gen regions taken is too small and there is
>> > ample pause time left and it matters to clean up more regions in a
>> > single mixed gc?)
>> >
>> > Sometimes there are problems with the lower bound that is controlled by
>> > the -XX:G1MixedGCCountTarget (product level) option.
>> >
>> > Hth,
>> > Thomas
>> >

From ecki at zusammenkunft.net Thu Jun 22 22:30:18 2017
From: ecki at zusammenkunft.net (Bernd Eckenfels)
Date: Thu, 22 Jun 2017 22:30:18 +0000
Subject: G1OldCSetRegionThresholdPercent under ExperimentalFlag
In-Reply-To: 
References: <1498126699.2831.29.camel@oracle.com> <1498166179.2710.44.camel@oracle.com>
Message-ID: 

Looking at used memory like this is a bit problematic, since the Java heap tends to hold on to memory - only when the GC runs and tries to free memory is it known how much memory is really in use. In case of CMS the collection is triggered regularly in the background; this is why the used memory metric is not that bad there. With G1, however (and even worse with the throughput collector), you often see a larger usage than the actually referenced memory. (This is a bit of an oversimplification, as it does not address soft references.)

What I typically recommend is to not look at the used memory metric at fixed intervals, but to wait for a GC event and look at the 'usage after GC'. This also has problems (it gives you the usage late), but it will avoid the false positives you have observed.

Gruss
Bernd
--
http://bernd.eckenfels.net

_____________________________
From: Sundara Mohan M >
Sent: Friday, June 23, 2017 12:23 AM
Subject: Re: G1OldCSetRegionThresholdPercent under ExperimentalFlag
To: Thomas Schatzl >
Cc: >

Thanks for the insights on Ergo.

I was trying to migrate from CMS to G1GC; the app has a low memory handler (a thread that checks memory utilization via Runtime.freeMemory() and removes some data from memory if it exceeds a threshold). With CMS this handler was not invoked frequently (for example, with 60K objects it would kick in, remove ~5K LRU objects, and continue regular operation). When I moved to G1GC this handler started kicking in frequently (e.g., with 60K objects it would remove 5K LRU objects, then kick in again shortly afterwards and remove another 5K, and so on until only 10K objects are left). So I was trying to find out why mixed GC doesn't clean up quickly enough before my low memory handler kicks in, though I do see the number of young gen collections, and the time they take, has come down by ~40%.

Another issue (maybe this is expected): after increasing G1OldCSetRegionThresholdPercent to 20% from 10%, I started seeing a few mixed GCs taking 1s (most of the time is spent in UpdateRS; MaxPause=500ms).

Will get back once I have more understanding of what is happening.
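A minimal sketch of the 'usage after GC' approach Bernd recommends, using the JDK's GC notifications - this is illustrative, not from the thread: it relies on the HotSpot-specific com.sun.management API, the 0.85 threshold is a made-up value, and summing every pool with a defined max is rough (a real handler would pick exactly the heap pools it cares about):

import com.sun.management.GarbageCollectionNotificationInfo;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import javax.management.Notification;
import javax.management.NotificationEmitter;
import javax.management.openmbean.CompositeData;

public class AfterGcWatcher {
    static final double THRESHOLD = 0.85; // made-up value for the example

    public static void main(String[] args) throws Exception {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // On HotSpot the collector beans also implement NotificationEmitter.
            ((NotificationEmitter) gc).addNotificationListener((Notification n, Object hb) -> {
                if (!GarbageCollectionNotificationInfo.GARBAGE_COLLECTION_NOTIFICATION
                        .equals(n.getType())) {
                    return;
                }
                GarbageCollectionNotificationInfo info =
                        GarbageCollectionNotificationInfo.from((CompositeData) n.getUserData());
                long used = 0, max = 0;
                // Occupancy *after* this collection, summed over pools with a defined max.
                for (MemoryUsage u : info.getGcInfo().getMemoryUsageAfterGc().values()) {
                    if (u.getMax() > 0) {
                        used += u.getUsed();
                        max += u.getMax();
                    }
                }
                if (max > 0 && (double) used / max > THRESHOLD) {
                    // Only a high post-GC occupancy justifies shedding cache entries.
                    System.out.println("High occupancy after GC: " + used + " / " + max);
                }
            }, null, null);
        }
        Thread.sleep(60_000); // keep the demo process alive while GCs happen
    }
}

Because the listener fires only after a collection has actually run, it avoids the false positives that polling Runtime.freeMemory() between collections produces.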
Thanks, Sundar On Thu, Jun 22, 2017 at 2:16 PM, Thomas Schatzl > wrote: > Hi Sundara, > > On Thu, 2017-06-22 at 09:49 -0700, Sundara Mohan M wrote: >> Hi Thomas, >> Thanks for the explanation. >> >> >> I was trying to debug why it is not including some old region even >> though it had ~100ms (though Ergo logs say it has accommodated all >> regions to cover given 500ms). > > Ergo is self-training, but it takes some time to adapt to the > situation. > > As long running a run that log shows (thanks!), the number of mixed gcs > is relatively small, and they are pretty far apart (in the range of > hours between mixed gc phases). Young gc occurrences distribution is > far from equal (even considering differences in used young gen size), > so it seems that the application is quite bursty from time to time. > > The different mixed gc/old gen space reclamation phases are never > particularly long either, so my best guess would be that the values > used for how long particular regions take to evacuate are messed up. > > I.e. from some graphs it roughly looks like is that there is roughly a > mixed gc phase at the start of every bursty phase (as far as I could > identify them looking at graphs), and one during the phase, typically > near the end. > > So depending on when that mixed gc occurs (at the start of such a burst > or within), g1 trains itself on different application behavior that it > later uses these values on. This is always some kind of moving average, > which does not necessarily reflect reality. > > Very good adaptation to this behavior seems beyond what g1 can do at the moment. > > One could in theory force G1 to give much more weight to recent observations to make adaptations quicker (i.e. change some factors in that average calculation); but there is no user option for that, and it may open a separate can of worms (currently it seems to not too eagerly discount older observations compared to more recent ones if I read the code correctly). > > But that is just something I made up right now by staring at your log > graphs, I may be wrong :) > > It is unfortunately impossible to determine the exact values for these > predictions in a product VM (e.g. comparing actual/predicted detail > values the per-region prediction is made of) at this time as there is > no way to get these relevant values out of the VM. > > Back to your problem (if there is one, you did not state any ;)): the > log shows a few issues with mixed gc actually: the one you explained > about not taking enough old gen regions because > the G1OldCSetRegionThresholdPercent is too low as you suspected (still > not reaching max pause time; case 1), and the cases where the number of > old gen regions taken is too low so these are filled up with "expensive > old gen regions". However I am seeing both the actual time taken being > too low and too high (case 2 and 3) > > Not sure what your goals are here, and what the actual issue is, but > > - you can probably fix case 1 with increasing the mentioned > -XX:G1OldCSetRegionThresholdPercent option if that behavior annoys you. > > - fix either case 2 or case 3 with decreasing or increasing > -XX:G1MixedGCCountTarget (one direction increases the minimum number of > regions to take, the other decreases it). > > All in all an interesting case to look at :) > > Thanks a lot, > Thomas > >> Adding some log snippets here and attaching entire logs in case if >> that helps. 
>> >> Running app with 31G >> >> CommandLine flags: -XX:GCLogFileSize=20971520 >> -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=out-of-memory-heap- >> dump -XX:InitialHeapSize=33285996544 -XX:MaxGCPauseMillis=500 >> -XX:MaxHeapSize=33285996544 -XX:MetaspaceSize=536870912 >> -XX:NumberOfGCLogFiles=20 -XX:+ParallelRefProcEnabled >> -XX:+PrintAdaptiveSizePolicy -XX:+PrintGC >> -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDateStamps >> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps >> -XX:+PrintTenuringDistribution -XX:+UseCompressedClassPointers >> -XX:+UseCompressedOops -XX:+UseG1GC -XX:+UseGCLogFileRotation >> -XX:+UseStringDeduplication >> >> ... >> 2017-06-19T22:54:05.488+0000: 9345.322: [GC pause (G1 Evacuation >> Pause) (mixed) >> Desired survivor size 104857600 bytes, new threshold 1 (max 15) >> - age 1: 131296848 bytes, 131296848 total >> - age 2: 237559952 bytes, 368856800 total >> - age 3: 137259376 bytes, 506116176 total >> 9345.322: [G1Ergonomics (CSet Construction) start choosing CSet, >> _pending_cards: 130042, predicted base time: 171.58 ms, remaining >> time: 328.42 ms, target pause time: 500.00 ms] >> 9345.322: [G1Ergonomics (CSet Construction) add young regions to >> CSet, eden: 121 regions, survivors: 77 regions, predicted young >> region time: 249.33 ms] >> 9345.322: [G1Ergonomics (CSet Construction) finish adding old >> regions to CSet, reason: predicted time is too high, predicted time: >> 0.44 ms, remaining time: 0.00 ms, old: 204 regions, min: 204 regions] >> 9345.322: [G1Ergonomics (CSet Construction) added expensive regions >> to CSet, reason: old CSet region num not reached min, old: 204 >> regions, expensive: 11 regions, min: 204 regions, remaining time: >> 0.00 ms] >> 9345.322: [G1Ergonomics (CSet Construction) finish choosing CSet, >> eden: 121 regions, survivors: 77 regions, old: 204 regions, predicted >> pause time: 504.35 ms, target pause time: 500.00 ms] >> 9345.691: [G1Ergonomics (Mixed GCs) continue mixed GCs, reason: >> candidate old regions available, candidate old regions: 1425 regions, >> reclaimable: 11364516952 bytes (34.14 %), threshold: 5.00 %] >> , 0.3691404 secs] >> [Parallel Time: 301.4 ms, GC Workers: 13] >> [GC Worker Start (ms): Min: 9345323.0, Avg: 9345323.3, Max: >> 9345323.6, Diff: 0.6] >> [Ext Root Scanning (ms): Min: 0.9, Avg: 1.2, Max: 1.6, Diff: >> 0.6, Sum: 15.9] >> [Update RS (ms): Min: 62.1, Avg: 62.3, Max: 63.0, Diff: 0.9, >> Sum: 809.4] >> [Processed Buffers: Min: 35, Avg: 51.8, Max: 91, Diff: 56, >> Sum: 674] >> [Scan RS (ms): Min: 11.3, Avg: 12.1, Max: 14.8, Diff: 3.6, Sum: >> 157.5] >> [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: >> 0.0, Sum: 0.1] >> [Object Copy (ms): Min: 222.2, Avg: 224.8, Max: 225.3, Diff: >> 3.1, Sum: 2922.8] >> [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, >> Sum: 0.4] >> [Termination Attempts: Min: 1, Avg: 15.6, Max: 24, Diff: 23, >> Sum: 203] >> [GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.2, >> Sum: 1.1] >> [GC Worker Total (ms): Min: 300.3, Avg: 300.6, Max: 300.8, >> Diff: 0.5, Sum: 3907.2] >> [GC Worker End (ms): Min: 9345623.8, Avg: 9345623.9, Max: >> 9345624.0, Diff: 0.2] >> [Code Root Fixup: 0.1 ms] >> [Code Root Purge: 0.0 ms] >> [String Dedup Fixup: 43.9 ms, GC Workers: 13] >> [Queue Fixup (ms): Min: 0.4, Avg: 2.2, Max: 3.7, Diff: 3.3, >> Sum: 28.6] >> [Table Fixup (ms): Min: 39.8, Avg: 41.2, Max: 42.9, Diff: 3.2, >> Sum: 535.8] >> [Clear CT: 3.4 ms] >> [Other: 20.2 ms] >> [Choose CSet: 0.3 ms] >> [Ref Proc: 13.4 ms] >> [Ref Enq: 1.0 ms] >> [Redirty 
Cards: 2.0 ms] >> [Humongous Register: 0.2 ms] >> [Humongous Reclaim: 0.1 ms] >> [Free CSet: 2.1 ms] >> [Eden: 968.0M(968.0M)->0.0B(1472.0M) Survivors: 616.0M->112.0M >> Heap: 15.6G(31.0G)->13.1G(31.0G)] >> [Times: user=4.53 sys=0.00, real=0.36 secs] >> .... >> 2017-06-19T22:54:47.655+0000: 9387.489: [GC pause (G1 Evacuation >> Pause) (mixed) >> Desired survivor size 104857600 bytes, new threshold 15 (max 15) >> - age 1: 31749256 bytes, 31749256 total >> 9387.489: [G1Ergonomics (CSet Construction) start choosing CSet, >> _pending_cards: 127449, predicted base time: 168.88 ms, remaining >> time: 331.12 ms, target pause time: 500.00 ms] >> 9387.489: [G1Ergonomics (CSet Construction) add young regions to >> CSet, eden: 184 regions, survivors: 14 regions, predicted young >> region time: 62.79 ms] >> 9387.490: [G1Ergonomics (CSet Construction) finish adding old >> regions to CSet, reason: old CSet region num reached max, old: 397 >> regions, max: 397 regions] >> 9387.490: [G1Ergonomics (CSet Construction) finish choosing CSet, >> eden: 184 regions, survivors: 14 regions, old: 397 regions, predicted >> pause time: 390.18 ms, target pause time: 500.00 ms] >> 9387.659: [G1Ergonomics (Mixed GCs) continue mixed GCs, reason: >> candidate old regions available, candidate old regions: 1028 regions, >> reclaimable: 8047410104 bytes (24.18 %), threshold: 5.00 %] >> , 0.1700662 secs] >> [Parallel Time: 101.4 ms, GC Workers: 13] >> [GC Worker Start (ms): Min: 9387490.4, Avg: 9387490.8, Max: >> 9387491.1, Diff: 0.6] >> [Ext Root Scanning (ms): Min: 0.7, Avg: 1.1, Max: 1.6, Diff: >> 0.9, Sum: 14.3] >> [Update RS (ms): Min: 27.0, Avg: 27.8, Max: 28.9, Diff: 1.8, >> Sum: 361.9] >> [Processed Buffers: Min: 34, Avg: 51.4, Max: 88, Diff: 54, >> Sum: 668] >> [Scan RS (ms): Min: 25.8, Avg: 27.1, Max: 27.4, Diff: 1.6, Sum: >> 352.2] >> [Code Root Scanning (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: >> 0.2, Sum: 0.7] >> [Object Copy (ms): Min: 42.8, Avg: 43.8, Max: 44.5, Diff: 1.8, >> Sum: 569.9] >> [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, >> Sum: 0.1] >> [Termination Attempts: Min: 1, Avg: 9.5, Max: 14, Diff: 13, >> Sum: 124] >> [GC Worker Other (ms): Min: 0.0, Avg: 0.2, Max: 0.4, Diff: 0.4, >> Sum: 2.3] >> [GC Worker Total (ms): Min: 99.7, Avg: 100.1, Max: 100.6, Diff: >> 0.9, Sum: 1301.4] >> [GC Worker End (ms): Min: 9387590.7, Avg: 9387590.9, Max: >> 9387591.1, Diff: 0.4] >> [Code Root Fixup: 0.3 ms] >> [Code Root Purge: 0.0 ms] >> [String Dedup Fixup: 43.5 ms, GC Workers: 13] >> [Queue Fixup (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, >> Sum: 0.0] >> [Table Fixup (ms): Min: 43.0, Avg: 43.2, Max: 43.4, Diff: 0.3, >> Sum: 561.3] >> [Clear CT: 3.9 ms] >> [Other: 21.1 ms] >> [Choose CSet: 0.8 ms] >> [Ref Proc: 12.8 ms] >> [Ref Enq: 0.9 ms] >> [Redirty Cards: 0.9 ms] >> [Humongous Register: 0.2 ms] >> [Humongous Reclaim: 0.1 ms] >> [Free CSet: 4.2 ms] >> [Eden: 1472.0M(1472.0M)->0.0B(1424.0M) Survivors: 112.0M->160.0M >> Heap: 14.5G(31.0G)->10.1G(31.0G)] >> [Times: user=1.93 sys=0.00, real=0.17 secs] >> ..... 
>> 2017-06-19T22:55:29.656+0000: 9429.490: [GC pause (G1 Evacuation >> Pause) (mixed) >> Desired survivor size 104857600 bytes, new threshold 15 (max 15) >> - age 1: 44204040 bytes, 44204040 total >> - age 2: 31422896 bytes, 75626936 total >> 9429.490: [G1Ergonomics (CSet Construction) start choosing CSet, >> _pending_cards: 64391, predicted base time: 130.82 ms, remaining >> time: 369.18 ms, target pause time: 500.00 ms] >> 9429.490: [G1Ergonomics (CSet Construction) add young regions to >> CSet, eden: 178 regions, survivors: 20 regions, predicted young >> region time: 69.26 ms] >> 9429.491: [G1Ergonomics (CSet Construction) finish adding old >> regions to CSet, reason: predicted time is too high, predicted time: >> 2.12 ms, remaining time: 0.00 ms, old: 204 regions, min: 204 regions] >> 9429.491: [G1Ergonomics (CSet Construction) added expensive regions >> to CSet, reason: old CSet region num not reached min, old: 204 >> regions, expensive: 72 regions, min: 204 regions, remaining time: >> 0.00 ms] >> 9429.491: [G1Ergonomics (CSet Construction) finish choosing CSet, >> eden: 178 regions, survivors: 20 regions, old: 204 regions, predicted >> pause time: 684.25 ms, target pause time: 500.00 ms] >> 9429.663: [G1Ergonomics (Mixed GCs) continue mixed GCs, reason: >> candidate old regions available, candidate old regions: 824 regions, >> reclaimable: 6351099672 bytes (19.08 %), threshold: 5.00 %] >> , 0.1729571 secs] >> [Parallel Time: 102.6 ms, GC Workers: 13] >> [GC Worker Start (ms): Min: 9429491.3, Avg: 9429491.6, Max: >> 9429491.9, Diff: 0.6] >> [Ext Root Scanning (ms): Min: 0.9, Avg: 1.3, Max: 1.8, Diff: >> 0.9, Sum: 16.9] >> [Update RS (ms): Min: 18.7, Avg: 19.1, Max: 20.9, Diff: 2.2, >> Sum: 248.9] >> [Processed Buffers: Min: 18, Avg: 32.6, Max: 58, Diff: 40, >> Sum: 424] >> [Scan RS (ms): Min: 15.5, Avg: 17.1, Max: 18.5, Diff: 2.9, Sum: >> 222.8] >> [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: >> 0.1, Sum: 0.5] >> [Object Copy (ms): Min: 62.3, Avg: 63.9, Max: 64.4, Diff: 2.2, >> Sum: 831.3] >> [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, >> Sum: 0.1] >> [Termination Attempts: Min: 1, Avg: 2.6, Max: 5, Diff: 4, >> Sum: 34] >> [GC Worker Other (ms): Min: 0.0, Avg: 0.2, Max: 0.4, Diff: 0.4, >> Sum: 2.2] >> [GC Worker Total (ms): Min: 101.4, Avg: 101.7, Max: 102.1, >> Diff: 0.7, Sum: 1322.7] >> [GC Worker End (ms): Min: 9429593.3, Avg: 9429593.4, Max: >> 9429593.6, Diff: 0.4] >> [Code Root Fixup: 0.2 ms] >> [Code Root Purge: 0.0 ms] >> [String Dedup Fixup: 45.4 ms, GC Workers: 13] >> [Queue Fixup (ms): Min: 0.0, Avg: 0.1, Max: 0.5, Diff: 0.5, >> Sum: 1.5] >> [Table Fixup (ms): Min: 43.9, Avg: 44.1, Max: 44.2, Diff: 0.4, >> Sum: 573.4] >> [Clear CT: 4.3 ms] >> [Other: 20.5 ms] >> [Choose CSet: 0.5 ms] >> [Ref Proc: 14.3 ms] >> [Ref Enq: 1.2 ms] >> [Redirty Cards: 0.7 ms] >> [Humongous Register: 0.2 ms] >> [Humongous Reclaim: 0.1 ms] >> [Free CSet: 2.4 ms] >> [Eden: 1424.0M(1424.0M)->0.0B(1392.0M) Survivors: 160.0M->192.0M >> Heap: 11.5G(31.0G)->8796.0M(31.0G)] >> [Times: user=1.95 sys=0.00, real=0.17 secs] >> >> >> On Thu, Jun 22, 2017 at 3:18 AM, Thomas Schatzl wrote: >> > Hi, >> > >> > On Tue, 2017-06-20 at 23:45 -0700, Sundara Mohan M wrote: >> > > Hi, >> > > Can someone shed more light on >> > why G1OldCSetRegionThresholdPercent >> > > flag is under experimental (Need to add >> > > -XX:+UnlockExperimentalVMOptions to modify it.)
>> > >> > in my view -XX:+UnlockExperimentalVMOptions mostly serves >> > as a >> > "I really want to do that and I know what I am doing" confirmation >> > from >> > the user that he is aware that by using this option (in this case to >> > influence the set of regions taken in during mixed gc) he might get >> > surprising behavior. >> > >> > Also, I think there has been no official documentation for it - >> > also >> > because it should be very rarely needed. >> > In particular, I am curious about the case when it would be useful >> > to >> > change it. Could you give some log files showing that there is an >> > issue >> > with the upper bound for the number of old gen regions to take >> > during >> > GC? (i.e. the amount of old gen regions taken is too small and >> > there is >> > ample pause time left and it matters to clean up more regions in a >> > single mixed gc?) >> > >> > Sometimes there are problems with the lower bound that is >> > controlled by >> > the -XX:G1MixedGCCountTarget (product level) option. >> > >> > Hth, >> > Thomas >> > From thomas.schatzl at oracle.com Fri Jun 23 12:29:17 2017 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Fri, 23 Jun 2017 14:29:17 +0200 Subject: G1OldCSetRegionThresholdPercent under ExperimentalFlag In-Reply-To: References: <1498126699.2831.29.camel@oracle.com> <1498166179.2710.44.camel@oracle.com> Message-ID: <1498220957.2741.68.camel@oracle.com> Hi, On Thu, 2017-06-22 at 15:11 -0700, Sundara Mohan M wrote: > Thanks for the insights on Ergo. > > I was trying to migrate from CMS to G1GC; the app has a low memory > handler (a thread which computes memory utilization from > Runtime.getRuntime().freeMemory() and removes some data from memory if it > exceeds the threshold). > > In CMS this handler was not invoked frequently (for example: when I have > 60K objects it will kick in, remove ~5K LRU objects and continue > regular operation); when I moved to G1GC this handler started kicking > in frequently (example: when I have 60K objects it will remove 5K LRU > objects, and shortly afterwards it will kick in again and remove > another 5K, and so on until 10K objects are left). > > So, I was trying to find out why the mixed GC doesn't clean up quickly > enough before my low memory handler kicks in. As Bernd mentioned, G1 is only very lazily reclaiming space containing dead objects, so such an approach has its limits. I think CMS has this CMSTriggerInterval option that starts a background collection, which immediately reclaims space in the end (updating its freelist) afaik. One could currently get updated liveness information via jcmd/System.gc() with -XX:+ExplicitGCInvokesConcurrent, starting marking regularly, but it has a few drawbacks of its own: - it starts liveness analysis/marking immediately, potentially messing with your pause time requirements - unknown impact on prediction - it does not do space reclamation on its own, as reclamation will be piggy-backed on the next few gcs - it will interrupt a currently running space reclamation (mixed gc) phase, i.e. if you spam these, g1 will never reclaim any memory - it "creatively" reuses System.gc(), which might not be possible or advisable in many cases - all of the above is implementation defined behavior. There may be other caveats. In a VM where, by design, you do not have a lot of control over memory management, it is definitely problematic to have another memory manager on top when one of them does not know anything about the other.
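For completeness, Bernd's "usage after GC" suggestion from earlier in this thread can be implemented directly against the java.lang.management API. A minimal sketch (illustrative only - the class name is made up, and MemoryPoolMXBean.getCollectionUsage() returns null for pools that do not support it):

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

public class AfterGcUsage {
    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            // Usage as of the end of the most recent GC of this pool, rather
            // than the lazily-reclaimed value that Runtime.getRuntime().freeMemory()
            // reflects between collections.
            MemoryUsage afterGc = pool.getCollectionUsage();
            if (afterGc != null) {
                System.out.println(pool.getName() + ": used after last GC = "
                        + afterGc.getUsed() + " bytes");
            }
        }
    }
}

Driving the low memory handler from these values (or from per-collection notifications on the GarbageCollectorMXBeans) avoids the false positives from sampling free memory between collections, but the fundamental issue of layering your own memory manager on top of the GC remains.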
Such an algorithm may also interact badly with future changes, e.g. the adaptive IHOP [1] feature in jdk9. > Though I see the number of young gen collections and the time taken to clean > have come down by ~40%. > > Another issue (maybe this is expected) is that after increasing > G1OldCSetRegionThresholdPercent to 20% from 10% I started seeing a > few mixed GCs taking 1s (most of the time is spent on UpdateRS, > MaxPause=500ms). Will get back once I have more understanding of what > is happening. The option allows G1 to add more regions to the set of regions to be collected. This implies potentially longer pauses if the predictions are incorrect in the first place. That is one reason why this is an experimental option. Thanks, Thomas [1] https://docs.oracle.com/javase/9/gctuning/garbage-first-garbage-collector.htm#GUID-572C9203-AB27-46F1-9D33-42BA4F3C6BF3 From kim.barrett at oracle.com Sat Jun 24 18:03:52 2017 From: kim.barrett at oracle.com (Kim Barrett) Date: Sat, 24 Jun 2017 14:03:52 -0400 Subject: RFR: 8178490: Usages of is_object_aligned with pointers are broken In-Reply-To: References: Message-ID: <463199D0-EF95-4399-AA9A-D741002FA43C@oracle.com> > On Jun 22, 2017, at 5:16 AM, Stefan Karlsson wrote: > > Hi all, > > Please review this patch to fix and strengthen is_object_aligned checks when pointers are passed in: > > http://cr.openjdk.java.net/~stefank/8178490/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8178490 > > is_object_aligned only works correctly for sizes measured in words. > > When a pointer is passed into: > > inline bool is_object_aligned(intptr_t addr) { > return addr == align_object_size(addr); > } > > inline intptr_t align_object_size(intptr_t size) { > return align_size_up(size, MinObjAlignment); > } > > the pointer is incorrectly interpreted as a word size and the alignment is checked against MinObjAlignment instead of MinObjAlignmentInBytes > > Tested with JPRT together with different patches for: > 8178489 Make align functions more type safe and consistent > > Thanks, > StefanK Looks good. From stefan.karlsson at oracle.com Mon Jun 26 07:08:40 2017 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Mon, 26 Jun 2017 09:08:40 +0200 Subject: RFR: 8178490: Usages of is_object_aligned with pointers are broken In-Reply-To: <463199D0-EF95-4399-AA9A-D741002FA43C@oracle.com> References: <463199D0-EF95-4399-AA9A-D741002FA43C@oracle.com> Message-ID: <2d57a062-83cc-0c79-f525-650f50d59e95@oracle.com> Thanks, Kim. StefanK On 2017-06-24 20:03, Kim Barrett wrote: >> On Jun 22, 2017, at 5:16 AM, Stefan Karlsson wrote: >> >> Hi all, >> >> Please review this patch to fix and strengthen is_object_aligned checks when pointers are passed in: >> >> http://cr.openjdk.java.net/~stefank/8178490/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8178490 >> >> is_object_aligned only works correctly for sizes measured in words. >> >> When a pointer is passed into: >> >> inline bool is_object_aligned(intptr_t addr) { >> return addr == align_object_size(addr); >> } >> >> inline intptr_t align_object_size(intptr_t size) { >> return align_size_up(size, MinObjAlignment); >> } >> >> the pointer is incorrectly interpreted as a word size and the alignment is checked against MinObjAlignment instead of MinObjAlignmentInBytes >> >> Tested with JPRT together with different patches for: >> 8178489 Make align functions more type safe and consistent >> >> Thanks, >> StefanK > Looks good.
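To make the word-vs-byte mix-up above concrete: with the usual 8-byte heap words, MinObjAlignment is 1 (a word count), so aligning an address by it is a no-op and the broken check accepts any pointer. A self-contained sketch of the arithmetic (in Java purely for illustration - the real code is HotSpot C++, and the constants below are assumptions):

public class AlignmentDemo {
    // Assumed for illustration: 8-byte heap words, 1-word minimum object alignment.
    static final long MIN_OBJ_ALIGNMENT = 1;           // in words
    static final long MIN_OBJ_ALIGNMENT_IN_BYTES = 8;  // in bytes

    static long alignUp(long value, long alignment) {  // alignment must be a power of two
        return (value + alignment - 1) & ~(alignment - 1);
    }

    // Mirrors the broken check: a byte address aligned by a word count.
    // With alignment 1 this is the identity, so it accepts everything.
    static boolean isObjectAlignedBroken(long addr) {
        return addr == alignUp(addr, MIN_OBJ_ALIGNMENT);
    }

    // The intended check for pointers, against the byte alignment.
    static boolean isObjectAlignedBytes(long addr) {
        return addr == alignUp(addr, MIN_OBJ_ALIGNMENT_IN_BYTES);
    }

    public static void main(String[] args) {
        long unalignedAddr = 0x1003;
        System.out.println(isObjectAlignedBroken(unalignedAddr)); // true  (wrong)
        System.out.println(isObjectAlignedBytes(unalignedAddr));  // false (correct)
    }
}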
From erik.osterlund at oracle.com Mon Jun 26 13:34:22 2017 From: erik.osterlund at oracle.com (Erik Österlund) Date: Mon, 26 Jun 2017 15:34:22 +0200 Subject: RFR (S): 8182703: Correct G1 barrier queue lock orderings Message-ID: <59510D5E.10009@oracle.com> Hi, Webrev: http://cr.openjdk.java.net/~eosterlund/8182703/webrev.02/ Bug: https://bugs.openjdk.java.net/browse/JDK-8182703 The G1 barrier queues have very awkward lock orderings for the following reasons: 1) These queues may queue up things when performing a reference write or resolving a jweak (intentionally, or one that just happens to be a jweak even though it looks like a jobject), which can happen in a lot of places in the code. We resolve JNIHandles while holding special locks in many places. We also perform reference writes in many places. Now the unsuspecting hotspot developer might think that it is okay to resolve a JNIHandle or perform a reference write while possibly holding a special lock. But no. In some cases, object writes have been moved out of locks and replaced with lock-free CAS, only to dodge the G1 write barrier locks. I don't think the G1 lock ordering issues should shape the shared code; it should rather be the other way around. 2) There is an issue that the shared queue locks have a "special" rank, which is below the lock ranks used by the cbl monitor and free list monitor. This leads to an issue when these locks have to be taken while holding the shared queue locks. The current solution is to drop the shared queue locks temporarily, introducing nasty data races. These races are guarded, but the whole race seems very unnecessary. I argue that if the G1 write barrier queue locks were simply set appropriately in the first place, by analyzing what ranks they should have, none of the above issues would exist. Therefore I propose this new ordering. Specifically, I recognize that locks required for performing memory accesses and resolving JNIHandles are more special than the "special" rank. Therefore, this change introduces a new lock ordering category called "access", which is to be used by barriers required to perform memory accesses. In other words, by recognizing the rank is more special than "special", we can remove "special" code that walks around making its rank more "special". That seems desirable to me. The access locks need to comply with the same constraints as the special locks: they may not perform safepoint checks. The old lock ranks were: SATB_Q_FL_lock: special SATB_Q_CBL_mon: leaf - 1 Shared_SATB_Q_lock: leaf - 1 DirtyCardQ_FL_lock: special DirtyCardQ_CBL_mon: leaf - 1 Shared_DirtyCardQ_lock: leaf - 1 The new lock ranks are: SATB_Q_FL_lock: access (special - 2) SATB_Q_CBL_mon: access (special - 2) Shared_SATB_Q_lock: access + 1 (special - 1) DirtyCardQ_FL_lock: access (special - 2) DirtyCardQ_CBL_mon: access (special - 2) Shared_DirtyCardQ_lock: access + 1 (special - 1) Analysis: Each PtrQueue and PtrQueueSet group, SATB or DirtyCardQ, has the same group of locks: the free list lock, the completed buffer list monitor and the shared queue lock. Observations: 1) The free list lock and completed buffer list monitors (members of PtrQueueSet) are disjoint. We never hold both of them at the same time. Rationale: The free list lock is only used from PtrQueueSet::allocate_buffer, PtrQueueSet::deallocate_buffer and PtrQueueSet::reduce_free_list, and expanding the callsites from there never reaches a place where the cbl monitor is acquired. Therefore it is impossible to acquire the cbl monitor while holding the free list lock.
The opposite case of acquiring the free list lock while holding the cbl monitor is also not possible; only the following places acquire the cbl monitor: PtrQueueSet::enqueue_complete_buffer, PtrQueueSet::merge_bufferlists, PtrQueueSet::assert_completed_buffer_list_len_correct, PtrQueueSet::notify_if_necessary, FreeIdSet::claim_par_id, FreeIdSet::release_par_id, DirtyCardQueueSet::get_completed_buffer, DirtyCardQueueSet::clear, SATBMarkQueueSet::apply_closure_to_completed_buffer and SATBMarkQueueSet::abandon_partial_marking. Again, none of these paths where the cbl monitor is held can expand callsites to a place where the free list locks are held. Therefore it holds that the cbl monitor can not be held while the free list lock is held, and the free list lock can not be held while the cbl monitor is held. Therefore they are held disjointly. 2) We might hold the shared queue locks before acquiring the completed buffer list monitor. (Today we drop the shared queue lock then and reacquire it later as a hack, as already described.) 3) We do not acquire a shared queue lock while holding the free list lock or completed buffer list monitor, as there is no reference from a PtrQueueSet to its shared queue, so those code paths do not know how to reach the shared PtrQueue to acquire its lock. The derived classes are exceptions, but they never use the shared queue lock while holding the completed buffer list monitor or free list lock. DirtyCardQueueSet uses the shared queue for concatenating logs (in a safepoint, without holding those locks). The SATBMarkQueueSet uses the shared queue for filtering the buffers, fiddling with activeness, printing and resetting, all without grabbing any locks. 4) We do not acquire any other lock (above the event rank) while holding the free list lock or completed buffer list monitors. This was discovered by manually expanding the call graphs from where these two locks are held. Derived constraints: a) Because of observation 1, the free list lock and completed buffer list monitors can have the same rank. b) Because of observations 1 and 2, the shared queue lock ought to have a rank higher than the ranks of the free list lock and the completed buffer list monitors (not the case today). c) Because of observations 3 and 2, the free list lock and completed buffer list monitors ought to have a rank lower than the rank of the shared queue lock. d) Because of observation 4 (and constraints a-c), all the barrier locks should be below the "special" rank, without violating any existing ranks. The proposed new lock ranks conform to the constraints derived from my observations. It is worth noting that the potential relationships that could break (and why they do not) are: 1) If a lock were acquired from within the barriers that does not involve the shared queue lock, the free list lock or the completed buffer list monitor, we would now have inverted their relationship, as that other lock would probably have a rank higher than or equal to "special". But due to observation 4, there are no such cases. 2) The relationship between the shared queue lock and the completed buffer list monitor has been changed so both can be held at the same time if the shared queue lock is acquired first (which it is). This is arguably the way it should have been in the first place, and the old solution had ugly hacks where we would drop the shared queue lock to not run into the lock order assert (and only not to run into the lock order assert, i.e.
not to avoid potential deadlock) to ensure the locks are not held at the same time. That code has now been removed, so that the shared queue lock is still held when enqueueing completed buffers (no dodgy dropping and reclaiming), and the code for handling the races due to multiple concurrent enqueuers has also been removed and replaced with an assertion that there simply should not be multiple concurrent enqueuers. Since the shared queue lock is now held throughout the whole operation, there will be no concurrent enqueuers. 3) The completed buffer list monitor used to have a higher rank than the free list lock. Now they have the same rank. Previously they were therefore allowed to be held at the same time if the cbl monitor was acquired first. However, as discussed, there is no such case, and they ought to have the same rank so as not to confuse their true disjointness. If anyone insists we should not break this relationship despite the true disjointness, I could consent to adding another access lock rank, like this: http://cr.openjdk.java.net/~eosterlund/8182703/webrev.01/ but I think it seems better to have the same rank, since they are actually truly disjoint and should remain disjoint. I do recognize that long term we *might* want a lock-free solution or something (not saying we do or do not). But until then, the ranks ought to be corrected so that they do not cause these problems that make everyone bash their heads against the awkward G1 lock ranks throughout the code and build hacks around them. Testing: JPRT with hotspot all and lots of local testing. Thanks, /Erik From thomas.schatzl at oracle.com Mon Jun 26 14:42:13 2017 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Mon, 26 Jun 2017 16:42:13 +0200 Subject: RFR(S): 8173335: Improve logging for j.l.ref.reference processing In-Reply-To: <0353babf-03eb-21cd-b286-9b0149dfb718@oracle.com> References: <1497352882.2829.65.camel@oracle.com> <054513b1-4ff2-6656-fa3a-9c6e6736c32f@oracle.com> <0353babf-03eb-21cd-b286-9b0149dfb718@oracle.com> Message-ID: <1498488133.2665.37.camel@oracle.com> Hi Sangheon, thanks for all your changes, and sorry a bit for the delay... On Wed, 2017-06-14 at 00:52 -0700, sangheon wrote: > Hi Thomas again, > On 06/13/2017 02:21 PM, sangheon wrote: > > > > Hi Thomas, > > > > Thank you for reviewing this. > > > > On 06/13/2017 04:21 AM, Thomas Schatzl wrote: > > > > > > Hi Sangheon, > > > > > > > > > On Mon, 2017-06-12 at 17:13 -0700, sangheon wrote: > > > > > > > > Hi Aleksey, > > > > > > > > Thanks for the review. > > > > > > > > On 06/12/2017 09:06 AM, Aleksey Shipilev wrote: > > > > > > > > > > On 06/10/2017 01:57 AM, sangheon wrote: > > > > > > > > > > > > CR: https://bugs.openjdk.java.net/browse/JDK-8173335 > > > > > > webrev: http://cr.openjdk.java.net/~sangheki/8173335/webrev.0 > > > - There should be a destructor in ReferenceProcessor cleaning up > > > the dynamically allocated memory. > > Thomas and I had some discussion about this and agreed to file a > > separate CR for the freeing issue. > > > > I noticed that there's no destructor when I wrote this, but this is > > how we usually implement it. > > However, as this seems incorrect, I will add a destructor for the newly > > added class, but it will not be used in this patch. > > It will be used in the following CR ( https://bugs.openjdk.java.net/browse/JDK-8182120 ) which fixes the > > not-freeing issue in ReferenceProcessor. > > FYI, ReferenceProcessor has heap allocated members of > > ReferencePolicy (and its friends) but they are not freed either.
> > So instead of extending this patch, I propose to separate this freeing > > issue. That's fine, thanks. > > > > > - the change should move gc+ref output to something else: there > > > is so much additional junk printed with gc+ref=trace that the > > > phase logging is drowned out by real trace information and > > > unusable for regular consumption. > > Okay, I will add it. > > But I asked about introducing 'gc+ref+phases' before and you didn't like > > it. :) Probably I didn't provide much detail?! Yes. In the example you showed me earlier with gc+ref=trace, the examples did not contain the other gc+ref=trace output. That's why I thought it would be fine. :) > > > - I would prefer if resetting the reference phase times logger > > > wouldn't be kind of an afterthought of printing :) > > > > > > Also it might be useful to keep the data around for somewhat > > > longer (not throw it away after every phase). Don't we need the > > > data for further analysis? > > I don't have a strong opinion on this. > > > > I didn't consider keeping log data for further analysis. This could be > > a minor reason for supporting keeping log data longer, but I think > > interspersing with the existing G1 log would be the main reason for > > keeping it. > > > This would also allow printing it later using different log tags > > > (with different formatting). > > > - I like the split of phasetimes into data storage and printing. > > > I do not like that basically the timing data is created twice, > > > once for the phasetimes, once for the GCTimer (for JFR > > > basically). > > No, currently timing data is created once and used > > for both the phase log and the GCTimer. Or am I missing something? > > So in summary, mostly I agree with your comments except the below 2: > > 1. Interspersing with the G1 log. > > 2. Keeping log data longer. (This should be done if we go with the > > interspersing idea.) > I started working on the above 2 items. :) > I will update the webrev when I'm ready. Thanks a lot for considering all my comments. I think the output is much nicer now :) Some more notes: - In the current change (webrev.2) the method of using the "direct_print()" getter seems a bit forced, only there to keep the current structure of the code, i.e. printing within the ReferenceProcessor::process_references() method. What do you think about moving the printing outside of that method for all collectors, just passing a (properly initialized - that allows moving the reset() method into gc specific code as well) ReferenceProcessorPhaseTimes* that is then later used for printing, either directly or deferred? At the location where the reference processing is done we know whether we need to print directly or deferred. This also hides pretty specific information about printing (like indentation level) from the reference processing itself. Also, that would maybe allow storing the GCTimer reference somewhere in the ReferenceProcessorPhaseTimes so that we only need to pass a single container for timing information around. Overall that may reduce the code quite a bit, keeps similar components (GCTimer and ReferenceProcessorPhaseTimes) together without ReferenceProcessor needing to know about both of them, and removes the ReferenceProcessor "global" reference to the ReferenceProcessorPhaseTimes, which is easier to keep track of when looking at the code (instead of having the GCTimer passed in and the ReferenceProcessorPhaseTimes as a member).
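Roughly the shape I have in mind, sketched here in Java only for brevity (all names invented; the real code is the HotSpot C++ ReferenceProcessor/ReferenceProcessorPhaseTimes pair):

import java.util.LinkedHashMap;
import java.util.Map;

class PhaseTimes {
    private final Map<String, Double> phaseMs = new LinkedHashMap<>();
    void record(String phase, double ms) { phaseMs.put(phase, ms); }
    void print() { phaseMs.forEach((p, ms) -> System.out.println(p + ": " + ms + "ms")); }
}

class RefProcessor {
    // Reference processing only records into the caller-supplied container;
    // it knows nothing about when, how, or at what indentation it is printed.
    void processReferences(PhaseTimes times) {
        times.record("SoftReference", 0.3);  // placeholder timings
        times.record("WeakReference", 0.2);
    }
}

class Collector {
    public static void main(String[] args) {
        PhaseTimes times = new PhaseTimes();   // stack-allocated in the C++ case
        new RefProcessor().processReferences(times);
        times.print();                         // caller decides: print now or defer
    }
}

The point being that processing only fills the container, and each caller decides whether to print immediately or hand it off for deferred printing.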
The collectors that print immediately probably also can get away with a stack-allocated local ReferenceProcessorPhaseTimes, which somewhat simplifies their lifecycle management. - could you please tighten the visibility of the ReferenceProcessorPhaseTimes methods a bit? The getters of that class are only ever used in the print* methods, and even some of these print* methods are only ever called from class-local methods. I think this would drastically decrease the surface of that class. - there seems to be a bug in printing per-thread per-phase worker times; the values seem to contain the absolute time at which the list has been processed, not a duration (with -XX:+ParallelRefProcEnabled and gc+phases+ref=trace): [1512.286s][debug][gc,phases,ref] GC(834) Reference Processing: 2.5ms [1512.286s][debug][gc,phases,ref] GC(834) SoftReference: 0.3ms [1512.286s][debug][gc,phases,ref] GC(834) Balance queues: 0.0ms [1512.286s][debug][gc,phases,ref] GC(834) Phase1: 0.3ms [1512.286s][trace][gc,phases,ref] GC(834) Process lists (ms) Min: 1512283.9, Avg: 1512283.9, Max: 1512283.9, Diff: 0.0, Sum: 34782529.1, Workers: 23 [1512.286s][debug][gc,phases,ref] GC(834) Phase2: 0.3ms [1512.286s][trace][gc,phases,ref] GC(834) Process lists (ms) Min: 1512284.2, Avg: 1512284.2, Max: 1512284.2, Diff: 0.0, Sum: 34782535.9, Workers: 23 - in referenceProcessorPhaseTimes.cpp:35: the code reads if (_worker_time != NULL) { ... } with _worker_time being set to NULL just one line above (same with the other constructor). Not sure. - in RefProcWorkerTimeTracker::~RefProcWorkerTimeTracker: how is it possible that _worker_time is NULL? ReferenceProcessorPhaseTimes seems to always allocate memory for it. - RefProcPhaseTimesTracker takes the DiscoveredList array as a parameter, but only ever uses it to determine how many total entries this DiscoveredList[] has. So it seems to me that it would be better, in the name of information hiding, if the ReferenceProcessor, which already has a total_count() method, would just pass this total instead of the entire list. This would also remove the need for the max_gc_counts() getter in ReferenceProcessorPhaseTimes afaics too. - "Ref Counts" vs. "Reference Counts" vs. something else in the output of the enqueue phase: I would prefer to not use abbreviations. Since we already mess up the logging output in a big way, we might also just go all the way :P Thanks, Thomas From thomas.schatzl at oracle.com Tue Jun 27 13:34:07 2017 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 27 Jun 2017 15:34:07 +0200 Subject: RFR (XXS): 8183002: Remove unused concurrent parameter in HeapRegionManager::par_iterate Message-ID: <1498570447.2750.9.camel@oracle.com> Hi all, can I have a review for this change that removes an unused parameter in HeapRegionManager, and propagates this change to the callers? I think one Reviewer should be sufficient for this change. CR: https://bugs.openjdk.java.net/browse/JDK-8183002 Webrev: http://cr.openjdk.java.net/~tschatzl/8183002/webrev/ Testing: jprt Thanks, Thomas From thomas.schatzl at oracle.com Tue Jun 27 13:34:08 2017 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 27 Jun 2017 15:34:08 +0200 Subject: RFR (XXS): 8183006: Remove unused IterateOopClosureRegionClosure Message-ID: <1498570448.2750.10.camel@oracle.com> Hi all, subject says it all. I think one Reviewer should be sufficient for this change.
CR: https://bugs.openjdk.java.net/browse/JDK-8183006 Webrev: http://cr.openjdk.java.net/~tschatzl/8183006/webrev/ Testing: local compilation Thanks, Thomas From erik.helin at oracle.com Tue Jun 27 13:44:56 2017 From: erik.helin at oracle.com (Erik Helin) Date: Tue, 27 Jun 2017 15:44:56 +0200 Subject: RFR (XXS): 8183002: Remove unused concurrent parameter in HeapRegionManager::par_iterate In-Reply-To: <1498570447.2750.9.camel@oracle.com> References: <1498570447.2750.9.camel@oracle.com> Message-ID: <79e3a710-7f32-d058-a481-0ecdbf8f3b50@oracle.com> On 06/27/2017 03:34 PM, Thomas Schatzl wrote: > Hi all, > > can I have a review for this change that removes an unused parameter > in HeapRegionManager, and propagates this change to the callers? > > I think one Reviewer should be sufficient for this change. > > CR: > https://bugs.openjdk.java.net/browse/JDK-8183002 > Webrev: > http://cr.openjdk.java.net/~tschatzl/8183002/webrev/ Thank you Thomas, looks good, Reviewed! Erik > Testing: > jprt > > Thanks, > Thomas > From erik.helin at oracle.com Tue Jun 27 13:45:27 2017 From: erik.helin at oracle.com (Erik Helin) Date: Tue, 27 Jun 2017 15:45:27 +0200 Subject: RFR (XXS): 8183006: Remove unused IterateOopClosureRegionClosure In-Reply-To: <1498570448.2750.10.camel@oracle.com> References: <1498570448.2750.10.camel@oracle.com> Message-ID: <0cf994b8-967b-2dd3-b386-253dd7dcc036@oracle.com> On 06/27/2017 03:34 PM, Thomas Schatzl wrote: > Hi all, > > subject says it all. > > I think one Reviewer should be sufficient for this change. > > CR: > https://bugs.openjdk.java.net/browse/JDK-8183006 > Webrev: > http://cr.openjdk.java.net/~tschatzl/8183006/webrev/ > Testing: > local compilation Looks good, Reviewed. Thanks, Erik > Thanks, > Thomas > From thomas.schatzl at oracle.com Tue Jun 27 13:54:36 2017 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 27 Jun 2017 15:54:36 +0200 Subject: RFR (XXS): 8183002: Remove unused concurrent parameter in HeapRegionManager::par_iterate In-Reply-To: <79e3a710-7f32-d058-a481-0ecdbf8f3b50@oracle.com> References: <1498570447.2750.9.camel@oracle.com> <79e3a710-7f32-d058-a481-0ecdbf8f3b50@oracle.com> Message-ID: <1498571676.2750.12.camel@oracle.com> Hi Erik, On Tue, 2017-06-27 at 15:44 +0200, Erik Helin wrote: > On 06/27/2017 03:34 PM, Thomas Schatzl wrote: > > > > Hi all, > > > > > > [...] > > CR: > > https://bugs.openjdk.java.net/browse/JDK-8183002 > > Webrev: > > http://cr.openjdk.java.net/~tschatzl/8183002/webrev/ > Thank you Thomas, looks good, Reviewed! thanks for your review. Thomas From thomas.schatzl at oracle.com Tue Jun 27 13:54:57 2017 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 27 Jun 2017 15:54:57 +0200 Subject: RFR (XXS): 8183006: Remove unused IterateOopClosureRegionClosure In-Reply-To: <0cf994b8-967b-2dd3-b386-253dd7dcc036@oracle.com> References: <1498570448.2750.10.camel@oracle.com> <0cf994b8-967b-2dd3-b386-253dd7dcc036@oracle.com> Message-ID: <1498571697.2750.13.camel@oracle.com> Hi, On Tue, 2017-06-27 at 15:45 +0200, Erik Helin wrote: > On 06/27/2017 03:34 PM, Thomas Schatzl wrote: > > > > Hi all, > > > > [...] > > > > CR: > > https://bugs.openjdk.java.net/browse/JDK-8183006 > > Webrev: > > http://cr.openjdk.java.net/~tschatzl/8183006/webrev/ > > Testing: > > local compilation > Looks good, Reviewed. thanks for your review.
Thomas From robbin.ehn at oracle.com Tue Jun 27 14:51:45 2017 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Tue, 27 Jun 2017 16:51:45 +0200 Subject: RFR: Parallelize safepoint cleanup In-Reply-To: <6f2c6de7-298b-bf14-ab1f-430c4acd43c9@redhat.com> References: <486b5a72-bef8-4ebc-2729-3fe3aa3ab3b9@oracle.com> <5c80f8df-27c9-f9a9-dc6d-47f9c6019a61@redhat.com> <46ad874e-eb41-7927-265a-40dea92dfe1e@oracle.com> <5711258b-99b0-e06f-ba6e-0b6b55d88345@redhat.com> <0e1e2779-9316-b756-6cc8-e0c8add14a94@oracle.com> <1910961c-11bd-0e86-dd03-4fce66b9969f@redhat.com> <2b466176-b688-53a8-bef9-c7ec2c8c745b@oracle.com> <42872a15-d26c-9798-c6a2-f3f7c945baf7@redhat.com> <5e7c7d00-4acd-bea3-3525-33dbd9159efb@oracle.com> <6f2c6de7-298b-bf14-ab1f-430c4acd43c9@redhat.com> Message-ID: <5cd676de-872d-6d4a-691b-da561173f7d0@oracle.com> Hi Roman, There is something wrong in calculations: INFO: Deflate: InCirc=43 InUse=18 Scavenged=25 ForceMonitorScavenge=0 : pop=27051 free=215487 free is larger than population, have not had the time to dig into this. Thanks, Robbin On 06/22/2017 10:19 PM, Roman Kennke wrote: > So here's the latest iteration of that patch: > > http://cr.openjdk.java.net/~rkennke/8180932/webrev.08/ > > > I checked and fixed all the counters. The problem here is that they are > not updated in a single place (deflate_idle_monitors() ) but in several > places, potentially by multiple threads. I split up deflation into > prepare_.. and a finish_.. methods to initialize local and update global > counters respectively, and pass around a counters object (allocated on > stack) to the various code paths that use it. Updating the counters > always happen under a lock, there's no need to do anything special with > regards to concurrency. > > I also checked the nmethod marking, but there doesn't seem to be > anything in that code that looks problematic under concurrency. The > worst that can happen is that two threads write the same value into an > nmethod field. I think we can live with that ;-) > > Good to go? > > Tested by running specjvm and jcstress fastdebug+release without issues. > > Roman > > Am 02.06.2017 um 12:39 schrieb Robbin Ehn: >> Hi Roman, >> >> On 06/02/2017 11:41 AM, Roman Kennke wrote: >>> Hi David, >>> thanks for reviewing. I'll be on vacation the next two weeks too, with >>> only sporadic access to work stuff. >>> Yes, exposure will not be as good as otherwise, but it's not totally >>> untested either: the serial code path is the same as the parallel, the >>> only difference is that it's not actually called by multiple threads. >>> It's ok I think. >>> >>> I found two more issues that I think should be addressed: >>> - There are some counters in deflate_idle_monitors() and I'm not sure I >>> correctly handle them in the split-up and MT'ed thread-local/ global >>> list deflation >>> - nmethod marking seems to unconditionally poke true or something like >>> that in nmethod fields. This doesn't hurt correctness-wise, but it's >>> probably worth checking if it's already true, especially when doing this >>> with multiple threads concurrently. >>> >>> I'll send an updated patch around later, I hope I can get to it today... >> >> I'll review that when you get it out. >> I think this looks as a reasonable step before we tackle this with a >> major effort, such as the JEP you and Carsten doing. >> And another effort to 'fix' nmethods marking. >> >> Internal discussion yesterday lead us to conclude that the runtime >> will probably need more threads. 
>> This would be a good driver to do a 'global' worker pool which serves >> both gc, runtime and safepoints with threads. >> >>> >>> Roman >>> >>>> Hi Roman, >>>> >>>> I am about to disappear on an extended vacation so will let others >>>> pursue this. IIUC this is longer an opt-in by the user at runtime, but >>>> an opt-in by the particular GC developers. Okay. My only concern with >>>> that is if Shenandoah is the only GC that currently opts in then this >>>> code is not going to get much testing and will be more prone to >>>> incidental breakage. >> >> As I mentioned before, it seem like Erik ? have some idea, maybe he >> can do this after his barrier patch. >> >> Thanks! >> >> /Robbin >> >>>> >>>> Cheers, >>>> David >>>> >>>> On 2/06/2017 2:21 AM, Roman Kennke wrote: >>>>> Am 01.06.2017 um 17:50 schrieb Roman Kennke: >>>>>> Am 01.06.2017 um 14:18 schrieb Robbin Ehn: >>>>>>> Hi Roman, >>>>>>> >>>>>>> On 06/01/2017 11:29 AM, Roman Kennke wrote: >>>>>>>> Am 31.05.2017 um 22:06 schrieb Robbin Ehn: >>>>>>>>> Hi Roman, I agree that is really needed but: >>>>>>>>> >>>>>>>>> On 05/31/2017 10:27 AM, Roman Kennke wrote: >>>>>>>>>> I realized that sharing workers with GC is not so easy. >>>>>>>>>> >>>>>>>>>> We need to be able to use the workers at a safepoint during >>>>>>>>>> concurrent >>>>>>>>>> GC work (which also uses the same workers). This does not only >>>>>>>>>> require >>>>>>>>>> that those workers be suspended, like e.g. >>>>>>>>>> SuspendibleThreadSet::yield(), but they need to be idle, i.e. >>>>>>>>>> have >>>>>>>>>> finished their tasks. This needs some careful handling to work >>>>>>>>>> without >>>>>>>>>> races: it requires a SuspendibleThreadSetJoiner around the >>>>>>>>>> corresponding >>>>>>>>>> run_task() call and also the tasks themselves need to join the >>>>>>>>>> STS and >>>>>>>>>> handle requests for safepoints not by yielding, but by leaving >>>>>>>>>> the >>>>>>>>>> task. >>>>>>>>>> This is far too peculiar for me to make the call to hook up GC >>>>>>>>>> workers >>>>>>>>>> for safepoint cleanup, and I thus removed those parts. I left the >>>>>>>>>> API in >>>>>>>>>> CollectedHeap in place. I think GC devs who know better about G1 >>>>>>>>>> and CMS >>>>>>>>>> should make that call, or else just use a separate thread pool. >>>>>>>>>> >>>>>>>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.05/ >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Is it ok now? >>>>>>>>> I still think you should put the "Parallel Safepoint Cleanup" >>>>>>>>> workers >>>>>>>>> inside Shenandoah, >>>>>>>>> so the SafepointSynchronizer only calls get_safepoint_workers, >>>>>>>>> e.g.: >>>>>>>>> >>>>>>>>> _cleanup_workers = heap->get_safepoint_workers(); >>>>>>>>> _num_cleanup_workers = _cleanup_workers != NULL ? >>>>>>>>> _cleanup_workers->total_workers() : 1; >>>>>>>>> ParallelSPCleanupTask cleanup(_cleanup_subtasks); >>>>>>>>> StrongRootsScope srs(_num_cleanup_workers); >>>>>>>>> if (_cleanup_workers != NULL) { >>>>>>>>> _cleanup_workers->run_task(&cleanup, _num_cleanup_workers); >>>>>>>>> } else { >>>>>>>>> cleanup.work(0); >>>>>>>>> } >>>>>>>>> >>>>>>>>> That way you don't even need your new flags, but it will be up to >>>>>>>>> the >>>>>>>>> other GCs to make their worker available >>>>>>>>> or cheat with a separate workgang. >>>>>>>> I can do that, I don't mind. The question is, do we want that? >>>>>>> The problem is that we do not want to haste such decision, we >>>>>>> believe >>>>>>> there is a better solution. >>>>>>> I think you also would want another solution. 
>>>>>>> But it's seems like such solution with 1 'global' thread pool either >>>>>>> own by GC or the VM it self is quite the undertaking. >>>>>>> Since this probably will not be done any time soon my suggestion is, >>>>>>> to not hold you back (we also want this), just to make >>>>>>> the code parallel and as an intermediate step ask the GC if it minds >>>>>>> sharing it's thread. >>>>>>> >>>>>>> Now when Shenandoah is merged it's possible that e.g. G1 will share >>>>>>> the code for a separate thread pool, do something of it's own or >>>>>>> wait until the bigger question about thread pool(s) have been >>>>>>> resolved. >>>>>>> >>>>>>> By adding a thread pool directly to the SafepointSynchronizer and >>>>>>> flags for it we might limit our future options. >>>>>>> >>>>>>>> I wouldn't call it 'cheating with a separate workgang' though. I >>>>>>>> see >>>>>>>> that both G1 and CMS suspend their worker threads at a safepoint. >>>>>>>> However: >>>>>>> Yes it's not cheating but I want decent heuristics between e.g. >>>>>>> number >>>>>>> of concurrent marking threads and parallel safepoint threads since >>>>>>> they compete for cpu time. >>>>>>> As the code looks now, I think that decisions must be made by the >>>>>>> GC. >>>>>> Ok, I see your point. I updated the proposed patch accordingly: >>>>>> >>>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.06/ >>>>>> >>>>> Oops. Minor mistake there. Correction: >>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.07/ >>>>> >>>>> >>>>> (Removed 'class WorkGang' from safepoint.hpp, and forgot to add it >>>>> into >>>>> collectedHeap.hpp, resulting in build failure...) >>>>> >>>>> Roman >>>>> >>> > From alexander.harlap at oracle.com Tue Jun 27 16:28:31 2017 From: alexander.harlap at oracle.com (Alexander Harlap) Date: Tue, 27 Jun 2017 12:28:31 -0400 Subject: Need sponsor to push attached 8178507 into jdk10/hs/hostspt Message-ID: <1fa400ef-1dcd-d43b-2646-053c68b0ab1f@oracle.com> I need a sponsor to push attached 8178507.patch - co-locate nsk.regression.gc tests. Patch should go into jdk10/hs/hotspot Reviewed by Leonid Mesnik and Igor Ignatiev Thank you, Alex -------------- next part -------------- # HG changeset patch # User aharlap # Date 1498580135 14400 # Node ID f8228472bcdc7ba4b184a3b7e9f5f571e95fe8b4 # Parent 7d3478491210390556a9f34210bc9bc8d9f5ebd1 8178507: co-locate nsk.regression.gc tests Summary: convert four tonga tests into jtreg Reviewed-by: lmesnik, iignatyev diff -r 7d3478491210 -r f8228472bcdc make/test/JtregNative.gmk --- a/make/test/JtregNative.gmk Tue Jun 27 12:27:27 2017 +0000 +++ b/make/test/JtregNative.gmk Tue Jun 27 12:15:35 2017 -0400 @@ -45,6 +45,7 @@ BUILD_HOTSPOT_JTREG_NATIVE_SRC += \ $(HOTSPOT_TOPDIR)/test/gc/g1/TestJNIWeakG1 \ $(HOTSPOT_TOPDIR)/test/gc/stress/gclocker \ + $(HOTSPOT_TOPDIR)/test/gc/cslocker \ $(HOTSPOT_TOPDIR)/test/native_sanity \ $(HOTSPOT_TOPDIR)/test/runtime/jni/8025979 \ $(HOTSPOT_TOPDIR)/test/runtime/jni/8033445 \ diff -r 7d3478491210 -r f8228472bcdc test/gc/TestFullGCALot.java --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test/gc/TestFullGCALot.java Tue Jun 27 12:15:35 2017 -0400 @@ -0,0 +1,38 @@ +/* + * Copyright (c) 2017 Oracle and/or its affiliates. All rights reserved. + * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. + * + * This code is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 only, as + * published by the Free Software Foundation. 
+ * + * This code is distributed in the hope that it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License + * version 2 for more details (a copy is included in the LICENSE file that + * accompanied this code). + * + * You should have received a copy of the GNU General Public License version + * 2 along with this work; if not, write to the Free Software Foundation, + * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA. + * + * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA + * or visit www.oracle.com if you need additional information or have any + * questions. + */ + +/* + * @test TestFullGCALot + * @key gc + * @bug 4187687 + * @summary Ensure no acess violation when using FullGCALot + * @run main/othervm -XX:+FullGCALot TestFullGCALot + */ + +public class TestFullGCALot { + + public static void main(String argv[]) { + System.out.println("Hello world!"); + } +} + diff -r 7d3478491210 -r f8228472bcdc test/gc/TestMemoryInitialization.java --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test/gc/TestMemoryInitialization.java Tue Jun 27 12:15:35 2017 -0400 @@ -0,0 +1,48 @@ + +/* + * Copyright (c) 2002, 2017 Oracle and/or its affiliates. All rights reserved. + * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. + * + * This code is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 only, as + * published by the Free Software Foundation. + * + * This code is distributed in the hope that it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License + * version 2 for more details (a copy is included in the LICENSE file that + * accompanied this code). + * + * You should have received a copy of the GNU General Public License version + * 2 along with this work; if not, write to the Free Software Foundation, + * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA. + * + * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA + * or visit www.oracle.com if you need additional information or have any + * questions. + */ + +/* + * @test TestMemoryInitialization + * @key gc + * @bug 4668531 + * @summary Simple test for -XX:+CheckMemoryInitialization doesn't crash VM + * @run main/othervm -XX:+CheckMemoryInitialization TestMemoryInitialization + */ + +public class TestMemoryInitialization { + final static int LOOP_LENGTH = 10; + final static int CHUNK_SIZE = 1500000; + + public static byte[] buffer; + + public static void main(String args[]) { + + for (int i = 0; i < LOOP_LENGTH; i++) { + for (int j = 0; j < LOOP_LENGTH; j++) { + buffer = new byte[CHUNK_SIZE]; + buffer = null; + } + } + } +} diff -r 7d3478491210 -r f8228472bcdc test/gc/TestStackOverflow.java --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test/gc/TestStackOverflow.java Tue Jun 27 12:15:35 2017 -0400 @@ -0,0 +1,60 @@ +/* + * Copyright (c) 2002, 2017 Oracle and/or its affiliates. All rights reserved. + * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. + * + * This code is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 only, as + * published by the Free Software Foundation. 
+ * + * This code is distributed in the hope that it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License + * version 2 for more details (a copy is included in the LICENSE file that + * accompanied this code). + * + * You should have received a copy of the GNU General Public License version + * 2 along with this work; if not, write to the Free Software Foundation, + * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA. + * + * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA + * or visit www.oracle.com if you need additional information or have any + * questions. + */ + +/* + * @test TestStackOverflow + * @key gc + * @bug 4396719 + * @summary Test verifies only that VM doesn't crash but throw expected Error. + * @run main/othervm TestStackOverflow + */ + +public class TestStackOverflow { + final static int LOOP_LENGTH = 1000000; + final static int LOGGING_STEP = 10000; + + public static void main(String args[]) { + Object object = null; + + for (int i = 0; i < LOOP_LENGTH; i++) { + + // Check progress + if (i % LOGGING_STEP == 0) { + System.out.println(i); + } + try { + Object array[] = {object, object, object, object, object}; + object = array; + } catch (OutOfMemoryError e) { + object = null; + System.out.println("Caught OutOfMemoryError."); + return; + } catch (StackOverflowError e) { + object = null; + System.out.println("Caught StackOverflowError."); + return; + } + } + } +} + diff -r 7d3478491210 -r f8228472bcdc test/gc/cslocker/TestCSLocker.java --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test/gc/cslocker/TestCSLocker.java Tue Jun 27 12:15:35 2017 -0400 @@ -0,0 +1,98 @@ +/* + * Copyright (c) 2007, 2017 Oracle and/or its affiliates. All rights reserved. + * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. + * + * This code is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 only, as + * published by the Free Software Foundation. + * + * This code is distributed in the hope that it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License + * version 2 for more details (a copy is included in the LICENSE file that + * accompanied this code). + * + * You should have received a copy of the GNU General Public License version + * 2 along with this work; if not, write to the Free Software Foundation, + * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA. + * + * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA + * or visit www.oracle.com if you need additional information or have any + * questions. + */ + +/* + * @test TestCSLocker + * @key gc + * @bug 6186200 + * @summary This short test check RFE 6186200 changes. One thread locked + * @summary completely in JNI CS, while other is trying to allocate memory + * @summary provoking GC. OOM means FAIL, deadlock means PASS. 
+ * @run main/native/othervm -Xmx256m TestCSLocker + */ + +public class TestCSLocker extends Thread +{ + static int timeout = 5000; + public static void main(String args[]) throws Exception { + long startTime = System.currentTimeMillis(); + + // start garbage producer thread + GarbageProducer garbageProducer = new GarbageProducer(1000000, 10); + garbageProducer.start(); + + // start CS locker thread + CSLocker csLocker = new CSLocker(); + csLocker.start(); + + // check timeout to success deadlocking + while(System.currentTimeMillis() < startTime + timeout) { + System.out.println("sleeping..."); + sleep(1000); + } + + csLocker.unlock(); + garbageProducer.interrupt(); + } +} + +class GarbageProducer extends Thread +{ + private int size; + private int sleepTime; + + GarbageProducer(int size, int sleepTime) { + this.size = size; + this.sleepTime = sleepTime; + } + + public void run() { + boolean isRunning = true; + + while (isRunning) { + try { + int[] arr = null; + arr = new int[size]; + sleep(sleepTime); + } catch (InterruptedException e) { + isRunning = false; + } + } + } +} + +class CSLocker extends Thread +{ + static { System.loadLibrary("TestCSLocker"); } + + public void run() { + int[] a = new int[10]; + a[0] = 1; + if (!lock(a)) { + throw new RuntimeException("failed to acquire CSLock"); + } + } + + native boolean lock(int[] array); + native void unlock(); +} diff -r 7d3478491210 -r f8228472bcdc test/gc/cslocker/libTestCSLocker.c --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test/gc/cslocker/libTestCSLocker.c Tue Jun 27 12:15:35 2017 -0400 @@ -0,0 +1,49 @@ +/* + * Copyright (c) 2007, 2017 Oracle and/or its affiliates. All rights reserved. + * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. + * + * This code is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 only, as + * published by the Free Software Foundation. + * + * This code is distributed in the hope that it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License + * version 2 for more details (a copy is included in the LICENSE file that + * accompanied this code). + * + * You should have received a copy of the GNU General Public License version + * 2 along with this work; if not, write to the Free Software Foundation, + * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA. + * + * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA + * or visit www.oracle.com if you need additional information or have any + * questions. 
+ */
+
+#include <jni.h>
+
+static volatile int release_critical = 0;
+
+JNIEXPORT jboolean JNICALL Java_CSLocker_lock
+  (JNIEnv *env, jobject obj, jintArray array)
+{
+    jboolean retval = JNI_TRUE;
+    void *nativeArray = (*env)->GetPrimitiveArrayCritical(env, array, 0);
+
+    if (nativeArray == NULL) {
+        retval = JNI_FALSE;
+    }
+
+    // deadlock on purpose: busy-wait until unlock() sets release_critical
+    while (!release_critical) /* empty */;
+
+    (*env)->ReleasePrimitiveArrayCritical(env, array, nativeArray, 0);
+    return retval;
+}
+
+JNIEXPORT void JNICALL Java_CSLocker_unlock
+  (JNIEnv *env, jobject obj)
+{
+    release_critical = 1;
+}

From sangheon.kim at oracle.com  Tue Jun 27 17:08:09 2017
From: sangheon.kim at oracle.com (sangheon)
Date: Tue, 27 Jun 2017 10:08:09 -0700
Subject: Need sponsor to push attached 8178507 into jdk10/hs/hotspot
In-Reply-To: <1fa400ef-1dcd-d43b-2646-053c68b0ab1f@oracle.com>
References: <1fa400ef-1dcd-d43b-2646-053c68b0ab1f@oracle.com>
Message-ID: <24ade58f-becd-29ba-1775-122b54b42d35@oracle.com>

Hi Alex,

I can sponsor this.

Thanks,
Sangheon

On 06/27/2017 09:28 AM, Alexander Harlap wrote:
> I need a sponsor to push attached 8178507.patch - co-locate
> nsk.regression.gc tests.
>
> Patch should go into jdk10/hs/hotspot
>
> Reviewed by Leonid Mesnik and Igor Ignatiev
>
> Thank you,
>
> Alex
>

From email.sundarms at gmail.com  Tue Jun 27 19:44:51 2017
From: email.sundarms at gmail.com (Sundara Mohan M)
Date: Tue, 27 Jun 2017 12:44:51 -0700
Subject: Any idea why max = -1(-1K) in G1GC
Message-ID: 

When I try to get pool.getUsage() and print it, I am getting

G1 Eden Space
init = 27262976(26624K) used = 0(0K) committed = 0(0K) max = -1(-1K)
G1 Survivor Space
init = 0(0K) used = 0(0K) committed = 0(0K) max = -1(-1K)
G1 Old Gen
init = 241172480(235520K) used = 0(0K) committed = 0(0K) max = 524288000(512000K)

With ConcMarkSweepGC

Par Eden Space
init = 71630848(69952K) used = 0(0K) committed = 0(0K) max = 139853824(136576K)
Par Survivor Space
init = 8912896(8704K) used = 0(0K) committed = 0(0K) max = 17432576(17024K)
CMS Old Gen
init = 178978816(174784K) used = 0(0K) committed = 0(0K) max = 349569024(341376K)

code:
for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
    System.out.println(pool.getUsage());
}

Thanks,
Sundar

From rkennke at redhat.com  Tue Jun 27 19:47:21 2017
From: rkennke at redhat.com (Roman Kennke)
Date: Tue, 27 Jun 2017 21:47:21 +0200
Subject: RFR: Parallelize safepoint cleanup
In-Reply-To: <5cd676de-872d-6d4a-691b-da561173f7d0@oracle.com>
References: <486b5a72-bef8-4ebc-2729-3fe3aa3ab3b9@oracle.com>
 <5c80f8df-27c9-f9a9-dc6d-47f9c6019a61@redhat.com>
 <46ad874e-eb41-7927-265a-40dea92dfe1e@oracle.com>
 <5711258b-99b0-e06f-ba6e-0b6b55d88345@redhat.com>
 <0e1e2779-9316-b756-6cc8-e0c8add14a94@oracle.com>
 <1910961c-11bd-0e86-dd03-4fce66b9969f@redhat.com>
 <2b466176-b688-53a8-bef9-c7ec2c8c745b@oracle.com>
 <42872a15-d26c-9798-c6a2-f3f7c945baf7@redhat.com>
 <5e7c7d00-4acd-bea3-3525-33dbd9159efb@oracle.com>
 <6f2c6de7-298b-bf14-ab1f-430c4acd43c9@redhat.com>
 <5cd676de-872d-6d4a-691b-da561173f7d0@oracle.com>
Message-ID: <61d80e98-275f-b2b8-4ac7-6d5d03b047de@redhat.com>

Hi Robbin,

Ugh. Thanks for catching this.
Problem was that I was accounting the thread-local deflations twice:
once in thread-local processing (basically a leftover from my earlier
attempt to implement this accounting) and then again in
finish_deflate_idle_monitors(). Should be fixed here:

http://cr.openjdk.java.net/~rkennke/8180932/webrev.09/

Side question: which jtreg targets do you usually run?
Trying: make test TEST=hotspot_all gives me *lots* of failures due to missing jcstress stuff (?!) And even other subsets seem to depend on several bits and pieces that I have no idea about. Roman Am 27.06.2017 um 16:51 schrieb Robbin Ehn: > Hi Roman, > > There is something wrong in calculations: > INFO: Deflate: InCirc=43 InUse=18 Scavenged=25 ForceMonitorScavenge=0 > : pop=27051 free=215487 > > free is larger than population, have not had the time to dig into this. > > Thanks, Robbin > > On 06/22/2017 10:19 PM, Roman Kennke wrote: >> So here's the latest iteration of that patch: >> >> http://cr.openjdk.java.net/~rkennke/8180932/webrev.08/ >> >> >> I checked and fixed all the counters. The problem here is that they are >> not updated in a single place (deflate_idle_monitors() ) but in several >> places, potentially by multiple threads. I split up deflation into >> prepare_.. and a finish_.. methods to initialize local and update global >> counters respectively, and pass around a counters object (allocated on >> stack) to the various code paths that use it. Updating the counters >> always happen under a lock, there's no need to do anything special with >> regards to concurrency. >> >> I also checked the nmethod marking, but there doesn't seem to be >> anything in that code that looks problematic under concurrency. The >> worst that can happen is that two threads write the same value into an >> nmethod field. I think we can live with that ;-) >> >> Good to go? >> >> Tested by running specjvm and jcstress fastdebug+release without issues. >> >> Roman >> >> Am 02.06.2017 um 12:39 schrieb Robbin Ehn: >>> Hi Roman, >>> >>> On 06/02/2017 11:41 AM, Roman Kennke wrote: >>>> Hi David, >>>> thanks for reviewing. I'll be on vacation the next two weeks too, with >>>> only sporadic access to work stuff. >>>> Yes, exposure will not be as good as otherwise, but it's not totally >>>> untested either: the serial code path is the same as the parallel, the >>>> only difference is that it's not actually called by multiple threads. >>>> It's ok I think. >>>> >>>> I found two more issues that I think should be addressed: >>>> - There are some counters in deflate_idle_monitors() and I'm not >>>> sure I >>>> correctly handle them in the split-up and MT'ed thread-local/ global >>>> list deflation >>>> - nmethod marking seems to unconditionally poke true or something like >>>> that in nmethod fields. This doesn't hurt correctness-wise, but it's >>>> probably worth checking if it's already true, especially when doing >>>> this >>>> with multiple threads concurrently. >>>> >>>> I'll send an updated patch around later, I hope I can get to it >>>> today... >>> >>> I'll review that when you get it out. >>> I think this looks as a reasonable step before we tackle this with a >>> major effort, such as the JEP you and Carsten doing. >>> And another effort to 'fix' nmethods marking. >>> >>> Internal discussion yesterday lead us to conclude that the runtime >>> will probably need more threads. >>> This would be a good driver to do a 'global' worker pool which serves >>> both gc, runtime and safepoints with threads. >>> >>>> >>>> Roman >>>> >>>>> Hi Roman, >>>>> >>>>> I am about to disappear on an extended vacation so will let others >>>>> pursue this. IIUC this is longer an opt-in by the user at runtime, >>>>> but >>>>> an opt-in by the particular GC developers. Okay. 
My only concern with >>>>> that is if Shenandoah is the only GC that currently opts in then this >>>>> code is not going to get much testing and will be more prone to >>>>> incidental breakage. >>> >>> As I mentioned before, it seem like Erik ? have some idea, maybe he >>> can do this after his barrier patch. >>> >>> Thanks! >>> >>> /Robbin >>> >>>>> >>>>> Cheers, >>>>> David >>>>> >>>>> On 2/06/2017 2:21 AM, Roman Kennke wrote: >>>>>> Am 01.06.2017 um 17:50 schrieb Roman Kennke: >>>>>>> Am 01.06.2017 um 14:18 schrieb Robbin Ehn: >>>>>>>> Hi Roman, >>>>>>>> >>>>>>>> On 06/01/2017 11:29 AM, Roman Kennke wrote: >>>>>>>>> Am 31.05.2017 um 22:06 schrieb Robbin Ehn: >>>>>>>>>> Hi Roman, I agree that is really needed but: >>>>>>>>>> >>>>>>>>>> On 05/31/2017 10:27 AM, Roman Kennke wrote: >>>>>>>>>>> I realized that sharing workers with GC is not so easy. >>>>>>>>>>> >>>>>>>>>>> We need to be able to use the workers at a safepoint during >>>>>>>>>>> concurrent >>>>>>>>>>> GC work (which also uses the same workers). This does not only >>>>>>>>>>> require >>>>>>>>>>> that those workers be suspended, like e.g. >>>>>>>>>>> SuspendibleThreadSet::yield(), but they need to be idle, i.e. >>>>>>>>>>> have >>>>>>>>>>> finished their tasks. This needs some careful handling to work >>>>>>>>>>> without >>>>>>>>>>> races: it requires a SuspendibleThreadSetJoiner around the >>>>>>>>>>> corresponding >>>>>>>>>>> run_task() call and also the tasks themselves need to join the >>>>>>>>>>> STS and >>>>>>>>>>> handle requests for safepoints not by yielding, but by leaving >>>>>>>>>>> the >>>>>>>>>>> task. >>>>>>>>>>> This is far too peculiar for me to make the call to hook up GC >>>>>>>>>>> workers >>>>>>>>>>> for safepoint cleanup, and I thus removed those parts. I >>>>>>>>>>> left the >>>>>>>>>>> API in >>>>>>>>>>> CollectedHeap in place. I think GC devs who know better >>>>>>>>>>> about G1 >>>>>>>>>>> and CMS >>>>>>>>>>> should make that call, or else just use a separate thread pool. >>>>>>>>>>> >>>>>>>>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.05/ >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Is it ok now? >>>>>>>>>> I still think you should put the "Parallel Safepoint Cleanup" >>>>>>>>>> workers >>>>>>>>>> inside Shenandoah, >>>>>>>>>> so the SafepointSynchronizer only calls get_safepoint_workers, >>>>>>>>>> e.g.: >>>>>>>>>> >>>>>>>>>> _cleanup_workers = heap->get_safepoint_workers(); >>>>>>>>>> _num_cleanup_workers = _cleanup_workers != NULL ? >>>>>>>>>> _cleanup_workers->total_workers() : 1; >>>>>>>>>> ParallelSPCleanupTask cleanup(_cleanup_subtasks); >>>>>>>>>> StrongRootsScope srs(_num_cleanup_workers); >>>>>>>>>> if (_cleanup_workers != NULL) { >>>>>>>>>> _cleanup_workers->run_task(&cleanup, >>>>>>>>>> _num_cleanup_workers); >>>>>>>>>> } else { >>>>>>>>>> cleanup.work(0); >>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> That way you don't even need your new flags, but it will be >>>>>>>>>> up to >>>>>>>>>> the >>>>>>>>>> other GCs to make their worker available >>>>>>>>>> or cheat with a separate workgang. >>>>>>>>> I can do that, I don't mind. The question is, do we want that? >>>>>>>> The problem is that we do not want to haste such decision, we >>>>>>>> believe >>>>>>>> there is a better solution. >>>>>>>> I think you also would want another solution. >>>>>>>> But it's seems like such solution with 1 'global' thread pool >>>>>>>> either >>>>>>>> own by GC or the VM it self is quite the undertaking. 
>>>>>>>> Since this probably will not be done any time soon my >>>>>>>> suggestion is, >>>>>>>> to not hold you back (we also want this), just to make >>>>>>>> the code parallel and as an intermediate step ask the GC if it >>>>>>>> minds >>>>>>>> sharing it's thread. >>>>>>>> >>>>>>>> Now when Shenandoah is merged it's possible that e.g. G1 will >>>>>>>> share >>>>>>>> the code for a separate thread pool, do something of it's own or >>>>>>>> wait until the bigger question about thread pool(s) have been >>>>>>>> resolved. >>>>>>>> >>>>>>>> By adding a thread pool directly to the SafepointSynchronizer and >>>>>>>> flags for it we might limit our future options. >>>>>>>> >>>>>>>>> I wouldn't call it 'cheating with a separate workgang' though. I >>>>>>>>> see >>>>>>>>> that both G1 and CMS suspend their worker threads at a safepoint. >>>>>>>>> However: >>>>>>>> Yes it's not cheating but I want decent heuristics between e.g. >>>>>>>> number >>>>>>>> of concurrent marking threads and parallel safepoint threads since >>>>>>>> they compete for cpu time. >>>>>>>> As the code looks now, I think that decisions must be made by the >>>>>>>> GC. >>>>>>> Ok, I see your point. I updated the proposed patch accordingly: >>>>>>> >>>>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.06/ >>>>>>> >>>>>> Oops. Minor mistake there. Correction: >>>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.07/ >>>>>> >>>>>> >>>>>> (Removed 'class WorkGang' from safepoint.hpp, and forgot to add it >>>>>> into >>>>>> collectedHeap.hpp, resulting in build failure...) >>>>>> >>>>>> Roman >>>>>> >>>> >> From thomas.schatzl at oracle.com Wed Jun 28 08:25:57 2017 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Wed, 28 Jun 2017 10:25:57 +0200 Subject: RFR (7xS): 8175554: Improve G1UpdateRSOrPushRefClosure In-Reply-To: <1498128249.2831.38.camel@oracle.com> References: <1491910205.2754.31.camel@oracle.com> <1493986396.2777.61.camel@oracle.com> <3f492193-275d-3d4e-1a91-2d7e07fdaafb@oracle.com> <1497945947.2784.6.camel@oracle.com> <1498128249.2831.38.camel@oracle.com> Message-ID: <1498638357.2874.6.camel@oracle.com> Hi all, ? Erik suggested a few more refactorings: - rename G1ParClosureSuper ->?G1ScanClosureBase - rename a few "oops_in_heap_closure" parameter -> "update_rs_cl" - move instantiation of closures from oops_into_collection_set_do() into scan_rem_set()/update_rem_set() methods. I assume these are the final ones :) Webrevs: http://cr.openjdk.java.net/~tschatzl/8175554/webrev.3/?(full) http://cr.openjdk.java.net/~tschatzl/8175554/webrev.2_to_3/?(diff) Testing: jprt Thanks, ? Thomas On Thu, 2017-06-22 at 12:44 +0200, Thomas Schatzl wrote: > Hi all, > > ? after discussion with Erik, I removed one comment, and renamed the > closures to something that resembles their use. Also I had to > reintroduce the G1ParPushRefClosure removed in the initial patch due > to > performance regressions. > > G1UpdateOrScanRSClosure -> G1ScanObjsDuringUpdateRSClosure > G1ParPushRefClosure -> G1ScanObjsDuringScanRSClosure > G1ParScanClosure -> G1ScanEvacuatedObjClosure > > We also found that the mechanism to collect cards that contain > references into the collection set to not lose any remembered set > entries during update RS if there is an evacuation failure is > basically > superfluous. Other, existing mechanism make sure that all required > remembered sets are (re-)created in other stages of the GC. > > Removal of this code has been decided to be out of scope here. 
> > Webrev: > http://cr.openjdk.java.net/~tschatzl/8175554/webrev.1_to_2/?(diff) > http://cr.openjdk.java.net/~tschatzl/8175554/webrev.2/?(full) > Testing: > jprt, local testing > > Thanks, > ? Thomas > > > On Tue, 2017-06-20 at 10:05 +0200, Thomas Schatzl wrote: > > > > Hi Sangheon, others, > > > > On Tue, 2017-05-30 at 15:15 -0700, sangheon wrote: > > > > > > > > > Hi Thomas, > > > > > > On 05/05/2017 05:13 AM, Thomas Schatzl wrote: > > > > > > > > > > > > > > > > Hi all, > > > > > > > > ???recent reviews have made changes necessary to parts of the > > > > changeset chain. > > > > > > > > Here is a list of links to updated webrevs. Since they have > > > > apparently not been reviewed yet, I simply overwrote the old > > > > webrevs. > > > > > > > > JDK-8177044: Remove _scan_top from HeapRegion > > > > http://cr.openjdk.java.net/~tschatzl/8177044/webrev/ > > > > > > > > JDK-8178148: Log more detailed information about scan rs phase > > > > http://cr.openjdk.java.net/~tschatzl/8178148/webrev/ > > > > > > > > JDK-8175554: Improve G1UpdateRSOrPushRefClosure > > > > http://cr.openjdk.java.net/~tschatzl/8175554/webrev/ > > > Looks good to me. > > > I only have minor nits. > > > > > > ------------------------------------------------------ > > > src/share/vm/gc/g1/g1OopClosures.hpp > > > ???78???virtual void do_oop(oop* p) { do_oop_nv(p); } > > > Misaligned with above line. > > > > > > ------------------------------------------------------ > > > src/share/vm/gc/g1/g1RemSet.hpp > > > ? 204???????????????????G1UpdateOrScanRSClosure* push_heap_cl, > > > Rename to reflect new closure name? > > > > > > ------------------------------------------------------ > > > src/share/vm/gc/g1/g1RootProcessor.hpp > > > Copyright update. > > > > > > ------------------------------------------------------ > > > src/share/vm/gc/g1/g1_specialized_oop_closures.hpp > > > ???45???????f(G1UpdateOrScanRSClosure,_nv)?????????\ > > > Misaligned '\'. > > > > > ? I fixed all this in addition to incorporating ErikD's comments > > that > > asked for factoring out two parts of the G1ParScanClosure and > > G1UpdateOrScanRSClosure that were equal now. > > > > I did some performance testing again due to that, and also found > > that > > the check to filter out non-cross-region references > > in?G1UpdateOrScanRSClosure::do_oop_nv() seemed faster, so I also > > reverted it to the old code. > > > > Also in this change G1UpdateOrScanRSClosure::do_oop_nv() did not > > update > > _has_refs_into_cset as before. Fixed that as well. > > > > Thanks, > > ? Thomas > > From erik.helin at oracle.com Wed Jun 28 08:28:04 2017 From: erik.helin at oracle.com (Erik Helin) Date: Wed, 28 Jun 2017 10:28:04 +0200 Subject: RFR (7xS): 8175554: Improve G1UpdateRSOrPushRefClosure In-Reply-To: <1498638357.2874.6.camel@oracle.com> References: <1491910205.2754.31.camel@oracle.com> <1493986396.2777.61.camel@oracle.com> <3f492193-275d-3d4e-1a91-2d7e07fdaafb@oracle.com> <1497945947.2784.6.camel@oracle.com> <1498128249.2831.38.camel@oracle.com> <1498638357.2874.6.camel@oracle.com> Message-ID: <4eaee6f1-8cad-4f9c-262e-c047db21debc@oracle.com> On 06/28/2017 10:25 AM, Thomas Schatzl wrote: > Hi all, > > Erik suggested a few more refactorings: > > - rename G1ParClosureSuper -> G1ScanClosureBase > - rename a few "oops_in_heap_closure" parameter -> "update_rs_cl" > - move instantiation of closures from oops_into_collection_set_do() > into scan_rem_set()/update_rem_set() methods. 
> > I assume these are the final ones :) > > Webrevs: > http://cr.openjdk.java.net/~tschatzl/8175554/webrev.3/ (full) > http://cr.openjdk.java.net/~tschatzl/8175554/webrev.2_to_3/ (diff) Thank you Thomas, this looks really nice now! Reviewed and ready to go :) Thanks, Erik > Testing: > jprt > > Thanks, > Thomas > > On Thu, 2017-06-22 at 12:44 +0200, Thomas Schatzl wrote: >> Hi all, >> >> after discussion with Erik, I removed one comment, and renamed the >> closures to something that resembles their use. Also I had to >> reintroduce the G1ParPushRefClosure removed in the initial patch due >> to >> performance regressions. >> >> G1UpdateOrScanRSClosure -> G1ScanObjsDuringUpdateRSClosure >> G1ParPushRefClosure -> G1ScanObjsDuringScanRSClosure >> G1ParScanClosure -> G1ScanEvacuatedObjClosure >> >> We also found that the mechanism to collect cards that contain >> references into the collection set to not lose any remembered set >> entries during update RS if there is an evacuation failure is >> basically >> superfluous. Other, existing mechanism make sure that all required >> remembered sets are (re-)created in other stages of the GC. >> >> Removal of this code has been decided to be out of scope here. >> >> Webrev: >> http://cr.openjdk.java.net/~tschatzl/8175554/webrev.1_to_2/ (diff) >> http://cr.openjdk.java.net/~tschatzl/8175554/webrev.2/ (full) >> Testing: >> jprt, local testing >> >> Thanks, >> Thomas >> >> >> On Tue, 2017-06-20 at 10:05 +0200, Thomas Schatzl wrote: >>> >>> Hi Sangheon, others, >>> >>> On Tue, 2017-05-30 at 15:15 -0700, sangheon wrote: >>>> >>>> >>>> Hi Thomas, >>>> >>>> On 05/05/2017 05:13 AM, Thomas Schatzl wrote: >>>>> >>>>> >>>>> >>>>> Hi all, >>>>> >>>>> recent reviews have made changes necessary to parts of the >>>>> changeset chain. >>>>> >>>>> Here is a list of links to updated webrevs. Since they have >>>>> apparently not been reviewed yet, I simply overwrote the old >>>>> webrevs. >>>>> >>>>> JDK-8177044: Remove _scan_top from HeapRegion >>>>> http://cr.openjdk.java.net/~tschatzl/8177044/webrev/ >>>>> >>>>> JDK-8178148: Log more detailed information about scan rs phase >>>>> http://cr.openjdk.java.net/~tschatzl/8178148/webrev/ >>>>> >>>>> JDK-8175554: Improve G1UpdateRSOrPushRefClosure >>>>> http://cr.openjdk.java.net/~tschatzl/8175554/webrev/ >>>> Looks good to me. >>>> I only have minor nits. >>>> >>>> ------------------------------------------------------ >>>> src/share/vm/gc/g1/g1OopClosures.hpp >>>> 78 virtual void do_oop(oop* p) { do_oop_nv(p); } >>>> Misaligned with above line. >>>> >>>> ------------------------------------------------------ >>>> src/share/vm/gc/g1/g1RemSet.hpp >>>> 204 G1UpdateOrScanRSClosure* push_heap_cl, >>>> Rename to reflect new closure name? >>>> >>>> ------------------------------------------------------ >>>> src/share/vm/gc/g1/g1RootProcessor.hpp >>>> Copyright update. >>>> >>>> ------------------------------------------------------ >>>> src/share/vm/gc/g1/g1_specialized_oop_closures.hpp >>>> 45 f(G1UpdateOrScanRSClosure,_nv) \ >>>> Misaligned '\'. >>>> >>> I fixed all this in addition to incorporating ErikD's comments >>> that >>> asked for factoring out two parts of the G1ParScanClosure and >>> G1UpdateOrScanRSClosure that were equal now. >>> >>> I did some performance testing again due to that, and also found >>> that >>> the check to filter out non-cross-region references >>> in G1UpdateOrScanRSClosure::do_oop_nv() seemed faster, so I also >>> reverted it to the old code. 
>>> >>> Also in this change G1UpdateOrScanRSClosure::do_oop_nv() did not >>> update >>> _has_refs_into_cset as before. Fixed that as well. >>> >>> Thanks, >>> Thomas >>> From stefan.johansson at oracle.com Wed Jun 28 08:56:55 2017 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Wed, 28 Jun 2017 10:56:55 +0200 Subject: Any idea why max = -1(-1K) in G1GC In-Reply-To: References: Message-ID: <808e42d4-ad73-e25c-2fd9-955bb3e83d6f@oracle.com> Hi Sundar, I understand that this might be a bit confusing. The -1 means undefined and the reason the max is undefined for Eden and Survivor is that they are logical spaces within the G1 heap. Technically the same is true for the Old Gen, but to not lose information the heap capacity is used as the max for Old Gen. Some more detailed information from comments in the code: g1MemoryPool.hpp 35 // This file contains the three classes that represent the memory0 36 // pools of the G1 spaces: G1EdenPool, G1SurvivorPool, and 37 // G1OldGenPool. In G1, unlike our other GCs, we do not have a 38 // physical space for each of those spaces. Instead, we allocate 39 // regions for all three spaces out of a single pool of regions (that 40 // pool basically covers the entire heap). As a result, the eden, 41 // survivor, and old gen are considered logical spaces in G1, as each 42 // is a set of non-contiguous regions. This is also reflected in the 43 // way we map them to memory pools here. The easiest way to have done 44 // this would have been to map the entire G1 heap to a single memory 45 // pool. However, it's helpful to show how large the eden and survivor 46 // get, as this does affect the performance and behavior of G1. Which 47 // is why we introduce the three memory pools implemented here. 48 // 49 // See comments in g1MonitoringSupport.hpp for additional details 50 // on this model. g1MonitoringSupport.hpp 94 // * Max Capacity 95 // 96 // For jstat, we set the max capacity of all spaces to heap_capacity, 97 // given that we don't always have a reasonable upper bound on how big 98 // each space can grow. For the memory pools, we make the max 99 // capacity undefined with the exception of the old memory pool for 100 // which we make the max capacity same as the max heap capacity. Cheers, Stefan On 2017-06-27 21:44, Sundara Mohan M wrote: > When i try to get pool.getUsage() and print it i am getting > > G1 Eden Space > init = 27262976(26624K) used = 0(0K) committed = 0(0K) max = -1(-1K) > G1 Survivor Space > init = 0(0K) used = 0(0K) committed = 0(0K) max = -1(-1K) > G1 Old Gen > init = 241172480(235520K) used = 0(0K) committed = 0(0K) max = > 524288000(512000K) > > With ConcMarkSweepGC > > Par Eden Space > init = 71630848(69952K) used = 0(0K) committed = 0(0K) max = 139853824(136576K) > Par Survivor Space > init = 8912896(8704K) used = 0(0K) committed = 0(0K) max = 17432576(17024K) > CMS Old Gen > init = 178978816(174784K) used = 0(0K) committed = 0(0K) max = > 349569024(341376K) > > > code > for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) { > System.out.println(pool.getUsage()) > } > > Thanks, > Sundar From stefan.johansson at oracle.com Wed Jun 28 08:59:25 2017 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Wed, 28 Jun 2017 10:59:25 +0200 Subject: RFR (XXS): 8183006: Remove unused IterateOopClosureRegionClosure In-Reply-To: <1498570448.2750.10.camel@oracle.com> References: <1498570448.2750.10.camel@oracle.com> Message-ID: On 2017-06-27 15:34, Thomas Schatzl wrote: > Hi all, > > subject says it all. 
> > I think one Reviewer should be sufficient for this change. > > CR: > https://bugs.openjdk.java.net/browse/JDK-8183006 > Webrev: > http://cr.openjdk.java.net/~tschatzl/8183006/webrev/ Good! Thanks, Stefan > Testing: > local compilation > > Thanks, > Thomas From stefan.johansson at oracle.com Wed Jun 28 09:02:10 2017 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Wed, 28 Jun 2017 11:02:10 +0200 Subject: RFR (XXS): 8183002: Remove unused concurrent parameter in HeapRegionManager::par_iterate In-Reply-To: <1498570447.2750.9.camel@oracle.com> References: <1498570447.2750.9.camel@oracle.com> Message-ID: <452c263b-8d01-21a9-0dc1-5b87170658e4@oracle.com> Hi Thomas, On 2017-06-27 15:34, Thomas Schatzl wrote: > Hi all, > > can I have a review for this change that removes an unused parameter > in HeapRegionManager, and propagating this change to the callers? > > I think one Reviewer should be sufficient for this change. > > CR: > https://bugs.openjdk.java.net/browse/JDK-8183002 > Webrev: > http://cr.openjdk.java.net/~tschatzl/8183002/webrev/ Nice cleanup, looks good! Thanks, Stefan > Testing: > jprt > > Thanks, > Thomas From thomas.schatzl at oracle.com Wed Jun 28 09:14:51 2017 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Wed, 28 Jun 2017 11:14:51 +0200 Subject: RFR (XXS): 8183006: Remove unused IterateOopClosureRegionClosure In-Reply-To: References: <1498570448.2750.10.camel@oracle.com> Message-ID: <1498641291.2874.12.camel@oracle.com> Hi Stefan, On Wed, 2017-06-28 at 10:59 +0200, Stefan Johansson wrote: > On 2017-06-27 15:34, Thomas Schatzl wrote: > > > > Hi all, > > > > ???subject says it all. > > > > I think one Reviewer should be sufficient for this change. > > > > CR: > > https://bugs.openjdk.java.net/browse/JDK-8183006 > > Webrev: > > http://cr.openjdk.java.net/~tschatzl/8183006/webrev/ > Good! > ? thanks for your review ;) Thomas From thomas.schatzl at oracle.com Wed Jun 28 09:15:59 2017 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Wed, 28 Jun 2017 11:15:59 +0200 Subject: RFR (XXS): 8183002: Remove unused concurrent parameter in HeapRegionManager::par_iterate In-Reply-To: <452c263b-8d01-21a9-0dc1-5b87170658e4@oracle.com> References: <1498570447.2750.9.camel@oracle.com> <452c263b-8d01-21a9-0dc1-5b87170658e4@oracle.com> Message-ID: <1498641359.2874.13.camel@oracle.com> Hi Stefan, On Wed, 2017-06-28 at 11:02 +0200, Stefan Johansson wrote: > Hi Thomas, > > On 2017-06-27 15:34, Thomas Schatzl wrote: > > > > Hi all, > > > > ???can I have a review for this change that removes an unused > > parameter > > in HeapRegionManager, and propagating this change to the callers? > > > > I think one Reviewer should be sufficient for this change. > > > > CR: > > https://bugs.openjdk.java.net/browse/JDK-8183002 > > Webrev: > > http://cr.openjdk.java.net/~tschatzl/8183002/webrev/ > Nice cleanup, looks good! ? thanks for your review. Thomas From thomas.schatzl at oracle.com Wed Jun 28 10:34:38 2017 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Wed, 28 Jun 2017 12:34:38 +0200 Subject: RFR (S): 8178151: Clean up G1RemSet files Message-ID: <1498646078.2874.16.camel@oracle.com> Hi all, ? can I have reviews for this small change that is supposed to tighten the interface and clean up documentation for the g1Remset* files including some better naming? [Note: I already sent this out for review two months ago in that big RFR thread with 7 changes. 
However, so much time has been elapsed since then, and everything based on this has been pushed, so I figured it would be simpler to just make an extra RFR request] CR: https://bugs.openjdk.java.net/browse/JDK-8178151 Webrev: http://cr.openjdk.java.net/~tschatzl/8178151/webrev.1/ Testing: jprt Thanks, ? Thomas From erik.helin at oracle.com Wed Jun 28 11:44:38 2017 From: erik.helin at oracle.com (Erik Helin) Date: Wed, 28 Jun 2017 13:44:38 +0200 Subject: RFR (S): 8178151: Clean up G1RemSet files In-Reply-To: <1498646078.2874.16.camel@oracle.com> References: <1498646078.2874.16.camel@oracle.com> Message-ID: <7869f721-8736-cedd-b28d-a5869b409a6a@oracle.com> On 06/28/2017 12:34 PM, Thomas Schatzl wrote: > Hi all, > > can I have reviews for this small change that is supposed to tighten > the interface and clean up documentation for the g1Remset* files > including some better naming? > > [Note: I already sent this out for review two months ago in that big > RFR thread with 7 changes. However, so much time has been elapsed since > then, and everything based on this has been pushed, so I figured it > would be simpler to just make an extra RFR request] > > CR: > https://bugs.openjdk.java.net/browse/JDK-8178151 > Webrev: > http://cr.openjdk.java.net/~tschatzl/8178151/webrev.1/ Looks good, Reviewed. Thanks, Erik > Testing: > jprt > > Thanks, > Thomas > > From stefan.johansson at oracle.com Wed Jun 28 11:57:27 2017 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Wed, 28 Jun 2017 13:57:27 +0200 Subject: RFR (S): 8178151: Clean up G1RemSet files In-Reply-To: <1498646078.2874.16.camel@oracle.com> References: <1498646078.2874.16.camel@oracle.com> Message-ID: <3d6f1488-0958-1ac3-7405-020e11dd9395@oracle.com> Hi Thomas, On 2017-06-28 12:34, Thomas Schatzl wrote: > Hi all, > > can I have reviews for this small change that is supposed to tighten > the interface and clean up documentation for the g1Remset* files > including some better naming? > > [Note: I already sent this out for review two months ago in that big > RFR thread with 7 changes. However, so much time has been elapsed since > then, and everything based on this has been pushed, so I figured it > would be simpler to just make an extra RFR request] > > CR: > https://bugs.openjdk.java.net/browse/JDK-8178151 > Webrev: > http://cr.openjdk.java.net/~tschatzl/8178151/webrev.1/ Looks good. Thanks, Stefan > Testing: > jprt > > Thanks, > Thomas > > From thomas.schatzl at oracle.com Wed Jun 28 12:18:32 2017 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Wed, 28 Jun 2017 14:18:32 +0200 Subject: RFR (S): 8178151: Clean up G1RemSet files In-Reply-To: <3d6f1488-0958-1ac3-7405-020e11dd9395@oracle.com> References: <1498646078.2874.16.camel@oracle.com> <3d6f1488-0958-1ac3-7405-020e11dd9395@oracle.com> Message-ID: <1498652312.2874.22.camel@oracle.com> Hi Stefan, Erik, On Wed, 2017-06-28 at 13:57 +0200, Stefan Johansson wrote: > Hi Thomas, > > On 2017-06-28 12:34, Thomas Schatzl wrote: > > > > Hi all, > > > > ???can I have reviews for this small change that is supposed to > > tighten > > the interface and clean up documentation for the g1Remset* files > > including some better naming? > > > > [Note: I already sent this out for review two months ago in that > > big RFR thread with 7 changes. 
However, so much time has been > > elapsed since then, and everything based on this has been pushed, > > so I figured it would be simpler to just make an extra RFR request] > > > > CR: > > https://bugs.openjdk.java.net/browse/JDK-8178151 > > Webrev: > > http://cr.openjdk.java.net/~tschatzl/8178151/webrev.1/ > Looks good. ? thanks for the reviews. Since this is a renaming/brushing up code quality only changeset I will push asap (and it's been hanging around out for review for >8 weeks). Thanks, ? Thomas From erik.helin at oracle.com Wed Jun 28 12:26:53 2017 From: erik.helin at oracle.com (Erik Helin) Date: Wed, 28 Jun 2017 14:26:53 +0200 Subject: RFR: Rename RefineRecordRefsIntoCSCardTableEntryClosure to G1RefineCardClosure Message-ID: <9c02a3e8-fa01-8329-3b01-011971ebcf94@oracle.com> Hi all, please see the below inlined patch that just renames RefineRecordRefsIntoCSCardTableEntryClosure to more sensible G1RefineCardClosure. Bug: https://bugs.openjdk.java.net/browse/JDK-8183122 Testing: make hotspot Thanks, Erik # HG changeset patch # User ehelin # Date 1498652248 -7200 # Wed Jun 28 14:17:28 2017 +0200 # Node ID f6b845d54277ff9232578fee4ba9f80c85aab0ac # Parent 46d3ce319f37d2996fb0393a4f54f7759148bd1d 8183122: Rename RefineRecordRefsIntoCSCardTableEntryClosure to G1RefineCardClosure diff -r 46d3ce319f37 -r f6b845d54277 src/share/vm/gc/g1/g1RemSet.cpp --- a/src/share/vm/gc/g1/g1RemSet.cpp Wed Jun 28 12:11:55 2017 +0200 +++ b/src/share/vm/gc/g1/g1RemSet.cpp Wed Jun 28 14:17:28 2017 +0200 @@ -438,15 +438,14 @@ // Closure used for updating RSets and recording references that // point into the collection set. Only called during an // evacuation pause. - -class RefineRecordRefsIntoCSCardTableEntryClosure: public CardTableEntryClosure { +class G1RefineCardClosure: public CardTableEntryClosure { G1RemSet* _g1rs; DirtyCardQueue* _into_cset_dcq; G1ScanObjsDuringUpdateRSClosure* _update_rs_cl; public: - RefineRecordRefsIntoCSCardTableEntryClosure(G1CollectedHeap* g1h, - DirtyCardQueue* into_cset_dcq, - G1ScanObjsDuringUpdateRSClosure* update_rs_cl) : + G1RefineCardClosure(G1CollectedHeap* g1h, + DirtyCardQueue* into_cset_dcq, + G1ScanObjsDuringUpdateRSClosure* update_rs_cl) : _g1rs(g1h->g1_rem_set()), _into_cset_dcq(into_cset_dcq), _update_rs_cl(update_rs_cl) {} @@ -474,16 +473,16 @@ G1ParScanThreadState* pss, uint worker_i) { G1ScanObjsDuringUpdateRSClosure update_rs_cl(_g1, pss, worker_i); - RefineRecordRefsIntoCSCardTableEntryClosure into_cset_update_rs_cl(_g1, into_cset_dcq, &update_rs_cl); + G1RefineCardClosure refine_card_cl(_g1, into_cset_dcq, &update_rs_cl); G1GCParPhaseTimesTracker x(_g1p->phase_times(), G1GCPhaseTimes::UpdateRS, worker_i); if (G1HotCardCache::default_use_cache()) { // Apply the closure to the entries of the hot card cache. G1GCParPhaseTimesTracker y(_g1p->phase_times(), G1GCPhaseTimes::ScanHCC, worker_i); - _g1->iterate_hcc_closure(&into_cset_update_rs_cl, worker_i); + _g1->iterate_hcc_closure(&refine_card_cl, worker_i); } // Apply the closure to all remaining log entries. 
- _g1->iterate_dirty_card_closure(&into_cset_update_rs_cl, worker_i); + _g1->iterate_dirty_card_closure(&refine_card_cl, worker_i); } void G1RemSet::cleanupHRRS() { From thomas.schatzl at oracle.com Wed Jun 28 12:41:13 2017 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Wed, 28 Jun 2017 14:41:13 +0200 Subject: RFR: 8183122: Rename RefineRecordRefsIntoCSCardTableEntryClosure to G1RefineCardClosure In-Reply-To: <9c02a3e8-fa01-8329-3b01-011971ebcf94@oracle.com> References: <9c02a3e8-fa01-8329-3b01-011971ebcf94@oracle.com> Message-ID: <1498653673.2874.25.camel@oracle.com> On Wed, 2017-06-28 at 14:26 +0200, Erik Helin wrote: > Hi all, > > please see the below inlined patch that just renames? > RefineRecordRefsIntoCSCardTableEntryClosure to more sensible? > G1RefineCardClosure. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8183122 > Testing: make hotspot ? it's a bit hard to read (an attachment would have been better imho), but... looks good :) Thomas > > Thanks, > Erik > > # HG changeset patch > # User ehelin > # Date 1498652248 -7200 > #??????Wed Jun 28 14:17:28 2017 +0200 > # Node ID f6b845d54277ff9232578fee4ba9f80c85aab0ac > # Parent??46d3ce319f37d2996fb0393a4f54f7759148bd1d > 8183122: Rename RefineRecordRefsIntoCSCardTableEntryClosure to? > G1RefineCardClosure > > diff -r 46d3ce319f37 -r f6b845d54277 src/share/vm/gc/g1/g1RemSet.cpp > --- a/src/share/vm/gc/g1/g1RemSet.cpp???Wed Jun 28 12:11:55 2017 > +0200 > +++ b/src/share/vm/gc/g1/g1RemSet.cpp???Wed Jun 28 14:17:28 2017 > +0200 > @@ -438,15 +438,14 @@ > ? // Closure used for updating RSets and recording references that > ? // point into the collection set. Only called during an > ? // evacuation pause. > - > -class RefineRecordRefsIntoCSCardTableEntryClosure: public? > CardTableEntryClosure { > +class G1RefineCardClosure: public CardTableEntryClosure { > ????G1RemSet* _g1rs; > ????DirtyCardQueue* _into_cset_dcq; > ????G1ScanObjsDuringUpdateRSClosure* _update_rs_cl; > ? public: > -??RefineRecordRefsIntoCSCardTableEntryClosure(G1CollectedHeap* g1h, > -??????????????????????????????????????????????DirtyCardQueue*? > into_cset_dcq, > -? > G1ScanObjsDuringUpdateRSClosure* update_rs_cl) : > +??G1RefineCardClosure(G1CollectedHeap* g1h, > +??????????????????????DirtyCardQueue* into_cset_dcq, > +??????????????????????G1ScanObjsDuringUpdateRSClosure* update_rs_cl) > : > ??????_g1rs(g1h->g1_rem_set()), _into_cset_dcq(into_cset_dcq),? > _update_rs_cl(update_rs_cl) > ????{} > > @@ -474,16 +473,16 @@ > ????????????????????????????????G1ParScanThreadState* pss, > ????????????????????????????????uint worker_i) { > ????G1ScanObjsDuringUpdateRSClosure update_rs_cl(_g1, pss, worker_i); > -??RefineRecordRefsIntoCSCardTableEntryClosure? > into_cset_update_rs_cl(_g1, into_cset_dcq, &update_rs_cl); > +??G1RefineCardClosure refine_card_cl(_g1, into_cset_dcq, > &update_rs_cl); > > ????G1GCParPhaseTimesTracker x(_g1p->phase_times(),? > G1GCPhaseTimes::UpdateRS, worker_i); > ????if (G1HotCardCache::default_use_cache()) { > ??????// Apply the closure to the entries of the hot card cache. > ??????G1GCParPhaseTimesTracker y(_g1p->phase_times(),? > G1GCPhaseTimes::ScanHCC, worker_i); > -????_g1->iterate_hcc_closure(&into_cset_update_rs_cl, worker_i); > +????_g1->iterate_hcc_closure(&refine_card_cl, worker_i); > ????} > ????// Apply the closure to all remaining log entries. > -??_g1->iterate_dirty_card_closure(&into_cset_update_rs_cl, > worker_i); > +??_g1->iterate_dirty_card_closure(&refine_card_cl, worker_i); > ? } > > ? 
void G1RemSet::cleanupHRRS() { From erik.helin at oracle.com Wed Jun 28 12:59:35 2017 From: erik.helin at oracle.com (Erik Helin) Date: Wed, 28 Jun 2017 14:59:35 +0200 Subject: RFR: Message-ID: <76d67504-cb9e-c42e-cff9-e12085421bf7@oracle.com> Hi all, this small patch removes the class OopsInHeapRegionClosure. OopsInHeapRegionClosure only contains a protected _from field and the public method set_from, and there are only two other classes inheriting from OopsInHeapRegionClosure (G1ScanClosureBase and UpdareRsetDeferred). This patch gets rid of the class OopsInHeapRegionClosure and adds the corresponding field and method to the classes inheriting from OopsInHeapRegionClosure. Patch: http://cr.openjdk.java.net/~ehelin/8183124/00/ Bug: https://bugs.openjdk.java.net/browse/JDK-8183124 Testing: make jprt Thanks, Erik From erik.helin at oracle.com Wed Jun 28 13:30:53 2017 From: erik.helin at oracle.com (Erik Helin) Date: Wed, 28 Jun 2017 15:30:53 +0200 Subject: RFR: 8183124: Remove OopsInHeapRegionClosure In-Reply-To: <76d67504-cb9e-c42e-cff9-e12085421bf7@oracle.com> References: <76d67504-cb9e-c42e-cff9-e12085421bf7@oracle.com> Message-ID: <156d030c-05ef-4773-9485-a391d09eb0f4@oracle.com> ...and now with subject as well :) Erik On 06/28/2017 02:59 PM, Erik Helin wrote: > Hi all, > > this small patch removes the class OopsInHeapRegionClosure. > OopsInHeapRegionClosure only contains a protected _from field and the > public method set_from, and there are only two other classes inheriting > from OopsInHeapRegionClosure (G1ScanClosureBase and UpdareRsetDeferred). > > This patch gets rid of the class OopsInHeapRegionClosure and adds the > corresponding field and method to the classes inheriting from > OopsInHeapRegionClosure. > > Patch: http://cr.openjdk.java.net/~ehelin/8183124/00/ > > Bug: https://bugs.openjdk.java.net/browse/JDK-8183124 > > Testing: make jprt > > Thanks, > Erik From stefan.johansson at oracle.com Wed Jun 28 13:36:36 2017 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Wed, 28 Jun 2017 15:36:36 +0200 Subject: RFR: 8183124: Remove OopsInHeapRegionClosure In-Reply-To: <156d030c-05ef-4773-9485-a391d09eb0f4@oracle.com> References: <76d67504-cb9e-c42e-cff9-e12085421bf7@oracle.com> <156d030c-05ef-4773-9485-a391d09eb0f4@oracle.com> Message-ID: On 2017-06-28 15:30, Erik Helin wrote: > ...and now with subject as well :) > > Erik > > On 06/28/2017 02:59 PM, Erik Helin wrote: >> Hi all, >> >> this small patch removes the class OopsInHeapRegionClosure. >> OopsInHeapRegionClosure only contains a protected _from field and the >> public method set_from, and there are only two other classes inheriting >> from OopsInHeapRegionClosure (G1ScanClosureBase and UpdareRsetDeferred). >> >> This patch gets rid of the class OopsInHeapRegionClosure and adds the >> corresponding field and method to the classes inheriting from >> OopsInHeapRegionClosure. 
>> >> Patch: http://cr.openjdk.java.net/~ehelin/8183124/00/ >> Looks good, StefanJ >> Bug: https://bugs.openjdk.java.net/browse/JDK-8183124 >> >> Testing: make jprt >> >> Thanks, >> Erik From stefan.johansson at oracle.com Wed Jun 28 13:39:10 2017 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Wed, 28 Jun 2017 15:39:10 +0200 Subject: RFR: Rename RefineRecordRefsIntoCSCardTableEntryClosure to G1RefineCardClosure In-Reply-To: <9c02a3e8-fa01-8329-3b01-011971ebcf94@oracle.com> References: <9c02a3e8-fa01-8329-3b01-011971ebcf94@oracle.com> Message-ID: <784fc363-85e2-fb2a-2fcc-5f62e9c58d5c@oracle.com> Good, StefanJ On 2017-06-28 14:26, Erik Helin wrote: > Hi all, > > please see the below inlined patch that just renames > RefineRecordRefsIntoCSCardTableEntryClosure to more sensible > G1RefineCardClosure. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8183122 > Testing: make hotspot > > Thanks, > Erik > > # HG changeset patch > # User ehelin > # Date 1498652248 -7200 > # Wed Jun 28 14:17:28 2017 +0200 > # Node ID f6b845d54277ff9232578fee4ba9f80c85aab0ac > # Parent 46d3ce319f37d2996fb0393a4f54f7759148bd1d > 8183122: Rename RefineRecordRefsIntoCSCardTableEntryClosure to > G1RefineCardClosure > > diff -r 46d3ce319f37 -r f6b845d54277 src/share/vm/gc/g1/g1RemSet.cpp > --- a/src/share/vm/gc/g1/g1RemSet.cpp Wed Jun 28 12:11:55 2017 +0200 > +++ b/src/share/vm/gc/g1/g1RemSet.cpp Wed Jun 28 14:17:28 2017 +0200 > @@ -438,15 +438,14 @@ > // Closure used for updating RSets and recording references that > // point into the collection set. Only called during an > // evacuation pause. > - > -class RefineRecordRefsIntoCSCardTableEntryClosure: public > CardTableEntryClosure { > +class G1RefineCardClosure: public CardTableEntryClosure { > G1RemSet* _g1rs; > DirtyCardQueue* _into_cset_dcq; > G1ScanObjsDuringUpdateRSClosure* _update_rs_cl; > public: > - RefineRecordRefsIntoCSCardTableEntryClosure(G1CollectedHeap* g1h, > - DirtyCardQueue* > into_cset_dcq, > - G1ScanObjsDuringUpdateRSClosure* update_rs_cl) : > + G1RefineCardClosure(G1CollectedHeap* g1h, > + DirtyCardQueue* into_cset_dcq, > + G1ScanObjsDuringUpdateRSClosure* update_rs_cl) : > _g1rs(g1h->g1_rem_set()), _into_cset_dcq(into_cset_dcq), > _update_rs_cl(update_rs_cl) > {} > > @@ -474,16 +473,16 @@ > G1ParScanThreadState* pss, > uint worker_i) { > G1ScanObjsDuringUpdateRSClosure update_rs_cl(_g1, pss, worker_i); > - RefineRecordRefsIntoCSCardTableEntryClosure > into_cset_update_rs_cl(_g1, into_cset_dcq, &update_rs_cl); > + G1RefineCardClosure refine_card_cl(_g1, into_cset_dcq, &update_rs_cl); > > G1GCParPhaseTimesTracker x(_g1p->phase_times(), > G1GCPhaseTimes::UpdateRS, worker_i); > if (G1HotCardCache::default_use_cache()) { > // Apply the closure to the entries of the hot card cache. > G1GCParPhaseTimesTracker y(_g1p->phase_times(), > G1GCPhaseTimes::ScanHCC, worker_i); > - _g1->iterate_hcc_closure(&into_cset_update_rs_cl, worker_i); > + _g1->iterate_hcc_closure(&refine_card_cl, worker_i); > } > // Apply the closure to all remaining log entries. 
> - _g1->iterate_dirty_card_closure(&into_cset_update_rs_cl, worker_i); > + _g1->iterate_dirty_card_closure(&refine_card_cl, worker_i); > } > > void G1RemSet::cleanupHRRS() { From erik.helin at oracle.com Wed Jun 28 14:19:01 2017 From: erik.helin at oracle.com (Erik Helin) Date: Wed, 28 Jun 2017 16:19:01 +0200 Subject: RFR: 8183122: Rename RefineRecordRefsIntoCSCardTableEntryClosure to G1RefineCardClosure In-Reply-To: <1498653673.2874.25.camel@oracle.com> References: <9c02a3e8-fa01-8329-3b01-011971ebcf94@oracle.com> <1498653673.2874.25.camel@oracle.com> Message-ID: On 06/28/2017 02:41 PM, Thomas Schatzl wrote: > On Wed, 2017-06-28 at 14:26 +0200, Erik Helin wrote: >> Hi all, >> >> please see the below inlined patch that just renames >> RefineRecordRefsIntoCSCardTableEntryClosure to more sensible >> G1RefineCardClosure. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8183122 >> Testing: make hotspot > > it's a bit hard to read (an attachment would have been better imho), > but... looks good :) Yeah, I wasn't sure if this list accepted attachments :/ Anyways, thanks for reviewing! Erik > Thomas > >> >> Thanks, >> Erik >> >> # HG changeset patch >> # User ehelin >> # Date 1498652248 -7200 >> # Wed Jun 28 14:17:28 2017 +0200 >> # Node ID f6b845d54277ff9232578fee4ba9f80c85aab0ac >> # Parent 46d3ce319f37d2996fb0393a4f54f7759148bd1d >> 8183122: Rename RefineRecordRefsIntoCSCardTableEntryClosure to >> G1RefineCardClosure >> >> diff -r 46d3ce319f37 -r f6b845d54277 src/share/vm/gc/g1/g1RemSet.cpp >> --- a/src/share/vm/gc/g1/g1RemSet.cpp Wed Jun 28 12:11:55 2017 >> +0200 >> +++ b/src/share/vm/gc/g1/g1RemSet.cpp Wed Jun 28 14:17:28 2017 >> +0200 >> @@ -438,15 +438,14 @@ >> // Closure used for updating RSets and recording references that >> // point into the collection set. Only called during an >> // evacuation pause. >> - >> -class RefineRecordRefsIntoCSCardTableEntryClosure: public >> CardTableEntryClosure { >> +class G1RefineCardClosure: public CardTableEntryClosure { >> G1RemSet* _g1rs; >> DirtyCardQueue* _into_cset_dcq; >> G1ScanObjsDuringUpdateRSClosure* _update_rs_cl; >> public: >> - RefineRecordRefsIntoCSCardTableEntryClosure(G1CollectedHeap* g1h, >> - DirtyCardQueue* >> into_cset_dcq, >> - >> G1ScanObjsDuringUpdateRSClosure* update_rs_cl) : >> + G1RefineCardClosure(G1CollectedHeap* g1h, >> + DirtyCardQueue* into_cset_dcq, >> + G1ScanObjsDuringUpdateRSClosure* update_rs_cl) >> : >> _g1rs(g1h->g1_rem_set()), _into_cset_dcq(into_cset_dcq), >> _update_rs_cl(update_rs_cl) >> {} >> >> @@ -474,16 +473,16 @@ >> G1ParScanThreadState* pss, >> uint worker_i) { >> G1ScanObjsDuringUpdateRSClosure update_rs_cl(_g1, pss, worker_i); >> - RefineRecordRefsIntoCSCardTableEntryClosure >> into_cset_update_rs_cl(_g1, into_cset_dcq, &update_rs_cl); >> + G1RefineCardClosure refine_card_cl(_g1, into_cset_dcq, >> &update_rs_cl); >> >> G1GCParPhaseTimesTracker x(_g1p->phase_times(), >> G1GCPhaseTimes::UpdateRS, worker_i); >> if (G1HotCardCache::default_use_cache()) { >> // Apply the closure to the entries of the hot card cache. >> G1GCParPhaseTimesTracker y(_g1p->phase_times(), >> G1GCPhaseTimes::ScanHCC, worker_i); >> - _g1->iterate_hcc_closure(&into_cset_update_rs_cl, worker_i); >> + _g1->iterate_hcc_closure(&refine_card_cl, worker_i); >> } >> // Apply the closure to all remaining log entries. 
>> - _g1->iterate_dirty_card_closure(&into_cset_update_rs_cl, >> worker_i); >> + _g1->iterate_dirty_card_closure(&refine_card_cl, worker_i); >> } >> >> void G1RemSet::cleanupHRRS() { From erik.helin at oracle.com Wed Jun 28 14:19:26 2017 From: erik.helin at oracle.com (Erik Helin) Date: Wed, 28 Jun 2017 16:19:26 +0200 Subject: RFR: Rename RefineRecordRefsIntoCSCardTableEntryClosure to G1RefineCardClosure In-Reply-To: <784fc363-85e2-fb2a-2fcc-5f62e9c58d5c@oracle.com> References: <9c02a3e8-fa01-8329-3b01-011971ebcf94@oracle.com> <784fc363-85e2-fb2a-2fcc-5f62e9c58d5c@oracle.com> Message-ID: <13cf23c3-ffb9-a507-62aa-901c1483fa77@oracle.com> On 06/28/2017 03:39 PM, Stefan Johansson wrote: > Good, > StefanJ Thanks Stefan! Erik > On 2017-06-28 14:26, Erik Helin wrote: >> Hi all, >> >> please see the below inlined patch that just renames >> RefineRecordRefsIntoCSCardTableEntryClosure to more sensible >> G1RefineCardClosure. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8183122 >> Testing: make hotspot >> >> Thanks, >> Erik >> >> # HG changeset patch >> # User ehelin >> # Date 1498652248 -7200 >> # Wed Jun 28 14:17:28 2017 +0200 >> # Node ID f6b845d54277ff9232578fee4ba9f80c85aab0ac >> # Parent 46d3ce319f37d2996fb0393a4f54f7759148bd1d >> 8183122: Rename RefineRecordRefsIntoCSCardTableEntryClosure to >> G1RefineCardClosure >> >> diff -r 46d3ce319f37 -r f6b845d54277 src/share/vm/gc/g1/g1RemSet.cpp >> --- a/src/share/vm/gc/g1/g1RemSet.cpp Wed Jun 28 12:11:55 2017 +0200 >> +++ b/src/share/vm/gc/g1/g1RemSet.cpp Wed Jun 28 14:17:28 2017 +0200 >> @@ -438,15 +438,14 @@ >> // Closure used for updating RSets and recording references that >> // point into the collection set. Only called during an >> // evacuation pause. >> - >> -class RefineRecordRefsIntoCSCardTableEntryClosure: public >> CardTableEntryClosure { >> +class G1RefineCardClosure: public CardTableEntryClosure { >> G1RemSet* _g1rs; >> DirtyCardQueue* _into_cset_dcq; >> G1ScanObjsDuringUpdateRSClosure* _update_rs_cl; >> public: >> - RefineRecordRefsIntoCSCardTableEntryClosure(G1CollectedHeap* g1h, >> - DirtyCardQueue* >> into_cset_dcq, >> - G1ScanObjsDuringUpdateRSClosure* update_rs_cl) : >> + G1RefineCardClosure(G1CollectedHeap* g1h, >> + DirtyCardQueue* into_cset_dcq, >> + G1ScanObjsDuringUpdateRSClosure* update_rs_cl) : >> _g1rs(g1h->g1_rem_set()), _into_cset_dcq(into_cset_dcq), >> _update_rs_cl(update_rs_cl) >> {} >> >> @@ -474,16 +473,16 @@ >> G1ParScanThreadState* pss, >> uint worker_i) { >> G1ScanObjsDuringUpdateRSClosure update_rs_cl(_g1, pss, worker_i); >> - RefineRecordRefsIntoCSCardTableEntryClosure >> into_cset_update_rs_cl(_g1, into_cset_dcq, &update_rs_cl); >> + G1RefineCardClosure refine_card_cl(_g1, into_cset_dcq, &update_rs_cl); >> >> G1GCParPhaseTimesTracker x(_g1p->phase_times(), >> G1GCPhaseTimes::UpdateRS, worker_i); >> if (G1HotCardCache::default_use_cache()) { >> // Apply the closure to the entries of the hot card cache. >> G1GCParPhaseTimesTracker y(_g1p->phase_times(), >> G1GCPhaseTimes::ScanHCC, worker_i); >> - _g1->iterate_hcc_closure(&into_cset_update_rs_cl, worker_i); >> + _g1->iterate_hcc_closure(&refine_card_cl, worker_i); >> } >> // Apply the closure to all remaining log entries. 
>> - _g1->iterate_dirty_card_closure(&into_cset_update_rs_cl, worker_i); >> + _g1->iterate_dirty_card_closure(&refine_card_cl, worker_i); >> } >> >> void G1RemSet::cleanupHRRS() { > From erik.helin at oracle.com Wed Jun 28 14:20:58 2017 From: erik.helin at oracle.com (Erik Helin) Date: Wed, 28 Jun 2017 16:20:58 +0200 Subject: RFR: 8183124: Remove OopsInHeapRegionClosure In-Reply-To: References: <76d67504-cb9e-c42e-cff9-e12085421bf7@oracle.com> <156d030c-05ef-4773-9485-a391d09eb0f4@oracle.com> Message-ID: On 06/28/2017 03:36 PM, Stefan Johansson wrote: > > > On 2017-06-28 15:30, Erik Helin wrote: >> ...and now with subject as well :) >> >> Erik >> >> On 06/28/2017 02:59 PM, Erik Helin wrote: >>> Hi all, >>> >>> this small patch removes the class OopsInHeapRegionClosure. >>> OopsInHeapRegionClosure only contains a protected _from field and the >>> public method set_from, and there are only two other classes inheriting >>> from OopsInHeapRegionClosure (G1ScanClosureBase and UpdareRsetDeferred). >>> >>> This patch gets rid of the class OopsInHeapRegionClosure and adds the >>> corresponding field and method to the classes inheriting from >>> OopsInHeapRegionClosure. >>> >>> Patch: http://cr.openjdk.java.net/~ehelin/8183124/00/ >>> > Looks good, Thanks Stefan, appreciate the quick review! Erik > StefanJ >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8183124 >>> >>> Testing: make jprt >>> >>> Thanks, >>> Erik > From thomas.schatzl at oracle.com Wed Jun 28 14:53:59 2017 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Wed, 28 Jun 2017 16:53:59 +0200 Subject: RFR: 8183124: Remove OopsInHeapRegionClosure In-Reply-To: <156d030c-05ef-4773-9485-a391d09eb0f4@oracle.com> References: <76d67504-cb9e-c42e-cff9-e12085421bf7@oracle.com> <156d030c-05ef-4773-9485-a391d09eb0f4@oracle.com> Message-ID: <1498661639.2755.10.camel@oracle.com> Hi, On Wed, 2017-06-28 at 15:30 +0200, Erik Helin wrote: > ...and now with subject as well :) > > Erik > > On 06/28/2017 02:59 PM, Erik Helin wrote: > > > > Hi all, > > > > this small patch removes the class OopsInHeapRegionClosure. > > OopsInHeapRegionClosure only contains a protected _from field and > > the > > public method set_from, and there are only two other classes > > inheriting > > from OopsInHeapRegionClosure (G1ScanClosureBase and > > UpdareRsetDeferred). > > > > This patch gets rid of the class OopsInHeapRegionClosure and adds > > the > > corresponding field and method to the classes inheriting from > > OopsInHeapRegionClosure. > > > > Patch: http://cr.openjdk.java.net/~ehelin/8183124/00/ > > > > Bug: https://bugs.openjdk.java.net/browse/JDK-8183124 > > > > Testing: make jprt from my POV there are two reasons here: - the additional class only for that field adds more overhead in various aspects than just duplicating the member. - using _from in UpdateRSDeferred is due to current way of checking for cross-region pointers, using the _from value. It kind of saves us from recreating it. However there are better options here that will fix both JDK-8183127 and remove the need for the _from pointer completely. So overall, this change seems good to me. Thanks, ? 
Thomas From erik.helin at oracle.com Wed Jun 28 15:04:50 2017 From: erik.helin at oracle.com (Erik Helin) Date: Wed, 28 Jun 2017 17:04:50 +0200 Subject: RFR: 8183124: Remove OopsInHeapRegionClosure In-Reply-To: <1498661639.2755.10.camel@oracle.com> References: <76d67504-cb9e-c42e-cff9-e12085421bf7@oracle.com> <156d030c-05ef-4773-9485-a391d09eb0f4@oracle.com> <1498661639.2755.10.camel@oracle.com> Message-ID: <69a3e0e0-30a5-77ad-d9c2-dd2c6be95859@oracle.com> On 06/28/2017 04:53 PM, Thomas Schatzl wrote: > Hi, > > On Wed, 2017-06-28 at 15:30 +0200, Erik Helin wrote: >> ...and now with subject as well :) >> >> Erik >> >> On 06/28/2017 02:59 PM, Erik Helin wrote: >>> >>> Hi all, >>> >>> this small patch removes the class OopsInHeapRegionClosure. >>> OopsInHeapRegionClosure only contains a protected _from field and >>> the >>> public method set_from, and there are only two other classes >>> inheriting >>> from OopsInHeapRegionClosure (G1ScanClosureBase and >>> UpdareRsetDeferred). >>> >>> This patch gets rid of the class OopsInHeapRegionClosure and adds >>> the >>> corresponding field and method to the classes inheriting from >>> OopsInHeapRegionClosure. >>> >>> Patch: http://cr.openjdk.java.net/~ehelin/8183124/00/ >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8183124 >>> >>> Testing: make jprt > > from my POV there are two reasons here: > > - the additional class only for that field adds more overhead in > various aspects than just duplicating the member. Yeah, thanks for clarifying. My motivation for this patch was mainly to get rid of the awkward inheritance. Having OopsInHeapRegionClosure is kind of like we would have a ClosureWithG1FieldClosure with a G1CollectedHeap* _g1h field, because many G1 closures has a _g1h field. This sort of code de-duplication is IMO worse than just having the field in multiple closures. > - using _from in UpdateRSDeferred is due to current way of checking for > cross-region pointers, using the _from value. It kind of saves us from > recreating it. However there are better options here that will fix both > JDK-8183127 and remove the need for the _from pointer completely. This is a great idea, we should use HeapRegion::is_in_same_region (this will also make the code more similar to G1ParScanThreadState::update_rs). Thanks, Erik > So overall, this change seems good to me. > > Thanks, > Thomas > From email.sundarms at gmail.com Wed Jun 28 18:54:36 2017 From: email.sundarms at gmail.com (Sundara Mohan M) Date: Wed, 28 Jun 2017 11:54:36 -0700 Subject: Why is G1GC collection usage threshold not updated early? Message-ID: I am trying to estimate the free memory using metrics from MemoryPoolMxBean.getCollectionUsage(). 
I am observing the following behavior with G1GC:

iteration= 0 - G1 Old Gen: u= 0% cu= 0% uth=75% cuth=75%
iteration= 100 - G1 Old Gen: u= 0% cu= 0% uth=75% cuth=75%
iteration= 200 - G1 Old Gen: u= 0% cu= 0% uth=75% cuth=75%
iteration= 300 - G1 Old Gen: u= 0% cu= 0% uth=75% cuth=75%
iteration= 400 - G1 Old Gen: u= 0% cu= 0% uth=75% cuth=75%
iteration= 500 - G1 Old Gen: u= 1% cu= 0% uth=75% cuth=75%
iteration= 600 - G1 Old Gen: u= 4% cu= 0% uth=75% cuth=75%
iteration= 700 - G1 Old Gen: u= 9% cu= 0% uth=75% cuth=75%
iteration= 800 - G1 Old Gen: u= 16% cu= 0% uth=75% cuth=75%
iteration= 900 - G1 Old Gen: u= 25% cu= 0% uth=75% cuth=75%
iteration= 1000 - G1 Old Gen: u= 34% cu= 0% uth=75% cuth=75%
iteration= 1100 - G1 Old Gen: u= 45% cu= 0% uth=75% cuth=75%
iteration= 1200 - G1 Old Gen: u= 38% cu= 0% uth=75% cuth=75%
iteration= 1300 - G1 Old Gen: u= 46% cu= 0% uth=75% cuth=75%
iteration= 1400 - G1 Old Gen: u= 52% cu= 0% uth=75% cuth=75%
iteration= 1500 - G1 Old Gen: u= 45% cu= 0% uth=75% cuth=75%
iteration= 1600 - G1 Old Gen: u= 67% cu= 0% uth=75% cuth=75%
iteration= 1700 - G1 Old Gen: u= 56% cu= 0% uth=75% cuth=75%
iteration= 1800 - G1 Old Gen: u= 55% cu= 0% uth=75% cuth=75%
iteration= 1900 - G1 Old Gen: u= 61% cu= 0% uth=75% cuth=75%
iteration= 2000 - G1 Old Gen: u= 56% cu= 0% uth=75% cuth=75%
iteration= 2100 - G1 Old Gen: u= 76% cu= 0% uth=75% cuth=75%
iteration= 2200 - G1 Old Gen: u= 65% cu= 0% uth=75% cuth=75%
iteration= 2300 - G1 Old Gen: u= 62% cu= 0% uth=75% cuth=75%
iteration= 2400 - G1 Old Gen: u= 75% cu= 0% uth=75% cuth=75%
iteration= 2500 - G1 Old Gen: u= 75% cu= 0% uth=75% cuth=75%
iteration= 2600 - G1 Old Gen: u= 72% cu= 0% uth=75% cuth=75%
iteration= 2700 - G1 Old Gen: u= 69% cu= 0% uth=75% cuth=75%
iteration= 2800 - G1 Old Gen: u= 74% cu= 0% uth=75% cuth=75%
iteration= 2900 - G1 Old Gen: u= 80% cu= 0% uth=75% cuth=75%
iteration= 3000 - G1 Old Gen: u= 83% cu= 0% uth=75% cuth=75%
*iteration= 3100 - G1 Old Gen: u= 89% cu= 0% uth=75% cuth=75%*
*iteration= 3200 - G1 Old Gen: u= 71% cu= 59% uth=75% cuth=75%*
iteration= 3300 - G1 Old Gen: u= 90% cu= 59% uth=75% cuth=75%
iteration= 3400 - G1 Old Gen: u= 76% cu= 62% uth=75% cuth=75%
iteration= 3500 - G1 Old Gen: u= 65% cu= 65% uth=75% cuth=75%

With CMS GC:

iteration= 0 - CMS Old Gen: u= 0% cu= 0% uth=75% cuth=75%
iteration= 100 - CMS Old Gen: u= 0% cu= 0% uth=75% cuth=75%
iteration= 200 - CMS Old Gen: u= 1% cu= 0% uth=75% cuth=75%
iteration= 300 - CMS Old Gen: u= 3% cu= 0% uth=75% cuth=75%
iteration= 400 - CMS Old Gen: u= 12% cu= 0% uth=75% cuth=75%
iteration= 500 - CMS Old Gen: u= 19% cu= 0% uth=75% cuth=75%
iteration= 600 - CMS Old Gen: u= 34% cu= 0% uth=75% cuth=75%
iteration= 700 - CMS Old Gen: u= 43% cu= 0% uth=75% cuth=75%
*iteration= 800 - CMS Old Gen: u= 63% cu= 0% uth=75% cuth=75%*
*iteration= 900 - CMS Old Gen: u= 48% cu= 37% uth=75% cuth=75%*
*iteration= 1000 - CMS Old Gen: u= 60% cu= 37% uth=75% cuth=75%*
iteration= 1100 - CMS Old Gen: u= 58% cu= 45% uth=75% cuth=75%
iteration= 1200 - CMS Old Gen: u= 71% cu= 45% uth=75% cuth=75%
iteration= 1300 - CMS Old Gen: u= 66% cu= 53% uth=75% cuth=75%
iteration= 1400 - CMS Old Gen: u= 80% cu= 53% uth=75% cuth=75%

u = usage (getUsage), cu = collectionUsage (getCollectionUsage),
uth = usage threshold %, cuth = collection usage threshold %

My program just keeps allocating strings and freeing some of them.

1. Why doesn't G1GC update its collection usage until 59%, whereas in
CMS GC it is already updated at 37%? Can someone shed more light on this?
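An aside for readers with the same question: besides polling, the platform MemoryMXBean can push a notification whenever a collection leaves a pool above an armed collection-usage threshold. A hedged sketch — the class name is illustrative, and the 75% arming merely mirrors the cuth above:

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryNotificationInfo;
import java.lang.management.MemoryPoolMXBean;
import javax.management.Notification;
import javax.management.NotificationEmitter;
import javax.management.openmbean.CompositeData;

public class CollectionThresholdWatch {
    public static void main(String[] args) throws InterruptedException {
        // Arm a 75% collection-usage threshold on every pool that both
        // supports one and reports a defined max (G1 eden/survivor do not).
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            long max = pool.getUsage().getMax();
            if (pool.isCollectionUsageThresholdSupported() && max > 0) {
                pool.setCollectionUsageThreshold(max * 3 / 4);
            }
        }
        // MemoryMXBean is a NotificationEmitter; it pushes an event when a
        // GC finishes with a pool still above its armed threshold.
        NotificationEmitter emitter =
                (NotificationEmitter) ManagementFactory.getMemoryMXBean();
        emitter.addNotificationListener((Notification n, Object handback) -> {
            if (MemoryNotificationInfo.MEMORY_COLLECTION_THRESHOLD_EXCEEDED
                    .equals(n.getType())) {
                MemoryNotificationInfo info =
                        MemoryNotificationInfo.from((CompositeData) n.getUserData());
                System.out.println(info.getPoolName() + " after GC: "
                        + info.getUsage());
            }
        }, null, null);
        Thread.sleep(60_000); // keep the JVM alive while allocating elsewhere
    }
}

Both getCollectionUsage() and this notification are refreshed only when the pool's own collector runs, which is presumably part of why cu stays at 0% for G1 Old Gen until a collection of that pool has been reported — though the thread itself leaves the exact trigger open.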
Thanks,
Sundar

From robbin.ehn at oracle.com Wed Jun 28 18:55:57 2017
From: robbin.ehn at oracle.com (Robbin Ehn)
Date: Wed, 28 Jun 2017 20:55:57 +0200
Subject: RFR: Parallelize safepoint cleanup
In-Reply-To: <61d80e98-275f-b2b8-4ac7-6d5d03b047de@redhat.com>
References: <486b5a72-bef8-4ebc-2729-3fe3aa3ab3b9@oracle.com> <5c80f8df-27c9-f9a9-dc6d-47f9c6019a61@redhat.com> <46ad874e-eb41-7927-265a-40dea92dfe1e@oracle.com> <5711258b-99b0-e06f-ba6e-0b6b55d88345@redhat.com> <0e1e2779-9316-b756-6cc8-e0c8add14a94@oracle.com> <1910961c-11bd-0e86-dd03-4fce66b9969f@redhat.com> <2b466176-b688-53a8-bef9-c7ec2c8c745b@oracle.com> <42872a15-d26c-9798-c6a2-f3f7c945baf7@redhat.com> <5e7c7d00-4acd-bea3-3525-33dbd9159efb@oracle.com> <6f2c6de7-298b-bf14-ab1f-430c4acd43c9@redhat.com> <5cd676de-872d-6d4a-691b-da561173f7d0@oracle.com> <61d80e98-275f-b2b8-4ac7-6d5d03b047de@redhat.com>
Message-ID:

Hi Roman

On 06/27/2017 09:47 PM, Roman Kennke wrote:
> Hi Robbin,
>
> Ugh. Thanks for catching this.
> Problem was that I was accounting the thread-local deflations twice:
> once in thread-local processing (basically a leftover from my earlier
> attempt to implement this accounting) and then again in
> finish_deflate_idle_monitors(). Should be fixed here:
>
> http://cr.openjdk.java.net/~rkennke/8180932/webrev.09/
>

Nit:
safepoint.cpp : ParallelSPCleanupTask
"const char* name = " is not needed and 1 is unused

>
> Side question: which jtreg targets do you usually run?

Right now I cherry-pick directories from: hotspot/test/

I'm going to add a decent test group for local testing.

>
> Trying: make test TEST=hotspot_all
> gives me *lots* of failures due to missing jcstress stuff (?!)
> And even other subsets seem to depend on several bits and pieces that I
> have no idea about.

Yes, you need to use the internal tool 'jib' (Java Integrated Build) to get that to work, or you can set some environment where the jcstress application stuff is...

I have a regression on ClassLoaderData root scanning, this should not be related, but I only have 3 patches which could cause this, if it's not something in the environment that has changed.

Also I do not see any immediate performance gains (off vs 4 threads), it might be http://hg.openjdk.java.net/jdk10/hs/hotspot/rev/06994badeb24 , but I need to do some more testing. I know you often run with non-default GSI.

I'll get back to you.

Thanks, Robbin

>
> Roman
>
> Am 27.06.2017 um 16:51 schrieb Robbin Ehn:
>> Hi Roman,
>>
>> There is something wrong in calculations:
>> INFO: Deflate: InCirc=43 InUse=18 Scavenged=25 ForceMonitorScavenge=0
>> : pop=27051 free=215487
>>
>> free is larger than population, have not had the time to dig into this.
>>
>> Thanks, Robbin
>>
>> On 06/22/2017 10:19 PM, Roman Kennke wrote:
>>> So here's the latest iteration of that patch:
>>>
>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.08/
>>>
>>>
>>> I checked and fixed all the counters. The problem here is that they are
>>> not updated in a single place (deflate_idle_monitors() ) but in several
>>> places, potentially by multiple threads. I split up deflation into
>>> prepare_.. and a finish_.. methods to initialize local and update global
>>> counters respectively, and pass around a counters object (allocated on
>>> stack) to the various code paths that use it. Updating the counters
>>> always happen under a lock, there's no need to do anything special with
>>> regards to concurrency.
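As a minimal sketch of the counters scheme described in that last quoted paragraph (plain C++ with illustrative names; the actual types and counters in the webrev differ):

#include <mutex>

// Stack-allocated by the caller of the deflation code; each code path
// only ever updates its own local instance.
struct DeflateMonitorCounters {
  int n_in_circulation = 0;  // monitors walked
  int n_in_use = 0;          // monitors found in use
  int n_scavenged = 0;       // monitors deflated
};

static std::mutex list_lock;  // stands in for the global monitor list lock
static int g_free_count = 0;  // global counter, updated in one place only

// Per-path worker: deflates one in-use list, touches only local counters.
void deflate_monitor_list(DeflateMonitorCounters* counters) {
  // ... walk the list and deflate idle monitors, then e.g.:
  counters->n_scavenged += 1;  // placeholder for the real count
}

// Called once after all paths are done: the single serialized update.
void finish_deflate_idle_monitors(const DeflateMonitorCounters* counters) {
  std::lock_guard<std::mutex> guard(list_lock);
  g_free_count += counters->n_scavenged;
}

However many threads run the per-path function concurrently, the global counters are only ever touched in the finish step, under the lock.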
>>> >>> I also checked the nmethod marking, but there doesn't seem to be >>> anything in that code that looks problematic under concurrency. The >>> worst that can happen is that two threads write the same value into an >>> nmethod field. I think we can live with that ;-) >>> >>> Good to go? >>> >>> Tested by running specjvm and jcstress fastdebug+release without issues. >>> >>> Roman >>> >>> Am 02.06.2017 um 12:39 schrieb Robbin Ehn: >>>> Hi Roman, >>>> >>>> On 06/02/2017 11:41 AM, Roman Kennke wrote: >>>>> Hi David, >>>>> thanks for reviewing. I'll be on vacation the next two weeks too, with >>>>> only sporadic access to work stuff. >>>>> Yes, exposure will not be as good as otherwise, but it's not totally >>>>> untested either: the serial code path is the same as the parallel, the >>>>> only difference is that it's not actually called by multiple threads. >>>>> It's ok I think. >>>>> >>>>> I found two more issues that I think should be addressed: >>>>> - There are some counters in deflate_idle_monitors() and I'm not >>>>> sure I >>>>> correctly handle them in the split-up and MT'ed thread-local/ global >>>>> list deflation >>>>> - nmethod marking seems to unconditionally poke true or something like >>>>> that in nmethod fields. This doesn't hurt correctness-wise, but it's >>>>> probably worth checking if it's already true, especially when doing >>>>> this >>>>> with multiple threads concurrently. >>>>> >>>>> I'll send an updated patch around later, I hope I can get to it >>>>> today... >>>> >>>> I'll review that when you get it out. >>>> I think this looks as a reasonable step before we tackle this with a >>>> major effort, such as the JEP you and Carsten doing. >>>> And another effort to 'fix' nmethods marking. >>>> >>>> Internal discussion yesterday lead us to conclude that the runtime >>>> will probably need more threads. >>>> This would be a good driver to do a 'global' worker pool which serves >>>> both gc, runtime and safepoints with threads. >>>> >>>>> >>>>> Roman >>>>> >>>>>> Hi Roman, >>>>>> >>>>>> I am about to disappear on an extended vacation so will let others >>>>>> pursue this. IIUC this is longer an opt-in by the user at runtime, >>>>>> but >>>>>> an opt-in by the particular GC developers. Okay. My only concern with >>>>>> that is if Shenandoah is the only GC that currently opts in then this >>>>>> code is not going to get much testing and will be more prone to >>>>>> incidental breakage. >>>> >>>> As I mentioned before, it seem like Erik ? have some idea, maybe he >>>> can do this after his barrier patch. >>>> >>>> Thanks! >>>> >>>> /Robbin >>>> >>>>>> >>>>>> Cheers, >>>>>> David >>>>>> >>>>>> On 2/06/2017 2:21 AM, Roman Kennke wrote: >>>>>>> Am 01.06.2017 um 17:50 schrieb Roman Kennke: >>>>>>>> Am 01.06.2017 um 14:18 schrieb Robbin Ehn: >>>>>>>>> Hi Roman, >>>>>>>>> >>>>>>>>> On 06/01/2017 11:29 AM, Roman Kennke wrote: >>>>>>>>>> Am 31.05.2017 um 22:06 schrieb Robbin Ehn: >>>>>>>>>>> Hi Roman, I agree that is really needed but: >>>>>>>>>>> >>>>>>>>>>> On 05/31/2017 10:27 AM, Roman Kennke wrote: >>>>>>>>>>>> I realized that sharing workers with GC is not so easy. >>>>>>>>>>>> >>>>>>>>>>>> We need to be able to use the workers at a safepoint during >>>>>>>>>>>> concurrent >>>>>>>>>>>> GC work (which also uses the same workers). This does not only >>>>>>>>>>>> require >>>>>>>>>>>> that those workers be suspended, like e.g. >>>>>>>>>>>> SuspendibleThreadSet::yield(), but they need to be idle, i.e. >>>>>>>>>>>> have >>>>>>>>>>>> finished their tasks. 
This needs some careful handling to work >>>>>>>>>>>> without >>>>>>>>>>>> races: it requires a SuspendibleThreadSetJoiner around the >>>>>>>>>>>> corresponding >>>>>>>>>>>> run_task() call and also the tasks themselves need to join the >>>>>>>>>>>> STS and >>>>>>>>>>>> handle requests for safepoints not by yielding, but by leaving >>>>>>>>>>>> the >>>>>>>>>>>> task. >>>>>>>>>>>> This is far too peculiar for me to make the call to hook up GC >>>>>>>>>>>> workers >>>>>>>>>>>> for safepoint cleanup, and I thus removed those parts. I >>>>>>>>>>>> left the >>>>>>>>>>>> API in >>>>>>>>>>>> CollectedHeap in place. I think GC devs who know better >>>>>>>>>>>> about G1 >>>>>>>>>>>> and CMS >>>>>>>>>>>> should make that call, or else just use a separate thread pool. >>>>>>>>>>>> >>>>>>>>>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.05/ >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Is it ok now? >>>>>>>>>>> I still think you should put the "Parallel Safepoint Cleanup" >>>>>>>>>>> workers >>>>>>>>>>> inside Shenandoah, >>>>>>>>>>> so the SafepointSynchronizer only calls get_safepoint_workers, >>>>>>>>>>> e.g.: >>>>>>>>>>> >>>>>>>>>>> _cleanup_workers = heap->get_safepoint_workers(); >>>>>>>>>>> _num_cleanup_workers = _cleanup_workers != NULL ? >>>>>>>>>>> _cleanup_workers->total_workers() : 1; >>>>>>>>>>> ParallelSPCleanupTask cleanup(_cleanup_subtasks); >>>>>>>>>>> StrongRootsScope srs(_num_cleanup_workers); >>>>>>>>>>> if (_cleanup_workers != NULL) { >>>>>>>>>>> _cleanup_workers->run_task(&cleanup, >>>>>>>>>>> _num_cleanup_workers); >>>>>>>>>>> } else { >>>>>>>>>>> cleanup.work(0); >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> That way you don't even need your new flags, but it will be >>>>>>>>>>> up to >>>>>>>>>>> the >>>>>>>>>>> other GCs to make their worker available >>>>>>>>>>> or cheat with a separate workgang. >>>>>>>>>> I can do that, I don't mind. The question is, do we want that? >>>>>>>>> The problem is that we do not want to haste such decision, we >>>>>>>>> believe >>>>>>>>> there is a better solution. >>>>>>>>> I think you also would want another solution. >>>>>>>>> But it's seems like such solution with 1 'global' thread pool >>>>>>>>> either >>>>>>>>> own by GC or the VM it self is quite the undertaking. >>>>>>>>> Since this probably will not be done any time soon my >>>>>>>>> suggestion is, >>>>>>>>> to not hold you back (we also want this), just to make >>>>>>>>> the code parallel and as an intermediate step ask the GC if it >>>>>>>>> minds >>>>>>>>> sharing it's thread. >>>>>>>>> >>>>>>>>> Now when Shenandoah is merged it's possible that e.g. G1 will >>>>>>>>> share >>>>>>>>> the code for a separate thread pool, do something of it's own or >>>>>>>>> wait until the bigger question about thread pool(s) have been >>>>>>>>> resolved. >>>>>>>>> >>>>>>>>> By adding a thread pool directly to the SafepointSynchronizer and >>>>>>>>> flags for it we might limit our future options. >>>>>>>>> >>>>>>>>>> I wouldn't call it 'cheating with a separate workgang' though. I >>>>>>>>>> see >>>>>>>>>> that both G1 and CMS suspend their worker threads at a safepoint. >>>>>>>>>> However: >>>>>>>>> Yes it's not cheating but I want decent heuristics between e.g. >>>>>>>>> number >>>>>>>>> of concurrent marking threads and parallel safepoint threads since >>>>>>>>> they compete for cpu time. >>>>>>>>> As the code looks now, I think that decisions must be made by the >>>>>>>>> GC. >>>>>>>> Ok, I see your point. 
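As an aside, the yield-versus-idle constraint in the quoted paragraphs comes down to the following sketch (written against HotSpot's SuspendibleThreadSet API; has_more_work() and do_unit_of_work() are made-up helpers):

void concurrent_gc_task_loop() {
  SuspendibleThreadSetJoiner sts_join;
  while (has_more_work()) {
    if (sts_join.should_yield()) {
      // The usual GC pattern: block here until the safepoint is over.
      // The worker is suspended, but it still owns this task, so the
      // work gang is NOT reusable for safepoint cleanup at this point.
      sts_join.yield();
      // What sharing would require instead: return from the task
      // entirely, leaving the gang idle so it can run the cleanup task.
    }
    do_unit_of_work();
  }
}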
I updated the proposed patch accordingly: >>>>>>>> >>>>>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.06/ >>>>>>>> >>>>>>> Oops. Minor mistake there. Correction: >>>>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.07/ >>>>>>> >>>>>>> >>>>>>> (Removed 'class WorkGang' from safepoint.hpp, and forgot to add it >>>>>>> into >>>>>>> collectedHeap.hpp, resulting in build failure...) >>>>>>> >>>>>>> Roman >>>>>>> >>>>> >>> > From ecki at zusammenkunft.net Wed Jun 28 20:21:38 2017 From: ecki at zusammenkunft.net (Bernd Eckenfels) Date: Wed, 28 Jun 2017 20:21:38 +0000 Subject: Why is G1GC collection usage threshold not updated early? In-Reply-To: References: Message-ID: I guess G1 started much later with Mixed Collections compared to CMS. And when no GC happens the cu is not 0. you should maybe log the collection count as well. There is BTW A GC User mailinglist as well. Gruss Bernd -- http://bernd.eckenfels.net ________________________________ From: hotspot-gc-dev on behalf of Sundara Mohan M Sent: Wednesday, June 28, 2017 8:54:36 PM To: hotspot-gc-dev at openjdk.java.net Subject: Why is G1GC collection usage threshold not updated early? I am trying to estimate the free memory using metrics from MemoryPoolMxBean.getCollectionUsage(). I am observing following behavior with G1GC iteration= 0 - G1 Old Gen: u= 0% cu= 0% uth=75% cuth=75% iteration= 100 - G1 Old Gen: u= 0% cu= 0% uth=75% cuth=75% iteration= 200 - G1 Old Gen: u= 0% cu= 0% uth=75% cuth=75% iteration= 300 - G1 Old Gen: u= 0% cu= 0% uth=75% cuth=75% iteration= 400 - G1 Old Gen: u= 0% cu= 0% uth=75% cuth=75% iteration= 500 - G1 Old Gen: u= 1% cu= 0% uth=75% cuth=75% iteration= 600 - G1 Old Gen: u= 4% cu= 0% uth=75% cuth=75% iteration= 700 - G1 Old Gen: u= 9% cu= 0% uth=75% cuth=75% iteration= 800 - G1 Old Gen: u= 16% cu= 0% uth=75% cuth=75% iteration= 900 - G1 Old Gen: u= 25% cu= 0% uth=75% cuth=75% iteration= 1000 - G1 Old Gen: u= 34% cu= 0% uth=75% cuth=75% iteration= 1100 - G1 Old Gen: u= 45% cu= 0% uth=75% cuth=75% iteration= 1200 - G1 Old Gen: u= 38% cu= 0% uth=75% cuth=75% iteration= 1300 - G1 Old Gen: u= 46% cu= 0% uth=75% cuth=75% iteration= 1400 - G1 Old Gen: u= 52% cu= 0% uth=75% cuth=75% iteration= 1500 - G1 Old Gen: u= 45% cu= 0% uth=75% cuth=75% iteration= 1600 - G1 Old Gen: u= 67% cu= 0% uth=75% cuth=75% iteration= 1700 - G1 Old Gen: u= 56% cu= 0% uth=75% cuth=75% iteration= 1800 - G1 Old Gen: u= 55% cu= 0% uth=75% cuth=75% iteration= 1900 - G1 Old Gen: u= 61% cu= 0% uth=75% cuth=75% iteration= 2000 - G1 Old Gen: u= 56% cu= 0% uth=75% cuth=75% iteration= 2100 - G1 Old Gen: u= 76% cu= 0% uth=75% cuth=75% iteration= 2200 - G1 Old Gen: u= 65% cu= 0% uth=75% cuth=75% iteration= 2300 - G1 Old Gen: u= 62% cu= 0% uth=75% cuth=75% iteration= 2400 - G1 Old Gen: u= 75% cu= 0% uth=75% cuth=75% iteration= 2500 - G1 Old Gen: u= 75% cu= 0% uth=75% cuth=75% iteration= 2600 - G1 Old Gen: u= 72% cu= 0% uth=75% cuth=75% iteration= 2700 - G1 Old Gen: u= 69% cu= 0% uth=75% cuth=75% iteration= 2800 - G1 Old Gen: u= 74% cu= 0% uth=75% cuth=75% iteration= 2900 - G1 Old Gen: u= 80% cu= 0% uth=75% cuth=75% iteration= 3000 - G1 Old Gen: u= 83% cu= 0% uth=75% cuth=75% iteration= 3100 - G1 Old Gen: u= 89% cu= 0% uth=75% cuth=75% iteration= 3200 - G1 Old Gen: u= 71% cu= 59% uth=75% cuth=75% iteration= 3300 - G1 Old Gen: u= 90% cu= 59% uth=75% cuth=75% iteration= 3400 - G1 Old Gen: u= 76% cu= 62% uth=75% cuth=75% iteration= 3500 - G1 Old Gen: u= 65% cu= 65% uth=75% cuth=75% CMS GC iteration= 0 - CMS Old Gen: u= 0% cu= 0% uth=75% cuth=75% 
iteration= 100 - CMS Old Gen: u= 0% cu= 0% uth=75% cuth=75% iteration= 200 - CMS Old Gen: u= 1% cu= 0% uth=75% cuth=75% iteration= 300 - CMS Old Gen: u= 3% cu= 0% uth=75% cuth=75% iteration= 400 - CMS Old Gen: u= 12% cu= 0% uth=75% cuth=75% iteration= 500 - CMS Old Gen: u= 19% cu= 0% uth=75% cuth=75% iteration= 600 - CMS Old Gen: u= 34% cu= 0% uth=75% cuth=75% iteration= 700 - CMS Old Gen: u= 43% cu= 0% uth=75% cuth=75% iteration= 800 - CMS Old Gen: u= 63% cu= 0% uth=75% cuth=75% iteration= 900 - CMS Old Gen: u= 48% cu= 37% uth=75% cuth=75% iteration= 1000 - CMS Old Gen: u= 60% cu= 37% uth=75% cuth=75% iteration= 1100 - CMS Old Gen: u= 58% cu= 45% uth=75% cuth=75% iteration= 1200 - CMS Old Gen: u= 71% cu= 45% uth=75% cuth=75% iteration= 1300 - CMS Old Gen: u= 66% cu= 53% uth=75% cuth=75% iteration= 1400 - CMS Old Gen: u= 80% cu= 53% uth=75% cuth=75% u = usage(getUsage), cu = collectionUsage (getCollectionUsage), uth = usage threshold %, cuth = collection usage threshold % my program just keeps allocating string and frees some strings. 1. Why does G1GC doesn't update it's collection usage till 59% whereas in CMSGC it is updated at 37% itself? Can someone shed more light on this? Thanks, Sundar -------------- next part -------------- An HTML attachment was scrubbed... URL: From rkennke at redhat.com Wed Jun 28 20:23:37 2017 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 28 Jun 2017 22:23:37 +0200 Subject: RFR: Parallelize safepoint cleanup In-Reply-To: References: <486b5a72-bef8-4ebc-2729-3fe3aa3ab3b9@oracle.com> <5c80f8df-27c9-f9a9-dc6d-47f9c6019a61@redhat.com> <46ad874e-eb41-7927-265a-40dea92dfe1e@oracle.com> <5711258b-99b0-e06f-ba6e-0b6b55d88345@redhat.com> <0e1e2779-9316-b756-6cc8-e0c8add14a94@oracle.com> <1910961c-11bd-0e86-dd03-4fce66b9969f@redhat.com> <2b466176-b688-53a8-bef9-c7ec2c8c745b@oracle.com> <42872a15-d26c-9798-c6a2-f3f7c945baf7@redhat.com> <5e7c7d00-4acd-bea3-3525-33dbd9159efb@oracle.com> <6f2c6de7-298b-bf14-ab1f-430c4acd43c9@redhat.com> <5cd676de-872d-6d4a-691b-da561173f7d0@oracle.com> <61d80e98-275f-b2b8-4ac7-6d5d03b047de@redhat.com> Message-ID: <676d3b56-cee0-b68a-d700-e43695355148@redhat.com> > > On 06/27/2017 09:47 PM, Roman Kennke wrote: >> Hi Robbin, >> >> Ugh. Thanks for catching this. >> Problem was that I was accounting the thread-local deflations twice: >> once in thread-local processing (basically a leftover from my earlier >> attempt to implement this accounting) and then again in >> finish_deflate_idle_monitors(). Should be fixed here: >> >> http://cr.openjdk.java.net/~rkennke/8180932/webrev.09/ >> > > Nit: > safepoint.cpp : ParallelSPCleanupTask > "const char* name = " is not needed and 1 is unused > Sorry, I don't understand what you mean by this. I see code like this: const char* name = "deflating idle monitors"; and it is used a few lines below, even 2x. What's '1 is unused' ? >> >> Side question: which jtreg targets do you usually run? > > Right now I cherry pick directories from: hotspot/test/ > > I'm going to add a decent test group for local testing. That would be good! > >> >> Trying: make test TEST=hotspot_all >> gives me *lots* of failures due to missing jcstress stuff (?!) >> And even other subsets seem to depend on several bits and pieces >> that I >> have no idea about. > > Yes, you need to use internal tool 'jib' java integrate build to get > that work or you can set some environment where the jcstress > application stuff is... Uhhh. 
We really do want a subset of tests that we can run reliably and that are self-contained, how else are people (without that jib thingy) supposed to do some sanity checking with their patches? ;-) > I have a regression on ClassLoaderData root scanning, this should not > be related, > but I only have 3 patches which could cause this, if it's not > something in the environment that have changed. Let me know if it's my patch :-) > > Also do not see any immediate performance gains (off vs 4 threads), it > might be http://hg.openjdk.java.net/jdk10/hs/hotspot/rev/06994badeb24 > , but I need to-do some more testing. I know you often run with none > default GSI. First of all, during the course of this review I reduced the change from an actual implementation to a kind of framework, and it needs some separate changes in the GC to make use of it. Not sure if you added corresponding code in (e.g.) G1? Also, this is only really visible in code that makes excessive use of monitors, i.e. the one linked by Carsten's original patch, or the test org.openjdk.gcbench.roots.Synchronizers.test in gc-bench: http://icedtea.classpath.org/hg/gc-bench/ There are also some popular real-world apps that tend to do this. From the top off my head, Cassandra is such an application. Thanks, Roman > > I'll get back to you. > > Thanks, Robbin > >> >> Roman >> >> Am 27.06.2017 um 16:51 schrieb Robbin Ehn: >>> Hi Roman, >>> >>> There is something wrong in calculations: >>> INFO: Deflate: InCirc=43 InUse=18 Scavenged=25 ForceMonitorScavenge=0 >>> : pop=27051 free=215487 >>> >>> free is larger than population, have not had the time to dig into this. >>> >>> Thanks, Robbin >>> >>> On 06/22/2017 10:19 PM, Roman Kennke wrote: >>>> So here's the latest iteration of that patch: >>>> >>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.08/ >>>> >>>> >>>> I checked and fixed all the counters. The problem here is that they >>>> are >>>> not updated in a single place (deflate_idle_monitors() ) but in >>>> several >>>> places, potentially by multiple threads. I split up deflation into >>>> prepare_.. and a finish_.. methods to initialize local and update >>>> global >>>> counters respectively, and pass around a counters object (allocated on >>>> stack) to the various code paths that use it. Updating the counters >>>> always happen under a lock, there's no need to do anything special >>>> with >>>> regards to concurrency. >>>> >>>> I also checked the nmethod marking, but there doesn't seem to be >>>> anything in that code that looks problematic under concurrency. The >>>> worst that can happen is that two threads write the same value into an >>>> nmethod field. I think we can live with that ;-) >>>> >>>> Good to go? >>>> >>>> Tested by running specjvm and jcstress fastdebug+release without >>>> issues. >>>> >>>> Roman >>>> >>>> Am 02.06.2017 um 12:39 schrieb Robbin Ehn: >>>>> Hi Roman, >>>>> >>>>> On 06/02/2017 11:41 AM, Roman Kennke wrote: >>>>>> Hi David, >>>>>> thanks for reviewing. I'll be on vacation the next two weeks too, >>>>>> with >>>>>> only sporadic access to work stuff. >>>>>> Yes, exposure will not be as good as otherwise, but it's not totally >>>>>> untested either: the serial code path is the same as the >>>>>> parallel, the >>>>>> only difference is that it's not actually called by multiple >>>>>> threads. >>>>>> It's ok I think. 
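Concretely, the per-GC opt-in this framework relies on could look like the following sketch; the get_safepoint_workers() name follows the snippets quoted in this thread, while the rest is illustrative and may differ from the webrev:

#include <cstddef>

class WorkGang;

class CollectedHeap {
public:
  // Default: no gang is offered, so safepoint cleanup stays single-threaded.
  virtual WorkGang* get_safepoint_workers() { return NULL; }
};

class G1CollectedHeap : public CollectedHeap {
  WorkGang* _workers;
public:
  G1CollectedHeap() : _workers(NULL) {}
  // A GC opts in by exposing its existing gang. This is only safe if the
  // GC can guarantee the workers are truly idle (not merely yielded) at
  // a safepoint.
  virtual WorkGang* get_safepoint_workers() { return _workers; }
};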
>>>>>> >>>>>> I found two more issues that I think should be addressed: >>>>>> - There are some counters in deflate_idle_monitors() and I'm not >>>>>> sure I >>>>>> correctly handle them in the split-up and MT'ed thread-local/ global >>>>>> list deflation >>>>>> - nmethod marking seems to unconditionally poke true or something >>>>>> like >>>>>> that in nmethod fields. This doesn't hurt correctness-wise, but it's >>>>>> probably worth checking if it's already true, especially when doing >>>>>> this >>>>>> with multiple threads concurrently. >>>>>> >>>>>> I'll send an updated patch around later, I hope I can get to it >>>>>> today... >>>>> >>>>> I'll review that when you get it out. >>>>> I think this looks as a reasonable step before we tackle this with a >>>>> major effort, such as the JEP you and Carsten doing. >>>>> And another effort to 'fix' nmethods marking. >>>>> >>>>> Internal discussion yesterday lead us to conclude that the runtime >>>>> will probably need more threads. >>>>> This would be a good driver to do a 'global' worker pool which serves >>>>> both gc, runtime and safepoints with threads. >>>>> >>>>>> >>>>>> Roman >>>>>> >>>>>>> Hi Roman, >>>>>>> >>>>>>> I am about to disappear on an extended vacation so will let others >>>>>>> pursue this. IIUC this is longer an opt-in by the user at runtime, >>>>>>> but >>>>>>> an opt-in by the particular GC developers. Okay. My only concern >>>>>>> with >>>>>>> that is if Shenandoah is the only GC that currently opts in then >>>>>>> this >>>>>>> code is not going to get much testing and will be more prone to >>>>>>> incidental breakage. >>>>> >>>>> As I mentioned before, it seem like Erik ? have some idea, maybe he >>>>> can do this after his barrier patch. >>>>> >>>>> Thanks! >>>>> >>>>> /Robbin >>>>> >>>>>>> >>>>>>> Cheers, >>>>>>> David >>>>>>> >>>>>>> On 2/06/2017 2:21 AM, Roman Kennke wrote: >>>>>>>> Am 01.06.2017 um 17:50 schrieb Roman Kennke: >>>>>>>>> Am 01.06.2017 um 14:18 schrieb Robbin Ehn: >>>>>>>>>> Hi Roman, >>>>>>>>>> >>>>>>>>>> On 06/01/2017 11:29 AM, Roman Kennke wrote: >>>>>>>>>>> Am 31.05.2017 um 22:06 schrieb Robbin Ehn: >>>>>>>>>>>> Hi Roman, I agree that is really needed but: >>>>>>>>>>>> >>>>>>>>>>>> On 05/31/2017 10:27 AM, Roman Kennke wrote: >>>>>>>>>>>>> I realized that sharing workers with GC is not so easy. >>>>>>>>>>>>> >>>>>>>>>>>>> We need to be able to use the workers at a safepoint during >>>>>>>>>>>>> concurrent >>>>>>>>>>>>> GC work (which also uses the same workers). This does not >>>>>>>>>>>>> only >>>>>>>>>>>>> require >>>>>>>>>>>>> that those workers be suspended, like e.g. >>>>>>>>>>>>> SuspendibleThreadSet::yield(), but they need to be idle, i.e. >>>>>>>>>>>>> have >>>>>>>>>>>>> finished their tasks. This needs some careful handling to >>>>>>>>>>>>> work >>>>>>>>>>>>> without >>>>>>>>>>>>> races: it requires a SuspendibleThreadSetJoiner around the >>>>>>>>>>>>> corresponding >>>>>>>>>>>>> run_task() call and also the tasks themselves need to join >>>>>>>>>>>>> the >>>>>>>>>>>>> STS and >>>>>>>>>>>>> handle requests for safepoints not by yielding, but by >>>>>>>>>>>>> leaving >>>>>>>>>>>>> the >>>>>>>>>>>>> task. >>>>>>>>>>>>> This is far too peculiar for me to make the call to hook >>>>>>>>>>>>> up GC >>>>>>>>>>>>> workers >>>>>>>>>>>>> for safepoint cleanup, and I thus removed those parts. I >>>>>>>>>>>>> left the >>>>>>>>>>>>> API in >>>>>>>>>>>>> CollectedHeap in place. 
I think GC devs who know better >>>>>>>>>>>>> about G1 >>>>>>>>>>>>> and CMS >>>>>>>>>>>>> should make that call, or else just use a separate thread >>>>>>>>>>>>> pool. >>>>>>>>>>>>> >>>>>>>>>>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.05/ >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Is it ok now? >>>>>>>>>>>> I still think you should put the "Parallel Safepoint Cleanup" >>>>>>>>>>>> workers >>>>>>>>>>>> inside Shenandoah, >>>>>>>>>>>> so the SafepointSynchronizer only calls get_safepoint_workers, >>>>>>>>>>>> e.g.: >>>>>>>>>>>> >>>>>>>>>>>> _cleanup_workers = heap->get_safepoint_workers(); >>>>>>>>>>>> _num_cleanup_workers = _cleanup_workers != NULL ? >>>>>>>>>>>> _cleanup_workers->total_workers() : 1; >>>>>>>>>>>> ParallelSPCleanupTask cleanup(_cleanup_subtasks); >>>>>>>>>>>> StrongRootsScope srs(_num_cleanup_workers); >>>>>>>>>>>> if (_cleanup_workers != NULL) { >>>>>>>>>>>> _cleanup_workers->run_task(&cleanup, >>>>>>>>>>>> _num_cleanup_workers); >>>>>>>>>>>> } else { >>>>>>>>>>>> cleanup.work(0); >>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>>> That way you don't even need your new flags, but it will be >>>>>>>>>>>> up to >>>>>>>>>>>> the >>>>>>>>>>>> other GCs to make their worker available >>>>>>>>>>>> or cheat with a separate workgang. >>>>>>>>>>> I can do that, I don't mind. The question is, do we want that? >>>>>>>>>> The problem is that we do not want to haste such decision, we >>>>>>>>>> believe >>>>>>>>>> there is a better solution. >>>>>>>>>> I think you also would want another solution. >>>>>>>>>> But it's seems like such solution with 1 'global' thread pool >>>>>>>>>> either >>>>>>>>>> own by GC or the VM it self is quite the undertaking. >>>>>>>>>> Since this probably will not be done any time soon my >>>>>>>>>> suggestion is, >>>>>>>>>> to not hold you back (we also want this), just to make >>>>>>>>>> the code parallel and as an intermediate step ask the GC if it >>>>>>>>>> minds >>>>>>>>>> sharing it's thread. >>>>>>>>>> >>>>>>>>>> Now when Shenandoah is merged it's possible that e.g. G1 will >>>>>>>>>> share >>>>>>>>>> the code for a separate thread pool, do something of it's own or >>>>>>>>>> wait until the bigger question about thread pool(s) have been >>>>>>>>>> resolved. >>>>>>>>>> >>>>>>>>>> By adding a thread pool directly to the SafepointSynchronizer >>>>>>>>>> and >>>>>>>>>> flags for it we might limit our future options. >>>>>>>>>> >>>>>>>>>>> I wouldn't call it 'cheating with a separate workgang' >>>>>>>>>>> though. I >>>>>>>>>>> see >>>>>>>>>>> that both G1 and CMS suspend their worker threads at a >>>>>>>>>>> safepoint. >>>>>>>>>>> However: >>>>>>>>>> Yes it's not cheating but I want decent heuristics between e.g. >>>>>>>>>> number >>>>>>>>>> of concurrent marking threads and parallel safepoint threads >>>>>>>>>> since >>>>>>>>>> they compete for cpu time. >>>>>>>>>> As the code looks now, I think that decisions must be made by >>>>>>>>>> the >>>>>>>>>> GC. >>>>>>>>> Ok, I see your point. I updated the proposed patch accordingly: >>>>>>>>> >>>>>>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.06/ >>>>>>>>> >>>>>>>> Oops. Minor mistake there. Correction: >>>>>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.07/ >>>>>>>> >>>>>>>> >>>>>>>> (Removed 'class WorkGang' from safepoint.hpp, and forgot to add it >>>>>>>> into >>>>>>>> collectedHeap.hpp, resulting in build failure...) 
>>>>>>>> >>>>>>>> Roman >>>>>>>> >>>>>> >>>> >> From robbin.ehn at oracle.com Wed Jun 28 21:08:54 2017 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Wed, 28 Jun 2017 23:08:54 +0200 Subject: RFR: Parallelize safepoint cleanup In-Reply-To: <676d3b56-cee0-b68a-d700-e43695355148@redhat.com> References: <486b5a72-bef8-4ebc-2729-3fe3aa3ab3b9@oracle.com> <5c80f8df-27c9-f9a9-dc6d-47f9c6019a61@redhat.com> <46ad874e-eb41-7927-265a-40dea92dfe1e@oracle.com> <5711258b-99b0-e06f-ba6e-0b6b55d88345@redhat.com> <0e1e2779-9316-b756-6cc8-e0c8add14a94@oracle.com> <1910961c-11bd-0e86-dd03-4fce66b9969f@redhat.com> <2b466176-b688-53a8-bef9-c7ec2c8c745b@oracle.com> <42872a15-d26c-9798-c6a2-f3f7c945baf7@redhat.com> <5e7c7d00-4acd-bea3-3525-33dbd9159efb@oracle.com> <6f2c6de7-298b-bf14-ab1f-430c4acd43c9@redhat.com> <5cd676de-872d-6d4a-691b-da561173f7d0@oracle.com> <61d80e98-275f-b2b8-4ac7-6d5d03b047de@redhat.com> <676d3b56-cee0-b68a-d700-e43695355148@redhat.com> Message-ID: <39baaa4e-7e9e-6ef4-749c-7429078d23d8@oracle.com> Hi Roman On 06/28/2017 10:23 PM, Roman Kennke wrote: > >> >> On 06/27/2017 09:47 PM, Roman Kennke wrote: >>> Hi Robbin, >>> >>> Ugh. Thanks for catching this. >>> Problem was that I was accounting the thread-local deflations twice: >>> once in thread-local processing (basically a leftover from my earlier >>> attempt to implement this accounting) and then again in >>> finish_deflate_idle_monitors(). Should be fixed here: >>> >>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.09/ >>> >> >> Nit: >> safepoint.cpp : ParallelSPCleanupTask >> "const char* name = " is not needed and 1 is unused >> > Sorry, I don't understand what you mean by this. I see code like this: > > const char* name = "deflating idle monitors"; > > and it is used a few lines below, even 2x. > > What's '1 is unused' ? Yes I didn't see name was at two places, so it's only the one that is used only once. 598 const char* name = "compilation policy safepoint handler"; 599 EventSafepointCleanupTask event; 600 TraceTime timer("compilation policy safepoint handler", TRACETIME_LOG(Info, safepoint, cleanup)); 601 CompilationPolicy::policy()->do_safepoint_work(); 602 event_safepoint_cleanup_task_commit(event, name); (you do not need webrev this :) ) > >>> >>> Side question: which jtreg targets do you usually run? >> >> Right now I cherry pick directories from: hotspot/test/ >> >> I'm going to add a decent test group for local testing. > That would be good! > > >> >>> >>> Trying: make test TEST=hotspot_all >>> gives me *lots* of failures due to missing jcstress stuff (?!) >>> And even other subsets seem to depend on several bits and pieces >>> that I >>> have no idea about. >> >> Yes, you need to use internal tool 'jib' java integrate build to get >> that work or you can set some environment where the jcstress >> application stuff is... > Uhhh. We really do want a subset of tests that we can run reliably and > that are self-contained, how else are people (without that jib thingy) > supposed to do some sanity checking with their patches? ;-) Yes! >> I have a regression on ClassLoaderData root scanning, this should not >> be related, >> but I only have 3 patches which could cause this, if it's not >> something in the environment that have changed. 
> Let me know if it's my patch :-) No it seems to be an experimental numa patch that seem to makes -XX:-UseNUMA worse :) Adding -XX:+UseNUMA numbers come back, worse 12ms -> 2ms and avg goes 0.44ms to 0.27 ms >> >> Also do not see any immediate performance gains (off vs 4 threads), it >> might be http://hg.openjdk.java.net/jdk10/hs/hotspot/rev/06994badeb24 >> , but I need to-do some more testing. I know you often run with none >> default GSI. > > First of all, during the course of this review I reduced the change from > an actual implementation to a kind of framework, and it needs some > separate changes in the GC to make use of it. Not sure if you added > corresponding code in (e.g.) G1? I added the stuff directly in collectedheap just for testing. > > Also, this is only really visible in code that makes excessive use of > monitors, i.e. the one linked by Carsten's original patch, or the test > org.openjdk.gcbench.roots.Synchronizers.test in gc-bench: > > http://icedtea.classpath.org/hg/gc-bench/ > > There are also some popular real-world apps that tend to do this. From > the top off my head, Cassandra is such an application. I'll look at that. My test burns ~13k monitors per second, not sure what that level counts as. I just want to verify some more testing, I'll get back to you tomorrow! Thanks for bearing with me! /Robbin > > Thanks, Roman > >> >> I'll get back to you. >> >> Thanks, Robbin >> >>> >>> Roman >>> >>> Am 27.06.2017 um 16:51 schrieb Robbin Ehn: >>>> Hi Roman, >>>> >>>> There is something wrong in calculations: >>>> INFO: Deflate: InCirc=43 InUse=18 Scavenged=25 ForceMonitorScavenge=0 >>>> : pop=27051 free=215487 >>>> >>>> free is larger than population, have not had the time to dig into this. >>>> >>>> Thanks, Robbin >>>> >>>> On 06/22/2017 10:19 PM, Roman Kennke wrote: >>>>> So here's the latest iteration of that patch: >>>>> >>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.08/ >>>>> >>>>> >>>>> I checked and fixed all the counters. The problem here is that they >>>>> are >>>>> not updated in a single place (deflate_idle_monitors() ) but in >>>>> several >>>>> places, potentially by multiple threads. I split up deflation into >>>>> prepare_.. and a finish_.. methods to initialize local and update >>>>> global >>>>> counters respectively, and pass around a counters object (allocated on >>>>> stack) to the various code paths that use it. Updating the counters >>>>> always happen under a lock, there's no need to do anything special >>>>> with >>>>> regards to concurrency. >>>>> >>>>> I also checked the nmethod marking, but there doesn't seem to be >>>>> anything in that code that looks problematic under concurrency. The >>>>> worst that can happen is that two threads write the same value into an >>>>> nmethod field. I think we can live with that ;-) >>>>> >>>>> Good to go? >>>>> >>>>> Tested by running specjvm and jcstress fastdebug+release without >>>>> issues. >>>>> >>>>> Roman >>>>> >>>>> Am 02.06.2017 um 12:39 schrieb Robbin Ehn: >>>>>> Hi Roman, >>>>>> >>>>>> On 06/02/2017 11:41 AM, Roman Kennke wrote: >>>>>>> Hi David, >>>>>>> thanks for reviewing. I'll be on vacation the next two weeks too, >>>>>>> with >>>>>>> only sporadic access to work stuff. >>>>>>> Yes, exposure will not be as good as otherwise, but it's not totally >>>>>>> untested either: the serial code path is the same as the >>>>>>> parallel, the >>>>>>> only difference is that it's not actually called by multiple >>>>>>> threads. >>>>>>> It's ok I think. 
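For completeness, here is roughly what the task side of this looks like, following the AbstractGangTask run_task()/work(worker_id) protocol visible in the snippets quoted in this thread. SubTasksDone and is_task_claimed() are HotSpot's claim-once primitives of this time; the real ParallelSPCleanupTask in the webrev differs in detail:

class ParallelSPCleanupTask : public AbstractGangTask {
  SubTasksDone _subtasks;  // claim-once tickets, one per cleanup subtask
public:
  ParallelSPCleanupTask(uint num_subtasks)
    : AbstractGangTask("Parallel Safepoint Cleanup"),
      _subtasks(num_subtasks) {}

  void work(uint worker_id) {
    // Every worker races to claim each subtask; exactly one worker wins
    // each claim.
    if (!_subtasks.is_task_claimed(0)) {
      // ... deflate idle monitors, filling in a local counters object ...
    }
    if (!_subtasks.is_task_claimed(1)) {
      // ... e.g. update inline caches ...
    }
    // ... more subtasks, then the usual SubTasksDone completion handshake.
  }
};

Because every subtask is claimed exactly once, the same task is correct both when run by a gang and when run as cleanup.work(0) by the VM thread.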
>>>>>>> >>>>>>> I found two more issues that I think should be addressed: >>>>>>> - There are some counters in deflate_idle_monitors() and I'm not >>>>>>> sure I >>>>>>> correctly handle them in the split-up and MT'ed thread-local/ global >>>>>>> list deflation >>>>>>> - nmethod marking seems to unconditionally poke true or something >>>>>>> like >>>>>>> that in nmethod fields. This doesn't hurt correctness-wise, but it's >>>>>>> probably worth checking if it's already true, especially when doing >>>>>>> this >>>>>>> with multiple threads concurrently. >>>>>>> >>>>>>> I'll send an updated patch around later, I hope I can get to it >>>>>>> today... >>>>>> >>>>>> I'll review that when you get it out. >>>>>> I think this looks as a reasonable step before we tackle this with a >>>>>> major effort, such as the JEP you and Carsten doing. >>>>>> And another effort to 'fix' nmethods marking. >>>>>> >>>>>> Internal discussion yesterday lead us to conclude that the runtime >>>>>> will probably need more threads. >>>>>> This would be a good driver to do a 'global' worker pool which serves >>>>>> both gc, runtime and safepoints with threads. >>>>>> >>>>>>> >>>>>>> Roman >>>>>>> >>>>>>>> Hi Roman, >>>>>>>> >>>>>>>> I am about to disappear on an extended vacation so will let others >>>>>>>> pursue this. IIUC this is longer an opt-in by the user at runtime, >>>>>>>> but >>>>>>>> an opt-in by the particular GC developers. Okay. My only concern >>>>>>>> with >>>>>>>> that is if Shenandoah is the only GC that currently opts in then >>>>>>>> this >>>>>>>> code is not going to get much testing and will be more prone to >>>>>>>> incidental breakage. >>>>>> >>>>>> As I mentioned before, it seem like Erik ? have some idea, maybe he >>>>>> can do this after his barrier patch. >>>>>> >>>>>> Thanks! >>>>>> >>>>>> /Robbin >>>>>> >>>>>>>> >>>>>>>> Cheers, >>>>>>>> David >>>>>>>> >>>>>>>> On 2/06/2017 2:21 AM, Roman Kennke wrote: >>>>>>>>> Am 01.06.2017 um 17:50 schrieb Roman Kennke: >>>>>>>>>> Am 01.06.2017 um 14:18 schrieb Robbin Ehn: >>>>>>>>>>> Hi Roman, >>>>>>>>>>> >>>>>>>>>>> On 06/01/2017 11:29 AM, Roman Kennke wrote: >>>>>>>>>>>> Am 31.05.2017 um 22:06 schrieb Robbin Ehn: >>>>>>>>>>>>> Hi Roman, I agree that is really needed but: >>>>>>>>>>>>> >>>>>>>>>>>>> On 05/31/2017 10:27 AM, Roman Kennke wrote: >>>>>>>>>>>>>> I realized that sharing workers with GC is not so easy. >>>>>>>>>>>>>> >>>>>>>>>>>>>> We need to be able to use the workers at a safepoint during >>>>>>>>>>>>>> concurrent >>>>>>>>>>>>>> GC work (which also uses the same workers). This does not >>>>>>>>>>>>>> only >>>>>>>>>>>>>> require >>>>>>>>>>>>>> that those workers be suspended, like e.g. >>>>>>>>>>>>>> SuspendibleThreadSet::yield(), but they need to be idle, i.e. >>>>>>>>>>>>>> have >>>>>>>>>>>>>> finished their tasks. This needs some careful handling to >>>>>>>>>>>>>> work >>>>>>>>>>>>>> without >>>>>>>>>>>>>> races: it requires a SuspendibleThreadSetJoiner around the >>>>>>>>>>>>>> corresponding >>>>>>>>>>>>>> run_task() call and also the tasks themselves need to join >>>>>>>>>>>>>> the >>>>>>>>>>>>>> STS and >>>>>>>>>>>>>> handle requests for safepoints not by yielding, but by >>>>>>>>>>>>>> leaving >>>>>>>>>>>>>> the >>>>>>>>>>>>>> task. >>>>>>>>>>>>>> This is far too peculiar for me to make the call to hook >>>>>>>>>>>>>> up GC >>>>>>>>>>>>>> workers >>>>>>>>>>>>>> for safepoint cleanup, and I thus removed those parts. I >>>>>>>>>>>>>> left the >>>>>>>>>>>>>> API in >>>>>>>>>>>>>> CollectedHeap in place. 
I think GC devs who know better >>>>>>>>>>>>>> about G1 >>>>>>>>>>>>>> and CMS >>>>>>>>>>>>>> should make that call, or else just use a separate thread >>>>>>>>>>>>>> pool. >>>>>>>>>>>>>> >>>>>>>>>>>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.05/ >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Is it ok now? >>>>>>>>>>>>> I still think you should put the "Parallel Safepoint Cleanup" >>>>>>>>>>>>> workers >>>>>>>>>>>>> inside Shenandoah, >>>>>>>>>>>>> so the SafepointSynchronizer only calls get_safepoint_workers, >>>>>>>>>>>>> e.g.: >>>>>>>>>>>>> >>>>>>>>>>>>> _cleanup_workers = heap->get_safepoint_workers(); >>>>>>>>>>>>> _num_cleanup_workers = _cleanup_workers != NULL ? >>>>>>>>>>>>> _cleanup_workers->total_workers() : 1; >>>>>>>>>>>>> ParallelSPCleanupTask cleanup(_cleanup_subtasks); >>>>>>>>>>>>> StrongRootsScope srs(_num_cleanup_workers); >>>>>>>>>>>>> if (_cleanup_workers != NULL) { >>>>>>>>>>>>> _cleanup_workers->run_task(&cleanup, >>>>>>>>>>>>> _num_cleanup_workers); >>>>>>>>>>>>> } else { >>>>>>>>>>>>> cleanup.work(0); >>>>>>>>>>>>> } >>>>>>>>>>>>> >>>>>>>>>>>>> That way you don't even need your new flags, but it will be >>>>>>>>>>>>> up to >>>>>>>>>>>>> the >>>>>>>>>>>>> other GCs to make their worker available >>>>>>>>>>>>> or cheat with a separate workgang. >>>>>>>>>>>> I can do that, I don't mind. The question is, do we want that? >>>>>>>>>>> The problem is that we do not want to haste such decision, we >>>>>>>>>>> believe >>>>>>>>>>> there is a better solution. >>>>>>>>>>> I think you also would want another solution. >>>>>>>>>>> But it's seems like such solution with 1 'global' thread pool >>>>>>>>>>> either >>>>>>>>>>> own by GC or the VM it self is quite the undertaking. >>>>>>>>>>> Since this probably will not be done any time soon my >>>>>>>>>>> suggestion is, >>>>>>>>>>> to not hold you back (we also want this), just to make >>>>>>>>>>> the code parallel and as an intermediate step ask the GC if it >>>>>>>>>>> minds >>>>>>>>>>> sharing it's thread. >>>>>>>>>>> >>>>>>>>>>> Now when Shenandoah is merged it's possible that e.g. G1 will >>>>>>>>>>> share >>>>>>>>>>> the code for a separate thread pool, do something of it's own or >>>>>>>>>>> wait until the bigger question about thread pool(s) have been >>>>>>>>>>> resolved. >>>>>>>>>>> >>>>>>>>>>> By adding a thread pool directly to the SafepointSynchronizer >>>>>>>>>>> and >>>>>>>>>>> flags for it we might limit our future options. >>>>>>>>>>> >>>>>>>>>>>> I wouldn't call it 'cheating with a separate workgang' >>>>>>>>>>>> though. I >>>>>>>>>>>> see >>>>>>>>>>>> that both G1 and CMS suspend their worker threads at a >>>>>>>>>>>> safepoint. >>>>>>>>>>>> However: >>>>>>>>>>> Yes it's not cheating but I want decent heuristics between e.g. >>>>>>>>>>> number >>>>>>>>>>> of concurrent marking threads and parallel safepoint threads >>>>>>>>>>> since >>>>>>>>>>> they compete for cpu time. >>>>>>>>>>> As the code looks now, I think that decisions must be made by >>>>>>>>>>> the >>>>>>>>>>> GC. >>>>>>>>>> Ok, I see your point. I updated the proposed patch accordingly: >>>>>>>>>> >>>>>>>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.06/ >>>>>>>>>> >>>>>>>>> Oops. Minor mistake there. Correction: >>>>>>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.07/ >>>>>>>>> >>>>>>>>> >>>>>>>>> (Removed 'class WorkGang' from safepoint.hpp, and forgot to add it >>>>>>>>> into >>>>>>>>> collectedHeap.hpp, resulting in build failure...) 
>>>>>>>>> >>>>>>>>> Roman >>>>>>>>> >>>>>>> >>>>> >>> > From sangheon.kim at oracle.com Thu Jun 29 07:56:04 2017 From: sangheon.kim at oracle.com (sangheon) Date: Thu, 29 Jun 2017 00:56:04 -0700 Subject: RFR(S): 8173335: Improve logging for j.l.ref.reference processing In-Reply-To: <1498488133.2665.37.camel@oracle.com> References: <1497352882.2829.65.camel@oracle.com> <054513b1-4ff2-6656-fa3a-9c6e6736c32f@oracle.com> <0353babf-03eb-21cd-b286-9b0149dfb718@oracle.com> <1498488133.2665.37.camel@oracle.com> Message-ID: Hi Thomas, Thank you very much for the thorough review. On 06/26/2017 07:42 AM, Thomas Schatzl wrote: > Hi Sangheon, > > thanks for all your changes, and sorry a bit for the delay... > > On Wed, 2017-06-14 at 00:52 -0700, sangheon wrote: >> Hi Thomas again, >> On 06/13/2017 02:21 PM, sangheon wrote: >>> Hi Thomas, >>> >>> Thank you for reviewing this. >>> >>> On 06/13/2017 04:21 AM, Thomas Schatzl wrote: >>>> Hi Sangheon, >>>> >>>> >>>> On Mon, 2017-06-12 at 17:13 -0700, sangheon wrote: >>>>> Hi Aleksey, >>>>> >>>>> Thanks for the review. >>>>> >>>>> On 06/12/2017 09:06 AM, Aleksey Shipilev wrote: >>>>>> On 06/10/2017 01:57 AM, sangheon wrote: >>>>>>> CR:https://bugs.openjdk.java.net/browse/JDK-8173335 >>>>>>> webrev:http://cr.openjdk.java.net/~sangheki/8173335/webrev >>>>>>> .0 >>>> - There should be a destructor in ReferenceProcessor cleaning up >>>> the dynamically allocated memory. >>> Thomas and I had some discussion about this and agreed to file a >>> separate CR for freeing issue. >>> >>> I noticed that there's no destructor when I wrote this, but this is >>> how we usually implement. >>> However as this seems incorrect, I will add a destructor for newly >>> added class but it will not be used in this patch. >>> It will be used in the following CR( >>> https://bugs.openjdk.java.net/browse/JDK-8182120 ) which fixes >>> not-freeing issue in ReferenceProcessor. >>> FYI, ReferenceProcessor has heap allocated members of >>> ReferencePolicy(and its friends) but it is not freed too. So >>> instead of extending this patch, I propose to separate this freeing >>> issue. > That's fine, thanks. > >>>> - the change should move gc+ref output to something else: there >>>> is so much additional junk printed with gc+ref=trace so that the >>>> phase logging is drowned out with real trace information and >>>> unusable for regular consumption. >>> Okay, I will add it. >>> But I asked introducing 'gc+ref+phases' before but you didn't like >>> it. :) Probably I didn't provide much details?! > Yes. In the example you showed me earlier with gc+ref=trace the > examples did not contain the other gc+ref=trace output. That's why I > thought it would be fine. :) :) >>>> - I would prefer if resetting the reference phase times logger >>>> wouldn't be kind of an afterthought of printing :) >>>> >>>> Also it might be useful to keep the data around for somewhat >>>> longer (not throw it away after every phase). Don't we need the >>>> data for further analysis? >>> I don't have strong opinion on this. >>> >>> I didn't consider keeping log data for further analysis. This could >>> a minor reason for supporting keeping log data longer but I think >>> interspersing with existing G1 log would be the main reason of >>> keeping it. >>> >>>> This would also allow printing it later using different log tags >>>> (with different formatting). >>>> >>>> - I like the split of phasetimes into data storage and printing. 
>>>> I do not like that basically the timing data is created twice,
>>>> once for the phasetimes, once for the GCTimer (for JFR basically).
No, currently the timing data is created once and used for both the phase log and the GCTimer.
>>> Or am I missing something?
>>>
>>> So in summary, mostly I agree with your comments except below 2:
>>> 1. Interspersing with G1 log.
>>> 2. Keeping log data longer. (This should be done if we go with the interspersing idea)
>> I started working on above 2 items. :)
>> I will update webrev when I'm ready.
>>
> Thanks a lot for considering all my comments.
>
> I think the output is much nicer now :)
Thanks!
> Some more notes:
>
> - In the current change (webrev.2) the approach of using the "direct_print()" getter seems a bit forced, only to keep the current structure of the code, i.e. printing within the ReferenceProcessor::process_references() method.
Right.
> What do you think about moving the printing outside of that method for all collectors, just passing a (properly initialized - that allows moving the reset() method into gc specific code as well) ReferenceProcessorPhaseTimes* that is then later used for printing, either directly, or deferred?
Okay, this seems better than the current one. While applying your suggestion I tweaked it a little bit, because giving the responsibility of printing logs to the callers seems not that natural to me. (I also prepared an additional webrev for your original suggestion [1])
> At the location where the reference processing is done we know whether we need to print directly or deferred. This also hides pretty specific information about printing (like indentation level) from the reference processing itself.
>
> Also that would maybe allow storing the GCTimer reference somewhere in the ReferenceProcessorPhaseTimes so that we only need to pass a single container for timing information around.
Good idea, the GCTimer is now included in ReferenceProcessorPhaseTimes.
> Overall that may reduce the code quite a bit, keeps similar components (GCTimer and ReferenceProcessorPhaseTimes) together without ReferenceProcessor needing to know about both of them, and removes the ReferenceProcessor "global" reference to the ReferenceProcessorPhaseTimes, which is easier to keep track of when looking at the code (instead of having the GCTimer passed in and the ReferenceProcessorPhaseTimes as a member).
>
> The collectors that print immediately probably also can get away with a stack-allocated local ReferenceProcessorPhaseTimes, which somewhat simplifies their lifecycle management.
Right. Mostly, ReferenceProcessorPhaseTimes will be stack-allocated at the time of calling process_discovered_references() or enqueue_discovered_references(), except for the G1 young GC case. ~ReferenceProcessorPhaseTimes() will not be added in the destructor of G1CollectedHeap as we don't have one now. This can be addressed in a separate CR if needed.
> - could you please tighten the visibility of ReferenceProcessorPhaseTimes methods a bit? The getters of that class are only ever used in the print* methods, and even some of these print* methods are only ever called from class-local methods.
>
> I think this would drastically decrease the surface of that class.
You are right. I tried to move as many of them to 'private' as possible.
> - there seems to be a bug in printing per-thread per-phase worker times, the values seem to contain the absolute time at which the list has been processed, not a duration.
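This is the classic timestamp-versus-duration mixup: the reported values (~1512283.9 ms) are simply the current time, matching the 1512.286s log timestamp in the excerpt below. A tracker along these lines records a duration instead (illustrative names only; the webrev's RefProcWorkerTimeTracker presumably uses HotSpot's own time source rather than <chrono>):

#include <chrono>

class WorkerDurationTracker {
  double* _out_ms;
  std::chrono::steady_clock::time_point _start;
public:
  explicit WorkerDurationTracker(double* out_ms)
    : _out_ms(out_ms), _start(std::chrono::steady_clock::now()) {}

  ~WorkerDurationTracker() {
    auto end = std::chrono::steady_clock::now();
    // Record the elapsed time. Recording 'end' itself is what produces
    // absolute values like the 1512283.9 ms entries in the excerpt below.
    *_out_ms =
        std::chrono::duration<double, std::milli>(end - _start).count();
  }
};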
(with -XX:+ParallelRefProcEnabled and gc+phases+ref=trace)
>
> [1512.286s][debug][gc,phases,ref] GC(834) Reference Processing: 2.5ms
> [1512.286s][debug][gc,phases,ref] GC(834) SoftReference: 0.3ms
> [1512.286s][debug][gc,phases,ref] GC(834) Balance queues: 0.0ms
> [1512.286s][debug][gc,phases,ref] GC(834) Phase1: 0.3ms
> [1512.286s][trace][gc,phases,ref] GC(834) Process lists (ms) Min: 1512283.9, Avg: 1512283.9, Max: 1512283.9, Diff: 0.0, Sum: 34782529.1, Workers: 23
> [1512.286s][debug][gc,phases,ref] GC(834) Phase2: 0.3ms
> [1512.286s][trace][gc,phases,ref] GC(834) Process lists (ms) Min: 1512284.2, Avg: 1512284.2, Max: 1512284.2, Diff: 0.0, Sum: 34782535.9, Workers: 23
>
> - in referenceProcessorPhaseTimes.cpp:35: the code reads
>
> if (_worker_time != NULL) {
> ...
> }
>
> with _worker_time being set to NULL just one line above (same with the other constructor).
Not sure. The _worker_time check was a leftover from a previous change and resulted in the bug you pointed out above. Thanks for catching this. Fixed.
> - in RefProcWorkerTimeTracker::~RefProcWorkerTimeTracker: how is it possible that _worker_time is NULL? ReferenceProcessorPhaseTimes seems to always allocate memory for it.
Fixed. _worker_time can't be NULL.
> - RefProcPhaseTimesTracker takes the DiscoveredList array as parameter, but only ever uses it to determine how many total entries this DiscoveredList[] has. So it seems to me that it would be better in the name of information hiding if the ReferenceProcessor, which already has a total_count() method, would just pass this total instead of the entire list.
The problem is that the 'before/after' counts have to be gathered in the constructor and the destructor. By passing a parameter, the constructor could get the total, but that is impossible for the destructor. But I agree with your point that passing a DiscoveredList just to get the total count can be improved. So I changed it to add a new method on ReferenceProcessor that returns the total count (per ReferenceType). With this new approach, we can simplify a bit more, e.g. eliminate ReferenceProcessorPhaseTimes::max_gc_threads() and total_count_from_list() etc.
> This would also remove the need for the max_gc_counts() getter in ReferenceProcessorPhaseTimes afaics too.
[...]
> - "Ref Counts" vs. "Reference Counts" vs. something else in the output of the enqueue phase: I would prefer to not use abbreviations. Since we already mess up the logging output in a big way, we might also just go all the way :P
Changed to use 'Reference Counts'.

Updated webrev:
http://cr.openjdk.java.net/~sangheki/8173335/webrev.3
http://cr.openjdk.java.net/~sangheki/8173335/webrev.3_to_2/

Testing: JPRT and local test with all collectors.

[1]: with your suggestion, callers will stack-allocate ReferenceProcessorPhaseTimes, and the GCTimer can be included in ReferenceProcessorPhaseTimes, i.e. it applies only your suggestion:
http://cr.openjdk.java.net/~sangheki/8173335/webrev.3b
http://cr.openjdk.java.net/~sangheki/8173335/webrev.3b_to_2/

Thanks,
Sangheon

> Thanks,
> Thomas

From erik.helin at oracle.com Thu Jun 29 09:11:19 2017
From: erik.helin at oracle.com (Erik Helin)
Date: Thu, 29 Jun 2017 11:11:19 +0200
Subject: RFR: G1HRRSFlushLogBuffersOnVerify with remembered set verification does not work
Message-ID:

Hi all,

this patch removes the developer flag -XX:G1HRRSFlushLogBuffersOnVerify. This flag has been broken for some time and I don't see any reason for keeping it.
The flag is `false` by default so I guess this code isn't exercised all that much :/

I assume that the original intent of the flag was to perform an "update rs" phase before doing remembered set (rem set) verification. Due to the "update rs" phase, all rem sets would be complete, so verification would verify more rem set entries. However, since this code was added, update_rs has changed quite a bit, and this code hasn't kept up. It is no longer possible to call update_rs in the way this code expects.

Instead of spending time on trying to get this code up to date, I suggest we just remove it. During verification after a collection (young or mixed) we already do this kind of rem set verification (since all rem sets must then be complete since all collections currently do update_rs). If we are worried that we verify too few rem set entries during e.g. remark and cleanup, then we could for example run with very aggressive concurrent refinement.

Bug: https://bugs.openjdk.java.net/browse/JDK-8153360

Patch: http://cr.openjdk.java.net/~ehelin/8153360/00/

Test: make hotspot - this is "just" removal of code

Thanks,
Erik

From thomas.schatzl at oracle.com Thu Jun 29 09:37:36 2017
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Thu, 29 Jun 2017 11:37:36 +0200
Subject: RFR: G1HRRSFlushLogBuffersOnVerify with remembered set verification does not work
In-Reply-To:
References:
Message-ID: <1498729056.2900.4.camel@oracle.com>

Hi Erik,

On Thu, 2017-06-29 at 11:11 +0200, Erik Helin wrote:
> Hi all,
>
> this patch removes the developer flag
> -XX:G1HRRSFlushLogBuffersOnVerify.
> This flag has been broken for some time and I don't see any reason
> for keeping it. The flag is `false` by default so I guess this code
> isn't exercised all that much :/
>
> I assume that the original intent of the flag was to perform an
> "update rs" phase before doing remembered set (rem set) verification.
> Due to the "update rs" phase, all rem sets would be complete, so
> verification would verify more rem set entries. However, since this
> code was added, update_rs has changed quite a bit, and this code
> hasn't kept up. It is no longer possible to call update_rs in the way
> this code expects.
>
> Instead of spending time on trying to get this code up to date, I
> suggest we just remove it. During verification after a collection
> (young or mixed) we already do this kind of rem set verification
> (since all rem sets must then be complete since all collections
> currently do update_rs). If we are worried that we verify too few rem
> set entries during e.g. remark and cleanup, then we could for example
> run with very aggressive concurrent refinement.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8153360
>
> Patch: http://cr.openjdk.java.net/~ehelin/8153360/00/
>
> Test: make hotspot - this is "just" removal of code

Looks good. Please add a comment about what the last clause in the verification code actually means (heapRegion.cpp:584). Something like:

// Reference may not have been refined into the remembered sets yet.
// Instead of looking into all dirty card queues, we take a shortcut
// by looking at whether the corresponding card is dirty.
// ObjArrays may either be marked on the object header or exactly.

(Actually I would guess the "correct" clause here would be is_array() and not is_objArray(), but primitive type arrays are never marked as they do not contain references)

I do not need a re-review for the comment change.

Thanks,
Thomas

From robbin.ehn at oracle.com Thu Jun 29 10:49:58 2017
From: robbin.ehn at oracle.com (Robbin Ehn)
Date: Thu, 29 Jun 2017 12:49:58 +0200
Subject: RFR: Parallelize safepoint cleanup
In-Reply-To: <676d3b56-cee0-b68a-d700-e43695355148@redhat.com>
References: <486b5a72-bef8-4ebc-2729-3fe3aa3ab3b9@oracle.com> <5c80f8df-27c9-f9a9-dc6d-47f9c6019a61@redhat.com> <46ad874e-eb41-7927-265a-40dea92dfe1e@oracle.com> <5711258b-99b0-e06f-ba6e-0b6b55d88345@redhat.com> <0e1e2779-9316-b756-6cc8-e0c8add14a94@oracle.com> <1910961c-11bd-0e86-dd03-4fce66b9969f@redhat.com> <2b466176-b688-53a8-bef9-c7ec2c8c745b@oracle.com> <42872a15-d26c-9798-c6a2-f3f7c945baf7@redhat.com> <5e7c7d00-4acd-bea3-3525-33dbd9159efb@oracle.com> <6f2c6de7-298b-bf14-ab1f-430c4acd43c9@redhat.com> <5cd676de-872d-6d4a-691b-da561173f7d0@oracle.com> <61d80e98-275f-b2b8-4ac7-6d5d03b047de@redhat.com> <676d3b56-cee0-b68a-d700-e43695355148@redhat.com>
Message-ID: <1fbd2b4a-9aef-d6db-726e-929b6b466e4c@oracle.com>

Hi Roman,

I haven't had the time to test all scenarios, and the numbers are just an indication:

Do it in the VM thread, MonitorUsedDeflationThreshold=0: 0.002782s avg, avg of 10 worst cleanups 0.0173s
Do it with 4 workers, MonitorUsedDeflationThreshold=0: 0.002923s avg, avg of 10 worst cleanups 0.0199s
Do it in the VM thread, MonitorUsedDeflationThreshold=1: 0.001889s avg, avg of 10 worst cleanups 0.0066s

When MonitorUsedDeflationThreshold=0 we are talking about 120000 free monitors to deflate. And I get worse numbers doing the cleanup in 4 threads. Any idea why I see these numbers?

Thanks, Robbin

On 06/28/2017 10:23 PM, Roman Kennke wrote:
>
>> On 06/27/2017 09:47 PM, Roman Kennke wrote:
>>> Hi Robbin,
>>>
>>> Ugh. Thanks for catching this.
>>> Problem was that I was accounting the thread-local deflations twice:
>>> once in thread-local processing (basically a leftover from my earlier
>>> attempt to implement this accounting) and then again in
>>> finish_deflate_idle_monitors(). Should be fixed here:
>>>
>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.09/
>>>
>>
>> Nit:
>> safepoint.cpp : ParallelSPCleanupTask
>> "const char* name = " is not needed and 1 is unused
>>
> Sorry, I don't understand what you mean by this. I see code like this:
>
> const char* name = "deflating idle monitors";
>
> and it is used a few lines below, even 2x.
>
> What's '1 is unused' ?
>
>>>
>>> Side question: which jtreg targets do you usually run?
>>
>> Right now I cherry-pick directories from: hotspot/test/
>>
>> I'm going to add a decent test group for local testing.
> That would be good!
>
>
>>
>>>
>>> Trying: make test TEST=hotspot_all
>>> gives me *lots* of failures due to missing jcstress stuff (?!)
>>> And even other subsets seem to depend on several bits and pieces
>>> that I
>>> have no idea about.
>>
>> Yes, you need to use the internal tool 'jib' (Java Integrated Build) to get
>> that to work, or you can set some environment where the jcstress
>> application stuff is...
> Uhhh. We really do want a subset of tests that we can run reliably and
> that are self-contained, how else are people (without that jib thingy)
> supposed to do some sanity checking with their patches? ;-)
>> I have a regression on ClassLoaderData root scanning, this should not
>> be related,
>> but I only have 3 patches which could cause this, if it's not
>> something in the environment that has changed.
> Let me know if it's my patch :-) >> >> Also do not see any immediate performance gains (off vs 4 threads), it >> might be http://hg.openjdk.java.net/jdk10/hs/hotspot/rev/06994badeb24 >> , but I need to-do some more testing. I know you often run with none >> default GSI. > > First of all, during the course of this review I reduced the change from > an actual implementation to a kind of framework, and it needs some > separate changes in the GC to make use of it. Not sure if you added > corresponding code in (e.g.) G1? > > Also, this is only really visible in code that makes excessive use of > monitors, i.e. the one linked by Carsten's original patch, or the test > org.openjdk.gcbench.roots.Synchronizers.test in gc-bench: > > http://icedtea.classpath.org/hg/gc-bench/ > > There are also some popular real-world apps that tend to do this. From > the top off my head, Cassandra is such an application. > > Thanks, Roman > >> >> I'll get back to you. >> >> Thanks, Robbin >> >>> >>> Roman >>> >>> Am 27.06.2017 um 16:51 schrieb Robbin Ehn: >>>> Hi Roman, >>>> >>>> There is something wrong in calculations: >>>> INFO: Deflate: InCirc=43 InUse=18 Scavenged=25 ForceMonitorScavenge=0 >>>> : pop=27051 free=215487 >>>> >>>> free is larger than population, have not had the time to dig into this. >>>> >>>> Thanks, Robbin >>>> >>>> On 06/22/2017 10:19 PM, Roman Kennke wrote: >>>>> So here's the latest iteration of that patch: >>>>> >>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.08/ >>>>> >>>>> >>>>> I checked and fixed all the counters. The problem here is that they >>>>> are >>>>> not updated in a single place (deflate_idle_monitors() ) but in >>>>> several >>>>> places, potentially by multiple threads. I split up deflation into >>>>> prepare_.. and a finish_.. methods to initialize local and update >>>>> global >>>>> counters respectively, and pass around a counters object (allocated on >>>>> stack) to the various code paths that use it. Updating the counters >>>>> always happen under a lock, there's no need to do anything special >>>>> with >>>>> regards to concurrency. >>>>> >>>>> I also checked the nmethod marking, but there doesn't seem to be >>>>> anything in that code that looks problematic under concurrency. The >>>>> worst that can happen is that two threads write the same value into an >>>>> nmethod field. I think we can live with that ;-) >>>>> >>>>> Good to go? >>>>> >>>>> Tested by running specjvm and jcstress fastdebug+release without >>>>> issues. >>>>> >>>>> Roman >>>>> >>>>> Am 02.06.2017 um 12:39 schrieb Robbin Ehn: >>>>>> Hi Roman, >>>>>> >>>>>> On 06/02/2017 11:41 AM, Roman Kennke wrote: >>>>>>> Hi David, >>>>>>> thanks for reviewing. I'll be on vacation the next two weeks too, >>>>>>> with >>>>>>> only sporadic access to work stuff. >>>>>>> Yes, exposure will not be as good as otherwise, but it's not totally >>>>>>> untested either: the serial code path is the same as the >>>>>>> parallel, the >>>>>>> only difference is that it's not actually called by multiple >>>>>>> threads. >>>>>>> It's ok I think. >>>>>>> >>>>>>> I found two more issues that I think should be addressed: >>>>>>> - There are some counters in deflate_idle_monitors() and I'm not >>>>>>> sure I >>>>>>> correctly handle them in the split-up and MT'ed thread-local/ global >>>>>>> list deflation >>>>>>> - nmethod marking seems to unconditionally poke true or something >>>>>>> like >>>>>>> that in nmethod fields. 
This doesn't hurt correctness-wise, but it's >>>>>>> probably worth checking if it's already true, especially when doing >>>>>>> this >>>>>>> with multiple threads concurrently. >>>>>>> >>>>>>> I'll send an updated patch around later, I hope I can get to it >>>>>>> today... >>>>>> >>>>>> I'll review that when you get it out. >>>>>> I think this looks as a reasonable step before we tackle this with a >>>>>> major effort, such as the JEP you and Carsten doing. >>>>>> And another effort to 'fix' nmethods marking. >>>>>> >>>>>> Internal discussion yesterday lead us to conclude that the runtime >>>>>> will probably need more threads. >>>>>> This would be a good driver to do a 'global' worker pool which serves >>>>>> both gc, runtime and safepoints with threads. >>>>>> >>>>>>> >>>>>>> Roman >>>>>>> >>>>>>>> Hi Roman, >>>>>>>> >>>>>>>> I am about to disappear on an extended vacation so will let others >>>>>>>> pursue this. IIUC this is longer an opt-in by the user at runtime, >>>>>>>> but >>>>>>>> an opt-in by the particular GC developers. Okay. My only concern >>>>>>>> with >>>>>>>> that is if Shenandoah is the only GC that currently opts in then >>>>>>>> this >>>>>>>> code is not going to get much testing and will be more prone to >>>>>>>> incidental breakage. >>>>>> >>>>>> As I mentioned before, it seem like Erik ? have some idea, maybe he >>>>>> can do this after his barrier patch. >>>>>> >>>>>> Thanks! >>>>>> >>>>>> /Robbin >>>>>> >>>>>>>> >>>>>>>> Cheers, >>>>>>>> David >>>>>>>> >>>>>>>> On 2/06/2017 2:21 AM, Roman Kennke wrote: >>>>>>>>> Am 01.06.2017 um 17:50 schrieb Roman Kennke: >>>>>>>>>> Am 01.06.2017 um 14:18 schrieb Robbin Ehn: >>>>>>>>>>> Hi Roman, >>>>>>>>>>> >>>>>>>>>>> On 06/01/2017 11:29 AM, Roman Kennke wrote: >>>>>>>>>>>> Am 31.05.2017 um 22:06 schrieb Robbin Ehn: >>>>>>>>>>>>> Hi Roman, I agree that is really needed but: >>>>>>>>>>>>> >>>>>>>>>>>>> On 05/31/2017 10:27 AM, Roman Kennke wrote: >>>>>>>>>>>>>> I realized that sharing workers with GC is not so easy. >>>>>>>>>>>>>> >>>>>>>>>>>>>> We need to be able to use the workers at a safepoint during >>>>>>>>>>>>>> concurrent >>>>>>>>>>>>>> GC work (which also uses the same workers). This does not >>>>>>>>>>>>>> only >>>>>>>>>>>>>> require >>>>>>>>>>>>>> that those workers be suspended, like e.g. >>>>>>>>>>>>>> SuspendibleThreadSet::yield(), but they need to be idle, i.e. >>>>>>>>>>>>>> have >>>>>>>>>>>>>> finished their tasks. This needs some careful handling to >>>>>>>>>>>>>> work >>>>>>>>>>>>>> without >>>>>>>>>>>>>> races: it requires a SuspendibleThreadSetJoiner around the >>>>>>>>>>>>>> corresponding >>>>>>>>>>>>>> run_task() call and also the tasks themselves need to join >>>>>>>>>>>>>> the >>>>>>>>>>>>>> STS and >>>>>>>>>>>>>> handle requests for safepoints not by yielding, but by >>>>>>>>>>>>>> leaving >>>>>>>>>>>>>> the >>>>>>>>>>>>>> task. >>>>>>>>>>>>>> This is far too peculiar for me to make the call to hook >>>>>>>>>>>>>> up GC >>>>>>>>>>>>>> workers >>>>>>>>>>>>>> for safepoint cleanup, and I thus removed those parts. I >>>>>>>>>>>>>> left the >>>>>>>>>>>>>> API in >>>>>>>>>>>>>> CollectedHeap in place. I think GC devs who know better >>>>>>>>>>>>>> about G1 >>>>>>>>>>>>>> and CMS >>>>>>>>>>>>>> should make that call, or else just use a separate thread >>>>>>>>>>>>>> pool. >>>>>>>>>>>>>> >>>>>>>>>>>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.05/ >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Is it ok now? 
>>>>>>>>>>>>> I still think you should put the "Parallel Safepoint Cleanup" >>>>>>>>>>>>> workers >>>>>>>>>>>>> inside Shenandoah, >>>>>>>>>>>>> so the SafepointSynchronizer only calls get_safepoint_workers, >>>>>>>>>>>>> e.g.: >>>>>>>>>>>>> >>>>>>>>>>>>> _cleanup_workers = heap->get_safepoint_workers(); >>>>>>>>>>>>> _num_cleanup_workers = _cleanup_workers != NULL ? >>>>>>>>>>>>> _cleanup_workers->total_workers() : 1; >>>>>>>>>>>>> ParallelSPCleanupTask cleanup(_cleanup_subtasks); >>>>>>>>>>>>> StrongRootsScope srs(_num_cleanup_workers); >>>>>>>>>>>>> if (_cleanup_workers != NULL) { >>>>>>>>>>>>> _cleanup_workers->run_task(&cleanup, >>>>>>>>>>>>> _num_cleanup_workers); >>>>>>>>>>>>> } else { >>>>>>>>>>>>> cleanup.work(0); >>>>>>>>>>>>> } >>>>>>>>>>>>> >>>>>>>>>>>>> That way you don't even need your new flags, but it will be >>>>>>>>>>>>> up to >>>>>>>>>>>>> the >>>>>>>>>>>>> other GCs to make their worker available >>>>>>>>>>>>> or cheat with a separate workgang. >>>>>>>>>>>> I can do that, I don't mind. The question is, do we want that? >>>>>>>>>>> The problem is that we do not want to haste such decision, we >>>>>>>>>>> believe >>>>>>>>>>> there is a better solution. >>>>>>>>>>> I think you also would want another solution. >>>>>>>>>>> But it's seems like such solution with 1 'global' thread pool >>>>>>>>>>> either >>>>>>>>>>> own by GC or the VM it self is quite the undertaking. >>>>>>>>>>> Since this probably will not be done any time soon my >>>>>>>>>>> suggestion is, >>>>>>>>>>> to not hold you back (we also want this), just to make >>>>>>>>>>> the code parallel and as an intermediate step ask the GC if it >>>>>>>>>>> minds >>>>>>>>>>> sharing it's thread. >>>>>>>>>>> >>>>>>>>>>> Now when Shenandoah is merged it's possible that e.g. G1 will >>>>>>>>>>> share >>>>>>>>>>> the code for a separate thread pool, do something of it's own or >>>>>>>>>>> wait until the bigger question about thread pool(s) have been >>>>>>>>>>> resolved. >>>>>>>>>>> >>>>>>>>>>> By adding a thread pool directly to the SafepointSynchronizer >>>>>>>>>>> and >>>>>>>>>>> flags for it we might limit our future options. >>>>>>>>>>> >>>>>>>>>>>> I wouldn't call it 'cheating with a separate workgang' >>>>>>>>>>>> though. I >>>>>>>>>>>> see >>>>>>>>>>>> that both G1 and CMS suspend their worker threads at a >>>>>>>>>>>> safepoint. >>>>>>>>>>>> However: >>>>>>>>>>> Yes it's not cheating but I want decent heuristics between e.g. >>>>>>>>>>> number >>>>>>>>>>> of concurrent marking threads and parallel safepoint threads >>>>>>>>>>> since >>>>>>>>>>> they compete for cpu time. >>>>>>>>>>> As the code looks now, I think that decisions must be made by >>>>>>>>>>> the >>>>>>>>>>> GC. >>>>>>>>>> Ok, I see your point. I updated the proposed patch accordingly: >>>>>>>>>> >>>>>>>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.06/ >>>>>>>>>> >>>>>>>>> Oops. Minor mistake there. Correction: >>>>>>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.07/ >>>>>>>>> >>>>>>>>> >>>>>>>>> (Removed 'class WorkGang' from safepoint.hpp, and forgot to add it >>>>>>>>> into >>>>>>>>> collectedHeap.hpp, resulting in build failure...) 
>>>>>>>>>
>>>>>>>>> Roman
>>>>>>>>>

From rkennke at redhat.com Thu Jun 29 11:42:42 2017
From: rkennke at redhat.com (Roman Kennke)
Date: Thu, 29 Jun 2017 13:42:42 +0200
Subject: RFR: Parallelize safepoint cleanup
In-Reply-To: <1fbd2b4a-9aef-d6db-726e-929b6b466e4c@oracle.com>
References: <486b5a72-bef8-4ebc-2729-3fe3aa3ab3b9@oracle.com> <5c80f8df-27c9-f9a9-dc6d-47f9c6019a61@redhat.com> <46ad874e-eb41-7927-265a-40dea92dfe1e@oracle.com> <5711258b-99b0-e06f-ba6e-0b6b55d88345@redhat.com> <0e1e2779-9316-b756-6cc8-e0c8add14a94@oracle.com> <1910961c-11bd-0e86-dd03-4fce66b9969f@redhat.com> <2b466176-b688-53a8-bef9-c7ec2c8c745b@oracle.com> <42872a15-d26c-9798-c6a2-f3f7c945baf7@redhat.com> <5e7c7d00-4acd-bea3-3525-33dbd9159efb@oracle.com> <6f2c6de7-298b-bf14-ab1f-430c4acd43c9@redhat.com> <5cd676de-872d-6d4a-691b-da561173f7d0@oracle.com> <61d80e98-275f-b2b8-4ac7-6d5d03b047de@redhat.com> <676d3b56-cee0-b68a-d700-e43695355148@redhat.com> <1fbd2b4a-9aef-d6db-726e-929b6b466e4c@oracle.com>
Message-ID: <08391C19-4675-475C-A30D-F10B364B5AF3@redhat.com>

How many Java threads are involved in monitor inflation? Parallelization is spread by Java threads (i.e. each worker claims and deflates the monitors of one Java thread per step).

Roman

Am 29. Juni 2017 12:49:58 MESZ schrieb Robbin Ehn:
>Hi Roman,
>
>I haven't had the time to test all scenarios, and the numbers are just
>an indication:
>
>Do it VM thread, MonitorUsedDeflationThreshold=0, 0.002782s avg, avg of 10 worst cleanups 0.0173s
>Do it 4 workers, MonitorUsedDeflationThreshold=0, 0.002923s avg, avg of 10 worst cleanups 0.0199s
>Do it VM thread, MonitorUsedDeflationThreshold=1, 0.001889s avg, avg of 10 worst cleanups 0.0066s
>
>When MonitorUsedDeflationThreshold=0 we are talking about 120000 free
>monitors to deflate.
>And I get worse numbers doing the cleanup in 4 threads.
>
>Any idea why I see these numbers?
>
>Thanks, Robbin
>
>On 06/28/2017 10:23 PM, Roman Kennke wrote:
>>
>>> On 06/27/2017 09:47 PM, Roman Kennke wrote:
>>>> Hi Robbin,
>>>>
>>>> Ugh. Thanks for catching this.
>>>> Problem was that I was accounting the thread-local deflations twice:
>>>> once in thread-local processing (basically a leftover from my earlier
>>>> attempt to implement this accounting) and then again in
>>>> finish_deflate_idle_monitors(). Should be fixed here:
>>>>
>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.09/
>>>>
>>> Nit:
>>> safepoint.cpp : ParallelSPCleanupTask
>>> "const char* name = " is not needed and 1 is unused
>>>
>> Sorry, I don't understand what you mean by this. I see code like this:
>>
>> const char* name = "deflating idle monitors";
>>
>> and it is used a few lines below, even 2x.
>>
>> What's '1 is unused' ?
>>
>>>> Side question: which jtreg targets do you usually run?
>>>
>>> Right now I cherry pick directories from: hotspot/test/
>>>
>>> I'm going to add a decent test group for local testing.
>> That would be good!
>>
>>>> Trying: make test TEST=hotspot_all
>>>> gives me *lots* of failures due to missing jcstress stuff (?!)
>>>> And even other subsets seem to depend on several bits and pieces
>>>> that I have no idea about.
>>>
>>> Yes, you need to use internal tool 'jib' java integrate build to get
>>> that work or you can set some environment where the jcstress
>>> application stuff is...
>> Uhhh.
We really do want a subset of tests that we can run reliably >and >> that are self-contained, how else are people (without that jib >thingy) >> supposed to do some sanity checking with their patches? ;-) >>> I have a regression on ClassLoaderData root scanning, this should >not >>> be related, >>> but I only have 3 patches which could cause this, if it's not >>> something in the environment that have changed. >> Let me know if it's my patch :-) >>> >>> Also do not see any immediate performance gains (off vs 4 threads), >it >>> might be >http://hg.openjdk.java.net/jdk10/hs/hotspot/rev/06994badeb24 >>> , but I need to-do some more testing. I know you often run with none >>> default GSI. >> >> First of all, during the course of this review I reduced the change >from >> an actual implementation to a kind of framework, and it needs some >> separate changes in the GC to make use of it. Not sure if you added >> corresponding code in (e.g.) G1? >> >> Also, this is only really visible in code that makes excessive use of >> monitors, i.e. the one linked by Carsten's original patch, or the >test >> org.openjdk.gcbench.roots.Synchronizers.test in gc-bench: >> >> http://icedtea.classpath.org/hg/gc-bench/ >> >> There are also some popular real-world apps that tend to do this. >From >> the top off my head, Cassandra is such an application. >> >> Thanks, Roman >> >>> >>> I'll get back to you. >>> >>> Thanks, Robbin >>> >>>> >>>> Roman >>>> >>>> Am 27.06.2017 um 16:51 schrieb Robbin Ehn: >>>>> Hi Roman, >>>>> >>>>> There is something wrong in calculations: >>>>> INFO: Deflate: InCirc=43 InUse=18 Scavenged=25 >ForceMonitorScavenge=0 >>>>> : pop=27051 free=215487 >>>>> >>>>> free is larger than population, have not had the time to dig into >this. >>>>> >>>>> Thanks, Robbin >>>>> >>>>> On 06/22/2017 10:19 PM, Roman Kennke wrote: >>>>>> So here's the latest iteration of that patch: >>>>>> >>>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.08/ >>>>>> >>>>>> >>>>>> I checked and fixed all the counters. The problem here is that >they >>>>>> are >>>>>> not updated in a single place (deflate_idle_monitors() ) but in >>>>>> several >>>>>> places, potentially by multiple threads. I split up deflation >into >>>>>> prepare_.. and a finish_.. methods to initialize local and update >>>>>> global >>>>>> counters respectively, and pass around a counters object >(allocated on >>>>>> stack) to the various code paths that use it. Updating the >counters >>>>>> always happen under a lock, there's no need to do anything >special >>>>>> with >>>>>> regards to concurrency. >>>>>> >>>>>> I also checked the nmethod marking, but there doesn't seem to be >>>>>> anything in that code that looks problematic under concurrency. >The >>>>>> worst that can happen is that two threads write the same value >into an >>>>>> nmethod field. I think we can live with that ;-) >>>>>> >>>>>> Good to go? >>>>>> >>>>>> Tested by running specjvm and jcstress fastdebug+release without >>>>>> issues. >>>>>> >>>>>> Roman >>>>>> >>>>>> Am 02.06.2017 um 12:39 schrieb Robbin Ehn: >>>>>>> Hi Roman, >>>>>>> >>>>>>> On 06/02/2017 11:41 AM, Roman Kennke wrote: >>>>>>>> Hi David, >>>>>>>> thanks for reviewing. I'll be on vacation the next two weeks >too, >>>>>>>> with >>>>>>>> only sporadic access to work stuff. 
>>>>>>>> Yes, exposure will not be as good as otherwise, but it's not >totally >>>>>>>> untested either: the serial code path is the same as the >>>>>>>> parallel, the >>>>>>>> only difference is that it's not actually called by multiple >>>>>>>> threads. >>>>>>>> It's ok I think. >>>>>>>> >>>>>>>> I found two more issues that I think should be addressed: >>>>>>>> - There are some counters in deflate_idle_monitors() and I'm >not >>>>>>>> sure I >>>>>>>> correctly handle them in the split-up and MT'ed thread-local/ >global >>>>>>>> list deflation >>>>>>>> - nmethod marking seems to unconditionally poke true or >something >>>>>>>> like >>>>>>>> that in nmethod fields. This doesn't hurt correctness-wise, but >it's >>>>>>>> probably worth checking if it's already true, especially when >doing >>>>>>>> this >>>>>>>> with multiple threads concurrently. >>>>>>>> >>>>>>>> I'll send an updated patch around later, I hope I can get to it >>>>>>>> today... >>>>>>> >>>>>>> I'll review that when you get it out. >>>>>>> I think this looks as a reasonable step before we tackle this >with a >>>>>>> major effort, such as the JEP you and Carsten doing. >>>>>>> And another effort to 'fix' nmethods marking. >>>>>>> >>>>>>> Internal discussion yesterday lead us to conclude that the >runtime >>>>>>> will probably need more threads. >>>>>>> This would be a good driver to do a 'global' worker pool which >serves >>>>>>> both gc, runtime and safepoints with threads. >>>>>>> >>>>>>>> >>>>>>>> Roman >>>>>>>> >>>>>>>>> Hi Roman, >>>>>>>>> >>>>>>>>> I am about to disappear on an extended vacation so will let >others >>>>>>>>> pursue this. IIUC this is longer an opt-in by the user at >runtime, >>>>>>>>> but >>>>>>>>> an opt-in by the particular GC developers. Okay. My only >concern >>>>>>>>> with >>>>>>>>> that is if Shenandoah is the only GC that currently opts in >then >>>>>>>>> this >>>>>>>>> code is not going to get much testing and will be more prone >to >>>>>>>>> incidental breakage. >>>>>>> >>>>>>> As I mentioned before, it seem like Erik ? have some idea, maybe >he >>>>>>> can do this after his barrier patch. >>>>>>> >>>>>>> Thanks! >>>>>>> >>>>>>> /Robbin >>>>>>> >>>>>>>>> >>>>>>>>> Cheers, >>>>>>>>> David >>>>>>>>> >>>>>>>>> On 2/06/2017 2:21 AM, Roman Kennke wrote: >>>>>>>>>> Am 01.06.2017 um 17:50 schrieb Roman Kennke: >>>>>>>>>>> Am 01.06.2017 um 14:18 schrieb Robbin Ehn: >>>>>>>>>>>> Hi Roman, >>>>>>>>>>>> >>>>>>>>>>>> On 06/01/2017 11:29 AM, Roman Kennke wrote: >>>>>>>>>>>>> Am 31.05.2017 um 22:06 schrieb Robbin Ehn: >>>>>>>>>>>>>> Hi Roman, I agree that is really needed but: >>>>>>>>>>>>>> >>>>>>>>>>>>>> On 05/31/2017 10:27 AM, Roman Kennke wrote: >>>>>>>>>>>>>>> I realized that sharing workers with GC is not so easy. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> We need to be able to use the workers at a safepoint >during >>>>>>>>>>>>>>> concurrent >>>>>>>>>>>>>>> GC work (which also uses the same workers). This does >not >>>>>>>>>>>>>>> only >>>>>>>>>>>>>>> require >>>>>>>>>>>>>>> that those workers be suspended, like e.g. >>>>>>>>>>>>>>> SuspendibleThreadSet::yield(), but they need to be idle, >i.e. >>>>>>>>>>>>>>> have >>>>>>>>>>>>>>> finished their tasks. 
This needs some careful handling >to >>>>>>>>>>>>>>> work >>>>>>>>>>>>>>> without >>>>>>>>>>>>>>> races: it requires a SuspendibleThreadSetJoiner around >the >>>>>>>>>>>>>>> corresponding >>>>>>>>>>>>>>> run_task() call and also the tasks themselves need to >join >>>>>>>>>>>>>>> the >>>>>>>>>>>>>>> STS and >>>>>>>>>>>>>>> handle requests for safepoints not by yielding, but by >>>>>>>>>>>>>>> leaving >>>>>>>>>>>>>>> the >>>>>>>>>>>>>>> task. >>>>>>>>>>>>>>> This is far too peculiar for me to make the call to hook >>>>>>>>>>>>>>> up GC >>>>>>>>>>>>>>> workers >>>>>>>>>>>>>>> for safepoint cleanup, and I thus removed those parts. I >>>>>>>>>>>>>>> left the >>>>>>>>>>>>>>> API in >>>>>>>>>>>>>>> CollectedHeap in place. I think GC devs who know better >>>>>>>>>>>>>>> about G1 >>>>>>>>>>>>>>> and CMS >>>>>>>>>>>>>>> should make that call, or else just use a separate >thread >>>>>>>>>>>>>>> pool. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.05/ >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Is it ok now? >>>>>>>>>>>>>> I still think you should put the "Parallel Safepoint >Cleanup" >>>>>>>>>>>>>> workers >>>>>>>>>>>>>> inside Shenandoah, >>>>>>>>>>>>>> so the SafepointSynchronizer only calls >get_safepoint_workers, >>>>>>>>>>>>>> e.g.: >>>>>>>>>>>>>> >>>>>>>>>>>>>> _cleanup_workers = heap->get_safepoint_workers(); >>>>>>>>>>>>>> _num_cleanup_workers = _cleanup_workers != NULL ? >>>>>>>>>>>>>> _cleanup_workers->total_workers() : 1; >>>>>>>>>>>>>> ParallelSPCleanupTask cleanup(_cleanup_subtasks); >>>>>>>>>>>>>> StrongRootsScope srs(_num_cleanup_workers); >>>>>>>>>>>>>> if (_cleanup_workers != NULL) { >>>>>>>>>>>>>> _cleanup_workers->run_task(&cleanup, >>>>>>>>>>>>>> _num_cleanup_workers); >>>>>>>>>>>>>> } else { >>>>>>>>>>>>>> cleanup.work(0); >>>>>>>>>>>>>> } >>>>>>>>>>>>>> >>>>>>>>>>>>>> That way you don't even need your new flags, but it will >be >>>>>>>>>>>>>> up to >>>>>>>>>>>>>> the >>>>>>>>>>>>>> other GCs to make their worker available >>>>>>>>>>>>>> or cheat with a separate workgang. >>>>>>>>>>>>> I can do that, I don't mind. The question is, do we want >that? >>>>>>>>>>>> The problem is that we do not want to haste such decision, >we >>>>>>>>>>>> believe >>>>>>>>>>>> there is a better solution. >>>>>>>>>>>> I think you also would want another solution. >>>>>>>>>>>> But it's seems like such solution with 1 'global' thread >pool >>>>>>>>>>>> either >>>>>>>>>>>> own by GC or the VM it self is quite the undertaking. >>>>>>>>>>>> Since this probably will not be done any time soon my >>>>>>>>>>>> suggestion is, >>>>>>>>>>>> to not hold you back (we also want this), just to make >>>>>>>>>>>> the code parallel and as an intermediate step ask the GC if >it >>>>>>>>>>>> minds >>>>>>>>>>>> sharing it's thread. >>>>>>>>>>>> >>>>>>>>>>>> Now when Shenandoah is merged it's possible that e.g. G1 >will >>>>>>>>>>>> share >>>>>>>>>>>> the code for a separate thread pool, do something of it's >own or >>>>>>>>>>>> wait until the bigger question about thread pool(s) have >been >>>>>>>>>>>> resolved. >>>>>>>>>>>> >>>>>>>>>>>> By adding a thread pool directly to the >SafepointSynchronizer >>>>>>>>>>>> and >>>>>>>>>>>> flags for it we might limit our future options. >>>>>>>>>>>> >>>>>>>>>>>>> I wouldn't call it 'cheating with a separate workgang' >>>>>>>>>>>>> though. I >>>>>>>>>>>>> see >>>>>>>>>>>>> that both G1 and CMS suspend their worker threads at a >>>>>>>>>>>>> safepoint. 
>>>>>>>>>>>>> However:
>>>>>>>>>>>> Yes it's not cheating but I want decent heuristics between e.g.
>>>>>>>>>>>> number of concurrent marking threads and parallel safepoint
>>>>>>>>>>>> threads since they compete for cpu time.
>>>>>>>>>>>> As the code looks now, I think that decisions must be made by
>>>>>>>>>>>> the GC.
>>>>>>>>>>> Ok, I see your point. I updated the proposed patch accordingly:
>>>>>>>>>>>
>>>>>>>>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.06/
>>>>>>>>>>>
>>>>>>>>>> Oops. Minor mistake there. Correction:
>>>>>>>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.07/
>>>>>>>>>>
>>>>>>>>>> (Removed 'class WorkGang' from safepoint.hpp, and forgot to add it
>>>>>>>>>> into collectedHeap.hpp, resulting in build failure...)
>>>>>>>>>>
>>>>>>>>>> Roman

--
Diese Nachricht wurde von meinem Android-Gerät mit K-9 Mail gesendet.

From robbin.ehn at oracle.com Thu Jun 29 12:17:07 2017
From: robbin.ehn at oracle.com (Robbin Ehn)
Date: Thu, 29 Jun 2017 14:17:07 +0200
Subject: RFR: Parallelize safepoint cleanup
In-Reply-To: <08391C19-4675-475C-A30D-F10B364B5AF3@redhat.com>
References: <5c80f8df-27c9-f9a9-dc6d-47f9c6019a61@redhat.com> <46ad874e-eb41-7927-265a-40dea92dfe1e@oracle.com> <5711258b-99b0-e06f-ba6e-0b6b55d88345@redhat.com> <0e1e2779-9316-b756-6cc8-e0c8add14a94@oracle.com> <1910961c-11bd-0e86-dd03-4fce66b9969f@redhat.com> <2b466176-b688-53a8-bef9-c7ec2c8c745b@oracle.com> <42872a15-d26c-9798-c6a2-f3f7c945baf7@redhat.com> <5e7c7d00-4acd-bea3-3525-33dbd9159efb@oracle.com> <6f2c6de7-298b-bf14-ab1f-430c4acd43c9@redhat.com> <5cd676de-872d-6d4a-691b-da561173f7d0@oracle.com> <61d80e98-275f-b2b8-4ac7-6d5d03b047de@redhat.com> <676d3b56-cee0-b68a-d700-e43695355148@redhat.com> <1fbd2b4a-9aef-d6db-726e-929b6b466e4c@oracle.com> <08391C19-4675-475C-A30D-F10B364B5AF3@redhat.com>
Message-ID: <9a882506-282a-ec74-27de-5b22e258e352@oracle.com>

The test is using 24 threads (whatever that means), total number of Java threads is 57 (including compiler, etc...).
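(For reference, the dump below comes from a quick local hack, roughly along these lines - this is only a sketch, not part of any webrev, and the exact names are from memory:)

   // Rough sketch of the instrumentation that produced the dump below:
   // at the safepoint, walk all JavaThreads and print the length of
   // each thread's in-use monitor list (omInUseCount). Logged on error
   // level only so that it is easy to grep out of the output.
   int n = 0;
   for (JavaThread* t = Threads::first(); t != NULL; t = t->next()) {
     n++;
   }
   log_error(os)("Num threads:%d", n);
   for (JavaThread* t = Threads::first(); t != NULL; t = t->next()) {
     log_error(os)("omInUseCount:%d", t->omInUseCount);
   }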
[29.186s][error][os ] Num threads:57
[29.186s][error][os ] omInUseCount:0
[29.186s][error][os ] omInUseCount:2064
[29.187s][error][os ] omInUseCount:1861
[29.188s][error][os ] omInUseCount:1058
[29.188s][error][os ] omInUseCount:2
[29.188s][error][os ] omInUseCount:577
[29.189s][error][os ] omInUseCount:1443
[29.189s][error][os ] omInUseCount:122
[29.189s][error][os ] omInUseCount:47
[29.189s][error][os ] omInUseCount:497
[29.189s][error][os ] omInUseCount:16
[29.189s][error][os ] omInUseCount:113
[29.189s][error][os ] omInUseCount:5
[29.189s][error][os ] omInUseCount:678
[29.190s][error][os ] omInUseCount:105
[29.190s][error][os ] omInUseCount:609
[29.190s][error][os ] omInUseCount:286
[29.190s][error][os ] omInUseCount:228
[29.190s][error][os ] omInUseCount:1391
[29.191s][error][os ] omInUseCount:1652
[29.191s][error][os ] omInUseCount:325
[29.191s][error][os ] omInUseCount:439
[29.192s][error][os ] omInUseCount:994
[29.192s][error][os ] omInUseCount:103
[29.192s][error][os ] omInUseCount:2337
[29.193s][error][os ] omInUseCount:0
[29.193s][error][os ] omInUseCount:0
[29.193s][error][os ] omInUseCount:0
[29.193s][error][os ] omInUseCount:1
[29.193s][error][os ] omInUseCount:1
[29.193s][error][os ] omInUseCount:0
[29.193s][error][os ] omInUseCount:1
[29.193s][error][os ] omInUseCount:2
[29.193s][error][os ] omInUseCount:1
[29.193s][error][os ] omInUseCount:1
[29.193s][error][os ] omInUseCount:0
[29.193s][error][os ] omInUseCount:1
[29.193s][error][os ] omInUseCount:0
[29.193s][error][os ] omInUseCount:0
[29.193s][error][os ] omInUseCount:0
[29.193s][error][os ] omInUseCount:0
[29.193s][error][os ] omInUseCount:0
[29.193s][error][os ] omInUseCount:0
[29.193s][error][os ] omInUseCount:0
[29.193s][error][os ] omInUseCount:0
[29.193s][error][os ] omInUseCount:0
[29.193s][error][os ] omInUseCount:0
[29.193s][error][os ] omInUseCount:0
[29.193s][error][os ] omInUseCount:0
[29.193s][error][os ] omInUseCount:0
[29.193s][error][os ] omInUseCount:0
[29.193s][error][os ] omInUseCount:0
[29.193s][error][os ] omInUseCount:0
[29.193s][error][os ] omInUseCount:0
[29.193s][error][os ] omInUseCount:1
[29.193s][error][os ] omInUseCount:0
[29.193s][error][os ] omInUseCount:0

So in my setup, even if you parallelize the per-thread in-use monitor work, the synchronization overhead is still larger.

/Robbin

On 06/29/2017 01:42 PM, Roman Kennke wrote:
> How many Java threads are involved in monitor inflation? Parallelization is spread by Java threads (i.e. each worker claims and deflates the monitors of one Java thread per step).
>
> Roman
>
> Am 29. Juni 2017 12:49:58 MESZ schrieb Robbin Ehn:
>
> Hi Roman,
>
> I haven't had the time to test all scenarios, and the numbers are just an indication:
>
> Do it VM thread, MonitorUsedDeflationThreshold=0, 0.002782s avg, avg of 10 worst cleanups 0.0173s
> Do it 4 workers, MonitorUsedDeflationThreshold=0, 0.002923s avg, avg of 10 worst cleanups 0.0199s
> Do it VM thread, MonitorUsedDeflationThreshold=1, 0.001889s avg, avg of 10 worst cleanups 0.0066s
>
> When MonitorUsedDeflationThreshold=0 we are talking about 120000 free monitors to deflate.
> And I get worse numbers doing the cleanup in 4 threads.
>
> Any idea why I see these numbers?
>
> Thanks, Robbin
>
> On 06/28/2017 10:23 PM, Roman Kennke wrote:
>
> On 06/27/2017 09:47 PM, Roman Kennke wrote:
> Hi Robbin,
>
> Ugh. Thanks for catching this.
> Problem was that I was accounting the thread-local deflations twice: > once in thread-local processing (basically a leftover from my earlier > attempt to implement this accounting) and then again in > finish_deflate_idle_monitors(). Should be fixed here: > > http://cr.openjdk.java.net/~rkennke/8180932/webrev.09/ > > > > Nit: > safepoint.cpp : ParallelSPCleanupTask > "const char* name = " is not needed and 1 is unused > > > Sorry, I don't understand what you mean by this. I see code like this: > > const char* name = "deflating idle monitors"; > > and it is used a few lines below, even 2x. > > What's '1 is unused' ? > > > Side question: which jtreg targets do you usually run? > > > Right now I cherry pick directories from: hotspot/test/ > > I'm going to add a decent test group for local testing. > > That would be good! > > > > > Trying: make test TEST=hotspot_all > gives me *lots* of failures due to missing jcstress stuff (?!) > And even other subsets seem to depend on several bits and pieces > that I > have no idea about. > > > Yes, you need to use internal tool 'jib' java integrate build to get > that work or you can set some environment where the jcstress > application stuff is... > > Uhhh. We really do want a subset of tests that we can run reliably and > that are self-contained, how else are people (without that jib thingy) > supposed to do some sanity checking with their patches? ;-) > > I have a regression on ClassLoaderData root scanning, this should not > be related, > but I only have 3 patches which could cause this, if it's not > something in the environment that have changed. > > Let me know if it's my patch :-) > > > Also do not see any immediate performance gains (off vs 4 threads), it > might be http://hg.openjdk.java.net/jdk10/hs/hotspot/rev/06994badeb24 > , but I need to-do some more testing. I know you often run with none > default GSI. > > > First of all, during the course of this review I reduced the change from > an actual implementation to a kind of framework, and it needs some > separate changes in the GC to make use of it. Not sure if you added > corresponding code in (e.g.) G1? > > Also, this is only really visible in code that makes excessive use of > monitors, i.e. the one linked by Carsten's original patch, or the test > org.openjdk.gcbench.roots.Synchronizers.test in gc-bench: > > http://icedtea.classpath.org/hg/gc-bench/ > > There are also some popular real-world apps that tend to do this. From > the top off my head, Cassandra is such an application. > > Thanks, Roman > > > I'll get back to you. > > Thanks, Robbin > > > Roman > > Am 27.06.2017 um 16:51 schrieb Robbin Ehn: > > Hi Roman, > > There is something wrong in calculations: > INFO: Deflate: InCirc=43 InUse=18 Scavenged=25 ForceMonitorScavenge=0 > : pop=27051 free=215487 > > free is larger than population, have not had the time to dig into this. > > Thanks, Robbin > > On 06/22/2017 10:19 PM, Roman Kennke wrote: > > So here's the latest iteration of that patch: > > http://cr.openjdk.java.net/~rkennke/8180932/webrev.08/ > > > I checked and fixed all the counters. The problem here is that they > are > not updated in a single place (deflate_idle_monitors() ) but in > several > places, potentially by multiple threads. I split up deflation into > prepare_.. and a finish_.. methods to initialize local and update > global > counters respectively, and pass around a counters object (allocated on > stack) to the various code paths that use it. 
Updating the counters > always happen under a lock, there's no need to do anything special > with > regards to concurrency. > > I also checked the nmethod marking, but there doesn't seem to be > anything in that code that looks problematic under concurrency. The > worst that can happen is that two threads write the same value into an > nmethod field. I think we can live with that ;-) > > Good to go? > > Tested by running specjvm and jcstress fastdebug+release without > issues. > > Roman > > Am 02.06.2017 um 12:39 schrieb Robbin Ehn: > > Hi Roman, > > On 06/02/2017 11:41 AM, Roman Kennke wrote: > > Hi David, > thanks for reviewing. I'll be on vacation the next two weeks too, > with > only sporadic access to work stuff. > Yes, exposure will not be as good as otherwise, but it's not totally > untested either: the serial code path is the same as the > parallel, the > only difference is that it's not actually called by multiple > threads. > It's ok I think. > > I found two more issues that I think should be addressed: > - There are some counters in deflate_idle_monitors() and I'm not > sure I > correctly handle them in the split-up and MT'ed thread-local/ global > list deflation > - nmethod marking seems to unconditionally poke true or something > like > that in nmethod fields. This doesn't hurt correctness-wise, but it's > probably worth checking if it's already true, especially when doing > this > with multiple threads concurrently. > > I'll send an updated patch around later, I hope I can get to it > today... > > > I'll review that when you get it out. > I think this looks as a reasonable step before we tackle this with a > major effort, such as the JEP you and Carsten doing. > And another effort to 'fix' nmethods marking. > > Internal discussion yesterday lead us to conclude that the runtime > will probably need more threads. > This would be a good driver to do a 'global' worker pool which serves > both gc, runtime and safepoints with threads. > > > Roman > > Hi Roman, > > I am about to disappear on an extended vacation so will let others > pursue this. IIUC this is longer an opt-in by the user at runtime, > but > an opt-in by the particular GC developers. Okay. My only concern > with > that is if Shenandoah is the only GC that currently opts in then > this > code is not going to get much testing and will be more prone to > incidental breakage. > > > As I mentioned before, it seem like Erik ? have some idea, maybe he > can do this after his barrier patch. > > Thanks! > > /Robbin > > > Cheers, > David > > On 2/06/2017 2:21 AM, Roman Kennke wrote: > > Am 01.06.2017 um 17:50 schrieb Roman Kennke: > > Am 01.06.2017 um 14:18 schrieb Robbin Ehn: > > Hi Roman, > > On 06/01/2017 11:29 AM, Roman Kennke wrote: > > Am 31.05.2017 um 22:06 schrieb Robbin Ehn: > > Hi Roman, I agree that is really needed but: > > On 05/31/2017 10:27 AM, Roman Kennke wrote: > > I realized that sharing workers with GC is not so easy. > > We need to be able to use the workers at a safepoint during > concurrent > GC work (which also uses the same workers). This does not > only > require > that those workers be suspended, like e.g. > SuspendibleThreadSet::yield(), but they need to be idle, i.e. > have > finished their tasks. This needs some careful handling to > work > without > races: it requires a SuspendibleThreadSetJoiner around the > corresponding > run_task() call and also the tasks themselves need to join > the > STS and > handle requests for safepoints not by yielding, but by > leaving > the > task. 
> This is far too peculiar for me to make the call to hook > up GC > workers > for safepoint cleanup, and I thus removed those parts. I > left the > API in > CollectedHeap in place. I think GC devs who know better > about G1 > and CMS > should make that call, or else just use a separate thread > pool. > > http://cr.openjdk.java.net/~rkennke/8180932/webrev.05/ > > > Is it ok now? > > I still think you should put the "Parallel Safepoint Cleanup" > workers > inside Shenandoah, > so the SafepointSynchronizer only calls get_safepoint_workers, > e.g.: > > _cleanup_workers = heap->get_safepoint_workers(); > _num_cleanup_workers = _cleanup_workers != NULL ? > _cleanup_workers->total_workers() : 1; > ParallelSPCleanupTask cleanup(_cleanup_subtasks); > StrongRootsScope srs(_num_cleanup_workers); > if (_cleanup_workers != NULL) { > _cleanup_workers->run_task(&cleanup, > _num_cleanup_workers); > } else { > cleanup.work (0); > } > > That way you don't even need your new flags, but it will be > up to > the > other GCs to make their worker available > or cheat with a separate workgang. > > I can do that, I don't mind. The question is, do we want that? > > The problem is that we do not want to haste such decision, we > believe > there is a better solution. > I think you also would want another solution. > But it's seems like such solution with 1 'global' thread pool > either > own by GC or the VM it self is quite the undertaking. > Since this probably will not be done any time soon my > suggestion is, > to not hold you back (we also want this), just to make > the code parallel and as an intermediate step ask the GC if it > minds > sharing it's thread. > > Now when Shenandoah is merged it's possible that e.g. G1 will > share > the code for a separate thread pool, do something of it's own or > wait until the bigger question about thread pool(s) have been > resolved. > > By adding a thread pool directly to the SafepointSynchronizer > and > flags for it we might limit our future options. > > I wouldn't call it 'cheating with a separate workgang' > though. I > see > that both G1 and CMS suspend their worker threads at a > safepoint. > However: > > Yes it's not cheating but I want decent heuristics between e.g. > number > of concurrent marking threads and parallel safepoint threads > since > they compete for cpu time. > As the code looks now, I think that decisions must be made by > the > GC. > > Ok, I see your point. I updated the proposed patch accordingly: > > http://cr.openjdk.java.net/~rkennke/8180932/webrev.06/ > > > Oops. Minor mistake there. Correction: > http://cr.openjdk.java.net/~rkennke/8180932/webrev.07/ > > > (Removed 'class WorkGang' from safepoint.hpp, and forgot to add it > into > collectedHeap.hpp, resulting in build failure...) > > Roman > > > > > > > > -- > Diese Nachricht wurde von meinem Android-Ger?t mit K-9 Mail gesendet. 
From rkennke at redhat.com Thu Jun 29 18:25:58 2017
From: rkennke at redhat.com (Roman Kennke)
Date: Thu, 29 Jun 2017 20:25:58 +0200
Subject: RFR: Parallelize safepoint cleanup
In-Reply-To: <9a882506-282a-ec74-27de-5b22e258e352@oracle.com>
References: <5c80f8df-27c9-f9a9-dc6d-47f9c6019a61@redhat.com> <46ad874e-eb41-7927-265a-40dea92dfe1e@oracle.com> <5711258b-99b0-e06f-ba6e-0b6b55d88345@redhat.com> <0e1e2779-9316-b756-6cc8-e0c8add14a94@oracle.com> <1910961c-11bd-0e86-dd03-4fce66b9969f@redhat.com> <2b466176-b688-53a8-bef9-c7ec2c8c745b@oracle.com> <42872a15-d26c-9798-c6a2-f3f7c945baf7@redhat.com> <5e7c7d00-4acd-bea3-3525-33dbd9159efb@oracle.com> <6f2c6de7-298b-bf14-ab1f-430c4acd43c9@redhat.com> <5cd676de-872d-6d4a-691b-da561173f7d0@oracle.com> <61d80e98-275f-b2b8-4ac7-6d5d03b047de@redhat.com> <676d3b56-cee0-b68a-d700-e43695355148@redhat.com> <1fbd2b4a-9aef-d6db-726e-929b6b466e4c@oracle.com> <08391C19-4675-475C-A30D-F10B364B5AF3@redhat.com> <9a882506-282a-ec74-27de-5b22e258e352@oracle.com>
Message-ID: <47667919-0786-56a0-ebf9-d7c1b48766c2@redhat.com>

I just did a run with gcbench. I am running:

build/linux-x86_64-normal-server-release/images/jdk/bin/java -jar target/benchmarks.jar roots.Sync --jvmArgs "-Xmx8g -Xms8g -XX:ParallelSafepointCleanupThreads=1 -XX:-UseBiasedLocking --add-opens java.base/jdk.internal.misc=ALL-UNNAMED -XX:+PrintSafepointStatistics" -p size=500000 -wi 5 -i 5 -f 1

i.e. I am giving it 500,000 monitors per thread on 8 Java threads.

With the VMThread I am getting:

vmop [ threads: total initially_running wait_to_block ][ time: spin block sync cleanup vmop ] page_trap_count
0,646: G1IncCollectionPause [ 19 4 6 ][ 0 0 0 158 225 ] 4
1,073: G1IncCollectionPause [ 19 5 6 ][ 1 0 1 159 174 ] 5
1,961: G1IncCollectionPause [ 19 2 6 ][ 0 0 0 130 66 ] 2
2,202: G1IncCollectionPause [ 19 5 6 ][ 1 0 1 127 70 ] 5
2,445: G1IncCollectionPause [ 19 7 7 ][ 1 0 1 127 66 ] 7
2,684: G1IncCollectionPause [ 19 7 7 ][ 1 0 1 127 66 ] 7
3,371: G1IncCollectionPause [ 19 5 7 ][ 1 0 1 127 74 ] 5
3,619: G1IncCollectionPause [ 19 5 6 ][ 1 0 1 127 66 ] 5
3,857: G1IncCollectionPause [ 19 6 6 ][ 1 0 1 126 68 ] 6

I.e. it gets to fairly consistent >120us for cleanup.

With 4 safepoint cleanup threads I get:

vmop [ threads: total initially_running wait_to_block ][ time: spin block sync cleanup vmop ] page_trap_count
0,650: G1IncCollectionPause [ 19 4 6 ][ 0 0 0 63 197 ] 4
0,951: G1IncCollectionPause [ 19 0 1 ][ 0 0 0 64 151 ] 0
1,214: G1IncCollectionPause [ 19 7 8 ][ 0 0 0 62 93 ] 6
1,942: G1IncCollectionPause [ 19 4 6 ][ 1 0 1 59 71 ] 4
2,118: G1IncCollectionPause [ 19 6 6 ][ 1 0 1 59 72 ] 6
2,296: G1IncCollectionPause [ 19 5 6 ][ 0 0 0 59 69 ] 5

i.e. fairly consistently around 60 us (I think it's us?!)

I grant you that I'm throwing way way more monitors at it. With just 12000 monitors per thread I get columns of 0s under cleanup. :-)
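Just to recap why the extra cleanup threads help so much in this setup: the work is split per Java thread - each worker claims one Java thread at a time and deflates that thread's in-use monitors - so with 8 Java threads at 500,000 monitors each there are 8 sizeable, independent chunks of work. Roughly this shape (a simplified sketch, not the literal webrev code; try_claim_for_deflation() stands in for whatever the claiming mechanism ends up being, e.g. a CAS on a per-thread flag):

   // Each worker scans the thread list and tries to claim one JavaThread
   // at a time; the winner of a claim deflates that thread's in-use
   // monitor list. The claiming helper here is hypothetical.
   void work(uint worker_id) {
     for (JavaThread* t = Threads::first(); t != NULL; t = t->next()) {
       if (t->try_claim_for_deflation()) {
         ObjectSynchronizer::deflate_thread_local_monitors(t, _counters);
       }
     }
   }

Roman

Am 29.06.2017 um 14:17 schrieb Robbin Ehn:
> The test is using 24 threads (whatever that means), total number of
> Java threads is 57 (including compiler, etc...).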
> > [29.186s][error][os ] Num threads:57 > [29.186s][error][os ] omInUseCount:0 > [29.186s][error][os ] omInUseCount:2064 > [29.187s][error][os ] omInUseCount:1861 > [29.188s][error][os ] omInUseCount:1058 > [29.188s][error][os ] omInUseCount:2 > [29.188s][error][os ] omInUseCount:577 > [29.189s][error][os ] omInUseCount:1443 > [29.189s][error][os ] omInUseCount:122 > [29.189s][error][os ] omInUseCount:47 > [29.189s][error][os ] omInUseCount:497 > [29.189s][error][os ] omInUseCount:16 > [29.189s][error][os ] omInUseCount:113 > [29.189s][error][os ] omInUseCount:5 > [29.189s][error][os ] omInUseCount:678 > [29.190s][error][os ] omInUseCount:105 > [29.190s][error][os ] omInUseCount:609 > [29.190s][error][os ] omInUseCount:286 > [29.190s][error][os ] omInUseCount:228 > [29.190s][error][os ] omInUseCount:1391 > [29.191s][error][os ] omInUseCount:1652 > [29.191s][error][os ] omInUseCount:325 > [29.191s][error][os ] omInUseCount:439 > [29.192s][error][os ] omInUseCount:994 > [29.192s][error][os ] omInUseCount:103 > [29.192s][error][os ] omInUseCount:2337 > [29.193s][error][os ] omInUseCount:0 > [29.193s][error][os ] omInUseCount:0 > [29.193s][error][os ] omInUseCount:0 > [29.193s][error][os ] omInUseCount:1 > [29.193s][error][os ] omInUseCount:1 > [29.193s][error][os ] omInUseCount:0 > [29.193s][error][os ] omInUseCount:1 > [29.193s][error][os ] omInUseCount:2 > [29.193s][error][os ] omInUseCount:1 > [29.193s][error][os ] omInUseCount:1 > [29.193s][error][os ] omInUseCount:0 > [29.193s][error][os ] omInUseCount:1 > [29.193s][error][os ] omInUseCount:0 > [29.193s][error][os ] omInUseCount:0 > [29.193s][error][os ] omInUseCount:0 > [29.193s][error][os ] omInUseCount:0 > [29.193s][error][os ] omInUseCount:0 > [29.193s][error][os ] omInUseCount:0 > [29.193s][error][os ] omInUseCount:0 > [29.193s][error][os ] omInUseCount:0 > [29.193s][error][os ] omInUseCount:0 > [29.193s][error][os ] omInUseCount:0 > [29.193s][error][os ] omInUseCount:0 > [29.193s][error][os ] omInUseCount:0 > [29.193s][error][os ] omInUseCount:0 > [29.193s][error][os ] omInUseCount:0 > [29.193s][error][os ] omInUseCount:0 > [29.193s][error][os ] omInUseCount:0 > [29.193s][error][os ] omInUseCount:0 > [29.193s][error][os ] omInUseCount:1 > [29.193s][error][os ] omInUseCount:0 > [29.193s][error][os ] omInUseCount:0 > > So in my setup even if you parallel the per thread in use monitors > work the synchronization overhead is still larger. > > /Robbin > > On 06/29/2017 01:42 PM, Roman Kennke wrote: >> How many Java threads are involved in monitor Inflation ? >> Parallelization is spread by Java threads (i.e. each worker claims >> and deflates monitors of 1 java thread per step). >> >> Roman >> >> Am 29. Juni 2017 12:49:58 MESZ schrieb Robbin Ehn >> : >> >> Hi Roman, >> >> I haven't had the time to test all scenarios, and the numbers are >> just an indication: >> >> Do it VM thread, MonitorUsedDeflationThreshold=0, 0.002782s avg, >> avg of 10 worsed cleanups 0.0173s >> Do it 4 workers, MonitorUsedDeflationThreshold=0, 0.002923s avg, >> avg of 10 worsed cleanups 0.0199s >> Do it VM thread, MonitorUsedDeflationThreshold=1, 0.001889s avg, >> avg of 10 worsed cleanups 0.0066s >> >> When MonitorUsedDeflationThreshold=0 we are talking about 120000 >> free monitors to deflate. >> And I get worse numbers doing the cleanup in 4 threads. >> >> Any idea why I see these numbers? >> >> Thanks, Robbin >> >> On 06/28/2017 10:23 PM, Roman Kennke wrote: >> >> >> >> On 06/27/2017 09:47 PM, Roman Kennke wrote: >> >> Hi Robbin, >> >> Ugh. 
Thanks for catching this. >> Problem was that I was accounting the thread-local >> deflations twice: >> once in thread-local processing (basically a leftover >> from my earlier >> attempt to implement this accounting) and then again in >> finish_deflate_idle_monitors(). Should be fixed here: >> >> http://cr.openjdk.java.net/~rkennke/8180932/webrev.09/ >> >> >> >> >> Nit: >> safepoint.cpp : ParallelSPCleanupTask >> "const char* name = " is not needed and 1 is unused >> >> >> Sorry, I don't understand what you mean by this. I see code >> like this: >> >> const char* name = "deflating idle monitors"; >> >> and it is used a few lines below, even 2x. >> >> What's '1 is unused' ? >> >> >> Side question: which jtreg targets do you usually run? >> >> >> Right now I cherry pick directories from: hotspot/test/ >> >> I'm going to add a decent test group for local testing. >> >> That would be good! >> >> >> >> >> Trying: make test TEST=hotspot_all >> gives me *lots* of failures due to missing jcstress >> stuff (?!) >> And even other subsets seem to depend on several bits >> and pieces >> that I >> have no idea about. >> >> >> Yes, you need to use internal tool 'jib' java integrate >> build to get >> that work or you can set some environment where the jcstress >> application stuff is... >> >> Uhhh. We really do want a subset of tests that we can run >> reliably and >> that are self-contained, how else are people (without that >> jib thingy) >> supposed to do some sanity checking with their patches? ;-) >> >> I have a regression on ClassLoaderData root scanning, >> this should not >> be related, >> but I only have 3 patches which could cause this, if it's >> not >> something in the environment that have changed. >> >> Let me know if it's my patch :-) >> >> >> Also do not see any immediate performance gains (off vs 4 >> threads), it >> might be >> http://hg.openjdk.java.net/jdk10/hs/hotspot/rev/06994badeb24 >> , but I need to-do some more testing. I know you often >> run with none >> default GSI. >> >> >> First of all, during the course of this review I reduced the >> change from >> an actual implementation to a kind of framework, and it needs >> some >> separate changes in the GC to make use of it. Not sure if you >> added >> corresponding code in (e.g.) G1? >> >> Also, this is only really visible in code that makes >> excessive use of >> monitors, i.e. the one linked by Carsten's original patch, or >> the test >> org.openjdk.gcbench.roots.Synchronizers.test in gc-bench: >> >> http://icedtea.classpath.org/hg/gc-bench/ >> >> There are also some popular real-world apps that tend to do >> this. From >> the top off my head, Cassandra is such an application. >> >> Thanks, Roman >> >> >> I'll get back to you. >> >> Thanks, Robbin >> >> >> Roman >> >> Am 27.06.2017 um 16:51 schrieb Robbin Ehn: >> >> Hi Roman, >> >> There is something wrong in calculations: >> INFO: Deflate: InCirc=43 InUse=18 Scavenged=25 >> ForceMonitorScavenge=0 >> : pop=27051 free=215487 >> >> free is larger than population, have not had the >> time to dig into this. >> >> Thanks, Robbin >> >> On 06/22/2017 10:19 PM, Roman Kennke wrote: >> >> So here's the latest iteration of that patch: >> >> >> http://cr.openjdk.java.net/~rkennke/8180932/webrev.08/ >> >> >> >> I checked and fixed all the counters. The >> problem here is that they >> are >> not updated in a single place >> (deflate_idle_monitors() ) but in >> several >> places, potentially by multiple threads. I >> split up deflation into >> prepare_.. and a finish_.. 
methods to >> initialize local and update >> global >> counters respectively, and pass around a >> counters object (allocated on >> stack) to the various code paths that use it. >> Updating the counters >> always happen under a lock, there's no need >> to do anything special >> with >> regards to concurrency. >> >> I also checked the nmethod marking, but there >> doesn't seem to be >> anything in that code that looks problematic >> under concurrency. The >> worst that can happen is that two threads >> write the same value into an >> nmethod field. I think we can live with that ;-) >> >> Good to go? >> >> Tested by running specjvm and jcstress >> fastdebug+release without >> issues. >> >> Roman >> >> Am 02.06.2017 um 12:39 schrieb Robbin Ehn: >> >> Hi Roman, >> >> On 06/02/2017 11:41 AM, Roman Kennke wrote: >> >> Hi David, >> thanks for reviewing. I'll be on >> vacation the next two weeks too, >> with >> only sporadic access to work stuff. >> Yes, exposure will not be as good as >> otherwise, but it's not totally >> untested either: the serial code path >> is the same as the >> parallel, the >> only difference is that it's not >> actually called by multiple >> threads. >> It's ok I think. >> >> I found two more issues that I think >> should be addressed: >> - There are some counters in >> deflate_idle_monitors() and I'm not >> sure I >> correctly handle them in the split-up >> and MT'ed thread-local/ global >> list deflation >> - nmethod marking seems to >> unconditionally poke true or something >> like >> that in nmethod fields. This doesn't >> hurt correctness-wise, but it's >> probably worth checking if it's >> already true, especially when doing >> this >> with multiple threads concurrently. >> >> I'll send an updated patch around >> later, I hope I can get to it >> today... >> >> >> I'll review that when you get it out. >> I think this looks as a reasonable step >> before we tackle this with a >> major effort, such as the JEP you and >> Carsten doing. >> And another effort to 'fix' nmethods >> marking. >> >> Internal discussion yesterday lead us to >> conclude that the runtime >> will probably need more threads. >> This would be a good driver to do a >> 'global' worker pool which serves >> both gc, runtime and safepoints with >> threads. >> >> >> Roman >> >> Hi Roman, >> >> I am about to disappear on an >> extended vacation so will let others >> pursue this. IIUC this is longer >> an opt-in by the user at runtime, >> but >> an opt-in by the particular GC >> developers. Okay. My only concern >> with >> that is if Shenandoah is the only >> GC that currently opts in then >> this >> code is not going to get much >> testing and will be more prone to >> incidental breakage. >> >> >> As I mentioned before, it seem like Erik >> ? have some idea, maybe he >> can do this after his barrier patch. >> >> Thanks! >> >> /Robbin >> >> >> Cheers, >> David >> >> On 2/06/2017 2:21 AM, Roman >> Kennke wrote: >> >> Am 01.06.2017 um 17:50 >> schrieb Roman Kennke: >> >> Am 01.06.2017 um 14:18 >> schrieb Robbin Ehn: >> >> Hi Roman, >> >> On 06/01/2017 11:29 >> AM, Roman Kennke wrote: >> >> Am 31.05.2017 um >> 22:06 schrieb Robbin Ehn: >> >> Hi Roman, I >> agree that is really needed but: >> >> On 05/31/2017 >> 10:27 AM, Roman Kennke wrote: >> >> I >> realized that sharing workers with GC is not so easy. >> >> We need >> to be able to use the workers at a safepoint during >> concurrent >> GC work >> (which also uses the same workers). 
This does not >> only >> require >> that >> those workers be suspended, like e.g. >> >> SuspendibleThreadSet::yield(), but they need to be idle, i.e. >> have >> finished >> their tasks. This needs some careful handling to >> work >> without >> races: it >> requires a SuspendibleThreadSetJoiner around the >> >> corresponding >> >> run_task() call and also the tasks themselves need to join >> the >> STS and >> handle >> requests for safepoints not by yielding, but by >> leaving >> the >> task. >> This is >> far too peculiar for me to make the call to hook >> up GC >> workers >> for >> safepoint cleanup, and I thus removed those parts. I >> left the >> API in >> >> CollectedHeap in place. I think GC devs who know better >> about G1 >> and CMS >> should >> make that call, or else just use a separate thread >> pool. >> >> >> http://cr.openjdk.java.net/~rkennke/8180932/webrev.05/ >> >> >> >> Is it ok >> now? >> >> I still think >> you should put the "Parallel Safepoint Cleanup" >> workers >> inside >> Shenandoah, >> so the >> SafepointSynchronizer only calls get_safepoint_workers, >> e.g.: >> >> >> _cleanup_workers = heap->get_safepoint_workers(); >> >> _num_cleanup_workers = _cleanup_workers != NULL ? >> >> _cleanup_workers->total_workers() : 1; >> >> ParallelSPCleanupTask cleanup(_cleanup_subtasks); >> >> StrongRootsScope srs(_num_cleanup_workers); >> if >> (_cleanup_workers != NULL) { >> >> _cleanup_workers->run_task(&cleanup, >> >> _num_cleanup_workers); >> } else { >> cleanup.work >> (0); >> } >> >> That way you >> don't even need your new flags, but it will be >> up to >> the >> other GCs to >> make their worker available >> or cheat with >> a separate workgang. >> >> I can do that, I >> don't mind. The question is, do we want that? >> >> The problem is that >> we do not want to haste such decision, we >> believe >> there is a better >> solution. >> I think you also >> would want another solution. >> But it's seems like >> such solution with 1 'global' thread pool >> either >> own by GC or the VM >> it self is quite the undertaking. >> Since this probably >> will not be done any time soon my >> suggestion is, >> to not hold you back >> (we also want this), just to make >> the code parallel and >> as an intermediate step ask the GC if it >> minds >> sharing it's thread. >> >> Now when Shenandoah >> is merged it's possible that e.g. G1 will >> share >> the code for a >> separate thread pool, do something of it's own or >> wait until the bigger >> question about thread pool(s) have been >> resolved. >> >> By adding a thread >> pool directly to the SafepointSynchronizer >> and >> flags for it we might >> limit our future options. >> >> I wouldn't call >> it 'cheating with a separate workgang' >> though. I >> see >> that both G1 and >> CMS suspend their worker threads at a >> safepoint. >> However: >> >> Yes it's not cheating >> but I want decent heuristics between e.g. >> number >> of concurrent marking >> threads and parallel safepoint threads >> since >> they compete for cpu >> time. >> As the code looks >> now, I think that decisions must be made by >> the >> GC. >> >> Ok, I see your point. I >> updated the proposed patch accordingly: >> >> >> http://cr.openjdk.java.net/~rkennke/8180932/webrev.06/ >> >> >> >> Oops. Minor mistake there. >> Correction: >> >> http://cr.openjdk.java.net/~rkennke/8180932/webrev.07/ >> >> >> >> (Removed 'class WorkGang' >> from safepoint.hpp, and forgot to add it >> into >> collectedHeap.hpp, resulting >> in build failure...) 
>>
>> Roman
>>
>> --
>> Diese Nachricht wurde von meinem Android-Gerät mit K-9 Mail gesendet.

From robbin.ehn at oracle.com Thu Jun 29 19:27:15 2017
From: robbin.ehn at oracle.com (Robbin Ehn)
Date: Thu, 29 Jun 2017 21:27:15 +0200
Subject: RFR: Parallelize safepoint cleanup
In-Reply-To: <47667919-0786-56a0-ebf9-d7c1b48766c2@redhat.com>
References: <5c80f8df-27c9-f9a9-dc6d-47f9c6019a61@redhat.com> <46ad874e-eb41-7927-265a-40dea92dfe1e@oracle.com> <5711258b-99b0-e06f-ba6e-0b6b55d88345@redhat.com> <0e1e2779-9316-b756-6cc8-e0c8add14a94@oracle.com> <1910961c-11bd-0e86-dd03-4fce66b9969f@redhat.com> <2b466176-b688-53a8-bef9-c7ec2c8c745b@oracle.com> <42872a15-d26c-9798-c6a2-f3f7c945baf7@redhat.com> <5e7c7d00-4acd-bea3-3525-33dbd9159efb@oracle.com> <6f2c6de7-298b-bf14-ab1f-430c4acd43c9@redhat.com> <5cd676de-872d-6d4a-691b-da561173f7d0@oracle.com> <61d80e98-275f-b2b8-4ac7-6d5d03b047de@redhat.com> <676d3b56-cee0-b68a-d700-e43695355148@redhat.com> <1fbd2b4a-9aef-d6db-726e-929b6b466e4c@oracle.com> <08391C19-4675-475C-A30D-F10B364B5AF3@redhat.com> <9a882506-282a-ec74-27de-5b22e258e352@oracle.com> <47667919-0786-56a0-ebf9-d7c1b48766c2@redhat.com>
Message-ID: <72d197f7-a99b-84bc-26f7-c9a84da26ccd@oracle.com>

Hi Roman,

Thanks. There seems to be a performance gain vs the old code just running in the VM thread (again shaky numbers, but an indication):

Old code,        MonitorUsedDeflationThreshold=0, 0.003099s, avg of 10 worst cleanups 0.0213s
Do it VM thread, MonitorUsedDeflationThreshold=0, 0.002782s, avg of 10 worst cleanups 0.0173s

I'm assuming that combining deflation and nmethod marking in the same pass is the reason for this.

Great! I'm happy, looks good! Thanks for fixing!

/Robbin

On 06/29/2017 08:25 PM, Roman Kennke wrote:
> I just did a run with gcbench. I am running:
>
> build/linux-x86_64-normal-server-release/images/jdk/bin/java -jar target/benchmarks.jar roots.Sync --jvmArgs "-Xmx8g -Xms8g -XX:ParallelSafepointCleanupThreads=1 -XX:-UseBiasedLocking --add-opens java.base/jdk.internal.misc=ALL-UNNAMED -XX:+PrintSafepointStatistics" -p size=500000 -wi 5 -i 5 -f 1
>
> i.e. I am giving it 500,000 monitors per thread on 8 Java threads.
>
> With the VMThread I am getting:
>
> vmop [ threads: total initially_running wait_to_block ][ time: spin block sync cleanup vmop ] page_trap_count
> 0,646: G1IncCollectionPause [ 19 4 6 ][ 0 0 0 158 225 ] 4
> 1,073: G1IncCollectionPause [ 19 5 6 ][ 1 0 1 159 174 ] 5
> 1,961: G1IncCollectionPause [ 19 2 6 ][ 0 0 0 130 66 ] 2
> 2,202: G1IncCollectionPause [ 19 5 6 ][ 1 0 1 127 70 ] 5
> 2,445: G1IncCollectionPause [ 19 7 7 ][ 1 0 1 127 66 ] 7
> 2,684: G1IncCollectionPause [ 19 7 7 ][ 1 0 1 127 66 ] 7
> 3,371: G1IncCollectionPause [ 19 5 7 ][ 1 0 1 127 74 ] 5
> 3,619: G1IncCollectionPause [ 19 5 6 ][ 1 0 1 127 66 ] 5
> 3,857: G1IncCollectionPause [ 19 6 6 ][ 1 0 1 126 68 ] 6
>
> I.e. it gets to fairly consistent >120us for cleanup.
>
> With 4 safepoint cleanup threads I get:
>
> vmop [ threads: total initially_running wait_to_block ][ time: spin block sync cleanup vmop ] page_trap_count
> 0,650: G1IncCollectionPause [ 19 4 6 ][ 0 0 0 63 197 ] 4
> 0,951: G1IncCollectionPause [ 19 0 1 ][ 0 0 0 64 151 ] 0
> 1,214: G1IncCollectionPause [ 19 7 8 ][ 0 0 0 62 93 ] 6
> 1,942: G1IncCollectionPause [ 19 4 6 ][ 1 0 1 59 71 ] 4
> 2,118: G1IncCollectionPause [ 19 6 6 ][ 1 0 1 59 72 ] 6
> 2,296: G1IncCollectionPause [ 19 5 6 ][ 0 0 0 59 69 ] 5
>
> i.e. fairly consistently around 60 us (I think it's us?!)
> I grant you that I'm throwing way way more monitors at it. With just
> 12000 monitors per thread I get columns of 0s under cleanup. :-)
>
> Roman
>
> Here's with 1 t

On 29.06.2017 at 14:17, Robbin Ehn wrote:
>> The test is using 24 threads (whatever that means), the total number of
>> java threads is 57 (including compiler, etc...).
>>
>> [29.186s][error][os ] Num threads:57
>> [29.186s][error][os ] omInUseCount:0
>> [29.186s][error][os ] omInUseCount:2064
>> [29.187s][error][os ] omInUseCount:1861
>> [29.188s][error][os ] omInUseCount:1058
>> [29.188s][error][os ] omInUseCount:2
>> [29.188s][error][os ] omInUseCount:577
>> [29.189s][error][os ] omInUseCount:1443
>> [29.189s][error][os ] omInUseCount:122
>> [29.189s][error][os ] omInUseCount:47
>> [29.189s][error][os ] omInUseCount:497
>> [29.189s][error][os ] omInUseCount:16
>> [29.189s][error][os ] omInUseCount:113
>> [29.189s][error][os ] omInUseCount:5
>> [29.189s][error][os ] omInUseCount:678
>> [29.190s][error][os ] omInUseCount:105
>> [29.190s][error][os ] omInUseCount:609
>> [29.190s][error][os ] omInUseCount:286
>> [29.190s][error][os ] omInUseCount:228
>> [29.190s][error][os ] omInUseCount:1391
>> [29.191s][error][os ] omInUseCount:1652
>> [29.191s][error][os ] omInUseCount:325
>> [29.191s][error][os ] omInUseCount:439
>> [29.192s][error][os ] omInUseCount:994
>> [29.192s][error][os ] omInUseCount:103
>> [29.192s][error][os ] omInUseCount:2337
>> [29.193s][error][os ] omInUseCount:0
>> [29.193s][error][os ] omInUseCount:0
>> [29.193s][error][os ] omInUseCount:0
>> [29.193s][error][os ] omInUseCount:1
>> [29.193s][error][os ] omInUseCount:1
>> [29.193s][error][os ] omInUseCount:0
>> [29.193s][error][os ] omInUseCount:1
>> [29.193s][error][os ] omInUseCount:2
>> [29.193s][error][os ] omInUseCount:1
>> [29.193s][error][os ] omInUseCount:1
>> [29.193s][error][os ] omInUseCount:0
>> [29.193s][error][os ] omInUseCount:1
>> [29.193s][error][os ] omInUseCount:0
>> [29.193s][error][os ] omInUseCount:0
>> [29.193s][error][os ] omInUseCount:0
>> [29.193s][error][os ] omInUseCount:0
>> [29.193s][error][os ] omInUseCount:0
>> [29.193s][error][os ] omInUseCount:0
>> [29.193s][error][os ] omInUseCount:0
>> [29.193s][error][os ] omInUseCount:0
>> [29.193s][error][os ] omInUseCount:0
>> [29.193s][error][os ] omInUseCount:0
>> [29.193s][error][os ] omInUseCount:0
>> [29.193s][error][os ] omInUseCount:0
>> [29.193s][error][os ] omInUseCount:0
>> [29.193s][error][os ] omInUseCount:0
>> [29.193s][error][os ] omInUseCount:0
>> [29.193s][error][os ] omInUseCount:0
>> [29.193s][error][os ] omInUseCount:1
>> [29.193s][error][os ] omInUseCount:0
>> [29.193s][error][os ] omInUseCount:0
>>
>> So in my setup, even if you parallelize the per-thread in-use monitor
>> work, the synchronization overhead is still larger.
>>
>> /Robbin
>>
>> On 06/29/2017 01:42 PM, Roman Kennke wrote:
>>> How many Java threads are involved in monitor inflation?
>>> Parallelization is spread by Java threads (i.e. each worker claims
>>> and deflates the monitors of 1 java thread per step).
>>>
>>> Roman
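To make that claiming scheme concrete, here is a rough sketch: one worker
atomically claims one Java thread at a time and deflates that thread's
in-use monitors. The task class, the thread snapshot, and the exact entry
point are assumptions for illustration, not the actual webrev code:

class DeflateThreadLocalMonitorsTask : public AbstractGangTask { // hypothetical name
  GrowableArray<JavaThread*>* _threads;  // snapshot of all Java threads
  DeflateMonitorCounters*     _counters; // shared counters, updated under a lock
  volatile jint               _claim;    // index of the next unclaimed thread
public:
  DeflateThreadLocalMonitorsTask(GrowableArray<JavaThread*>* threads,
                                 DeflateMonitorCounters* counters) :
    AbstractGangTask("Deflate thread-local monitors"),
    _threads(threads), _counters(counters), _claim(0) { }

  virtual void work(uint worker_id) {
    for (;;) {
      jint i = Atomic::add(1, &_claim) - 1;  // claim one Java thread per step
      if (i >= _threads->length()) return;   // all threads claimed; worker is done
      // deflate the idle monitors on this thread's in-use list (omInUseList)
      ObjectSynchronizer::deflate_thread_local_monitors(_threads->at(i), _counters);
    }
  }
};

Since each thread's omInUseList is claimed by exactly one worker, the
per-list walk itself needs no synchronization; only the shared counters do.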
>>>
>>> On 29 June 2017 12:49:58 CEST, Robbin Ehn wrote:
>>>
>>> Hi Roman,
>>>
>>> I haven't had the time to test all scenarios, and the numbers are
>>> just an indication:
>>>
>>> Do it in the VM thread, MonitorUsedDeflationThreshold=0, 0.002782s avg,
>>> avg of 10 worst cleanups 0.0173s
>>> Do it in 4 workers, MonitorUsedDeflationThreshold=0, 0.002923s avg,
>>> avg of 10 worst cleanups 0.0199s
>>> Do it in the VM thread, MonitorUsedDeflationThreshold=1, 0.001889s avg,
>>> avg of 10 worst cleanups 0.0066s
>>>
>>> When MonitorUsedDeflationThreshold=0 we are talking about 120000
>>> free monitors to deflate.
>>> And I get worse numbers doing the cleanup in 4 threads.
>>>
>>> Any idea why I see these numbers?
>>>
>>> Thanks, Robbin
>>>
>>> On 06/28/2017 10:23 PM, Roman Kennke wrote:
>>>
>>> On 06/27/2017 09:47 PM, Roman Kennke wrote:
>>>
>>> Hi Robbin,
>>>
>>> Ugh. Thanks for catching this.
>>> Problem was that I was accounting the thread-local deflations twice:
>>> once in thread-local processing (basically a leftover from my earlier
>>> attempt to implement this accounting) and then again in
>>> finish_deflate_idle_monitors(). Should be fixed here:
>>>
>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.09/
>>>
>>> Nit:
>>> safepoint.cpp : ParallelSPCleanupTask
>>> "const char* name = " is not needed and 1 is unused
>>>
>>> Sorry, I don't understand what you mean by this. I see code like this:
>>>
>>> const char* name = "deflating idle monitors";
>>>
>>> and it is used a few lines below, even 2x.
>>>
>>> What's '1 is unused'?
>>>
>>> Side question: which jtreg targets do you usually run?
>>>
>>> Right now I cherry-pick directories from: hotspot/test/
>>>
>>> I'm going to add a decent test group for local testing.
>>>
>>> That would be good!
>>>
>>> Trying: make test TEST=hotspot_all
>>> gives me *lots* of failures due to missing jcstress stuff (?!)
>>> And even other subsets seem to depend on several bits and pieces that
>>> I have no idea about.
>>>
>>> Yes, you need to use the internal tool 'jib' (java integrate build) to
>>> get that to work, or you can set up some environment where the
>>> jcstress application stuff is...
>>>
>>> Uhhh. We really do want a subset of tests that we can run reliably and
>>> that are self-contained, how else are people (without that jib thingy)
>>> supposed to do some sanity checking with their patches? ;-)
>>>
>>> I have a regression on ClassLoaderData root scanning; this should not
>>> be related, but I only have 3 patches which could cause this, if it's
>>> not something in the environment that has changed.
>>>
>>> Let me know if it's my patch :-)
>>>
>>> Also, I do not see any immediate performance gains (off vs 4 threads);
>>> it might be
>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/rev/06994badeb24
>>> , but I need to do some more testing. I know you often run with
>>> non-default GSI.
>>>
>>> First of all, during the course of this review I reduced the change
>>> from an actual implementation to a kind of framework, and it needs
>>> some separate changes in the GC to make use of it. Not sure if you
>>> added corresponding code in (e.g.) G1?
>>>
>>> Also, this is only really visible in code that makes excessive use of
>>> monitors, i.e.
>>> the one linked by Carsten's original patch, or the test
>>> org.openjdk.gcbench.roots.Synchronizers.test in gc-bench:
>>>
>>> http://icedtea.classpath.org/hg/gc-bench/
>>>
>>> There are also some popular real-world apps that tend to do this. From
>>> the top of my head, Cassandra is such an application.
>>>
>>> Thanks, Roman
>>>
>>> I'll get back to you.
>>>
>>> Thanks, Robbin
>>>
>>> Roman
>>>
>>> On 27.06.2017 at 16:51, Robbin Ehn wrote:
>>>
>>> Hi Roman,
>>>
>>> There is something wrong in the calculations:
>>> INFO: Deflate: InCirc=43 InUse=18 Scavenged=25 ForceMonitorScavenge=0
>>> : pop=27051 free=215487
>>>
>>> free is larger than population, have not had the time to dig into this.
>>>
>>> Thanks, Robbin
>>>
>>> On 06/22/2017 10:19 PM, Roman Kennke wrote:
>>>
>>> So here's the latest iteration of that patch:
>>>
>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.08/
>>>
>>> I checked and fixed all the counters. The problem here is that they
>>> are not updated in a single place (deflate_idle_monitors()) but in
>>> several places, potentially by multiple threads. I split up deflation
>>> into prepare_.. and finish_.. methods to initialize local and update
>>> global counters respectively, and pass around a counters object
>>> (allocated on the stack) to the various code paths that use it.
>>> Updating the counters always happens under a lock, so there's no need
>>> to do anything special with regards to concurrency.
>>>
>>> I also checked the nmethod marking, but there doesn't seem to be
>>> anything in that code that looks problematic under concurrency. The
>>> worst that can happen is that two threads write the same value into an
>>> nmethod field. I think we can live with that ;-)
>>>
>>> Good to go?
>>>
>>> Tested by running specjvm and jcstress, fastdebug+release, without
>>> issues.
>>>
>>> Roman
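In outline, the stack-allocated counters object and the prepare_/finish_
split described above could look like this. This is a sketch in the spirit
of the webrev, not its exact code; the gMonitor* globals stand in for the
synchronizer's global monitor statistics:

struct DeflateMonitorCounters {
  int nInuse;          // monitors currently associated with objects
  int nInCirculation;  // extant monitors
  int nScavenged;      // monitors reclaimed during this cleanup
};

void ObjectSynchronizer::prepare_deflate_idle_monitors(DeflateMonitorCounters* counters) {
  // the caller allocates the counters on its stack; local counting starts at zero
  counters->nInuse = 0;
  counters->nInCirculation = 0;
  counters->nScavenged = 0;
}

void ObjectSynchronizer::finish_deflate_idle_monitors(DeflateMonitorCounters* counters) {
  // fold the locally accumulated numbers into the global statistics
  // exactly once, after all (possibly parallel) deflation paths have run
  gMonitorFreeCount += counters->nScavenged;
  gMonitorPopulation = counters->nInCirculation;
}

Each deflation path then updates the same counters object under the list
lock, so a single global update at the end suffices.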
>>> On 02.06.2017 at 12:39, Robbin Ehn wrote:
>>>
>>> Hi Roman,
>>>
>>> On 06/02/2017 11:41 AM, Roman Kennke wrote:
>>>
>>> Hi David,
>>> thanks for reviewing. I'll be on vacation the next two weeks too, with
>>> only sporadic access to work stuff.
>>> Yes, exposure will not be as good as otherwise, but it's not totally
>>> untested either: the serial code path is the same as the parallel one,
>>> the only difference is that it's not actually called by multiple
>>> threads. It's ok I think.
>>>
>>> I found two more issues that I think should be addressed:
>>> - There are some counters in deflate_idle_monitors() and I'm not sure
>>> I correctly handle them in the split-up and MT'ed thread-local/global
>>> list deflation
>>> - nmethod marking seems to unconditionally poke true or something like
>>> that into nmethod fields. This doesn't hurt correctness-wise, but it's
>>> probably worth checking if it's already true, especially when doing
>>> this with multiple threads concurrently.
>>>
>>> I'll send an updated patch around later, I hope I can get to it
>>> today...
>>>
>>> I'll review that when you get it out.
>>> I think this looks like a reasonable step before we tackle this with a
>>> major effort, such as the JEP you and Carsten are doing.
>>> And another effort to 'fix' nmethod marking.
>>>
>>> Internal discussion yesterday led us to conclude that the runtime will
>>> probably need more threads. This would be a good driver to do a
>>> 'global' worker pool which serves gc, runtime and safepoints alike
>>> with threads.
>>>
>>> Roman
>>>
>>> Hi Roman,
>>>
>>> I am about to disappear on an extended vacation so will let others
>>> pursue this. IIUC this is no longer an opt-in by the user at runtime,
>>> but an opt-in by the particular GC developers. Okay. My only concern
>>> with that is if Shenandoah is the only GC that currently opts in, then
>>> this code is not going to get much testing and will be more prone to
>>> incidental breakage.
>>>
>>> As I mentioned before, it seems like Erik Ö has some idea, maybe he
>>> can do this after his barrier patch.
>>>
>>> Thanks!
>>>
>>> /Robbin
>>>
>>> Cheers,
>>> David
>>>
>>> On 2/06/2017 2:21 AM, Roman Kennke wrote:
>>> On 01.06.2017 at 17:50, Roman Kennke wrote:
>>> On 01.06.2017 at 14:18, Robbin Ehn wrote:
>>> On 06/01/2017 11:29 AM, Roman Kennke wrote:
>>> On 31.05.2017 at 22:06, Robbin Ehn wrote:
>>> On 05/31/2017 10:27 AM, Roman Kennke wrote:
>>> [...]
>>> Roman
>>>
>>> --
>>> This message was sent from my Android device with K-9 Mail.

From rkennke at redhat.com  Thu Jun 29 20:04:18 2017
From: rkennke at redhat.com (Roman Kennke)
Date: Thu, 29 Jun 2017 22:04:18 +0200
Subject: RFR: Parallelize safepoint cleanup
In-Reply-To: <72d197f7-a99b-84bc-26f7-c9a84da26ccd@oracle.com>
References: <46ad874e-eb41-7927-265a-40dea92dfe1e@oracle.com>
 <5711258b-99b0-e06f-ba6e-0b6b55d88345@redhat.com>
 <0e1e2779-9316-b756-6cc8-e0c8add14a94@oracle.com>
 <1910961c-11bd-0e86-dd03-4fce66b9969f@redhat.com>
 <2b466176-b688-53a8-bef9-c7ec2c8c745b@oracle.com>
 <42872a15-d26c-9798-c6a2-f3f7c945baf7@redhat.com>
 <5e7c7d00-4acd-bea3-3525-33dbd9159efb@oracle.com>
 <6f2c6de7-298b-bf14-ab1f-430c4acd43c9@redhat.com>
 <5cd676de-872d-6d4a-691b-da561173f7d0@oracle.com>
 <61d80e98-275f-b2b8-4ac7-6d5d03b047de@redhat.com>
 <676d3b56-cee0-b68a-d700-e43695355148@redhat.com>
 <1fbd2b4a-9aef-d6db-726e-929b6b466e4c@oracle.com>
 <08391C19-4675-475C-A30D-F10B364B5AF3@redhat.com>
 <9a882506-282a-ec74-27de-5b22e258e352@oracle.com>
 <47667919-0786-56a0-ebf9-d7c1b48766c2@redhat.com>
 <72d197f7-a99b-84bc-26f7-c9a84da26ccd@oracle.com>
Message-ID: <8dfc2752-36f1-4444-243a-975818c7dc92@redhat.com>

On 29.06.2017 at 21:27, Robbin Ehn wrote:
> Hi Roman,
>
> Thanks,
>
> There seems to be a performance gain vs the old way of just running the
> VM thread (again shaky numbers, but an indication):
>
> Old code, MonitorUsedDeflationThreshold=0: 0.003099s, avg of 10 worst
> cleanups 0.0213s
> VM thread, MonitorUsedDeflationThreshold=0: 0.002782s, avg of 10 worst
> cleanups 0.0173s
>
> I'm assuming that combining deflation and nmethod marking in the same
> pass is the reason for this.
> Great!

Yes, that seems likely. Thanks for your patient reviewing and testing!
:-)

Also, the real winner (for me) was merging deflation and nmethod marking
into the GC pass (as proposed in the very first patch). This parallelizes
much better, because the GC can do other (root marking) work in parallel.
Unfortunately, this is currently not possible with the other OpenJDK GCs,
because they all use preserve_marks() and restore_marks() before/after GC
to store away the mark words... but we need those mark words for
deflation. (Shenandoah doesn't do this, and can thus benefit from this
optimization, and I suppose G1 could do it too -- after all, it shouldn't
require the object header for marking, right?)

Roman
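For context, a simplified sketch of why deflation depends on the mark
words that preserve_marks()/restore_marks() squirrel away. The names
follow hotspot's markOop/ObjectMonitor types, but this is a hedged
illustration, not the actual synchronizer.cpp code:

// An inflated lock's object header encodes a pointer to its ObjectMonitor;
// deflation swings the original (displaced) header back into the object.
markOop mark = obj->mark();
if (mark->has_monitor()) {
  ObjectMonitor* mid = mark->monitor();  // header points at the monitor
  markOop dmw = mid->header();           // displaced mark word, saved at inflation
  obj->release_set_mark(dmw);            // restoring the header == deflating
}
// If the GC has preserved and overwritten the header (preserve_marks()),
// the monitor pointer is not there to find, so deflation cannot run
// inside that window.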
From erik.helin at oracle.com  Fri Jun 30 09:37:12 2017
From: erik.helin at oracle.com (Erik Helin)
Date: Fri, 30 Jun 2017 11:37:12 +0200
Subject: RFR: 8183281: Remove unnecessary call to increment_gc_time_stamp
Message-ID: <646a5d9a-b6d3-82c9-3937-027c3193d4c0@oracle.com>

Hi all,

the following small patch removes an unnecessary call to
increment_gc_time_stamp from
G1CollectedHeap::do_collection_pause_at_safepoint (and the long, wrong
comment above the call).

We already call increment_gc_time_stamp much earlier in
do_collection_pause_at_safepoint, which is enough. The reasons outlined
in the comment motivating a second call are no longer true; the code has
changed (but the comment has not).

Bug: https://bugs.openjdk.java.net/browse/JDK-8183281
Patch: see below
Testing: make hotspot

Thanks,
Erik

# HG changeset patch
# User ehelin
# Date 1498814642 -7200
#      Fri Jun 30 11:24:02 2017 +0200
# Node ID 62400b3cbec4e0d06e0d6c21c9486070d8c906a4
# Parent  10ccf0a5f63fdca04d9eda2c774ccdd0e12bc1a1
8183281: Remove unnecessary call to increment_gc_time_stamp

diff -r 10ccf0a5f63f -r 62400b3cbec4 src/share/vm/gc/g1/g1CollectedHeap.cpp
--- a/src/share/vm/gc/g1/g1CollectedHeap.cpp	Thu Jun 29 19:09:04 2017 +0000
+++ b/src/share/vm/gc/g1/g1CollectedHeap.cpp	Fri Jun 30 11:24:02 2017 +0200
@@ -3266,29 +3266,6 @@

   MemoryService::track_memory_usage();

-  // In prepare_for_verify() below we'll need to scan the deferred
-  // update buffers to bring the RSets up-to-date if
-  // G1HRRSFlushLogBuffersOnVerify has been set. While scanning
-  // the update buffers we'll probably need to scan cards on the
-  // regions we just allocated to (i.e., the GC alloc
-  // regions). However, during the last GC we called
-  // set_saved_mark() on all the GC alloc regions, so card
-  // scanning might skip the [saved_mark_word()...top()] area of
-  // those regions (i.e., the area we allocated objects into
-  // during the last GC). But it shouldn't. Given that
-  // saved_mark_word() is conditional on whether the GC time stamp
-  // on the region is current or not, by incrementing the GC time
-  // stamp here we invalidate all the GC time stamps on all the
-  // regions and saved_mark_word() will simply return top() for
-  // all the regions. This is a nicer way of ensuring this rather
-  // than iterating over the regions and fixing them. In fact, the
-  // GC time stamp increment here also ensures that
-  // saved_mark_word() will return top() between pauses, i.e.,
-  // during concurrent refinement. So we don't need the
-  // is_gc_active() check to decided which top to use when
-  // scanning cards (see CR 7039627).
-  increment_gc_time_stamp();
-
   if (VerifyRememberedSets) {
     log_info(gc, verify)("[Verifying RemSets after GC]");
     VerifyRegionRemSetClosure v_cl;

From erik.helin at oracle.com  Fri Jun 30 09:47:37 2017
From: erik.helin at oracle.com (Erik Helin)
Date: Fri, 30 Jun 2017 11:47:37 +0200
Subject: RFR: G1HRRSFlushLogBuffersOnVerify with remembered set verification does not work
In-Reply-To: <1498729056.2900.4.camel@oracle.com>
References: <1498729056.2900.4.camel@oracle.com>
Message-ID: <7e6c57ef-7e02-0b0d-5a35-3ea395089e30@oracle.com>

On 06/29/2017 11:37 AM, Thomas Schatzl wrote:
>> Patch: http://cr.openjdk.java.net/~ehelin/8153360/00/
>>
>> Test: make hotspot - this is "just" removal of code
>
> looks good.
>
> Please add a comment about what the last clause in the verification
> code actually means (heapRegion.cpp:584). Something like:
>
> // Reference may not have been refined into the remembered sets yet.
> // Instead of looking into all dirty card queues, we take a shortcut
> // by looking at whether the corresponding card is dirty.
> // ObjArrays may either be marked on the object header or exactly.
>
> (Actually I would guess the "correct" clause here would be is_array()
> and not is_objArray(), but primitive type arrays are never marked as
> they do not contain references.)
>
> I do not need a re-review for the comment change.

Thanks for the review Thomas, I will add the comment before I push!

Erik

> Thanks,
> Thomas
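To make the clause concrete, the shortcut Thomas describes amounts to
something along these lines. The helper names are illustrative, not the
actual heapRegion.cpp:584 code, and the is_objArray() special case
discussed above is omitted:

// A reference missing from the remembered set is still acceptable during
// verification if its card is dirty: the reference is sitting in a dirty
// card queue and simply has not been refined into the remembered set yet.
bool entry_ok = hrrs->contains_reference(from)  // already refined
             || is_card_dirty(from);            // still pending refinement

Here is_card_dirty() stands in for the card table lookup; checking the
card avoids scanning every dirty card queue during verification.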
From stefan.johansson at oracle.com  Fri Jun 30 11:53:00 2017
From: stefan.johansson at oracle.com (Stefan Johansson)
Date: Fri, 30 Jun 2017 13:53:00 +0200
Subject: RFR: 8183281: Remove unnecessary call to increment_gc_time_stamp
In-Reply-To: <646a5d9a-b6d3-82c9-3937-027c3193d4c0@oracle.com>
References: <646a5d9a-b6d3-82c9-3937-027c3193d4c0@oracle.com>
Message-ID: <5b2dff36-0a55-feb8-7e80-52e4562a5651@oracle.com>

Hi Erik,

On 2017-06-30 11:37, Erik Helin wrote:
> Hi all,
>
> the following small patch removes an unnecessary call to
> increment_gc_time_stamp from
> G1CollectedHeap::do_collection_pause_at_safepoint (and the long, wrong
> comment above the call).
>
> We already call increment_gc_time_stamp much earlier in
> do_collection_pause_at_safepoint, which is enough. The reasons
> outlined in the comment motivating a second call are no longer true;
> the code has changed (but the comment has not).
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8183281
> Patch: see below
> Testing: make hotspot

Patch looks good, but I would like to see some more testing than just
building hotspot. Running the gc jtreg tests for example.

Thanks for cleaning up the code,
Stefan

> Thanks,
> Erik
>
> [quoted patch trimmed]

From stefan.johansson at oracle.com  Fri Jun 30 11:56:15 2017
From: stefan.johansson at oracle.com (Stefan Johansson)
Date: Fri, 30 Jun 2017 13:56:15 +0200
Subject: RFR: G1HRRSFlushLogBuffersOnVerify with remembered set verification does not work
In-Reply-To: <7e6c57ef-7e02-0b0d-5a35-3ea395089e30@oracle.com>
References: <1498729056.2900.4.camel@oracle.com>
 <7e6c57ef-7e02-0b0d-5a35-3ea395089e30@oracle.com>
Message-ID:

Hi Erik,

On 2017-06-30 11:47, Erik Helin wrote:
> On 06/29/2017 11:37 AM, Thomas Schatzl wrote:
>> [...]
>
> Thanks for the review Thomas, I will add the comment before I push!

After the comment removal in JDK-8183281, which referenced the flag, I
say: Ship it.

Thanks,
Stefan

> Erik
>
>> Thanks,
>> Thomas

From erik.helin at oracle.com  Fri Jun 30 15:34:08 2017
From: erik.helin at oracle.com (Erik Helin)
Date: Fri, 30 Jun 2017 17:34:08 +0200
Subject: RFR: 8183281: Remove unnecessary call to increment_gc_time_stamp
In-Reply-To: <5b2dff36-0a55-feb8-7e80-52e4562a5651@oracle.com>
References: <646a5d9a-b6d3-82c9-3937-027c3193d4c0@oracle.com>
 <5b2dff36-0a55-feb8-7e80-52e4562a5651@oracle.com>
Message-ID:

On 06/30/2017 01:53 PM, Stefan Johansson wrote:
> Hi Erik,
>
> On 2017-06-30 11:37, Erik Helin wrote:
>> Hi all,
>>
>> the following small patch removes an unnecessary call to
>> increment_gc_time_stamp from
>> G1CollectedHeap::do_collection_pause_at_safepoint (and the long,
>> wrong comment above the call).
>>
>> We already call increment_gc_time_stamp much earlier in
>> do_collection_pause_at_safepoint, which is enough. The reasons
>> outlined in the comment motivating a second call are no longer true;
>> the code has changed (but the comment has not).
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8183281
>> Patch: see below
>> Testing: make hotspot
>
> Patch looks good, but I would like to see some more testing than just
> building hotspot. Running the gc jtreg tests for example.

Thanks for reviewing! All pass for both fastdebug and product when
running `make test TEST=hotspot_gc` on my Linux workstation.

Thanks,
Erik

> Thanks for cleaning up the code,
> Stefan
>
>> [quoted patch trimmed]

From rkennke at redhat.com  Fri Jun 30 16:32:14 2017
From: rkennke at redhat.com (Roman Kennke)
Date: Fri, 30 Jun 2017 18:32:14 +0200
Subject: RFR: 8179387: Factor out CMS specific code from GenCollectedHeap into its own subclass
In-Reply-To:
References: <3521009f-6fab-4f8e-2375-b9d665a4c70b@redhat.com>
Message-ID: <3d8b55a2-a787-3051-b351-ab9b0a24f5e0@redhat.com>

I came across one problem using this approach: We will have 2 instances
of CollectedHeap around, where there's usually only 1, and some code
expects only 1. For example, in the CollectedHeap constructor, we create
new PerfData variables, and we now create them 2x, which leads to an
assert being thrown. I suspect there is more code like that.
I will attempt to refactor this a little more; maybe it's not that bad,
but it's probably not worth spending too much time on it.

Roman

> Hi Roman,
>
> thanks for putting this patch together, it is a great step forward! One
> thing that (in my mind) would improve it even further is if we embed a
> GenCollectedHeap in CMSHeap and then make CMSHeap inherit directly from
> CollectedHeap.
>
> With this solution, the definition of CMSHeap would look something
> along the lines of:
>
> class CMSHeap : public CollectedHeap {
>   WorkGang* _wg;
>   GenCollectedHeap _gch;
>
> public:
>   CMSHeap(GenCollectorPolicy* policy) :
>     _wg(new WorkGang("GC Thread", ParallelGCThreads, true, true)),
>     _gch(policy) {
>     _wg->initialize_workers();
>   }
>
>   // a bunch of "facade" methods
>   virtual bool supports_tlab_allocation() const {
>     return _gch.supports_tlab_allocation();
>   }
>
>   virtual size_t tlab_capacity(Thread* t) const {
>     return _gch.tlab_capacity(t);
>   }
> };
>
> With this approach, you would have to implement a bunch of "facade"
> methods that just delegate to _gch, such as supports_tlab_allocation
> and tlab_capacity above. There are two reasons why I prefer this
> approach:
> 1. In the end we want CMSHeap to inherit from CollectedHeap anyway :)
> 2. It makes it very clear which methods we gradually have to
>    re-implement in CMSHeap to eventually get rid of the _gch field
>    (the end goal). This is much harder to see if CMSHeap inherits from
>    GenCollectedHeap (see more below).
>
> The second point will most likely cause some initial problems with
> `protected` code in GenCollectedHeap. For example, as you noticed when
> creating this patch, CMSHeap makes use of a few `protected` fields and
> methods from GenCollectedHeap, most notably:
> - _process_strong_tasks
> - process_roots()
> - process_string_table_roots()
>
> It would be much better (IMO) to share this code via composition rather
> than inheritance. In this particular case, I would prefer to create a
> class StrongRootsProcessor that encapsulates the root processing logic.
> Then GenCollectedHeap and CMSHeap can both contain an instance of
> StrongRootsProcessor.
>
> What do you think of this approach? Do you have some spare cycles to
> try this approach out?
>
> Thanks,
> Erik
>
> On 06/02/2017 10:55 AM, Roman Kennke wrote:
>> Take this patch. It #ifdef ASSERT's a call to check_gen_kinds() that
>> is only present in debug builds.
>>
>> http://cr.openjdk.java.net/~rkennke/8179387/webrev.01/
>>
>> Roman
>>
>> On 01.06.2017 at 22:50, Roman Kennke wrote:
>>> What $SUBJECT says.
>>>
>>> I went over genCollectedHeap.[hpp|cpp] and moved everything that I
>>> could find that is CMS-only into a new CMSHeap class.
>>>
>>> http://cr.openjdk.java.net/~rkennke/8179387/webrev.00/
>>>
>>> It is possible that I overlooked something there. There may be code
>>> in there that doesn't shout "CMS" at me, but is still intrinsically
>>> CMS stuff.
>>>
>>> Also note that I have not removed this little part:
>>>
>>> always_do_update_barrier = UseConcMarkSweepGC;
>>>
>>> because I expect it to go away with Erik Ö's big refactoring.
>>>
>>> What do you think?
>>>
>>> Testing: hotspot_gc, specjvm, some little apps with
>>> -XX:+UseConcMarkSweepGC
>>>
>>> Roman
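Concretely, the composition Erik proposes could be sketched like this.
StrongRootsProcessor does not exist in the tree; its shape here assumes
that the current process_roots() logic and the _process_strong_tasks
field move into it, so treat this as an illustration rather than actual
repository code:

class StrongRootsProcessor {
  SubTasksDone _process_strong_tasks;  // moved out of GenCollectedHeap
public:
  StrongRootsProcessor(uint n_tasks) : _process_strong_tasks(n_tasks) { }

  // the root-scanning logic currently in GenCollectedHeap would move here
  void process_roots(StrongRootsScope* scope, OopClosure* strong_roots);
  void process_string_table_roots(StrongRootsScope* scope, OopClosure* weak_roots);
};

class GenCollectedHeap : public CollectedHeap {
  StrongRootsProcessor _root_processor;  // root processing shared via composition
  // remaining GenCollectedHeap members unchanged
};

class CMSHeap : public CollectedHeap {
  GenCollectedHeap    _gch;              // embedded, per Erik's sketch above
  StrongRootsProcessor _root_processor;  // same logic, no `protected` access needed
  // facade methods delegating to _gch, as shown earlier
};

With this split, neither heap class needs `protected` access to the
other's internals: both simply own a StrongRootsProcessor.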