RFR: Parallelize safepoint cleanup

Roman Kennke rkennke at redhat.com
Thu Jun 29 11:42:42 UTC 2017

How many Java threads are involved in monitor inflation? Parallelization is spread over Java threads (i.e. each worker claims and deflates the monitors of one Java thread per step).


On 29 June 2017 at 12:49:58 CEST, Robbin Ehn <robbin.ehn at oracle.com> wrote:
>Hi Roman,
>I haven't had the time to test all scenarios, and the numbers are just
>an indication:
>- VM thread, MonitorUsedDeflationThreshold=0: 0.002782s avg, avg of the
>  10 worst cleanups 0.0173s
>- 4 workers, MonitorUsedDeflationThreshold=0: 0.002923s avg, avg of the
>  10 worst cleanups 0.0199s
>- VM thread, MonitorUsedDeflationThreshold=1: 0.001889s avg, avg of the
>  10 worst cleanups 0.0066s
>When MonitorUsedDeflationThreshold=0 we are talking about 120000 free
>monitors to deflate,
>and I get worse numbers doing the cleanup in 4 threads.
>Any idea why I see these numbers?
>Thanks, Robbin
>On 06/28/2017 10:23 PM, Roman Kennke wrote:
>>> On 06/27/2017 09:47 PM, Roman Kennke wrote:
>>>> Hi Robbin,
>>>> Ugh. Thanks for catching this.
>>>> Problem was that I was accounting the thread-local deflations
>>>> once in thread-local processing (basically a leftover from my
>>>> attempt to implement this accounting) and then again in
>>>> finish_deflate_idle_monitors(). Should be fixed here:
>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.09/
>>>> <http://cr.openjdk.java.net/%7Erkennke/8180932/webrev.09/>
>>> Nit:
>>> safepoint.cpp : ParallelSPCleanupTask
>>> "const char* name = " is not needed and 1 is unused
>> Sorry, I don't understand what you mean by this. I see code like
>>        const char* name = "deflating idle monitors";
>> and it is used a few lines below, even 2x.
>> What's '1 is unused' ?
>>>> Side question: which jtreg targets do you usually run?
>>> Right now I cherry pick directories from: hotspot/test/
>>> I'm going to add a decent test group for local testing.
>> That would be good!
>>>> Trying: make test TEST=hotspot_all
>>>> gives me *lots* of failures due to missing jcstress stuff (?!)
>>>> And even other subsets seem to depend on several bits and pieces
>>>> that I have no idea about.
>>> Yes, you need to use the internal tool 'jib' (java integrate build)
>>> to get that working, or you can set some environment where the
>>> jcstress application stuff is...
>> Uhhh. We really do want a subset of tests that we can run reliably
>> and that are self-contained. How else are people (without that jib)
>> supposed to do some sanity checking with their patches? ;-)
>>> I have a regression on ClassLoaderData root scanning, this should
>>> be related,
>>> but I only have 3 patches which could cause this, if it's not
>>> something in the environment that has changed.
>> Let me know if it's my patch :-)
>>> Also I do not see any immediate performance gains (off vs 4 threads),
>>> but I need to do some more testing. I know you often run with
>>> non-default GSI.
>> First of all, during the course of this review I reduced the change
>> from an actual implementation to a kind of framework, and it needs
>> some separate changes in the GC to make use of it. Not sure if you
>> added corresponding code in (e.g.) G1?
>> Also, this is only really visible in code that makes excessive use of
>> monitors, i.e. the one linked by Carsten's original patch, or the
>> org.openjdk.gcbench.roots.Synchronizers.test in gc-bench:
>> http://icedtea.classpath.org/hg/gc-bench/
>> There are also some popular real-world apps that tend to do this;
>> off the top of my head, Cassandra is such an application.
>> Thanks, Roman
>>> I'll get back to you.
>>> Thanks, Robbin
>>>> Roman
>>>> On 27.06.2017 at 16:51, Robbin Ehn wrote:
>>>>> Hi Roman,
>>>>> There is something wrong in the calculations:
>>>>> INFO: Deflate: InCirc=43 InUse=18 Scavenged=25
>>>>> : pop=27051 free=215487
>>>>> free is larger than population; I have not had the time to dig into it.
>>>>> Thanks, Robbin
>>>>> On 06/22/2017 10:19 PM, Roman Kennke wrote:
>>>>>> So here's the latest iteration of that patch:
>>>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.08/
>>>>>> <http://cr.openjdk.java.net/%7Erkennke/8180932/webrev.08/>
>>>>>> I checked and fixed all the counters. The problem here is that they
>>>>>> are not updated in a single place (deflate_idle_monitors()) but in
>>>>>> several places, potentially by multiple threads. I split up deflation
>>>>>> into prepare_.. and finish_.. methods to initialize local and update
>>>>>> global counters respectively, and pass around a counters object
>>>>>> (allocated on the stack) to the various code paths that use it.
>>>>>> Updating the global counters always happens under a lock, so there's
>>>>>> no need to do anything special with regards to concurrency.
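As a sketch of the prepare_/finish_ split described above (function and field names here are illustrative, not the actual webrev): a stack-allocated counters object collects per-safepoint deltas from the various deflation paths, and a single finish step folds them into the global stats (in the real code that update would happen under a lock).

```cpp
#include <cassert>

// Illustrative sketch of the counters-object pattern, not HotSpot code.
struct DeflateMonitorCounters {
  int n_in_use = 0;
  int n_in_circulation = 0;
  int n_scavenged = 0;
};

struct GlobalMonitorStats {
  long total_scavenged = 0;
};

void prepare_deflate_idle_monitors(DeflateMonitorCounters* c) {
  *c = DeflateMonitorCounters(); // initialize the local counters once
}

// Each deflation path (global list, per-thread lists) adds its deltas
// to the same stack-allocated counters object.
void deflate_some(DeflateMonitorCounters* c, int in_circ, int in_use) {
  c->n_in_circulation += in_circ;
  c->n_in_use += in_use;
  c->n_scavenged += in_circ - in_use; // idle monitors were scavenged
}

void finish_deflate_idle_monitors(GlobalMonitorStats* g,
                                  const DeflateMonitorCounters* c) {
  // In the real code this global update happens under a lock.
  g->total_scavenged += c->n_scavenged;
}
```

Counting each deflation exactly once in the counters object avoids the double-accounting bug mentioned earlier in the thread, where thread-local deflations were counted both locally and again in finish_deflate_idle_monitors().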
>>>>>> I also checked the nmethod marking, but there doesn't seem to be
>>>>>> anything in that code that looks problematic under concurrency. The
>>>>>> worst that can happen is that two threads write the same value into
>>>>>> an nmethod field. I think we can live with that ;-)
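A tiny sketch of that benign race, combined with the check-before-write variant suggested elsewhere in the thread (the type and field names are made up for illustration):

```cpp
#include <atomic>
#include <cassert>

// Illustrative sketch, not HotSpot code: multiple threads may mark the
// same nmethod; if two threads race, both store the same value ('true'),
// so correctness is unaffected. Checking before writing avoids redundant
// stores when many threads scan concurrently.
struct FakeNMethod {
  std::atomic<bool> marked{false};
};

void mark_nmethod(FakeNMethod* nm) {
  // Read first; only store if not already marked.
  if (!nm->marked.load(std::memory_order_relaxed)) {
    nm->marked.store(true, std::memory_order_relaxed);
  }
}
```

The function is idempotent: calling it from any number of threads, any number of times, leaves the field true, which is why the race is harmless.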
>>>>>> Good to go?
>>>>>> Tested by running specjvm and jcstress fastdebug+release without
>>>>>> issues.
>>>>>> Roman
>>>>>> On 02.06.2017 at 12:39, Robbin Ehn wrote:
>>>>>>> Hi Roman,
>>>>>>> On 06/02/2017 11:41 AM, Roman Kennke wrote:
>>>>>>>> Hi David,
>>>>>>>> thanks for reviewing. I'll be on vacation the next two weeks
>>>>>>>> with
>>>>>>>> only sporadic access to work stuff.
>>>>>>>> Yes, exposure will not be as good as otherwise, but it's not
>>>>>>>> untested either: the serial code path is the same as the parallel
>>>>>>>> one; the only difference is that it's not actually called by
>>>>>>>> multiple threads. It's ok, I think.
>>>>>>>> I found two more issues that I think should be addressed:
>>>>>>>> - There are some counters in deflate_idle_monitors() and I'm not
>>>>>>>> sure I correctly handle them in the split-up and MT'ed
>>>>>>>> thread-local/global list deflation
>>>>>>>> - nmethod marking seems to unconditionally poke true or something
>>>>>>>> like that into nmethod fields. This doesn't hurt correctness-wise,
>>>>>>>> but it's probably worth checking if it's already true, especially
>>>>>>>> when doing this with multiple threads concurrently.
>>>>>>>> I'll send an updated patch around later, I hope I can get to it
>>>>>>>> today...
>>>>>>> I'll review that when you get it out.
>>>>>>> I think this looks like a reasonable step before we tackle this
>>>>>>> with a major effort, such as the JEP you and Carsten are doing.
>>>>>>> And another effort to 'fix' nmethod marking.
>>>>>>> Internal discussion yesterday led us to conclude that the VM will
>>>>>>> probably need more threads.
>>>>>>> This would be a good driver to do a 'global' worker pool which
>>>>>>> supplies both gc, runtime and safepoints with threads.
>>>>>>>> Roman
>>>>>>>>> Hi Roman,
>>>>>>>>> I am about to disappear on an extended vacation so will let you
>>>>>>>>> pursue this. IIUC this is no longer an opt-in by the user at all,
>>>>>>>>> but an opt-in by the particular GC developers. Okay. My only
>>>>>>>>> concern with that is that if Shenandoah is the only GC that
>>>>>>>>> currently opts in, this code is not going to get much testing and
>>>>>>>>> will be more prone to incidental breakage.
>>>>>>> As I mentioned before, it seems like Erik Ö has some idea; maybe we
>>>>>>> can do this after his barrier patch.
>>>>>>> Thanks!
>>>>>>> /Robbin
>>>>>>>>> Cheers,
>>>>>>>>> David
>>>>>>>>> On 2/06/2017 2:21 AM, Roman Kennke wrote:
>>>>>>>>>> On 01.06.2017 at 17:50, Roman Kennke wrote:
>>>>>>>>>>> On 01.06.2017 at 14:18, Robbin Ehn wrote:
>>>>>>>>>>>> Hi Roman,
>>>>>>>>>>>> On 06/01/2017 11:29 AM, Roman Kennke wrote:
>>>>>>>>>>>>> On 31.05.2017 at 22:06, Robbin Ehn wrote:
>>>>>>>>>>>>>> Hi Roman, I agree that this is really needed, but:
>>>>>>>>>>>>>> On 05/31/2017 10:27 AM, Roman Kennke wrote:
>>>>>>>>>>>>>>> I realized that sharing workers with GC is not so easy.
>>>>>>>>>>>>>>> We need to be able to use the workers at a safepoint during
>>>>>>>>>>>>>>> concurrent GC work (which also uses the same workers). This
>>>>>>>>>>>>>>> does not only require that those workers be suspended, like
>>>>>>>>>>>>>>> e.g. SuspendibleThreadSet::yield(), but they need to be
>>>>>>>>>>>>>>> idle, i.e. have finished their tasks. This needs some
>>>>>>>>>>>>>>> careful handling to work without races: it requires a
>>>>>>>>>>>>>>> SuspendibleThreadSetJoiner around the corresponding
>>>>>>>>>>>>>>> run_task() call, and also the tasks themselves need to join
>>>>>>>>>>>>>>> the STS and handle requests for safepoints not by yielding,
>>>>>>>>>>>>>>> but by leaving the task.
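A highly simplified sketch of the constraint described above (not the real SuspendibleThreadSet API; all names here are invented): the safepoint can only proceed once every worker has fully left the set, so a task must react to a safepoint request by leaving entirely, not by yielding in place.

```cpp
#include <cassert>

// Illustrative single-threaded model of the join/leave protocol.
struct SuspendibleSet {
  int joined = 0;
  bool safepoint_requested = false;

  void join()  { joined++; }
  void leave() { joined--; }
  bool safepoint_may_proceed() const {
    // All workers must have left, not merely yielded mid-task.
    return safepoint_requested && joined == 0;
  }
};

// A task step that checks for a safepoint request and leaves the set
// entirely (rather than yielding), as the text argues is required.
void run_task_step(SuspendibleSet& sts, bool& task_done) {
  sts.join();
  if (sts.safepoint_requested) {
    sts.leave();       // leave the task; it will be resumed later
    task_done = false;
    return;
  }
  task_done = true;    // ... the actual work would happen here ...
  sts.leave();
}
```

The point of the model: a worker that merely yielded would still count as joined, so the safepoint could never satisfy safepoint_may_proceed(); leaving the task is what makes the worker truly idle.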
>>>>>>>>>>>>>>> This is far too peculiar for me to make the call to hook
>>>>>>>>>>>>>>> up GC workers for safepoint cleanup, and I thus removed
>>>>>>>>>>>>>>> those parts. I left the API in CollectedHeap in place. I
>>>>>>>>>>>>>>> think GC devs who know better about G1 and CMS should make
>>>>>>>>>>>>>>> that call, or else just use a separate worker pool.
>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.05/
>>>>>>>>>>>>>>> Is it ok now?
>>>>>>>>>>>>>> I still think you should put the "Parallel Safepoint
>>>>>>>>>>>>>> Cleanup" workers inside Shenandoah, so the
>>>>>>>>>>>>>> SafepointSynchronizer only calls, e.g.:
>>>>>>>>>>>>>> _cleanup_workers = heap->get_safepoint_workers();
>>>>>>>>>>>>>> _num_cleanup_workers = _cleanup_workers != NULL ?
>>>>>>>>>>>>>>     _cleanup_workers->total_workers() : 1;
>>>>>>>>>>>>>> ParallelSPCleanupTask cleanup(_cleanup_subtasks);
>>>>>>>>>>>>>> StrongRootsScope srs(_num_cleanup_workers);
>>>>>>>>>>>>>> if (_cleanup_workers != NULL) {
>>>>>>>>>>>>>>   _cleanup_workers->run_task(&cleanup, _num_cleanup_workers);
>>>>>>>>>>>>>> } else {
>>>>>>>>>>>>>>   cleanup.work(0);
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>> That way you don't even need your new flags, but it will be
>>>>>>>>>>>>>> up to the other GCs to make their workers available,
>>>>>>>>>>>>>> or cheat with a separate workgang.
>>>>>>>>>>>>> I can do that, I don't mind. The question is: do we want that?
>>>>>>>>>>>> The problem is that we do not want to rush such a decision; I
>>>>>>>>>>>> believe there is a better solution.
>>>>>>>>>>>> I think you also would want another solution.
>>>>>>>>>>>> But it seems like such a solution, with one 'global' thread
>>>>>>>>>>>> pool owned either by the GC or the VM itself, is quite the
>>>>>>>>>>>> undertaking.
>>>>>>>>>>>> Since this probably will not be done any time soon, my
>>>>>>>>>>>> suggestion is, to not hold you back (we also want this), just
>>>>>>>>>>>> to make the code parallel, and as an intermediate step ask the
>>>>>>>>>>>> GC if it minds sharing its threads.
>>>>>>>>>>>> Now when Shenandoah is merged, it's possible that e.g. G1 can
>>>>>>>>>>>> share the code for a separate thread pool, do something of its
>>>>>>>>>>>> own, or wait until the bigger question about thread pool(s)
>>>>>>>>>>>> has been resolved.
>>>>>>>>>>>> By adding a thread pool directly to the SafepointSynchronizer
>>>>>>>>>>>> and flags for it, we might limit our future options.
>>>>>>>>>>>>> I wouldn't call it 'cheating with a separate workgang'
>>>>>>>>>>>>> though. I
>>>>>>>>>>>>> see
>>>>>>>>>>>>> that both G1 and CMS suspend their worker threads at a
>>>>>>>>>>>>> safepoint.
>>>>>>>>>>>>> However:
>>>>>>>>>>>> Yes, it's not cheating, but I want decent heuristics between
>>>>>>>>>>>> the number of concurrent marking threads and parallel safepoint
>>>>>>>>>>>> cleanup threads, since they compete for cpu time.
>>>>>>>>>>>> As the code looks now, I think that decision must be made by
>>>>>>>>>>>> the GC.
>>>>>>>>>>> Ok, I see your point. I updated the proposed patch
>>>>>>>>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.06/
>>>>>>>>>>> <http://cr.openjdk.java.net/%7Erkennke/8180932/webrev.06/>
>>>>>>>>>> Oops. Minor mistake there. Correction:
>>>>>>>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.07/
>>>>>>>>>> <http://cr.openjdk.java.net/%7Erkennke/8180932/webrev.07/>
>>>>>>>>>> (Removed 'class WorkGang' from safepoint.hpp, and forgot to
>>>>>>>>>> add it into collectedHeap.hpp, resulting in a build failure...)
>>>>>>>>>> Roman

This message was sent from my Android device with K-9 Mail.

More information about the hotspot-runtime-dev mailing list