RFR: Parallelize safepoint cleanup

Robbin Ehn robbin.ehn at oracle.com
Wed Jun 28 18:55:57 UTC 2017


Hi Roman

On 06/27/2017 09:47 PM, Roman Kennke wrote:
> Hi Robbin,
>
> Ugh. Thanks for catching this.
> Problem was that I was accounting the thread-local deflations twice:
> once in thread-local processing (basically a leftover from my earlier
> attempt to implement this accounting) and then again in
> finish_deflate_idle_monitors(). Should be fixed here:
>
> http://cr.openjdk.java.net/~rkennke/8180932/webrev.09/

Nit:
safepoint.cpp : ParallelSPCleanupTask
"const char* name = " is not needed and 1 is unused

>
> Side question: which jtreg targets do you usually run?

Right now I cherry-pick directories from hotspot/test/.

I'm going to add a decent test group for local testing.
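For reference, a jtreg test group is just an entry in hotspot/test/TEST.groups; a rough sketch of what such a group could look like (the group name and directory list below are purely illustrative, not the group I will actually add):

    # hypothetical group for quick local runs; a leading '-' excludes a directory
    hotspot_local_sanity = \
      runtime/ \
      serviceability/ \
      -runtime/SharedArchiveFile

Something defined like that could then be run with 'make test TEST=hotspot_local_sanity'.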

>
> Trying: make test TEST=hotspot_all
> gives me *lots* of failures due to missing jcstress stuff (?!)
>   And even other subsets seem to depend on several bits and pieces that I
> have no idea about.

Yes, you need to use the internal tool 'jib' (java integrate build) to get that working, or you can set up an environment that points to where the jcstress application bits are...

I have a regression on ClassLoaderData root scanning. This should not be related,
but I only have 3 patches which could cause this, unless it's something in the environment that has changed.

I also do not see any immediate performance gains (off vs. 4 threads). It might be
http://hg.openjdk.java.net/jdk10/hs/hotspot/rev/06994badeb24, but I need to do some
more testing. I know you often run with a non-default GSI.

I'll get back to you.

Thanks, Robbin

>
> Roman
>
> On 27.06.2017 at 16:51, Robbin Ehn wrote:
>> Hi Roman,
>>
>> There is something wrong in the calculations:
>> INFO: Deflate: InCirc=43 InUse=18 Scavenged=25 ForceMonitorScavenge=0
>> : pop=27051 free=215487
>>
>> The free count is larger than the population; I have not had the time to dig into this.
>>
>> Thanks, Robbin
>>
>> On 06/22/2017 10:19 PM, Roman Kennke wrote:
>>> So here's the latest iteration of that patch:
>>>
>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.08/
>>>
>>> I checked and fixed all the counters. The problem here is that they are
>>> not updated in a single place (deflate_idle_monitors()) but in several
>>> places, potentially by multiple threads. I split up deflation into
>>> prepare_.. and finish_.. methods to initialize local and update global
>>> counters respectively, and pass around a counters object (allocated on
>>> the stack) to the various code paths that use it. Updating the counters
>>> always happens under a lock, so there's no need to do anything special
>>> with regard to concurrency.
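As an aside, here is a self-contained sketch of the counter scheme described above, just to illustrate its shape; the type and function names are placeholders, not the actual identifiers in the webrev:

    #include <mutex>

    struct DeflateCounters {            // allocated on the caller's stack
      int in_use = 0;
      int in_circulation = 0;
      int scavenged = 0;
    };

    static std::mutex list_lock;        // stands in for the monitor list lock
    static int global_free_count = 0;   // global statistics

    void prepare_deflate_idle_monitors(DeflateCounters* c) {
      *c = DeflateCounters();           // initialize once, before any work runs
    }

    void deflate_monitor_list(DeflateCounters* c, int deflated) {
      // every code path (global list, per-thread lists) accumulates into the
      // same counters object, always under the lock
      std::lock_guard<std::mutex> guard(list_lock);
      c->scavenged += deflated;
    }

    void finish_deflate_idle_monitors(const DeflateCounters* c) {
      // the global statistics are updated exactly once, which is what avoids
      // the double counting seen with the earlier webrev
      std::lock_guard<std::mutex> guard(list_lock);
      global_free_count += c->scavenged;
    }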
>>>
>>> I also checked the nmethod marking, but there doesn't seem to be
>>> anything in that code that looks problematic under concurrency. The
>>> worst that can happen is that two threads write the same value into an
>>> nmethod field. I think we can live with that ;-)
>>>
>>> Good to go?
>>>
>>> Tested by running specjvm and jcstress fastdebug+release without issues.
>>>
>>> Roman
>>>
>>> On 02.06.2017 at 12:39, Robbin Ehn wrote:
>>>> Hi Roman,
>>>>
>>>> On 06/02/2017 11:41 AM, Roman Kennke wrote:
>>>>> Hi David,
>>>>> thanks for reviewing. I'll be on vacation the next two weeks too, with
>>>>> only sporadic access to work stuff.
>>>>> Yes, exposure will not be as good as otherwise, but it's not totally
>>>>> untested either: the serial code path is the same as the parallel one;
>>>>> the only difference is that it's not actually called by multiple
>>>>> threads. It's ok, I think.
>>>>>
>>>>> I found two more issues that I think should be addressed:
>>>>> - There are some counters in deflate_idle_monitors() and I'm not sure
>>>>>   I correctly handle them in the split-up and MT'ed thread-local/global
>>>>>   list deflation.
>>>>> - nmethod marking seems to unconditionally poke true or something like
>>>>>   that into nmethod fields. This doesn't hurt correctness-wise, but it's
>>>>>   probably worth checking if it's already true, especially when doing
>>>>>   this with multiple threads concurrently.
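A minimal sketch of that "check before write" idea, with placeholder names (the real nmethod class and its marking field are different):

    #include <atomic>

    struct NMethodStub {                         // placeholder, not the real nmethod
      std::atomic<bool> marked_on_stack{false};
    };

    void mark_for_safepoint_cleanup(NMethodStub* nm) {
      // Two threads storing 'true' is harmless for correctness, but testing
      // first avoids redundant stores (and cache-line traffic) when several
      // cleanup workers visit the same nmethod concurrently.
      if (!nm->marked_on_stack.load(std::memory_order_relaxed)) {
        nm->marked_on_stack.store(true, std::memory_order_relaxed);
      }
    }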
>>>>>
>>>>> I'll send an updated patch around later, I hope I can get to it
>>>>> today...
>>>>
>>>> I'll review that when you get it out.
>>>> I think this looks like a reasonable step before we tackle this with a
>>>> major effort, such as the JEP you and Carsten are doing, and another
>>>> effort to 'fix' nmethod marking.
>>>>
>>>> Internal discussion yesterday led us to conclude that the runtime
>>>> will probably need more threads.
>>>> This would be a good driver for a 'global' worker pool which serves
>>>> GC, the runtime, and safepoints with threads.
>>>>
>>>>>
>>>>> Roman
>>>>>
>>>>>> Hi Roman,
>>>>>>
>>>>>> I am about to disappear on an extended vacation so I will let others
>>>>>> pursue this. IIUC this is no longer an opt-in by the user at runtime,
>>>>>> but an opt-in by the particular GC developers. Okay. My only concern
>>>>>> with that is that if Shenandoah is the only GC that currently opts in,
>>>>>> then this code is not going to get much testing and will be more
>>>>>> prone to incidental breakage.
>>>>
>>>> As I mentioned before, it seems like Erik Ö has some idea; maybe he
>>>> can do this after his barrier patch.
>>>>
>>>> Thanks!
>>>>
>>>> /Robbin
>>>>
>>>>>>
>>>>>> Cheers,
>>>>>> David
>>>>>>
>>>>>> On 2/06/2017 2:21 AM, Roman Kennke wrote:
>>>>>>> On 01.06.2017 at 17:50, Roman Kennke wrote:
>>>>>>>> On 01.06.2017 at 14:18, Robbin Ehn wrote:
>>>>>>>>> Hi Roman,
>>>>>>>>>
>>>>>>>>> On 06/01/2017 11:29 AM, Roman Kennke wrote:
>>>>>>>>>> On 31.05.2017 at 22:06, Robbin Ehn wrote:
>>>>>>>>>>> Hi Roman, I agree that is really needed but:
>>>>>>>>>>>
>>>>>>>>>>> On 05/31/2017 10:27 AM, Roman Kennke wrote:
>>>>>>>>>>>> I realized that sharing workers with GC is not so easy.
>>>>>>>>>>>>
>>>>>>>>>>>> We need to be able to use the workers at a safepoint during concurrent
>>>>>>>>>>>> GC work (which also uses the same workers). This does not only require
>>>>>>>>>>>> that those workers be suspended, like e.g. SuspendibleThreadSet::yield(),
>>>>>>>>>>>> but they need to be idle, i.e. have finished their tasks. This needs
>>>>>>>>>>>> some careful handling to work without races: it requires a
>>>>>>>>>>>> SuspendibleThreadSetJoiner around the corresponding run_task() call and
>>>>>>>>>>>> also the tasks themselves need to join the STS and handle requests for
>>>>>>>>>>>> safepoints not by yielding, but by leaving the task.
>>>>>>>>>>>> This is far too peculiar for me to make the call to hook up GC workers
>>>>>>>>>>>> for safepoint cleanup, and I thus removed those parts. I left the API in
>>>>>>>>>>>> CollectedHeap in place. I think GC devs who know better about G1 and CMS
>>>>>>>>>>>> should make that call, or else just use a separate thread pool.
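To make the yielding-versus-leaving distinction above concrete, here is a tiny self-contained model of it; none of these names are the actual HotSpot API:

    #include <atomic>
    #include <cstddef>

    std::atomic<bool> safepoint_requested{false};  // stand-in for the real signal

    void concurrent_gc_task(std::size_t chunk_count) {
      for (std::size_t i = 0; i < chunk_count; i++) {
        if (safepoint_requested.load()) {
          // A yielding worker would block here and resume after the safepoint,
          // so it is still "inside" the task and the gang is not idle. To let
          // the safepoint reuse the workers, the task must instead return,
          // i.e. leave the task, so the work gang becomes available.
          return;
        }
        // process_chunk(i);  // placeholder for the actual GC work
      }
    }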
>>>>>>>>>>>>
>>>>>>>>>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.05/
>>>>>>>>>>>>
>>>>>>>>>>>> Is it ok now?
>>>>>>>>>>> I still think you should put the "Parallel Safepoint Cleanup"
>>>>>>>>>>> workers
>>>>>>>>>>> inside Shenandoah,
>>>>>>>>>>> so the SafepointSynchronizer only calls get_safepoint_workers,
>>>>>>>>>>> e.g.:
>>>>>>>>>>>
>>>>>>>>>>> _cleanup_workers = heap->get_safepoint_workers();
>>>>>>>>>>> _num_cleanup_workers = _cleanup_workers != NULL ? _cleanup_workers->total_workers() : 1;
>>>>>>>>>>> ParallelSPCleanupTask cleanup(_cleanup_subtasks);
>>>>>>>>>>> StrongRootsScope srs(_num_cleanup_workers);
>>>>>>>>>>> if (_cleanup_workers != NULL) {
>>>>>>>>>>>   _cleanup_workers->run_task(&cleanup, _num_cleanup_workers);
>>>>>>>>>>> } else {
>>>>>>>>>>>   cleanup.work(0);
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> That way you don't even need your new flags, but it will be up to
>>>>>>>>>>> the other GCs to make their workers available or cheat with a
>>>>>>>>>>> separate workgang.
>>>>>>>>>> I can do that, I don't mind. The question is, do we want that?
>>>>>>>>> The problem is that we do not want to rush such a decision; we
>>>>>>>>> believe there is a better solution. I think you also would want
>>>>>>>>> another solution. But it seems like such a solution, with one
>>>>>>>>> 'global' thread pool owned either by the GC or the VM itself, is
>>>>>>>>> quite an undertaking.
>>>>>>>>> Since this probably will not be done any time soon, my suggestion
>>>>>>>>> is, so as not to hold you back (we also want this), to just make
>>>>>>>>> the code parallel and, as an intermediate step, ask the GC if it
>>>>>>>>> minds sharing its threads.
>>>>>>>>>
>>>>>>>>> Once Shenandoah is merged, it's possible that e.g. G1 will share
>>>>>>>>> the code for a separate thread pool, do something of its own, or
>>>>>>>>> wait until the bigger question about thread pool(s) has been
>>>>>>>>> resolved.
>>>>>>>>>
>>>>>>>>> By adding a thread pool, and flags for it, directly to the
>>>>>>>>> SafepointSynchronizer, we might limit our future options.
>>>>>>>>>
>>>>>>>>>> I wouldn't call it 'cheating with a separate workgang' though. I
>>>>>>>>>> see
>>>>>>>>>> that both G1 and CMS suspend their worker threads at a safepoint.
>>>>>>>>>> However:
>>>>>>>>> Yes, it's not cheating, but I want decent heuristics between e.g.
>>>>>>>>> the number of concurrent marking threads and parallel safepoint
>>>>>>>>> threads, since they compete for CPU time.
>>>>>>>>> As the code looks now, I think that decision must be made by the GC.
>>>>>>>> Ok, I see your point. I updated the proposed patch accordingly:
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.06/
>>>>>>> Oops. Minor mistake there. Correction:
>>>>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.07/
>>>>>>>
>>>>>>> (Removed 'class WorkGang' from safepoint.hpp, and forgot to add it
>>>>>>> into
>>>>>>> collectedHeap.hpp, resulting in build failure...)
>>>>>>>
>>>>>>> Roman
>>>>>>>
>>>>>
>>>
>

