RFR: Parallelize safepoint cleanup

Thu Jun 22 20:19:31 UTC 2017

So here's the latest iteration of that patch:

http://cr.openjdk.java.net/~rkennke/8180932/webrev.08/
<http://cr.openjdk.java.net/%7Erkennke/8180932/webrev.08/>

I checked and fixed all the counters. The problem here is that they are
not updated in a single place (deflate_idle_monitors() ) but in several
places, potentially by multiple threads. I split up deflation into
prepare_.. and a finish_.. methods to initialize local and update global
counters respectively, and pass around a counters object (allocated on
stack) to the various code paths that use it. Updating the counters
always happen under a lock, there's no need to do anything special with
regards to concurrency.

I also checked the nmethod marking, but there doesn't seem to be
anything in that code that looks problematic under concurrency. The
worst that can happen is that two threads write the same value into an
nmethod field. I think we can live with that ;-)

Good to go?

Tested by running specjvm and jcstress fastdebug+release without issues.

Roman

Am 02.06.2017 um 12:39 schrieb Robbin Ehn:
> Hi Roman,
>
> On 06/02/2017 11:41 AM, Roman Kennke wrote:
>> Hi David,
>> thanks for reviewing. I'll be on vacation the next two weeks too, with
>> only sporadic access to work stuff.
>> Yes, exposure will not be as good as otherwise, but it's not totally
>> untested either: the serial code path is the same as the parallel, the
>> only difference is that it's not actually called by multiple threads.
>> It's ok I think.
>>
>> I found two more issues that I think should be addressed:
>> - There are some counters in deflate_idle_monitors() and I'm not sure I
>> correctly handle them in the split-up and MT'ed thread-local/ global
>> list deflation
>> - nmethod marking seems to unconditionally poke true or something like
>> that in nmethod fields. This doesn't hurt correctness-wise, but it's
>> probably worth checking if it's already true, especially when doing this
>> with multiple threads concurrently.
>>
>> I'll send an updated patch around later, I hope I can get to it today...
>
> I'll review that when you get it out.
> I think this looks as a reasonable step before we tackle this with a
> major effort, such as the JEP you and Carsten doing.
> And another effort to 'fix' nmethods marking.
>
> Internal discussion yesterday lead us to conclude that the runtime
> will probably need more threads.
> This would be a good driver to do a 'global' worker pool which serves
> both gc, runtime and safepoints with threads.
>
>>
>> Roman
>>
>>> Hi Roman,
>>>
>>> I am about to disappear on an extended vacation so will let others
>>> pursue this. IIUC this is longer an opt-in by the user at runtime, but
>>> an opt-in by the particular GC developers. Okay. My only concern with
>>> that is if Shenandoah is the only GC that currently opts in then this
>>> code is not going to get much testing and will be more prone to
>>> incidental breakage.
>
> As I mentioned before, it seem like Erik Ö have some idea, maybe he
> can do this after his barrier patch.
>
> Thanks!
>
> /Robbin
>
>>>
>>> Cheers,
>>> David
>>>
>>> On 2/06/2017 2:21 AM, Roman Kennke wrote:
>>>> Am 01.06.2017 um 17:50 schrieb Roman Kennke:
>>>>> Am 01.06.2017 um 14:18 schrieb Robbin Ehn:
>>>>>> Hi Roman,
>>>>>>
>>>>>> On 06/01/2017 11:29 AM, Roman Kennke wrote:
>>>>>>> Am 31.05.2017 um 22:06 schrieb Robbin Ehn:
>>>>>>>> Hi Roman, I agree that is really needed but:
>>>>>>>>
>>>>>>>> On 05/31/2017 10:27 AM, Roman Kennke wrote:
>>>>>>>>> I realized that sharing workers with GC is not so easy.
>>>>>>>>>
>>>>>>>>> We need to be able to use the workers at a safepoint during
>>>>>>>>> concurrent
>>>>>>>>> GC work (which also uses the same workers). This does not only
>>>>>>>>> require
>>>>>>>>> that those workers be suspended, like e.g.
>>>>>>>>> SuspendibleThreadSet::yield(), but they need to be idle, i.e.
>>>>>>>>> have
>>>>>>>>> finished their tasks. This needs some careful handling to work
>>>>>>>>> without
>>>>>>>>> races: it requires a SuspendibleThreadSetJoiner around the
>>>>>>>>> corresponding
>>>>>>>>> run_task() call and also the tasks themselves need to join the
>>>>>>>>> STS and
>>>>>>>>> handle requests for safepoints not by yielding, but by leaving
>>>>>>>>> the
>>>>>>>>> task.
>>>>>>>>> This is far too peculiar for me to make the call to hook up GC
>>>>>>>>> workers
>>>>>>>>> for safepoint cleanup, and I thus removed those parts. I left the
>>>>>>>>> API in
>>>>>>>>> CollectedHeap in place. I think GC devs who know better about G1
>>>>>>>>> and CMS
>>>>>>>>> should make that call, or else just use a separate thread pool.
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.05/
>>>>>>>>> <http://cr.openjdk.java.net/%7Erkennke/8180932/webrev.05/>
>>>>>>>>>
>>>>>>>>> Is it ok now?
>>>>>>>> I still think you should put the "Parallel Safepoint Cleanup"
>>>>>>>> workers
>>>>>>>> inside Shenandoah,
>>>>>>>> so the SafepointSynchronizer only calls get_safepoint_workers,
>>>>>>>> e.g.:
>>>>>>>>
>>>>>>>> _cleanup_workers = heap->get_safepoint_workers();
>>>>>>>> _num_cleanup_workers = _cleanup_workers != NULL ?
>>>>>>>> _cleanup_workers->total_workers() : 1;
>>>>>>>> ParallelSPCleanupTask cleanup(_cleanup_subtasks);
>>>>>>>> StrongRootsScope srs(_num_cleanup_workers);
>>>>>>>> if (_cleanup_workers != NULL) {
>>>>>>>>      _cleanup_workers->run_task(&cleanup, _num_cleanup_workers);
>>>>>>>> } else {
>>>>>>>>      cleanup.work(0);
>>>>>>>> }
>>>>>>>>
>>>>>>>> That way you don't even need your new flags, but it will be up to
>>>>>>>> the
>>>>>>>> other GCs to make their worker available
>>>>>>>> or cheat with a separate workgang.
>>>>>>> I can do that, I don't mind. The question is, do we want that?
>>>>>> The problem is that we do not want to haste such decision, we
>>>>>> believe
>>>>>> there is a better solution.
>>>>>> I think you also would want another solution.
>>>>>> But it's seems like such solution with 1 'global' thread pool either
>>>>>> own by GC or the VM it self is quite the undertaking.
>>>>>> Since this probably will not be done any time soon my suggestion is,
>>>>>> to not hold you back (we also want this), just to make
>>>>>> the code parallel and as an intermediate step ask the GC if it minds
>>>>>> sharing it's thread.
>>>>>>
>>>>>> Now when Shenandoah is merged it's possible that e.g. G1 will share
>>>>>> the code for a separate thread pool, do something of it's own or
>>>>>> wait until the bigger question about thread pool(s) have been
>>>>>> resolved.
>>>>>>
>>>>>> By adding a thread pool directly to the SafepointSynchronizer and
>>>>>> flags for it we might limit our future options.
>>>>>>
>>>>>>> I wouldn't call it 'cheating with a separate workgang' though. I
>>>>>>> see
>>>>>>> that both G1 and CMS suspend their worker threads at a safepoint.
>>>>>>> However:
>>>>>> Yes it's not cheating but I want decent heuristics between e.g.
>>>>>> number
>>>>>> of concurrent marking threads and parallel safepoint threads since
>>>>>> they compete for cpu time.
>>>>>> As the code looks now, I think that decisions must be made by the
>>>>>> GC.
>>>>> Ok, I see your point. I updated the proposed patch accordingly:
>>>>>
>>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.06/
>>>>> <http://cr.openjdk.java.net/%7Erkennke/8180932/webrev.06/>
>>>> Oops. Minor mistake there. Correction:
>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.07/
>>>> <http://cr.openjdk.java.net/%7Erkennke/8180932/webrev.07/>
>>>>
>>>> (Removed 'class WorkGang' from safepoint.hpp, and forgot to add it
>>>> into
>>>> collectedHeap.hpp, resulting in build failure...)
>>>>
>>>> Roman
>>>>
>>