RFR: Parallelize safepoint cleanup

Robbin Ehn robbin.ehn at oracle.com
Thu Jun 1 12:18:23 UTC 2017


Hi Roman,

On 06/01/2017 11:29 AM, Roman Kennke wrote:
> Am 31.05.2017 um 22:06 schrieb Robbin Ehn:
>> Hi Roman, I agree that this is really needed, but:
>>
>> On 05/31/2017 10:27 AM, Roman Kennke wrote:
>>> I realized that sharing workers with GC is not so easy.
>>>
>>> We need to be able to use the workers at a safepoint during concurrent
>>> GC work (which also uses the same workers). This requires not only
>>> that those workers be suspended, as with e.g.
>>> SuspendibleThreadSet::yield(), but that they be idle, i.e. have
>>> finished their tasks. This needs some careful handling to work without
>>> races: it requires a SuspendibleThreadSetJoiner around the corresponding
>>> run_task() call and also the tasks themselves need to join the STS and
>>> handle requests for safepoints not by yielding, but by leaving the task.
>>> This is far too peculiar for me to make the call to hook up GC workers
>>> for safepoint cleanup, and I thus removed those parts. I left the API in
>>> CollectedHeap in place. I think GC devs who know better about G1 and CMS
>>> should make that call, or else just use a separate thread pool.
>>>
>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.05/
>>>
>>> Is it ok now?
>>
>> I still think you should put the "Parallel Safepoint Cleanup" workers
>> inside Shenandoah,
>> so the SafepointSynchronizer only calls get_safepoint_workers, e.g.:
>>
>> _cleanup_workers = heap->get_safepoint_workers();
>> _num_cleanup_workers = _cleanup_workers != NULL ?
>> _cleanup_workers->total_workers() : 1;
>> ParallelSPCleanupTask cleanup(_cleanup_subtasks);
>> StrongRootsScope srs(_num_cleanup_workers);
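>> // Run the cleanup task on the shared gang if the GC provided one,
>> // otherwise do all the cleanup work serially in this thread: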
>> if (_cleanup_workers != NULL) {
>>    _cleanup_workers->run_task(&cleanup, _num_cleanup_workers);
>> } else {
>>    cleanup.work(0);
>> }
>>
>> That way you don't even need your new flags, but it will be up to the
>> other GCs to make their workers available
>> or cheat with a separate workgang.
> I can do that, I don't mind. The question is, do we want that?

The problem is that we do not want to rush such a decision; we believe there is a better solution.
I think you would want another solution as well.
But such a solution, with one 'global' thread pool owned either by the GC or by the VM itself, seems like quite an undertaking.
Since that will probably not be done any time soon, my suggestion is, so as not to hold you back (we also want this), to just make
the code parallel and, as an intermediate step, ask the GC if it minds sharing its threads.
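
For reference, the CollectedHeap side of that handshake can stay tiny. A
sketch (get_safepoint_workers is the name from your webrev; the
_safepoint_workers gang in the override is an assumption):

// collectedHeap.hpp (sketch): returning NULL means "no shared
// workers", and the SafepointSynchronizer then falls back to
// serial cleanup.
virtual WorkGang* get_safepoint_workers() { return NULL; }

// A GC willing to share its gang overrides it, e.g. in Shenandoah:
virtual WorkGang* get_safepoint_workers() { return _safepoint_workers; }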

Then, when Shenandoah is merged, it's possible that e.g. G1 will share the code for a separate thread pool, do something of its own, or
wait until the bigger question about thread pool(s) has been resolved.

By adding a thread pool, and flags for it, directly to the SafepointSynchronizer, we might limit our future options.

> I wouldn't call it 'cheating with a separate workgang' though. I see
> that both G1 and CMS suspend their worker threads at a safepoint. However:

Yes, it's not cheating, but I want decent heuristics for balancing e.g. the number of concurrent marking threads against the parallel safepoint threads, since they compete for CPU time.
As the code looks now, I think that decision must be made by the GC.
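
E.g. even a policy as simple as this hypothetical GC-side method
(made-up class and method names; ConcGCThreads is the real flag) would
already be better than a VM-global setting:

// Sketch: never lend the threads that concurrent marking may be
// using, but always offer at least one worker for the cleanup task.
uint SomeGCHeap::max_safepoint_workers() {
  uint total   = _workers->total_workers();
  uint marking = (uint)ConcGCThreads;  // possibly busy with marking
  return total > marking ? total - marking : 1;
}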

> - Do they finish their work, stop, and then restart work after the
> safepoint? Or are the workers simply calling STS::yield() to suspend and
> later resume their work where they left off? If they only call yield()
> (or whatever equivalent in CMS), then this is not enough: the workers
> need to be truly idle in order to be used by the safepoint cleaners.
> - Parallel and serial GC don't have workgangs of their own.
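
To illustrate the first point (a sketch; the task and its helpers are
hypothetical, the SuspendibleThreadSet calls are the real API): a worker
that merely yields is parked inside its task, suspended but not idle:

void SomeConcurrentTask::work(uint worker_id) {
  SuspendibleThreadSetJoiner sts_join;  // joins the STS for this scope
  while (more_work()) {                 // hypothetical helper
    if (SuspendibleThreadSet::should_yield()) {
      // Calling SuspendibleThreadSet::yield() here would block the
      // worker inside the task until the safepoint is over -- it is
      // suspended, but not reusable for safepoint cleanup. To be
      // reusable it must save its position and return, so that the
      // joiner's destructor leaves the STS and the gang goes idle.
      save_position();                  // hypothetical helper
      return;
    }
    do_step();                          // hypothetical helper
  }
}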

I know Erik Ö has been doing some prototyping here; he can probably fill you in.

> 
> So, as far as I can tell, this means that parallel safepoint cleanup
> would only be supported by GCs for which we explicitly implement it,
> after having carefully checked if/how workgangs are suspended at
> safepoints, or by providing GC-internal thread pools. Do we really want
> that?

We know we probably don't want a thread pool in the SafepointSynchronizer itself, and probably not more pools either, but we need to think it through.

Do you agree?

Still, thanks for doing this!

/Robbin

> 
> Roman
> 

