RFR: Parallelize safepoint cleanup

Tue Jun 27 14:51:45 UTC 2017

Hi Roman,

There is something wrong in calculations:
INFO: Deflate: InCirc=43 InUse=18 Scavenged=25 ForceMonitorScavenge=0 : pop=27051 free=215487

free is larger than population, have not had the time to dig into this.

Thanks, Robbin

On 06/22/2017 10:19 PM, Roman Kennke wrote:
> So here's the latest iteration of that patch:
> 
> http://cr.openjdk.java.net/~rkennke/8180932/webrev.08/
> <http://cr.openjdk.java.net/%7Erkennke/8180932/webrev.08/>
> 
> I checked and fixed all the counters. The problem here is that they are
> not updated in a single place (deflate_idle_monitors() ) but in several
> places, potentially by multiple threads. I split up deflation into
> prepare_.. and a finish_.. methods to initialize local and update global
> counters respectively, and pass around a counters object (allocated on
> stack) to the various code paths that use it. Updating the counters
> always happen under a lock, there's no need to do anything special with
> regards to concurrency.
> 
> I also checked the nmethod marking, but there doesn't seem to be
> anything in that code that looks problematic under concurrency. The
> worst that can happen is that two threads write the same value into an
> nmethod field. I think we can live with that ;-)
> 
> Good to go?
> 
> Tested by running specjvm and jcstress fastdebug+release without issues.
> 
> Roman
> 
> Am 02.06.2017 um 12:39 schrieb Robbin Ehn:
>> Hi Roman,
>>
>> On 06/02/2017 11:41 AM, Roman Kennke wrote:
>>> Hi David,
>>> thanks for reviewing. I'll be on vacation the next two weeks too, with
>>> only sporadic access to work stuff.
>>> Yes, exposure will not be as good as otherwise, but it's not totally
>>> untested either: the serial code path is the same as the parallel, the
>>> only difference is that it's not actually called by multiple threads.
>>> It's ok I think.
>>>
>>> I found two more issues that I think should be addressed:
>>> - There are some counters in deflate_idle_monitors() and I'm not sure I
>>> correctly handle them in the split-up and MT'ed thread-local/ global
>>> list deflation
>>> - nmethod marking seems to unconditionally poke true or something like
>>> that in nmethod fields. This doesn't hurt correctness-wise, but it's
>>> probably worth checking if it's already true, especially when doing this
>>> with multiple threads concurrently.
>>>
>>> I'll send an updated patch around later, I hope I can get to it today...
>>
>> I'll review that when you get it out.
>> I think this looks as a reasonable step before we tackle this with a
>> major effort, such as the JEP you and Carsten doing.
>> And another effort to 'fix' nmethods marking.
>>
>> Internal discussion yesterday lead us to conclude that the runtime
>> will probably need more threads.
>> This would be a good driver to do a 'global' worker pool which serves
>> both gc, runtime and safepoints with threads.
>>
>>>
>>> Roman
>>>
>>>> Hi Roman,
>>>>
>>>> I am about to disappear on an extended vacation so will let others
>>>> pursue this. IIUC this is longer an opt-in by the user at runtime, but
>>>> an opt-in by the particular GC developers. Okay. My only concern with
>>>> that is if Shenandoah is the only GC that currently opts in then this
>>>> code is not going to get much testing and will be more prone to
>>>> incidental breakage.
>>
>> As I mentioned before, it seem like Erik Ö have some idea, maybe he
>> can do this after his barrier patch.
>>
>> Thanks!
>>
>> /Robbin
>>
>>>>
>>>> Cheers,
>>>> David
>>>>
>>>> On 2/06/2017 2:21 AM, Roman Kennke wrote:
>>>>> Am 01.06.2017 um 17:50 schrieb Roman Kennke:
>>>>>> Am 01.06.2017 um 14:18 schrieb Robbin Ehn:
>>>>>>> Hi Roman,
>>>>>>>
>>>>>>> On 06/01/2017 11:29 AM, Roman Kennke wrote:
>>>>>>>> Am 31.05.2017 um 22:06 schrieb Robbin Ehn:
>>>>>>>>> Hi Roman, I agree that is really needed but:
>>>>>>>>>
>>>>>>>>> On 05/31/2017 10:27 AM, Roman Kennke wrote:
>>>>>>>>>> I realized that sharing workers with GC is not so easy.
>>>>>>>>>>
>>>>>>>>>> We need to be able to use the workers at a safepoint during
>>>>>>>>>> concurrent
>>>>>>>>>> GC work (which also uses the same workers). This does not only
>>>>>>>>>> require
>>>>>>>>>> that those workers be suspended, like e.g.
>>>>>>>>>> SuspendibleThreadSet::yield(), but they need to be idle, i.e.
>>>>>>>>>> have
>>>>>>>>>> finished their tasks. This needs some careful handling to work
>>>>>>>>>> without
>>>>>>>>>> races: it requires a SuspendibleThreadSetJoiner around the
>>>>>>>>>> corresponding
>>>>>>>>>> run_task() call and also the tasks themselves need to join the
>>>>>>>>>> STS and
>>>>>>>>>> handle requests for safepoints not by yielding, but by leaving
>>>>>>>>>> the
>>>>>>>>>> task.
>>>>>>>>>> This is far too peculiar for me to make the call to hook up GC
>>>>>>>>>> workers
>>>>>>>>>> for safepoint cleanup, and I thus removed those parts. I left the
>>>>>>>>>> API in
>>>>>>>>>> CollectedHeap in place. I think GC devs who know better about G1
>>>>>>>>>> and CMS
>>>>>>>>>> should make that call, or else just use a separate thread pool.
>>>>>>>>>>
>>>>>>>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.05/
>>>>>>>>>> <http://cr.openjdk.java.net/%7Erkennke/8180932/webrev.05/>
>>>>>>>>>>
>>>>>>>>>> Is it ok now?
>>>>>>>>> I still think you should put the "Parallel Safepoint Cleanup"
>>>>>>>>> workers
>>>>>>>>> inside Shenandoah,
>>>>>>>>> so the SafepointSynchronizer only calls get_safepoint_workers,
>>>>>>>>> e.g.:
>>>>>>>>>
>>>>>>>>> _cleanup_workers = heap->get_safepoint_workers();
>>>>>>>>> _num_cleanup_workers = _cleanup_workers != NULL ?
>>>>>>>>> _cleanup_workers->total_workers() : 1;
>>>>>>>>> ParallelSPCleanupTask cleanup(_cleanup_subtasks);
>>>>>>>>> StrongRootsScope srs(_num_cleanup_workers);
>>>>>>>>> if (_cleanup_workers != NULL) {
>>>>>>>>>       _cleanup_workers->run_task(&cleanup, _num_cleanup_workers);
>>>>>>>>> } else {
>>>>>>>>>       cleanup.work(0);
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> That way you don't even need your new flags, but it will be up to
>>>>>>>>> the
>>>>>>>>> other GCs to make their worker available
>>>>>>>>> or cheat with a separate workgang.
>>>>>>>> I can do that, I don't mind. The question is, do we want that?
>>>>>>> The problem is that we do not want to haste such decision, we
>>>>>>> believe
>>>>>>> there is a better solution.
>>>>>>> I think you also would want another solution.
>>>>>>> But it's seems like such solution with 1 'global' thread pool either
>>>>>>> own by GC or the VM it self is quite the undertaking.
>>>>>>> Since this probably will not be done any time soon my suggestion is,
>>>>>>> to not hold you back (we also want this), just to make
>>>>>>> the code parallel and as an intermediate step ask the GC if it minds
>>>>>>> sharing it's thread.
>>>>>>>
>>>>>>> Now when Shenandoah is merged it's possible that e.g. G1 will share
>>>>>>> the code for a separate thread pool, do something of it's own or
>>>>>>> wait until the bigger question about thread pool(s) have been
>>>>>>> resolved.
>>>>>>>
>>>>>>> By adding a thread pool directly to the SafepointSynchronizer and
>>>>>>> flags for it we might limit our future options.
>>>>>>>
>>>>>>>> I wouldn't call it 'cheating with a separate workgang' though. I
>>>>>>>> see
>>>>>>>> that both G1 and CMS suspend their worker threads at a safepoint.
>>>>>>>> However:
>>>>>>> Yes it's not cheating but I want decent heuristics between e.g.
>>>>>>> number
>>>>>>> of concurrent marking threads and parallel safepoint threads since
>>>>>>> they compete for cpu time.
>>>>>>> As the code looks now, I think that decisions must be made by the
>>>>>>> GC.
>>>>>> Ok, I see your point. I updated the proposed patch accordingly:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.06/
>>>>>> <http://cr.openjdk.java.net/%7Erkennke/8180932/webrev.06/>
>>>>> Oops. Minor mistake there. Correction:
>>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.07/
>>>>> <http://cr.openjdk.java.net/%7Erkennke/8180932/webrev.07/>
>>>>>
>>>>> (Removed 'class WorkGang' from safepoint.hpp, and forgot to add it
>>>>> into
>>>>> collectedHeap.hpp, resulting in build failure...)
>>>>>
>>>>> Roman
>>>>>
>>>
>