RFR: Parallelize safepoint cleanup

Tue Jun 27 19:47:21 UTC 2017

Hi Robbin,

Ugh. Thanks for catching this.
Problem was that I was accounting the thread-local deflations twice:
once in thread-local processing (basically a leftover from my earlier
attempt to implement this accounting) and then again in
finish_deflate_idle_monitors(). Should be fixed here:

http://cr.openjdk.java.net/~rkennke/8180932/webrev.09/
<http://cr.openjdk.java.net/%7Erkennke/8180932/webrev.09/>

Side question: which jtreg targets do you usually run?

Trying: make test TEST=hotspot_all
gives me *lots* of failures due to missing jcstress stuff (?!)
 And even other subsets seem to depend on several bits and pieces that I
have no idea about.

Roman

Am 27.06.2017 um 16:51 schrieb Robbin Ehn:
> Hi Roman,
>
> There is something wrong in calculations:
> INFO: Deflate: InCirc=43 InUse=18 Scavenged=25 ForceMonitorScavenge=0
> : pop=27051 free=215487
>
> free is larger than population, have not had the time to dig into this.
>
> Thanks, Robbin
>
> On 06/22/2017 10:19 PM, Roman Kennke wrote:
>> So here's the latest iteration of that patch:
>>
>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.08/
>> <http://cr.openjdk.java.net/%7Erkennke/8180932/webrev.08/>
>>
>> I checked and fixed all the counters. The problem here is that they are
>> not updated in a single place (deflate_idle_monitors() ) but in several
>> places, potentially by multiple threads. I split up deflation into
>> prepare_.. and a finish_.. methods to initialize local and update global
>> counters respectively, and pass around a counters object (allocated on
>> stack) to the various code paths that use it. Updating the counters
>> always happen under a lock, there's no need to do anything special with
>> regards to concurrency.
>>
>> I also checked the nmethod marking, but there doesn't seem to be
>> anything in that code that looks problematic under concurrency. The
>> worst that can happen is that two threads write the same value into an
>> nmethod field. I think we can live with that ;-)
>>
>> Good to go?
>>
>> Tested by running specjvm and jcstress fastdebug+release without issues.
>>
>> Roman
>>
>> Am 02.06.2017 um 12:39 schrieb Robbin Ehn:
>>> Hi Roman,
>>>
>>> On 06/02/2017 11:41 AM, Roman Kennke wrote:
>>>> Hi David,
>>>> thanks for reviewing. I'll be on vacation the next two weeks too, with
>>>> only sporadic access to work stuff.
>>>> Yes, exposure will not be as good as otherwise, but it's not totally
>>>> untested either: the serial code path is the same as the parallel, the
>>>> only difference is that it's not actually called by multiple threads.
>>>> It's ok I think.
>>>>
>>>> I found two more issues that I think should be addressed:
>>>> - There are some counters in deflate_idle_monitors() and I'm not
>>>> sure I
>>>> correctly handle them in the split-up and MT'ed thread-local/ global
>>>> list deflation
>>>> - nmethod marking seems to unconditionally poke true or something like
>>>> that in nmethod fields. This doesn't hurt correctness-wise, but it's
>>>> probably worth checking if it's already true, especially when doing
>>>> this
>>>> with multiple threads concurrently.
>>>>
>>>> I'll send an updated patch around later, I hope I can get to it
>>>> today...
>>>
>>> I'll review that when you get it out.
>>> I think this looks as a reasonable step before we tackle this with a
>>> major effort, such as the JEP you and Carsten doing.
>>> And another effort to 'fix' nmethods marking.
>>>
>>> Internal discussion yesterday lead us to conclude that the runtime
>>> will probably need more threads.
>>> This would be a good driver to do a 'global' worker pool which serves
>>> both gc, runtime and safepoints with threads.
>>>
>>>>
>>>> Roman
>>>>
>>>>> Hi Roman,
>>>>>
>>>>> I am about to disappear on an extended vacation so will let others
>>>>> pursue this. IIUC this is longer an opt-in by the user at runtime,
>>>>> but
>>>>> an opt-in by the particular GC developers. Okay. My only concern with
>>>>> that is if Shenandoah is the only GC that currently opts in then this
>>>>> code is not going to get much testing and will be more prone to
>>>>> incidental breakage.
>>>
>>> As I mentioned before, it seem like Erik Ö have some idea, maybe he
>>> can do this after his barrier patch.
>>>
>>> Thanks!
>>>
>>> /Robbin
>>>
>>>>>
>>>>> Cheers,
>>>>> David
>>>>>
>>>>> On 2/06/2017 2:21 AM, Roman Kennke wrote:
>>>>>> Am 01.06.2017 um 17:50 schrieb Roman Kennke:
>>>>>>> Am 01.06.2017 um 14:18 schrieb Robbin Ehn:
>>>>>>>> Hi Roman,
>>>>>>>>
>>>>>>>> On 06/01/2017 11:29 AM, Roman Kennke wrote:
>>>>>>>>> Am 31.05.2017 um 22:06 schrieb Robbin Ehn:
>>>>>>>>>> Hi Roman, I agree that is really needed but:
>>>>>>>>>>
>>>>>>>>>> On 05/31/2017 10:27 AM, Roman Kennke wrote:
>>>>>>>>>>> I realized that sharing workers with GC is not so easy.
>>>>>>>>>>>
>>>>>>>>>>> We need to be able to use the workers at a safepoint during
>>>>>>>>>>> concurrent
>>>>>>>>>>> GC work (which also uses the same workers). This does not only
>>>>>>>>>>> require
>>>>>>>>>>> that those workers be suspended, like e.g.
>>>>>>>>>>> SuspendibleThreadSet::yield(), but they need to be idle, i.e.
>>>>>>>>>>> have
>>>>>>>>>>> finished their tasks. This needs some careful handling to work
>>>>>>>>>>> without
>>>>>>>>>>> races: it requires a SuspendibleThreadSetJoiner around the
>>>>>>>>>>> corresponding
>>>>>>>>>>> run_task() call and also the tasks themselves need to join the
>>>>>>>>>>> STS and
>>>>>>>>>>> handle requests for safepoints not by yielding, but by leaving
>>>>>>>>>>> the
>>>>>>>>>>> task.
>>>>>>>>>>> This is far too peculiar for me to make the call to hook up GC
>>>>>>>>>>> workers
>>>>>>>>>>> for safepoint cleanup, and I thus removed those parts. I
>>>>>>>>>>> left the
>>>>>>>>>>> API in
>>>>>>>>>>> CollectedHeap in place. I think GC devs who know better
>>>>>>>>>>> about G1
>>>>>>>>>>> and CMS
>>>>>>>>>>> should make that call, or else just use a separate thread pool.
>>>>>>>>>>>
>>>>>>>>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.05/
>>>>>>>>>>> <http://cr.openjdk.java.net/%7Erkennke/8180932/webrev.05/>
>>>>>>>>>>>
>>>>>>>>>>> Is it ok now?
>>>>>>>>>> I still think you should put the "Parallel Safepoint Cleanup"
>>>>>>>>>> workers
>>>>>>>>>> inside Shenandoah,
>>>>>>>>>> so the SafepointSynchronizer only calls get_safepoint_workers,
>>>>>>>>>> e.g.:
>>>>>>>>>>
>>>>>>>>>> _cleanup_workers = heap->get_safepoint_workers();
>>>>>>>>>> _num_cleanup_workers = _cleanup_workers != NULL ?
>>>>>>>>>> _cleanup_workers->total_workers() : 1;
>>>>>>>>>> ParallelSPCleanupTask cleanup(_cleanup_subtasks);
>>>>>>>>>> StrongRootsScope srs(_num_cleanup_workers);
>>>>>>>>>> if (_cleanup_workers != NULL) {
>>>>>>>>>>       _cleanup_workers->run_task(&cleanup,
>>>>>>>>>> _num_cleanup_workers);
>>>>>>>>>> } else {
>>>>>>>>>>       cleanup.work(0);
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> That way you don't even need your new flags, but it will be
>>>>>>>>>> up to
>>>>>>>>>> the
>>>>>>>>>> other GCs to make their worker available
>>>>>>>>>> or cheat with a separate workgang.
>>>>>>>>> I can do that, I don't mind. The question is, do we want that?
>>>>>>>> The problem is that we do not want to haste such decision, we
>>>>>>>> believe
>>>>>>>> there is a better solution.
>>>>>>>> I think you also would want another solution.
>>>>>>>> But it's seems like such solution with 1 'global' thread pool
>>>>>>>> either
>>>>>>>> own by GC or the VM it self is quite the undertaking.
>>>>>>>> Since this probably will not be done any time soon my
>>>>>>>> suggestion is,
>>>>>>>> to not hold you back (we also want this), just to make
>>>>>>>> the code parallel and as an intermediate step ask the GC if it
>>>>>>>> minds
>>>>>>>> sharing it's thread.
>>>>>>>>
>>>>>>>> Now when Shenandoah is merged it's possible that e.g. G1 will
>>>>>>>> share
>>>>>>>> the code for a separate thread pool, do something of it's own or
>>>>>>>> wait until the bigger question about thread pool(s) have been
>>>>>>>> resolved.
>>>>>>>>
>>>>>>>> By adding a thread pool directly to the SafepointSynchronizer and
>>>>>>>> flags for it we might limit our future options.
>>>>>>>>
>>>>>>>>> I wouldn't call it 'cheating with a separate workgang' though. I
>>>>>>>>> see
>>>>>>>>> that both G1 and CMS suspend their worker threads at a safepoint.
>>>>>>>>> However:
>>>>>>>> Yes it's not cheating but I want decent heuristics between e.g.
>>>>>>>> number
>>>>>>>> of concurrent marking threads and parallel safepoint threads since
>>>>>>>> they compete for cpu time.
>>>>>>>> As the code looks now, I think that decisions must be made by the
>>>>>>>> GC.
>>>>>>> Ok, I see your point. I updated the proposed patch accordingly:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.06/
>>>>>>> <http://cr.openjdk.java.net/%7Erkennke/8180932/webrev.06/>
>>>>>> Oops. Minor mistake there. Correction:
>>>>>> http://cr.openjdk.java.net/~rkennke/8180932/webrev.07/
>>>>>> <http://cr.openjdk.java.net/%7Erkennke/8180932/webrev.07/>
>>>>>>
>>>>>> (Removed 'class WorkGang' from safepoint.hpp, and forgot to add it
>>>>>> into
>>>>>> collectedHeap.hpp, resulting in build failure...)
>>>>>>
>>>>>> Roman
>>>>>>
>>>>
>>