RFR: Parallelize safepoint cleanup

Robbin Ehn robbin.ehn at oracle.com
Tue Jul 4 07:11:50 UTC 2017


Hi Roman,

On 07/03/2017 04:39 PM, Roman Kennke wrote:
> Hi Robbin,
> 
> does this require another review? I am not sure about David Holmes?

David is back in August; I think he was pretty okay with it, but I think we should get another review.
Most of our people had an extended weekend and are back tomorrow.
I'm off soon for 5 weeks and I really want this to be pushed before that.

> 
> If not, I'm going to need a sponsor.

I will of course take care of that!

/Robbin

> 
> Thanks and cheers,
> Roman
> 
> On 29.06.2017 at 21:27, Robbin Ehn wrote:
>> Hi Roman,
>>
>> Thanks,
>>
>> There seems to be a performance gain vs. the old code just running in the VM thread
>> (again shaky numbers, but an indication):
>>
>> Old code with,   MonitorUsedDeflationThreshold=0, 0.003099s, avg of 10 worst cleanups 0.0213s
>> Do it VM thread, MonitorUsedDeflationThreshold=0, 0.002782s, avg of 10 worst cleanups 0.0173s
>>
>> I'm assuming that combining deflation and nmethod marking in the same
>> pass is the reason for this.
>> Great!
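>>
>> (A rough sketch of what "one pass" means here -- illustrative only, the
>> helper names are not necessarily the ones in the webrev: each worker
>> visits a claimed JavaThread once and performs both cleanup actions per
>> visited thread, instead of two full walks over the thread list.)
>>
>>   class CombinedCleanupClosure : public ThreadClosure {
>>     CodeBlobClosure* _nmethod_marker;  // marks nmethods found on stacks
>>    public:
>>     CombinedCleanupClosure(CodeBlobClosure* cl) : _nmethod_marker(cl) {}
>>     void do_thread(Thread* thread) {
>>       deflate_thread_local_monitors(thread); // 1) deflate this thread's in-use monitor list
>>       thread->nmethods_do(_nmethod_marker);  // 2) mark nmethods on this thread's stack
>>     }
>>   };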
>>
>> I'm happy, looks good!
>>
>> Thanks for fixing!
>>
>> /Robbin
>>
>> On 06/29/2017 08:25 PM, Roman Kennke wrote:
>>> I just did a run with gcbench.
>>> I am running:
>>>
>>> build/linux-x86_64-normal-server-release/images/jdk/bin/java -jar
>>> target/benchmarks.jar roots.Sync --jvmArgs "-Xmx8g -Xms8g
>>> -XX:ParallelSafepointCleanupThreads=1 -XX:-UseBiasedLocking --add-opens
>>> java.base/jdk.internal.misc=ALL-UNNAMED -XX:+PrintSafepointStatistics"
>>> -p size=500000 -wi 5 -i 5 -f 1
>>>
>>> i.e. I am giving it 500,000 monitors per thread on 8 java threads.
>>>
>>> with VMThread I am getting:
>>>
>>>             vmop                            [ threads:    total initially_running wait_to_block ][ time:    spin   block    sync cleanup    vmop ] page_trap_count
>>>      0,646: G1IncCollectionPause            [ 19   4   6 ][ 0   0   0   158   225 ]   4
>>>      1,073: G1IncCollectionPause            [ 19   5   6 ][ 1   0   1   159   174 ]   5
>>>      1,961: G1IncCollectionPause            [ 19   2   6 ][ 0   0   0   130    66 ]   2
>>>      2,202: G1IncCollectionPause            [ 19   5   6 ][ 1   0   1   127    70 ]   5
>>>      2,445: G1IncCollectionPause            [ 19   7   7 ][ 1   0   1   127    66 ]   7
>>>      2,684: G1IncCollectionPause            [ 19   7   7 ][ 1   0   1   127    66 ]   7
>>>      3,371: G1IncCollectionPause            [ 19   5   7 ][ 1   0   1   127    74 ]   5
>>>      3,619: G1IncCollectionPause            [ 19   5   6 ][ 1   0   1   127    66 ]   5
>>>      3,857: G1IncCollectionPause            [ 19   6   6 ][ 1   0   1   126    68 ]   6
>>>
>>> I.e. it gets to fairly consistent >120us for cleanup.
>>>
>>> With 4 safepoint cleanup threads I get:
>>>
>>>
>>>             vmop                            [ threads:    total initially_running wait_to_block ][ time:    spin   block    sync cleanup    vmop ] page_trap_count
>>>      0,650: G1IncCollectionPause            [ 19   4   6 ][ 0   0   0    63   197 ]   4
>>>      0,951: G1IncCollectionPause            [ 19   0   1 ][ 0   0   0    64   151 ]   0
>>>      1,214: G1IncCollectionPause            [ 19   7   8 ][ 0   0   0    62    93 ]   6
>>>      1,942: G1IncCollectionPause            [ 19   4   6 ][ 1   0   1    59    71 ]   4
>>>      2,118: G1IncCollectionPause            [ 19   6   6 ][ 1   0   1    59    72 ]   6
>>>      2,296: G1IncCollectionPause            [ 19   5   6 ][ 0   0   0    59    69 ]   5
>>>
>>> i.e. fairly consistently around 60 us (I think it's us?!)
>>>
>>> I grant you that I'm throwing way way more monitors at it. With just
>>> 12000 monitors per thread I get columns of 0s under cleanup. :-)
>>>
>>> Roman
>>>
>>> On 29.06.2017 at 14:17, Robbin Ehn wrote:
>>>> The test is using 24 threads (whatever that means); the total number of
>>>> Java threads is 57 (including compiler threads, etc.).
>>>>
>>>> [29.186s][error][os       ] Num threads:57
>>>> [29.186s][error][os       ] omInUseCount:0
>>>> [29.186s][error][os       ] omInUseCount:2064
>>>> [29.187s][error][os       ] omInUseCount:1861
>>>> [29.188s][error][os       ] omInUseCount:1058
>>>> [29.188s][error][os       ] omInUseCount:2
>>>> [29.188s][error][os       ] omInUseCount:577
>>>> [29.189s][error][os       ] omInUseCount:1443
>>>> [29.189s][error][os       ] omInUseCount:122
>>>> [29.189s][error][os       ] omInUseCount:47
>>>> [29.189s][error][os       ] omInUseCount:497
>>>> [29.189s][error][os       ] omInUseCount:16
>>>> [29.189s][error][os       ] omInUseCount:113
>>>> [29.189s][error][os       ] omInUseCount:5
>>>> [29.189s][error][os       ] omInUseCount:678
>>>> [29.190s][error][os       ] omInUseCount:105
>>>> [29.190s][error][os       ] omInUseCount:609
>>>> [29.190s][error][os       ] omInUseCount:286
>>>> [29.190s][error][os       ] omInUseCount:228
>>>> [29.190s][error][os       ] omInUseCount:1391
>>>> [29.191s][error][os       ] omInUseCount:1652
>>>> [29.191s][error][os       ] omInUseCount:325
>>>> [29.191s][error][os       ] omInUseCount:439
>>>> [29.192s][error][os       ] omInUseCount:994
>>>> [29.192s][error][os       ] omInUseCount:103
>>>> [29.192s][error][os       ] omInUseCount:2337
>>>> [29.193s][error][os       ] omInUseCount:0
>>>> [29.193s][error][os       ] omInUseCount:0
>>>> [29.193s][error][os       ] omInUseCount:0
>>>> [29.193s][error][os       ] omInUseCount:1
>>>> [29.193s][error][os       ] omInUseCount:1
>>>> [29.193s][error][os       ] omInUseCount:0
>>>> [29.193s][error][os       ] omInUseCount:1
>>>> [29.193s][error][os       ] omInUseCount:2
>>>> [29.193s][error][os       ] omInUseCount:1
>>>> [29.193s][error][os       ] omInUseCount:1
>>>> [29.193s][error][os       ] omInUseCount:0
>>>> [29.193s][error][os       ] omInUseCount:1
>>>> [29.193s][error][os       ] omInUseCount:0
>>>> [29.193s][error][os       ] omInUseCount:0
>>>> [29.193s][error][os       ] omInUseCount:0
>>>> [29.193s][error][os       ] omInUseCount:0
>>>> [29.193s][error][os       ] omInUseCount:0
>>>> [29.193s][error][os       ] omInUseCount:0
>>>> [29.193s][error][os       ] omInUseCount:0
>>>> [29.193s][error][os       ] omInUseCount:0
>>>> [29.193s][error][os       ] omInUseCount:0
>>>> [29.193s][error][os       ] omInUseCount:0
>>>> [29.193s][error][os       ] omInUseCount:0
>>>> [29.193s][error][os       ] omInUseCount:0
>>>> [29.193s][error][os       ] omInUseCount:0
>>>> [29.193s][error][os       ] omInUseCount:0
>>>> [29.193s][error][os       ] omInUseCount:0
>>>> [29.193s][error][os       ] omInUseCount:0
>>>> [29.193s][error][os       ] omInUseCount:0
>>>> [29.193s][error][os       ] omInUseCount:1
>>>> [29.193s][error][os       ] omInUseCount:0
>>>> [29.193s][error][os       ] omInUseCount:0
>>>>
>>>> So in my setup, even if you parallelize the per-thread in-use monitor
>>>> work, the synchronization overhead is still larger.
>>>>
>>>> /Robbin
>>>>
>>>> On 06/29/2017 01:42 PM, Roman Kennke wrote:
>>>>> How many Java threads are involved in monitor inflation?
>>>>> Parallelization is spread across Java threads (i.e. each worker claims
>>>>> and deflates the monitors of one Java thread per step).
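>>>>>
>>>>> (A rough sketch of that claiming scheme -- illustrative only, not the
>>>>> exact webrev code: workers race on an atomic index, and each claimed
>>>>> index maps to one Java thread whose in-use list is then deflated.)
>>>>>
>>>>>   static volatile jint _claim_index = 0;
>>>>>
>>>>>   void deflate_claimed_threads(JavaThread** threads, int n) {
>>>>>     for (;;) {
>>>>>       jint i = Atomic::add(1, &_claim_index) - 1; // claim the next slot
>>>>>       if (i >= n) break;                          // all threads claimed
>>>>>       deflate_thread_local_monitors(threads[i]);  // stand-in for the per-thread step
>>>>>     }
>>>>>   }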
>>>>>
>>>>> Roman
>>>>>
>>>>> On 29 June 2017 at 12:49:58 CEST, Robbin Ehn
>>>>> <robbin.ehn at oracle.com> wrote:
>>>>>
>>>>>       Hi Roman,
>>>>>
>>>>>       I haven't had the time to test all scenarios, and the numbers are
>>>>> just an indication:
>>>>>
>>>>>       Do it VM thread, MonitorUsedDeflationThreshold=0, 0.002782s avg, avg of 10 worst cleanups 0.0173s
>>>>>       Do it 4 workers, MonitorUsedDeflationThreshold=0, 0.002923s avg, avg of 10 worst cleanups 0.0199s
>>>>>       Do it VM thread, MonitorUsedDeflationThreshold=1, 0.001889s avg, avg of 10 worst cleanups 0.0066s
>>>>>
>>>>>       When MonitorUsedDeflationThreshold=0 we are talking about 120000 free monitors to deflate.
>>>>>       And I get worse numbers doing the cleanup with 4 threads.
>>>>>
>>>>>       Any idea why I see these numbers?
>>>>>
>>>>>       Thanks, Robbin
>>>>>
>>>>>       On 06/28/2017 10:23 PM, Roman Kennke wrote:
>>>>>
>>>>>
>>>>>
>>>>>               On 06/27/2017 09:47 PM, Roman Kennke wrote:
>>>>>
>>>>>                   Hi Robbin,
>>>>>
>>>>>                   Ugh. Thanks for catching this.
>>>>>                   The problem was that I was accounting the thread-local deflations twice:
>>>>>                   once in thread-local processing (basically a leftover from my earlier
>>>>>                   attempt to implement this accounting) and then again in
>>>>>                   finish_deflate_idle_monitors(). Should be fixed here:
>>>>>
>>>>>                   http://cr.openjdk.java.net/~rkennke/8180932/webrev.09/
>>>>>
>>>>>
>>>>>               Nit:
>>>>>               safepoint.cpp : ParallelSPCleanupTask
>>>>>               "const char* name = " is not needed and 1 is unused
>>>>>
>>>>>
>>>>>           Sorry, I don't understand what you mean by this. I see code
>>>>> like this:
>>>>>
>>>>>           const char* name = "deflating idle monitors";
>>>>>
>>>>>           and it is used a few lines below, even 2x.
>>>>>
>>>>>           What's '1 is unused' ?
>>>>>
>>>>>
>>>>>                   Side question: which jtreg targets do you usually
>>>>> run?
>>>>>
>>>>>
>>>>>               Right now I cherry pick directories from: hotspot/test/
>>>>>
>>>>>               I'm going to add a decent test group for local testing.
>>>>>
>>>>>           That would be good!
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>                   Trying: make test TEST=hotspot_all
>>>>>                   gives me *lots* of failures due to missing jcstress
>>>>> stuff (?!)
>>>>>                   And even other subsets seem to depend on several bits
>>>>> and pieces
>>>>>                   that I
>>>>>                   have no idea about.
>>>>>
>>>>>
>>>>>               Yes, you need to use the internal tool 'jib' (java integrated build) to get
>>>>>               that to work, or you can set up some environment where the jcstress
>>>>>               application stuff is...
>>>>>
>>>>>           Uhhh. We really do want a subset of tests that we can run
>>>>> reliably and
>>>>>           that are self-contained, how else are people (without that
>>>>> jib thingy)
>>>>>           supposed to do some sanity checking with their patches? ;-)
>>>>>
>>>>>               I have a regression on ClassLoaderData root scanning; this should not
>>>>>               be related, but I only have 3 patches which could cause this, if it's not
>>>>>               something in the environment that has changed.
>>>>>
>>>>>           Let me know if it's my patch :-)
>>>>>
>>>>>
>>>>>               Also, I do not see any immediate performance gains (off vs. 4 threads); it
>>>>>               might be http://hg.openjdk.java.net/jdk10/hs/hotspot/rev/06994badeb24,
>>>>>               but I need to do some more testing. I know you often run with a
>>>>>               non-default GC.
>>>>>
>>>>>
>>>>>           First of all, during the course of this review I reduced the
>>>>> change from
>>>>>           an actual implementation to a kind of framework, and it needs
>>>>> some
>>>>>           separate changes in the GC to make use of it. Not sure if you
>>>>> added
>>>>>           corresponding code in (e.g.) G1?
>>>>>
>>>>>           Also, this is only really visible in code that makes
>>>>> excessive use of
>>>>>           monitors, i.e. the one linked by Carsten's original patch, or
>>>>> the test
>>>>>           org.openjdk.gcbench.roots.Synchronizers.test in gc-bench:
>>>>>
>>>>>           http://icedtea.classpath.org/hg/gc-bench/
>>>>>
>>>>>           There are also some popular real-world apps that tend to do this. Off
>>>>>           the top of my head, Cassandra is such an application.
>>>>>
>>>>>           Thanks, Roman
>>>>>
>>>>>
>>>>>               I'll get back to you.
>>>>>
>>>>>               Thanks, Robbin
>>>>>
>>>>>
>>>>>                   Roman
>>>>>
>>>>>                   On 27.06.2017 at 16:51, Robbin Ehn wrote:
>>>>>
>>>>>                       Hi Roman,
>>>>>
>>>>>                       There is something wrong in the calculations:
>>>>>                       INFO: Deflate: InCirc=43 InUse=18 Scavenged=25 ForceMonitorScavenge=0 : pop=27051 free=215487
>>>>>
>>>>>                       free is larger than the population; I have not had the
>>>>>                       time to dig into this.
>>>>>
>>>>>                       Thanks, Robbin
>>>>>
>>>>>                       On 06/22/2017 10:19 PM, Roman Kennke wrote:
>>>>>
>>>>>                           So here's the latest iteration of that patch:
>>>>>
>>>>>                           http://cr.openjdk.java.net/~rkennke/8180932/webrev.08/
>>>>>
>>>>>                           I checked and fixed all the counters. The problem here is that they are
>>>>>                           not updated in a single place (deflate_idle_monitors()) but in several
>>>>>                           places, potentially by multiple threads. I split up deflation into
>>>>>                           prepare_.. and finish_.. methods to initialize local and update global
>>>>>                           counters respectively, and pass around a counters object (allocated on
>>>>>                           stack) to the various code paths that use it. Updating the counters
>>>>>                           always happens under a lock, so there's no need to do anything special
>>>>>                           with regard to concurrency.
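>>>>>
>>>>>                           (The shape of that, as a sketch only -- the names below are meant to
>>>>>                           suggest the pattern, the webrev is authoritative: a stack-allocated
>>>>>                           counters object is filled locally during deflation and folded into
>>>>>                           the global counters exactly once, under the global list lock.)
>>>>>
>>>>>                             struct DeflateMonitorCounters {
>>>>>                               int nInuse;         // monitors still associated with objects
>>>>>                               int nInCirculation; // extant monitors
>>>>>                               int nScavenged;     // monitors reclaimed in this cleanup
>>>>>                             };
>>>>>
>>>>>                             void prepare_deflate_idle_monitors(DeflateMonitorCounters* c) {
>>>>>                               c->nInuse = 0; c->nInCirculation = 0; c->nScavenged = 0;
>>>>>                             }
>>>>>
>>>>>                             void finish_deflate_idle_monitors(DeflateMonitorCounters* c) {
>>>>>                               // take the global monitor-list lock here (details elided), then
>>>>>                               // publish the thread-local results in this one place:
>>>>>                               gMonitorFreeCount += c->nScavenged;
>>>>>                               // ... release the lock, update/report remaining global statistics ...
>>>>>                             }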
>>>>>
>>>>>                           I also checked the nmethod marking, but there
>>>>> doesn't seem to be
>>>>>                           anything in that code that looks problematic
>>>>> under concurrency. The
>>>>>                           worst that can happen is that two threads
>>>>> write the same value into an
>>>>>                           nmethod field. I think we can live with
>>>>> that ;-)
>>>>>
>>>>>                           Good to go?
>>>>>
>>>>>                           Tested by running specjvm and jcstress
>>>>> fastdebug+release without
>>>>>                           issues.
>>>>>
>>>>>                           Roman
>>>>>
>>>>>                           On 02.06.2017 at 12:39, Robbin Ehn wrote:
>>>>>
>>>>>                               Hi Roman,
>>>>>
>>>>>                               On 06/02/2017 11:41 AM, Roman Kennke
>>>>> wrote:
>>>>>
>>>>>                                   Hi David,
>>>>>                                   thanks for reviewing. I'll be on
>>>>> vacation the next two weeks too,
>>>>>                                   with
>>>>>                                   only sporadic access to work stuff.
>>>>>                                   Yes, exposure will not be as good as
>>>>> otherwise, but it's not totally
>>>>>                                   untested either: the serial code path
>>>>> is the same as the
>>>>>                                   parallel, the
>>>>>                                   only difference is that it's not
>>>>> actually called by multiple
>>>>>                                   threads.
>>>>>                                   It's ok I think.
>>>>>
>>>>>                                   I found two more issues that I think
>>>>> should be addressed:
>>>>>                                   - There are some counters in
>>>>> deflate_idle_monitors() and I'm not
>>>>>                                   sure I
>>>>>                                   correctly handle them in the split-up
>>>>> and MT'ed thread-local/ global
>>>>>                                   list deflation
>>>>>                                   - nmethod marking seems to
>>>>> unconditionally poke true or something
>>>>>                                   like
>>>>>                                   that in nmethod fields. This doesn't
>>>>> hurt correctness-wise, but it's
>>>>>                                   probably worth checking if it's
>>>>> already true, especially when doing
>>>>>                                   this
>>>>>                                   with multiple threads concurrently.
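>>>>>
>>>>>                                   (Roughly what I mean by the check -- hypothetical
>>>>>                                   accessor/setter names: reading before writing keeps
>>>>>                                   several workers from repeatedly storing into the same
>>>>>                                   nmethod field and bouncing its cache line around.)
>>>>>
>>>>>                                     void mark_nmethod(nmethod* nm) {
>>>>>                                       if (!nm->is_marked_for_safepoint()) {  // hypothetical accessor
>>>>>                                         nm->set_marked_for_safepoint(true);  // hypothetical setter
>>>>>                                       }
>>>>>                                     }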
>>>>>
>>>>>                                   I'll send an updated patch around
>>>>> later, I hope I can get to it
>>>>>                                   today...
>>>>>
>>>>>
>>>>>                               I'll review that when you get it out.
>>>>>                               I think this looks like a reasonable step before we tackle this with a
>>>>>                               major effort, such as the JEP you and Carsten are doing,
>>>>>                               and another effort to 'fix' nmethod marking.
>>>>>
>>>>>                               An internal discussion yesterday led us to conclude that the runtime
>>>>>                               will probably need more threads.
>>>>>                               This would be a good driver to do a 'global' worker pool which serves
>>>>>                               GC, the runtime and safepoints with threads.
>>>>>
>>>>>
>>>>>                                   Roman
>>>>>
>>>>>                                       Hi Roman,
>>>>>
>>>>>                                       I am about to disappear on an extended vacation so will let others
>>>>>                                       pursue this. IIUC this is no longer an opt-in by the user at runtime,
>>>>>                                       but an opt-in by the particular GC developers. Okay. My only concern
>>>>>                                       with that is that if Shenandoah is the only GC that currently opts in,
>>>>>                                       then this code is not going to get much testing and will be more
>>>>>                                       prone to incidental breakage.
>>>>>
>>>>>
>>>>>                               As I mentioned before, it seems like Erik Ö has some ideas; maybe he
>>>>>                               can do this after his barrier patch.
>>>>>
>>>>>                               Thanks!
>>>>>
>>>>>                               /Robbin
>>>>>
>>>>>
>>>>>                                       Cheers,
>>>>>                                       David
>>>>>
>>>>>                                       On 2/06/2017 2:21 AM, Roman Kennke wrote:
>>>>>
>>>>>                                           On 01.06.2017 at 17:50, Roman Kennke wrote:
>>>>>
>>>>>                                               On 01.06.2017 at 14:18, Robbin Ehn wrote:
>>>>>
>>>>>                                                   Hi Roman,
>>>>>
>>>>>                                                   On 06/01/2017 11:29 AM, Roman Kennke wrote:
>>>>>
>>>>>                                                       On 31.05.2017 at 22:06, Robbin Ehn wrote:
>>>>>
>>>>>                                                           Hi Roman, I agree that this is really needed, but:
>>>>>
>>>>>                                                           On 05/31/2017 10:27 AM, Roman Kennke wrote:
>>>>>
>>>>>                                                               I realized that sharing workers with GC is not so easy.
>>>>>
>>>>>                                                               We need to be able to use the workers at a safepoint during
>>>>>                                                               concurrent GC work (which also uses the same workers). This does
>>>>>                                                               not only require that those workers be suspended, like e.g.
>>>>>                                                               SuspendibleThreadSet::yield(), but they need to be idle, i.e.
>>>>>                                                               have finished their tasks. This needs some careful handling to
>>>>>                                                               work without races: it requires a SuspendibleThreadSetJoiner
>>>>>                                                               around the corresponding run_task() call and also the tasks
>>>>>                                                               themselves need to join the STS and handle requests for
>>>>>                                                               safepoints not by yielding, but by leaving the task.
>>>>>                                                               This is far too peculiar for me to make the call to hook up GC
>>>>>                                                               workers for safepoint cleanup, and I thus removed those parts.
>>>>>                                                               I left the API in CollectedHeap in place. I think GC devs who
>>>>>                                                               know better about G1 and CMS should make that call, or else
>>>>>                                                               just use a separate thread pool.
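>>>>>
>>>>>                                                               (To make the required handling concrete: the concurrent GC tasks
>>>>>                                                               would have to look roughly like this so their workers actually
>>>>>                                                               become idle at a safepoint -- a sketch only, the exact STS API
>>>>>                                                               details may differ from what I write here:)
>>>>>
>>>>>                                                                 class ConcurrentGCPhaseTask : public AbstractGangTask {
>>>>>                                                                  public:
>>>>>                                                                   ConcurrentGCPhaseTask() : AbstractGangTask("Concurrent GC Phase") {}
>>>>>                                                                   void work(uint worker_id) {
>>>>>                                                                     SuspendibleThreadSetJoiner sts_join;  // join the STS for this task
>>>>>                                                                     while (has_more_concurrent_work()) {  // hypothetical work loop
>>>>>                                                                       if (SuspendibleThreadSet::should_yield()) {
>>>>>                                                                         return;  // leave the task (worker becomes idle) rather than yield in place
>>>>>                                                                       }
>>>>>                                                                       do_one_concurrent_step();           // hypothetical step
>>>>>                                                                     }
>>>>>                                                                   }
>>>>>                                                                 };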
>>>>>
>>>>>                                                               http://cr.openjdk.java.net/~rkennke/8180932/webrev.05/
>>>>>
>>>>>                                                               Is it ok now?
>>>>>
>>>>>                                                           I still think you should put the "Parallel Safepoint Cleanup"
>>>>>                                                           workers inside Shenandoah, so the SafepointSynchronizer only
>>>>>                                                           calls get_safepoint_workers, e.g.:
>>>>>
>>>>>                                                             _cleanup_workers = heap->get_safepoint_workers();
>>>>>                                                             _num_cleanup_workers = _cleanup_workers != NULL ?
>>>>>                                                                                    _cleanup_workers->total_workers() : 1;
>>>>>                                                             ParallelSPCleanupTask cleanup(_cleanup_subtasks);
>>>>>                                                             StrongRootsScope srs(_num_cleanup_workers);
>>>>>                                                             if (_cleanup_workers != NULL) {
>>>>>                                                               _cleanup_workers->run_task(&cleanup, _num_cleanup_workers);
>>>>>                                                             } else {
>>>>>                                                               cleanup.work(0);
>>>>>                                                             }
>>>>>
>>>>>                                                           That way you don't even need your new flags, but it will be up to
>>>>>                                                           the other GCs to make their workers available or cheat with
>>>>>                                                           a separate workgang.
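>>>>>
>>>>>                                                           (i.e. the only hook the VM side then needs is something like
>>>>>                                                           this -- sketched, not the exact code; _safepoint_workers is a
>>>>>                                                           hypothetical member of the opting-in GC's heap:)
>>>>>
>>>>>                                                             // CollectedHeap default: no workers offered for safepoint cleanup.
>>>>>                                                             virtual WorkGang* get_safepoint_workers() { return NULL; }
>>>>>
>>>>>                                                             // A GC that opts in (e.g. Shenandoah) overrides it:
>>>>>                                                             virtual WorkGang* get_safepoint_workers() { return _safepoint_workers; }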
>>>>>
>>>>>                                                       I can do that, I don't mind. The question is, do we want that?
>>>>>
>>>>>                                                   The problem is that we do not want to rush such a decision; we
>>>>>                                                   believe there is a better solution.
>>>>>                                                   I think you also would want another solution.
>>>>>                                                   But it seems like such a solution, with one 'global' thread pool
>>>>>                                                   owned either by the GC or the VM itself, is quite the undertaking.
>>>>>                                                   Since this probably will not be done any time soon, my suggestion is,
>>>>>                                                   in order not to hold you back (we also want this), just to make the
>>>>>                                                   code parallel and, as an intermediate step, ask the GC if it minds
>>>>>                                                   sharing its threads.
>>>>>
>>>>>                                                   Now when Shenandoah is merged it's possible that e.g. G1 will
>>>>>                                                   share the code for a separate thread pool, do something of its own,
>>>>>                                                   or wait until the bigger question about thread pool(s) has been
>>>>>                                                   resolved.
>>>>>
>>>>>                                                   By adding a thread pool directly to the SafepointSynchronizer
>>>>>                                                   and flags for it, we might limit our future options.
>>>>>
>>>>>                                                       I wouldn't call it 'cheating with a separate workgang' though.
>>>>>                                                       I see that both G1 and CMS suspend their worker threads at a
>>>>>                                                       safepoint. However:
>>>>>
>>>>>                                                   Yes, it's not cheating, but I want decent heuristics between e.g. the
>>>>>                                                   number of concurrent marking threads and the parallel safepoint
>>>>>                                                   threads, since they compete for cpu time.
>>>>>                                                   As the code looks now, I think that decision must be made by the GC.
>>>>>
>>>>>                                               Ok, I see your point. I updated the proposed patch accordingly:
>>>>>
>>>>>                                               http://cr.openjdk.java.net/~rkennke/8180932/webrev.06/
>>>>>
>>>>>                                           Oops. Minor mistake there. Correction:
>>>>>                                           http://cr.openjdk.java.net/~rkennke/8180932/webrev.07/
>>>>>
>>>>>                                           (Removed 'class WorkGang' from safepoint.hpp, and forgot to add it
>>>>>                                           into collectedHeap.hpp, resulting in build failure...)
>>>>>
>>>>>                                           Roman
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> -- 
>>>>> This message was sent from my Android device with K-9 Mail.
>>>
>>>
> 


