RFR: Parallelize safepoint cleanup
Roman Kennke
rkennke at redhat.com
Wed May 24 14:40:45 UTC 2017
Erik Helin asked me on IRC to trim down the scope of this change and
split up the big patch into 3: the parallel cleanup, the GC hookup for
deflation, and the GC hookup for nmethods marking. So here comes the
first part:
http://cr.openjdk.java.net/~rkennke/8180932/webrev.01/
The description for that part still applies. Should be simpler to review
this way. Will file 2 more enhancement-bugs for the other two parts.
Roman
> Some operations in safepoint cleanup have been observed to (sometimes)
> take significant time. Most notably, idle monitor deflation and nmethod
> marking stick out in some popular applications and benchmarks.
>
> I propose to:
> - parallelize safepoint cleanup processing
> - allow idle monitor deflation and nmethod marking to be hooked up to GC
> VM ops, if the GC can support it (resulting in even more efficient
> deflation/nmethod marking)
>
> In some of my measurements this resulted in much improved pause times.
> For example, in one popular benchmark on a server-class machine, I got
> total average pause time down from ~80ms to ~30ms. In none of my
> measurements has this resulted in decreased performance (although it may
> be possible to construct such a case; for example, it may not be worth
> spinning up worker threads if there's no work to do).
>
> Some implementation notes:
>
> I introduced a dedicated worker thread pool in SafepointSynchronize.
> This is only initialized when -XX:+ParallelSafepointCleanup is enabled,
> and uses -XX:ParallelSafepointCleanupThreads=X threads, defaulting to 8
> threads (just a wild guess, open for discussion). With
> -XX:-ParallelSafepointCleanup (the default) it will use the
> old serial safepoint cleanup processing (with optional GC hooks, see below).
>
> Parallel processing first lets all worker threads scan threads and
> thereby deflate idle monitors and mark nmethods (in one pass). The rest
> of the cleanup work is divided into claimed chunks by using SubTasksDone
> (like, e.g., in G1RootProcessor).
>
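To illustrate the claiming scheme described above, here is a rough,
self-contained sketch in plain C++. It is not the code in the webrev and does
not use HotSpot's SubTasksDone; SubTaskClaimer, the task names and
run_parallel_cleanup() are made up for illustration. It only models the second
phase (the claimed chunks), not the thread scan.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a SubTasksDone-like helper: every subtask
// can be claimed by exactly one worker.
class SubTaskClaimer {
public:
  explicit SubTaskClaimer(std::size_t n) : _claimed(n) {
    for (auto& c : _claimed) c.store(false);
  }
  // Returns true for exactly one caller per task index.
  bool try_claim(std::size_t task) {
    return !_claimed[task].exchange(true);
  }
private:
  std::vector<std::atomic<bool>> _claimed;
};

// Illustrative task list; the names are assumptions, not the real set.
enum CleanupTask : std::size_t {
  kInlineCacheCleanup,
  kCompilationPolicyUpdate,
  kSymbolTableRehash,
  kStringTableRehash,
  kNumCleanupTasks
};

// Runs every cleanup task once, spread over num_workers threads.
// Returns how often each task was executed (expected: once each).
std::vector<int> run_parallel_cleanup(unsigned num_workers) {
  assert(num_workers > 0);
  SubTaskClaimer claimer(kNumCleanupTasks);
  std::vector<std::atomic<int>> runs(kNumCleanupTasks);
  for (auto& r : runs) r.store(0);

  auto worker = [&]() {
    for (std::size_t t = 0; t < kNumCleanupTasks; ++t) {
      if (claimer.try_claim(t)) {
        runs[t].fetch_add(1);  // stand-in for the actual cleanup work
      }
    }
  };

  std::vector<std::thread> workers;
  for (unsigned i = 0; i < num_workers; ++i) workers.emplace_back(worker);
  for (auto& w : workers) w.join();

  std::vector<int> result;
  for (auto& r : runs) result.push_back(r.load());
  return result;
}
```

Every worker walks the full task list, but the atomic exchange lets only the
first arrival run a given task, so no up-front assignment of tasks to workers
is needed.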
> Notice that I tried a bunch of other alternatives:
>
> - First I tried to let Java threads deflate their own monitors on
> safepoint arrival. This did not work out, because deflation (currently)
> depends on all Java threads having arrived. Adding another sync point
> there would have defeated the purpose.
>
> - Then I tried to always use workers of the current GC. This did not
> work either, because the GC may be using them, for example if a cleanup
> safepoint is happening during concurrent marking.
>
> - Then I gave SafepointSynchronize its own workers, and I now think this
> is the best solution: independent of the GC and relatively isolated code-wise.
>
>
> The other big thing in this change is the possibility to let the GC take
> over deflation and nmethod marking. The motivation for this is simple:
> GCs often scan threads themselves, and when they do, they can just as
> well also do the deflation and nmethod marking. This is more efficient,
> because it's better on caches, and because it parallelizes better (other
> GC workers can do other GC stuff while some are still busy with the
> threads). Notice that this change only provides convenient APIs for the
> GCs to consume, but no actual implementation. More specifically:
>
> - Deflation of idle monitors can be enabled for GC ops by overriding
> VM_Operation::deflates_idle_monitors() to return true. This
> automatically makes Threads::oops_do() and friends deflate idle
> monitors. CAUTION: this only works if the GC leaves oops' mark words
> alone. Unfortunately, I think all GCs currently in OpenJDK preserve the
> mark word and temporarily use it as a forwarding pointer, and thus this
> optimization is not possible for them. I have done it successfully in Shenandoah
> GC. GC devs need to evaluate this.
>
> - NMethod marking can be enabled by overriding
> VM_Operation::marks_nmethods() to return true. In order to mark nmethods
> during GC thread scanning, one has to call
> NMethodSweeper::prepare_mark_active_nmethods() and pass the returned
> CodeBlobClosure to Thread::oops_do() or
> Threads::possibly_parallel_oops_do(). This is relatively simple and
> should work for all existing GCs. Again, I have done it successfully in
> Shenandoah GC.
>
> - Hooking up deflation and nmethod marking also works with serial
> safepoint cleanup. This may be useful for workloads where it's not worth
> spinning up additional worker threads. They would still benefit from
> improved cleanup at GC pauses.
>
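As a sketch of how the two query methods could be consumed, here is a small
mock in plain C++. The method names deflates_idle_monitors() and
marks_nmethods() are the ones proposed above; everything else (the mock
VM_Operation, VM_HypotheticalGCPause, cleanup_tasks_for()) is an assumption
for illustration and not the actual HotSpot code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Mock of the proposed query methods; NOT the real HotSpot VM_Operation.
class VM_Operation {
public:
  virtual ~VM_Operation() = default;
  // By default the VM operation does neither job itself.
  virtual bool deflates_idle_monitors() const { return false; }
  virtual bool marks_nmethods() const { return false; }
};

// Hypothetical GC pause whose own thread scan also deflates idle
// monitors and marks nmethods.
class VM_HypotheticalGCPause : public VM_Operation {
public:
  bool deflates_idle_monitors() const override { return true; }
  bool marks_nmethods() const override { return true; }
};

// Returns the tasks that safepoint cleanup still has to run itself,
// i.e. the ones the given VM operation has not taken over.
std::vector<std::string> cleanup_tasks_for(const VM_Operation& op) {
  std::vector<std::string> tasks;
  if (!op.deflates_idle_monitors()) tasks.push_back("deflate idle monitors");
  if (!op.marks_nmethods()) tasks.push_back("mark nmethods");
  tasks.push_back("other cleanup");
  assert(!tasks.empty());  // the generic cleanup work always remains
  return tasks;
}
```

With the default operation all three tasks stay in safepoint cleanup; with the
hypothetical GC pause only "other cleanup" remains, regardless of whether the
cleanup itself runs serially or in parallel.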
> Webrev:
> http://cr.openjdk.java.net/~rkennke/8180932/webrev.00/
>
> Bug:
> https://bugs.openjdk.java.net/browse/JDK-8180932
>
> Testing: specjvm, specjbb, hotspot_gc
>
> I suppose this requires CSR for the new options?
>
> Opinions?
>
> Roman
>