RFR: 8087324: Use semaphores when starting and stopping GC task threads

Wed Jul 1 16:31:25 UTC 2015

On 6/12/2015 7:52 AM, Stefan Karlsson wrote:
> Hi all,
>
> The current implementation to distribute tasks to GC worker threads 
> often cause long latencies (multiple milliseconds) when the threads 
> are started and stopped.
>
> The main reason is that the worker threads have to fight over the 
> Monitor lock when they are woken up from the call to Monitor::wait. 
> Another reason is that all worker threads call notify_all when they 
> finish a task and there wakes all all sleeping worker threads, which 
> will yet again force the worker threads to fight over the lock.
>
> I propose that we use semaphores instead, so that the worker threads 
> don't have to fight over a lock when they are woken up.
>
>
> The patches build upon the following patch which introduces a 
> Semaphore utility class. This patch will sent out for review on the 
> hotspot-dev, since it affects non-GC parts of the code:
> http://cr.openjdk.java.net/~stefank/8087322/webrev.00/
> https://bugs.openjdk.java.net/browse/JDK-8087322
>
>
> The first patch that I would like to get reviewed is:
> http://cr.openjdk.java.net/~stefank/8087323/webrev.00/
> https://bugs.openjdk.java.net/browse/JDK-8087323 - Unify and split the 
> work gang classes
>
> It prepares for JDK-8087324, by separating the generic WorkGang 
> implementation from the more elaborate YieldingFlexibleWorkGang (CMS) 
> implementation. By having this part as a separate patch, I hope it 
> will be easier to review JDK-8087324. The patch changes the work gang 
> inheritance from:
>
> AbstractWorkGang
>  WorkGang
>   FlexibleWorkGang
>    YieldingFlexibleWorkGang
>
> to:
>
> AbstractWorkGang
>  WorkGang
>  YieldingFlexibleWorkGang
>
> Parts of the FlexibleWorkGang and WorkGang code that is going to be 
> used by both concrete work gang classes, has been moved into 
> AbstractWorkGang. I've duplicated some code in WorkGang and 
> YieldingFlexibleWorkGang, but that code will be removed from WorkGang 
> in the following patch.

http://cr.openjdk.java.net/~stefank/8087323/webrev.00/src/share/vm/gc/cms/yieldingWorkgroup.hpp.frames.html

There seems to be only one definition of is_YieldingFlexibleGang_task() 
now.  Is that right?  Is that useful?

  131   NOT_PRODUCT(virtual bool is_YieldingFlexibleGang_task() const {
  132     return true;
  133   })

Not a change  in your patch but

   86   AbstractWorkGang(const char* name, uint workers, bool are_GC_task_threads, bool are_ConcurrentGC_threads) :
   87       _name(name),
   88       _total_workers(workers),
   89       _active_workers(UseDynamicNumberOfGCThreads ? 1U : workers),
   90       _are_GC_task_threads(are_GC_task_threads),
   91       _are_Concurren

_active_workers is always calculated as >= 2 unless _total_workers is 
only 1.
So line 89 should be

_active_workers(UseDynamicNumberOfGCThreads ? MIN2(2, workers) : workers)

Should I file a CR for that?  Or do you want to include it.

Have you considered (maybe for a later patch) changing 
YieldingFlexibleWorkGang to
simply YieldingWorkGang?  The "Flexible" attribute of 
YieldingFlexibleWorkGang having
been moved into AbstractWorkGang.

http://cr.openjdk.java.net/~stefank/8087323/webrev.00/src/share/vm/gc/cms/yieldingWorkgroup.cpp.frames.html

Is the cast at 53 necessary? I see it in the original code too.

   50 AbstractGangWorker* YieldingFlexibleWorkGang::allocate_worker(uint which) {
   51   YieldingFlexibleGangWorker* new_member =
   52       new YieldingFlexibleGangWorker(this, which);
   53   return (YieldingFlexibleGangWorker*) new_member;
   54 }

The rest looks good.

I'll do the second patch next.

Jon

>
>
> The second patch I'd like to get reviewed is:
> http://cr.openjdk.java.net/~stefank/8087324/webrev.00/
> https://bugs.openjdk.java.net/browse/JDK-8087324 - Use semaphores when 
> starting and stopping GC task threads
>
> It first simplifies the way we distribute the tasks to the GC worker 
> threads. For example, the coordinator thread dispatches a task to a 
> specific number of workers, and then waits for all work to be 
> completed. There's no risk that multiple tasks will be scheduled 
> simultaneously, so there's no need for the sequences number that is 
> used in the current implementation.
>
> The patch contains two task dispatch / thread synchronization 
> implementations:
>
> The first implementation uses Monitors, similar to what we did before 
> the patch, but with a slightly lower overhead since the code calls 
> notify_all less often. It still suffers from the "thundering heard" 
> problem. When the coordinator thread signals that the worker threads 
> should start, they all wake up from Monitor::wait and they all try to 
> lock the Monitor.
>
> The second, and the more interesting, implementation uses semaphores. 
> When the worker threads wake up from the semaphore wait, they don't 
> have to serialize the execution by taking a lock. This greatly 
> decreases the time it takes to start and stop the worker threads.
>
> The semaphore implementation is used on all platforms where the 
> Semaphore class has been implemented in JDK-8087322. So, on some OS:es 
> the code will revert to the Monitor-based solution until a Semaphore 
> class has been implemented for that OS. So, porters might want to 
> consider implementing the Sempahore class.
>
> There's also a diagnostic vm option 
> (-XX:+/-UseSemaphoreGCThreadsSynchronization) to turn off the 
> Semaphore-based implementation, which can be used to debug this new 
> code. It's mainly targeted towards support and sustaining engineering.
>
>
> The patches have been performance tested on Linux, Solaris, OSX, and 
> Windows.
>
> The effects of the patch can be seen by running benchmarks with small 
> young gen sizes, which triggers frequent and short GCs.
>
> For example, here are runs from the SPECjvm2008 xml.transform 
> benchmark with:
> -Xmx1g -Xms1g -Xmn64m -XX:+PrintGC -XX:+UseG1GC -jar SPECjvm2008.jar 
> -ikv xml.transform -it 30 -wt 30
>
> I got the following GC times:
>
>             Average    Median    99.9 percentile   Max
> Baseline: 8.76ms    8.44 ms   25.9 ms 34.7 ms
> Monitor:   6.17 ms 5.88 ms   26.0 ms 49.1 ms
> Semaphore: 3.43 ms 3.26 ms   13.4 ms           33.4 ms
>
> If I run an empty GC task 10 times per GC, by running the following code:
> http://cr.openjdk.java.net/~stefank/8087324/timedTask/
>
> I get the following numbers to complete the empty GC tasks:
>
>             Average    Median    99.9 percentile   Max
> Baseline: 1.43 ms    0.92 ms   3.43 ms           9.30ms
> Monitor:    0.75ms 0.72 ms   1.74 ms           2.78ms
> Semaphore: 0.07 ms 0.07 ms   0.17 ms           0.26 ms
>
>
>
> The code has been tested with JPRT and our nightly testing suites.
>
> I've created a unit test to run a small test with both the semaphore 
> implementation and the monitor implementation:
> http://cr.openjdk.java.net/~stefank/8087324/workgangTest/
>
> But since we currently don't have code to shutdown worker threads 
> after they have been started, I don't want to push this test (or clean 
> it up) until we have that in place. I created this bug for that:
> https://bugs.openjdk.java.net/browse/JDK-8087340
>
> Thanks,
> StefanK
>
>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/hotspot-gc-dev/attachments/20150701/11f7b7ce/attachment.htm>