RFR: 8087324: Use semaphores when starting and stopping GC task threads

Jon Masamitsu jon.masamitsu at oracle.com
Thu Jul 2 17:43:38 UTC 2015



On 6/29/2015 2:38 AM, Stefan Karlsson wrote:
> Hi all,
>
> "8087322: Implement a Semaphore utility class" has now been pushed, so 
> I've updated the patch to reflect the changes.
>
> http://cr.openjdk.java.net/~stefank/8087324/webrev.01.delta
> http://cr.openjdk.java.net/~stefank/8087324/webrev.01
http://cr.openjdk.java.net/~stefank/8087324/webrev.01/src/share/vm/gc/shared/workgroup.hpp.frames.html

Are these used?

  194   void print_worker_started_task(AbstractGangTask* task, uint worker_id);
  195   void print_worker_finished_task(AbstractGangTask* task, uint worker_id);


Rest looks good.  Just one question (just for my education).
  

http://cr.openjdk.java.net/~stefank/8087324/webrev.01/src/share/vm/gc/shared/workgroup.cpp.frames.html

  336 void GangWorker::loop() {
  337   while (true) {
  338     WorkData data = wait_for_task();
  339
  340     run_task(data);
  341
  342     signal_task_done();
  343   }
  344 }

Does this allow the same thread to execute more than 1
task ("data" here)?  Meaning if 2 threads are  requested but
1 thread is not scheduled to a cpu, will the other thread
do both chunks of work?

Jon


>
> - The IMPLEMENTS_SEMAPHORE_CLASS define was removed, since all 
> platforms need to provide a Semaphore implementation.
>
> - Removed the need to pass down "max number of workers" to the 
> Semaphore constructor.
>
> - Updated semaphore.hpp include path
>
> Thanks,
> StefanK
>
>
> On 2015-06-12 16:52, Stefan Karlsson wrote:
>> Hi all,
>>
>> The current implementation to distribute tasks to GC worker threads 
>> often cause long latencies (multiple milliseconds) when the threads 
>> are started and stopped.
>>
>> The main reason is that the worker threads have to fight over the 
>> Monitor lock when they are woken up from the call to Monitor::wait. 
>> Another reason is that all worker threads call notify_all when they 
>> finish a task and there wakes all all sleeping worker threads, which 
>> will yet again force the worker threads to fight over the lock.
>>
>> I propose that we use semaphores instead, so that the worker threads 
>> don't have to fight over a lock when they are woken up.
>>
>>
>> The patches build upon the following patch which introduces a 
>> Semaphore utility class. This patch will sent out for review on the 
>> hotspot-dev, since it affects non-GC parts of the code:
>> http://cr.openjdk.java.net/~stefank/8087322/webrev.00/
>> https://bugs.openjdk.java.net/browse/JDK-8087322
>>
>>
>> The first patch that I would like to get reviewed is:
>> http://cr.openjdk.java.net/~stefank/8087323/webrev.00/
>> https://bugs.openjdk.java.net/browse/JDK-8087323 - Unify and split 
>> the work gang classes
>>
>> It prepares for JDK-8087324, by separating the generic WorkGang 
>> implementation from the more elaborate YieldingFlexibleWorkGang (CMS) 
>> implementation. By having this part as a separate patch, I hope it 
>> will be easier to review JDK-8087324. The patch changes the work gang 
>> inheritance from:
>>
>> AbstractWorkGang
>>  WorkGang
>>   FlexibleWorkGang
>>    YieldingFlexibleWorkGang
>>
>> to:
>>
>> AbstractWorkGang
>>  WorkGang
>>  YieldingFlexibleWorkGang
>>
>> Parts of the FlexibleWorkGang and WorkGang code that is going to be 
>> used by both concrete work gang classes, has been moved into 
>> AbstractWorkGang. I've duplicated some code in WorkGang and 
>> YieldingFlexibleWorkGang, but that code will be removed from WorkGang 
>> in the following patch.
>>
>>
>> The second patch I'd like to get reviewed is:
>> http://cr.openjdk.java.net/~stefank/8087324/webrev.00/
>> https://bugs.openjdk.java.net/browse/JDK-8087324 - Use semaphores 
>> when starting and stopping GC task threads
>>
>> It first simplifies the way we distribute the tasks to the GC worker 
>> threads. For example, the coordinator thread dispatches a task to a 
>> specific number of workers, and then waits for all work to be 
>> completed. There's no risk that multiple tasks will be scheduled 
>> simultaneously, so there's no need for the sequences number that is 
>> used in the current implementation.
>>
>> The patch contains two task dispatch / thread synchronization 
>> implementations:
>>
>> The first implementation uses Monitors, similar to what we did before 
>> the patch, but with a slightly lower overhead since the code calls 
>> notify_all less often. It still suffers from the "thundering heard" 
>> problem. When the coordinator thread signals that the worker threads 
>> should start, they all wake up from Monitor::wait and they all try to 
>> lock the Monitor.
>>
>> The second, and the more interesting, implementation uses semaphores. 
>> When the worker threads wake up from the semaphore wait, they don't 
>> have to serialize the execution by taking a lock. This greatly 
>> decreases the time it takes to start and stop the worker threads.
>>
>> The semaphore implementation is used on all platforms where the 
>> Semaphore class has been implemented in JDK-8087322. So, on some 
>> OS:es the code will revert to the Monitor-based solution until a 
>> Semaphore class has been implemented for that OS. So, porters might 
>> want to consider implementing the Sempahore class.
>>
>> There's also a diagnostic vm option 
>> (-XX:+/-UseSemaphoreGCThreadsSynchronization) to turn off the 
>> Semaphore-based implementation, which can be used to debug this new 
>> code. It's mainly targeted towards support and sustaining engineering.
>>
>>
>> The patches have been performance tested on Linux, Solaris, OSX, and 
>> Windows.
>>
>> The effects of the patch can be seen by running benchmarks with small 
>> young gen sizes, which triggers frequent and short GCs.
>>
>> For example, here are runs from the SPECjvm2008 xml.transform 
>> benchmark with:
>> -Xmx1g -Xms1g -Xmn64m -XX:+PrintGC -XX:+UseG1GC -jar SPECjvm2008.jar 
>> -ikv xml.transform -it 30 -wt 30
>>
>> I got the following GC times:
>>
>>             Average    Median    99.9 percentile   Max
>> Baseline: 8.76ms    8.44 ms   25.9 ms 34.7 ms
>> Monitor:   6.17 ms 5.88 ms   26.0 ms 49.1 ms
>> Semaphore: 3.43 ms 3.26 ms   13.4 ms           33.4 ms
>>
>> If I run an empty GC task 10 times per GC, by running the following code:
>> http://cr.openjdk.java.net/~stefank/8087324/timedTask/
>>
>> I get the following numbers to complete the empty GC tasks:
>>
>>             Average    Median    99.9 percentile   Max
>> Baseline: 1.43 ms    0.92 ms   3.43 ms           9.30ms
>> Monitor:    0.75ms 0.72 ms   1.74 ms           2.78ms
>> Semaphore: 0.07 ms 0.07 ms   0.17 ms           0.26 ms
>>
>>
>>
>> The code has been tested with JPRT and our nightly testing suites.
>>
>> I've created a unit test to run a small test with both the semaphore 
>> implementation and the monitor implementation:
>> http://cr.openjdk.java.net/~stefank/8087324/workgangTest/
>>
>> But since we currently don't have code to shutdown worker threads 
>> after they have been started, I don't want to push this test (or 
>> clean it up) until we have that in place. I created this bug for that:
>> https://bugs.openjdk.java.net/browse/JDK-8087340
>>
>> Thanks,
>> StefanK
>>
>>
>>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/hotspot-gc-dev/attachments/20150702/870f4d63/attachment.htm>


More information about the hotspot-gc-dev mailing list