RFR: 8087324: Use semaphores when starting and stopping GC task threads
Stefan Karlsson
stefan.karlsson at oracle.com
Thu Jul 2 17:58:37 UTC 2015
On 2015-07-02 19:43, Jon Masamitsu wrote:
>
>
> On 6/29/2015 2:38 AM, Stefan Karlsson wrote:
>> Hi all,
>>
>> "8087322: Implement a Semaphore utility class" has now been pushed,
>> so I've updated the patch to reflect the changes.
>>
>> http://cr.openjdk.java.net/~stefank/8087324/webrev.01.delta
>> http://cr.openjdk.java.net/~stefank/8087324/webrev.01
> http://cr.openjdk.java.net/~stefank/8087324/webrev.01/src/share/vm/gc/shared/workgroup.hpp.frames.html
>
> Are these used?
>
> 194 void print_worker_started_task(AbstractGangTask* task, uint worker_id);
> 195 void print_worker_finished_task(AbstractGangTask* task, uint worker_id);
No. I'll remove them.
>
>
> Rest looks good. Just one question (just for my education).
>
> http://cr.openjdk.java.net/~stefank/8087324/webrev.01/src/share/vm/gc/shared/workgroup.cpp.frames.html
>
> 336 void GangWorker::loop() {
> 337 while (true) {
> 338 WorkData data = wait_for_task();
> 339
> 340 run_task(data);
> 341
> 342 signal_task_done();
> 343 }
> 344 }
>
> Does this allow the same thread to execute more than 1
> task ("data" here)? Meaning if 2 threads are requested but
> 1 thread is not scheduled to a cpu, will the other thread
> do both chunks of work?
Yes, that's correct.
Thanks for reviewing!
StefanK
>
> Jon
>
>
>>
>> - The IMPLEMENTS_SEMAPHORE_CLASS define was removed, since all
>> platforms need to provide a Semaphore implementation.
>>
>> - Removed the need to pass down "max number of workers" to the
>> Semaphore constructor.
>>
>> - Updated semaphore.hpp include path
>>
>> Thanks,
>> StefanK
>>
>>
>> On 2015-06-12 16:52, Stefan Karlsson wrote:
>>> Hi all,
>>>
>>> The current implementation to distribute tasks to GC worker threads
>>> often cause long latencies (multiple milliseconds) when the threads
>>> are started and stopped.
>>>
>>> The main reason is that the worker threads have to fight over the
>>> Monitor lock when they are woken up from the call to Monitor::wait.
>>> Another reason is that all worker threads call notify_all when they
>>> finish a task and there wakes all all sleeping worker threads, which
>>> will yet again force the worker threads to fight over the lock.
>>>
>>> I propose that we use semaphores instead, so that the worker threads
>>> don't have to fight over a lock when they are woken up.
>>>
>>>
>>> The patches build upon the following patch which introduces a
>>> Semaphore utility class. This patch will sent out for review on the
>>> hotspot-dev, since it affects non-GC parts of the code:
>>> http://cr.openjdk.java.net/~stefank/8087322/webrev.00/
>>> https://bugs.openjdk.java.net/browse/JDK-8087322
>>>
>>>
>>> The first patch that I would like to get reviewed is:
>>> http://cr.openjdk.java.net/~stefank/8087323/webrev.00/
>>> https://bugs.openjdk.java.net/browse/JDK-8087323 - Unify and split
>>> the work gang classes
>>>
>>> It prepares for JDK-8087324, by separating the generic WorkGang
>>> implementation from the more elaborate YieldingFlexibleWorkGang
>>> (CMS) implementation. By having this part as a separate patch, I
>>> hope it will be easier to review JDK-8087324. The patch changes the
>>> work gang inheritance from:
>>>
>>> AbstractWorkGang
>>> WorkGang
>>> FlexibleWorkGang
>>> YieldingFlexibleWorkGang
>>>
>>> to:
>>>
>>> AbstractWorkGang
>>> WorkGang
>>> YieldingFlexibleWorkGang
>>>
>>> Parts of the FlexibleWorkGang and WorkGang code that is going to be
>>> used by both concrete work gang classes, has been moved into
>>> AbstractWorkGang. I've duplicated some code in WorkGang and
>>> YieldingFlexibleWorkGang, but that code will be removed from
>>> WorkGang in the following patch.
>>>
>>>
>>> The second patch I'd like to get reviewed is:
>>> http://cr.openjdk.java.net/~stefank/8087324/webrev.00/
>>> https://bugs.openjdk.java.net/browse/JDK-8087324 - Use semaphores
>>> when starting and stopping GC task threads
>>>
>>> It first simplifies the way we distribute the tasks to the GC worker
>>> threads. For example, the coordinator thread dispatches a task to a
>>> specific number of workers, and then waits for all work to be
>>> completed. There's no risk that multiple tasks will be scheduled
>>> simultaneously, so there's no need for the sequences number that is
>>> used in the current implementation.
>>>
>>> The patch contains two task dispatch / thread synchronization
>>> implementations:
>>>
>>> The first implementation uses Monitors, similar to what we did
>>> before the patch, but with a slightly lower overhead since the code
>>> calls notify_all less often. It still suffers from the "thundering
>>> heard" problem. When the coordinator thread signals that the worker
>>> threads should start, they all wake up from Monitor::wait and they
>>> all try to lock the Monitor.
>>>
>>> The second, and the more interesting, implementation uses
>>> semaphores. When the worker threads wake up from the semaphore wait,
>>> they don't have to serialize the execution by taking a lock. This
>>> greatly decreases the time it takes to start and stop the worker
>>> threads.
>>>
>>> The semaphore implementation is used on all platforms where the
>>> Semaphore class has been implemented in JDK-8087322. So, on some
>>> OS:es the code will revert to the Monitor-based solution until a
>>> Semaphore class has been implemented for that OS. So, porters might
>>> want to consider implementing the Sempahore class.
>>>
>>> There's also a diagnostic vm option
>>> (-XX:+/-UseSemaphoreGCThreadsSynchronization) to turn off the
>>> Semaphore-based implementation, which can be used to debug this new
>>> code. It's mainly targeted towards support and sustaining engineering.
>>>
>>>
>>> The patches have been performance tested on Linux, Solaris, OSX, and
>>> Windows.
>>>
>>> The effects of the patch can be seen by running benchmarks with
>>> small young gen sizes, which triggers frequent and short GCs.
>>>
>>> For example, here are runs from the SPECjvm2008 xml.transform
>>> benchmark with:
>>> -Xmx1g -Xms1g -Xmn64m -XX:+PrintGC -XX:+UseG1GC -jar SPECjvm2008.jar
>>> -ikv xml.transform -it 30 -wt 30
>>>
>>> I got the following GC times:
>>>
>>> Average Median 99.9 percentile Max
>>> Baseline: 8.76ms 8.44 ms 25.9 ms 34.7 ms
>>> Monitor: 6.17 ms 5.88 ms 26.0 ms 49.1 ms
>>> Semaphore: 3.43 ms 3.26 ms 13.4 ms 33.4 ms
>>>
>>> If I run an empty GC task 10 times per GC, by running the following
>>> code:
>>> http://cr.openjdk.java.net/~stefank/8087324/timedTask/
>>>
>>> I get the following numbers to complete the empty GC tasks:
>>>
>>> Average Median 99.9 percentile Max
>>> Baseline: 1.43 ms 0.92 ms 3.43 ms 9.30ms
>>> Monitor: 0.75ms 0.72 ms 1.74 ms 2.78ms
>>> Semaphore: 0.07 ms 0.07 ms 0.17 ms 0.26 ms
>>>
>>>
>>>
>>> The code has been tested with JPRT and our nightly testing suites.
>>>
>>> I've created a unit test to run a small test with both the semaphore
>>> implementation and the monitor implementation:
>>> http://cr.openjdk.java.net/~stefank/8087324/workgangTest/
>>>
>>> But since we currently don't have code to shutdown worker threads
>>> after they have been started, I don't want to push this test (or
>>> clean it up) until we have that in place. I created this bug for that:
>>> https://bugs.openjdk.java.net/browse/JDK-8087340
>>>
>>> Thanks,
>>> StefanK
>>>
>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/hotspot-gc-dev/attachments/20150702/347d7625/attachment.htm>
More information about the hotspot-gc-dev
mailing list