RFR: 8087324: Use semaphores when starting and stopping GC task threads
Stefan Johansson
stefan.johansson at oracle.com
Fri Aug 14 11:27:03 UTC 2015
Hi Stefan,
On 2015-07-02 17:03, Jon Masamitsu wrote:
>
>
> On 07/02/2015 04:48 AM, Stefan Karlsson wrote:
>>
>>
>> On 2015-07-01 18:31, Jon Masamitsu wrote:
>>>
>>>
>>> On 6/12/2015 7:52 AM, Stefan Karlsson wrote:
>>>> Hi all,
>>>>
>>>> The current implementation to distribute tasks to GC worker threads
>>>> often cause long latencies (multiple milliseconds) when the threads
>>>> are started and stopped.
>>>>
>>>> The main reason is that the worker threads have to fight over the
>>>> Monitor lock when they are woken up from the call to Monitor::wait.
>>>> Another reason is that all worker threads call notify_all when they
>>>> finish a task and there wakes all all sleeping worker threads,
>>>> which will yet again force the worker threads to fight over the lock.
>>>>
>>>> I propose that we use semaphores instead, so that the worker
>>>> threads don't have to fight over a lock when they are woken up.
>>>>
>>>>
>>>> The patches build upon the following patch which introduces a
>>>> Semaphore utility class. This patch will sent out for review on the
>>>> hotspot-dev, since it affects non-GC parts of the code:
>>>> http://cr.openjdk.java.net/~stefank/8087322/webrev.00/
>>>> https://bugs.openjdk.java.net/browse/JDK-8087322
>>>>
>>>>
>>>> The first patch that I would like to get reviewed is:
>>>> http://cr.openjdk.java.net/~stefank/8087323/webrev.00/
>>>> https://bugs.openjdk.java.net/browse/JDK-8087323 - Unify and split
>>>> the work gang classes
>>>>
>>>> It prepares for JDK-8087324, by separating the generic WorkGang
>>>> implementation from the more elaborate YieldingFlexibleWorkGang
>>>> (CMS) implementation. By having this part as a separate patch, I
>>>> hope it will be easier to review JDK-8087324. The patch changes the
>>>> work gang inheritance from:
>>>>
>>>> AbstractWorkGang
>>>> WorkGang
>>>> FlexibleWorkGang
>>>> YieldingFlexibleWorkGang
>>>>
>>>> to:
>>>>
>>>> AbstractWorkGang
>>>> WorkGang
>>>> YieldingFlexibleWorkGang
>>>>
>>>> Parts of the FlexibleWorkGang and WorkGang code that is going to be
>>>> used by both concrete work gang classes, has been moved into
>>>> AbstractWorkGang. I've duplicated some code in WorkGang and
>>>> YieldingFlexibleWorkGang, but that code will be removed from
>>>> WorkGang in the following patch.
The changes for 8087323 looks good. I agree with Jons comments below
about removing the function is_Yielding... and the cast in
YieldingFlexibleGangTask.
Thanks,
Stefan
>>>
>>> http://cr.openjdk.java.net/~stefank/8087323/webrev.00/src/share/vm/gc/cms/yieldingWorkgroup.hpp.frames.html
>>>
>>> There seems to be only one definition of
>>> is_YieldingFlexibleGang_task() now. Is that right? Is that useful?
>>>
>>> 131 NOT_PRODUCT(virtual bool is_YieldingFlexibleGang_task() const {
>>> 132 return true;
>>> 133 })
>>
>> I agree. I don't think we need it anymore.
>>
> Thanks.
>
>>
>>>
>>> Not a change in your patch but
>>>
>>> 86 AbstractWorkGang(const char* name, uint workers, bool are_GC_task_threads, bool are_ConcurrentGC_threads) :
>>> 87 _name(name),
>>> 88 _total_workers(workers),
>>> 89 _active_workers(UseDynamicNumberOfGCThreads ? 1U : workers),
>>> 90 _are_GC_task_threads(are_GC_task_threads),
>>> 91 _are_Concurren
>>>
>>> _active_workers is always calculated as >= 2 unless _total_workers
>>> is only 1.
>>> So line 89 should be
>>>
>>> _active_workers(UseDynamicNumberOfGCThreads ? MIN2(2, workers) :
>>> workers)
>>>
>>> Should I file a CR for that? Or do you want to include it.
>>
>> I'm not sure that what is proposed above is correct. I see that
>> AdaptiveSizePolicy::calc_active_workers returns 2 as a minimum, but
>> both ConcurrentMark::calc_parallel_marking_threads and
>> AdaptiveSizePolicy::calc_active_conc_workers can return 1.
>>
>> I also don't think it should be AbstractWorkGang's responsibility to
>> have the knowledge about the minimum number of worker threads that
>> are used when UseDynamicNumberOfGCThreads are turned on. Maybe we
>> should set it to 0, and let the calc_*_active_workers setup the
>> default value.
>
> I think that at one time I had tried to set the default to 0 and
> something failed. I can see the point though.
>
>>
>> I would prefer to handle any changes, to this part of the code, as
>> separate RFEs.
>
> Fair enough. If I think it's worth doing, I'll file and RFE. Probably
> something more than just setting
> it to 2 (maybe picking a default value with the help of
> AdaptiveSizePolicy).
>
>
>>
>>> Have you considered (maybe for a later patch) changing
>>> YieldingFlexibleWorkGang to
>>> simply YieldingWorkGang? The "Flexible" attribute of
>>> YieldingFlexibleWorkGang having
>>> been moved into AbstractWorkGang.
>>
>> I thought about it, but didn't think it was important enough to
>> warrant that change in this patch. I wouldn't mind if a RFE was
>> created to change the name.
>
> I'll file the RFE if you agree with the point that "Flexible"
> describes what AbstractWorkGang does now.
>
>>
>>>
>>> http://cr.openjdk.java.net/~stefank/8087323/webrev.00/src/share/vm/gc/cms/yieldingWorkgroup.cpp.frames.html
>>>
>>> Is the cast at 53 necessary? I see it in the original code too.
>>>
>>> 50 AbstractGangWorker* YieldingFlexibleWorkGang::allocate_worker(uint which) {
>>> 51 YieldingFlexibleGangWorker* new_member =
>>> 52 new YieldingFlexibleGangWorker(this, which);
>>> 53 return (YieldingFlexibleGangWorker*) new_member;
>>> 54 }
>>
>> Yes, this is unnecessary.
>
> Thanks.
>
> Jon
>
>>
>>>
>>> The rest looks good.
>>
>> Thanks.
>>
>>>
>>> I'll do the second patch next.
>>
>> Great.
>>
>> StefanK
>>
>>>
>>> Jon
>>>
>>>>
>>>>
>>>> The second patch I'd like to get reviewed is:
>>>> http://cr.openjdk.java.net/~stefank/8087324/webrev.00/
>>>> https://bugs.openjdk.java.net/browse/JDK-8087324 - Use semaphores
>>>> when starting and stopping GC task threads
>>>>
>>>> It first simplifies the way we distribute the tasks to the GC
>>>> worker threads. For example, the coordinator thread dispatches a
>>>> task to a specific number of workers, and then waits for all work
>>>> to be completed. There's no risk that multiple tasks will be
>>>> scheduled simultaneously, so there's no need for the sequences
>>>> number that is used in the current implementation.
>>>>
>>>> The patch contains two task dispatch / thread synchronization
>>>> implementations:
>>>>
>>>> The first implementation uses Monitors, similar to what we did
>>>> before the patch, but with a slightly lower overhead since the code
>>>> calls notify_all less often. It still suffers from the "thundering
>>>> heard" problem. When the coordinator thread signals that the worker
>>>> threads should start, they all wake up from Monitor::wait and they
>>>> all try to lock the Monitor.
>>>>
>>>> The second, and the more interesting, implementation uses
>>>> semaphores. When the worker threads wake up from the semaphore
>>>> wait, they don't have to serialize the execution by taking a lock.
>>>> This greatly decreases the time it takes to start and stop the
>>>> worker threads.
>>>>
>>>> The semaphore implementation is used on all platforms where the
>>>> Semaphore class has been implemented in JDK-8087322. So, on some
>>>> OS:es the code will revert to the Monitor-based solution until a
>>>> Semaphore class has been implemented for that OS. So, porters might
>>>> want to consider implementing the Sempahore class.
>>>>
>>>> There's also a diagnostic vm option
>>>> (-XX:+/-UseSemaphoreGCThreadsSynchronization) to turn off the
>>>> Semaphore-based implementation, which can be used to debug this new
>>>> code. It's mainly targeted towards support and sustaining engineering.
>>>>
>>>>
>>>> The patches have been performance tested on Linux, Solaris, OSX,
>>>> and Windows.
>>>>
>>>> The effects of the patch can be seen by running benchmarks with
>>>> small young gen sizes, which triggers frequent and short GCs.
>>>>
>>>> For example, here are runs from the SPECjvm2008 xml.transform
>>>> benchmark with:
>>>> -Xmx1g -Xms1g -Xmn64m -XX:+PrintGC -XX:+UseG1GC -jar
>>>> SPECjvm2008.jar -ikv xml.transform -it 30 -wt 30
>>>>
>>>> I got the following GC times:
>>>>
>>>> Average Median 99.9 percentile Max
>>>> Baseline: 8.76ms 8.44 ms 25.9 ms 34.7 ms
>>>> Monitor: 6.17 ms 5.88 ms 26.0 ms 49.1 ms
>>>> Semaphore: 3.43 ms 3.26 ms 13.4 ms 33.4 ms
>>>>
>>>> If I run an empty GC task 10 times per GC, by running the following
>>>> code:
>>>> http://cr.openjdk.java.net/~stefank/8087324/timedTask/
>>>>
>>>> I get the following numbers to complete the empty GC tasks:
>>>>
>>>> Average Median 99.9 percentile Max
>>>> Baseline: 1.43 ms 0.92 ms 3.43 ms 9.30ms
>>>> Monitor: 0.75ms 0.72 ms 1.74 ms 2.78ms
>>>> Semaphore: 0.07 ms 0.07 ms 0.17 ms 0.26 ms
>>>>
>>>>
>>>>
>>>> The code has been tested with JPRT and our nightly testing suites.
>>>>
>>>> I've created a unit test to run a small test with both the
>>>> semaphore implementation and the monitor implementation:
>>>> http://cr.openjdk.java.net/~stefank/8087324/workgangTest/
>>>>
>>>> But since we currently don't have code to shutdown worker threads
>>>> after they have been started, I don't want to push this test (or
>>>> clean it up) until we have that in place. I created this bug for that:
>>>> https://bugs.openjdk.java.net/browse/JDK-8087340
>>>>
>>>> Thanks,
>>>> StefanK
>>>>
>>>>
>>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/hotspot-gc-dev/attachments/20150814/e0cfff33/attachment.htm>
More information about the hotspot-gc-dev
mailing list