RFR: 8087324: Use semaphores when starting and stopping GC task threads

Mon Aug 17 07:10:04 UTC 2015

On 2015-08-14 13:27, Stefan Johansson wrote:
> Hi Stefan,
>
> On 2015-07-02 17:03, Jon Masamitsu wrote:
>>
>>
>> On 07/02/2015 04:48 AM, Stefan Karlsson wrote:
>>>
>>>
>>> On 2015-07-01 18:31, Jon Masamitsu wrote:
>>>>
>>>>
>>>> On 6/12/2015 7:52 AM, Stefan Karlsson wrote:
>>>>> Hi all,
>>>>>
>>>>> The current implementation to distribute tasks to GC worker 
>>>>> threads often cause long latencies (multiple milliseconds) when 
>>>>> the threads are started and stopped.
>>>>>
>>>>> The main reason is that the worker threads have to fight over the 
>>>>> Monitor lock when they are woken up from the call to 
>>>>> Monitor::wait. Another reason is that all worker threads call 
>>>>> notify_all when they finish a task and there wakes all all 
>>>>> sleeping worker threads, which will yet again force the worker 
>>>>> threads to fight over the lock.
>>>>>
>>>>> I propose that we use semaphores instead, so that the worker 
>>>>> threads don't have to fight over a lock when they are woken up.
>>>>>
>>>>>
>>>>> The patches build upon the following patch which introduces a 
>>>>> Semaphore utility class. This patch will sent out for review on 
>>>>> the hotspot-dev, since it affects non-GC parts of the code:
>>>>> http://cr.openjdk.java.net/~stefank/8087322/webrev.00/
>>>>> https://bugs.openjdk.java.net/browse/JDK-8087322
>>>>>
>>>>>
>>>>> The first patch that I would like to get reviewed is:
>>>>> http://cr.openjdk.java.net/~stefank/8087323/webrev.00/
>>>>> https://bugs.openjdk.java.net/browse/JDK-8087323 - Unify and split 
>>>>> the work gang classes
>>>>>
>>>>> It prepares for JDK-8087324, by separating the generic WorkGang 
>>>>> implementation from the more elaborate YieldingFlexibleWorkGang 
>>>>> (CMS) implementation. By having this part as a separate patch, I 
>>>>> hope it will be easier to review JDK-8087324. The patch changes 
>>>>> the work gang inheritance from:
>>>>>
>>>>> AbstractWorkGang
>>>>>  WorkGang
>>>>>   FlexibleWorkGang
>>>>>    YieldingFlexibleWorkGang
>>>>>
>>>>> to:
>>>>>
>>>>> AbstractWorkGang
>>>>>  WorkGang
>>>>>  YieldingFlexibleWorkGang
>>>>>
>>>>> Parts of the FlexibleWorkGang and WorkGang code that is going to 
>>>>> be used by both concrete work gang classes, has been moved into 
>>>>> AbstractWorkGang. I've duplicated some code in WorkGang and 
>>>>> YieldingFlexibleWorkGang, but that code will be removed from 
>>>>> WorkGang in the following patch.
> The changes for 8087323 looks good. I agree with Jons comments below 
> about removing the function is_Yielding... and the cast in 
> YieldingFlexibleGangTask.

Removed.

Thanks,
StefanK

>
> Thanks,
> Stefan
>
>>>>
>>>> http://cr.openjdk.java.net/~stefank/8087323/webrev.00/src/share/vm/gc/cms/yieldingWorkgroup.hpp.frames.html
>>>>
>>>> There seems to be only one definition of 
>>>> is_YieldingFlexibleGang_task() now.  Is that right?  Is that useful?
>>>>
>>>>   131   NOT_PRODUCT(virtual bool is_YieldingFlexibleGang_task() const {
>>>>   132     return true;
>>>>   133   })
>>>
>>> I agree. I don't think we need it anymore.
>>>
>> Thanks.
>>
>>>
>>>>
>>>> Not a change  in your patch but
>>>>
>>>>    86   AbstractWorkGang(const char* name, uint workers, bool are_GC_task_threads, bool are_ConcurrentGC_threads) :
>>>>    87       _name(name),
>>>>    88       _total_workers(workers),
>>>>    89       _active_workers(UseDynamicNumberOfGCThreads ? 1U : workers),
>>>>    90       _are_GC_task_threads(are_GC_task_threads),
>>>>    91       _are_Concurren
>>>>
>>>> _active_workers is always calculated as >= 2 unless _total_workers 
>>>> is only 1.
>>>> So line 89 should be
>>>>
>>>> _active_workers(UseDynamicNumberOfGCThreads ? MIN2(2, workers) : 
>>>> workers)
>>>>
>>>> Should I file a CR for that?  Or do you want to include it.
>>>
>>> I'm not sure that what is proposed above is correct. I see that 
>>> AdaptiveSizePolicy::calc_active_workers returns 2 as a minimum, but 
>>> both ConcurrentMark::calc_parallel_marking_threads and 
>>> AdaptiveSizePolicy::calc_active_conc_workers can return 1.
>>>
>>> I also don't think it should be AbstractWorkGang's responsibility to 
>>> have the knowledge about the minimum number of worker threads that 
>>> are used when UseDynamicNumberOfGCThreads are turned on. Maybe we 
>>> should set it to 0, and let the calc_*_active_workers setup the 
>>> default value.
>>
>> I think that at one time I had tried to set the default to 0 and 
>> something failed.  I can see the point though.
>>
>>>
>>> I would prefer to handle any changes, to this part of the code, as 
>>> separate RFEs.
>>
>> Fair enough.  If I think it's worth doing, I'll file and RFE. 
>> Probably something more than just setting
>> it to 2 (maybe picking a default value with the help of 
>> AdaptiveSizePolicy).
>>
>>
>>>
>>>> Have you considered (maybe for a later patch) changing 
>>>> YieldingFlexibleWorkGang to
>>>> simply YieldingWorkGang?  The "Flexible" attribute of 
>>>> YieldingFlexibleWorkGang having
>>>> been moved into AbstractWorkGang.
>>>
>>> I thought about it, but didn't think it was important enough to 
>>> warrant that change in this patch. I wouldn't mind if a RFE was 
>>> created to change the name.
>>
>> I'll file the RFE  if you agree with the point that "Flexible" 
>> describes what AbstractWorkGang does now.
>>
>>>
>>>>
>>>> http://cr.openjdk.java.net/~stefank/8087323/webrev.00/src/share/vm/gc/cms/yieldingWorkgroup.cpp.frames.html
>>>>
>>>> Is the cast at 53 necessary? I see it in the original code too.
>>>>
>>>>    50 AbstractGangWorker* YieldingFlexibleWorkGang::allocate_worker(uint which) {
>>>>    51   YieldingFlexibleGangWorker* new_member =
>>>>    52       new YieldingFlexibleGangWorker(this, which);
>>>>    53   return (YieldingFlexibleGangWorker*) new_member;
>>>>    54 }
>>>
>>> Yes, this is unnecessary.
>>
>> Thanks.
>>
>> Jon
>>
>>>
>>>>
>>>> The rest looks good.
>>>
>>> Thanks.
>>>
>>>>
>>>> I'll do the second patch next.
>>>
>>> Great.
>>>
>>> StefanK
>>>
>>>>
>>>> Jon
>>>>
>>>>>
>>>>>
>>>>> The second patch I'd like to get reviewed is:
>>>>> http://cr.openjdk.java.net/~stefank/8087324/webrev.00/
>>>>> https://bugs.openjdk.java.net/browse/JDK-8087324 - Use semaphores 
>>>>> when starting and stopping GC task threads
>>>>>
>>>>> It first simplifies the way we distribute the tasks to the GC 
>>>>> worker threads. For example, the coordinator thread dispatches a 
>>>>> task to a specific number of workers, and then waits for all work 
>>>>> to be completed. There's no risk that multiple tasks will be 
>>>>> scheduled simultaneously, so there's no need for the sequences 
>>>>> number that is used in the current implementation.
>>>>>
>>>>> The patch contains two task dispatch / thread synchronization 
>>>>> implementations:
>>>>>
>>>>> The first implementation uses Monitors, similar to what we did 
>>>>> before the patch, but with a slightly lower overhead since the 
>>>>> code calls notify_all less often. It still suffers from the 
>>>>> "thundering heard" problem. When the coordinator thread signals 
>>>>> that the worker threads should start, they all wake up from 
>>>>> Monitor::wait and they all try to lock the Monitor.
>>>>>
>>>>> The second, and the more interesting, implementation uses 
>>>>> semaphores. When the worker threads wake up from the semaphore 
>>>>> wait, they don't have to serialize the execution by taking a lock. 
>>>>> This greatly decreases the time it takes to start and stop the 
>>>>> worker threads.
>>>>>
>>>>> The semaphore implementation is used on all platforms where the 
>>>>> Semaphore class has been implemented in JDK-8087322. So, on some 
>>>>> OS:es the code will revert to the Monitor-based solution until a 
>>>>> Semaphore class has been implemented for that OS. So, porters 
>>>>> might want to consider implementing the Sempahore class.
>>>>>
>>>>> There's also a diagnostic vm option 
>>>>> (-XX:+/-UseSemaphoreGCThreadsSynchronization) to turn off the 
>>>>> Semaphore-based implementation, which can be used to debug this 
>>>>> new code. It's mainly targeted towards support and sustaining 
>>>>> engineering.
>>>>>
>>>>>
>>>>> The patches have been performance tested on Linux, Solaris, OSX, 
>>>>> and Windows.
>>>>>
>>>>> The effects of the patch can be seen by running benchmarks with 
>>>>> small young gen sizes, which triggers frequent and short GCs.
>>>>>
>>>>> For example, here are runs from the SPECjvm2008 xml.transform 
>>>>> benchmark with:
>>>>> -Xmx1g -Xms1g -Xmn64m -XX:+PrintGC -XX:+UseG1GC -jar 
>>>>> SPECjvm2008.jar -ikv xml.transform -it 30 -wt 30
>>>>>
>>>>> I got the following GC times:
>>>>>
>>>>>             Average    Median    99.9 percentile   Max
>>>>> Baseline: 8.76ms    8.44 ms   25.9 ms 34.7 ms
>>>>> Monitor:   6.17 ms 5.88 ms   26.0 ms 49.1 ms
>>>>> Semaphore: 3.43 ms 3.26 ms   13.4 ms           33.4 ms
>>>>>
>>>>> If I run an empty GC task 10 times per GC, by running the 
>>>>> following code:
>>>>> http://cr.openjdk.java.net/~stefank/8087324/timedTask/
>>>>>
>>>>> I get the following numbers to complete the empty GC tasks:
>>>>>
>>>>>             Average    Median    99.9 percentile   Max
>>>>> Baseline: 1.43 ms    0.92 ms   3.43 ms           9.30ms
>>>>> Monitor:    0.75ms 0.72 ms   1.74 ms           2.78ms
>>>>> Semaphore: 0.07 ms 0.07 ms   0.17 ms 0.26 ms
>>>>>
>>>>>
>>>>>
>>>>> The code has been tested with JPRT and our nightly testing suites.
>>>>>
>>>>> I've created a unit test to run a small test with both the 
>>>>> semaphore implementation and the monitor implementation:
>>>>> http://cr.openjdk.java.net/~stefank/8087324/workgangTest/
>>>>>
>>>>> But since we currently don't have code to shutdown worker threads 
>>>>> after they have been started, I don't want to push this test (or 
>>>>> clean it up) until we have that in place. I created this bug for that:
>>>>> https://bugs.openjdk.java.net/browse/JDK-8087340
>>>>>
>>>>> Thanks,
>>>>> StefanK
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/hotspot-gc-dev/attachments/20150817/01417efd/attachment.htm>