RFR: 8087324: Use semaphores when starting and stopping GC task threads

Thu Jul 2 15:03:05 UTC 2015

On 07/02/2015 04:48 AM, Stefan Karlsson wrote:
>
>
> On 2015-07-01 18:31, Jon Masamitsu wrote:
>>
>>
>> On 6/12/2015 7:52 AM, Stefan Karlsson wrote:
>>> Hi all,
>>>
>>> The current implementation to distribute tasks to GC worker threads 
>>> often cause long latencies (multiple milliseconds) when the threads 
>>> are started and stopped.
>>>
>>> The main reason is that the worker threads have to fight over the 
>>> Monitor lock when they are woken up from the call to Monitor::wait. 
>>> Another reason is that all worker threads call notify_all when they 
>>> finish a task and there wakes all all sleeping worker threads, which 
>>> will yet again force the worker threads to fight over the lock.
>>>
>>> I propose that we use semaphores instead, so that the worker threads 
>>> don't have to fight over a lock when they are woken up.
>>>
>>>
>>> The patches build upon the following patch which introduces a 
>>> Semaphore utility class. This patch will sent out for review on the 
>>> hotspot-dev, since it affects non-GC parts of the code:
>>> http://cr.openjdk.java.net/~stefank/8087322/webrev.00/
>>> https://bugs.openjdk.java.net/browse/JDK-8087322
>>>
>>>
>>> The first patch that I would like to get reviewed is:
>>> http://cr.openjdk.java.net/~stefank/8087323/webrev.00/
>>> https://bugs.openjdk.java.net/browse/JDK-8087323 - Unify and split 
>>> the work gang classes
>>>
>>> It prepares for JDK-8087324, by separating the generic WorkGang 
>>> implementation from the more elaborate YieldingFlexibleWorkGang 
>>> (CMS) implementation. By having this part as a separate patch, I 
>>> hope it will be easier to review JDK-8087324. The patch changes the 
>>> work gang inheritance from:
>>>
>>> AbstractWorkGang
>>>  WorkGang
>>>   FlexibleWorkGang
>>>    YieldingFlexibleWorkGang
>>>
>>> to:
>>>
>>> AbstractWorkGang
>>>  WorkGang
>>>  YieldingFlexibleWorkGang
>>>
>>> Parts of the FlexibleWorkGang and WorkGang code that is going to be 
>>> used by both concrete work gang classes, has been moved into 
>>> AbstractWorkGang. I've duplicated some code in WorkGang and 
>>> YieldingFlexibleWorkGang, but that code will be removed from 
>>> WorkGang in the following patch.
>>
>> http://cr.openjdk.java.net/~stefank/8087323/webrev.00/src/share/vm/gc/cms/yieldingWorkgroup.hpp.frames.html
>>
>> There seems to be only one definition of 
>> is_YieldingFlexibleGang_task() now.  Is that right?  Is that useful?
>>
>>   131   NOT_PRODUCT(virtual bool is_YieldingFlexibleGang_task() const {
>>   132     return true;
>>   133   })
>
> I agree. I don't think we need it anymore.
>
Thanks.

>
>>
>> Not a change  in your patch but
>>
>>    86   AbstractWorkGang(const char* name, uint workers, bool are_GC_task_threads, bool are_ConcurrentGC_threads) :
>>    87       _name(name),
>>    88       _total_workers(workers),
>>    89       _active_workers(UseDynamicNumberOfGCThreads ? 1U : workers),
>>    90       _are_GC_task_threads(are_GC_task_threads),
>>    91       _are_Concurren
>>
>> _active_workers is always calculated as >= 2 unless _total_workers is 
>> only 1.
>> So line 89 should be
>>
>> _active_workers(UseDynamicNumberOfGCThreads ? MIN2(2, workers) : workers)
>>
>> Should I file a CR for that?  Or do you want to include it.
>
> I'm not sure that what is proposed above is correct. I see that 
> AdaptiveSizePolicy::calc_active_workers returns 2 as a minimum, but 
> both ConcurrentMark::calc_parallel_marking_threads and 
> AdaptiveSizePolicy::calc_active_conc_workers can return 1.
>
> I also don't think it should be AbstractWorkGang's responsibility to 
> have the knowledge about the minimum number of worker threads that are 
> used when UseDynamicNumberOfGCThreads are turned on. Maybe we should 
> set it to 0, and let the calc_*_active_workers setup the default value.

I think that at one time I had tried to set the default to 0 and 
something failed.  I can see the point though.

>
> I would prefer to handle any changes, to this part of the code, as 
> separate RFEs.

Fair enough.  If I think it's worth doing, I'll file and RFE. Probably 
something more than just setting
it to 2 (maybe picking a default value with the help of AdaptiveSizePolicy).

>
>> Have you considered (maybe for a later patch) changing 
>> YieldingFlexibleWorkGang to
>> simply YieldingWorkGang?  The "Flexible" attribute of 
>> YieldingFlexibleWorkGang having
>> been moved into AbstractWorkGang.
>
> I thought about it, but didn't think it was important enough to 
> warrant that change in this patch. I wouldn't mind if a RFE was 
> created to change the name.

I'll file the RFE  if you agree with the point that "Flexible" describes 
what AbstractWorkGang does now.

>
>>
>> http://cr.openjdk.java.net/~stefank/8087323/webrev.00/src/share/vm/gc/cms/yieldingWorkgroup.cpp.frames.html
>>
>> Is the cast at 53 necessary? I see it in the original code too.
>>
>>    50 AbstractGangWorker* YieldingFlexibleWorkGang::allocate_worker(uint which) {
>>    51   YieldingFlexibleGangWorker* new_member =
>>    52       new YieldingFlexibleGangWorker(this, which);
>>    53   return (YieldingFlexibleGangWorker*) new_member;
>>    54 }
>
> Yes, this is unnecessary.

Thanks.

Jon

>
>>
>> The rest looks good.
>
> Thanks.
>
>>
>> I'll do the second patch next.
>
> Great.
>
> StefanK
>
>>
>> Jon
>>
>>>
>>>
>>> The second patch I'd like to get reviewed is:
>>> http://cr.openjdk.java.net/~stefank/8087324/webrev.00/
>>> https://bugs.openjdk.java.net/browse/JDK-8087324 - Use semaphores 
>>> when starting and stopping GC task threads
>>>
>>> It first simplifies the way we distribute the tasks to the GC worker 
>>> threads. For example, the coordinator thread dispatches a task to a 
>>> specific number of workers, and then waits for all work to be 
>>> completed. There's no risk that multiple tasks will be scheduled 
>>> simultaneously, so there's no need for the sequences number that is 
>>> used in the current implementation.
>>>
>>> The patch contains two task dispatch / thread synchronization 
>>> implementations:
>>>
>>> The first implementation uses Monitors, similar to what we did 
>>> before the patch, but with a slightly lower overhead since the code 
>>> calls notify_all less often. It still suffers from the "thundering 
>>> heard" problem. When the coordinator thread signals that the worker 
>>> threads should start, they all wake up from Monitor::wait and they 
>>> all try to lock the Monitor.
>>>
>>> The second, and the more interesting, implementation uses 
>>> semaphores. When the worker threads wake up from the semaphore wait, 
>>> they don't have to serialize the execution by taking a lock. This 
>>> greatly decreases the time it takes to start and stop the worker 
>>> threads.
>>>
>>> The semaphore implementation is used on all platforms where the 
>>> Semaphore class has been implemented in JDK-8087322. So, on some 
>>> OS:es the code will revert to the Monitor-based solution until a 
>>> Semaphore class has been implemented for that OS. So, porters might 
>>> want to consider implementing the Sempahore class.
>>>
>>> There's also a diagnostic vm option 
>>> (-XX:+/-UseSemaphoreGCThreadsSynchronization) to turn off the 
>>> Semaphore-based implementation, which can be used to debug this new 
>>> code. It's mainly targeted towards support and sustaining engineering.
>>>
>>>
>>> The patches have been performance tested on Linux, Solaris, OSX, and 
>>> Windows.
>>>
>>> The effects of the patch can be seen by running benchmarks with 
>>> small young gen sizes, which triggers frequent and short GCs.
>>>
>>> For example, here are runs from the SPECjvm2008 xml.transform 
>>> benchmark with:
>>> -Xmx1g -Xms1g -Xmn64m -XX:+PrintGC -XX:+UseG1GC -jar SPECjvm2008.jar 
>>> -ikv xml.transform -it 30 -wt 30
>>>
>>> I got the following GC times:
>>>
>>>             Average    Median    99.9 percentile   Max
>>> Baseline: 8.76ms    8.44 ms   25.9 ms 34.7 ms
>>> Monitor:   6.17 ms 5.88 ms   26.0 ms 49.1 ms
>>> Semaphore: 3.43 ms 3.26 ms   13.4 ms           33.4 ms
>>>
>>> If I run an empty GC task 10 times per GC, by running the following 
>>> code:
>>> http://cr.openjdk.java.net/~stefank/8087324/timedTask/
>>>
>>> I get the following numbers to complete the empty GC tasks:
>>>
>>>             Average    Median    99.9 percentile   Max
>>> Baseline: 1.43 ms    0.92 ms   3.43 ms           9.30ms
>>> Monitor:    0.75ms 0.72 ms   1.74 ms           2.78ms
>>> Semaphore: 0.07 ms 0.07 ms   0.17 ms           0.26 ms
>>>
>>>
>>>
>>> The code has been tested with JPRT and our nightly testing suites.
>>>
>>> I've created a unit test to run a small test with both the semaphore 
>>> implementation and the monitor implementation:
>>> http://cr.openjdk.java.net/~stefank/8087324/workgangTest/
>>>
>>> But since we currently don't have code to shutdown worker threads 
>>> after they have been started, I don't want to push this test (or 
>>> clean it up) until we have that in place. I created this bug for that:
>>> https://bugs.openjdk.java.net/browse/JDK-8087340
>>>
>>> Thanks,
>>> StefanK
>>>
>>>
>>>
>>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/hotspot-gc-dev/attachments/20150702/ea42ab57/attachment.htm>