Potential optimization to the GC termination protocol in JDK8
Zhengyu Gu
zgu at redhat.com
Mon Jun 11 13:10:05 UTC 2018
On 06/11/2018 07:04 AM, Thomas Schatzl wrote:
> Hi Tony,
>
> On Fri, 2018-06-08 at 11:16 -0500, T S wrote:
>> On Fri, Jun 8, 2018 at 2:35 AM Thomas Schatzl
>> <thomas.schatzl at oracle.com> wrote:
>>>
>>> Hi,
>>>
>>
>> Hi Thomas,
>>
>> Thanks for your feedback.
>>
>>> sorry for being somewhat late in that thread... busy lately...
>>>
>>> On Thu, 2018-04-26 at 08:53 +0200, Roman Kennke wrote:
>>>>>> [Problem]
>>>>>> The work stealing during the GC takes lots of time to
>>>>>> terminate.
>>>>>> [...]
>>
>>>> Hi Tony,
>>>> in Shenandoah we have implemented an improved termination
>>>> protocol.
>>>> See the comment here:
>>>>
>>>> http://hg.openjdk.java.net/shenandoah/jdk/file/b4a3595f7c56/src/hotspot/share/gc/shenandoah/shenandoahTaskqueue.hpp#l93
>>>>
>>>> We intend to upstream this stuff very soon, at which point it can
>>>> be used by all other GCs if they wish. It's a drop-in replacement
>>>> for the existing task queue code.
>>>>
>>>> It sounds like it's similar to what you have in mind. Right?
>>>
>>> The ShenandoahTaskTerminator is different in that it optimizes the
>>> number of woken-up threads and the detection of whether there is
>>> work to steal.
>>>
>>
>> Thanks for the introduction to the ShenandoahTaskTerminator. I will
>> read the code later.
>> Actually, our optimization of the GC terminator is similar to what
>> you described. We also optimize stealing
>
> The ShenandoahTaskTerminator does not change the actual stealing itself
> after non-busy threads are woken up.
>
> It improves the process of waking up threads (as far as I understand
> it, of course ;)) by:
> a) only waking up as many threads as there are work items
> b) only one (changing) master thread looks through the queues for work
>
> This is independent and so complementary to the work provided by you.
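As a rough illustration of properties a) and b) above, here is a minimal
sketch with invented names (illustrative C++ only, not the actual
ShenandoahTaskTerminator code; a real terminator must also track
in-flight workers before declaring global termination):

  #include <algorithm>
  #include <atomic>
  #include <condition_variable>
  #include <mutex>
  #include <vector>

  struct TaskQueue { std::atomic<int> size{0}; };

  class SketchTerminator {
    std::vector<TaskQueue*>  _queues;
    std::mutex               _lock;
    std::condition_variable  _cv;
    int                      _wake_tickets = 0;  // sleepers allowed to leave
    bool                     _terminated   = false;

    int total_tasks() const {
      int n = 0;
      for (TaskQueue* q : _queues) n += q->size.load();
      return n;
    }

  public:
    explicit SketchTerminator(std::vector<TaskQueue*> qs)
        : _queues(std::move(qs)) {}

    // Called by a worker that ran out of work; returns true once the
    // phase may terminate.
    bool offer_termination(bool is_master) {
      std::unique_lock<std::mutex> ml(_lock);
      if (!is_master) {
        // Property b): non-master threads sleep here; only the single
        // (changing) master scans the queues for remaining work.
        _cv.wait(ml, [this] { return _wake_tickets > 0 || _terminated; });
        if (_terminated) return true;
        _wake_tickets--;                   // consume a ticket, steal again
        return false;
      }
      int tasks = total_tasks();
      if (tasks == 0) {
        _terminated = true;                // no queued work left anywhere
        _cv.notify_all();
        return true;
      }
      // Property a): hand out only as many wake-up tickets as there are
      // queued tasks; threads without prospective work keep sleeping.
      _wake_tickets = std::min<int>(tasks, (int)_queues.size());
      _cv.notify_all();                    // sleepers re-check the predicate
      return false;
    }
  };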
>
>> in two ways:
>> 1. using the number of live GC threads to control the total number
>> of steal attempts;
>> 2. choosing which thread to steal from.
>> Our modification of steal on JDK 8 can be found on:
>> https://github.com/tonys24/optimized-steal
>>
>> The basic idea of our work is simple:
>> To realize target 1, we replace _n with live_number, the number of
>> live GC threads.
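Based on that description (a reconstruction on my side, not the actual
file6.diff, whose details may differ), the change to the per-thread
steal loop presumably looks roughly like this; live_number() and
steal_best_of_2 are assumed declarations here:

  typedef unsigned int uint;
  struct Task;

  extern uint live_number();    // assumed: count of not-yet-terminated threads
  extern bool steal_best_of_2(uint queue_num, int* seed, Task& t);

  bool steal_with_live_count(uint queue_num, int* seed, Task& t) {
    uint live = live_number();
    // Was "2 * _n" per thread (2*N*N attempts across all N threads);
    // now bounded by the live thread count instead.
    for (uint i = 0; i < 2 * live; i++) {
      if (steal_best_of_2(queue_num, seed, t)) {
        return true;
      }
    }
    return false;               // give up and offer termination
  }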
>
> I have one question about that, which I think the paper also did not
> answer specifically (though maybe that is a misinterpretation on my
> part).
>
> So steal_best_of_2, instead of doing 2*N*N (N = number of threads)
> steal attempts, does 2*N_live*N_live steal attempts.
>
> Is there any provision to make these steal attempts only on threads
> that actually have work?
>
> I can see that just by reducing the number of steal attempts we
> improve performance (I did such experiments maybe a year or more ago,
> e.g. only doing 2*N attempts or other iteration counts), but merely
> reducing the steal attempts seems to make threads fall back to trying
> to terminate more often.
>
> Consider the situation where N_live is much smaller than N: then the
> probability that a given steal attempt hits a thread that actually
> has work is small, and so there may be a lot of unsuccessful steal
> attempts. E.g. with N = 32 threads but only N_live = 2 still working,
> a uniformly random victim pick lands on a live thread's queue with
> probability of only about 2/32.
>
> It seems much better to not only do fewer attempts, but to make these
> attempts only on the queues of threads that are live.
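A minimal sketch of that suggestion, with invented names (next_random is
an assumed PRNG helper; this is not actual HotSpot code): maintain an
index list of still-live threads and draw steal victims only from it:

  #include <cstddef>
  #include <vector>
  typedef unsigned int uint;

  extern int next_random(int* seed);    // assumed PRNG helper

  // Track which threads are still live and pick steal victims among
  // their queues, so attempts cannot land on an already-terminated
  // thread's empty queue.
  struct LiveQueueSet {
    std::vector<uint> _live;            // queue indices of live threads

    void mark_terminated(uint q) {
      for (size_t i = 0; i < _live.size(); i++) {
        if (_live[i] == q) {
          _live[i] = _live.back();      // O(1) unordered removal
          _live.pop_back();
          return;
        }
      }
    }

    // Returns true and sets *victim if a live queue other than our own
    // exists (conservatively requires at least two live entries).
    bool random_live_victim(uint self, int* seed, uint* victim) {
      if (_live.size() < 2) return false;   // nobody left to steal from
      uint k = self;
      while (k == self) {
        k = _live[(uint)next_random(seed) % _live.size()];
      }
      *victim = k;
      return true;
    }
  };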
>
> The second optimization shown in the paper, trying to keep stealing
> from the same thread as before, probably mitigates this issue of
> unsuccessful steal attempts quite a bit, but still...
I also believe this optimization can be very beneficial. My observation
while working on the ShenandoahTaskTerminator was that, near
termination, usually only one or two queues have remaining work, so the
successful-stealing ratio can be quite low.
Thanks,
-Zhengyu
>
> Property b) of the ShenandoahTaskTerminator might help here.
>
>> https://github.com/tonys24/optimized-steal/blob/master/file6.diff
>> To realize target 2, we replace steal_best_of_2 with
>> steal_best_of_2_kun. We pick two threads: the one we stole from
>> successfully last time, and another picked randomly. Then we steal
>> from the one whose queue is longer.
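Reconstructed from this description only (not copied from the linked
patch; the helper names are assumptions), the heuristic might look
roughly like:

  typedef unsigned int uint;
  struct Task;

  extern int  next_random(int* seed);       // assumed PRNG helper
  extern uint queue_size(uint q);           // assumed queue-length accessor
  extern bool pop_global(uint q, Task& t);  // assumed steal primitive

  static uint _last_victim = 0;             // per-GC-thread state in reality

  bool steal_best_of_2_kun(uint queue_num, uint n, int* seed, Task& t) {
    if (n < 3) {                            // need two distinct candidates
      if (n == 2) {
        uint other = 1 - queue_num;         // the only other queue (0/1)
        if (pop_global(other, t)) { _last_victim = other; return true; }
      }
      return false;
    }
    uint k1 = _last_victim;
    if (k1 == queue_num) {                  // never target our own queue
      k1 = (queue_num + 1) % n;
    }
    uint k2 = queue_num;
    while (k2 == queue_num || k2 == k1) {   // distinct random candidate
      k2 = (uint)next_random(seed) % n;
    }
    // Prefer the candidate with the longer queue.
    uint k = (queue_size(k1) >= queue_size(k2)) ? k1 : k2;
    if (pop_global(k, t)) {
      _last_victim = k;                     // remember the good victim
      return true;
    }
    return false;
  }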
>
> Yeah, that makes sense.
>
> Thanks,
> Thomas
>