RFR: Optimizing best of two work stealing queue selection
Zhengyu Gu
zgu at redhat.com
Fri Jun 22 19:04:26 UTC 2018
Hi Thomas,
>
> If you mean ParallelClaimableQueueSet: I can't see how it would help in
> finding the queues that still have work during stealing (or in case of
> the mentioned optimization, do more frequent everybody-terminated-yet
> checks based on number of active queues).
>
> If I had to guess, I would think the ParallelClaimableQueueSet of a way
> of using the same queue set for phases with a changing number of
> threads but make sure that at the end of a phase all queues are empty.
> But I may be wrong :)
>
Basically, yes.
But I suspect it has effect to balance queues somehow, since I did not
see high stealing failure rates mentioned in paper.
Normal usage pattern for ParallelClaimableQueueSet, is that, workers
compete to claim *extra* queues and process them first, in the process,
they push new tasks to their *owned" queues. In theory, when we enter
termination phase, we can exclude *extra* queues, since they should all
be empty - it is on my TODO list for next week.
> Also the ShenandoahTaskTerminator only helps (from my understanding)
> with the contention on the "is-there-any-work-remaining" question and
> only waking up as many threads as there is work (not everyone).
Yes, we only wake up enough workers for remaining tasks.
>
> The second optimization in the paper, as far as I understand it, is
> limiting on the number of attempts to steal (i.e. after the thread is
> woken up, how often should he try to find work) - i.e. at the moment in
> GenericTaskQueueSet::steal(), steal_best_of_2() is called 2*#threads^2
> attempts to find work within all task queues regardless of how many
> active threads there are.
>
> The solution in the paper is to call steal_best_of_2() only 2*#live-
> threads^2 within all task queues and then go back to see if everybody
> terminated.
Sounds simply enough, I will give it a try next week.
>
> My suggestion is to not look through all task queues, but somehow
> during waking up the threads, where we already find out how many active
> tasks there are, also record which threads were found to have work.
> And only look through these task queues when trying to steal.
>
> Even if that is an approximation too (while we are looking for work,
> threads might complete their queues), it should make these tests more
> effective.
> However given we first try to steal on the queue of the thread we might
> already be extremely efficient in this area.
Eliminating *extra* queues may improve efficiency ... will see if I can
squeeze some juice here.
>>
>> Yes, it was also in my mind, will fix it.
>
> In case you opt for the second, instead of adding parameters every time
> you need a new thread local variable, it might be useful to add a
> single "context" struct containing them, and pass only that around.
>
I notice that ZGC started to use "__thread variables" for thread local
variables. Could you give us some insights if this is the prefer way to
go? If there is any compatibility issues?
Thanks,
-Zhengyu
> Might give nicer looking code.
>
> Thanks,
> Thomas
>
More information about the shenandoah-dev
mailing list