RFR: 8208636: Incresing bottom instead of decreasing top when an owner thread fails to take the last task

Mon Apr 13 07:36:09 UTC 2020

> On Aug 1, 2018, at 8:28 AM, Michihiro Horie <HORIE at jp.ibm.com> wrote:
> 
> Dear all,
> 
> Would you please review the following change?
> This is not a bug fix but refactoring.
> 
> The taskqueue is implemented based on the ABP work-stealing algorithm as noted in comment. (It looks Chase-Lev work-stealing, which addressed the overflow problem of ABP work-stealing, but core algorithm is the same.)
> 231 // Arora, N. S., Blumofe, R. D., and Plaxton, C. G.
> 232 // Thread scheduling for multiprogrammed multiprocessors.
> 233 // Theory of Computing Systems 34, 2 (2001), 115-144.
> 
> Correctness is presented with the following paper as also noted:
> 241 // Le, N. M., Pop, A., Cohen A., and Nardell, F. Z.
> 242 // Correct and efficient work-stealing for weak memory models
> 243 // Proceedings of the 18th ACM SIGPLAN symposium on Principles and
> 244 // practice of parallel programming (PPoPP 2013), 69-80
> 
> There is a slight difference between the current taskqueue and Chase-Lev's deque. When an owner thread and thieves try to take the last task in the deque and a thief succeeds in stealing the task, top is decreased by the owner thread in taskqueue, while bottom is increased in Chase-Lev's deque.
> 
> Le et al. use the characteristics that top monotonically increases for the proofs in their paper. I think there is no reason to decrease top in taskqueue. One tricky thing in current taskqueue is that loading a task before the CAS trial in pop_global() avoids the side effect that owner thread can push (thus, overwrite with) a new task in the same slot at which a thief stole the last task. I think it is more straightforward that top is changeable only via CAS and monotonically increased. Releasing incremented bottom is not necessary because succeeding push will release the bottom. Please note there is no concern in concurrency because size() in pop_global() returns 0 for dirty_size()=N-1 and so there is no thief trying a steal.
> 180 return (sz == N - 1) ? 0 : sz;
> 
> Bug: https://bugs.openjdk.java.net/browse/JDK-8208636
> Webrev: http://cr.openjdk.java.net/~mhorie/8208636/webrev.00/
> 
> I ran JTREG and SPECjbb2015 to confirm there is no change in the test results and performance. Again this is just a refactoring and it's true current taskqueue works even if it is not aligned with the formal proofs partly.
> 
> 
> Best regards,
> --
> Michihiro,
> IBM Research - Tokyo

This has been sitting unreviewed for a long time. I've recently been
looking at the taskqueue code for other reasons, so decided to take
another look at this. I think we shouldn't make the change, as
discussed in the CR, which I've closed as not an issue.

We're using ABP, modified to make the buffer circular.  APB has some
subtle differences from Chase-Lev, which is the algorithm discussed
and given a correctness proof by Le et al.  I think it is better to
stay close to ABP, rather than use some hybrid.