RFR(M): Memory ordering in taskqueue.hpp

Lindenmaier, Goetz goetz.lindenmaier at sap.com
Fri Mar 8 06:04:07 PST 2013


We have been, and still are, investigating this issue.
We want to keep you up to date and propose how to proceed.

You asked for explicit memory ordering operations and a rationale
for why we added them.

Axel found a paper describing the task queue algorithm and the
ordering it needs on ARM and POWER:
Correct and Efficient Work-Stealing for Weak Memory Models;
Lê, Pop, Cohen, Nardelli; PPoPP'13.

According to this paper we need to add one fence and one load_acquire
to your implementation of the task queue.  You can find this fence in
this small webrev:  http://cr.openjdk.java.net/~goetz/webrevs/8006971-2/

With this fence, the algorithm works on Linux for our OpenJDK ppc
port, and also for our SAP JVM.

Actually, the fence fixes a problem we discovered with the concurrency
torture test suite.  The problem occurs with four or more GC threads.
If three threads are stealing from the queue of the fourth, two of 
them can pop the same element. Without a fence between the access
to age and bottom in pop_global(), bottom can be older than age. 
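
To illustrate where the fence sits, here is a minimal sketch of the
stealing path.  It is not the code from the webrev: SketchTaskQueue
and its members are hypothetical names, std::atomic stands in for the
HotSpot OrderAccess routines, and the age tag and index wrap-around
are left out, so the sketch only shows the ordering of the two loads.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical, simplified stand-in for the task queue.
// Assumptions: no age tag, no index wrap-around, no pop_local.
template <typename T, uint32_t N>
struct SketchTaskQueue {
  std::atomic<uint32_t> age{0};     // index of the oldest element (steal end)
  std::atomic<uint32_t> bottom{0};  // index one past the newest element
  T elems[N];

  // Owner thread only: store the element, then publish it via bottom.
  bool push_local(const T& t) {
    uint32_t b = bottom.load(std::memory_order_relaxed);
    if (b == N) return false;  // full (this sketch never reuses slots)
    elems[b] = t;
    bottom.store(b + 1, std::memory_order_release);
    return true;
  }

  // Any stealing thread: try to take the oldest element.
  bool pop_global(T& out) {
    uint32_t old_age = age.load(std::memory_order_acquire);
    // Full fence between the two loads, so that the value read for
    // bottom cannot be older than the value read for age.
    std::atomic_thread_fence(std::memory_order_seq_cst);
    uint32_t local_bot = bottom.load(std::memory_order_acquire);
    if (local_bot <= old_age) return false;  // queue looks empty
    out = elems[old_age];
    // Claim the element; fails if another thief advanced age first.
    return age.compare_exchange_strong(old_age, old_age + 1,
                                       std::memory_order_seq_cst);
  }
};
```

Only the thread whose compare-and-swap on age succeeds keeps the
element; a thread that loses the race returns false and should
discard the value it copied into out.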

Unfortunately, the OpenJDK implementation with the added fence
does not work on AIX.  Axel already found one place where an xlC
compiler optimization removed a load instruction that is required for
the correctness of the algorithm.  If we use our access routines with
load_acquire (see original webrev below) everything works, though it
is unclear why.

We now think C++ might allow this load to be removed, in which case
xlC does the correct, but unexpected, thing.  On the other hand, it
might also be a compiler bug.
We are currently discussing this with the IBM xlC developers.
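
As a rough illustration of why an explicit access routine can matter
to the compiler, consider the sketch below.  The helper names are
hypothetical, std::atomic is only a stand-in for our access routines,
and we do not claim this is the pattern xlC optimized.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical accessor, not the routine from the webrev.
inline uint32_t load_acquire(const std::atomic<uint32_t>& field) {
  return field.load(std::memory_order_acquire);
}

// Returns true if the field no longer holds the value seen earlier.
// If the field were a plain (non-atomic, non-volatile) variable, a
// compiler could legally reuse the earlier value from a register and
// never issue this second load.  The atomic acquire load has to be
// performed.
inline bool changed_since(const std::atomic<uint32_t>& field,
                          uint32_t seen) {
  return load_acquire(field) != seen;
}
```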

Best regards,
  Axel and Goetz.

PS: The webrev we proposed originally:
