RFR(M): Memory ordering in taskqueue.hpp

Fri Mar 8 07:31:39 PST 2013

Hi Vitaly,

the writer has a sync after the write to bottom (see pop_local)
but the store and the sync are not one atomic operation.  Stuff
can happen before the sync is executed.  Therefore the reader
must sync, too.

Anyways, in pop_global it should be cheaper than in pop_local.

Best regards,
  Geotz.

From: Vitaly Davidovich [mailto:vitalyd at gmail.com]
Sent: Freitag, 8. März 2013 16:13
To: Lindenmaier, Goetz
Cc: hotspot-dev at openjdk.java.net; David Holmes; John Cuthbertson; Simonis, Volker
Subject: RE: RFR(M): Memory ordering in taskqueue.hpp

Hmm, if load fences on the reader aren't sufficient, seems like the problem is missing order/fence on the writer? If the write isn't visible to all processors immediately on power, shouldn't there be a fence in the write to ensure all CPUs see it? Putting the fence in the reader seems odd, but maybe that's just me.

Thanks

Sent from my phone
On Mar 8, 2013 10:03 AM, "Lindenmaier, Goetz" <goetz.lindenmaier at sap.com<mailto:goetz.lindenmaier at sap.com>> wrote:
Hi Vitaly,

No, the problem is not reordering.  The problem is that
_bottom, which is read after _age, might be older than _age because
another processor didn't write it back yet.  The fence (sync) makes the
current thread wait until it has the new _bottom.
On Power, a write is not visible to all other threads simultaneously
(no multipl-copy-atomicity).

What you propose helps if you look at two processors, but not if
four threads on four processors are involved.

Best regards,
  Goetz.

From: Vitaly Davidovich [mailto:vitalyd at gmail.com<mailto:vitalyd at gmail.com>]
Sent: Freitag, 8. März 2013 15:29
To: Lindenmaier, Goetz
Cc: David Holmes; hotspot-dev at openjdk.java.net<mailto:hotspot-dev at openjdk.java.net>; John Cuthbertson; Simonis, Volker
Subject: RE: RFR(M): Memory ordering in taskqueue.hpp

Hi Goetz,

If I understand correctly the problem is that bottom is loaded before age via compiler or arch reordering.  But why is a full fence needed? Is it not sufficient to have just a load fence to load age and relaxed load (just compiler barrier) for bottom?

Thanks

Sent from my phone
On Mar 8, 2013 9:06 AM, "Lindenmaier, Goetz" <goetz.lindenmaier at sap.com<mailto:goetz.lindenmaier at sap.com>> wrote:
Hi,

we have been, and still are, doing research on this issue.
We want to keep you up to date on this, and propose on the
further proceeding.

You asked explicit memory ordering operations and a rationale
why we added them.

Axel found a paper describing the task queue algorithm and the
needed ordering on arm and power:
Correct and Efficient Work-Stealing for Weak Memory Models;
Lê, Pop, Cohen, Nardelli; PPoPP'13;
http://www.di.ens.fr/~zappa/readings/ppopp13.pdf

According to this paper we need to add one fence and one load_acquire
to your implementation of the task queue.  You find this fence in this small
webrev:  http://cr.openjdk.java.net/~goetz/webrevs/8006971-2/

With this fence, the algorithm works on Linux for our openjdk ppc
port, and also for our SAP JVM .

Actually, the fence fixes a problem we discovered with the concurrency
torture test suite.  The problem occurs with four or more GC threads.
If three threads are stealing from the queue of the fourth, two of
them can pop the same element. Without a fence between the access
to age and bottom in pop_global(), bottom can be older than age.

Unfortunately, the OpenJDK implementation with the added fence
does not work on AIX.  Axel already detected one place where the xlC
compiler optimization removed load instruction that is required for
the correctness of the algorithm.  If we use our access routines with load_acquire
(see original webrev below) everything works, unclear why.

Now, we think C++ might allow that this load is removed and xlC does
the correct, but unexpected thing.  On the other hand it might also be
a compiler bug.
We are currently discussing this with the IBM xlC developers.

Best regards,
  Axel and Goetz.

PS: The webrev we proposed originally:
http://cr.openjdk.java.net/~goetz/webrevs/webrev-taskqueue/