RFR(M): Memory ordering in taskqueue.hpp

Lindenmaier, Goetz goetz.lindenmaier at sap.com
Wed Dec 19 08:48:41 PST 2012


Hi David, 

first, thanks for the fast replies. Sorry it took me a day now!

> I didn't say that. I said these two need to be ordered and asked which 
> other paired accesses to these variables need to be ordered.
Together with a colleague I looked at the code. To us it seems 
very hard to identify places where access ordering is not needed.
Especially we think there will be places in the users of the 
task queue that would need barriers to ensure ordering.
But we think, the algorithm should be self-contained, doing all the
required ordering itself. 

Further, a barrier will spoil performance on x86 etc, because there it
issues a lock instruction, which is not needed. 


> I said that what you were doing was effectively treating the C++ variables 
> as if they were Java volatile variables. That's not the way we handle 
> these issues in the VM we tend to place fences/barriers at the places in 
> lock-free algorithms that need them, rather than wrap the variables in 
> accessors that impose fences/barriers that are not always needed.

As we understand, the C compilers on x86, sparc etc, enforce far more
than required to implement C++ volatile, basically treating the fields
like Java volatiles.  Therefore your code is fine on your platforms.
The C++ compilers on PPC optimize far more, maxing out
what the standard allows and the processer supports, leading to
immediate errors, but also to very subtle ones.

Our change makes explicit what the C compilers (except for PPC) do
anyways, and what is required by the algorithm. (Maybe one could now
even omit the volatile keywords - but we dare not try.) Thus, we get
it right without any overhead on your platforms.


> Though we have not run into this particular issue. Did 
> it arise in the context of a particular GC?
Issues are only on PPC, at least in parallelScavenge. I reproduced
some of the immediate errors, but I don't think this really helps with
the discussion.  I will try G1 soon.  Tell me if you want to know
more, then I'll try to come up with a wrap-up of our past fixes.

Best regards,
  Goetz.


-----Original Message-----
From: David Holmes [mailto:david.holmes at oracle.com] 
Sent: Dienstag, 18. Dezember 2012 12:19
To: Lindenmaier, Goetz
Cc: hotspot-dev at openjdk.java.net
Subject: Re: RFR(M): Memory ordering in taskqueue.hpp

Hi Goetz,

On 18/12/2012 8:47 PM, Lindenmaier, Goetz wrote:
> Hi David,
>
> why do you think only these two accesses have to be ordered?

I didn't say that. I said these two need to be ordered and asked which 
other paired accesses to these variables need to be ordered.

> As I understand, any two accesses in one thread to these fields have to be
> orderered and must be written to memory to become visible in the proper order
> to other threads.

That depends on how they are used. It is the overall lock-free algorithm 
that is important here. There may be places where the order of access 
does not matter (and there may not be).

> The PPC compiler really takes all advantages given by the
> C-Standard to optimize this, and the processor reorders heavily.

Yes I understand. Though we have not run into this particular issue. Did 
it arise in the context of a particular GC?

> The specification of volatile in
>         Age get() const volatile { return _data; }
> says that it must be read from memory, not that it can not be reordered.

I didn't say anything about C/C++ volatile (which doesn't even say it 
must be read from memory - except perhaps in latest MT C++ standard?). I 
said that what you were doing was effectively treating the C++ variables 
as if they were Java volatile variables. That's not the way we handle 
these issues in the VM we tend to place fences/barriers at the places in 
lock-free algorithms that need them, rather than wrap the variables in 
accessors that impose fences/barriers that are not always needed.

Cheers,
David

> Also, doing
>        OrderAccess::load_acquire((volatile idx_t*)&(_age._fields._top))
> is better wrt. to optimizations by C-compilers, as there first
> is the whole address computation, and then the cast to volatile.
> If we use
>        _age.top()
> with
>        idx_t top() const volatile { return OrderAccess::load_acquire((volatile idx_t*)&(_fields._top); }
> compilers don't optimize that well.  (We saw this on hpia64.)
> Further, the access to, e.g., old_age.top() in pop_local_slow() would do
> the OrderAccess, which is useless overhead.
>
> Best regards,
>    Goetz.
>
>
> -----Original Message-----
> From: David Holmes [mailto:david.holmes at oracle.com]
> Sent: Dienstag, 18. Dezember 2012 01:24
> To: Lindenmaier, Goetz
> Cc: hotspot-dev at openjdk.java.net
> Subject: Re: RFR(M): Memory ordering in taskqueue.hpp
>
> Hi Goetz,
>
> On 17/12/2012 10:58 PM, Lindenmaier, Goetz wrote:
>> Hi,
>>
>> Once more, with webrev:
>> http://cr.openjdk.java.net/~goetz/webrevs/webrev-taskqueue/
>
> I can see that this function needs a storestore barrier:
>
>    237   void set_empty() {
>    238     _bottom = 0;
>    239     _age.set(0);
>    240   }
>
> to preserve ordering. Are there other functions to which we can
> constrain the loadload and storestore barriers rather than using the
> setters/getters the way you have defined? This is always about a pair
> (sometimes groups) of accesses so I'd rather deal with the pairs than
> treat each field as if it were a Java volatile.
>
> Thanks,
> David Holmes
>
>
>> Sorry for that.
>>
>> I would like to contribute fixes wrt. memory ordering in taskqueue.hpp.
>>
>> On Platfoms with weak memory ordering the taskqueue is not accessed
>> properly, as the accesses to the fields _bottom and _age are not ordered
>> correctly. Volatile is not sufficient to enforce this, because it depends on
>> what the compiler assumes to be necessary for volatile variables.
>>
>> The fix is to pull getter/setter routines from Age to TaskQueueSuper and use methods from
>> OrderAccess to access the fields in _age as well as _bottom.   Further the code must always
>> use the getter/setter methods to access _bottom, _age or the subfields of _age.
>> On the other hand we can relax constraints for accesses to locals oldAge and newAge.
>>
>> The OrderAccess routines do simple load/stores on x86_64.
>>
>> I want to discuss this change here and would like very much to see it taking
>> it's way into OpenJDK to support ports on platforms with weak memory
>> ordering, and, in particular, our PPC port.
>>
>> Best regards,
>>     Goetz.
>>
>>


More information about the hotspot-dev mailing list