RFR(M): Memory ordering in taskqueue.hpp

David Holmes david.holmes at oracle.com
Wed Dec 19 17:36:01 PST 2012


On 20/12/2012 2:48 AM, Lindenmaier, Goetz wrote:
> Hi David,
>
> first, thanks for the fast replies. Sorry it took me a day now!

No problem. I can't guarantee response times either :)

>> I didn't say that. I said these two need to be ordered and asked which
>> other paired accesses to these variables need to be ordered.
> Together with a colleague I looked at the code. To us it seems
> very hard to identify places where access ordering is not needed.
> Especially we think there will be places in the users of the
> task queue that would need barriers to ensure ordering.
> But we think, the algorithm should be self-contained, doing all the
> required ordering itself.

I agree the algorithm should be self-contained. My point is that in 
other places where we implement lock-free algorithm was put in the 
barriers/fences in the algorithmic code in the places needed we don't 
automatically wrap all of the variables accesses with code that 
introduces pre/post barriers (of whatever form).

> Further, a barrier will spoil performance on x86 etc, because there it
> issues a lock instruction, which is not needed.

You issue the appropriate barrier/fence for the situation and they 
should be implemented in whatever form needed for the given platform 
(modulo OrderAccess is far from perfect and has some outstanding issues).

>> I said that what you were doing was effectively treating the C++ variables
>> as if they were Java volatile variables. That's not the way we handle
>> these issues in the VM we tend to place fences/barriers at the places in
>> lock-free algorithms that need them, rather than wrap the variables in
>> accessors that impose fences/barriers that are not always needed.
>
> As we understand, the C compilers on x86, sparc etc, enforce far more
> than required to implement C++ volatile, basically treating the fields
> like Java volatiles.  Therefore your code is fine on your platforms.

I think you are confusing things. The existing C/C++ compilers do very 
little in regard to "volatile". It is only the latest multi-threading 
variant of the C++ standard that even addresses this! Our C/C++ 
compilers never issue any kind of memory barriers in response to a 
load/store of a volatile variable.

> The C++ compilers on PPC optimize far more, maxing out
> what the standard allows and the processer supports, leading to
> immediate errors, but also to very subtle ones.
>
> Our change makes explicit what the C compilers (except for PPC) do
> anyways, and what is required by the algorithm. (Maybe one could now
> even omit the volatile keywords - but we dare not try.) Thus, we get
> it right without any overhead on your platforms.

This is getting things confused. The lock-free algorithms should be 
written for the most lenient of memory-models and make no assumptions 
about ordering. They should use the appropriate OrderAccess method where 
needed for algorithmic correctness.

You proposal wraps the C variable in accessors that use OrderAccess 
operations on every access - in a similar way to how we have to handle 
Java volatile accesses. And as I have said that is not the style we use 
in hotspot's lock-free code. This has nothing to do with our compiler 
versus your compiler (though if you have one that issues memory-barriers 
directly then you'll get too much overhead.)

>> Though we have not run into this particular issue. Did
>> it arise in the context of a particular GC?
> Issues are only on PPC, at least in parallelScavenge. I reproduced
> some of the immediate errors, but I don't think this really helps with
> the discussion.  I will try G1 soon.  Tell me if you want to know
> more, then I'll try to come up with a wrap-up of our past fixes.

We have a PPC product and as I said we have not encountered this 
problem, but we also run with limited GCs so may not have been 
exercising this code.

Thanks,
David

> Best regards,
>    Goetz.
>
>
> -----Original Message-----
> From: David Holmes [mailto:david.holmes at oracle.com]
> Sent: Dienstag, 18. Dezember 2012 12:19
> To: Lindenmaier, Goetz
> Cc: hotspot-dev at openjdk.java.net
> Subject: Re: RFR(M): Memory ordering in taskqueue.hpp
>
> Hi Goetz,
>
> On 18/12/2012 8:47 PM, Lindenmaier, Goetz wrote:
>> Hi David,
>>
>> why do you think only these two accesses have to be ordered?
>
> I didn't say that. I said these two need to be ordered and asked which
> other paired accesses to these variables need to be ordered.
>
>> As I understand, any two accesses in one thread to these fields have to be
>> orderered and must be written to memory to become visible in the proper order
>> to other threads.
>
> That depends on how they are used. It is the overall lock-free algorithm
> that is important here. There may be places where the order of access
> does not matter (and there may not be).
>
>> The PPC compiler really takes all advantages given by the
>> C-Standard to optimize this, and the processor reorders heavily.
>
> Yes I understand. Though we have not run into this particular issue. Did
> it arise in the context of a particular GC?
>
>> The specification of volatile in
>>          Age get() const volatile { return _data; }
>> says that it must be read from memory, not that it can not be reordered.
>
> I didn't say anything about C/C++ volatile (which doesn't even say it
> must be read from memory - except perhaps in latest MT C++ standard?). I
> said that what you were doing was effectively treating the C++ variables
> as if they were Java volatile variables. That's not the way we handle
> these issues in the VM we tend to place fences/barriers at the places in
> lock-free algorithms that need them, rather than wrap the variables in
> accessors that impose fences/barriers that are not always needed.
>
> Cheers,
> David
>
>> Also, doing
>>         OrderAccess::load_acquire((volatile idx_t*)&(_age._fields._top))
>> is better wrt. to optimizations by C-compilers, as there first
>> is the whole address computation, and then the cast to volatile.
>> If we use
>>         _age.top()
>> with
>>         idx_t top() const volatile { return OrderAccess::load_acquire((volatile idx_t*)&(_fields._top); }
>> compilers don't optimize that well.  (We saw this on hpia64.)
>> Further, the access to, e.g., old_age.top() in pop_local_slow() would do
>> the OrderAccess, which is useless overhead.
>>
>> Best regards,
>>     Goetz.
>>
>>
>> -----Original Message-----
>> From: David Holmes [mailto:david.holmes at oracle.com]
>> Sent: Dienstag, 18. Dezember 2012 01:24
>> To: Lindenmaier, Goetz
>> Cc: hotspot-dev at openjdk.java.net
>> Subject: Re: RFR(M): Memory ordering in taskqueue.hpp
>>
>> Hi Goetz,
>>
>> On 17/12/2012 10:58 PM, Lindenmaier, Goetz wrote:
>>> Hi,
>>>
>>> Once more, with webrev:
>>> http://cr.openjdk.java.net/~goetz/webrevs/webrev-taskqueue/
>>
>> I can see that this function needs a storestore barrier:
>>
>>     237   void set_empty() {
>>     238     _bottom = 0;
>>     239     _age.set(0);
>>     240   }
>>
>> to preserve ordering. Are there other functions to which we can
>> constrain the loadload and storestore barriers rather than using the
>> setters/getters the way you have defined? This is always about a pair
>> (sometimes groups) of accesses so I'd rather deal with the pairs than
>> treat each field as if it were a Java volatile.
>>
>> Thanks,
>> David Holmes
>>
>>
>>> Sorry for that.
>>>
>>> I would like to contribute fixes wrt. memory ordering in taskqueue.hpp.
>>>
>>> On Platfoms with weak memory ordering the taskqueue is not accessed
>>> properly, as the accesses to the fields _bottom and _age are not ordered
>>> correctly. Volatile is not sufficient to enforce this, because it depends on
>>> what the compiler assumes to be necessary for volatile variables.
>>>
>>> The fix is to pull getter/setter routines from Age to TaskQueueSuper and use methods from
>>> OrderAccess to access the fields in _age as well as _bottom.   Further the code must always
>>> use the getter/setter methods to access _bottom, _age or the subfields of _age.
>>> On the other hand we can relax constraints for accesses to locals oldAge and newAge.
>>>
>>> The OrderAccess routines do simple load/stores on x86_64.
>>>
>>> I want to discuss this change here and would like very much to see it taking
>>> it's way into OpenJDK to support ports on platforms with weak memory
>>> ordering, and, in particular, our PPC port.
>>>
>>> Best regards,
>>>      Goetz.
>>>
>>>


More information about the hotspot-dev mailing list