RFR(M): Memory ordering in taskqueue.hpp
Lindenmaier, Goetz
goetz.lindenmaier at sap.com
Thu Dec 20 02:18:08 PST 2012
Hi David,
I'll split this up, and first answer the easy one:
> We have a PPC product and as I said we have not encountered this
> problem, but we also run with limited GCs so may not have been
> exercising this code.
I know. That's only 32 bit, right, and for embedded?
We also wondered why you are not experiencing the same problems
there. A reason might be that 32-bit PPC behaves different from
64-bit. E.g., the compiler might be more conservative on 32-bit.
Some of our problems are easy to reproduce, see the attached file.
To produce the errors in the attached file I removed the
taskqueue change from our port. It did not fail with CMS
(-XX:+UseConcMarkSweepGC -XX:SurvivorRatio=4).
We also had issues that were much harder to detect.
In general, since we get more machines with many processor
threads (e.g., 32), we also see more errors with memory ordering
issues. Maybe you want to have a look at the fixes I already put into
the port:
(taskqueue: http://hg.openjdk.java.net/ppc-aix-port/jdk7u/hotspot/rev/ca3cac6a53bd you know this one)
opto: http://hg.openjdk.java.net/ppc-aix-port/jdk7u/hotspot/rev/3cd0e8951ecc
C-interpreter: http://hg.openjdk.java.net/ppc-aix-port/jdk7u/hotspot/rev/6c726fea31d4
Other: http://hg.openjdk.java.net/ppc-aix-port/jdk7u/hotspot/rev/07fd72d0ca47
It took us a row of fixes to end up at the solution for taskqueue I contributed.
In 2006, we added OrderAccess in OopTaskQueue::push() and
GenTaskQueue::push(). We also added volatiles that were
gone from your implementation for a while.
In 2008, we wrapped all accesses to _bottom in functions with
release/aquire (as it is still the case). We added release/acquire in
get_age, get_top, set_age. We did this after debugging a bug on PPC
where two threads GC'ed the same oop.
This was all done with #ifdefs for PPC. Later we decided to remove the
#ifdefs and only use the code with the release/acquire access functions.
On x86 etc. this results in the same code, as the OrderAccess routines only do
a cast to volatile. On ia64 we found a considerable performance
increase, because the HP C-compiler did better inlining.
When Oracle introduced Age::top() to replace SuperTaskQueue::get_top() etc,
the access ordering got lost in our code due to a wrong integration to our code
in 2010. Half a year later we again identified a bug, and added OrderAccess in Age::top().
This didn't suffice so 3 months later we replaced Age::top() by
SuperTaskQueue::get_age_top(). Since that we had no issues with the taskqueue.
Now I'll go back and address the other issues ...
Best regards,
Goetz.
Thanks,
David
> Best regards,
> Goetz.
>
>
> -----Original Message-----
> From: David Holmes [mailto:david.holmes at oracle.com]
> Sent: Dienstag, 18. Dezember 2012 12:19
> To: Lindenmaier, Goetz
> Cc: hotspot-dev at openjdk.java.net
> Subject: Re: RFR(M): Memory ordering in taskqueue.hpp
>
> Hi Goetz,
>
> On 18/12/2012 8:47 PM, Lindenmaier, Goetz wrote:
>> Hi David,
>>
>> why do you think only these two accesses have to be ordered?
>
> I didn't say that. I said these two need to be ordered and asked which
> other paired accesses to these variables need to be ordered.
>
>> As I understand, any two accesses in one thread to these fields have to be
>> orderered and must be written to memory to become visible in the proper order
>> to other threads.
>
> That depends on how they are used. It is the overall lock-free algorithm
> that is important here. There may be places where the order of access
> does not matter (and there may not be).
>
>> The PPC compiler really takes all advantages given by the
>> C-Standard to optimize this, and the processor reorders heavily.
>
> Yes I understand. Though we have not run into this particular issue. Did
> it arise in the context of a particular GC?
>
>> The specification of volatile in
>> Age get() const volatile { return _data; }
>> says that it must be read from memory, not that it can not be reordered.
>
> I didn't say anything about C/C++ volatile (which doesn't even say it
> must be read from memory - except perhaps in latest MT C++ standard?). I
> said that what you were doing was effectively treating the C++ variables
> as if they were Java volatile variables. That's not the way we handle
> these issues in the VM we tend to place fences/barriers at the places in
> lock-free algorithms that need them, rather than wrap the variables in
> accessors that impose fences/barriers that are not always needed.
>
> Cheers,
> David
>
>> Also, doing
>> OrderAccess::load_acquire((volatile idx_t*)&(_age._fields._top))
>> is better wrt. to optimizations by C-compilers, as there first
>> is the whole address computation, and then the cast to volatile.
>> If we use
>> _age.top()
>> with
>> idx_t top() const volatile { return OrderAccess::load_acquire((volatile idx_t*)&(_fields._top); }
>> compilers don't optimize that well. (We saw this on hpia64.)
>> Further, the access to, e.g., old_age.top() in pop_local_slow() would do
>> the OrderAccess, which is useless overhead.
>>
>> Best regards,
>> Goetz.
>>
>>
>> -----Original Message-----
>> From: David Holmes [mailto:david.holmes at oracle.com]
>> Sent: Dienstag, 18. Dezember 2012 01:24
>> To: Lindenmaier, Goetz
>> Cc: hotspot-dev at openjdk.java.net
>> Subject: Re: RFR(M): Memory ordering in taskqueue.hpp
>>
>> Hi Goetz,
>>
>> On 17/12/2012 10:58 PM, Lindenmaier, Goetz wrote:
>>> Hi,
>>>
>>> Once more, with webrev:
>>> http://cr.openjdk.java.net/~goetz/webrevs/webrev-taskqueue/
>>
>> I can see that this function needs a storestore barrier:
>>
>> 237 void set_empty() {
>> 238 _bottom = 0;
>> 239 _age.set(0);
>> 240 }
>>
>> to preserve ordering. Are there other functions to which we can
>> constrain the loadload and storestore barriers rather than using the
>> setters/getters the way you have defined? This is always about a pair
>> (sometimes groups) of accesses so I'd rather deal with the pairs than
>> treat each field as if it were a Java volatile.
>>
>> Thanks,
>> David Holmes
>>
>>
>>> Sorry for that.
>>>
>>> I would like to contribute fixes wrt. memory ordering in taskqueue.hpp.
>>>
>>> On Platfoms with weak memory ordering the taskqueue is not accessed
>>> properly, as the accesses to the fields _bottom and _age are not ordered
>>> correctly. Volatile is not sufficient to enforce this, because it depends on
>>> what the compiler assumes to be necessary for volatile variables.
>>>
>>> The fix is to pull getter/setter routines from Age to TaskQueueSuper and use methods from
>>> OrderAccess to access the fields in _age as well as _bottom. Further the code must always
>>> use the getter/setter methods to access _bottom, _age or the subfields of _age.
>>> On the other hand we can relax constraints for accesses to locals oldAge and newAge.
>>>
>>> The OrderAccess routines do simple load/stores on x86_64.
>>>
>>> I want to discuss this change here and would like very much to see it taking
>>> it's way into OpenJDK to support ports on platforms with weak memory
>>> ordering, and, in particular, our PPC port.
>>>
>>> Best regards,
>>> Goetz.
>>>
>>>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: taskqueue-failures.txt
Url: http://mail.openjdk.java.net/pipermail/hotspot-dev/attachments/20121220/9ce08f58/taskqueue-failures-0001.txt
More information about the hotspot-dev
mailing list