RFR (S): 8029957: PPC64 (part 213): cppInterpreter: memory ordering for object initialization

Lindenmaier, Goetz goetz.lindenmaier at sap.com
Wed Jan 22 06:01:14 PST 2014


Hi,

thanks for the hint to this paper, David!  
We modified the known read-after-write test to use the serialization page
mechanism, and reading that paper we were able to understand it is correct
on X86 / TSO.  Thanks to Andreas Schoesser for this.
http://cr.openjdk.java.net/~goetz/raw_serialization_page/raw_s11n_page.cpp

We ran the test on linux ppc and aix over the weekend, without failure.
This indicates the mechanism works there, too.
The test fails immediately if the "write" in the fast thread is omitted.
We also checked that both threads make considerable progress, and the
fast thread does not always trap.

Also, we looked at documentation about how mprotect etc should
be implemented on ppc.
A mprotect should do a tlbie, invidating the TLBs of all processors.
It also should do an eieio and a tlbsync
This assures the TLB of the "fast" process doing only the write is
invalidated before mprotect returns.  So the fast thread will 
experience a TLB-miss on the write.  We assume on the TLB-miss
some synchronization instruction is used, probably the ptesync.
This should assure the fast thread gets the new value even if the 
page is already writable again.
Unfortunately this is implemented in supervisor code, which is not 
available to us.

I'll add a patch that enables UseMembar in our VM to test the performance
impacts of that variant.
As I don't see a good reason this flag is product, I'll change it to 
debug for the test.  As you described, the flag should be set depending on 
whether the serialization page works.  So nowadays it serves as a platform 
configuration so I think debug is the better choice.
(Btw, you could move UsePPCLWSYNC into your globals_ppc.hpp with the new
platform-dependent flags.)

Best regards,
  Goetz.

-----Original Message-----
From: David Holmes [mailto:david.holmes at oracle.com] 
Sent: Dienstag, 21. Januar 2014 03:49
To: Lindenmaier, Goetz; Vladimir Kozlov
Cc: 'ppc-aix-port-dev at openjdk.java.net'; 'hotspot-dev Source Developers'
Subject: Re: RFR (S): 8029957: PPC64 (part 213): cppInterpreter: memory ordering for object initialization

On 20/01/2014 11:41 PM, Lindenmaier, Goetz wrote:
> Hi David,
>
> I understand your arguments and basically agree with them.
> If the serialization page does not work on PPC, your solution
> 1) is best.
>
> But I'm not sure why there should be a link between TSO and whether
> the serialization page trick works.  Second depends, as you say,
> on the OS implementation.

My limited understanding is that on RMO-based systems the requirements for:

"Synchronization based on page-protections - mprotect()"

as described in:

http://home.comcast.net/~pjbishop/Dave/Asymmetric-Dekker-Synchronization.txt

may not always hold. Dave Dice would need to provide more details if needed.

Cheers,
David

> I would assume that the write to the serialization page causes
> the OS to generate a new TLB entry or the like, involving the
> use of a ptwsync instruction which would be fine.  But we are
> investigating this.
>
> We are also experimenting with a small regression test to find
> out whether the  serialization page ever fails.
>
> Best regards,
>    Goetz.
>
>
>
>
>
> -----Original Message-----
> From: David Holmes [mailto:david.holmes at oracle.com]
> Sent: Montag, 20. Januar 2014 03:45
> To: Lindenmaier, Goetz; Vladimir Kozlov
> Cc: 'ppc-aix-port-dev at openjdk.java.net'; 'hotspot-dev Source Developers'
> Subject: Re: RFR (S): 8029957: PPC64 (part 213): cppInterpreter: memory ordering for object initialization
>
> On 17/01/2014 11:30 PM, Lindenmaier, Goetz wrote:
>> Hi,
>>
>> I had a look at the first part of this issue: Whether StoreStore
>> is necessary in the interpreter.  Let's for now assume the serialization
>> page mechanism works on PPC.
>>
>> In the state transition leaving the VM state, which is executed in the
>> destructor, ThreadStateTransition::transition() is called, which executes
>>         if (UseMembar) {
>>           OrderAccess::fence();
>>         } else {
>>           os::write_memory_serialize_page(thread);
>>         }
>>
>> os:: write_memory_serialize_page() can not be considered a proper
>> MemBar, as it only serializes if another thread poisoned the page.
>> Thus it does not qualify to order the initialization and the publishing
>> of the object.
>>
>> You are right, if UseMembar is true, the StoreStore in the interpreter
>> is superfluous.  We could guard the StoreStores in the interpreter by
>> !UseMembar.
>
> My understanding, from our existing non-TSO system ports, is that the
> present assumption is that either:
>
> a) you have a TSO system, in which case you are probably using the
> serialization page, but you don't need any barrier to enforce ordering
> anyway; or
>
> b) you don't have a TSO system, you are using UseMembar==true and so you
> get a full fence inserted that enforces the ordering anyway.
>
> So the ordering requirements are satisfied by piggy-backing on the
> UseMembar setting that comes from the thread state transition code,
> which forms part of the "runtime entry" code. That's not to say that you
> will necessarily find this applied consistently in all places where it
> might be applied - nor will you necessarily find that this is common
> knowledge amongst VM engineers.
>
> Technically the storeStore barriers could be conditional on !UseMembar
> but that is redundant in the current usage.
>
>> But then again, one is to order the publishing of the thread states,
>> the other to enforce some Java semantics.  I don't know whether everybody
>> who changes in one place is aware of both issues.  But if you want to,
>> I'll add a !UseMembar in the interpreter.
>
> Here are my preferred options in order:
>
> 1. Set UseMembar==true on PPC64 and drop these new storeStore barriers -
> rely on the piggy-backing effect.
>
> 2. Conditionalize the new storeStore barriers on !UseMembar. This
> unfortunately penalizes all platforms with a runtime check.
>
> 3. Add the storeStores unconditionally. This penalizes platforms that
> set UseMembar==true as we will now get two fences at runtime.
>
> I know we're talking about the interpreter here so performance is not
> exactly critical, but still ...
>
>> Maybe it would be a good idea
>> to document the double use in interfaceSupport.cpp, too.  And maybe
>> add an assertion of some kind.
>
> interfaceSupport doesn't know that other code piggy-backs on the fact
> state-transitions have full fences when UseMembar is true. If it is
> documented anywhere it should be in the interpreter (and any other
> places that makes the same assumption) - something like:
>
> // On non-TSO systems there can be additional ordering constraints
> // between Java-level actions (such as allocation and constructor
> // invocation) that in principle need explicit memory barriers.
> // However, on many non-TSO systems the thread-state transition logic
> // in the IRT_ENTRY code will insert a full fence due to the use of
> // UseMembar==true, which provides the necessary ordering guarantees.
>
>>
>> We're digging into the other issue currenty, whether the serialization page
>> works on ppc.  We understand your concerns and have no simple answer to
>> it right now.  At least, in our VM and in the port there are no known problems
>> with the state transitions.
>
> Even if the memory serialization page does not work, in a guaranteed
> sense, on PPC-AIX, it is extremely unlikely that testing would expose
> this. Also note that the memory serialization page behaviour is more a
> function of the OS - so it may be that AIX is different to linux in that
> regard.
>
> Cheers,
> David
> -----
>
>> Best regards,
>>     Goetz.
>>
>>
>>
>> -----Original Message-----
>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>> Sent: Donnerstag, 16. Januar 2014 19:16
>> To: David Holmes; Lindenmaier, Goetz
>> Cc: 'ppc-aix-port-dev at openjdk.java.net'; 'hotspot-dev Source Developers'
>> Subject: Re: RFR (S): 8029957: PPC64 (part 213): cppInterpreter: memory ordering for object initialization
>>
>> Changes are in C++ Interpreter so it does not affect Oracle VM.
>> But David has point here. I would like to hear the explanation too.
>>
>> BTW, I see that for ppc64:
>>
>> src/cpu/ppc/vm//globals_ppc.hpp:define_pd_global(bool, UseMembar, false);
>>
>> as result write_memory_serialize_page() is used in
>> ThreadStateTransition::transition().
>>
>> Is it not enough on PPC64?
>>
>> Thanks,
>> Vladimir
>>
>> On 1/15/14 9:30 PM, David Holmes wrote:
>>> Can I get some response on this please - specifically the redundancy wrt
>>> IRT_ENTRY actions.
>>>
>>> Thanks,
>>> David
>>>
>>> On 20/12/2013 11:55 AM, David Holmes wrote:
>>>> Still catching up ...
>>>>
>>>> On 11/12/2013 9:46 PM, Lindenmaier, Goetz wrote:
>>>>> Hi,
>>>>>
>>>>> this change adds StoreStore barriers after object initialization and
>>>>> after constructor calls in the C++ interpreter.  This assures no
>>>>> uninitialized
>>>>> objects or final fields are visible.
>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029957-0-moci/
>>>>
>>>> The InterpreterRuntime calls are all IRT_ENTRY points which will utilize
>>>> thread state transitions that already include a full "fence" so the
>>>> storestore barriers are redundant in those cases.
>>>>
>>>> The fastpath _new storestore seems okay.
>>>>
>>>> I don't know how handle_return gets used to know if it is reasonable or
>>>> not.
>>>>
>>>> I was trying, unsuccessfully, to examine the same code in the
>>>> templateInterpreter to see how it handles these cases as it naturally
>>>> has the same object-initialization-safety requirements (though these can
>>>> be handled in a number of different ways other than an unconditional
>>>> storestore barrier at the end of the initialization and construction
>>>> phases.
>>>>
>>>> David
>>>> -----
>>>>
>>>>> Please review and test this change.
>>>>>
>>>>> Best regards,
>>>>>      Goetz.
>>>>>


More information about the hotspot-dev mailing list