RFR(m): 8220351: Cross-modifying code

Wed Mar 27 14:04:53 UTC 2019

Hi Robbin,

On 2019-03-27 13:02, Robbin Ehn wrote:
> Hi Erik,
> 
> On 3/27/19 12:04 PM, Erik Österlund wrote:
>> Hi Robbin,
>>
>> In runtime/safepointMechanism.cpp:
>> 110     OrderAccess::storeload();
>>
>> This should be OrderAccess::fence() as it must also act as a 
>> loadload() between the local poll and the global poll; if they 
>> reorder, we could miss a safepoint request that we disarmed, and 
>> forget to arm again, causing potentially causing a deadlock in the 
>> whole system. The only thing interleaving the two now is a storeload 
>> and a release (comprising loadstore and storestore), in other words no 
>> loadload. Naturally, all known storeload fences are also loadload 
>> fences, so it will work anyway, but it is nevertheless good for the 
>> semantics to be explicit about it.
> 
> A question, in this case we have:
> A: load poll
> loadstore|storestore (release)
> B: store poll
> storeload
> C: load state
> 
> So B is ordered after A and C is ordered after B.
> Which I thought implied that C also was ordered after A ?
> 
> That is the reason for having a storeload and not a fence.

We have not defined transitivity between chains of 
{load|store}{load|store} style fenced access pairs. Our current spec 
only defines access class pairs, and how they interact.

In fact, in general, if you have N threads running sequences of loads 
and stores, and interleaved all access pairs with the corresponding 
{load|store}{load|store} fence for the immediately neighbouring 
accesses, you would not end up with a total ordering of all accesses in 
the system (in which you could reason about transitive happens-before 
ordering properties).

However, in practice, given our current bindings, your code here *will* 
be correctly ordered on all of our platforms, as your loads only race 
with stores of one other thread, and storeload implies loadload in our 
bindings. But it's not really defined in our memory model.

I noticed though that the local poll seemingly unconditionally has 
acquire semantics nowadays. A bit of a mismatch I suppose given that the 
VM-thread store-pair it races with is interleaved by storestore(). But 
anyway, that acquire should imply the loadload you need and I thought 
was missing. So I guess we are good. Perhaps the mixing of 
{load|store}{load|store}-style fences racing with acquire/release style 
fences ought to be cleaned up a bit. Some other day maybe. ;)

Looks good.

Thanks,
/Erik

>>
>> Otherwise, it looks good. Don't need another webrev. Thanks for 
>> sorting this out.
> 
> Thanks!
> 
> /Robbin
> 
>>
>> Thanks,
>> /Erik
>>
>> On 2019-03-26 13:51, Robbin Ehn wrote:
>>> Hi all, please review.
>>>
>>> This is v4, which is just an rebase on-top of "RFR: 8221207: Redo 
>>> JDK-8218446 -
>>> SuspendAtExit hangs". The commit simplified S/R code a bit, making 
>>> this patch
>>> simpler.
>>>
>>> Full:
>>> http://cr.openjdk.java.net/~rehn/8220351/v4/webrev/
>>> Inc (not complete since v3 did not apply proper, one of the reject 
>>> wasn't
>>> needed):
>>> http://cr.openjdk.java.net/~rehn/8220351/v4/inc/webrev/
>>>
>>> To summarize, before 8221207 we had 4 safe states:
>>> *Off threads_list (_thread_new)
>>> *Blocked          (_thread_blocked)
>>> *In native        (_thread_in_native)
>>> *Suspend          (*ANY*)
>>> *After handshake  (X)
>>>
>>> After 8221207 we have:
>>> *Off threads_list (_thread_new) (far from cross-mod code)
>>> *Blocked          (_thread_blocked)
>>> *In native        (_thread_in_native)
>>> *After handshake  (X)
>>>
>>> -We have one instruction barrier for when going from _thread_new.
>>> -We have four instruction barrier for when going from blocked:
>>>      - ThreadBlockInVM
>>>      - ThreadBlockInVMWithDeadlockCheck
>>>      - The manual blocked transition for suspended blocked case.
>>>      - Blocked in SS:block() uses the barrier in 
>>> block_if_requested_slow.
>>> -We let threads doing their own handshake hit the barrier in
>>> block_if_requested_slow.
>>> -We let threads in native stay armed so they hit the barrier in
>>> block_if_requested_slow.
>>>
>>> Not disarming threads in native works because, false positives 
>>> already happens
>>> today and are handled. Both handshakes and safepoints disarms in 
>>> correct order,
>>> so a JavaThread can notice that is should be armed, but with false 
>>> positives.
>>> Before first check of handshake/safepoint the JavaThread is in a 
>>> unsafe state.
>>> After that check a new handshake/safepoint cannot start/progress 
>>> since the
>>> JavaThread is unsafe. Thus re-arming and continue to next polling 
>>> site is okay.
>>>
>>> Patch only adds barrier for cross-modifying code that happens during a
>>> handshake/safepoint and implements this barrier for those platforms 
>>> which have a
>>> simple implementaion for accomplish it. sparc/arm32/aarch64 needs 
>>> additional
>>> work.
>>>
>>> The deprecated global poll path does not contain proper barriers, and 
>>> will be
>>> obsoleted in JDK 14.
>>>
>>> Thanks, Robbin
>>>
>>> On 2019-03-08 16:24, Robbin Ehn wrote:
>>>> Hi all, please review.
>>>>
>>>> Issue:
>>>> https://bugs.openjdk.java.net/browse/JDK-8220351
>>>> Changeset:
>>>> http://cr.openjdk.java.net/~rehn/8220351/webrev/
>>>>
>>>> After a JavaThread have been in a safepoint/(handshake) safe state 
>>>> it can start
>>>> executing updated code. E.g. an oop in the instruction stream can 
>>>> have been
>>>> updated.
>>>>
>>>> Most hardware's require a barrier or that the code cross modified is 
>>>> far away to
>>>> guarantee that the thread executing the updated instruction stream 
>>>> sees the
>>>> modification.
>>>>
>>>> What far away is and how far an update instruction stream is from a 
>>>> safepoint
>>>> safe state is not clear.
>>>>
>>>> To be compliant with those architectures an instruction stream 
>>>> barrier must be
>>>> added when leaving the safepoint safe state.
>>>>
>>>> There may be crashes today due to this missing barrier.
>>>> A new CPU with deeper pipeline or changes to the VM which moves a 
>>>> safepoint safe
>>>> state closer to a nmethod can increase changes of a crash.
>>>>
>>>> A few benchmarks will see a performance regression with the extra 
>>>> serializing,
>>>> such as Octane-Crypto -5% (x86).
>>>>
>>>> With a much more elaborate fix, such as changing the local poll to 
>>>> have more
>>>> than two states (armed/disarmed), it would be possible to only emit 
>>>> such
>>>> instruction stream barrier when the poll have been armed while the 
>>>> thread was
>>>> safepoint safe.
>>>>
>>>> Passes t1-3 on our platforms, and test built what I can.
>>>>
>>>> Thanks, Robbin