RFR (S) 8049304: race between VM_Exit and _sync_FutileWakeups->inc()
Daniel D. Daugherty
daniel.daugherty at oracle.com
Tue Sep 1 14:07:12 UTC 2015
On 8/31/15 11:11 PM, David Holmes wrote:
> Hi Dan,
>
> On 1/09/2015 2:58 AM, Daniel D. Daugherty wrote:
>> Hi David,
>>
>> Replies embedded below...
>
> Likewise ... but with some trimming (I'm getting RSI hitting page-down).
Yup. I use an Apple MagicMouse and I'm getting tired of
the scrolling drag... :-)
>
>> On 8/30/15 7:30 PM, David Holmes wrote:
>>> On 29/08/2015 2:27 AM, Daniel D. Daugherty wrote:
>>>> On 8/28/15 8:45 AM, Tom Benson wrote:
>>>> For my shutdown use, we are transitioning from '1' -> '0' and
>>>> we need that to be seen proactively so:
>>>
>>> Nit: OrderAccess "barriers" enforce ordering constraints but don't in
>>> general provide any guarantees about visibility - ie they are not
>>> necessarily "flushes". So while it may be true on some platforms by
>>> virtue of the underlying barrier mechanism, in general they don't
>>> change when a write becomes visible and so there is nothing
>>> "proactive" about them.
>>
>> My apologies for being sloppy with my wording.
>>
>> What I'm trying to do is put up a barrier so that once the
>> deleting thread has changed the flag from '1' -> '0' another
>> thread trying to read the flag won't see the '1' when it
>> should see the '0'. In order words, I don't want the other
>> thread's read of the flag to float above the setting of
>> the flag to '0'.
>
> That still suggest to me a notion that the barrier somehow forces
> visibility and that after the barrier_store returns all other threads
> are now guaranteed to see the value '1' - which is not the case. All
> the barriers here do is ensure a global observability ordering: if
> another thread sees an action that happened after the barrier, then it
> must see the action that happened before the barrier.
I'm not trying to say that visibility is forced. I was shooting
for something like "happens before" and "happens after", but I
didn't use those terms...
>
> More below ...
>
>>>
>>>> OrderAccess::release_store(&_has_PerfData, 0);
>>>> OrderAccess::storeload();
>>>
>>> I agree with Tom that the release is unnecessary - though harmless.
>>
>> I tried to switch the code to:
>>
>> OrderAccess::store(&_has_PerfData, 0)
>> OrderAccess::storeload();
>>
>> but that wouldn't compile on Solaris X64:
>>
>> Error: static OrderAccess::store(volatile int*, int) is not accessible
>> from static PerfDataManager::destroy().
>
> Strange - either a bug or an issue with including the .inline.hpp file.
Right. Not worth chasing since I don't really need it.
>
>> But then it occurred to me that I was trying too hard
>> to use OrderAccess functions...
>>
>>> The real ordering constraint here is that we preserve:
>>>
>>> _has_PerfData = 0;
>>> <do actual deallocation>
>>
>> I think I can switch to a straight "_has_PerfData = 0;" here and
>> not use either OrderAccess::release_store() or OrderAccess::store().
>
> Yes. Particularly as OrderAccess::store is just a plain store for
> 32-bit types.
>
>> I also reread OrderAccess.hpp again and convinced myself that
>> an "OrderAccess::fence()" call is the only way to be sure here.
>> I'm pretty sure that achieves 'storeload|storestore' barrier
>> semantics... Yes, I have a headache again... :-)
>
> Yes it does, as well as loadLoad and loadStore.
>
>>
>>> for which a storeload|storestore barrier after the write seems most
>>> appropriate. Though with the insertion of the sleep after the write
>>> there won't be any reordering anyway so explicit barriers seem
>>> redundant.
>>
>> I didn't think that os::naked_short_sleep() would do anything
>> to keep another thread's read from floating above the
>> "_has_PerfData = 0;". What did I miss?
>
> This is somewhat of a side-discussion but ...
>
> First your notion of floating reads in other threads is somewhat
> troubling to me. I kind of get what you mean but ...
>
> Now for the sleep. As always with ordering constraints there are two
> sides: the writer and the reader. Here we are dealing with the writer.
> The writer is executing:
>
> _has_PerfData = 0;
> for (int index = 0; index < _all->length(); index++) {
> PerfData* p = _all->at(index);
> delete p;
> }
>
> for the sake of discussion lets assume there's only one PerfData
> object and that deleting it amounts to setting an internal field to
> NULL. So our code becomes:
>
> _has_PerfData = 0;
> data->ref = NULL;
>
> The ordering constraint is that the above two statements happen in the
> order written - so we don't delete the data while the data is claimed
> to be valid. That requires both that the compiler not reorder them,
> nor the hardware. If we did:
>
> _has_PerfData = 0;
> membar storeload|storeStore;
> data->ref = NULL;
>
> then we achieve that goal. Now consider if we instead have:
>
> _has_PerfData = 0;
> os::naked_short_sleep(1);
> data->ref = NULL;
>
> The sleep call, leads to a library call and a syscall and ultimately
> likely a context switch as the current thread gets descheduled for
> 1ms. In terms of machine instructions the distance between the initial
> write of _has_PerfData and the load of 'data' and so the store to data
> is huge. Can those stores be reordered by the hardware? Theoretically
> yes, but in practice I say not at all likely: the separation is too
> large and there are likely additional barriers already within the code
> that will do the sleep. Can the stores be reordered by the compiler?
> Again theoretically yes, but in practice for a compiler to reorder
> across the sleep requires intimate knowledge of everything that
> happens within the sleep - no compiler we currently use would even
> attempt to try and figure that out. So the sleep, while not a
> guarantee, is most likely to act to prevent the undesired reordering.
>
> Is that lack of guarantee an issue? No - because we know we already
> have a race condition. No matter what we do with this section of code,
> in terms of barriers, we can't avoid the race.
All the way to this point, I was expecting you to say leave the
barrier in...
> Hence my view that adding explicit barriers together with the sleep
> serves no practical purpose here. It would be different if a barrier
> forced the write to become visible to all other threads - but that is
> not guaranteed by OrderAccess.
Not the conclusion I was expecting based on the last couple of
paragraphs. :-)
My preference is to leave the barrier in for these reasons:
- communicates intent to any future maintainer; we could also say that
this might communicate my silliness at continuing with a lock free
algorithm when everyone know that I hate them.
- does not make assumptions about os::naked_short_sleep(1); it might
be turned into an in-line function someday that the compiler can
digest and optimize around...
- does no harm; I think that's a paraphrase of what you said in the
round 0 code review :-)
> Now lets return to the reader side of this, somewhere we have code
> that is effectively doing something like:
>
> if (_has_PerfData == 1)
> cur_data = *data->ref;
>
> This is where we see the race: the value of the flag can change the
> instant after we read it and so we can access data->ref after it is
> NULL. Going back to the sleep, the purpose of the sleep is to give
> threads that have seen _has_PerfData==1 a chance to get past the data
> access before the PerfData object is deleted.
In the monitor subsystem, the check is little different:
if (data != NULL && _has_PerfData == 1) {
cur_data = *data->ref
}
so the read of data->ref can't float above the if-statement. It looks
like most (if not all) of the PerfData usage relies on NULL checks of
the PerfData object as the trigger to determine whether to call into
the PerfData object.
> In terms of ordering constraints we don't want the read of data->ref
> to be moved/float ahead of the read of _has_PerfData, so technically
> we need a loadload barrier to prevent that:
>
> if (_has_PerfData == 1) {
> membar loadload;
> cur_data = *data->ref;
> }
>
> From a hardware perspective I don't think a speculative load can cause
> a problem - if we read data->ref early and see NULL then we "must" see
> _has_perfData==0 (due to ordering at the writer) and so the code is
> skipped. And I think similarly for compiler reorderings (though my
> head is hurting now too :)). But in either case it doesn't really
> matter if this reasoning is wrong because even if the code is kept in
> the required order the race still exists and we can't tell exactly how
> things came about. So adding the loadload would achieve little, if
> anything.
Agreed that a loadload would achieve little, if anything here.
>
> Now feel free to ignore all that, go get a coffee and move on to my
> reply to your next webrev. :)
Will do. Needed a coffee for this one also...
Dan
>
> Cheers,
> David
>
>> My current plan:
>>
>> - switch from "OrderAccess::release_store(&_has_PerfData, 0);"
>> to "_has_PerfData = 0;"
>> - keep "OrderAccess::fence();"
>> - retest and send out for another round of review
>>
>> Dan
>>
>>
>>
>>>
>>> Cheers,
>>> David
>>> -----
>>>
>>>> which is modeled after _owner field transitions from non-zeo
>>>> -> NULL in ObjectMonitor.cpp
>>>>
>>>>
>>>>>
>>>>> Also, I think the comment above that release_store() could be
>>>>> clarified. It is fine as is if you're familiar with this bug report
>>>>> and discussion, but... I think it should explicitly say there is
>>>>> still a very small window for the lack of true synchronization to
>>>>> cause a failure. And perhaps that the release_store() (or
>>>>> store/release()) is not half of an acquire/release pair.
>>>>
>>>> Here's the existing comment:
>>>>
>>>> 286 // Clear the flag before we free the PerfData counters.
>>>> Thus
>>>> begins
>>>> 287 // the race between this thread and another thread that
>>>> has just
>>>> 288 // queried PerfDataManager::has_PerfData() and gotten back
>>>> 'true'.
>>>> 289 // The hope is that the other thread will finish its
>>>> PerfData
>>>> 290 // manipulation before we free the memory. The two
>>>> alternatives
>>>> 291 // are 1) leak the PerfData memory or 2) do some form of
>>>> ordered
>>>> 292 // access before every PerfData operation.
>>>>
>>>> I think it pretty clearly states that there is still a race here.
>>>> And I think that option 2 covers that we're not doing completely
>>>> safe ordered access. I'm not sure how to make this comment more
>>>> clear, but if you have specific suggestions...
>>>>
>>>> Dan
>>>>
>>>>
>>>>>
>>>>> Tom
>>>>
>>
More information about the hotspot-runtime-dev
mailing list