RFR (S) 8049304: race between VM_Exit and _sync_FutileWakeups->inc()
David Holmes
david.holmes at oracle.com
Mon Aug 31 01:30:10 UTC 2015
Hi Dan,
On 29/08/2015 2:27 AM, Daniel D. Daugherty wrote:
> On 8/28/15 8:45 AM, Tom Benson wrote:
>> Hi,
>> One more pair of eyes on this. 8^)
>
> Hi Tom!
>
> Thanks for reviewing and welcome to the party...
>
>
>>
>> On 8/27/2015 8:16 PM, Kim Barrett wrote:
>>> On Aug 27, 2015, at 5:42 PM, Daniel D. Daugherty
>>> <daniel.daugherty at oracle.com> wrote:
>>>> Sorry for starting another e-mail thread fork in an already complicated
>>>> review...
>>> OK, that was fascinating. No, really, I mean it.
>>>
>>> It made me realize that we've been arguing and talking past each other
>>> in part because we're really dealing with two distinct though closely
>>> related bugs here.
>>>
>>> I've been primarily thinking about the case where we're calling
>>> vm_abort / os::abort, where we presently delete the PerfData
>>> memory even though there can be arbitrary other threads running. This
>>> was the case in JDK-8129978, which is how I got involved here in the
>>> first place. In that bug we were in vm_exit_during_initialization and
>>> had called perfMemory_exit when some thread attempted to inflate a
>>> monitor (which is not one of the conflicting cases discussed by Dan).
>>>
>>> The problem Dan has been looking at, JDK-8049304, is about a "normal"
>>> VM shutdown. In this case, the problem is that we believe it is safe
>>> to delete the PerfData, because we've safepointed, and yet some thread
>>> unexpectedly runs and attempts to touch the deleted data anyway.
>>>
>>> I think Dan's proposed fix (mostly) avoids the specific instance of
>>> JDK-8129978, but doesn't solve the more general problem of abnormal
>>> exit deleting the PerfData while some running thread is touching some
>>> non-monitor-related part of that data. My proposal to leave it to the
>>> OS to deal with memory cleanup on process exit would deal with this
>>> case.
>>>
>>> I think Dan's proposed fix (mostly) avoids problems like JDK-8049304.
>>> And the approach I've been talking about doesn't help at all for this
>>> case. But I wonder if Dan's proposed fix can be improved. A "futile
>>> wakeup" case doesn't seem to me like one which requires super-high
>>> performance. Would it be ok, in the two problematic cases that Dan
>>> identified, to use some kind of atomic / locking protocol with the
>>> cleanup? Or is the comment for the counter increment in EnterI (and
>>> only there) correct that it's important to avoid a lock or atomics
>>> here (and presumably in ReenterI too)?
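>>>
>>> To make that concrete, one hypothetical shape for such a protocol
>>> might be (all of the names below are invented for illustration,
>>> not proposed code):
>>>
>>>   // Hypothetical in-flight counter, so shutdown can wait out
>>>   // racing updaters.
>>>   static volatile jint _perfdata_users = 0;
>>>
>>>   // updater side (e.g. around a FutileWakeups bump):
>>>   Atomic::inc(&_perfdata_users);
>>>   if (PerfDataManager::has_PerfData()) {
>>>     ObjectMonitor::_sync_FutileWakeups->inc();
>>>   }
>>>   Atomic::dec(&_perfdata_users);
>>>
>>>   // shutdown side, before freeing:
>>>   _has_PerfData = 0;
>>>   OrderAccess::fence();
>>>   while (_perfdata_users != 0) {
>>>     os::naked_short_sleep(1);  // wait for in-flight updaters
>>>   }
>>>   <free the PerfData memory>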
>>>
>>
>> I notice that EnterI/ReenterI both end with OrderAccess::fence(). Can
>> the potential update of _sync_FutileWakeups be delayed until that
>> point, to take advantage of the fence to make the sync hole even smaller?
>
> Not easily with EnterI(): there is an optional optimization between
> the OM_PERFDATA_OP(FutileWakeups, inc()) call and the
> OrderAccess::fence() call, so delaying the increment to the fence
> would result in lost FutileWakeups increments.
>
> Not easily in ReenterI() either: the OM_PERFDATA_OP(FutileWakeups, inc())
> call is at the bottom of the for-loop, while the OrderAccess::fence() call
> at the end of the function is outside the loop. Delaying the increment to
> that fence would also lose FutileWakeups increments.
>
> That said, in ReenterI() the OM_PERFDATA_OP(FutileWakeups, inc()) call
> immediately follows an OrderAccess::fence() call within the loop body.
> Doesn't that make the increment as "safe" as it can be without having
> a real lock?
>
>
>> You've got a release() (and a short nap!) with the store in
>> PerfDataManager::destroy() to try to close the window somewhat.
>
> Yes, I modeled that after:
>
> src/share/vm/runtime/perfMemory.cpp:
>
> 83 void PerfMemory::initialize() {
> <snip>
> 156 OrderAccess::release_store(&_initialized, 1);
> 157 }
>
>
>> But I think rather than the release_store() you used, you want a
>> store, followed by a release(). release_store() puts a fence before
>> the store to ensure earlier updates are seen before the current one, no?
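>>
>> That is, conceptually:
>>
>>   release_store(p, v)  ==  release(); store(p, v);
>>
>> so it orders *earlier* accesses before the store; ordering the store
>> ahead of *later* accesses would need a barrier after it (storeload
>> and/or storestore).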
>
> Yup, and I see I got my reasoning wrong. The code I modeled this on
> is right because you want to flush all the inits, and it's OK
> if the _initialized transition from '0' -> '1' is seen lazily.
>
> For my shutdown use, we are transitioning from '1' -> '0' and
> we need that to be seen proactively so:
Nit: OrderAccess "barriers" enforce ordering constraints but don't in
general provide any guarantees about visibility - i.e. they are not
necessarily "flushes". So while it may be true on some platforms by
virtue of the underlying barrier mechanism, in general they don't change
when a write becomes visible, and so there is nothing "proactive" about
them.
> OrderAccess::release_store(&_has_PerfData, 0);
> OrderAccess::storeload();
I agree with Tom that the release is unnecessary - though harmless. The
real ordering constraint here is that we preserve:
_has_PerfData = 0;
<do actual deallocation>
for which a storeload|storestore barrier after the write seems most
appropriate. Though with the insertion of the sleep after the write
there won't be any reordering anyway, so explicit barriers seem redundant.
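
In OrderAccess terms the minimal sequence I have in mind is something
like this sketch (fence() used as a conservative stand-in, since there
is no combined storeload|storestore primitive):

  _has_PerfData = 0;
  OrderAccess::fence();    // order the clear before the frees
  <do actual deallocation>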
Cheers,
David
-----
> which is modeled after the _owner field transitions from non-zero
> -> NULL in ObjectMonitor.cpp
>
>
>>
>> Also, I think the comment above that release_store() could be
>> clarified. It is fine as is if you're familiar with this bug report
>> and discussion, but... I think it should explicitly say there is
>> still a very small window for the lack of true synchronization to
>> cause a failure. And perhaps that the release_store() (or
>> store/release()) is not half of an acquire/release pair.
>
> Here's the existing comment:
>
> 286 // Clear the flag before we free the PerfData counters. Thus begins
> 287 // the race between this thread and another thread that has just
> 288 // queried PerfDataManager::has_PerfData() and gotten back 'true'.
> 289 // The hope is that the other thread will finish its PerfData
> 290 // manipulation before we free the memory. The two alternatives
> 291 // are 1) leak the PerfData memory or 2) do some form of ordered
> 292 // access before every PerfData operation.
>
> I think it pretty clearly states that there is still a race here.
> And I think option 2 acknowledges that we're not doing completely
> safe ordered access. I'm not sure how to make this comment more
> clear, but if you have specific suggestions...
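>
> For reference, the reader side is guarded by the OM_PERFDATA_OP macro,
> which expands to roughly:
>
>   #define OM_PERFDATA_OP(f, op_str)                 \
>     do {                                            \
>       if (ObjectMonitor::_sync_ ## f != NULL &&     \
>           PerfDataManager::has_PerfData()) {        \
>         ObjectMonitor::_sync_ ## f->op_str;         \
>       }                                             \
>     } while (0)
>
> so every monitor PerfData touch is conditional on has_PerfData(), and
> the remaining race is just the window between that check and the
> actual increment.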
>
> Dan
>
>
>>
>> Tom
>