RFR (S) 8049304: race between VM_Exit and _sync_FutileWakeups->inc()

Daniel D. Daugherty daniel.daugherty at oracle.com
Mon Aug 31 16:58:04 UTC 2015


Hi David,

Replies embedded below...


On 8/30/15 7:30 PM, David Holmes wrote:
> Hi Dan,
>
> On 29/08/2015 2:27 AM, Daniel D. Daugherty wrote:
>> On 8/28/15 8:45 AM, Tom Benson wrote:
>>> Hi,
>>> One more pair of eyes on this.  8^)
>>
>> Hi Tom!
>>
>> Thanks for reviewing and welcome to the party...
>>
>>
>>>
>>> On 8/27/2015 8:16 PM, Kim Barrett wrote:
>>>> On Aug 27, 2015, at 5:42 PM, Daniel D. Daugherty
>>>> <daniel.daugherty at oracle.com> wrote:
>>>>> Sorry for starting another e-mail thread fork in an already
>>>>> complicated review...
>>>> OK, that was fascinating.  No, really, I mean it.
>>>>
>>>> It made me realize that we've been arguing and talking past each other
>>>> in part because we're really dealing with two distinct though closely
>>>> related bugs here.
>>>>
>>>> I've been primarily thinking about the case where we're calling
>>>> vm_abort / os::abort, where we presently delete the PerfData
>>>> memory even though there can be arbitrary other threads running.  This
>>>> was the case in JDK-8129978, which is how I got involved here in the
>>>> first place.  In that bug we were in vm_exit_during_initialization and
>>>> had called perfMemory_exit when some thread attempted to inflate a
>>>> monitor (which is not one of the conflicting cases discussed by Dan).
>>>>
>>>> The problem Dan has been looking at, JDK-8049304, is about a "normal"
>>>> VM shutdown.  In this case, the problem is that we believe it is safe
>>>> to delete the PerfData, because we've safepointed, and yet some thread
>>>> unexpectedly runs and attempts to touch the deleted data anyway.
>>>>
>>>> I think Dan's proposed fix (mostly) avoids the specific instance of
>>>> JDK-8129978, but doesn't solve the more general problem of abnormal
>>>> exit deleting the PerfData while some running thread is touching some
>>>> non-monitor-related part of that data.  My proposal to leave it to the
>>>> OS to deal with memory cleanup on process exit would deal with this
>>>> case.
>>>>
>>>> I think Dan's proposed fix (mostly) avoids problems like JDK-8049304.
>>>> And the approach I've been talking about doesn't help at all for this
>>>> case.  But I wonder if Dan's proposed fix can be improved. A "futile
>>>> wakeup" case doesn't seem to me like one which requires super-high
>>>> performance.  Would it be ok, in the two problematic cases that Dan
>>>> identified, to use some kind of atomic / locking protocol with the
>>>> cleanup?  Or is the comment for the counter increment in EnterI (and
>>>> only there) correct that it's important to avoid a lock or atomics
>>>> here (and presumably in ReenterI too)?
>>>>
>>>
>>> I notice that EnterI/ReenterI both end with OrderAccess::fence().
>>> Can the potential update of _sync_FutileWakeups be delayed until
>>> that point, to take advantage of the fence to make the sync hole
>>> even smaller?
>>
>> Not easily with EnterI(): there is an optional optimization between
>> the OM_PERFDATA_OP(FutileWakeups, inc()) call and the
>> OrderAccess::fence() call, so delaying the increment until the fence
>> would result in lost FutileWakeups increments.
>>
>> Not easily in ReenterI() either: the OM_PERFDATA_OP(FutileWakeups,
>> inc()) call is at the bottom of the for-loop and the
>> OrderAccess::fence() call at the end of the function is outside the
>> loop, so that would also result in lost FutileWakeups increments.
>>
>> As it stands, in ReenterI() the OM_PERFDATA_OP(FutileWakeups, inc())
>> call already immediately follows an OrderAccess::fence() call.
>> Doesn't that make that increment as "safe" as it can be without
>> having a real lock?
>>
>>
>>> You've got a release() (and a short nap!) with the store in
>>> PerfDataManager::destroy() to try to close the window somewhat.
>>
>> Yes, I modeled that after:
>>
>> src/share/vm/runtime/perfMemory.cpp:
>>
>>      83  void PerfMemory::initialize() {
>> <snip>
>>     156    OrderAccess::release_store(&_initialized, 1);
>>     157  }
>>
>>
>>> But I think rather than the release_store() you used, you want a
>>> store, followed by a release(). release_store() puts a fence before
>>> the store to ensure earlier updates are seen before the current
>>> one, no?
>>
>> Yup, and I see I got my reasoning wrong. The code I modeled after
>> is right because you want to flush all the inits and it's OK
>> if the _initialized transition from '0' -> '1' is lazily seen.
>>
>> For my shutdown use, we are transitioning from '1' -> '0' and
>> we need that to be seen proactively so:
>
> Nit: OrderAccess "barriers" enforce ordering constraints but don't in 
> general provide any guarantees about visibility - i.e., they are not 
> necessarily "flushes". So while it may be true on some platforms by 
> virtue of the underlying barrier mechanism, in general they don't 
> change when a write becomes visible and so there is nothing 
> "proactive" about them.

My apologies for being sloppy with my wording.

What I'm trying to do is put up a barrier so that once the
deleting thread has changed the flag from '1' -> '0' another
thread trying to read the flag won't see the '1' when it
should see the '0'. In other words, I don't want the other
thread's read of the flag to float above the setting of
the flag to '0'.
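
To illustrate the pattern I'm after in standalone form (expressed with
C++11 std::atomic purely for illustration, since OrderAccess isn't
usable outside the VM; this mirrors the real names but is only a
sketch, not the patch):

     #include <atomic>

     static std::atomic<int> _has_PerfData(1);

     // shutdown side (PerfDataManager::destroy() in the real code)
     void destroy_side() {
         // a seq_cst store behaves like store-then-full-fence: any
         // reader whose load is ordered after this store must see '0'
         _has_PerfData.store(0, std::memory_order_seq_cst);
         // ... free the PerfData memory; a reader whose load was
         // ordered before the store can still see '1', which is the
         // residual race the comment in destroy() describes ...
     }

     // any thread about to touch a PerfData counter
     bool has_PerfData() {
         return _has_PerfData.load(std::memory_order_seq_cst) != 0;
     }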


>
>>      OrderAccess::release_store(&_has_PerfData, 0);
>>      OrderAccess::storeload();
>
> I agree with Tom that the release is unnecessary - though harmless.

I tried to switch the code to:

     OrderAccess::store(&_has_PerfData, 0);
     OrderAccess::storeload();

but that wouldn't compile on Solaris X64:

Error: static OrderAccess::store(volatile int*, int) is not accessible 
from static PerfDataManager::destroy().

But then it occurred to me that I was trying too hard
to use OrderAccess functions...

> The real ordering constraint here is that we preserve:
>
> _has_PerfData = 0;
> <do actual deallocation>

I think I can switch to a straight "_has_PerfData = 0;" here and
not use either OrderAccess::release_store() or OrderAccess::store().

I also reread orderAccess.hpp and convinced myself that an
"OrderAccess::fence()" call is the only way to be sure here.
I'm pretty sure that achieves 'storeload|storestore' barrier
semantics... Yes, I have a headache again... :-)
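
Concretely, the shutdown side of PerfDataManager::destroy() would then
look something like this (a sketch of the plan above, not the tested
patch; the 1 ms nap value is just illustrative):

     _has_PerfData = 0;         // plain store; ordering comes from the fence
     OrderAccess::fence();      // full barrier (storeload|storestore
                                // included), so the store is ordered before
                                // any later loads or stores by this thread
     os::naked_short_sleep(1);  // give a racing thread a chance to finish
                                // its PerfData operation before we free
     // ... then delete the PerfData counters ...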


> for which a storeload|storestore barrier after the write seems most 
> appropriate. Though with the insertion of the sleep after the write 
> there won't be any reordering anyway so explicit barriers seem redundant.

I didn't think that os::naked_short_sleep() would do anything
to keep another thread's read from floating above the
"_has_PerfData = 0;". What did I miss?

My current plan:

- switch from "OrderAccess::release_store(&_has_PerfData, 0);"
   to "_has_PerfData = 0;"
- keep "OrderAccess::fence();"
- retest and send out for another round of review

Dan



>
> Cheers,
> David
> -----
>
>> which is modeled after the _owner field transitions from non-zero
>> -> NULL in ObjectMonitor.cpp.
>>
>>
>>>
>>> Also, I think the comment above that release_store() could be
>>> clarified.  It is fine as is if you're familiar with this bug report
>>> and discussion, but...  I think it should explicitly say there is
>>> still a very small window for the lack of true synchronization to
>>> cause a failure.  And perhaps that the release_store() (or
>>> store/release())  is not half of an acquire/release pair.
>>
>> Here's the existing comment:
>>
>>     286    // Clear the flag before we free the PerfData counters. Thus begins
>>     287    // the race between this thread and another thread that has just
>>     288    // queried PerfDataManager::has_PerfData() and gotten back 'true'.
>>     289    // The hope is that the other thread will finish its PerfData
>>     290    // manipulation before we free the memory. The two alternatives
>>     291    // are 1) leak the PerfData memory or 2) do some form of ordered
>>     292    // access before every PerfData operation.
>>
>> I think it pretty clearly states that there is still a race here.
>> And I think that option 2 covers that we're not doing completely
>> safe ordered access. I'm not sure how to make this comment more
>> clear, but if you have specific suggestions...
>>
>> Dan
>>
>>
>>>
>>> Tom
>>


