RFR (S) 8049304: race between VM_Exit and _sync_FutileWakeups->inc()
Tom Benson
tom.benson at oracle.com
Fri Aug 28 17:57:39 UTC 2015
Hi Dan,
On 8/28/2015 12:27 PM, Daniel D. Daugherty wrote:
> On 8/28/15 8:45 AM, Tom Benson wrote:
>> Hi,
>> One more pair of eyes on this. 8^)
>
> Hi Tom!
>
> Thanks for reviewing and welcome to the party...
>
>
>>
>> On 8/27/2015 8:16 PM, Kim Barrett wrote:
>>> On Aug 27, 2015, at 5:42 PM, Daniel D. Daugherty
>>> <daniel.daugherty at oracle.com> wrote:
>>>> Sorry for starting another e-mail thread fork in an already
>>>> complicated
>>>> review...
>>> OK, that was fascinating. No, really, I mean it.
>>>
>>> It made me realize that we've been arguing and talking past each other
>>> in part because we're really dealing with two distinct though closely
>>> related bugs here.
>>>
>>> I've been primarily thinking about the case where we're calling
>>> vm_abort / os::abort, where we presently delete the PerfData
>>> memory even though there can be arbitrary other threads running. This
>>> was the case in JDK-8129978, which is how I got involved here in the
>>> first place. In that bug we were in vm_exit_during_initialization and
>>> had called perfMemory_exit when some thread attempted to inflate a
>>> monitor (which is not one of the conflicting cases discussed by Dan).
>>>
>>> The problem Dan has been looking at, JDK-8049304, is about a "normal"
>>> VM shutdown. In this case, the problem is that we believe it is safe
>>> to delete the PerfData, because we've safepointed, and yet some thread
>>> unexpectedly runs and attempts to touch the deleted data anyway.
>>>
>>> I think Dan's proposed fix (mostly) avoids the specific instance of
>>> JDK-8129978, but doesn't solve the more general problem of abnormal
>>> exit deleting the PerfData while some running thread is touching some
>>> non-monitor-related part of that data. My proposal to leave it to the
>>> OS to deal with memory cleanup on process exit would deal with this
>>> case.
>>>
>>> I think Dan's proposed fix (mostly) avoids problems like JDK-8049304.
>>> And the approach I've been talking about doesn't help at all for this
>>> case. But I wonder if Dan's proposed fix can be improved. A "futile
>>> wakeup" case doesn't seem to me like one which requires super-high
>>> performance. Would it be ok, in the two problematic cases that Dan
>>> identified, to use some kind of atomic / locking protocol with the
>>> cleanup? Or is the comment for the counter increment in EnterI (and
>>> only there) correct that it's important to avoid a lock or atomics
>>> here (and presumably in ReenterI too)?
>>>
>>
>> I notice that EnterI/ReenterI both end with OrderAccess::fence().
>> Can the potential update of _sync_FutileWakeups be delayed until that
>> point, to take advantage of the fence to make the sync hole even
>> smaller?
>
> Not easily with EnterI() since there is an optional optimization (an
> early exit path) between the OM_PERFDATA_OP(FutileWakeups, inc()) call
> and the OrderAccess::fence() call, so moving the increment after the
> fence would result in lost FutileWakeups increments.
>
> Not easily in ReenterI() either; the OM_PERFDATA_OP(FutileWakeups, inc())
> call is at the bottom of the for-loop and the OrderAccess::fence() call
> at the end of the function is outside the loop, so moving the increment
> there would also result in lost FutileWakeups increments.
Yes, you'd have to keep a local count and then add the total outside
the loop, after the fence, for both EnterI/ReenterI (roughly as in the
sketch below). But I see what you mean about the other exit paths in
EnterI. (The more I look at this code, the more I remember it... BTW,
are those knob_ setting defaults ever going to be moved to a
platform-specific module? That was my beef (well, one of them) in two
different ports. Or is the goal to make monitors so well self-tuning
that they can go away? Sorry for the digression... 8^))
At any rate, as you say, perhaps it's not worth it to leverage the
fences, though it could be done.
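Here's roughly what I mean, as a sketch only (the local tally variable
is something I'm making up for illustration, the loop body is heavily
elided, and I'm assuming the FutileWakeups counter supports inc(n)):

    int futile_wakeups = 0;            // hypothetical local tally, no shared writes
    for (;;) {
      // ... park(), wake up, retry ...
      if (TryLock(Self) > 0) break;    // got the monitor; leave the loop
      futile_wakeups++;                // was: OM_PERFDATA_OP(FutileWakeups, inc())
    }
    // ...
    OrderAccess::fence();              // the fence already at the end of EnterI/ReenterI
    if (futile_wakeups > 0) {
      // publish the whole tally once, after the fence
      OM_PERFDATA_OP(FutileWakeups, inc(futile_wakeups));
    }

The early-exit paths in EnterI() would just need the same "flush the
tally" step before they leave, which is where it starts to get ugly.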
>
> So in ReenterI() the OM_PERFDATA_OP(FutileWakeups, inc()) call
> immediately
> follows an OrderAccess::fence() call. Doesn't that make that increment as
> "safe" as it can be without having a real lock?
>
>
>> You've got a release() (and a short nap!) with the store in
>> PerfDataManager::destroy() to try to close the window somewhat.
>
> Yes, I modeled that after:
>
> src/share/vm/runtime/perfMemory.cpp:
>
> 83 void PerfMemory::initialize() {
> <snip>
> 156 OrderAccess::release_store(&_initialized, 1);
> 157 }
>
>
>> But I think rather than the release_store() you used, you want a
>> store, followed by a release(). release_store() puts a fence before
>> the store to ensure earlier updates are seen before the current one, no?
>
> Yup, and I see I got my reasoning wrong. The code I modeled
> is right because you want to flush all the inits and it's OK
> if the _initialized transition from '0' -> '1' is lazily seen.
>
> For my shutdown use, we are transitioning from '1' -> '0' and
> we need that to be seen proactively so:
>
> OrderAccess::release_store(&_has_PerfData, 0);
> OrderAccess::storeload();
>
> which is modeled after the _owner field transitions from non-zero
> -> NULL in ObjectMonitor.cpp.
>
It's not clear to me why the store needs to be a release_store() in this
case, as long as the storeload() follows it. You're not protecting any
earlier writes, are you?
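To make the question concrete (a sketch, not the webrev; _has_PerfData
and the OrderAccess calls are just as quoted above):

    // As in the proposed fix: release_store() + storeload().
    // The release orders *earlier* stores before the flag update, which
    // is the part I don't think we need here.
    OrderAccess::release_store(&_has_PerfData, 0);
    OrderAccess::storeload();   // flag store visible before our later loads

    // What seems sufficient to me, since no earlier writes are being
    // published to readers of the flag:
    _has_PerfData = 0;          // plain store of the '1' -> '0' transition
    OrderAccess::storeload();

Either way, as I understand the intent, the storeload() is what forces
the cleared flag to become visible before this thread moves on to
freeing the counters.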
>
>>
>> Also, I think the comment above that release_store() could be
>> clarified. It is fine as is if you're familiar with this bug report
>> and discussion, but... I think it should explicitly say there is
>> still a very small window for the lack of true synchronization to
>> cause a failure. And perhaps that the release_store() (or
>> store/release()) is not half of an acquire/release pair.
>
> Here's the existing comment:
>
> 286 // Clear the flag before we free the PerfData counters. Thus begins
> 287 // the race between this thread and another thread that has just
> 288 // queried PerfDataManager::has_PerfData() and gotten back 'true'.
> 289 // The hope is that the other thread will finish its PerfData
> 290 // manipulation before we free the memory. The two alternatives
> 291 // are 1) leak the PerfData memory or 2) do some form of ordered
> 292 // access before every PerfData operation.
>
> I think it pretty clearly states that there is still a race here.
> And I think that option 2 covers that we're not doing completely
> safe ordered access. I'm not sure how to make this comment more
> clear, but if you have specific suggestions...
>
OK, I guess it's subjective.
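For anyone else following the thread, my reading of the guard being
raced against is roughly the following; the macro shape here is my
reconstruction for illustration, not a quote of the webrev:

    // Assumed shape of the guarded counter access -- illustrative only.
    #define OM_PERFDATA_OP(f, op_str)                              \
      do {                                                         \
        if (ObjectMonitor::_sync_##f != NULL &&                    \
            PerfDataManager::has_PerfData()) {                     \
          ObjectMonitor::_sync_##f->op_str;                        \
        }                                                          \
      } while (0)

    // The window: a thread can pass the has_PerfData() check just before
    // PerfDataManager::destroy() clears the flag and frees the counters,
    // and then touch freed memory. The flag store plus the short nap only
    // shrink that window; they don't close it.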
Tom
> Dan
>
>
>>
>> Tom
>