RFR (S) 8049304: race between VM_Exit and _sync_FutileWakeups->inc()

Wed Sep 2 15:15:35 UTC 2015

On 9/2/15 8:49 AM, Tom Benson wrote:
> Hi,
> I'm a bit confused on one point... Since you now only call 
> PerfDataManager::destroy at a safepoint, why do you still have the 
> comment about 'the race'  and the sleep?

Because the two "futile wakeup" counter updates in the monitor
subsystem can execute in parallel with a safepoint. The JavaThread
state is "blocked" so the safepoint subsystem will see the JavaThread
as "at a safepoint" when it is actually executing the code to
increment the counter.

That's what the "is_safe" parameter to the OM_PERFDATA_OP macro was
all about in the round 1 code review. However, David convinced me
that all that logic didn't guarantee we wouldn't hit the race so
I ripped it all out in the round 2 code review (this one).

Does this help your confusion?

Dan

> Tom
>
> On 9/2/2015 7:52 AM, Daniel D. Daugherty wrote:
>> David,
>>
>> Thanks for the very fast re-review!
>>
>> Enjoy your vacation!
>>
>> Dan
>>
>>
>> On 9/2/15 2:54 AM, David Holmes wrote:
>>> Hi Dan,
>>>
>>> On 2/09/2015 2:45 PM, Daniel D. Daugherty wrote:
>>>> Greetings,
>>>>
>>>> I've updated the "fix" for this bug based on code review comments
>>>> received in round 1. I've dropped most of the changes from round 1
>>>> with a couple of exceptions.
>>>
>>> I have no further comments - it all looks good to me. If others want 
>>> the pendulum to swing back a little from this position then ... 
>>> nothing that has been suggested is functionally wrong. :)
>>>
>>> Thanks,
>>> David
>>> -----
>>>
>>> PS. When you get back from vacation I'll be gone for a month. That 
>>> gives you a large window to push other things through with less 
>>> stress ;-)
>>>
>>>
>>>> JDK-8049304 race between VM_Exit and _sync_FutileWakeups->inc()
>>>> https://bugs.openjdk.java.net/browse/JDK-8049304
>>>>
>>>> Webrev URL: 
>>>> http://cr.openjdk.java.net/~dcubed/8049304-webrev/2-jdk9-hs-rt/
>>>>
>>>> The easiest way to re-review is to download the two patch files
>>>> (round 0 and round 2) and view them in your favorite file merge tool:
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8049304-webrev/0-jdk9-hs-rt/hotspot.patch 
>>>>
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8049304-webrev/2-jdk9-hs-rt/hotspot.patch 
>>>>
>>>>
>>>>
>>>> Testing: Aurora Adhoc RT-SVC nightly batch (in process)
>>>>           Aurora Adhoc vm.tmtools batch (in process)
>>>>           Kim's repro sequence for JDK-8049304
>>>>           Kim's repro sequence for JDK-8129978
>>>>           JPRT -testset hotspot
>>>>
>>>> Changes between round 0 and round 2:
>>>>
>>>> - clarify a few comments
>>>> - init _has_PerfData flag with '0' (instead of false)
>>>> - drop unnecessary use OrderAccess::release_store() to set
>>>>    _has_PerfData to '1' (we're in a Mutex)
>>>> - change perfMemory_exit() to only call PerfDataManager::destroy()
>>>>    when called at a safepoint and when the StatSampler is not
>>>>    running; this means when the VM is aborting, we no longer have
>>>>    a race between the original crash report and this code path.
>>>>
>>>> Changes between round 1 and round 2:
>>>>
>>>> - clarify a few comments
>>>> - drop is_safe parameter to OM_PERFDATA_OP macro
>>>> - init _has_PerfData flag with '0' (instead of false)
>>>> - drop OrderAccess::fence() call before os::naked_short_sleep() call
>>>> - drop PerfDataManager::has_PerfData_with_acquire()
>>>> - drop unnecessary use OrderAccess::release_store() to set
>>>>    _has_PerfData to '1' (we're in a Mutex)
>>>>
>>>> I believe that I've addressed all comments from round 0 and
>>>> from round 1.
>>>>
>>>> Thanks, in advance, for any comments, questions or suggestions.
>>>>
>>>> Dan
>>>>
>>>>
>>>> On 8/31/15 4:51 PM, Daniel D. Daugherty wrote:
>>>>> Greetings,
>>>>>
>>>>> I've updated the "fix" for this bug based on code review comments
>>>>> received in round 0.
>>>>>
>>>>> JDK-8049304 race between VM_Exit and _sync_FutileWakeups->inc()
>>>>> https://bugs.openjdk.java.net/browse/JDK-8049304
>>>>>
>>>>> Webrev URL:
>>>>> http://cr.openjdk.java.net/~dcubed/8049304-webrev/1-jdk9-hs-rt/
>>>>>
>>>>> The easiest way to re-review is to download the two patch files
>>>>> and view them in your favorite file merge tool:
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8049304-webrev/0-jdk9-hs-rt/hotspot.patch 
>>>>>
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8049304-webrev/1-jdk9-hs-rt/hotspot.patch 
>>>>>
>>>>>
>>>>>
>>>>> Testing: Aurora Adhoc RT-SVC nightly batch (in process)
>>>>>          Aurora Adhoc vm.tmtools batch (in process)
>>>>>          Kim's repro sequence for JDK-8049304
>>>>>          Kim's repro sequence for JDK-8129978
>>>>>          JPRT -testset hotspot
>>>>>
>>>>> Changes between round 0 and round 1:
>>>>>
>>>>> - add an 'is_safe' parameter to the OM_PERFDATA_OP macro;
>>>>>   safepoint-safe callers can access _has_PerfData flag directly;
>>>>>   non-safepoint-safe callers use a load-acquire to fetch the
>>>>>   current _has_PerfData flag value
>>>>> - change PerfDataManager::destroy() to simply set _has_PerfData
>>>>>   to zero (field is volatile) and then use a fence() to prevent
>>>>>   any reordering of operations in any direction; it's only done
>>>>>   once during VM shutdown so...
>>>>> - change perfMemory_exit() to only call PerfDataManager::destroy()
>>>>>   when called at a safepoint and when the StatSample is not
>>>>>   running; this means when the VM is aborting, we no longer have
>>>>>   a race between the original crash report and this code path.
>>>>>
>>>>> I believe that I've addressed all comments from round 0.
>>>>>
>>>>> Thanks, in advance, for any comments, questions or suggestions.
>>>>>
>>>>> Dan
>>>>>
>>>>>
>>>>> On 8/25/15 3:08 PM, Daniel D. Daugherty wrote:
>>>>>> Greetings,
>>>>>>
>>>>>> I have a "fix" for a long standing race between JVM shutdown and the
>>>>>> JVM statistics subsystem:
>>>>>>
>>>>>> JDK-8049304 race between VM_Exit and _sync_FutileWakeups->inc()
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8049304
>>>>>>
>>>>>> Webrev URL:
>>>>>> http://cr.openjdk.java.net/~dcubed/8049304-webrev/0-jdk9-hs-rt/
>>>>>>
>>>>>> Testing: Aurora Adhoc RT-SVC nightly batch
>>>>>>          Aurora Adhoc vm.tmtools batch
>>>>>>          Kim's repro sequence for JDK-8049304
>>>>>>          Kim's repro sequence for JDK-8129978
>>>>>>          JPRT -testset hotspot
>>>>>>
>>>>>> This "fix":
>>>>>>
>>>>>> - adds a volatile flag to record whether PerfDataManager is holding
>>>>>>   data (PerfData objects)
>>>>>> - adds PerfDataManager::has_PerfData() to return the flag
>>>>>> - changes the Java monitor subsystem's use of PerfData to
>>>>>>   check both allocation of the monitor subsystem specific
>>>>>>   PerfData object and the new PerfDataManager::has_PerfData()
>>>>>>   return value
>>>>>>
>>>>>> If the global 'UsePerfData' option is false, the system works as
>>>>>> it did before. If 'UsePerfData' is true (the default on non-embedded
>>>>>> systems), the Java monitor subsystem will allocate a number of
>>>>>> PerfData objects to record information. The objects will record
>>>>>> information about Java monitor subsystem until the JVM shuts down.
>>>>>>
>>>>>> When the JVM starts to shutdown, the new PerfDataManager flag will
>>>>>> change to false and the Java monitor subsystem will stop using the
>>>>>> PerfData objects. This is the new behavior. As noted in the comments
>>>>>> I added to the code, the race is still present; I'm just changing
>>>>>> the order and the timing to reduce the likelihood of the crash.
>>>>>>
>>>>>> Thanks, in advance, for any comments, questions or suggestions.
>>>>>>
>>>>>> Dan
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>
>