RFR (S) 8049304: race between VM_Exit and _sync_FutileWakeups->inc()

Wed Sep 2 14:49:49 UTC 2015

Hi,
I'm a bit confused on one point... Since you now only call 
PerfDataManager::destroy at a safepoint, why do you still have the 
comment about 'the race'  and the sleep?
Tom

On 9/2/2015 7:52 AM, Daniel D. Daugherty wrote:
> David,
>
> Thanks for the very fast re-review!
>
> Enjoy your vacation!
>
> Dan
>
>
> On 9/2/15 2:54 AM, David Holmes wrote:
>> Hi Dan,
>>
>> On 2/09/2015 2:45 PM, Daniel D. Daugherty wrote:
>>> Greetings,
>>>
>>> I've updated the "fix" for this bug based on code review comments
>>> received in round 1. I've dropped most of the changes from round 1
>>> with a couple of exceptions.
>>
>> I have no further comments - it all looks good to me. If others want 
>> the pendulum to swing back a little from this position then ... 
>> nothing that has been suggested is functionally wrong. :)
>>
>> Thanks,
>> David
>> -----
>>
>> PS. When you get back from vacation I'll be gone for a month. That 
>> gives you a large window to push other things through with less 
>> stress ;-)
>>
>>
>>> JDK-8049304 race between VM_Exit and _sync_FutileWakeups->inc()
>>> https://bugs.openjdk.java.net/browse/JDK-8049304
>>>
>>> Webrev URL: 
>>> http://cr.openjdk.java.net/~dcubed/8049304-webrev/2-jdk9-hs-rt/
>>>
>>> The easiest way to re-review is to download the two patch files
>>> (round 0 and round 2) and view them in your favorite file merge tool:
>>>
>>> http://cr.openjdk.java.net/~dcubed/8049304-webrev/0-jdk9-hs-rt/hotspot.patch 
>>>
>>>
>>> http://cr.openjdk.java.net/~dcubed/8049304-webrev/2-jdk9-hs-rt/hotspot.patch 
>>>
>>>
>>>
>>> Testing: Aurora Adhoc RT-SVC nightly batch (in process)
>>>           Aurora Adhoc vm.tmtools batch (in process)
>>>           Kim's repro sequence for JDK-8049304
>>>           Kim's repro sequence for JDK-8129978
>>>           JPRT -testset hotspot
>>>
>>> Changes between round 0 and round 2:
>>>
>>> - clarify a few comments
>>> - init _has_PerfData flag with '0' (instead of false)
>>> - drop unnecessary use OrderAccess::release_store() to set
>>>    _has_PerfData to '1' (we're in a Mutex)
>>> - change perfMemory_exit() to only call PerfDataManager::destroy()
>>>    when called at a safepoint and when the StatSampler is not
>>>    running; this means when the VM is aborting, we no longer have
>>>    a race between the original crash report and this code path.
>>>
>>> Changes between round 1 and round 2:
>>>
>>> - clarify a few comments
>>> - drop is_safe parameter to OM_PERFDATA_OP macro
>>> - init _has_PerfData flag with '0' (instead of false)
>>> - drop OrderAccess::fence() call before os::naked_short_sleep() call
>>> - drop PerfDataManager::has_PerfData_with_acquire()
>>> - drop unnecessary use OrderAccess::release_store() to set
>>>    _has_PerfData to '1' (we're in a Mutex)
>>>
>>> I believe that I've addressed all comments from round 0 and
>>> from round 1.
>>>
>>> Thanks, in advance, for any comments, questions or suggestions.
>>>
>>> Dan
>>>
>>>
>>> On 8/31/15 4:51 PM, Daniel D. Daugherty wrote:
>>>> Greetings,
>>>>
>>>> I've updated the "fix" for this bug based on code review comments
>>>> received in round 0.
>>>>
>>>> JDK-8049304 race between VM_Exit and _sync_FutileWakeups->inc()
>>>> https://bugs.openjdk.java.net/browse/JDK-8049304
>>>>
>>>> Webrev URL:
>>>> http://cr.openjdk.java.net/~dcubed/8049304-webrev/1-jdk9-hs-rt/
>>>>
>>>> The easiest way to re-review is to download the two patch files
>>>> and view them in your favorite file merge tool:
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8049304-webrev/0-jdk9-hs-rt/hotspot.patch 
>>>>
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8049304-webrev/1-jdk9-hs-rt/hotspot.patch 
>>>>
>>>>
>>>>
>>>> Testing: Aurora Adhoc RT-SVC nightly batch (in process)
>>>>          Aurora Adhoc vm.tmtools batch (in process)
>>>>          Kim's repro sequence for JDK-8049304
>>>>          Kim's repro sequence for JDK-8129978
>>>>          JPRT -testset hotspot
>>>>
>>>> Changes between round 0 and round 1:
>>>>
>>>> - add an 'is_safe' parameter to the OM_PERFDATA_OP macro;
>>>>   safepoint-safe callers can access _has_PerfData flag directly;
>>>>   non-safepoint-safe callers use a load-acquire to fetch the
>>>>   current _has_PerfData flag value
>>>> - change PerfDataManager::destroy() to simply set _has_PerfData
>>>>   to zero (field is volatile) and then use a fence() to prevent
>>>>   any reordering of operations in any direction; it's only done
>>>>   once during VM shutdown so...
>>>> - change perfMemory_exit() to only call PerfDataManager::destroy()
>>>>   when called at a safepoint and when the StatSample is not
>>>>   running; this means when the VM is aborting, we no longer have
>>>>   a race between the original crash report and this code path.
>>>>
>>>> I believe that I've addressed all comments from round 0.
>>>>
>>>> Thanks, in advance, for any comments, questions or suggestions.
>>>>
>>>> Dan
>>>>
>>>>
>>>> On 8/25/15 3:08 PM, Daniel D. Daugherty wrote:
>>>>> Greetings,
>>>>>
>>>>> I have a "fix" for a long standing race between JVM shutdown and the
>>>>> JVM statistics subsystem:
>>>>>
>>>>> JDK-8049304 race between VM_Exit and _sync_FutileWakeups->inc()
>>>>> https://bugs.openjdk.java.net/browse/JDK-8049304
>>>>>
>>>>> Webrev URL:
>>>>> http://cr.openjdk.java.net/~dcubed/8049304-webrev/0-jdk9-hs-rt/
>>>>>
>>>>> Testing: Aurora Adhoc RT-SVC nightly batch
>>>>>          Aurora Adhoc vm.tmtools batch
>>>>>          Kim's repro sequence for JDK-8049304
>>>>>          Kim's repro sequence for JDK-8129978
>>>>>          JPRT -testset hotspot
>>>>>
>>>>> This "fix":
>>>>>
>>>>> - adds a volatile flag to record whether PerfDataManager is holding
>>>>>   data (PerfData objects)
>>>>> - adds PerfDataManager::has_PerfData() to return the flag
>>>>> - changes the Java monitor subsystem's use of PerfData to
>>>>>   check both allocation of the monitor subsystem specific
>>>>>   PerfData object and the new PerfDataManager::has_PerfData()
>>>>>   return value
>>>>>
>>>>> If the global 'UsePerfData' option is false, the system works as
>>>>> it did before. If 'UsePerfData' is true (the default on non-embedded
>>>>> systems), the Java monitor subsystem will allocate a number of
>>>>> PerfData objects to record information. The objects will record
>>>>> information about Java monitor subsystem until the JVM shuts down.
>>>>>
>>>>> When the JVM starts to shutdown, the new PerfDataManager flag will
>>>>> change to false and the Java monitor subsystem will stop using the
>>>>> PerfData objects. This is the new behavior. As noted in the comments
>>>>> I added to the code, the race is still present; I'm just changing
>>>>> the order and the timing to reduce the likelihood of the crash.
>>>>>
>>>>> Thanks, in advance, for any comments, questions or suggestions.
>>>>>
>>>>> Dan
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>