RFR (S) 8049304: race between VM_Exit and _sync_FutileWakeups->inc()

Wed Sep 2 15:40:09 UTC 2015

Hi Dan,
OK.  I didn't review what was added in round 1 once you said it was all 
removed for round 2.  8^)
It would be great if what you have in your first paragraph below was 
added to the comments.   I think the existing comment in perfMemory_exit 
implies we're safe to remove the PerfData objects without fear of them 
being in use because we're at a safepoint. Perhaps better to have it 
(the new comment) in PerfDataManager::destroy(),  because it seems so 
weird to have a sleep in the VM thread during a safepoint, even in a 
shutdown path.

Any interest in asserting that you're at a safepoint in 
PerfDataManager::destroy?   Just a thought.
Tom

On 9/2/2015 11:15 AM, Daniel D. Daugherty wrote:
> On 9/2/15 8:49 AM, Tom Benson wrote:
>> Hi,
>> I'm a bit confused on one point... Since you now only call 
>> PerfDataManager::destroy at a safepoint, why do you still have the 
>> comment about 'the race'  and the sleep?
>
> Because the two "futile wakeup" counter updates in the monitor
> subsystem can execute in parallel with a safepoint. The JavaThread
> state is "blocked" so the safepoint subsystem will see the JavaThread
> as "at a safepoint" when it is actually executing the code to
> increment the counter.
>
> That's what the "is_safe" parameter to the OM_PERFDATA_OP macro was
> all about in the round 1 code review. However, David convinced me
> that all that logic didn't guarantee we wouldn't hit the race so
> I ripped it all out in the round 2 code review (this one).
>
> Does this help your confusion?
>
> Dan
>
>
>> Tom
>>
>> On 9/2/2015 7:52 AM, Daniel D. Daugherty wrote:
>>> David,
>>>
>>> Thanks for the very fast re-review!
>>>
>>> Enjoy your vacation!
>>>
>>> Dan
>>>
>>>
>>> On 9/2/15 2:54 AM, David Holmes wrote:
>>>> Hi Dan,
>>>>
>>>> On 2/09/2015 2:45 PM, Daniel D. Daugherty wrote:
>>>>> Greetings,
>>>>>
>>>>> I've updated the "fix" for this bug based on code review comments
>>>>> received in round 1. I've dropped most of the changes from round 1
>>>>> with a couple of exceptions.
>>>>
>>>> I have no further comments - it all looks good to me. If others 
>>>> want the pendulum to swing back a little from this position then 
>>>> ... nothing that has been suggested is functionally wrong. :)
>>>>
>>>> Thanks,
>>>> David
>>>> -----
>>>>
>>>> PS. When you get back from vacation I'll be gone for a month. That 
>>>> gives you a large window to push other things through with less 
>>>> stress ;-)
>>>>
>>>>
>>>>> JDK-8049304 race between VM_Exit and _sync_FutileWakeups->inc()
>>>>> https://bugs.openjdk.java.net/browse/JDK-8049304
>>>>>
>>>>> Webrev URL: 
>>>>> http://cr.openjdk.java.net/~dcubed/8049304-webrev/2-jdk9-hs-rt/
>>>>>
>>>>> The easiest way to re-review is to download the two patch files
>>>>> (round 0 and round 2) and view them in your favorite file merge tool:
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8049304-webrev/0-jdk9-hs-rt/hotspot.patch 
>>>>>
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8049304-webrev/2-jdk9-hs-rt/hotspot.patch 
>>>>>
>>>>>
>>>>>
>>>>> Testing: Aurora Adhoc RT-SVC nightly batch (in process)
>>>>>           Aurora Adhoc vm.tmtools batch (in process)
>>>>>           Kim's repro sequence for JDK-8049304
>>>>>           Kim's repro sequence for JDK-8129978
>>>>>           JPRT -testset hotspot
>>>>>
>>>>> Changes between round 0 and round 2:
>>>>>
>>>>> - clarify a few comments
>>>>> - init _has_PerfData flag with '0' (instead of false)
>>>>> - drop unnecessary use OrderAccess::release_store() to set
>>>>>    _has_PerfData to '1' (we're in a Mutex)
>>>>> - change perfMemory_exit() to only call PerfDataManager::destroy()
>>>>>    when called at a safepoint and when the StatSampler is not
>>>>>    running; this means when the VM is aborting, we no longer have
>>>>>    a race between the original crash report and this code path.
>>>>>
>>>>> Changes between round 1 and round 2:
>>>>>
>>>>> - clarify a few comments
>>>>> - drop is_safe parameter to OM_PERFDATA_OP macro
>>>>> - init _has_PerfData flag with '0' (instead of false)
>>>>> - drop OrderAccess::fence() call before os::naked_short_sleep() call
>>>>> - drop PerfDataManager::has_PerfData_with_acquire()
>>>>> - drop unnecessary use OrderAccess::release_store() to set
>>>>>    _has_PerfData to '1' (we're in a Mutex)
>>>>>
>>>>> I believe that I've addressed all comments from round 0 and
>>>>> from round 1.
>>>>>
>>>>> Thanks, in advance, for any comments, questions or suggestions.
>>>>>
>>>>> Dan
>>>>>
>>>>>
>>>>> On 8/31/15 4:51 PM, Daniel D. Daugherty wrote:
>>>>>> Greetings,
>>>>>>
>>>>>> I've updated the "fix" for this bug based on code review comments
>>>>>> received in round 0.
>>>>>>
>>>>>> JDK-8049304 race between VM_Exit and _sync_FutileWakeups->inc()
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8049304
>>>>>>
>>>>>> Webrev URL:
>>>>>> http://cr.openjdk.java.net/~dcubed/8049304-webrev/1-jdk9-hs-rt/
>>>>>>
>>>>>> The easiest way to re-review is to download the two patch files
>>>>>> and view them in your favorite file merge tool:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8049304-webrev/0-jdk9-hs-rt/hotspot.patch 
>>>>>>
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8049304-webrev/1-jdk9-hs-rt/hotspot.patch 
>>>>>>
>>>>>>
>>>>>>
>>>>>> Testing: Aurora Adhoc RT-SVC nightly batch (in process)
>>>>>>          Aurora Adhoc vm.tmtools batch (in process)
>>>>>>          Kim's repro sequence for JDK-8049304
>>>>>>          Kim's repro sequence for JDK-8129978
>>>>>>          JPRT -testset hotspot
>>>>>>
>>>>>> Changes between round 0 and round 1:
>>>>>>
>>>>>> - add an 'is_safe' parameter to the OM_PERFDATA_OP macro;
>>>>>>   safepoint-safe callers can access _has_PerfData flag directly;
>>>>>>   non-safepoint-safe callers use a load-acquire to fetch the
>>>>>>   current _has_PerfData flag value
>>>>>> - change PerfDataManager::destroy() to simply set _has_PerfData
>>>>>>   to zero (field is volatile) and then use a fence() to prevent
>>>>>>   any reordering of operations in any direction; it's only done
>>>>>>   once during VM shutdown so...
>>>>>> - change perfMemory_exit() to only call PerfDataManager::destroy()
>>>>>>   when called at a safepoint and when the StatSample is not
>>>>>>   running; this means when the VM is aborting, we no longer have
>>>>>>   a race between the original crash report and this code path.
>>>>>>
>>>>>> I believe that I've addressed all comments from round 0.
>>>>>>
>>>>>> Thanks, in advance, for any comments, questions or suggestions.
>>>>>>
>>>>>> Dan
>>>>>>
>>>>>>
>>>>>> On 8/25/15 3:08 PM, Daniel D. Daugherty wrote:
>>>>>>> Greetings,
>>>>>>>
>>>>>>> I have a "fix" for a long standing race between JVM shutdown and 
>>>>>>> the
>>>>>>> JVM statistics subsystem:
>>>>>>>
>>>>>>> JDK-8049304 race between VM_Exit and _sync_FutileWakeups->inc()
>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8049304
>>>>>>>
>>>>>>> Webrev URL:
>>>>>>> http://cr.openjdk.java.net/~dcubed/8049304-webrev/0-jdk9-hs-rt/
>>>>>>>
>>>>>>> Testing: Aurora Adhoc RT-SVC nightly batch
>>>>>>>          Aurora Adhoc vm.tmtools batch
>>>>>>>          Kim's repro sequence for JDK-8049304
>>>>>>>          Kim's repro sequence for JDK-8129978
>>>>>>>          JPRT -testset hotspot
>>>>>>>
>>>>>>> This "fix":
>>>>>>>
>>>>>>> - adds a volatile flag to record whether PerfDataManager is holding
>>>>>>>   data (PerfData objects)
>>>>>>> - adds PerfDataManager::has_PerfData() to return the flag
>>>>>>> - changes the Java monitor subsystem's use of PerfData to
>>>>>>>   check both allocation of the monitor subsystem specific
>>>>>>>   PerfData object and the new PerfDataManager::has_PerfData()
>>>>>>>   return value
>>>>>>>
>>>>>>> If the global 'UsePerfData' option is false, the system works as
>>>>>>> it did before. If 'UsePerfData' is true (the default on 
>>>>>>> non-embedded
>>>>>>> systems), the Java monitor subsystem will allocate a number of
>>>>>>> PerfData objects to record information. The objects will record
>>>>>>> information about Java monitor subsystem until the JVM shuts down.
>>>>>>>
>>>>>>> When the JVM starts to shutdown, the new PerfDataManager flag will
>>>>>>> change to false and the Java monitor subsystem will stop using the
>>>>>>> PerfData objects. This is the new behavior. As noted in the 
>>>>>>> comments
>>>>>>> I added to the code, the race is still present; I'm just changing
>>>>>>> the order and the timing to reduce the likelihood of the crash.
>>>>>>>
>>>>>>> Thanks, in advance, for any comments, questions or suggestions.
>>>>>>>
>>>>>>> Dan
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>
>>
>