RFR (S) 8049304: race between VM_Exit and _sync_FutileWakeups->inc()
Daniel D. Daugherty
daniel.daugherty at oracle.com
Wed Sep 2 16:03:27 UTC 2015
On 9/2/15 9:40 AM, Tom Benson wrote:
> Hi Dan,
> OK. I didn't review what was added in round 1 once you said it was
> all removed for round 2. 8^)
Not "all", but I did remove "most" of the round 1 changes :-)
The changes I kept are called in the list below.
> It would be great if what you have in your first paragraph below was
> added to the comments. I think the existing comment in
> perfMemory_exit implies we're safe to remove the PerfData objects
> without fear of them being in use because we're at a safepoint.
I think I'll add this sentence to the comment in perfMemory_exit():
// In rare cases, PerfData objects are used in parallel with a
// safepoint. See the work around in PerfDataManager::destroy().
> Perhaps better to have it (the new comment) in
> PerfDataManager::destroy(), because it seems so weird to have a sleep
> in the VM thread during a safepoint, even in a shutdown path.
I think the PerfDataManager::destroy() comment is clear about
the race we're trying avoid. Again, if you have specific wording
changes to suggest to make it more clear... I'll take them. :-)
I think I'll also add this comment:
// This PerfData object can be used in a parallel with a safepoint.
// See the work around in PerfDataManager::destroy().
above these lines in src/share/vm/runtime/objectMonitor.cpp:
575 OM_PERFDATA_OP(FutileWakeups, inc());
747 OM_PERFDATA_OP(FutileWakeups, inc());
> Any interest in asserting that you're at a safepoint in
> PerfDataManager::destroy? Just a thought.
I'd rather not add an assert() at this time.
Are you good with the above comment additions? Do you need to
see another webrev when I make those changes?
Dan
> Tom
>
> On 9/2/2015 11:15 AM, Daniel D. Daugherty wrote:
>> On 9/2/15 8:49 AM, Tom Benson wrote:
>>> Hi,
>>> I'm a bit confused on one point... Since you now only call
>>> PerfDataManager::destroy at a safepoint, why do you still have the
>>> comment about 'the race' and the sleep?
>>
>> Because the two "futile wakeup" counter updates in the monitor
>> subsystem can execute in parallel with a safepoint. The JavaThread
>> state is "blocked" so the safepoint subsystem will see the JavaThread
>> as "at a safepoint" when it is actually executing the code to
>> increment the counter.
>>
>> That's what the "is_safe" parameter to the OM_PERFDATA_OP macro was
>> all about in the round 1 code review. However, David convinced me
>> that all that logic didn't guarantee we wouldn't hit the race so
>> I ripped it all out in the round 2 code review (this one).
>>
>> Does this help your confusion?
>>
>> Dan
>>
>>
>>> Tom
>>>
>>> On 9/2/2015 7:52 AM, Daniel D. Daugherty wrote:
>>>> David,
>>>>
>>>> Thanks for the very fast re-review!
>>>>
>>>> Enjoy your vacation!
>>>>
>>>> Dan
>>>>
>>>>
>>>> On 9/2/15 2:54 AM, David Holmes wrote:
>>>>> Hi Dan,
>>>>>
>>>>> On 2/09/2015 2:45 PM, Daniel D. Daugherty wrote:
>>>>>> Greetings,
>>>>>>
>>>>>> I've updated the "fix" for this bug based on code review comments
>>>>>> received in round 1. I've dropped most of the changes from round 1
>>>>>> with a couple of exceptions.
>>>>>
>>>>> I have no further comments - it all looks good to me. If others
>>>>> want the pendulum to swing back a little from this position then
>>>>> ... nothing that has been suggested is functionally wrong. :)
>>>>>
>>>>> Thanks,
>>>>> David
>>>>> -----
>>>>>
>>>>> PS. When you get back from vacation I'll be gone for a month. That
>>>>> gives you a large window to push other things through with less
>>>>> stress ;-)
>>>>>
>>>>>
>>>>>> JDK-8049304 race between VM_Exit and _sync_FutileWakeups->inc()
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8049304
>>>>>>
>>>>>> Webrev URL:
>>>>>> http://cr.openjdk.java.net/~dcubed/8049304-webrev/2-jdk9-hs-rt/
>>>>>>
>>>>>> The easiest way to re-review is to download the two patch files
>>>>>> (round 0 and round 2) and view them in your favorite file merge
>>>>>> tool:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8049304-webrev/0-jdk9-hs-rt/hotspot.patch
>>>>>>
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8049304-webrev/2-jdk9-hs-rt/hotspot.patch
>>>>>>
>>>>>>
>>>>>>
>>>>>> Testing: Aurora Adhoc RT-SVC nightly batch (in process)
>>>>>> Aurora Adhoc vm.tmtools batch (in process)
>>>>>> Kim's repro sequence for JDK-8049304
>>>>>> Kim's repro sequence for JDK-8129978
>>>>>> JPRT -testset hotspot
>>>>>>
>>>>>> Changes between round 0 and round 2:
>>>>>>
>>>>>> - clarify a few comments
>>>>>> - init _has_PerfData flag with '0' (instead of false)
>>>>>> - drop unnecessary use OrderAccess::release_store() to set
>>>>>> _has_PerfData to '1' (we're in a Mutex)
>>>>>> - change perfMemory_exit() to only call PerfDataManager::destroy()
>>>>>> when called at a safepoint and when the StatSampler is not
>>>>>> running; this means when the VM is aborting, we no longer have
>>>>>> a race between the original crash report and this code path.
>>>>>>
>>>>>> Changes between round 1 and round 2:
>>>>>>
>>>>>> - clarify a few comments
>>>>>> - drop is_safe parameter to OM_PERFDATA_OP macro
>>>>>> - init _has_PerfData flag with '0' (instead of false)
>>>>>> - drop OrderAccess::fence() call before os::naked_short_sleep() call
>>>>>> - drop PerfDataManager::has_PerfData_with_acquire()
>>>>>> - drop unnecessary use OrderAccess::release_store() to set
>>>>>> _has_PerfData to '1' (we're in a Mutex)
>>>>>>
>>>>>> I believe that I've addressed all comments from round 0 and
>>>>>> from round 1.
>>>>>>
>>>>>> Thanks, in advance, for any comments, questions or suggestions.
>>>>>>
>>>>>> Dan
>>>>>>
>>>>>>
>>>>>> On 8/31/15 4:51 PM, Daniel D. Daugherty wrote:
>>>>>>> Greetings,
>>>>>>>
>>>>>>> I've updated the "fix" for this bug based on code review comments
>>>>>>> received in round 0.
>>>>>>>
>>>>>>> JDK-8049304 race between VM_Exit and _sync_FutileWakeups->inc()
>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8049304
>>>>>>>
>>>>>>> Webrev URL:
>>>>>>> http://cr.openjdk.java.net/~dcubed/8049304-webrev/1-jdk9-hs-rt/
>>>>>>>
>>>>>>> The easiest way to re-review is to download the two patch files
>>>>>>> and view them in your favorite file merge tool:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8049304-webrev/0-jdk9-hs-rt/hotspot.patch
>>>>>>>
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8049304-webrev/1-jdk9-hs-rt/hotspot.patch
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Testing: Aurora Adhoc RT-SVC nightly batch (in process)
>>>>>>> Aurora Adhoc vm.tmtools batch (in process)
>>>>>>> Kim's repro sequence for JDK-8049304
>>>>>>> Kim's repro sequence for JDK-8129978
>>>>>>> JPRT -testset hotspot
>>>>>>>
>>>>>>> Changes between round 0 and round 1:
>>>>>>>
>>>>>>> - add an 'is_safe' parameter to the OM_PERFDATA_OP macro;
>>>>>>> safepoint-safe callers can access _has_PerfData flag directly;
>>>>>>> non-safepoint-safe callers use a load-acquire to fetch the
>>>>>>> current _has_PerfData flag value
>>>>>>> - change PerfDataManager::destroy() to simply set _has_PerfData
>>>>>>> to zero (field is volatile) and then use a fence() to prevent
>>>>>>> any reordering of operations in any direction; it's only done
>>>>>>> once during VM shutdown so...
>>>>>>> - change perfMemory_exit() to only call PerfDataManager::destroy()
>>>>>>> when called at a safepoint and when the StatSample is not
>>>>>>> running; this means when the VM is aborting, we no longer have
>>>>>>> a race between the original crash report and this code path.
>>>>>>>
>>>>>>> I believe that I've addressed all comments from round 0.
>>>>>>>
>>>>>>> Thanks, in advance, for any comments, questions or suggestions.
>>>>>>>
>>>>>>> Dan
>>>>>>>
>>>>>>>
>>>>>>> On 8/25/15 3:08 PM, Daniel D. Daugherty wrote:
>>>>>>>> Greetings,
>>>>>>>>
>>>>>>>> I have a "fix" for a long standing race between JVM shutdown
>>>>>>>> and the
>>>>>>>> JVM statistics subsystem:
>>>>>>>>
>>>>>>>> JDK-8049304 race between VM_Exit and _sync_FutileWakeups->inc()
>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8049304
>>>>>>>>
>>>>>>>> Webrev URL:
>>>>>>>> http://cr.openjdk.java.net/~dcubed/8049304-webrev/0-jdk9-hs-rt/
>>>>>>>>
>>>>>>>> Testing: Aurora Adhoc RT-SVC nightly batch
>>>>>>>> Aurora Adhoc vm.tmtools batch
>>>>>>>> Kim's repro sequence for JDK-8049304
>>>>>>>> Kim's repro sequence for JDK-8129978
>>>>>>>> JPRT -testset hotspot
>>>>>>>>
>>>>>>>> This "fix":
>>>>>>>>
>>>>>>>> - adds a volatile flag to record whether PerfDataManager is
>>>>>>>> holding
>>>>>>>> data (PerfData objects)
>>>>>>>> - adds PerfDataManager::has_PerfData() to return the flag
>>>>>>>> - changes the Java monitor subsystem's use of PerfData to
>>>>>>>> check both allocation of the monitor subsystem specific
>>>>>>>> PerfData object and the new PerfDataManager::has_PerfData()
>>>>>>>> return value
>>>>>>>>
>>>>>>>> If the global 'UsePerfData' option is false, the system works as
>>>>>>>> it did before. If 'UsePerfData' is true (the default on
>>>>>>>> non-embedded
>>>>>>>> systems), the Java monitor subsystem will allocate a number of
>>>>>>>> PerfData objects to record information. The objects will record
>>>>>>>> information about Java monitor subsystem until the JVM shuts down.
>>>>>>>>
>>>>>>>> When the JVM starts to shutdown, the new PerfDataManager flag will
>>>>>>>> change to false and the Java monitor subsystem will stop using the
>>>>>>>> PerfData objects. This is the new behavior. As noted in the
>>>>>>>> comments
>>>>>>>> I added to the code, the race is still present; I'm just changing
>>>>>>>> the order and the timing to reduce the likelihood of the crash.
>>>>>>>>
>>>>>>>> Thanks, in advance, for any comments, questions or suggestions.
>>>>>>>>
>>>>>>>> Dan
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>>
>>
>
More information about the serviceability-dev
mailing list