RFR (S) 8049304: race between VM_Exit and _sync_FutileWakeups->inc()
Tom Benson
tom.benson at oracle.com
Wed Sep 2 17:06:02 UTC 2015
Looks good to me!
Tnx,
Tom
On 9/2/2015 12:40 PM, Daniel D. Daugherty wrote:
> Just for the record, here are the comment context diffs:
>
> $ diff -c src/share/vm/runtime/perfMemory.cpp{.cr2,}***
> src/share/vm/runtime/perfMemory.cpp.cr2 Tue Sep 1 19:39:45 2015
> --- src/share/vm/runtime/perfMemory.cpp Wed Sep 2 09:35:48 2015
> ***************
> *** 70,76 ****
> // objects that are currently being used by running JavaThreads
> // or the StatSampler. This method is invoked while we are not at
> // a safepoint during a VM abort so leaving the PerfData objects
> ! // around may also help diagnose the failure.
> //
> if (SafepointSynchronize::is_at_safepoint() &&
> !StatSampler::is_active()) {
> PerfDataManager::destroy();
> --- 70,78 ----
> // objects that are currently being used by running JavaThreads
> // or the StatSampler. This method is invoked while we are not at
> // a safepoint during a VM abort so leaving the PerfData objects
> ! // around may also help diagnose the failure. In rare cases,
> ! // PerfData objects are used in parallel with a safepoint. See
> ! // the work around in PerfDataManager::destroy().
> //
> if (SafepointSynchronize::is_at_safepoint() &&
> !StatSampler::is_active()) {
> PerfDataManager::destroy();
>
>
> $ diff -c src/share/vm/runtime/objectMonitor.cpp{.cr2,}
> *** src/share/vm/runtime/objectMonitor.cpp.cr2 Tue Sep 1 19:23:35 2015
> --- src/share/vm/runtime/objectMonitor.cpp Wed Sep 2 09:37:08 2015
> ***************
> *** 572,577 ****
> --- 572,579 ----
> // That is by design - we trade "lossy" counters which are
> exposed to
> // races during updates for a lower probe effect.
> TEVENT(Inflated enter - Futile wakeup);
> + // This PerfData object can be used in a parallel with a safepoint.
> + // See the work around in PerfDataManager::destroy().
> OM_PERFDATA_OP(FutileWakeups, inc());
> ++nWakeups;
>
> ***************
> *** 744,749 ****
> --- 746,753 ----
> // *must* retry _owner before parking.
> OrderAccess::fence();
>
> + // This PerfData object can be used in a parallel with a safepoint.
> + // See the work around in PerfDataManager::destroy().
> OM_PERFDATA_OP(FutileWakeups, inc());
> }
>
>
> Dan
>
>
> On 9/2/15 10:03 AM, Daniel D. Daugherty wrote:
>> On 9/2/15 9:40 AM, Tom Benson wrote:
>>> Hi Dan,
>>> OK. I didn't review what was added in round 1 once you said it was
>>> all removed for round 2. 8^)
>>
>> Not "all", but I did remove "most" of the round 1 changes :-)
>> The changes I kept are called in the list below.
>>
>>
>>> It would be great if what you have in your first paragraph below was
>>> added to the comments. I think the existing comment in
>>> perfMemory_exit implies we're safe to remove the PerfData objects
>>> without fear of them being in use because we're at a safepoint.
>>
>> I think I'll add this sentence to the comment in perfMemory_exit():
>>
>> // In rare cases, PerfData objects are used in parallel with a
>> // safepoint. See the work around in PerfDataManager::destroy().
>>
>>
>>> Perhaps better to have it (the new comment) in
>>> PerfDataManager::destroy(), because it seems so weird to have a
>>> sleep in the VM thread during a safepoint, even in a shutdown path.
>>
>> I think the PerfDataManager::destroy() comment is clear about
>> the race we're trying avoid. Again, if you have specific wording
>> changes to suggest to make it more clear... I'll take them. :-)
>>
>> I think I'll also add this comment:
>>
>> // This PerfData object can be used in a parallel with a safepoint.
>> // See the work around in PerfDataManager::destroy().
>>
>> above these lines in src/share/vm/runtime/objectMonitor.cpp:
>>
>> 575 OM_PERFDATA_OP(FutileWakeups, inc());
>> 747 OM_PERFDATA_OP(FutileWakeups, inc());
>>
>>
>>> Any interest in asserting that you're at a safepoint in
>>> PerfDataManager::destroy? Just a thought.
>>
>> I'd rather not add an assert() at this time.
>>
>> Are you good with the above comment additions? Do you need to
>> see another webrev when I make those changes?
>>
>> Dan
>>
>>
>>> Tom
>>>
>>> On 9/2/2015 11:15 AM, Daniel D. Daugherty wrote:
>>>> On 9/2/15 8:49 AM, Tom Benson wrote:
>>>>> Hi,
>>>>> I'm a bit confused on one point... Since you now only call
>>>>> PerfDataManager::destroy at a safepoint, why do you still have the
>>>>> comment about 'the race' and the sleep?
>>>>
>>>> Because the two "futile wakeup" counter updates in the monitor
>>>> subsystem can execute in parallel with a safepoint. The JavaThread
>>>> state is "blocked" so the safepoint subsystem will see the JavaThread
>>>> as "at a safepoint" when it is actually executing the code to
>>>> increment the counter.
>>>>
>>>> That's what the "is_safe" parameter to the OM_PERFDATA_OP macro was
>>>> all about in the round 1 code review. However, David convinced me
>>>> that all that logic didn't guarantee we wouldn't hit the race so
>>>> I ripped it all out in the round 2 code review (this one).
>>>>
>>>> Does this help your confusion?
>>>>
>>>> Dan
>>>>
>>>>
>>>>> Tom
>>>>>
>>>>> On 9/2/2015 7:52 AM, Daniel D. Daugherty wrote:
>>>>>> David,
>>>>>>
>>>>>> Thanks for the very fast re-review!
>>>>>>
>>>>>> Enjoy your vacation!
>>>>>>
>>>>>> Dan
>>>>>>
>>>>>>
>>>>>> On 9/2/15 2:54 AM, David Holmes wrote:
>>>>>>> Hi Dan,
>>>>>>>
>>>>>>> On 2/09/2015 2:45 PM, Daniel D. Daugherty wrote:
>>>>>>>> Greetings,
>>>>>>>>
>>>>>>>> I've updated the "fix" for this bug based on code review comments
>>>>>>>> received in round 1. I've dropped most of the changes from round 1
>>>>>>>> with a couple of exceptions.
>>>>>>>
>>>>>>> I have no further comments - it all looks good to me. If others
>>>>>>> want the pendulum to swing back a little from this position then
>>>>>>> ... nothing that has been suggested is functionally wrong. :)
>>>>>>>
>>>>>>> Thanks,
>>>>>>> David
>>>>>>> -----
>>>>>>>
>>>>>>> PS. When you get back from vacation I'll be gone for a month.
>>>>>>> That gives you a large window to push other things through with
>>>>>>> less stress ;-)
>>>>>>>
>>>>>>>
>>>>>>>> JDK-8049304 race between VM_Exit and _sync_FutileWakeups->inc()
>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8049304
>>>>>>>>
>>>>>>>> Webrev URL:
>>>>>>>> http://cr.openjdk.java.net/~dcubed/8049304-webrev/2-jdk9-hs-rt/
>>>>>>>>
>>>>>>>> The easiest way to re-review is to download the two patch files
>>>>>>>> (round 0 and round 2) and view them in your favorite file merge
>>>>>>>> tool:
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~dcubed/8049304-webrev/0-jdk9-hs-rt/hotspot.patch
>>>>>>>>
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~dcubed/8049304-webrev/2-jdk9-hs-rt/hotspot.patch
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Testing: Aurora Adhoc RT-SVC nightly batch (in process)
>>>>>>>> Aurora Adhoc vm.tmtools batch (in process)
>>>>>>>> Kim's repro sequence for JDK-8049304
>>>>>>>> Kim's repro sequence for JDK-8129978
>>>>>>>> JPRT -testset hotspot
>>>>>>>>
>>>>>>>> Changes between round 0 and round 2:
>>>>>>>>
>>>>>>>> - clarify a few comments
>>>>>>>> - init _has_PerfData flag with '0' (instead of false)
>>>>>>>> - drop unnecessary use OrderAccess::release_store() to set
>>>>>>>> _has_PerfData to '1' (we're in a Mutex)
>>>>>>>> - change perfMemory_exit() to only call PerfDataManager::destroy()
>>>>>>>> when called at a safepoint and when the StatSampler is not
>>>>>>>> running; this means when the VM is aborting, we no longer have
>>>>>>>> a race between the original crash report and this code path.
>>>>>>>>
>>>>>>>> Changes between round 1 and round 2:
>>>>>>>>
>>>>>>>> - clarify a few comments
>>>>>>>> - drop is_safe parameter to OM_PERFDATA_OP macro
>>>>>>>> - init _has_PerfData flag with '0' (instead of false)
>>>>>>>> - drop OrderAccess::fence() call before os::naked_short_sleep()
>>>>>>>> call
>>>>>>>> - drop PerfDataManager::has_PerfData_with_acquire()
>>>>>>>> - drop unnecessary use OrderAccess::release_store() to set
>>>>>>>> _has_PerfData to '1' (we're in a Mutex)
>>>>>>>>
>>>>>>>> I believe that I've addressed all comments from round 0 and
>>>>>>>> from round 1.
>>>>>>>>
>>>>>>>> Thanks, in advance, for any comments, questions or suggestions.
>>>>>>>>
>>>>>>>> Dan
>>>>>>>>
>>>>>>>>
>>>>>>>> On 8/31/15 4:51 PM, Daniel D. Daugherty wrote:
>>>>>>>>> Greetings,
>>>>>>>>>
>>>>>>>>> I've updated the "fix" for this bug based on code review comments
>>>>>>>>> received in round 0.
>>>>>>>>>
>>>>>>>>> JDK-8049304 race between VM_Exit and _sync_FutileWakeups->inc()
>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8049304
>>>>>>>>>
>>>>>>>>> Webrev URL:
>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8049304-webrev/1-jdk9-hs-rt/
>>>>>>>>>
>>>>>>>>> The easiest way to re-review is to download the two patch files
>>>>>>>>> and view them in your favorite file merge tool:
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8049304-webrev/0-jdk9-hs-rt/hotspot.patch
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8049304-webrev/1-jdk9-hs-rt/hotspot.patch
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Testing: Aurora Adhoc RT-SVC nightly batch (in process)
>>>>>>>>> Aurora Adhoc vm.tmtools batch (in process)
>>>>>>>>> Kim's repro sequence for JDK-8049304
>>>>>>>>> Kim's repro sequence for JDK-8129978
>>>>>>>>> JPRT -testset hotspot
>>>>>>>>>
>>>>>>>>> Changes between round 0 and round 1:
>>>>>>>>>
>>>>>>>>> - add an 'is_safe' parameter to the OM_PERFDATA_OP macro;
>>>>>>>>> safepoint-safe callers can access _has_PerfData flag directly;
>>>>>>>>> non-safepoint-safe callers use a load-acquire to fetch the
>>>>>>>>> current _has_PerfData flag value
>>>>>>>>> - change PerfDataManager::destroy() to simply set _has_PerfData
>>>>>>>>> to zero (field is volatile) and then use a fence() to prevent
>>>>>>>>> any reordering of operations in any direction; it's only done
>>>>>>>>> once during VM shutdown so...
>>>>>>>>> - change perfMemory_exit() to only call
>>>>>>>>> PerfDataManager::destroy()
>>>>>>>>> when called at a safepoint and when the StatSample is not
>>>>>>>>> running; this means when the VM is aborting, we no longer have
>>>>>>>>> a race between the original crash report and this code path.
>>>>>>>>>
>>>>>>>>> I believe that I've addressed all comments from round 0.
>>>>>>>>>
>>>>>>>>> Thanks, in advance, for any comments, questions or suggestions.
>>>>>>>>>
>>>>>>>>> Dan
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 8/25/15 3:08 PM, Daniel D. Daugherty wrote:
>>>>>>>>>> Greetings,
>>>>>>>>>>
>>>>>>>>>> I have a "fix" for a long standing race between JVM shutdown
>>>>>>>>>> and the
>>>>>>>>>> JVM statistics subsystem:
>>>>>>>>>>
>>>>>>>>>> JDK-8049304 race between VM_Exit and _sync_FutileWakeups->inc()
>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8049304
>>>>>>>>>>
>>>>>>>>>> Webrev URL:
>>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8049304-webrev/0-jdk9-hs-rt/
>>>>>>>>>>
>>>>>>>>>> Testing: Aurora Adhoc RT-SVC nightly batch
>>>>>>>>>> Aurora Adhoc vm.tmtools batch
>>>>>>>>>> Kim's repro sequence for JDK-8049304
>>>>>>>>>> Kim's repro sequence for JDK-8129978
>>>>>>>>>> JPRT -testset hotspot
>>>>>>>>>>
>>>>>>>>>> This "fix":
>>>>>>>>>>
>>>>>>>>>> - adds a volatile flag to record whether PerfDataManager is
>>>>>>>>>> holding
>>>>>>>>>> data (PerfData objects)
>>>>>>>>>> - adds PerfDataManager::has_PerfData() to return the flag
>>>>>>>>>> - changes the Java monitor subsystem's use of PerfData to
>>>>>>>>>> check both allocation of the monitor subsystem specific
>>>>>>>>>> PerfData object and the new PerfDataManager::has_PerfData()
>>>>>>>>>> return value
>>>>>>>>>>
>>>>>>>>>> If the global 'UsePerfData' option is false, the system works as
>>>>>>>>>> it did before. If 'UsePerfData' is true (the default on
>>>>>>>>>> non-embedded
>>>>>>>>>> systems), the Java monitor subsystem will allocate a number of
>>>>>>>>>> PerfData objects to record information. The objects will record
>>>>>>>>>> information about Java monitor subsystem until the JVM shuts
>>>>>>>>>> down.
>>>>>>>>>>
>>>>>>>>>> When the JVM starts to shutdown, the new PerfDataManager flag
>>>>>>>>>> will
>>>>>>>>>> change to false and the Java monitor subsystem will stop
>>>>>>>>>> using the
>>>>>>>>>> PerfData objects. This is the new behavior. As noted in the
>>>>>>>>>> comments
>>>>>>>>>> I added to the code, the race is still present; I'm just
>>>>>>>>>> changing
>>>>>>>>>> the order and the timing to reduce the likelihood of the crash.
>>>>>>>>>>
>>>>>>>>>> Thanks, in advance, for any comments, questions or suggestions.
>>>>>>>>>>
>>>>>>>>>> Dan
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>>
>
More information about the serviceability-dev
mailing list