RFR (S) 8049304: race between VM_Exit and _sync_FutileWakeups->inc()

Wed Sep 2 16:40:49 UTC 2015

Just for the record, here are the comment context diffs:

$ diff -c src/share/vm/runtime/perfMemory.cpp{.cr2,}*** 
src/share/vm/runtime/perfMemory.cpp.cr2     Tue Sep  1 19:39:45 2015
--- src/share/vm/runtime/perfMemory.cpp Wed Sep  2 09:35:48 2015
***************
*** 70,76 ****
     // objects that are currently being used by running JavaThreads
     // or the StatSampler. This method is invoked while we are not at
     // a safepoint during a VM abort so leaving the PerfData objects
!   // around may also help diagnose the failure.
     //
     if (SafepointSynchronize::is_at_safepoint() && 
!StatSampler::is_active()) {
       PerfDataManager::destroy();
--- 70,78 ----
     // objects that are currently being used by running JavaThreads
     // or the StatSampler. This method is invoked while we are not at
     // a safepoint during a VM abort so leaving the PerfData objects
!   // around may also help diagnose the failure. In rare cases,
!   // PerfData objects are used in parallel with a safepoint. See
!   // the work around in PerfDataManager::destroy().
     //
     if (SafepointSynchronize::is_at_safepoint() && 
!StatSampler::is_active()) {
       PerfDataManager::destroy();

$ diff -c src/share/vm/runtime/objectMonitor.cpp{.cr2,}
*** src/share/vm/runtime/objectMonitor.cpp.cr2  Tue Sep  1 19:23:35 2015
--- src/share/vm/runtime/objectMonitor.cpp      Wed Sep  2 09:37:08 2015
***************
*** 572,577 ****
--- 572,579 ----
       // That is by design - we trade "lossy" counters which are exposed to
       // races during updates for a lower probe effect.
       TEVENT(Inflated enter - Futile wakeup);
+     // This PerfData object can be used in a parallel with a safepoint.
+     // See the work around in PerfDataManager::destroy().
       OM_PERFDATA_OP(FutileWakeups, inc());
       ++nWakeups;

***************
*** 744,749 ****
--- 746,753 ----
       // *must* retry  _owner before parking.
       OrderAccess::fence();

+     // This PerfData object can be used in a parallel with a safepoint.
+     // See the work around in PerfDataManager::destroy().
       OM_PERFDATA_OP(FutileWakeups, inc());
     }

Dan

On 9/2/15 10:03 AM, Daniel D. Daugherty wrote:
> On 9/2/15 9:40 AM, Tom Benson wrote:
>> Hi Dan,
>> OK.  I didn't review what was added in round 1 once you said it was 
>> all removed for round 2.  8^)
>
> Not "all", but I did remove "most" of the round 1 changes :-)
> The changes I kept are called in the list below.
>
>
>> It would be great if what you have in your first paragraph below was 
>> added to the comments.   I think the existing comment in 
>> perfMemory_exit implies we're safe to remove the PerfData objects 
>> without fear of them being in use because we're at a safepoint.
>
> I think I'll add this sentence to the comment in perfMemory_exit():
>
>     // In rare cases, PerfData objects are used in parallel with a
>     // safepoint. See the work around in PerfDataManager::destroy().
>
>
>> Perhaps better to have it (the new comment) in 
>> PerfDataManager::destroy(),  because it seems so weird to have a 
>> sleep in the VM thread during a safepoint, even in a shutdown path.
>
> I think the PerfDataManager::destroy() comment is clear about
> the race we're trying avoid. Again, if you have specific wording
> changes to suggest to make it more clear... I'll take them. :-)
>
> I think I'll also add this comment:
>
>     // This PerfData object can be used in a parallel with a safepoint.
>     // See the work around in PerfDataManager::destroy().
>
> above these lines in src/share/vm/runtime/objectMonitor.cpp:
>
> 575     OM_PERFDATA_OP(FutileWakeups, inc());
> 747     OM_PERFDATA_OP(FutileWakeups, inc());
>
>
>> Any interest in asserting that you're at a safepoint in 
>> PerfDataManager::destroy?   Just a thought.
>
> I'd rather not add an assert() at this time.
>
> Are you good with the above comment additions? Do you need to
> see another webrev when I make those changes?
>
> Dan
>
>
>> Tom
>>
>> On 9/2/2015 11:15 AM, Daniel D. Daugherty wrote:
>>> On 9/2/15 8:49 AM, Tom Benson wrote:
>>>> Hi,
>>>> I'm a bit confused on one point... Since you now only call 
>>>> PerfDataManager::destroy at a safepoint, why do you still have the 
>>>> comment about 'the race'  and the sleep?
>>>
>>> Because the two "futile wakeup" counter updates in the monitor
>>> subsystem can execute in parallel with a safepoint. The JavaThread
>>> state is "blocked" so the safepoint subsystem will see the JavaThread
>>> as "at a safepoint" when it is actually executing the code to
>>> increment the counter.
>>>
>>> That's what the "is_safe" parameter to the OM_PERFDATA_OP macro was
>>> all about in the round 1 code review. However, David convinced me
>>> that all that logic didn't guarantee we wouldn't hit the race so
>>> I ripped it all out in the round 2 code review (this one).
>>>
>>> Does this help your confusion?
>>>
>>> Dan
>>>
>>>
>>>> Tom
>>>>
>>>> On 9/2/2015 7:52 AM, Daniel D. Daugherty wrote:
>>>>> David,
>>>>>
>>>>> Thanks for the very fast re-review!
>>>>>
>>>>> Enjoy your vacation!
>>>>>
>>>>> Dan
>>>>>
>>>>>
>>>>> On 9/2/15 2:54 AM, David Holmes wrote:
>>>>>> Hi Dan,
>>>>>>
>>>>>> On 2/09/2015 2:45 PM, Daniel D. Daugherty wrote:
>>>>>>> Greetings,
>>>>>>>
>>>>>>> I've updated the "fix" for this bug based on code review comments
>>>>>>> received in round 1. I've dropped most of the changes from round 1
>>>>>>> with a couple of exceptions.
>>>>>>
>>>>>> I have no further comments - it all looks good to me. If others 
>>>>>> want the pendulum to swing back a little from this position then 
>>>>>> ... nothing that has been suggested is functionally wrong. :)
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>> -----
>>>>>>
>>>>>> PS. When you get back from vacation I'll be gone for a month. 
>>>>>> That gives you a large window to push other things through with 
>>>>>> less stress ;-)
>>>>>>
>>>>>>
>>>>>>> JDK-8049304 race between VM_Exit and _sync_FutileWakeups->inc()
>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8049304
>>>>>>>
>>>>>>> Webrev URL: 
>>>>>>> http://cr.openjdk.java.net/~dcubed/8049304-webrev/2-jdk9-hs-rt/
>>>>>>>
>>>>>>> The easiest way to re-review is to download the two patch files
>>>>>>> (round 0 and round 2) and view them in your favorite file merge 
>>>>>>> tool:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8049304-webrev/0-jdk9-hs-rt/hotspot.patch 
>>>>>>>
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8049304-webrev/2-jdk9-hs-rt/hotspot.patch 
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Testing: Aurora Adhoc RT-SVC nightly batch (in process)
>>>>>>>           Aurora Adhoc vm.tmtools batch (in process)
>>>>>>>           Kim's repro sequence for JDK-8049304
>>>>>>>           Kim's repro sequence for JDK-8129978
>>>>>>>           JPRT -testset hotspot
>>>>>>>
>>>>>>> Changes between round 0 and round 2:
>>>>>>>
>>>>>>> - clarify a few comments
>>>>>>> - init _has_PerfData flag with '0' (instead of false)
>>>>>>> - drop unnecessary use OrderAccess::release_store() to set
>>>>>>>    _has_PerfData to '1' (we're in a Mutex)
>>>>>>> - change perfMemory_exit() to only call PerfDataManager::destroy()
>>>>>>>    when called at a safepoint and when the StatSampler is not
>>>>>>>    running; this means when the VM is aborting, we no longer have
>>>>>>>    a race between the original crash report and this code path.
>>>>>>>
>>>>>>> Changes between round 1 and round 2:
>>>>>>>
>>>>>>> - clarify a few comments
>>>>>>> - drop is_safe parameter to OM_PERFDATA_OP macro
>>>>>>> - init _has_PerfData flag with '0' (instead of false)
>>>>>>> - drop OrderAccess::fence() call before os::naked_short_sleep() 
>>>>>>> call
>>>>>>> - drop PerfDataManager::has_PerfData_with_acquire()
>>>>>>> - drop unnecessary use OrderAccess::release_store() to set
>>>>>>>    _has_PerfData to '1' (we're in a Mutex)
>>>>>>>
>>>>>>> I believe that I've addressed all comments from round 0 and
>>>>>>> from round 1.
>>>>>>>
>>>>>>> Thanks, in advance, for any comments, questions or suggestions.
>>>>>>>
>>>>>>> Dan
>>>>>>>
>>>>>>>
>>>>>>> On 8/31/15 4:51 PM, Daniel D. Daugherty wrote:
>>>>>>>> Greetings,
>>>>>>>>
>>>>>>>> I've updated the "fix" for this bug based on code review comments
>>>>>>>> received in round 0.
>>>>>>>>
>>>>>>>> JDK-8049304 race between VM_Exit and _sync_FutileWakeups->inc()
>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8049304
>>>>>>>>
>>>>>>>> Webrev URL:
>>>>>>>> http://cr.openjdk.java.net/~dcubed/8049304-webrev/1-jdk9-hs-rt/
>>>>>>>>
>>>>>>>> The easiest way to re-review is to download the two patch files
>>>>>>>> and view them in your favorite file merge tool:
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~dcubed/8049304-webrev/0-jdk9-hs-rt/hotspot.patch 
>>>>>>>>
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~dcubed/8049304-webrev/1-jdk9-hs-rt/hotspot.patch 
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Testing: Aurora Adhoc RT-SVC nightly batch (in process)
>>>>>>>>          Aurora Adhoc vm.tmtools batch (in process)
>>>>>>>>          Kim's repro sequence for JDK-8049304
>>>>>>>>          Kim's repro sequence for JDK-8129978
>>>>>>>>          JPRT -testset hotspot
>>>>>>>>
>>>>>>>> Changes between round 0 and round 1:
>>>>>>>>
>>>>>>>> - add an 'is_safe' parameter to the OM_PERFDATA_OP macro;
>>>>>>>>   safepoint-safe callers can access _has_PerfData flag directly;
>>>>>>>>   non-safepoint-safe callers use a load-acquire to fetch the
>>>>>>>>   current _has_PerfData flag value
>>>>>>>> - change PerfDataManager::destroy() to simply set _has_PerfData
>>>>>>>>   to zero (field is volatile) and then use a fence() to prevent
>>>>>>>>   any reordering of operations in any direction; it's only done
>>>>>>>>   once during VM shutdown so...
>>>>>>>> - change perfMemory_exit() to only call PerfDataManager::destroy()
>>>>>>>>   when called at a safepoint and when the StatSample is not
>>>>>>>>   running; this means when the VM is aborting, we no longer have
>>>>>>>>   a race between the original crash report and this code path.
>>>>>>>>
>>>>>>>> I believe that I've addressed all comments from round 0.
>>>>>>>>
>>>>>>>> Thanks, in advance, for any comments, questions or suggestions.
>>>>>>>>
>>>>>>>> Dan
>>>>>>>>
>>>>>>>>
>>>>>>>> On 8/25/15 3:08 PM, Daniel D. Daugherty wrote:
>>>>>>>>> Greetings,
>>>>>>>>>
>>>>>>>>> I have a "fix" for a long standing race between JVM shutdown 
>>>>>>>>> and the
>>>>>>>>> JVM statistics subsystem:
>>>>>>>>>
>>>>>>>>> JDK-8049304 race between VM_Exit and _sync_FutileWakeups->inc()
>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8049304
>>>>>>>>>
>>>>>>>>> Webrev URL:
>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8049304-webrev/0-jdk9-hs-rt/
>>>>>>>>>
>>>>>>>>> Testing: Aurora Adhoc RT-SVC nightly batch
>>>>>>>>>          Aurora Adhoc vm.tmtools batch
>>>>>>>>>          Kim's repro sequence for JDK-8049304
>>>>>>>>>          Kim's repro sequence for JDK-8129978
>>>>>>>>>          JPRT -testset hotspot
>>>>>>>>>
>>>>>>>>> This "fix":
>>>>>>>>>
>>>>>>>>> - adds a volatile flag to record whether PerfDataManager is 
>>>>>>>>> holding
>>>>>>>>>   data (PerfData objects)
>>>>>>>>> - adds PerfDataManager::has_PerfData() to return the flag
>>>>>>>>> - changes the Java monitor subsystem's use of PerfData to
>>>>>>>>>   check both allocation of the monitor subsystem specific
>>>>>>>>>   PerfData object and the new PerfDataManager::has_PerfData()
>>>>>>>>>   return value
>>>>>>>>>
>>>>>>>>> If the global 'UsePerfData' option is false, the system works as
>>>>>>>>> it did before. If 'UsePerfData' is true (the default on 
>>>>>>>>> non-embedded
>>>>>>>>> systems), the Java monitor subsystem will allocate a number of
>>>>>>>>> PerfData objects to record information. The objects will record
>>>>>>>>> information about Java monitor subsystem until the JVM shuts 
>>>>>>>>> down.
>>>>>>>>>
>>>>>>>>> When the JVM starts to shutdown, the new PerfDataManager flag 
>>>>>>>>> will
>>>>>>>>> change to false and the Java monitor subsystem will stop using 
>>>>>>>>> the
>>>>>>>>> PerfData objects. This is the new behavior. As noted in the 
>>>>>>>>> comments
>>>>>>>>> I added to the code, the race is still present; I'm just changing
>>>>>>>>> the order and the timing to reduce the likelihood of the crash.
>>>>>>>>>
>>>>>>>>> Thanks, in advance, for any comments, questions or suggestions.
>>>>>>>>>
>>>>>>>>> Dan
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>
>>
>
>