RFR (S) 8049304: race between VM_Exit and _sync_FutileWakeups->inc()

Tue Sep 1 06:13:50 UTC 2015

Hi Dan,

Hope you had that coffee :)

On 1/09/2015 8:51 AM, Daniel D. Daugherty wrote:
> Greetings,
>
> I've updated the "fix" for this bug based on code review comments
> received in round 0.
>
> JDK-8049304 race between VM_Exit and _sync_FutileWakeups->inc()
> https://bugs.openjdk.java.net/browse/JDK-8049304
>
> Webrev URL: http://cr.openjdk.java.net/~dcubed/8049304-webrev/1-jdk9-hs-rt/
>
> The easiest way to re-review is to download the two patch files
> and view them in your favorite file merge tool:
>
> http://cr.openjdk.java.net/~dcubed/8049304-webrev/0-jdk9-hs-rt/hotspot.patch
>
> http://cr.openjdk.java.net/~dcubed/8049304-webrev/1-jdk9-hs-rt/hotspot.patch
>
>
> Testing: Aurora Adhoc RT-SVC nightly batch (in process)
>           Aurora Adhoc vm.tmtools batch (in process)
>           Kim's repro sequence for JDK-8049304
>           Kim's repro sequence for JDK-8129978
>           JPRT -testset hotspot
>
> Changes between round 0 and round 1:
>
> - add an 'is_safe' parameter to the OM_PERFDATA_OP macro;

The sense of safepoint-safe is a bit confusing to me. If the thread is 
in a safepoint-safe-state then it seems safepoint-safe==false ??

>    safepoint-safe callers can access _has_PerfData flag directly;
>    non-safepoint-safe callers use a load-acquire to fetch the
>    current _has_PerfData flag value

load-acquire makes no difference to the "currentness" of _has_PerfData 
and has no affect on the size of the "race window". So I don't see any 
point to this is-safe distinction.

> - change PerfDataManager::destroy() to simply set _has_PerfData
>    to zero (field is volatile) and then use a fence() to prevent
>    any reordering of operations in any direction; it's only done
>    once during VM shutdown so...

See previous email :)

> - change perfMemory_exit() to only call PerfDataManager::destroy()
>    when called at a safepoint and when the StatSample is not
>    running; this means when the VM is aborting, we no longer have
>    a race between the original crash report and this code path.

Sounds reasonable.

I have few specific comments/nits:

src/share/vm/runtime/perfData.cpp

  43 volatile jint   PerfDataManager::_has_PerfData = false;

Nit: jint should be initialized with 0 not false.

  287   // manipulation before we free the memory. The two alternatives
  288   // are 1) leak the PerfData memory or 2) do some form of ordered
  289   // access before every PerfData operation.

No amount of ordered access will fix the race - only synchronization of 
some form.

  315 void PerfDataManager::add_item(PerfData* p, bool sampled) {
  316
  317   MutexLocker ml(PerfDataManager_lock);
  318
  319   if (_all == NULL) {
  320     _all = new PerfDataList(100);
  321     OrderAccess::release_store(&_has_PerfData, 1);
  322   }

I can't see any purpose for a release_store here. You have one-time 
initialization code guarded with a mutex. A release_store adds nothing 
here - other than future head-scratching from maintainers of this code.

Thanks,
David

> I believe that I've addressed all comments from round 0.
>
> Thanks, in advance, for any comments, questions or suggestions.
>
> Dan
>
>
> On 8/25/15 3:08 PM, Daniel D. Daugherty wrote:
>> Greetings,
>>
>> I have a "fix" for a long standing race between JVM shutdown and the
>> JVM statistics subsystem:
>>
>> JDK-8049304 race between VM_Exit and _sync_FutileWakeups->inc()
>> https://bugs.openjdk.java.net/browse/JDK-8049304
>>
>> Webrev URL:
>> http://cr.openjdk.java.net/~dcubed/8049304-webrev/0-jdk9-hs-rt/
>>
>> Testing: Aurora Adhoc RT-SVC nightly batch
>>          Aurora Adhoc vm.tmtools batch
>>          Kim's repro sequence for JDK-8049304
>>          Kim's repro sequence for JDK-8129978
>>          JPRT -testset hotspot
>>
>> This "fix":
>>
>> - adds a volatile flag to record whether PerfDataManager is holding
>>   data (PerfData objects)
>> - adds PerfDataManager::has_PerfData() to return the flag
>> - changes the Java monitor subsystem's use of PerfData to
>>   check both allocation of the monitor subsystem specific
>>   PerfData object and the new PerfDataManager::has_PerfData()
>>   return value
>>
>> If the global 'UsePerfData' option is false, the system works as
>> it did before. If 'UsePerfData' is true (the default on non-embedded
>> systems), the Java monitor subsystem will allocate a number of
>> PerfData objects to record information. The objects will record
>> information about Java monitor subsystem until the JVM shuts down.
>>
>> When the JVM starts to shutdown, the new PerfDataManager flag will
>> change to false and the Java monitor subsystem will stop using the
>> PerfData objects. This is the new behavior. As noted in the comments
>> I added to the code, the race is still present; I'm just changing
>> the order and the timing to reduce the likelihood of the crash.
>>
>> Thanks, in advance, for any comments, questions or suggestions.
>>
>> Dan
>>
>>
>>
>