RFR:8047290:Ensure consistent safepoint checking in MutexLockerEx

Fri Dec 5 23:02:49 UTC 2014

Dan prompted me to look at all the bits.

I think in the merge you accidentally reverted these lines.

    // CMS_modUnionTable_lock                   leaf
-  // CMS_bitMap_lock                          leaf + 1
-  // CMS_freeList_lock                        leaf + 2
+   // CMS_bitMap_lock                          leaf 1
+   // CMS_freeList_lock                        leaf 2

Besides restoring UnsafeJlong_lock (which is never checks for safepoints).

-#ifndef SUPPORTS_NATIVE_CX8
-  def(UnsafeJlong_lock             , Mutex,   special,     false);
-#endif

The rest looks fine.  I checked.   We think this compiled ok for you in 
JPRT because our platforms define SUPPORTS_NATIVE_CX8.

Also, I think we should file an single RFE for analyzing the locks for 
the GC code:

"WorkGroup monitor" share/vm/utilities/workgroup.cpp
"SLTMonitor" share/vm/gc_implementation/shared/concurrentGCThread.cpp
"CompactibleFreeListSpace._lock" 
share/vm/gc_implementation/concurrentMarkSweep/compactibleFreeListSpace.cpp
"freelist par lock" 
share/vm/gc_implementation/concurrentMarkSweep/compactibleFreeListSpace.cpp
"CMS_markBitMap_lock" 
share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp
ParGCRareEvent_lock

One RFE for analyzing the locks for the compiler:

Compile_lock

And one for the JFR code:

JfrStacktrace_lock

And one for the Runtime code (which may be intentionally "sometimes" 
locks because they control safepoints but someone should verify this):

Safepoint_lock
Threads_lock
VMOperationQueue_lock
VMOperationRequest_lock
Terminator_lock
Heap_lock
PeriodicTask_lock

Thanks!
Coleen

On 12/5/14, 3:08 PM, Coleen Phillimore wrote:
>
> Max,
>
> I think these changes look great!  There were some comments about 
> making this more of a static check with types or detecting deadlocks 
> within the safepointing code, but the change here is a good first step 
> and does so with minimal disruption to the code and minimal 
> complexity.  So I consider this low-hanging fruit in order to make the 
> code safer.
>
> Thanks!
> Coleen
>
>
> On 12/4/14, 4:50 PM, Max Ockner wrote:
>> Hello once again...
>> I have a new (and suggestively titled) webrev: 
>> http://cr.openjdk.java.net/~coleenp/8047290final/
>>
>> Ran aurora tests.
>>
>> Here is a list of "sometimes" locks:
>>
>> "WorkGroup monitor" share/vm/utilities/workgroup.cpp
>> "SLTMonitor" share/vm/gc_implementation/shared/concurrentGCThread.cpp
>> "CompactibleFreeListSpace._lock" 
>> share/vm/gc_implementation/concurrentMarkSweep/compactibleFreeListSpace.cpp
>> "freelist par lock" 
>> share/vm/gc_implementation/concurrentMarkSweep/compactibleFreeListSpace.cpp
>> "SR_lock"                            share/vm/runtime/thread.cpp
>> "CMS_markBitMap_lock" 
>> share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp
>>
>> The remaining "sometimes" locks can be found in 
>> share/vm/runtime/mutexLocker.cpp:
>>
>> ParGCRareEvent_lock
>> Safepoint_lock
>> Threads_lock
>> VMOperationQueue_lock
>> VMOperationRequest_lock
>> Terminator_lock
>> Heap_lock
>> Compile_lock
>> PeriodicTask_lock
>> JfrStacktrace_lock
>>
>>
>> Thanks,
>> Max Ockner
>>
>>
>>
>>
>> On 11/25/2014 3:03 AM, David Holmes wrote:
>>> Hi Max,
>>>
>>> On 21/11/2014 7:56 AM, Max Ockner wrote:
>>>> Hello again,
>>>>
>>>> I have made changes based on all comments. There is now a pair of 
>>>> assert
>>>> statements in the Monitor/Mutex wait() methods. When I reran tests, I
>>>
>>> Is there an updated webrev?
>>>
>>>> caught another lock that I had to change to "sometimes", but the 
>>>> assert
>>>> that caught this lock was not in wait. There are currently no locks 
>>>> that
>>>> use try to pass an incorrect safepoint check argument to wait().
>>>> Instead, gcbasher did not catch this lock last time, when the only
>>>> asserts were in lock and lock_without_safepoint. This lock is
>>>> "CMS_markBitMap_lock" in
>>>> share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp. 
>>>>
>>>> I'm guessing that it was not caught by the tests because this 
>>>> section of
>>>> code is not reached very often. My initial inspection failed to catch
>>>> this lock because it is passed around between various structures and
>>>> renamed 4 times. I have not yet found a good way to check for this
>>>> situation, even with a debugger.
>>>>
>>>> Are there any tests which actually manage to hit every line of code?
>>>
>>> No. There is too much code that is dependent on low-level details of 
>>> the GC used, the compilation strategy, plus the set of runtime flags 
>>> used (and whether product or fastdebug). That's why we have lots of 
>>> tests run in lots of different ways, to try to get coverage.
>>>
>>>> How should I handle this situation where I can't rely on the tests 
>>>> that
>>>> I have run to tell me if I have missed something?
>>>> At what point can I assume that everything is OK?
>>>
>>> Difficult to answer in general - there are a number of recommended 
>>> test suites used by the runtime team, but your changes will also 
>>> impact GC and compiler code and so may not get exercised by the 
>>> runtime test suites (unless run with various compiler and GC 
>>> options). Perhaps an ad-hoc test run similar to nightlies? Or you 
>>> push after the weekly snapshot so as to maximise nightly testing, 
>>> and if there are issues exposed then you have time to address them 
>>> or revert the fix.
>>>
>>> Cheers,
>>> David
>>>
>>>> Thanks,
>>>> Max Ockner
>>>>
>>>> On 11/10/2014 11:57 PM, David Holmes wrote:
>>>>> Hi Max,
>>>>>
>>>>> On 8/11/2014 2:15 AM, Max Ockner wrote:
>>>>>> Hello all,
>>>>>> I have made these additonal changes:
>>>>>> -Moved the assert() statements into the lock and 
>>>>>> lock_without_safepoint
>>>>>> methods.
>>>>>> -Changed Monitor::SafepointAllowed to 
>>>>>> Monitor::SafepointCheckRequired
>>>>>> -Changed the Monitor::SafepointCheckRequired values for several 
>>>>>> locks
>>>>>> which were locked outside of a MutexLockerEx (some were locked with
>>>>>> MutexLocker, some were locked were locked without any MutexLocker* )
>>>>>>
>>>>>> New webrev location: http://cr.openjdk.java.net/~coleenp/8047290/
>>>>>
>>>>> Generally this is all okay - a few style and other nits below.
>>>>>
>>>>> However you missed adding an assert in Monitor::wait to check if the
>>>>> no_safepoint_check flag was being used correctly for the current 
>>>>> monitor.
>>>>>
>>>>> Specific comments:
>>>>>
>>>>> src/share/vm/runtime/mutex.hpp
>>>>>
>>>>> This comment is no longer accurate with the moved check location:
>>>>>
>>>>> + // MutexLockerEx checks these flags when acquiring a lock
>>>>> + // to ensure consistent checking for each lock.
>>>>>
>>>>> The same goes for other references to MutexLockerEx in the enum
>>>>> description.
>>>>>
>>>>> Also copyright year needs updating.
>>>>>
>>>>> ---
>>>>>
>>>>> src/share/vm/runtime/mutex.cpp
>>>>>
>>>>>  898   //Ensure
>>>>>  961   //Ensure
>>>>>
>>>>> Space needed after //
>>>>>
>>>>> ---
>>>>>
>>>>> src/share/vm/runtime/mutexLocker.cpp
>>>>>
>>>>> +  var = new type(Mutex::pri, #var, 
>>>>> vm_block,safepoint_check_allowed); \
>>>>>
>>>>> space needed after comma in k,s
>>>>>
>>>>> ---
>>>>>
>>>>> src/share/vm/runtime/mutexLocker.hpp
>>>>>
>>>>> Whitespace only changes - looks like leftovers from removed edits.
>>>>>
>>>>>
>>>>>
>>>>> Thanks,
>>>>> David
>>>>>
>>>>>
>>>>>> Additional testing:
>>>>>> jtreg ./jdk/test/java/lang/invoke
>>>>>> jtreg jfr tests
>>>>>>
>>>>>> Here is a list of ALL of the "sometimes" locks:
>>>>>>
>>>>>> "WorkGroup monitor" share/vm/utilities/workgroup.cpp
>>>>>> "SLTMonitor" 
>>>>>> share/vm/gc_implementation/shared/concurrentGCThread.cpp
>>>>>> "CompactibleFreeListSpace._lock"
>>>>>> share/vm/gc_implementation/concurrentMarkSweep/compactibleFreeListSpace.cpp 
>>>>>>
>>>>>>
>>>>>> "freelist par lock"
>>>>>> share/vm/gc_implementation/concurrentMarkSweep/compactibleFreeListSpace.cpp 
>>>>>>
>>>>>>
>>>>>> "SR_lock" share/vm/runtime/thread.cpp
>>>>>>
>>>>>> The remaining "sometimes" locks can be found in
>>>>>> share/vm/runtime/mutexLocker.cpp:
>>>>>>
>>>>>> ParGCRareEvent_lock
>>>>>> Safepoint_lock
>>>>>> Threads_lock
>>>>>> VMOperationQueue_lock
>>>>>> VMOperationRequest_lock
>>>>>> Terminator_lock
>>>>>> Heap_lock
>>>>>> Compile_lock
>>>>>> PeriodicTask_lock
>>>>>> JfrStacktrace_lock
>>>>>>
>>>>>> I have not checked the validity of the "sometimes" locks, and I 
>>>>>> believe
>>>>>> that this should be a different project.
>>>>>>
>>>>>> Thanks for your help!
>>>>>> Max Ockner
>>>>>> On 10/15/2014 8:54 PM, David Holmes wrote:
>>>>>>> Hi Max,
>>>>>>>
>>>>>>> This is looking good.
>>>>>>>
>>>>>>> A few high-level initial comments:
>>>>>>>
>>>>>>> I think SafepointAllowed should be SafepointCheckNeeded
>>>>>>>
>>>>>>> Why are the checks in MutexLocker when the state is maintained 
>>>>>>> in the
>>>>>>> mutex itself and the mutex/monitor has lock_without_safepoint, and
>>>>>>> wait(<safepoint checking flag>) ? I would have expected to see the
>>>>>>> check in the mutex/monitor methods.
>>>>>>>
>>>>>>> Checking consistent usage of the _no_safepoint_check_flag is 
>>>>>>> good. But
>>>>>>> another part of this is that a monitor/mutex that never checks for
>>>>>>> safepoints should never be held when a thread blocks at a 
>>>>>>> safepoint -
>>>>>>> is there some way to easily check that? I was surprised how many 
>>>>>>> locks
>>>>>>> are actually not checking for safepoints.
>>>>>>>
>>>>>>> Did you find any cases where the mutex/monitor was being used
>>>>>>> inconsistently and incorrectly?
>>>>>>>
>>>>>>> Did you analyse the "sometimes" cases to see if they were safe?
>>>>>>> (Aside: just for fun check out what happens if you lock the
>>>>>>> Threads_lock with a safepoint check and a safepoint has been 
>>>>>>> requested
>>>>>>> :) ).
>>>>>>>
>>>>>>> Cheers,
>>>>>>> David
>>>>>>>
>>>>>>> On 16/10/2014 4:04 AM, Max Ockner wrote:
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I am a new member of the Hotspot runtime team in Burlington, MA.
>>>>>>>> Please review my first fix related to safepoint checking.
>>>>>>>>
>>>>>>>> Summary: MutexLockerEx can either acquire a lock with or without a
>>>>>>>> safepoint check.
>>>>>>>> In some cases, a particular lock must either safepoint check 
>>>>>>>> always or
>>>>>>>> never to avoid deadlocking.
>>>>>>>> Some other locks have semantics which allow them to avoid 
>>>>>>>> deadlocks
>>>>>>>> despite having a safepoint check only some of the time.
>>>>>>>> All locks that are OK having inconsistent safepoint checks have 
>>>>>>>> been
>>>>>>>> marked. All locks that should never safepoint check and all 
>>>>>>>> locks that
>>>>>>>> should always safepoint check have also been marked.
>>>>>>>> When a MutexLockerEx acquires a lock with or without a safepoint
>>>>>>>> check,
>>>>>>>> the lock's safepointAllowed marker is checked to ensure consistent
>>>>>>>> safepoint checking.
>>>>>>>>
>>>>>>>> Webrev: http://oklahoma.us.oracle.com/~mockner/webrev/8047290/
>>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8047290
>>>>>>>>
>>>>>>>> Tested with:
>>>>>>>>      jprt "-testset hotspot"
>>>>>>>>      jtreg hotspot
>>>>>>>>      vm.quick.testlist
>>>>>>>>
>>>>>>>> Whitebox tests:
>>>>>>>> test/runtime/Safepoint/AssertSafepointCheckConsistency1.java: Test
>>>>>>>> expects Assert ("This lock should always have a safepoint check")
>>>>>>>> test/runtime/Safepoint/AssertSafepointCheckConsistency2.java: Test
>>>>>>>> expects Assert ("This lock should never have a safepoint check")
>>>>>>>> test/runtime/Safepoint/AssertSafepointCheckConsistency3.java: code
>>>>>>>> should not assert. (Lock is properly acquired with no safepoint 
>>>>>>>> check)
>>>>>>>> test/runtime/Safepoint/AssertSafepointCheckConsistency4.java: code
>>>>>>>> should not assert. (Lock is properly acquired with safepoint 
>>>>>>>> check)
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Max
>>>>>>>>
>>>>>>
>>>>
>>
>