RFR:8047290:Ensure consistent safepoint checking in MutexLockerEx
David Holmes
david.holmes at oracle.com
Mon Dec 8 02:24:42 UTC 2014
On 6/12/2014 9:02 AM, Coleen Phillimore wrote:
>
> Dan prompted me to look at all the bits.
>
> I think in the merge you accidentally reverted these lines.
>
> // CMS_modUnionTable_lock leaf
> - // CMS_bitMap_lock leaf + 1
> - // CMS_freeList_lock leaf + 2
> + // CMS_bitMap_lock leaf 1
> + // CMS_freeList_lock leaf 2
>
> Besides restoring UnsafeJlong_lock (which is never checks for safepoints).
>
> -#ifndef SUPPORTS_NATIVE_CX8
> - def(UnsafeJlong_lock , Mutex, special, false);
> -#endif
Thanks for adding that info Coleen - it was on my todo list to mention
this new lock to Max.
> The rest looks fine. I checked. We think this compiled ok for you in
> JPRT because our platforms define SUPPORTS_NATIVE_CX8.
Not all them - a full JPRT test job would have failed.
Cheers,
David
> Also, I think we should file an single RFE for analyzing the locks for
> the GC code:
>
> "WorkGroup monitor" share/vm/utilities/workgroup.cpp
> "SLTMonitor" share/vm/gc_implementation/shared/concurrentGCThread.cpp
> "CompactibleFreeListSpace._lock"
> share/vm/gc_implementation/concurrentMarkSweep/compactibleFreeListSpace.cpp
> "freelist par lock"
> share/vm/gc_implementation/concurrentMarkSweep/compactibleFreeListSpace.cpp
> "CMS_markBitMap_lock"
> share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp
>
> ParGCRareEvent_lock
>
>
> One RFE for analyzing the locks for the compiler:
>
> Compile_lock
>
> And one for the JFR code:
>
> JfrStacktrace_lock
>
> And one for the Runtime code (which may be intentionally "sometimes"
> locks because they control safepoints but someone should verify this):
>
> Safepoint_lock
> Threads_lock
> VMOperationQueue_lock
> VMOperationRequest_lock
> Terminator_lock
> Heap_lock
> PeriodicTask_lock
>
> Thanks!
> Coleen
>
>
> On 12/5/14, 3:08 PM, Coleen Phillimore wrote:
>>
>> Max,
>>
>> I think these changes look great! There were some comments about
>> making this more of a static check with types or detecting deadlocks
>> within the safepointing code, but the change here is a good first step
>> and does so with minimal disruption to the code and minimal
>> complexity. So I consider this low-hanging fruit in order to make the
>> code safer.
>>
>> Thanks!
>> Coleen
>>
>>
>> On 12/4/14, 4:50 PM, Max Ockner wrote:
>>> Hello once again...
>>> I have a new (and suggestively titled) webrev:
>>> http://cr.openjdk.java.net/~coleenp/8047290final/
>>>
>>> Ran aurora tests.
>>>
>>> Here is a list of "sometimes" locks:
>>>
>>> "WorkGroup monitor" share/vm/utilities/workgroup.cpp
>>> "SLTMonitor" share/vm/gc_implementation/shared/concurrentGCThread.cpp
>>> "CompactibleFreeListSpace._lock"
>>> share/vm/gc_implementation/concurrentMarkSweep/compactibleFreeListSpace.cpp
>>>
>>> "freelist par lock"
>>> share/vm/gc_implementation/concurrentMarkSweep/compactibleFreeListSpace.cpp
>>>
>>> "SR_lock" share/vm/runtime/thread.cpp
>>> "CMS_markBitMap_lock"
>>> share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp
>>>
>>>
>>> The remaining "sometimes" locks can be found in
>>> share/vm/runtime/mutexLocker.cpp:
>>>
>>> ParGCRareEvent_lock
>>> Safepoint_lock
>>> Threads_lock
>>> VMOperationQueue_lock
>>> VMOperationRequest_lock
>>> Terminator_lock
>>> Heap_lock
>>> Compile_lock
>>> PeriodicTask_lock
>>> JfrStacktrace_lock
>>>
>>>
>>> Thanks,
>>> Max Ockner
>>>
>>>
>>>
>>>
>>> On 11/25/2014 3:03 AM, David Holmes wrote:
>>>> Hi Max,
>>>>
>>>> On 21/11/2014 7:56 AM, Max Ockner wrote:
>>>>> Hello again,
>>>>>
>>>>> I have made changes based on all comments. There is now a pair of
>>>>> assert
>>>>> statements in the Monitor/Mutex wait() methods. When I reran tests, I
>>>>
>>>> Is there an updated webrev?
>>>>
>>>>> caught another lock that I had to change to "sometimes", but the
>>>>> assert
>>>>> that caught this lock was not in wait. There are currently no locks
>>>>> that
>>>>> use try to pass an incorrect safepoint check argument to wait().
>>>>> Instead, gcbasher did not catch this lock last time, when the only
>>>>> asserts were in lock and lock_without_safepoint. This lock is
>>>>> "CMS_markBitMap_lock" in
>>>>> share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp.
>>>>>
>>>>> I'm guessing that it was not caught by the tests because this
>>>>> section of
>>>>> code is not reached very often. My initial inspection failed to catch
>>>>> this lock because it is passed around between various structures and
>>>>> renamed 4 times. I have not yet found a good way to check for this
>>>>> situation, even with a debugger.
>>>>>
>>>>> Are there any tests which actually manage to hit every line of code?
>>>>
>>>> No. There is too much code that is dependent on low-level details of
>>>> the GC used, the compilation strategy, plus the set of runtime flags
>>>> used (and whether product or fastdebug). That's why we have lots of
>>>> tests run in lots of different ways, to try to get coverage.
>>>>
>>>>> How should I handle this situation where I can't rely on the tests
>>>>> that
>>>>> I have run to tell me if I have missed something?
>>>>> At what point can I assume that everything is OK?
>>>>
>>>> Difficult to answer in general - there are a number of recommended
>>>> test suites used by the runtime team, but your changes will also
>>>> impact GC and compiler code and so may not get exercised by the
>>>> runtime test suites (unless run with various compiler and GC
>>>> options). Perhaps an ad-hoc test run similar to nightlies? Or you
>>>> push after the weekly snapshot so as to maximise nightly testing,
>>>> and if there are issues exposed then you have time to address them
>>>> or revert the fix.
>>>>
>>>> Cheers,
>>>> David
>>>>
>>>>> Thanks,
>>>>> Max Ockner
>>>>>
>>>>> On 11/10/2014 11:57 PM, David Holmes wrote:
>>>>>> Hi Max,
>>>>>>
>>>>>> On 8/11/2014 2:15 AM, Max Ockner wrote:
>>>>>>> Hello all,
>>>>>>> I have made these additonal changes:
>>>>>>> -Moved the assert() statements into the lock and
>>>>>>> lock_without_safepoint
>>>>>>> methods.
>>>>>>> -Changed Monitor::SafepointAllowed to
>>>>>>> Monitor::SafepointCheckRequired
>>>>>>> -Changed the Monitor::SafepointCheckRequired values for several
>>>>>>> locks
>>>>>>> which were locked outside of a MutexLockerEx (some were locked with
>>>>>>> MutexLocker, some were locked were locked without any MutexLocker* )
>>>>>>>
>>>>>>> New webrev location: http://cr.openjdk.java.net/~coleenp/8047290/
>>>>>>
>>>>>> Generally this is all okay - a few style and other nits below.
>>>>>>
>>>>>> However you missed adding an assert in Monitor::wait to check if the
>>>>>> no_safepoint_check flag was being used correctly for the current
>>>>>> monitor.
>>>>>>
>>>>>> Specific comments:
>>>>>>
>>>>>> src/share/vm/runtime/mutex.hpp
>>>>>>
>>>>>> This comment is no longer accurate with the moved check location:
>>>>>>
>>>>>> + // MutexLockerEx checks these flags when acquiring a lock
>>>>>> + // to ensure consistent checking for each lock.
>>>>>>
>>>>>> The same goes for other references to MutexLockerEx in the enum
>>>>>> description.
>>>>>>
>>>>>> Also copyright year needs updating.
>>>>>>
>>>>>> ---
>>>>>>
>>>>>> src/share/vm/runtime/mutex.cpp
>>>>>>
>>>>>> 898 //Ensure
>>>>>> 961 //Ensure
>>>>>>
>>>>>> Space needed after //
>>>>>>
>>>>>> ---
>>>>>>
>>>>>> src/share/vm/runtime/mutexLocker.cpp
>>>>>>
>>>>>> + var = new type(Mutex::pri, #var,
>>>>>> vm_block,safepoint_check_allowed); \
>>>>>>
>>>>>> space needed after comma in k,s
>>>>>>
>>>>>> ---
>>>>>>
>>>>>> src/share/vm/runtime/mutexLocker.hpp
>>>>>>
>>>>>> Whitespace only changes - looks like leftovers from removed edits.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>>
>>>>>>
>>>>>>> Additional testing:
>>>>>>> jtreg ./jdk/test/java/lang/invoke
>>>>>>> jtreg jfr tests
>>>>>>>
>>>>>>> Here is a list of ALL of the "sometimes" locks:
>>>>>>>
>>>>>>> "WorkGroup monitor" share/vm/utilities/workgroup.cpp
>>>>>>> "SLTMonitor"
>>>>>>> share/vm/gc_implementation/shared/concurrentGCThread.cpp
>>>>>>> "CompactibleFreeListSpace._lock"
>>>>>>> share/vm/gc_implementation/concurrentMarkSweep/compactibleFreeListSpace.cpp
>>>>>>>
>>>>>>>
>>>>>>> "freelist par lock"
>>>>>>> share/vm/gc_implementation/concurrentMarkSweep/compactibleFreeListSpace.cpp
>>>>>>>
>>>>>>>
>>>>>>> "SR_lock" share/vm/runtime/thread.cpp
>>>>>>>
>>>>>>> The remaining "sometimes" locks can be found in
>>>>>>> share/vm/runtime/mutexLocker.cpp:
>>>>>>>
>>>>>>> ParGCRareEvent_lock
>>>>>>> Safepoint_lock
>>>>>>> Threads_lock
>>>>>>> VMOperationQueue_lock
>>>>>>> VMOperationRequest_lock
>>>>>>> Terminator_lock
>>>>>>> Heap_lock
>>>>>>> Compile_lock
>>>>>>> PeriodicTask_lock
>>>>>>> JfrStacktrace_lock
>>>>>>>
>>>>>>> I have not checked the validity of the "sometimes" locks, and I
>>>>>>> believe
>>>>>>> that this should be a different project.
>>>>>>>
>>>>>>> Thanks for your help!
>>>>>>> Max Ockner
>>>>>>> On 10/15/2014 8:54 PM, David Holmes wrote:
>>>>>>>> Hi Max,
>>>>>>>>
>>>>>>>> This is looking good.
>>>>>>>>
>>>>>>>> A few high-level initial comments:
>>>>>>>>
>>>>>>>> I think SafepointAllowed should be SafepointCheckNeeded
>>>>>>>>
>>>>>>>> Why are the checks in MutexLocker when the state is maintained
>>>>>>>> in the
>>>>>>>> mutex itself and the mutex/monitor has lock_without_safepoint, and
>>>>>>>> wait(<safepoint checking flag>) ? I would have expected to see the
>>>>>>>> check in the mutex/monitor methods.
>>>>>>>>
>>>>>>>> Checking consistent usage of the _no_safepoint_check_flag is
>>>>>>>> good. But
>>>>>>>> another part of this is that a monitor/mutex that never checks for
>>>>>>>> safepoints should never be held when a thread blocks at a
>>>>>>>> safepoint -
>>>>>>>> is there some way to easily check that? I was surprised how many
>>>>>>>> locks
>>>>>>>> are actually not checking for safepoints.
>>>>>>>>
>>>>>>>> Did you find any cases where the mutex/monitor was being used
>>>>>>>> inconsistently and incorrectly?
>>>>>>>>
>>>>>>>> Did you analyse the "sometimes" cases to see if they were safe?
>>>>>>>> (Aside: just for fun check out what happens if you lock the
>>>>>>>> Threads_lock with a safepoint check and a safepoint has been
>>>>>>>> requested
>>>>>>>> :) ).
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> David
>>>>>>>>
>>>>>>>> On 16/10/2014 4:04 AM, Max Ockner wrote:
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> I am a new member of the Hotspot runtime team in Burlington, MA.
>>>>>>>>> Please review my first fix related to safepoint checking.
>>>>>>>>>
>>>>>>>>> Summary: MutexLockerEx can either acquire a lock with or without a
>>>>>>>>> safepoint check.
>>>>>>>>> In some cases, a particular lock must either safepoint check
>>>>>>>>> always or
>>>>>>>>> never to avoid deadlocking.
>>>>>>>>> Some other locks have semantics which allow them to avoid
>>>>>>>>> deadlocks
>>>>>>>>> despite having a safepoint check only some of the time.
>>>>>>>>> All locks that are OK having inconsistent safepoint checks have
>>>>>>>>> been
>>>>>>>>> marked. All locks that should never safepoint check and all
>>>>>>>>> locks that
>>>>>>>>> should always safepoint check have also been marked.
>>>>>>>>> When a MutexLockerEx acquires a lock with or without a safepoint
>>>>>>>>> check,
>>>>>>>>> the lock's safepointAllowed marker is checked to ensure consistent
>>>>>>>>> safepoint checking.
>>>>>>>>>
>>>>>>>>> Webrev: http://oklahoma.us.oracle.com/~mockner/webrev/8047290/
>>>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8047290
>>>>>>>>>
>>>>>>>>> Tested with:
>>>>>>>>> jprt "-testset hotspot"
>>>>>>>>> jtreg hotspot
>>>>>>>>> vm.quick.testlist
>>>>>>>>>
>>>>>>>>> Whitebox tests:
>>>>>>>>> test/runtime/Safepoint/AssertSafepointCheckConsistency1.java: Test
>>>>>>>>> expects Assert ("This lock should always have a safepoint check")
>>>>>>>>> test/runtime/Safepoint/AssertSafepointCheckConsistency2.java: Test
>>>>>>>>> expects Assert ("This lock should never have a safepoint check")
>>>>>>>>> test/runtime/Safepoint/AssertSafepointCheckConsistency3.java: code
>>>>>>>>> should not assert. (Lock is properly acquired with no safepoint
>>>>>>>>> check)
>>>>>>>>> test/runtime/Safepoint/AssertSafepointCheckConsistency4.java: code
>>>>>>>>> should not assert. (Lock is properly acquired with safepoint
>>>>>>>>> check)
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Max
>>>>>>>>>
>>>>>>>
>>>>>
>>>
>>
>
More information about the hotspot-dev
mailing list