RFR (S) 5103339: Strengthen NoSafepointVerifier

Wed Aug 14 22:31:00 UTC 2019

Thanks for explaining things for me Coleen. FTR I had missed the fact 
the check was already implicitly present in the MutexLocker path.

David

On 15/08/2019 12:35 am, coleen.phillimore at oracle.com wrote:
> 
> 
> On 8/14/19 8:52 AM, David Holmes wrote:
>> Hi Coleen,
>>
>> On 14/08/2019 10:47 am, coleen.phillimore at oracle.com wrote:
>>>
>>>
>>> On 8/7/19 7:18 PM, David Holmes wrote:
>>>> Hi Coleen,
>>>>
>>>> One follow-up:
>>>>
>>>> >> That said, are we certain that the xxx_or_null functions cannot
>>>> >> safepoint? They won't try to load a class but surely they could
>>>> >> potentially safepoint for other reasons?
>>>> >
>>>> > I don't see a safepoint in this code, or have found one while running
>>>> > the tests.  Hopefully the NSV would have found it!
>>>>
>>>> My question is really whether these functions are semantically 
>>>> guaranteed to never safepoint? 
>>>
>>> No, these functions _do_ safepoint taking out a lock.
>>
>> I think we're getting our wires crossed somewhere. The code you have 
>> is this:
>>
>>     if (or_null) {
>>       return ak->array_klass_or_null(n);
>>     }
>> +   THREAD->check_possible_safepoint();
>>     return ak->array_klass(n, THREAD);
>>
>> so we expect the array_klass() call to safepoint and hence we check 
>> that is allowed before taking that path. We don't check the path that 
>> calls array_klass_or_null() which suggests we don't expect it to take 
>> a safepoint. My point is that AFAIK there is nothing that guarantees 
>> array_klass_or_null() will not perform a safepoint check, so anyone 
>> assuming it doesn't may be mistaken. Which suggests we should be 
>> calling check_possible_safepoint() unconditionally so that both paths 
>> are covered.
> 
> Hi,  I think there's some confusion here.  The function 
> array_klass_or_null() calls array_klass_impl(or_null == true), and that 
> function's callers expect it not to safepoint.   I have another patch 
> that I can elaborate in more details.
> 
> The array_klass_impl(or_null == false) which is called by array_klass() 
> can safepoint when taking out a mutex in the 'if' condition above:
> 
>   329   // lock-free read needs acquire semantics
>   330   if (higher_dimension_acquire() == NULL) {
>   331     if (or_null) return NULL;
>   332
>   333     ResourceMark rm;
>   334     JavaThread *jt = (JavaThread *)THREAD;
>   335     {
>   336       // Ensure atomic creation of higher dimensions
>   337       MutexLocker mu(MultiArray_lock, THREAD);
> 
> 
> 
> So this function will safepoint if higher_dimension is NULL and or_null 
> is false.   We want to have this unconditionally check for safepoints 
> when the or_null is false, but higher_dimension is not null, essentially 
> unconditionally when array_klass() is called. The MutexLocker checks 
> possible safepoint inside, so we needed this in the 'else' clause.   Or 
> in the location below.
> 
> This is somewhat tortured logic because this function is to be called in 
> different contexts.
>>
>>>> Is that a property that these functions expose to their callers? I 
>>>> don't recall seeing anything to that affect, so it seems to me this 
>>>> is just how the implementation happens to be, but is not guaranteed 
>>>> to be so. If there was a NSV in the call path leading to this then I 
>>>> would have to ask the person who put in the NSV on what basis they 
>>>> think this mass of code should not ever hit a safepoint.
>>> There isn't a property of these or any functions that guarantee they 
>>> won't safepoint.  Some people have claimed having a TRAPS argumnent 
>>> is a good indicator, but many functions without TRAPS may take out a 
>>> MutexLocker which can safepoint.  I think it's safe to assume most 
>>> functions safepoint and taking out a NSV must be careful and call 
>>> into code that is contained.  The change is that NSV should assert if 
>>> that condition is not met.
>>>
>>> The fact that this code conditionally safepoints makes it less 
>>> obvious, but it's still worth checking possible safepoints here.
>>>
>>> In the or_null case, there actually isn't a caller with a NSV. In a 
>>> future patch, I made calling this code with a MutexLocker with 
>>> no_safepoint_check_flag and allow_vm_block = true imply a NSV.  The 
>>> caller must use the array_klass_or_null() function or it will safepoint.
>>
>>> I made the changes you suggested.
>>
>> And changes others suggested :)
>>
>>> Incremental: 
>>> http://cr.openjdk.java.net/~coleenp/2019/5103339.02.incr/webrev
>>> Full: http://cr.openjdk.java.net/~coleenp/2019/5103339.02/webrev
>>
>> This all seems fine to me. With regards to the discussion above, what 
>> you have done is replace the old call to clear_unhandled_oops with the 
>> new call to check_possible_safepoint under the same conditions. I'm 
>> querying whether those original conditions were actually too strict, 
>> but that is in a sense outside the scope of this change.
> 
> The check unhandled oops case might have been too strict but was 
> probably benign because the caller in the or_null == true case didn't 
> have any unhandled oops that were subsequently dereferenced.   Ie 
> unhandled oops shouldn't have been checked there but we didn't know it.
> 
> Thanks for the review and discussions.
> Coleen
> 
>>
>> Thanks,
>> David
>>
>>> Thanks,
>>> Coleen
>>>
>>>>
>>>> Cheers,
>>>> David
>>>>
>>>> On 8/08/2019 1:49 am, coleen.phillimore at oracle.com wrote:
>>>>>
>>>>> Thanks David,
>>>>>
>>>>> On 8/6/19 10:36 PM, David Holmes wrote:
>>>>>> Hi Coleen,
>>>>>>
>>>>>> On 7/08/2019 8:43 am, coleen.phillimore at oracle.com wrote:
>>>>>>>
>>>>>>> Thanks David for reading this.
>>>>>>>
>>>>>>> On 8/5/19 9:47 PM, David Holmes wrote:
>>>>>>>> Hi Coleen,
>>>>>>>>
>>>>>>>> Could please update the bug explaining why it was re-opened and 
>>>>>>>> what the intended enhancement is. It was closed in 2005 as not 
>>>>>>>> necessary because of the unhandled-oop checking.
>>>>>>>
>>>>>>> Yes, I added a comment.
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>>>>
>>>>>>>> On 6/08/2019 2:00 am, coleen.phillimore at oracle.com wrote:
>>>>>>>>> Summary: Add NSV check at possible safepoint transition or 
>>>>>>>>> places that could take out locks. Consolidate with clearing 
>>>>>>>>> unhandled oops.
>>>>>>>>>
>>>>>>>>> This change checks for NoSafepointVerifier no_safepoint_counts 
>>>>>>>>> at possible safepoints.  The starting set is at transitions, 
>>>>>>>>> and in the 
>>>>>>>>
>>>>>>>> Wouldn't it be better placed in the actual Safepoint checking 
>>>>>>>> methods rather than callers of those methods?
>>>>>>>
>>>>>>> See my reply to Robbin.  We want to make the checks in places 
>>>>>>> that conditionally may not safepoint poll, so that timing doesn't 
>>>>>>> prevent finding when we've violated the NoSafepointVerifier check.
>>>>>>
>>>>>> Okay.
>>>>>>
>>>>>>>>
>>>>>>>>> "else" clauses where CHECK_UNHANDLED_OOPS were clearing 
>>>>>>>>> unhandled oops. Some of these were removed because they weren't 
>>>>>>>>> places with possible safepoints, so were wrong.
>>>>>>>>>
>>>>>>>>> The unhandled oops clearing and no_safepoint counter check are 
>>>>>>>>> now done in the same function. MemAllocator -> 
>>>>>>>>> check_for_valid_allocation_state calls 
>>>>>>>>> check_for_valid_safepoint_state which calls 
>>>>>>>>> check_possible_safepoint.
>>>>>>>>>
>>>>>>>>> Calls to check_possible_safepoint are in DEBUG_ONLY when 
>>>>>>>>> Thread::current() is called.
>>>>>>>>>
>>>>>>>>> I had to remove it from the else clause in JvmtiThreadState 
>>>>>>>>> because it's called from a place that cannot safepoint (see 
>>>>>>>>> vtableStubs.cpp).
>>>>>>>>
>>>>>>>> Can you elaborate on that further please. It's not obvious, 
>>>>>>>> without following call chains, when any code may or may not hit 
>>>>>>>> a safepoint check.
>>>>>>>
>>>>>>> No, it isn't.   There's quite a large call stack here, but the 
>>>>>>> CompiledICLocker has an embedded NoSafepointVerifier, so here's 
>>>>>>> the call stack:
>>>>>>>
>>>>>>> V  [libjvm.so+0x16c5d05] Thread::check_possible_safepoint()+0x55
>>>>>>> V  [libjvm.so+0x10fb15c] 
>>>>>>> JvmtiExport::post_dynamic_code_generated_while_holding_locks(char 
>>>>>>> const*, unsigned char*, unsigned char*)+0x4c
>>>>>>> V  [libjvm.so+0x17c8fac]  VtableStubs::find_stub(bool, int)+0x2ac
>>>>>>> V  [libjvm.so+0x9dd18e] CompiledIC::set_to_megamorphic(CallInfo*, 
>>>>>>> Bytecodes::Code, bool&, Thread*)+0x9e
>>>>>>> V  [libjvm.so+0x1562bb4] 
>>>>>>> SharedRuntime::handle_ic_miss_helper_internal(Handle, 
>>>>>>> CompiledMethod*, frame const&, methodHandle, Bytecodes::Code, 
>>>>>>> CallInfo&, bool&, Thread*)+0x434
>>>>>>> V  [libjvm.so+0x156d202] 
>>>>>>> SharedRuntime::handle_ic_miss_helper(JavaThread*, Thread*)+0x372
>>>>>>> V  [libjvm.so+0x156d7a4] 
>>>>>>> SharedRuntime::handle_wrong_method_ic_miss(JavaThread*)+0x184
>>>>>>>
>>>>>>> The function handle_ic_miss_helper_internal has a 
>>>>>>> CompiledICLocker, which has an embedded NoSafepointVerifier.
>>>>>>>
>>>>>>> This seems like a lot of code to be protected with a 
>>>>>>> NoSafepointVerifier, but I assume that at least the find_stub 
>>>>>>> code shouldn't get cleaned up by a safepoint, so would require it.
>>>>>>
>>>>>> Okay so because you added the call to jvmti_thread_state() in the 
>>>>>> vtableStubs code you have to remove the check from the else-clause 
>>>>>> in jvmti_thread_state(), because it could fail with the new call 
>>>>>> path.
>>>>>
>>>>> Yes, but I'm not happy with this so I've reinstated the check in 
>>>>> the 'else' clause and changed 
>>>>> post_dynamic_code_generated_while_holding_locks() to simply get the 
>>>>> jvmti_state directly and not call JvmtiThreadState::state_for().  
>>>>> It already had an assert that it wasn't null.
>>>>>
>>>>> I'm retesting this and will have an 02 plus incremental next week.
>>>>>
>>>>>>> I also ran tests with an assert that the jvmti_thread_state() != 
>>>>>>> NULL, just to see if it's ever NULL and it wasn't.  I would have 
>>>>>>> liked to keep the else.
>>>>>>>>
>>>>>>>>> os.cpp ResourceMark needed for debugging.
>>>>>>>>
>>>>>>>> Interesting catch - I would have expected the RM to be higher up 
>>>>>>>> the call if needed. How did you detect this?
>>>>>>>
>>>>>>> Debugging, but I don't remember which thing I was debugging though.
>>>>>>>>
>>>>>>>>> Tested with tier1 on all Oracle platforms, and tier 1-3 on 
>>>>>>>>> linux-x64-debug.
>>>>>>>>>
>>>>>>>>> open webrev at 
>>>>>>>>> http://cr.openjdk.java.net/~coleenp/2019/5103339.01/webrev
>>>>>>>>> bug link https://bugs.openjdk.java.net/browse/JDK-5103339
>>>>>>>>
>>>>>>>> src/hotspot/share/oops/objArrayKlass.cpp
>>>>>>>> src/hotspot/share/oops/typeArrayKlass.cpp
>>>>>>>>
>>>>>>>> Unclear why the check is predicated on !or_null ??
>>>>>>>
>>>>>>> If the parameter passed or_null is true, then the function never 
>>>>>>> safepoints. I added this comment.
>>>>>>>
>>>>>>>      // If passed or_null, this function will not try to safepoint.
>>>>>>>      if (!or_null) THREAD->check_possible_safepoint();
>>>>>>
>>>>>> Sorry I'm confused by the placement. We want to call 
>>>>>> check_possible_safepoint() on code paths that can safepoint, so 
>>>>>> that we ensure there is no NSV higher up the call chain when we 
>>>>>> take that path. So notwithstanding where the check_unhandled_oops 
>>>>>> was, instead of
>>>>>>
>>>>>>  352   } else {
>>>>>>  353     if (!or_null) THREAD->check_possible_safepoint();
>>>>>>  354   }
>>>>>>  355
>>>>>>  356   ObjArrayKlass *ak = ObjArrayKlass::cast(higher_dimension());
>>>>>>  357   if (or_null) {
>>>>>>  358     return ak->array_klass_or_null(n);
>>>>>>  359   }
>>>>>>  360   return ak->array_klass(n, THREAD);
>>>>>>
>>>>>> shouldn't/couldn't we just have:
>>>>>>
>>>>>>  352   }
>>>>>>  353
>>>>>>  354
>>>>>>  355
>>>>>>  356   ObjArrayKlass *ak = ObjArrayKlass::cast(higher_dimension());
>>>>>>  357   if (or_null) {
>>>>>>  358     return ak->array_klass_or_null(n);
>>>>>>  359   }
>>>>>>        THREAD->check_possible_safepoint();
>>>>>>  360   return ak->array_klass(n, THREAD);
>>>>>>
>>>>>> ?
>>>>>
>>>>> I couldn't decide if this is better or worse than what I had 
>>>>> because looking at this, one might question why this call is here, 
>>>>> and it checks the safepoint twice (already in the MutexLocker 
>>>>> path).  It's a few less lines of code.  okay, sure I can change 
>>>>> it.  I didn't find a better refactoring.
>>>>>
>>>>>> That said, are we certain that the xxx_or_null functions cannot 
>>>>>> safepoint? They won't try to load a class but surely they could 
>>>>>> potentially safepoint for other reasons?
>>>>>
>>>>> I don't see a safepoint in this code, or have found one while 
>>>>> running the tests.  Hopefully the NSV would have found it!
>>>>>
>>>>> Coleen
>>>>>
>>>>>>
>>>>>>> This is also called with NSV with or_null true.   I can't 
>>>>>>> remember where, I think it was the heap walker.
>>>>>>>
>>>>>>>>
>>>>>>>> ---
>>>>>>>>
>>>>>>>> src/hotspot/share/code/vtableStubs.cpp
>>>>>>>>
>>>>>>>> // cause a safepoint in this code that has NSV.
>>>>>>>>
>>>>>>>> Unclear what "code" has the NSV ??
>>>>>>>
>>>>>>> See above.  I can change the comment to:
>>>>>>>
>>>>>>> 237 // all locks. Only post this event if a new state is not 
>>>>>>> required. Creating a new state would
>>>>>>> 238 // cause a safepoint and the caller of this code has a 
>>>>>>> NoSafepointVerifier.
>>>>>>
>>>>>> Sounds good!
>>>>>>
>>>>>>>>
>>>>>>>> ---
>>>>>>>>
>>>>>>>> src/hotspot/share/runtime/interfaceSupport.inline.hpp
>>>>>>>>
>>>>>>>> As mentioned above why not put the check inside 
>>>>>>>> SafepointMechanism::block_if_requested instead?
>>>>>>>
>>>>>>> The other callers of block_if_requested already have a 
>>>>>>> check_possible_safepoint call. The only caller that made sense to 
>>>>>>> add was transition from or to vm.
>>>>>>
>>>>>> Got it.
>>>>>>
>>>>>>>>
>>>>>>>> ---
>>>>>>>>
>>>>>>>> src/hotspot/share/runtime/mutex.cpp
>>>>>>>>
>>>>>>>> The Mutex constructor cleanup is incidental but okay. (Does make 
>>>>>>>> me wonder about the whole Monitor construction process though 
>>>>>>>> ... ClearMonitor doesn't need to be protected any more.)
>>>>>>>
>>>>>>> This leaked into this change.  I'll take it out and post a 
>>>>>>> different RFR for it.  The no-arg constructor Monitor isn't 
>>>>>>> needed either so ClearMonitor can be deleted.
>>>>>>
>>>>>> Okay.
>>>>>>>>
>>>>>>>> ---
>>>>>>>>
>>>>>>>> src/hotspot/share/runtime/thread.hpp
>>>>>>>>
>>>>>>>> // NSV checking
>>>>>>>> void check_for_valid_safepoint_state(bool 
>>>>>>>> potential_vm_operation) NOT_DEBUG_RETURN;
>>>>>>>> void check_possible_safepoint() NOT_DEBUG_RETURN;
>>>>>>>>
>>>>>>>> Please expand NSV. Does the comment apply to both methods or 
>>>>>>>> only the first?
>>>>>>>
>>>>>>> Both methods.  I'll change it to:
>>>>>>>
>>>>>>>    // These functions check conditions on a JavaThread before 
>>>>>>> possibly going to a safepoint,
>>>>>>>    // including NoSafepointVerifier.
>>>>>>>    void check_for_valid_safepoint_state(bool 
>>>>>>> potential_vm_operation) NOT_DEBUG_RETURN;
>>>>>>>    void check_possible_safepoint() NOT_DEBUG_RETURN;
>>>>>>
>>>>>> Okay on the new comment.
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>> -----
>>>>>>
>>>>>>>
>>>>>>> Thanks!
>>>>>>> Coleen
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> David
>>>>>>>> -----
>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Coleen
>>>>>>>
>>>>>
>>>
>