RFR (S) 5103339: Strengthen NoSafepointVerifier

Wed Aug 14 12:52:46 UTC 2019

Hi Coleen,

On 14/08/2019 10:47 am, coleen.phillimore at oracle.com wrote:
> 
> 
> On 8/7/19 7:18 PM, David Holmes wrote:
>> Hi Coleen,
>>
>> One follow-up:
>>
>> >> That said, are we certain that the xxx_or_null functions cannot
>> >> safepoint? They won't try to load a class but surely they could
>> >> potentially safepoint for other reasons?
>> >
>> > I don't see a safepoint in this code, or have found one while running
>> > the tests.  Hopefully the NSV would have found it!
>>
>> My question is really whether these functions are semantically 
>> guaranteed to never safepoint? 
> 
> No, these functions _do_ safepoint taking out a lock.

I think we're getting our wires crossed somewhere. The code you have is 
this:

     if (or_null) {
       return ak->array_klass_or_null(n);
     }
+   THREAD->check_possible_safepoint();
     return ak->array_klass(n, THREAD);

so we expect the array_klass() call to safepoint and hence we check that 
is allowed before taking that path. We don't check the path that calls 
array_klass_or_null() which suggests we don't expect it to take a 
safepoint. My point is that AFAIK there is nothing that guarantees 
array_klass_or_null() will not perform a safepoint check, so anyone 
assuming it doesn't may be mistaken. Which suggests we should be calling 
check_possible_safepoint() unconditionally so that both paths are covered.

>> Is that a property that these functions expose to their callers? I 
>> don't recall seeing anything to that affect, so it seems to me this is 
>> just how the implementation happens to be, but is not guaranteed to be 
>> so. If there was a NSV in the call path leading to this then I would 
>> have to ask the person who put in the NSV on what basis they think 
>> this mass of code should not ever hit a safepoint.
> There isn't a property of these or any functions that guarantee they 
> won't safepoint.  Some people have claimed having a TRAPS argumnent is a 
> good indicator, but many functions without TRAPS may take out a 
> MutexLocker which can safepoint.  I think it's safe to assume most 
> functions safepoint and taking out a NSV must be careful and call into 
> code that is contained.  The change is that NSV should assert if that 
> condition is not met.
> 
> The fact that this code conditionally safepoints makes it less obvious, 
> but it's still worth checking possible safepoints here.
> 
> In the or_null case, there actually isn't a caller with a NSV.   In a 
> future patch, I made calling this code with a MutexLocker with 
> no_safepoint_check_flag and allow_vm_block = true imply a NSV.  The 
> caller must use the array_klass_or_null() function or it will safepoint.

> I made the changes you suggested.

And changes others suggested :)

> Incremental: 
> http://cr.openjdk.java.net/~coleenp/2019/5103339.02.incr/webrev
> Full:  http://cr.openjdk.java.net/~coleenp/2019/5103339.02/webrev

This all seems fine to me. With regards to the discussion above, what 
you have done is replace the old call to clear_unhandled_oops with the 
new call to check_possible_safepoint under the same conditions. I'm 
querying whether those original conditions were actually too strict, but 
that is in a sense outside the scope of this change.

Thanks,
David

> Thanks,
> Coleen
> 
>>
>> Cheers,
>> David
>>
>> On 8/08/2019 1:49 am, coleen.phillimore at oracle.com wrote:
>>>
>>> Thanks David,
>>>
>>> On 8/6/19 10:36 PM, David Holmes wrote:
>>>> Hi Coleen,
>>>>
>>>> On 7/08/2019 8:43 am, coleen.phillimore at oracle.com wrote:
>>>>>
>>>>> Thanks David for reading this.
>>>>>
>>>>> On 8/5/19 9:47 PM, David Holmes wrote:
>>>>>> Hi Coleen,
>>>>>>
>>>>>> Could please update the bug explaining why it was re-opened and 
>>>>>> what the intended enhancement is. It was closed in 2005 as not 
>>>>>> necessary because of the unhandled-oop checking.
>>>>>
>>>>> Yes, I added a comment.
>>>>
>>>> Thanks!
>>>>
>>>>>>
>>>>>> On 6/08/2019 2:00 am, coleen.phillimore at oracle.com wrote:
>>>>>>> Summary: Add NSV check at possible safepoint transition or places 
>>>>>>> that could take out locks. Consolidate with clearing unhandled oops.
>>>>>>>
>>>>>>> This change checks for NoSafepointVerifier no_safepoint_counts at 
>>>>>>> possible safepoints.  The starting set is at transitions, and in the 
>>>>>>
>>>>>> Wouldn't it be better placed in the actual Safepoint checking 
>>>>>> methods rather than callers of those methods?
>>>>>
>>>>> See my reply to Robbin.  We want to make the checks in places that 
>>>>> conditionally may not safepoint poll, so that timing doesn't 
>>>>> prevent finding when we've violated the NoSafepointVerifier check.
>>>>
>>>> Okay.
>>>>
>>>>>>
>>>>>>> "else" clauses where CHECK_UNHANDLED_OOPS were clearing unhandled 
>>>>>>> oops. Some of these were removed because they weren't places with 
>>>>>>> possible safepoints, so were wrong.
>>>>>>>
>>>>>>> The unhandled oops clearing and no_safepoint counter check are 
>>>>>>> now done in the same function.  MemAllocator -> 
>>>>>>> check_for_valid_allocation_state calls 
>>>>>>> check_for_valid_safepoint_state which calls 
>>>>>>> check_possible_safepoint.
>>>>>>>
>>>>>>> Calls to check_possible_safepoint are in DEBUG_ONLY when 
>>>>>>> Thread::current() is called.
>>>>>>>
>>>>>>> I had to remove it from the else clause in JvmtiThreadState 
>>>>>>> because it's called from a place that cannot safepoint (see 
>>>>>>> vtableStubs.cpp).
>>>>>>
>>>>>> Can you elaborate on that further please. It's not obvious, 
>>>>>> without following call chains, when any code may or may not hit a 
>>>>>> safepoint check.
>>>>>
>>>>> No, it isn't.   There's quite a large call stack here, but the 
>>>>> CompiledICLocker has an embedded NoSafepointVerifier, so here's the 
>>>>> call stack:
>>>>>
>>>>> V  [libjvm.so+0x16c5d05] Thread::check_possible_safepoint()+0x55
>>>>> V  [libjvm.so+0x10fb15c] 
>>>>> JvmtiExport::post_dynamic_code_generated_while_holding_locks(char 
>>>>> const*, unsigned char*, unsigned char*)+0x4c
>>>>> V  [libjvm.so+0x17c8fac]  VtableStubs::find_stub(bool, int)+0x2ac
>>>>> V  [libjvm.so+0x9dd18e] CompiledIC::set_to_megamorphic(CallInfo*, 
>>>>> Bytecodes::Code, bool&, Thread*)+0x9e
>>>>> V  [libjvm.so+0x1562bb4] 
>>>>> SharedRuntime::handle_ic_miss_helper_internal(Handle, 
>>>>> CompiledMethod*, frame const&, methodHandle, Bytecodes::Code, 
>>>>> CallInfo&, bool&, Thread*)+0x434
>>>>> V  [libjvm.so+0x156d202] 
>>>>> SharedRuntime::handle_ic_miss_helper(JavaThread*, Thread*)+0x372
>>>>> V  [libjvm.so+0x156d7a4] 
>>>>> SharedRuntime::handle_wrong_method_ic_miss(JavaThread*)+0x184
>>>>>
>>>>> The function handle_ic_miss_helper_internal has a CompiledICLocker, 
>>>>> which has an embedded NoSafepointVerifier.
>>>>>
>>>>> This seems like a lot of code to be protected with a 
>>>>> NoSafepointVerifier, but I assume that at least the find_stub code 
>>>>> shouldn't get cleaned up by a safepoint, so would require it.
>>>>
>>>> Okay so because you added the call to jvmti_thread_state() in the 
>>>> vtableStubs code you have to remove the check from the else-clause 
>>>> in jvmti_thread_state(), because it could fail with the new call path.
>>>
>>> Yes, but I'm not happy with this so I've reinstated the check in the 
>>> 'else' clause and changed 
>>> post_dynamic_code_generated_while_holding_locks() to simply get the 
>>> jvmti_state directly and not call JvmtiThreadState::state_for().  It 
>>> already had an assert that it wasn't null.
>>>
>>> I'm retesting this and will have an 02 plus incremental next week.
>>>
>>>>> I also ran tests with an assert that the jvmti_thread_state() != 
>>>>> NULL, just to see if it's ever NULL and it wasn't.  I would have 
>>>>> liked to keep the else.
>>>>>>
>>>>>>> os.cpp ResourceMark needed for debugging.
>>>>>>
>>>>>> Interesting catch - I would have expected the RM to be higher up 
>>>>>> the call if needed. How did you detect this?
>>>>>
>>>>> Debugging, but I don't remember which thing I was debugging though.
>>>>>>
>>>>>>> Tested with tier1 on all Oracle platforms, and tier 1-3 on 
>>>>>>> linux-x64-debug.
>>>>>>>
>>>>>>> open webrev at 
>>>>>>> http://cr.openjdk.java.net/~coleenp/2019/5103339.01/webrev
>>>>>>> bug link https://bugs.openjdk.java.net/browse/JDK-5103339
>>>>>>
>>>>>> src/hotspot/share/oops/objArrayKlass.cpp
>>>>>> src/hotspot/share/oops/typeArrayKlass.cpp
>>>>>>
>>>>>> Unclear why the check is predicated on !or_null ??
>>>>>
>>>>> If the parameter passed or_null is true, then the function never 
>>>>> safepoints. I added this comment.
>>>>>
>>>>>      // If passed or_null, this function will not try to safepoint.
>>>>>      if (!or_null) THREAD->check_possible_safepoint();
>>>>
>>>> Sorry I'm confused by the placement. We want to call 
>>>> check_possible_safepoint() on code paths that can safepoint, so that 
>>>> we ensure there is no NSV higher up the call chain when we take that 
>>>> path. So notwithstanding where the check_unhandled_oops was, instead of
>>>>
>>>>  352   } else {
>>>>  353     if (!or_null) THREAD->check_possible_safepoint();
>>>>  354   }
>>>>  355
>>>>  356   ObjArrayKlass *ak = ObjArrayKlass::cast(higher_dimension());
>>>>  357   if (or_null) {
>>>>  358     return ak->array_klass_or_null(n);
>>>>  359   }
>>>>  360   return ak->array_klass(n, THREAD);
>>>>
>>>> shouldn't/couldn't we just have:
>>>>
>>>>  352   }
>>>>  353
>>>>  354
>>>>  355
>>>>  356   ObjArrayKlass *ak = ObjArrayKlass::cast(higher_dimension());
>>>>  357   if (or_null) {
>>>>  358     return ak->array_klass_or_null(n);
>>>>  359   }
>>>>        THREAD->check_possible_safepoint();
>>>>  360   return ak->array_klass(n, THREAD);
>>>>
>>>> ?
>>>
>>> I couldn't decide if this is better or worse than what I had because 
>>> looking at this, one might question why this call is here, and it 
>>> checks the safepoint twice (already in the MutexLocker path).  It's a 
>>> few less lines of code.  okay, sure I can change it.  I didn't find a 
>>> better refactoring.
>>>
>>>> That said, are we certain that the xxx_or_null functions cannot 
>>>> safepoint? They won't try to load a class but surely they could 
>>>> potentially safepoint for other reasons?
>>>
>>> I don't see a safepoint in this code, or have found one while running 
>>> the tests.  Hopefully the NSV would have found it!
>>>
>>> Coleen
>>>
>>>>
>>>>> This is also called with NSV with or_null true.   I can't remember 
>>>>> where, I think it was the heap walker.
>>>>>
>>>>>>
>>>>>> ---
>>>>>>
>>>>>> src/hotspot/share/code/vtableStubs.cpp
>>>>>>
>>>>>> // cause a safepoint in this code that has NSV.
>>>>>>
>>>>>> Unclear what "code" has the NSV ??
>>>>>
>>>>> See above.  I can change the comment to:
>>>>>
>>>>> 237 // all locks. Only post this event if a new state is not 
>>>>> required. Creating a new state would
>>>>> 238 // cause a safepoint and the caller of this code has a 
>>>>> NoSafepointVerifier.
>>>>
>>>> Sounds good!
>>>>
>>>>>>
>>>>>> ---
>>>>>>
>>>>>> src/hotspot/share/runtime/interfaceSupport.inline.hpp
>>>>>>
>>>>>> As mentioned above why not put the check inside 
>>>>>> SafepointMechanism::block_if_requested instead?
>>>>>
>>>>> The other callers of block_if_requested already have a 
>>>>> check_possible_safepoint call. The only caller that made sense to 
>>>>> add was transition from or to vm.
>>>>
>>>> Got it.
>>>>
>>>>>>
>>>>>> ---
>>>>>>
>>>>>> src/hotspot/share/runtime/mutex.cpp
>>>>>>
>>>>>> The Mutex constructor cleanup is incidental but okay. (Does make 
>>>>>> me wonder about the whole Monitor construction process though ... 
>>>>>> ClearMonitor doesn't need to be protected any more.)
>>>>>
>>>>> This leaked into this change.  I'll take it out and post a 
>>>>> different RFR for it.  The no-arg constructor Monitor isn't needed 
>>>>> either so ClearMonitor can be deleted.
>>>>
>>>> Okay.
>>>>>>
>>>>>> ---
>>>>>>
>>>>>> src/hotspot/share/runtime/thread.hpp
>>>>>>
>>>>>> // NSV checking
>>>>>> void check_for_valid_safepoint_state(bool potential_vm_operation) 
>>>>>> NOT_DEBUG_RETURN;
>>>>>> void check_possible_safepoint() NOT_DEBUG_RETURN;
>>>>>>
>>>>>> Please expand NSV. Does the comment apply to both methods or only 
>>>>>> the first?
>>>>>
>>>>> Both methods.  I'll change it to:
>>>>>
>>>>>    // These functions check conditions on a JavaThread before 
>>>>> possibly going to a safepoint,
>>>>>    // including NoSafepointVerifier.
>>>>>    void check_for_valid_safepoint_state(bool 
>>>>> potential_vm_operation) NOT_DEBUG_RETURN;
>>>>>    void check_possible_safepoint() NOT_DEBUG_RETURN;
>>>>
>>>> Okay on the new comment.
>>>>
>>>> Thanks,
>>>> David
>>>> -----
>>>>
>>>>>
>>>>> Thanks!
>>>>> Coleen
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>> -----
>>>>>>
>>>>>>> Thanks,
>>>>>>> Coleen
>>>>>
>>>
>