RFR (S) 5103339: Strengthen NoSafepointVerifier
David Holmes
david.holmes at oracle.com
Wed Aug 14 22:31:00 UTC 2019
Thanks for explaining things for me Coleen. FTR I had missed the fact
the check was already implicitly present in the MutexLocker path.
David
On 15/08/2019 12:35 am, coleen.phillimore at oracle.com wrote:
>
>
> On 8/14/19 8:52 AM, David Holmes wrote:
>> Hi Coleen,
>>
>> On 14/08/2019 10:47 am, coleen.phillimore at oracle.com wrote:
>>>
>>>
>>> On 8/7/19 7:18 PM, David Holmes wrote:
>>>> Hi Coleen,
>>>>
>>>> One follow-up:
>>>>
>>>> >> That said, are we certain that the xxx_or_null functions cannot
>>>> >> safepoint? They won't try to load a class but surely they could
>>>> >> potentially safepoint for other reasons?
>>>> >
>>>> > I don't see a safepoint in this code, or have found one while running
>>>> > the tests. Hopefully the NSV would have found it!
>>>>
>>>> My question is really whether these functions are semantically
>>>> guaranteed to never safepoint?
>>>
>>> No, these functions _do_ safepoint taking out a lock.
>>
>> I think we're getting our wires crossed somewhere. The code you have
>> is this:
>>
>> if (or_null) {
>> return ak->array_klass_or_null(n);
>> }
>> + THREAD->check_possible_safepoint();
>> return ak->array_klass(n, THREAD);
>>
>> so we expect the array_klass() call to safepoint and hence we check
>> that is allowed before taking that path. We don't check the path that
>> calls array_klass_or_null() which suggests we don't expect it to take
>> a safepoint. My point is that AFAIK there is nothing that guarantees
>> array_klass_or_null() will not perform a safepoint check, so anyone
>> assuming it doesn't may be mistaken. Which suggests we should be
>> calling check_possible_safepoint() unconditionally so that both paths
>> are covered.
>
> Hi, I think there's some confusion here. The function
> array_klass_or_null() calls array_klass_impl(or_null == true), and that
> function's callers expect it not to safepoint. I have another patch
> that I can elaborate in more details.
>
> The array_klass_impl(or_null == false) which is called by array_klass()
> can safepoint when taking out a mutex in the 'if' condition above:
>
> 329 // lock-free read needs acquire semantics
> 330 if (higher_dimension_acquire() == NULL) {
> 331 if (or_null) return NULL;
> 332
> 333 ResourceMark rm;
> 334 JavaThread *jt = (JavaThread *)THREAD;
> 335 {
> 336 // Ensure atomic creation of higher dimensions
> 337 MutexLocker mu(MultiArray_lock, THREAD);
>
>
>
> So this function will safepoint if higher_dimension is NULL and or_null
> is false. We want to have this unconditionally check for safepoints
> when the or_null is false, but higher_dimension is not null, essentially
> unconditionally when array_klass() is called. The MutexLocker checks
> possible safepoint inside, so we needed this in the 'else' clause. Or
> in the location below.
>
> This is somewhat tortured logic because this function is to be called in
> different contexts.
>>
>>>> Is that a property that these functions expose to their callers? I
>>>> don't recall seeing anything to that affect, so it seems to me this
>>>> is just how the implementation happens to be, but is not guaranteed
>>>> to be so. If there was a NSV in the call path leading to this then I
>>>> would have to ask the person who put in the NSV on what basis they
>>>> think this mass of code should not ever hit a safepoint.
>>> There isn't a property of these or any functions that guarantee they
>>> won't safepoint. Some people have claimed having a TRAPS argumnent
>>> is a good indicator, but many functions without TRAPS may take out a
>>> MutexLocker which can safepoint. I think it's safe to assume most
>>> functions safepoint and taking out a NSV must be careful and call
>>> into code that is contained. The change is that NSV should assert if
>>> that condition is not met.
>>>
>>> The fact that this code conditionally safepoints makes it less
>>> obvious, but it's still worth checking possible safepoints here.
>>>
>>> In the or_null case, there actually isn't a caller with a NSV. In a
>>> future patch, I made calling this code with a MutexLocker with
>>> no_safepoint_check_flag and allow_vm_block = true imply a NSV. The
>>> caller must use the array_klass_or_null() function or it will safepoint.
>>
>>> I made the changes you suggested.
>>
>> And changes others suggested :)
>>
>>> Incremental:
>>> http://cr.openjdk.java.net/~coleenp/2019/5103339.02.incr/webrev
>>> Full: http://cr.openjdk.java.net/~coleenp/2019/5103339.02/webrev
>>
>> This all seems fine to me. With regards to the discussion above, what
>> you have done is replace the old call to clear_unhandled_oops with the
>> new call to check_possible_safepoint under the same conditions. I'm
>> querying whether those original conditions were actually too strict,
>> but that is in a sense outside the scope of this change.
>
> The check unhandled oops case might have been too strict but was
> probably benign because the caller in the or_null == true case didn't
> have any unhandled oops that were subsequently dereferenced. Ie
> unhandled oops shouldn't have been checked there but we didn't know it.
>
> Thanks for the review and discussions.
> Coleen
>
>>
>> Thanks,
>> David
>>
>>> Thanks,
>>> Coleen
>>>
>>>>
>>>> Cheers,
>>>> David
>>>>
>>>> On 8/08/2019 1:49 am, coleen.phillimore at oracle.com wrote:
>>>>>
>>>>> Thanks David,
>>>>>
>>>>> On 8/6/19 10:36 PM, David Holmes wrote:
>>>>>> Hi Coleen,
>>>>>>
>>>>>> On 7/08/2019 8:43 am, coleen.phillimore at oracle.com wrote:
>>>>>>>
>>>>>>> Thanks David for reading this.
>>>>>>>
>>>>>>> On 8/5/19 9:47 PM, David Holmes wrote:
>>>>>>>> Hi Coleen,
>>>>>>>>
>>>>>>>> Could please update the bug explaining why it was re-opened and
>>>>>>>> what the intended enhancement is. It was closed in 2005 as not
>>>>>>>> necessary because of the unhandled-oop checking.
>>>>>>>
>>>>>>> Yes, I added a comment.
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>>>>
>>>>>>>> On 6/08/2019 2:00 am, coleen.phillimore at oracle.com wrote:
>>>>>>>>> Summary: Add NSV check at possible safepoint transition or
>>>>>>>>> places that could take out locks. Consolidate with clearing
>>>>>>>>> unhandled oops.
>>>>>>>>>
>>>>>>>>> This change checks for NoSafepointVerifier no_safepoint_counts
>>>>>>>>> at possible safepoints. The starting set is at transitions,
>>>>>>>>> and in the
>>>>>>>>
>>>>>>>> Wouldn't it be better placed in the actual Safepoint checking
>>>>>>>> methods rather than callers of those methods?
>>>>>>>
>>>>>>> See my reply to Robbin. We want to make the checks in places
>>>>>>> that conditionally may not safepoint poll, so that timing doesn't
>>>>>>> prevent finding when we've violated the NoSafepointVerifier check.
>>>>>>
>>>>>> Okay.
>>>>>>
>>>>>>>>
>>>>>>>>> "else" clauses where CHECK_UNHANDLED_OOPS were clearing
>>>>>>>>> unhandled oops. Some of these were removed because they weren't
>>>>>>>>> places with possible safepoints, so were wrong.
>>>>>>>>>
>>>>>>>>> The unhandled oops clearing and no_safepoint counter check are
>>>>>>>>> now done in the same function. MemAllocator ->
>>>>>>>>> check_for_valid_allocation_state calls
>>>>>>>>> check_for_valid_safepoint_state which calls
>>>>>>>>> check_possible_safepoint.
>>>>>>>>>
>>>>>>>>> Calls to check_possible_safepoint are in DEBUG_ONLY when
>>>>>>>>> Thread::current() is called.
>>>>>>>>>
>>>>>>>>> I had to remove it from the else clause in JvmtiThreadState
>>>>>>>>> because it's called from a place that cannot safepoint (see
>>>>>>>>> vtableStubs.cpp).
>>>>>>>>
>>>>>>>> Can you elaborate on that further please. It's not obvious,
>>>>>>>> without following call chains, when any code may or may not hit
>>>>>>>> a safepoint check.
>>>>>>>
>>>>>>> No, it isn't. There's quite a large call stack here, but the
>>>>>>> CompiledICLocker has an embedded NoSafepointVerifier, so here's
>>>>>>> the call stack:
>>>>>>>
>>>>>>> V [libjvm.so+0x16c5d05] Thread::check_possible_safepoint()+0x55
>>>>>>> V [libjvm.so+0x10fb15c]
>>>>>>> JvmtiExport::post_dynamic_code_generated_while_holding_locks(char
>>>>>>> const*, unsigned char*, unsigned char*)+0x4c
>>>>>>> V [libjvm.so+0x17c8fac] VtableStubs::find_stub(bool, int)+0x2ac
>>>>>>> V [libjvm.so+0x9dd18e] CompiledIC::set_to_megamorphic(CallInfo*,
>>>>>>> Bytecodes::Code, bool&, Thread*)+0x9e
>>>>>>> V [libjvm.so+0x1562bb4]
>>>>>>> SharedRuntime::handle_ic_miss_helper_internal(Handle,
>>>>>>> CompiledMethod*, frame const&, methodHandle, Bytecodes::Code,
>>>>>>> CallInfo&, bool&, Thread*)+0x434
>>>>>>> V [libjvm.so+0x156d202]
>>>>>>> SharedRuntime::handle_ic_miss_helper(JavaThread*, Thread*)+0x372
>>>>>>> V [libjvm.so+0x156d7a4]
>>>>>>> SharedRuntime::handle_wrong_method_ic_miss(JavaThread*)+0x184
>>>>>>>
>>>>>>> The function handle_ic_miss_helper_internal has a
>>>>>>> CompiledICLocker, which has an embedded NoSafepointVerifier.
>>>>>>>
>>>>>>> This seems like a lot of code to be protected with a
>>>>>>> NoSafepointVerifier, but I assume that at least the find_stub
>>>>>>> code shouldn't get cleaned up by a safepoint, so would require it.
>>>>>>
>>>>>> Okay so because you added the call to jvmti_thread_state() in the
>>>>>> vtableStubs code you have to remove the check from the else-clause
>>>>>> in jvmti_thread_state(), because it could fail with the new call
>>>>>> path.
>>>>>
>>>>> Yes, but I'm not happy with this so I've reinstated the check in
>>>>> the 'else' clause and changed
>>>>> post_dynamic_code_generated_while_holding_locks() to simply get the
>>>>> jvmti_state directly and not call JvmtiThreadState::state_for().
>>>>> It already had an assert that it wasn't null.
>>>>>
>>>>> I'm retesting this and will have an 02 plus incremental next week.
>>>>>
>>>>>>> I also ran tests with an assert that the jvmti_thread_state() !=
>>>>>>> NULL, just to see if it's ever NULL and it wasn't. I would have
>>>>>>> liked to keep the else.
>>>>>>>>
>>>>>>>>> os.cpp ResourceMark needed for debugging.
>>>>>>>>
>>>>>>>> Interesting catch - I would have expected the RM to be higher up
>>>>>>>> the call if needed. How did you detect this?
>>>>>>>
>>>>>>> Debugging, but I don't remember which thing I was debugging though.
>>>>>>>>
>>>>>>>>> Tested with tier1 on all Oracle platforms, and tier 1-3 on
>>>>>>>>> linux-x64-debug.
>>>>>>>>>
>>>>>>>>> open webrev at
>>>>>>>>> http://cr.openjdk.java.net/~coleenp/2019/5103339.01/webrev
>>>>>>>>> bug link https://bugs.openjdk.java.net/browse/JDK-5103339
>>>>>>>>
>>>>>>>> src/hotspot/share/oops/objArrayKlass.cpp
>>>>>>>> src/hotspot/share/oops/typeArrayKlass.cpp
>>>>>>>>
>>>>>>>> Unclear why the check is predicated on !or_null ??
>>>>>>>
>>>>>>> If the parameter passed or_null is true, then the function never
>>>>>>> safepoints. I added this comment.
>>>>>>>
>>>>>>> // If passed or_null, this function will not try to safepoint.
>>>>>>> if (!or_null) THREAD->check_possible_safepoint();
>>>>>>
>>>>>> Sorry I'm confused by the placement. We want to call
>>>>>> check_possible_safepoint() on code paths that can safepoint, so
>>>>>> that we ensure there is no NSV higher up the call chain when we
>>>>>> take that path. So notwithstanding where the check_unhandled_oops
>>>>>> was, instead of
>>>>>>
>>>>>> 352 } else {
>>>>>> 353 if (!or_null) THREAD->check_possible_safepoint();
>>>>>> 354 }
>>>>>> 355
>>>>>> 356 ObjArrayKlass *ak = ObjArrayKlass::cast(higher_dimension());
>>>>>> 357 if (or_null) {
>>>>>> 358 return ak->array_klass_or_null(n);
>>>>>> 359 }
>>>>>> 360 return ak->array_klass(n, THREAD);
>>>>>>
>>>>>> shouldn't/couldn't we just have:
>>>>>>
>>>>>> 352 }
>>>>>> 353
>>>>>> 354
>>>>>> 355
>>>>>> 356 ObjArrayKlass *ak = ObjArrayKlass::cast(higher_dimension());
>>>>>> 357 if (or_null) {
>>>>>> 358 return ak->array_klass_or_null(n);
>>>>>> 359 }
>>>>>> THREAD->check_possible_safepoint();
>>>>>> 360 return ak->array_klass(n, THREAD);
>>>>>>
>>>>>> ?
>>>>>
>>>>> I couldn't decide if this is better or worse than what I had
>>>>> because looking at this, one might question why this call is here,
>>>>> and it checks the safepoint twice (already in the MutexLocker
>>>>> path). It's a few less lines of code. okay, sure I can change
>>>>> it. I didn't find a better refactoring.
>>>>>
>>>>>> That said, are we certain that the xxx_or_null functions cannot
>>>>>> safepoint? They won't try to load a class but surely they could
>>>>>> potentially safepoint for other reasons?
>>>>>
>>>>> I don't see a safepoint in this code, or have found one while
>>>>> running the tests. Hopefully the NSV would have found it!
>>>>>
>>>>> Coleen
>>>>>
>>>>>>
>>>>>>> This is also called with NSV with or_null true. I can't
>>>>>>> remember where, I think it was the heap walker.
>>>>>>>
>>>>>>>>
>>>>>>>> ---
>>>>>>>>
>>>>>>>> src/hotspot/share/code/vtableStubs.cpp
>>>>>>>>
>>>>>>>> // cause a safepoint in this code that has NSV.
>>>>>>>>
>>>>>>>> Unclear what "code" has the NSV ??
>>>>>>>
>>>>>>> See above. I can change the comment to:
>>>>>>>
>>>>>>> 237 // all locks. Only post this event if a new state is not
>>>>>>> required. Creating a new state would
>>>>>>> 238 // cause a safepoint and the caller of this code has a
>>>>>>> NoSafepointVerifier.
>>>>>>
>>>>>> Sounds good!
>>>>>>
>>>>>>>>
>>>>>>>> ---
>>>>>>>>
>>>>>>>> src/hotspot/share/runtime/interfaceSupport.inline.hpp
>>>>>>>>
>>>>>>>> As mentioned above why not put the check inside
>>>>>>>> SafepointMechanism::block_if_requested instead?
>>>>>>>
>>>>>>> The other callers of block_if_requested already have a
>>>>>>> check_possible_safepoint call. The only caller that made sense to
>>>>>>> add was transition from or to vm.
>>>>>>
>>>>>> Got it.
>>>>>>
>>>>>>>>
>>>>>>>> ---
>>>>>>>>
>>>>>>>> src/hotspot/share/runtime/mutex.cpp
>>>>>>>>
>>>>>>>> The Mutex constructor cleanup is incidental but okay. (Does make
>>>>>>>> me wonder about the whole Monitor construction process though
>>>>>>>> ... ClearMonitor doesn't need to be protected any more.)
>>>>>>>
>>>>>>> This leaked into this change. I'll take it out and post a
>>>>>>> different RFR for it. The no-arg constructor Monitor isn't
>>>>>>> needed either so ClearMonitor can be deleted.
>>>>>>
>>>>>> Okay.
>>>>>>>>
>>>>>>>> ---
>>>>>>>>
>>>>>>>> src/hotspot/share/runtime/thread.hpp
>>>>>>>>
>>>>>>>> // NSV checking
>>>>>>>> void check_for_valid_safepoint_state(bool
>>>>>>>> potential_vm_operation) NOT_DEBUG_RETURN;
>>>>>>>> void check_possible_safepoint() NOT_DEBUG_RETURN;
>>>>>>>>
>>>>>>>> Please expand NSV. Does the comment apply to both methods or
>>>>>>>> only the first?
>>>>>>>
>>>>>>> Both methods. I'll change it to:
>>>>>>>
>>>>>>> // These functions check conditions on a JavaThread before
>>>>>>> possibly going to a safepoint,
>>>>>>> // including NoSafepointVerifier.
>>>>>>> void check_for_valid_safepoint_state(bool
>>>>>>> potential_vm_operation) NOT_DEBUG_RETURN;
>>>>>>> void check_possible_safepoint() NOT_DEBUG_RETURN;
>>>>>>
>>>>>> Okay on the new comment.
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>> -----
>>>>>>
>>>>>>>
>>>>>>> Thanks!
>>>>>>> Coleen
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> David
>>>>>>>> -----
>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Coleen
>>>>>>>
>>>>>
>>>
>
More information about the hotspot-runtime-dev
mailing list