RFR(s): 8218147: make_walkable asserts on multiple calls

Robbin Ehn robbin.ehn at oracle.com
Tue Apr 16 06:51:17 UTC 2019


Hi David,

On 4/16/19 4:00 AM, David Holmes wrote:
> Hi Robbin,
> 
> On 15/04/2019 6:58 pm, Robbin Ehn wrote:
>> Hi, please review.
>>
>> After reexamine this issue:
>> Threads in native must always have their stack walkable.
>> JFR sampler should never need to make a stack walkable (for native sample).
>>
>> I manage to locally reproduce reliable with changes to JFR sampler and having
>> hundreds of threads running similar code as the in the bug.
>> (Looping creating an array with negative size.)
>>
>> I found a place where we don't proper look at the suspend flags.
>> The java thread can thus escape native and make it's stack unwalkable and later
>> it tries to make it walkable at the same time as the JFR sampler.
>>
>> By removing some kind of fast check and instead always call the 
>> check_safepoint_and_suspend_for_native_trans I can no longer reproduce.
> 
> Sorry but I can't see how this can fix anything:
> 
> -     if (SafepointMechanism::should_block(thread) || 
> thread->is_suspend_after_native()) {
>          JavaThread::check_safepoint_and_suspend_for_native_trans(thread);
> -     }
> 

In check_safepoint_and_suspend_for_native_trans we check _trace_flag and stop 
with macro:
JFR_ONLY(SUSPEND_THREAD_CONDITIONAL(thread);)

This method does not do the right thing:
bool is_suspend_after_native() const {
   return (_suspend_flags & (_external_suspend | _deopt_suspend)) != 0;
}

If we want to keep the out-of-line double checking, this is an alternative:
diff -r 51211b2d6514 src/hotspot/share/runtime/thread.hpp
--- a/src/hotspot/share/runtime/thread.hpp	Tue Apr 16 08:38:32 2019 +0200
+++ b/src/hotspot/share/runtime/thread.hpp	Tue Apr 16 08:44:21 2019 +0200
@@ -1417,3 +1417,3 @@
    bool is_suspend_after_native() const {
-    return (_suspend_flags & (_external_suspend | _deopt_suspend)) != 0;
+    return (_suspend_flags & (_external_suspend | _deopt_suspend | 
_trace_flag)) != 0;
    }

So we are missing checking that bit completely in this transition code.

Thanks, Robbin


> All you are doing is changing the timing of the race between the thread 
> re-entering the VM/Java and the request for a suspend or safepoint.
> 
> If there is a race between the sampler logic acting on the thread, and the 
> thread acting on itself then that race has to be precluded somehow.
> 
> Thanks,
> David
> -----
> 
>> (which have the JFR native trans suspend check)
>> And it passes t1-5.
>>
>> Code:
>> http://cr.openjdk.java.net/~rehn/8218147/v3/webrev/
>> Issue:
>> https://bugs.openjdk.java.net/browse/JDK-8218147
>>
>> Thanks, Robbin
>>
>> On 4/5/19 5:43 PM, Robbin Ehn wrote:
>>> Hi Dean,
>>>
>>> Sorry, I missed this mail.
>>> Yes we can do that.
>>> Ignore my other mail, I'll update.
>>>
>>> Thanks, Robbin
>>>
>>>
>>> dean.long at oracle.com skrev: (5 april 2019 09:22:24 CEST)
>>>> On 4/4/19 5:16 PM, dean.long at oracle.com wrote:
>>>>>
>>>>>>>
>>>>>>> If it's already set, should we check that _last_Java_pc matches the
>>>>
>>>>>>> new value?
>>>>>>
>>>>>> We manually set the pc in several places, so if it's set, it's not
>>>>>> certain that
>>>>>> it should be the same as in last sp.
>>>>>> I can't distinguish between the cases.
>>>>>>
>>>>>
>>>>> If we get pc from sp[-1] then it should match, but you're right, we
>>>>> sometimes get pc from somewhere else.
>>>>
>>>> How about if we combine the !walkable check and the
>>>> capture_last_Java_pc() logic into a single method?
>>>> Then we can do something like:
>>>>
>>>>      if (!walkable()) {
>>>>          address pc = (address)_last_Java_sp[-1];
>>>>          address a = Atomic::cmpxchg(pc, &_last_Java_pc, NULL);
>>>>          assert(a == NULL || a == pc, "unexpected PC %p", a);
>>>>      }
>>>>
>>>> dl


More information about the hotspot-jfr-dev mailing list