RFR(s): 8218147: make_walkable asserts on multiple calls
Robbin Ehn
robbin.ehn at oracle.com
Tue Apr 16 06:51:17 UTC 2019
Hi David,
On 4/16/19 4:00 AM, David Holmes wrote:
> Hi Robbin,
>
> On 15/04/2019 6:58 pm, Robbin Ehn wrote:
>> Hi, please review.
>>
>> After reexamine this issue:
>> Threads in native must always have their stack walkable.
>> JFR sampler should never need to make a stack walkable (for native sample).
>>
>> I manage to locally reproduce reliable with changes to JFR sampler and having
>> hundreds of threads running similar code as the in the bug.
>> (Looping creating an array with negative size.)
>>
>> I found a place where we don't proper look at the suspend flags.
>> The java thread can thus escape native and make it's stack unwalkable and later
>> it tries to make it walkable at the same time as the JFR sampler.
>>
>> By removing some kind of fast check and instead always call the
>> check_safepoint_and_suspend_for_native_trans I can no longer reproduce.
>
> Sorry but I can't see how this can fix anything:
>
> - if (SafepointMechanism::should_block(thread) ||
> thread->is_suspend_after_native()) {
> JavaThread::check_safepoint_and_suspend_for_native_trans(thread);
> - }
>
In check_safepoint_and_suspend_for_native_trans we check _trace_flag and stop
with macro:
JFR_ONLY(SUSPEND_THREAD_CONDITIONAL(thread);)
This method does not do the right thing:
bool is_suspend_after_native() const {
return (_suspend_flags & (_external_suspend | _deopt_suspend)) != 0;
}
If we want to keep the out-of-line double checking, this is an alternative:
diff -r 51211b2d6514 src/hotspot/share/runtime/thread.hpp
--- a/src/hotspot/share/runtime/thread.hpp Tue Apr 16 08:38:32 2019 +0200
+++ b/src/hotspot/share/runtime/thread.hpp Tue Apr 16 08:44:21 2019 +0200
@@ -1417,3 +1417,3 @@
bool is_suspend_after_native() const {
- return (_suspend_flags & (_external_suspend | _deopt_suspend)) != 0;
+ return (_suspend_flags & (_external_suspend | _deopt_suspend |
_trace_flag)) != 0;
}
So we are missing checking that bit completely in this transition code.
Thanks, Robbin
> All you are doing is changing the timing of the race between the thread
> re-entering the VM/Java and the request for a suspend or safepoint.
>
> If there is a race between the sampler logic acting on the thread, and the
> thread acting on itself then that race has to be precluded somehow.
>
> Thanks,
> David
> -----
>
>> (which have the JFR native trans suspend check)
>> And it passes t1-5.
>>
>> Code:
>> http://cr.openjdk.java.net/~rehn/8218147/v3/webrev/
>> Issue:
>> https://bugs.openjdk.java.net/browse/JDK-8218147
>>
>> Thanks, Robbin
>>
>> On 4/5/19 5:43 PM, Robbin Ehn wrote:
>>> Hi Dean,
>>>
>>> Sorry, I missed this mail.
>>> Yes we can do that.
>>> Ignore my other mail, I'll update.
>>>
>>> Thanks, Robbin
>>>
>>>
>>> dean.long at oracle.com skrev: (5 april 2019 09:22:24 CEST)
>>>> On 4/4/19 5:16 PM, dean.long at oracle.com wrote:
>>>>>
>>>>>>>
>>>>>>> If it's already set, should we check that _last_Java_pc matches the
>>>>
>>>>>>> new value?
>>>>>>
>>>>>> We manually set the pc in several places, so if it's set, it's not
>>>>>> certain that
>>>>>> it should be the same as in last sp.
>>>>>> I can't distinguish between the cases.
>>>>>>
>>>>>
>>>>> If we get pc from sp[-1] then it should match, but you're right, we
>>>>> sometimes get pc from somewhere else.
>>>>
>>>> How about if we combine the !walkable check and the
>>>> capture_last_Java_pc() logic into a single method?
>>>> Then we can do something like:
>>>>
>>>> if (!walkable()) {
>>>> address pc = (address)_last_Java_sp[-1];
>>>> address a = Atomic::cmpxchg(pc, &_last_Java_pc, NULL);
>>>> assert(a == NULL || a == pc, "unexpected PC %p", a);
>>>> }
>>>>
>>>> dl
More information about the hotspot-jfr-dev
mailing list