Conflicting use of StackWatermark in StackWalker vs GC?
Stefan Karlsson
stefan.karlsson at oracle.com
Tue Feb 9 22:44:24 UTC 2021
On 2021-02-09 16:08, Roman Kennke wrote:
> Hi Stefan,
>
>> It's interesting that fetchNextBatch processes the entire stack in
>> preparation for filling in the information about the frames:
>>
>>   // If we have to get back here for even more frames, then 1) the user did not supply
>>   // an accurate hint suggesting the depth of the stack walk, and 2) we are not just
>>   // peeking at a few frames. Take the cost of flushing out any pending deferred GC
>>   // processing of the stack.
>>   StackWatermarkSet::finish_processing(jt, NULL /* context */, StackWatermarkKind::gc);
>>
>> but further down in fill_in_frames => LiveFrameStream::fill_frame =>
>> fill_live_stackframe, we perform object allocation, which could
>> safepoint for a GC that would reset the watermark. After leaving that
>> safepoint we will have processed the top-most frames, but we won't
>> have processed down to the current frame the StackWalker is looking
>> at. This is my guess of what's happening, but I haven't been able to
>> reproduce the problem, so it's a bit hard to verify that this is
>> what's happening.
>
> That sounds plausible.
>
> What would be a way out of this? Scan the stack and collect all
> relevant information without allocating any Java objects yet, and fill
> in the Java frames array after the stack scan, maybe?
We have a way to deal with similar situations:

  // Use this class to mark a remote thread you are currently interested
  // in examining the entire stack, without it slipping into an unprocessed
  // state at safepoint polls.
  class KeepStackGCProcessedMark : public StackObj {

It installs a link to the other thread, and whenever we hit a safepoint
that entire stack is processed. See:

  void StackWatermark::on_safepoint() {
    start_processing();
    StackWatermark* linked_watermark = _linked_watermark;
    if (linked_watermark != NULL) {
      linked_watermark->finish_processing(NULL /* context */);
    }
  }

KeepStackGCProcessedMark isn't reentrant, so we would have to watch out
for that.
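
To make the interaction concrete, here is a rough toy model of the suspected failure and of the linking idea. It is only a sketch and not HotSpot code: ToyWatermark, ToyKeepProcessedMark, walk_frames, walk_without_mark and walk_with_mark are made-up names, frames are plain indices, and I am assuming that a safepoint leaves only the top-most frame processed unless a link is installed. The model links a watermark to itself because the walker is looking at its own stack here.

```cpp
#include <cassert>
#include <cstddef>

// Toy model only; none of these names exist in HotSpot.
// Frames are numbered from 0 (top of stack) to depth - 1.
struct ToyWatermark {
    std::size_t depth;               // number of frames on the modelled stack
    std::size_t processed = 0;       // frames [0, processed) count as GC-processed
    ToyWatermark* linked = nullptr;  // installed by ToyKeepProcessedMark

    explicit ToyWatermark(std::size_t d) : depth(d) {}

    // Assumption: a new GC cycle lazily processes only the top-most frame.
    void start_processing() { processed = (depth == 0) ? 0 : 1; }

    // Process every frame of the modelled stack.
    void finish_processing() { processed = depth; }

    // Same shape as the on_safepoint() quoted above.
    void on_safepoint() {
        start_processing();
        if (linked != nullptr) {
            linked->finish_processing();
        }
    }

    bool is_frame_safe(std::size_t frame) const { return frame < processed; }
};

// RAII guard in the spirit of KeepStackGCProcessedMark (hypothetical shape).
class ToyKeepProcessedMark {
    ToyWatermark& _self;
public:
    ToyKeepProcessedMark(ToyWatermark& self, ToyWatermark& target) : _self(self) {
        assert(self.linked == nullptr && "mark is not reentrant");
        target.finish_processing();
        self.linked = &target;
    }
    ~ToyKeepProcessedMark() { _self.linked = nullptr; }
    ToyKeepProcessedMark(const ToyKeepProcessedMark&) = delete;
    ToyKeepProcessedMark& operator=(const ToyKeepProcessedMark&) = delete;
};

// Walks frames top-down. Visiting frame `safepoint_at` stands in for an
// allocation that hits a safepoint. Returns the index of the first unsafe
// frame, or wm.depth if every frame was safe when visited.
std::size_t walk_frames(ToyWatermark& wm, std::size_t safepoint_at) {
    for (std::size_t i = 0; i < wm.depth; ++i) {
        if (!wm.is_frame_safe(i)) {
            return i;
        }
        if (i == safepoint_at) {
            wm.on_safepoint();
        }
    }
    return wm.depth;
}

// Finish processing once up front, then walk with no link installed.
std::size_t walk_without_mark(std::size_t depth, std::size_t safepoint_at) {
    ToyWatermark wm(depth);
    wm.finish_processing();
    return walk_frames(wm, safepoint_at);
}

// Walk while the guard keeps the stack fully processed across safepoints.
std::size_t walk_with_mark(std::size_t depth, std::size_t safepoint_at) {
    ToyWatermark wm(depth);
    ToyKeepProcessedMark mark(wm, wm);
    return walk_frames(wm, safepoint_at);
}
```

In this model, walk_without_mark(5, 2) stops at frame 3, the first frame visited after the safepoint, while walk_with_mark(5, 2) gets through all five frames. The assert in the guard's constructor is where the non-reentrancy would bite if a second mark were nested inside the first.
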
StefanK
>
> Roman
>
>
>> StefanK
>>
>> On 2021-02-09 15:08, Roman Kennke wrote:
>>> I am getting the same failure with ZGC:
>>>
>>> CONF=linux-x86_64-server-fastdebug make run-test
>>> TEST=java/lang/StackWalker
>>> TEST_VM_OPTS="-XX:+UnlockExperimentalVMOptions -XX:+UseZGC
>>> -XX:ZCollectionInterval=0.01"
>>>
>>>
>>>> Hello all,
>>>>
>>>> When running StackWalker tests with 'aggressive' Shenandoah mode
>>>> (i.e. run GCs all the time, even if there is no work), then I
>>>> observe crashes like this:
>>>>
>>>> # Internal Error
>>>> (/home/rkennke/src/openjdk/jdk/src/hotspot/share/runtime/stackWatermark.cpp:178),
>>>> pid=549168, tid=549230
>>>> # assert(is_frame_safe(f)) failed: Frame must be safe
>>>>
>>>> Full hs_err:
>>>> http://cr.openjdk.java.net/~rkennke/hs_err_pid549168.log
>>>>
>>>> I strongly suspect that this is happening because of StackWalker's
>>>> use of StackWatermark which conflicts with the GC's own use of
>>>> StackWatermark. IOW, it asserts that the frame has been processed, but
>>>> the GC is still on it.
>>>>
>>>> Are we missing some coordination between StackWalker and the GC here?
>>>>
>>>> It can be reproduced using:
>>>> CONF=linux-x86_64-server-fastdebug make run-test
>>>> TEST=java/lang/StackWalker
>>>> TEST_VM_OPTS="-XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC
>>>> -XX:ShenandoahGCHeuristics=aggressive"
>>>>
>>>> Thanks,
>>>> Roman
>>>
>>
>