Conflicting use of StackWatermark in StackWalker vs GC?
Roman Kennke
rkennke at redhat.com
Tue Feb 9 15:08:24 UTC 2021
Hi Stefan,
> It's interesting that fetchNextBatch processes the entire stack in
> preparation for filling in the information about the frames:
>
> // If we have to get back here for even more frames, then 1) the user did not supply
> // an accurate hint suggesting the depth of the stack walk, and 2) we are not just
> // peeking at a few frames. Take the cost of flushing out any pending deferred GC
> // processing of the stack.
> StackWatermarkSet::finish_processing(jt, NULL /* context */, StackWatermarkKind::gc);
>
> but further down in fill_in_frames => LiveFrameStream::fill_frame =>
> fill_live_stackframe, we perform object allocation, which could
> safepoint for a GC that would reset the watermark. After leaving that
> safepoint we will have processed the top-most frames, but we won't have
> processed down to the current frame the StackWalker is looking at. This
> is my guess of what's happening, but I haven't been able to reproduce
> the problem, so it's a bit hard to verify that this is what's happening.
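To make the suspected interleaving concrete, here is a toy C++ model. It is not HotSpot code: ToyThreadStack, fill_frames_after_finish and the rule that a GC safepoint leaves only the topmost frame processed are simplifications assumed for illustration. Frames are indexed from the top of the stack, and a frame counts as safe when its index is below the watermark.

```cpp
#include <cassert>
#include <cstddef>

// Toy model of watermark-based lazy stack processing (hypothetical names,
// not the HotSpot API). Frames with index < watermark are "processed".
struct ToyThreadStack {
  std::size_t depth;          // number of frames on the stack
  std::size_t watermark = 0;  // frames [0, watermark) have been processed

  explicit ToyThreadStack(std::size_t d) : depth(d) { assert(d > 0); }

  // Rough analogue of StackWatermarkSet::finish_processing: process everything.
  void finish_processing() { watermark = depth; }

  // Simplification: a GC safepoint restarts lazy processing, and on return
  // only the topmost frame is processed again.
  void safepoint_for_gc() { watermark = 1; }

  bool is_frame_safe(std::size_t index) const { return index < watermark; }
};

// Mirrors the suspected fetchNextBatch flow: finish processing the whole
// stack, then fill frames one by one, where each fill allocates and may
// safepoint. gc_after is the index (within this batch) of the allocation
// that triggers a GC; pass a value >= count to simulate no GC.
// Returns false at the first frame that is no longer processed, i.e. where
// a debug build would report assert(is_frame_safe(f)).
bool fill_frames_after_finish(ToyThreadStack& s, std::size_t first,
                              std::size_t count, std::size_t gc_after) {
  s.finish_processing();
  for (std::size_t i = 0; i < count; ++i) {
    if (!s.is_frame_safe(first + i)) return false;
    // ... allocate the frame-info object here (may safepoint) ...
    if (i == gc_after) s.safepoint_for_gc();
  }
  return true;
}
```

In this model, calling fill_frames_after_finish(s, 5, 3, 0) on a 10-frame stack fails at frame 6: finish_processing() made all frames safe, but the simulated GC during the first allocation moved the watermark back to the top.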
That sounds plausible.
What would be a way out of this? Scan the stack and collect all relevant
information without allocating any Java objects yet, and fill in the
Java frames array after the stack scan, maybe?
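The two-phase idea could look roughly like this, again in a hypothetical toy model (ToyThreadStack, RawFrameInfo, collect_raw and fill_from_raw are illustrative names, not HotSpot functions). Phase 1 only reads frames and stores plain values; phase 2 does the allocations and never looks at the stack again, so a watermark reset during phase 2 is harmless.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model of watermark-based lazy stack processing (hypothetical names,
// not the HotSpot API). Frames with index < watermark are processed.
struct ToyThreadStack {
  std::size_t depth;
  std::size_t watermark = 0;
  explicit ToyThreadStack(std::size_t d) : depth(d) { assert(d > 0); }
  void finish_processing() { watermark = depth; }
  void safepoint_for_gc() { watermark = 1; }  // simplification: GC restarts lazy processing
  bool is_frame_safe(std::size_t i) const { return i < watermark; }
};

// Raw per-frame data gathered without any heap allocation (hypothetical fields).
struct RawFrameInfo {
  std::size_t index;
  int bci;
};

// Phase 1: inspect frames only. Nothing here can reach a safepoint, so the
// watermark set by finish_processing() stays valid for the whole walk.
// Returns false if a frame is unexpectedly unprocessed.
bool collect_raw(ToyThreadStack& s, std::size_t first, std::size_t count,
                 std::vector<RawFrameInfo>& out) {
  s.finish_processing();
  for (std::size_t i = first; i < first + count; ++i) {
    if (!s.is_frame_safe(i)) return false;
    out.push_back({i, static_cast<int>(i) * 10});  // fake bytecode index
  }
  return true;
}

// Phase 2: build the result objects. Allocation may safepoint for GC, which
// resets the watermark, but this phase never touches the stack frames again.
std::vector<std::string> fill_from_raw(ToyThreadStack& s,
                                       const std::vector<RawFrameInfo>& raw,
                                       std::size_t gc_after) {
  std::vector<std::string> frames;
  for (std::size_t i = 0; i < raw.size(); ++i) {
    frames.push_back("frame " + std::to_string(raw[i].index) + " bci " +
                     std::to_string(raw[i].bci));
    if (i == gc_after) s.safepoint_for_gc();  // simulated GC during allocation
  }
  return frames;
}
```

One caveat the toy model glosses over: if the phase 1 data held references to Java heap objects (for example locals or monitors for LiveStackFrame), those would presumably have to be kept in Handles so they stay valid across a GC in phase 2.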
Roman
> StefanK
>
> On 2021-02-09 15:08, Roman Kennke wrote:
>> I am getting the same failure with ZGC:
>>
>> CONF=linux-x86_64-server-fastdebug make run-test TEST=java/lang/StackWalker TEST_VM_OPTS="-XX:+UnlockExperimentalVMOptions -XX:+UseZGC -XX:ZCollectionInterval=0.01"
>>
>>
>>> Hello all,
>>>
>>> When running StackWalker tests with 'aggressive' Shenandoah mode
>>> (i.e. run GCs all the time, even if there is no work), then I observe
>>> crashes like this:
>>>
>>> # Internal Error (/home/rkennke/src/openjdk/jdk/src/hotspot/share/runtime/stackWatermark.cpp:178), pid=549168, tid=549230
>>> # assert(is_frame_safe(f)) failed: Frame must be safe
>>>
>>> Full hs_err:
>>> http://cr.openjdk.java.net/~rkennke/hs_err_pid549168.log
>>>
>>> I strongly suspect that this is happening because of StackWalker's
>>> use of StackWatermark, which conflicts with the GC's own use of
>>> StackWatermark. IOW, it asserts that the frame has been processed, but
>>> the GC has not finished processing it yet.
>>>
>>> Are we missing some coordination between StackWalker and the GC here?
>>>
>>> It can be reproduced using:
>>> CONF=linux-x86_64-server-fastdebug make run-test TEST=java/lang/StackWalker TEST_VM_OPTS="-XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC -XX:ShenandoahGCHeuristics=aggressive"
>>>
>>> Thanks,
>>> Roman
>>
>