Conflicting use of StackWatermark in StackWalker vs GC?

Roman Kennke rkennke at
Tue Feb 9 15:08:24 UTC 2021

Hi Stefan,

> It's interesting that fetchNextBatch processes the entire stack in 
> preparation for filling in the information about the frames:
>      // If we have to get back here for even more frames, then 1) the user did not supply
>      // an accurate hint suggesting the depth of the stack walk, and 2) we are not just
>      // peeking at a few frames. Take the cost of flushing out any pending deferred GC
>      // processing of the stack.
>      StackWatermarkSet::finish_processing(jt, NULL /* context */, StackWatermarkKind::gc);
> but further down in fill_in_frames => LiveFrameStream::fill_frame => 
> fill_live_stackframe, we perform object allocation, which could 
> safepoint for a GC that would reset the watermark. After leaving that 
> safepoint we will have processed the top-most frames, but we won't have 
> processed down to the current frame the StackWalker is looking at. This 
> is my guess of what's happening, but I haven't been able to reproduce 
> the problem, so it's a bit hard to verify that this is what's happening.

That sounds plausible.
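
To make the suspected sequence concrete, here is a toy model of it. This is only a sketch: ToyThread, safepoint_for_gc and first_unsafe_frame are made-up names, not HotSpot classes or functions, and the "2 frames processed on leaving the safepoint" is an arbitrary assumption. It only shows how a watermark reset in the middle of the walk leaves the next frame unprocessed.

```cpp
// Toy model only: none of these types are HotSpot's real classes.
#include <cassert>
#include <cstddef>

struct ToyThread {
    std::size_t depth;      // number of frames; index 0 is the top-most frame
    std::size_t processed;  // frames [0, processed) have been processed by the GC

    // Stand-in for StackWatermarkSet::finish_processing: process the whole stack.
    void finish_processing() { processed = depth; }

    // Stand-in for a safepoint that starts a new GC cycle: the watermark is
    // reset and only the top-most frames are processed on the way out.
    void safepoint_for_gc(std::size_t eagerly_processed) {
        processed = eagerly_processed < depth ? eagerly_processed : depth;
    }

    bool is_frame_safe(std::size_t index) const {
        assert(index < depth);
        return index < processed;
    }
};

// Finish processing once, then visit frames from the top, "allocating" after
// the frame with index gc_after_frame (the allocation triggers a GC safepoint).
// Returns the index of the first unsafe frame, or depth if all frames were safe.
std::size_t first_unsafe_frame(ToyThread& t, std::size_t gc_after_frame) {
    t.finish_processing();
    for (std::size_t i = 0; i < t.depth; ++i) {
        if (!t.is_frame_safe(i)) {
            return i;  // this is where the real assert would fire
        }
        if (i == gc_after_frame) {
            t.safepoint_for_gc(2);  // assumed: only 2 frames get reprocessed
        }
    }
    return t.depth;
}
```

With a 10-frame stack and a GC triggered after visiting frame 4, the walker finds frame 5 unprocessed, which matches the shape of the "Frame must be safe" failure.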

What would be a way out of this? Scan the stack and collect all relevant 
information without allocating any Java objects yet, and fill in the 
Java frames array after the stack scan, maybe?
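
Roughly what I have in mind, again as a toy model rather than a patch. All names here (ToyThread, RawFrame, FrameObject, collect_raw_frames, fill_in_frames) are hypothetical, and std::vector merely stands in for some non-Java-heap buffer. The sketch also ignores that live frames carry object references, which I believe would still have to be kept alive (e.g. via handles) between the two phases.

```cpp
// Toy model only: none of these types are HotSpot's real classes.
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct ToyThread {
    std::size_t depth;      // number of frames; index 0 is the top-most frame
    std::size_t processed;  // frames [0, processed) have been processed by the GC

    void finish_processing() { processed = depth; }

    // A GC safepoint resets the watermark; only the top frames are reprocessed.
    void safepoint_for_gc(std::size_t eagerly_processed) {
        processed = eagerly_processed < depth ? eagerly_processed : depth;
    }

    bool is_frame_safe(std::size_t index) const { return index < processed; }
};

// Raw description of a frame, held outside the (modelled) Java heap.
struct RawFrame { std::size_t index; };

// Stand-in for the Java object that describes a frame.
struct FrameObject { std::string description; };

// Phase 1: touch every frame, but never allocate a Java object, so no GC
// safepoint can reset the watermark while frames are being read.
std::vector<RawFrame> collect_raw_frames(ToyThread& t) {
    t.finish_processing();
    std::vector<RawFrame> raw;
    raw.reserve(t.depth);
    for (std::size_t i = 0; i < t.depth; ++i) {
        assert(t.is_frame_safe(i));  // holds: nothing reset the watermark
        raw.push_back(RawFrame{i});
    }
    return raw;
}

// Phase 2: allocate the Java-side objects. Safepoints may happen here, but
// this phase only reads the raw records and never looks at a frame again.
std::vector<FrameObject> fill_in_frames(ToyThread& t,
                                        const std::vector<RawFrame>& raw) {
    std::vector<FrameObject> frames;
    for (const RawFrame& f : raw) {
        t.safepoint_for_gc(2);  // worst case: every allocation triggers a GC
        frames.push_back(FrameObject{"frame " + std::to_string(f.index)});
    }
    return frames;
}
```

Even if every allocation in the second phase hits a safepoint, the frame check in the first phase never fails, because the watermark can only be reset after all frames have been read.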


> StefanK
> On 2021-02-09 15:08, Roman Kennke wrote:
>> I am getting the same failure with ZGC:
>> CONF=linux-x86_64-server-fastdebug make run-test TEST=java/lang/StackWalker TEST_VM_OPTS="-XX:+UnlockExperimentalVMOptions -XX:+UseZGC -XX:ZCollectionInterval=0.01"
>>> Hello all,
>>> When running StackWalker tests with 'aggressive' Shenandoah mode 
>>> (i.e. run GCs all the time, even if there is no work), then I observe 
>>> crashes like this:
>>> #  Internal Error (/home/rkennke/src/openjdk/jdk/src/hotspot/share/runtime/stackWatermark.cpp:178), pid=549168, tid=549230
>>> #  assert(is_frame_safe(f)) failed: Frame must be safe
>>> Full hs_err:
>>> I strongly suspect that this is happening because of StackWalker's 
>>> use of StackWatermark which conflicts with the GC's own use of 
>>> StackWatermark. IOW, it asserts that the frame has been processed, but 
>>> the GC is still on it.
>>> Are we missing some coordination between StackWalker and the GC here?
>>> It can be reproduced using:
>>> CONF=linux-x86_64-server-fastdebug make run-test TEST=java/lang/StackWalker TEST_VM_OPTS="-XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC -XX:ShenandoahGCHeuristics=aggressive"
>>> Thanks,
>>> Roman

More information about the hotspot-gc-dev mailing list