Conflicting use of StackWatermark in StackWalker vs GC?

Stefan Karlsson stefan.karlsson at oracle.com
Tue Feb 9 14:45:52 UTC 2021


It's interesting that fetchNextBatch processes the entire stack in
preparation for filling in the information about the frames:

     // If we have to get back here for even more frames, then 1) the user did not supply
     // an accurate hint suggesting the depth of the stack walk, and 2) we are not just
     // peeking  at a few frames. Take the cost of flushing out any pending deferred GC
     // processing of the stack.
     StackWatermarkSet::finish_processing(jt, NULL /* context */, StackWatermarkKind::gc);

but further down in fill_in_frames => LiveFrameStream::fill_frame =>
fill_live_stackframe, we perform object allocation, which could
safepoint for a GC that would reset the watermark. After leaving that
safepoint we will have processed the top-most frames, but we won't have
processed down to the current frame the StackWalker is looking at. This
is my guess at the cause, but I haven't been able to reproduce the
problem, so it's a bit hard to verify.
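
To make the suspected sequence concrete, here is a toy model of it. This
is only a sketch of my guess and not HotSpot code: ToyWatermark, walk and
the eager_frames parameter are made-up names, and the real watermark
tracks stack addresses rather than frame indices.

```cpp
#include <cassert>
#include <cstddef>

// Toy model, NOT HotSpot code. Frames are indexed 0 (top-most) to depth-1.
// Frames with index < processed_upto count as processed ("safe").
struct ToyWatermark {
    std::size_t depth;
    std::size_t processed_upto;

    // Stand-in for finishing all deferred GC processing of the stack.
    void finish_processing() { processed_upto = depth; }

    // Stand-in for a GC safepoint: the watermark is reset and only the
    // top-most eager_frames frames are processed when the thread resumes.
    void reset_at_safepoint(std::size_t eager_frames) {
        assert(eager_frames <= depth);
        processed_upto = eager_frames;
    }

    bool is_frame_safe(std::size_t frame) const {
        return frame < processed_upto;
    }
};

// Walks all frames after an up-front finish_processing(). While filling in
// frame gc_after_frame, an allocation is assumed to safepoint for a GC.
// Returns the index of the first frame found unsafe, or depth if none.
std::size_t walk(ToyWatermark& wm, std::size_t gc_after_frame,
                 std::size_t eager_frames) {
    wm.finish_processing();
    for (std::size_t f = 0; f < wm.depth; ++f) {
        if (!wm.is_frame_safe(f)) {
            return f;  // this is where the real assert would fire
        }
        if (f == gc_after_frame) {
            wm.reset_at_safepoint(eager_frames);  // allocation => safepoint
        }
    }
    return wm.depth;
}
```

In this model, a walk over 10 frames with a GC while filling in frame 5
finds frame 6 unprocessed, even though the whole stack was processed
before the walk started. Without a GC during the walk, every frame stays
safe.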

StefanK

On 2021-02-09 15:08, Roman Kennke wrote:
> I am getting the same failure with ZGC:
>
> CONF=linux-x86_64-server-fastdebug make run-test \
>   TEST=java/lang/StackWalker \
>   TEST_VM_OPTS="-XX:+UnlockExperimentalVMOptions -XX:+UseZGC -XX:ZCollectionInterval=0.01"
>
>
>> Hello all,
>>
>> When running StackWalker tests with 'aggressive' Shenandoah mode 
>> (i.e. run GCs all the time, even if there is no work), then I observe 
>> crashes like this:
>>
>> #  Internal Error 
>> (/home/rkennke/src/openjdk/jdk/src/hotspot/share/runtime/stackWatermark.cpp:178), 
>> pid=549168, tid=549230
>> #  assert(is_frame_safe(f)) failed: Frame must be safe
>>
>> Full hs_err:
>> http://cr.openjdk.java.net/~rkennke/hs_err_pid549168.log
>>
>> I strongly suspect that this is happening because of StackWalker's
>> use of StackWatermark, which conflicts with the GC's own use of
>> StackWatermark. IOW, it asserts that the frame has been processed, but
>> the GC is still on it.
>>
>> Are we missing some coordination between StackWalker and the GC here?
>>
>> It can be reproduced using:
>> CONF=linux-x86_64-server-fastdebug make run-test \
>>   TEST=java/lang/StackWalker \
>>   TEST_VM_OPTS="-XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC -XX:ShenandoahGCHeuristics=aggressive"
>>
>> Thanks,
>> Roman
>



