Conflicting use of StackWatermark in StackWalker vs GC?

Roman Kennke rkennke at redhat.com
Tue Feb 9 15:30:23 UTC 2021


Tracking this here:
https://bugs.openjdk.java.net/browse/JDK-8261448

Roman


> Hi Stefan,
> 
>> It's interesting that fetchNextBatch processes the entire stack in 
>> preparation for filling in the information about the frames:
>>
>>      // If we have to get back here for even more frames, then 1) the user did not supply
>>      // an accurate hint suggesting the depth of the stack walk, and 2) we are not just
>>      // peeking at a few frames. Take the cost of flushing out any pending deferred GC
>>      // processing of the stack.
>>      StackWatermarkSet::finish_processing(jt, NULL /* context */, StackWatermarkKind::gc);
>>
>> but further down in fill_in_frames => LiveFrameStream::fill_frame => 
>> fill_live_stackframe, we perform object allocation, which could 
>> safepoint for a GC that would reset the watermark. After leaving that 
>> safepoint we will have processed the top-most frames, but we won't 
>> have processed down to the current frame the StackWalker is looking 
>> at. This is my guess of what's happening, but I haven't been able to 
>> reproduce the problem, so it's a bit hard to verify that this is 
>> what's happening.
> 
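To make the suspected sequence concrete, here is a minimal toy model of it. None of the names below are HotSpot APIs; ToyStack, safepoint_new_gc_cycle and first_unsafe_frame are made up for illustration, and the assumption that a new GC cycle eagerly processes exactly two top frames is arbitrary. It only shows how a walk that starts with every frame processed can still reach an unprocessed frame once a safepoint resets the watermark in the middle of the walk.

```cpp
#include <cassert>
#include <cstddef>

// Toy model only: none of these names are HotSpot APIs.
// Frame 0 is the top of the stack; higher indices are older frames.
struct ToyStack {
    std::size_t depth;          // total number of frames
    std::size_t processed = 0;  // frames [0, processed) have been processed

    explicit ToyStack(std::size_t d) : depth(d) {}

    // Analogue of finishing all deferred processing: every frame is safe.
    void finish_processing() { processed = depth; }

    // Analogue of a safepoint that starts a new GC cycle: the watermark is
    // reset and only the top-most frames are processed again.
    void safepoint_new_gc_cycle(std::size_t eager_frames) {
        processed = eager_frames < depth ? eager_frames : depth;
    }

    // Analogue of the check behind "Frame must be safe".
    bool is_frame_safe(std::size_t frame) const { return frame < processed; }
};

// Walks the stack from the top after finishing processing once up front.
// If gc_during_fill is set, the "allocation" done for frame gc_at hits a
// safepoint. Returns the first unsafe frame visited, or depth if none.
std::size_t first_unsafe_frame(ToyStack& stack, bool gc_during_fill,
                               std::size_t gc_at) {
    stack.finish_processing();
    for (std::size_t f = 0; f < stack.depth; ++f) {
        if (!stack.is_frame_safe(f)) {
            return f;  // this is where the assert would fire
        }
        if (gc_during_fill && f == gc_at) {
            stack.safepoint_new_gc_cycle(2);  // assumed: 2 frames redone
        }
    }
    return stack.depth;
}
```

Without the mid-walk safepoint the walk completes; with it, the frame after the allocation point is the first one found below the reset watermark.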
> That sounds plausible.
> 
> What would be a way out of this? Scan the stack and collect all relevant 
> information without allocating any Java objects yet, and fill in the 
> Java frames array after the stack scan, maybe?
> 
> Roman
> 
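The two-phase idea above could look roughly like the following sketch. It is a toy, not a proposed patch: RawFrame, FrameObject, TwoPhaseStack, scan and fill are hypothetical names, the stack is modelled as a vector of bytecode indices, and the sketch glosses over how object references held in frame locals would be kept valid across a safepoint. The point is only that the scan phase reads the stack without allocating, and the fill phase allocates without reading the stack.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy types; nothing here is a HotSpot or JDK API.
struct RawFrame {      // plain data copied out of a stack frame
    std::size_t index;
    int bci;
};

struct FrameObject {   // stands in for a heap-allocated Java frame object
    std::size_t index;
    int bci;
};

struct TwoPhaseStack {
    std::vector<int> bcis;         // one entry per frame, frame 0 is the top
    std::size_t processed = 0;     // frames [0, processed) are safe to read
    std::size_t unsafe_reads = 0;  // reads that would have tripped the assert

    void finish_processing() { processed = bcis.size(); }
    void safepoint_new_gc_cycle() { processed = 0; }  // watermark reset

    int read_frame(std::size_t f) {
        if (f >= processed) {
            ++unsafe_reads;
        }
        return bcis[f];
    }
};

// Phase 1: scan the whole stack into a native buffer. No Java allocation
// happens here, so in this model no safepoint can interleave with the scan.
std::vector<RawFrame> scan(TwoPhaseStack& s) {
    s.finish_processing();
    std::vector<RawFrame> raw;
    raw.reserve(s.bcis.size());
    for (std::size_t f = 0; f < s.bcis.size(); ++f) {
        raw.push_back({f, s.read_frame(f)});
    }
    return raw;
}

// Phase 2: create the result objects from the snapshot only. Every
// "allocation" is assumed to hit a safepoint, the worst case.
std::vector<FrameObject> fill(TwoPhaseStack& s,
                              const std::vector<RawFrame>& raw) {
    std::vector<FrameObject> out;
    for (const RawFrame& r : raw) {
        s.safepoint_new_gc_cycle();       // watermark is reset here
        out.push_back({r.index, r.bci});  // the stack itself is never read
    }
    return out;
}
```

Calling scan and then fill on a three-frame stack yields three frame objects and leaves unsafe_reads at zero, even though the watermark is reset before every allocation.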
> 
>> StefanK
>>
>> On 2021-02-09 15:08, Roman Kennke wrote:
>>> I am getting the same failure with ZGC:
>>>
>>> CONF=linux-x86_64-server-fastdebug make run-test TEST=java/lang/StackWalker TEST_VM_OPTS="-XX:+UnlockExperimentalVMOptions -XX:+UseZGC -XX:ZCollectionInterval=0.01"
>>>
>>>
>>>> Hello all,
>>>>
>>>> When running StackWalker tests with 'aggressive' Shenandoah mode 
>>>> (i.e. run GCs all the time, even if there is no work), then I 
>>>> observe crashes like this:
>>>>
>>>> #  Internal Error (/home/rkennke/src/openjdk/jdk/src/hotspot/share/runtime/stackWatermark.cpp:178), pid=549168, tid=549230
>>>> #  assert(is_frame_safe(f)) failed: Frame must be safe
>>>>
>>>> Full hs_err:
>>>> http://cr.openjdk.java.net/~rkennke/hs_err_pid549168.log
>>>>
>>>> I strongly suspect that this is happening because of StackWalker's 
>>>> use of StackWatermark, which conflicts with the GC's own use of 
>>>> StackWatermark. IOW, it asserts that the frame has been processed, but 
>>>> the GC is still on it.
>>>>
>>>> Are we missing some coordination between StackWalker and the GC here?
>>>>
>>>> It can be reproduced using:
>>>> CONF=linux-x86_64-server-fastdebug make run-test TEST=java/lang/StackWalker TEST_VM_OPTS="-XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC -XX:ShenandoahGCHeuristics=aggressive"
>>>>
>>>> Thanks,
>>>> Roman
>>>
>>




More information about the hotspot-gc-dev mailing list