Conflicting use of StackWatermark in StackWalker vs GC?

Roman Kennke rkennke at redhat.com
Tue Feb 9 23:23:13 UTC 2021



>>> It's interesting that fetchNextBatch process the entire stack in 
>>> preparation for filling in the information about the frames:
>>>
>>>      // If we have to get back here for even more frames, then 1) the 
>>> user did not supply
>>>      // an accurate hint suggesting the depth of the stack walk, and 
>>> 2) we are not just
>>>      // peeking  at a few frames. Take the cost of flushing out any 
>>> pending deferred GC
>>>      // processing of the stack.
>>>      StackWatermarkSet::finish_processing(jt, NULL /* context */, 
>>> StackWatermarkKind::gc);
>>>
>>> but further down in fill_in_frames => LiveFrameStream::fill_frame => 
>>> fill_live_stackframe, we perform object allocation, which could 
>>> safepoint for a GC that would reset the watermark. After leaving that 
>>> safepoint we will have processed the top-most frames, but we won't 
>>> have processed down the the current frame the StackWalker is looking 
>>> at. This is my guess of what's happening, but I haven't been able to 
>>> reproduce the problem, so it's a bit hard to verify that this is 
>>> what's happening.
>>
>> That sounds plausible.
>>
>> What would be a way out of this? Scan the stack and collect all 
>> relevant information without allocating any Java objects yet, and fill 
>> in the Java frames array after the stack scan, maybe?
> 
> We have a way to deal with similar situations:
> 
> // Use this class to mark a remote thread you are currently interested
> // in examining the entire stack, without it slipping into an unprocessed
> // state at safepoint polls.
> class KeepStackGCProcessedMark : public StackObj {
> 
> It installs a link to the other thread, and whenever we hit a safepoint 
> that entire stack is processed. See:
> 
> void StackWatermark::on_safepoint() {
>    start_processing();
>    StackWatermark* linked_watermark = _linked_watermark;
>    if (linked_watermark != NULL) {
>      linked_watermark->finish_processing(NULL /* context */);
>    }
> }
> 
> KeepStackGCProcessedMark isn't reentrant, so we would have to watch out 
> for that.

Wow, this is very useful! I was almost done with separating stack 
scanning and setting up the Java stack frame info objects, but using the 
KeepStackGCProcessedMark it is much simpler:

This seems to work perfectly fine and fix the bug:

https://gist.github.com/rkennke/553b0ac024d6d094ff0784fa56c85fb0

I'll look at it some more and do more testing, and will file a PR 
(unless you disagree).

Thanks!
Roman




More information about the hotspot-gc-dev mailing list