Conflicting use of StackWatermark in StackWalker vs GC?
Roman Kennke
rkennke at redhat.com
Tue Feb 9 23:23:13 UTC 2021
>>> It's interesting that fetchNextBatch process the entire stack in
>>> preparation for filling in the information about the frames:
>>>
>>> // If we have to get back here for even more frames, then 1) the
>>> user did not supply
>>> // an accurate hint suggesting the depth of the stack walk, and
>>> 2) we are not just
>>> // peeking at a few frames. Take the cost of flushing out any
>>> pending deferred GC
>>> // processing of the stack.
>>> StackWatermarkSet::finish_processing(jt, NULL /* context */,
>>> StackWatermarkKind::gc);
>>>
>>> but further down in fill_in_frames => LiveFrameStream::fill_frame =>
>>> fill_live_stackframe, we perform object allocation, which could
>>> safepoint for a GC that would reset the watermark. After leaving that
>>> safepoint we will have processed the top-most frames, but we won't
>>> have processed down the the current frame the StackWalker is looking
>>> at. This is my guess of what's happening, but I haven't been able to
>>> reproduce the problem, so it's a bit hard to verify that this is
>>> what's happening.
>>
>> That sounds plausible.
>>
>> What would be a way out of this? Scan the stack and collect all
>> relevant information without allocating any Java objects yet, and fill
>> in the Java frames array after the stack scan, maybe?
>
> We have a way to deal with similar situations:
>
> // Use this class to mark a remote thread you are currently interested
> // in examining the entire stack, without it slipping into an unprocessed
> // state at safepoint polls.
> class KeepStackGCProcessedMark : public StackObj {
>
> It installs a link to the other thread, and whenever we hit a safepoint
> that entire stack is processed. See:
>
> void StackWatermark::on_safepoint() {
> start_processing();
> StackWatermark* linked_watermark = _linked_watermark;
> if (linked_watermark != NULL) {
> linked_watermark->finish_processing(NULL /* context */);
> }
> }
>
> KeepStackGCProcessedMark isn't reentrant, so we would have to watch out
> for that.
Wow, this is very useful! I was almost done with separating stack
scanning and setting up the Java stack frame info objects, but using the
KeepStackGCProcessedMark it is much simpler:
This seems to work perfectly fine and fix the bug:
https://gist.github.com/rkennke/553b0ac024d6d094ff0784fa56c85fb0
I'll look at it some more and do more testing, and will file a PR
(unless you disagree).
Thanks!
Roman
More information about the hotspot-gc-dev
mailing list