Conflicting use of StackWatermark in StackWalker vs GC?
Stefan Karlsson
stefan.karlsson at oracle.com
Wed Feb 10 08:04:23 UTC 2021
On 2021-02-10 00:23, Roman Kennke wrote:
>
>
>>>> It's interesting that fetchNextBatch process the entire stack in
>>>> preparation for filling in the information about the frames:
>>>>
>>>> // If we have to get back here for even more frames, then 1)
>>>> the user did not supply
>>>> // an accurate hint suggesting the depth of the stack walk,
>>>> and 2) we are not just
>>>> // peeking at a few frames. Take the cost of flushing out any
>>>> pending deferred GC
>>>> // processing of the stack.
>>>> StackWatermarkSet::finish_processing(jt, NULL /* context */,
>>>> StackWatermarkKind::gc);
>>>>
>>>> but further down in fill_in_frames => LiveFrameStream::fill_frame
>>>> => fill_live_stackframe, we perform object allocation, which could
>>>> safepoint for a GC that would reset the watermark. After leaving
>>>> that safepoint we will have processed the top-most frames, but we
>>>> won't have processed down the the current frame the StackWalker is
>>>> looking at. This is my guess of what's happening, but I haven't
>>>> been able to reproduce the problem, so it's a bit hard to verify
>>>> that this is what's happening.
>>>
>>> That sounds plausible.
>>>
>>> What would be a way out of this? Scan the stack and collect all
>>> relevant information without allocating any Java objects yet, and
>>> fill in the Java frames array after the stack scan, maybe?
>>
>> We have a way to deal with similar situations:
>>
>> // Use this class to mark a remote thread you are currently interested
>> // in examining the entire stack, without it slipping into an
>> unprocessed
>> // state at safepoint polls.
>> class KeepStackGCProcessedMark : public StackObj {
>>
>> It installs a link to the other thread, and whenever we hit a
>> safepoint that entire stack is processed. See:
>>
>> void StackWatermark::on_safepoint() {
>> start_processing();
>> StackWatermark* linked_watermark = _linked_watermark;
>> if (linked_watermark != NULL) {
>> linked_watermark->finish_processing(NULL /* context */);
>> }
>> }
>>
>> KeepStackGCProcessedMark isn't reentrant, so we would have to watch
>> out for that.
>
> Wow, this is very useful! I was almost done with separating stack
> scanning and setting up the Java stack frame info objects, but using
> the KeepStackGCProcessedMark it is much simpler:
>
> This seems to work perfectly fine and fix the bug:
>
> https://urldefense.com/v3/__https://gist.github.com/rkennke/553b0ac024d6d094ff0784fa56c85fb0__;!!GqivPVa7Brio!JmXVRlaquMtM5x6DZKv2vBX0ldOTtH_YglZnpL0ogEw1DmsUfW9yl1toV-H2Zfro0ZLj$
>
> I'll look at it some more and do more testing, and will file a PR
> (unless you disagree).
Yes, I think that looks good. I haven't looked too closely at
StackWalk::fetchFirstBatch, but that one might have to be handled as
well. Would be good to get a second opinion from Erik.
Thanks,
StefanK
>
> Thanks!
> Roman
>
More information about the hotspot-gc-dev
mailing list