Conflicting use of StackWatermark in StackWalker vs GC?

Stefan Karlsson stefan.karlsson at oracle.com
Wed Feb 10 08:04:23 UTC 2021



On 2021-02-10 00:23, Roman Kennke wrote:
>
>
>>>> It's interesting that fetchNextBatch process the entire stack in 
>>>> preparation for filling in the information about the frames:
>>>>
>>>>      // If we have to get back here for even more frames, then 1) 
>>>> the user did not supply
>>>>      // an accurate hint suggesting the depth of the stack walk, 
>>>> and 2) we are not just
>>>>      // peeking  at a few frames. Take the cost of flushing out any 
>>>> pending deferred GC
>>>>      // processing of the stack.
>>>>      StackWatermarkSet::finish_processing(jt, NULL /* context */, 
>>>> StackWatermarkKind::gc);
>>>>
>>>> but further down in fill_in_frames => LiveFrameStream::fill_frame 
>>>> => fill_live_stackframe, we perform object allocation, which could 
>>>> safepoint for a GC that would reset the watermark. After leaving 
>>>> that safepoint we will have processed the top-most frames, but we 
>>>> won't have processed down the the current frame the StackWalker is 
>>>> looking at. This is my guess of what's happening, but I haven't 
>>>> been able to reproduce the problem, so it's a bit hard to verify 
>>>> that this is what's happening.
>>>
>>> That sounds plausible.
>>>
>>> What would be a way out of this? Scan the stack and collect all 
>>> relevant information without allocating any Java objects yet, and 
>>> fill in the Java frames array after the stack scan, maybe?
>>
>> We have a way to deal with similar situations:
>>
>> // Use this class to mark a remote thread you are currently interested
>> // in examining the entire stack, without it slipping into an 
>> unprocessed
>> // state at safepoint polls.
>> class KeepStackGCProcessedMark : public StackObj {
>>
>> It installs a link to the other thread, and whenever we hit a 
>> safepoint that entire stack is processed. See:
>>
>> void StackWatermark::on_safepoint() {
>>    start_processing();
>>    StackWatermark* linked_watermark = _linked_watermark;
>>    if (linked_watermark != NULL) {
>>      linked_watermark->finish_processing(NULL /* context */);
>>    }
>> }
>>
>> KeepStackGCProcessedMark isn't reentrant, so we would have to watch 
>> out for that.
>
> Wow, this is very useful! I was almost done with separating stack 
> scanning and setting up the Java stack frame info objects, but using 
> the KeepStackGCProcessedMark it is much simpler:
>
> This seems to work perfectly fine and fix the bug:
>
> https://urldefense.com/v3/__https://gist.github.com/rkennke/553b0ac024d6d094ff0784fa56c85fb0__;!!GqivPVa7Brio!JmXVRlaquMtM5x6DZKv2vBX0ldOTtH_YglZnpL0ogEw1DmsUfW9yl1toV-H2Zfro0ZLj$ 
>
> I'll look at it some more and do more testing, and will file a PR 
> (unless you disagree).

Yes, I think that looks good. I haven't looked too closely at 
StackWalk::fetchFirstBatch, but that one might have to be handled as 
well. Would be good to get a second opinion from Erik.

Thanks,
StefanK

>
> Thanks!
> Roman
>




More information about the hotspot-gc-dev mailing list