RFR: 8261448: Preserve GC stack watermark across safepoints in StackWalk
Roman Kennke
rkennke at openjdk.java.net
Fri Feb 12 21:45:57 UTC 2021
On Wed, 10 Feb 2021 12:38:10 GMT, Roman Kennke <rkennke at openjdk.org> wrote:
>> I am observing the following assert:
>>
>> # Internal Error (/home/rkennke/src/openjdk/loom/src/hotspot/share/runtime/stackWatermark.cpp:178), pid=54418, tid=54534
>> # assert(is_frame_safe(f)) failed: Frame must be safe
>>
>> (see issue for full hs_err)
>>
>> In StackWalk::fetchNextBatch() we prepare the entire stack to be processed by calling StackWatermarkSet::finish_processing(jt, NULL, StackWatermarkKind::gc), but then subsequently, during frames scan, perform allocations to fill in the frame information (fill_in_frames => LiveFrameStream::fill_frame => fill_live_stackframe) at where we could safepoint for GC, which could reset the stack watermark.
>>
>> This is only relevant for GCs that use the StackWatermark, e.g. ZGC and Shenandoah at the moment.
>>
>> Solution is to preserve the stack-watermark across safepoints in StackWalk::fetchNextBatch(). StackWalk::fetchFirstBatch() doesn't look to be affected by this: it is not using the stack-watermark.
>>
>> Testing:
>> - [x] StackWalk tests with Shenandoah/aggressive
>> - [x] StackWalk tests with ZGC/aggressive
>> - [ ] tier1 (+Shenandoah/ZGC)
>> - [ ] tier2 (+Shenandoah/ZGC)
>
> I'm converting back to draft. The Loom tests (test/jdk/java/lang/Continuation/*) are still failing and it looks like fetchFirstBatch() does indeed require treatment, and it's complicated because fetchFirstBatch() may end up calling fetchNextBatch() and the KeepStackGCProcessedMark is not reentrant.
I tested the original patch in Loom with tests that use stack-walking and it failed because we'd need another KeepStackGCProcessedMark in fetchFirstBatch() too. Unfortunately, fetchFirstBatch() can wind up calling fetchNextBatch() recursively, but we *also* can call fetchNextBatch() without calling fetchFirstBatch() on outer frame, thus we need KeepStackGCProcessedMark to be reentrant. I achieved this by linking together nested linked watermark. I am not sure this is the right way to achieve it. It fixes all tests in Loom *and* mainline JDK though.
-------------
PR: https://git.openjdk.java.net/jdk/pull/2500
More information about the hotspot-gc-dev
mailing list