RFR: 8253180: ZGC: Implementation of JEP 376: ZGC: Concurrent Thread-Stack Processing [v10]

Tue Oct 6 02:42:49 UTC 2020

On Mon, 5 Oct 2020 11:43:52 GMT, Erik Österlund <eosterlund at openjdk.org> wrote:

>> This PR is the implementation of "JEP 376: ZGC: Concurrent Thread-Stack Processing" (cf.
>> https://openjdk.java.net/jeps/376).
>> Basically, this patch modifies the epilog safepoint when returning from a frame (supporting interpreter frames, c1, c2,
>> and native wrapper frames), to compare the stack pointer against a thread-local value. This turns return polls into
>> more of a Swiss Army knife that can poll for safepoints and handshakes, but also for returns into frames that are not
>> yet safe to expose, denoted by a "stack watermark". ZGC will leave frames (and other thread oops) in a messy state in
>> the GC checkpoint safepoints, rather than processing all threads and their stacks. Processing is initialized
>> automagically when threads wake up for a safepoint, or get poked by a handshake or safepoint. Said initialization
>> processes a few (3) frames and other thread oops. The rest, the bulk of the frame processing, is deferred until it is
>> actually needed. It is needed when a frame is exposed to either 1) execution (returns or unwinding due to exception
>> handling), or 2) stack walker APIs. A hook is then run to go and finish the lazy processing of frames.  Mutator and GC
>> threads can compete for processing. The processing is therefore performed under a per-thread lock. Note that disarming
>> of the poll word (that the returns are comparing against) is only performed by the thread itself. So sliding the
>> watermark up will require one runtime call for a thread to note that nothing needs to be done, and then update the poll
>> word accordingly. Downgrading the poll word concurrently by other threads was simply not worth the complexity it
>> brought (and is only possible on TSO machines), so I left that one out.
>
> Erik Österlund has updated the pull request with a new target base due to a merge or a rebase. The pull request now
> contains 16 commits:
>  - Review: Deal with new assert from mainline
>  - Merge branch 'master' into 8253180_conc_stack_scanning
>  - Review: StackWalker hook
>  - Review: Kim CR 1 and exception handling fix
>  - Review: Move barrier detach
>  - Review: Remove assert that has outstayed its welcome
>  - Merge branch 'master' into 8253180_conc_stack_scanning
>  - Review: Albert CR2 and defensive programming
>  - Review: StefanK CR 3
>  - Review: Per CR 1
>  - ... and 6 more: https://git.openjdk.java.net/jdk/compare/9604ee82...e633cb94
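
For readers following along, here is a minimal sketch of the return-poll idea described in the quoted summary. It is not the patch's code. It assumes a downward-growing stack and an unsigned comparison, and `ThreadPollState`, `arm_for_all_returns`, `disarm`, `slide_watermark_up` and `return_needs_slow_path` are hypothetical names made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a thread-local poll word; not HotSpot's data structure.
struct ThreadPollState {
    // Assumption: frames at addresses above this value are not yet safe to expose.
    // The stack grows downward, so callers live at higher addresses than callees.
    std::uintptr_t poll_word;
};

// Arm the poll so that every return takes the slow path
// (for example, a safepoint or handshake is pending).
inline void arm_for_all_returns(ThreadPollState& t) { t.poll_word = 0; }

// Disarm the poll completely: no return takes the slow path.
inline void disarm(ThreadPollState& t) { t.poll_word = UINTPTR_MAX; }

// Slide the watermark up once more caller frames have been processed.
// In the quoted description only the owning thread updates its poll word.
inline void slide_watermark_up(ThreadPollState& t, std::uintptr_t new_watermark) {
    assert(new_watermark >= t.poll_word);
    t.poll_word = new_watermark;
}

// The return poll: sp_after_return is the stack pointer once the callee frame
// has been popped. Returning above the watermark requires the slow path.
inline bool return_needs_slow_path(const ThreadPollState& t,
                                   std::uintptr_t sp_after_return) {
    return sp_after_return > t.poll_word;
}
```

The point of the single comparison is that one thread-local word can express "poll for everything", "poll for nothing", and "poll only when returning past this address".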

src/hotspot/share/runtime/stackWatermark.cpp line 223:

> 221: void StackWatermark::yield_processing() {
> 222:   update_watermark();
> 223:   MutexUnlocker mul(&_lock, Mutex::_no_safepoint_check_flag);

This seems a little dubious - is it just a heuristic? There is no guarantee that unlocking the Mutex will allow another
thread to claim it before this thread re-locks it.
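
To make the concern concrete, here is a rough sketch of what a guaranteed hand-off could look like, using a FIFO ticket lock built on the C++ standard library. This is not HotSpot's `Mutex` and not a proposal for this patch; `TicketLock` and `demo_yield_order` are hypothetical names. With a FIFO discipline, an unlock followed by a re-lock queues the yielding thread behind any thread that is already waiting, which a plain unlock/lock pair does not promise.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <string>
#include <thread>

// Hypothetical FIFO lock; illustrative only.
class TicketLock {
public:
    void lock() {
        std::unique_lock<std::mutex> guard(_m);
        const unsigned long my_ticket = _next_ticket++;
        _cv.wait(guard, [&] { return _now_serving == my_ticket; });
    }
    void unlock() {
        {
            std::lock_guard<std::mutex> guard(_m);
            assert(_now_serving < _next_ticket);
            ++_now_serving;  // hand the lock to the next ticket holder
        }
        _cv.notify_all();
    }
    // Total tickets handed out so far (holder plus waiters plus past owners).
    unsigned long tickets_issued() {
        std::lock_guard<std::mutex> guard(_m);
        return _next_ticket;
    }
private:
    std::mutex _m;
    std::condition_variable _cv;
    unsigned long _next_ticket = 0;
    unsigned long _now_serving = 0;
};

// The main thread holds the lock, waits until a contender is queued,
// then yields by unlocking and re-locking. Returns the acquisition order.
inline std::string demo_yield_order() {
    TicketLock lock;
    std::string order;
    lock.lock();  // main takes ticket 0
    std::thread contender([&] {
        lock.lock();  // contender takes ticket 1 and waits
        order += 'B';
        lock.unlock();
    });
    while (lock.tickets_issued() < 2) {
        std::this_thread::yield();  // wait until the contender is queued
    }
    lock.unlock();  // yield: ticket 1 is served next
    lock.lock();    // main takes ticket 2, behind the contender
    order += 'A';
    lock.unlock();
    contender.join();
    return order;
}
```

Here the contender always runs before the yielding thread re-acquires the lock. Without a fairness guarantee of this kind, the yielding thread may simply win the lock again.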

src/hotspot/share/runtime/stackWatermark.hpp line 91:

> 89:   JavaThread* _jt;
> 90:   StackWatermarkFramesIterator* _iterator;
> 91:   Mutex _lock;

How are you guaranteeing that the Mutex is unused at the time the StackWatermark is deleted?
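
To illustrate the invariant I am asking about, here is a small sketch in which the lock cannot be destroyed while another party may still use it. It uses standard C++ shared ownership purely for illustration; HotSpot does not manage these objects this way, the patch may rely on a different mechanism, and `WatermarkState`, `ThreadOwner` and `process_one_frame` are hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <mutex>

// Hypothetical stand-in for per-thread watermark state; not HotSpot's class.
struct WatermarkState {
    std::mutex lock;
    int frames_processed = 0;
};

// Stand-in for the owning thread, which may terminate at any time.
class ThreadOwner {
public:
    ThreadOwner() : _state(std::make_shared<WatermarkState>()) {}

    // Anyone who may lock the state takes a reference that keeps it alive.
    std::shared_ptr<WatermarkState> share() const { return _state; }

    // Thread exit drops only the owner's reference; the state is freed
    // once the last outstanding reference is released.
    void terminate() { _state.reset(); }

private:
    std::shared_ptr<WatermarkState> _state;
};

// Stand-in for a GC-side helper that processes frames under the lock.
inline int process_one_frame(const std::shared_ptr<WatermarkState>& s) {
    assert(s != nullptr);
    std::lock_guard<std::mutex> guard(s->lock);
    return ++s->frames_processed;
}
```

In this sketch a helper that still holds a reference after `terminate()` can lock and unlock safely, and the mutex is destroyed only after that reference is dropped. The question is what provides the equivalent guarantee for `_lock` here.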

-------------

PR: https://git.openjdk.java.net/jdk/pull/296