RFR: JDK-8132849: Increased stop time in cleanup phase because of single-threaded walk of thread stacks in NMethodSweeper::mark_active_nmethods()

Sun Sep 23 22:08:48 UTC 2018

Hi Roman,

Thank you for sorting this out. It is very helpful.

Could you change the name of ThreadToCodeBlobClosure to 
NMethodMarkingThreadClosure. (Motivation: the closure filters 
JavaThreads that are not the sweeper, and actually only looks at 
nmethods, and not other types of CodeBlobs, e.g. AoT methods, so it does 
less than I expect).

Also, the NMethodSweeper::prepare_mark_active_nmethods() was built for 
safepoint cleaning and returns either hotness counting or nmethod 
marking closures. However, when moving nmethod marking out to be done 
concurrently with TLH, it is slightly confusing to have the same member 
function called from the concurrent context despite never ever wanting a 
hotness counter closure from there.

I'm thinking the prepare_mark_active_nmethods() member function could be 
split into two:

One member function that returns either nmethod marking closure or NULL 
(depending on whether it's needed or not).
Another member function that calls the first one, and if NULL slaps on a 
hotness counter closure.

Then from concurrent contexts we would call the first method (nmethod 
marking or NULL), and from STW contexts we would call the second member 
function (nmethod marking or hotness counter).

Another thing worth noticing is that the VM_MarkActiveNMethods VM 
operation marks the nmethods on the stack twice. First in safepoint 
cleanup, and subsequently in the operation itself 
(VM_MarkActiveNMethods::doit). I would argue that only one pass is 
enough. Therefore, I would propose to completely remove the nmethod 
marking from the safepoint cleanup, and have safepoint cleanup *only* 
fiddle around with hotness counters. If we do that, then nmethod marking 
is done in VM_MarkActiveNMethods::doit if TLH is off, and in your new 
handshake operation when TLH is on.

Then we can have zero nmethod marking in safepoint cleanup, and 
subsequently figure out how to get rid of the hotness counters.

Thanks,
/Erik

On 2018-09-23 20:47, Roman Kennke wrote:
> Hi David,
>
> thanks for looking at this!
>
>> Should compiler folk be looking at this as well?
> Maybe. I added them.
>
>> I'm not familiar with the details of the NMethodSweeper but it seems to
>> me that this change potentially allows multiple concurrent executions of
>> NMethodSweeper::prepare_mark_active_nmethods() and that code does not
>> appear to be thread-safe.
> There are two scenarios now:
> - TLHS enabled: NMethodSweeper::prepare_mark_active_nmethods() only gets
> called from the sweeper thread.
> - TLHS disabled: NMethodSweeper::prepare_mark_active_nmethods() only
> gets called from VMThread/at-safepoint.
>
> The structures used in NMethodSweeper::prepare_mark_active_nmethods()
> are only ever called from sweeper thread, or at safepoint, and those are
> exclusive, that means it should be safe. And instead of removing the
> assert, we can extend it to accept the sweeper thread. I also noticed
> that we need to grab the CodeCache_lock before calling into
> prepare_mark_active_nmethods() so I added that and put that into the
> assertion.
>
> Incremental webrev:
> http://cr.openjdk.java.net/~rkennke/JDK-8132849/webrev.02.diff/
> Full webrev:
> http://cr.openjdk.java.net/~rkennke/JDK-8132849/webrev.02/
>
> Better now?
>
> Thanks,
> Roman
>
>
>> Thanks,
>> David
>>
>> On 21/09/2018 9:57 AM, Roman Kennke wrote:
>>> Please review the following change to improve and/or eliminate stop to
>>> to mark stacks for NMethodSweeper.
>>>
>>> The proposed change is two-fold:
>>> - If ThreadLocalHandshake is enabled, do the stack-nmethod-marking using
>>> TLHS. This completely eliminates the full safepoint. In this scenario,
>>> nmethod-marking will also be skipped during safepoint-cleanup. IOW, it
>>> only happens when the sweeper loop asks for it. It is also most
>>> efficient because each thread scans its own stack, without requiring to
>>> synchronize with other threads. Everything remains free-running.
>>> - Otherwise, try to use GC-safepoint-workers to do the marking at SP.
>>> The infrastructure for this is already there since some time, and both
>>> G1 and ZGC (and Shenandoah, when it arrives) support it. The
>>> safepoint-cleanup-phase already uses it, so let's just do the same in
>>> sweeper-loop-induced safepoints.
>>>
>>> Bug:
>>> https://bugs.openjdk.java.net/browse/JDK-8132849
>>> Webrev:
>>> http://cr.openjdk.java.net/~rkennke/JDK-8132849/webrev.01/
>>>
>>> Testing: hotspot/jtreg:tier1 using +ThreadLocalHandshakes and
>>> -ThreadLocalHandshakes
>>>
>>> One issue that I am not sure of is the:
>>>
>>> assert(SafepointSynchronize::is_at_safepoint(), "must be executed at a
>>> safepoint");
>>>
>>> at the start of NMethodSweeper::prepare_mark_active_nmethods().
>>>
>>> I couldn't see any particular reason for it. The
>>> wait_for_stack_scanning() stuff is called outside safepoinst anyway, and
>>> the other stuff doesn't seem critical. And besides, in the scenario
>>> where we'd call this outside safepoint (+ThreadLocalHandshakes) we'd
>>> only ever call it from the sweeper thread anyway.
>>>
>>> What do you think?
>>>
>>>
>