RFR (L): 8230706: Waiting on completion of strong nmethod processing causes long pause times with G1

Thomas Schatzl thomas.schatzl at oracle.com
Wed Sep 25 10:42:51 UTC 2019


Hi all,

   can I have reviews for this change that fixes a regression 
introduced in JDK 11?

So, currently in G1, in the concurrent start pause we have to mark 
through all live oops referenced from nmethods on thread stacks to 
keep them alive across the concurrent marking in the presence of class 
unloading. These are the "strong" nmethods.

However, G1 also scans the oops of nmethods as part of the per-region 
nmethod remembered sets, but must not mark those oops, as their 
classes should remain unloadable. These are the "weak" nmethods.

In both kinds of processing, nmethods are claimed to avoid processing 
them multiple times, which would otherwise happen frequently. This 
means that if an nmethod is claimed first via the "weak" path, its 
oops will not be marked properly, and *boom*.
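
To make the race concrete, here is a minimal standalone C++ sketch 
(not HotSpot code; FakeNMethod and try_claim are invented for 
illustration) of the claiming idea: a per-nmethod claim flag set with 
a CAS lets exactly one path process an nmethod, so if the weak path 
wins the race, the strong path skips it and its oops never get marked:

#include <atomic>
#include <cstdio>

struct FakeNMethod {
  std::atomic<bool> claimed{false};

  // Returns true for exactly one caller; every later caller sees
  // "already claimed" and skips the nmethod.
  bool try_claim() {
    bool expected = false;
    return claimed.compare_exchange_strong(expected, true);
  }
};

int main() {
  FakeNMethod nm;
  // The weak path gets there first...
  if (nm.try_claim()) {
    std::printf("weak path processes nmethod (no marking)\n");
  }
  // ...so the strong path is locked out and never marks its oops.
  if (!nm.try_claim()) {
    std::printf("strong path skips nmethod -> oops stay unmarked\n");
  }
  return 0;
}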

To prevent this, there is currently a hard synchronization point, i.e. 
a wait barrier, between strong nmethod processing and the actual 
evacuation (which ultimately does the weak nmethod processing).

This is a problem, particularly since JDK 11, because stacks are 
claimed on a per-thread basis, i.e. a single extremely deep stack can 
leave hundreds or thousands of your CPU cores waiting idly. The 
"particularly since JDK 11" part is that since then the amount of work 
done during nmethod iteration is bounded only by the amount of live 
objects to copy (JDK-6672778 added task queue trimming, which is 
generally a good thing), and that can take a lot of time.
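
The effect of that wait barrier can be shown with a small standalone 
C++20 sketch (not HotSpot code; worker count and timings are 
invented): every worker must pass a barrier after its strong 
nmethod/stack processing before any of them may start evacuating, so 
one straggler with a deep stack or lots of copying stalls all the 
others:

#include <barrier>
#include <chrono>
#include <cstdio>
#include <thread>
#include <vector>

int main() {
  const int workers = 4;
  std::barrier sync(workers);

  auto worker = [&sync](int id, int strong_work_ms) {
    // "Strong" processing of wildly different lengths per worker.
    std::this_thread::sleep_for(std::chrono::milliseconds(strong_work_ms));
    std::printf("worker %d done with strong processing\n", id);
    sync.arrive_and_wait();   // everyone idles until the slowest worker
    std::printf("worker %d starts evacuation\n", id);
  };

  std::vector<std::thread> threads;
  for (int i = 0; i < workers; i++) {
    threads.emplace_back(worker, i, i == 0 ? 200 : 5);  // worker 0 straggles
  }
  for (auto& t : threads) {
    t.join();
  }
  return 0;
}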

In some internal Cassandra setup we have seen ridiculously long 
waiting times as a single thread apparently evacuated the whole young 
gen... :(

This change moves the wait to the end of the evacuation, remembering 
any nmethods that were claimed by the weak nmethod processing before 
the strong nmethod processing got to them; their oops are then marked 
in a post-evacuation phase.
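
Below is a much-simplified standalone sketch of the general idea, not 
the actual patch (all class and function names are invented, and the 
real change only needs to remember the nmethods that lost the race 
against the strong path - see the webrev for details): during a 
concurrent start pause the weak path records the nmethods it claims, 
and a post-evacuation phase marks through the recorded ones:

#include <cstdio>
#include <vector>

struct FakeNMethod { int id; };

struct WeakNMethodClosure {
  bool during_concurrent_start;
  std::vector<FakeNMethod*> remembered;   // claimed via the weak path

  void do_nmethod(FakeNMethod* nm) {
    // ... normal weak processing of nm's oops (no marking) ...
    if (during_concurrent_start) {
      remembered.push_back(nm);   // defer the marking instead of waiting
    }
  }
};

// Post-evacuation phase: mark through the (few) remembered nmethods.
void mark_remembered(const std::vector<FakeNMethod*>& nms) {
  for (FakeNMethod* nm : nms) {
    std::printf("post-evacuation: mark oops of nmethod %d\n", nm->id);
  }
}

int main() {
  FakeNMethod a{1}, b{2};
  WeakNMethodClosure cl{true, {}};
  cl.do_nmethod(&a);
  cl.do_nmethod(&b);
  mark_remembered(cl.remembered);
  return 0;
}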

Since the number of nmethods that need to be marked through this way 
has always been very low (single digit), that phase typically took 
<0.1ms. There is nevertheless some attempt to parallelize this phase 
based on the number of nmethods (this pass only needs to mark the oops 
<= TAMS in those nmethods, nothing more).
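
For illustration, here is a standalone sketch of the TAMS check 
mentioned above (not HotSpot code, all names invented): only oops 
pointing below a region's top-at-mark-start need explicit marking in 
this pass, since objects allocated at or above TAMS after marking 
started are treated as implicitly live for this cycle:

#include <cstdint>
#include <cstdio>

struct FakeRegion {
  uintptr_t bottom, tams, top;
};

void mark(uintptr_t obj) {
  std::printf("mark object at %#lx\n", (unsigned long)obj);
}

void maybe_mark(const FakeRegion& r, uintptr_t obj) {
  if (obj < r.tams) {
    mark(obj);   // below TAMS: needs explicit marking
  }
  // at or above TAMS: allocated since marking started, implicitly live
}

int main() {
  FakeRegion r{0x1000, 0x2000, 0x3000};
  maybe_mark(r, 0x1800);   // below TAMS -> marked
  maybe_mark(r, 0x2800);   // above TAMS -> skipped
  return 0;
}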

I will soon look into merging the parallel phases in the 
post-evacuation phase into a single one, to get rid of this additional 
spin-up of threads (which only happens during concurrent mark) again.

During this work I tried several alternatives that were rejected:
- disabling task queue trimming: that works, but still has the problem 
  with deep thread stacks
- moving the actual wait deep into the evacuation code: made a mess of 
  the code, and still does not really solve the problem
- instead of remembering nmethods, processing them concurrently: does 
  not work because on x86 embedded oops are not word-aligned, so they 
  cannot be read or updated atomically, which makes this a bad idea.

There is always the option of doing the stack scanning in the 
concurrent phase; that seemingly requires much more work, e.g. using 
method return barriers, and has been left as a future enhancement.

CR:
https://bugs.openjdk.java.net/browse/JDK-8230706
Webrev:
http://cr.openjdk.java.net/~tschatzl/8230706/webrev/
Testing:
hs-tier1-5, many cassandra runs - no more exceptionally long concurrent 
start pauses :)

Thanks,
   Thomas


