RFR (L): 8230706: Waiting on completion of strong nmethod processing causes long pause times with G1
Thomas Schatzl
thomas.schatzl at oracle.com
Wed Sep 25 10:42:51 UTC 2019
Hi all,
can I have reviews for this change that fixes a regression introduced
in jdk11?
So, currently in G1, in the concurrent start pause we have to mark
through all live oops referenced from thread stacks to keep them alive
across the concurrent marking in the presence of class unloading. These
are the "strong" nmethods.
However, G1 also scans the oops of nmethods as part of the per-region
nmethod remembered sets, but must not mark those oops, as their classes
should remain unloadable. These are the "weak" nmethods.
With either way of processing, nmethods are claimed to prevent
processing them multiple times, which would otherwise happen often.
This means that if an nmethod is claimed first via the "weak" path, its
oops will not be marked properly, and *boom*.
In order to prevent this, currently there is a hard synchronization,
i.e. a wait barrier between strong nmethod processing and the actual
evacuation (which ultimately does the weak nmethod processing).
This is a problem, particularly since jdk11, because stacks are claimed
on a per-thread basis. I.e. an extremely deep stack could make hundreds
or thousands of your CPU cores wait idly. The "particularly since
jdk11" part is that since then the amount of work done during nmethod
iteration is bounded only by the amount of live objects to copy (i.e.
JDK-6672778 added task queue trimming, which is generally a good
thing), which can take a lot of time.
In this case, in some internal Cassandra setup we have seen ridiculously
long waiting times as a single thread apparently evacuates the whole
young gen... :(
This change moves the wait to the end of the evacuation, remembering any
nmethods that were claimed by the weak nmethod processing before the
strong nmethod processing got to it.
Since the number of nmethods that need to be marked through has always
been very low (single digit), that phase typically took <0.1ms. There
is nevertheless some attempt to parallelize this phase based on the
number of nmethods (this pass only needs to mark the oops <= TAMS in
those nmethods, nothing more).
I will look into merging parallel phases into a single one in the
post-evacuation phase soon to get rid of this additional spin-up of
threads (only during concurrent mark) again.
During this work I tried several alternatives that were rejected:
- disabling task queue trimming: that works, but still has the problem
with deep thread stacks
- moving the actual wait deep into the evacuation code: made a mess of
the code, and still does not really solve the problem
- instead of remembering nmethods, processing them concurrently: does
not work because embedded oops on x86 are not word-aligned, so this is
not a good idea.
There is always the option of doing the stack scanning in the
concurrent phase: that seemingly requires much more work, e.g. by using
method return barriers, and has been left as a future enhancement.
CR:
https://bugs.openjdk.java.net/browse/JDK-8230706
Webrev:
http://cr.openjdk.java.net/~tschatzl/8230706/webrev/
Testing:
hs-tier1-5, many cassandra runs - no more exceptionally long concurrent
start pauses :)
Thanks,
Thomas