Class unloading in ZGC

Erik Österlund erik.osterlund at oracle.com
Mon Dec 7 13:26:11 UTC 2020


Hi Liang,

Sorry, I don't know if I understand what you are referring to 
specifically. I think you
are talking about what happens when class unloading is enabled, am I right?

If so, then there is indeed a difference between G1 and ZGC. They both 
scan the stacks,
marking through on-stack nmethods. But ZGC also arms nmethod entry 
barriers, to lazily
mark through nmethod oops. Here is why.

ZGC needs to color all nmethod oops as "marked" before exposing them to 
mutator threads.
We also think that explicitly marking oops exposed to mutators is the 
most robust way
of treating these oops, as they are indeed weak until used. So marking 
them in the nmethod
entry barrier during concurrent marking is in spirit very similar to 
applying a weak load
barrier on Reference.get(), which G1 also does.
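
To make the Reference.get() analogy concrete, here is a minimal sketch of a
weak load barrier that keeps the referent alive while marking is active. It is
a toy model, not HotSpot code: WeakRef, ToyMarker and weak_load are
hypothetical names, and objects are represented by plain strings.

```python
class WeakRef:
    """Toy stand-in for a java.lang.ref.Reference (hypothetical model)."""

    def __init__(self, referent):
        self.referent = referent


class ToyMarker:
    """Toy concurrent marker; all names here are illustrative assumptions."""

    def __init__(self):
        self.marking_active = False
        self.marked = set()

    def weak_load(self, ref):
        """Reference.get()-style load: mark the referent before exposing it."""
        obj = ref.referent
        if self.marking_active and obj is not None:
            # The referent was only weakly reachable until this load, so the
            # barrier marks it explicitly before the mutator can use it.
            self.marked.add(obj)
        return obj
```

An nmethod entry barrier plays the same role for oops embedded in compiled
code: the oops are marked at the point where they become visible to a mutator.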

The contract with a SATB collector like G1 is that we need to apply 
barriers when loading
a weak oop. The nmethod oops are weak. So not applying nmethod entry 
barriers does seem
like a violation of the SATB invariant, for G1. However, people are 
arguing that it is okay,
as all oops embedded in nmethods that are reachable by mutators during 
concurrent marking
will be marked through. That is okay, as long as the 
compiler knows about SATB,
and hence what oops it is allowed to embed in the nmethods. If the 
compiler were to, for example,
embed a string from the string table that is not necessarily 
reachable from the holders
of the inlined methods, then this approach would crash as the 
violation of the SATB
contract would suddenly become more visible. By using nmethod entry 
barriers, this logic
becomes more robust, as the compiler does not have to know what oops it 
may or may not embed
into the code stream, as we explicitly apply barriers.
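
The string table scenario can be sketched with a toy SATB model. Everything
here is a simplifying assumption: the object names ("root", "holder", "S"),
the helper functions, and the fact that marking is run to completion before
the mutator acts, which stands in for "holder was already visited".

```python
def mark_from(roots, graph):
    """Return the set of objects reachable from roots in graph."""
    marked, stack = set(), list(roots)
    while stack:
        obj = stack.pop()
        if obj not in marked:
            marked.add(obj)
            stack.extend(graph.get(obj, []))
    return marked


def missed_by_marking(use_entry_barrier):
    """Return live objects that the toy marking cycle failed to mark."""
    # "S" models a string-table entry: it exists in the heap, but it is not
    # strongly reachable from the snapshot taken at mark start.
    graph = {"root": ["holder"], "holder": [], "S": []}
    embedded_oops = ["S"]  # oops embedded in a not-yet-called nmethod

    # Marking visits the snapshot; "holder" has now been scanned.
    marked = mark_from(["root"], graph)

    # A mutator calls the nmethod during concurrent marking.
    if use_entry_barrier:
        # The entry barrier explicitly marks the embedded oops.
        marked |= mark_from(embedded_oops, graph)

    # The nmethod stores S into the already-scanned holder. A SATB pre-write
    # barrier only records the overwritten value, so it never sees S.
    graph["holder"].append("S")

    live = mark_from(["root"], graph)
    return live - marked
```

In this model missed_by_marking(False) returns {"S"}, a live but unmarked
object, while missed_by_marking(True) returns an empty set.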

While the robustness reason is one reason to do this dance regardless, 
we certainly do also
need to apply the right colors in ZGC to the pointers, regardless of 
whether we would trust
the actual objects to be marked or not. And, in order to deal with 
relocation properly, we
needed something like nmethod entry barriers anyway, as a mutator really 
is not allowed to
see not yet relocated oops. So with this mechanism already in place, it 
made sense to use it
for marking as well, solving 3 problems at the same time: 1) ensuring 
the objects are marked
in a more robust way, 2) ensuring the colors of oops in exposed nmethods are 
good during marking, and
3) dealing with concurrent relocation.
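
As a rough illustration of how one mechanism can cover all three, here is a
toy entry barrier whose slow path depends on the current GC phase. It is a
sketch under stated assumptions, not ZGC's implementation: ToyCollector, the
dict-based nmethod and oop records, the color strings and the forwarding
table are all hypothetical.

```python
class ToyCollector:
    """Toy model of phase-dependent nmethod entry barriers (hypothetical)."""

    def __init__(self):
        self.phase = "idle"
        self.marked = set()
        self.forwarding = {}  # old address -> new address

    def start_phase(self, phase, nmethods):
        """Enter a new phase and arm the entry barrier of every nmethod."""
        self.phase = phase
        for nm in nmethods:
            nm["armed"] = True

    def entry_barrier(self, nm):
        """Run on nmethod entry; the slow path is taken only when armed."""
        if not nm["armed"]:
            return
        for oop in nm["oops"]:
            if self.phase == "mark":
                # Problems 1 and 2: mark the object and fix the color.
                self.marked.add(oop["addr"])
                oop["color"] = "marked"
            elif self.phase == "relocate":
                # Problem 3: never expose a not-yet-relocated oop.
                oop["addr"] = self.forwarding.get(oop["addr"], oop["addr"])
                oop["color"] = "remapped"
        nm["armed"] = False  # later entries take the fast path
```

The point of the sketch is only that arming is done once per phase, and the
work is deferred until a mutator actually enters the nmethod.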

I have argued that G1 should also use nmethod entry barriers to 
explicitly enforce its SATB
invariant, regarding these weak oops, and that the way they are treated 
today is not robust.
In fact, that is indeed being done in the loom repo, and is likely to 
become the standard way
of dealing with concurrent marking w.r.t. nmethods, for all concurrently 
marking GCs in HotSpot.

Hope this helps, and that I got your question right.

Thanks,
/Erik

On 2020-12-07 13:47, Liang Mao wrote:
> Hi Erik,
>
> If we only consider the pause-time thread root processing in 
> JDK 12-15:
> compared to G1, which only marks the on-stack nmethods at the mark start pause
> without an nmethod entry barrier, ZGC marks the on-stack nmethods
> at the mark start pause and also uses the nmethod entry barrier to do the marking.
> Is the additional marking by the nmethod entry barrier specific behavior 
> required by
> the colored pointer mechanism?
>
> Thanks,
> Liang
>
>
>
>     ------------------------------------------------------------------
>     From:Erik Österlund <erik.osterlund at oracle.com>
>     Send Time:2020 Dec. 7 (Mon.) 20:08
>     To:"MAO, Liang" <maoliang.ml at alibaba-inc.com>; zgc-dev
>     <zgc-dev at openjdk.java.net>
>     Subject:Re: Class unloading in ZGC
>
>     Hi Liang,
>
>     On 2020-12-07 12:48, Liang Mao wrote:
>
>      Hi Erik,
>
>
>     Appreciate your comprehensive reply!
>
>     I still have a few questions.
>
>     > -----Original Message-----
>
>     > From: Erik Österlund [mailto:erik.osterlund at oracle.com]
>
>     > Sent: December 7, 2020 18:35
>
>     > To: Liang Mao <maoliang.ml at alibaba-inc.com>; zgc-dev <zgc-
>
>     > dev at openjdk.java.net>
>
>     > Subject: Re: Class unloading in ZGC
>
>     > 
>
>     > Hi Liang,
>
>     > 
>
>     > So there are two distinct cases. Class unloading enabled (default),
>     and class
>
>     > unloading disabled (seemingly for people that just really want to
>     have memory
>
>     > leaks for no apparent good reason).
>
>     > 
>
>     > When class unloading is enabled, the code cache comprises weak
>     roots, except
>
>     > oops that are on-stack that are treated as strong. These
>     semantics are the same
>
>     > across all GCs.
>
>     > When marking starts, ZGC
>
>     > lazily processes the snapshot of nmethods that were on-stack when
>     marking
>
>     > started, with lazy application of nmethod entry barriers. These
>     barriers will mark
>
>
>     Sorry, I should have mentioned that I was looking at the code of
>     8214897: ZGC: Concurrent Class Unloading.
>
>     It handled the on-stack nmethods at pause time. Do you mean the
>     pause processing
>
>     is not necessary in that patch and the nmethod walking can be
>     delayed as long as the nmethod
>
>     entry barrier is there?
>
>     On the other hand, if the on-stack nmethods are processed at pause time
>     at mark start, is the nmethod
>
>     entry barrier unnecessary?
>
>
>     What I was describing is what we do today, as opposed to what we
>     did in JDK12.
>
>     Back then, we did not have concurrent stack processing, which we
>     do have today. Therefore,
>     in that patch, I had to process stacks in a safepoint. Moreover,
>     when class unloading is disabled,
>     I walked the code cache in a safepoint. I was not feeling very
>     motivated to optimize the case when
>     class unloading is disabled, as there is pretty much no reason I
>     can think of why you would want
>     to disable it. It's just a memory leak with no benefit, to disable
>     class unloading. For other collectors
>     class unloading might come at a latency cost. But for ZGC it does
>     not. So there does not seem to exist
>     any form of trade-off.
>
>     Since concurrent stack processing was integrated, there is no
>     longer any need for processing
>     the on-stack nmethods in safepoints, so that has been moved out of
>     safepoints and is instead
>     concurrently, incrementally and cooperatively applied through lazy
>     nmethod entry barriers as
>     the mutators return into frames that have not been processed yet.
>     Since then, we have also made
>     the code cache walk when class unloading is disabled concurrent,
>     as it simplified the root processing
>     code in the end to have only concurrent roots, instead of
>     distinguishing between STW and concurrent
>     roots as well as strong vs weak. Now there is only strong vs weak,
>     and no roots are scanned during
>     safepoint operations, with or without class unloading.
>
>     Thanks,
>     /Erik
>
>     Thanks,
>
>     Liang
>
>
>     > the objects, and heal the pointers to the corresponding marked color, as
>
>     > expected by our barrier machinery. New nmethods that are called
>     go through
>
>     > the same processing using nmethod entry barriers. Semantically this
>     ensures that
>
>     > on-stack nmethods are treated as strong roots, and the rest of
>     the nmethods
>
>     > are treated as weak roots.
>
>     > This has the same semantics
>
>     > as any other GC.
>
>     > 
>
>     > When class unloading is disabled, the code cache comprises strong
>     roots.
>
>     > That means that the GC will
>
>     > during concurrent marking walk all nmethods, and mark the oops as
>     strong.
>
>     > However, remember that there are two operations: marking the
>     objects, and
>
>     > self-healing the pointers as expected by the barrier machinery.
>
>     > The second part of the operation still requires us to lazily apply
>     nmethod entry
>
>     > barriers to the stacks as well as arming nmethod entry barriers
>     for calls, during
>
>     > concurrent marking, so that the oops in the nmethods are
>     self-healed to the
>
>     > corresponding marked pointer color, before they are exposed to
>     the execution
>
>     > of mutators, which might for example store this oop into the object
>     graph. So I
>
>     > suppose the special thing here compared to G1 is that we both
>     walk the code
>
>     > cache marking all the oops, *and* explicitly walk the stacks
>     marking them as
>
>     > well, with the main purpose of fixing the pointer colors before
>     the mutator gets
>
>     > to use the nmethod. And arming the nmethod entry barriers for calls,
>     for the
>
>     > same reason.
>
>     > 
>
>     > During relocation, we only arm the nmethod entry barriers with
>     and without
>
>     > class unloading. The relocation is lazy and won't be performed
>     until either
>
>     > someone uses the nmethod (on-stack lazy nmethod entry barrier or
>     a call to a
>
>     > new nmethod), or the subsequent marking cycle will walk the code
>     cache and
>
>     > make sure that the objects are remapped, when it is performing
>     marking.
>
>     > 
>
>     > Hope this makes sense and sheds some light on this confusion.
>
>     > 
>
>     > /Erik
>
>     > 
>
>     > On 2020-12-06 16:40, Liang Mao wrote:
>
>     > > Hi ZGC team,
>
>     > >
>
>     > > Previously, without concurrent class unloading in ZGC, the code
>     cache
>
>     > > was all treated as strong roots. Then concurrent class
>     unloading
>
>     > > only marks the nmethods of executing threads at the mark start pause
>
>     > > and uses the nmethod entry barrier to heal and also mark the
>     oops. That
>
>     > > sounds reasonable. But when I looked into the concurrent
>     marking in G1, it
>
>     > > doesn't treat all of the code cache as strong roots and of course has
>     no nmethod
>
>     > > entry barrier. So I'm confused about why ZGC needs the nmethod entry
>     barrier for
>
>     > > marking. Does the difference come from the different algorithms,
>     SATB vs
>
>     > > load barrier?
>
>     > >
>
>     > > Thanks,
>
>     > > Liang
>
>     > >
>
>
>


