From maoliang.ml at alibaba-inc.com Thu Dec 3 10:02:53 2020
From: maoliang.ml at alibaba-inc.com (Liang Mao)
Date: Thu, 03 Dec 2020 18:02:53 +0800
Subject: A question in ZGC
Message-ID: <9d8027ad-b852-4e2f-bb4d-3c525ac6f3b2.maoliang.ml@alibaba-inc.com>
Dear ZGC developers,
May I ask a detailed question about the following code?
void ZMark::follow_array_object(objArrayOop obj, bool finalizable) {
  if (finalizable) {
    ZMarkBarrierOopClosure<true /* finalizable */> cl;
    cl.do_klass(obj->klass());
  } else {
    ZMarkBarrierOopClosure<false /* finalizable */> cl;
    cl.do_klass(obj->klass());
  }
  // ... (remainder of the function not quoted)
}
I'm confused by this. Why do we need to specifically visit the object array's
klass (actually its class loader data) when we are marking the object array?
This code comes from JDK 12, JDK-8214897: ZGC: Concurrent Class Unloading.
Thanks,
Liang
From stefan.karlsson at oracle.com Thu Dec 3 10:07:21 2020
From: stefan.karlsson at oracle.com (Stefan Karlsson)
Date: Thu, 3 Dec 2020 11:07:21 +0100
Subject: A question in ZGC
In-Reply-To: <9d8027ad-b852-4e2f-bb4d-3c525ac6f3b2.maoliang.ml@alibaba-inc.com>
References: <9d8027ad-b852-4e2f-bb4d-3c525ac6f3b2.maoliang.ml@alibaba-inc.com>
Message-ID: <0b855d31-699c-f92a-8e40-184959cd7196@oracle.com>
Hi Liang,
This is done to keep the class of the object array alive. We do the same
for other GCs as well, but through a different code path. See:
template <typename T, typename OopClosureType>
void ObjArrayKlass::oop_oop_iterate(oop obj, OopClosureType* closure) {
  assert (obj->is_array(), "obj must be array");
  objArrayOop a = objArrayOop(obj);

  if (Devirtualizer::do_metadata(closure)) {
    Devirtualizer::do_klass(closure, obj->klass());
  }

  oop_oop_iterate_elements<T>(a, closure);
}
There we visit the klass, as well as all of the array's normal object pointers (its elements).
StefanK
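To illustrate the point above, here is a toy sketch (not HotSpot code; every type and function below is hypothetical) of why marking an object array should also visit its klass. If all of the array's elements are null, visiting the klass is the only step that marks the class loader data, so without it the array's class could look dead while the array itself is still live.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy types, not HotSpot's: every object has a Klass, and every
// Klass belongs to a ClassLoaderData (CLD). A CLD that is still unmarked when
// marking ends would be a candidate for class unloading.
struct ClassLoaderData { bool marked = false; };
struct Klass { ClassLoaderData* cld; };

struct Obj {
  Klass* klass;
  std::vector<Obj*> elements;  // array elements; nullptr means a null slot
  bool marked = false;
};

// Follows an object array: visit its klass first (the analogue of
// cl.do_klass(obj->klass())), then push every unmarked element.
void follow_array_object(Obj* array, std::vector<Obj*>& mark_stack) {
  array->klass->cld->marked = true;  // keeps the array's class alive
  for (Obj* e : array->elements) {
    if (e != nullptr && !e->marked) {
      e->marked = true;
      mark_stack.push_back(e);
    }
  }
}

// Simplified marking loop; for brevity every object is treated as an array.
void mark_from(Obj* root) {
  std::vector<Obj*> mark_stack;
  root->marked = true;
  mark_stack.push_back(root);
  while (!mark_stack.empty()) {
    Obj* o = mark_stack.back();
    mark_stack.pop_back();
    follow_array_object(o, mark_stack);
  }
}
```

For example, marking an array whose two slots are both null still marks the CLD of the array's own class, while an unrelated CLD stays unmarked.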
From maoliang.ml at alibaba-inc.com Thu Dec 3 12:12:53 2020
From: maoliang.ml at alibaba-inc.com (Liang Mao)
Date: Thu, 03 Dec 2020 20:12:53 +0800
Subject: Re: A question in ZGC
In-Reply-To: <0b855d31-699c-f92a-8e40-184959cd7196@oracle.com>
References: <9d8027ad-b852-4e2f-bb4d-3c525ac6f3b2.maoliang.ml@alibaba-inc.com>,
<0b855d31-699c-f92a-8e40-184959cd7196@oracle.com>
Message-ID:
Hi Stefan,
Thanks very much for your explanation!
Liang
From maoliang.ml at alibaba-inc.com Sun Dec 6 15:40:22 2020
From: maoliang.ml at alibaba-inc.com (Liang Mao)
Date: Sun, 06 Dec 2020 23:40:22 +0800
Subject: Class unloading in ZGC
Message-ID:
Hi ZGC team,
Previously, without concurrent class unloading in ZGC, the whole code cache was treated as strong
roots. With concurrent class unloading, ZGC only marks the nmethods of executing threads in the mark
start pause, and uses the nmethod entry barrier to heal and also mark the oops. That sounds reasonable.
But when I looked into concurrent marking in G1, it doesn't treat the whole code cache as strong roots,
and of course it has no nmethod entry barrier. So I'm confused about why ZGC needs the nmethod entry
barrier for marking. Does the difference come from the different algorithms, SATB vs. load barrier?
Thanks,
Liang
From erik.osterlund at oracle.com Mon Dec 7 10:35:26 2020
From: erik.osterlund at oracle.com (Erik Österlund)
Date: Mon, 7 Dec 2020 11:35:26 +0100
Subject: Class unloading in ZGC
In-Reply-To:
References:
Message-ID:
Hi Liang,
So there are two distinct cases: class unloading enabled (the default), and class unloading
disabled (seemingly for people who just really want to have memory leaks for no apparent
good reason).
When class unloading is enabled, the code cache comprises weak roots, except for the oops of
on-stack nmethods, which are treated as strong. These semantics are the same across all GCs.
When marking starts, ZGC lazily processes the snapshot of nmethods that were on-stack when
marking started, by lazily applying nmethod entry barriers. These barriers mark the objects
and heal the pointers to the corresponding marked color, as expected by our barrier
machinery. Nmethods that are newly called go through the same processing using nmethod entry
barriers. Semantically, this ensures that on-stack nmethods are treated as strong roots and
the rest of the nmethods are treated as weak roots, which are the same semantics as in any
other GC.
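A toy, single-threaded model of the class-unloading-enabled case described above, assuming hypothetical Object and NMethod types and a boolean armed flag standing in for the real entry barrier guard. It ignores tracing from marked objects, pointer colors, and concurrency; it only shows that an nmethod whose barrier runs keeps its oops alive, while an nmethod nobody uses is treated as a weak root and ends up unloading.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy types: an Object with a mark bit, and an NMethod holding
// its embedded oops plus an "armed" flag standing in for the barrier guard.
struct Object { bool marked = false; };

struct NMethod {
  std::vector<Object*> oops;
  bool armed = false;
};

// Mark start: arm every nmethod; nothing is marked yet.
void mark_start(const std::vector<NMethod*>& code_cache) {
  for (NMethod* nm : code_cache) nm->armed = true;
}

// Entry barrier: runs when a mutator calls an armed nmethod or returns into
// an armed on-stack frame. It marks the embedded oops, then disarms so that
// later entries take the fast path.
void nmethod_entry_barrier(NMethod& nm) {
  if (!nm.armed) return;
  for (Object* o : nm.oops) o->marked = true;
  nm.armed = false;
}

// After marking: an nmethod with an unmarked (dead) oop is unloaded, which
// is what treating unused nmethods as weak roots amounts to.
bool is_unloading(const NMethod& nm) {
  for (const Object* o : nm.oops) {
    if (!o->marked) return true;
  }
  return false;
}
```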
When class unloading is disabled, the code cache comprises strong roots. That means that
during concurrent marking the GC will walk all nmethods and mark their oops as strong.
However, remember that there are two operations: marking the objects, and self-healing the
pointers as expected by the barrier machinery. The second part still requires us, during
concurrent marking, to lazily apply nmethod entry barriers to the stacks and to arm nmethod
entry barriers for calls, so that the oops in the nmethods are self-healed to the
corresponding marked pointer color before they are exposed to mutators, which might for
example store such an oop into the object graph. So I suppose the special thing here,
compared to G1, is that we both walk the code cache marking all the oops, *and* explicitly
walk the stacks marking them as well, with the main purpose of fixing the pointer colors
before the mutator gets to use the nmethod. We also arm the nmethod entry barriers for calls,
for the same reason.
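The two separate operations (marking the objects versus self-healing the pointers embedded in the code) can be modeled with a made-up two-bit color scheme. This is a sketch under assumptions: the encoding, the global good_color, and the std::set mark bitmap are hypothetical and do not match ZGC's real metadata bits or data structures.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical color encoding, NOT ZGC's real layout: the low two bits of a
// pointer hold its color and the remaining bits hold the address.
constexpr std::uintptr_t kColorMask = 0x3;
constexpr std::uintptr_t kMarked0 = 0x1;
constexpr std::uintptr_t kMarked1 = 0x2;

std::uintptr_t good_color = kMarked0;  // flips at every mark start
std::set<std::uintptr_t> mark_bitmap;  // addresses of marked objects

std::uintptr_t address_of(std::uintptr_t p) { return p & ~kColorMask; }
std::uintptr_t color_of(std::uintptr_t p) { return p & kColorMask; }

struct NMethod {
  std::vector<std::uintptr_t> oops;  // colored pointers embedded in the code
  bool armed = false;
};

void mark_start(const std::vector<NMethod*>& code_cache) {
  good_color = (good_color == kMarked0) ? kMarked1 : kMarked0;
  for (NMethod* nm : code_cache) nm->armed = true;
}

// Operation 1 (class unloading disabled): walk the code cache and mark every
// embedded object. The pointers stored in the code are left untouched.
void code_cache_walk(const std::vector<NMethod*>& code_cache) {
  for (const NMethod* nm : code_cache)
    for (std::uintptr_t p : nm->oops) mark_bitmap.insert(address_of(p));
}

// Operation 2: before a mutator can load these oops, the entry barrier marks
// them (idempotent here) and self-heals each pointer to the good color.
void nmethod_entry_barrier(NMethod& nm) {
  if (!nm.armed) return;
  for (std::uintptr_t& p : nm.oops) {
    mark_bitmap.insert(address_of(p));
    p = address_of(p) | good_color;
  }
  nm.armed = false;
}
```

After code_cache_walk the object is marked, but the pointer in the code still carries the previous cycle's color until the entry barrier heals it.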
During relocation, we only arm the nmethod entry barriers, both with and without class
unloading. This is lazy: the nmethod oops are not fixed up until either someone uses the
nmethod (through the lazy on-stack nmethod entry barrier, or a call to a new nmethod), or the
subsequent marking cycle walks the code cache and makes sure that the objects are remapped as
part of marking.
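A similarly hedged toy for the relocation phase: a hypothetical forwarding map stands in for ZGC's forwarding tables, and uncolored addresses are used for brevity. It shows the two paths by which a stale oop in an nmethod eventually gets remapped.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical forwarding table filled in by relocation: old -> new address.
std::unordered_map<std::uintptr_t, std::uintptr_t> forwarding;

struct NMethod {
  std::vector<std::uintptr_t> oops;  // uncolored addresses, for simplicity
  bool armed = false;
};

std::uintptr_t remap(std::uintptr_t addr) {
  auto it = forwarding.find(addr);
  return it == forwarding.end() ? addr : it->second;
}

// Relocation start only arms the barriers; no code is rewritten eagerly.
void relocate_start(const std::vector<NMethod*>& code_cache) {
  for (NMethod* nm : code_cache) nm->armed = true;
}

// Path 1: a mutator uses the nmethod, so its oops are remapped before the
// mutator can observe a not-yet-relocated address.
void nmethod_entry_barrier(NMethod& nm) {
  if (!nm.armed) return;
  for (std::uintptr_t& p : nm.oops) p = remap(p);
  nm.armed = false;
}

// Path 2: nmethods that nobody used are remapped when the next marking
// cycle walks the code cache.
void next_mark_code_cache_walk(const std::vector<NMethod*>& code_cache) {
  for (NMethod* nm : code_cache)
    for (std::uintptr_t& p : nm->oops) p = remap(p);
}
```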
Hope this makes sense and sheds some light on this confusion.
/Erik
From maoliang.ml at alibaba-inc.com Mon Dec 7 11:48:53 2020
From: maoliang.ml at alibaba-inc.com (Liang Mao)
Date: Mon, 07 Dec 2020 19:48:53 +0800
Subject: Re: Class unloading in ZGC
In-Reply-To:
References: ,
Message-ID:
Hi Erik,
Appreciate your comprehensive reply!
I still have a few questions.
> -----Original Message-----
> From: Erik Österlund [mailto:erik.osterlund at oracle.com]
> Sent: 2020-12-07 18:35
> To: Liang Mao; zgc-dev
> Subject: Re: Class unloading in ZGC
>
> When marking starts, ZGC lazily processes the snapshot of nmethods that were on-stack when
> marking started, with lazy application of nmethod entry barriers.
Sorry, I should mention that I was looking at the code of JDK-8214897: ZGC: Concurrent Class Unloading.
It handled the on-stack nmethods at pause time. Do you mean that the pause processing is not
necessary in that patch, and that the nmethod walking can be delayed as long as the nmethod
entry barrier is there?
On the other hand, if the on-stack nmethods are processed at pause time in mark start, is the
nmethod entry barrier not necessary?
Thanks,
Liang
From erik.osterlund at oracle.com Mon Dec 7 12:08:10 2020
From: erik.osterlund at oracle.com (Erik Österlund)
Date: Mon, 7 Dec 2020 13:08:10 +0100
Subject: Class unloading in ZGC
In-Reply-To:
References:
Message-ID: <992575da-4d9c-671b-67da-cde23b3afeb1@oracle.com>
Hi Liang,
What I was describing is what we do today, as opposed to what we did in JDK 12.
Back then, we did not have concurrent stack processing, which we do have today. Therefore,
in that patch, I had to process stacks in a safepoint. Moreover, when class unloading was
disabled, I walked the code cache in a safepoint. I was not very motivated to optimize the
case when class unloading is disabled, as there is pretty much no reason I can think of why
you would want to disable it. Disabling class unloading is just a memory leak with no
benefit. For other collectors, class unloading might come at a latency cost, but for ZGC it
does not, so there does not seem to be any form of trade-off.
Since concurrent stack processing was integrated, there is no longer any need to process the
on-stack nmethods in safepoints. That processing has been moved out of safepoints and is
instead applied concurrently, incrementally and cooperatively through lazy nmethod entry
barriers, as the mutators return into frames that have not been processed yet. Since then, we
have also made the code cache walk (performed when class unloading is disabled) concurrent, as
that simplified the root processing code: there are now only concurrent roots, instead of a
distinction between STW and concurrent roots as well as between strong and weak roots. Now
there is only strong vs. weak, and no roots are scanned during safepoint operations, with or
without class unloading.
Thanks,
/Erik
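Concurrent stack processing can be pictured as a per-thread watermark, in the spirit of the stack watermark mechanism introduced with JEP 376 (ZGC: Concurrent Thread-Stack Processing). The sketch below is a hypothetical, single-threaded simplification, not HotSpot's implementation: only the youngest frame is handled in the pause, and older frames are processed either by a GC thread or by a return barrier when the mutator returns into them. A real implementation needs synchronization between the mutator and the GC threads.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Frame {
  bool processed = false;  // stands in for "entry barrier applied to this frame"
};

// Hypothetical toy of a per-thread stack watermark. frames.back() is the
// youngest frame; frames[i] with i >= watermark have been processed in the
// current cycle, and frames below the watermark have not.
struct ThreadStack {
  std::vector<Frame> frames;
  std::size_t watermark = 0;

  // Processes the next unprocessed frame just below the watermark. Both the
  // mutator (via the return barrier) and GC threads would call this.
  void process_one_more() {
    if (watermark == 0) return;
    --watermark;
    frames[watermark].processed = true;
  }

  // Mark start: only the youngest frame is processed during the pause.
  void start_processing() {
    for (Frame& f : frames) f.processed = false;
    watermark = frames.size();
    process_one_more();
  }

  // The mutator returns from its youngest frame. If the caller frame it
  // returns into lies below the watermark, it is processed first.
  void return_from_frame() {
    frames.pop_back();
    if (!frames.empty() && frames.size() <= watermark) process_one_more();
  }
};
```

In this model the pause cost is constant (one frame), and the remaining work is shared between GC threads and the mutator's own returns.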
From maoliang.ml at alibaba-inc.com Mon Dec 7 12:47:53 2020
From: maoliang.ml at alibaba-inc.com (Liang Mao)
Date: Mon, 07 Dec 2020 20:47:53 +0800
Subject: Class unloading in ZGC
Message-ID: <9ad150b3-01af-4bff-99b7-3a1fa147589a.maoliang.ml@alibaba-inc.com>
Hi Erik,
Let's only consider the pause-time thread root processing in JDK 12-15.
Compared to G1, which only marks the on-stack nmethods in the mark start pause
without an nmethod entry barrier, ZGC marks the on-stack nmethods
in the mark start pause and also uses the nmethod entry barrier to do the marking.
Is the additional marking by the nmethod entry barrier a specific behavior because of
the colored pointer mechanism?
Thanks,
Liang
From erik.osterlund at oracle.com Mon Dec 7 13:26:11 2020
From: erik.osterlund at oracle.com (Erik Österlund)
Date: Mon, 7 Dec 2020 14:26:11 +0100
Subject: Class unloading in ZGC
In-Reply-To: <9ad150b3-01af-4bff-99b7-3a1fa147589a.maoliang.ml@alibaba-inc.com>
References: <9ad150b3-01af-4bff-99b7-3a1fa147589a.maoliang.ml@alibaba-inc.com>
Message-ID:
Hi Liang,
Sorry, I'm not sure I understand specifically what you are referring to. I think you are
talking about what happens when class unloading is enabled, am I right?
If so, then there is indeed a difference between G1 and ZGC. They both scan the stacks,
marking through on-stack nmethods. But ZGC also arms nmethod entry barriers, to lazily
mark through nmethod oops. Here is why.
ZGC needs to color all nmethod oops as "marked" before exposing them to mutator threads.
We also think that explicitly marking oops exposed to mutators is the most robust way of
treating these oops, as they are indeed weak until used. So marking them in the nmethod
entry barrier during concurrent marking is, in spirit, very similar to applying a weak load
barrier on Reference.get(), which G1 also does.
The contract with a SATB collector like G1 is that we need to apply barriers when loading
a weak oop. The nmethod oops are weak. So not applying nmethod entry barriers does seem
like a violation of the SATB invariant for G1. However, people argue that it is okay, as
all oops embedded in nmethods that are reachable by mutators during concurrent marking will
be marked through anyway. That is okay, as long as the compiler knows about SATB, and hence
which oops it is allowed to embed in the nmethods. If the compiler were, for example, to
embed a string from the string table that is not necessarily reachable from the holders of
the inlined methods, then this approach would crash, as the violation of the SATB contract
would suddenly become more visible. By using nmethod entry barriers, this logic becomes more
robust, as the compiler does not have to know which oops it may or may not embed into the
code stream, since we explicitly apply barriers.
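A toy SATB model of the argument above, assuming a hypothetical global SATB queue and per-object mark flags (G1's real queues, marking bitmap, and barrier code look quite different). The nmethod entry barrier applies the same keep-alive treatment as a weak load such as Reference.get(), so an embedded object gets marked even if nothing else in the object graph reaches it.

```cpp
#include <cassert>
#include <vector>

struct Object { bool marked = false; };

// Hypothetical SATB state for a toy concurrent marker.
bool marking_active = false;
std::vector<Object*> satb_queue;

// Keep-alive enqueue: during marking, an object that escapes to a mutator
// through a weak load must not be missed by the marker.
void satb_enqueue(Object* o) {
  if (marking_active && o != nullptr && !o->marked) satb_queue.push_back(o);
}

// Weak load barrier, in the spirit of the one applied to Reference.get().
Object* load_weak(Object* referent) {
  satb_enqueue(referent);
  return referent;
}

// A hypothetical nmethod: just its embedded oops.
struct NMethod { std::vector<Object*> oops; };

// Entry barrier expressed in SATB terms: every embedded oop is treated as a
// weak load, so correctness does not depend on the compiler only embedding
// oops that are reachable from the method holders anyway.
void nmethod_entry_barrier(const NMethod& nm) {
  for (Object* o : nm.oops) load_weak(o);
}

// Simplified: the marker drains the queue and marks what was enqueued.
void drain_satb_queue() {
  for (Object* o : satb_queue) o->marked = true;
  satb_queue.clear();
}
```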
While robustness is one reason to do this dance regardless, we certainly also need to apply
the right colors in ZGC to the pointers, regardless of whether we would trust the actual
objects to be marked or not. And, in order to deal with relocation properly, we needed
something like nmethod entry barriers anyway, as a mutator really is not allowed to see
not-yet-relocated oops. So with this mechanism already in place, it made sense to use it for
marking as well, solving three problems at the same time: 1) ensuring the objects are marked
in a more robust way, 2) ensuring the colors of the oops in exposed nmethods are good during
marking, and 3) dealing with concurrent relocation.
I have argued that G1 should also use nmethod entry barriers to explicitly enforce its SATB
invariant regarding these weak oops, and that the way they are treated today is not robust.
In fact, that is indeed being done in the Loom repo, and it is likely to become the standard
way of dealing with concurrent marking w.r.t. nmethods for all concurrently marking GCs in
HotSpot.
Hope this helps, and that I got your question right.
Thanks,
/Erik
On 2020-12-07 13:47, Liang Mao wrote:
> Hi Erik,
>
> If we are only considering the pause time thread root processing in
> jdk12-15.
> Comparing to G1 which only marks the on-stack nmethod at mark start pause
> without nmethod entry barrier, ZGC will mark the on-stack nmethod
> at mark start pause and also use nmethod entry barrier to do the marking.
> Is the additional marking by nmethod entry barrier a specific behavior
> because of
> color pointer mechanism?
>
> Thanks,
> Liang
>
>
>
> ------------------------------------------------------------------
> From:Erik ?sterlund
> Send Time:2020 Dec. 7 (Mon.) 20:08
> To:"MAO, Liang" ; zgc-dev
>
> Subject:Re: Class unloading in ZGC
>
> Hi Liang,
>
> On 2020-12-07 12:48, Liang Mao wrote:
>
> ?Hi Erik,
>
>
> Appreciate your comprehensive reply!
>
> I still have few quetion.
>
> > -----Original Message-----
>
> > From: Erik ?sterlund [mailto:erik.osterlund at oracle.com]
>
> > Sent: 2020?12?7? 18:35
>
> > To: Liang Mao ; zgc-dev
> > dev at openjdk.java.net>
>
> > Subject: Re: Class unloading in ZGC
>
> >
>
> > Hi Liang,
>
> >
>
> > So there are two distict cases. Class unloading enabled (default),
> and class
>
> > unloading disabled (seemingly for people that just really want to
> have memory
>
> > leaks for no apparent good reason).
>
> >
>
> > When class unloading is enabled, the code cache comprises weak
> roots, except
>
> > oops that are on-stack that are treated as strong. These
> semantics are the same
>
> > across all GCs.
>
> > When marking starts, ZGC
>
> > lazily processes the snapshot of nmethods that were on-stack when
> marking
>
> > started, with lazy application of nmethod entry barriers. These
> barriers will mark
>
>
> Sorry that I need to mention I was looking at the code of
> 8214897:?ZGC:?Concurrent?Class?Unloading.
>
> It handled the on-stack nmethod at pause time. Do you mean the
> pause processing
>
> is not necessary at that patch and the nmethod walking can be
> delayed as long as nmethod
>
> entry barrier is there?
>
> On the other hand, if on-stack nmethod is processed at pause time
> in mark start,? the nmethod
>
> entry barrier is not necessary?
>
>
> What I was describing is what we do today, as opposed to what we
> did in JDK12.
>
> Back then, we did not have concurrent stack processing, which we
> do have today. Therefore,
> in that patch, I had to process stacks in a safepoint. Moreover,
> when class unloading is disabled,
> I walked the code cache in a safepoint. I was not feeling very
> motivated to optimize the case when
> class unloading is disabled, as there is pretty much no reason I
> can think of why you would want
> to disable it. It's just a memory leak with no benefit, to disable
> class unloading. For other collectors
> class unloading might come at a latency cost. But for ZGC it does
> not. So there does not seem to exist
> any form of trade-off.
>
> Since concurrent stack processing was integrated, there is no
> longer any need for processing
> the on-stack nmethods in safepoints, so that has been moved out of
> safepoints and is instead
> concurrently, incrementally and cooperatively applied through lazy
> nmethod entry barriers as
> the mutators return into frames that have not been processed yet.
> Since then, we have also made
> the code cache walk when class unloading is disabled concurrent,
> as it simplified the root processing
> code in the end to have only concurrent roots, instead of
> distinguising between STW and concurrent
> roots as well as strong vs weak. Now there is only strong vs weak,
> and no roots are scanned during
> safepoint operations, with or without class unloading.
>
> Thanks,
> /Erik
>
> Thanks,
>
> Liang
>
>
> > the objects, and heal the pointers to the corresponding marked color, as
> > expected by our barrier machinery. New nmethods that are called go through
> > the same processing using nmethod entry barriers. Semantically this ensures that
> > on-stack nmethods are treated as strong roots, and the rest of the nmethods
> > are treated as weak roots. This has the same semantics as any other GC.
> >
> > When class unloading is disabled, the code cache comprises strong roots.
> > That means that the GC will, during concurrent marking, walk all nmethods and mark the oops as strong.
> > However, remember that there are two operations: marking the objects, and
> > self-healing the pointers as expected by the barrier machinery.
> > The second part of the operation still requires us to lazily apply nmethod entry
> > barriers to the stacks as well as arming nmethod entry barriers for calls, during
> > concurrent marking, so that the oops in the nmethods are self-healed to the
> > corresponding marked pointer color, before they are exposed to the execution
> > of mutators, which might for example store this oop into the object graph. So I
> > suppose the special thing here compared to G1 is that we both walk the code
> > cache marking all the oops, *and* explicitly walk the stacks marking them as
> > well, with the main purpose of fixing the pointer colors before the mutator gets
> > to use the nmethod. And arming the nmethod entry barriers for calls, for the
> > same reason.
> >
> > During relocation, we only arm the nmethod entry barriers, with and without
> > class unloading. The relocation is lazy and won't be performed until either
> > someone uses the nmethod (on-stack lazy nmethod entry barrier or a call to a
> > new nmethod), or the subsequent marking cycle walks the code cache and
> > makes sure that the objects are remapped, when it is performing marking.
> >
> > Hope this makes sense and sheds some light on this confusion.
> >
> > /Erik
> >
> > On 2020-12-06 16:40, Liang Mao wrote:
> > > Hi ZGC team,
> > >
> > > Previously without concurrent class unloading in ZGC, the code cache
> > > will be all treated as strong roots. Then concurrent class unloading
> > > will only mark the nmethod of executing threads at mark start pause
> > > and use the nmethod entry barrier to heal and also mark the oops. That
> > > sounds reasonable. But when I looked into the concurrent marking in G1, it
> > > doesn't treat all code cache as strong roots and of course has no nmethod
> > > entry barrier. So I'm confused why ZGC needs the nmethod entry barrier for
> > > marking. Does the difference come from the different algorithm of SATB vs
> > > load barrier?
> > >
> > > Thanks,
> > > Liang
From maoliang.ml at alibaba-inc.com Mon Dec 7 14:56:32 2020
From: maoliang.ml at alibaba-inc.com (Liang Mao)
Date: Mon, 07 Dec 2020 22:56:32 +0800
Subject: Class unloading in ZGC
Message-ID: <322bb5f9-f6cb-4ab8-9891-35e931f75da1.maoliang.ml@alibaba-inc.com>
Hi Erik,
Thank you! It's exactly what I want to know!
> The contract with a SATB collector like G1 is that we need to apply barriers when loading
> a weak oop. The nmethod oops are weak. So not applying nmethod entry barriers, does seem
> like a violation of the SATB invariant, for G1. However, people are arguing that it is okay,
> as all oops embedded in nmethods, that are reachable by mutators during concurrent marking,
> will have their oops marked through.
Do you mean the weak reference in G1 has the implicit *weak_pointer = null, and we
need to record the previous value to maintain the SATB invariant?
But for those nmethods that are only on-stack during concurrent marking, we didn't
enqueue the previous value. It is safe so far because of the limitation on embedded oops,
right?
BTW, if ZGC had an explicit load barrier when accessing an oop embedded in an nmethod,
would the nmethod entry barrier still be necessary? Does the nmethod entry barrier play the role
of such a load barrier?
Thanks,
Liang
------------------------------------------------------------------
From:Erik Österlund
Send Time:2020 Dec. 7 (Mon.) 21:26
To:"MAO, Liang" ; zgc-dev
Subject:Re: Class unloading in ZGC
Hi Liang,
Sorry, I don't know if I understand what you are referring to specifically. I think you
are talking about what happens when class unloading is enabled, am I right?
If so, then there is indeed a difference between G1 and ZGC. They both scan the stacks,
marking through on-stack nmethods. But ZGC also arms nmethod entry barriers, to lazily
mark through nmethod oops. Here is why.
ZGC needs to color all nmethod oops as "marked" before exposing them to mutator threads.
We also think that explicitly marking oops exposed to mutators is the most robust way
of treating these oops, as they are indeed weak until used. So marking them in the nmethod
entry barrier during concurrent marking, is in spirit very similar to applying a weak load
barrier on Reference.get(), which also G1 does.
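To make the coloring requirement concrete, here is a minimal sketch of a colored-pointer load barrier with self-healing. Everything in it is hypothetical: the `ColoredHeap` type, the bit layout and the color constants are illustrative stand-ins, not ZGC's real metadata bits. It shows why a pointer must carry the current good color before a mutator uses it: a pointer with a stale color forces the slow path, which does the current phase's work and heals the slot. Oops in heap fields are roughly handled this way; for oops embedded in nmethods, the healing is done by the entry barrier instead.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical colored-pointer model; the bit layout is illustrative and
// does not match ZGC's real metadata bits.
constexpr std::uint64_t kColorMask   = 0xF000000000000000ULL;
constexpr std::uint64_t kMarkedColor = 0x1000000000000000ULL;
constexpr std::uint64_t kRemapColor  = 0x2000000000000000ULL;

struct ColoredHeap {
  std::uint64_t good_color = kRemapColor;  // changes when a GC phase starts
  int mark_count = 0;                      // slow-path marking work done

  static std::uint64_t address(std::uint64_t ptr) { return ptr & ~kColorMask; }
  bool is_good(std::uint64_t ptr) const { return (ptr & kColorMask) == good_color; }

  // Slow path: do this phase's work (here: count a mark during marking),
  // then return the pointer recolored with the current good color.
  std::uint64_t slow_path(std::uint64_t ptr) {
    if (good_color == kMarkedColor) {
      ++mark_count;
    }
    return address(ptr) | good_color;
  }

  // Load barrier with self-healing: a bad pointer is fixed in its slot,
  // so later loads from the same slot stay on the fast path.
  std::uint64_t load(std::uint64_t* slot) {
    std::uint64_t ptr = *slot;
    if (!is_good(ptr)) {
      ptr = slow_path(ptr);
      *slot = ptr;  // heal
    }
    return ptr;
  }
};
```

After marking starts, the first load of a slot takes the slow path once; the healed slot then passes the fast-path check.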
The contract with a SATB collector like G1 is that we need to apply barriers when loading
a weak oop. The nmethod oops are weak. So not applying nmethod entry barriers, does seem
like a violation of the SATB invariant, for G1. However, people are arguing that it is okay,
as all oops embedded in nmethods, that are reachable by mutators during concurrent marking,
will have their oops marked through. That is okay, as long as the compiler knows about SATB,
and hence what oops it is allowed to embed in the nmethods. If the compiler were to, for example,
embed a string from the string table that is not necessarily reachable from the holders
of the inlined methods, then this approach would crash, as the violation of the SATB
contract would suddenly become more visible. By using nmethod entry barriers, this logic
becomes more robust, as the compiler does not have to know what oops it may or may not embed
into the code stream: we explicitly apply barriers.
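As a rough illustration of the SATB contract described above, the following sketch models a pre-write barrier and a weak load barrier. It is a hypothetical toy (`SatbMarker`, `write_field` and `load_weak` are made-up names), not G1's implementation. The point is that an oop embedded in an nmethod is "loaded" simply by executing the code, which bypasses anything like `load_weak`; that is the gap an nmethod entry barrier would close.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified SATB model for illustration; not G1's code.
struct Obj {
  bool marked = false;
};

struct SatbMarker {
  bool marking_active = false;
  std::vector<Obj*> satb_queue;  // objects the marker must treat as live

  // Pre-write barrier: record the value being overwritten, so that
  // everything reachable in the snapshot at mark start still gets marked.
  void write_field(Obj*& field, Obj* new_value) {
    if (marking_active && field != nullptr) {
      satb_queue.push_back(field);
    }
    field = new_value;
  }

  // Weak load barrier, in the spirit of Reference.get(): a weakly reachable
  // object handed to the mutator during marking may not be in the snapshot's
  // strong graph, so it is recorded explicitly.
  Obj* load_weak(Obj* weak_value) {
    if (marking_active && weak_value != nullptr) {
      satb_queue.push_back(weak_value);
    }
    return weak_value;
  }

  // At the end of marking, everything recorded is treated as live.
  void drain() {
    for (Obj* o : satb_queue) {
      o->marked = true;
    }
    satb_queue.clear();
  }
};
```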
While the robustness reason is one reason to do this dance regardless, we certainly do also
need to apply the right colors in ZGC to the pointers, regardless of whether we would trust
the actual objects to be marked or not. And, in order to deal with relocation properly, we
needed something like nmethod entry barriers anyway, as a mutator really is not allowed to
see not yet relocated oops. So with this mechanism already in place, it made sense to use it
for marking as well, solving 3 problems at the same time: 1) ensuring the objects are marked
in a more robust way, 2) ensuring the colors of exposed nmethods are good during marking, and
3) dealing with concurrent relocation.
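Point 3), concurrent relocation, can be sketched in the same spirit. In this hypothetical model (`ForwardingTable`, `RelocatableNMethod` and `relocation_entry_barrier` are illustrative names), the objects are assumed to have been copied already, and the armed entry barrier remaps every embedded oop through a forwarding table before the method body runs. In real ZGC a barrier may also have to relocate the object itself; that part is omitted here.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical model of lazy remapping through an nmethod entry barrier.
// Assumes the objects have already been copied; only references are fixed.
using Addr = std::uint64_t;

struct ForwardingTable {
  std::unordered_map<Addr, Addr> entries;  // old address -> new address

  Addr forward(Addr a) const {
    auto it = entries.find(a);
    return it == entries.end() ? a : it->second;  // not relocated: unchanged
  }
};

struct RelocatableNMethod {
  std::vector<Addr> oops;  // stands in for oops embedded in the code
  bool armed = true;       // armed when the relocation phase starts
};

// The first entry after relocation remaps every embedded oop, so the
// mutator never executes the code with a stale address.
void relocation_entry_barrier(RelocatableNMethod& nm, const ForwardingTable& fwd) {
  if (!nm.armed) {
    return;
  }
  for (Addr& a : nm.oops) {
    a = fwd.forward(a);
  }
  nm.armed = false;
}
```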
I have argued that G1 should also use nmethod entry barriers to explicitly enforce its SATB
invariant, regarding these weak oops, and that the way they are treated today is not robust.
In fact, that is indeed being done in the loom repo, and is likely to become the standard way
of dealing with concurrent marking w.r.t. nmethods, for all concurrently marking GCs in HotSpot.
Hope this helps, and that I got your question right.
Thanks,
/Erik
On 2020-12-07 13:47, Liang Mao wrote:
Hi Erik,
If we only consider the pause-time thread root processing in JDK 12-15:
compared to G1, which only marks the on-stack nmethods at the mark start pause
without an nmethod entry barrier, ZGC marks the on-stack nmethods
at the mark start pause and also uses the nmethod entry barrier to do the marking.
Is the additional marking by the nmethod entry barrier a specific behavior because of
the colored pointer mechanism?
Thanks,
Liang
------------------------------------------------------------------
From:Erik Österlund
Send Time:2020 Dec. 7 (Mon.) 20:08
To:"MAO, Liang" ; zgc-dev
Subject:Re: Class unloading in ZGC
Hi Liang,
On 2020-12-07 12:48, Liang Mao wrote:
Hi Erik,
Appreciate your comprehensive reply!
I still have a few questions.
> -----Original Message-----
> From: Erik Österlund [mailto:erik.osterlund at oracle.com]
> Sent: 2020-12-07 18:35
> To: Liang Mao; zgc-dev <zgc-dev at openjdk.java.net>
> Subject: Re: Class unloading in ZGC
>
> Hi Liang,
>
> So there are two distinct cases. Class unloading enabled (default), and class
> unloading disabled (seemingly for people that just really want to have memory
> leaks for no apparent good reason).
>
> When class unloading is enabled, the code cache comprises weak roots, except
> oops that are on-stack that are treated as strong. These semantics are the same
> across all GCs.
> When marking starts, ZGC
> lazily processes the snapshot of nmethods that were on-stack when marking
> started, with lazy application of nmethod entry barriers. These barriers will mark
Sorry, I should mention that I was looking at the code of 8214897: ZGC: Concurrent Class Unloading.
It handled the on-stack nmethods at pause time. Do you mean the pause processing
is not necessary in that patch, and the nmethod walking can be delayed as long as the nmethod
entry barrier is there?
On the other hand, if on-stack nmethods are processed at pause time in mark start, is the nmethod
entry barrier not necessary?
What I was describing is what we do today, as opposed to what we did in JDK12.
Back then, we did not have concurrent stack processing, which we do have today. Therefore,
in that patch, I had to process stacks in a safepoint. Moreover, when class unloading is disabled,
I walked the code cache in a safepoint. I was not feeling very motivated to optimize the case when
class unloading is disabled, as there is pretty much no reason I can think of why you would want
to disable it. It's just a memory leak with no benefit, to disable class unloading. For other collectors
class unloading might come at a latency cost. But for ZGC it does not. So there does not seem to exist
any form of trade-off.
Since concurrent stack processing was integrated, there is no longer any need for processing
the on-stack nmethods in safepoints, so that has been moved out of safepoints and is instead
concurrently, incrementally and cooperatively applied through lazy nmethod entry barriers as
the mutators return into frames that have not been processed yet. Since then, we have also made
the code cache walk when class unloading is disabled concurrent, as it simplified the root processing
code in the end to have only concurrent roots, instead of distinguishing between STW and concurrent
roots as well as strong vs weak. Now there is only strong vs weak, and no roots are scanned during
safepoint operations, with or without class unloading.
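The lazy, cooperative stack processing described above can be pictured as a watermark over a thread's frames. This is a hypothetical sketch (`ThreadStack`, `start_processing`, `return_from_top` and `gc_finish` are illustrative names), assuming a single thread for simplicity: at mark start only the top frame is processed, returning into an unprocessed frame processes it first, and a GC thread can finish the remaining frames. HotSpot's real mechanism also has to synchronize mutator and GC threads, which this model leaves out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of lazy, incremental stack processing (single thread).
struct Frame {
  bool processed = false;  // oops and nmethod of this frame have been handled
};

struct ThreadStack {
  std::vector<Frame> frames;  // frames.back() is the youngest (top) frame
  std::size_t watermark = 0;  // frames below this index are not processed yet

  void process(std::size_t i) { frames[i].processed = true; }

  // At mark start, only the top frame is processed; the rest is deferred.
  void start_processing() {
    if (frames.empty()) {
      return;
    }
    watermark = frames.size() - 1;
    process(watermark);
  }

  // Returning into a frame below the watermark processes it before the
  // mutator continues executing in it. Assumes the stack is not empty.
  void return_from_top() {
    frames.pop_back();
    if (!frames.empty() && frames.size() - 1 < watermark) {
      watermark = frames.size() - 1;
      process(watermark);
    }
  }

  // A GC thread can cooperatively process the frames that remain.
  void gc_finish() {
    while (watermark > 0) {
      process(--watermark);
    }
  }
};
```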
Thanks,
/Erik
Thanks,
Liang
> the objects, and heal the pointers to the corresponding marked color, as
> expected by our barrier machinery. New nmethods that are called go through
> the same processing using nmethod entry barriers. Semantically this ensures that
> on-stack nmethods are treated as strong roots, and the rest of the nmethods
> are treated as weak roots.
> This has the same semantics as any other GC.
>
> When class unloading is disabled, the code cache comprises strong roots.
> That means that the GC will, during concurrent marking, walk all nmethods and mark the oops as strong.
> However, remember that there are two operations: marking the objects, and
> self-healing the pointers as expected by the barrier machinery.
> The second part of the operation still requires us to lazily apply nmethod entry
> barriers to the stacks as well as arming nmethod entry barriers for calls, during
> concurrent marking, so that the oops in the nmethods are self-healed to the
> corresponding marked pointer color, before they are exposed to the execution
> of mutators, which might for example store this oop into the object graph. So I
> suppose the special thing here compared to G1 is that we both walk the code
> cache marking all the oops, *and* explicitly walk the stacks marking them as
> well, with the main purpose of fixing the pointer colors before the mutator gets
> to use the nmethod. And arming the nmethod entry barriers for calls, for the
> same reason.
>
> During relocation, we only arm the nmethod entry barriers, with and without
> class unloading. The relocation is lazy and won't be performed until either
> someone uses the nmethod (on-stack lazy nmethod entry barrier or a call to a
> new nmethod), or the subsequent marking cycle will walk the code cache and
> make sure that the objects are remapped, when it is performing marking.
>
> Hope this makes sense and sheds some light on this confusion.
>
> /Erik
>
> On 2020-12-06 16:40, Liang Mao wrote:
> > Hi ZGC team,
> >
> > Previously without concurrent class unloading in ZGC, the code cache
> > will be all treated as strong roots. Then concurrent class unloading
> > will only mark the nmethod of executing threads at mark start pause
> > and use the nmethod entry barrier to heal and also mark the oops. That
> > sounds reasonable. But when I looked into the concurrent marking in G1, it
> > doesn't treat all code cache as strong roots and of course has no nmethod
> > entry barrier. So I'm confused why ZGC needs the nmethod entry barrier for
> > marking. Does the difference come from the different algorithm of SATB vs
> > load barrier?
> >
> > Thanks,
> > Liang
> >
From erik.osterlund at oracle.com Mon Dec 7 15:18:37 2020
From: erik.osterlund at oracle.com (Erik Österlund)
Date: Mon, 7 Dec 2020 16:18:37 +0100
Subject: Class unloading in ZGC
In-Reply-To: <322bb5f9-f6cb-4ab8-9891-35e931f75da1.maoliang.ml@alibaba-inc.com>
References: <322bb5f9-f6cb-4ab8-9891-35e931f75da1.maoliang.ml@alibaba-inc.com>
Message-ID: <2f161309-6f7d-ff03-18d3-adb7d4671ce4@oracle.com>
Hi Liang,
On 2020-12-07 15:56, Liang Mao wrote:
> Hi Erik,
>
> Thank you! It's exactly what I want to know!
>
> > The contract with a SATB collector like G1 is that we need to apply barriers when loading
> > a weak oop. The nmethod oops are weak. So not applying nmethod entry barriers, does seem
> > like a violation of the SATB invariant, for G1. However, people are arguing that it is okay,
> > as all oops embedded in nmethods, that are reachable by mutators during concurrent marking,
> > will have their oops marked through.
> Do you mean the weak reference in G1 has the implicit *weak_pointer = null, and we
> need to record the previous value to maintain the SATB invariant?
Something like that. Except instead of clearing weak pointers, we make the whole nmethod
unloaded, and throw away the whole nmethod. All other weak references are cleared during
reference processing, when they are found to no longer be alive. But nmethods are special,
resulting in more drastic action than clearing, if they have an embedded oop that is dead.
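A toy model of the difference between clearing an ordinary weak reference and unloading an nmethod might look like this. All types and functions here are hypothetical and greatly simplified; the point is only that a weak reference is cleared on its own, while a single dead embedded oop condemns the whole nmethod.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified model for illustration; not HotSpot code.
struct Object {
  bool marked = false;  // set by the marker if the object is reachable
};

// An ordinary weak reference: cleared if its referent was not marked.
struct WeakRef {
  Object* referent;
};

void process_weak_ref(WeakRef& ref) {
  if (ref.referent != nullptr && !ref.referent->marked) {
    ref.referent = nullptr;  // cleared during reference processing
  }
}

// A compiled method whose code stream embeds oops.
struct NMethod {
  std::vector<Object*> oops;
  bool unloaded = false;
};

// nmethod oops are not cleared one by one: a single dead embedded oop
// makes the whole nmethod unloaded.
void process_nmethod(NMethod& nm) {
  for (Object* o : nm.oops) {
    if (!o->marked) {
      nm.unloaded = true;
      return;
    }
  }
}
```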
> But for those nmethods that are only on-stack during concurrent marking, we didn't
> enqueue the previous value. It is safe so far because of the limitation on embedded oops,
> right?
Exactly. And that's what I am referring to as fragile. It has been debated many times among
our GC engineers whether it has holes or not. We wanted something more robust.
>
> BTW, if ZGC had an explicit load barrier when accessing an oop embedded in an nmethod,
> would the nmethod entry barrier still be necessary? Does the nmethod entry barrier play the role
> of such a load barrier?
It's a Swiss Army knife, serving multiple roles. It is also used to coordinate lazy cleaning of
inline caches. In particular, when we reach mark end, nmethods will have dead oops in them.
But inline caches from other nmethods are still pointing at said nmethods with dead oops.
What this implies is that the nmethod with the dead oop will get unloaded, but has not been
unloaded yet. Yet threads that wake up from the safepoint absolutely must not perform calls
into such nmethods. Normally inline caches are cleaned in the safepoint operation that unloads
the code cache. But we don't have time for that and let calls into dying nmethods go ahead,
because we know they will take the slow path in our nmethod entry barrier, and then re-resolve
the call to something less dead. So apart from dealing with oops, we really do need this also
for dealing with inline caches (and similarly static calls embedded as direct calls).
Another thing we have to consider, which is not inherent but is currently true, is that the
oops embedded into the code stream on x86_64 are misaligned in memory. This puts particular
constraints on what mechanism is used to heal the pointers in the nmethod. We could not
simply use a load barrier with a CAS, as the oop could cross two cache lines, which at the
very least the specification does not allow. In practice it might work due to luck, but it would induce
costs that are huge, similar to inter-processor interrupts (IPI). So that should be avoided at
all cost. With nmethod entry barriers, we can take mutators into a path where the nmethod
oops are protected by a per-nmethod lock. One thread will heal the oops, and no other thread
will concurrently read them.
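To illustrate the alignment problem, the following helper checks whether an 8-byte immediate at a given address straddles a cache line. It assumes 64-byte cache lines, which is typical for x86_64 but not guaranteed, and the function name is made up. Since an immediate operand can start at any byte offset inside an instruction, some embedded oops will inevitably cross a line boundary, and an atomic CAS on such an operand would be a so-called split lock.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines (typical for x86_64, not guaranteed).
constexpr std::uintptr_t kCacheLineSize = 64;

// Returns true if `size` bytes starting at `addr` span two cache lines.
// Assumes size >= 1.
bool crosses_cache_line(std::uintptr_t addr, std::size_t size) {
  std::uintptr_t first_line = addr / kCacheLineSize;
  std::uintptr_t last_line = (addr + size - 1) / kCacheLineSize;
  return first_line != last_line;
}
```

For example, an 8-byte immediate starting 60 bytes into a line, as in `crosses_cache_line(0x103C, 8)`, returns true.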
It is also the case that performing a load barrier into the nmethod would not necessarily work,
even if the oops were aligned, as the data and instruction caches are not necessarily in sync.
That is why the check for the nmethod entry barrier is notified of being disarmed via writes to
the instruction stream. x86_64 machines have an explicit guarantee that instruction modification
is observed by the execution, in the order the modifications were written. That means that if the
nmethod entry barrier executed the instruction that perceived the nmethod as disarmed, then
that guarantees that instruction executions with immediate oops will also observe the updated
immediate oop.
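The locking and disarm protocol can be modeled as below. In this hypothetical sketch (`NMethodEntryModel` and its members are illustrative names), the armed state is an atomic flag read with acquire ordering and cleared with release ordering. Real HotSpot instead checks a guard value patched into the instruction stream and relies on the x86_64 instruction-modification ordering guarantee described above; the acquire/release pair is only an analogy for "if you see the nmethod as disarmed, you also see the healed oops". One thread heals under the per-nmethod lock, and later entries take the fast path.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical model of an nmethod entry barrier; names are illustrative.
struct NMethodEntryModel {
  std::vector<std::uint64_t> oops;  // stands in for oops embedded in code
  std::atomic<bool> armed{false};
  std::mutex lock;                  // per-nmethod lock
  int heal_count = 0;               // how often the slow path healed

  // Called on every entry into the method.
  void entry_barrier(std::uint64_t good_color, std::uint64_t color_mask) {
    if (!armed.load(std::memory_order_acquire)) {
      return;  // fast path: disarmed, so healed oops are visible
    }
    std::lock_guard<std::mutex> guard(lock);
    if (!armed.load(std::memory_order_relaxed)) {
      return;  // another thread healed while we waited for the lock
    }
    for (std::uint64_t& oop : oops) {
      oop = (oop & ~color_mask) | good_color;  // heal (a GC would also mark/remap)
    }
    ++heal_count;
    armed.store(false, std::memory_order_release);  // publish the healed oops
  }
};
```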
So yeah, there are indeed multiple things that would not work very well without the nmethod
entry barrier.
Hope this helps understanding why we do the things that we do.
/Erik
From maoliang.ml at alibaba-inc.com Tue Dec 8 02:54:03 2020
From: maoliang.ml at alibaba-inc.com (Liang Mao)
Date: Tue, 08 Dec 2020 10:54:03 +0800
Subject: Class unloading in ZGC
Message-ID: <9187d73c-a183-49f3-8b5c-d05a5433702f.maoliang.ml@alibaba-inc.com>
Hi Erik,
Your answer is a great help. Thanks a lot for sharing the valuable thinking.
Liang
------------------------------------------------------------------
From:Erik Österlund
Send Time:2020 Dec. 7 (Mon.) 23:18
To:"MAO, Liang" ; zgc-dev
Subject:Re: Class unloading in ZGC
Hi Liang,
On 2020-12-07 15:56, Liang Mao wrote:
Hi Erik,
Thank you! It's exactly what I want to know!
> The contract with a SATB collector like G1 is that we need to apply barriers when loading
> a weak oop. The nmethod oops are weak. So not applying nmethod entry barriers, does seem
> like a violation of the SATB invariant, for G1. However, people are arguing that it is okay,
> as all oops embedded in nmethods, that are reachable by mutators during concurrent marking,
> will have their oops marked through.
Do you mean the weak reference in G1 has the implicit *weak_pointer = null, and we
need to record the previous value to maintain the SATB invariant?
Something like that. Except instead of clearing weak pointers, we make the whole nmethod
unloaded, and throw away the whole nmethod. All other weak references are cleared during
reference processing, when they are found to no longer be alive. But nmethods are special,
resulting in more drastic action than clearing, if they have an embedded oop that is dead.
But for those nmethods that are only on-stack during concurrent marking, we didn't
enqueue the previous value. It is safe so far because of the limitation on embedded oops,
right?
Exactly. And that's what I am referring to as fragile. It has been debated many times among
our GC engineers, if it has holes or not. We wanted something more robust.
BTW, if ZGC had an explicit load barrier when accessing an oop embedded in an nmethod,
would the nmethod entry barrier still be necessary? Does the nmethod entry barrier play the role
of such a load barrier?
It's a swiss army knife, serving multiple roles. It is also used to coordinate lazy cleaning of
inline caches. In particular, when we reach mark end, nmethods will have dead oops in them.
But inline caches from other nmethods are still pointing at said nmethods with dead oops.
What this implies is that the nmethod with the dead oop will get unloaded, but has not been
unloaded yet. Yet threads that wake up from the safepoint absolutely must not perform calls
into such nmethods. Normally inline caches are cleaned in the safepoint operation that unloads
the code cache. But we don't have time for that and let calls into dying nmethods go ahead,
because we know they will take the slow path in our nmethod entry barrier, and then re-resolve
the call to something less dead. So apart from dealing with oops, we really do need this also
for dealing with inline caches (and similarly static calls embedded as direct calls).
Another thing we have to consider, which is not inherent, but is currently true, is that the
oops embedded into the code stream on x86_64 are misaligned in memory. This puts particular
constraints on what mechanism is used to heal the pointers in the nmethod. We could not
simply use a load barrier with a CAS, as the oop could cross two cache lines, which at the
very least the specification does not allow. In practice it might work due to luck, but induce
costs that are huge, similar to inter-processor interrupts (IPI). So that should be avoided at
all cost. With nmethod entry barriers, we can take mutators into a path where the nmethod
oops are protected by a per-nmethod lock. One thread will heal the oops, and no other thread
will concurrently read them.
It is also the case that performing a load barrier into the nmethod would not necessarily work,
even if the oops were aligned, as the data and instruction caches are not necessarily in sync.
That is why the check for the nmethod entry barrier is notified of being disarmed via writes to
the instruction stream. x86_64 machines have an explicit guarantee that instruction modification
is observed by the execution, in the order the modifications were written. That means that if the
nmethod entry barrier executed the instruction that perceived the nmethod as disarmed, then
that guarantees that instruction executions with immediate oops will also observe the updated
immediate oop.
So yeah, there are indeed multiple things that would not work very well without the nmethod
entry barrier.
Hope this helps understanding why we do the things that we do.
/Erik
Thanks,
Liang
------------------------------------------------------------------
From:Erik Österlund
Send Time:2020 Dec. 7 (Mon.) 21:26
To:"MAO, Liang" ; zgc-dev
Subject:Re: Class unloading in ZGC
Hi Liang,
Sorry, I don't know if I understand what you are referring to specifically. I think you
are talking about what happens when class unloading is enabled, am I right?
If so, then there is indeed a difference between G1 and ZGC. They both scan the stacks,
marking through on-stack nmethods. But ZGC also arms nmethod entry barriers, to lazily
mark through nmethod oops. Here is why.
ZGC needs to color all nmethod oops as "marked" before exposing them to mutator threads.
We also think that explicitly marking oops exposed to mutators is the most robust way
of treating these oops, as they are indeed weak until used. So marking them in the nmethod
entry barrier during concurrent marking is, in spirit, very similar to applying a weak load
barrier on Reference.get(), which G1 also does.
The contract with a SATB collector like G1 is that we need to apply barriers when loading
a weak oop. The nmethod oops are weak. So not applying nmethod entry barriers does seem
like a violation of the SATB invariant for G1. However, people argue that it is okay,
as all oops embedded in nmethods that are reachable by mutators during concurrent marking
will be marked through. That is okay, as long as the compiler knows about SATB,
and hence which oops it is allowed to embed in the nmethods. If the compiler were, for example,
to embed a string from the string table that is not necessarily reachable from the holders
of the inlined methods, then this approach would crash, as the violation of the SATB
contract would suddenly become more visible. By using nmethod entry barriers, this logic
becomes more robust, as the compiler does not have to know what oops it may or may not embed
into the code stream, as we explicitly apply barriers.
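To make the SATB argument concrete, here is a toy C++ model, not G1's implementation: `Obj`, `SatbMarker` and its methods are hypothetical. A weak load during marking enqueues the loaded object, and an nmethod entry barrier for a SATB collector would just apply that weak load to every embedded oop, so an embedded object that is not reachable from the method holders (like the string-table example) still gets marked.

```cpp
#include <cassert>
#include <vector>

// Hypothetical heap object with a mark bit.
struct Obj {
  bool marked = false;
};

// Toy SATB marker; names and structure are illustrative only.
struct SatbMarker {
  bool marking_active = false;
  std::vector<Obj*> satb_queue;  // objects the marker still has to trace

  // Weak load barrier (as for Reference.get()): during marking, a weakly
  // loaded oop is enqueued so the marker treats it as live.
  Obj* weak_load(Obj* o) {
    if (marking_active && o != nullptr && !o->marked) {
      satb_queue.push_back(o);
    }
    return o;
  }

  // Sketch of an nmethod entry barrier for a SATB collector: treat every
  // embedded oop as a weak load before the compiled code may use it.
  void nmethod_entry_barrier(const std::vector<Obj*>& embedded_oops) {
    for (Obj* o : embedded_oops) {
      weak_load(o);
    }
  }

  // Drain the queue, marking everything that was enqueued.
  void drain() {
    for (Obj* o : satb_queue) {
      o->marked = true;
    }
    satb_queue.clear();
  }
};
```

With the barrier in place, the compiler no longer needs to reason about which oops are safe to embed; correctness follows from the barrier alone.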
While the robustness reason is one reason to do this dance regardless, we certainly do also
need to apply the right colors in ZGC to the pointers, regardless of whether we would trust
the actual objects to be marked or not. And, in order to deal with relocation properly, we
needed something like nmethod entry barriers anyway, as a mutator really is not allowed to
see not yet relocated oops. So with this mechanism already in place, it made sense to use it
for marking as well, solving 3 problems at the same time: 1) ensuring the objects are marked
in a more robust way, 2) ensuring the colors of the oops in exposed nmethods are good during
marking, and 3) dealing with concurrent relocation.
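The coloring part of point 2) can be illustrated with a simplified model loosely based on the Marked0/Marked1/Remapped metadata bits of the original (non-generational) ZGC. The bit positions, the `ColorState` type and its helpers are assumptions for illustration, and forwarding of relocated objects is left out entirely. An nmethod oop whose color does not match the current good mask has to be healed before a mutator can, for example, store it into the object graph.

```cpp
#include <cassert>
#include <cstdint>

// Assumed metadata bit positions, chosen for illustration only.
constexpr std::uint64_t kMarked0 = std::uint64_t{1} << 42;
constexpr std::uint64_t kMarked1 = std::uint64_t{1} << 43;
constexpr std::uint64_t kRemapped = std::uint64_t{1} << 44;
constexpr std::uint64_t kMetadataMask = kMarked0 | kMarked1 | kRemapped;

// Hypothetical global color state.
struct ColorState {
  std::uint64_t good_mask = kRemapped;

  // Every metadata bit other than the good one marks a pointer as stale.
  std::uint64_t bad_mask() const { return good_mask ^ kMetadataMask; }

  // Mark start alternates between Marked0 and Marked1; relocation uses Remapped.
  void flip_to_mark(bool use_marked0) { good_mask = use_marked0 ? kMarked0 : kMarked1; }
  void flip_to_remap() { good_mask = kRemapped; }

  // Load-barrier-style fast-path test.
  bool is_good(std::uint64_t ptr) const { return (ptr & bad_mask()) == 0; }

  std::uint64_t address(std::uint64_t ptr) const { return ptr & ~kMetadataMask; }

  // Self-healing: strip the stale color and apply the current good color.
  std::uint64_t heal(std::uint64_t ptr) const { return address(ptr) | good_mask; }
};
```

A single flip of the good mask makes every previously good pointer stale at once, which is why oops embedded in nmethods must be revisited after each flip.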
I have argued that G1 should also use nmethod entry barriers to explicitly enforce its SATB
invariant, regarding these weak oops, and that the way they are treated today is not robust.
In fact, that is indeed being done in the loom repo, and is likely to become the standard way
of dealing with concurrent marking w.r.t. nmethods, for all concurrently marking GCs in HotSpot.
Hope this helps, and that I got your question right.
Thanks,
/Erik
On 2020-12-07 13:47, Liang Mao wrote:
Hi Erik,
If we only consider the thread root processing done in pauses in JDK 12-15:
compared to G1, which only marks the on-stack nmethods in the mark start pause
without an nmethod entry barrier, ZGC marks the on-stack nmethods
in the mark start pause and also uses the nmethod entry barrier to do the marking.
Is the additional marking by the nmethod entry barrier a specific behavior caused by the
colored pointer mechanism?
Thanks,
Liang
------------------------------------------------------------------
From:Erik Österlund
Send Time:2020 Dec. 7 (Mon.) 20:08
To:"MAO, Liang" ; zgc-dev
Subject:Re: Class unloading in ZGC
Hi Liang,
On 2020-12-07 12:48, Liang Mao wrote:
Hi Erik,
I appreciate your comprehensive reply!
I still have a few questions.
> -----Original Message-----
> From: Erik Österlund [mailto:erik.osterlund at oracle.com]
> Sent: Monday, December 7, 2020 6:35 PM
> To: Liang Mao ; zgc-dev <zgc-dev at openjdk.java.net>
> Subject: Re: Class unloading in ZGC
>
> Hi Liang,
>
> So there are two distinct cases: class unloading enabled (default), and class
> unloading disabled (seemingly for people that just really want to have memory
> leaks for no apparent good reason).
>
> When class unloading is enabled, the code cache comprises weak roots, except
> for oops in on-stack nmethods, which are treated as strong. These semantics are
> the same across all GCs.
> When marking starts, ZGC lazily processes the snapshot of nmethods that were on-stack when marking
> started, with lazy application of nmethod entry barriers. These barriers will mark
Sorry, I should mention that I was looking at the code of 8214897: ZGC: Concurrent Class Unloading.
It handled the on-stack nmethods at pause time. Do you mean that the pause processing
is not necessary in that patch, and that the nmethod walking can be delayed as long as the nmethod
entry barrier is there?
On the other hand, if the on-stack nmethods are processed in the mark start pause, is the nmethod
entry barrier unnecessary?
What I was describing is what we do today, as opposed to what we did in JDK12.
Back then, we did not have concurrent stack processing, which we do have today. Therefore,
in that patch, I had to process stacks in a safepoint. Moreover, when class unloading is disabled,
I walked the code cache in a safepoint. I was not feeling very motivated to optimize the case when
class unloading is disabled, as there is pretty much no reason I can think of why you would want
to disable it. Disabling class unloading is just a memory leak with no benefit. For other collectors,
class unloading might come at a latency cost. But for ZGC it does not. So there does not seem to be
any trade-off.
Since concurrent stack processing was integrated, there is no longer any need for processing
the on-stack nmethods in safepoints, so that has been moved out of safepoints and is instead
concurrently, incrementally and cooperatively applied through lazy nmethod entry barriers as
the mutators return into frames that have not been processed yet. Since then, we have also made
the code cache walk concurrent when class unloading is disabled, as having only concurrent roots
simplified the root processing code in the end, instead of distinguishing between STW and concurrent
roots as well as strong vs weak. Now there is only strong vs weak, and no roots are scanned during
safepoint operations, with or without class unloading.
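The lazy, cooperative stack processing described above can be pictured with a per-thread watermark. This is a toy single-lock C++ model with hypothetical names (`ToyThreadStack`, `gc_step`, `mutator_return`), loosely in the spirit of concurrent thread-stack processing (JEP 376) rather than HotSpot's actual code: frames at or above the watermark have been processed, a GC thread processes frames incrementally, and a mutator that returns into an unprocessed frame processes it itself before continuing.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical frame; "processed" stands in for having its oops marked and healed.
struct Frame {
  bool processed = false;
};

// Toy model of lazy stack processing. Frames are indexed from the bottom (0);
// every frame at index >= watermark has been processed.
struct ToyThreadStack {
  std::vector<Frame> frames;
  std::size_t watermark = 0;
  std::mutex lock;

  // Mark start: invalidate all frames, then eagerly process only the top one
  // so the thread can keep running.
  void start_processing() {
    std::lock_guard<std::mutex> g(lock);
    for (Frame& f : frames) {
      f.processed = false;
    }
    watermark = frames.size();
    process_next_locked();
  }

  // GC thread: incrementally process the next unprocessed frame.
  void gc_step() {
    std::lock_guard<std::mutex> g(lock);
    process_next_locked();
  }

  // Mutator returning from its top frame: if the caller frame is still
  // unprocessed, the mutator processes it before resuming execution there.
  void mutator_return() {
    std::lock_guard<std::mutex> g(lock);
    if (frames.empty()) {
      return;
    }
    frames.pop_back();
    if (watermark > frames.size()) {
      watermark = frames.size();  // defensive clamp
    }
    if (watermark == frames.size()) {
      process_next_locked();
    }
  }

 private:
  // Process the highest unprocessed frame, if any. Caller must hold `lock`.
  void process_next_locked() {
    if (watermark == 0) {
      return;
    }
    --watermark;
    frames[watermark].processed = true;
  }
};
```

Whichever side reaches a frame first does the work, so no safepoint is needed beyond processing the top frame.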
Thanks,
/Erik
Thanks,
Liang
> the objects, and heal the pointers to the corresponding marked color, as
> expected by our barrier machinery. New nmethods that are called go through
> the same processing using nmethod entry barriers. Semantically this ensures that
> on-stack nmethods are treated as strong roots, and the rest of the nmethods
> are treated as weak roots.
> This has the same semantics as any other GC.
>
> When class unloading is disabled, the code cache comprises strong roots.
> That means that during concurrent marking, the GC will
> walk all nmethods and mark the oops as strong.
> However, remember that there are two operations: marking the objects, and
> self-healing the pointers as expected by the barrier machinery.
> The second part of the operation still requires us to lazily apply nmethod entry
> barriers to the stacks as well as arming nmethod entry barriers for calls, during
> concurrent marking, so that the oops in the nmethods are self-healed to the
> corresponding marked pointer color, before they are exposed to the execution
> of mutators, which might for example store this oop into the object graph. So I
> suppose the special thing here compared to G1 is that we both walk the code
> cache marking all the oops, *and* explicitly walk the stacks marking them as
> well, with the main purpose of fixing the pointer colors before the mutator gets
> to use the nmethod. And arming the nmethod entry barriers for calls, for the
> same reason.
>
> During relocation, we only arm the nmethod entry barriers, both with and without
> class unloading. The relocation is lazy and won't be performed until either
> someone uses the nmethod (on-stack lazy nmethod entry barrier or a call to a
> new nmethod), or the subsequent marking cycle will walk the code cache and
> make sure that the objects are remapped, when it is performing marking.
>
> Hope this makes sense and sheds some light on this confusion.
>
> /Erik
>
> On 2020-12-06 16:40, Liang Mao wrote:
> > Hi ZGC team,
> >
> > Previously, without concurrent class unloading in ZGC, the whole code cache
> > was treated as strong roots. With concurrent class unloading, ZGC
> > only marks the nmethods of executing threads in the mark start pause
> > and uses the nmethod entry barrier to heal and also mark the oops. That
> > sounds reasonable. But when I looked into the concurrent marking in G1, it
> > doesn't treat the whole code cache as strong roots, and of course it has no nmethod
> > entry barrier. So I'm confused about why ZGC needs the nmethod entry barrier for
> > marking. Does the difference come from the different algorithms, SATB vs
> > load barrier?
> >
> > Thanks,
> > Liang
> >
From per.liden at oracle.com Fri Dec 11 10:58:38 2020
From: per.liden at oracle.com (Per Liden)
Date: Fri, 11 Dec 2020 11:58:38 +0100
Subject: Pause reported in MXBean
In-Reply-To:
References:
Message-ID: <1f5fb30e-d06b-5436-d023-3c395a626845@oracle.com>
Hi Pedro,
On 12/7/20 12:29 PM, Viton, Pedro (Nokia - ES/Madrid) wrote:
> Hi:
>
> A couple of years ago, I was trying out ZGC on Java 11.
> Now, I've retested it for our Java Application on 15.0.1.
>
> What I've noticed now, is that I get the impression that the data reported in MXBeans is the total time ZGC is running and using CPU, both the pause times and the concurrent times.
That's correct. This changed with JDK-8240679. The problem is of course
that the GarbageCollectorMXBean wasn't really designed to convey
information from a concurrent GC, so whatever model we pick some
information will be inaccurate or missing.
> For instance, as seen from the CLI of our Application (that reads the data from MXBeans):
>
> [CMD] java gc stats
>
> Collector Count Time(ms)
> --------- ----- --------
> ZGC 702 415291
>
>
> However, if my memory is not too bad, I would swear that the time reported back in Java 11 was only the pause time, and didn't include the time of the concurrent phases.
> That is also the behavior of G1 and the old CMS GC: in MXBeans, they only report the pause times.
>
> I suppose that at this stage, it is not possible to change the current behavior, as it could affect other people and would break backwards compatibility.
> But, alternatively, I was wondering if something could be done similar to what Shenandoah does: report 2 instances in MXBeans, one for pause counts & times and one for cycle counts & times
>
> policy-luna38> java gc stats
> 103 Multi-line response follows.
> Collector Count Time(ms)
> ----------------- ----- --------
> Shenandoah Pauses 20 21
> Shenandoah Cycles 5 1471
> 100 Ok.
>
> policy-luna38> java gc force
> 100 Ok.
>
> policy-luna38> java gc stats
> 103 Multi-line response follows.
> Collector Count Time(ms)
> ----------------- ----- --------
> Shenandoah Pauses 24 26
> Shenandoah Cycles 6 1769
> 100 Ok.
>
> Though CPU time is important, what people are normally concerned about is the pause times, and the average time per pause (especially people who have traditionally used G1 and CMS)
Reporting both could be an alternative going forward. It's perhaps not a
super great fit either, but might still be an acceptable option. There
have also been some loose discussions about creating a new type of
MXBean, which would be more suitable to convey information from a
concurrent GC. But no work has been done on that so far.
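A sketch of the "report both" alternative, assuming hypothetical names (`CollectorStats`, `ConcurrentGcReporting`, and the collector names "ZGC Pauses"/"ZGC Cycles"). In Java, the two accumulators would surface as two `GarbageCollectorMXBean` instances, each answering `getCollectionCount()` and `getCollectionTime()`. The point is only that each pause is counted individually while each cycle is counted once with its full duration, so the average pause time stays meaningful.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical accumulator mirroring the count/time pair of an MXBean.
struct CollectorStats {
  explicit CollectorStats(std::string n) : name(std::move(n)) {}

  std::string name;
  std::uint64_t count = 0;
  double time_ms = 0.0;

  void record(double ms) {
    ++count;
    time_ms += ms;
  }

  double average_ms() const { return count == 0 ? 0.0 : time_ms / count; }
};

// Hypothetical reporting model: one "collector" for pauses, one for cycles.
struct ConcurrentGcReporting {
  CollectorStats pauses{"ZGC Pauses"};
  CollectorStats cycles{"ZGC Cycles"};

  // Record one GC cycle: every pause individually, and the whole cycle
  // (pauses plus concurrent phases) once, with its total duration.
  void record_cycle(const std::vector<double>& pause_ms, double cycle_ms) {
    for (double p : pause_ms) {
      pauses.record(p);
    }
    cycles.record(cycle_ms);
  }
};
```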
Also worth noting is that starting with JDK 16, ZGC pauses will execute
in constant time (MarkStart and RelocateStart) or be bounded in time to
200us (MarkEnd), so pause time information from ZGC is becoming less
interesting, as the pauses are so short.
cheers,
Per
>
> Thanks a lot,
> Pedro
>
>
>
>
> -----Original Message-----
> From: Viton, Pedro (Nokia - ES/Madrid)
> Sent: Wednesday, September 26, 2018 3:13 PM
> To: Per Liden
> Cc: zgc-dev at openjdk.java.net
> Subject: RE: Inclusion of pause time in gc.log
>
> Hi:
>
> Adding the pause time in the way you indicate,
> 1.302ms/0.987ms/1.211ms,
> is even better than my initial suggestion.
>
> Thanks,
> Pedro
>
> PEDRO VITON
> NOKIA
> AAA Specialist
> C/ Maria Tubau 9 (28050 Madrid - SPAIN)
> M: +34 690 964740 (ext. 2411 5746)
> pedro.viton at nokia.com
>
> -----Original Message-----
> From: Per Liden
> Sent: Wednesday, September 26, 2018 3:04 PM
> To: Viton, Pedro (Nokia - ES/Madrid) ; zgc-dev at openjdk.java.net
> Subject: Re: Inclusion of pause time in gc.log
>
> Hi,
>
> On 09/25/2018 02:41 PM, Viton, Pedro (Nokia - ES/Madrid) wrote:
>> Hi:
>>
>> I've downloaded the latest Java11 Early Access to try out ZGC.
>> I have to admit I'm really impressed with its reduced pause time, which
>> I measure with MXBeans.
>>
>> I also like that the log line (included in gc.log, with default
>> logging enabled) indicates the % of the heap before and after the
>> collection:
>>
>> [26.814s][info][gc] GC(0) Garbage Collection (Metadata GC Threshold) 168M(1%)->78M(1%)
>> [28.719s][info][gc] GC(1) Garbage Collection (Metadata GC Threshold) 212M(2%)->94M(1%)
>> [37.136s][info][gc] GC(2) Garbage Collection (Warmup) 3764M(31%)->1606M(13%)
>> [45.807s][info][gc] GC(3) Garbage Collection (Allocation Rate) 6468M(53%)->2530M(21%)
>> [48.553s][info][gc] GC(4) Garbage Collection (System.gc()) 2530M(21%)->1640M(13%)
>> [114.713s][info][gc] GC(5) Garbage Collection (Allocation Rate) 6454M(53%)->2198M(18%)
>> [165.234s][info][gc] GC(6) Garbage Collection (Allocation Rate) 8496M(69%)->3190M(26%)
>>
>> I just have 1 suggestion:
>> Would it be possible to also include the pause time of the collection in that log line (as the sum of the 3 pauses for each collection)?
>
> We've discussed various ways of showing this, for example like this
>
> "Garbage Collection (Allocation Rate) 8496M(69%)->3190M(26%), 1.302ms/0.987ms/1.211ms"
>
> where those times would correspond to the MarkStart/MarkEnd/RelocateStart pause times. There's a bit more to it, though: since there can potentially be more than one MarkEnd pause, it's not obvious what to show there. The longest of the MarkEnd pauses is probably what most people want to see.
>
> cheers,
> Per
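The log suffix proposed in this exchange can be sketched as a small formatting helper. The function name `format_pause_summary` is hypothetical, and picking the longest MarkEnd pause follows Per's suggestion rather than any confirmed ZGC log format.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical formatter for "<MarkStart>ms/<MarkEnd>ms/<RelocateStart>ms".
// A cycle may contain several MarkEnd pauses; the longest one is shown.
std::string format_pause_summary(double mark_start_ms,
                                 const std::vector<double>& mark_end_ms,
                                 double relocate_start_ms) {
  const double longest_mark_end =
      mark_end_ms.empty()
          ? 0.0
          : *std::max_element(mark_end_ms.begin(), mark_end_ms.end());
  char buf[64];
  std::snprintf(buf, sizeof(buf), "%.3fms/%.3fms/%.3fms",
                mark_start_ms, longest_mark_end, relocate_start_ms);
  return std::string(buf);
}
```

For example, a cycle with MarkEnd pauses of 0.412 ms and 0.987 ms would produce the same `1.302ms/0.987ms/1.211ms` string quoted above.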
>
>>
>> Thanks,
>> Pedro
>>
>>