high cpu usage caused by weak reference
shaobin zeng
zengshaobin2008 at gmail.com
Thu Nov 26 02:31:22 UTC 2020
OK, I will do it right away. Thank you very much!
Per Liden <per.liden at oracle.com> 于2020年11月26日周四 上午12:51写道:
> On 11/25/20 4:30 PM, shaobin zeng wrote:
> > Hi Per,
> > It is so kind of you to explain the details to me, I got the point
> > now. Then this problem may be resolved if the threads are not pooled or
> > pooled for a short time only, right? Since the innocent instances will
> > not be asked again if the thread is not active any more.
>
> Yes, in the sense that the backing hash table is owned by the thread, so
> when it terminates the table (and all its WeakReferences) will become
> garbage and eventually collected.
>
> cheers,
> Per
>
> >
> > Per Liden <per.liden at oracle.com <mailto:per.liden at oracle.com>> 于2020年
> > 11月25日周三 下午10:49写道:
> >
> > On 11/24/20 5:41 PM, shaobin zeng wrote:
> > > Hi Per,
> > > Thanks for your reply, the thread locals are used in a lib
> > so it is
> > > hard to change. And did you mean zgc is one of the SATB
> > collectors, and
> > > there is some thing wrong when the collector try to collect these
> > weak
> > > references because of the Reference::get is called?
> >
> > By calling Reference.get() you create a strong reference to the
> > referent, which means it will not be collected until a future GC
> cycle
> > has concluded that it's no longer strongly reachable. ThreadLocal is
> > backed by a hash table. ThreadLocal.get() will do a lookup in that
> > table, which in turn can call Reference.get() on a number of
> "innocent"
> > instances to figure out if it's the instance it is looking for. This
> > causes those "innocent" instances to be kept alive. This is where
> > Reference.refersTo() comes in (which I linked to). refersTo()
> > provides a
> > way to ask what the referent is pointing to without creating a strong
> > reference to it. This allows ThreadLocal.get() to avoid keeping
> > "innocent" instances alive.
> >
> > cheers,
> > Per
> >
> > (Btw, ZGC does precise wavefront marking, not SATB)
> >
> > >
> > > Per Liden <per.liden at oracle.com <mailto:per.liden at oracle.com>
> > <mailto:per.liden at oracle.com <mailto:per.liden at oracle.com>>> 于2020年
> > > 11月24日周二 下午7:35写道:
> > >
> > > Hi,
> > >
> > > On 11/24/20 2:40 AM, shaobin zeng wrote:
> > > > Hi,
> > > > I am trying to use zgc in product environment, so I
> updated
> > > jdk from jdk8
> > > > to openjdk 15, tomcat 8 to tomcat 8.5, and the gc related
> > > options, but the
> > > > cpu usage goes to 1000+% after a few hours later of the jvm
> > > start(normal
> > > > cpu usage should be 100-300%). If I make node offline
> > about 30s,
> > > the cpu
> > > > goes down, and then make it online, it will work normally
> for
> > > hours until
> > > > the cpu goes high again. Here is the gc option:
> > > >
> > > >> export JAVA_OPTS='-Xms10g -Xmx10g -XX:+UseLargePages
> > > >> -XX:ZAllocationSpikeTolerance=5 -XX:ParallelGCThreads=8
> > > -XX:ConcGCThreads=4
> > > >> -Xss2m -XX:+UseZGC
> > > >>
> > >
> >
> -Xlog:gc,gc+phases,safepoint:file=/logs/gc.log:t:filecount=10,filesize=10m
> > > >> -XX:+HeapDumpOnOutOfMemoryError'
> > > >
> > > >
> > > > I profiled it with async-profile, the most hot method
> is
> > > > java/lang/ThreadLocal$ThreadLocalMap.getEntryAfterMiss,
> maybe
> > > there are too
> > > > many thread local map entry which are weak referenced are
> not
> > > reclaimed in
> > > > time (jdk8 and cms works well on this)?
> > > > The following gc logs show the discovered weak
> > reference count
> > > keep
> > > > increasing after start, but the requests is almost
> > constant from
> > > > 11:00-17:00. Note that cpu dropped from 600% to 400%
> > > automatically after
> > > > GC(9821), the enqueued is ~250K. GC(10265) the node was
> > offline,
> > > enqueued
> > > > was ~770K. I'm confused why the enqueued count is small in
> > a long
> > > time and
> > > > the discovered count going up straightly?
> > > > Thanks for any suggestions!
> > >
> > > I think you've identified the problem. WeakReferences are
> > constantly
> > > being resurrected and kept alive by calls to
> ThreadLocal.get(),
> > > which in
> > > turn calls getEntryAfterMiss(). Over time the table of
> > WeakReferences
> > > grows (because they are all alive) and becomes more and more
> > expensive
> > > to process. When you take the node off-line, and the calls
> > > ThreadLocal.get() stop (i.e. resurrection stops), then the GC
> > gets a
> > > change to clean out the stale WeakReferences.
> > >
> > > This is a known problem with WeakReferences, which
> > ThreadLocal makes
> > > heavy use of. The good news is that work is in progress to
> > fix this.
> > > The
> > > first part (https://bugs.openjdk.java.net/browse/JDK-8188055)
> > recently
> > > went into mainline, and the two follow up patches
> > > (https://bugs.openjdk.java.net/browse/JDK-8256377 and
> > > https://bugs.openjdk.java.net/browse/JDK-8256167) will hopefully
> > go in
> > > soon, and will resolve this issue for good.
> > >
> > > In the mean time, you might want to look into calling
> > > ThreadLocal.remove() once a thread local is no longer needed.
> > This is
> > > not always feasible, since those thread locals might have be
> > created by
> > > a library you don't control, or it might be difficult to tell
> > when a
> > > thread local is no longer needed, etc. However, if it is
> > feasible in
> > > your case, then it might be a way for you to lower to cost of
> > > processing
> > > WeakReferences.
> > >
> > > (CMS in JDK 8 was less sensitive to this issue, because it
> had a
> > > different marking strategy).
> > >
> > > cheers,
> > > Per
> > >
> > > >
> > > > [2020-11-19T11:00:00.245+0800] GC(992) Weak: 155658
> > encountered,
> > > 72334
> > > >> discovered, 0 enqueued
> > > >> [2020-11-19T12:00:00.397+0800] GC(2194) Weak: 220462
> > > encountered, 122216
> > > >> discovered, 1380 enqueued
> > > >> [2020-11-19T12:00:03.411+0800] GC(2195) Weak: 220598
> > > encountered, 107228
> > > >> discovered, 677 enqueued
> > > >> [2020-11-19T13:00:00.497+0800] GC(3395) Weak: 222536
> > > encountered, 82199
> > > >> discovered, 1713 enqueued
> > > >> [2020-11-19T14:00:00.647+0800] GC(4613) Weak: 443946
> > > encountered, 291651
> > > >> discovered, 292 enqueued
> > > >> [2020-11-19T15:00:01.173+0800] GC(5819) Weak: 338065
> > > encountered, 124351
> > > >> discovered, 815 enqueued
> > > >> [2020-11-19T16:00:01.283+0800] GC(7022) Weak: 459070
> > > encountered, 298932
> > > >> discovered, 353 enqueued
> > > >> [2020-11-19T17:00:01.426+0800] GC(8222) Weak: 688162
> > > encountered, 519369
> > > >> discovered, 4648 enqueued
> > > >> [2020-11-19T16:00:01.283+0800] GC(7022) Weak: 459070
> > > encountered, 298932
> > > >> discovered, 353 enqueued
> > > >> [2020-11-19T17:00:01.426+0800] GC(8222) Weak: 688162
> > > encountered, 519369
> > > >> discovered, 4648 enqueued
> > > >> [2020-11-19T18:00:01.556+0800] GC(9430) Weak: 1078757
> > > encountered, 928748
> > > >> discovered, 1691 enqueued
> > > >> [2020-11-19T18:18:43.595+0800] GC(9821) Weak: 1022080
> > > encountered, 841168
> > > >> discovered, 247352 enqueued
> > > >> [2020-11-19T18:18:46.592+0800] GC(9822) Weak: 774253
> > > encountered, 568564
> > > >> discovered, 3938 enqueued
> > > >> [2020-11-19T18:40:49.616+0800] GC(10265) Weak: 842081
> > > encountered, 788825
> > > >> discovered, 767288 enqueued
> > > >> [2020-11-19T18:40:52.593+0800] GC(10266) Weak: 74876
> > > encountered, 18186
> > > >> discovered, 1 enqueued
> > > >
> > > >
> > >
> > >
> > >
> > > --
> > > 曾少斌
> >
> >
> >
> > --
> > 曾少斌
>
--
曾少斌
More information about the zgc-dev
mailing list