high cpu usage caused by weak reference
Per Liden
per.liden at oracle.com
Wed Nov 25 16:51:05 UTC 2020
On 11/25/20 4:30 PM, shaobin zeng wrote:
> Hi Per,
> It is so kind of you to explain the details to me, I got the point
> now. Then this problem may be resolved if the threads are not pooled or
> pooled for a short time only, right? Since the innocent instances will
> not be asked again if the thread is not active any more.
Yes, in the sense that the backing hash table is owned by the thread, so
when it terminates the table (and all its WeakReferences) will become
garbage and eventually collected.
cheers,
Per
>
> Per Liden <per.liden at oracle.com <mailto:per.liden at oracle.com>> 于2020年
> 11月25日周三 下午10:49写道:
>
> On 11/24/20 5:41 PM, shaobin zeng wrote:
> > Hi Per,
> > Thanks for your reply, the thread locals are used in a lib
> so it is
> > hard to change. And did you mean zgc is one of the SATB
> collectors, and
> > there is some thing wrong when the collector try to collect these
> weak
> > references because of the Reference::get is called?
>
> By calling Reference.get() you create a strong reference to the
> referent, which means it will not be collected until a future GC cycle
> has concluded that it's no longer strongly reachable. ThreadLocal is
> backed by a hash table. ThreadLocal.get() will do a lookup in that
> table, which in turn can call Reference.get() on a number of "innocent"
> instances to figure out if it's the instance it is looking for. This
> causes those "innocent" instances to be kept alive. This is where
> Reference.refersTo() comes in (which I linked to). refersTo()
> provides a
> way to ask what the referent is pointing to without creating a strong
> reference to it. This allows ThreadLocal.get() to avoid keeping
> "innocent" instances alive.
>
> cheers,
> Per
>
> (Btw, ZGC does precise wavefront marking, not SATB)
>
> >
> > Per Liden <per.liden at oracle.com <mailto:per.liden at oracle.com>
> <mailto:per.liden at oracle.com <mailto:per.liden at oracle.com>>> 于2020年
> > 11月24日周二 下午7:35写道:
> >
> > Hi,
> >
> > On 11/24/20 2:40 AM, shaobin zeng wrote:
> > > Hi,
> > > I am trying to use zgc in product environment, so I updated
> > jdk from jdk8
> > > to openjdk 15, tomcat 8 to tomcat 8.5, and the gc related
> > options, but the
> > > cpu usage goes to 1000+% after a few hours later of the jvm
> > start(normal
> > > cpu usage should be 100-300%). If I make node offline
> about 30s,
> > the cpu
> > > goes down, and then make it online, it will work normally for
> > hours until
> > > the cpu goes high again. Here is the gc option:
> > >
> > >> export JAVA_OPTS='-Xms10g -Xmx10g -XX:+UseLargePages
> > >> -XX:ZAllocationSpikeTolerance=5 -XX:ParallelGCThreads=8
> > -XX:ConcGCThreads=4
> > >> -Xss2m -XX:+UseZGC
> > >>
> >
> -Xlog:gc,gc+phases,safepoint:file=/logs/gc.log:t:filecount=10,filesize=10m
> > >> -XX:+HeapDumpOnOutOfMemoryError'
> > >
> > >
> > > I profiled it with async-profile, the most hot method is
> > > java/lang/ThreadLocal$ThreadLocalMap.getEntryAfterMiss, maybe
> > there are too
> > > many thread local map entry which are weak referenced are not
> > reclaimed in
> > > time (jdk8 and cms works well on this)?
> > > The following gc logs show the discovered weak
> reference count
> > keep
> > > increasing after start, but the requests is almost
> constant from
> > > 11:00-17:00. Note that cpu dropped from 600% to 400%
> > automatically after
> > > GC(9821), the enqueued is ~250K. GC(10265) the node was
> offline,
> > enqueued
> > > was ~770K. I'm confused why the enqueued count is small in
> a long
> > time and
> > > the discovered count going up straightly?
> > > Thanks for any suggestions!
> >
> > I think you've identified the problem. WeakReferences are
> constantly
> > being resurrected and kept alive by calls to ThreadLocal.get(),
> > which in
> > turn calls getEntryAfterMiss(). Over time the table of
> WeakReferences
> > grows (because they are all alive) and becomes more and more
> expensive
> > to process. When you take the node off-line, and the calls
> > ThreadLocal.get() stop (i.e. resurrection stops), then the GC
> gets a
> > change to clean out the stale WeakReferences.
> >
> > This is a known problem with WeakReferences, which
> ThreadLocal makes
> > heavy use of. The good news is that work is in progress to
> fix this.
> > The
> > first part (https://bugs.openjdk.java.net/browse/JDK-8188055)
> recently
> > went into mainline, and the two follow up patches
> > (https://bugs.openjdk.java.net/browse/JDK-8256377 and
> > https://bugs.openjdk.java.net/browse/JDK-8256167) will hopefully
> go in
> > soon, and will resolve this issue for good.
> >
> > In the mean time, you might want to look into calling
> > ThreadLocal.remove() once a thread local is no longer needed.
> This is
> > not always feasible, since those thread locals might have be
> created by
> > a library you don't control, or it might be difficult to tell
> when a
> > thread local is no longer needed, etc. However, if it is
> feasible in
> > your case, then it might be a way for you to lower to cost of
> > processing
> > WeakReferences.
> >
> > (CMS in JDK 8 was less sensitive to this issue, because it had a
> > different marking strategy).
> >
> > cheers,
> > Per
> >
> > >
> > > [2020-11-19T11:00:00.245+0800] GC(992) Weak: 155658
> encountered,
> > 72334
> > >> discovered, 0 enqueued
> > >> [2020-11-19T12:00:00.397+0800] GC(2194) Weak: 220462
> > encountered, 122216
> > >> discovered, 1380 enqueued
> > >> [2020-11-19T12:00:03.411+0800] GC(2195) Weak: 220598
> > encountered, 107228
> > >> discovered, 677 enqueued
> > >> [2020-11-19T13:00:00.497+0800] GC(3395) Weak: 222536
> > encountered, 82199
> > >> discovered, 1713 enqueued
> > >> [2020-11-19T14:00:00.647+0800] GC(4613) Weak: 443946
> > encountered, 291651
> > >> discovered, 292 enqueued
> > >> [2020-11-19T15:00:01.173+0800] GC(5819) Weak: 338065
> > encountered, 124351
> > >> discovered, 815 enqueued
> > >> [2020-11-19T16:00:01.283+0800] GC(7022) Weak: 459070
> > encountered, 298932
> > >> discovered, 353 enqueued
> > >> [2020-11-19T17:00:01.426+0800] GC(8222) Weak: 688162
> > encountered, 519369
> > >> discovered, 4648 enqueued
> > >> [2020-11-19T16:00:01.283+0800] GC(7022) Weak: 459070
> > encountered, 298932
> > >> discovered, 353 enqueued
> > >> [2020-11-19T17:00:01.426+0800] GC(8222) Weak: 688162
> > encountered, 519369
> > >> discovered, 4648 enqueued
> > >> [2020-11-19T18:00:01.556+0800] GC(9430) Weak: 1078757
> > encountered, 928748
> > >> discovered, 1691 enqueued
> > >> [2020-11-19T18:18:43.595+0800] GC(9821) Weak: 1022080
> > encountered, 841168
> > >> discovered, 247352 enqueued
> > >> [2020-11-19T18:18:46.592+0800] GC(9822) Weak: 774253
> > encountered, 568564
> > >> discovered, 3938 enqueued
> > >> [2020-11-19T18:40:49.616+0800] GC(10265) Weak: 842081
> > encountered, 788825
> > >> discovered, 767288 enqueued
> > >> [2020-11-19T18:40:52.593+0800] GC(10266) Weak: 74876
> > encountered, 18186
> > >> discovered, 1 enqueued
> > >
> > >
> >
> >
> >
> > --
> > 曾少斌
>
>
>
> --
> 曾少斌
More information about the zgc-dev
mailing list