high cpu usage caused by weak reference

shaobin zeng zengshaobin2008 at gmail.com
Wed Nov 25 15:30:54 UTC 2020


Hi Per,
    It is so kind of you to explain the details to me, I got the point now.
Then this problem may be resolved if the threads are not pooled or pooled
for a short time only, right? Since the innocent instances will not be
asked again if the thread is not active any more.

Per Liden <per.liden at oracle.com> 于2020年11月25日周三 下午10:49写道:

> On 11/24/20 5:41 PM, shaobin zeng wrote:
> > Hi Per,
> >      Thanks for your reply, the thread locals are used in a lib so it is
> > hard to change. And did you mean zgc is one of the SATB collectors, and
> > there is some thing wrong when the collector try to collect these weak
> > references because of the Reference::get is called?
>
> By calling Reference.get() you create a strong reference to the
> referent, which means it will not be collected until a future GC cycle
> has concluded that it's no longer strongly reachable. ThreadLocal is
> backed by a hash table. ThreadLocal.get() will do a lookup in that
> table, which in turn can call Reference.get() on a number of "innocent"
> instances to figure out if it's the instance it is looking for. This
> causes those "innocent" instances to be kept alive. This is where
> Reference.refersTo() comes in (which I linked to). refersTo() provides a
> way to ask what the referent is pointing to without creating a strong
> reference to it. This allows ThreadLocal.get() to avoid keeping
> "innocent" instances alive.
>
> cheers,
> Per
>
> (Btw, ZGC does precise wavefront marking, not SATB)
>
> >
> > Per Liden <per.liden at oracle.com <mailto:per.liden at oracle.com>> 于2020年
> > 11月24日周二 下午7:35写道:
> >
> >     Hi,
> >
> >     On 11/24/20 2:40 AM, shaobin zeng wrote:
> >      > Hi,
> >      >    I am trying to use zgc in product environment, so I updated
> >     jdk from jdk8
> >      > to openjdk 15, tomcat 8 to tomcat 8.5, and the gc related
> >     options, but the
> >      > cpu usage goes to 1000+% after a few hours later of the jvm
> >     start(normal
> >      > cpu usage should be 100-300%). If I make node offline about 30s,
> >     the cpu
> >      > goes down, and then make it online, it will work normally for
> >     hours until
> >      > the cpu goes high again. Here is the gc option:
> >      >
> >      >> export JAVA_OPTS='-Xms10g -Xmx10g -XX:+UseLargePages
> >      >> -XX:ZAllocationSpikeTolerance=5 -XX:ParallelGCThreads=8
> >     -XX:ConcGCThreads=4
> >      >> -Xss2m -XX:+UseZGC
> >      >>
> >
>  -Xlog:gc,gc+phases,safepoint:file=/logs/gc.log:t:filecount=10,filesize=10m
> >      >> -XX:+HeapDumpOnOutOfMemoryError'
> >      >
> >      >
> >      >     I profiled it with async-profile, the most hot method is
> >      > java/lang/ThreadLocal$ThreadLocalMap.getEntryAfterMiss, maybe
> >     there are too
> >      > many thread local map entry which are weak referenced are not
> >     reclaimed in
> >      > time (jdk8 and cms works well on this)?
> >      >    The following gc logs show the discovered weak reference count
> >     keep
> >      > increasing after start, but the requests is almost constant from
> >      > 11:00-17:00. Note that cpu dropped from 600% to 400%
> >     automatically after
> >      > GC(9821), the enqueued is ~250K. GC(10265) the node was offline,
> >     enqueued
> >      > was ~770K. I'm confused why the enqueued count is small in a long
> >     time and
> >      > the discovered count going up straightly?
> >      >     Thanks for any suggestions!
> >
> >     I think you've identified the problem. WeakReferences are constantly
> >     being resurrected and kept alive by calls to ThreadLocal.get(),
> >     which in
> >     turn calls getEntryAfterMiss(). Over time the table of WeakReferences
> >     grows (because they are all alive) and becomes more and more
> expensive
> >     to process. When you take the node off-line, and the calls
> >     ThreadLocal.get() stop (i.e. resurrection stops), then the GC gets a
> >     change to clean out the stale WeakReferences.
> >
> >     This is a known problem with WeakReferences, which ThreadLocal makes
> >     heavy use of. The good news is that work is in progress to fix this.
> >     The
> >     first part (https://bugs.openjdk.java.net/browse/JDK-8188055)
> recently
> >     went into mainline, and the two follow up patches
> >     (https://bugs.openjdk.java.net/browse/JDK-8256377 and
> >     https://bugs.openjdk.java.net/browse/JDK-8256167) will hopefully go
> in
> >     soon, and will resolve this issue for good.
> >
> >     In the mean time, you might want to look into calling
> >     ThreadLocal.remove() once a thread local is no longer needed. This is
> >     not always feasible, since those thread locals might have be created
> by
> >     a library you don't control, or it might be difficult to tell when a
> >     thread local is no longer needed, etc. However, if it is feasible in
> >     your case, then it might be a way for you to lower to cost of
> >     processing
> >     WeakReferences.
> >
> >     (CMS in JDK 8 was less sensitive to this issue, because it had a
> >     different marking strategy).
> >
> >     cheers,
> >     Per
> >
> >      >
> >      > [2020-11-19T11:00:00.245+0800] GC(992) Weak: 155658 encountered,
> >     72334
> >      >> discovered, 0 enqueued
> >      >> [2020-11-19T12:00:00.397+0800] GC(2194) Weak: 220462
> >     encountered, 122216
> >      >> discovered, 1380 enqueued
> >      >> [2020-11-19T12:00:03.411+0800] GC(2195) Weak: 220598
> >     encountered, 107228
> >      >> discovered, 677 enqueued
> >      >> [2020-11-19T13:00:00.497+0800] GC(3395) Weak: 222536
> >     encountered, 82199
> >      >> discovered, 1713 enqueued
> >      >> [2020-11-19T14:00:00.647+0800] GC(4613) Weak: 443946
> >     encountered, 291651
> >      >> discovered, 292 enqueued
> >      >> [2020-11-19T15:00:01.173+0800] GC(5819) Weak: 338065
> >     encountered, 124351
> >      >> discovered, 815 enqueued
> >      >> [2020-11-19T16:00:01.283+0800] GC(7022) Weak: 459070
> >     encountered, 298932
> >      >> discovered, 353 enqueued
> >      >> [2020-11-19T17:00:01.426+0800] GC(8222) Weak: 688162
> >     encountered, 519369
> >      >> discovered, 4648 enqueued
> >      >> [2020-11-19T16:00:01.283+0800] GC(7022) Weak: 459070
> >     encountered, 298932
> >      >> discovered, 353 enqueued
> >      >> [2020-11-19T17:00:01.426+0800] GC(8222) Weak: 688162
> >     encountered, 519369
> >      >> discovered, 4648 enqueued
> >      >> [2020-11-19T18:00:01.556+0800] GC(9430) Weak: 1078757
> >     encountered, 928748
> >      >> discovered, 1691 enqueued
> >      >> [2020-11-19T18:18:43.595+0800] GC(9821) Weak: 1022080
> >     encountered, 841168
> >      >> discovered, 247352 enqueued
> >      >> [2020-11-19T18:18:46.592+0800] GC(9822) Weak: 774253
> >     encountered, 568564
> >      >> discovered, 3938 enqueued
> >      >> [2020-11-19T18:40:49.616+0800] GC(10265) Weak: 842081
> >     encountered, 788825
> >      >> discovered, 767288 enqueued
> >      >> [2020-11-19T18:40:52.593+0800] GC(10266) Weak: 74876
> >     encountered, 18186
> >      >> discovered, 1 enqueued
> >      >
> >      >
> >
> >
> >
> > --
> > 曾少斌
>


-- 
曾少斌


More information about the zgc-dev mailing list