high cpu usage caused by weak reference

shaobin zeng zengshaobin2008 at gmail.com
Thu Nov 26 02:31:22 UTC 2020


OK, I will do it right away. Thank you very much!

Per Liden <per.liden at oracle.com> 于2020年11月26日周四 上午12:51写道:

> On 11/25/20 4:30 PM, shaobin zeng wrote:
> > Hi Per,
> >      It is so kind of you to explain the details to me, I got the point
> > now. Then this problem may be resolved if the threads are not pooled or
> > pooled for a short time only, right? Since the innocent instances will
> > not be asked again if the thread is not active any more.
>
> Yes, in the sense that the backing hash table is owned by the thread, so
> when it terminates the table (and all its WeakReferences) will become
> garbage and eventually collected.
>
> cheers,
> Per
>
> >
> > Per Liden <per.liden at oracle.com <mailto:per.liden at oracle.com>> 于2020年
> > 11月25日周三 下午10:49写道:
> >
> >     On 11/24/20 5:41 PM, shaobin zeng wrote:
> >      > Hi Per,
> >      >      Thanks for your reply, the thread locals are used in a lib
> >     so it is
> >      > hard to change. And did you mean zgc is one of the SATB
> >     collectors, and
> >      > there is some thing wrong when the collector try to collect these
> >     weak
> >      > references because of the Reference::get is called?
> >
> >     By calling Reference.get() you create a strong reference to the
> >     referent, which means it will not be collected until a future GC
> cycle
> >     has concluded that it's no longer strongly reachable. ThreadLocal is
> >     backed by a hash table. ThreadLocal.get() will do a lookup in that
> >     table, which in turn can call Reference.get() on a number of
> "innocent"
> >     instances to figure out if it's the instance it is looking for. This
> >     causes those "innocent" instances to be kept alive. This is where
> >     Reference.refersTo() comes in (which I linked to). refersTo()
> >     provides a
> >     way to ask what the referent is pointing to without creating a strong
> >     reference to it. This allows ThreadLocal.get() to avoid keeping
> >     "innocent" instances alive.
> >
> >     cheers,
> >     Per
> >
> >     (Btw, ZGC does precise wavefront marking, not SATB)
> >
> >      >
> >      > Per Liden <per.liden at oracle.com <mailto:per.liden at oracle.com>
> >     <mailto:per.liden at oracle.com <mailto:per.liden at oracle.com>>> 于2020年
> >      > 11月24日周二 下午7:35写道:
> >      >
> >      >     Hi,
> >      >
> >      >     On 11/24/20 2:40 AM, shaobin zeng wrote:
> >      >      > Hi,
> >      >      >    I am trying to use zgc in product environment, so I
> updated
> >      >     jdk from jdk8
> >      >      > to openjdk 15, tomcat 8 to tomcat 8.5, and the gc related
> >      >     options, but the
> >      >      > cpu usage goes to 1000+% after a few hours later of the jvm
> >      >     start(normal
> >      >      > cpu usage should be 100-300%). If I make node offline
> >     about 30s,
> >      >     the cpu
> >      >      > goes down, and then make it online, it will work normally
> for
> >      >     hours until
> >      >      > the cpu goes high again. Here is the gc option:
> >      >      >
> >      >      >> export JAVA_OPTS='-Xms10g -Xmx10g -XX:+UseLargePages
> >      >      >> -XX:ZAllocationSpikeTolerance=5 -XX:ParallelGCThreads=8
> >      >     -XX:ConcGCThreads=4
> >      >      >> -Xss2m -XX:+UseZGC
> >      >      >>
> >      >
> >
>  -Xlog:gc,gc+phases,safepoint:file=/logs/gc.log:t:filecount=10,filesize=10m
> >      >      >> -XX:+HeapDumpOnOutOfMemoryError'
> >      >      >
> >      >      >
> >      >      >     I profiled it with async-profile, the most hot method
> is
> >      >      > java/lang/ThreadLocal$ThreadLocalMap.getEntryAfterMiss,
> maybe
> >      >     there are too
> >      >      > many thread local map entry which are weak referenced are
> not
> >      >     reclaimed in
> >      >      > time (jdk8 and cms works well on this)?
> >      >      >    The following gc logs show the discovered weak
> >     reference count
> >      >     keep
> >      >      > increasing after start, but the requests is almost
> >     constant from
> >      >      > 11:00-17:00. Note that cpu dropped from 600% to 400%
> >      >     automatically after
> >      >      > GC(9821), the enqueued is ~250K. GC(10265) the node was
> >     offline,
> >      >     enqueued
> >      >      > was ~770K. I'm confused why the enqueued count is small in
> >     a long
> >      >     time and
> >      >      > the discovered count going up straightly?
> >      >      >     Thanks for any suggestions!
> >      >
> >      >     I think you've identified the problem. WeakReferences are
> >     constantly
> >      >     being resurrected and kept alive by calls to
> ThreadLocal.get(),
> >      >     which in
> >      >     turn calls getEntryAfterMiss(). Over time the table of
> >     WeakReferences
> >      >     grows (because they are all alive) and becomes more and more
> >     expensive
> >      >     to process. When you take the node off-line, and the calls
> >      >     ThreadLocal.get() stop (i.e. resurrection stops), then the GC
> >     gets a
> >      >     change to clean out the stale WeakReferences.
> >      >
> >      >     This is a known problem with WeakReferences, which
> >     ThreadLocal makes
> >      >     heavy use of. The good news is that work is in progress to
> >     fix this.
> >      >     The
> >      >     first part (https://bugs.openjdk.java.net/browse/JDK-8188055)
> >     recently
> >      >     went into mainline, and the two follow up patches
> >      >     (https://bugs.openjdk.java.net/browse/JDK-8256377 and
> >      > https://bugs.openjdk.java.net/browse/JDK-8256167) will hopefully
> >     go in
> >      >     soon, and will resolve this issue for good.
> >      >
> >      >     In the mean time, you might want to look into calling
> >      >     ThreadLocal.remove() once a thread local is no longer needed.
> >     This is
> >      >     not always feasible, since those thread locals might have be
> >     created by
> >      >     a library you don't control, or it might be difficult to tell
> >     when a
> >      >     thread local is no longer needed, etc. However, if it is
> >     feasible in
> >      >     your case, then it might be a way for you to lower to cost of
> >      >     processing
> >      >     WeakReferences.
> >      >
> >      >     (CMS in JDK 8 was less sensitive to this issue, because it
> had a
> >      >     different marking strategy).
> >      >
> >      >     cheers,
> >      >     Per
> >      >
> >      >      >
> >      >      > [2020-11-19T11:00:00.245+0800] GC(992) Weak: 155658
> >     encountered,
> >      >     72334
> >      >      >> discovered, 0 enqueued
> >      >      >> [2020-11-19T12:00:00.397+0800] GC(2194) Weak: 220462
> >      >     encountered, 122216
> >      >      >> discovered, 1380 enqueued
> >      >      >> [2020-11-19T12:00:03.411+0800] GC(2195) Weak: 220598
> >      >     encountered, 107228
> >      >      >> discovered, 677 enqueued
> >      >      >> [2020-11-19T13:00:00.497+0800] GC(3395) Weak: 222536
> >      >     encountered, 82199
> >      >      >> discovered, 1713 enqueued
> >      >      >> [2020-11-19T14:00:00.647+0800] GC(4613) Weak: 443946
> >      >     encountered, 291651
> >      >      >> discovered, 292 enqueued
> >      >      >> [2020-11-19T15:00:01.173+0800] GC(5819) Weak: 338065
> >      >     encountered, 124351
> >      >      >> discovered, 815 enqueued
> >      >      >> [2020-11-19T16:00:01.283+0800] GC(7022) Weak: 459070
> >      >     encountered, 298932
> >      >      >> discovered, 353 enqueued
> >      >      >> [2020-11-19T17:00:01.426+0800] GC(8222) Weak: 688162
> >      >     encountered, 519369
> >      >      >> discovered, 4648 enqueued
> >      >      >> [2020-11-19T16:00:01.283+0800] GC(7022) Weak: 459070
> >      >     encountered, 298932
> >      >      >> discovered, 353 enqueued
> >      >      >> [2020-11-19T17:00:01.426+0800] GC(8222) Weak: 688162
> >      >     encountered, 519369
> >      >      >> discovered, 4648 enqueued
> >      >      >> [2020-11-19T18:00:01.556+0800] GC(9430) Weak: 1078757
> >      >     encountered, 928748
> >      >      >> discovered, 1691 enqueued
> >      >      >> [2020-11-19T18:18:43.595+0800] GC(9821) Weak: 1022080
> >      >     encountered, 841168
> >      >      >> discovered, 247352 enqueued
> >      >      >> [2020-11-19T18:18:46.592+0800] GC(9822) Weak: 774253
> >      >     encountered, 568564
> >      >      >> discovered, 3938 enqueued
> >      >      >> [2020-11-19T18:40:49.616+0800] GC(10265) Weak: 842081
> >      >     encountered, 788825
> >      >      >> discovered, 767288 enqueued
> >      >      >> [2020-11-19T18:40:52.593+0800] GC(10266) Weak: 74876
> >      >     encountered, 18186
> >      >      >> discovered, 1 enqueued
> >      >      >
> >      >      >
> >      >
> >      >
> >      >
> >      > --
> >      > 曾少斌
> >
> >
> >
> > --
> > 曾少斌
>


-- 
曾少斌


More information about the zgc-dev mailing list