high cpu usage caused by weak reference

Per Liden per.liden at oracle.com
Tue Nov 24 11:35:17 UTC 2020


On 11/24/20 2:40 AM, shaobin zeng wrote:
> Hi,
>    I am trying to use zgc in product environment, so I updated jdk from jdk8
> to openjdk 15, tomcat 8 to tomcat 8.5, and the gc related options, but the
> cpu usage goes to 1000+% after a few hours later of the jvm start(normal
> cpu usage should be 100-300%). If I make node offline about 30s, the cpu
> goes down, and then make it online, it will work normally for hours until
> the cpu goes high again. Here is the gc option:
>> export JAVA_OPTS='-Xms10g -Xmx10g -XX:+UseLargePages
>> -XX:ZAllocationSpikeTolerance=5 -XX:ParallelGCThreads=8 -XX:ConcGCThreads=4
>> -Xss2m -XX:+UseZGC
>> -Xlog:gc,gc+phases,safepoint:file=/logs/gc.log:t:filecount=10,filesize=10m
>> -XX:+HeapDumpOnOutOfMemoryError'
>     I profiled it with async-profile, the most hot method is
> java/lang/ThreadLocal$ThreadLocalMap.getEntryAfterMiss, maybe there are too
> many thread local map entry which are weak referenced are not reclaimed in
> time (jdk8 and cms works well on this)?
>    The following gc logs show the discovered weak reference count keep
> increasing after start, but the requests is almost constant from
> 11:00-17:00. Note that cpu dropped from 600% to 400% automatically after
> GC(9821), the enqueued is ~250K. GC(10265) the node was offline, enqueued
> was ~770K. I'm confused why the enqueued count is small in a long time and
> the discovered count going up straightly?
>     Thanks for any suggestions!

I think you've identified the problem. WeakReferences are constantly 
being resurrected and kept alive by calls to ThreadLocal.get(), which in 
turn calls getEntryAfterMiss(). Over time the table of WeakReferences 
grows (because they are all alive) and becomes more and more expensive 
to process. When you take the node off-line, and the calls 
ThreadLocal.get() stop (i.e. resurrection stops), then the GC gets a 
change to clean out the stale WeakReferences.

This is a known problem with WeakReferences, which ThreadLocal makes 
heavy use of. The good news is that work is in progress to fix this. The 
first part (https://bugs.openjdk.java.net/browse/JDK-8188055) recently 
went into mainline, and the two follow up patches 
(https://bugs.openjdk.java.net/browse/JDK-8256377 and 
https://bugs.openjdk.java.net/browse/JDK-8256167) will hopefully go in 
soon, and will resolve this issue for good.

In the mean time, you might want to look into calling 
ThreadLocal.remove() once a thread local is no longer needed. This is 
not always feasible, since those thread locals might have be created by 
a library you don't control, or it might be difficult to tell when a 
thread local is no longer needed, etc. However, if it is feasible in 
your case, then it might be a way for you to lower to cost of processing 

(CMS in JDK 8 was less sensitive to this issue, because it had a 
different marking strategy).


> [2020-11-19T11:00:00.245+0800] GC(992) Weak: 155658 encountered, 72334
>> discovered, 0 enqueued
>> [2020-11-19T12:00:00.397+0800] GC(2194) Weak: 220462 encountered, 122216
>> discovered, 1380 enqueued
>> [2020-11-19T12:00:03.411+0800] GC(2195) Weak: 220598 encountered, 107228
>> discovered, 677 enqueued
>> [2020-11-19T13:00:00.497+0800] GC(3395) Weak: 222536 encountered, 82199
>> discovered, 1713 enqueued
>> [2020-11-19T14:00:00.647+0800] GC(4613) Weak: 443946 encountered, 291651
>> discovered, 292 enqueued
>> [2020-11-19T15:00:01.173+0800] GC(5819) Weak: 338065 encountered, 124351
>> discovered, 815 enqueued
>> [2020-11-19T16:00:01.283+0800] GC(7022) Weak: 459070 encountered, 298932
>> discovered, 353 enqueued
>> [2020-11-19T17:00:01.426+0800] GC(8222) Weak: 688162 encountered, 519369
>> discovered, 4648 enqueued
>> [2020-11-19T16:00:01.283+0800] GC(7022) Weak: 459070 encountered, 298932
>> discovered, 353 enqueued
>> [2020-11-19T17:00:01.426+0800] GC(8222) Weak: 688162 encountered, 519369
>> discovered, 4648 enqueued
>> [2020-11-19T18:00:01.556+0800] GC(9430) Weak: 1078757 encountered, 928748
>> discovered, 1691 enqueued
>> [2020-11-19T18:18:43.595+0800] GC(9821) Weak: 1022080 encountered, 841168
>> discovered, 247352 enqueued
>> [2020-11-19T18:18:46.592+0800] GC(9822) Weak: 774253 encountered, 568564
>> discovered, 3938 enqueued
>> [2020-11-19T18:40:49.616+0800] GC(10265) Weak: 842081 encountered, 788825
>> discovered, 767288 enqueued
>> [2020-11-19T18:40:52.593+0800] GC(10266) Weak: 74876 encountered, 18186
>> discovered, 1 enqueued

More information about the zgc-dev mailing list