From David.Tavoularis at mycom-osi.com Fri Nov 20 15:21:31 2020
From: David.Tavoularis at mycom-osi.com (David Tavoularis)
Date: Fri, 20 Nov 2020 16:21:31 +0100
Subject: ZGC: Failed to truncate backing file (Permission denied) at startup after 14.0.2 to 15.0.1 upgrade
Message-ID: 

Hi,

I found a possible regression in a customer production environment, linked to JDK-8245203 "ZGC: Don't track size in ZPhysicalMemoryBacking" or the JDK 15 change "Fixed support for transparent huge pages".
After upgrading from jdk-14.0.2 to jdk-15.0.1, the JVM (using ZGC) crashes at startup with the error message "Failed to truncate backing file (Permission denied)".

This error message was introduced in changeset
http://hg.openjdk.java.net/jdk-updates/jdk15u/rev/556d5070c458
See the source code:
http://hg.openjdk.java.net/jdk-updates/jdk15u/file/556d5070c458/src/hotspot/os/linux/gc/z/zPhysicalMemoryBacking_linux.cpp

The issue was seen in Prod, not in Test, nor in labs. I suspect that either the Heap Backing Filesystem is not correctly set to tmpfs in Prod, or there is a permission issue on a directory/file owned by root.

$ /opt/3rd_party/jdk_installed/jdk-15.0.1/bin/java -version
openjdk version "15.0.1" 2020-10-20
OpenJDK Runtime Environment (build 15.0.1+9-18)
OpenJDK 64-Bit Server VM (build 15.0.1+9-18, mixed mode, sharing)

$ /opt/3rd_party/jdk_installed/jdk-15.0.1/bin/java -XX:+UseZGC -version
[0.009s][error][gc] Failed to truncate backing file (Permission denied)
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.

$ /opt/mycom/3rd_party/jdk_installed/jdk-14.0.2/bin/java -XX:+UnlockExperimentalVMOptions -XX:+UseZGC -version
openjdk version "14.0.2" 2020-07-14
OpenJDK Runtime Environment (build 14.0.2+12-46)
OpenJDK 64-Bit Server VM (build 14.0.2+12-46, mixed mode)

Should I file a bug at https://bugreport.java.com/bugreport/ ?
What additional information should I provide?
Any workaround?

Best Regards
-- 
David

[Additional details]
1. Not working in Prod with 15.0.1:

openjdk version "15.0.1" 2020-10-20
OpenJDK Runtime Environment (build 15.0.1+9-18)
OpenJDK 64-Bit Server VM (build 15.0.1+9-18, mixed mode, sharing)

[2020-11-20T16:56:31.952+0530][debug][gc,heap] Minimum heap 96636764160  Initial heap 96636764160  Maximum heap 289910292480
[2020-11-20T16:56:31.953+0530][info ][gc,init] Initializing The Z Garbage Collector
[2020-11-20T16:56:31.953+0530][info ][gc,init] Version: 15.0.1+9-18 (release)
[2020-11-20T16:56:31.953+0530][info ][gc,init] NUMA Support: Disabled
[2020-11-20T16:56:31.953+0530][info ][gc,init] CPUs: 70 total, 45 available
[2020-11-20T16:56:31.953+0530][info ][gc,init] Memory: 307200M
[2020-11-20T16:56:31.953+0530][info ][gc,init] Large Page Support: Disabled
[2020-11-20T16:56:31.953+0530][info ][gc,init] Workers: 8 parallel, 8 concurrent
[2020-11-20T16:56:31.956+0530][info ][gc,init] Address Space Type: Contiguous/Unrestricted/Complete
[2020-11-20T16:56:31.956+0530][info ][gc,init] Address Space Size: 4423680M x 3 = 13271040M
[2020-11-20T16:56:31.956+0530][info ][gc,init] Heap Backing File: /memfd:java_heap
[2020-11-20T16:56:31.956+0530][error][gc     ] Failed to truncate backing file (Permission denied)
[2020-11-20T16:56:31.992+0530][info ][gc,init] Runtime Workers: 8 parallel
[2020-11-20T16:56:31.994+0530][info ][gc     ] Using The Z Garbage Collector

$ /opt/mycom/3rd_party/jdk_installed/jdk-15.0.1/bin/java -Xmx512m -XX:+PrintFlagsFinal -version | egrep "UseHugeTLBFS|UseLargePages|UseSHM|UseTransparentHugePages"
     bool UseHugeTLBFS                      = false   {product} {default}
     bool UseLargePages                     = false   {pd product} {default}
     bool UseLargePagesInMetaspace          = false   {product} {default}
     bool UseLargePagesIndividualAllocation = false   {pd product} {default}
     bool UseSHM                            = false   {product} {default}
     bool UseTransparentHugePages           = false   {product} {default}
2. Working fine in Prod with 14.0.2:

openjdk version "14.0.2" 2020-07-14
OpenJDK Runtime Environment (build 14.0.2+12-46)
OpenJDK 64-Bit Server VM (build 14.0.2+12-46, mixed mode, sharing)

[2020-11-20T17:10:57.378+0530][debug][gc,heap] Minimum heap 96636764160  Initial heap 96636764160  Maximum heap 289910292480
[2020-11-20T17:10:57.378+0530][info ][gc,init] Initializing The Z Garbage Collector
[2020-11-20T17:10:57.378+0530][info ][gc,init] Version: 14.0.2+12-46 (release)
[2020-11-20T17:10:57.378+0530][info ][gc,init] NUMA Support: Disabled
[2020-11-20T17:10:57.378+0530][info ][gc,init] CPUs: 70 total, 45 available
[2020-11-20T17:10:57.378+0530][info ][gc,init] Memory: 307200M
[2020-11-20T17:10:57.378+0530][info ][gc,init] Large Page Support: Disabled
[2020-11-20T17:10:57.378+0530][info ][gc,init] Medium Page Size: 32M
[2020-11-20T17:10:57.378+0530][info ][gc,init] Workers: 8 parallel, 8 concurrent
[2020-11-20T17:10:57.381+0530][info ][gc,init] Address Space Type: Contiguous/Unrestricted/Complete
[2020-11-20T17:10:57.381+0530][info ][gc,init] Address Space Size: 4423680M x 3 = 13271040M
[2020-11-20T17:10:57.381+0530][info ][gc,init] Heap backed by file: /memfd:java_heap
[2020-11-20T17:10:57.381+0530][info ][gc,init] Min Capacity: 92160M
[2020-11-20T17:10:57.381+0530][info ][gc,init] Initial Capacity: 92160M
[2020-11-20T17:10:57.381+0530][info ][gc,init] Max Capacity: 276480M
[2020-11-20T17:10:57.381+0530][info ][gc,init] Max Reserve: 48M
[2020-11-20T17:10:57.381+0530][info ][gc,init] Pre-touch: Disabled
[2020-11-20T17:10:57.381+0530][info ][gc,init] Available space on backing filesystem: N/A
[2020-11-20T17:11:21.585+0530][info ][gc,init] Uncommit: Enabled, Delay: 300s
[2020-11-20T17:11:21.618+0530][info ][gc,init] Runtime Workers: 8 parallel
[2020-11-20T17:11:21.620+0530][info ][gc     ] Using The Z Garbage Collector

Working fine in Lab with 15.0.1:

$ /opt/3rd_party/jdk_installed/jdk-15.0.1/bin/java -Xms90g -Xmx270g -XX:-UseNUMA -XX:+UseZGC -XX:ParallelGCThreads=8 -Xlog:gc*,gc+heap=debug,gc+age=trace,safepoint:stdout:time,level,tags HelloWorld.java
[2020-11-20T14:16:08.635+0000][debug][gc,heap] Minimum heap 96636764160  Initial heap 96636764160  Maximum heap 289910292480
[2020-11-20T14:16:08.635+0000][info ][gc,init] Initializing The Z Garbage Collector
[2020-11-20T14:16:08.635+0000][info ][gc,init] Version: 15.0.1+9-18 (release)
[2020-11-20T14:16:08.635+0000][info ][gc,init] NUMA Support: Disabled
[2020-11-20T14:16:08.635+0000][info ][gc,init] CPUs: 64 total, 64 available
[2020-11-20T14:16:08.635+0000][info ][gc,init] Memory: 257775M
[2020-11-20T14:16:08.635+0000][info ][gc,init] Large Page Support: Disabled
[2020-11-20T14:16:08.635+0000][info ][gc,init] Workers: 8 parallel, 8 concurrent
[2020-11-20T14:16:08.636+0000][info ][gc,init] Address Space Type: Contiguous/Unrestricted/Complete
[2020-11-20T14:16:08.636+0000][info ][gc,init] Address Space Size: 4423680M x 3 = 13271040M
[2020-11-20T14:16:08.636+0000][info ][gc,init] Heap Backing File: /memfd:java_heap
[2020-11-20T14:16:08.637+0000][info ][gc,init] Heap Backing Filesystem: tmpfs (0x1021994)
[2020-11-20T14:16:08.637+0000][info ][gc,init] Min Capacity: 92160M
[2020-11-20T14:16:08.637+0000][info ][gc,init] Initial Capacity: 92160M
[2020-11-20T14:16:08.637+0000][info ][gc,init] Max Capacity: 276480M
[2020-11-20T14:16:08.637+0000][info ][gc,init] Max Reserve: 48M
[2020-11-20T14:16:08.637+0000][info ][gc,init] Medium Page Size: 32M
[2020-11-20T14:16:08.637+0000][info ][gc,init] Pre-touch: Disabled
[2020-11-20T14:16:08.637+0000][info ][gc,init] Available space on backing filesystem: N/A
[2020-11-20T14:16:08.637+0000][info ][gc,init] Uncommit: Enabled
[2020-11-20T14:16:08.637+0000][info ][gc,init] Uncommit Delay: 300s
[2020-11-20T14:16:24.652+0000][info ][gc,init] Runtime Workers: 8 parallel
[2020-11-20T14:16:24.653+0000][info ][gc     ] Using The Z Garbage Collector

From per.liden at oracle.com Mon Nov 23 14:11:30 2020
From: per.liden at oracle.com (Per Liden)
Date: Mon, 23 Nov 2020 15:11:30 +0100
Subject: ZGC: Failed to truncate backing file (Permission denied) at startup after 14.0.2 to 15.0.1 upgrade
In-Reply-To: 
References: 
Message-ID: 

Hi David,

On 11/20/20 4:21 PM, David Tavoularis wrote:
> Hi,
>
> I found a possible regression in a customer production environment, linked to JDK-8245203 "ZGC: Don't track size in ZPhysicalMemoryBacking" or the JDK 15 change "Fixed support for transparent huge pages".
> After upgrading from jdk-14.0.2 to jdk-15.0.1, the JVM (using ZGC) crashes at startup with the error message "Failed to truncate backing file (Permission denied)".
>
> This error message was introduced in changeset
> http://hg.openjdk.java.net/jdk-updates/jdk15u/rev/556d5070c458
> See the source code:
> http://hg.openjdk.java.net/jdk-updates/jdk15u/file/556d5070c458/src/hotspot/os/linux/gc/z/zPhysicalMemoryBacking_linux.cpp
>
> The issue was seen in Prod, not in Test, nor in labs. I suspect that either the Heap Backing Filesystem is not correctly set to tmpfs in Prod, or there is a permission issue on a directory/file owned by root.
>
> $ /opt/3rd_party/jdk_installed/jdk-15.0.1/bin/java -version
> openjdk version "15.0.1" 2020-10-20
> OpenJDK Runtime Environment (build 15.0.1+9-18)
> OpenJDK 64-Bit Server VM (build 15.0.1+9-18, mixed mode, sharing)
>
> $ /opt/3rd_party/jdk_installed/jdk-15.0.1/bin/java -XX:+UseZGC -version
> [0.009s][error][gc] Failed to truncate backing file (Permission denied)
> Error: Could not create the Java Virtual Machine.
> Error: A fatal exception has occurred. Program will exit.
>
> $ /opt/mycom/3rd_party/jdk_installed/jdk-14.0.2/bin/java -XX:+UnlockExperimentalVMOptions -XX:+UseZGC -version
> openjdk version "14.0.2" 2020-07-14
> OpenJDK Runtime Environment (build 14.0.2+12-46)
> OpenJDK 64-Bit Server VM (build 14.0.2+12-46, mixed mode)
>
> Should I file a bug at https://bugreport.java.com/bugreport/ ?
> What additional information should I provide?
> Any workaround?

It's hard to tell exactly why this fails. One difference between JDK 14 and 15 is that the backing file will be truncated to the max heap size up-front when the JVM starts (but it will not be backed by any memory until we actually start to expand the heap), instead of gradually grown as the heap expands. So, one theory is that you could have had this problem even with 14, but the heap might not have grown large enough for the problem to be exposed. To test this, run with JDK 14 and use "-Xms270g -Xmx270g". This will force the backing file to be truncated up-front, similar to what happens in JDK 15.

Also running with "-Xlog:gc*=debug" will print a few more things that might be interesting.

The fact that ftruncate returns EACCES suggests that there's some kind of environment problem that is blocking the file from growing. In the production system, is this process running in some constrained environment/container/cgroup?
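As a rough stand-alone probe (illustrative only, and not what ZGC actually does: ZGC truncates an anonymous file created with memfd_create(2), and a policy problem could be specific to memfd descriptors, so a pass here would not prove ZGC can start), something like the sketch below checks whether the environment allows truncating a tmpfs-backed file to a large size up-front. The /dev/shm path and the 4 GB size are arbitrary choices for this sketch:

import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical probe, not part of the JDK: attempt the same kind of
// up-front truncation that JDK 15's ZGC performs on its heap backing file.
public class TruncateProbe {
    public static void main(String[] args) throws IOException {
        File f = new File("/dev/shm/truncate-probe"); // assumes /dev/shm is tmpfs
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
            // setLength() maps to ftruncate(2) on Linux. The file stays
            // sparse, so no memory is committed -- the same idea as
            // truncating the heap backing file to the max heap size
            // without backing it with memory yet.
            raf.setLength(4L * 1024 * 1024 * 1024);
            System.out.println("truncate to 4G OK");
        } finally {
            f.delete();
        }
    }
}

If this succeeds inside the pod but the JVM still fails, that would point even more strongly at something specific to the memfd file rather than to tmpfs itself.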
cheers,
Per

From David.Tavoularis at mycom-osi.com Mon Nov 23 14:35:49 2020
From: David.Tavoularis at mycom-osi.com (David Tavoularis)
Date: Mon, 23 Nov 2020 15:35:49 +0100
Subject: ZGC: Failed to truncate backing file (Permission denied) at startup after 14.0.2 to 15.0.1 upgrade
In-Reply-To: 
References: 
Message-ID: 

Hi Per,

Thank you for your quick reply.

> The fact that ftruncate returns EACCES suggests that there's some kind of environment problem that is blocking the file from growing. In the production system, is this process running in some constrained environment/container/cgroup?
Yes, it is running inside a pod in an OpenShift environment.

> To test this, run with JDK 14 and use "-Xms270g -Xmx270g"
I tried with "-Xms100g -Xmx100g"; I hope that is fine for this test.

$ /opt/3rd_party/jdk_installed/jdk-14.0.2/bin/java -Xms100g -Xmx100g -XX:+UnlockExperimentalVMOptions -XX:+UseZGC -Xlog:gc*=debug -version
[0.006s][debug][gc,heap] Minimum heap 107374182400  Initial heap 107374182400  Maximum heap 107374182400
[0.006s][info ][gc,init] Initializing The Z Garbage Collector
[0.006s][info ][gc,init] Version: 14.0.2+12-46 (release)
[0.006s][info ][gc,init] NUMA Support: Disabled
[0.006s][info ][gc,init] CPUs: 70 total, 40 available
[0.006s][info ][gc,init] Memory: 419840M
[0.006s][info ][gc,init] Large Page Support: Disabled
[0.006s][info ][gc,init] Medium Page Size: 32M
[0.006s][info ][gc,init] Workers: 24 parallel, 5 concurrent
[0.009s][debug][gc,task] Executing Task: ZWorkersInitializeTask, Active Workers: 24
[0.009s][info ][gc,init] Address Space Type: Contiguous/Unrestricted/Complete
[0.009s][info ][gc,init] Address Space Size: 1638400M x 3 = 4915200M
[0.010s][info ][gc,init] Heap backed by file: /memfd:java_heap
[0.010s][info ][gc,init] Min Capacity: 102400M
[0.010s][info ][gc,init] Initial Capacity: 102400M
[0.010s][info ][gc,init] Max Capacity: 102400M
[0.010s][info ][gc,init] Max Reserve: 80M
[0.010s][info ][gc,init] Pre-touch: Disabled
[0.010s][info ][gc,init] Available space on backing filesystem: N/A
[25.372s][info ][gc,init] Uncommit: Disabled
[25.372s][debug][gc,marking] Expanding mark stack space: 0M->32M
[25.888s][info ][gc,init   ] Runtime Workers: 24 parallel
[25.960s][info ][gc        ] Using The Z Garbage Collector
[26.179s][debug][gc,nmethod] Rebuilding NMethod Table: 0->1024 entries, 0(0%->0%) registered, 0(0%->0%) unregistered
openjdk version "14.0.2" 2020-07-14
OpenJDK Runtime Environment (build 14.0.2+12-46)
OpenJDK 64-Bit Server VM (build 14.0.2+12-46, mixed mode)

$ /opt/3rd_party/jdk_installed/jdk-15.0.1/bin/java -Xms100g -Xmx100g -XX:+UnlockExperimentalVMOptions -XX:+UseZGC -Xlog:gc*=debug -version
[0.006s][debug][gc,heap] Minimum heap 107374182400  Initial heap 107374182400  Maximum heap 107374182400
[0.006s][info ][gc,init] Initializing The Z Garbage Collector
[0.006s][info ][gc,init] Version: 15.0.1+9-18 (release)
[0.006s][info ][gc,init] NUMA Support: Disabled
[0.006s][info ][gc,init] CPUs: 70 total, 40 available
[0.006s][info ][gc,init] Memory: 419840M
[0.006s][info ][gc,init] Large Page Support: Disabled
[0.007s][info ][gc,init] Workers: 24 parallel, 5 concurrent
[0.010s][debug][gc,task] Executing Task: ZWorkersInitializeTask, Active Workers: 24
[0.011s][info ][gc,init] Address Space Type: Contiguous/Unrestricted/Complete
[0.011s][info ][gc,init] Address Space Size: 1638400M x 3 = 4915200M
[0.012s][info ][gc,init] Heap Backing File: /memfd:java_heap
[0.012s][error][gc     ] Failed to truncate backing file (Permission denied)
[0.012s][debug][gc,marking] Expanding mark stack space: 0M->32M
[0.081s][info ][gc,init   ] Runtime Workers: 24 parallel
[0.084s][info ][gc        ] Using The Z Garbage Collector
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.

> The fact that ftruncate returns EACCES suggests that there's some kind of environment problem
Is there a workaround or a way to fix the environment problem?

Best Regards
David

On Mon, 23 Nov 2020 15:11:30 +0100, Per Liden wrote:

> Hi David,
>
> On 11/20/20 4:21 PM, David Tavoularis wrote:
>> Hi,
>> I found a possible regression in a customer production environment, linked to JDK-8245203 "ZGC: Don't track size in ZPhysicalMemoryBacking" or the JDK 15 change "Fixed support for transparent huge pages".
>> After upgrading from jdk-14.0.2 to jdk-15.0.1, the JVM (using ZGC) crashes at startup with the error message "Failed to truncate backing file (Permission denied)".
>> This error message was introduced in changeset
>> http://hg.openjdk.java.net/jdk-updates/jdk15u/rev/556d5070c458
>> See the source code:
>> http://hg.openjdk.java.net/jdk-updates/jdk15u/file/556d5070c458/src/hotspot/os/linux/gc/z/zPhysicalMemoryBacking_linux.cpp
>> The issue was seen in Prod, not in Test, nor in labs. I suspect that either the Heap Backing Filesystem is not correctly set to tmpfs in Prod, or there is a permission issue on a directory/file owned by root.
>> $ /opt/3rd_party/jdk_installed/jdk-15.0.1/bin/java -version
>> openjdk version "15.0.1" 2020-10-20
>> OpenJDK Runtime Environment (build 15.0.1+9-18)
>> OpenJDK 64-Bit Server VM (build 15.0.1+9-18, mixed mode, sharing)
>> $ /opt/3rd_party/jdk_installed/jdk-15.0.1/bin/java -XX:+UseZGC -version
>> [0.009s][error][gc] Failed to truncate backing file (Permission denied)
>> Error: Could not create the Java Virtual Machine.
>> Error: A fatal exception has occurred. Program will exit.
>> $ /opt/mycom/3rd_party/jdk_installed/jdk-14.0.2/bin/java -XX:+UnlockExperimentalVMOptions -XX:+UseZGC -version
>> openjdk version "14.0.2" 2020-07-14
>> OpenJDK Runtime Environment (build 14.0.2+12-46)
>> OpenJDK 64-Bit Server VM (build 14.0.2+12-46, mixed mode)
>> Should I file a bug at https://bugreport.java.com/bugreport/ ?
>> What additional information should I provide?
>> Any workaround?
>
> It's hard to tell exactly why this fails. One difference between JDK 14 and 15 is that the backing file will be truncated to the max heap size up-front when the JVM starts (but it will not be backed by any memory until we actually start to expand the heap), instead of gradually grown as the heap expands. So, one theory is that you could have had this problem even with 14, but the heap might not have grown large enough for the problem to be exposed. To test this, run with JDK 14 and use "-Xms270g -Xmx270g". This will force the backing file to be truncated up-front, similar to what happens in JDK 15.
>
> Also running with "-Xlog:gc*=debug" will print a few more things that might be interesting.
>
> The fact that ftruncate returns EACCES suggests that there's some kind of environment problem that is blocking the file from growing. In the production system, is this process running in some constrained environment/container/cgroup?
>
> cheers,
> Per

From per.liden at oracle.com Mon Nov 23 17:40:41 2020
From: per.liden at oracle.com (Per Liden)
Date: Mon, 23 Nov 2020 18:40:41 +0100
Subject: ZGC: Failed to truncate backing file (Permission denied) at startup after 14.0.2 to 15.0.1 upgrade
In-Reply-To: 
References: 
Message-ID: 

Hi David,

On 11/23/20 3:35 PM, David Tavoularis wrote:
> Hi Per,
>
> Thank you for your quick reply.
>
>> The fact that ftruncate returns EACCES suggests that there's some kind of environment problem that is blocking the file from growing. In the production system, is this process running in some constrained environment/container/cgroup?
> Yes, it is running inside a pod in an OpenShift environment.
>
>> To test this, run with JDK 14 and use "-Xms270g -Xmx270g"
> I tried with "-Xms100g -Xmx100g"; I hope that is fine for this test.
>
> $ /opt/3rd_party/jdk_installed/jdk-14.0.2/bin/java -Xms100g -Xmx100g -XX:+UnlockExperimentalVMOptions -XX:+UseZGC -Xlog:gc*=debug -version
> [0.006s][debug][gc,heap] Minimum heap 107374182400  Initial heap 107374182400  Maximum heap 107374182400
> [0.006s][info ][gc,init] Initializing The Z Garbage Collector
> [0.006s][info ][gc,init] Version: 14.0.2+12-46 (release)
> [0.006s][info ][gc,init] NUMA Support: Disabled
> [0.006s][info ][gc,init] CPUs: 70 total, 40 available
> [0.006s][info ][gc,init] Memory: 419840M
> [0.006s][info ][gc,init] Large Page Support: Disabled
> [0.006s][info ][gc,init] Medium Page Size: 32M
> [0.006s][info ][gc,init] Workers: 24 parallel, 5 concurrent
> [0.009s][debug][gc,task] Executing Task: ZWorkersInitializeTask, Active Workers: 24
> [0.009s][info ][gc,init] Address Space Type: Contiguous/Unrestricted/Complete
> [0.009s][info ][gc,init] Address Space Size: 1638400M x 3 = 4915200M
> [0.010s][info ][gc,init] Heap backed by file: /memfd:java_heap
> [0.010s][info ][gc,init] Min Capacity: 102400M
> [0.010s][info ][gc,init] Initial Capacity: 102400M
> [0.010s][info ][gc,init] Max Capacity: 102400M
> [0.010s][info ][gc,init] Max Reserve: 80M
> [0.010s][info ][gc,init] Pre-touch: Disabled
> [0.010s][info ][gc,init] Available space on backing filesystem: N/A
> [25.372s][info ][gc,init] Uncommit: Disabled
> [25.372s][debug][gc,marking] Expanding mark stack space: 0M->32M
> [25.888s][info ][gc,init   ] Runtime Workers: 24 parallel
> [25.960s][info ][gc        ] Using The Z Garbage Collector
> [26.179s][debug][gc,nmethod] Rebuilding NMethod Table: 0->1024 entries, 0(0%->0%) registered, 0(0%->0%) unregistered
> openjdk version "14.0.2" 2020-07-14
> OpenJDK Runtime Environment (build 14.0.2+12-46)
> OpenJDK 64-Bit Server VM (build 14.0.2+12-46, mixed mode)
>
> $ /opt/3rd_party/jdk_installed/jdk-15.0.1/bin/java -Xms100g -Xmx100g -XX:+UnlockExperimentalVMOptions -XX:+UseZGC -Xlog:gc*=debug -version
> [0.006s][debug][gc,heap] Minimum heap 107374182400  Initial heap 107374182400  Maximum heap 107374182400
> [0.006s][info ][gc,init] Initializing The Z Garbage Collector
> [0.006s][info ][gc,init] Version: 15.0.1+9-18 (release)
> [0.006s][info ][gc,init] NUMA Support: Disabled
> [0.006s][info ][gc,init] CPUs: 70 total, 40 available
> [0.006s][info ][gc,init] Memory: 419840M
> [0.006s][info ][gc,init] Large Page Support: Disabled
> [0.007s][info ][gc,init] Workers: 24 parallel, 5 concurrent
> [0.010s][debug][gc,task] Executing Task: ZWorkersInitializeTask, Active Workers: 24
> [0.011s][info ][gc,init] Address Space Type: Contiguous/Unrestricted/Complete
> [0.011s][info ][gc,init] Address Space Size: 1638400M x 3 = 4915200M
> [0.012s][info ][gc,init] Heap Backing File: /memfd:java_heap
> [0.012s][error][gc     ] Failed to truncate backing file (Permission denied)
> [0.012s][debug][gc,marking] Expanding mark stack space: 0M->32M
> [0.081s][info ][gc,init   ] Runtime Workers: 24 parallel
> [0.084s][info ][gc        ] Using The Z Garbage Collector
> Error: Could not create the Java Virtual Machine.
> Error: A fatal exception has occurred. Program will exit.
>
>> The fact that ftruncate returns EACCES suggests that there's some kind of environment problem
> Is there a workaround or a way to fix the environment problem?

I looked around for Docker + ftruncate issues, and found a number of bugs related to this, for example:

https://github.com/openshift/origin/issues/15723
https://access.redhat.com/errata/RHBA-2018:0195

These bugs are fairly old, but they seem to describe the exact same problem you're seeing. In short, it's a docker+selinux policy bug affecting memfd_create+ftruncate. Maybe your production system hasn't been patched with this bugfix?

This would also explain why JDK 14 works, since we used fallocate (instead of ftruncate) to grow the file there, and therefore happened to dodge this issue.

Please have a look at those links, and see if you might have run into the same issue.

cheers,
Per

> Best Regards
> David
>
> On Mon, 23 Nov 2020 15:11:30 +0100, Per Liden wrote:
>
>> Hi David,
>>
>> On 11/20/20 4:21 PM, David Tavoularis wrote:
>>> Hi,
>>> I found a possible regression in a customer production environment, linked to JDK-8245203 "ZGC: Don't track size in ZPhysicalMemoryBacking" or the JDK 15 change "Fixed support for transparent huge pages".
>>> After upgrading from jdk-14.0.2 to jdk-15.0.1, the JVM (using ZGC) crashes at startup with the error message "Failed to truncate backing file (Permission denied)".
>>> This error message was introduced in changeset
>>> http://hg.openjdk.java.net/jdk-updates/jdk15u/rev/556d5070c458
>>> See the source code:
>>> http://hg.openjdk.java.net/jdk-updates/jdk15u/file/556d5070c458/src/hotspot/os/linux/gc/z/zPhysicalMemoryBacking_linux.cpp
>>> The issue was seen in Prod, not in Test, nor in labs. I suspect that either the Heap Backing Filesystem is not correctly set to tmpfs in Prod, or there is a permission issue on a directory/file owned by root.
>>> $ /opt/3rd_party/jdk_installed/jdk-15.0.1/bin/java -version
>>> openjdk version "15.0.1" 2020-10-20
>>> OpenJDK Runtime Environment (build 15.0.1+9-18)
>>> OpenJDK 64-Bit Server VM (build 15.0.1+9-18, mixed mode, sharing)
>>> $ /opt/3rd_party/jdk_installed/jdk-15.0.1/bin/java -XX:+UseZGC -version
>>> [0.009s][error][gc] Failed to truncate backing file (Permission denied)
>>> Error: Could not create the Java Virtual Machine.
>>> Error: A fatal exception has occurred. Program will exit.
>>> $ /opt/mycom/3rd_party/jdk_installed/jdk-14.0.2/bin/java -XX:+UnlockExperimentalVMOptions -XX:+UseZGC -version
>>> openjdk version "14.0.2" 2020-07-14
>>> OpenJDK Runtime Environment (build 14.0.2+12-46)
>>> OpenJDK 64-Bit Server VM (build 14.0.2+12-46, mixed mode)
>>> Should I file a bug at https://bugreport.java.com/bugreport/ ?
>>> What additional information should I provide?
>>> Any workaround?
>>
>> It's hard to tell exactly why this fails. One difference between JDK 14 and 15 is that the backing file will be truncated to the max heap size up-front when the JVM starts (but it will not be backed by any memory until we actually start to expand the heap), instead of gradually grown as the heap expands. So, one theory is that you could have had this problem even with 14, but the heap might not have grown large enough for the problem to be exposed. To test this, run with JDK 14 and use "-Xms270g -Xmx270g". This will force the backing file to be truncated up-front, similar to what happens in JDK 15.
>>
>> Also running with "-Xlog:gc*=debug" will print a few more things that might be interesting.
>>
>> The fact that ftruncate returns EACCES suggests that there's some kind of environment problem that is blocking the file from growing. In the production system, is this process running in some constrained environment/container/cgroup?
>>
>> cheers,
>> Per

From zengshaobin2008 at gmail.com Tue Nov 24 01:40:13 2020
From: zengshaobin2008 at gmail.com (shaobin zeng)
Date: Tue, 24 Nov 2020 09:40:13 +0800
Subject: high cpu usage caused by weak reference
Message-ID: 

Hi,

I am trying to use ZGC in a production environment, so I updated the JDK from JDK 8 to OpenJDK 15, Tomcat 8 to Tomcat 8.5, and the GC-related options, but CPU usage goes to 1000+% a few hours after the JVM starts (normal CPU usage should be 100-300%). If I take the node offline for about 30s, the CPU usage goes down, and when I put it back online it works normally for hours until the CPU usage goes high again. Here are the GC options:

> export JAVA_OPTS='-Xms10g -Xmx10g -XX:+UseLargePages -XX:ZAllocationSpikeTolerance=5 -XX:ParallelGCThreads=8 -XX:ConcGCThreads=4 -Xss2m -XX:+UseZGC -Xlog:gc,gc+phases,safepoint:file=/logs/gc.log:t:filecount=10,filesize=10m -XX:+HeapDumpOnOutOfMemoryError'

I profiled it with async-profiler; the hottest method is java/lang/ThreadLocal$ThreadLocalMap.getEntryAfterMiss. Maybe there are too many weakly referenced thread-local map entries that are not reclaimed in time (JDK 8 and CMS work well on this)?

The following GC logs show the discovered weak reference count keeps increasing after start, but the request rate is almost constant from 11:00-17:00. Note that CPU usage dropped from 600% to 400% automatically after GC(9821), where the enqueued count is ~250K. At GC(10265) the node was offline; enqueued was ~770K. I'm confused why the enqueued count stays small for a long time while the discovered count goes up steadily.

Thanks for any suggestions!
> [2020-11-19T11:00:00.245+0800] GC(992) Weak: 155658 encountered, 72334 discovered, 0 enqueued
> [2020-11-19T12:00:00.397+0800] GC(2194) Weak: 220462 encountered, 122216 discovered, 1380 enqueued
> [2020-11-19T12:00:03.411+0800] GC(2195) Weak: 220598 encountered, 107228 discovered, 677 enqueued
> [2020-11-19T13:00:00.497+0800] GC(3395) Weak: 222536 encountered, 82199 discovered, 1713 enqueued
> [2020-11-19T14:00:00.647+0800] GC(4613) Weak: 443946 encountered, 291651 discovered, 292 enqueued
> [2020-11-19T15:00:01.173+0800] GC(5819) Weak: 338065 encountered, 124351 discovered, 815 enqueued
> [2020-11-19T16:00:01.283+0800] GC(7022) Weak: 459070 encountered, 298932 discovered, 353 enqueued
> [2020-11-19T17:00:01.426+0800] GC(8222) Weak: 688162 encountered, 519369 discovered, 4648 enqueued
> [2020-11-19T16:00:01.283+0800] GC(7022) Weak: 459070 encountered, 298932 discovered, 353 enqueued
> [2020-11-19T17:00:01.426+0800] GC(8222) Weak: 688162 encountered, 519369 discovered, 4648 enqueued
> [2020-11-19T18:00:01.556+0800] GC(9430) Weak: 1078757 encountered, 928748 discovered, 1691 enqueued
> [2020-11-19T18:18:43.595+0800] GC(9821) Weak: 1022080 encountered, 841168 discovered, 247352 enqueued
> [2020-11-19T18:18:46.592+0800] GC(9822) Weak: 774253 encountered, 568564 discovered, 3938 enqueued
> [2020-11-19T18:40:49.616+0800] GC(10265) Weak: 842081 encountered, 788825 discovered, 767288 enqueued
> [2020-11-19T18:40:52.593+0800] GC(10266) Weak: 74876 encountered, 18186 discovered, 1 enqueued

-- 
Shaobin Zeng

From David.Tavoularis at mycom-osi.com Tue Nov 24 09:51:33 2020
From: David.Tavoularis at mycom-osi.com (David Tavoularis)
Date: Tue, 24 Nov 2020 10:51:33 +0100
Subject: ZGC: Failed to truncate backing file (Permission denied) at startup after 14.0.2 to 15.0.1 upgrade
In-Reply-To: 
References: 
Message-ID: 

Hi Per,

Thanks to your help, we identified that 15.0.1 (with ZGC) was starting fine with recent container-selinux versions (2.33/2.36/2.42), but was broken with the older one (2.21):
- Not working : container-selinux-2.21-2.gitba103ac.el7.noarch
- Working     : container-selinux-2.33-1.git86f33cd.el7.noarch
- Working     : container-selinux-2.36-1.gitff95335.el7.noarch
- Working     : container-selinux-2.42-1.gitad8f0f7.el7.noarch

We plan to upgrade the nodes running the old version.

Just for my information, is there a plan to support the old buggy container-selinux in openjdk-15.0.2 by implementing a fallback to fallocate when ftruncate returns EACCES?

Best Regards
-- 
David

On Mon, 23 Nov 2020 18:40:41 +0100, Per Liden wrote:

> Hi David,
>
> On 11/23/20 3:35 PM, David Tavoularis wrote:
>> Hi Per,
>> Thank you for your quick reply.
>>
>>> The fact that ftruncate returns EACCES suggests that there's some kind of environment problem that is blocking the file from growing. In the production system, is this process running in some constrained environment/container/cgroup?
>> Yes, it is running inside a pod in an OpenShift environment.
>>
>>> To test this, run with JDK 14 and use "-Xms270g -Xmx270g"
>> I tried with "-Xms100g -Xmx100g"; I hope that is fine for this test.
>> $ /opt/3rd_party/jdk_installed/jdk-14.0.2/bin/java -Xms100g -Xmx100g -XX:+UnlockExperimentalVMOptions -XX:+UseZGC -Xlog:gc*=debug -version
>> [0.006s][debug][gc,heap] Minimum heap 107374182400  Initial heap 107374182400  Maximum heap 107374182400
>> [0.006s][info ][gc,init] Initializing The Z Garbage Collector
>> [0.006s][info ][gc,init] Version: 14.0.2+12-46 (release)
>> [0.006s][info ][gc,init] NUMA Support: Disabled
>> [0.006s][info ][gc,init] CPUs: 70 total, 40 available
>> [0.006s][info ][gc,init] Memory: 419840M
>> [0.006s][info ][gc,init] Large Page Support: Disabled
>> [0.006s][info ][gc,init] Medium Page Size: 32M
>> [0.006s][info ][gc,init] Workers: 24 parallel, 5 concurrent
>> [0.009s][debug][gc,task] Executing Task: ZWorkersInitializeTask, Active Workers: 24
>> [0.009s][info ][gc,init] Address Space Type: Contiguous/Unrestricted/Complete
>> [0.009s][info ][gc,init] Address Space Size: 1638400M x 3 = 4915200M
>> [0.010s][info ][gc,init] Heap backed by file: /memfd:java_heap
>> [0.010s][info ][gc,init] Min Capacity: 102400M
>> [0.010s][info ][gc,init] Initial Capacity: 102400M
>> [0.010s][info ][gc,init] Max Capacity: 102400M
>> [0.010s][info ][gc,init] Max Reserve: 80M
>> [0.010s][info ][gc,init] Pre-touch: Disabled
>> [0.010s][info ][gc,init] Available space on backing filesystem: N/A
>> [25.372s][info ][gc,init] Uncommit: Disabled
>> [25.372s][debug][gc,marking] Expanding mark stack space: 0M->32M
>> [25.888s][info ][gc,init   ] Runtime Workers: 24 parallel
>> [25.960s][info ][gc        ] Using The Z Garbage Collector
>> [26.179s][debug][gc,nmethod] Rebuilding NMethod Table: 0->1024 entries, 0(0%->0%) registered, 0(0%->0%) unregistered
>> openjdk version "14.0.2" 2020-07-14
>> OpenJDK Runtime Environment (build 14.0.2+12-46)
>> OpenJDK 64-Bit Server VM (build 14.0.2+12-46, mixed mode)
>>
>> $ /opt/3rd_party/jdk_installed/jdk-15.0.1/bin/java -Xms100g -Xmx100g -XX:+UnlockExperimentalVMOptions -XX:+UseZGC -Xlog:gc*=debug -version
>> [0.006s][debug][gc,heap] Minimum heap 107374182400  Initial heap 107374182400  Maximum heap 107374182400
>> [0.006s][info ][gc,init] Initializing The Z Garbage Collector
>> [0.006s][info ][gc,init] Version: 15.0.1+9-18 (release)
>> [0.006s][info ][gc,init] NUMA Support: Disabled
>> [0.006s][info ][gc,init] CPUs: 70 total, 40 available
>> [0.006s][info ][gc,init] Memory: 419840M
>> [0.006s][info ][gc,init] Large Page Support: Disabled
>> [0.007s][info ][gc,init] Workers: 24 parallel, 5 concurrent
>> [0.010s][debug][gc,task] Executing Task: ZWorkersInitializeTask, Active Workers: 24
>> [0.011s][info ][gc,init] Address Space Type: Contiguous/Unrestricted/Complete
>> [0.011s][info ][gc,init] Address Space Size: 1638400M x 3 = 4915200M
>> [0.012s][info ][gc,init] Heap Backing File: /memfd:java_heap
>> [0.012s][error][gc     ] Failed to truncate backing file (Permission denied)
>> [0.012s][debug][gc,marking] Expanding mark stack space: 0M->32M
>> [0.081s][info ][gc,init   ] Runtime Workers: 24 parallel
>> [0.084s][info ][gc        ] Using The Z Garbage Collector
>> Error: Could not create the Java Virtual Machine.
>> Error: A fatal exception has occurred. Program will exit.
>>
>>> The fact that ftruncate returns EACCES suggests that there's some kind of environment problem
>> Is there a workaround or a way to fix the environment problem?
>
> I looked around for Docker + ftruncate issues, and found a number of bugs related to this, for example:
>
> https://github.com/openshift/origin/issues/15723
> https://access.redhat.com/errata/RHBA-2018:0195
>
> These bugs are fairly old, but they seem to describe the exact same problem you're seeing. In short, it's a docker+selinux policy bug affecting memfd_create+ftruncate. Maybe your production system hasn't been patched with this bugfix?
>
> This would also explain why JDK 14 works, since we used fallocate (instead of ftruncate) to grow the file there, and therefore happened to dodge this issue.
>
> Please have a look at those links, and see if you might have run into the same issue.
>
> cheers,
> Per
>
>> Best Regards
>> David
>> On Mon, 23 Nov 2020 15:11:30 +0100, Per Liden wrote:
>>
>>> Hi David,
>>>
>>> On 11/20/20 4:21 PM, David Tavoularis wrote:
>>>> Hi,
>>>> I found a possible regression in a customer production environment, linked to JDK-8245203 "ZGC: Don't track size in ZPhysicalMemoryBacking" or the JDK 15 change "Fixed support for transparent huge pages".
>>>> After upgrading from jdk-14.0.2 to jdk-15.0.1, the JVM (using ZGC) crashes at startup with the error message "Failed to truncate backing file (Permission denied)".
>>>> This error message was introduced in changeset
>>>> http://hg.openjdk.java.net/jdk-updates/jdk15u/rev/556d5070c458
>>>> See the source code:
>>>> http://hg.openjdk.java.net/jdk-updates/jdk15u/file/556d5070c458/src/hotspot/os/linux/gc/z/zPhysicalMemoryBacking_linux.cpp
>>>> The issue was seen in Prod, not in Test, nor in labs. I suspect that either the Heap Backing Filesystem is not correctly set to tmpfs in Prod, or there is a permission issue on a directory/file owned by root.
>>>> $ /opt/3rd_party/jdk_installed/jdk-15.0.1/bin/java -version
>>>> openjdk version "15.0.1" 2020-10-20
>>>> OpenJDK Runtime Environment (build 15.0.1+9-18)
>>>> OpenJDK 64-Bit Server VM (build 15.0.1+9-18, mixed mode, sharing)
>>>> $ /opt/3rd_party/jdk_installed/jdk-15.0.1/bin/java -XX:+UseZGC -version
>>>> [0.009s][error][gc] Failed to truncate backing file (Permission denied)
>>>> Error: Could not create the Java Virtual Machine.
>>>> Error: A fatal exception has occurred. Program will exit.
>>>> $ /opt/mycom/3rd_party/jdk_installed/jdk-14.0.2/bin/java -XX:+UnlockExperimentalVMOptions -XX:+UseZGC -version
>>>> openjdk version "14.0.2" 2020-07-14
>>>> OpenJDK Runtime Environment (build 14.0.2+12-46)
>>>> OpenJDK 64-Bit Server VM (build 14.0.2+12-46, mixed mode)
>>>> Should I file a bug at https://bugreport.java.com/bugreport/ ?
>>>> What additional information should I provide?
>>>> Any workaround?
>>>
>>> It's hard to tell exactly why this fails. One difference between JDK 14 and 15 is that the backing file will be truncated to the max heap size up-front when the JVM starts (but it will not be backed by any memory until we actually start to expand the heap), instead of gradually grown as the heap expands. So, one theory is that you could have had this problem even with 14, but the heap might not have grown large enough for the problem to be exposed. To test this, run with JDK 14 and use "-Xms270g -Xmx270g". This will force the backing file to be truncated up-front, similar to what happens in JDK 15.
>>>
>>> Also running with "-Xlog:gc*=debug" will print a few more things that might be interesting.
>>>
>>> The fact that ftruncate returns EACCES suggests that there's some kind of environment problem that is blocking the file from growing. In the production system, is this process running in some constrained environment/container/cgroup?
>>>
>>> cheers,
>>> Per

From per.liden at oracle.com Tue Nov 24 10:32:00 2020
From: per.liden at oracle.com (Per Liden)
Date: Tue, 24 Nov 2020 11:32:00 +0100
Subject: ZGC: Failed to truncate backing file (Permission denied) at startup after 14.0.2 to 15.0.1 upgrade
In-Reply-To: 
References: 
Message-ID: 

Hi David,

On 11/24/20 10:51 AM, David Tavoularis wrote:
> Hi Per,
>
> Thanks to your help, we identified that 15.0.1 (with ZGC) was starting fine with recent container-selinux versions (2.33/2.36/2.42), but was broken with the older one (2.21):
> - Not working : container-selinux-2.21-2.gitba103ac.el7.noarch
> - Working     : container-selinux-2.33-1.git86f33cd.el7.noarch
> - Working     : container-selinux-2.36-1.gitff95335.el7.noarch
> - Working     : container-selinux-2.42-1.gitad8f0f7.el7.noarch

Good to hear!

> We plan to upgrade the nodes running the old version.
>
> Just for my information, is there a plan to support the old buggy container-selinux in openjdk-15.0.2 by implementing a fallback to fallocate when ftruncate returns EACCES?

A fallback using lseek+write would probably work. Whether it's worth implementing and maintaining such a thing depends on how common this problem is in the real world. You are the first one to report this issue, so I sort of assume it's not that common. Recommending people to use a more up-to-date version of the container-selinux package seems like a reasonable workaround/fix at this time. Of course, if this turns out to be a common issue we can reconsider.

cheers,
Per

From David.Tavoularis at mycom-osi.com Tue Nov 24 10:34:40 2020
From: David.Tavoularis at mycom-osi.com (David Tavoularis)
Date: Tue, 24 Nov 2020 11:34:40 +0100
Subject: ZGC: Failed to truncate backing file (Permission denied) at startup after 14.0.2 to 15.0.1 upgrade
In-Reply-To: 
References: 
Message-ID: 

Thank you again Per for identifying and resolving the issue.
David

On Tue, 24 Nov 2020 11:32:00 +0100, Per Liden wrote:

> Hi David,
>
> On 11/24/20 10:51 AM, David Tavoularis wrote:
>> Hi Per,
>> Thanks to your help, we identified that 15.0.1 (with ZGC) was starting fine with recent container-selinux versions (2.33/2.36/2.42), but was broken with the older one (2.21):
>> - Not working : container-selinux-2.21-2.gitba103ac.el7.noarch
>> - Working     : container-selinux-2.33-1.git86f33cd.el7.noarch
>> - Working     : container-selinux-2.36-1.gitff95335.el7.noarch
>> - Working     : container-selinux-2.42-1.gitad8f0f7.el7.noarch
>
> Good to hear!
>
>> We plan to upgrade the nodes running the old version.
>> Just for my information, is there a plan to support the old buggy container-selinux in openjdk-15.0.2 by implementing a fallback to fallocate when ftruncate returns EACCES?
>
> A fallback using lseek+write would probably work. Whether it's worth implementing and maintaining such a thing depends on how common this problem is in the real world. You are the first one to report this issue, so I sort of assume it's not that common. Recommending people to use a more up-to-date version of the container-selinux package seems like a reasonable workaround/fix at this time. Of course, if this turns out to be a common issue we can reconsider.
>
> cheers,
> Per

From per.liden at oracle.com Tue Nov 24 11:35:17 2020
From: per.liden at oracle.com (Per Liden)
Date: Tue, 24 Nov 2020 12:35:17 +0100
Subject: high cpu usage caused by weak reference
In-Reply-To: 
References: 
Message-ID: <42862ebd-3e0a-a06a-7e1a-172c248763b0@oracle.com>

Hi,

On 11/24/20 2:40 AM, shaobin zeng wrote:
> Hi,
> I am trying to use ZGC in a production environment, so I updated the JDK from JDK 8 to OpenJDK 15, Tomcat 8 to Tomcat 8.5, and the GC-related options, but CPU usage goes to 1000+% a few hours after the JVM starts (normal CPU usage should be 100-300%). If I take the node offline for about 30s, the CPU usage goes down, and when I put it back online it works normally for hours until the CPU usage goes high again. Here are the GC options:
>
>> export JAVA_OPTS='-Xms10g -Xmx10g -XX:+UseLargePages -XX:ZAllocationSpikeTolerance=5 -XX:ParallelGCThreads=8 -XX:ConcGCThreads=4 -Xss2m -XX:+UseZGC -Xlog:gc,gc+phases,safepoint:file=/logs/gc.log:t:filecount=10,filesize=10m -XX:+HeapDumpOnOutOfMemoryError'
>
> I profiled it with async-profiler; the hottest method is java/lang/ThreadLocal$ThreadLocalMap.getEntryAfterMiss. Maybe there are too many weakly referenced thread-local map entries that are not reclaimed in time (JDK 8 and CMS work well on this)?
> The following GC logs show the discovered weak reference count keeps increasing after start, but the request rate is almost constant from 11:00-17:00. Note that CPU usage dropped from 600% to 400% automatically after GC(9821), where the enqueued count is ~250K. At GC(10265) the node was offline; enqueued was ~770K. I'm confused why the enqueued count stays small for a long time while the discovered count goes up steadily.
> Thanks for any suggestions!

I think you've identified the problem. WeakReferences are constantly being resurrected and kept alive by calls to ThreadLocal.get(), which in turn calls getEntryAfterMiss(). Over time the table of WeakReferences grows (because they are all alive) and becomes more and more expensive to process. When you take the node off-line, and the calls to ThreadLocal.get() stop (i.e. resurrection stops), then the GC gets a chance to clean out the stale WeakReferences.

This is a known problem with WeakReferences, which ThreadLocal makes heavy use of. The good news is that work is in progress to fix this. The first part (https://bugs.openjdk.java.net/browse/JDK-8188055) recently went into mainline, and the two follow-up patches (https://bugs.openjdk.java.net/browse/JDK-8256377 and https://bugs.openjdk.java.net/browse/JDK-8256167) will hopefully go in soon, and will resolve this issue for good.

In the meantime, you might want to look into calling ThreadLocal.remove() once a thread local is no longer needed. This is not always feasible, since those thread locals might have been created by a library you don't control, or it might be difficult to tell when a thread local is no longer needed, etc. However, if it is feasible in your case, then it might be a way for you to lower the cost of processing WeakReferences.

(CMS in JDK 8 was less sensitive to this issue, because it had a different marking strategy.)
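To make the remove() pattern concrete, here is a minimal sketch for a pooled-thread setting. The pool size, buffer type, and class names are made up for illustration; the point is only that remove() runs on the worker thread itself once the value is no longer needed, so a long-lived pooled thread does not keep pinning its stale entries alive:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical example of the ThreadLocal.remove() workaround.
public class ThreadLocalRemoveExample {
    // One buffer per thread, lazily created on first get().
    private static final ThreadLocal<byte[]> BUFFER =
            ThreadLocal.withInitial(() -> new byte[8192]);

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 100; i++) {
            pool.submit(() -> {
                try {
                    byte[] buf = BUFFER.get(); // creates this thread's entry
                    // ... use buf for the duration of this task ...
                } finally {
                    // Drop the entry before the pooled thread is reused, so
                    // the underlying WeakReference does not linger and get
                    // rescanned by getEntryAfterMiss() on later lookups.
                    BUFFER.remove();
                }
            });
        }
        pool.shutdown();
    }
}

Reference.refersTo() (JDK-8188055, mentioned above) attacks the same problem at the JDK level, so once the follow-up patches land, no application-side change should be needed.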
cheers,
Per

>
>> [2020-11-19T11:00:00.245+0800] GC(992) Weak: 155658 encountered, 72334 discovered, 0 enqueued
>> [2020-11-19T12:00:00.397+0800] GC(2194) Weak: 220462 encountered, 122216 discovered, 1380 enqueued
>> [2020-11-19T12:00:03.411+0800] GC(2195) Weak: 220598 encountered, 107228 discovered, 677 enqueued
>> [2020-11-19T13:00:00.497+0800] GC(3395) Weak: 222536 encountered, 82199 discovered, 1713 enqueued
>> [2020-11-19T14:00:00.647+0800] GC(4613) Weak: 443946 encountered, 291651 discovered, 292 enqueued
>> [2020-11-19T15:00:01.173+0800] GC(5819) Weak: 338065 encountered, 124351 discovered, 815 enqueued
>> [2020-11-19T16:00:01.283+0800] GC(7022) Weak: 459070 encountered, 298932 discovered, 353 enqueued
>> [2020-11-19T17:00:01.426+0800] GC(8222) Weak: 688162 encountered, 519369 discovered, 4648 enqueued
>> [2020-11-19T16:00:01.283+0800] GC(7022) Weak: 459070 encountered, 298932 discovered, 353 enqueued
>> [2020-11-19T17:00:01.426+0800] GC(8222) Weak: 688162 encountered, 519369 discovered, 4648 enqueued
>> [2020-11-19T18:00:01.556+0800] GC(9430) Weak: 1078757 encountered, 928748 discovered, 1691 enqueued
>> [2020-11-19T18:18:43.595+0800] GC(9821) Weak: 1022080 encountered, 841168 discovered, 247352 enqueued
>> [2020-11-19T18:18:46.592+0800] GC(9822) Weak: 774253 encountered, 568564 discovered, 3938 enqueued
>> [2020-11-19T18:40:49.616+0800] GC(10265) Weak: 842081 encountered, 788825 discovered, 767288 enqueued
>> [2020-11-19T18:40:52.593+0800] GC(10266) Weak: 74876 encountered, 18186 discovered, 1 enqueued

From zengshaobin2008 at gmail.com Tue Nov 24 16:41:37 2020
From: zengshaobin2008 at gmail.com (shaobin zeng)
Date: Wed, 25 Nov 2020 00:41:37 +0800
Subject: high cpu usage caused by weak reference
In-Reply-To: <42862ebd-3e0a-a06a-7e1a-172c248763b0@oracle.com>
References: <42862ebd-3e0a-a06a-7e1a-172c248763b0@oracle.com>
Message-ID: 

Hi Per,

Thanks for your reply. The thread locals are used in a lib, so it is hard to change. And did you mean that ZGC is one of the SATB collectors, and that something goes wrong when the collector tries to collect these weak references because Reference::get is called?

Per Liden wrote on Tue, 24 Nov 2020 at 7:35 PM:

> Hi,
>
> On 11/24/20 2:40 AM, shaobin zeng wrote:
>> Hi,
>> I am trying to use ZGC in a production environment, so I updated the JDK from JDK 8 to OpenJDK 15, Tomcat 8 to Tomcat 8.5, and the GC-related options, but CPU usage goes to 1000+% a few hours after the JVM starts (normal CPU usage should be 100-300%). If I take the node offline for about 30s, the CPU usage goes down, and when I put it back online it works normally for hours until the CPU usage goes high again. Here are the GC options:
>>
>>> export JAVA_OPTS='-Xms10g -Xmx10g -XX:+UseLargePages -XX:ZAllocationSpikeTolerance=5 -XX:ParallelGCThreads=8 -XX:ConcGCThreads=4 -Xss2m -XX:+UseZGC -Xlog:gc,gc+phases,safepoint:file=/logs/gc.log:t:filecount=10,filesize=10m -XX:+HeapDumpOnOutOfMemoryError'
>>
>> I profiled it with async-profiler; the hottest method is java/lang/ThreadLocal$ThreadLocalMap.getEntryAfterMiss. Maybe there are too many weakly referenced thread-local map entries that are not reclaimed in time (JDK 8 and CMS work well on this)?
>> The following GC logs show the discovered weak reference count keeps increasing after start, but the request rate is almost constant from 11:00-17:00. Note that CPU usage dropped from 600% to 400% automatically after GC(9821), where the enqueued count is ~250K. At GC(10265) the node was offline; enqueued was ~770K. I'm confused why the enqueued count stays small for a long time while the discovered count goes up steadily.
>> Thanks for any suggestions!
>
> I think you've identified the problem. WeakReferences are constantly being resurrected and kept alive by calls to ThreadLocal.get(), which in turn calls getEntryAfterMiss(). Over time the table of WeakReferences grows (because they are all alive) and becomes more and more expensive to process. When you take the node off-line, and the calls to ThreadLocal.get() stop (i.e. resurrection stops), then the GC gets a chance to clean out the stale WeakReferences.
>
> This is a known problem with WeakReferences, which ThreadLocal makes heavy use of. The good news is that work is in progress to fix this. The first part (https://bugs.openjdk.java.net/browse/JDK-8188055) recently went into mainline, and the two follow-up patches (https://bugs.openjdk.java.net/browse/JDK-8256377 and https://bugs.openjdk.java.net/browse/JDK-8256167) will hopefully go in soon, and will resolve this issue for good.
>
> In the meantime, you might want to look into calling ThreadLocal.remove() once a thread local is no longer needed. This is not always feasible, since those thread locals might have been created by a library you don't control, or it might be difficult to tell when a thread local is no longer needed, etc. However, if it is feasible in your case, then it might be a way for you to lower the cost of processing WeakReferences.
>
> (CMS in JDK 8 was less sensitive to this issue, because it had a different marking strategy.)
>
> cheers,
> Per
>
>>> [2020-11-19T11:00:00.245+0800] GC(992) Weak: 155658 encountered, 72334 discovered, 0 enqueued
>>> [2020-11-19T12:00:00.397+0800] GC(2194) Weak: 220462 encountered, 122216 discovered, 1380 enqueued
>>> [2020-11-19T12:00:03.411+0800] GC(2195) Weak: 220598 encountered, 107228 discovered, 677 enqueued
>>> [2020-11-19T13:00:00.497+0800] GC(3395) Weak: 222536 encountered, 82199 discovered, 1713 enqueued
>>> [2020-11-19T14:00:00.647+0800] GC(4613) Weak: 443946 encountered, 291651 discovered, 292 enqueued
>>> [2020-11-19T15:00:01.173+0800] GC(5819) Weak: 338065 encountered, 124351 discovered, 815 enqueued
>>> [2020-11-19T16:00:01.283+0800] GC(7022) Weak: 459070 encountered, 298932 discovered, 353 enqueued
>>> [2020-11-19T17:00:01.426+0800] GC(8222) Weak: 688162 encountered, 519369 discovered, 4648 enqueued
>>> [2020-11-19T16:00:01.283+0800] GC(7022) Weak: 459070 encountered, 298932 discovered, 353 enqueued
>>> [2020-11-19T17:00:01.426+0800] GC(8222) Weak: 688162 encountered, 519369 discovered, 4648 enqueued
>>> [2020-11-19T18:00:01.556+0800] GC(9430) Weak: 1078757 encountered, 928748 discovered, 1691 enqueued
>>> [2020-11-19T18:18:43.595+0800] GC(9821) Weak: 1022080 encountered, 841168 discovered, 247352 enqueued
>>> [2020-11-19T18:18:46.592+0800] GC(9822) Weak: 774253 encountered, 568564 discovered, 3938 enqueued
>>> [2020-11-19T18:40:49.616+0800] GC(10265) Weak: 842081 encountered, 788825 discovered, 767288 enqueued
>>> [2020-11-19T18:40:52.593+0800] GC(10266) Weak: 74876 encountered, 18186 discovered, 1 enqueued

-- 
Shaobin Zeng
From per.liden at oracle.com Wed Nov 25 14:47:35 2020
From: per.liden at oracle.com (Per Liden)
Date: Wed, 25 Nov 2020 15:47:35 +0100
Subject: high cpu usage caused by weak reference
In-Reply-To: 
References: <42862ebd-3e0a-a06a-7e1a-172c248763b0@oracle.com>
Message-ID: <329f593f-b227-fbd8-2cb5-6086bec73200@oracle.com>

On 11/24/20 5:41 PM, shaobin zeng wrote:
> Hi Per,
> Thanks for your reply. The thread locals are used in a lib, so it is hard to change. And did you mean that ZGC is one of the SATB collectors, and that something goes wrong when the collector tries to collect these weak references because Reference::get is called?

By calling Reference.get() you create a strong reference to the referent, which means it will not be collected until a future GC cycle has concluded that it's no longer strongly reachable. ThreadLocal is backed by a hash table. ThreadLocal.get() will do a lookup in that table, which in turn can call Reference.get() on a number of "innocent" instances to figure out if it's the instance it is looking for. This causes those "innocent" instances to be kept alive. This is where Reference.refersTo() comes in (which I linked to). refersTo() provides a way to ask what the referent is pointing to without creating a strong reference to it. This allows ThreadLocal.get() to avoid keeping "innocent" instances alive.

cheers,
Per

(Btw, ZGC does precise wavefront marking, not SATB.)

> Per Liden wrote on Tue, 24 Nov 2020 at 7:35 PM:
>
>> Hi,
>>
>> On 11/24/20 2:40 AM, shaobin zeng wrote:
>>> Hi,
>>> I am trying to use ZGC in a production environment, so I updated the JDK from JDK 8 to OpenJDK 15, Tomcat 8 to Tomcat 8.5, and the GC-related options, but CPU usage goes to 1000+% a few hours after the JVM starts (normal CPU usage should be 100-300%). If I take the node offline for about 30s, the CPU usage goes down, and when I put it back online it works normally for hours until the CPU usage goes high again. Here are the GC options:
>>>
>>>> export JAVA_OPTS='-Xms10g -Xmx10g -XX:+UseLargePages -XX:ZAllocationSpikeTolerance=5 -XX:ParallelGCThreads=8 -XX:ConcGCThreads=4 -Xss2m -XX:+UseZGC -Xlog:gc,gc+phases,safepoint:file=/logs/gc.log:t:filecount=10,filesize=10m -XX:+HeapDumpOnOutOfMemoryError'
>>>
>>> I profiled it with async-profiler; the hottest method is java/lang/ThreadLocal$ThreadLocalMap.getEntryAfterMiss. Maybe there are too many weakly referenced thread-local map entries that are not reclaimed in time (JDK 8 and CMS work well on this)?
>>> The following GC logs show the discovered weak reference count keeps increasing after start, but the request rate is almost constant from 11:00-17:00. Note that CPU usage dropped from 600% to 400% automatically after GC(9821), where the enqueued count is ~250K. At GC(10265) the node was offline; enqueued was ~770K. I'm confused why the enqueued count stays small for a long time while the discovered count goes up steadily.
>>> Thanks for any suggestions!
>>
>> I think you've identified the problem. WeakReferences are constantly being resurrected and kept alive by calls to ThreadLocal.get(), which in turn calls getEntryAfterMiss(). Over time the table of WeakReferences grows (because they are all alive) and becomes more and more expensive to process. When you take the node off-line, and the calls to ThreadLocal.get() stop (i.e. resurrection stops), then the GC gets a chance to clean out the stale WeakReferences.
> > This is a known problem with WeakReferences, which ThreadLocal makes > heavy use of. The good news is that work is in progress to fix this. > The > first part (https://bugs.openjdk.java.net/browse/JDK-8188055) recently > went into mainline, and the two follow up patches > (https://bugs.openjdk.java.net/browse/JDK-8256377 and > https://bugs.openjdk.java.net/browse/JDK-8256167) will hopefully go in > soon, and will resolve this issue for good. > > In the mean time, you might want to look into calling > ThreadLocal.remove() once a thread local is no longer needed. This is > not always feasible, since those thread locals might have be created by > a library you don't control, or it might be difficult to tell when a > thread local is no longer needed, etc. However, if it is feasible in > your case, then it might be a way for you to lower to cost of > processing > WeakReferences. > > (CMS in JDK 8 was less sensitive to this issue, because it had a > different marking strategy). > > cheers, > Per > > > > > [2020-11-19T11:00:00.245+0800] GC(992) Weak: 155658 encountered, > 72334 > >> discovered, 0 enqueued > >> [2020-11-19T12:00:00.397+0800] GC(2194) Weak: 220462 > encountered, 122216 > >> discovered, 1380 enqueued > >> [2020-11-19T12:00:03.411+0800] GC(2195) Weak: 220598 > encountered, 107228 > >> discovered, 677 enqueued > >> [2020-11-19T13:00:00.497+0800] GC(3395) Weak: 222536 > encountered, 82199 > >> discovered, 1713 enqueued > >> [2020-11-19T14:00:00.647+0800] GC(4613) Weak: 443946 > encountered, 291651 > >> discovered, 292 enqueued > >> [2020-11-19T15:00:01.173+0800] GC(5819) Weak: 338065 > encountered, 124351 > >> discovered, 815 enqueued > >> [2020-11-19T16:00:01.283+0800] GC(7022) Weak: 459070 > encountered, 298932 > >> discovered, 353 enqueued > >> [2020-11-19T17:00:01.426+0800] GC(8222) Weak: 688162 > encountered, 519369 > >> discovered, 4648 enqueued > >> [2020-11-19T16:00:01.283+0800] GC(7022) Weak: 459070 > encountered, 298932 > >> discovered, 353 enqueued > >> [2020-11-19T17:00:01.426+0800] GC(8222) Weak: 688162 > encountered, 519369 > >> discovered, 4648 enqueued > >> [2020-11-19T18:00:01.556+0800] GC(9430) Weak: 1078757 > encountered, 928748 > >> discovered, 1691 enqueued > >> [2020-11-19T18:18:43.595+0800] GC(9821) Weak: 1022080 > encountered, 841168 > >> discovered, 247352 enqueued > >> [2020-11-19T18:18:46.592+0800] GC(9822) Weak: 774253 > encountered, 568564 > >> discovered, 3938 enqueued > >> [2020-11-19T18:40:49.616+0800] GC(10265) Weak: 842081 > encountered, 788825 > >> discovered, 767288 enqueued > >> [2020-11-19T18:40:52.593+0800] GC(10266) Weak: 74876 > encountered, 18186 > >> discovered, 1 enqueued > > > > > > > > -- > ??? From zengshaobin2008 at gmail.com Wed Nov 25 15:30:54 2020 From: zengshaobin2008 at gmail.com (shaobin zeng) Date: Wed, 25 Nov 2020 23:30:54 +0800 Subject: high cpu usage caused by weak reference In-Reply-To: <329f593f-b227-fbd8-2cb5-6086bec73200@oracle.com> References: <42862ebd-3e0a-a06a-7e1a-172c248763b0@oracle.com> <329f593f-b227-fbd8-2cb5-6086bec73200@oracle.com> Message-ID: Hi Per, It is so kind of you to explain the details to me, I got the point now. Then this problem may be resolved if the threads are not pooled or pooled for a short time only, right? Since the innocent instances will not be asked again if the thread is not active any more. Per Liden ?2020?11?25??? ??10:49??? > On 11/24/20 5:41 PM, shaobin zeng wrote: > > Hi Per, > > Thanks for your reply, the thread locals are used in a lib so it is > > hard to change. 
From per.liden at oracle.com  Wed Nov 25 16:51:05 2020
From: per.liden at oracle.com (Per Liden)
Date: Wed, 25 Nov 2020 17:51:05 +0100
Subject: high cpu usage caused by weak reference
In-Reply-To: 
References: <42862ebd-3e0a-a06a-7e1a-172c248763b0@oracle.com>
 <329f593f-b227-fbd8-2cb5-6086bec73200@oracle.com>
Message-ID: <7db84374-916a-a749-a561-0831de6a2d2e@oracle.com>

On 11/25/20 4:30 PM, shaobin zeng wrote:
> Hi Per,
>     It is so kind of you to explain the details to me; I get the point
> now. Then this problem may be resolved if the threads are not pooled, or
> are pooled only for a short time, right? Since the "innocent" instances
> will not be probed again once the thread is no longer active.
Yes, in the sense that the backing hash table is owned by the thread, so
when it terminates, the table (and all its WeakReferences) will become
garbage and eventually be collected.

cheers,
Per
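A minimal sketch of the lifetime difference discussed above; the names
are illustrative and the sketch is not from the thread, it just contrasts
the two cases:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class ThreadLifetimeDemo {
        // A per-thread value, standing in for whatever the library stores.
        private static final ThreadLocal<byte[]> CACHE =
                ThreadLocal.withInitial(() -> new byte[1024]);

        public static void main(String[] args) throws InterruptedException {
            Runnable task = () -> CACHE.get(); // populates this thread's map

            // Pooled thread: its ThreadLocalMap, and every WeakReference
            // entry in it, stays reachable for as long as the pool thread
            // lives, so stale entries pile up between GC cycles.
            ExecutorService pool = Executors.newFixedThreadPool(1);
            pool.submit(task);
            pool.shutdown();

            // Short-lived thread: when it terminates, the whole map
            // becomes unreachable and is collected along with it, as Per
            // describes.
            Thread t = new Thread(task);
            t.start();
            t.join();
        }
    }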
From zengshaobin2008 at gmail.com  Thu Nov 26 02:31:22 2020
From: zengshaobin2008 at gmail.com (shaobin zeng)
Date: Thu, 26 Nov 2020 10:31:22 +0800
Subject: high cpu usage caused by weak reference
In-Reply-To: <7db84374-916a-a749-a561-0831de6a2d2e@oracle.com>
References: <42862ebd-3e0a-a06a-7e1a-172c248763b0@oracle.com>
 <329f593f-b227-fbd8-2cb5-6086bec73200@oracle.com>
 <7db84374-916a-a749-a561-0831de6a2d2e@oracle.com>
Message-ID: 

OK, I will do it right away. Thank you very much!

Per Liden <per.liden at oracle.com> wrote on Thu, Nov 26, 2020:
> On 11/25/20 4:30 PM, shaobin zeng wrote:
> > [...]
>
> Yes, in the sense that the backing hash table is owned by the thread, so
> when it terminates, the table (and all its WeakReferences) will become
> garbage and eventually be collected.
>
> [...]

--
shaobin zeng