ZGC: Failed to truncate backing file (Permission denied) at startup after 14.0.2 to 15.0.1 upgrade

Per Liden per.liden at oracle.com
Mon Nov 23 17:40:41 UTC 2020


Hi David,

On 11/23/20 3:35 PM, David Tavoularis wrote:
> Hi Per,
> 
> Thank you for your quick reply.
> 
>> The fact that ftruncate returns EACCESS suggests that there's some 
>> kind of environment problem that is blocking the file from growing.
>> In the production system, is this process running in some constrained 
>> environment/container/cgroup?
> Yes, it is running inside a pod in an OpenShift environment.
> 
>> To test this, run with JDK 14 and use "-Xms270g -Xmx270g"
> I tried with "-Xms100g -Xmx100g", I hope that it is fine for this test.
> 
> $ /opt/3rd_party/jdk_installed/jdk-14.0.2/bin/java -Xms100g -Xmx100g 
> -XX:+UnlockExperimentalVMOptions -XX:+UseZGC -Xlog:gc*=debug -version
> [0.006s][debug][gc,heap] Minimum heap 107374182400  Initial heap 
> 107374182400  Maximum heap 107374182400
> [0.006s][info ][gc,init] Initializing The Z Garbage Collector
> [0.006s][info ][gc,init] Version: 14.0.2+12-46 (release)
> [0.006s][info ][gc,init] NUMA Support: Disabled
> [0.006s][info ][gc,init] CPUs: 70 total, 40 available
> [0.006s][info ][gc,init] Memory: 419840M
> [0.006s][info ][gc,init] Large Page Support: Disabled
> [0.006s][info ][gc,init] Medium Page Size: 32M
> [0.006s][info ][gc,init] Workers: 24 parallel, 5 concurrent
> [0.009s][debug][gc,task] Executing Task: ZWorkersInitializeTask, Active 
> Workers: 24
> [0.009s][info ][gc,init] Address Space Type: 
> Contiguous/Unrestricted/Complete
> [0.009s][info ][gc,init] Address Space Size: 1638400M x 3 = 4915200M
> [0.010s][info ][gc,init] Heap backed by file: /memfd:java_heap
> [0.010s][info ][gc,init] Min Capacity: 102400M
> [0.010s][info ][gc,init] Initial Capacity: 102400M
> [0.010s][info ][gc,init] Max Capacity: 102400M
> [0.010s][info ][gc,init] Max Reserve: 80M
> [0.010s][info ][gc,init] Pre-touch: Disabled
> [0.010s][info ][gc,init] Available space on backing filesystem: N/A
> [25.372s][info ][gc,init] Uncommit: Disabled
> [25.372s][debug][gc,marking] Expanding mark stack space: 0M->32M
> [25.888s][info ][gc,init   ] Runtime Workers: 24 parallel
> [25.960s][info ][gc        ] Using The Z Garbage Collector
> [26.179s][debug][gc,nmethod] Rebuilding NMethod Table: 0->1024 entries, 
> 0(0%->0%) registered, 0(0%->0%) unregistered
> openjdk version "14.0.2" 2020-07-14
> OpenJDK Runtime Environment (build 14.0.2+12-46)
> OpenJDK 64-Bit Server VM (build 14.0.2+12-46, mixed mode)
> 
> $ /opt/3rd_party/jdk_installed/jdk-15.0.1/bin/java -Xms100g -Xmx100g 
> -XX:+UnlockExperimentalVMOptions -XX:+UseZGC -Xlog:gc*=debug -version
> [0.006s][debug][gc,heap] Minimum heap 107374182400  Initial heap 
> 107374182400  Maximum heap 107374182400
> [0.006s][info ][gc,init] Initializing The Z Garbage Collector
> [0.006s][info ][gc,init] Version: 15.0.1+9-18 (release)
> [0.006s][info ][gc,init] NUMA Support: Disabled
> [0.006s][info ][gc,init] CPUs: 70 total, 40 available
> [0.006s][info ][gc,init] Memory: 419840M
> [0.006s][info ][gc,init] Large Page Support: Disabled
> [0.007s][info ][gc,init] Workers: 24 parallel, 5 concurrent
> [0.010s][debug][gc,task] Executing Task: ZWorkersInitializeTask, Active 
> Workers: 24
> [0.011s][info ][gc,init] Address Space Type: 
> Contiguous/Unrestricted/Complete
> [0.011s][info ][gc,init] Address Space Size: 1638400M x 3 = 4915200M
> [0.012s][info ][gc,init] Heap Backing File: /memfd:java_heap
> [0.012s][error][gc     ] Failed to truncate backing file (Permission 
> denied)
> [0.012s][debug][gc,marking] Expanding mark stack space: 0M->32M
> [0.081s][info ][gc,init   ] Runtime Workers: 24 parallel
> [0.084s][info ][gc        ] Using The Z Garbage Collector
> Error: Could not create the Java Virtual Machine.
> Error: A fatal exception has occurred. Program will exit.
> 
>> The fact that ftruncate returns EACCESS suggests that there's some 
>> kind of environment problem
> Is there a workaround or a way to fix the environment problem ?

I looked around for Docker + ftruncate issues, and found a number of 
bugs related to this, for example:

https://github.com/openshift/origin/issues/15723
https://access.redhat.com/errata/RHBA-2018:0195

These bugs are fairly old, but they seem to describe the exact same 
problem you're seeing. In short, it's an docker+selinux policy bug 
affecting memfd_create+ftruncate. Maybe you're production system haven't 
been patched with this bugfix?

This would also explain why JDK 14 works, since we used fallocate 
(instead of ftruncate) to grow the file there, and therefore happened to 
dodge this issue.

Please have a look at those links, and see if you might have run into 
the same issue.

cheers,
Per

> 
> Best Regards
> David
> 
> On Mon, 23 Nov 2020 15:11:30 +0100, Per Liden <per.liden at oracle.com> wrote:
> 
>> Hi David,
>>
>> On 11/20/20 4:21 PM, David Tavoularis wrote:
>>> Hi,
>>>  I found a possible regression in customer production environment 
>>> linked to JDK-8245203 "ZGC: Don't track size in 
>>> ZPhysicalMemoryBacking" or JDK15 "Fixed support for transparent huge 
>>> pages".
>>> After upgrade from jdk-14.0.2 to jdk-15.0.1, the JVM (using ZGC) is 
>>> crashing at startup with the error message "Failed to truncate 
>>> backing file (Permission denied)"
>>>  This error message was introduced in changelist 
>>> http://hg.openjdk.java.net/jdk-updates/jdk15u/rev/556d5070c458
>>> Cf source code : 
>>> http://hg.openjdk.java.net/jdk-updates/jdk15u/file/556d5070c458/src/hotspot/os/linux/gc/z/zPhysicalMemoryBacking_linux.cpp 
>>>   The issue was seen in Prod, not in Test, nor in labs, I suspect 
>>> that  Heap Backing Filesystem  is not correctly set to tmpfs in Prod 
>>> or a permission issue on a directory/file owned by root.
>>>  $ /opt/3rd_party/jdk_installed/jdk-15.0.1/bin/java -version
>>> openjdk version "15.0.1" 2020-10-20
>>> OpenJDK Runtime Environment (build 15.0.1+9-18)
>>> OpenJDK 64-Bit Server VM (build 15.0.1+9-18, mixed mode, sharing)
>>>  $ /opt/3rd_party/jdk_installed/jdk-15.0.1/bin/java -XX:+UseZGC -version
>>> [0.009s][error][gc] Failed to truncate backing file (Permission denied)
>>> Error: Could not create the Java Virtual Machine.
>>> Error: A fatal exception has occurred. Program will exit.
>>>  $ /opt/mycom/3rd_party/jdk_installed/jdk-14.0.2/bin/java 
>>> -XX:+UnlockExperimentalVMOptions -XX:+UseZGC -version
>>> openjdk version "14.0.2" 2020-07-14
>>> OpenJDK Runtime Environment (build 14.0.2+12-46)
>>> OpenJDK 64-Bit Server VM (build 14.0.2+12-46, mixed mode)
>>>  Should I fill a bug at https://bugreport.java.com/bugreport/ ?
>>> What additional information should I provide ?
>>> Any workaround ?
>>
>> It's hard to tell exactly why this fails. One difference between JDK 
>> 14 and 15 is that the backing file will be truncated to the max heap 
>> size up-front when the JVM starts (but it will not be backed by any 
>> memory until we actually start to expand the heap), instead of 
>> gradually grown as the heap expands. So, one theory is that you could 
>> have had this problem even with 14, but the heap might not have grown 
>> large enough for the problem to be exposed. To test this, run with JDK 
>> 14 and use "-Xms270g -Xmx270g". This will force the backing file to be 
>> truncated up-front, similar to what happens in JDK 15.
>>
>> Also running with "-Xlog:gc*=debug" will print a few more things that 
>> might be interesting.
>>
>> The fact that ftruncate returns EACCESS suggests that there's some 
>> kind of environment problem that is blocking the file from growing. In 
>> the production system, is this process running in some constrained 
>> environment/container/cgroup?
>>
>> cheers,
>> Per


More information about the zgc-dev mailing list