ZGC: Failed to truncate backing file (Permission denied) at startup after 14.0.2 to 15.0.1 upgrade

David Tavoularis David.Tavoularis at mycom-osi.com
Tue Nov 24 09:51:33 UTC 2020


Hi Per,

Thanks to your help, we identified that 15.0.1 (with ZGC) was starting  
fine with recent container-selinux versions (2.33/2.36/2.42), but was  
broken with older one (2.21) :
- Not working : container-selinux-2.21-2.gitba103ac.el7.noarch
- Working     : container-selinux-2.33-1.git86f33cd.el7.noarch
- Working     : container-selinux-2.36-1.gitff95335.el7.noarch
- Working     : container-selinux-2.42-1.gitad8f0f7.el7.noarch

We will plan to upgrade the nodes running with old version.

Just for my information, is there a plan to support the old buggy  
container-selinux in openjdk-15.0.2 by implementing a fallback to  
fallocate when ftruncate returns EACCESS ?

Best Regards
-- 
David

On Mon, 23 Nov 2020 18:40:41 +0100, Per Liden <per.liden at oracle.com> wrote:

> Hi David,
>
> On 11/23/20 3:35 PM, David Tavoularis wrote:
>> Hi Per,
>>  Thank you for your quick reply.
>>
>>> The fact that ftruncate returns EACCESS suggests that there's some  
>>> kind of environment problem that is blocking the file from growing.
>>> In the production system, is this process running in some constrained  
>>> environment/container/cgroup?
>> Yes, it is running inside a pod in an OpenShift environment.
>>
>>> To test this, run with JDK 14 and use "-Xms270g -Xmx270g"
>> I tried with "-Xms100g -Xmx100g", I hope that it is fine for this test.
>>  $ /opt/3rd_party/jdk_installed/jdk-14.0.2/bin/java -Xms100g -Xmx100g  
>> -XX:+UnlockExperimentalVMOptions -XX:+UseZGC -Xlog:gc*=debug -version
>> [0.006s][debug][gc,heap] Minimum heap 107374182400  Initial heap  
>> 107374182400  Maximum heap 107374182400
>> [0.006s][info ][gc,init] Initializing The Z Garbage Collector
>> [0.006s][info ][gc,init] Version: 14.0.2+12-46 (release)
>> [0.006s][info ][gc,init] NUMA Support: Disabled
>> [0.006s][info ][gc,init] CPUs: 70 total, 40 available
>> [0.006s][info ][gc,init] Memory: 419840M
>> [0.006s][info ][gc,init] Large Page Support: Disabled
>> [0.006s][info ][gc,init] Medium Page Size: 32M
>> [0.006s][info ][gc,init] Workers: 24 parallel, 5 concurrent
>> [0.009s][debug][gc,task] Executing Task: ZWorkersInitializeTask, Active  
>> Workers: 24
>> [0.009s][info ][gc,init] Address Space Type:  
>> Contiguous/Unrestricted/Complete
>> [0.009s][info ][gc,init] Address Space Size: 1638400M x 3 = 4915200M
>> [0.010s][info ][gc,init] Heap backed by file: /memfd:java_heap
>> [0.010s][info ][gc,init] Min Capacity: 102400M
>> [0.010s][info ][gc,init] Initial Capacity: 102400M
>> [0.010s][info ][gc,init] Max Capacity: 102400M
>> [0.010s][info ][gc,init] Max Reserve: 80M
>> [0.010s][info ][gc,init] Pre-touch: Disabled
>> [0.010s][info ][gc,init] Available space on backing filesystem: N/A
>> [25.372s][info ][gc,init] Uncommit: Disabled
>> [25.372s][debug][gc,marking] Expanding mark stack space: 0M->32M
>> [25.888s][info ][gc,init   ] Runtime Workers: 24 parallel
>> [25.960s][info ][gc        ] Using The Z Garbage Collector
>> [26.179s][debug][gc,nmethod] Rebuilding NMethod Table: 0->1024 entries,  
>> 0(0%->0%) registered, 0(0%->0%) unregistered
>> openjdk version "14.0.2" 2020-07-14
>> OpenJDK Runtime Environment (build 14.0.2+12-46)
>> OpenJDK 64-Bit Server VM (build 14.0.2+12-46, mixed mode)
>>  $ /opt/3rd_party/jdk_installed/jdk-15.0.1/bin/java -Xms100g -Xmx100g  
>> -XX:+UnlockExperimentalVMOptions -XX:+UseZGC -Xlog:gc*=debug -version
>> [0.006s][debug][gc,heap] Minimum heap 107374182400  Initial heap  
>> 107374182400  Maximum heap 107374182400
>> [0.006s][info ][gc,init] Initializing The Z Garbage Collector
>> [0.006s][info ][gc,init] Version: 15.0.1+9-18 (release)
>> [0.006s][info ][gc,init] NUMA Support: Disabled
>> [0.006s][info ][gc,init] CPUs: 70 total, 40 available
>> [0.006s][info ][gc,init] Memory: 419840M
>> [0.006s][info ][gc,init] Large Page Support: Disabled
>> [0.007s][info ][gc,init] Workers: 24 parallel, 5 concurrent
>> [0.010s][debug][gc,task] Executing Task: ZWorkersInitializeTask, Active  
>> Workers: 24
>> [0.011s][info ][gc,init] Address Space Type:  
>> Contiguous/Unrestricted/Complete
>> [0.011s][info ][gc,init] Address Space Size: 1638400M x 3 = 4915200M
>> [0.012s][info ][gc,init] Heap Backing File: /memfd:java_heap
>> [0.012s][error][gc     ] Failed to truncate backing file (Permission  
>> denied)
>> [0.012s][debug][gc,marking] Expanding mark stack space: 0M->32M
>> [0.081s][info ][gc,init   ] Runtime Workers: 24 parallel
>> [0.084s][info ][gc        ] Using The Z Garbage Collector
>> Error: Could not create the Java Virtual Machine.
>> Error: A fatal exception has occurred. Program will exit.
>>
>>> The fact that ftruncate returns EACCESS suggests that there's some  
>>> kind of environment problem
>> Is there a workaround or a way to fix the environment problem ?
>
> I looked around for Docker + ftruncate issues, and found a number of  
> bugs related to this, for example:
>
> https://github.com/openshift/origin/issues/15723
> https://access.redhat.com/errata/RHBA-2018:0195
>
> These bugs are fairly old, but they seem to describe the exact same  
> problem you're seeing. In short, it's an docker+selinux policy bug  
> affecting memfd_create+ftruncate. Maybe you're production system haven't  
> been patched with this bugfix?
>
> This would also explain why JDK 14 works, since we used fallocate  
> (instead of ftruncate) to grow the file there, and therefore happened to  
> dodge this issue.
>
> Please have a look at those links, and see if you might have run into  
> the same issue.
>
> cheers,
> Per
>
>>  Best Regards
>> David
>>  On Mon, 23 Nov 2020 15:11:30 +0100, Per Liden <per.liden at oracle.com>  
>> wrote:
>>
>>> Hi David,
>>>
>>> On 11/20/20 4:21 PM, David Tavoularis wrote:
>>>> Hi,
>>>>  I found a possible regression in customer production environment  
>>>> linked to JDK-8245203 "ZGC: Don't track size in  
>>>> ZPhysicalMemoryBacking" or JDK15 "Fixed support for transparent huge  
>>>> pages".
>>>> After upgrade from jdk-14.0.2 to jdk-15.0.1, the JVM (using ZGC) is  
>>>> crashing at startup with the error message "Failed to truncate  
>>>> backing file (Permission denied)"
>>>>  This error message was introduced in changelist  
>>>> http://hg.openjdk.java.net/jdk-updates/jdk15u/rev/556d5070c458
>>>> Cf source code :  
>>>> http://hg.openjdk.java.net/jdk-updates/jdk15u/file/556d5070c458/src/hotspot/os/linux/gc/z/zPhysicalMemoryBacking_linux.cpp  
>>>>   The issue was seen in Prod, not in Test, nor in labs, I suspect  
>>>> that  Heap Backing Filesystem  is not correctly set to tmpfs in Prod  
>>>> or a permission issue on a directory/file owned by root.
>>>>  $ /opt/3rd_party/jdk_installed/jdk-15.0.1/bin/java -version
>>>> openjdk version "15.0.1" 2020-10-20
>>>> OpenJDK Runtime Environment (build 15.0.1+9-18)
>>>> OpenJDK 64-Bit Server VM (build 15.0.1+9-18, mixed mode, sharing)
>>>>  $ /opt/3rd_party/jdk_installed/jdk-15.0.1/bin/java -XX:+UseZGC  
>>>> -version
>>>> [0.009s][error][gc] Failed to truncate backing file (Permission  
>>>> denied)
>>>> Error: Could not create the Java Virtual Machine.
>>>> Error: A fatal exception has occurred. Program will exit.
>>>>  $ /opt/mycom/3rd_party/jdk_installed/jdk-14.0.2/bin/java  
>>>> -XX:+UnlockExperimentalVMOptions -XX:+UseZGC -version
>>>> openjdk version "14.0.2" 2020-07-14
>>>> OpenJDK Runtime Environment (build 14.0.2+12-46)
>>>> OpenJDK 64-Bit Server VM (build 14.0.2+12-46, mixed mode)
>>>>  Should I fill a bug at https://bugreport.java.com/bugreport/ ?
>>>> What additional information should I provide ?
>>>> Any workaround ?
>>>
>>> It's hard to tell exactly why this fails. One difference between JDK  
>>> 14 and 15 is that the backing file will be truncated to the max heap  
>>> size up-front when the JVM starts (but it will not be backed by any  
>>> memory until we actually start to expand the heap), instead of  
>>> gradually grown as the heap expands. So, one theory is that you could  
>>> have had this problem even with 14, but the heap might not have grown  
>>> large enough for the problem to be exposed. To test this, run with JDK  
>>> 14 and use "-Xms270g -Xmx270g". This will force the backing file to be  
>>> truncated up-front, similar to what happens in JDK 15.
>>>
>>> Also running with "-Xlog:gc*=debug" will print a few more things that  
>>> might be interesting.
>>>
>>> The fact that ftruncate returns EACCESS suggests that there's some  
>>> kind of environment problem that is blocking the file from growing. In  
>>> the production system, is this process running in some constrained  
>>> environment/container/cgroup?
>>>
>>> cheers,
>>> Per


More information about the zgc-dev mailing list