RFR: 8286030: Avoid JVM crash when containers share the same /tmp dir [v6]

Severin Gehwolf sgehwolf at openjdk.org
Fri Jul 15 12:20:06 UTC 2022


On Tue, 12 Jul 2022 22:39:36 GMT, Ioi Lam <iklam at openjdk.org> wrote:

>> Some Kubernetes setups share the /tmp directory across multiple containers. On rare occasions, the JVM may crash when it tries to write to `/tmp/hsperfdata_<user>/<pid>` when a process in a separate container decides to do the same thing (because they happen to have the same namespaced pid).
>> 
>> This patch avoids the crash by using `flock()` to allow only one of these processes to write to the file. All other competing processes that fail to grab the lock will give up the file and run with PerfMemory disabled. We will try to enable PerfMemory for the failed processes in a follow-up RFE: [JDK-8289883](https://bugs.openjdk.org/browse/JDK-8289883)
>> 
>> Thanks to Vitaly Davidovich and Nico Williams for coming up with the idea of using `flock()`.
>> 
>> I kept the use of `kill()` for stale file detection to be compatible with older JVMs.
>> 
>> I also took the opportunity to clean up the comments and remove dead code. The old code was using "shared memory resources" which sounds unclear and odd. I changed the terminology to say "shared memory file" instead.
>
> Ioi Lam has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - add errno to log
>  - added debug log and tweaked comment

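LGTM. My manual tests of this patch work as expected as well: with two containers sharing the same /tmp, the second JVM finds the file locked and does not use it.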


$ podman run --rm -ti --userns=keep-id -u $(id -u) -v $(pwd)/shared-tmp:/tmp:z -v /disk/openjdk/upstream-sources/git/jdk-jdk/build/linux-x86_64-server-release/images/jdk:/opt/jdk:z -v $(pwd)/test:/opt/test:z fedora:36 /opt/jdk/bin/java -Xlog:perf+memops=debug -cp /opt/test HelloWait
[0.001s][debug][perf,memops] PerfDataMemorySize = 32768, os::vm_allocation_granularity = 4096, adjusted size = 32768
[0.001s][info ][perf,memops] Trying to open /tmp/hsperfdata_sgehwolf/1
[0.001s][info ][perf,memops] Successfully opened
[0.001s][debug][perf,memops] PerfMemory created: address = 0x00007fac290dd000, size = 32768
Hello!
$ podman run --rm -ti --userns=keep-id -u $(id -u) -v $(pwd)/shared-tmp:/tmp:z -v /disk/openjdk/upstream-sources/git/jdk-jdk/build/linux-x86_64-server-release/images/jdk:/opt/jdk:z -v $(pwd)/test:/opt/test:z fedora:36 /opt/jdk/bin/java -Xlog:perf+memops=debug -cp /opt/test HelloWait
[0.001s][debug][perf,memops] PerfDataMemorySize = 32768, os::vm_allocation_granularity = 4096, adjusted size = 32768
[0.001s][debug][perf,memops] flock for stale file check failed for /tmp/hsperfdata_sgehwolf/1
[0.001s][info ][perf,memops] Trying to open /tmp/hsperfdata_sgehwolf/1
[0.001s][warning][perf,memops] Cannot use file /tmp/hsperfdata_sgehwolf/1 because it is locked by another process (errno = 11)
[0.001s][debug  ][perf,memops] PerfMemory created: address = 0x00007fc60bc79000, size = 32768
Hello!
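
For readers unfamiliar with the mechanism, the second run above can be pictured with a minimal sketch. This is not the HotSpot code (which is C++); it is only an illustration in Python, assuming Linux. The helper name `try_claim` and the temporary path are hypothetical. Two independent opens of the same file compete for an exclusive, non-blocking `flock()`; the loser gets EAGAIN/EWOULDBLOCK, which is errno 11 on Linux and matches the warning in the log.

```python
import errno
import fcntl
import os
import tempfile


def try_claim(path):
    """Open path and try to take an exclusive, non-blocking flock() on it.

    Returns the open file descriptor on success, or None if another
    open file description already holds the lock.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError as e:
        os.close(fd)
        if e.errno in (errno.EAGAIN, errno.EWOULDBLOCK):
            return None  # somebody else owns the file; give it up
        raise
    return fd


with tempfile.TemporaryDirectory() as d:
    # Hypothetical stand-in for /tmp/hsperfdata_<user>/<pid>
    path = os.path.join(d, "1")
    first = try_claim(path)    # plays the role of the first container
    second = try_claim(path)   # plays the role of the second container
    print("first writer got the lock:", first is not None)
    print("second writer got the lock:", second is not None)
    os.close(first)            # closing the descriptor releases the lock
```

The sketch uses two opens in one process for brevity; `flock()` locks belong to the open file description, so the same conflict arises between separate processes that share the file.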

-------------

Marked as reviewed by sgehwolf (Reviewer).

PR: https://git.openjdk.org/jdk/pull/9406

