RFR: 8286030: Avoid JVM crash when containers share the same /tmp dir [v5]

Thomas Stuefe stuefe at openjdk.org
Mon Jul 11 05:17:07 UTC 2022


On Sat, 9 Jul 2022 06:49:03 GMT, Ioi Lam <iklam at openjdk.org> wrote:

>> Some Kubernetes setups share the /tmp directory across multiple containers. On rare occasions, the JVM may crash when it tries to write to `/tmp/hsperfdata_<user>/<pid>` when a process in a separate container decides to do the same thing (because they happen to have the same namespaced pid).
>> 
>> This patch avoids the crash by using `flock()` to allow only one of these processes to write to the file. All other competing processes that fail to grab the lock will give up the file and run with PerfMemory disabled. We will try to enable PerfMemory for the failed processes in a follow-up RFE: [JDK-8289883](https://bugs.openjdk.org/browse/JDK-8289883)
>> 
>> Thanks to Vitaly Davidovich and Nico Williams for coming up with the idea of using `flock()`.
>> 
>> I kept the use of `kill()` for stale file detection to be compatible with older JVMs.
>> 
>> I also took the opportunity to clean up the comments and remove dead code. The old code was using "shared memory resources" which sounds unclear and odd. I changed the terminology to say "shared memory file" instead.
>
> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision:
> 
>   fixed typo

Just nits and questions. I like the test (did not know we have Docker support for tests in jtreg).

src/hotspot/os/posix/perfMemory_posix.cpp line 733:

> 731:   while ((entry = os::readdir(dirp)) != NULL) {
> 732:     const char* filename = entry->d_name;
> 733:     pid_t pid = filename_to_pid(filename);

Pre-existing. An error value of -1 would be somewhat cleaner since strictly speaking pid 0 is a valid PID. (Feel free to ignore this comment)

src/hotspot/os/posix/perfMemory_posix.cpp line 760:

> 758:       // Something wrong happened. Ignore the error and don't try to remove the
> 759:       // file.
> 760:       errno = 0;

Can we have a debug or trace log line here?

src/hotspot/os/posix/perfMemory_posix.cpp line 804:

> 802:     }
> 803: 
> 804:  #if defined(LINUX)

Indentation

src/hotspot/os/posix/perfMemory_posix.cpp line 807:

> 805:     // Hold the lock until here to prevent other JVMs from using this file
> 806:     // while we are in the middle of deleting it.
> 807:     ::close(fd);

I don't understand this comment, it seems to contradict what you are doing. We are closing the only fd referring to this lock file, right? So the lock should get unlocked here too? If we want to keep the lock open, shouldn't we avoid closing the fd?

src/hotspot/os/posix/perfMemory_posix.cpp line 926:

> 924:                               (errno == EWOULDBLOCK) ?
> 925:                               "it is locked by another process" :
> 926:                               "flock() failed");

strerror would be helpful here, or at least errno

-------------

PR: https://git.openjdk.org/jdk/pull/9406


More information about the serviceability-dev mailing list