RFR: 8286030: Avoid JVM crash when containers share the same /tmp dir [v5]

Ioi Lam iklam at openjdk.org
Tue Jul 12 22:39:44 UTC 2022


On Mon, 11 Jul 2022 04:53:19 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   fixed typo
>
> src/hotspot/os/posix/perfMemory_posix.cpp line 733:
> 
>> 731:   while ((entry = os::readdir(dirp)) != NULL) {
>> 732:     const char* filename = entry->d_name;
>> 733:     pid_t pid = filename_to_pid(filename);
> 
> Pre-existing. An error value of -1 would be somewhat cleaner since strictly speaking pid 0 is a valid PID. (Feel free to ignore this comment)

I think I'll leave it for now to minimize the behavior changes of this PR. I read somewhere that pid 0 is the scheduler

https://unix.stackexchange.com/questions/83322/which-process-has-pid-0

So no actual JVM process will create a stale file with name "0". If such a file exists for some other reason we would remove it, which be consistent with the new comment "any other files found in this directory may be removed".

> src/hotspot/os/posix/perfMemory_posix.cpp line 760:
> 
>> 758:       // Something wrong happened. Ignore the error and don't try to remove the
>> 759:       // file.
>> 760:       errno = 0;
> 
> Can we have a debug or trace log line here?

Fixed

> src/hotspot/os/posix/perfMemory_posix.cpp line 804:
> 
>> 802:     }
>> 803: 
>> 804:  #if defined(LINUX)
> 
> Indentation

Fixed

> src/hotspot/os/posix/perfMemory_posix.cpp line 807:
> 
>> 805:     // Hold the lock until here to prevent other JVMs from using this file
>> 806:     // while we are in the middle of deleting it.
>> 807:     ::close(fd);
> 
> I don't understand this comment, it seems to contradict what you are doing. We are closing the only fd referring to this lock file, right? So the lock should get unlocked here too? If we want to keep the lock open, shouldn't we avoid closing the fd?

I want to prevent the following race condition. Let's assume this process has PID 10 and there's another process (in a different pid namespace) with PID 20. Both process see a file named "20".

0. No one holds a lock on this file.
1. Process 20 successfully locks the file in cleanup_sharedmem_files().
2. Process 20 gives up the lock.
3. Process 20 decides it can delete the file (PID 20 matches its own PID).
4. This process successfully locks the file in cleanup_sharedmem_files().
5. This process gives up the lock
6. This process decides it can delete the file (PID 20 does not exist in my pid namespace)
7. Process 20 deletes the file. Creates a new version of this file. Successfully locks the new file.
8. This process deletes the new version of this file (by name).

By holding the lock between steps 4 and 8, we can guaranteed that if a process can successfully lock the file in create_sharedmem_file(), this file will never be unintentionally deleted.

I changed the comment slightly:


    // Hold the lock until here to prevent other JVMs from using this file
-   // while we are in the middle of deleting it.
+   // while we were in the middle of deleting it.

> src/hotspot/os/posix/perfMemory_posix.cpp line 926:
> 
>> 924:                               (errno == EWOULDBLOCK) ?
>> 925:                               "it is locked by another process" :
>> 926:                               "flock() failed");
> 
> strerror would be helpful here, or at least errno

I added just the errno. Example:


[0.003s][warning][perf,memops] Cannot use file /tmp/hsperfdata_root/1 
because it is locked by another process (errno = 11)


If we print the `os::strerror()` it would look like this:


... because it is locked by another process (errno = 11, Operation would block) 


which seems too verbose and could be confusing.

-------------

PR: https://git.openjdk.org/jdk/pull/9406


More information about the serviceability-dev mailing list