RFR: runtime/8176717/TestInheritFD.java fails with java.lang.RuntimeException: could not match: VM RESULT => RETAINS FD

Leo Korinth leo.korinth at oracle.com
Thu Jun 7 11:07:07 UTC 2018


Hi!

Thanks for reviewing!

On 06/06/18 07:40, David Holmes wrote:
> Hi Leo,
> 
> First, please include the bug id in the RFR email subject line - thanks.
> 
> On 31/05/2018 11:56 PM, Leo Korinth wrote:
>> Hi!
>>
>> I am uppdating TestInheritFD.java to fix intermittent test
>> failures. The reason is that
>> /sys/fs/cgroup/memory/memory.limit_in_bytes sometimes is open.
> 
> Okay.
> 
>> The new test does not count the number of open file descriptors
>> but instead looks to see if the log file is still open; this is
>> less fragile. This relies on a /proc file system, but that seems
>> to be available on all non Windows ports except Mac. Mac testing is 
>> disabled. The Windows solution does not need to change.
> 
> Let me back up a step to understand the original problem and the test.
> 
> Problem: if a VM was started with a log file, the fd for the log file 
> would be inherited by an exec'd process (ie another VM).
> 
> Detection: On Windows the original log file could not be deleted due to 
> the child process's open fd.

Exactly, this is actually the original problem.

> 
> Fix: use close-on-exec for the log file
> 
> Test:
>   - On Windows try to rename the log file - should succeed.
>   - On non-windows get the count of open fds in the parent process and 
> child process and check child-count < parent-count as child should not 
> have the log file.
> 
> The flaw: On Linux, due to the container detection logic the child can 
> still have an open /sys/fs/cgroup/memory/memory.limit_in_bytes and so 
> the count is not smaller than the parent.
> 
> Question: why does the parent not also have the open 
> /sys/fs/cgroup/memory/memory.limit_in_bytes?

I have not looked into the logic, but I guess the file is only opened 
briefly and then closed. That would explain why the test case fails 
intermittently.

Taking a fast look at "subsystem_file_contents" seems to verify this.
> 
> The basic problem for the test is: how it can determine that one process 
> has an open fd for the same file as another process? The Windows 
> situation answers that accurately. Counting fd's doesn't. But the other 
> platforms don't have the rename issue that windows does. And there's no 
> file API to ask this kind of question as far as I can see.
> 
> So the proposed fix is to use the /proc filesystem to look in the open 
> fd's for the file. This seemingly works on all non-windows platforms 
> except OS X. Do we ever see the intermittent failures on OS X? If not 
> can it keep using the fd counting logic?

The old test could probably be used safely on OS X. But it is quite 
fragile and could fail if the JVM is opening files for any reason in the 
future; I think it is better to just disable the testing on OS X. Do you 
agree?
> 
> Thanks for bearing with me as I walked through that :)
> 
>> This test bug does cause intermittent test failures in tier1, so I 
>> would be grateful for reviews and also someone to sponsor the final hg 
>> export.
>>
>> Bug:
>> https://bugs.openjdk.java.net/browse/JDK-8202740
>>
>> Webrev:
>> http://cr.openjdk.java.net/~lkorinth/8202740/
> 
> So generally the approach seems good. Took quite an effort to translate 
> all the stream/lambda processing though :)
> 
> +             return new File("Error: could not read symbolic link (last 
> link can not be read as it points to the /proc/self/fd directory that 
> has been closed): " + e.toString());
> 
> That is a very long line and it seems quite bizarre to "return" an error 
> as a File object. Took me a while to realize this is not really an error 
> but the way we detect the last entry which is unreadable. I suppose this 
> is one way to get all the information printed out.


Exactly, it is ugly, but the log is good to have if the test case fails. 
I have reworked the code, it is hopefully better now.

> 
> +         System.out.println("Open file descriptors:\n" + 
> stream(dir.listFiles())
> +             .map(f -> "###" + f.getAbsolutePath() + " -> " + 
> linkTarget(f))
> +             .collect(joining("\n")));
> 
> I found this really hard to parse and understand. It would help a little 
> if the " -> " string was written as " maps to " so it didn't look like 
> the -> from a lambda expression (or even just =>). It would also help to 
> unwrap one level and assign to a String which is then passed to println.

Changed, hopefully better now.

> +         return stream(dir.listFiles())
> +             .map(TestInheritFD::linkTarget)
> +             .filter(f -> f.getName().endsWith(LOG_SUFFIX))
> +             .findAny()
> +             .isPresent();
> 
> Neat!
> 
> Three minor nits:
> 
> 133   + " ON as files in /proc and /sys is opened by the JVM");
> 
> Typo: is opened -> are opened

Fixed.

> 
> 134    System.out.println(findOpenLogFile()?LEAKS_FD:RETAINS_FD);
> 179    System.out.println(f.renameTo(f)?RETAINS_FD:LEAKS_FD);
> 
> Spaces around ? and : operators

Fixed.

I also added method fakeLeakyJVM (so that one can test that the test 
case really fails when the JVM leaks).

Updated webrevs:
http://cr.openjdk.java.net/~lkorinth/8202740/01_02/ (incremental)
http://cr.openjdk.java.net/~lkorinth/8202740/02/

Started running mach5 tests...

Thanks, Leo
> 
> Thanks,
> David
> -----
> 
> 
>> Testing:
>> GNU/Linux, Solaris, Windows and Mac (Mac disabled).
>>
>> Thanks,
>> Leo


More information about the hotspot-runtime-dev mailing list