RFR: runtime/8176717/TestInheritFD.java fails with java.lang.RuntimeException: could not match: VM RESULT => RETAINS FD

David Holmes david.holmes at oracle.com
Wed Jun 6 05:40:06 UTC 2018


Hi Leo,

First, please include the bug id in the RFR email subject line - thanks.

On 31/05/2018 11:56 PM, Leo Korinth wrote:
> Hi!
> 
> I am uppdating TestInheritFD.java to fix intermittent test
> failures. The reason is that
> /sys/fs/cgroup/memory/memory.limit_in_bytes sometimes is open.

Okay.

> The new test does not count the number of open file descriptors
> but instead looks to see if the log file is still open; this is
> less fragile. This relies on a /proc file system, but that seems
> to be available on all non Windows ports except Mac. Mac testing is 
> disabled. The Windows solution does not need to change.

Let me back up a step to understand the original problem and the test.

Problem: if a VM was started with a log file, the fd for the log file 
would be inherited by an exec'd process (ie another VM).

Detection: On Windows the original log file could not be deleted due to 
the child process's open fd.

Fix: use close-on-exec for the log file

Test:
  - On Windows try to rename the log file - should succeed.
  - On non-windows get the count of open fds in the parent process and 
child process and check child-count < parent-count as child should not 
have the log file.

The flaw: On Linux, due to the container detection logic the child can 
still have an open /sys/fs/cgroup/memory/memory.limit_in_bytes and so 
the count is not smaller than the parent.

Question: why does the parent not also have the open 
/sys/fs/cgroup/memory/memory.limit_in_bytes?

The basic problem for the test is: how it can determine that one process 
has an open fd for the same file as another process? The Windows 
situation answers that accurately. Counting fd's doesn't. But the other 
platforms don't have the rename issue that windows does. And there's no 
file API to ask this kind of question as far as I can see.

So the proposed fix is to use the /proc filesystem to look in the open 
fd's for the file. This seemingly works on all non-windows platforms 
except OS X. Do we ever see the intermittent failures on OS X? If not 
can it keep using the fd counting logic?

Thanks for bearing with me as I walked through that :)

> This test bug does cause intermittent test failures in tier1, so I would 
> be grateful for reviews and also someone to sponsor the final hg export.
> 
> Bug:
> https://bugs.openjdk.java.net/browse/JDK-8202740
> 
> Webrev:
> http://cr.openjdk.java.net/~lkorinth/8202740/

So generally the approach seems good. Took quite an effort to translate 
all the stream/lambda processing though :)

+             return new File("Error: could not read symbolic link (last 
link can not be read as it points to the /proc/self/fd directory that 
has been closed): " + e.toString());

That is a very long line and it seems quite bizarre to "return" an error 
as a File object. Took me a while to realize this is not really an error 
but the way we detect the last entry which is unreadable. I suppose this 
is one way to get all the information printed out.

+         System.out.println("Open file descriptors:\n" + 
stream(dir.listFiles())
+             .map(f -> "###" + f.getAbsolutePath() + " -> " + 
linkTarget(f))
+             .collect(joining("\n")));

I found this really hard to parse and understand. It would help a little 
if the " -> " string was written as " maps to " so it didn't look like 
the -> from a lambda expression (or even just =>). It would also help to 
unwrap one level and assign to a String which is then passed to println.

+         return stream(dir.listFiles())
+             .map(TestInheritFD::linkTarget)
+             .filter(f -> f.getName().endsWith(LOG_SUFFIX))
+             .findAny()
+             .isPresent();

Neat!

Three minor nits:

133   + " ON as files in /proc and /sys is opened by the JVM");

Typo: is opened -> are opened

134    System.out.println(findOpenLogFile()?LEAKS_FD:RETAINS_FD);
179    System.out.println(f.renameTo(f)?RETAINS_FD:LEAKS_FD);

Spaces around ? and : operators

Thanks,
David
-----


> Testing:
> GNU/Linux, Solaris, Windows and Mac (Mac disabled).
> 
> Thanks,
> Leo


More information about the hotspot-runtime-dev mailing list