RFR: runtime/8176717/TestInheritFD.java fails with java.lang.RuntimeException: could not match: VM RESULT => RETAINS FD
David Holmes
david.holmes at oracle.com
Sun Jun 10 21:21:10 UTC 2018
Hi Leo,
Overall updates seem okay - use of Optional does simplify things a little.
Nit/General Comment: I'm not a fan of static imports in general as they
lead to more obscure code IMHO because you have to go and search the
import list to understand where a method is coming from. YMMV.
Regarding OSX ... I think I'd rather see testing that might possibly
fail (but we haven't seen that have we?) rather than skip it altogether.
Other opinions welcomed.
Thanks,
David
On 7/06/2018 9:07 PM, Leo Korinth wrote:
> Hi!
>
> Thanks for reviewing!
>
> On 06/06/18 07:40, David Holmes wrote:
>> Hi Leo,
>>
>> First, please include the bug id in the RFR email subject line - thanks.
>>
>> On 31/05/2018 11:56 PM, Leo Korinth wrote:
>>> Hi!
>>>
>>> I am uppdating TestInheritFD.java to fix intermittent test
>>> failures. The reason is that
>>> /sys/fs/cgroup/memory/memory.limit_in_bytes sometimes is open.
>>
>> Okay.
>>
>>> The new test does not count the number of open file descriptors
>>> but instead looks to see if the log file is still open; this is
>>> less fragile. This relies on a /proc file system, but that seems
>>> to be available on all non Windows ports except Mac. Mac testing is
>>> disabled. The Windows solution does not need to change.
>>
>> Let me back up a step to understand the original problem and the test.
>>
>> Problem: if a VM was started with a log file, the fd for the log file
>> would be inherited by an exec'd process (ie another VM).
>>
>> Detection: On Windows the original log file could not be deleted due
>> to the child process's open fd.
>
> Exactly, this is actually the original problem.
>
>>
>> Fix: use close-on-exec for the log file
>>
>> Test:
>> - On Windows try to rename the log file - should succeed.
>> - On non-windows get the count of open fds in the parent process and
>> child process and check child-count < parent-count as child should not
>> have the log file.
>>
>> The flaw: On Linux, due to the container detection logic the child can
>> still have an open /sys/fs/cgroup/memory/memory.limit_in_bytes and so
>> the count is not smaller than the parent.
>>
>> Question: why does the parent not also have the open
>> /sys/fs/cgroup/memory/memory.limit_in_bytes?
>
> I have not looked into the logic, but I guess the file is only opened
> briefly and then closed. That would explain why the test case fails
> intermittently.
>
> Taking a fast look at "subsystem_file_contents" seems to verify this.
>>
>> The basic problem for the test is: how it can determine that one
>> process has an open fd for the same file as another process? The
>> Windows situation answers that accurately. Counting fd's doesn't. But
>> the other platforms don't have the rename issue that windows does. And
>> there's no file API to ask this kind of question as far as I can see.
>>
>> So the proposed fix is to use the /proc filesystem to look in the open
>> fd's for the file. This seemingly works on all non-windows platforms
>> except OS X. Do we ever see the intermittent failures on OS X? If not
>> can it keep using the fd counting logic?
>
> The old test could probably be used safely on OS X. But it is quite
> fragile and could fail if the JVM is opening files for any reason in the
> future; I think it is better to just disable the testing on OS X. Do you
> agree?
>>
>> Thanks for bearing with me as I walked through that :)
>>
>>> This test bug does cause intermittent test failures in tier1, so I
>>> would be grateful for reviews and also someone to sponsor the final
>>> hg export.
>>>
>>> Bug:
>>> https://bugs.openjdk.java.net/browse/JDK-8202740
>>>
>>> Webrev:
>>> http://cr.openjdk.java.net/~lkorinth/8202740/
>>
>> So generally the approach seems good. Took quite an effort to
>> translate all the stream/lambda processing though :)
>>
>> + return new File("Error: could not read symbolic link
>> (last link can not be read as it points to the /proc/self/fd directory
>> that has been closed): " + e.toString());
>>
>> That is a very long line and it seems quite bizarre to "return" an
>> error as a File object. Took me a while to realize this is not really
>> an error but the way we detect the last entry which is unreadable. I
>> suppose this is one way to get all the information printed out.
>
>
> Exactly, it is ugly, but the log is good to have if the test case fails.
> I have reworked the code, it is hopefully better now.
>
>>
>> + System.out.println("Open file descriptors:\n" +
>> stream(dir.listFiles())
>> + .map(f -> "###" + f.getAbsolutePath() + " -> " +
>> linkTarget(f))
>> + .collect(joining("\n")));
>>
>> I found this really hard to parse and understand. It would help a
>> little if the " -> " string was written as " maps to " so it didn't
>> look like the -> from a lambda expression (or even just =>). It would
>> also help to unwrap one level and assign to a String which is then
>> passed to println.
>
> Changed, hopefully better now.
>
>> + return stream(dir.listFiles())
>> + .map(TestInheritFD::linkTarget)
>> + .filter(f -> f.getName().endsWith(LOG_SUFFIX))
>> + .findAny()
>> + .isPresent();
>>
>> Neat!
>>
>> Three minor nits:
>>
>> 133 + " ON as files in /proc and /sys is opened by the JVM");
>>
>> Typo: is opened -> are opened
>
> Fixed.
>
>>
>> 134 System.out.println(findOpenLogFile()?LEAKS_FD:RETAINS_FD);
>> 179 System.out.println(f.renameTo(f)?RETAINS_FD:LEAKS_FD);
>>
>> Spaces around ? and : operators
>
> Fixed.
>
> I also added method fakeLeakyJVM (so that one can test that the test
> case really fails when the JVM leaks).
>
> Updated webrevs:
> http://cr.openjdk.java.net/~lkorinth/8202740/01_02/ (incremental)
> http://cr.openjdk.java.net/~lkorinth/8202740/02/
>
> Started running mach5 tests...
>
> Thanks, Leo
>>
>> Thanks,
>> David
>> -----
>>
>>
>>> Testing:
>>> GNU/Linux, Solaris, Windows and Mac (Mac disabled).
>>>
>>> Thanks,
>>> Leo
More information about the hotspot-runtime-dev
mailing list