RFR: runtime/8176717/TestInheritFD.java fails with java.lang.RuntimeException: could not match: VM RESULT => RETAINS FD

Leo Korinth leo.korinth at oracle.com
Wed Jun 13 14:26:32 UTC 2018


Hi David!

I have reworked the test case not to use /proc (my solution was not good 
on Solaris either). I will now use "lsof -p" if I can find it and use 
"pfiles -F" otherwise (no "lsof" is installed on our Solaris test 
machines). It does now run on GNU/Linux, Windows, Solaris and MacOS.

I have removed most of all the changes from my previous patch, so I will 
not post an incremental webrev.

In this webrev, I believe the static imports are no longer obscure.

Webrev:
http://cr.openjdk.java.net/~lkorinth/8202740/03/

Thanks,
Leo


On 10/06/18 23:21, David Holmes wrote:
> Hi Leo,
> 
> Overall updates seem okay - use of Optional does simplify things a little.
> 
> Nit/General Comment: I'm not a fan of static imports in general as they 
> lead to more obscure code IMHO because you have to go and search the 
> import list to understand where a method is coming from. YMMV.
> 
> Regarding OSX ... I think I'd rather see testing that might possibly 
> fail (but we haven't seen that have we?) rather than skip it altogether. 
> Other opinions welcomed.
> 
> Thanks,
> David
> 
> On 7/06/2018 9:07 PM, Leo Korinth wrote:
>> Hi!
>>
>> Thanks for reviewing!
>>
>> On 06/06/18 07:40, David Holmes wrote:
>>> Hi Leo,
>>>
>>> First, please include the bug id in the RFR email subject line - thanks.
>>>
>>> On 31/05/2018 11:56 PM, Leo Korinth wrote:
>>>> Hi!
>>>>
>>>> I am uppdating TestInheritFD.java to fix intermittent test
>>>> failures. The reason is that
>>>> /sys/fs/cgroup/memory/memory.limit_in_bytes sometimes is open.
>>>
>>> Okay.
>>>
>>>> The new test does not count the number of open file descriptors
>>>> but instead looks to see if the log file is still open; this is
>>>> less fragile. This relies on a /proc file system, but that seems
>>>> to be available on all non Windows ports except Mac. Mac testing is 
>>>> disabled. The Windows solution does not need to change.
>>>
>>> Let me back up a step to understand the original problem and the test.
>>>
>>> Problem: if a VM was started with a log file, the fd for the log file 
>>> would be inherited by an exec'd process (ie another VM).
>>>
>>> Detection: On Windows the original log file could not be deleted due 
>>> to the child process's open fd.
>>
>> Exactly, this is actually the original problem.
>>
>>>
>>> Fix: use close-on-exec for the log file
>>>
>>> Test:
>>>   - On Windows try to rename the log file - should succeed.
>>>   - On non-windows get the count of open fds in the parent process 
>>> and child process and check child-count < parent-count as child 
>>> should not have the log file.
>>>
>>> The flaw: On Linux, due to the container detection logic the child 
>>> can still have an open /sys/fs/cgroup/memory/memory.limit_in_bytes 
>>> and so the count is not smaller than the parent.
>>>
>>> Question: why does the parent not also have the open 
>>> /sys/fs/cgroup/memory/memory.limit_in_bytes?
>>
>> I have not looked into the logic, but I guess the file is only opened 
>> briefly and then closed. That would explain why the test case fails 
>> intermittently.
>>
>> Taking a fast look at "subsystem_file_contents" seems to verify this.
>>>
>>> The basic problem for the test is: how it can determine that one 
>>> process has an open fd for the same file as another process? The 
>>> Windows situation answers that accurately. Counting fd's doesn't. But 
>>> the other platforms don't have the rename issue that windows does. 
>>> And there's no file API to ask this kind of question as far as I can 
>>> see.
>>>
>>> So the proposed fix is to use the /proc filesystem to look in the 
>>> open fd's for the file. This seemingly works on all non-windows 
>>> platforms except OS X. Do we ever see the intermittent failures on OS 
>>> X? If not can it keep using the fd counting logic?
>>
>> The old test could probably be used safely on OS X. But it is quite 
>> fragile and could fail if the JVM is opening files for any reason in 
>> the future; I think it is better to just disable the testing on OS X. 
>> Do you agree?
>>>
>>> Thanks for bearing with me as I walked through that :)
>>>
>>>> This test bug does cause intermittent test failures in tier1, so I 
>>>> would be grateful for reviews and also someone to sponsor the final 
>>>> hg export.
>>>>
>>>> Bug:
>>>> https://bugs.openjdk.java.net/browse/JDK-8202740
>>>>
>>>> Webrev:
>>>> http://cr.openjdk.java.net/~lkorinth/8202740/
>>>
>>> So generally the approach seems good. Took quite an effort to 
>>> translate all the stream/lambda processing though :)
>>>
>>> +             return new File("Error: could not read symbolic link 
>>> (last link can not be read as it points to the /proc/self/fd 
>>> directory that has been closed): " + e.toString());
>>>
>>> That is a very long line and it seems quite bizarre to "return" an 
>>> error as a File object. Took me a while to realize this is not really 
>>> an error but the way we detect the last entry which is unreadable. I 
>>> suppose this is one way to get all the information printed out.
>>
>>
>> Exactly, it is ugly, but the log is good to have if the test case 
>> fails. I have reworked the code, it is hopefully better now.
>>
>>>
>>> +         System.out.println("Open file descriptors:\n" + 
>>> stream(dir.listFiles())
>>> +             .map(f -> "###" + f.getAbsolutePath() + " -> " + 
>>> linkTarget(f))
>>> +             .collect(joining("\n")));
>>>
>>> I found this really hard to parse and understand. It would help a 
>>> little if the " -> " string was written as " maps to " so it didn't 
>>> look like the -> from a lambda expression (or even just =>). It would 
>>> also help to unwrap one level and assign to a String which is then 
>>> passed to println.
>>
>> Changed, hopefully better now.
>>
>>> +         return stream(dir.listFiles())
>>> +             .map(TestInheritFD::linkTarget)
>>> +             .filter(f -> f.getName().endsWith(LOG_SUFFIX))
>>> +             .findAny()
>>> +             .isPresent();
>>>
>>> Neat!
>>>
>>> Three minor nits:
>>>
>>> 133   + " ON as files in /proc and /sys is opened by the JVM");
>>>
>>> Typo: is opened -> are opened
>>
>> Fixed.
>>
>>>
>>> 134    System.out.println(findOpenLogFile()?LEAKS_FD:RETAINS_FD);
>>> 179    System.out.println(f.renameTo(f)?RETAINS_FD:LEAKS_FD);
>>>
>>> Spaces around ? and : operators
>>
>> Fixed.
>>
>> I also added method fakeLeakyJVM (so that one can test that the test 
>> case really fails when the JVM leaks).
>>
>> Updated webrevs:
>> http://cr.openjdk.java.net/~lkorinth/8202740/01_02/ (incremental)
>> http://cr.openjdk.java.net/~lkorinth/8202740/02/
>>
>> Started running mach5 tests...
>>
>> Thanks, Leo
>>>
>>> Thanks,
>>> David
>>> -----
>>>
>>>
>>>> Testing:
>>>> GNU/Linux, Solaris, Windows and Mac (Mac disabled).
>>>>
>>>> Thanks,
>>>> Leo


More information about the hotspot-runtime-dev mailing list