RFR: runtime/8176717/TestInheritFD.java fails with java.lang.RuntimeException: could not match: VM RESULT => RETAINS FD

David Holmes david.holmes at oracle.com
Sun Jun 10 21:21:10 UTC 2018


Hi Leo,

Overall updates seem okay - use of Optional does simplify things a little.

Nit/General Comment: I'm not a fan of static imports in general as they 
lead to more obscure code IMHO because you have to go and search the 
import list to understand where a method is coming from. YMMV.

Regarding OSX ... I think I'd rather see testing that might possibly 
fail (but we haven't seen that have we?) rather than skip it altogether. 
Other opinions welcomed.

Thanks,
David

On 7/06/2018 9:07 PM, Leo Korinth wrote:
> Hi!
> 
> Thanks for reviewing!
> 
> On 06/06/18 07:40, David Holmes wrote:
>> Hi Leo,
>>
>> First, please include the bug id in the RFR email subject line - thanks.
>>
>> On 31/05/2018 11:56 PM, Leo Korinth wrote:
>>> Hi!
>>>
>>> I am uppdating TestInheritFD.java to fix intermittent test
>>> failures. The reason is that
>>> /sys/fs/cgroup/memory/memory.limit_in_bytes sometimes is open.
>>
>> Okay.
>>
>>> The new test does not count the number of open file descriptors
>>> but instead looks to see if the log file is still open; this is
>>> less fragile. This relies on a /proc file system, but that seems
>>> to be available on all non Windows ports except Mac. Mac testing is 
>>> disabled. The Windows solution does not need to change.
>>
>> Let me back up a step to understand the original problem and the test.
>>
>> Problem: if a VM was started with a log file, the fd for the log file 
>> would be inherited by an exec'd process (ie another VM).
>>
>> Detection: On Windows the original log file could not be deleted due 
>> to the child process's open fd.
> 
> Exactly, this is actually the original problem.
> 
>>
>> Fix: use close-on-exec for the log file
>>
>> Test:
>>   - On Windows try to rename the log file - should succeed.
>>   - On non-windows get the count of open fds in the parent process and 
>> child process and check child-count < parent-count as child should not 
>> have the log file.
>>
>> The flaw: On Linux, due to the container detection logic the child can 
>> still have an open /sys/fs/cgroup/memory/memory.limit_in_bytes and so 
>> the count is not smaller than the parent.
>>
>> Question: why does the parent not also have the open 
>> /sys/fs/cgroup/memory/memory.limit_in_bytes?
> 
> I have not looked into the logic, but I guess the file is only opened 
> briefly and then closed. That would explain why the test case fails 
> intermittently.
> 
> Taking a fast look at "subsystem_file_contents" seems to verify this.
>>
>> The basic problem for the test is: how it can determine that one 
>> process has an open fd for the same file as another process? The 
>> Windows situation answers that accurately. Counting fd's doesn't. But 
>> the other platforms don't have the rename issue that windows does. And 
>> there's no file API to ask this kind of question as far as I can see.
>>
>> So the proposed fix is to use the /proc filesystem to look in the open 
>> fd's for the file. This seemingly works on all non-windows platforms 
>> except OS X. Do we ever see the intermittent failures on OS X? If not 
>> can it keep using the fd counting logic?
> 
> The old test could probably be used safely on OS X. But it is quite 
> fragile and could fail if the JVM is opening files for any reason in the 
> future; I think it is better to just disable the testing on OS X. Do you 
> agree?
>>
>> Thanks for bearing with me as I walked through that :)
>>
>>> This test bug does cause intermittent test failures in tier1, so I 
>>> would be grateful for reviews and also someone to sponsor the final 
>>> hg export.
>>>
>>> Bug:
>>> https://bugs.openjdk.java.net/browse/JDK-8202740
>>>
>>> Webrev:
>>> http://cr.openjdk.java.net/~lkorinth/8202740/
>>
>> So generally the approach seems good. Took quite an effort to 
>> translate all the stream/lambda processing though :)
>>
>> +             return new File("Error: could not read symbolic link 
>> (last link can not be read as it points to the /proc/self/fd directory 
>> that has been closed): " + e.toString());
>>
>> That is a very long line and it seems quite bizarre to "return" an 
>> error as a File object. Took me a while to realize this is not really 
>> an error but the way we detect the last entry which is unreadable. I 
>> suppose this is one way to get all the information printed out.
> 
> 
> Exactly, it is ugly, but the log is good to have if the test case fails. 
> I have reworked the code, it is hopefully better now.
> 
>>
>> +         System.out.println("Open file descriptors:\n" + 
>> stream(dir.listFiles())
>> +             .map(f -> "###" + f.getAbsolutePath() + " -> " + 
>> linkTarget(f))
>> +             .collect(joining("\n")));
>>
>> I found this really hard to parse and understand. It would help a 
>> little if the " -> " string was written as " maps to " so it didn't 
>> look like the -> from a lambda expression (or even just =>). It would 
>> also help to unwrap one level and assign to a String which is then 
>> passed to println.
> 
> Changed, hopefully better now.
> 
>> +         return stream(dir.listFiles())
>> +             .map(TestInheritFD::linkTarget)
>> +             .filter(f -> f.getName().endsWith(LOG_SUFFIX))
>> +             .findAny()
>> +             .isPresent();
>>
>> Neat!
>>
>> Three minor nits:
>>
>> 133   + " ON as files in /proc and /sys is opened by the JVM");
>>
>> Typo: is opened -> are opened
> 
> Fixed.
> 
>>
>> 134    System.out.println(findOpenLogFile()?LEAKS_FD:RETAINS_FD);
>> 179    System.out.println(f.renameTo(f)?RETAINS_FD:LEAKS_FD);
>>
>> Spaces around ? and : operators
> 
> Fixed.
> 
> I also added method fakeLeakyJVM (so that one can test that the test 
> case really fails when the JVM leaks).
> 
> Updated webrevs:
> http://cr.openjdk.java.net/~lkorinth/8202740/01_02/ (incremental)
> http://cr.openjdk.java.net/~lkorinth/8202740/02/
> 
> Started running mach5 tests...
> 
> Thanks, Leo
>>
>> Thanks,
>> David
>> -----
>>
>>
>>> Testing:
>>> GNU/Linux, Solaris, Windows and Mac (Mac disabled).
>>>
>>> Thanks,
>>> Leo


More information about the hotspot-runtime-dev mailing list