RFR: runtime/8176717/TestInheritFD.java fails with java.lang.RuntimeException: could not match: VM RESULT => RETAINS FD

Robbin Ehn robbin.ehn at oracle.com
Wed Jun 13 14:40:09 UTC 2018


On 06/13/2018 04:26 PM, Leo Korinth wrote:
> Hi David!
> 
> I have reworked the test case not to use /proc (my solution was not good on 
> Solaris either). I will now use "lsof -p" if I can find it and use "pfiles -F" 
> otherwise (no "lsof" is installed on our Solaris test machines). It does now run 
> on GNU/Linux, Windows, Solaris and MacOS.
> 
> I have removed most of all the changes from my previous patch, so I will not 
> post an incremental webrev.
> 
> In this webrev, I believe the static imports are no longer obscure.
> 
> Webrev:
> http://cr.openjdk.java.net/~lkorinth/8202740/03/

I'm fine with this also. Thanks for fixing.

/Robbin

> 
> Thanks,
> Leo
> 
> 
> On 10/06/18 23:21, David Holmes wrote:
>> Hi Leo,
>>
>> Overall updates seem okay - use of Optional does simplify things a little.
>>
>> Nit/General Comment: I'm not a fan of static imports in general as they lead 
>> to more obscure code IMHO because you have to go and search the import list to 
>> understand where a method is coming from. YMMV.
>>
>> Regarding OSX ... I think I'd rather see testing that might possibly fail (but 
>> we haven't seen that have we?) rather than skip it altogether. Other opinions 
>> welcomed.
>>
>> Thanks,
>> David
>>
>> On 7/06/2018 9:07 PM, Leo Korinth wrote:
>>> Hi!
>>>
>>> Thanks for reviewing!
>>>
>>> On 06/06/18 07:40, David Holmes wrote:
>>>> Hi Leo,
>>>>
>>>> First, please include the bug id in the RFR email subject line - thanks.
>>>>
>>>> On 31/05/2018 11:56 PM, Leo Korinth wrote:
>>>>> Hi!
>>>>>
>>>>> I am uppdating TestInheritFD.java to fix intermittent test
>>>>> failures. The reason is that
>>>>> /sys/fs/cgroup/memory/memory.limit_in_bytes sometimes is open.
>>>>
>>>> Okay.
>>>>
>>>>> The new test does not count the number of open file descriptors
>>>>> but instead looks to see if the log file is still open; this is
>>>>> less fragile. This relies on a /proc file system, but that seems
>>>>> to be available on all non Windows ports except Mac. Mac testing is 
>>>>> disabled. The Windows solution does not need to change.
>>>>
>>>> Let me back up a step to understand the original problem and the test.
>>>>
>>>> Problem: if a VM was started with a log file, the fd for the log file would 
>>>> be inherited by an exec'd process (ie another VM).
>>>>
>>>> Detection: On Windows the original log file could not be deleted due to the 
>>>> child process's open fd.
>>>
>>> Exactly, this is actually the original problem.
>>>
>>>>
>>>> Fix: use close-on-exec for the log file
>>>>
>>>> Test:
>>>>   - On Windows try to rename the log file - should succeed.
>>>>   - On non-windows get the count of open fds in the parent process and child 
>>>> process and check child-count < parent-count as child should not have the 
>>>> log file.
>>>>
>>>> The flaw: On Linux, due to the container detection logic the child can still 
>>>> have an open /sys/fs/cgroup/memory/memory.limit_in_bytes and so the count is 
>>>> not smaller than the parent.
>>>>
>>>> Question: why does the parent not also have the open 
>>>> /sys/fs/cgroup/memory/memory.limit_in_bytes?
>>>
>>> I have not looked into the logic, but I guess the file is only opened briefly 
>>> and then closed. That would explain why the test case fails intermittently.
>>>
>>> Taking a fast look at "subsystem_file_contents" seems to verify this.
>>>>
>>>> The basic problem for the test is: how it can determine that one process has 
>>>> an open fd for the same file as another process? The Windows situation 
>>>> answers that accurately. Counting fd's doesn't. But the other platforms 
>>>> don't have the rename issue that windows does. And there's no file API to 
>>>> ask this kind of question as far as I can see.
>>>>
>>>> So the proposed fix is to use the /proc filesystem to look in the open fd's 
>>>> for the file. This seemingly works on all non-windows platforms except OS X. 
>>>> Do we ever see the intermittent failures on OS X? If not can it keep using 
>>>> the fd counting logic?
>>>
>>> The old test could probably be used safely on OS X. But it is quite fragile 
>>> and could fail if the JVM is opening files for any reason in the future; I 
>>> think it is better to just disable the testing on OS X. Do you agree?
>>>>
>>>> Thanks for bearing with me as I walked through that :)
>>>>
>>>>> This test bug does cause intermittent test failures in tier1, so I would be 
>>>>> grateful for reviews and also someone to sponsor the final hg export.
>>>>>
>>>>> Bug:
>>>>> https://bugs.openjdk.java.net/browse/JDK-8202740
>>>>>
>>>>> Webrev:
>>>>> http://cr.openjdk.java.net/~lkorinth/8202740/
>>>>
>>>> So generally the approach seems good. Took quite an effort to translate all 
>>>> the stream/lambda processing though :)
>>>>
>>>> +             return new File("Error: could not read symbolic link (last 
>>>> link can not be read as it points to the /proc/self/fd directory that has 
>>>> been closed): " + e.toString());
>>>>
>>>> That is a very long line and it seems quite bizarre to "return" an error as 
>>>> a File object. Took me a while to realize this is not really an error but 
>>>> the way we detect the last entry which is unreadable. I suppose this is one 
>>>> way to get all the information printed out.
>>>
>>>
>>> Exactly, it is ugly, but the log is good to have if the test case fails. I 
>>> have reworked the code, it is hopefully better now.
>>>
>>>>
>>>> +         System.out.println("Open file descriptors:\n" + 
>>>> stream(dir.listFiles())
>>>> +             .map(f -> "###" + f.getAbsolutePath() + " -> " + linkTarget(f))
>>>> +             .collect(joining("\n")));
>>>>
>>>> I found this really hard to parse and understand. It would help a little if 
>>>> the " -> " string was written as " maps to " so it didn't look like the -> 
>>>> from a lambda expression (or even just =>). It would also help to unwrap one 
>>>> level and assign to a String which is then passed to println.
>>>
>>> Changed, hopefully better now.
>>>
>>>> +         return stream(dir.listFiles())
>>>> +             .map(TestInheritFD::linkTarget)
>>>> +             .filter(f -> f.getName().endsWith(LOG_SUFFIX))
>>>> +             .findAny()
>>>> +             .isPresent();
>>>>
>>>> Neat!
>>>>
>>>> Three minor nits:
>>>>
>>>> 133   + " ON as files in /proc and /sys is opened by the JVM");
>>>>
>>>> Typo: is opened -> are opened
>>>
>>> Fixed.
>>>
>>>>
>>>> 134    System.out.println(findOpenLogFile()?LEAKS_FD:RETAINS_FD);
>>>> 179    System.out.println(f.renameTo(f)?RETAINS_FD:LEAKS_FD);
>>>>
>>>> Spaces around ? and : operators
>>>
>>> Fixed.
>>>
>>> I also added method fakeLeakyJVM (so that one can test that the test case 
>>> really fails when the JVM leaks).
>>>
>>> Updated webrevs:
>>> http://cr.openjdk.java.net/~lkorinth/8202740/01_02/ (incremental)
>>> http://cr.openjdk.java.net/~lkorinth/8202740/02/
>>>
>>> Started running mach5 tests...
>>>
>>> Thanks, Leo
>>>>
>>>> Thanks,
>>>> David
>>>> -----
>>>>
>>>>
>>>>> Testing:
>>>>> GNU/Linux, Solaris, Windows and Mac (Mac disabled).
>>>>>
>>>>> Thanks,
>>>>> Leo


More information about the hotspot-runtime-dev mailing list