RFR: runtime/8176717/TestInheritFD.java fails with java.lang.RuntimeException: could not match: VM RESULT => RETAINS FD
David Holmes
david.holmes at oracle.com
Fri Jun 15 00:58:08 UTC 2018
<sigh> A few runs through the CI and we hit a linux failure:
java.lang.RuntimeException: could not find lsof-like command
I will file a bug.
David
On 14/06/2018 5:47 PM, Leo Korinth wrote:
> On 14/06/18 08:00, David Holmes wrote:
>> Hi Leo,
>>
>> On 14/06/2018 12:26 AM, Leo Korinth wrote:
>>> Hi David!
>>>
>>> I have reworked the test case not to use /proc (my solution was not
>>> good on Solaris either). I will now use "lsof -p" if I can find it
>>> and use "pfiles -F" otherwise (no "lsof" is installed on our Solaris
>>> test machines). It does now run on GNU/Linux, Windows, Solaris and
>>> MacOS.
>>
>> FYI my Solaris box has:
>>
>> /usr/local/bin/lsof
> Thanks for the review! I will add "/usr/local/bin/lsof" to the test case.
>
> /Leo
>
>>> I have removed most of all the changes from my previous patch, so I
>>> will not post an incremental webrev.
>>>
>>> In this webrev, I believe the static imports are no longer obscure.
>>>
>>> Webrev:
>>> http://cr.openjdk.java.net/~lkorinth/8202740/03/
>>
>> Looks good!
>>
>> The only minor concern is if lsof/pfiles is not found, but that will
>> get shaken out in a few test cycles.
>>
>> Thanks,
>> David
>>
>>> Thanks,
>>> Leo
>>>
>>>
>>> On 10/06/18 23:21, David Holmes wrote:
>>>> Hi Leo,
>>>>
>>>> Overall updates seem okay - use of Optional does simplify things a
>>>> little.
>>>>
>>>> Nit/General Comment: I'm not a fan of static imports in general as
>>>> they lead to more obscure code IMHO because you have to go and
>>>> search the import list to understand where a method is coming from.
>>>> YMMV.
>>>>
>>>> Regarding OSX ... I think I'd rather see testing that might possibly
>>>> fail (but we haven't seen that have we?) rather than skip it
>>>> altogether. Other opinions welcomed.
>>>>
>>>> Thanks,
>>>> David
>>>>
>>>> On 7/06/2018 9:07 PM, Leo Korinth wrote:
>>>>> Hi!
>>>>>
>>>>> Thanks for reviewing!
>>>>>
>>>>> On 06/06/18 07:40, David Holmes wrote:
>>>>>> Hi Leo,
>>>>>>
>>>>>> First, please include the bug id in the RFR email subject line -
>>>>>> thanks.
>>>>>>
>>>>>> On 31/05/2018 11:56 PM, Leo Korinth wrote:
>>>>>>> Hi!
>>>>>>>
>>>>>>> I am uppdating TestInheritFD.java to fix intermittent test
>>>>>>> failures. The reason is that
>>>>>>> /sys/fs/cgroup/memory/memory.limit_in_bytes sometimes is open.
>>>>>>
>>>>>> Okay.
>>>>>>
>>>>>>> The new test does not count the number of open file descriptors
>>>>>>> but instead looks to see if the log file is still open; this is
>>>>>>> less fragile. This relies on a /proc file system, but that seems
>>>>>>> to be available on all non Windows ports except Mac. Mac testing
>>>>>>> is disabled. The Windows solution does not need to change.
>>>>>>
>>>>>> Let me back up a step to understand the original problem and the
>>>>>> test.
>>>>>>
>>>>>> Problem: if a VM was started with a log file, the fd for the log
>>>>>> file would be inherited by an exec'd process (ie another VM).
>>>>>>
>>>>>> Detection: On Windows the original log file could not be deleted
>>>>>> due to the child process's open fd.
>>>>>
>>>>> Exactly, this is actually the original problem.
>>>>>
>>>>>>
>>>>>> Fix: use close-on-exec for the log file
>>>>>>
>>>>>> Test:
>>>>>> - On Windows try to rename the log file - should succeed.
>>>>>> - On non-windows get the count of open fds in the parent process
>>>>>> and child process and check child-count < parent-count as child
>>>>>> should not have the log file.
>>>>>>
>>>>>> The flaw: On Linux, due to the container detection logic the child
>>>>>> can still have an open /sys/fs/cgroup/memory/memory.limit_in_bytes
>>>>>> and so the count is not smaller than the parent.
>>>>>>
>>>>>> Question: why does the parent not also have the open
>>>>>> /sys/fs/cgroup/memory/memory.limit_in_bytes?
>>>>>
>>>>> I have not looked into the logic, but I guess the file is only
>>>>> opened briefly and then closed. That would explain why the test
>>>>> case fails intermittently.
>>>>>
>>>>> Taking a fast look at "subsystem_file_contents" seems to verify this.
>>>>>>
>>>>>> The basic problem for the test is: how it can determine that one
>>>>>> process has an open fd for the same file as another process? The
>>>>>> Windows situation answers that accurately. Counting fd's doesn't.
>>>>>> But the other platforms don't have the rename issue that windows
>>>>>> does. And there's no file API to ask this kind of question as far
>>>>>> as I can see.
>>>>>>
>>>>>> So the proposed fix is to use the /proc filesystem to look in the
>>>>>> open fd's for the file. This seemingly works on all non-windows
>>>>>> platforms except OS X. Do we ever see the intermittent failures on
>>>>>> OS X? If not can it keep using the fd counting logic?
>>>>>
>>>>> The old test could probably be used safely on OS X. But it is quite
>>>>> fragile and could fail if the JVM is opening files for any reason
>>>>> in the future; I think it is better to just disable the testing on
>>>>> OS X. Do you agree?
>>>>>>
>>>>>> Thanks for bearing with me as I walked through that :)
>>>>>>
>>>>>>> This test bug does cause intermittent test failures in tier1, so
>>>>>>> I would be grateful for reviews and also someone to sponsor the
>>>>>>> final hg export.
>>>>>>>
>>>>>>> Bug:
>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8202740
>>>>>>>
>>>>>>> Webrev:
>>>>>>> http://cr.openjdk.java.net/~lkorinth/8202740/
>>>>>>
>>>>>> So generally the approach seems good. Took quite an effort to
>>>>>> translate all the stream/lambda processing though :)
>>>>>>
>>>>>> + return new File("Error: could not read symbolic link
>>>>>> (last link can not be read as it points to the /proc/self/fd
>>>>>> directory that has been closed): " + e.toString());
>>>>>>
>>>>>> That is a very long line and it seems quite bizarre to "return" an
>>>>>> error as a File object. Took me a while to realize this is not
>>>>>> really an error but the way we detect the last entry which is
>>>>>> unreadable. I suppose this is one way to get all the information
>>>>>> printed out.
>>>>>
>>>>>
>>>>> Exactly, it is ugly, but the log is good to have if the test case
>>>>> fails. I have reworked the code, it is hopefully better now.
>>>>>
>>>>>>
>>>>>> + System.out.println("Open file descriptors:\n" +
>>>>>> stream(dir.listFiles())
>>>>>> + .map(f -> "###" + f.getAbsolutePath() + " -> " +
>>>>>> linkTarget(f))
>>>>>> + .collect(joining("\n")));
>>>>>>
>>>>>> I found this really hard to parse and understand. It would help a
>>>>>> little if the " -> " string was written as " maps to " so it
>>>>>> didn't look like the -> from a lambda expression (or even just
>>>>>> =>). It would also help to unwrap one level and assign to a String
>>>>>> which is then passed to println.
>>>>>
>>>>> Changed, hopefully better now.
>>>>>
>>>>>> + return stream(dir.listFiles())
>>>>>> + .map(TestInheritFD::linkTarget)
>>>>>> + .filter(f -> f.getName().endsWith(LOG_SUFFIX))
>>>>>> + .findAny()
>>>>>> + .isPresent();
>>>>>>
>>>>>> Neat!
>>>>>>
>>>>>> Three minor nits:
>>>>>>
>>>>>> 133 + " ON as files in /proc and /sys is opened by the JVM");
>>>>>>
>>>>>> Typo: is opened -> are opened
>>>>>
>>>>> Fixed.
>>>>>
>>>>>>
>>>>>> 134 System.out.println(findOpenLogFile()?LEAKS_FD:RETAINS_FD);
>>>>>> 179 System.out.println(f.renameTo(f)?RETAINS_FD:LEAKS_FD);
>>>>>>
>>>>>> Spaces around ? and : operators
>>>>>
>>>>> Fixed.
>>>>>
>>>>> I also added method fakeLeakyJVM (so that one can test that the
>>>>> test case really fails when the JVM leaks).
>>>>>
>>>>> Updated webrevs:
>>>>> http://cr.openjdk.java.net/~lkorinth/8202740/01_02/ (incremental)
>>>>> http://cr.openjdk.java.net/~lkorinth/8202740/02/
>>>>>
>>>>> Started running mach5 tests...
>>>>>
>>>>> Thanks, Leo
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>> -----
>>>>>>
>>>>>>
>>>>>>> Testing:
>>>>>>> GNU/Linux, Solaris, Windows and Mac (Mac disabled).
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Leo
More information about the hotspot-runtime-dev
mailing list