RFR: runtime/8176717/TestInheritFD.java fails with java.lang.RuntimeException: could not match: VM RESULT => RETAINS FD
Leo Korinth
leo.korinth at oracle.com
Wed Jun 13 17:37:03 UTC 2018
On 13/06/18 16:40, Robbin Ehn wrote:
> On 06/13/2018 04:26 PM, Leo Korinth wrote:
>> Hi David!
>>
>> I have reworked the test case not to use /proc (my solution was not
>> good on Solaris either). I will now use "lsof -p" if I can find it and
>> use "pfiles -F" otherwise (no "lsof" is installed on our Solaris test
>> machines). It does now run on GNU/Linux, Windows, Solaris and MacOS.
>>
>> I have removed most of all the changes from my previous patch, so I
>> will not post an incremental webrev.
>>
>> In this webrev, I believe the static imports are no longer obscure.
>>
>> Webrev:
>> http://cr.openjdk.java.net/~lkorinth/8202740/03/
>
> I'm fine with this also. Thanks for fixing.
>
> /Robbin
Thanks for the review!
/Leo
>
>>
>> Thanks,
>> Leo
>>
>>
>> On 10/06/18 23:21, David Holmes wrote:
>>> Hi Leo,
>>>
>>> Overall updates seem okay - use of Optional does simplify things a
>>> little.
>>>
>>> Nit/General Comment: I'm not a fan of static imports in general as
>>> they lead to more obscure code IMHO because you have to go and search
>>> the import list to understand where a method is coming from. YMMV.
>>>
>>> Regarding OSX ... I think I'd rather see testing that might possibly
>>> fail (but we haven't seen that have we?) rather than skip it
>>> altogether. Other opinions welcomed.
>>>
>>> Thanks,
>>> David
>>>
>>> On 7/06/2018 9:07 PM, Leo Korinth wrote:
>>>> Hi!
>>>>
>>>> Thanks for reviewing!
>>>>
>>>> On 06/06/18 07:40, David Holmes wrote:
>>>>> Hi Leo,
>>>>>
>>>>> First, please include the bug id in the RFR email subject line -
>>>>> thanks.
>>>>>
>>>>> On 31/05/2018 11:56 PM, Leo Korinth wrote:
>>>>>> Hi!
>>>>>>
>>>>>> I am uppdating TestInheritFD.java to fix intermittent test
>>>>>> failures. The reason is that
>>>>>> /sys/fs/cgroup/memory/memory.limit_in_bytes sometimes is open.
>>>>>
>>>>> Okay.
>>>>>
>>>>>> The new test does not count the number of open file descriptors
>>>>>> but instead looks to see if the log file is still open; this is
>>>>>> less fragile. This relies on a /proc file system, but that seems
>>>>>> to be available on all non Windows ports except Mac. Mac testing
>>>>>> is disabled. The Windows solution does not need to change.
>>>>>
>>>>> Let me back up a step to understand the original problem and the test.
>>>>>
>>>>> Problem: if a VM was started with a log file, the fd for the log
>>>>> file would be inherited by an exec'd process (ie another VM).
>>>>>
>>>>> Detection: On Windows the original log file could not be deleted
>>>>> due to the child process's open fd.
>>>>
>>>> Exactly, this is actually the original problem.
>>>>
>>>>>
>>>>> Fix: use close-on-exec for the log file
>>>>>
>>>>> Test:
>>>>> - On Windows try to rename the log file - should succeed.
>>>>> - On non-windows get the count of open fds in the parent process
>>>>> and child process and check child-count < parent-count as child
>>>>> should not have the log file.
>>>>>
>>>>> The flaw: On Linux, due to the container detection logic the child
>>>>> can still have an open /sys/fs/cgroup/memory/memory.limit_in_bytes
>>>>> and so the count is not smaller than the parent.
>>>>>
>>>>> Question: why does the parent not also have the open
>>>>> /sys/fs/cgroup/memory/memory.limit_in_bytes?
>>>>
>>>> I have not looked into the logic, but I guess the file is only
>>>> opened briefly and then closed. That would explain why the test case
>>>> fails intermittently.
>>>>
>>>> Taking a fast look at "subsystem_file_contents" seems to verify this.
>>>>>
>>>>> The basic problem for the test is: how it can determine that one
>>>>> process has an open fd for the same file as another process? The
>>>>> Windows situation answers that accurately. Counting fd's doesn't.
>>>>> But the other platforms don't have the rename issue that windows
>>>>> does. And there's no file API to ask this kind of question as far
>>>>> as I can see.
>>>>>
>>>>> So the proposed fix is to use the /proc filesystem to look in the
>>>>> open fd's for the file. This seemingly works on all non-windows
>>>>> platforms except OS X. Do we ever see the intermittent failures on
>>>>> OS X? If not can it keep using the fd counting logic?
>>>>
>>>> The old test could probably be used safely on OS X. But it is quite
>>>> fragile and could fail if the JVM is opening files for any reason in
>>>> the future; I think it is better to just disable the testing on OS
>>>> X. Do you agree?
>>>>>
>>>>> Thanks for bearing with me as I walked through that :)
>>>>>
>>>>>> This test bug does cause intermittent test failures in tier1, so I
>>>>>> would be grateful for reviews and also someone to sponsor the
>>>>>> final hg export.
>>>>>>
>>>>>> Bug:
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8202740
>>>>>>
>>>>>> Webrev:
>>>>>> http://cr.openjdk.java.net/~lkorinth/8202740/
>>>>>
>>>>> So generally the approach seems good. Took quite an effort to
>>>>> translate all the stream/lambda processing though :)
>>>>>
>>>>> + return new File("Error: could not read symbolic link
>>>>> (last link can not be read as it points to the /proc/self/fd
>>>>> directory that has been closed): " + e.toString());
>>>>>
>>>>> That is a very long line and it seems quite bizarre to "return" an
>>>>> error as a File object. Took me a while to realize this is not
>>>>> really an error but the way we detect the last entry which is
>>>>> unreadable. I suppose this is one way to get all the information
>>>>> printed out.
>>>>
>>>>
>>>> Exactly, it is ugly, but the log is good to have if the test case
>>>> fails. I have reworked the code, it is hopefully better now.
>>>>
>>>>>
>>>>> + System.out.println("Open file descriptors:\n" +
>>>>> stream(dir.listFiles())
>>>>> + .map(f -> "###" + f.getAbsolutePath() + " -> " +
>>>>> linkTarget(f))
>>>>> + .collect(joining("\n")));
>>>>>
>>>>> I found this really hard to parse and understand. It would help a
>>>>> little if the " -> " string was written as " maps to " so it didn't
>>>>> look like the -> from a lambda expression (or even just =>). It
>>>>> would also help to unwrap one level and assign to a String which is
>>>>> then passed to println.
>>>>
>>>> Changed, hopefully better now.
>>>>
>>>>> + return stream(dir.listFiles())
>>>>> + .map(TestInheritFD::linkTarget)
>>>>> + .filter(f -> f.getName().endsWith(LOG_SUFFIX))
>>>>> + .findAny()
>>>>> + .isPresent();
>>>>>
>>>>> Neat!
>>>>>
>>>>> Three minor nits:
>>>>>
>>>>> 133 + " ON as files in /proc and /sys is opened by the JVM");
>>>>>
>>>>> Typo: is opened -> are opened
>>>>
>>>> Fixed.
>>>>
>>>>>
>>>>> 134 System.out.println(findOpenLogFile()?LEAKS_FD:RETAINS_FD);
>>>>> 179 System.out.println(f.renameTo(f)?RETAINS_FD:LEAKS_FD);
>>>>>
>>>>> Spaces around ? and : operators
>>>>
>>>> Fixed.
>>>>
>>>> I also added method fakeLeakyJVM (so that one can test that the test
>>>> case really fails when the JVM leaks).
>>>>
>>>> Updated webrevs:
>>>> http://cr.openjdk.java.net/~lkorinth/8202740/01_02/ (incremental)
>>>> http://cr.openjdk.java.net/~lkorinth/8202740/02/
>>>>
>>>> Started running mach5 tests...
>>>>
>>>> Thanks, Leo
>>>>>
>>>>> Thanks,
>>>>> David
>>>>> -----
>>>>>
>>>>>
>>>>>> Testing:
>>>>>> GNU/Linux, Solaris, Windows and Mac (Mac disabled).
>>>>>>
>>>>>> Thanks,
>>>>>> Leo
More information about the hotspot-runtime-dev
mailing list