RFR: runtime/8176717/TestInheritFD.java fails with java.lang.RuntimeException: could not match: VM RESULT => RETAINS FD

David Holmes david.holmes at oracle.com
Fri Jun 15 01:30:47 UTC 2018


Sorry hadn't seen one was already filed and out for review.

David

On 15/06/2018 10:58 AM, David Holmes wrote:
> <sigh> A few runs through the CI and we hit a linux failure:
> 
> java.lang.RuntimeException: could not find lsof-like command
> 
> I will file a bug.
> 
> David
> 
> On 14/06/2018 5:47 PM, Leo Korinth wrote:
>> On 14/06/18 08:00, David Holmes wrote:
>>> Hi Leo,
>>>
>>> On 14/06/2018 12:26 AM, Leo Korinth wrote:
>>>> Hi David!
>>>>
>>>> I have reworked the test case not to use /proc (my solution was not 
>>>> good on Solaris either). I will now use "lsof -p" if I can find it 
>>>> and use "pfiles -F" otherwise (no "lsof" is installed on our Solaris 
>>>> test machines). It does now run on GNU/Linux, Windows, Solaris and 
>>>> MacOS.
>>>
>>> FYI my Solaris box has:
>>>
>>> /usr/local/bin/lsof
>> Thanks for the review! I will add "/usr/local/bin/lsof" to the test case.
>>
>> /Leo
>>
>>>> I have removed most of all the changes from my previous patch, so I 
>>>> will not post an incremental webrev.
>>>>
>>>> In this webrev, I believe the static imports are no longer obscure.
>>>>
>>>> Webrev:
>>>> http://cr.openjdk.java.net/~lkorinth/8202740/03/
>>>
>>> Looks good!
>>>
>>> The only minor concern is if lsof/pfiles is not found, but that will 
>>> get shaken out in a few test cycles.
>>>
>>> Thanks,
>>> David
>>>
>>>> Thanks,
>>>> Leo
>>>>
>>>>
>>>> On 10/06/18 23:21, David Holmes wrote:
>>>>> Hi Leo,
>>>>>
>>>>> Overall updates seem okay - use of Optional does simplify things a 
>>>>> little.
>>>>>
>>>>> Nit/General Comment: I'm not a fan of static imports in general as 
>>>>> they lead to more obscure code IMHO because you have to go and 
>>>>> search the import list to understand where a method is coming from. 
>>>>> YMMV.
>>>>>
>>>>> Regarding OSX ... I think I'd rather see testing that might 
>>>>> possibly fail (but we haven't seen that have we?) rather than skip 
>>>>> it altogether. Other opinions welcomed.
>>>>>
>>>>> Thanks,
>>>>> David
>>>>>
>>>>> On 7/06/2018 9:07 PM, Leo Korinth wrote:
>>>>>> Hi!
>>>>>>
>>>>>> Thanks for reviewing!
>>>>>>
>>>>>> On 06/06/18 07:40, David Holmes wrote:
>>>>>>> Hi Leo,
>>>>>>>
>>>>>>> First, please include the bug id in the RFR email subject line - 
>>>>>>> thanks.
>>>>>>>
>>>>>>> On 31/05/2018 11:56 PM, Leo Korinth wrote:
>>>>>>>> Hi!
>>>>>>>>
>>>>>>>> I am uppdating TestInheritFD.java to fix intermittent test
>>>>>>>> failures. The reason is that
>>>>>>>> /sys/fs/cgroup/memory/memory.limit_in_bytes sometimes is open.
>>>>>>>
>>>>>>> Okay.
>>>>>>>
>>>>>>>> The new test does not count the number of open file descriptors
>>>>>>>> but instead looks to see if the log file is still open; this is
>>>>>>>> less fragile. This relies on a /proc file system, but that seems
>>>>>>>> to be available on all non Windows ports except Mac. Mac testing 
>>>>>>>> is disabled. The Windows solution does not need to change.
>>>>>>>
>>>>>>> Let me back up a step to understand the original problem and the 
>>>>>>> test.
>>>>>>>
>>>>>>> Problem: if a VM was started with a log file, the fd for the log 
>>>>>>> file would be inherited by an exec'd process (ie another VM).
>>>>>>>
>>>>>>> Detection: On Windows the original log file could not be deleted 
>>>>>>> due to the child process's open fd.
>>>>>>
>>>>>> Exactly, this is actually the original problem.
>>>>>>
>>>>>>>
>>>>>>> Fix: use close-on-exec for the log file
>>>>>>>
>>>>>>> Test:
>>>>>>>   - On Windows try to rename the log file - should succeed.
>>>>>>>   - On non-windows get the count of open fds in the parent 
>>>>>>> process and child process and check child-count < parent-count as 
>>>>>>> child should not have the log file.
>>>>>>>
>>>>>>> The flaw: On Linux, due to the container detection logic the 
>>>>>>> child can still have an open 
>>>>>>> /sys/fs/cgroup/memory/memory.limit_in_bytes and so the count is 
>>>>>>> not smaller than the parent.
>>>>>>>
>>>>>>> Question: why does the parent not also have the open 
>>>>>>> /sys/fs/cgroup/memory/memory.limit_in_bytes?
>>>>>>
>>>>>> I have not looked into the logic, but I guess the file is only 
>>>>>> opened briefly and then closed. That would explain why the test 
>>>>>> case fails intermittently.
>>>>>>
>>>>>> Taking a fast look at "subsystem_file_contents" seems to verify this.
>>>>>>>
>>>>>>> The basic problem for the test is: how it can determine that one 
>>>>>>> process has an open fd for the same file as another process? The 
>>>>>>> Windows situation answers that accurately. Counting fd's doesn't. 
>>>>>>> But the other platforms don't have the rename issue that windows 
>>>>>>> does. And there's no file API to ask this kind of question as far 
>>>>>>> as I can see.
>>>>>>>
>>>>>>> So the proposed fix is to use the /proc filesystem to look in the 
>>>>>>> open fd's for the file. This seemingly works on all non-windows 
>>>>>>> platforms except OS X. Do we ever see the intermittent failures 
>>>>>>> on OS X? If not can it keep using the fd counting logic?
>>>>>>
>>>>>> The old test could probably be used safely on OS X. But it is 
>>>>>> quite fragile and could fail if the JVM is opening files for any 
>>>>>> reason in the future; I think it is better to just disable the 
>>>>>> testing on OS X. Do you agree?
>>>>>>>
>>>>>>> Thanks for bearing with me as I walked through that :)
>>>>>>>
>>>>>>>> This test bug does cause intermittent test failures in tier1, so 
>>>>>>>> I would be grateful for reviews and also someone to sponsor the 
>>>>>>>> final hg export.
>>>>>>>>
>>>>>>>> Bug:
>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8202740
>>>>>>>>
>>>>>>>> Webrev:
>>>>>>>> http://cr.openjdk.java.net/~lkorinth/8202740/
>>>>>>>
>>>>>>> So generally the approach seems good. Took quite an effort to 
>>>>>>> translate all the stream/lambda processing though :)
>>>>>>>
>>>>>>> +             return new File("Error: could not read symbolic 
>>>>>>> link (last link can not be read as it points to the /proc/self/fd 
>>>>>>> directory that has been closed): " + e.toString());
>>>>>>>
>>>>>>> That is a very long line and it seems quite bizarre to "return" 
>>>>>>> an error as a File object. Took me a while to realize this is not 
>>>>>>> really an error but the way we detect the last entry which is 
>>>>>>> unreadable. I suppose this is one way to get all the information 
>>>>>>> printed out.
>>>>>>
>>>>>>
>>>>>> Exactly, it is ugly, but the log is good to have if the test case 
>>>>>> fails. I have reworked the code, it is hopefully better now.
>>>>>>
>>>>>>>
>>>>>>> +         System.out.println("Open file descriptors:\n" + 
>>>>>>> stream(dir.listFiles())
>>>>>>> +             .map(f -> "###" + f.getAbsolutePath() + " -> " + 
>>>>>>> linkTarget(f))
>>>>>>> +             .collect(joining("\n")));
>>>>>>>
>>>>>>> I found this really hard to parse and understand. It would help a 
>>>>>>> little if the " -> " string was written as " maps to " so it 
>>>>>>> didn't look like the -> from a lambda expression (or even just 
>>>>>>> =>). It would also help to unwrap one level and assign to a 
>>>>>>> String which is then passed to println.
>>>>>>
>>>>>> Changed, hopefully better now.
>>>>>>
>>>>>>> +         return stream(dir.listFiles())
>>>>>>> +             .map(TestInheritFD::linkTarget)
>>>>>>> +             .filter(f -> f.getName().endsWith(LOG_SUFFIX))
>>>>>>> +             .findAny()
>>>>>>> +             .isPresent();
>>>>>>>
>>>>>>> Neat!
>>>>>>>
>>>>>>> Three minor nits:
>>>>>>>
>>>>>>> 133   + " ON as files in /proc and /sys is opened by the JVM");
>>>>>>>
>>>>>>> Typo: is opened -> are opened
>>>>>>
>>>>>> Fixed.
>>>>>>
>>>>>>>
>>>>>>> 134    System.out.println(findOpenLogFile()?LEAKS_FD:RETAINS_FD);
>>>>>>> 179    System.out.println(f.renameTo(f)?RETAINS_FD:LEAKS_FD);
>>>>>>>
>>>>>>> Spaces around ? and : operators
>>>>>>
>>>>>> Fixed.
>>>>>>
>>>>>> I also added method fakeLeakyJVM (so that one can test that the 
>>>>>> test case really fails when the JVM leaks).
>>>>>>
>>>>>> Updated webrevs:
>>>>>> http://cr.openjdk.java.net/~lkorinth/8202740/01_02/ (incremental)
>>>>>> http://cr.openjdk.java.net/~lkorinth/8202740/02/
>>>>>>
>>>>>> Started running mach5 tests...
>>>>>>
>>>>>> Thanks, Leo
>>>>>>>
>>>>>>> Thanks,
>>>>>>> David
>>>>>>> -----
>>>>>>>
>>>>>>>
>>>>>>>> Testing:
>>>>>>>> GNU/Linux, Solaris, Windows and Mac (Mac disabled).
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Leo


More information about the hotspot-runtime-dev mailing list