RFR: 8176717: GC log file handle leaked to child processes
Per Liden
per.liden at oracle.com
Mon Mar 26 08:17:43 UTC 2018
Hi,
On 03/13/2018 07:46 AM, Thomas Stüfe wrote:
> Hi Leo,
>
> On Mon, Mar 12, 2018 at 8:40 PM, Leo Korinth <leo.korinth at oracle.com> wrote:
>
>>
>>
>> On 12/03/18 17:48, Thomas Stüfe wrote:
>>
>>> Hi Leo,
>>>
>>> On Mon, Mar 12, 2018 at 4:54 PM, Leo Korinth <leo.korinth at oracle.com
>>> <mailto:leo.korinth at oracle.com>> wrote:
>>>
>>>
>>>
>>> On 12/03/18 15:29, Thomas Stüfe wrote:
>>>
>>> Hi Leo,
>>>
>>> This seems weird.
>>>
>>> This would affect numerous open() calls, not just this GC log, I
>>> cannot imagine the correct fix is to change all of them.
>>>
>>>
>>> Sorry, I do not understand what you mean with "numerous open()".
>>> This fix will only affect logging -- or am I missing something?
>>> os::open does roughly what I try to do in os::fopen_retain.
>>>
>>>
>>> Sorry, I spoke unclear. What I meant was I would expect the problem you
>>> found in gc logging to be present for every raw open()/fopen()/CreateFile()
>>> call in the VM and in the JDK, which are quite a few. I wondered why we do
>>> not see more problems like this.
>>>
>>
>> Oh, now I see, I just *assumed* os::open was used everywhere when in fact
>> it is only used in two places where the file in addition seems to be closed
>> fast afterwards. I should assume less...
>>
>> I guess leaking file descriptors is not that big of a problem. It seems to
>> *have* been a problem on Solaris (which seems to be the reason for
>> os::open) but as the unix file descriptors are closed by core-libs before
>> exec is mostly a windows problem.
>>
>> On windows it is also more of a problem because open files are harder to
>> rename or remove, and therefore the bug report.
>>
>>
>>
>>>
>>>
>>> In fact, on Posix platforms we close all file descriptors except
>>> the Pipe ones before between fork() and exec() - see
>>> unix/native/libjava/ childproc.c.
>>>
>>>
>>> Yes, that is why my test case did not fail before the fix on
>>> unix-like systems. I do not know why it is not handled in Windows
>>> (possibly a bug, possibly to keep old behaviour???), I had planned
>>> to ask that as a follow up question later, maybe open a bug report
>>> if it was not for keeping old behaviour. Even though childproc.c
>>> does close the file handler, I think it is much nicer to open them
>>> with FD_CLOEXEC (in addition to let childproc.c close it). os::open
>>> does so, and I would like to handle ::fopen the same way as ::open
>>> with a proxy call that ensures that the VM process will retain the
>>> file descriptor it opens (in HotSpot at least).
>>>
>>> Such code is missing on Windows - see
>>> windows/native/libjava/ProcessImpl_md.c . There, we do not have
>>> fork/exec, but CreateProcess(), and whether we inherit handles
>>> or not is controlled via an argument to CreateProcess(). But
>>> that flag is TRUE, so child processes inherit handles.
>>>
>>> 331 if (!CreateProcessW(
>>> 332 NULL, /* executable name */
>>> 333 (LPWSTR)pcmd, /* command line */
>>> 334 NULL, /* process security
>>> attribute */
>>> 335 NULL, /* thread security
>>> attribute */
>>> 336 TRUE, /* inherits system
>>> handles */ <<<<<<
>>> 337 processFlag, /* selected based
>>> on exe type */
>>> 338 (LPVOID)penvBlock,/* environment block
>>> */
>>> 339 (LPCWSTR)pdir, /* change to the
>>> new current directory */
>>> 340 &si, /* (in) startup
>>> information */
>>> 341 &pi)) /* (out) process
>>> information */
>>> 342 {
>>> 343 win32Error(env, L"CreateProcess");
>>> 344 }
>>>
>>> Maybe this is the real error we should fix? Make Windows
>>> Runtime.exec behave like the Posix variant by closing all file
>>> descriptors upon CreateProcess >
>>> (This seems more of a core-libs question.)
>>>
>>>
>>> I think it is both a core-libs question and a hotspot question. I
>>> firmly believe we should retain file descriptors with help of
>>> FD_CLOEXEC and its variants in HotSpot. I am unsure (and have no
>>> opinion) what to do in core-libs, maybe there is a deeper thought
>>> behind line 336?
>>>
>>> Some reasons for this:
>>>
>>> - if a process is forked using JNI, it would still be good if the
>>> hotspot descriptors would not leak.
>>>
>>> - if (I have no idea if this is true) the behaviour in core-libs can
>>> not be changed because the behaviour is already wildly (ab)used,
>>> this is still a correct fix. Remember this will only close file
>>> descriptors opened by HotSpot code, and at the moment only logging
>>> code.
>>>
>>> - this will fix the issue in the bug report, and give time for
>>> core-libs to consider what is correct (and what can be changed
>>> without breaking applications).
>>>
>>> Thanks,
>>> Leo
>>>
>>>
>>> yes, you convinced me.
>>>
>>> 1 We should fix raw open() calls, because if native code forks via a
>>> different code paths than java Runtime.exec(), we run into the same
>>> problem. Your patch fixes one instance of the problem.
>>>
>> Yes, I agree. I now understand that ::open() is much more used in the code
>> base.
>>
>>>
>>> 2 And we should fix Windows Runtime.exec() to the same behaviour as on
>>> Posix. I can see this being backward-compatible-problematic, but it
>>> certainly would be the right thing to do. Would love to know what core-libs
>>> says.
>>>
>>
>> Possibly (I am intentionally dodging this question)
>>
>>>
>>> Okay, about your change: I dislike that we add a new function, especially
>>> a first class open function, to the os namespace. How about this instead:
>>> since we know that os::open() does the right thing on all platforms, why
>>> can we not just use os::open() instead? Afterwards call fdopen() to wrap a
>>> FILE structure around it, respectively call "FILE* os::open(int fd, const
>>> char* mode)" , which seems to be just a wrapped fdopen(). That way you can
>>> get what you want with less change and without introducing a new API.
>>>
>>
>> Yes, that might be a better solution. I did consider it, but was afraid
>> that, for example the (significant) "w"/"w+" differences in semantics would
>> matter. or that:
>>
>> os::open(os::open(path, WINDOWS_ONLY(_)O_CREAT|WINDOWS_ONLY(_)O_TRUNC,
>> flags, mode) mode2)
>>
>> ...or something similar for fopen(path, "w"), would not be exactly the
>> same. For example it would set the file to binary mode on windows. Maybe it
>> is exactly the same otherwise? For me, the equality in semantics are not
>> obvious.
>>
>> Also, now when I realized there is only two users of os::open I am less
>> sure it always does the right thing...
>>
>> I prefer the os::fopen_retain way.
>>
>>
> I agree with you on the proposed fix: to open the file - at least on
> windows - with the inherit flag turned off. I still disagree with you about
> the way this is done. I am not a bit fan on "one trick APIs" being dropped
> into the os namespace for one singular purpose - I think we had recently a
> similar discussion about an snprintf variant specific for logging only.
>
> Just counting are a couple of variants I would prefer:
>
> 1) keep the API locally to logging, do not make it global. It is logging
> specific, after all.
>
> 2) Or even easier, just do this (logFileOutput.cpp):
>
> const char* const LogFileOutput::FileOpenMode = WINDOWS_ONLY("aN")
> NOT_WINDOWS("a");
>
> that would fix windows. The other platforms do not have the problem if
> spawning via Runtime.exec(), and the problem of
> native-forking-and-handle-leaking is, while possible, rather theoretical.
>
> 2) If you really want a new global API, rename the thing to just
> "os::fopen()". Because after all you want to wrap a generic fopen() and
> forbid handle inheritance, yes? This is the same thing the os::open()
> sister function does too, so if you think you need that, give it a first
> class name :) And we could use tests then too (I think we have gtests for
> os::open()). In that case I also dislike the many ifdefs, so if you keep
> the function in its current form, I'd prefer it fanned out for different
> platforms, like os::open() does it.
Just offering my 2c on this. I would go with what Thomas suggests in
bullet 2 above.
/Per
>
> Just my 5c, and tastes differ, so I'll wait what others say. I'll cc Markus
> as the UL owner.
>
> Oh, I also think the bug desciption is a bit misleading, since this is
> about the UL file handle in general, not only gc log. And may it make sense
> to post this in hotspot-runtime, not hotspot-dev?
>
> Thanks and Best Regards, Thomas
>
>
>
>> Thanks,
>> Leo
>>
>>
>>> Kind Regards, Thomas
>>>
>>>
>>>
>>> Kind Regards, Thomas
>>>
>>>
>>> On Mon, Mar 12, 2018 at 2:20 PM, Leo Korinth
>>> <leo.korinth at oracle.com <mailto:leo.korinth at oracle.com>
>>> <mailto:leo.korinth at oracle.com <mailto:leo.korinth at oracle.com>>>
>>>
>>> wrote:
>>>
>>> Hi,
>>>
>>> This fix is for all operating systems though the problem
>>> only seams
>>> to appear on windows.
>>>
>>> I am creating a proxy function for fopen (os::fopen_retain)
>>> that
>>> appends the non-standard "e" mode for linux and bsds. For
>>> windows
>>> the "N" mode is used. For other operating systems, I assume
>>> that I
>>> can use fcntl F_SETFD FD_CLOEXEC. I think this will work
>>> for AIX,
>>> Solaris and other operating systems that do not support the
>>> "e"
>>> flag. Feedback otherwise please!
>>>
>>> The reason that I use the mode "e" and not only fcntl for
>>> linux and
>>> bsds is threefold. First, I still need to use mode flags on
>>> windows
>>> as it does not support fcntl. Second, I probably save a
>>> system call.
>>> Third, the change will be applied directly, and there will
>>> be no
>>> point in time (between system calls) when the process can
>>> leak the
>>> file descriptor, so it is safer.
>>>
>>> The test case forks three VMs in a row. By doing so we know
>>> that the
>>> second VM is opened with a specific log file. The third VM
>>> should
>>> have less open file descriptors (as it is does not use
>>> logging)
>>> which is checked using a UnixOperatingSystemMXBean. This is
>>> not
>>> possible on windows, so on windows I try to rename the
>>> file, which
>>> will not work if the file is opened (the actual reason the
>>> bug was
>>> opened).
>>>
>>> The added test case shows that the bug fix closes the log
>>> file on
>>> windows. The VM on other operating systems closed the log
>>> file even
>>> before the fix.
>>>
>>> Maybe the test case should be moved to a different path?
>>>
>>> Bug:
>>> https://bugs.openjdk.java.net/browse/JDK-8176717
>>> <https://bugs.openjdk.java.net/browse/JDK-8176717>
>>> <https://bugs.openjdk.java.net/browse/JDK-8176717
>>> <https://bugs.openjdk.java.net/browse/JDK-8176717>>
>>> https://bugs.openjdk.java.net/browse/JDK-8176809
>>> <https://bugs.openjdk.java.net/browse/JDK-8176809>
>>> <https://bugs.openjdk.java.net/browse/JDK-8176809
>>> <https://bugs.openjdk.java.net/browse/JDK-8176809>>
>>>
>>> Webrev:
>>> http://cr.openjdk.java.net/~lkorinth/8176717/00/
>>> <http://cr.openjdk.java.net/~lkorinth/8176717/00/>
>>> <http://cr.openjdk.java.net/~lkorinth/8176717/00/
>>> <http://cr.openjdk.java.net/~lkorinth/8176717/00/>>
>>>
>>> Testing:
>>> hs-tier1, hs-tier2 and TestInheritFD.java
>>> (on 64-bit linux, solaris, windows and mac)
>>>
>>> Thanks,
>>> Leo
>>>
>>>
>>>
>>>
More information about the hotspot-dev
mailing list