RFR: 8176717: GC log file handle leaked to child processes
Leo Korinth
leo.korinth at oracle.com
Mon Mar 12 19:40:30 UTC 2018
On 12/03/18 17:48, Thomas Stüfe wrote:
> Hi Leo,
>
> On Mon, Mar 12, 2018 at 4:54 PM, Leo Korinth <leo.korinth at oracle.com
> <mailto:leo.korinth at oracle.com>> wrote:
>
>
>
> On 12/03/18 15:29, Thomas Stüfe wrote:
>
> Hi Leo,
>
> This seems weird.
>
> This would affect numerous open() calls, not just this GC log, I
> cannot imagine the correct fix is to change all of them.
>
>
> Sorry, I do not understand what you mean with "numerous open()".
> This fix will only affect logging -- or am I missing something?
> os::open does roughly what I try to do in os::fopen_retain.
>
>
> Sorry, I spoke unclear. What I meant was I would expect the problem you
> found in gc logging to be present for every raw
> open()/fopen()/CreateFile() call in the VM and in the JDK, which are
> quite a few. I wondered why we do not see more problems like this.
Oh, now I see, I just *assumed* os::open was used everywhere when in
fact it is only used in two places where the file in addition seems to
be closed fast afterwards. I should assume less...
I guess leaking file descriptors is not that big of a problem. It seems
to *have* been a problem on Solaris (which seems to be the reason for
os::open) but as the unix file descriptors are closed by core-libs
before exec is mostly a windows problem.
On windows it is also more of a problem because open files are harder to
rename or remove, and therefore the bug report.
>
>
>
> In fact, on Posix platforms we close all file descriptors except
> the Pipe ones before between fork() and exec() - see
> unix/native/libjava/ childproc.c.
>
>
> Yes, that is why my test case did not fail before the fix on
> unix-like systems. I do not know why it is not handled in Windows
> (possibly a bug, possibly to keep old behaviour???), I had planned
> to ask that as a follow up question later, maybe open a bug report
> if it was not for keeping old behaviour. Even though childproc.c
> does close the file handler, I think it is much nicer to open them
> with FD_CLOEXEC (in addition to let childproc.c close it). os::open
> does so, and I would like to handle ::fopen the same way as ::open
> with a proxy call that ensures that the VM process will retain the
> file descriptor it opens (in HotSpot at least).
>
> Such code is missing on Windows - see
> windows/native/libjava/ProcessImpl_md.c . There, we do not have
> fork/exec, but CreateProcess(), and whether we inherit handles
> or not is controlled via an argument to CreateProcess(). But
> that flag is TRUE, so child processes inherit handles.
>
> 331 if (!CreateProcessW(
> 332 NULL, /* executable name */
> 333 (LPWSTR)pcmd, /* command line */
> 334 NULL, /* process security
> attribute */
> 335 NULL, /* thread security
> attribute */
> 336 TRUE, /* inherits system
> handles */ <<<<<<
> 337 processFlag, /* selected based
> on exe type */
> 338 (LPVOID)penvBlock,/* environment block */
> 339 (LPCWSTR)pdir, /* change to the
> new current directory */
> 340 &si, /* (in) startup
> information */
> 341 &pi)) /* (out) process
> information */
> 342 {
> 343 win32Error(env, L"CreateProcess");
> 344 }
>
> Maybe this is the real error we should fix? Make Windows
> Runtime.exec behave like the Posix variant by closing all file
> descriptors upon CreateProcess >
> (This seems more of a core-libs question.)
>
>
> I think it is both a core-libs question and a hotspot question. I
> firmly believe we should retain file descriptors with help of
> FD_CLOEXEC and its variants in HotSpot. I am unsure (and have no
> opinion) what to do in core-libs, maybe there is a deeper thought
> behind line 336?
>
> Some reasons for this:
>
> - if a process is forked using JNI, it would still be good if the
> hotspot descriptors would not leak.
>
> - if (I have no idea if this is true) the behaviour in core-libs can
> not be changed because the behaviour is already wildly (ab)used,
> this is still a correct fix. Remember this will only close file
> descriptors opened by HotSpot code, and at the moment only logging code.
>
> - this will fix the issue in the bug report, and give time for
> core-libs to consider what is correct (and what can be changed
> without breaking applications).
>
> Thanks,
> Leo
>
>
> yes, you convinced me.
>
> 1 We should fix raw open() calls, because if native code forks via a
> different code paths than java Runtime.exec(), we run into the same
> problem. Your patch fixes one instance of the problem.
Yes, I agree. I now understand that ::open() is much more used in the
code base.
>
> 2 And we should fix Windows Runtime.exec() to the same behaviour as on
> Posix. I can see this being backward-compatible-problematic, but it
> certainly would be the right thing to do. Would love to know what
> core-libs says.
Possibly (I am intentionally dodging this question)
>
> Okay, about your change: I dislike that we add a new function,
> especially a first class open function, to the os namespace. How about
> this instead: since we know that os::open() does the right thing on all
> platforms, why can we not just use os::open() instead? Afterwards call
> fdopen() to wrap a FILE structure around it, respectively call "FILE*
> os::open(int fd, const char* mode)" , which seems to be just a wrapped
> fdopen(). That way you can get what you want with less change and
> without introducing a new API.
Yes, that might be a better solution. I did consider it, but was afraid
that, for example the (significant) "w"/"w+" differences in semantics
would matter. or that:
os::open(os::open(path, WINDOWS_ONLY(_)O_CREAT|WINDOWS_ONLY(_)O_TRUNC,
flags, mode) mode2)
...or something similar for fopen(path, "w"), would not be exactly the
same. For example it would set the file to binary mode on windows. Maybe
it is exactly the same otherwise? For me, the equality in semantics are
not obvious.
Also, now when I realized there is only two users of os::open I am less
sure it always does the right thing...
I prefer the os::fopen_retain way.
Thanks,
Leo
>
> Kind Regards, Thomas
>
>
>
> Kind Regards, Thomas
>
>
> On Mon, Mar 12, 2018 at 2:20 PM, Leo Korinth
> <leo.korinth at oracle.com <mailto:leo.korinth at oracle.com>
> <mailto:leo.korinth at oracle.com <mailto:leo.korinth at oracle.com>>>
> wrote:
>
> Hi,
>
> This fix is for all operating systems though the problem
> only seams
> to appear on windows.
>
> I am creating a proxy function for fopen (os::fopen_retain)
> that
> appends the non-standard "e" mode for linux and bsds. For
> windows
> the "N" mode is used. For other operating systems, I assume
> that I
> can use fcntl F_SETFD FD_CLOEXEC. I think this will work
> for AIX,
> Solaris and other operating systems that do not support the "e"
> flag. Feedback otherwise please!
>
> The reason that I use the mode "e" and not only fcntl for
> linux and
> bsds is threefold. First, I still need to use mode flags on
> windows
> as it does not support fcntl. Second, I probably save a
> system call.
> Third, the change will be applied directly, and there will
> be no
> point in time (between system calls) when the process can
> leak the
> file descriptor, so it is safer.
>
> The test case forks three VMs in a row. By doing so we know
> that the
> second VM is opened with a specific log file. The third VM
> should
> have less open file descriptors (as it is does not use logging)
> which is checked using a UnixOperatingSystemMXBean. This is not
> possible on windows, so on windows I try to rename the
> file, which
> will not work if the file is opened (the actual reason the
> bug was
> opened).
>
> The added test case shows that the bug fix closes the log
> file on
> windows. The VM on other operating systems closed the log
> file even
> before the fix.
>
> Maybe the test case should be moved to a different path?
>
> Bug:
> https://bugs.openjdk.java.net/browse/JDK-8176717
> <https://bugs.openjdk.java.net/browse/JDK-8176717>
> <https://bugs.openjdk.java.net/browse/JDK-8176717
> <https://bugs.openjdk.java.net/browse/JDK-8176717>>
> https://bugs.openjdk.java.net/browse/JDK-8176809
> <https://bugs.openjdk.java.net/browse/JDK-8176809>
> <https://bugs.openjdk.java.net/browse/JDK-8176809
> <https://bugs.openjdk.java.net/browse/JDK-8176809>>
>
> Webrev:
> http://cr.openjdk.java.net/~lkorinth/8176717/00/
> <http://cr.openjdk.java.net/~lkorinth/8176717/00/>
> <http://cr.openjdk.java.net/~lkorinth/8176717/00/
> <http://cr.openjdk.java.net/~lkorinth/8176717/00/>>
>
> Testing:
> hs-tier1, hs-tier2 and TestInheritFD.java
> (on 64-bit linux, solaris, windows and mac)
>
> Thanks,
> Leo
>
>
>
More information about the hotspot-dev
mailing list