RFR: 8176717: GC log file handle leaked to child processes

Thomas Stüfe thomas.stuefe at gmail.com
Tue Mar 13 06:46:31 UTC 2018


Hi Leo,

On Mon, Mar 12, 2018 at 8:40 PM, Leo Korinth <leo.korinth at oracle.com> wrote:

>
>
> On 12/03/18 17:48, Thomas Stüfe wrote:
>
>> Hi Leo,
>>
>> On Mon, Mar 12, 2018 at 4:54 PM, Leo Korinth <leo.korinth at oracle.com
>> <mailto:leo.korinth at oracle.com>> wrote:
>>
>>
>>
>>     On 12/03/18 15:29, Thomas Stüfe wrote:
>>
>>         Hi Leo,
>>
>>         This seems weird.
>>
>>         This would affect numerous open() calls, not just this GC log, I
>>         cannot imagine the correct fix is to change all of them.
>>
>>
>>     Sorry, I do not understand what you mean with "numerous open()".
>>     This fix will only affect logging -- or am I missing something?
>>     os::open does roughly what I try to do in os::fopen_retain.
>>
>>
>> Sorry, I spoke unclear. What I meant was I would expect the problem you
>> found in gc logging to be present for every raw open()/fopen()/CreateFile()
>> call in the VM and in the JDK, which are quite a few. I wondered why we do
>> not see more problems like this.
>>
>
> Oh, now I see, I just *assumed* os::open was used everywhere when in fact
> it is only used in two places where the file in addition seems to be closed
> fast afterwards. I should assume less...
>
> I guess leaking file descriptors is not that big of a problem. It seems to
> *have* been a problem on Solaris (which seems to be the reason for
> os::open) but as the unix file descriptors are closed by core-libs before
> exec is mostly a windows problem.
>
> On windows it is also more of a problem because open files are harder to
> rename or remove, and therefore the bug report.
>
>
>
>>
>>
>>         In fact, on Posix platforms we close all file descriptors except
>>         the Pipe ones before between fork() and exec() - see
>>         unix/native/libjava/ childproc.c.
>>
>>
>>     Yes, that is why my test case did not fail before the fix on
>>     unix-like systems. I do not know why it is not handled in Windows
>>     (possibly a bug, possibly to keep old behaviour???), I had planned
>>     to ask that as a follow up question later, maybe open a bug report
>>     if it was not for keeping old behaviour. Even though childproc.c
>>     does close the file handler, I think it is much nicer to open them
>>     with FD_CLOEXEC (in addition to let childproc.c close it). os::open
>>     does so, and I would like to handle ::fopen the same way as ::open
>>     with a proxy call that ensures that the VM process will retain the
>>     file descriptor it opens (in HotSpot at least).
>>
>>         Such code is missing on Windows - see
>>         windows/native/libjava/ProcessImpl_md.c  . There, we do not have
>>         fork/exec, but CreateProcess(), and whether we inherit handles
>>         or not is controlled via an argument to CreateProcess(). But
>>         that flag is TRUE, so child processes inherit handles.
>>
>>         331                    if (!CreateProcessW(
>>         332                        NULL,             /* executable name */
>>         333                        (LPWSTR)pcmd,     /* command line */
>>         334                        NULL,             /* process security
>>         attribute */
>>         335                        NULL,             /* thread security
>>         attribute */
>>         336                        TRUE,             /* inherits system
>>         handles */          <<<<<<
>>         337                        processFlag,      /* selected based
>>         on exe type */
>>         338                        (LPVOID)penvBlock,/* environment block
>> */
>>         339                        (LPCWSTR)pdir,    /* change to the
>>         new current directory */
>>         340                        &si,              /* (in)  startup
>>         information */
>>         341                        &pi))             /* (out) process
>>         information */
>>         342                    {
>>         343                        win32Error(env, L"CreateProcess");
>>         344                    }
>>
>>         Maybe this is the real error we should fix? Make Windows
>>         Runtime.exec behave like the Posix variant by closing all file
>>         descriptors upon CreateProcess >
>>         (This seems more of a core-libs question.)
>>
>>
>>     I think it is both a core-libs question and a hotspot question. I
>>     firmly believe we should retain file descriptors with help of
>>     FD_CLOEXEC and its variants in HotSpot. I am unsure (and have no
>>     opinion) what to do in core-libs, maybe there is a deeper thought
>>     behind line 336?
>>
>>     Some reasons for this:
>>
>>     - if a process is forked using JNI, it would still be good if the
>>     hotspot descriptors would not leak.
>>
>>     - if (I have no idea if this is true) the behaviour in core-libs can
>>     not be changed because the behaviour is already wildly (ab)used,
>>     this is still a correct fix. Remember this will only close file
>>     descriptors opened by HotSpot code, and at the moment only logging
>> code.
>>
>>     - this will fix the issue in the bug report, and give time for
>>     core-libs to consider what is correct (and what can be changed
>>     without breaking applications).
>>
>>     Thanks,
>>     Leo
>>
>>
>> yes, you convinced me.
>>
>> 1 We should fix raw open() calls, because if native code forks via a
>> different code paths than java Runtime.exec(), we run into the same
>> problem. Your patch fixes one instance of the problem.
>>
> Yes, I agree. I now understand that ::open() is much more used in the code
> base.
>
>>
>> 2 And we should fix Windows Runtime.exec() to the same behaviour as on
>> Posix. I can see this being backward-compatible-problematic, but it
>> certainly would be the right thing to do. Would love to know what core-libs
>> says.
>>
>
> Possibly (I am intentionally dodging this question)
>
>>
>> Okay, about your change: I dislike that we add a new function, especially
>> a first class open function, to the os namespace. How about this instead:
>> since we know that os::open() does the right thing on all platforms, why
>> can we not just use os::open() instead? Afterwards call fdopen() to wrap a
>> FILE structure around it, respectively call "FILE* os::open(int fd, const
>> char* mode)" , which seems to be just a wrapped fdopen(). That way you can
>> get what you want with less change and without introducing a new API.
>>
>
> Yes, that might be a better solution. I did consider it, but was afraid
> that, for example the (significant) "w"/"w+" differences in semantics would
> matter. or that:
>
> os::open(os::open(path, WINDOWS_ONLY(_)O_CREAT|WINDOWS_ONLY(_)O_TRUNC,
> flags, mode) mode2)
>
> ...or something similar for fopen(path, "w"),  would not be exactly the
> same. For example it would set the file to binary mode on windows. Maybe it
> is exactly the same otherwise? For me, the equality in semantics are not
> obvious.
>
> Also, now when I realized there is only two users of os::open I am less
> sure it always does the right thing...
>
> I prefer the os::fopen_retain way.
>
>
I agree with you on the proposed fix: to open the file - at least on
windows - with the inherit flag turned off. I still disagree with you about
the way this is done. I am not a bit fan on "one trick APIs" being dropped
into the os namespace for one singular purpose - I think we had recently a
similar discussion about an snprintf variant specific for logging only.

Just counting are a couple of variants I would prefer:

1) keep the API locally to logging, do not make it global. It is logging
specific, after all.

2) Or even easier, just do this (logFileOutput.cpp):

const char* const LogFileOutput::FileOpenMode = WINDOWS_ONLY("aN")
NOT_WINDOWS("a");

that would fix windows. The other platforms do not have the problem if
spawning via Runtime.exec(), and the problem of
native-forking-and-handle-leaking is, while possible, rather theoretical.

2) If you really want a new global API, rename the thing to just
"os::fopen()". Because after all you want to wrap a generic fopen() and
forbid handle inheritance, yes? This is the same thing the os::open()
sister function does too, so if you think you need that, give it a first
class name :) And we could use tests then too (I think we have gtests for
os::open()). In that case I also dislike the many ifdefs, so if you keep
the function in its current form, I'd prefer it fanned out for different
platforms, like os::open() does it.

Just my 5c, and tastes differ, so I'll wait what others say. I'll cc Markus
as the UL owner.

Oh, I also think the bug desciption is a bit misleading, since this is
about the UL file handle in general, not only gc log. And may it make sense
to post this in hotspot-runtime, not hotspot-dev?

Thanks and Best Regards, Thomas



> Thanks,
> Leo
>
>
>> Kind Regards, Thomas
>>
>>
>>
>>         Kind Regards, Thomas
>>
>>
>>         On Mon, Mar 12, 2018 at 2:20 PM, Leo Korinth
>>         <leo.korinth at oracle.com <mailto:leo.korinth at oracle.com>
>>         <mailto:leo.korinth at oracle.com <mailto:leo.korinth at oracle.com>>>
>>
>>         wrote:
>>
>>              Hi,
>>
>>              This fix is for all operating systems though the problem
>>         only seams
>>              to appear on windows.
>>
>>              I am creating a proxy function for fopen (os::fopen_retain)
>>         that
>>              appends the non-standard "e" mode for linux and bsds. For
>>         windows
>>              the "N" mode is used. For other operating systems, I assume
>>         that I
>>              can use fcntl F_SETFD FD_CLOEXEC. I think this will work
>>         for AIX,
>>              Solaris and other operating systems that do not support the
>> "e"
>>              flag. Feedback otherwise please!
>>
>>              The reason that I use the mode "e" and not only fcntl for
>>         linux and
>>              bsds is threefold. First, I still need to use mode flags on
>>         windows
>>              as it does not support fcntl. Second, I probably save a
>>         system call.
>>              Third, the change will be applied directly, and there will
>>         be no
>>              point in time (between system calls) when the process can
>>         leak the
>>              file descriptor, so it is safer.
>>
>>              The test case forks three VMs in a row. By doing so we know
>>         that the
>>              second VM is opened with a specific log file. The third VM
>>         should
>>              have less open file descriptors (as it is does not use
>> logging)
>>              which is checked using a UnixOperatingSystemMXBean. This is
>> not
>>              possible on windows, so on windows I try to rename the
>>         file, which
>>              will not work if the file is opened (the actual reason the
>>         bug was
>>              opened).
>>
>>              The added test case shows that the bug fix closes the log
>>         file on
>>              windows. The VM on other operating systems closed the log
>>         file even
>>              before the fix.
>>
>>              Maybe the test case should be moved to a different path?
>>
>>              Bug:
>>         https://bugs.openjdk.java.net/browse/JDK-8176717
>>         <https://bugs.openjdk.java.net/browse/JDK-8176717>
>>              <https://bugs.openjdk.java.net/browse/JDK-8176717
>>         <https://bugs.openjdk.java.net/browse/JDK-8176717>>
>>         https://bugs.openjdk.java.net/browse/JDK-8176809
>>         <https://bugs.openjdk.java.net/browse/JDK-8176809>
>>              <https://bugs.openjdk.java.net/browse/JDK-8176809
>>         <https://bugs.openjdk.java.net/browse/JDK-8176809>>
>>
>>              Webrev:
>>         http://cr.openjdk.java.net/~lkorinth/8176717/00/
>>         <http://cr.openjdk.java.net/~lkorinth/8176717/00/>
>>              <http://cr.openjdk.java.net/~lkorinth/8176717/00/
>>         <http://cr.openjdk.java.net/~lkorinth/8176717/00/>>
>>
>>              Testing:
>>              hs-tier1, hs-tier2 and TestInheritFD.java
>>              (on 64-bit linux, solaris, windows and mac)
>>
>>              Thanks,
>>              Leo
>>
>>
>>
>>


More information about the hotspot-dev mailing list