RFR: 8176717: GC log file handle leaked to child processes

Per Liden per.liden at oracle.com
Mon Mar 26 08:17:43 UTC 2018


Hi,

On 03/13/2018 07:46 AM, Thomas Stüfe wrote:
> Hi Leo,
> 
> On Mon, Mar 12, 2018 at 8:40 PM, Leo Korinth <leo.korinth at oracle.com> wrote:
> 
>>
>>
>> On 12/03/18 17:48, Thomas Stüfe wrote:
>>
>>> Hi Leo,
>>>
>>> On Mon, Mar 12, 2018 at 4:54 PM, Leo Korinth <leo.korinth at oracle.com
>>> <mailto:leo.korinth at oracle.com>> wrote:
>>>
>>>
>>>
>>>      On 12/03/18 15:29, Thomas Stüfe wrote:
>>>
>>>          Hi Leo,
>>>
>>>          This seems weird.
>>>
>>>          This would affect numerous open() calls, not just this GC log, I
>>>          cannot imagine the correct fix is to change all of them.
>>>
>>>
>>>      Sorry, I do not understand what you mean with "numerous open()".
>>>      This fix will only affect logging -- or am I missing something?
>>>      os::open does roughly what I try to do in os::fopen_retain.
>>>
>>>
>>> Sorry, I spoke unclear. What I meant was I would expect the problem you
>>> found in gc logging to be present for every raw open()/fopen()/CreateFile()
>>> call in the VM and in the JDK, which are quite a few. I wondered why we do
>>> not see more problems like this.
>>>
>>
>> Oh, now I see, I just *assumed* os::open was used everywhere when in fact
>> it is only used in two places where the file in addition seems to be closed
>> fast afterwards. I should assume less...
>>
>> I guess leaking file descriptors is not that big of a problem. It seems to
>> *have* been a problem on Solaris (which seems to be the reason for
>> os::open) but as the unix file descriptors are closed by core-libs before
>> exec is mostly a windows problem.
>>
>> On windows it is also more of a problem because open files are harder to
>> rename or remove, and therefore the bug report.
>>
>>
>>
>>>
>>>
>>>          In fact, on Posix platforms we close all file descriptors except
>>>          the Pipe ones before between fork() and exec() - see
>>>          unix/native/libjava/ childproc.c.
>>>
>>>
>>>      Yes, that is why my test case did not fail before the fix on
>>>      unix-like systems. I do not know why it is not handled in Windows
>>>      (possibly a bug, possibly to keep old behaviour???), I had planned
>>>      to ask that as a follow up question later, maybe open a bug report
>>>      if it was not for keeping old behaviour. Even though childproc.c
>>>      does close the file handler, I think it is much nicer to open them
>>>      with FD_CLOEXEC (in addition to let childproc.c close it). os::open
>>>      does so, and I would like to handle ::fopen the same way as ::open
>>>      with a proxy call that ensures that the VM process will retain the
>>>      file descriptor it opens (in HotSpot at least).
>>>
>>>          Such code is missing on Windows - see
>>>          windows/native/libjava/ProcessImpl_md.c  . There, we do not have
>>>          fork/exec, but CreateProcess(), and whether we inherit handles
>>>          or not is controlled via an argument to CreateProcess(). But
>>>          that flag is TRUE, so child processes inherit handles.
>>>
>>>          331                    if (!CreateProcessW(
>>>          332                        NULL,             /* executable name */
>>>          333                        (LPWSTR)pcmd,     /* command line */
>>>          334                        NULL,             /* process security
>>>          attribute */
>>>          335                        NULL,             /* thread security
>>>          attribute */
>>>          336                        TRUE,             /* inherits system
>>>          handles */          <<<<<<
>>>          337                        processFlag,      /* selected based
>>>          on exe type */
>>>          338                        (LPVOID)penvBlock,/* environment block
>>> */
>>>          339                        (LPCWSTR)pdir,    /* change to the
>>>          new current directory */
>>>          340                        &si,              /* (in)  startup
>>>          information */
>>>          341                        &pi))             /* (out) process
>>>          information */
>>>          342                    {
>>>          343                        win32Error(env, L"CreateProcess");
>>>          344                    }
>>>
>>>          Maybe this is the real error we should fix? Make Windows
>>>          Runtime.exec behave like the Posix variant by closing all file
>>>          descriptors upon CreateProcess >
>>>          (This seems more of a core-libs question.)
>>>
>>>
>>>      I think it is both a core-libs question and a hotspot question. I
>>>      firmly believe we should retain file descriptors with help of
>>>      FD_CLOEXEC and its variants in HotSpot. I am unsure (and have no
>>>      opinion) what to do in core-libs, maybe there is a deeper thought
>>>      behind line 336?
>>>
>>>      Some reasons for this:
>>>
>>>      - if a process is forked using JNI, it would still be good if the
>>>      hotspot descriptors would not leak.
>>>
>>>      - if (I have no idea if this is true) the behaviour in core-libs can
>>>      not be changed because the behaviour is already wildly (ab)used,
>>>      this is still a correct fix. Remember this will only close file
>>>      descriptors opened by HotSpot code, and at the moment only logging
>>> code.
>>>
>>>      - this will fix the issue in the bug report, and give time for
>>>      core-libs to consider what is correct (and what can be changed
>>>      without breaking applications).
>>>
>>>      Thanks,
>>>      Leo
>>>
>>>
>>> yes, you convinced me.
>>>
>>> 1 We should fix raw open() calls, because if native code forks via a
>>> different code paths than java Runtime.exec(), we run into the same
>>> problem. Your patch fixes one instance of the problem.
>>>
>> Yes, I agree. I now understand that ::open() is much more used in the code
>> base.
>>
>>>
>>> 2 And we should fix Windows Runtime.exec() to the same behaviour as on
>>> Posix. I can see this being backward-compatible-problematic, but it
>>> certainly would be the right thing to do. Would love to know what core-libs
>>> says.
>>>
>>
>> Possibly (I am intentionally dodging this question)
>>
>>>
>>> Okay, about your change: I dislike that we add a new function, especially
>>> a first class open function, to the os namespace. How about this instead:
>>> since we know that os::open() does the right thing on all platforms, why
>>> can we not just use os::open() instead? Afterwards call fdopen() to wrap a
>>> FILE structure around it, respectively call "FILE* os::open(int fd, const
>>> char* mode)" , which seems to be just a wrapped fdopen(). That way you can
>>> get what you want with less change and without introducing a new API.
>>>
>>
>> Yes, that might be a better solution. I did consider it, but was afraid
>> that, for example the (significant) "w"/"w+" differences in semantics would
>> matter. or that:
>>
>> os::open(os::open(path, WINDOWS_ONLY(_)O_CREAT|WINDOWS_ONLY(_)O_TRUNC,
>> flags, mode) mode2)
>>
>> ...or something similar for fopen(path, "w"),  would not be exactly the
>> same. For example it would set the file to binary mode on windows. Maybe it
>> is exactly the same otherwise? For me, the equality in semantics are not
>> obvious.
>>
>> Also, now when I realized there is only two users of os::open I am less
>> sure it always does the right thing...
>>
>> I prefer the os::fopen_retain way.
>>
>>
> I agree with you on the proposed fix: to open the file - at least on
> windows - with the inherit flag turned off. I still disagree with you about
> the way this is done. I am not a bit fan on "one trick APIs" being dropped
> into the os namespace for one singular purpose - I think we had recently a
> similar discussion about an snprintf variant specific for logging only.
> 
> Just counting are a couple of variants I would prefer:
> 
> 1) keep the API locally to logging, do not make it global. It is logging
> specific, after all.
> 
> 2) Or even easier, just do this (logFileOutput.cpp):
> 
> const char* const LogFileOutput::FileOpenMode = WINDOWS_ONLY("aN")
> NOT_WINDOWS("a");
> 
> that would fix windows. The other platforms do not have the problem if
> spawning via Runtime.exec(), and the problem of
> native-forking-and-handle-leaking is, while possible, rather theoretical.
> 
> 2) If you really want a new global API, rename the thing to just
> "os::fopen()". Because after all you want to wrap a generic fopen() and
> forbid handle inheritance, yes? This is the same thing the os::open()
> sister function does too, so if you think you need that, give it a first
> class name :) And we could use tests then too (I think we have gtests for
> os::open()). In that case I also dislike the many ifdefs, so if you keep
> the function in its current form, I'd prefer it fanned out for different
> platforms, like os::open() does it.

Just offering my 2c on this. I would go with what Thomas suggests in 
bullet 2 above.

/Per

> 
> Just my 5c, and tastes differ, so I'll wait what others say. I'll cc Markus
> as the UL owner.
> 
> Oh, I also think the bug desciption is a bit misleading, since this is
> about the UL file handle in general, not only gc log. And may it make sense
> to post this in hotspot-runtime, not hotspot-dev?
> 
> Thanks and Best Regards, Thomas
> 
> 
> 
>> Thanks,
>> Leo
>>
>>
>>> Kind Regards, Thomas
>>>
>>>
>>>
>>>          Kind Regards, Thomas
>>>
>>>
>>>          On Mon, Mar 12, 2018 at 2:20 PM, Leo Korinth
>>>          <leo.korinth at oracle.com <mailto:leo.korinth at oracle.com>
>>>          <mailto:leo.korinth at oracle.com <mailto:leo.korinth at oracle.com>>>
>>>
>>>          wrote:
>>>
>>>               Hi,
>>>
>>>               This fix is for all operating systems though the problem
>>>          only seams
>>>               to appear on windows.
>>>
>>>               I am creating a proxy function for fopen (os::fopen_retain)
>>>          that
>>>               appends the non-standard "e" mode for linux and bsds. For
>>>          windows
>>>               the "N" mode is used. For other operating systems, I assume
>>>          that I
>>>               can use fcntl F_SETFD FD_CLOEXEC. I think this will work
>>>          for AIX,
>>>               Solaris and other operating systems that do not support the
>>> "e"
>>>               flag. Feedback otherwise please!
>>>
>>>               The reason that I use the mode "e" and not only fcntl for
>>>          linux and
>>>               bsds is threefold. First, I still need to use mode flags on
>>>          windows
>>>               as it does not support fcntl. Second, I probably save a
>>>          system call.
>>>               Third, the change will be applied directly, and there will
>>>          be no
>>>               point in time (between system calls) when the process can
>>>          leak the
>>>               file descriptor, so it is safer.
>>>
>>>               The test case forks three VMs in a row. By doing so we know
>>>          that the
>>>               second VM is opened with a specific log file. The third VM
>>>          should
>>>               have less open file descriptors (as it is does not use
>>> logging)
>>>               which is checked using a UnixOperatingSystemMXBean. This is
>>> not
>>>               possible on windows, so on windows I try to rename the
>>>          file, which
>>>               will not work if the file is opened (the actual reason the
>>>          bug was
>>>               opened).
>>>
>>>               The added test case shows that the bug fix closes the log
>>>          file on
>>>               windows. The VM on other operating systems closed the log
>>>          file even
>>>               before the fix.
>>>
>>>               Maybe the test case should be moved to a different path?
>>>
>>>               Bug:
>>>          https://bugs.openjdk.java.net/browse/JDK-8176717
>>>          <https://bugs.openjdk.java.net/browse/JDK-8176717>
>>>               <https://bugs.openjdk.java.net/browse/JDK-8176717
>>>          <https://bugs.openjdk.java.net/browse/JDK-8176717>>
>>>          https://bugs.openjdk.java.net/browse/JDK-8176809
>>>          <https://bugs.openjdk.java.net/browse/JDK-8176809>
>>>               <https://bugs.openjdk.java.net/browse/JDK-8176809
>>>          <https://bugs.openjdk.java.net/browse/JDK-8176809>>
>>>
>>>               Webrev:
>>>          http://cr.openjdk.java.net/~lkorinth/8176717/00/
>>>          <http://cr.openjdk.java.net/~lkorinth/8176717/00/>
>>>               <http://cr.openjdk.java.net/~lkorinth/8176717/00/
>>>          <http://cr.openjdk.java.net/~lkorinth/8176717/00/>>
>>>
>>>               Testing:
>>>               hs-tier1, hs-tier2 and TestInheritFD.java
>>>               (on 64-bit linux, solaris, windows and mac)
>>>
>>>               Thanks,
>>>               Leo
>>>
>>>
>>>
>>>


More information about the hotspot-dev mailing list