RFR: 8176717: GC log file handle leaked to child processes
Thomas Stüfe
thomas.stuefe at gmail.com
Tue Mar 13 06:46:31 UTC 2018
Hi Leo,
On Mon, Mar 12, 2018 at 8:40 PM, Leo Korinth <leo.korinth at oracle.com> wrote:
>
>
> On 12/03/18 17:48, Thomas Stüfe wrote:
>
>> Hi Leo,
>>
>> On Mon, Mar 12, 2018 at 4:54 PM, Leo Korinth <leo.korinth at oracle.com
>> <mailto:leo.korinth at oracle.com>> wrote:
>>
>>
>>
>> On 12/03/18 15:29, Thomas Stüfe wrote:
>>
>> Hi Leo,
>>
>> This seems weird.
>>
>> This would affect numerous open() calls, not just this GC log, I
>> cannot imagine the correct fix is to change all of them.
>>
>>
>> Sorry, I do not understand what you mean with "numerous open()".
>> This fix will only affect logging -- or am I missing something?
>> os::open does roughly what I try to do in os::fopen_retain.
>>
>>
>> Sorry, I spoke unclear. What I meant was I would expect the problem you
>> found in gc logging to be present for every raw open()/fopen()/CreateFile()
>> call in the VM and in the JDK, which are quite a few. I wondered why we do
>> not see more problems like this.
>>
>
> Oh, now I see, I just *assumed* os::open was used everywhere when in fact
> it is only used in two places where the file in addition seems to be closed
> fast afterwards. I should assume less...
>
> I guess leaking file descriptors is not that big of a problem. It seems to
> *have* been a problem on Solaris (which seems to be the reason for
> os::open) but as the unix file descriptors are closed by core-libs before
> exec is mostly a windows problem.
>
> On windows it is also more of a problem because open files are harder to
> rename or remove, and therefore the bug report.
>
>
>
>>
>>
>> In fact, on Posix platforms we close all file descriptors except
>> the Pipe ones before between fork() and exec() - see
>> unix/native/libjava/ childproc.c.
>>
>>
>> Yes, that is why my test case did not fail before the fix on
>> unix-like systems. I do not know why it is not handled in Windows
>> (possibly a bug, possibly to keep old behaviour???), I had planned
>> to ask that as a follow up question later, maybe open a bug report
>> if it was not for keeping old behaviour. Even though childproc.c
>> does close the file handler, I think it is much nicer to open them
>> with FD_CLOEXEC (in addition to let childproc.c close it). os::open
>> does so, and I would like to handle ::fopen the same way as ::open
>> with a proxy call that ensures that the VM process will retain the
>> file descriptor it opens (in HotSpot at least).
>>
>> Such code is missing on Windows - see
>> windows/native/libjava/ProcessImpl_md.c . There, we do not have
>> fork/exec, but CreateProcess(), and whether we inherit handles
>> or not is controlled via an argument to CreateProcess(). But
>> that flag is TRUE, so child processes inherit handles.
>>
>> 331 if (!CreateProcessW(
>> 332 NULL, /* executable name */
>> 333 (LPWSTR)pcmd, /* command line */
>> 334 NULL, /* process security
>> attribute */
>> 335 NULL, /* thread security
>> attribute */
>> 336 TRUE, /* inherits system
>> handles */ <<<<<<
>> 337 processFlag, /* selected based
>> on exe type */
>> 338 (LPVOID)penvBlock,/* environment block
>> */
>> 339 (LPCWSTR)pdir, /* change to the
>> new current directory */
>> 340 &si, /* (in) startup
>> information */
>> 341 &pi)) /* (out) process
>> information */
>> 342 {
>> 343 win32Error(env, L"CreateProcess");
>> 344 }
>>
>> Maybe this is the real error we should fix? Make Windows
>> Runtime.exec behave like the Posix variant by closing all file
>> descriptors upon CreateProcess >
>> (This seems more of a core-libs question.)
>>
>>
>> I think it is both a core-libs question and a hotspot question. I
>> firmly believe we should retain file descriptors with help of
>> FD_CLOEXEC and its variants in HotSpot. I am unsure (and have no
>> opinion) what to do in core-libs, maybe there is a deeper thought
>> behind line 336?
>>
>> Some reasons for this:
>>
>> - if a process is forked using JNI, it would still be good if the
>> hotspot descriptors would not leak.
>>
>> - if (I have no idea if this is true) the behaviour in core-libs can
>> not be changed because the behaviour is already wildly (ab)used,
>> this is still a correct fix. Remember this will only close file
>> descriptors opened by HotSpot code, and at the moment only logging
>> code.
>>
>> - this will fix the issue in the bug report, and give time for
>> core-libs to consider what is correct (and what can be changed
>> without breaking applications).
>>
>> Thanks,
>> Leo
>>
>>
>> yes, you convinced me.
>>
>> 1 We should fix raw open() calls, because if native code forks via a
>> different code paths than java Runtime.exec(), we run into the same
>> problem. Your patch fixes one instance of the problem.
>>
> Yes, I agree. I now understand that ::open() is much more used in the code
> base.
>
>>
>> 2 And we should fix Windows Runtime.exec() to the same behaviour as on
>> Posix. I can see this being backward-compatible-problematic, but it
>> certainly would be the right thing to do. Would love to know what core-libs
>> says.
>>
>
> Possibly (I am intentionally dodging this question)
>
>>
>> Okay, about your change: I dislike that we add a new function, especially
>> a first class open function, to the os namespace. How about this instead:
>> since we know that os::open() does the right thing on all platforms, why
>> can we not just use os::open() instead? Afterwards call fdopen() to wrap a
>> FILE structure around it, respectively call "FILE* os::open(int fd, const
>> char* mode)" , which seems to be just a wrapped fdopen(). That way you can
>> get what you want with less change and without introducing a new API.
>>
>
> Yes, that might be a better solution. I did consider it, but was afraid
> that, for example the (significant) "w"/"w+" differences in semantics would
> matter. or that:
>
> os::open(os::open(path, WINDOWS_ONLY(_)O_CREAT|WINDOWS_ONLY(_)O_TRUNC,
> flags, mode) mode2)
>
> ...or something similar for fopen(path, "w"), would not be exactly the
> same. For example it would set the file to binary mode on windows. Maybe it
> is exactly the same otherwise? For me, the equality in semantics are not
> obvious.
>
> Also, now when I realized there is only two users of os::open I am less
> sure it always does the right thing...
>
> I prefer the os::fopen_retain way.
>
>
I agree with you on the proposed fix: to open the file - at least on
windows - with the inherit flag turned off. I still disagree with you about
the way this is done. I am not a bit fan on "one trick APIs" being dropped
into the os namespace for one singular purpose - I think we had recently a
similar discussion about an snprintf variant specific for logging only.
Just counting are a couple of variants I would prefer:
1) keep the API locally to logging, do not make it global. It is logging
specific, after all.
2) Or even easier, just do this (logFileOutput.cpp):
const char* const LogFileOutput::FileOpenMode = WINDOWS_ONLY("aN")
NOT_WINDOWS("a");
that would fix windows. The other platforms do not have the problem if
spawning via Runtime.exec(), and the problem of
native-forking-and-handle-leaking is, while possible, rather theoretical.
2) If you really want a new global API, rename the thing to just
"os::fopen()". Because after all you want to wrap a generic fopen() and
forbid handle inheritance, yes? This is the same thing the os::open()
sister function does too, so if you think you need that, give it a first
class name :) And we could use tests then too (I think we have gtests for
os::open()). In that case I also dislike the many ifdefs, so if you keep
the function in its current form, I'd prefer it fanned out for different
platforms, like os::open() does it.
Just my 5c, and tastes differ, so I'll wait what others say. I'll cc Markus
as the UL owner.
Oh, I also think the bug desciption is a bit misleading, since this is
about the UL file handle in general, not only gc log. And may it make sense
to post this in hotspot-runtime, not hotspot-dev?
Thanks and Best Regards, Thomas
> Thanks,
> Leo
>
>
>> Kind Regards, Thomas
>>
>>
>>
>> Kind Regards, Thomas
>>
>>
>> On Mon, Mar 12, 2018 at 2:20 PM, Leo Korinth
>> <leo.korinth at oracle.com <mailto:leo.korinth at oracle.com>
>> <mailto:leo.korinth at oracle.com <mailto:leo.korinth at oracle.com>>>
>>
>> wrote:
>>
>> Hi,
>>
>> This fix is for all operating systems though the problem
>> only seams
>> to appear on windows.
>>
>> I am creating a proxy function for fopen (os::fopen_retain)
>> that
>> appends the non-standard "e" mode for linux and bsds. For
>> windows
>> the "N" mode is used. For other operating systems, I assume
>> that I
>> can use fcntl F_SETFD FD_CLOEXEC. I think this will work
>> for AIX,
>> Solaris and other operating systems that do not support the
>> "e"
>> flag. Feedback otherwise please!
>>
>> The reason that I use the mode "e" and not only fcntl for
>> linux and
>> bsds is threefold. First, I still need to use mode flags on
>> windows
>> as it does not support fcntl. Second, I probably save a
>> system call.
>> Third, the change will be applied directly, and there will
>> be no
>> point in time (between system calls) when the process can
>> leak the
>> file descriptor, so it is safer.
>>
>> The test case forks three VMs in a row. By doing so we know
>> that the
>> second VM is opened with a specific log file. The third VM
>> should
>> have less open file descriptors (as it is does not use
>> logging)
>> which is checked using a UnixOperatingSystemMXBean. This is
>> not
>> possible on windows, so on windows I try to rename the
>> file, which
>> will not work if the file is opened (the actual reason the
>> bug was
>> opened).
>>
>> The added test case shows that the bug fix closes the log
>> file on
>> windows. The VM on other operating systems closed the log
>> file even
>> before the fix.
>>
>> Maybe the test case should be moved to a different path?
>>
>> Bug:
>> https://bugs.openjdk.java.net/browse/JDK-8176717
>> <https://bugs.openjdk.java.net/browse/JDK-8176717>
>> <https://bugs.openjdk.java.net/browse/JDK-8176717
>> <https://bugs.openjdk.java.net/browse/JDK-8176717>>
>> https://bugs.openjdk.java.net/browse/JDK-8176809
>> <https://bugs.openjdk.java.net/browse/JDK-8176809>
>> <https://bugs.openjdk.java.net/browse/JDK-8176809
>> <https://bugs.openjdk.java.net/browse/JDK-8176809>>
>>
>> Webrev:
>> http://cr.openjdk.java.net/~lkorinth/8176717/00/
>> <http://cr.openjdk.java.net/~lkorinth/8176717/00/>
>> <http://cr.openjdk.java.net/~lkorinth/8176717/00/
>> <http://cr.openjdk.java.net/~lkorinth/8176717/00/>>
>>
>> Testing:
>> hs-tier1, hs-tier2 and TestInheritFD.java
>> (on 64-bit linux, solaris, windows and mac)
>>
>> Thanks,
>> Leo
>>
>>
>>
>>
More information about the hotspot-dev
mailing list