RFR(s): 8210549: Runtime.exec: in closeDescriptors(), use FD_CLOEXEC instead of close()

Langer, Christoph christoph.langer at sap.com
Wed Sep 12 11:47:30 UTC 2018


Hi Thomas,

I like the full patch �� +1

Best regards
Christoph

> -----Original Message-----
> From: core-libs-dev <core-libs-dev-bounces at openjdk.java.net> On Behalf
> Of Thomas Stüfe
> Sent: Dienstag, 11. September 2018 19:27
> To: Java Core Libs <core-libs-dev at openjdk.java.net>
> Subject: RFR(s): 8210549: Runtime.exec: in closeDescriptors(), use
> FD_CLOEXEC instead of close()
> 
> Hi all,
> 
> May I please have reviews for this small fix:
> 
> Bug: https://bugs.openjdk.java.net/browse/JDK-8210549
> 
> patch (full): http://cr.openjdk.java.net/~stuefe/webrevs/8210549-Use-
> FD_CLOEXEC-instead-of-close/webrev.00/webrev/
> patch (minimal alternative):
> http://cr.openjdk.java.net/~stuefe/webrevs/8210549-Use-FD_CLOEXEC-
> instead-of-close/webrev.minimal/webrev/
> 
> See also prior discussion:
> http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-
> September/055173.html
> 
> Submit tests ran with full patch successfully through.
> 
> ---
> 
> Background:
> 
> After fork()/vfork() and before exec(), the child process needs to
> close all inherited file descriptors apart from the stdin/out/err pipe
> ends. We do this by iterating thru all file descriptors in
> /proc/<pid>/fd or whatever the equivalent is on that platform. This is
> done using opendir(), readdir().
> 
> We then close all these file descriptors using close().
> 
> The problem with that technique is that we may accidentally close the
> file descriptor opendir() is using internally to read the directory
> content of /proc/<pid>/fd, thereby sawing off the branch we are
> sitting on. The coding does some magic to attempt to prevent this:
> 
> <quote>
> 86    /* We're trying to close all file descriptors, but opendir() might
> 87     * itself be implemented using a file descriptor, and we certainly
> 88     * don't want to close that while it's in use.  We assume that if
> 89     * opendir() is implemented using a file descriptor, then it uses
> 90     * the lowest numbered file descriptor, just like open().  So we
> 91     * close a couple explicitly.  */
> 92
> 93    close(from_fd);          /* for possible use by opendir() */
> 94    close(from_fd + 1);      /* another one for good luck */
> 
> ...
> 108    while ((dirp = readdir64(dp)) != NULL) {
> 109        int fd;
> 110        if (isAsciiDigit(dirp->d_name[0]) &&
> 111            (fd = strtol(dirp->d_name, NULL, 10)) >= from_fd +
> 2)Improve "close all filedescriptors" coding.
> 112            close(fd);
> 113    }
> 
> </quote>
> 
> This workaround can be removed if, instead of outright closing the
> file descriptor in the loop, we were to set the file descriptor to
> FD_CLOEXEC. Closing all descriptors would be defered to the actual
> exec() call some milliseconds later.
> 
> ----
> 
> The patch changes the close() call to fcntl(FD_CLOEXEC). This removes
> the need for the workaround as described.
> 
> In addition to that, the full version of the patch also adds error
> handling to the readdir() loop.
> 
> But note that this exposes us to a behavioral change should we run
> into a readdir() error:
> 
> - currently, we would leave the remaining file descriptors quietly
> unclosed. This usually would have no effect, but in rare cases may
> lead to difficult-to-analyze errors when child processes accidentally
> disturb parent process IO.
> 
> - With this patch, we will fail if readdir fails. However, we do not
> just yet fail the Runtime.exec() call, but enter a fallback code
> section here:
> 
> 364     /* close everything */
> 365     if (closeDescriptors() == 0) { /* failed,  close the old way */
> 366         int max_fd = (int)sysconf(_SC_OPEN_MAX);
> 367         int fd;
> 368         for (fd = FAIL_FILENO + 1; fd < max_fd; fd++)
> 369             if (close(fd) == -1 && errno != EBADF)
> 370                 goto WhyCantJohnnyExec;
> 371     }
> 
> I am not convinced this is a good fallback. _SC_OPEN_MAX depends on
> ulimit -Hn and may be high. On my Ubuntu 16.4 box, default value after
> installation is 1048576. When we hit this fallback code, it takes
> almost a full second to spawn a single child process. In fork
> intensive scenarios this can get pretty bad.
> 
> What do you think?
> 
> Thanks & Best Regards,
> 
> Thomas


More information about the core-libs-dev mailing list