Question about libjava/childproc.c

Wed Sep 5 18:43:43 UTC 2018

On Wed, Sep 5, 2018 at 6:43 PM, Martin Buchholz <martinrb at google.com> wrote:
>
>
> On Wed, Sep 5, 2018 at 9:23 AM, Thomas Stüfe <thomas.stuefe at gmail.com>
> wrote:
>>
>>
>> Since all this happens between vfork and exec we cannot do any decent
>> error handling here. Even what little we do is way outside the allowed
>> spec for vfork().
>
>
> Well we do have a fallback if closeDescriptors fails.  But on closer
> inspection we don't check for error conditions in readdir
>
> http://pubs.opengroup.org/onlinepubs/9699919799/functions/readdir.html
>
> Upon successful completion, readdir() shall return a pointer to an object of
> type struct dirent. When an error is encountered, a null pointer shall be
> returned and errno shall be set to indicate the error. When the end of the
> directory is encountered, a null pointer shall be returned and errno is not
> changed.
>
>
> So before the readdir loop we should set errno to 0.  Then when readdir
> returns NULL, we should check whether errno is non-zero.
>

Note that if we fix this, we may run into errors which were hidden
before, namely if we really accidentally did close the file descriptor
used by opendir(). Currently, even if some file descriptors remain
unclosed, that may not necessarily lead to an error. But with your
suggested error handling we would straight away fail.

Therefore I would combine both fixes - add readdir error handling and
also change to fcntl(FD_CLOEXEC).

> ---
>
> We do have a way of reporting errno to the parent.  See WhyCantJohnnyExec.

Ah I see.

Btw, do you know what happens when we modify errno in a vforked child
process? Would that not modify errno in the parent process too? (errno
is probably implemented as thread local variable, and since the parent
thread sleeps until we exec this may still be okay, or?).

---

A bit more background, if you are interested:

A long time ago, at SAP we swapped Oracle's implementation of
Runtime.exec on Unix against a completely homegrown one. The reason
was that we needed a really robust implementation for our customers,
and with the then current implementation we kept running into
limitations and pathological corner cases. Especially on outlier
platforms like HPUX or AS/400.

This was long before OpenJDK existed, so there was no way for us to
just fix the coding upstream. But now there is, so at regular
intervals I keep eyeing our implementation to check if we could merge.

The main difference between our version and the upstream one is: we
switched to vfork() for the obvious performance reasons. But since
vfork() is scary we implemented the exec-twice-trick with a little
in-between binary. The main purpose was to minimize the window between
vfork() and the first exec(), where we operate inside the parent
process memory. Everything "dangerous" - calling libc functions,
scanning the proc file system, error handling, tracing etc - is
deferred to after the first exec. And since that window is minimal,
that also reduces the chance of signals in the child process harming
the parent.

Later Oracle introduced something very similar with the jspawnhelper.
But I see that jspawnhelper is not used at all for the vfork() case,
just for posix_spawn(), yes?

Thanks, Thomas