Process.exec with the linux posix_spawn mode has a bug

Thomas Stüfe thomas.stuefe at gmail.com
Mon May 13 14:11:52 UTC 2019


On Mon, May 13, 2019 at 3:42 PM Thomas Stüfe <thomas.stuefe at gmail.com>
wrote:

>
> Hi Martin,
>
> On Mon, May 13, 2019 at 2:08 PM Martin Buchholz <martinrb at google.com>
> wrote:
>
>>
>>
>>> I am happy this is resolved and the intermittent behavior explained. Yes,
>>> we could improve exception messages, especially since analyzing fork
>>> scenarios is cumbersome.
>>>
>>
>> I tried hard back in 2005 to provide pretty good Java-level diagnostics
>> when subprocess starting failed somehow (see WhyCantJohnnyExec). At least
>> the errno did get reported.
>>
>>
> I know your code. For many years I wondered who Johnny is :)
>
> We have a very similar solution in our port: we have our own error codes
> (plus errno mixed in where it makes sense) for the many things that can go
> wrong in the forkhelper. Maybe we can improve upon your solution a bit.
> And/or add tracing for environment etc.
>
> But here is one thing that I still do not understand about Remi's problem:
>
> The theory is that the first exec(), starting jspawnhelper, failed
> with EACCES, yes?
>
> Man page for posix_spawn() states:
>
> <quote>
>        Upon successful completion, posix_spawn() and posix_spawnp() place
>        the PID of the child process in pid, and return 0.  If there is an
>        error before or during the fork(2), then no child is created, the
>        contents of *pid are unspecified, and these functions return an
>        error number as described below.
>
>        Even when these functions return a success status, the child process
>        may still fail for a plethora of reasons related to its pre-exec()
>        initialization.  In addition, the exec(3) may fail.  In all of these
>        cases, the child process will exit with the exit value of 127.
> </quote>
>
> To me this looks as if what should have happened is: posix_spawn() should
> have returned with success, since the fork() went through. Then the child
> process (still inside posix_spawn()) attempts the exec and gets EACCES.
> The child process should then have ended with exit code 127. Your fail pipe
> would never read an error code, since we never entered the main function of
> jspawnhelper. To the Java caller it should have looked like a very
> short-lived process with exit code 127.
>
> Obviously this is not what happened, since Remi reported an IOException
> with an errno. So, where do I understand this wrong?
>
>
Hmm, this looks wrong. I just tested (Ubuntu 16.04): removing the execute
permission from jspawnhelper does not result in an IOException. Instead,
Runtime.exec() seemingly succeeds. strace shows the exec() of jspawnhelper
failing as expected:

[pid 13796] execve("/shared/projects/openjdk/jdk-jdk/output-fastdebug/images/jdk/lib/jspawnhelper", ["11:14"], [/* 79 vars */]) = -1 EACCES (Permission denied)
[pid 13796] exit_group(127)             = ?
[pid 13780] <... vfork resumed> )       = 13796
[pid 13796] +++ exited with 127 +++
[pid 13780] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=13796, si_uid=1027, si_status=127, si_utime=0, si_stime=0} ---

But we completely fail to notice.

This is bad. We should fix it.

One more thing, and I am not sure whether this is libc-specific: the
OpenGroup man page for posix_spawn() states:

<quote>
If posix_spawn() or posix_spawnp() fail for any of the reasons that would
cause fork() or one of the exec family of functions to fail, an error value
shall be returned as described by fork() and exec, respectively (or, if the
error occurs after the calling process successfully returns, the child
process shall exit with exit status 127).
</quote>

fork(): <http://pubs.opengroup.org/onlinepubs/007904875/functions/fork.html>
exec:   <http://pubs.opengroup.org/onlinepubs/007904875/functions/exec.html>

which I interpret to mean that the standard leaves open whether exec()
errors are reported back to the caller of posix_spawn().
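
To make the two permitted behaviours concrete, here is a minimal C sketch
of a caller that handles both: an error number returned by posix_spawn()
itself, and a "successful" spawn whose child then exits with status 127.
The names try_spawn and spawn_outcome are made up for illustration; this is
not the JDK's code, and which branch you hit depends on the libc in use.

```c
#include <assert.h>
#include <errno.h>
#include <spawn.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical classification of what the caller observed. */
enum spawn_outcome {
    SPAWN_ERROR_RETURNED,  /* posix_spawn() itself returned an error number */
    SPAWN_CHILD_EXIT_127,  /* posix_spawn() returned 0, child exited with 127 */
    SPAWN_CHILD_RAN,       /* child terminated in any other way */
    SPAWN_WAIT_FAILED      /* waitpid() failed; *err_out holds errno */
};

/*
 * Spawn `path` with no arguments and wait for it.
 * On SPAWN_ERROR_RETURNED, *err_out holds the value posix_spawn() returned.
 */
static enum spawn_outcome try_spawn(const char *path, int *err_out)
{
    assert(path != NULL && err_out != NULL);

    char *const argv[] = { (char *)path, NULL };
    pid_t pid;
    int status;

    int rc = posix_spawn(&pid, path, NULL, NULL, argv, environ);
    *err_out = rc;
    if (rc != 0) {
        /* The libc reported the fork/exec failure synchronously. */
        return SPAWN_ERROR_RETURNED;
    }

    /* The spawn "succeeded"; the exec may still have failed in the child. */
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) {
            *err_out = errno;
            return SPAWN_WAIT_FAILED;
        }
    }

    if (WIFEXITED(status) && WEXITSTATUS(status) == 127) {
        /* Indistinguishable from a program that really exits with 127. */
        return SPAWN_CHILD_EXIT_127;
    }
    return SPAWN_CHILD_RAN;
}
```

Calling try_spawn() on a copy of some binary after "chmod a-x" should give
either SPAWN_ERROR_RETURNED with EACCES or SPAWN_CHILD_EXIT_127. The second
case is the one we currently fail to notice, and it is ambiguous on its
own, since a program that execs fine may also exit with 127.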

..Thomas


>> I've had this little script around for ages:
>>
>> #!/bin/bash
>> # -v: Print unabbreviated versions of environment, etc
>>
>> exec /usr/bin/strace -f -v -s 256 -e signal=none -e trace=process "$@"
>>
>>
> We had all this as part of spawn traces. But this is a nice and neat idea.
> Does it print current directory?
>
> Cheers, Thomas
>
>

