Process.exec with the linux posix_spawn mode has a bug
Thomas Stüfe
thomas.stuefe at gmail.com
Mon May 13 14:22:23 UTC 2019
On Mon, May 13, 2019 at 4:11 PM Thomas Stüfe <thomas.stuefe at gmail.com>
wrote:
>
>
> On Mon, May 13, 2019 at 3:42 PM Thomas Stüfe <thomas.stuefe at gmail.com>
> wrote:
>
>>
>> Hi Martin,
>>
>> On Mon, May 13, 2019 at 2:08 PM Martin Buchholz <martinrb at google.com>
>> wrote:
>>
>>>
>>>
>>>> I am happy this is resolved and the intermittent behavior explained. Yes,
>>>> we could improve exception messages, especially since analyzing fork
>>>> scenarios is cumbersome.
>>>>
>>>
>>> I tried hard back in 2005 to provide pretty good Java-level diagnostics
>>> when subprocess starting failed somehow (see WhyCantJohnnyExec). At least
>>> the errno did get reported.
>>>
>>>
>> I know your code. For many years I wondered who Johnny is :)
>>
>> We have a very similar solution in our port: we have our own error codes
>> (plus errno mixed in where it makes sense) for the many things that can go
>> wrong in the forkhelper. Maybe we can improve upon your solution a bit.
>> And/or add tracing for environment etc.
>>
>> But here is one thing that I still do not understand about Remi's problem:
>>
>> The theory is that the first exec(), starting jspawnhelper, failed
>> with EACCES, yes?
>>
>> Man page for posix_spawn() states:
>>
>> <quote>
>> Upon successful completion, posix_spawn() and posix_spawnp() place
>> the PID of the child process in pid, and return 0. If there is an
>> error before or during the fork(2), then no child is created, the
>> contents of *pid are unspecified, and these functions return an
>> error number as described below.
>>
>> Even when these functions return a success status, the child
>> process may still fail for a plethora of reasons related to its
>> pre-exec() initialization. In addition, the exec(3) may fail. In
>> all of these cases, the child process will exit with the exit
>> value of 127.
>> </quote>
>>
>> To me this looks as if what should have happened is: posix_spawn() should
>> have returned with success, since the fork() went through. Then, the child
>> process (still inside posix_spawn()) attempts the exec and gets EACCES.
>> Then, the child process should have ended with exit code 127. Your fail pipe
>> would never read an error code since we never entered the main function of
>> jspawnhelper. To the Java caller it should have looked like a very
>> short-lived process with exit code 127.
>>
>> Obviously this is not what happened, since Remi reported an IOException
>> with an errno. So, where is my understanding wrong?
>>
>>
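For readers unfamiliar with the "fail pipe" idea mentioned above, here is a minimal sketch of the general technique, not the JDK's actual implementation. It uses plain fork/exec in Python; the function name `exec_with_fail_pipe` is hypothetical. The write end of the pipe is close-on-exec, so a successful exec closes it and the parent reads EOF; a failed exec lets the child write its errno before exiting with 127.

```python
import errno
import os
import struct
import sys
import tempfile


def exec_with_fail_pipe(path, argv):
    """Fork and exec `path`. Return (exit_status, exec_errno).

    exec_errno is 0 if the exec itself succeeded, else the errno of the
    failed exec as reported by the child through the pipe.
    """
    # Since Python 3.4, os.pipe() returns non-inheritable (close-on-exec) fds.
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: try to exec; on failure, report errno through the pipe.
        try:
            os.close(read_fd)
            os.execv(path, argv)  # does not return on success
        except OSError as e:
            os.write(write_fd, struct.pack("i", e.errno))
        finally:
            os._exit(127)  # only reached if the exec did not happen

    # Parent: close our copy of the write end, then wait for data or EOF.
    os.close(write_fd)
    data = os.read(read_fd, struct.calcsize("i"))
    os.close(read_fd)
    exec_errno = struct.unpack("i", data)[0] if len(data) == struct.calcsize("i") else 0

    _, status = os.waitpid(pid, 0)
    exit_status = os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1
    return exit_status, exec_errno
```

The point relevant to this thread: such a pipe only helps if the code writing to it actually runs. If the pipe is written from inside a helper binary and the exec of that helper fails, nothing is ever written, and the parent sees only an exit status of 127.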
> Hmm, this looks wrong. Just tested (Ubuntu 16.04): removing execute
> permission from jspawnhelper does not result in an IOException. Instead,
> Runtime.exec() seemingly succeeds. strace shows the exec() for jspawnhelper
> failing as expected:
>
> [pid 13796] execve("/shared/projects/openjdk/jdk-jdk/output-fastdebug/images/jdk/lib/jspawnhelper", ["11:14"], [/* 79 vars */]) = -1 EACCES (Permission denied)
> [pid 13796] exit_group(127) = ?
> [pid 13780] <... vfork resumed> ) = 13796
> [pid 13796] +++ exited with 127 +++
> [pid 13780] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=13796, si_uid=1027, si_status=127, si_utime=0, si_stime=0} ---
>
> But we completely fail to notice.
>
> This is bad. We should fix it.
>
> One more thing, and I am not sure whether this is libc specific: the
> Open Group man page for posix_spawn() states:
>
> <quote>
> If posix_spawn() or posix_spawnp() fail for any of the reasons that
> would cause fork()
> <http://pubs.opengroup.org/onlinepubs/007904875/functions/fork.html> or
> one of the exec
> <http://pubs.opengroup.org/onlinepubs/007904875/functions/exec.html>
> family of functions to fail, an error value shall be returned as described
> by fork() and exec, respectively (or, if the error occurs after the calling
> process successfully returns, the child process shall exit with exit
> status 127).
> </quote>
>
> which I read as the standard leaving it up to the implementation whether
> exec() errors are reported to the caller of posix_spawn() via the return
> value or only via the child's exit status of 127.
>
> ..Thomas
>
>
.. opened https://bugs.openjdk.java.net/browse/JDK-8223777 to track this.
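A minimal sketch of the two outcomes discussed above, assuming a POSIX system and Python 3.8 or later. Python is used only because its `os.posix_spawn` is a thin wrapper over the libc call; the helper names `make_non_executable_file` and `spawn_and_classify` are hypothetical. Depending on the libc, a failed exec shows up either as an error returned by posix_spawn() or as a child that exits with status 127, so a caller has to check for both.

```python
import errno
import os
import tempfile


def make_non_executable_file():
    """Create an empty temp file without any execute bit; return its path."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    os.chmod(path, 0o644)
    return path


def spawn_and_classify(path, argv):
    """Spawn `path` and report how a failure, if any, became visible.

    Returns one of:
      ("spawn_error", errno)   posix_spawn() itself reported the failure
      ("child_exit", status)   spawn succeeded; child exited with `status`
      ("child_signal", signo)  spawn succeeded; child was killed by a signal
    """
    try:
        pid = os.posix_spawn(path, argv, os.environ)
    except OSError as e:
        # The libc propagated the fork/exec error to the caller.
        return ("spawn_error", e.errno)

    # The spawn call returned success; only the exit status can tell us
    # whether the exec in the child actually worked.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return ("child_exit", os.WEXITSTATUS(status))
    return ("child_signal", os.WTERMSIG(status))


if __name__ == "__main__":
    target = make_non_executable_file()
    try:
        print(spawn_and_classify(target, [target]))
    finally:
        os.unlink(target)
```

On a system that behaves like the strace output above, the expected result is `("child_exit", 127)`; where the libc reports exec errors to the caller, it would be `("spawn_error", errno.EACCES)` instead. Note that 127 is ambiguous on its own, since a program that started correctly may also exit with that status.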
>
>>> I've had this little script around for ages:
>>>
>>> #!/bin/bash
>>> # -v: Print unabbreviated versions of environment, etc
>>>
>>> exec /usr/bin/strace -f -v -s 256 -e signal=none -e trace=process "$@"
>>>
>>>
>> We had all this as part of our spawn traces. But this is a nice and neat
>> idea. Does it print the current directory?
>>
>> Cheers, Thomas
>>
>>
>