PING: RFR(s): (new approach) 8223777: In posix_spawn mode, failing to exec() jspawnhelper does not result in an error

Thomas Stüfe thomas.stuefe at gmail.com
Tue Jun 4 16:14:17 UTC 2019


Hi Roger,

thanks for the feedback. Please find my answers inline.

On Tue, Jun 4, 2019 at 5:09 PM Roger Riggs <Roger.Riggs at oracle.com> wrote:

> Hi Thomas,
>
> A minor concern is the impact of the extra write and read that can cause
> rescheduling
> of the parent and child processes.  But that's probably in the noise
> compared to the
> real work of exec.
>

I would think so too. exec() needs to read the binary from the file system,
so there is some IO at least to come.


>   It would raise the complexity quite a bit/too much to code a single read
> in the parent that could expect 0/4/8 bytes.
>
>
I guess I could do that.. but I am not sure it is worthwhile. In our
propietary port we have a more involved solution with some more read-write
cycles between child and parent. I am not saying this is better - there are
pros and cons - but I never saw any performance issues with our solution
compared to OpenJDK. Based on that I believe one more read would be fine.


> At ProcessImpl_md.c: 708: the "Read failed" is less than informative.
> (Though it is the same as the pre-existing one at 720).
>

Yes. I have a follow up issue open (linked in the JBS issue) to improve
reporting on errors. That may improve matters a bit.


> But I suppose it has never happened.  The 'Exec failed' is more specific
> than 'read'.
> And it has probably never been seen.
>

You know this is interesting. I thought exec errors should be not that
uncommon or difficult to provoke. I did a test. I maimed an executable such
that I cannot execute it without getting an exec format error (ENOEXEC).
Then I tried to spawn it with Runtime.exec. But that appeared to be
successful, but the child process died immediately. Happened regardless of
launchMechanism.

Then I ran an strace over it and saw this:

5332 [pid  3911] execve("./sleep2", ["./sleep2"], [/* 78 vars */]
<unfinished ...>
                                                      ..
5342 [pid  3911] <... execve resumed> )      = -1 ENOEXEC (Exec format
error)
5343 [pid  3911] execve("/bin/sh", ["/bin/sh", "./sleep2"], [/* 78 vars */]
<unfinished ...>
5347 [pid  3911] <... execve resumed> )      = 0

So, if the first exec fails for whatever reason, we try again, passing the
executable file name as argument to the system shell. This does not feel
right? Do you know why we do this?


>
> Thanks, Roger
>
>
Thank you Roger. Can I consider the patch to be reviewed by you?

..Thomas


> On 06/04/2019 02:06 AM, Thomas Stüfe wrote:
>
> Hi all,
>
> may I please have reviews/opinions on this fix?
>
> Fix has been live in our test landscape since some weeks.
>
> If we do not want this fix to be in JDK13, we may want to revert the
> posix_spawn-by-default-on-Linux change.
>
> Thanks, Thomas
>
>
> On Mon, May 20, 2019 at 4:15 PM Thomas Stüfe <thomas.stuefe at gmail.com>
> wrote:
>
>> Hi all,
>>
>> (old mail thread:
>> https://mail.openjdk.java.net/pipermail/core-libs-dev/2019-May/060139.html
>> )
>>
>> May I please have your reviews and opinions for the following bug fix:
>>
>> issue: https://bugs.openjdk.java.net/browse/JDK-8223777
>> cr:
>> http://cr.openjdk.java.net/~stuefe/webrevs/8223777-posix_spawn-no-exec-error-alternate-impl/webrev.00/webrev/
>>
>> ---
>>
>> The fix implements Florians proposal: the jspawnhelper will signal its
>> aliveness to the parent process the moment it gains control. If the parent
>> process does not get the signal, it assumes exec'ing the jspawnhelper never
>> worked.
>>
>> I only do this for posix_spawn mode, out of cautiousness.
>>
>> (Note that I kept the fix as minimal as possible. I found a minor bug and
>> some improvement possibilities and opened follow up issues to track them:
>> JDK-8224180 and JDK-8224181).
>>
>> Thanks, Thomas
>>
>>
>>
>


More information about the core-libs-dev mailing list