Runtime.exec : vfork() concerns and a fix proposal
Hi all, I wanted to gauge opinions on the following issue: Runtime.exec, on Linux, uses vfork(2) by default. It gives us better performance compared with fork() and robustness in constrained memory situations. But as we know vfork() can be dangerous if used incorrectly. In the child process before exec'ing, we live in the memory of the parent process. If we are not very careful we can influence or crash the parent process. According to POSIX pretty much the only thing the child process is allowed to do after vfork(2) is to exec(3) immediately; if that fails, you must call _exit(2). http://pubs.opengroup.org/onlinepubs/009604599/functions/vfork.html However, in the openjdk we do a number of things beyond that: - stdin,out,err pipe handling business - closing all file descriptors - we change the working directory - we may actually modify errno manually - in case exec fails, we communicate the error back to the parent using pipe. This involves calling a number of libc functions beyond exec(), namely read, close, dup2, opendir/readdir, write, chdir... It also needs a bit of stack, since we assemble path names. -- I was curious whether there were any real issues, so I tested (on Ubuntu 16.4) and found: 1) A crash - any crash - in the child process before exec() will kill the parent jvm dead. Weirdly enough, we do not even enter our error handling, but seem to die instantly with the default "Segmentation Fault". 2) Signals received by the child process before exec() influence the parent process. For example: - SIGINT set to the child ends both parent and child, immediately - SIGABRT aborts both child and parent - any error signal sent to the child lead to the behavior described at (1) 3) A stack overflow in the child before exec() also kills the parent. Unsurprising, since guard page hit -> segfault -> see (1). 4) more amusing, setting errno in the child before exec() changes the errno in the parent process. propagates to the parent process. But since errno is thread local and the thread in the parent process is waiting in vfork() and will, upon return, not look at errno (after all, vfork succeeded) this causes no trouble. There may be more issues, but these were the ones I tested. In all cases I counter-tested with fork() instead of vfork() and as expected with fork() the parent process stays unaffected as it should be. ------------- Whether you think these issues are worth solving is an open question. All these cases may happen in the wild (well, apart from crash-by-programming-error if one assumes the program to be really bug free) albeit with a very small probability. But once these bugs occur, they can be very difficult to analyse. So fixing this may be worthwhile. At SAP, we opted for robustness, so we changed the Runtime.exec() implementation to deal with vfork() issues. Basically, we employ the exec-twice technique: - in the child, after the vfork(), we immediately exec() into a little bootstrap binary ("forkhelper"). - Now we are safe in the sense that we do not share memory with the parent process anymore - Then, parent process communicates with the child via pipes and gives it all information needed to do the "real" exec: environ, current dir, arguments... . - Now the child exec's a second time, this time into the real target binary. The point of this technique is that we minimize the window in the child between vfork and the first exec. In fact, we are now fully POSIX compliant. This solves the described pathological cases. It has some other advantages too, e.g. allowing for better error handling and tracing in the Runtime.exec() area. Performance-wise it depends: we exec twice, so we pay twice. However, since the parent continues execution after the first exec, it spends less time waiting on the child process, which can make a difference if there are many file descriptors open. --- Checking opinions here. Do you think we are okay with our current implementation or would a change as described above be welcome in the OpenJDK too? Thanks, and Best Regards, Thomas
I think this is a cool idea. Do you have any performance numbers? On Tue, Sep 11, 2018 at 12:52 PM Thomas Stüfe <thomas.stuefe@gmail.com> wrote:
Hi all,
I wanted to gauge opinions on the following issue:
Runtime.exec, on Linux, uses vfork(2) by default. It gives us better performance compared with fork() and robustness in constrained memory situations.
But as we know vfork() can be dangerous if used incorrectly. In the child process before exec'ing, we live in the memory of the parent process. If we are not very careful we can influence or crash the parent process.
According to POSIX pretty much the only thing the child process is allowed to do after vfork(2) is to exec(3) immediately; if that fails, you must call _exit(2).
http://pubs.opengroup.org/onlinepubs/009604599/functions/vfork.html
However, in the openjdk we do a number of things beyond that:
- stdin,out,err pipe handling business - closing all file descriptors - we change the working directory - we may actually modify errno manually - in case exec fails, we communicate the error back to the parent using pipe.
This involves calling a number of libc functions beyond exec(), namely read, close, dup2, opendir/readdir, write, chdir... It also needs a bit of stack, since we assemble path names.
--
I was curious whether there were any real issues, so I tested (on Ubuntu 16.4) and found:
1) A crash - any crash - in the child process before exec() will kill the parent jvm dead. Weirdly enough, we do not even enter our error handling, but seem to die instantly with the default "Segmentation Fault".
2) Signals received by the child process before exec() influence the parent process. For example: - SIGINT set to the child ends both parent and child, immediately - SIGABRT aborts both child and parent - any error signal sent to the child lead to the behavior described at (1)
3) A stack overflow in the child before exec() also kills the parent. Unsurprising, since guard page hit -> segfault -> see (1).
4) more amusing, setting errno in the child before exec() changes the errno in the parent process. propagates to the parent process. But since errno is thread local and the thread in the parent process is waiting in vfork() and will, upon return, not look at errno (after all, vfork succeeded) this causes no trouble.
There may be more issues, but these were the ones I tested.
In all cases I counter-tested with fork() instead of vfork() and as expected with fork() the parent process stays unaffected as it should be.
-------------
Whether you think these issues are worth solving is an open question.
All these cases may happen in the wild (well, apart from crash-by-programming-error if one assumes the program to be really bug free) albeit with a very small probability. But once these bugs occur, they can be very difficult to analyse. So fixing this may be worthwhile.
At SAP, we opted for robustness, so we changed the Runtime.exec() implementation to deal with vfork() issues. Basically, we employ the exec-twice technique:
- in the child, after the vfork(), we immediately exec() into a little bootstrap binary ("forkhelper"). - Now we are safe in the sense that we do not share memory with the parent process anymore - Then, parent process communicates with the child via pipes and gives it all information needed to do the "real" exec: environ, current dir, arguments... . - Now the child exec's a second time, this time into the real target binary.
The point of this technique is that we minimize the window in the child between vfork and the first exec. In fact, we are now fully POSIX compliant. This solves the described pathological cases.
It has some other advantages too, e.g. allowing for better error handling and tracing in the Runtime.exec() area. Performance-wise it depends: we exec twice, so we pay twice. However, since the parent continues execution after the first exec, it spends less time waiting on the child process, which can make a difference if there are many file descriptors open.
---
Checking opinions here. Do you think we are okay with our current implementation or would a change as described above be welcome in the OpenJDK too?
Thanks, and Best Regards, Thomas
-- - DML
Hi David, On Tue, Sep 11, 2018 at 8:29 PM, David Lloyd <david.lloyd@redhat.com> wrote:
I think this is a cool idea.
Thanks. I think I did not come up with it though, I think the technique was known already.
Do you have any performance numbers?
Sure: small program, just spawning off /bin/true a 1000 times, measured on my t450s running Ubuntu 16.4: Number open files: <none> 1000 100000 openjdk8: 305ms 1.5s 115s sapjvm8: 721ms 2.3s 142s factor 2.4 1.53 1.23 So, it starts off with factor 2.3, but penalty diminishes with the number of open files. This comparison is a imprecise however since we compare different JVMs with completely different Runtime.exec() implementations. We do more checks in our JVM, which may mean more syscalls per fork(). ..Thomas
On Tue, Sep 11, 2018 at 12:52 PM Thomas Stüfe <thomas.stuefe@gmail.com> wrote:
Hi all,
I wanted to gauge opinions on the following issue:
Runtime.exec, on Linux, uses vfork(2) by default. It gives us better performance compared with fork() and robustness in constrained memory situations.
But as we know vfork() can be dangerous if used incorrectly. In the child process before exec'ing, we live in the memory of the parent process. If we are not very careful we can influence or crash the parent process.
According to POSIX pretty much the only thing the child process is allowed to do after vfork(2) is to exec(3) immediately; if that fails, you must call _exit(2).
http://pubs.opengroup.org/onlinepubs/009604599/functions/vfork.html
However, in the openjdk we do a number of things beyond that:
- stdin,out,err pipe handling business - closing all file descriptors - we change the working directory - we may actually modify errno manually - in case exec fails, we communicate the error back to the parent using pipe.
This involves calling a number of libc functions beyond exec(), namely read, close, dup2, opendir/readdir, write, chdir... It also needs a bit of stack, since we assemble path names.
--
I was curious whether there were any real issues, so I tested (on Ubuntu 16.4) and found:
1) A crash - any crash - in the child process before exec() will kill the parent jvm dead. Weirdly enough, we do not even enter our error handling, but seem to die instantly with the default "Segmentation Fault".
2) Signals received by the child process before exec() influence the parent process. For example: - SIGINT set to the child ends both parent and child, immediately - SIGABRT aborts both child and parent - any error signal sent to the child lead to the behavior described at (1)
3) A stack overflow in the child before exec() also kills the parent. Unsurprising, since guard page hit -> segfault -> see (1).
4) more amusing, setting errno in the child before exec() changes the errno in the parent process. propagates to the parent process. But since errno is thread local and the thread in the parent process is waiting in vfork() and will, upon return, not look at errno (after all, vfork succeeded) this causes no trouble.
There may be more issues, but these were the ones I tested.
In all cases I counter-tested with fork() instead of vfork() and as expected with fork() the parent process stays unaffected as it should be.
-------------
Whether you think these issues are worth solving is an open question.
All these cases may happen in the wild (well, apart from crash-by-programming-error if one assumes the program to be really bug free) albeit with a very small probability. But once these bugs occur, they can be very difficult to analyse. So fixing this may be worthwhile.
At SAP, we opted for robustness, so we changed the Runtime.exec() implementation to deal with vfork() issues. Basically, we employ the exec-twice technique:
- in the child, after the vfork(), we immediately exec() into a little bootstrap binary ("forkhelper"). - Now we are safe in the sense that we do not share memory with the parent process anymore - Then, parent process communicates with the child via pipes and gives it all information needed to do the "real" exec: environ, current dir, arguments... . - Now the child exec's a second time, this time into the real target binary.
The point of this technique is that we minimize the window in the child between vfork and the first exec. In fact, we are now fully POSIX compliant. This solves the described pathological cases.
It has some other advantages too, e.g. allowing for better error handling and tracing in the Runtime.exec() area. Performance-wise it depends: we exec twice, so we pay twice. However, since the parent continues execution after the first exec, it spends less time waiting on the child process, which can make a difference if there are many file descriptors open.
---
Checking opinions here. Do you think we are okay with our current implementation or would a change as described above be welcome in the OpenJDK too?
Thanks, and Best Regards, Thomas
-- - DML
On Wed, Sep 12, 2018 at 2:04 AM Thomas Stüfe <thomas.stuefe@gmail.com> wrote:
Hi David,
On Tue, Sep 11, 2018 at 8:29 PM, David Lloyd <david.lloyd@redhat.com> wrote:
I think this is a cool idea.
Thanks. I think I did not come up with it though, I think the technique was known already.
Do you have any performance numbers?
Sure:
small program, just spawning off /bin/true a 1000 times, measured on my t450s running Ubuntu 16.4:
Number open files: <none> 1000 100000
openjdk8: 305ms 1.5s 115s sapjvm8: 721ms 2.3s 142s
factor 2.4 1.53 1.23
This doesn't seem too bad, even for a first implementation. I wonder if it could be improved further using e.g. shared memory to reduce syscalls? -- - DML
On Wed, Sep 12, 2018 at 2:33 AM, Martin Buchholz <martinrb@google.com> wrote:
https://sourceware.org/ml/libc-alpha/2018-09/msg00120. Coincidence?
Yes :) Interesting though.
I agree that your proposal makes the implementation more robust, so if you are willing to implement it and it doesn't cost too much, then I would support it. The current implementation doesn't seem to cause trouble in practice, or at least we don't see any bug reports. When you benchmark with open file descriptors, try with and without FD_CLOEXEC flag set. Almost all fds should be created with the FD_CLOEXEC flag set - openjdk is moving in that direction. When there are many open files, your implementation may be a weird sort of optimization. On Tue, Sep 11, 2018 at 10:51 AM, Thomas Stüfe <thomas.stuefe@gmail.com> wrote:
The point of this technique is that we minimize the window in the child between vfork and the first exec. In fact, we are now fully POSIX compliant.
2004-POSIX compliant ... but in the mean time they removed vfork(). But vfork is critical for Google (and probably others) due to the momentary overcommit problem on Linux. Or has that changed somehow?
On Wed, Sep 12, 2018 at 9:21 AM Martin Buchholz <martinrb@google.com> wrote:
I agree that your proposal makes the implementation more robust, so if you are willing to implement it and it doesn't cost too much, then I would support it. The current implementation doesn't seem to cause trouble in practice, or at least we don't see any bug reports.
When you benchmark with open file descriptors, try with and without FD_CLOEXEC flag set. Almost all fds should be created with the FD_CLOEXEC flag set - openjdk is moving in that direction. When there are many open files, your implementation may be a weird sort of optimization.
On Tue, Sep 11, 2018 at 10:51 AM, Thomas Stüfe <thomas.stuefe@gmail.com> wrote:
The point of this technique is that we minimize the window in the child between vfork and the first exec. In fact, we are now fully POSIX compliant.
2004-POSIX compliant ... but in the mean time they removed vfork(). But vfork is critical for Google (and probably others) due to the momentary overcommit problem on Linux. Or has that changed somehow?
If the lack of POSIX support is a problem, posix_spawn could possibly be used instead in this case:
The child process is created using vfork(2) instead of fork(2) when [...] file_actions is NULL and the spawn-flags element of the attributes object pointed to by attrp does not contain POSIX_SPAWN_SETSIGMASK, POSIX_SPAWN_SETSIGDEF, POSIX_SPAWN_SETSCHEDPARAM, POSIX_SPAWN_SETSCHEDULER, POSIX_SPAWN_SETPGROUP, or POSIX_SPAWN_RESETIDS.
This would only work if the helper program is run directly though. -- - DML
Hi David, How does your proposal differ from the current posix_spawn and jspawnhelper MODE_POSIX_SPAWN implementation available on Solaris and AIX? This area is sensitive enough that it would be prudent to implement it as a new mode in ProcessImpl/ProcessImpl; retaining the existing options. Changing the default could be a build option and would require extensive testing and a long stability period. Regards, Roger On 9/12/18 10:33 AM, David Lloyd wrote:
On Wed, Sep 12, 2018 at 9:21 AM Martin Buchholz <martinrb@google.com> wrote:
I agree that your proposal makes the implementation more robust, so if you are willing to implement it and it doesn't cost too much, then I would support it. The current implementation doesn't seem to cause trouble in practice, or at least we don't see any bug reports.
When you benchmark with open file descriptors, try with and without FD_CLOEXEC flag set. Almost all fds should be created with the FD_CLOEXEC flag set - openjdk is moving in that direction. When there are many open files, your implementation may be a weird sort of optimization.
On Tue, Sep 11, 2018 at 10:51 AM, Thomas Stüfe <thomas.stuefe@gmail.com> wrote:
The point of this technique is that we minimize the window in the child between vfork and the first exec. In fact, we are now fully POSIX compliant. 2004-POSIX compliant ... but in the mean time they removed vfork(). But vfork is critical for Google (and probably others) due to the momentary overcommit problem on Linux. Or has that changed somehow? If the lack of POSIX support is a problem, posix_spawn could possibly be used instead in this case:
The child process is created using vfork(2) instead of fork(2) when [...] file_actions is NULL and the spawn-flags element of the attributes object pointed to by attrp does not contain POSIX_SPAWN_SETSIGMASK, POSIX_SPAWN_SETSIGDEF, POSIX_SPAWN_SETSCHEDPARAM, POSIX_SPAWN_SETSCHEDULER, POSIX_SPAWN_SETPGROUP, or POSIX_SPAWN_RESETIDS. This would only work if the helper program is run directly though.
On Wed, Sep 12, 2018 at 9:58 AM Roger Riggs <roger.riggs@oracle.com> wrote:
Hi David,
How does your proposal differ from the current posix_spawn and jspawnhelper MODE_POSIX_SPAWN implementation available on Solaris and AIX?
Reading through the code, it ends up being pretty much identical AFAICT; if I understand correctly, Linux would be using vfork (or its equivalent) internally in this case.
This area is sensitive enough that it would be prudent to implement it as a new mode in ProcessImpl/ProcessImpl; retaining the existing options. Changing the default could be a build option and would require extensive testing and a long stability period.
Seems worthwhile though, given vfork's now-10-year-old obsolescence. It looks like Linux is the only platform still using vfork for ProcessImpl in OpenJDK. -- - DML
On Wed, Sep 12, 2018 at 8:16 AM, David Lloyd <david.lloyd@redhat.com> wrote:
Seems worthwhile though, given vfork's now-10-year-old obsolescence. It looks like Linux is the only platform still using vfork for ProcessImpl in OpenJDK.
I'm fine with switching to posix_spawn on Linux as long as we don't reintroduce the memory overcommit problem, which would be a deal-breaker. But glibc posix_spawn will use either vfork or clone with CLONE_VFORK, so we should be OK.
On Wed, Sep 12, 2018 at 7:33 AM, David Lloyd <david.lloyd@redhat.com> wrote:
The child process is created using vfork(2) instead of fork(2) when [...] file_actions is NULL and the spawn-flags element of the attributes object pointed to by attrp does not contain POSIX_SPAWN_SETSIGMASK, POSIX_SPAWN_SETSIGDEF, POSIX_SPAWN_SETSCHEDPARAM, POSIX_SPAWN_SETSCHEDULER, POSIX_SPAWN_SETPGROUP, or POSIX_SPAWN_RESETIDS.
Yes, the posix_spawn man page says that, but https://github.com/bminor/glibc/blob/master/sysdeps/unix/sysv/linux/spawni.c... says: /* The Linux implementation of posix_spawn{p} uses the clone syscall directly with CLONE_VM and CLONE_VFORK flags and an allocated stack. The new stack and start function solves most the vfork limitation (possible parent clobber due stack spilling).
The issue 6850720 isn't _exactly_ to use POSIX_SPAWN for process launching on Linux, but it's the closest I could find out of what are really a surprisingly large number of issues that refer to posix_spawn in one way or another relating to ProcessImpl. There's a different issue to move from vfork to posix_spawn on Solaris, but I wasn't sure if that one was quite right to hang this off of. Maybe it should be yet another issue of its own. Anyway: this is a follow-up to the email thread entitled "Runtime.exec : vfork() concerns and a fix proposal", where it was casually mentioned that maybe posix_spawn could become an option on Linux, whereafter it could be thoroughly tested by brave individuals and eventually maybe become the default on that platform, obsoleting the vfork support for good. The following patch does just that. I've tested it launching a multi-process WildFly instance a bunch of times, in conjunction with the conveniently existent "jdk.lang.Process.launchMechanism" property, and nothing exploded so here it is. The usual deal with git patches: apply directly through "patch -p1". commit f0eb9ff7c46dff76f91160491fcca0eb25d0ab95 Author: David M. Lloyd <david.lloyd@redhat.com> Date: Thu Oct 18 15:56:37 2018 -0500 [JDK-6850720] Enable POSIX_SPAWN as an option for child process creation on Linux diff --git a/make/launcher/Launcher-java.base.gmk b/make/launcher/Launcher-java.base.gmk index 0ce0287d2be..c28fe42d102 100644 --- a/make/launcher/Launcher-java.base.gmk +++ b/make/launcher/Launcher-java.base.gmk @@ -84,7 +84,7 @@ endif ################################################################################ -ifneq ($(findstring $(OPENJDK_TARGET_OS), macosx solaris aix), ) +ifneq ($(findstring $(OPENJDK_TARGET_OS), macosx solaris aix linux), ) $(eval $(call SetupJdkExecutable, BUILD_JSPAWNHELPER, \ NAME := jspawnhelper, \ SRC := $(TOPDIR)/src/$(MODULE)/unix/native/jspawnhelper, \ diff --git a/src/java.base/unix/classes/java/lang/ProcessImpl.java b/src/java.base/unix/classes/java/lang/ProcessImpl.java index 368a4f7380b..959e50dfecd 100644 --- a/src/java.base/unix/classes/java/lang/ProcessImpl.java +++ b/src/java.base/unix/classes/java/lang/ProcessImpl.java @@ -89,7 +89,7 @@ final class ProcessImpl extends Process { private static enum Platform { - LINUX(LaunchMechanism.VFORK, LaunchMechanism.FORK), + LINUX(LaunchMechanism.VFORK, LaunchMechanism.POSIX_SPAWN, LaunchMechanism.FORK), BSD(LaunchMechanism.POSIX_SPAWN, LaunchMechanism.FORK), diff --git a/src/java.base/unix/native/libjava/ProcessImpl_md.c b/src/java.base/unix/native/libjava/ProcessImpl_md.c index 533584fdb7a..6869a64f2cc 100644 --- a/src/java.base/unix/native/libjava/ProcessImpl_md.c +++ b/src/java.base/unix/native/libjava/ProcessImpl_md.c @@ -44,7 +44,7 @@ #include <signal.h> #include <string.h> -#if defined(__solaris__) || defined(_ALLBSD_SOURCE) || defined(_AIX) +#if defined(__solaris__) || defined(_ALLBSD_SOURCE) || defined(_AIX) || defined(__linux__) #include <spawn.h> #endif @@ -390,7 +390,7 @@ forkChild(ChildStuff *c) { return resultPid; } -#if defined(__solaris__) || defined(_ALLBSD_SOURCE) || defined(_AIX) +#if defined(__solaris__) || defined(_ALLBSD_SOURCE) || defined(_AIX) || defined(__linux__) static pid_t spawnChild(JNIEnv *env, jobject process, ChildStuff *c, const char *helperpath) { pid_t resultPid; @@ -489,7 +489,7 @@ startChild(JNIEnv *env, jobject process, ChildStuff *c, const char *helperpath) #endif case MODE_FORK: return forkChild(c); -#if defined(__solaris__) || defined(_ALLBSD_SOURCE) || defined(_AIX) +#if defined(__solaris__) || defined(_ALLBSD_SOURCE) || defined(_AIX) || defined(__linux__) case MODE_POSIX_SPAWN: return spawnChild(env, process, c, helperpath); #endif
Hi David, (for convenience, here the old thread: http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-September/055333.h...) I would rather not expose a third way to spawn to the end user. As a test switch for developers, this seems fine. AFAIK (and someone please correct me if I am wrong) posix_spawn is not the magical third way beside fork and vfork, it is just a wrapper around fork/vfork and exec. Especially posix_spawn(), if using fork(), will still have the same pitfalls raw fork() has - quasi-random ENOMEM if we hit the overcommit heuristic, and inferior performance compared to vfork. So you have nothing gained, you have just relegated the fork/vfork decision down to a different layer, one which you have little control over. And since that fork/vfork decision is kind of important, you need that contol. Note that glibc has a flag to force always vfork behavior: POSIX_SPAWN_USEVFORK. The question is, how tight do we bind us to GNU-only features, since we may run on linux with a different libc, e.g. on Alpine we use muslc. My proposal, vfork() + the exec-twice-trick, is IMHO the best way to go. That exec-twice trick takes the sting away from vfork. After Martin made supportive noises in the other mail thread, I have planned in some time later this year to produce a patch for this. -- However: I just realized that my proposal (vfork() + exec-twice-trick) is very close, implementation wise, to the (posix_spawn() + POSIX_SPAWN_USEVFORK + jspawnhelper). The jspawnhelper already implements the exec-twice trick, if only for different reasons than to get away from vfork dangers. Advantage of this implementation would be that very little code would have to change. Disadvantage, in my eyes, is that we bind us to glibc, and an aesthetic side: calling an OS API but needing it to work in one specific way feels like programming with mechanical gloves... Thanks & Best Regards, Thomas (Copying Martin, I'm curious what he thinks) -- On Thu, Oct 18, 2018 at 11:43 PM David Lloyd <david.lloyd@redhat.com> wrote:
The issue 6850720 isn't _exactly_ to use POSIX_SPAWN for process launching on Linux, but it's the closest I could find out of what are really a surprisingly large number of issues that refer to posix_spawn in one way or another relating to ProcessImpl. There's a different issue to move from vfork to posix_spawn on Solaris, but I wasn't sure if that one was quite right to hang this off of. Maybe it should be yet another issue of its own.
Anyway: this is a follow-up to the email thread entitled "Runtime.exec : vfork() concerns and a fix proposal", where it was casually mentioned that maybe posix_spawn could become an option on Linux, whereafter it could be thoroughly tested by brave individuals and eventually maybe become the default on that platform, obsoleting the vfork support for good.
The following patch does just that. I've tested it launching a multi-process WildFly instance a bunch of times, in conjunction with the conveniently existent "jdk.lang.Process.launchMechanism" property, and nothing exploded so here it is. The usual deal with git patches: apply directly through "patch -p1".
commit f0eb9ff7c46dff76f91160491fcca0eb25d0ab95 Author: David M. Lloyd <david.lloyd@redhat.com> Date: Thu Oct 18 15:56:37 2018 -0500
[JDK-6850720] Enable POSIX_SPAWN as an option for child process creation on Linux
diff --git a/make/launcher/Launcher-java.base.gmk b/make/launcher/Launcher-java.base.gmk index 0ce0287d2be..c28fe42d102 100644 --- a/make/launcher/Launcher-java.base.gmk +++ b/make/launcher/Launcher-java.base.gmk @@ -84,7 +84,7 @@ endif
################################################################################
-ifneq ($(findstring $(OPENJDK_TARGET_OS), macosx solaris aix), ) +ifneq ($(findstring $(OPENJDK_TARGET_OS), macosx solaris aix linux), ) $(eval $(call SetupJdkExecutable, BUILD_JSPAWNHELPER, \ NAME := jspawnhelper, \ SRC := $(TOPDIR)/src/$(MODULE)/unix/native/jspawnhelper, \ diff --git a/src/java.base/unix/classes/java/lang/ProcessImpl.java b/src/java.base/unix/classes/java/lang/ProcessImpl.java index 368a4f7380b..959e50dfecd 100644 --- a/src/java.base/unix/classes/java/lang/ProcessImpl.java +++ b/src/java.base/unix/classes/java/lang/ProcessImpl.java @@ -89,7 +89,7 @@ final class ProcessImpl extends Process {
private static enum Platform {
- LINUX(LaunchMechanism.VFORK, LaunchMechanism.FORK), + LINUX(LaunchMechanism.VFORK, LaunchMechanism.POSIX_SPAWN, LaunchMechanism.FORK),
BSD(LaunchMechanism.POSIX_SPAWN, LaunchMechanism.FORK),
diff --git a/src/java.base/unix/native/libjava/ProcessImpl_md.c b/src/java.base/unix/native/libjava/ProcessImpl_md.c index 533584fdb7a..6869a64f2cc 100644 --- a/src/java.base/unix/native/libjava/ProcessImpl_md.c +++ b/src/java.base/unix/native/libjava/ProcessImpl_md.c @@ -44,7 +44,7 @@ #include <signal.h> #include <string.h>
-#if defined(__solaris__) || defined(_ALLBSD_SOURCE) || defined(_AIX) +#if defined(__solaris__) || defined(_ALLBSD_SOURCE) || defined(_AIX) || defined(__linux__) #include <spawn.h> #endif
@@ -390,7 +390,7 @@ forkChild(ChildStuff *c) { return resultPid; }
-#if defined(__solaris__) || defined(_ALLBSD_SOURCE) || defined(_AIX) +#if defined(__solaris__) || defined(_ALLBSD_SOURCE) || defined(_AIX) || defined(__linux__) static pid_t spawnChild(JNIEnv *env, jobject process, ChildStuff *c, const char *helperpath) { pid_t resultPid; @@ -489,7 +489,7 @@ startChild(JNIEnv *env, jobject process, ChildStuff *c, const char *helperpath) #endif case MODE_FORK: return forkChild(c); -#if defined(__solaris__) || defined(_ALLBSD_SOURCE) || defined(_AIX) +#if defined(__solaris__) || defined(_ALLBSD_SOURCE) || defined(_AIX) || defined(__linux__) case MODE_POSIX_SPAWN: return spawnChild(env, process, c, helperpath); #endif
On Fri, Oct 19, 2018 at 1:56 AM Thomas Stüfe <thomas.stuefe@gmail.com> wrote:
Hi David,
(for convenience, here the old thread: http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-September/055333.h...)
I would rather not expose a third way to spawn to the end user. As a test switch for developers, this seems fine.
AFAIK (and someone please correct me if I am wrong) posix_spawn is not the magical third way beside fork and vfork, it is just a wrapper around fork/vfork and exec. Especially posix_spawn(), if using fork(), will still have the same pitfalls raw fork() has - quasi-random ENOMEM if we hit the overcommit heuristic, and inferior performance compared to vfork. So you have nothing gained, you have just relegated the fork/vfork decision down to a different layer, one which you have little control over. And since that fork/vfork decision is kind of important, you need that contol.
We've gained the same thing that was gained in [1] and for the same reason: vfork is deprecated and it is an error to continue to rely directly on it. The only non-deprecated means of using vfork is via posix_spawn. And, posix_spawn is not the game of roulette that you make it out to be. To quote the manual page: "The child process is created using vfork(2) instead of fork(2) when either of the following is true: * the spawn-flags element of the attributes object pointed to by attrp contains the GNU-specific flag POSIX_SPAWN_USEVFORK; or * file_actions is NULL and the spawn-flags element of the attributes object pointed to by attrp does not contain POSIX_SPAWN_SETSIGMASK, POSIX_SPAWN_SETSIGDEF, POSIX_SPAWN_SETSCHEDPARAM, POSIX_SPAWN_SETSCHEDULER, POSIX_SPAWN_SETPGROUP, or POSIX_SPAWN_RESETIDS. In other words, vfork(2) is used if the caller requests it, or if there is no cleanup expected in the child before it exec(3)s the requested file." It is true that we aren't explicitly requesting it, however, in our case file_actions is NULL and attrp is also NULL, so posix_spawn will use vfork* on Linux. * Actually neither fork nor vfork use anything but the clone() syscall. To quote the sources, "The Linux implementation of posix_spawn{p} uses the clone syscall directly with CLONE_VM and CLONE_VFORK flags and an allocated stack."
Note that glibc has a flag to force always vfork behavior: POSIX_SPAWN_USEVFORK.
Sure, but I don't really see this as necessary if glibc is already following the vfork-like path. Another thing to know is that at least in the 2.26 sources I have checked out, the POSIX_SPAWN_USEVFORK flag is completely ignored. See also [2]. [1] https://bugs.openjdk.java.net/browse/JDK-8161360 [2] https://github.com/bminor/glibc/commit/ccfb2964726512f6669fea99a43afa714e2e6... -- - DML
Hi David, On Fri, Oct 19, 2018 at 2:20 PM David Lloyd <david.lloyd@redhat.com> wrote:
On Fri, Oct 19, 2018 at 1:56 AM Thomas Stüfe <thomas.stuefe@gmail.com> wrote:
Hi David,
(for convenience, here the old thread: http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-September/055333.h...)
I would rather not expose a third way to spawn to the end user. As a test switch for developers, this seems fine.
AFAIK (and someone please correct me if I am wrong) posix_spawn is not the magical third way beside fork and vfork, it is just a wrapper around fork/vfork and exec. Especially posix_spawn(), if using fork(), will still have the same pitfalls raw fork() has - quasi-random ENOMEM if we hit the overcommit heuristic, and inferior performance compared to vfork. So you have nothing gained, you have just relegated the fork/vfork decision down to a different layer, one which you have little control over. And since that fork/vfork decision is kind of important, you need that contol.
We've gained the same thing that was gained in [1] and for the same reason: vfork is deprecated and it is an error to continue to rely directly on it. The only non-deprecated means of using vfork is via posix_spawn.
To me this seems not a strong reason. I do not see vfork() ever going away. My issue with posix_spawn() is that we hand control over how the way we fork to the libc implementation. That implementation may change. It did so in the past, for both glibc and muslc (I think muslc is important beside glibc, see https://openjdk.java.net/projects/portola/). My point is since we know exactly what we want we may be better off doing that ourselves. Also, in the past libc implementators have not been exactly easy to deal with, so if problems with posix_spawn() come up, it may be difficult to get them fixed.
And, posix_spawn is not the game of roulette that you make it out to be. To quote the manual page:
"The child process is created using vfork(2) instead of fork(2) when either of the following is true:
* the spawn-flags element of the attributes object pointed to by attrp contains the GNU-specific flag POSIX_SPAWN_USEVFORK; or
* file_actions is NULL and the spawn-flags element of the attributes object pointed to by attrp does not contain POSIX_SPAWN_SETSIGMASK, POSIX_SPAWN_SETSIGDEF, POSIX_SPAWN_SETSCHEDPARAM, POSIX_SPAWN_SETSCHEDULER, POSIX_SPAWN_SETPGROUP, or POSIX_SPAWN_RESETIDS.
In other words, vfork(2) is used if the caller requests it, or if there is no cleanup expected in the child before it exec(3)s the requested file."
It is true that we aren't explicitly requesting it, however, in our case file_actions is NULL and attrp is also NULL, so posix_spawn will use vfork* on Linux.
* Actually neither fork nor vfork use anything but the clone() syscall. To quote the sources, "The Linux implementation of posix_spawn{p} uses the clone syscall directly with CLONE_VM and CLONE_VFORK flags and an allocated stack."
Note that glibc has a flag to force always vfork behavior: POSIX_SPAWN_USEVFORK.
Sure, but I don't really see this as necessary if glibc is already following the vfork-like path. Another thing to know is that at least in the 2.26 sources I have checked out, the POSIX_SPAWN_USEVFORK flag is completely ignored. See also [2].
[1] https://bugs.openjdk.java.net/browse/JDK-8161360 [2] https://github.com/bminor/glibc/commit/ccfb2964726512f6669fea99a43afa714e2e6...
-- - DML
You may be wrong about [2]. The relevant implementation in glibc seems to be this: https://github.com/tstuefe/glibc/blob/master/sysdeps/unix/sysv/linux/spawni.... and this seems to be the relevant change: https://github.com/tstuefe/glibc/commit/9ff72da471a509a8c19791efe469f47fa697... ( [2] is the "posix implementation". I am not sure what that is, maybe for Hurd?) So, glibc posix_spawn() uses clone with CLONE_VM and CLONE_VFORK. Interesting: if I understand the code right, they mitigate part of the dangers of vfork aka CLONE_VM by allocating a new separate stack for the child. --- Lets see what muslc does: https://github.com/esmil/musl/blob/master/src/process/posix_spawn.c So, they do exactly the same (I think the musl folks did this before glibc switched to clone). use clone(CLONE_VM|CLONE_VFORK). Hm okay. In conjunction with jspawnhelper this may work fine... I have to mull this over a bit more. -- One concern for me is that the change in the glibc is quite recent - 2016 - and that there may be many old glibc variants in the field installed which still use fork(). On those installations, switching the java vm to suddenly use posix_spawn may introduce regressions. My main concern - reliance on glibc developers for something we should better do ourselves - is unshaken, but lets see what others say. Kind Regards, Thomas
As author of the vfork strategy ... I'm supportive of the directions in this thread.👍👍 David's patch seems like clear progress (although maybe now that we have configure, we can make the spawn strategy conditional on HAVE_SPAWN_H) vfork is even (!) less in favor than it used to be, so migrating off of it seems good. Just make sure we don't bring back the OOM problems or unacceptable overhead from e.g. an second exec. Create a 100GB heap, then run /bin/true 1000 times. That big internal comment I wrote years ago is suffering from bitrot. IIRC vfork used to call a special vfork system call, NOT clone, years ago.
On Sun, Oct 21, 2018 at 7:25 PM Martin Buchholz <martinrb@google.com> wrote:
As author of the vfork strategy ... I'm supportive of the directions in this thread.
Great.
vfork is even (!) less in favor than it used to be, so migrating off of it seems good. Just make sure we don't bring back the OOM problems or unacceptable overhead from e.g. an second exec. Create a 100GB heap, then run /bin/true 1000 times.
Tried it and it succeeded without issues. I didn't use a 100GB heap though because I don't have that kind of RAM! But, I created a long chain of object arrays totalling about 8GB and forced it to stay alive until after the 1000 execs of /bin/true and it worked fine. To verify that the program works correctly, I compared the results with VFORK and FORK. As expected, VFORK also worked correctly, but FORK failed with errno=ENOMEM. -- - DML
* David Lloyd:
Sure, but I don't really see this as necessary if glibc is already following the vfork-like path. Another thing to know is that at least in the 2.26 sources I have checked out, the POSIX_SPAWN_USEVFORK flag is completely ignored. See also [2].
Right, the manual pages are outdated. For applications, there is never a good reason to use POSIX_SPAWN_USEVFORK because the glibc implementation is either buggy when this flag is used, or the flag does nothing. The bugs may matter to applications using OpenJDK, so I don't think you can set the flag within OpenJDK. So the only thing you can do here is to use posix_spawn *without* POSIX_SPAWN_USEVFORK, and advise OS vendors to backport the glibc changes if they have customers that are impacted by the lack of process creation performance (or OOM during process creation). Another possibility would be to emulate what glibc's fixed, fork-based posix_spawn does, but this requires writing some machine code (for vfork/clone) and issuing direct system calls to bypass some abstractions in glibc (for setprocmask).
[2] https://github.com/bminor/glibc/commit/ccfb2964726512f6669fea99a43afa714e2e6...
Note that this neither the canonical glibc source code location, nor is the code actually used on Linux. 8-) Thanks, Florian
Hi all, First off, to be clear, I am certainly shedding no tears if I do not get to contribute my vfork+exec*2 patch to upstream. It is a lot of work, and I much rather do something else. However, I have spend too much of my life with the Runtime.exec() layer to be easy about changing it. So far I have not read a single technical reason in this thread why vfork needs to be abandoned now - apart from it being obsolete. If you read my initial thread from September, you know that I think we have understood vfork's shortcomings very well, and that our (SAPs) proposed patch shows that they can be dealt with. In our port, our vfork+exec*2 is solid since many years, without any issues. I see that posix_spawn(), in its current form in the glibc and muslc, could be a good option. Long term, maybe better than a home grown one. But before we switch to posix_spawn(), I would like to understand better, at least for the major distros, which distro would be affected by regressions. Also, if we decide to switch to posix_spawn(), I would like to prevent "accidental backporting" without checking the platform requirements. The current posix_spawn() implementation was added to glibc with glibc 2.24. So, what was the state of posix_spawn() before that version? Is it safe to use, does it do the right thing, or will we encounter regressions? My Ubuntu 16.04 box runs glibc 2.23. Arguably, Ubuntu 16.04 is quite a common distro. I have to check our machines at work, but I am very sure that our zoo of SLES and RHEL servers do not all run glibc>=2.24, especially on the more exotic architectures. @Florian:
and advise OS vendors to backport the glibc changes if they have customers that are impacted by the lack of process creation performance (or OOM during process creation).
Please not. We have no leverage toward the OS vendors, we cannot "advise" anything. Even if OS vendors react and fix, we disrupt our users and make them unhappy with us and with java. Every issue has to be first analyzed before it even finds the way to us, let alone the OS vendor. Lets not risk breaking userspace (don't you guys have a rule like that :)? Best Regards, Thomas On Mon, Oct 22, 2018 at 3:37 PM Florian Weimer <fweimer@redhat.com> wrote:
* David Lloyd:
Sure, but I don't really see this as necessary if glibc is already following the vfork-like path. Another thing to know is that at least in the 2.26 sources I have checked out, the POSIX_SPAWN_USEVFORK flag is completely ignored. See also [2].
Right, the manual pages are outdated.
For applications, there is never a good reason to use POSIX_SPAWN_USEVFORK because the glibc implementation is either buggy when this flag is used, or the flag does nothing. The bugs may matter to applications using OpenJDK, so I don't think you can set the flag within OpenJDK. So the only thing you can do here is to use posix_spawn *without* POSIX_SPAWN_USEVFORK, and advise OS vendors to backport the glibc changes if they have customers that are impacted by the lack of process creation performance (or OOM during process creation).
Another possibility would be to emulate what glibc's fixed, fork-based posix_spawn does, but this requires writing some machine code (for vfork/clone) and issuing direct system calls to bypass some abstractions in glibc (for setprocmask).
[2] https://github.com/bminor/glibc/commit/ccfb2964726512f6669fea99a43afa714e2e6...
Note that this neither the canonical glibc source code location, nor is the code actually used on Linux. 8-)
Thanks, Florian
* Thomas Stüfe:
So far I have not read a single technical reason in this thread why vfork needs to be abandoned now - apart from it being obsolete. If you read my initial thread from September, you know that I think we have understood vfork's shortcomings very well, and that our (SAPs) proposed patch shows that they can be dealt with. In our port, our vfork+exec*2 is solid since many years, without any issues.
The main problem for vfork in application code is that you need to *all* disable signals, even signals used by the implementation. If a signal handler runs by accident while the vfork is active, memory corruption is practically guaranteed. The only way to disable the signals is with a direct system call; sigprocmask/pthread_sigmask do not work. Does your implementation do this?
The current posix_spawn() implementation was added to glibc with glibc 2.24. So, what was the state of posix_spawn() before that version? Is it safe to use, does it do the right thing, or will we encounter regressions?
It uses fork by default. It can be told to use vfork, via POSIX_SPAWN_USEVFORK, but then it is buggy. For generic JDK code, this seems hardly appropriate.
My Ubuntu 16.04 box runs glibc 2.23. Arguably, Ubuntu 16.04 is quite a common distro. I have to check our machines at work, but I am very sure that our zoo of SLES and RHEL servers do not all run glibc>=2.24, especially on the more exotic architectures.
In glibc, the vfork-based performance does not bring in any new ABIs, so it is in theory backportable. The main risk is that the vfork optimization landed in glibc 2.24, and the PID cache was removed in glibc 2.25. vfork with the PID cache was really iffy, but I would not recommend to backport the PID cache removal. But Debian 9/stretch uses glibc 2.24, and I think that shows that the vfork optimization with the PID cache should be safe enough. (Of course you need to remove the assert that fires if the vfork does not actually stop the parent process and is implemented as a fork; the glibc implementation still works, but with somewhat degraded error checking.) How far back would you want to see this changed? Debian jessie and Red Hat Enterprise Linux 6 would be rather unlikely. If you want to target those, your only chance is to essentially duplicate the glibc implementation in OpenJDK. Thanks, Florian
Hi Florian, our mails crossed... I think I am fine now with posix_spawn(), provided we do enough testing. But I'll answer your questions inline. On Mon, Oct 22, 2018 at 9:00 PM Florian Weimer <fweimer@redhat.com> wrote:
* Thomas Stüfe:
So far I have not read a single technical reason in this thread why vfork needs to be abandoned now - apart from it being obsolete. If you read my initial thread from September, you know that I think we have understood vfork's shortcomings very well, and that our (SAPs) proposed patch shows that they can be dealt with. In our port, our vfork+exec*2 is solid since many years, without any issues.
The main problem for vfork in application code is that you need to *all* disable signals, even signals used by the implementation. If a signal handler runs by accident while the vfork is active, memory corruption is practically guaranteed. The only way to disable the signals is with a direct system call; sigprocmask/pthread_sigmask do not work.
Does your implementation do this?
I understand. No, admittedly not. But we squeeze the vulnerable time window to the minimal possible: if (vfork() == 0) exec(..); which was a large step forward from the stock Ojdk solution. While not completely bullet proof, I saw not a single instance of an error in all these years (I understand those errors would be very intermittent and difficult to attribute to vfork+signalling, so we may have missed some).
The current posix_spawn() implementation was added to glibc with glibc 2.24. So, what was the state of posix_spawn() before that version? Is it safe to use, does it do the right thing, or will we encounter regressions?
It uses fork by default. It can be told to use vfork, via POSIX_SPAWN_USEVFORK, but then it is buggy. For generic JDK code, this seems hardly appropriate.
Are you sure about this? The coding I saw in glibc < 2.24 was that it would use vfork if both attributes and file actions were NULL, which should be the case with the OpenJDK and jspawnhelper. fork() would be bad and a reason not to use posix_spawn().
My Ubuntu 16.04 box runs glibc 2.23. Arguably, Ubuntu 16.04 is quite a common distro. I have to check our machines at work, but I am very sure that our zoo of SLES and RHEL servers do not all run glibc>=2.24, especially on the more exotic architectures.
In glibc, the vfork-based performance does not bring in any new ABIs, so it is in theory backportable. The main risk is that the vfork optimization landed in glibc 2.24, and the PID cache was removed in glibc 2.25. vfork with the PID cache was really iffy, but I would not recommend to backport the PID cache removal. But Debian 9/stretch uses glibc 2.24, and I think that shows that the vfork optimization with the PID cache should be safe enough. (Of course you need to remove the assert that fires if the vfork does not actually stop the parent process and is implemented as a fork; the glibc implementation still works, but with somewhat degraded error checking.)
How far back would you want to see this changed? Debian jessie and Red Hat Enterprise Linux 6 would be rather unlikely. If you want to target those, your only chance is to essentially duplicate the glibc implementation in OpenJDK.
As I wrote before, if I understand the coding in glibc between 2.4 and 2.24 correctly, I think it uses vfork() and that should be fine by me: posix_spawn() using vfork(), with no attributes/file actions and in conjunction with the jspawnhelper, is almost exactly the same as the proposed vfork() + exec*2 patch: posix_spawn() will exec() immediately after the vfork(), then, in jspwnhelper, we set up the new process and exec() again. So I am fine with that. Provided I have understood all that stuff correctly and not made a thinking error somewhere. Cheers, Thomas
Thanks, Florian
* Thomas Stüfe:
The main problem for vfork in application code is that you need to *all* disable signals, even signals used by the implementation. If a signal handler runs by accident while the vfork is active, memory corruption is practically guaranteed. The only way to disable the signals is with a direct system call; sigprocmask/pthread_sigmask do not work.
Does your implementation do this?
I understand. No, admittedly not. But we squeeze the vulnerable time window to the minimal possible:
if (vfork() == 0) exec(..);
which was a large step forward from the stock Ojdk solution.
While not completely bullet proof, I saw not a single instance of an error in all these years (I understand those errors would be very intermittent and difficult to attribute to vfork+signalling, so we may have missed some).
Well, such tight races never matter until they suddenly do. I'm always amazed how races previously thought impossible reproduce reliably with high probability, given the right circumstances. The highlight so far was a single-instruction race (not even a data race) in our condition variable implementation, which broke our iSCSI implementation.
The current posix_spawn() implementation was added to glibc with glibc 2.24. So, what was the state of posix_spawn() before that version? Is it safe to use, does it do the right thing, or will we encounter regressions?
It uses fork by default. It can be told to use vfork, via POSIX_SPAWN_USEVFORK, but then it is buggy. For generic JDK code, this seems hardly appropriate.
Are you sure about this? The coding I saw in glibc < 2.24 was that it would use vfork if both attributes and file actions were NULL, which should be the case with the OpenJDK and jspawnhelper.
Oh, right. All the more reason to backport the glibc 2.24 change. 8-) Thanks, Florian
On Mon, Oct 22, 2018 at 1:23 PM Thomas Stüfe <thomas.stuefe@gmail.com> wrote:
Hi all, [...] So far I have not read a single technical reason in this thread why vfork needs to be abandoned now
Note that my patch does not abandon vfork. It does not even change the default exec strategy away from vfork, nor does it cause any obstacle to your proposed change. It only adds the ability for users to opt in to the existing posix_spawn implementation on Linux. It's just a first step, in whatever direction we might be going, which is far from decided. As you and Martin have both implied, it's a good idea to be very cautious in this area.
The current posix_spawn() implementation was added to glibc with glibc 2.24. So, what was the state of posix_spawn() before that version? Is it safe to use, does it do the right thing, or will we encounter regressions?
I certainly have not suggested making posix_spawn be the default on Linux at this time. But if we don't make it at least a possible option, it cannot be tested *at all*, by anyone, unless they're adventurous enough to hack OpenJDK themselves. The questions you ask can not be answered without at least making it a choice. There are many reasonable paths that can be taken after this patch is merged (if it is): • It could stop here and keep the status quo; users who have trouble with the current implementation can try out posix_spawn • The Linux port could be modified similarly to your proposal to use the existing jspawnhelper infrastructure with vfork on Linux, while still allowing posix_spawn to be manually selected • The Linux port could be modified to use posix_spawn if gnu_get_libc_version(3) and/or other run-time probes return an acceptable value, using vfork (either the original or your modified approach) otherwise, still allowing posix_spawn to be manually selected Other possibilities exist as well. Nothing in this email thread (AFAICT) invalidates your basic proposal, other than perhaps the observation that your proposal has similarities to the existing jspawnhelper path, so that could possibly be utilized. This patch only *increases* our possible future choices; it does not impose any restrictions. -- - DML
Hi David, On Mon, Oct 22, 2018 at 9:00 PM David Lloyd <david.lloyd@redhat.com> wrote:
On Mon, Oct 22, 2018 at 1:23 PM Thomas Stüfe <thomas.stuefe@gmail.com> wrote:
Hi all, [...] So far I have not read a single technical reason in this thread why vfork needs to be abandoned now
Note that my patch does not abandon vfork. It does not even change the default exec strategy away from vfork, nor does it cause any obstacle to your proposed change. It only adds the ability for users to opt in to the existing posix_spawn implementation on Linux. It's just a first step, in whatever direction we might be going, which is far from decided. As you and Martin have both implied, it's a good idea to be very cautious in this area.
The current posix_spawn() implementation was added to glibc with glibc 2.24. So, what was the state of posix_spawn() before that version? Is it safe to use, does it do the right thing, or will we encounter regressions?
I certainly have not suggested making posix_spawn be the default on Linux at this time. But if we don't make it at least a possible option, it cannot be tested *at all*, by anyone, unless they're adventurous enough to hack OpenJDK themselves. The questions you ask can not be answered without at least making it a choice.
There are many reasonable paths that can be taken after this patch is merged (if it is):
• It could stop here and keep the status quo; users who have trouble with the current implementation can try out posix_spawn • The Linux port could be modified similarly to your proposal to use the existing jspawnhelper infrastructure with vfork on Linux, while still allowing posix_spawn to be manually selected • The Linux port could be modified to use posix_spawn if gnu_get_libc_version(3) and/or other run-time probes return an acceptable value, using vfork (either the original or your modified approach) otherwise, still allowing posix_spawn to be manually selected
Other possibilities exist as well. Nothing in this email thread (AFAICT) invalidates your basic proposal, other than perhaps the observation that your proposal has similarities to the existing jspawnhelper path, so that could possibly be utilized. This patch only *increases* our possible future choices; it does not impose any restrictions.
I understand. Sorry for being stubborn. I have spent so much time debugging Runtime.exec() problems in the past that I am very paranoid about changes. Bugs in the Runtime.exec() layer tend to be so very hard to analyze. That said, as I wrote in my earlier mail, I think using posix_spawn() may be fine. If I understand the glibc coding correctly, we are good down to glibc 2.4, which is quite old. This will also save me a lot of work, since upstreaming our patch would have been a real pain. It is tightly interwoven with other proprietary components and would have effectively been a rewrite. So I think yes, your patch is fine. Lets get it in and test it. Best Regards, Thomas
-- - DML
I ran some tests on my local I ran a number of tests on various machines and architectures, all with glibc variants older than 2.24, and straced them, and they all used vfork() internally. That made me curious, and I dug into the glibc sources and examined the history. If I understand this correctly: 1) before glibc 2.4, posix_spawn() used just fork() 2) between glibc 2.4 and glibc 2.23, posix_spawn uses either fork() and vfork(), as still described in the manpage today. The relevant coding (sysdeps/posix/spawni.c) looks like this: 93 /* Do this once. */ 94 short int flags = attrp == NULL ? 0 : attrp->__flags; 95 96 /* Generate the new process. */ 97 if ((flags & POSIX_SPAWN_USEVFORK) != 0 98 /* If no major work is done, allow using vfork. Note that we 99 might perform the path searching. But this would be done by 100 a call to execvp(), too, and such a call must be OK according 101 to POSIX. */ 102 || ((flags & (POSIX_SPAWN_SETSIGMASK | POSIX_SPAWN_SETSIGDEF 103 | POSIX_SPAWN_SETSCHEDPARAM | POSIX_SPAWN_SETSCHEDULER 104 | POSIX_SPAWN_SETPGROUP | POSIX_SPAWN_RESETIDS)) == 0 105 && file_actions == NULL)) 106 new_pid = __vfork (); 107 else 108 new_pid = __fork (); 109 1 So, for our use case this means it always uses vfork() since we do not pass attributes nor actions, as David said before. We do not have to since we run the jspawnhelper which does all that for us at the expense of a second exec() call. This looks okay for me. 3) and since 2.24 we use the new linux variant in sysdeps/unix/sysv/linux/spawni.c 365 new_pid = CLONE (__spawni_child, STACK (stack, stack_size), stack_size, 366 CLONE_VM | CLONE_VFORK | SIGCHLD, &args); which is okay for me too. In fact this is better since the child process runs on its own stack before the exec(). --- Okay, seeing that (2) and (3) look fine and (1) only affects glibc versions older than 2.4, which came out in 2006, I admit this is okay and I worry too much. So I'll be quiet now :) Best Regards, Thomas Ubuntu 16.4 and various machines at work, with varying glibc versions as old as On Fri, Oct 19, 2018 at 8:55 AM Thomas Stüfe <thomas.stuefe@gmail.com> wrote:
Hi David,
(for convenience, here the old thread: http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-September/055333.h...)
I would rather not expose a third way to spawn to the end user. As a test switch for developers, this seems fine.
AFAIK (and someone please correct me if I am wrong) posix_spawn is not the magical third way beside fork and vfork, it is just a wrapper around fork/vfork and exec. Especially posix_spawn(), if using fork(), will still have the same pitfalls raw fork() has - quasi-random ENOMEM if we hit the overcommit heuristic, and inferior performance compared to vfork. So you have nothing gained, you have just relegated the fork/vfork decision down to a different layer, one which you have little control over. And since that fork/vfork decision is kind of important, you need that contol.
Note that glibc has a flag to force always vfork behavior: POSIX_SPAWN_USEVFORK. The question is, how tight do we bind us to GNU-only features, since we may run on linux with a different libc, e.g. on Alpine we use muslc.
My proposal, vfork() + the exec-twice-trick, is IMHO the best way to go. That exec-twice trick takes the sting away from vfork. After Martin made supportive noises in the other mail thread, I have planned in some time later this year to produce a patch for this.
--
However: I just realized that my proposal (vfork() + exec-twice-trick) is very close, implementation wise, to the (posix_spawn() + POSIX_SPAWN_USEVFORK + jspawnhelper). The jspawnhelper already implements the exec-twice trick, if only for different reasons than to get away from vfork dangers. Advantage of this implementation would be that very little code would have to change. Disadvantage, in my eyes, is that we bind us to glibc, and an aesthetic side: calling an OS API but needing it to work in one specific way feels like programming with mechanical gloves...
Thanks & Best Regards, Thomas
(Copying Martin, I'm curious what he thinks)
--
On Thu, Oct 18, 2018 at 11:43 PM David Lloyd <david.lloyd@redhat.com> wrote:
The issue 6850720 isn't _exactly_ to use POSIX_SPAWN for process launching on Linux, but it's the closest I could find out of what are really a surprisingly large number of issues that refer to posix_spawn in one way or another relating to ProcessImpl. There's a different issue to move from vfork to posix_spawn on Solaris, but I wasn't sure if that one was quite right to hang this off of. Maybe it should be yet another issue of its own.
Anyway: this is a follow-up to the email thread entitled "Runtime.exec : vfork() concerns and a fix proposal", where it was casually mentioned that maybe posix_spawn could become an option on Linux, whereafter it could be thoroughly tested by brave individuals and eventually maybe become the default on that platform, obsoleting the vfork support for good.
The following patch does just that. I've tested it launching a multi-process WildFly instance a bunch of times, in conjunction with the conveniently existent "jdk.lang.Process.launchMechanism" property, and nothing exploded so here it is. The usual deal with git patches: apply directly through "patch -p1".
commit f0eb9ff7c46dff76f91160491fcca0eb25d0ab95 Author: David M. Lloyd <david.lloyd@redhat.com> Date: Thu Oct 18 15:56:37 2018 -0500
[JDK-6850720] Enable POSIX_SPAWN as an option for child process creation on Linux
diff --git a/make/launcher/Launcher-java.base.gmk b/make/launcher/Launcher-java.base.gmk index 0ce0287d2be..c28fe42d102 100644 --- a/make/launcher/Launcher-java.base.gmk +++ b/make/launcher/Launcher-java.base.gmk @@ -84,7 +84,7 @@ endif
################################################################################
-ifneq ($(findstring $(OPENJDK_TARGET_OS), macosx solaris aix), ) +ifneq ($(findstring $(OPENJDK_TARGET_OS), macosx solaris aix linux), ) $(eval $(call SetupJdkExecutable, BUILD_JSPAWNHELPER, \ NAME := jspawnhelper, \ SRC := $(TOPDIR)/src/$(MODULE)/unix/native/jspawnhelper, \ diff --git a/src/java.base/unix/classes/java/lang/ProcessImpl.java b/src/java.base/unix/classes/java/lang/ProcessImpl.java index 368a4f7380b..959e50dfecd 100644 --- a/src/java.base/unix/classes/java/lang/ProcessImpl.java +++ b/src/java.base/unix/classes/java/lang/ProcessImpl.java @@ -89,7 +89,7 @@ final class ProcessImpl extends Process {
private static enum Platform {
- LINUX(LaunchMechanism.VFORK, LaunchMechanism.FORK), + LINUX(LaunchMechanism.VFORK, LaunchMechanism.POSIX_SPAWN, LaunchMechanism.FORK),
BSD(LaunchMechanism.POSIX_SPAWN, LaunchMechanism.FORK),
diff --git a/src/java.base/unix/native/libjava/ProcessImpl_md.c b/src/java.base/unix/native/libjava/ProcessImpl_md.c index 533584fdb7a..6869a64f2cc 100644 --- a/src/java.base/unix/native/libjava/ProcessImpl_md.c +++ b/src/java.base/unix/native/libjava/ProcessImpl_md.c @@ -44,7 +44,7 @@ #include <signal.h> #include <string.h>
-#if defined(__solaris__) || defined(_ALLBSD_SOURCE) || defined(_AIX) +#if defined(__solaris__) || defined(_ALLBSD_SOURCE) || defined(_AIX) || defined(__linux__) #include <spawn.h> #endif
@@ -390,7 +390,7 @@ forkChild(ChildStuff *c) { return resultPid; }
-#if defined(__solaris__) || defined(_ALLBSD_SOURCE) || defined(_AIX) +#if defined(__solaris__) || defined(_ALLBSD_SOURCE) || defined(_AIX) || defined(__linux__) static pid_t spawnChild(JNIEnv *env, jobject process, ChildStuff *c, const char *helperpath) { pid_t resultPid; @@ -489,7 +489,7 @@ startChild(JNIEnv *env, jobject process, ChildStuff *c, const char *helperpath) #endif case MODE_FORK: return forkChild(c); -#if defined(__solaris__) || defined(_ALLBSD_SOURCE) || defined(_AIX) +#if defined(__solaris__) || defined(_ALLBSD_SOURCE) || defined(_AIX) || defined(__linux__) case MODE_POSIX_SPAWN: return spawnChild(env, process, c, helperpath); #endif
I've submitted a bug report via bugreport.java.com. If/when it gets promoted to a proper JIRA with an issue number, I'll see if I can put the patch up on jdk/submit. On Thu, Oct 18, 2018 at 4:42 PM David Lloyd <david.lloyd@redhat.com> wrote:
The issue 6850720 isn't _exactly_ to use POSIX_SPAWN for process launching on Linux, but it's the closest I could find out of what are really a surprisingly large number of issues that refer to posix_spawn in one way or another relating to ProcessImpl. There's a different issue to move from vfork to posix_spawn on Solaris, but I wasn't sure if that one was quite right to hang this off of. Maybe it should be yet another issue of its own.
Anyway: this is a follow-up to the email thread entitled "Runtime.exec : vfork() concerns and a fix proposal", where it was casually mentioned that maybe posix_spawn could become an option on Linux, whereafter it could be thoroughly tested by brave individuals and eventually maybe become the default on that platform, obsoleting the vfork support for good.
The following patch does just that. I've tested it launching a multi-process WildFly instance a bunch of times, in conjunction with the conveniently existent "jdk.lang.Process.launchMechanism" property, and nothing exploded so here it is. The usual deal with git patches: apply directly through "patch -p1".
commit f0eb9ff7c46dff76f91160491fcca0eb25d0ab95 Author: David M. Lloyd <david.lloyd@redhat.com> Date: Thu Oct 18 15:56:37 2018 -0500
[JDK-6850720] Enable POSIX_SPAWN as an option for child process creation on Linux
diff --git a/make/launcher/Launcher-java.base.gmk b/make/launcher/Launcher-java.base.gmk index 0ce0287d2be..c28fe42d102 100644 --- a/make/launcher/Launcher-java.base.gmk +++ b/make/launcher/Launcher-java.base.gmk @@ -84,7 +84,7 @@ endif
################################################################################
-ifneq ($(findstring $(OPENJDK_TARGET_OS), macosx solaris aix), ) +ifneq ($(findstring $(OPENJDK_TARGET_OS), macosx solaris aix linux), ) $(eval $(call SetupJdkExecutable, BUILD_JSPAWNHELPER, \ NAME := jspawnhelper, \ SRC := $(TOPDIR)/src/$(MODULE)/unix/native/jspawnhelper, \ diff --git a/src/java.base/unix/classes/java/lang/ProcessImpl.java b/src/java.base/unix/classes/java/lang/ProcessImpl.java index 368a4f7380b..959e50dfecd 100644 --- a/src/java.base/unix/classes/java/lang/ProcessImpl.java +++ b/src/java.base/unix/classes/java/lang/ProcessImpl.java @@ -89,7 +89,7 @@ final class ProcessImpl extends Process {
private static enum Platform {
- LINUX(LaunchMechanism.VFORK, LaunchMechanism.FORK), + LINUX(LaunchMechanism.VFORK, LaunchMechanism.POSIX_SPAWN, LaunchMechanism.FORK),
BSD(LaunchMechanism.POSIX_SPAWN, LaunchMechanism.FORK),
diff --git a/src/java.base/unix/native/libjava/ProcessImpl_md.c b/src/java.base/unix/native/libjava/ProcessImpl_md.c index 533584fdb7a..6869a64f2cc 100644 --- a/src/java.base/unix/native/libjava/ProcessImpl_md.c +++ b/src/java.base/unix/native/libjava/ProcessImpl_md.c @@ -44,7 +44,7 @@ #include <signal.h> #include <string.h>
-#if defined(__solaris__) || defined(_ALLBSD_SOURCE) || defined(_AIX) +#if defined(__solaris__) || defined(_ALLBSD_SOURCE) || defined(_AIX) || defined(__linux__) #include <spawn.h> #endif
@@ -390,7 +390,7 @@ forkChild(ChildStuff *c) { return resultPid; }
-#if defined(__solaris__) || defined(_ALLBSD_SOURCE) || defined(_AIX) +#if defined(__solaris__) || defined(_ALLBSD_SOURCE) || defined(_AIX) || defined(__linux__) static pid_t spawnChild(JNIEnv *env, jobject process, ChildStuff *c, const char *helperpath) { pid_t resultPid; @@ -489,7 +489,7 @@ startChild(JNIEnv *env, jobject process, ChildStuff *c, const char *helperpath) #endif case MODE_FORK: return forkChild(c); -#if defined(__solaris__) || defined(_ALLBSD_SOURCE) || defined(_AIX) +#if defined(__solaris__) || defined(_ALLBSD_SOURCE) || defined(_AIX) || defined(__linux__) case MODE_POSIX_SPAWN: return spawnChild(env, process, c, helperpath); #endif
-- - DML
Oh, I can open a bug report on JBS for you. Should I? (Now I understand the "reuse bug id"). On Tue, Oct 23, 2018 at 3:18 PM David Lloyd <david.lloyd@redhat.com> wrote:
I've submitted a bug report via bugreport.java.com. If/when it gets promoted to a proper JIRA with an issue number, I'll see if I can put the patch up on jdk/submit. On Thu, Oct 18, 2018 at 4:42 PM David Lloyd <david.lloyd@redhat.com> wrote:
The issue 6850720 isn't _exactly_ to use POSIX_SPAWN for process launching on Linux, but it's the closest I could find out of what are really a surprisingly large number of issues that refer to posix_spawn in one way or another relating to ProcessImpl. There's a different issue to move from vfork to posix_spawn on Solaris, but I wasn't sure if that one was quite right to hang this off of. Maybe it should be yet another issue of its own.
Anyway: this is a follow-up to the email thread entitled "Runtime.exec : vfork() concerns and a fix proposal", where it was casually mentioned that maybe posix_spawn could become an option on Linux, whereafter it could be thoroughly tested by brave individuals and eventually maybe become the default on that platform, obsoleting the vfork support for good.
The following patch does just that. I've tested it launching a multi-process WildFly instance a bunch of times, in conjunction with the conveniently existent "jdk.lang.Process.launchMechanism" property, and nothing exploded so here it is. The usual deal with git patches: apply directly through "patch -p1".
commit f0eb9ff7c46dff76f91160491fcca0eb25d0ab95 Author: David M. Lloyd <david.lloyd@redhat.com> Date: Thu Oct 18 15:56:37 2018 -0500
[JDK-6850720] Enable POSIX_SPAWN as an option for child process creation on Linux
diff --git a/make/launcher/Launcher-java.base.gmk b/make/launcher/Launcher-java.base.gmk index 0ce0287d2be..c28fe42d102 100644 --- a/make/launcher/Launcher-java.base.gmk +++ b/make/launcher/Launcher-java.base.gmk @@ -84,7 +84,7 @@ endif
################################################################################
-ifneq ($(findstring $(OPENJDK_TARGET_OS), macosx solaris aix), ) +ifneq ($(findstring $(OPENJDK_TARGET_OS), macosx solaris aix linux), ) $(eval $(call SetupJdkExecutable, BUILD_JSPAWNHELPER, \ NAME := jspawnhelper, \ SRC := $(TOPDIR)/src/$(MODULE)/unix/native/jspawnhelper, \ diff --git a/src/java.base/unix/classes/java/lang/ProcessImpl.java b/src/java.base/unix/classes/java/lang/ProcessImpl.java index 368a4f7380b..959e50dfecd 100644 --- a/src/java.base/unix/classes/java/lang/ProcessImpl.java +++ b/src/java.base/unix/classes/java/lang/ProcessImpl.java @@ -89,7 +89,7 @@ final class ProcessImpl extends Process {
private static enum Platform {
- LINUX(LaunchMechanism.VFORK, LaunchMechanism.FORK), + LINUX(LaunchMechanism.VFORK, LaunchMechanism.POSIX_SPAWN, LaunchMechanism.FORK),
BSD(LaunchMechanism.POSIX_SPAWN, LaunchMechanism.FORK),
diff --git a/src/java.base/unix/native/libjava/ProcessImpl_md.c b/src/java.base/unix/native/libjava/ProcessImpl_md.c index 533584fdb7a..6869a64f2cc 100644 --- a/src/java.base/unix/native/libjava/ProcessImpl_md.c +++ b/src/java.base/unix/native/libjava/ProcessImpl_md.c @@ -44,7 +44,7 @@ #include <signal.h> #include <string.h>
-#if defined(__solaris__) || defined(_ALLBSD_SOURCE) || defined(_AIX) +#if defined(__solaris__) || defined(_ALLBSD_SOURCE) || defined(_AIX) || defined(__linux__) #include <spawn.h> #endif
@@ -390,7 +390,7 @@ forkChild(ChildStuff *c) { return resultPid; }
-#if defined(__solaris__) || defined(_ALLBSD_SOURCE) || defined(_AIX) +#if defined(__solaris__) || defined(_ALLBSD_SOURCE) || defined(_AIX) || defined(__linux__) static pid_t spawnChild(JNIEnv *env, jobject process, ChildStuff *c, const char *helperpath) { pid_t resultPid; @@ -489,7 +489,7 @@ startChild(JNIEnv *env, jobject process, ChildStuff *c, const char *helperpath) #endif case MODE_FORK: return forkChild(c); -#if defined(__solaris__) || defined(_ALLBSD_SOURCE) || defined(_AIX) +#if defined(__solaris__) || defined(_ALLBSD_SOURCE) || defined(_AIX) || defined(__linux__) case MODE_POSIX_SPAWN: return spawnChild(env, process, c, helperpath); #endif
-- - DML
Sure. I don't have any tracking information on the bugreport one I submitted, but if you can track that down and promote it, it would save you some typing. Otherwise whatever you can do would be great, thanks. On Tue, Oct 23, 2018 at 9:02 AM Thomas Stüfe <thomas.stuefe@gmail.com> wrote:
Oh, I can open a bug report on JBS for you. Should I?
(Now I understand the "reuse bug id").
On Tue, Oct 23, 2018 at 3:18 PM David Lloyd <david.lloyd@redhat.com> wrote:
I've submitted a bug report via bugreport.java.com. If/when it gets promoted to a proper JIRA with an issue number, I'll see if I can put the patch up on jdk/submit. On Thu, Oct 18, 2018 at 4:42 PM David Lloyd <david.lloyd@redhat.com> wrote:
The issue 6850720 isn't _exactly_ to use POSIX_SPAWN for process launching on Linux, but it's the closest I could find out of what are really a surprisingly large number of issues that refer to posix_spawn in one way or another relating to ProcessImpl. There's a different issue to move from vfork to posix_spawn on Solaris, but I wasn't sure if that one was quite right to hang this off of. Maybe it should be yet another issue of its own.
Anyway: this is a follow-up to the email thread entitled "Runtime.exec : vfork() concerns and a fix proposal", where it was casually mentioned that maybe posix_spawn could become an option on Linux, whereafter it could be thoroughly tested by brave individuals and eventually maybe become the default on that platform, obsoleting the vfork support for good.
The following patch does just that. I've tested it launching a multi-process WildFly instance a bunch of times, in conjunction with the conveniently existent "jdk.lang.Process.launchMechanism" property, and nothing exploded so here it is. The usual deal with git patches: apply directly through "patch -p1".
commit f0eb9ff7c46dff76f91160491fcca0eb25d0ab95 Author: David M. Lloyd <david.lloyd@redhat.com> Date: Thu Oct 18 15:56:37 2018 -0500
[JDK-6850720] Enable POSIX_SPAWN as an option for child process creation on Linux
diff --git a/make/launcher/Launcher-java.base.gmk b/make/launcher/Launcher-java.base.gmk index 0ce0287d2be..c28fe42d102 100644 --- a/make/launcher/Launcher-java.base.gmk +++ b/make/launcher/Launcher-java.base.gmk @@ -84,7 +84,7 @@ endif
################################################################################
-ifneq ($(findstring $(OPENJDK_TARGET_OS), macosx solaris aix), ) +ifneq ($(findstring $(OPENJDK_TARGET_OS), macosx solaris aix linux), ) $(eval $(call SetupJdkExecutable, BUILD_JSPAWNHELPER, \ NAME := jspawnhelper, \ SRC := $(TOPDIR)/src/$(MODULE)/unix/native/jspawnhelper, \ diff --git a/src/java.base/unix/classes/java/lang/ProcessImpl.java b/src/java.base/unix/classes/java/lang/ProcessImpl.java index 368a4f7380b..959e50dfecd 100644 --- a/src/java.base/unix/classes/java/lang/ProcessImpl.java +++ b/src/java.base/unix/classes/java/lang/ProcessImpl.java @@ -89,7 +89,7 @@ final class ProcessImpl extends Process {
private static enum Platform {
- LINUX(LaunchMechanism.VFORK, LaunchMechanism.FORK), + LINUX(LaunchMechanism.VFORK, LaunchMechanism.POSIX_SPAWN, LaunchMechanism.FORK),
BSD(LaunchMechanism.POSIX_SPAWN, LaunchMechanism.FORK),
diff --git a/src/java.base/unix/native/libjava/ProcessImpl_md.c b/src/java.base/unix/native/libjava/ProcessImpl_md.c index 533584fdb7a..6869a64f2cc 100644 --- a/src/java.base/unix/native/libjava/ProcessImpl_md.c +++ b/src/java.base/unix/native/libjava/ProcessImpl_md.c @@ -44,7 +44,7 @@ #include <signal.h> #include <string.h>
-#if defined(__solaris__) || defined(_ALLBSD_SOURCE) || defined(_AIX) +#if defined(__solaris__) || defined(_ALLBSD_SOURCE) || defined(_AIX) || defined(__linux__) #include <spawn.h> #endif
@@ -390,7 +390,7 @@ forkChild(ChildStuff *c) { return resultPid; }
-#if defined(__solaris__) || defined(_ALLBSD_SOURCE) || defined(_AIX) +#if defined(__solaris__) || defined(_ALLBSD_SOURCE) || defined(_AIX) || defined(__linux__) static pid_t spawnChild(JNIEnv *env, jobject process, ChildStuff *c, const char *helperpath) { pid_t resultPid; @@ -489,7 +489,7 @@ startChild(JNIEnv *env, jobject process, ChildStuff *c, const char *helperpath) #endif case MODE_FORK: return forkChild(c); -#if defined(__solaris__) || defined(_ALLBSD_SOURCE) || defined(_AIX) +#if defined(__solaris__) || defined(_ALLBSD_SOURCE) || defined(_AIX) || defined(__linux__) case MODE_POSIX_SPAWN: return spawnChild(env, process, c, helperpath); #endif
-- - DML
-- - DML
Here you go: https://bugs.openjdk.java.net/browse/JDK-8212828 If noone else steps in, I can sponsor the change for you. Cheers, Thomas On Tue, Oct 23, 2018 at 4:19 PM David Lloyd <david.lloyd@redhat.com> wrote:
Sure. I don't have any tracking information on the bugreport one I submitted, but if you can track that down and promote it, it would save you some typing. Otherwise whatever you can do would be great, thanks. On Tue, Oct 23, 2018 at 9:02 AM Thomas Stüfe <thomas.stuefe@gmail.com> wrote:
Oh, I can open a bug report on JBS for you. Should I?
(Now I understand the "reuse bug id").
On Tue, Oct 23, 2018 at 3:18 PM David Lloyd <david.lloyd@redhat.com> wrote:
I've submitted a bug report via bugreport.java.com. If/when it gets promoted to a proper JIRA with an issue number, I'll see if I can put the patch up on jdk/submit. On Thu, Oct 18, 2018 at 4:42 PM David Lloyd <david.lloyd@redhat.com> wrote:
The issue 6850720 isn't _exactly_ to use POSIX_SPAWN for process launching on Linux, but it's the closest I could find out of what are really a surprisingly large number of issues that refer to posix_spawn in one way or another relating to ProcessImpl. There's a different issue to move from vfork to posix_spawn on Solaris, but I wasn't sure if that one was quite right to hang this off of. Maybe it should be yet another issue of its own.
Anyway: this is a follow-up to the email thread entitled "Runtime.exec : vfork() concerns and a fix proposal", where it was casually mentioned that maybe posix_spawn could become an option on Linux, whereafter it could be thoroughly tested by brave individuals and eventually maybe become the default on that platform, obsoleting the vfork support for good.
The following patch does just that. I've tested it launching a multi-process WildFly instance a bunch of times, in conjunction with the conveniently existent "jdk.lang.Process.launchMechanism" property, and nothing exploded so here it is. The usual deal with git patches: apply directly through "patch -p1".
commit f0eb9ff7c46dff76f91160491fcca0eb25d0ab95 Author: David M. Lloyd <david.lloyd@redhat.com> Date: Thu Oct 18 15:56:37 2018 -0500
[JDK-6850720] Enable POSIX_SPAWN as an option for child process creation on Linux
diff --git a/make/launcher/Launcher-java.base.gmk b/make/launcher/Launcher-java.base.gmk index 0ce0287d2be..c28fe42d102 100644 --- a/make/launcher/Launcher-java.base.gmk +++ b/make/launcher/Launcher-java.base.gmk @@ -84,7 +84,7 @@ endif
################################################################################
-ifneq ($(findstring $(OPENJDK_TARGET_OS), macosx solaris aix), ) +ifneq ($(findstring $(OPENJDK_TARGET_OS), macosx solaris aix linux), ) $(eval $(call SetupJdkExecutable, BUILD_JSPAWNHELPER, \ NAME := jspawnhelper, \ SRC := $(TOPDIR)/src/$(MODULE)/unix/native/jspawnhelper, \ diff --git a/src/java.base/unix/classes/java/lang/ProcessImpl.java b/src/java.base/unix/classes/java/lang/ProcessImpl.java index 368a4f7380b..959e50dfecd 100644 --- a/src/java.base/unix/classes/java/lang/ProcessImpl.java +++ b/src/java.base/unix/classes/java/lang/ProcessImpl.java @@ -89,7 +89,7 @@ final class ProcessImpl extends Process {
private static enum Platform {
- LINUX(LaunchMechanism.VFORK, LaunchMechanism.FORK), + LINUX(LaunchMechanism.VFORK, LaunchMechanism.POSIX_SPAWN, LaunchMechanism.FORK),
BSD(LaunchMechanism.POSIX_SPAWN, LaunchMechanism.FORK),
diff --git a/src/java.base/unix/native/libjava/ProcessImpl_md.c b/src/java.base/unix/native/libjava/ProcessImpl_md.c index 533584fdb7a..6869a64f2cc 100644 --- a/src/java.base/unix/native/libjava/ProcessImpl_md.c +++ b/src/java.base/unix/native/libjava/ProcessImpl_md.c @@ -44,7 +44,7 @@ #include <signal.h> #include <string.h>
-#if defined(__solaris__) || defined(_ALLBSD_SOURCE) || defined(_AIX) +#if defined(__solaris__) || defined(_ALLBSD_SOURCE) || defined(_AIX) || defined(__linux__) #include <spawn.h> #endif
@@ -390,7 +390,7 @@ forkChild(ChildStuff *c) { return resultPid; }
-#if defined(__solaris__) || defined(_ALLBSD_SOURCE) || defined(_AIX) +#if defined(__solaris__) || defined(_ALLBSD_SOURCE) || defined(_AIX) || defined(__linux__) static pid_t spawnChild(JNIEnv *env, jobject process, ChildStuff *c, const char *helperpath) { pid_t resultPid; @@ -489,7 +489,7 @@ startChild(JNIEnv *env, jobject process, ChildStuff *c, const char *helperpath) #endif case MODE_FORK: return forkChild(c); -#if defined(__solaris__) || defined(_ALLBSD_SOURCE) || defined(_AIX) +#if defined(__solaris__) || defined(_ALLBSD_SOURCE) || defined(_AIX) || defined(__linux__) case MODE_POSIX_SPAWN: return spawnChild(env, process, c, helperpath); #endif
-- - DML
-- - DML
My plans to try jdk/submit have fallen through unfortunately, as I cannot seem to gain direct or indirect access to that system. So I guess I'm looking for any reviews on this patch now. Thomas has volunteered to sponsor. Thanks. On Tue, Oct 23, 2018 at 10:49 AM Thomas Stüfe <thomas.stuefe@gmail.com> wrote:
Here you go:
https://bugs.openjdk.java.net/browse/JDK-8212828
If noone else steps in, I can sponsor the change for you.
Cheers, Thomas On Tue, Oct 23, 2018 at 4:19 PM David Lloyd <david.lloyd@redhat.com> wrote:
Sure. I don't have any tracking information on the bugreport one I submitted, but if you can track that down and promote it, it would save you some typing. Otherwise whatever you can do would be great, thanks. On Tue, Oct 23, 2018 at 9:02 AM Thomas Stüfe <thomas.stuefe@gmail.com> wrote:
Oh, I can open a bug report on JBS for you. Should I?
(Now I understand the "reuse bug id").
On Tue, Oct 23, 2018 at 3:18 PM David Lloyd <david.lloyd@redhat.com> wrote:
I've submitted a bug report via bugreport.java.com. If/when it gets promoted to a proper JIRA with an issue number, I'll see if I can put the patch up on jdk/submit. On Thu, Oct 18, 2018 at 4:42 PM David Lloyd <david.lloyd@redhat.com> wrote:
The issue 6850720 isn't _exactly_ to use POSIX_SPAWN for process launching on Linux, but it's the closest I could find out of what are really a surprisingly large number of issues that refer to posix_spawn in one way or another relating to ProcessImpl. There's a different issue to move from vfork to posix_spawn on Solaris, but I wasn't sure if that one was quite right to hang this off of. Maybe it should be yet another issue of its own.
Anyway: this is a follow-up to the email thread entitled "Runtime.exec : vfork() concerns and a fix proposal", where it was casually mentioned that maybe posix_spawn could become an option on Linux, whereafter it could be thoroughly tested by brave individuals and eventually maybe become the default on that platform, obsoleting the vfork support for good.
The following patch does just that. I've tested it launching a multi-process WildFly instance a bunch of times, in conjunction with the conveniently existent "jdk.lang.Process.launchMechanism" property, and nothing exploded so here it is. The usual deal with git patches: apply directly through "patch -p1".
-- - DML
Hi David, et.al. What would be the rest of the plan for testing? Usually, changes come with tests and a plan. What build parameters are needed to run a full set of tests with the change? Are there build changes needed? Thanks, Roger On 10/23/2018 03:26 PM, David Lloyd wrote:
My plans to try jdk/submit have fallen through unfortunately, as I cannot seem to gain direct or indirect access to that system. So I guess I'm looking for any reviews on this patch now. Thomas has volunteered to sponsor.
Thanks.
On Tue, Oct 23, 2018 at 10:49 AM Thomas Stüfe <thomas.stuefe@gmail.com> wrote:
Here you go:
https://bugs.openjdk.java.net/browse/JDK-8212828
If noone else steps in, I can sponsor the change for you.
Cheers, Thomas On Tue, Oct 23, 2018 at 4:19 PM David Lloyd <david.lloyd@redhat.com> wrote:
Sure. I don't have any tracking information on the bugreport one I submitted, but if you can track that down and promote it, it would save you some typing. Otherwise whatever you can do would be great, thanks. On Tue, Oct 23, 2018 at 9:02 AM Thomas Stüfe <thomas.stuefe@gmail.com> wrote:
Oh, I can open a bug report on JBS for you. Should I?
(Now I understand the "reuse bug id").
On Tue, Oct 23, 2018 at 3:18 PM David Lloyd <david.lloyd@redhat.com> wrote:
I've submitted a bug report via bugreport.java.com. If/when it gets promoted to a proper JIRA with an issue number, I'll see if I can put the patch up on jdk/submit. On Thu, Oct 18, 2018 at 4:42 PM David Lloyd <david.lloyd@redhat.com> wrote:
The issue 6850720 isn't _exactly_ to use POSIX_SPAWN for process launching on Linux, but it's the closest I could find out of what are really a surprisingly large number of issues that refer to posix_spawn in one way or another relating to ProcessImpl. There's a different issue to move from vfork to posix_spawn on Solaris, but I wasn't sure if that one was quite right to hang this off of. Maybe it should be yet another issue of its own.
Anyway: this is a follow-up to the email thread entitled "Runtime.exec : vfork() concerns and a fix proposal", where it was casually mentioned that maybe posix_spawn could become an option on Linux, whereafter it could be thoroughly tested by brave individuals and eventually maybe become the default on that platform, obsoleting the vfork support for good.
The following patch does just that. I've tested it launching a multi-process WildFly instance a bunch of times, in conjunction with the conveniently existent "jdk.lang.Process.launchMechanism" property, and nothing exploded so here it is. The usual deal with git patches: apply directly through "patch -p1".
I'm not really sure TBH. I did some manual testing; beyond that, in a way, the point of this change is to *get* more real-world testing. I haven't yet found any existing tests which try out the various process creation options for a given platform, however. Maybe an idea would be to have a variation on test/jdk/java/lang/ProcessBuilder/BigFork.java which tries all of the LaunchMechanism options (other than FORK, which I believe is known to likely fail with this test) which are available on the current arch? Also there's test/jdk/java/lang/ProcessBuilder/Basic.java which could possibly be adapted to run multiple times as well (and maybe also to cover FORK which is supported for many archs but AFAICT not tested on archs where there are other options, i.e. all of them). But I'm not really sure how to make that happen with the OpenJDK test infrastructure. As for build changes, no I do not believe any are needed, though I am not able to build on non-Linux platforms to be sure; I was hoping to get a run through jdk-submit to verify that but I don't seem to have access to that either, having no particular status in the OpenJDK project other than sometime contributor. Martin did suggest that maybe the spawn native code could be modified to use config-based macros (e.g. HAS_SPAWN_H) instead of the current platform-based detection but I feel that's out of scope for this little change. On Tue, Oct 23, 2018 at 2:53 PM Roger Riggs <Roger.Riggs@oracle.com> wrote:
Hi David, et.al.
What would be the rest of the plan for testing? Usually, changes come with tests and a plan. What build parameters are needed to run a full set of tests with the change?
Are there build changes needed?
Thanks, Roger
On 10/23/2018 03:26 PM, David Lloyd wrote:
My plans to try jdk/submit have fallen through unfortunately, as I cannot seem to gain direct or indirect access to that system. So I guess I'm looking for any reviews on this patch now. Thomas has volunteered to sponsor.
Thanks.
On Tue, Oct 23, 2018 at 10:49 AM Thomas Stüfe <thomas.stuefe@gmail.com> wrote:
Here you go:
https://bugs.openjdk.java.net/browse/JDK-8212828
If noone else steps in, I can sponsor the change for you.
Cheers, Thomas On Tue, Oct 23, 2018 at 4:19 PM David Lloyd <david.lloyd@redhat.com> wrote:
Sure. I don't have any tracking information on the bugreport one I submitted, but if you can track that down and promote it, it would save you some typing. Otherwise whatever you can do would be great, thanks. On Tue, Oct 23, 2018 at 9:02 AM Thomas Stüfe <thomas.stuefe@gmail.com> wrote:
Oh, I can open a bug report on JBS for you. Should I?
(Now I understand the "reuse bug id").
On Tue, Oct 23, 2018 at 3:18 PM David Lloyd <david.lloyd@redhat.com> wrote:
I've submitted a bug report via bugreport.java.com. If/when it gets promoted to a proper JIRA with an issue number, I'll see if I can put the patch up on jdk/submit. On Thu, Oct 18, 2018 at 4:42 PM David Lloyd <david.lloyd@redhat.com> wrote: > The issue 6850720 isn't _exactly_ to use POSIX_SPAWN for process > launching on Linux, but it's the closest I could find out of what are > really a surprisingly large number of issues that refer to posix_spawn > in one way or another relating to ProcessImpl. There's a different > issue to move from vfork to posix_spawn on Solaris, but I wasn't sure > if that one was quite right to hang this off of. Maybe it should be > yet another issue of its own. > > Anyway: this is a follow-up to the email thread entitled "Runtime.exec > : vfork() concerns and a fix proposal", where it was casually > mentioned that maybe posix_spawn could become an option on Linux, > whereafter it could be thoroughly tested by brave individuals and > eventually maybe become the default on that platform, obsoleting the > vfork support for good. > > The following patch does just that. I've tested it launching a > multi-process WildFly instance a bunch of times, in conjunction with > the conveniently existent "jdk.lang.Process.launchMechanism" property, > and nothing exploded so here it is. The usual deal with git patches: > apply directly through "patch -p1".
-- - DML
For the convenience of the reviewers, here webrev and bug: https://bugs.openjdk.java.net/browse/JDK-8212828 http://cr.openjdk.java.net/~stuefe/webrevs/JDK-8212828-posix_spawn.patch/web... submit tests are currently running. ..Thomas On Tue, Oct 23, 2018 at 9:27 PM David Lloyd <david.lloyd@redhat.com> wrote:
My plans to try jdk/submit have fallen through unfortunately, as I cannot seem to gain direct or indirect access to that system. So I guess I'm looking for any reviews on this patch now. Thomas has volunteered to sponsor.
Thanks.
On Tue, Oct 23, 2018 at 10:49 AM Thomas Stüfe <thomas.stuefe@gmail.com> wrote:
Here you go:
https://bugs.openjdk.java.net/browse/JDK-8212828
If noone else steps in, I can sponsor the change for you.
Cheers, Thomas On Tue, Oct 23, 2018 at 4:19 PM David Lloyd <david.lloyd@redhat.com> wrote:
Sure. I don't have any tracking information on the bugreport one I submitted, but if you can track that down and promote it, it would save you some typing. Otherwise whatever you can do would be great, thanks. On Tue, Oct 23, 2018 at 9:02 AM Thomas Stüfe <thomas.stuefe@gmail.com> wrote:
Oh, I can open a bug report on JBS for you. Should I?
(Now I understand the "reuse bug id").
On Tue, Oct 23, 2018 at 3:18 PM David Lloyd <david.lloyd@redhat.com> wrote:
I've submitted a bug report via bugreport.java.com. If/when it gets promoted to a proper JIRA with an issue number, I'll see if I can put the patch up on jdk/submit. On Thu, Oct 18, 2018 at 4:42 PM David Lloyd <david.lloyd@redhat.com> wrote:
The issue 6850720 isn't _exactly_ to use POSIX_SPAWN for process launching on Linux, but it's the closest I could find out of what are really a surprisingly large number of issues that refer to posix_spawn in one way or another relating to ProcessImpl. There's a different issue to move from vfork to posix_spawn on Solaris, but I wasn't sure if that one was quite right to hang this off of. Maybe it should be yet another issue of its own.
Anyway: this is a follow-up to the email thread entitled "Runtime.exec : vfork() concerns and a fix proposal", where it was casually mentioned that maybe posix_spawn could become an option on Linux, whereafter it could be thoroughly tested by brave individuals and eventually maybe become the default on that platform, obsoleting the vfork support for good.
The following patch does just that. I've tested it launching a multi-process WildFly instance a bunch of times, in conjunction with the conveniently existent "jdk.lang.Process.launchMechanism" property, and nothing exploded so here it is. The usual deal with git patches: apply directly through "patch -p1".
-- - DML
Hi all, Basic considerations: This patch enables an untested function to the end user. I am unsure about this. I would feel better if we had something like "-XX:+UnlockDiagnosticVMOptions" for the JDK. That said, I can see the merit in having this switch to play around with. It would it make easier for us to test posix_spawn() (for us, the next step would be to enable it by default in our nightly tests at SAP and just run the nightlies with it for a couple of days or weeks). So, bottomline, I have no strong reservations if no one else has.+ --- Review: - copyright dates need updating on the C-sources - I opt for "#if defined(__solaris__) || defined(_ALLBSD_SOURCE) || defined(_AIX) || defined(__linux__)" to be removed completely from unix-specific source files. The ifdef now covers all OpenJDK Unix platforms. Looks fine otherwise. --- Best, Thomas On Wed, Oct 24, 2018 at 7:51 AM Thomas Stüfe <thomas.stuefe@gmail.com> wrote:
For the convenience of the reviewers, here webrev and bug:
https://bugs.openjdk.java.net/browse/JDK-8212828 http://cr.openjdk.java.net/~stuefe/webrevs/JDK-8212828-posix_spawn.patch/web...
submit tests are currently running.
..Thomas
On Tue, Oct 23, 2018 at 9:27 PM David Lloyd <david.lloyd@redhat.com> wrote:
My plans to try jdk/submit have fallen through unfortunately, as I cannot seem to gain direct or indirect access to that system. So I guess I'm looking for any reviews on this patch now. Thomas has volunteered to sponsor.
Thanks.
On Tue, Oct 23, 2018 at 10:49 AM Thomas Stüfe <thomas.stuefe@gmail.com> wrote:
Here you go:
https://bugs.openjdk.java.net/browse/JDK-8212828
If noone else steps in, I can sponsor the change for you.
Cheers, Thomas On Tue, Oct 23, 2018 at 4:19 PM David Lloyd <david.lloyd@redhat.com> wrote:
Sure. I don't have any tracking information on the bugreport one I submitted, but if you can track that down and promote it, it would save you some typing. Otherwise whatever you can do would be great, thanks. On Tue, Oct 23, 2018 at 9:02 AM Thomas Stüfe <thomas.stuefe@gmail.com> wrote:
Oh, I can open a bug report on JBS for you. Should I?
(Now I understand the "reuse bug id").
On Tue, Oct 23, 2018 at 3:18 PM David Lloyd <david.lloyd@redhat.com> wrote:
I've submitted a bug report via bugreport.java.com. If/when it gets promoted to a proper JIRA with an issue number, I'll see if I can put the patch up on jdk/submit. On Thu, Oct 18, 2018 at 4:42 PM David Lloyd <david.lloyd@redhat.com> wrote: > > The issue 6850720 isn't _exactly_ to use POSIX_SPAWN for process > launching on Linux, but it's the closest I could find out of what are > really a surprisingly large number of issues that refer to posix_spawn > in one way or another relating to ProcessImpl. There's a different > issue to move from vfork to posix_spawn on Solaris, but I wasn't sure > if that one was quite right to hang this off of. Maybe it should be > yet another issue of its own. > > Anyway: this is a follow-up to the email thread entitled "Runtime.exec > : vfork() concerns and a fix proposal", where it was casually > mentioned that maybe posix_spawn could become an option on Linux, > whereafter it could be thoroughly tested by brave individuals and > eventually maybe become the default on that platform, obsoleting the > vfork support for good. > > The following patch does just that. I've tested it launching a > multi-process WildFly instance a bunch of times, in conjunction with > the conveniently existent "jdk.lang.Process.launchMechanism" property, > and nothing exploded so here it is. The usual deal with git patches: > apply directly through "patch -p1".
-- - DML
On Wed, Oct 24, 2018 at 1:05 AM Thomas Stüfe <thomas.stuefe@gmail.com> wrote:
Review:
- copyright dates need updating on the C-sources
- I opt for "#if defined(__solaris__) || defined(_ALLBSD_SOURCE) || defined(_AIX) || defined(__linux__)" to be removed completely from unix-specific source files. The ifdef now covers all OpenJDK Unix platforms.
Here's a version with these changes. -- - DML
Hi David, I think we need some form of test, as Alan indicated. java/lang/ProcessBuilder/Basic.java should run through. Actually, I would like it if we would run this test for all valid enabled launch mechanisms, regardless which one is default. Can this be done? To save time I will only rerun the tests once all reviews are completed and the test is added. Thanks, Thomas On Wed, Oct 24, 2018 at 3:36 PM David Lloyd <david.lloyd@redhat.com> wrote:
On Wed, Oct 24, 2018 at 1:05 AM Thomas Stüfe <thomas.stuefe@gmail.com> wrote:
Review:
- copyright dates need updating on the C-sources
- I opt for "#if defined(__solaris__) || defined(_ALLBSD_SOURCE) || defined(_AIX) || defined(__linux__)" to be removed completely from unix-specific source files. The ifdef now covers all OpenJDK Unix platforms.
Here's a version with these changes.
-- - DML
On Wed, Oct 24, 2018 at 8:52 AM Thomas Stüfe <thomas.stuefe@gmail.com> wrote:
Hi David,
I think we need some form of test, as Alan indicated.
java/lang/ProcessBuilder/Basic.java should run through. Actually, I would like it if we would run this test for all valid enabled launch mechanisms, regardless which one is default. Can this be done?
I agree, however, I do not know how this can be accomplished. Someone who understands this test infrastructure will have to take over. -- - DML
On Wed, Oct 24, 2018 at 3:56 PM David Lloyd <david.lloyd@redhat.com> wrote:
On Wed, Oct 24, 2018 at 8:52 AM Thomas Stüfe <thomas.stuefe@gmail.com> wrote:
Hi David,
I think we need some form of test, as Alan indicated.
java/lang/ProcessBuilder/Basic.java should run through. Actually, I would like it if we would run this test for all valid enabled launch mechanisms, regardless which one is default. Can this be done?
I agree, however, I do not know how this can be accomplished. Someone who understands this test infrastructure will have to take over.
Okay I'll add the test. ..Thomas
-- - DML
Hi all, version 2 of Davids patch, with test changes: http://cr.openjdk.java.net/~stuefe/webrevs/JDK-8212828-posix_spawn.patch/web... Executed the test on my local Ubuntu box, works fine. Submit job runs. About the test: I added a new line: * @run main/othervm/timeout=300 Basic * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=fork Basic + * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=posix_spawn Basic The way I understand those tests work is that the first line causes the test to be run with the default launch mechanism, the second line with jdk.lang.Process.launchMechanism=fork explicitly. The third line, added by me, will now explicitly test with posix_spawn. I did not limit the platforms, since posix_spawn should now be available on all platforms, so it should work on all platforms. But then, on all other platforms it has already been the default. This is still a bit iffy: On Windows, jdk.lang.Process.launchMechanism gets ignored, so we just executed the same test twice (now three times)? And now we execute the posix_spawn variant twice on all platforms where this is the default, so line 1 and 3 are the same? You see I am not a jtreg expert :) Can I specify a @run directive for only one platform? In that case I would limit the explicit posix_spawn test to Linux. Note however that if we really abondon vfork in the future and make posix_spawn the default, this test becomes simpler too. Best, Thomas On Wed, Oct 24, 2018 at 3:36 PM David Lloyd <david.lloyd@redhat.com> wrote:
On Wed, Oct 24, 2018 at 1:05 AM Thomas Stüfe <thomas.stuefe@gmail.com> wrote:
Review:
- copyright dates need updating on the C-sources
- I opt for "#if defined(__solaris__) || defined(_ALLBSD_SOURCE) || defined(_AIX) || defined(__linux__)" to be removed completely from unix-specific source files. The ifdef now covers all OpenJDK Unix platforms.
Here's a version with these changes.
-- - DML
Hi Thomas, Thanks for adding the test. There's a feature of jtreg that was exposed a couple of month ago that can be used to deal with running the test too many times. There can be multiple @test blocks with different @requires. * @run main/othervm/timeout=300 Basic * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=fork Basic * * @test * @requires (os.family == "linux") * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=posix_spawn Basic I'll run this through our build system too. To fully test out using posix_spawn in many more different scenarios the build system should be augmented to be able to use it as the default for an entire build/CI/CD/test runs. Thanks, Roger On 10/24/2018 10:35 AM, Thomas Stüfe wrote:
Hi all,
version 2 of Davids patch, with test changes:
http://cr.openjdk.java.net/~stuefe/webrevs/JDK-8212828-posix_spawn.patch/web...
Executed the test on my local Ubuntu box, works fine. Submit job runs.
About the test: I added a new line:
* @run main/othervm/timeout=300 Basic * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=fork Basic + * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=posix_spawn Basic
The way I understand those tests work is that the first line causes the test to be run with the default launch mechanism, the second line with jdk.lang.Process.launchMechanism=fork explicitly. The third line, added by me, will now explicitly test with posix_spawn. I did not limit the platforms, since posix_spawn should now be available on all platforms, so it should work on all platforms. But then, on all other platforms it has already been the default.
This is still a bit iffy: On Windows, jdk.lang.Process.launchMechanism gets ignored, so we just executed the same test twice (now three times)? And now we execute the posix_spawn variant twice on all platforms where this is the default, so line 1 and 3 are the same? You see I am not a jtreg expert :) Can I specify a @run directive for only one platform? In that case I would limit the explicit posix_spawn test to Linux.
Note however that if we really abondon vfork in the future and make posix_spawn the default, this test becomes simpler too.
Best, Thomas
On Wed, Oct 24, 2018 at 3:36 PM David Lloyd <david.lloyd@redhat.com> wrote:
On Wed, Oct 24, 2018 at 1:05 AM Thomas Stüfe <thomas.stuefe@gmail.com> wrote:
Review:
- copyright dates need updating on the C-sources
- I opt for "#if defined(__solaris__) || defined(_ALLBSD_SOURCE) || defined(_AIX) || defined(__linux__)" to be removed completely from unix-specific source files. The ifdef now covers all OpenJDK Unix platforms. Here's a version with these changes.
-- - DML
Oops, it dropped the newlines. * @run main/othervm/timeout=300 Basic * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=fork Basic * * @test * @requires (os.family == "linux") * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=posix_spawn Basic Roger On 10/24/2018 11:22 AM, Roger Riggs wrote:
There can be multiple @test blocks with different @requires.
* @run main/othervm/timeout=300 Basic * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=fork Basic * * @test * @requires (os.family == "linux") * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=posix_spawn Basic
I'll run this through our build system too.
To fully test out using posix_spawn in many more different scenarios the build system should be augmented to be able to use it as the default for an entire build/CI/CD/test runs.
Thanks, Roger
On 10/24/2018 10:35 AM, Thomas Stüfe wrote:
Hi all,
version 2 of Davids patch, with test changes:
http://cr.openjdk.java.net/~stuefe/webrevs/JDK-8212828-posix_spawn.patch/web...
Executed the test on my local Ubuntu box, works fine. Submit job runs.
About the test: I added a new line:
* @run main/othervm/timeout=300 Basic * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=fork Basic + * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=posix_spawn Basic
The way I understand those tests work is that the first line causes the test to be run with the default launch mechanism, the second line with jdk.lang.Process.launchMechanism=fork explicitly. The third line, added by me, will now explicitly test with posix_spawn. I did not limit the platforms, since posix_spawn should now be available on all platforms, so it should work on all platforms. But then, on all other platforms it has already been the default.
This is still a bit iffy: On Windows, jdk.lang.Process.launchMechanism gets ignored, so we just executed the same test twice (now three times)? And now we execute the posix_spawn variant twice on all platforms where this is the default, so line 1 and 3 are the same? You see I am not a jtreg expert :) Can I specify a @run directive for only one platform? In that case I would limit the explicit posix_spawn test to Linux.
Note however that if we really abondon vfork in the future and make posix_spawn the default, this test becomes simpler too.
Best, Thomas
On Wed, Oct 24, 2018 at 3:36 PM David Lloyd <david.lloyd@redhat.com> wrote:
On Wed, Oct 24, 2018 at 1:05 AM Thomas Stüfe <thomas.stuefe@gmail.com> wrote:
Review:
- copyright dates need updating on the C-sources
- I opt for "#if defined(__solaris__) || defined(_ALLBSD_SOURCE) || defined(_AIX) || defined(__linux__)" to be removed completely from unix-specific source files. The ifdef now covers all OpenJDK Unix platforms. Here's a version with these changes.
-- - DML
Hi Roger, On Wed, Oct 24, 2018 at 5:23 PM Roger Riggs <Roger.Riggs@oracle.com> wrote:
Hi Thomas,
Thanks for adding the test.
There's a feature of jtreg that was exposed a couple of month ago that can be used to deal with running the test too many times.
There can be multiple @test blocks with different @requires.
* @run main/othervm/timeout=300 Basic * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=fork Basic * * @test * @requires (os.family == "linux") * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=posix_spawn Basic
Does not seem to work, sorry: 24 /* 25 * @test 26 * @bug 4199068 4738465 4937983 4930681 4926230 4931433 4932663 4986689 27 * 5026830 5023243 5070673 4052517 4811767 6192449 6397034 6413313 28 * 6464154 6523983 6206031 4960438 6631352 6631966 6850957 6850958 29 * 4947220 7018606 7034570 4244896 5049299 8003488 8054494 8058464 30 * 8067796 31 * @key intermittent 32 * @summary Basic tests for Process and Environment Variable code 33 * @modules java.base/java.lang:open 34 * @run main/othervm/timeout=300 Basic 35 * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=fork Basic 36 * @test 37 * @requires (os.family == "liinux") 38 * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=posix_spawn Basic 39 * @author Martin Buchholz 40 */ If I have @requires (os.family == "linux") all three variants are executed - ok. Then I wanted to have a negative test, so I misnamed linux as "liinux" and would have expected the first @test block to fire, the second to be ignored. But in fact: thomas@mainframe /shared/projects/openjdk/jtreg $ ../jtreg-prebuilt/jtreg/bin/jtreg -jdk ../jdk-jdk/output-slowdebug/images/jdk/ ../jdk-jdk/source/test/jdk/java/lang/ProcessBuilder/Basic.java Test results: no tests selected So to me it looks like as if the @requires tag is valid for both @test blocks, not only the one it appears in.
I'll run this through our build system too.
To fully test out using posix_spawn in many more different scenarios the build system should be augmented to be able to use it as the default for an entire build/CI/CD/test runs.
Sure! But I think this is out of scope for this patch. Thanks, Thomas
Thanks, Roger
On 10/24/2018 10:35 AM, Thomas Stüfe wrote:
Hi all,
version 2 of Davids patch, with test changes:
http://cr.openjdk.java.net/~stuefe/webrevs/JDK-8212828-posix_spawn.patch/web...
Executed the test on my local Ubuntu box, works fine. Submit job runs.
About the test: I added a new line:
* @run main/othervm/timeout=300 Basic * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=fork Basic + * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=posix_spawn Basic
The way I understand those tests work is that the first line causes the test to be run with the default launch mechanism, the second line with jdk.lang.Process.launchMechanism=fork explicitly. The third line, added by me, will now explicitly test with posix_spawn. I did not limit the platforms, since posix_spawn should now be available on all platforms, so it should work on all platforms. But then, on all other platforms it has already been the default.
This is still a bit iffy: On Windows, jdk.lang.Process.launchMechanism gets ignored, so we just executed the same test twice (now three times)? And now we execute the posix_spawn variant twice on all platforms where this is the default, so line 1 and 3 are the same? You see I am not a jtreg expert :) Can I specify a @run directive for only one platform? In that case I would limit the explicit posix_spawn test to Linux.
Note however that if we really abondon vfork in the future and make posix_spawn the default, this test becomes simpler too.
Best, Thomas
On Wed, Oct 24, 2018 at 3:36 PM David Lloyd <david.lloyd@redhat.com> wrote:
On Wed, Oct 24, 2018 at 1:05 AM Thomas Stüfe <thomas.stuefe@gmail.com> wrote:
Review:
- copyright dates need updating on the C-sources
- I opt for "#if defined(__solaris__) || defined(_ALLBSD_SOURCE) || defined(_AIX) || defined(__linux__)" to be removed completely from unix-specific source files. The ifdef now covers all OpenJDK Unix platforms. Here's a version with these changes.
-- - DML
Hi Thomas, Sorry, I had the incantation for multiple tests wrong. Separate test configurations need to be in separate /* */ comment blocks. BTW, it creates separate .jtr files for each block. diff --git a/test/jdk/java/lang/ProcessBuilder/Basic.java b/test/jdk/java/lang/ProcessBuilder/Basic.java --- a/test/jdk/java/lang/ProcessBuilder/Basic.java +++ b/test/jdk/java/lang/ProcessBuilder/Basic.java @@ -33,8 +33,10 @@ * @modules java.base/java.lang:open * @run main/othervm/timeout=300 Basic * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=fork Basic - * + */ +/* * @test + * @modules java.base/java.lang:open * @requires (os.family == "linux") * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=posix_spawn Basic * @author Martin Buchholz As for the build configuration being out of scope... I don't want to see part of the work done and then the rest forgotten or worse yet someone comes along in the future and incorrectly believes that posix_spawn is ready for production and starts filing bugs or getting bit by something. There's a lot more testing to do before it could be come the default and perhaps a significant warning should be attached to the code so it does not get forgotten. Regards, Roger On 10/24/18 1:53 PM, Thomas Stüfe wrote:
Hi Roger,
On Wed, Oct 24, 2018 at 5:23 PM Roger Riggs <Roger.Riggs@oracle.com> wrote:
Hi Thomas,
Thanks for adding the test.
There's a feature of jtreg that was exposed a couple of month ago that can be used to deal with running the test too many times.
There can be multiple @test blocks with different @requires.
* @run main/othervm/timeout=300 Basic * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=fork Basic * * @test * @requires (os.family == "linux") * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=posix_spawn Basic
Does not seem to work, sorry:
24 /* 25 * @test 26 * @bug 4199068 4738465 4937983 4930681 4926230 4931433 4932663 4986689 27 * 5026830 5023243 5070673 4052517 4811767 6192449 6397034 6413313 28 * 6464154 6523983 6206031 4960438 6631352 6631966 6850957 6850958 29 * 4947220 7018606 7034570 4244896 5049299 8003488 8054494 8058464 30 * 8067796 31 * @key intermittent 32 * @summary Basic tests for Process and Environment Variable code 33 * @modules java.base/java.lang:open 34 * @run main/othervm/timeout=300 Basic 35 * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=fork Basic 36 * @test 37 * @requires (os.family == "liinux") 38 * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=posix_spawn Basic 39 * @author Martin Buchholz 40 */
If I have @requires (os.family == "linux") all three variants are executed - ok.
Then I wanted to have a negative test, so I misnamed linux as "liinux" and would have expected the first @test block to fire, the second to be ignored. But in fact:
thomas@mainframe /shared/projects/openjdk/jtreg $ ../jtreg-prebuilt/jtreg/bin/jtreg -jdk ../jdk-jdk/output-slowdebug/images/jdk/ ../jdk-jdk/source/test/jdk/java/lang/ProcessBuilder/Basic.java Test results: no tests selected
So to me it looks like as if the @requires tag is valid for both @test blocks, not only the one it appears in.
I'll run this through our build system too.
To fully test out using posix_spawn in many more different scenarios the build system should be augmented to be able to use it as the default for an entire build/CI/CD/test runs. Sure! But I think this is out of scope for this patch.
Thanks, Thomas
Thanks, Roger
On 10/24/2018 10:35 AM, Thomas Stüfe wrote:
Hi all,
version 2 of Davids patch, with test changes:
http://cr.openjdk.java.net/~stuefe/webrevs/JDK-8212828-posix_spawn.patch/web...
Executed the test on my local Ubuntu box, works fine. Submit job runs.
About the test: I added a new line:
* @run main/othervm/timeout=300 Basic * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=fork Basic + * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=posix_spawn Basic
The way I understand those tests work is that the first line causes the test to be run with the default launch mechanism, the second line with jdk.lang.Process.launchMechanism=fork explicitly. The third line, added by me, will now explicitly test with posix_spawn. I did not limit the platforms, since posix_spawn should now be available on all platforms, so it should work on all platforms. But then, on all other platforms it has already been the default.
This is still a bit iffy: On Windows, jdk.lang.Process.launchMechanism gets ignored, so we just executed the same test twice (now three times)? And now we execute the posix_spawn variant twice on all platforms where this is the default, so line 1 and 3 are the same? You see I am not a jtreg expert :) Can I specify a @run directive for only one platform? In that case I would limit the explicit posix_spawn test to Linux.
Note however that if we really abondon vfork in the future and make posix_spawn the default, this test becomes simpler too.
Best, Thomas
On Wed, Oct 24, 2018 at 3:36 PM David Lloyd <david.lloyd@redhat.com> wrote:
On Wed, Oct 24, 2018 at 1:05 AM Thomas Stüfe <thomas.stuefe@gmail.com> wrote:
Review:
- copyright dates need updating on the C-sources
- I opt for "#if defined(__solaris__) || defined(_ALLBSD_SOURCE) || defined(_AIX) || defined(__linux__)" to be removed completely from unix-specific source files. The ifdef now covers all OpenJDK Unix platforms. Here's a version with these changes.
-- - DML
Hi Roger, On Wed 24. Oct 2018 at 21:39, Roger Riggs <roger.riggs@oracle.com> wrote:
Hi Thomas,
Sorry, I had the incantation for multiple tests wrong. Separate test configurations need to be in separate /* */ comment blocks. BTW, it creates separate .jtr files for each block.
diff --git a/test/jdk/java/lang/ProcessBuilder/Basic.java b/test/jdk/java/lang/ProcessBuilder/Basic.java --- a/test/jdk/java/lang/ProcessBuilder/Basic.java +++ b/test/jdk/java/lang/ProcessBuilder/Basic.java @@ -33,8 +33,10 @@ * @modules java.base/java.lang:open * @run main/othervm/timeout=300 Basic * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=fork Basic - * + */ +/* * @test + * @modules java.base/java.lang:open * @requires (os.family == "linux") * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=posix_spawn Basic * @author Martin Buchholz
As for the build configuration being out of scope...
I don't want to see part of the work done and then the rest forgotten or worse yet someone comes along in the future and incorrectly believes that posix_spawn is ready for production and starts filing bugs or getting bit by something. There's a lot more testing to do before it could be come the default and perhaps a significant warning should be attached to the code so it does not get forgotten.
I agree, we ideally should not roll out half tested changes. But where does that leave us wrt this patch? We could just not do it, requiring anyone willing to do the extensive testing necessary to switch the default to posix-spawn to apply this change first locally. This is not an insurmountable amount of work, especially since the base is quite static, so the patch won’t bit rot easily. We could push the patch in its current form, plus a large source comment? But then. Users do not read comments. A warning on stderr? Would play havoc with many tests parsing stderr. Do you have an idea how to proceed? Thanks, Thomas
Regards, Roger
On 10/24/18 1:53 PM, Thomas Stüfe wrote:
Hi Roger,
On Wed, Oct 24, 2018 at 5:23 PM Roger Riggs <Roger.Riggs@oracle.com> <Roger.Riggs@oracle.com> wrote:
Hi Thomas,
Thanks for adding the test.
There's a feature of jtreg that was exposed a couple of month ago that can be used to deal with running the test too many times.
There can be multiple @test blocks with different @requires.
* @run main/othervm/timeout=300 Basic * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=fork Basic * * @test * @requires (os.family == "linux") * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=posix_spawn Basic
Does not seem to work, sorry:
24 /* 25 * @test 26 * @bug 4199068 4738465 4937983 4930681 4926230 4931433 4932663 4986689 27 * 5026830 5023243 5070673 4052517 4811767 6192449 6397034 6413313 28 * 6464154 6523983 6206031 4960438 6631352 6631966 6850957 6850958 29 * 4947220 7018606 7034570 4244896 5049299 8003488 8054494 8058464 30 * 8067796 31 * @key intermittent 32 * @summary Basic tests for Process and Environment Variable code 33 * @modules java.base/java.lang:open 34 * @run main/othervm/timeout=300 Basic 35 * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=fork Basic 36 * @test 37 * @requires (os.family == "liinux") 38 * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=posix_spawn Basic 39 * @author Martin Buchholz 40 */
If I have @requires (os.family == "linux") all three variants are executed - ok.
Then I wanted to have a negative test, so I misnamed linux as "liinux" and would have expected the first @test block to fire, the second to be ignored. But in fact:
thomas@mainframe /shared/projects/openjdk/jtreg $ ../jtreg-prebuilt/jtreg/bin/jtreg -jdk ../jdk-jdk/output-slowdebug/images/jdk/ ../jdk-jdk/source/test/jdk/java/lang/ProcessBuilder/Basic.java Test results: no tests selected
So to me it looks like as if the @requires tag is valid for both @test blocks, not only the one it appears in.
I'll run this through our build system too.
To fully test out using posix_spawn in many more different scenarios the build system should be augmented to be able to use it as the default for an entire build/CI/CD/test runs.
Sure! But I think this is out of scope for this patch.
Thanks, Thomas
Thanks, Roger
On 10/24/2018 10:35 AM, Thomas Stüfe wrote:
Hi all,
version 2 of Davids patch, with test changes: http://cr.openjdk.java.net/~stuefe/webrevs/JDK-8212828-posix_spawn.patch/web...
Executed the test on my local Ubuntu box, works fine. Submit job runs.
About the test: I added a new line:
* @run main/othervm/timeout=300 Basic * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=fork Basic + * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=posix_spawn Basic
The way I understand those tests work is that the first line causes the test to be run with the default launch mechanism, the second line with jdk.lang.Process.launchMechanism=fork explicitly. The third line, added by me, will now explicitly test with posix_spawn. I did not limit the platforms, since posix_spawn should now be available on all platforms, so it should work on all platforms. But then, on all other platforms it has already been the default.
This is still a bit iffy: On Windows, jdk.lang.Process.launchMechanism gets ignored, so we just executed the same test twice (now three times)? And now we execute the posix_spawn variant twice on all platforms where this is the default, so line 1 and 3 are the same? You see I am not a jtreg expert :) Can I specify a @run directive for only one platform? In that case I would limit the explicit posix_spawn test to Linux.
Note however that if we really abondon vfork in the future and make posix_spawn the default, this test becomes simpler too.
Best, Thomas
On Wed, Oct 24, 2018 at 3:36 PM David Lloyd <david.lloyd@redhat.com> <david.lloyd@redhat.com> wrote:
On Wed, Oct 24, 2018 at 1:05 AM Thomas Stüfe <thomas.stuefe@gmail.com> <thomas.stuefe@gmail.com> wrote:
Review:
- copyright dates need updating on the C-sources
- I opt for "#if defined(__solaris__) || defined(_ALLBSD_SOURCE) || defined(_AIX) || defined(__linux__)" to be removed completely from unix-specific source files. The ifdef now covers all OpenJDK Unix platforms.
Here's a version with these changes.
-- - DML
Hi all, the more I mull over this, the more I would prefer to do the jump for real and attempt switch the default to posix_spawn() for Linux. We have theoretically established that both glibc down to 2.4 and muslc since always did "the right thing". We still have time in the 12 time line to test this thoroughly. What do you think? Thanks, Thomas On Wed, Oct 24, 2018 at 10:59 PM Thomas Stüfe <thomas.stuefe@gmail.com> wrote:
Hi Roger,
On Wed 24. Oct 2018 at 21:39, Roger Riggs <roger.riggs@oracle.com> wrote:
Hi Thomas,
Sorry, I had the incantation for multiple tests wrong. Separate test configurations need to be in separate /* */ comment blocks. BTW, it creates separate .jtr files for each block.
diff --git a/test/jdk/java/lang/ProcessBuilder/Basic.java b/test/jdk/java/lang/ProcessBuilder/Basic.java --- a/test/jdk/java/lang/ProcessBuilder/Basic.java +++ b/test/jdk/java/lang/ProcessBuilder/Basic.java @@ -33,8 +33,10 @@ * @modules java.base/java.lang:open * @run main/othervm/timeout=300 Basic * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=fork Basic - * + */ +/* * @test + * @modules java.base/java.lang:open * @requires (os.family == "linux") * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=posix_spawn Basic * @author Martin Buchholz
As for the build configuration being out of scope...
I don't want to see part of the work done and then the rest forgotten or worse yet someone comes along in the future and incorrectly believes that posix_spawn is ready for production and starts filing bugs or getting bit by something. There's a lot more testing to do before it could be come the default and perhaps a significant warning should be attached to the code so it does not get forgotten.
I agree, we ideally should not roll out half tested changes. But where does that leave us wrt this patch?
We could just not do it, requiring anyone willing to do the extensive testing necessary to switch the default to posix-spawn to apply this change first locally. This is not an insurmountable amount of work, especially since the base is quite static, so the patch won’t bit rot easily.
We could push the patch in its current form, plus a large source comment? But then. Users do not read comments. A warning on stderr? Would play havoc with many tests parsing stderr.
Do you have an idea how to proceed?
Thanks, Thomas
Regards, Roger
On 10/24/18 1:53 PM, Thomas Stüfe wrote:
Hi Roger,
On Wed, Oct 24, 2018 at 5:23 PM Roger Riggs <Roger.Riggs@oracle.com> wrote:
Hi Thomas,
Thanks for adding the test.
There's a feature of jtreg that was exposed a couple of month ago that can be used to deal with running the test too many times.
There can be multiple @test blocks with different @requires.
* @run main/othervm/timeout=300 Basic * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=fork Basic * * @test * @requires (os.family == "linux") * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=posix_spawn Basic
Does not seem to work, sorry:
24 /* 25 * @test 26 * @bug 4199068 4738465 4937983 4930681 4926230 4931433 4932663 4986689 27 * 5026830 5023243 5070673 4052517 4811767 6192449 6397034 6413313 28 * 6464154 6523983 6206031 4960438 6631352 6631966 6850957 6850958 29 * 4947220 7018606 7034570 4244896 5049299 8003488 8054494 8058464 30 * 8067796 31 * @key intermittent 32 * @summary Basic tests for Process and Environment Variable code 33 * @modules java.base/java.lang:open 34 * @run main/othervm/timeout=300 Basic 35 * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=fork Basic 36 * @test 37 * @requires (os.family == "liinux") 38 * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=posix_spawn Basic 39 * @author Martin Buchholz 40 */
If I have @requires (os.family == "linux") all three variants are executed - ok.
Then I wanted to have a negative test, so I misnamed linux as "liinux" and would have expected the first @test block to fire, the second to be ignored. But in fact:
thomas@mainframe /shared/projects/openjdk/jtreg $ ../jtreg-prebuilt/jtreg/bin/jtreg -jdk ../jdk-jdk/output-slowdebug/images/jdk/ ../jdk-jdk/source/test/jdk/java/lang/ProcessBuilder/Basic.java Test results: no tests selected
So to me it looks like as if the @requires tag is valid for both @test blocks, not only the one it appears in.
I'll run this through our build system too.
To fully test out using posix_spawn in many more different scenarios the build system should be augmented to be able to use it as the default for an entire build/CI/CD/test runs.
Sure! But I think this is out of scope for this patch.
Thanks, Thomas
Thanks, Roger
On 10/24/2018 10:35 AM, Thomas Stüfe wrote:
Hi all,
version 2 of Davids patch, with test changes:
http://cr.openjdk.java.net/~stuefe/webrevs/JDK-8212828-posix_spawn.patch/web...
Executed the test on my local Ubuntu box, works fine. Submit job runs.
About the test: I added a new line:
* @run main/othervm/timeout=300 Basic * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=fork Basic + * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=posix_spawn Basic
The way I understand those tests work is that the first line causes the test to be run with the default launch mechanism, the second line with jdk.lang.Process.launchMechanism=fork explicitly. The third line, added by me, will now explicitly test with posix_spawn. I did not limit the platforms, since posix_spawn should now be available on all platforms, so it should work on all platforms. But then, on all other platforms it has already been the default.
This is still a bit iffy: On Windows, jdk.lang.Process.launchMechanism gets ignored, so we just executed the same test twice (now three times)? And now we execute the posix_spawn variant twice on all platforms where this is the default, so line 1 and 3 are the same? You see I am not a jtreg expert :) Can I specify a @run directive for only one platform? In that case I would limit the explicit posix_spawn test to Linux.
Note however that if we really abondon vfork in the future and make posix_spawn the default, this test becomes simpler too.
Best, Thomas
On Wed, Oct 24, 2018 at 3:36 PM David Lloyd <david.lloyd@redhat.com> wrote:
On Wed, Oct 24, 2018 at 1:05 AM Thomas Stüfe <thomas.stuefe@gmail.com> wrote:
Review:
- copyright dates need updating on the C-sources
- I opt for "#if defined(__solaris__) || defined(_ALLBSD_SOURCE) || defined(_AIX) || defined(__linux__)" to be removed completely from unix-specific source files. The ifdef now covers all OpenJDK Unix platforms.
Here's a version with these changes.
-- - DML
I'm in favor, for whatever that's worth. On Thu, Oct 25, 2018 at 5:33 AM Thomas Stüfe <thomas.stuefe@gmail.com> wrote:
Hi all,
the more I mull over this, the more I would prefer to do the jump for real and attempt switch the default to posix_spawn() for Linux.
We have theoretically established that both glibc down to 2.4 and muslc since always did "the right thing".
We still have time in the 12 time line to test this thoroughly.
What do you think?
Thanks, Thomas
On Wed, Oct 24, 2018 at 10:59 PM Thomas Stüfe <thomas.stuefe@gmail.com> wrote:
Hi Roger,
On Wed 24. Oct 2018 at 21:39, Roger Riggs <roger.riggs@oracle.com> wrote:
Hi Thomas,
Sorry, I had the incantation for multiple tests wrong. Separate test configurations need to be in separate /* */ comment blocks. BTW, it creates separate .jtr files for each block.
diff --git a/test/jdk/java/lang/ProcessBuilder/Basic.java b/test/jdk/java/lang/ProcessBuilder/Basic.java --- a/test/jdk/java/lang/ProcessBuilder/Basic.java +++ b/test/jdk/java/lang/ProcessBuilder/Basic.java @@ -33,8 +33,10 @@ * @modules java.base/java.lang:open * @run main/othervm/timeout=300 Basic * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=fork Basic - * + */ +/* * @test + * @modules java.base/java.lang:open * @requires (os.family == "linux") * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=posix_spawn Basic * @author Martin Buchholz
As for the build configuration being out of scope...
I don't want to see part of the work done and then the rest forgotten or worse yet someone comes along in the future and incorrectly believes that posix_spawn is ready for production and starts filing bugs or getting bit by something. There's a lot more testing to do before it could be come the default and perhaps a significant warning should be attached to the code so it does not get forgotten.
I agree, we ideally should not roll out half tested changes. But where does that leave us wrt this patch?
We could just not do it, requiring anyone willing to do the extensive testing necessary to switch the default to posix-spawn to apply this change first locally. This is not an insurmountable amount of work, especially since the base is quite static, so the patch won’t bit rot easily.
We could push the patch in its current form, plus a large source comment? But then. Users do not read comments. A warning on stderr? Would play havoc with many tests parsing stderr.
Do you have an idea how to proceed?
Thanks, Thomas
Regards, Roger
On 10/24/18 1:53 PM, Thomas Stüfe wrote:
Hi Roger,
On Wed, Oct 24, 2018 at 5:23 PM Roger Riggs <Roger.Riggs@oracle.com> wrote:
Hi Thomas,
Thanks for adding the test.
There's a feature of jtreg that was exposed a couple of month ago that can be used to deal with running the test too many times.
There can be multiple @test blocks with different @requires.
* @run main/othervm/timeout=300 Basic * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=fork Basic * * @test * @requires (os.family == "linux") * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=posix_spawn Basic
Does not seem to work, sorry:
24 /* 25 * @test 26 * @bug 4199068 4738465 4937983 4930681 4926230 4931433 4932663 4986689 27 * 5026830 5023243 5070673 4052517 4811767 6192449 6397034 6413313 28 * 6464154 6523983 6206031 4960438 6631352 6631966 6850957 6850958 29 * 4947220 7018606 7034570 4244896 5049299 8003488 8054494 8058464 30 * 8067796 31 * @key intermittent 32 * @summary Basic tests for Process and Environment Variable code 33 * @modules java.base/java.lang:open 34 * @run main/othervm/timeout=300 Basic 35 * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=fork Basic 36 * @test 37 * @requires (os.family == "liinux") 38 * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=posix_spawn Basic 39 * @author Martin Buchholz 40 */
If I have @requires (os.family == "linux") all three variants are executed - ok.
Then I wanted to have a negative test, so I misnamed linux as "liinux" and would have expected the first @test block to fire, the second to be ignored. But in fact:
thomas@mainframe /shared/projects/openjdk/jtreg $ ../jtreg-prebuilt/jtreg/bin/jtreg -jdk ../jdk-jdk/output-slowdebug/images/jdk/ ../jdk-jdk/source/test/jdk/java/lang/ProcessBuilder/Basic.java Test results: no tests selected
So to me it looks like as if the @requires tag is valid for both @test blocks, not only the one it appears in.
I'll run this through our build system too.
To fully test out using posix_spawn in many more different scenarios the build system should be augmented to be able to use it as the default for an entire build/CI/CD/test runs.
Sure! But I think this is out of scope for this patch.
Thanks, Thomas
Thanks, Roger
On 10/24/2018 10:35 AM, Thomas Stüfe wrote:
Hi all,
version 2 of Davids patch, with test changes:
http://cr.openjdk.java.net/~stuefe/webrevs/JDK-8212828-posix_spawn.patch/web...
Executed the test on my local Ubuntu box, works fine. Submit job runs.
About the test: I added a new line:
* @run main/othervm/timeout=300 Basic * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=fork Basic + * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=posix_spawn Basic
The way I understand those tests work is that the first line causes the test to be run with the default launch mechanism, the second line with jdk.lang.Process.launchMechanism=fork explicitly. The third line, added by me, will now explicitly test with posix_spawn. I did not limit the platforms, since posix_spawn should now be available on all platforms, so it should work on all platforms. But then, on all other platforms it has already been the default.
This is still a bit iffy: On Windows, jdk.lang.Process.launchMechanism gets ignored, so we just executed the same test twice (now three times)? And now we execute the posix_spawn variant twice on all platforms where this is the default, so line 1 and 3 are the same? You see I am not a jtreg expert :) Can I specify a @run directive for only one platform? In that case I would limit the explicit posix_spawn test to Linux.
Note however that if we really abondon vfork in the future and make posix_spawn the default, this test becomes simpler too.
Best, Thomas
On Wed, Oct 24, 2018 at 3:36 PM David Lloyd <david.lloyd@redhat.com> wrote:
On Wed, Oct 24, 2018 at 1:05 AM Thomas Stüfe <thomas.stuefe@gmail.com> wrote:
Review:
- copyright dates need updating on the C-sources
- I opt for "#if defined(__solaris__) || defined(_ALLBSD_SOURCE) || defined(_AIX) || defined(__linux__)" to be removed completely from unix-specific source files. The ifdef now covers all OpenJDK Unix platforms.
Here's a version with these changes.
-- - DML
-- - DML
Hi Thomas, In an abundance of caution, I was thinking that it would be a change right at the beginning of a new release so it gets the most exercise and users in early access, etc. And before that it needs to be put into more regular usage and some more unusual environments. The default is currently selected by virtual of being first in the argument list; that doesn't lend itself to being configurable with either configure or make arguments. Roger On 10/25/2018 06:33 AM, Thomas Stüfe wrote:
Hi all,
the more I mull over this, the more I would prefer to do the jump for real and attempt switch the default to posix_spawn() for Linux.
We have theoretically established that both glibc down to 2.4 and muslc since always did "the right thing".
We still have time in the 12 time line to test this thoroughly.
What do you think?
Thanks, Thomas
On Wed, Oct 24, 2018 at 10:59 PM Thomas Stüfe<thomas.stuefe@gmail.com> wrote:
Hi Roger,
On Wed 24. Oct 2018 at 21:39, Roger Riggs<roger.riggs@oracle.com> wrote:
Hi Thomas,
Sorry, I had the incantation for multiple tests wrong. Separate test configurations need to be in separate /* */ comment blocks. BTW, it creates separate .jtr files for each block.
diff --git a/test/jdk/java/lang/ProcessBuilder/Basic.java b/test/jdk/java/lang/ProcessBuilder/Basic.java --- a/test/jdk/java/lang/ProcessBuilder/Basic.java +++ b/test/jdk/java/lang/ProcessBuilder/Basic.java @@ -33,8 +33,10 @@ * @modules java.base/java.lang:open * @run main/othervm/timeout=300 Basic * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=fork Basic - * + */ +/* * @test + * @modules java.base/java.lang:open * @requires (os.family == "linux") * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=posix_spawn Basic * @author Martin Buchholz
As for the build configuration being out of scope...
I don't want to see part of the work done and then the rest forgotten or worse yet someone comes along in the future and incorrectly believes that posix_spawn is ready for production and starts filing bugs or getting bit by something. There's a lot more testing to do before it could be come the default and perhaps a significant warning should be attached to the code so it does not get forgotten. I agree, we ideally should not roll out half tested changes. But where does that leave us wrt this patch?
We could just not do it, requiring anyone willing to do the extensive testing necessary to switch the default to posix-spawn to apply this change first locally. This is not an insurmountable amount of work, especially since the base is quite static, so the patch won’t bit rot easily.
We could push the patch in its current form, plus a large source comment? But then. Users do not read comments. A warning on stderr? Would play havoc with many tests parsing stderr.
Do you have an idea how to proceed?
Thanks, Thomas
Regards, Roger
On 10/24/18 1:53 PM, Thomas Stüfe wrote:
Hi Roger,
On Wed, Oct 24, 2018 at 5:23 PM Roger Riggs<Roger.Riggs@oracle.com> wrote:
Hi Thomas,
Thanks for adding the test.
There's a feature of jtreg that was exposed a couple of month ago that can be used to deal with running the test too many times.
There can be multiple @test blocks with different @requires.
* @run main/othervm/timeout=300 Basic * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=fork Basic * * @test * @requires (os.family == "linux") * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=posix_spawn Basic
Does not seem to work, sorry:
24 /* 25 * @test 26 * @bug 4199068 4738465 4937983 4930681 4926230 4931433 4932663 4986689 27 * 5026830 5023243 5070673 4052517 4811767 6192449 6397034 6413313 28 * 6464154 6523983 6206031 4960438 6631352 6631966 6850957 6850958 29 * 4947220 7018606 7034570 4244896 5049299 8003488 8054494 8058464 30 * 8067796 31 * @key intermittent 32 * @summary Basic tests for Process and Environment Variable code 33 * @modules java.base/java.lang:open 34 * @run main/othervm/timeout=300 Basic 35 * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=fork Basic 36 * @test 37 * @requires (os.family == "liinux") 38 * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=posix_spawn Basic 39 * @author Martin Buchholz 40 */
If I have @requires (os.family == "linux") all three variants are executed - ok.
Then I wanted to have a negative test, so I misnamed linux as "liinux" and would have expected the first @test block to fire, the second to be ignored. But in fact:
thomas@mainframe /shared/projects/openjdk/jtreg $ ../jtreg-prebuilt/jtreg/bin/jtreg -jdk ../jdk-jdk/output-slowdebug/images/jdk/ ../jdk-jdk/source/test/jdk/java/lang/ProcessBuilder/Basic.java Test results: no tests selected
So to me it looks like as if the @requires tag is valid for both @test blocks, not only the one it appears in.
I'll run this through our build system too.
To fully test out using posix_spawn in many more different scenarios the build system should be augmented to be able to use it as the default for an entire build/CI/CD/test runs.
Sure! But I think this is out of scope for this patch.
Thanks, Thomas
Thanks, Roger
On 10/24/2018 10:35 AM, Thomas Stüfe wrote:
Hi all,
version 2 of Davids patch, with test changes:
http://cr.openjdk.java.net/~stuefe/webrevs/JDK-8212828-posix_spawn.patch/web...
Executed the test on my local Ubuntu box, works fine. Submit job runs.
About the test: I added a new line:
* @run main/othervm/timeout=300 Basic * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=fork Basic + * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=posix_spawn Basic
The way I understand those tests work is that the first line causes the test to be run with the default launch mechanism, the second line with jdk.lang.Process.launchMechanism=fork explicitly. The third line, added by me, will now explicitly test with posix_spawn. I did not limit the platforms, since posix_spawn should now be available on all platforms, so it should work on all platforms. But then, on all other platforms it has already been the default.
This is still a bit iffy: On Windows, jdk.lang.Process.launchMechanism gets ignored, so we just executed the same test twice (now three times)? And now we execute the posix_spawn variant twice on all platforms where this is the default, so line 1 and 3 are the same? You see I am not a jtreg expert :) Can I specify a @run directive for only one platform? In that case I would limit the explicit posix_spawn test to Linux.
Note however that if we really abondon vfork in the future and make posix_spawn the default, this test becomes simpler too.
Best, Thomas
On Wed, Oct 24, 2018 at 3:36 PM David Lloyd<david.lloyd@redhat.com> wrote:
On Wed, Oct 24, 2018 at 1:05 AM Thomas Stüfe<thomas.stuefe@gmail.com> wrote:
Review:
- copyright dates need updating on the C-sources
- I opt for "#if defined(__solaris__) || defined(_ALLBSD_SOURCE) || defined(_AIX) || defined(__linux__)" to be removed completely from unix-specific source files. The ifdef now covers all OpenJDK Unix platforms.
Here's a version with these changes.
-- - DML
Hi Roger, On Thu, Oct 25, 2018 at 10:45 PM Roger Riggs <Roger.Riggs@oracle.com> wrote:
Hi Thomas,
In an abundance of caution, I was thinking that it would be a change right at the beginning of a new release so it gets the most exercise and users in early access, etc.
Okay, I understand that. Over the next days I will run tests with posix_spawn enabled by default on our landscape. We have many different Linuxes on different architectures and different levels of glibc, so this is a reasonable test. If I do not encounter red flags, I would consider the posix_spawn path tested well enough to ship it at least as a non-default, experimental option. Like David originally intended. Then, start of next release, we can make it default and see how that goes. Does that sound ok to you?
And before that it needs to be put into more regular usage and some more unusual environments. The default is currently selected by virtual of being first in the argument list; that doesn't lend itself to being configurable with either configure or make arguments.
I do not understand the last sentence: it was never my intention of adding a configure option?
Roger
Thanks, Thomas
On 10/25/2018 06:33 AM, Thomas Stüfe wrote:
Hi all,
the more I mull over this, the more I would prefer to do the jump for real and attempt switch the default to posix_spawn() for Linux.
We have theoretically established that both glibc down to 2.4 and muslc since always did "the right thing".
We still have time in the 12 time line to test this thoroughly.
What do you think?
Thanks, Thomas
On Wed, Oct 24, 2018 at 10:59 PM Thomas Stüfe <thomas.stuefe@gmail.com> wrote:
Hi Roger,
On Wed 24. Oct 2018 at 21:39, Roger Riggs <roger.riggs@oracle.com> wrote:
Hi Thomas,
Sorry, I had the incantation for multiple tests wrong. Separate test configurations need to be in separate /* */ comment blocks. BTW, it creates separate .jtr files for each block.
diff --git a/test/jdk/java/lang/ProcessBuilder/Basic.java b/test/jdk/java/lang/ProcessBuilder/Basic.java --- a/test/jdk/java/lang/ProcessBuilder/Basic.java +++ b/test/jdk/java/lang/ProcessBuilder/Basic.java @@ -33,8 +33,10 @@ * @modules java.base/java.lang:open * @run main/othervm/timeout=300 Basic * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=fork Basic - * + */ +/* * @test + * @modules java.base/java.lang:open * @requires (os.family == "linux") * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=posix_spawn Basic * @author Martin Buchholz
As for the build configuration being out of scope...
I don't want to see part of the work done and then the rest forgotten or worse yet someone comes along in the future and incorrectly believes that posix_spawn is ready for production and starts filing bugs or getting bit by something. There's a lot more testing to do before it could be come the default and perhaps a significant warning should be attached to the code so it does not get forgotten.
I agree, we ideally should not roll out half tested changes. But where does that leave us wrt this patch?
We could just not do it, requiring anyone willing to do the extensive testing necessary to switch the default to posix-spawn to apply this change first locally. This is not an insurmountable amount of work, especially since the base is quite static, so the patch won’t bit rot easily.
We could push the patch in its current form, plus a large source comment? But then. Users do not read comments. A warning on stderr? Would play havoc with many tests parsing stderr.
Do you have an idea how to proceed?
Thanks, Thomas
Regards, Roger
On 10/24/18 1:53 PM, Thomas Stüfe wrote:
Hi Roger,
On Wed, Oct 24, 2018 at 5:23 PM Roger Riggs <Roger.Riggs@oracle.com> wrote:
Hi Thomas,
Thanks for adding the test.
There's a feature of jtreg that was exposed a couple of month ago that can be used to deal with running the test too many times.
There can be multiple @test blocks with different @requires.
* @run main/othervm/timeout=300 Basic * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=fork Basic * * @test * @requires (os.family == "linux") * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=posix_spawn Basic
Does not seem to work, sorry:
24 /* 25 * @test 26 * @bug 4199068 4738465 4937983 4930681 4926230 4931433 4932663 4986689 27 * 5026830 5023243 5070673 4052517 4811767 6192449 6397034 6413313 28 * 6464154 6523983 6206031 4960438 6631352 6631966 6850957 6850958 29 * 4947220 7018606 7034570 4244896 5049299 8003488 8054494 8058464 30 * 8067796 31 * @key intermittent 32 * @summary Basic tests for Process and Environment Variable code 33 * @modules java.base/java.lang:open 34 * @run main/othervm/timeout=300 Basic 35 * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=fork Basic 36 * @test 37 * @requires (os.family == "liinux") 38 * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=posix_spawn Basic 39 * @author Martin Buchholz 40 */
If I have @requires (os.family == "linux") all three variants are executed - ok.
Then I wanted to have a negative test, so I misnamed linux as "liinux" and would have expected the first @test block to fire, the second to be ignored. But in fact:
thomas@mainframe /shared/projects/openjdk/jtreg $ ../jtreg-prebuilt/jtreg/bin/jtreg -jdk ../jdk-jdk/output-slowdebug/images/jdk/ ../jdk-jdk/source/test/jdk/java/lang/ProcessBuilder/Basic.java Test results: no tests selected
So to me it looks like as if the @requires tag is valid for both @test blocks, not only the one it appears in.
I'll run this through our build system too.
To fully test out using posix_spawn in many more different scenarios the build system should be augmented to be able to use it as the default for an entire build/CI/CD/test runs.
Sure! But I think this is out of scope for this patch.
Thanks, Thomas
Thanks, Roger
On 10/24/2018 10:35 AM, Thomas Stüfe wrote:
Hi all,
version 2 of Davids patch, with test changes:
http://cr.openjdk.java.net/~stuefe/webrevs/JDK-8212828-posix_spawn.patch/web...
Executed the test on my local Ubuntu box, works fine. Submit job runs.
About the test: I added a new line:
* @run main/othervm/timeout=300 Basic * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=fork Basic + * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=posix_spawn Basic
The way I understand those tests work is that the first line causes the test to be run with the default launch mechanism, the second line with jdk.lang.Process.launchMechanism=fork explicitly. The third line, added by me, will now explicitly test with posix_spawn. I did not limit the platforms, since posix_spawn should now be available on all platforms, so it should work on all platforms. But then, on all other platforms it has already been the default.
This is still a bit iffy: On Windows, jdk.lang.Process.launchMechanism gets ignored, so we just executed the same test twice (now three times)? And now we execute the posix_spawn variant twice on all platforms where this is the default, so line 1 and 3 are the same? You see I am not a jtreg expert :) Can I specify a @run directive for only one platform? In that case I would limit the explicit posix_spawn test to Linux.
Note however that if we really abondon vfork in the future and make posix_spawn the default, this test becomes simpler too.
Best, Thomas
On Wed, Oct 24, 2018 at 3:36 PM David Lloyd <david.lloyd@redhat.com> wrote:
On Wed, Oct 24, 2018 at 1:05 AM Thomas Stüfe <thomas.stuefe@gmail.com> wrote:
Review:
- copyright dates need updating on the C-sources
- I opt for "#if defined(__solaris__) || defined(_ALLBSD_SOURCE) || defined(_AIX) || defined(__linux__)" to be removed completely from unix-specific source files. The ifdef now covers all OpenJDK Unix platforms.
Here's a version with these changes.
-- - DML
Hi Thomas, On 10/29/2018 12:04 PM, Thomas Stüfe wrote:
Hi Roger,
On Thu, Oct 25, 2018 at 10:45 PM Roger Riggs <Roger.Riggs@oracle.com> wrote:
Hi Thomas,
In an abundance of caution, I was thinking that it would be a change right at the beginning of a new release so it gets the most exercise and users in early access, etc.
Okay, I understand that.
Over the next days I will run tests with posix_spawn enabled by default on our landscape. We have many different Linuxes on different architectures and different levels of glibc, so this is a reasonable test.
If I do not encounter red flags, I would consider the posix_spawn path tested well enough to ship it at least as a non-default, experimental option. Like David originally intended. Then, start of next release, we can make it default and see how that goes.
Does that sound ok to you?
That's fine, until it becomes the default, it will be opt in. Is there an updated webrev with the corrected test executions?
And before that it needs to be put into more regular usage and some more unusual environments. The default is currently selected by virtual of being first in the argument list; that doesn't lend itself to being configurable with either configure or make arguments. I do not understand the last sentence: it was never my intention of adding a configure option?
I was considering whether it would be useful to have such an option to make it easier to make it the default in a particular build. But it seems unnecessary since the tests can be done without. Thanks, Roger ...
On 10/25/2018 06:33 AM, Thomas Stüfe wrote:
Hi all,
the more I mull over this, the more I would prefer to do the jump for real and attempt switch the default to posix_spawn() for Linux.
We have theoretically established that both glibc down to 2.4 and muslc since always did "the right thing".
We still have time in the 12 time line to test this thoroughly.
What do you think?
Thanks, Thomas
On Wed, Oct 24, 2018 at 10:59 PM Thomas Stüfe <thomas.stuefe@gmail.com> wrote:
Hi Roger,
On Wed 24. Oct 2018 at 21:39, Roger Riggs <roger.riggs@oracle.com> wrote:
Hi Thomas,
Sorry, I had the incantation for multiple tests wrong. Separate test configurations need to be in separate /* */ comment blocks. BTW, it creates separate .jtr files for each block.
diff --git a/test/jdk/java/lang/ProcessBuilder/Basic.java b/test/jdk/java/lang/ProcessBuilder/Basic.java --- a/test/jdk/java/lang/ProcessBuilder/Basic.java +++ b/test/jdk/java/lang/ProcessBuilder/Basic.java @@ -33,8 +33,10 @@ * @modules java.base/java.lang:open * @run main/othervm/timeout=300 Basic * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=fork Basic - * + */ +/* * @test + * @modules java.base/java.lang:open * @requires (os.family == "linux") * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=posix_spawn Basic * @author Martin Buchholz
As for the build configuration being out of scope...
I don't want to see part of the work done and then the rest forgotten or worse yet someone comes along in the future and incorrectly believes that posix_spawn is ready for production and starts filing bugs or getting bit by something. There's a lot more testing to do before it could be come the default and perhaps a significant warning should be attached to the code so it does not get forgotten.
I agree, we ideally should not roll out half tested changes. But where does that leave us wrt this patch?
We could just not do it, requiring anyone willing to do the extensive testing necessary to switch the default to posix-spawn to apply this change first locally. This is not an insurmountable amount of work, especially since the base is quite static, so the patch won’t bit rot easily.
We could push the patch in its current form, plus a large source comment? But then. Users do not read comments. A warning on stderr? Would play havoc with many tests parsing stderr.
Do you have an idea how to proceed?
Thanks, Thomas
Regards, Roger
On 10/24/18 1:53 PM, Thomas Stüfe wrote:
Hi Roger,
On Wed, Oct 24, 2018 at 5:23 PM Roger Riggs <Roger.Riggs@oracle.com> wrote:
Hi Thomas,
Thanks for adding the test.
There's a feature of jtreg that was exposed a couple of month ago that can be used to deal with running the test too many times.
There can be multiple @test blocks with different @requires.
* @run main/othervm/timeout=300 Basic * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=fork Basic * * @test * @requires (os.family == "linux") * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=posix_spawn Basic
Does not seem to work, sorry:
24 /* 25 * @test 26 * @bug 4199068 4738465 4937983 4930681 4926230 4931433 4932663 4986689 27 * 5026830 5023243 5070673 4052517 4811767 6192449 6397034 6413313 28 * 6464154 6523983 6206031 4960438 6631352 6631966 6850957 6850958 29 * 4947220 7018606 7034570 4244896 5049299 8003488 8054494 8058464 30 * 8067796 31 * @key intermittent 32 * @summary Basic tests for Process and Environment Variable code 33 * @modules java.base/java.lang:open 34 * @run main/othervm/timeout=300 Basic 35 * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=fork Basic 36 * @test 37 * @requires (os.family == "liinux") 38 * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=posix_spawn Basic 39 * @author Martin Buchholz 40 */
If I have @requires (os.family == "linux") all three variants are executed - ok.
Then I wanted to have a negative test, so I misnamed linux as "liinux" and would have expected the first @test block to fire, the second to be ignored. But in fact:
thomas@mainframe /shared/projects/openjdk/jtreg $ ../jtreg-prebuilt/jtreg/bin/jtreg -jdk ../jdk-jdk/output-slowdebug/images/jdk/ ../jdk-jdk/source/test/jdk/java/lang/ProcessBuilder/Basic.java Test results: no tests selected
So to me it looks like as if the @requires tag is valid for both @test blocks, not only the one it appears in.
I'll run this through our build system too.
To fully test out using posix_spawn in many more different scenarios the build system should be augmented to be able to use it as the default for an entire build/CI/CD/test runs.
Sure! But I think this is out of scope for this patch.
Thanks, Thomas
Thanks, Roger
On 10/24/2018 10:35 AM, Thomas Stüfe wrote:
Hi all,
version 2 of Davids patch, with test changes:
http://cr.openjdk.java.net/~stuefe/webrevs/JDK-8212828-posix_spawn.patch/web...
Executed the test on my local Ubuntu box, works fine. Submit job runs.
About the test: I added a new line:
* @run main/othervm/timeout=300 Basic * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=fork Basic + * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=posix_spawn Basic
The way I understand those tests work is that the first line causes the test to be run with the default launch mechanism, the second line with jdk.lang.Process.launchMechanism=fork explicitly. The third line, added by me, will now explicitly test with posix_spawn. I did not limit the platforms, since posix_spawn should now be available on all platforms, so it should work on all platforms. But then, on all other platforms it has already been the default.
This is still a bit iffy: On Windows, jdk.lang.Process.launchMechanism gets ignored, so we just executed the same test twice (now three times)? And now we execute the posix_spawn variant twice on all platforms where this is the default, so line 1 and 3 are the same? You see I am not a jtreg expert :) Can I specify a @run directive for only one platform? In that case I would limit the explicit posix_spawn test to Linux.
Note however that if we really abondon vfork in the future and make posix_spawn the default, this test becomes simpler too.
Best, Thomas
On Wed, Oct 24, 2018 at 3:36 PM David Lloyd <david.lloyd@redhat.com> wrote:
On Wed, Oct 24, 2018 at 1:05 AM Thomas Stüfe <thomas.stuefe@gmail.com> wrote:
Review:
- copyright dates need updating on the C-sources
- I opt for "#if defined(__solaris__) || defined(_ALLBSD_SOURCE) || defined(_AIX) || defined(__linux__)" to be removed completely from unix-specific source files. The ifdef now covers all OpenJDK Unix platforms.
Here's a version with these changes.
-- - DML
Hi Roger, On Tue, Oct 30, 2018 at 3:46 PM Roger Riggs <Roger.Riggs@oracle.com> wrote:
Hi Thomas,
On 10/29/2018 12:04 PM, Thomas Stüfe wrote:
Hi Roger,
On Thu, Oct 25, 2018 at 10:45 PM Roger Riggs <Roger.Riggs@oracle.com> wrote:
Hi Thomas,
In an abundance of caution, I was thinking that it would be a change right at the beginning of a new release so it gets the most exercise and users in early access, etc.
Okay, I understand that.
Over the next days I will run tests with posix_spawn enabled by default on our landscape. We have many different Linuxes on different architectures and different levels of glibc, so this is a reasonable test.
If I do not encounter red flags, I would consider the posix_spawn path tested well enough to ship it at least as a non-default, experimental option. Like David originally intended. Then, start of next release, we can make it default and see how that goes.
Does that sound ok to you?
That's fine, until it becomes the default, it will be opt in.
Is there an updated webrev with the corrected test executions?
Here you go. http://cr.openjdk.java.net/~stuefe/webrevs/JDK-8212828-posix_spawn-on-linux/... Tested locally, seems to work fine. I am re-running the submit tests too.
And before that it needs to be put into more regular usage and some more unusual environments. The default is currently selected by virtual of being first in the argument list; that doesn't lend itself to being configurable with either configure or make arguments.
I do not understand the last sentence: it was never my intention of adding a configure option?
I was considering whether it would be useful to have such an option to make it easier to make it the default in a particular build. But it seems unnecessary since the tests can be done without.
Okay, understood, thanks for clarifying. Thanks, Thomas
Thanks, Roger
...
On 10/25/2018 06:33 AM, Thomas Stüfe wrote:
Hi all,
the more I mull over this, the more I would prefer to do the jump for real and attempt switch the default to posix_spawn() for Linux.
We have theoretically established that both glibc down to 2.4 and muslc since always did "the right thing".
We still have time in the 12 time line to test this thoroughly.
What do you think?
Thanks, Thomas
On Wed, Oct 24, 2018 at 10:59 PM Thomas Stüfe <thomas.stuefe@gmail.com> wrote:
Hi Roger,
On Wed 24. Oct 2018 at 21:39, Roger Riggs <roger.riggs@oracle.com> wrote:
Hi Thomas,
Sorry, I had the incantation for multiple tests wrong. Separate test configurations need to be in separate /* */ comment blocks. BTW, it creates separate .jtr files for each block.
diff --git a/test/jdk/java/lang/ProcessBuilder/Basic.java b/test/jdk/java/lang/ProcessBuilder/Basic.java --- a/test/jdk/java/lang/ProcessBuilder/Basic.java +++ b/test/jdk/java/lang/ProcessBuilder/Basic.java @@ -33,8 +33,10 @@ * @modules java.base/java.lang:open * @run main/othervm/timeout=300 Basic * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=fork Basic - * + */ +/* * @test + * @modules java.base/java.lang:open * @requires (os.family == "linux") * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=posix_spawn Basic * @author Martin Buchholz
As for the build configuration being out of scope...
I don't want to see part of the work done and then the rest forgotten or worse yet someone comes along in the future and incorrectly believes that posix_spawn is ready for production and starts filing bugs or getting bit by something. There's a lot more testing to do before it could be come the default and perhaps a significant warning should be attached to the code so it does not get forgotten.
I agree, we ideally should not roll out half tested changes. But where does that leave us wrt this patch?
We could just not do it, requiring anyone willing to do the extensive testing necessary to switch the default to posix-spawn to apply this change first locally. This is not an insurmountable amount of work, especially since the base is quite static, so the patch won’t bit rot easily.
We could push the patch in its current form, plus a large source comment? But then. Users do not read comments. A warning on stderr? Would play havoc with many tests parsing stderr.
Do you have an idea how to proceed?
Thanks, Thomas
Regards, Roger
On 10/24/18 1:53 PM, Thomas Stüfe wrote:
Hi Roger,
On Wed, Oct 24, 2018 at 5:23 PM Roger Riggs <Roger.Riggs@oracle.com> wrote:
Hi Thomas,
Thanks for adding the test.
There's a feature of jtreg that was exposed a couple of month ago that can be used to deal with running the test too many times.
There can be multiple @test blocks with different @requires.
* @run main/othervm/timeout=300 Basic * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=fork Basic * * @test * @requires (os.family == "linux") * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=posix_spawn Basic
Does not seem to work, sorry:
24 /* 25 * @test 26 * @bug 4199068 4738465 4937983 4930681 4926230 4931433 4932663 4986689 27 * 5026830 5023243 5070673 4052517 4811767 6192449 6397034 6413313 28 * 6464154 6523983 6206031 4960438 6631352 6631966 6850957 6850958 29 * 4947220 7018606 7034570 4244896 5049299 8003488 8054494 8058464 30 * 8067796 31 * @key intermittent 32 * @summary Basic tests for Process and Environment Variable code 33 * @modules java.base/java.lang:open 34 * @run main/othervm/timeout=300 Basic 35 * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=fork Basic 36 * @test 37 * @requires (os.family == "liinux") 38 * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=posix_spawn Basic 39 * @author Martin Buchholz 40 */
If I have @requires (os.family == "linux") all three variants are executed - ok.
Then I wanted to have a negative test, so I misnamed linux as "liinux" and would have expected the first @test block to fire, the second to be ignored. But in fact:
thomas@mainframe /shared/projects/openjdk/jtreg $ ../jtreg-prebuilt/jtreg/bin/jtreg -jdk ../jdk-jdk/output-slowdebug/images/jdk/ ../jdk-jdk/source/test/jdk/java/lang/ProcessBuilder/Basic.java Test results: no tests selected
So to me it looks like as if the @requires tag is valid for both @test blocks, not only the one it appears in.
I'll run this through our build system too.
To fully test out using posix_spawn in many more different scenarios the build system should be augmented to be able to use it as the default for an entire build/CI/CD/test runs.
Sure! But I think this is out of scope for this patch.
Thanks, Thomas
Thanks, Roger
On 10/24/2018 10:35 AM, Thomas Stüfe wrote:
Hi all,
version 2 of Davids patch, with test changes:
http://cr.openjdk.java.net/~stuefe/webrevs/JDK-8212828-posix_spawn.patch/web...
Executed the test on my local Ubuntu box, works fine. Submit job runs.
About the test: I added a new line:
* @run main/othervm/timeout=300 Basic * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=fork Basic + * @run main/othervm/timeout=300 -Djdk.lang.Process.launchMechanism=posix_spawn Basic
The way I understand those tests work is that the first line causes the test to be run with the default launch mechanism, the second line with jdk.lang.Process.launchMechanism=fork explicitly. The third line, added by me, will now explicitly test with posix_spawn. I did not limit the platforms, since posix_spawn should now be available on all platforms, so it should work on all platforms. But then, on all other platforms it has already been the default.
This is still a bit iffy: On Windows, jdk.lang.Process.launchMechanism gets ignored, so we just executed the same test twice (now three times)? And now we execute the posix_spawn variant twice on all platforms where this is the default, so line 1 and 3 are the same? You see I am not a jtreg expert :) Can I specify a @run directive for only one platform? In that case I would limit the explicit posix_spawn test to Linux.
Note however that if we really abondon vfork in the future and make posix_spawn the default, this test becomes simpler too.
Best, Thomas
On Wed, Oct 24, 2018 at 3:36 PM David Lloyd <david.lloyd@redhat.com> wrote:
On Wed, Oct 24, 2018 at 1:05 AM Thomas Stüfe <thomas.stuefe@gmail.com> wrote:
Review:
- copyright dates need updating on the C-sources
- I opt for "#if defined(__solaris__) || defined(_ALLBSD_SOURCE) || defined(_AIX) || defined(__linux__)" to be removed completely from unix-specific source files. The ifdef now covers all OpenJDK Unix platforms.
Here's a version with these changes.
-- - DML
Hi Thomas, The webrev looks fine. Please remove the @author tag in the Linux (2nd) test block in Basic.java. Author tags are losing favor and there's no need to repeat it. I ran the change through our tests without errors. I'd give it another 24hours before pushing in case anyone else wants to review it. Thanks, Roger p.s. The issue to change the default is: https://bugs.openjdk.java.net/browse/JDK-8213192 On 10/30/2018 11:41 AM, Thomas Stüfe wrote:
Hi Roger,
On Tue, Oct 30, 2018 at 3:46 PM Roger Riggs <Roger.Riggs@oracle.com> wrote:
Hi Thomas,
On 10/29/2018 12:04 PM, Thomas Stüfe wrote:
Hi Roger,
On Thu, Oct 25, 2018 at 10:45 PM Roger Riggs <Roger.Riggs@oracle.com> wrote:
Hi Thomas,
In an abundance of caution, I was thinking that it would be a change right at the beginning of a new release so it gets the most exercise and users in early access, etc.
Okay, I understand that.
Over the next days I will run tests with posix_spawn enabled by default on our landscape. We have many different Linuxes on different architectures and different levels of glibc, so this is a reasonable test.
If I do not encounter red flags, I would consider the posix_spawn path tested well enough to ship it at least as a non-default, experimental option. Like David originally intended. Then, start of next release, we can make it default and see how that goes.
Does that sound ok to you?
That's fine, until it becomes the default, it will be opt in.
Is there an updated webrev with the corrected test executions?
Here you go.
http://cr.openjdk.java.net/~stuefe/webrevs/JDK-8212828-posix_spawn-on-linux/...
Tested locally, seems to work fine. I am re-running the submit tests too.
Hi Roger, thanks! I'll remove the author tag before pushing. I ran the change through jdk-submit too, without problems, though I assume they are a subset of the tests you ran. I was not yet able to run them through our tests, due to technical problems. Will do so in the next days. Thanks, Thomas On Wed, Oct 31, 2018 at 2:46 PM Roger Riggs <Roger.Riggs@oracle.com> wrote:
Hi Thomas,
The webrev looks fine.
Please remove the @author tag in the Linux (2nd) test block in Basic.java. Author tags are losing favor and there's no need to repeat it.
I ran the change through our tests without errors.
I'd give it another 24hours before pushing in case anyone else wants to review it.
Thanks, Roger
p.s. The issue to change the default is: https://bugs.openjdk.java.net/browse/JDK-8213192
On 10/30/2018 11:41 AM, Thomas Stüfe wrote:
Hi Roger,
On Tue, Oct 30, 2018 at 3:46 PM Roger Riggs <Roger.Riggs@oracle.com> wrote:
Hi Thomas,
On 10/29/2018 12:04 PM, Thomas Stüfe wrote:
Hi Roger,
On Thu, Oct 25, 2018 at 10:45 PM Roger Riggs <Roger.Riggs@oracle.com> wrote:
Hi Thomas,
In an abundance of caution, I was thinking that it would be a change right at the beginning of a new release so it gets the most exercise and users in early access, etc.
Okay, I understand that.
Over the next days I will run tests with posix_spawn enabled by default on our landscape. We have many different Linuxes on different architectures and different levels of glibc, so this is a reasonable test.
If I do not encounter red flags, I would consider the posix_spawn path tested well enough to ship it at least as a non-default, experimental option. Like David originally intended. Then, start of next release, we can make it default and see how that goes.
Does that sound ok to you?
That's fine, until it becomes the default, it will be opt in.
Is there an updated webrev with the corrected test executions?
Here you go.
http://cr.openjdk.java.net/~stuefe/webrevs/JDK-8212828-posix_spawn-on-linux/...
Tested locally, seems to work fine. I am re-running the submit tests too.
On 31/10/2018 13:45, Roger Riggs wrote:
Hi Thomas,
The webrev looks fine.
Please remove the @author tag in the Linux (2nd) test block in Basic.java. Author tags are losing favor and there's no need to repeat it. I agree, no need to repeat it here.
The webrev otherwise looks good to me too and I see Roger's issue to looking at switching the default. -Alan
Thank you all. I just pushed. I set David as contributor since he provided the original patch. Cheers, Thomas On Thu, Nov 1, 2018 at 9:11 AM Alan Bateman <Alan.Bateman@oracle.com> wrote:
On 31/10/2018 13:45, Roger Riggs wrote:
Hi Thomas,
The webrev looks fine.
Please remove the @author tag in the Linux (2nd) test block in Basic.java. Author tags are losing favor and there's no need to repeat it. I agree, no need to repeat it here.
The webrev otherwise looks good to me too and I see Roger's issue to looking at switching the default.
-Alan
On 24/10/2018 06:51, Thomas Stüfe wrote:
For the convenience of the reviewers, here webrev and bug:
https://bugs.openjdk.java.net/browse/JDK-8212828 http://cr.openjdk.java.net/~stuefe/webrevs/JDK-8212828-posix_spawn.patch/web...
submit tests are currently running.
Adding the posix_spawn to the Linux build make sense. As regards testing then we need to at least get ProcessBuilder/Basic.java running with -Djdk.lang.Process.launchMechanism=posix_spawn, maybe a wrapper that has @requires (os.family == "linux") in its description. -Alan
jdk-submit tests were clean. ..Thomas On Wed, Oct 24, 2018 at 7:51 AM Thomas Stüfe <thomas.stuefe@gmail.com> wrote:
For the convenience of the reviewers, here webrev and bug:
https://bugs.openjdk.java.net/browse/JDK-8212828 http://cr.openjdk.java.net/~stuefe/webrevs/JDK-8212828-posix_spawn.patch/web...
submit tests are currently running.
..Thomas
On Tue, Oct 23, 2018 at 9:27 PM David Lloyd <david.lloyd@redhat.com> wrote:
My plans to try jdk/submit have fallen through unfortunately, as I cannot seem to gain direct or indirect access to that system. So I guess I'm looking for any reviews on this patch now. Thomas has volunteered to sponsor.
Thanks.
On Tue, Oct 23, 2018 at 10:49 AM Thomas Stüfe <thomas.stuefe@gmail.com> wrote:
Here you go:
https://bugs.openjdk.java.net/browse/JDK-8212828
If noone else steps in, I can sponsor the change for you.
Cheers, Thomas On Tue, Oct 23, 2018 at 4:19 PM David Lloyd <david.lloyd@redhat.com> wrote:
Sure. I don't have any tracking information on the bugreport one I submitted, but if you can track that down and promote it, it would save you some typing. Otherwise whatever you can do would be great, thanks. On Tue, Oct 23, 2018 at 9:02 AM Thomas Stüfe <thomas.stuefe@gmail.com> wrote:
Oh, I can open a bug report on JBS for you. Should I?
(Now I understand the "reuse bug id").
On Tue, Oct 23, 2018 at 3:18 PM David Lloyd <david.lloyd@redhat.com> wrote:
I've submitted a bug report via bugreport.java.com. If/when it gets promoted to a proper JIRA with an issue number, I'll see if I can put the patch up on jdk/submit. On Thu, Oct 18, 2018 at 4:42 PM David Lloyd <david.lloyd@redhat.com> wrote: > > The issue 6850720 isn't _exactly_ to use POSIX_SPAWN for process > launching on Linux, but it's the closest I could find out of what are > really a surprisingly large number of issues that refer to posix_spawn > in one way or another relating to ProcessImpl. There's a different > issue to move from vfork to posix_spawn on Solaris, but I wasn't sure > if that one was quite right to hang this off of. Maybe it should be > yet another issue of its own. > > Anyway: this is a follow-up to the email thread entitled "Runtime.exec > : vfork() concerns and a fix proposal", where it was casually > mentioned that maybe posix_spawn could become an option on Linux, > whereafter it could be thoroughly tested by brave individuals and > eventually maybe become the default on that platform, obsoleting the > vfork support for good. > > The following patch does just that. I've tested it launching a > multi-process WildFly instance a bunch of times, in conjunction with > the conveniently existent "jdk.lang.Process.launchMechanism" property, > and nothing exploded so here it is. The usual deal with git patches: > apply directly through "patch -p1".
-- - DML
participants (6)
-
Alan Bateman
-
David Lloyd
-
Florian Weimer
-
Martin Buchholz
-
Roger Riggs
-
Thomas Stüfe