RFR: 8307990: jspawnhelper must close its writing side of a pipe before reading from it

Volker Simonis simonis at openjdk.org
Mon May 15 16:20:48 UTC 2023


On Fri, 12 May 2023 15:24:19 GMT, Volker Simonis <simonis at openjdk.org> wrote:

> Since JDK13, executing commands in a sub-process defaults to the so called `POSIX_SPAWN` launching mechanism (i.e. `-Djdk.lang.Process.launchMechanism=POSIX_SPAWN`) on Linux. This works by using `posix_spawn(3)` to firstly start a tiny helper program called `jspawnhelper` in a subprocess. In a second step, `jspawnhelper` reads the command data from the parent Java process over a Unix pipe and finally executes (i.e. `execvp(3)`) the requested command.
> 
> In cases where the parent process terminates abnormally before `jspawnhelper` has read all the expected data from the pipe, `jspawnhelper` will block indefinitely on the reading end of the pipe. This is especially harmful if the parent process had open sockets, because in that case, the forked `jspawnhelper` process will inherit them and keep all the corresponding ports open effectively preventing other processes to bind to them. Notice that this is not an abstract scenario. We've observed this regularly in production with services which couldn't be restarted after a crash after migrating to JDK 17.
> 
> The fix of the issue is rather trivial. `jspawnhelper` has to close its writing end of the pipe which connects it with the parent Java process *before* starting to read from that pipe such that reads from the pipe can immediately return with EOF if the parent process terminates abnormally.
> 
> Also did some cleanup:
>  - in `jspawnhelper.c` correctly chek the result of `readFully()`
>  - in `ProcessImpl_md.c` use `restartableWrite()` instead of `write()` to write the command data from the parent process to `jspawnhelper`
>  - in `childproc.{c,h}` remove remaining traces of `MODE_CLONE` because that's long gone.

> > > Looks ok.
> > > Is it practical to write a test for this situation? Can I assume you've validated the improvement as a remedy for the observed hangs?
> > 
> > 
> > I thought about a test but couldn't come up with a practical way to write it. The JVM has to exit in the time frame after the `posix_spawn()` and before `jspawnhelper` has read its data from the parent. In production this usually happens due to memory constraints on the particular host which let the OOM-killer kill the JVM process because it is the biggest memory consumer.
> 
> A regression test would be good.
> 
> This can be very simply a runtime switch that kills the parent process at vulnerable times. See my example here:
> 
> [master...tstuefe:jdk:test-for-parent-premature-death](https://github.com/openjdk/jdk/compare/master...tstuefe:jdk:test-for-parent-premature-death)
> 
> Using that (KILLTEST=1), I was able to reproduce your problem with the hanging child. Using this to build a jtreg test is not hard.
> 
> Cheers, Thomas

Do you really think it makes sense to add such test code to the native runtime libraries?

I'm not sure. What do others think?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13956#issuecomment-1548163835


More information about the core-libs-dev mailing list