RFR: 8307990: jspawnhelper must close its writing side of a pipe before reading from it [v4]

Volker Simonis simonis at openjdk.org
Tue May 23 08:33:06 UTC 2023


On Mon, 22 May 2023 10:22:16 GMT, Volker Simonis <simonis at openjdk.org> wrote:

>> Since JDK13, executing commands in a sub-process defaults to the so called `POSIX_SPAWN` launching mechanism (i.e. `-Djdk.lang.Process.launchMechanism=POSIX_SPAWN`) on Linux. This works by using `posix_spawn(3)` to firstly start a tiny helper program called `jspawnhelper` in a subprocess. In a second step, `jspawnhelper` reads the command data from the parent Java process over a Unix pipe and finally executes (i.e. `execvp(3)`) the requested command.
>> 
>> In cases where the parent process terminates abnormally before `jspawnhelper` has read all the expected data from the pipe, `jspawnhelper` will block indefinitely on the reading end of the pipe. This is especially harmful if the parent process had open sockets, because in that case, the forked `jspawnhelper` process will inherit them and keep all the corresponding ports open effectively preventing other processes to bind to them. Notice that this is not an abstract scenario. We've observed this regularly in production with services which couldn't be restarted after a crash after migrating to JDK 17.
>> 
>> The fix of the issue is rather trivial. `jspawnhelper` has to close its writing end of the pipe which connects it with the parent Java process *before* starting to read from that pipe such that reads from the pipe can immediately return with EOF if the parent process terminates abnormally.
>> 
>> Also did some cleanup:
>>  - in `jspawnhelper.c` correctly chek the result of `readFully()`
>>  - in `ProcessImpl_md.c` use `restartableWrite()` instead of `write()` to write the command data from the parent process to `jspawnhelper`
>>  - in `childproc.{c,h}` remove remaining traces of `MODE_CLONE` because that's long gone.
>
> Volker Simonis has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits:
> 
>  - Merge branch 'master' into JDK-8307990
>  - Address comments from tstuefe and RogerRiggs
>  - REALLY adding the test :)
>  - Added test case
>  - 8307990: jspawnhelper must close its writing side of a pipe before reading from it

> Hey, chiming on this conversation, because I believe it will fix an issue we encountered since, upgrading from JDK 11 to 17. In our application we spawn a lot of processes over time and noticed that more and more JVM threads get stuck in the native `ProcessImpl.forkAndExec`.
> 
> Thread Dump Excerpt
> After some investigation we saw that we have `jspawnhelper` processes corresponding to the number of the stuck threads. `/proc/*/task/{id}/syscall` shows that all of these processes are currently reading from the pipe.
> 
> Our theory is that the write calls from the parent thread were interrupted resulting in an incomplete write.
> 
> https://github.com/openjdk/jdk/blob/69f508a2ac344eb61cef7be985348873b8265171/src/java.base/unix/native/libjava/ProcessImpl_md.c#L556-L559
> 
> 
> This would cause the `readFully` on the `jspawnhelper` side to wait for the remaining data.
> As the number of bytes written is not checked on the writer side it would continue and wait for the alive ping from the `jspawnhelper` which results in a deadlock.
> I would be interested in a reproducer e.g. using `gdb`, but to be honest I wouldn't know where to start.
> But regardless of whether this fixes our issue I think incomplete writes should be handled according to specification.
> I think the `restartableWrites` in this PR do that?
> We also mitigated by falling back to `VFORK` as was also mentioned in [this comment](https://github.com/openjdk/jdk/pull/13956#issuecomment-1547777042).
> 
> Maybe it would be worth it to mention our bug situation in the release notes as well? Thank you and please let me know if I got something wrong or there is anything I could provide to help!

@mlichtblau thanks a lot for your report. I saw some similar reports from our internal Maven builds (have to check though if they are exactly the same). Your analysis of the problem seems sound and I hope the restartable writes will indeed fix the problem.

I'll definitely add your example to the release notes and I'll try if I can come up with a test that reproduces your issue.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13956#issuecomment-1558794718


More information about the core-libs-dev mailing list