RFR: 8307990: jspawnhelper must close its writing side of a pipe before reading from it

Thomas Stuefe stuefe at openjdk.org
Tue May 16 07:56:46 UTC 2023


On Mon, 15 May 2023 18:45:24 GMT, Roger Riggs <rriggs at openjdk.org> wrote:

>> Since JDK13, executing commands in a sub-process defaults to the so called `POSIX_SPAWN` launching mechanism (i.e. `-Djdk.lang.Process.launchMechanism=POSIX_SPAWN`) on Linux. This works by using `posix_spawn(3)` to firstly start a tiny helper program called `jspawnhelper` in a subprocess. In a second step, `jspawnhelper` reads the command data from the parent Java process over a Unix pipe and finally executes (i.e. `execvp(3)`) the requested command.
>> 
>> In cases where the parent process terminates abnormally before `jspawnhelper` has read all the expected data from the pipe, `jspawnhelper` will block indefinitely on the reading end of the pipe. This is especially harmful if the parent process had open sockets, because in that case, the forked `jspawnhelper` process will inherit them and keep all the corresponding ports open effectively preventing other processes to bind to them. Notice that this is not an abstract scenario. We've observed this regularly in production with services which couldn't be restarted after a crash after migrating to JDK 17.
>> 
>> The fix of the issue is rather trivial. `jspawnhelper` has to close its writing end of the pipe which connects it with the parent Java process *before* starting to read from that pipe such that reads from the pipe can immediately return with EOF if the parent process terminates abnormally.
>> 
>> Also did some cleanup:
>>  - in `jspawnhelper.c` correctly chek the result of `readFully()`
>>  - in `ProcessImpl_md.c` use `restartableWrite()` instead of `write()` to write the command data from the parent process to `jspawnhelper`
>>  - in `childproc.{c,h}` remove remaining traces of `MODE_CLONE` because that's long gone.
>
> I think writing a test is worth a little bit of time but getting a clean test run using the KILLTEST might be a bit tricky. The exit(-1) of the parent would be detected by jtreg as a failure. And verifying that the spawned child runs to completion might be difficult without writing a driver app to monitor the child; though it may be sufficient to know just that it does not hang and timeout. When the parent dies, the child will be re-parented.
> And it would only be run in debug builds.

@RogerRiggs @simonis 

Okay, I wrote a jtreg test case for the issue, just to see if I could. Rebased atop of Volkers change, you can just take the commit if you want:

https://github.com/tstuefe/jdk/tree/Added-TestCase-For-Volker
https://github.com/openjdk/jdk/commit/9119b442dac8dd95228d47d70871ee59862649ce

Test case fails immediately (so, no relying on timeouts) if Volkers patch is missing. Succeeds immediately with Volkers patch. I also tried commenting out the fixing close() call, test fails immediately.

Patch relies on jspawnhelper to write a small marker file on communication error with parent if KILLTEST is set. Only in debug code. I think this is very reliable. No second-guessing aliveness of the child and worrying about PID reuse. Re-parenting is no issue either. And I think such a file make be a good idea anyway to analyze handshake errors with jspawnhelper.

Tested on Linux x64.

@simonis, feel free to add me as contributor if you take this test. I spent way to much time on this :-)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13956#issuecomment-1549172824


More information about the core-libs-dev mailing list