How important is jspawnhelper crash detection?

Tue Jul 15 15:04:54 UTC 2025

Hi Thomas,

Simpler is better on both sides of the protocol.
The version check will have happened before this part of the protocol so 
there's no confusion about matching expectations.
I agree that removing it is preferred.

Roger

On 7/15/25 10:44 AM, Thomas Stüfe wrote:
> Hi,
>
> I am currently working on removing (eventually) the vfork mode. Before 
> we can do this, we need a bit better error diagnostics. I do this by 
> gently improving the error handling throughout the code, so that we 
> can generate IOExceptions based on more exact knowledge.
>
> While working at this, I re-examined the "send aliveness ping from 
> jspawnhelper to parent" logic again I introduced back in 2019 as a 
> workaround for an obscure glibc bug with posix_spawn (see 
> https://bugs.openjdk.org/browse/JDK-8223777). I found that it was not 
> needed anymore since the glibc was fixed, so I started removing that 
> workaround (https://bugs.openjdk.org/browse/JDK-8362257).
>
> But then it occurred to me that an obscure part of the jspawnhelper 
> diagnostics introduced with 
> https://bugs.openjdk.org/browse/JDK-8226242 piggy-backs on the 
> aliveness ping for its 
> "detect-abnormal-process-termination-before-exec" capabilities. Works 
> like this:
>
> A jspawnhelper starts
> B jspawnhelper enters childProcess() and sends alivenes ping
> C jspawnhelper does a bunch of other things
> D jspawnhelper exec's
>
> In the parent, we count abnormal child terminations that occur before 
> the aliveness ping (B) as "spawn error" and print the signal number.  
> Without the aliveness ping we could not tell apart "jspawnhelper ends 
> abnormally due to a signal" from "jspawnhelper exec()'s the user 
> program successfully, user program runs and ends abnormally due to 
> signal".
>
> However, the question is how important or even useful this part really is:
> - for externally sent abnormal termination signals (SIGTERM etc), from 
> the caller's point of view it probably does not matter when it comes : 
> before or after exec().
> - it only matters for synchronous termination signals (crashes) we 
> ourselves cause; but here it only covers crashes in a rather small 
> piece of code, before the liveness ping (B). Basically, just the first 
> part of jspawnhelper main(). Any crashes after the liveness ping are 
> still unrecognised by ProcessBuilder.start, and are instead handled by 
> the caller calling Process.waitFor().
>
> There are two ways to deal with this:
>
> We could do without the crash protection in point (A), which would 
> allow us to remove the liveness ping. I would very much prefer that. 
> It would simplify the jspawnhelper protocol and make it more robust. 
> Because we now don't have any communication in case no error happens - 
> there would be only a single bit of information sent back via fail 
> pipe, and only in case of an error. Fail pipe would stay quiet in case 
> of successful exec(). Abnormal child process termination in the first 
> part of jspawnhelper would be covered by the same path that detects 
> abnormal child process termination in user programs - Process.waitFor().
>
> If we determine we really need this crash detection, we could at least 
> improve it: move the liveness ping to just-before the exec() call, so 
> that we cover all the area from A-D. Also, do it for all modes (FORK, 
> too), to simplify coding.
>
> Bottomline: remove an obscure and complex small mechanism that only 
> helps a bit with detecting program errors (sigsegv etc) inside the 
> first part of jspawnhelper main() ?
>
> Thanks, Thomas
>
>