ProcessReaper: single thread reaper

roger riggs roger.riggs at oracle.com
Fri Apr 18 13:29:04 UTC 2014


Hi David,

Thanks for collecting the threads...  I've been a bit occupied with 
another task.

On 04/17/2014 02:58 PM, David wrote:
> ... I guess I am indeed going in circles at this point.  I wonder 
> though if you'll indulge me a bit longer, and verify my collected 
> understanding of the requirements of what is being requested here:
>
> 1) The process API must reap all child processes it produces that 
> terminate during the lifetime of the JVM, leaving no zombies 
> (including processes which have changed process group and/or session)
> 2) The process API must allow for child processes which are not 
> managed by it (by not attempting to reap them except as allowed by #3)
> 3) The process API must somehow be able to "adopt" other child 
> processes produced by means other than the Process API
> 4) The process reaper should keep resource consumption to a minimum 
> (preferably no more than one thread, preferably no more than one extra 
> FD per process)
> 5) The process API must provide an explicitly graceful terminate 
> method in addition to the existing forcible and "unspecified" destroy 
> methods
> 6) The process API must provide safeguards to prevent the wrong 
> process from being signaled (i.e. would be required to synchronize 
> process reaping with termination/signaling (PID reuse probabilities 
> notwithstanding))
>
> I've deliberately left off any mention of direct management of 
> grandchild processes.  I believe it was pretty well established by 
> Peter Levart that a child is solely the responsibility of its parent.  
> Martin Buchholz has doubts about it as well.  I think Roger Riggs had 
> some unaddressed disagreement though.  For what it's worth, I agree 
> with Peter on this point, because I think managing grand+children 
> makes #6 difficult or impossible to satisfy.  But the topic, AFAIK, 
> remains open.
Correct. In the cleanup case we have seen zombies left around, and we 
will need to investigate those cases.
If the child process does not clean up after itself, someone still has to.
>
> Also I haven't brought up anything from JEP 102 that I haven't already 
> seen on this thread.
>
> These requirements seem to exclude some techniques brought up on the 
> thread previously:
>
> - waitid(P_ALL,...)/waitpid(-1,...) (which violates #2, either 
> directly, or by simply failing in the WNOWAIT|WNOHANG + unmanaged 
> child process case previously outlined by Peter Levart).
> - setpgid() to an all-child process group + waitid(P_PID,...) (which 
> allows badly behaved processes to cause us to violate #1, and also 
> prevents automatic propagation of e.g. SIGTERM/SIGINT)
> - setpgid() to a per-child process group (same problems, also no 
> workable reaping solution was found that I saw)
> - SIGCHLD + siginfo (very unlikely to work consistently or correctly)
> - anything relying on WNOWAIT on Mac OS X and maybe others
>
> I think everyone liked the idea of pluggable implementations.
>
> I didn't see this mentioned on this thread, but it seems to me that we 
> can have a simple 100% correct implementation on UNIX-likes by 
> retaining a single thread per child process (today each one has a 32k 
> stack, maybe it could be even smaller?).  Much like the default 
> polling SelectorProvider for NIO, this could act as a simple fallback 
> implementation that will always work and be correct.
Yes, it seems clear that this is needed for 100% backward compatibility 
(and it is probably the default, at least to start).
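
To make the thread-per-child fallback concrete, here is a minimal sketch in 
Python rather than in the JDK's own code.  It assumes nothing about the 
actual implementation; spawn_with_reaper, on_exit and demo are hypothetical 
names used only for illustration.  Each child gets one thread that blocks 
waiting on that specific child, so no other child of the process is ever 
reaped by it.

```python
import subprocess
import sys
import threading


def spawn_with_reaper(argv, on_exit):
    """Start argv and dedicate one daemon thread to waiting for it.

    spawn_with_reaper and on_exit are hypothetical names for this sketch.
    """
    proc = subprocess.Popen(argv)

    def reap():
        # Blocks until this specific child exits; wait() reaps it,
        # so no zombie remains for this PID.
        code = proc.wait()
        on_exit(proc.pid, code)

    thread = threading.Thread(target=reap, name=f"reaper-{proc.pid}", daemon=True)
    thread.start()
    return proc, thread


def demo():
    """Run one short-lived child and return its PID and what the reaper saw."""
    seen = {}
    proc, thread = spawn_with_reaper(
        [sys.executable, "-c", "import sys; sys.exit(3)"],
        lambda pid, code: seen.update({pid: code}),
    )
    thread.join(timeout=10)
    return proc.pid, seen
```

The cost is the one called out in requirement #4: one thread per live child.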
>
> On proc-enabled systems, using poll or similar on the corresponding 
> proc files seems like a possible alternative implementation requiring 
> one additional FD per child process and only one reaper thread, since 
> it seems possible to meet all 6 above requirements, though lack of 
> standardization might add risk.
>
> Using a single thread to iterate all child PIDs each time a SIGCHLD is 
> received (with WNOHANG) would work without consuming more than one 
> thread and zero FDs total, however it scales poorly with very large 
> numbers of child processes, and it might be considered a violation of 
> #2 to use SIGCHLD anyway.  Maybe these ideas could be implemented as 
> an alternative, contingent on -Xrs, or contingent on the previous 
> handler being SIG_IGN similarly to the suggestion by Martin Buchholz.
>
> I didn't see any other workable implementation alternatives.
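
The single-thread sweep described above can be sketched as follows.  This is 
an illustration in Python under stated assumptions, not a proposed 
implementation: reap_managed and the managed set are hypothetical, and the 
SIGCHLD hookup that would trigger each sweep is left out.  Only PIDs the 
reaper was told about are polled, so unmanaged children stay untouched (#2), 
at the cost of one waitpid call per managed child on every sweep.

```python
import os


def reap_managed(managed):
    """Poll every tracked PID once without blocking.

    managed is a set of child PIDs owned by the reaper (hypothetical name).
    Returns {pid: raw_wait_status} for children that exited during this sweep
    and removes them from managed.
    """
    exited = {}
    for pid in list(managed):
        try:
            # WNOHANG: return (0, 0) immediately if the child is still running.
            done, status = os.waitpid(pid, os.WNOHANG)
        except ChildProcessError:
            # Not our child any more, e.g. already reaped elsewhere.
            managed.discard(pid)
            continue
        if done == pid:
            managed.discard(pid)
            exited[pid] = status
    return exited
```

The raw status can be decoded with os.WIFEXITED / os.WEXITSTATUS.  Because 
the loop never calls waitpid(-1, ...), a child that is not in the set can 
still be waited for by whoever created it.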
>
> As for API, I had suggested that "adopted" processes have a strict 
> subset of functions compared to "managed" processes, and thus could be 
> a supertype of Process.  Martin indicated that managing grandchildren 
> should have a different API altogether.  Peter seems to lean towards 
> exposing the OS capabilities a bit more directly, through child 
> process ID enumeration (presumably including managed and unmanaged 
> processes in the same bucket) and an API which operates on any child 
> process by ID, regardless of its disposition (though I don't know of 
> any portable API to enumerate child processes; on Linux I believe you 
> have to use /proc).  Peter also suggested that a process reaper be a 
> primary internal API construct.
I have been working on the premise of a separate API with fewer 
functions.  The primary function that is difficult, and may need to be 
omitted, is getting the exit status of an unmanaged subprocess.
I suspect the API may be limited to knowing whether the process is alive 
and being able to terminate it.
There may need to be a configuration option, primarily related to a 
reaper that does manage every child.
I have prototyped an implementation that works across the 4/5 main OSs 
for iterating over processes.
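
As a rough illustration of how small such a reduced API could be, here is a 
sketch in Python.  It is not the prototype mentioned above; UnmanagedProcess 
and its methods are hypothetical names, and the sketch assumes a POSIX 
system.  It offers only a liveness check and a terminate request, with no 
exit status, and it does nothing to guard against PID reuse (#6).

```python
import os
import signal


class UnmanagedProcess:
    """Hypothetical handle for a process we did not start and will not reap."""

    def __init__(self, pid):
        self.pid = pid

    def is_alive(self):
        """Return True if a process with this PID currently exists."""
        try:
            # Signal 0 performs the existence/permission check only.
            os.kill(self.pid, 0)
        except ProcessLookupError:
            return False
        except PermissionError:
            # The process exists but belongs to someone else.
            return True
        return True

    def terminate(self):
        """Request termination with SIGTERM; return False if the PID is gone."""
        try:
            os.kill(self.pid, signal.SIGTERM)
        except ProcessLookupError:
            return False
        return True
```

Note that a zombie that has not yet been reaped still counts as alive under 
this check, which is one reason the exit status is left to whoever owns the 
process.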

Thanks for the good summary.

Roger

>
> Did I miss anything?




More information about the core-libs-dev mailing list