ProcessReaper: single thread reaper

David M. Lloyd david.lloyd at redhat.com
Thu Apr 17 22:19:43 UTC 2014


On 04/17/2014 02:58 PM, Peter Levart wrote:
>
> On 04/17/2014 05:15 PM, David M. Lloyd wrote:
>> On 04/17/2014 09:43 AM, Peter Levart wrote:
>>> On 04/17/2014 09:07 AM, Martin Buchholz wrote:
>>>> Many possible solutions eventually fail because whatever we do cannot
>>>> take ownership of any global resource.  Calling waitid on all child
>>>> processes, even with WNOWAIT and WNOHANG, changes global state (what if
>>>> another subprocess library in the same process is trying to do the
>>>> same thing?)
>>>
>>> waitid(P_ALL, ..., WNOWAIT | WNOHANG) does not reap the child. It can be
>>> repeated multiple times. It can be used as a precursor to the real
>>> waitid/waitpid which reaps a child, but only if it is "ours". The
>>> problem with this approach is what to do in the following scenario: the
>>> precursor waitid(P_ALL, ..., WNOWAIT | WNOHANG) returns a child that is
>>> not "ours", so we don't reap it. The "owner" of that child (a JNI library)
>>> does not reap its children promptly. We loop, repeatedly
>>> getting the same child as a result, not seeing any other children that
>>> have exited in the meantime...
>>
>> Maybe it would be a good idea to create a process group for
>> JDK-managed subprocesses?  Otherwise, it seems that the only other
>> choice is to take over all child process management.
>
> This was the first idea discussed in the thread. But it's not fool-proof
> either. The parent can set the process group of a child at its creation
> (after fork() but before execv(), or as part of posix_spawn() itself),
> but the child is free to change its group at any time after that. Such a
> child "escapes" the group, and waiting on the group id:
>
> waitpid(-pgid, ...)
>
> ...will never reap this child, even after it exits. The escaping act is
> never reported to the parent. This usually does not happen in practice.
> I know of two kinds of processes that change their group id. Daemon
> processes are one, but they are grandchildren of the JVM, not children.
> The other kind are processes spawned by a shell or other program that
> groups its children into process groups to manage them, so they are JVM
> grandchildren too. So I think this is not a problem in practice, but it's
> not a fool-proof scheme.

Yeah definitely not.  I guess I am indeed going in circles at this 
point.  I wonder though if you'll indulge me a bit longer, and verify my 
collected understanding of the requirements of what is being requested here:

1) The process API must reap all child processes it produces that 
terminate during the lifetime of the JVM, leaving no zombies (including 
processes which have changed process group and/or session)
2) The process API must allow for child processes which are not managed 
by it (by not attempting to reap them except as allowed by #3)
3) The process API must somehow be able to "adopt" other child processes 
produced by means other than the Process API
4) The process reaper should keep resource consumption to a minimum 
(preferably no more than one thread, preferably no more than one extra 
FD per process)
5) The process API must provide an explicitly graceful terminate method 
in addition to the existing forcible and "unspecified" destroy methods
6) The process API must provide safeguards to prevent the wrong process 
from being signaled (i.e. it would be required to synchronize process 
reaping with termination/signaling, PID reuse probabilities 
notwithstanding)
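
As a rough illustration of #6 (not something proposed on the thread), the sketch below guards reaping and signaling with the same lock, so a signal is only ever sent while the child is still unreaped and its PID therefore cannot have been reused. The class name `ManagedChild` and its methods are hypothetical, and Python's `os` module stands in for the underlying waitpid()/kill() calls.

```python
import os
import signal
import threading


class ManagedChild:
    """Hypothetical wrapper: one lock guards both reaping and signaling."""

    def __init__(self, pid):
        self.pid = pid
        self.status = None              # raw wait status once the child is reaped
        self._lock = threading.Lock()

    def try_reap(self):
        """Non-blocking reap; returns True once the child has been reaped."""
        with self._lock:
            if self.status is None:
                reaped, status = os.waitpid(self.pid, os.WNOHANG)
                if reaped == self.pid:
                    self.status = status
            return self.status is not None

    def send_signal(self, signum):
        """Signal only while the child is unreaped, so the PID is still ours."""
        with self._lock:
            if self.status is not None:
                return False            # reaped: the PID may belong to someone else now
            os.kill(self.pid, signum)   # harmless no-op if the child is already a zombie
            return True
```

The reap is non-blocking on purpose: a blocking waitpid() under the lock would stall any concurrent send_signal() until the child exits.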

I've deliberately left off any mention of direct management of 
grandchild processes.  I believe it was pretty well established by Peter 
Levart that a child is solely the responsibility of its parent.  Martin 
Buchholz has doubts about it as well.  I think Roger Riggs had some 
unaddressed disagreement though.  For what it's worth, I agree with 
Peter on this point, because I think managing grand+children makes #6 
difficult or impossible to satisfy.  But the topic, AFAIK, remains open.

Also I haven't brought up anything from JEP 102 that I haven't already 
seen on this thread.

These requirements seem to exclude some techniques brought up on the 
thread previously:

- waitid(P_ALL,...)/waitpid(-1,...) (which violates #2, either directly, 
or by simply failing in the WNOWAIT|WNOHANG + unmanaged child process 
case previously outlined by Peter Levart).
- setpgid() to an all-child process group + waitid(P_PGID,...) (which 
allows badly behaved processes to cause us to violate #1, and also 
prevents automatic propagation of e.g. SIGTERM/SIGINT)
- setpgid() to a per-child process group (same problems, also no 
workable reaping solution was found that I saw)
- SIGCHLD + siginfo (very unlikely to work consistently or correctly)
- anything relying on WNOWAIT on Mac OS X and maybe others
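
To make the first excluded technique concrete, here is a minimal sketch of the "peek, then reap only if ours" pattern, assuming a platform where Python exposes `os.waitid` with `WNOWAIT` (Linux, for example). The helper names are hypothetical. The peek never consumes the child, so an exited child outside `managed_pids` can be reported again on every call, which is the starvation case Peter described.

```python
import os


def peek_exited_child():
    """Return the PID of some exited child without reaping it, or None."""
    try:
        info = os.waitid(os.P_ALL, 0, os.WEXITED | os.WNOWAIT | os.WNOHANG)
    except ChildProcessError:
        return None                     # no children at all
    return info.si_pid if info is not None else None


def reap_if_managed(managed_pids):
    """Peek at an exited child and reap it only if it is one of ours."""
    pid = peek_exited_child()
    if pid is None or pid not in managed_pids:
        # An unmanaged zombie is left alone, but the next peek may report
        # the very same PID and hide managed children that exited later.
        return None
    os.waitpid(pid, 0)                  # the real, consuming wait
    return pid
```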

I think everyone liked the idea of pluggable implementations.

I didn't see this mentioned on this thread, but it seems to me that we 
can have a simple 100% correct implementation on UNIX-likes by retaining 
a single thread per child process (today each one has a 32k stack, maybe 
it could be even smaller?).  Much like the default polling 
SelectorProvider for NIO, this could act as a simple fallback 
implementation that will always work and be correct.
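
The thread-per-child fallback can be sketched roughly as follows. This is only an illustration in Python rather than the JDK's native code, and `spawn_with_reaper` and its `on_exit` callback are hypothetical names. Each thread blocks in waitpid() on exactly one PID, so no unmanaged child is ever touched.

```python
import os
import threading


def spawn_with_reaper(argv, on_exit):
    """Start argv as a child and dedicate one thread to waiting for it."""
    pid = os.posix_spawnp(argv[0], argv, dict(os.environ))

    def wait_for_child():
        # Blocks on this PID only; other children are never waited on.
        _, status = os.waitpid(pid, 0)
        on_exit(pid, os.waitstatus_to_exitcode(status))

    reaper = threading.Thread(target=wait_for_child,
                              name=f"reaper-{pid}", daemon=True)
    reaper.start()
    return pid, reaper
```

The cost is one mostly idle thread per live child, which is the resource trade-off requirement #4 is trying to avoid.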

On /proc-enabled systems, using poll or similar on the corresponding /proc 
files seems like a possible alternative implementation. It would require 
one additional FD per child process and only one reaper thread, and it 
seems able to meet all 6 requirements above, though lack of 
standardization might add risk.

Using a single thread to iterate all child PIDs each time a SIGCHLD is 
received (with WNOHANG) would work while consuming no more than one 
thread and zero FDs total.  However, it scales poorly with very large 
numbers of child processes, and it might be considered a violation of #2 
to use SIGCHLD anyway.  Maybe this idea could be implemented as an 
alternative, contingent on -Xrs, or contingent on the previous handler 
being SIG_IGN, similarly to the suggestion by Martin Buchholz.
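
A minimal sketch of that SIGCHLD-driven sweep, assuming a platform with sigtimedwait() (Linux, for example). The function names are hypothetical. SIGCHLD is blocked and collected synchronously instead of through a handler, which is just one way to do it and still alters signal state, so the concern about #2 applies to this sketch as well.

```python
import os
import signal


def block_sigchld():
    """Block SIGCHLD in this thread so sigtimedwait() can collect it.

    Call this before creating children, otherwise an early exit may go unnoticed
    until the next timeout.
    """
    signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGCHLD})


def sweep(managed, exit_codes):
    """Poll every managed PID once; unmanaged children are never waited on."""
    for pid in list(managed):
        reaped, status = os.waitpid(pid, os.WNOHANG)
        if reaped == pid:
            managed.discard(pid)
            exit_codes[pid] = os.waitstatus_to_exitcode(status)


def reap_until_empty(managed, exit_codes, poll_timeout=0.2):
    """Single-thread loop: sleep until SIGCHLD (or a timeout), then sweep."""
    while managed:
        # SIGCHLD is not queued, so one wakeup may stand for several exits;
        # that is why every managed PID is checked on each pass.
        signal.sigtimedwait({signal.SIGCHLD}, poll_timeout)
        sweep(managed, exit_codes)
```

Each wakeup costs one waitpid() call per managed child, which is where the poor scaling with very large numbers of children comes from.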

I didn't see any other workable implementation alternatives.

As for API, I had suggested that "adopted" processes have a strict 
subset of functions compared to "managed" processes, and thus could be a 
supertype of Process.  Martin indicated that managing grandchildren 
should have a different API altogether.  Peter seems to lean towards 
exposing the OS capabilities a bit more directly, through child process 
ID enumeration (presumably including managed and unmanaged processes in 
the same bucket) and an API which operates on any child process by ID, 
regardless of its disposition (though I don't know of any portable API 
to enumerate child processes; on Linux I believe you have to use /proc). 
Peter also suggested that a process reaper be a primary internal API 
construct.

Did I miss anything?
-- 
- DML


