ProcessReaper: single thread reaper

Thu Apr 17 19:58:33 UTC 2014

On 04/17/2014 05:15 PM, David M. Lloyd wrote:
> On 04/17/2014 09:43 AM, Peter Levart wrote:
>> On 04/17/2014 09:07 AM, Martin Buchholz wrote:
>>> Many possible solutions eventually fail because whatever we do cannot
>>> take ownership of any global resource.  Calling waitid on all child
>>> processes, even with WNOWAIT and WNOHANG, changes global state (what if
>>> another subprocess library in the same process is trying to do the
>>> same thing?)
>>
>> waitid(P_ALL, ..., WEXITED | WNOWAIT | WNOHANG) does not reap the child.
>> It can be repeated multiple times. It can be used as a precursor to a
>> real waitid/waitpid that reaps a child, but only if it is "ours". The
>> problem with this approach is what to do in the following scenario: the
>> precursor waitid(P_ALL, ..., WEXITED | WNOWAIT | WNOHANG) returns a child
>> that is not "ours", so we don't reap it. The "owner" of that child (a JNI
>> library) does not reap its children promptly. We loop, repeatedly
>> getting the same child as a result, never seeing any other children that
>> have exited in the meantime...
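Below is a minimal C sketch of the peek-then-reap step described above. It assumes Linux/glibc and POSIX waitid(). The owned-PID array and the helpers peek_exited_child() and reap_one_if_owned() are hypothetical names used only for illustration. WNOWAIT leaves the child unreaped, so repeated peeks can keep returning the same exited child. If that child is not ours, a loop built on reap_one_if_owned() makes no progress, which is exactly the starvation problem described above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>    /* siginfo_t */
#include <stddef.h>
#include <string.h>    /* memset */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Peek at some exited child WITHOUT reaping it (WNOWAIT).
 * Returns its PID, 0 if no child has exited yet, or -1 with errno set
 * (ECHILD when there are no children at all). */
static pid_t peek_exited_child(void)
{
    siginfo_t info;
    memset(&info, 0, sizeof info); /* si_pid stays 0 if nothing is ready */
    if (waitid(P_ALL, 0, &info, WEXITED | WNOWAIT | WNOHANG) == -1)
        return -1;
    return info.si_pid;
}

/* One round of "peek, then reap only if it is ours". `owned` is a
 * hypothetical list of PIDs this library spawned. Returns the reaped PID,
 * or 0 if nothing was reaped. */
static pid_t reap_one_if_owned(const pid_t *owned, size_t n_owned, int *status)
{
    assert(owned != NULL || n_owned == 0);
    pid_t pid = peek_exited_child();
    if (pid <= 0)
        return 0;
    for (size_t i = 0; i < n_owned; i++)
        if (owned[i] == pid)
            return waitpid(pid, status, 0); /* already exited: won't block */
    /* Exited but not ours: leave it for its owner. The next peek may well
     * return this same child again, hiding any others behind it. */
    return 0;
}
```

In the test scenario, a child that is not in the owned list is peeked at twice and left alone. It is reaped only once its PID appears in the list.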
>
> Maybe it would be a good idea to create a process group for 
> JDK-managed subprocesses?  Otherwise, it seems that the only other 
> choice is to take over all child process management.

This was the first idea discussed in the thread, but it's not foolproof
either. The parent can set the process group of a child at its creation
(after fork() but before execv(), or as part of posix_spawn() itself),
but the child is free to change its group at any time after that. Such a
child "escapes" the group, and waiting on the group id:

waitpid(-pgid, ...)

...will never reap this child, even after it exits. The escape is never
reported to the parent. In practice this rarely happens. I know of two
kinds of processes that change their process group: daemons, and
processes spawned by a shell or another program that puts its children
into process groups to manage them. In both cases they are grandchildren
of the JVM, not children, so I think this is not a problem in practice.
But it's not a foolproof scheme.
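Below is a minimal C sketch of putting children into a process group and waiting on that group. It assumes POSIX fork(), setpgid() and waitpid(). The helper names are hypothetical, and _exit() stands in for execv(). Both parent and child call setpgid(), the usual idiom for keeping either side from racing ahead. With posix_spawn(), the equivalent is posix_spawnattr_setpgroup() together with the POSIX_SPAWN_SETPGROUP flag. A child that has moved to another group is simply invisible to waitpid(-pgid, ...).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child into process group `pgid` (0 = a new group led by the child).
 * _exit(exit_code) is a hypothetical stand-in for execv(). */
static pid_t spawn_in_group(pid_t pgid, int exit_code)
{
    assert(exit_code >= 0 && exit_code <= 255);
    pid_t pid = fork();
    if (pid == 0) {
        setpgid(0, pgid);   /* child side */
        _exit(exit_code);   /* stand-in for execv(...) */
    }
    if (pid > 0)
        setpgid(pid, pgid); /* parent side too, so neither can race ahead;
                               an error here (child already exec'd or gone)
                               is harmless because the child made the call */
    return pid;
}

/* Block until some child in group `pgid` exits, and reap it. */
static pid_t reap_from_group(pid_t pgid, int *status)
{
    pid_t r;
    do
        r = waitpid(-pgid, status, 0);
    while (r == -1 && errno == EINTR);
    return r;
}
```

Suppose two children are spawned with spawn_in_group(0, ...), so each leads its own group. The second child then plays the role of one that escaped the first child's group. waitpid(-a, ...) reaps only the first child. Once that child is gone, waitpid(-a, ...) fails with ECHILD, even though the second child may still be unreaped.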

Regards, Peter

>
>>
>> Regards, Peter
>>
>>>
>>>
>>> On Wed, Apr 16, 2014 at 3:34 PM, David M. Lloyd
>>> <david.lloyd at redhat.com> wrote:
>>>
>>>     On 04/16/2014 02:15 PM, Martin Buchholz wrote:
>>>
>>>         On Mon, Apr 14, 2014 at 1:57 PM, Peter Levart
>>>         <peter.levart at gmail.com> wrote:
>>>
>>>
>>>             There's already such a race in the current implementation of
>>>             Process.terminate(). Admittedly it concerns only a small
>>>             window between the process exiting and the reaper thread
>>>             managing to signal this state to other threads that wish to
>>>             terminate it at the same time, so a KILL/TERM signal could be
>>>             sent to an already deceased PID that has been reused. It
>>>             doesn't happen in practice, since PIDs are typically not
>>>             reused very soon.
>>>
>>>             But I agree, waiting between listing children and sending
>>>             them signals increases the chance of hitting a reused PID.
>>>
>>>
>>>         We do rely on the OS not reusing a PID _immediately_. We used
>>>         to have bugs in this area where Process.destroy would send a
>>>         signal to a pid whose process may have died arbitrarily long ago.
>>>
>>>
>>>     It seems to me that the key to avoiding this is to ensure that
>>>     waitpid() is not called until we know the PID is ready to be
>>>     cleaned.  As long as waitpid() has not yet been called, we can be
>>>     certain that the process still exists and is ours.  So the real
>>>     question is, how can we know a process is dead without actually
>>>     calling wait() (thereby making that knowledge useless)?
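Below is a sketch of the invariant described above. It assumes POSIX waitid() with WNOWAIT, and the helper names are hypothetical. A child that has exited but has not been reaped stays a zombie, and its PID cannot be reused until someone waits on it. A signal sent before our own waitpid() can therefore only reach our child. WNOWAIT is only a partial answer to the question, though: without WNOHANG the call blocks, so watching many children this way still needs a waiter per child.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* True if child `pid` has exited but has NOT been reaped yet. WNOWAIT
 * leaves it a zombie, so its PID stays reserved and cannot be reused. */
static bool exited_but_unreaped(pid_t pid)
{
    assert(pid > 0);
    siginfo_t info;
    memset(&info, 0, sizeof info);
    if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT | WNOHANG) == -1)
        return false;
    return info.si_pid == pid;
}

/* Signal a child that nobody has reaped yet: `pid` still names our child
 * (live or zombie), so the signal cannot hit a recycled PID. Assumes this
 * caller is the only code that reaps `pid`; with a separate reaper thread
 * the check and kill() must happen under a shared lock.
 * Returns 0 if signalled, 1 if the child had already exited, -1 on error. */
static int signal_unreaped_child(pid_t pid, int sig)
{
    assert(pid > 0);
    if (exited_but_unreaped(pid))
        return 1; /* already dead: nothing to signal */
    return kill(pid, sig) == 0 ? 0 : -1;
}
```

The caller performs the final waitpid() only after it no longer needs to signal the child. From that point on, the PID may be recycled.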
>>>
>>>     The aforementioned /proc trick seems like one good way to do so
>>>     without, say, spawning a plethora of threads (though at one
>>>     additional FD per thread, it is not free either). Unfortunately,
>>>     /proc is not ubiquitous, and even where it does exist, it's not
>>>     standardized (thus its behavior probably cannot be relied upon
>>>     absolutely).
>>>
>>>     A simple solution may be to use a synchronized set of child PIDs,
>>>     and set a SIGCHLD handler or waiter which, when triggered, locks
>>>     the set and performs a series of waitid() operations with WNOHANG,
>>>     processing all the process status updates.  The signalling APIs
>>>     would be required to synchronize on the set to determine if the
>>>     process in question is owned by the parent process. Previously
>>>     unknown processes can be "adopted" into this set by acquiring the
>>>     lock and calling waitpid() with WNOHANG on the PID in
>>>     question, and using the result to determine whether the PID should
>>>     be added to the set (or whether we just reaped it, or whether it
>>>     doesn't belong to us at all).
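Below is a sketch of the synchronized-set design described above. It assumes POSIX threads and uses waitpid(-1, ..., WNOHANG) in place of the equivalent waitid(P_ALL, ..., WEXITED | WNOHANG) loop. The fixed-size table, MAX_CHILDREN, and all function names are hypothetical. The reap loop is meant to run on a reaper thread woken by SIGCHLD, not inside the signal handler itself, because pthread_mutex_lock() is not async-signal-safe. Because it waits on -1, it reaps every exited child, owned or not.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_CHILDREN 64 /* hypothetical fixed capacity for the sketch */

static pthread_mutex_t set_lock = PTHREAD_MUTEX_INITIALIZER;
static pid_t owned[MAX_CHILDREN]; /* live children we manage */
static size_t n_owned;

/* Index of pid in the set, or -1. Caller holds set_lock. */
static int find_owned(pid_t pid)
{
    for (size_t i = 0; i < n_owned; i++)
        if (owned[i] == pid)
            return (int)i;
    return -1;
}

/* Drain all pending exit statuses. Run this on a reaper thread woken by
 * SIGCHLD (e.g. via sigwait()), not in a signal handler: pthread_mutex_lock()
 * is not async-signal-safe. waitpid(-1, ...) reaps EVERY exited child. */
static void reap_pending(void)
{
    pthread_mutex_lock(&set_lock);
    for (;;) {
        int status;
        pid_t pid = waitpid(-1, &status, WNOHANG);
        if (pid <= 0)
            break; /* 0: nothing ready; -1: no children (ECHILD) or EINTR */
        int idx = find_owned(pid);
        if (idx >= 0) /* swap-remove; a real API would publish `status` */
            owned[idx] = owned[--n_owned];
    }
    pthread_mutex_unlock(&set_lock);
}

/* Signal pid only while it is still in the set. Holding the lock keeps the
 * reaper from reaping it (and freeing the PID) between check and kill(). */
static int signal_owned(pid_t pid, int sig)
{
    int rc = -1, err = ESRCH;
    pthread_mutex_lock(&set_lock);
    if (find_owned(pid) >= 0) {
        rc = kill(pid, sig);
        err = errno;
    }
    pthread_mutex_unlock(&set_lock);
    if (rc == -1)
        errno = err;
    return rc;
}

enum adopt_result { ADOPTED, REAPED_ON_ADOPT, NOT_OUR_CHILD, ADOPT_FAILED };

/* Bring a previously unknown PID under management. */
static enum adopt_result adopt(pid_t pid)
{
    enum adopt_result r;
    int status;
    assert(pid > 0); /* waitpid() treats pid <= 0 as a group selector */
    pthread_mutex_lock(&set_lock);
    pid_t w = waitpid(pid, &status, WNOHANG);
    if (w == 0) { /* our child and still running */
        if (find_owned(pid) >= 0)
            r = ADOPTED; /* already managed */
        else if (n_owned < MAX_CHILDREN) {
            owned[n_owned++] = pid;
            r = ADOPTED;
        } else
            r = ADOPT_FAILED; /* sketch's fixed table is full */
    } else if (w == pid) {
        r = REAPED_ON_ADOPT; /* it had already exited: we just reaped it */
    } else {
        r = (errno == ECHILD) ? NOT_OUR_CHILD : ADOPT_FAILED;
    }
    pthread_mutex_unlock(&set_lock);
    return r;
}
```

adopt() maps the three outcomes listed above onto the result of waitpid(pid, ..., WNOHANG):

- 0 means a live child, which is added to the set.
- The PID itself means the child had already exited and has now been reaped.
- -1 with ECHILD means the process is not our child.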
>>>
>>>     As long as the process API is restricted to managing direct
>>>     children, this should work and be safe across all POSIX-ish
>>>     environments.  Note the potential downside that all children will
>>>     be automatically reaped, which is possibly somewhat hostile to
>>>     naïve JNI libraries or embedders. On platforms that support it,
>>>     however, selectively enabling the /proc trick can mitigate this
>>>     downside.
>>>
>>>     --
>>>     - DML
>>>
>>>
>>
>
>