ProcessReaper: single thread reaper

Peter Levart peter.levart at gmail.com
Thu Apr 17 14:43:52 UTC 2014


On 04/17/2014 09:07 AM, Martin Buchholz wrote:
> Many possible solutions eventually fail because whatever we do cannot 
> take ownership of any global resource.  Calling waitid on all child 
> processes, even with WNOWAIT and WNOHANG, changes global state (what if
> another subprocess library in the same process is trying to do the 
> same thing?)

waitid(P_ALL, ..., WNOWAIT | WNOHANG) does not reap the child, so it can
be repeated any number of times. It can be used as a precursor to the
real waitid/waitpid that reaps a child, but only if the child is "ours".
The problem with this approach is what to do in the following scenario:
the precursor waitid(P_ALL, ..., WNOWAIT | WNOHANG) returns a child that
is not "ours", so we don't reap it. The "owner" of that child (a JNI
library) does not reap its children promptly. We loop, repeatedly
getting the same child as a result and never seeing any other children
that have exited in the meantime...
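
For concreteness, the peek-then-reap precursor could be sketched in C
roughly as follows. This is only an illustration, assuming POSIX waitid
with WEXITED added to the flags (waitid needs at least one of WEXITED,
WSTOPPED or WCONTINUED). The names is_ours(), register_child() and
peek_then_reap(), and the single-PID "registry", are hypothetical
stand-ins for whatever bookkeeping a real reaper would keep.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical ownership registry (an assumption of this sketch):
 * a real reaper would keep a synchronized set of the PIDs it spawned. */
static pid_t our_child = -1;

static void register_child(pid_t pid) {
    assert(pid > 0);
    our_child = pid;
}

static bool is_ours(pid_t pid) { return pid == our_child; }

/* Result codes other than a reaped PID. */
enum { NONE_READY = 0, FOREIGN_CHILD = -1, WAIT_ERROR = -2 };

/* Peek at any exited child without reaping it; reap it only if it is
 * ours. Returns the reaped PID, or one of the codes above. */
static pid_t peek_then_reap(void) {
    siginfo_t info;

    /* Zero the struct so si_pid == 0 can signal "nothing exited yet". */
    memset(&info, 0, sizeof info);
    if (waitid(P_ALL, 0, &info, WEXITED | WNOWAIT | WNOHANG) == -1)
        return WAIT_ERROR;          /* e.g. ECHILD: no children at all */
    if (info.si_pid == 0)
        return NONE_READY;
    if (!is_ours(info.si_pid))
        return FOREIGN_CHILD;       /* left as a zombie for its owner */

    /* The child is ours: now reap exactly that PID, without WNOWAIT. */
    pid_t peeked = info.si_pid;
    memset(&info, 0, sizeof info);
    if (waitid(P_PID, (id_t)peeked, &info, WEXITED | WNOHANG) == -1)
        return WAIT_ERROR;
    return info.si_pid;             /* 0 if someone else reaped it first */
}
```

A caller that loops on peek_then_reap() while it returns FOREIGN_CHILD
may keep being handed the same unreaped foreign child, which is the
scenario described above.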

Regards, Peter

>
>
> On Wed, Apr 16, 2014 at 3:34 PM, David M. Lloyd 
> <david.lloyd at redhat.com> wrote:
>
>     On 04/16/2014 02:15 PM, Martin Buchholz wrote:
>
>         On Mon, Apr 14, 2014 at 1:57 PM, Peter Levart
>         <peter.levart at gmail.com> wrote:
>
>
>             There's already such a race in the current implementation of
>             Process.terminate(). It admittedly only concerns a small
>             window between the process exiting and the reaper thread
>             managing to signal this state to the other threads wishing
>             to terminate it at the same time, so it could happen that a
>             KILL/TERM signal is sent to an already deceased PID which
>             was re-used, but it doesn't happen in practice since PIDs
>             are typically not re-used very soon.
>
>             But I agree, waiting between listing children and sending them
>             signals increases the chance of hitting a reused PID.
>
>
>         We do rely on the OS not reusing a PID _immediately_.  We used
>         to have bugs in this area where Process.destroy would send a
>         signal to a pid that may have deceased arbitrarily long ago.
>
>
>     It seems to me that the key to avoiding this is to ensure that
>     waitpid() is not called until we know the PID is ready to be
>     cleaned.  As long as waitpid() has not yet been called, we can be
>     certain that the process still exists and is ours.  So the real
>     question is, how can we know a process is dead without actually
>     calling wait() (thereby making that knowledge useless)?
>
>     The aforementioned /proc trick seems like one good way to do so
>     without, say, spawning a plethora of threads (though at one
>     additional FD per thread, it is not free either).  Unfortunately
>     /proc is not ubiquitous, and even where it does exist, it's not
>     standardized (thus its behavior probably cannot be relied upon
>     absolutely).
>
>     A simple solution may be to use a synchronized set of child PIDs,
>     and set a SIGCHLD handler or waiter which, when triggered, locks
>     the set and performs a series of waitid() operations with WNOHANG,
>     processing all the process status updates.  The signalling APIs
>     would be required to synchronize on the set to determine if the
>     process in question is owned by the parent process.  Previously
>     unknown processes can be "adopted" into this area by acquiring the
>     synchronization and calling "waitpid()"+WNOHANG on the PID in
>     question, and using the result to determine whether the PID should
>     be added to the set (or whether we just reaped it - or whether it
>     doesn't belong to us at all).
>
>     As long as the process API is restricted to managing direct
>     children, this should work and be safe across all POSIX-ish
>     environments.  Note the potential downside that all children will
>     be automatically reaped, which is possibly somewhat hostile to
>     naïve JNI libraries or embedders. Selectively enabling the /proc
>     trick can mitigate this downside on platforms which support it
>     however.
>
>     -- 
>     - DML
>
>




More information about the core-libs-dev mailing list