ProcessReaper: single thread reaper

Thu Apr 17 15:15:17 UTC 2014

On 04/17/2014 09:43 AM, Peter Levart wrote:
> On 04/17/2014 09:07 AM, Martin Buchholz wrote:
>> Many possible solutions eventually fail because whatever we do cannot
>> take ownership of any global resource.  Calling waitid on all child
>> processes, even with WNOWAIT and WNOHANG, changes global state (what if
>> another subprocess library in the same process is trying to do the
>> same thing?)
>
> waitid(P_ALL, ..., WNOWAIT | WNOHANG) does not reap the child. It can
> be repeated multiple times. It can be used as a precursor to the real
> waitid/waitpid which reaps a child, but only if it is "ours". The
> problem with this approach is what to do in the following scenario: the
> precursor waitid(P_ALL, ..., WNOWAIT | WNOHANG) returns a child that is
> not "ours", so we don't reap it. The "owner" of that child (a JNI
> library) does not reap its children promptly. We loop, repeatedly
> getting the same child as a result, not seeing any other children that
> have exited in the meantime...
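
To make the peek-then-reap idea concrete, here is a minimal sketch in Python (chosen only for brevity; it assumes Linux, where os.waitid and os.WNOWAIT are available). The helper names peek_exited_child and reap_if_ours are made up for illustration and are not part of any JDK code. Which exited child the kernel reports first is unspecified, which is exactly why a foreign zombie can hide ours.

```python
import os

def peek_exited_child():
    """Return the PID of some exited child without reaping it, or None."""
    try:
        # WNOWAIT leaves the child waitable, so this call can be repeated.
        info = os.waitid(os.P_ALL, 0, os.WEXITED | os.WNOWAIT | os.WNOHANG)
    except ChildProcessError:
        return None  # this process has no children at all
    return info.si_pid if info is not None else None

def reap_if_ours(ours):
    """Peek at an exited child and reap it only if its PID is in `ours`.

    Returns (pid, status) when a child was reaped, otherwise None.
    """
    pid = peek_exited_child()
    if pid is None or pid not in ours:
        # Nothing has exited, or the zombie belongs to someone else.
        # In the second case every later peek may return the same PID.
        return None
    _, status = os.waitpid(pid, 0)  # the real, reaping wait
    ours.discard(pid)
    return pid, status
```

Calling reap_if_ours with a set that does not contain the reported PID leaves the zombie in place, and the next peek may well report it again.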

Maybe it would be a good idea to create a process group for JDK-managed 
subprocesses?  Otherwise, it seems that the only other choice is to take 
over all child process management.
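
A rough sketch of the process group idea, in Python for brevity and assuming Linux. The names spawn_child and reap_group are invented for this example. The children here do not exec anything; a real implementation would have to set the group between fork and exec, and a child that later changes its own group would escape this scheme.

```python
import os

def spawn_child(exit_code, pgid=None):
    """Fork a child that exits immediately with exit_code.

    pgid=None keeps the parent's process group, pgid=0 starts a new group
    led by the child, and any other value joins that existing group.
    """
    pid = os.fork()
    if pid == 0:
        try:
            if pgid is not None:
                os.setpgid(0, pgid)
        finally:
            os._exit(exit_code)
    if pgid is not None:
        # Set the group from the parent side too, so the child is in the
        # group by the time this function returns.
        try:
            os.setpgid(pid, pgid)
        except OSError:
            pass  # assumed benign here: the child already did it itself
    return pid

def reap_group(pgid):
    """Reap every exited member of the group without blocking.

    Returns {pid: exit_status}. Children outside the group are not touched.
    """
    reaped = {}
    while True:
        try:
            info = os.waitid(os.P_PGID, pgid, os.WEXITED | os.WNOHANG)
        except ChildProcessError:
            break  # no children of ours remain in this group
        if info is None:
            break  # group members exist but none has exited yet
        reaped[info.si_pid] = info.si_status
    return reaped
```

The first managed child would be started with pgid=0 and later ones with the first child's PID as pgid. A child started by some other library stays in the parent's group and is never seen by reap_group.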

>
> Regards, Peter
>
>>
>>
>> On Wed, Apr 16, 2014 at 3:34 PM, David M. Lloyd
>> <david.lloyd at redhat.com> wrote:
>>
>>     On 04/16/2014 02:15 PM, Martin Buchholz wrote:
>>
>>         On Mon, Apr 14, 2014 at 1:57 PM, Peter Levart
>>         <peter.levart at gmail.com> wrote:
>>
>>
>>             There's already such a race in current implementation of
>>             Process.terminate(). It admittedly only concerns a small
>>             window between process exiting and the reaper thread
>>             managing to signal this state to the other threads wishing
>>             to terminate it at the same time, so it could happen that a
>>             KILL/TERM signal is sent to an already deceased PID which
>>             was re-used, but it doesn't happen in practice since PIDs
>>             are not re-used very soon typically.
>>
>>             But I agree, waiting between listing children and sending them
>>             signals increases the chance of hitting a reused PID.
>>
>>
>>         We do rely on the OS not reusing a PID _immediately_.  We used
>>         to have bugs in this area where Process.destroy would send a
>>         signal to a pid that may have deceased arbitrarily long ago.
>>
>>
>>     It seems to me that the key to avoiding this is to ensure that
>>     waitpid() is not called until we know the PID is ready to be
>>     cleaned.  As long as waitpid() has not yet been called, we can be
>>     certain that the process still exists and is ours.  So the real
>>     question is, how can we know a process is dead without actually
>>     calling wait() (thereby making that knowledge useless)?
>>
>>     The aforementioned /proc trick seems like one good way to do so
>>     without, say, spawning a plethora of threads (though at one
>>     additional FD per thread, it is not free either).  Unfortunately
>>     /proc is not ubiquitous, and even where it does exist, it's not
>>     standardized (thus its behavior probably cannot be relied upon
>>     absolutely).
>>
>>     A simple solution may be to use a synchronized set of child PIDs,
>>     and set a SIGCHLD handler or waiter which, when triggered, locks
>>     the set and performs a series of waitid() operations with WNOHANG,
>>     processing all the process status updates.  The signalling APIs
>>     would be required to synchronize on the set to determine if the
>>     process in question is owned by the parent process.  Previously
>>     unknown processes can be "adopted" into this area by acquiring the
>>     synchronization and calling waitpid() with WNOHANG on the PID in
>>     question, and using the result to determine whether the PID should
>>     be added to the set (or whether we just reaped it - or whether it
>>     doesn't belong to us at all).
>>
>>     As long as the process API is restricted to managing direct
>>     children, this should work and be safe across all POSIX-ish
>>     environments.  Note the potential downside that all children will
>>     be automatically reaped, which is possibly somewhat hostile to
>>     naïve JNI libraries or embedders. Selectively enabling the /proc
>>     trick can mitigate this downside on platforms which support it
>>     however.
>>
>>     --
>>     - DML
>>
>>
>
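
For reference, the synchronized-set scheme quoted above could look roughly like the following Python sketch. ChildRegistry and its methods are hypothetical names. As a deliberate variation, reap() polls each registered PID with waitpid() and WNOHANG rather than waiting on all children, so it never reaps a child it does not own. How reap() gets triggered (SIGCHLD handler, waiter thread) is left out.

```python
import os
import signal
import threading
import time

class ChildRegistry:
    """Tracks PIDs we own; reaping and signalling share one lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._live = set()   # forked but not yet reaped
        self._exited = {}    # pid -> raw wait status

    def spawn(self, delay, exit_code):
        """Fork a child that sleeps for delay seconds, then exits."""
        with self._lock:  # register the PID before anyone can reap it
            pid = os.fork()
            if pid == 0:
                try:
                    time.sleep(delay)
                finally:
                    os._exit(exit_code)
            self._live.add(pid)
            return pid

    def reap(self):
        """Collect the status of every exited child we own, non-blocking."""
        with self._lock:
            for pid in list(self._live):
                try:
                    done, status = os.waitpid(pid, os.WNOHANG)
                except ChildProcessError:
                    self._live.discard(pid)  # reaped behind our back
                    continue
                if done == pid:
                    self._live.discard(pid)
                    self._exited[pid] = status

    def kill(self, pid, sig=signal.SIGTERM):
        """Signal pid only while it is unreaped; return True if sent."""
        with self._lock:
            if pid not in self._live:
                return False
            # An unreaped PID cannot be recycled, so this cannot hit a
            # stranger. Signalling a zombie is harmless.
            os.kill(pid, sig)
            return True

    def status(self, pid):
        """Return the raw wait status of a reaped child, or None."""
        with self._lock:
            return self._exited.get(pid)
```

Because kill() and reap() take the same lock, a signal is only ever sent to a PID that has not been reaped yet, which closes the PID reuse window discussed earlier in the thread.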

-- 
- DML