ProcessReaper: single thread reaper

Wed Mar 26 18:01:16 UTC 2014

Hi  Martin,

My solution to that is to export a public API that can be used
by other subsystems that fork processes.  Some peaceful cooperation is 
required.

Roger

On 3/26/14 10:54 AM, Martin Buchholz wrote:
> Peter and Roger, please stop going down this road until you have a 
> solution for my show-stopper problem, that in the below you are 
> reaping children that don't belong to java.lang.Process
>
> +    pid = waitpid(-1, &exitValue, 0);
>
>
>
> On Wed, Mar 26, 2014 at 10:49 AM, Peter Levart <peter.levart at gmail.com 
> <mailto:peter.levart at gmail.com>> wrote:
>
>     Hi Roger,
>
>     Your current implementation works for the demostrated use case in
>     UNIXProcess, where a single call-back is registered for a pid. If
>     you wanted to register another, the waitList and exitList might
>     have already removed the pid from them as a result of the 1st
>     call-back already been serviced before registering the 2nd one.
>
>     So you might want to keep the entry in exitList for a while, to
>     accommodate for 2nd and subsequent call-back registration if such
>     usage is to be needed.
>
>     This is what I tried to do in the following code:
>
>     http://cr.openjdk.java.net/~plevart/jdk9-dev/ProcessWaiter/webrev.01/
>     <http://cr.openjdk.java.net/%7Eplevart/jdk9-dev/ProcessWaiter/webrev.01/>
>
>     I leveraged ConcurrentHashMap and CompletableFuture for the task.
>     As an example of usage in UNIXProcess, I modified the linux
>     variant only. A single waiter thread is  dispatching exit statuses
>     from exited children to CompletableFutures. Various waitFor() etc.
>     methods in UNIXProcess are implemented in terms of
>     Future.get()/isDone() methods and clean-up is implemented by
>     dispatching asynchronously a cleanup task to the thread pool, so
>     that multiple threads can help draining buffers. Children that are
>     collected by waiter thread but nobody has asked to be notified
>     about get expunged from the map after a time-out.
>
>     I haven't done any checking for live pids as you did in
>     checkLiveness() because in current usage, a pid that registers a
>     Future entry in the map is taken from a successful forkAndExec()
>     call so it will definitely be collected by the waitLoop() when the
>     child process exits. The only possibility that this does not
>     happen is if the same pid was waited for somewhere else (not in
>     the waitLoop()). So if this is to be supported, a scan of live
>     pids will be necessary from time to time and not only when
>     there're no unwaited children (suppose there is a long-lived child
>     running indefinitely).
>
>     What do you think?
>
>     Regards, Peter
>
>
>
>     On 03/25/2014 10:47 PM, Roger Riggs wrote:
>>     Hi Martin,
>>
>>     Two cases, one current and one future.
>>
>>     In the current case, Process can spawn a process and the process
>>     can exit before Process can register the callback to get the
>>     exitValue.
>>     Peter pointed out this race in his comments.
>>     The exitValue needs to be saved (for some time yet to be determined)
>>     to allow the Process to register and get its callback with the
>>     exitValue.
>>
>>     The second case is future looking to the case where a child process
>>     not spawned by Process is exiting.  It might be due to a child being
>>     inherited from a dieing child or due to some different subprocess
>>     launcher.
>>
>>     When the JEP 102 process work happens, it should be possible to
>>     wait or get called back for those non-spawned processes.
>>
>>     The single (smaller) number of threads has been requested to handle
>>     processes that control a large number of children. It could be from
>>     a thread pool, either dedicated or common.  The common threadpool
>>     does not expect its tasks to hang for an indefinite period as might
>>     occur when waiting for along running process to exit.
>>
>>     Thanks, Roger
>>
>>
>>     On 3/25/14 2:39 PM, Martin Buchholz wrote:
>>>     What happens if the pid you get back is a subprocess not created by ProcessBuilder?
>>>     +    pid = waitpid(-1, &exitValue, 0);
>>>     ---
>>>
>>>     What is the advantage of having a single thread?  Are you just
>>>     trying to save threads?  The cost of the reaper threads is much
>>>     lower than usual because of the small stack size.
>>>
>>>
>>>
>>>     On Tue, Mar 25, 2014 at 7:05 AM, Peter Levart
>>>     <peter.levart at gmail.com <mailto:peter.levart at gmail.com>> wrote:
>>>
>>>         On 03/24/2014 10:05 PM, roger riggs wrote:
>>>>         Hi Rob, Martin, et.al <http://et.al>.
>>>>
>>>>         I've prototyped (for 9) a thread reaper[1] that uses a
>>>>         single thread to wait for exiting
>>>>         processes and calling back to the process with the exit
>>>>         status.
>>>>         the interesting part is getting the exit status back to the
>>>>         Process that needs it
>>>>         It needs more testing and hardening.
>>>>
>>>>         I had not considered using a signal handler for SIGCHLD but
>>>>         that's an option,
>>>>         though we need to be very careful about thread usage.
>>>>
>>>>         Roger
>>>>
>>>>         p.s. comments on the single thread reaper appreciated (in a
>>>>         new thread)
>>>>         [1] http://cr.openjdk.java.net/~rriggs/webrev-waitpid/
>>>>         <http://cr.openjdk.java.net/%7Erriggs/webrev-waitpid/>
>>>>
>>>>
>>>         Hi Roger,
>>>
>>>         I think I found a little race. Suppose reaper task is still
>>>         alive and that all consumers have been serviced
>>>         (consumerCount is 0). The reaper task waits for 500 millis
>>>         for next consumer to be registered, but times out. Before
>>>         calling "reaperThread.release()", new consumer comes around
>>>         and registers itself, also calling runReaper(), but since
>>>         reaperThread.release() has not yet been called by old reaper
>>>         task, new reaper task is not submitted to commonPool. The
>>>         old reaper task finishes, leaving one consumer on the
>>>         waitingList with no reaper task to service it. If no new
>>>         consumers get registered, the waiting consumer will never be
>>>         notified...
>>>
>>>         The simplest solution for this race, I think, would be to
>>>         have a dedicated long-running thread. It could be spawned
>>>         lazily, but then it would never finish.
>>>
>>>         Otherwise a nice solution with two lists (exitList/waitList)
>>>         and avoidance of race with reversed orders between
>>>         - consumer registration: register on waitList 1st then check
>>>         exitList, and
>>>         - exit event dispatch: register on exitList 1st then check
>>>         waitList
>>>
>>>         ...but the check you use with consumeCount local variable to
>>>         detect processes spawned by other means (for purposes of
>>>         logging only) has a race:
>>>
>>>         thread1: Suppose a new consumer is being registered with
>>>         onExitCall(...), is added on the waitList, but before
>>>         checking exitList().size() and iterating the exitList, ...
>>>         thread2: the reaper task detects that the very same process
>>>         has finished (gets it's pid from waitpid) and adds it's pid
>>>         to exitList. Then before iterating waitList, ...
>>>         thread1: iterates the exitList, finds a match and consumes
>>>         the pid, removing the matching entries from both exitList
>>>         and waitList. Then ...
>>>         thread2: the reaper task iterates waitList, doesn't find a
>>>         matching entry for exitPid, doesn't increment consumeCount
>>>         and voila: debug log("Unexpected process exit for pid:...").
>>>
>>>
>>>         That's enough races for today.
>>>
>>>         Regards, Peter
>>>
>>>>
>>>>
>>>>         On 3/24/2014 12:38 AM, Rob McKenna wrote:
>>>>>         Hi folks,
>>>>>
>>>>>         Roger Riggs (cc'd) may want to chip in here as he's
>>>>>         looking at the reaper thread arrangement in 9 at the moment.
>>>>>
>>>>>         On another note, I too support the merging of those files.
>>>>>         I didn't think there was much appetite for it at the time
>>>>>         so I must admit this fell down my todo list. Looking at
>>>>>         this bug did remind me that its something worth trying
>>>>>         though. As per Alan's mail, I'm going to tackle it
>>>>>         separately if you folks don't mind. I'll have a look at
>>>>>         Peter's changes (thanks Peter!) as soon as I can and see
>>>>>         about getting them in.
>>>>>
>>>>>             -Rob
>>>>>
>>>>>         On 23/03/14 22:30, Martin Buchholz wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>         On Sun, Mar 23, 2014 at 2:34 AM, Martin Buchholz
>>>>>>         <martinrb at google.com <mailto:martinrb at google.com>> wrote:
>>>>>>
>>>>>>
>>>>>>             We have also thought about whether having reaper
>>>>>>             threads is necessary.  The Unix rule is that child
>>>>>>             processes should be waited for, and some thread needs
>>>>>>             to do that.  There's no way to wait for a set of
>>>>>>             child pids, or to specify a "completion handler".
>>>>>>              Well, you might be able to get the newish waitid()
>>>>>>             to do what you want, but it looks like it's not
>>>>>>             sufficient when java is running inside a process that
>>>>>>             might do independent subprocess creation outside of
>>>>>>             the JVM.
>>>>>>
>>>>>>
>>>>>>         Actually, I take it back.  With sufficient work, it looks
>>>>>>         like you can get SIGCHLD to give you pid information in
>>>>>>         siginfo_t si_pid, and that can be used to trigger the
>>>>>>         reaping.  It looks like waitpid is "async-signal-safe",
>>>>>>         so we can call it from our signal handler.
>>>>>>
>>>>>>         While we're at it we can fix SIGCHLD handling to do
>>>>>>         signal chaining, as with other signals.
>>>>>
>>>>
>>>
>>>
>>
>
>