ProcessReaper: single thread reaper
Martin Buchholz
martinrb at google.com
Wed Mar 26 17:54:36 UTC 2014
Peter and Roger, please stop going down this road until you have a solution
for my show-stopper problem, that in the below you are reaping children
that don't belong to java.lang.Process
+ pid = waitpid(-1, &exitValue, 0);
On Wed, Mar 26, 2014 at 10:49 AM, Peter Levart <peter.levart at gmail.com>wrote:
> Hi Roger,
>
> Your current implementation works for the demostrated use case in
> UNIXProcess, where a single call-back is registered for a pid. If you
> wanted to register another, the waitList and exitList might have already
> removed the pid from them as a result of the 1st call-back already been
> serviced before registering the 2nd one.
>
> So you might want to keep the entry in exitList for a while, to
> accommodate for 2nd and subsequent call-back registration if such usage is
> to be needed.
>
> This is what I tried to do in the following code:
>
> http://cr.openjdk.java.net/~plevart/jdk9-dev/ProcessWaiter/webrev.01/
>
> I leveraged ConcurrentHashMap and CompletableFuture for the task. As an
> example of usage in UNIXProcess, I modified the linux variant only. A
> single waiter thread is dispatching exit statuses from exited children to
> CompletableFutures. Various waitFor() etc. methods in UNIXProcess are
> implemented in terms of Future.get()/isDone() methods and clean-up is
> implemented by dispatching asynchronously a cleanup task to the thread
> pool, so that multiple threads can help draining buffers. Children that are
> collected by waiter thread but nobody has asked to be notified about get
> expunged from the map after a time-out.
>
> I haven't done any checking for live pids as you did in checkLiveness()
> because in current usage, a pid that registers a Future entry in the map is
> taken from a successful forkAndExec() call so it will definitely be
> collected by the waitLoop() when the child process exits. The only
> possibility that this does not happen is if the same pid was waited for
> somewhere else (not in the waitLoop()). So if this is to be supported, a
> scan of live pids will be necessary from time to time and not only when
> there're no unwaited children (suppose there is a long-lived child running
> indefinitely).
>
> What do you think?
>
> Regards, Peter
>
>
>
> On 03/25/2014 10:47 PM, Roger Riggs wrote:
>
> Hi Martin,
>
> Two cases, one current and one future.
>
> In the current case, Process can spawn a process and the process
> can exit before Process can register the callback to get the exitValue.
> Peter pointed out this race in his comments.
> The exitValue needs to be saved (for some time yet to be determined)
> to allow the Process to register and get its callback with the exitValue.
>
> The second case is future looking to the case where a child process
> not spawned by Process is exiting. It might be due to a child being
> inherited from a dieing child or due to some different subprocess launcher.
>
> When the JEP 102 process work happens, it should be possible to
> wait or get called back for those non-spawned processes.
>
> The single (smaller) number of threads has been requested to handle
> processes that control a large number of children. It could be from
> a thread pool, either dedicated or common. The common threadpool
> does not expect its tasks to hang for an indefinite period as might
> occur when waiting for along running process to exit.
>
> Thanks, Roger
>
>
> On 3/25/14 2:39 PM, Martin Buchholz wrote:
>
> What happens if the pid you get back is a subprocess not created by ProcessBuilder?
>
> + pid = waitpid(-1, &exitValue, 0);
>
> ---
>
> What is the advantage of having a single thread? Are you just trying to
> save threads? The cost of the reaper threads is much lower than usual
> because of the small stack size.
>
>
>
> On Tue, Mar 25, 2014 at 7:05 AM, Peter Levart <peter.levart at gmail.com>wrote:
>
>> On 03/24/2014 10:05 PM, roger riggs wrote:
>>
>> Hi Rob, Martin, et.al.
>>
>> I've prototyped (for 9) a thread reaper[1] that uses a single thread to
>> wait for exiting
>> processes and calling back to the process with the exit status.
>> the interesting part is getting the exit status back to the Process that
>> needs it
>> It needs more testing and hardening.
>>
>> I had not considered using a signal handler for SIGCHLD but that's an
>> option,
>> though we need to be very careful about thread usage.
>>
>> Roger
>>
>> p.s. comments on the single thread reaper appreciated (in a new thread)
>> [1] http://cr.openjdk.java.net/~rriggs/webrev-waitpid/
>>
>>
>> Hi Roger,
>>
>> I think I found a little race. Suppose reaper task is still alive and
>> that all consumers have been serviced (consumerCount is 0). The reaper task
>> waits for 500 millis for next consumer to be registered, but times out.
>> Before calling "reaperThread.release()", new consumer comes around and
>> registers itself, also calling runReaper(), but since
>> reaperThread.release() has not yet been called by old reaper task, new
>> reaper task is not submitted to commonPool. The old reaper task finishes,
>> leaving one consumer on the waitingList with no reaper task to service it.
>> If no new consumers get registered, the waiting consumer will never be
>> notified...
>>
>> The simplest solution for this race, I think, would be to have a
>> dedicated long-running thread. It could be spawned lazily, but then it
>> would never finish.
>>
>> Otherwise a nice solution with two lists (exitList/waitList) and
>> avoidance of race with reversed orders between
>> - consumer registration: register on waitList 1st then check exitList,
>> and
>> - exit event dispatch: register on exitList 1st then check waitList
>>
>> ...but the check you use with consumeCount local variable to detect
>> processes spawned by other means (for purposes of logging only) has a race:
>>
>> thread1: Suppose a new consumer is being registered with onExitCall(...),
>> is added on the waitList, but before checking exitList().size() and
>> iterating the exitList, ...
>> thread2: the reaper task detects that the very same process has finished
>> (gets it's pid from waitpid) and adds it's pid to exitList. Then before
>> iterating waitList, ...
>> thread1: iterates the exitList, finds a match and consumes the pid,
>> removing the matching entries from both exitList and waitList. Then ...
>> thread2: the reaper task iterates waitList, doesn't find a matching entry
>> for exitPid, doesn't increment consumeCount and voila: debug
>> log("Unexpected process exit for pid:...").
>>
>>
>> That's enough races for today.
>>
>> Regards, Peter
>>
>>
>>
>> On 3/24/2014 12:38 AM, Rob McKenna wrote:
>>
>> Hi folks,
>>
>> Roger Riggs (cc'd) may want to chip in here as he's looking at the reaper
>> thread arrangement in 9 at the moment.
>>
>> On another note, I too support the merging of those files. I didn't think
>> there was much appetite for it at the time so I must admit this fell down
>> my todo list. Looking at this bug did remind me that its something worth
>> trying though. As per Alan's mail, I'm going to tackle it separately if you
>> folks don't mind. I'll have a look at Peter's changes (thanks Peter!) as
>> soon as I can and see about getting them in.
>>
>> -Rob
>>
>> On 23/03/14 22:30, Martin Buchholz wrote:
>>
>>
>>
>>
>> On Sun, Mar 23, 2014 at 2:34 AM, Martin Buchholz <martinrb at google.com>wrote:
>>
>>>
>>> We have also thought about whether having reaper threads is necessary.
>>> The Unix rule is that child processes should be waited for, and some
>>> thread needs to do that. There's no way to wait for a set of child pids,
>>> or to specify a "completion handler". Well, you might be able to get the
>>> newish waitid() to do what you want, but it looks like it's not sufficient
>>> when java is running inside a process that might do independent subprocess
>>> creation outside of the JVM.
>>>
>>>
>> Actually, I take it back. With sufficient work, it looks like you can
>> get SIGCHLD to give you pid information in siginfo_t si_pid, and that can
>> be used to trigger the reaping. It looks like waitpid is
>> "async-signal-safe", so we can call it from our signal handler.
>>
>> While we're at it we can fix SIGCHLD handling to do signal chaining, as
>> with other signals.
>>
>>
>>
>>
>>
>
>
>
More information about the core-libs-dev
mailing list