ProcessReaper: single thread reaper
Peter Levart
peter.levart at gmail.com
Wed Mar 26 17:49:07 UTC 2014
Hi Roger,
Your current implementation works for the demostrated use case in
UNIXProcess, where a single call-back is registered for a pid. If you
wanted to register another, the waitList and exitList might have already
removed the pid from them as a result of the 1st call-back already been
serviced before registering the 2nd one.
So you might want to keep the entry in exitList for a while, to
accommodate for 2nd and subsequent call-back registration if such usage
is to be needed.
This is what I tried to do in the following code:
http://cr.openjdk.java.net/~plevart/jdk9-dev/ProcessWaiter/webrev.01/
I leveraged ConcurrentHashMap and CompletableFuture for the task. As an
example of usage in UNIXProcess, I modified the linux variant only. A
single waiter thread is dispatching exit statuses from exited children
to CompletableFutures. Various waitFor() etc. methods in UNIXProcess are
implemented in terms of Future.get()/isDone() methods and clean-up is
implemented by dispatching asynchronously a cleanup task to the thread
pool, so that multiple threads can help draining buffers. Children that
are collected by waiter thread but nobody has asked to be notified about
get expunged from the map after a time-out.
I haven't done any checking for live pids as you did in checkLiveness()
because in current usage, a pid that registers a Future entry in the map
is taken from a successful forkAndExec() call so it will definitely be
collected by the waitLoop() when the child process exits. The only
possibility that this does not happen is if the same pid was waited for
somewhere else (not in the waitLoop()). So if this is to be supported, a
scan of live pids will be necessary from time to time and not only when
there're no unwaited children (suppose there is a long-lived child
running indefinitely).
What do you think?
Regards, Peter
On 03/25/2014 10:47 PM, Roger Riggs wrote:
> Hi Martin,
>
> Two cases, one current and one future.
>
> In the current case, Process can spawn a process and the process
> can exit before Process can register the callback to get the exitValue.
> Peter pointed out this race in his comments.
> The exitValue needs to be saved (for some time yet to be determined)
> to allow the Process to register and get its callback with the exitValue.
>
> The second case is future looking to the case where a child process
> not spawned by Process is exiting. It might be due to a child being
> inherited from a dieing child or due to some different subprocess
> launcher.
>
> When the JEP 102 process work happens, it should be possible to
> wait or get called back for those non-spawned processes.
>
> The single (smaller) number of threads has been requested to handle
> processes that control a large number of children. It could be from
> a thread pool, either dedicated or common. The common threadpool
> does not expect its tasks to hang for an indefinite period as might
> occur when waiting for along running process to exit.
>
> Thanks, Roger
>
>
> On 3/25/14 2:39 PM, Martin Buchholz wrote:
>> What happens if the pid you get back is a subprocess not created by ProcessBuilder?
>> + pid = waitpid(-1, &exitValue, 0);
>> ---
>>
>> What is the advantage of having a single thread? Are you just trying
>> to save threads? The cost of the reaper threads is much lower than
>> usual because of the small stack size.
>>
>>
>>
>> On Tue, Mar 25, 2014 at 7:05 AM, Peter Levart <peter.levart at gmail.com
>> <mailto:peter.levart at gmail.com>> wrote:
>>
>> On 03/24/2014 10:05 PM, roger riggs wrote:
>>> Hi Rob, Martin, et.al <http://et.al>.
>>>
>>> I've prototyped (for 9) a thread reaper[1] that uses a single
>>> thread to wait for exiting
>>> processes and calling back to the process with the exit status.
>>> the interesting part is getting the exit status back to the
>>> Process that needs it
>>> It needs more testing and hardening.
>>>
>>> I had not considered using a signal handler for SIGCHLD but
>>> that's an option,
>>> though we need to be very careful about thread usage.
>>>
>>> Roger
>>>
>>> p.s. comments on the single thread reaper appreciated (in a new
>>> thread)
>>> [1] http://cr.openjdk.java.net/~rriggs/webrev-waitpid/
>>> <http://cr.openjdk.java.net/%7Erriggs/webrev-waitpid/>
>>>
>>>
>> Hi Roger,
>>
>> I think I found a little race. Suppose reaper task is still alive
>> and that all consumers have been serviced (consumerCount is 0).
>> The reaper task waits for 500 millis for next consumer to be
>> registered, but times out. Before calling
>> "reaperThread.release()", new consumer comes around and registers
>> itself, also calling runReaper(), but since
>> reaperThread.release() has not yet been called by old reaper
>> task, new reaper task is not submitted to commonPool. The old
>> reaper task finishes, leaving one consumer on the waitingList
>> with no reaper task to service it. If no new consumers get
>> registered, the waiting consumer will never be notified...
>>
>> The simplest solution for this race, I think, would be to have a
>> dedicated long-running thread. It could be spawned lazily, but
>> then it would never finish.
>>
>> Otherwise a nice solution with two lists (exitList/waitList) and
>> avoidance of race with reversed orders between
>> - consumer registration: register on waitList 1st then check
>> exitList, and
>> - exit event dispatch: register on exitList 1st then check waitList
>>
>> ...but the check you use with consumeCount local variable to
>> detect processes spawned by other means (for purposes of logging
>> only) has a race:
>>
>> thread1: Suppose a new consumer is being registered with
>> onExitCall(...), is added on the waitList, but before checking
>> exitList().size() and iterating the exitList, ...
>> thread2: the reaper task detects that the very same process has
>> finished (gets it's pid from waitpid) and adds it's pid to
>> exitList. Then before iterating waitList, ...
>> thread1: iterates the exitList, finds a match and consumes the
>> pid, removing the matching entries from both exitList and
>> waitList. Then ...
>> thread2: the reaper task iterates waitList, doesn't find a
>> matching entry for exitPid, doesn't increment consumeCount and
>> voila: debug log("Unexpected process exit for pid:...").
>>
>>
>> That's enough races for today.
>>
>> Regards, Peter
>>
>>>
>>>
>>> On 3/24/2014 12:38 AM, Rob McKenna wrote:
>>>> Hi folks,
>>>>
>>>> Roger Riggs (cc'd) may want to chip in here as he's looking at
>>>> the reaper thread arrangement in 9 at the moment.
>>>>
>>>> On another note, I too support the merging of those files. I
>>>> didn't think there was much appetite for it at the time so I
>>>> must admit this fell down my todo list. Looking at this bug did
>>>> remind me that its something worth trying though. As per Alan's
>>>> mail, I'm going to tackle it separately if you folks don't
>>>> mind. I'll have a look at Peter's changes (thanks Peter!) as
>>>> soon as I can and see about getting them in.
>>>>
>>>> -Rob
>>>>
>>>> On 23/03/14 22:30, Martin Buchholz wrote:
>>>>>
>>>>>
>>>>>
>>>>> On Sun, Mar 23, 2014 at 2:34 AM, Martin Buchholz
>>>>> <martinrb at google.com <mailto:martinrb at google.com>> wrote:
>>>>>
>>>>>
>>>>> We have also thought about whether having reaper threads
>>>>> is necessary. The Unix rule is that child processes
>>>>> should be waited for, and some thread needs to do that.
>>>>> There's no way to wait for a set of child pids, or to
>>>>> specify a "completion handler". Well, you might be able
>>>>> to get the newish waitid() to do what you want, but it
>>>>> looks like it's not sufficient when java is running inside
>>>>> a process that might do independent subprocess creation
>>>>> outside of the JVM.
>>>>>
>>>>>
>>>>> Actually, I take it back. With sufficient work, it looks like
>>>>> you can get SIGCHLD to give you pid information in siginfo_t
>>>>> si_pid, and that can be used to trigger the reaping. It looks
>>>>> like waitpid is "async-signal-safe", so we can call it from
>>>>> our signal handler.
>>>>>
>>>>> While we're at it we can fix SIGCHLD handling to do signal
>>>>> chaining, as with other signals.
>>>>
>>>
>>
>>
>
More information about the core-libs-dev
mailing list