ProcessReaper: single thread reaper
Roger Riggs
Roger.Riggs at Oracle.com
Wed Mar 26 18:01:16 UTC 2014
Hi Martin,
My solution to that is to export a public API that can be used
by other subsystems that fork processes. Some peaceful cooperation is
required.
Roger
On 3/26/14 10:54 AM, Martin Buchholz wrote:
> Peter and Roger, please stop going down this road until you have a
> solution for my show-stopper problem, that in the below you are
> reaping children that don't belong to java.lang.Process
>
> + pid = waitpid(-1, &exitValue, 0);
>
>
>
> On Wed, Mar 26, 2014 at 10:49 AM, Peter Levart <peter.levart at gmail.com
> <mailto:peter.levart at gmail.com>> wrote:
>
> Hi Roger,
>
> Your current implementation works for the demostrated use case in
> UNIXProcess, where a single call-back is registered for a pid. If
> you wanted to register another, the waitList and exitList might
> have already removed the pid from them as a result of the 1st
> call-back already been serviced before registering the 2nd one.
>
> So you might want to keep the entry in exitList for a while, to
> accommodate for 2nd and subsequent call-back registration if such
> usage is to be needed.
>
> This is what I tried to do in the following code:
>
> http://cr.openjdk.java.net/~plevart/jdk9-dev/ProcessWaiter/webrev.01/
> <http://cr.openjdk.java.net/%7Eplevart/jdk9-dev/ProcessWaiter/webrev.01/>
>
> I leveraged ConcurrentHashMap and CompletableFuture for the task.
> As an example of usage in UNIXProcess, I modified the linux
> variant only. A single waiter thread is dispatching exit statuses
> from exited children to CompletableFutures. Various waitFor() etc.
> methods in UNIXProcess are implemented in terms of
> Future.get()/isDone() methods and clean-up is implemented by
> dispatching asynchronously a cleanup task to the thread pool, so
> that multiple threads can help draining buffers. Children that are
> collected by waiter thread but nobody has asked to be notified
> about get expunged from the map after a time-out.
>
> I haven't done any checking for live pids as you did in
> checkLiveness() because in current usage, a pid that registers a
> Future entry in the map is taken from a successful forkAndExec()
> call so it will definitely be collected by the waitLoop() when the
> child process exits. The only possibility that this does not
> happen is if the same pid was waited for somewhere else (not in
> the waitLoop()). So if this is to be supported, a scan of live
> pids will be necessary from time to time and not only when
> there're no unwaited children (suppose there is a long-lived child
> running indefinitely).
>
> What do you think?
>
> Regards, Peter
>
>
>
> On 03/25/2014 10:47 PM, Roger Riggs wrote:
>> Hi Martin,
>>
>> Two cases, one current and one future.
>>
>> In the current case, Process can spawn a process and the process
>> can exit before Process can register the callback to get the
>> exitValue.
>> Peter pointed out this race in his comments.
>> The exitValue needs to be saved (for some time yet to be determined)
>> to allow the Process to register and get its callback with the
>> exitValue.
>>
>> The second case is future looking to the case where a child process
>> not spawned by Process is exiting. It might be due to a child being
>> inherited from a dieing child or due to some different subprocess
>> launcher.
>>
>> When the JEP 102 process work happens, it should be possible to
>> wait or get called back for those non-spawned processes.
>>
>> The single (smaller) number of threads has been requested to handle
>> processes that control a large number of children. It could be from
>> a thread pool, either dedicated or common. The common threadpool
>> does not expect its tasks to hang for an indefinite period as might
>> occur when waiting for along running process to exit.
>>
>> Thanks, Roger
>>
>>
>> On 3/25/14 2:39 PM, Martin Buchholz wrote:
>>> What happens if the pid you get back is a subprocess not created by ProcessBuilder?
>>> + pid = waitpid(-1, &exitValue, 0);
>>> ---
>>>
>>> What is the advantage of having a single thread? Are you just
>>> trying to save threads? The cost of the reaper threads is much
>>> lower than usual because of the small stack size.
>>>
>>>
>>>
>>> On Tue, Mar 25, 2014 at 7:05 AM, Peter Levart
>>> <peter.levart at gmail.com <mailto:peter.levart at gmail.com>> wrote:
>>>
>>> On 03/24/2014 10:05 PM, roger riggs wrote:
>>>> Hi Rob, Martin, et.al <http://et.al>.
>>>>
>>>> I've prototyped (for 9) a thread reaper[1] that uses a
>>>> single thread to wait for exiting
>>>> processes and calling back to the process with the exit
>>>> status.
>>>> the interesting part is getting the exit status back to the
>>>> Process that needs it
>>>> It needs more testing and hardening.
>>>>
>>>> I had not considered using a signal handler for SIGCHLD but
>>>> that's an option,
>>>> though we need to be very careful about thread usage.
>>>>
>>>> Roger
>>>>
>>>> p.s. comments on the single thread reaper appreciated (in a
>>>> new thread)
>>>> [1] http://cr.openjdk.java.net/~rriggs/webrev-waitpid/
>>>> <http://cr.openjdk.java.net/%7Erriggs/webrev-waitpid/>
>>>>
>>>>
>>> Hi Roger,
>>>
>>> I think I found a little race. Suppose reaper task is still
>>> alive and that all consumers have been serviced
>>> (consumerCount is 0). The reaper task waits for 500 millis
>>> for next consumer to be registered, but times out. Before
>>> calling "reaperThread.release()", new consumer comes around
>>> and registers itself, also calling runReaper(), but since
>>> reaperThread.release() has not yet been called by old reaper
>>> task, new reaper task is not submitted to commonPool. The
>>> old reaper task finishes, leaving one consumer on the
>>> waitingList with no reaper task to service it. If no new
>>> consumers get registered, the waiting consumer will never be
>>> notified...
>>>
>>> The simplest solution for this race, I think, would be to
>>> have a dedicated long-running thread. It could be spawned
>>> lazily, but then it would never finish.
>>>
>>> Otherwise a nice solution with two lists (exitList/waitList)
>>> and avoidance of race with reversed orders between
>>> - consumer registration: register on waitList 1st then check
>>> exitList, and
>>> - exit event dispatch: register on exitList 1st then check
>>> waitList
>>>
>>> ...but the check you use with consumeCount local variable to
>>> detect processes spawned by other means (for purposes of
>>> logging only) has a race:
>>>
>>> thread1: Suppose a new consumer is being registered with
>>> onExitCall(...), is added on the waitList, but before
>>> checking exitList().size() and iterating the exitList, ...
>>> thread2: the reaper task detects that the very same process
>>> has finished (gets it's pid from waitpid) and adds it's pid
>>> to exitList. Then before iterating waitList, ...
>>> thread1: iterates the exitList, finds a match and consumes
>>> the pid, removing the matching entries from both exitList
>>> and waitList. Then ...
>>> thread2: the reaper task iterates waitList, doesn't find a
>>> matching entry for exitPid, doesn't increment consumeCount
>>> and voila: debug log("Unexpected process exit for pid:...").
>>>
>>>
>>> That's enough races for today.
>>>
>>> Regards, Peter
>>>
>>>>
>>>>
>>>> On 3/24/2014 12:38 AM, Rob McKenna wrote:
>>>>> Hi folks,
>>>>>
>>>>> Roger Riggs (cc'd) may want to chip in here as he's
>>>>> looking at the reaper thread arrangement in 9 at the moment.
>>>>>
>>>>> On another note, I too support the merging of those files.
>>>>> I didn't think there was much appetite for it at the time
>>>>> so I must admit this fell down my todo list. Looking at
>>>>> this bug did remind me that its something worth trying
>>>>> though. As per Alan's mail, I'm going to tackle it
>>>>> separately if you folks don't mind. I'll have a look at
>>>>> Peter's changes (thanks Peter!) as soon as I can and see
>>>>> about getting them in.
>>>>>
>>>>> -Rob
>>>>>
>>>>> On 23/03/14 22:30, Martin Buchholz wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Sun, Mar 23, 2014 at 2:34 AM, Martin Buchholz
>>>>>> <martinrb at google.com <mailto:martinrb at google.com>> wrote:
>>>>>>
>>>>>>
>>>>>> We have also thought about whether having reaper
>>>>>> threads is necessary. The Unix rule is that child
>>>>>> processes should be waited for, and some thread needs
>>>>>> to do that. There's no way to wait for a set of
>>>>>> child pids, or to specify a "completion handler".
>>>>>> Well, you might be able to get the newish waitid()
>>>>>> to do what you want, but it looks like it's not
>>>>>> sufficient when java is running inside a process that
>>>>>> might do independent subprocess creation outside of
>>>>>> the JVM.
>>>>>>
>>>>>>
>>>>>> Actually, I take it back. With sufficient work, it looks
>>>>>> like you can get SIGCHLD to give you pid information in
>>>>>> siginfo_t si_pid, and that can be used to trigger the
>>>>>> reaping. It looks like waitpid is "async-signal-safe",
>>>>>> so we can call it from our signal handler.
>>>>>>
>>>>>> While we're at it we can fix SIGCHLD handling to do
>>>>>> signal chaining, as with other signals.
>>>>>
>>>>
>>>
>>>
>>
>
>
More information about the core-libs-dev
mailing list