ProcessReaper: single thread reaper
Roger Riggs
Roger.Riggs at Oracle.com
Tue Mar 25 16:25:12 UTC 2014
Hi Peter,
Thanks for the comments, I'll fix the races conditions, I intended
to go back to the dedicated thread but wanted to check for scalability
in the case of a large number of threads and non-trivial termination
overheads
where the streams had to be drained.
Thanks, Roger
On 3/25/14 7:05 AM, Peter Levart wrote:
> On 03/24/2014 10:05 PM, roger riggs wrote:
>> Hi Rob, Martin, et.al.
>>
>> I've prototyped (for 9) a thread reaper[1] that uses a single thread
>> to wait for exiting
>> processes and calling back to the process with the exit status.
>> the interesting part is getting the exit status back to the Process
>> that needs it
>> It needs more testing and hardening.
>>
>> I had not considered using a signal handler for SIGCHLD but that's an
>> option,
>> though we need to be very careful about thread usage.
>>
>> Roger
>>
>> p.s. comments on the single thread reaper appreciated (in a new thread)
>> [1] http://cr.openjdk.java.net/~rriggs/webrev-waitpid/
>>
>>
> Hi Roger,
>
> I think I found a little race. Suppose reaper task is still alive and
> that all consumers have been serviced (consumerCount is 0). The reaper
> task waits for 500 millis for next consumer to be registered, but
> times out. Before calling "reaperThread.release()", new consumer comes
> around and registers itself, also calling runReaper(), but since
> reaperThread.release() has not yet been called by old reaper task, new
> reaper task is not submitted to commonPool. The old reaper task
> finishes, leaving one consumer on the waitingList with no reaper task
> to service it. If no new consumers get registered, the waiting
> consumer will never be notified...
>
> The simplest solution for this race, I think, would be to have a
> dedicated long-running thread. It could be spawned lazily, but then it
> would never finish.
>
> Otherwise a nice solution with two lists (exitList/waitList) and
> avoidance of race with reversed orders between
> - consumer registration: register on waitList 1st then check exitList,
> and
> - exit event dispatch: register on exitList 1st then check waitList
>
> ...but the check you use with consumeCount local variable to detect
> processes spawned by other means (for purposes of logging only) has a
> race:
>
> thread1: Suppose a new consumer is being registered with
> onExitCall(...), is added on the waitList, but before checking
> exitList().size() and iterating the exitList, ...
> thread2: the reaper task detects that the very same process has
> finished (gets it's pid from waitpid) and adds it's pid to exitList.
> Then before iterating waitList, ...
> thread1: iterates the exitList, finds a match and consumes the pid,
> removing the matching entries from both exitList and waitList. Then ...
> thread2: the reaper task iterates waitList, doesn't find a matching
> entry for exitPid, doesn't increment consumeCount and voila: debug
> log("Unexpected process exit for pid:...").
>
>
> That's enough races for today.
>
> Regards, Peter
>
>>
>>
>> On 3/24/2014 12:38 AM, Rob McKenna wrote:
>>> Hi folks,
>>>
>>> Roger Riggs (cc'd) may want to chip in here as he's looking at the
>>> reaper thread arrangement in 9 at the moment.
>>>
>>> On another note, I too support the merging of those files. I didn't
>>> think there was much appetite for it at the time so I must admit
>>> this fell down my todo list. Looking at this bug did remind me that
>>> its something worth trying though. As per Alan's mail, I'm going to
>>> tackle it separately if you folks don't mind. I'll have a look at
>>> Peter's changes (thanks Peter!) as soon as I can and see about
>>> getting them in.
>>>
>>> -Rob
>>>
>>> On 23/03/14 22:30, Martin Buchholz wrote:
>>>>
>>>>
>>>>
>>>> On Sun, Mar 23, 2014 at 2:34 AM, Martin Buchholz
>>>> <martinrb at google.com <mailto:martinrb at google.com>> wrote:
>>>>
>>>>
>>>> We have also thought about whether having reaper threads is
>>>> necessary. The Unix rule is that child processes should be
>>>> waited for, and some thread needs to do that. There's no way
>>>> to wait for a set of child pids, or to specify a "completion
>>>> handler". Well, you might be able to get the newish waitid()
>>>> to do what you want, but it looks like it's not sufficient when
>>>> java is running inside a process that might do independent
>>>> subprocess creation outside of the JVM.
>>>>
>>>>
>>>> Actually, I take it back. With sufficient work, it looks like you
>>>> can get SIGCHLD to give you pid information in siginfo_t si_pid,
>>>> and that can be used to trigger the reaping. It looks like waitpid
>>>> is "async-signal-safe", so we can call it from our signal handler.
>>>>
>>>> While we're at it we can fix SIGCHLD handling to do signal
>>>> chaining, as with other signals.
>>>
>>
>
More information about the core-libs-dev
mailing list