ProcessReaper: single thread reaper

Peter Levart peter.levart at gmail.com
Tue Mar 25 14:05:16 UTC 2014


On 03/24/2014 10:05 PM, roger riggs wrote:
> Hi Rob, Martin, et.al.
>
> I've prototyped (for 9) a thread reaper[1] that uses a single thread 
> to wait for exiting
> processes and calling back to the process with the exit status.
> the interesting part is getting the exit status back to the Process 
> that needs it
> It needs more testing and hardening.
>
> I had not considered using a signal handler for SIGCHLD but that's an 
> option,
> though we need to be very careful about thread usage.
>
> Roger
>
> p.s. comments on the single thread reaper appreciated (in a new thread)
> [1] http://cr.openjdk.java.net/~rriggs/webrev-waitpid/
>
>
Hi Roger,

I think I found a little race. Suppose reaper task is still alive and 
that all consumers have been serviced (consumerCount is 0). The reaper 
task waits for 500 millis for next consumer to be registered, but times 
out. Before calling "reaperThread.release()", new consumer comes around 
and registers itself, also calling runReaper(), but since 
reaperThread.release() has not yet been called by old reaper task, new 
reaper task is not submitted to commonPool. The old reaper task 
finishes, leaving one consumer on the waitingList with no reaper task to 
service it. If no new consumers get registered, the waiting consumer 
will never be notified...

The simplest solution for this race, I think, would be to have a 
dedicated long-running thread. It could be spawned lazily, but then it 
would never finish.

Otherwise a nice solution with two lists (exitList/waitList) and 
avoidance of race with reversed orders between
- consumer registration: register on waitList 1st then check exitList, and
- exit event dispatch: register on exitList 1st then check waitList

...but the check you use with consumeCount local variable to detect 
processes spawned by other means (for purposes of logging only) has a race:

thread1: Suppose a new consumer is being registered with 
onExitCall(...), is added on the waitList, but before checking 
exitList().size() and iterating the exitList, ...
thread2: the reaper task detects that the very same process has finished 
(gets it's pid from waitpid) and adds it's pid to exitList. Then before 
iterating waitList, ...
thread1: iterates the exitList, finds a match and consumes the pid, 
removing the matching entries from both exitList and waitList. Then ...
thread2: the reaper task iterates waitList, doesn't find a matching 
entry for exitPid, doesn't increment consumeCount and voila: debug 
log("Unexpected process exit for pid:...").


That's enough races for today.

Regards, Peter

>
>
> On 3/24/2014 12:38 AM, Rob McKenna wrote:
>> Hi folks,
>>
>> Roger Riggs (cc'd) may want to chip in here as he's looking at the 
>> reaper thread arrangement in 9 at the moment.
>>
>> On another note, I too support the merging of those files. I didn't 
>> think there was much appetite for it at the time so I must admit this 
>> fell down my todo list. Looking at this bug did remind me that its 
>> something worth trying though. As per Alan's mail, I'm going to 
>> tackle it separately if you folks don't mind. I'll have a look at 
>> Peter's changes (thanks Peter!) as soon as I can and see about 
>> getting them in.
>>
>>     -Rob
>>
>> On 23/03/14 22:30, Martin Buchholz wrote:
>>>
>>>
>>>
>>> On Sun, Mar 23, 2014 at 2:34 AM, Martin Buchholz 
>>> <martinrb at google.com <mailto:martinrb at google.com>> wrote:
>>>
>>>
>>>     We have also thought about whether having reaper threads is
>>>     necessary.  The Unix rule is that child processes should be
>>>     waited for, and some thread needs to do that.  There's no way to
>>>     wait for a set of child pids, or to specify a "completion
>>>     handler".  Well, you might be able to get the newish waitid() to
>>>     do what you want, but it looks like it's not sufficient when
>>>     java is running inside a process that might do independent
>>>     subprocess creation outside of the JVM.
>>>
>>>
>>> Actually, I take it back.  With sufficient work, it looks like you 
>>> can get SIGCHLD to give you pid information in siginfo_t si_pid, and 
>>> that can be used to trigger the reaping.  It looks like waitpid is 
>>> "async-signal-safe", so we can call it from our signal handler.
>>>
>>> While we're at it we can fix SIGCHLD handling to do signal chaining, 
>>> as with other signals.
>>
>




More information about the core-libs-dev mailing list