ProcessReaper: single thread reaper

Mon Apr 14 08:35:30 UTC 2014

On 04/11/2014 06:49 PM, roger riggs wrote:
> Hi Peter,
>
> We do know the PIDs of the processes that we care about but are unwilling
> to pay the cost of waiting for them individually.
> For the escapees, Process could resort to an individual thread 
> invoking waitpid(n).
>
> Thanks, Roger

Yes, it could. But the problem is that we don't find out about the
escapees immediately. Only after waitpid(-pgid, ...) starts returning -1
with errno == ECHILD can we assume that all children we didn't get a
report on have escaped. But that might not happen for a long time if we
have at least one long-lived child in the process group...
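
A minimal C sketch of that fallback, assuming POSIX waitpid() semantics. The helper reap_group_then_escapees() is a hypothetical name, not JDK code: it drains the group with a blocking waitpid(-pgid, ...) until ECHILD and only then waits individually for any known pid that the group wait never reported. It shows the delay described above: escapees are only identified once every group member has exited.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper, not JDK code.
 * Reaps every child in process group `pgid` until waitpid() reports
 * ECHILD, recording which of the `n` known pids were seen. Any known pid
 * that was never reported is treated as an "escapee" and waited for
 * individually with waitpid(pid, ...). Wait statuses end up in status[i]
 * for pids[i]. Returns the number of escapees. */
size_t reap_group_then_escapees(pid_t pgid, const pid_t *pids, int *status,
                                bool *escaped, size_t n)
{
    for (size_t i = 0; i < n; i++)
        escaped[i] = true;             /* until the group wait reports it */

    for (;;) {
        int st;
        pid_t pid = waitpid(-pgid, &st, 0);  /* any member of the group */
        if (pid < 0) {
            if (errno == EINTR)
                continue;
            assert(errno == ECHILD);   /* no children left in the group */
            break;
        }
        for (size_t i = 0; i < n; i++) {
            if (pids[i] == pid) {
                status[i] = st;
                escaped[i] = false;
            }
        }
    }

    size_t escapees = 0;
    for (size_t i = 0; i < n; i++) {
        if (!escaped[i])
            continue;
        escapees++;
        /* Fallback: wait for this one child. A real reaper would do this
         * on a separate thread per escapee so it cannot block the others. */
        while (waitpid(pids[i], &status[i], 0) < 0 && errno == EINTR)
            ;
    }
    return escapees;
}
```

For brevity the sketch waits for escapees sequentially on the calling thread; Roger's suggestion would give each escapee its own waiting thread.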

We could ignore this problem and pretend that such things never happen 
in practice, but I don't feel good about it and neither does Martin, I 
think.

For a moment I thought there was another way to wait for selected children:

        waitid():
            _SVID_SOURCE || _XOPEN_SOURCE >= 500 ||
            _XOPEN_SOURCE && _XOPEN_SOURCE_EXTENDED
            || /* Since glibc 2.12: */ _POSIX_C_SOURCE >= 200809L

...contains one additional option:

        WNOWAIT     Leave the child in a waitable state; a later wait
                    call can be used to again retrieve the child status
                    information.

This function is available on Linux and Solaris, but not on Mac OS X,
and I don't know about AIX. :-( ...
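
For illustration only, a sketch of how WNOWAIT could let a single reaper thread wait for selected children, assuming a platform where waitid() supports WNOWAIT (e.g. Linux). The function reap_next_owned() is a hypothetical name, not JDK code. It peeks at whichever child has exited without reaping it, and reaps that child only if its pid is one we own. One limitation it does not solve: a foreign child stays waitable, so the next waitid() call reports the same child again immediately, and a real reaper would have to avoid spinning on it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical single-reaper step, not JDK code.
 * Blocks until some child has exited, peeks at it with WNOWAIT (leaving it
 * waitable), and reaps it only if its pid is in owned[0..n-1].
 * Returns the index of the reaped pid (its wait status in *status),
 * -2 if the exited child is not ours (left untouched), or -1 on error. */
long reap_next_owned(const pid_t *owned, size_t n, int *status)
{
    siginfo_t info;
    info.si_pid = 0;
    while (waitid(P_ALL, 0, &info, WEXITED | WNOWAIT) < 0) {
        if (errno != EINTR)
            return -1;                 /* e.g. ECHILD: no children at all */
    }
    for (size_t i = 0; i < n; i++) {
        if (owned[i] != info.si_pid)
            continue;
        /* It is ours: reap it for real with a targeted wait. */
        while (waitpid(info.si_pid, status, 0) < 0) {
            if (errno != EINTR)
                return -1;
        }
        return (long)i;
    }
    return -2;   /* someone else's child: it must not be reaped here */
}
```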

So I guess we are out of luck, and one thread per child is here to stay.
The overhead is 32K per reaper thread, which amounts to 32MB for 1K
children. That doesn't seem like much for a system that wants to spawn
1K concurrent children, and a cached thread pool takes care of thread
reuse when a program repeatedly spawns some children and waits for
them...
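
A sketch of the thread-per-child approach using POSIX threads, assuming a 32K stack per reaper as in the figure above (raised to the platform minimum if that is larger, and falling back to the default stack size if the small stack is rejected). struct reaper and start_reaper() are hypothetical names, not JDK code, and the thread reuse a cached thread pool would provide is not shown.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical per-child reaper record, not JDK code. */
struct reaper {
    pid_t pid;          /* child to wait for */
    int status;         /* wait status, valid once the thread is joined */
    pthread_t thread;
};

static void *reap_one(void *arg)
{
    struct reaper *r = arg;
    while (waitpid(r->pid, &r->status, 0) < 0 && errno == EINTR)
        ;               /* retry if interrupted by a signal */
    return NULL;
}

/* Starts one reaper thread for r->pid with a small (32K) stack.
 * Returns 0 on success or an error number from pthreads. */
int start_reaper(struct reaper *r)
{
    size_t stack = 32 * 1024;
    long min = sysconf(_SC_THREAD_STACK_MIN);
    if (min > 0 && (size_t)min > stack)
        stack = (size_t)min;        /* never go below the platform minimum */

    pthread_attr_t attr;
    int rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_attr_setstacksize(&attr, stack);
    if (rc == 0)
        rc = pthread_create(&r->thread, &attr, reap_one, r);
    pthread_attr_destroy(&attr);
    if (rc == EINVAL)               /* stack rejected as too small here */
        rc = pthread_create(&r->thread, NULL, reap_one, r);
    return rc;
}
```

With 1K concurrently running children and 32K stacks this reserves 1024 × 32 KiB = 32 MiB of stack address space, which is the estimate above.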

Regards, Peter

>
>
> On 4/11/2014 10:52 AM, Peter Levart wrote:
>> On 04/09/2014 07:02 PM, Martin Buchholz wrote:
>>>
>>>
>>>
>>> On Tue, Apr 8, 2014 at 11:08 PM, Peter Levart
>>> <peter.levart at gmail.com> wrote:
>>>
>>>     Hi Martin,
>>>
>>>     As you might have seen in my later reply to Roger, there's still
>>>     hope on that front: setpgid() + waitpid(-pgid, ...) might be the
>>>     answer. I'm exploring in that direction. Shells are doing it, so
>>>     why can't the JDK?
>>>
>>>     It's a little trickier for the Process API, since I imagine that
>>>     shells form a group of processes from a pipeline that is known in
>>>     advance, while the Process API will have to add processes to the
>>>     live group dynamically. So some races will have to be resolved,
>>>     but I think it's doable.
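
One way to handle the race of adding children to a live group is the classic shell idiom: both the parent and the child call setpgid(), so the child is in the group before it would exec and before the parent moves on, whichever process runs first. A minimal sketch, assuming the group still has at least one unreaped member (setpgid() can only join an existing group, so a vanished group is not handled here). spawn_in_group() and its exit_code parameter, a stand-in for execve(), are hypothetical and not JDK code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical spawn helper, not JDK code.
 * Forks a child and puts it into process group *pgid, or makes it the
 * leader of a new group (and stores its pid in *pgid) if *pgid is 0.
 * Parent and child both call setpgid(), as shells do, so the membership
 * holds no matter which process runs first.
 * Returns the child's pid, or -1 if fork() fails. */
pid_t spawn_in_group(pid_t *pgid, int exit_code)
{
    pid_t target = *pgid;       /* 0 means "new group led by the child" */
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        (void)setpgid(0, target);   /* child side of the race */
        /* ... execve() would go here; exit_code stands in for it ... */
        _exit(exit_code);
    }
    /* Parent side of the race. It may fail harmlessly (e.g. EACCES once
     * the child has exec'd) because the child has then done it itself. */
    (void)setpgid(pid, target);
    if (target == 0)
        *pgid = pid;
    return pid;
}
```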
>>>
>>>
>>> This is a clever idea, and it's arguably better to design 
>>> subprocesses so they live in separate process groups (emacs does 
>>> that), but:
>>> Every time you create a process group, you change the effect of a 
>>> user signal like Ctrl-C, since it's sent to only one group.
>>> Maybe propagate signals to the subprocess group?  It's starting to 
>>> get complicated...
>>>
>>
>> Hi Martin,
>>
>> Yes, shells send Ctrl-C (SIGINT) and other terminal-initiated signals
>> to a (foreground) process group. A process group is formed from a
>> pipeline of interconnected processes. Each pipeline is considered a
>> separate "job", hence shells call this feature "job control". Child
>> processes inherit the process group from their parent by default, so
>> children spawned with the Process API (and their children) inherit the
>> process group of the JVM process. Considering the intentions of shell
>> job control, is propagating SIGTERM/SIGINT/SIGTSTP/SIGCONT signals to
>> children spawned by the Process API desirable? If so, then the JVM
>> would have to handle those signals and propagate them to the process
>> group that contains all children spawned by the Process API and their
>> descendants. That problem would certainly have to be addressed. But
>> let's first see what I found out about sigaction(SIGCHLD, ...),
>> setpgid(pid, pgid), waitpid(-pgid, ...), etc...
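
If the JVM were to propagate such signals, a minimal sketch could install a handler that re-sends each signal to the children's group with kill(-pgid, sig). This assumes the children were placed in their own group pgid; forward_signals() and forward_to_group() are hypothetical names, not JDK code. The handler only forwards: the JVM's own default action (for example stopping on SIGTSTP) is not reproduced, and it must be installed after forking, because children inherit caught-signal handlers until they exec.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Group to forward to. Hypothetical global: written only before the
 * handler is installed and only read inside the handler. */
static pid_t forward_pgid;

static void forward_to_group(int sig)
{
    int saved_errno = errno;            /* kill() may change errno */
    if (forward_pgid > 0)
        kill(-forward_pgid, sig);       /* negative pid: the whole group */
    errno = saved_errno;
}

/* Hypothetical helper, not JDK code: forwards each signal in sigs[] to
 * process group `pgid`. Install it after forking the children; otherwise
 * they inherit this handler until they exec. Returns 0 or -1. */
int forward_signals(pid_t pgid, const int *sigs, int nsigs)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = forward_to_group;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    forward_pgid = pgid;
    for (int i = 0; i < nsigs; i++) {
        if (sigaction(sigs[i], &sa, NULL) < 0)
            return -1;
    }
    return 0;
}
```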
>>
>> waitpid(-pgid, ...) alone does not seem to be enough for our task,
>> mainly because a process can reassign its group and join some other
>> group. I don't know if this situation occurs in the real world, but
>> imagine we have one live child process in process group pgid1 and no
>> exited children that have not been waited for yet. If we issue:
>>
>>     waitpid(-pgid1, &status, 0);
>>
>> Then this call blocks, because at the time it was issued there was at
>> least one child process in the pgid1 group and none of them had exited
>> yet. Now if this one child process changes its process group with:
>>
>>     setpgid(0, pgid2);
>>
>> Then the waitpid call in the parent does not return (maybe this is a
>> bug in Linux?), although there are no more live child processes in
>> the pgid1 group. Even when this child exits, the call to waitpid does
>> not return, since the child is no longer in the group we are waiting
>> for when it exits. If all our children "escape" the group in such a
>> way, the thread doing the waiting will never unblock. To solve this,
>> we can employ signal handlers. In a signal handler for the SIGCHLD
>> signal we can invoke:
>>
>>     waitpid(-pgid1, &status, WNOHANG); // non-blocking call
>>
>> ...in a loop until it either returns 0, which means that there are no
>> more exited children in the group waiting to be reaped at the moment,
>> or -1 with errno == ECHILD, which means that there are no more
>> children in the queried group: the group does not exist any more.
>> Since the signal handler is invoked with SIGCHLD masked, and the
>> kernel keeps one bit of pending-signal state (so a SIGCHLD arriving
>> while the handler runs is delivered again after it returns), no child
>> exit can be "skipped" this way, unless the child "escapes" by changing
>> its group. I don't know of a plausible reason for a program to change
>> its process group. If a program executing as a JVM child wants to
>> become a background daemon, it usually behaves as follows:
>>
>> - fork()s a grandchild and then exit()s (so we get notified via the
>> signal and successfully call waitpid(-pgid, ...) for its exit status)
>> - the grandchild then changes its session and group (becomes session
>> and group leader), closes file descriptors, etc. The responsibility
>> for waiting on the grandchild daemon is transferred to the init
>> process (pid=1), since the grandchild becomes an orphan (its original
>> parent has exited).
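
The SIGCHLD-driven draining described above can be sketched as follows, assuming a single process group pgid whose members are all our children. The names watch_group(), on_sigchld() and MAX_REPORTS are hypothetical, not JDK code. The handler loops on waitpid(-pgid, &status, WNOHANG) until it returns 0 (nothing more to reap right now) or -1 with ECHILD (no children left in the group), so several exits merged into one pending SIGCHLD are all collected. It uses only async-signal-safe calls and preserves errno.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_REPORTS 64

/* State shared with the handler. watched_pgid is set before SIGCHLD is
 * unblocked; the reports are read only after the handler has run. */
static pid_t watched_pgid;
static volatile sig_atomic_t nreports;
static pid_t report_pid[MAX_REPORTS];
static int report_status[MAX_REPORTS];
static volatile sig_atomic_t group_gone;   /* set once waitpid says ECHILD */

static void on_sigchld(int sig)
{
    (void)sig;
    int saved_errno = errno;
    for (;;) {
        int st;
        pid_t pid = waitpid(-watched_pgid, &st, WNOHANG);
        if (pid > 0) {
            if (nreports < MAX_REPORTS) {
                report_pid[nreports] = pid;
                report_status[nreports] = st;
                nreports++;
            }
            continue;             /* more exited children may be queued */
        }
        if (pid < 0 && errno == ECHILD)
            group_gone = 1;       /* no children left in the group */
        break;                    /* 0: nothing more to reap right now */
    }
    errno = saved_errno;
}

/* Installs on_sigchld() for group `pgid`. Returns 0 or -1. */
int watch_group(pid_t pgid)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    watched_pgid = pgid;
    return sigaction(SIGCHLD, &sa, NULL);
}
```

SIGCHLD should stay blocked while the children are forked and be unblocked (for example via sigsuspend()) only after watch_group() has been called, so the handler never runs with a stale pgid and the group leader is not reaped before all members have joined.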
>>
>> Ignoring this still-unsolved problem of a possibly ill-behaved child
>> program that changes its process group, I started constructing a
>> proof-of-concept prototype. In the prototype, I will throw
>> IllegalStateException from the methods of the Process API that pertain
>> to such children. I think this is reasonable.
>>
>> Stay tuned,
>>
>> Peter
>>
>>
>