ProcessReaper: single thread reaper

Peter Levart peter.levart at gmail.com
Fri Apr 11 14:52:03 UTC 2014


On 04/09/2014 07:02 PM, Martin Buchholz wrote:
>
>
>
> On Tue, Apr 8, 2014 at 11:08 PM, Peter Levart <peter.levart at gmail.com 
> <mailto:peter.levart at gmail.com>> wrote:
>
>     Hi Martin,
>
>     As you might have seen in my later reply to Roger, there's still
>     hope on that front: setpgid() + wait(-pgid, ...) might be the
>     answer. I'm exploring in that direction. Shells are doing it, so
>     why can't JDK?
>
>     It's a little trickier for Process API, since I imagine that
>     shells form a group of processes from a pipeline which is known
>     in-advance while Process API will have to add processes to the
>     live group dynamically. So some races will have to be resolved,
>     but I think it's doable.
>
>
> This is a clever idea, and it's arguably better to design subprocesses 
> so they live in separate process groups (emacs does that), but:
> Every time you create a process group, you change the effect of a user 
> signal like Ctrl-C, since it's sent to only one group.
> Maybe propagate signals to the subprocess group?  It's starting to get 
> complicated...
>

Hi Martin,

Yes, shells send Ctrl-C (SIGINT) and other terminal-generated signals 
to the (foreground) process group. A process group is formed from a 
pipeline of interconnected processes. Each pipeline is considered a 
separate "job", hence shells call this feature "job control". Child 
processes inherit the process group of their parent by default, so 
children started with the Process API (and their children) inherit the 
process group of the JVM process. Considering the intent of shell job 
control, is propagating SIGTERM/SIGINT/SIGTSTP/SIGCONT to children 
spawned by the Process API desirable? If so, then yes, the JVM would 
have to handle those signals and propagate them to the process group 
that currently contains all children spawned by the Process API and 
their descendants. That problem would certainly have to be addressed. 
But let's first see what I found out about sigaction(SIGCHLD, ...), 
setpgid(pid, pgid), waitpid(-pgid, ...), etc.

waitpid(-pgid, ...) alone does not seem to be enough for our task, 
mainly because a process can reassign its group and join some other 
group. I don't know whether this situation occurs in the real world, 
but imagine that we have one live child process in process group pgid1 
and no unwaited exited children. If we issue:

     waitpid(-pgid1, &status, 0);

Then this call blocks, because at the time it was issued there was at 
least one child process in group pgid1 and none of them had exited yet. 
Now if this one child process changes its process group with:

     setpgid(0, pgid2);

Then the waitpid call in the parent does not return (maybe this is a bug 
in Linux?), although there are no more live child processes in group 
pgid1. Even when this child exits, the call to waitpid does not return, 
since at the time it exits the child is no longer in the group we are 
waiting for. If all our children "escape" the group in this way, the 
thread doing the waiting will never unblock. To solve this, we can 
employ signal handlers. In a handler for the SIGCHLD signal we can 
invoke:

     waitpid(-pgid1, &status, WNOHANG); // non-blocking call

...in a loop until it either returns 0, which means that there are no 
more unwaited exited children in the group at the moment, or returns -1 
with errno == ECHILD, which means that there are no more children of 
ours in the queried group (for our purposes, the group is gone). Since 
the signal handler is invoked with SIGCHLD masked, and the kernel keeps 
one bit of pending-signal state, no child exit can be "skipped" this 
way: a child that exits while the handler is running makes SIGCHLD 
pending again, so the handler runs once more and the loop collects it. 
The exception is a child that "escapes" by changing its group. I don't 
know of a plausible reason for a program to change its process group. 
If a program executing as a JVM child wants to become a background 
daemon, it usually behaves as follows:

- it fork()s a grandchild and then exit()s (so we get notified via the 
signal and can successfully collect its exit status with 
waitpid(-pgid, ...))
- the grandchild then changes its session and group (becoming session 
and group leader), closes file descriptors, etc. The responsibility for 
waiting on the grandchild daemon is transferred to the init process 
(pid=1), since the grandchild becomes an orphan (its parent has exited).

Ignoring this still-unsolved problem of a possibly ill-behaved child 
program that changes its process group, I started constructing a 
proof-of-concept prototype. In the prototype, the Process API methods 
that pertain to such children will throw IllegalStateException. I think 
this is reasonable.

Stay tuned,

Peter





More information about the core-libs-dev mailing list