ProcessReaper: single thread reaper

roger riggs roger.riggs at oracle.com
Mon Apr 14 17:31:16 UTC 2014


Hi Peter,

We already have Process.destroy vs Process.destroyForcibly though the 
implementations are identical.

I agree that for a general-purpose API, a nice polite approach to the
children is needed.  But I'm troubled by the 'wait-a-while' technique.
That should probably be left to the caller of the API.

But in the cleanup case, such as jtreg, the niceties do not need to be
provided; just clean the swamp.
I've seen one technique that suspends all of the subprocesses in a
first pass, then sends them all SIGKILL, and finally goes back and
continues them.
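
The three-pass technique can be sketched as below. This is only an
illustration: the Proc interface and its suspend/kill/resume methods are
hypothetical (the JDK has no API for suspending a process), and the
Recording class is a fake that logs calls so the ordering can be seen
without touching real processes. The presumed point of the first pass is
that a stopped process cannot fork or respawn anything while its
siblings are being killed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FreezeThenKill {
    /** Hypothetical process handle; not a JDK type. */
    interface Proc {
        void suspend(); // e.g. SIGSTOP on UNIX
        void kill();    // e.g. SIGKILL on UNIX
        void resume();  // e.g. SIGCONT on UNIX
    }

    /** Suspend everything first, then kill everything, then continue. */
    static void reap(List<? extends Proc> procs) {
        for (Proc p : procs) p.suspend();
        for (Proc p : procs) p.kill();
        for (Proc p : procs) p.resume();
    }

    /** Fake process that only records the calls made on it. */
    static final class Recording implements Proc {
        private final String name;
        private final List<String> log;

        Recording(String name, List<String> log) {
            this.name = name;
            this.log = log;
        }

        @Override public void suspend() { log.add("suspend " + name); }
        @Override public void kill()    { log.add("kill " + name); }
        @Override public void resume()  { log.add("resume " + name); }
    }

    /** Runs the three passes over two fake processes and returns the call log. */
    static List<String> demo() {
        List<String> log = new ArrayList<>();
        reap(Arrays.asList(new Recording("a", log), new Recording("b", log)));
        return log;
    }

    public static void main(String[] args) {
        // prints [suspend a, suspend b, kill a, kill b, resume a, resume b]
        System.out.println(demo());
    }
}
```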

Roger


On 4/14/2014 12:37 PM, Peter Levart wrote:
> On 04/14/2014 04:37 PM, roger riggs wrote:
>> Hi,
>>
>> Jtreg, for example, needs a reliable way to cleanup after tests.
>> We've had a variety of problems with stray processes left over because
>> there is no visibility nor reliable way to identify and kill them.
>>
>> Roger
>
> Hi Roger,
>
> If you want to reliably get rid of all descendants then there's only one 
> way on UNIX:
>
>
> for (Proc c : enumerateDirectChildrenOfJVM()) {
>     getRidOfTreeRootedAt(c);
> }
>
> void getRidOfTreeRootedAt(Proc p) {
>     // if p is not alive any more, then it can't have children - they are
>     // orphans and we can't identify them any more (their parent is "init")
>     if (p.isAlive()) {
>         // save the list of direct children first, since they will be
>         // re-parented when their parent is gone, preventing us from
>         // enumerating them later...
>         List<Proc> children = p.enumerateDirectChildren();
>         // try graceful termination...
>         p.terminateGracefully();
>         // wait a while
>         if (p.isAlive()) p.terminateForcefully();
>         // now iterate over the children
>         for (Proc c : children) {
>             getRidOfTreeRootedAt(c);
>         }
>     }
> }
>
>
> - must 1st terminate the parent (hopefully with grace and it will take 
> care of children) because if you kill a child 1st, a persistent parent 
> might re-spawn it.
> - must enumerate the children before terminating the parent, because 
> they are re-parented when the parent dies and you can't find them any 
> more.
>
>
> So the list of requirements for the new API that I submitted in my 
> previous message:
>
> On 04/14/2014 05:54 PM, Peter Levart wrote:
>> - enumerate direct children (regardless of which API was used to 
>> spawn them) of JVM
>> - trigger graceful destruction of any direct child
>> - non-blocking query for liveness of any direct child
>> - trigger forcible termination of any direct child and all 
>> descendants in one call
>> - (optionally: obtain a Process object of any live direct child that 
>> was spawned by Process API)
>
> ...must be augmented:
>
> - enumerate direct children (regardless of which API was used to spawn 
> them) of JVM
> - enumerate direct children of any child enumerated by the API
> - trigger graceful destruction of any descendant enumerated by the API
> - non-blocking query for liveness of any descendant enumerated by the API
> - trigger forcible termination of any descendant enumerated by the API
> - (optionally: obtain a Process object of any live direct JVM child 
> that was spawned by Process API)
>
>
> Regards, Peter
>
>
>>
>>
>> On 4/14/2014 10:31 AM, David M. Lloyd wrote:
>>> Where does the requirement to manage grandchild processes actually 
>>> come from?  I'd hate to see the ability to "nicely" terminate 
>>> immediate child processes lost just because it was difficult to 
>>> implement some grander scheme.
>>>
>>> On 04/14/2014 08:49 AM, roger riggs wrote:
>>>> Hi Martin,
>>>>
>>>> A new API is needed; overloading the current Process API is not a good
>>>> option.
>>>> Even within Process, a new method will be needed to destroy the
>>>> subprocess and all of its children while maintaining backward
>>>> compatibility.
>>>>
>>>> Are there specific OS features that need to be exposed to 
>>>> applications?
>>>> Is the destroy-process-and-all-children abstraction too coarse?
>>>>
>>>> Roger
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On 4/11/2014 7:37 PM, Martin Buchholz wrote:
>>>>> Let's step back again and try to check our goals...
>>>>>
>>>>> We could try to optimize the one-reaper-thread-per-subprocess thing.
>>>>> But that is risky, and the cost of what we're doing today is not that
>>>>> high.
>>>>>
>>>>> We could try to implement the feature of killing off an entire
>>>>> subprocess tree.  But historically, any kind of behavior change like
>>>>> that has been vetoed.  I have tried and failed to make less
>>>>> incompatible changes.  We would have to add a new API.
>>>>>
>>>>> The reality is that Java does not give you real access to the
>>>>> underlying OS, and unless there's a seriously heterodox attempt to
>>>>> provide OS-specific extensions, people will have to continue to either
>>>>> write native code or delegate to an OS-savvy subprocess like a Perl
>>>>> script.
>>>>>
>>>>>
>>>>> On Fri, Apr 11, 2014 at 7:52 AM, Peter Levart <peter.levart at gmail.com
>>>>> <mailto:peter.levart at gmail.com>> wrote:
>>>>>
>>>>>     On 04/09/2014 07:02 PM, Martin Buchholz wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>     On Tue, Apr 8, 2014 at 11:08 PM, Peter Levart
>>>>>>     <peter.levart at gmail.com <mailto:peter.levart at gmail.com>> wrote:
>>>>>>
>>>>>>         Hi Martin,
>>>>>>
>>>>>>         As you might have seen in my later reply to Roger, there's
>>>>>>         still hope on that front: setpgid() + waitpid(-pgid, ...) might
>>>>>>         be the answer. I'm exploring in that direction. Shells are
>>>>>>         doing it, so why can't JDK?
>>>>>>
>>>>>>         It's a little trickier for Process API, since I imagine that
>>>>>>         shells form a group of processes from a pipeline which is
>>>>>>         known in advance, while Process API will have to add processes
>>>>>>         to the live group dynamically. So some races will have to be
>>>>>>         resolved, but I think it's doable.
>>>>>>
>>>>>>
>>>>>>     This is a clever idea, and it's arguably better to design
>>>>>>     subprocesses so they live in separate process groups (emacs does
>>>>>>     that), but:
>>>>>>     Every time you create a process group, you change the effect of a
>>>>>>     user signal like Ctrl-C, since it's sent to only one group.
>>>>>>     Maybe propagate signals to the subprocess group? It's starting
>>>>>>     to get complicated...
>>>>>>
>>>>>
>>>>>     Hi Martin,
>>>>>
>>>>>     Yes, shells send Ctrl-C (SIGINT) and other signals initiated by
>>>>>     terminal to a (foreground) process group. A process group is
>>>>>     formed from a pipeline of interconnected processes. Each pipeline
>>>>>     is considered to be a separate "job", hence shells call this
>>>>>     feature "job-control". Child processes by default inherit the
>>>>>     process group from their parent, so children born with Process API
>>>>>     (and their children) inherit the process group from the JVM process.
>>>>>     Considering the intentions of shell job-control, is propagating
>>>>>     SIGTERM/SIGINT/SIGTSTP/SIGCONT signals to children spawned by
>>>>>     Process API desirable? If so, then yes, handling those signals in
>>>>>     JVM and propagating them to current process group that contains
>>>>>     all children spawned by Process API and their descendants would
>>>>>     have to be performed by JVM. That problem would certainly have to
>>>>>     be addressed. But let's first see what I found out about
>>>>>     sigaction(SIGCHLD, ...), setpgid(pid, pgid), waitpid(-pgid, ...),
>>>>>     etc...
>>>>>
>>>>>     waitpid(-pgid, ...) alone does not seem to be enough for our task,
>>>>>     mainly because a process can change its group and join some
>>>>>     other group. I don't know if this is a situation that occurs in
>>>>>     the real world, but imagine that we have one live child process in a
>>>>>     process group pgid1 and no unwaited exited children. If we issue:
>>>>>
>>>>>         waitpid(-pgid1, &status, 0);
>>>>>
>>>>>     Then this call blocks, because at the time it was issued, there
>>>>>     were >0 child processes in the pgid1 group and none of them had
>>>>>     exited yet. Now if this one child process changes its process
>>>>>     group with:
>>>>>
>>>>>         setpgid(0, pgid2);
>>>>>
>>>>>     Then the waitpid call in the parent does not return (maybe this is
>>>>>     a bug in Linux?) although there are no live child processes
>>>>>     in the pgid1 group any more. Even when this child exits, the call
>>>>>     to waitpid does not return, since this child is not in the group
>>>>>     we are waiting for when it exits. If all our children "escape" the
>>>>>     group in such a way, the thread doing the waiting will never
>>>>>     unblock. To solve this, we can employ signal handlers. In a signal
>>>>>     handler for the SIGCHLD signal we can invoke:
>>>>>
>>>>>         waitpid(-pgid1, &status, WNOHANG); // non-blocking call
>>>>>
>>>>>     ...in a loop until it returns either (0), which means that there
>>>>>     are no more unwaited exited children in the group at the moment, or
>>>>>     (-1) with errno == ECHILD, which means that there are no children
>>>>>     in the queried group any more - the group does not exist any more.
>>>>>     Since the signal handler is invoked with SIGCHLD being masked and
>>>>>     there is one bit of pending signal state in the kernel, no child
>>>>>     exit can be "skipped" this way. Unless the child "escapes" by
>>>>>     changing its group. I don't know of a plausible reason for a
>>>>>     program to change its process group. If a program executing as
>>>>>     a JVM child wants to become a background daemon it usually behaves
>>>>>     as follows:
>>>>>
>>>>>     - fork()s a grand-child and then exit()s (so we get notified via
>>>>>     signal and successfully waitpid(-pgid, ...) for its exit status)
>>>>>     - the grand-child then changes its session and group (becomes
>>>>>     session and group leader), closes file descriptors, etc. The
>>>>>     responsibility for waiting on the grand-child daemon is
>>>>>     transferred to the init process (pid=1) since the grand-child
>>>>>     becomes an orphan (has no parent).
>>>>>
>>>>>     Ignoring this still unsolved problem of a possibly ill-behaved child
>>>>>     program that changes its process group, I started constructing a
>>>>>     proof-of-concept prototype. What I will do in the prototype is
>>>>>     start throwing IllegalStateException from the methods of the
>>>>>     Process API that pertain to such children. I think this is
>>>>>     reasonable.
>>>>>
>>>>>     Stay tuned,
>>>>>
>>>>>     Peter
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>
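
The parent-first ordering from Peter's pseudocode quoted above can be
sketched in Java as follows. This is a minimal sketch under one big
assumption: it uses ProcessHandle, which only exists in JDK 9 and later
and so postdates this thread. The class name TreeReaper and the grace
period parameter are made up for illustration, and the inherent race
(a process forking between children() and its termination) is not
addressed.

```java
import java.time.Duration;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.stream.Collectors;

public class TreeReaper {
    /** Terminates p first, then recurses into the children it had. */
    static void getRidOfTreeRootedAt(ProcessHandle p, Duration grace) {
        if (!p.isAlive()) {
            return; // its children are already re-parented and cannot be found
        }
        // Snapshot the direct children before the parent goes away.
        List<ProcessHandle> children = p.children().collect(Collectors.toList());
        p.destroy(); // request termination (may or may not be graceful)
        try {
            // "wait a while": give the process the grace period to exit
            p.onExit().get(grace.toMillis(), TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (ExecutionException | TimeoutException e) {
            // did not exit in time; fall through to the forcible path
        }
        if (p.isAlive()) {
            p.destroyForcibly();
        }
        for (ProcessHandle c : children) {
            getRidOfTreeRootedAt(c, grace);
        }
    }

    /** Applies the same procedure to every direct child of this JVM. */
    static void reapAllChildrenOfJvm(Duration grace) {
        List<ProcessHandle> direct =
            ProcessHandle.current().children().collect(Collectors.toList());
        for (ProcessHandle c : direct) {
            getRidOfTreeRootedAt(c, grace);
        }
    }
}
```

A cleanup harness might call TreeReaper.reapAllChildrenOfJvm(Duration.ofSeconds(2))
after each test; the two-second grace period is an arbitrary choice.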
