ProcessReaper: single thread reaper
Peter Levart
peter.levart at gmail.com
Mon Apr 14 16:37:27 UTC 2014
On 04/14/2014 04:37 PM, roger riggs wrote:
> Hi,
>
> Jtreg, for example, needs a reliable way to cleanup after tests.
> We've had a variety of problems with stray processes left over because
> there is no visibility nor reliable way to identify and kill them.
>
> Roger
Hi Roger,
If you want to reliably get rid of all descendants then there's only one
way on UNIX:
    for (Proc c : enumerateDirectChildrenOfJVM()) {
        getRidOfTreeRootedAt(c);
    }

    void getRidOfTreeRootedAt(Proc p) {
        // if p is not alive any more, then it can't have children - they are
        // orphans and we can't identify them any more (their parent is "init")
        if (p.isAlive()) {
            // save the list of direct children first, since they will be
            // re-parented when their parent is gone, preventing us from
            // enumerating them later...
            List<Proc> children = p.enumerateDirectChildren();
            // try graceful...
            p.terminateGracefully();
            // wait a while
            if (p.isAlive()) p.terminateForcefully();
            // now iterate children
            for (Proc c : children) {
                getRidOfTreeRootedAt(c);
            }
        }
    }
- must first terminate the parent (hopefully gracefully, so that it takes
care of its children), because if you kill a child first, a persistent
parent might re-spawn it.
- must enumerate the children before terminating the parent, because
they are re-parented when the parent dies and you can't find them any more.
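For illustration only, here is a minimal sketch of the procedure above written against java.lang.ProcessHandle, an API that exists in Java 9 and later and is not the API being proposed in this thread. The class name TreeReaper and the graceMillis parameter are assumptions made for the example, and the "wait a while" step is approximated by a bounded wait on onExit().

```java
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.stream.Collectors;

// Hypothetical helper; the name and the grace period parameter are assumptions.
final class TreeReaper {

    static void getRidOfTreeRootedAt(ProcessHandle p, long graceMillis) {
        // A dead process has no children we can still identify.
        if (!p.isAlive()) {
            return;
        }
        // Snapshot the direct children BEFORE terminating the parent,
        // because they are re-parented once the parent is gone.
        List<ProcessHandle> children = p.children().collect(Collectors.toList());

        // Try graceful termination first (SIGTERM on UNIX-like systems).
        p.destroy();
        try {
            // Wait a bounded amount of time for the process to exit.
            p.onExit().get(graceMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (ExecutionException | TimeoutException e) {
            // Fall through to the liveness check below.
        }
        // Still alive after the grace period: terminate forcibly.
        if (p.isAlive()) {
            p.destroyForcibly();
        }
        // Parent first, then the children that were recorded earlier.
        for (ProcessHandle c : children) {
            getRidOfTreeRootedAt(c, graceMillis);
        }
    }
}
```

A caller could apply it to every direct child of the JVM with ProcessHandle.current().children().forEach(c -> TreeReaper.getRidOfTreeRootedAt(c, 2000)). The children snapshot remains racy: a process forked between the snapshot and the termination can still escape.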
So the list of requirements for the new API that I submitted in my
previous message:
On 04/14/2014 05:54 PM, Peter Levart wrote:
> - enumerate direct children (regardless of which API was used to spawn
> them) of JVM
> - trigger graceful destruction of any direct child
> - non-blocking query for liveness of any direct child
> - trigger forcible termination of any direct child and all descendants
> in one call
> - (optionally: obtain a Process object of any live direct child that
> was spawned by Process API)
...must be augmented:
- enumerate direct children (regardless of which API was used to spawn
them) of JVM
- enumerate direct children of any child enumerated by the API
- trigger graceful destruction of any descendant enumerated by the API
- non-blocking query for liveness of any descendant enumerated by the API
- trigger forcible termination of any descendant enumerated by the API
- (optionally: obtain a Process object of any live direct JVM child that
was spawned by Process API)
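As a rough sketch of the enumeration and liveness items in the list above, the following walks a process tree with java.lang.ProcessHandle (Java 9 and later, not the API under discussion here). ProcTreeSnapshot and snapshot are hypothetical names chosen for the example, and the result is only a point-in-time view that can be stale as soon as it is returned.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; names are assumptions made for this example.
final class ProcTreeSnapshot {

    // Depth-first listing of root and every descendant visible at call time.
    // Each entry has the form "<depth> <pid> <alive|exited>".
    static List<String> snapshot(ProcessHandle root) {
        List<String> out = new ArrayList<>();
        collect(root, 0, out);
        return out;
    }

    private static void collect(ProcessHandle p, int depth, List<String> out) {
        // isAlive() is a non-blocking liveness query.
        out.add(depth + " " + p.pid() + " " + (p.isAlive() ? "alive" : "exited"));
        // children() enumerates direct children regardless of how they were spawned.
        p.children().forEach(c -> collect(c, depth + 1, out));
    }
}
```

Calling ProcTreeSnapshot.snapshot(ProcessHandle.current()) lists the JVM itself at depth 0, followed by whatever children and grandchildren exist at that moment.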
Regards, Peter
>
>
> On 4/14/2014 10:31 AM, David M. Lloyd wrote:
>> Where does the requirement to manage grandchild processes actually
>> come from? I'd hate to see the ability to "nicely" terminate
>> immediate child processes lost just because it was difficult to
>> implement some grander scheme.
>>
>> On 04/14/2014 08:49 AM, roger riggs wrote:
>>> Hi Martin,
>>>
>>> A new API is needed, overloading the current Process API is not a good
>>> option.
>>> Even within Process a new method will be needed to destroy the
>>> subprocess and all
>>> of its children, to maintain backward compatibility.
>>>
>>> Are there specific OS features that need to be exposed to applications?
>>> Is the destroy-process-and-all-children abstraction too coarse?
>>>
>>> Roger
>>>
>>>
>>>
>>>
>>>
>>> On 4/11/2014 7:37 PM, Martin Buchholz wrote:
>>>> Let's step back again and try to check our goals...
>>>>
>>>> We could try to optimize the one-reaper-thread-per-subprocess thing.
>>>> But that is risky, and the cost of what we're doing today is not that
>>>> high.
>>>>
>>>> We could try to implement the feature of killing off an entire
>>>> subprocess tree. But historically, any kind of behavior change like
>>>> that has been vetoed. I have tried and failed to make less
>>>> incompatible changes. We would have to add a new API.
>>>>
>>>> The reality is that Java does not give you real access to the
>>>> underlying OS, and unless there's a seriously heterodox attempt to
>>>> provide OS-specific extensions, people will have to continue to either
>>>> write native code or delegate to an OS-savvy subprocess like a perl
>>>> script.
>>>>
>>>>
>>>> On Fri, Apr 11, 2014 at 7:52 AM, Peter Levart <peter.levart at gmail.com
>>>> <mailto:peter.levart at gmail.com>> wrote:
>>>>
>>>> On 04/09/2014 07:02 PM, Martin Buchholz wrote:
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Apr 8, 2014 at 11:08 PM, Peter Levart
>>>>> <peter.levart at gmail.com <mailto:peter.levart at gmail.com>> wrote:
>>>>>
>>>>> Hi Martin,
>>>>>
>>>>> As you might have seen in my later reply to Roger, there's
>>>>> still hope on that front: setpgid() + wait(-pgid, ...) might
>>>>> be the answer. I'm exploring in that direction. Shells are
>>>>> doing it, so why can't JDK?
>>>>>
>>>>> It's a little trickier for Process API, since I imagine that
>>>>> shells form a group of processes from a pipeline which is
>>>>> known in-advance while Process API will have to add processes
>>>>> to the live group dynamically. So some races will have to be
>>>>> resolved, but I think it's doable.
>>>>>
>>>>>
>>>>> This is a clever idea, and it's arguably better to design
>>>>> subprocesses so they live in separate process groups (emacs does
>>>>> that), but:
>>>>> Every time you create a process group, you change the effect of a
>>>>> user signal like Ctrl-C, since it's sent to only one group.
>>>>> Maybe propagate signals to the subprocess group? It's starting
>>>>> to get complicated...
>>>>>
>>>>
>>>> Hi Martin,
>>>>
>>>> Yes, shells send Ctrl-C (SIGINT) and other signals initiated by
>>>> terminal to a (foreground) process group. A process group is
>>>> formed from a pipeline of interconnected processes. Each pipeline
>>>> is considered to be a separate "job", hence shells call this
>>>> feature "job-control". Child processes by default inherit process
>>>> group from their parent, so children born with Process API (and
>>>> their children) inherit the process group from the JVM process.
>>>> Considering the intentions of shell job-control, is propagating
>>>> SIGTERM/SIGINT/SIGTSTP/SIGCONT signals to children spawned by
>>>> Process API desirable? If so, then yes, handling those signals in
>>>> JVM and propagating them to current process group that contains
>>>> all children spawned by Process API and their descendants would
>>>> have to be performed by JVM. That problem would certainly have to
>>>> be addressed. But let's first see what I found out about
>>>> sigaction(SIGCHLD, ...), setpgid(pid, pgid), waitpid(-pgid, ...),
>>>> etc...
>>>>
>>>> waitpid(-pgid, ...) alone seems not to be enough for our task,
>>>> mainly because a process can re-assign its group and join some
>>>> other group. I don't know if this is a situation that occurs in
>>>> real world, but imagine if we have one live child process in a
>>>> process group pgid1 and no unwaited exited children. If we issue:
>>>>
>>>> waitpid(-pgid1, &status, 0);
>>>>
>>>> Then this call blocks, because at the time it was given, there
>>>> were >0 child processes in the pgid1 group and none of them has
>>>> exited yet. Now if this one child process changes its process
>>>> group with:
>>>>
>>>> setpgid(0, pgid2);
>>>>
>>>> Then the waitpid call in the parent does not return (maybe this is
>>>> a bug in Linux?) although there are no more live child processes
>>>> in the pgid1 group any more. Even when this child exits, the call
>>>> to waitpid does not return, since this child is not in the group
>>>> we are waiting for when it exits. If all our children "escape" the
>>>> group in such a way, the thread doing the waiting will never unblock. To
>>>> solve this, we can employ signal handlers. In a signal handler for
>>>> SIGCHLD signal we can invoke:
>>>>
>>>> waitpid(-pgid1, &status, WNOHANG); // non-blocking call
>>>>
>>>> ...in a loop until it either returns (0), which means that there're
>>>> no more unwaited exited children in the group at the moment, or (-1)
>>>> with errno == ECHILD, which means that there're no more children
>>>> in the queried group any more - the group does not exist any more.
>>>> Since the signal handler is invoked with SIGCHLD masked and
>>>> there is one bit of pending signal state in the kernel, no child
>>>> exit can be "skipped" this way. Unless the child "escapes" by
>>>> changing its group. I don't know of a plausible reason for a
>>>> program to change its process group. If a program executing as
>>>> JVM child wants to become a background daemon it usually behaves
>>>> as follows:
>>>>
>>>> - fork()s a grand-child and then exit()s (so we get notified via
>>>> signal and can waitpid(-pgid, ...) successfully for its exit status)
>>>> - the grand-child then changes its session and group (becomes
>>>> session and group leader), closes file descriptors, etc. The
>>>> responsibility for waiting on the grand-child daemon is
>>>> transferred to the init process (pid=1) since the grand-child
>>>> becomes an orphan (has no parent).
>>>>
>>>> Ignoring this still unsolved problem of possible ill-behaved child
>>>> program that changes its process group, I started constructing a
>>>> proof-of-concept prototype. What I will do in the prototype is
>>>> start throwing IllegalStateException from the methods of the
>>>> Process API that pertain to such children. I think this is
>>>> reasonable.
>>>>
>>>> Stay tuned,
>>>>
>>>> Peter
>>>>
>>>>
>>>>
>>>
>>
>>
>