RFR 9: 8077350 Process API Updates Implementation Review
Thomas Stüfe
thomas.stuefe at gmail.com
Thu Apr 16 19:01:50 UTC 2015
Hi Roger,
thank you for your answer!
The reason I take an interest is not just theoretical. We (SAP) use our JVM
for our test infrastructure and we had exactly the problem allChildren() is
designed to solve: killing a process tree related to a specific test
(similar to jtreg tests) in case of errors or hangs. We have test machines
running large workloads of tests in parallel and we reach pid wraparound -
depending on the OS - quite fast.
We solved this by adding process groups to Process.java and we are very
happy with this solution. We are able to quickly kill a whole process tree,
cleanly and completely, without ambiguity or risk to other tests. Of course
we had to add this support as a "sideways hack" in order to not change the
official Process.java interface. Therefore I was hoping that with JEP 102,
we would get official support for process groups. Unfortunately, seems the
decision is already done and we are too late in the discussion :(
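Just to illustrate the pattern (this is not our actual implementation, which
patches Process.java internally): on Linux you can approximate it from plain
Java with setsid(1) and kill(1). Everything in the sketch below is an
assumption of the sketch - the command names, the Linux-only behaviour, the
pid() accessor from the API under review - not part of the proposal:

import java.io.IOException;

// Rough sketch only: run a test command as the leader of a fresh process
// group (via setsid(1)), later signal the whole group with kill(1).
// Linux-specific; assumes setsid/kill are on the PATH and uses the pid()
// accessor from the API under review. Not the SAP implementation.
public class ProcessGroupKillSketch {

    public static void main(String[] args) throws IOException, InterruptedException {
        // setsid(1) puts the child into a new session and process group. A
        // ProcessBuilder child is not already a group leader, so setsid execs
        // in place and the child's pid doubles as its process-group id.
        Process leader = new ProcessBuilder("setsid", "sh", "-c",
                "sleep 600 & sleep 600 & wait")
                .inheritIO()
                .start();
        long pgid = leader.pid();

        // ... the test hangs or times out ...

        // A negative pid addresses the whole process group; "--" keeps
        // kill(1) from parsing "-<pgid>" as an option. Children that called
        // setsid()/setpgid() themselves (daemons) have left the group and are
        // not signalled.
        new ProcessBuilder("kill", "-TERM", "--", "-" + pgid)
                .inheritIO().start().waitFor();
    }
}

The nice property is that the kill addresses the group id rather than a list
of snapshotted child pids, and since the group id stays reserved while the
group has members, pid recycling does not come into play in the same way.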
see my other comments inline.
On Sat, Apr 11, 2015 at 8:55 PM, Roger Riggs <Roger.Riggs at oracle.com> wrote:
> Hi Thomas,
>
> Thanks for the comments.
>
> On 4/11/2015 8:31 AM, Thomas Stüfe wrote:
>
> Hi Roger,
>
> I have a question about getChildren() and getAllChildren().
>
> I assume the point of those functions is to implement point 4 of JEP 102
> ("The ability to deal with process trees, in particular some means to
> destroy a process tree."), by returning a collection of PIDs which are the
> children of the process and then killing them?
>
> Earlier versions included a kill-process-tree method, but it was recommended
> to leave the exact algorithm for killing processes to the caller.
>
>
> However, I am not sure that this can be implemented in a safe way, at
> least on UNIX, because - as Martin already pointed out - of PID recycling.
> I do not see how you can prevent allChildren() from returning PIDs which
> may already be reaped and recycled when you use them later. How do you
> prevent that?
>
> Unless there is an extended time between getting the children and
> destroying them, the pids will still be valid.
>
Why? A child process may be reaped the instant you are done reading it from
/proc, and its pid may be recycled by the OS right away, already pointing to
another process by the time allChildren() returns. If a process lives about
as long as it takes the system to wrap around to the same pid value, its pid
could be recycled right after it is reaped, no? Sure, the longer you wait,
the higher the chance of this happening, but it may happen right away.
As Martin said, we have had those races in the kill() code for a long time,
but children()/allChildren() could make those errors more probable, because
now more processes are involved - especially if you use allChildren() to kill
a deep process tree. And there is nothing in the javadoc warning the user
about this scenario. You would just, from time to time, kill an unrelated
process. Such problems are hard to debug.
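Concretely, the cleanup code I would expect people to write against the
proposed API is the obvious pattern below (just a sketch; I am assuming
allChildren() returns a Stream<ProcessHandle>, as in the webrev):

// Sketch of the obvious tree-kill pattern against the proposed API.
// Nothing here guards against pid recycling: any pid snapshotted inside
// allChildren() may already belong to an unrelated process by the time
// destroy() runs on its handle.
class NaiveTreeKill {
    static void killTree(ProcessHandle root) {
        root.allChildren().forEach(ProcessHandle::destroy); // race window per child
        root.destroy();
    }
}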
> The technique of caching the start time can prevent that case; though it
> has AFAIK not been a problem.
>
How would that work? Should the user, before issuing the kill, compare the
start time of the process to be killed with a cached start time?
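If that is the idea, I would expect the user code to look roughly like the
sketch below. Again, the names are my assumptions from the draft API -
allChildren() as a Stream, a start-time accessor on Info which I call
startInstant() here - and note that the check only narrows the window between
checking and killing, it does not close it:

import java.time.Instant;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

// Sketch of the "cache the start time" guard. Remember each child's start
// time when the tree is collected, and only destroy a handle if the pid
// still reports the same start time, i.e. it has not been recycled.
class GuardedTreeKill {
    static void killTree(ProcessHandle root) {
        Map<ProcessHandle, Optional<Instant>> snapshot =
                root.allChildren()
                    .collect(Collectors.toMap(h -> h, h -> h.info().startInstant()));

        snapshot.forEach((handle, cachedStart) -> {
            // Re-read the start time just before killing; a mismatch (or a
            // missing value) means the pid no longer denotes the same process.
            Optional<Instant> currentStart = handle.info().startInstant();
            if (cachedStart.isPresent() && cachedStart.equals(currentStart)) {
                handle.destroy();
            }
            // This narrows the check-then-kill window but cannot close it:
            // the pid could still be recycled between the check and destroy().
        });
        root.destroy();
    }
}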
> Note that even if your coding is bulletproof, allChildren() will also
> return PIDs of subprocesses which are completely unrelated to you and
> Process.java - they could have been forked by some third-party native code
> which just happens to run in parallel in the same process. There, you have
> no control over when it gets reaped. It might already have been reaped by
> the time allChildren() returns, and by now the same PID may have been
> recycled for another, unrelated process.
>
> Of course, the best case is for an application to spawn and manage its own
> processes and handle their proper termination.
> The use cases for children/allChildren are focused on supervisory/executive
> functions that monitor a running system and can clean up even in the case
> of unexpected failures.
>
> All management of processes is subject to OS limitations. If the PID were
> from a completely different process tree, the ordinary destroy/info
> functions would not be available unless the process was running as a
> privileged OS user (same as any other native application).
>
Could you explain this please? If both trees run under the same user, why
should I not be able to kill a process from a different tree?
> If I am right, it would not be sufficient to state "There is no guarantee
> that a process is alive." - it may be alive but by now be a different
> process altogether. This makes "allChildren()" useless for many cases,
> because the returned information may already be obsolete the moment the
> function returns.
>
> The caching of startTime can remove the ambiguity.
>
>
> Of course, I may be missing something here?
>
> But if I got all that right and the sole purpose of allChildren() is to
> be able to kill them (or otherwise signal them), why not use process
> groups? Process groups would be the traditional way on POSIX platforms to
> handle process trees, and they are also available on Windows in the form of
> Job Objects.
>
> Using process groups to signal subprocess trees would be safe, would
> not rely on PID identity, and would be more efficient. It would also be far
> less coding, and it is an old, established pattern - process groups have
> been around for a long time. Also, with process groups it is possible to
> break away from a group, so a program below you which wants to run as a
> daemon can do so by removing itself from the process group and thus escaping
> your kill.
>
> On Windows we have Job objects, and I think there are enough
> similarities to POSIX process groups to abstract them into something
> platform-independent.
>
> Earlier discussions of process termination and exit-value reaping considered
> using process groups, but it became evident that the Java runtime needed to
> be very careful not to interfere with processes that might be spawned and
> controlled by native libraries, and that process groups would only increase
> the complexity and the interactions.
>
> Thanks, Roger
>
>
Thanks! Thomas