RFR 9: 8077350 Process API Updates Implementation Review

Thomas Stüfe thomas.stuefe at gmail.com
Sat Apr 18 18:27:12 UTC 2015


Hi Roger,

On Fri, Apr 17, 2015 at 7:05 PM, Roger Riggs <Roger.Riggs at oracle.com> wrote:

>  Hi Thomas,
>
> On 4/16/2015 3:01 PM, Thomas Stüfe wrote:
>
> Hi Roger,
>
>  thank you for your answer!
>
>  The reason I take an interest is not just theoretical. We (SAP) use our
> JVM for our test infrastructure and we had exactly the problem
> allChildren() is designed to solve: killing a process tree related to a
> specific test (similar to jtreg tests) in case of errors or hangs. We have
> test machines running large workloads of tests in parallel and we reach pid
> wraparound - depending on the OS - quite fast.
>
>  We solved this by adding process groups to Process.java and we are very
> happy with this solution. We are able to quickly kill a whole process tree,
> cleanly and completely, without ambiguity or risk to other tests. Of course
> we had to add this support as a "sideways hack" in order to not change the
> official Process.java interface. Therefore I was hoping that with JEP 102,
> we would get official support for process groups. Unfortunately, it seems the
> decision has already been made and we are too late to the discussion :(
>
> It would be interesting to see a description of what you added to/around
> the API.
>

Very simple, really: all we did was add a flag to Runtime.exec (ultimately
exposed via ProcessBuilder) to make the child process the leader of a new
process group. The flag just triggered a setpgid() call in the child
between fork() and exec(), which created a new process group with the child
as its leader. You could then kill the whole tree with kill(-pid), since a
negative pid signals every process in that group. On Windows we implemented
it with Job Objects.
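The flag itself was an internal SAP change and is not shown in the thread. For readers who want a comparable effect without patching the JDK, here is a rough, Linux-only approximation. It assumes the util-linux `setsid` and a `kill` command are on the PATH: `setsid` starts the child in a new session (and therefore a new process group whose id is the child's pid), and `kill -SIG -- -PGID` signals the whole group. It also assumes the JVM does not already make the child a group leader; if it did, `setsid` would fork and the pid seen by Java would no longer be the group id. `GroupLauncher` and its methods are hypothetical names.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper approximating "child becomes leader of a new process group".
final class GroupLauncher {
    // Prefixes the command with setsid(1), so the child runs in a new
    // session and process group whose id equals the child's pid.
    static List<String> inNewGroup(List<String> command) {
        List<String> full = new ArrayList<>();
        full.add("setsid");
        full.addAll(command);
        return full;
    }

    // Builds "kill -SIG -- -PGID"; the negative id addresses the whole group.
    static List<String> killGroupCommand(long pgid, String signal) {
        return List.of("kill", "-" + signal, "--", "-" + pgid);
    }

    static Process startInNewGroup(List<String> command) throws IOException {
        return new ProcessBuilder(inNewGroup(command)).inheritIO().start();
    }

    // Sends SIGTERM to every process still in the group led by leader.
    static boolean killGroup(Process leader) throws IOException, InterruptedException {
        Process k = new ProcessBuilder(killGroupCommand(leader.pid(), "TERM"))
                .inheritIO().start();
        return k.waitFor() == 0;
    }
}
```

A process that later calls setsid() or setpgid() itself leaves the group and escapes killGroup, which matches the "break away" behaviour described further down in the thread.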

It was all simple because we never aimed to bring process groups with all
their features to the JDK; we just needed a way to kill a tree of child
processes, which is a rather specific problem.

> The reason to avoid them was one of simplicity and non-interference with
> processes spawned by native libraries.
>

See, that is what I don't understand: you still interfere with them by
returning all child pids, whether they were spawned by Java or by native
libraries. Or do you mean that you offload responsibility to the caller, who
should then decide whether to kill the child pids indiscriminately or more
carefully?


> If that complexity can be understood, process groups/jobs
> could fulfill a need in a scalable system.
>
>
I think process groups could be added to the API if they are well
documented (which admittedly will be difficult in a platform-neutral way).
Basically, process groups are a tool like any other, and the caller must
think before using them, as with every other tool.


> At this point, I'd like to deal with it as a separate request for
> enhancement.
>

Sure! Thanks for listening.

Kind Regards, Thomas


>
>
>
>  see my other comments inline.
>
> On Sat, Apr 11, 2015 at 8:55 PM, Roger Riggs <Roger.Riggs at oracle.com>
> wrote:
>
>>  Hi Thomas,
>>
>> Thanks for the comments.
>>
>> On 4/11/2015 8:31 AM, Thomas Stüfe wrote:
>>
>> Hi Roger,
>>
>>  I have a question about getChildren() and getAllChildren().
>>
>> I assume the point of those functions is to implement point 4 of JEP 102
>> ("The ability to deal with process trees, in particular some means to
>> destroy a process tree."), by returning a collection of PIDs which are the
>> children of the process and then killing them?
>>
>> Earlier versions included a method to kill a process tree, but it was
>> recommended to leave the exact algorithm for killing processes to the
>> caller.
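Leaving the algorithm to the caller means writing something like the sketch below. It uses the API as it shipped in JDK 9, where the corresponding methods are `ProcessHandle.children()` and `ProcessHandle.descendants()`. It signals each child subtree before its parent; `TreeKiller` and `destroyTree` are hypothetical names, and the snapshot returned by `children()` is still exposed to the PID-reuse race discussed in the rest of this thread.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical caller-side tree kill.
final class TreeKiller {
    // Requests termination of every process in the tree rooted at root,
    // children before parents. root must not be the current JVM:
    // destroy() throws IllegalStateException for ProcessHandle.current().
    static int destroyTree(ProcessHandle root) {
        int requested = 0;
        // children() is a snapshot taken now; processes forked later are missed.
        List<ProcessHandle> kids = root.children().collect(Collectors.toList());
        for (ProcessHandle child : kids) {
            requested += destroyTree(child);
        }
        // destroy() returns true if termination was successfully requested.
        if (root.destroy()) {
            requested++;
        }
        return requested;
    }
}
```

Calling `TreeKiller.destroyTree(process.toHandle())` on a `Process` you started yourself limits the kill to that subtree, instead of everything `ProcessHandle.current().children()` reports, which can include processes forked by native libraries.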
>>
>>
>>  However, I am not sure that this can be implemented in a safe way, at
>> least on UNIX, because - as Martin already pointed out - of PID recycling.
>> I do not see how you can prevent allChildren() from returning PIDs which
>> may already have been reaped and recycled by the time you use them. How
>> do you prevent that?
>>
>> Unless there is an extended time between getting the children and
>> destroying them, the pids will still be valid.
>>
>
> Why? A child process may be reaped the instant you are done reading it
> from /proc, and its pid may be recycled by the OS right away, already
> pointing to another process when allChildren() returns. If a process lives
> about as long as it takes the system to wrap around to the same pid value,
> its pid could be recycled right after it is reaped, right? Sure, the longer
> you wait, the higher the chance of this happening, but it may happen right
> away.
>
> As Martin said, we have had those races in the kill() code for a long
> time, but children()/allChildren() could make those errors more probable,
> because now more processes are involved, especially if you use
> allChildren() to kill a deep process tree. And there is nothing in the
> javadoc warning the user about this scenario. You would just, from time to
> time, kill an unrelated process. Those problems are hard to debug.
>
>> The technique of caching the start time can prevent that case, though
>> AFAIK it has not been a problem.
>>
>
>  How would that work? User should, before issuing the kill, compare start
> time of process to kill with cached start time?
>
> See Peter's email; he described it more thoroughly than I have in previous
> emails.
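A minimal sketch of the start-time check, assuming the platform reports a start time through `ProcessHandle.Info.startInstant()` (it returns an empty `Optional` where it cannot). The class records a pid together with its start time and re-reads both before acting. `PidSnapshot`, `stillSameProcess` and `destroyIfSame` are hypothetical names, and a small window between the check and the signal remains, so this narrows the race rather than closing it.

```java
import java.time.Instant;
import java.util.Optional;

// Hypothetical pid + start-time pair used to detect pid reuse.
final class PidSnapshot {
    final long pid;
    final Instant start; // start time as reported when the snapshot was taken

    private PidSnapshot(long pid, Instant start) {
        this.pid = pid;
        this.start = start;
    }

    // Empty if the platform does not report a start time for ph.
    static Optional<PidSnapshot> of(ProcessHandle ph) {
        return ph.info().startInstant().map(t -> new PidSnapshot(ph.pid(), t));
    }

    // True only if the pid still names a live process with the same start time.
    boolean stillSameProcess() {
        Optional<ProcessHandle> now = ProcessHandle.of(pid);
        if (!now.isPresent()) {
            return false; // gone (or not visible to us)
        }
        Optional<Instant> t = now.get().info().startInstant();
        return t.isPresent() && t.get().equals(start);
    }

    // Destroys the process only if it still looks like the snapshotted one.
    // The pid could still be reused between the check and destroy().
    boolean destroyIfSame() {
        if (!stillSameProcess()) {
            return false;
        }
        return ProcessHandle.of(pid).map(ProcessHandle::destroy).orElse(false);
    }
}
```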
>
>> Note that even if your code is bulletproof, allChildren() will also
>> return PIDs of sub processes which are completely unrelated to you and
>> Process.java; they could have been forked by some third-party native code
>> which just happens to run in parallel in the same process. There, you have
>> no control over when they get reaped. Such a process might already have
>> been reaped by the time allChildren() returns, and its PID recycled for
>> another, unrelated process.
>>
>> Of course, the best case is for an application to spawn and manage its
>> own processes and handle their proper termination.
>> The use cases for children/allChildren are focused on
>> supervisory/executive functions
>> that monitor a running system and can cleanup even in the case of
>> unexpected failures.
>>
>> All management of processes is subject to OS limitations. If the PID
>> were from a completely different process tree, the ordinary destroy/info
>> functions would not be available unless the process was running as a
>> privileged OS user (same as any other native application).
>>
>
>  Could you explain this please? If both trees run under the same user,
> why should I not be able to kill a process from a different tree?
>
> I was considering the case of a different user; only the OS access
> controls apply, so if it was the same user, the processes could be
> controlled.
> The PH API does not provide more or less access than the OS.
>
> Thanks, Roger
>
>
>> If I am right, it would not be sufficient to state "There is no
>> guarantee that a process is alive." - it may be alive but by now be a
>> different process altogether. This makes "allChildren()" useless for many
>> cases, because the returned information may already be obsolete the moment
>> the function returns.
>>
>>  The caching of startTime can remove the ambiguity.
>>
>
>>
>> Of course, I may be missing something here?
>>
>>  But if I got all that right and the sole purpose of allChildren() is to
>> be able to kill them (or otherwise signal them), why not use process
>> groups? Process groups would be the traditional way on POSIX platforms to
>> handle process trees, and they are also available on Windows in the form of
>> Job Objects.
>>
>>  Using process groups to signal sub process trees would be safe, would
>> not rely on PID identity, and would be more efficient. It would also mean
>> far less code. Also, it would be an old, established pattern; process
>> groups have been around for a long time. And with process groups it is
>> possible to break away from a group, so a program below you which wants
>> to run as a daemon can do so by removing itself from the process group
>> and thus escaping your kill.
>>
>>  On Windows we have Job objects, and I think there are enough
>> similarities to POSIX process groups to abstract them into something
>> platform independent.
>>
>>  Earlier discussions of process termination and exit value reaping
>> considered
>> using process groups but it became evident that the Java runtime needed to
>> be very careful to not interfere with processes that might be spawned and
>> controlled by native libraries and that process groups would only increase
>> complexity and the interactions.
>>
>
>>  Thanks, Roger
>>
>>
>  Thanks! Thomas
>
>
>
>



More information about the core-libs-dev mailing list