RFR 10: 8184808 (process) isAlive should use pid for validity, not /proc/pid
Thomas Stüfe
thomas.stuefe at gmail.com
Wed Jul 19 08:20:54 UTC 2017
Hi Roger,
On Tue, Jul 18, 2017 at 9:01 PM, Roger Riggs <Roger.Riggs at oracle.com> wrote:
> Hi Thomas,
>
> Yes, if there is no access to the pid, then it can't report whether the
> process is alive, and assumes it is not.
> If there are access restrictions, they will also apply to the
> waitid/waitpid in the waitForProcessExit0
> logic, so the answer will at least be consistent (and avoid a
> possible race
> between /proc/pid/psinfo and kill state).
>
> Thanks, Roger
>
>
Okay, sounds reasonable. Interestingly, while reading up on the semantics
of kill(), I found:
http://pubs.opengroup.org/onlinepubs/009695399/functions/kill.html
"Existing implementations vary on the result of a kill() with pid
indicating an inactive process (a terminated process that has not been
waited for by its parent). Some indicate success on such a call (subject to
permission checking), while others give an error of [ESRCH]. Since the
definition of process lifetime in this volume of IEEE Std 1003.1-2001
covers inactive processes, the [ESRCH] error as described is inappropriate
in this case. In particular, this means that an application cannot have a
parent process check for termination of a particular child with kill().
(Usually this is done with the null signal; this can be done reliably with
waitpid().)"
So, kill() may return success for terminated but not yet reaped processes.
I did not know that.
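
To see what a given platform does here, something like the following could be used. It is only a minimal sketch, not code from the webrev: the struct and function names are made up for illustration. It forks a child that exits at once, waits for the exit with waitid() and WNOWAIT so that the child stays unreaped, and then sends the null signal before and after the real waitpid().

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* for waitid(), siginfo_t, WNOWAIT */
#endif
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical result type: outcome of kill(pid, 0) around the reap. */
struct zombie_probe {
    int before_reap;  /* 0 if kill(pid, 0) succeeded on the unreaped child, else errno */
    int after_reap;   /* 0 if kill(pid, 0) succeeded after waitpid(), else errno */
};

/* Returns 0 on success, -1 if fork(), waitid() or waitpid() failed. */
int probe_unreaped_child(struct zombie_probe *out)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(0);                      /* child: terminate immediately */

    /* Block until the child has exited, but leave it unreaped (WNOWAIT). */
    siginfo_t info;
    if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) != 0)
        return -1;

    /* Null signal against a terminated, not yet waited-for child. */
    out->before_reap = (kill(pid, 0) == 0) ? 0 : errno;

    /* Now reap the child for real. */
    if (waitpid(pid, NULL, 0) != pid)
        return -1;

    /* Null signal after the reap; barring pid reuse this should be ESRCH. */
    out->after_reap = (kill(pid, 0) == 0) ? 0 : errno;
    return 0;
}
```

Going by the text quoted above, before_reap may come back as either 0 or ESRCH depending on the implementation, so only the value after the reap can be relied upon.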
But this does not invalidate your change if all you want is to force one
consistent view, does it? At least I did not find any code relying on
isAlive returning false for not-yet-reaped processes.
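
On the EPERM point from earlier in the thread, the errno cases of the null signal can be told apart roughly as below. Again this is just a sketch under my own assumptions, not the JDK native code: the enum and function names are hypothetical, and the "conservative" helper only mirrors the policy discussed above, namely that no access to the pid means "assume not alive".

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>   /* getpid(), handy for callers probing themselves */

/* Hypothetical classification of a kill(pid, 0) probe. */
enum pid_state {
    PID_EXISTS,           /* kill() succeeded: process exists and is signalable */
    PID_NO_SUCH_PROCESS,  /* ESRCH */
    PID_NO_PERMISSION,    /* EPERM: process exists but we may not signal it */
    PID_ERROR             /* invalid argument or unexpected errno */
};

enum pid_state probe_pid(pid_t pid)
{
    /* pid <= 0 addresses process groups in kill(), so reject it here. */
    if (pid <= 0)
        return PID_ERROR;

    if (kill(pid, 0) == 0)
        return PID_EXISTS;

    switch (errno) {
    case ESRCH:
        return PID_NO_SUCH_PROCESS;
    case EPERM:
        return PID_NO_PERMISSION;
    default:
        return PID_ERROR;
    }
}

/* Conservative policy: only a successful probe counts as alive. */
int is_alive_conservative(pid_t pid)
{
    return probe_pid(pid) == PID_EXISTS;
}
```

With this policy a process that exists but cannot be signalled is reported as not alive, which is the case I was worried about. A caller that wants to treat EPERM as "alive" could check for PID_NO_PERMISSION instead.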
Thanks, Thomas
>
> On 7/18/2017 2:53 PM, Thomas Stüfe wrote:
>
> Hi Roger,
>
> I think this may fail if you have no permission to send a signal to that
> process. In that case, kill(2) may yield EPERM and isAlive may return false
> even though the process is alive.
>
> But then, I am not sure if that could happen in that particular scenario,
> plus it may also mean that you do not have access to /proc/pid either. So,
> I do not know how much of an issue this could be.
>
> Otherwise, the fix seems straightforward.
>
> Kind Regards, Thomas
>
> On Tue, Jul 18, 2017 at 8:46 PM, Roger Riggs <Roger.Riggs at oracle.com>
> wrote:
>
>> Please review a fix for an intermittent failure in the ProcessHandle
>> OnExitTest
>> that fails frequently on Solaris.
>>
>> ProcessHandle.isAlive is using /proc/pid/psinfo to determine if a process
>> is alive and its start time.
>> However, it appears that between the process exiting and the reaping
>> of its status, the
>> psinfo file indicates the process is alive but kill(pid, 0) reports that
>> it is not alive.
>> Depending on a race, ProcessHandle.onExit may determine the process
>> has exited
>> but later isAlive may report it is alive.
>>
>> To have a consistent view of the process being alive,
>> ProcessHandle.isAlive in its native implementation
>> should use kill(pid, 0) to determine definitively
>> whether the process is alive.
>>
>> The original issue[1] will be kept open until it is known that it is
>> resolved.
>>
>> Webrev:
>> http://cr.openjdk.java.net/~rriggs/webrev-alive-solaris-8184808/
>>
>> Issue:
>> https://bugs.openjdk.java.net/browse/JDK-8184808
>>
>> Thanks, Roger
>>
>> [1] https://bugs.openjdk.java.net/browse/JDK-8177932
>>
>>
>>
>
>