RFR 10: 8184808 (process) isAlive should use pid for validity, not /proc/pid
Roger Riggs
Roger.Riggs at Oracle.com
Thu Jul 20 14:25:17 UTC 2017
Hi Thomas,
Thanks for the investigation and links.
The variations across OSes in the status of exited but not yet reaped
(zombie) processes have been a problem for portable apps for quite a while.
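
To illustrate the kind of variation meant here, the following is a minimal
C sketch, not code from the JDK. It forks a child, waits for it to exit
without reaping it (waitid with WNOWAIT), and records what kill(pid, 0)
reports before and after waitpid() reaps it. The names probe_result and
probe_zombie are hypothetical, made up for this illustration.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical record of what kill(pid, 0) reports around the reap. */
struct probe_result {
    int before_reap;       /* kill(pid, 0) while the child is a zombie */
    int errno_before_reap; /* errno if before_reap == -1, else 0 */
    int after_reap;        /* kill(pid, 0) after waitpid() reaped the child */
    int errno_after_reap;  /* errno if after_reap == -1, else 0 */
};

static struct probe_result probe_zombie(void) {
    struct probe_result r = {0, 0, 0, 0};
    siginfo_t info;

    pid_t pid = fork();
    assert(pid >= 0);
    if (pid == 0) {
        _exit(0); /* child: exit immediately */
    }

    /* Block until the child has exited, but leave it unreaped (zombie). */
    int rc = waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT);
    assert(rc == 0);

    /* Platform-dependent: success on some systems, ESRCH on others. */
    r.before_reap = kill(pid, 0);
    r.errno_before_reap = (r.before_reap == -1) ? errno : 0;

    /* Reap the child; its pid is no longer valid afterwards. */
    pid_t reaped = waitpid(pid, NULL, 0);
    assert(reaped == pid);

    r.after_reap = kill(pid, 0);
    r.errno_after_reap = (r.after_reap == -1) ? errno : 0;
    return r;
}
```

The before_reap value is the platform-dependent part: some systems report
success for the zombie, others fail with ESRCH. After the reap, ESRCH is
expected unless the pid has already been reused.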
The description of waitpid focuses heavily on child processes; this
particular case deals with non-child processes, so I stayed with
kill(pid, 0) to determine liveness.
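
As a rough sketch of that check (an illustration only, not the code in the
webrev): kill(pid, 0) sends no signal and only performs the existence and
permission checks. The enum and function names below are hypothetical.
Mapping EPERM to "not alive" follows the behavior discussed in this thread,
where a pid that cannot be accessed is assumed not alive.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical tri-state result of probing a pid with the null signal. */
enum liveness { PROC_ALIVE, PROC_GONE, PROC_NO_ACCESS };

static enum liveness probe_pid(pid_t pid) {
    /* pid <= 0 would address process groups, which is not wanted here. */
    assert(pid > 0);
    if (kill(pid, 0) == 0) {
        return PROC_ALIVE;      /* process exists and may be signalled */
    }
    /* EPERM: the process exists but we may not signal it.
       Anything else (ESRCH): no such process. */
    return (errno == EPERM) ? PROC_NO_ACCESS : PROC_GONE;
}

/* Assumption taken from the thread: no access is reported as not alive. */
static int is_alive(pid_t pid) {
    return probe_pid(pid) == PROC_ALIVE;
}
```

Keeping the EPERM case separate in probe_pid makes the trade-off visible:
is_alive can return 0 for a live process that the caller is not permitted
to signal.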
Thanks, Roger
On 7/19/2017 4:20 AM, Thomas Stüfe wrote:
> Hi Roger,
>
> On Tue, Jul 18, 2017 at 9:01 PM, Roger Riggs <Roger.Riggs at oracle.com> wrote:
>
> Hi Thomas,
>
> Yes, if there is no access to the pid, then it can't report whether
> the process is alive or not, and assumes not.
> If there are access restrictions, they will also apply to the
> waitid/waitpid in the waitForProcessExit0 logic, so the answer will
> at least be consistent (and avoid a possible race between
> /proc/pid/psinfo and kill state).
>
> Thanks, Roger
>
>
> Okay, sounds reasonable. Interestingly, while reading up on the
> semantics of kill(), I found:
>
> http://pubs.opengroup.org/onlinepubs/009695399/functions/kill.html
>
> "Existing implementations vary on the result of a kill() with pid
> indicating an inactive process (a terminated process that has not been
> waited for by its parent). Some indicate success on such a call
> (subject to permission checking), while others give an error of
> [ESRCH]. Since the definition of process lifetime in this volume of
> IEEE Std 1003.1-2001 covers inactive processes, the [ESRCH] error as
> described is inappropriate in this case. In particular, this means
> that an application cannot have a parent process check for termination
> of a particular child with kill(). (Usually this is done with the null
> signal; this can be done reliably with waitpid().)"
>
> So, kill() may return success for terminated but not yet reaped
> processes. I did not know that.
>
> But this does not invalidate your change, does it, if all you want to
> do is to force one consistent view. At least I did not find any code
> relying on isAlive returning false for not-yet-reaped processes.
>
> Thanks, Thomas
>
>
> On 7/18/2017 2:53 PM, Thomas Stüfe wrote:
>> Hi Roger,
>>
>> I think this may fail if you have no permission to send a signal
>> to that process. In that case, kill(2) may yield EPERM and
>> isAlive may return false even though the process is alive.
>>
>> But then, I am not sure if that could happen in that particular
>> scenario, plus it may also mean that you do not have access to
>> /proc/pid either. So, I do not know how much of an issue this
>> could be.
>>
>> Otherwise, the fix seems straightforward.
>>
>> Kind Regards, Thomas
>>
>> On Tue, Jul 18, 2017 at 8:46 PM, Roger Riggs
>> <Roger.Riggs at oracle.com> wrote:
>>
>> Please review a fix for an intermittent failure in the
>> ProcessHandle OnExitTest
>> that fails frequently on Solaris.
>>
>> ProcessHandle.isAlive is using /proc/pid/psinfo to determine
>> if a process is alive and its start time.
>> However, it appears that between the process exiting and
>> the reaping of its status, the psinfo file indicates the
>> process is alive but kill(pid, 0) reports that it is not alive.
>> Depending on a race, ProcessHandle.onExit may determine that
>> the process has exited but isAlive may later report it is alive.
>>
>> To have a consistent view of the process being alive,
>> ProcessHandle.isAlive in its native implementation
>> should use kill(pid, 0) to definitively determine if the
>> process is alive.
>>
>> The original issue[1] will be kept open until it is known
>> that it is resolved.
>>
>> Webrev:
>> http://cr.openjdk.java.net/~rriggs/webrev-alive-solaris-8184808/
>>
>> Issue:
>> https://bugs.openjdk.java.net/browse/JDK-8184808
>>
>> Thanks, Roger
>>
>> [1] https://bugs.openjdk.java.net/browse/JDK-8177932
>>
>>
>>
>
>