RFR 10: 8184808 (process) isAlive should use pid for validity, not /proc/pid
Thomas Stüfe
thomas.stuefe at gmail.com
Fri Jul 21 05:07:06 UTC 2017
Hi Roger,
On Thu, Jul 20, 2017 at 4:25 PM, Roger Riggs <Roger.Riggs at oracle.com> wrote:
> Hi Thomas,
>
> Thanks for the investigation and links.
> The variations across OSes in the status of exited vs. reaped (zombie)
> processes have been a
> problem for quite a while (for portable apps).
>
> The description of waitpid is focused heavily on child processes; this
> particular case
> is dealing with non-child processes so I stayed with using kill(pid,0) to
> determine liveness.
>
> Thanks, Roger
>
>
That makes sense. Thanks for clarifying.
..Thomas
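
For readers following along, here is a minimal sketch of the kill(pid, 0)
liveness probe discussed in this thread. It is not the code from the webrev:
the names pid_liveness and pid_liveness_t are hypothetical, and the sketch
reports EPERM as a separate result so the caller can decide how to treat it.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>

/* Hypothetical result type for this sketch; not taken from the JDK sources. */
typedef enum {
    PID_ALIVE,         /* kill(pid, 0) succeeded: the pid names a process
                          (on some systems this may include zombies) */
    PID_GONE,          /* ESRCH: no process with this pid */
    PID_NO_PERMISSION, /* EPERM: a process exists but we may not signal it */
    PID_UNKNOWN        /* invalid argument or unexpected error */
} pid_liveness_t;

/* Probe a single process with the null signal; no signal is delivered. */
static pid_liveness_t pid_liveness(pid_t pid) {
    if (pid <= 0) {
        /* 0 and negative values address process groups, not one process. */
        return PID_UNKNOWN;
    }
    if (kill(pid, 0) == 0) {
        return PID_ALIVE;
    }
    switch (errno) {
    case ESRCH: return PID_GONE;
    case EPERM: return PID_NO_PERMISSION;
    default:    return PID_UNKNOWN;
    }
}
```

A caller that follows the "no access, assume not alive" approach described
below would treat every result other than PID_ALIVE as "not alive".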
> On 7/19/2017 4:20 AM, Thomas Stüfe wrote:
>
> Hi Roger,
>
> On Tue, Jul 18, 2017 at 9:01 PM, Roger Riggs <Roger.Riggs at oracle.com>
> wrote:
>
>> Hi Thomas,
>>
>> Yes, if there is no access to the pid, then it can't report alive or not,
>> so it assumes not.
>> If there are access restrictions, they will apply to the waitid/waitpid in the
>> waitForProcessExit0
>> logic also and the answer will be at least consistent (and avoid a
>> possible race
>> between /proc/pid/psinfo and kill state).
>>
>> Thanks, Roger
>>
>>
> Okay, sounds reasonable. Interestingly, while reading up on the semantics
> of kill(), I found:
>
> http://pubs.opengroup.org/onlinepubs/009695399/functions/kill.html
>
> "Existing implementations vary on the result of a kill() with pid
> indicating an inactive process (a terminated process that has not been
> waited for by its parent). Some indicate success on such a call (subject to
> permission checking), while others give an error of [ESRCH]. Since the
> definition of process lifetime in this volume of IEEE Std 1003.1-2001
> covers inactive processes, the [ESRCH] error as described is inappropriate
> in this case. In particular, this means that an application cannot have a
> parent process check for termination of a particular child with kill().
> (Usually this is done with the null signal; this can be done reliably with
> waitpid().)"
>
> So, kill() may return success for terminated but not yet reaped processes.
> I did not know that.
>
> But this does not invalidate your change, does it, if all you want to do
> is to force one consistent view. At least I did not find any code relying
> on isAlive returning false for not-yet-reaped processes.
>
> Thanks, Thomas
>
>
>
>>
>> On 7/18/2017 2:53 PM, Thomas Stüfe wrote:
>>
>> Hi Roger,
>>
>> I think this may fail if you have no permission to send a signal to that
>> process. In that case, kill(2) may yield EPERM and isAlive may return false
>> even though the process is alive.
>>
>> But then, I am not sure if that could happen in that particular scenario,
>> plus it may also mean that you do not have access to /proc/pid either. So,
>> I do not know how much of an issue this could be.
>>
>> Otherwise, the fix seems straightforward.
>>
>> Kind Regards, Thomas
>>
>> On Tue, Jul 18, 2017 at 8:46 PM, Roger Riggs <Roger.Riggs at oracle.com>
>> wrote:
>>
>>> Please review a fix for an intermittent failure in the ProcessHandle
>>> OnExitTest
>>> that fails frequently on Solaris.
>>>
>>> ProcessHandle.isAlive is using /proc/pid/psinfo to determine if a
>>> process is alive and its start time.
>>> However, it appears that between the process exiting and the reaping
>>> of its status, the
>>> psinfo file indicates the process is alive but kill(pid, 0) reports that
>>> it is not alive.
>>> Depending on a race, ProcessHandle.onExit may determine the process
>>> has exited
>>> but later isAlive may report it is alive.
>>>
>>> To have a consistent view of the process being alive,
>>> ProcessHandle.isAlive in its native implementation
>>> should use kill(pid, 0) to definitively determine if the process is
>>> alive.
>>>
>>> The original issue[1] will be kept open until it is known that it is
>>> resolved.
>>>
>>> Webrev:
>>> http://cr.openjdk.java.net/~rriggs/webrev-alive-solaris-8184808/
>>>
>>> Issue:
>>> https://bugs.openjdk.java.net/browse/JDK-8184808
>>>
>>> Thanks, Roger
>>>
>>> [1] https://bugs.openjdk.java.net/browse/JDK-8177932
>>>
>>>
>>>
>>
>>
>
>
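
The POSIX rationale quoted in the thread notes that a parent can check
reliably for termination of a child with waitpid() rather than kill(). A
minimal sketch of that check, assuming the caller is the parent of the
process, might look like this. The helper name child_still_running is
hypothetical and is not part of the JDK change.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>

/*
 * Non-blocking check of a direct child.
 * Returns  1 if the child has not terminated yet,
 *          0 if it has terminated (its status is stored in *status_out),
 *         -1 on error, e.g. ECHILD when pid is not a child of the caller.
 * Note: a return of 0 means the child was reaped by this call, so its
 * status cannot be collected a second time.
 */
static int child_still_running(pid_t child, int *status_out) {
    int status = 0;
    pid_t r;

    do {
        r = waitpid(child, &status, WNOHANG);
    } while (r == -1 && errno == EINTR); /* retry if interrupted */

    if (r == 0) {
        return 1; /* child exists and has not changed state */
    }
    if (r == child) {
        if (status_out != NULL) {
            *status_out = status;
        }
        return 0;
    }
    return -1;
}
```

This only works for child processes, which is why the non-child case in this
thread falls back to kill(pid, 0).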