RFR 10: 8184808 (process) isAlive should use pid for validity, not /proc/pid

Roger Riggs Roger.Riggs at Oracle.com
Thu Jul 20 14:25:17 UTC 2017


Hi Thomas,

Thanks for the investigation and links.
The variations across OSes in how exited but not yet reaped (zombie) 
processes are reported have been a
problem for quite a while (for portable apps).

The description of waitpid is focused heavily on child processes; this 
particular case
is dealing with non-child processes, so I stayed with using kill(pid, 0) 
to determine liveness.
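
As a minimal sketch of that approach (not the code in the webrev; probe_pid
and is_alive are hypothetical names used only for illustration), a
null-signal probe in C could look like the following. It assumes the policy
discussed in the quoted messages below, where a pid that cannot be signalled
is reported as not alive, and it rejects non-positive pids because kill()
gives those a process-group meaning.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical names for illustration; this is not the JDK's native code. */
typedef enum {
    PROBE_ALIVE,     /* kill(pid, 0) succeeded */
    PROBE_GONE,      /* no such process (ESRCH) or pid not a single process */
    PROBE_NO_ACCESS  /* EPERM: a process exists but we may not signal it */
} probe_result;

/* Probe one process with the null signal; no signal is actually delivered. */
static probe_result probe_pid(pid_t pid)
{
    if (pid <= 0) {
        /* 0 and negative values address process groups, not one process. */
        return PROBE_GONE;
    }
    if (kill(pid, 0) == 0) {
        return PROBE_ALIVE;
    }
    return (errno == EPERM) ? PROBE_NO_ACCESS : PROBE_GONE;
}

/* Assumed policy: anything other than a successful probe is "not alive". */
static int is_alive(pid_t pid)
{
    return probe_pid(pid) == PROBE_ALIVE;
}
```

Keeping the EPERM case as a separate result makes the trade-off visible: a
caller that prefers to treat "exists but not accessible" as alive can do so
without changing the probe itself.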

Thanks, Roger


On 7/19/2017 4:20 AM, Thomas Stüfe wrote:
> Hi Roger,
>
> On Tue, Jul 18, 2017 at 9:01 PM, Roger Riggs <Roger.Riggs at oracle.com> wrote:
>
>     Hi Thomas,
>
>     Yes, if there is no access to the pid, then it can't report alive
>     or not, so it assumes not.
>     If there are access restrictions, they will apply to the waitid/waitpid
>     in the waitForProcessExit0
>     logic also, and the answer will be at least consistent (and avoid a
>     possible race
>     between /proc/pid/psinfo and kill state).
>
>     Thanks, Roger
>
>
> Okay, sounds reasonable. Interestingly, while reading up on the 
> semantics of kill(), I found:
>
> http://pubs.opengroup.org/onlinepubs/009695399/functions/kill.html
>
> "Existing implementations vary on the result of a kill() with pid 
> indicating an inactive process (a terminated process that has not been 
> waited for by its parent). Some indicate success on such a call 
> (subject to permission checking), while others give an error of 
> [ESRCH]. Since the definition of process lifetime in this volume of 
> IEEE Std 1003.1-2001 covers inactive processes, the [ESRCH] error as 
> described is inappropriate in this case. In particular, this means 
> that an application cannot have a parent process check for termination 
> of a particular child with kill(). (Usually this is done with the null 
> signal; this can be done reliably with waitpid().)"
>
> So, kill() may return success for terminated but not yet reaped 
> processes. I did not know that.
>
> But this does not invalidate your change, does it, if all you want to 
> do is to force one consistent view. At least I did not find any code 
> relying on isAlive returning false for not-yet-reaped processes.
>
> Thanks, Thomas
>
>
>     On 7/18/2017 2:53 PM, Thomas Stüfe wrote:
>>     Hi Roger,
>>
>>     I think this may fail if you have no permission to send a signal
>>     to that process. In that case, kill(2) may yield EPERM and
>>     isAlive may return false even though the process is alive.
>>
>>     But then, I am not sure if that could happen in that particular
>>     scenario, plus it may also mean that you do not have access to
>>     /proc/pid either. So, I do not know how much of an issue this
>>     could be.
>>
>>     Otherwise, the fix seems straightforward.
>>
>>     Kind Regards, Thomas
>>
>>     On Tue, Jul 18, 2017 at 8:46 PM, Roger Riggs
>>     <Roger.Riggs at oracle.com> wrote:
>>
>>         Please review a fix for an intermittent failure in the
>>         ProcessHandle OnExitTest
>>         that fails frequently on Solaris.
>>
>>         ProcessHandle.isAlive is using /proc/pid/psinfo to determine
>>         if a process is alive and its start time.
>>         However, it appears that between the process exiting and
>>         the reaping of its status, the
>>         psinfo file indicates the process is alive but kill(pid, 0)
>>         reports that it is not alive.
>>         Depending on a race, ProcessHandle.onExit may determine
>>         the process has exited
>>         but later isAlive may report it is alive.
>>
>>         To have a consistent view of the process being alive,
>>         ProcessHandle.isAlive in its native implementation
>>         should use kill(pid, 0) to definitively determine if the
>>         process is alive.
>>
>>         The original issue[1] will be kept open until it is known
>>         that it is resolved.
>>
>>         Webrev:
>>         http://cr.openjdk.java.net/~rriggs/webrev-alive-solaris-8184808/
>>
>>         Issue:
>>         https://bugs.openjdk.java.net/browse/JDK-8184808
>>
>>         Thanks, Roger
>>
>>         [1] https://bugs.openjdk.java.net/browse/JDK-8177932
>>
>>
>>
>
>


