RFR 9: 8078099 : (process) ProcessHandle should uniquely identify processes

Tue Jul 7 11:31:22 UTC 2015

Hi Roger,

On 07/06/2015 04:47 PM, Roger Riggs wrote:
> Hi Peter,
>
> Thanks for reviewing the new implementation.
>
> The idea to pre-allocate the buffer based on the size of the output 
> arrays passed for output
> will not work as desired.
> The same native code entry point is used whether it is a query for all 
> processes or
> just the immediate children.  In either case, the buffer for sysctl 
> needs to be
> large enough to accommodate all of the OS threads; its not a function 
> of the
> output array size.

Ah, I see now. You optionally filter the entries returned by sysctl 
based on passed-in parent pid.

>
> With respect to the buffer allocation on OS X, I might see the point 
> of a retry
> when sysctl returns ENOMEM;
>
>
> I don't think its particularly worthwhile to add the native code to 
> retry in the case
> of ENOMEM. The OSX doc for sysctrl does indicate it tries to round up
> to avoid a subsequent failure.
> The ProcessHandle API already cautions that the list of processes is 
> dynamic
> and that processes may started or terminate concurrently with the 
> call  to children/allChildren.
> So if the sysctl returns ENOMEM indicating not all the processes fit; 
> the ones
> that do fit within in the allocated buffer are valid at that point in 
> time.
> If there are extras that do not fit in the buffer they are likely to 
> be 'newer'
> processes and can be considered to have been started 'after' the 
> children/allChildren invocation.

If that's the case, then it might be OK to just silently ignore it. But 
OTOH it would be surprising if some already long running pid was skipped 
because of that.

What about designing the getProcessPids0 API to always return all 
processes and do the filtering in Java? At least current implementations 
wouldn't become less optimal because of that (you always retrieve all 
processes from the OS and just return the filtered list to Java).

Some alternatives (I'm sure you have already considered and rejected 
because of added complexity):

An alternative could be designing the API around returning a single 
long[] that you allocate in native code - with (pid, ppid, stime) placed 
into array as consecutive triplets: [pid1, ppid1, stime1, pid2, ppid2, 
stime2, ...], but you would have to deal with allocation and 
re-allocation of the resulting array in native code which might get 
complicated.

Another alternative: using a private class:

static class ProcEntry {
     long pid, ppid, stime;
     ProcEntry next;
}

...and building a linked-list of ProcEntries in native code...

> So I propose to change the error checking to consider ENOMEM as a 
> non-exceptional
> return and return the processes available.

It should be non-exceptional in any case. I just wonder if it is OK that 
this "lack of full-information" is not communicated to the method caller 
and acted upon.

Regards, Peter

>
> Thanks, Roger
>
>
>
> On 7/2/2015 3:32 AM, Peter Levart wrote:
>> Hi Roger,
>>
>> I looked at the code briefly and have the following comments:
>>
>> For ProcessHandleImpl_macosx.c, in method getProcessPids0:
>>
>>  224     // Get buffer size needed to read all processes
>>  225     int mib[4] = {CTL_KERN, KERN_PROC, KERN_PROC_ALL, 0};
>>  226     if (sysctl(mib, 4, NULL, &bufSize, NULL, 0) < 0) {
>>  227         JNU_ThrowByNameWithLastError(env,
>>  228             "java/lang/RuntimeException", "sysctl failed");
>>  229         return -1;
>>  230     }
>>  231
>>  232     // Allocate buffer big enough for all processes
>>  233     void *buffer = malloc(bufSize);
>>  234     if (buffer == NULL) {
>>  235         JNU_ThrowOutOfMemoryError(env, "malloc failed");
>>  236         return -1;
>>  237     }
>>  238
>>  239     // Read process info for all processes
>>  240     if (sysctl(mib, 4, buffer, &bufSize, NULL, 0) < 0) {
>>  241         JNU_ThrowByNameWithLastError(env,
>>  242             "java/lang/RuntimeException", "sysctl failed");
>>  243         free(buffer);
>>  244         return -1;
>>  245     }
>>
>> ... the 1st call to sysctl is used to measure the size of the needed 
>> buffer and the 2nd call fills-in the buffer. The documentation for 
>> sysctl says:
>>
>> "     The information is copied into the buffer specified by oldp. 
>> The size of the buffer is given by the
>>      location specified by oldlenp before the call, and that location 
>> gives the amount of data copied after
>>      a successful call and after a call that returns with the error 
>> code ENOMEM.  If the amount of data
>>      available is greater than the size of the buffer supplied, the 
>> call supplies as much data as fits in
>>      the buffer provided and returns with the error code ENOMEM.  If 
>> the old value is not desired, oldp and
>>      oldlenp should be set to NULL.
>>
>>      The size of the available data can be determined by calling 
>> sysctl() with the NULL argument for oldp.
>>      The size of the available data will be returned in the location 
>> pointed to by oldlenp.  For some opera-tions, operations,
>>      tions, the amount of space may change often.  For these 
>> operations, the system attempts to round up so
>>      that the returned size is large enough for a call to return the 
>> data shortly thereafter."
>>
>> So while not very probable, it can happen that you get ENOMEM from 
>> 2nd call because of underestimated buffer size. Would it be better to 
>> retry (re)allocation of buffer and 2nd call in a loop with new 
>> estimation returned from previous call while the error is ENOMEM?
>>
>> Another suggestion: What would it be if the buffer size estimation 
>> was not computed by a call to sysctl with NULL buffer, but was taken 
>> from the passed-in resulting array size(s). In case the user 
>> passes-in arrays of sufficient size, you can avoid double invocation 
>> of sysctl. Also, if ENOMEM happens, you can just return the result 
>> obtained and the new estimation - no looping in native code. The 
>> UNIX, Solaris and Windows variants look good.
>>
>> Regards, Peter
>>
>> On 06/22/2015 05:10 PM, Roger Riggs wrote:
>>> Please review changes to ProcessHandle implementation to uniquely 
>>> identify
>>> processes based on the start time provided by the OS. It addresses 
>>> the issue
>>> of PID reuse.
>>>
>>> This is the implementation of the ProcessHandle.equals() spec change in
>>>     8129344 : (process) ProcessHandle instances should define equals 
>>> and be value-based
>>>
>>> Webrev:
>>> http://cr.openjdk.java.net/~rriggs/webrev-starttime-8078099/
>>>
>>> Issue:
>>> https://bugs.openjdk.java.net/browse/JDK-8078099
>>>
>>> Thanks, Roger
>>>
>>
>