RFR 8220684 : Process.waitFor(long, TimeUnit) can return false for a process that exited within the timeout

Ivan Gerasimov ivan.gerasimov at oracle.com
Fri Mar 15 08:08:38 UTC 2019


Hi David!

The code could be written like following:

long start = System.nanoTime();
do {
     ...
     long elapsed = System.nanoTime() - start;
     remainingNanos = timeoutNanos - elapsed;
} while (remainingNanos > 0);

but then one realizes that this always calculates (start + 
timeoutNanos), both of which are effectively final.
So, the sum can be calculated in advance, and named, say, "deadline" :)

With kind regards,
Ivan

On 3/15/19 12:16 AM, David Holmes wrote:
> On 15/03/2019 5:03 pm, Ivan Gerasimov wrote:
>> Thank you David!
>>
>>
>> On 3/14/19 11:01 PM, David Holmes wrote:
>>> Hi Ivan,
>>>
>>> On 15/03/2019 12:02 pm, Ivan Gerasimov wrote:
>>>> Hi David!
>>>>
>>>>
>>>> On 3/14/19 5:28 PM, David Holmes wrote:
>>>>> Hi Ivan,
>>>>>
>>>>> On 15/03/2019 5:49 am, Ivan Gerasimov wrote:
>>>>>> Hello!
>>>>>>
>>>>>> The default implementation of Process.waitFor(long, TimeUnit) 
>>>>>> does not check if the process has exited after the last portion 
>>>>>> of the timeout has expired.
>>>>>
>>>>> Please clarify. There is always a race between detecting a timeout 
>>>>> and the process actually terminating. If the process actually 
>>>>> terminates while you're deciding to report a timeout that seems 
>>>>> just  an acceptable possibility. No matter what you do the process 
>>>>> could terminate just after you decided to report the timeout.
>>>>>
>>>>>
>>>> Current code for waitFor(...) is
>>>>
>>>>   212         do {
>>>>   213             try {
>>>>   214                 exitValue();
>>>>   215                 return true;
>>>>   216             } catch(IllegalThreadStateException ex) {
>>>>   217                 if (rem > 0)
>>>>   218                     Thread.sleep(
>>>>   219 Math.min(TimeUnit.NANOSECONDS.toMillis(rem) + 1, 100));
>>>>   220             }
>>>>   221             rem = unit.toNanos(timeout) - (System.nanoTime() 
>>>> - startTime);
>>>>   222         } while (rem > 0);
>>>>   223         return false;
>>>>
>>>> So, if the loop has just processed the last 100 ms portion of the 
>>>> timeout, it ends right away, without checking if the process has 
>>>> exited during this period.
>>>
>>> Ah I see. Yes there should be a final check after what will be the 
>>> last sleep.
>>>
>>>> Not a big deal of course, but it is more accurate to check if the 
>>>> process has exited at the *end* of the specified timeout, and not 
>>>> 100 milliseconds before the end.
>>>
>>> Agreed.
>>>
>>> I share Pavel's concern about the implicit overflow in calculating a 
>>> deadline, rather than comparing elapsed time with the timeout. There 
>>> has been some effort in core-libs to remove assumptions about how 
>>> overflows are handled.
>>>
>> We only check sign of the remainingNanos, which is initially strictly 
>> positive timeout.  Over time it gets decreased by elapsed time.
>> The elapsed time is the difference between System.nanoTime() measured 
>> at line 217 and 213, so it is always non-negative (well, strictly 
>> positive, as there's sleep in between).
>>
>> I don't really see a possibility of overflow for remainingNanos here.
>>
>> The deadline may overflow, and it is totally fine, as we don't care 
>> about its sign.
>
> It's deadline that I was flagging.
>
>> Am I missing something?
>
> Just a code cleanliness issue. I had thought, but now can't find, 
> these kinds of overflowing calculations were being cleaned up in 
> core-libs. But perhaps I'm just mis-remembering.
>
> Cheers,
> David
>
>>
>> With kind regards,
>> Ivan
>>
>>> Thanks,
>>> David
>>> -----
>>>
>>>> With kind regards,
>>>> Ivan
>>>>
>>>>>
>>>>> David
>>>>> -----
>>>>>
>>>>>> JDK has two implementations of Process (for Unix and Windows) and 
>>>>>> they both override waitFor(), so it's not an issue for them.
>>>>>>
>>>>>> Still, it is better to provide a more accurate default 
>>>>>> implementation.
>>>>>>
>>>>>> I'm not quite certain the regression test needs to be included in 
>>>>>> the fix.  The test does demonstrate the issue with the unfixed 
>>>>>> JDK and passed Okay on all tested platforms in Mach5. Yet, I 
>>>>>> suspect the test can still show false negative results, as there 
>>>>>> are no guaranties that even such simple application as `true` 
>>>>>> will finish in 100 ms.
>>>>>> I can tag the test as @ignored with a comment, or simply remove 
>>>>>> it from the fix.
>>>>>>
>>>>>> BUGURL: https://bugs.openjdk.java.net/browse/JDK-8220684
>>>>>> WEBREV: http://cr.openjdk.java.net/~igerasim/8220684/00/webrev/
>>>>>>
>>>>>> Thanks in advance for reviewing!
>>>>>>
>>>>>
>>>>
>>>
>>
>

-- 
With kind regards,
Ivan Gerasimov



More information about the core-libs-dev mailing list