RFR 8220684 : Process.waitFor(long, TimeUnit) can return false for a process that exited within the timeout
Ivan Gerasimov
ivan.gerasimov at oracle.com
Fri Mar 15 08:08:38 UTC 2019
Hi David!
The code could be written like following:
long start = System.nanoTime();
do {
...
long elapsed = System.nanoTime() - start;
remainingNanos = timeoutNanos - elapsed;
} while (remainingNanos > 0);
but then one realizes that this always calculates (start +
timeoutNanos), both of which are effectively final.
So, the sum can be calculated in advance, and named, say, "deadline" :)
With kind regards,
Ivan
On 3/15/19 12:16 AM, David Holmes wrote:
> On 15/03/2019 5:03 pm, Ivan Gerasimov wrote:
>> Thank you David!
>>
>>
>> On 3/14/19 11:01 PM, David Holmes wrote:
>>> Hi Ivan,
>>>
>>> On 15/03/2019 12:02 pm, Ivan Gerasimov wrote:
>>>> Hi David!
>>>>
>>>>
>>>> On 3/14/19 5:28 PM, David Holmes wrote:
>>>>> Hi Ivan,
>>>>>
>>>>> On 15/03/2019 5:49 am, Ivan Gerasimov wrote:
>>>>>> Hello!
>>>>>>
>>>>>> The default implementation of Process.waitFor(long, TimeUnit)
>>>>>> does not check if the process has exited after the last portion
>>>>>> of the timeout has expired.
>>>>>
>>>>> Please clarify. There is always a race between detecting a timeout
>>>>> and the process actually terminating. If the process actually
>>>>> terminates while you're deciding to report a timeout that seems
>>>>> just an acceptable possibility. No matter what you do the process
>>>>> could terminate just after you decided to report the timeout.
>>>>>
>>>>>
>>>> Current code for waitFor(...) is
>>>>
>>>> 212 do {
>>>> 213 try {
>>>> 214 exitValue();
>>>> 215 return true;
>>>> 216 } catch(IllegalThreadStateException ex) {
>>>> 217 if (rem > 0)
>>>> 218 Thread.sleep(
>>>> 219 Math.min(TimeUnit.NANOSECONDS.toMillis(rem) + 1, 100));
>>>> 220 }
>>>> 221 rem = unit.toNanos(timeout) - (System.nanoTime()
>>>> - startTime);
>>>> 222 } while (rem > 0);
>>>> 223 return false;
>>>>
>>>> So, if the loop has just processed the last 100 ms portion of the
>>>> timeout, it ends right away, without checking if the process has
>>>> exited during this period.
>>>
>>> Ah I see. Yes there should be a final check after what will be the
>>> last sleep.
>>>
>>>> Not a big deal of course, but it is more accurate to check if the
>>>> process has exited at the *end* of the specified timeout, and not
>>>> 100 milliseconds before the end.
>>>
>>> Agreed.
>>>
>>> I share Pavel's concern about the implicit overflow in calculating a
>>> deadline, rather than comparing elapsed time with the timeout. There
>>> has been some effort in core-libs to remove assumptions about how
>>> overflows are handled.
>>>
>> We only check sign of the remainingNanos, which is initially strictly
>> positive timeout. Over time it gets decreased by elapsed time.
>> The elapsed time is the difference between System.nanoTime() measured
>> at line 217 and 213, so it is always non-negative (well, strictly
>> positive, as there's sleep in between).
>>
>> I don't really see a possibility of overflow for remainingNanos here.
>>
>> The deadline may overflow, and it is totally fine, as we don't care
>> about its sign.
>
> It's deadline that I was flagging.
>
>> Am I missing something?
>
> Just a code cleanliness issue. I had thought, but now can't find,
> these kinds of overflowing calculations were being cleaned up in
> core-libs. But perhaps I'm just mis-remembering.
>
> Cheers,
> David
>
>>
>> With kind regards,
>> Ivan
>>
>>> Thanks,
>>> David
>>> -----
>>>
>>>> With kind regards,
>>>> Ivan
>>>>
>>>>>
>>>>> David
>>>>> -----
>>>>>
>>>>>> JDK has two implementations of Process (for Unix and Windows) and
>>>>>> they both override waitFor(), so it's not an issue for them.
>>>>>>
>>>>>> Still, it is better to provide a more accurate default
>>>>>> implementation.
>>>>>>
>>>>>> I'm not quite certain the regression test needs to be included in
>>>>>> the fix. The test does demonstrate the issue with the unfixed
>>>>>> JDK and passed Okay on all tested platforms in Mach5. Yet, I
>>>>>> suspect the test can still show false negative results, as there
>>>>>> are no guaranties that even such simple application as `true`
>>>>>> will finish in 100 ms.
>>>>>> I can tag the test as @ignored with a comment, or simply remove
>>>>>> it from the fix.
>>>>>>
>>>>>> BUGURL: https://bugs.openjdk.java.net/browse/JDK-8220684
>>>>>> WEBREV: http://cr.openjdk.java.net/~igerasim/8220684/00/webrev/
>>>>>>
>>>>>> Thanks in advance for reviewing!
>>>>>>
>>>>>
>>>>
>>>
>>
>
--
With kind regards,
Ivan Gerasimov
More information about the core-libs-dev
mailing list