RFR 8220684 : Process.waitFor(long, TimeUnit) can return false for a process that exited within the timeout

Pavel Rappo pavel.rappo at oracle.com
Fri Mar 15 12:30:09 UTC 2019


Thanks for the clarifications. The change looks fine now.

I felt initial unease because, in my experience, it's so easy to overlook some
subtlety when working with overflowing integers.

For `System.currentTimeMillis` the code like:

    stopTime - startTime >= timeout

doesn't raise one's eyebrows. For one, both `stopTime` and `startTime` are
non-negative and stopTime >= startTime. And that will be the case for the next
300 million years [1] of which we have lived the first 50 years or so.

For the code in your fix, however, not only can `deadline` be negative, but the
value returned by `System.nanoTime` too. What's even worse is that `System.nanoTime`
can change the sign between invocations [2].

For the problem like that, I usually draw a circle with wrapping points
Long.MAX_VALUE and Long.MIN_VALUE on the paper and satisfy myself by exploring
some number of examples that exhaust all (sign) possibilities. There's a
significant cognitive load to perform this analysis each and every time. And
there is always a possibility of making a mistake [3]. Happy for the people who
can do it seamlessly and on the fly. As for me, I wish there were a method that
hid all these subtleties, something like

    long remainingTimeout(long firstMeasurement,
                          long secondMeasurement,
                          long timeout)

Or maybe it's just the documentation for `System.nanoTime` and `System.currentTimeMillis`
that could be updated so as to explain the right way in details.
`System.nanoTime` is already doing a good job there.

I've searched through the JDK and found out that kind of timeout calculations
are quite abundant. In particular, the one used in your code seems to be an
idiom. My guess, it might be true for other projects.

-Pavel

--------------------------------------------------------------------------------
[1] (2^63) / (10^3 * 24 * 60 * 60 * 365)
[2] It's unknown what would the initial (at JVM start) value be. It might be
    negative. So that's why `System.nanoTime() >= startTime + timeoutNanos` might
    fail even if the right hand side doesn't overflow.
[3] Heck, you can even break your neck trying to find a middle of the index range:
    [a, b]. While `(a + b) / 2` overflows, `a + (b - a)/2` (or `(a + b) >>> 1`)
    does not.

> On 14 Mar 2019, at 21:32, Ivan Gerasimov <ivan.gerasimov at oracle.com> wrote:
> 
> Thank you Pavel!
> 
> 
> On 3/14/19 2:02 PM, Pavel Rappo wrote:
>> I have to look at this patch in more detail, however here's what jumped out at
>> me straight away:
>> 
>>     long deadline = System.nanoTime() + remainingNanos;
>> 
>> It seems like a possibility for an overflow.
>> 
> Not quite.  The deadline can surely become negative, which is alright.  Later we only check the difference (deadline - System.nanoTime()).
> 
> Actually, the new code mimics what we have in PocessImpl for Unix and Windows.
> 
> With kind regards,
> Ivan
> 
>>  Documentation for System.nanoTime
>> has a special section on this:
>> 
>>     For example, to measure how long some code takes to execute:
>> 
>>        long startTime = System.nanoTime();
>>        // ... the code being measured ...
>>        long elapsedNanos = System.nanoTime() - startTime;
>> 
>>     To compare elapsed time against a timeout, use
>> 
>>        if (System.nanoTime() - startTime >= timeoutNanos) ...
>> 
>>     instead of
>> 
>>      if (System.nanoTime() >= startTime + timeoutNanos) ...
>> 
>>     because of the possibility of numerical overflow.
>> 
>> Is that of concern in this case?
>> 
>>> On 14 Mar 2019, at 19:49, Ivan Gerasimov <ivan.gerasimov at oracle.com> wrote:
>>> 
>>> Hello!
>>> 
>>> The default implementation of Process.waitFor(long, TimeUnit) does not check if the process has exited after the last portion of the timeout has expired.
>>> 
>>> JDK has two implementations of Process (for Unix and Windows) and they both override waitFor(), so it's not an issue for them.
>>> 
>>> Still, it is better to provide a more accurate default implementation.
>>> 
>>> I'm not quite certain the regression test needs to be included in the fix.  The test does demonstrate the issue with the unfixed JDK and passed Okay on all tested platforms in Mach5.  Yet, I suspect the test can still show false negative results, as there are no guaranties that even such simple application as `true` will finish in 100 ms.
>>> I can tag the test as @ignored with a comment, or simply remove it from the fix.
>>> 
>>> BUGURL: https://bugs.openjdk.java.net/browse/JDK-8220684
>>> WEBREV: http://cr.openjdk.java.net/~igerasim/8220684/00/webrev/
>>> 
>>> Thanks in advance for reviewing!
>>> 
>>> -- 
>>> With kind regards,
>>> Ivan Gerasimov
>>> 
>> 
> 
> -- 
> With kind regards,
> Ivan Gerasimov
> 



More information about the core-libs-dev mailing list