RFR 8220684 : Process.waitFor(long, TimeUnit) can return false for a process that exited within the timeout

Ivan Gerasimov

14 Mar 2019 14 Mar '19

7:49 p.m.

Hello! The default implementation of Process.waitFor(long, TimeUnit) does not check if the process has exited after the last portion of the timeout has expired. JDK has two implementations of Process (for Unix and Windows) and they both override waitFor(), so it's not an issue for them. Still, it is better to provide a more accurate default implementation. I'm not quite certain the regression test needs to be included in the fix. The test does demonstrate the issue with the unfixed JDK and passed Okay on all tested platforms in Mach5. Yet, I suspect the test can still show false negative results, as there are no guaranties that even such simple application as `true` will finish in 100 ms. I can tag the test as @ignored with a comment, or simply remove it from the fix. BUGURL: https://bugs.openjdk.java.net/browse/JDK-8220684 WEBREV: http://cr.openjdk.java.net/~igerasim/8220684/00/webrev/ Thanks in advance for reviewing! -- With kind regards, Ivan Gerasimov

Show replies by date

Pavel Rappo

14 Mar 14 Mar

9:02 p.m.

I have to look at this patch in more detail, however here's what jumped out at me straight away: long deadline = System.nanoTime() + remainingNanos; It seems like a possibility for an overflow. Documentation for System.nanoTime has a special section on this: For example, to measure how long some code takes to execute: long startTime = System.nanoTime(); // ... the code being measured ... long elapsedNanos = System.nanoTime() - startTime; To compare elapsed time against a timeout, use if (System.nanoTime() - startTime >= timeoutNanos) ... instead of if (System.nanoTime() >= startTime + timeoutNanos) ... because of the possibility of numerical overflow. Is that of concern in this case?

...

On 14 Mar 2019, at 19:49, Ivan Gerasimov <ivan.gerasimov@oracle.com> wrote:

Hello!

The default implementation of Process.waitFor(long, TimeUnit) does not check if the process has exited after the last portion of the timeout has expired.

JDK has two implementations of Process (for Unix and Windows) and they both override waitFor(), so it's not an issue for them.

Still, it is better to provide a more accurate default implementation.

I'm not quite certain the regression test needs to be included in the fix. The test does demonstrate the issue with the unfixed JDK and passed Okay on all tested platforms in Mach5. Yet, I suspect the test can still show false negative results, as there are no guaranties that even such simple application as `true` will finish in 100 ms. I can tag the test as @ignored with a comment, or simply remove it from the fix.

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8220684 WEBREV: http://cr.openjdk.java.net/~igerasim/8220684/00/webrev/

Thanks in advance for reviewing!

-- With kind regards, Ivan Gerasimov

Ivan Gerasimov

9:32 p.m.

Thank you Pavel! On 3/14/19 2:02 PM, Pavel Rappo wrote:

...

I have to look at this patch in more detail, however here's what jumped out at me straight away:

long deadline = System.nanoTime() + remainingNanos;

It seems like a possibility for an overflow.

Not quite. The deadline can surely become negative, which is alright. Later we only check the difference (deadline - System.nanoTime()). Actually, the new code mimics what we have in PocessImpl for Unix and Windows. With kind regards, Ivan

...

Documentation for System.nanoTime has a special section on this:

For example, to measure how long some code takes to execute:

long startTime = System.nanoTime(); // ... the code being measured ... long elapsedNanos = System.nanoTime() - startTime;

To compare elapsed time against a timeout, use

if (System.nanoTime() - startTime >= timeoutNanos) ...

instead of

if (System.nanoTime() >= startTime + timeoutNanos) ...

because of the possibility of numerical overflow.

Is that of concern in this case?

...
On 14 Mar 2019, at 19:49, Ivan Gerasimov <ivan.gerasimov@oracle.com> wrote:

Hello!

The default implementation of Process.waitFor(long, TimeUnit) does not check if the process has exited after the last portion of the timeout has expired.

JDK has two implementations of Process (for Unix and Windows) and they both override waitFor(), so it's not an issue for them.

Still, it is better to provide a more accurate default implementation.

I'm not quite certain the regression test needs to be included in the fix. The test does demonstrate the issue with the unfixed JDK and passed Okay on all tested platforms in Mach5. Yet, I suspect the test can still show false negative results, as there are no guaranties that even such simple application as `true` will finish in 100 ms. I can tag the test as @ignored with a comment, or simply remove it from the fix.

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8220684 WEBREV: http://cr.openjdk.java.net/~igerasim/8220684/00/webrev/

Thanks in advance for reviewing!

-- With kind regards, Ivan Gerasimov

-- With kind regards, Ivan Gerasimov

Pavel Rappo

15 Mar 15 Mar

12:30 p.m.

Thanks for the clarifications. The change looks fine now. I felt initial unease because, in my experience, it's so easy to overlook some subtlety when working with overflowing integers. For `System.currentTimeMillis` the code like: stopTime - startTime >= timeout doesn't raise one's eyebrows. For one, both `stopTime` and `startTime` are non-negative and stopTime >= startTime. And that will be the case for the next 300 million years [1] of which we have lived the first 50 years or so. For the code in your fix, however, not only can `deadline` be negative, but the value returned by `System.nanoTime` too. What's even worse is that `System.nanoTime` can change the sign between invocations [2]. For the problem like that, I usually draw a circle with wrapping points Long.MAX_VALUE and Long.MIN_VALUE on the paper and satisfy myself by exploring some number of examples that exhaust all (sign) possibilities. There's a significant cognitive load to perform this analysis each and every time. And there is always a possibility of making a mistake [3]. Happy for the people who can do it seamlessly and on the fly. As for me, I wish there were a method that hid all these subtleties, something like long remainingTimeout(long firstMeasurement, long secondMeasurement, long timeout) Or maybe it's just the documentation for `System.nanoTime` and `System.currentTimeMillis` that could be updated so as to explain the right way in details. `System.nanoTime` is already doing a good job there. I've searched through the JDK and found out that kind of timeout calculations are quite abundant. In particular, the one used in your code seems to be an idiom. My guess, it might be true for other projects. -Pavel -------------------------------------------------------------------------------- [1] (2^63) / (10^3 * 24 * 60 * 60 * 365) [2] It's unknown what would the initial (at JVM start) value be. It might be negative. So that's why `System.nanoTime() >= startTime + timeoutNanos` might fail even if the right hand side doesn't overflow. [3] Heck, you can even break your neck trying to find a middle of the index range: [a, b]. While `(a + b) / 2` overflows, `a + (b - a)/2` (or `(a + b) >>> 1`) does not.

...

On 14 Mar 2019, at 21:32, Ivan Gerasimov <ivan.gerasimov@oracle.com> wrote:

Thank you Pavel!

On 3/14/19 2:02 PM, Pavel Rappo wrote:

...
I have to look at this patch in more detail, however here's what jumped out at me straight away:

long deadline = System.nanoTime() + remainingNanos;

It seems like a possibility for an overflow.

Not quite. The deadline can surely become negative, which is alright. Later we only check the difference (deadline - System.nanoTime()).

Actually, the new code mimics what we have in PocessImpl for Unix and Windows.

With kind regards, Ivan

...
Documentation for System.nanoTime has a special section on this:

For example, to measure how long some code takes to execute:

long startTime = System.nanoTime(); // ... the code being measured ... long elapsedNanos = System.nanoTime() - startTime;

To compare elapsed time against a timeout, use

if (System.nanoTime() - startTime >= timeoutNanos) ...

instead of

if (System.nanoTime() >= startTime + timeoutNanos) ...

because of the possibility of numerical overflow.

Is that of concern in this case?

...
On 14 Mar 2019, at 19:49, Ivan Gerasimov <ivan.gerasimov@oracle.com> wrote:

Hello!

The default implementation of Process.waitFor(long, TimeUnit) does not check if the process has exited after the last portion of the timeout has expired.

JDK has two implementations of Process (for Unix and Windows) and they both override waitFor(), so it's not an issue for them.

Still, it is better to provide a more accurate default implementation.

I'm not quite certain the regression test needs to be included in the fix. The test does demonstrate the issue with the unfixed JDK and passed Okay on all tested platforms in Mach5. Yet, I suspect the test can still show false negative results, as there are no guaranties that even such simple application as `true` will finish in 100 ms. I can tag the test as @ignored with a comment, or simply remove it from the fix.

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8220684 WEBREV: http://cr.openjdk.java.net/~igerasim/8220684/00/webrev/

Thanks in advance for reviewing!

-- With kind regards, Ivan Gerasimov

-- With kind regards, Ivan Gerasimov

David Holmes

12:28 a.m.

Hi Ivan, On 15/03/2019 5:49 am, Ivan Gerasimov wrote:

...

Hello!

The default implementation of Process.waitFor(long, TimeUnit) does not check if the process has exited after the last portion of the timeout has expired.

Please clarify. There is always a race between detecting a timeout and the process actually terminating. If the process actually terminates while you're deciding to report a timeout that seems just an acceptable possibility. No matter what you do the process could terminate just after you decided to report the timeout. David -----

...

JDK has two implementations of Process (for Unix and Windows) and they both override waitFor(), so it's not an issue for them.

Still, it is better to provide a more accurate default implementation.

I'm not quite certain the regression test needs to be included in the fix. The test does demonstrate the issue with the unfixed JDK and passed Okay on all tested platforms in Mach5. Yet, I suspect the test can still show false negative results, as there are no guaranties that even such simple application as `true` will finish in 100 ms. I can tag the test as @ignored with a comment, or simply remove it from the fix.

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8220684 WEBREV: http://cr.openjdk.java.net/~igerasim/8220684/00/webrev/

Thanks in advance for reviewing!

Ivan Gerasimov

2:02 a.m.

Hi David! On 3/14/19 5:28 PM, David Holmes wrote:

...

Hi Ivan,

On 15/03/2019 5:49 am, Ivan Gerasimov wrote:

...
Hello!

The default implementation of Process.waitFor(long, TimeUnit) does not check if the process has exited after the last portion of the timeout has expired.

Please clarify. There is always a race between detecting a timeout and the process actually terminating. If the process actually terminates while you're deciding to report a timeout that seems just an acceptable possibility. No matter what you do the process could terminate just after you decided to report the timeout.

Current code for waitFor(...) is 212 do { 213 try { 214 exitValue(); 215 return true; 216 } catch(IllegalThreadStateException ex) { 217 if (rem > 0) 218 Thread.sleep( 219 Math.min(TimeUnit.NANOSECONDS.toMillis(rem) + 1, 100)); 220 } 221 rem = unit.toNanos(timeout) - (System.nanoTime() - startTime); 222 } while (rem > 0); 223 return false; So, if the loop has just processed the last 100 ms portion of the timeout, it ends right away, without checking if the process has exited during this period. Not a big deal of course, but it is more accurate to check if the process has exited at the *end* of the specified timeout, and not 100 milliseconds before the end. With kind regards, Ivan

...

David -----

...
JDK has two implementations of Process (for Unix and Windows) and they both override waitFor(), so it's not an issue for them.

Still, it is better to provide a more accurate default implementation.

I'm not quite certain the regression test needs to be included in the fix. The test does demonstrate the issue with the unfixed JDK and passed Okay on all tested platforms in Mach5. Yet, I suspect the test can still show false negative results, as there are no guaranties that even such simple application as `true` will finish in 100 ms. I can tag the test as @ignored with a comment, or simply remove it from the fix.

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8220684 WEBREV: http://cr.openjdk.java.net/~igerasim/8220684/00/webrev/

Thanks in advance for reviewing!

-- With kind regards, Ivan Gerasimov

David Holmes

6:01 a.m.

Hi Ivan, On 15/03/2019 12:02 pm, Ivan Gerasimov wrote:

...

Hi David!

On 3/14/19 5:28 PM, David Holmes wrote:

...
Hi Ivan,

On 15/03/2019 5:49 am, Ivan Gerasimov wrote:

...
Hello!

The default implementation of Process.waitFor(long, TimeUnit) does not check if the process has exited after the last portion of the timeout has expired.

Please clarify. There is always a race between detecting a timeout and the process actually terminating. If the process actually terminates while you're deciding to report a timeout that seems just an acceptable possibility. No matter what you do the process could terminate just after you decided to report the timeout.

Current code for waitFor(...) is

212 do { 213 try { 214 exitValue(); 215 return true; 216 } catch(IllegalThreadStateException ex) { 217 if (rem > 0) 218 Thread.sleep( 219 Math.min(TimeUnit.NANOSECONDS.toMillis(rem) + 1, 100)); 220 } 221 rem = unit.toNanos(timeout) - (System.nanoTime() - startTime); 222 } while (rem > 0); 223 return false;

So, if the loop has just processed the last 100 ms portion of the timeout, it ends right away, without checking if the process has exited during this period.

Ah I see. Yes there should be a final check after what will be the last sleep.

...

Not a big deal of course, but it is more accurate to check if the process has exited at the *end* of the specified timeout, and not 100 milliseconds before the end.

Agreed. I share Pavel's concern about the implicit overflow in calculating a deadline, rather than comparing elapsed time with the timeout. There has been some effort in core-libs to remove assumptions about how overflows are handled. Thanks, David -----

...

With kind regards, Ivan

...
David -----

...
JDK has two implementations of Process (for Unix and Windows) and they both override waitFor(), so it's not an issue for them.

Still, it is better to provide a more accurate default implementation.

I'm not quite certain the regression test needs to be included in the fix. The test does demonstrate the issue with the unfixed JDK and passed Okay on all tested platforms in Mach5. Yet, I suspect the test can still show false negative results, as there are no guaranties that even such simple application as `true` will finish in 100 ms. I can tag the test as @ignored with a comment, or simply remove it from the fix.

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8220684 WEBREV: http://cr.openjdk.java.net/~igerasim/8220684/00/webrev/

Thanks in advance for reviewing!

Ivan Gerasimov

7:03 a.m.

Thank you David! On 3/14/19 11:01 PM, David Holmes wrote:

...

Hi Ivan,

On 15/03/2019 12:02 pm, Ivan Gerasimov wrote:

...
Hi David!

On 3/14/19 5:28 PM, David Holmes wrote:

...
Hi Ivan,

On 15/03/2019 5:49 am, Ivan Gerasimov wrote:

...
Hello!

The default implementation of Process.waitFor(long, TimeUnit) does not check if the process has exited after the last portion of the timeout has expired.

Please clarify. There is always a race between detecting a timeout and the process actually terminating. If the process actually terminates while you're deciding to report a timeout that seems just an acceptable possibility. No matter what you do the process could terminate just after you decided to report the timeout.

Current code for waitFor(...) is

212 do { 213 try { 214 exitValue(); 215 return true; 216 } catch(IllegalThreadStateException ex) { 217 if (rem > 0) 218 Thread.sleep( 219 Math.min(TimeUnit.NANOSECONDS.toMillis(rem) + 1, 100)); 220 } 221 rem = unit.toNanos(timeout) - (System.nanoTime() - startTime); 222 } while (rem > 0); 223 return false;

So, if the loop has just processed the last 100 ms portion of the timeout, it ends right away, without checking if the process has exited during this period.

Ah I see. Yes there should be a final check after what will be the last sleep.

...
Not a big deal of course, but it is more accurate to check if the process has exited at the *end* of the specified timeout, and not 100 milliseconds before the end.

Agreed.

I share Pavel's concern about the implicit overflow in calculating a deadline, rather than comparing elapsed time with the timeout. There has been some effort in core-libs to remove assumptions about how overflows are handled.

We only check sign of the remainingNanos, which is initially strictly positive timeout. Over time it gets decreased by elapsed time. The elapsed time is the difference between System.nanoTime() measured at line 217 and 213, so it is always non-negative (well, strictly positive, as there's sleep in between). I don't really see a possibility of overflow for remainingNanos here. The deadline may overflow, and it is totally fine, as we don't care about its sign. Am I missing something? With kind regards, Ivan

...

Thanks, David -----

...
With kind regards, Ivan

...
David -----

...
JDK has two implementations of Process (for Unix and Windows) and they both override waitFor(), so it's not an issue for them.

Still, it is better to provide a more accurate default implementation.

I'm not quite certain the regression test needs to be included in the fix. The test does demonstrate the issue with the unfixed JDK and passed Okay on all tested platforms in Mach5. Yet, I suspect the test can still show false negative results, as there are no guaranties that even such simple application as `true` will finish in 100 ms. I can tag the test as @ignored with a comment, or simply remove it from the fix.

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8220684 WEBREV: http://cr.openjdk.java.net/~igerasim/8220684/00/webrev/

Thanks in advance for reviewing!

-- With kind regards, Ivan Gerasimov

David Holmes

7:16 a.m.

On 15/03/2019 5:03 pm, Ivan Gerasimov wrote:

...

Thank you David!

On 3/14/19 11:01 PM, David Holmes wrote:

...
Hi Ivan,

On 15/03/2019 12:02 pm, Ivan Gerasimov wrote:

...
Hi David!

On 3/14/19 5:28 PM, David Holmes wrote:

...
Hi Ivan,

On 15/03/2019 5:49 am, Ivan Gerasimov wrote:

...
Hello!

The default implementation of Process.waitFor(long, TimeUnit) does not check if the process has exited after the last portion of the timeout has expired.

Please clarify. There is always a race between detecting a timeout and the process actually terminating. If the process actually terminates while you're deciding to report a timeout that seems just an acceptable possibility. No matter what you do the process could terminate just after you decided to report the timeout.

Current code for waitFor(...) is

212 do { 213 try { 214 exitValue(); 215 return true; 216 } catch(IllegalThreadStateException ex) { 217 if (rem > 0) 218 Thread.sleep( 219 Math.min(TimeUnit.NANOSECONDS.toMillis(rem) + 1, 100)); 220 } 221 rem = unit.toNanos(timeout) - (System.nanoTime() - startTime); 222 } while (rem > 0); 223 return false;

So, if the loop has just processed the last 100 ms portion of the timeout, it ends right away, without checking if the process has exited during this period.

Ah I see. Yes there should be a final check after what will be the last sleep.

...
Not a big deal of course, but it is more accurate to check if the process has exited at the *end* of the specified timeout, and not 100 milliseconds before the end.

Agreed.

I share Pavel's concern about the implicit overflow in calculating a deadline, rather than comparing elapsed time with the timeout. There has been some effort in core-libs to remove assumptions about how overflows are handled.

We only check sign of the remainingNanos, which is initially strictly positive timeout. Over time it gets decreased by elapsed time. The elapsed time is the difference between System.nanoTime() measured at line 217 and 213, so it is always non-negative (well, strictly positive, as there's sleep in between).

I don't really see a possibility of overflow for remainingNanos here.

The deadline may overflow, and it is totally fine, as we don't care about its sign.

It's deadline that I was flagging.

...

Am I missing something?

Just a code cleanliness issue. I had thought, but now can't find, these kinds of overflowing calculations were being cleaned up in core-libs. But perhaps I'm just mis-remembering. Cheers, David

...

With kind regards, Ivan

...
Thanks, David -----

...
With kind regards, Ivan

...
David -----

...
JDK has two implementations of Process (for Unix and Windows) and they both override waitFor(), so it's not an issue for them.

Still, it is better to provide a more accurate default implementation.

I'm not quite certain the regression test needs to be included in the fix. The test does demonstrate the issue with the unfixed JDK and passed Okay on all tested platforms in Mach5. Yet, I suspect the test can still show false negative results, as there are no guaranties that even such simple application as `true` will finish in 100 ms. I can tag the test as @ignored with a comment, or simply remove it from the fix.

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8220684 WEBREV: http://cr.openjdk.java.net/~igerasim/8220684/00/webrev/

Thanks in advance for reviewing!

Ivan Gerasimov

8:08 a.m.

Hi David! The code could be written like following: long start = System.nanoTime(); do { ... long elapsed = System.nanoTime() - start; remainingNanos = timeoutNanos - elapsed; } while (remainingNanos > 0); but then one realizes that this always calculates (start + timeoutNanos), both of which are effectively final. So, the sum can be calculated in advance, and named, say, "deadline" :) With kind regards, Ivan On 3/15/19 12:16 AM, David Holmes wrote:

...

On 15/03/2019 5:03 pm, Ivan Gerasimov wrote:

...
Thank you David!

On 3/14/19 11:01 PM, David Holmes wrote:

...
Hi Ivan,

On 15/03/2019 12:02 pm, Ivan Gerasimov wrote:

...
Hi David!

On 3/14/19 5:28 PM, David Holmes wrote:

...
Hi Ivan,

On 15/03/2019 5:49 am, Ivan Gerasimov wrote:

...
Hello!

The default implementation of Process.waitFor(long, TimeUnit) does not check if the process has exited after the last portion of the timeout has expired.

Please clarify. There is always a race between detecting a timeout and the process actually terminating. If the process actually terminates while you're deciding to report a timeout that seems just an acceptable possibility. No matter what you do the process could terminate just after you decided to report the timeout.

Current code for waitFor(...) is

212 do { 213 try { 214 exitValue(); 215 return true; 216 } catch(IllegalThreadStateException ex) { 217 if (rem > 0) 218 Thread.sleep( 219 Math.min(TimeUnit.NANOSECONDS.toMillis(rem) + 1, 100)); 220 } 221 rem = unit.toNanos(timeout) - (System.nanoTime() - startTime); 222 } while (rem > 0); 223 return false;

So, if the loop has just processed the last 100 ms portion of the timeout, it ends right away, without checking if the process has exited during this period.

Ah I see. Yes there should be a final check after what will be the last sleep.

...
Not a big deal of course, but it is more accurate to check if the process has exited at the *end* of the specified timeout, and not 100 milliseconds before the end.

Agreed.

I share Pavel's concern about the implicit overflow in calculating a deadline, rather than comparing elapsed time with the timeout. There has been some effort in core-libs to remove assumptions about how overflows are handled.

We only check sign of the remainingNanos, which is initially strictly positive timeout. Over time it gets decreased by elapsed time. The elapsed time is the difference between System.nanoTime() measured at line 217 and 213, so it is always non-negative (well, strictly positive, as there's sleep in between).

I don't really see a possibility of overflow for remainingNanos here.

The deadline may overflow, and it is totally fine, as we don't care about its sign.

It's deadline that I was flagging.

...
Am I missing something?

Just a code cleanliness issue. I had thought, but now can't find, these kinds of overflowing calculations were being cleaned up in core-libs. But perhaps I'm just mis-remembering.

Cheers, David

...
With kind regards, Ivan

...
Thanks, David -----

...
With kind regards, Ivan

...
David -----

...
JDK has two implementations of Process (for Unix and Windows) and they both override waitFor(), so it's not an issue for them.

Still, it is better to provide a more accurate default implementation.

I'm not quite certain the regression test needs to be included in the fix. The test does demonstrate the issue with the unfixed JDK and passed Okay on all tested platforms in Mach5. Yet, I suspect the test can still show false negative results, as there are no guaranties that even such simple application as `true` will finish in 100 ms. I can tag the test as @ignored with a comment, or simply remove it from the fix.

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8220684 WEBREV: http://cr.openjdk.java.net/~igerasim/8220684/00/webrev/

Thanks in advance for reviewing!

-- With kind regards, Ivan Gerasimov

Roger Riggs

6:39 p.m.

Hi Ivan, The updated code looks fine except for style. Please put the return and the if's on different lines (by the style conventions). Line 210 and 211, and 216. Thanks, Roger On 03/15/2019 04:08 AM, Ivan Gerasimov wrote:

...

Hi David!

The code could be written like following:

long start = System.nanoTime(); do { ... long elapsed = System.nanoTime() - start; remainingNanos = timeoutNanos - elapsed; } while (remainingNanos > 0);

but then one realizes that this always calculates (start + timeoutNanos), both of which are effectively final. So, the sum can be calculated in advance, and named, say, "deadline" :)

With kind regards, Ivan

On 3/15/19 12:16 AM, David Holmes wrote:

...
On 15/03/2019 5:03 pm, Ivan Gerasimov wrote:

...
Thank you David!

On 3/14/19 11:01 PM, David Holmes wrote:

...
Hi Ivan,

On 15/03/2019 12:02 pm, Ivan Gerasimov wrote:

...
Hi David!

On 3/14/19 5:28 PM, David Holmes wrote:

...
Hi Ivan,

On 15/03/2019 5:49 am, Ivan Gerasimov wrote: > Hello! > > The default implementation of Process.waitFor(long, TimeUnit) > does not check if the process has exited after the last portion > of the timeout has expired.

Please clarify. There is always a race between detecting a timeout and the process actually terminating. If the process actually terminates while you're deciding to report a timeout that seems just an acceptable possibility. No matter what you do the process could terminate just after you decided to report the timeout.

Current code for waitFor(...) is

212 do { 213 try { 214 exitValue(); 215 return true; 216 } catch(IllegalThreadStateException ex) { 217 if (rem > 0) 218 Thread.sleep( 219 Math.min(TimeUnit.NANOSECONDS.toMillis(rem) + 1, 100)); 220 } 221 rem = unit.toNanos(timeout) - (System.nanoTime() - startTime); 222 } while (rem > 0); 223 return false;

So, if the loop has just processed the last 100 ms portion of the timeout, it ends right away, without checking if the process has exited during this period.

Ah I see. Yes there should be a final check after what will be the last sleep.

...
Not a big deal of course, but it is more accurate to check if the process has exited at the *end* of the specified timeout, and not 100 milliseconds before the end.

Agreed.

I share Pavel's concern about the implicit overflow in calculating a deadline, rather than comparing elapsed time with the timeout. There has been some effort in core-libs to remove assumptions about how overflows are handled.

We only check sign of the remainingNanos, which is initially strictly positive timeout. Over time it gets decreased by elapsed time. The elapsed time is the difference between System.nanoTime() measured at line 217 and 213, so it is always non-negative (well, strictly positive, as there's sleep in between).

I don't really see a possibility of overflow for remainingNanos here.

The deadline may overflow, and it is totally fine, as we don't care about its sign.

It's deadline that I was flagging.

...
Am I missing something?

Just a code cleanliness issue. I had thought, but now can't find, these kinds of overflowing calculations were being cleaned up in core-libs. But perhaps I'm just mis-remembering.

Cheers, David

...
With kind regards, Ivan

...
Thanks, David -----

...
With kind regards, Ivan

...
David -----

> JDK has two implementations of Process (for Unix and Windows) > and they both override waitFor(), so it's not an issue for them. > > Still, it is better to provide a more accurate default > implementation. > > I'm not quite certain the regression test needs to be included > in the fix. The test does demonstrate the issue with the > unfixed JDK and passed Okay on all tested platforms in Mach5. > Yet, I suspect the test can still show false negative results, > as there are no guaranties that even such simple application as > `true` will finish in 100 ms. > I can tag the test as @ignored with a comment, or simply remove > it from the fix. > > BUGURL: https://bugs.openjdk.java.net/browse/JDK-8220684 > WEBREV: http://cr.openjdk.java.net/~igerasim/8220684/00/webrev/ > > Thanks in advance for reviewing! >

Martin Buchholz

7:14 p.m.

There are many nanoTime-based calculations in j.u.concurrent and indeed we use a variable named "deadline" a lot. final long deadline = System.nanoTime() + nanosTimeout; But different classes use nanoTime in subtly different ways so there's no copy/paste idiom. When we updated the spec for nanoTime to address integer wraparound, we didn't think about where else such clarifications might be needed. On Fri, Mar 15, 2019 at 1:11 AM Ivan Gerasimov <ivan.gerasimov@oracle.com> wrote:

...

Hi David!

The code could be written like following:

long start = System.nanoTime(); do { ... long elapsed = System.nanoTime() - start; remainingNanos = timeoutNanos - elapsed; } while (remainingNanos > 0);

but then one realizes that this always calculates (start + timeoutNanos), both of which are effectively final. So, the sum can be calculated in advance, and named, say, "deadline" :)

With kind regards, Ivan

On 3/15/19 12:16 AM, David Holmes wrote:

...
On 15/03/2019 5:03 pm, Ivan Gerasimov wrote:

...
Thank you David!

On 3/14/19 11:01 PM, David Holmes wrote:

...
Hi Ivan,

On 15/03/2019 12:02 pm, Ivan Gerasimov wrote:

...
Hi David!

On 3/14/19 5:28 PM, David Holmes wrote:

...
Hi Ivan,

On 15/03/2019 5:49 am, Ivan Gerasimov wrote: > Hello! > > The default implementation of Process.waitFor(long, TimeUnit) > does not check if the process has exited after the last portion > of the timeout has expired.

Please clarify. There is always a race between detecting a timeout and the process actually terminating. If the process actually terminates while you're deciding to report a timeout that seems just an acceptable possibility. No matter what you do the process could terminate just after you decided to report the timeout.

Current code for waitFor(...) is

212 do { 213 try { 214 exitValue(); 215 return true; 216 } catch(IllegalThreadStateException ex) { 217 if (rem > 0) 218 Thread.sleep( 219 Math.min(TimeUnit.NANOSECONDS.toMillis(rem) + 1, 100)); 220 } 221 rem = unit.toNanos(timeout) - (System.nanoTime() - startTime); 222 } while (rem > 0); 223 return false;

So, if the loop has just processed the last 100 ms portion of the timeout, it ends right away, without checking if the process has exited during this period.

Ah I see. Yes there should be a final check after what will be the last sleep.

...
Not a big deal of course, but it is more accurate to check if the process has exited at the *end* of the specified timeout, and not 100 milliseconds before the end.

Agreed.

I share Pavel's concern about the implicit overflow in calculating a deadline, rather than comparing elapsed time with the timeout. There has been some effort in core-libs to remove assumptions about how overflows are handled.

We only check sign of the remainingNanos, which is initially strictly positive timeout. Over time it gets decreased by elapsed time. The elapsed time is the difference between System.nanoTime() measured at line 217 and 213, so it is always non-negative (well, strictly positive, as there's sleep in between).

I don't really see a possibility of overflow for remainingNanos here.

The deadline may overflow, and it is totally fine, as we don't care about its sign.

It's deadline that I was flagging.

...
Am I missing something?

Just a code cleanliness issue. I had thought, but now can't find, these kinds of overflowing calculations were being cleaned up in core-libs. But perhaps I'm just mis-remembering.

Cheers, David

...
With kind regards, Ivan

...
Thanks, David -----

...
With kind regards, Ivan

...
David -----

> JDK has two implementations of Process (for Unix and Windows) and > they both override waitFor(), so it's not an issue for them. > > Still, it is better to provide a more accurate default > implementation. > > I'm not quite certain the regression test needs to be included in > the fix. The test does demonstrate the issue with the unfixed > JDK and passed Okay on all tested platforms in Mach5. Yet, I > suspect the test can still show false negative results, as there > are no guaranties that even such simple application as `true` > will finish in 100 ms. > I can tag the test as @ignored with a comment, or simply remove > it from the fix. > > BUGURL: https://bugs.openjdk.java.net/browse/JDK-8220684 > WEBREV: http://cr.openjdk.java.net/~igerasim/8220684/00/webrev/ > > Thanks in advance for reviewing! >

-- With kind regards, Ivan Gerasimov

2551

Age (days ago)

2552

Last active (days ago)

List overview

Download

11 comments

5 participants

participants (5)

David Holmes
Ivan Gerasimov
Martin Buchholz
Pavel Rappo
Roger Riggs

RFR 8220684 : Process.waitFor(long, TimeUnit) can return false for a process that exited within the timeout

tags

participants (5)