RFR 8220684 : Process.waitFor(long, TimeUnit) can return false for a process that exited within the timeout

Martin Buchholz martinrb at google.com
Fri Mar 15 19:14:48 UTC 2019


There are many nanoTime-based calculations in j.u.concurrent and indeed we
use a variable named "deadline" a lot.

final long deadline = System.nanoTime() + nanosTimeout;

But different classes use nanoTime in subtly different ways so there's no
copy/paste idiom.

When we updated the spec for nanoTime to address integer wraparound, we
didn't think about where else such clarifications might be needed.

On Fri, Mar 15, 2019 at 1:11 AM Ivan Gerasimov <ivan.gerasimov at oracle.com>
wrote:

> Hi David!
>
> The code could be written like following:
>
> long start = System.nanoTime();
> do {
>      ...
>      long elapsed = System.nanoTime() - start;
>      remainingNanos = timeoutNanos - elapsed;
> } while (remainingNanos > 0);
>
> but then one realizes that this always calculates (start +
> timeoutNanos), both of which are effectively final.
> So, the sum can be calculated in advance, and named, say, "deadline" :)
>
> With kind regards,
> Ivan
>
> On 3/15/19 12:16 AM, David Holmes wrote:
> > On 15/03/2019 5:03 pm, Ivan Gerasimov wrote:
> >> Thank you David!
> >>
> >>
> >> On 3/14/19 11:01 PM, David Holmes wrote:
> >>> Hi Ivan,
> >>>
> >>> On 15/03/2019 12:02 pm, Ivan Gerasimov wrote:
> >>>> Hi David!
> >>>>
> >>>>
> >>>> On 3/14/19 5:28 PM, David Holmes wrote:
> >>>>> Hi Ivan,
> >>>>>
> >>>>> On 15/03/2019 5:49 am, Ivan Gerasimov wrote:
> >>>>>> Hello!
> >>>>>>
> >>>>>> The default implementation of Process.waitFor(long, TimeUnit)
> >>>>>> does not check if the process has exited after the last portion
> >>>>>> of the timeout has expired.
> >>>>>
> >>>>> Please clarify. There is always a race between detecting a timeout
> >>>>> and the process actually terminating. If the process actually
> >>>>> terminates while you're deciding to report a timeout that seems
> >>>>> just  an acceptable possibility. No matter what you do the process
> >>>>> could terminate just after you decided to report the timeout.
> >>>>>
> >>>>>
> >>>> Current code for waitFor(...) is
> >>>>
> >>>>   212         do {
> >>>>   213             try {
> >>>>   214                 exitValue();
> >>>>   215                 return true;
> >>>>   216             } catch(IllegalThreadStateException ex) {
> >>>>   217                 if (rem > 0)
> >>>>   218                     Thread.sleep(
> >>>>   219 Math.min(TimeUnit.NANOSECONDS.toMillis(rem) + 1, 100));
> >>>>   220             }
> >>>>   221             rem = unit.toNanos(timeout) - (System.nanoTime()
> >>>> - startTime);
> >>>>   222         } while (rem > 0);
> >>>>   223         return false;
> >>>>
> >>>> So, if the loop has just processed the last 100 ms portion of the
> >>>> timeout, it ends right away, without checking if the process has
> >>>> exited during this period.
> >>>
> >>> Ah I see. Yes there should be a final check after what will be the
> >>> last sleep.
> >>>
> >>>> Not a big deal of course, but it is more accurate to check if the
> >>>> process has exited at the *end* of the specified timeout, and not
> >>>> 100 milliseconds before the end.
> >>>
> >>> Agreed.
> >>>
> >>> I share Pavel's concern about the implicit overflow in calculating a
> >>> deadline, rather than comparing elapsed time with the timeout. There
> >>> has been some effort in core-libs to remove assumptions about how
> >>> overflows are handled.
> >>>
> >> We only check sign of the remainingNanos, which is initially strictly
> >> positive timeout.  Over time it gets decreased by elapsed time.
> >> The elapsed time is the difference between System.nanoTime() measured
> >> at line 217 and 213, so it is always non-negative (well, strictly
> >> positive, as there's sleep in between).
> >>
> >> I don't really see a possibility of overflow for remainingNanos here.
> >>
> >> The deadline may overflow, and it is totally fine, as we don't care
> >> about its sign.
> >
> > It's deadline that I was flagging.
> >
> >> Am I missing something?
> >
> > Just a code cleanliness issue. I had thought, but now can't find,
> > these kinds of overflowing calculations were being cleaned up in
> > core-libs. But perhaps I'm just mis-remembering.
> >
> > Cheers,
> > David
> >
> >>
> >> With kind regards,
> >> Ivan
> >>
> >>> Thanks,
> >>> David
> >>> -----
> >>>
> >>>> With kind regards,
> >>>> Ivan
> >>>>
> >>>>>
> >>>>> David
> >>>>> -----
> >>>>>
> >>>>>> JDK has two implementations of Process (for Unix and Windows) and
> >>>>>> they both override waitFor(), so it's not an issue for them.
> >>>>>>
> >>>>>> Still, it is better to provide a more accurate default
> >>>>>> implementation.
> >>>>>>
> >>>>>> I'm not quite certain the regression test needs to be included in
> >>>>>> the fix.  The test does demonstrate the issue with the unfixed
> >>>>>> JDK and passed Okay on all tested platforms in Mach5. Yet, I
> >>>>>> suspect the test can still show false negative results, as there
> >>>>>> are no guaranties that even such simple application as `true`
> >>>>>> will finish in 100 ms.
> >>>>>> I can tag the test as @ignored with a comment, or simply remove
> >>>>>> it from the fix.
> >>>>>>
> >>>>>> BUGURL: https://bugs.openjdk.java.net/browse/JDK-8220684
> >>>>>> WEBREV: http://cr.openjdk.java.net/~igerasim/8220684/00/webrev/
> >>>>>>
> >>>>>> Thanks in advance for reviewing!
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> >
>
> --
> With kind regards,
> Ivan Gerasimov
>
>


More information about the core-libs-dev mailing list