RFR 9 8043477: java/lang/ProcessBuilder/Basic.java failed with: java.lang.AssertionError: Some tests failed

roger riggs roger.riggs at oracle.com
Thu Nov 13 02:25:23 UTC 2014


Hi Martin,

I think that sleep is the version that spawns a JavaChild process and
has its own comment interpreter. See line 309: Am I getting the sleeps 
mixed up?

Roger



On 11/12/2014 6:32 PM, Martin Buchholz wrote:
> Hi Roger,
>
>
>
> On Wed, Nov 12, 2014 at 1:57 PM, roger riggs <roger.riggs at oracle.com> wrote:
>> Hi Martin,
>>
>> I updated the webrev to use long timeouts for the test in question; they
>> can't be too long
>> because the spawned sleep is only alive for 10 seconds.
> I am looking at the "sleep" command invocation, but it's an indefinite
> sleep.  I don't see 10 seconds anywhere.
>
>> Take a look please:
>>
>> http://cr.openjdk.java.net/~rriggs/webrev-basic-debug-8043477/
> 2313             Integer exitValue = null;
> 2314             try {
> 2315                 exitValue = p.exitValue();
> 2316             } catch (IllegalStateException ise) {
> 2317                 // no exitValue
> 2318             }
> 2319             if ((end - start) > 6 * 1_000_000_000)
> 2320                 fail("Test failed: waitFor took too long on a
> dead process. (" + (end - start) + "ns)"
> 2321                 + ", exitValue: " + Objects.requireNonNull(exitValue));
>
> exitValue throws IllegalThreadStateException, not IllegalStateException.
>
> I would fail separately in case of ITSE.  It looks instead like you'll
> fail with uninformative NPE.
>
> 2301             p.waitFor(1000, TimeUnit.MILLISECONDS);
>
> 2304             if ((end - start) < 500_000_000)
>
>
> I am surprised the test is not testing that the initial timed waitFor
> doesn't wait at least the specified time, instead of only half.  I
> consider returning from waitFor early a bug.
>
>
>> Thanks, Roger
>>
>>
>> On 11/12/2014 4:45 PM, Martin Buchholz wrote:
>>> Looking over the test again (I was the original author IIRC),
>>> I think this test still has lots of value with a longer timeout.
>>>
>>> Experience seems to show that 200ms is not a long enough timeout for
>>> "system" operations to use in non-flaky tests, including everything in
>>> Process handling.
>>> I say we ncrease the timeout by at least 10x and move on.
>>>
>>> On Wed, Nov 12, 2014 at 12:16 PM, roger riggs <roger.riggs at oracle.com>
>>> wrote:
>>>> Hi Martin, Rob,
>>>>
>>>> I thought the point of the test was to verify the timeout logic in
>>>> waitFor(timeout).
>>>> Adding the prints without changing the test criteria was intended to
>>>> gather
>>>> data
>>>> about the distribution.
>>>>
>>>> A long timeout would still catch a failure in the reaper/waitFor
>>>> communication
>>>> so maybe that's ok.
>>>>
>>>> As for printing the time, I would switch to the newer
>>>> java.time.Instant/Duration
>>>> which have a decent toString.
>>>>
>>>> Roger
>>>>
>>>>
>>>>
>>>> On 11/12/2014 3:02 PM, Rob McKenna wrote:
>>>>> Eesh, Sorry Roger, I have something like this on my todo.
>>>>>
>>>>> Martin, my concern with the long delay approach is that it effectively
>>>>> nullifies the point of the test. Given that this test has been flakey
>>>>> the
>>>>> approach has been to simply bump the acceptable delta by another 100ms
>>>>> or so
>>>>> every time. Since we've had to do this repeatedly however, I'm beginning
>>>>> to
>>>>> question whether the test is more trouble than its worth.
>>>>>
>>>>>       -Rob
>>>>>
>>>>> On 12/11/14 19:44, Martin Buchholz wrote:
>>>>>> The print statement below seems redundant with the assertion failure
>>>>>> message.
>>>>>> You could improve the assertion message instead if need be.
>>>>>> Adding thousand separator underscores to 200000000L would help a
>>>>>> little.
>>>>>>
>>>>>> I like my little helper
>>>>>>
>>>>>>        static long millisElapsedSince(long startNanoTime) {
>>>>>>            return NANOSECONDS.toMillis(System.nanoTime() -
>>>>>> startNanoTime);
>>>>>>        }
>>>>>>
>>>>>>
>>>>>> +            System.out.printf(" waitFor process: delta: %d%n",(end -
>>>>>> start) );
>>>>>> +
>>>>>>                 if ((end - start) > 200000000L * (AIX.is() ? 2 : 1))
>>>>>>                     fail("Test failed: waitFor took too long (" + (end -
>>>>>> start) + "ns)");
>>>>>>
>>>>>> 200 ms timeout for subprocesses to finish is just too damn low. In
>>>>>> j.u.c. tests we switched to 10 seconds for most "long" timeouts
>>>>>> (LONG_DELAY_MS) and are happy with the disappearance of rare flaky
>>>>>> results.
>>>>>>
>>>>>>
>>>>>> On Wed, Nov 12, 2014 at 11:34 AM, roger riggs <roger.riggs at oracle.com>
>>>>>> wrote:
>>>>>>> Please review test changes to ProcessBuilder Basic.java to add some
>>>>>>> debugging information.
>>>>>>> The test has been failing intermittently.  The wait times have been
>>>>>>> extended
>>>>>>> to see
>>>>>>> if the systems are just slow.  The failure criteria have not changed.
>>>>>>>
>>>>>>> Suggestions welcome.
>>>>>>>
>>>>>>> Webrev:
>>>>>>> http://cr.openjdk.java.net/~rriggs/webrev-basic-debug-8043477/
>>>>>>>
>>>>>>> issue:
>>>>>>>       8043477: java/lang/ProcessBuilder/Basic.java failed with:
>>>>>>> java.lang.AssertionError: Some tests failed
>>>>>>>
>>>>>>> Thanks, Roger
>>>>>>>
>>>>>>> [1] https://bugs.openjdk.java.net/browse/JDK-8043477
>>>>>>>




More information about the core-libs-dev mailing list