Request for reviews (M) 8005600: compiler/8004741/Test8004741.java fails intermittently

Mon Jan 21 15:17:28 PST 2013

Thank you, David

First, this test tries to exercise paths in runtime code (not in
compiled code) that pass exceptions to compiled code. There are two such places:

1. Runtime calls from compiled methods (in opto/runtime.cpp). The test
reaches them because non-constant values (passed as arguments, so the
actual values do not matter) are used as the dimensions of a
multidimensional array allocation, and because the flag
-XX:+StressCompiledExceptionHandlers suppresses the deoptimization of the
compiled frame that we normally do.
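A minimal sketch of the first case, using a hypothetical MultiArrayLoop class rather than the real Test8004741 code: because rows and cols arrive as parameters, the JIT cannot fold the allocation into constant-size pieces, so (as described above) the compiled code calls into the runtime for the multidimensional allocation. The flags mentioned above would be given on the command line; nothing in this sketch checks that the method was actually compiled.

```java
// Hypothetical sketch (not the actual Test8004741 code): a multidimensional
// allocation whose dimensions are not compile-time constants.
final class MultiArrayLoop {
    // rows and cols are parameters, so the JIT cannot treat them as constants.
    static int[][] allocate(int rows, int cols) {
        return new int[rows][cols];
    }

    // Calls allocate() repeatedly (enough times to pass an assumed compile
    // threshold of 10000) and returns the element count of the last array.
    static int run(int iterations, int rows, int cols) {
        int[][] last = null;
        for (int i = 0; i < iterations; i++) {
            last = allocate(rows, cols);
        }
        if (last == null || last.length == 0) {
            return 0;
        }
        return last.length * last[0].length;
    }

    public static void main(String[] args) {
        System.out.println(run(11_000, 2, 3)); // prints 6
    }
}
```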

2. During a safepoint, in JavaThread::send_thread_stop(). The test tries to
bring the VM to a safepoint by allocating a big array with a small heap.

I think these cases should be separated (into separate test methods). It
would also be nice to add a third case, as you suggested, by injecting an
invalid dimension size.
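The third case might look like the following hypothetical sketch (InvalidDimensionCase and its methods are illustrative names). A negative dimension makes the multidimensional allocation throw NegativeArraySizeException, as the JVM specification requires for multianewarray, so the exception presumably originates in the allocation path itself rather than being injected by another thread. Kept in its own method, it can be run in a loop until compiled, independently of the other two cases.

```java
// Hypothetical sketch of a third test case: an invalid (negative) dimension.
final class InvalidDimensionCase {
    static int[][] allocate(int rows, int cols) {
        return new int[rows][cols];
    }

    // Returns true if the allocation threw NegativeArraySizeException.
    static boolean throwsForNegativeDimension(int rows, int cols) {
        try {
            allocate(rows, cols);
            return false;
        } catch (NegativeArraySizeException expected) {
            return true;
        }
    }

    // Runs the case repeatedly so the allocation site gets hot; returns how
    // many iterations saw the expected exception.
    static int run(int iterations, int badCols) {
        int thrown = 0;
        for (int i = 0; i < iterations; i++) {
            if (throwsForNegativeDimension(2, badCols)) {
                thrown++;
            }
        }
        return thrown;
    }

    public static void main(String[] args) {
        System.out.println(throwsForNegativeDimension(2, -1)); // prints true
        System.out.println(throwsForNegativeDimension(2, 3));  // prints false
    }
}
```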

Another thing to improve is to add a COMPILED state, which is set after
11000 iterations of the loop in run(). We test without tiered compilation,
so the threshold is 10000 for C2 (1000 for C1), and we use -Xbatch, so the
code will wait for compilation to finish.
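A sketch of the COMPILED state, assuming hypothetical names (VictimStates, COMPILED_AFTER): the loop publishes the new state through a volatile field once it passes 11000 iterations, which is beyond the non-tiered C2 threshold of 10000 mentioned above. The iteration count is only a proxy; with -Xbatch the compilation should have finished by then, but the code does not verify it.

```java
// Hypothetical sketch: a victim loop that records a COMPILED state.
final class VictimStates {
    enum State { STARTED, COMPILED }

    // 11000 iterations: past the assumed non-tiered C2 threshold of 10000.
    static final int COMPILED_AFTER = 11_000;

    // volatile so a controlling thread sees the update promptly.
    private volatile State state = State.STARTED;

    State state() {
        return state;
    }

    // Allocates in a loop and switches to COMPILED once the iteration count
    // reaches COMPILED_AFTER. Returns a sum so the allocations have a use.
    int run(int iterations, int rows, int cols) {
        int sum = 0;
        for (int i = 1; i <= iterations; i++) {
            int[][] a = new int[rows][cols];
            sum += a.length;
            if (i == COMPILED_AFTER) {
                state = State.COMPILED;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        VictimStates v = new VictimStates();
        v.run(12_000, 2, 3);
        System.out.println(v.state()); // prints COMPILED
    }
}
```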

It would be nice to avoid the (passed > N/2) condition for passing the
test. Is it possible to find out why it fails?

Thanks,
Vladimir

On 1/19/13 7:05 AM, David Chase wrote:
>
> http://cr.openjdk.java.net/~drchase/8006500/webrev.01/
>
> 8004741 tested for the multi-dimensional-array-allocation-lack-exception-handler bug.
> This is accomplished by allocating a multi-dimensional array in an infinite loop
> and Thread.stop()ing the looping allocator (which asynchronously throws ThreadDeath in the victim thread).
>
> The old logic used timed waits to coordinate the two threads, and that did not always work;
> sometimes the ThreadDeath would land in an unexpected place.
> It also contained 6 seconds of total wait time.
>
> Fix:
>
> Rewrote most of the test surrounding the victim try-block.
>
> New version uses wait-notify to coordinate the two threads.
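One way to do the wait-notify handshake, sketched with a hypothetical PhaseHandshake class (the phase numbers are illustrative, not the test's actual states): the victim advances a monotonic phase under a lock and calls notifyAll(), and the controlling thread waits in a loop until the phase it needs has been reached. Checking the condition in a while loop under the lock guards against spurious wakeups and against a notify that arrives before the wait starts.

```java
// Hypothetical sketch of wait-notify coordination between two threads.
final class PhaseHandshake {
    private final Object lock = new Object();
    private int phase = 0; // guarded by lock

    // Called by the victim: move forward (never backward) and wake waiters.
    void advanceTo(int next) {
        synchronized (lock) {
            if (next > phase) {
                phase = next;
            }
            lock.notifyAll();
        }
    }

    // Called by the controller: block until the target phase is reached.
    void awaitPhase(int target) throws InterruptedException {
        synchronized (lock) {
            while (phase < target) { // re-check after every wakeup
                lock.wait();
            }
        }
    }

    int phase() {
        synchronized (lock) {
            return phase;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        PhaseHandshake h = new PhaseHandshake();
        Thread victim = new Thread(() -> {
            h.advanceTo(1); // e.g. "entered the try-block"
            h.advanceTo(2); // e.g. "about to allocate"
        });
        victim.start();
        h.awaitPhase(2);
        victim.join();
        System.out.println(h.phase()); // prints 2
    }
}
```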
>
> The region of code in which a "pass" could be recorded was reduced.
>
> Pass was turned into a count, and the test was made statistical:
> if more than half of the N runs of the test code record a "pass", then
> the test is judged to pass; otherwise it fails.  The regression failure
> is a crash, so a try-block hit rate below 100% is acceptable.
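The pass criterion can be written as below (StatisticalVerdict and judge are hypothetical names). With integer division, passed > trials / 2 means strictly more than half: for N=12 it needs at least 7 passes, and exactly 6 counts as a fail.

```java
// Hypothetical sketch of the statistical pass/fail decision.
final class StatisticalVerdict {
    // True when strictly more than half of the trials recorded a pass.
    static boolean judge(int passed, int trials) {
        if (trials <= 0 || passed < 0 || passed > trials) {
            throw new IllegalArgumentException(
                "passed=" + passed + ", trials=" + trials);
        }
        return passed > trials / 2;
    }

    public static void main(String[] args) {
        System.out.println(judge(11, 12)); // prints true
        System.out.println(judge(6, 12));  // prints false
    }
}
```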
>
> All unexpected behavior (any exception other than ThreadDeath,
> and unexpected thread state changes) leads to a fail().
>
> The new version contains .5 + .1 * N seconds of explicit wait time
> (N=12, so 1.7 seconds), plus whatever is required for wait-notify thread coordination
> (ought to be milliseconds in the usual case).  The first .5 second might
> be completely unnecessary; I expect someone will tell me.
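The explicit wait budget works out as follows; WaitBudget and its constants are hypothetical, and only the .5 second plus .1 second per trial figures come from the message. Keeping the arithmetic in integer milliseconds avoids floating-point results such as 1.7000000000000002.

```java
// Hypothetical sketch: explicit wait time of .5 s + .1 s * N, in milliseconds.
final class WaitBudget {
    static final long INITIAL_WAIT_MS = 500;   // the initial .5 second
    static final long PER_TRIAL_WAIT_MS = 100; // .1 second per trial

    static long explicitWaitMillis(int trials) {
        return INITIAL_WAIT_MS + PER_TRIAL_WAIT_MS * trials;
    }

    public static void main(String[] args) {
        System.out.println(explicitWaitMillis(12)); // prints 1700
    }
}
```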
>
> Testing:
>
> Switched back and forth between the pre-8004741-fix 7u11 and a post-fix VM; verified that the modified 8004741 still reliably fails on an unfixed VM.  Experimented with different sizes of 2-D allocation; that seemed to matter.
>
> Artificially padded the body of the test to check that the statistical nature of the test would work, i.e., that the try-block hit rate would go down (it did).
>
> JPRT runs of the new 8004741 on Mac, Solaris, and x86 (there was some problem with running on Linux)
>
> - Checked that the x86 failures were all bogus (zip failure post-test; tests all passed)
> - Checked the logs of all the Mac and Solaris JPRT runs for any misses of the try-block; in some Solaris runs, one of the 12 trials missed (i.e., 11 hits).  Note that this is hitting a "smaller target" than the previous version of the test, so these misses don't correspond to failures in the previous version of the test; those were due to timing hiccups.