RFR: JDK-8212028: Use run-test makefile framework for testing in Oracle's Mach5

Fri Oct 12 04:29:12 UTC 2018

Hi Jon,

On 12/10/2018 11:58 AM, Jonathan Gibbons wrote:
> 
> 
> On 10/11/18 3:40 PM, David Holmes wrote:
>> Hi Erik,
>>
>> On 12/10/2018 8:29 AM, Erik Joelsson wrote:
>>> Hello,
>>>
>>> (adding serviceability-dev and hotspot-dev for test changes)
>>>
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8212028
>>>
>>> Webrev: 
>>> http://cr.openjdk.java.net/~erikj/8212028/webrev.01/index.html (From 
>>> ihse-runtestprebuilt-branch in jdk-sandbox)
>>>
>>> In order to fully adopt the new run-test framework, we need to switch 
>>> over the automated and distributed testing system at Oracle to the 
>>> new framework. To get this to work, a number of issues needed to be 
>>> fixed. A brief explanation follows; see the bug for more details.
>>>
>>> For RunTest.gmk and related makefiles there are a number of minor 
>>> tweaks to support all the necessary control variables that are 
>>> currently used for the old test makefiles, as well as correcting some 
>>> test setup settings.
>>>
>>> In addition to that, some tests also needed to be modified:
>>>
>>> Timeouts
>>> The current default timeoutFactor in the makefiles is 4. However, the 
>>> old Mach5 executor overrides that to 10. I don't think the executor 
>>> should dabble with such things; it should leave them to the 
>>> makefiles, the user, or a specific job definition, so the new 
>>> run-test executor no longer overrides the factor. This means many 
>>> tests now have a much shorter effective timeout. Because of this, we 
>>> need to increase the timeout on some tests that are now prone to 
>>> timing out. I have run tier1-5 a few times to try to find these and 
>>> added /timeout=300 (which will result in the same effective timeout 
>>> as before) when specific tests seemed problematic.
>>
>> This should be fixed in the tier job definitions not the individual 
>> tests. We have moved away from putting explicit timeouts on individual 
>> tests and instead rely on the framework timeout being set appropriately.
>>
>> David
>> -----
> 
> David,
> 
> That's a suboptimal policy, because it means you're relying on the 
> framework to handle the worst-case test.

Yes. Given we have such a huge range of tests running on a range of 
platforms, on machines with a range of capabilities, using a range of VM 
flags and using a range of loads on the test machines, this has to be 
punted to the framework - otherwise you have to update every test to add 
an explicit timeout for the worst case (as experienced by some runner of 
the tests).

There's no holy-grail answer here.

My understanding of the current approach was to set the framework timeout 
so that the majority of tests running under a given "normal" execution 
context pass. Then add multipliers for specific test configurations or 
platforms known to take longer (-Xcomp or sparc, for example). Tests 
that still don't fit within the chosen timeout then either get their own 
timeout set or are moved to a tier with a different multiplier.

This change basically lowers the bar that had been set, so that more 
tests now need explicit timeouts. I'm not sure why that was necessary, 
nor do I think it is necessarily a good thing.

But after some internal discussions the test folk seem to be okay with 
this, so having said my piece I'll let it drop.

> As far as jtreg goes, the default timeout for each step is 2 mins, which 
> is intended to be enough for the test to reliably run within that time 
> on a reasonably modern developer-class machine.  A test which always 
> times out on a good machine should use a test-specific increased timeout.

Agreed.
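
For concreteness, the timeout arithmetic discussed in this thread can be 
sketched as below. This is only an illustration: the class and method 
names are made up, and it assumes the effective timeout is simply the 
declared per-test timeout (2 minutes by default, or the /timeout value) 
multiplied by the timeoutFactor.

```java
// Sketch only: EffectiveTimeout and effectiveSeconds are hypothetical names,
// not part of jtreg or the run-test makefiles.
public class EffectiveTimeout {
    /** jtreg's default per-step timeout mentioned in this thread: 2 minutes. */
    public static final int JTREG_DEFAULT_SECONDS = 120;

    /** Assumed model: declared timeout multiplied by the timeout factor. */
    public static int effectiveSeconds(int declaredSeconds, int timeoutFactor) {
        return declaredSeconds * timeoutFactor;
    }

    public static void main(String[] args) {
        // Old Mach5 executor: default timeout, factor overridden to 10.
        int oldMach5 = effectiveSeconds(JTREG_DEFAULT_SECONDS, 10);
        // New run-test executor: default timeout, makefile default factor 4.
        int runTestDefault = effectiveSeconds(JTREG_DEFAULT_SECONDS, 4);
        // New run-test executor with an explicit /timeout=300 on the test.
        int runTestExplicit = effectiveSeconds(300, 4);

        System.out.println("old Mach5 (120s x 10):       " + oldMach5 + "s");
        System.out.println("run-test default (120s x 4): " + runTestDefault + "s");
        System.out.println("run-test /timeout=300 (x 4): " + runTestExplicit + "s");
    }
}
```

Under that assumption, /timeout=300 with a factor of 4 gives the same 
1200 seconds that the default timeout gave with a factor of 10, while 
tests without an explicit timeout drop to 480 seconds.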

> Where the framework can help is, if tests are being run on an old or 
> slow machine, or if test run args are provided that will cause the test 
> to run significantly slower than usual, then the framework can/should 
> start scaling up the timeout factor.

Again agreed.

Cheers,
David

> -- Jon
>>
>>> test/hotspot/jtreg/runtime/appcds/jvmti/InstrumentationTest.java
>>> This test spawns a child process and tries to locate it using the 
>>> attach api, by looking for a unique token in the command line string 
>>> of the spawned JVM. The problem is that the command line string it 
>>> gets from the attach api is truncated and the token is last on the 
>>> command line. This normally works well, but the arguments before it 
>>> are 3 files, with full absolute paths inside the jtreg work 
>>> directory. With Mach5 we have pretty deep work directories, and with 
>>> run-test, we make them even deeper. This unfortunately trips the 
>>> limit and the test fails. I have fixed this by reordering the 
>>> arguments to the child process.
>>>
>>> /Erik
>>>
>
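
As a footnote on the InstrumentationTest.java failure quoted above, here 
is a simplified model of why the position of the token matters when the 
reported command line is truncated. All names and the 100-character 
limit are hypothetical; the real limit and the actual test code are not 
shown in this thread.

```java
// Simplified model only: the names and the 100-character limit are hypothetical,
// not taken from the attach API or from InstrumentationTest.java.
public class TruncatedCommandLine {
    /** Mimics an API that reports at most 'limit' characters of a command line. */
    public static String truncate(String commandLine, int limit) {
        return commandLine.length() <= limit
                ? commandLine
                : commandLine.substring(0, limit);
    }

    /** True if the token is still visible after the command line is truncated. */
    public static boolean canLocate(String[] args, String token, int limit) {
        return truncate(String.join(" ", args), limit).contains(token);
    }

    /** Builds an absolute path nested 'depth' directories deep. */
    public static String deepPath(int depth, String file) {
        StringBuilder sb = new StringBuilder("/work");
        for (int i = 0; i < depth; i++) {
            sb.append("/deep");
        }
        return sb.append('/').append(file).toString();
    }

    public static void main(String[] args) {
        String token = "token-12345";
        String a = deepPath(10, "a.jar");
        String b = deepPath(10, "b.jar");
        String c = deepPath(10, "c.jar");

        // Token after three long absolute paths: cut off by the truncation.
        String[] tokenLast = {"Main", a, b, c, token};
        // Token before the paths: survives regardless of directory depth.
        String[] tokenFirst = {"Main", token, a, b, c};

        System.out.println("token last:  " + canLocate(tokenLast, token, 100));
        System.out.println("token first: " + canLocate(tokenFirst, token, 100));
    }
}
```

In this model the token-last ordering stops being locatable once the 
work directory paths get deep enough, while the reordered arguments are 
unaffected by path depth.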