RFR 8184445: JShell tests: fail intermittently if tests are run in high concurrent mode.

Thu Mar 1 06:12:27 UTC 2018

An updated version with only the launching tests in exclusiveAccess and 
the common harness code moved up into lib:

    http://cr.openjdk.java.net/~rfield/8184445v1.webrev/
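
For readers unfamiliar with the mechanism: jtreg reads the
exclusiveAccess.dirs key from the test suite's TEST.ROOT file, and tests
in a listed directory are not run concurrently with one another.  A
minimal sketch of such an entry follows; the directory name is
hypothetical and only illustrates the shape, the actual layout is in the
webrev above.

```
# TEST.ROOT fragment (sketch only; the directory name is hypothetical).
# Tests under a listed directory are run one at a time by jtreg.
exclusiveAccess.dirs=jdk/jshell/launching
```

Tests outside the listed directory are unaffected, so they can still be
scheduled concurrently.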

Warning: this takes this fix from tiny to massive.

Much effort has already been put into hardening JShell's launching 
networking code.  There is probably more that could be done, but I 
wouldn't know what.

-Robert

On 02/27/18 18:14, Joseph D. Darcy wrote:
> Hi Robert,
>
> I'd prefer if only the launching tests and other known failures were 
> segregated into a non-concurrent area. That would still let ~3/4 of 
> the tests proceed normally.
>
> As a follow-up, can an RFE be filed to harden the intermittently 
> failing tests against concurrent networking?
>
> Thanks,
>
> -Joe
>
> On 2/27/2018 12:21 PM, Robert Field wrote:
>> OK, I did a survey of all the JShell bugs.  There are over a dozen 
>> intermittent test failures, almost all of them probably network 
>> related.  But if we limit this to intermittent failures to launch, 
>> then there are seven.
>>
>> There are 17 tests of launching configuration, and 75 'normal' 
>> tests.  So, the launching configuration tests do fail 
>> disproportionately, 3 mentioned failures vs 5 mentioned failing files.
>>
>> The bug that highlighted the concurrent testing -- "JShell tests: 
>> fail intermittently if tests are run in high concurrent mode":
>>     https://bugs.openjdk.java.net/browse/JDK-8184445
>> mentioned 'several' issues; the two included JTR files are, 
>> tellingly, normal tests.
>>
>> The non-launching intermittent failures are all normal tests.
>>
>> So, where does that leave us?  I could reduce the failures a bit at 
>> low time-cost by putting the launching configuration tests in the 
>> exclusiveAccess.dirs.  Or, I could, at considerable testing cost, 
>> address the broad swath.
>>
>> -Robert
>>
>> On 02/26/18 17:28, joe darcy wrote:
>>> Hi Robert,
>>>
>>> On 2/26/2018 10:57 AM, Robert Field wrote:
>>>>
>>>>
>>>> On 02/26/18 10:23, joe darcy wrote:
>>>>> Hi Robert,
>>>>>
>>>>> The fix looks acceptable in terms of addressing the problem, but 
>>>>> is there a sense of how this might impact running time of the test 
>>>>> suite?
>>>>>
>>>>> Phrased differently, are there plans to make the tests more robust 
>>>>> to concurrent runs in the future?
>>>>
>>>> Hi Joe,
>>>>
>>>> There is a lot of network connection happening in these tests, most 
>>>> of which is in layers we don't control (JDI). We have been trying 
>>>> to lower the risk and we don't see failures running the tests 
>>>> ourselves, but intermittent failures scattered through the suite 
>>>> during testing (e.g. mach5) have been a constant problem.
>>>>
>>>> We will see the impact on test duration.  The default connection 
>>>> has three-level fail-over; the tests of other connection modes see 
>>>> failure far more frequently, so, if necessary, we can look at 
>>>> tuning this.
>>>>
>>>
>>> From some quick checking, there are about 80 tests in that 
>>> directory. From one sample point on my laptop, the tests took a good 
>>> long while to run. If some of the tests can be reliably run 
>>> concurrently, I'd much prefer to see a subset of tests moved to a 
>>> sheltered directory.
>>>
>>> Thanks,
>>>
>>> -Joe
>>
>