RFR: 8219585: [TESTBUG] sun/management/jmxremote/bootstrap/JMXInterfaceBindingTest.java passes trivially when it shouldn't

Daniel Fuchs daniel.fuchs at oracle.com
Fri Mar 8 11:59:47 UTC 2019


Hi Severin,

On 05/03/2019 14:33, Severin Gehwolf wrote:
> Hi Daniel,
> 
> Latest webrev:
> http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8219585/03/webrev/
> 
> It's incorporating your changes wrt. to some timeouts and asserting the
> expected exit code. Instead of the ProcessThread changes, I'm using the
> sendMessage() approach. That API is already there.

Oh - nice finding, thanks for that! :-)

> Unfortunately when running both SSL and plain sockets tests, it would
> randomly fail for me. Even if I choose different sets of ports for
> plain/ssl. So I've taken a different route of randomly selecting SSL or
> plain. Overall, this should give reasonable coverage for both (plain
> and SSL). This made test stability improve a lot on my Linux x86_64
> machine. Ran for 100 iterations without failure.

Did you try it with a Thread.sleep(1000); statement inserted between
the plain test and the SSL test?

FWIW - I have imported your changes, and sent them as is in our
test system. Ran them 200 times on each platform - got only
one failure - on Linux x64, where one of the process failed with
"address already in use". I believe this is bound to happen from
time to time if by misfortune some other process on the host
is binding to the same port.

Then I modified your changes to test both plain and SSL, with
a sleep(1000) between the two - and observed the same behavior:
1 random failure out of 200 runs (which happened on Linux
x64 too but on a different machine in the farm).

I'd suggest to preemptively add the `@key intermittent` keyword
to the test - this hints to future maintainers (and to people that
analyze failures during build promotions) that the test is known
to fail intermittently.

1 out of 200 runs is probably an acceptable failure rate for that
kind of test. If I could think of any way to get rid of such random
failures I would insist that we fix the test before pushing it, but
unfortunately I have come to the conclusion that the nature of the
feature we're trying to test rules out any of the usual tricks we
could try.

So to sum it up:

- please add the `@key intermittent` keyword
- please verify on your local box if testing both plain &
   SSL with a Thread.sleep(1000) in between makes the test
   more stable. If it does, then I'd prefer we go down
   this road rather than selecting one plain vs SSL at random.

Then I think you're good to go - you can count me in as Reviewer.

If the failure rate becomes too noisy in the future, then we
can revisit - and one possibility will then be to turn this test
into a manual test.

Sorry to have been a bit picky - I hate intermittent failures
with a passion ;-)

best regards,

-- daniel

> 
> I'll run this through jdk/submit now. Could you run this through your
> test system too, please?
> 
> Would you be OK with getting this patch pushed once we'd have positive
> results for both?
> 
> Thanks,
> Severin
> 
> On Fri, 2019-03-01 at 15:08 +0000, Daniel Fuchs wrote:
>> Hi Severin,
>>
>> On 28/02/2019 15:47, Severin Gehwolf wrote:
>>> Yes, thanks for looking at this Daniel. That was my determination as
>>> well. However, even if we do all of the above, and add a test config
>>> with /etc/hosts mapping a non-loopback address to "localhost" it would
>>> break other tests. E.g. this one:
>>> java/net/InetAddress/GetLocalHostWithSM.java
>>>
>>> So I'd have to explore whether your suggestion with
>>> InetAddress.getLocalHost() is viable. It seems promising.
>>>
>>> I'll keep you posted.
>>
>> Thanks for keeping the investigation going!
>>
>> For what it's worth this is what I have been experimenting with:
>> http://cr.openjdk.java.net/~dfuchs/experiment-8219585/experiment.00/
>>
>> It's only a POC and obviously need more cleaning.
>> Maybe some of the arbitrary timeouts in the test could be scaled
>> in accordance to timeout-factor (I think there's an adjustTimeout(long)
>> function somewhere in the test libs that does that).
>>
>> I ran it 50 times in our test system - and it passed on all platforms,
>> so there's yet hope :-)
>>
>> hope this helps,
>>
>> -- daniel
>>
>>
> 



More information about the serviceability-dev mailing list