RFR [XS]: 8229370: make jdk/jfr/event/runtime/TestNetworkUtilizationEvent.java more stable
mikhailo.seledtsov at oracle.com
Wed Aug 28 02:53:48 UTC 2019
On 8/27/19 6:14 PM, David Holmes wrote:
> On 28/08/2019 6:47 am, Mikhailo Seledtsov wrote:
>> On 8/27/19, 1:15 AM, Baesken, Matthias wrote:
>>> Hi David, thanks for the info about
>>>
>>> https://bugs.openjdk.java.net/browse/JDK-8228990
>>>
>>>
>>> regarding your comment in the bug:
>>>
>>>> So it makes no sense. I finally found an example where the test
>>>> passed and failed on the same machine.
>>> I've seen this too.
>>>
>>> Looks like my change only increased the probability of
>>> incidental network traffic happening on the real network interfaces.
>>>
>>> Should we exclude the test? In its current state it might indeed be
>>> problematic.
>>>
>>> (Otherwise we could make the test pass on Linux when just 1
>>> network interface is found; this might be a legitimate case, isn't
>>> it?)
>> Based on David's analysis in the "JDK-8228990: [TESTBUG] JFR
>> TestNetworkUtilizationEvent.java expects 2+ Network interfaces on
>> Linux but finding 1", my opinion is to remove the check for the number of
>> interfaces altogether. Or just check that there is 1 interface.
>
> The test expects there to be two interfaces always present: the
> loopback interface and a real network interface. There could be
> additional ones. The problem is that the test fails to generate
> traffic on the real network interface due to the use of
> 10.0.0.0:12345. I have no idea why someone thought sending a packet to
> that address would necessarily cause the kind of traffic that would
> show up in the JFR event.
>
> Are we really likely to be running this test on a machine without a
> real network interface or the loopback interface? The former seems
> very unlikely. The latter may be something configurable but it seems
> very unlikely to me that anyone would configure a test system that
> way. So I don't think the "expected number of interfaces" is the
> issue. The issue is generating observable traffic on the real network
> interface - at least that is what we see in our test failures (the
> output for the "lo" interface is always present).
Thank you for the detailed explanation. Sorry, I did not understand this at
first.
>
> So it should be as simple as changing 10.0.0.0:12345 into something
> guaranteed to work?
>
> I think this needs to be looked at by the JFR folk and net-dev folk to
> come up with a valid testing scenario.
Perhaps we can problem-list the test for now, until we find a good
solution. Several options come to mind:
- pick a suitable destination address for the "real network interface"
and test it first; if no traffic is generated after sufficient retries,
throw jtreg.SkippedException (test skipped) instead of failing
- try a range of suitable destination addresses; also have a
fallback to jtreg.SkippedException if none of them work
- in addition, a suitable address can be passed as a test property;
this way the test will be configurable for a given infrastructure
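
A rough sketch of how these options could fit together, not a proposed patch. The class name, the property name "test.jfr.network.target" and the helper methods are hypothetical, and the nested SkippedException is only a local stand-in for jtreg.SkippedException so the snippet compiles on its own. The "did this address generate traffic" check is left as a caller-supplied predicate, since how to observe that reliably is exactly the open question.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class TrafficTargetProbe {
    // Local stand-in for jtreg.SkippedException so the sketch compiles alone.
    static class SkippedException extends RuntimeException {
        SkippedException(String message) { super(message); }
    }

    // Hypothetical property name; the test does not define this today.
    static final String TARGET_PROPERTY = "test.jfr.network.target";

    // Candidate list: the configured address (if set) first, then the defaults.
    static List<String> candidates(List<String> defaults) {
        List<String> result = new ArrayList<>();
        String configured = System.getProperty(TARGET_PROPERTY);
        if (configured != null && !configured.isEmpty()) {
            result.add(configured);
        }
        result.addAll(defaults);
        return result;
    }

    // Sends one small UDP datagram to host:port.
    static void sendDatagram(String host, int port) throws IOException {
        byte[] payload = "hello".getBytes();
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.send(new DatagramPacket(payload, payload.length,
                                           InetAddress.getByName(host), port));
        }
    }

    // Returns the first candidate for which the check succeeds within
    // maxRetries attempts; throws SkippedException if none of them does.
    static String firstWorking(List<String> candidates, int maxRetries,
                               Predicate<String> generatesTraffic) {
        for (String candidate : candidates) {
            for (int attempt = 0; attempt < maxRetries; attempt++) {
                if (generatesTraffic.test(candidate)) {
                    return candidate;
                }
            }
        }
        throw new SkippedException(
            "No candidate address produced observable traffic: " + candidates);
    }
}
```

In the test, the predicate would presumably call sendDatagram and then check whether a network utilization event for a non-loopback interface shows up; that part is deliberately left out here.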
What do you think?
Thank you,
Misha
> Cheers,
> David
>
>> Misha
>>>
>>>
>>> Best regards, Matthias
>>>
>>>
>>>
>>>> -----Original Message-----
>>>> From: David Holmes<david.holmes at oracle.com>
>>>> Sent: Tuesday, 27 August 2019 09:56
>>>> To: Baesken, Matthias<matthias.baesken at sap.com>; 'hotspot-
>>>> dev at openjdk.java.net'<hotspot-dev at openjdk.java.net>; hotspot-jfr-
>>>> dev at openjdk.java.net
>>>> Subject: Re: RFR [XS]: 8229370: make
>>>> jdk/jfr/event/runtime/TestNetworkUtilizationEvent.java more stable
>>>>
>>>> Hi Matthias,
>>>>
>>>> On 27/08/2019 5:41 pm, Baesken, Matthias wrote:
>>>>> Hello, any reviews for this small change?
>>>> I missed the initial request - sorry.
>>>>
>>>> Seems we have a double up of effort here as we also have
>>>> JDK-8228990 for
>>>> the exact same problem that we see on some of our test machines.
>>>>
>>>> Our analysis suggests that this test often passes by accident due to
>>>> incidental activity on the real network interface when the logic
>>>> intended to generate that activity (the packet sent to 10.0.0.0:12345)
>>>> actually had no effect (unreachable address). If there is no
>>>> incidental
>>>> network activity then the real network interface is not seen and so
>>>> the
>>>> test fails.
>>>>
>>>> David
>>>>
>>>>> Thanks, Matthias
>>>>>
>>>>> From: Baesken, Matthias
>>>>> Sent: Monday, 12 August 2019 14:33
>>>>> To: 'hotspot-dev at openjdk.java.net'<hotspot-dev at openjdk.java.net>;
>>>> 'hotspot-jfr-dev at openjdk.java.net'<hotspot-jfr-dev at openjdk.java.net>
>>>>> Subject: RFR [XS]: 8229370: make
>>>> jdk/jfr/event/runtime/TestNetworkUtilizationEvent.java more stable
>>>>> Hello, please review this small test enhancement.
>>>>>
>>>>> We noticed that on some of our Linux machines (SLES12 based) the
>>>> TestNetworkUtilizationEvent.java test reported just 1 interface
>>>>> (the test TestNetworkUtilizationEvent.java expects more than 1 on
>>>>> Linux).
>>>>>
>>>>> Looking into the HS code, os_perf_linux.cpp collects the
>>>>> interfaces +
>>>> additional information about bytes read/written (by looking at
>>>> /sys/class/net/eth<X>/statistics/<countername>)
>>>>> and this info is given to JFR.
>>>>>
>>>>> However it seems to need (at least on some machines / setups) more
>>>> packet send operations / potential retries to really get counter
>>>> updates
>>>> (and without updates in the counters, no interfaces are found).
>>>>> So I adjusted the test accordingly.
>>>>>
>>>>>
>>>>> Bug/webrev :
>>>>>
>>>>> https://bugs.openjdk.java.net/browse/JDK-8229370
>>>>>
>>>>> http://cr.openjdk.java.net/~mbaesken/webrevs/8229370.0/
>>>>>
>>>>>
>>>>> Best regards, Matthias
>>>>>