RFR [XS]: 8229370: make jdk/jfr/event/runtime/TestNetworkUtilizationEvent.java more stable

mikhailo.seledtsov at oracle.com mikhailo.seledtsov at oracle.com
Wed Aug 28 02:53:48 UTC 2019


On 8/27/19 6:14 PM, David Holmes wrote:
> On 28/08/2019 6:47 am, Mikhailo Seledtsov wrote:
>> On 8/27/19, 1:15 AM, Baesken, Matthias wrote:
>>> Hi  David,   thanks for  the info about
>>>
>>> https://bugs.openjdk.java.net/browse/JDK-8228990
>>>
>>>
>>> regarding your comment in the bug :
>>>
>>>> So it makes no sense. I finally found an example where the test 
>>>> passed and failed on the same machine.
>>> I've seen this too.
>>>
>>> Looks like my change only increased the probability of
>>> incidental network traffic happening on the real network interfaces.
>>>
>>> Should we exclude the test? In its current state it might indeed be
>>> problematic.
>>>
>>> (Otherwise we could make the test pass on Linux when just 1
>>> network interface is found; this might be a legitimate case, isn't
>>> it?)
>> Based on David's analysis in "JDK-8228990: [TESTBUG] JFR 
>> TestNetworkUtilizationEvent.java expects 2+ Network interfaces on 
>> Linux but finding 1", my opinion is to remove the check for the number 
>> of interfaces altogether. Or just check that there is 1 interface.
>
> The test expects there to be two interfaces always present: the 
> loopback interface and a real network interface. There could be 
> additional ones. The problem is that the test fails to generate 
> traffic on the real network interface due to the use of 
> 10.0.0.0:12345. I have no idea why someone thought sending a packet to 
> that address would necessarily cause the kind of traffic that would 
> show up in the JFR event.
>
> Are we really likely to be running this test on a machine without a 
> real network interface or the loopback interface? The former seems 
> very unlikely. The latter may be something configurable but it seems 
> very unlikely to me that anyone would configure a test system that 
> way. So I don't think the "expected number of interfaces" is the 
> issue. The issue is generating observable traffic on the real network 
> interface - at least that is what we see in our test failures (the 
> output for the "lo" interface is always present).
Thank you for the detailed explanation. Sorry, I did not understand this 
at first.
>
> So it should be as simple as changing 10.0.0.0:12345 into something 
> guaranteed to work?
>
> I think this needs to be looked at by the JFR folk and net-dev folk to 
> come up with a valid testing scenario.

Perhaps we can problem-list the test for now, until we find a good 
solution. Several options come to mind:

    - pick a suitable destination address for the "real network interface" 
and test it first; if no traffic is generated after sufficient retries, 
throw jtreg.SkippedException (test skipped) instead of failing

    - try a range of suitable destination addresses; also have a 
fallback to jtreg.SkippedException if none of them work

    - in addition, a suitable address can be passed as a test property; 
this way the test will be configurable for a given infrastructure
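
A rough sketch of how the options above could fit together, in Java. 
Everything named here is an assumption for illustration: the property 
name test.jfr.network.address, the class and method names, and the 
predicate that stands in for "send a packet and check that the JFR 
event reports a real interface". The nested SkippedException is only a 
stand-in so the snippet compiles on its own; the real test would throw 
jtreg.SkippedException from the test library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class RealInterfaceTraffic {
    // Stand-in for jtreg.SkippedException so this sketch is self-contained.
    static class SkippedException extends RuntimeException {
        SkippedException(String message) {
            super(message);
        }
    }

    // Hypothetical property name; not an existing test property.
    static final String ADDRESS_PROPERTY = "test.jfr.network.address";

    // Builds the list of addresses to try: the configured one (if any)
    // first, followed by the built-in defaults without duplicates.
    static List<String> candidates(String configured, List<String> defaults) {
        List<String> result = new ArrayList<>();
        if (configured != null && !configured.trim().isEmpty()) {
            result.add(configured.trim());
        }
        for (String address : defaults) {
            if (!result.contains(address)) {
                result.add(address);
            }
        }
        return result;
    }

    // Tries each address up to 'retries' times and returns the first one
    // for which the probe reports observable traffic. If none works, the
    // test is reported as skipped rather than failed.
    static String firstWorking(List<String> addresses, int retries,
                               Predicate<String> generatesTraffic) {
        for (String address : addresses) {
            for (int attempt = 0; attempt < retries; attempt++) {
                if (generatesTraffic.test(address)) {
                    return address;
                }
            }
        }
        throw new SkippedException(
            "No observable traffic on a real network interface for " + addresses);
    }
}
```

The test would read the override with 
System.getProperty(RealInterfaceTraffic.ADDRESS_PROPERTY), pass it to 
candidates() together with whatever default addresses we agree on, and 
hand firstWorking() a probe that sends the packet and inspects the 
recorded events. Which default addresses are actually suitable is 
still the open question.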


What do you think?

Thank you,

Misha

> Cheers,
> David
>
>> Misha
>>>
>>>
>>> Best regards, Matthias
>>>
>>>
>>>
>>>> -----Original Message-----
>>>> From: David Holmes<david.holmes at oracle.com>
>>>> Sent: Dienstag, 27. August 2019 09:56
>>>> To: Baesken, Matthias<matthias.baesken at sap.com>; 'hotspot-
>>>> dev at openjdk.java.net'<hotspot-dev at openjdk.java.net>; hotspot-jfr-
>>>> dev at openjdk.java.net
>>>> Subject: Re: RFR [XS]: 8229370: make
>>>> jdk/jfr/event/runtime/TestNetworkUtilizationEvent.java more stable
>>>>
>>>> Hi Matthias,
>>>>
>>>> On 27/08/2019 5:41 pm, Baesken, Matthias wrote:
>>>>> Hello, any reviews for this small change ?
>>>> I missed the initial request - sorry.
>>>>
>>>> Seems we have a double-up of effort here, as we also have JDK-8228990
>>>> for the exact same problem that we see on some of our test machines.
>>>>
>>>> Our analysis suggests that this test often passes by accident due to
>>>> incidental activity on the real network interface when the logic
>>>> intended to generate that activity (the packet sent to 10.0.0.0:12345)
>>>> actually had no effect (unreachable address). If there is no incidental
>>>> network activity then the real network interface is not seen and so the
>>>> test fails.
>>>>
>>>> David
>>>>
>>>>> Thanks , Matthias
>>>>>
>>>>> From: Baesken, Matthias
>>>>> Sent: Montag, 12. August 2019 14:33
>>>>> To: 'hotspot-dev at openjdk.java.net'<hotspot-dev at openjdk.java.net>;
>>>> 'hotspot-jfr-dev at openjdk.java.net'<hotspot-jfr-dev at openjdk.java.net>
>>>>> Subject: RFR [XS]: 8229370: make
>>>> jdk/jfr/event/runtime/TestNetworkUtilizationEvent.java more stable
>>>>> Hello, please review this small test enhancement.
>>>>>
>>>>> We noticed that on some of our Linux machines (SLES12 based) the
>>>>> TestNetworkUtilizationEvent.java test reported just 1 interface
>>>>> (the test TestNetworkUtilizationEvent.java expects more than 1 on
>>>>> Linux).
>>>>>
>>>>> Looking into the HotSpot code, os_perf_linux.cpp collects the
>>>>> interfaces plus additional information about bytes read/written (by
>>>>> looking at /sys/class/net/eth<X>/statistics/<countername>),
>>>>> and this info is given to JFR.
>>>>>
>>>>> However, it seems to need (at least on some machines/setups) more
>>>>> packet send operations, and potentially retries, to really get
>>>>> counter updates (and without updates in the counters, no interfaces
>>>>> are found).
>>>>> So I adjusted the test accordingly.
>>>>>
>>>>>
>>>>> Bug/webrev :
>>>>>
>>>>> https://bugs.openjdk.java.net/browse/JDK-8229370
>>>>>
>>>>> http://cr.openjdk.java.net/~mbaesken/webrevs/8229370.0/
>>>>>
>>>>>
>>>>> Best regards, Matthias
>>>>>

