RFR [XS]: 8229370: make jdk/jfr/event/runtime/TestNetworkUtilizationEvent.java more stable
Baesken, Matthias
matthias.baesken at sap.com
Wed Aug 28 12:02:12 UTC 2019
Hi David, thanks for reaching out to the net folks.
I now use the stream of InetAddresses from NetworkInterface.networkInterfaces().flatMap(NetworkInterface::inetAddresses).
However, on the machine where I noticed the issue, I still have to do a couple of retries.
A single send to each address in that stream does not always give
the expected result ☹.
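
For illustration only, here is a minimal sketch of the approach described above; it is not the code in the webrev. It sends a small UDP packet to every address from the NetworkInterface stream and retries until a check reports traffic. The class name, the helper names, the payload size, the retry parameters and the trafficObserved callback are all hypothetical; in the real test that check would be based on the JFR network utilization events. Port 12345 is simply reused from the old 10.0.0.0:12345 target.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.List;
import java.util.function.BooleanSupplier;
import java.util.stream.Collectors;

class SendToLocalAddresses {
    // Assumption: reuse the port number from the old 10.0.0.0:12345 target.
    static final int PORT = 12345;

    // Collect the addresses of all network interfaces without any name resolution.
    static List<InetAddress> localAddresses() throws SocketException {
        return NetworkInterface.networkInterfaces()
                .flatMap(NetworkInterface::inetAddresses)
                .collect(Collectors.toList());
    }

    // Send one small UDP packet to each address; returns how many sends succeeded.
    static int sendOnce(List<InetAddress> addresses) {
        byte[] payload = new byte[64]; // hypothetical payload size
        int sent = 0;
        for (InetAddress address : addresses) {
            try (DatagramSocket socket = new DatagramSocket()) {
                socket.send(new DatagramPacket(payload, payload.length, address, PORT));
                sent++;
            } catch (IOException e) {
                // Some addresses may not be usable as a destination; skip them.
            }
        }
        return sent;
    }

    // Repeat the send until the (hypothetical) check reports traffic.
    // Returns the number of attempts used, or -1 if traffic was never observed.
    static int sendWithRetries(List<InetAddress> addresses, int maxAttempts,
                               long pauseMillis, BooleanSupplier trafficObserved)
            throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            sendOnce(addresses);
            if (trafficObserved.getAsBoolean()) {
                return attempt;
            }
            Thread.sleep(pauseMillis);
        }
        return -1;
    }

    public static void main(String[] args) throws Exception {
        List<InetAddress> addresses = localAddresses();
        System.out.println("Local addresses: " + addresses);
        // Placeholder check that always succeeds; a real test would inspect JFR events.
        int attempts = sendWithRetries(addresses, 5, 100, () -> true);
        System.out.println("Attempts used: " + attempts);
    }
}
```

Returning -1 instead of failing directly leaves it to the caller to decide whether a run without observable traffic should fail or be skipped.
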
New webrev :
http://cr.openjdk.java.net/~mbaesken/webrevs/8229370.2/
Thanks, Matthias
> -----Original Message-----
> From: David Holmes <david.holmes at oracle.com>
> Sent: Wednesday, 28 August 2019 09:41
> To: mikhailo.seledtsov at oracle.com; Baesken, Matthias
> <matthias.baesken at sap.com>
> Cc: 'hotspot-dev at openjdk.java.net' <hotspot-dev at openjdk.java.net>;
> hotspot-jfr-dev at openjdk.java.net
> Subject: Re: RFR [XS]: 8229370: make
> jdk/jfr/event/runtime/TestNetworkUtilizationEvent.java more stable
>
> Hi Misha,
>
> On 28/08/2019 12:53 pm, mikhailo.seledtsov at oracle.com wrote:
> >
> > On 8/27/19 6:14 PM, David Holmes wrote:
> >> On 28/08/2019 6:47 am, Mikhailo Seledtsov wrote:
> >>> On 8/27/19, 1:15 AM, Baesken, Matthias wrote:
> >>>> Hi David, thanks for the info about
> >>>>
> >>>> https://bugs.openjdk.java.net/browse/JDK-8228990
> >>>>
> >>>>
> >>>> regarding your comment in the bug :
> >>>>
> >>>>> So it makes no sense. I finally found an example where the test
> >>>>> passed and failed on the same machine.
> >>>> I've seen this too .
> >>>>
> >>>> Looks like my change only increased the probability of
> >>>> incidental network traffic happening on the real network interfaces .
> >>>>
> >>>> Should we exclude the test, in the current state it might indeed be
> >>>> problematic .
> >>>>
> >>>> (otherwise we could make the test pass on Linux when just 1
> >>>> network interface is found, this might be a legitimate case isn’t
> >>>> it ?)
> >>> Based on David's analysis in the "JDK-8228990: [TESTBUG] JFR
> >>> TestNetworkUtilizationEvent.java expects 2+ Network interfaces on
> >>> Linux but finding 1", my opinion is to remove the check for number of
> >>> interfaces altogether. Or just check that there is 1 interface.
> >>
> >> The test expects there to be two interfaces always present: the
> >> loopback interface and a real network interface. There could be
> >> additional ones. The problem is that the test fails to generate
> >> traffic on the real network interface due to the use of
> >> 10.0.0.0:12345. I have no idea why someone thought sending a packet to
> >> that address would necessarily cause the kind of traffic that would
> >> show up in the JFR event.
> >>
> >> Are we really likely to be running this test on a machine without a
> >> real network interface or the loopback interface? The former seems
> >> very unlikely. The latter may be something configurable but it seems
> >> very unlikely to me that anyone would configure a test system that
> >> way. So I don't think the "expected number of interfaces" is the
> >> issue. The issue is generating observable traffic on the real network
> >> interface - at least that is what we see in our test failures (the
> >> output for the "lo" interface is always present).
> > Thank you for detailed explanation. Sorry, I did not understand this at
> > first.
> >>
> >> So it should be as simple as changing 10.0.0.0:12345 into something
> >> guaranteed to work?
> >>
> >> I think this needs to be looked at by the JFR folk and net-dev folk to
> >> come up with a valid testing scenario.
> >
> > Perhaps, we can problem list the test for now, until we find a good
> > solution. Several options come to mind:
> >
> > - pick a suitable destination address for "real network interface"
> > and test it first; if no traffic is generated after sufficient retries,
> > return jtreg.SkippedException (test skipped), instead of failure
> >
> > - try a range of suitable destination addresses; also have a
> > fallback to jtreg.SkippedException if none of them work
> >
> > - in addition, a suitable address can be passed as a test property;
> > this way test will be configurable for a given infrastructure
> >
> >
> > What do you think?
>
> I spoke to the net folk and one thing that should be guaranteed to work
> is to write to the IP address of the machine itself (the real IP address
> not the loopback address). Now to get that without incurring a name
> resolution we need to get the network interfaces. From Alan Bateman:
>
> NetworkInterface.networkInterfaces().flatMap(NetworkInterface::inetAddresses)
> will give you a stream of the InetAddress objects.
>
> So we could write to one or more of those addresses and then check for
> the utilization info.
>
> David
> -----
>
> > Thank you,
> >
> > Misha
> >
> >> Cheers,
> >> David
> >>
> >>> Misha
> >>>>
> >>>>
> >>>> Best regards, Matthias
> >>>>
> >>>>
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: David Holmes<david.holmes at oracle.com>
> >>>>> Sent: Tuesday, 27 August 2019 09:56
> >>>>> To: Baesken, Matthias<matthias.baesken at sap.com>;
> >>>>> 'hotspot-dev at openjdk.java.net'<hotspot-dev at openjdk.java.net>;
> >>>>> hotspot-jfr-dev at openjdk.java.net
> >>>>> Subject: Re: RFR [XS]: 8229370: make
> >>>>> jdk/jfr/event/runtime/TestNetworkUtilizationEvent.java more stable
> >>>>>
> >>>>> Hi Matthias,
> >>>>>
> >>>>> On 27/08/2019 5:41 pm, Baesken, Matthias wrote:
> >>>>>> Hello, any reviews for this small change ?
> >>>>> I missed the initial request - sorry.
> >>>>>
> >>>>> Seems we have a double up of effort here as we also have
> >>>>> JDK-8228990 for
> >>>>> the exact same problem that we see on some of our test machines.
> >>>>>
> >>>>> Our analysis suggests that this test often passes by accident due to
> >>>>> incidental activity on the real network interface when the logic
> >>>>> intended to generate that activity (the packet sent to 10.0.0.0:12345)
> >>>>> actually had no effect (unreachable address). If there is no
> >>>>> incidental network activity then the real network interface is not
> >>>>> seen and so the test fails.
> >>>>>
> >>>>> David
> >>>>>
> >>>>>> Thanks , Matthias
> >>>>>>
> >>>>>> From: Baesken, Matthias
> >>>>>> Sent: Monday, 12 August 2019 14:33
> >>>>>> To: 'hotspot-dev at openjdk.java.net'<hotspot-dev at openjdk.java.net>;
> >>>>>> 'hotspot-jfr-dev at openjdk.java.net'<hotspot-jfr-dev at openjdk.java.net>
> >>>>>> Subject: RFR [XS]: 8229370: make
> >>>>>> jdk/jfr/event/runtime/TestNetworkUtilizationEvent.java more stable
> >>>>>> Hello, please review this small test enhancement.
> >>>>>>
> >>>>>> We noticed that on some of our Linux machines (SLES12 based) the
> >>>>>> TestNetworkUtilizationEvent.java test reported just 1 interface
> >>>>>> (the test TestNetworkUtilizationEvent.java expects more than 1 on
> >>>>>> Linux).
> >>>>>>
> >>>>>> Looking into the HS code, os_perf_linux.cpp collects the
> >>>>>> interfaces + additional information about bytes read/written (by
> >>>>>> looking at /sys/class/net/eth<X>/statistics/<countername>)
> >>>>>> and this info is given to JFR.
> >>>>>>
> >>>>>> However it seems to need (at least on some machines / setups) more
> >>>>>> packet send operations / potential retries to really get counter
> >>>>>> updates (and without updates in the counters, no interfaces are
> >>>>>> found).
> >>>>>> So I adjusted the test accordingly.
> >>>>>>
> >>>>>>
> >>>>>> Bug/webrev :
> >>>>>>
> >>>>>> https://bugs.openjdk.java.net/browse/JDK-8229370
> >>>>>>
> >>>>>> http://cr.openjdk.java.net/~mbaesken/webrevs/8229370.0/
> >>>>>>
> >>>>>>
> >>>>>> Best regards, Matthias
> >>>>>>