RFR [XS]: 8229370: make jdk/jfr/event/runtime/TestNetworkUtilizationEvent.java more stable

Baesken, Matthias matthias.baesken at sap.com
Mon Sep 30 10:14:05 UTC 2019


>
> I'm unclear about the details of the test. Does this:
>  77         Stream<InetAddress> si = NetworkInterface.networkInterfaces().flatMap(NetworkInterface::inetAddresses);
> not also return the loopback address that was already tested? Could it 
> return interfaces that we really don't want to be trying to test?

Hi David,
   yes, we are sending to all InetAddresses of all adapters (at least the ones that are not in status DOWN; I noticed that the java.net classes of the JDK omit those on Linux).
I think it is not a bad idea to send to all of them in order to hit the "right" one, but maybe the original test owners can comment on this.
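
For illustration only, here is a minimal sketch of how the enumeration could skip the loopback address that was already tested. It is not the code from the webrev; the class name NonLoopbackAddresses and its helper methods are made up for this example, and whether filtering is actually desirable is exactly the open question above.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of the test or the webrev.
class NonLoopbackAddresses {

    // Drops loopback addresses (127.0.0.1, ::1) from a stream of addresses.
    static List<InetAddress> withoutLoopback(Stream<InetAddress> addresses) {
        return addresses
                .filter(a -> !a.isLoopbackAddress())
                .collect(Collectors.toList());
    }

    // Same enumeration as in the question, with the loopback filter applied.
    static List<InetAddress> list() throws SocketException {
        return withoutLoopback(
                NetworkInterface.networkInterfaces()
                        .flatMap(NetworkInterface::inetAddresses));
    }

    public static void main(String[] args) throws SocketException {
        for (InetAddress a : list()) {
            System.out.println("Candidate InetAddress: " + a);
        }
    }
}
```

With such a filter the loopback address would be written to only once, by the explicit InetAddress.getLoopbackAddress() step.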

  88             } catch(IOException ioe) {
  89             }

> Why are we silently swallowing exceptions here?

I agree, we should at least print some output when a send fails.
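
A minimal sketch of what reporting the failure could look like, assuming one DatagramSocket per send. The class name SendWithLogging, the helper names and the port are placeholders for this example (the port is borrowed from the 10.0.0.0:12345 destination mentioned in the bug); this is not the code from the webrev.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;

// Hypothetical helper, not part of the test or the webrev.
class SendWithLogging {

    // Placeholder port, assumed for illustration.
    static final int PORT = 12345;

    // Builds the message that is printed when a send fails.
    static String failureMessage(InetAddress target, IOException ioe) {
        return "Send to InetAddress:" + target + " failed: " + ioe;
    }

    // Sends one datagram; reports a failure instead of swallowing it.
    static boolean trySend(InetAddress target, byte[] payload) {
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.send(new DatagramPacket(payload, payload.length, target, PORT));
            return true;
        } catch (IOException ioe) {
            System.out.println(failureMessage(target, ioe));
            return false;
        }
    }

    public static void main(String[] args) {
        boolean sent = trySend(InetAddress.getLoopbackAddress(), "hello".getBytes());
        System.out.println("sent = " + sent);
    }
}
```

Printing the address together with the exception would also show in the test output which of the enumerated addresses could not be reached.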

> The test is sometimes failing on Windows (2 out of 5 runs):

Thanks for testing!
Sorry to hear about the failures. Does it also fail without my patch? You might be observing a separate issue.

Looking at the stack trace, Events.hasEvents(events); fails in your example below. There seems to be something very wrong with JFR event generation and/or capturing on the machine you tested.

Best regards, Matthias


> 
> Hi Matthias,
> 
> The test is sometimes failing on Windows (2 out of 5 runs):
> 
> java.lang.RuntimeException: No events: expected false, was true
> 	at jdk.test.lib.Asserts.fail(Asserts.java:594)
> 	at jdk.test.lib.Asserts.assertFalse(Asserts.java:461)
> 	at jdk.test.lib.jfr.Events.hasEvents(Events.java:158)
> 	at jdk.jfr.event.runtime.TestNetworkUtilizationEvent.main(TestNetworkUtilizationEvent.java:98)
> 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.base/java.lang.reflect.Method.invoke(Method.java:564)
> 	at com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:127)
> 	at java.base/java.lang.Thread.run(Thread.java:830)
> 
> The main output shows we are duplicating the write to the loopback
> address and I think we're trying to write to too many interfaces:
> 
> ----------System.out:(12/660)----------
> [0.796s][trace][jfr,event] Reporting network utilization
> [0.811s][trace][jfr,event] Reporting network utilization
> InetAddress.getLoopbackAddress :localhost/127.0.0.1 host address:127.0.0.1
> Sending to InetAddress:/127.0.0.1
> Sending to InetAddress:/0:0:0:0:0:0:0:1
> Sending to InetAddress:/<IPv4 address>
> Sending to InetAddress:/<IPv6 addr>%eth4
> Sending to InetAddress:/<IPv6 addr>
> Sending to InetAddress:/<IPv6 add>%net5
> [6.943s][trace][jfr,event] Reporting network utilization
> [6.950s][trace][jfr,event] Reporting network utilization
> [6.957s][trace][jfr,event] Reporting network utilization
> 
> On a passing test I see:
> 
> [6.947s][trace][jfr,event] Reporting network utilization
> [6.947s][trace][jfr,event] found data for NetworkInterface Oracle VirtIO Ethernet Adapter (read_rate 19, write_rate 10)
> [6.952s][trace][jfr,event] Reporting network utilization
> [6.960s][trace][jfr,event] Reporting network utilization
> jdk.NetworkUtilization {
>    startTime = 00:36:46.904
>    networkInterface = "Oracle VirtIO Ethernet Adapter"
>    readRate = 152 bps
>    writeRate = 80 bps
> }
> 
> but I have no idea to which of the 6 InetAddress entries this corresponds.
> 
> David
> 
> On 29/09/2019 10:17 am, David Holmes wrote:
> > Hi Matthias,
> >
> > On 27/09/2019 8:56 pm, Baesken, Matthias wrote:
> >> Hi David / Mikhailo, I adjusted the test a bit more and also
> >> added (and enabled) UL-based jfr,event tracing in
> >> src/hotspot/share/jfr/periodic/jfrNetworkUtilization.cpp
> >> to better see the recorded event information.
> >>
> >> The current revision
> >>
> >> http://cr.openjdk.java.net/~mbaesken/webrevs/8229370.3/
> >>
> >> sends DatagramPackets to all InetAddresses of all
> >> network interfaces of the machine.
> >> I observed that on our "problematic" machine, where the test fails,
> >> we still need a little delay to see the read/write counters
> >> (fetched by os_perf and then used in JFR)
> >> increase (that’s why I wait a bit before every
> >> send operation).
> >>
> >> Could you please check 8229370.3 in your infrastructure as well,
> >> where you noticed sporadic failures in
> >> jdk/jfr/event/runtime/TestNetworkUtilizationEvent.java, and tell me
> >> about the results?
> >
> > I've submitted a test run to our system.
> >
> > I'm unclear about the details of the test. Does this:
> >
> >   77         Stream<InetAddress> si = NetworkInterface.networkInterfaces().flatMap(NetworkInterface::inetAddresses);
> >
> >
> > not also return the loopback address that was already tested? Could it
> > return interfaces that we really don't want to be trying to test?
> >
> >   88             } catch(IOException ioe) {
> >   89             }
> >
> > Why are we silently swallowing exceptions here?
> >
> > Thanks,
> > David
> >
> >>
> >> Best regards, Matthias
> >>
> >>
> >>> Subject: Re: RFR [XS]: 8229370: make
> >>> jdk/jfr/event/runtime/TestNetworkUtilizationEvent.java more stable
> >>>
> >>> Hi Matthias,
> >>>
> >>> On 24/09/2019 12:23 am, Baesken, Matthias wrote:
> >>>> Hi David / Mikhailo, I was busy with other tasks but today got
> >>>> back to 8229370.
> >>>>
> >>>> I noticed that in the meantime the test was excluded with
> >>>>
> >>>> https://bugs.openjdk.java.net/browse/JDK-8230115
> >>>>
> >>>> "Problemlist JFR TestNetworkUtilization test"
> >>>>
> >>>>
> >>>> Do you think we should still rely on the OS counters and expect
> >>>> to get 2+ network interfaces, or keep the test excluded (or just
> >>>> relax the check to 1+ network interfaces on Linux)?
> >>>
> >>> Exclusion is just a temporary measure to clean up the testing results,
> >>> so this still needs to be fixed. I have nothing further to add from my
> >>> comments in the bug:
> >>>
> >>>   > So it should be as simple as changing 10.0.0.0:12345 into something
> >>>   > guaranteed to work?
> >>>   >
> >>>   > I think this needs to be looked at by the JFR folk and net-dev
> >>>   > folk to come up with a valid testing scenario.
> >>>
> >>> It's not the number of interfaces that is the issue, it is generating
> >>> traffic on the real interface.
> >>>
> >>> Thanks,
> >>> David
> >>>
> >>>>
> >>>> Best regards, Matthias
> >>>>
> >>>>
> >>>>
> >>>>>
> >>>>> On 29/08/2019 12:24 am, Baesken, Matthias wrote:
> >>>>>> Hi David, I could add some optional UL logging to see
> >>>>>> what happens.
> >>>>>
> >>>>> I just want to see more visibility at the test level to ensure it is
> >>>>> finding the interfaces and addresses I would expect it to find.
> >>>>>
> >>>>> David
> >>>>>
> >>>>>> Maybe the OS counters that are fetched by os_perf are not
> >>>>>> that reliable on some kernels.
> >>>>>>
> >>>>>>
> >>>>>> Best regards, Matthias
> >>>>>>
> >>>>

