RFR [XS]: 8229370: make jdk/jfr/event/runtime/TestNetworkUtilizationEvent.java more stable

David Holmes david.holmes at oracle.com
Wed Oct 2 03:17:20 UTC 2019


Hi Matthias,

On 30/09/2019 8:14 pm, Baesken, Matthias wrote:
>>
>> I'm unclear about the details of the test. Does this:
>>   77         Stream<InetAddress> si = NetworkInterface.networkInterfaces().flatMap(NetworkInterface::inetAddresses);
>> not also return the loopback address that was already tested? Could it
>> return interfaces that we really don't want to be trying to test?
> 
> Hi David,
> Yes, we are sending to all InetAddresses of all adapters (at least the ones that are not in status DOWN; I noticed that the java.net JDK classes omit those on Linux).
> I think it is not a bad idea to send to all of them to hit the "right" one, but maybe the original test owners might comment on this.

I don't see any point in sending explicitly to the loopback address and 
then having that repeated when you loop through all the interfaces.

I don't know enough about the pseudo/virtual adapters to know whether 
including them makes sense.
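
As a rough sketch of the alternative (not the webrev's code), the enumeration could skip loopback interfaces and drop duplicate addresses before anything is sent. The class and method names here are made up for illustration. Note that NetworkInterface.isVirtual() only reports sub-interfaces, so it may well not catch every pseudo adapter a hypervisor or VPN installs; treating it as a sufficient filter is an assumption.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.stream.Stream;

public class CandidateAddresses {

    // Keep only interfaces that are up, not loopback and not a sub-interface.
    public static boolean isUsable(NetworkInterface nif) {
        try {
            return nif.isUp() && !nif.isLoopback() && !nif.isVirtual();
        } catch (SocketException e) {
            // Report the problem instead of hiding it, then skip the interface.
            System.out.println("Skipping " + nif.getName() + ": " + e);
            return false;
        }
    }

    // Drop loopback addresses and duplicates, preserving encounter order.
    public static Set<InetAddress> distinctNonLoopback(Stream<InetAddress> addresses) {
        Set<InetAddress> result = new LinkedHashSet<>();
        addresses.filter(a -> !a.isLoopbackAddress()).forEach(result::add);
        return result;
    }

    // Addresses of all usable interfaces on this machine.
    public static Set<InetAddress> candidates() throws SocketException {
        return distinctNonLoopback(
            NetworkInterface.networkInterfaces()
                .filter(CandidateAddresses::isUsable)
                .flatMap(NetworkInterface::inetAddresses));
    }

    public static void main(String[] args) throws SocketException {
        candidates().forEach(a -> System.out.println("Would send to " + a));
    }
}
```

The loopback address would then be exercised once, explicitly, and not again via the interface loop.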

>    88             } catch(IOException ioe) {
>    89             }
> 
>> Why are we silently swallowing exceptions here?
> 
> I agree, we should at least give some output for this case of send failures.
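
For illustration only, a minimal sketch of a send loop that reports each IOException instead of swallowing it. The Sender interface and the class name are hypothetical and not part of the test; port 12345 is simply the one mentioned in the bug comments.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class LoggedSend {

    /** Hypothetical abstraction so the send step can be swapped out. */
    public interface Sender {
        void send(InetAddress target) throws IOException;
    }

    // Tries every target, logs each failure and returns the failure count.
    public static int sendToAll(List<InetAddress> targets, Sender sender) {
        int failures = 0;
        for (InetAddress target : targets) {
            try {
                System.out.println("Sending to InetAddress:" + target);
                sender.send(target);
            } catch (IOException ioe) {
                failures++;
                System.out.println("Send to " + target + " failed: " + ioe);
            }
        }
        return failures;
    }

    // Sends one small UDP datagram to the target on port 12345.
    public static void udpSend(InetAddress target) throws IOException {
        byte[] payload = "hello".getBytes(StandardCharsets.UTF_8);
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.send(new DatagramPacket(payload, payload.length, target, 12345));
        }
    }

    public static void main(String[] args) {
        int failures = sendToAll(
            List.of(InetAddress.getLoopbackAddress()), LoggedSend::udpSend);
        System.out.println("Failed sends: " + failures);
    }
}
```

With the failures visible in the test output it becomes possible to tell a send problem apart from a JFR problem.
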
> 
>> The test is sometimes failing on Windows (2 out of 5 runs):
> 
> Thanks for testing!
> Bad to hear about the failures. Is it failing without my patch too? It might be a separate issue that you observe.

It's hard to run the test on the exact same machines. The point is that 
this "more stable" test is still failing.

> Looking at the stack trace, Events.hasEvents(events); fails in your example below. There seems to be something very wrong with the JFR event generation and/or capturing on the machine you test.

Then we need the JFR folk to chime in and see why we're not getting the 
expected events.
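
One way to look at this outside the jtreg harness might be a small standalone probe using the public jdk.jfr API. This is a sketch under assumptions: the class name, the 100 ms period and the 3 second duration are arbitrary choices, and whether any events show up depends on the machine's counters, which is exactly the open question.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.util.List;
import java.util.stream.Collectors;
import jdk.jfr.Recording;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

public class NetUtilProbe {
    static final String EVENT = "jdk.NetworkUtilization";

    // Records for the given duration and returns only the network events.
    public static List<RecordedEvent> record(Duration howLong)
            throws IOException, InterruptedException {
        Path file = Files.createTempFile("netutil", ".jfr");
        try (Recording recording = new Recording()) {
            // Period chosen arbitrarily for this sketch.
            recording.enable(EVENT).withPeriod(Duration.ofMillis(100));
            recording.start();
            Thread.sleep(howLong.toMillis());
            recording.stop();
            recording.dump(file);
        }
        try {
            // Other bookkeeping events may be present, so filter by name.
            return RecordingFile.readAllEvents(file).stream()
                .filter(e -> e.getEventType().getName().equals(EVENT))
                .collect(Collectors.toList());
        } finally {
            Files.deleteIfExists(file);
        }
    }

    public static void main(String[] args) throws Exception {
        List<RecordedEvent> events = record(Duration.ofSeconds(3));
        System.out.println("Got " + events.size() + " " + EVENT + " events");
        events.forEach(System.out::println);
    }
}
```

An empty list on an otherwise busy machine would point at the event generation side and not at the test's send logic.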

Thanks,
David

> Best regards, Matthias
> 
> 
>>
>> Hi Matthias,
>>
>> The test is sometimes failing on Windows (2 out of 5 runs):
>>
>> java.lang.RuntimeException: No events: expected false, was true
>> 	at jdk.test.lib.Asserts.fail(Asserts.java:594)
>> 	at jdk.test.lib.Asserts.assertFalse(Asserts.java:461)
>> 	at jdk.test.lib.jfr.Events.hasEvents(Events.java:158)
>> 	at jdk.jfr.event.runtime.TestNetworkUtilizationEvent.main(TestNetworkUtilizationEvent.java:98)
>> 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>> 	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> 	at java.base/java.lang.reflect.Method.invoke(Method.java:564)
>> 	at com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:127)
>> 	at java.base/java.lang.Thread.run(Thread.java:830)
>>
>> The main output shows we are duplicating the write to the loopback
>> address and I think we're trying to write to too many interfaces:
>>
>> ----------System.out:(12/660)----------
>> [0.796s][trace][jfr,event] Reporting network utilization
>> [0.811s][trace][jfr,event] Reporting network utilization
>> InetAddress.getLoopbackAddress :localhost/127.0.0.1 host address:127.0.0.1
>> Sending to InetAddress:/127.0.0.1
>> Sending to InetAddress:/0:0:0:0:0:0:0:1
>> Sending to InetAddress:/<IPv4 address>
>> Sending to InetAddress:/<IPv6 addr>%eth4
>> Sending to InetAddress:/<IPv6 addr>
>> Sending to InetAddress:/<IPv6 add>%net5
>> [6.943s][trace][jfr,event] Reporting network utilization
>> [6.950s][trace][jfr,event] Reporting network utilization
>> [6.957s][trace][jfr,event] Reporting network utilization
>>
>> On a passing test I see:
>>
>> [6.947s][trace][jfr,event] Reporting network utilization
>> [6.947s][trace][jfr,event] found data for NetworkInterface Oracle VirtIO Ethernet Adapter (read_rate 19, write_rate 10)
>> [6.952s][trace][jfr,event] Reporting network utilization
>> [6.960s][trace][jfr,event] Reporting network utilization
>> jdk.NetworkUtilization {
>>     startTime = 00:36:46.904
>>     networkInterface = "Oracle VirtIO Ethernet Adapter"
>>     readRate = 152 bps
>>     writeRate = 80 bps
>> }
>>
>> but I have no idea to which of the 6 InetAddress entries this corresponds.
>>
>> David
>>
>> On 29/09/2019 10:17 am, David Holmes wrote:
>>> Hi Matthias,
>>>
>>> On 27/09/2019 8:56 pm, Baesken, Matthias wrote:
>>>> Hi David / Mikhailo, I adjusted the test a bit more, and also
>>>> added (+enabled) UL-based jfr,event tracing in
>>>> src/hotspot/share/jfr/periodic/jfrNetworkUtilization.cpp
>>>> to better see the recorded event information.
>>>>
>>>> The current revision
>>>>
>>>> http://cr.openjdk.java.net/~mbaesken/webrevs/8229370.3/
>>>>
>>>> sends DatagramPackets to all InetAddresses of all
>>>> network interfaces of the machine.
>>>> I observed that on our "problematic" machine where the test fails
>>>> we still need a little delay to see the read / write counters
>>>> (fetched by os_perf and then used in the JFR)
>>>> increase on the machine (that’s why I wait a bit before every
>>>> send operation).
>>>>
>>>> Could you please check 8229370.3 also in your infrastructure
>>>> where you noticed sporadic failures in
>>>> jdk/jfr/event/runtime/TestNetworkUtilizationEvent.java and tell me
>>>> about the results?
>>>
>>> I've submitted a test run to our system.
>>>
>>> I'm unclear about the details of the test. Does this:
>>>
>>>    77         Stream<InetAddress> si = NetworkInterface.networkInterfaces().flatMap(NetworkInterface::inetAddresses);
>>>
>>>
>>> not also return the loopback address that was already tested? Could it
>>> return interfaces that we really don't want to be trying to test?
>>>
>>>    88             } catch(IOException ioe) {
>>>    89             }
>>>
>>> Why are we silently swallowing exceptions here?
>>>
>>> Thanks,
>>> David
>>>
>>>>
>>>> Best regards, Matthias
>>>>
>>>>
>>>>> Subject: Re: RFR [XS]: 8229370: make
>>>>> jdk/jfr/event/runtime/TestNetworkUtilizationEvent.java more stable
>>>>>
>>>>> Hi Matthias,
>>>>>
>>>>> On 24/09/2019 12:23 am, Baesken, Matthias wrote:
>>>>>> Hi David / Mikhailo, I was busy with other tasks but today got
>>>>>> back to 8229370.
>>>>>>
>>>>>> I noticed that in the meantime the test was excluded with
>>>>>>
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8230115
>>>>>>
>>>>>> "Problemlist JFR TestNetworkUtilization test"
>>>>>>
>>>>>>
>>>>>> Do you think we should still rely on the OS counters and expect
>>>>>> to get 2+ network interfaces, or keep the test excluded (or just
>>>>>> relax the check and check for 1+ network interfaces on Linux)?
>>>>>
>>>>> Exclusion is just a temporary measure to clean up the testing results,
>>>>> so this still needs to be fixed. I have nothing further to add from my
>>>>> comments in the bug:
>>>>>
>>>>>    > So it should be as simple as changing 10.0.0.0:12345 into something
>>>>>    > guaranteed to work?
>>>>>    >
>>>>>    > I think this needs to be looked at by the JFR folk and net-dev
>>>>>    > folk to come up with a valid testing scenario.
>>>>>
>>>>> It's not the number of interfaces that is the issue, it is generating
>>>>> traffic on the real interface.
>>>>>
>>>>> Thanks,
>>>>> David
>>>>>
>>>>>>
>>>>>> Best regards, Matthias
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> On 29/08/2019 12:24 am, Baesken, Matthias wrote:
>>>>>>>> Hi David, I could add some optional UL logging to see
>>>>>>>> what happens.
>>>>>>>
>>>>>>> I just want to see more visibility at the test level to ensure it is
>>>>>>> finding the interfaces and addresses I would expect it to find.
>>>>>>>
>>>>>>> David
>>>>>>>
>>>>>>>> Maybe the OS counters that are fetched by os_perf are not
>>>>>>>> that reliable on some kernels.
>>>>>>>>
>>>>>>>>
>>>>>>>> Best regards, Matthias
>>>>>>>>
>>>>>>


More information about the hotspot-jfr-dev mailing list