RFR(S): 8223217: [TESTBUG] Create JFR tests with JMX across container boundary

mikhailo.seledtsov at oracle.com mikhailo.seledtsov at oracle.com
Fri Feb 21 00:37:46 UTC 2020


Thank you. I will update per your suggestion.

On 2/20/20 4:36 PM, Erik Gahlin wrote:
>
> Looks good.
>
> I think null could be avoided all together and instead have something 
> like this:
>
> static JMXConnector waitForJmxConnection(String host, int port) throws 
> Exception {
>         String urlPath = "/jndi/rmi://" + host + ":" + port + "/jmxrmi";
>         JMXServiceURL url = new JMXServiceURL("rmi", "", 0, urlPath);
>         while (true) {
>             try {
>                 return JMXConnectorFactory.connect(url);
>             } catch (IOException e) {
>                 System.out.println("establishJmxConnection() thrown 
> IOException: " + e.getMessage());
>             }
>             Thread.sleep(1000);
>         }
>     }
>
> No need to send updated webrev if you change into the above.
>
> Thanks
> Erik
>
>
>
> On 2020-02-21 01:20, mikhailo.seledtsov at oracle.com wrote:
>>
>> Hi Erik
>>
>>   Thank you for review. See my comments inline:
>>
>> On 2/20/20 3:48 PM, Erik Gahlin wrote:
>>> Hi Misha,
>>>
>>> Overall looks good.
>>>
>>> Does the consumer used with ProcessTools.startProcess get output one 
>>> line at a time? If so, there is no risk that the test reads half an 
>>> IP address.
>> Yes, it does.
>>>
>>> In waitForJmxConnection you loop until the connector is not null. 
>>> This seems incorrect. I would expect JMXCinnectoryFactory to throw 
>>> an exception on failure, not return null. You may want to catch the 
>>> exception and retry.
>>>
>> I have updated the establishJmxConnection() method with try/catch, 
>> returning null in case if IOException occurs.
>>
>>> Not sure if it matters, but if the test fails with an exception, for 
>>> example in the file transfer,  killContainer(containerName) will not 
>>> run and get cleaned up? Is that a problem? Try-finally would ensure 
>>> that it is always run.
>>
>> Good catch. It is always good to cleanup resources, otherwise 
>> orphaned container will keep running. I added try/finally as you 
>> recommend.
>>
>> Here is the updated webrev: 
>> http://cr.openjdk.java.net/~mseledtsov/8223217.04/
>>
>>
>> Thanks,
>>
>> Misha
>>
>>>
>>> Thanks
>>> Erik
>>>
>>>> On 21 Feb 2020, at 00:23, mikhailo.seledtsov at oracle.com 
>>>> <mailto:mikhailo.seledtsov at oracle.com> wrote:
>>>>
>>>> I have resumed working on this issue, after some break for other 
>>>> activities.
>>>>
>>>> Here is the latest webrev: 
>>>> http://cr.openjdk.java.net/~mseledtsov/8223217.03/
>>>>
>>>> Please review.
>>>>
>>>> Summary of changes since last webrev:
>>>>
>>>>   - simplified the observee "EventProducer.java" to a minimum: it 
>>>> communicates the self IP address, then produces events in a loop
>>>>
>>>>   - the main test class TestJFRWithJMX.java now controls the 
>>>> observee; when done recording, it transfers the recording over JMX 
>>>> and kills the observee container
>>>>
>>>>   - general cleanup
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Misha
>>>>
>>>> *
>>>> *
>>>>
>>>>
>>>> On 12/10/19 6:27 AM, Erik Gahlin wrote:
>>>>> Hi Misha,
>>>>>
>>>>> On 2019-12-05 19:06, mikhailo.seledtsov at oracle.com wrote:
>>>>>> Hi Erik,
>>>>>>
>>>>>> On 12/4/19 11:30 PM, Erik Gahlin wrote:
>>>>>>> Hi Misha,
>>>>>>>
>>>>>>> The recording may complete before:
>>>>>>>
>>>>>>> while (r.getRecordings().get(0).getState() != 
>>>>>>> RecordingState.STOPPED)
>>>>>>>
>>>>>>> is reached, or the recording may mot even be created (added to 
>>>>>>> the list). This will result in an IndexOutOfBoundsException.
>>>>>> Right. I did not account for that.
>>>>>>>
>>>>>>> In my example I worked around it by looping over the list 
>>>>>>> (possibly empty) and by not closing the recording I could be 
>>>>>>> sure that the STOPPED state will eventually be seen. When the 
>>>>>>> recording is closed it will disappear from the list.
>>>>>>
>>>>>> I will use your example then, with one change. Instead of 
>>>>>> existing the test once (r.getState() == RecordingState.STOPPED) 
>>>>>> is true, the test will break. If child process exits right away, 
>>>>>> this will result in closing of the container, depriving the main 
>>>>>> process the opportunity to transfer the recording. 
>>>>>
>>>>> Agree, it will not work with transfers.
>>>>>
>>>>>> Perhaps, I should break instead of System.exit(), and then have 
>>>>>> another loop to wait for all the recordings to close 
>>>>>> (r.getRecordings() become empty). After stopping the recording 
>>>>>> the main process transfers the recordings to the "host" system 
>>>>>> from the container, and then closes the recording. This will 
>>>>>> signal the "child/observed" process to "exit". E.g.:
>>>>>>
>>>>>>         // Wait for recording to be transferred and be closed
>>>>>>         while 
>>>>>> (!FlightRecorder.getFlightRecorder().getRecordings().isEmpty()) {
>>>>>>             Thread.sleep(100);
>>>>>>         }
>>>>>>
>>>>>>         System.out.println("Leaving main()");
>>>>>>
>>>>>> Let me know if you see any problems with this approach.
>>>>>
>>>>> The transfer may complete before the child process sees the 
>>>>> STOPPED state, which means it will be stuck in the first loop. I 
>>>>> think you need a state(i.e a second recording or some other 
>>>>> mechanism)  that is not associated with the recording that you 
>>>>> want to operate on.
>>>>>
>>>>> Erik
>>>>>
>>>>>>
>>>>>> Thank you,
>>>>>>
>>>>>> Misha
>>>>>>
>>>>>>> If you think the test recording should be closed, you could find 
>>>>>>> some other means to communicate to the child process. For 
>>>>>>> example, create two new recordings when the test recording has 
>>>>>>> been closed and then in the loop check the count and if it is 
>>>>>>> two exit the application.
>>>>>>>
>>>>>>> Erik
>>>>>>>
>>>>>>>> On 5 Dec 2019, at 05:53, mikhailo.seledtsov at oracle.com wrote:
>>>>>>>>
>>>>>>>> Hi Erik,
>>>>>>>>      I have followed your suggestions, but have modified the 
>>>>>>>> code a bit in the event producer class from
>>>>>>>>      your original. First the child "event producer" process 
>>>>>>>> waits for the JFR system/recorder to be initialized,
>>>>>>>>      then waits for recording to stop, and after that waiting 
>>>>>>>> for recording(s) to close; the main test process
>>>>>>>>      will transfer recording first, then close it.
>>>>>>>>
>>>>>>>>      Updated webrev: 
>>>>>>>> http://cr.openjdk.java.net/~mseledtsov/8223217.02/
>>>>>>>>
>>>>>>>>      I ran the test 20 times in our test lab, all PASS
>>>>>>>>
>>>>>>>>
>>>>>>>> Thank you,
>>>>>>>> Misha
>>>>>>>>
>>>>>>>> On 12/4/19 10:02 AM, Mikhailo Seledtsov wrote:
>>>>>>>>> Hi Erik,
>>>>>>>>>
>>>>>>>>>    Thank you for reviewing the test, and thanks for 
>>>>>>>>> recommendations. I will make the updates according to your 
>>>>>>>>> recommendations, and will post updated webrev after some testing.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Misha
>>>>>>>>>
>>>>>>>>> On 12/3/19, 11:46 AM, Erik Gahlin wrote:
>>>>>>>>>> Hi Misha,
>>>>>>>>>>
>>>>>>>>>> Looks good overall, but it seems brittle to expect that the 
>>>>>>>>>> test will finish within the 10 s the EvenyGenerator runs
>>>>>>>>>>
>>>>>>>>>> Maybe you could make it deterministic by (for example) 
>>>>>>>>>> checking if the recording has stopped:
>>>>>>>>>>
>>>>>>>>>> int i = 0;
>>>>>>>>>> while (true) {
>>>>>>>>>>      SimpleEvent event = new SimpleEvent();
>>>>>>>>>>      event.msg = "Hello";
>>>>>>>>>>      event.count = i++;
>>>>>>>>>>      event.commit();
>>>>>>>>>>      Thread.sleep(10);
>>>>>>>>>>      if (FlightRecorder.isInitialized()) {
>>>>>>>>>>          for (Recording r : 
>>>>>>>>>> FlightRecorder.getFlightRecorder().getRecordings()) {
>>>>>>>>>>              if (r.getState() == RecordingState.STOPPED) {
>>>>>>>>>>                  System.exit(0);
>>>>>>>>>>              }
>>>>>>>>>>          }
>>>>>>>>>>      }
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> and then not close the recording in the test?
>>>>>>>>>>
>>>>>>>>>> The test could also be made a bit more realistic if you run a 
>>>>>>>>>> default recording
>>>>>>>>>>
>>>>>>>>>> bean.setPredefinedConfiguration(recordingId, "default");
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>> Erik
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 2019-12-03 05:06, mikhailo.seledtsov at oracle.com wrote:
>>>>>>>>>>> Please review this change that tests JFR operations across 
>>>>>>>>>>> container boundary while using JMX to control JFR.
>>>>>>>>>>> The test starts a container running a simple event producer, 
>>>>>>>>>>> establishes a JMX connection, uses FlightRecorderMXBean to 
>>>>>>>>>>> start the recording, stop and dump the recording, then 
>>>>>>>>>>> transfers recording from the container.
>>>>>>>>>>> Simple verification is performed on the transferred 
>>>>>>>>>>> recording, making sure that the expected test event is present.
>>>>>>>>>>>
>>>>>>>>>>>      JBS: https://bugs.openjdk.java.net/browse/JDK-8223217
>>>>>>>>>>>      Webrev: http://cr.openjdk.java.net/~mseledtsov/8223217.01/
>>>>>>>>>>>      Testing:
>>>>>>>>>>>          1. Ran the newly created test at least 30 times on 
>>>>>>>>>>> a Linux-x64 host with docker engine configured - PASSED
>>>>>>>>>>>          2. Ran the test at least 20 times on a test 
>>>>>>>>>>> cluster: ALL PASS
>>>>>>>>>>>          3. Ran all hotspot/containers tests on the test 
>>>>>>>>>>> cluster: PASS
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thank you,
>>>>>>>>>>> Misha
>>>>>>>>>>>
>>>


More information about the hotspot-jfr-dev mailing list