RFR(S): 8223217: [TESTBUG] Create JFR tests with JMX across container boundary

Leonid Mesnik leonid.mesnik at oracle.com
Fri Feb 21 04:09:06 UTC 2020


The latest version seems fine to me. 2 small nits.

http://cr.openjdk.java.net/~mseledtsov/8223217.04/test/hotspot/jtreg/containers/docker/TestJFRWithJMX.java.html

71     static AtomicReference<String> ipAddr = new AtomicReference();

It makes sense to make ipAddr final to prevent some issues during 
editing test. It wouldn't be possible to set it to null occasionally. 
Also IDE might warn about mistakes like "if (iAddr == null) {"  knowing 
that ipAddr couldn't be updated.



218         try (FileOutputStream fos = new FileOutputStream(f);
219              BufferedOutputStream bos = new BufferedOutputStream(fos)) {

The indentation of line 219 is wrong (at least in webrev). And better to 
replace it with one line. bos.close() also close FOS.

218         try (BufferedOutputStream bos = new BufferedOutputStream(new FileOutputStream(f)) {


Leonid
On 2/20/20 4:37 PM, mikhailo.seledtsov at oracle.com wrote:
> Thank you. I will update per your suggestion.
>
> On 2/20/20 4:36 PM, Erik Gahlin wrote:
>>
>> Looks good.
>>
>> I think null could be avoided all together and instead have something 
>> like this:
>>
>> static JMXConnector waitForJmxConnection(String host, int port) 
>> throws Exception {
>>         String urlPath = "/jndi/rmi://" + host + ":" + port + "/jmxrmi";
>>         JMXServiceURL url = new JMXServiceURL("rmi", "", 0, urlPath);
>>         while (true) {
>>             try {
>>                 return JMXConnectorFactory.connect(url);
>>             } catch (IOException e) {
>>                 System.out.println("establishJmxConnection() thrown 
>> IOException: " + e.getMessage());
>>             }
>>             Thread.sleep(1000);
>>         }
>>     }
>>
>> No need to send updated webrev if you change into the above.
>>
>> Thanks
>> Erik
>>
>>
>>
>> On 2020-02-21 01:20, mikhailo.seledtsov at oracle.com wrote:
>>>
>>> Hi Erik
>>>
>>>   Thank you for review. See my comments inline:
>>>
>>> On 2/20/20 3:48 PM, Erik Gahlin wrote:
>>>> Hi Misha,
>>>>
>>>> Overall looks good.
>>>>
>>>> Does the consumer used with ProcessTools.startProcess get output 
>>>> one line at a time? If so, there is no risk that the test reads 
>>>> half an IP address.
>>> Yes, it does.
>>>>
>>>> In waitForJmxConnection you loop until the connector is not null. 
>>>> This seems incorrect. I would expect JMXCinnectoryFactory to throw 
>>>> an exception on failure, not return null. You may want to catch the 
>>>> exception and retry.
>>>>
>>> I have updated the establishJmxConnection() method with try/catch, 
>>> returning null in case if IOException occurs.
>>>
>>>> Not sure if it matters, but if the test fails with an exception, 
>>>> for example in the file transfer,  killContainer(containerName) 
>>>> will not run and get cleaned up? Is that a problem? Try-finally 
>>>> would ensure that it is always run.
>>>
>>> Good catch. It is always good to cleanup resources, otherwise 
>>> orphaned container will keep running. I added try/finally as you 
>>> recommend.
>>>
>>> Here is the updated webrev: 
>>> http://cr.openjdk.java.net/~mseledtsov/8223217.04/
>>>
>>>
>>> Thanks,
>>>
>>> Misha
>>>
>>>>
>>>> Thanks
>>>> Erik
>>>>
>>>>> On 21 Feb 2020, at 00:23, mikhailo.seledtsov at oracle.com 
>>>>> <mailto:mikhailo.seledtsov at oracle.com> wrote:
>>>>>
>>>>> I have resumed working on this issue, after some break for other 
>>>>> activities.
>>>>>
>>>>> Here is the latest webrev: 
>>>>> http://cr.openjdk.java.net/~mseledtsov/8223217.03/
>>>>>
>>>>> Please review.
>>>>>
>>>>> Summary of changes since last webrev:
>>>>>
>>>>>   - simplified the observee "EventProducer.java" to a minimum: it 
>>>>> communicates the self IP address, then produces events in a loop
>>>>>
>>>>>   - the main test class TestJFRWithJMX.java now controls the 
>>>>> observee; when done recording, it transfers the recording over JMX 
>>>>> and kills the observee container
>>>>>
>>>>>   - general cleanup
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Misha
>>>>>
>>>>> *
>>>>> *
>>>>>
>>>>>
>>>>> On 12/10/19 6:27 AM, Erik Gahlin wrote:
>>>>>> Hi Misha,
>>>>>>
>>>>>> On 2019-12-05 19:06, mikhailo.seledtsov at oracle.com wrote:
>>>>>>> Hi Erik,
>>>>>>>
>>>>>>> On 12/4/19 11:30 PM, Erik Gahlin wrote:
>>>>>>>> Hi Misha,
>>>>>>>>
>>>>>>>> The recording may complete before:
>>>>>>>>
>>>>>>>> while (r.getRecordings().get(0).getState() != 
>>>>>>>> RecordingState.STOPPED)
>>>>>>>>
>>>>>>>> is reached, or the recording may mot even be created (added to 
>>>>>>>> the list). This will result in an IndexOutOfBoundsException.
>>>>>>> Right. I did not account for that.
>>>>>>>>
>>>>>>>> In my example I worked around it by looping over the list 
>>>>>>>> (possibly empty) and by not closing the recording I could be 
>>>>>>>> sure that the STOPPED state will eventually be seen. When the 
>>>>>>>> recording is closed it will disappear from the list.
>>>>>>>
>>>>>>> I will use your example then, with one change. Instead of 
>>>>>>> existing the test once (r.getState() == RecordingState.STOPPED) 
>>>>>>> is true, the test will break. If child process exits right away, 
>>>>>>> this will result in closing of the container, depriving the main 
>>>>>>> process the opportunity to transfer the recording. 
>>>>>>
>>>>>> Agree, it will not work with transfers.
>>>>>>
>>>>>>> Perhaps, I should break instead of System.exit(), and then have 
>>>>>>> another loop to wait for all the recordings to close 
>>>>>>> (r.getRecordings() become empty). After stopping the recording 
>>>>>>> the main process transfers the recordings to the "host" system 
>>>>>>> from the container, and then closes the recording. This will 
>>>>>>> signal the "child/observed" process to "exit". E.g.:
>>>>>>>
>>>>>>>         // Wait for recording to be transferred and be closed
>>>>>>>         while 
>>>>>>> (!FlightRecorder.getFlightRecorder().getRecordings().isEmpty()) {
>>>>>>>             Thread.sleep(100);
>>>>>>>         }
>>>>>>>
>>>>>>>         System.out.println("Leaving main()");
>>>>>>>
>>>>>>> Let me know if you see any problems with this approach.
>>>>>>
>>>>>> The transfer may complete before the child process sees the 
>>>>>> STOPPED state, which means it will be stuck in the first loop. I 
>>>>>> think you need a state(i.e a second recording or some other 
>>>>>> mechanism)  that is not associated with the recording that you 
>>>>>> want to operate on.
>>>>>>
>>>>>> Erik
>>>>>>
>>>>>>>
>>>>>>> Thank you,
>>>>>>>
>>>>>>> Misha
>>>>>>>
>>>>>>>> If you think the test recording should be closed, you could 
>>>>>>>> find some other means to communicate to the child process. For 
>>>>>>>> example, create two new recordings when the test recording has 
>>>>>>>> been closed and then in the loop check the count and if it is 
>>>>>>>> two exit the application.
>>>>>>>>
>>>>>>>> Erik
>>>>>>>>
>>>>>>>>> On 5 Dec 2019, at 05:53, mikhailo.seledtsov at oracle.com wrote:
>>>>>>>>>
>>>>>>>>> Hi Erik,
>>>>>>>>>      I have followed your suggestions, but have modified the 
>>>>>>>>> code a bit in the event producer class from
>>>>>>>>>      your original. First the child "event producer" process 
>>>>>>>>> waits for the JFR system/recorder to be initialized,
>>>>>>>>>      then waits for recording to stop, and after that waiting 
>>>>>>>>> for recording(s) to close; the main test process
>>>>>>>>>      will transfer recording first, then close it.
>>>>>>>>>
>>>>>>>>>      Updated webrev: 
>>>>>>>>> http://cr.openjdk.java.net/~mseledtsov/8223217.02/
>>>>>>>>>
>>>>>>>>>      I ran the test 20 times in our test lab, all PASS
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thank you,
>>>>>>>>> Misha
>>>>>>>>>
>>>>>>>>> On 12/4/19 10:02 AM, Mikhailo Seledtsov wrote:
>>>>>>>>>> Hi Erik,
>>>>>>>>>>
>>>>>>>>>>    Thank you for reviewing the test, and thanks for 
>>>>>>>>>> recommendations. I will make the updates according to your 
>>>>>>>>>> recommendations, and will post updated webrev after some 
>>>>>>>>>> testing.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Misha
>>>>>>>>>>
>>>>>>>>>> On 12/3/19, 11:46 AM, Erik Gahlin wrote:
>>>>>>>>>>> Hi Misha,
>>>>>>>>>>>
>>>>>>>>>>> Looks good overall, but it seems brittle to expect that the 
>>>>>>>>>>> test will finish within the 10 s the EvenyGenerator runs
>>>>>>>>>>>
>>>>>>>>>>> Maybe you could make it deterministic by (for example) 
>>>>>>>>>>> checking if the recording has stopped:
>>>>>>>>>>>
>>>>>>>>>>> int i = 0;
>>>>>>>>>>> while (true) {
>>>>>>>>>>>      SimpleEvent event = new SimpleEvent();
>>>>>>>>>>>      event.msg = "Hello";
>>>>>>>>>>>      event.count = i++;
>>>>>>>>>>>      event.commit();
>>>>>>>>>>>      Thread.sleep(10);
>>>>>>>>>>>      if (FlightRecorder.isInitialized()) {
>>>>>>>>>>>          for (Recording r : 
>>>>>>>>>>> FlightRecorder.getFlightRecorder().getRecordings()) {
>>>>>>>>>>>              if (r.getState() == RecordingState.STOPPED) {
>>>>>>>>>>>                  System.exit(0);
>>>>>>>>>>>              }
>>>>>>>>>>>          }
>>>>>>>>>>>      }
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> and then not close the recording in the test?
>>>>>>>>>>>
>>>>>>>>>>> The test could also be made a bit more realistic if you run 
>>>>>>>>>>> a default recording
>>>>>>>>>>>
>>>>>>>>>>> bean.setPredefinedConfiguration(recordingId, "default");
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>> Erik
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 2019-12-03 05:06, mikhailo.seledtsov at oracle.com wrote:
>>>>>>>>>>>> Please review this change that tests JFR operations across 
>>>>>>>>>>>> container boundary while using JMX to control JFR.
>>>>>>>>>>>> The test starts a container running a simple event 
>>>>>>>>>>>> producer, establishes a JMX connection, uses 
>>>>>>>>>>>> FlightRecorderMXBean to start the recording, stop and dump 
>>>>>>>>>>>> the recording, then transfers recording from the container.
>>>>>>>>>>>> Simple verification is performed on the transferred 
>>>>>>>>>>>> recording, making sure that the expected test event is 
>>>>>>>>>>>> present.
>>>>>>>>>>>>
>>>>>>>>>>>>      JBS: https://bugs.openjdk.java.net/browse/JDK-8223217
>>>>>>>>>>>>      Webrev: 
>>>>>>>>>>>> http://cr.openjdk.java.net/~mseledtsov/8223217.01/
>>>>>>>>>>>>      Testing:
>>>>>>>>>>>>          1. Ran the newly created test at least 30 times on 
>>>>>>>>>>>> a Linux-x64 host with docker engine configured - PASSED
>>>>>>>>>>>>          2. Ran the test at least 20 times on a test 
>>>>>>>>>>>> cluster: ALL PASS
>>>>>>>>>>>>          3. Ran all hotspot/containers tests on the test 
>>>>>>>>>>>> cluster: PASS
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Thank you,
>>>>>>>>>>>> Misha
>>>>>>>>>>>>
>>>>


More information about the hotspot-runtime-dev mailing list