RFR(S): 8223217: [TESTBUG] Create JFR tests with JMX across container boundary

mikhailo.seledtsov at oracle.com mikhailo.seledtsov at oracle.com
Fri Feb 21 00:20:28 UTC 2020


Hi Erik

   Thank you for review. See my comments inline:

On 2/20/20 3:48 PM, Erik Gahlin wrote:
> Hi Misha,
>
> Overall looks good.
>
> Does the consumer used with ProcessTools.startProcess get output one 
> line at a time? If so, there is no risk that the test reads half an IP 
> address.
Yes, it does.
>
> In waitForJmxConnection you loop until the connector is not null. This 
> seems incorrect. I would expect JMXCinnectoryFactory to throw an 
> exception on failure, not return null. You may want to catch the 
> exception and retry.
>
I have updated the establishJmxConnection() method with try/catch, 
returning null in case if IOException occurs.

> Not sure if it matters, but if the test fails with an exception, for 
> example in the file transfer,  killContainer(containerName) will not 
> run and get cleaned up? Is that a problem? Try-finally would ensure 
> that it is always run.

Good catch. It is always good to cleanup resources, otherwise orphaned 
container will keep running. I added try/finally as you recommend.

Here is the updated webrev: 
http://cr.openjdk.java.net/~mseledtsov/8223217.04/


Thanks,

Misha

>
> Thanks
> Erik
>
>> On 21 Feb 2020, at 00:23, mikhailo.seledtsov at oracle.com 
>> <mailto:mikhailo.seledtsov at oracle.com> wrote:
>>
>> I have resumed working on this issue, after some break for other 
>> activities.
>>
>> Here is the latest webrev: 
>> http://cr.openjdk.java.net/~mseledtsov/8223217.03/
>>
>> Please review.
>>
>> Summary of changes since last webrev:
>>
>>   - simplified the observee "EventProducer.java" to a minimum: it 
>> communicates the self IP address, then produces events in a loop
>>
>>   - the main test class TestJFRWithJMX.java now controls the 
>> observee; when done recording, it transfers the recording over JMX 
>> and kills the observee container
>>
>>   - general cleanup
>>
>>
>> Thanks,
>>
>> Misha
>>
>> *
>> *
>>
>>
>> On 12/10/19 6:27 AM, Erik Gahlin wrote:
>>> Hi Misha,
>>>
>>> On 2019-12-05 19:06, mikhailo.seledtsov at oracle.com wrote:
>>>> Hi Erik,
>>>>
>>>> On 12/4/19 11:30 PM, Erik Gahlin wrote:
>>>>> Hi Misha,
>>>>>
>>>>> The recording may complete before:
>>>>>
>>>>> while (r.getRecordings().get(0).getState() != RecordingState.STOPPED)
>>>>>
>>>>> is reached, or the recording may mot even be created (added to the 
>>>>> list). This will result in an IndexOutOfBoundsException.
>>>> Right. I did not account for that.
>>>>>
>>>>> In my example I worked around it by looping over the list 
>>>>> (possibly empty) and by not closing the recording I could be sure 
>>>>> that the STOPPED state will eventually be seen. When the recording 
>>>>> is closed it will disappear from the list.
>>>>
>>>> I will use your example then, with one change. Instead of existing 
>>>> the test once (r.getState() == RecordingState.STOPPED) is true, the 
>>>> test will break. If child process exits right away, this will 
>>>> result in closing of the container, depriving the main process the 
>>>> opportunity to transfer the recording. 
>>>
>>> Agree, it will not work with transfers.
>>>
>>>> Perhaps, I should break instead of System.exit(), and then have 
>>>> another loop to wait for all the recordings to close 
>>>> (r.getRecordings() become empty). After stopping the recording the 
>>>> main process transfers the recordings to the "host" system from the 
>>>> container, and then closes the recording. This will signal the 
>>>> "child/observed" process to "exit". E.g.:
>>>>
>>>>         // Wait for recording to be transferred and be closed
>>>>         while 
>>>> (!FlightRecorder.getFlightRecorder().getRecordings().isEmpty()) {
>>>>             Thread.sleep(100);
>>>>         }
>>>>
>>>>         System.out.println("Leaving main()");
>>>>
>>>> Let me know if you see any problems with this approach.
>>>
>>> The transfer may complete before the child process sees the STOPPED 
>>> state, which means it will be stuck in the first loop. I think you 
>>> need a state(i.e a second recording or some other mechanism)  that 
>>> is not associated with the recording that you want to operate on.
>>>
>>> Erik
>>>
>>>>
>>>> Thank you,
>>>>
>>>> Misha
>>>>
>>>>> If you think the test recording should be closed, you could find 
>>>>> some other means to communicate to the child process. For example, 
>>>>> create two new recordings when the test recording has been closed 
>>>>> and then in the loop check the count and if it is two exit the 
>>>>> application.
>>>>>
>>>>> Erik
>>>>>
>>>>>> On 5 Dec 2019, at 05:53, mikhailo.seledtsov at oracle.com wrote:
>>>>>>
>>>>>> Hi Erik,
>>>>>>      I have followed your suggestions, but have modified the code 
>>>>>> a bit in the event producer class from
>>>>>>      your original. First the child "event producer" process 
>>>>>> waits for the JFR system/recorder to be initialized,
>>>>>>      then waits for recording to stop, and after that waiting for 
>>>>>> recording(s) to close; the main test process
>>>>>>      will transfer recording first, then close it.
>>>>>>
>>>>>>      Updated webrev: 
>>>>>> http://cr.openjdk.java.net/~mseledtsov/8223217.02/
>>>>>>
>>>>>>      I ran the test 20 times in our test lab, all PASS
>>>>>>
>>>>>>
>>>>>> Thank you,
>>>>>> Misha
>>>>>>
>>>>>> On 12/4/19 10:02 AM, Mikhailo Seledtsov wrote:
>>>>>>> Hi Erik,
>>>>>>>
>>>>>>>    Thank you for reviewing the test, and thanks for 
>>>>>>> recommendations. I will make the updates according to your 
>>>>>>> recommendations, and will post updated webrev after some testing.
>>>>>>>
>>>>>>>
>>>>>>> Misha
>>>>>>>
>>>>>>> On 12/3/19, 11:46 AM, Erik Gahlin wrote:
>>>>>>>> Hi Misha,
>>>>>>>>
>>>>>>>> Looks good overall, but it seems brittle to expect that the 
>>>>>>>> test will finish within the 10 s the EvenyGenerator runs
>>>>>>>>
>>>>>>>> Maybe you could make it deterministic by (for example) checking 
>>>>>>>> if the recording has stopped:
>>>>>>>>
>>>>>>>> int i = 0;
>>>>>>>> while (true) {
>>>>>>>>      SimpleEvent event = new SimpleEvent();
>>>>>>>>      event.msg = "Hello";
>>>>>>>>      event.count = i++;
>>>>>>>>      event.commit();
>>>>>>>>      Thread.sleep(10);
>>>>>>>>      if (FlightRecorder.isInitialized()) {
>>>>>>>>          for (Recording r : 
>>>>>>>> FlightRecorder.getFlightRecorder().getRecordings()) {
>>>>>>>>              if (r.getState() == RecordingState.STOPPED) {
>>>>>>>>                  System.exit(0);
>>>>>>>>              }
>>>>>>>>          }
>>>>>>>>      }
>>>>>>>> }
>>>>>>>>
>>>>>>>> and then not close the recording in the test?
>>>>>>>>
>>>>>>>> The test could also be made a bit more realistic if you run a 
>>>>>>>> default recording
>>>>>>>>
>>>>>>>> bean.setPredefinedConfiguration(recordingId, "default");
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Erik
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2019-12-03 05:06, mikhailo.seledtsov at oracle.com wrote:
>>>>>>>>> Please review this change that tests JFR operations across 
>>>>>>>>> container boundary while using JMX to control JFR.
>>>>>>>>> The test starts a container running a simple event producer, 
>>>>>>>>> establishes a JMX connection, uses FlightRecorderMXBean to 
>>>>>>>>> start the recording, stop and dump the recording, then 
>>>>>>>>> transfers recording from the container.
>>>>>>>>> Simple verification is performed on the transferred recording, 
>>>>>>>>> making sure that the expected test event is present.
>>>>>>>>>
>>>>>>>>>      JBS: https://bugs.openjdk.java.net/browse/JDK-8223217
>>>>>>>>>      Webrev: http://cr.openjdk.java.net/~mseledtsov/8223217.01/
>>>>>>>>>      Testing:
>>>>>>>>>          1. Ran the newly created test at least 30 times on a 
>>>>>>>>> Linux-x64 host with docker engine configured - PASSED
>>>>>>>>>          2. Ran the test at least 20 times on a test cluster: 
>>>>>>>>> ALL PASS
>>>>>>>>>          3. Ran all hotspot/containers tests on the test 
>>>>>>>>> cluster: PASS
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thank you,
>>>>>>>>> Misha
>>>>>>>>>
>


More information about the hotspot-jfr-dev mailing list