RFR(S): 8223217: [TESTBUG] Create JFR tests with JMX across container boundary

mikhailo.seledtsov at oracle.com mikhailo.seledtsov at oracle.com
Thu Feb 20 23:23:15 UTC 2020


I have resumed working on this issue, after some break for other activities.

Here is the latest webrev: 
http://cr.openjdk.java.net/~mseledtsov/8223217.03/

Please review.

Summary of changes since last webrev:

   - simplified the observee "EventProducer.java" to a minimum: it 
communicates the self IP address, then produces events in a loop

   - the main test class TestJFRWithJMX.java now controls the observee; 
when done recording, it transfers the recording over JMX and kills the 
observee container

   - general cleanup


Thanks,

Misha

*
*


On 12/10/19 6:27 AM, Erik Gahlin wrote:
> Hi Misha,
>
> On 2019-12-05 19:06, mikhailo.seledtsov at oracle.com wrote:
>> Hi Erik,
>>
>> On 12/4/19 11:30 PM, Erik Gahlin wrote:
>>> Hi Misha,
>>>
>>> The recording may complete before:
>>>
>>> while (r.getRecordings().get(0).getState() != RecordingState.STOPPED)
>>>
>>> is reached, or the recording may mot even be created (added to the 
>>> list). This will result in an IndexOutOfBoundsException.
>> Right. I did not account for that.
>>>
>>> In my example I worked around it by looping over the list (possibly 
>>> empty) and by not closing the recording I could be sure that the 
>>> STOPPED state will eventually be seen. When the recording is closed 
>>> it will disappear from the list.
>>
>> I will use your example then, with one change. Instead of existing 
>> the test once (r.getState() == RecordingState.STOPPED) is true, the 
>> test will break. If child process exits right away, this will result 
>> in closing of the container, depriving the main process the 
>> opportunity to transfer the recording. 
>
> Agree, it will not work with transfers.
>
>> Perhaps, I should break instead of System.exit(), and then have 
>> another loop to wait for all the recordings to close 
>> (r.getRecordings() become empty). After stopping the recording the 
>> main process transfers the recordings to the "host" system from the 
>> container, and then closes the recording. This will signal the 
>> "child/observed" process to "exit". E.g.:
>>
>>         // Wait for recording to be transferred and be closed
>>         while 
>> (!FlightRecorder.getFlightRecorder().getRecordings().isEmpty()) {
>>             Thread.sleep(100);
>>         }
>>
>>         System.out.println("Leaving main()");
>>
>> Let me know if you see any problems with this approach.
>
> The transfer may complete before the child process sees the STOPPED 
> state, which means it will be stuck in the first loop. I think you 
> need a state(i.e a second recording or some other mechanism)  that is 
> not associated with the recording that you want to operate on.
>
> Erik
>
>>
>> Thank you,
>>
>> Misha
>>
>>> If you think the test recording should be closed, you could find 
>>> some other means to communicate to the child process. For example, 
>>> create two new recordings when the test recording has been closed 
>>> and then in the loop check the count and if it is two exit the 
>>> application.
>>>
>>> Erik
>>>
>>>> On 5 Dec 2019, at 05:53, mikhailo.seledtsov at oracle.com wrote:
>>>>
>>>> Hi Erik,
>>>>      I have followed your suggestions, but have modified the code a 
>>>> bit in the event producer class from
>>>>      your original. First the child "event producer" process waits 
>>>> for the JFR system/recorder to be initialized,
>>>>      then waits for recording to stop, and after that waiting for 
>>>> recording(s) to close; the main test process
>>>>      will transfer recording first, then close it.
>>>>
>>>>      Updated webrev: 
>>>> http://cr.openjdk.java.net/~mseledtsov/8223217.02/
>>>>
>>>>      I ran the test 20 times in our test lab, all PASS
>>>>
>>>>
>>>> Thank you,
>>>> Misha
>>>>
>>>> On 12/4/19 10:02 AM, Mikhailo Seledtsov wrote:
>>>>> Hi Erik,
>>>>>
>>>>>    Thank you for reviewing the test, and thanks for 
>>>>> recommendations. I will make the updates according to your 
>>>>> recommendations, and will post updated webrev after some testing.
>>>>>
>>>>>
>>>>> Misha
>>>>>
>>>>> On 12/3/19, 11:46 AM, Erik Gahlin wrote:
>>>>>> Hi Misha,
>>>>>>
>>>>>> Looks good overall, but it seems brittle to expect that the test 
>>>>>> will finish within the 10 s the EvenyGenerator runs
>>>>>>
>>>>>> Maybe you could make it deterministic by (for example) checking 
>>>>>> if the recording has stopped:
>>>>>>
>>>>>> int i = 0;
>>>>>> while (true) {
>>>>>>      SimpleEvent event = new SimpleEvent();
>>>>>>      event.msg = "Hello";
>>>>>>      event.count = i++;
>>>>>>      event.commit();
>>>>>>      Thread.sleep(10);
>>>>>>      if (FlightRecorder.isInitialized()) {
>>>>>>          for (Recording r : 
>>>>>> FlightRecorder.getFlightRecorder().getRecordings()) {
>>>>>>              if (r.getState() == RecordingState.STOPPED) {
>>>>>>                  System.exit(0);
>>>>>>              }
>>>>>>          }
>>>>>>      }
>>>>>> }
>>>>>>
>>>>>> and then not close the recording in the test?
>>>>>>
>>>>>> The test could also be made a bit more realistic if you run a 
>>>>>> default recording
>>>>>>
>>>>>> bean.setPredefinedConfiguration(recordingId, "default");
>>>>>>
>>>>>> Thanks
>>>>>> Erik
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 2019-12-03 05:06, mikhailo.seledtsov at oracle.com wrote:
>>>>>>> Please review this change that tests JFR operations across 
>>>>>>> container boundary while using JMX to control JFR.
>>>>>>> The test starts a container running a simple event producer, 
>>>>>>> establishes a JMX connection, uses FlightRecorderMXBean to start 
>>>>>>> the recording, stop and dump the recording, then transfers 
>>>>>>> recording from the container.
>>>>>>> Simple verification is performed on the transferred recording, 
>>>>>>> making sure that the expected test event is present.
>>>>>>>
>>>>>>>      JBS: https://bugs.openjdk.java.net/browse/JDK-8223217
>>>>>>>      Webrev: http://cr.openjdk.java.net/~mseledtsov/8223217.01/
>>>>>>>      Testing:
>>>>>>>          1. Ran the newly created test at least 30 times on a 
>>>>>>> Linux-x64 host with docker engine configured - PASSED
>>>>>>>          2. Ran the test at least 20 times on a test cluster: 
>>>>>>> ALL PASS
>>>>>>>          3. Ran all hotspot/containers tests on the test 
>>>>>>> cluster: PASS
>>>>>>>
>>>>>>>
>>>>>>> Thank you,
>>>>>>> Misha
>>>>>>>


More information about the hotspot-runtime-dev mailing list