RFR(S): 8223217: [TESTBUG] Create JFR tests with JMX across container boundary
mikhailo.seledtsov at oracle.com
mikhailo.seledtsov at oracle.com
Fri Feb 21 00:20:28 UTC 2020
Hi Erik
Thank you for review. See my comments inline:
On 2/20/20 3:48 PM, Erik Gahlin wrote:
> Hi Misha,
>
> Overall looks good.
>
> Does the consumer used with ProcessTools.startProcess get output one
> line at a time? If so, there is no risk that the test reads half an IP
> address.
Yes, it does.
>
> In waitForJmxConnection you loop until the connector is not null. This
> seems incorrect. I would expect JMXCinnectoryFactory to throw an
> exception on failure, not return null. You may want to catch the
> exception and retry.
>
I have updated the establishJmxConnection() method with try/catch,
returning null in case if IOException occurs.
> Not sure if it matters, but if the test fails with an exception, for
> example in the file transfer, killContainer(containerName) will not
> run and get cleaned up? Is that a problem? Try-finally would ensure
> that it is always run.
Good catch. It is always good to cleanup resources, otherwise orphaned
container will keep running. I added try/finally as you recommend.
Here is the updated webrev:
http://cr.openjdk.java.net/~mseledtsov/8223217.04/
Thanks,
Misha
>
> Thanks
> Erik
>
>> On 21 Feb 2020, at 00:23, mikhailo.seledtsov at oracle.com
>> <mailto:mikhailo.seledtsov at oracle.com> wrote:
>>
>> I have resumed working on this issue, after some break for other
>> activities.
>>
>> Here is the latest webrev:
>> http://cr.openjdk.java.net/~mseledtsov/8223217.03/
>>
>> Please review.
>>
>> Summary of changes since last webrev:
>>
>> - simplified the observee "EventProducer.java" to a minimum: it
>> communicates the self IP address, then produces events in a loop
>>
>> - the main test class TestJFRWithJMX.java now controls the
>> observee; when done recording, it transfers the recording over JMX
>> and kills the observee container
>>
>> - general cleanup
>>
>>
>> Thanks,
>>
>> Misha
>>
>> *
>> *
>>
>>
>> On 12/10/19 6:27 AM, Erik Gahlin wrote:
>>> Hi Misha,
>>>
>>> On 2019-12-05 19:06, mikhailo.seledtsov at oracle.com wrote:
>>>> Hi Erik,
>>>>
>>>> On 12/4/19 11:30 PM, Erik Gahlin wrote:
>>>>> Hi Misha,
>>>>>
>>>>> The recording may complete before:
>>>>>
>>>>> while (r.getRecordings().get(0).getState() != RecordingState.STOPPED)
>>>>>
>>>>> is reached, or the recording may mot even be created (added to the
>>>>> list). This will result in an IndexOutOfBoundsException.
>>>> Right. I did not account for that.
>>>>>
>>>>> In my example I worked around it by looping over the list
>>>>> (possibly empty) and by not closing the recording I could be sure
>>>>> that the STOPPED state will eventually be seen. When the recording
>>>>> is closed it will disappear from the list.
>>>>
>>>> I will use your example then, with one change. Instead of existing
>>>> the test once (r.getState() == RecordingState.STOPPED) is true, the
>>>> test will break. If child process exits right away, this will
>>>> result in closing of the container, depriving the main process the
>>>> opportunity to transfer the recording.
>>>
>>> Agree, it will not work with transfers.
>>>
>>>> Perhaps, I should break instead of System.exit(), and then have
>>>> another loop to wait for all the recordings to close
>>>> (r.getRecordings() become empty). After stopping the recording the
>>>> main process transfers the recordings to the "host" system from the
>>>> container, and then closes the recording. This will signal the
>>>> "child/observed" process to "exit". E.g.:
>>>>
>>>> // Wait for recording to be transferred and be closed
>>>> while
>>>> (!FlightRecorder.getFlightRecorder().getRecordings().isEmpty()) {
>>>> Thread.sleep(100);
>>>> }
>>>>
>>>> System.out.println("Leaving main()");
>>>>
>>>> Let me know if you see any problems with this approach.
>>>
>>> The transfer may complete before the child process sees the STOPPED
>>> state, which means it will be stuck in the first loop. I think you
>>> need a state(i.e a second recording or some other mechanism) that
>>> is not associated with the recording that you want to operate on.
>>>
>>> Erik
>>>
>>>>
>>>> Thank you,
>>>>
>>>> Misha
>>>>
>>>>> If you think the test recording should be closed, you could find
>>>>> some other means to communicate to the child process. For example,
>>>>> create two new recordings when the test recording has been closed
>>>>> and then in the loop check the count and if it is two exit the
>>>>> application.
>>>>>
>>>>> Erik
>>>>>
>>>>>> On 5 Dec 2019, at 05:53, mikhailo.seledtsov at oracle.com wrote:
>>>>>>
>>>>>> Hi Erik,
>>>>>> I have followed your suggestions, but have modified the code
>>>>>> a bit in the event producer class from
>>>>>> your original. First the child "event producer" process
>>>>>> waits for the JFR system/recorder to be initialized,
>>>>>> then waits for recording to stop, and after that waiting for
>>>>>> recording(s) to close; the main test process
>>>>>> will transfer recording first, then close it.
>>>>>>
>>>>>> Updated webrev:
>>>>>> http://cr.openjdk.java.net/~mseledtsov/8223217.02/
>>>>>>
>>>>>> I ran the test 20 times in our test lab, all PASS
>>>>>>
>>>>>>
>>>>>> Thank you,
>>>>>> Misha
>>>>>>
>>>>>> On 12/4/19 10:02 AM, Mikhailo Seledtsov wrote:
>>>>>>> Hi Erik,
>>>>>>>
>>>>>>> Thank you for reviewing the test, and thanks for
>>>>>>> recommendations. I will make the updates according to your
>>>>>>> recommendations, and will post updated webrev after some testing.
>>>>>>>
>>>>>>>
>>>>>>> Misha
>>>>>>>
>>>>>>> On 12/3/19, 11:46 AM, Erik Gahlin wrote:
>>>>>>>> Hi Misha,
>>>>>>>>
>>>>>>>> Looks good overall, but it seems brittle to expect that the
>>>>>>>> test will finish within the 10 s the EvenyGenerator runs
>>>>>>>>
>>>>>>>> Maybe you could make it deterministic by (for example) checking
>>>>>>>> if the recording has stopped:
>>>>>>>>
>>>>>>>> int i = 0;
>>>>>>>> while (true) {
>>>>>>>> SimpleEvent event = new SimpleEvent();
>>>>>>>> event.msg = "Hello";
>>>>>>>> event.count = i++;
>>>>>>>> event.commit();
>>>>>>>> Thread.sleep(10);
>>>>>>>> if (FlightRecorder.isInitialized()) {
>>>>>>>> for (Recording r :
>>>>>>>> FlightRecorder.getFlightRecorder().getRecordings()) {
>>>>>>>> if (r.getState() == RecordingState.STOPPED) {
>>>>>>>> System.exit(0);
>>>>>>>> }
>>>>>>>> }
>>>>>>>> }
>>>>>>>> }
>>>>>>>>
>>>>>>>> and then not close the recording in the test?
>>>>>>>>
>>>>>>>> The test could also be made a bit more realistic if you run a
>>>>>>>> default recording
>>>>>>>>
>>>>>>>> bean.setPredefinedConfiguration(recordingId, "default");
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Erik
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2019-12-03 05:06, mikhailo.seledtsov at oracle.com wrote:
>>>>>>>>> Please review this change that tests JFR operations across
>>>>>>>>> container boundary while using JMX to control JFR.
>>>>>>>>> The test starts a container running a simple event producer,
>>>>>>>>> establishes a JMX connection, uses FlightRecorderMXBean to
>>>>>>>>> start the recording, stop and dump the recording, then
>>>>>>>>> transfers recording from the container.
>>>>>>>>> Simple verification is performed on the transferred recording,
>>>>>>>>> making sure that the expected test event is present.
>>>>>>>>>
>>>>>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8223217
>>>>>>>>> Webrev: http://cr.openjdk.java.net/~mseledtsov/8223217.01/
>>>>>>>>> Testing:
>>>>>>>>> 1. Ran the newly created test at least 30 times on a
>>>>>>>>> Linux-x64 host with docker engine configured - PASSED
>>>>>>>>> 2. Ran the test at least 20 times on a test cluster:
>>>>>>>>> ALL PASS
>>>>>>>>> 3. Ran all hotspot/containers tests on the test
>>>>>>>>> cluster: PASS
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thank you,
>>>>>>>>> Misha
>>>>>>>>>
>
More information about the hotspot-jfr-dev
mailing list