RFR(S): 8223217: [TESTBUG] Create JFR tests with JMX across container boundary
Erik Gahlin
erik.gahlin at oracle.com
Fri Feb 21 00:36:39 UTC 2020
Looks good.
I think null could be avoided all together and instead have something
like this:
static JMXConnector waitForJmxConnection(String host, int port) throws
Exception {
String urlPath = "/jndi/rmi://" + host + ":" + port + "/jmxrmi";
JMXServiceURL url = new JMXServiceURL("rmi", "", 0, urlPath);
while (true) {
try {
return JMXConnectorFactory.connect(url);
} catch (IOException e) {
System.out.println("establishJmxConnection() thrown
IOException: " + e.getMessage());
}
Thread.sleep(1000);
}
}
No need to send updated webrev if you change into the above.
Thanks
Erik
On 2020-02-21 01:20, mikhailo.seledtsov at oracle.com wrote:
>
> Hi Erik
>
> Thank you for review. See my comments inline:
>
> On 2/20/20 3:48 PM, Erik Gahlin wrote:
>> Hi Misha,
>>
>> Overall looks good.
>>
>> Does the consumer used with ProcessTools.startProcess get output one
>> line at a time? If so, there is no risk that the test reads half an
>> IP address.
> Yes, it does.
>>
>> In waitForJmxConnection you loop until the connector is not null.
>> This seems incorrect. I would expect JMXCinnectoryFactory to throw an
>> exception on failure, not return null. You may want to catch the
>> exception and retry.
>>
> I have updated the establishJmxConnection() method with try/catch,
> returning null in case if IOException occurs.
>
>> Not sure if it matters, but if the test fails with an exception, for
>> example in the file transfer, killContainer(containerName) will not
>> run and get cleaned up? Is that a problem? Try-finally would ensure
>> that it is always run.
>
> Good catch. It is always good to cleanup resources, otherwise orphaned
> container will keep running. I added try/finally as you recommend.
>
> Here is the updated webrev:
> http://cr.openjdk.java.net/~mseledtsov/8223217.04/
>
>
> Thanks,
>
> Misha
>
>>
>> Thanks
>> Erik
>>
>>> On 21 Feb 2020, at 00:23, mikhailo.seledtsov at oracle.com
>>> <mailto:mikhailo.seledtsov at oracle.com> wrote:
>>>
>>> I have resumed working on this issue, after some break for other
>>> activities.
>>>
>>> Here is the latest webrev:
>>> http://cr.openjdk.java.net/~mseledtsov/8223217.03/
>>>
>>> Please review.
>>>
>>> Summary of changes since last webrev:
>>>
>>> - simplified the observee "EventProducer.java" to a minimum: it
>>> communicates the self IP address, then produces events in a loop
>>>
>>> - the main test class TestJFRWithJMX.java now controls the
>>> observee; when done recording, it transfers the recording over JMX
>>> and kills the observee container
>>>
>>> - general cleanup
>>>
>>>
>>> Thanks,
>>>
>>> Misha
>>>
>>> *
>>> *
>>>
>>>
>>> On 12/10/19 6:27 AM, Erik Gahlin wrote:
>>>> Hi Misha,
>>>>
>>>> On 2019-12-05 19:06, mikhailo.seledtsov at oracle.com wrote:
>>>>> Hi Erik,
>>>>>
>>>>> On 12/4/19 11:30 PM, Erik Gahlin wrote:
>>>>>> Hi Misha,
>>>>>>
>>>>>> The recording may complete before:
>>>>>>
>>>>>> while (r.getRecordings().get(0).getState() !=
>>>>>> RecordingState.STOPPED)
>>>>>>
>>>>>> is reached, or the recording may mot even be created (added to
>>>>>> the list). This will result in an IndexOutOfBoundsException.
>>>>> Right. I did not account for that.
>>>>>>
>>>>>> In my example I worked around it by looping over the list
>>>>>> (possibly empty) and by not closing the recording I could be sure
>>>>>> that the STOPPED state will eventually be seen. When the
>>>>>> recording is closed it will disappear from the list.
>>>>>
>>>>> I will use your example then, with one change. Instead of existing
>>>>> the test once (r.getState() == RecordingState.STOPPED) is true,
>>>>> the test will break. If child process exits right away, this will
>>>>> result in closing of the container, depriving the main process the
>>>>> opportunity to transfer the recording.
>>>>
>>>> Agree, it will not work with transfers.
>>>>
>>>>> Perhaps, I should break instead of System.exit(), and then have
>>>>> another loop to wait for all the recordings to close
>>>>> (r.getRecordings() become empty). After stopping the recording the
>>>>> main process transfers the recordings to the "host" system from
>>>>> the container, and then closes the recording. This will signal the
>>>>> "child/observed" process to "exit". E.g.:
>>>>>
>>>>> // Wait for recording to be transferred and be closed
>>>>> while
>>>>> (!FlightRecorder.getFlightRecorder().getRecordings().isEmpty()) {
>>>>> Thread.sleep(100);
>>>>> }
>>>>>
>>>>> System.out.println("Leaving main()");
>>>>>
>>>>> Let me know if you see any problems with this approach.
>>>>
>>>> The transfer may complete before the child process sees the STOPPED
>>>> state, which means it will be stuck in the first loop. I think you
>>>> need a state(i.e a second recording or some other mechanism) that
>>>> is not associated with the recording that you want to operate on.
>>>>
>>>> Erik
>>>>
>>>>>
>>>>> Thank you,
>>>>>
>>>>> Misha
>>>>>
>>>>>> If you think the test recording should be closed, you could find
>>>>>> some other means to communicate to the child process. For
>>>>>> example, create two new recordings when the test recording has
>>>>>> been closed and then in the loop check the count and if it is two
>>>>>> exit the application.
>>>>>>
>>>>>> Erik
>>>>>>
>>>>>>> On 5 Dec 2019, at 05:53, mikhailo.seledtsov at oracle.com wrote:
>>>>>>>
>>>>>>> Hi Erik,
>>>>>>> I have followed your suggestions, but have modified the
>>>>>>> code a bit in the event producer class from
>>>>>>> your original. First the child "event producer" process
>>>>>>> waits for the JFR system/recorder to be initialized,
>>>>>>> then waits for recording to stop, and after that waiting
>>>>>>> for recording(s) to close; the main test process
>>>>>>> will transfer recording first, then close it.
>>>>>>>
>>>>>>> Updated webrev:
>>>>>>> http://cr.openjdk.java.net/~mseledtsov/8223217.02/
>>>>>>>
>>>>>>> I ran the test 20 times in our test lab, all PASS
>>>>>>>
>>>>>>>
>>>>>>> Thank you,
>>>>>>> Misha
>>>>>>>
>>>>>>> On 12/4/19 10:02 AM, Mikhailo Seledtsov wrote:
>>>>>>>> Hi Erik,
>>>>>>>>
>>>>>>>> Thank you for reviewing the test, and thanks for
>>>>>>>> recommendations. I will make the updates according to your
>>>>>>>> recommendations, and will post updated webrev after some testing.
>>>>>>>>
>>>>>>>>
>>>>>>>> Misha
>>>>>>>>
>>>>>>>> On 12/3/19, 11:46 AM, Erik Gahlin wrote:
>>>>>>>>> Hi Misha,
>>>>>>>>>
>>>>>>>>> Looks good overall, but it seems brittle to expect that the
>>>>>>>>> test will finish within the 10 s the EvenyGenerator runs
>>>>>>>>>
>>>>>>>>> Maybe you could make it deterministic by (for example)
>>>>>>>>> checking if the recording has stopped:
>>>>>>>>>
>>>>>>>>> int i = 0;
>>>>>>>>> while (true) {
>>>>>>>>> SimpleEvent event = new SimpleEvent();
>>>>>>>>> event.msg = "Hello";
>>>>>>>>> event.count = i++;
>>>>>>>>> event.commit();
>>>>>>>>> Thread.sleep(10);
>>>>>>>>> if (FlightRecorder.isInitialized()) {
>>>>>>>>> for (Recording r :
>>>>>>>>> FlightRecorder.getFlightRecorder().getRecordings()) {
>>>>>>>>> if (r.getState() == RecordingState.STOPPED) {
>>>>>>>>> System.exit(0);
>>>>>>>>> }
>>>>>>>>> }
>>>>>>>>> }
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> and then not close the recording in the test?
>>>>>>>>>
>>>>>>>>> The test could also be made a bit more realistic if you run a
>>>>>>>>> default recording
>>>>>>>>>
>>>>>>>>> bean.setPredefinedConfiguration(recordingId, "default");
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> Erik
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 2019-12-03 05:06, mikhailo.seledtsov at oracle.com wrote:
>>>>>>>>>> Please review this change that tests JFR operations across
>>>>>>>>>> container boundary while using JMX to control JFR.
>>>>>>>>>> The test starts a container running a simple event producer,
>>>>>>>>>> establishes a JMX connection, uses FlightRecorderMXBean to
>>>>>>>>>> start the recording, stop and dump the recording, then
>>>>>>>>>> transfers recording from the container.
>>>>>>>>>> Simple verification is performed on the transferred
>>>>>>>>>> recording, making sure that the expected test event is present.
>>>>>>>>>>
>>>>>>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8223217
>>>>>>>>>> Webrev: http://cr.openjdk.java.net/~mseledtsov/8223217.01/
>>>>>>>>>> Testing:
>>>>>>>>>> 1. Ran the newly created test at least 30 times on a
>>>>>>>>>> Linux-x64 host with docker engine configured - PASSED
>>>>>>>>>> 2. Ran the test at least 20 times on a test cluster:
>>>>>>>>>> ALL PASS
>>>>>>>>>> 3. Ran all hotspot/containers tests on the test
>>>>>>>>>> cluster: PASS
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thank you,
>>>>>>>>>> Misha
>>>>>>>>>>
>>
More information about the hotspot-jfr-dev
mailing list