RFR(S): 8223217: [TESTBUG] Create JFR tests with JMX across container boundary
Erik Gahlin
erik.gahlin at oracle.com
Thu Feb 20 23:48:22 UTC 2020
Hi Misha,
Overall looks good.
Does the consumer used with ProcessTools.startProcess get output one line at a time? If so, there is no risk that the test reads half an IP address.
In waitForJmxConnection you loop until the connector is not null. This seems incorrect. I would expect JMXCinnectoryFactory to throw an exception on failure, not return null. You may want to catch the exception and retry.
Not sure if it matters, but if the test fails with an exception, for example in the file transfer, killContainer(containerName) will not run and get cleaned up? Is that a problem? Try-finally would ensure that it is always run.
Thanks
Erik
> On 21 Feb 2020, at 00:23, mikhailo.seledtsov at oracle.com wrote:
>
> I have resumed working on this issue, after some break for other activities.
>
> Here is the latest webrev: http://cr.openjdk.java.net/~mseledtsov/8223217.03/ <http://cr.openjdk.java.net/~mseledtsov/8223217.03/>
> Please review.
>
> Summary of changes since last webrev:
>
> - simplified the observee "EventProducer.java" to a minimum: it communicates the self IP address, then produces events in a loop
> - the main test class TestJFRWithJMX.java now controls the observee; when done recording, it transfers the recording over JMX and kills the observee container
>
> - general cleanup
>
>
> Thanks,
>
> Misha
>
>
>
> On 12/10/19 6:27 AM, Erik Gahlin wrote:
>> Hi Misha,
>>
>> On 2019-12-05 19:06, mikhailo.seledtsov at oracle.com <mailto:mikhailo.seledtsov at oracle.com> wrote:
>>> Hi Erik,
>>>
>>> On 12/4/19 11:30 PM, Erik Gahlin wrote:
>>>> Hi Misha,
>>>>
>>>> The recording may complete before:
>>>>
>>>> while (r.getRecordings().get(0).getState() != RecordingState.STOPPED)
>>>>
>>>> is reached, or the recording may mot even be created (added to the list). This will result in an IndexOutOfBoundsException.
>>> Right. I did not account for that.
>>>>
>>>> In my example I worked around it by looping over the list (possibly empty) and by not closing the recording I could be sure that the STOPPED state will eventually be seen. When the recording is closed it will disappear from the list.
>>>
>>> I will use your example then, with one change. Instead of existing the test once (r.getState() == RecordingState.STOPPED) is true, the test will break. If child process exits right away, this will result in closing of the container, depriving the main process the opportunity to transfer the recording.
>>
>> Agree, it will not work with transfers.
>>
>>> Perhaps, I should break instead of System.exit(), and then have another loop to wait for all the recordings to close (r.getRecordings() become empty). After stopping the recording the main process transfers the recordings to the "host" system from the container, and then closes the recording. This will signal the "child/observed" process to "exit". E.g.:
>>>
>>> // Wait for recording to be transferred and be closed
>>> while (!FlightRecorder.getFlightRecorder().getRecordings().isEmpty()) {
>>> Thread.sleep(100);
>>> }
>>>
>>> System.out.println("Leaving main()");
>>>
>>> Let me know if you see any problems with this approach.
>>
>> The transfer may complete before the child process sees the STOPPED state, which means it will be stuck in the first loop. I think you need a state(i.e a second recording or some other mechanism) that is not associated with the recording that you want to operate on.
>>
>> Erik
>>
>>>
>>> Thank you,
>>>
>>> Misha
>>>
>>>> If you think the test recording should be closed, you could find some other means to communicate to the child process. For example, create two new recordings when the test recording has been closed and then in the loop check the count and if it is two exit the application.
>>>>
>>>> Erik
>>>>
>>>>> On 5 Dec 2019, at 05:53, mikhailo.seledtsov at oracle.com <mailto:mikhailo.seledtsov at oracle.com> wrote:
>>>>>
>>>>> Hi Erik,
>>>>> I have followed your suggestions, but have modified the code a bit in the event producer class from
>>>>> your original. First the child "event producer" process waits for the JFR system/recorder to be initialized,
>>>>> then waits for recording to stop, and after that waiting for recording(s) to close; the main test process
>>>>> will transfer recording first, then close it.
>>>>>
>>>>> Updated webrev: http://cr.openjdk.java.net/~mseledtsov/8223217.02/ <http://cr.openjdk.java.net/~mseledtsov/8223217.02/>
>>>>>
>>>>> I ran the test 20 times in our test lab, all PASS
>>>>>
>>>>>
>>>>> Thank you,
>>>>> Misha
>>>>>
>>>>> On 12/4/19 10:02 AM, Mikhailo Seledtsov wrote:
>>>>>> Hi Erik,
>>>>>>
>>>>>> Thank you for reviewing the test, and thanks for recommendations. I will make the updates according to your recommendations, and will post updated webrev after some testing.
>>>>>>
>>>>>>
>>>>>> Misha
>>>>>>
>>>>>> On 12/3/19, 11:46 AM, Erik Gahlin wrote:
>>>>>>> Hi Misha,
>>>>>>>
>>>>>>> Looks good overall, but it seems brittle to expect that the test will finish within the 10 s the EvenyGenerator runs
>>>>>>>
>>>>>>> Maybe you could make it deterministic by (for example) checking if the recording has stopped:
>>>>>>>
>>>>>>> int i = 0;
>>>>>>> while (true) {
>>>>>>> SimpleEvent event = new SimpleEvent();
>>>>>>> event.msg = "Hello";
>>>>>>> event.count = i++;
>>>>>>> event.commit();
>>>>>>> Thread.sleep(10);
>>>>>>> if (FlightRecorder.isInitialized()) {
>>>>>>> for (Recording r : FlightRecorder.getFlightRecorder().getRecordings()) {
>>>>>>> if (r.getState() == RecordingState.STOPPED) {
>>>>>>> System.exit(0);
>>>>>>> }
>>>>>>> }
>>>>>>> }
>>>>>>> }
>>>>>>>
>>>>>>> and then not close the recording in the test?
>>>>>>>
>>>>>>> The test could also be made a bit more realistic if you run a default recording
>>>>>>>
>>>>>>> bean.setPredefinedConfiguration(recordingId, "default");
>>>>>>>
>>>>>>> Thanks
>>>>>>> Erik
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 2019-12-03 05:06, mikhailo.seledtsov at oracle.com <mailto:mikhailo.seledtsov at oracle.com> wrote:
>>>>>>>> Please review this change that tests JFR operations across container boundary while using JMX to control JFR.
>>>>>>>> The test starts a container running a simple event producer, establishes a JMX connection, uses FlightRecorderMXBean to start the recording, stop and dump the recording, then transfers recording from the container.
>>>>>>>> Simple verification is performed on the transferred recording, making sure that the expected test event is present.
>>>>>>>>
>>>>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8223217 <https://bugs.openjdk.java.net/browse/JDK-8223217>
>>>>>>>> Webrev: http://cr.openjdk.java.net/~mseledtsov/8223217.01/ <http://cr.openjdk.java.net/~mseledtsov/8223217.01/>
>>>>>>>> Testing:
>>>>>>>> 1. Ran the newly created test at least 30 times on a Linux-x64 host with docker engine configured - PASSED
>>>>>>>> 2. Ran the test at least 20 times on a test cluster: ALL PASS
>>>>>>>> 3. Ran all hotspot/containers tests on the test cluster: PASS
>>>>>>>>
>>>>>>>>
>>>>>>>> Thank you,
>>>>>>>> Misha
>>>>>>>>
More information about the hotspot-jfr-dev
mailing list