RFR(S): 8223217: [TESTBUG] Create JFR tests with JMX across container boundary
mikhailo.seledtsov at oracle.com
mikhailo.seledtsov at oracle.com
Fri Feb 21 18:49:21 UTC 2020
Leonid,
Thank you for review. I will make updates that you recommend before push.
Thank you,
Misha
On 2/20/20 8:09 PM, Leonid Mesnik wrote:
> The latest version seems fine to me. 2 small nits.
>
> http://cr.openjdk.java.net/~mseledtsov/8223217.04/test/hotspot/jtreg/containers/docker/TestJFRWithJMX.java.html
>
>
> 71 static AtomicReference<String> ipAddr = new AtomicReference();
>
> It makes sense to make ipAddr final to prevent some issues during
> editing test. It wouldn't be possible to set it to null occasionally.
> Also IDE might warn about mistakes like "if (iAddr == null) {"
> knowing that ipAddr couldn't be updated.
>
>
>
> 218 try (FileOutputStream fos = new FileOutputStream(f);
> 219 BufferedOutputStream bos = new
> BufferedOutputStream(fos)) {
>
> The indentation of line 219 is wrong (at least in webrev). And better
> to replace it with one line. bos.close() also close FOS.
>
> 218 try (BufferedOutputStream bos = new
> BufferedOutputStream(new FileOutputStream(f)) {
>
>
> Leonid
> On 2/20/20 4:37 PM, mikhailo.seledtsov at oracle.com wrote:
>> Thank you. I will update per your suggestion.
>>
>> On 2/20/20 4:36 PM, Erik Gahlin wrote:
>>>
>>> Looks good.
>>>
>>> I think null could be avoided all together and instead have
>>> something like this:
>>>
>>> static JMXConnector waitForJmxConnection(String host, int port)
>>> throws Exception {
>>> String urlPath = "/jndi/rmi://" + host + ":" + port +
>>> "/jmxrmi";
>>> JMXServiceURL url = new JMXServiceURL("rmi", "", 0, urlPath);
>>> while (true) {
>>> try {
>>> return JMXConnectorFactory.connect(url);
>>> } catch (IOException e) {
>>> System.out.println("establishJmxConnection() thrown
>>> IOException: " + e.getMessage());
>>> }
>>> Thread.sleep(1000);
>>> }
>>> }
>>>
>>> No need to send updated webrev if you change into the above.
>>>
>>> Thanks
>>> Erik
>>>
>>>
>>>
>>> On 2020-02-21 01:20, mikhailo.seledtsov at oracle.com wrote:
>>>>
>>>> Hi Erik
>>>>
>>>> Thank you for review. See my comments inline:
>>>>
>>>> On 2/20/20 3:48 PM, Erik Gahlin wrote:
>>>>> Hi Misha,
>>>>>
>>>>> Overall looks good.
>>>>>
>>>>> Does the consumer used with ProcessTools.startProcess get output
>>>>> one line at a time? If so, there is no risk that the test reads
>>>>> half an IP address.
>>>> Yes, it does.
>>>>>
>>>>> In waitForJmxConnection you loop until the connector is not null.
>>>>> This seems incorrect. I would expect JMXCinnectoryFactory to throw
>>>>> an exception on failure, not return null. You may want to catch
>>>>> the exception and retry.
>>>>>
>>>> I have updated the establishJmxConnection() method with try/catch,
>>>> returning null in case if IOException occurs.
>>>>
>>>>> Not sure if it matters, but if the test fails with an exception,
>>>>> for example in the file transfer, killContainer(containerName)
>>>>> will not run and get cleaned up? Is that a problem? Try-finally
>>>>> would ensure that it is always run.
>>>>
>>>> Good catch. It is always good to cleanup resources, otherwise
>>>> orphaned container will keep running. I added try/finally as you
>>>> recommend.
>>>>
>>>> Here is the updated webrev:
>>>> http://cr.openjdk.java.net/~mseledtsov/8223217.04/
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Misha
>>>>
>>>>>
>>>>> Thanks
>>>>> Erik
>>>>>
>>>>>> On 21 Feb 2020, at 00:23, mikhailo.seledtsov at oracle.com
>>>>>> <mailto:mikhailo.seledtsov at oracle.com> wrote:
>>>>>>
>>>>>> I have resumed working on this issue, after some break for other
>>>>>> activities.
>>>>>>
>>>>>> Here is the latest webrev:
>>>>>> http://cr.openjdk.java.net/~mseledtsov/8223217.03/
>>>>>>
>>>>>> Please review.
>>>>>>
>>>>>> Summary of changes since last webrev:
>>>>>>
>>>>>> - simplified the observee "EventProducer.java" to a minimum: it
>>>>>> communicates the self IP address, then produces events in a loop
>>>>>>
>>>>>> - the main test class TestJFRWithJMX.java now controls the
>>>>>> observee; when done recording, it transfers the recording over
>>>>>> JMX and kills the observee container
>>>>>>
>>>>>> - general cleanup
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Misha
>>>>>>
>>>>>> *
>>>>>> *
>>>>>>
>>>>>>
>>>>>> On 12/10/19 6:27 AM, Erik Gahlin wrote:
>>>>>>> Hi Misha,
>>>>>>>
>>>>>>> On 2019-12-05 19:06, mikhailo.seledtsov at oracle.com wrote:
>>>>>>>> Hi Erik,
>>>>>>>>
>>>>>>>> On 12/4/19 11:30 PM, Erik Gahlin wrote:
>>>>>>>>> Hi Misha,
>>>>>>>>>
>>>>>>>>> The recording may complete before:
>>>>>>>>>
>>>>>>>>> while (r.getRecordings().get(0).getState() !=
>>>>>>>>> RecordingState.STOPPED)
>>>>>>>>>
>>>>>>>>> is reached, or the recording may mot even be created (added to
>>>>>>>>> the list). This will result in an IndexOutOfBoundsException.
>>>>>>>> Right. I did not account for that.
>>>>>>>>>
>>>>>>>>> In my example I worked around it by looping over the list
>>>>>>>>> (possibly empty) and by not closing the recording I could be
>>>>>>>>> sure that the STOPPED state will eventually be seen. When the
>>>>>>>>> recording is closed it will disappear from the list.
>>>>>>>>
>>>>>>>> I will use your example then, with one change. Instead of
>>>>>>>> existing the test once (r.getState() == RecordingState.STOPPED)
>>>>>>>> is true, the test will break. If child process exits right
>>>>>>>> away, this will result in closing of the container, depriving
>>>>>>>> the main process the opportunity to transfer the recording.
>>>>>>>
>>>>>>> Agree, it will not work with transfers.
>>>>>>>
>>>>>>>> Perhaps, I should break instead of System.exit(), and then have
>>>>>>>> another loop to wait for all the recordings to close
>>>>>>>> (r.getRecordings() become empty). After stopping the recording
>>>>>>>> the main process transfers the recordings to the "host" system
>>>>>>>> from the container, and then closes the recording. This will
>>>>>>>> signal the "child/observed" process to "exit". E.g.:
>>>>>>>>
>>>>>>>> // Wait for recording to be transferred and be closed
>>>>>>>> while
>>>>>>>> (!FlightRecorder.getFlightRecorder().getRecordings().isEmpty()) {
>>>>>>>> Thread.sleep(100);
>>>>>>>> }
>>>>>>>>
>>>>>>>> System.out.println("Leaving main()");
>>>>>>>>
>>>>>>>> Let me know if you see any problems with this approach.
>>>>>>>
>>>>>>> The transfer may complete before the child process sees the
>>>>>>> STOPPED state, which means it will be stuck in the first loop. I
>>>>>>> think you need a state(i.e a second recording or some other
>>>>>>> mechanism) that is not associated with the recording that you
>>>>>>> want to operate on.
>>>>>>>
>>>>>>> Erik
>>>>>>>
>>>>>>>>
>>>>>>>> Thank you,
>>>>>>>>
>>>>>>>> Misha
>>>>>>>>
>>>>>>>>> If you think the test recording should be closed, you could
>>>>>>>>> find some other means to communicate to the child process. For
>>>>>>>>> example, create two new recordings when the test recording has
>>>>>>>>> been closed and then in the loop check the count and if it is
>>>>>>>>> two exit the application.
>>>>>>>>>
>>>>>>>>> Erik
>>>>>>>>>
>>>>>>>>>> On 5 Dec 2019, at 05:53, mikhailo.seledtsov at oracle.com wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Erik,
>>>>>>>>>> I have followed your suggestions, but have modified the
>>>>>>>>>> code a bit in the event producer class from
>>>>>>>>>> your original. First the child "event producer" process
>>>>>>>>>> waits for the JFR system/recorder to be initialized,
>>>>>>>>>> then waits for recording to stop, and after that waiting
>>>>>>>>>> for recording(s) to close; the main test process
>>>>>>>>>> will transfer recording first, then close it.
>>>>>>>>>>
>>>>>>>>>> Updated webrev:
>>>>>>>>>> http://cr.openjdk.java.net/~mseledtsov/8223217.02/
>>>>>>>>>>
>>>>>>>>>> I ran the test 20 times in our test lab, all PASS
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thank you,
>>>>>>>>>> Misha
>>>>>>>>>>
>>>>>>>>>> On 12/4/19 10:02 AM, Mikhailo Seledtsov wrote:
>>>>>>>>>>> Hi Erik,
>>>>>>>>>>>
>>>>>>>>>>> Thank you for reviewing the test, and thanks for
>>>>>>>>>>> recommendations. I will make the updates according to your
>>>>>>>>>>> recommendations, and will post updated webrev after some
>>>>>>>>>>> testing.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Misha
>>>>>>>>>>>
>>>>>>>>>>> On 12/3/19, 11:46 AM, Erik Gahlin wrote:
>>>>>>>>>>>> Hi Misha,
>>>>>>>>>>>>
>>>>>>>>>>>> Looks good overall, but it seems brittle to expect that the
>>>>>>>>>>>> test will finish within the 10 s the EvenyGenerator runs
>>>>>>>>>>>>
>>>>>>>>>>>> Maybe you could make it deterministic by (for example)
>>>>>>>>>>>> checking if the recording has stopped:
>>>>>>>>>>>>
>>>>>>>>>>>> int i = 0;
>>>>>>>>>>>> while (true) {
>>>>>>>>>>>> SimpleEvent event = new SimpleEvent();
>>>>>>>>>>>> event.msg = "Hello";
>>>>>>>>>>>> event.count = i++;
>>>>>>>>>>>> event.commit();
>>>>>>>>>>>> Thread.sleep(10);
>>>>>>>>>>>> if (FlightRecorder.isInitialized()) {
>>>>>>>>>>>> for (Recording r :
>>>>>>>>>>>> FlightRecorder.getFlightRecorder().getRecordings()) {
>>>>>>>>>>>> if (r.getState() == RecordingState.STOPPED) {
>>>>>>>>>>>> System.exit(0);
>>>>>>>>>>>> }
>>>>>>>>>>>> }
>>>>>>>>>>>> }
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> and then not close the recording in the test?
>>>>>>>>>>>>
>>>>>>>>>>>> The test could also be made a bit more realistic if you run
>>>>>>>>>>>> a default recording
>>>>>>>>>>>>
>>>>>>>>>>>> bean.setPredefinedConfiguration(recordingId, "default");
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks
>>>>>>>>>>>> Erik
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 2019-12-03 05:06, mikhailo.seledtsov at oracle.com wrote:
>>>>>>>>>>>>> Please review this change that tests JFR operations across
>>>>>>>>>>>>> container boundary while using JMX to control JFR.
>>>>>>>>>>>>> The test starts a container running a simple event
>>>>>>>>>>>>> producer, establishes a JMX connection, uses
>>>>>>>>>>>>> FlightRecorderMXBean to start the recording, stop and dump
>>>>>>>>>>>>> the recording, then transfers recording from the container.
>>>>>>>>>>>>> Simple verification is performed on the transferred
>>>>>>>>>>>>> recording, making sure that the expected test event is
>>>>>>>>>>>>> present.
>>>>>>>>>>>>>
>>>>>>>>>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8223217
>>>>>>>>>>>>> Webrev:
>>>>>>>>>>>>> http://cr.openjdk.java.net/~mseledtsov/8223217.01/
>>>>>>>>>>>>> Testing:
>>>>>>>>>>>>> 1. Ran the newly created test at least 30 times
>>>>>>>>>>>>> on a Linux-x64 host with docker engine configured - PASSED
>>>>>>>>>>>>> 2. Ran the test at least 20 times on a test
>>>>>>>>>>>>> cluster: ALL PASS
>>>>>>>>>>>>> 3. Ran all hotspot/containers tests on the test
>>>>>>>>>>>>> cluster: PASS
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thank you,
>>>>>>>>>>>>> Misha
>>>>>>>>>>>>>
>>>>>
More information about the hotspot-runtime-dev
mailing list