RFR(S): 8229209: [TESTBUG] test for cross-process JFR event streaming

mikhailo.seledtsov at oracle.com mikhailo.seledtsov at oracle.com
Fri Nov 15 21:51:49 UTC 2019


On 11/14/19 8:32 PM, Igor Ignatyev wrote:
> reading your code, I remembered that ProcessTools.startProcess can't be used together w/ OutputAnalyzer (see 8233725 for details), i.e. you can't reliably use OutputAnalyzer to check output of the producer process,
Thanks. Good to know.
> so you can either use Consumer to check that 'Leaving main' isn't printed, or skip this check.

I will check the exit value directly from the Process.exitValue(), and 
skip the log string check.

Here is the updated webrev: 
http://cr.openjdk.java.net/~mseledtsov/8229209.07/


Misha

> sorry that I haven't warned you before.
>
> -- Igor
>
>> On Nov 14, 2019, at 8:17 PM, mikhailo.seledtsov at oracle.com wrote:
>>
>> Here is the updated webrev: http://cr.openjdk.java.net/~mseledtsov/8229209.06/
>>
>> On 11/14/19 11:40 AM, Mikhailo Seledtsov wrote:
>>>
>>> On 11/14/19, 11:26 AM, Igor Ignatyev wrote:
>>>>
>>>>> On Nov 14, 2019, at 11:23 AM, Mikhailo Seledtsov <mikhailo.seledtsov at oracle.com <mailto:mikhailo.seledtsov at oracle.com>> wrote:
>>>>>
>>>>> Minor update: http://cr.openjdk.java.net/~mseledtsov/8229209.05/
>>>>> Difference since v04: removed exit code verification from testProducerCrashing(), since crashing process produces different exit codes on different systems, and it is not strictly necessary to verify that.
>>>> although it's not strictly need to verify the product behavior, checking that a process really crashed is beneficial to catch possible test bugs, you can assert that it's not 0 and  you also might check that there is no 'Leaving main()' in stdout.
>>> I will do that.
>>>> the rest looks good to me.
>>> Thank you,
>>> Misha
>>>> -- Igor
>>>>> Testing: ran multiple times in the test cluster, ALL PASS
>>>>>
>>>>> Misha
>>>>>
>>>>> On 11/13/19, 4:42 PM, Mikhailo Seledtsov wrote:
>>>>>> Here is the updated webrev: http://cr.openjdk.java.net/~mseledtsov/8229209.04/
>>>>>>
>>>>>> I have incorporated feedback from Igor, as well as feedback from chatting with Erik, and some fixes.
>>>>>> Feedback from Erik:
>>>>>>    - use VistualMachine.attach() instead of JCMD - preferred way, simplifies code
>>>>>>    - rework the crash logic in a producer a bit - simplifies code
>>>>>>
>>>>>> Also, I have added exports for child process "--add-exports" for Unsafe, added check for child process exit code, plus comment cleanup.
>>>>>>
>>>>>> Thank you,
>>>>>> Misha
>>>>>>
>>>>>> On 11/13/19, 2:12 PM, Erik Gahlin wrote:
>>>>>>> On 2019-11-13 22:38, Mikhailo Seledtsov wrote:
>>>>>>>> Hi Igor,
>>>>>>>>
>>>>>>>>    Thank you for reviewing the test.
>>>>>>>>
>>>>>>>> The original question in your first email: I do not think it is necessary to run the test in othervm mode; I will remove the 'othervm' from at-run. The crashing process is the child of the main test process, so this should not affect the state of the main test VM.
>>>>>>>>
>>>>>>>> On 11/12/19, 9:59 PM, Igor Ignatyev wrote:
>>>>>>>>> Hi Misha,
>>>>>>>>>
>>>>>>>>> I've finished reading, several more comments:
>>>>>>>>>
>>>>>>>>> 2) in consumeEvents at line #167, it'd be clearer and safer to use TestEvent.class.getName().
>>>>>>>> Fixed
>>>>>>>>> 3) instead of asking JVM which you create its system properties in getJfrRepository(), can we just start "producer" process w/ jdk.jfr.repository being set to a well-know location?
>>>>>>>> AFAIK, JFR sub-system picks the location automatically. Part of location is a base directory, e.g. "/private/var/folders/" on Linux, while the second part is picked at runtime based on time stamp and PID. I would like to leave this as is for now, while I start a discussion with JFR team on how to improve this.
>>>>>>> It was our initial plan to provide means to set the repository on the command line, but we ran into issues so we decided against it, at least for the time being.
>>>>>>>
>>>>>>>   -XX:FligtRecorderOptions:repository=... today means the base repository so it may break code if we change it.
>>>>>>>
>>>>>>> Adding a second repository option (what would it be called?) would confuse users, but also force us to deal with a situation where the repository already exist. This can happen if the JVM crashes, or if on Windows and another process is reading a file, and the directory can't be removed. It can also happen, if you start two JVMs with the same repository path.
>>>>>>>
>>>>>>> One can imagine a production deployment where a server is restarted if it crashes, but it will not start up again because the repository of the previous process was left behind.
>>>>>>>
>>>>>>>> This is a good point. As a consideration, the user should be able to specify this property, or at least the base directory for JFR repositories. Once this is resolved, I will update the test accordingly.
>>>>>>>>> 4) to make the test more robust, at line #118 I'd compare eventsDeliveredWhileProducerIsAlive w/ 1 instead of 1/2 of eventsDeliveredTotal.
>>>>>>>> OK.
>>>>>>>>> 5) at line #110 please use Asserts.assertEqual instead of Assert.assertTrue(a == b) as it not just compare the values but also prints them out in case !(a==b)
>>>>>>>> Fixed
>>>>>>>>> 6) at line #118 please use Asserts.assertGreaterThanOrEqual instead of Assert.assertTrue(a>= b) as it not just compare the values but also prints them out in case !(a>=b)
>>>>>>>> Fixed
>>>>>>>>> 7) is it really necessary to wait till "producer" process main is started?
>>>>>>>> This is necessary to make sure that VM has initialized to a point where it can correctly receive JCMD requests. We ran into this type of issue before.
>>>>>>>>> 7.1) if it is, you should use String::equals at line #204 instead of contains
>>>>>>>>> 7.2) if the answer of 7) is true, I'd recommend you to use j.u.c.CountDownLatch (w/ count = 1) instead of AtomicBoolean for producerMainStarted (you might need to find a better name, e.g. producerStartedSignal) to convey a signal that "producer" process reached its main, so line#205 will become producerMainStarted.countDown(), and line#142 -- producerMainStarted.await()
>>>>>>>> Thanks for suggesting during our conversation the use of ProcessTools.startProcess() with wait-until-predicate. I will use this, it will simplify the code and be more declarative.
>>>>>>>>> 8) I find usage of AtomicIntegeres as a way to return two values from consumeEvents() at a very least confusing, could you please create a "record" class for that purpose instead?
>>>>>>>> I can do that, as we have discussed.
>>>>>>>>> 9) local 'id' (line #169) isn't use anywhere. will it be beneficial to check that its value is b/w 0 and (NUMBER_OF_EVENTS_PRODUCED - 1) and, for testProducerExitsNormally case, that sum of all id is ((NUMBER_OF_EVENTS_PRODUCED - 1) * NUMBER_OF_EVENTS_PRODUCED / 2) ?
>>>>>>>> The id is not necessary. I used it for development and experimentation. I will remove it.
>>>>>>>>> 9.1) and if EventStream::onEvent is expected to preserve events order, that 'id' is strictly increasing?
>>>>>>>> No guarantee of that.
>>>>>>>>
>>>>>>>> I will make the updates, test, and post the updated webrev.
>>>>>>>>
>>>>>>>> Thank you,
>>>>>>>> Misha
>>>>>>>>> Thanks,
>>>>>>>>> -- Igor
>>>>>>>>>
>>>>>>>>>> On Nov 12, 2019, at 9:12 PM, Igor Ignatyev<igor.ignatyev at oracle.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Misha,
>>>>>>>>>>
>>>>>>>>>> I haven't read the test code thoroughly yet, but I have a question -- is there a reason for TestCrossProcessStreaming to be run in othervm mode?
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> -- Igor
>>>>>>>>>>
>>>>>>>>>>> On Nov 12, 2019, at 4:20 PM, Mikhailo Seledtsov<mikhailo.seledtsov at oracle.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Please review this new JFR test. In this test the producer of JFR events and the consumer run in separate processes.
>>>>>>>>>>> This is expected to be a common scenario with JFR streaming for monitoring applications and services.
>>>>>>>>>>>
>>>>>>>>>>>     JBS: https://bugs.openjdk.java.net/browse/JDK-8229209
>>>>>>>>>>>     Webrev: http://cr.openjdk.java.net/~mseledtsov/8229209.01/
>>>>>>>>>>>     Testing:
>>>>>>>>>>>       - ran this new test locally (Mac, Linux-x64) multiple times - PASS
>>>>>>>>>>>       - running this test on the test cluster - in progress
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thank you,
>>>>>>>>>>> Misha
>>>>>>>>>>>


More information about the hotspot-jfr-dev mailing list