RFR(S): 8229209: [TESTBUG] test for cross-process JFR event streaming
mikhailo.seledtsov at oracle.com
mikhailo.seledtsov at oracle.com
Fri Nov 15 04:17:30 UTC 2019
Here is the updated webrev:
http://cr.openjdk.java.net/~mseledtsov/8229209.06/
On 11/14/19 11:40 AM, Mikhailo Seledtsov wrote:
>
>
> On 11/14/19, 11:26 AM, Igor Ignatyev wrote:
>>
>>
>>> On Nov 14, 2019, at 11:23 AM, Mikhailo Seledtsov
>>> <mikhailo.seledtsov at oracle.com
>>> <mailto:mikhailo.seledtsov at oracle.com>> wrote:
>>>
>>> Minor update: http://cr.openjdk.java.net/~mseledtsov/8229209.05/
>>> Difference since v04: removed exit code verification from
>>> testProducerCrashing(), since crashing process produces different
>>> exit codes on different systems, and it is not strictly necessary to
>>> verify that.
>> although it's not strictly need to verify the product behavior,
>> checking that a process really crashed is beneficial to catch
>> possible test bugs, you can assert that it's not 0 and you also
>> might check that there is no 'Leaving main()' in stdout.
> I will do that.
>>
>> the rest looks good to me.
> Thank you,
> Misha
>>
>> -- Igor
>>>
>>> Testing: ran multiple times in the test cluster, ALL PASS
>>>
>>> Misha
>>>
>>> On 11/13/19, 4:42 PM, Mikhailo Seledtsov wrote:
>>>> Here is the updated webrev:
>>>> http://cr.openjdk.java.net/~mseledtsov/8229209.04/
>>>>
>>>> I have incorporated feedback from Igor, as well as feedback from
>>>> chatting with Erik, and some fixes.
>>>> Feedback from Erik:
>>>> - use VistualMachine.attach() instead of JCMD - preferred way,
>>>> simplifies code
>>>> - rework the crash logic in a producer a bit - simplifies code
>>>>
>>>> Also, I have added exports for child process "--add-exports" for
>>>> Unsafe, added check for child process exit code, plus comment cleanup.
>>>>
>>>> Thank you,
>>>> Misha
>>>>
>>>> On 11/13/19, 2:12 PM, Erik Gahlin wrote:
>>>>>
>>>>> On 2019-11-13 22:38, Mikhailo Seledtsov wrote:
>>>>>> Hi Igor,
>>>>>>
>>>>>> Thank you for reviewing the test.
>>>>>>
>>>>>> The original question in your first email: I do not think it is
>>>>>> necessary to run the test in othervm mode; I will remove the
>>>>>> 'othervm' from at-run. The crashing process is the child of the
>>>>>> main test process, so this should not affect the state of the
>>>>>> main test VM.
>>>>>>
>>>>>> On 11/12/19, 9:59 PM, Igor Ignatyev wrote:
>>>>>>> Hi Misha,
>>>>>>>
>>>>>>> I've finished reading, several more comments:
>>>>>>>
>>>>>>> 2) in consumeEvents at line #167, it'd be clearer and safer to
>>>>>>> use TestEvent.class.getName().
>>>>>> Fixed
>>>>>>> 3) instead of asking JVM which you create its system properties
>>>>>>> in getJfrRepository(), can we just start "producer" process w/
>>>>>>> jdk.jfr.repository being set to a well-know location?
>>>>>> AFAIK, JFR sub-system picks the location automatically. Part of
>>>>>> location is a base directory, e.g. "/private/var/folders/" on
>>>>>> Linux, while the second part is picked at runtime based on time
>>>>>> stamp and PID. I would like to leave this as is for now, while I
>>>>>> start a discussion with JFR team on how to improve this.
>>>>>
>>>>> It was our initial plan to provide means to set the repository on
>>>>> the command line, but we ran into issues so we decided against it,
>>>>> at least for the time being.
>>>>>
>>>>> -XX:FligtRecorderOptions:repository=... today means the base
>>>>> repository so it may break code if we change it.
>>>>>
>>>>> Adding a second repository option (what would it be called?) would
>>>>> confuse users, but also force us to deal with a situation where
>>>>> the repository already exist. This can happen if the JVM crashes,
>>>>> or if on Windows and another process is reading a file, and the
>>>>> directory can't be removed. It can also happen, if you start two
>>>>> JVMs with the same repository path.
>>>>>
>>>>> One can imagine a production deployment where a server is
>>>>> restarted if it crashes, but it will not start up again because
>>>>> the repository of the previous process was left behind.
>>>>>
>>>>>> This is a good point. As a consideration, the user should be able
>>>>>> to specify this property, or at least the base directory for JFR
>>>>>> repositories. Once this is resolved, I will update the test
>>>>>> accordingly.
>>>>>>> 4) to make the test more robust, at line #118 I'd compare
>>>>>>> eventsDeliveredWhileProducerIsAlive w/ 1 instead of 1/2 of
>>>>>>> eventsDeliveredTotal.
>>>>>> OK.
>>>>>>> 5) at line #110 please use Asserts.assertEqual instead of
>>>>>>> Assert.assertTrue(a == b) as it not just compare the values but
>>>>>>> also prints them out in case !(a==b)
>>>>>> Fixed
>>>>>>> 6) at line #118 please use Asserts.assertGreaterThanOrEqual
>>>>>>> instead of Assert.assertTrue(a>= b) as it not just compare the
>>>>>>> values but also prints them out in case !(a>=b)
>>>>>> Fixed
>>>>>>> 7) is it really necessary to wait till "producer" process main
>>>>>>> is started?
>>>>>> This is necessary to make sure that VM has initialized to a point
>>>>>> where it can correctly receive JCMD requests. We ran into this
>>>>>> type of issue before.
>>>>>>> 7.1) if it is, you should use String::equals at line #204
>>>>>>> instead of contains
>>>>>>> 7.2) if the answer of 7) is true, I'd recommend you to use
>>>>>>> j.u.c.CountDownLatch (w/ count = 1) instead of AtomicBoolean for
>>>>>>> producerMainStarted (you might need to find a better name, e.g.
>>>>>>> producerStartedSignal) to convey a signal that "producer"
>>>>>>> process reached its main, so line#205 will become
>>>>>>> producerMainStarted.countDown(), and line#142 --
>>>>>>> producerMainStarted.await()
>>>>>> Thanks for suggesting during our conversation the use of
>>>>>> ProcessTools.startProcess() with wait-until-predicate. I will use
>>>>>> this, it will simplify the code and be more declarative.
>>>>>>> 8) I find usage of AtomicIntegeres as a way to return two values
>>>>>>> from consumeEvents() at a very least confusing, could you please
>>>>>>> create a "record" class for that purpose instead?
>>>>>> I can do that, as we have discussed.
>>>>>>> 9) local 'id' (line #169) isn't use anywhere. will it be
>>>>>>> beneficial to check that its value is b/w 0 and
>>>>>>> (NUMBER_OF_EVENTS_PRODUCED - 1) and, for
>>>>>>> testProducerExitsNormally case, that sum of all id is
>>>>>>> ((NUMBER_OF_EVENTS_PRODUCED - 1) * NUMBER_OF_EVENTS_PRODUCED / 2) ?
>>>>>> The id is not necessary. I used it for development and
>>>>>> experimentation. I will remove it.
>>>>>>> 9.1) and if EventStream::onEvent is expected to preserve events
>>>>>>> order, that 'id' is strictly increasing?
>>>>>> No guarantee of that.
>>>>>>
>>>>>> I will make the updates, test, and post the updated webrev.
>>>>>>
>>>>>> Thank you,
>>>>>> Misha
>>>>>>>
>>>>>>> Thanks,
>>>>>>> -- Igor
>>>>>>>
>>>>>>>> On Nov 12, 2019, at 9:12 PM, Igor
>>>>>>>> Ignatyev<igor.ignatyev at oracle.com> wrote:
>>>>>>>>
>>>>>>>> Hi Misha,
>>>>>>>>
>>>>>>>> I haven't read the test code thoroughly yet, but I have a
>>>>>>>> question -- is there a reason for TestCrossProcessStreaming to
>>>>>>>> be run in othervm mode?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> -- Igor
>>>>>>>>
>>>>>>>>> On Nov 12, 2019, at 4:20 PM, Mikhailo
>>>>>>>>> Seledtsov<mikhailo.seledtsov at oracle.com> wrote:
>>>>>>>>>
>>>>>>>>> Please review this new JFR test. In this test the producer of
>>>>>>>>> JFR events and the consumer run in separate processes.
>>>>>>>>> This is expected to be a common scenario with JFR streaming
>>>>>>>>> for monitoring applications and services.
>>>>>>>>>
>>>>>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8229209
>>>>>>>>> Webrev: http://cr.openjdk.java.net/~mseledtsov/8229209.01/
>>>>>>>>> Testing:
>>>>>>>>> - ran this new test locally (Mac, Linux-x64) multiple
>>>>>>>>> times - PASS
>>>>>>>>> - running this test on the test cluster - in progress
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thank you,
>>>>>>>>> Misha
>>>>>>>>>
>>
More information about the hotspot-jfr-dev
mailing list