RFR(S): 8229209: [TESTBUG] test for cross-process JFR event streaming
Mikhailo Seledtsov
mikhailo.seledtsov at oracle.com
Thu Nov 14 19:40:56 UTC 2019
On 11/14/19, 11:26 AM, Igor Ignatyev wrote:
>
>
>> On Nov 14, 2019, at 11:23 AM, Mikhailo Seledtsov
>> <mikhailo.seledtsov at oracle.com
>> <mailto:mikhailo.seledtsov at oracle.com>> wrote:
>>
>> Minor update: http://cr.openjdk.java.net/~mseledtsov/8229209.05/
>> Difference since v04: removed exit code verification from
>> testProducerCrashing(), since crashing process produces different
>> exit codes on different systems, and it is not strictly necessary to
>> verify that.
> although it's not strictly need to verify the product behavior,
> checking that a process really crashed is beneficial to catch possible
> test bugs, you can assert that it's not 0 and you also might check
> that there is no 'Leaving main()' in stdout.
I will do that.
>
> the rest looks good to me.
Thank you,
Misha
>
> -- Igor
>>
>> Testing: ran multiple times in the test cluster, ALL PASS
>>
>> Misha
>>
>> On 11/13/19, 4:42 PM, Mikhailo Seledtsov wrote:
>>> Here is the updated webrev:
>>> http://cr.openjdk.java.net/~mseledtsov/8229209.04/
>>>
>>> I have incorporated feedback from Igor, as well as feedback from
>>> chatting with Erik, and some fixes.
>>> Feedback from Erik:
>>> - use VistualMachine.attach() instead of JCMD - preferred way,
>>> simplifies code
>>> - rework the crash logic in a producer a bit - simplifies code
>>>
>>> Also, I have added exports for child process "--add-exports" for
>>> Unsafe, added check for child process exit code, plus comment cleanup.
>>>
>>> Thank you,
>>> Misha
>>>
>>> On 11/13/19, 2:12 PM, Erik Gahlin wrote:
>>>>
>>>> On 2019-11-13 22:38, Mikhailo Seledtsov wrote:
>>>>> Hi Igor,
>>>>>
>>>>> Thank you for reviewing the test.
>>>>>
>>>>> The original question in your first email: I do not think it is
>>>>> necessary to run the test in othervm mode; I will remove the
>>>>> 'othervm' from at-run. The crashing process is the child of the
>>>>> main test process, so this should not affect the state of the main
>>>>> test VM.
>>>>>
>>>>> On 11/12/19, 9:59 PM, Igor Ignatyev wrote:
>>>>>> Hi Misha,
>>>>>>
>>>>>> I've finished reading, several more comments:
>>>>>>
>>>>>> 2) in consumeEvents at line #167, it'd be clearer and safer to
>>>>>> use TestEvent.class.getName().
>>>>> Fixed
>>>>>> 3) instead of asking JVM which you create its system properties
>>>>>> in getJfrRepository(), can we just start "producer" process w/
>>>>>> jdk.jfr.repository being set to a well-know location?
>>>>> AFAIK, JFR sub-system picks the location automatically. Part of
>>>>> location is a base directory, e.g. "/private/var/folders/" on
>>>>> Linux, while the second part is picked at runtime based on time
>>>>> stamp and PID. I would like to leave this as is for now, while I
>>>>> start a discussion with JFR team on how to improve this.
>>>>
>>>> It was our initial plan to provide means to set the repository on
>>>> the command line, but we ran into issues so we decided against it,
>>>> at least for the time being.
>>>>
>>>> -XX:FligtRecorderOptions:repository=... today means the base
>>>> repository so it may break code if we change it.
>>>>
>>>> Adding a second repository option (what would it be called?) would
>>>> confuse users, but also force us to deal with a situation where the
>>>> repository already exist. This can happen if the JVM crashes, or if
>>>> on Windows and another process is reading a file, and the directory
>>>> can't be removed. It can also happen, if you start two JVMs with
>>>> the same repository path.
>>>>
>>>> One can imagine a production deployment where a server is restarted
>>>> if it crashes, but it will not start up again because the
>>>> repository of the previous process was left behind.
>>>>
>>>>> This is a good point. As a consideration, the user should be able
>>>>> to specify this property, or at least the base directory for JFR
>>>>> repositories. Once this is resolved, I will update the test
>>>>> accordingly.
>>>>>> 4) to make the test more robust, at line #118 I'd compare
>>>>>> eventsDeliveredWhileProducerIsAlive w/ 1 instead of 1/2 of
>>>>>> eventsDeliveredTotal.
>>>>> OK.
>>>>>> 5) at line #110 please use Asserts.assertEqual instead of
>>>>>> Assert.assertTrue(a == b) as it not just compare the values but
>>>>>> also prints them out in case !(a==b)
>>>>> Fixed
>>>>>> 6) at line #118 please use Asserts.assertGreaterThanOrEqual
>>>>>> instead of Assert.assertTrue(a>= b) as it not just compare the
>>>>>> values but also prints them out in case !(a>=b)
>>>>> Fixed
>>>>>> 7) is it really necessary to wait till "producer" process main is
>>>>>> started?
>>>>> This is necessary to make sure that VM has initialized to a point
>>>>> where it can correctly receive JCMD requests. We ran into this
>>>>> type of issue before.
>>>>>> 7.1) if it is, you should use String::equals at line #204 instead
>>>>>> of contains
>>>>>> 7.2) if the answer of 7) is true, I'd recommend you to use
>>>>>> j.u.c.CountDownLatch (w/ count = 1) instead of AtomicBoolean for
>>>>>> producerMainStarted (you might need to find a better name, e.g.
>>>>>> producerStartedSignal) to convey a signal that "producer" process
>>>>>> reached its main, so line#205 will become
>>>>>> producerMainStarted.countDown(), and line#142 --
>>>>>> producerMainStarted.await()
>>>>> Thanks for suggesting during our conversation the use of
>>>>> ProcessTools.startProcess() with wait-until-predicate. I will use
>>>>> this, it will simplify the code and be more declarative.
>>>>>> 8) I find usage of AtomicIntegeres as a way to return two values
>>>>>> from consumeEvents() at a very least confusing, could you please
>>>>>> create a "record" class for that purpose instead?
>>>>> I can do that, as we have discussed.
>>>>>> 9) local 'id' (line #169) isn't use anywhere. will it be
>>>>>> beneficial to check that its value is b/w 0 and
>>>>>> (NUMBER_OF_EVENTS_PRODUCED - 1) and, for
>>>>>> testProducerExitsNormally case, that sum of all id is
>>>>>> ((NUMBER_OF_EVENTS_PRODUCED - 1) * NUMBER_OF_EVENTS_PRODUCED / 2) ?
>>>>> The id is not necessary. I used it for development and
>>>>> experimentation. I will remove it.
>>>>>> 9.1) and if EventStream::onEvent is expected to preserve events
>>>>>> order, that 'id' is strictly increasing?
>>>>> No guarantee of that.
>>>>>
>>>>> I will make the updates, test, and post the updated webrev.
>>>>>
>>>>> Thank you,
>>>>> Misha
>>>>>>
>>>>>> Thanks,
>>>>>> -- Igor
>>>>>>
>>>>>>> On Nov 12, 2019, at 9:12 PM, Igor
>>>>>>> Ignatyev<igor.ignatyev at oracle.com> wrote:
>>>>>>>
>>>>>>> Hi Misha,
>>>>>>>
>>>>>>> I haven't read the test code thoroughly yet, but I have a
>>>>>>> question -- is there a reason for TestCrossProcessStreaming to
>>>>>>> be run in othervm mode?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> -- Igor
>>>>>>>
>>>>>>>> On Nov 12, 2019, at 4:20 PM, Mikhailo
>>>>>>>> Seledtsov<mikhailo.seledtsov at oracle.com> wrote:
>>>>>>>>
>>>>>>>> Please review this new JFR test. In this test the producer of
>>>>>>>> JFR events and the consumer run in separate processes.
>>>>>>>> This is expected to be a common scenario with JFR streaming for
>>>>>>>> monitoring applications and services.
>>>>>>>>
>>>>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8229209
>>>>>>>> Webrev: http://cr.openjdk.java.net/~mseledtsov/8229209.01/
>>>>>>>> Testing:
>>>>>>>> - ran this new test locally (Mac, Linux-x64) multiple
>>>>>>>> times - PASS
>>>>>>>> - running this test on the test cluster - in progress
>>>>>>>>
>>>>>>>>
>>>>>>>> Thank you,
>>>>>>>> Misha
>>>>>>>>
>
More information about the hotspot-jfr-dev
mailing list