RFR(S): 8229209: [TESTBUG] test for cross-process JFR event streaming

Mikhailo Seledtsov mikhailo.seledtsov at oracle.com
Thu Nov 14 19:40:56 UTC 2019



On 11/14/19, 11:26 AM, Igor Ignatyev wrote:
>
>
>> On Nov 14, 2019, at 11:23 AM, Mikhailo Seledtsov 
>> <mikhailo.seledtsov at oracle.com 
>> <mailto:mikhailo.seledtsov at oracle.com>> wrote:
>>
>> Minor update: http://cr.openjdk.java.net/~mseledtsov/8229209.05/
>> Difference since v04: removed exit code verification from 
>> testProducerCrashing(), since crashing process produces different 
>> exit codes on different systems, and it is not strictly necessary to 
>> verify that.
> although it's not strictly need to verify the product behavior, 
> checking that a process really crashed is beneficial to catch possible 
> test bugs, you can assert that it's not 0 and  you also might check 
> that there is no 'Leaving main()' in stdout.
I will do that.
>
> the rest looks good to me.
Thank you,
Misha
>
> -- Igor
>>
>> Testing: ran multiple times in the test cluster, ALL PASS
>>
>> Misha
>>
>> On 11/13/19, 4:42 PM, Mikhailo Seledtsov wrote:
>>> Here is the updated webrev: 
>>> http://cr.openjdk.java.net/~mseledtsov/8229209.04/
>>>
>>> I have incorporated feedback from Igor, as well as feedback from 
>>> chatting with Erik, and some fixes.
>>> Feedback from Erik:
>>>   - use VistualMachine.attach() instead of JCMD - preferred way, 
>>> simplifies code
>>>   - rework the crash logic in a producer a bit - simplifies code
>>>
>>> Also, I have added exports for child process "--add-exports" for 
>>> Unsafe, added check for child process exit code, plus comment cleanup.
>>>
>>> Thank you,
>>> Misha
>>>
>>> On 11/13/19, 2:12 PM, Erik Gahlin wrote:
>>>>
>>>> On 2019-11-13 22:38, Mikhailo Seledtsov wrote:
>>>>> Hi Igor,
>>>>>
>>>>>   Thank you for reviewing the test.
>>>>>
>>>>> The original question in your first email: I do not think it is 
>>>>> necessary to run the test in othervm mode; I will remove the 
>>>>> 'othervm' from at-run. The crashing process is the child of the 
>>>>> main test process, so this should not affect the state of the main 
>>>>> test VM.
>>>>>
>>>>> On 11/12/19, 9:59 PM, Igor Ignatyev wrote:
>>>>>> Hi Misha,
>>>>>>
>>>>>> I've finished reading, several more comments:
>>>>>>
>>>>>> 2) in consumeEvents at line #167, it'd be clearer and safer to 
>>>>>> use TestEvent.class.getName().
>>>>> Fixed
>>>>>> 3) instead of asking JVM which you create its system properties 
>>>>>> in getJfrRepository(), can we just start "producer" process w/ 
>>>>>> jdk.jfr.repository being set to a well-know location?
>>>>> AFAIK, JFR sub-system picks the location automatically. Part of 
>>>>> location is a base directory, e.g. "/private/var/folders/" on 
>>>>> Linux, while the second part is picked at runtime based on time 
>>>>> stamp and PID. I would like to leave this as is for now, while I 
>>>>> start a discussion with JFR team on how to improve this.
>>>>
>>>> It was our initial plan to provide means to set the repository on 
>>>> the command line, but we ran into issues so we decided against it, 
>>>> at least for the time being.
>>>>
>>>>  -XX:FligtRecorderOptions:repository=... today means the base 
>>>> repository so it may break code if we change it.
>>>>
>>>> Adding a second repository option (what would it be called?) would 
>>>> confuse users, but also force us to deal with a situation where the 
>>>> repository already exist. This can happen if the JVM crashes, or if 
>>>> on Windows and another process is reading a file, and the directory 
>>>> can't be removed. It can also happen, if you start two JVMs with 
>>>> the same repository path.
>>>>
>>>> One can imagine a production deployment where a server is restarted 
>>>> if it crashes, but it will not start up again because the 
>>>> repository of the previous process was left behind.
>>>>
>>>>> This is a good point. As a consideration, the user should be able 
>>>>> to specify this property, or at least the base directory for JFR 
>>>>> repositories. Once this is resolved, I will update the test 
>>>>> accordingly.
>>>>>> 4) to make the test more robust, at line #118 I'd compare 
>>>>>> eventsDeliveredWhileProducerIsAlive w/ 1 instead of 1/2 of 
>>>>>> eventsDeliveredTotal.
>>>>> OK.
>>>>>> 5) at line #110 please use Asserts.assertEqual instead of 
>>>>>> Assert.assertTrue(a == b) as it not just compare the values but 
>>>>>> also prints them out in case !(a==b)
>>>>> Fixed
>>>>>> 6) at line #118 please use Asserts.assertGreaterThanOrEqual 
>>>>>> instead of Assert.assertTrue(a>= b) as it not just compare the 
>>>>>> values but also prints them out in case !(a>=b)
>>>>> Fixed
>>>>>> 7) is it really necessary to wait till "producer" process main is 
>>>>>> started?
>>>>> This is necessary to make sure that VM has initialized to a point 
>>>>> where it can correctly receive JCMD requests. We ran into this 
>>>>> type of issue before.
>>>>>> 7.1) if it is, you should use String::equals at line #204 instead 
>>>>>> of contains
>>>>>> 7.2) if the answer of 7) is true, I'd recommend you to use 
>>>>>> j.u.c.CountDownLatch (w/ count = 1) instead of AtomicBoolean for 
>>>>>> producerMainStarted (you might need to find a better name, e.g. 
>>>>>> producerStartedSignal) to convey a signal that "producer" process 
>>>>>> reached its main, so line#205 will become 
>>>>>> producerMainStarted.countDown(), and line#142 -- 
>>>>>> producerMainStarted.await()
>>>>> Thanks for suggesting during our conversation the use of 
>>>>> ProcessTools.startProcess() with wait-until-predicate. I will use 
>>>>> this, it will simplify the code and be more declarative.
>>>>>> 8) I find usage of AtomicIntegeres as a way to return two values 
>>>>>> from consumeEvents() at a very least confusing, could you please 
>>>>>> create a "record" class for that purpose instead?
>>>>> I can do that, as we have discussed.
>>>>>> 9) local 'id' (line #169) isn't use anywhere. will it be 
>>>>>> beneficial to check that its value is b/w 0 and 
>>>>>> (NUMBER_OF_EVENTS_PRODUCED - 1) and, for 
>>>>>> testProducerExitsNormally case, that sum of all id is 
>>>>>> ((NUMBER_OF_EVENTS_PRODUCED - 1) * NUMBER_OF_EVENTS_PRODUCED / 2) ?
>>>>> The id is not necessary. I used it for development and 
>>>>> experimentation. I will remove it.
>>>>>> 9.1) and if EventStream::onEvent is expected to preserve events 
>>>>>> order, that 'id' is strictly increasing?
>>>>> No guarantee of that.
>>>>>
>>>>> I will make the updates, test, and post the updated webrev.
>>>>>
>>>>> Thank you,
>>>>> Misha
>>>>>>
>>>>>> Thanks,
>>>>>> -- Igor
>>>>>>
>>>>>>> On Nov 12, 2019, at 9:12 PM, Igor 
>>>>>>> Ignatyev<igor.ignatyev at oracle.com>  wrote:
>>>>>>>
>>>>>>> Hi Misha,
>>>>>>>
>>>>>>> I haven't read the test code thoroughly yet, but I have a 
>>>>>>> question -- is there a reason for TestCrossProcessStreaming to 
>>>>>>> be run in othervm mode?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> -- Igor
>>>>>>>
>>>>>>>> On Nov 12, 2019, at 4:20 PM, Mikhailo 
>>>>>>>> Seledtsov<mikhailo.seledtsov at oracle.com>  wrote:
>>>>>>>>
>>>>>>>> Please review this new JFR test. In this test the producer of 
>>>>>>>> JFR events and the consumer run in separate processes.
>>>>>>>> This is expected to be a common scenario with JFR streaming for 
>>>>>>>> monitoring applications and services.
>>>>>>>>
>>>>>>>>    JBS: https://bugs.openjdk.java.net/browse/JDK-8229209
>>>>>>>>    Webrev: http://cr.openjdk.java.net/~mseledtsov/8229209.01/
>>>>>>>>    Testing:
>>>>>>>>      - ran this new test locally (Mac, Linux-x64) multiple 
>>>>>>>> times - PASS
>>>>>>>>      - running this test on the test cluster - in progress
>>>>>>>>
>>>>>>>>
>>>>>>>> Thank you,
>>>>>>>> Misha
>>>>>>>>
>


More information about the hotspot-jfr-dev mailing list