RFR(S): 8229209: [TESTBUG] test for cross-process JFR event streaming

Fri Nov 15 04:17:30 UTC 2019

Here is the updated webrev: 
http://cr.openjdk.java.net/~mseledtsov/8229209.06/

On 11/14/19 11:40 AM, Mikhailo Seledtsov wrote:
>
>
> On 11/14/19, 11:26 AM, Igor Ignatyev wrote:
>>
>>
>>> On Nov 14, 2019, at 11:23 AM, Mikhailo Seledtsov 
>>> <mikhailo.seledtsov at oracle.com 
>>> <mailto:mikhailo.seledtsov at oracle.com>> wrote:
>>>
>>> Minor update: http://cr.openjdk.java.net/~mseledtsov/8229209.05/
>>> Difference since v04: removed exit code verification from 
>>> testProducerCrashing(), since crashing process produces different 
>>> exit codes on different systems, and it is not strictly necessary to 
>>> verify that.
>> although it's not strictly need to verify the product behavior, 
>> checking that a process really crashed is beneficial to catch 
>> possible test bugs, you can assert that it's not 0 and  you also 
>> might check that there is no 'Leaving main()' in stdout.
> I will do that.
>>
>> the rest looks good to me.
> Thank you,
> Misha
>>
>> -- Igor
>>>
>>> Testing: ran multiple times in the test cluster, ALL PASS
>>>
>>> Misha
>>>
>>> On 11/13/19, 4:42 PM, Mikhailo Seledtsov wrote:
>>>> Here is the updated webrev: 
>>>> http://cr.openjdk.java.net/~mseledtsov/8229209.04/
>>>>
>>>> I have incorporated feedback from Igor, as well as feedback from 
>>>> chatting with Erik, and some fixes.
>>>> Feedback from Erik:
>>>>   - use VistualMachine.attach() instead of JCMD - preferred way, 
>>>> simplifies code
>>>>   - rework the crash logic in a producer a bit - simplifies code
>>>>
>>>> Also, I have added exports for child process "--add-exports" for 
>>>> Unsafe, added check for child process exit code, plus comment cleanup.
>>>>
>>>> Thank you,
>>>> Misha
>>>>
>>>> On 11/13/19, 2:12 PM, Erik Gahlin wrote:
>>>>>
>>>>> On 2019-11-13 22:38, Mikhailo Seledtsov wrote:
>>>>>> Hi Igor,
>>>>>>
>>>>>>   Thank you for reviewing the test.
>>>>>>
>>>>>> The original question in your first email: I do not think it is 
>>>>>> necessary to run the test in othervm mode; I will remove the 
>>>>>> 'othervm' from at-run. The crashing process is the child of the 
>>>>>> main test process, so this should not affect the state of the 
>>>>>> main test VM.
>>>>>>
>>>>>> On 11/12/19, 9:59 PM, Igor Ignatyev wrote:
>>>>>>> Hi Misha,
>>>>>>>
>>>>>>> I've finished reading, several more comments:
>>>>>>>
>>>>>>> 2) in consumeEvents at line #167, it'd be clearer and safer to 
>>>>>>> use TestEvent.class.getName().
>>>>>> Fixed
>>>>>>> 3) instead of asking JVM which you create its system properties 
>>>>>>> in getJfrRepository(), can we just start "producer" process w/ 
>>>>>>> jdk.jfr.repository being set to a well-know location?
>>>>>> AFAIK, JFR sub-system picks the location automatically. Part of 
>>>>>> location is a base directory, e.g. "/private/var/folders/" on 
>>>>>> Linux, while the second part is picked at runtime based on time 
>>>>>> stamp and PID. I would like to leave this as is for now, while I 
>>>>>> start a discussion with JFR team on how to improve this.
>>>>>
>>>>> It was our initial plan to provide means to set the repository on 
>>>>> the command line, but we ran into issues so we decided against it, 
>>>>> at least for the time being.
>>>>>
>>>>>  -XX:FligtRecorderOptions:repository=... today means the base 
>>>>> repository so it may break code if we change it.
>>>>>
>>>>> Adding a second repository option (what would it be called?) would 
>>>>> confuse users, but also force us to deal with a situation where 
>>>>> the repository already exist. This can happen if the JVM crashes, 
>>>>> or if on Windows and another process is reading a file, and the 
>>>>> directory can't be removed. It can also happen, if you start two 
>>>>> JVMs with the same repository path.
>>>>>
>>>>> One can imagine a production deployment where a server is 
>>>>> restarted if it crashes, but it will not start up again because 
>>>>> the repository of the previous process was left behind.
>>>>>
>>>>>> This is a good point. As a consideration, the user should be able 
>>>>>> to specify this property, or at least the base directory for JFR 
>>>>>> repositories. Once this is resolved, I will update the test 
>>>>>> accordingly.
>>>>>>> 4) to make the test more robust, at line #118 I'd compare 
>>>>>>> eventsDeliveredWhileProducerIsAlive w/ 1 instead of 1/2 of 
>>>>>>> eventsDeliveredTotal.
>>>>>> OK.
>>>>>>> 5) at line #110 please use Asserts.assertEqual instead of 
>>>>>>> Assert.assertTrue(a == b) as it not just compare the values but 
>>>>>>> also prints them out in case !(a==b)
>>>>>> Fixed
>>>>>>> 6) at line #118 please use Asserts.assertGreaterThanOrEqual 
>>>>>>> instead of Assert.assertTrue(a>= b) as it not just compare the 
>>>>>>> values but also prints them out in case !(a>=b)
>>>>>> Fixed
>>>>>>> 7) is it really necessary to wait till "producer" process main 
>>>>>>> is started?
>>>>>> This is necessary to make sure that VM has initialized to a point 
>>>>>> where it can correctly receive JCMD requests. We ran into this 
>>>>>> type of issue before.
>>>>>>> 7.1) if it is, you should use String::equals at line #204 
>>>>>>> instead of contains
>>>>>>> 7.2) if the answer of 7) is true, I'd recommend you to use 
>>>>>>> j.u.c.CountDownLatch (w/ count = 1) instead of AtomicBoolean for 
>>>>>>> producerMainStarted (you might need to find a better name, e.g. 
>>>>>>> producerStartedSignal) to convey a signal that "producer" 
>>>>>>> process reached its main, so line#205 will become 
>>>>>>> producerMainStarted.countDown(), and line#142 -- 
>>>>>>> producerMainStarted.await()
>>>>>> Thanks for suggesting during our conversation the use of 
>>>>>> ProcessTools.startProcess() with wait-until-predicate. I will use 
>>>>>> this, it will simplify the code and be more declarative.
>>>>>>> 8) I find usage of AtomicIntegeres as a way to return two values 
>>>>>>> from consumeEvents() at a very least confusing, could you please 
>>>>>>> create a "record" class for that purpose instead?
>>>>>> I can do that, as we have discussed.
>>>>>>> 9) local 'id' (line #169) isn't use anywhere. will it be 
>>>>>>> beneficial to check that its value is b/w 0 and 
>>>>>>> (NUMBER_OF_EVENTS_PRODUCED - 1) and, for 
>>>>>>> testProducerExitsNormally case, that sum of all id is 
>>>>>>> ((NUMBER_OF_EVENTS_PRODUCED - 1) * NUMBER_OF_EVENTS_PRODUCED / 2) ?
>>>>>> The id is not necessary. I used it for development and 
>>>>>> experimentation. I will remove it.
>>>>>>> 9.1) and if EventStream::onEvent is expected to preserve events 
>>>>>>> order, that 'id' is strictly increasing?
>>>>>> No guarantee of that.
>>>>>>
>>>>>> I will make the updates, test, and post the updated webrev.
>>>>>>
>>>>>> Thank you,
>>>>>> Misha
>>>>>>>
>>>>>>> Thanks,
>>>>>>> -- Igor
>>>>>>>
>>>>>>>> On Nov 12, 2019, at 9:12 PM, Igor 
>>>>>>>> Ignatyev<igor.ignatyev at oracle.com> wrote:
>>>>>>>>
>>>>>>>> Hi Misha,
>>>>>>>>
>>>>>>>> I haven't read the test code thoroughly yet, but I have a 
>>>>>>>> question -- is there a reason for TestCrossProcessStreaming to 
>>>>>>>> be run in othervm mode?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> -- Igor
>>>>>>>>
>>>>>>>>> On Nov 12, 2019, at 4:20 PM, Mikhailo 
>>>>>>>>> Seledtsov<mikhailo.seledtsov at oracle.com> wrote:
>>>>>>>>>
>>>>>>>>> Please review this new JFR test. In this test the producer of 
>>>>>>>>> JFR events and the consumer run in separate processes.
>>>>>>>>> This is expected to be a common scenario with JFR streaming 
>>>>>>>>> for monitoring applications and services.
>>>>>>>>>
>>>>>>>>>    JBS: https://bugs.openjdk.java.net/browse/JDK-8229209
>>>>>>>>>    Webrev: http://cr.openjdk.java.net/~mseledtsov/8229209.01/
>>>>>>>>>    Testing:
>>>>>>>>>      - ran this new test locally (Mac, Linux-x64) multiple 
>>>>>>>>> times - PASS
>>>>>>>>>      - running this test on the test cluster - in progress
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thank you,
>>>>>>>>> Misha
>>>>>>>>>
>>