RFR(S): 8229209: [TESTBUG] test for cross-process JFR event streaming

Mikhailo Seledtsov mikhailo.seledtsov at oracle.com
Thu Nov 14 19:23:15 UTC 2019


Minor update: http://cr.openjdk.java.net/~mseledtsov/8229209.05/
Difference since v04: removed exit code verification from 
testProducerCrashing(), since crashing process produces different exit 
codes on different systems, and it is not strictly necessary to verify that.

Testing: ran multiple times in the test cluster, ALL PASS

Misha

On 11/13/19, 4:42 PM, Mikhailo Seledtsov wrote:
> Here is the updated webrev: 
> http://cr.openjdk.java.net/~mseledtsov/8229209.04/
>
> I have incorporated feedback from Igor, as well as feedback from 
> chatting with Erik, and some fixes.
> Feedback from Erik:
>   - use VistualMachine.attach() instead of JCMD - preferred way, 
> simplifies code
>   - rework the crash logic in a producer a bit - simplifies code
>
> Also, I have added exports for child process "--add-exports" for 
> Unsafe, added check for child process exit code, plus comment cleanup.
>
> Thank you,
> Misha
>
> On 11/13/19, 2:12 PM, Erik Gahlin wrote:
>>
>> On 2019-11-13 22:38, Mikhailo Seledtsov wrote:
>>> Hi Igor,
>>>
>>>   Thank you for reviewing the test.
>>>
>>> The original question in your first email: I do not think it is 
>>> necessary to run the test in othervm mode; I will remove the 
>>> 'othervm' from at-run. The crashing process is the child of the main 
>>> test process, so this should not affect the state of the main test VM.
>>>
>>> On 11/12/19, 9:59 PM, Igor Ignatyev wrote:
>>>> Hi Misha,
>>>>
>>>> I've finished reading, several more comments:
>>>>
>>>> 2) in consumeEvents at line #167, it'd be clearer and safer to use 
>>>> TestEvent.class.getName().
>>> Fixed
>>>> 3) instead of asking JVM which you create its system properties in 
>>>> getJfrRepository(), can we just start "producer" process w/ 
>>>> jdk.jfr.repository being set to a well-know location?
>>> AFAIK, JFR sub-system picks the location automatically. Part of 
>>> location is a base directory, e.g. "/private/var/folders/" on Linux, 
>>> while the second part is picked at runtime based on time stamp and 
>>> PID. I would like to leave this as is for now, while I start a 
>>> discussion with JFR team on how to improve this.
>>
>> It was our initial plan to provide means to set the repository on the 
>> command line, but we ran into issues so we decided against it, at 
>> least for the time being.
>>
>>  -XX:FligtRecorderOptions:repository=... today means the base 
>> repository so it may break code if we change it.
>>
>> Adding a second repository option (what would it be called?) would 
>> confuse users, but also force us to deal with a situation where the 
>> repository already exist. This can happen if the JVM crashes, or if 
>> on Windows and another process is reading a file, and the directory 
>> can't be removed. It can also happen, if you start two JVMs with the 
>> same repository path.
>>
>> One can imagine a production deployment where a server is restarted 
>> if it crashes, but it will not start up again because the repository 
>> of the previous process was left behind.
>>
>>> This is a good point. As a consideration, the user should be able to 
>>> specify this property, or at least the base directory for JFR 
>>> repositories. Once this is resolved, I will update the test 
>>> accordingly.
>>>> 4) to make the test more robust, at line #118 I'd compare 
>>>> eventsDeliveredWhileProducerIsAlive w/ 1 instead of 1/2 of 
>>>> eventsDeliveredTotal.
>>> OK.
>>>> 5) at line #110 please use Asserts.assertEqual instead of 
>>>> Assert.assertTrue(a == b) as it not just compare the values but 
>>>> also prints them out in case !(a==b)
>>> Fixed
>>>> 6) at line #118 please use Asserts.assertGreaterThanOrEqual instead 
>>>> of Assert.assertTrue(a>= b) as it not just compare the values but 
>>>> also prints them out in case !(a>=b)
>>> Fixed
>>>> 7) is it really necessary to wait till "producer" process main is 
>>>> started?
>>> This is necessary to make sure that VM has initialized to a point 
>>> where it can correctly receive JCMD requests. We ran into this type 
>>> of issue before.
>>>> 7.1) if it is, you should use String::equals at line #204 instead 
>>>> of contains
>>>> 7.2) if the answer of 7) is true, I'd recommend you to use 
>>>> j.u.c.CountDownLatch (w/ count = 1) instead of AtomicBoolean for 
>>>> producerMainStarted (you might need to find a better name, e.g. 
>>>> producerStartedSignal) to convey a signal that "producer" process 
>>>> reached its main, so line#205 will become 
>>>> producerMainStarted.countDown(), and line#142 -- 
>>>> producerMainStarted.await()
>>> Thanks for suggesting during our conversation the use of 
>>> ProcessTools.startProcess() with wait-until-predicate. I will use 
>>> this, it will simplify the code and be more declarative.
>>>> 8) I find usage of AtomicIntegeres as a way to return two values 
>>>> from consumeEvents() at a very least confusing, could you please 
>>>> create a "record" class for that purpose instead?
>>> I can do that, as we have discussed.
>>>> 9) local 'id' (line #169) isn't use anywhere. will it be beneficial 
>>>> to check that its value is b/w 0 and (NUMBER_OF_EVENTS_PRODUCED - 
>>>> 1) and, for testProducerExitsNormally case, that sum of all id is 
>>>> ((NUMBER_OF_EVENTS_PRODUCED - 1) * NUMBER_OF_EVENTS_PRODUCED / 2) ?
>>> The id is not necessary. I used it for development and 
>>> experimentation. I will remove it.
>>>> 9.1) and if EventStream::onEvent is expected to preserve events 
>>>> order, that 'id' is strictly increasing?
>>> No guarantee of that.
>>>
>>> I will make the updates, test, and post the updated webrev.
>>>
>>> Thank you,
>>> Misha
>>>>
>>>> Thanks,
>>>> -- Igor
>>>>
>>>>> On Nov 12, 2019, at 9:12 PM, Igor 
>>>>> Ignatyev<igor.ignatyev at oracle.com>  wrote:
>>>>>
>>>>> Hi Misha,
>>>>>
>>>>> I haven't read the test code thoroughly yet, but I have a question 
>>>>> -- is there a reason for TestCrossProcessStreaming to be run in 
>>>>> othervm mode?
>>>>>
>>>>> Thanks,
>>>>> -- Igor
>>>>>
>>>>>> On Nov 12, 2019, at 4:20 PM, Mikhailo 
>>>>>> Seledtsov<mikhailo.seledtsov at oracle.com>  wrote:
>>>>>>
>>>>>> Please review this new JFR test. In this test the producer of JFR 
>>>>>> events and the consumer run in separate processes.
>>>>>> This is expected to be a common scenario with JFR streaming for 
>>>>>> monitoring applications and services.
>>>>>>
>>>>>>    JBS: https://bugs.openjdk.java.net/browse/JDK-8229209
>>>>>>    Webrev: http://cr.openjdk.java.net/~mseledtsov/8229209.01/
>>>>>>    Testing:
>>>>>>      - ran this new test locally (Mac, Linux-x64) multiple times 
>>>>>> - PASS
>>>>>>      - running this test on the test cluster - in progress
>>>>>>
>>>>>>
>>>>>> Thank you,
>>>>>> Misha
>>>>>>


More information about the hotspot-jfr-dev mailing list