RFR: 8202466: Test serviceability/tmtools/jstat/GcTest01.java fails: Number of concurrent GC events is 1, but CGCT is 0
Chris Plummer
chris.plummer at oracle.com
Thu May 17 21:43:12 UTC 2018
Hi Yasumasa,
Your changes look fine. Can you fix the alignment of the output in the
following comment in JstatGcNewResults.java:
27 * Output example:
28 * S0C S1C S0U S1U TT MTT DSS
EC EU YGC YGCT
29 * 11264.0 11264.0 0.0 0.0 15 15 0.0 67584.0
1351.7 0 0.000
If it looks aligned to you in this email, it's because you are looking
at it using a variable width font. Look at is using a fixed width font.
thanks,
Chris
On 5/15/18 6:52 AM, Yasumasa Suenaga wrote:
> Hi,
>
>>> If that is the case, maybe just drop the error check.
>>
>> I want to do that.
>
> I removed them in following webrev. Could you review it?
>
> http://cr.openjdk.java.net/~ysuenaga/JDK-8202466/webrev.00/
>
>
> Thanks,
>
> Yasumasa
>
>
> On 2018/05/06 21:44, Yasumasa Suenaga wrote:
>> Hi Chris,
>>
>>> As I pointed out in my initial review, this is not code I'm that
>>> familiar with. One question I have is if it is even necessary to
>>> fail if the value is .000? It seems the assumption is that if there
>>> was 1 GC event, the total time spent will be at least .0005ms,
>>> rounded up to .001ms, so it shows up in the output as a non-zero value.
>>
>> I agree with you.
>>
>>
>>> If that is the case, maybe just drop the error check.
>>
>> I want to do that.
>> Another option(s) affects jstat output, so I think we need to CSR.
>>
>>
>> Thanks,
>>
>> Yasumasa
>>
>>
>> On 2018/05/05 7:03, Chris Plummer wrote:
>>> Hi Yasumasa,
>>>
>>> I just noticed that GcTest01.java and GcCauseTest03.java have also
>>> failed for this reason. I see 9 total failures between the 3 tests
>>> so far.
>>>
>>> On 5/4/18 6:17 AM, Yasumasa Suenaga wrote:
>>>> Hi Chris,
>>>>
>>>> Thank you for your evaluation !
>>>> It is very helpful for me.
>>>> (I've not reproduced this issue on linux-x64 !)
>>>>
>>>>> It's just very intermittent.
>>>>
>>>> I do not yet decided how do we fix this problem.
>>>>
>>>> IMHO we can add fallback code to get raw PerfCounter data through
>>>> `jcmd PerfCounter.print`.
>>>> But I think we shouldn't be so because that fallback (jcmd) code
>>>> might hide jstat problem(s).
>>>>
>>>> Also we cannot reproduce this issue completely, so we might not
>>>> check the change for this issue.
>>>>
>>>>
>>>> Do you have any idea?
>>> As I pointed out in my initial review, this is not code I'm that
>>> familiar with. One question I have is if it is even necessary to
>>> fail if the value is .000? It seems the assumption is that if there
>>> was 1 GC event, the total time spent will be at least .0005ms,
>>> rounded up to .001ms, so it shows up in the output as a non-zero
>>> value. If that is the case, maybe just drop the error check. Another
>>> option is to use something other than .000 when the true value
>>> rounds to .000. Does it have to be 3 digits? Maybe always print
>>> enough digits so you never end up with all 0's.
>>>
>>> thanks,
>>>
>>> Chris
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Yasumasa
>>>>
>>>>
>>>>
>>>> On 2018/05/04 2:08, Chris Plummer wrote:
>>>>> Hi Yasumasa,
>>>>>
>>>>> Here are my results. I ran the test on macosx-x64-debug 500 times
>>>>> using random machines. I saw the failure twice, both times on
>>>>> different machines. I then ran 25 times on each of the 3 macosx
>>>>> machines that have shown the failure, and did not see it again. So
>>>>> I don't think this is necessarily an issue that is more likely to
>>>>> turn up on one macosx machine than any other. It's just very
>>>>> intermittent.
>>>>>
>>>>> I then ran 200 times each on all our supported platforms. So that
>>>>> would be both debug and product builds on macosx, linux-x64,
>>>>> solaris-sparc, and windows-x64. During those runs it turned up
>>>>> once on linux-x64 (product, not debug), so this issue does not
>>>>> seem to be limited to macosx.
>>>>>
>>>>> best regards,
>>>>>
>>>>> Chris
>>>>>
>>>>> On 5/3/18 8:26 AM, Chris Plummer wrote:
>>>>>> Hi Yasumasa,
>>>>>>
>>>>>> I only see the one reported failure in our recent test history
>>>>>> for this test, so it doesn't look like it happens every time.
>>>>>> I'll try running the test on the machine it failed on, both with
>>>>>> the binary it failed with and a more recent binary.
>>>>>>
>>>>>> best regards,
>>>>>>
>>>>>> chris
>>>>>>
>>>>>> On 5/3/18 4:48 AM, Yasumasa Suenaga wrote:
>>>>>>> FYI: jdk-11-ea+11 passed GcTest01.java
>>>>>>> https://download.java.net/openjdk/testresults/11/archives/11/diff-hotspot-10-11.txt
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 2018/05/02 21:46, Yasumasa Suenaga wrote:
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I've tried to fix 8202466, but I do not yet certain which
>>>>>>>> solution is the best for it.
>>>>>>>> So I want your opinion for it.
>>>>>>>>
>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8202466
>>>>>>>>
>>>>>>>> David reports serviceability/tmtools/jstat/GcTest01.java failed
>>>>>>>> in Mach5. It seems to appear on OS X only.
>>>>>>>> He has shared jtreg report, and I found as below:
>>>>>>>>
>>>>>>>> ```
>>>>>>>> stdout: S0C S1C S0U S1U EC EU OC OU MC MU CCSC CCSU YGC YGCT
>>>>>>>> FGC FGCT CGC CGCT GCT
>>>>>>>> 0.0 0.0 0.0 0.0 3072.0 0.0 11264.0 2825.6 9472.0 9003.3
>>>>>>>> 1024.0 845.3 26 0.204 3 0.292 1 0.000 0.497
>>>>>>>> ```
>>>>>>>>
>>>>>>>> YGCT: 0.204
>>>>>>>> FGCT: 0.292
>>>>>>>> CGCT: 0.000
>>>>>>>>
>>>>>>>> GCT: 0.497
>>>>>>>>
>>>>>>>> I guess this failure was caused by rounding error because (YGCT
>>>>>>>> + FGCT + CGCT) < GCT.
>>>>>>>> CGC is 1, so CGC operation might finish in very short time.
>>>>>>>>
>>>>>>>>
>>>>>>>> GcTest01.java:
>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/4da7dce7e2bf/test/hotspot/jtreg/serviceability/tmtools/jstat/GcTest01.java#l57
>>>>>>>>
>>>>>>>>
>>>>>>>> GcCauseTest01.java:
>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/4da7dce7e2bf/test/hotspot/jtreg/serviceability/tmtools/jstat/GcCauseTest01.java#l53
>>>>>>>>
>>>>>>>>
>>>>>>>> GcTest01 and GcCauseTest01 are very similar, but GcCauseTest01
>>>>>>>> passed.
>>>>>>>> Both tests use GcProvoker::provokeGc() to inflate memory usage.
>>>>>>>> So I wonder why GcTest01 just only failed.
>>>>>>>> I guess we might encounter similar issue(s) in the future if we
>>>>>>>> get more fast machines.
>>>>>>>>
>>>>>>>> Hence I think we can take two approaches as below:
>>>>>>>>
>>>>>>>> 1. Add all tests in serviceability/tmtools/jstat to
>>>>>>>> ProblemList.
>>>>>>>> 2. Change all JstatGc*Tool to use custom jstat_options - to
>>>>>>>> show raw values in PerfCounters
>>>>>>>>
>>>>>>>>
>>>>>>>> What do you think?
>>>>>>>> I can start to work for it if they are OK.
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Yasumasa
>>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>
>>>
More information about the serviceability-dev
mailing list