RFR(S) 8246019 PerfClassTraceTime slows down VM start-up

Wed Jun 17 09:42:10 UTC 2020

On 17/06/2020 1:54 pm, Yumin Qi wrote:
>    product(bool, UsePerfData, 
> true,                                          \
>            "Flag to disable jvmstat instrumentation for performance 
> testing "\
>            "and problem isolation 
> purposes")                                 \
> 
> 
> The flag default value set to true --- should we change that? If the 
> flag set to false at default, performance can benefit from that. If 
> users want to collect performance data, should explicitly turn it on.

See comments in JDK-8246020.

David

> 
> Thanks
> 
> Yumin
> 
> 
> On 6/16/20 8:19 PM, Ioi Lam wrote:
>>
>>
>> On 6/16/20 6:20 PM, David Holmes wrote:
>>> Hi Ioi,
>>>
>>> On 17/06/2020 6:14 am, Ioi Lam wrote:
>>>> https://bugs.openjdk.java.net/browse/JDK-8246019
>>>> http://cr.openjdk.java.net/~iklam/jdk16/8246019-avoid-PerfClassTraceTime.v01/ 
>>>>
>>>>
>>>> PerfClassTraceTime is (a rarely used feature) for measuring the time 
>>>> spent during class linking and initialization.
>>>
>>> "A special command jcmd <process id/main class> PerfCounter.print 
>>> prints all performance counters in the process."
>>>
>>> How do you know this is a "rarely used feature"?
>> Hi David,
>>
>> Sure, the counter will be dumped, but by "rarely used" -- I mean no 
>> one will find this particular counter useful, and no one will be 
>> actively looking at it.
>>
>> I changed two parts of the code -- class init and class linking.
>>
>> For class initialization, the counter may be useful for people who 
>> want to know how much time is spent in their <clinit> functions, and 
>> my patch doesn't change that. It only avoids using the counter when a 
>> class has no <clinit>, i.e., we know that the counter counts nothing 
>> (except for a logging statement).
>>
>> =====
>>
>> For class linking, no user code is executed, so it only measures VM 
>> code. If it's useful for anyone, that would be VM engineers like me 
>> who are trying to optimize the speed of class loading. However, due to 
>> the overhead of the counter vs what it's trying to measure, the 
>> results are pretty meaningless.
>>
>> Note that I've not disabled the counter altogether. Instead, I disable 
>> it only when linking a CDS shared class, and we know that very little 
>> is happening for this class (e.g., no verification).
>>
>> I think the class linking timer might have been useful 15 years ago 
>> when it was introduced, or it might be useful today when CDS is 
>> disabled. But with CDS enabled, we are paying a constant price that 
>> seems to benefit no one.
>>
>> I think we should short-circuit it when it seems appropriate. If this 
>> indeed causes problems for our users, it's easy to re-enable it. 
>> That's better than just keeping this forever just because we're afraid 
>> to touch anything.
>>
>>>
>>> I find it hard to evaluate whether this short-circuiting of the time 
>>> tracing is reasonable or not. Obviously any monitoring mechanism 
>>> should impose minimal overhead compared to what is being measured, 
>>> and these timers fall short in that regard. But if these stats become 
>>> meaningless then they may as well be removed.
>>>
>>> I think the serviceability folk (cc'd) need to evaluate this in the 
>>> context of the M&M tools.
>>>
>>>> However, it's quite expensive and it needs to start and stop a bunch 
>>>> of timers. With CDS, it's quite often for the overhead of the timer 
>>>> itself to be much more than the time it's trying to measure, giving 
>>>> unreliable measurement.
>>>>
>>>> In this patch, when it's clear that the init and linking will be 
>>>> very quick, I disable the timer and count only the number of 
>>>> invocations. This shows a small improvement in start-up
>>>
>>> I'm curious if you tried to forcing EagerInitialization to be true to 
>>> see how that improves the baseline. I've always noticed eager_init in 
>>> the code, but hadn't realized it is disabled by default.
>>>
>>
>> I think it cannot be done by default, as it will violate the JLS. A 
>> class can be initialized only when it's touched by bytecodes.
>>
>> It can also backfire as we may load many classes without initializing 
>> them. E.g., during bytecode verification, we load many classes and 
>> just check that one is a supertype of another.
>>
>> Thanks
>> - Ioi
>>
>>> Cheers,
>>> David
>>> -----
>>>
>>>> Results of " perf stat -r 100 bin/java -Xshare:on 
>>>> -XX:SharedArchiveFile=jdk2.jsa -Xint -version "
>>>>
>>>> 59623970 59341935 (-282035)   -----  41.774  41.591 ( -0.183) -
>>>> 59623495 59331646 (-291849)   -----  41.696  41.165 ( -0.531) --
>>>> 59627148 59329526 (-297622)   -----  41.249  41.094 ( -0.155) -
>>>> 59612439 59340760 (-271679)   ----   41.773  40.657 ( -1.116) -----
>>>> 59626438 59335681 (-290757)   -----  41.683  40.901 ( -0.782) ----
>>>> 59618436 59338953 (-279483)   -----  41.861  41.249 ( -0.612) ---
>>>> 59608782 59340173 (-268609)   ----   41.198  41.508 (  0.310) +
>>>> 59614612 59325177 (-289435)   -----  41.397  41.738 (  0.341) ++
>>>> 59615905 59344006 (-271899)   ----   41.921  40.969 ( -0.952) ----
>>>> 59635867 59333147 (-302720)   -----  41.491  40.836 ( -0.655) ---
>>>> ================================================
>>>> 59620708 59336100 (-284608)   -----  41.604  41.169 ( -0.434) --
>>>> instruction delta =      -284608    -0.4774%
>>>> time        delta =       -0.434 ms -1.0435%
>>>>
>>>> The number of PerfClassTraceTime's used is reduced from 564 to 116 
>>>> (so we have an overhead of about 715 instructions per use, yikes!).
>>