RFR(S) 8246019 PerfClassTraceTime slows down VM start-up

Ioi Lam ioi.lam at oracle.com
Thu Jun 18 05:57:21 UTC 2020



On 6/17/20 10:16 PM, David Holmes wrote:
> Hi Ioi,
>
> Re EagerInitialize ... skip to bottom part ...
>
> On 18/06/2020 2:34 pm, Ioi Lam wrote:
>>
>>
>> On 6/17/20 7:38 PM, David Holmes wrote:
>>> Hi Ioi,
>>>
>>> On 17/06/2020 1:19 pm, Ioi Lam wrote:
>>>> On 6/16/20 6:20 PM, David Holmes wrote:
>>>>> Hi Ioi,
>>>>>
>>>>> On 17/06/2020 6:14 am, Ioi Lam wrote:
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8246019
>>>>>> http://cr.openjdk.java.net/~iklam/jdk16/8246019-avoid-PerfClassTraceTime.v01/ 
>>>>>>
>>>>>>
>>>>>> PerfClassTraceTime is (a rarely used feature) for measuring the 
>>>>>> time spent during class linking and initialization.
>>>>>
>>>>> "A special command jcmd <process id/main class> PerfCounter.print 
>>>>> prints all performance counters in the process."
>>>>>
>>>>> How do you know this is a "rarely used feature"?
>>>> Hi David,
>>>>
>>>> Sure, the counter will be dumped, but by "rarely used" -- I mean no 
>>>> one will find this particular counter useful, and no one will be 
>>>> actively looking at it.
>>>>
>>>> I changed two parts of the code -- class init and class linking.
>>>>
>>>> For class initialization, the counter may be useful for people who 
>>>> want to know how much time is spent in their <clinit> functions, 
>>>> and my patch doesn't change that. It only avoids using the counter 
>>>> when a class has no <clinit>, i.e., we know that the counter counts 
>>>> nothing (except for a logging statement).
>>>
>>> I can see where you are coming from here. Today we keep track of the 
>>> time taken to mark a class as initialized when there is no clinit to 
>>> execute, and we actually record pure timer overhead as it dwarfs the 
>>> simple update to the class state. With your change we now won't 
>>> track the time taken to mark the class as initialized. In both cases 
>>> the time recorded is inaccurate - in opposite senses. In that regard 
>>> your slight underestimation of the class initalization cost seems 
>>> better then the present over-estimate.
>>>
>>>> =====
>>>>
>>>> For class linking, no user code is executed, so it only measures VM 
>>>> code. If it's useful for anyone, that would be VM engineers like me 
>>>> who are trying to optimize the speed of class loading. However, due 
>>>> to the overhead of the counter vs what it's trying to measure, the 
>>>> results are pretty meaningless.
>>>>
>>>> Note that I've not disabled the counter altogether. Instead, I 
>>>> disable it only when linking a CDS shared class, and we know that 
>>>> very little is happening for this class (e.g., no verification).
>>>
>>> Yes I have little concern for this part. Linking is a multi-phase 
>>> process so "time for linking" is already ill-defined. And the fact 
>>> you only do it for CDS makes it even less interesting.
>>>
>>>> I think the class linking timer might have been useful 15 years ago 
>>>> when it was introduced, or it might be useful today when CDS is 
>>>> disabled. But with CDS enabled, we are paying a constant price that 
>>>> seems to benefit no one.
>>>>
>>>> I think we should short-circuit it when it seems appropriate. If 
>>>> this indeed causes problems for our users, it's easy to re-enable 
>>>> it. That's better than just keeping this forever just because we're 
>>>> afraid to touch anything.
>>>
>>> I'm uncomfortable with both the "keep forever as we're too afraid to 
>>> change" and "change it now and restore it if anyone complains" ends 
>>> of this spectrum. Obviously we need to make progress, but the 
>>> "change it now and change back later if needed" is a bit naive, as 
>>> once any change is made we can't change back without affecting 
>>> another set of users, and we don't know how long it will be before 
>>> the change reaches users and the problems return to us. From a CSR 
>>> perspective I want to see that due diligence has been applied with 
>>> regard to these behavioural changes, and JDK engineers are often not 
>>> in a position to understand how end users use this kind of 
>>> information. I don't have a solution for that general problem.
>>>
>>> In this particular case I think under-estimating the class 
>>> initialization overhead is better than the present over-estimate. 
>>> Though anyone tracking the trends here will be surprised when the 
>>> cost suddenly drops.
>>
>> Hi David,
>>
>> I don't have a solution, either. I am in no hurry, and the 
>> improvement is minor.
>>
>> I could post a CSR and let it stand for a few months to see if anyone 
>> objects. My impression is "linking time" is such an esoteric feature 
>> that no one will care, but I may be wrong. Actually it would be good 
>> if someone tells me I am wrong -- they probably are experiencing some 
>> overhead in class loading that we don't currently know about.
>>
>> As I mentioned earlier, if anyone is using this timer, it would be VM 
>> engineers who work on class loading. In fact, Yumin fixed 8178349 
>> "Cache builtin class loader constraints to avoid re-initializing 
>> itable/vtable for shared classes" in JDK 15, which significantly 
>> reduced the amount of time spent during class linking. However, we 
>> didn't use this timer for measuring the effectiveness of that fix, as 
>> the overhead and variability are too high. We have reached a point in 
>> the class loading code that we can only make small incremental 
>> improvements, and we can only measure the effect of our changes with 
>> external profilers such as "perf stat -r 200 bin/java -version" that 
>> launches the VM 200 times (and repeat that 10 times) and averages the 
>> elapsed time.
>>
>> Maybe it's time for the "class linking" timer to go away completely, 
>> or at least be disabled when CDS is enabled. It's pretty much 
>> useless. I wish we had an established deprecation process for such 
>> legacy features.
>>
>>
>>>
>>>>>
>>>>> I find it hard to evaluate whether this short-circuiting of the 
>>>>> time tracing is reasonable or not. Obviously any monitoring 
>>>>> mechanism should impose minimal overhead compared to what is being 
>>>>> measured, and these timers fall short in that regard. But if these 
>>>>> stats become meaningless then they may as well be removed.
>>>>>
>>>>> I think the serviceability folk (cc'd) need to evaluate this in 
>>>>> the context of the M&M tools.
>>>>>
>>>>>> However, it's quite expensive and it needs to start and stop a 
>>>>>> bunch of timers. With CDS, it's quite often for the overhead of 
>>>>>> the timer itself to be much more than the time it's trying to 
>>>>>> measure, giving unreliable measurement.
>>>>>>
>>>>>> In this patch, when it's clear that the init and linking will be 
>>>>>> very quick, I disable the timer and count only the number of 
>>>>>> invocations. This shows a small improvement in start-up
>>>>>
>>>>> I'm curious if you tried to forcing EagerInitialization to be true 
>>>>> to see how that improves the baseline. I've always noticed 
>>>>> eager_init in the code, but hadn't realized it is disabled by 
>>>>> default.
>>>>>
>>>>
>>>> I think it cannot be done by default, as it will violate the JLS. A 
>>>> class can be initialized only when it's touched by bytecodes.
>>>
>>> I'm also not sure it violates JLS as you can't directly query if a 
>>> class is initialized, but anyway I wasn't suggesting turning this on 
>>> by default, I meant only in regard to get a performance baseline to 
>>> compare against the changes you have implemented here.
>>
>> I don't think EagerInitialization will make a difference that's 
>> related to the timer. Both the two instances of the timer will still 
>> be used for exactly the same number of times, just under different 
>> call stacks.
>
> Eager initialization doesn't use any timers.
>
>>>
>>> Aside: I have to wonder whether anyone uses EagerInitialization or 
>>> whether we should get rid of it?
>>>
>> I wonder about the same thing. I'll ask around and file an RFE if 
>> appropriate.
>>
>>>> It can also backfire as we may load many classes without 
>>>> initializing them. E.g., during bytecode verification, we load many 
>>>> classes and just check that one is a supertype of another.
>>>
>>> Not sure what is backfiring here ??
>>
>> Here's an example that both violates the JLS and backfires as it 
>> slows down VM start up, because <clinit> can have observable side 
>> effects:
>>
>> class Main {
>>      static int X;
>>      public void main(String args[]) {
>>          System.out.println(X);
>>      }
>>      void deadcode() {
>>          Super s = new Child();
>>          s.method();
>>      }
>> }
>>
>> class Super {
>>      void method() {}
>> }
>>
>> class Child extends Super {
>>      static {
>>          for (int i=0; i<1000000; i++) {
>>              Main.X ++;
>>          }
>>      }
>> }
>>
>> When Main is linked, its bytecodes are verified, including 
>> deadcode(). Since deadcode() has an implicit cast of Child to Super, 
>> the verifier needs to load both Child and Super, and check that Child 
>> is indeed a subclass of Super.
>>
>> If EagerInitialization is enabled, Child will be initialized as soon 
>> as it's entered into the system dictionary. This violates 
>
> No it won't. Eager initialization is only applicable when there is no 
> <clinit> and where the super classes are already initialised:
>
> void InstanceKlass::eager_initialize(Thread *thread) {
>   if (!EagerInitialization) return;
>
>   if (this->is_not_initialized()) {
>     // abort if the the class has a class initializer
>     if (this->class_initializer() != NULL) return;
>
>     // abort if it is java.lang.Object (initialization is handled in 
> genesis)
>     Klass* super_klass = super();
>     if (super_klass == NULL) return;
>
>     // abort if the super class should be initialized
>     if (!InstanceKlass::cast(super_klass)->is_initialized()) return;
>
>     // call body to expose the this pointer
>     eager_initialize_impl();
>   }
> }
>
> David
> -----
>
Ah, I didn't see that. I should have read the code first :-)

Anyway, Eager initialization will trigger verification which may cause 
more classes to be loaded (and initialized if they don't have <clinit>, 
and so forth). So it's not equivalent to, and is hard to compare with, 
my proposed changes for skipping the timer.

Thanks
- Ioi


>> https://docs.oracle.com/javase/specs/jls/se14/html/jls-12.html#jls-12.4.1 
>>
>>
>>     A class or interface type T will be initialized immediately before
>>     the first occurrence of any one of the following:
>>
>>     T is a class and an instance of T is created.
>>     A static method declared by T is invoked.
>>     A static field declared by T is assigned.
>>     A static field declared by T is used and the field is not a constant
>>     variable (§4.12.4).
>>     ....
>>     A class or interface will not be initialized under any other
>>     circumstance.
>>
>>
>> Also, app developers will generally expect 0 to be printed. So 
>> EagerInitialization will probably break apps in subtle ways.
>>
>> And initializing Child may recursively load in more classes during 
>> the verification of Child ......
>>
>> Thanks
>> - Ioi
>>
>>>
>>> Thanks,
>>> David
>>> -----
>>>
>>>> Thanks
>>>> - Ioi
>>>>
>>>>> Cheers,
>>>>> David
>>>>> -----
>>>>>
>>>>>> Results of " perf stat -r 100 bin/java -Xshare:on 
>>>>>> -XX:SharedArchiveFile=jdk2.jsa -Xint -version "
>>>>>>
>>>>>> 59623970 59341935 (-282035)   -----  41.774  41.591 ( -0.183) -
>>>>>> 59623495 59331646 (-291849)   -----  41.696  41.165 ( -0.531) --
>>>>>> 59627148 59329526 (-297622)   -----  41.249  41.094 ( -0.155) -
>>>>>> 59612439 59340760 (-271679)   ----   41.773  40.657 ( -1.116) -----
>>>>>> 59626438 59335681 (-290757)   -----  41.683  40.901 ( -0.782) ----
>>>>>> 59618436 59338953 (-279483)   -----  41.861  41.249 ( -0.612) ---
>>>>>> 59608782 59340173 (-268609)   ----   41.198  41.508 ( 0.310) +
>>>>>> 59614612 59325177 (-289435)   -----  41.397  41.738 ( 0.341) ++
>>>>>> 59615905 59344006 (-271899)   ----   41.921  40.969 ( -0.952) ----
>>>>>> 59635867 59333147 (-302720)   -----  41.491  40.836 ( -0.655) ---
>>>>>> ================================================
>>>>>> 59620708 59336100 (-284608)   -----  41.604  41.169 ( -0.434) --
>>>>>> instruction delta =      -284608    -0.4774%
>>>>>> time        delta =       -0.434 ms -1.0435%
>>>>>>
>>>>>> The number of PerfClassTraceTime's used is reduced from 564 to 
>>>>>> 116 (so we have an overhead of about 715 instructions per use, 
>>>>>> yikes!).
>>>>
>>



More information about the hotspot-runtime-dev mailing list