RFR(S) 8246019 PerfClassTraceTime slows down VM start-up

David Holmes david.holmes at oracle.com
Thu Jun 18 05:16:30 UTC 2020


Hi Ioi,

Re EagerInitialize ... skip to bottom part ...

On 18/06/2020 2:34 pm, Ioi Lam wrote:
> 
> 
> On 6/17/20 7:38 PM, David Holmes wrote:
>> Hi Ioi,
>>
>> On 17/06/2020 1:19 pm, Ioi Lam wrote:
>>> On 6/16/20 6:20 PM, David Holmes wrote:
>>>> Hi Ioi,
>>>>
>>>> On 17/06/2020 6:14 am, Ioi Lam wrote:
>>>>> https://bugs.openjdk.java.net/browse/JDK-8246019
>>>>> http://cr.openjdk.java.net/~iklam/jdk16/8246019-avoid-PerfClassTraceTime.v01/ 
>>>>>
>>>>>
>>>>> PerfClassTraceTime is (a rarely used feature) for measuring the 
>>>>> time spent during class linking and initialization.
>>>>
>>>> "A special command jcmd <process id/main class> PerfCounter.print 
>>>> prints all performance counters in the process."
>>>>
>>>> How do you know this is a "rarely used feature"?
>>> Hi David,
>>>
>>> Sure, the counter will be dumped, but by "rarely used" -- I mean no 
>>> one will find this particular counter useful, and no one will be 
>>> actively looking at it.
>>>
>>> I changed two parts of the code -- class init and class linking.
>>>
>>> For class initialization, the counter may be useful for people who 
>>> want to know how much time is spent in their <clinit> functions, and 
>>> my patch doesn't change that. It only avoids using the counter when a 
>>> class has no <clinit>, i.e., we know that the counter counts nothing 
>>> (except for a logging statement).
>>
>> I can see where you are coming from here. Today we keep track of the 
>> time taken to mark a class as initialized when there is no clinit to 
>> execute, and we actually record pure timer overhead as it dwarfs the 
>> simple update to the class state. With your change we now won't track 
>> the time taken to mark the class as initialized. In both cases the 
>> time recorded is inaccurate - in opposite senses. In that regard your 
>> slight underestimation of the class initalization cost seems better 
>> then the present over-estimate.
>>
>>> =====
>>>
>>> For class linking, no user code is executed, so it only measures VM 
>>> code. If it's useful for anyone, that would be VM engineers like me 
>>> who are trying to optimize the speed of class loading. However, due 
>>> to the overhead of the counter vs what it's trying to measure, the 
>>> results are pretty meaningless.
>>>
>>> Note that I've not disabled the counter altogether. Instead, I 
>>> disable it only when linking a CDS shared class, and we know that 
>>> very little is happening for this class (e.g., no verification).
>>
>> Yes I have little concern for this part. Linking is a multi-phase 
>> process so "time for linking" is already ill-defined. And the fact you 
>> only do it for CDS makes it even less interesting.
>>
>>> I think the class linking timer might have been useful 15 years ago 
>>> when it was introduced, or it might be useful today when CDS is 
>>> disabled. But with CDS enabled, we are paying a constant price that 
>>> seems to benefit no one.
>>>
>>> I think we should short-circuit it when it seems appropriate. If this 
>>> indeed causes problems for our users, it's easy to re-enable it. 
>>> That's better than just keeping this forever just because we're 
>>> afraid to touch anything.
>>
>> I'm uncomfortable with both the "keep forever as we're too afraid to 
>> change" and "change it now and restore it if anyone complains" ends of 
>> this spectrum. Obviously we need to make progress, but the "change it 
>> now and change back later if needed" is a bit naive, as once any 
>> change is made we can't change back without affecting another set of 
>> users, and we don't know how long it will be before the change reaches 
>> users and the problems return to us. From a CSR perspective I want to 
>> see that due diligence has been applied with regard to these 
>> behavioural changes, and JDK engineers are often not in a position to 
>> understand how end users use this kind of information. I don't have a 
>> solution for that general problem.
>>
>> In this particular case I think under-estimating the class 
>> initialization overhead is better than the present over-estimate. 
>> Though anyone tracking the trends here will be surprised when the cost 
>> suddenly drops.
> 
> Hi David,
> 
> I don't have a solution, either. I am in no hurry, and the improvement 
> is minor.
> 
> I could post a CSR and let it stand for a few months to see if anyone 
> objects. My impression is "linking time" is such an esoteric feature 
> that no one will care, but I may be wrong. Actually it would be good if 
> someone tells me I am wrong -- they probably are experiencing some 
> overhead in class loading that we don't currently know about.
> 
> As I mentioned earlier, if anyone is using this timer, it would be VM 
> engineers who work on class loading. In fact, Yumin fixed 8178349 "Cache 
> builtin class loader constraints to avoid re-initializing itable/vtable 
> for shared classes" in JDK 15, which significantly reduced the amount of 
> time spent during class linking. However, we didn't use this timer for 
> measuring the effectiveness of that fix, as the overhead and variability 
> are too high. We have reached a point in the class loading code that we 
> can only make small incremental improvements, and we can only measure 
> the effect of our changes with external profilers such as "perf stat -r 
> 200 bin/java -version" that launches the VM 200 times (and repeat that 
> 10 times) and averages the elapsed time.
> 
> Maybe it's time for the "class linking" timer to go away completely, or 
> at least be disabled when CDS is enabled. It's pretty much useless. I 
> wish we had an established deprecation process for such legacy features.
> 
> 
>>
>>>>
>>>> I find it hard to evaluate whether this short-circuiting of the time 
>>>> tracing is reasonable or not. Obviously any monitoring mechanism 
>>>> should impose minimal overhead compared to what is being measured, 
>>>> and these timers fall short in that regard. But if these stats 
>>>> become meaningless then they may as well be removed.
>>>>
>>>> I think the serviceability folk (cc'd) need to evaluate this in the 
>>>> context of the M&M tools.
>>>>
>>>>> However, it's quite expensive and it needs to start and stop a 
>>>>> bunch of timers. With CDS, it's quite often for the overhead of the 
>>>>> timer itself to be much more than the time it's trying to measure, 
>>>>> giving unreliable measurement.
>>>>>
>>>>> In this patch, when it's clear that the init and linking will be 
>>>>> very quick, I disable the timer and count only the number of 
>>>>> invocations. This shows a small improvement in start-up
>>>>
>>>> I'm curious if you tried to forcing EagerInitialization to be true 
>>>> to see how that improves the baseline. I've always noticed 
>>>> eager_init in the code, but hadn't realized it is disabled by default.
>>>>
>>>
>>> I think it cannot be done by default, as it will violate the JLS. A 
>>> class can be initialized only when it's touched by bytecodes.
>>
>> I'm also not sure it violates JLS as you can't directly query if a 
>> class is initialized, but anyway I wasn't suggesting turning this on 
>> by default, I meant only in regard to get a performance baseline to 
>> compare against the changes you have implemented here.
> 
> I don't think EagerInitialization will make a difference that's related 
> to the timer. Both the two instances of the timer will still be used for 
> exactly the same number of times, just under different call stacks.

Eager initialization doesn't use any timers.

>>
>> Aside: I have to wonder whether anyone uses EagerInitialization or 
>> whether we should get rid of it?
>>
> I wonder about the same thing. I'll ask around and file an RFE if 
> appropriate.
> 
>>> It can also backfire as we may load many classes without initializing 
>>> them. E.g., during bytecode verification, we load many classes and 
>>> just check that one is a supertype of another.
>>
>> Not sure what is backfiring here ??
> 
> Here's an example that both violates the JLS and backfires as it slows 
> down VM start up, because <clinit> can have observable side effects:
> 
> class Main {
>      static int X;
>      public void main(String args[]) {
>          System.out.println(X);
>      }
>      void deadcode() {
>          Super s = new Child();
>          s.method();
>      }
> }
> 
> class Super {
>      void method() {}
> }
> 
> class Child extends Super {
>      static {
>          for (int i=0; i<1000000; i++) {
>              Main.X ++;
>          }
>      }
> }
> 
> When Main is linked, its bytecodes are verified, including deadcode(). 
> Since deadcode() has an implicit cast of Child to Super, the verifier 
> needs to load both Child and Super, and check that Child is indeed a 
> subclass of Super.
> 
> If EagerInitialization is enabled, Child will be initialized as soon as 
> it's entered into the system dictionary. This violates 

No it won't. Eager initialization is only applicable when there is no 
<clinit> and where the super classes are already initialised:

void InstanceKlass::eager_initialize(Thread *thread) {
   if (!EagerInitialization) return;

   if (this->is_not_initialized()) {
     // abort if the the class has a class initializer
     if (this->class_initializer() != NULL) return;

     // abort if it is java.lang.Object (initialization is handled in 
genesis)
     Klass* super_klass = super();
     if (super_klass == NULL) return;

     // abort if the super class should be initialized
     if (!InstanceKlass::cast(super_klass)->is_initialized()) return;

     // call body to expose the this pointer
     eager_initialize_impl();
   }
}

David
-----


> https://docs.oracle.com/javase/specs/jls/se14/html/jls-12.html#jls-12.4.1
> 
>     A class or interface type T will be initialized immediately before
>     the first occurrence of any one of the following:
> 
>     T is a class and an instance of T is created.
>     A static method declared by T is invoked.
>     A static field declared by T is assigned.
>     A static field declared by T is used and the field is not a constant
>     variable (§4.12.4).
>     ....
>     A class or interface will not be initialized under any other
>     circumstance.
> 
> 
> Also, app developers will generally expect 0 to be printed. So 
> EagerInitialization will probably break apps in subtle ways.
> 
> And initializing Child may recursively load in more classes during the 
> verification of Child ......
> 
> Thanks
> - Ioi
> 
>>
>> Thanks,
>> David
>> -----
>>
>>> Thanks
>>> - Ioi
>>>
>>>> Cheers,
>>>> David
>>>> -----
>>>>
>>>>> Results of " perf stat -r 100 bin/java -Xshare:on 
>>>>> -XX:SharedArchiveFile=jdk2.jsa -Xint -version "
>>>>>
>>>>> 59623970 59341935 (-282035)   -----  41.774  41.591 ( -0.183) -
>>>>> 59623495 59331646 (-291849)   -----  41.696  41.165 ( -0.531) --
>>>>> 59627148 59329526 (-297622)   -----  41.249  41.094 ( -0.155) -
>>>>> 59612439 59340760 (-271679)   ----   41.773  40.657 ( -1.116) -----
>>>>> 59626438 59335681 (-290757)   -----  41.683  40.901 ( -0.782) ----
>>>>> 59618436 59338953 (-279483)   -----  41.861  41.249 ( -0.612) ---
>>>>> 59608782 59340173 (-268609)   ----   41.198  41.508 ( 0.310) +
>>>>> 59614612 59325177 (-289435)   -----  41.397  41.738 ( 0.341) ++
>>>>> 59615905 59344006 (-271899)   ----   41.921  40.969 ( -0.952) ----
>>>>> 59635867 59333147 (-302720)   -----  41.491  40.836 ( -0.655) ---
>>>>> ================================================
>>>>> 59620708 59336100 (-284608)   -----  41.604  41.169 ( -0.434) --
>>>>> instruction delta =      -284608    -0.4774%
>>>>> time        delta =       -0.434 ms -1.0435%
>>>>>
>>>>> The number of PerfClassTraceTime's used is reduced from 564 to 116 
>>>>> (so we have an overhead of about 715 instructions per use, yikes!).
>>>
> 


More information about the serviceability-dev mailing list