RFR(S) 8246019 PerfClassTraceTime slows down VM start-up
David Holmes
david.holmes at oracle.com
Thu Jun 18 05:16:30 UTC 2020
Hi Ioi,
Re EagerInitialize ... skip to bottom part ...
On 18/06/2020 2:34 pm, Ioi Lam wrote:
>
>
> On 6/17/20 7:38 PM, David Holmes wrote:
>> Hi Ioi,
>>
>> On 17/06/2020 1:19 pm, Ioi Lam wrote:
>>> On 6/16/20 6:20 PM, David Holmes wrote:
>>>> Hi Ioi,
>>>>
>>>> On 17/06/2020 6:14 am, Ioi Lam wrote:
>>>>> https://bugs.openjdk.java.net/browse/JDK-8246019
>>>>> http://cr.openjdk.java.net/~iklam/jdk16/8246019-avoid-PerfClassTraceTime.v01/
>>>>>
>>>>>
>>>>> PerfClassTraceTime is (a rarely used feature) for measuring the
>>>>> time spent during class linking and initialization.
>>>>
>>>> "A special command jcmd <process id/main class> PerfCounter.print
>>>> prints all performance counters in the process."
>>>>
>>>> How do you know this is a "rarely used feature"?
>>> Hi David,
>>>
>>> Sure, the counter will be dumped, but by "rarely used" -- I mean no
>>> one will find this particular counter useful, and no one will be
>>> actively looking at it.
>>>
>>> I changed two parts of the code -- class init and class linking.
>>>
>>> For class initialization, the counter may be useful for people who
>>> want to know how much time is spent in their <clinit> functions, and
>>> my patch doesn't change that. It only avoids using the counter when a
>>> class has no <clinit>, i.e., we know that the counter counts nothing
>>> (except for a logging statement).
>>
>> I can see where you are coming from here. Today we keep track of the
>> time taken to mark a class as initialized when there is no clinit to
>> execute, and we actually record pure timer overhead as it dwarfs the
>> simple update to the class state. With your change we now won't track
>> the time taken to mark the class as initialized. In both cases the
>> time recorded is inaccurate - in opposite senses. In that regard your
>> slight underestimation of the class initalization cost seems better
>> then the present over-estimate.
>>
>>> =====
>>>
>>> For class linking, no user code is executed, so it only measures VM
>>> code. If it's useful for anyone, that would be VM engineers like me
>>> who are trying to optimize the speed of class loading. However, due
>>> to the overhead of the counter vs what it's trying to measure, the
>>> results are pretty meaningless.
>>>
>>> Note that I've not disabled the counter altogether. Instead, I
>>> disable it only when linking a CDS shared class, and we know that
>>> very little is happening for this class (e.g., no verification).
>>
>> Yes I have little concern for this part. Linking is a multi-phase
>> process so "time for linking" is already ill-defined. And the fact you
>> only do it for CDS makes it even less interesting.
>>
>>> I think the class linking timer might have been useful 15 years ago
>>> when it was introduced, or it might be useful today when CDS is
>>> disabled. But with CDS enabled, we are paying a constant price that
>>> seems to benefit no one.
>>>
>>> I think we should short-circuit it when it seems appropriate. If this
>>> indeed causes problems for our users, it's easy to re-enable it.
>>> That's better than just keeping this forever just because we're
>>> afraid to touch anything.
>>
>> I'm uncomfortable with both the "keep forever as we're too afraid to
>> change" and "change it now and restore it if anyone complains" ends of
>> this spectrum. Obviously we need to make progress, but the "change it
>> now and change back later if needed" is a bit naive, as once any
>> change is made we can't change back without affecting another set of
>> users, and we don't know how long it will be before the change reaches
>> users and the problems return to us. From a CSR perspective I want to
>> see that due diligence has been applied with regard to these
>> behavioural changes, and JDK engineers are often not in a position to
>> understand how end users use this kind of information. I don't have a
>> solution for that general problem.
>>
>> In this particular case I think under-estimating the class
>> initialization overhead is better than the present over-estimate.
>> Though anyone tracking the trends here will be surprised when the cost
>> suddenly drops.
>
> Hi David,
>
> I don't have a solution, either. I am in no hurry, and the improvement
> is minor.
>
> I could post a CSR and let it stand for a few months to see if anyone
> objects. My impression is "linking time" is such an esoteric feature
> that no one will care, but I may be wrong. Actually it would be good if
> someone tells me I am wrong -- they probably are experiencing some
> overhead in class loading that we don't currently know about.
>
> As I mentioned earlier, if anyone is using this timer, it would be VM
> engineers who work on class loading. In fact, Yumin fixed 8178349 "Cache
> builtin class loader constraints to avoid re-initializing itable/vtable
> for shared classes" in JDK 15, which significantly reduced the amount of
> time spent during class linking. However, we didn't use this timer for
> measuring the effectiveness of that fix, as the overhead and variability
> are too high. We have reached a point in the class loading code that we
> can only make small incremental improvements, and we can only measure
> the effect of our changes with external profilers such as "perf stat -r
> 200 bin/java -version" that launches the VM 200 times (and repeat that
> 10 times) and averages the elapsed time.
>
> Maybe it's time for the "class linking" timer to go away completely, or
> at least be disabled when CDS is enabled. It's pretty much useless. I
> wish we had an established deprecation process for such legacy features.
>
>
>>
>>>>
>>>> I find it hard to evaluate whether this short-circuiting of the time
>>>> tracing is reasonable or not. Obviously any monitoring mechanism
>>>> should impose minimal overhead compared to what is being measured,
>>>> and these timers fall short in that regard. But if these stats
>>>> become meaningless then they may as well be removed.
>>>>
>>>> I think the serviceability folk (cc'd) need to evaluate this in the
>>>> context of the M&M tools.
>>>>
>>>>> However, it's quite expensive and it needs to start and stop a
>>>>> bunch of timers. With CDS, it's quite often for the overhead of the
>>>>> timer itself to be much more than the time it's trying to measure,
>>>>> giving unreliable measurement.
>>>>>
>>>>> In this patch, when it's clear that the init and linking will be
>>>>> very quick, I disable the timer and count only the number of
>>>>> invocations. This shows a small improvement in start-up
>>>>
>>>> I'm curious if you tried to forcing EagerInitialization to be true
>>>> to see how that improves the baseline. I've always noticed
>>>> eager_init in the code, but hadn't realized it is disabled by default.
>>>>
>>>
>>> I think it cannot be done by default, as it will violate the JLS. A
>>> class can be initialized only when it's touched by bytecodes.
>>
>> I'm also not sure it violates JLS as you can't directly query if a
>> class is initialized, but anyway I wasn't suggesting turning this on
>> by default, I meant only in regard to get a performance baseline to
>> compare against the changes you have implemented here.
>
> I don't think EagerInitialization will make a difference that's related
> to the timer. Both the two instances of the timer will still be used for
> exactly the same number of times, just under different call stacks.
Eager initialization doesn't use any timers.
>>
>> Aside: I have to wonder whether anyone uses EagerInitialization or
>> whether we should get rid of it?
>>
> I wonder about the same thing. I'll ask around and file an RFE if
> appropriate.
>
>>> It can also backfire as we may load many classes without initializing
>>> them. E.g., during bytecode verification, we load many classes and
>>> just check that one is a supertype of another.
>>
>> Not sure what is backfiring here ??
>
> Here's an example that both violates the JLS and backfires as it slows
> down VM start up, because <clinit> can have observable side effects:
>
> class Main {
> static int X;
> public void main(String args[]) {
> System.out.println(X);
> }
> void deadcode() {
> Super s = new Child();
> s.method();
> }
> }
>
> class Super {
> void method() {}
> }
>
> class Child extends Super {
> static {
> for (int i=0; i<1000000; i++) {
> Main.X ++;
> }
> }
> }
>
> When Main is linked, its bytecodes are verified, including deadcode().
> Since deadcode() has an implicit cast of Child to Super, the verifier
> needs to load both Child and Super, and check that Child is indeed a
> subclass of Super.
>
> If EagerInitialization is enabled, Child will be initialized as soon as
> it's entered into the system dictionary. This violates
No it won't. Eager initialization is only applicable when there is no
<clinit> and where the super classes are already initialised:
void InstanceKlass::eager_initialize(Thread *thread) {
if (!EagerInitialization) return;
if (this->is_not_initialized()) {
// abort if the the class has a class initializer
if (this->class_initializer() != NULL) return;
// abort if it is java.lang.Object (initialization is handled in
genesis)
Klass* super_klass = super();
if (super_klass == NULL) return;
// abort if the super class should be initialized
if (!InstanceKlass::cast(super_klass)->is_initialized()) return;
// call body to expose the this pointer
eager_initialize_impl();
}
}
David
-----
> https://docs.oracle.com/javase/specs/jls/se14/html/jls-12.html#jls-12.4.1
>
> A class or interface type T will be initialized immediately before
> the first occurrence of any one of the following:
>
> T is a class and an instance of T is created.
> A static method declared by T is invoked.
> A static field declared by T is assigned.
> A static field declared by T is used and the field is not a constant
> variable (§4.12.4).
> ....
> A class or interface will not be initialized under any other
> circumstance.
>
>
> Also, app developers will generally expect 0 to be printed. So
> EagerInitialization will probably break apps in subtle ways.
>
> And initializing Child may recursively load in more classes during the
> verification of Child ......
>
> Thanks
> - Ioi
>
>>
>> Thanks,
>> David
>> -----
>>
>>> Thanks
>>> - Ioi
>>>
>>>> Cheers,
>>>> David
>>>> -----
>>>>
>>>>> Results of " perf stat -r 100 bin/java -Xshare:on
>>>>> -XX:SharedArchiveFile=jdk2.jsa -Xint -version "
>>>>>
>>>>> 59623970 59341935 (-282035) ----- 41.774 41.591 ( -0.183) -
>>>>> 59623495 59331646 (-291849) ----- 41.696 41.165 ( -0.531) --
>>>>> 59627148 59329526 (-297622) ----- 41.249 41.094 ( -0.155) -
>>>>> 59612439 59340760 (-271679) ---- 41.773 40.657 ( -1.116) -----
>>>>> 59626438 59335681 (-290757) ----- 41.683 40.901 ( -0.782) ----
>>>>> 59618436 59338953 (-279483) ----- 41.861 41.249 ( -0.612) ---
>>>>> 59608782 59340173 (-268609) ---- 41.198 41.508 ( 0.310) +
>>>>> 59614612 59325177 (-289435) ----- 41.397 41.738 ( 0.341) ++
>>>>> 59615905 59344006 (-271899) ---- 41.921 40.969 ( -0.952) ----
>>>>> 59635867 59333147 (-302720) ----- 41.491 40.836 ( -0.655) ---
>>>>> ================================================
>>>>> 59620708 59336100 (-284608) ----- 41.604 41.169 ( -0.434) --
>>>>> instruction delta = -284608 -0.4774%
>>>>> time delta = -0.434 ms -1.0435%
>>>>>
>>>>> The number of PerfClassTraceTime's used is reduced from 564 to 116
>>>>> (so we have an overhead of about 715 instructions per use, yikes!).
>>>
>
More information about the serviceability-dev
mailing list