RFR(S) 8246019 PerfClassTraceTime slows down VM start-up
Ioi Lam
ioi.lam at oracle.com
Thu Jun 18 05:57:21 UTC 2020
On 6/17/20 10:16 PM, David Holmes wrote:
> Hi Ioi,
>
> Re EagerInitialize ... skip to bottom part ...
>
> On 18/06/2020 2:34 pm, Ioi Lam wrote:
>>
>>
>> On 6/17/20 7:38 PM, David Holmes wrote:
>>> Hi Ioi,
>>>
>>> On 17/06/2020 1:19 pm, Ioi Lam wrote:
>>>> On 6/16/20 6:20 PM, David Holmes wrote:
>>>>> Hi Ioi,
>>>>>
>>>>> On 17/06/2020 6:14 am, Ioi Lam wrote:
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8246019
>>>>>> http://cr.openjdk.java.net/~iklam/jdk16/8246019-avoid-PerfClassTraceTime.v01/
>>>>>>
>>>>>>
>>>>>> PerfClassTraceTime is (a rarely used feature) for measuring the
>>>>>> time spent during class linking and initialization.
>>>>>
>>>>> "A special command jcmd <process id/main class> PerfCounter.print
>>>>> prints all performance counters in the process."
>>>>>
>>>>> How do you know this is a "rarely used feature"?
>>>> Hi David,
>>>>
>>>> Sure, the counter will be dumped, but by "rarely used" -- I mean no
>>>> one will find this particular counter useful, and no one will be
>>>> actively looking at it.
>>>>
>>>> I changed two parts of the code -- class init and class linking.
>>>>
>>>> For class initialization, the counter may be useful for people who
>>>> want to know how much time is spent in their <clinit> functions,
>>>> and my patch doesn't change that. It only avoids using the counter
>>>> when a class has no <clinit>, i.e., we know that the counter counts
>>>> nothing (except for a logging statement).
>>>
>>> I can see where you are coming from here. Today we keep track of the
>>> time taken to mark a class as initialized when there is no clinit to
>>> execute, and we actually record pure timer overhead as it dwarfs the
>>> simple update to the class state. With your change we now won't
>>> track the time taken to mark the class as initialized. In both cases
>>> the time recorded is inaccurate - in opposite senses. In that regard
>>> your slight underestimation of the class initalization cost seems
>>> better then the present over-estimate.
>>>
>>>> =====
>>>>
>>>> For class linking, no user code is executed, so it only measures VM
>>>> code. If it's useful for anyone, that would be VM engineers like me
>>>> who are trying to optimize the speed of class loading. However, due
>>>> to the overhead of the counter vs what it's trying to measure, the
>>>> results are pretty meaningless.
>>>>
>>>> Note that I've not disabled the counter altogether. Instead, I
>>>> disable it only when linking a CDS shared class, and we know that
>>>> very little is happening for this class (e.g., no verification).
>>>
>>> Yes I have little concern for this part. Linking is a multi-phase
>>> process so "time for linking" is already ill-defined. And the fact
>>> you only do it for CDS makes it even less interesting.
>>>
>>>> I think the class linking timer might have been useful 15 years ago
>>>> when it was introduced, or it might be useful today when CDS is
>>>> disabled. But with CDS enabled, we are paying a constant price that
>>>> seems to benefit no one.
>>>>
>>>> I think we should short-circuit it when it seems appropriate. If
>>>> this indeed causes problems for our users, it's easy to re-enable
>>>> it. That's better than just keeping this forever just because we're
>>>> afraid to touch anything.
>>>
>>> I'm uncomfortable with both the "keep forever as we're too afraid to
>>> change" and "change it now and restore it if anyone complains" ends
>>> of this spectrum. Obviously we need to make progress, but the
>>> "change it now and change back later if needed" is a bit naive, as
>>> once any change is made we can't change back without affecting
>>> another set of users, and we don't know how long it will be before
>>> the change reaches users and the problems return to us. From a CSR
>>> perspective I want to see that due diligence has been applied with
>>> regard to these behavioural changes, and JDK engineers are often not
>>> in a position to understand how end users use this kind of
>>> information. I don't have a solution for that general problem.
>>>
>>> In this particular case I think under-estimating the class
>>> initialization overhead is better than the present over-estimate.
>>> Though anyone tracking the trends here will be surprised when the
>>> cost suddenly drops.
>>
>> Hi David,
>>
>> I don't have a solution, either. I am in no hurry, and the
>> improvement is minor.
>>
>> I could post a CSR and let it stand for a few months to see if anyone
>> objects. My impression is "linking time" is such an esoteric feature
>> that no one will care, but I may be wrong. Actually it would be good
>> if someone tells me I am wrong -- they probably are experiencing some
>> overhead in class loading that we don't currently know about.
>>
>> As I mentioned earlier, if anyone is using this timer, it would be VM
>> engineers who work on class loading. In fact, Yumin fixed 8178349
>> "Cache builtin class loader constraints to avoid re-initializing
>> itable/vtable for shared classes" in JDK 15, which significantly
>> reduced the amount of time spent during class linking. However, we
>> didn't use this timer for measuring the effectiveness of that fix, as
>> the overhead and variability are too high. We have reached a point in
>> the class loading code that we can only make small incremental
>> improvements, and we can only measure the effect of our changes with
>> external profilers such as "perf stat -r 200 bin/java -version" that
>> launches the VM 200 times (and repeat that 10 times) and averages the
>> elapsed time.
>>
>> Maybe it's time for the "class linking" timer to go away completely,
>> or at least be disabled when CDS is enabled. It's pretty much
>> useless. I wish we had an established deprecation process for such
>> legacy features.
>>
>>
>>>
>>>>>
>>>>> I find it hard to evaluate whether this short-circuiting of the
>>>>> time tracing is reasonable or not. Obviously any monitoring
>>>>> mechanism should impose minimal overhead compared to what is being
>>>>> measured, and these timers fall short in that regard. But if these
>>>>> stats become meaningless then they may as well be removed.
>>>>>
>>>>> I think the serviceability folk (cc'd) need to evaluate this in
>>>>> the context of the M&M tools.
>>>>>
>>>>>> However, it's quite expensive and it needs to start and stop a
>>>>>> bunch of timers. With CDS, it's quite often for the overhead of
>>>>>> the timer itself to be much more than the time it's trying to
>>>>>> measure, giving unreliable measurement.
>>>>>>
>>>>>> In this patch, when it's clear that the init and linking will be
>>>>>> very quick, I disable the timer and count only the number of
>>>>>> invocations. This shows a small improvement in start-up
>>>>>
>>>>> I'm curious if you tried to forcing EagerInitialization to be true
>>>>> to see how that improves the baseline. I've always noticed
>>>>> eager_init in the code, but hadn't realized it is disabled by
>>>>> default.
>>>>>
>>>>
>>>> I think it cannot be done by default, as it will violate the JLS. A
>>>> class can be initialized only when it's touched by bytecodes.
>>>
>>> I'm also not sure it violates JLS as you can't directly query if a
>>> class is initialized, but anyway I wasn't suggesting turning this on
>>> by default, I meant only in regard to get a performance baseline to
>>> compare against the changes you have implemented here.
>>
>> I don't think EagerInitialization will make a difference that's
>> related to the timer. Both the two instances of the timer will still
>> be used for exactly the same number of times, just under different
>> call stacks.
>
> Eager initialization doesn't use any timers.
>
>>>
>>> Aside: I have to wonder whether anyone uses EagerInitialization or
>>> whether we should get rid of it?
>>>
>> I wonder about the same thing. I'll ask around and file an RFE if
>> appropriate.
>>
>>>> It can also backfire as we may load many classes without
>>>> initializing them. E.g., during bytecode verification, we load many
>>>> classes and just check that one is a supertype of another.
>>>
>>> Not sure what is backfiring here ??
>>
>> Here's an example that both violates the JLS and backfires as it
>> slows down VM start up, because <clinit> can have observable side
>> effects:
>>
>> class Main {
>> static int X;
>> public void main(String args[]) {
>> System.out.println(X);
>> }
>> void deadcode() {
>> Super s = new Child();
>> s.method();
>> }
>> }
>>
>> class Super {
>> void method() {}
>> }
>>
>> class Child extends Super {
>> static {
>> for (int i=0; i<1000000; i++) {
>> Main.X ++;
>> }
>> }
>> }
>>
>> When Main is linked, its bytecodes are verified, including
>> deadcode(). Since deadcode() has an implicit cast of Child to Super,
>> the verifier needs to load both Child and Super, and check that Child
>> is indeed a subclass of Super.
>>
>> If EagerInitialization is enabled, Child will be initialized as soon
>> as it's entered into the system dictionary. This violates
>
> No it won't. Eager initialization is only applicable when there is no
> <clinit> and where the super classes are already initialised:
>
> void InstanceKlass::eager_initialize(Thread *thread) {
> if (!EagerInitialization) return;
>
> if (this->is_not_initialized()) {
> // abort if the the class has a class initializer
> if (this->class_initializer() != NULL) return;
>
> // abort if it is java.lang.Object (initialization is handled in
> genesis)
> Klass* super_klass = super();
> if (super_klass == NULL) return;
>
> // abort if the super class should be initialized
> if (!InstanceKlass::cast(super_klass)->is_initialized()) return;
>
> // call body to expose the this pointer
> eager_initialize_impl();
> }
> }
>
> David
> -----
>
Ah, I didn't see that. I should have read the code first :-)
Anyway, Eager initialization will trigger verification which may cause
more classes to be loaded (and initialized if they don't have <clinit>,
and so forth). So it's not equivalent to, and is hard to compare with,
my proposed changes for skipping the timer.
Thanks
- Ioi
>> https://docs.oracle.com/javase/specs/jls/se14/html/jls-12.html#jls-12.4.1
>>
>>
>> A class or interface type T will be initialized immediately before
>> the first occurrence of any one of the following:
>>
>> T is a class and an instance of T is created.
>> A static method declared by T is invoked.
>> A static field declared by T is assigned.
>> A static field declared by T is used and the field is not a constant
>> variable (§4.12.4).
>> ....
>> A class or interface will not be initialized under any other
>> circumstance.
>>
>>
>> Also, app developers will generally expect 0 to be printed. So
>> EagerInitialization will probably break apps in subtle ways.
>>
>> And initializing Child may recursively load in more classes during
>> the verification of Child ......
>>
>> Thanks
>> - Ioi
>>
>>>
>>> Thanks,
>>> David
>>> -----
>>>
>>>> Thanks
>>>> - Ioi
>>>>
>>>>> Cheers,
>>>>> David
>>>>> -----
>>>>>
>>>>>> Results of " perf stat -r 100 bin/java -Xshare:on
>>>>>> -XX:SharedArchiveFile=jdk2.jsa -Xint -version "
>>>>>>
>>>>>> 59623970 59341935 (-282035) ----- 41.774 41.591 ( -0.183) -
>>>>>> 59623495 59331646 (-291849) ----- 41.696 41.165 ( -0.531) --
>>>>>> 59627148 59329526 (-297622) ----- 41.249 41.094 ( -0.155) -
>>>>>> 59612439 59340760 (-271679) ---- 41.773 40.657 ( -1.116) -----
>>>>>> 59626438 59335681 (-290757) ----- 41.683 40.901 ( -0.782) ----
>>>>>> 59618436 59338953 (-279483) ----- 41.861 41.249 ( -0.612) ---
>>>>>> 59608782 59340173 (-268609) ---- 41.198 41.508 ( 0.310) +
>>>>>> 59614612 59325177 (-289435) ----- 41.397 41.738 ( 0.341) ++
>>>>>> 59615905 59344006 (-271899) ---- 41.921 40.969 ( -0.952) ----
>>>>>> 59635867 59333147 (-302720) ----- 41.491 40.836 ( -0.655) ---
>>>>>> ================================================
>>>>>> 59620708 59336100 (-284608) ----- 41.604 41.169 ( -0.434) --
>>>>>> instruction delta = -284608 -0.4774%
>>>>>> time delta = -0.434 ms -1.0435%
>>>>>>
>>>>>> The number of PerfClassTraceTime's used is reduced from 564 to
>>>>>> 116 (so we have an overhead of about 715 instructions per use,
>>>>>> yikes!).
>>>>
>>
More information about the serviceability-dev
mailing list