RFR: JDK-8236604: Optimize SystemDictionary::resolve_well_known_classes for CDS
Yumin Qi
yumin.qi at oracle.com
Tue Feb 25 15:59:16 UTC 2020
Hi Claes,
Thanks for the confirmation. I look forward to your full review!
Yumin
On 2/24/20 9:41 PM, Claes Redestad wrote:
> Hi,
>
> before diving into a full review, I took your patch for a spin and
> can confirm a speed-up of around 0.5 ms, along with about a 1% reduction
> in #instructions and #branches on Hello World[1]. Sweet!
>
> /Claes
>
> [1]
> Baseline:
>     53.028269  task-clock (msec)  #  1.396 CPUs utilized     ( +- 0.34% )
>           244  context-switches   #  0.005 M/sec             ( +- 0.60% )
>            32  cpu-migrations     #  0.607 K/sec             ( +- 0.39% )
>         3,634  page-faults        #  0.069 M/sec             ( +- 0.03% )
>   138,590,680  cycles             #  2.614 GHz               ( +- 0.25% )
>   112,946,802  instructions       #  0.81 insns per cycle    ( +- 0.06% )
>    22,428,330  branches           #  422.950 M/sec           ( +- 0.06% )
>       771,755  branch-misses      #  3.44% of all branches   ( +- 0.13% )
>
>   0.037981200 seconds time elapsed  ( +- 0.39% )
>
> Patched:
>     52.537826  task-clock (msec)  #  1.409 CPUs utilized     ( +- 0.26% )
>           244  context-switches   #  0.005 M/sec             ( +- 0.65% )
>            32  cpu-migrations     #  0.607 K/sec             ( +- 0.43% )
>         3,637  page-faults        #  0.069 M/sec             ( +- 0.04% )
>   137,129,495  cycles             #  2.610 GHz               ( +- 0.23% )
>   111,931,764  instructions       #  0.82 insns per cycle    ( +- 0.07% )
>    22,215,346  branches           #  422.845 M/sec           ( +- 0.07% )
>       766,573  branch-misses      #  3.45% of all branches   ( +- 0.13% )
>
>   0.037283845 seconds time elapsed  ( +- 0.30% )
>
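For anyone who wants to check the "about 1%" figures against the counters quoted above, here is a minimal sketch in standalone C++. The struct and function names are made up for this illustration; only the numbers come from the perf stat output.

```cpp
#include <cassert>
#include <cstdint>

// Relative change from baseline to patched, in percent.
// A negative result means the patched run used less of the resource.
double percent_change(double baseline, double patched) {
    assert(baseline > 0.0);
    return (patched - baseline) / baseline * 100.0;
}

// Hypothetical container for the three counters of interest.
struct PerfSample {
    double task_clock_ms;
    std::int64_t instructions;
    std::int64_t branches;
};

// Values copied from the perf stat output quoted above.
const PerfSample kBaseline{53.028269, 112946802, 22428330};
const PerfSample kPatched{52.537826, 111931764, 22215346};
```

Calling percent_change on the instruction and branch counters gives a reduction of a bit under 1% for each, and the task-clock difference is just under half a millisecond, which agrees with the summary at the top of the quoted message.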
>
> On 2020-02-25 06:02, Yumin Qi wrote:
>> Hi,
>>
>> Please review this fix:
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8236604
>>
>> Webrev: http://cr.openjdk.java.net/~minqi/8236604/webrev-00
>>
>> Description: Optimize initialization of well-known classes for CDS
>> during JVM startup.
>>
>> When running with CDS, initializing the well-known classes (95 classes)
>> calls resolve_or_fail and thus goes through the resolve functions, locks,
>> etc. The initialization of well-known classes in fact happens at a very
>> early stage, so those locks can be avoided. The fix serializes
>> SystemDictionary::_well_known_klasses into the CDS archive and restores
>> them at runtime without calling resolve_or_fail. Since Compile_lock has
>> to be held for SystemDictionary::add_to_hierarchy, the fix avoids calling
>> it by copying its code into SystemDictionary::quick_resolve, but this way
>> I have to remove two asserts that would otherwise fire on Compile_lock.
>> A reminder about the lock usage was added as comments on the function
>> declarations of Klass::append_to_sibling_list and
>> InstanceKlass::add_implementor. The calling paths of these two functions
>> have been checked to make sure they are not used in other places.
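To illustrate the idea described above (and only the idea), here is a rough standalone C++ sketch contrasting a per-class locked lookup with restoring a pre-built table. Everything in it (Klass, Dictionary, resolve_all, restore_all, the table size of 3) is a hypothetical stand-in and not HotSpot code; the real change also has to deal with the class hierarchy and archive details that this sketch ignores.

```cpp
#include <array>
#include <cstddef>
#include <mutex>
#include <string>
#include <unordered_map>

// All names below are hypothetical stand-ins, not HotSpot code.
struct Klass { std::string name; };

constexpr std::size_t kWellKnownCount = 3;  // the real table has 95 entries
using WellKnownTable = std::array<const Klass*, kWellKnownCount>;

class Dictionary {
public:
    void add(const Klass* k) {
        std::lock_guard<std::mutex> guard(lock_);
        by_name_[k->name] = k;
    }
    // Slow path: one locked lookup per class, analogous to resolving by name.
    const Klass* resolve(const std::string& name) {
        std::lock_guard<std::mutex> guard(lock_);
        auto it = by_name_.find(name);
        return it == by_name_.end() ? nullptr : it->second;
    }
private:
    std::mutex lock_;
    std::unordered_map<std::string, const Klass*> by_name_;
};

// Baseline: resolve each well-known class through the dictionary.
WellKnownTable resolve_all(Dictionary& dict,
                           const std::array<std::string, kWellKnownCount>& names) {
    WellKnownTable table{};
    for (std::size_t i = 0; i < names.size(); ++i) {
        table[i] = dict.resolve(names[i]);
    }
    return table;
}

// Archived variant: the table was stored ahead of time, so startup only copies it.
WellKnownTable restore_all(const WellKnownTable& archived) {
    return archived;
}
```

Both functions produce the same table; the second one simply skips the lookups and the locking, which is the saving the fix is after.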
>>
>> Performance was measured by taking timestamps before and after
>> SystemDictionary::resolve_well_known_classes (in manually modified
>> original/new versions), since this is the most direct and accurate
>> measurement (the timing code is excluded from the review code):
>>
>> + jlong s0 = os::javaTimeNanos();
>> resolve_well_known_classes(CHECK);
>>
>> + jlong s1 = os::javaTimeNanos();
>>
>> // print out s1 - s0
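The same before/after pattern can be tried outside the VM. The sketch below is a standalone approximation: elapsed_nanos is a hypothetical helper, and std::chrono::steady_clock stands in for os::javaTimeNanos(), which is only available inside HotSpot.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper: returns the nanoseconds spent inside `work`.
// std::chrono::steady_clock is used here in place of os::javaTimeNanos().
template <typename Fn>
std::int64_t elapsed_nanos(Fn&& work) {
    const auto s0 = std::chrono::steady_clock::now();
    work();  // the call under measurement
    const auto s1 = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(s1 - s0).count();
}
```

A caller would pass the code to be timed as a lambda and print or accumulate the returned value, much like printing s1 - s0 in the snippet above.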
>>
>> I ran -version 2000 times for the original and new versions
>> respectively and took the averages. The saving is about 18% (2.9 ms vs
>> 2.4 ms). It is not a big saving, but it still helps improve startup time.
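As a quick sanity check of the arithmetic, here is a small standalone C++ sketch; mean and saving_percent are names chosen for this example. With the rounded averages quoted above (2.9 and 2.4), the formula gives roughly 17.2%, so the "about 18%" presumably reflects the unrounded averages, which are not shown in this thread.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Arithmetic mean of per-run timings (for example, one value per -version run).
double mean(const std::vector<double>& samples) {
    assert(!samples.empty());
    return std::accumulate(samples.begin(), samples.end(), 0.0) / samples.size();
}

// Saving of the patched version relative to the original, in percent.
double saving_percent(double original_ms, double patched_ms) {
    assert(original_ms > 0.0);
    return (original_ms - patched_ms) / original_ms * 100.0;
}
```

In practice one would collect the 2000 timings per build into a vector, call mean on each, and feed the two averages to saving_percent.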
>>
>> Testing: local jtreg tests on linux-x86.
>>
>> mach5 hs-tier1-4 is pending.
>>
>>
>> Thanks
>>
>> Yumin
>>