Additional clang -ftime-trace data

Stefan Karlsson stefan.karlsson at oracle.com
Tue Sep 6 12:28:53 UTC 2022


If you want to see something ugly take a look at this patch, which cuts 
the inlining of AccessInternal code, and explicitly instantiate needed 
definitions in the a new access.cpp file:
https://github.com/openjdk/jdk/compare/master...stefank:jdk:access_backend_no_inline_for_compile_times

This PoC showed a 20% HotSpot compile time improvement, but I got 
feedback that this was so ugly that they would rather live with the 
extra compile time, than to ever have to look at the access.cpp file 
again. :)

StefanK

On 2022-09-06 12:36, Magnus Ihse Bursie wrote:
> Here are more output from ClangBuildAnalyzer. Seems that 
> AccessInternal might be worth having a second look at, to see if it 
> can be made easier on the compiler..?
>
> **** Time summary:
> Compilation (2033 times):
>   Parsing (frontend):          271.6 s
>   Codegen & opts (backend):    274.1 s
>
> **** Files that took longest to parse (compiler frontend):
>   1339 ms: ad_aarch64.o
>   1311 ms: dfa_aarch64.o
>   1108 ms: compileBroker.o
>   1088 ms: compile.o
>   1005 ms: ad_aarch64_misc.o
>    990 ms: library_call.o
>    971 ms: whitebox.o
>    953 ms: runtime.o
>    938 ms: sharedRuntime_aarch64.o
>    889 ms: instanceKlass.o
>
> **** Files that took longest to codegen (compiler backend):
>   7267 ms: ad_aarch64_gen.o
>   6830 ms: ad_aarch64.o
>   5402 ms: dfa_aarch64.o
>   3482 ms: whitebox.o
>   3106 ms: ad_aarch64_misc.o
>   2960 ms: jvmtiEnterTrace.o
>   2844 ms: macroAssembler_aarch64.o
>   2568 ms: jvmciCompilerToVMInit.o
>   2520 ms: jni.o
>   2512 ms: stubGenerator_aarch64.o
>
> **** Templates that took longest to instantiate:
>   2884 ms: AccessInternal::OopLoadAtProxy<262144>::operator oopDesc * 
> (512 times, avg 5 ms)
>   2870 ms: AccessInternal::load_at<262148, oopDesc *> (512 times, avg 
> 5 ms)
>   2797 ms: AccessInternal::OopLoadAtProxy<1310720>::operator oopDesc * 
> (512 times, avg 5 ms)
>   2783 ms: AccessInternal::load_at<1310724, oopDesc *> (512 times, avg 
> 5 ms)
>   2656 ms: AccessInternal::PreRuntimeDispatch::load_at<286790, oopDesc 
> *> (512 times, avg 5 ms)
>   2569 ms: AccessInternal::PreRuntimeDispatch::load_at<1335366, 
> oopDesc *> (512 times, avg 5 ms)
>   2446 ms: Access<525312>::oop_atomic_xchg<oopDesc *, oopDesc *> (383 
> times, avg 6 ms)
>   2380 ms: AccessInternal::atomic_xchg<525316, oopDesc *, oopDesc *> 
> (383 times, avg 6 ms)
>   2311 ms: AccessInternal::atomic_xchg_reduce_types<549892, oopDesc *> 
> (383 times, avg 6 ms)
>   2252 ms: AccessInternal::PreRuntimeDispatch::atomic_xchg<549892, 
> oopDesc *> (383 times, avg 5 ms)
>   2234 ms: Access<262144>::oop_store_at<oopDesc *> (512 times, avg 4 ms)
>   2168 ms: Access<1310720>::oop_store_at<oopDesc *> (512 times, avg 4 ms)
>   2162 ms: AccessInternal::store_at<262148, oopDesc *> (512 times, avg 
> 4 ms)
>   2119 ms: AccessInternal::PreRuntimeDispatch::store_at<286790, 
> oopDesc *> (512 times, avg 4 ms)
>   2098 ms: AccessInternal::store_at<1310724, oopDesc *> (512 times, 
> avg 4 ms)
>   2059 ms: AccessInternal::PreRuntimeDispatch::store_at<1335366, 
> oopDesc *> (512 times, avg 4 ms)
>   2018 ms: AccessInternal::RuntimeDispatch<286790, oopDesc *, 
> AccessInternal::B... (512 times, avg 3 ms)
>   1999 ms: AccessInternal::RuntimeDispatch<286790, oopDesc *, 
> AccessInternal::B... (512 times, avg 3 ms)
>   1954 ms: AccessInternal::BarrierResolver<286790, oopDesc 
> *(*)(oopDesc *, long... (512 times, avg 3 ms)
>   1949 ms: AccessInternal::BarrierResolver<286790, oopDesc 
> *(*)(oopDesc *, long... (512 times, avg 3 ms)
>   1941 ms: AccessInternal::RuntimeDispatch<1335366, oopDesc *, 
> AccessInternal::... (512 times, avg 3 ms)
>   1931 ms: AccessInternal::OopLoadProxy<oopDesc *, 524288>::operator 
> oopDesc * (414 times, avg 4 ms)
>   1921 ms: AccessInternal::load<524292, oopDesc *, oopDesc *> (414 
> times, avg 4 ms)
>   1919 ms: AccessInternal::RuntimeDispatch<1335366, oopDesc *, 
> AccessInternal::... (512 times, avg 3 ms)
>   1877 ms: AccessInternal::BarrierResolver<1335366, oopDesc 
> *(*)(oopDesc *, lon... (512 times, avg 3 ms)
>   1872 ms: AccessInternal::BarrierResolver<1335366, oopDesc 
> *(*)(oopDesc *, lon... (512 times, avg 3 ms)
>   1844 ms: AccessInternal::OopLoadProxy<oopDesc *, 528384>::operator 
> oopDesc * (391 times, avg 4 ms)
>   1840 ms: AccessInternal::load_reduce_types<548932, oopDesc *> (414 
> times, avg 4 ms)
>   1834 ms: AccessInternal::load<528388, oopDesc *, oopDesc *> (391 
> times, avg 4 ms)
>   1789 ms: AccessInternal::PreRuntimeDispatch::load<548932, oopDesc *> 
> (414 times, avg 4 ms)
>
> **** Template sets that took longest to instantiate:
>  19441 ms: AccessInternal::BarrierResolver<$>::resolve_barrier (5560 
> times, avg 3 ms)
>  19373 ms: AccessInternal::BarrierResolver<$>::resolve_barrier_rt 
> (5560 times, avg 3 ms)
>  18680 ms: AccessInternal::BarrierResolver<$>::resolve_barrier_gc<$> 
> (10787 times, avg 1 ms)
>  10406 ms: AccessInternal::PostRuntimeDispatch<$>::oop_access_barrier 
> (10218 times, avg 1 ms)
>  10357 ms: AccessInternal::load_at<$> (1953 times, avg 5 ms)
>   9745 ms: AccessInternal::PreRuntimeDispatch::load_at<$> (1953 times, 
> avg 4 ms)
>   8775 ms: AccessInternal::load<$> (3284 times, avg 2 ms)
>   7888 ms: AccessInternal::RuntimeDispatch<$>::load_at (1953 times, 
> avg 4 ms)
>   7876 ms: AccessInternal::PreRuntimeDispatch::load<$> (3068 times, 
> avg 2 ms)
>   7811 ms: AccessInternal::RuntimeDispatch<$>::load_at_init (1953 
> times, avg 3 ms)
>   7470 ms: AccessInternal::load_reduce_types<$> (2771 times, avg 2 ms)
>   5702 ms: AccessInternal::RuntimeDispatch<$>::load (1529 times, avg 3 
> ms)
>   5644 ms: AccessInternal::RuntimeDispatch<$>::load_init (1529 times, 
> avg 3 ms)
>   5055 ms: oopDesc::obj_field_access<$> (925 times, avg 5 ms)
>   4418 ms: Access<$>::oop_store_at<$> (1029 times, avg 4 ms)
>   4275 ms: AccessInternal::store_at<$> (1029 times, avg 4 ms)
>   4194 ms: AccessInternal::PreRuntimeDispatch::store_at<$> (1029 
> times, avg 4 ms)
>   3515 ms: AccessInternal::RuntimeDispatch<$>::store_at (1029 times, 
> avg 3 ms)
>   3477 ms: AccessInternal::RuntimeDispatch<$>::store_at_init (1029 
> times, avg 3 ms)
>   2884 ms: AccessInternal::OopLoadAtProxy<262144>::operator oopDesc * 
> (512 times, avg 5 ms)
>   2797 ms: AccessInternal::OopLoadAtProxy<1310720>::operator oopDesc * 
> (512 times, avg 5 ms)
>   2446 ms: Access<$>::oop_atomic_xchg<$> (383 times, avg 6 ms)
>   2380 ms: AccessInternal::atomic_xchg<$> (383 times, avg 6 ms)
>   2311 ms: AccessInternal::atomic_xchg_reduce_types<$> (383 times, avg 
> 6 ms)
>   2269 ms: IsIntegral<$> (2546 times, avg 0 ms)
>   2252 ms: AccessInternal::PreRuntimeDispatch::atomic_xchg<$> (383 
> times, avg 5 ms)
>   2035 ms: ModRefBarrierSet::AccessBarrier<$>::oop_store_in_heap<$> 
> (1416 times, avg 1 ms)
>   1996 ms: ModRefBarrierSet::AccessBarrier<$>::oop_store_in_heap_at 
> (1306 times, avg 1 ms)
>   1931 ms: AccessInternal::OopLoadProxy<oopDesc *, 524288>::operator 
> oopDesc * (414 times, avg 4 ms)
>   1844 ms: AccessInternal::OopLoadProxy<oopDesc *, 528384>::operator 
> oopDesc * (391 times, avg 4 ms)
>
> **** Functions that took longest to compile:
>   3963 ms: State::MachNodeGenerator(int) 
> (gensrc/adfiles/ad_aarch64_gen.cpp)
>   1137 ms: CompilerToVM::initialize_intrinsics(JVMCIEnv*) 
> (src/hotspot/share/jvmci/jvmciCompilerToVMInit.cpp)
>   1099 ms: ClassVerifier::verify_method(methodHandle const&, 
> JavaThread*) (src/hotspot/share/classfile/verifier.cpp)
>   1050 ms: readConfiguration0(JNIEnv_*, JVMCIEnv*) 
> (src/hotspot/share/jvmci/jvmciCompilerToVMInit.cpp)
>    595 ms: Parse::do_one_bytecode() (src/hotspot/share/opto/parse2.cpp)
>    511 ms: schema_extend_event_subklass_bytes(InstanceKlass const*, 
> ClassFilePa... 
> (src/hotspot/share/jfr/instrumentation/jfrEventClassTransformer.cpp)
>    464 ms: State::_sub_Op_AddL(Node const*) 
> (gensrc/adfiles/dfa_aarch64.cpp)
>    394 ms: State::DFA(int, Node const*) (gensrc/adfiles/dfa_aarch64.cpp)
>    383 ms: State::MachOperGenerator(int) 
> (gensrc/adfiles/ad_aarch64_gen.cpp)
>    378 ms: JNIJVMCI::initialize_ids(JNIEnv_*) 
> (src/hotspot/share/jvmci/jvmciJavaClasses.cpp)
>    319 ms: State::_sub_Op_XorI(Node const*) 
> (gensrc/adfiles/dfa_aarch64.cpp)
>    315 ms: PhaseChaitin::Split(unsigned int, ResourceArea*) 
> (src/hotspot/share/opto/reg_split.cpp)
>    277 ms: GenerateOopMap::interp1(BytecodeStream*) 
> (src/hotspot/share/oops/generateOopMap.cpp)
>    260 ms: PhiNode::Ideal(PhaseGVN*, bool) 
> (src/hotspot/share/opto/cfgnode.cpp)
>    255 ms: State::_sub_Op_AndI(Node const*) 
> (gensrc/adfiles/dfa_aarch64.cpp)
>    229 ms: LIR_OpVisitState::visit(LIR_Op*) 
> (src/hotspot/share/c1/c1_LIR.cpp)
>    226 ms: Bytecodes::initialize() 
> (src/hotspot/share/interpreter/bytecodes.cpp)
>    222 ms: State::_sub_Op_AddI(Node const*) 
> (gensrc/adfiles/dfa_aarch64.cpp)
>    219 ms: State::_sub_Op_XorL(Node const*) 
> (gensrc/adfiles/dfa_aarch64.cpp)
>    216 ms: State::_sub_Op_AddP(Node const*) 
> (gensrc/adfiles/dfa_aarch64.cpp)
>    214 ms: TemplateTable::initialize() 
> (src/hotspot/share/interpreter/templateTable.cpp)
>    208 ms: Arguments::parse_each_vm_init_arg(JavaVMInitArgs const*, 
> bool*, JVMF... (src/hotspot/share/runtime/arguments.cpp)
>    208 ms: BCEscapeAnalyzer::iterate_one_block(ciBlock*, 
> BCEscapeAnalyzer::Stat... (src/hotspot/share/ci/bcEscapeAnalyzer.cpp)
>    189 ms: LibraryCallKit::inline_native_getEventWriter() 
> (src/hotspot/share/opto/library_call.cpp)
>    181 ms: PhaseStringOpts::replace_string_concat(StringConcat*) 
> (src/hotspot/share/opto/stringopts.cpp)
>    179 ms: PhaseChaitin::post_allocate_copy_removal() 
> (src/hotspot/share/opto/postaloc.cpp)
>    170 ms: Type::Initialize_shared(Compile*) 
> (src/hotspot/share/opto/type.cpp)
>    167 ms: _GLOBAL__sub_I_vmStructs.cpp 
> (src/hotspot/share/runtime/vmStructs.cpp)
>    166 ms: ConnectionGraph::split_unique_types(GrowableArray<Node*>&, 
> GrowableA... (src/hotspot/share/opto/escape.cpp)
>    153 ms: State::_sub_Op_AndL(Node const*) 
> (gensrc/adfiles/dfa_aarch64.cpp)
>
> **** Function sets that took longest to compile / optimize:
>   1725 ms: GrowableArrayWithAllocator<$>::grow(int) (436 times, avg 3 ms)
>   1698 ms: void 
> OopOopIterateBoundedDispatch<$>::Table::oop_oop_iterate_bounded... 
> (515 times, avg 3 ms)
>   1682 ms: void 
> OopOopIterateDispatch<$>::Table::oop_oop_iterate<$>(G1CMOopClos... 
> (613 times, avg 2 ms)
>   1545 ms: 
> AccessInternal::PostRuntimeDispatch<$>::oop_access_barrier(void*) (766 
> times, avg 2 ms)
>   1037 ms: JfrEvent<$>::write_sized_event(JfrBuffer*, Thread*, 
> unsigned long lo... (145 times, avg 7 ms)
>    831 ms: void 
> InstanceMirrorKlass::oop_oop_iterate_bounded<$>(oopDesc*, G1CMO... 
> (120 times, avg 6 ms)
>    719 ms: void 
> InstanceStackChunkKlass::oop_oop_iterate_stack<$>(stackChunkOop... 
> (121 times, avg 5 ms)
>    641 ms: void InstanceMirrorKlass::oop_oop_iterate<$>(oopDesc*, 
> G1CMOopClosur... (121 times, avg 5 ms)
>    615 ms: void 
> InstanceRefKlass::oop_oop_iterate_ref_processing<$>(oopDesc*, G... 
> (120 times, avg 5 ms)
>    538 ms: void 
> InstanceRefKlass::oop_oop_iterate_discovery<$>(oopDesc*, Refere... 
> (120 times, avg 4 ms)
>    484 ms: EventWriterHost<$>::end_event_write(bool) (51 times, avg 9 ms)
>    481 ms: void 
> InstanceStackChunkKlass::oop_oop_iterate_stack_with_bitmap<$>(s... 
> (121 times, avg 3 ms)
>    468 ms: 
> AccessInternal::PostRuntimeDispatch<$>::oop_access_barrier(oopDesc*,... 
> (324 times, avg 1 ms)
>    468 ms: void 
> InstanceStackChunkKlass::oop_oop_iterate_stack_bounded<$>(stack... (77 
> times, avg 6 ms)
>    443 ms: void 
> InstanceRefKlass::oop_oop_iterate_ref_processing<$>(oopDesc*, G... 
> (121 times, avg 3 ms)
>    417 ms: void 
> InstanceRefKlass::oop_oop_iterate_discovery<$>(oopDesc*, Refere... 
> (120 times, avg 3 ms)
>    385 ms: bool 
> BitMap::iterate<$>(StackChunkOopIterateBitmapClosure<$>*, unsig... 
> (238 times, avg 1 ms)
>    383 ms: AccessInternal::RuntimeDispatch<$>::load_init(void*) (346 
> times, avg 1 ms)
>    351 ms: void 
> InstanceClassLoaderKlass::oop_oop_iterate_bounded<$>(oopDesc*, ... 
> (120 times, avg 2 ms)
>    328 ms: void 
> OopOopIterateDispatch<$>::Table::oop_oop_iterate<$>(AdjustPoint... (86 
> times, avg 3 ms)
>    310 ms: WriterHost<$>::write_utf8(char const*) (27 times, avg 11 ms)
>    293 ms: oopDesc* 
> ZBarrierSet::AccessBarrier<$>::load_barrier_on_oop_field_pr... (284 
> times, avg 1 ms)
>    268 ms: void WriterHost<$>::write<$>(unsigned long long) (62 times, 
> avg 4 ms)
>    257 ms: EventWriterHost<$>::begin_event_write(bool) (52 times, avg 
> 4 ms)
>    255 ms: void 
> InstanceStackChunkKlass::oop_oop_iterate_bounded<$>(oopDesc*, G... (84 
> times, avg 3 ms)
>    228 ms: void 
> OopOopIterateDispatch<$>::Table::oop_oop_iterate<$>(G1MarkAndPu... (40 
> times, avg 5 ms)
>    210 ms: BCEscapeAnalyzer::iterate_one_block(ciBlock*, 
> BCEscapeAnalyzer::Stat... (2 times, avg 105 ms)
>    197 ms: void WriterHost<$>::write<$>(long) (47 times, avg 4 ms)
>    183 ms: void 
> OopOopIterateBackwardsDispatch<$>::Table::oop_oop_iterate_backw... (33 
> times, avg 5 ms)
>    171 ms: 
> ZBarrierSet::AccessBarrier<$>::oop_load_in_heap_at(oopDesc*, long) (65 
> times, avg 2 ms)
>
> /Magnus
>



More information about the hotspot-dev mailing list