Additional clang -ftime-trace data
Magnus Ihse Bursie
magnus.ihse.bursie at oracle.com
Tue Sep 6 14:44:35 UTC 2022
On 2022-09-06 14:28, Stefan Karlsson wrote:
> If you want to see something ugly take a look at this patch, which
> cuts the inlining of AccessInternal code, and explicitly instantiate
> needed definitions in the a new access.cpp file:
> https://github.com/openjdk/jdk/compare/master...stefank:jdk:access_backend_no_inline_for_compile_times
>
>
> This PoC showed a 20% HotSpot compile time improvement, but I got
> feedback that this was so ugly that they would rather live with the
> extra compile time, than to ever have to look at the access.cpp file
> again. :)
Ok, so the AccessInternal cost of compile time was already well-known. I
had hoped I would at least provide some new insights. :)
I can understand that this was perceived as ugly. Is this the only
possible solution, though? I realize I'm far outside my (admittedly very
small) comfort zone of C++, but usually, There Is More Than One Way To
Do It. Maybe it's possible to find a more palatable solution, that still
cuts down compilation time?
/Magnus
>
> StefanK
>
> On 2022-09-06 12:36, Magnus Ihse Bursie wrote:
>> Here are more output from ClangBuildAnalyzer. Seems that
>> AccessInternal might be worth having a second look at, to see if it
>> can be made easier on the compiler..?
>>
>> **** Time summary:
>> Compilation (2033 times):
>> Parsing (frontend): 271.6 s
>> Codegen & opts (backend): 274.1 s
>>
>> **** Files that took longest to parse (compiler frontend):
>> 1339 ms: ad_aarch64.o
>> 1311 ms: dfa_aarch64.o
>> 1108 ms: compileBroker.o
>> 1088 ms: compile.o
>> 1005 ms: ad_aarch64_misc.o
>> 990 ms: library_call.o
>> 971 ms: whitebox.o
>> 953 ms: runtime.o
>> 938 ms: sharedRuntime_aarch64.o
>> 889 ms: instanceKlass.o
>>
>> **** Files that took longest to codegen (compiler backend):
>> 7267 ms: ad_aarch64_gen.o
>> 6830 ms: ad_aarch64.o
>> 5402 ms: dfa_aarch64.o
>> 3482 ms: whitebox.o
>> 3106 ms: ad_aarch64_misc.o
>> 2960 ms: jvmtiEnterTrace.o
>> 2844 ms: macroAssembler_aarch64.o
>> 2568 ms: jvmciCompilerToVMInit.o
>> 2520 ms: jni.o
>> 2512 ms: stubGenerator_aarch64.o
>>
>> **** Templates that took longest to instantiate:
>> 2884 ms: AccessInternal::OopLoadAtProxy<262144>::operator oopDesc *
>> (512 times, avg 5 ms)
>> 2870 ms: AccessInternal::load_at<262148, oopDesc *> (512 times, avg
>> 5 ms)
>> 2797 ms: AccessInternal::OopLoadAtProxy<1310720>::operator oopDesc
>> * (512 times, avg 5 ms)
>> 2783 ms: AccessInternal::load_at<1310724, oopDesc *> (512 times,
>> avg 5 ms)
>> 2656 ms: AccessInternal::PreRuntimeDispatch::load_at<286790,
>> oopDesc *> (512 times, avg 5 ms)
>> 2569 ms: AccessInternal::PreRuntimeDispatch::load_at<1335366,
>> oopDesc *> (512 times, avg 5 ms)
>> 2446 ms: Access<525312>::oop_atomic_xchg<oopDesc *, oopDesc *> (383
>> times, avg 6 ms)
>> 2380 ms: AccessInternal::atomic_xchg<525316, oopDesc *, oopDesc *>
>> (383 times, avg 6 ms)
>> 2311 ms: AccessInternal::atomic_xchg_reduce_types<549892, oopDesc
>> *> (383 times, avg 6 ms)
>> 2252 ms: AccessInternal::PreRuntimeDispatch::atomic_xchg<549892,
>> oopDesc *> (383 times, avg 5 ms)
>> 2234 ms: Access<262144>::oop_store_at<oopDesc *> (512 times, avg 4 ms)
>> 2168 ms: Access<1310720>::oop_store_at<oopDesc *> (512 times, avg 4
>> ms)
>> 2162 ms: AccessInternal::store_at<262148, oopDesc *> (512 times,
>> avg 4 ms)
>> 2119 ms: AccessInternal::PreRuntimeDispatch::store_at<286790,
>> oopDesc *> (512 times, avg 4 ms)
>> 2098 ms: AccessInternal::store_at<1310724, oopDesc *> (512 times,
>> avg 4 ms)
>> 2059 ms: AccessInternal::PreRuntimeDispatch::store_at<1335366,
>> oopDesc *> (512 times, avg 4 ms)
>> 2018 ms: AccessInternal::RuntimeDispatch<286790, oopDesc *,
>> AccessInternal::B... (512 times, avg 3 ms)
>> 1999 ms: AccessInternal::RuntimeDispatch<286790, oopDesc *,
>> AccessInternal::B... (512 times, avg 3 ms)
>> 1954 ms: AccessInternal::BarrierResolver<286790, oopDesc
>> *(*)(oopDesc *, long... (512 times, avg 3 ms)
>> 1949 ms: AccessInternal::BarrierResolver<286790, oopDesc
>> *(*)(oopDesc *, long... (512 times, avg 3 ms)
>> 1941 ms: AccessInternal::RuntimeDispatch<1335366, oopDesc *,
>> AccessInternal::... (512 times, avg 3 ms)
>> 1931 ms: AccessInternal::OopLoadProxy<oopDesc *, 524288>::operator
>> oopDesc * (414 times, avg 4 ms)
>> 1921 ms: AccessInternal::load<524292, oopDesc *, oopDesc *> (414
>> times, avg 4 ms)
>> 1919 ms: AccessInternal::RuntimeDispatch<1335366, oopDesc *,
>> AccessInternal::... (512 times, avg 3 ms)
>> 1877 ms: AccessInternal::BarrierResolver<1335366, oopDesc
>> *(*)(oopDesc *, lon... (512 times, avg 3 ms)
>> 1872 ms: AccessInternal::BarrierResolver<1335366, oopDesc
>> *(*)(oopDesc *, lon... (512 times, avg 3 ms)
>> 1844 ms: AccessInternal::OopLoadProxy<oopDesc *, 528384>::operator
>> oopDesc * (391 times, avg 4 ms)
>> 1840 ms: AccessInternal::load_reduce_types<548932, oopDesc *> (414
>> times, avg 4 ms)
>> 1834 ms: AccessInternal::load<528388, oopDesc *, oopDesc *> (391
>> times, avg 4 ms)
>> 1789 ms: AccessInternal::PreRuntimeDispatch::load<548932, oopDesc
>> *> (414 times, avg 4 ms)
>>
>> **** Template sets that took longest to instantiate:
>> 19441 ms: AccessInternal::BarrierResolver<$>::resolve_barrier (5560
>> times, avg 3 ms)
>> 19373 ms: AccessInternal::BarrierResolver<$>::resolve_barrier_rt
>> (5560 times, avg 3 ms)
>> 18680 ms: AccessInternal::BarrierResolver<$>::resolve_barrier_gc<$>
>> (10787 times, avg 1 ms)
>> 10406 ms: AccessInternal::PostRuntimeDispatch<$>::oop_access_barrier
>> (10218 times, avg 1 ms)
>> 10357 ms: AccessInternal::load_at<$> (1953 times, avg 5 ms)
>> 9745 ms: AccessInternal::PreRuntimeDispatch::load_at<$> (1953
>> times, avg 4 ms)
>> 8775 ms: AccessInternal::load<$> (3284 times, avg 2 ms)
>> 7888 ms: AccessInternal::RuntimeDispatch<$>::load_at (1953 times,
>> avg 4 ms)
>> 7876 ms: AccessInternal::PreRuntimeDispatch::load<$> (3068 times,
>> avg 2 ms)
>> 7811 ms: AccessInternal::RuntimeDispatch<$>::load_at_init (1953
>> times, avg 3 ms)
>> 7470 ms: AccessInternal::load_reduce_types<$> (2771 times, avg 2 ms)
>> 5702 ms: AccessInternal::RuntimeDispatch<$>::load (1529 times, avg
>> 3 ms)
>> 5644 ms: AccessInternal::RuntimeDispatch<$>::load_init (1529 times,
>> avg 3 ms)
>> 5055 ms: oopDesc::obj_field_access<$> (925 times, avg 5 ms)
>> 4418 ms: Access<$>::oop_store_at<$> (1029 times, avg 4 ms)
>> 4275 ms: AccessInternal::store_at<$> (1029 times, avg 4 ms)
>> 4194 ms: AccessInternal::PreRuntimeDispatch::store_at<$> (1029
>> times, avg 4 ms)
>> 3515 ms: AccessInternal::RuntimeDispatch<$>::store_at (1029 times,
>> avg 3 ms)
>> 3477 ms: AccessInternal::RuntimeDispatch<$>::store_at_init (1029
>> times, avg 3 ms)
>> 2884 ms: AccessInternal::OopLoadAtProxy<262144>::operator oopDesc *
>> (512 times, avg 5 ms)
>> 2797 ms: AccessInternal::OopLoadAtProxy<1310720>::operator oopDesc
>> * (512 times, avg 5 ms)
>> 2446 ms: Access<$>::oop_atomic_xchg<$> (383 times, avg 6 ms)
>> 2380 ms: AccessInternal::atomic_xchg<$> (383 times, avg 6 ms)
>> 2311 ms: AccessInternal::atomic_xchg_reduce_types<$> (383 times,
>> avg 6 ms)
>> 2269 ms: IsIntegral<$> (2546 times, avg 0 ms)
>> 2252 ms: AccessInternal::PreRuntimeDispatch::atomic_xchg<$> (383
>> times, avg 5 ms)
>> 2035 ms: ModRefBarrierSet::AccessBarrier<$>::oop_store_in_heap<$>
>> (1416 times, avg 1 ms)
>> 1996 ms: ModRefBarrierSet::AccessBarrier<$>::oop_store_in_heap_at
>> (1306 times, avg 1 ms)
>> 1931 ms: AccessInternal::OopLoadProxy<oopDesc *, 524288>::operator
>> oopDesc * (414 times, avg 4 ms)
>> 1844 ms: AccessInternal::OopLoadProxy<oopDesc *, 528384>::operator
>> oopDesc * (391 times, avg 4 ms)
>>
>> **** Functions that took longest to compile:
>> 3963 ms: State::MachNodeGenerator(int)
>> (gensrc/adfiles/ad_aarch64_gen.cpp)
>> 1137 ms: CompilerToVM::initialize_intrinsics(JVMCIEnv*)
>> (src/hotspot/share/jvmci/jvmciCompilerToVMInit.cpp)
>> 1099 ms: ClassVerifier::verify_method(methodHandle const&,
>> JavaThread*) (src/hotspot/share/classfile/verifier.cpp)
>> 1050 ms: readConfiguration0(JNIEnv_*, JVMCIEnv*)
>> (src/hotspot/share/jvmci/jvmciCompilerToVMInit.cpp)
>> 595 ms: Parse::do_one_bytecode() (src/hotspot/share/opto/parse2.cpp)
>> 511 ms: schema_extend_event_subklass_bytes(InstanceKlass const*,
>> ClassFilePa...
>> (src/hotspot/share/jfr/instrumentation/jfrEventClassTransformer.cpp)
>> 464 ms: State::_sub_Op_AddL(Node const*)
>> (gensrc/adfiles/dfa_aarch64.cpp)
>> 394 ms: State::DFA(int, Node const*) (gensrc/adfiles/dfa_aarch64.cpp)
>> 383 ms: State::MachOperGenerator(int)
>> (gensrc/adfiles/ad_aarch64_gen.cpp)
>> 378 ms: JNIJVMCI::initialize_ids(JNIEnv_*)
>> (src/hotspot/share/jvmci/jvmciJavaClasses.cpp)
>> 319 ms: State::_sub_Op_XorI(Node const*)
>> (gensrc/adfiles/dfa_aarch64.cpp)
>> 315 ms: PhaseChaitin::Split(unsigned int, ResourceArea*)
>> (src/hotspot/share/opto/reg_split.cpp)
>> 277 ms: GenerateOopMap::interp1(BytecodeStream*)
>> (src/hotspot/share/oops/generateOopMap.cpp)
>> 260 ms: PhiNode::Ideal(PhaseGVN*, bool)
>> (src/hotspot/share/opto/cfgnode.cpp)
>> 255 ms: State::_sub_Op_AndI(Node const*)
>> (gensrc/adfiles/dfa_aarch64.cpp)
>> 229 ms: LIR_OpVisitState::visit(LIR_Op*)
>> (src/hotspot/share/c1/c1_LIR.cpp)
>> 226 ms: Bytecodes::initialize()
>> (src/hotspot/share/interpreter/bytecodes.cpp)
>> 222 ms: State::_sub_Op_AddI(Node const*)
>> (gensrc/adfiles/dfa_aarch64.cpp)
>> 219 ms: State::_sub_Op_XorL(Node const*)
>> (gensrc/adfiles/dfa_aarch64.cpp)
>> 216 ms: State::_sub_Op_AddP(Node const*)
>> (gensrc/adfiles/dfa_aarch64.cpp)
>> 214 ms: TemplateTable::initialize()
>> (src/hotspot/share/interpreter/templateTable.cpp)
>> 208 ms: Arguments::parse_each_vm_init_arg(JavaVMInitArgs const*,
>> bool*, JVMF... (src/hotspot/share/runtime/arguments.cpp)
>> 208 ms: BCEscapeAnalyzer::iterate_one_block(ciBlock*,
>> BCEscapeAnalyzer::Stat... (src/hotspot/share/ci/bcEscapeAnalyzer.cpp)
>> 189 ms: LibraryCallKit::inline_native_getEventWriter()
>> (src/hotspot/share/opto/library_call.cpp)
>> 181 ms: PhaseStringOpts::replace_string_concat(StringConcat*)
>> (src/hotspot/share/opto/stringopts.cpp)
>> 179 ms: PhaseChaitin::post_allocate_copy_removal()
>> (src/hotspot/share/opto/postaloc.cpp)
>> 170 ms: Type::Initialize_shared(Compile*)
>> (src/hotspot/share/opto/type.cpp)
>> 167 ms: _GLOBAL__sub_I_vmStructs.cpp
>> (src/hotspot/share/runtime/vmStructs.cpp)
>> 166 ms: ConnectionGraph::split_unique_types(GrowableArray<Node*>&,
>> GrowableA... (src/hotspot/share/opto/escape.cpp)
>> 153 ms: State::_sub_Op_AndL(Node const*)
>> (gensrc/adfiles/dfa_aarch64.cpp)
>>
>> **** Function sets that took longest to compile / optimize:
>> 1725 ms: GrowableArrayWithAllocator<$>::grow(int) (436 times, avg 3
>> ms)
>> 1698 ms: void
>> OopOopIterateBoundedDispatch<$>::Table::oop_oop_iterate_bounded...
>> (515 times, avg 3 ms)
>> 1682 ms: void
>> OopOopIterateDispatch<$>::Table::oop_oop_iterate<$>(G1CMOopClos...
>> (613 times, avg 2 ms)
>> 1545 ms:
>> AccessInternal::PostRuntimeDispatch<$>::oop_access_barrier(void*)
>> (766 times, avg 2 ms)
>> 1037 ms: JfrEvent<$>::write_sized_event(JfrBuffer*, Thread*,
>> unsigned long lo... (145 times, avg 7 ms)
>> 831 ms: void
>> InstanceMirrorKlass::oop_oop_iterate_bounded<$>(oopDesc*, G1CMO...
>> (120 times, avg 6 ms)
>> 719 ms: void
>> InstanceStackChunkKlass::oop_oop_iterate_stack<$>(stackChunkOop...
>> (121 times, avg 5 ms)
>> 641 ms: void InstanceMirrorKlass::oop_oop_iterate<$>(oopDesc*,
>> G1CMOopClosur... (121 times, avg 5 ms)
>> 615 ms: void
>> InstanceRefKlass::oop_oop_iterate_ref_processing<$>(oopDesc*, G...
>> (120 times, avg 5 ms)
>> 538 ms: void
>> InstanceRefKlass::oop_oop_iterate_discovery<$>(oopDesc*, Refere...
>> (120 times, avg 4 ms)
>> 484 ms: EventWriterHost<$>::end_event_write(bool) (51 times, avg 9
>> ms)
>> 481 ms: void
>> InstanceStackChunkKlass::oop_oop_iterate_stack_with_bitmap<$>(s...
>> (121 times, avg 3 ms)
>> 468 ms:
>> AccessInternal::PostRuntimeDispatch<$>::oop_access_barrier(oopDesc*,...
>> (324 times, avg 1 ms)
>> 468 ms: void
>> InstanceStackChunkKlass::oop_oop_iterate_stack_bounded<$>(stack...
>> (77 times, avg 6 ms)
>> 443 ms: void
>> InstanceRefKlass::oop_oop_iterate_ref_processing<$>(oopDesc*, G...
>> (121 times, avg 3 ms)
>> 417 ms: void
>> InstanceRefKlass::oop_oop_iterate_discovery<$>(oopDesc*, Refere...
>> (120 times, avg 3 ms)
>> 385 ms: bool
>> BitMap::iterate<$>(StackChunkOopIterateBitmapClosure<$>*, unsig...
>> (238 times, avg 1 ms)
>> 383 ms: AccessInternal::RuntimeDispatch<$>::load_init(void*) (346
>> times, avg 1 ms)
>> 351 ms: void
>> InstanceClassLoaderKlass::oop_oop_iterate_bounded<$>(oopDesc*, ...
>> (120 times, avg 2 ms)
>> 328 ms: void
>> OopOopIterateDispatch<$>::Table::oop_oop_iterate<$>(AdjustPoint...
>> (86 times, avg 3 ms)
>> 310 ms: WriterHost<$>::write_utf8(char const*) (27 times, avg 11 ms)
>> 293 ms: oopDesc*
>> ZBarrierSet::AccessBarrier<$>::load_barrier_on_oop_field_pr... (284
>> times, avg 1 ms)
>> 268 ms: void WriterHost<$>::write<$>(unsigned long long) (62
>> times, avg 4 ms)
>> 257 ms: EventWriterHost<$>::begin_event_write(bool) (52 times, avg
>> 4 ms)
>> 255 ms: void
>> InstanceStackChunkKlass::oop_oop_iterate_bounded<$>(oopDesc*, G...
>> (84 times, avg 3 ms)
>> 228 ms: void
>> OopOopIterateDispatch<$>::Table::oop_oop_iterate<$>(G1MarkAndPu...
>> (40 times, avg 5 ms)
>> 210 ms: BCEscapeAnalyzer::iterate_one_block(ciBlock*,
>> BCEscapeAnalyzer::Stat... (2 times, avg 105 ms)
>> 197 ms: void WriterHost<$>::write<$>(long) (47 times, avg 4 ms)
>> 183 ms: void
>> OopOopIterateBackwardsDispatch<$>::Table::oop_oop_iterate_backw...
>> (33 times, avg 5 ms)
>> 171 ms:
>> ZBarrierSet::AccessBarrier<$>::oop_load_in_heap_at(oopDesc*, long)
>> (65 times, avg 2 ms)
>>
>> /Magnus
>>
>
More information about the hotspot-dev
mailing list