RFR: 8345265: Minor improvements for LTO across all compilers [v2]
Julian Waters
jwaters at openjdk.org
Fri Mar 28 05:13:16 UTC 2025
On Thu, 27 Mar 2025 14:32:35 GMT, Matthias Baesken <mbaesken at openjdk.org> wrote:
> > Wait, sorry to trouble you further, but what does `nm --demangle --reverse-sort --print-size --size-sort libjvm.so` on HotSpot compiled by gcc 14 with LTO active yield as the largest symbol in the binary? (It should be the symbol listed at the very top)
>
> This is my output. I should add that I used the 'normal' jdk head without patches; is that what I should do for a gcc14 build test?
>
> ```
> nm --demangle --reverse-sort --print-size --size-sort images/jdk/lib/server/libjvm.so | more
> 0000000000453ee0 000000000002320e t State::MachNodeGenerator(int)
> 0000000000970f70 0000000000018eb9 t CompilerToVM::initialize_intrinsics(JVMCIEnv*)
> 000000000140caa0 000000000000f018 b Matcher::mreg2regmask
> 0000000000993c80 000000000000a40d t JNIJVMCI::initialize_ids(JNIEnv_*)
> 0000000000ac16b0 0000000000009d16 t Matcher::Fixup_Save_On_Entry()
> 000000000143db00 0000000000008000 b _ZL9_elements.lto_priv.0
> 0000000001446d20 0000000000008000 b _free_list
> 000000000141e900 0000000000007d00 b DFSClosure::_reference_stack
> 00000000013d6d40 0000000000007668 d _ZL9flagTable.lto_priv.0
> 00000000013edf60 0000000000006c30 d VMStructs::localHotSpotVMStructs
> 00000000010463f0 0000000000006a06 t readConfiguration0(JNIEnv_*, JVMCIEnv*) [clone .isra.0]
> 0000000000d51dc0 00000000000067a2 t StubGenerator::generate_libmPow()
> 00000000010b12d0 0000000000006289 t G1ParScanThreadState::trim_queue_to_threshold(unsigned int)
> 0000000000e24550 00000000000061e8 t ClassVerifier::verify_method(methodHandle const&, JavaThread*)
> 0000000001076dd0 000000000000548d t State::DFA(int, Node const*) [clone .isra.0]
> 0000000000e1ba00 000000000000519d t VMError::report(outputStream*, bool)
> 00000000014521e0 0000000000005000 b TemplateInterpreter::_safept_table
> 000000000142cb60 0000000000005000 b TemplateInterpreter::_normal_table
> 0000000001431b60 0000000000005000 b TemplateInterpreter::_active_table
> 0000000000653790 0000000000004e12 t CompileBroker::print_heapinfo(outputStream*, char const*, unsigned long)
> 000000000075be40 0000000000004e0b t G1CollectedHeap::do_collection_pause_at_safepoint_helper()
> 0000000000b90260 0000000000004a41 t Parse::do_one_bytecode() [clone .part.0]
> 00000000010c1f20 00000000000049e6 t d_print_comp_inner
> 0000000000c1e180 0000000000004594 t ServiceThread::service_thread_entry(JavaThread*, JavaThread*)
> 00000000005ad7c0 0000000000004424 t C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)
> 0000000000d260b0 000000000000440a t PhaseStringOpts::replace_string_concat(StringConcat*)
> 0000000000e48920 00000000000042e5 t VM_Version::initialize()
> 0000000000afbad0 0000000000004240 t Method::init_intrinsic_id(vmSymbolID)
> 00000000010200d0 00000000000040ab t PSParallelCompact::invoke_no_policy(bool) [clone .isra.0]
> 000000000051f230 0000000000004054 t Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)
> 000000000065c060 0000000000004015 t Compile::Code_Gen()
> 0000000000663060 0000000000003fec t CompileBroker::compiler_thread_loop()
> 000000000070d9b0 0000000000003fd7 t ConnectionGraph::do_analysis(Compile*, PhaseIterGVN*)
> 0000000000be8890 0000000000003f8e t PhaseChaitin::Split(unsigned int, ResourceArea*)
> 0000000000818730 0000000000003f82 t PhaseChaitin::build_ifg_physical(ResourceArea*)
> 0000000000fb1920 0000000000003f44 t SharedRuntime::generate_native_wrapper(MacroAssembler*, methodHandle const&, int, BasicType*, VMRegPair*, BasicType) [clone .constprop.0]
> 00000000007e2ab0 0000000000003f41 t PhaseCFG::global_code_motion()
> 00000000013e8580 0000000000003e10 d JVMCIVMStructs::localHotSpotVMStructs
> 0000000000db3b00 0000000000003dff t TemplateInterpreterGenerator::generate_all()
> 0000000000f436d0 0000000000003d92 t initialize_stubs(StubGenBlobId, int, int, char const*, char const*, char const*) [clone .constprop.0]
> 0000000000d6b680 0000000000003d42 t StubGenerator::generate_libmTan()
> 00000000004ff560 0000000000003d28 t BCEscapeAnalyzer::iterate_blocks(Arena*)
> 0000000000a2bfb0 0000000000003cef t VM_RedefineClasses::load_new_class_versions() [clone .part.0]
> 000000000073fde0 0000000000003ca2 t G1CollectedHeap::do_full_collection(bool, bool)
> 0000000000e8e170 0000000000003bea t ZDriverMajor::run_thread()
> 000000000103ec00 0000000000003b81 t JvmtiEnv::RetransformClasses(int, _jclass* const*) [clone .isra.0]
> 0000000000d18c50 0000000000003b4d t StubGenerator::generate_md5_implCompress(StubGenStubId)
> 0000000000a97160 0000000000003b20 t PhaseIdealLoop::auto_vectorize(IdealLoopTree*, VSharedData&) [clone .part.0]
> 00000000013df000 0000000000003aa8 d ruleName
> 00000000005c7e90 0000000000003a9f t PhiNode::Ideal(PhaseGVN*, bool)
> 00000000006a1ef0 0000000000003a9f t State::_sub_Op_AddP(Node const*)
> 0000000000e86880 0000000000003a74 t ZGeneration::select_relocation_set(ZGenerationId, bool)
> --More--
> ```
Yes, that should be good enough, thank you for sharing it. I'm baffled by how tiny the methods are on Linux; in particular, G1ParScanThreadState::trim_queue_to_threshold(unsigned int) being only about 25KB is astonishing to me. I have no clue why flatten causes so much inlining on Windows, to the point where it results in massive 5MB G1 methods, while it's perfectly fine on Linux. I really wonder why the things I have to solve can never be easy.
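
For anyone comparing symbol sizes across platforms, here is a minimal sketch of how the hex sizes in the listing above could be turned into byte counts. It assumes the four-column layout that `nm --print-size` produces (address, size, type, name); the helper names `parse_nm_line` and `largest_symbols` are made up for illustration and are not part of any tool. With it, the size field 0x6289 of trim_queue_to_threshold comes out as 25225 bytes.

```python
def parse_nm_line(line):
    """Split one `nm --print-size` line into (address, size_in_bytes, type, name)."""
    # Split on the first three runs of whitespace only, so demangled names
    # that contain spaces (e.g. "foo(unsigned int)") stay in one piece.
    addr, size, sym_type, name = line.strip().split(None, 3)
    return int(addr, 16), int(size, 16), sym_type, name


def largest_symbols(lines, limit=5):
    """Return the `limit` largest symbols, skipping lines that do not parse."""
    parsed = []
    for line in lines:
        try:
            parsed.append(parse_nm_line(line))
        except ValueError:
            continue  # e.g. pager residue such as "--More--"
    parsed.sort(key=lambda entry: entry[1], reverse=True)
    return parsed[:limit]


# Two lines copied from the listing above, plus the pager prompt.
sample = [
    "0000000000453ee0 000000000002320e t State::MachNodeGenerator(int)",
    "00000000010b12d0 0000000000006289 t G1ParScanThreadState::trim_queue_to_threshold(unsigned int)",
    "--More--",
]

for addr, size, sym_type, name in largest_symbols(sample):
    print(f"{size:>8} bytes ({size / 1024:6.1f} KiB)  {sym_type}  {name}")
```

Feeding it the full `nm` output from both the Linux and Windows builds (the latter via whatever equivalent listing is available there, which this sketch does not cover) would make the size difference easier to compare side by side.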
-------------
PR Comment: https://git.openjdk.org/jdk/pull/22464#issuecomment-2760210133