Bytecode profiling
Edward Nevill
ed at camswl.com
Fri Feb 20 06:57:32 PST 2009
Hi folks,
Another day, more profiling. I have persuaded gprof to tell me the % time spent executed
each bytecode in my ASM loop. This is more useful than the simple bytecode counter in
the CPP interpreter. Some interesting results.
I have just included summaries here down to the 1% level. If anyone wants the full profile
I can send it. At the bottom of each bytecode profile I have include other functions down
to the 1% level for comparison.
First CaffeineMark. Fairly much as expected. I'm surprised it spends quit that amount of
time in getfield. I'll have another look at that. Also notice the high % in iload_0. This
is probably a lie. It is probably aload_0, the asm VM does not distinguish aload_0, it
just goes to iload_0. So the sequence is probably aload_0, getfield.
----- ECM ---
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls ms/call ms/call name
11.99 1.21 1.21 do_getfield
7.19 1.94 0.73 do_iload
4.96 3.06 0.50 do_iaload
4.56 3.52 0.46 do_ifne
4.56 3.98 0.46 do_iload_0
4.36 4.42 0.44 do_iconst_1
3.32 4.75 0.34 do_iload_2
2.97 5.05 0.30 do_if_icmplt
2.78 5.33 0.28 do_istore
2.58 5.87 0.26 do_ixor
2.38 6.11 0.24 do_laload
1.78 6.48 0.18 do_iadd
1.73 6.66 0.18 do_iinc
1.68 7.00 0.17 do_iload_1
1.54 7.15 0.16 do_iload_3
1.49 7.45 0.15 do_iconst_0
1.44 7.60 0.15 do_if_icmpgt
1.39 7.74 0.14 do_if_icmpge
1.19 7.86 0.12 do_dadd
1.19 7.98 0.12 do_ifle
1.19 8.10 0.12 do_istore_2
1.14 8.21 0.12 do_lastore
1.09 8.43 0.11 do_lload
6.14 2.56 0.62 2136180 0.00 0.00 BytecodeInterpreter::run(BytecodeInterpreter*)
2.78 5.61 0.28 490 0.57 1.85 CppInterpreter::main_loop(int, Thread*)
1.88 6.30 0.19 1099047 0.00 0.00 InterpreterFrame::build(ZeroStack*, methodOopDesc*, JavaThread*)
1.68 6.83 0.17 1218147 0.00 0.00 methodOopDesc::result_type() const
1.49 7.30 0.15 1224036 0.00 0.00 SignatureIterator::iterate_returntype()
1.09 8.32 0.11 18512 0.01 0.01 typeArrayKlass::allocate(int, Thread*)
----------------------------
Next, EEMBC. Only 5 bytecode above the 1% level. Notice the time spent in os::vm_page_size()
2.93%, more than any bytecode apart from do_getfield.
--- EEMBC -----
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls s/call s/call name
3.54 61.03 3.84 do_getfield
3.27 64.57 3.54 run_opt_entry
1.80 79.09 1.95 do_iload
1.58 80.80 1.72 do_iload_0
1.16 82.06 1.26 do_iload_1
1.06 83.21 1.15 do_if_icmplt
24.60 26.67 26.67 100435622 0.00 0.00 BytecodeInterpreter::run(BytecodeInterpreter*)
13.87 41.70 15.03 1847 0.01 0.02 CppInterpreter::main_loop(int, Thread*)
5.77 47.95 6.25 47872572 0.00 0.00 methodOopDesc::result_type() const
4.58 52.91 4.97 47096141 0.00 0.00 InterpreterFrame::build(ZeroStack*, methodOopDesc*, JavaThread*)
3.95 57.19 4.28 47882026 0.00 0.00 SignatureIterator::iterate_returntype()
2.93 67.75 3.18 94194307 0.00 0.00 os::vm_page_size()
2.64 70.61 2.86 46326855 0.00 0.00 CppInterpreter::normal_entry(methodOopDesc*, int, Thread*)
2.06 72.84 2.23 47897882 0.00 0.00 SignatureIterator::parse_type()
2.03 75.04 2.20 74343829 0.00 0.00 ConstantPoolCacheEntry::is_resolved(Bytecodes::Code) const
1.94 77.14 2.10 47884823 0.00 0.00 SignatureIterator::SignatureIterator(symbolHandle)
------------------------------
Finally, Think Office. Only 1 bytecode above the 1% level. And it spends more time doing
os::vm_page_size() than any bytecode. I must get the callgraph profile and find out who
is calling it 33E6 times.
Not the time in run_opt_entry. This is because of all the invokes and returns. Perhaps
I have just deoptimised the VM for think office and it is just spending its time bouncing
between 'run' and 'run_opt'
--- Office ---
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls s/call s/call name
2.12 31.06 1.07 run_opt_entry
1.39 34.18 0.70 do_getfield
25.50 12.88 12.88 34862862 0.00 0.00 BytecodeInterpreter::run(BytecodeInterpreter*)
11.26 18.57 5.69 35015 0.00 0.00 CppInterpreter::main_loop(int, Thread*)
6.36 21.78 3.21 18326576 0.00 0.00 methodOopDesc::result_type() const
4.59 24.10 2.32 16971754 0.00 0.00 InterpreterFrame::build(ZeroStack*, methodOopDesc*, JavaThread*)
4.30 26.27 2.17 18391356 0.00 0.00 SignatureIterator::iterate_returntype()
2.67 27.62 1.35 18587041 0.00 0.00 SignatureIterator::parse_type()
2.36 28.81 1.19 24191043 0.00 0.00 ConstantPoolCacheEntry::is_resolved(Bytecodes::Code) const
2.34 29.99 1.18 15669590 0.00 0.00 CppInterpreter::normal_entry(methodOopDesc*, int, Thread*)
1.94 32.04 0.98 33980388 0.00 0.00 os::vm_page_size()
1.46 32.77 0.74 1302034 0.00 0.00 CppInterpreter::native_entry(methodOopDesc*, int, Thread*)
1.41 33.48 0.71 18435409 0.00 0.00 SignatureIterator::SignatureIterator(symbolHandle)
--------------------
For now....
Ed
More information about the zero-dev
mailing list