[10] RFR: 8183299: Improve inlining of CompiledMethod methods into frame::sender
Vladimir Kozlov
vladimir.kozlov at oracle.com
Mon Jul 3 20:06:36 UTC 2017
Thank you for testing. Yes, inlining is the main optimization in all
compilers ;)
Changes are good.
Thanks,
Vladimir
On 7/3/17 12:49 PM, Claes Redestad wrote:
> Hi Vladimir,
>
> I ran and profiled an old Throw JMH microbenchmark[1] with -prof perfasm:
> swapping the conditions makes no statistically significant difference,
> while the inlining clearly has a significant effect in the profiles (and
> on throughput when not profiling):
>
> baseline
> ....[Hottest Methods (after inlining)]..............................................................
>   14.10%  17.16%  libjvm.so     java_lang_Throwable::fill_in_stack_trace
>   11.55%  14.62%  libjvm.so     CodeHeap::find_blob_unsafe
>    7.79%   9.34%  libjvm.so     objArrayOopDesc::obj_at_put
>    7.48%   8.73%  libjvm.so     CodeCache::find_blob
>    7.35%   6.59%  libjvm.so     BacktraceBuilder::push
> *  6.14%   5.84%  libjvm.so     frame::sender *
>    5.28%   2.83%  libjvm.so     ObjArrayKlass::allocate
>    3.54%   3.83%  libjvm.so     TypeArrayKlass::allocate_common
>    3.31%   3.21%  libjvm.so     G1SATBCardTableLoggingModRefBS::write_ref_field_work
>    3.17%   1.91%  libjvm.so     CodeBlob::is_zombie
>    2.74%   2.44%  [unknown]     [unknown]
> *  2.31%   0.92%  libjvm.so     CompiledMethod::get_deopt_original_pc *
>    2.06%   1.35%  c2, level 4   java.lang.Exception::<init>, version 818
>    1.90%   0.92%  libc-2.23.so  __memset_avx2
>    1.57%   2.14%  libjvm.so     frame::is_interpreted_frame
>    1.29%   1.35%  libjvm.so     BacktraceBuilder::expand
>    1.25%   1.22%  libjvm.so     oopDesc::is_a
>    1.20%   1.26%  libjvm.so     Handle::Handle
>    1.03%   1.46%  libjvm.so     G1SATBCardTableModRefBS::write_ref_field_pre_work
>    0.77%   0.43%  libjvm.so     oopDesc::release_obj_field_put
>   13.36%  11.77%  <...other 145 warm methods...>
>
> hotspot.01:
> ....[Hottest Methods (after inlining)]..............................................................
>   15.44%  16.42%  libjvm.so     java_lang_Throwable::fill_in_stack_trace
>   11.83%  14.56%  libjvm.so     CodeHeap::find_blob_unsafe
>    8.20%   6.59%  libjvm.so     BacktraceBuilder::push
>    7.27%   8.77%  libjvm.so     CodeCache::find_blob
>    7.15%   9.20%  libjvm.so     objArrayOopDesc::obj_at_put
>    5.73%   2.73%  libjvm.so     ObjArrayKlass::allocate
>    3.93%   3.27%  [unknown]     [unknown]
> *  3.89%   3.43%  libjvm.so     frame::sender *
>    3.88%   3.22%  libjvm.so     G1SATBCardTableLoggingModRefBS::write_ref_field_work
>    3.70%   4.09%  libjvm.so     TypeArrayKlass::allocate_common
>    3.16%   4.16%  libjvm.so     frame::sender_for_interpreter_frame
>    2.14%   1.00%  libc-2.23.so  __memset_avx2
>    2.12%   1.25%  libjvm.so     CodeBlob::is_zombie
>    1.66%   1.37%  c2, level 4   java.lang.Exception::<init>, version 817
>    1.44%   1.19%  libjvm.so     Handle::Handle
>    1.43%   2.09%  libjvm.so     frame::is_interpreted_frame
>    1.08%   1.26%  libjvm.so     BacktraceBuilder::expand
>    1.05%   1.27%  libjvm.so     G1SATBCardTableModRefBS::write_ref_field_pre_work
>    1.04%   1.23%  libjvm.so     oopDesc::is_a
>    0.81%   0.49%  libjvm.so     oopDesc::release_obj_field_put
>   12.28%  11.64%  <...other 145 warm methods...>
>
> hotspot.01 + swap:
> ....[Hottest Methods (after inlining)]..............................................................
>   14.70%  15.91%  libjvm.so     java_lang_Throwable::fill_in_stack_trace
>   11.72%  14.55%  libjvm.so     CodeHeap::find_blob_unsafe
>    7.79%   6.65%  libjvm.so     BacktraceBuilder::push
>    7.51%   9.06%  libjvm.so     objArrayOopDesc::obj_at_put
>    7.30%   8.41%  libjvm.so     CodeCache::find_blob
>    5.63%   3.24%  libjvm.so     ObjArrayKlass::allocate
> *  4.22%   4.15%  libjvm.so     frame::sender *
>    3.80%   3.44%  libjvm.so     G1SATBCardTableLoggingModRefBS::write_ref_field_work
>    3.70%   3.98%  libjvm.so     TypeArrayKlass::allocate_common
>    3.48%   4.17%  libjvm.so     frame::sender_for_interpreter_frame
>    2.59%   2.31%  [unknown]     [unknown]
>    2.35%   1.31%  libc-2.23.so  __memset_avx2
>    2.15%   1.12%  c2, level 4   java.lang.Exception::<init>, version 808
>    1.80%   1.03%  libjvm.so     CodeBlob::is_zombie
>    1.42%   1.57%  libjvm.so     Handle::Handle
>    1.30%   2.07%  libjvm.so     frame::is_interpreted_frame
>    1.20%   1.29%  libjvm.so     BacktraceBuilder::expand
>    1.17%   1.37%  libjvm.so     oopDesc::is_a
>    1.03%   1.14%  libjvm.so     Method::bci_from
>    1.00%   1.37%  libjvm.so     G1SATBCardTableModRefBS::write_ref_field_pre_work
>   13.20%  11.06%  <...other 162 warm methods...>
>
> It's interesting that the relative time in frame::sender decreases (from
> 6.14% to 3.89% in the first column) even though get_deopt_original_pc is
> now inlined into it, which suggests (unsurprisingly?) that the inlining
> itself allows gcc to optimize things further.
>
> Thanks!
>
> /Claes
>
> [1] It's an old microbenchmark, but it checks out:
>
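> import org.openjdk.jmh.annotations.Benchmark;
> import org.openjdk.jmh.annotations.Scope;
> import org.openjdk.jmh.annotations.State;
>
> @State(Scope.Thread)
> public class Throw {
>
>     // Assumed declaration: the snippet as posted uses alwaysTrue without
>     // declaring it. A non-final field keeps the condition from being a
>     // compile-time constant.
>     public boolean alwaysTrue = true;
>
>     public Object useObject = new Object();
>     public Object dummyObj;
>
>     @Benchmark
>     public void throwSyncException() {
>         try {
>             throwingMethod();
>         } catch (Exception ex) {
>             // Consume the catch path so it has an observable effect
>             dummyObj = useObject;
>         }
>     }
>
>     public void throwingMethod() throws Exception {
>         if (alwaysTrue) {
>             // Each throw fills in a stack trace, walking frames via frame::sender
>             throw new Exception();
>         }
>     }
> }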
>
> On 2017-07-03 17:56, Vladimir Kozlov wrote:
>> Claes,
>>
>> Did you try swapping the && conditions?
>>
>>     || (pc == (deopt_handler_begin() + NativeCall::instruction_size) && is_compiled_by_jvmci())
>>
>> is_compiled_by_jvmci() could cause a cache miss, while the other check is
>> false in all cases (until Graal is used as the JIT).
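The suggestion above relies on short-circuit evaluation of &&, which C++ and Java share: the right operand is evaluated only when the left one is true. The following is a minimal Java model of the two orderings, not HotSpot code; expensiveCheck() and cheapCheck() are hypothetical stand-ins for is_compiled_by_jvmci() and the pc comparison, and the sketch counts how often the expensive operand runs rather than measuring cache misses.

```java
// Hypothetical model of the && ordering question: which operand gets evaluated?
public class ShortCircuitOrder {
    static int expensiveCalls = 0;

    // Stand-in for a check that may touch a cold cache line
    // (hypothetical; models is_compiled_by_jvmci(), false without Graal).
    static boolean expensiveCheck() {
        expensiveCalls++;
        return false;
    }

    // Stand-in for the cheap pc comparison; false for every pc used below.
    static boolean cheapCheck(long pc, long handlerEnd) {
        return pc == handlerEnd;
    }

    // Evaluates the condition once per pc and returns how many times
    // the expensive operand was actually called.
    static int countCalls(boolean cheapFirst, int iterations) {
        expensiveCalls = 0;
        long handlerEnd = -1; // never equal to any pc in the loop
        for (long pc = 0; pc < iterations; pc++) {
            boolean hit = cheapFirst
                ? (cheapCheck(pc, handlerEnd) && expensiveCheck())
                : (expensiveCheck() && cheapCheck(pc, handlerEnd));
            if (hit) {
                throw new AssertionError("unexpected match at pc " + pc);
            }
        }
        return expensiveCalls;
    }

    public static void main(String[] args) {
        System.out.println("expensive first: " + countCalls(false, 1000)); // prints "expensive first: 1000"
        System.out.println("cheap first: " + countCalls(true, 1000));      // prints "cheap first: 0"
    }
}
```

The model only shows which operand runs; whether skipping the call pays off is an empirical question, and the measurements in this thread found no statistically significant difference from swapping.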
>>
>> Thanks,
>> Vladimir
>>
>> On 7/3/17 7:41 AM, Claes Redestad wrote:
>>> <adding hotspot-runtime-dev since this patch now touches
>>> frame.inline.hpp as well>
>>>
>>> Hi,
>>>
>>> I've reworked the patch to introduce a compiledMethod.inline.hpp
>>> instead, which breaks a circular dependency on code/nativeInstr.hpp on
>>> some platforms.
>>>
>>> http://cr.openjdk.java.net/~redestad/8183299/hotspot.01/
>>>
>>> I attempted to tease apart the include dependencies (Stefan Karlsson
>>> has an almost working patch that cleans things up significantly), but
>>> a lot of code currently includes frame.inline.hpp transitively, which
>>> makes it cumbersome to add this new include anywhere else and still
>>> keep the linker happy.
>>>
>>> Testing: JPRT
>>>
>>> Thanks!
>>>
>>> /Claes
>>>
>>> On 06/30/2017 06:50 PM, Claes Redestad wrote:
>>>> Hi all,
>>>>
>>>> here's a startup optimization that turned out to also help VM stack
>>>> walking in general, including Java exception-throwing performance:
>>>>
>>>> Webrev: http://cr.openjdk.java.net/~redestad/8183299/hotspot.00/
>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8183299
>>>>
>>>> The static footprint cost of the more aggressive inlining is ~20 KB,
>>>> which seems reasonable for the achieved speed-up (~5% in some
>>>> microbenchmarks).
>>>>
>>>> Testing: JPRT (still in-flight), jtreg locally, microbenchmarks
>>>>
>>>> Thanks!
>>>>
>>>> /Claes
>>>
>
More information about the hotspot-compiler-dev mailing list