[10] RFR: 8183299: Improve inlining of CompiledMethod methods into frame::sender
Claes Redestad
claes.redestad at oracle.com
Mon Jul 3 19:49:53 UTC 2017
Hi Vladimir,
I ran and profiled an old Throw JMH microbenchmark[1] with -prof perfasm:
swapping the conditions makes no statistically significant difference,
while it's clear that the inlining has a significant effect in the
profiles (and on throughput when not profiling):

baseline:
....[Hottest Methods (after inlining)]..............................................................
  14.10%  17.16%  libjvm.so    java_lang_Throwable::fill_in_stack_trace
  11.55%  14.62%  libjvm.so    CodeHeap::find_blob_unsafe
   7.79%   9.34%  libjvm.so    objArrayOopDesc::obj_at_put
   7.48%   8.73%  libjvm.so    CodeCache::find_blob
   7.35%   6.59%  libjvm.so    BacktraceBuilder::push
*  6.14%   5.84%  libjvm.so    frame::sender
   5.28%   2.83%  libjvm.so    ObjArrayKlass::allocate
   3.54%   3.83%  libjvm.so    TypeArrayKlass::allocate_common
   3.31%   3.21%  libjvm.so    G1SATBCardTableLoggingModRefBS::write_ref_field_work
   3.17%   1.91%  libjvm.so    CodeBlob::is_zombie
   2.74%   2.44%  [unknown]    [unknown]
*  2.31%   0.92%  libjvm.so    CompiledMethod::get_deopt_original_pc
   2.06%   1.35%  c2, level 4  java.lang.Exception::<init>, version 818
   1.90%   0.92%  libc-2.23.so __memset_avx2
   1.57%   2.14%  libjvm.so    frame::is_interpreted_frame
   1.29%   1.35%  libjvm.so    BacktraceBuilder::expand
   1.25%   1.22%  libjvm.so    oopDesc::is_a
   1.20%   1.26%  libjvm.so    Handle::Handle
   1.03%   1.46%  libjvm.so    G1SATBCardTableModRefBS::write_ref_field_pre_work
   0.77%   0.43%  libjvm.so    oopDesc::release_obj_field_put
  13.36%  11.77%  <...other 145 warm methods...>

hotspot.01:
....[Hottest Methods (after inlining)]..............................................................
  15.44%  16.42%  libjvm.so    java_lang_Throwable::fill_in_stack_trace
  11.83%  14.56%  libjvm.so    CodeHeap::find_blob_unsafe
   8.20%   6.59%  libjvm.so    BacktraceBuilder::push
   7.27%   8.77%  libjvm.so    CodeCache::find_blob
   7.15%   9.20%  libjvm.so    objArrayOopDesc::obj_at_put
   5.73%   2.73%  libjvm.so    ObjArrayKlass::allocate
   3.93%   3.27%  [unknown]    [unknown]
*  3.89%   3.43%  libjvm.so    frame::sender
   3.88%   3.22%  libjvm.so    G1SATBCardTableLoggingModRefBS::write_ref_field_work
   3.70%   4.09%  libjvm.so    TypeArrayKlass::allocate_common
   3.16%   4.16%  libjvm.so    frame::sender_for_interpreter_frame
   2.14%   1.00%  libc-2.23.so __memset_avx2
   2.12%   1.25%  libjvm.so    CodeBlob::is_zombie
   1.66%   1.37%  c2, level 4  java.lang.Exception::<init>, version 817
   1.44%   1.19%  libjvm.so    Handle::Handle
   1.43%   2.09%  libjvm.so    frame::is_interpreted_frame
   1.08%   1.26%  libjvm.so    BacktraceBuilder::expand
   1.05%   1.27%  libjvm.so    G1SATBCardTableModRefBS::write_ref_field_pre_work
   1.04%   1.23%  libjvm.so    oopDesc::is_a
   0.81%   0.49%  libjvm.so    oopDesc::release_obj_field_put
  12.28%  11.64%  <...other 145 warm methods...>

hotspot.01 + swap:
....[Hottest Methods (after inlining)]..............................................................
  14.70%  15.91%  libjvm.so    java_lang_Throwable::fill_in_stack_trace
  11.72%  14.55%  libjvm.so    CodeHeap::find_blob_unsafe
   7.79%   6.65%  libjvm.so    BacktraceBuilder::push
   7.51%   9.06%  libjvm.so    objArrayOopDesc::obj_at_put
   7.30%   8.41%  libjvm.so    CodeCache::find_blob
   5.63%   3.24%  libjvm.so    ObjArrayKlass::allocate
*  4.22%   4.15%  libjvm.so    frame::sender
   3.80%   3.44%  libjvm.so    G1SATBCardTableLoggingModRefBS::write_ref_field_work
   3.70%   3.98%  libjvm.so    TypeArrayKlass::allocate_common
   3.48%   4.17%  libjvm.so    frame::sender_for_interpreter_frame
   2.59%   2.31%  [unknown]    [unknown]
   2.35%   1.31%  libc-2.23.so __memset_avx2
   2.15%   1.12%  c2, level 4  java.lang.Exception::<init>, version 808
   1.80%   1.03%  libjvm.so    CodeBlob::is_zombie
   1.42%   1.57%  libjvm.so    Handle::Handle
   1.30%   2.07%  libjvm.so    frame::is_interpreted_frame
   1.20%   1.29%  libjvm.so    BacktraceBuilder::expand
   1.17%   1.37%  libjvm.so    oopDesc::is_a
   1.03%   1.14%  libjvm.so    Method::bci_from
   1.00%   1.37%  libjvm.so    G1SATBCardTableModRefBS::write_ref_field_pre_work
  13.20%  11.06%  <...other 162 warm methods...>

It's interesting that the relative time in frame::sender decreases (6.14%
in the baseline vs. 3.89% with hotspot.01) after get_deopt_original_pc is
inlined into it, which suggests (unsurprisingly?) that the inlining itself
allows gcc to optimize things further.
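
HotSpot keeps inline function bodies in separate .inline.hpp headers, which is why the patch adds a compiledMethod.inline.hpp (see the quoted hotspot.01 mail below) and why get_deopt_original_pc can be inlined into frame::sender at all. A minimal single-file sketch of that split follows; every name in it (CompiledMethodSketch, matches_deopt_handler, original_pc_if_deopt, sender_pc_sketch) is hypothetical, and the section comments only mark which imagined header each part would live in.

```cpp
#include <cassert>

// Hypothetical single-file model of the .inline.hpp split; each section below
// stands in for a separate header. None of these names are HotSpot code.

// ---- compiledMethodSketch.hpp: declarations only, cheap to include ----
class CompiledMethodSketch {
 public:
  CompiledMethodSketch(const unsigned char* deopt_begin,
                       const unsigned char* saved_pc)
      : deopt_begin_(deopt_begin), saved_pc_(saved_pc) {
    assert(deopt_begin != nullptr);
  }

  // Declared but not defined here. With the bodies in a .cpp file instead,
  // callers in other translation units would get real calls unless LTO
  // inlined them.
  bool matches_deopt_handler(const unsigned char* pc) const;
  const unsigned char* original_pc_if_deopt(const unsigned char* pc) const;

 private:
  const unsigned char* deopt_begin_;
  const unsigned char* saved_pc_;
};

// ---- compiledMethodSketch.inline.hpp: bodies, included only where needed ----
inline bool CompiledMethodSketch::matches_deopt_handler(
    const unsigned char* pc) const {
  return pc == deopt_begin_;
}

inline const unsigned char* CompiledMethodSketch::original_pc_if_deopt(
    const unsigned char* pc) const {
  // Saved pc for a deoptimized frame, or nullptr when pc is an ordinary one.
  return matches_deopt_handler(pc) ? saved_pc_ : nullptr;
}

// ---- frameSketch.inline.hpp: the hot caller ----
// Both bodies above are visible at this point, so the compiler may inline
// them and optimize across what used to be call boundaries.
inline const unsigned char* sender_pc_sketch(const CompiledMethodSketch& cm,
                                             const unsigned char* pc) {
  const unsigned char* original = cm.original_pc_if_deopt(pc);
  return original != nullptr ? original : pc;
}
```

The cost of this arrangement is the one described in the thread: every file that wants the inlined bodies has to include the .inline.hpp header, which is awkward when frame.inline.hpp is already pulled in transitively in many places.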
Thanks!
/Claes
[1] It's an old microbenchmark, but it checks out:
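
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@State(Scope.Thread)
public class Throw {
    public Object useObject = new Object();
    public Object dummyObj;
    // Not defined in the original snippet; assumed to be a plain field set to true.
    public boolean alwaysTrue = true;

    @Benchmark
    public void throwSyncException() {
        try {
            throwingMethod();
        } catch (Exception ex) {
            dummyObj = useObject;
        }
    }

    public void throwingMethod() throws Exception {
        if (alwaysTrue) {
            throw new Exception();
        }
    }
}
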
On 2017-07-03 17:56, Vladimir Kozlov wrote:
> Claes,
>
> Did you try swapping the && conditions?
>
> 36 || (pc == (deopt_handler_begin() + NativeCall::instruction_size) && is_compiled_by_jvmci())
>
> is_compiled_by_jvmci() could be a cache miss, and the other check is
> false in all cases (until Graal is used as the JIT).
>
> Thanks,
> Vladimir
>
> On 7/3/17 7:41 AM, Claes Redestad wrote:
>> <adding hotspot-runtime-dev since this patch now touches frame.inline.hpp as well>
>>
>> Hi,
>>
>> I've reworked the patch to introduce a compiledMethod.inline.hpp
>> instead, to break a circular dependency on code/nativeInstr.hpp on
>> some platforms.
>>
>> http://cr.openjdk.java.net/~redestad/8183299/hotspot.01/
>>
>> An attempt to tease apart the include dependencies was made (Stefan
>> Karlsson has an almost working patch to clean things up significantly),
>> but there is currently a lot of code out there that includes
>> frame.inline.hpp transitively, which makes it cumbersome to add this
>> new include anywhere else and still keep the linker happy.
>>
>> Testing: JPRT
>>
>> Thanks!
>>
>> /Claes
>>
>> On 06/30/2017 06:50 PM, Claes Redestad wrote:
>>> Hi all,
>>>
>>> here's a startup optimization that turned out to also help VM stack
>>> walking in general, including Java exception throwing performance:
>>>
>>> Webrev: http://cr.openjdk.java.net/~redestad/8183299/hotspot.00/
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8183299
>>>
>>> The static footprint cost of the more aggressive inlining is ~20 KB,
>>> which seems reasonable for the achieved speed-up (~5% in some
>>> microbenchmarks).
>>>
>>> Testing: JPRT (still in-flight), jtreg locally, microbenchmarks
>>>
>>> Thanks!
>>>
>>> /Claes
>>
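
Vladimir's question above is about short-circuit order: in `a && b`, `b` is evaluated only when `a` is true, so putting the cheap pc comparison first avoids loading the JVMCI flag in the common case. A minimal sketch contrasting the two orders, assuming hypothetical stand-ins (MethodSketch, kCallSize, is_deopt_entry_flag_first, is_deopt_entry_compare_first) rather than HotSpot's CompiledMethod. The leading `pc == deopt_handler_begin` test is also an assumption, since the quoted condition shows only the second operand of an `||`, and a counter stands in for the potential cache miss.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a compiled method: only the state read by the
// deopt-entry check is modeled. Not HotSpot code.
struct MethodSketch {
  MethodSketch(const unsigned char* handler, bool jvmci)
      : deopt_handler_begin(handler), compiled_by_jvmci(jvmci), flag_reads(0) {
    assert(handler != nullptr);
  }

  // Stands in for is_compiled_by_jvmci(); the counter records each access to
  // the flag, which in the real VM may cost a cache miss.
  bool is_compiled_by_jvmci() const {
    ++flag_reads;
    return compiled_by_jvmci;
  }

  const unsigned char* deopt_handler_begin;
  bool compiled_by_jvmci;
  mutable int flag_reads;
};

// Assumed call-instruction length; the thread uses NativeCall::instruction_size,
// whose value is platform specific.
const std::ptrdiff_t kCallSize = 5;

// Flag first: any pc other than the handler start pays for the flag read.
bool is_deopt_entry_flag_first(const MethodSketch& m, const unsigned char* pc) {
  return pc == m.deopt_handler_begin ||
         (m.is_compiled_by_jvmci() && pc == m.deopt_handler_begin + kCallSize);
}

// Compare first (the suggested order): the flag is read only when pc is
// exactly one call instruction past the handler start.
bool is_deopt_entry_compare_first(const MethodSketch& m, const unsigned char* pc) {
  return pc == m.deopt_handler_begin ||
         (pc == m.deopt_handler_begin + kCallSize && m.is_compiled_by_jvmci());
}
```

With a non-JVMCI method and a pc that matches neither address, the flag-first version touches the flag on every call while the compare-first version never does; both return the same results, since && only skips its right operand. Whether the skipped load matters in practice is an empirical question: in the Throw benchmark above, the swap made no statistically significant difference.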
More information about the hotspot-compiler-dev mailing list