[10] RFR: 8183299: Improve inlining of CompiledMethod methods into frame::sender

Vladimir Kozlov vladimir.kozlov at oracle.com
Mon Jul 3 20:06:36 UTC 2017


Thank you for testing. Yes, inlining is the main optimization in all
compilers ;)

Changes are good.

Thanks,
Vladimir

On 7/3/17 12:49 PM, Claes Redestad wrote:
> Hi Vladimir,
> 
> I ran and profiled an old Throw JMH microbenchmark[1] with -prof perfasm.
> Swapping the conditions makes no statistically significant difference,
> while the inlining clearly has a significant effect in the profiles (and
> on throughput when not profiling):
> 
> baseline:
> ....[Hottest Methods (after inlining)]..............................................................
>   14.10%   17.16%           libjvm.so  java_lang_Throwable::fill_in_stack_trace
>   11.55%   14.62%           libjvm.so  CodeHeap::find_blob_unsafe
>    7.79%    9.34%           libjvm.so  objArrayOopDesc::obj_at_put
>    7.48%    8.73%           libjvm.so  CodeCache::find_blob
>    7.35%    6.59%           libjvm.so  BacktraceBuilder::push
> *  6.14%    5.84%           libjvm.so  frame::sender
>    5.28%    2.83%           libjvm.so  ObjArrayKlass::allocate
>    3.54%    3.83%           libjvm.so  TypeArrayKlass::allocate_common
>    3.31%    3.21%           libjvm.so  G1SATBCardTableLoggingModRefBS::write_ref_field_work
>    3.17%    1.91%           libjvm.so  CodeBlob::is_zombie
>    2.74%    2.44%           [unknown]  [unknown]
> *  2.31%    0.92%           libjvm.so  CompiledMethod::get_deopt_original_pc
>    2.06%    1.35%         c2, level 4  java.lang.Exception::<init>, version 818
>    1.90%    0.92%        libc-2.23.so  __memset_avx2
>    1.57%    2.14%           libjvm.so  frame::is_interpreted_frame
>    1.29%    1.35%           libjvm.so  BacktraceBuilder::expand
>    1.25%    1.22%           libjvm.so  oopDesc::is_a
>    1.20%    1.26%           libjvm.so  Handle::Handle
>    1.03%    1.46%           libjvm.so  G1SATBCardTableModRefBS::write_ref_field_pre_work
>    0.77%    0.43%           libjvm.so  oopDesc::release_obj_field_put
>   13.36%   11.77%  <...other 145 warm methods...>
> 
> hotspot.01:
> ....[Hottest Methods (after inlining)]..............................................................
>   15.44%   16.42%           libjvm.so  java_lang_Throwable::fill_in_stack_trace
>   11.83%   14.56%           libjvm.so  CodeHeap::find_blob_unsafe
>    8.20%    6.59%           libjvm.so  BacktraceBuilder::push
>    7.27%    8.77%           libjvm.so  CodeCache::find_blob
>    7.15%    9.20%           libjvm.so  objArrayOopDesc::obj_at_put
>    5.73%    2.73%           libjvm.so  ObjArrayKlass::allocate
>    3.93%    3.27%           [unknown]  [unknown]
> *  3.89%    3.43%           libjvm.so  frame::sender
>    3.88%    3.22%           libjvm.so  G1SATBCardTableLoggingModRefBS::write_ref_field_work
>    3.70%    4.09%           libjvm.so  TypeArrayKlass::allocate_common
>    3.16%    4.16%           libjvm.so  frame::sender_for_interpreter_frame
>    2.14%    1.00%        libc-2.23.so  __memset_avx2
>    2.12%    1.25%           libjvm.so  CodeBlob::is_zombie
>    1.66%    1.37%         c2, level 4  java.lang.Exception::<init>, version 817
>    1.44%    1.19%           libjvm.so  Handle::Handle
>    1.43%    2.09%           libjvm.so  frame::is_interpreted_frame
>    1.08%    1.26%           libjvm.so  BacktraceBuilder::expand
>    1.05%    1.27%           libjvm.so  G1SATBCardTableModRefBS::write_ref_field_pre_work
>    1.04%    1.23%           libjvm.so  oopDesc::is_a
>    0.81%    0.49%           libjvm.so  oopDesc::release_obj_field_put
>   12.28%   11.64%  <...other 145 warm methods...>
> 
> hotspot.01 + swap:
> ....[Hottest Methods (after inlining)]..............................................................
>   14.70%   15.91%           libjvm.so  java_lang_Throwable::fill_in_stack_trace
>   11.72%   14.55%           libjvm.so  CodeHeap::find_blob_unsafe
>    7.79%    6.65%           libjvm.so  BacktraceBuilder::push
>    7.51%    9.06%           libjvm.so  objArrayOopDesc::obj_at_put
>    7.30%    8.41%           libjvm.so  CodeCache::find_blob
>    5.63%    3.24%           libjvm.so  ObjArrayKlass::allocate
> *  4.22%    4.15%           libjvm.so  frame::sender
>    3.80%    3.44%           libjvm.so  G1SATBCardTableLoggingModRefBS::write_ref_field_work
>    3.70%    3.98%           libjvm.so  TypeArrayKlass::allocate_common
>    3.48%    4.17%           libjvm.so  frame::sender_for_interpreter_frame
>    2.59%    2.31%           [unknown]  [unknown]
>    2.35%    1.31%        libc-2.23.so  __memset_avx2
>    2.15%    1.12%         c2, level 4  java.lang.Exception::<init>, version 808
>    1.80%    1.03%           libjvm.so  CodeBlob::is_zombie
>    1.42%    1.57%           libjvm.so  Handle::Handle
>    1.30%    2.07%           libjvm.so  frame::is_interpreted_frame
>    1.20%    1.29%           libjvm.so  BacktraceBuilder::expand
>    1.17%    1.37%           libjvm.so  oopDesc::is_a
>    1.03%    1.14%           libjvm.so  Method::bci_from
>    1.00%    1.37%           libjvm.so  G1SATBCardTableModRefBS::write_ref_field_pre_work
>   13.20%   11.06%  <...other 162 warm methods...>
> 
> It's interesting to see that the relative time in frame::sender decreases
> (from 6.14% in the baseline to 3.89% with hotspot.01) after
> get_deopt_original_pc, which accounted for another 2.31% on its own, is
> inlined into it. This suggests (unsurprisingly?) that the inlining itself
> allows gcc to optimize things further.
> 
> Thanks!
> 
> /Claes
> 
> [1] It's an old microbenchmark, but it checks out:
> 
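> import org.openjdk.jmh.annotations.Benchmark;
> import org.openjdk.jmh.annotations.Scope;
> import org.openjdk.jmh.annotations.State;
>
> @State(Scope.Thread)
> public class Throw {
>
>     public Object useObject = new Object();
>     public Object dummyObj;
>
>     // Assumed declaration (missing from the posted snippet).
>     public boolean alwaysTrue = true;
>
>     @Benchmark
>     public void throwSyncException() {
>         try {
>             throwingMethod();
>         } catch (Exception ex) {
>             dummyObj = useObject;
>         }
>     }
>
>     public void throwingMethod() throws Exception {
>         if (alwaysTrue) {
>             throw new Exception();
>         }
>     }
> }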
> 
> On 2017-07-03 17:56, Vladimir Kozlov wrote:
>> Claes,
>>
>> Did you try to swap the && conditions?
>>
>>     || (pc == (deopt_handler_begin() + NativeCall::instruction_size) && is_compiled_by_jvmci())
>>
>> is_compiled_by_jvmci() could be a cache miss, and the other check is false
>> in all cases (until Graal is used as the JIT).
>>
>> Thanks,
>> Vladimir
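Vladimir's suggestion relies on the short-circuit evaluation of &&. If the cheap pc comparison comes first, the possibly cache-missing is_compiled_by_jvmci() read is skipped whenever that comparison is false. Vladimir notes that this is every case until Graal is used as the JIT. The following is a minimal C++ sketch that counts flag reads under both orderings. It assumes a simplified, hypothetical stand-in type: MethodSketch, its fields, and the 5-byte call size are illustrative and are not HotSpot's CompiledMethod or NativeCall.

```cpp
#include <cstdint>

// Hypothetical stand-in for NativeCall::instruction_size (value assumed).
constexpr std::uintptr_t kCallInstructionSize = 5;

// Hypothetical, simplified stand-in for CompiledMethod.
struct MethodSketch {
  std::uintptr_t deopt_handler_begin;
  bool compiled_by_jvmci;
  // How often the (possibly cache-missing) flag has been read.
  mutable int jvmci_checks = 0;

  MethodSketch(std::uintptr_t begin, bool jvmci)
      : deopt_handler_begin(begin), compiled_by_jvmci(jvmci) {}

  bool is_compiled_by_jvmci() const {
    ++jvmci_checks;
    return compiled_by_jvmci;
  }

  // Flag first: the flag is read for every pc that misses the first test.
  bool is_deopt_entry_flag_first(std::uintptr_t pc) const {
    return pc == deopt_handler_begin ||
           (is_compiled_by_jvmci() &&
            pc == deopt_handler_begin + kCallInstructionSize);
  }

  // Swapped, as suggested: the cheap pc comparison short-circuits first,
  // so the flag is only read when pc matches the JVMCI-only entry.
  bool is_deopt_entry_swapped(std::uintptr_t pc) const {
    return pc == deopt_handler_begin ||
           (pc == deopt_handler_begin + kCallInstructionSize &&
            is_compiled_by_jvmci());
  }
};
```

Both orderings return the same answer for every pc. The swap only changes how often the flag is read, so any gain depends on how expensive that read actually is in practice.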
>>
>> On 7/3/17 7:41 AM, Claes Redestad wrote:
>>> <adding hotspot-runtime-dev since this patch now touches 
>>> frame.inline.hpp as well>
>>>
>>> Hi,
>>>
>>> I've reworked the patch to introduce a compiledMethod.inline.hpp
>>> instead, to break a circular dependency on code/nativeInstr.hpp on some
>>> platforms.
>>>
>>> http://cr.openjdk.java.net/~redestad/8183299/hotspot.01/
>>>
>>> An attempt was made to tease apart the include dependencies (Stefan
>>> Karlsson has an almost-working patch that cleans things up
>>> significantly), but a lot of code currently includes frame.inline.hpp
>>> transitively, which makes it cumbersome to add this new include anywhere
>>> else and still keep the linker happy.
>>>
>>> Testing: JPRT
>>>
>>> Thanks!
>>>
>>> /Claes
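The split Claes describes follows HotSpot's convention of keeping declarations in foo.hpp and inline definitions in foo.inline.hpp. Only .cpp files and other .inline.hpp files include the .inline.hpp header. The single-file C++ sketch below marks where each header boundary would fall. All names are hypothetical stand-ins for CompiledMethod, NativeCall, and frame::sender, and the instruction size is assumed. The point it illustrates, also an assumption about the layout, is that frame.inline.hpp can see the inline definition, so the compiler can inline it. It can do so without compiledMethod.hpp having to include nativeInstr.hpp, which is the circular dependency Claes mentions.

```cpp
#include <cstdint>

// Hypothetical single-file sketch: each "file" section below would be a
// separate header in the real tree. Names and values are illustrative.

// ---- nativeInstr.hpp (sketch): platform details that compiledMethod.hpp
// should not include, to keep the include graph acyclic. ----
struct NativeCallSketch {
  static constexpr std::uintptr_t instruction_size = 5;  // assumed value
};

// ---- compiledMethod.hpp (sketch): declarations only. ----
class CompiledMethodSketch {
 public:
  explicit CompiledMethodSketch(std::uintptr_t deopt_begin)
      : deopt_begin_(deopt_begin) {}
  // Declared inline here, defined in compiledMethod.inline.hpp.
  inline bool is_deopt_pc(std::uintptr_t pc) const;

 private:
  std::uintptr_t deopt_begin_;
};

// ---- compiledMethod.inline.hpp (sketch): would include compiledMethod.hpp
// and nativeInstr.hpp; included only by .cpp and other .inline.hpp files. ----
inline bool CompiledMethodSketch::is_deopt_pc(std::uintptr_t pc) const {
  return pc == deopt_begin_ ||
         pc == deopt_begin_ + NativeCallSketch::instruction_size;
}

// ---- frame.inline.hpp (sketch): includes compiledMethod.inline.hpp, so the
// definition above is visible and can be inlined into this caller. ----
inline std::uintptr_t sender_pc_sketch(const CompiledMethodSketch& cm,
                                       std::uintptr_t pc,
                                       std::uintptr_t original_pc) {
  // If pc is a deopt entry, report the pc saved before deoptimization.
  return cm.is_deopt_pc(pc) ? original_pc : pc;
}
```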
>>>
>>> On 06/30/2017 06:50 PM, Claes Redestad wrote:
>>>> Hi all,
>>>>
>>>> here's a startup optimization that turned out to also help VM stack
>>>> walking in general, including Java exception-throwing performance:
>>>>
>>>> Webrev: http://cr.openjdk.java.net/~redestad/8183299/hotspot.00/
>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8183299
>>>>
>>>> The static footprint cost of the more aggressive inlining is ~20 KB,
>>>> which seems reasonable for the achieved speed-up (~5% in some
>>>> microbenchmarks).
>>>>
>>>> Testing: JPRT (still in-flight), jtreg locally, microbenchmarks
>>>>
>>>> Thanks!
>>>>
>>>> /Claes
>>>
> 


More information about the hotspot-compiler-dev mailing list