[10] RFR: 8183299: Improve inlining of CompiledMethod methods into frame::sender

Claes Redestad claes.redestad at oracle.com
Mon Jul 3 19:49:53 UTC 2017


Hi Vladimir,

I ran and profiled an old Throw JMH micro[1] with -prof perfasm. Swapping
the conditions makes no statistically significant difference, while it's
clear that the inlining has a significant effect in profiles (and on
throughput when not profiling):

baseline:
....[Hottest Methods (after inlining)]..............................................................
  14.10%   17.16%           libjvm.so  java_lang_Throwable::fill_in_stack_trace
  11.55%   14.62%           libjvm.so  CodeHeap::find_blob_unsafe
   7.79%    9.34%           libjvm.so  objArrayOopDesc::obj_at_put
   7.48%    8.73%           libjvm.so  CodeCache::find_blob
   7.35%    6.59%           libjvm.so  BacktraceBuilder::push
*  6.14%    5.84%           libjvm.so  frame::sender *
   5.28%    2.83%           libjvm.so  ObjArrayKlass::allocate
   3.54%    3.83%           libjvm.so  TypeArrayKlass::allocate_common
   3.31%    3.21%           libjvm.so  G1SATBCardTableLoggingModRefBS::write_ref_field_work
   3.17%    1.91%           libjvm.so  CodeBlob::is_zombie
   2.74%    2.44%           [unknown]  [unknown]
*  2.31%    0.92%           libjvm.so  CompiledMethod::get_deopt_original_pc *
   2.06%    1.35%         c2, level 4  java.lang.Exception::<init>, version 818
   1.90%    0.92%        libc-2.23.so  __memset_avx2
   1.57%    2.14%           libjvm.so  frame::is_interpreted_frame
   1.29%    1.35%           libjvm.so  BacktraceBuilder::expand
   1.25%    1.22%           libjvm.so  oopDesc::is_a
   1.20%    1.26%           libjvm.so  Handle::Handle
   1.03%    1.46%           libjvm.so  G1SATBCardTableModRefBS::write_ref_field_pre_work
   0.77%    0.43%           libjvm.so  oopDesc::release_obj_field_put
  13.36%   11.77%  <...other 145 warm methods...>

hotspot.01:
....[Hottest Methods (after inlining)]..............................................................
  15.44%   16.42%           libjvm.so  java_lang_Throwable::fill_in_stack_trace
  11.83%   14.56%           libjvm.so  CodeHeap::find_blob_unsafe
   8.20%    6.59%           libjvm.so  BacktraceBuilder::push
   7.27%    8.77%           libjvm.so  CodeCache::find_blob
   7.15%    9.20%           libjvm.so  objArrayOopDesc::obj_at_put
   5.73%    2.73%           libjvm.so  ObjArrayKlass::allocate
   3.93%    3.27%           [unknown]  [unknown]
*  3.89%    3.43%           libjvm.so  frame::sender *
   3.88%    3.22%           libjvm.so  G1SATBCardTableLoggingModRefBS::write_ref_field_work
   3.70%    4.09%           libjvm.so  TypeArrayKlass::allocate_common
   3.16%    4.16%           libjvm.so  frame::sender_for_interpreter_frame
   2.14%    1.00%        libc-2.23.so  __memset_avx2
   2.12%    1.25%           libjvm.so  CodeBlob::is_zombie
   1.66%    1.37%         c2, level 4  java.lang.Exception::<init>, version 817
   1.44%    1.19%           libjvm.so  Handle::Handle
   1.43%    2.09%           libjvm.so  frame::is_interpreted_frame
   1.08%    1.26%           libjvm.so  BacktraceBuilder::expand
   1.05%    1.27%           libjvm.so  G1SATBCardTableModRefBS::write_ref_field_pre_work
   1.04%    1.23%           libjvm.so  oopDesc::is_a
   0.81%    0.49%           libjvm.so  oopDesc::release_obj_field_put
  12.28%   11.64%  <...other 145 warm methods...>

hotspot.01 + swap:
....[Hottest Methods (after inlining)]..............................................................
  14.70%   15.91%           libjvm.so  java_lang_Throwable::fill_in_stack_trace
  11.72%   14.55%           libjvm.so  CodeHeap::find_blob_unsafe
   7.79%    6.65%           libjvm.so  BacktraceBuilder::push
   7.51%    9.06%           libjvm.so  objArrayOopDesc::obj_at_put
   7.30%    8.41%           libjvm.so  CodeCache::find_blob
   5.63%    3.24%           libjvm.so  ObjArrayKlass::allocate
*  4.22%    4.15%           libjvm.so  frame::sender *
   3.80%    3.44%           libjvm.so  G1SATBCardTableLoggingModRefBS::write_ref_field_work
   3.70%    3.98%           libjvm.so  TypeArrayKlass::allocate_common
   3.48%    4.17%           libjvm.so  frame::sender_for_interpreter_frame
   2.59%    2.31%           [unknown]  [unknown]
   2.35%    1.31%        libc-2.23.so  __memset_avx2
   2.15%    1.12%         c2, level 4  java.lang.Exception::<init>, version 808
   1.80%    1.03%           libjvm.so  CodeBlob::is_zombie
   1.42%    1.57%           libjvm.so  Handle::Handle
   1.30%    2.07%           libjvm.so  frame::is_interpreted_frame
   1.20%    1.29%           libjvm.so  BacktraceBuilder::expand
   1.17%    1.37%           libjvm.so  oopDesc::is_a
   1.03%    1.14%           libjvm.so  Method::bci_from
   1.00%    1.37%           libjvm.so  G1SATBCardTableModRefBS::write_ref_field_pre_work
  13.20%   11.06%  <...other 162 warm methods...>

It's interesting to see that the relative time in frame::sender decreases
(6.14% in the baseline vs. 3.89% with hotspot.01, first column) after
get_deopt_original_pc is inlined into it, which suggests (unsurprisingly?)
that the inlining itself allows gcc to optimize things further.

Thanks!

/Claes

[1] It's an old microbenchmark, but it checks out:

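import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@State(Scope.Thread)
public class Throw {

     public Object useObject = new Object();
     public Object dummyObj;
     // Not declared in the snippet as posted; assumed here to be a
     // non-final field that is set to true.
     public boolean alwaysTrue = true;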

     @Benchmark
     public void throwSyncException() {
         try {
             throwingMethod();
         } catch (Exception ex) {
             dummyObj = useObject;
         }
     }

     public void throwingMethod() throws Exception {
         if (alwaysTrue) {
             throw new Exception();
         }
     }
}

On 2017-07-03 17:56, Vladimir Kozlov wrote:
> Claes,
>
> Did you try to swap the && conditions?
>
>   36     || (pc == (deopt_handler_begin() + NativeCall::instruction_size) && is_compiled_by_jvmci())
>
> is_compiled_by_jvmci() could be a cache miss, and the other check is
> false in all cases (until Graal is used as the JIT).
>
> Thanks,
> Vladimir
>
> On 7/3/17 7:41 AM, Claes Redestad wrote:
>> <adding hotspot-runtime-dev since this patch now touches 
>> frame.inline.hpp as well>
>>
>> Hi,
>>
>> I've reworked the patch to introduce a compiledMethod.inline.hpp 
>> instead to break a
>> circular dependency on code/nativeInstr.hpp on some platforms.
>>
>> http://cr.openjdk.java.net/~redestad/8183299/hotspot.01/
>>
>> An attempt to tease apart the include dependencies was made (Stefan 
>> Karlsson has
>> an almost working patch to clean things up significantly), but there 
>> is currently a lot of
>> code out there that includes frame.inline.hpp transitively which 
>> makes it cumbersome
>> to add this new include in any other place and still keep the linker 
>> happy.
>>
>> Testing: JPRT
>>
>> Thanks!
>>
>> /Claes
>>
>> On 06/30/2017 06:50 PM, Claes Redestad wrote:
>>> Hi all,
>>>
>>> here's a startup optimization that turned out to also help the VM 
>>> stack walking in general, including java exception throwing 
>>> performance:
>>>
>>> Webrev: http://cr.openjdk.java.net/~redestad/8183299/hotspot.00/
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8183299
>>>
>>> The static footprint cost of the more aggressive inlining is ~20Kb, 
>>> which seems reasonable for the achieved speed-up (~5% in some 
>>> microbenchmarks).
>>>
>>> Testing: JPRT (still in-flight), jtreg locally, microbenchmarks
>>>
>>> Thanks!
>>>
>>> /Claes
>>


