RFR: 8328865: [c2] No need to convert "(x+1)+y" into "(x+y)+1" when y is a CallNode [v2]
SUN Guoyun
duke at openjdk.org
Thu Mar 28 11:45:32 UTC 2024
On Wed, 27 Mar 2024 09:32:29 GMT, Emanuel Peter <epeter at openjdk.org> wrote:
>
> What exactly is it that gives you the speedup in your benchmark? Spilling? Fewer add instructions? Would be nice to understand that better, and see what are potential examples where we would have regressions with your patch.
Both: fewer spills and fewer add instructions are what speed up the benchmark.
before opto:
<pre>
subq rsp, #32 # Create frame
02a movq RBP, [RSI + #16 (8-bit)] # long ! Field: CallNode.val
02e leaq R10, [RBP + #3]
032 movq [rsp + #0], R10 # spill
nop # 1 bytes pad for loops and calls
037 call,static CallNode::callNoInlineMethod
# CallNode::test @ bci:10 (line 11) L[0]=_ L[1]=rsp + #0 L[2]=_
# OopMap {off=60/0x3c}
044 B2: # out( N41 ) <- in( B1 ) Freq: 0.99998
# Block is sole successor of call
044 addq RAX, RBP # long
047 addq RAX, #3 # long
04b addq rsp, 32 # Destroy frame
</pre>
after opto:
<pre>
subq rsp, #16 # Create frame
02a movl RBP, #3 # long (unsigned 32-bit)
02f addq RBP, [RSI + #16 (8-bit)] # long
033 call,static CallNode::callNoInlineMethod
# CallNode::test @ bci:10 (line 11) L[0]=_ L[1]=RBP L[2]=_
# OopMap {off=56/0x38}
040 B2: # out( N36 ) <- in( B1 ) Freq: 0.99998
# Block is sole successor of call
040 addq RAX, RBP # long
043 addq rsp, 16 # Destroy frame
</pre>
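For readers without the benchmark at hand, here is a rough sketch of the Java shape that would lead to listings like the two above. Only the names `CallNode`, `val`, `test` and `callNoInlineMethod` are taken from the disassembly. The constructor, the body of `callNoInlineMethod`, the `reassociated` helper and the input value are assumptions made for illustration, and the real benchmark source may differ.

```java
// Hypothetical reconstruction: not the actual benchmark source.
public class CallNode {
    long val; // the field loaded from [RSI + #16] in the listings

    public CallNode(long val) {
        this.val = val;
    }

    // Stand-in for a callee that C2 does not inline. The body is assumed;
    // in a real run inlining would have to be prevented for the call to
    // survive, e.g. with -XX:CompileCommand=dontinline,CallNode::callNoInlineMethod
    public static long callNoInlineMethod() {
        return 100L;
    }

    // Source shape "(x + 3) + y", where y is the result of a call.
    // The partial sum val + 3 is computed before the call and is the
    // value the debug info records as L[1].
    public long test() {
        long a = val + 3;
        return a + callNoInlineMethod();
    }

    // The reassociated shape "(x + y) + 3" that the transformation produces.
    // Here val itself has to stay live across the call, and the constant
    // is added afterwards.
    public long reassociated() {
        return (val + callNoInlineMethod()) + 3;
    }

    public static void main(String[] args) {
        CallNode n = new CallNode(39L);
        // Both shapes compute the same value; they differ only in what
        // must be kept alive across the call.
        System.out.println(n.test());
        System.out.println(n.reassociated());
    }
}
```

In the first listing both `val` (in RBP) and `val + 3` (spilled to `[rsp + #0]`) are kept across the call, and two adds follow it. In the second listing only `val + 3` is kept in RBP, the frame shrinks from 32 to 16 bytes, and a single add follows the call.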
-------------
PR Comment: https://git.openjdk.org/jdk/pull/18482#issuecomment-2024984725