RFR: 8328865: [c2] No need to convert "(x+1)+y" into "(x+y)+1" when y is a CallNode [v2]

Thu Mar 28 11:45:32 UTC 2024

On Wed, 27 Mar 2024 09:32:29 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

> 
> What exactly is it that gives you the speedup in your benchmark? Spilling? Fewer add instructions? Would be nice to understand that better, and see what are potential examples where we would have regressions with your patch.

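The speedup comes from fewer spills and fewer add instructions. Before the patch, C2 keeps val in RBP across the call and also computes val+3 into R10. That value is needed for the debug info (L[1]=rsp + #0), so it has to be spilled to the stack, and after the call two adds are still required. With the patch, val+3 is computed once into RBP. It serves both the debug info (L[1]=RBP) and the final add, so the spill disappears, the frame shrinks from 32 to 16 bytes, and only one add remains after the call.

Before this optimization: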
<pre>
	   subq    rsp, #32	# Create frame
02a     movq    RBP, [RSI + #16 (8-bit)]	# long ! Field: CallNode.val
02e     leaq    R10, [RBP + #3]
032     movq    [rsp + #0], R10	# spill
        nop 	# 1 bytes pad for loops and calls
037     call,static  CallNode::callNoInlineMethod
        # CallNode::test @ bci:10 (line 11) L[0]=_ L[1]=rsp + #0 L[2]=_
        # OopMap {off=60/0x3c}

044     B2: #	out( N41 ) <- in( B1 )  Freq: 0.99998
        # Block is sole successor of call
044     addq    RAX, RBP	# long
047     addq    RAX, #3	# long
04b     addq    rsp, 32	# Destroy frame
</pre>

After this optimization:
<pre>
  	   subq    rsp, #16	# Create frame
02a     movl    RBP, #3	# long (unsigned 32-bit)
02f     addq    RBP, [RSI + #16 (8-bit)]	# long
033     call,static  CallNode::callNoInlineMethod
        # CallNode::test @ bci:10 (line 11) L[0]=_ L[1]=RBP L[2]=_
        # OopMap {off=56/0x38}

040     B2: #	out( N36 ) <- in( B1 )  Freq: 0.99998
        # Block is sole successor of call
040     addq    RAX, RBP	# long
043     addq    rsp, 16	# Destroy frame

</pre>
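To make the rule concrete, here is a toy Java model of the rewrite. It is only an illustration under stated assumptions: the class, record, and method names (ReassocSketch, reassociate, guardCalls) are hypothetical and do not reflect C2's actual AddNode code. It rewrites "(x+c)+y" into "(x+y)+c", as the PR title describes, and with the guard enabled it leaves the expression alone when y is a call, so that x+c is the single value kept live across the call.

```java
// Toy expression model for illustration only; NOT C2's node classes.
public class ReassocSketch {
    interface Expr {}

    record Var(String name) implements Expr {
        @Override public String toString() { return name; }
    }

    record Const(long value) implements Expr {
        @Override public String toString() { return Long.toString(value); }
    }

    record Call(String method) implements Expr {
        @Override public String toString() { return method + "()"; }
    }

    record Add(Expr left, Expr right) implements Expr {
        @Override public String toString() { return "(" + left + "+" + right + ")"; }
    }

    // Rewrites "(x+c)+y" into "(x+y)+c". With guardCalls set, the expression is
    // returned unchanged when y is a call (hypothetical stand-in for the patch's check).
    static Expr reassociate(Expr e, boolean guardCalls) {
        if (e instanceof Add outer
                && outer.left() instanceof Add inner
                && inner.right() instanceof Const c) {
            Expr x = inner.left();
            Expr y = outer.right();
            if (guardCalls && y instanceof Call) {
                return e; // keep "x+c" as one value that survives the call
            }
            return new Add(new Add(x, y), c);
        }
        return e; // shape does not match; nothing to do
    }

    public static void main(String[] args) {
        Expr x = new Var("x");
        Expr byVar = new Add(new Add(x, new Const(3)), new Var("y"));
        Expr byCall = new Add(new Add(x, new Const(3)), new Call("callNoInlineMethod"));
        System.out.println(reassociate(byVar, true));   // ((x+y)+3)
        System.out.println(reassociate(byCall, false)); // ((x+callNoInlineMethod())+3)
        System.out.println(reassociate(byCall, true));  // ((x+3)+callNoInlineMethod())
    }
}
```

The unguarded result for the call case corresponds to the "before" listing (x kept in RBP, the constant added after the call), while the guarded result corresponds to the "after" listing (x+3 computed once before the call).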

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18482#issuecomment-2024984725