RFR 8222717 [lworld] Calling convention - repair C1 stack

Tobias Hartmann tobias.hartmann at oracle.com
Wed Apr 24 13:21:06 UTC 2019


Hi Ioi,

On 23.04.19 03:02, Ioi Lam wrote:
> Here's an example of the code that I am generating for JDK-8222717 [lworld] Calling
> convention - repair C1 stack [1]
> 
> The C2 stack repair code relies on the reserved slots in the VEP calling convention
> (see [2], page 14) to preserve the caller's return address.
> 
> However, I haven't quite figured out how to do the same thing for
> the VVEP calling convention (as doing so will also recursively affect the VEP convention).
> So for now, I decided to take a simpler approach for C1, by directly manipulating the
> return address on the stack. See line 165 in the following dump.

Yes, that's reasonable.

> Actually, I am not quite sure how the C2 code uses the RA pushed by line 31, but it
> turns out to be very handy for C1 :-)

It currently does not use it; it just pushes it there for consistency (in case someone inspects the
frame). I'm planning to revisit/benchmark the implementation once everything is done and maybe get
rid of the reserved entry complexity by restoring the RA in a way similar to what you are doing
for C1 (requires two additional mov instructions at the end).

> Here's the webrev:
> 
> http://cr.openjdk.java.net/~iklam/valhalla/8222717-c1-stack-repair.v01/
> 
> With C1, the frequency of stack extension is much lower (only when you have scalarized
> floating-point fields), so even though the code is not as efficient as the C2 stack
> repair code, maybe it's still OK?
> 
> What do you think?

Yes, that looks good to me.

Thanks,
Tobias

> 
> public class Foo {
>     static value class U {
>       // three float fields, matching parm0-parm2 (xmm0-xmm2) in the dump below
>       float a = 0, b = 0, c = 0;
>     }
>     // C1 extends stack (1 extra stack word)
>     static int test(U u1, int a1, int a2, int a3, int a4, int a5, int a6) {
>         return a1 + a2 + a6;
>     }
>     public static void main(String args[]) {
>         U u = new U();
>         System.out.println("Hello: " + test(u, 1, 2, 3, 4, 5, 6));
>     }
>     }
> }
> 
> ----------------------------------------------------------------------
> 
> Foo.test(QFoo$U;IIIIII)I  [0x00007fc7a0e361e0, 0x00007fc7a0e36398]  440 bytes
> [Disassembling for mach='i386:x86-64']
>   # {method} {0x00007fc78b2f7808} 'test' '(QFoo$U;IIIIII)I' in 'Foo'
> [Entry Point]
> [Verified Entry Point]
> [Verified Value Entry Point (RO)]
>   # parm0:    xmm0      = float
>   # parm1:    xmm1      = float
>   # parm2:    xmm2      = float
>   # parm3:    rsi       = int
>   # parm4:    rdx       = int
>   # parm5:    rcx       = int
>   # parm6:    r8        = int
>   # parm7:    r9        = int
>   # parm8:    rdi       = int
>   #           [sp+0x40]  (sp of caller)
>  ;;  block B1 [0, 0]
> 
>      0: push   %rbp
>      1: sub    $0x30,%rsp
>      5: mov    $0x7fc78b2f7808,%rbx
>     15: callq  0x00007fc7a09d8ac0  ;   {runtime_call buffer_value_args Runtime1 stub}
>     20: pop    %rbp
> 
> // extend stack
>     21: add    $0x30,%rsp
>     25: pop    %r13
>     27: sub    $0x10,%rsp
>     31: push   %r13                ; << RA saved by stack extension code
> 
>     33: mov    %rdi,0x8(%rsp)
>     38: mov    %r9,%rdi
>     41: mov    %r8,%r9
>     44: mov    %rcx,%r8
>     47: mov    %rdx,%rcx
>     50: mov    %rsi,%rdx
>     53: mov    0x10(%rax),%esi
>     56: vmovss %xmm0,0x10(%rsi)
>     61: vmovss %xmm1,0x14(%rsi)
>     66: vmovss %xmm2,0x18(%rsi)
> 
>     71: mov    %eax,-0x16000(%rsp)
> 
>     78: push   %rbp                ; << now RA is just one word below saved rbp
>     79: sub    $0x30,%rsp
>     83: movq   $0x50,0x8(%rsp)
>     92: jmpq    L_1 (149) v
> 
> [Verified Value Entry Point]
>   # parm0:    rsi:rsi   = 'java/lang/Object'
>   # parm1:    rdx       = int
>   # parm2:    rcx       = int
>   # parm3:    r8        = int
>   # parm4:    r9        = int
>   # parm5:    rdi       = int
>   # parm6:    [sp+0x40]   = int  (sp of caller)
>    128: mov    %eax,-0x16000(%rsp)
>    135: push   %rbp                ; << RA is  one word below saved rbp
>    136: sub    $0x30,%rsp
>    140: movq   $0x40,0x8(%rsp)
> 
> L_1
>    149: mov    0x40(%rsp),%eax
>  ;;  block B0 [0, 6]
> 
>    153: add    %ecx,%edx
>    155: add    %eax,%edx
>    157: mov    %rdx,%rax
> 
>    // stack repair
>    160: mov    0x38(%rsp),%r13     ; get saved RA
>    165: mov    0x30(%rsp),%rbp     ; restore saved rbp
>    170: add    0x8(%rsp),%rsp
>    175: push   %r13                ; push RA, so stack would look the
>                                    ; same as @ line 135
>    // stack repair - end
> 
>    177: mov    0x128(%r15),%r10
>    184: test   %eax,(%r10)         ;   {poll_return}
>    187: retq                       ; return to caller
> 
> 
> ----------------------------------------------------------------------
> [1] https://bugs.openjdk.java.net/browse/JDK-8222717
> [2] http://cr.openjdk.java.net/~thartmann/talks/2019-ValueType_Optimizations.pdf
> 
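To make the frame arithmetic in the dump above easier to check, here is a minimal sketch that replays the prologue and the "stack repair" epilogue on a toy stack. It is only a model, not HotSpot code: the class name, the `repairsCorrectly` helper, the marker values for RA and rbp, and the starting stack address are all made up for illustration. The offsets (0x30, 0x38, 0x8) and the recorded sizes (0x40 for the plain entry, 0x50 for the extended one) are taken from the dump.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the frame layout in the dump; not HotSpot code.
class StackRepairSketch {
    static final Map<Long, Long> mem = new HashMap<>(); // byte address -> 64-bit word
    static long rsp, rbp, r13;

    static void push(long v) { rsp -= 8; mem.put(rsp, v); }
    static long pop() { long v = mem.get(rsp); rsp += 8; return v; }

    // Returns true if, just before retq, rsp, rbp and the RA slot match
    // what the caller set up.
    static boolean repairsCorrectly(boolean extended) {
        final long ra = 0xCAFEL, callerRbp = 0xBEEFL; // arbitrary marker values
        mem.clear();
        rsp = 0x1008L;                 // arbitrary starting address
        rbp = callerRbp;
        push(ra);                      // caller's call instruction pushes RA
        final long entrySp = rsp;

        if (extended) {                // offsets 25-31 in the dump
            r13 = pop();               // pop  %r13
            rsp -= 0x10;               // sub  $0x10,%rsp
            push(r13);                 // push %r13 (RA now sits 0x10 lower)
        }
        push(rbp);                     // push %rbp
        rsp -= 0x30;                   // sub  $0x30,%rsp
        mem.put(rsp + 0x8, extended ? 0x50L : 0x40L); // movq $size,0x8(%rsp)
        rbp = 0;                       // pretend the method body clobbered rbp

        // stack repair, offsets 160-175 in the dump
        r13 = mem.get(rsp + 0x38);     // get saved RA
        rbp = mem.get(rsp + 0x30);     // restore saved rbp
        rsp += mem.get(rsp + 0x8);     // add 0x8(%rsp),%rsp
        push(r13);                     // push RA so retq finds it

        return rsp == entrySp && mem.get(rsp) == ra && rbp == callerRbp;
    }

    public static void main(String[] args) {
        System.out.println("plain entry repaired:    " + repairsCorrectly(false));
        System.out.println("extended entry repaired: " + repairsCorrectly(true));
    }
}
```

In this model both entry points share one epilogue because the amount to add to rsp is read from the slot at 0x8(%rsp) rather than hard-coded, which is what lets the extended entry drop its extra 0x10 bytes on return.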

