LRB midpath code quality

Roman Kennke rkennke at redhat.com
Mon Mar 4 12:49:02 UTC 2019


Any ideas why C2 is doing this?

Roland: do you think this can be improved?

Thanks, Roman


On 4 March 2019 12:51:06 CET, Aleksey Shipilev <shade at redhat.com> wrote:
>Hi there,
>
>I have been looking into generated code quality for LRB.
>
>Run the gc-bench test that writes a single int:
> https://icedtea.classpath.org/hg/gc-bench/
>
>$ ~/trunks/shenandoah-jdk/build/linux-x86_64-server-release/images/jdk/bin/java \
>    -jar target/benchmarks.jar \
>    -jvmArgs "-XX:+UnlockExperimentalVMOptions -Xmx20g -XX:+UseShenandoahGC" \
>    writes.Plain.test_int -prof perfasm:printMargin=30 2>&1 | tee lrb.perfasm
>
>There are things to improve in default mode, but it is also visible
>with -XX:-UseCompressedOops:
>
>                [Verified Entry Point]
>  7.34%           0x00007f60e3a167b0: mov    %eax,-0x14000(%rsp)
>  5.73%           0x00007f60e3a167b7: push   %rbp
>  6.09%           0x00007f60e3a167b8: sub    $0x10,%rsp
>  5.31%           0x00007f60e3a167bc: mov    0x10(%rsi),%r10
>.......................... LRB fastpath check ..........................
>  0.85%           0x00007f60e3a167c0: testb  $0x1,0x20(%r15)
>  7.14%  ╭        0x00007f60e3a167c5: jne    0x00007f60e3a167db
>.........│......... LRB fastpath ends, store to %r10 follows ...........
>  0.38%  │   ↗    0x00007f60e3a167c7: movl   $0x2a,0x20(%r10)
> 12.63%  │   │    0x00007f60e3a167cf: add    $0x10,%rsp
>  0.40%  │   │    0x00007f60e3a167d3: pop    %rbp
>  5.56%  │   │    0x00007f60e3a167d4: test   %eax,0x177b9826(%rip)
>  0.29%  │   │    0x00007f60e3a167da: retq
>---------│---│----------- LRB midpath starts --------------------------
>.........│...│............ checking in-cset ...........................
>         ↘   │    0x00007f60e3a167db: mov    %r10,%r11
>             │    0x00007f60e3a167de: shr    $0x17,%r11
>             │    0x00007f60e3a167e2: movabs $0x7f60f309c048,%r8
>             │    0x00007f60e3a167ec: cmpb   $0x0,(%r8,%r11,1)
>          ╭  │    0x00007f60e3a167f1: je     0x00007f60e3a16806
>..........│..│............ checking null ..............................
>          │  │    0x00007f60e3a167f3: test   %r10,%r10
>          │╭ │    0x00007f60e3a167f6: je     0x00007f60e3a16820
>..........││.│............ checking is-forwarded ......................
>          ││ │    0x00007f60e3a167f8: mov    -0x8(%r10),%r11
>          ││ │    0x00007f60e3a167fc: cmp    %r10,%r11
>          ││╭│    0x00007f60e3a167ff: je     0x00007f60e3a1680b
>..........││││............ return mess ................................
>          ││││↗↗  0x00007f60e3a16801: mov    %r11,%r10
>          │││╰││  0x00007f60e3a16804: jmp    0x00007f60e3a167c7
>          ↘││ ││  0x00007f60e3a16806: mov    %r10,%r11
>           ││ ╰│  0x00007f60e3a16809: jmp    0x00007f60e3a16801
>...........││..│.......... slowpath call ..............................
>           │↘  │  0x00007f60e3a1680b: mov    %r11,%rdi
>           │   │  0x00007f60e3a1680e: movabs $0x7f60f9afad70,%r10
>           │   │  0x00007f60e3a16818: callq  *%r10
>           │   │  0x00007f60e3a1681b: mov    %rax,%r11
>           │   ╰  0x00007f60e3a1681e: jmp    0x00007f60e3a16801
>
>
>I would have expected the branches to return straight to 0x00007f60e3a167c7 instead of
>jumping through the "return mess", since %r10 is kept untouched.
>
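To make the control flow in the listing easier to follow, here is a minimal Java sketch of the three midpath checks (in-cset, null, is-forwarded) followed by the slowpath call. It is a model under stated assumptions, not HotSpot code: the class name, the map standing in for the forwarding word at obj - 8, the made-up to-space address, and the handling of the null case (its branch target is not shown in the listing) are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the checks seen in the listing; not HotSpot code.
final class LrbMidpathSketch {
    static final int REGION_SHIFT = 23;                  // mirrors "shr $0x17"

    final byte[] inCset;                                 // one byte per region, non-zero = in collection set
    final Map<Long, Long> forwardee = new HashMap<>();   // stands in for the word at obj - 8
    boolean gcActive;                                    // stands in for "testb $0x1,0x20(%r15)"
    int slowpathCalls;

    LrbMidpathSketch(int regions) {
        inCset = new byte[regions];
    }

    long barrier(long obj) {
        if (!gcActive) return obj;                                  // fastpath: flag not set
        if (inCset[(int) (obj >>> REGION_SHIFT)] == 0) return obj;  // not in cset: obj is unchanged
        if (obj == 0) return obj;                                   // null (assumed to pass through)
        long fwd = forwardee.getOrDefault(obj, obj);                // unforwarded objects point to themselves
        if (fwd != obj) return fwd;                                 // already forwarded: use the copy
        return slowpath(obj);                                       // not yet forwarded: call the runtime
    }

    long slowpath(long obj) {
        slowpathCalls++;
        long copy = obj + ((long) inCset.length << REGION_SHIFT);   // made-up to-space address
        forwardee.put(obj, copy);
        return copy;
    }

    public static void main(String[] args) {
        LrbMidpathSketch s = new LrbMidpathSketch(4);
        long inCsetObj = (1L << REGION_SHIFT) + 64;
        long otherObj = (2L << REGION_SHIFT) + 64;
        s.inCset[1] = 1;
        s.gcActive = true;
        System.out.println("not in cset, unchanged: " + (s.barrier(otherObj) == otherObj));
        System.out.println("in cset, relocated:     " + (s.barrier(inCsetObj) != inCsetObj));
        System.out.println("slowpath calls:         " + s.slowpathCalls);
    }
}
```

In this model the not-in-cset and null exits return obj as it came in, which is the case where a direct jump back to the store would be enough.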
>-XX:+UseCompressedOops is messier:
>
>               [Verified Entry Point]
>  3.26%          0x00007f39ac476150: mov    %eax,-0x14000(%rsp)
>  6.60%          0x00007f39ac476157: push   %rbp
>  1.94%          0x00007f39ac476158: sub    $0x10,%rsp
>  1.70%          0x00007f39ac47615c: mov    0xc(%rsi),%r11d
>.......................... LRB fastpath check ..........................
>  5.84%          0x00007f39ac476160: testb  $0x1,0x20(%r15)
>  2.07%  ╭       0x00007f39ac476165: jne    0x00007f39ac47617c
>.........│......... LRB fastpath ends, store to %r11 follows ...........
>  1.36%  │   ↗   0x00007f39ac476167: movl   $0x2a,0xc(%r12,%r11,8)
> 13.28%  │   │   0x00007f39ac476170: add    $0x10,%rsp
>  3.36%  │   │   0x00007f39ac476174: pop    %rbp
>  1.90%  │   │   0x00007f39ac476175: test   %eax,0x19e85e85(%rip)
>  0.98%  │   │   0x00007f39ac47617b: retq
>---------│---│----------- LRB midpath starts --------------------------
>.........│...│............ checking in-cset ...........................
>         ↘   │   0x00007f39ac47617c: mov    %r11,%r9
>             │   0x00007f39ac47617f: shl    $0x3,%r9
>             │   0x00007f39ac476183: mov    %r9,%r10
>             │   0x00007f39ac476186: shr    $0x17,%r10
>             │   0x00007f39ac47618a: movabs $0x7f39bc0871e0,%r8
>             │   0x00007f39ac476194: cmpb   $0x0,(%r8,%r10,1)
>          ╭  │   0x00007f39ac476199: je     0x00007f39ac4761ae
>..........│..│............ checking null ..............................
>          │  │   0x00007f39ac47619b: test   %r11d,%r11d
>          │╭ │   0x00007f39ac47619e: je     0x00007f39ac4761cc
>..........││.│............ checking is-forwarded ......................
>          ││ │   0x00007f39ac4761a0: mov    -0x8(%r12,%r11,8),%r9
>          ││ │   0x00007f39ac4761a5: lea    (%r12,%r11,8),%r10
>          ││ │   0x00007f39ac4761a9: cmp    %r10,%r9
>          ││╭│   0x00007f39ac4761ac: je     0x00007f39ac4761b7
>..........││││............ return mess ................................
>          ↘│││↗  0x00007f39ac4761ae: mov    %r9,%r11
>           ││││  0x00007f39ac4761b1: shr    $0x3,%r11
>           ││╰│  0x00007f39ac4761b5: jmp    0x00007f39ac476167
>...........││.│.......... slowpath call ...............................
>           │↘ │  0x00007f39ac4761b7: mov    %r9,%rdi
>           │  │  0x00007f39ac4761ba: movabs $0x7f39c4c26d70,%r10
>           │  │  0x00007f39ac4761c4: callq  *%r10
>           │  │  0x00007f39ac4761c7: mov    %rax,%r9
>           │  ╰  0x00007f39ac4761ca: jmp    0x00007f39ac4761ae
>
>The same thing happens here, and the "return mess" also packs the reference back before
>returning. That seems useless, as %r11 still carries the original packed reference on the
>non-in-cset path. Also, the unpacked reference is already available in %r9 during
>"checking is-forwarded", having been unpacked earlier during "checking in-cset".
>
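The packing and unpacking discussed above can be sketched in Java as follows. This is an illustration only, assuming zero-based compressed references with a shift of 3 (suggested by the "shl $0x3" and "shr $0x3" in the listing) and a region shift of 23 (from "shr $0x17"); the class and method names are hypothetical.

```java
// Hypothetical model of compressed-reference arithmetic; not HotSpot code.
final class NarrowOopSketch {
    static final int OOP_SHIFT = 3;       // mirrors "shl $0x3" / "shr $0x3"
    static final int REGION_SHIFT = 23;   // mirrors "shr $0x17"

    // Unpack: widen the 32-bit value without sign extension, then shift left.
    static long decode(int narrow) {
        return (narrow & 0xFFFF_FFFFL) << OOP_SHIFT;
    }

    // Pack: shift right and keep the low 32 bits.
    static int encode(long addr) {
        return (int) (addr >>> OOP_SHIFT);
    }

    // In-cset table index, computed as in the listing: unpack, then shift by the region size.
    static int regionIndex(int narrow) {
        return (int) (decode(narrow) >>> REGION_SHIFT);
    }

    public static void main(String[] args) {
        int narrow = encode((5L << REGION_SHIFT) + 8);
        long wide = decode(narrow);
        System.out.println("region index:     " + regionIndex(narrow));
        System.out.println("repack is no-op:  " + (encode(wide) == narrow));
    }
}
```

Since encode(decode(n)) gives back n under these assumptions, repacking on a path that never changed the reference produces the value the register already holds.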
>Maybe LRB expansion in C2 needs touchups to handle these, to optimize code size and
>performance when GC is active.
>
>-Aleksey


