LRB midpath code quality

Mon Mar 4 11:51:06 UTC 2019

Hi there,

I have been looking into the quality of the generated code for LRB (load reference barrier).

Run the gc-bench test that writes a single int:
 https://icedtea.classpath.org/hg/gc-bench/

$ ~/trunks/shenandoah-jdk/build/linux-x86_64-server-release/images/jdk/bin/java -jar \
    target/benchmarks.jar -jvmArgs "-XX:+UnlockExperimentalVMOptions -Xmx20g -XX:+UseShenandoahGC" \
    writes.Plain.test_int -prof perfasm:printMargin=30 2>&1 | tee lrb.perfasm

There are things to improve in the default mode, but the problem is also visible with -XX:-UseCompressedOops:

                [Verified Entry Point]
  7.34%           0x00007f60e3a167b0: mov    %eax,-0x14000(%rsp)
  5.73%           0x00007f60e3a167b7: push   %rbp
  6.09%           0x00007f60e3a167b8: sub    $0x10,%rsp
  5.31%           0x00007f60e3a167bc: mov    0x10(%rsi),%r10
.......................... LRB fastpath check ..........................
  0.85%           0x00007f60e3a167c0: testb  $0x1,0x20(%r15)
  7.14%  ╭        0x00007f60e3a167c5: jne    0x00007f60e3a167db
.........│......... LRB fastpath ends, store to %r10 follows ...........
  0.38%  │   ↗    0x00007f60e3a167c7: movl   $0x2a,0x20(%r10)
 12.63%  │   │    0x00007f60e3a167cf: add    $0x10,%rsp
  0.40%  │   │    0x00007f60e3a167d3: pop    %rbp
  5.56%  │   │    0x00007f60e3a167d4: test   %eax,0x177b9826(%rip)
  0.29%  │   │    0x00007f60e3a167da: retq
---------│---│----------- LRB midpath starts --------------------------
.........│...│............ checking in-cset ...........................
         ↘   │    0x00007f60e3a167db: mov    %r10,%r11
             │    0x00007f60e3a167de: shr    $0x17,%r11
             │    0x00007f60e3a167e2: movabs $0x7f60f309c048,%r8
             │    0x00007f60e3a167ec: cmpb   $0x0,(%r8,%r11,1)
          ╭  │    0x00007f60e3a167f1: je     0x00007f60e3a16806
..........│..│............ checking null ..............................
          │  │    0x00007f60e3a167f3: test   %r10,%r10
          │╭ │    0x00007f60e3a167f6: je     0x00007f60e3a16820
..........││.│............ checking is-forwarded ......................
          ││ │    0x00007f60e3a167f8: mov    -0x8(%r10),%r11
          ││ │    0x00007f60e3a167fc: cmp    %r10,%r11
          ││╭│    0x00007f60e3a167ff: je     0x00007f60e3a1680b
..........││││............ return mess ................................
          ││││↗↗  0x00007f60e3a16801: mov    %r11,%r10
          │││╰││  0x00007f60e3a16804: jmp    0x00007f60e3a167c7
          ↘││ ││  0x00007f60e3a16806: mov    %r10,%r11
           ││ ╰│  0x00007f60e3a16809: jmp    0x00007f60e3a16801
...........││..│.......... slowpath call ..............................
           │↘  │  0x00007f60e3a1680b: mov    %r11,%rdi
           │   │  0x00007f60e3a1680e: movabs $0x7f60f9afad70,%r10
           │   │  0x00007f60e3a16818: callq  *%r10
           │   │  0x00007f60e3a1681b: mov    %rax,%r11
           │   ╰  0x00007f60e3a1681e: jmp    0x00007f60e3a16801

I would have expected the branches to return straight to 0x00007f60e3a167c7 instead of jumping
through the "return mess", since %r10 is left untouched.
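
To make the control flow in the listing above easier to follow, here is a rough Java model of the
midpath checks. It is only a sketch: the class name, the in-cset table, the forwarding map, the
slowpath stand-in and EVAC_OFFSET are all hypothetical, and passing null through unchanged is an
assumption, since the null branch target is not shown in the listing. Only the region shift (0x17)
and the order of the checks are taken from the disassembly.

```java
import java.util.HashMap;
import java.util.Map;

class LrbMidpathSketch {
    static final int REGION_SHIFT = 0x17;                  // "shr $0x17" in the listing
    static final byte[] inCset = new byte[16];             // hypothetical in-cset byte table
    static final Map<Long, Long> fwdptr = new HashMap<>(); // hypothetical model of the word at -0x8(obj)
    static final long EVAC_OFFSET = 0x1000000L;            // hypothetical: where the slowpath "copies" to

    // Hypothetical stand-in for the runtime call: "evacuate" and install the forwarding pointer.
    static long slowpath(long obj) {
        long copy = obj + EVAC_OFFSET;
        fwdptr.put(obj, copy);
        return copy;
    }

    // Every exit returns the reference to store through; no shared return block is involved.
    static long midpath(long obj) {
        if (inCset[(int) (obj >>> REGION_SHIFT)] == 0) return obj; // not in cset: obj is unchanged
        if (obj == 0) return obj;                                  // null (assumed to pass through)
        long fwd = fwdptr.getOrDefault(obj, obj);                  // unforwarded objects point to themselves
        if (fwd != obj) return fwd;                                // already forwarded
        return slowpath(obj);                                      // in cset, not yet forwarded
    }

    public static void main(String[] args) {
        inCset[2] = 1;                                   // mark region 2 as part of the cset
        long outside = (1L << REGION_SHIFT) + 0x100;     // address in region 1
        long inside = (2L << REGION_SHIFT) + 0x40;       // address in region 2
        System.out.println(midpath(outside) == outside); // not-in-cset exit keeps the input
        long copy = midpath(inside);                     // first visit takes the slowpath
        System.out.println(midpath(inside) == copy);     // second visit sees the forwarding pointer
    }
}
```

In this model the not-in-cset and null exits hand back exactly the value they were given, which is
the case where a jump straight to the store would do.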

-XX:+UseCompressedOops is messier:

               [Verified Entry Point]
  3.26%          0x00007f39ac476150: mov    %eax,-0x14000(%rsp)
  6.60%          0x00007f39ac476157: push   %rbp
  1.94%          0x00007f39ac476158: sub    $0x10,%rsp
  1.70%          0x00007f39ac47615c: mov    0xc(%rsi),%r11d
.......................... LRB fastpath check ..........................
  5.84%          0x00007f39ac476160: testb  $0x1,0x20(%r15)
  2.07%  ╭       0x00007f39ac476165: jne    0x00007f39ac47617c
.........│......... LRB fastpath ends, store to %r11 follows ...........
  1.36%  │   ↗   0x00007f39ac476167: movl   $0x2a,0xc(%r12,%r11,8)
 13.28%  │   │   0x00007f39ac476170: add    $0x10,%rsp
  3.36%  │   │   0x00007f39ac476174: pop    %rbp
  1.90%  │   │   0x00007f39ac476175: test   %eax,0x19e85e85(%rip)
  0.98%  │   │   0x00007f39ac47617b: retq
---------│---│----------- LRB midpath starts --------------------------
.........│...│............ checking in-cset ...........................
         ↘   │   0x00007f39ac47617c: mov    %r11,%r9
             │   0x00007f39ac47617f: shl    $0x3,%r9
             │   0x00007f39ac476183: mov    %r9,%r10
             │   0x00007f39ac476186: shr    $0x17,%r10
             │   0x00007f39ac47618a: movabs $0x7f39bc0871e0,%r8
             │   0x00007f39ac476194: cmpb   $0x0,(%r8,%r10,1)
          ╭  │   0x00007f39ac476199: je     0x00007f39ac4761ae
..........│..│............ checking null ..............................
          │  │   0x00007f39ac47619b: test   %r11d,%r11d
          │╭ │   0x00007f39ac47619e: je     0x00007f39ac4761cc
..........││.│............ checking is-forwarded ......................
          ││ │   0x00007f39ac4761a0: mov    -0x8(%r12,%r11,8),%r9
          ││ │   0x00007f39ac4761a5: lea    (%r12,%r11,8),%r10
          ││ │   0x00007f39ac4761a9: cmp    %r10,%r9
          ││╭│   0x00007f39ac4761ac: je     0x00007f39ac4761b7
..........││││............ return mess ................................
          ↘│││↗  0x00007f39ac4761ae: mov    %r9,%r11
           ││││  0x00007f39ac4761b1: shr    $0x3,%r11
           ││╰│  0x00007f39ac4761b5: jmp    0x00007f39ac476167
...........││.│.......... slowpath call ...............................
           │↘ │  0x00007f39ac4761b7: mov    %r9,%rdi
           │  │  0x00007f39ac4761ba: movabs $0x7f39c4c26d70,%r10
           │  │  0x00007f39ac4761c4: callq  *%r10
           │  │  0x00007f39ac4761c7: mov    %rax,%r9
           │  ╰  0x00007f39ac4761ca: jmp    0x00007f39ac4761ae

Same thing here, and on top of that the "return mess" packs the reference back before returning. That
seems useless, as %r11 still carries the original compressed reference on the non-in-cset path. Also,
the unpacked reference is already available in %r9 on entry to "checking is-forwarded", because it
was unpacked earlier during "checking in-cset".

Maybe the LRB expansion in C2 needs some touch-ups to handle these cases, which would improve both
code size and performance while GC is active.

-Aleksey