LRB midpath code quality
Aleksey Shipilev
shade at redhat.com
Mon Mar 4 11:51:06 UTC 2019
Hi there,
I have been looking into the generated code quality for LRB (load reference barriers).
To reproduce, run the gc-bench test that writes a single int:
https://icedtea.classpath.org/hg/gc-bench/
$ ~/trunks/shenandoah-jdk/build/linux-x86_64-server-release/images/jdk/bin/java -jar target/benchmarks.jar \
    -jvmArgs "-XX:+UnlockExperimentalVMOptions -Xmx20g -XX:+UseShenandoahGC" \
    writes.Plain.test_int -prof perfasm:printMargin=30 2>&1 | tee lrb.perfasm
There are things to improve in the default mode, but the same issues are also visible with -XX:-UseCompressedOops:
[Verified Entry Point]
7.34% 0x00007f60e3a167b0: mov %eax,-0x14000(%rsp)
5.73% 0x00007f60e3a167b7: push %rbp
6.09% 0x00007f60e3a167b8: sub $0x10,%rsp
5.31% 0x00007f60e3a167bc: mov 0x10(%rsi),%r10
.......................... LRB fastpath check ..........................
0.85% 0x00007f60e3a167c0: testb $0x1,0x20(%r15)
7.14% ╭ 0x00007f60e3a167c5: jne 0x00007f60e3a167db
.........│......... LRB fastpath ends, store to %r10 follows ...........
0.38% │ ↗ 0x00007f60e3a167c7: movl $0x2a,0x20(%r10)
12.63% │ │ 0x00007f60e3a167cf: add $0x10,%rsp
0.40% │ │ 0x00007f60e3a167d3: pop %rbp
5.56% │ │ 0x00007f60e3a167d4: test %eax,0x177b9826(%rip)
0.29% │ │ 0x00007f60e3a167da: retq
---------│---│----------- LRB midpath starts --------------------------
.........│...│............ checking in-cset ...........................
↘ │ 0x00007f60e3a167db: mov %r10,%r11
│ 0x00007f60e3a167de: shr $0x17,%r11
│ 0x00007f60e3a167e2: movabs $0x7f60f309c048,%r8
│ 0x00007f60e3a167ec: cmpb $0x0,(%r8,%r11,1)
╭ │ 0x00007f60e3a167f1: je 0x00007f60e3a16806
..........│..│............ checking null ..............................
│ │ 0x00007f60e3a167f3: test %r10,%r10
│╭ │ 0x00007f60e3a167f6: je 0x00007f60e3a16820
..........││.│............ checking is-forwarded ......................
││ │ 0x00007f60e3a167f8: mov -0x8(%r10),%r11
││ │ 0x00007f60e3a167fc: cmp %r10,%r11
││╭│ 0x00007f60e3a167ff: je 0x00007f60e3a1680b
..........││││............ return mess ................................
││││↗↗ 0x00007f60e3a16801: mov %r11,%r10
│││╰││ 0x00007f60e3a16804: jmp 0x00007f60e3a167c7
↘││ ││ 0x00007f60e3a16806: mov %r10,%r11
││ ╰│ 0x00007f60e3a16809: jmp 0x00007f60e3a16801
...........││..│.......... slowpath call ..............................
│↘ │ 0x00007f60e3a1680b: mov %r11,%rdi
│ │ 0x00007f60e3a1680e: movabs $0x7f60f9afad70,%r10
│ │ 0x00007f60e3a16818: callq *%r10
│ │ 0x00007f60e3a1681b: mov %rax,%r11
│ ╰ 0x00007f60e3a1681e: jmp 0x00007f60e3a16801
I would have expected the branches to return straight to 0x00007f60e3a167c7 instead of jumping through
the "return mess", since %r10 is kept untouched.
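
For readers who do not want to trace the arrows, the control flow of the listing above can be modelled in a few lines. This is only a sketch read off the disassembly, not code from the Shenandoah sources: the names `Heap`, `lrb`, `REGION_SHIFT` and `FWD_OFFSET` are hypothetical, the shift comes from `shr $0x17`, the forwarding word offset comes from `mov -0x8(%r10),%r11`, and the slowpath is a stub because its body is not visible here. The null branch targets an address outside the listing, so returning the reference unchanged there is an assumption.

```python
REGION_SHIFT = 23   # from `shr $0x17`: one in-cset byte per 2**23 bytes of heap
FWD_OFFSET = -8     # from `mov -0x8(%r10),%r11`: forwarding word precedes the object


class Heap:
    """Toy heap model (hypothetical): sparse memory plus a per-region in-cset table."""

    def __init__(self):
        self.mem = {}           # address -> stored word
        self.in_cset = {}       # region index -> True if region is in the collection set
        self.gc_active = False  # stands in for the flag tested by `testb $0x1,0x20(%r15)`


def lrb(heap, ref, slowpath):
    """Return the reference to use for the store, following the listing's check order."""
    # Fastpath: flag is clear, use the reference as loaded.
    if not heap.gc_active:
        return ref
    # Midpath, "checking in-cset": ref is untouched here, so a direct return suffices.
    if not heap.in_cset.get(ref >> REGION_SHIFT, False):
        return ref
    # Midpath, "checking null" (assumption: the out-of-listing target keeps null as is).
    if ref == 0:
        return ref
    # Midpath, "checking is-forwarded": compare the forwarding word with the object.
    fwd = heap.mem[ref + FWD_OFFSET]
    if fwd != ref:
        return fwd              # already forwarded: use the forwardee
    # Forwarding word points to the object itself: call the runtime.
    return slowpath(fwd)
```

In this model only the forwarded and slowpath exits produce a new value; the not-in-cset exit returns `ref` as is, which is the branch that would be expected to jump straight back to the store.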
-XX:+UseCompressedOops is messier:
[Verified Entry Point]
3.26% 0x00007f39ac476150: mov %eax,-0x14000(%rsp)
6.60% 0x00007f39ac476157: push %rbp
1.94% 0x00007f39ac476158: sub $0x10,%rsp
1.70% 0x00007f39ac47615c: mov 0xc(%rsi),%r11d
.......................... LRB fastpath check ..........................
5.84% 0x00007f39ac476160: testb $0x1,0x20(%r15)
2.07% ╭ 0x00007f39ac476165: jne 0x00007f39ac47617c
.........│......... LRB fastpath ends, store to %r11 follows ...........
1.36% │ ↗ 0x00007f39ac476167: movl $0x2a,0xc(%r12,%r11,8)
13.28% │ │ 0x00007f39ac476170: add $0x10,%rsp
3.36% │ │ 0x00007f39ac476174: pop %rbp
1.90% │ │ 0x00007f39ac476175: test %eax,0x19e85e85(%rip)
0.98% │ │ 0x00007f39ac47617b: retq
---------│---│----------- LRB midpath starts --------------------------
.........│...│............ checking in-cset ...........................
↘ │ 0x00007f39ac47617c: mov %r11,%r9
│ 0x00007f39ac47617f: shl $0x3,%r9
│ 0x00007f39ac476183: mov %r9,%r10
│ 0x00007f39ac476186: shr $0x17,%r10
│ 0x00007f39ac47618a: movabs $0x7f39bc0871e0,%r8
│ 0x00007f39ac476194: cmpb $0x0,(%r8,%r10,1)
╭ │ 0x00007f39ac476199: je 0x00007f39ac4761ae
..........│..│............ checking null ..............................
│ │ 0x00007f39ac47619b: test %r11d,%r11d
│╭ │ 0x00007f39ac47619e: je 0x00007f39ac4761cc
..........││.│............ checking is-forwarded ......................
││ │ 0x00007f39ac4761a0: mov -0x8(%r12,%r11,8),%r9
││ │ 0x00007f39ac4761a5: lea (%r12,%r11,8),%r10
││ │ 0x00007f39ac4761a9: cmp %r10,%r9
││╭│ 0x00007f39ac4761ac: je 0x00007f39ac4761b7
..........││││............ return mess ................................
↘│││↗ 0x00007f39ac4761ae: mov %r9,%r11
││││ 0x00007f39ac4761b1: shr $0x3,%r11
││╰│ 0x00007f39ac4761b5: jmp 0x00007f39ac476167
...........││.│.......... slowpath call ...............................
│↘ │ 0x00007f39ac4761b7: mov %r9,%rdi
│ │ 0x00007f39ac4761ba: movabs $0x7f39c4c26d70,%r10
│ │ 0x00007f39ac4761c4: callq *%r10
│ │ 0x00007f39ac4761c7: mov %rax,%r9
│ ╰ 0x00007f39ac4761ca: jmp 0x00007f39ac4761ae
Same thing here, and in addition the "return mess" packs the reference back before returning. That
seems useless on the non-in-cset path, where %r11 still carries the packed reference. Also, %r9 already
holds the unpacked reference during "checking is-forwarded", since it was unpacked earlier in
"checking in-cset".
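
The redundancy on the non-in-cset path can be checked with plain arithmetic. A minimal sketch, assuming zero-based compressed oops with a shift of 3, which is what the `shl $0x3` / `shr $0x3` pair in the listing suggests; `decode`, `encode` and `midpath_not_in_cset` are hypothetical helper names, not HotSpot functions:

```python
OOP_SHIFT = 3  # assumption: zero-based compressed oops, 8-byte object alignment


def decode(narrow):
    """Mirror of `shl $0x3,%r9`: narrow oop -> address (heap base assumed zero)."""
    return narrow << OOP_SHIFT


def encode(addr):
    """Mirror of `shr $0x3,%r11`: address -> narrow oop."""
    return addr >> OOP_SHIFT


def midpath_not_in_cset(narrow):
    """What the listing does when the object is not in cset: unpack, then repack."""
    return encode(decode(narrow))
```

Because `encode(decode(n)) == n` for any narrow oop, the round trip through %r9 leaves %r11 with the value it already had.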
Maybe the LRB expansion in C2 needs touchups to handle these cases, which would improve code size and
performance when GC is active.
-Aleksey