LRB midpath code quality
Roman Kennke
rkennke at redhat.com
Mon Mar 4 21:29:12 UTC 2019
Also, it seems weird that the null-check comes after the in-cset-check
rather than before it. It's probably a leftover from null-check-cloning
that should actually disappear too?
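
To make the ordering question concrete, here is a toy Java model of the mid-path as it is labelled in the listing below. This is not HotSpot code: the class and method names, the map standing in for the forwarding word at -0x8(obj), and the no-op slow path are all assumptions made for illustration. Only the 23-bit region shift and the order of the checks are taken from the disassembly.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the LRB mid-path; not HotSpot code. All names are hypothetical.
class LrbMidpathSketch {
    static final int REGION_SHIFT = 23;                  // matches `shr $0x17` in the listing
    static final boolean[] inCset = new boolean[16];     // one flag per region (the byte map)
    static final Map<Long, Long> fwd = new HashMap<>();  // stands in for the word at -0x8(obj)
    static int slowpathCalls = 0;

    // An object that has not been forwarded points to itself.
    static long forwardee(long obj) { return fwd.getOrDefault(obj, obj); }

    // Assumption: the slow path is modelled as a counted no-op.
    static long slowpath(long obj) { slowpathCalls++; return obj; }

    // Order seen in the generated code: in-cset, then null, then is-forwarded.
    static long midpathAsGenerated(long obj) {
        if (!inCset[(int) (obj >>> REGION_SHIFT)]) return obj;
        if (obj == 0) return obj;
        long f = forwardee(obj);
        return (f == obj) ? slowpath(obj) : f;
    }

    // Same checks with the null check hoisted in front of the in-cset check.
    static long midpathNullFirst(long obj) {
        if (obj == 0) return obj;
        if (!inCset[(int) (obj >>> REGION_SHIFT)]) return obj;
        long f = forwardee(obj);
        return (f == obj) ? slowpath(obj) : f;
    }

    public static void main(String[] args) {
        inCset[1] = true;                          // region 1 is in the collection set
        long live = (1L << REGION_SHIFT) + 64;     // in cset, not forwarded
        long moved = (1L << REGION_SHIFT) + 128;   // in cset, already forwarded
        long copy = (2L << REGION_SHIFT) + 64;     // where `moved` was copied to
        fwd.put(moved, copy);
        long outside = (3L << REGION_SHIFT) + 64;  // not in cset
        for (long obj : new long[] {0L, live, moved, outside}) {
            System.out.println(Long.toHexString(obj) + " -> "
                + Long.toHexString(midpathAsGenerated(obj)) + " / "
                + Long.toHexString(midpathNullFirst(obj)));
        }
    }
}
```

In this model both orders return the same reference for every input; the only difference is which branch a null reference leaves through.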
Roman
> Hi there,
>
> I have been looking into generated code quality for LRB.
>
> Run the gc-bench test that writes a single int:
> https://icedtea.classpath.org/hg/gc-bench/
>
> $ ~/trunks/shenandoah-jdk/build/linux-x86_64-server-release/images/jdk/bin/java -jar
> target/benchmarks.jar -jvmArgs "-XX:+UnlockExperimentalVMOptions -Xmx20g -XX:+UseShenandoahGC"
> writes.Plain.test_int -prof perfasm:printMargin=30 2>&1 | tee lrb.perfasm
>
> There are things to improve in the default mode, but the problem is also visible with -XX:-UseCompressedOops:
>
> [Verified Entry Point]
> 7.34% 0x00007f60e3a167b0: mov %eax,-0x14000(%rsp)
> 5.73% 0x00007f60e3a167b7: push %rbp
> 6.09% 0x00007f60e3a167b8: sub $0x10,%rsp
> 5.31% 0x00007f60e3a167bc: mov 0x10(%rsi),%r10
> .......................... LRB fastpath check ..........................
> 0.85% 0x00007f60e3a167c0: testb $0x1,0x20(%r15)
> 7.14% ╭ 0x00007f60e3a167c5: jne 0x00007f60e3a167db
> .........│......... LRB fastpath ends, store to %r10 follows ...........
> 0.38% │ ↗ 0x00007f60e3a167c7: movl $0x2a,0x20(%r10)
> 12.63% │ │ 0x00007f60e3a167cf: add $0x10,%rsp
> 0.40% │ │ 0x00007f60e3a167d3: pop %rbp
> 5.56% │ │ 0x00007f60e3a167d4: test %eax,0x177b9826(%rip)
> 0.29% │ │ 0x00007f60e3a167da: retq
> ---------│---│----------- LRB midpath starts --------------------------
> .........│...│............ checking in-cset ...........................
> ↘ │ 0x00007f60e3a167db: mov %r10,%r11
> │ 0x00007f60e3a167de: shr $0x17,%r11
> │ 0x00007f60e3a167e2: movabs $0x7f60f309c048,%r8
> │ 0x00007f60e3a167ec: cmpb $0x0,(%r8,%r11,1)
> ╭ │ 0x00007f60e3a167f1: je 0x00007f60e3a16806
> ..........│..│............ checking null ..............................
> │ │ 0x00007f60e3a167f3: test %r10,%r10
> │╭ │ 0x00007f60e3a167f6: je 0x00007f60e3a16820
> ..........││.│............ checking is-forwarded ......................
> ││ │ 0x00007f60e3a167f8: mov -0x8(%r10),%r11
> ││ │ 0x00007f60e3a167fc: cmp %r10,%r11
> ││╭│ 0x00007f60e3a167ff: je 0x00007f60e3a1680b
> ..........││││............ return mess ................................
> ││││↗↗ 0x00007f60e3a16801: mov %r11,%r10
> │││╰││ 0x00007f60e3a16804: jmp 0x00007f60e3a167c7
> ↘││ ││ 0x00007f60e3a16806: mov %r10,%r11
> ││ ╰│ 0x00007f60e3a16809: jmp 0x00007f60e3a16801
> ...........││..│.......... slowpath call ..............................
> │↘ │ 0x00007f60e3a1680b: mov %r11,%rdi
> │ │ 0x00007f60e3a1680e: movabs $0x7f60f9afad70,%r10
> │ │ 0x00007f60e3a16818: callq *%r10
> │ │ 0x00007f60e3a1681b: mov %rax,%r11
> │ ╰ 0x00007f60e3a1681e: jmp 0x00007f60e3a16801
>
>
> I would have expected the branches to return straight to 0x00007f60e3a167c7 instead of jumping
> through the "return mess", since %r10 is kept untouched.
>
> -XX:+UseCompressedOops is messier:
>
> [Verified Entry Point]
> 3.26% 0x00007f39ac476150: mov %eax,-0x14000(%rsp)
> 6.60% 0x00007f39ac476157: push %rbp
> 1.94% 0x00007f39ac476158: sub $0x10,%rsp
> 1.70% 0x00007f39ac47615c: mov 0xc(%rsi),%r11d
> .......................... LRB fastpath check ..........................
> 5.84% 0x00007f39ac476160: testb $0x1,0x20(%r15)
> 2.07% ╭ 0x00007f39ac476165: jne 0x00007f39ac47617c
> .........│......... LRB fastpath ends, store to %r11 follows ...........
> 1.36% │ ↗ 0x00007f39ac476167: movl $0x2a,0xc(%r12,%r11,8)
> 13.28% │ │ 0x00007f39ac476170: add $0x10,%rsp
> 3.36% │ │ 0x00007f39ac476174: pop %rbp
> 1.90% │ │ 0x00007f39ac476175: test %eax,0x19e85e85(%rip)
> 0.98% │ │ 0x00007f39ac47617b: retq
> ---------│---│----------- LRB midpath starts --------------------------
> .........│...│............ checking in-cset ...........................
> ↘ │ 0x00007f39ac47617c: mov %r11,%r9
> │ 0x00007f39ac47617f: shl $0x3,%r9
> │ 0x00007f39ac476183: mov %r9,%r10
> │ 0x00007f39ac476186: shr $0x17,%r10
> │ 0x00007f39ac47618a: movabs $0x7f39bc0871e0,%r8
> │ 0x00007f39ac476194: cmpb $0x0,(%r8,%r10,1)
> ╭ │ 0x00007f39ac476199: je 0x00007f39ac4761ae
> ..........│..│............ checking null ..............................
> │ │ 0x00007f39ac47619b: test %r11d,%r11d
> │╭ │ 0x00007f39ac47619e: je 0x00007f39ac4761cc
> ..........││.│............ checking is-forwarded ......................
> ││ │ 0x00007f39ac4761a0: mov -0x8(%r12,%r11,8),%r9
> ││ │ 0x00007f39ac4761a5: lea (%r12,%r11,8),%r10
> ││ │ 0x00007f39ac4761a9: cmp %r10,%r9
> ││╭│ 0x00007f39ac4761ac: je 0x00007f39ac4761b7
> ..........││││............ return mess ................................
> ↘│││↗ 0x00007f39ac4761ae: mov %r9,%r11
> ││││ 0x00007f39ac4761b1: shr $0x3,%r11
> ││╰│ 0x00007f39ac4761b5: jmp 0x00007f39ac476167
> ...........││.│.......... slowpath call ...............................
> │↘ │ 0x00007f39ac4761b7: mov %r9,%rdi
> │ │ 0x00007f39ac4761ba: movabs $0x7f39c4c26d70,%r10
> │ │ 0x00007f39ac4761c4: callq *%r10
> │ │ 0x00007f39ac4761c7: mov %rax,%r9
> │ ╰ 0x00007f39ac4761ca: jmp 0x00007f39ac4761ae
>
> Same thing here, and the "return mess" also packs the reference back before returning. That seems
> useless, as %r11 still carries the packed reference on the non-in-cset path. Also, %r9 already holds
> the unpacked reference during "checking is-forwarded", having been unpacked earlier during
> "checking in-cset".
>
> Maybe LRB expansion in C2 needs touchups to handle these, to optimize code size and performance when
> GC is active.
>
> -Aleksey
>
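
The packing remark in the quoted message can be illustrated with a small Java sketch of narrow-reference packing and unpacking. This is a toy model, not HotSpot code: the class and method names are hypothetical, and it assumes a zero heap base (the in-cset check in the listing unpacks with a bare shl $0x3, which suggests this but does not prove it). The shift of 3 is taken from the shl/shr instructions in the listing.

```java
// Toy model of narrow-reference packing; not HotSpot code. Names are hypothetical.
class NarrowOopSketch {
    static final int OOP_SHIFT = 3; // matches `shl $0x3` / `shr $0x3` in the listing

    // Assumption: zero heap base, so unpacking is a plain shift of the
    // zero-extended 32-bit value.
    static long unpack(int narrow) { return (narrow & 0xFFFFFFFFL) << OOP_SHIFT; }

    // Packing drops the low alignment bits again.
    static int pack(long addr) { return (int) (addr >>> OOP_SHIFT); }

    public static void main(String[] args) {
        int narrow = 0x12345678;
        long wide = unpack(narrow);
        // Re-packing an unpacked reference gives back the value we started with.
        System.out.println(pack(wide) == narrow);
    }
}
```

Under these assumptions pack(unpack(x)) is always x, so re-packing on a path where the reference did not change only recomputes what the register already held.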