LRB midpath code quality
Roman Kennke
rkennke at redhat.com
Mon Mar 4 12:49:02 UTC 2019
Any ideas why C2 is doing this?
Roland: do you think this can be improved?
Thanks, Roman
On 4 March 2019 12:51:06 CET, Aleksey Shipilev <shade at redhat.com> wrote:
>Hi there,
>
>I have been looking into generated code quality for LRB.
>
>Run the gc-bench test that writes a single int:
> https://icedtea.classpath.org/hg/gc-bench/
>
>$ ~/trunks/shenandoah-jdk/build/linux-x86_64-server-release/images/jdk/bin/java \
>    -jar target/benchmarks.jar \
>    -jvmArgs "-XX:+UnlockExperimentalVMOptions -Xmx20g -XX:+UseShenandoahGC" \
>    writes.Plain.test_int -prof perfasm:printMargin=30 2>&1 | tee lrb.perfasm
>
>There are things to improve in the default mode, but the problem is also
>visible with -XX:-UseCompressedOops:
>
> [Verified Entry Point]
> 7.34% 0x00007f60e3a167b0: mov %eax,-0x14000(%rsp)
> 5.73% 0x00007f60e3a167b7: push %rbp
> 6.09% 0x00007f60e3a167b8: sub $0x10,%rsp
> 5.31% 0x00007f60e3a167bc: mov 0x10(%rsi),%r10
>.......................... LRB fastpath check ..........................
> 0.85% 0x00007f60e3a167c0: testb $0x1,0x20(%r15)
> 7.14% ╭ 0x00007f60e3a167c5: jne 0x00007f60e3a167db
>.........│......... LRB fastpath ends, store to %r10 follows ...........
> 0.38% │ ↗ 0x00007f60e3a167c7: movl $0x2a,0x20(%r10)
> 12.63% │ │ 0x00007f60e3a167cf: add $0x10,%rsp
> 0.40% │ │ 0x00007f60e3a167d3: pop %rbp
> 5.56% │ │ 0x00007f60e3a167d4: test %eax,0x177b9826(%rip)
> 0.29% │ │ 0x00007f60e3a167da: retq
>---------│---│----------- LRB midpath starts --------------------------
>.........│...│............ checking in-cset ...........................
> ↘ │ 0x00007f60e3a167db: mov %r10,%r11
> │ 0x00007f60e3a167de: shr $0x17,%r11
> │ 0x00007f60e3a167e2: movabs $0x7f60f309c048,%r8
> │ 0x00007f60e3a167ec: cmpb $0x0,(%r8,%r11,1)
> ╭ │ 0x00007f60e3a167f1: je 0x00007f60e3a16806
>..........│..│............ checking null ..............................
> │ │ 0x00007f60e3a167f3: test %r10,%r10
> │╭ │ 0x00007f60e3a167f6: je 0x00007f60e3a16820
>..........││.│............ checking is-forwarded ......................
> ││ │ 0x00007f60e3a167f8: mov -0x8(%r10),%r11
> ││ │ 0x00007f60e3a167fc: cmp %r10,%r11
> ││╭│ 0x00007f60e3a167ff: je 0x00007f60e3a1680b
>..........││││............ return mess ................................
> ││││↗↗ 0x00007f60e3a16801: mov %r11,%r10
> │││╰││ 0x00007f60e3a16804: jmp 0x00007f60e3a167c7
> ↘││ ││ 0x00007f60e3a16806: mov %r10,%r11
> ││ ╰│ 0x00007f60e3a16809: jmp 0x00007f60e3a16801
>...........││..│.......... slowpath call ..............................
> │↘ │ 0x00007f60e3a1680b: mov %r11,%rdi
> │ │ 0x00007f60e3a1680e: movabs $0x7f60f9afad70,%r10
> │ │ 0x00007f60e3a16818: callq *%r10
> │ │ 0x00007f60e3a1681b: mov %rax,%r11
> │ ╰ 0x00007f60e3a1681e: jmp 0x00007f60e3a16801
>
>
>I would have expected the branches to return straight to
>0x00007f60e3a167c7 instead of jumping through
>the "return mess", since %r10 is kept untouched.
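
For readers who do not read perfasm output every day, the control flow of the listing above can be modeled in plain Java. This is only a sketch under stated assumptions, not HotSpot code: the class LrbModel, its fields, and the slowpath() stand-in are hypothetical names, and the "copy" made by slowpath() is invented for illustration. What is taken from the listing is the order of the checks, the shift by 0x17 used to index the in-cset table, and the comparison of the word at -0x8 against the object itself.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the LRB control flow seen in the listing.
final class LrbModel {
    static final int REGION_SHIFT = 23;           // "shr $0x17" in the listing

    boolean gcStateBitSet;                        // models testb $0x1,0x20(%r15)
    final byte[] inCset = new byte[16];           // models the byte table addressed via %r8
    final Map<Long, Long> fwdWord = new HashMap<>(); // models the word at -0x8(obj)
    int slowpathCalls;

    // Stand-in for the runtime call; the "copy" address is made up.
    long slowpath(long obj) {
        slowpathCalls++;
        long copy = obj + (8L << REGION_SHIFT);
        fwdWord.put(obj, copy);
        return copy;
    }

    long loadReferenceBarrier(long obj) {
        if (!gcStateBitSet) {
            return obj;                           // fastpath: fall through to the store
        }
        if (inCset[(int) (obj >>> REGION_SHIFT)] == 0) {
            return obj;                           // not in cset: obj is unchanged
        }
        if (obj == 0) {
            return obj;                           // branch target is outside the listing;
        }                                         // this sketch just passes null through
        long fwd = fwdWord.getOrDefault(obj, obj); // an unforwarded object points to itself
        if (fwd != obj) {
            return fwd;                           // already forwarded: use the forwardee
        }
        return slowpath(obj);                     // in cset, not yet forwarded
    }
}
```

In this model the not-in-cset exit returns its argument as is, which corresponds to the branch that could go straight back to the store.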
>
>-XX:+UseCompressedOops is messier:
>
> [Verified Entry Point]
> 3.26% 0x00007f39ac476150: mov %eax,-0x14000(%rsp)
> 6.60% 0x00007f39ac476157: push %rbp
> 1.94% 0x00007f39ac476158: sub $0x10,%rsp
> 1.70% 0x00007f39ac47615c: mov 0xc(%rsi),%r11d
>.......................... LRB fastpath check ..........................
> 5.84% 0x00007f39ac476160: testb $0x1,0x20(%r15)
> 2.07% ╭ 0x00007f39ac476165: jne 0x00007f39ac47617c
>.........│......... LRB fastpath ends, store to %r11 follows ...........
> 1.36% │ ↗ 0x00007f39ac476167: movl $0x2a,0xc(%r12,%r11,8)
> 13.28% │ │ 0x00007f39ac476170: add $0x10,%rsp
> 3.36% │ │ 0x00007f39ac476174: pop %rbp
> 1.90% │ │ 0x00007f39ac476175: test %eax,0x19e85e85(%rip)
> 0.98% │ │ 0x00007f39ac47617b: retq
>---------│---│----------- LRB midpath starts --------------------------
>.........│...│............ checking in-cset ...........................
> ↘ │ 0x00007f39ac47617c: mov %r11,%r9
> │ 0x00007f39ac47617f: shl $0x3,%r9
> │ 0x00007f39ac476183: mov %r9,%r10
> │ 0x00007f39ac476186: shr $0x17,%r10
> │ 0x00007f39ac47618a: movabs $0x7f39bc0871e0,%r8
> │ 0x00007f39ac476194: cmpb $0x0,(%r8,%r10,1)
> ╭ │ 0x00007f39ac476199: je 0x00007f39ac4761ae
>..........│..│............ checking null ..............................
> │ │ 0x00007f39ac47619b: test %r11d,%r11d
> │╭ │ 0x00007f39ac47619e: je 0x00007f39ac4761cc
>..........││.│............ checking is-forwarded ......................
> ││ │ 0x00007f39ac4761a0: mov -0x8(%r12,%r11,8),%r9
> ││ │ 0x00007f39ac4761a5: lea (%r12,%r11,8),%r10
> ││ │ 0x00007f39ac4761a9: cmp %r10,%r9
> ││╭│ 0x00007f39ac4761ac: je 0x00007f39ac4761b7
>..........││││............ return mess ................................
> ↘│││↗ 0x00007f39ac4761ae: mov %r9,%r11
> ││││ 0x00007f39ac4761b1: shr $0x3,%r11
> ││╰│ 0x00007f39ac4761b5: jmp 0x00007f39ac476167
>...........││.│.......... slowpath call ...............................
> │↘ │ 0x00007f39ac4761b7: mov %r9,%rdi
> │ │ 0x00007f39ac4761ba: movabs $0x7f39c4c26d70,%r10
> │ │ 0x00007f39ac4761c4: callq *%r10
> │ │ 0x00007f39ac4761c7: mov %rax,%r9
> │ ╰ 0x00007f39ac4761ca: jmp 0x00007f39ac4761ae
>
>Same thing here, and the "return mess" packs the reference back before
>returning. That seems useless, as %r11 still carries the packed
>reference on the non-in-cset path. Also, %r9 already holds the unpacked
>reference during the "checking is-forwarded" block, having been unpacked
>earlier during "checking in-cset".
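
The redundancy of the repacking step can be checked with a few lines of Java. This is a sketch under assumptions: the class and method names are hypothetical, and it assumes a zero heap base, which is what the listing suggests since it shifts by 3 without adding or subtracting %r12. It mirrors "shl $0x3" from the in-cset check and "shr $0x3" from the "return mess".

```java
// Hypothetical model of the pack/unpack arithmetic in the listing,
// assuming a zero heap base and a shift of 3.
final class NarrowOopRoundTrip {
    static final int OOP_SHIFT = 3;

    // Mirrors "mov %r11,%r9; shl $0x3,%r9": widen the 32-bit value, then shift.
    static long unpack(int narrow) {
        return (narrow & 0xFFFF_FFFFL) << OOP_SHIFT;
    }

    // Mirrors "mov %r9,%r11; shr $0x3,%r11": shift back down to 32 bits.
    static int pack(long addr) {
        return (int) (addr >>> OOP_SHIFT);
    }

    // True when packing the unpacked value gives back the original bits.
    static boolean repackIsNoOp(int narrow) {
        return pack(unpack(narrow)) == narrow;
    }
}
```

Under these assumptions repackIsNoOp holds for every 32-bit input, so on the path where the reference was only unpacked for the in-cset lookup, the value already sitting in %r11 is exactly what the store needs.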
>
>Maybe LRB expansion in C2 needs touchups to handle these, to optimize
>code size and performance when
>GC is active.
>
>-Aleksey
--
This message was sent from my Android device with K-9 Mail.