LRB and 32-bit compressed oops

Mon Mar 25 19:19:26 UTC 2019

For some reason I don't seem to be getting your asm output. But can you
try the following patch real quick?

diff -r 6277fcfd1269
src/hotspot/share/gc/shenandoah/c2/shenandoahSupport.cpp

--- a/src/hotspot/share/gc/shenandoah/c2/shenandoahSupport.cpp	Mon Mar
25 19:57:08 2019 +0100
+++ b/src/hotspot/share/gc/shenandoah/c2/shenandoahSupport.cpp	Mon Mar
25 20:18:53 2019 +0100
@@ -1508,7 +1508,7 @@
       IfNode* iff = unc_ctrl->in(0)->as_If();
       phase->igvn().replace_input_of(iff, 1, phase->igvn().intcon(1));
     }
-    Node* addr = new AddPNode(new_val, uncasted_val,
phase->igvn().MakeConX(ShenandoahBrooksPointer::byte_offset()));
+    Node* addr = new AddPNode(new_val, new_val,
phase->igvn().MakeConX(ShenandoahBrooksPointer::byte_offset()));
     phase->register_new_node(addr, ctrl);
     assert(val->bottom_type()->isa_oopptr(), "what else?");
     const TypePtr* obj_type =  val->bottom_type()->is_oopptr();


Thanks,
Roman

Am 25.03.19 um 18:17 schrieb Aleksey Shipilev:
> Hi again,
> 
> I was following up on experiments with LRB vs non-LRB, and spotted the thing about 32-bit compressed
> oops.
> 
> Run the gc-bench test that writes a single int:
>  https://icedtea.classpath.org/hg/gc-bench/
> 
> $ ~/trunks/shenandoah-jdk/build/linux-x86_64-server-release/images/jdk/bin/java -jar
> target/benchmarks.jar -jvmArgs "-XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC"
> writes.Plain.test_int -prof perfasm:printMargin=30 2>&1 | tee lrb.perfasm
> 
> Run with -Xmx20g, thus enabling compressed oops, you shall see this:
> 
>               [Verified Entry Point]
>   6.94%         0x00007f60c0497050: mov    %eax,-0x14000(%rsp)
>   5.80%         0x00007f60c0497057: push   %rbp
>   0.30%         0x00007f60c0497058: sub    $0x10,%rsp
>  11.81%         0x00007f60c049705c: mov    0xc(%rsi),%r11d
>   0.82%         0x00007f60c0497060: mov    %r11,%r9
>   0.48%         0x00007f60c0497063: shl    $0x3,%r9
> .......................... LRB fastpath check ..........................
>   5.29%         0x00007f60c0497067: testb  $0x1,0x20(%r15)
>   5.49%  ╭      0x00007f60c049706c: jne    0x00007f60c0497086
> .........│......... LRB fastpath ends, store to %r9 follows ............
>   0.87%  │↗ ↗↗  0x00007f60c049706e: movl   $0x2a,0xc(%r9)
>   7.59%  ││ ││  0x00007f60c0497076: add    $0x10,%rsp
>   6.12%  ││ ││  0x00007f60c049707a: pop    %rbp
>   1.01%  ││ ││  0x00007f60c049707b: mov    0x108(%r15),%r10
>   0.63%  ││ ││  0x00007f60c0497082: test   %eax,(%r10)
>   6.73%  ││ ││  0x00007f60c0497085: retq
> ---------││-││----------- LRB midpath starts --------------------------
> .........│|.|│............ checking in-cset ...........................
>          ↘│ ││  0x00007f60c0497086: mov    %r9,%r10
>           │ ││  0x00007f60c0497089: shr    $0x17,%r10
>           │ ││  0x00007f60c049708d: movabs $0x7f60d00919f0,%r8
>           │ ││  0x00007f60c0497097: cmpb   $0x0,(%r8,%r10,1)
>           ╰ ││  0x00007f60c049709c: je     0x00007f60c049706e
> ............││............ checking is-forwarded ......................
>             ││  0x00007f60c049709e: mov    -0x8(%r12,%r11,8),%r9
>             ││  0x00007f60c04970a3: lea    (%r12,%r11,8),%r10
>             ││  0x00007f60c04970a7: cmp    %r10,%r9
>             ╰│  0x00007f60c04970aa: jne    0x00007f60c049706e
> .............│............... slow path call ..........................
>              │  0x00007f60c04970ac: mov    %r9,%rdi
>              │  0x00007f60c04970af: movabs $0x7f60d7775030,%r10
>              │  0x00007f60c04970b9: callq  *%r10
>              │  0x00007f60c04970bc: mov    %rax,%r9
>              ╰  0x00007f60c04970bf: jmp    0x00007f60c049706e
> 
> This is actually good code. But if you add -Xmx1g, thus enabling 32-bit compressed oops, you would
> expect decode to go away in favor of just using the (extended) 32-bit value. Shifts are indeed gone,
> but register moves are still there. And that, I think, wastes registers, see:
> 
>               [Verified Entry Point]
>   6.85%         0x00007fb1284982d0: mov    %eax,-0x14000(%rsp)
>   5.71%         0x00007fb1284982d7: push   %rbp
>   3.14%         0x00007fb1284982d8: sub    $0x10,%rsp
>   7.32%         0x00007fb1284982dc: mov    0xc(%rsi),%r11d
>   2.46%         0x00007fb1284982e0: mov    %r11,%r9           <---- !!!!
> .......................... LRB fastpath check ..........................
>   2.97%         0x00007fb1284982e3: testb  $0x1,0x20(%r15)
>   3.45%  ╭      0x00007fb1284982e8: jne    0x00007fb128498302
> .........│......... LRB fastpath ends, store to %r9 follows ............
>   3.51%  │↗ ↗↗  0x00007fb1284982ea: movl   $0x2a,0xc(%r9)
>   7.30%  ││ ││  0x00007fb1284982f2: add    $0x10,%rsp
>   3.12%  ││ ││  0x00007fb1284982f6: pop    %rbp
>   2.91%  ││ ││  0x00007fb1284982f7: mov    0x108(%r15),%r10
>   3.23%  ││ ││  0x00007fb1284982fe: test   %eax,(%r10)
>   4.63%  ││ ││  0x00007fb128498301: retq
> ---------││-││----------- LRB midpath starts --------------------------
> .........│|.|│............ checking in-cset ...........................
>          ↘│ ││  0x00007fb128498302: mov    %r9,%r10
>           │ ││  0x00007fb128498305: shr    $0x13,%r10
>           │ ││  0x00007fb128498309: movabs $0x7fb13808d770,%r8
>           │ ││  0x00007fb128498313: cmpb   $0x0,(%r8,%r10,1)
>           ╰ ││  0x00007fb128498318: je     0x00007fb1284982ea
> ............││............ checking is-forwarded ......................
>             ││  0x00007fb12849831a: mov    -0x8(%r11),%r9
>             ││  0x00007fb12849831e: mov    %r11,%r10         <---- !!!!
>             ││  0x00007fb128498321: cmp    %r10,%r9
>             ╰│  0x00007fb128498324: jne    0x00007fb1284982ea
> .............│............... slow path call ..........................
>              │  0x00007fb128498326: mov    %r9,%rdi
>              │  0x00007fb128498329: movabs $0x7fb13f963030,%r10
>              │  0x00007fb128498333: callq  *%r10
>              │  0x00007fb128498336: mov    %rax,%r9
>              ╰  0x00007fb128498339: jmp    0x00007fb1284982ea
> 
> 32-bit compressed oops mode is interesting, because it is the microservice range. Not sure it is LRB
> problem, or a generic C2 one.
> 
> Thanks,
> -Aleksey
> 
>