LRB and 32-bit compressed oops

Roland Westrelin rwestrel at redhat.com
Fri Mar 29 08:32:35 UTC 2019


> Run with -Xmx20g, thus enabling compressed oops, you shall see this:
>
>               [Verified Entry Point]
>   6.94%         0x00007f60c0497050: mov    %eax,-0x14000(%rsp)
>   5.80%         0x00007f60c0497057: push   %rbp
>   0.30%         0x00007f60c0497058: sub    $0x10,%rsp
>  11.81%         0x00007f60c049705c: mov    0xc(%rsi),%r11d
>   0.82%         0x00007f60c0497060: mov    %r11,%r9
>   0.48%         0x00007f60c0497063: shl    $0x3,%r9
> .......................... LRB fastpath check ..........................
>   5.29%         0x00007f60c0497067: testb  $0x1,0x20(%r15)
>   5.49%  ╭      0x00007f60c049706c: jne    0x00007f60c0497086
> .........│......... LRB fastpath ends, store to %r9 follows ............
>   0.87%  │↗ ↗↗  0x00007f60c049706e: movl   $0x2a,0xc(%r9)
>   7.59%  ││ ││  0x00007f60c0497076: add    $0x10,%rsp
>   6.12%  ││ ││  0x00007f60c049707a: pop    %rbp
>   1.01%  ││ ││  0x00007f60c049707b: mov    0x108(%r15),%r10
>   0.63%  ││ ││  0x00007f60c0497082: test   %eax,(%r10)
>   6.73%  ││ ││  0x00007f60c0497085: retq
> ---------││-││----------- LRB midpath starts --------------------------
> .........││.││............ checking in-cset ...........................
>          ↘│ ││  0x00007f60c0497086: mov    %r9,%r10
>           │ ││  0x00007f60c0497089: shr    $0x17,%r10
>           │ ││  0x00007f60c049708d: movabs $0x7f60d00919f0,%r8
>           │ ││  0x00007f60c0497097: cmpb   $0x0,(%r8,%r10,1)
>           ╰ ││  0x00007f60c049709c: je     0x00007f60c049706e
> ............││............ checking is-forwarded ......................
>             ││  0x00007f60c049709e: mov    -0x8(%r12,%r11,8),%r9
>             ││  0x00007f60c04970a3: lea    (%r12,%r11,8),%r10
>             ││  0x00007f60c04970a7: cmp    %r10,%r9
>             ╰│  0x00007f60c04970aa: jne    0x00007f60c049706e
> .............│............... slow path call ..........................
>              │  0x00007f60c04970ac: mov    %r9,%rdi
>              │  0x00007f60c04970af: movabs $0x7f60d7775030,%r10
>              │  0x00007f60c04970b9: callq  *%r10
>              │  0x00007f60c04970bc: mov    %rax,%r9
>              ╰  0x00007f60c04970bf: jmp    0x00007f60c049706e

So why not store the forwarding pointer in compressed form? Decoding would
then happen after the LRB, and this code:

   0.82%         0x00007f60c0497060: mov    %r11,%r9
   0.48%         0x00007f60c0497063: shl    $0x3,%r9

would fold into the following access:

   0.87%  │↗ ↗↗  0x00007f60c049706e: movl   $0x2a,0xc(%r9)

and would be essentially free. I suppose this:

          ↘│ ││  0x00007f60c0497086: mov    %r9,%r10
           │ ││  0x00007f60c0497089: shr    $0x17,%r10

could be adjusted to work on the compressed value directly (for instance
with a smaller shift count), so there's no need to decode the value here.
And the decoding here:

             ││  0x00007f60c049709e: mov    -0x8(%r12,%r11,8),%r9

is already folded into the forwarding pointer load.
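
To make the arithmetic concrete, here is a small sketch of why the in-cset
check and the store can both work from the compressed value. It is only a
model, not HotSpot code. It assumes zero-based compressed oops with a shift
of 3 (as the "shl $0x3" suggests) and a region shift of 23 (from
"shr $0x17"); all constant and function names are made up for illustration.

```python
OOP_SHIFT = 3      # assumed: matches "shl $0x3,%r9" in the listing
REGION_SHIFT = 23  # assumed: matches "shr $0x17,%r10" in the listing
HEAP_BASE = 0      # assumed: zero-based compressed oops


def decode(narrow):
    """Turn a compressed reference into a full address."""
    return HEAP_BASE + (narrow << OOP_SHIFT)


def region_index_from_address(addr):
    """Index into the cset table as the current code computes it."""
    return addr >> REGION_SHIFT


def region_index_from_narrow(narrow):
    """Same index computed from the compressed value, without decoding.

    Only valid under the zero-base assumption above.
    """
    return narrow >> (REGION_SHIFT - OOP_SHIFT)


def field_address(narrow, offset):
    """Model of a scaled addressing mode, offset(base, index, 8)."""
    return HEAP_BASE + narrow * 8 + offset
```

Under these assumptions the cset lookup would shift by 20 instead of 23,
and the store could address the field straight from the compressed value.
With a non-zero heap base the index computation would need a further
adjustment that this sketch does not cover.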

I suppose this would help in the other case you mention, where decoding is
a no-op.
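
As a rough illustration of the whole idea, the barrier could be modelled as
operating purely on compressed values, with the decode left to whoever uses
the result. This is a toy model whose control flow follows the listing
above; the names (lrb_narrow, gc_state, cset, fwd, slow_path) and the
dict/set representation are hypothetical and not how Shenandoah stores
anything.

```python
def decode(narrow, base=0, shift=3):
    """Decode a compressed reference; with shift=0 and base=0 it is a no-op."""
    return base + (narrow << shift)


def lrb_narrow(narrow, gc_state, cset, fwd, slow_path):
    """Toy load reference barrier that never decodes its argument.

    gc_state  -- flag word; bit 0 set sends us to the mid-path
    cset      -- set of compressed references in the collection set
    fwd       -- dict of compressed forwarding pointers
    slow_path -- callable invoked for an object not yet forwarded
    """
    if gc_state & 1 == 0:
        return narrow                      # fast path: nothing to do
    if narrow not in cset:
        return narrow                      # not in the collection set
    forwardee = fwd.get(narrow, narrow)    # compressed forwarding pointer
    if forwardee != narrow:
        return forwardee                   # already forwarded
    return slow_path(narrow)               # slow path call
```

The caller would then apply decode() to the returned value, or fold it into
the memory access, and when the shift is zero that step disappears
entirely.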

Roland.


