LRB and 32-bit compressed oops

Fri Mar 29 09:07:54 UTC 2019

I don't think that storing the fwd ptr compressed is an option. It would have to be decoded and re-encoded in the barrier. It would also complicate and slow down everything else in the GC. And it would gain us nothing: we still need the 64 bits because of alignment. (And we already have a prototype to eliminate that word...)
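
A rough sketch of both points, not actual Shenandoah code. The names `encode`, `decode` and `alignUp`, the addresses and the object size are made up for illustration, and it assumes zero-based compressed oops with a shift of 3 and 8-byte object alignment. A compressed fwd ptr has to be decoded on every read and encoded on every install, and a 4-byte slot in front of an 8-byte-aligned object still rounds up to a full word:

```java
public class CompressedFwdPtrSketch {
    // Assumption: default object alignment of 8 bytes.
    static final int OBJECT_ALIGNMENT = 8;

    // Hypothetical zero-based decode: address = zero-extended narrow value << shift.
    public static long decode(int narrow, int shift) {
        return (narrow & 0xFFFF_FFFFL) << shift;
    }

    // Hypothetical zero-based encode: narrow value = address >>> shift.
    public static int encode(long address, int shift) {
        return (int) (address >>> shift);
    }

    // Round size up to the next multiple of alignment (a power of two).
    public static long alignUp(long size, long alignment) {
        return (size + alignment - 1) & -alignment;
    }

    public static void main(String[] args) {
        int shift = 3;
        long toSpaceCopy = 0x7_0000_1230L;        // made-up, 8-byte-aligned address

        // Installing a compressed fwd ptr needs an encode ...
        int stored = encode(toSpaceCopy, shift);
        // ... and every read through it needs a decode before the object can be used.
        long resolved = decode(stored, shift);
        System.out.println(resolved == toSpaceCopy);   // true

        // Footprint: a 4-byte slot saves nothing once the total is aligned again.
        long objectSize = 24;                          // made-up, already 8-byte aligned
        long withWideSlot = alignUp(objectSize + 8, OBJECT_ALIGNMENT);
        long withNarrowSlot = alignUp(objectSize + 4, OBJECT_ALIGNMENT);
        System.out.println(withWideSlot + " " + withNarrowSlot);   // 32 32
    }
}
```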

The question was really only: can we get rid of the superfluous mov and the extra register usage in the noop-decode case?
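
For reference, a small sketch of what "noop-decode" means here. The 4 GiB threshold in `shiftFor` is a simplified assumption about how the shift is chosen (the real heuristics have more conditions), and all names are hypothetical. With a zero base and zero shift, decoding is just a zero-extension, so a separate mov into a second register carries no information:

```java
public class NoopDecodeSketch {
    static final long GIB = 1L << 30;

    // Simplified assumption: heaps ending at or below 4 GiB use shift 0,
    // larger ones (like the -Xmx20g run in the quoted listing) use shift 3, base 0.
    public static int shiftFor(long heapEnd) {
        return heapEnd <= 4 * GIB ? 0 : 3;
    }

    // Zero-based decode: zero-extend the 32-bit value, then shift.
    public static long decode(int narrow, int shift) {
        return (narrow & 0xFFFF_FFFFL) << shift;
    }

    public static void main(String[] args) {
        int narrow = 0x1234_5678;   // made-up compressed reference

        // Small heap: decode leaves the value unchanged (the noop case).
        System.out.println(Long.toHexString(decode(narrow, shiftFor(2 * GIB))));   // 12345678

        // 20 GiB heap: decode is a real shl $0x3, as in the listing.
        System.out.println(Long.toHexString(decode(narrow, shiftFor(20 * GIB))));  // 91a2b3c0
    }
}
```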

Roman

On 29 March 2019 09:32:35 CET, Roland Westrelin <rwestrel at redhat.com> wrote:
>> Run with -Xmx20g, thus enabling compressed oops, you shall see this:
>>
>>               [Verified Entry Point]
>>   6.94%         0x00007f60c0497050: mov    %eax,-0x14000(%rsp)
>>   5.80%         0x00007f60c0497057: push   %rbp
>>   0.30%         0x00007f60c0497058: sub    $0x10,%rsp
>>  11.81%         0x00007f60c049705c: mov    0xc(%rsi),%r11d
>>   0.82%         0x00007f60c0497060: mov    %r11,%r9
>>   0.48%         0x00007f60c0497063: shl    $0x3,%r9
>> .......................... LRB fastpath check ..........................
>>   5.29%         0x00007f60c0497067: testb  $0x1,0x20(%r15)
>>   5.49%  ╭      0x00007f60c049706c: jne    0x00007f60c0497086
>> .........│......... LRB fastpath ends, store to %r9 follows ............
>>   0.87%  │↗ ↗↗  0x00007f60c049706e: movl   $0x2a,0xc(%r9)
>>   7.59%  ││ ││  0x00007f60c0497076: add    $0x10,%rsp
>>   6.12%  ││ ││  0x00007f60c049707a: pop    %rbp
>>   1.01%  ││ ││  0x00007f60c049707b: mov    0x108(%r15),%r10
>>   0.63%  ││ ││  0x00007f60c0497082: test   %eax,(%r10)
>>   6.73%  ││ ││  0x00007f60c0497085: retq
>> ---------││-││----------- LRB midpath starts --------------------------
>> .........│|.|│............ checking in-cset ...........................
>>          ↘│ ││  0x00007f60c0497086: mov    %r9,%r10
>>           │ ││  0x00007f60c0497089: shr    $0x17,%r10
>>           │ ││  0x00007f60c049708d: movabs $0x7f60d00919f0,%r8
>>           │ ││  0x00007f60c0497097: cmpb   $0x0,(%r8,%r10,1)
>>           ╰ ││  0x00007f60c049709c: je     0x00007f60c049706e
>> ............││............ checking is-forwarded ......................
>>             ││  0x00007f60c049709e: mov    -0x8(%r12,%r11,8),%r9
>>             ││  0x00007f60c04970a3: lea    (%r12,%r11,8),%r10
>>             ││  0x00007f60c04970a7: cmp    %r10,%r9
>>             ╰│  0x00007f60c04970aa: jne    0x00007f60c049706e
>> .............│............... slow path call ..........................
>>              │  0x00007f60c04970ac: mov    %r9,%rdi
>>              │  0x00007f60c04970af: movabs $0x7f60d7775030,%r10
>>              │  0x00007f60c04970b9: callq  *%r10
>>              │  0x00007f60c04970bc: mov    %rax,%r9
>>              ╰  0x00007f60c04970bf: jmp    0x00007f60c049706e
>
>So why not store the forwarding pointer compressed? Decoding would then
>happen after the LRB. So this code:
>
>   0.82%         0x00007f60c0497060: mov    %r11,%r9
>   0.48%         0x00007f60c0497063: shl    $0x3,%r9
>
>would fold into the following access:
>
>   0.87%  │↗ ↗↗  0x00007f60c049706e: movl   $0x2a,0xc(%r9)
>
>and would be essentially free. I suppose this:
>
>          ↘│ ││  0x00007f60c0497086: mov    %r9,%r10
>           │ ││  0x00007f60c0497089: shr    $0x17,%r10
>
>could be adjusted so there's no need to decode the value here. And
>decoding here:
>
>             ││  0x00007f60c049709e: mov    -0x8(%r12,%r11,8),%r9
>
>is already folded in the forwarding pointer access.
>
>I suppose this would help the other case you mention where decoding is a
>noop.
>
>Roland.
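
For readers following the quoted listing, its control flow can be modelled roughly as below. This is only a sketch of the structure visible in the disassembly, not the real barrier. The class and member names are hypothetical, the flag tested at 0x20(%r15) is assumed to mean "heap has forwarded objects", a map stands in for the fwd ptr word at -0x8 from the object, and the slow path simply pretends to copy the object to a fixed offset.

```java
import java.util.HashMap;
import java.util.Map;

public class LrbSketch {
    // Matches "shr $0x17" in the listing: one cset table entry per 8 MiB region.
    static final int REGION_SHIFT = 23;

    public boolean hasForwarded;                        // stand-in for testb $0x1,0x20(%r15)
    public final boolean[] inCset;                      // stand-in for the byte table at (%r8,%r10,1)
    final Map<Long, Long> fwdPtr = new HashMap<>();     // stand-in for the word at -0x8(obj)

    public LrbSketch(int regions) {
        inCset = new boolean[regions];
    }

    public long loadReferenceBarrier(long obj) {
        // Fast path: nothing is forwarded, use the reference as loaded.
        if (!hasForwarded) return obj;

        // Mid path, step 1: object is not in the collection set.
        if (!inCset[(int) (obj >>> REGION_SHIFT)]) return obj;

        // Mid path, step 2: a fwd ptr that differs from the object itself
        // means the object was already copied; use the copy.
        long fwd = fwdPtr.getOrDefault(obj, obj);
        if (fwd != obj) return fwd;

        // Slow path: the runtime call in the listing.
        return slowPath(obj);
    }

    // Hypothetical evacuation: "copies" the object two regions further up.
    long slowPath(long obj) {
        long copy = obj + (2L << REGION_SHIFT);
        fwdPtr.put(obj, copy);
        return copy;
    }

    public static void main(String[] args) {
        LrbSketch lrb = new LrbSketch(4);
        long obj = 1L << REGION_SHIFT;                  // made-up address in region 1

        System.out.println(lrb.loadReferenceBarrier(obj) == obj);   // true: fast path
        lrb.hasForwarded = true;
        lrb.inCset[1] = true;
        long copy = lrb.loadReferenceBarrier(obj);                   // slow path evacuates
        System.out.println(lrb.loadReferenceBarrier(obj) == copy);  // true: is-forwarded hit
    }
}
```

In this model the only place a compressed fwd ptr would matter is the map lookup and `slowPath`, which is where a decode on read and an encode on install would have to be added.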
