LRB and 32-bit compressed oops
Roman Kennke
rkennke at redhat.com
Fri Mar 29 09:07:54 UTC 2019
I don't think that storing the fwd ptr compressed is an option. It would have to be decoded and re-encoded in the barrier. It would complicate and slow down everything else in GC. And it would gain us nothing: we still need the 64 bits because of alignment. (And we already have a prototype to eliminate that word...)
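For reference, here is a rough Python model of the barrier in the quoted listing. It is only a sketch read off the disassembly, not the actual HotSpot code: the function and parameter names are made up, the heap is modelled as a dict from address to 64-bit word, and the in-cset table is modelled as a dict keyed by region index. The point is that the forwarding pointer read at offset -8 is used directly as the result; a compressed forwarding pointer would need an extra decode at that spot.

```python
REGION_SHIFT = 23   # taken from `shr $0x17,%r10` in the listing
FWD_OFFSET = -8     # taken from `mov -0x8(%r12,%r11,8),%r9`

def lrb(narrow, heap, cset, gc_state, slow_path, base=0, shift=3):
    """Hypothetical model of the load-reference-barrier paths in the listing."""
    obj = base + (narrow << shift)           # decode: mov %r11,%r9 ; shl $0x3,%r9
    if gc_state & 1 == 0:                    # fastpath: testb $0x1,0x20(%r15)
        return obj
    if not cset.get(obj >> REGION_SHIFT, 0): # midpath: region not in collection set
        return obj
    fwd = heap[obj + FWD_OFFSET]             # full 64-bit forwarding pointer
    if fwd != obj:                           # already forwarded: use it as is
        return fwd
    return slow_path(obj)                    # slow path call

# Example values (all invented): object at 0x800000, forwarded to 0x1000000.
heap = {0x800000 + FWD_OFFSET: 0x1000000}
cset = {0x800000 >> REGION_SHIFT: 1}
result = lrb(0x100000, heap, cset, gc_state=1, slow_path=lambda o: o)
```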
The question was really only whether we can get rid of the superfluous mov and register usage in the noop-decode case.
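To spell out what "noop-decode" means, here is a small sketch of the decode arithmetic. The helper names are hypothetical, and the base/shift values are assumptions: base 0 with shift 3 matches the `shl $0x3` in the quoted listing, while base 0 with shift 0 is what I would expect when the heap is small enough for narrow oops to be used as addresses unchanged.

```python
def decode(narrow, base=0, shift=3):
    """Hypothetical helper: full address for a 32-bit narrow oop."""
    return base + (narrow << shift)

def encode(addr, base=0, shift=3):
    """Hypothetical helper: inverse of decode for suitably aligned addresses."""
    return (addr - base) >> shift

# Zero-based, shift 3: corresponds to mov %r11,%r9 ; shl $0x3,%r9
scaled = decode(0x1000)

# Base 0, shift 0: decode is the identity, so a separate mov into
# another register does no useful work.
unscaled = decode(0x1234, base=0, shift=0)
```

With shift 0 and base 0, the decoded value equals the narrow value, which is why the extra mov and the extra register in that case look avoidable.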
Roman
On 29 March 2019 09:32:35 CET, Roland Westrelin <rwestrel at redhat.com> wrote:
>> Run with -Xmx20g, thus enabling compressed oops, you shall see this:
>>
>> [Verified Entry Point]
>> 6.94% 0x00007f60c0497050: mov %eax,-0x14000(%rsp)
>> 5.80% 0x00007f60c0497057: push %rbp
>> 0.30% 0x00007f60c0497058: sub $0x10,%rsp
>> 11.81% 0x00007f60c049705c: mov 0xc(%rsi),%r11d
>> 0.82% 0x00007f60c0497060: mov %r11,%r9
>> 0.48% 0x00007f60c0497063: shl $0x3,%r9
>> .......................... LRB fastpath check ..........................
>> 5.29% 0x00007f60c0497067: testb $0x1,0x20(%r15)
>> 5.49% ╭ 0x00007f60c049706c: jne 0x00007f60c0497086
>> .........│......... LRB fastpath ends, store to %r9 follows ............
>> 0.87% │↗ ↗↗ 0x00007f60c049706e: movl $0x2a,0xc(%r9)
>> 7.59% ││ ││ 0x00007f60c0497076: add $0x10,%rsp
>> 6.12% ││ ││ 0x00007f60c049707a: pop %rbp
>> 1.01% ││ ││ 0x00007f60c049707b: mov 0x108(%r15),%r10
>> 0.63% ││ ││ 0x00007f60c0497082: test %eax,(%r10)
>> 6.73% ││ ││ 0x00007f60c0497085: retq
>> ---------││-││----------- LRB midpath starts --------------------------
>> .........│|.|│............ checking in-cset ...........................
>> ↘│ ││ 0x00007f60c0497086: mov %r9,%r10
>> │ ││ 0x00007f60c0497089: shr $0x17,%r10
>> │ ││ 0x00007f60c049708d: movabs $0x7f60d00919f0,%r8
>> │ ││ 0x00007f60c0497097: cmpb $0x0,(%r8,%r10,1)
>> ╰ ││ 0x00007f60c049709c: je 0x00007f60c049706e
>> ............││............ checking is-forwarded ......................
>> ││ 0x00007f60c049709e: mov -0x8(%r12,%r11,8),%r9
>> ││ 0x00007f60c04970a3: lea (%r12,%r11,8),%r10
>> ││ 0x00007f60c04970a7: cmp %r10,%r9
>> ╰│ 0x00007f60c04970aa: jne 0x00007f60c049706e
>> .............│............... slow path call ..........................
>> │ 0x00007f60c04970ac: mov %r9,%rdi
>> │ 0x00007f60c04970af: movabs $0x7f60d7775030,%r10
>> │ 0x00007f60c04970b9: callq *%r10
>> │ 0x00007f60c04970bc: mov %rax,%r9
>> ╰ 0x00007f60c04970bf: jmp 0x00007f60c049706e
>
>So why not store the forwarding pointer compressed? Decoding would then
>happen after the LRB. So this code:
>
> 0.82% 0x00007f60c0497060: mov %r11,%r9
> 0.48% 0x00007f60c0497063: shl $0x3,%r9
>
>would fold into the following access:
>
> 0.87% │↗ ↗↗ 0x00007f60c049706e: movl $0x2a,0xc(%r9)
>
>and would be essentially free. I suppose this:
>
> ↘│ ││ 0x00007f60c0497086: mov %r9,%r10
> │ ││ 0x00007f60c0497089: shr $0x17,%r10
>
>could be adjusted so there's no need to decode the value here. And
>decoding here:
>
> ││ 0x00007f60c049709e: mov -0x8(%r12,%r11,8),%r9
>
>is already folded in the forwarding pointer access.
>
>I suppose this would help the other case you mention where decoding is
>a noop.
>
>Roland.
--
This message was sent from my Android device with K-9 Mail.