Perf: excess store in allocation fast path?
Aleksey Shipilev
shade at redhat.com
Tue Dec 6 19:29:28 UTC 2016
On 12/06/2016 08:25 PM, Roman Kennke wrote:
> On Tuesday, 06.12.2016 at 19:39 +0100, Aleksey Shipilev wrote:
>> I think we have an excess store in the allocation fast path, compare
>> Shenandoah [1] and Parallel [2]. It is not the fwdptr store, but
>> seems to be excess zeroing. In that test, allocating a simple Object
>> yields this:
>>
>> mov %r11,(%rax) ; mark word
>> prefetchnta 0xc0(%r10)
>> movl $0xf80001dd,0x8(%rax) ; class word
>> mov %rax,-0x8(%rax) ; fwdptr
>> mov %r12d,0xc(%rax) ; zeroing last 4 bytes
>> mov %r12,0x10(%rax) ; <--- hey, what?
>>
>> I think this happens because allocation fastpath bumps the instance size
>> to "cover" for the upcoming object's fwdptr, and accidentally zeroes it as
>> well? Do we need this? I can imagine an invariant that everything up to
>> the top pointer should be zeroed; is this such a case?
>
> It looks like initialization for the first field in the object. Maybe
> we're failing the c2 opt that eliminates initial zeroing for fields?
> Maybe our barrier or allocation stuff somehow gets in the way of that
> and c2 can't see the initialization and therefore cannot optimize it
> away?
The test allocates new Object(), no fields. The object is 16 bytes long, yet we
store something beyond 16 bytes -- which AFAIR is the slot for the next object's
forwarding pointer.
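
To make the offset arithmetic concrete, here is a minimal sketch (not HotSpot
code; the class, method, and constant names are hypothetical) that classifies
each store from the disassembly above by its offset from %rax. It assumes an
8-byte fwdptr word directly before each object, as the -0x8 store suggests,
and a 16-byte object, with store widths taken from the operand sizes:

```java
public class FastPathStores {
    // Assumption: one 8-byte fwdptr word precedes every object.
    static final int FWDPTR_SIZE = 8;
    // new Object(): 8-byte mark word + 4-byte class word + 4 bytes of padding.
    static final int OBJECT_SIZE = 16;

    /** Classifies a store of `width` bytes at `offset` relative to the object start (%rax). */
    static String classify(int offset, int width) {
        int end = offset + width;
        if (offset >= -FWDPTR_SIZE && end <= 0) {
            return "own fwdptr";
        }
        if (offset >= 0 && end <= OBJECT_SIZE) {
            return "object body";
        }
        if (offset >= OBJECT_SIZE && end <= OBJECT_SIZE + FWDPTR_SIZE) {
            return "next object's fwdptr slot";
        }
        return "other";
    }

    public static void main(String[] args) {
        // {offset, width} pairs in the order they appear in the disassembly.
        int[][] stores = { {0x0, 8}, {0x8, 4}, {-0x8, 8}, {0xc, 4}, {0x10, 8} };
        for (int[] s : stores) {
            System.out.printf("offset %d, width %d: %s%n", s[0], s[1], classify(s[0], s[1]));
        }
    }
}
```

Under these assumptions the first four stores land in the object or its own
fwdptr, and only the 8-byte store at 0x10 falls past the 16-byte object, into
what would be the next object's fwdptr slot.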
Thanks,
-Aleksey
More information about the shenandoah-dev mailing list