Perf: excess store in allocation fast path?

Aleksey Shipilev shade at redhat.com
Tue Dec 6 19:29:28 UTC 2016


On 12/06/2016 08:25 PM, Roman Kennke wrote:
> On Tuesday, 06.12.2016, at 19:39 +0100, Aleksey Shipilev wrote:
>> I think we have an excess store in the allocation fast path; compare
>> Shenandoah [1] and Parallel [2]. This is not the fwdptr store, but
>> seems to be excess zeroing. In that test, allocating a simple Object
>> yields this:
>>
>>   mov    %r11,(%rax)            ; mark word
>>   prefetchnta 0xc0(%r10)
>>   movl   $0xf80001dd,0x8(%rax)  ; class word
>>   mov    %rax,-0x8(%rax)        ; fwdptr
>>   mov    %r12d,0xc(%rax)        ; zeroing last 4 bytes
>>   mov    %r12,0x10(%rax)        ; <--- hey, what?
>>
>> I think this happens because the allocation fast path bumps the instance
>> size to "cover" for the upcoming object's fwdptr, and accidentally zeroes
>> it as well? Do we need this? I can imagine an invariant that everything up
>> to the top pointer must be zeroed; is this such a case?
> 
> It looks like initialization of the first field in the object. Maybe
> we're missing the C2 optimization that eliminates initial zeroing for
> fields? Maybe our barrier or allocation code somehow gets in the way, so
> that C2 can't see the initialization and therefore cannot optimize it
> away?

The test allocates new Object(), which has no fields. The object is 16 bytes
long, yet we store 8 bytes at offset 0x10, i.e. past the end of the object --
which AFAIR is the slot for the next object's forwarding pointer.
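
To spell out the arithmetic, here is a small Java sketch. It is not taken from
the Shenandoah sources: the class and method names are made up for
illustration, the offsets and widths are read off the listing quoted above,
and the layout (8-byte fwdptr immediately before each object, next object
allocated right behind this one) is an assumption based on my recollection.

```java
public class FastPathStores {
    // Assumed layout: the fwdptr sits in the 8 bytes before the object start,
    // and new Object() itself is 16 bytes long.
    static final int FWDPTR_BYTES = 8;
    static final int OBJECT_BYTES = 16;

    // Stores from the quoted listing as {offset from %rax, width in bytes}.
    static final int[][] STORES = {
        { 0x00, 8},  // mov  %r11,(%rax)        mark word
        { 0x08, 4},  // movl $...,0x8(%rax)     class word
        {-0x08, 8},  // mov  %rax,-0x8(%rax)    fwdptr
        { 0x0c, 4},  // mov  %r12d,0xc(%rax)    zeroing last 4 bytes
        { 0x10, 8},  // mov  %r12,0x10(%rax)    the suspicious store
    };

    // True if the store touches bytes outside [-8, 16), i.e. outside this
    // object's own fwdptr slot plus its 16-byte body.
    static boolean outsideOwnFootprint(int offset, int width) {
        return offset < -FWDPTR_BYTES || offset + width > OBJECT_BYTES;
    }

    // True if the store overlaps the fwdptr slot of the next object, assuming
    // that object is laid out directly after this one.
    static boolean hitsNextFwdptrSlot(int offset, int width) {
        int nextObjectStart = OBJECT_BYTES + FWDPTR_BYTES;  // 24
        int slotStart = nextObjectStart - FWDPTR_BYTES;     // 16
        return offset < nextObjectStart && offset + width > slotStart;
    }

    public static void main(String[] args) {
        for (int[] s : STORES) {
            System.out.printf("offset %+d, width %d: outside=%b, nextFwdptr=%b%n",
                    s[0], s[1],
                    outsideOwnFootprint(s[0], s[1]),
                    hitsNextFwdptrSlot(s[0], s[1]));
        }
    }
}
```

Under these assumptions only the last store falls outside the object's own
footprint, and it covers exactly the bytes where the next object's fwdptr
would live.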

Thanks,
-Aleksey





More information about the shenandoah-dev mailing list