Perf: excess store in allocation fast path?
Roman Kennke
rkennke at redhat.com
Tue Dec 6 19:44:14 UTC 2016
On Tuesday, 06.12.2016 at 20:29 +0100, Aleksey Shipilev wrote:
> On 12/06/2016 08:25 PM, Roman Kennke wrote:
> > On Tuesday, 06.12.2016 at 19:39 +0100, Aleksey Shipilev wrote:
> > > I think we have the excess store at allocation fast path, compare
> > > Shenandoah [1] and Parallel [2]. And this is not storing the fwdptr,
> > > but seems to be the excess zeroing. In that test, allocating a simple
> > > Object yields this:
> > >
> > > mov %r11,(%rax) ; mark word
> > > prefetchnta 0xc0(%r10)
> > > movl $0xf80001dd,0x8(%rax) ; class word
> > > mov %rax,-0x8(%rax) ; fwdptr
> > > mov %r12d,0xc(%rax) ; zeroing last 4 bytes
> > > mov %r12,0x10(%rax) ; <--- hey, what?
> > >
> > > I think this happens because allocation fastpath bumps the instance
> > > size to "cover" for the upcoming object's fwdptr, and accidentally
> > > zeroes it as well? Do we need this? I can imagine the invariant that
> > > everything up to top pointer should be zeroed, is this such a case?
> >
> > It looks like initialization for the first field in the object. Maybe
> > we're failing the c2 opt that eliminates initial zeroing for fields?
> > Maybe our barrier or allocation stuff somehow gets in the way of that
> > and c2 can't see the initialization and therefore cannot optimize it
> > away?
>
> The test allocates new Object(), no fields. The object is 16 bytes long,
> yet we store something beyond 16 bytes -- which AFAIR is the slot for the
> next object's forwarding pointer.
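Try the attached patch. It saves the original object size before the
Brooks pointer bytes are added, and passes that saved size to
initialize_object(), so initialization should no longer zero the next
object's fwdptr slot.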
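
To make the size arithmetic concrete, here is a rough toy model of the
allocation path, not HotSpot code. ToyHeap, allocate() and
bytes_zeroed_past_end() are hypothetical names made up for this sketch, and
the 12-byte header and 8-byte fwdptr sizes are assumptions taken from the
disassembly quoted above. It only illustrates that zeroing with the bumped
size runs 8 bytes past the object, while zeroing with the original size
stays inside it.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical constants for illustration only. They mirror the sizes seen
// in the thread (16-byte Object, 8-byte fwdptr), not HotSpot definitions.
constexpr std::size_t kFwdPtrBytes = 8;   // slot in front of every object
constexpr std::size_t kHeaderBytes = 12;  // 8-byte mark word + 4-byte class word

struct ToyHeap {
    std::vector<std::uint8_t> mem;
    std::size_t top = 0;  // start of the next allocation chunk

    // 0xAB marks memory that nothing has written yet.
    explicit ToyHeap(std::size_t bytes) : mem(bytes, 0xAB) {}

    // Bump-allocate one object and zero everything after its header.
    //   obj_size:  size of the object itself (header + fields + padding)
    //   init_size: size handed to the zeroing step
    // Returns the offset of the new object. No bounds checks: toy code.
    std::size_t allocate(std::size_t obj_size, std::size_t init_size) {
        std::size_t obj = top + kFwdPtrBytes;  // object follows its own fwdptr slot
        top = obj + obj_size;                  // i.e. top += obj_size + kFwdPtrBytes
        std::memset(mem.data() + obj + kHeaderBytes, 0, init_size - kHeaderBytes);
        return obj;
    }
};

// Counts the zeroed bytes that lie past the end of a freshly allocated object.
std::size_t bytes_zeroed_past_end(std::size_t obj_size, std::size_t init_size) {
    ToyHeap heap(64);
    std::size_t obj = heap.allocate(obj_size, init_size);
    std::size_t count = 0;
    for (std::size_t i = obj + obj_size; i < heap.mem.size(); ++i) {
        if (heap.mem[i] == 0) ++count;
    }
    return count;
}
```

Calling bytes_zeroed_past_end(16, 16 + kFwdPtrBytes) models the current
behaviour (the zeroing size includes the extra fwdptr bytes), and
bytes_zeroed_past_end(16, 16) models passing the preserved size instead.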
-------------- next part --------------
diff --git a/src/share/vm/opto/macro.cpp b/src/share/vm/opto/macro.cpp
--- a/src/share/vm/opto/macro.cpp
+++ b/src/share/vm/opto/macro.cpp
@@ -1449,6 +1449,7 @@
   transform_later(old_eden_top);
   // Add to heap top to get a new heap top
+  Node* init_size_in_bytes = size_in_bytes;
   if (UseShenandoahGC) {
     // Allocate several words more for the Shenandoah brooks pointer.
     size_in_bytes = new AddLNode(size_in_bytes, _igvn.MakeConX(BrooksPointer::byte_size()));
@@ -1554,7 +1555,7 @@
   InitializeNode* init = alloc->initialization();
   fast_oop_rawmem = initialize_object(alloc,
                                       fast_oop_ctrl, fast_oop_rawmem, fast_oop,
-                                      klass_node, length, size_in_bytes);
+                                      klass_node, length, init_size_in_bytes);
   // If initialization is performed by an array copy, any required
   // MemBarStoreStore was already added. If the object does not