Perf: excess store in allocation fast path?

Roman Kennke rkennke at redhat.com
Tue Dec 6 19:44:14 UTC 2016


On Tuesday, 06.12.2016 at 20:29 +0100, Aleksey Shipilev wrote:
> On 12/06/2016 08:25 PM, Roman Kennke wrote:
> > On Tuesday, 06.12.2016 at 19:39 +0100, Aleksey Shipilev wrote:
> > > I think we have the excess store at allocation fast path, compare
> > > Shenandoah [1] and Parallel [2]. And this is not storing the fwdptr,
> > > but seems to be the excess zeroing. In that test, allocating a simple
> > > Object yields this:
> > > 
> > >   mov    %r11,(%rax)            ; mark word
> > >   prefetchnta 0xc0(%r10)
> > >   movl   $0xf80001dd,0x8(%rax)  ; class word
> > >   mov    %rax,-0x8(%rax)        ; fwdptr
> > >   mov    %r12d,0xc(%rax)        ; zeroing last 4 bytes
> > >   mov    %r12,0x10(%rax)        ; <--- hey, what?
> > > 
> > > I think this happens because allocation fastpath bumps the instance
> > > size to "cover" for the upcoming object's fwdptr, and accidentally
> > > zeroes it as well? Do we need this? I can imagine the invariant that
> > > everything up to top pointer should be zeroed, is this such a case?
> > 
> > It looks like initialization for the first field in the object. Maybe
> > we're failing the c2 opt that eliminates initial zeroing for fields?
> > Maybe our barrier or allocation stuff somehow gets in the way of that
> > and c2 can't see the initialization and therefore cannot optimize it
> > away?
> 
> The test allocates new Object(), no fields. The object is 16 bytes long,
> yet we store something beyond 16 bytes -- which AFAIR is the slot for the
> next object's forwarding pointer.

Try the attached patch. It preserves the obj_size, and passes that to
initialize_object().
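
To make the effect concrete, here is a toy model of the size bookkeeping, not HotSpot code. ToyHeap, allocate_and_zero, and the two constants are hypothetical names invented for this sketch. The sizes are assumptions taken from the 16-byte Object in the thread: 8-byte mark word, 4-byte class word, 8-byte fwdptr slot. It only illustrates that zeroing with the bumped size touches 8 more bytes than zeroing with the preserved object size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Illustrative sizes only (assumed, matching the 16-byte Object in the thread).
constexpr std::size_t kFwdPtrBytes = 8;   // forwarding pointer slot in front of each object
constexpr std::size_t kHeaderBytes = 12;  // 8-byte mark word + 4-byte class word

// Hypothetical toy heap, not a HotSpot type.
struct ToyHeap {
    std::vector<std::uint8_t> bytes;
    std::size_t top = 0;
    explicit ToyHeap(std::size_t n) : bytes(n, 0xAB) {}  // non-zero fill makes zeroing visible
};

// Bump-allocates one object and returns how many bytes were zeroed after the header.
// preserve_init_size == false models passing the bumped size to initialization;
// preserve_init_size == true models keeping the original object size for it.
std::size_t allocate_and_zero(ToyHeap& heap, std::size_t obj_size, bool preserve_init_size) {
    assert(obj_size >= kHeaderBytes);
    const std::size_t init_size = obj_size;                  // saved before the bump
    const std::size_t alloc_size = obj_size + kFwdPtrBytes;  // bumped for the fwdptr slot
    const std::size_t obj = heap.top + kFwdPtrBytes;         // object starts after its fwdptr
    const std::size_t zero_begin = obj + kHeaderBytes;
    const std::size_t zero_end = obj + (preserve_init_size ? init_size : alloc_size);
    assert(zero_end <= heap.bytes.size());
    heap.top += alloc_size;
    std::memset(heap.bytes.data() + zero_begin, 0, zero_end - zero_begin);
    return zero_end - zero_begin;
}
```

In this model a 16-byte object gets 12 bytes zeroed with the bumped size (the 4-byte tail plus the 8 bytes at offset 0x10) and only 4 bytes with the preserved size, which lines up with the two trailing stores in the disassembly above.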


-------------- next part --------------
diff --git a/src/share/vm/opto/macro.cpp b/src/share/vm/opto/macro.cpp
--- a/src/share/vm/opto/macro.cpp
+++ b/src/share/vm/opto/macro.cpp
@@ -1449,6 +1449,7 @@
     transform_later(old_eden_top);
     // Add to heap top to get a new heap top
 
+    Node* init_size_in_bytes = size_in_bytes;
     if (UseShenandoahGC) {
       // Allocate several words more for the Shenandoah brooks pointer.
       size_in_bytes = new AddLNode(size_in_bytes, _igvn.MakeConX(BrooksPointer::byte_size()));
@@ -1554,7 +1555,7 @@
     InitializeNode* init = alloc->initialization();
     fast_oop_rawmem = initialize_object(alloc,
                                         fast_oop_ctrl, fast_oop_rawmem, fast_oop,
-                                        klass_node, length, size_in_bytes);
+                                        klass_node, length, init_size_in_bytes);
 
     // If initialization is performed by an array copy, any required
     // MemBarStoreStore was already added. If the object does not

