[RESUMED] RFR: 8158946 - btree009 fails with assert(s > 0) failed: Bad size calculated

Wed Jun 22 20:38:12 UTC 2016

Hi,

On Wed, 2016-06-22 at 15:33 -0400, Derek White wrote:
> Hi Thomas,
> 
> On 6/22/16 3:01 PM, Thomas Schatzl wrote:
> > Hi,
> > 
[...]
> > CollectedHeap::array_allocate() is not used for humongous objects,
> > but
> > G1CollectedHeap::humongous_obj_allocate().
>  
> That can't be right - G1CollectedHeap::humongous_obj_allocate()
> doesn't set the object header (it doesn't even know the Klass). It
> clears the object header, and does the storestore before updating the
> heap region bookkeeping that makes the new object scannable. At that
> point the new object is a valid uninitialized object.
> 
> G1CollectedHeap::humongous_obj_allocate()      is called by
>   Universe::heap()->mem_allocate()             is called by 
>    CollectedHeap::common_mem_allocate_noinit() is called by various
>      CollectedHeap::XXX_allocate()
> 
> But what Kim is concerned about is the ordering of setting the object
> header (lock and klass fields) and setting either the array length or
> the "oop_size" field of a java.lang.Class instance. We (GC) never
> want to see an object with a non-zero klass in the header and an
> unset array length or oop_size. These fields are set up in
> CollectedHeap::post_allocation_install_obj_klass() (and neighbors),
> but there is no ordering enforced between the stores.

Bug? I mean, if the comment already says it needs this ordering, and
there is no explicit barrier, it's a bug in any case. At least on non-
TSO platforms the cpu will reorder the stores, and on others there is
still the issue with potential compiler reordering. 

Guess we were kind of lucky? Or one of these spurious crash reports we
have is caused by this?

> I think we're primarily worried by concurrent GC threads (G1 or CMS)
> seeing these new objects as they are being created. So we aren't
> concerned about young gen objects. There's some evidence that CMS is
> synchronizing access between allocators and concurrent scanners (see
> below), but I don't know if there are similar issues with G1.

There unfortunately seem to be quite a few potentially concurrent
callers (refinement - scanning the card, marking - advancing the
finger) of oopDesc::size_given_klass() and I can't convince myself we
have an address dependency on klass in that method for array size
calculation (which would imply a loadload barrier on the platforms
needed) - other objects seem fine, as they read a variable based off
the klass pointer.

(And on TSO systems again it did not matter except for the potential
compiler reordering, which seems almost impossible here)

But please check that yourselves too.

Thanks,
  Thomas