[aarch64-port-dev ] RFR(s): 8171449" [aarch64] store_klass needs to use store release

Tue Dec 20 14:40:56 UTC 2016

Hi Andrew,

Some background. There are two synchronization issues around object initialization:
1) The allocating thread needs to ensure that the object is fully initialized (to default zeros or specified values) before another Java thread can see the object. This case is well handled with memory barriers etc. 

2) A concurrent GC might find an object during heap scanning that has been allocated but not yet initialized. At the least it needs to know the size of the object if it's to reason about it. To enable this, the contract between the runtime and the concurrent collectors is that the length of an array (and 'forgotten case B') is written before the klass word is installed in the header. If CMS finds an object with a null klass word, it either retries, terminates what it's doing, or uses a back-up method for finding the object size.
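To make the contract in 2) concrete, here is a minimal sketch in portable C++ using std::atomic. It is not the HotSpot code: Klass, ArrayHeader, publish_array and try_object_size are hypothetical names, and the size formula is an assumption made only for illustration. The point is the ordering: the length is written first, the klass word is published with a store-release, and the scanner reads it with a load-acquire and backs off if it is still null.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for illustration; these are not the HotSpot types.
struct Klass {
    std::size_t header_bytes;   // assumed fixed header size
    std::size_t element_bytes;  // assumed size of one array element
};

struct ArrayHeader {
    std::atomic<const Klass*> klass{nullptr};  // null means "not yet published"
    std::uint32_t length{0};                   // plain field, written before klass
};

// Allocating thread: write the length first, then install the klass word
// with a store-release so the length is visible to anyone who sees the klass.
inline void publish_array(ArrayHeader& obj, const Klass* k, std::uint32_t length) {
    obj.length = length;
    obj.klass.store(k, std::memory_order_release);
}

// Concurrent scanner: load-acquire the klass word. A null result means the
// object is not yet parsable, so the caller must retry, stop, or fall back.
inline bool try_object_size(const ArrayHeader& obj, std::size_t* size_out) {
    const Klass* k = obj.klass.load(std::memory_order_acquire);
    if (k == nullptr) {
        return false;
    }
    *size_out = k->header_bytes + k->element_bytes * obj.length;
    return true;
}
```

With a plain store of the klass word instead of the release, a weak memory-model CPU could make the klass visible before the length, and the scanner would compute a size from a stale length.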

So the history of this bug and its relatives started with 'forgotten case B', which is that Class instances themselves are "pseudo" arrays, in that they contain zero or more locations of all of the static fields of a particular class. The size of this variable area is stored as a normal field in the same object, so to get the actual size of a Class object you need to add the nominal size + the variable size. But the code that allocated the Class objects (which is the regular slow-path allocation code in runtime) wrote the header first (including the Class object's klass field), and then set the variable size field. I.e. the source code had the fields written out-of-order. This is bug JDK-8158946.

This was fixed, but discussion led to the point that a compiler or weak memory-model CPU might also write the fields out-of-order, so a series of fixes changed the concurrent GC code to use load-acquires when necessary. This is JDK-8160369 and sub-tasks. See in particular oopDesc::klass_or_null_acquire().

The C++ code, being static, uses store-releases no matter what GC is in use when allocating objects in the slow-path. The jitted code can choose to do normal stores or store-releases depending on the GC being used.
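As a rough illustration of that difference (again a sketch, not the HotSpot or aarch64 macro-assembler code), the slow-path store below always uses release ordering, while the JIT-style store picks its ordering from a flag. The names ObjectHeader, store_klass_release and store_klass, and the needs_release flag, are hypothetical; which collectors would set the flag is exactly the open question in this thread.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-ins for illustration; these are not the HotSpot types.
struct Klass {};

struct ObjectHeader {
    std::atomic<const Klass*> klass{nullptr};
};

// Slow-path style: compiled once, so it always pays for the store-release,
// whichever collector is in use.
inline void store_klass_release(ObjectHeader& obj, const Klass* k) {
    obj.klass.store(k, std::memory_order_release);
}

// JIT style: the ordering can be chosen for the configured collector.
// needs_release is an assumed flag, true when a concurrent collector may
// scan the object before the allocating thread has published it.
inline void store_klass(ObjectHeader& obj, const Klass* k, bool needs_release) {
    obj.klass.store(k, needs_release ? std::memory_order_release
                                     : std::memory_order_relaxed);
}
```

In generated code the choice would be made when the code is emitted rather than by a runtime branch; the branch here only keeps the sketch short.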

As far as which GCs need to worry about this goes, CMS is clearly in danger with this issue on weak memory model systems. I don't have a definitive answer for G1. Thomas makes a good argument that in G1, concurrent GC would only scan a newly allocated object if it were humongous, and there are enough memory barriers around allocating a humongous region that we should be safe. But there were changes made to G1 to use oopDesc::klass_or_null_acquire(). See http://hg.openjdk.java.net/jdk9/hs/hotspot/rev/1a33f585a889 . Perhaps these are overly conservative?

 - Derek

-----Original Message-----
From: Thomas Schatzl [mailto:thomas.schatzl at oracle.com] 
Sent: Tuesday, December 20, 2016 6:09 AM
To: Andrew Haley <aph at redhat.com>; White, Derek <Derek.White at cavium.com>; aarch64-port-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
Subject: Re: RFR(s): 8171449" [aarch64] store_klass needs to use store release

On Tue, 2016-12-20 at 10:27 +0000, Andrew Haley wrote:
> On 20/12/16 10:21, Thomas Schatzl wrote:
> > 
> > Derek can very likely tell you more about the protocol between CMS'
> > concurrent threads and the mutators, but both collectors have 
> > protocols to ensure that the memory contains valid information, 
> > either retrying later (G1), or busy waiting (CMS, afaics from code 
> > around a "bugfix for systems with weak memory model" comment) until 
> > this is the case.
>
> Yeah.  This stuff doesn't make any sense: I need to see the code.  I 
> guess that what's going on is that a humongous object is allocated, 
> the memory is zeroed, and the memory is then returned to the Java 
> thread, which then initializes the object.  So, the object is 
> reachable from the GC thread even before it has been initialized.
> But instances of Class aren't humongous... or are they?

You can have instances of regular classes that are humongous. :) There is no limit on object instance size in the language spec afaik; at the very least it allows objects large enough to qualify as humongous.

In any case, G1 only ever allocates humongous objects in old gen, which makes things a lot easier because these regions can only ever contain that single humongous object and nothing else useful. And we only need to take care of appropriate memory barriers in the runtime, because humongous objects are always allocated in the runtime.

As for the patch, this changes the generated fast-path eden allocation code, so for it to cause any problem in the interaction with the concurrent GC threads, the runtime would first have needed to hand out "TLABs" from old gen.

G1 never does that, so the patch does not seem to be applicable for it at all, but I think for CMS this seems different. The TLAB allocator for CMS can hand out old gen "TLABs" in some cases, e.g. when the GCLocker prevents an immediate GC when running out of space (I did not confirm this particular case yet).

So Derek, can you elaborate a bit on what error case you have in mind?

Thanks,
  Thomas