RFC: 8160369 - Memory fences needed around setting and reading object lengths

Wed Jun 29 19:46:22 UTC 2016

[Request for Comment]

This is a follow-on to some discussion that's been taking place on bug 
https://bugs.openjdk.java.net/browse/JDK-8160369.

From Thomas:

> Regular objects also need the barriers. I.e. the writing of the layout 
> helper of the Klass must happen before publishing the Klass pointer to 
> internal data structures (i.e. it is accessible by the mutator 
> threads). Regular (huge) objects may be allocated directly into the 
> old gen too. The issue is a bit simpler here, because we can assume 
> that there is an implicit loadload barrier when the reader accesses 
> the layout helper. Also, at least in G1, regular objects are always 
> allocated through the runtime.

Hi Thomas,

I think we might be in agreement about what should happen, but are 
using different terms.

I agree that when creating the C++ Klass object, we should have a 
storestore barrier between setting the layout helper field and 
publishing the Klass to the world. It sounds unlikely that a concurrent 
GC would ever see a store that stayed out-of-order through Klass 
creation, loading, initialization, and instance allocation, but I'd 
rather avoid that bug now :-)
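As a rough illustration of the ordering described above (not taken from either webrev), here is a minimal sketch in portable C++. ToyKlass, ToyRegistry, publish_klass() and read_layout_helper() are hypothetical names, not HotSpot code. A release fence stands in for the storestore barrier (it is somewhat stronger), and the reader uses an acquire load as a conservative stand-in for the implicit loadload ordering of dependent loads.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for HotSpot's Klass; only the field under discussion.
struct ToyKlass {
    int layout_helper = 0;
};

// Hypothetical "internal data structure" through which a Klass becomes visible.
struct ToyRegistry {
    std::atomic<ToyKlass*> published{nullptr};
};

// Writer side: set the layout helper, then publish the pointer.
void publish_klass(ToyRegistry& r, ToyKlass* k, int layout_helper) {
    assert(layout_helper != 0);  // 0 is this sketch's "not yet visible" value
    k->layout_helper = layout_helper;
    // Stand-in for a storestore barrier: the store above may not be
    // reordered after the publishing store below.
    std::atomic_thread_fence(std::memory_order_release);
    r.published.store(k, std::memory_order_relaxed);
}

// Reader side: returns the layout helper, or 0 if nothing is published yet.
// The acquire load is an assumption of this sketch; the thread relies on
// dependent-load ordering instead.
int read_layout_helper(const ToyRegistry& r) {
    ToyKlass* k = r.published.load(std::memory_order_acquire);
    return k == nullptr ? 0 : k->layout_helper;
}
```

A reader that observes a non-null pointer is then guaranteed to observe the layout helper value written before the fence.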

And I agree that the implicit loadload barrier in code like 
oop->klass()->layout_helper() is sufficient for the processors we care 
about.

I'm not sure what you mean by the phrase "Regular objects also need 
the barriers" though. Do you mean in CollectedHeap::obj_allocate(), like 
we're adding to array_alloc() and class_alloc()? What is being written 
here that needs ordering?

> And java.lang.Class instances also only need the barrier if allocated 
> into the old gen (afaik they are not).

Interesting point. It turns out that huge java.lang.Class instances 
/can/ get allocated in the old gen. Also, Colleen mentioned that it 
might be nice if we could allocate all java.lang.Class instances 
directly in the old gen.

It might be a good idea to only do the memory barriers when the object 
is in the old gen, but I'm not sure how the cost of testing this at 
runtime compares to the cost of the barriers themselves. I've been 
assuming that, statistically speaking, CollectedHeap::/foo/_allocate() 
is only called for old gen allocations, and the rest are handled by the 
compiled code or interpreter. Modulo a pile of corner cases.

Similarly, we could use memory barriers only when we're using a 
concurrent GC, if the cost trade-off makes sense.
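To make the trade-off concrete, here is a hedged sketch of what a conditional barrier on the allocation path could look like. ToyArray, AllocContext, needs_publish_barrier() and init_array() are hypothetical, and combining both conditions (old gen and concurrent GC) is just one possible policy, not something either webrev is known to do. Whether the extra branch is cheaper than an unconditional barrier is exactly the open question.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical description of where and under which GC an object is allocated.
struct AllocContext {
    bool in_old_gen;
    bool concurrent_gc_active;
};

// Hypothetical array header: a length plus the klass word that publishes it.
struct ToyArray {
    int length = 0;
    std::atomic<const void*> klass{nullptr};
};

// One possible policy (an assumption): only pay for the barrier when a
// concurrent collector could be scanning the region being allocated into.
bool needs_publish_barrier(const AllocContext& ctx) {
    return ctx.in_old_gen && ctx.concurrent_gc_active;
}

// Initializes the array; returns true if a barrier was issued.
bool init_array(ToyArray& a, int length, const void* klass,
                const AllocContext& ctx) {
    assert(length >= 0 && klass != nullptr);
    a.length = length;
    const bool fenced = needs_publish_barrier(ctx);
    if (fenced) {
        // Stand-in for storestore: length is visible before the klass word.
        std::atomic_thread_fence(std::memory_order_release);
    }
    a.klass.store(klass, std::memory_order_relaxed);
    return fenced;
}
```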

> Also those "other" cases where the layout helper is < 0, and we take 
> the virtual call must be considered too (whatever these are) to make 
> sure that a loadload barrier is executed (assuming that instances of 
> these kind can be somehow allocated in old gen).

I think the virtual calls for oop_size() end up using the layout helper 
(after all), using the array length, or using the oop_size field. So 
resolving the issue of ordering the length field or oop_size field 
should be sufficient.
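The point above is that every size computation bottoms out in one of three sources. The toy model below shows that shape only; SizeSource, ToyObject and toy_object_size() are hypothetical, and the real layout helper encoding in HotSpot is more involved than this.

```cpp
#include <cassert>
#include <cstddef>

// The three places a size can ultimately come from, per the discussion.
enum class SizeSource { LayoutHelper, ArrayLength, OopSizeField };

// Toy object description; the field names are illustrative only.
struct ToyObject {
    SizeSource source;
    std::size_t layout_helper_size;  // fixed size recorded for the class
    std::size_t header_bytes;        // array header size
    std::size_t element_bytes;       // size of one array element
    int length;                      // array length field
    std::size_t oop_size_field;      // per-object size field
};

// Computes the object size from whichever source applies.
std::size_t toy_object_size(const ToyObject& o) {
    switch (o.source) {
    case SizeSource::LayoutHelper:
        return o.layout_helper_size;
    case SizeSource::ArrayLength:
        assert(o.length >= 0);
        return o.header_bytes +
               o.element_bytes * static_cast<std::size_t>(o.length);
    case SizeSource::OopSizeField:
        return o.oop_size_field;
    }
    return 0;  // unreachable for valid SizeSource values
}
```

If the length field and the oop_size field are each ordered correctly against publication, then in this model no further path needs its own barrier.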

Thanks for looking at this issue!

  - Derek

FYI - I have two stabs at fixing this issue:

http://cr.openjdk.java.net/~drwhite/8160369/webrev.02 - Minimal 
barriers, but I'm not sure it's catching everything.

http://cr.openjdk.java.net/~drwhite/8160369/webrev.03 - Smaller patch, 
possibly overly conservative.

In both cases, I've used the call to oopDesc::klass_or_null() as a flag 
that the caller is some part of concurrent GC that is written to safely 
handle partially initialized objects. If klass_or_null() is called in 
other circumstances then my patches are doing too many barriers...
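For readers who have not opened the webrevs, the idea of treating klass_or_null() as the marker for "caller tolerates partially initialized objects" can be sketched roughly as follows. This is not the patch itself: ToyKlass, ToyArrayOop, finish_allocation() and length_if_initialized() are hypothetical, and release/acquire on the klass word is a portable approximation of the barriers under discussion.

```cpp
#include <atomic>
#include <cassert>

struct ToyKlass {};  // hypothetical stand-in for Klass

// Hypothetical array object: the klass word is written last and acts as
// the "fully initialized" signal.
struct ToyArrayOop {
    int length = 0;
    std::atomic<const ToyKlass*> klass{nullptr};
};

// Allocation side: set the length, then publish the klass word.
void finish_allocation(ToyArrayOop& o, int length, const ToyKlass* k) {
    assert(length >= 0 && k != nullptr);
    o.length = length;
    o.klass.store(k, std::memory_order_release);
}

// Sketch of klass_or_null(): the ordering cost is paid here, on the
// assumption that only concurrent GC code calls it.
const ToyKlass* klass_or_null(const ToyArrayOop& o) {
    return o.klass.load(std::memory_order_acquire);
}

// GC side: returns -1 if the object is not initialized yet, so the caller
// can skip or retry instead of reading a stale length.
int length_if_initialized(const ToyArrayOop& o) {
    if (klass_or_null(o) == nullptr) {
        return -1;
    }
    return o.length;
}
```

In this sketch, any caller of klass_or_null() outside concurrent GC pays for ordering it does not need, which mirrors the concern about doing too many barriers.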