GC interface update

Erik Österlund erik.osterlund at oracle.com
Wed Apr 26 12:57:29 UTC 2017


Hi Roman,

On 2017-04-26 12:35, Roman Kennke wrote:
> Hi Erik,
>
>
>>> Regarding Shenandoah, what we need there is a way to apply read- and
>>> write-barriers on *every* heap access. This is used to resolve the
>>> target object to its to-space copy. For example, when reading a field
>>> from an object, and that object is in from-space, we first need to read
>>> its forwarding pointer to arrive at the to-space copy and read from
>>> there. Similarly, for writes, we first need to invoke some write barrier
>>> magic to copy the target object to to-space, CAS the forwarding pointer
>>> of the from-space object, and then do the write into the to-space copy.
>>> In pseudocode, a store (or load) that used to look like this:
>>>
>>> store(oop obj, int offset, int value)
>>>
>>> now needs to look like this:
>>>
>>> obj = write_barrier(obj)
>>> store(obj, offset, value)
>>>
>>> With regards to the GC interface, this means we need to have access to
>>> the source (for loads) or target (for stores) object, not only the
>>> actual field address. In fact, a field address would be pointless; what
>>> we need is the object+offset.
>>>
>>> Does your proposal provide for this?
>> Yes it does. The HeapAccess class (Access on the Java heap) has
>> store_at, load_at etc that takes a base pointer and an offset - just
>> what you need.
> Perfect!
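
To make that mapping a bit more concrete, here is a rough standalone
sketch of how a to-space-resolving barrier can sit in front of a
base+offset store. This is not Shenandoah code and every name in it is
made up; it only illustrates why the barrier wants the base object
rather than a raw field address:

#include <stddef.h>
#include <stdint.h>

struct objDesc;
typedef objDesc* oop;

struct objDesc {
  oop     _forwardee;   // points to the to-space copy, or to itself
  int32_t _fields[8];   // payload addressed by (base, offset)
};

// Read barrier: follow the forwarding pointer to the current copy.
inline oop resolve(oop obj) {
  return obj->_forwardee;
}

// Write barrier: a real collector would also evacuate the object and
// CAS the forwarding pointer if it is still in from-space; elided here.
inline oop write_barrier(oop obj) {
  return resolve(obj);
}

// The store only ever sees (base, offset, value), which is why an
// Access API built around base+offset fits this kind of collector.
inline void store_int_at(oop obj, ptrdiff_t offset, int32_t value) {
  oop to_space_copy = write_barrier(obj);
  *reinterpret_cast<int32_t*>(
      reinterpret_cast<char*>(to_space_copy) + offset) = value;
}
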
>
>>> Why don't you push all this into the jdk10-sandbox, under the
>>> JDK-8163329-branch (aka GC-interface-branch) ? We do need to collaborate
>>> on this stuff, and the best way to do that would be with actual code
>>> exchange. It's easy to do in the sandbox: we can go completely wild in
>>> there until we're satisfied ;-)
>> I agree we need to collaborate here. Having said that - I hope your
>> version of "completely wild" is not the same as mine. ;)
> Hehe. As long as it builds... ;-)
>
>> I will push the code to the sandbox.
> Great!
>
>>> To be honest, I wouldn't go over the top to optimize runtime barrier
>>> accesses. I haven't seen a single benchmark yet that suffers from
>>> virtual calls. Shenandoah does introduce *much* more virtual calls (in
>>> its current design), i.e. one for each primitive load and store, and it
>>> doesn't seem to impact performance or show up in profiles on benchmarks
>>> we are running *at all*. It seems like a complete non-issue to me. I
>>> suppose it is possible to construct benchmarks where it does matter
>>> (heavily exercising JNI heap accessors comes to mind), but even then I
>>> don't think virtual calls in the runtime accessors hurt that much. If
>>> you have such benchmark, please please let me (or us) know.
>> Sorry if this was not clear enough, but the main purpose of the template
>> machinery was not to micro-optimize virtual calls. It is more of a nice
>> bonus you get. The main purpose is being able to unite all these
>> different weird accesses with special treatment due to potentially
>> orthogonal semantics requiring them to be treated differently by
>> different GCs. This moves the complexity from the sprinkled special GC
>> treatment code all over hotspot into a contained (and limited)
>> complexity for the mediator between the user of the Access API and the
>> backends. But it is very easy to use, both for users of the Access API
>> and for backends.
> Ok, that seems fine. In fact I wasn't exactly worried about the templates
> (although they tend to make it difficult to find what is actually called
> and how, but I'll figure it out). I was more worried about the messing about
> with function pointers to get barrier calls without virtual dispatch. This
> seems like overkill to me. Considering the typical complexity of a barrier,
> saving one vtable lookup doesn't seem worth it. Even in the case of
> Shenandoah read barriers, which are a single load operation, we haven't
> seen any impact when introducing virtual calls on all primitive and
> reference loads. And with function pointers you still need to make a
> call, you don't get magical inlining or such ;-)
>
> Plus, this additional complexity does have a cost: it opens the door for
> bugs. Somebody needs to understand & maintain it. The way it is, it is
> much harder to understand (for me) where and how a call to HeapAccess
> ends up in specific GC barriers. It's already complex using templates. I
> see no need to make it even more complex by shoving it through
> function-pointer-indirection machinery.

I understand your point. But believe it or not - the choice to use 
function pointers instead of virtual calls was also largely motivated by 
keeping things simple rather than by the performance benefits (which, 
again, are sweet bonuses). Pulling template parameters through virtual 
calls is inherently nasty, as virtual member functions cannot have 
template parameters. In theory you can fight the language with template 
classes that are allocated dynamically (and freed when installed under 
contention), pulling the template parameters through the virtual call 
via the template parameters of the class, hence mimicking a function 
pointer with dummy objects. But that is much harder than just using 
function pointers, which require no special memory management or factory 
machinery to get the template properties across the runtime call point. 
So the function pointer approach seemed both simpler and faster than 
using virtual calls.
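
To illustrate what I mean, here is a minimal standalone sketch - not the
actual Access code, and all names in it are made up - of how a plain
function pointer can carry template parameters across a runtime call
point, which a virtual member function cannot, because virtual member
functions may not be templates:

#include <stddef.h>

typedef unsigned int DecoratorSet;
typedef void* oop;   // stand-in for HotSpot's oop

// Hypothetical GC-specific backends, templated on the decorators.
template <DecoratorSet decorators>
int gc_a_load_at(oop base, ptrdiff_t offset) {
  return *reinterpret_cast<int*>(static_cast<char*>(base) + offset);
}

template <DecoratorSet decorators>
int gc_b_load_at(oop base, ptrdiff_t offset) {
  // ...a pre-barrier would go here...
  return *reinterpret_cast<int*>(static_cast<char*>(base) + offset);
}

// One function pointer per decorator instantiation: the template
// parameters are baked into the pointee, so the runtime call point
// needs no virtual dispatch and no dynamically allocated dispatcher
// objects.
template <DecoratorSet decorators>
struct RuntimeDispatch {
  static int (*_load_at)(oop base, ptrdiff_t offset);

  static int load_at(oop base, ptrdiff_t offset) {
    return _load_at(base, offset);   // single indirect call
  }
};

// Selected once, e.g. at GC initialization (hard-wired to "GC A" here).
template <DecoratorSet decorators>
int (*RuntimeDispatch<decorators>::_load_at)(oop, ptrdiff_t) =
    gc_a_load_at<decorators>;

A call like RuntimeDispatch<0>::load_at(obj, offset) then costs a load 
of the pointer plus an indirect call, and there is nothing to allocate 
or free to set it up.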

Having said that, I do agree that the internals of the Access API 
mediator between the frontend and backend are a bit complex (yet 
contained). But hopefully we won't have to dig into the mediator logic 
too often. And when we do, I will be around to help out. What it does 
buy us, remember, is getting rid of the complexity of special-handling 
weird accesses that is currently sprinkled across hotspot.

>>> My idea for runtime accessors basically boiled down to the API that's
>>> currently in oop.hpp / oop.inline.hpp: i.e. forward all heap access
>>> through the barrier via 1 (and only 1) virtual call. I don't exactly
>>> mind some magic to avoid even this one virtual call, but I question if
>>> it's worth the additional complexity (which doesn't sound exactly
>>> negligible).
>> The simpler API you refer to in oop.hpp does not yet acknowledge all the
>> weird accesses we do - it handles the default heap accesses on only
>> strongly reachable objects and then sprinkles conditionally executed
>> GC-specific barriers around these default accesses at callsites where
>> there are such weird accesses, rather than supplying the semantics. I
>> believe I saw this was on your TODO-list. This system is the result of
>> going down that rabbit hole.
> I see this, and I like it. Very good stuff!

I'm glad you like it!

>>> We can help with the arm64 port. :-)
>> Thank you, very glad to hear that! :)
>>
>>> Those comments are purely based on your description. Now I'm going to
>>> study your patch :-)
>> May I recommend a cup of coffee...
> I am still wrapping my head around it. Currently trying to understand
> how the previous if (UseG1GC) { G1SATBBarrierSet::enqueue(v); } is
> solved for loads of referent fields. I might ping you on IRC...

It is solved by specifying the semantics of the access at the call site, 
rather than the implementation of GC-specific barriers.
Users of the Access API can for example say that it's an unsafe load 
where we don't know what object it is (possibly a Reference referent 
field), like this call from unsafe.cpp:

HeapAccess<ACCESS_ON_ANONYMOUS>::load_at(_obj, _offset);

...or the user can say it's a weak root of some sort, like in 
jvmtiTagMap.cpp ("vm weaks" are phantom-like in strength, hence the name):

RootAccess<GC_ACCESS_ON_PHANTOM>::oop_load(object_addr());

These properties imply that we might need to do a SATB enqueue barrier. 
The BarrierSet backend gets all decorators passed in to its 
AccessBarrier, where this information is picked up.
Take, for example, how this is handled in the G1 backend for 
oop_load_at (g1BarrierSet.inline.hpp):

template <DecoratorSet decorators>
inline oop G1BarrierSet::AccessBarrier<decorators>::oop_load_at(
    oop base, ptrdiff_t offset) {
  oop load_val = ModRef::oop_load_at(base, offset);
  if (!DecoratorTest<decorators>::HAS_ACCESS_WEAK &&
      (DecoratorTest<decorators>::IS_ON_REFERENCE ||
       (DecoratorTest<decorators>::HAS_ACCESS_ON_ANONYMOUS &&
        load_val != NULL &&
        !oopDesc::is_null(base) &&
        is_referent_field(base, offset)))) {
    satb_enqueue(load_val);
  }

  return load_val;
}

The DecoratorTest checks determine whether the passed-in semantics do 
indeed imply that a SATB enqueue operation is required for the load. 
These checks are all constant folded, so testing them and handling the 
different cases in the same load function comes at no runtime cost. This 
means we can be very expressive in how we filter out barrier calls in 
the accesses.
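
For the constant folding part, a minimal standalone sketch (with made-up
names and a made-up bit layout, not the actual DecoratorTest
implementation) looks roughly like this:

typedef unsigned int DecoratorSet;

const DecoratorSet ACCESS_WEAK         = 1u << 0;
const DecoratorSet ACCESS_ON_REFERENCE = 1u << 1;
const DecoratorSet ACCESS_ON_ANONYMOUS = 1u << 2;

// Every test is a compile-time constant of the instantiation.
template <DecoratorSet decorators>
struct DecoratorTest {
  static const bool HAS_ACCESS_WEAK =
      (decorators & ACCESS_WEAK) != 0;
  static const bool IS_ON_REFERENCE =
      (decorators & ACCESS_ON_REFERENCE) != 0;
  static const bool HAS_ACCESS_ON_ANONYMOUS =
      (decorators & ACCESS_ON_ANONYMOUS) != 0;
};

template <DecoratorSet decorators>
int load_with_optional_barrier(int* field) {
  int value = *field;
  if (!DecoratorTest<decorators>::HAS_ACCESS_WEAK &&
      DecoratorTest<decorators>::IS_ON_REFERENCE) {
    // satb_enqueue(value) would go here; for a strong, non-Reference
    // access this condition is a compile-time constant false and the
    // whole branch is folded away.
  }
  return value;
}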

The names of the decorators and their DecoratorTests are of course up 
for debate, but I hope this demonstrates the idea and helps in grasping it.

> I guess it makes sense that I close the bugs that I opened around
> BarrierSet refactorings, withdraw the RFR that I sent and leave this
> stuff to you :-)

That would be great!

Thanks,
/Erik

> Cheers, Roman
>
>



