GC interface update
erik.osterlund at oracle.com
Tue Apr 25 15:35:57 UTC 2017
I'm glad to see that we all want to go towards modularizing the GC
implementations in hotspot more. Thank you Roman for starting this
thread. I have wanted a better GC interface since I first set foot in
As mentioned, I have been cooking up a GC barrier interface prototype
based on ideas mentioned earlier in this thread. I will provide a
preview of where it is headed in this email before we start to diverge
I have a long patch queue with many individual changes, but to get an
overview for the discussion here, I will start by posting a combined
webrev for preview of the whole thing as a pre-review. Once the real
detailed review starts, I will start sending out the smaller incremental
changes that are easier to grasp in reviews.
The webrev is based on the latest jdk10-hs repo.
Full webrev: http://cr.openjdk.java.net/~eosterlund/gc_interface/webrev.00/
== High Level Design Philosophy ==
The overall design idea has been to remove all explicit calls to barrier
code in the VM. The barrier code is often required for conforming to
some kind of semantics. So rather than having these explicit barrier
calls in the shared code, the new API is instead to perform a memory
access that specifies the intended semantic properties instead. These
semantic properties are not only GC related but could be any property
that is important for performing a memory access with the right
semantics. These semantic properties are called decorators in my API.
So for example, the decorators for a load on an oop could be
ACCESS_ON_HEAP to denote that this access is performed in the Java heap,
MO_ACQUIRE to denote that the access should have acquire memory ordering
semantics, and ACCESS_ON_WEAK to denote that the access is performed on
a weakly reachable reference. This would in the end need to boil down to the
1) Compressed oops to decode narrow oops
2) Potentially an acquire membar on e.g. ARM machines
3) Potentially a SATB enqueue barrier for SATB-type GCs
So rather than treating these differently and having runtime-resolved
explicit barriers built bottom up, the approach is to build accesses top
down instead. The GC may override the whole access to do anything, but
will probably want to reuse things like decoding and encoding compressed
oops, and the pre/post write pattern and memory ordering. Therefore
barrier sets may ask their super class to fill in such details. This
allows an arbitrary level of control without introducing code duplication.
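As a rough illustration of the decorator idea (a minimal sketch, not the
code in the webrev): the decorator names echo the ones above, but the bit
values, the decorated_load function and the satb_enqueued counter are
made up for this example.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical decorator bits; names follow the email, values are invented.
typedef uint64_t DecoratorSet;
constexpr DecoratorSet ACCESS_ON_HEAP = 1u << 0;
constexpr DecoratorSet ACCESS_ON_WEAK = 1u << 1;
constexpr DecoratorSet MO_ACQUIRE     = 1u << 2;

// Stand-in for a SATB queue: only counts how many values were "enqueued".
static int satb_enqueued = 0;

// A decorated load: the caller states the intended semantics and the
// template decides which steps to perform. The decorators are compile-time
// constants, so the unused branches fold away.
template <DecoratorSet decorators>
inline intptr_t decorated_load(const std::atomic<intptr_t>* addr) {
  const std::memory_order order = (decorators & MO_ACQUIRE) != 0
      ? std::memory_order_acquire
      : std::memory_order_relaxed;
  intptr_t value = addr->load(order);
  if ((decorators & ACCESS_ON_WEAK) != 0 && value != 0) {
    satb_enqueued++;  // a SATB-type GC would enqueue the referent here
  }
  return value;
}
```

A call site would then read decorated_load<ACCESS_ON_HEAP | MO_ACQUIRE>(&field)
and never mention a barrier explicitly.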
Each BarrierSet has 4 barrier-related components with a similar design.
1) An AccessBarrier class responsible for performing accesses requested
by the runtime system through a new Access API (more later)
2) A BarrierSetCodeGen class responsible for generating accesses in
platform-specific assembly code (stub generators and interpreter)
3) A C1BarrierSetCodeGen class responsible for generating accesses for
the C1 compiler
4) A C2BarrierSetCodeGen class responsible for generating accesses for
the C2 compiler
So there is one class for each part of hotspot (runtime,
platform-specific, c1, c2), and they all follow a class hierarchy
mirroring their respective BarrierSet hierarchy to reuse more general
functionality like memory ordering and compressed oops.
== Runtime: Access API ==
The runtime part of the API goes through a new class called Access. All
decorated accesses should go through this interface. It makes heavy use
of templates to perform the right accesses and barriers in the most
optimal way, by connecting the intended Access semantics to the
appropriately decorated AccessBarrier of the current BarrierSet. It
combines different decorators in a pipeline that are resolved at
different times in the JVM life cycle, but handled in the same way. Some
decorators are resolved at build-time, like for example whether the
build needs to support barriers on primitives. If Shenandoah is built
for example, this decorator will be set, and if Shenandoah is not built,
it will not be set. Other decorators are resolved statically at the call
site, such as what strength a reference has. Yet some decorators are
resolved at runtime, such as whether compressed oops are used or not and
which garbage collector was selected.
When there exist runtime dependencies for resolving a barrier, the
Access system will generate function pointers for the access. The
function pointers initially point to a resolver function that checks the
selected runtime properties, and then patches the function pointer to
point to a statically generated function with those properties set, so
that the next time the function is called, it will call straight into
the appropriate barrier. This means that where we would previously have
multiple virtual calls for pre- and post-write barriers, followed by if
checks for compressed oops, all of that boils down to a single function
pointer call that then has inlined everything that needs to be done for
that set of runtime parameters.
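The self-patching function pointer can be sketched roughly as follows.
This is only an illustration under assumptions: use_compressed stands in
for a runtime property such as compressed oops, the shift by 3 is a
pretend decode, and all function names are invented for the example.

```cpp
#include <cstdint>

// Stand-in for a runtime property that is fixed once the VM has started.
static bool use_compressed = false;
// Counts resolver invocations so the patching can be observed.
static int resolver_calls = 0;

typedef uint64_t (*LoadFunc)(const uint32_t* addr);

// Forward declaration: the function pointer initially targets the resolver.
static uint64_t resolve_and_load(const uint32_t* addr);
static LoadFunc load_func = &resolve_and_load;

// Statically generated variants, one per combination of runtime properties.
template <bool compressed>
static uint64_t load_variant(const uint32_t* addr) {
  uint64_t raw = *addr;
  return compressed ? (raw << 3) : raw;  // pretend decode of a narrow value
}

// The resolver checks the runtime properties once, patches the pointer,
// and then performs the access through the chosen variant.
static uint64_t resolve_and_load(const uint32_t* addr) {
  resolver_calls++;
  if (use_compressed) {
    load_func = &load_variant<true>;
  } else {
    load_func = &load_variant<false>;
  }
  return load_func(addr);
}

// What a call site uses: a single indirect call, no further checks.
inline uint64_t patched_load(const uint32_t* addr) {
  return load_func(addr);
}
```

After the first call, patched_load goes straight to the selected variant
and the resolver is never entered again.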
The goal has been to separate out GC-specific code to GC-specific
directories as far as barriers are concerned. To glue this together,
there is a barrierSetConfig.hpp and barrierSetConfig.inline.hpp. The
barrierSetConfig.hpp configures what barrier sets there are and produces
a macro allowing you to do something for each barrier set. This is used
by barrier resolution at runtime. The barrierSetConfig.inline.hpp
basically just includes the GC-specific inline headers to allow
inlining the GC barriers all the way. So anyone making a new GC should
put their GC in there. I added a Shenandoah GC there as an example so
you can see what I mean.
The Access API goes through a template pipeline. First the Access class
bridges the API to functions in the AccessInternal namespace. This
involves using temporary proxy objects to artificially infer the return
types of loads. Then in the AccessInternal namespace the types are first
decayed, meaning that CV-qualifiers and references are stripped. Then
types of addresses and values are joined, at which time certain
decorators are inferred, like the use of compressed oops when e.g. loading
an oop from a narrowOop*. Other implicit decorators are also inferred
then, such as a default memory ordering if none is specified, and other
rules related to memory ordering such as sequentially consistent stores
implicitly also being releasing stores etc. Then build-time decorators
are added and a pre-runtime stage is reached where the mechanism tries
to bind accesses statically if possible, and otherwise producing a
runtime-dispatch point that statically generates all possible runtime
variants of the access and a self-patching function pointer that
resolves the correct variant at runtime. These statically generated
barriers are resolved through the BarrierSet AccessBarrier that gives
the GC full control for generating an appropriate access. It can use the
DecoratorTest class to check for different decorators specifying
semantics that add barriers altering the access. Eventually, the access
reaches a super class of the AccessBarrier called BasicAccessBarrier,
which handles compressed oops and calls RawAccessBarrier. That class
inspects the decayed types and forwards to the appropriate calls to
Atomic or OrderAccess, or performs volatile or raw accesses, depending
on the selected memory ordering.
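The decay and inference steps of the pipeline can be illustrated with a
small compile-time sketch. Everything here is an assumption made for the
example: the NarrowOop and Oop structs are stand-ins, COMPRESSED_OOPS is
a hypothetical decorator name, and MO_RELAXED is just assumed to be the
default ordering.

```cpp
#include <cstdint>
#include <type_traits>

typedef uint64_t DecoratorSet;
constexpr DecoratorSet MO_RELAXED      = 1u << 0;  // assumed default ordering
constexpr DecoratorSet MO_ACQUIRE      = 1u << 1;
constexpr DecoratorSet MO_MASK         = MO_RELAXED | MO_ACQUIRE;
constexpr DecoratorSet COMPRESSED_OOPS = 1u << 2;  // hypothetical name

// Stand-ins for the VM's oop types.
struct NarrowOop { uint32_t bits; };
struct Oop { void* ptr; };

// Decorators implied by the (decayed) address type alone.
template <typename Addr>
struct AddressDecorators {
  static constexpr DecoratorSet value = 0;
};
template <>
struct AddressDecorators<NarrowOop*> {
  static constexpr DecoratorSet value = COMPRESSED_OOPS;
};

// Step 1: decay the address type (strip references and CV-qualifiers).
// Step 2: add a default memory ordering if the caller gave none.
// Step 3: add decorators inferred from the decayed address type.
template <DecoratorSet decorators, typename Addr>
struct InferDecorators {
  typedef typename std::decay<Addr>::type DecayedAddr;
  static constexpr DecoratorSet with_ordering =
      (decorators & MO_MASK) != 0 ? decorators : (decorators | MO_RELAXED);
  static constexpr DecoratorSet value =
      with_ordering | AddressDecorators<DecayedAddr>::value;
};
```

With this, an access through a NarrowOop* const& and no explicit
ordering ends up with both MO_RELAXED and COMPRESSED_OOPS set before any
GC-specific code is consulted.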
I have then applied the Access API to many weird accesses performed in
the runtime system where we check if we are using G1 and then
subsequently doing some weird ad-hoc SATB enqueue barrier. Examples
include the string table, ciMetadata and jvmtiTagMap, unsafe get,
reference get, jweak resolve etc. These accesses now use decorated
accesses through Access instead.
== C1 ==
The shared C1 barrier code has been moved into the C1BarrierSetCodeGen
class for each specific barrier set. It generates decorated accesses,
and decorates them with GC barriers as required by the specified
semantics. The slowpath stubs have been refactored. The code stubs have
moved into the C1BarrierSetCodeGen class and it assembles the machine
code with the platform specific BarrierSetCodeGen assembler. The
runtime1 code stubs have been changed to not be generated in switch
statements, but instead with a code generation closure that calls into
the C1BarrierSetCodeGen, which assembles the runtime1 stub with the
platform specific BarrierSetCodeGen.
The design of accesses going through C1BarrierSetCodeGen is consistent
with the rest of the Access API: the accesses are built top down and
allow overriding the whole operation. The C1BarrierSetCodeGen class
mirrors the class hierarchy of the BarrierSets.
== C2 ==
Similar to the C1BarrierSetCodeGen, the C2BarrierSetCodeGen helps the
GraphKit generate decorated accesses top-down. The class hierarchy of
the C2BarrierSetCodeGen class mirrors the class hierarchy of the
BarrierSets. Since C2 expands GC barriers rather early and then pulls
the barriers through optimizations, there are some additional calls to
be able to distinguish barrier-related nodes from non-barrier nodes.
== Graal ==
For now I only try not to break the Graal port used for AoT in the
hotspot repository. Ideally, Graal would follow the same pattern, but
initially this is out of scope for me.
== GC: BarrierSet consolidation ==
The hierarchy of our barrier sets seems unnecessarily deep - partially
because the card table itself is part of the card table barrier sets. I
have split the card table hierarchy and separated it from the barrier
set hierarchy. A CardTableModRefBarrier *has* a CardTable. As a result
the hierarchy could be simplified a lot to contain only BarrierSet,
ModRefBarrierSet, CardTableBarrierSet and G1BarrierSet. G1BarrierSet and
CardTableBarrierSet are the only leaves, and ModRefBarrierSet is only a
small helper class.
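The "has a" relationship can be pictured with the following sketch. Only
the class names come from the description above; the post_write hook, the
card size and the std::vector<bool> storage are assumptions made to keep
the example small.

```cpp
#include <cstddef>
#include <vector>

// The card table is a plain data structure, no longer a barrier set itself.
class CardTable {
 public:
  static const size_t card_shift = 9;  // 512-byte cards, assumed for the sketch
  explicit CardTable(size_t heap_bytes)
      : _cards((heap_bytes >> card_shift) + 1, false) {}
  void mark_dirty(size_t heap_offset) { _cards[heap_offset >> card_shift] = true; }
  bool is_dirty(size_t heap_offset) const { return _cards[heap_offset >> card_shift]; }
 private:
  std::vector<bool> _cards;
};

class BarrierSet {
 public:
  virtual ~BarrierSet() {}
  virtual void post_write(size_t heap_offset) = 0;  // hypothetical hook
};

// Small helper class in the middle of the hierarchy.
class ModRefBarrierSet : public BarrierSet {};

// The barrier set owns a card table instead of inheriting from one.
class CardTableBarrierSet : public ModRefBarrierSet {
 public:
  explicit CardTableBarrierSet(size_t heap_bytes) : _card_table(heap_bytes) {}
  virtual void post_write(size_t heap_offset) { _card_table.mark_dirty(heap_offset); }
  const CardTable& card_table() const { return _card_table; }
 private:
  CardTable _card_table;
};
```

A G1BarrierSet leaf would sit next to CardTableBarrierSet in the same way
and is left out here for brevity.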
== Collaboration ==
I hope you like the direction this is going and hope it will suit
Shenandoah as well. I have not yet applied the Access API for all
primitives because I thought that you probably have a better idea
where they are since your GC uses such barriers a lot more. But the
framework should be able to support that without much trouble. So I hope
we can work together a bit on this. If there are any shortcomings, I
hope we can work it out together.
Also, as you can see, I have only provided x86 and SPARC ports so far.
The architecture specific code mostly involves the stub generators, the
interpreter, and the G1 C1 slow path stuff. I was hoping to eventually
get some help from other port maintainers to port this to their
respective platforms. If you feel compelled to help porting this to ARM,
I would be very happy. ;)
And perhaps somebody would like to help getting PPC and S390 on board
too. I thought I would at least start the discussion now.
So yeah, hope everyone likes this direction. If there are any questions,
I will happily answer them. Any feedback is very welcome.
On 2017-04-25 14:05, Per Liden wrote:
> On 2017-04-24 15:46, Roman Kennke wrote:
>> Am 24.04.2017 um 08:37 schrieb Per Liden:
>>> On 04/20/2017 02:29 PM, Roman Kennke wrote:
>>>> Am 20.04.2017 um 14:01 schrieb Per Liden:
>>>>> On 2017-04-20 12:05, Aleksey Shipilev wrote:
>>>>>> On 04/20/2017 09:37 AM, Kirk Pepperdine wrote:
>>>>>>>> Good stuff. However, one thing I'm not quite comfortable with
>>>>>>>> is the
>>>>>>>> introduction of the GC class (and its sub classes). I don't quite
>>>>>>>> see the
>>>>>>>> purpose of this interface split-up between GC and CollectedHeap. I
>>>>>>>> see CollectedHeap as _the_ interface (but yes, it needs some love),
>>>>>>>> as a
>>>>>>>> result I think the functions you've exposed in the GC class
>>>>>>>> belong in CollectedHeap.
>>>>>>> I thought the name CollectedHeap implied the state of the heap
>>>>>>> after the
>>>>>>> collector has completed. What is the intent of CollectedHeap?
>>>>>> No, CollectedHeap is the actual current GC interface. This is the
>>>>>> entry point to
>>>>>> GC as far as the rest of runtime is concerned, see e.g.
>>>>>> Universe::create_heap(), etc. Implementing CollectedHeap,
>>>>>> CollectorPolicy, and
>>>>>> BarrierSet are the bare minimum required for GC implementation
>>>>>> today. 
>>>>> Yep, and I'd like us to move towards tightening down the GC
>>>>> interface to
>>>>> basically be cleaned up versions of CollectedHeap and BarrierSet.
>>>>> CollectorPolicy and some other things that class drags along, like
>>>>> AdaptiveSizePolicy, are way too collector specific and I don't think
>>>>> that should be exposed to the rest of the VM.
>>>> Right, I totally agree with this.
>>>> BTW, another reason for making a new GC interface class instead of
>>>> further bloating CollectedHeap as the central interface was that there
>>>> is way too much implementation stuff in CollectedHeap. Ideally, I'd
>>>> like to have a true interface with no or only trivial implementations
>>>> for the
>>>> declared methods, and most importantly nothing that's only ever needed
>>>> by the GC itself (and never called by the runtime). But as I said, I'm
>>>> not against a serious refactoring and tightening-up of CollectedHeap
>>> Yes, I'd like to keep CollectedHeap as the main interface, but I
>>> completely agree that CollectedHeap currently contains too much
>>> implementation stuff that we probably want to move out.
>> Ok, I will revert that part of the change to use CollectedHeap as main
>> interface then. It's no big deal, so far I only had one additional
>> method for serviceability support in the GC interface class anyway.
> Ok, sounds good.
> And regarding BarrierSet. As you know, Erik Österlund is working on
> overhauling BarrierSet and how barriers are used across the VM. He'll
> be sending out his current proposal later today.
>> Would you also prefer keep 'management' of the heap in Universe too?
>> I.e. Universe::create_heap() and Universe::heap() etc? Or do you see a
>> benefit in moving it out like I did with gc_factory.cpp? The idea being
>> that there's only one smallish place that knows about all the existing
>> GC impls?
> I'd like to keep Universe::heap() and create_heap(), but I'd like to
> move away from our current if-else if-else.. and instead have a more
> declarative way of saying which GC's are available. create_heap()
> would then just walk the list of available GC and ask if it's enabled
> and if so create an instance. I think we'd want to do something
> similar to (or even combine this with) what Erik is doing in his
> BarrierSet patch.
> In general, to make it easier to review/test/integrate all these
> changes it would be good if we can have incremental patches, each
> addressing some specific/contained area.