RFR: 8201543: Modularize C1 GC barriers

Mon Apr 23 21:34:08 UTC 2018

Hi Roman,

On 2018-04-23 22:32, Roman Kennke wrote:
> I've taken a cursory look at it. I like the way it looks :-) Thank so
> *SO MUCH* for this work!

I'm glad you like it.

> - At first I was confused about the additional indirection of LIRAccess.
> But it looks like this is mostly/completely hidden from the actual
> consumer of the API, which would call, e.g. access_load_at() ?

Yes indeed. This is only a wrapper object for the shared state of all 
accesses, to reduce boiler plate code when passing around the same 
famous suspects all over the backends. But the caller does not need to 
know about them.

> - At the same time, do you think the resolve_address() provides a place
> for GC like Shenandoah that needs to resolve the base object and insert
> a read/write-barrier and *then* resolve the barrier(base)+offset? If
> this is the intention, may I request to somehow carry over the
> information whether or not the address is intended for read or write?
> Because we need insert different barriers on reads vs writes. Maybe this
> can be added in the decorators?

Yes that was my intention indeed. I added in a C1_WRITE_ACCESS and 
C1_READ_ACCESS decorator for your convenience. Loads have 
C1_READ_ACCESS, stores have C1_WRITE_ACCESS, and atomics have both.

Full webrev:
http://cr.openjdk.java.net/~eosterlund/8201543/webrev.02/

Incremental webrev:
http://cr.openjdk.java.net/~eosterlund/8201543/webrev.01_02/

It's a good place for ModRef to check whether resolving to a register is 
needed or not too (only required for accesses with write barriers).

> - Or is the intention to resolve the base object on the actual
> load_at_resolved() etc entry points? As far as I can tell this would be
> difficult because we don't get to see the base object anymore?

I think it's best done in resolve. But note that you get the access 
wrapper in the resolved paths, and it retains the base pointer if you 
need it.

> - Thinking forward a little bit... Shenandoah is going to need some
> resolve-hooks in a couple of places that are not practical to handle via
> load/store style access calls. For example in monitorenter/exit code we
> need to insert a write-barrier on the object to be synchronized on. Will
> this be possible/practical to do via resolve_address()?

I think we should probably add a public resolve barrier for that purpose.

> - Does it already handle primitive accesses? From the looks of it, I'd
> say yes. That'd be nice.

The backends should be ready for it. But I think I will leave the 
exercise of finding the callsites to you, as you have spent more time 
finding where they are needed.

> The rest of it looks nice from afar, but I haven't really looked deeper
> yet. I will poke around some more and probably will have more questions...

Okay!

Thanks for having a look at these changes.

/Erik

> Thanks again!
> Cheers, Roman
>
>
>> Hi everyone,
>>
>> I sat down with Rickard and Per at the office to have a look at this,
>> and have built a new webrev based on their feedback.
>> The main elements in the delta are the following:
>>
>> 1) Wrap various context information that is passed around in the
>> BarrierSetC1 hierarchy in a wrapper object (to reduce boiler plate),
>> that has been named LIRAccess. It contains the address elements (base
>> and offset, as either LIRItem or LIROpr), as well as the decorators, and
>> the CodeEmitInfo of the address (for patching), the CodeEmitInfo for the
>> access (for things like implicit null checks), and the LIRGenerator
>> instance, that would normally be passed around to every function.
>> 2) Added #ifdef COMPILER1 in the G1BarrierSetC1 classes to be polite to
>> people trying to build HotSpot with a generated interpreter but no C1
>> compiler.
>> 3) Added a decorator_fixup() method that applies various implicit
>> decorator rules for sane defaults (for example, IN_HEAP_ARRAY implies
>> IN_HEAP). This is a 1:1 mirror to the DecoratorFixup meta function used
>> in the runtime backend. Both are now located in accessDecorator.hpp. One
>> for use by templates (DecoratorFixup), and one for use by code
>> generators that do not use templates (decorator_fixup()).
>> 4) Removed some unnecessary includes and friend class declarations.
>> 5) Made BarrierSetC1 a member of LIRGenerator to reduce boiler plate
>> further.
>> 6) Changed name of the lir_generator variable passed around to gen, to
>> be consistent with what other code in C1 does when passing around the
>> LIRGenerator instance.
>>
>> Incremental webrev:
>> http://cr.openjdk.java.net/~eosterlund/8201543/webrev.00_01/
>>
>> Full webrev:
>> http://cr.openjdk.java.net/~eosterlund/8201543/webrev.01/
>>
>> Big thanks to Rickard and Per for having a look at this.
>>
>> Thanks,
>> /Erik
>>
>> On 2018-04-13 17:11, Erik Österlund wrote:
>>> Hi,
>>>
>>> The GC barriers for C1 are not as modular as they could be. It
>>> currently uses switch statements to check which GC barrier set is
>>> being used, and calls one or another barrier based on that, in a way
>>> that it can only be used for write barriers.
>>>
>>> The solution I propose is to add the same facilities that have been
>>> added in runtime and the interpreter already: a barrier set backend
>>> for C1. I call it BarrierSetC1, and it helps us generate decorated
>>> accesses that give the GC control over the details how to generate
>>> this access. It recognizes the same decorators (accessDecorators.hpp)
>>> that the other parts of the VM recognize. Each concrete barrier set
>>> has its own backend. For now, these are CardTableBarrierSetC1 and
>>> G1BarrierSetC1, but this should pave way for upcoming concurrently
>>> compacting GCs as well.
>>>
>>> Two decorators were added for C1 specifically (found in
>>> c1_Decorators.hpp):
>>> C1_NEEDS_PATCHING for accesses where the index is not yet load because
>>> the class has yet to be loaded, and
>>> C1_MASK_BOOLEAN for accesses that need to mask untrusted boolean values.
>>>
>>> LIRGenerator calls a wrapper called access_store_at, access_load_at,
>>> etc (there are variants for cpmxchg, xchg and atomic add as well).
>>> The access call calls straight into the BarrierSetC1 hierarchy using
>>> virtual calls. It is structured in a way very similar to
>>> BarrierSetAssembler.
>>>
>>> BarrierSetC1 can also be called during initialization to generate
>>> stubs and runtime methods required by C1. For G1BarrierSetC1, this
>>> results in calling the BarrierSetAssembler for the platform specific
>>> code. This way, the BarrierSetC1 hierarchy has been carefully kept in
>>> shared code, and the switch statements for generating G1 code have
>>> been removed. Some code that used to be platform specific (like unsafe
>>> get/set and array store) have been broken out to shared code, with the
>>> actual platform specific details (some register allocation for store
>>> check and atomics) broken out to platform specific methods. This way,
>>> calls to access are kept in platform specific code.
>>>
>>> As usual, big thanks go to Martin Doerr for helping out with S390 and
>>> PPC, and Roman for taking care of AArch64.
>>>
>>> Bug:
>>> https://bugs.openjdk.java.net/browse/JDK-8201543
>>>
>>> Webrev:
>>> http://cr.openjdk.java.net/~eosterlund/8201543/webrev.00/
>>>
>>> Thanks,
>>> /Erik
>