RFR: JDK-8207169: X86: Modularize cmpxchg-oop assembler for C1 and C2

Mon Sep 3 09:25:12 UTC 2018

Hi Roman,

This did not use to be possible, as a custom op needed its own enum
switch cases all over the place. But as part of my C1 barrier set
interface work, I wanted to make it possible to define your own LIR_Ops
in the barrier set as well without cluttering the switches, so I inserted
appropriate virtual calls to the LIR_Ops that allow you to do that. Now,
basically, if your LIR_Op id is lir_none (which the default constructor
sets it to), then the switch statements will use virtual calls into your
LIR_Op.
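
As a rough illustration of that dispatch, here is a simplified,
self-contained model. It is not the actual HotSpot code: LirCode, LirOp,
BarrierTestOp, emit_custom() and emit() are names made up for this
sketch, and the real switches cover far more cases.

```cpp
#include <string>

// Hypothetical stand-in for the LIR op code enum; not the HotSpot enum.
enum LirCode { lir_none, lir_move, lir_branch };

// Hypothetical stand-in for the LIR_Op base class.
struct LirOp {
  LirCode code;
  // The default constructor leaves the id at lir_none.
  explicit LirOp(LirCode c = lir_none) : code(c) {}
  virtual ~LirOp() = default;
  // Virtual hook that the switch falls back to for lir_none ops.
  virtual std::string emit_custom() const { return "<unhandled>"; }
};

// A barrier-set-defined op: no new enum value, only an override.
struct BarrierTestOp : LirOp {
  std::string emit_custom() const override { return "barrier-test"; }
};

// Stand-in for one of the shared switch statements.
std::string emit(const LirOp& op) {
  switch (op.code) {
    case lir_move:   return "move";
    case lir_branch: return "branch";
    case lir_none:   return op.emit_custom();  // virtual call into the op
  }
  return "<unknown>";
}
```

With this shape, emit(BarrierTestOp()) reaches the override through the
lir_none case, so a GC-specific op does not need to touch the shared
enum or the switch.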

I see how inserting LIR loops in the HIR basic block in the general case
can go horribly wrong, as Roland showed in his example. So if you feel
that defining your own LIR_Op and lowering it in your barrier set is the
more natural solution for Shenandoah, you can of course use that
mechanism.

It sounds like we have reached an agreement?

Thanks,
/Erik

On 2018-09-03 10:59, Roman Kennke wrote:
> I wasn't sure that the BarrierSetC1 interface allows defining custom ops. This sounds like a good natural solution. Ditto for C2. Let's see if we can make that work.
>
> Roman
>
> On 3 September 2018 10:37:04 CEST, "Erik Österlund" <erik.osterlund at oracle.com> wrote:
>> Hi Roland,
>>
>> First of all, I apologize for getting your name wrong in the last
>> email.
>>
>> On 2018-08-31 16:46, Roland Westrelin wrote:
>>>> Well... C1 uses CAS in the heap only for the Unsafe CAS intrinsic,
>>>> which is indeed inserted at parse time. And all other GCs alter the
>>>> CFG for the GC barriers in their CAS barriers, using LIR. Except
>>>> Epsilon I suppose.
>>> Are you talking about for instance G1BarrierSetC1::pre_barrier()? That
>>> method adds control flow within a basic block. It doesn't hack the CFG
>>> (it doesn't add new basic blocks). How can the register allocator
>>> compute liveness without a correct CFG? Either
>>> G1BarrierSetC1::pre_barrier() is a simple enough case that register
>>> allocation is correct or there are some nasty bugs in there. In any
>>> case, building control flow within a block like
>>> G1BarrierSetC1::pre_barrier() does is an ugly hack. Doing anything more
>>> complicated that way is asking for trouble.
>> The C1 basic blocks are built and optimized as part of the HIR and are
>> not to be changed after that. Once the HIR is generated, the LIR inserts
>> operations required for lowering this optimized HIR to machine code.
>> After IR::compute_code() of the HIR, those basic blocks are set in
>> stone. That means that any control flow alterations needed by the
>> LIRGenerator, which comes into play after that, are going to use
>> branches within the HIR basic block instead (as we promised not to
>> change the HIR basic blocks after the HIR is built and optimized). I can
>> see how that might feel like a hack, but that is kind of the way that
>> things are currently done in C1. It is used this way for all barrier
>> sets today (UseCondCardMark for card marking GCs, for G1, ZGC), and it's
>> also used by T_BOOLEAN normalization, switch statements, checking for
>> referents in unsafe intrinsics etc. I suppose the stubs inserted at the
>> LIR level also similarly break the basic block abstraction of the HIR
>> level. These are things that can of course be changed into a stricter
>> basic block model even at the LIR level. But I don't know how much that
>> would help given that this is just the pass before lowering to machine
>> code. But that is a whole different discussion.
>>
>> I do not propose to move the GC barriers into the HIR - it is too early.
>> I propose to insert them at the LIR level like all the other GCs, in a
>> similar way to all the other GCs, using the same mechanisms used by all
>> the other GCs.
>>
>> @Roman: If you feel more comfortable using your own LIR_Op with your own
>> lowering or stubs instead because you want this written in assembly for
>> whatever reason, then I am fine with that too as long as it is contained
>> in the shenandoah folders. What I do have reservations against is
>> changing the API that everybody else uses so that the LIRGenerator raw
>> CAS gets lowered into a non-raw Access call to the macro assembler,
>> passing in temporary registers used by Shenandoah from above into the
>> raw CAS used by the non-raw macro assembler access CAS.
>>
>> For example, in ZGC we have an LIR_Op class, LIR_OpZLoadBarrierTest,
>> defined in zBarrierSetC1.cpp, which allows us to do custom machine
>> dependent lowering of the test itself, and which can be inserted into
>> the LIR list.
>>
>> I hope we are on the same page here!
>>
>> Thanks,
>> /Erik
>>
>>> Roland.