RFR: JDK-8207169: X86: Modularize cmpxchg-oop assembler for C1 and C2

Fri Aug 31 14:16:36 UTC 2018

Hi Roman and Roland,

In C1, just as with C2, the barriers should be represented with LIR first, before being lowered to machine code. Making the raw CAS be intercepted at the assembly level, with temp register allocations for the barriers done on a higher level, that at the last second inserts GC barriers, is placing the barriers at the wrong abstraction layer IMO, and is indeed something that no other GCs do.

So if you have some kind of loop for your CAS, your ShenandoahBarrierSetC1 can generate the LIR for that, and allocate registers accordingly, letting the register allocator do its thing.

If you do have machine dependencies in some part of your barriers, you can make your own barrier LIR in your ShenandoahBarrierSetC1 files that will have its relevant visitors called via virtual calls.

I hope this helps.

Thanks,
/Erik

> On 29 Aug 2018, at 21:59, Roman Kennke <rkennke at redhat.com> wrote:
> 
> So how about we leave C2 out of the picture for now, and do the C1 part.
> This is a pretty easy affair. The only little thing is if/when to
> allocate tmp registers, but we can easily solve this by having
> BarrierSetC1 pass actual tmp registers or illegalOpr to
> LIRGenerator::atomic_cmpxchg(), instead of handling this in
> atomic_cmpxchg() altogether. Right?
> 
> http://cr.openjdk.java.net/~rkennke/JDK-8207169/webrev.01/
> 
> If we later decide to handle the C2 case by using the same routine, we
> can still extend the interface, or else come up with a better solution.
> 
> What do you think?
> 
> Roman
> 
>> Hi Roman,
>> 
>>> On 2018-08-29 17:14, Roman Kennke wrote:
>>> Hmm right, generating specific e.g. ShenandoahCompareAndSwapPNode and
>>> putting an expansion for that in the .ad file seems like a good idea.
>> 
>> I meant generating CAS and GC barrier nodes for the intrinsic, and macro
>> expanding the barrier nodes during the macro expansion phase into an
>> inflated node representation describing their barrier logic, and leave
>> the AD file alone.
>> 
>>> We tried generating ideal graph for what we need for cmpxchg-oop and it
>>> was complicated and error-prone and not really worth it. We may try
>>> again.
>> 
>> It is fiddly indeed. But the resulting interface is much cleaner I
>> think. I don't think putting things in the AD file for convenience is
>> necessarily the right thing to do. I do see the temptation of doing so
>> though, as it is quite a shortcut. But I think the AD files was meant to
>> describe how nodes map to different architectures, not be a place where
>> you put things for convenience that were tricky to stitch together the
>> right node graph for. At least that is how I view it.
>> 
>> Thanks,
>> /Erik
>> 
>>> Roman
>>> 
>>>> I agree with you that it would be nice to have a good interface for
>>>> this, and that putting your GC barriers straight into the AD file makes
>>>> it very awkward to have a clean interface. It will be a special
>>>> interface for shenandoah only, that only shenandoah uses. All other GCs
>>>> generate nodes for their barriers. It seems like a bit of a shortcut for
>>>> not having to generate the proper nodes for your barriers, which I think
>>>> is the right level to do this, and is where all other GCs do this. The
>>>> rules matched in the AD file kind of presume you know what instructions
>>>> will be generated. This is required for example for PrintOptoAssembly to
>>>> generate anything useful. It might not be a dealbreaker I suppose if
>>>> PrintOptoAssembly prints nonsense. But I can't help but feel like the
>>>> abstraction level is not right here, and that placing GC barriers
>>>> straight into the AD file for convenience, is a bit of a hack.
>>>> 
>>>> An alternative that I think should be considered, is to do what ZGC
>>>> does: generate nodes in ZBarrierSetC2, and macro expand barrier nodes in
>>>> ZBarrierSetC2. That way, your changes can remain in your
>>>> ShenandoahBarrierSetC2 class, and not leak out into the AD files.
>>>> 
>>>> Thanks,
>>>> /Erik
>>>> 
>>>>> On 2018-08-29 16:35, Roman Kennke wrote:
>>>>> Hi Roland,
>>>>> 
>>>>>>> It was suggested to me (off-list) to simply put this stuff under if
>>>>>>> (UseShenandoahGC). I don't have a very strong preference about this.
>>>>>>> I'm
>>>>>>> leaning towards having a (somewhat) proper GC interface for it, if
>>>>>>> only
>>>>>>> because of the symmetry with the runtime GC interface that also has a
>>>>>>> cmpxchg hook, and because, cmpxchg(-oop) is, in-fact, a
>>>>>>> heap-access and
>>>>>>> should thus go through the GC interface.
>>>>>>> 
>>>>>>> So, how do others feel about this? Better to put this under
>>>>>>> Shenandoah-specific paths? Make it a proper
>>>>>>> GC/BarrierSet-interface? I
>>>>>>> also very much welcome suggestions for improving the suggested
>>>>>>> interface.
>>>>>> FWIW, I made that off-list comment. My objection is that:
>>>>>> 
>>>>>>     666     if
>>>>>> (BarrierSet::barrier_set()->barrier_set_assembler()->handle_cmpxchg_oop())
>>>>>> 
>>>>>> {
>>>>>>     667       tmp1 = new_register(T_OBJECT);
>>>>>>     668       tmp2 = new_register(T_OBJECT);
>>>>>>     669     }
>>>>>> 
>>>>>> is shenandoah specific. I suppose you could have an API entry point
>>>>>> that
>>>>>> would return some opaque object. That object for shenandoah would
>>>>>> record
>>>>>> 2 extra registers. Then in the backend, you would pass that opaque
>>>>>> object to BarrierSetAssembler::cmpxchg_oop_c1() and the shenandoah
>>>>>> implementation would get its 2 extra registers.
>>>>>> 
>>>>>> The problem is that with c2:
>>>>>> 
>>>>>>    7471 instruct compareAndSwapP_BS(rRegI res,
>>>>>>    7472                             memory mem_ptr,
>>>>>>    7473                             rRegP tmp1, rRegP tmp2,
>>>>>>    7474                             rax_RegP oldval, rRegP newval,
>>>>>>    7475                             rFlagsReg cr)
>>>>>>    7476 %{
>>>>>> 
>>>>>> is also shenandoah specific for the same reason (it needs 2 extra
>>>>>> registers). And I don't see a way to properly abstract that.
>>>>>> 
>>>>>> So it seems to me, that code is not properly abstracted, there's no
>>>>>> reasonable way to properly abstract it, and it's actually shenandoah
>>>>>> specific code in disguise so it's better to have it be explicitly
>>>>>> shenandoah specific.
>>>>>> 
>>>>>> Roland.
>>>>>> 
>>>>> Thanks, Roland!
>>>>> 
>>>>> So let me put my argument here too. The GC/BarrierSet-interface already
>>>>> has a couple of interfaces that are Shenandoah-specific. Basically all
>>>>> of what I added for supporting primitive access, object equality and
>>>>> allocations is Shenandoah-specific. There is a cmpxchg-oop interface in
>>>>> the runtime too. But then again, it is totally conceivable that a
>>>>> future
>>>>> GC might need those abstractions. In-fact, as far as I know, ZGC
>>>>> suffers
>>>>> from a similar problem with cmpxchg-oop as Shenandoah does, but there
>>>>> it's solved differently.
>>>>> 
>>>>> Looking from a very-far-away perspective, one of the design ideas
>>>>> behind
>>>>> the current GC/BarrierSet-interface was that the GC should own all heap
>>>>> access. And cmpxchg-oop is quite definitely a heap-access. It is
>>>>> unfortunate that we can't seem to make it really clean with the way
>>>>> that
>>>>> the C2 backend is currently working. What I suggested seemed the
>>>>> closest
>>>>> we can get while remaining practical.
>>>>> 
>>>>> Roman
>>>>> 
>>> 
>> 
> 
>