[aarch64-port-dev ] RFR: JDK-8214527 AArch64: ZGC for Aarch64

Ningsheng Jian ningsheng.jian at arm.com
Thu Jun 13 03:18:30 UTC 2019


Hi Stuart,

+// Class for 128 bit register v4
+reg_class v4_reg(
+    V4, V4_H
+);

The comment should be "64 bit register"?
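
For reference, the corrected declaration would look like this (assuming, as elsewhere in aarch64.ad, that each slot such as V4 and V4_H is a 32-bit half, so the pair covers 64 bits):

+// Class for 64 bit register v4
+reg_class v4_reg(
+    V4, V4_H
+);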

And in generate_load_barrier_stub():

+  // Save live registers
+  RegSet savedRegs = RegSet::range(r0,r28) - RegSet::of(raddr);
+
+  __ enter();
+  __ push(savedRegs, sp);

I think just saving the call-clobbered registers should be OK?
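
Something along these lines (just a sketch; under AAPCS64, r0-r17 are the
caller-saved general-purpose registers, with r18 reserved as the platform
register):

  // Save only the call-clobbered registers, excluding the result register
  RegSet savedRegs = RegSet::range(r0, r17) - RegSet::of(raddr);

  __ enter();
  __ push(savedRegs, sp);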

Thanks,
Ningsheng

On 6/12/19 11:18 PM, Stuart Monteith wrote:
> Hello,
>     I believe I've addressed the outstanding issues. As you pointed
> out, Andrew H., getting a vectorized loop to provoke a LoadBarrier to
> spill vector/float registers is proving difficult.
> 
> I've added the spilling of floating point registers to load_at in
> zBarrierSetAssembler_aarch64.cpp, and I've modified aarch64.ad and
> z_aarch64.ad to spill the vector registers, I think. I'd appreciate it
> if Andrew D wouldn't mind giving it a once-over - my wrongness or
> correctness should be obvious - it's not clear to me whether the whole
> 128 bits is spilled, or whether just 64 would be spilled.
> 
>      http://cr.openjdk.java.net/~smonteith/8214527/webrev.5/
> 
> I've tested against Lucene, SPECjbb2015 and some limited JTreg runs -
> I've got a full run running just now.
> 
> Thanks,
>     Stuart
> 
> On Tue, 11 Jun 2019 at 09:53, Andrew Haley <aph at redhat.com> wrote:
>>
>> On 6/10/19 10:38 PM, Stuart Monteith wrote:
>>> With ZGC we are emitting more lea macro instructions. In some
>>> circumstances I found that a post indexed address was being passed to
>>> LEA, and so we have to cover that circumstance. In this case, the
>>> effective address is just the base address, as it is changed after the
>>> address is calculated.
>>
>> Can we not find some way to fix this? Presumably Intel doesn't do that
>> because it has no post-increment instructions.
>>
>>> rscratch2 is loaded with the effective source address, which you'll
>>> see is passed to the load barrier routine, further down in the method
>>> - would you like a comment clarifying?
>>
>> The use of scratch registers in the AArch64 barrier code has already
>> led to runtime bugs; it's not a model to be imitated.  Please do not
>> use scratch registers to pass parameters. It's too risky. Also, please
>> do not expect scratch registers to be preserved across macro
>> expansions.
>>
>>>       // call_VM_leaf uses rscratch1.
>>>       __ call_VM_leaf(ZBarrierSetRuntime::load_barrier_on_oop_field_preloaded_addr(decorators),
>>> dst, rscratch2);
>>>
>>> As for the vector registers, I should be saving the full 128 bits of
>>> v0 to v31?  ZBarrierSetAssembler::load_at on x86 saves 8 SSE XMM
>>> registers - would they be considered the same as the save-on-call
>>> d0-d7 registers on Aarch64?
>>>
>>> I notice that z_x86_64.ad will kill xmm, ymm and zmm registers (SSE,
>>> AVX, AVX-512 registers?), depending on the machine it is on. I presume
>>> the risk we have here is that during autovectorization these registers
>>> will be lost if we are unlucky enough to have the barrier code
>>> invoked.
>>
>> Explicit KILLs in the AD file are the thing to do here. This uses the
>> native calling convention.
>>
>> --
>> Andrew Haley
>> Java Platform Lead Engineer
>> Red Hat UK Ltd. <https://www.redhat.com>
>> https://keybase.io/andrewhaley
>> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

