RFR: JDK-8214527 AArch64: ZGC for Aarch64

Stuart Monteith stuart.monteith at linaro.org
Thu Jun 13 10:13:22 UTC 2019


I also thought that the comment should be "64-bit" rather than
"128-bit", but what I did was consistent with v0_reg to v3_reg. I
presume one of them is wrong.

The clobbered registers are r0-r18, so I'll change the saved set to that range, thanks.
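
Concretely, something like this is what I have in mind (just a sketch, not
the final patch - it keeps the raddr exclusion and the RegSet/push/pop
helpers already used in the webrev):

  // Save only the call-clobbered general-purpose registers (r0-r18)
  // rather than the full r0-r28 range, still excluding raddr.
  RegSet savedRegs = RegSet::range(r0, r18) - RegSet::of(raddr);

  __ enter();
  __ push(savedRegs, sp);
  // ... call into the ZGC load barrier runtime ...
  __ pop(savedRegs, sp);
  __ leave();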


On Thu, 13 Jun 2019 at 04:22, Ningsheng Jian <ningsheng.jian at arm.com> wrote:
>
> Hi Stuart,
>
> +// Class for 128 bit register v4
> +reg_class v4_reg(
> +    V4, V4_H
> +);
>
> The comment should be "64 bit register"?
>
> And in generate_load_barrier_stub():
>
> +  // Save live registers
> +  RegSet savedRegs = RegSet::range(r0,r28) - RegSet::of(raddr);
> +
> +  __ enter();
> +  __ push(savedRegs, sp);
>
> I think just saving the call-clobbered registers should be OK?
>
> Thanks,
> Ningsheng
>
> On 6/12/19 11:18 PM, Stuart Monteith wrote:
> > Hello,
> >     I believe I've addressed the outstanding issues. As you pointed out
> > Andrew H., getting a vectorized loop to provoke a LoadBarrier to spill
> > vector/float registers is proving difficult.
> >
> > I've added the spilling of floating-point registers to load_at in
> > zBarrierSetAssembler_aarch64.cpp, and I've modified aarch64.ad and
> > z_aarch64.ad to spill the vector registers, I think. I'd appreciate it
> > if Andrew D wouldn't mind giving it a once-over - my wrongness or
> > correctness should be obvious - it's not clear to me whether the whole
> > 128 bits is spilled, or whether just 64 bits would be.
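> >
> > To make the intent concrete, this is the sort of spill I mean (only a
> > sketch with a made-up stack layout; it assumes the stpq/ldpq pair
> > instructions are available, and covers just v0-v7, the vector registers
> > that are caller-saved in full under AAPCS64):
> >
> >   // Spill the call-clobbered SIMD registers in full (128 bits each)
> >   // before the runtime call, and restore them afterwards.
> >   __ stpq(v0, v1, Address(__ pre(sp, -128)));
> >   __ stpq(v2, v3, Address(sp, 32));
> >   __ stpq(v4, v5, Address(sp, 64));
> >   __ stpq(v6, v7, Address(sp, 96));
> >
> >   // ... call into the load barrier runtime ...
> >
> >   __ ldpq(v6, v7, Address(sp, 96));
> >   __ ldpq(v4, v5, Address(sp, 64));
> >   __ ldpq(v2, v3, Address(sp, 32));
> >   __ ldpq(v0, v1, Address(__ post(sp, 128)));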
> >
> >      http://cr.openjdk.java.net/~smonteith/8214527/webrev.5/
> >
> > I've tested against Lucene and SPECjbb2015, and done some limited JTreg
> > runs - I've got a full run going just now.
> >
> > Thanks,
> >     Stuart
> >
> > On Tue, 11 Jun 2019 at 09:53, Andrew Haley <aph at redhat.com> wrote:
> >>
> >> On 6/10/19 10:38 PM, Stuart Monteith wrote:
> >>> With ZGC we are emitting more lea macro instructions. In some
> >>> circumstances I found that a post-indexed address was being passed to
> >>> LEA, so we have to cover that case. For a post-indexed address the
> >>> effective address is just the base address, since the base register is
> >>> only written back after the access.
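> >>>
> >>> Roughly, the extra case amounts to this (a sketch only - the mode and
> >>> accessor names are from memory, the webrev has the actual change):
> >>>
> >>>   // In the lea handling of Address modes: for a post-indexed address
> >>>   // the effective address is simply the base register, because the
> >>>   // base is only written back after the access.
> >>>   case post:
> >>>     as->mov(r, base());
> >>>     break;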
> >>
> >> Can we not find some way to fix this? Presumably Intel doesn't do that
> >> because it has no post-increment instructions.
> >>
> >>> rscratch2 is loaded with the effective source address, which you'll
> >>> see is passed to the load barrier routine further down in the method -
> >>> would you like a comment clarifying that?
> >>
> >> The use of scratch registers in the AArch64 barrier code has already
> >> led to runtime bugs; it's not a model to be imitated.  Please do not
> >> use scratch registers to pass parameters. It's too risky. Also, please
> >> do not expect scratch registers to be preserved across macro
> >> expansions.
> >>
> >>>       // call_VM_leaf uses rscratch1.
> >>>       __ call_VM_leaf(ZBarrierSetRuntime::load_barrier_on_oop_field_preloaded_addr(decorators),
> >>>                       dst, rscratch2);
> >>>
> >>> As for the vector registers, should I be saving the full 128 bits of
> >>> v0 to v31? ZBarrierSetAssembler::load_at on x86 saves 8 SSE XMM
> >>> registers - would they be considered the same as the save-on-call
> >>> d0-d7 registers on AArch64?
> >>>
> >>> I notice that z_x86_64.ad will kill xmm, ymm and zmm registers (SSE,
> >>> AVX, AVX-512 registers?), depending on the machine it is on. I presume
> >>> the risk we have here is that during autovectorization these registers
> >>> will be lost if we are unlucky enough to have the barrier code
> >>
> >> Explicit KILLs in the AD file are the thing to do here. This uses the
> >> native calling convention.
> >>
> >> --
> >> Andrew Haley
> >> Java Platform Lead Engineer
> >> Red Hat UK Ltd. <https://www.redhat.com>
> >> https://keybase.io/andrewhaley
> >> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671


