RFR: JDK-8214527 AArch64: ZGC for Aarch64

Wed Jun 12 15:18:48 UTC 2019

Hello,
   I believe I've addressed the outstanding issues. As you pointed out,
Andrew H., getting a vectorized loop to provoke a LoadBarrier to spill
vector/float registers is proving difficult.

I've added the spilling of floating point registers to load_at in
zBarrierSetAssembler_aarch64.cpp, and I've modified aarch64.ad and
z_aarch64.ad to spill the vector registers, I think. I'd appreciate it
if Andrew D. wouldn't mind giving it a once-over. My wrongness or
correctness should be obvious, but it's not clear to me whether the
whole 128 bits is spilled, or just the low 64.
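
To illustrate the 64-bit versus 128-bit question, here is a minimal
standalone sketch. It is not code from the webrev: VReg, spill_d,
spill_q, fill_d, fill_q, clobber and survives are all hypothetical
names, and the "register" is just a struct with two 64-bit halves.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical model of one 128-bit vector register as two 64-bit halves.
struct VReg {
    std::uint64_t lo;  // bits 0..63, the part visible through the "d" view
    std::uint64_t hi;  // bits 64..127
};

// Save only the low 64 bits into a stack slot; returns the bytes used.
inline std::size_t spill_d(const VReg& v, unsigned char* slot) {
    assert(slot != nullptr);
    std::memcpy(slot, &v.lo, sizeof v.lo);
    return sizeof v.lo;
}

// Restore the low 64 bits; the high half keeps whatever the call left there.
inline void fill_d(VReg& v, const unsigned char* slot) {
    assert(slot != nullptr);
    std::memcpy(&v.lo, slot, sizeof v.lo);
}

// Save all 128 bits into a stack slot; returns the bytes used.
inline std::size_t spill_q(const VReg& v, unsigned char* slot) {
    assert(slot != nullptr);
    std::memcpy(slot, &v.lo, sizeof v.lo);
    std::memcpy(slot + sizeof v.lo, &v.hi, sizeof v.hi);
    return sizeof v.lo + sizeof v.hi;
}

// Restore all 128 bits.
inline void fill_q(VReg& v, const unsigned char* slot) {
    assert(slot != nullptr);
    std::memcpy(&v.lo, slot, sizeof v.lo);
    std::memcpy(&v.hi, slot + sizeof v.lo, sizeof v.hi);
}

// Stand-in for a runtime call that overwrites the whole register.
inline void clobber(VReg& v) {
    v.lo = 0;
    v.hi = 0;
}

// True if the value survives a clobbering call when spilled at the given width.
inline bool survives(VReg v, bool full_width) {
    const VReg original = v;
    unsigned char slot[16] = {};
    if (full_width) {
        spill_q(v, slot);
        clobber(v);
        fill_q(v, slot);
    } else {
        spill_d(v, slot);
        clobber(v);
        fill_d(v, slot);
    }
    return v.lo == original.lo && v.hi == original.hi;
}
```

In this model a value with a live upper half survives only the
full-width spill, while a value that fits in the low 64 bits survives
either. That is the distinction the question above is about.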

    http://cr.openjdk.java.net/~smonteith/8214527/webrev.5/

I've tested against Lucene, SPECjbb2015 and some limited JTreg runs;
a full run is under way now.

Thanks,
   Stuart

On Tue, 11 Jun 2019 at 09:53, Andrew Haley <aph at redhat.com> wrote:
>
> On 6/10/19 10:38 PM, Stuart Monteith wrote:
> > With ZGC we are emitting more lea macro instructions. In some
> > circumstances I found that a post indexed address was being passed to
> > LEA, and so we have to cover that circumstance. In this case, the
> > effective address is just the base address, as it is changed after the
> > address is calculated.
>
> Can we not find some way to fix this? Presumably Intel doesn't do that
> because it has no post-increment instructions.
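
For reference, the post-indexed case described above can be modelled
as follows. This is a standalone sketch under my own assumptions, not
HotSpot code: Addr, Mode, effective_address and base_after are
hypothetical names, and the real Address class in the AArch64 port
need not look like this.

```cpp
#include <cstdint>

// Hypothetical subset of AArch64 addressing modes.
enum class Mode {
    BaseOffset,  // [base, #offset]   base is left unchanged
    PreIndex,    // [base, #offset]!  base is updated before the access
    PostIndex    // [base], #offset   base is updated after the access
};

// Hypothetical address operand: a base register value plus an immediate.
struct Addr {
    std::uint64_t base;
    std::int64_t offset;
    Mode mode;
};

// The address the memory access actually uses.
inline std::uint64_t effective_address(const Addr& a) {
    switch (a.mode) {
        case Mode::BaseOffset:
        case Mode::PreIndex:
            return a.base + static_cast<std::uint64_t>(a.offset);
        case Mode::PostIndex:
            // The offset is applied only after the access,
            // so the access itself sees the unmodified base.
            return a.base;
    }
    return a.base;  // not reached for valid Mode values
}

// The value left in the base register once the access has completed.
inline std::uint64_t base_after(const Addr& a) {
    if (a.mode == Mode::BaseOffset) {
        return a.base;
    }
    return a.base + static_cast<std::uint64_t>(a.offset);
}
```

For a post-indexed operand, effective_address returns the base alone
while base_after returns base plus offset. That is why an lea of such
an operand yields just the base register.
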
>
> > rscratch2 is loaded with the effective source address, which you'll
> > see is passed to the load barrier routine, further down in the method
> > - would you like a comment clarifying?
>
> The use of scratch registers in the AArch64 barrier code has already
> led to runtime bugs; it's not a model to be imitated.  Please do not
> use scratch registers to pass parameters. It's too risky. Also, please
> do not expect scratch registers to be preserved across macro
> expansions.
>
> >      // call_VM_leaf uses rscratch1.
> >      __ call_VM_leaf(ZBarrierSetRuntime::load_barrier_on_oop_field_preloaded_addr(decorators),
> > dst, rscratch2);
> >
> > As for the vector registers, I should be saving the full 128 bits of
> > v0 to v31?  ZBarrierSetAssembler::load_at on x86 saves 8 SSE XMM
> > registers - would they be considered the same as the save-on-call
> > d0-d7 registers on Aarch64?
> >
> > I notice that z_x86_64.ad will kill xmm, ymm and zmm registers (SSE,
> > AVX, AVX-512 registers?), depending on the machine it is on. I presume
> > the risk we have here is that during autovectorization these registers
> > will be lost if we are unlucky enough to have the barrier code
>
> Explicit KILLs in the AD file are the thing to do here. This uses the
> native calling convention.
>
> --
> Andrew Haley
> Java Platform Lead Engineer
> Red Hat UK Ltd. <https://www.redhat.com>
> https://keybase.io/andrewhaley
> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671