[aarch64-port-dev ] RFR: JDK-8214527 AArch64: ZGC for Aarch64
Stuart Monteith
stuart.monteith at linaro.org
Tue Jun 11 08:32:55 UTC 2019
Looking way back, the original ZGC patch added definitions to
x86_64.ad explicitly for handling the vector registers.
In aarch64.ad, should I add definitions like this reg_class for
all of the registers up to v31_reg? I'm concerned that the register
definition doesn't include V0_J and V0_K:
// Class for 128 bit register v0
reg_class v0_reg(
V0, V0_H
);
and then add further definitions like the following, all the way up to
and including vRegD_V31:
operand vRegD_V0()
%{
constraint(ALLOC_IN_RC(v0_reg));
match(RegD);
op_cost(0);
format %{ %}
interface(REG_INTER);
%}
The z_aarch64.ad file would then be modified to:
instruct loadBarrierSlowReg(iRegP dst, memory mem, rFlagsReg cr,
vRegD_V0 v0, vRegD_V1 v1, vRegD_V2 v2........ vRegD_V31 v31) %{
match(Set dst (LoadBarrierSlowReg mem));
predicate(!n->as_LoadBarrierSlowReg()->is_weak());
effect(DEF dst, KILL cr,
KILL v0, KILL v1, KILL v2..... KILL v31);
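Writing out 32 operands and 32 KILL effects by hand is error-prone, so
a throwaway generator may help. This is only a sketch: kill_list and
operand_list are hypothetical helpers of my own, not part of HotSpot or
ADLC, and the naming pattern (vRegD_V<n> v<n>) is assumed from the
fragment above.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: builds "KILL v0, KILL v1, ..., KILL v<count-1>".
std::string kill_list(int count) {
    assert(count >= 0);
    std::string out;
    for (int i = 0; i < count; ++i) {
        if (i != 0) out += ", ";
        out += "KILL v" + std::to_string(i);
    }
    return out;
}

// Hypothetical helper: builds "<prefix>0 v0, <prefix>1 v1, ...",
// e.g. prefix "vRegD_V" gives "vRegD_V0 v0, vRegD_V1 v1, ...".
std::string operand_list(const std::string& prefix, int count) {
    assert(count >= 0);
    std::string out;
    for (int i = 0; i < count; ++i) {
        if (i != 0) out += ", ";
        out += prefix + std::to_string(i) + " v" + std::to_string(i);
    }
    return out;
}
```

Calling operand_list("vRegD_V", 32) and kill_list(32) would produce the
text to paste into the instruct signature and the effect() clause.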
On the face of it the v0_reg class only has a 64-bit definition,
rather than 128-bit.
Would this be more correct for our purposes?
// Class for 128 bit register v0
reg_class vx0_reg(
V0, V0_H, V0_J, V0_K
);
operand vVecX_V0()
%{
constraint(ALLOC_IN_RC(vx0_reg));
match(VecX);
op_cost(0);
format %{ %}
interface(REG_INTER);
%}
instruct loadBarrierSlowReg(iRegP dst, memory mem, rFlagsReg cr,
vVecX_V0 v0, vVecX_V1 v1, vVecX_V2 v2........ vVecX_V31 v31) %{
match(Set dst (LoadBarrierSlowReg mem));
predicate(!n->as_LoadBarrierSlowReg()->is_weak());
effect(DEF dst, KILL cr,
KILL v0, KILL v1, KILL v2..... KILL v31);
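As a rough check of the width arithmetic behind the two reg_class
variants, here is a small sketch. It assumes that each name listed in
a reg_class (V0, V0_H, V0_J, V0_K) stands for one 32-bit slot; that is
my understanding of how the register definitions are laid out, but
treat it as an assumption. The helpers class_width_bits and covers are
hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Assumption: each name in a reg_class stands for one 32-bit slot.
constexpr int kSlotBits = 32;

// Hypothetical helper: total width, in bits, covered by a list of slot names.
int class_width_bits(const std::vector<std::string>& slots) {
    return static_cast<int>(slots.size()) * kSlotBits;
}

// Hypothetical helper: true if the class is at least `bits` bits wide.
bool covers(const std::vector<std::string>& slots, int bits) {
    assert(bits > 0);
    return class_width_bits(slots) >= bits;
}
```

Under that assumption, {V0, V0_H} comes to 64 bits and
{V0, V0_H, V0_J, V0_K} to 128 bits, which is why the second form looks
like the one to use if the full vector register must be killed.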
Thanks,
Stuart
On Mon, 10 Jun 2019 at 22:38, Stuart Monteith
<stuart.monteith at linaro.org> wrote:
>
> Thanks for looking at this.
>
> With ZGC we are emitting more lea macro instructions. In some
> circumstances I found that a post-indexed address was being passed to
> lea, so we have to cover that case. Here the effective address is
> just the base address, because the base register is only updated
> after the address has been used.
>
> rscratch2 is loaded with the effective source address, which you'll
> see is passed to the load barrier routine, further down in the method
> - would you like a comment clarifying?
> // call_VM_leaf uses rscratch1.
> __ call_VM_leaf(ZBarrierSetRuntime::load_barrier_on_oop_field_preloaded_addr(decorators),
> dst, rscratch2);
>
> As for the vector registers, should I be saving the full 128 bits of
> v0 to v31? ZBarrierSetAssembler::load_at on x86 saves 8 SSE XMM
> registers - would they be considered the same as the save-on-call
> d0-d7 registers on AArch64?
>
> I notice that z_x86_64.ad will kill xmm, ymm and zmm registers (SSE,
> AVX, AVX-512 registers?), depending on the machine it is on. I presume
> the risk we have here is that during autovectorization these registers
> will be lost if we are unlucky enough to have the barrier code
> overwrite them.
>
> BR,
> Stuart
>
> On Mon, 10 Jun 2019 at 17:39, Andrew Haley <aph at redhat.com> wrote:
> >
> > On 6/10/19 3:02 PM, Stuart Monteith wrote:
> > > Nils' patch for "ZGC Late Barrier Insertion" has been merged
> > > (http://hg.openjdk.java.net/jdk/jdk/rev/ed12027517c0). I'm now running
> > > a fresh jtreg test against it with my updated patch here:
> > >
> > > http://cr.openjdk.java.net/~smonteith/8214527/webrev.4/
> > >
> > > > The difference from before is some additions to z_aarch64.ad in order
> > > > to implement the new nodes required by Nils' patch. Running against
> > > > specjbb2015 and Lucene doesn't throw up any errors.
> >
> > This hunk is very weird.
> >
> > --- old/src/hotspot/cpu/aarch64/assembler_aarch64.cpp 2019-06-10 14:25:39.274238301 +0100
> > +++ new/src/hotspot/cpu/aarch64/assembler_aarch64.cpp 2019-06-10 14:25:39.026235784 +0100
> > @@ -1265,6 +1265,13 @@
> > __ movptr(r, (uint64_t)target());
> > break;
> > }
> > + case post: {
> > + // Post-indexed, just copy the contents of the register. Offset added afterwards.
> > + if (_base == r) // it's a nop
> > + break;
> > + __ mov(r, _base);
> > + break;
> > + }
> > default:
> > ShouldNotReachHere();
> > }
> >
> > What is going on here:
> >
> > +
> > + // rscratch1 can be passed as src or dst, so don't use it.
> > + RegSet savedRegs = RegSet::of(rscratch2, rheapbase);
> > +
> > + Label done;
> > + assert_different_registers(rheapbase, rscratch2, dst);
> > + assert_different_registers(rheapbase, rscratch2, src.base());
> > +
> > + __ push(savedRegs, sp);
> > +
> > + // Load bad mask into scratch register.
> > + __ ldr(rheapbase, address_bad_mask_from_thread(rthread));
> > + __ lea(rscratch2, src);
> >
> > You load an address into rscratch2 but you do not use rscratch2.
> >
> > Barrier stubs save int registers but not vectors. Why is that?
> >
> > Surely this file is nearly identical to x86:
> >
> > --- /dev/null 2019-06-10 08:42:37.317240407 +0100
> > +++ new/src/hotspot/os_cpu/linux_aarch64/gc/z/zBackingFile_linux_aarch64.cpp 2019-06-10 14:25:44.374290036 +0100
> > @@ -0,0 +1,590 @@
> >
> > --
> > Andrew Haley
> > Java Platform Lead Engineer
> > Red Hat UK Ltd. <https://www.redhat.com>
> > https://keybase.io/andrewhaley
> > EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671