RFR (M) 8146410: Interpreter functions are declared and defined in the wrong files
Andrew Haley
aph at redhat.com
Tue Jan 5 15:45:08 UTC 2016
Hi,
On 01/05/2016 10:28 AM, Andrew Dinn wrote:
>
> The generic code relies on the MacroAssembler auxiliary
> bang_stack_with_offset(int offset) called with the relevant offsets. On
> AArch64, that auxiliary method does a store using a register with
> register offset
>
> mov(rscratch2, -offset);
> str(zr, Address(sp, rscratch2));
>
> The overriding definition that you have commented out bypasses this
> auxiliary, computing the address in a scratch register then using a
> register direct store
>
> for (int pages = start_page; pages <= n_shadow_pages ; pages++) {
> __ sub(rscratch2, sp, pages*page_size);
> __ str(zr, Address(rscratch2));
>
> I cannot be certain but I recall we found that there was a significant
> performance difference between these two variants -- which means we want
> to retain the definition you have commented out. Andrew Haley (in cc)
> may be able to provide more info here as I believe he added this
> overriding implementation.
>
> I guess a comment on our overriding definition would be in order should
> we ascertain that this was indeed the rationale.
I don't think that it matters any more. It used to be that we were
not very good at generating all the possible versions of move
immediate instructions, so with the "old" code before Coleen's patch
we generated
0x0000007fa808e394: sub xscratch2, sp, #0x1, lsl #12
0x0000007fa808e398: str xzr, [xscratch2]
0x0000007fa808e39c: sub xscratch2, sp, #0x2, lsl #12
0x0000007fa808e3a0: str xzr, [xscratch2]
0x0000007fa808e3a4: sub xscratch2, sp, #0x3, lsl #12
0x0000007fa808e3a8: str xzr, [xscratch2]
0x0000007fa808e3ac: sub xscratch2, sp, #0x4, lsl #12
0x0000007fa808e3b0: str xzr, [xscratch2]
0x0000007fa808e3b4: sub xscratch2, sp, #0x5, lsl #12
0x0000007fa808e3b8: str xzr, [xscratch2]
0x0000007fa808e3bc: sub xscratch2, sp, #0x6, lsl #12
0x0000007fa808e3c0: str xzr, [xscratch2]
0x0000007fa808e3c4: sub xscratch2, sp, #0x7, lsl #12
0x0000007fa808e3c8: str xzr, [xscratch2]
0x0000007fa808e3cc: sub xscratch2, sp, #0x8, lsl #12
0x0000007fa808e3d0: str xzr, [xscratch2]
0x0000007fa808e3d4: sub xscratch2, sp, #0x9, lsl #12
0x0000007fa808e3d8: str xzr, [xscratch2]
and with the generic code after this patch
0x0000007fa808e394: orr xscratch2, xzr, #0xfffffffffffff000
0x0000007fa808e398: str xzr, [sp,x9]
0x0000007fa808e39c: orr xscratch2, xzr, #0xffffffffffffe000
0x0000007fa808e3a0: str xzr, [sp,x9]
0x0000007fa808e3a4: mov xscratch2, #0xffffffffffffd000 // #-12288
0x0000007fa808e3a8: str xzr, [sp,x9]
0x0000007fa808e3ac: orr xscratch2, xzr, #0xffffffffffffc000
0x0000007fa808e3b0: str xzr, [sp,x9]
0x0000007fa808e3b4: mov xscratch2, #0xffffffffffffb000 // #-20480
0x0000007fa808e3b8: str xzr, [sp,x9]
0x0000007fa808e3bc: mov xscratch2, #0xffffffffffffa000 // #-24576
0x0000007fa808e3c0: str xzr, [sp,x9]
0x0000007fa808e3c4: mov xscratch2, #0xffffffffffff9000 // #-28672
0x0000007fa808e3c8: str xzr, [sp,x9]
0x0000007fa808e3cc: orr xscratch2, xzr, #0xffffffffffff8000
0x0000007fa808e3d0: str xzr, [sp,x9]
0x0000007fa808e3d4: mov xscratch2, #0xffffffffffff7000 // #-36864
0x0000007fa808e3d8: str xzr, [sp,x9]
In the case of 64k pages, the code before the patch is
0x000003ffa808e398: sub xscratch2, sp, #0x10, lsl #12
0x000003ffa808e39c: str xzr, [xscratch2]
0x000003ffa808e3a0: sub xscratch2, sp, #0x20, lsl #12
0x000003ffa808e3a4: str xzr, [xscratch2]
and after
0x000003ffa808e398: orr xscratch2, xzr, #0xffffffffffff0000
0x000003ffa808e39c: str xzr, [sp,x9]
0x000003ffa808e3a0: mov xscratch2, #0xfffffffffffe0000 // #-131072
0x000003ffa808e3a4: str xzr, [sp,x9]
So, it turns out we can generate all the offsets in a single
instruction: it really doesn't matter, and I think Coleen's patch is
just fine.
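As a rough cross-check of the "single instruction" claim, here is a minimal sketch in C++. It is not HotSpot code, and all the function names are made up for illustration. It assumes the usual AArch64 rules: a 64-bit constant fits in one instruction if it is a MOVZ/MOVN value (only one 16-bit halfword differs from all-zeros or all-ones) or a logical (bitmask) immediate (a rotated run of ones replicated across the register). It only tests the page counts that appear in the listings above.

```cpp
#include <cassert>
#include <cstdint>

// True if x is one contiguous run of 1 bits, e.g. 0b00111000.
static bool is_contiguous_run(uint64_t x) {
    if (x == 0) return false;
    uint64_t filled = x | (x - 1);        // fill the zero bits below the run
    return (filled & (filled + 1)) == 0;  // filled must now look like 0..01..1
}

// True if v is encodable as a 64-bit logical (bitmask) immediate:
// some element of 2, 4, ..., 64 bits that is a rotated run of ones
// (not all zeros, not all ones), replicated to fill the register.
static bool is_logical_imm(uint64_t v) {
    for (unsigned size = 2; size <= 64; size *= 2) {
        uint64_t mask = (size == 64) ? ~uint64_t(0)
                                     : ((uint64_t(1) << size) - 1);
        uint64_t elem = v & mask;
        uint64_t rep = elem;
        for (unsigned s = size; s < 64; s *= 2) rep |= rep << s;
        if (rep != v) continue;                     // v is not a repetition
        if (elem == 0 || elem == mask) continue;    // not encodable
        if (is_contiguous_run(elem) || is_contiguous_run(~elem & mask))
            return true;                            // plain or wrapped run
    }
    return false;
}

// True if at most one 16-bit halfword of v is non-zero.
static bool is_single_halfword(uint64_t v) {
    for (unsigned shift = 0; shift < 64; shift += 16)
        if ((v & ~(uint64_t(0xffff) << shift)) == 0) return true;
    return false;
}

// True if v can be materialised by a single MOVZ or a single MOVN.
static bool is_mov_wide(uint64_t v) {
    return is_single_halfword(v) || is_single_halfword(~v);
}

// Hypothetical helper: do all bang offsets -1*page_size .. -n_pages*page_size
// fit in one move-immediate instruction?
static bool all_offsets_fit(uint64_t page_size, unsigned n_pages) {
    assert(page_size > 0);
    for (unsigned pages = 1; pages <= n_pages; pages++) {
        uint64_t offset = uint64_t(0) - pages * page_size;  // two's complement
        if (!is_mov_wide(offset) && !is_logical_imm(offset)) return false;
    }
    return true;
}
```

Under those assumptions, all_offsets_fit(4096, 9) and all_offsets_fit(65536, 2) both hold. For example, 0xffffffffffffd000 is reachable through MOVN but not as a bitmask immediate, while 0xfffffffffffe0000 is a bitmask immediate but not a MOVZ/MOVN value.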
In terms of efficiency, the cost of a few instructions is dwarfed by
the fact that we dirty 9 (out of only maybe 48) TLB entries. That's a
huge performance effect.
Andrew.