RFR (M) 8146410: Interpreter functions are declared and defined in the wrong files

Tue Jan 5 15:45:08 UTC 2016

Hi,

On 01/05/2016 10:28 AM, Andrew Dinn wrote:
> 
> The generic code relies on the MacroAssembler auxiliary
> bang_stack_with_offset(int offset) called with the relevant offsets. On
> AArch64, that auxiliary method does a store using a register with
> register offset
> 
>     mov(rscratch2, -offset);
>     str(zr, Address(sp, rscratch2));
> 
> The overriding definition that you have commented out bypasses this
> auxiliary, computing the address in a scratch register then using a
> register direct store
> 
>     for (int pages = start_page; pages <= n_shadow_pages ; pages++) {
>       __ sub(rscratch2, sp, pages*page_size);
>       __ str(zr, Address(rscratch2));
> 
> I cannot be certain but I recall we found that there was a significant
> performance difference between these two variants -- which means we want
> to retain the definition you have commented out. Andrew Haley (in cc)
> may be able to provide more info here as I believe he added this
> overriding implementation.
> 
> I guess a comment on our overriding definition would be in order should
> we ascertain that this was indeed the rationale.

I don't think that it matters any more.  It used to be that we were
not very good at generating all the possible versions of move
immediate instructions, so with the "old" code before Coleen's patch
we generated

  0x0000007fa808e394: sub	xscratch2, sp, #0x1, lsl #12
  0x0000007fa808e398: str	xzr, [xscratch2]
  0x0000007fa808e39c: sub	xscratch2, sp, #0x2, lsl #12
  0x0000007fa808e3a0: str	xzr, [xscratch2]
  0x0000007fa808e3a4: sub	xscratch2, sp, #0x3, lsl #12
  0x0000007fa808e3a8: str	xzr, [xscratch2]
  0x0000007fa808e3ac: sub	xscratch2, sp, #0x4, lsl #12
  0x0000007fa808e3b0: str	xzr, [xscratch2]
  0x0000007fa808e3b4: sub	xscratch2, sp, #0x5, lsl #12
  0x0000007fa808e3b8: str	xzr, [xscratch2]
  0x0000007fa808e3bc: sub	xscratch2, sp, #0x6, lsl #12
  0x0000007fa808e3c0: str	xzr, [xscratch2]
  0x0000007fa808e3c4: sub	xscratch2, sp, #0x7, lsl #12
  0x0000007fa808e3c8: str	xzr, [xscratch2]
  0x0000007fa808e3cc: sub	xscratch2, sp, #0x8, lsl #12
  0x0000007fa808e3d0: str	xzr, [xscratch2]
  0x0000007fa808e3d4: sub	xscratch2, sp, #0x9, lsl #12
  0x0000007fa808e3d8: str	xzr, [xscratch2]

and with the generic code after this patch

  0x0000007fa808e394: orr	xscratch2, xzr, #0xfffffffffffff000
  0x0000007fa808e398: str	xzr, [sp,x9]
  0x0000007fa808e39c: orr	xscratch2, xzr, #0xffffffffffffe000
  0x0000007fa808e3a0: str	xzr, [sp,x9]
  0x0000007fa808e3a4: mov	xscratch2, #0xffffffffffffd000    	// #-12288
  0x0000007fa808e3a8: str	xzr, [sp,x9]
  0x0000007fa808e3ac: orr	xscratch2, xzr, #0xffffffffffffc000
  0x0000007fa808e3b0: str	xzr, [sp,x9]
  0x0000007fa808e3b4: mov	xscratch2, #0xffffffffffffb000    	// #-20480
  0x0000007fa808e3b8: str	xzr, [sp,x9]
  0x0000007fa808e3bc: mov	xscratch2, #0xffffffffffffa000    	// #-24576
  0x0000007fa808e3c0: str	xzr, [sp,x9]
  0x0000007fa808e3c4: mov	xscratch2, #0xffffffffffff9000    	// #-28672
  0x0000007fa808e3c8: str	xzr, [sp,x9]
  0x0000007fa808e3cc: orr	xscratch2, xzr, #0xffffffffffff8000
  0x0000007fa808e3d0: str	xzr, [sp,x9]
  0x0000007fa808e3d4: mov	xscratch2, #0xffffffffffff7000    	// #-36864
  0x0000007fa808e3d8: str	xzr, [sp,x9]

In the case of 64k pages, that's before

  0x000003ffa808e398: sub	xscratch2, sp, #0x10, lsl #12
  0x000003ffa808e39c: str	xzr, [xscratch2]
  0x000003ffa808e3a0: sub	xscratch2, sp, #0x20, lsl #12
  0x000003ffa808e3a4: str	xzr, [xscratch2]

and after

  0x000003ffa808e398: orr	xscratch2, xzr, #0xffffffffffff0000
  0x000003ffa808e39c: str	xzr, [sp,x9]
  0x000003ffa808e3a0: mov	xscratch2, #0xfffffffffffe0000    	// #-131072
  0x000003ffa808e3a4: str	xzr, [sp,x9]

So, it turns out we can now generate each of these offsets in a single
instruction, which makes both variants two instructions per page: it
really doesn't matter, and I think Coleen's patch is just fine.
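
For anyone who wants to check the single-instruction claim without a
disassembler, here is a rough sketch in C++. It is not the HotSpot
assembler's logic: the function names are made up for illustration, and
the logical-immediate test is deliberately simplified to the one pattern
that shows up in the listings above (a run of ones at the top of the
register, i.e. the value is minus a power of two). It only asks whether
each offset -pages*page_size fits a single MOVN or that simplified ORR
form.

```cpp
#include <cassert>
#include <cstdint>

// True if v can be materialized by one MOVN: the bitwise inverse of v
// has at most one non-zero 16-bit halfword.
static bool fits_movn(int64_t v) {
  uint64_t inv = ~static_cast<uint64_t>(v);
  int nonzero_halfwords = 0;
  for (int shift = 0; shift < 64; shift += 16) {
    if ((inv >> shift) & 0xffff) nonzero_halfwords++;
  }
  return nonzero_halfwords <= 1;
}

// Simplified logical-immediate test (an assumption, not the full
// AArch64 bitmask-immediate rule): accepts only values of the form
// -(2^k), i.e. ones from bit k up to bit 63. All-ones and zero are
// rejected because neither is a valid bitmask immediate.
static bool is_high_run_of_ones(int64_t v) {
  uint64_t u = static_cast<uint64_t>(v);
  if (u == 0 || u == ~0ULL) return false;
  uint64_t neg = ~u + 1;  // two's complement negation, so neg == 2^k
  return (neg & (neg - 1)) == 0;
}

// True if every stack-bang offset for pages 1..n_pages can be loaded
// into a scratch register with a single instruction.
static bool all_offsets_single_insn(int64_t page_size, int n_pages) {
  for (int pages = 1; pages <= n_pages; pages++) {
    int64_t offset = -static_cast<int64_t>(pages) * page_size;
    if (!fits_movn(offset) && !is_high_run_of_ones(offset)) return false;
  }
  return true;
}
```

Calling all_offsets_single_insn(4096, 9) and
all_offsets_single_insn(65536, 2) covers the two cases shown above. The
second 64k offset, -131072, is the interesting one: it does not fit a
MOVN but is a run of high ones.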

In terms of efficiency, the cost of a few instructions is dwarfed by
the fact that we dirty 9 (out of only maybe 48) TLB entries.  That's a
huge performance effect.

Andrew.