RFR (M) 8146410: Interpreter functions are declared and defined in the wrong files
Lindenmaier, Goetz
goetz.lindenmaier at sap.com
Tue Jan 5 16:33:42 UTC 2016
Hi Andrew,
If you are concerned about TLB pollution, you can
load thread->_stack_overflow_limit and compare against that.
If you are past that limit, you just touch a yellow page to get the
SIGSEGV for the stack overflow.
The generated code touches the thread nearby anyway, so that page
should already be in the TLB.
(There is Thread::stack_overflow_limit_offset().)
Best regards,
Goetz.
(I guess this is out of scope for the original RFR.)
> -----Original Message-----
> From: hotspot-dev [mailto:hotspot-dev-bounces at openjdk.java.net] On
> Behalf Of Andrew Haley
> Sent: Tuesday, January 05, 2016 4:45 PM
> To: Andrew Dinn <adinn at redhat.com>; Coleen Phillimore
> <coleen.phillimore at oracle.com>; hotspot-dev developers
> <hotspot-dev at openjdk.java.net>
> Subject: Re: RFR (M) 8146410: Interpreter functions are declared and defined
> in the wrong files
>
> Hi,
>
> On 01/05/2016 10:28 AM, Andrew Dinn wrote:
> >
> > The generic code relies on the MacroAssembler auxiliary
> > bang_stack_with_offset(int offset), called with the relevant offsets. On
> > AArch64, that auxiliary method performs the store using register-offset
> > addressing:
> >
> > mov(rscratch2, -offset);
> > str(zr, Address(sp, rscratch2));
> >
> > The overriding definition that you have commented out bypasses this
> > auxiliary, computing the address in a scratch register and then using a
> > plain base-register store:
> >
> > for (int pages = start_page; pages <= n_shadow_pages; pages++) {
> >   __ sub(rscratch2, sp, pages*page_size);
> >   __ str(zr, Address(rscratch2));
> > }
> >
> > I cannot be certain, but I recall that we found a significant
> > performance difference between these two variants, which would mean we
> > want to retain the definition you have commented out. Andrew Haley (in cc)
> > may be able to provide more info here, as I believe he added this
> > overriding implementation.
> >
> > I guess a comment on our overriding definition would be in order should
> > we ascertain that this was indeed the rationale.
>
> I don't think that it matters any more. It used to be that we were
> not very good at generating all the possible versions of move
> immediate instructions, so with the "old" code before Coleen's patch
> we generated
>
> 0x0000007fa808e394: sub xscratch2, sp, #0x1, lsl #12
> 0x0000007fa808e398: str xzr, [xscratch2]
> 0x0000007fa808e39c: sub xscratch2, sp, #0x2, lsl #12
> 0x0000007fa808e3a0: str xzr, [xscratch2]
> 0x0000007fa808e3a4: sub xscratch2, sp, #0x3, lsl #12
> 0x0000007fa808e3a8: str xzr, [xscratch2]
> 0x0000007fa808e3ac: sub xscratch2, sp, #0x4, lsl #12
> 0x0000007fa808e3b0: str xzr, [xscratch2]
> 0x0000007fa808e3b4: sub xscratch2, sp, #0x5, lsl #12
> 0x0000007fa808e3b8: str xzr, [xscratch2]
> 0x0000007fa808e3bc: sub xscratch2, sp, #0x6, lsl #12
> 0x0000007fa808e3c0: str xzr, [xscratch2]
> 0x0000007fa808e3c4: sub xscratch2, sp, #0x7, lsl #12
> 0x0000007fa808e3c8: str xzr, [xscratch2]
> 0x0000007fa808e3cc: sub xscratch2, sp, #0x8, lsl #12
> 0x0000007fa808e3d0: str xzr, [xscratch2]
> 0x0000007fa808e3d4: sub xscratch2, sp, #0x9, lsl #12
> 0x0000007fa808e3d8: str xzr, [xscratch2]
>
> and with the generic code after this patch
>
> 0x0000007fa808e394: orr xscratch2, xzr, #0xfffffffffffff000
> 0x0000007fa808e398: str xzr, [sp,x9]
> 0x0000007fa808e39c: orr xscratch2, xzr, #0xffffffffffffe000
> 0x0000007fa808e3a0: str xzr, [sp,x9]
> 0x0000007fa808e3a4: mov xscratch2, #0xffffffffffffd000 // #-12288
> 0x0000007fa808e3a8: str xzr, [sp,x9]
> 0x0000007fa808e3ac: orr xscratch2, xzr, #0xffffffffffffc000
> 0x0000007fa808e3b0: str xzr, [sp,x9]
> 0x0000007fa808e3b4: mov xscratch2, #0xffffffffffffb000 // #-20480
> 0x0000007fa808e3b8: str xzr, [sp,x9]
> 0x0000007fa808e3bc: mov xscratch2, #0xffffffffffffa000 // #-24576
> 0x0000007fa808e3c0: str xzr, [sp,x9]
> 0x0000007fa808e3c4: mov xscratch2, #0xffffffffffff9000 // #-28672
> 0x0000007fa808e3c8: str xzr, [sp,x9]
> 0x0000007fa808e3cc: orr xscratch2, xzr, #0xffffffffffff8000
> 0x0000007fa808e3d0: str xzr, [sp,x9]
> 0x0000007fa808e3d4: mov xscratch2, #0xffffffffffff7000 // #-36864
> 0x0000007fa808e3d8: str xzr, [sp,x9]
>
> With 64K pages, the code before the patch is
>
> 0x000003ffa808e398: sub xscratch2, sp, #0x10, lsl #12
> 0x000003ffa808e39c: str xzr, [xscratch2]
> 0x000003ffa808e3a0: sub xscratch2, sp, #0x20, lsl #12
> 0x000003ffa808e3a4: str xzr, [xscratch2]
>
> and after
>
> 0x000003ffa808e398: orr xscratch2, xzr, #0xffffffffffff0000
> 0x000003ffa808e39c: str xzr, [sp,x9]
> 0x000003ffa808e3a0: mov xscratch2, #0xfffffffffffe0000 // #-131072
> 0x000003ffa808e3a4: str xzr, [sp,x9]
>
> So it turns out that we can generate each of these offsets in a single
> instruction: the choice of variant really doesn't matter, and I think
> Coleen's patch is just fine.
>
> In terms of efficiency, the cost of a few instructions is dwarfed by
> the fact that we dirty 9 TLB entries (out of maybe only 48). That's a
> huge performance effect.
>
> Andrew.