RFR (M) 8146410: Interpreter functions are declared and defined in the wrong files

Lindenmaier, Goetz goetz.lindenmaier at sap.com
Tue Jan 5 16:33:42 UTC 2016


Hi Andrew,

If you are concerned about TLB pollution, you can load
thread->_stack_overflow_limit and compare sp against it.
Only if you are past that limit do you touch a yellow page to get the
SIGSEGV for the stack overflow.

You access the Thread object nearby anyway, so the page holding it
should already be in the TLB.

(There is Thread::stack_overflow_limit_offset()).
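
As a rough sketch of the check I have in mind, written as plain C++ rather
than MacroAssembler code: ThreadModel, past_overflow_limit and
pages_written_on_entry are made-up names for illustration only, and the
exact comparison a real port would use (strict or not, with or without the
frame size added) is an assumption here.

```cpp
#include <cstdint>

// Hypothetical stand-in for the per-thread data. In HotSpot the real field is
// thread->_stack_overflow_limit, located via Thread::stack_overflow_limit_offset().
struct ThreadModel {
  std::uintptr_t stack_overflow_limit;
};

// The stack grows downwards, so "past the limit" means a lower address.
// Assumption: a strict comparison; a real port may compare differently.
constexpr bool past_overflow_limit(ThreadModel t, std::uintptr_t sp) {
  return sp < t.stack_overflow_limit;
}

// Number of stack pages written on method entry in this simplified model.
// Without the limit check, every shadow page is banged. With it, nothing is
// written in the common case, and one yellow page is touched when past the limit.
constexpr int pages_written_on_entry(ThreadModel t, std::uintptr_t sp,
                                     bool use_limit_check, int n_shadow_pages) {
  return !use_limit_check ? n_shadow_pages
                          : (past_overflow_limit(t, sp) ? 1 : 0);
}
```

In this model the common case costs one load and one compare and dirties no
stack pages, instead of one store per shadow page.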

Best regards,
  Goetz.

(I guess this is out of scope for the original RFR.)


> -----Original Message-----
> From: hotspot-dev [mailto:hotspot-dev-bounces at openjdk.java.net] On
> Behalf Of Andrew Haley
> Sent: Tuesday, January 05, 2016 4:45 PM
> To: Andrew Dinn <adinn at redhat.com>; Coleen Phillimore
> <coleen.phillimore at oracle.com>; hotspot-dev developers
> <hotspot-dev at openjdk.java.net>
> Subject: Re: RFR (M) 8146410: Interpreter functions are declared and defined
> in the wrong files
> 
> Hi,
> 
> On 01/05/2016 10:28 AM, Andrew Dinn wrote:
> >
> > The generic code relies on the MacroAssembler auxiliary
> > bang_stack_with_offset(int offset) called with the relevant offsets. On
> > AArch64, that auxiliary method does a store using a register with
> > register offset
> >
> >     mov(rscratch2, -offset);
> >     str(zr, Address(sp, rscratch2));
> >
> > The overriding definition that you have commented out bypasses this
> > auxiliary, computing the address in a scratch register then using a
> > register direct store
> >
> >     for (int pages = start_page; pages <= n_shadow_pages; pages++) {
> >       __ sub(rscratch2, sp, pages*page_size);
> >       __ str(zr, Address(rscratch2));
> >     }
> >
> > I cannot be certain but I recall we found that there was a significant
> > performance difference between these two variants -- which means we want
> > to retain the definition you have commented out. Andrew Haley (in cc)
> > may be able to provide more info here as I believe he added this
> > overriding implementation.
> >
> > I guess a comment on our overriding definition would be in order should
> > we ascertain that this was indeed the rationale.
> 
> I don't think that it matters any more.  It used to be that we were
> not very good at generating all the possible versions of move
> immediate instructions, so with the "old" code before Coleen's patch
> we generated
> 
>   0x0000007fa808e394: sub	xscratch2, sp, #0x1, lsl #12
>   0x0000007fa808e398: str	xzr, [xscratch2]
>   0x0000007fa808e39c: sub	xscratch2, sp, #0x2, lsl #12
>   0x0000007fa808e3a0: str	xzr, [xscratch2]
>   0x0000007fa808e3a4: sub	xscratch2, sp, #0x3, lsl #12
>   0x0000007fa808e3a8: str	xzr, [xscratch2]
>   0x0000007fa808e3ac: sub	xscratch2, sp, #0x4, lsl #12
>   0x0000007fa808e3b0: str	xzr, [xscratch2]
>   0x0000007fa808e3b4: sub	xscratch2, sp, #0x5, lsl #12
>   0x0000007fa808e3b8: str	xzr, [xscratch2]
>   0x0000007fa808e3bc: sub	xscratch2, sp, #0x6, lsl #12
>   0x0000007fa808e3c0: str	xzr, [xscratch2]
>   0x0000007fa808e3c4: sub	xscratch2, sp, #0x7, lsl #12
>   0x0000007fa808e3c8: str	xzr, [xscratch2]
>   0x0000007fa808e3cc: sub	xscratch2, sp, #0x8, lsl #12
>   0x0000007fa808e3d0: str	xzr, [xscratch2]
>   0x0000007fa808e3d4: sub	xscratch2, sp, #0x9, lsl #12
>   0x0000007fa808e3d8: str	xzr, [xscratch2]
> 
> and with the generic code after this patch
> 
>   0x0000007fa808e394: orr	xscratch2, xzr, #0xfffffffffffff000
>   0x0000007fa808e398: str	xzr, [sp,x9]
>   0x0000007fa808e39c: orr	xscratch2, xzr, #0xffffffffffffe000
>   0x0000007fa808e3a0: str	xzr, [sp,x9]
>   0x0000007fa808e3a4: mov	xscratch2, #0xffffffffffffd000    	// #-12288
>   0x0000007fa808e3a8: str	xzr, [sp,x9]
>   0x0000007fa808e3ac: orr	xscratch2, xzr, #0xffffffffffffc000
>   0x0000007fa808e3b0: str	xzr, [sp,x9]
>   0x0000007fa808e3b4: mov	xscratch2, #0xffffffffffffb000    	// #-20480
>   0x0000007fa808e3b8: str	xzr, [sp,x9]
>   0x0000007fa808e3bc: mov	xscratch2, #0xffffffffffffa000    	// #-24576
>   0x0000007fa808e3c0: str	xzr, [sp,x9]
>   0x0000007fa808e3c4: mov	xscratch2, #0xffffffffffff9000    	// #-28672
>   0x0000007fa808e3c8: str	xzr, [sp,x9]
>   0x0000007fa808e3cc: orr	xscratch2, xzr, #0xffffffffffff8000
>   0x0000007fa808e3d0: str	xzr, [sp,x9]
>   0x0000007fa808e3d4: mov	xscratch2, #0xffffffffffff7000    	// #-36864
>   0x0000007fa808e3d8: str	xzr, [sp,x9]
> 
> In the case of 64k pages, that's before
> 
>   0x000003ffa808e398: sub	xscratch2, sp, #0x10, lsl #12
>   0x000003ffa808e39c: str	xzr, [xscratch2]
>   0x000003ffa808e3a0: sub	xscratch2, sp, #0x20, lsl #12
>   0x000003ffa808e3a4: str	xzr, [xscratch2]
> 
> and after
> 
>   0x000003ffa808e398: orr	xscratch2, xzr, #0xffffffffffff0000
>   0x000003ffa808e39c: str	xzr, [sp,x9]
>   0x000003ffa808e3a0: mov	xscratch2, #0xfffffffffffe0000    	// #-131072
>   0x000003ffa808e3a4: str	xzr, [sp,x9]
> 
> So, it turns out we can generate all the offsets in a single
> instruction: it really doesn't matter, and I think Coleen's patch is
> just fine.
> 
> In terms of efficiency the cost of a few instructions is dwarfed by
> the fact that we dirty 9 (out of only maybe 48) TLB entries.  That's a
> huge performance effect.
> 
> Andrew.

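For reference, the immediates in the quoted "after" disassembly are simply
-pages * page_size viewed as 64-bit two's complement values. A small sketch
that reproduces a few of them follows; bang_offset_bits is a hypothetical
helper for illustration, not a HotSpot function.

```cpp
#include <cstdint>

// Bit pattern of the negative offset -(pages * page_size) as an unsigned
// 64-bit value, i.e. what ends up in the scratch register before the store.
constexpr std::uint64_t bang_offset_bits(std::int64_t pages, std::int64_t page_size) {
  return static_cast<std::uint64_t>(-(pages * page_size));
}

// Values taken from the quoted disassembly (4k pages, then 64k pages).
static_assert(bang_offset_bits(1, 4096) == 0xfffffffffffff000ULL, "first 4k page");
static_assert(bang_offset_bits(3, 4096) == 0xffffffffffffd000ULL, "-12288");
static_assert(bang_offset_bits(9, 4096) == 0xffffffffffff7000ULL, "-36864");
static_assert(bang_offset_bits(2, 65536) == 0xfffffffffffe0000ULL, "-131072");
```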
