Question about stack overflows in native code
Frederic Parain
frederic.parain at oracle.com
Tue Apr 4 18:58:45 UTC 2017
On 04/04/2017 02:31 PM, Thomas Stüfe wrote:
> Hi David,
>
> On Tue, Apr 4, 2017 at 12:11 PM, David Holmes <david.holmes at oracle.com> wrote:
>
> On 4/04/2017 6:30 PM, Thomas Stüfe wrote:
>
> Hi David,
>
> On Mon, Apr 3, 2017 at 11:02 PM, David Holmes <david.holmes at oracle.com> wrote:
>
> Just to follow up on what Fred responded ...
>
> On 4/04/2017 4:42 AM, Thomas Stüfe wrote:
>
> Hi Fred,
>
> thanks! Some more questions inline.
>
> On Mon, Apr 3, 2017 at 8:29 PM, Frederic Parain <frederic.parain at oracle.com> wrote:
>
> When the yellow zone is hit and the thread state is not in
> _thread_in_java (which means thread state is _thread_in_native or
> _thread_in_vm), the yellow zone is silently disabled and the thread
> is allowed to resume its execution.
>
>
> Disabled by whom exactly?
>
> Normally, this would be done in the signal handler, but that requires
> enough stack space to run. AFAIK jitted or interpreted code does stack
> banging in order to trigger the yellow-page-segfault at a point where
> there are enough pages left on the stack to invoke the signal handler
> (n shadow pages before), but that is not guaranteed to work with native
> C-compiled code, no?
>
>
> The stack banging is done to ensure the stack overflow is hit before we
> start doing the actual operation. The sizes of the yellow and red zones
> are supposed to be sufficient to allow the respective signal processing
> and response to be executed.
>
>
> And the size of the shadow pages should be sufficient to invoke the
> initial signal handler, which will unprotect the yellow or red zone,
> right? So, back to my original question: if native C code does not bang
> the stack but simply runs into the yellow zone, the process will simply
> die, or?
>
>
> I thought Fred already answered that. The signal handler simply
> disables the yellow zone and returns:
>
> } else {
>   // Thread was in the vm or native code. Return and try to finish.
>   thread->disable_stack_yellow_reserved_zone();
>   return 1;
> }
>
>
> But in order to do this it needs at least enough stack space to invoke
> the signal handler and call mprotect on the yellow page, right? So, for
> native code compiled by a C-compiler, this may or may not work,
> depending on whether and what form of stack-banging code the C compiler
> generates? (It may generate some sort of stack banging to trigger
> the OS guard page and do OS stack overflow handling, or it may just
> blindly run into the yellow page when pushing a new frame).
The yellow zone by itself doesn't provide protection against dying
from a stack overflow. It has been designed to work in coordination
with the stack banging. With stack banging, a thread will try to
"touch" some pages down its stack *before* it really needs them.
This way, if the yellow zone is hit during the stack banging, there's
enough remaining free stack space before the yellow zone to execute
the signal handler (which doesn't need a lot of stack space). And
if the signal handler can disable the yellow zone, then the thread
has enough stack space to perform more complex operations like
generating and throwing a StackOverflowError.
Without stack banging, the thread will use its stack space until
the yellow zone is hit, and usually when it is hit, the process
will die because there wasn't enough remaining space to execute
the signal handler.
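A toy simulation can make the two cases above concrete. It is a simplified model under stated assumptions, not HotSpot code: stack positions are byte offsets from the top of the yellow zone, every call pushes a fixed-size frame, a bang (when enabled) touches the address `bang` bytes below SP without moving SP, and `run_until_yellow`, `Outcome`, and all sizes are hypothetical.

```cpp
#include <cassert>

// Hypothetical toy model, not HotSpot code. Positions are byte offsets from
// the top of the yellow zone: >= 0 is usable stack, < 0 is inside the zone.
enum class Outcome { HandlerRuns, ProcessDies };

// usable:       free stack above the yellow zone when we start calling
// frame:        bytes pushed by every call (assumed constant)
// bang:         distance below SP touched before each push (0 = no banging)
// handler_need: stack the signal handler needs to run (assumed)
Outcome run_until_yellow(long usable, long frame, long bang, long handler_need) {
  assert(frame > 0);
  long sp = usable;
  for (;;) {
    if (bang > 0 && sp - bang < 0) {
      // The probe hit the yellow zone while SP is still above it, so the
      // handler can run if the remaining headroom is large enough.
      return sp >= handler_need ? Outcome::HandlerRuns : Outcome::ProcessDies;
    }
    sp -= frame;  // push the next frame
    if (sp < 0) {
      // The frame itself ran into the yellow zone: SP is already inside
      // the protected pages, leaving no room for the handler.
      return Outcome::ProcessDies;
    }
  }
}
```

With a 20-page bang distance and a 2-page handler, the model takes the fault while SP still has most of the bang distance as headroom, so the handler fits. With no banging, or with a bang distance shorter than what the handler needs, the model dies, matching the behavior described above.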
In Java code, stack banging is performed each time a method is
invoked, to ensure the thread has enough stack space to execute
it (the class file provides information about the maximum number
of local variables and the deepest execution stack the method
will need). Of course, with JIT-compiled code, stack banging is
performed differently because of inlining.
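The per-method idea can be pictured as simple arithmetic on the Code attribute's `max_locals` and `max_stack` fields. This sketch assumes one machine word per slot and a caller-supplied fixed overhead; `estimated_frame_bytes` and `may_enter_method` are hypothetical names, and HotSpot's real frame layout and bang distance are platform specific (and, as noted, JIT-compiled code is handled differently).

```cpp
#include <cassert>
#include <cstddef>

// Rough, hypothetical upper bound on an interpreted frame: one machine word
// per local-variable slot and per operand-stack slot, plus fixed overhead.
std::size_t estimated_frame_bytes(std::size_t max_locals, std::size_t max_stack,
                                  std::size_t overhead_bytes) {
  // max_locals and max_stack are u2 fields of the Code attribute.
  assert(max_locals <= 0xFFFF && max_stack <= 0xFFFF);
  return (max_locals + max_stack) * sizeof(void*) + overhead_bytes;
}

// Method-entry check in the spirit of per-invocation banging: the frame plus
// a reserve (e.g. room for the signal handler) must fit in the headroom.
bool may_enter_method(std::size_t headroom_bytes, std::size_t max_locals,
                      std::size_t max_stack, std::size_t overhead_bytes,
                      std::size_t reserve_bytes) {
  return headroom_bytes >=
         estimated_frame_bytes(max_locals, max_stack, overhead_bytes) + reserve_bytes;
}
```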
For code which cannot perform stack banging on method boundaries,
like the VM code, the approach is different. Each time a thread
is about to call into the VM runtime, stack banging is performed
using the StackShadowPages sizing. The shadow pages are supposed to
represent a stack space big enough to execute *any* call into the VM
runtime. So, if this stack banging passes, all the runtime code
is executed without any additional check, hoping that shadow
pages have been sized correctly.
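The "check once at the boundary, then trust the budget" contract can be modeled in a few lines. This is a hypothetical sketch, not the VM's implementation: `headroom` is the free stack above the guard zones when the call is made, `shadow_bytes` plays the role of the StackShadowPages budget, and `actual_need` is what the runtime call really uses.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of the shadow-page contract, not VM code.
enum class RuntimeCall { RejectedAtBoundary, Completed, OverflowInUncheckedCode };

RuntimeCall call_into_runtime(std::size_t headroom, std::size_t shadow_bytes,
                              std::size_t actual_need) {
  assert(shadow_bytes > 0);
  if (headroom < shadow_bytes) {
    // The bang reaches the yellow zone at the boundary, where enough stack
    // is still left for the fault to be handled.
    return RuntimeCall::RejectedAtBoundary;
  }
  // From here on there are no further checks: the call fits or it overflows.
  return actual_need <= headroom ? RuntimeCall::Completed
                                 : RuntimeCall::OverflowInUncheckedCode;
}
```

A call that needs more than the shadow budget can still complete if the thread happens to have enough headroom; once the boundary check has passed, only luck separates that from an overflow in unchecked code, which is why the shadow size has to cover the largest runtime call.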
You can try to add, in your native code, some stack banging code,
or logic that computes the remaining stack space before the
guard pages. Not necessarily on every method call, but on well
known points in your code. The hardest part is usually to know
how much stack space your native code will need. It's possible
to start with a big over-estimating value, and refine it later.
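On Linux with glibc, one way to compute the remaining stack space at such a point is to ask for the current thread's stack bounds with `pthread_getattr_np` (a GNU extension) and `pthread_attr_getstack`, then compare them with the address of a local variable. This sketch assumes a downward-growing stack; `guard_bytes` is a hypothetical parameter standing in for the VM's yellow/red zones (assumed to sit at the low end of the reported range; their real size depends on platform and VM flags), and `needed` is your own estimate for the code that follows.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // pthread_getattr_np is a GNU extension
#endif
#include <pthread.h>

#include <cassert>
#include <cstddef>
#include <cstdint>

// Approximate bytes between the current stack position and the lowest
// address of this thread's stack; 0 if the bounds cannot be obtained.
// Assumes the stack grows downward (e.g. x86-64 or AArch64 Linux).
std::size_t remaining_stack_bytes() {
  pthread_attr_t attr;
  if (pthread_getattr_np(pthread_self(), &attr) != 0) return 0;
  void* low = nullptr;
  std::size_t size = 0;
  int rc = pthread_attr_getstack(&attr, &low, &size);
  pthread_attr_destroy(&attr);
  if (rc != 0) return 0;
  assert(low != nullptr && size > 0);
  char marker = 0;  // the address of a local approximates the stack pointer
  auto here = reinterpret_cast<std::uintptr_t>(&marker);
  auto bottom = reinterpret_cast<std::uintptr_t>(low);
  return here > bottom ? here - bottom : 0;
}

// guard_bytes: hypothetical stand-in for the VM's yellow/red zones.
// needed: your own (over-)estimate for the code about to run.
bool has_stack_headroom(std::size_t needed, std::size_t guard_bytes) {
  std::size_t left = remaining_stack_bytes();
  return left > guard_bytes && left - guard_bytes >= needed;
}
```

Native code could test `has_stack_headroom` before entering, for example, a deep recursion, and return an error (or raise a Java exception through JNI's `ThrowNew`) instead of descending further. On older glibc versions the program must be linked with `-pthread`.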
The sizing of the different zones has been determined with a
trial-and-error process that still continues today as the
JVM code and native JDK code evolve.
Fred
> If stack space is not sufficient to invoke the signal handler to
> unprotect the yellow/red page, process would silently die, right?
Correct.
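When the handler does get to run, the choices described in the quoted messages (yellow zone while in Java, yellow zone while in the VM or native code, red zone in any state) can be summarized as a small decision function. It is a simplified model of what this thread describes, not HotSpot's signal-handler code; the enum and function names are hypothetical.

```cpp
#include <cassert>

// Simplified, hypothetical model of the handler's choice once it has managed
// to run; not HotSpot's signal-handler code.
enum class Zone { Yellow, Red };
enum class ThreadState { InJava, InVm, InNative };
enum class Action {
  ThrowStackOverflowError,   // disable the yellow zone, throw in Java code
  DisableYellowAndResume,    // "Return and try to finish."
  DisableRedAndReportAndDie  // then VMError::report_and_die(), which should
                             // produce an hs_err file
};

Action on_guard_zone_fault(Zone zone, ThreadState state) {
  if (zone == Zone::Red) {
    return Action::DisableRedAndReportAndDie;  // whatever the thread state
  }
  return state == ThreadState::InJava ? Action::ThrowStackOverflowError
                                      : Action::DisableYellowAndResume;
}
```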
>
> If it keeps going and hits the red zone then the red zone will be
> disabled, we print some error messages, and then should call
> VMError::report_and_die(). But I admit the signal handler logic is
> quite complex so I may have missed something. :)
>
>
>
> But that assumes you simply advance into the guard zones - if your
> native code suddenly jumped to the end of the yellow zone, for example,
> then signal processing would hit the red zone; similarly, if you jump to
> the end of the red zone, then signal processing will hit the OS guard
> page. If you jump past all guard pages, you simply die.
>
>
> Thank you!
>
> See also my response to Fred. We wondered whether exporting a simple
> JNI helper function to check the stack size on behalf of the native code
> would be something helpful, for cooperative native code at least.
>
>
> Perhaps. Haven't really thought about it. :)
>
>
> We may experiment a bit. The VM silently dying on native code stack
> overflows is a huge annoyance, especially since it depends on the
> user-adjustable stack size. Typically not even a hs_err file is generated.
>
> Actually not a theoretical problem, I am currently running into this:
> http://www-01.ibm.com/support/docview.wss?uid=swg1IV23033 for our
> commercial code base at a customer (not j9 obviously), and while the
> recursion in the vector calculations can be fixed, it would be nice to
> at least have an hs_err file...
>
> Cheers,
> David
>
>
> Kind Regards, Thomas
>
>
> Kind Regards, Thomas
>
>
> David
>
>
> (not just a theory, we have a test case here where a stack overflow in
> native code just silently kills the process.)
>
> I guess it may work accidentally if the C-compiled code itself does
> some form of stack banging when establishing frames, in order to detect
> OS stack overflows? Very fuzzy here. But whatever the C-compiled code
> does, it has no notion of how much space we need to invoke the signal
> handler and handle stack overflows, no?
>
> When the red zone is hit, whatever the current thread state is, the red
> zone is disabled and VMError::report_and_die() is called, which should
> generate an hs_err file unless the generation of the error file requires
> more memory than the red zone provides.
>
> Fred
>
>
> Thanks, Thomas
>
>
>
>
> On 04/03/2017 02:08 PM, Thomas Stüfe wrote:
>
> Hi,
>
> Today we wondered what would happen when a stack overflow occurs in
> native code running in a Java thread (an attached thread or one created
> by the VM).
>
> In that case yellow and red pages are in place, but this would not help
> much, would it, because the native code would not do any stack banging?
>
> So, native code would hit the yellow page, and then there would
> probably not be enough space left on the stack to invoke the signal
> handler. The result would be immediate VM death - not even an hs-err
> file - is that correct?
>
> Also, we would hit our own yellow page, not the guard page the OS may
> or may not have established, so - on UNIX - this would show up as
> "Segmentation Fault", not "Stack Overflow", or?
>
> Thank you,
>
> Thomas
>
>
>
>
More information about the hotspot-runtime-dev mailing list