Question about stack overflows in native code

Wed Apr 5 01:16:43 UTC 2017

On Wed, 5 Apr 2017 at 01:07, David Holmes <david.holmes at oracle.com> wrote:

> Thanks for the correction Fred. I hadn't appreciated the significance of
> the stack-banging. I thought it was only to allow for the exception to
> be thrown before you commenced any action that would then be left in an
> inconsistent state. But of course if you exhaust your stack right up to
> the yellow page such that it faults, then you can't even push anything
> needed to invoke the signal handler! Pity there isn't a page protection
> mode that resets once the fault is triggered.

I think there is on Windows: it has one-shot guard pages, which unprotect
themselves when triggered. But I do not think we use them.

Thomas

> Thanks again,
> David
>
> On 5/04/2017 4:58 AM, Frederic Parain wrote:
> >
> >
> > On 04/04/2017 02:31 PM, Thomas Stüfe wrote:
> >> Hi David,
> >>
> >> On Tue, Apr 4, 2017 at 12:11 PM, David Holmes <david.holmes at oracle.com>
> >> wrote:
> >>
> >>     On 4/04/2017 6:30 PM, Thomas Stüfe wrote:
> >>
> >>         Hi David,
> >>
> >>         On Mon, Apr 3, 2017 at 11:02 PM, David Holmes
> >>         <david.holmes at oracle.com> wrote:
> >>
> >>             Just to follow up on what Fred responded ...
> >>
> >>             On 4/04/2017 4:42 AM, Thomas Stüfe wrote:
> >>
> >>                 Hi Fred,
> >>
> >>                 thanks! Some more questions inline.
> >>
> >>                 On Mon, Apr 3, 2017 at 8:29 PM, Frederic Parain
> >>                 <frederic.parain at oracle.com> wrote:
> >>
> >>                     When the yellow zone is hit and the thread state is not
> >>                     _thread_in_java (which means the thread state is
> >>                     _thread_in_native or _thread_in_vm), the yellow zone is
> >>                     silently disabled and the thread is allowed to resume
> >>                     its execution.
> >>
> >>
> >>                 Disabled by whom exactly?
> >>
> >>                 Normally, this would be done in the signal handler, but that
> >>                 requires enough stack space to run. AFAIK jitted or
> >>                 interpreted code does stack banging in order to trigger the
> >>                 yellow-page segfault at a point where there are enough pages
> >>                 left on the stack to invoke the signal handler (n shadow
> >>                 pages before), but that is not guaranteed to work with
> >>                 native C-compiled code, no?
> >>
> >>
> >>             The stack banging is done to ensure the stack overflow is hit
> >>             before we start doing the actual operation. The sizes of the
> >>             yellow and red zones are supposed to be sufficient to allow the
> >>             respective signal processing and response to be executed.
> >>
> >>
> >>         And the size of the shadow pages should be sufficient to invoke the
> >>         initial signal handler, which will unprotect the yellow or red
> >>         zone, right? So, back to my original question: if native C code
> >>         does not bang the stack but simply runs into the yellow zone, the
> >>         process will simply die, right?
> >>
> >>
> >>     I thought Fred already answered that. The signal handler simply
> >>     disables the yellow zone and returns:
> >>
> >>               } else {
> >>                 // Thread was in the vm or native code.  Return and try
> >>                 // to finish.
> >>                 thread->disable_stack_yellow_reserved_zone();
> >>                 return 1;
> >>               }
> >>
> >>
> >> But in order to do this it needs at least enough stack space to invoke
> >> the signal handler and call mprotect on the yellow page, right? So, for
> >> native code compiled by a C compiler, this may or may not work,
> >> depending on whether, and in what form, the C compiler generates
> >> stack-banging code? (It may generate some sort of stack banging to trigger
> >> the OS guard page and do OS stack overflow handling, or it may just
> >> blindly run into the yellow page when pushing a new frame.)
> >
> > The yellow zone by itself doesn't provide protection against dying
> > from a stack overflow. It has been designed to work in coordination
> > with the stack banging. With stack banging, a thread will try to
> > "touch" some pages down its stack *before* it really needs them.
> > This way, if the yellow zone is hit during the stack banging, there's
> > enough remaining free stack space before the yellow zone to execute
> > the signal handler (which doesn't need a lot of stack space). And
> > if the signal handler can disable the yellow zone, then the thread
> > has enough stack space to perform more complex operations like
> > generating and throwing a StackOverflowError.
> >
> > Without stack banging, the thread will use its stack space until
> > the yellow zone is hit, and usually when it is hit, the process
> > will die because there wasn't enough remaining space to execute
> > the signal handler.
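
To make the mechanism Fred describes concrete, here is a minimal sketch of
stack banging in C. It is an illustration only, not HotSpot's code: the helper
name bang_stack and the page counts are hypothetical, it assumes a POSIX system
with a downward-growing stack, and writing below the current frame is outside
what ISO C defines (the VM does the equivalent in generated machine code).

```c
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Hypothetical helper: touch one byte in each of the next `pages` pages
 * below the current frame, nearest page first, WITHOUT moving the stack
 * pointer. If one of those pages is protected, the fault is raised while
 * the stack pointer is still above it. Returns the number of pages touched.
 *
 * Assumptions: the stack grows toward lower addresses, and nothing live is
 * stored a full page or more below this frame.
 */
static size_t bang_stack(size_t pages)
{
    volatile char anchor = 0;                 /* lives in the current frame */
    long ps = sysconf(_SC_PAGESIZE);
    size_t page = ps > 0 ? (size_t)ps : 4096; /* fall back to a common size */
    uintptr_t here = (uintptr_t)&anchor;
    size_t touched = 0;

    for (size_t i = 1; i <= pages; i++) {
        volatile char *probe = (volatile char *)(here - i * page);
        *probe = 0;                           /* faults here if protected */
        touched++;
    }
    return touched;
}
```

Calling bang_stack(n) at a known entry point means that a fault, if there is
one, happens while the stack pointer is still above the protected zone, which
is what leaves room for the signal handler to run.
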
> >
> > In Java code, stack banging is performed each time a method is
> > invoked, to ensure the thread has enough stack space to execute
> > it (the class file provides information about the maximum number
> > of local variables and the deepest execution stack the method
> > will need). Of course, with JIT-compiled code, stack banging is
> > performed differently because of inlining.
> >
> > For code which cannot perform stack banging on method boundaries,
> > like the VM code, the approach is different. Each time a thread
> > is about to call into the VM runtime, a stack banging is performed
> > using the StackShadowPages sizing. The shadow pages are supposed to
> > represent a stack space big enough to execute *any* call to the VM
> > runtime. So, if this stack banging passes, all the runtime code
> > is executed without any additional check, hoping that the shadow
> > pages have been sized correctly.
> >
> > You can try to add, in your native code, some stack banging code,
> > or logic that computes the remaining stack space before the
> > guard pages. Not necessarily on every method call, but at
> > well-known points in your code. The hardest part is usually to know
> > how much stack space your native code will need. It's possible
> > to start with a generous overestimate, and refine it later.
> > The sizing of the different zones has been determined by a
> > trial-and-error process which still continues today as the
> > JVM code and native JDK code evolve.
> >
> > Fred
> >
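
As an illustration of the second option Fred mentions (computing the remaining
stack space), here is a minimal sketch in C. It assumes Linux with glibc, where
pthread_getattr_np() reports the calling thread's stack range, and a
downward-growing stack. The names remaining_stack_bytes, enough_stack_for and
NATIVE_STACK_MARGIN are hypothetical, and the margin value is a guess: the
reported range may include the VM's guard zones, so the margin has to cover
them as well as the space the signal handler needs.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* for pthread_getattr_np() on glibc */
#endif
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical safety margin; must be tuned by trial and error. */
#define NATIVE_STACK_MARGIN (256u * 1024u)

/*
 * Hypothetical helper: bytes between the current frame and the lowest
 * address of the calling thread's stack. Returns 0 if the stack range
 * cannot be determined.
 */
static size_t remaining_stack_bytes(void)
{
    pthread_attr_t attr;
    void *stack_low = NULL;     /* lowest address of the stack range */
    size_t stack_size = 0;
    size_t remaining = 0;
    volatile char marker = 0;   /* its address approximates the stack pointer */

    if (pthread_getattr_np(pthread_self(), &attr) != 0)
        return 0;

    if (pthread_attr_getstack(&attr, &stack_low, &stack_size) == 0) {
        uintptr_t here = (uintptr_t)&marker;
        uintptr_t low = (uintptr_t)stack_low;
        if (here > low && here <= low + stack_size)
            remaining = (size_t)(here - low);
    }
    pthread_attr_destroy(&attr);
    return remaining;
}

/* Returns nonzero if `needed` bytes plus the margin still fit on the stack. */
static int enough_stack_for(size_t needed)
{
    return remaining_stack_bytes() > needed + NATIVE_STACK_MARGIN;
}
```

Cooperative native code could call enough_stack_for() at a few well-known
points, for example before entering a deep recursion, and fail gracefully
instead of running into the yellow zone.
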
> >> If stack space is not sufficient to invoke the signal handler to
> >> unprotect the yellow/red page, process would silently die, right?
> >
> > Correct.
> >
> >>
> >>     If it keeps going and hits the red zone then the red zone will be
> >>     disabled, we print some error messages, and then should call
> >>     VMError::report_and_die(). But I admit the signal handler logic is
> >>     quite complex so I may have missed something. :)
> >>
> >>
> >>
> >>             But that assumes you simply advance into the guard zones. If
> >>             your native code suddenly jumped to the end of the yellow zone,
> >>             for example, then signal processing would hit the red zone;
> >>             similarly, if you jump to the end of the red zone, then signal
> >>             processing will hit the OS guard page. If you jump past all
> >>             guard pages, you simply die.
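
The zone ordering David describes can be modeled with a little address
arithmetic. The following is a sketch only: the struct, the enum and the sizes
are hypothetical, and the layout (OS guard page lowest, then red, then yellow,
then usable stack) is taken from the description above rather than from the VM
sources.

```c
#include <stddef.h>
#include <stdint.h>

enum zone { ZONE_OUTSIDE, ZONE_OS_GUARD, ZONE_RED, ZONE_YELLOW, ZONE_USABLE };

/* Hypothetical model of one thread stack, lowest address first. */
struct stack_layout {
    uintptr_t low;       /* lowest address, start of the OS guard page */
    size_t os_guard;     /* size of the OS guard page(s) */
    size_t red;          /* size of the red zone */
    size_t yellow;       /* size of the yellow zone */
    size_t usable;       /* size of the normally usable stack above them */
};

/* Classify an address: which zone would an access to it hit? */
static enum zone classify(const struct stack_layout *s, uintptr_t addr)
{
    uintptr_t guard_end = s->low + s->os_guard;
    uintptr_t red_end = guard_end + s->red;
    uintptr_t yellow_end = red_end + s->yellow;
    uintptr_t top = yellow_end + s->usable;

    if (addr < s->low || addr >= top)
        return ZONE_OUTSIDE;    /* jumped past everything */
    if (addr < guard_end)
        return ZONE_OS_GUARD;
    if (addr < red_end)
        return ZONE_RED;
    if (addr < yellow_end)
        return ZONE_YELLOW;
    return ZONE_USABLE;
}
```

A frame that lands near the low end of the yellow zone leaves the signal
handler to push its own frames into addresses that classify() reports as
ZONE_RED, which is the failure mode described above.
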
> >>
> >>
> >>         Thank you!
> >>
> >>         See also my response to Fred. We wondered whether exporting a
> >>         simple JNI helper function to check the stack size on behalf of
> >>         the native code would be helpful, for cooperative native code at
> >>         least.
> >>
> >>
> >>     Perhaps. Haven't really thought about it. :)
> >>
> >>
> >> We may experiment a bit. The VM silently dying on native code stack
> >> overflows is a huge annoyance, especially since it depends on the
> >> user-adjustable stack size. Typically not even a hs_err file is
> >> generated.
> >>
> >> Actually not a theoretical problem, I am currently running into this:
> >> http://www-01.ibm.com/support/docview.wss?uid=swg1IV23033 for our
> >> commercial code base at a customer (not j9 obviously), and while the
> >> recursion in the vector calculations can be fixed, it would be nice to
> >> at least have an hs_err file...
> >>
> >>     Cheers,
> >>     David
> >>
> >>
> >> Kind Regards, Thomas
> >>
> >>
> >>         Kind Regards, Thomas
> >>
> >>
> >>             David
> >>
> >>
> >>                 (Not just a theory: we have a test case here where a stack
> >>                 overflow in native code just silently kills the process.)
> >>
> >>                 I guess it may work accidentally if the C-compiled code
> >>                 itself does some form of stack banging when establishing
> >>                 frames, in order to detect OS stack overflows? Very fuzzy
> >>                 here. But whatever the C-compiled code does, it has no
> >>                 notion of how much space we need to invoke the signal
> >>                 handler and handle stack overflows, no?
> >>
> >>                     When the red zone is hit, whatever the current thread
> >>                     state is, the red zone is disabled and
> >>                     VMError::report_and_die() is called, which should
> >>                     generate an hs_err file unless the generation of the
> >>                     error file requires more memory than the red zone
> >>                     provides.
> >>
> >>                     Fred
> >>
> >>
> >>                 Thanks, Thomas
> >>
> >>
> >>
> >>
> >>                     On 04/03/2017 02:08 PM, Thomas Stüfe wrote:
> >>
> >>                         Hi,
> >>
> >>                         Today we wondered what would happen when a stack
> >>                         overflow occurs in native code running in a Java
> >>                         thread (an attached thread or one created by the
> >>                         VM).
> >>
> >>                         In that case yellow and red pages are in place, but
> >>                         this would not help much, would it, because the
> >>                         native code would not do any stack banging?
> >>
> >>                         So, native code would hit the yellow page, and then
> >>                         there would probably not be enough space left on
> >>                         the stack to invoke the signal handler. The result
> >>                         would be immediate VM death, without even an hs_err
> >>                         file. Is that correct?
> >>
> >>                         Also, we would hit our own yellow page, not the
> >>                         guard page the OS may or may not have established,
> >>                         so on UNIX this would show up as "Segmentation
> >>                         Fault", not "Stack Overflow", right?
> >>
> >>                         Thank you,
> >>
> >>                         Thomas
> >>
> >>
> >>
> >>
>