Question about stack overflows in native code
Thomas Stüfe
thomas.stuefe at gmail.com
Wed Apr 5 01:16:43 UTC 2017
On Wed, 5 Apr 2017 at 01:07, David Holmes <david.holmes at oracle.com> wrote:
> Thanks for the correction Fred. I hadn't appreciated the significance of
> the stack-banging. I thought it was only to allow for the exception to
> be thrown before you commenced any action that would then be left in an
> inconsistent state. But of course if you exhaust your stack right up to
> the yellow page such that it faults, then you can't even push anything
> needed to invoke the signal handler! Pity there isn't a page protection
> mode that resets once the fault is triggered.

I think there is on Windows: they have one-shot guard pages (PAGE_GUARD),
which unprotect themselves when triggered. But I do not think we use them.

Thanks again,
Thomas

> David
>
> On 5/04/2017 4:58 AM, Frederic Parain wrote:
> >
> >
> > On 04/04/2017 02:31 PM, Thomas Stüfe wrote:
> >> Hi David,
> >>
> >> On Tue, Apr 4, 2017 at 12:11 PM, David Holmes <david.holmes at oracle.com> wrote:
> >>
> >> On 4/04/2017 6:30 PM, Thomas Stüfe wrote:
> >>
> >> Hi David,
> >>
> >> On Mon, Apr 3, 2017 at 11:02 PM, David Holmes
> >> <david.holmes at oracle.com> wrote:
> >>
> >> Just to follow up on what Fred responded ...
> >>
> >> On 4/04/2017 4:42 AM, Thomas Stüfe wrote:
> >>
> >> Hi Fred,
> >>
> >> thanks! Some more questions inline.
> >>
> >> On Mon, Apr 3, 2017 at 8:29 PM, Frederic Parain
> >> <frederic.parain at oracle.com> wrote:
> >>
> >> When the yellow zone is hit and the thread state is not
> >> _thread_in_java (which means the thread state is _thread_in_native
> >> or _thread_in_vm), the yellow zone is silently disabled and the
> >> thread is allowed to resume its execution.
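The decision Fred describes can be modeled as a small function. This is a simplified sketch, not HotSpot's signal handler: the names `ThreadState`, `GuardZone`, `Action` and `on_guard_fault` are hypothetical, and the red-zone branch follows Fred's later statement that `VMError::report_and_die()` is called whatever the thread state is.

```cpp
#include <cassert>

// Hypothetical names; HotSpot's states are _thread_in_java, _thread_in_native, _thread_in_vm.
enum class ThreadState { InJava, InNative, InVm };
enum class GuardZone { Yellow, Red };

enum class Action {
  ThrowStackOverflowError,  // unprotect the yellow zone, then throw StackOverflowError
  DisableYellowAndResume,   // unprotect the yellow zone and return from the handler
  ReportAndDie              // unprotect the red zone, write hs_err, terminate
};

// Models only what happens once a guard-zone fault has been recognized.
// Whether the handler has enough stack left to get this far is a separate issue.
Action on_guard_fault(ThreadState state, GuardZone zone) {
  if (zone == GuardZone::Red) {
    return Action::ReportAndDie;  // independent of the thread state
  }
  if (state == ThreadState::InJava) {
    return Action::ThrowStackOverflowError;
  }
  // Native or VM code: the fault is silently absorbed and the thread continues.
  return Action::DisableYellowAndResume;
}
```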
> >>
> >>
> >> Disabled by whom exactly?
> >>
> >> Normally, this would be done in the signal handler, but that
> >> requires enough stack space to run. AFAIK jitted or interpreted
> >> code does stack banging in order to trigger the yellow-page
> >> segfault at a point where there are enough pages left on the stack
> >> to invoke the signal handler (n shadow pages before), but that is
> >> not guaranteed to work with native C-compiled code, no?
> >>
> >>
> >> The stack banging is done to ensure the stack overflow is hit
> >> before we start doing the actual operation. The sizes of the yellow
> >> and red zones are supposed to be sufficient to allow the respective
> >> signal processing and response to be executed.
> >>
> >>
> >> And the size of the shadow pages should be sufficient to invoke the
> >> initial signal handler, which will unprotect the yellow or red zone,
> >> right? So, back to my original question: if native C code does not
> >> bang the stack but simply runs into the yellow zone, the process will
> >> simply die, or?
> >>
> >>
> >> I thought Fred already answered that. The signal handler simply
> >> disables the yellow zone and returns:
> >>
> >>   } else {
> >>     // Thread was in the vm or native code. Return and try to finish.
> >>     thread->disable_stack_yellow_reserved_zone();
> >>     return 1;
> >>   }
> >>
> >>
> >> But in order to do this it needs at least enough stack space to invoke
> >> the signal handler and call mprotect on the yellow page, right? So, for
> >> native code compiled by a C compiler, this may or may not work,
> >> depending on whether the compiler generates stack-banging code, and in
> >> what form. (It may generate some sort of stack banging to trigger the
> >> OS guard page and do OS stack overflow handling, or it may just blindly
> >> run into the yellow page when pushing a new frame.)
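The unprotect step is an ordinary `mprotect(2)` call. The sketch below assumes Linux (or another POSIX system with `MAP_ANONYMOUS`) and uses one anonymous page as a stand-in for the yellow zone; `GuardPage`, `arm_guard_page`, `disarm_guard_page` and `rearm_guard_page` are hypothetical names, not HotSpot functions. Each of these calls itself needs a stack frame, which is exactly why the handler cannot get this far once the stack is fully exhausted.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical stand-in for a guard zone: one anonymous page.
struct GuardPage {
  void* base = nullptr;
  std::size_t size = 0;
};

// Map one page with no access rights, like an armed yellow page.
bool arm_guard_page(GuardPage& g) {
  const long ps = sysconf(_SC_PAGESIZE);
  if (ps <= 0) return false;
  g.size = static_cast<std::size_t>(ps);
  void* p = mmap(nullptr, g.size, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED) return false;
  g.base = p;
  return true;
}

// "Disable" the guard, as the handler does for the yellow zone:
// make the page readable and writable so execution can continue into it.
bool disarm_guard_page(const GuardPage& g) {
  assert(g.base != nullptr);
  return mprotect(g.base, g.size, PROT_READ | PROT_WRITE) == 0;
}

// Re-arm the guard once the extra space is no longer needed.
bool rearm_guard_page(const GuardPage& g) {
  assert(g.base != nullptr);
  return mprotect(g.base, g.size, PROT_NONE) == 0;
}
```

In a VM the page would sit at the low end of the thread's own stack and would presumably be re-armed only after the thread has unwound far enough; the demo only shows the system calls involved.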
> >
> > The yellow zone by itself doesn't provide protection against dying
> > from a stack overflow. It has been designed to work in coordination
> > with the stack banging. With stack banging, a thread will try to
> > "touch" some pages down its stack *before* it really needs them.
> > This way, if the yellow zone is hit during the stack banging, there's
> > enough remaining free stack space before the yellow zone to execute
> > the signal handler (which doesn't need a lot of stack space). And
> > if the signal handler can disable the yellow zone, then the thread
> > has enough stack space to perform more complex operations like
> > generating and throwing a StackOverflowError.
> >
> > Without stack banging, the thread will use its stack space until
> > the yellow zone is hit, and usually when it is hit, the process
> > will die because there wasn't enough remaining space to execute
> > the signal handler.
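Stack banging, as described above, means touching every page the upcoming code may use before actually using it, so that a guard-zone fault happens while slack remains for the handler. This sketch only computes which addresses a banging prologue would probe; `bang_addresses` is a hypothetical helper, and the probing store itself (emitted as machine code by the VM) is left out because writing below the stack pointer is not expressible in portable C++.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Addresses a banging prologue would touch, nearest first, before using up to
// `needed_bytes` of stack below `sp` (stack grows toward lower addresses).
std::vector<std::uintptr_t> bang_addresses(std::uintptr_t sp,
                                           std::size_t needed_bytes,
                                           std::size_t page_size) {
  assert(page_size > 0);
  assert(sp >= needed_bytes);
  std::vector<std::uintptr_t> probes;
  // One probe per page: a guard page anywhere in the range cannot be skipped.
  for (std::size_t off = page_size; off <= needed_bytes; off += page_size) {
    probes.push_back(sp - off);
  }
  // Cover the deepest byte if needed_bytes is not a multiple of the page size.
  if (needed_bytes % page_size != 0) {
    probes.push_back(sp - needed_bytes);
  }
  return probes;
}
```

Probing one address per page is what keeps a large frame from jumping over the guard zones entirely.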
> >
> > In Java code, stack banging is performed each time a method is
> > invoked, to ensure the thread has enough stack space to execute
> > it (the class file provides information about the maximum number
> > of local variables and the deepest execution stack the method
> > will need). Of course, with JIT-compiled code, stack banging is
> > performed differently because of inlining.
> >
> > For code which cannot perform stack banging on method boundaries,
> > like the VM code, the approach is different. Each time a thread
> > is about to call into the VM runtime, a stack banging is performed
> > using the StackShadowPages sizing. The shadow pages are supposed to
> > represent a stack space big enough to execute *any* call to the VM
> > runtime. So, if this stack banging passes, all the runtime code
> > is executed without any additional check, hoping that shadow
> > pages have been sized correctly.
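The VM-entry rule can be phrased as a comparison. In HotSpot the check is done by banging the shadow pages and letting the guard zones fault; this sketch substitutes an explicit comparison instead. `StackLimits` and `may_enter_vm` are hypothetical names, and `shadow_pages` only plays the role of the StackShadowPages setting.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical limits for one thread; the stack grows toward lower addresses.
struct StackLimits {
  std::uintptr_t guard_zone_top;  // highest address covered by the guard zones
  std::size_t page_size;
  std::size_t shadow_pages;       // plays the role of StackShadowPages
};

// True if a shadow-sized region below sp is still clear of the guard zones,
// i.e. the call into the runtime may proceed without any further checks.
bool may_enter_vm(std::uintptr_t sp, const StackLimits& lim) {
  assert(lim.page_size > 0);
  const std::uintptr_t shadow_bytes = lim.shadow_pages * lim.page_size;
  if (sp <= lim.guard_zone_top) {
    return false;  // already at or inside the guard zones
  }
  return sp - lim.guard_zone_top >= shadow_bytes;
}
```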
> >
> > You can try to add, in your native code, some stack banging code,
> > or logic computing the remaining stack space before the
> > guard pages. Not necessarily on every method call, but at well-known
> > points in your code. The hardest part is usually to know
> > how much stack space your native code will need. It's possible
> > to start with a big over-estimating value, and refine it later.
> > The sizing of the different zones has been determined with a
> > trial-and-error process which still continues today as the
> > JVM code and native JDK code evolve.
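Computing the remaining stack space at well-known points could look like this on Linux with glibc, where `pthread_getattr_np` (a GNU extension) reports the calling thread's stack bounds. `remaining_stack_bytes` and `enough_stack` are hypothetical helpers. The `guard_reserve` parameter is an assumption standing in for the JVM's yellow/red zones, which this sketch assumes sit at the low end of the reported range; choosing its value is exactly the sizing problem described above.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // pthread_getattr_np is a GNU extension
#endif
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <pthread.h>

// Bytes between the current stack position and the lowest address of the
// calling thread's stack, or -1 if the bounds cannot be determined.
long long remaining_stack_bytes() {
  pthread_attr_t attr;
  if (pthread_getattr_np(pthread_self(), &attr) != 0) return -1;
  void* stack_lo = nullptr;  // lowest address of the stack
  std::size_t stack_size = 0;
  const int rc = pthread_attr_getstack(&attr, &stack_lo, &stack_size);
  pthread_attr_destroy(&attr);
  if (rc != 0) return -1;
  volatile char marker = 0;  // its address approximates the current stack position
  const auto here = reinterpret_cast<std::uintptr_t>(&marker);
  const auto lo = reinterpret_cast<std::uintptr_t>(stack_lo);
  if (here < lo) return -1;
  return static_cast<long long>(here - lo);
}

// Check for a well-known point: is there room for `needed` more bytes plus a
// reserve covering the JVM guard zones? Both sizes are caller estimates.
bool enough_stack(std::size_t needed, std::size_t guard_reserve) {
  const long long left = remaining_stack_bytes();
  if (left < 0) return false;  // unknown bounds: be conservative
  return static_cast<unsigned long long>(left) >= needed + guard_reserve;
}
```

A recursive routine could call `enough_stack(estimate, reserve)` every few levels and return an error instead of descending further, starting with a generous estimate and refining it later.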
> >
> > Fred
> >
> >> If stack space is not sufficient to invoke the signal handler to
> >> unprotect the yellow/red page, the process would silently die, right?
> >
> > Correct.
> >
> >>
> >> If it keeps going and hits the red zone then the red zone will be
> >> disabled, we print some error messages, and then should call
> >> VMError::report_and_die(). But I admit the signal handler logic is
> >> quite complex so I may have missed something. :)
> >>
> >>
> >>
> >> But that assumes you simply advance into the guard zones - if your
> >> native code suddenly jumped to the end of the yellow zone, for
> >> example, then signal processing would hit the red zone; similarly,
> >> if you jump to the end of the red zone then signal processing will
> >> hit the OS guard page. If you jump past all guard pages you simply
> >> die.
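The "jumping" cases can be made concrete with a simplified layout model: the red zone sits directly above the lowest usable stack address, the yellow zone above it, and anything below the stack end is the OS guard page or unmapped memory. The names (`GuardLayout`, `FaultZone`, `classify`) and sizes are illustrative assumptions, not HotSpot's layout code, and the reserved zone is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Simplified guard-zone layout at the low end of a thread stack (grows down).
struct GuardLayout {
  std::uintptr_t stack_end;   // lowest usable address of the thread stack
  std::size_t page_size;
  std::size_t red_pages;      // directly above stack_end
  std::size_t yellow_pages;   // directly above the red zone
};

enum class FaultZone { Usable, Yellow, Red, BelowStack };

// Which zone an access at `addr` falls into.
FaultZone classify(std::uintptr_t addr, const GuardLayout& l) {
  assert(l.page_size > 0);
  const std::uintptr_t red_top = l.stack_end + l.red_pages * l.page_size;
  const std::uintptr_t yellow_top = red_top + l.yellow_pages * l.page_size;
  if (addr < l.stack_end) return FaultZone::BelowStack;  // OS guard page or beyond
  if (addr < red_top) return FaultZone::Red;
  if (addr < yellow_top) return FaultZone::Yellow;
  return FaultZone::Usable;
}
```

If a frame's first access lands at the bottom of the yellow zone, the handler's own frame starts roughly at `fault - handler_bytes`, and `classify` on that address reports the red zone: the cascade described above.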
> >>
> >>
> >> Thank you!
> >>
> >> See also my response to Fred. We wondered whether exporting a
> >> simple JNI helper function to check the stack size on behalf of the
> >> native code would be helpful, at least for cooperative native code.
> >>
> >>
> >> Perhaps. Haven't really thought about it. :)
> >>
> >>
> >> We may experiment a bit. The VM silently dying on native code stack
> >> overflows is a huge annoyance, especially since it depends on the
> >> user-adjustable stack size. Typically not even an hs_err file is
> >> generated.
> >>
> >> This is actually not a theoretical problem; I am currently running into this:
> >> http://www-01.ibm.com/support/docview.wss?uid=swg1IV23033 for our
> >> commercial code base at a customer (not J9, obviously), and while the
> >> recursion in the vector calculations can be fixed, it would be nice to
> >> at least have an hs_err file...
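One common way to fix such recursion is to replace native recursion with an explicit work list on the heap, so depth is bounded by heap memory rather than by the thread stack size. The sketch uses a hypothetical index-based tree (`Node`, `sum_tree`), not the code from the report above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Tree nodes stored in a flat array; children are indices, -1 means none.
struct Node {
  std::int64_t value;
  long left;
  long right;
};

// Sums the tree rooted at `root` without native recursion: pending nodes live
// on a heap-allocated std::vector instead of on the thread stack.
std::int64_t sum_tree(const std::vector<Node>& nodes, long root) {
  std::int64_t total = 0;
  std::vector<long> pending;
  if (root >= 0) pending.push_back(root);
  while (!pending.empty()) {
    const long i = pending.back();
    pending.pop_back();
    const Node& n = nodes[static_cast<std::size_t>(i)];
    total += n.value;
    if (n.left >= 0) pending.push_back(n.left);
    if (n.right >= 0) pending.push_back(n.right);
  }
  return total;
}
```

Storing children as indices also sidesteps a second trap: destroying a deep chain of owning pointers (e.g. nested `std::unique_ptr`) recurses too and can overflow the stack on its own. A degenerate chain of a million nodes, which would likely overflow a default-sized thread stack if walked recursively, is handled by the same loop.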
> >>
> >> Cheers,
> >> David
> >>
> >>
> >> Kind Regards, Thomas
> >>
> >>
> >>
> >> David
> >>
> >>
> >> (Not just a theory: we have a test case here where a stack overflow
> >> in native code just silently kills the process.)
> >>
> >> I guess it may work accidentally if the C-compiled code itself does
> >> some form of stack banging when establishing frames, in order to
> >> detect OS stack overflows? Very fuzzy here. But whatever the
> >> C-compiled code does, it has no notion of how much space we need to
> >> invoke the signal handler and handle stack overflows, no?
> >>
> >> When the red zone is hit, whatever the current thread state is, the
> >> red zone is disabled and VMError::report_and_die() is called, which
> >> should generate an hs_err file unless the generation of the error
> >> file requires more stack space than the red zone provides.
> >>
> >> Fred
> >>
> >>
> >> Thanks, Thomas
> >>
> >>
> >>
> >>
> >> On 04/03/2017 02:08 PM, Thomas Stüfe wrote:
> >>
> >> Hi,
> >>
> >> Today we wondered what would happen when a stack overflow occurs in
> >> native code running in a Java thread (an attached thread or one
> >> created by the VM).
> >>
> >> In that case the yellow and red pages are in place, but they would
> >> not help much, would they, because the native code does not do any
> >> stack banging?
> >>
> >> So, native code would hit the yellow page, and then there would
> >> probably not be enough space left on the stack to invoke the signal
> >> handler. The result would be immediate VM death - not even an hs_err
> >> file - is that correct?
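A different mitigation, not discussed in this thread and not claimed here to be what HotSpot does, is POSIX `sigaltstack`: the handler runs on a separately allocated stack, so an exhausted thread stack no longer prevents it from starting. The sketch assumes a POSIX system; `install_on_alt_stack`, `probe_handler` and the globals are hypothetical names, and each thread that needs this must register its own alternate stack.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <signal.h>

// Bounds of the alternate stack, and whether the last handler ran on it.
static char* g_alt_base = nullptr;
static std::size_t g_alt_size = 0;
static volatile sig_atomic_t g_ran_on_alt_stack = 0;

// Demo handler: checks whether its own frame lies inside the alternate stack.
void probe_handler(int) {
  char local = 0;
  const auto here = reinterpret_cast<std::uintptr_t>(&local);
  const auto lo = reinterpret_cast<std::uintptr_t>(g_alt_base);
  g_ran_on_alt_stack = (here >= lo && here < lo + g_alt_size) ? 1 : 0;
}

// Registers an alternate signal stack for the calling thread and installs
// `handler` for `sig` with SA_ONSTACK, so the handler needs no space on the
// (possibly exhausted) thread stack.
bool install_on_alt_stack(int sig, void (*handler)(int), std::size_t size) {
  // Intentionally never freed: it must stay valid while registered.
  g_alt_base = static_cast<char*>(std::malloc(size));
  if (g_alt_base == nullptr) return false;
  g_alt_size = size;
  stack_t ss{};
  ss.ss_sp = g_alt_base;
  ss.ss_size = size;
  ss.ss_flags = 0;
  if (sigaltstack(&ss, nullptr) != 0) return false;
  struct sigaction sa{};
  sa.sa_handler = handler;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = SA_ONSTACK;
  return sigaction(sig, &sa, nullptr) == 0;
}
```

Here `SIGUSR1` can stand in for the `SIGSEGV` a real overflow would raise; in that situation the handler could at least record diagnostics before the process terminates.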
> >>
> >> Also, we would hit our own yellow page, not the guard page the OS may
> >> or may not have established, so - on UNIX - this would show up as
> >> "Segmentation Fault", not "Stack Overflow", or?
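On UNIX the overflow indeed arrives as an ordinary `SIGSEGV`; a handler can only tell "stack overflow" apart from other crashes by checking the faulting address (`si_addr`) against the guard range. In this sketch a `PROT_NONE` page stands in for the yellow page, and `siglongjmp` is used only to make the demo recoverable; `touch_and_report` and `is_guard_fault` are hypothetical helpers, and the handler here runs on the normal stack, which in a real overflow is exactly the problem.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <setjmp.h>
#include <signal.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf g_recover;
static volatile std::uintptr_t g_fault_addr = 0;

// SA_SIGINFO handler: record where the access faulted, then jump back.
static void segv_handler(int, siginfo_t* info, void*) {
  g_fault_addr = reinterpret_cast<std::uintptr_t>(info->si_addr);
  siglongjmp(g_recover, 1);
}

// Writes one byte at addr; returns the faulting address reported with SIGSEGV,
// or 0 if the write succeeded (or the handler could not be installed).
std::uintptr_t touch_and_report(volatile char* addr) {
  struct sigaction sa{};
  struct sigaction old{};
  sa.sa_sigaction = segv_handler;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = SA_SIGINFO;
  if (sigaction(SIGSEGV, &sa, &old) != 0) return 0;
  g_fault_addr = 0;
  if (sigsetjmp(g_recover, 1) == 0) {
    *addr = 1;  // faults if addr lies in a PROT_NONE page
  }
  sigaction(SIGSEGV, &old, nullptr);  // restore the previous handler
  return g_fault_addr;
}

// A SIGSEGV counts as a stack overflow only if it hit the guard range.
bool is_guard_fault(std::uintptr_t fault, std::uintptr_t guard_lo,
                    std::size_t guard_size) {
  return fault >= guard_lo && fault < guard_lo + guard_size;
}
```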
> >>
> >> Thank you,
> >>
> >> Thomas
> >>
> >>
> >>
> >>
>
More information about the hotspot-runtime-dev mailing list