Question about stack overflows in native code
Thomas Stüfe
thomas.stuefe at gmail.com
Tue Apr 4 19:04:02 UTC 2017
Hi Fred,
On Tue, 4 Apr 2017 at 20:56, Frederic Parain <frederic.parain at oracle.com>
wrote:
>
>
> On 04/04/2017 02:31 PM, Thomas Stüfe wrote:
> > Hi David,
> >
> > On Tue, Apr 4, 2017 at 12:11 PM, David Holmes
> > <david.holmes at oracle.com> wrote:
> >
> > On 4/04/2017 6:30 PM, Thomas Stüfe wrote:
> >
> > Hi David,
> >
> > On Mon, Apr 3, 2017 at 11:02 PM, David Holmes
> > <david.holmes at oracle.com> wrote:
> >
> > Just to follow up on what Fred responded ...
> >
> > On 4/04/2017 4:42 AM, Thomas Stüfe wrote:
> >
> > Hi Fred,
> >
> > thanks! Some more questions inline.
> >
> > On Mon, Apr 3, 2017 at 8:29 PM, Frederic Parain
> > <frederic.parain at oracle.com> wrote:
> >
> > When the yellow zone is hit and the thread state is not
> > _thread_in_java (which means the thread state is _thread_in_native
> > or _thread_in_vm), the yellow zone is silently disabled and the
> > thread is allowed to resume its execution.
> >
> >
> > Disabled by whom exactly?
> >
> > Normally, this would be done in the signal handler, but that
> > requires enough stack space to run. AFAIK jitted or interpreted
> > code does stack banging in order to trigger the yellow-page
> > segfault at a point where there are enough pages left on the
> > stack to invoke the signal handler (n shadow pages before), but
> > that is not guaranteed to work with native C-compiled code, no?
> >
> >
> > The stack banging is done to ensure the stack overflow is hit
> > before we start doing the actual operation. The sizes of the
> > yellow and red zones are supposed to be sufficient to allow the
> > respective signal processing and response to be executed.
> >
> >
> > And the size of the shadow pages should be sufficient to invoke the
> > initial signal handler, which will unprotect the yellow or red zone,
> > right? So, back to my original question: if native C code does not
> > bang the stack but simply runs into the yellow zone, the process
> > will simply die, right?
> >
> >
> > I thought Fred already answered that. The signal handler simply
> > disables the yellow zone and returns:
> >
> >     } else {
> >       // Thread was in the vm or native code. Return and try to finish.
> >       thread->disable_stack_yellow_reserved_zone();
> >       return 1;
> >     }
> >
> >
> > But in order to do this it needs at least enough stack space to invoke
> > the signal handler and call mprotect on the yellow page, right? So, for
> > native code compiled by a C-compiler, this may or may not work,
> > depending on whether and what form of stack-banging code the C-Compiler
> > does generate? (It may generate some sort of stack banging to trigger
> > the OS guard page and do OS stack overflow handling, or it may just
> > blindly run into the yellow page when pushing a new frame).
>
> The yellow zone by itself doesn't provide protection against dying
> from a stack overflow. It has been designed to work in coordination
> with the stack banging. With stack banging, a thread will try to
> "touch" some pages down its stack *before* it really needs them.
> This way, if the yellow zone is hit during the stack banging, there's
> enough remaining free stack space before the yellow zone to execute
> the signal handler (which doesn't need a lot of stack space). And
> if the signal handler can disable the yellow zone, then the thread
> has enough stack space to perform more complex operations like
> generating and throwing a StackOverflowError.
>
> Without stack banging, the thread will use its stack space until
> the yellow zone is hit, and usually when it is hit, the process
> will die because there wasn't enough remaining space to execute
> the signal handler.
>
> In Java code, stack banging is performed each time a method is
> invoked, to ensure the thread has enough stack space to execute
> it (the class file provides information about the maximum number
> of local variables and the deepest execution stack the method
> will need). Of course, with JIT-compiled code, stack banging is
> performed differently because of inlining.
>
> For code which cannot perform stack banging on method boundaries,
> like the VM code, the approach is different. Each time a thread
> is about to call into the VM runtime, stack banging is performed
> using the StackShadowPages sizing. The shadow pages are supposed to
> represent a stack space big enough to execute *any* call to the VM
> runtime. So, if this stack banging passes, all the runtime code
> is executed without any additional check, hoping that shadow
> pages have been sized correctly.
>
> You can try to add some stack banging code to your native code,
> or logic that computes the remaining stack space before the
> guard pages. Not necessarily on every method call, but at
> well-known points in your code. The hardest part is usually knowing
> how much stack space your native code will need. It's possible
> to start with a generous overestimate, and refine it later.
> The sizing of the different zones has been determined by a
> trial and error process which still continues today as the
> JVM code and native JDK code evolve.
>
> Fred
Thanks a lot for this excellent and complete explanation!
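
For the archive, here is a minimal sketch of the "remaining stack space"
check Fred suggests, assuming a stack that grows downward. The names
stack_headroom and stack_has_room are hypothetical; they are not JNI or
HotSpot functions. The caller has to supply the lowest address of the
thread stack and the combined size of the guard zones. On Linux/glibc the
bounds could presumably be obtained with pthread_getattr_np plus
pthread_attr_getstack, and the address of a local variable can stand in
for the stack pointer.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of JNI or HotSpot.
 * Returns the number of bytes between the current stack pointer `sp`
 * and the upper edge of the guard zones, for a downward-growing stack.
 *
 *   stack_low   : lowest address of the thread stack mapping (assumed
 *                 to include the guard zones at its low end)
 *   guard_bytes : combined size of all guard zones (assumption: the
 *                 caller knows or over-estimates this value)
 *   sp          : current stack pointer, or the address of a local
 *                 variable used as an approximation
 */
size_t stack_headroom(uintptr_t stack_low, size_t guard_bytes, uintptr_t sp)
{
    uintptr_t limit = stack_low + guard_bytes;  /* lowest usable address */
    return sp > limit ? (size_t)(sp - limit) : 0;
}

/* Returns 1 if at least `needed` bytes remain above the guard zones. */
int stack_has_room(uintptr_t stack_low, size_t guard_bytes,
                   uintptr_t sp, size_t needed)
{
    return stack_headroom(stack_low, guard_bytes, sp) >= needed;
}
```

Cooperative native code could call stack_has_room() at a few well-known
points (for example before entering a deep recursion) with a generous
value for `needed`, and bail out with an error instead of running into
the yellow zone.
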
Kind regards, Thomas
>
> > If stack space is not sufficient to invoke the signal handler to
> > unprotect the yellow/red page, process would silently die, right?
>
> Correct.
>
> >
> > If it keeps going and hits the red zone then the red zone will be
> > disabled, we print some error messages, and then should call
> > VMError::report_and_die(). But I admit the signal handler logic is
> > quite complex so I may have missed something. :)
> >
> >
> >
> > But that assumes you simply advance into the guard zones - if your
> > native code suddenly jumped to the end of the yellow zone, for
> > example, then signal processing would hit the red zone; similarly,
> > if you jump to the end of the red zone then signal processing will
> > hit the OS guard page. If you jump past all guard pages you simply
> > die.
> >
> >
> > Thank you!
> >
> > See also my response to Fred. We wondered whether exporting a simple
> > JNI helper function to check the stack size on behalf of the native
> > code would be something helpful, for cooperative native code at least.
> >
> >
> > Perhaps. Haven't really thought about it. :)
> >
> >
> > We may experiment a bit. The VM silently dying on native code stack
> > overflows is a huge annoyance, especially since it depends on the
> > user-adjustable stack size. Typically not even a hs_err file is
> > generated.
> >
> > Actually not a theoretical problem, I am currently running into this:
> > http://www-01.ibm.com/support/docview.wss?uid=swg1IV23033 for our
> > commercial code base at a customer (not j9 obviously), and while the
> > recursion in the vector calculations can be fixed, it would be nice to
> > at least have an hs_err file...
> >
> > Cheers,
> > David
> >
> >
> > Kind Regards, Thomas
> >
> >
> > Kind Regards, Thomas
> >
> >
> > David
> >
> >
> > (not just a theory, we have a test case here where a stack overflow
> > in native code just silently kills the process.)
> >
> > I guess it may work accidentally if the C-compiled code itself does
> > some form of stack banging when establishing frames, in order to
> > detect OS stack overflows? Very fuzzy here. But whatever the
> > C-compiled code does, it has no notion about how much space we need
> > to invoke the signal handler and handle stack overflows, no?
> >
> > When the red zone is hit, whatever the current thread state is,
> > the red zone is disabled and VMError::report_and_die() is called,
> > which should generate a hs_err file unless the generation of the
> > error file requires more memory than the red zone provides.
> >
> > Fred
> >
> >
> > Thanks, Thomas
> >
> >
> >
> >
> > On 04/03/2017 02:08 PM, Thomas Stüfe wrote:
> >
> > Hi,
> >
> > Today we wondered what would happen when a stack overflow occurs in
> > native code running in a java thread (an attached thread or one
> > created by the VM).
> >
> > In that case yellow and red pages are in place, but this would not
> > help much, would it not, because the native code would not do any
> > stack banging?
> >
> > So, native code would hit the yellow page, and then there would
> > probably not be enough space left on the stack to invoke the signal
> > handler. The result would be immediate VM death - not even an hs-err
> > file - is that correct?
> >
> > Also, we would hit our own yellow page, not the guard page the OS
> > may or may not have established, so - on UNIX - this would show up
> > as "Segmentation Fault", not "Stack Overflow", right?
> >
> > Thank you,
> >
> > Thomas
> >
> >
> >
> >
>
More information about the hotspot-runtime-dev
mailing list