RFR(S): JDK-8137035 tests got EXCEPTION_STACK_OVERFLOW on Windows 64 bit

Frederic Parain frederic.parain at oracle.com
Mon Aug 29 16:20:46 UTC 2016



On 08/29/2016 11:52 AM, Gerard Ziemski wrote:
> hi Fred,
>
>> On Aug 26, 2016, at 4:53 PM, Frederic Parain
>> <frederic.parain at oracle.com> wrote:
>>
>>> Is it really an undecidable problem? Why is that exactly?
>>
>> How would you compute the max stack size for any call to the VM?
>> Just the matrix of all VM options that could impact the stack
>> usage is huge: several GC, several JIT compilers, JVMTI hooks,
>> JFR. The work to be performed by the JVM can also be dependent on
>> the application (class hierarchy, application code itself which
>> can be optimized (and deoptimized) in many different ways according
>> to compilation policies and application behavior).
>>
>> This problem is not specific to the JVM. Linux has a similar issue
>> with its kernel stacks: they have a fixed size, but there's no way
>> to ensure that the size is sufficient to execute any system call
>> or perform any OS operation.
>
> Absolutely, however, in light of the issue, now that we determined we
> need to increase the number of shadow pages, it seems to me that
> maybe we could take this opportunity and try to evaluate (somehow)
> how many we actually need under some hypothetical load condition
> with all the common options turned on, as an alternative way to
> conservatively increasing them by 1. After all, like you said, when
> the networking code was changed, we had to find a new default value
> somehow, so it has been done before. I don’t know when we set the
> pages size last, but if it has been a while, then given all the new
> features we probably added since then, again as you said JFR, GC
> strategies etc., means we should probably re-evaluate this every now
> and then? It’s just that increasing the pages by 1 and hoping
> (admittedly backed up by testing) that it’s good enough seems to me
> not quite good enough? Should we at least have a follow-up issue to
> address this?


This is what part of the fix does. Undersized stack shadow pages are
easily detected on Unices because they cause crashes as soon as
the JVM code hits the yellow zone. The Windows platform was more
sensitive to this issue because 1) its stack shadow zone was smaller
and 2) stack overflows could happen silently when executing JVM code.
With the new assert to detect stack overflows in JVM code on
Windows, we will be able to detect during our testing when the
default shadow zone is too small.
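
To make the idea concrete, here is a minimal sketch of the kind of
check such an assert could perform. It is not the HotSpot code: the
StackLayout struct, the function names, and the page counts used with
them are hypothetical, and the model simply assumes a stack that grows
downward with guard pages at its low end.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of one thread's stack. Stacks grow downward, so
// `base` is the highest address and `base - size` is the lowest.
struct StackLayout {
    std::uintptr_t base;         // highest address of the stack
    std::size_t    size;         // total stack size in bytes
    std::size_t    page_size;    // OS page size in bytes
    std::size_t    guard_pages;  // pages reserved at the low end (guard zones)
    std::size_t    shadow_pages; // pages VM code is assumed to need at most
};

// Lowest address that is still usable before the guard pages begin.
inline std::uintptr_t usable_limit(const StackLayout& s) {
    return s.base - s.size + s.guard_pages * s.page_size;
}

// Condition a debug-build assert could verify while running VM code:
// the stack pointer has not descended into the guard pages.
inline bool sp_outside_guard_zone(const StackLayout& s, std::uintptr_t sp) {
    return sp <= s.base && sp >= usable_limit(s);
}

// Condition checked before entering VM code: at least `shadow_pages`
// pages remain between `sp` and the guard pages.
inline bool has_shadow_room(const StackLayout& s, std::uintptr_t sp) {
    return sp <= s.base &&
           sp >= usable_limit(s) + s.shadow_pages * s.page_size;
}
```

If has_shadow_room() held on entry but sp_outside_guard_zone() later
fails inside VM code, the shadow zone was too small for that call path,
which is the situation the assert is meant to expose during testing.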

Trying to determine a set of tests and configurations that
exercises the deepest stack usage looks like a waste of time to me
(any code change could introduce a deeper stack usage).
However, the new assert will be checked on every test run
with a debug build. I expect this to provide the wide
coverage we need to estimate the JVM code stack requirements.

There's a side discussion about adding a mechanism to measure
stack consumption during every VM call, but so far the proposed
designs are both complex and brittle. Adding such code to
the main baseline would carry a high risk compared to the
issue it tries to solve.

The sizing of the different special zones of the execution
stacks is currently done by trial and error.
I agree this is not an ideal solution, especially with all
the new features being continuously added to the JVM, but
we don't have a better solution to propose in the short term.

Regards,

Fred


More information about the hotspot-runtime-dev mailing list