RFR(S): JDK-8137035 tests got EXCEPTION_STACK_OVERFLOW on Windows 64 bit

Mon Aug 29 16:37:58 UTC 2016

> On Aug 29, 2016, at 11:20 AM, Frederic Parain <frederic.parain at oracle.com> wrote:
> 
> 
> 
> On 08/29/2016 11:52 AM, Gerard Ziemski wrote:
>> hi Fred,
>> 
>>> On Aug 26, 2016, at 4:53 PM, Frederic Parain
>>> <frederic.parain at oracle.com> wrote:
>>> 
>>>> Is it really an undecidable problem? Why is that exactly?
>>> 
>>> How would you compute the max stack size for any call to the VM?
>>> Just the matrix of all VM options that could impact the stack
>>> usage is huge: several GCs, several JIT compilers, JVMTI hooks,
>>> JFR. The work to be performed by the JVM can also be dependent on
>>> the application (class hierarchy, application code itself which
>>> can be optimized (and deoptimized) in many different ways according
>>> to compilation policies and application behavior).
>>> 
>>> This problem is not specific to the JVM. Linux has a similar issue
>>> with its kernel stacks: they have a fixed size, but there's no way
>>> to ensure that the size is sufficient to execute any system call
>>> or perform any OS operation.
>> 
>> Absolutely. However, in light of the issue, now that we have
>> determined we need to increase the number of shadow pages, it seems
>> to me that maybe we could take this opportunity and try to evaluate
>> (somehow) how many we actually need under some hypothetical load
>> condition with all the common options turned on, as an alternative
>> to conservatively increasing them by 1. After all, like you said,
>> when the networking code was changed, we had to find a new default
>> value somehow, so it has been done before. I don’t know when we last
>> set the number of shadow pages, but if it has been a while, then
>> given all the new features we have probably added since then (again,
>> as you said, JFR, GC strategies, etc.), we should probably
>> re-evaluate this every now and then? It’s just that increasing the
>> pages by 1 and hoping (admittedly backed up by testing) that it’s
>> good enough seems to me not quite good enough? Should we at least
>> have a follow-up issue to address this?
> 
> 
> This is what part of the fix does. Undersized stack shadow pages are
> easily detected on Unices because they cause crashes as soon as
> the JVM code hits the yellow zone. The Windows platform was more
> sensitive to this issue because 1) the stack shadow zone was smaller
> and 2) stack overflows could happen silently when executing JVM code.
> With the new assert to detect stack overflows in JVM code on
> Windows, we will be able to detect during our testing when the
> default shadow zone is too small.
> 
> Trying to determine a set of tests and configurations to use
> to test the deepest stack usage looks like a waste of time to me
> (any code change could introduce a deeper stack usage).

Right, but it has to be tracked somehow, and like you say next, we are attempting to do this.

> However, the new assert will be checked on every test run
> with a debug build. I expect this to provide the wide
> coverage we need to estimate the JVM code stack requirements.

Good.

> 
> There's a side discussion about adding a mechanism to measure
> stack consumption during every VM call, but so far, the proposed
> designs are both complex and brittle. Adding such code to
> the main baseline would be a high risk compared to the
> issue it tries to solve.
> 
> The sizing of the different special zones of the execution
> stacks is currently done with a trial-and-error method.
> I agree this is not an ideal solution, especially with all
> the new features being continuously added to the JVM, but
> we don't have a better solution to propose in the short term.

If there is an existing issue or a document tracking this, would you mind adding it to this discussion as a reference for any future discussions?

Thank you for answering my questions.

cheers