[9] RFR (S): 8168926: C2: Bytecode escape analyzer crashes due to stack overflow

Zoltán Majó zoltan.majo at oracle.com
Tue Jan 10 13:25:38 UTC 2017


Hi,


please review the fix for 8168926.

https://bugs.openjdk.java.net/browse/JDK-8168926
http://cr.openjdk.java.net/~zmajo/8168926/webrev.00/

This is a bug in C2's escape analyzer (EA) I've been chasing for more 
than a year now.

The bug reproduces very rarely (<10 appearances since Sep '15) and in 
different forms/with different tests (see JDK-8135159 for a set of 
different manifestations of the same bug).

I tried to reproduce the crash on at least five different occasions and 
with different tests, but unfortunately did not succeed. So my findings 
(and the fix) rely only on source-code/test inspection and post-mortem 
analysis of crashes I've seen.

The bug is caused by the EA having an inconsistent view of the number of 
parameters taken by a call site 'c'. If call site 'c' in a method 'm' is 
dynamic (i.e., 'c' is targeted by an invokehandle or invokedynamic 
instruction), the number of parameters taken by 'c' is different before 
and after 'c' is resolved. That is, after 'c' is resolved, 'c' takes one 
more argument than the number of arguments pushed onto the stack by 'm' 
(as 'c' is dynamic, it needs an extra appendix argument after resolution).
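The parameter-count difference can be modeled in a few lines. This is a minimal C++ sketch for illustration only: `InvokeKind`, `CallSiteModel` and `target_param_count` are hypothetical names, not HotSpot types, and the model assumes that the only difference between the unresolved and the resolved view of a dynamic call site is the single appendix argument.

```cpp
#include <cassert>

// Hypothetical model of a call site (not a HotSpot type).
enum class InvokeKind { Static, Virtual, Dynamic, Handle };

struct CallSiteModel {
  InvokeKind kind;
  int pushed_args;  // values pushed by the caller's bytecodes before the call
  bool resolved;    // whether the call site has been resolved
};

// invokedynamic and invokehandle call sites are the "dynamic" ones.
bool is_dynamic(InvokeKind k) {
  return k == InvokeKind::Dynamic || k == InvokeKind::Handle;
}

// Parameters taken by the target in this model: a resolved dynamic call
// site takes one extra (appendix) argument that the caller never pushed.
int target_param_count(const CallSiteModel& c) {
  return c.pushed_args + ((is_dynamic(c.kind) && c.resolved) ? 1 : 0);
}
```

For a dynamic call site with two pushed arguments, the model reports 2 parameters before resolution and 3 after; a non-dynamic call site reports 2 either way.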

In its current state, EA can end up with two different views of 'c' 
while analyzing 'c'. That is, EA can use both a "before-resolution" and 
an "after-resolution" view of 'c'. As a result, EA can pop fewer 
elements from the stack than were pushed onto it, which results in a 
stack overflow.
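To see why popping too few elements is fatal rather than merely wasteful, consider an abstract operand stack with a fixed capacity. The sketch below assumes (this is not taken from bcEscapeAnalyzer.cpp) that the capacity equals the method's declared max stack depth; `AbstractStack` and `analyze_overflows` are hypothetical names, and the call's return value is ignored for simplicity.

```cpp
#include <cassert>

// Minimal abstract operand stack with a fixed capacity (hypothetical).
class AbstractStack {
 public:
  explicit AbstractStack(int max_stack) : max_stack_(max_stack) {}

  // Returns false instead of growing past capacity (i.e., an overflow).
  bool push() {
    if (height_ >= max_stack_) return false;
    ++height_;
    return true;
  }

  void pop(int n) {
    assert(n <= height_ && "stack underflow");
    height_ -= n;
  }

  int height() const { return height_; }

 private:
  int max_stack_;
  int height_ = 0;
};

// Analyzes one call ('pushed' arguments go on, 'popped' come off), then
// lets later bytecodes fill the stack to the method's max depth again.
// Returns true if the abstract stack overflowed at any point.
bool analyze_overflows(int max_stack, int pushed, int popped) {
  AbstractStack s(max_stack);
  for (int i = 0; i < pushed; ++i)
    if (!s.push()) return true;
  s.pop(popped);
  for (int i = 0; i < max_stack; ++i)
    if (!s.push()) return true;
  return false;
}
```

With `max_stack` 4 and three arguments, a balanced call (3 pushed, 3 popped) leaves room for the rest of the method, while popping only 2 leaks one slot, so the next time the method reaches its maximum depth the push fails.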

Here is a detailed scenario to illustrate the problem. Let's assume the 
following sequence of operations takes place while EA is analyzing 
method 'm'.


Step (1): EA obtains the method targeted by call site 'c' in 'm'. The 
result is saved into ciMethod 'target':

http://hg.openjdk.java.net/jdk9/hs/hotspot/file/026ff073b5ad/src/share/vm/ci/bcEscapeAnalyzer.cpp#l895

Let's assume that 'c' is not yet resolved at this point in time, i.e., 
the number of arguments N of 'target' does not include the appendix 
argument (N is equal to the number of items pushed onto the stack by 
the bytecodes of method 'm').


Step (2): A thread different from the compiler thread performing EA of 
'm' reaches call site 'c' and executes it. As a result, 'c' is resolved 
(and bootstrapped) and it now points to a method taking N+1 parameters 
(one more parameter than before, because the parameters also include the 
appendix argument).


Step (3): EA checks if call site 'c' has an appendix argument.

http://hg.openjdk.java.net/jdk9/hs/hotspot/file/026ff073b5ad/src/share/vm/ci/bcEscapeAnalyzer.cpp#l899

As there is an appendix argument, an extra (unknown) argument is pushed 
onto the stack. I.e., there are N+1 elements on the stack at this point 
in time.


Step (4): EA continues with the analysis of the call site:

http://hg.openjdk.java.net/jdk9/hs/hotspot/file/026ff073b5ad/src/share/vm/ci/bcEscapeAnalyzer.cpp#l903

After being done with the analysis, EA removes 'arg_size' arguments 
from the stack, for example, here:

http://hg.openjdk.java.net/jdk9/hs/hotspot/file/026ff073b5ad/src/share/vm/ci/bcEscapeAnalyzer.cpp#l294

The number 'arg_size' of arguments is, however, only N. The reason is 
that 'arg_size' is obtained from ciMethod 'target' constructed back at 
Step (1), i.e., from the unresolved call site, and does not include the 
appendix argument.


Summary: If the operations are executed in the order outlined in Steps 
(1)-(4), the stack can overflow after the call site is analyzed, because 
some arguments pushed onto it are not popped when EA is done analyzing 
call site 'c'. For the problem to appear, the resolution of call site 
'c' has to happen concurrently with EA, exactly after Step (1) and 
before Step (3). That explains why the problem reproduces so rarely. For 
more information on the investigation please see [1].
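The timing window can be checked by enumerating where the concurrent resolution may land relative to Steps (1) and (3). In this hypothetical C++ model (`stack_imbalance` and `resolve_at` are illustrative names, not HotSpot code), 'arg_size' is computed from the state seen at Step (1) and the appendix push from the state seen at Step (3); only the middle interleaving leaves an element on the stack.

```cpp
#include <cassert>

// 'resolve_at' says when the concurrent resolution of call site 'c' lands:
//   0 = before Step (1), 1 = between Steps (1) and (3), 2 = after Step (3).
// Returns pushed minus popped for the call (0 means the stack is balanced).
int stack_imbalance(int n_args, int resolve_at) {
  bool resolved_at_step1 = resolve_at < 1;
  bool resolved_at_step3 = resolve_at < 2;
  // Step (1): 'arg_size' is fixed by the view of 'target' taken here.
  int arg_size = n_args + (resolved_at_step1 ? 1 : 0);
  // Step (3): the appendix push uses the (possibly newer) view of 'c'.
  int pushed = n_args + (resolved_at_step3 ? 1 : 0);
  // Step (4): 'arg_size' elements are popped.
  return pushed - arg_size;
}
```

With N = 2, resolving before Step (1) or after Step (3) gives a balanced stack, while resolving in between leaves exactly one extra element.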

The fix I propose determines if a call site 'c' needs an appendix 
argument solely by looking at the ciMethod 'target' and the current 
bytecode instruction. By that, EA has only one (consistent) view of call 
site 'c' (which is either resolved or not).
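A sketch of the idea behind the fix, under stated assumptions: this is not the actual patch, and `Bytecode`, `TargetSnapshot`, `needs_appendix` and `stack_imbalance_fixed` are hypothetical names. Both the appendix decision and 'arg_size' are derived from the same snapshot of 'target' plus the current bytecode, so the numbers of pushed and popped elements agree whether the snapshot was taken before or after resolution.

```cpp
#include <cassert>

// Hypothetical stand-ins for the current bytecode and the ciMethod 'target'.
enum class Bytecode { InvokeStatic, InvokeVirtual, InvokeDynamic, InvokeHandle };

struct TargetSnapshot {
  int declared_args;    // N: values pushed by the caller's bytecodes
  bool takes_appendix;  // whether this snapshot's signature includes the appendix
  int arg_size() const { return declared_args + (takes_appendix ? 1 : 0); }
};

// Decided only from the bytecode and the snapshot; the live call site is
// never consulted a second time, so a concurrent resolution cannot split
// the view.
bool needs_appendix(Bytecode bc, const TargetSnapshot& target) {
  bool dynamic = bc == Bytecode::InvokeDynamic || bc == Bytecode::InvokeHandle;
  return dynamic && target.takes_appendix;
}

// Pushed minus popped for one call; 0 means the stack stays balanced.
int stack_imbalance_fixed(Bytecode bc, const TargetSnapshot& target) {
  int pushed = target.declared_args + (needs_appendix(bc, target) ? 1 : 0);
  return pushed - target.arg_size();
}
```

Whichever view of 'c' the snapshot captured, the push and the pop are computed from that one view, so the imbalance is zero in every case.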

I tested the fix with
- RBT (all hotspot tests both with -Xmixed and -Xcomp);
- JPRT;
- locally executed all jdk/test/java/lang/invoke tests (both with 
-Xmixed and -Xcomp).

No (new) failures appeared.

Thank you!

Best regards,


Zoltan

[1] 
https://bugs.openjdk.java.net/browse/JDK-8168926?focusedCommentId=14035888&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14035888

