[9] RFR (S): 8168926: C2: Bytecode escape analyzer crashes due to stack overflow
Zoltán Majó
zoltan.majo at oracle.com
Tue Jan 10 13:25:38 UTC 2017
Hi,
please review the fix for 8168926.
https://bugs.openjdk.java.net/browse/JDK-8168926
http://cr.openjdk.java.net/~zmajo/8168926/webrev.00/
This is a bug in C2's escape analyzer (EA) I've been chasing for more
than a year now.
The bug reproduces very rarely (<10 appearances since Sep '15) and in
different forms/with different tests (see JDK-8135159 for a set of
different manifestations of the same bug).
I tried to reproduce the crash on at least five different occasions and
with different tests, but did not succeed, unfortunately. So my findings
(and the fix) rely only on source-code/test inspection and post-mortem
analysis of crashes I've seen.
The bug is caused by the EA having an inconsistent view of the number of
parameters taken by a call site 'c'. If call site 'c' in a method 'm' is
dynamic (i.e., 'c' is targeted by an invokehandle or invokedynamic
instruction), the number of parameters taken by 'c' is different before
and after 'c' is resolved. That is, after 'c' is resolved, 'c' takes one
more argument than the number of arguments pushed onto the stack by 'm'
(as 'c' is dynamic, it needs an extra appendix argument after resolution).
In its current state, EA can mix two views of 'c' during a single
analysis of 'c': a "before-resolution" view and an "after-resolution"
view. As a result, EA can pop fewer elements from the stack than were
pushed onto it, which results in a stack overflow.
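To make the imbalance concrete, here is a minimal sketch of the
push/pop accounting. It is not HotSpot code: AbstractStack and
analyze_call are hypothetical names I made up for illustration, and
only the stack depth is modeled.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the analyzer's stack; only the depth matters here.
struct AbstractStack {
    std::size_t capacity;   // maximum depth the analysis has room for
    std::size_t depth = 0;  // current number of elements

    explicit AbstractStack(std::size_t cap) : capacity(cap) {}

    // Returns false instead of growing past the capacity.
    bool push() {
        if (depth == capacity) return false;
        ++depth;
        return true;
    }

    void pop(std::size_t n) {
        assert(n <= depth);
        depth -= n;
    }
};

// Models the analysis of one call site:
//   n_args        - items pushed by the bytecodes of 'm' (N)
//   push_appendix - whether the analyzer believes there is an appendix
//   arg_size      - number of items popped after the analysis
// Returns false if a push would overflow the stack.
bool analyze_call(AbstractStack& s, std::size_t n_args,
                  bool push_appendix, std::size_t arg_size) {
    for (std::size_t i = 0; i < n_args; ++i) {
        if (!s.push()) return false;
    }
    if (push_appendix && !s.push()) return false;
    s.pop(arg_size);
    return true;
}
```

With a consistent view (N+1 pushed, N+1 popped) the depth returns to
zero. With the mixed view (N+1 pushed, N popped) one element is left
behind, and in this model a later analysis with the same mixed view
runs out of room.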
Here is a detailed scenario to illustrate the problem. Let's assume the
following sequence of operations to take place while EA is analyzing
method 'm'.
Step (1): EA obtains the method targeted by call site 'c' in 'm'. The
result is saved into ciMethod 'target':
http://hg.openjdk.java.net/jdk9/hs/hotspot/file/026ff073b5ad/src/share/vm/ci/bcEscapeAnalyzer.cpp#l895
Let's assume that 'c' is not yet resolved at this point, i.e., the
number of arguments N of 'target' does not include the appendix
argument (N is equal to the number of items pushed onto the stack by
the bytecodes of method 'm').
Step (2): A thread other than the compiler thread performing EA of 'm'
reaches call site 'c' and executes it. As a result, 'c' is resolved
(and bootstrapped) and it now points to a method taking N+1 parameters
(one more parameter than before, because the parameters also include the
appendix argument).
Step (3): EA checks if call site 'c' has an appendix argument.
http://hg.openjdk.java.net/jdk9/hs/hotspot/file/026ff073b5ad/src/share/vm/ci/bcEscapeAnalyzer.cpp#l899
As there is an appendix argument, an extra (unknown) argument is pushed
onto the stack. I.e., there are N+1 elements on the stack at this point
of time.
Step (4): EA continues with analyzing the call site:
http://hg.openjdk.java.net/jdk9/hs/hotspot/file/026ff073b5ad/src/share/vm/ci/bcEscapeAnalyzer.cpp#l903
After being done with the analysis, EA removes 'arg_size' number of
arguments from the stack. For example, here:
http://hg.openjdk.java.net/jdk9/hs/hotspot/file/026ff073b5ad/src/share/vm/ci/bcEscapeAnalyzer.cpp#l294
The number 'arg_size' of arguments is, however, only N. The reason is
that 'arg_size' is obtained from ciMethod 'target' constructed back at
Step (1), i.e., from the unresolved call site, and does not include the
appendix argument.
Summary: If the operations are executed in the order outlined by Steps
(1)-(4), the stack can overflow after the call site is analyzed, because
some arguments pushed onto it are not popped once EA is done with
analyzing call site 'c'. For the problem to appear, the resolution of
call site 'c' has to happen concurrently with EA, exactly after Step (1)
and before Step (3). That explains why the problem reproduces so rarely.
For more information on the investigation please see [1].
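The timing window can be illustrated with a small deterministic model
of Steps (1)-(4). This is only a sketch under my own assumptions:
CallSiteModel, ResolveAt and leftover_after_analysis are hypothetical
names, and the concurrent resolution is simulated by a parameter
instead of a real second thread.

```cpp
#include <cassert>
#include <cstddef>

// When the (simulated) other thread resolves call site 'c'.
enum class ResolveAt { BeforeStep1, BetweenStep1And3, AfterStep3 };

// Hypothetical model of a dynamic call site.
struct CallSiteModel {
    std::size_t pushed_by_bytecodes;  // N
    bool resolved = false;
};

// Returns the number of elements left on the stack after the call
// site has been analyzed (0 means pushes and pops are balanced).
std::size_t leftover_after_analysis(std::size_t n, ResolveAt when) {
    CallSiteModel c{n, false};
    if (when == ResolveAt::BeforeStep1) c.resolved = true;

    // Step (1): obtain 'target'; its argument count is fixed from here on.
    const std::size_t arg_size = c.pushed_by_bytecodes + (c.resolved ? 1 : 0);

    // Step (2): the other thread resolves the call site.
    if (when == ResolveAt::BetweenStep1And3) c.resolved = true;

    // Step (3): query the call site again and push the appendix if present.
    std::size_t depth = n;
    if (c.resolved) ++depth;

    if (when == ResolveAt::AfterStep3) c.resolved = true;

    // Step (4): pop 'arg_size' elements, as computed in Step (1).
    assert(arg_size <= depth);
    return depth - arg_size;
}
```

In this model only ResolveAt::BetweenStep1And3 leaves an element on the
stack; resolving before Step (1) or after Step (3) keeps pushes and
pops balanced.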
The fix I propose determines whether a call site 'c' needs an appendix
argument solely by looking at the ciMethod 'target' and the current
bytecode instruction. That way, EA has only one (consistent) view of
call site 'c' (which is either resolved or not).
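The idea of a single view can be sketched as follows. This is not the
code from the webrev: TargetSnapshot, needs_appendix and leftover are
hypothetical names, and the exact criteria used by the fix are in the
webrev itself. The sketch only shows that deriving both the appendix
decision and the pop count from the same snapshot keeps them in
agreement.

```cpp
#include <cassert>
#include <cstddef>

enum class Bytecode { InvokeStatic, InvokeHandle, InvokeDynamic };

// Hypothetical snapshot of 'target', taken once for the call site.
struct TargetSnapshot {
    std::size_t arg_size;    // number of arguments the analyzer will pop
    bool includes_appendix;  // whether arg_size already counts the appendix
};

bool is_dynamic(Bytecode bc) {
    return bc == Bytecode::InvokeHandle || bc == Bytecode::InvokeDynamic;
}

// The decision depends only on the snapshot and the bytecode, never on
// a second look at the (possibly concurrently resolved) call site.
bool needs_appendix(const TargetSnapshot& t, Bytecode bc) {
    return is_dynamic(bc) && t.includes_appendix;
}

// Elements left on the stack after analyzing a call site for which the
// bytecodes of 'm' pushed n items.
std::size_t leftover(std::size_t n, const TargetSnapshot& t, Bytecode bc) {
    const std::size_t depth = n + (needs_appendix(t, bc) ? 1 : 0);
    assert(t.arg_size <= depth);
    return depth - t.arg_size;
}
```

Whether the snapshot was taken before resolution (arg_size N, no
appendix) or after it (arg_size N+1, with appendix), the number of
pushed and popped elements matches in this model.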
I tested the fix with
- RBT (all hotspot tests both with -Xmixed and -Xcomp);
- JPRT;
- locally executed all jdk/test/java/lang/invoke tests (both with
-Xmixed and -Xcomp).
No (new) failures appeared.
Thank you!
Best regards,
Zoltan
[1]
https://bugs.openjdk.java.net/browse/JDK-8168926?focusedCommentId=14035888&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14035888