RFR(M): 8054478 C2: Incorrectly compiled char[] array access crashes JVM
Vladimir Kozlov
vladimir.kozlov at oracle.com
Tue Nov 4 00:27:10 UTC 2014
Sorry, it took so much time. I was confused by number of files you
changed. In general it seems good to me.
Do you know why GCM scheduled the load into previous block? Usually
loads are placed near their uses in:
0c7 B13: # B17 B14 <- B11 Freq: 0.768468
0c7 movzwl R8, [RCX + #24 + R10 << #1] # ushort/char << SEGV
0cd addl RDI, #-2 # int
0d0 movl R9, #-2 # int
0d6 testl R10, R10
0d9 jle,s B17 P=0.000001 C=-1.000000
0d9
0db B14: # B21 B15 <- B13 Freq: 0.768467
0db testl R8, R8
0de je,s B21 P=0.100000 C=-1.000000
castnode.hpp:
Fields names: _required --> _carry_dependency.
I am not convinced that you need _inexact. Based on your code _inexact
can be converted from 'true' to 'false' only when
phase->type(cmp->in(2))->is_int()->is_con() and all conditions at the
beginning of CastIINode::Ideal() are true. But if input's type is
narrowed later we can still improve CastII's type.
Do you think we will get worse code if we don't do Identity for
_carry_dependency CastII nodes? If _carry_dependency is set we can't
replace it with an other node, I think, (unless all inputs are matching
- hash() and cmp()).
castnode.cpp
The type change belongs to Value() not Ideal(). Especially if you don't
need to set _inexact field and create new CastII node for that.
Use fatal with message which includes BoolTest::mask name instead of
ShouldNotReachHere().
loopTransform.cpp
insert_castii_before_loop() --> cast_incr_before_loop()
You missed registration of new CastII nodes - register_new_node().
opaquenode.cpp
What test you are talking about: "The pre loop is guarded by a test on
an opaque node which is later removed"? I did not get first part of the
code. You are putting on worklist a Phi from *previous* (pre-loop) loop.
I would understand if you do that for the following (guarded main-,
post-) loop, and that is already taking care by putting CastII on worklist.
There is loop_limit_check predicate before a counted loop which has
opaqued limit. It . Those opaque nodes are removed by
cleanup_loop_predicates(). Why Phi is not put on worklist at that time?
Thanks,
Vladimir
On 10/6/14 2:49 AM, Roland Westrelin wrote:
>
> Anyone?
>
> Roland.
>
> On Sep 17, 2014, at 7:03 PM, Roland Westrelin <roland.westrelin at oracle.com> wrote:
>
>> http://cr.openjdk.java.net/~roland/8054478/webrev.00/
>>
>> The IR after parsing is this in pseudo code:
>>
>> i_main = 0;
>> while(true) {
>> if (i_main >= 1000000) return;
>> i_main++;
>> char[] array = new char[1];
>> if (pattern1 != null) {
>> i = 0;
>> while(true) {
>> if (i >= 0) {
>> do {
>> if (pattern0 == null) uncommon_trap;
>> if (pattern0.length u< i) uncommon_trap;
>> if (pattern0[i] != 0) {
>> if (pattern1.length u< i) uncommon_trap;
>> if (pattern1[i] != 0) {
>> goto out1;
>> }
>> }
>> i—;
>> pos—;
>> if (pos != -1) {
>> goto out2;
>> }
>> } while (i >= 0);
>> goto out1;
>> }
>> out2:
>> c = array[pos];
>> }
>> }
>> out1:
>> }
>>
>> The do {} while loop is a CountedLoop. The null check & range check are moved out of the loop. Then a pre and a post loops are created and the body is unrolled. The pattern0[i] LoadNode nodes in the unrolled body have a control that is set to the range check before the pre loop. In the unrolled loop, because of the if (pos != -1) test in the first copy of the loop, the compiler finds that in the second copy of the loop body pos is != -1 and so the loop is exited before reaching the end of the unrolled loop. The back branch of the unrolled loop is dead. The compiler optimizes the CountedLoopNode of the unrolled loop out because it doesn’t have a back branch anymore, PhiNodes are removed and the LoadNode for pattern0[i] in the unroll loop body is now independent of the input control of the CountedLoop. Its control is still set to the range check before the pre loop. In the generated code, the pre loop is followed by the pattern1[i] access which is incorrect because it happens before
the if (i >= 0) that dominated the unrolled loop before it was removed.
>>
>> The graph is good as long as the CountedLoop is not removed because the LoadNode depends on a PhiNode of the CountedLoop. But as soon as the CountedLoop is removed and the PhiNode is disconnected, the LoadNode no longer depends on the check that guarded the CountedLoop.
>>
>> Vladimir suggested (in a private conversation) that I use a CastII (control dependent on the if that guards the main loop with as input the input value to the induction variable in the loop) as a way to guarantee correct dependencies. There are 2 problems with this:
>>
>> 1) the CastII nodes are removed after CCP. I added a flag to the CastII nodes so we can mark CastII nodes that can’t be removed and should be left as they are post-CCP.
>> 2) we don’t know the range of values that are possible on the if branch that enters the main loop right away (it can be adjusted as loop opts are applied). We may actually never know it if it depends on non constants. If we use a type for a range that is too wide in the CastII the CastII may be optimized out incorrectly. I added another flag to the CastII to mark nodes that cannot be optimized. In CastII:Ideal I added code that tries to figure out a better type for the CastII so we may be able to optimize the CastII as the compiler optimizes the IR.
>>
>> Another independent issue here is that the main loop is actually never executed because the loop of the test is a single iteration loop so the compiler should be able to optimize out the main loop entirely but it doesn’t. PhiNode::Value() has this code:
>>
>> CountedLoopNode *l = r->is_CountedLoop() ? r->as_CountedLoop() : NULL;
>> if( l && l->can_be_counted_loop(phase) &&
>> ((const Node*)l->phi() == this) ) { // Trip counted loop!
>>
>> ...
>> if( lo->_hi < hi->_lo ) // Reversed endpoints are well defined :-(
>> return TypeInt::make(lo->_lo,hi->_hi,3);
>> }
>> }
>> }
>>
>> that returns an accurate value for the Phi node of a CountedLoop. In this case we want the pre loop’s Phi to return an accurate type that would then make the guard of the main loop optimize. The pre loop is guarded by a test on an opaque node which is later removed. If PhiNode::Value() is called before the Opaque1 node is removed then, the limit type is int and it cannot return anything accurate. If it’s called after, then limit type is a constant and PhiNode::Value() also returns a constant for the pre-loop. Then the main loop is optimized out. To fix this: I added code to Opaque1Node::Ideal so it reenqueues the Phi when the opaque node is removed.
>>
>> Roland.
>
More information about the hotspot-compiler-dev
mailing list