RFR(M): 8054478 C2: Incorrectly compiled char[] array access crashes JVM

Tue Nov 4 00:27:10 UTC 2014

Sorry, it took so much time. I was confused by number of files you 
changed. In general it seems good to me.

Do you know why GCM scheduled the load into previous block? Usually 
loads are placed near their uses in:

0c7   B13: #    B17 B14 <- B11  Freq: 0.768468
0c7     movzwl  R8, [RCX + #24 + R10 << #1]     # ushort/char << SEGV
0cd     addl    RDI, #-2        # int
0d0     movl    R9, #-2 # int
0d6     testl   R10, R10
0d9     jle,s   B17  P=0.000001 C=-1.000000
0d9
0db   B14: #    B21 B15 <- B13  Freq: 0.768467
0db     testl   R8, R8
0de     je,s   B21  P=0.100000 C=-1.000000

castnode.hpp:

Fields names: _required --> _carry_dependency.

I am not convinced that you need _inexact. Based on your code _inexact 
can be converted from 'true' to 'false' only when 
phase->type(cmp->in(2))->is_int()->is_con() and all conditions at the 
beginning of CastIINode::Ideal() are true. But if input's type is 
narrowed later we can still improve CastII's type.

Do you think we will get worse code if we don't do Identity for 
_carry_dependency CastII nodes? If _carry_dependency is set we can't 
replace it with an other node, I think, (unless all inputs are matching 
- hash() and cmp()).

castnode.cpp

The type change belongs to Value() not Ideal(). Especially if you don't 
need to set _inexact field and create new CastII node for that.

Use fatal with message which includes BoolTest::mask name instead of 
ShouldNotReachHere().

loopTransform.cpp

insert_castii_before_loop() --> cast_incr_before_loop()
You missed registration of new CastII nodes - register_new_node().

opaquenode.cpp

What test you are talking about: "The pre loop is guarded by a test on 
an opaque node which is later removed"? I did not get first part of the 
code. You are putting on worklist a Phi from *previous* (pre-loop) loop. 
I would understand if you do that for the following (guarded main-, 
post-) loop, and that is already taking care by putting CastII on worklist.

There is loop_limit_check predicate before a counted loop which has 
opaqued limit. It . Those opaque nodes are removed by 
cleanup_loop_predicates(). Why Phi is not put on worklist at that time?

Thanks,
Vladimir

On 10/6/14 2:49 AM, Roland Westrelin wrote:
>
> Anyone?
>
> Roland.
>
> On Sep 17, 2014, at 7:03 PM, Roland Westrelin <roland.westrelin at oracle.com> wrote:
>
>> http://cr.openjdk.java.net/~roland/8054478/webrev.00/
>>
>> The IR after parsing is this in pseudo code:
>>
>> i_main = 0;
>> while(true) {
>> if (i_main >= 1000000) return;
>> i_main++;
>> char[] array = new char[1];
>> if (pattern1 != null) {
>>    i = 0;
>>    while(true) {
>>      if (i >= 0) {
>>        do {
>>          if (pattern0 == null) uncommon_trap;
>>          if (pattern0.length u< i) uncommon_trap;
>>          if (pattern0[i] != 0) {
>>            if (pattern1.length u< i) uncommon_trap;
>>            if (pattern1[i] != 0) {
>>              goto out1;
>>            }
>>          }
>>          i—;
>>          pos—;
>>          if (pos != -1) {
>>            goto out2;
>>          }
>>        } while (i >= 0);
>>        goto out1;
>>      }
>>      out2:
>>      c = array[pos];
>>    }
>> }
>> out1:
>> }
>>
>> The do {} while loop is a CountedLoop. The null check & range check are moved out of the loop. Then a pre and a post loops are created and the body is unrolled. The pattern0[i] LoadNode nodes in the unrolled body have a control that is set to the range check before the pre loop. In the unrolled loop, because of the if (pos != -1) test in the first copy of the loop, the compiler finds that in the second copy of the loop body pos is != -1 and so the loop is exited before reaching the end of the unrolled loop. The back branch of the unrolled loop is dead. The compiler optimizes the CountedLoopNode of the unrolled loop out because it doesn’t have a back branch anymore, PhiNodes are removed and the LoadNode for pattern0[i] in the unroll loop body is now independent of the input control of the CountedLoop. Its control is still set to the range check before the pre loop. In the generated code, the pre loop is followed by the pattern1[i] access which is incorrect because it happens before 
the if (i >= 0) that dominated the unrolled loop before it was removed.
>>
>> The graph is good as long as the CountedLoop is not removed because the LoadNode depends on a PhiNode of the CountedLoop. But as soon as the CountedLoop is removed and the PhiNode is disconnected, the LoadNode no longer depends on the check that guarded the CountedLoop.
>>
>> Vladimir suggested (in a private conversation) that I use a CastII (control dependent on the if that guards the main loop with as input the input value to the induction variable in the loop) as a way to guarantee correct dependencies. There are 2 problems with this:
>>
>> 1) the CastII nodes are removed after CCP. I added a flag to the CastII nodes so we can mark CastII nodes that can’t be removed and should be left as they are post-CCP.
>> 2) we don’t know the range of values that are possible on the if branch that enters the main loop right away (it can be adjusted as loop opts are applied). We may actually never know it if it depends on non constants. If we use a type for a range that is too wide in the CastII the CastII may be optimized out incorrectly. I added another flag to the CastII to mark nodes that cannot be optimized. In CastII:Ideal I added code that tries to figure out a better type for the CastII so we may be able to optimize the CastII as the compiler optimizes the IR.
>>
>> Another independent issue here is that the main loop is actually never executed because the loop of the test is a single iteration loop so the compiler should be able to optimize out the main loop entirely but it doesn’t. PhiNode::Value() has this code:
>>
>> CountedLoopNode *l = r->is_CountedLoop() ? r->as_CountedLoop() : NULL;
>> if( l && l->can_be_counted_loop(phase) &&
>>      ((const Node*)l->phi() == this) ) { // Trip counted loop!
>>
>> ...
>>        if( lo->_hi < hi->_lo )     // Reversed endpoints are well defined :-(
>>          return TypeInt::make(lo->_lo,hi->_hi,3);
>>      }
>>    }
>> }
>>
>> that returns an accurate value for the Phi node of a CountedLoop. In this case we want the pre loop’s Phi to return an accurate type that would then make the guard of the main loop optimize. The pre loop is guarded by a test on an opaque node which is later removed. If PhiNode::Value() is called before the Opaque1 node is removed then, the limit type is int and it cannot return anything accurate. If it’s called after, then limit type is a constant and PhiNode::Value() also returns a constant for the pre-loop. Then the main loop is optimized out. To fix this: I added code to Opaque1Node::Ideal so it reenqueues the Phi when the opaque node is removed.
>>
>> Roland.
>