RFR(M): 8054478 C2: Incorrectly compiled char[] array access crashes JVM

Thu Nov 13 16:26:22 UTC 2014

On 11/13/14 1:09 AM, Roland Westrelin wrote:
>
>> You don't need leftover changes in opaquenode.* files anymore.
>
> I’ll remove the changes.
>
>> jetstream regression may indicate the increase in compilation times. I would suggest to have n->Opcode() == Op_Opaque1 checks in both cases to target only it. Sorry, in my previous mail I incorrectly used Op_CastII in the example code.
>
>
> Are we guaranteed that the type of the limit won’t improve after the opaque node is removed? Maybe it’s some arithmetic computation that isn’t optimized yet? In that case, we want to re-process the phi even after the opaque node is removed as soon as the limit changes.

My main concern was regressions in your run. Which I think is increase of compilation time.
If you can show that time (-XX:+CITime) does not increase much I am fine with current change.
And I still want you to run this in aurora.se - I don't want new performance bugs filed 2 months from now.

>
> Can I go with a single reviewer for this change?

Ask Igor for second review.

Thanks,
Vladimir

>
> Roland.
>
>>
>> Thanks,
>> Vladimir
>>
>> On 11/12/14 2:16 AM, Roland Westrelin wrote:
>>> Vladimir,
>>>
>>> New webrev with your suggested change:
>>>
>>> http://cr.openjdk.java.net/~roland/8054478/webrev.02/
>>>
>>> I also had to make BoolTest::dump_on available in product builds for that code:
>>>
>>>   138             } else {
>>>   139               stringStream ss;
>>>   140               test.dump_on(&ss);
>>>   141               fatal(err_msg_res("unexpected comparison %s", ss.as_string()));
>>>   142             }
>>>
>>> in castnode.cpp
>>>
>>> I did a refworkload run on x64:
>>>
>>> ============================================================================
>>> tbase: reference_server
>>>    Benchmark         Samples        Mean     Stdev
>>>    jetstream              10      102.74      0.10
>>>      Write                10      202.90      0.94
>>>      Parse                10       37.10      0.05
>>>      Read                 10       23.20      0.13
>>>      rw.runtime           10       10.60      0.05
>>>      Copy                 10      641.70      0.10
>>>    scimark                10     1600.34      0.02
>>>      LU                   10     3776.61      0.00
>>>      FFT                  10      405.76      0.02
>>>      Monte                10      775.88      0.00
>>>      SOR                  10     1275.16      0.01
>>>      Sparse               10     1768.28      0.08
>>>      rw.runtime           10       27.20      0.02
>>>    specjbb2000            10   487411.56      0.04
>>>      Last_Warehouse       10   487411.58      0.04
>>>      First_Warehouse      10    95725.01      0.02
>>>      rw.runtime           10      603.90      0.00
>>>    specjbb2005            10   480939.78      0.01
>>>      last                 10   480939.79      0.01
>>>      interval_average     10     5937.40      0.01
>>>      peak                 10   534158.07      0.02
>>>      overall_average      10   431366.64      0.02
>>>      last_warehouse       10        8.00      0.00
>>>      peak_warehouse       10        2.30      0.21
>>>      first                10    50951.62      0.03
>>>      rw.runtime           10      461.60      0.00
>>>    specjvm98              10      801.92      0.04
>>>      compress             10      604.68      0.10
>>>      javac                10      405.68      0.17
>>>      db                   10      449.78      0.00
>>>      jack                 10      631.14      0.05
>>>      mtrt                 10     2342.41      0.03
>>>      jess                 10      957.35      0.02
>>>      rw.runtime           10       41.10      0.04
>>>      mpegaudio            10     1385.76      0.00
>>>    volano25               10   327418.69      0.06
>>>      time                 10        2.45      0.05
>>>      connections          10      400.00      0.00
>>>      rw.runtime           10       26.10      0.01
>>>    --------------------------------------------------------------------------
>>>    Weighted Geomean             31199.92
>>> ============================================================================
>>> tnew: reference_server
>>>    Benchmark         Samples        Mean     Stdev   %Diff    P   Significant
>>>    jetstream              10       85.82      0.12  -16.47 0.002          Yes
>>>      Write                10      219.60      0.31   -8.23 0.799            *
>>>      Parse                10       36.90      0.06    0.54 0.836            *
>>>      Read                 10       23.60      0.10   -1.72 0.741            *
>>>      rw.runtime           10       13.80      0.03  -30.19 0.000          Yes
>>>      Copy                 10     1058.70      0.26  -64.98 0.001          Yes
>>>    scimark                10     1583.22      0.00   -1.07 0.110            *
>>>      LU                   10     3778.49      0.00    0.05 0.160            *
>>>      FFT                  10      407.80      0.01    0.50 0.498            *
>>>      Monte                10      775.71      0.00   -0.02 0.279            *
>>>      SOR                  10     1276.75      0.01    0.12 0.742            *
>>>      Sparse               10     1677.37      0.01   -5.14 0.086            *
>>>      rw.runtime           10       27.50      0.02   -1.10 0.265            *
>>>    specjbb2000            10   491672.03      0.04    0.87 0.624            *
>>>      Last_Warehouse       10   491672.04      0.04    0.87 0.624            *
>>>      First_Warehouse      10    94172.03      0.02   -1.62 0.050            *
>>>      rw.runtime           10      604.00      0.00   -0.02 0.585            *
>>>    specjbb2005            10   481051.28      0.01    0.02 0.945            *
>>>      last                 10   481051.29      0.01    0.02 0.945            *
>>>      interval_average     10     5938.90      0.01    0.03 0.940            *
>>>      peak                 10   538706.19      0.03    0.85 0.461            *
>>>      overall_average      10   434244.96      0.02    0.67 0.535            *
>>>      last_warehouse       10        8.00      0.00   -0.00 0.000            *
>>>      peak_warehouse       10        2.00      0.00   13.04 0.081            *
>>>      first                10    51039.34      0.03    0.17 0.889            *
>>>      rw.runtime           10      462.20      0.00   -0.13 0.120            *
>>>    specjvm98              10      806.31      0.04    0.55 0.738            *
>>>      compress             10      623.61      0.10    3.13 0.494            *
>>>      javac                10      402.37      0.10   -0.81 0.898            *
>>>      db                   10      450.58      0.00    0.18 0.327            *
>>>      jack                 10      627.02      0.05   -0.65 0.785            *
>>>      mtrt                 10     2281.86      0.03   -2.58 0.106            *
>>>      jess                 10     1000.65      0.10    4.52 0.220            *
>>>      rw.runtime           10       40.90      0.03    0.49 0.761            *
>>>      mpegaudio            10     1384.19      0.00   -0.11 0.514            *
>>>    volano25               10   324906.00      0.08   -0.77 0.799            *
>>>      time                 10        2.47      0.07   -1.00 0.733            *
>>>      connections          10      400.00      0.00    0.00 0.000            *
>>>      rw.runtime           10       26.50      0.02   -1.53 0.058            *
>>>    --------------------------------------------------------------------------
>>>    Weighted Geomean             30613.61             -1.88
>>> ============================================================================
>>>
>>> tbase is current hotspot-comp. tnew is with the change. It’s ok, right?
>>>
>>> Roland.
>>>
>>>
>>>> On Nov 8, 2014, at 12:08 AM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
>>>>
>>>> This looks good but I just realized that what you are doing in Opaque1Node::Ideal() we usually do in PhaseIterGVN::add_users_to_worklist().
>>>>
>>>> For example you can do there (for Phi):
>>>>
>>>> +   uint use_op = use->Opcode();
>>>>     if( use->is_Cmp() ) {       // Enable CMP/BOOL optimization
>>>>       add_users_to_worklist(use); // Put Bool on worklist
>>>> -     // Look for the 'is_x2logic' pattern: "x ? : 0 : 1" and put the
>>>> -     // phi merging either 0 or 1 onto the worklist
>>>>       if (use->outcnt() > 0) {
>>>>         Node* bol = use->raw_out(0);
>>>>         if (bol->outcnt() > 0) {
>>>>           Node* iff = bol->raw_out(0);
>>>>           if (use_op == Op_CmpI && iff->is_CountedLoopEnd() &&
>>>>               n->Opcode() == Op_CastII) {
>>>> +           // If this opaque node feeds into the limit condition of a
>>>> +           // CountedLoop, we need to process the Phi node for the
>>>> +           // induction variable: the range of values taken by the Phi is
>>>> +           // known now and so its type is also known.
>>>> +           CountedLoopEndNode* cle = iff->as_CountedLoopEnd();
>>>> +           if (cle->limit() == this) {
>>>> +             _worklist.push(cle->phi());
>>>> +           }
>>>> -         if (iff->outcnt() == 2) {
>>>> +         } else if (iff->outcnt() == 2) {
>>>> +           // Look for the 'is_x2logic' pattern: "x ? : 0 : 1" and put the
>>>> +           // phi merging either 0 or 1 onto the worklist
>>>>
>>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>>
>>>> On 11/7/14 2:17 PM, Roland Westrelin wrote:
>>>>> Vladimir,
>>>>>
>>>>> Thanks for the discussion, suggestions and fixes. Here is an updated webrev:
>>>>>
>>>>> http://cr.openjdk.java.net/~roland/8054478/webrev.01/
>>>>>
>>>>> Roland.
>>>>>
>>>>>> On Nov 6, 2014, at 11:12 PM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
>>>>>>
>>>>>> On 11/6/14 7:39 AM, Roland Westrelin wrote:
>>>>>>>
>>>>>>>> I need to see new version of webrev. We talked about moving type change from Ideal() to Value(). You are right if the code, currently in ConstraintCastNode::Value(), will be executed first.
>>>>>>>
>>>>>>> I’ll send an updated webrev.
>>>>>>>
>>>>>>>>>>>> opaquenode.cpp
>>>>>>>>>>>>
>>>>>>>>>>>> What test you are talking about: "The pre loop is guarded by a test on an opaque node which is later removed"? I did not get first part of the code. You are putting on worklist a Phi from *previous* (pre-loop) loop. I would understand if you do that for the following (guarded main-, post-) loop, and that is already taking care by putting CastII on worklist.
>>>>>>>>>>>
>>>>>>>>>>> Once the range of values for the pre loop is know, we can optimize the test that guards the main loop. That range of values is only known once the opaque node for the pre loop is removed.
>>>>>>>>>>
>>>>>>>>>> That is what I am asking: "range of values for the pre loop is know" - when this happens, which Opaque1 is removed to make "range" to be known? If it is Opaque1 node from loop_limit_check predicate then we may need to make sure that iteration Phi of pre-loop is put on worklist when predicate's Opaque1 node is removed by cleanup_loop_predicates(). Then you don't need first part in Opaque1Node::Ideal.
>>>>>>>>>
>>>>>>>>> The Opaque1 nodes are the ones created by PhaseIdealLoop::insert_pre_post_loops() (loop limit checks). They are not in the Compile::_predicate_opaqs list and so they are not removed by cleanup_loop_predicates().
>>>>>>>>
>>>>>>>> So how these Opaque1 nodes affects type range of Phi node in pre-loop? That is what I don't understand.
>>>>>>>
>>>>>>> (I know you don’t like when I remove the text from previous emails but that was really getting confusing)
>>>>>>
>>>>>> At least leave webrev link so I don't need to search for it in previous mails.
>>>>>>
>>>>>>>
>>>>>>> PhiNode::Value() has this code:
>>>>>>>
>>>>>>>    // Check for trip-counted loop.  If so, be smarter.
>>>>>>>    CountedLoopNode *l = r->is_CountedLoop() ? r->as_CountedLoop() : NULL;
>>>>>>>    if( l && l->can_be_counted_loop(phase) &&
>>>>>>>        ((const Node*)l->phi() == this) ) { // Trip counted loop!
>>>>>>>      // protect against init_trip() or limit() returning NULL
>>>>>>>      const Node *init   = l->init_trip();
>>>>>>>      const Node *limit  = l->limit();
>>>>>>>      if( init != NULL && limit != NULL && l->stride_is_con() ) {
>>>>>>>        const TypeInt *lo = init ->bottom_type()->isa_int();
>>>>>>>        const TypeInt *hi = limit->bottom_type()->isa_int();
>>>>>>>        if( lo && hi ) {            // Dying loops might have TOP here
>>>>>>>          int stride = l->stride_con();
>>>>>>>          if( stride < 0 ) {          // Down-counter loop
>>>>>>>            const TypeInt *tmp = lo; lo = hi; hi = tmp;
>>>>>>>            stride = -stride;
>>>>>>>          }
>>>>>>>          if( lo->_hi < hi->_lo )     // Reversed endpoints are well defined :-(
>>>>>>>            return TypeInt::make(lo->_lo,hi->_hi,3);
>>>>>>>        }
>>>>>>>      }
>>>>>>>    }
>>>>>>>
>>>>>>> That code can only return something for the induction variable Phi once the opaque node for the loop limit check is removed. That’s why when the opaque node is removed I enqueue the induction variable Phi.
>>>>>>
>>>>>> My mistake was that I thought you are looking on Opaque1 node generated for main-loop guard:
>>>>>>
>>>>>>   // Step B2: Build a zero-trip guard for the main-loop.  After leaving the
>>>>>>   // pre-loop, the main-loop may not execute at all.  Later in life this
>>>>>>   // zero-trip guard will become the minimum-trip guard when we unroll
>>>>>>   // the main-loop.
>>>>>>   Node *min_opaq = new Opaque1Node(C, limit);
>>>>>>   Node *min_cmp  = new CmpINode( pre_incr, min_opaq );
>>>>>>
>>>>>>
>>>>>> But you are talking about pre-loop exit check:
>>>>>>
>>>>>>   // Step B4: Shorten the pre-loop to run only 1 iteration (for now).
>>>>>>   // RCE and alignment may change this later.
>>>>>>   Node *cmp_end = pre_end->cmp_node();
>>>>>>   assert( cmp_end->in(2) == limit, "" );
>>>>>>   Node *pre_limit = new AddINode( init, stride );
>>>>>>
>>>>>>   // Save the original loop limit in this Opaque1 node for
>>>>>>   // use by range check elimination.
>>>>>>   Node *pre_opaq  = new Opaque1Node(C, pre_limit, limit);
>>>>>>
>>>>>>
>>>>>> But in this case you can add check (cl->limit() == this) to be clear what you are looking for.
>>>>>> Also instead of looking up through AddI you can look down for CountedLoopEnd and get cle->phi() and cle->limit() from it.
>>>>>>
>>>>>> Thanks,
>>>>>> Vladimir
>>>>>>
>>>>>>>
>>>>>>> In the case of this test case, the pre loop iterates from 0 as long as i > -1. So the Phi for the pre loop has values within [-1, 0]. The main loop is guarded by a test that checks whether The value from the Phi - 1 is greater than 0. With a correct range of values for the Phi from the code above, that checks is statically false and the main loop is optimized away. Otherwise it’s not.
>>>>>>>
>>>>>>> Roland.
>>>>>
>>>
>