RFR(M): 8054478 C2: Incorrectly compiled char[] array access crashes JVM
Vladimir Kozlov
vladimir.kozlov at oracle.com
Thu Nov 13 00:39:44 UTC 2014
You don't need leftover changes in opaquenode.* files anymore.
jetstream regression may indicate the increase in compilation times. I
would suggest to have n->Opcode() == Op_Opaque1 checks in both cases to
target only it. Sorry, in my previous mail I incorrectly used Op_CastII
in the example code.
Thanks,
Vladimir
On 11/12/14 2:16 AM, Roland Westrelin wrote:
> Vladimir,
>
> New webrev with your suggested change:
>
> http://cr.openjdk.java.net/~roland/8054478/webrev.02/
>
> I also had to make BoolTest::dump_on available in product builds for that code:
>
> 138 } else {
> 139 stringStream ss;
> 140 test.dump_on(&ss);
> 141 fatal(err_msg_res("unexpected comparison %s", ss.as_string()));
> 142 }
>
> in castnode.cpp
>
> I did a refworkload run on x64:
>
> ============================================================================
> tbase: reference_server
> Benchmark Samples Mean Stdev
> jetstream 10 102.74 0.10
> Write 10 202.90 0.94
> Parse 10 37.10 0.05
> Read 10 23.20 0.13
> rw.runtime 10 10.60 0.05
> Copy 10 641.70 0.10
> scimark 10 1600.34 0.02
> LU 10 3776.61 0.00
> FFT 10 405.76 0.02
> Monte 10 775.88 0.00
> SOR 10 1275.16 0.01
> Sparse 10 1768.28 0.08
> rw.runtime 10 27.20 0.02
> specjbb2000 10 487411.56 0.04
> Last_Warehouse 10 487411.58 0.04
> First_Warehouse 10 95725.01 0.02
> rw.runtime 10 603.90 0.00
> specjbb2005 10 480939.78 0.01
> last 10 480939.79 0.01
> interval_average 10 5937.40 0.01
> peak 10 534158.07 0.02
> overall_average 10 431366.64 0.02
> last_warehouse 10 8.00 0.00
> peak_warehouse 10 2.30 0.21
> first 10 50951.62 0.03
> rw.runtime 10 461.60 0.00
> specjvm98 10 801.92 0.04
> compress 10 604.68 0.10
> javac 10 405.68 0.17
> db 10 449.78 0.00
> jack 10 631.14 0.05
> mtrt 10 2342.41 0.03
> jess 10 957.35 0.02
> rw.runtime 10 41.10 0.04
> mpegaudio 10 1385.76 0.00
> volano25 10 327418.69 0.06
> time 10 2.45 0.05
> connections 10 400.00 0.00
> rw.runtime 10 26.10 0.01
> --------------------------------------------------------------------------
> Weighted Geomean 31199.92
> ============================================================================
> tnew: reference_server
> Benchmark Samples Mean Stdev %Diff P Significant
> jetstream 10 85.82 0.12 -16.47 0.002 Yes
> Write 10 219.60 0.31 -8.23 0.799 *
> Parse 10 36.90 0.06 0.54 0.836 *
> Read 10 23.60 0.10 -1.72 0.741 *
> rw.runtime 10 13.80 0.03 -30.19 0.000 Yes
> Copy 10 1058.70 0.26 -64.98 0.001 Yes
> scimark 10 1583.22 0.00 -1.07 0.110 *
> LU 10 3778.49 0.00 0.05 0.160 *
> FFT 10 407.80 0.01 0.50 0.498 *
> Monte 10 775.71 0.00 -0.02 0.279 *
> SOR 10 1276.75 0.01 0.12 0.742 *
> Sparse 10 1677.37 0.01 -5.14 0.086 *
> rw.runtime 10 27.50 0.02 -1.10 0.265 *
> specjbb2000 10 491672.03 0.04 0.87 0.624 *
> Last_Warehouse 10 491672.04 0.04 0.87 0.624 *
> First_Warehouse 10 94172.03 0.02 -1.62 0.050 *
> rw.runtime 10 604.00 0.00 -0.02 0.585 *
> specjbb2005 10 481051.28 0.01 0.02 0.945 *
> last 10 481051.29 0.01 0.02 0.945 *
> interval_average 10 5938.90 0.01 0.03 0.940 *
> peak 10 538706.19 0.03 0.85 0.461 *
> overall_average 10 434244.96 0.02 0.67 0.535 *
> last_warehouse 10 8.00 0.00 -0.00 0.000 *
> peak_warehouse 10 2.00 0.00 13.04 0.081 *
> first 10 51039.34 0.03 0.17 0.889 *
> rw.runtime 10 462.20 0.00 -0.13 0.120 *
> specjvm98 10 806.31 0.04 0.55 0.738 *
> compress 10 623.61 0.10 3.13 0.494 *
> javac 10 402.37 0.10 -0.81 0.898 *
> db 10 450.58 0.00 0.18 0.327 *
> jack 10 627.02 0.05 -0.65 0.785 *
> mtrt 10 2281.86 0.03 -2.58 0.106 *
> jess 10 1000.65 0.10 4.52 0.220 *
> rw.runtime 10 40.90 0.03 0.49 0.761 *
> mpegaudio 10 1384.19 0.00 -0.11 0.514 *
> volano25 10 324906.00 0.08 -0.77 0.799 *
> time 10 2.47 0.07 -1.00 0.733 *
> connections 10 400.00 0.00 0.00 0.000 *
> rw.runtime 10 26.50 0.02 -1.53 0.058 *
> --------------------------------------------------------------------------
> Weighted Geomean 30613.61 -1.88
> ============================================================================
>
> tbase is current hotspot-comp. tnew is with the change. It’s ok, right?
>
> Roland.
>
>
>> On Nov 8, 2014, at 12:08 AM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
>>
>> This looks good but I just realized that what you are doing in Opaque1Node::Ideal() we usually do in PhaseIterGVN::add_users_to_worklist().
>>
>> For example you can do there (for Phi):
>>
>> + uint use_op = use->Opcode();
>> if( use->is_Cmp() ) { // Enable CMP/BOOL optimization
>> add_users_to_worklist(use); // Put Bool on worklist
>> - // Look for the 'is_x2logic' pattern: "x ? : 0 : 1" and put the
>> - // phi merging either 0 or 1 onto the worklist
>> if (use->outcnt() > 0) {
>> Node* bol = use->raw_out(0);
>> if (bol->outcnt() > 0) {
>> Node* iff = bol->raw_out(0);
>> if (use_op == Op_CmpI && iff->is_CountedLoopEnd() &&
>> n->Opcode() == Op_CastII) {
>> + // If this opaque node feeds into the limit condition of a
>> + // CountedLoop, we need to process the Phi node for the
>> + // induction variable: the range of values taken by the Phi is
>> + // known now and so its type is also known.
>> + CountedLoopEndNode* cle = iff->as_CountedLoopEnd();
>> + if (cle->limit() == this) {
>> + _worklist.push(cle->phi());
>> + }
>> - if (iff->outcnt() == 2) {
>> + } else if (iff->outcnt() == 2) {
>> + // Look for the 'is_x2logic' pattern: "x ? : 0 : 1" and put the
>> + // phi merging either 0 or 1 onto the worklist
>>
>>
>> Thanks,
>> Vladimir
>>
>>
>> On 11/7/14 2:17 PM, Roland Westrelin wrote:
>>> Vladimir,
>>>
>>> Thanks for the discussion, suggestions and fixes. Here is an updated webrev:
>>>
>>> http://cr.openjdk.java.net/~roland/8054478/webrev.01/
>>>
>>> Roland.
>>>
>>>> On Nov 6, 2014, at 11:12 PM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
>>>>
>>>> On 11/6/14 7:39 AM, Roland Westrelin wrote:
>>>>>
>>>>>> I need to see new version of webrev. We talked about moving type change from Ideal() to Value(). You are right if the code, currently in ConstraintCastNode::Value(), will be executed first.
>>>>>
>>>>> I’ll send an updated webrev.
>>>>>
>>>>>>>>>> opaquenode.cpp
>>>>>>>>>>
>>>>>>>>>> What test you are talking about: "The pre loop is guarded by a test on an opaque node which is later removed"? I did not get first part of the code. You are putting on worklist a Phi from *previous* (pre-loop) loop. I would understand if you do that for the following (guarded main-, post-) loop, and that is already taking care by putting CastII on worklist.
>>>>>>>>>
>>>>>>>>> Once the range of values for the pre loop is know, we can optimize the test that guards the main loop. That range of values is only known once the opaque node for the pre loop is removed.
>>>>>>>>
>>>>>>>> That is what I am asking: "range of values for the pre loop is know" - when this happens, which Opaque1 is removed to make "range" to be known? If it is Opaque1 node from loop_limit_check predicate then we may need to make sure that iteration Phi of pre-loop is put on worklist when predicate's Opaque1 node is removed by cleanup_loop_predicates(). Then you don't need first part in Opaque1Node::Ideal.
>>>>>>>
>>>>>>> The Opaque1 nodes are the ones created by PhaseIdealLoop::insert_pre_post_loops() (loop limit checks). They are not in the Compile::_predicate_opaqs list and so they are not removed by cleanup_loop_predicates().
>>>>>>
>>>>>> So how these Opaque1 nodes affects type range of Phi node in pre-loop? That is what I don't understand.
>>>>>
>>>>> (I know you don’t like when I remove the text from previous emails but that was really getting confusing)
>>>>
>>>> At least leave webrev link so I don't need to search for it in previous mails.
>>>>
>>>>>
>>>>> PhiNode::Value() has this code:
>>>>>
>>>>> // Check for trip-counted loop. If so, be smarter.
>>>>> CountedLoopNode *l = r->is_CountedLoop() ? r->as_CountedLoop() : NULL;
>>>>> if( l && l->can_be_counted_loop(phase) &&
>>>>> ((const Node*)l->phi() == this) ) { // Trip counted loop!
>>>>> // protect against init_trip() or limit() returning NULL
>>>>> const Node *init = l->init_trip();
>>>>> const Node *limit = l->limit();
>>>>> if( init != NULL && limit != NULL && l->stride_is_con() ) {
>>>>> const TypeInt *lo = init ->bottom_type()->isa_int();
>>>>> const TypeInt *hi = limit->bottom_type()->isa_int();
>>>>> if( lo && hi ) { // Dying loops might have TOP here
>>>>> int stride = l->stride_con();
>>>>> if( stride < 0 ) { // Down-counter loop
>>>>> const TypeInt *tmp = lo; lo = hi; hi = tmp;
>>>>> stride = -stride;
>>>>> }
>>>>> if( lo->_hi < hi->_lo ) // Reversed endpoints are well defined :-(
>>>>> return TypeInt::make(lo->_lo,hi->_hi,3);
>>>>> }
>>>>> }
>>>>> }
>>>>>
>>>>> That code can only return something for the induction variable Phi once the opaque node for the loop limit check is removed. That’s why when the opaque node is removed I enqueue the induction variable Phi.
>>>>
>>>> My mistake was that I thought you are looking on Opaque1 node generated for main-loop guard:
>>>>
>>>> // Step B2: Build a zero-trip guard for the main-loop. After leaving the
>>>> // pre-loop, the main-loop may not execute at all. Later in life this
>>>> // zero-trip guard will become the minimum-trip guard when we unroll
>>>> // the main-loop.
>>>> Node *min_opaq = new Opaque1Node(C, limit);
>>>> Node *min_cmp = new CmpINode( pre_incr, min_opaq );
>>>>
>>>>
>>>> But you are talking about pre-loop exit check:
>>>>
>>>> // Step B4: Shorten the pre-loop to run only 1 iteration (for now).
>>>> // RCE and alignment may change this later.
>>>> Node *cmp_end = pre_end->cmp_node();
>>>> assert( cmp_end->in(2) == limit, "" );
>>>> Node *pre_limit = new AddINode( init, stride );
>>>>
>>>> // Save the original loop limit in this Opaque1 node for
>>>> // use by range check elimination.
>>>> Node *pre_opaq = new Opaque1Node(C, pre_limit, limit);
>>>>
>>>>
>>>> But in this case you can add check (cl->limit() == this) to be clear what you are looking for.
>>>> Also instead of looking up through AddI you can look down for CountedLoopEnd and get cle->phi() and cle->limit() from it.
>>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>>>
>>>>> In the case of this test case, the pre loop iterates from 0 as long as i > -1. So the Phi for the pre loop has values within [-1, 0]. The main loop is guarded by a test that checks whether The value from the Phi - 1 is greater than 0. With a correct range of values for the Phi from the code above, that checks is statically false and the main loop is optimized away. Otherwise it’s not.
>>>>>
>>>>> Roland.
>>>
>
More information about the hotspot-compiler-dev
mailing list