review (M) for 4809552: Optimize Arrays.fill(...)

Fri Aug 20 11:47:09 PDT 2010

On Aug 19, 2010, at 7:46 PM, Vladimir Kozlov wrote:

> Tom,
> 
> First, I would not call these changes Medium. They are Large at least.
> 
> Should we allow OptimizeFill only when UseLoopPredicate is true?

I could guard it.

> 
> loopTransform.cpp:
> 
> In match_fill_loop() should we exclude StoreCMNode also?

Probably.

> 
> RAW store check is hidden in as_AddP()->unpack_offsets(). Should we do it explicitly?

It gets filtered out currently by the StoreP check but once I add oop support I'll filter it out explicitly.

> 
> store and store_value is not set for "copy candidate":
> 
> +         if (value->is_Load() && lpt->_body.contains(value)) {
> +           // tty->print_cr("possible copy candidate");
> +         } else {
> +           msg  = "variant store value";
> +         }

I deleted the copy candidate logic.  We may want to expand the fill match to match arraycopy idioms which was part of what I was looking for.  It didn't seem that common though.

> 
> Why you assume that on 'else' it is mem_phi?:
> 
> +       if (n == head->phi()) {
> +         // ok
> +       } else {
> +         // mem_phi
> +       }

I test explicitly for that case now and bailout if I see another phi.

> 
> Should we also skip proj node (ifFalse) or it is not part of loop body?
> 
> +     } else if (n->is_CountedLoopEnd()) {
> +       // ok so skip it.

I should probably test for it.  I really should be more careful in this loop.  I'm going to extend the checks in this loop to make sure I'm really looking at everything.

> +           msg = "node used outside loop";
>                       ^ is

ok.

> 
> How you translate next assert message?:
> +     assert(store_value->is_Load(), "shouldn't only happen for this case");

this is gone now.

> 
> the next dump should be under flag and 'msg' should reflect "possible copy" or set msg_node:
> 
> + #ifdef ASSERT
> +     tty->print_cr("possible copy");
> +     store_value->dump();
> +     store->dump();
> + #endif
> +     msg = "variant store in loop";

This is gone too.

> For Op_LShiftX there is no check (n->in(1) == head->phi()):
> 
> +       } else if (n->Opcode() == Op_LShiftX) {
> +         shift = n;
> +         assert(type2aelembytes(store->as_Mem()->memory_type(), true) == 1 << shift->in(2)->get_int(), "scale should match");

I've added a check.

> s_offs already includes base_offset, see GraphKit::array_element_address():
> +     aligned = ((arrayOopDesc::base_offset_in_bytes(t) + s_offs * element_size) % HeapWordSize == 0);
> Also the above expression is wrong if initial index != 0.
> And actually you don't need to calculate it in match_fill_loop() since
> it is used only in call to StubRoutines::select_fill_function() to verify
> that element type is supported.

True.  I'll drop that copy.

> In intrinsify_fill() initial index value is taking into account for aligned
> but base_offset_in_bytes could be already part of offset and you need
> to multiply by element_size only initial index:
> 
> +   if (offset != NULL && head->init_trip()->is_Con()) {
> +     intptr_t offs = offset->find_intptr_t_type()->get_con() + head->init_trip()->get_int();
> +     int element_size = type2aelembytes(t);
> +     aligned = ((arrayOopDesc::base_offset_in_bytes(t) + offs * element_size) % HeapWordSize == 0);
> +   }

Actually I don't think I should be including arrayOopDesc::base_offset_in_bytes(t) since it should be included in offset already.  It really should be:

    aligned = (offset->find_intptr_t_type()->get_con() + head->init_trip()->get_int() * element_size) % HeapWordSize == 0);

> 
> stubRoutines.cpp:
> why you have specialized copies for testing _jint_fill and _jbyte_fill. Is not it covered by TEST_FILL already?

That was leftover debug code.  I've deleted the extra copy.

> 
> stubGenerator_sparc.cpp:
> +   //  Generate stub for disjoint short fill.  If "aligned" is true, the
>      ^ Generate stub for array fill.
> 
> +   //      from:  O0
>            ^ to
> +   //      to:    O1
>            ^ value

ok.

> 
> O5 is not used and not input argument:
> +     const Register offset    = O5;   // offset from start of arrays

ok.  I also corrected the comment below it to indicate that on O3 is used as a temp.

> 
> stubs are generated only for byte,short and int, so allowing bollean, char and float is wrong:
> +     switch (t) {
> +       case T_BOOLEAN:
> +       case T_BYTE:
> +         shift = 2;
> +         break;
> +       case T_CHAR:
> +       case T_SHORT:
> +         shift = 1;
> +         break;
> +       case T_FLOAT:
> +       case T_INT:
> +         shift = 0;
> +         break;
> +       default: ShouldNotReachHere();
> +     }
> 
> The same in assembler_x86.cpp

Yep.

> 
> In stubGenerator_x86_64.cpp
> new fill_32_bytes_forward() is not used.

I didn't even notice that was there.  Deleted.

> 
> Remove commented code for T_LONG in both stubGenerator_x86_??.cpp

yep.

> 
> I did not look on assembler. May be tomorrow.

Ok.  The sparc version of the code is pretty much an exact duplicate structurally of the x86 code.  I just realized I need to do a little delay slot filling on sparc.  I'll fix the rest of this stuff and send out a new webrev.

tom

> 
> Thanks,
> Vladimir
> 
> 
> Tom Rodriguez wrote:
>> 4809552: Optimize Arrays.fill(...)
>> Reviewed-by:
>> This adds new logic to recognize fill idioms and convert them into a
>> call to an optimized fill routine.  Loop predication creates easily
>> matched loops that are simply replaced with calls to the new assembly
>> stubs.  Currently only 1,2 and 4 byte primitive types are supported.
>> Objects and longs/double will be supported in a later putback.  Tested
>> with runthese, nsk and ctw plus jbb2005.
>> http://cr.openjdk.java.net/~never/4809552