review (M) for 4809552: Optimize Arrays.fill(...)
Tom Rodriguez
tom.rodriguez at oracle.com
Fri Aug 20 11:26:33 PDT 2010
On Aug 20, 2010, at 10:57 AM, Vladimir Kozlov wrote:
> Actually I think it does not make sense to use movdqu()
> since it still slower on cache line boundary.
Agreed.
> Align to 8 bytes always and use only movdqa().
You mean 16 byte?
tom
>
> Vladimir
>
> Vladimir Kozlov wrote:
>> Assembler part review.
>> In stubGenerator_sparc.cpp
>> Move next lines above 64 bit value construction:
>> + __ cmp(count, 2<<shift); // Short arrays (< 8 bytes) copy by element
>> + __ brx(Assembler::lessUnsigned, false, Assembler::pn, L_copy_4_bytes); // use unsigned cmp
>> + __ delayed()->nop();
>> In assembler_x86.cpp
>> You don't need next:
>> + jmpb(L_copy_4_bytes); // all dwords were copied
>> Use next movdqa since you aligned address:
>> + movdqa(Address(to, 0), xtmp);
>> + movdqa(Address(to, 16), xtmp);
>> instead of
>> + movq(Address(to, 0), xtmp);
>> + movq(Address(to, 8), xtmp);
>> + movq(Address(to, 16), xtmp);
>> + movq(Address(to, 24), xtmp);
>> Vladimir
>> Vladimir Kozlov wrote:
>>> Tom,
>>>
>>> First, I would not call these changes Medium. They are Large at least.
>>>
>>> Should we allow OptimizeFill only when UseLoopPredicate is true?
>>>
>>> loopTransform.cpp:
>>>
>>> In match_fill_loop() should we exclude StoreCMNode also?
>>>
>>> RAW store check is hidden in as_AddP()->unpack_offsets(). Should we do it explicitly?
>>>
>>> store and store_value is not set for "copy candidate":
>>>
>>> + if (value->is_Load() && lpt->_body.contains(value)) {
>>> + // tty->print_cr("possible copy candidate");
>>> + } else {
>>> + msg = "variant store value";
>>> + }
>>>
>>> Why you assume that on 'else' it is mem_phi?:
>>>
>>> + if (n == head->phi()) {
>>> + // ok
>>> + } else {
>>> + // mem_phi
>>> + }
>>>
>>> Should we also skip proj node (ifFalse) or it is not part of loop body?
>>>
>>> + } else if (n->is_CountedLoopEnd()) {
>>> + // ok so skip it.
>>>
>>> + msg = "node used outside loop";
>>> ^ is
>>>
>>> How you translate next assert message?:
>>> + assert(store_value->is_Load(), "shouldn't only happen for this case");
>>>
>>> the next dump should be under flag and 'msg' should reflect "possible copy" or set msg_node:
>>>
>>> + #ifdef ASSERT
>>> + tty->print_cr("possible copy");
>>> + store_value->dump();
>>> + store->dump();
>>> + #endif
>>> + msg = "variant store in loop";
>>>
>>> For Op_LShiftX there is no check (n->in(1) == head->phi()):
>>>
>>> + } else if (n->Opcode() == Op_LShiftX) {
>>> + shift = n;
>>> + assert(type2aelembytes(store->as_Mem()->memory_type(), true) == 1 << shift->in(2)->get_int(), "scale should match");
>>>
>>>
>>> s_offs already includes base_offset, see GraphKit::array_element_address():
>>> + aligned = ((arrayOopDesc::base_offset_in_bytes(t) + s_offs * element_size) % HeapWordSize == 0);
>>> Also the above expression is wrong if initial index != 0.
>>> And actually you don't need to calculate it in match_fill_loop() since
>>> it is used only in call to StubRoutines::select_fill_function() to verify
>>> that element type is supported.
>>>
>>>
>>> In intrinsify_fill() initial index value is taking into account for aligned
>>> but base_offset_in_bytes could be already part of offset and you need
>>> to multiply by element_size only initial index:
>>>
>>> + if (offset != NULL && head->init_trip()->is_Con()) {
>>> + intptr_t offs = offset->find_intptr_t_type()->get_con() + head->init_trip()->get_int();
>>> + int element_size = type2aelembytes(t);
>>> + aligned = ((arrayOopDesc::base_offset_in_bytes(t) + offs * element_size) % HeapWordSize == 0);
>>> + }
>>>
>>> stubRoutines.cpp:
>>> why you have specialized copies for testing _jint_fill and _jbyte_fill. Is not it covered by TEST_FILL already?
>>>
>>> stubGenerator_sparc.cpp:
>>> + // Generate stub for disjoint short fill. If "aligned" is true, the
>>> ^ Generate stub for array fill.
>>>
>>> + // from: O0
>>> ^ to
>>> + // to: O1
>>> ^ value
>>>
>>> O5 is not used and not input argument:
>>> + const Register offset = O5; // offset from start of arrays
>>>
>>> stubs are generated only for byte,short and int, so allowing bollean, char and float is wrong:
>>> + switch (t) {
>>> + case T_BOOLEAN:
>>> + case T_BYTE:
>>> + shift = 2;
>>> + break;
>>> + case T_CHAR:
>>> + case T_SHORT:
>>> + shift = 1;
>>> + break;
>>> + case T_FLOAT:
>>> + case T_INT:
>>> + shift = 0;
>>> + break;
>>> + default: ShouldNotReachHere();
>>> + }
>>>
>>> The same in assembler_x86.cpp
>>>
>>> In stubGenerator_x86_64.cpp
>>> new fill_32_bytes_forward() is not used.
>>>
>>> Remove commented code for T_LONG in both stubGenerator_x86_??.cpp
>>>
>>> I did not look on assembler. May be tomorrow.
>>>
>>> Thanks,
>>> Vladimir
>>>
>>>
>>> Tom Rodriguez wrote:
>>>> 4809552: Optimize Arrays.fill(...)
>>>> Reviewed-by:
>>>>
>>>> This adds new logic to recognize fill idioms and convert them into a
>>>> call to an optimized fill routine. Loop predication creates easily
>>>> matched loops that are simply replaced with calls to the new assembly
>>>> stubs. Currently only 1,2 and 4 byte primitive types are supported.
>>>> Objects and longs/double will be supported in a later putback. Tested
>>>> with runthese, nsk and ctw plus jbb2005.
>>>>
>>>> http://cr.openjdk.java.net/~never/4809552
More information about the hotspot-compiler-dev
mailing list