[aarch64-port-dev ] RFR: aarch64: elide DecodeN when followed by CmpP 0

Thu Sep 24 09:14:21 UTC 2015

Thank you, Andrew

After this explanation I agree with original proposal to add 
cmpP_narrowOop_imm0_branch().

Vladimir

On 9/24/15 4:30 PM, Andrew Dinn wrote:
> Hi Vladimir,
>
> On 24/09/15 03:12, Vladimir Kozlov wrote:
>> . . .
>> What about Andrew Dinn response few days ago? He said next change in
>> aarch64.ad fixed the problem:
>>
>> bool Matcher::narrow_oop_use_complex_address() {
>>    assert(UseCompressedOops, "only for compressed oops code");
>>    return (LogMinObjAlignmentInBytes <= 3);
>> }
>> bool Matcher::narrow_klass_use_complex_address() {
>>    assert(UseCompressedOops, "only for compressed oops code");
>>    return (LogKlassAlignmentInBytes <= 3);
>> }
>>
>>  From what I see in aarch64.ad it support scaled memory which is
>> different from sparc:
>>
>> operand indIndexScaledOffsetI(iRegP reg, iRegL lreg, immIScale scale,
>> immIU12 off)
>> %{
>>    constraint(ALLOC_IN_RC(ptr_reg));
>>    match(AddP (AddP reg (LShiftL lreg scale)) off);
>>
>> Or I am missing something?
>
> Yes, I am afraid so. We do have that operand but it only exists at this
> abstract level. Due to the available addressing modes on AArch64 we
> encode operations using this type of memory operand as an insn pair: an
> lea (i.e. an add) followed by a ldrx with shift -- see helper function
> loadStore defined in aarch64.ad and used by all the aarch64_enc_ldrx
> encodings for details.
>
> As Ed pointed out to me (privately) that means that my redefinition of
> narrow_oop_use_complex_address makes some important test cases suffer
> the worst of all available outcomes. In those cases we lose the implicit
> null check (paying the price of an explicit compare and branch) while
> still having to plant an add+ldrx pair to do the load. This turns out to
> be too much of a performance hit for the redefinition to be acceptable.
>
> Meanwhile, if we keep the old definition then when we want to do an oop
> null test (as with the example Ed provided) we end up doing the
> unnecessary shift.
>
> So, I think the only way to steer between these two pessimal outcomes is
> to follow his suggested patch i.e. retain the old definition /and/
> include a rule which catches subgraphs in the format (If cmp (CmpP
> (DecodeN oop) zero)). I suppose we might also achieve this with a
> peephole optimization but Ed's solution seem sobvious.
>
> Also, in the interest of full disclosure I'll confess that it was not me
> who inserted the old definition. My original version (return false) was
> revised by Andrew Haley to return Universe::narrow_oop_shift == 0. I
> guess he already worked through all this and knew what he was doing :-)
>
> regards,
>
>
> Andrew Dinn
> -----------
>