RFR 8249893: AARCH64: optimize the construction of the value from the bits of the other two

Sat Aug 29 15:39:02 UTC 2020

Hi Andrew,

Thank you once again.

Could you please look at my update? I have added a functional test to
demonstrate which cases are covered by the change, and made a small
update (the OrI case in is_bitrange_zero) that adds the missing
transformation for the java.awt.Color case:

http://cr.openjdk.java.net/~bulasevich/8249893/webrev.02
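
To make the idea behind is_bitrange_zero() and the added OrI case concrete, here is a minimal Java sketch of a conservative "which bits may be nonzero" analysis. It is hypothetical: Expr, Var, And, Shl, Or and disjoint() are illustrative names, values are modeled as 64-bit longs, and it is not how C2 walks Ideal nodes. It only assumes that an OR whose operands occupy non-overlapping bit ranges is what the change turns into BFI.

```java
// Hypothetical sketch, not the C2 implementation: a conservative
// "bits that may be nonzero" analysis over a tiny expression tree.
class BitRangeSketch {
    interface Expr {
        long mayBeNonZero(); // conservative mask of bits that can be 1
    }

    // An unknown input: any bit may be set.
    record Var(String name) implements Expr {
        public long mayBeNonZero() { return -1L; }
    }

    // e & constant: only bits set in both can survive.
    record And(Expr e, long mask) implements Expr {
        public long mayBeNonZero() { return e.mayBeNonZero() & mask; }
    }

    // e << constant: the possible bits move up by the shift amount.
    record Shl(Expr e, int shift) implements Expr {
        public long mayBeNonZero() { return e.mayBeNonZero() << shift; }
    }

    // l | r: a bit may be set if it may be set in either operand. Handling
    // this case lets a chain of ORs (as in ARGB packing) be analyzed.
    record Or(Expr l, Expr r) implements Expr {
        public long mayBeNonZero() { return l.mayBeNonZero() | r.mayBeNonZero(); }
    }

    // If no bit can be set in both operands, l | r only fills disjoint bit
    // ranges and is a candidate for a bit-field insert.
    static boolean disjoint(Expr l, Expr r) {
        return (l.mayBeNonZero() & r.mayBeNonZero()) == 0;
    }
}
```

For the Color pattern, the OR of the three upper fields may only have bits in 0xFFFFFF00, which does not overlap the 0xFF of (b & 0xFF); without the Or case the left operand would have to be treated as unknown.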

The test shows a successful transformation for the typical int/long value
construction cases I found in the JDK Java sources:
((a & 0xFF) << 24) | ((r & 0xFF) << 16) | ((g & 0xFF) << 8) | (b & 0xFF)
(high << 32) | (low & 0xffffffffL)
Was there anything else among your test cases?
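
The two patterns above can be written as plain Java, next to an equivalent formulation as one AND followed by bit-field inserts. This is a sketch, not the code C2 emits: bfi() is a hypothetical helper that models the AArch64 BFI instruction (copy the low width bits of the source into the destination at bit lsb, keep every other destination bit), and the insert order shown is only one possible mapping.

```java
// Sketch: the value-construction patterns above, plus an equivalent
// formulation built from one AND and bit-field inserts.
class ValueConstruction {
    // Hypothetical model of AArch64 "BFI Xd, Xn, #lsb, #width": the low
    // `width` bits of src are copied into dst at bit `lsb`; every other bit
    // of dst is left unchanged. Assumes 1 <= width and lsb + width <= 64.
    static long bfi(long dst, long src, int lsb, int width) {
        long field = (width == 64) ? -1L : (1L << width) - 1;
        return (dst & ~(field << lsb)) | ((src & field) << lsb);
    }

    // ARGB packing as in java.awt.Color.
    static int packArgb(int a, int r, int g, int b) {
        return ((a & 0xFF) << 24) | ((r & 0xFF) << 16) | ((g & 0xFF) << 8) | (b & 0xFF);
    }

    // Same value: mask the lowest field, then insert the other three bytes.
    static int packArgbBfi(int a, int r, int g, int b) {
        long v = b & 0xFF;
        v = bfi(v, g, 8, 8);
        v = bfi(v, r, 16, 8);
        v = bfi(v, a, 24, 8);
        return (int) v; // keep the low 32 bits, like the int expression
    }

    // A long built from two 32-bit halves.
    static long makeLong(long high, long low) {
        return (high << 32) | (low & 0xffffffffL);
    }

    // Same value: mask the low half, then insert the low 32 bits of high at bit 32.
    static long makeLongBfi(long high, long low) {
        return bfi(low & 0xffffffffL, high, 32, 32);
    }
}
```

Because every field is masked before it is placed, the ORs never combine overlapping bits, which is what makes each step expressible as an insert.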

In the output for my test case SubTest0::tst2, I see that the BFI
transformation works, but for this particular case (compiled with
template=template1, where value1=value2) the result is no faster than
the default one.

(value1 & 0x1L) | ((value1 & 0x1L) << 3)
:
and  x11, x2, #0x1
orr  x11, x11, x11, lsl #3
->
and  x11, x2, #0x1
bfi  x11, x2, #3, #1

I think that is OK: using bfi here does not reduce the number of
instructions. The same case with different inputs (template=template2)
is better:

(value1 & 0x1L) | ((valueC & 0x1L) << 3)
:
and  x18, x10, #0x1
and  x10, x1, #0x1
orr  x10, x10, x18, lsl #3
->
and  x11, x3, #0x1
bfi  x11, x18, #3, #1
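
The two templates can be modeled instruction by instruction to see why only the second one gains. A sketch under stated assumptions: bfi() is a hypothetical model of AArch64 BFI, the method names are illustrative, and the template2 shift is taken as 3 to match the lsl #3 / bfi #3, #1 in the listings. In template1 both sequences are one AND plus one more instruction; in template2 the ORR form needs a second AND for the other input, which BFI absorbs because it takes only the low width bits of its source.

```java
// Sketch: the template1/template2 expressions and the instruction
// sequences from the listings, modeled step by step on 64-bit values.
class BfiTemplates {
    // Hypothetical model of AArch64 "BFI Xd, Xn, #lsb, #width".
    static long bfi(long dst, long src, int lsb, int width) {
        long field = (width == 64) ? -1L : (1L << width) - 1;
        return (dst & ~(field << lsb)) | ((src & field) << lsb);
    }

    // template1 (value1 == value2): (v & 0x1L) | ((v & 0x1L) << 3)
    static long template1(long v) {
        return (v & 0x1L) | ((v & 0x1L) << 3);
    }

    // Old: and x11, x2, #0x1 ; orr x11, x11, x11, lsl #3  (2 instructions)
    static long template1Orr(long x2) {
        long x11 = x2 & 0x1L;
        return x11 | (x11 << 3);
    }

    // New: and x11, x2, #0x1 ; bfi x11, x2, #3, #1  (still 2 instructions)
    static long template1Bfi(long x2) {
        long x11 = x2 & 0x1L;
        return bfi(x11, x2, 3, 1);
    }

    // template2: distinct inputs, shift of 3 as in the listing.
    static long template2(long value1, long valueC) {
        return (value1 & 0x1L) | ((valueC & 0x1L) << 3);
    }

    // Old: and ; and ; orr ..., lsl #3  (3 instructions: both inputs masked)
    static long template2Orr(long value1, long valueC) {
        long maskedC = valueC & 0x1L;
        long masked1 = value1 & 0x1L;
        return masked1 | (maskedC << 3);
    }

    // New: and ; bfi  (2 instructions: BFI reads only the low bit of valueC)
    static long template2Bfi(long value1, long valueC) {
        long masked1 = value1 & 0x1L;
        return bfi(masked1, valueC, 3, 1);
    }
}
```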

Do you think the TestBFI test cases are OK, or should I implement more
checks? IMO the "a << 24 >>> 24" case should be implemented as an
LShiftI::Ideal transformation, which should be done separately.
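
The arithmetic behind the "a << 24 >>> 24" case can be checked in plain Java. Java parses a << 24 >>> 24 as (a << 24) >>> 24, since the shift operators share one precedence level and associate left. For 0 <= k <= 31, (a << k) >>> k keeps exactly the low 32 - k bits of an int, so it equals a & (-1 >>> k); for k = 24 the mask is 0xFF. The class and method names are illustrative only, and this says nothing about which C2 node should perform the rewrite.

```java
// Sketch: "left shift + unsigned right shift" by the same amount is the
// same as masking off the high k bits of an int.
class ShiftMaskIdentity {
    // The form found in source code, e.g. a << 24 >>> 24.
    static int viaShifts(int a, int k) {
        return (a << k) >>> k;
    }

    // The canonical mask form: keep the low 32 - k bits.
    static int viaMask(int a, int k) {
        return a & (-1 >>> k);
    }
}
```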

thanks,
Boris

On 26.08.2020 17:21, Andrew Haley wrote:
> On 25/08/2020 18:30, Boris Ulasevich wrote:
>> I believe masking with a left shift and a right shift is not common.
>> A search through the jdk repository does not find such patterns, while
>> there are hundreds of mask+lshift expressions.
>
>> I implemented a simple is_bitrange_zero() method for computing the
>> bit ranges of sub-expressions; it handles power-of-two masks and left
>> shifts only. We could take more cases into account (careful testing is
>> the main concern). But as for the "r.a << 24 >>> 24" expression in
>> particular, I think it is worth thinking about canonicalization: "left
>> shift + right shift" to "mask + left shift" (or maybe the other way round).
> I'm running your test program, and for example I get this, old on the
> left, new on the right.
>
> Compiled method (c2)   11832 1113             SubTest0::tst2 (184 bytes)
>
>    : and       x11, x2, #0x1   ;*land                            :   and     x11, x2, #0x1
>    : and       x10, x1, #0x1   ;*land                            :   and     x10, x1, #0x1
>    : orr       x11, x11, x11, lsl #3                             :   bfi     x11, x2, #3, #1
>    : orr       x10, x10, x10, lsl #3                             :   bfi     x10, x1, #3, #1
>    : and       xmethod, x3, #0x1  ;*land                         :   and     xmethod, x3, #0x1
>    : add       x10, x10, x11                                     :   bfi     xmethod, x3, #3, #1
>    : orr       x11, xmethod, xmethod, lsl #3                     :   add     x10, x10, x11
>    : and       xmethod, x4, #0x1  ;*land                         :   and     x11, x4, #0x1
>    : add       x10, x11, x10                                     :   bfi     x11, x4, #3, #1
>    : orr       x11, xmethod, xmethod, lsl #3                     :   add     x10, x10, xmethod
>    : and       xmethod, x5, #0x1  ;*land                         :   and     xmethod, x5, #0x1
>    : add       x10, x11, x10                                     :   bfi     xmethod, x5, #3, #1
>    : orr       x11, xmethod, xmethod, lsl #3                     :   add     x10, x10, x11
>    : and       xmethod, x6, #0x1  ;*land                         :   and     x11, x6, #0x1
>    : add       x10, x11, x10                                     :   bfi     x11, x6, #3, #1
>    : orr       x11, xmethod, xmethod, lsl #3                     :   add     x10, x10, xmethod
>    : and       xmethod, x7, #0x1  ;*land                         :   and     xmethod, x7, #0x1
>    : add       x10, x11, x10                                     :   bfi     xmethod, x7, #3, #1
>    : orr       x11, xmethod, xmethod, lsl #3                     :   add     x10, x10, x11
>    : and       xmethod, x0, #0x1  ;*land                         :   add     x10, x10, xmethod
>    : add       x10, x11, x10                                     :   ldr     x13, [sp,#32]
>    : orr       x11, xmethod, xmethod, lsl #3                     :   and     x11, x0, #0x1
>    : ldr       xmethod, [sp,#32]                                 :   and     xmethod, x13, #0x1
>    : and       xmethod, xmethod, #0x1                            :   bfi     x11, x0, #3, #1
>    : add       x10, x11, x10                                     :   bfi     xmethod, x13, #3, #1
>    : orr       x11, xmethod, xmethod, lsl #3                     :   add     x10, x10, x11
>    : ldr       xmethod, [sp,#40]                                 :   ldr     x13, [sp,#40]
>    : and       xmethod, xmethod, #0x1                            :   and     x11, x13, #0x1
>    : add       x10, x11, x10                                     :   bfi     x11, x13, #3, #1
>    : orr       x11, xmethod, xmethod, lsl #3                     :   add     x10, x10, xmethod
>    : ldr       xmethod, [sp,#48]                                 :   ldr     x13, [sp,#48]
>    : and       xmethod, xmethod, #0x1                            :   and     xmethod, x13, #0x1
>    : add       x10, x11, x10                                     :   bfi     xmethod, x13, #3, #1
>    : orr       x11, xmethod, xmethod, lsl #3                     :   add     x10, x10, x11
>    : ldr       xmethod, [sp,#56]                                 :   ldr     x13, [sp,#56]
>    : and       xmethod, xmethod, #0x1                            :   and     x11, x13, #0x1
>    : add       x10, x11, x10                                     :   bfi     x11, x13, #3, #1
>    : orr       x11, xmethod, xmethod, lsl #3                     :   add     x10, x10, xmethod
>    : add       x0, x11, x10    ;*ladd                            :   add     x0, x10, x11
>
> I've also tried a bunch of different test cases doing operations that
> could match BFI instructions, and only in a few of them is BFI actually
> used. In almost all cases, then, this change does not help, *even in
> your own test case*.
>
> I think that you've got something that is potentially useful, but it
> needs some careful analysis to make sure it actually gets used.
>