RFR 8249893: AARCH64: optimize the construction of the value from the bits of the other two
Boris Ulasevich
boris.ulasevich at bell-sw.com
Sat Aug 29 15:39:02 UTC 2020
Hi Andrew,
Thank you once again.
Could you please look at my update? I have added a functional test to
demonstrate which cases the change covers, and made a small update
(an OrI case in is_bitrange_zero) to add the transformation that was
missing for the java.awt.Color case:
http://cr.openjdk.java.net/~bulasevich/8249893/webrev.02
The test shows that the transformation succeeds for the typical int/long
value construction patterns I found in the JDK Java sources:
((a & 0xFF) << 24) | ((r & 0xFF) << 16) | ((g & 0xFF) << 8) | (b & 0xFF)
(high << 32) | (low & 0xffffffffL)
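
For reference, the two patterns above can be written out as plain Java
methods. This is only an illustrative sketch: the class and method names
are hypothetical and are not taken from the webrev or from TestBFI. In
both expressions the OR operands occupy disjoint bit ranges, which is the
property that makes a bit-field insert a candidate replacement.

```java
class PackExamples {
    // Packs four 8-bit channels into one int (the java.awt.Color-style pattern).
    static int packArgb(int a, int r, int g, int b) {
        return ((a & 0xFF) << 24) | ((r & 0xFF) << 16) | ((g & 0xFF) << 8) | (b & 0xFF);
    }

    // Builds a long from a high half and a low 32-bit half.
    static long packLong(long high, long low) {
        return (high << 32) | (low & 0xffffffffL);
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(packArgb(0x12, 0x34, 0x56, 0x78)));
        System.out.println(Long.toHexString(packLong(0x1L, 0xfffffffffL)));
    }
}
```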
Was there anything else among your test cases?
In the SubTest0::tst2 output from my test case I see that the BFI
transformation works, but for this particular case (compiled with
template=template1, where value1=value2) the result is no faster than
the default one:
(value1 & 0x1L) | ((value1 & 0x1L) << 3)
:
and x11, x2, #0x1
orr x11, x11, x11, lsl #3
->
and x11, x2, #0x1
bfi x11, x2, #3, #1
I think that is OK: using bfi here does not reduce the number of
instructions. The same case with different inputs
(template=template2) is better:
(value1 & 0x1L) | ((valueC & 0x1L) << 1)
:
and x18, x10, #0x1
and x10, x1, #0x1
orr x10, x10, x18, lsl #3
->
and x11, x3, #0x1
bfi x11, x18, #3, #1
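
To make the equivalence behind these listings concrete, here is a small
software model in Java. It is a sketch under stated assumptions: the bfi()
helper is a hypothetical model of the AArch64 instruction (insert the low
`width` bits of the source at bit `lsb` of the destination, leaving the
other destination bits unchanged), not a HotSpot API, and the class and
method names are made up for illustration. The shift amount 3 matches the
assembly shown above.

```java
class BfiModel {
    // Hypothetical software model of "bfi xd, xn, #lsb, #width" on 64-bit registers.
    // Assumes 0 <= lsb < 64 and 1 <= width <= 64 - lsb.
    static long bfi(long dst, long src, int lsb, int width) {
        long field = (width == 64) ? -1L : (1L << width) - 1;
        long mask = field << lsb;
        return (dst & ~mask) | ((src << lsb) & mask);
    }

    // The Java-level expression: two ANDs, a shift, and an OR.
    static long orForm(long value1, long value2) {
        return (value1 & 0x1L) | ((value2 & 0x1L) << 3);
    }

    // The same value computed as one AND followed by one bit-field insert.
    static long bfiForm(long value1, long value2) {
        long t = value1 & 0x1L;       // and  xd, x<value1>, #0x1
        return bfi(t, value2, 3, 1);  // bfi  xd, x<value2>, #3, #1
    }

    public static void main(String[] args) {
        long[] samples = {0L, 1L, 2L, 8L, 9L, -1L, Long.MIN_VALUE, 0x123456789abcdefL};
        for (long a : samples) {
            for (long b : samples) {
                if (orForm(a, b) != bfiForm(a, b)) {
                    throw new AssertionError("mismatch for " + a + ", " + b);
                }
            }
        }
        System.out.println("orForm and bfiForm agree on all samples");
    }
}
```

The insert is only safe because bit 3 of (value1 & 0x1L) is known to be
zero, so overwriting it loses nothing.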
Do you think the TestBFI test cases are OK, or should I implement more
checks? IMO the "a << 24 >>> 24" case should be implemented as an
LShiftI::Ideal transformation, which should be done separately.
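
As a reminder of why that case could be canonicalized to a mask, the
following Java sketch checks the arithmetic identity for int values. It
says nothing about how the Ideal transformation would be written in C2;
the class and method names are hypothetical.

```java
class ShiftPairVsMask {
    // Left shift followed by unsigned right shift by the same amount.
    static int shiftPair(int a, int n) {
        return a << n >>> n;
    }

    // Equivalent mask form: keep only the low (32 - n) bits.
    static int maskForm(int a, int n) {
        return a & (-1 >>> n);
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 0xFF, 0x100, 0x12345678, -1, Integer.MIN_VALUE};
        for (int a : samples) {
            for (int n = 0; n < 32; n++) {
                if (shiftPair(a, n) != maskForm(a, n)) {
                    throw new AssertionError("mismatch for a=" + a + ", n=" + n);
                }
            }
        }
        // For n = 24 the mask is 0xFF, i.e. "a << 24 >>> 24" equals "a & 0xFF".
        System.out.println(Integer.toHexString(shiftPair(0x12345678, 24)));
    }
}
```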
thanks,
Boris
On 26.08.2020 17:21, Andrew Haley wrote:
> On 25/08/2020 18:30, Boris Ulasevich wrote:
>> I believe masking with left shift and right shift is not common.
>> A search through the jdk repository does not find such patterns, while
>> there are hundreds of mask+lshift expressions.
>
>> I implemented a simple is_bitrange_zero() method for counting the
>> bitranges of sub-expressions: power-of-two masks and left shift only.
>> We can take more cases into account (careful testing is the main
>> concern). But for the "r.a << 24 >>> 24" expression in particular,
>> I think it is worth thinking about canonicalization: "left shift + right
>> shift" to "mask + left shift" (or maybe the reverse).
> I'm running your test program, and for example I get this, old on the
> left, new on the right.
>
> Compiled method (c2) 11832 1113 SubTest0::tst2 (184 bytes)
>
> : and x11, x2, #0x1 ;*land           : and x11, x2, #0x1
> : and x10, x1, #0x1 ;*land           : and x10, x1, #0x1
> : orr x11, x11, x11, lsl #3          : bfi x11, x2, #3, #1
> : orr x10, x10, x10, lsl #3          : bfi x10, x1, #3, #1
> : and xmethod, x3, #0x1 ;*land       : and xmethod, x3, #0x1
> : add x10, x10, x11                  : bfi xmethod, x3, #3, #1
> : orr x11, xmethod, xmethod, lsl #3  : add x10, x10, x11
> : and xmethod, x4, #0x1 ;*land       : and x11, x4, #0x1
> : add x10, x11, x10                  : bfi x11, x4, #3, #1
> : orr x11, xmethod, xmethod, lsl #3  : add x10, x10, xmethod
> : and xmethod, x5, #0x1 ;*land       : and xmethod, x5, #0x1
> : add x10, x11, x10                  : bfi xmethod, x5, #3, #1
> : orr x11, xmethod, xmethod, lsl #3  : add x10, x10, x11
> : and xmethod, x6, #0x1 ;*land       : and x11, x6, #0x1
> : add x10, x11, x10                  : bfi x11, x6, #3, #1
> : orr x11, xmethod, xmethod, lsl #3  : add x10, x10, xmethod
> : and xmethod, x7, #0x1 ;*land       : and xmethod, x7, #0x1
> : add x10, x11, x10                  : bfi xmethod, x7, #3, #1
> : orr x11, xmethod, xmethod, lsl #3  : add x10, x10, x11
> : and xmethod, x0, #0x1 ;*land       : add x10, x10, xmethod
> : add x10, x11, x10                  : ldr x13, [sp,#32]
> : orr x11, xmethod, xmethod, lsl #3  : and x11, x0, #0x1
> : ldr xmethod, [sp,#32]              : and xmethod, x13, #0x1
> : and xmethod, xmethod, #0x1         : bfi x11, x0, #3, #1
> : add x10, x11, x10                  : bfi xmethod, x13, #3, #1
> : orr x11, xmethod, xmethod, lsl #3  : add x10, x10, x11
> : ldr xmethod, [sp,#40]              : ldr x13, [sp,#40]
> : and xmethod, xmethod, #0x1         : and x11, x13, #0x1
> : add x10, x11, x10                  : bfi x11, x13, #3, #1
> : orr x11, xmethod, xmethod, lsl #3  : add x10, x10, xmethod
> : ldr xmethod, [sp,#48]              : ldr x13, [sp,#48]
> : and xmethod, xmethod, #0x1         : and xmethod, x13, #0x1
> : add x10, x11, x10                  : bfi xmethod, x13, #3, #1
> : orr x11, xmethod, xmethod, lsl #3  : add x10, x10, x11
> : ldr xmethod, [sp,#56]              : ldr x13, [sp,#56]
> : and xmethod, xmethod, #0x1         : and x11, x13, #0x1
> : add x10, x11, x10                  : bfi x11, x13, #3, #1
> : orr x11, xmethod, xmethod, lsl #3  : add x10, x10, xmethod
> : add x0, x11, x10 ;*ladd            : add x0, x10, x11
>
> I've also tried a bunch of different test cases doing operations that
> could match BFI instructions, and in only a few of them does it
> happen. In almost all cases, then, this change does not help, *even
> your own test case*.
>
> I think that you've got something that is potentially useful, but it
> needs some careful analysis to make sure it actually gets used.
>