RFR 8249893: AARCH64: optimize the construction of the value from the bits of the other two
Boris Ulasevich
boris.ulasevich at bell-sw.com
Tue Aug 25 09:47:11 UTC 2020
On 25.08.2020 12:17, Andrew Haley wrote:
> On 25/08/2020 09:57, Boris Ulasevich wrote:
>> Hi,
>>
>> On 25.08.2020 11:10, Andrew Haley wrote:
>>> Hi,
>>>
>>> On 23/08/2020 19:20, Boris Ulasevich wrote:
>>> >
>>> > Please review the updated change to C2 and AArch64 which introduces
>>> > a new BitfieldInsert node to replace the Or+Shift+And sequence when
>>> > possible.
>>> > A single BFI instruction is emitted for the new node.
>>> >
>>> > With the current change all the transformation logic is moved out of
>>> > the aarch64.ad file into the common C2 code.
>>> >
>>> > http://bugs.openjdk.java.net/browse/JDK-8249893
>>> > http://cr.openjdk.java.net/~bulasevich/8249893/webrev.01
>>> >
>>> > The change in compiler.cpp was done to implicitly ask IGVN to run
>>> > the idealization once again after the loop optimization phase.
>>> > This extra step is necessary to make the BFI transform happen
>>> > only after loop optimization.
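For reference, the transform relies on the AArch64 BFI semantics: BFI copies the low `width` bits of the source into the destination starting at bit `lsb` and leaves the other destination bits unchanged. The Java sketch below is only an illustrative model (the class and method names are hypothetical, not HotSpot code, and only the 32-bit case is modelled). It checks that the Or+Shift+And shape from the bfm benchmark equals an AND followed by a single BFI. The mapping of w10 to r.a and x12 to r.b in the comments is inferred from the listings, not stated by the compiler.

```java
// Illustrative model of AArch64 BFI on 32-bit values; not HotSpot code.
public final class BfiModel {
    // BFI dst, src, #lsb, #width: copy the low `width` bits of src into dst
    // starting at bit lsb; all other bits of dst are preserved.
    // Assumes 0 < width and lsb + width <= 32.
    public static int bfi(int dst, int src, int lsb, int width) {
        int mask = (width == 32) ? -1 : (1 << width) - 1;
        return (dst & ~(mask << lsb)) | ((src & mask) << lsb);
    }

    // The Or+Shift+And shape from the bfm benchmark.
    public static int orShiftAnd(int a, int b) {
        return (a & 0xFF) | ((b & 0xFF) << 8);
    }

    // Same value as AND then BFI: the AND zeroes bits 8..31 of the
    // destination, then BFI drops the low 8 bits of b into bits 8..15.
    public static int andThenBfi(int a, int b) {
        int dst = a & 0xFF;          // mirrors: and w2, w10, #0xff
        return bfi(dst, b, 8, 8);    // mirrors: bfi x2, x12, #8, #8 (low 32 bits)
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 0x12345678, 0xCAFEBABE,
                         Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int a : samples) {
            for (int b : samples) {
                if (orShiftAnd(a, b) != andThenBfi(a, b)) {
                    throw new AssertionError("mismatch for " + a + ", " + b);
                }
            }
        }
        System.out.println("Or+Shift+And equals AND+BFI on all samples");
    }
}
```

Two instructions (AND, BFI) instead of three is the saving the new node is meant to deliver for this pattern.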
>>>
>>> So here's a strange thing. When I run a simple JMH test
>>>
>>> @State(Scope.Benchmark)
>>> public static class Result {
>>>     public int a, b;
>>>     public long x;
>>> }
>>>
>>> @Benchmark
>>> public static int bfm(Result r) {
>>>     return (r.a & 0xFF) | ((r.b & 0xFF) << 8);
>>> }
>>>
>>> I get
>>>
>>> 0x0000ffff84644df0: ubfiz w12, w11, #8, #8
>>> 0x0000ffff84644df4: and   w10, w10, #0xff
>>> 0x0000ffff84644df8: orr   w2, w10, w12    ;*ior {reexecute=0 rethrow=0 return_oop=0}
>>>                                            ; - org.openjdk.Rotates::bfm@19 (line 22)
>>>                                            ; - org.openjdk.generated.Rotates_bfm_jmhTest::bfm_avgt_jmhStub@20 (line 199)
>>>
>>> instead of
>>>
>>> 0x0000ffff808554b4: and w10, w10, #0xff
>>> 0x0000ffff808554b8: and w12, w12, #0xff
>>> 0x0000ffff808554bc: orr w2, w12, w10, lsl #8  ;*ior
>>>                                                ; - org.openjdk.Rotates::bfm@19 (line 22)
>>>                                                ; - org.openjdk.generated.Rotates_bfm_jmhTest::bfm_avgt_jmhStub@20 (line 199)
>>>
>>> Do you have any ideas why this might be? Thanks.
>>>
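Both listings compute the same value. A small Java model of the instructions involved makes that concrete and also shows that each listing is three instructions long, so neither is shorter than the other. It is a sketch only: 32-bit semantics, hypothetical class and method names, and the assignment of w10/w11/w12 to r.a and r.b is inferred from which operand gets shifted.

```java
// Illustrative 32-bit models of the instructions in the two listings.
// Which register holds r.a and which holds r.b is inferred from the shifts.
public final class TwoSequences {
    // UBFIZ wd, wn, #lsb, #width: the low `width` bits of wn shifted left by
    // lsb, every other bit zero. Assumes 0 < width and lsb + width <= 32.
    public static int ubfiz(int src, int lsb, int width) {
        int mask = (width == 32) ? -1 : (1 << width) - 1;
        return (src & mask) << lsb;
    }

    // First listing: ubfiz w12, w11, #8, #8; and w10, w10, #0xff; orr w2, w10, w12
    public static int ubfizAndOrr(int a, int b) {
        int w12 = ubfiz(b, 8, 8);
        int w10 = a & 0xFF;
        return w10 | w12;
    }

    // Second listing: and w10, w10, #0xff; and w12, w12, #0xff; orr w2, w12, w10, lsl #8
    public static int andAndOrrLsl(int a, int b) {
        int w10 = b & 0xFF;
        int w12 = a & 0xFF;
        return w12 | (w10 << 8);
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 0x12345678, 0x9ABCDEF0, Integer.MIN_VALUE};
        for (int a : samples) {
            for (int b : samples) {
                int expected = (a & 0xFF) | ((b & 0xFF) << 8);
                if (ubfizAndOrr(a, b) != expected || andAndOrrLsl(a, b) != expected) {
                    throw new AssertionError("mismatch for " + a + ", " + b);
                }
            }
        }
        System.out.println("both three-instruction listings agree");
    }
}
```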
>>
>> Both variants are correct, aren't they?
>
> Well, yes. But I thought that the idea was to generate fewer
> instructions.
>
>> I think the matcher preferred the UBFIZ rule to the OR rule because
>> ins_cost was set to 1.9 for OR:
>> https://hg.openjdk.java.net/jdk/jdk/file/92ddc6fe60eb/src/hotspot/cpu/aarch64/aarch64.ad#l12130
>>
>> https://hg.openjdk.java.net/jdk/jdk/file/92ddc6fe60eb/src/hotspot/cpu/aarch64/aarch64.ad#l11675
>>
>>
>> With my change it would work like this:
>>
>> 0x0000ffff7c587fe0: and w2, w10, #0xff
>> 0x0000ffff7c587fe8: bfi x2, x12, #8, #8
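The cost argument can be checked with simple arithmetic. The sketch below is a simplified additive model of two ways the matcher could cover Or(And(a, 0xFF), LShift(And(b, 0xFF), 8)). Only the 1.9 for the shifted OR comes from this thread (taken here in units of INSN_COST); the unit costs for AND, UBFIZ and plain ORR are assumptions, and C2's real matcher does a bottom-up tree cover over all rules rather than this hand-written sum. The class name is hypothetical.

```java
// Toy cover-cost comparison in units of INSN_COST. Only ORR_SHIFTED comes
// from the thread; the other costs are assumptions, not read from aarch64.ad.
public final class CoverCost {
    static final double AND_IMM = 1.0;      // assumed
    static final double UBFIZ = 1.0;        // assumed
    static final double ORR_REG = 1.0;      // assumed
    static final double ORR_SHIFTED = 1.9;  // as quoted in the thread

    // Cover 1: and (a), and (b), orr with lsl #8 folding the shift.
    public static double andAndOrrShifted() {
        return AND_IMM + AND_IMM + ORR_SHIFTED;
    }

    // Cover 2: and (a), ubfiz absorbing And(b) + LShift, plain orr.
    public static double andUbfizOrr() {
        return AND_IMM + UBFIZ + ORR_REG;
    }

    public static void main(String[] args) {
        System.out.printf("AND+AND+ORR(lsl): %.1f%n", andAndOrrShifted());
        System.out.printf("AND+UBFIZ+ORR:    %.1f%n", andUbfizOrr());
    }
}
```

Under these assumptions the UBFIZ cover is cheaper (3.0 versus 3.9), which is consistent with the first listing Andrew saw, even though both covers are three instructions.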
>
> But it didn't. I'm asking you why that is. The first code I showed you
> was the JMH test in http://cr.openjdk.java.net/~aph/scratch/. This was
> after I applied your patch.
Ok. Can you please check that my patch [1] has been applied
and built correctly. With my change I see this picture:
....[Hottest Region 2]...........................................
c2, level 4, org.openjdk.generated.Rotates_bfm_jmhTest::bfm_avgt_jmhStub,
          0x0000ffff84584dac: add x11, x14, #0x94
          0x0000ffff84584db0: stp x21, x19, [sp]
          0x0000ffff84584db4: stp x20, x14, [sp, #16]
          0x0000ffff84584db8: stp x15, x10, [sp, #32]
          0x0000ffff84584dbc: str x11, [sp, #48]
          0x0000ffff84584dc0: b 0x0000ffff84584dd8
          0x0000ffff84584dc4: nop
          0x0000ffff84584dc8: nop
          0x0000ffff84584dcc: nop
3.64%  ↗  0x0000ffff84584dd0: str x19, [sp, #16]
0.07%  │  0x0000ffff84584dd4: mov x16, x29
       │  0x0000ffff84584dd8: ldr w10, [x16, #12]  ;*invokestatic bfm
3.92%  │  0x0000ffff84584ddc: ldr w12, [x16, #24]
4.69%  │  0x0000ffff84584de0: and w2, w10, #0xff
0.03%  │  0x0000ffff84584de4: mov x29, x16
0.02%  │  0x0000ffff84584de8: bfi x2, x12, #8, #8  ;*ior {reexecute=0 rethrow=0 return_oop=0}
       │                                           ; - org.openjdk.Rotates::bfm@19 (line 23)
[1] http://cr.openjdk.java.net/~bulasevich/8249893/webrev.01/jdk-jdk.patch