RFR: JDK-8196064: AArch64: Merging ld/st into ldp/stp in macro-assembler
Zhongwei Yao
zhongwei.yao at linaro.org
Sat Feb 3 02:44:50 UTC 2018
Hi, Andrew,
Thanks for your review and feedback!
Patch is updated: http://cr.openjdk.java.net/~zyao/8196064/webrev.01/
It passes jtreg test.
On 1 February 2018 at 23:36, Andrew Haley <aph at redhat.com> wrote:
> On 30/01/18 03:30, Zhongwei Yao wrote:
>> Bug:
>> https://bugs.openjdk.java.net/browse/JDK-8196064
>>
>> Webrev:
>> http://cr.openjdk.java.net/~zyao/8196064/webrev.00
>>
>> This patch merges adjacent load/store into ldp/stp in macro-assembler
>> as discussed in previous thread:
>> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2017-December/027957.html
>>
>> My micro-benchmark case gives about 8% performance improvement with this patch.
>>
>> And this patch is based on commit be48be2 due to recent build failure
>> on jdk/hs master, and it passes all jtreg tests.
>>
>> Please help to review it.
>
> Very nice. You've done a good job. This merging opportunity happens
> a lot, and it reduces not only the size of compiled code but also the
> size of the interpreter and the generated stubs. It's definitely
> worth doing.
>
> before: ArrObj.testArrObjCopy avgt 20 3.075 ? 0.002 us/op
> after: ArrObj.testArrObjCopy avgt 20 2.779 ? 0.004 us/op
>
> So, some improvement. One thing I noticed:
>
> Compiled method (c2) 6812 627 4 org.linaro.benchmarks.ArrObj::testArrObjCopy (32 bytes)
>
> ;; B44: # B45 <- B43 Freq: 40.1247
>
> 0x000003ff94b0db50: ldr x17, [sp]
> 0x000003ff94b0db54: ldp x18, x1, [sp,#8]
> 0x000003ff94b0db58: ldp x10, x3, [sp,#24]
> 0x000003ff94b0db5c: ldp x4, x11, [sp,#40]
> 0x000003ff94b0db60: ldr x6, [sp,#56]
>
> It seems to me like there could be four ldp instructions here. There
> aren't (as far as I can see) because they're in an unfortunate order,
> so the MacroAssembler doesn't see the opportunity:
>
> ;; B44: # B45 <- B43 Freq: 39.2717
>
> 0x000003ffad17bbf8: ldr x17, [sp]
> 0x000003ffad17bbfc: ldr x1, [sp,#16]
> 0x000003ffad17bc00: ldr x18, [sp,#8]
> 0x000003ffad17bc04: ldr x10, [sp,#24]
> 0x000003ffad17bc08: ldr x3, [sp,#32]
> 0x000003ffad17bc0c: ldr x4, [sp,#40]
> 0x000003ffad17bc10: ldr x11, [sp,#48]
> 0x000003ffad17bc14: ldr x6, [sp,#56]
>
> This is OK: I don't think we want to make MacroAssembler even more
> complicated in order to handle occasional out-of-order spills.
I agree.
>
> Also, last_membar and last_ldst seem to be doing the same thing. I
> think that you could simply have a single last_insn field rather than
> both last_membar and last_ldst, and then inspect the instruction to
> see if it's a membar or a ld/st.
Done
>
> Finally, please add BLOCK_COMMENTs in merge_ldst, like this:
>
> if (!is_store) {
> BLOCK_COMMENT("merged ldr pair");
> if (sz == 8) {
> ldp(rt_low, rt_high, adr_p);
> } else {
> ldpw(rt_low, rt_high, adr_p);
> }
> } else {
> BLOCK_COMMENT("merged str pair");
> if (sz == 8) {
> stp(rt_low, rt_high, adr_p);
> } else {
> stpw(rt_low, rt_high, adr_p);
> }
> }
>
Done
> Thanks.
>
> --
> Andrew Haley
> Java Platform Lead Engineer
> Red Hat UK Ltd. <https://www.redhat.com>
> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
--
Best regards,
Zhongwei
More information about the hotspot-compiler-dev
mailing list