RFR: JDK-8196064: AArch64: Merging ld/st into ldp/stp in macro-assembler

Zhongwei Yao zhongwei.yao at linaro.org
Fri Feb 9 02:59:25 UTC 2018


Ping... Is this patch OK?

On 3 February 2018 at 10:44, Zhongwei Yao <zhongwei.yao at linaro.org> wrote:
> Hi, Andrew,
>
> Thanks for your review and feedback!
>
> Patch is updated: http://cr.openjdk.java.net/~zyao/8196064/webrev.01/
>
> It passes jtreg test.
>
> On 1 February 2018 at 23:36, Andrew Haley <aph at redhat.com> wrote:
>> On 30/01/18 03:30, Zhongwei Yao wrote:
>>> Bug:
>>> https://bugs.openjdk.java.net/browse/JDK-8196064
>>>
>>> Webrev:
>>> http://cr.openjdk.java.net/~zyao/8196064/webrev.00
>>>
>>> This patch merges adjacent load/store into ldp/stp in macro-assembler
>>> as discussed in previous thread:
>>> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2017-December/027957.html
>>>
>>> My micro-benchmark case gives about 8% performance improvement with this patch.
>>>
>>> And this patch is based on commit be48be2 due to recent build failure
>>> on jdk/hs master, and it passes all jtreg tests.
>>>
>>> Please help to review it.
>>
>> Very nice.  You've done a good job.  This merging opportunity happens
>> a lot, and it reduces not only the size of compiled code but also the
>> size of the interpreter and the generated stubs.  It's definitely
>> worth doing.
>>
>> before: ArrObj.testArrObjCopy  avgt   20  3.075 ? 0.002  us/op
>>  after: ArrObj.testArrObjCopy  avgt   20  2.779 ? 0.004  us/op
>>
>> So, some improvement.  One thing I noticed:
>>
>> Compiled method (c2)    6812  627       4       org.linaro.benchmarks.ArrObj::testArrObjCopy (32 bytes)
>>
>>  ;; B44: #      B45 <- B43  Freq: 40.1247
>>
>>   0x000003ff94b0db50: ldr       x17, [sp]
>>   0x000003ff94b0db54: ldp       x18, x1, [sp,#8]
>>   0x000003ff94b0db58: ldp       x10, x3, [sp,#24]
>>   0x000003ff94b0db5c: ldp       x4, x11, [sp,#40]
>>   0x000003ff94b0db60: ldr       x6, [sp,#56]
>>
>> It seems to me like there could be four ldp instructions here.  There
>> aren't (as far as I can see) because they're in an unfortunate order,
>> so the MacroAssembler doesn't see the opportunity:
>>
>>  ;; B44: #      B45 <- B43  Freq: 39.2717
>>
>>   0x000003ffad17bbf8: ldr       x17, [sp]
>>   0x000003ffad17bbfc: ldr       x1, [sp,#16]
>>   0x000003ffad17bc00: ldr       x18, [sp,#8]
>>   0x000003ffad17bc04: ldr       x10, [sp,#24]
>>   0x000003ffad17bc08: ldr       x3, [sp,#32]
>>   0x000003ffad17bc0c: ldr       x4, [sp,#40]
>>   0x000003ffad17bc10: ldr       x11, [sp,#48]
>>   0x000003ffad17bc14: ldr       x6, [sp,#56]
>>
>> This is OK: I don't think we want to make MacroAssembler even more
>> complicated in order to handle occasional out-of-order spills.
> I agree.
>
>>
>> Also, last_membar and last_ldst seem to be doing the same thing.  I
>> think that you could simply have a single last_insn field rather than
>> both last_membar and last_ldst, and then inspect the instruction to
>> see if it's a membar or a ld/st.
> Done
>
>>
>> Finally, please add BLOCK_COMMENTs in merge_ldst, like this:
>>
>>   if (!is_store) {
>>     BLOCK_COMMENT("merged ldr pair");
>>     if (sz == 8) {
>>       ldp(rt_low, rt_high, adr_p);
>>     } else {
>>       ldpw(rt_low, rt_high, adr_p);
>>     }
>>   } else {
>>     BLOCK_COMMENT("merged str pair");
>>     if (sz == 8) {
>>       stp(rt_low, rt_high, adr_p);
>>     } else {
>>       stpw(rt_low, rt_high, adr_p);
>>     }
>>   }
>>
> Done
>
>> Thanks.
>>
>> --
>> Andrew Haley
>> Java Platform Lead Engineer
>> Red Hat UK Ltd. <https://www.redhat.com>
>> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
>
>
>
> --
> Best regards,
> Zhongwei



-- 
Best regards,
Zhongwei


More information about the hotspot-compiler-dev mailing list