RFR (M) 8222074: Enhance auto vectorization for x86
Viswanathan, Sandhya
sandhya.viswanathan at intel.com
Wed May 1 19:11:02 UTC 2019
Thanks a lot Vladimir. I will look into fixing these.
Best Regards,
Sandhya
-----Original Message-----
From: Vladimir Ivanov [mailto:vladimir.x.ivanov at oracle.com]
Sent: Wednesday, May 01, 2019 9:41 AM
To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; Vladimir Kozlov <vladimir.kozlov at oracle.com>
Cc: hotspot-compiler-dev at openjdk.java.net
Subject: Re: RFR (M) 8222074: Enhance auto vectorization for x86
So far, testing spotted a couple of minor issues:
windows build broken:
jib >
t:/workspace/build/windows-x64/hotspot/variant-server/gensrc/adfiles/ad_x86.cpp(1572):
error C2220: warning treated as error - no 'object' file generated
jib >
t:/workspace/build/windows-x64/hotspot/variant-server/gensrc/adfiles/ad_x86.cpp(1572):
warning C4101: 'inst': unreferenced local variable
jib >
t:/workspace/build/windows-x64/hotspot/variant-server/gensrc/adfiles/ad_x86.cpp(1600):
warning C4101: 'inst': unreferenced local variable
jib >
t:/workspace/build/windows-x64/hotspot/variant-server/gensrc/adfiles/ad_x86.cpp(1616):
warning C4101: 'inst': unreferenced local variable
jib >
t:/workspace/build/windows-x64/hotspot/variant-server/gensrc/adfiles/ad_x86.cpp(1632):
warning C4101: 'inst': unreferenced local variable
compiler/graalunit/HotspotTest.java:
org.graalvm.compiler.hotspot.test.CRC32SubstitutionsTest finished 1685.0 ms
org.graalvm.compiler.hotspot.test.CheckGraalIntrinsics started (7 of 44)
test: FAILED
test(org.graalvm.compiler.hotspot.test.CheckGraalIntrinsics)
java.lang.AssertionError: missing Graal intrinsics for:
java/lang/Math.abs(I)I
java/lang/Math.abs(J)J
at
org.graalvm.compiler.hotspot.test.CheckGraalIntrinsics.test(CheckGraalIntrinsics.java:646)
I'll respond on the patch itself separately.
Best regards,
Vladimir Ivanov
On 30/04/2019 18:14, Viswanathan, Sandhya wrote:
> Hi VladimirK,
>
> JBS: https://bugs.openjdk.java.net/browse/JDK-8222074
>
> Please find updated webrev at:
> http://cr.openjdk.java.net/~sviswanathan/8222074/webrev.01/
>
> With this webrev the ad file has only about 60 lines effectively added.
> Also the generated product libjvm.so size only increases by about 0.26% vs the prior 1.50%.
> I have used multiple match rules in one instruct for same size shift related rules and also for the new Abs/Neg rules.
> What I noticed is that the adlc still duplicates lot of code and there is potential to further improve code size for multiple match rule case by improving the adlc itself.
> The adlc improvement (like removing duplicate emits, formats, expand, pipeline etc) can be done as a separate RFE.
>
> In this webrev, I have also fixed the errors reported by Vladimir Ivanov and corrected the issues reported by jcheck tool.
> Also taken into account reducing the temporary by using TEMP dst for multiply rules.
>
> The compiler jtreg tests and the java math tests pass on Haswell, SKX, and KNL.
>
> Your review and feedback is welcome.
>
> Best Regards,
> Sandhya
>
>
> -----Original Message-----
> From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Viswanathan, Sandhya
> Sent: Wednesday, April 10, 2019 10:22 AM
> To: Vladimir Kozlov <vladimir.kozlov at oracle.com>; B. Blaser <bsrbnd at gmail.com>
> Cc: hotspot-compiler-dev at openjdk.java.net
> Subject: RE: RFR (M) 8222074: Enhance auto vectorization for x86
>
> Yes good catch, in mul32B_reg_avx(), the last two instructions are the only place where dst is used:
>
> __ vpackuswb($dst$$XMMRegister, $tmp2$$XMMRegister, $tmp1$$XMMRegister, vector_len);
> __ vpermq($dst$$XMMRegister, $dst$$XMMRegister, 0xD8, vector_len);
>
> Here dst can be same as tmp2 or tmp1 in packuswb() and so the effect TEMP dst is not required.
>
> Best Regards,
> Sandhya
>
>
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Wednesday, April 10, 2019 9:59 AM
> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; B. Blaser <bsrbnd at gmail.com>
> Cc: hotspot-compiler-dev at openjdk.java.net
> Subject: Re: RFR (M) 8222074: Enhance auto vectorization for x86
>
> On 4/10/19 8:36 AM, Viswanathan, Sandhya wrote:
>> Hi Bernard,
>>
>> One could add TEMP dst in effect() to let the register allocator know that dst needs to be different from src.
>
> Yes, we use this way. Or, in mul4B_reg() case, we can use $dst instead $tmp2 to avoid overwriting
> $src2 before we get value from it if $dst = $src2.
>
> On other hand, mul32B_reg_avx() and other have 'TEMP dst' effect but $dst is used only for final result.
>
> It is a little mess which may cause ineffective use of registers in compiled code.
>
> Thanks,
> Vladimir
>
>>
>> Best Regards,
>> Sandhya
>>
>>
>> -----Original Message-----
>> From: B. Blaser [mailto:bsrbnd at gmail.com]
>> Sent: Wednesday, April 10, 2019 4:10 AM
>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>
>> Cc: Vladimir Kozlov <vladimir.kozlov at oracle.com>; hotspot-compiler-dev at openjdk.java.net
>> Subject: Re: RFR (M) 8222074: Enhance auto vectorization for x86
>>
>> Hi Sandhya and Vladimir K.,
>>
>> On Wed, 10 Apr 2019 at 03:06, Viswanathan, Sandhya <sandhya.viswanathan at intel.com> wrote:
>>>
>>> Hi Vladimir,
>>>
>>> Yes, I missed the question below:
>>>>> There are cases where we can use less `TEMP tmp` registers by using 'dst' register like in mul4B_reg(). Is it intentional to not use 'dst' there?
>>>
>>> No it is not intentional, we can use the dst register in those cases and reduced the tmps.
>>
>> I guess we have to be careful using $dst instead of $tmp registers as the allocator sometimes provides identical $src & $dst. Also, I'm not sure this would be possible in the case of mul4B_reg():
>>
>> 7349 format %{"pmovsxbw $tmp,$src1\n\t"
>> 7350 "pmovsxbw $tmp2,$src2\n\t"
>>
>> I believe this couldn't work if you use $dst instead of $tmp and $dst = $src2, what do you think?
>>
>> Thanks,
>> Bernard
>>
More information about the hotspot-compiler-dev
mailing list