[jdk18] RFR: 8278508: Enable X86 maskAll instruction pattern for 32 bit JVM. [v2]
Sandhya Viswanathan
sviswanathan at openjdk.java.net
Sat Dec 18 00:31:34 UTC 2021
On Thu, 16 Dec 2021 17:46:35 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:
>> - Vector.maskAll was accelerated for AVX-512 targets, but the existing x86 backend does not enable the maskAll instruction patterns for the 32-bit JVM, so the operation falls back to a replicateB operation, which broadcasts the mask value into a vector.
>> - In some cases, after the unboxing-boxing optimization, this vector eventually reaches XorVMask, which then has mismatched operands: one held in an opmask register and the other in a vector.
>>
>> Kindly review and share your feedback.
>>
>> Best Regards,
>> Jatin
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>
> 8278508: Review comments resolution.
src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4281:
> 4279: if (mask_len > 32) {
> 4280: kmovql(dst, src);
> 4281: kshiftrql(dst, dst, 64 - mask_len);
mask_len is 64 here, so the shift count is 0. Is kshiftrql needed?
src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4284:
> 4282: } else {
> 4283: kmovdl(dst, src);
> 4284: kshiftrdl(dst, dst, 32 - mask_len);
If mask_len is 32 then kshiftrdl is not needed. It is only needed if mask_len < 32.
src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4289:
> 4287: assert(mask_len <= 16, "");
> 4288: kmovwl(dst, src);
> 4289: kshiftrwl(dst, dst, 16 - mask_len);
If mask_len == 16 then kshiftrwl is not needed?
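
The three shift comments above make the same point: when mask_len equals the width of the opmask move, the shift count is zero and the shift has no effect. Below is a minimal C++ sketch of that reasoning. It is a plain software model, not HotSpot code. The function name mask_all_model and the reg_bits parameter are hypothetical, and the model assumes src is either all-ones or zero and that 1 <= mask_len <= reg_bits <= 64.

```cpp
#include <cstdint>

// Hypothetical software model of the quoted pattern: move the source bits
// into an opmask register of reg_bits width (16, 32 or 64), then shift right
// so that only the low mask_len bits can remain set.
// Assumption: 1 <= mask_len <= reg_bits <= 64.
constexpr uint64_t mask_all_model(uint64_t src, int reg_bits, int mask_len) {
  // Keep only the bits that fit in the chosen register width.
  uint64_t dst = (reg_bits == 64) ? src
                                  : (src & ((uint64_t{1} << reg_bits) - 1));
  int shift = reg_bits - mask_len;
  // A zero shift count leaves dst unchanged, so the shift can be skipped.
  if (shift != 0) {
    dst >>= shift;
  }
  return dst;
}

// Full-width masks need no shift; narrower masks do.
static_assert(mask_all_model(~uint64_t{0}, 64, 64) == ~uint64_t{0}, "64 of 64");
static_assert(mask_all_model(~uint64_t{0}, 32, 32) == 0xFFFFFFFFull, "32 of 32");
static_assert(mask_all_model(~uint64_t{0}, 32, 8) == 0xFFull, "8 of 32");
```

In the real code the equivalent guard would wrap the kshiftr* call, so the instruction is emitted only when the mask is narrower than the register.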
src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4299:
> 4297: kshiftlql(dst, tmp, 32);
> 4298: korql(dst, dst, tmp);
> 4299: kshiftrql(dst, dst, 64 - mask_len);
Do we need the kshiftrql here? mask_len is 64, so the shift count is 0.
You could alternatively use:
kmovdl dst, src
kunpckdq dst, dst, dst
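
To illustrate why the two-instruction alternative should produce the same value, here is a small C++ model. It is only a sketch. The function names are hypothetical, and it assumes that tmp holds the 32-bit source value in its low half before the quoted kshiftlql/korql pair. kunpckdq is modeled as placing the low 32 bits of its first source in the upper half of the result and the low 32 bits of its second source in the lower half.

```cpp
#include <cstdint>

// Model of the quoted pair: kshiftlql(dst, tmp, 32); korql(dst, dst, tmp);
// Assumption: tmp holds a 32-bit value in its low half.
constexpr uint64_t shift_or_model(uint32_t low) {
  uint64_t tmp = low;
  uint64_t dst = tmp << 32;  // kshiftlql dst, tmp, 32
  return dst | tmp;          // korql dst, dst, tmp
}

// Model of kunpckdq k1, k2, k3:
// k1[63:32] = k2[31:0], k1[31:0] = k3[31:0].
constexpr uint64_t kunpckdq_model(uint64_t k2, uint64_t k3) {
  return ((k2 & 0xFFFFFFFFull) << 32) | (k3 & 0xFFFFFFFFull);
}

// With the same register as both sources, the low half is duplicated into
// the high half, matching the shift-and-or sequence.
static_assert(kunpckdq_model(0xFFFFFFFFull, 0xFFFFFFFFull) ==
              shift_or_model(0xFFFFFFFFu), "all-ones case");
static_assert(kunpckdq_model(0x12345678ull, 0x12345678ull) ==
              shift_or_model(0x12345678u), "arbitrary pattern");
```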
src/hotspot/cpu/x86/x86.ad line 9457:
> 9455: predicate(Matcher::vector_length(n) <= 32);
> 9456: match(Set dst (MaskAll cnt));
> 9457: effect(TEMP dst, TEMP tmp);
TEMP dst is not needed here.
src/hotspot/cpu/x86/x86.ad line 9470:
> 9468: predicate(Matcher::vector_length(n) <= 32);
> 9469: match(Set dst (MaskAll src));
> 9470: effect(TEMP dst);
TEMP dst is not needed.
src/hotspot/cpu/x86/x86_32.ad line 13853:
> 13851: predicate(Matcher::vector_length(n) <= 32);
> 13852: match(Set dst (MaskAll src));
> 13853: effect(TEMP dst);
TEMP dst is not needed.
src/hotspot/cpu/x86/x86_32.ad line 13897:
> 13895: %}
> 13896: ins_pipe( pipe_slow );
> 13897: %}
This can be removed. The more generic mask_all_evexI_GT32 rule will take care of this.
src/hotspot/cpu/x86/x86_64.ad line 13016:
> 13014: instruct mask_all_evexL(kReg dst, rRegL src) %{
> 13015: match(Set dst (MaskAll src));
> 13016: effect(TEMP dst);
TEMP dst not needed.
src/hotspot/cpu/x86/x86_64.ad line 13028:
> 13026: predicate(Matcher::vector_length(n) > 32);
> 13027: match(Set dst (MaskAll src));
> 13028: effect(TEMP dst, TEMP tmp);
TEMP dst not needed.
src/hotspot/cpu/x86/x86_64.ad line 13041:
> 13039: predicate(Matcher::vector_length(n) > 32);
> 13040: match(Set dst (MaskAll cnt));
> 13041: effect(TEMP dst, TEMP tmp);
TEMP dst not needed.
-------------
PR: https://git.openjdk.java.net/jdk18/pull/24