[15] RFR (L) 8235825: C2: Merge AD instructions for Replicate nodes
Vladimir Ivanov
vladimir.x.ivanov at oracle.com
Fri Dec 13 10:27:32 UTC 2019
Thanks for the feedback, Vladimir.
> replicateB
>
> Can you fold it differently?
>
> ReplB_reg_leg
Are you talking about ReplB_reg?
> predicate(!VM_Version::supports_avx512vlbw());
3151 instruct ReplB_reg_leg(legVec dst, rRegI src) %{
3152 predicate(n->as_Vector()->length() == 64 &&
!VM_Version::supports_avx512bw());
For ReplB_reg_leg the predicate can't be simplified: it is applicable
only to 512bit vector when AVX512BW is absent. Otherwise, legVec
constraint will be unnecessarily applied to other configurations.
3119 instruct ReplB_reg(vec dst, rRegI src) %{
3120 predicate((n->as_Vector()->length() <= 32) ||
3121 (n->as_Vector()->length() == 64 &&
VM_Version::supports_avx512bw()));
For ReplB_reg there's a shorter version:
predicate(n->as_Vector()->length() <= 32 ||
VM_Version::supports_avx512bw());
But do you find it easier to read? For example, when you are checking
that all configurations are covered:
predicate(n->as_Vector()->length() <= 32 ||
VM_Version::supports_avx512bw());
instruct ReplB_reg_leg(legVec dst, rRegI src) %{
predicate(n->as_Vector()->length() == 64 &&
!VM_Version::supports_avx512bw());
vs
instruct ReplB_reg_leg(legVec dst, rRegI src) %{
predicate(n->as_Vector()->length() == 64 &&
!VM_Version::supports_avx512bw());
instruct ReplB_reg(vec dst, rRegI src) %{
predicate((n->as_Vector()->length() <= 32) ||
(n->as_Vector()->length() == 64 &&
VM_Version::supports_avx512bw()));
> ins_encode %{
> uint vlen = vector_length(this);
> __ movdl($dst$$XMMRegister, $src$$Register);
> __ punpcklbw($dst$$XMMRegister, $dst$$XMMRegister);
> __ pshuflw($dst$$XMMRegister, $dst$$XMMRegister, 0x00);
> if (vlen > 8) {
> __ punpcklqdq($dst$$XMMRegister, $dst$$XMMRegister);
> if (vlen > 16) {
> __ vinserti128_high($dst$$XMMRegister, $dst$$XMMRegister);
> if (vlen > 32) {
> assert(vlen == 64, "sanity");
> __ vinserti64x4($dst$$XMMRegister, $dst$$XMMRegister,
> $dst$$XMMRegister, 0x1);
Yes, it should work as well. Do you find it easier to read though?
> Similar ReplB_imm_leg for which I don't see new implementation.
Good catch. Added it back.
(FTR completeness for reg2reg variants (_reg*) is mandatory. But for
_mem and _imm it is optional: if they some configuration isn't covered,
_reg is used. But it was in original code, so I added it back.)
Updated version:
http://cr.openjdk.java.net/~vlivanov/jbhateja/8235825/webrev.00
Another thing I noticed is that for ReplI/.../ReplD cases avx512vl
checks are not necessary:
http://cr.openjdk.java.net/~vlivanov/jbhateja/8235825/webrev.01/broadcast_vl/
The code assumes that evpbroadcastd/evpbroadcastq,
vpbroadcastdvpbroadcastq, vpbroadcastss/vpbroadcastsd need AVX512VL for
512bit case, but Intel manual says AVX512F is enough.
I plan to handle it as a separate change, but let me know if you want to
incorporate it into 8235825.
> It should also simplify code for avx512 which one or 2 instructions.
Can you elaborate, please? Are you talking about the case when version
for different vector sizes differ in 1-2 instructions (like ReplB_reg)?
> Other types changes can be done same way.
Best regards,
Vladimir Ivanov
> On 12/12/19 3:19 AM, Vladimir Ivanov wrote:
>> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235825/webrev.00/all/
>> https://bugs.openjdk.java.net/browse/JDK-8235825
>>
>> Merge AD instructions for the following vector nodes:
>> - ReplicateB, ..., ReplicateD
>>
>> Individual patches:
>>
>> http://cr.openjdk.java.net/~vlivanov/jbhateja/8235825/webrev.00/individual
>>
>>
>> Testing: tier1-4, test run on different CPU flavors (KNL, CLX)
>>
>> Contributed-by: Jatin Bhateja <jatin.bhateja at intel.com>
>> Reviewed-by: vlivanov, sviswanathan, ?
>>
>> Best regards,
>> Vladimir Ivanov
More information about the hotspot-compiler-dev
mailing list