RFR: 8354242: VectorAPI: combine vector not operation with compare [v11]

Emanuel Peter epeter at openjdk.org
Tue Sep 9 13:16:59 UTC 2025


On Wed, 9 Jul 2025 06:08:33 GMT, erifan <duke at openjdk.org> wrote:

>> This patch optimizes the following patterns:
>> For integer types:
>> 
>> (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1))
>>     => (VectorMaskCmp src1 src2 ncond)
>> (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1))
>>     => (VectorMaskCmp src1 src2 ncond)
>> 
>> cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the negative comparison of cond.
>> 
>> For float and double types:
>> 
>> (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1))
>>     => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
>> (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1))
>>     => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
>> 
>> cond can be eq or ne.
>> 
>> Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option `-XX:UseSVE=2`:
>> 
>> Benchmark			Unit	Before		Score Error	After		Score Error	Uplift
>> testCompareEQMaskNotByte	ops/s	7912127.225	2677.289518	10266136.26	8955.008548	1.29
>> testCompareEQMaskNotDouble	ops/s	884737.6799	446.963779	1179760.772	448.031844	1.33
>> testCompareEQMaskNotFloat	ops/s	1765045.787	682.332214	2359520.803	896.305743	1.33
>> testCompareEQMaskNotInt		ops/s	1787221.411	977.743935	2353952.519	960.069976	1.31
>> testCompareEQMaskNotLong	ops/s	895297.1974	673.44808	1178449.02	323.804205	1.31
>> testCompareEQMaskNotShort	ops/s	3339987.002	3415.2226	4712761.965	2110.862053	1.41
>> testCompareGEMaskNotByte	ops/s	7907615.16	4094.243652	10251646.9	9486.699831	1.29
>> testCompareGEMaskNotInt		ops/s	1683738.958	4233.813092	2352855.205	1251.952546	1.39
>> testCompareGEMaskNotLong	ops/s	854496.1561	8594.598885	1177811.493	521.1229	1.37
>> testCompareGEMaskNotShort	ops/s	3341860.309	1578.975338	4714008.434	1681.10365	1.41
>> testCompareGTMaskNotByte	ops/s	7910823.674	2993.367032	10245063.58	9774.75138	1.29
>> testCompareGTMaskNotInt		ops/s	1673393.928	3153.099431	2353654.521	1190.848583	1.4
>> testCompareGTMaskNotLong	ops/s	849405.9159	2432.858159	1177952.041	359.96413	1.38
>> testCompareGTMaskNotShort	ops/s	3339509.141	3339.976585	4711442.496	2673.364893	1.41
>> testCompareLEMaskNotByte	ops/s	7911340.004	3114.69191	10231626.5	27134.20035	1.29
>> testCompareLEMaskNotInt		ops/s	1675812.113	1340.969885	2353255.341	1452.4522	1.4
>> testCompareLEMaskNotLong	ops/s	848862.8036	6564.841731	1177763.623	539.290106	1.38
>> testCompareLEMaskNotShort	ops/s	3324951.54	2380.29473	4712116.251	1544.559684	1.41
>> testCompareLTMaskNotByte	ops/s	7910390.844	2630.861436	10239567.69	6487.441672	1.29
>> testCompareLTMaskNotInt		ops/s	16721...
>
> erifan has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Update the code comment

Looks much better, thanks for the updates!

I have another small list of suggestions :)

src/hotspot/share/opto/vectornode.cpp line 2243:

> 2241:   if (in1->Opcode() != Op_VectorMaskCmp ||
> 2242:       in1->outcnt() != 1 ||
> 2243:       !(in1->as_VectorMaskCmp())->predicate_can_be_negated() ||

Suggestion:

      !in1->as_VectorMaskCmp()->predicate_can_be_negated() ||

Brackets are unnecessary, and rather make it harder to read.

src/hotspot/share/opto/vectornode.cpp line 2277:

> 2275:     res = VectorNode::Ideal(phase, can_reshape);
> 2276:   }
> 2277:   return res;

What if someone comes and wants to add yet another optimization before `VectorNode::Ideal`? Your code layout would give us deeper and deeper nesting. I suggest flattening it like this:
Suggestion:


  Node* res = Ideal_XorV_VectorMaskCmp(phase, can_reshape);
  if (res != nullptr) { return res; }

  return VectorNode::Ideal(phase, can_reshape);

test/hotspot/jtreg/compiler/vectorapi/VectorMaskCompareNotTest.java line 911:

> 909:         testCompareMaskNotLong(L_SPECIES_FOR_CAST, VectorOperators.UGE, (m) -> { return m.cast(I_SPECIES_FOR_CAST).not(); });
> 910:         verifyResultsLong(L_SPECIES_FOR_CAST, VectorOperators.UGE);
> 911:     }

You have some cast in here, and in similar tests.
Can you add an IR rule to check if we do or do not have the expected casts?

test/hotspot/jtreg/compiler/vectorapi/VectorMaskCompareNotTest.java line 1007:

> 1005:         testCompareMaskNotFloat(F_SPECIES, VectorOperators.NE, fa, fninf, (m) -> { return F_SPECIES.maskAll(true).xor(m); });
> 1006:         verifyResultsFloat(F_SPECIES, VectorOperators.NE, fa, fninf);
> 1007:     }

Do you have test cases for the cases other than `EQ` and `NE`? After all, we don't that someone accidentally messes with the logic you implemented later and we don't notice the bug ;)

test/micro/org/openjdk/bench/jdk/incubator/vector/MaskCompareNotBenchmark.java line 351:

> 349:     public void testCompareULEMaskNotLong() {
> 350:         testCompareMaskNotLong(VectorOperators.ULE);
> 351:     }

You could consider making the operator a `@Param` next time.

There are multiple tricks to do that:
- `test/micro/org/openjdk/bench/vm/compiler/VectorStoreToLoadForwarding.java` using `MethodHandles.constant`
- Some inner class that has a static final, which is initialized from the non-final `@Param` value.
- Probably even `StableValue` would work, but I have not yet experimented with it.

It would be nice if we could do the same with the primitive types, but that's probably not going to work as easily.

Really just an idea for next time.

test/micro/org/openjdk/bench/jdk/incubator/vector/MaskCompareNotBenchmark.java line 366:

> 364:     public void testCompareNEMaskNotFloat() {
> 365:         testCompareMaskNotFloat(VectorOperators.NE);
> 366:     }

You could still add the other comparisons as well, so we can see the performance difference. Very optional, feel free to ignore this suggestion.

-------------

Changes requested by epeter (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/24674#pullrequestreview-3201347660
PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2333480061
PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2333418237
PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2333510278
PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2333503735
PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2333545924
PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2333516350


More information about the core-libs-dev mailing list