RFR: 8354242: VectorAPI: combine vector not operation with compare

erifan duke at openjdk.org
Wed Apr 16 06:45:18 UTC 2025


This patch optimizes the following patterns.

For integer types:

(XorV (VectorMaskCmp src1 src2 cond) (Replicate -1))
    => (VectorMaskCmp src1 src2 ncond)
(XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1))
    => (VectorMaskCmp src1 src2 ncond)

cond can be eq, ne, le, ge, lt, gt, ule, uge, ult, or ugt; ncond is the negation of cond (for example, lt becomes ge and ule becomes ugt).
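
To illustrate why this rewrite is safe for integer types, here is a minimal scalar sketch (not code from the patch). The `Cond` enum, its `negate()` mapping, and `negationHolds()` are hypothetical names introduced only for this example; they mirror the conditions listed above and check that "not (a cond b)" matches "a ncond b" on a few boundary values. At the Vector API level the pattern being optimized would be roughly a `compare(...)` followed by `not()` on the resulting mask.

```java
class NegatedCompare {
    // Hypothetical enum mirroring the comparison conditions named in the text.
    enum Cond {
        EQ, NE, LE, GE, LT, GT, ULE, UGE, ULT, UGT;

        // Returns the condition whose result is the logical complement of this one.
        Cond negate() {
            switch (this) {
                case EQ:  return NE;
                case NE:  return EQ;
                case LE:  return GT;
                case GE:  return LT;
                case LT:  return GE;
                case GT:  return LE;
                case ULE: return UGT;
                case UGE: return ULT;
                case ULT: return UGE;
                default:  return ULE; // UGT
            }
        }

        // Scalar evaluation of the condition on two ints.
        boolean test(int a, int b) {
            int u = Integer.compareUnsigned(a, b);
            switch (this) {
                case EQ:  return a == b;
                case NE:  return a != b;
                case LE:  return a <= b;
                case GE:  return a >= b;
                case LT:  return a < b;
                case GT:  return a > b;
                case ULE: return u <= 0;
                case UGE: return u >= 0;
                case ULT: return u < 0;
                default:  return u > 0; // UGT
            }
        }
    }

    // Checks !(a cond b) == (a ncond b) for every condition on boundary samples.
    static boolean negationHolds() {
        int[] samples = {Integer.MIN_VALUE, -1, 0, 1, Integer.MAX_VALUE};
        for (Cond c : Cond.values()) {
            for (int a : samples) {
                for (int b : samples) {
                    if (!c.test(a, b) != c.negate().test(a, b)) {
                        return false;
                    }
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("negation holds for all conditions: " + negationHolds());
    }
}
```

Because integer comparisons form a total order, each condition has an exact complement, so the separate not operation can be folded into the compare.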

For float and double types:

(XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1))
    => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
(XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1))
    => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))

cond can be eq or ne.
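
The restriction to eq and ne for floating-point types is presumably related to NaN: an ordered comparison involving NaN is always false, so negating lt does not give ge. The scalar sketch below is an illustration under that assumption, not code from the patch; the class and method names are hypothetical.

```java
class FloatNegation {
    static boolean notLt(float a, float b) { return !(a < b); }
    static boolean ge(float a, float b)    { return a >= b; }
    static boolean notEq(float a, float b) { return !(a == b); }
    static boolean ne(float a, float b)    { return a != b; }

    public static void main(String[] args) {
        float nan = Float.NaN;
        // Ordered comparisons with NaN are false, so !(NaN < 1) and (NaN >= 1) differ.
        System.out.println("!(NaN < 1)  = " + notLt(nan, 1.0f));
        System.out.println("NaN >= 1    = " + ge(nan, 1.0f));
        // eq and ne remain exact complements even when an operand is NaN.
        System.out.println("!(NaN == 1) = " + notEq(nan, 1.0f));
        System.out.println("NaN != 1    = " + ne(nan, 1.0f));
    }
}
```

For eq and ne the two forms agree on every input, which is consistent with only those two conditions being folded for float and double.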

Benchmarks on an Nvidia Grace machine with 128-bit SVE2:
With option `-XX:UseSVE=2`:

Benchmark			Unit	Before		Before Error	After		After Error	Uplift
testCompareEQMaskNotByte	ops/s	7912127.225	2677.289518	10266136.26	8955.008548	1.29
testCompareEQMaskNotDouble	ops/s	884737.6799	446.963779	1179760.772	448.031844	1.33
testCompareEQMaskNotFloat	ops/s	1765045.787	682.332214	2359520.803	896.305743	1.33
testCompareEQMaskNotInt		ops/s	1787221.411	977.743935	2353952.519	960.069976	1.31
testCompareEQMaskNotLong	ops/s	895297.1974	673.44808	1178449.02	323.804205	1.31
testCompareEQMaskNotShort	ops/s	3339987.002	3415.2226	4712761.965	2110.862053	1.41
testCompareGEMaskNotByte	ops/s	7907615.16	4094.243652	10251646.9	9486.699831	1.29
testCompareGEMaskNotInt		ops/s	1683738.958	4233.813092	2352855.205	1251.952546	1.39
testCompareGEMaskNotLong	ops/s	854496.1561	8594.598885	1177811.493	521.1229	1.37
testCompareGEMaskNotShort	ops/s	3341860.309	1578.975338	4714008.434	1681.10365	1.41
testCompareGTMaskNotByte	ops/s	7910823.674	2993.367032	10245063.58	9774.75138	1.29
testCompareGTMaskNotInt		ops/s	1673393.928	3153.099431	2353654.521	1190.848583	1.4
testCompareGTMaskNotLong	ops/s	849405.9159	2432.858159	1177952.041	359.96413	1.38
testCompareGTMaskNotShort	ops/s	3339509.141	3339.976585	4711442.496	2673.364893	1.41
testCompareLEMaskNotByte	ops/s	7911340.004	3114.69191	10231626.5	27134.20035	1.29
testCompareLEMaskNotInt		ops/s	1675812.113	1340.969885	2353255.341	1452.4522	1.4
testCompareLEMaskNotLong	ops/s	848862.8036	6564.841731	1177763.623	539.290106	1.38
testCompareLEMaskNotShort	ops/s	3324951.54	2380.29473	4712116.251	1544.559684	1.41
testCompareLTMaskNotByte	ops/s	7910390.844	2630.861436	10239567.69	6487.441672	1.29
testCompareLTMaskNotInt		ops/s	1672180.09	995.238142	2353757.863	853.774734	1.4
testCompareLTMaskNotLong	ops/s	856502.2695	12276.82851	1177671.815	496.723302	1.37
testCompareLTMaskNotShort	ops/s	3325798.025	2412.702501	4711554.181	1779.302112	1.41
testCompareNEMaskNotByte	ops/s	7910002.518	2771.82477	10245315.33	16321.93935	1.29
testCompareNEMaskNotDouble	ops/s	863754.6022	523.140788	1179133.982	476.572178	1.36
testCompareNEMaskNotFloat	ops/s	1723321.883	2598.484803	2358492.186	877.1401	1.36
testCompareNEMaskNotInt		ops/s	1670288.841	751.774826	2354158.125	835.720163	1.4
testCompareNEMaskNotLong	ops/s	836327.6835	410.525466	1178178.825	308.757932	1.4
testCompareNEMaskNotShort	ops/s	3327815.841	1511.978763	4711379.136	2336.505531	1.41
testCompareUGEMaskNotByte	ops/s	7906699.024	3200.936474	10253843.74	15067.59401	1.29
testCompareUGEMaskNotInt	ops/s	1674003.923	3287.191727	2353340.666	951.381021	1.4
testCompareUGEMaskNotLong	ops/s	852424.5562	8920.408939	1177943.609	389.6621	1.38
testCompareUGEMaskNotShort	ops/s	3327255.858	1584.885143	4711622.355	1247.215277	1.41
testCompareUGTMaskNotByte	ops/s	7909249.189	4435.283667	10245541.34	10993.34739	1.29
testCompareUGTMaskNotInt	ops/s	1693713.433	20650.00213	2353153.787	1055.343846	1.38
testCompareUGTMaskNotLong	ops/s	851022.3395	7079.065268	1177910.677	538.604598	1.38
testCompareUGTMaskNotShort	ops/s	3327236.988	1616.886789	4711209.865	3098.494145	1.41
testCompareULEMaskNotByte	ops/s	7909350.825	3251.262342	10261449.03	7273.831341	1.29
testCompareULEMaskNotInt	ops/s	1672350.925	1545.304304	2353231.755	914.231193	1.4
testCompareULEMaskNotLong	ops/s	853349.4765	9804.906913	1177967.254	435.044367	1.38
testCompareULEMaskNotShort	ops/s	3325757.891	1555.062257	4712873.187	1650.986905	1.41
testCompareULTMaskNotByte	ops/s	7912218.621	2633.477744	10242095.98	21921.39902	1.29
testCompareULTMaskNotInt	ops/s	1673994.849	2672.507666	2353449.22	946.105757	1.4
testCompareULTMaskNotLong	ops/s	849032.5868	10406.06689	1177586.047	506.541456	1.38
testCompareULTMaskNotShort	ops/s	3328062.026	1892.991844	4713247.216	1855.983724	1.41


With option `-XX:UseSVE=0`:

Benchmark			Unit	Before		Before Error	After		After Error	Uplift
testCompareEQMaskNotByte	ops/s	7895961.919	72712.90804	7746493.731	71481.92938	0.98
testCompareEQMaskNotDouble	ops/s	789811.0455	384.493088	766473.7994	2216.581793	0.97
testCompareEQMaskNotFloat	ops/s	1806305.818	638.010451	1819616.613	3295.38958	1
testCompareEQMaskNotInt		ops/s	1815820.144	1225.336135	1849538.401	766.29902	1.01
testCompareEQMaskNotLong	ops/s	807336.492	335.451807	792732.9483	277.954432	0.98
testCompareEQMaskNotShort	ops/s	4818266.38	1927.862665	4668903.001	1922.782715	0.96
testCompareGEMaskNotByte	ops/s	7818439.678	75374.97739	16498003.98	41440.49653	2.11
testCompareGEMaskNotInt		ops/s	1815159.05	1090.912209	2372095.779	1664.397112	1.3
testCompareGEMaskNotLong	ops/s	804324.5575	2301.686878	927919.8507	371.766719	1.15
testCompareGEMaskNotShort	ops/s	4818966.563	2443.643652	5385561.038	29558.37423	1.11
testCompareGTMaskNotByte	ops/s	7893406.157	82687.74264	16470663.2	22165.55812	2.08
testCompareGTMaskNotInt		ops/s	1815316.812	915.894106	2370447.198	655.016338	1.3
testCompareGTMaskNotLong	ops/s	807019.456	526.525482	928079.0541	330.582693	1.15
testCompareGTMaskNotShort	ops/s	4820552.881	1684.247747	5355902.93	5893.2915	1.11
testCompareLEMaskNotByte	ops/s	7816263.323	79560.0015	16473621.19	56688.99585	2.1
testCompareLEMaskNotInt		ops/s	1814915.724	926.998625	2368790.306	932.594778	1.3
testCompareLEMaskNotLong	ops/s	806483.9	935.718082	928110.9074	407.096695	1.15
testCompareLEMaskNotShort	ops/s	4813660.241	6817.870509	5357107.852	10061.47975	1.11
testCompareLTMaskNotByte	ops/s	7838948.962	69136.4504	16424405.96	24464.75469	2.09
testCompareLTMaskNotInt		ops/s	1815056.833	1187.6453	2369892.187	1103.819634	1.3
testCompareLTMaskNotLong	ops/s	806602.1804	287.923365	928346.4118	617.682824	1.15
testCompareLTMaskNotShort	ops/s	4817940.643	2767.1509	5372537.84	15397.47169	1.11
testCompareNEMaskNotByte	ops/s	9078493.798	4630.339307	16484348.42	18925.88346	1.81
testCompareNEMaskNotDouble	ops/s	661769.6272	398.712981	926763.5839	1808.843788	1.4
testCompareNEMaskNotFloat	ops/s	1570527.252	563.642144	2312425.678	1815.844846	1.47
testCompareNEMaskNotInt		ops/s	1619146.58	626.793854	2369711.543	942.330478	1.46
testCompareNEMaskNotLong	ops/s	680201.5381	2252.836482	927808.6147	414.917863	1.36
testCompareNEMaskNotShort	ops/s	3763508.054	3622.560798	5367808.015	8591.466599	1.42
testCompareUGEMaskNotByte	ops/s	7886373.129	75917.74675	16480928.93	27524.31005	2.08
testCompareUGEMaskNotInt	ops/s	1815636.832	750.036241	2369683.015	901.609404	1.3
testCompareUGEMaskNotLong	ops/s	806862.5826	287.819616	928001.4394	361.063837	1.15
testCompareUGEMaskNotShort	ops/s	4820581.361	2098.537435	5375854.248	25619.40165	1.11
testCompareUGTMaskNotByte	ops/s	7891591.465	96614.93542	16410405.93	15012.37096	2.07
testCompareUGTMaskNotInt	ops/s	1814871.179	662.825588	2371325.903	1170.491164	1.3
testCompareUGTMaskNotLong	ops/s	804013.7658	2240.534209	928062.2169	531.306897	1.15
testCompareUGTMaskNotShort	ops/s	4818150.337	3051.717685	5381449.337	21212.34187	1.11
testCompareULEMaskNotByte	ops/s	7831540.628	81306.67253	16495250.78	38682.19675	2.1
testCompareULEMaskNotInt	ops/s	1814484.14	687.860656	2369265.075	940.609586	1.3
testCompareULEMaskNotLong	ops/s	807780.5749	769.876816	927538.0732	1278.267724	1.14
testCompareULEMaskNotShort	ops/s	4817437.42	5141.336541	5356183.359	7015.608124	1.11
testCompareULTMaskNotByte	ops/s	7849078.225	56753.59764	16395975.27	34043.67295	2.08
testCompareULTMaskNotInt	ops/s	1814328.226	2697.219111	2370700.47	1991.841988	1.3
testCompareULTMaskNotLong	ops/s	807166.8197	253.061506	927926.2803	252.933462	1.14
testCompareULTMaskNotShort	ops/s	4821098.216	1625.959044	5348980.243	4100.768121	1.1


Benchmarks on an AMD EPYC 9124 16-Core Processor:
With option `-XX:UseAVX=3`:

Benchmark			Unit	Before		Before Error	After		After Error	Uplift
testCompareEQMaskNotByte	ops/s	16607323.35	1233692.631	18381557.66	1163201.522	1.1
testCompareEQMaskNotDouble	ops/s	2114285.245	58782.2534	2959946.353	43016.0445	1.39
testCompareEQMaskNotFloat	ops/s	4480874.437	89975.29074	6960151.436	64799.143	1.55
testCompareEQMaskNotInt		ops/s	4370906.91	51784.80889	6856955.043	313858.5504	1.56
testCompareEQMaskNotLong	ops/s	2080065.895	26762.06732	2939142.143	67179.05314	1.41
testCompareEQMaskNotShort	ops/s	7968282.563	210437.2781	12701214.56	473152.6407	1.59
testCompareGEMaskNotByte	ops/s	18419141.89	473408.9451	19880059.68	321638.0397	1.07
testCompareGEMaskNotInt		ops/s	4419015.62	77352.98633	7037639.227	151066.0383	1.59
testCompareGEMaskNotLong	ops/s	2147982.48	49227.42782	3000275.928	39298.75344	1.39
testCompareGEMaskNotShort	ops/s	8469039.613	17833.19707	12288229.49	244317.8812	1.45
testCompareGTMaskNotByte	ops/s	18728997.5	468328.8358	20544730.05	392264.6466	1.09
testCompareGTMaskNotInt		ops/s	4510009.705	78812.57357	7364629.942	70970.78473	1.63
testCompareGTMaskNotLong	ops/s	2124104.969	40917.89257	2953536.279	35199.19687	1.39
testCompareGTMaskNotShort	ops/s	8690557.621	311534.1159	12344017.51	457931.8741	1.42
testCompareLEMaskNotByte	ops/s	17758400.53	478383.4945	19209183.26	1143297.241	1.08
testCompareLEMaskNotInt		ops/s	4363664.862	43443.18063	7054093.064	78141.11476	1.61
testCompareLEMaskNotLong	ops/s	2068632.213	29844.78023	2954766.412	50667.22502	1.42
testCompareLEMaskNotShort	ops/s	8637608.548	183538.5511	12719010.27	473568.8825	1.47
testCompareLTMaskNotByte	ops/s	14406138.95	423105.0163	17292417.96	371386.9689	1.2
testCompareLTMaskNotInt		ops/s	4546707.266	131977.3144	7040483.394	213590.4657	1.54
testCompareLTMaskNotLong	ops/s	2123277.356	47243.21499	2848720.442	58896.97045	1.34
testCompareLTMaskNotShort	ops/s	7570169.363	649873.6295	11945383.75	988276.5955	1.57
testCompareNEMaskNotByte	ops/s	18274529.55	683396.7384	19081938.8	1118739.778	1.04
testCompareNEMaskNotDouble	ops/s	2112533.61	43295.50012	2912115.441	78189.51083	1.37
testCompareNEMaskNotFloat	ops/s	4628683.814	93817.07362	6967208.729	145135.8544	1.5
testCompareNEMaskNotInt		ops/s	4470900.214	75974.50842	7286913.662	116328.5277	1.62
testCompareNEMaskNotLong	ops/s	2134091.061	46377.94061	2934667.477	81675.46021	1.37
testCompareNEMaskNotShort	ops/s	8790384.287	396161.8599	13076858.35	286272.1155	1.48
testCompareUGEMaskNotByte	ops/s	18009150.9	660803.8886	17551258.33	1667014.843	0.97
testCompareUGEMaskNotInt	ops/s	4442928.74	83190.81019	6854088.277	329008.8901	1.54
testCompareUGEMaskNotLong	ops/s	2088357.736	71696.24791	2973202.26	63278.78974	1.42
testCompareUGEMaskNotShort	ops/s	8348624.02	116562.7876	12832250.78	546869.3006	1.53
testCompareUGTMaskNotByte	ops/s	17871101.25	800199.6321	19902619.81	214003.3262	1.11
testCompareUGTMaskNotInt	ops/s	4088304.421	137797.9723	7135454.33	124553.651	1.74
testCompareUGTMaskNotLong	ops/s	2070610.42	19881.82182	2991536.365	36260.60767	1.44
testCompareUGTMaskNotShort	ops/s	8637099.341	155822.1608	12756579.77	186068.199	1.47
testCompareULEMaskNotByte	ops/s	17940901.36	1258029.364	18932484.94	694554.6305	1.05
testCompareULEMaskNotInt	ops/s	4369177.511	74982.31936	6392773.082	550171.2266	1.46
testCompareULEMaskNotLong	ops/s	2135905.761	43693.63178	2877579.631	41651.56289	1.34
testCompareULEMaskNotShort	ops/s	8607710.544	132655.1676	12446370.04	441718.3035	1.44
testCompareULTMaskNotByte	ops/s	17409912.23	1033204.537	20607479.99	362000.5056	1.18
testCompareULTMaskNotInt	ops/s	4386455.9	119192.1635	6920123.264	186158.2845	1.57
testCompareULTMaskNotLong	ops/s	2064995.149	38622.2734	2988343.589	39037.90006	1.44
testCompareULTMaskNotShort	ops/s	8642182.752	230919.2442	13029582.09	437101.4923	1.5


The few small regressions (uplift of 0.96 to 0.98, in some EQ cases with `-XX:UseSVE=0` and in testCompareUGEMaskNotByte with `-XX:UseAVX=3`) are due to test fluctuations.

-------------

Commit messages:
 - 8354242: VectorAPI: combine vector not operation with compare

Changes: https://git.openjdk.org/jdk/pull/24674/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24674&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8354242
  Stats: 1051 lines in 5 files changed: 1048 ins; 0 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/24674.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24674/head:pull/24674

PR: https://git.openjdk.org/jdk/pull/24674

