RFR: 8354242: VectorAPI: combine vector not operation with compare [v13]

erifan duke at openjdk.org
Mon Sep 15 09:49:19 UTC 2025


On Mon, 15 Sep 2025 09:33:47 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> erifan has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Add an IR rule for vector mask cast operation
>
> Your benchmark and code changes look good to me. Thanks for addressing my comments.

Thanks @jatin-bhateja. The updated benchmark results are as follows; not much has changed.

On an Nvidia Grace machine with 128-bit SVE2:
With option `-XX:UseSVE=2`:

Benchmark	COMPARISON_OP	Unit	Before Score	Before Error	After Score	After Error	Uplift
testCompareMaskNotDouble	EQ	ops/s	908008.7644	827.699314	1175289.515	240.548861	1.294359
testCompareMaskNotDouble	NE	ops/s	872199.2489	131.090115	1175667.777	129.741515	1.347934
testCompareMaskNotDouble	LT	ops/s	880166.7559	1570.41653	882160.6889	4723.507639	1.002265
testCompareMaskNotDouble	LE	ops/s	878115.3293	2919.637497	879033.7895	5404.617017	1.001045
testCompareMaskNotDouble	GT	ops/s	877068.5325	9595.275981	865832.864	5054.26002	0.987189
testCompareMaskNotDouble	GE	ops/s	895695.0228	3276.687933	871153.7117	7714.572967	0.9726
testCompareMaskNotFloat	    EQ	ops/s	1811841.295	278.140948	2350971.83	606.667654	1.297559
testCompareMaskNotFloat	    NE	ops/s	1727124.634	1755.717051	2351789.019	269.531198	1.361678
testCompareMaskNotFloat	    LT	ops/s	1735243.319	4912.343726	1726257.01	823.746765	0.994821
testCompareMaskNotFloat	    LE	ops/s	1726151.367	1071.383328	1727029.339	960.336314	1.000508
testCompareMaskNotFloat	    GT	ops/s	1729704.897	1646.026351	1726069.02	440.981281	0.997897
testCompareMaskNotFloat	    GE	ops/s	1726515.227	2171.61643	1728365.682	1404.298156	1.001071
testCompareMaskNotByte	    EQ	ops/s	8480574.694	1254.415788	10200329.86	8560.199493	1.202787
testCompareMaskNotByte	    NE	ops/s	8480141.263	1437.762594	10207424.91	3664.106923	1.203685
testCompareMaskNotByte	    LT	ops/s	8471471.384	7699.585554	10203300.19	4675.047416	1.20443
testCompareMaskNotByte	    LE	ops/s	8476165.519	6045.944392	10204956.23	2174.866199	1.203959
testCompareMaskNotByte	    GT	ops/s	8479397.377	1290.560961	10207032.3	5414.789178	1.203745
testCompareMaskNotByte	    GE	ops/s	8479979.908	1094.823175	10203115.77	2909.433184	1.2032
testCompareMaskNotByte	    ULT	ops/s	8480915.515	1420.30856	10213140.54	19628.56888	1.204249
testCompareMaskNotByte	    ULE	ops/s	8481768.961	1806.086454	10191601.05	9537.089409	1.201589
testCompareMaskNotByte	    UGT	ops/s	8477948.807	3652.437106	10208439.79	8335.226416	1.204116
testCompareMaskNotByte	    UGE	ops/s	8477320.065	2191.753237	10198589.9	5748.761942	1.203044
testCompareMaskNotInt	    EQ	ops/s	1906386.393	208.045573	2346741.129	383.461819	1.230989
testCompareMaskNotInt	    NE	ops/s	1674206.146	169.967081	2346609.602	652.964692	1.401625
testCompareMaskNotInt	    LT	ops/s	1684755.085	4939.806653	2345939.728	738.842445	1.392451
testCompareMaskNotInt	    LE	ops/s	1659985.83	2408.542766	2346929.8	192.550397	1.413825
testCompareMaskNotInt	    GT	ops/s	1674460.437	447.120589	2347037.155	342.433085	1.401667
testCompareMaskNotInt	    GE	ops/s	1658699.073	884.268891	2347411.827	281.885914	1.415212
testCompareMaskNotInt	    ULT	ops/s	1677043.66	6215.834359	2347155.384	425.141786	1.399579
testCompareMaskNotInt	    ULE	ops/s	1667049.76	9521.094204	2346815.213	316.03901	1.407765
testCompareMaskNotInt	    UGT	ops/s	1661045.828	3669.548525	2346711.365	2808.608132	1.412791
testCompareMaskNotInt	    UGE	ops/s	1663715.691	4570.73053	2347096.847	191.804359	1.410755
testCompareMaskNotLong	    EQ	ops/s	885668.5947	203.053456	1174274.006	113.51354	1.325861
testCompareMaskNotLong	    NE	ops/s	837449.9353	198.611966	1174330.269	106.514374	1.402269
testCompareMaskNotLong	    LT	ops/s	846790.2128	7005.585657	1174290.879	93.56413	1.386755
testCompareMaskNotLong	    LE	ops/s	851253.2346	7624.045467	1174162.355	179.854316	1.379333
testCompareMaskNotLong	    GT	ops/s	837715.7563	4272.558281	1173797.819	289.311518	1.401188
testCompareMaskNotLong	    GE	ops/s	883137.593	14804.63746	1174216.909	86.404559	1.329596
testCompareMaskNotLong	    ULT	ops/s	872478.9017	4955.722542	1174341.995	124.656933	1.345983
testCompareMaskNotLong	    ULE	ops/s	866570.738	12541.58528	1174185.197	594.850706	1.354979
testCompareMaskNotLong	    UGT	ops/s	866389.0927	3971.492766	1174210.803	153.960084	1.355292
testCompareMaskNotLong	    UGE	ops/s	848339.3876	4555.514721	1174060.638	240.326562	1.383951
testCompareMaskNotShort	    EQ	ops/s	3336170.783	2286.717236	4684904.156	2134.72575	1.404275
testCompareMaskNotShort	    NE	ops/s	3334775.472	717.588615	4690264.12	3017.756867	1.40647
testCompareMaskNotShort	    LT	ops/s	3334619.058	1138.901707	4685883.864	3808.321694	1.405223
testCompareMaskNotShort	    LE	ops/s	3335538.353	538.676789	4688238.934	1029.406266	1.405541
testCompareMaskNotShort	    GT	ops/s	3301425.217	694.060525	4689167.049	2845.363801	1.420346
testCompareMaskNotShort	    GE	ops/s	3301580.972	317.042851	4688970.211	1292.83929	1.420219
testCompareMaskNotShort	    ULT	ops/s	3336318.051	892.515034	4687549.384	1403.281648	1.405006
testCompareMaskNotShort	    ULE	ops/s	3335188.292	972.230191	4684723.63	3937.599084	1.404635
testCompareMaskNotShort	    UGT	ops/s	3334490.656	930.409628	4688058.378	1166.776081	1.405929
testCompareMaskNotShort	    UGE	ops/s	3333050.033	3146.019596	4689197.9	456.439188	1.406878


With option `-XX:UseSVE=0`:

Benchmark	COMPARISON_OP	Unit	Before Score	Before Error	After Score	After Error	Uplift
testCompareMaskNotDouble	EQ	ops/s	788505.9464	579.254839	769969.5798	138.792325	0.976491
testCompareMaskNotDouble	NE	ops/s	655499.7935	471.970429	915086.3257	183.495964	1.396013
testCompareMaskNotDouble	LT	ops/s	788418.7889	574.263314	789271.7448	51.838991	1.001081
testCompareMaskNotDouble	LE	ops/s	789144.8431	45.334181	789326.1963	84.148011	1.000229
testCompareMaskNotDouble	GT	ops/s	788690.8485	662.950083	789246.9812	99.060588	1.000705
testCompareMaskNotDouble	GE	ops/s	789421.2387	94.012868	789166.4717	111.772533	0.999677
testCompareMaskNotFloat	    EQ	ops/s	1816132.864	1298.2187	1816461.601	311.706275	1.000181
testCompareMaskNotFloat	    NE	ops/s	1550767.697	1142.987761	2301429.148	159.71525	1.484057
testCompareMaskNotFloat	    LT	ops/s	1815531.685	1370.868745	1817187.121	761.68401	1.000911
testCompareMaskNotFloat	    LE	ops/s	1817937.722	484.638134	1817703.209	625.275639	0.999871
testCompareMaskNotFloat	    GT	ops/s	1818618.89	724.324392	1817977.851	481.152488	0.999647
testCompareMaskNotFloat	    GE	ops/s	1815118.411	1327.945736	1817476.414	510.712942	1.001299
testCompareMaskNotByte	    EQ	ops/s	6489599.571	5127.815254	6535895.286	17029.15534	1.007133
testCompareMaskNotByte	    NE	ops/s	9089974.523	4069.346579	15945662.17	22867.48282	1.754203
testCompareMaskNotByte	    LT	ops/s	6499040.898	1250.085336	15939338.57	17451.05939	2.452567
testCompareMaskNotByte	    LE	ops/s	6493612.339	4928.466061	15926355.01	27249.57103	2.452618
testCompareMaskNotByte	    GT	ops/s	6494486.565	5229.4598	15957497.14	6893.237334	2.457083
testCompareMaskNotByte	    GE	ops/s	6499295.661	1030.044749	15903755.01	46454.70992	2.446996
testCompareMaskNotByte	    ULT	ops/s	6494212.684	5194.712704	15944816.71	3467.818892	2.455234
testCompareMaskNotByte	    ULE	ops/s	6493882.576	5092.839387	15936419.25	22755.34523	2.454066
testCompareMaskNotByte	    UGT	ops/s	6493479.899	4678.096391	15958133.18	3483.353667	2.457562
testCompareMaskNotByte	    UGE	ops/s	6500338.419	709.344957	15968155.27	14020.47085	2.456511
testCompareMaskNotInt	    EQ	ops/s	1830787.273	237.597163	1878452.588	142.728192	1.026035
testCompareMaskNotInt	    NE	ops/s	1615081.395	1219.871461	2360913.712	199.556675	1.461792
testCompareMaskNotInt	    LT	ops/s	1827819.867	1360.728526	2360561.422	248.025925	1.291462
testCompareMaskNotInt	    LE	ops/s	1830975.648	416.987529	2360703.924	194.958346	1.289314
testCompareMaskNotInt	    GT	ops/s	1830633.964	301.849017	2360552.203	224.908655	1.289472
testCompareMaskNotInt	    GE	ops/s	1829476.495	1348.361278	2360673.736	137.538696	1.290354
testCompareMaskNotInt	    ULT	ops/s	1829137.773	1285.55232	2360615.95	162.876291	1.290562
testCompareMaskNotInt	    ULE	ops/s	1828107.468	1360.867847	2360790.337	297.267481	1.291384
testCompareMaskNotInt	    UGT	ops/s	1829659.222	1459.098806	2361025.107	266.158075	1.290417
testCompareMaskNotInt	    UGE	ops/s	1829548.187	1427.266787	2360941.943	242.380469	1.29045
testCompareMaskNotLong	    EQ	ops/s	810439.9121	82.577412	802287.4993	73.462086	0.98994
testCompareMaskNotLong	    NE	ops/s	681643.6089	485.657471	932324.6973	158.28799	1.367759
testCompareMaskNotLong	    LT	ops/s	809850.546	680.71673	931404.3219	685.591444	1.150094
testCompareMaskNotLong	    LE	ops/s	810584.5191	115.234753	932234.2412	105.451172	1.150076
testCompareMaskNotLong	    GT	ops/s	810593.5376	117.947863	931879.1829	553.397713	1.149625
testCompareMaskNotLong	    GE	ops/s	810435.8405	81.88737	931833.0348	177.765694	1.149792
testCompareMaskNotLong	    ULT	ops/s	810429.8459	90.005329	932127.5278	74.443387	1.150164
testCompareMaskNotLong	    ULE	ops/s	809740.842	411.655134	932231.6607	76.044104	1.151271
testCompareMaskNotLong	    UGT	ops/s	810493.4369	52.024062	932239.1709	143.915229	1.150211
testCompareMaskNotLong	    UGE	ops/s	810442.0661	64.064396	932361.567	119.570287	1.150435
testCompareMaskNotShort	    EQ	ops/s	4786426.182	299.050738	4694123.013	482.608634	0.980715
testCompareMaskNotShort	    NE	ops/s	3808932.807	2993.590606	5672255.469	6262.526335	1.489198
testCompareMaskNotShort	    LT	ops/s	4782535.485	3699.104322	5668474.071	11101.86452	1.185244
testCompareMaskNotShort	    LE	ops/s	4782896.891	3338.57484	5669188.434	6309.723399	1.185304
testCompareMaskNotShort	    GT	ops/s	4778532.318	3571.547653	5680482.703	10427.66734	1.18875
testCompareMaskNotShort	    GE	ops/s	4786150.851	794.769881	5664644.919	6542.434538	1.183549
testCompareMaskNotShort	    ULT	ops/s	4783623.78	3582.962421	5668267.123	17841.44773	1.184931
testCompareMaskNotShort	    ULE	ops/s	4782752.125	3610.296618	5666231.302	6964.505363	1.184721
testCompareMaskNotShort	    UGT	ops/s	4782469.332	2913.37576	5655837.96	6494.608864	1.182618
testCompareMaskNotShort	    UGE	ops/s	4782606.35	3491.774067	5667295.182	14176.96543	1.18498


On an AMD EPYC 9124 16-Core Processor:
With option `-XX:UseAVX=3`:

Benchmark	COMPARISON_OP	Unit	Before Score	Before Error	After Score	After Error	Uplift
testCompareMaskNotDouble	EQ	ops/s	2166357.886	27577.51358	2920183.192	38491.49083	1.347968
testCompareMaskNotDouble	NE	ops/s	2177325.341	32771.27023	2965747.932	39271.62615	1.362106
testCompareMaskNotDouble	LT	ops/s	2123834.711	22890.39919	2197099.169	29107.41329	1.034496
testCompareMaskNotDouble	LE	ops/s	2172931.681	32912.05647	2121686.057	34927.37781	0.976416
testCompareMaskNotDouble	GT	ops/s	2164924.662	30925.91899	2124062.892	37135.0458	0.981125
testCompareMaskNotDouble	GE	ops/s	2150619.038	35515.09022	2192636.533	38672.85716	1.019537
testCompareMaskNotFloat	    EQ	ops/s	4518378.764	74733.72389	6724589.409	50424.63568	1.488274
testCompareMaskNotFloat	    NE	ops/s	4522823.224	78138.66727	6907565.257	203953.3299	1.527268
testCompareMaskNotFloat	    LT	ops/s	4587473.545	62621.25938	4431658.918	52760.23989	0.966034
testCompareMaskNotFloat	    LE	ops/s	4472078.986	79338.23304	4472390.043	66247.285	1.000069
testCompareMaskNotFloat	    GT	ops/s	4451744.39	220787.9755	4440866.486	58674.19154	0.997556
testCompareMaskNotFloat	    GE	ops/s	4459601.349	57873.05167	4481398.426	76819.69285	1.004887
testCompareMaskNotByte	    EQ	ops/s	19415317.92	356367.4937	20649319.86	240515.9459	1.063558
testCompareMaskNotByte	    NE	ops/s	19401162.58	362571.8103	21010358.2	71221.35255	1.082943
testCompareMaskNotByte	    LT	ops/s	19175612.37	273080.6175	20235838.72	396190.6101	1.05529
testCompareMaskNotByte	    LE	ops/s	19036831.33	121135.0491	20674528.84	248839.9471	1.086027
testCompareMaskNotByte	    GT	ops/s	19008302.3	124633.9182	20671390.89	271644.5576	1.087492
testCompareMaskNotByte	    GE	ops/s	19590753.42	429156.452	20491615.07	332912.82	1.045984
testCompareMaskNotByte	    ULT	ops/s	19431604.06	421396.5487	20575805.9	248466.2368	1.058883
testCompareMaskNotByte	    ULE	ops/s	19060425.47	98309.75469	20774930.43	206596.0422	1.089951
testCompareMaskNotByte	    UGT	ops/s	19266788.04	362893.3051	20861521.87	106977.3707	1.082771
testCompareMaskNotByte	    UGE	ops/s	19127964.33	447774.3747	20791221.56	254458.0132	1.086954
testCompareMaskNotInt	    EQ	ops/s	4473402.48	84902.77154	7191777.028	94315.13878	1.607674
testCompareMaskNotInt	    NE	ops/s	4583165.363	73491.79073	7249884.988	80028.31191	1.581851
testCompareMaskNotInt	    LT	ops/s	4618634.192	81869.82512	7242567.732	71211.3697	1.568118
testCompareMaskNotInt	    LE	ops/s	4650524.195	72302.56692	7154948.491	83057.90635	1.538525
testCompareMaskNotInt	    GT	ops/s	4534752.486	94449.20198	7004428.251	38365.18576	1.54461
testCompareMaskNotInt	    GE	ops/s	4540777.389	86331.11847	7129527.341	74343.06996	1.570111
testCompareMaskNotInt	    ULT	ops/s	4528175.644	114213.6504	7220013.98	82850.22587	1.594464
testCompareMaskNotInt	    ULE	ops/s	4619335.448	74203.98889	7118543.128	54457.43284	1.541031
testCompareMaskNotInt	    UGT	ops/s	4572521.254	122912.75	7154797.741	98858.3477	1.564737
testCompareMaskNotInt	    UGE	ops/s	4579627.842	80558.04554	7179020.593	99239.23499	1.567599
testCompareMaskNotLong	    EQ	ops/s	2103965.347	17059.28178	2997338.009	32388.42725	1.424613
testCompareMaskNotLong	    NE	ops/s	2174434.633	36011.24708	2984460.593	29074.42994	1.372522
testCompareMaskNotLong	    LT	ops/s	2110937.378	56642.0052	3020690.893	31167.62537	1.430971
testCompareMaskNotLong	    LE	ops/s	2153414.166	31280.20562	2971696.162	31176.24605	1.379992
testCompareMaskNotLong	    GT	ops/s	2166028.207	49432.18925	3008018.282	26534.78551	1.388725
testCompareMaskNotLong	    GE	ops/s	2178206.136	35757.6799	2933186.687	19824.26727	1.346606
testCompareMaskNotLong	    ULT	ops/s	2104344.728	31405.7728	2964354.007	26871.18289	1.408682
testCompareMaskNotLong	    ULE	ops/s	2210232.578	21993.95777	3032635.261	25545.43656	1.372088
testCompareMaskNotLong	    UGT	ops/s	2167177.931	44896.90807	2996245.236	34153.68941	1.382556
testCompareMaskNotLong	    UGE	ops/s	2117175.328	26131.1893	2977492.164	23227.65519	1.406351
testCompareMaskNotShort	    EQ	ops/s	8131234.179	185997.1777	12414378.38	122648.1579	1.526752
testCompareMaskNotShort	    NE	ops/s	8506016.656	236481.383	12720442.64	322747.8776	1.495464
testCompareMaskNotShort	    LT	ops/s	8487868.819	244943.6097	12150479.62	244300.5456	1.431511
testCompareMaskNotShort	    LE	ops/s	8549184.557	286833.466	12358019.06	136683.2112	1.44552
testCompareMaskNotShort	    GT	ops/s	8375447.45	221237.073	12602058.97	385690.3318	1.504643
testCompareMaskNotShort	    GE	ops/s	8123474.548	127727.1461	12799747.64	197940.1001	1.575649
testCompareMaskNotShort	    ULT	ops/s	8491650.422	313124.2425	12751186.59	255845.1653	1.501614
testCompareMaskNotShort	    ULE	ops/s	8363009.676	203670.1995	12675908.7	279496.9925	1.515711
testCompareMaskNotShort	    UGT	ops/s	8332268.933	279787.2503	12279451.4	436971.6582	1.473722
testCompareMaskNotShort	    UGE	ops/s	8931588.505	203962.9257	12324437.67	330723.3066	1.37987
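
For readers checking the tables: the Uplift column appears to be the "after" score divided by the "before" score, so values above 1.0 are a throughput gain and values below 1.0 a loss. A minimal sketch of that calculation follows. The class and method names are hypothetical and are not part of the PR or the benchmark; the input values are copied from the `-XX:UseSVE=2` table.

```java
import java.util.Locale;

public class Uplift {
    // Assumed definition: ratio of throughput after the change to throughput before it.
    static double uplift(double beforeOpsPerSec, double afterOpsPerSec) {
        return afterOpsPerSec / beforeOpsPerSec;
    }

    public static void main(String[] args) {
        // testCompareMaskNotDouble, EQ, -XX:UseSVE=2 (a gain)
        System.out.println(String.format(Locale.ROOT, "%.6f", uplift(908008.7644, 1175289.515)));
        // testCompareMaskNotDouble, GT, -XX:UseSVE=2 (a small loss)
        System.out.println(String.format(Locale.ROOT, "%.6f", uplift(877068.5325, 865832.864)));
    }
}
```

Note that for rows with an uplift close to 1.0, the difference between the two scores is of the same order as the reported errors, so those rows are best read as unchanged.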

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24674#issuecomment-3291304777

