RFR: 8366588: VectorAPI: Re-intrinsify VectorMask.laneIsSet where the input index is a variable

erifan duke at openjdk.org
Mon Sep 8 01:25:18 UTC 2025


On Fri, 5 Sep 2025 10:12:35 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> Intrinsic support for `VectorMask.laneIsSet` with a **variable** input index was introduced in PR #14200, but was inadvertently broken by PR #25673. This PR restores the intrinsic functionality and adds some JTReg tests.
>> 
>> Benchmarks on Nvidia Grace machine with 128-bit SVE:
>> 
>> Benchmark                        Unit    Before       Score Error  After        Score Error  Uplift
>> microMaskLaneIsSetByte128_var    ops/ms  21702.14415  91.902159    103472.9391  36.057447    4.767867
>> microMaskLaneIsSetByte64_var     ops/ms  21468.51868  107.94177    103365.6561  69.47736     4.814754
>> microMaskLaneIsSetDouble128_var  ops/ms  77489.32791  153.242699   413499.4127  311.854079   5.336211
>> microMaskLaneIsSetFloat128_var   ops/ms  41034.95204  399.421823   206840.0988  74.702234    5.040583
>> microMaskLaneIsSetFloat64_var    ops/ms  77607.40268  175.938921   413745.3001  149.716794   5.33126
>> microMaskLaneIsSetInt128_var     ops/ms  41452.48893  76.143208    206845.9754  59.371129    4.989953
>> microMaskLaneIsSetInt64_var      ops/ms  77726.2542   173.180518   413427.8838  363.575023   5.319024
>> microMaskLaneIsSetLong128_var    ops/ms  77646.11218  177.496587   413403.4404  236.609314   5.3242
>> microMaskLaneIsSetShort128_var   ops/ms  21374.93265  48.13101     103417.4618  34.827021    4.838259
>> microMaskLaneIsSetShort64_var    ops/ms  41066.19395  353.320621   206801.109   106.408938   5.035799
>> 
>> 
>> Benchmarks on Intel 6444y machine with 512-bit avx3:
>> 
>> Benchmark                        Unit    Before       Score Error  After        Score Error  Uplift
>> microMaskLaneIsSetByte128_var    ops/ms  57658.45497  240.209309   211643.8406  29.214532    3.670647
>> microMaskLaneIsSetByte256_var    ops/ms  57451.68169  116.994128   211609.4652  160.48513    3.683259
>> microMaskLaneIsSetByte512_var    ops/ms  57530.22411  311.63868    199802.8084  408.144015   3.473005
>> microMaskLaneIsSetByte64_var     ops/ms  57642.2672   161.406221   205252.4464  196.86852    3.560797
>> microMaskLaneIsSetDouble256_var  ops/ms  114401.3789  231.797375   361400.344   565.593984   3.159055
>> microMaskLaneIsSetDouble512_var  ops/ms  57379.27882  159.699503   211476.1138  136.980026   3.685583
>> microMaskLaneIsSetFloat128_var   ops/ms  113943.9512  141.062663   360855.3915  494.471996   3.166955
>> microMaskLaneIsSetFloat256_var   ops/ms  57682.78182  138.142053   211659.5098  30.167972    3.66937
>> microMaskLaneIsSetFloat512_var   ops/ms  57617.66405  301.748599   211246.8588  597.18949    3.666355
>> microMaskLaneIsSetInt128_var     ops/ms  113914.5062  118.681382   360856.4465  555.097397   3.167783
>> microMaskLaneIsSetInt256_var     ops/ms  57681.79883  112.391639   211555.6742  217.556981   3.667633
>> microMaskLaneIsSetInt512_var     ops/ms  573...
>
> test/micro/org/openjdk/bench/jdk/incubator/vector/VectorExtractBenchmark.java line 34:
> 
>> 32: @Warmup(iterations = 5, time = 1)
>> 33: @Measurement(iterations = 5, time = 1)
>> 34: @Fork(value = 1, jvmArgs = {"--add-modules=jdk.incubator.vector"})
> 
> Don't do 1 fork, do at least 3.

The results show that this benchmark is stable, so I think a single fork is enough? Many of our existing JMH benchmarks also fork only once.
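
As a rough way to put a number on "stable", the reported score error can be expressed as a percentage of the score. The sketch below is only an illustration: the class and method names (`StabilityCheck`, `relativeErrorPercent`, `uplift`) are hypothetical and not part of the PR, and the three rows are copied from the Grace table in the PR description.

```java
import java.util.Locale;

class StabilityCheck {
    // Score error expressed as a percentage of the score.
    static double relativeErrorPercent(double score, double error) {
        return 100.0 * error / score;
    }

    // Ratio of the "after" score to the "before" score.
    static double uplift(double before, double after) {
        return after / before;
    }

    public static void main(String[] args) {
        // Rows copied from the Grace table: before, before error, after, after error.
        String[] names = {
            "microMaskLaneIsSetByte128_var",
            "microMaskLaneIsSetFloat128_var",
            "microMaskLaneIsSetInt128_var",
        };
        double[][] rows = {
            {21702.14415, 91.902159, 103472.9391, 36.057447},
            {41034.95204, 399.421823, 206840.0988, 74.702234},
            {41452.48893, 76.143208, 206845.9754, 59.371129},
        };
        for (int i = 0; i < names.length; i++) {
            double[] r = rows[i];
            System.out.printf(Locale.ROOT,
                "%s: before +/- %.2f%%, after +/- %.2f%%, uplift %.2fx%n",
                names[i],
                relativeErrorPercent(r[0], r[1]),
                relativeErrorPercent(r[2], r[3]),
                uplift(r[0], r[2]));
        }
    }
}
```

This only looks at the error reported within a single fork, so it cannot by itself rule out fork-to-fork variation.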

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27113#discussion_r2328949227

