[vectorIntrinsics+mask] RFR: 8273949: Intrinsic creation for VectorMask.toLong operation. [v3]

Jatin Bhateja jbhateja at openjdk.java.net
Wed Sep 22 04:29:17 UTC 2021


On Mon, 20 Sep 2021 19:28:40 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> Summary of changes:
>> - Intrinsification of VectorMask.toLong() API.
>> - Supports inline expansion for both AVX512 and non-AVX512 targets.
>> - Used toLong() API to optimize existing Java API implementation of VectorMask.laneIsSet() operation.
>> 
>> The following performance numbers were generated using the JMH benchmark modification included with the patch.
>> 
>> System:  Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz (Cascade Lake Server 28C 2S)
>> 
>> Benchmark | VECSIZE | Baseline Score (ops/ms) | With Opt (ops/ms) | Gain Ratio
>> -- | -- | -- | -- | --
>> MaskQueryOperationsBenchmark.testToLongByte | 128 | 90451.424 | 346941.379 | 3.835665196
>> MaskQueryOperationsBenchmark.testToLongByte | 256 | 63127.764 | 338331.425 | 5.359471072
>> MaskQueryOperationsBenchmark.testToLongByte | 512 | 40543.264 | 313836.333 | 7.740776199
>> MaskQueryOperationsBenchmark.testToLongLong | 128 | 171989.714 | 152872.758 | 0.88884826
>> MaskQueryOperationsBenchmark.testToLongLong | 256 | 164702.273 | 324794.578 | 1.972010295
>> MaskQueryOperationsBenchmark.testToLongLong | 512 | 122667.916 | 318060.096 | 2.59285481
>> MaskQueryOperationsBenchmark.testToLongShort | 128 | 122656.408 | 346691.082 | 2.826522378
>> MaskQueryOperationsBenchmark.testToLongShort | 256 | 96838.555 | 360909.28 | 3.726917239
>> MaskQueryOperationsBenchmark.testToLongShort | 512 | 63119.009 | 313075.159 | 4.960077225
>> MaskQueryOperationsBenchmark.testToLonglong | 128 | 180855.623 | 324620.433 | 1.794914792
>> MaskQueryOperationsBenchmark.testToLonglong | 256 | 122705.631 | 324315.916 | 2.643040204
>> MaskQueryOperationsBenchmark.testToLonglong | 512 | 90396.687 | 324318.095 | 3.58772103
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
> 
>   8273949: Review comments resolution.

@nsjian, kindly check the IR change. It may need a corresponding change in the AArch64 backend, since VectorStoreMask is no longer inserted when connecting a mask-generating node to a mask operation node. This avoids a redundant store-mask operation if the target supports predicate registers.
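
For readers unfamiliar with the operation being intrinsified, here is a minimal scalar sketch of what the summary quoted above describes: toLong() packs lane i of the mask into bit i of a long, and laneIsSet() can then be answered by testing a single bit of that value. This is only an illustrative model, not the code in the patch. The class name `MaskToLongSketch`, the `boolean[]` stand-in for a mask, and the exception types are assumptions made for the example; the real API is `VectorMask` in `jdk.incubator.vector`.

```java
// Hypothetical scalar model of the mask operations discussed in this PR.
// It does not use jdk.incubator.vector; a boolean[] stands in for a VectorMask.
final class MaskToLongSketch {

    // Packs lane i into bit i of the result (least significant bit first).
    // This model rejects masks with more than 64 lanes, since they cannot fit in a long.
    static long toLong(boolean[] lanes) {
        if (lanes.length > Long.SIZE) {
            throw new IllegalArgumentException("more than 64 lanes");
        }
        long bits = 0L;
        for (int i = 0; i < lanes.length; i++) {
            if (lanes[i]) {
                bits |= 1L << i;
            }
        }
        return bits;
    }

    // Answers a lane query by testing one bit of the packed value,
    // mirroring the idea of building laneIsSet() on top of toLong().
    static boolean laneIsSet(boolean[] lanes, int i) {
        if (i < 0 || i >= lanes.length) {
            throw new IndexOutOfBoundsException("lane index " + i);
        }
        return ((toLong(lanes) >>> i) & 1L) != 0;
    }

    public static void main(String[] args) {
        boolean[] lanes = {true, false, true, true};
        System.out.println(Long.toBinaryString(toLong(lanes))); // prints 1101
        System.out.println(laneIsSet(lanes, 1));                // prints false
    }
}
```

Once the mask is available as a single long, a lane query reduces to a shift and a bit test, which is why a fast toLong() also benefits laneIsSet().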

-------------

PR: https://git.openjdk.java.net/panama-vector/pull/126

