RFR: 8277997: Intrinsic creation for VectorMask.fromLong API
Paul Sandoz
psandoz at openjdk.java.net
Wed Dec 1 23:09:26 UTC 2021
On Wed, 1 Dec 2021 18:23:27 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:
> Summary of changes:
>
> 1) Inline expansion of the VectorMask.fromLong API; this includes the Java API implementation and C2 IR changes.
> 2) x86 backend support for AVX512 and AVX2 targets.
> 3) New IR transformations to handle the following patterns:
> a) Mask2Long + Long2Mask -> MaskCast (when source and destination mask lengths are equal)
> b) Long2Mask + Mask2Long -> Long
> 4) The following performance data was collected with the new JMH micro-benchmark included in the patch:
>
> System configuration: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (40C, 2S, Ice Lake server)
>
> Benchmark | Baseline AVX2 (ops/ms) | With opt AVX2 (ops/ms) | Gain (with opt / baseline) | Baseline AVX3 (ops/ms) | With opt AVX3 (ops/ms) | Gain (with opt / baseline)
> -- | -- | -- | -- | -- | -- | --
> MaskFromLongBenchmark.microMaskFromLong_Byte128 | 20050.884 | 36414.349 | 1.816096936 | 19699.631 | 36412.252 | 1.848372287
> MaskFromLongBenchmark.microMaskFromLong_Byte256 | 17589.496 | 36418.368 | 2.070461143 | 17211.451 | 36407.44 | 2.115303352
> MaskFromLongBenchmark.microMaskFromLong_Byte512 | 2824.411 | 2492.795 | 0.882589326 | 6359.071 | 36405.344 | 5.72494693
> MaskFromLongBenchmark.microMaskFromLong_Byte64 | 23507.28 | 36424.668 | 1.549505855 | 22659.666 | 36420.345 | 1.607276338
> MaskFromLongBenchmark.microMaskFromLong_Integer128 | 24567.895 | 36411.602 | 1.482080659 | 24620.619 | 36397.005 | 1.478313969
> MaskFromLongBenchmark.microMaskFromLong_Integer256 | 23495.078 | 36411.981 | 1.549770595 | 22823.846 | 36395.703 | 1.594634971
> MaskFromLongBenchmark.microMaskFromLong_Integer512 | 12377.022 | 11478.101 | 0.927371786 | 19701.118 | 36394.878 | 1.847350897
> MaskFromLongBenchmark.microMaskFromLong_Integer64 | 22169.231 | 17791.849 | 0.802546962 | 23603.169 | 18055.166 | 0.76494669
> MaskFromLongBenchmark.microMaskFromLong_Long128 | 22312.568 | 17859.474 | 0.800422166 | 22171.303 | 18106.295 | 0.816654529
> MaskFromLongBenchmark.microMaskFromLong_Long256 | 24271.19 | 36416.883 | 1.500416049 | 24621.327 | 36390.41 | 1.478003602
> MaskFromLongBenchmark.microMaskFromLong_Long512 | 15289.749 | 13860.775 | 0.906540389 | 23003.816 | 36396.033 | 1.582173714
> MaskFromLongBenchmark.microMaskFromLong_Long64 | 27086.471 | 20490.828 | 0.756496777 | 27177.133 | 20441.112 | 0.752143797
> MaskFromLongBenchmark.microMaskFromLong_Short128 | 23504.216 | 36412.66 | 1.549196961 | 22823.401 | 36417.799 | 1.595634191
> MaskFromLongBenchmark.microMaskFromLong_Short256 | 20056.61 | 36403.277 | 1.815026418 | 19699.502 | 36412.605 | 1.84840231
> MaskFromLongBenchmark.microMaskFromLong_Short512 | 4775.721 | 6827.594 | 1.429646749 | 17209.782 | 36388.226 | 2.114392036
> MaskFromLongBenchmark.microMaskFromLong_Short64 | 24759.049 | 36381.539 | 1.469423927 | 24506.013 | 36413.099 | 1.48588426
>
> Kindly review and share feedback.
>
> Best Regards,
> Jatin
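The two folds listed in item 3 of the quoted summary can be checked against a plain-Java model of the mask semantics. The sketch below is hypothetical: it represents a mask as a `boolean[]` instead of using `jdk.incubator.vector`, and its `fromLong`/`toLong` helpers are stand-ins that assume the documented `VectorMask` behaviour (lane N is set iff bit N of the long is set, bits at or above the lane count are ignored, and `toLong` packs at most 64 lanes with lane 0 in the least significant bit). Under that assumption, `toLong(fromLong(bits))` returns `bits` only when its upper bits are already zero.

```java
import java.util.Arrays;

// Hypothetical plain-Java model of VectorMask.fromLong / toLong semantics.
// A mask is modelled as boolean[]; this is NOT the jdk.incubator.vector API.
public class MaskLongRoundTrip {

    // Lane i is set iff bit i of bits is set; bits at or above laneCount are ignored.
    static boolean[] fromLong(int laneCount, long bits) {
        boolean[] mask = new boolean[laneCount];
        for (int i = 0; i < laneCount; i++) {
            mask[i] = ((bits >>> i) & 1L) != 0;
        }
        return mask;
    }

    // Packs at most 64 lanes into a long, lane 0 in the least significant bit.
    static long toLong(boolean[] mask) {
        long bits = 0L;
        for (int i = 0; i < mask.length && i < 64; i++) {
            if (mask[i]) {
                bits |= 1L << i;
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        // Fold (a): Mask2Long followed by Long2Mask with the same lane count
        // reproduces the original mask.
        boolean[] m = {true, false, true, true, false, false, false, true};
        boolean[] roundTripped = fromLong(m.length, toLong(m));
        System.out.println(Arrays.equals(m, roundTripped)); // true

        // Fold (b): Long2Mask followed by Mask2Long reproduces the long
        // when the bits above the lane count are zero.
        long bits = 0b1011_0101L;
        System.out.println(toLong(fromLong(8, bits)) == bits); // true

        // With stray high bits, the round trip clears them.
        long wide = 0xF00L | bits;
        System.out.println(Long.toHexString(toLong(fromLong(8, wide)))); // b5
    }
}
```

In this model, fold (b) is an exact identity only for 64-lane masks or for inputs whose high bits are already clear, which is why a compiler fold of that shape would need the input bits to be masked to the lane count beforehand.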
Arguably, broadcasting is not the correct term for converting a long value to a mask, but it is very convenient to reuse `VectorSupport.broadcastCoerced`, and I don't have a better solution in that regard. Adding a new intrinsic seems overly heavy.
We could rename it to `fromBitsCoerced`, and then rename the `bitwise` parameter to `mode`.
Can we define named constants on both the Java and HotSpot sides: `0` for broadcasting and `1` for mask conversion, e.g. `BITS_COERCED_BROADCAST = 0` and `BITS_COERCED_MASK_TO_LONG = 1`?
This potentially allows for future modes, such as broadcasting only to the first lane.
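To make the proposal concrete, here is a hypothetical sketch of a single `fromBitsCoerced` entry point that dispatches on a `mode` argument using the suggested constant names. It does not reproduce the real `VectorSupport` intrinsic signature (which also takes class and species arguments); lanes are modelled as a `long[]`, and a set mask lane is modelled as `1`, purely for illustration.

```java
import java.util.Arrays;

// Hypothetical model of a mode-dispatched fromBitsCoerced entry point.
// Not the real VectorSupport signature: lanes are modelled as long[].
public class FromBitsCoercedSketch {

    // Named mode constants, using the names proposed in the review.
    static final int BITS_COERCED_BROADCAST = 0;
    static final int BITS_COERCED_MASK_TO_LONG = 1; // long bits -> mask lanes

    static long[] fromBitsCoerced(long bits, int mode, int laneCount) {
        long[] lanes = new long[laneCount];
        switch (mode) {
            case BITS_COERCED_BROADCAST:
                // Every lane receives the same value.
                Arrays.fill(lanes, bits);
                break;
            case BITS_COERCED_MASK_TO_LONG:
                // Lane i is 1 (set) iff bit i of bits is set.
                for (int i = 0; i < laneCount; i++) {
                    lanes[i] = (bits >>> i) & 1L;
                }
                break;
            default:
                // Room for future modes, e.g. broadcasting only to lane 0.
                throw new IllegalArgumentException("unknown mode: " + mode);
        }
        return lanes;
    }

    public static void main(String[] args) {
        // prints [7, 7, 7, 7]
        System.out.println(Arrays.toString(fromBitsCoerced(7L, BITS_COERCED_BROADCAST, 4)));
        // prints [1, 0, 1, 0]
        System.out.println(Arrays.toString(fromBitsCoerced(0b0101L, BITS_COERCED_MASK_TO_LONG, 4)));
    }
}
```

Keeping the mode as an `int` constant rather than a boolean leaves the existing call shape intact while letting both the Java code and the HotSpot side switch on the same named values when a new mode is added.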
-------------
PR: https://git.openjdk.java.net/jdk/pull/6646