RFR: 8277997: Intrinsic creation for VectorMask.fromLong API [v2]
Paul Sandoz
psandoz at openjdk.java.net
Fri Dec 3 20:48:12 UTC 2021
On Fri, 3 Dec 2021 20:22:07 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:
>> Summary of changes:
>>
>> 1) Inline expansion of VectorMask.fromLong API, this includes Java API implementation and C2 IR changes.
>> 2) X86 backend support for AVX512 and AVX2 targets.
>> 3) New IR transformation to handle following patterns:-
>> a) Mask2Long + Long2Mask -> MaskCast (when source and destination mask lengths are equal)
>> b) Long2Mask + Mask2Long -> Long
>> 4) Following performance data is collected for new JMH micro included with the patch:-
>>
>> System Configuration : Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (40C 2S Icelake Server)
>>
>> Benchmark | Baseline AVX2 (ops/ms) | Withopt AVX2 (ops/ms) | Gain factor | Baseline AVX3 (ops/ms) | Withopt AVX3 (ops/ms) | Gain factor
>> -- | -- | -- | -- | -- | -- | --
>> MaskFromLongBenchmark.microMaskFromLong_Byte128 | 20050.884 | 36414.349 | 1.816096936 | 19699.631 | 36412.252 | 1.848372287
>> MaskFromLongBenchmark.microMaskFromLong_Byte256 | 17589.496 | 36418.368 | 2.070461143 | 17211.451 | 36407.44 | 2.115303352
>> MaskFromLongBenchmark.microMaskFromLong_Byte512 | 2824.411 | 2492.795 | 0.882589326 | 6359.071 | 36405.344 | 5.72494693
>> MaskFromLongBenchmark.microMaskFromLong_Byte64 | 23507.28 | 36424.668 | 1.549505855 | 22659.666 | 36420.345 | 1.607276338
>> MaskFromLongBenchmark.microMaskFromLong_Integer128 | 24567.895 | 36411.602 | 1.482080659 | 24620.619 | 36397.005 | 1.478313969
>> MaskFromLongBenchmark.microMaskFromLong_Integer256 | 23495.078 | 36411.981 | 1.549770595 | 22823.846 | 36395.703 | 1.594634971
>> MaskFromLongBenchmark.microMaskFromLong_Integer512 | 12377.022 | 11478.101 | 0.927371786 | 19701.118 | 36394.878 | 1.847350897
>> MaskFromLongBenchmark.microMaskFromLong_Integer64 | 22169.231 | 17791.849 | 0.802546962 | 23603.169 | 18055.166 | 0.76494669
>> MaskFromLongBenchmark.microMaskFromLong_Long128 | 22312.568 | 17859.474 | 0.800422166 | 22171.303 | 18106.295 | 0.816654529
>> MaskFromLongBenchmark.microMaskFromLong_Long256 | 24271.19 | 36416.883 | 1.500416049 | 24621.327 | 36390.41 | 1.478003602
>> MaskFromLongBenchmark.microMaskFromLong_Long512 | 15289.749 | 13860.775 | 0.906540389 | 23003.816 | 36396.033 | 1.582173714
>> MaskFromLongBenchmark.microMaskFromLong_Long64 | 27086.471 | 20490.828 | 0.756496777 | 27177.133 | 20441.112 | 0.752143797
>> MaskFromLongBenchmark.microMaskFromLong_Short128 | 23504.216 | 36412.66 | 1.549196961 | 22823.401 | 36417.799 | 1.595634191
>> MaskFromLongBenchmark.microMaskFromLong_Short256 | 20056.61 | 36403.277 | 1.815026418 | 19699.502 | 36412.605 | 1.84840231
>> MaskFromLongBenchmark.microMaskFromLong_Short512 | 4775.721 | 6827.594 | 1.429646749 | 17209.782 | 36388.226 | 2.114392036
>> MaskFromLongBenchmark.microMaskFromLong_Short64 | 24759.049 | 36381.539 | 1.469423927 | 24506.013 | 36413.099 | 1.48588426
>>
>>
>>
>> Kindly review and share feedback.
>>
>> Best Regards,
>> Jatin
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>
> 8277997: Review comments resolved.
The term "coerced" comes from representing the value to convert (one bit, byte, float, long, etc.) to a mask or vector as a set of bits held in a long value.
Having an extra mode `MODE_BITS_COERCED_BROADCAST` is therefore confusing, and I think you can just reuse `MODE_BROADCAST` when broadcasting the 1 bit to a mask, since the class argument determines whether we are referring to a vector or not, as determined by `is_mask`.
Thus I would retain the existing `scalar2vector` boolean argument, so that the mode stays localized to the intrinsic.
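
To illustrate what I mean by bits held in a long, here is a rough plain-Java sketch. It is not the JDK or C2 code: the class `MaskBitsSketch` and its methods are hypothetical names, a mask is modeled as a `boolean[]`, and the zero-extension of the float bits is my own assumption for the illustration. It shows a value coerced to long bits, a single coerced bit broadcast to every lane of a mask, and the long-to-mask-to-long round trip that the new IR transformation folds away.

```java
import java.util.Arrays;

// Hypothetical model for illustration only; not the Vector API implementation.
class MaskBitsSketch {
    // Lane i of the mask is set iff bit i of 'bits' is set (laneCount <= 64).
    static boolean[] maskFromLong(long bits, int laneCount) {
        boolean[] mask = new boolean[laneCount];
        for (int i = 0; i < laneCount; i++) {
            mask[i] = ((bits >>> i) & 1L) != 0;
        }
        return mask;
    }

    // Inverse direction: pack the lanes back into the low bits of a long.
    static long maskToLong(boolean[] mask) {
        long bits = 0L;
        for (int i = 0; i < mask.length; i++) {
            if (mask[i]) {
                bits |= 1L << i;
            }
        }
        return bits;
    }

    // "Coercing" a float: hold its raw 32 bits in a long.
    // Zero-extension is an assumption made for this sketch.
    static long coerceFloat(float f) {
        return Float.floatToRawIntBits(f) & 0xFFFFFFFFL;
    }

    // Broadcasting one coerced bit to a mask: every lane gets the same value.
    static boolean[] broadcastBit(long bit, int laneCount) {
        boolean[] mask = new boolean[laneCount];
        Arrays.fill(mask, (bit & 1L) != 0);
        return mask;
    }

    public static void main(String[] args) {
        boolean[] m = maskFromLong(0b1010L, 4);
        System.out.println(Arrays.toString(m));
        System.out.println(Long.toBinaryString(maskToLong(m)));
        System.out.println(Long.toHexString(coerceFloat(1.0f)));
        System.out.println(Arrays.toString(broadcastBit(1L, 4)));
    }
}
```

In this model the broadcast of a scalar to a vector and the broadcast of a bit to a mask both start from the same long; only the target class differs, which is why a single broadcast mode seems sufficient to me.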
-------------
PR: https://git.openjdk.java.net/jdk/pull/6646