[vectorIntrinsics+mask] RFR: 8273949: Intrinsic creation for VectorMask.toLong operation. [v3]

Sandhya Viswanathan sviswanathan at openjdk.java.net
Tue Sep 21 21:50:17 UTC 2021


On Mon, 20 Sep 2021 19:28:40 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> Summary of changes:
>> - Intrinsification of VectorMask.toLong() API.
>> - Supports inline expansion for both AVX512 and non-AVX512 targets.
>> - Used the toLong() API to optimize the existing Java implementation of VectorMask.laneIsSet().
>> 
>> The following performance numbers were generated using the JMH benchmark modifications included with the patch.
>> 
>> System: Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz (Cascade Lake Server 28C 2S)
>> 
>> Benchmark | VECSIZE | Baseline Score (ops/ms) | With Opt (ops/ms) | Gain Ratio
>> -- | -- | -- | -- | --
>> MaskQueryOperationsBenchmark.testToLongByte | 128 | 90451.424 | 346941.379 | 3.835665196
>> MaskQueryOperationsBenchmark.testToLongByte | 256 | 63127.764 | 338331.425 | 5.359471072
>> MaskQueryOperationsBenchmark.testToLongByte | 512 | 40543.264 | 313836.333 | 7.740776199
>> MaskQueryOperationsBenchmark.testToLongLong | 128 | 171989.714 | 152872.758 | 0.88884826
>> MaskQueryOperationsBenchmark.testToLongLong | 256 | 164702.273 | 324794.578 | 1.972010295
>> MaskQueryOperationsBenchmark.testToLongLong | 512 | 122667.916 | 318060.096 | 2.59285481
>> MaskQueryOperationsBenchmark.testToLongShort | 128 | 122656.408 | 346691.082 | 2.826522378
>> MaskQueryOperationsBenchmark.testToLongShort | 256 | 96838.555 | 360909.28 | 3.726917239
>> MaskQueryOperationsBenchmark.testToLongShort | 512 | 63119.009 | 313075.159 | 4.960077225
>> MaskQueryOperationsBenchmark.testToLonglong | 128 | 180855.623 | 324620.433 | 1.794914792
>> MaskQueryOperationsBenchmark.testToLonglong | 256 | 122705.631 | 324315.916 | 2.643040204
>> MaskQueryOperationsBenchmark.testToLonglong | 512 | 90396.687 | 324318.095 | 3.58772103
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
> 
>   8273949: Review comments resolution.
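
The summary treats toLong() as the building block for laneIsSet(). As a rough scalar model of the documented semantics (lane i of the mask becomes bit i of the result, for masks of at most 64 lanes), here is a C++ sketch. mask_to_long and lane_is_set are hypothetical helpers over std::vector<bool>, not JDK or HotSpot code; the intrinsic itself operates on vector or opmask registers rather than looping over lanes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical model of VectorMask.toLong(): lane i maps to bit i,
// least significant bit first. Masks wider than 64 lanes cannot be packed.
std::uint64_t mask_to_long(const std::vector<bool>& lanes) {
    if (lanes.size() > 64) {
        throw std::invalid_argument("mask has more than 64 lanes");
    }
    std::uint64_t bits = 0;
    for (std::size_t i = 0; i < lanes.size(); ++i) {
        if (lanes[i]) {
            bits |= std::uint64_t{1} << i;
        }
    }
    return bits;
}

// Hypothetical model of laneIsSet(i) expressed through toLong():
// the answer is bit i of the packed value.
bool lane_is_set(const std::vector<bool>& lanes, std::size_t i) {
    if (i >= lanes.size()) {
        throw std::out_of_range("lane index out of range");
    }
    return ((mask_to_long(lanes) >> i) & 1u) != 0;
}
```

For example, the 4-lane mask {true, false, true, true} packs to 0b1101 (13), and lane_is_set on lane 1 is false. Because the lane index is bounded by the 64-lane limit, the shift amount always stays below 64 and is well-defined.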

src/hotspot/cpu/x86/x86.ad line 8663:

> 8661: %}
> 8662: 
> 8663: instruct vmask_tolong_evex(rRegL dst, kReg mask) %{

The instructs that use rRegL should be guarded by #ifdef _LP64.

src/hotspot/cpu/x86/x86.ad line 8694:

> 8692: %}
> 8693: 
> 8694: instruct vmask_truecount_evex(rRegI dst, kReg mask, rRegL tmp) %{

This should include the flags register (as a killed operand), since the popcnt instruction affects the flags.
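
For context on the popcnt point: per the review, the truecount instructs reduce the mask to a population count of its packed bits. A scalar C++ model of that step, assuming the mask has already been packed the way toLong() packs it (true_count is a hypothetical name, and std::bitset::count stands in for the hardware popcnt):

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// Hypothetical scalar model of VectorMask.trueCount(): with the mask packed
// into a 64-bit value (lane i -> bit i), the number of set lanes is the
// population count of that value. std::bitset::count is a portable stand-in
// for x86 popcnt, which additionally writes the flags: ZF is set when the
// source is zero, and OF, SF, AF, CF and PF are cleared.
int true_count(std::uint64_t packed_mask) {
    return static_cast<int>(std::bitset<64>(packed_mask).count());
}
```

The scalar result carries no flags, but the machine instruction does, so C2 has to learn about the clobber from the instruct's operand list; otherwise a flags value live across the instruction could be silently overwritten.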

src/hotspot/cpu/x86/x86.ad line 8709:

> 8707: 
> 8708: instruct vmask_truecount_avx(rRegI dst, vec mask, rRegL tmp, vec xtmp, vec xtmp1) %{
> 8709:   predicate(n->in(1)->bottom_type()->isa_vectmask() == NULL);

This should also include the flags register.

src/hotspot/share/opto/vectornode.hpp line 963:

> 961:  public:
> 962:   VectorMaskToLongNode(Node* mask, const Type* ty):
> 963:     VectorMaskOpNode(mask, ty, Op_VectorMaskLastTrue) {}

Shouldn't this be Op_VectorMaskToLong?

src/hotspot/share/prims/vectorSupport.cpp line 441:

> 439:         case T_FLOAT: // fall-through
> 440:         case T_DOUBLE: return Op_VectorMaskToLong;
> 441:         default: fatal("MASK_TRUECOUNT: %s", type2name(bt));

Shouldn't the message say MASK_TOLONG here? Also, the break is missing.
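
A hypothetical reduction of why the missing break matters in a nested switch. The enums, strings and select_opcode() below are invented for illustration and do not reproduce vectorSupport.cpp. To make the fall-through observable, the error path records a message instead of aborting; in HotSpot fatal() stops the VM, so this shows the general hazard rather than a reported failure.

```cpp
#include <cassert>
#include <string>

// Illustrative stand-ins, not HotSpot types.
enum class MaskOp { ToLong, TrueCount };
enum class ElemType { Byte, Float, Object };

std::string select_opcode(MaskOp op, ElemType bt, bool end_case_with_break) {
    std::string result = "no match";
    switch (op) {
        case MaskOp::ToLong:
            switch (bt) {
                case ElemType::Byte:   // fall-through
                case ElemType::Float:  return "Op_VectorMaskToLong";
                default:               result = "error MASK_TOLONG"; break;
            }
            if (end_case_with_break) {
                break;  // the fix: leave the outer switch here
            }
            // Missing break: control falls through into the next outer case.
        case MaskOp::TrueCount:
            switch (bt) {
                case ElemType::Byte:   // fall-through
                case ElemType::Float:  return "Op_VectorMaskTrueCount";
                default:               result = "error MASK_TRUECOUNT"; break;
            }
            break;
    }
    return result;
}
```

With end_case_with_break set to false, select_opcode(MaskOp::ToLong, ElemType::Object, false) returns "error MASK_TRUECOUNT": the ToLong error is overwritten by the next case, the same kind of mislabeling as the copy-pasted message.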

-------------

PR: https://git.openjdk.java.net/panama-vector/pull/126


More information about the panama-dev mailing list