[vectorIntrinsics] RFR: 8284459: Add x86 back-end implementation for LEADING_ZEROS_COUNT operation [v2]
Jatin Bhateja
jbhateja at openjdk.java.net
Fri Apr 15 18:59:06 UTC 2022
On Fri, 15 Apr 2022 01:40:29 GMT, Sandhya Viswanathan <sviswanathan at openjdk.org> wrote:
>> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits:
>>
>> - 8284459: Adding an exponent-based leading zero count algorithm for integer vectors; it's showing around a 10-15% gain.
>> - Merge branch 'vectorIntrinsics' of http://github.com/openjdk/panama-vector into JDK-8284459
>> - 8284459: Add x86 back-end implementation for LEADING_ZERO_COUNT operation
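For readers unfamiliar with the exponent-based approach named in the commit above, here is a scalar C++ sketch of the idea for a single 32-bit lane. It is not the code in this PR, and the function name lzcnt_via_exponent is hypothetical. Converting a positive int to float stores the index e of its highest set bit in the biased exponent, so LZCNT = 31 - (biased_exp - 127). Negative inputs (0 leading zeros) and zero (32) are special-cased. Clearing the bit just below the highest set bit before the conversion is an assumption of this sketch. It stops round-to-nearest from carrying into the next power of two: for example, 0x7FFFFFFF would otherwise convert to 2^31.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical scalar model of one 32-bit lane; not the HotSpot implementation.
int lzcnt_via_exponent(int32_t x) {
  if (x < 0)  return 0;   // sign bit set: no leading zeros
  if (x == 0) return 32;  // no set bit at all
  // Sketch assumption: clear the bit right below the highest set bit so the
  // value stays below 1.5 * 2^e and rounding to float's 24-bit significand
  // can never bump the exponent to e + 1.
  int32_t v = x & ~(x >> 1);
  float f = static_cast<float>(v);
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);           // reinterpret float bits
  int biased_exp = static_cast<int>((bits >> 23) & 0xFF);
  // f lies in [2^e, 2^(e+1)) with e = biased_exp - 127, the index of the
  // highest set bit, hence the leading zero count is 31 - e.
  return 31 - (biased_exp - 127);
}
```

In a vector implementation, the same steps map to a lane-wise int-to-float conversion, a shift and subtract on the exponent field, and blends for the special cases.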
>
> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4803:
>
>> 4801: vpsllw(xtmp2, dst, 8, vec_enc);
>> 4802: vpaddw(xtmp2, xtmp2, dst, vec_enc);
>> 4803: vpblendvb(dst, dst, xtmp2, xtmp3, vec_enc);
>
> The mask is generated using a word operation, but blend is a byte operation?
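There is no variable word/doubleword/quadword blend instruction in AVX/AVX2 (vpblendw and vpblendd only take an immediate control). Because the mask is produced by a word operation, each lane of the generated mask is composed of multiple bytes that are either all 0xFF or all 0x00. Since vpblendvb selects every byte by the most significant bit of the corresponding mask byte, a byte-level blend is sufficient to blend multi-byte lanes. We do have single/double-precision blends (vblendvps/vblendvpd), but their latency is similar to that of the byte-level blend, and they may additionally incur a domain switch-over (bypass delay) penalty when used on integer data.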
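To make the argument concrete, here is a scalar C++ model of a 128-bit register viewed as 16 bytes / 8 words. It shows that vpblendvb driven by a word-granular mask gives the same result as a hypothetical per-word variable blend. All function names are hypothetical. The steps around the quoted lines are also assumptions for illustration: per-byte leading zero counts as input, a mask marking word lanes whose upper source byte is zero, and a final word shift right by 8. They are not taken from the patch.

```cpp
#include <array>
#include <cstdint>

// Hypothetical model: a 128-bit register as 16 bytes (8 little-endian words).
using Vec = std::array<uint8_t, 16>;

uint16_t get_w(const Vec& v, int w) {
  return static_cast<uint16_t>(v[2 * w] | (v[2 * w + 1] << 8));
}
void set_w(Vec& v, int w, uint16_t x) {
  v[2 * w] = static_cast<uint8_t>(x & 0xFF);
  v[2 * w + 1] = static_cast<uint8_t>(x >> 8);
}

// Model of vpblendvb: per byte, take b if that mask byte's MSB is set.
Vec blendv_bytes(const Vec& a, const Vec& b, const Vec& m) {
  Vec r{};
  for (int i = 0; i < 16; ++i) r[i] = (m[i] & 0x80) ? b[i] : a[i];
  return r;
}

// A per-word variable blend (absent in AVX/AVX2), used only for comparison.
Vec blendv_words(const Vec& a, const Vec& b, const Vec& m) {
  Vec r{};
  for (int w = 0; w < 8; ++w)
    set_w(r, w, (get_w(m, w) & 0x8000) ? get_w(b, w) : get_w(a, w));
  return r;
}

uint8_t lzcnt8(uint8_t x) {
  uint8_t n = 0;
  for (int bit = 7; bit >= 0 && !((x >> bit) & 1); --bit) ++n;
  return n;
}

// Word-lane leading zero count built from byte counts (assumed shape).
Vec lzcnt_words(const Vec& src) {
  Vec dst{}, mask{}, sum{};
  for (int i = 0; i < 16; ++i) dst[i] = lzcnt8(src[i]);   // per-byte LZCNT
  for (int w = 0; w < 8; ++w) {
    // Word-compare mask: 0xFFFF if the upper source byte is zero, else 0,
    // so both bytes of a mask lane always agree.
    set_w(mask, w, (get_w(src, w) >> 8) == 0 ? 0xFFFF : 0x0000);
    // vpsllw by 8, then vpaddw: upper byte becomes hi_count + lo_count.
    set_w(sum, w, static_cast<uint16_t>((get_w(dst, w) << 8) + get_w(dst, w)));
  }
  Vec res = blendv_bytes(dst, sum, mask);                 // vpblendvb
  for (int w = 0; w < 8; ++w)                             // word shift right by 8
    set_w(res, w, static_cast<uint16_t>(get_w(res, w) >> 8));
  return res;
}
```

The equivalence holds only because both bytes of every mask lane agree. With a mask whose bytes within a word could differ, blendv_bytes and blendv_words would diverge.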
-------------
PR: https://git.openjdk.java.net/panama-vector/pull/189
More information about the panama-dev mailing list