[vectorIntrinsics] RFR: 8284459: Add x86 back-end implementation for LEADING_ZEROS_COUNT operation [v2]

Fri Apr 15 18:59:06 UTC 2022

On Fri, 15 Apr 2022 01:40:29 GMT, Sandhya Viswanathan <sviswanathan at openjdk.org> wrote:

>> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits:
>> 
>>  - 8284459: Adding an exponent based leading zero count algorithm for integer vectors, its showing around 10-15% gain.
>>  - Merge branch 'vectorIntrinsics' of http://github.com/openjdk/panama-vector into JDK-8284459
>>  - 8284459: Add x86 back-end implementation for LEADING_ZERO_COUNT operation
>
> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4803:
> 
>> 4801:    vpsllw(xtmp2, dst, 8, vec_enc);
>> 4802:    vpaddw(xtmp2, xtmp2, dst, vec_enc);
>> 4803:    vpblendvb(dst, dst, xtmp2, xtmp3, vec_enc);
> 
> The mask is generated using a word operation, but blend is a byte operation?

x86 has no variable blend at word, doubleword, or quadword granularity. Each lane of the generated mask is composed of multiple bytes, all set or all clear, so a byte-level blend is sufficient to blend multi-byte lanes. We do have single- and double-precision blends, but their latency is similar to that of the byte-level blend, and they may additionally incur a domain-switch penalty.
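
To illustrate the point, here is a minimal scalar sketch (not the HotSpot code) that models a 128-bit vector as 16 bytes. `blend_bytes` follows the vpblendvb selection rule, which picks each byte according to the most significant bit of the corresponding mask byte. `blend_words`, `word_mask`, and `blends_agree_for_all_word_masks` are hypothetical helpers introduced only for this comparison, and the sketch assumes the mask comes from a word-wise compare, so both bytes of each 16-bit lane are identical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar model of a 128-bit vector: 16 bytes, little-endian within each lane.
using Vec = std::array<std::uint8_t, 16>;

// Models the vpblendvb rule: per byte, take b when the top bit of the
// mask byte is set, otherwise take a.
Vec blend_bytes(const Vec& a, const Vec& b, const Vec& mask) {
    Vec r{};
    for (std::size_t i = 0; i < r.size(); ++i) {
        r[i] = (mask[i] & 0x80) ? b[i] : a[i];
    }
    return r;
}

// Hypothetical word-granular variable blend, written out only for
// comparison: selects whole 16-bit lanes using the sign bit of each
// mask word (the top bit of the lane's high byte).
Vec blend_words(const Vec& a, const Vec& b, const Vec& mask) {
    Vec r{};
    for (std::size_t w = 0; w < r.size() / 2; ++w) {
        const bool take_b = (mask[2 * w + 1] & 0x80) != 0;
        r[2 * w]     = take_b ? b[2 * w]     : a[2 * w];
        r[2 * w + 1] = take_b ? b[2 * w + 1] : a[2 * w + 1];
    }
    return r;
}

// Builds a mask shaped like the result of a word-wise compare:
// bit w of lane_bits decides whether 16-bit lane w is all-ones or all-zeros.
Vec word_mask(std::uint8_t lane_bits) {
    Vec m{};
    for (std::size_t w = 0; w < m.size() / 2; ++w) {
        const std::uint8_t fill = ((lane_bits >> w) & 1) ? 0xFF : 0x00;
        m[2 * w] = fill;
        m[2 * w + 1] = fill;
    }
    return m;
}

// Checks all 256 possible word-granular masks for an 8-lane vector.
bool blends_agree_for_all_word_masks(const Vec& a, const Vec& b) {
    for (unsigned bits = 0; bits < 256; ++bits) {
        const Vec m = word_mask(static_cast<std::uint8_t>(bits));
        assert(m[0] == m[1]);  // lane bytes match by construction
        if (blend_bytes(a, b, m) != blend_words(a, b, m)) {
            return false;
        }
    }
    return true;
}
```

Because every byte of a lane carries the same mask value, the byte-wise selection can never split a lane, so the two blends agree for every mask of this shape. The equivalence does not hold for masks whose bytes differ within a lane.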

-------------

PR: https://git.openjdk.java.net/panama-vector/pull/189