RFR: 8261542: X86 slice and unslice intrinsics for 256-bit byte/short vectors [v3]
Vladimir Kozlov
kvn at openjdk.java.net
Fri Feb 19 01:58:44 UTC 2021
On Fri, 19 Feb 2021 01:23:04 GMT, Sandhya Viswanathan <sviswanathan at openjdk.org> wrote:
>> The slice and unslice intrinsics for 256-bit byte/short vectors can be implemented for x86 platforms supporting AVX2 using a sequence of instructions.
>>
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8261542
>>
>> The PerfSliceOrigin.java JMH test attached to the JBS issue shows the following performance on an AVX2 platform.
>>
>> Before:
>> Benchmark                                 (size)   Mode  Cnt   Score    Error   Units
>> PerfSliceOrigin.vectorSliceOrigin           1024  thrpt    5  18.887 ±  1.128  ops/ms
>> PerfSliceOrigin.vectorSliceUnsliceOrigin    1024  thrpt    5   9.374 ±  0.370  ops/ms
>>
>> After:
>> Benchmark                                 (size)   Mode  Cnt      Score     Error   Units
>> PerfSliceOrigin.vectorSliceOrigin           1024  thrpt    5  13861.420 ±  19.071  ops/ms
>> PerfSliceOrigin.vectorSliceUnsliceOrigin    1024  thrpt    5   7895.199 ± 142.580  ops/ms
>
> Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision:
>
> corrected assert
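For readers less familiar with the operations being intrinsified, here is a minimal scalar sketch of what slice and unslice compute per lane, as I understand the Vector API semantics. It models lanes as plain byte arrays. The class and method names are hypothetical, it covers only the zero-background, part-0 form of unslice, and it says nothing about the AVX2 instruction sequence the patch actually emits.

```java
import java.util.Arrays;

public class SliceModel {
    // Scalar model of v1.slice(origin, v2): take lanes starting at `origin`
    // in v1 and continue, as needed, into the following vector v2.
    static byte[] slice(byte[] v1, byte[] v2, int origin) {
        int vlen = v1.length;
        if (v2.length != vlen || origin < 0 || origin > vlen) {
            throw new IllegalArgumentException("bad vector length or origin");
        }
        byte[] result = new byte[vlen];
        for (int i = 0; i < vlen; i++) {
            int j = i + origin;
            result[i] = (j < vlen) ? v1[j] : v2[j - vlen];
        }
        return result;
    }

    // Scalar model of v.unslice(origin) with an all-zero background, part 0:
    // the lanes of v are shifted up by `origin`; the lanes below stay zero.
    static byte[] unslice(byte[] v, int origin) {
        int vlen = v.length;
        if (origin < 0 || origin > vlen) {
            throw new IllegalArgumentException("bad origin");
        }
        byte[] result = new byte[vlen]; // zero-initialized background
        for (int i = origin; i < vlen; i++) {
            result[i] = v[i - origin];
        }
        return result;
    }

    public static void main(String[] args) {
        // Eight lanes are used here only to keep the output readable;
        // a 256-bit byte vector would have 32 lanes.
        byte[] a = {0, 1, 2, 3, 4, 5, 6, 7};
        byte[] b = {10, 11, 12, 13, 14, 15, 16, 17};
        System.out.println(Arrays.toString(slice(a, b, 3)));
        System.out.println(Arrays.toString(unslice(a, 3)));
    }
}
```

With origin 3, the slice picks up lanes 3..7 of the first input followed by lanes 0..2 of the second, and the unslice moves lanes 0..4 of its input to positions 3..7.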
src/hotspot/cpu/x86/x86.ad line 1695:
> 1693: if(vlen == 2) {
> 1694: return false; // Implementation limitation due to how shuffle is loaded
> 1695: } else if (size_in_bits == 256 && UseAVX < 2) {
Should this check be size_in_bits >= 256 rather than == 256?
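To make the question concrete, here is a small stand-alone sketch comparing the two forms of the condition. It is a hypothetical Java model of the predicate only: the method names are mine, and the parameters merely mirror size_in_bits and UseAVX from the quoted snippet. This is not HotSpot code.

```java
public class SlicePredicateSketch {
    // Form quoted in the patch: reject only exactly-256-bit vectors without AVX2.
    static boolean rejectedWithEquals(int sizeInBits, int useAVX) {
        return sizeInBits == 256 && useAVX < 2;
    }

    // Form suggested in the review: reject any vector of 256 bits or more without AVX2.
    static boolean rejectedWithAtLeast(int sizeInBits, int useAVX) {
        return sizeInBits >= 256 && useAVX < 2;
    }

    public static void main(String[] args) {
        int[] sizes = {64, 128, 256, 512};
        for (int size : sizes) {
            for (int avx = 0; avx <= 3; avx++) {
                boolean eq = rejectedWithEquals(size, avx);
                boolean ge = rejectedWithAtLeast(size, avx);
                if (eq != ge) {
                    // Only combinations where the two forms disagree are printed.
                    System.out.println("size_in_bits=" + size + " UseAVX=" + avx
                            + " ==: " + eq + " >=: " + ge);
                }
            }
        }
    }
}
```

Over these inputs the two forms disagree only for 512-bit sizes with UseAVX < 2. Whether that combination can actually reach this check depends on how the maximum vector size is derived from UseAVX elsewhere, which the quoted context does not show.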
-------------
PR: https://git.openjdk.java.net/jdk/pull/2520