RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v9]
Jatin Bhateja
jbhateja at openjdk.org
Tue Jan 23 11:56:58 UTC 2024
> Hi,
>
> This patch optimizes the non-subword vector compress and expand APIs for x86 AVX2-only targets.
> Upcoming E-core Xeons (Sierra Forest) and hybrid CPUs support only the AVX2 instruction set.
> These APIs are used very frequently in columnar database filter operations.
>
> The implementation uses a lookup table to record permute indices. The table index is computed from the
> mask argument of the compress/expand operation.
>
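To make the lookup-table idea above concrete, here is a minimal scalar sketch in plain Java. It is not the code from the patch (which emits AVX2 instructions in the JIT); the class name, table names, and the choice of 8 lanes are all assumptions made for illustration. Each of the 256 possible 8-bit masks maps to a precomputed row of permute indices, so compress and expand each reduce to one table lookup plus one permute.

```java
import java.util.Arrays;

// Hypothetical illustration only: a scalar model of mask-indexed permute tables.
public class CompressLutSketch {
    // Assumed lane count: 8 x 32-bit lanes, as in a 256-bit AVX2 register.
    static final int LANES = 8;

    // TABLE[mask][i] = source lane copied into output lane i, or -1 for a zeroed lane.
    static final int[][] COMPRESS_PERM = new int[1 << LANES][LANES];
    static final int[][] EXPAND_PERM = new int[1 << LANES][LANES];

    static {
        for (int mask = 0; mask < (1 << LANES); mask++) {
            Arrays.fill(COMPRESS_PERM[mask], -1);
            Arrays.fill(EXPAND_PERM[mask], -1);
            int next = 0; // number of set mask bits seen so far
            for (int lane = 0; lane < LANES; lane++) {
                if ((mask & (1 << lane)) != 0) {
                    COMPRESS_PERM[mask][next] = lane; // selected lane is packed toward lane 0
                    EXPAND_PERM[mask][lane] = next;   // packed element is scattered to the set lane
                    next++;
                }
            }
        }
    }

    // Scalar stand-in for a vector permute; lanes with index -1 are zeroed.
    static int[] permute(int[] src, int[] perm) {
        int[] dst = new int[LANES];
        for (int i = 0; i < LANES; i++) {
            dst[i] = perm[i] < 0 ? 0 : src[perm[i]];
        }
        return dst;
    }

    // The mask itself is the table index, as described in the quoted text.
    static int[] compress(int[] src, int mask) {
        return permute(src, COMPRESS_PERM[mask]);
    }

    static int[] expand(int[] src, int mask) {
        return permute(src, EXPAND_PERM[mask]);
    }

    public static void main(String[] args) {
        int[] v = {10, 11, 12, 13, 14, 15, 16, 17};
        int mask = 0b10100110; // lanes 1, 2, 5 and 7 are selected
        System.out.println(Arrays.toString(compress(v, mask)));
        System.out.println(Arrays.toString(expand(v, mask)));
    }
}
```

With the mask above, compress packs lanes 1, 2, 5 and 7 to the front and zeroes the rest, while expand places the first four elements into lanes 1, 2, 5 and 7. The zero-fill for unselected lanes follows the documented semantics of the Vector API compress and expand operations; how the patch lays out and sizes its tables is not shown in this message.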
> The following are the performance numbers for the JMH microbenchmark included with the patch.
>
>
> System : Intel(R) Xeon(R) Platinum 8480+ (Sapphire Rapids)
>
> Baseline:
> Benchmark                                 (size)   Mode  Cnt     Score  Error  Units
> ColumnFilterBenchmark.filterDoubleColumn    1024  thrpt    2   142.767         ops/ms
> ColumnFilterBenchmark.filterDoubleColumn    2047  thrpt    2    71.436         ops/ms
> ColumnFilterBenchmark.filterDoubleColumn    4096  thrpt    2    35.992         ops/ms
> ColumnFilterBenchmark.filterFloatColumn     1024  thrpt    2   182.151         ops/ms
> ColumnFilterBenchmark.filterFloatColumn     2047  thrpt    2    91.096         ops/ms
> ColumnFilterBenchmark.filterFloatColumn     4096  thrpt    2    44.757         ops/ms
> ColumnFilterBenchmark.filterIntColumn       1024  thrpt    2   184.099         ops/ms
> ColumnFilterBenchmark.filterIntColumn       2047  thrpt    2    91.981         ops/ms
> ColumnFilterBenchmark.filterIntColumn       4096  thrpt    2    45.170         ops/ms
> ColumnFilterBenchmark.filterLongColumn      1024  thrpt    2   148.017         ops/ms
> ColumnFilterBenchmark.filterLongColumn      2047  thrpt    2    73.516         ops/ms
> ColumnFilterBenchmark.filterLongColumn      4096  thrpt    2    36.844         ops/ms
>
> With optimization:
> Benchmark                                 (size)   Mode  Cnt     Score  Error  Units
> ColumnFilterBenchmark.filterDoubleColumn    1024  thrpt    2  2051.707         ops/ms
> ColumnFilterBenchmark.filterDoubleColumn    2047  thrpt    2   914.072         ops/ms
> ColumnFilterBenchmark.filterDoubleColumn    4096  thrpt    2   489.898         ops/ms
> ColumnFilterBenchmark.filterFloatColumn     1024  thrpt    2  5324.195         ops/ms
> ColumnFilterBenchmark.filterFloatColumn     2047  thrpt    2  2587.229         ops/ms
> ColumnFilterBenchmark.filterFloatColumn     4096  thrpt    2  1278.665         ops/ms
> ColumnFilterBenchmark.filterIntColumn       1024  thrpt    2  4149.384         ops/ms
> ColumnFilterBenchmark.filterIntColumn       2047  thrpt    2  1791.170         ops/ms
> ColumnFilterBenchmark.filterIntColumn 4096...
Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision:
- Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8322768
- Modifying comments.
- Review comments resolution
- Modified code comment for clarity.
- Space fixup
- Using emulated variable blend E-Core optimized instruction.
- Review suggestions incorporated.
- Review comments resolutions.
- Updating copyright year of modified files.
- 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target.
-------------
Changes:
- all: https://git.openjdk.org/jdk/pull/17261/files
- new: https://git.openjdk.org/jdk/pull/17261/files/cd912308..83e4065e
Webrevs:
- full: https://webrevs.openjdk.org/?repo=jdk&pr=17261&range=08
- incr: https://webrevs.openjdk.org/?repo=jdk&pr=17261&range=07-08
Stats: 41105 lines in 1072 files changed: 24738 ins; 11390 del; 4977 mod
Patch: https://git.openjdk.org/jdk/pull/17261.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/17261/head:pull/17261
PR: https://git.openjdk.org/jdk/pull/17261