RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target.

Jatin Bhateja jbhateja at openjdk.org
Thu Jan 4 05:33:35 UTC 2024


Hi,

This patch optimizes the non-subword vector compress and expand APIs for x86 targets that support only AVX2.
Upcoming E-core Xeons (Sierra Forest) and hybrid CPUs support only the AVX2 instruction set.
These operations are used very frequently in columnar database filter operations.

The implementation uses a lookup table that records permute indices. The table index is computed
from the mask argument of the compress/expand operation.
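
For readers unfamiliar with the technique, the idea can be modeled in scalar Java roughly as
follows. This is only an illustrative sketch, not the code in the patch: the class name
CompressExpandLut, the 8-lane width, and the use of -1 to mark lanes that must be zeroed are
all assumptions made for the example. The actual change builds its table in the x86 backend
and applies it with AVX2 permute instructions.

```java
import java.util.Arrays;

// Scalar model of a mask-indexed permute table (hypothetical names, not from the patch).
public class CompressExpandLut {
    // Assumed lane count: eight 32-bit lanes, as in one 256-bit vector.
    static final int LANES = 8;

    // COMPRESS_LUT[mask][i] = source lane copied to output lane i, or -1 to zero it.
    static final int[][] COMPRESS_LUT = new int[1 << LANES][LANES];
    // EXPAND_LUT[mask][i] = source lane copied to output lane i, or -1 to zero it.
    static final int[][] EXPAND_LUT = new int[1 << LANES][LANES];

    static {
        // Precompute one row of permute indices per possible mask value.
        for (int mask = 0; mask < (1 << LANES); mask++) {
            Arrays.fill(COMPRESS_LUT[mask], -1);
            Arrays.fill(EXPAND_LUT[mask], -1);
            int next = 0; // next packed (low) lane position
            for (int lane = 0; lane < LANES; lane++) {
                if ((mask & (1 << lane)) != 0) {
                    COMPRESS_LUT[mask][next] = lane; // selected lane moves down
                    EXPAND_LUT[mask][lane] = next;   // packed lane moves up
                    next++;
                }
            }
        }
    }

    // Applies one table row: a gather by index, with -1 producing zero.
    static int[] permute(int[] src, int[] indices) {
        int[] dst = new int[LANES];
        for (int i = 0; i < LANES; i++) {
            dst[i] = indices[i] < 0 ? 0 : src[indices[i]];
        }
        return dst;
    }

    // The mask value itself is the table index.
    static int[] compress(int[] src, int mask) {
        return permute(src, COMPRESS_LUT[mask & ((1 << LANES) - 1)]);
    }

    static int[] expand(int[] src, int mask) {
        return permute(src, EXPAND_LUT[mask & ((1 << LANES) - 1)]);
    }

    public static void main(String[] args) {
        int[] v = {10, 11, 12, 13, 14, 15, 16, 17};
        int mask = 0b10100110; // lanes 1, 2, 5 and 7 are selected
        System.out.println(Arrays.toString(compress(v, mask)));
        System.out.println(Arrays.toString(expand(v, mask)));
    }
}
```

With the mask above, compress packs lanes 1, 2, 5 and 7 into the low four lanes and zeroes the
rest, while expand scatters the low four lanes to positions 1, 2, 5 and 7. The table trades a
small amount of memory (one row per mask value) for a single indexed load plus one permute per
vector.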

The following are the performance numbers for the JMH microbenchmark included with the patch.


System : Intel(R) Xeon(R) Platinum 8480+ (Sapphire Rapids)

Baseline:
Benchmark                                 (size)   Mode  Cnt    Score   Error   Units
ColumnFilterBenchmark.filterDoubleColumn    1024  thrpt    2  142.767          ops/ms
ColumnFilterBenchmark.filterDoubleColumn    2047  thrpt    2   71.436          ops/ms
ColumnFilterBenchmark.filterDoubleColumn    4096  thrpt    2   35.992          ops/ms
ColumnFilterBenchmark.filterFloatColumn     1024  thrpt    2  182.151          ops/ms
ColumnFilterBenchmark.filterFloatColumn     2047  thrpt    2   91.096          ops/ms
ColumnFilterBenchmark.filterFloatColumn     4096  thrpt    2   44.757          ops/ms
ColumnFilterBenchmark.filterIntColumn       1024  thrpt    2  184.099          ops/ms
ColumnFilterBenchmark.filterIntColumn       2047  thrpt    2   91.981          ops/ms
ColumnFilterBenchmark.filterIntColumn       4096  thrpt    2   45.170          ops/ms
ColumnFilterBenchmark.filterLongColumn      1024  thrpt    2  148.017          ops/ms
ColumnFilterBenchmark.filterLongColumn      2047  thrpt    2   73.516          ops/ms
ColumnFilterBenchmark.filterLongColumn      4096  thrpt    2   36.844          ops/ms

With optimization:
Benchmark                                 (size)   Mode  Cnt     Score   Error   Units
ColumnFilterBenchmark.filterDoubleColumn    1024  thrpt    2  2051.707          ops/ms
ColumnFilterBenchmark.filterDoubleColumn    2047  thrpt    2   914.072          ops/ms
ColumnFilterBenchmark.filterDoubleColumn    4096  thrpt    2   489.898          ops/ms
ColumnFilterBenchmark.filterFloatColumn     1024  thrpt    2  5324.195          ops/ms
ColumnFilterBenchmark.filterFloatColumn     2047  thrpt    2  2587.229          ops/ms
ColumnFilterBenchmark.filterFloatColumn     4096  thrpt    2  1278.665          ops/ms
ColumnFilterBenchmark.filterIntColumn       1024  thrpt    2  4149.384          ops/ms
ColumnFilterBenchmark.filterIntColumn       2047  thrpt    2  1791.170          ops/ms
ColumnFilterBenchmark.filterIntColumn       4096  thrpt    2   974.888          ops/ms
ColumnFilterBenchmark.filterLongColumn      1024  thrpt    2  1128.281          ops/ms
ColumnFilterBenchmark.filterLongColumn      2047  thrpt    2   686.334          ops/ms
ColumnFilterBenchmark.filterLongColumn      4096  thrpt    2   337.170          ops/ms


Kindly review and share your feedback.

Best Regards,
Jatin

-------------

Commit messages:
 - 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target.

Changes: https://git.openjdk.org/jdk/pull/17261/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17261&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8322768
  Stats: 336 lines in 10 files changed: 323 ins; 8 del; 5 mod
  Patch: https://git.openjdk.org/jdk/pull/17261.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/17261/head:pull/17261

PR: https://git.openjdk.org/jdk/pull/17261

