RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v42]
himichael
duke at openjdk.org
Sat Oct 14 03:25:34 UTC 2023
On Fri, 13 Oct 2023 23:59:55 GMT, Srinivas Vamsi Parasa <duke at openjdk.org> wrote:
> > my question is that this feature should improve performance several times, but it doesn't look like there's much difference between open jdk 22.19 and jdk 8. is there a problem with my configuration ?
>
> Hello @himichael,
>
> Using your code snippet, please see the output below using the latest JDK and JDK 20 (which does not have AVX512 sort):
>
> JDK 20 (without AVX512 sort): `java -XX:CompileCommand=CompileThresholdScaling,java.util.DualPivotQuicksort::sort,0.0001 -XX:-TieredCompilation JDKSort `
>
> elapse time -> **7501 ms**
>
> JDK 22 (with AVX512 sort): `java -XX:CompileCommand=CompileThresholdScaling,java.util.DualPivotQuicksort::sort,0.0001 -XX:-TieredCompilation JDKSort`
>
> elapse time -> **1607 ms**
>
> It shows 4.66x speedup.
Hello, @vamsi-parasa
I used the commands you provided, but nothing seems to have changed.
The test procedure was as follows:
Using JDK 8 (without AVX512 sort):
/data/soft/jdk1.8.0_371/bin/javac JDKSort.java
/data/soft/jdk1.8.0_371/bin/java JDKSort
elapse time -> **15309 ms**
Using OpenJDK 22-ea+19 (with AVX512 sort):
/data/soft/jdk-22/bin/javac JDKSort.java
/data/soft/jdk-22/bin/java -XX:CompileCommand=CompileThresholdScaling,java.util.DualPivotQuicksort::sort,0.0001 -XX:-TieredCompilation JDKSort
CompileCommand: CompileThresholdScaling java/util/DualPivotQuicksort.sort double CompileThresholdScaling = 0.000100
elapse time -> **11687 ms**
Not much seems to have changed.
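To put a number on the difference, here is a minimal sketch of the speedup arithmetic using the timings reported above. The function name `speedup` is just an illustrative helper, not something from the JDK or from x86-simd-sort.

```cpp
#include <cassert>

// Speedup of a candidate run relative to a baseline run (both in milliseconds).
// 1.0 means no change; 2.0 means the candidate finished in half the time.
double speedup(double baseline_ms, double candidate_ms) {
    assert(candidate_ms > 0.0);  // guard against division by zero
    return baseline_ms / candidate_ms;
}
```

With my numbers, `speedup(15309, 11687)` is about 1.31, while the timings in the quoted reply give `speedup(7501, 1607)`, about 4.67.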
My JDK info:
OpenJDK 22-ea+19:
/data/soft/jdk-22/bin/java -version
openjdk version "22-ea" 2024-03-19
OpenJDK Runtime Environment (build 22-ea+19-1460)
OpenJDK 64-Bit Server VM (build 22-ea+19-1460, mixed mode, sharing)
JDK 8:
/data/soft/jdk1.8.0_371/bin/java -version
java version "1.8.0_371"
Java(TM) SE Runtime Environment (build 1.8.0_371-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.371-b11, mixed mode)
I also tested Intel's **x86-simd-sort**; my code is as follows:
```c++
#include <algorithm>
#include <chrono>
#include <cstdlib>   // std::rand
#include <iostream>
#include <vector>
#include "src/avx512-32bit-qsort.hpp"

int main() {
    // 100 million records
    const int size = 100000000;
    std::vector<int> random_array(size);
    for (int i = 0; i < size; ++i) {
        random_array[i] = std::rand();
    }
    // Time only the sort itself, not the array setup
    auto start_time = std::chrono::steady_clock::now();
    avx512_qsort(random_array.data(), size);
    auto end_time = std::chrono::steady_clock::now();
    auto elapse_time = std::chrono::duration_cast<std::chrono::milliseconds>(end_time - start_time).count();
    std::cout << "elapse time -> " << elapse_time << " ms" << std::endl;
    return 0;
}
```
Compile command:
g++ -o sort -O3 -mavx512f -mavx512dq sort.cpp
elapse time -> **1151 ms**
That is roughly an order of magnitude faster than the JDK 22 result above (11687 ms).
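For a non-vectorized C++ reference point on the same machine, a minimal sketch that times plain `std::sort` could look like the following. The helper names `make_input` and `time_std_sort_ms` and the fixed seed are assumptions for illustration. They are not part of x86-simd-sort, and I have not run this variant, so no timing is given for it.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: fill a vector with pseudo-random ints from a fixed seed
// so that every run (and every sort implementation) sees the same input.
std::vector<int> make_input(std::size_t size, unsigned seed = 42) {
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> dist;  // default range is [0, INT_MAX]
    std::vector<int> data(size);
    for (auto& x : data) {
        x = dist(gen);
    }
    return data;
}

// Hypothetical helper: time std::sort on a private copy of the input and
// return the elapsed wall-clock time in milliseconds.
long long time_std_sort_ms(std::vector<int> data) {
    auto start = std::chrono::steady_clock::now();
    std::sort(data.begin(), data.end());
    auto end = std::chrono::steady_clock::now();
    assert(std::is_sorted(data.begin(), data.end()));  // sanity check on the result
    return std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
}
```

Calling `time_std_sort_ms(make_input(100000000))` would sort the same number of elements as the AVX-512 run above. Because the input is passed by value, the caller's copy stays unsorted and can be reused for another implementation.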
Here is my CPU information:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 8
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel Xeon Processor (Skylake, IBRS)
Stepping: 4
CPU MHz: 2394.374
BogoMIPS: 4788.74
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 4096K
NUMA node0 CPU(s): 0-7
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl eagerfpu pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 md_clear spec_ctrl
According to `lscpu | grep avx`, the following instruction sets are supported:
- avx
- avx2
- avx512f
- avx512dq
- avx512cd
- avx512bw
- avx512vl
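The same flags can also be checked from inside a program at run time, which may be useful on a KVM guest like this one. This is a minimal sketch that assumes GCC or Clang on x86 and uses the compiler builtins `__builtin_cpu_init` and `__builtin_cpu_supports`. The function name `supported_avx512_subsets` is hypothetical, and on other compilers or architectures it simply returns an empty list.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns which of the AVX-512 subsets listed above the running CPU reports.
// __builtin_cpu_supports requires a string literal, so each flag is checked
// explicitly rather than in a loop.
std::vector<std::string> supported_avx512_subsets() {
    std::vector<std::string> found;
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();  // initialize the CPU feature data used by the checks below
    if (__builtin_cpu_supports("avx512f"))  found.push_back("avx512f");
    if (__builtin_cpu_supports("avx512dq")) found.push_back("avx512dq");
    if (__builtin_cpu_supports("avx512cd")) found.push_back("avx512cd");
    if (__builtin_cpu_supports("avx512bw")) found.push_back("avx512bw");
    if (__builtin_cpu_supports("avx512vl")) found.push_back("avx512vl");
#endif
    assert(found.size() <= 5);  // at most the five subsets checked above
    return found;
}
```

On this machine I would expect all five names to come back, matching the `lscpu` output, but that is an expectation rather than something I have verified with this code.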
-------------
PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1762543464