RFR: 8308966 Add intrinsic for float/double modulo for x86 AVX2 and AVX512
Scott Gibbons
sgibbons at openjdk.org
Tue May 30 17:18:24 UTC 2023
Add an fmod intrinsic for x86 AVX2 and AVX512. This both fixes a performance regression in the floating-point remainder operation (fmod / frem) and accelerates it. The double-precision case (dmod / drem) is covered as well.
Performance has improved by ~4x on average, as indicated by the benchmark included with [JDK-8302191](https://bugs.openjdk.org/browse/JDK-8302191).
Old:
gcc-12.2.1-4.fc36.x86_64
3db352d003c5996a5f86f0f465adf86326f7e1fe openjdk21 + fix
JVM version: 21-internal
Iteration 0 regression case Took : 89 noMod case took: 39 noPower case took: 68
Iteration 1 regression case Took : 86 noMod case took: 39 noPower case took: 67
Iteration 2 regression case Took : 41 noMod case took: 39 noPower case took: 70
Iteration 3 regression case Took : 41 noMod case took: 39 noPower case took: 69
Iteration 4 regression case Took : 40 noMod case took: 39 noPower case took: 44
Iteration 5 regression case Took : 47 noMod case took: 39 noPower case took: 40
Iteration 6 regression case Took : 41 noMod case took: 39 noPower case took: 40
Iteration 7 regression case Took : 40 noMod case took: 39 noPower case took: 40
Iteration 8 regression case Took : 41 noMod case took: 38 noPower case took: 41
Iteration 9 regression case Took : 40 noMod case took: 39 noPower case took: 40
New:
JVM version: 21-internal (float)
Iteration 0 regression case Took : 24 noMod case took: 11 noPower case took: 42
Iteration 1 regression case Took : 35 noMod case took: 22 noPower case took: 27
Iteration 2 regression case Took : 17 noMod case took: 19 noPower case took: 17
Iteration 3 regression case Took : 17 noMod case took: 3 noPower case took: 16
Iteration 4 regression case Took : 17 noMod case took: 3 noPower case took: 17
Iteration 5 regression case Took : 16 noMod case took: 3 noPower case took: 17
Iteration 6 regression case Took : 16 noMod case took: 3 noPower case took: 17
Iteration 7 regression case Took : 17 noMod case took: 3 noPower case took: 16
Iteration 8 regression case Took : 17 noMod case took: 3 noPower case took: 16
Iteration 9 regression case Took : 17 noMod case took: 3 noPower case took: 17
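For readers less familiar with the operation being accelerated, here is a minimal sketch of the Java-level semantics of the floating-point remainder (`%` on `float` / `double`, which javac compiles to the `frem` / `drem` bytecodes). The class name `FremDemo` and the helpers `frem` / `drem` are hypothetical names chosen for illustration and are not part of the patch. This is not the benchmark from JDK-8302191, and whether the intrinsic is actually used depends on the JIT and the CPU features available.

```java
public class FremDemo {
    // Hypothetical helpers: thin wrappers around Java's % operator.
    // For double operands javac emits drem, for float operands frem.
    static double drem(double x, double y) { return x % y; }
    static float frem(float x, float y) { return x % y; }

    public static void main(String[] args) {
        // Truncating remainder: the result takes the sign of the dividend,
        // matching C's fmod.
        System.out.println(drem(5.5, 2.0));    // 1.5
        System.out.println(drem(-5.5, 2.0));   // -1.5
        System.out.println(drem(5.5, -2.0));   // 1.5
        System.out.println(frem(7.0f, 2.5f));  // 2.0

        // Special cases: a zero divisor yields NaN, an infinite divisor
        // returns the dividend unchanged.
        System.out.println(drem(1.0, 0.0));                       // NaN
        System.out.println(drem(3.0, Double.POSITIVE_INFINITY));  // 3.0

        // Contrast: Math.IEEEremainder rounds the quotient to nearest,
        // so its result can differ in sign and magnitude from %.
        System.out.println(Math.IEEEremainder(5.5, 2.0));         // -0.5
    }
}
```

The contrast with `Math.IEEEremainder` is included because the two operations are easy to confuse; the `%` operator is the one that maps to fmod-style behavior.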
-------------
Commit messages:
- Replace tab with spaces to satisfy checker
- Address first round of review comments.
- Fix conditions for generating fmod
- Code cleanup
- Wrong vector lengths
- More Windows failures
- Merge branch 'openjdk:master' into fmod
- Fix some instruction dependencies
- Fix Windows build
- Fix evmovdquq => vmovdqu
- ... and 9 more: https://git.openjdk.org/jdk/compare/7d2a7ce2...c4a59733
Changes: https://git.openjdk.org/jdk/pull/14224/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14224&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8308966
Stats: 661 lines in 10 files changed: 660 ins; 0 del; 1 mod
Patch: https://git.openjdk.org/jdk/pull/14224.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/14224/head:pull/14224
PR: https://git.openjdk.org/jdk/pull/14224