RFR: 8308966 Add intrinsic for float/double modulo for x86 AVX2 and AVX512

Scott Gibbons sgibbons at openjdk.org
Tue May 30 17:18:24 UTC 2023


Add an fmod intrinsic for x86 AVX2 and AVX512.  This both fixes a performance regression and accelerates the floating-point remainder operation (fmod / frem).  It also covers the double-precision case (dmod / drem).
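
For readers less familiar with the operation being accelerated: the `frem` and `drem` bytecodes are what javac emits for the `%` operator on `float` and `double` operands. The sketch below only illustrates the Java-level semantics (a truncating remainder, as in C's `fmod`); it says nothing about how the intrinsic itself is implemented. The class and method names are hypothetical and chosen for illustration.

```java
public class FremDemo {
    // Java's % on doubles compiles to the drem bytecode: a truncating
    // remainder whose result takes the sign of the dividend (like C's fmod).
    static double truncatedRemainder(double a, double b) {
        return a % b;
    }

    // Same operator on floats compiles to the frem bytecode.
    static float truncatedRemainder(float a, float b) {
        return a % b;
    }

    // For contrast: the IEEE 754 remainder rounds the quotient to nearest,
    // so it can differ from % in both magnitude and sign.
    static double ieeeRemainder(double a, double b) {
        return Math.IEEEremainder(a, b);
    }

    public static void main(String[] args) {
        System.out.println(truncatedRemainder(5.5, 2.0));    // 1.5
        System.out.println(truncatedRemainder(-5.5, 2.0));   // -1.5
        System.out.println(truncatedRemainder(5.5f, 2.0f));  // 1.5
        System.out.println(ieeeRemainder(5.5, 2.0));         // -0.5

        // Special cases defined by the Java Language Specification:
        // a zero divisor yields NaN; an infinite divisor returns the dividend.
        System.out.println(Double.isNaN(truncatedRemainder(1.0, 0.0)));          // true
        System.out.println(truncatedRemainder(1.0, Double.POSITIVE_INFINITY));   // 1.0
    }
}
```

Any replacement for the existing code path has to preserve these results exactly, including the NaN and infinity cases.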

Performance has improved by ~4x on average, as measured by the benchmark included with [JDK-8302191](https://bugs.openjdk.org/browse/JDK-8302191).

Old:
gcc-12.2.1-4.fc36.x86_64
3db352d003c5996a5f86f0f465adf86326f7e1fe openjdk21 + fix
JVM version: 21-internal
Iteration 0 regression case Took : 89 noMod case took: 39 noPower case took: 68
Iteration 1 regression case Took : 86 noMod case took: 39 noPower case took: 67
Iteration 2 regression case Took : 41 noMod case took: 39 noPower case took: 70
Iteration 3 regression case Took : 41 noMod case took: 39 noPower case took: 69
Iteration 4 regression case Took : 40 noMod case took: 39 noPower case took: 44
Iteration 5 regression case Took : 47 noMod case took: 39 noPower case took: 40
Iteration 6 regression case Took : 41 noMod case took: 39 noPower case took: 40
Iteration 7 regression case Took : 40 noMod case took: 39 noPower case took: 40
Iteration 8 regression case Took : 41 noMod case took: 38 noPower case took: 41
Iteration 9 regression case Took : 40 noMod case took: 39 noPower case took: 40
New:
JVM version: 21-internal  (float)
Iteration 0 regression case Took : 24 noMod case took: 11 noPower case took: 42
Iteration 1 regression case Took : 35 noMod case took: 22 noPower case took: 27
Iteration 2 regression case Took : 17 noMod case took: 19 noPower case took: 17
Iteration 3 regression case Took : 17 noMod case took: 3 noPower case took: 16
Iteration 4 regression case Took : 17 noMod case took: 3 noPower case took: 17
Iteration 5 regression case Took : 16 noMod case took: 3 noPower case took: 17
Iteration 6 regression case Took : 16 noMod case took: 3 noPower case took: 17
Iteration 7 regression case Took : 17 noMod case took: 3 noPower case took: 16
Iteration 8 regression case Took : 17 noMod case took: 3 noPower case took: 16
Iteration 9 regression case Took : 17 noMod case took: 3 noPower case took: 17

-------------

Commit messages:
 - Replace tab with spaces to satisfy checker
 - Address first round of review comments.
 - Fix conditions for generating fmod
 - Code cleanup
 - Wrong vector lengths
 - More Windows failures
 - Merge branch 'openjdk:master' into fmod
 - Fix some instruction dependencies
 - Fix Windows build
 - Fix evmovdquq => vmovdqu
 - ... and 9 more: https://git.openjdk.org/jdk/compare/7d2a7ce2...c4a59733

Changes: https://git.openjdk.org/jdk/pull/14224/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14224&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8308966
  Stats: 661 lines in 10 files changed: 660 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/14224.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/14224/head:pull/14224

PR: https://git.openjdk.org/jdk/pull/14224

