RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics

Sandhya Viswanathan sviswanathan at openjdk.java.net
Thu Apr 22 22:27:50 UTC 2021


Intel Short Vector Math Library (SVML) based intrinsics in native x86 assembly provide optimized implementation for Vector API transcendental and trigonometric methods.
These methods are built into a separate library instead of being part of libjvm.so or jvm.dll.

The following changes are made:
   The source for these methods is placed in the jdk.incubator.vector module under src/jdk.incubator.vector/linux/native/libsvml and src/jdk.incubator.vector/windows/native/libsvml.
   The assembly source files are named as “*.S” and include files are named as “*.S.inc”.
   The corresponding build script is placed at make/modules/jdk.incubator.vector/Lib.gmk.
   Changes are made to build system to support dependency tracking for assembly files with includes.
   The built native libraries (libsvml.so/svml.dll) are placed in bin directory of JDK on Windows and lib directory of JDK on Linux.
   The C2 JIT uses the dll_load and dll_lookup to get the addresses of optimized methods from this library.

Build system changes and module library build scripts are contributed by Magnus (magnus.ihse.bursie at oracle.com).

This work is part of second round of incubation of the Vector API.
JEP: https://bugs.openjdk.java.net/browse/JDK-8261663

Please review.

Performance:
Micro benchmark Base Optimized Unit Gain(Optimized/Base)
Double128Vector.ACOS 45.91 87.34 ops/ms 1.90
Double128Vector.ASIN 45.06 92.36 ops/ms 2.05
Double128Vector.ATAN 19.92 118.36 ops/ms 5.94
Double128Vector.ATAN2 15.24 88.17 ops/ms 5.79
Double128Vector.CBRT 45.77 208.36 ops/ms 4.55
Double128Vector.COS 49.94 245.89 ops/ms 4.92
Double128Vector.COSH 26.91 126.00 ops/ms 4.68
Double128Vector.EXP 71.64 379.65 ops/ms 5.30
Double128Vector.EXPM1 35.95 150.37 ops/ms 4.18
Double128Vector.HYPOT 50.67 174.10 ops/ms 3.44
Double128Vector.LOG 61.95 279.84 ops/ms 4.52
Double128Vector.LOG10 59.34 239.05 ops/ms 4.03
Double128Vector.LOG1P 18.56 200.32 ops/ms 10.79
Double128Vector.SIN 49.36 240.79 ops/ms 4.88
Double128Vector.SINH 26.59 103.75 ops/ms 3.90
Double128Vector.TAN 41.05 152.39 ops/ms 3.71
Double128Vector.TANH 45.29 169.53 ops/ms 3.74
Double256Vector.ACOS 54.21 106.39 ops/ms 1.96
Double256Vector.ASIN 53.60 107.99 ops/ms 2.01
Double256Vector.ATAN 21.53 189.11 ops/ms 8.78
Double256Vector.ATAN2 16.67 140.76 ops/ms 8.44
Double256Vector.CBRT 56.45 397.13 ops/ms 7.04
Double256Vector.COS 58.26 389.77 ops/ms 6.69
Double256Vector.COSH 29.44 151.11 ops/ms 5.13
Double256Vector.EXP 86.67 564.68 ops/ms 6.52
Double256Vector.EXPM1 41.96 201.28 ops/ms 4.80
Double256Vector.HYPOT 66.18 305.74 ops/ms 4.62
Double256Vector.LOG 71.52 394.90 ops/ms 5.52
Double256Vector.LOG10 65.43 362.32 ops/ms 5.54
Double256Vector.LOG1P 19.99 300.88 ops/ms 15.05
Double256Vector.SIN 57.06 380.98 ops/ms 6.68
Double256Vector.SINH 29.40 117.37 ops/ms 3.99
Double256Vector.TAN 44.90 279.90 ops/ms 6.23
Double256Vector.TANH 54.08 274.71 ops/ms 5.08
Double512Vector.ACOS 55.65 687.54 ops/ms 12.35
Double512Vector.ASIN 57.31 777.72 ops/ms 13.57
Double512Vector.ATAN 21.42 729.21 ops/ms 34.04
Double512Vector.ATAN2 16.37 414.33 ops/ms 25.32
Double512Vector.CBRT 56.78 834.38 ops/ms 14.69
Double512Vector.COS 59.88 837.04 ops/ms 13.98
Double512Vector.COSH 30.34 172.76 ops/ms 5.70
Double512Vector.EXP 99.66 1608.12 ops/ms 16.14
Double512Vector.EXPM1 43.39 318.61 ops/ms 7.34
Double512Vector.HYPOT 73.87 1502.72 ops/ms 20.34
Double512Vector.LOG 74.84 996.00 ops/ms 13.31
Double512Vector.LOG10 71.12 1046.52 ops/ms 14.72
Double512Vector.LOG1P 19.75 776.87 ops/ms 39.34
Double512Vector.POW 37.42 384.13 ops/ms 10.26
Double512Vector.SIN 59.74 728.45 ops/ms 12.19
Double512Vector.SINH 29.47 143.38 ops/ms 4.87
Double512Vector.TAN 46.20 587.21 ops/ms 12.71
Double512Vector.TANH 57.36 495.42 ops/ms 8.64
Double64Vector.ACOS 24.04 73.67 ops/ms 3.06
Double64Vector.ASIN 23.78 75.11 ops/ms 3.16
Double64Vector.ATAN 14.14 62.81 ops/ms 4.44
Double64Vector.ATAN2 10.38 44.43 ops/ms 4.28
Double64Vector.CBRT 16.47 107.50 ops/ms 6.53
Double64Vector.COS 23.42 152.01 ops/ms 6.49
Double64Vector.COSH 17.34 113.34 ops/ms 6.54
Double64Vector.EXP 27.08 203.53 ops/ms 7.52
Double64Vector.EXPM1 18.77 96.73 ops/ms 5.15
Double64Vector.HYPOT 18.54 103.62 ops/ms 5.59
Double64Vector.LOG 26.75 142.63 ops/ms 5.33
Double64Vector.LOG10 25.85 139.71 ops/ms 5.40
Double64Vector.LOG1P 13.26 97.94 ops/ms 7.38
Double64Vector.SIN 23.28 146.91 ops/ms 6.31
Double64Vector.SINH 17.62 88.59 ops/ms 5.03
Double64Vector.TAN 21.00 86.43 ops/ms 4.12
Double64Vector.TANH 23.75 111.35 ops/ms 4.69
Float128Vector.ACOS 57.52 110.65 ops/ms 1.92
Float128Vector.ASIN 57.15 117.95 ops/ms 2.06
Float128Vector.ATAN 22.52 318.74 ops/ms 14.15
Float128Vector.ATAN2 17.06 246.07 ops/ms 14.42
Float128Vector.CBRT 29.72 443.74 ops/ms 14.93
Float128Vector.COS 42.82 803.02 ops/ms 18.75
Float128Vector.COSH 31.44 118.34 ops/ms 3.76
Float128Vector.EXP 72.43 855.33 ops/ms 11.81
Float128Vector.EXPM1 37.82 127.85 ops/ms 3.38
Float128Vector.HYPOT 53.20 591.68 ops/ms 11.12
Float128Vector.LOG 52.95 877.94 ops/ms 16.58
Float128Vector.LOG10 49.26 603.72 ops/ms 12.26
Float128Vector.LOG1P 20.89 430.59 ops/ms 20.61
Float128Vector.SIN 43.38 745.31 ops/ms 17.18
Float128Vector.SINH 31.11 112.91 ops/ms 3.63
Float128Vector.TAN 37.25 332.13 ops/ms 8.92
Float128Vector.TANH 57.63 453.77 ops/ms 7.87
Float256Vector.ACOS 65.23 123.73 ops/ms 1.90
Float256Vector.ASIN 63.41 132.86 ops/ms 2.10
Float256Vector.ATAN 23.51 649.02 ops/ms 27.61
Float256Vector.ATAN2 18.19 455.95 ops/ms 25.07
Float256Vector.CBRT 45.99 594.81 ops/ms 12.93
Float256Vector.COS 43.75 926.69 ops/ms 21.18
Float256Vector.COSH 33.52 130.46 ops/ms 3.89
Float256Vector.EXP 75.70 1366.72 ops/ms 18.05
Float256Vector.EXPM1 39.00 149.72 ops/ms 3.84
Float256Vector.HYPOT 52.91 1023.18 ops/ms 19.34
Float256Vector.LOG 53.31 1545.77 ops/ms 29.00
Float256Vector.LOG10 50.31 863.80 ops/ms 17.17
Float256Vector.LOG1P 21.51 616.59 ops/ms 28.66
Float256Vector.SIN 44.07 911.04 ops/ms 20.67
Float256Vector.SINH 33.16 122.50 ops/ms 3.69
Float256Vector.TAN 37.85 497.75 ops/ms 13.15
Float256Vector.TANH 64.27 537.20 ops/ms 8.36
Float512Vector.ACOS 67.33 1718.00 ops/ms 25.52
Float512Vector.ASIN 66.12 1780.85 ops/ms 26.93
Float512Vector.ATAN 22.63 1780.31 ops/ms 78.69
Float512Vector.ATAN2 17.52 1113.93 ops/ms 63.57
Float512Vector.CBRT 54.78 2087.58 ops/ms 38.11
Float512Vector.COS 40.92 1567.93 ops/ms 38.32
Float512Vector.COSH 33.42 138.36 ops/ms 4.14
Float512Vector.EXP 70.51 3835.97 ops/ms 54.41
Float512Vector.EXPM1 38.06 279.80 ops/ms 7.35
Float512Vector.HYPOT 50.99 3287.55 ops/ms 64.47
Float512Vector.LOG 49.61 3156.99 ops/ms 63.64
Float512Vector.LOG10 46.94 2489.16 ops/ms 53.02
Float512Vector.LOG1P 20.66 1689.86 ops/ms 81.81
Float512Vector.POW 32.73 1015.85 ops/ms 31.04
Float512Vector.SIN 41.17 1587.71 ops/ms 38.56
Float512Vector.SINH 33.05 129.39 ops/ms 3.91
Float512Vector.TAN 35.60 1336.11 ops/ms 37.53
Float512Vector.TANH 65.77 2295.28 ops/ms 34.90
Float64Vector.ACOS 48.41 89.34 ops/ms 1.85
Float64Vector.ASIN 47.30 95.72 ops/ms 2.02
Float64Vector.ATAN 20.62 49.45 ops/ms 2.40
Float64Vector.ATAN2 15.95 112.35 ops/ms 7.04
Float64Vector.CBRT 24.03 134.57 ops/ms 5.60
Float64Vector.COS 44.28 394.33 ops/ms 8.91
Float64Vector.COSH 28.35 95.27 ops/ms 3.36
Float64Vector.EXP 65.80 486.37 ops/ms 7.39
Float64Vector.EXPM1 34.61 85.99 ops/ms 2.48
Float64Vector.HYPOT 50.40 147.82 ops/ms 2.93
Float64Vector.LOG 51.93 163.25 ops/ms 3.14
Float64Vector.LOG10 49.53 147.98 ops/ms 2.99
Float64Vector.LOG1P 19.20 206.81 ops/ms 10.77
Float64Vector.SIN 44.41 382.09 ops/ms 8.60
Float64Vector.SINH 28.20 90.68 ops/ms 3.22
Float64Vector.TAN 36.29 160.89 ops/ms 4.43
Float64Vector.TANH 47.65 214.04 ops/ms 4.49

-------------

Commit messages:
 - remove whitespace
 - Merge master
 - Small fix
 - cleanup
 - x86 short vector math optimization for Vector API

Changes: https://git.openjdk.java.net/jdk/pull/3638/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8265783
  Stats: 417153 lines in 124 files changed: 416959 ins; 126 del; 68 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3638.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3638/head:pull/3638

PR: https://git.openjdk.java.net/jdk/pull/3638


More information about the hotspot-compiler-dev mailing list