RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v4]
Vladimir Kozlov
kvn at openjdk.java.net
Mon May 17 22:20:44 UTC 2021
On Sat, 15 May 2021 02:06:29 GMT, Sandhya Viswanathan <sviswanathan at openjdk.org> wrote:
>> This PR contains Short Vector Math Library support related changes for [JEP-414 Vector API (Second Incubator)](https://openjdk.java.net/jeps/414), in preparation for when targeted.
>>
>> Intel Short Vector Math Library (SVML) based intrinsics in native x86 assembly provide optimized implementation for Vector API transcendental and trigonometric methods.
>> These methods are built into a separate library instead of being part of libjvm.so or jvm.dll.
>>
>> The following changes are made:
>> The source for these methods is placed in the jdk.incubator.vector module under src/jdk.incubator.vector/linux/native/libsvml and src/jdk.incubator.vector/windows/native/libsvml.
>> The assembly source files are named as “*.S” and include files are named as “*.S.inc”.
>> The corresponding build script is placed at make/modules/jdk.incubator.vector/Lib.gmk.
>> Changes are made to build system to support dependency tracking for assembly files with includes.
>> The built native libraries (libsvml.so/svml.dll) are placed in bin directory of JDK on Windows and lib directory of JDK on Linux.
>> The C2 JIT uses the dll_load and dll_lookup to get the addresses of optimized methods from this library.
>>
>> Build system changes and module library build scripts are contributed by Magnus (magnus.ihse.bursie at oracle.com).
>>
>> Looking forward to your review and feedback.
>>
>> Performance:
>> Micro benchmark Base Optimized Unit Gain(Optimized/Base)
>> Double128Vector.ACOS 45.91 87.34 ops/ms 1.90
>> Double128Vector.ASIN 45.06 92.36 ops/ms 2.05
>> Double128Vector.ATAN 19.92 118.36 ops/ms 5.94
>> Double128Vector.ATAN2 15.24 88.17 ops/ms 5.79
>> Double128Vector.CBRT 45.77 208.36 ops/ms 4.55
>> Double128Vector.COS 49.94 245.89 ops/ms 4.92
>> Double128Vector.COSH 26.91 126.00 ops/ms 4.68
>> Double128Vector.EXP 71.64 379.65 ops/ms 5.30
>> Double128Vector.EXPM1 35.95 150.37 ops/ms 4.18
>> Double128Vector.HYPOT 50.67 174.10 ops/ms 3.44
>> Double128Vector.LOG 61.95 279.84 ops/ms 4.52
>> Double128Vector.LOG10 59.34 239.05 ops/ms 4.03
>> Double128Vector.LOG1P 18.56 200.32 ops/ms 10.79
>> Double128Vector.SIN 49.36 240.79 ops/ms 4.88
>> Double128Vector.SINH 26.59 103.75 ops/ms 3.90
>> Double128Vector.TAN 41.05 152.39 ops/ms 3.71
>> Double128Vector.TANH 45.29 169.53 ops/ms 3.74
>> Double256Vector.ACOS 54.21 106.39 ops/ms 1.96
>> Double256Vector.ASIN 53.60 107.99 ops/ms 2.01
>> Double256Vector.ATAN 21.53 189.11 ops/ms 8.78
>> Double256Vector.ATAN2 16.67 140.76 ops/ms 8.44
>> Double256Vector.CBRT 56.45 397.13 ops/ms 7.04
>> Double256Vector.COS 58.26 389.77 ops/ms 6.69
>> Double256Vector.COSH 29.44 151.11 ops/ms 5.13
>> Double256Vector.EXP 86.67 564.68 ops/ms 6.52
>> Double256Vector.EXPM1 41.96 201.28 ops/ms 4.80
>> Double256Vector.HYPOT 66.18 305.74 ops/ms 4.62
>> Double256Vector.LOG 71.52 394.90 ops/ms 5.52
>> Double256Vector.LOG10 65.43 362.32 ops/ms 5.54
>> Double256Vector.LOG1P 19.99 300.88 ops/ms 15.05
>> Double256Vector.SIN 57.06 380.98 ops/ms 6.68
>> Double256Vector.SINH 29.40 117.37 ops/ms 3.99
>> Double256Vector.TAN 44.90 279.90 ops/ms 6.23
>> Double256Vector.TANH 54.08 274.71 ops/ms 5.08
>> Double512Vector.ACOS 55.65 687.54 ops/ms 12.35
>> Double512Vector.ASIN 57.31 777.72 ops/ms 13.57
>> Double512Vector.ATAN 21.42 729.21 ops/ms 34.04
>> Double512Vector.ATAN2 16.37 414.33 ops/ms 25.32
>> Double512Vector.CBRT 56.78 834.38 ops/ms 14.69
>> Double512Vector.COS 59.88 837.04 ops/ms 13.98
>> Double512Vector.COSH 30.34 172.76 ops/ms 5.70
>> Double512Vector.EXP 99.66 1608.12 ops/ms 16.14
>> Double512Vector.EXPM1 43.39 318.61 ops/ms 7.34
>> Double512Vector.HYPOT 73.87 1502.72 ops/ms 20.34
>> Double512Vector.LOG 74.84 996.00 ops/ms 13.31
>> Double512Vector.LOG10 71.12 1046.52 ops/ms 14.72
>> Double512Vector.LOG1P 19.75 776.87 ops/ms 39.34
>> Double512Vector.POW 37.42 384.13 ops/ms 10.26
>> Double512Vector.SIN 59.74 728.45 ops/ms 12.19
>> Double512Vector.SINH 29.47 143.38 ops/ms 4.87
>> Double512Vector.TAN 46.20 587.21 ops/ms 12.71
>> Double512Vector.TANH 57.36 495.42 ops/ms 8.64
>> Double64Vector.ACOS 24.04 73.67 ops/ms 3.06
>> Double64Vector.ASIN 23.78 75.11 ops/ms 3.16
>> Double64Vector.ATAN 14.14 62.81 ops/ms 4.44
>> Double64Vector.ATAN2 10.38 44.43 ops/ms 4.28
>> Double64Vector.CBRT 16.47 107.50 ops/ms 6.53
>> Double64Vector.COS 23.42 152.01 ops/ms 6.49
>> Double64Vector.COSH 17.34 113.34 ops/ms 6.54
>> Double64Vector.EXP 27.08 203.53 ops/ms 7.52
>> Double64Vector.EXPM1 18.77 96.73 ops/ms 5.15
>> Double64Vector.HYPOT 18.54 103.62 ops/ms 5.59
>> Double64Vector.LOG 26.75 142.63 ops/ms 5.33
>> Double64Vector.LOG10 25.85 139.71 ops/ms 5.40
>> Double64Vector.LOG1P 13.26 97.94 ops/ms 7.38
>> Double64Vector.SIN 23.28 146.91 ops/ms 6.31
>> Double64Vector.SINH 17.62 88.59 ops/ms 5.03
>> Double64Vector.TAN 21.00 86.43 ops/ms 4.12
>> Double64Vector.TANH 23.75 111.35 ops/ms 4.69
>> Float128Vector.ACOS 57.52 110.65 ops/ms 1.92
>> Float128Vector.ASIN 57.15 117.95 ops/ms 2.06
>> Float128Vector.ATAN 22.52 318.74 ops/ms 14.15
>> Float128Vector.ATAN2 17.06 246.07 ops/ms 14.42
>> Float128Vector.CBRT 29.72 443.74 ops/ms 14.93
>> Float128Vector.COS 42.82 803.02 ops/ms 18.75
>> Float128Vector.COSH 31.44 118.34 ops/ms 3.76
>> Float128Vector.EXP 72.43 855.33 ops/ms 11.81
>> Float128Vector.EXPM1 37.82 127.85 ops/ms 3.38
>> Float128Vector.HYPOT 53.20 591.68 ops/ms 11.12
>> Float128Vector.LOG 52.95 877.94 ops/ms 16.58
>> Float128Vector.LOG10 49.26 603.72 ops/ms 12.26
>> Float128Vector.LOG1P 20.89 430.59 ops/ms 20.61
>> Float128Vector.SIN 43.38 745.31 ops/ms 17.18
>> Float128Vector.SINH 31.11 112.91 ops/ms 3.63
>> Float128Vector.TAN 37.25 332.13 ops/ms 8.92
>> Float128Vector.TANH 57.63 453.77 ops/ms 7.87
>> Float256Vector.ACOS 65.23 123.73 ops/ms 1.90
>> Float256Vector.ASIN 63.41 132.86 ops/ms 2.10
>> Float256Vector.ATAN 23.51 649.02 ops/ms 27.61
>> Float256Vector.ATAN2 18.19 455.95 ops/ms 25.07
>> Float256Vector.CBRT 45.99 594.81 ops/ms 12.93
>> Float256Vector.COS 43.75 926.69 ops/ms 21.18
>> Float256Vector.COSH 33.52 130.46 ops/ms 3.89
>> Float256Vector.EXP 75.70 1366.72 ops/ms 18.05
>> Float256Vector.EXPM1 39.00 149.72 ops/ms 3.84
>> Float256Vector.HYPOT 52.91 1023.18 ops/ms 19.34
>> Float256Vector.LOG 53.31 1545.77 ops/ms 29.00
>> Float256Vector.LOG10 50.31 863.80 ops/ms 17.17
>> Float256Vector.LOG1P 21.51 616.59 ops/ms 28.66
>> Float256Vector.SIN 44.07 911.04 ops/ms 20.67
>> Float256Vector.SINH 33.16 122.50 ops/ms 3.69
>> Float256Vector.TAN 37.85 497.75 ops/ms 13.15
>> Float256Vector.TANH 64.27 537.20 ops/ms 8.36
>> Float512Vector.ACOS 67.33 1718.00 ops/ms 25.52
>> Float512Vector.ASIN 66.12 1780.85 ops/ms 26.93
>> Float512Vector.ATAN 22.63 1780.31 ops/ms 78.69
>> Float512Vector.ATAN2 17.52 1113.93 ops/ms 63.57
>> Float512Vector.CBRT 54.78 2087.58 ops/ms 38.11
>> Float512Vector.COS 40.92 1567.93 ops/ms 38.32
>> Float512Vector.COSH 33.42 138.36 ops/ms 4.14
>> Float512Vector.EXP 70.51 3835.97 ops/ms 54.41
>> Float512Vector.EXPM1 38.06 279.80 ops/ms 7.35
>> Float512Vector.HYPOT 50.99 3287.55 ops/ms 64.47
>> Float512Vector.LOG 49.61 3156.99 ops/ms 63.64
>> Float512Vector.LOG10 46.94 2489.16 ops/ms 53.02
>> Float512Vector.LOG1P 20.66 1689.86 ops/ms 81.81
>> Float512Vector.POW 32.73 1015.85 ops/ms 31.04
>> Float512Vector.SIN 41.17 1587.71 ops/ms 38.56
>> Float512Vector.SINH 33.05 129.39 ops/ms 3.91
>> Float512Vector.TAN 35.60 1336.11 ops/ms 37.53
>> Float512Vector.TANH 65.77 2295.28 ops/ms 34.90
>> Float64Vector.ACOS 48.41 89.34 ops/ms 1.85
>> Float64Vector.ASIN 47.30 95.72 ops/ms 2.02
>> Float64Vector.ATAN 20.62 49.45 ops/ms 2.40
>> Float64Vector.ATAN2 15.95 112.35 ops/ms 7.04
>> Float64Vector.CBRT 24.03 134.57 ops/ms 5.60
>> Float64Vector.COS 44.28 394.33 ops/ms 8.91
>> Float64Vector.COSH 28.35 95.27 ops/ms 3.36
>> Float64Vector.EXP 65.80 486.37 ops/ms 7.39
>> Float64Vector.EXPM1 34.61 85.99 ops/ms 2.48
>> Float64Vector.HYPOT 50.40 147.82 ops/ms 2.93
>> Float64Vector.LOG 51.93 163.25 ops/ms 3.14
>> Float64Vector.LOG10 49.53 147.98 ops/ms 2.99
>> Float64Vector.LOG1P 19.20 206.81 ops/ms 10.77
>> Float64Vector.SIN 44.41 382.09 ops/ms 8.60
>> Float64Vector.SINH 28.20 90.68 ops/ms 3.22
>> Float64Vector.TAN 36.29 160.89 ops/ms 4.43
>> Float64Vector.TANH 47.65 214.04 ops/ms 4.49
>
> Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision:
>
> Add missing Lib.gmk
I reviewed only HotSpot part. Main complain is long list of library entry points (vector functions) repeated in several files. May be use macros/lambdas? with loops to reduce number of lines and consolidate in one place. Add comments explaining numbers and letters in names.
src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 7290:
> 7288: StubRoutines::_vector_cbrt_float128 = (address)os::dll_lookup(libsvml, "__svml_cbrtf4_ha_ex");
> 7289: StubRoutines::_vector_cbrt_double64 = (address)os::dll_lookup(libsvml, "__svml_cbrt1_ha_ex");
> 7290: StubRoutines::_vector_cbrt_double128 = (address)os::dll_lookup(libsvml, "__svml_cbrt2_ha_ex");
May be use macros with **comments**? In this code is very easy make `copy-paste` errors. And It is very not clear for me what all these numbers and letters mean in names. I can guess but it should be comments.
src/hotspot/cpu/x86/x86_64.ad line 1713:
> 1711: return OptoRegPair(hi, lo);
> 1712: }
> 1713:
Should these methods check `EnableVectorSupport` flag too? Or `UseVectorStubs`?
src/hotspot/share/opto/callnode.cpp line 747:
> 745:
> 746: // If the return is in vector, compute appropriate regmask taking into account the whole range
> 747: if(ideal_reg >= Op_VecS && ideal_reg <= Op_VecZ) {
Should this be done only for CallLeafVector?
Can `Valhalla` return big object in one of vector registers?
src/hotspot/share/opto/vectorIntrinsics.cpp line 353:
> 351: if (operation == NULL) {
> 352: if (C->print_intrinsics()) {
> 353: tty->print_cr(" ** svml call failed");
Also print information about the call which fail.
src/hotspot/share/opto/vectorIntrinsics.cpp line 1682:
> 1680: default: Unimplemented(); break;
> 1681: }
> 1682: }
Macros?
src/hotspot/share/runtime/stubRoutines.cpp line 330:
> 328: address StubRoutines::_vector_atan2_double256 = NULL;
> 329: address StubRoutines::_vector_atan2_double512 = NULL;
> 330: #endif // __VECTOR_API_MATH_INTRINSICS_COMMON
Macros?
-------------
Changes requested by kvn (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/3638
More information about the hotspot-compiler-dev
mailing list