RFR: 8353217: Build libsleef on macos-aarch64

Vladimir Ivanov vlivanov at openjdk.org
Sat Mar 29 01:18:33 UTC 2025


On Sat, 29 Mar 2025 00:58:59 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

> Build and use the SLEEF library as a backend implementation for Vector API trigonometric and other math functions on the macosx-aarch64 platform.
> 
> This improves raw throughput and eliminates the GC overhead of non-intrinsified Vector API operations.
> 
> The PR includes build changes and relocates the libsleef sources from `src/jdk.incubator.vector/linux/native/` to `src/jdk.incubator.vector/share/native/`.
> 
> Once the libsleef library is present, the existing code in `stubGenerator_aarch64.cpp` successfully links at JVM startup.
> 
> Testing: hs-tier1 - hs-tier4, microbenchmarks

Microbenchmark results on Apple M1 Pro:

Benchmark             |    Average time (lower is better)     |             Allocation rate                       |
                      |    Before           After             |      Before                 After                 |
======================|=======================================|===================================================|
Float128Vector.ACOS   |   3.856 ±0.013   1.941 ± 0.008  us/op |   6076.461 ± 20.067      0.007 ±0.001      MB/sec |
Float128Vector.ASIN   |   3.813 ±0.014   1.512 ± 0.017  us/op |   6145.040 ± 22.824      0.007 ±0.001      MB/sec |
Float128Vector.ATAN   |   7.124 ±0.040   2.220 ± 0.003  us/op |   3289.059 ± 18.539      0.007 ±0.001      MB/sec |
Float128Vector.ATAN2  |  16.983 ±1.031   3.412 ± 0.038  us/op |   2075.808 ±127.179      0.007 ±0.001      MB/sec |
Float128Vector.CBRT   |   6.431 ±0.014   4.075 ± 0.011  us/op |   3643.789 ±  7.933      0.007 ±0.001      MB/sec |
Float128Vector.COS    |   8.269 ±0.094   5.614 ± 0.026  us/op |   2833.915 ± 32.041      0.007 ±0.001      MB/sec |
Float128Vector.COSH   |   5.779 ±0.020   3.072 ± 0.010  us/op |   4054.800 ± 14.028      0.007 ±0.001      MB/sec |
Float128Vector.EXP    |   5.456 ±0.006   0.936 ± 0.004  us/op |   4294.853 ±  5.025      0.007 ±0.001      MB/sec |
Float128Vector.EXPM1  |   6.888 ±0.059   2.972 ± 0.010  us/op |   3402.363 ± 28.694      0.007 ±0.001      MB/sec |
Float128Vector.HYPOT  |   6.369 ±0.013   2.213 ± 0.008  us/op |   5519.051 ± 11.103      0.007 ±0.001      MB/sec |
Float128Vector.LOG    |   8.469 ±0.574   1.729 ± 0.004  us/op |   2775.039 ±157.629      0.007 ±0.001      MB/sec |
Float128Vector.LOG10  |  15.235 ±1.039   1.830 ± 0.006  us/op |   1544.009 ±107.436      0.007 ±0.001      MB/sec |
Float128Vector.LOG1P  |   8.823 ±0.040   1.745 ± 0.014  us/op |   2655.757 ± 11.964      0.007 ±0.001      MB/sec |
Float128Vector.POW    |  27.511 ±0.918   7.467 ± 0.033  us/op |   1278.693 ± 42.538      0.007 ±0.001      MB/sec |
Float128Vector.SIN    |   7.846 ±0.063   5.822 ± 0.015  us/op |   2986.480 ± 24.025      0.007 ±0.001      MB/sec |
Float128Vector.SINH   |   5.747 ±0.033   3.206 ± 0.034  us/op |   4077.645 ± 23.305      0.007 ±0.001      MB/sec |
Float128Vector.TAN    |  22.337 ±0.533   6.114 ± 0.016  us/op |   1049.469 ± 24.969      0.007 ±0.001      MB/sec |

Double128Vector.ACOS  |   5.789 ±0.107   4.635 ± 0.013  us/op |   8097.069 ±146.593      0.007 ±0.001      MB/sec |
Double128Vector.ASIN  |   5.655 ±0.011   3.858 ± 0.017  us/op |   8287.521 ± 16.023      0.007 ±0.001      MB/sec |
Double128Vector.ATAN  |  10.082 ±0.046   6.016 ± 0.016  us/op |   4648.068 ± 21.401      0.007 ±0.001      MB/sec |
Double128Vector.ATAN2 |  17.286 ±0.113   8.148 ± 0.015  us/op |   4067.019 ± 26.586      0.007 ±0.001      MB/sec |
Double128Vector.CBRT  |   9.779 ±0.048   8.861 ± 0.045  us/op |   4792.419 ± 23.381      0.007 ±0.001      MB/sec |
Double128Vector.COS   |   9.071 ±0.107   6.948 ± 0.027  us/op |   5166.999 ± 59.377      0.007 ±0.001      MB/sec |
Double128Vector.COSH  |   8.234 ±0.030   6.403 ± 0.025  us/op |   5692.144 ± 20.625      0.007 ±0.001      MB/sec |
Double128Vector.EXP   |   7.506 ±0.012   3.073 ± 0.013  us/op |   6243.783 ± 10.382      0.007 ±0.001      MB/sec |
Double128Vector.EXPM1 |   9.122 ±0.036   6.122 ± 0.036  us/op |   5137.721 ± 20.350      0.007 ±0.001      MB/sec |
Double128Vector.HYPOT |  13.445 ±0.248   4.596 ± 0.035  us/op |   5229.977 ± 96.222      0.007 ±0.001      MB/sec |
Double128Vector.LOG   |  10.396 ±0.042   4.629 ± 0.081  us/op |   4507.928 ± 18.101      0.007 ±0.001      MB/sec |
Double128Vector.LOG10 |  13.923 ±0.046   4.889 ± 0.021  us/op |   3365.944 ± 11.078      0.007 ±0.001      MB/sec |
Double128Vector.LOG1P |  12.336 ±0.045   5.010 ± 0.027  us/op |   3799.204 ± 13.816      0.007 ±0.001      MB/sec |
Double128Vector.POW   |  28.852 ±0.043  15.270 ± 0.081  us/op |   2436.503 ±  3.647      0.007 ±0.001      MB/sec |
Double128Vector.SIN   |   8.821 ±0.018   6.309 ± 0.037  us/op |   5313.077 ± 11.056      0.007 ±0.001      MB/sec |
Double128Vector.SINH  |   8.289 ±0.037   6.566 ± 0.029  us/op |   5654.264 ± 25.538      0.007 ±0.001      MB/sec |
Double128Vector.TAN   |  25.535 ±0.636   9.788 ± 0.036  us/op |   1836.177 ± 44.430      0.007 ±0.001      MB/sec |
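
To read the time columns as speedups, divide the "Before" time by the "After" time. The sketch below does that for a few rows copied from the table above. The class and method names (`SleefSpeedup`, `speedup`) are hypothetical and are not part of the PR or of the benchmark suite.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative helper only: SleefSpeedup and speedup() are made-up names.
public class SleefSpeedup {

    // Ratio of average times per operation; a value above 1.0 means
    // the "After" run is faster than the "Before" run.
    static double speedup(double beforeUsPerOp, double afterUsPerOp) {
        return beforeUsPerOp / afterUsPerOp;
    }

    public static void main(String[] args) {
        // {before, after} average times in us/op, copied from the table.
        Map<String, double[]> rows = new LinkedHashMap<>();
        rows.put("Float128Vector.EXP",   new double[] {5.456, 0.936});
        rows.put("Float128Vector.LOG10", new double[] {15.235, 1.830});
        rows.put("Double128Vector.CBRT", new double[] {9.779, 8.861});
        rows.put("Double128Vector.POW",  new double[] {28.852, 15.270});

        for (Map.Entry<String, double[]> e : rows.entrySet()) {
            double[] t = e.getValue();
            System.out.printf("%-22s %.2fx%n", e.getKey(), speedup(t[0], t[1]));
        }
    }
}
```

The ratio ignores the reported error margins, so it is only a rough guide for rows where the "Before" and "After" times are close.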

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24306#issuecomment-2762959907

