RFR: 8353217: Build libsleef on macos-aarch64
Vladimir Ivanov
vlivanov at openjdk.org
Sat Mar 29 01:18:33 UTC 2025
On Sat, 29 Mar 2025 00:58:59 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:
> Build and use the SLEEF library as a backend implementation for Vector API trigonometric functions on the macosx-aarch64 platform.
>
> It improves raw throughput and eliminates the GC overhead of non-intrinsified Vector API operations.
>
> The PR includes build changes and relocates the libsleef sources from `src/jdk.incubator.vector/linux/native/` to `src/jdk.incubator.vector/share/native/`.
>
> Once the libsleef library is present, the existing code in `stubGenerator_aarch64.cpp` links successfully at JVM startup.
>
> Testing: hs-tier1 - hs-tier4, microbenchmarks
Microbenchmark results on Apple M1 Pro:
Benchmark             | Throughput                            | Allocation rate                           |
                      | Before           After                | Before               After                |
======================|=======================================|===========================================|
Float128Vector.ACOS   |  3.856 ± 0.013    1.941 ± 0.008 us/op | 6076.461 ±  20.067   0.007 ± 0.001 MB/sec |
Float128Vector.ASIN   |  3.813 ± 0.014    1.512 ± 0.017 us/op | 6145.040 ±  22.824   0.007 ± 0.001 MB/sec |
Float128Vector.ATAN   |  7.124 ± 0.040    2.220 ± 0.003 us/op | 3289.059 ±  18.539   0.007 ± 0.001 MB/sec |
Float128Vector.ATAN2  | 16.983 ± 1.031    3.412 ± 0.038 us/op | 2075.808 ± 127.179   0.007 ± 0.001 MB/sec |
Float128Vector.CBRT   |  6.431 ± 0.014    4.075 ± 0.011 us/op | 3643.789 ±   7.933   0.007 ± 0.001 MB/sec |
Float128Vector.COS    |  8.269 ± 0.094    5.614 ± 0.026 us/op | 2833.915 ±  32.041   0.007 ± 0.001 MB/sec |
Float128Vector.COSH   |  5.779 ± 0.020    3.072 ± 0.010 us/op | 4054.800 ±  14.028   0.007 ± 0.001 MB/sec |
Float128Vector.EXP    |  5.456 ± 0.006    0.936 ± 0.004 us/op | 4294.853 ±   5.025   0.007 ± 0.001 MB/sec |
Float128Vector.EXPM1  |  6.888 ± 0.059    2.972 ± 0.010 us/op | 3402.363 ±  28.694   0.007 ± 0.001 MB/sec |
Float128Vector.HYPOT  |  6.369 ± 0.013    2.213 ± 0.008 us/op | 5519.051 ±  11.103   0.007 ± 0.001 MB/sec |
Float128Vector.LOG    |  8.469 ± 0.574    1.729 ± 0.004 us/op | 2775.039 ± 157.629   0.007 ± 0.001 MB/sec |
Float128Vector.LOG10  | 15.235 ± 1.039    1.830 ± 0.006 us/op | 1544.009 ± 107.436   0.007 ± 0.001 MB/sec |
Float128Vector.LOG1P  |  8.823 ± 0.040    1.745 ± 0.014 us/op | 2655.757 ±  11.964   0.007 ± 0.001 MB/sec |
Float128Vector.POW    | 27.511 ± 0.918    7.467 ± 0.033 us/op | 1278.693 ±  42.538   0.007 ± 0.001 MB/sec |
Float128Vector.SIN    |  7.846 ± 0.063    5.822 ± 0.015 us/op | 2986.480 ±  24.025   0.007 ± 0.001 MB/sec |
Float128Vector.SINH   |  5.747 ± 0.033    3.206 ± 0.034 us/op | 4077.645 ±  23.305   0.007 ± 0.001 MB/sec |
Float128Vector.TAN    | 22.337 ± 0.533    6.114 ± 0.016 us/op | 1049.469 ±  24.969   0.007 ± 0.001 MB/sec |
Double128Vector.ACOS  |  5.789 ± 0.107    4.635 ± 0.013 us/op | 8097.069 ± 146.593   0.007 ± 0.001 MB/sec |
Double128Vector.ASIN  |  5.655 ± 0.011    3.858 ± 0.017 us/op | 8287.521 ±  16.023   0.007 ± 0.001 MB/sec |
Double128Vector.ATAN  | 10.082 ± 0.046    6.016 ± 0.016 us/op | 4648.068 ±  21.401   0.007 ± 0.001 MB/sec |
Double128Vector.ATAN2 | 17.286 ± 0.113    8.148 ± 0.015 us/op | 4067.019 ±  26.586   0.007 ± 0.001 MB/sec |
Double128Vector.CBRT  |  9.779 ± 0.048    8.861 ± 0.045 us/op | 4792.419 ±  23.381   0.007 ± 0.001 MB/sec |
Double128Vector.COS   |  9.071 ± 0.107    6.948 ± 0.027 us/op | 5166.999 ±  59.377   0.007 ± 0.001 MB/sec |
Double128Vector.COSH  |  8.234 ± 0.030    6.403 ± 0.025 us/op | 5692.144 ±  20.625   0.007 ± 0.001 MB/sec |
Double128Vector.EXP   |  7.506 ± 0.012    3.073 ± 0.013 us/op | 6243.783 ±  10.382   0.007 ± 0.001 MB/sec |
Double128Vector.EXPM1 |  9.122 ± 0.036    6.122 ± 0.036 us/op | 5137.721 ±  20.350   0.007 ± 0.001 MB/sec |
Double128Vector.HYPOT | 13.445 ± 0.248    4.596 ± 0.035 us/op | 5229.977 ±  96.222   0.007 ± 0.001 MB/sec |
Double128Vector.LOG   | 10.396 ± 0.042    4.629 ± 0.081 us/op | 4507.928 ±  18.101   0.007 ± 0.001 MB/sec |
Double128Vector.LOG10 | 13.923 ± 0.046    4.889 ± 0.021 us/op | 3365.944 ±  11.078   0.007 ± 0.001 MB/sec |
Double128Vector.LOG1P | 12.336 ± 0.045    5.010 ± 0.027 us/op | 3799.204 ±  13.816   0.007 ± 0.001 MB/sec |
Double128Vector.POW   | 28.852 ± 0.043   15.270 ± 0.081 us/op | 2436.503 ±   3.647   0.007 ± 0.001 MB/sec |
Double128Vector.SIN   |  8.821 ± 0.018    6.309 ± 0.037 us/op | 5313.077 ±  11.056   0.007 ± 0.001 MB/sec |
Double128Vector.SINH  |  8.289 ± 0.037    6.566 ± 0.029 us/op | 5654.264 ±  25.538   0.007 ± 0.001 MB/sec |
Double128Vector.TAN   | 25.535 ± 0.636    9.788 ± 0.036 us/op | 1836.177 ±  44.430   0.007 ± 0.001 MB/sec |
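
Since the throughput column is reported in us/op, a lower number is better, and the speedup for a row is simply Before divided by After. A minimal sketch of that arithmetic follows; the class and method names (`SpeedupSketch`, `speedup`) are hypothetical and only illustrate how to read the table, they are not part of the PR. For example, Float128Vector.EXP comes out at roughly 5.8x and Double128Vector.CBRT at roughly 1.1x.

```java
public class SpeedupSketch {
    // Speedup factor from two "time per operation" scores (us/op).
    // Both arguments are mean scores copied from the table; the
    // error margins are ignored in this simple ratio.
    static double speedup(double beforeUsPerOp, double afterUsPerOp) {
        if (beforeUsPerOp <= 0.0 || afterUsPerOp <= 0.0) {
            throw new IllegalArgumentException("scores must be positive");
        }
        return beforeUsPerOp / afterUsPerOp;
    }

    public static void main(String[] args) {
        // Rows copied from the table above: name, Before, After (us/op).
        String[] names = { "Float128Vector.EXP", "Double128Vector.CBRT" };
        double[] before = { 5.456, 9.779 };
        double[] after = { 0.936, 8.861 };

        for (int i = 0; i < names.length; i++) {
            System.out.printf("%-22s %.2fx%n", names[i], speedup(before[i], after[i]));
        }
    }
}
```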
-------------
PR Comment: https://git.openjdk.org/jdk/pull/24306#issuecomment-2762959907