RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v5]
Magnus Ihse Bursie
ihse at openjdk.org
Thu Nov 30 09:37:35 UTC 2023
On Thu, 30 Nov 2023 06:39:43 GMT, Xiaohong Gong <xgong at openjdk.org> wrote:
>> Currently the vector floating-point math APIs like `VectorOperators.SIN/COS/TAN...` are not intrinsified on the AArch64 platform, which causes a large performance gap on AArch64. Note that those APIs are optimized by the C2 compiler on X86 platforms by calling Intel's SVML code [1]. To close the gap, we would like to optimize these APIs for AArch64 by calling a third-party vector library called libsleef [2], which is available in mainstream Linux distros (e.g. [3] [4]).
>>
>> SLEEF supports multiple accuracies. To match the Vector API's requirement and implement the math ops on AArch64, we 1) by default call the libsleef stubs that provide 1.0 ULP accuracy and use FMA instructions for most of the operations, and 2) add the vector calling convention to apply to the runtime calls to stub code in libsleef. Note that for those APIs for which libsleef does not support 1.0 ULP, we choose 0.5 ULP instead.
>>
>> To help load the expected libsleef library, this patch also adds an experimental JVM option (i.e. `-XX:UseSleefLib`) for AArch64 platforms. People can use it to specify the libsleef path/name explicitly. By default, it points to the system-installed library. If the library does not exist or loading it dynamically at runtime fails, the math vector ops will fall back to the default scalar version without error. But a warning is printed if a nonexistent library is specified explicitly.
>>
>> Note that this is a part of the patch originally proposed in panama-dev [5], just with some initial review comments addressed. And now we'd like to get some wider feedback from more hotspot experts.
>>
>> [1] https://github.com/openjdk/jdk/pull/3638
>> [2] https://sleef.org/
>> [3] https://packages.fedoraproject.org/pkgs/sleef/sleef/
>> [4] https://packages.debian.org/bookworm/libsleef3
>> [5] https://mail.openjdk.org/pipermail/panama-dev/2022-December/018172.html
>
> Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision:
>
> Rename vmath to sleef in configure
This version looks much better, thank you! I guess cflags/SVE_CFLAGS is an okay-ish solution.
I'm still not 100% happy, though that might be due to my limited understanding. Let me write down a few numbered statements, and then you can tell me whether I'm right or wrong.
1. The aarch64 architecture supports two different SIMD instruction set additions, Neon and SVE.
2. A specific instance of an aarch64 CPU can implement Neon, or SVE, or neither of them, but not both.
3. SVE is superior to Neon, and is far more common these days.
4. We would like to ship a single version of libvmath.so, that supports SVE if it happens to be run on a CPU with SVE.
5. The same version will just use the fallback code that "works" but has lower performance if run on a CPU without SVE (regardless of whether it has Neon or not).
6. If libvmath.so is built without SVE support, and is then run on a CPU with SVE, it will "work", but not utilize the SVE functionality, and so will have degraded performance compared to what we want.
7. To be able to build libvmath.so with SVE support, we need to be able to compile a simple test program using `#include <arm_sve.h>` and `-march=armv8-a+sve`. If this fails, we cannot build libvmath.so with SVE support.
8. The ability to build with SVE support should only be dependent on the gcc compiler and sysroot header files, and not the SIMD instruction set of the build machine CPU.
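To make statements 7 and 8 concrete, here is a minimal sketch of such a compile-only probe. It is not taken from the JDK build system: the function name `sve_compile_check`, the probe file name, and the use of `$CC` with a fallback to `cc` are all assumptions for illustration. The probe only compiles (`-c`) and never runs anything, so the result should depend on the compiler and headers rather than on the CPU of the build machine.

```shell
# Hypothetical helper (not from the JDK build): prints "yes" if the compiler
# accepts a translation unit that uses <arm_sve.h>, and "no" otherwise.
sve_compile_check() {
  cc_cmd="${CC:-cc}"   # assumption: compiler comes from $CC, falling back to cc
  tmpdir=$(mktemp -d) || { echo no; return 0; }
  cat > "$tmpdir/sve_probe.c" <<'EOF'
#include <arm_sve.h>
svfloat64_t probe(svfloat64_t a) { return svadd_f64_x(svptrue_b64(), a, a); }
EOF
  # Compile only; the object file is never executed.
  if $cc_cmd -march=armv8-a+sve -c "$tmpdir/sve_probe.c" \
       -o "$tmpdir/sve_probe.o" >/dev/null 2>&1; then
    result=yes
  else
    result=no
  fi
  rm -rf "$tmpdir"
  echo "$result"
}
```

Calling `sve_compile_check` with `CC` pointing at a cross compiler would exercise that toolchain and its sysroot headers instead of the host compiler.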
If all these are correct, then I think the problem is that we silently ignore a failure to build with SVE. Instead, it should cause configure to fail.
If, for some reason, we must support build environments that cannot build for SVE, then we need a configure flag that lets us require SVE build support, like --enable-sve-support. It would be "auto" by default and thus adapt to the platform, but could be set to on, which would cause configure to fail if the platform cannot compile for SVE.
We cannot just silently drop expected functionality depending on the build machine, or at the very least, we must have a way to prevent that from happening.
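As a rough sketch of the behaviour I have in mind for such a flag (the function name `resolve_sve_support`, the `yes`/`no`/`auto` spellings, and the error text are all hypothetical, chosen only to follow the usual `--enable-*` convention, with `yes` playing the role of "on"):

```shell
# Hypothetical sketch of the tri-state decision; not actual configure code.
# $1: requested mode (auto, yes, no)
# $2: whether the toolchain can compile SVE code (yes, no)
# Prints "true" or "false" for "build with SVE"; returns non-zero on error.
resolve_sve_support() {
  case "$1:$2" in
    no:*)
      echo false ;;                 # explicitly disabled
    auto:yes|yes:yes)
      echo true ;;                  # toolchain can do it, so use it
    auto:no)
      echo false ;;                 # adapt to the platform
    yes:no)
      # Explicitly required but unavailable: fail instead of degrading silently.
      echo "configure: error: SVE support requested but the toolchain cannot compile SVE code" >&2
      return 1 ;;
    *)
      echo "configure: error: invalid SVE setting: $1 (toolchain: $2)" >&2
      return 2 ;;
  esac
}
```

With this shape, `resolve_sve_support auto no` quietly yields `false`, while `resolve_sve_support yes no` stops the build, which is the guarantee a release build would want.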
-------------
PR Comment: https://git.openjdk.org/jdk/pull/16234#issuecomment-1833403493
More information about the core-libs-dev mailing list