RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v5]
Xiaohong Gong
xgong at openjdk.org
Fri Dec 1 01:33:12 UTC 2023
On Thu, 30 Nov 2023 06:39:43 GMT, Xiaohong Gong <xgong at openjdk.org> wrote:
>> Currently the vector floating-point math APIs like `VectorOperators.SIN/COS/TAN...` are not intrinsified on AArch64 platform, which causes large performance gap on AArch64. Note that those APIs are optimized by C2 compiler on X86 platforms by calling Intel's SVML code [1]. To close the gap, we would like to optimize these APIs for AArch64 by calling a third-party vector library called libsleef [2], which are available in mainstream Linux distros (e.g. [3] [4]).
>>
>> SLEEF supports multiple accuracies. To match Vector API's requirement and implement the math ops on AArch64, we 1) call 1.0 ULP accuracy with FMA instructions used stubs in libsleef for most of the operations by default, and 2) add the vector calling convention to apply with the runtime calls to stub code in libsleef. Note that for those APIs that libsleef does not support 1.0 ULP, we choose 0.5 ULP instead.
>>
>> To help loading the expected libsleef library, this patch also adds an experimental JVM option (i.e. `-XX:UseSleefLib`) for AArch64 platforms. People can use it to denote the libsleef path/name explicitly. By default, it points to the system installed library. If the library does not exist or the dynamic loading of it in runtime fails, the math vector ops will fall-back to use the default scalar version without error. But a warning is printed out if people specifies a nonexistent library explicitly.
>>
>> Note that this is a part of the original proposed patch in panama-dev [5], just with some initial review comments addressed. And now we'd like to get some wider feedbacks from more hotspot experts.
>>
>> [1] https://github.com/openjdk/jdk/pull/3638
>> [2] https://sleef.org/
>> [3] https://packages.fedoraproject.org/pkgs/sleef/sleef/
>> [4] https://packages.debian.org/bookworm/libsleef3
>> [5] https://mail.openjdk.org/pipermail/panama-dev/2022-December/018172.html
>
> Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision:
>
> Rename vmath to sleef in configure
> Okay, now I found a few more of your comments that I missed before. I apologize, the Github PR review UI can be a bit confusing when discussions are taking place in multiple locations. So, here's a revision to my list above:
>
> 1. An aach64 CPU can have both Neon and SVE present at the same time.
> 2. You are assuming that Neon is always present, and what I referred to as the fallback case is in fact using Neon instead of SVE.
> 3. You would like to split vect_math.c into two parts, e.g. vect_math_neon.c and vect_math_sve.c.
> 4. You will then, use heuristics in hotspot to determine at runtime if SVE or Neon functionality should be used. Even if SVE is present on the runtime machine, heuristics can chose to use the Neon implementation anyway in some cases.
> 5. Only vect_math_sve.c. need the -march+sve.
Yes, all the above are true.
> 6. The neon part do not need the -march+sve flag, and will fail if built with this flag. (???)
Current neon code does not need the `-march=arm-a+sve` flag, but it will not fail if we built with it. Because the C compiler (GCC/Clang) can also supports SVE and NEON at the same time. My concert is that if we build the neon code with sve flags, it has the possibility that compiler may generate SVE specific instructions inside the NEON functions in future (e.g. if sve has new features related to method call) , although it doesn't happen now. If so, calling the neon stubs (which contains sve instructions) on non-sve supported machines in runtime may crash. Hence, it's more safe if we can separate neon and sve code and use different flags for them.
> Anyway, it is straightforward to add compiler flags to individual files. You do it like this:
>
> $(eval $(call SetupJdkLibrary, BUILD_LIBVMATH, \
NAME := vmath, \
CFLAGS := $(CFLAGS_JDKLIB) $(LIBSLEEF_CFLAGS) -fvisibility=default, \
vect_math_sve.c_CFLAGS := $(SVE_CFLAGS), \
...
Thanks so much for this. That's very helpful!
-------------
PR Comment: https://git.openjdk.org/jdk/pull/16234#issuecomment-1835258683
More information about the build-dev
mailing list