RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v5]

Thu Nov 30 09:58:21 UTC 2023

On Thu, 30 Nov 2023 06:39:43 GMT, Xiaohong Gong <xgong at openjdk.org> wrote:

>> Currently the vector floating-point math APIs like `VectorOperators.SIN/COS/TAN...` are not intrinsified on the AArch64 platform, which causes a large performance gap on AArch64. Note that those APIs are optimized by the C2 compiler on x86 platforms by calling Intel's SVML code [1]. To close the gap, we would like to optimize these APIs for AArch64 by calling a third-party vector library called libsleef [2], which is available in mainstream Linux distros (e.g. [3] [4]).
>> 
>> SLEEF supports multiple accuracies. To match the Vector API's requirements and implement the math ops on AArch64, we 1) by default call the 1.0 ULP accuracy stubs in libsleef that use FMA instructions for most of the operations, and 2) add a vector calling convention for the runtime calls into the libsleef stub code. Note that for those APIs for which libsleef does not provide 1.0 ULP accuracy, we choose 0.5 ULP instead.
>> 
>> To help load the expected libsleef library, this patch also adds an experimental JVM option (i.e. `-XX:UseSleefLib`) for AArch64 platforms. Users can use it to specify the libsleef path/name explicitly. By default, it points to the system-installed library. If the library does not exist or fails to load dynamically at runtime, the vector math ops fall back to the default scalar version without error. However, a warning is printed if a user explicitly specifies a nonexistent library.
>> 
>> Note that this is part of the patch originally proposed on panama-dev [5], with some initial review comments addressed. We'd now like to get wider feedback from more HotSpot experts.
>> 
>> [1] https://github.com/openjdk/jdk/pull/3638
>> [2] https://sleef.org/
>> [3] https://packages.fedoraproject.org/pkgs/sleef/sleef/
>> [4] https://packages.debian.org/bookworm/libsleef3
>> [5] https://mail.openjdk.org/pipermail/panama-dev/2022-December/018172.html
>
> Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Rename vmath to sleef in configure
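
The accuracy rule in the quoted description (use the 1.0 ULP stub where libsleef provides one, otherwise 0.5 ULP) can be sketched in C as a symbol lookup that tries the two accuracy suffixes in order. This is a minimal sketch, not the patch's code: `resolve_stub`, `symbol_lookup_fn` and `toy_lookup` are hypothetical, the lookup callback stands in for something like `dlsym()`, and the `<base>_u10<isa>` naming pattern is an assumption modeled on SLEEF's function names.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical symbol-lookup callback. In a real loader this role would be
 * played by something like dlsym() on the libsleef handle. */
typedef void *(*symbol_lookup_fn)(const char *name);

/* Resolve a vector math stub, preferring 1.0 ULP and falling back to 0.5 ULP.
 * The name is built as <base><accuracy><isa>, e.g. "Sleef_sind2" + "_u10" +
 * "advsimd"; this pattern is modeled on SLEEF's naming and is an assumption.
 * Returns NULL if neither accuracy is found, so the caller keeps the scalar path. */
static void *resolve_stub(symbol_lookup_fn lookup, const char *base, const char *isa) {
    static const char *const accuracies[] = { "_u10", "_u05" };
    char name[128];
    for (size_t i = 0; i < sizeof accuracies / sizeof accuracies[0]; i++) {
        int n = snprintf(name, sizeof name, "%s%s%s", base, accuracies[i], isa);
        if (n < 0 || (size_t)n >= sizeof name) {
            return NULL; /* name did not fit: treat as unsupported */
        }
        void *fn = lookup(name);
        if (fn != NULL) {
            return fn;
        }
    }
    return NULL;
}

/* Toy symbol table standing in for a loaded library (illustrative names only):
 * sin is "exported" at 1.0 ULP, hypot only at 0.5 ULP. */
static int sin_u10_stub, hypot_u05_stub;

static void *toy_lookup(const char *name) {
    if (strcmp(name, "Sleef_sind2_u10advsimd") == 0) return &sin_u10_stub;
    if (strcmp(name, "Sleef_hypotd2_u05advsimd") == 0) return &hypot_u05_stub;
    return NULL;
}
```

Trying the suffixes from a fixed table keeps the accuracy policy in one place; a lookup for an op with neither suffix simply yields NULL and the scalar version stays in use.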

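The loading policy described for `-XX:UseSleefLib` (use the system library by default, fall back silently to the scalar version, warn only when an explicitly given library cannot be loaded) could look roughly like the C sketch below. It is an illustration under assumptions: `load_sleef`, `lib_open_fn` and the stub loaders are hypothetical, the loader callback stands in for a `dlopen()` wrapper, and the default name `libsleef.so.3` is assumed from the Debian `libsleef3` package.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical loader callback. A real implementation would wrap
 * dlopen(path, RTLD_NOW) from <dlfcn.h> and return its handle. */
typedef void *(*lib_open_fn)(const char *path);

/* Assumed default name of the system-installed library. */
#define DEFAULT_SLEEF_LIB "libsleef.so.3"

/* `user_path` is the value given via -XX:UseSleefLib, or NULL if unset.
 * Returns the library handle, or NULL when the vector math ops should fall
 * back to the scalar version. Sets *warned to 1 only when an explicitly
 * requested library could not be loaded. */
static void *load_sleef(const char *user_path, lib_open_fn open_lib, int *warned) {
    const char *path = (user_path != NULL) ? user_path : DEFAULT_SLEEF_LIB;
    void *handle = open_lib(path);
    *warned = 0;
    if (handle == NULL && user_path != NULL) {
        fprintf(stderr, "warning: could not load SLEEF library '%s'; "
                        "using the scalar fallback\n", path);
        *warned = 1;
    }
    return handle;
}

/* Stub loaders so the policy can be exercised without a real libsleef. */
static int fake_handle;
static void *open_ok(const char *path)   { (void)path; return &fake_handle; }
static void *open_fail(const char *path) { (void)path; return NULL; }
```

Passing the loader in as a parameter only serves to make the policy testable without a real libsleef; the fallback itself is nothing more than a NULL handle.
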
Okay, I've now found a few more of your comments that I missed before. I apologize; the GitHub PR review UI can be a bit confusing when discussions take place in multiple locations. So, here's a revision of my list above:

1. An AArch64 CPU can have both Neon and SVE present at the same time.
2. You are assuming that Neon is always present, and what I referred to as the fallback case is in fact using Neon instead of SVE.
4. You would like to split vect_math.c into two parts, e.g. vect_math_neon.c and vect_math_sve.c.
5. You will then use heuristics in HotSpot to determine at runtime whether SVE or Neon functionality should be used. Even if SVE is present on the runtime machine, the heuristics can choose to use the Neon implementation anyway in some cases.
6. Only vect_math_sve.c needs the -march+sve flag.
7. The Neon part does not need the -march+sve flag, and will fail if built with this flag. (???)

The last point seemed very confusing to me. Right now, you can compile the entire file with the -march+sve flag, right?
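
Points 1, 2 and 5 describe a runtime choice between the SVE and Neon variants on a CPU that may have both. A minimal C sketch of such a dispatch follows; `cpu_features`, `select_vmath_impl` and the 128-bit threshold are hypothetical, since the actual heuristics are not spelled out in this thread.

```c
#include <stdbool.h>

/* Backend a vector math call is routed to. */
typedef enum { VMATH_SCALAR, VMATH_NEON, VMATH_SVE } vmath_impl;

/* Hypothetical summary of the CPU features detected at VM startup. */
typedef struct {
    bool has_neon;
    bool has_sve;
    int  sve_vector_bits;  /* SVE vector length in bits, 0 if SVE is absent */
} cpu_features;

/* Choose an implementation at runtime. The heuristic (prefer Neon when SVE
 * vectors are only 128 bits, i.e. no wider than Neon's) is an illustrative
 * assumption, not the rule used by the patch. */
static vmath_impl select_vmath_impl(const cpu_features *cpu, bool sleef_loaded) {
    if (!sleef_loaded) {
        return VMATH_SCALAR;                 /* no library: scalar fallback */
    }
    if (cpu->has_sve && (cpu->sve_vector_bits > 128 || !cpu->has_neon)) {
        return VMATH_SVE;
    }
    if (cpu->has_neon) {
        return VMATH_NEON;
    }
    return VMATH_SCALAR;
}
```

Keeping the decision in one pure function makes it easy to change the heuristic later without touching the code that calls into either library variant.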

Anyway, it is straightforward to add compiler flags to individual files. You do it like this:

  $(eval $(call SetupJdkLibrary, BUILD_LIBVMATH, \
      NAME := vmath, \
      CFLAGS := $(CFLAGS_JDKLIB) $(LIBSLEEF_CFLAGS) -fvisibility=default, \
      vect_math_sve.c_CFLAGS := $(SVE_CFLAGS), \
 ...
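
Per-file flags pair naturally with a compile-time guard in the SVE source file. The sketch below assumes the split into vect_math_sve.c from point 4; the function name `vect_math_sve_vector_bits` is hypothetical, while `__ARM_FEATURE_SVE` and `svcntd()` come from the Arm C Language Extensions.

```c
/* Sketch of a guard in a hypothetical vect_math_sve.c. The compiler defines
 * __ARM_FEATURE_SVE when the target includes SVE (for example, when this file
 * is built with the per-file SVE_CFLAGS), so SVE code can be guarded by it
 * while the Neon file is built with the default flags. */
#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>

/* Runtime SVE vector length in bits: svcntd() counts 64-bit lanes. */
int vect_math_sve_vector_bits(void) {
    return (int)svcntd() * 64;
}
#else
/* Built without SVE support: report that no SVE code is available. */
int vect_math_sve_vector_bits(void) {
    return 0;
}
#endif
```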

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16234#issuecomment-1833434870