Comparing the performance of Panama with JNI, JNA, and JNR - based on Java 21
Glavo
zjx001202 at gmail.com
Sun Mar 26 17:26:48 UTC 2023
I made an attempt:
I implemented a method that uses 128-bit SSE/AVX instructions (via the
Vector API) to find bytes less than or equal to 0.
Unlike strlen, which only looks for null terminators, it also looks for
negative bytes to determine whether the string contains non-ASCII
characters.
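The scan described above can be sketched in plain Java. The linked benchmark uses the Vector API (jdk.incubator.vector) with 128-bit vectors. To keep this sketch runnable without incubator flags, it swaps in a different technique, the word-at-a-time (SWAR) trick, which tests 8 bytes per step in a `long` instead of 16 bytes in a vector register. The class and method names are hypothetical, and this is not the benchmark's code.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

final class FirstNonPositive {
    // Views a byte[] as little-endian longs, so byte k of a loaded word
    // is the byte at index (i + k) of the array.
    private static final VarHandle LONG_LE =
            MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);

    private static final long ONES = 0x0101010101010101L;
    private static final long HIGHS = 0x8080808080808080L;

    /**
     * Returns the index of the first byte in buf[from, to) that is <= 0
     * (a NUL terminator or a negative, i.e. non-ASCII, byte), or -1 if none.
     */
    static int firstNonPositive(byte[] buf, int from, int to) {
        int i = from;
        // Main loop: test 8 bytes per iteration.
        for (; i + Long.BYTES <= to; i += Long.BYTES) {
            long x = (long) LONG_LE.get(buf, i);
            // A byte's high bit ends up set in (x - ONES) | x if the byte is 0
            // (0 - 1 wraps to 0xFF) or if it is negative as a signed byte.
            // Borrows only propagate upward from a zero byte, so the lowest
            // flagged byte is always a genuine match.
            long mask = ((x - ONES) | x) & HIGHS;
            if (mask != 0) {
                return i + (Long.numberOfTrailingZeros(mask) >>> 3);
            }
        }
        // Scalar tail for the last 0..7 bytes.
        for (; i < to; i++) {
            if (buf[i] <= 0) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] ascii = "Hello, world\0tail".getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = "abcdefghij\u00e9klmnop\0".getBytes(StandardCharsets.UTF_8);
        System.out.println(firstNonPositive(ascii, 0, ascii.length)); // 12: the NUL
        System.out.println(firstNonPositive(utf8, 0, utf8.length));   // 10: first byte of U+00E9
    }
}
```

Because the search only reads within `[from, to)`, the same loop can be bounded by a segment's size rather than running until it happens to meet a terminator.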
If a string contains only ASCII characters, it can take a fast path that
calls the String constructor directly, without decoding the bytes and
copying the array again (thanks to compact strings).
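A minimal sketch of this two-path structure, assuming the bytes have already been copied out of native memory into a `byte[]` (a stand-in chosen because the FFM API is still a preview feature in Java 21). It classifies with a plain scalar loop rather than SIMD, and `CStringDecode.decode` is a hypothetical name, not the benchmark's API.

```java
import java.nio.charset.StandardCharsets;

final class CStringDecode {
    /**
     * Decodes a NUL-terminated UTF-8 string from buf, a hypothetical stand-in
     * for bytes copied out of native memory.
     *
     * @throws IllegalArgumentException if buf contains no NUL terminator
     */
    static String decode(byte[] buf) {
        int i = 0;
        // Stop at the first byte <= 0: either the terminator (everything
        // before it was ASCII) or the first non-ASCII byte.
        while (i < buf.length && buf[i] > 0) {
            i++;
        }
        if (i == buf.length) {
            throw new IllegalArgumentException("no NUL terminator");
        }
        if (buf[i] == 0) {
            // Fast path: only ASCII bytes precede the terminator. ASCII is a
            // subset of Latin-1, so with compact strings the JDK can store these
            // bytes in a Latin-1 coded String without running a UTF-8 decoder.
            return new String(buf, 0, i, StandardCharsets.ISO_8859_1);
        }
        // Slow path: a non-ASCII byte was found; locate the terminator,
        // then decode the whole prefix as UTF-8.
        int len = i;
        while (len < buf.length && buf[len] != 0) {
            len++;
        }
        if (len == buf.length) {
            throw new IllegalArgumentException("no NUL terminator");
        }
        return new String(buf, 0, len, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(decode("path/to/file\0".getBytes(StandardCharsets.ISO_8859_1)));
        System.out.println(decode("gr\u00fc\u00dfe\0".getBytes(StandardCharsets.UTF_8)));
    }
}
```

Note that the public `String` constructor used here still copies the bytes once; the sketch only shows how the ASCII check selects between the two paths.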
I ran the JMH benchmark and the results were satisfactory:
- For non-ASCII strings smaller than 16 bytes, there is only a slight
performance regression (< 10%);
- For ASCII strings smaller than 16 bytes, SIMD is not used, but
throughput still increased by 33% thanks to the new fast path;
- For non-ASCII strings larger than 16 bytes, throughput increased by
5% to 465% thanks to SIMD;
- For ASCII strings larger than 16 bytes, throughput increased by
104% to 2207%.
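The figures above are throughput increases rather than ratios. As a quick arithmetic aid (not additional measurements), a p% increase corresponds to a speedup factor S = 1 + p/100:

```latex
S = \frac{\text{throughput}_{\text{new}}}{\text{throughput}_{\text{old}}} = 1 + \frac{p}{100},
\qquad
p = 33 \Rightarrow S = 1.33,\quad
p = 104 \Rightarrow S = 2.04,\quad
p = 465 \Rightarrow S = 5.65,\quad
p = 2207 \Rightarrow S = 23.07
```

So the 104% lower bound means roughly doubled throughput, and 2207% means about 23 times the original throughput.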
For 4 KiB ASCII strings, the new implementation is 22 times faster! Even
small ASCII strings of only 16 bytes see their throughput double.
This is a big win: even using strlen would not achieve such a
significant improvement.
Here is the source code:
https://github.com/Glavo/java-ffi-benchmark/blob/main/src/main/java/benchmark/experimental/GetStringUTF8Benchmark.java
It's just a simple implementation for experimental purposes; for
simplicity, I used 128-bit SIMD instructions.
In the future, we could consider choosing AVX2 or AVX-512 at runtime,
which may bring further gains.
Glavo
On Sun, Mar 26, 2023 at 9:18 PM Sebastian Stenzel <sebastian.stenzel at gmail.com> wrote:
>
> > On 26.03.2023 at 14:46, Maurizio Cimadamore <maurizio.cimadamore at oracle.com> wrote:
> >
> > Forgot: another problem is that just offloading to external "strlen"
> > will not respect the memory segment boundaries (e.g. the underlying strlen
> > will keep going even past the spatial boundaries of the memory segment).
>
> How about using strnlen? At least for native segments?
>
> Improving string conversion efficiency would make a huge difference in my
> FUSE bindings, where virtually every call contains a file path.
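In Java terms, the strnlen suggestion amounts to a terminator search capped by the segment size, so it can never run past the end of the memory segment. A minimal scalar sketch, with a `byte[]` standing in for the segment and hypothetical method names; returning the bound signals that no terminator was found, and the caller decides whether to reject the string.

```java
final class BoundedStrlen {
    /**
     * Like C strnlen: returns the index of the first NUL in
     * buf[0, min(maxLen, buf.length)), or that bound if there is none.
     * Never reads beyond either limit.
     */
    static int strnlen(byte[] buf, int maxLen) {
        int limit = Math.min(maxLen, buf.length);
        for (int i = 0; i < limit; i++) {
            if (buf[i] == 0) {
                return i;
            }
        }
        return limit;
    }

    /** Returns the string length, rejecting buffers that hold no terminator. */
    static int checkedLength(byte[] buf) {
        int n = strnlen(buf, buf.length);
        if (n == buf.length) {
            throw new IllegalArgumentException("no NUL terminator within bounds");
        }
        return n;
    }

    public static void main(String[] args) {
        byte[] terminated = {'a', 'b', 'c', 0, 'x'};
        byte[] unterminated = {'a', 'b', 'c'};
        System.out.println(checkedLength(terminated));   // 3
        System.out.println(strnlen(unterminated, 16));   // 3: capped at buf.length
    }
}
```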
More information about the panama-dev mailing list