Comparing the performance of Panama with JNI, JNA, and JNR - based on Java 21
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Mon Mar 27 08:59:17 UTC 2023
Hi Glavo,
I agree that, from an architectural perspective, doing something like
this would be preferable to using a native method. There are some
complications with using the Vector API from java.base (as vector is an
incubating API), but we do have a mirror internal API (VectorSupport)
which perhaps can be used - at a lower level - to achieve the same
thing. I'll ask around.
Maurizio
On 26/03/2023 18:26, Glavo wrote:
> I made an attempt:
>
> I implemented a method using 128-bit SSE/AVX instructions (via the
> Vector API) to find bytes less than or equal to 0.
> Unlike strlen, which only looks for null terminators, it also looks
> for negative bytes to determine whether the string contains non-ASCII
> characters.
> If a string contains only ASCII characters, it can use a fast path to
> directly call the constructor of the String without having to decode
> and copy the array again (thanks to compact strings).
>
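To make the idea in the paragraph above concrete, here is a minimal scalar sketch. It is not the code from the linked benchmark: it uses a plain loop over a byte[] instead of the Vector API, and the class and method names (AsciiFastPathSketch, firstNonPositive, readCString) are made up for illustration. It also goes through the public String constructor, which may differ from what the benchmark actually calls.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: scalar stand-in for the SIMD scan described above.
final class AsciiFastPathSketch {

    // Returns the index of the first byte <= 0 in buf[from, to), or 'to' if none.
    // A byte <= 0 is either the NUL terminator (0) or a non-ASCII byte (negative).
    static int firstNonPositive(byte[] buf, int from, int to) {
        for (int i = from; i < to; i++) {
            if (buf[i] <= 0) {
                return i;
            }
        }
        return to;
    }

    // Decodes a NUL-terminated UTF-8 string starting at 'from'.
    // Never reads past buf.length; if no terminator is found, decodes to the end.
    static String readCString(byte[] buf, int from) {
        int stop = firstNonPositive(buf, from, buf.length);
        if (stop < buf.length && buf[stop] == 0) {
            // Fast path: the scan stopped at the terminator, so every byte
            // before it is in 1..127 and no UTF-8 decoding is needed.
            return new String(buf, from, stop - from, StandardCharsets.ISO_8859_1);
        }
        // Slow path: a non-ASCII byte came first (or there is no terminator).
        // Keep looking for the terminator, then decode as UTF-8.
        int end = stop;
        while (end < buf.length && buf[end] != 0) {
            end++;
        }
        return new String(buf, from, end - from, StandardCharsets.UTF_8);
    }
}
```

A vectorized version would replace the loop in firstNonPositive with a lane-wise comparison over 16 bytes at a time; the fast/slow path split stays the same.
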
> I ran the JMH benchmark and the results were satisfactory:
>
> * There is only a slight performance regression (< 10%) for
> non-ASCII strings smaller than 16 bytes;
> * Although SIMD is not used for ASCII strings smaller than 16 bytes,
> throughput has increased by 33% due to the new fast path;
> * For non-ASCII strings larger than 16 bytes, the throughput
> increased by 5%~465% due to SIMD;
> * For ASCII strings larger than 16 bytes, the throughput increased
> by 104%~2207%.
>
> For 4KiB ASCII strings, the new implementation is 22 times
> faster! Even small ASCII strings of only 16 bytes have double the
> performance.
> This is a big victory, and even using strlen won't achieve such a
> significant improvement.
>
> Here is the source code:
>
> https://github.com/Glavo/java-ffi-benchmark/blob/main/src/main/java/benchmark/experimental/GetStringUTF8Benchmark.java
>
> It's just a simple implementation for experimental purposes. For
> simplicity, I used 128-bit SIMD instructions.
> In the future, we can consider choosing AVX-2 or AVX-512 at runtime,
> and maybe we can get more gains.
>
> Glavo
>
> On Sun, Mar 26, 2023 at 9:18 PM Sebastian Stenzel
> <sebastian.stenzel at gmail.com> wrote:
>
>
> > On 26.03.2023 at 14:46, Maurizio Cimadamore
> <maurizio.cimadamore at oracle.com> wrote:
> >
> > Forgot: another problem is that just offloading to external
> "strlen" will not respect the memory segment boundaries (e.g. the
> underlying strlen will keep going even past the spatial boundaries
> of the memory segment).
>
> How about using strnlen? At least for native segments?
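For illustration, the difference between strlen and strnlen is that the latter takes a maximum length, so the scan can be capped at the segment size. A rough Java sketch of that bounded behaviour follows; a byte[] stands in for the native segment here, and BoundedScanSketch and boundedLength are hypothetical names, not part of any API.

```java
import java.util.Objects;

// Hypothetical sketch: strnlen-style bounded terminator scan over a byte[].
final class BoundedScanSketch {

    // Returns the number of bytes before the first NUL in
    // buf[offset, offset + maxLen), or maxLen if no NUL is found in that range.
    // Unlike an unbounded strlen, it never looks past offset + maxLen.
    static int boundedLength(byte[] buf, int offset, int maxLen) {
        // Reject ranges that fall outside the array up front.
        Objects.checkFromIndexSize(offset, maxLen, buf.length);
        for (int i = 0; i < maxLen; i++) {
            if (buf[offset + i] == 0) {
                return i;
            }
        }
        return maxLen;
    }
}
```

A caller that gets maxLen back knows no terminator was found inside the bounds and can decide whether to treat that as an error.
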
>
> Improving string conversion efficiency would make a huge
> difference in my FUSE bindings, where virtually every call
> contains a file path.
>