Comparing the performance of Panama with JNI, JNA, and JNR - based on Java 21

Mon Mar 27 10:48:02 UTC 2023

>
> we do have a mirror internal API (VectorSupport) which perhaps can be used
>

It's interesting, I'll take a look at it.

Maybe I can create a PR that optimizes getUtf8String in the near future,
but I can't guarantee it.

Glavo

On Mon, Mar 27, 2023 at 4:59 PM Maurizio Cimadamore <
maurizio.cimadamore at oracle.com> wrote:

> Hi Glavo,
> I agree that, from an architectural perspective, doing something like this
> would be preferable to using a native method. There are some complications
> with using the Vector API from java.base (as vector is an incubating API),
> but we do have a mirror internal API (VectorSupport) which perhaps can be
> used - at a lower level - to achieve the same thing. I'll ask around.
>
> Maurizio
> On 26/03/2023 18:26, Glavo wrote:
>
> I made an attempt:
>
> I implemented a method using 128-bit SSE/AVX instructions (via the
> Vector API) to find bytes less than or equal to 0.
> Unlike strlen, which only looks for null terminators, it also looks for
> negative bytes to determine whether the string contains non-ASCII
> characters.
> If a string contains only ASCII characters, it can use a fast path to
> directly call the constructor of the String without having to decode and
> copy the array again (thanks to compact strings).
>
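A minimal scalar sketch of the idea described above, not the code from the linked benchmark: scan for the first byte that is less than or equal to 0, and if that byte is the null terminator, treat the content as pure ASCII and build the String in one step. The class and method names are hypothetical, a byte[] stands in for native memory, and ISO_8859_1 is used for the fast path on the assumption that ASCII bytes map one-to-one to Latin-1 characters under compact strings.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; names are not taken from the JDK or the benchmark.
class Utf8ScanSketch {
    // Index of the first byte <= 0 in [offset, limit), or limit if there is none.
    static int firstNonPositive(byte[] buf, int offset, int limit) {
        for (int i = offset; i < limit; i++) {
            if (buf[i] <= 0) {
                return i;
            }
        }
        return limit;
    }

    // Reads a null-terminated UTF-8 string starting at offset.
    static String readCString(byte[] buf, int offset) {
        int stop = firstNonPositive(buf, offset, buf.length);
        if (stop < buf.length && buf[stop] == 0) {
            // Fast path: the terminator came first, so every byte is ASCII.
            return new String(buf, offset, stop - offset, StandardCharsets.ISO_8859_1);
        }
        // Slow path: a negative byte was seen, keep looking for the terminator.
        int end = stop;
        while (end < buf.length && buf[end] != 0) {
            end++;
        }
        if (end == buf.length) {
            throw new IllegalArgumentException("no null terminator within bounds");
        }
        return new String(buf, offset, end - offset, StandardCharsets.UTF_8);
    }
}
```

The single scan answers both questions (where the string ends, and whether it needs UTF-8 decoding), which is what distinguishes it from a plain strlen.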
> I ran the JMH benchmark and the results were satisfactory:
>
>    - There is only a slight performance regression (< 10%) for non-ASCII
>    strings smaller than 16 bytes;
>    - Although SIMD is not used for ASCII strings smaller than 16 bytes,
>    throughput has increased by 33% due to the new fast path;
>    - For non-ASCII strings larger than 16 bytes, the throughput increased
>    by 5%~465% due to SIMD;
>    - For ASCII strings larger than 16 bytes, the throughput increased by
>    104%~2207%.
>
> For 4KiB ASCII strings, the new implementation is 22 times faster! Even
> small ASCII strings of only 16 bytes have double the performance.
> This is a big victory, and even using strlen won't achieve such a
> significant improvement.
>
> Here is the source code:
>
>
> https://github.com/Glavo/java-ffi-benchmark/blob/main/src/main/java/benchmark/experimental/GetStringUTF8Benchmark.java
>
> It's just a simple implementation for experimental purposes. For
> simplicity, I used 128-bit SIMD instructions.
> In the future, we can consider choosing AVX-2 or AVX-512 at runtime, and
> maybe we can get more gains.
>
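The Vector API is an incubating module, so the 128-bit version is not reproduced here. As a stand-in that uses only java.base, the same "any byte less than or equal to 0" test can be sketched 8 bytes at a time with a plain long (a SWAR technique swapped in for illustration, not what the benchmark does). The class and method names are hypothetical.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

// Hypothetical illustration of a word-at-a-time scan; not the Vector API code.
class SwarScanSketch {
    private static final VarHandle LONGS =
            MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);
    private static final long LOW = 0x0101010101010101L;
    private static final long HIGH = 0x8080808080808080L;

    // Index of the first byte <= 0 in [offset, limit), or limit if there is none.
    static int firstNonPositive(byte[] buf, int offset, int limit) {
        int i = offset;
        // Skip whole 8-byte words in which every byte is in 1..0x7F.
        while (i + Long.BYTES <= limit) {
            long w = (long) LONGS.get(buf, i);
            // Non-zero iff some byte in w is zero or has its high bit set.
            if ((((w - LOW) | w) & HIGH) != 0) {
                break;
            }
            i += Long.BYTES;
        }
        // Locate the exact position (or finish the tail) byte by byte.
        while (i < limit && buf[i] > 0) {
            i++;
        }
        return i;
    }
}
```

The word test only says whether a stop byte exists somewhere in the word; the trailing scalar loop then finds its exact index, which keeps the bit trick simple.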
> Glavo
>
> On Sun, Mar 26, 2023 at 9:18 PM Sebastian Stenzel <
> sebastian.stenzel at gmail.com> wrote:
>
>>
>> > On 26.03.2023 at 14:46, Maurizio Cimadamore <
>> maurizio.cimadamore at oracle.com> wrote:
>> >
>> > Forgot: another problem is that just offloading to external "strlen"
>> will not respect the memory segment boundaries (e.g. the underlying strlen
>> will keep going even past the spatial boundaries of the memory segment).
>>
>> How about using strnlen? At least for native segments?
>>
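To illustrate the boundary concern and the strnlen suggestion, here is a small sketch of a bounded terminator search. A direct ByteBuffer stands in for a native memory segment, and the class and method names are hypothetical. Unlike C's strnlen, which returns the maximum length when no terminator is found, this sketch returns -1 so the caller can tell the two cases apart.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration; a ByteBuffer stands in for a bounded native segment.
class BoundedStrlenSketch {
    // Number of bytes before the first 0 at or after offset,
    // or -1 if no terminator exists before the buffer's limit.
    static int boundedLength(ByteBuffer buf, int offset) {
        int limit = buf.limit();
        for (int i = offset; i < limit; i++) {
            // Absolute get: never reads past the limit, never moves the position.
            if (buf.get(i) == 0) {
                return i - offset;
            }
        }
        return -1;
    }
}
```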
>> Improving string conversion efficiency would make a huge difference in my
>> FUSE bindings, where virtually every call contains a file path.
>
>