[foreign-memaccess+abi] RFR: 8315041: Optimize Java to C string conversion by avoiding double copy

Glavo duke at openjdk.org
Mon Aug 28 18:54:28 UTC 2023


On Mon, 28 Aug 2023 18:00:03 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote:

> > I thought about that. That conversion consists of three steps:
> > 
> > 1. find the length of the C string
> > 2. bulk copy the string bytes into a new byte[]
> > 3. create a new string from the bytes (which copies the bytes again)
> 
> Actual C String to Java String conversions are useful for Java bindings of C libraries using FFM, but the other related use case, orders of magnitude bigger in terms of worldwide Java CPU consumption, is converting native byte-array strings from IO buffers into Java Strings. That happens in every deserialization, for both binary and text encodings. In this case, the length of the native byte array is already known from the encoding / protocol parsing, so strlen isn't needed. This conversion therefore skips step (1) and becomes a two-step conversion.
> 
> In this case, one option could be to look for negative bytes during the copy. As the vectorized copy holds each of the bytes in vector registers during the copy, efficiently testing for negatives at the same time would be doable. But of course the intrinsic code for array copy is already very complex, and duplicating it just for String construction isn't very appealing. I wonder, though, whether implementing it with the Vector API would yield a decent performance improvement vs. the fully intrinsified double copy + negatives search, at an obviously much lower implementation cost. If so, it would benefit both C string to Java String conversions and native byte array to Java String conversions (from MemorySegments as well as from DirectByteBuffers).
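To make the quoted steps concrete, here is a minimal Java sketch under stated assumptions; it is not the FFM implementation. A direct ByteBuffer stands in for native memory so the sketch runs on any JDK 13+ without preview flags, and the names CStringSketch, strlen, fromCString and fromKnownLength are hypothetical. fromCString performs all three steps; fromKnownLength is the two-step variant for when the length is already known from protocol parsing.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class CStringSketch {
    // Step 1 (hypothetical helper): scan for the NUL terminator starting at 'offset'.
    static int strlen(ByteBuffer mem, int offset) {
        int i = offset;
        while (mem.get(i) != 0) {
            i++;
        }
        return i - offset;
    }

    // Steps 2 and 3: used directly when the length is already known.
    static String fromKnownLength(ByteBuffer mem, int offset, int length) {
        byte[] bytes = new byte[length];
        mem.get(offset, bytes, 0, length);               // first copy: native -> byte[] (absolute bulk get, JDK 13+)
        return new String(bytes, StandardCharsets.UTF_8); // second copy: byte[] -> String internals
    }

    // All three steps for a NUL-terminated C string.
    static String fromCString(ByteBuffer mem, int offset) {
        return fromKnownLength(mem, offset, strlen(mem, offset));
    }

    public static void main(String[] args) {
        ByteBuffer mem = ByteBuffer.allocateDirect(16);
        mem.put("hello".getBytes(StandardCharsets.UTF_8)).put((byte) 0);
        System.out.println(fromCString(mem, 0));         // prints "hello"
        System.out.println(fromKnownLength(mem, 0, 4));  // prints "hell"
    }
}
```

The double copy being discussed is visible in fromKnownLength: one copy out of "native" memory into the byte[], and another inside the String constructor.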

I have tried a few things with the Vector API:

https://github.com/Glavo/java-ffi-benchmark/blob/f0feab7d510e0784da0b9ca5346d0721fc7ba42e/src/main/java/benchmark/experimental/GetStringUTF8Benchmark.java#L42C13-L42C13

This implementation is crude and lacks many checks (such as alignment), but it seems to perform well.
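The linked benchmark code is not reproduced here. As a rough scalar illustration of the idea under discussion (checking for negative bytes during the copy instead of in a separate pass), a sketch might look like the following. FusedCopySketch and decode are hypothetical names, a direct ByteBuffer again stands in for native memory, and the public String constructors used here still make their own internal copy, so this only shows the fused check, not the removal of the second copy.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class FusedCopySketch {
    // Copies 'length' bytes into a new array and, in the same loop, ORs every
    // byte (sign-extended to int) into an accumulator. A negative accumulator
    // means at least one byte had its high bit set, i.e. was not ASCII.
    static String decode(ByteBuffer mem, int offset, int length) {
        byte[] bytes = new byte[length];
        int acc = 0;
        for (int i = 0; i < length; i++) {
            byte b = mem.get(offset + i);
            bytes[i] = b;
            acc |= b;
        }
        if (acc >= 0) {
            // Fast path: pure ASCII, so Latin-1 decoding gives the same result with no validation.
            return new String(bytes, StandardCharsets.ISO_8859_1);
        }
        // Slow path: non-ASCII bytes present, fall back to the full UTF-8 decoder.
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ByteBuffer mem = ByteBuffer.allocateDirect(8);
        mem.put("abc".getBytes(StandardCharsets.UTF_8));    // ASCII at offsets 0..2
        mem.put("\u00e9".getBytes(StandardCharsets.UTF_8)); // 0xC3 0xA9 at offsets 3..4
        System.out.println(decode(mem, 0, 3)); // prints "abc" via the fast path
        System.out.println(decode(mem, 0, 5)); // takes the UTF-8 slow path
    }
}
```

A Vector API variant would presumably perform the same OR-accumulation across whole vector lanes rather than one byte at a time.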

Maybe we can explore this further once the Vector API is stable.

-------------

PR Comment: https://git.openjdk.org/panama-foreign/pull/875#issuecomment-1696206128


More information about the panama-dev mailing list