[foreign-memaccess+abi] RFR: 8315041: Optimize Java to C string conversion by avoiding double copy

Maurizio Cimadamore mcimadamore at openjdk.org
Mon Aug 28 16:53:32 UTC 2023


On Fri, 25 Aug 2023 16:17:21 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote:

>> When converting a Java string to a C string, we need to call String::getBytes first, with the desired charset.
>> This will end up creating a temporary byte array where the encoded string chars are saved.
>> Now, the string implementation is already quite efficient, and in most cases, this will boil down to a simple call to the array's `clone` method.
>> That said, we could still avoid allocation of an intermediate buffer, if we know that the desired charset is compatible with the string's internal byte representation.
>> For instance, if the string we want to convert has its coder set to `LATIN1` then:
>> * we can just use the raw bits if the desired coder is also `LATIN1`.
>> * if the desired coder is either `ASCII` or `UTF8`, we can perform a quick check to see if all the bytes in the string are zero or positive. If so, we can, again, just use the raw string bits.
>> 
>> Note that the method to determine whether the string bytes are positive (`StringCoding::countPositives`) is already a JVM intrinsic, and it is quite efficient. This means that calling this predicate will generally be faster than copying the entire string bytes into a new buffer.
>> 
>> This patch adds some logic to detect whether we can use the raw string bytes, and then a method which copies the string bytes into an existing segment. These two functionalities are added to `JavaLangAccess`.
>> 
>> It would have been possible to simplify the code by adding a single internal method to expose the raw string bytes, but we decided against it, given the potential for misuse (even inside the JDK itself).
>
> Numbers are as follows:
> 
> Before:
> 
> 
> Benchmark                         (size)  Mode  Cnt   Score   Error  Units
> ToCStringTest.panama_writeString       5  avgt   30  51.716 ± 1.489  ns/op
> ToCStringTest.panama_writeString      20  avgt   30  51.704 ± 0.881  ns/op
> ToCStringTest.panama_writeString     100  avgt   30  57.848 ± 2.402  ns/op
> ToCStringTest.panama_writeString     200  avgt   30  60.734 ± 3.613  ns/op
> 
> 
> After:
> 
> 
> Benchmark                         (size)  Mode  Cnt   Score   Error  Units
> ToCStringTest.panama_writeString       5  avgt   30  45.227 ± 0.329  ns/op
> ToCStringTest.panama_writeString      20  avgt   30  47.015 ± 0.491  ns/op
> ToCStringTest.panama_writeString     100  avgt   30  47.571 ± 1.447  ns/op
> ToCStringTest.panama_writeString     200  avgt   30  48.630 ± 0.954  ns/op
> 
> 
> Of course, the bigger the string size, the bigger the speedup. But, even for small strings, we get a nice boost.

> @mcimadamore Great work! Do you plan to similarly remove the double copy necessary for C to Java String conversion?

I thought about that. That conversion consists of three steps:

1. find the length of the C string
2. bulk copy the string bytes into a new byte[]
3. create a new string from the bytes (which copies the bytes again)
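
To make the three steps concrete, here is a minimal sketch in plain Java. It is not the JDK code: a `byte[]` stands in for the native memory, and the class and method names (`CStringToJava`, `strlen`, `toJavaString`) are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, for illustration only. A byte[] stands in for native memory.
final class CStringToJava {

    // Step (1): find the length of the C string, i.e. the offset of the first NUL byte.
    static int strlen(byte[] nativeBytes) {
        for (int i = 0; i < nativeBytes.length; i++) {
            if (nativeBytes[i] == 0) {
                return i;
            }
        }
        throw new IllegalArgumentException("no NUL terminator found");
    }

    static String toJavaString(byte[] nativeBytes) {
        int len = strlen(nativeBytes);                     // (1) find the length
        byte[] copy = Arrays.copyOf(nativeBytes, len);     // (2) bulk copy into a new byte[]
        return new String(copy, StandardCharsets.UTF_8);   // (3) decode into a new string
    }
}
```

For example, `CStringToJava.toJavaString(new byte[] {'h', 'i', 0, 'x'})` stops at the NUL byte and ignores whatever follows it.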

In principle, it should be possible to avoid the allocation in (3) by passing the `byte[]` computed in (2) directly to a private string constructor (of course, depending on the charset being used). But, to do this efficiently, we need to be able to quickly determine whether all the bytes in the array are positive (for `UTF8` and `ASCII`). In the Java to C conversion we do that by using `StringCoding::hasNegatives`, which is backed by a vectorized JVM intrinsic. Since this method only works for `byte[]` input, not for `MemorySegment`, we have a problem there: if scanning the string bytes takes too long, we're in a pickle, because the time we save on the allocation will go into the new scanning.

Ideally, you'd like to check for negatives _while_ doing (1), but the logic for (1) is already quite complex. It is something we might look at in the future.
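
As a rough illustration of what "checking for negatives while doing (1)" could look like, here is a scalar sketch in plain Java. It is only a sketch under assumptions: a `byte[]` stands in for the native memory, the names `FusedScan`, `Result` and `scan` are hypothetical, and it has none of the vectorization or word-at-a-time tricks a real implementation would need to be competitive.

```java
// Hypothetical helper, for illustration only. A byte[] stands in for native memory.
final class FusedScan {

    // Outcome of a single pass over the C string.
    static final class Result {
        final int length;          // number of bytes before the NUL terminator
        final boolean asciiOnly;   // true if no byte before the terminator was negative

        Result(int length, boolean asciiOnly) {
            this.length = length;
            this.asciiOnly = asciiOnly;
        }
    }

    static Result scan(byte[] nativeBytes) {
        int seen = 0;
        for (int i = 0; i < nativeBytes.length; i++) {
            byte b = nativeBytes[i];
            if (b == 0) {
                // A negative byte sign-extends to a negative int, so the
                // accumulated OR is negative iff at least one byte was negative.
                return new Result(i, seen >= 0);
            }
            seen |= b;
        }
        throw new IllegalArgumentException("no NUL terminator found");
    }
}
```

If `asciiOnly` comes back true, the copied bytes are valid for both `ASCII` and `UTF8` without any further scanning, which is the condition needed before the bytes could be handed over without decoding.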

-------------

PR Comment: https://git.openjdk.org/panama-foreign/pull/875#issuecomment-1696022518

