[foreign-memaccess+abi] RFR: 8315041: Optimize Java to C string conversion by avoiding double copy
Maurizio Cimadamore
mcimadamore at openjdk.org
Fri Aug 25 16:22:45 UTC 2023
On Fri, 25 Aug 2023 16:15:55 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote:
> When converting a Java string to a C string, we need to call String::getBytes first, with the desired charset.
> This will end up creating a temporary byte array where the encoded string chars are saved.
> Now, the string implementation is already quite efficient, and in most cases, this will boil down to a simple call to the array's `clone` method.
> That said, we could still avoid allocating an intermediate buffer if we know that the desired charset is compatible with the string's internal byte representation.
> For instance, if the string we want to convert has its coder set to `LATIN1` then:
> * we can just use the raw bits if the desired coder is also `LATIN1`.
> * if the desired coder is either `ASCII` or `UTF8`, we can perform a quick check to see if all the bytes in the string are zero or positive. If so, we can again just use the raw string bits.
>
> Note that the method to determine whether the string bytes are positive (`StringCoding::countPositives`) is already a JVM intrinsic, and it is quite efficient. This means that calling this predicate will generally be faster than copying the entire string bytes into a new buffer.
>
> This patch adds some logic to detect whether we can use the raw string bytes, and then a method which copies the string bytes into an existing segment. These two functionalities are added to `JavaLangAccess`.
>
> It would have been possible to simplify the code by adding a single internal method to expose the raw string bytes, but we decided against it, given the potential for misuse (even inside the JDK itself).
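
To make the compatibility check above concrete, here is a minimal sketch using only public APIs. It is not the code in the patch: the class and method names (`RawBytesCheck`, `canUseRawLatin1Bytes`) are hypothetical, the input array stands in for the string's internal `LATIN1` bytes (which are not accessible outside `java.lang`), and the plain loop stands in for the intrinsified positive-bytes check.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RawBytesCheck {
    // Hypothetical helper, for illustration only.
    // 'latin1' is assumed to hold one byte per char, i.e. the LATIN1 encoding of a string.
    static boolean canUseRawLatin1Bytes(byte[] latin1, Charset target) {
        if (target.equals(StandardCharsets.ISO_8859_1)) {
            // Same encoding: the bytes can be copied as they are.
            return true;
        }
        if (target.equals(StandardCharsets.US_ASCII) || target.equals(StandardCharsets.UTF_8)) {
            // A negative byte is a char >= 0x80, which ASCII cannot represent
            // and which UTF-8 encodes with two bytes. All bytes must be >= 0.
            for (byte b : latin1) {
                if (b < 0) {
                    return false;
                }
            }
            return true;
        }
        // Any other charset: fall back to a regular String::getBytes conversion.
        return false;
    }

    public static void main(String[] args) {
        byte[] plain = "hello".getBytes(StandardCharsets.ISO_8859_1);
        byte[] accented = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(canUseRawLatin1Bytes(plain, StandardCharsets.UTF_8));
        System.out.println(canUseRawLatin1Bytes(accented, StandardCharsets.UTF_8));
        System.out.println(canUseRawLatin1Bytes(accented, StandardCharsets.ISO_8859_1));
    }
}
```

When the check succeeds, the raw bytes are identical to what `String::getBytes` would return for the target charset, so they can be copied straight into the destination segment without the temporary array.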
Numbers are as follows:

Before:

Benchmark                         (size)  Mode  Cnt   Score   Error  Units
ToCStringTest.panama_writeString       5  avgt   30  51.716 ± 1.489  ns/op
ToCStringTest.panama_writeString      20  avgt   30  51.704 ± 0.881  ns/op
ToCStringTest.panama_writeString     100  avgt   30  57.848 ± 2.402  ns/op
ToCStringTest.panama_writeString     200  avgt   30  60.734 ± 3.613  ns/op

After:

Benchmark                         (size)  Mode  Cnt   Score   Error  Units
ToCStringTest.panama_writeString       5  avgt   30  45.227 ± 0.329  ns/op
ToCStringTest.panama_writeString      20  avgt   30  47.015 ± 0.491  ns/op
ToCStringTest.panama_writeString     100  avgt   30  47.571 ± 1.447  ns/op
ToCStringTest.panama_writeString     200  avgt   30  48.630 ± 0.954  ns/op

Of course, the bigger the string size, the bigger the speedup. But, even for small strings, we get a nice boost.
-------------
PR Comment: https://git.openjdk.org/panama-foreign/pull/875#issuecomment-1693610244