[foreign-memaccess+abi] Integrated: 8315041: Optimize Java to C string conversion by avoiding double copy

Maurizio Cimadamore mcimadamore at openjdk.org
Mon Aug 28 17:04:38 UTC 2023


On Fri, 25 Aug 2023 16:15:55 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote:

> When converting a Java string to a C string, we need to call String::getBytes first, with the desired charset.
> This will end up creating a temporary byte array where the encoded string chars are saved.
> Now, the string implementation is already quite efficient, and in most cases, this will boil down to a simple call to the array's `clone` method.
> That said, we could still avoid allocating an intermediate buffer if we know that the desired charset is compatible with the string's internal byte representation.
> For instance, if the string we want to convert has its coder set to `LATIN1` then:
> * we can just use the raw bytes if the desired charset is also `LATIN1`.
> * if the desired charset is either `ASCII` or `UTF8`, we can perform a quick check to see if all the bytes in the string are zero or positive. If so, we can again just use the raw string bytes.
> 
> Note that the method to determine whether the string bytes are positive (`StringCoding::countPositives`) is already a JVM intrinsic, and it is quite efficient. This means that calling this predicate will generally be faster than copying all of the string's bytes into a new buffer.
> 
> This patch adds some logic to detect whether we can use the raw string bytes, and then a method which copies the string bytes into an existing segment. These two functionalities are added to `JavaLangAccess`.
> 
> It would have been possible to simplify the code by adding a single internal method to expose the raw string bytes, but we decided against it, given the potential for misuse (even inside the JDK itself).
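
For readers who want to see the compatibility check in concrete terms, here is a minimal sketch of the idea described above. It is not the code from the changeset: the class `RawBytesCheck` and both helper methods are hypothetical names, and the sketch works on a plain `byte[]` standing in for a string's `LATIN1`-coded internal bytes, since application code cannot reach those bytes directly. The plain loop stands in for the intrinsified positive-byte count.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RawBytesCheck {

    // Hypothetical stand-in for the intrinsified positive-byte check:
    // returns true if every byte is in the range 0..127.
    static boolean allNonNegative(byte[] bytes) {
        for (byte b : bytes) {
            if (b < 0) {
                return false;
            }
        }
        return true;
    }

    // Hypothetical helper: decides whether bytes holding a LATIN1-coded
    // string can be reused unchanged for the target charset.
    static boolean canUseRawLatin1Bytes(byte[] latin1, Charset target) {
        if (StandardCharsets.ISO_8859_1.equals(target)) {
            return true; // same encoding, always byte-for-byte identical
        }
        if (StandardCharsets.US_ASCII.equals(target)
                || StandardCharsets.UTF_8.equals(target)) {
            // ASCII, UTF-8 and Latin-1 agree on code points 0..127
            return allNonNegative(latin1);
        }
        return false; // any other charset: fall back to String::getBytes
    }

    public static void main(String[] args) {
        String s = "hello";
        byte[] latin1 = s.getBytes(StandardCharsets.ISO_8859_1);
        if (canUseRawLatin1Bytes(latin1, StandardCharsets.UTF_8)) {
            // The raw bytes match what getBytes(UTF_8) would produce.
            System.out.println(Arrays.equals(latin1, s.getBytes(StandardCharsets.UTF_8)));
        }
    }
}
```

A string containing a character above U+007F, such as `"\u00e9"`, fails the check for `UTF8` because its single Latin-1 byte is negative and its UTF-8 form takes two bytes, so the conversion has to go through `String::getBytes` in that case.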

This pull request has now been integrated.

Changeset: 7c46965a
Author:    Maurizio Cimadamore <mcimadamore at openjdk.org>
URL:       https://git.openjdk.org/panama-foreign/commit/7c46965ac319e19c10fd3a50c1ecc4daef9641c5
Stats:     87 lines in 5 files changed: 73 ins; 4 del; 10 mod

8315041: Optimize Java to C string conversion by avoiding double copy

Reviewed-by: pminborg

-------------

PR: https://git.openjdk.org/panama-foreign/pull/875

