[foreign-memaccess+abi] RFR: 8315041: Optimize Java to C string conversion by avoiding double copy [v2]
Per Minborg
pminborg at openjdk.org
Mon Aug 28 07:00:33 UTC 2023
On Fri, 25 Aug 2023 16:28:56 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote:
>> When converting a Java string to a C string, we need to call String::getBytes first, with the desired charset.
>> This will end up creating a temporary byte array where the encoded string chars are stored.
>> Now, the string implementation is already quite efficient, and in most cases, this will boil down to a simple call to the array's `clone` method.
>> That said, we could still avoid allocating an intermediate buffer if we know that the desired charset is compatible with the string's internal byte representation.
>> For instance, if the string we want to convert has its coder set to `LATIN1` then:
>> * we can just use the raw bits if the desired coder is also `LATIN1`.
>> * if the desired coder is either `ASCII` or `UTF8`, we can perform a quick check to see if all the bytes in the string are zero or positive. If so, we can, again, just use the raw string bits.
>>
>> Note that the method to determine whether the string bytes are positive (`StringCoding::countPositives`) is already a JVM intrinsic, and it is quite efficient. This means that calling this predicate will generally be faster than copying all of the string's bytes into a new buffer.
>>
>> This patch adds some logic to detect whether we can use the raw string bytes, and a method which copies the string bytes into an existing segment. Both pieces of functionality are added to `JavaLangAccess`.
>>
>> It would have been possible to simplify the code by adding a single internal method to expose the raw string bytes, but we decided against it, given the potential for misuse (even inside the JDK itself).
>
> Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision:
>
> Drop internal property to switch between different implementations
Nice improvement. In the future, either the JVM could be improved to detect and eliminate the intermediate copy, or we could add ways to write directly into a memory segment.
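
For reference, the compatibility check described in the quoted text can be sketched with public APIs only. This is an illustration under stated assumptions, not the code in the patch: `RawBytesCheck`, `allNonNegative` and `canUseRawBytes` are hypothetical names, the plain loop stands in for the intrinsic positivity check, and the input array is assumed to hold the bytes of a `LATIN1`-coded string.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; none of these names come from the patch.
final class RawBytesCheck {

    // Stand-in for the intrinsic positivity check:
    // true when no byte has its high bit set.
    static boolean allNonNegative(byte[] bytes) {
        for (byte b : bytes) {
            if (b < 0) {
                return false;
            }
        }
        return true;
    }

    // Assumes latin1Bytes holds the bytes of a LATIN1-coded string.
    // Returns true when those bytes can be copied as-is for the target charset.
    static boolean canUseRawBytes(byte[] latin1Bytes, Charset target) {
        if (target.equals(StandardCharsets.ISO_8859_1)) {
            // Same encoding: the raw bytes are already correct.
            return true;
        }
        if (target.equals(StandardCharsets.US_ASCII) || target.equals(StandardCharsets.UTF_8)) {
            // Bytes in 0..127 encode identically in LATIN1, ASCII and UTF-8.
            return allNonNegative(latin1Bytes);
        }
        // Any other charset needs a real encoding pass.
        return false;
    }

    public static void main(String[] args) {
        byte[] ascii = "hello".getBytes(StandardCharsets.ISO_8859_1);
        byte[] accented = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);

        System.out.println(canUseRawBytes(ascii, StandardCharsets.UTF_8));         // true
        System.out.println(canUseRawBytes(accented, StandardCharsets.UTF_8));      // false
        System.out.println(canUseRawBytes(accented, StandardCharsets.ISO_8859_1)); // true
    }
}
```

When the check returns false, a caller would fall back to `String::getBytes` with the desired charset, as before.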
-------------
Marked as reviewed by pminborg (Committer).
PR Review: https://git.openjdk.org/panama-foreign/pull/875#pullrequestreview-1597595005
More information about the panama-dev mailing list