[foreign-memaccess+abi] RFR: 8315041: Optimize Java to C string conversion by avoiding double copy [v2]
ExE Boss
duke at openjdk.org
Sun Aug 27 20:04:31 UTC 2023
On Fri, 25 Aug 2023 16:28:56 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote:
>> When converting a Java string to a C string, we need to call String::getBytes first, with the desired charset.
>> This will end up creating a temporary byte array where the encoded string chars are saved.
>> Now, the string implementation is already quite efficient, and in most cases, this will boil down to a simple call to the array's `clone` method.
>> That said, we could still avoid allocating an intermediate buffer if we know that the desired charset is compatible with the string's internal byte representation.
>> For instance, if the string we want to convert has its coder set to `LATIN1` then:
>> * we can just use the raw bits if the desired coder is also `LATIN1`.
>> * if desired coder is either `ASCII` or `UTF8`, we can perform a quick check to see if all the bytes in the string are zero or positive. If so we can, again, just use the raw string bits.
>>
>> Note that the method to determine whether the string bytes are positive (`StringCoding::countPositives`) is already a JVM intrinsic, and it is quite efficient. This means that calling this predicate will generally be faster than copying all of the string's bytes into a new buffer.
>>
>> This patch adds some logic to detect whether we can use the raw string bytes, and then a method which copies the string bytes into an existing segment. These two functionalities are added to `JavaLangAccess`.
>>
>> It would have been possible to simplify the code by adding a single internal method to expose the raw string bytes, but we decided against it, given the potential for misuse (even inside the JDK itself).
>
> Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision:
>
> Drop internal property to switch between different implementations
src/java.base/share/classes/java/lang/String.java line 1849:
> 1847: } else {
> 1848: return false;
> 1849: }
The common check for `coder == LATIN1` can be extracted into its own block, and it can use the internal `String::isLatin1()` helper method, which correctly handles the case where `COMPACT_STRINGS` is `false`:
Suggestion:
    if (isLatin1()) {
        if (charset == ISO_8859_1.INSTANCE) {
            return true;
        } else if (charset == UTF_8.INSTANCE || charset == US_ASCII.INSTANCE) {
            return !StringCoding.hasNegatives(value, 0, value.length);
        }
    }
    return false;
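
To illustrate the control flow outside of `java.lang`, here is a minimal standalone sketch of the same check. It is not the patch itself: the class `RawBytesCheck`, the method `bytesCompatible`, and the plain-loop `hasNegatives` are hypothetical stand-ins for the JDK-internal pieces (the real `StringCoding.hasNegatives` is not accessible from user code), and the byte array merely models the internal `value` array of a `LATIN1`-coded string.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RawBytesCheck {

    // Hypothetical stand-in for the JDK-internal StringCoding.hasNegatives:
    // returns true if any byte has its high bit set (i.e. is negative).
    static boolean hasNegatives(byte[] bytes) {
        for (byte b : bytes) {
            if (b < 0) {
                return true;
            }
        }
        return false;
    }

    // latin1Bytes models the internal byte array of a LATIN1-coded string.
    // Returns true if those bytes are already a valid encoding in 'charset',
    // so they could be copied as-is without going through getBytes.
    static boolean bytesCompatible(byte[] latin1Bytes, Charset charset) {
        if (charset == StandardCharsets.ISO_8859_1) {
            return true;
        } else if (charset == StandardCharsets.UTF_8
                || charset == StandardCharsets.US_ASCII) {
            // Bytes in the range 0..127 mean the same thing in all three charsets.
            return !hasNegatives(latin1Bytes);
        }
        return false;
    }

    public static void main(String[] args) {
        byte[] ascii = "hello".getBytes(StandardCharsets.ISO_8859_1);
        byte[] accented = "h\u00e9llo".getBytes(StandardCharsets.ISO_8859_1);

        System.out.println(bytesCompatible(ascii, StandardCharsets.UTF_8));         // true
        System.out.println(bytesCompatible(accented, StandardCharsets.UTF_8));      // false
        System.out.println(bytesCompatible(accented, StandardCharsets.ISO_8859_1)); // true
    }
}
```

The accented example fails the UTF-8 check because `é` is the single byte `0xE9` in Latin-1, which is negative as a Java `byte` and would need two bytes in UTF-8.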
-------------
PR Review Comment: https://git.openjdk.org/panama-foreign/pull/875#discussion_r1306714280