RFR: 8369564: Provide a MemorySegment API to read strings with known lengths [v9]
Jorn Vernee
jvernee at openjdk.org
Fri Nov 21 12:59:52 UTC 2025
On Fri, 21 Nov 2025 10:47:42 GMT, Liam Miller-Cushon <cushon at openjdk.org> wrote:
>> This PR proposes adding a new overload to `MemorySegment::getString` that takes a known byte length of the content.
>>
>> This was previously proposed in https://github.com/openjdk/jdk/pull/20725, but the outcome of [JDK-8333843](https://bugs.openjdk.org/browse/JDK-8333843) was to update `MemorySegment#getString` to suggest
>>
>>
>> byte[] bytes = new byte[length];
>> MemorySegment.copy(segment, JAVA_BYTE, offset, bytes, 0, length);
>> return new String(bytes, charset);
>>
>>
>> However, this is less efficient than what the implementation of `getString` does after [JDK-8362893](https://bugs.openjdk.org/browse/JDK-8362893): it now uses `JavaLangAccess::uncheckedNewStringNoRepl` to avoid the copy.
>>
>> See also discussion in [this panama-dev@ thread](https://mail.openjdk.org/pipermail/panama-dev/2025-November/021193.html), and mcimadamore's document [Pulling the (foreign) string](https://cr.openjdk.org/~mcimadamore/panama/strings_ffm.html)
>>
>> Benchmark results:
>>
>>
>> Benchmark                                 (size)  Mode  Cnt    Score    Error  Units
>> ToJavaStringTest.jni_readString                5  avgt   30   55.339 ± 0.401  ns/op
>> ToJavaStringTest.jni_readString               20  avgt   30   59.887 ± 0.295  ns/op
>> ToJavaStringTest.jni_readString              100  avgt   30   84.288 ± 0.419  ns/op
>> ToJavaStringTest.jni_readString              200  avgt   30  119.275 ± 0.496  ns/op
>> ToJavaStringTest.jni_readString              451  avgt   30  193.106 ± 1.528  ns/op
>> ToJavaStringTest.panama_copyLength             5  avgt   30    7.348 ± 0.048  ns/op
>> ToJavaStringTest.panama_copyLength            20  avgt   30    7.440 ± 0.125  ns/op
>> ToJavaStringTest.panama_copyLength           100  avgt   30   11.766 ± 0.058  ns/op
>> ToJavaStringTest.panama_copyLength           200  avgt   30   16.096 ± 0.089  ns/op
>> ToJavaStringTest.panama_copyLength           451  avgt   30   25.844 ± 0.054  ns/op
>> ToJavaStringTest.panama_readString             5  avgt   30    5.857 ± 0.046  ns/op
>> ToJavaStringTest.panama_readString            20  avgt   30    7.750 ± 0.046  ns/op
>> ToJavaStringTest.panama_readString           100  avgt   30   14.109 ± 0.187  ns/op
>> ToJavaStringTest.panama_readString           200  avgt   30   18.035 ± 0.130  ns/op
>> ToJavaStringTest.panama_readString           451  avgt   30   35.896 ± 0.227  ns/op
>> ToJavaStringTest.panama_readStringLength       5  avgt   30    4.565 ± 0.038  ns/op
>> ToJavaStringTest.panama_readStringLength 20...
>
> Liam Miller-Cushon has updated the pull request incrementally with two additional commits since the last revision:
>
> - Improve test coverage, and more fixes
> - Review feedback
>
> * document assertion to link to bytesCompatible
> * throw IAE for length > Integer.MAX_VALUE
> * javadoc fixes
src/java.base/share/classes/java/lang/foreign/MemorySegment.java line 2660:
> 2658: * are {@code < 0}
> 2659: * @throws IndexOutOfBoundsException if the {@code numChars + srcIndex} is larger than the length of
> 2660: * this {@code String} object.
I think this leaves the question: what happens if `srcIndex + numChars` overflows and becomes negative? It will be less than the length of the string for sure, but that is not right either.
This is why the other methods' javadoc writes the condition as e.g. `srcIndex > srcArray.length - elementCount`. Assuming all positive numbers, that form avoids the overflow issue.
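To make the overflow concern concrete, here is a minimal sketch comparing the two ways of phrasing the check. The class and method names (`BoundsCheckSketch`, `naiveInBounds`, `safeInBounds`) are made up for illustration and are not part of the PR; `int` arithmetic is assumed.

```java
public final class BoundsCheckSketch {
    private BoundsCheckSketch() {}

    // Naive form: srcIndex + numChars can wrap around to a negative int,
    // so the comparison against length wrongly succeeds for huge inputs.
    public static boolean naiveInBounds(int srcIndex, int numChars, int length) {
        return srcIndex >= 0 && numChars >= 0 && srcIndex + numChars <= length;
    }

    // Rearranged form: with all three values non-negative,
    // length - numChars cannot overflow.
    public static boolean safeInBounds(int srcIndex, int numChars, int length) {
        return srcIndex >= 0 && numChars >= 0 && srcIndex <= length - numChars;
    }

    public static void main(String[] args) {
        int length = 10;
        int srcIndex = 5;
        int numChars = Integer.MAX_VALUE; // 5 + MAX_VALUE wraps to a negative int

        System.out.println("naive: " + naiveInBounds(srcIndex, numChars, length));
        System.out.println("safe:  " + safeInBounds(srcIndex, numChars, length));
    }
}
```

With these inputs the naive check accepts the range while the rearranged check rejects it. `java.util.Objects.checkFromIndexSize(fromIndex, size, length)` performs the same kind of overflow-safe check and throws `IndexOutOfBoundsException` on failure.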
test/jdk/java/foreign/TestStringEncoding.java line 108:
> 106: @Test(dataProvider = "strings")
> 107: public void testStringsLength(String testString) {
> 108: Set<String> excluded = Set.of("yen");
I know the yen character is translated to `/` in some encodings when doing a round trip. Maybe we should just avoid this issue by switching it out for e.g. `"section \u00A7"`, which is `§` and doesn't have the same problem.
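One way to check which test strings are affected is to run an encode/decode round trip over the charsets available on the running JDK. This is only a sketch: `RoundTripSketch` and `roundTrips` are hypothetical names, it does not reproduce the data provider in `TestStringEncoding`, and the set of charsets reported depends on the JDK in use.

```java
import java.nio.charset.Charset;

public final class RoundTripSketch {
    private RoundTripSketch() {}

    // True if encoding s with cs and decoding the bytes again yields s unchanged.
    public static boolean roundTrips(String s, Charset cs) {
        return s.equals(new String(s.getBytes(cs), cs));
    }

    public static void main(String[] args) {
        String yen = "yen \u00A5";
        String section = "section \u00A7";

        for (Charset cs : Charset.availableCharsets().values()) {
            if (!cs.canEncode()) {
                continue; // decode-only charsets cannot take part in a round trip
            }
            boolean yenOk = roundTrips(yen, cs);
            boolean sectionOk = roundTrips(section, cs);
            if (yenOk != sectionOk) {
                // Charsets where the two candidate test strings behave differently
                System.out.println(cs.name() + ": yen=" + yenOk + " section=" + sectionOk);
            }
        }
    }
}
```

Charsets that cannot represent a character at all (for example US-ASCII for `§`) also fail the round trip, because the encoder substitutes a replacement byte, so a failure here is not specific to the yen mapping.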
test/jdk/java/foreign/TestStringEncoding.java line 180:
> 178: assertThrows(IndexOutOfBoundsException.class, () ->
> 179: MemorySegment.copy(testString, StandardCharsets.UTF_8, 0, text, 0, testString.length() + 1));
> 180: // dstOffset > byteSize() + B
Suggestion:
// dstOffset > byteSize() - B
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/28043#discussion_r2549653853
PR Review Comment: https://git.openjdk.org/jdk/pull/28043#discussion_r2549626703
PR Review Comment: https://git.openjdk.org/jdk/pull/28043#discussion_r2549635887