RFR: 8369564: Provide a MemorySegment API to read strings with known lengths [v11]
Maurizio Cimadamore
mcimadamore at openjdk.org
Fri Nov 21 14:34:53 UTC 2025
On Fri, 21 Nov 2025 14:26:00 GMT, Liam Miller-Cushon <cushon at openjdk.org> wrote:
>> This PR proposes adding a new overload to `MemorySegment::getString` that takes a known byte length of the content.
>>
>> This was previously proposed in https://github.com/openjdk/jdk/pull/20725, but the outcome of [JDK-8333843](https://bugs.openjdk.org/browse/JDK-8333843) was to update `MemorySegment#getString` to suggest
>>
>>
>> byte[] bytes = new byte[length];
>> MemorySegment.copy(segment, JAVA_BYTE, offset, bytes, 0, length);
>> return new String(bytes, charset);
>>
>>
>> However this is less efficient than what the implementation of getString does after [JDK-8362893](https://bugs.openjdk.org/browse/JDK-8362893), it now uses `JavaLangAccess::uncheckedNewStringNoRepl` to avoid the copy.
>>
>> See also discussion in [this panama-dev@ thread](https://mail.openjdk.org/pipermail/panama-dev/2025-November/021193.html), and mcimadamore's document [Pulling the (foreign) string](https://cr.openjdk.org/~mcimadamore/panama/strings_ffm.html)
>>
>> Benchmark results:
>>
>>
>> Benchmark (size) Mode Cnt Score Error Units
>> ToJavaStringTest.jni_readString 5 avgt 30 55.339 ± 0.401 ns/op
>> ToJavaStringTest.jni_readString 20 avgt 30 59.887 ± 0.295 ns/op
>> ToJavaStringTest.jni_readString 100 avgt 30 84.288 ± 0.419 ns/op
>> ToJavaStringTest.jni_readString 200 avgt 30 119.275 ± 0.496 ns/op
>> ToJavaStringTest.jni_readString 451 avgt 30 193.106 ± 1.528 ns/op
>> ToJavaStringTest.panama_copyLength 5 avgt 30 7.348 ± 0.048 ns/op
>> ToJavaStringTest.panama_copyLength 20 avgt 30 7.440 ± 0.125 ns/op
>> ToJavaStringTest.panama_copyLength 100 avgt 30 11.766 ± 0.058 ns/op
>> ToJavaStringTest.panama_copyLength 200 avgt 30 16.096 ± 0.089 ns/op
>> ToJavaStringTest.panama_copyLength 451 avgt 30 25.844 ± 0.054 ns/op
>> ToJavaStringTest.panama_readString 5 avgt 30 5.857 ± 0.046 ns/op
>> ToJavaStringTest.panama_readString 20 avgt 30 7.750 ± 0.046 ns/op
>> ToJavaStringTest.panama_readString 100 avgt 30 14.109 ± 0.187 ns/op
>> ToJavaStringTest.panama_readString 200 avgt 30 18.035 ± 0.130 ns/op
>> ToJavaStringTest.panama_readString 451 avgt 30 35.896 ± 0.227 ns/op
>> ToJavaStringTest.panama_readStringLength 5 avgt 30 4.565 ± 0.038 ns/op
>> ToJavaStringTest.panama_readStringLength 20...
>
> Liam Miller-Cushon has updated the pull request incrementally with one additional commit since the last revision:
>
> Use Utils.checkNonNegativeArgument
src/java.base/share/classes/java/lang/foreign/MemorySegment.java line 1333:
> 1331: * sequences with this charset's default replacement string. The {@link
> 1332: * java.nio.charset.CharsetDecoder} class should be used when more control
> 1333: * over the decoding process is required.
Should we say here, as you did for `copy` that this method ignores `\0` ?
src/java.base/share/classes/java/lang/foreign/MemorySegment.java line 2646:
> 2644: * will appear truncated when read again.
> 2645: *
> 2646: * @param src the Java string to be written into this segment
Suggestion:
* @param src the Java string to be written into the destination segment
src/java.base/share/classes/java/lang/foreign/MemorySegment.java line 2649:
> 2647: * @param dstEncoding the charset used to {@linkplain Charset#newEncoder() encode}
> 2648: * the string bytes.
> 2649: * @param srcIndex the starting index of the source string
Do we feel like saying words like "index" is good enough to say what we mean? E.g. by index we mean something that is compatible with `length` and `charAt`. I wonder if using a qualifier e.g. `character index` might help? (open to suggestions here, I'm not 100% sure)
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/28043#discussion_r2549950855
PR Review Comment: https://git.openjdk.org/jdk/pull/28043#discussion_r2549953662
PR Review Comment: https://git.openjdk.org/jdk/pull/28043#discussion_r2549945644
More information about the core-libs-dev
mailing list