RFR: 8369564: Provide a MemorySegment API to read strings with known lengths [v8]

Thu Nov 20 16:07:24 UTC 2025

On Thu, 20 Nov 2025 08:59:11 GMT, Liam Miller-Cushon <cushon at openjdk.org> wrote:

>> This PR proposes adding a new overload to `MemorySegment::getString` that takes a known byte length of the content.
>> 
>> This was previously proposed in https://github.com/openjdk/jdk/pull/20725, but the outcome of [JDK-8333843](https://bugs.openjdk.org/browse/JDK-8333843) was to update `MemorySegment#getString` to suggest
>> 
>> 
>>     byte[] bytes = new byte[length];
>>     MemorySegment.copy(segment, JAVA_BYTE, offset, bytes, 0, length);
>>     return new String(bytes, charset);
>> 
>> 
>> However this is less efficient than what the implementation of getString does after [JDK-8362893](https://bugs.openjdk.org/browse/JDK-8362893), it now uses `JavaLangAccess::uncheckedNewStringNoRepl` to avoid the copy.
>> 
>> See also discussion in [this panama-dev@ thread](https://mail.openjdk.org/pipermail/panama-dev/2025-November/021193.html), and mcimadamore's document [Pulling the (foreign) string](https://cr.openjdk.org/~mcimadamore/panama/strings_ffm.html)
>> 
>> Benchmark results:
>> 
>> 
>> Benchmark                                 (size)  Mode  Cnt    Score   Error  Units
>> ToJavaStringTest.jni_readString                5  avgt   30   55.339 ± 0.401  ns/op
>> ToJavaStringTest.jni_readString               20  avgt   30   59.887 ± 0.295  ns/op
>> ToJavaStringTest.jni_readString              100  avgt   30   84.288 ± 0.419  ns/op
>> ToJavaStringTest.jni_readString              200  avgt   30  119.275 ± 0.496  ns/op
>> ToJavaStringTest.jni_readString              451  avgt   30  193.106 ± 1.528  ns/op
>> ToJavaStringTest.panama_copyLength             5  avgt   30    7.348 ± 0.048  ns/op
>> ToJavaStringTest.panama_copyLength            20  avgt   30    7.440 ± 0.125  ns/op
>> ToJavaStringTest.panama_copyLength           100  avgt   30   11.766 ± 0.058  ns/op
>> ToJavaStringTest.panama_copyLength           200  avgt   30   16.096 ± 0.089  ns/op
>> ToJavaStringTest.panama_copyLength           451  avgt   30   25.844 ± 0.054  ns/op
>> ToJavaStringTest.panama_readString             5  avgt   30    5.857 ± 0.046  ns/op
>> ToJavaStringTest.panama_readString            20  avgt   30    7.750 ± 0.046  ns/op
>> ToJavaStringTest.panama_readString           100  avgt   30   14.109 ± 0.187  ns/op
>> ToJavaStringTest.panama_readString           200  avgt   30   18.035 ± 0.130  ns/op
>> ToJavaStringTest.panama_readString           451  avgt   30   35.896 ± 0.227  ns/op
>> ToJavaStringTest.panama_readStringLength       5  avgt   30    4.565 ± 0.038  ns/op
>> ToJavaStringTest.panama_readStringLength      20...
>
> Liam Miller-Cushon has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Review feedback

Thanks for working on this! I've left some inline comments.

src/java.base/share/classes/java/lang/foreign/MemorySegment.java line 1339:

> 1337:      * @param charset the charset used to {@linkplain Charset#newDecoder() decode} the
> 1338:      *                string bytes
> 1339:      * @param length  length to be used for string conversion, in bytes

Please keep the same format
Suggestion:

     * @param length  length in bytes of the string to read

src/java.base/share/classes/java/lang/foreign/MemorySegment.java line 1341:

> 1339:      * @param length  length to be used for string conversion, in bytes
> 1340:      * @return a Java string constructed from the bytes read from the given starting
> 1341:      *         address reading the given length of characters

Suggestion:

     *         address reading the given length of bytes

src/java.base/share/classes/java/lang/foreign/MemorySegment.java line 2662:

> 2660:      *         are {@code < 0}
> 2661:      * @throws IndexOutOfBoundsException if the {@code endIndex} is larger than the length of
> 2662:      *         this {@code String} object, or {@code beginIndex} is larger than {@code endIndex}.

There's no `beginIndex` and `endIndex`?

src/java.base/share/classes/java/lang/foreign/SegmentAllocator.java line 157:

> 155: 
> 156:     /**
> 157:      * Converts a Java string into a null-terminated C string using the provided charset,

Not null-terminated, right?

src/java.base/share/classes/java/lang/foreign/SegmentAllocator.java line 177:

> 175:      * @return a new native segment containing the converted C string
> 176:      * @throws IllegalArgumentException if {@code charset} is not a
> 177:      *         {@linkplain StandardCharsets standard charset}

Same here, I don't think we have this limitation.

src/java.base/share/classes/java/lang/foreign/SegmentAllocator.java line 180:

> 178:      * @throws IndexOutOfBoundsException if either {@code srcIndex} or {@code numChars} are {@code < 0}
> 179:      * @throws IndexOutOfBoundsException  if the {@code endIndex} is larger than the length of
> 180:      *         this {@code String} object, or {@code beginIndex} is larger than {@code endIndex}.

There is no 'endIndext'?

src/java.base/share/classes/jdk/internal/foreign/StringSupport.java line 68:

> 66:     @ForceInline
> 67:     public static String readBytes(AbstractMemorySegmentImpl segment, long offset, Charset charset, long length) {
> 68:         final int lengthBytes = (int) length;

I think we should do something more here than just ignore the upper bits. We probably need to throw an exception when the value is > Integer.MAX_VALUE

test/jdk/java/foreign/TestStringEncoding.java line 135:

> 133:         }
> 134:     }
> 135: 

We need some more tests for the other new methods as well. Also, it would be nice to test non-standard charsets.

-------------

PR Review: https://git.openjdk.org/jdk/pull/28043#pullrequestreview-3488572591
PR Review Comment: https://git.openjdk.org/jdk/pull/28043#discussion_r2546633357
PR Review Comment: https://git.openjdk.org/jdk/pull/28043#discussion_r2546635587
PR Review Comment: https://git.openjdk.org/jdk/pull/28043#discussion_r2546650214
PR Review Comment: https://git.openjdk.org/jdk/pull/28043#discussion_r2546610473
PR Review Comment: https://git.openjdk.org/jdk/pull/28043#discussion_r2546614368
PR Review Comment: https://git.openjdk.org/jdk/pull/28043#discussion_r2546616502
PR Review Comment: https://git.openjdk.org/jdk/pull/28043#discussion_r2546607537
PR Review Comment: https://git.openjdk.org/jdk/pull/28043#discussion_r2546628459