RFR: 8369564: Provide a MemorySegment API to read strings with known lengths [v7]
Liam Miller-Cushon
cushon at openjdk.org
Thu Nov 20 08:59:19 UTC 2025
On Wed, 19 Nov 2025 14:52:04 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote:
>> Liam Miller-Cushon has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Add a dstOffset parameter, stop using StringCharBuffer/CharsetEncoder::encode
>
> src/java.base/share/classes/java/lang/foreign/MemorySegment.java line 2649:
>
>> 2647: *
>> 2648: * @param src the Java string to be written into this segment
>> 2649: * @param dstEncoding the charset used to {@linkplain Charset#newEncoder() encode}
>
> I'm not sure we have a dependency on the charset being standard?
We do not; thanks, fixed.
Although I think the existing `allocateFrom(String, Charset)` method does have an undocumented dependency, because it uses `CharsetKind` to get the terminator char length, and `CharsetKind` only supports standard Charsets. If we add a fast path for UTF-16, that may need a dependency on a standard Charset (or a standard way to get the code unit size of a charset, if it has one). A sketch of the kind of per-charset mapping I mean is below.
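For illustration only (hypothetical helper, not the actual `CharsetKind` code): the terminator is one code unit, its byte size depends on the charset, and only the standard charsets have a known size, which is the undocumented dependency:

```java
import java.nio.charset.Charset;

class TerminatorSize {
    // Hypothetical sketch: maps a charset to the byte size of its NUL
    // terminator (one code unit). Non-standard charsets have no entry.
    static int terminatorByteSize(Charset charset) {
        return switch (charset.name()) {
            case "UTF-8", "ISO-8859-1", "US-ASCII" -> 1;
            case "UTF-16", "UTF-16LE", "UTF-16BE"  -> 2;
            case "UTF-32", "UTF-32LE", "UTF-32BE"  -> 4;
            default -> throw new UnsupportedOperationException(
                    "not a standard charset: " + charset.name());
        };
    }
}
```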
> src/java.base/share/classes/java/lang/foreign/SegmentAllocator.java line 194:
>
>> 192: MemorySegment segment;
>> 193: if (StringSupport.bytesCompatible(str, charset, srcIndex, numChars)) {
>> 194: segment = allocateNoInit(numChars);
>
> This also seems to rely on the fact that we end up here only for latin1 strings. Again, I don't think this is correct, but if it's deliberate, we should add an assertion check.
Good point. I think we make a similar assumption in the existing `allocateFrom(String, Charset)`: it does `length + termCharSize`, and that should perhaps be `(length + 1) * codeUnitSize`. The arithmetic is spelled out below.
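To spell out the difference (made-up variable names, assuming a hypothetical UTF-16 fast path where `length` is the char count rather than the encoded byte count):

```java
class SizingSketch {
    public static void main(String[] args) {
        int length = 5;                        // char count, e.g. "hello"

        // Latin-1-compatible fast path: 1 byte per char, 1-byte terminator,
        // so the two formulas happen to agree.
        System.out.println(length + 1);        // 6: length + termCharSize
        System.out.println((length + 1) * 1);  // 6: (length + 1) * codeUnitSize

        // Hypothetical UTF-16 fast path: 2 bytes per char, 2-byte terminator.
        System.out.println(length + 2);        // 7: length + termCharSize, under-allocates
        System.out.println((length + 1) * 2);  // 12: (length + 1) * codeUnitSize
    }
}
```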
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/28043#discussion_r2544835957
PR Review Comment: https://git.openjdk.org/jdk/pull/28043#discussion_r2544862879