RFR: 8369564: Provide a MemorySegment API to read strings with known lengths [v7]

Thu Nov 20 08:59:15 UTC 2025

On Wed, 19 Nov 2025 14:45:52 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote:

>> src/java.base/share/classes/java/lang/String.java line 2030:
>> 
>>> 2028:     }
>>> 2029: 
>>> 2030:     void copyToSegmentRaw(MemorySegment segment, long offset, int srcIndex, int numChars) {
>> 
>> This method takes an index, expressed in chars, and uses that as a byte offset in a bulk copy operation. I don't think this is correct. E.g. if the string is UTF16 (and not LATIN1), there is a scaling factor to be applied?
>
> In other words, it seems to me that here we have hardwired the knowledge that we can only get here if the string is latin1. I don't think this was the original intent of this method -- however, if that's the case, we should also add an assertion to avoid misuse.
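The scaling issue raised above can be illustrated with a small sketch. It assumes the compact-string convention of a coder value of 0 for LATIN1 (one byte per char) and 1 for UTF16 (two bytes per char), so a char index becomes a byte offset via `charIndex << coder`. The class and the `byteOffset` helper are hypothetical; the sketch uses the public `getBytes` API with `UTF_16LE` only to obtain two bytes per char, and does not model the JDK's internal UTF-16 byte order.

```java
import java.nio.charset.StandardCharsets;

public class CharToByteOffset {
    // Assumed coder convention: 0 = LATIN1 (1 byte per char), 1 = UTF16 (2 bytes per char).
    static final int LATIN1 = 0;
    static final int UTF16 = 1;

    // Hypothetical helper: converts a char index into a byte offset by shifting by the coder.
    static int byteOffset(int charIndex, int coder) {
        return charIndex << coder;
    }

    public static void main(String[] args) {
        String s = "a\u00e9z"; // every char fits in Latin-1
        byte[] latin1 = s.getBytes(StandardCharsets.ISO_8859_1); // 3 bytes
        byte[] utf16 = s.getBytes(StandardCharsets.UTF_16LE);    // 6 bytes, no byte-order mark

        int charIndex = 2; // the 'z'
        System.out.println((char) latin1[byteOffset(charIndex, LATIN1)]); // z
        System.out.println((char) utf16[byteOffset(charIndex, UTF16)]);   // z

        // Using the char index directly as a byte offset lands inside the UTF-16 data:
        // byte 2 is the low byte of the second char, not the third char.
        System.out.println(utf16[charIndex] == (byte) 0xE9); // true
    }
}
```

Without the shift, the same char index addresses the right data only when the string is Latin-1.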

Thanks for catching this. I have renamed the parameters of `copyToSegmentRaw` so that they no longer refer to chars.

I have also tentatively added an assertion to `copyToSegmentRaw` so that it only supports latin1 strings. This restriction could be relaxed if `bytesCompatible` is updated to handle UTF-16.
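The Latin-1-only guard described above might look roughly like the following sketch. `CompactString` and `copyRawLatin1` are hypothetical stand-ins for a string's internal `byte[]` value and coder, and the copy targets a `byte[]` rather than a `MemorySegment` to keep the example independent of the FFM API. It uses an explicit check instead of `assert` so that it fires even without `-ea`. Relaxing the restriction for UTF-16 would mean scaling both offset and length by the coder.

```java
import java.nio.charset.StandardCharsets;

public class RawCopySketch {
    static final byte LATIN1 = 0;
    static final byte UTF16 = 1;

    // Hypothetical stand-in for a compact string's internal byte[] value and coder.
    record CompactString(byte[] value, byte coder) {
        static CompactString of(String s) {
            boolean fitsLatin1 = s.chars().allMatch(c -> c <= 0xFF);
            return fitsLatin1
                    ? new CompactString(s.getBytes(StandardCharsets.ISO_8859_1), LATIN1)
                    : new CompactString(s.getBytes(StandardCharsets.UTF_16LE), UTF16);
        }
    }

    // Raw copy that treats srcOffset and length as byte counts. These equal char counts
    // only for Latin-1 data, so the coder is checked before copying.
    static void copyRawLatin1(CompactString src, int srcOffset, byte[] dst, int dstOffset, int length) {
        if (src.coder() != LATIN1) {
            throw new IllegalStateException("raw copy supports Latin-1 strings only");
        }
        System.arraycopy(src.value(), srcOffset, dst, dstOffset, length);
    }

    public static void main(String[] args) {
        byte[] dst = new byte[3];
        copyRawLatin1(CompactString.of("hello"), 1, dst, 0, 3);
        System.out.println(new String(dst, StandardCharsets.ISO_8859_1)); // ell

        try {
            copyRawLatin1(CompactString.of("h\u20ac"), 0, new byte[2], 0, 2);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // raw copy supports Latin-1 strings only
        }
    }
}
```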

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28043#discussion_r2544854187