Re: MemorySegment APIs for reading and writing strings with known lengths

Jorn Vernee jorn.vernee at oracle.com
Tue Nov 18 16:52:05 UTC 2025


Coming back to this, I think we've settled on the following three methods:

In MemorySegment:

     String getString(long offset, Charset charset, long length); // as in Liam's PR
     void copy(String src, Charset dstEncoding, int srcIndex, MemorySegment dst, int numChars);

And in SegmentAllocator:

     MemorySegment allocateFrom(String src, Charset dstEncoding, int srcIndex, int numChars);

For encoding directly into a memory segment without the need to go to an 
intermediate buffer, it looks like we can use the internal 
StringCharBuffer class, in combination with the `CharsetEncoder::encode` 
method. But of course we can skip encoding altogether when the internal 
string encoding matches the target, and just do a bulk copy.
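
To make the encoding path concrete, here is a rough sketch using only public API (Java 22+). It is not the proposed implementation: `DirectEncodeSketch` and `encodeInto` are hypothetical names, the `dstOffset` parameter is an assumption borrowed from the earlier signature in this thread, and the bulk-copy fast path is left out because it needs access to the String's internal buffer. `CharBuffer.wrap(CharSequence, int, int)` gives a read-only view over the string, and `MemorySegment::asByteBuffer` gives the encoder a destination backed by the segment itself.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

final class DirectEncodeSketch {

    // Hypothetical helper: encodes src[srcIndex, srcIndex + numChars) straight
    // into dst starting at dstOffset, and returns the number of bytes written.
    static int encodeInto(String src, Charset charset, int srcIndex, int numChars,
                          MemorySegment dst, long dstOffset) {
        // Read-only view over the requested chars; no copy of the string data.
        CharBuffer in = CharBuffer.wrap(src, srcIndex, srcIndex + numChars);
        // ByteBuffer view over the destination memory; this only works while
        // the slice is no larger than Integer.MAX_VALUE bytes.
        ByteBuffer out = dst.asSlice(dstOffset).asByteBuffer();

        CharsetEncoder encoder = charset.newEncoder();
        CoderResult result = encoder.encode(in, out, true);
        if (result.isUnderflow()) {
            result = encoder.flush(out);
        }
        if (!result.isUnderflow()) {
            // Overflow (dst too small) or malformed/unmappable input.
            throw new IllegalArgumentException("encoding failed: " + result);
        }
        return out.position();
    }

    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment dst = arena.allocate(16);
            int written = encodeInto("h\u00e9llo", StandardCharsets.UTF_8, 0, 5, dst, 0L);
            System.out.println("bytes written: " + written);
        }
    }
}
```

Returning the byte count is a choice made for the sketch; the `copy` signature above returns `void`.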

For allocateFrom, since we don't yet have a way to determine the encoded 
length of a String, I think we'd still have to go to an intermediate 
byte[], and then allocate the result segment based on its length. We can 
still avoid the intermediate byte[] in most cases where the encoding of 
the String's internal buffer is compatible with the target encoding, and 
again just do a bulk copy from the string's internal buffer.
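
As a sketch of the fallback path only (Java 22+, public API): encode the requested range into a byte[], then size the segment from the array length. `AllocateFromSketch` and the static `allocateFrom` helper are hypothetical stand-ins for the proposed SegmentAllocator method, and the fast path over the String's internal buffer is again omitted. Note that `String::getBytes` silently replaces unmappable characters, which may or may not be the behavior we want here.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.SegmentAllocator;
import java.lang.foreign.ValueLayout;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class AllocateFromSketch {

    // Hypothetical helper: allocates a segment holding exactly the encoded
    // bytes of src[srcIndex, srcIndex + numChars), with no terminator.
    static MemorySegment allocateFrom(SegmentAllocator allocator, String src,
                                      Charset dstEncoding, int srcIndex, int numChars) {
        // Intermediate byte[]: its length is the only way to learn the encoded size.
        byte[] encoded = src.substring(srcIndex, srcIndex + numChars).getBytes(dstEncoding);
        // Allocates encoded.length bytes and copies the array into the new segment.
        return allocator.allocateFrom(ValueLayout.JAVA_BYTE, encoded);
    }

    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment utf8 = allocateFrom(arena, "h\u00e9llo", StandardCharsets.UTF_8, 0, 5);
            MemorySegment utf16 = allocateFrom(arena, "h\u00e9llo", StandardCharsets.UTF_16LE, 0, 5);
            System.out.println(utf8.byteSize() + " bytes as UTF-8, "
                    + utf16.byteSize() + " bytes as UTF-16LE");
        }
    }
}
```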

Note on the length parameter for getString: we thought that it might be 
possible to open this up to any charset, not just the standard ones we 
support now, in which case having the length be specified as a byte 
length would be more flexible, since not every charset might have a 
notion of 'code unit' (and associated unit size). For charsets with a 
code unit size, converting to a byte length would be trivial anyway 
(sorry for the back-and-forth on that). Right now we can't handle a 
length > Integer.MAX_VALUE because of limitations of ByteBuffer used in 
the decoding (CharsetDecoder::decode takes ByteBuffer as input), but we 
wanted to keep this option open for the future, so that's why the length 
is a `long` above.
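
To illustrate the byte-length semantics and where the Integer.MAX_VALUE limit comes from, a rough sketch (Java 22+): `GetStringSketch`, the static `getString` helper and `utf16ByteLength` are hypothetical names, and this uses `Charset::decode` for brevity, which replaces malformed input rather than reporting it. It is not necessarily how the PR decodes.

```java
import java.lang.foreign.MemorySegment;
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class GetStringSketch {

    // Hypothetical helper: decodes byteLength bytes starting at offset.
    // The length is in bytes, so it works for charsets without a code unit size.
    static String getString(MemorySegment segment, long offset, Charset charset, long byteLength) {
        if (byteLength > Integer.MAX_VALUE) {
            // ByteBuffer is int-indexed, so larger lengths cannot be decoded today.
            throw new IllegalArgumentException("length too large: " + byteLength);
        }
        ByteBuffer bytes = segment.asSlice(offset, byteLength).asByteBuffer();
        return charset.decode(bytes).toString();
    }

    // For a charset with a fixed code unit size, the conversion is a multiplication;
    // UTF-16 uses 2-byte code units.
    static long utf16ByteLength(long codeUnits) {
        return Math.multiplyExact(codeUnits, 2L);
    }

    public static void main(String[] args) {
        byte[] data = "..abc..".getBytes(StandardCharsets.UTF_16LE);
        MemorySegment segment = MemorySegment.ofArray(data);
        // Skip the two leading chars (4 bytes), then read 3 code units (6 bytes).
        System.out.println(getString(segment, 4L, StandardCharsets.UTF_16LE, utf16ByteLength(3)));
    }
}
```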

Liam, would you be interested in working on these as part of your PR [1]?

Jorn

[1]: https://github.com/openjdk/jdk/pull/28043

On 12-11-2025 15:54, Liam Miller-Cushon wrote:
> Thanks. I am convinced :)
>
> On Wed, Nov 12, 2025 at 3:30 PM Maurizio Cimadamore 
> <maurizio.cimadamore at oracle.com> wrote:
>
>
>     On 12/11/2025 11:40, Liam Miller-Cushon wrote:
>>
>>         For the non-\0 terminated strings, you have the String-based
>>         MemorySegment::copy I described - e.g.
>>
>>         void copy(String srcString, Charset srcCharset, int srcIndex,
>>         MemorySegment dstSegment, long dstOffset, int length);
>>
>>         With this, we also have two cases:
>>
>>         * if the charset is compatible with the string buffer, we
>>         just bulk-copy the string buffer (or a portion of it) into
>>         the dest segment
>>         * otherwise we can encode the srcString directly into the
>>         dest segment
>>
>>     Thanks! I think I'm caught up now. My misunderstanding was
>>     whether MS::ofString was being suggested instead of and not in
>>     addition to the bulk copy.
>
>     Ah, gotcha.
>
>     I think MS::ofString is a possible add-on. To be fair, since
>     writing the document I think we've grown a little colder on it, as
>     such a view would make for a pretty big footgun, as it would allow
>     a native function (invoked via critical downcall handle) to
>     directly modify the string buffer (at least in some cases).
>     There's also some question about how `MemorySegment::equals`
>     should work in this case, as `equals` for heap segments takes into
>     account the identity of the underlying heap object.
>
>     So, if we could get there with the new `getString`/`copy` + maybe
>     some way to determine the length of an encoded string, I think it
>     would be preferable/less risky. We could always add `ofString`
>     later, if we find a way to address and/or mitigate the issues above.
>
>     Maurizio
>

More information about the panama-dev mailing list