<!DOCTYPE html><html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<p>Coming back to this, I think we've settled on the following three
methods:<br>
<br>
In MemorySegment:</p>
<p><font face="monospace"> String getString(long offset, Charset
charset, long length); // as in Liam's PR<br>
void copy(String src, Charset dstEncoding, int srcIndex,
MemorySegment dst, int numChars);</font></p>
<p>And in SegmentAllocator:</p>
<p><font face="monospace"> MemorySegment allocateFrom(String src, Charset dstEncoding, int srcIndex,
int numChars);</font></p>
<p>For encoding directly into a memory segment without the need to
go to an intermediate buffer, it looks like we can use the
internal StringCharBuffer class, in combination with the
`CharsetEncoder::encode` method. But of course we can skip
encoding altogether when the internal string encoding matches the
target, and just do a bulk copy.</p>
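<p>As a rough sketch of the direct-encoding path, using public API only
and assuming Java 22+: `CharBuffer.wrap(CharSequence, int, int)` gives a
read-only view over the string (this is where StringCharBuffer comes
in), and `MemorySegment::asByteBuffer` gives the encoder a view over the
destination. The `encodeInto` helper, its `dstOffset` parameter and its
int return value are made up for illustration and are not the proposed
signature. The bulk-copy fast path needs access to the string's
internal buffer, so it isn't shown.</p>
<pre>
```java
import java.lang.foreign.MemorySegment;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;

final class DirectEncodeSketch {
    // Hypothetical helper, not the proposed API: encodes numChars chars of src,
    // starting at srcIndex, into dst at dstOffset. Returns the bytes written.
    static int encodeInto(String src, Charset dstEncoding, int srcIndex,
                          MemorySegment dst, long dstOffset, int numChars) {
        // Read-only char view over the string; no char[] copy is made here.
        CharBuffer in = CharBuffer.wrap(src, srcIndex, srcIndex + numChars);
        // ByteBuffer view over the rest of the destination segment. This throws
        // if that slice is larger than Integer.MAX_VALUE bytes.
        ByteBuffer out = dst.asSlice(dstOffset).asByteBuffer();
        CharsetEncoder encoder = dstEncoding.newEncoder();
        // endOfInput = true: the whole input is supplied in this one call.
        CoderResult result = encoder.encode(in, out, true);
        if (result.isUnderflow()) {
            result = encoder.flush(out);
        }
        if (result.isOverflow()) {
            throw new IndexOutOfBoundsException("destination segment too small");
        }
        if (result.isError()) {
            throw new IllegalArgumentException("cannot encode input: " + result);
        }
        // The view starts at position 0, so the position is the byte count.
        return out.position();
    }
}
```
</pre>
<p>New encoders report malformed or unmappable input by default, so the
sketch surfaces those as exceptions instead of silently writing
replacement bytes.</p>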
<p>For allocateFrom, since we don't yet have a way to determine the
encoded length of a String, I think we'd still have to go to an
intermediate byte[], and then allocate the result segment based on
its length. We can still avoid the intermediate byte[] in most
cases where the encoding of the String's internal buffer is
compatible with the target encoding, and again just do a bulk copy
from the string's internal buffer.</p>
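<p>A minimal sketch of the slow path described above, assuming Java 22+
and public API only. The static `allocateFrom` helper is a hypothetical
stand-in for the proposed SegmentAllocator method. The bulk-copy fast
path is left out, since it depends on the string's internal buffer.</p>
<pre>
```java
import java.lang.foreign.MemorySegment;
import java.lang.foreign.SegmentAllocator;
import java.lang.foreign.ValueLayout;
import java.nio.charset.Charset;

final class AllocateFromSketch {
    // Hypothetical stand-in for the proposed allocateFrom overload.
    static MemorySegment allocateFrom(SegmentAllocator allocator, String src,
                                      Charset dstEncoding, int srcIndex, int numChars) {
        // Encode to a temporary byte[] first, only to learn the encoded length.
        // Note that String::getBytes substitutes replacement bytes for
        // unmappable characters instead of throwing.
        byte[] encoded = src.substring(srcIndex, srcIndex + numChars).getBytes(dstEncoding);
        // Allocate exactly encoded.length bytes and copy them in.
        // No terminator is appended.
        return allocator.allocateFrom(ValueLayout.JAVA_BYTE, encoded);
    }
}
```
</pre>
<p>Any SegmentAllocator works here, e.g. an Arena. The result segment is
sized from the temporary array, which is the extra copy we'd like to
avoid once there is a way to compute the encoded length up front.</p>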
<p>Note on the length parameter for getString: we thought it might be
possible to open this up to any charset, not just the standard ones
we support now. In that case, specifying the length as a byte length
would be more flexible, since not every charset has a notion of
'code unit' (and an associated unit size). For charsets with a code
unit size, converting to a byte length would be trivial anyway
(sorry for the back-and-forth on that). Right now we can't handle a
length &gt; Integer.MAX_VALUE because of limitations of the ByteBuffer
used in the decoding (CharsetDecoder::decode takes a ByteBuffer as
input), but we wanted to keep this option open for the future, which
is why the length is a `long` above.</p>
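<p>To illustrate the byte-length semantics and the current int limit, here
is a sketch assuming Java 22+ and public API only. The static
`getString` and `utf16ByteLength` helpers are hypothetical, and
decoding via `Charset::decode` is an assumption rather than what the PR
does.</p>
<pre>
```java
import java.lang.foreign.MemorySegment;
import java.nio.ByteBuffer;
import java.nio.charset.Charset;

final class GetStringSketch {
    // Hypothetical stand-in for getString(long offset, Charset charset, long length),
    // where length is a byte length, not a count of code units.
    static String getString(MemorySegment segment, long offset, Charset charset, long length) {
        // Decoding goes through a ByteBuffer, so for now the length must fit
        // in an int; toIntExact throws ArithmeticException otherwise.
        int byteLength = Math.toIntExact(length);
        ByteBuffer bytes = segment.asSlice(offset, byteLength).asByteBuffer();
        // Charset::decode replaces malformed input rather than throwing.
        return charset.decode(bytes).toString();
    }

    // For a charset with a fixed code unit size (UTF-16 here), converting a
    // code unit count to a byte length is a single multiplication.
    static long utf16ByteLength(long codeUnits) {
        return Math.multiplyExact(codeUnits, 2L);
    }
}
```
</pre>
<p>Keeping the parameter a `long` means callers would not have to change
if the decoding path later stops depending on ByteBuffer.</p>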
<p>Liam, would you be interested in working on these as part of your
PR [1]?</p>
<p>Jorn</p>
<p>[1]: <a class="moz-txt-link-freetext" href="https://github.com/openjdk/jdk/pull/28043">https://github.com/openjdk/jdk/pull/28043</a></p>
<div class="moz-cite-prefix">On 12-11-2025 15:54, Liam Miller-Cushon
wrote:<br>
</div>
<blockquote type="cite" cite="mid:CAL4QsgtAJ2TcQ9KLSyA6kS-WFERrw5Z3Wo8XXO=LhpFHJZuzcQ@mail.gmail.com">
<div dir="ltr">Thanks. I am convinced :)</div>
<br>
<div class="gmail_quote gmail_quote_container">
<div dir="ltr" class="gmail_attr">On Wed, Nov 12, 2025 at
3:30 PM Maurizio Cimadamore &lt;<a href="mailto:maurizio.cimadamore@oracle.com" moz-do-not-send="true" class="moz-txt-link-freetext">maurizio.cimadamore@oracle.com</a>&gt;
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p><br>
</p>
<div>On 12/11/2025 11:40, Liam Miller-Cushon wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p>For the non-\0 terminated strings, you have the
String-based MemorySegment::copy I described -
e.g.</p>
<pre>void copy(String srcString, Charset srcCharset, int srcIndex, MemorySegment dstSegment, long dstOffset, int length);</pre>
<p>With this, we also have two cases:</p>
<p>* if the charset is compatible with the string
buffer, we just bulk-copy the string buffer (or
a portion of it) into the dest segment<br>
* otherwise we can encode the srcString directly
into the dest segment</p>
</div>
</blockquote>
<div>Thanks! I think I'm caught up now. My
misunderstanding was about whether MS::ofString was being
suggested instead of the bulk copy, rather than in
addition to it.</div>
</div>
</div>
</blockquote>
<p>Ah, gotcha.</p>
<p>I think MS::ofString is a possible add-on. To be fair,
since writing the document I think we've grown a little
colder on it, as such a view would make for a pretty big
footgun, as it would allow a native function (invoked via
critical downcall handle) to directly modify the string
buffer (at least in some cases). There's also some
question about how `MemorySegment::equals` should work in
this case, as `equals` for heap segments takes into
account the identity of the underlying heap object.</p>
<p>So, if we could get there with the new `getString`/`copy`
+ maybe some way to determine the length of an encoded
string, I think it would be preferable/less risky. We
could always add `ofString` later, if we find a way to
address and/or mitigate the issues above.<br>
</p>
<p>Maurizio<br>
</p>
</div>
</blockquote>
</div>
</blockquote>
</body>
</html>