<!DOCTYPE html><html><head>

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

  </head>

  <body>

    <p>Coming back to this, I think we've settled on the following three

      methods:<br>

      <br>

      In MemorySegment:</p>

    <p><font face="monospace">&nbsp;&nbsp;&nbsp; String getString(long offset, Charset charset, long length); // as in Liam's PR<br>
&nbsp;&nbsp;&nbsp; void copy(String src, Charset dstEncoding, int srcIndex, MemorySegment dst, int numChars);</font></p>

    <p>And in SegmentAllocator:</p>

    <p><font face="monospace">&nbsp;&nbsp;&nbsp; MemorySegment allocateFrom(String src, Charset dstEncoding, int srcIndex, int numChars);</font></p>

    <p>For encoding directly into a memory segment without the need to

      go to an intermediate buffer, it looks like we can use the

      internal StringCharBuffer class, in combination with the

      `CharsetEncoder::encode` method. But of course we can skip

      encoding altogether when the internal string encoding matches the

      target, and just do a bulk copy.</p>
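    <p>To make the direct-encoding idea concrete, here is a rough sketch
      that uses only public API (Java 22 or later). The helper name
      `encodeInto` and its `dstOffset` parameter are made up for
      illustration and are not the proposed signature. It relies on
      `CharBuffer.wrap(CharSequence, int, int)`, which in OpenJDK is backed
      by StringCharBuffer, and it cannot show the bulk-copy fast path, since
      that needs access to the String's internal buffer.</p>
<pre>
```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

class DirectEncodeSketch {

    // Hypothetical helper (not the proposed API): encodes numChars chars of src,
    // starting at srcIndex, straight into dst at dstOffset.
    // Returns the number of bytes written.
    static int encodeInto(String src, Charset dstEncoding, int srcIndex,
                          MemorySegment dst, long dstOffset, int numChars) {
        // A read-only view over the String; no chars are copied here.
        CharBuffer in = CharBuffer.wrap(src, srcIndex, srcIndex + numChars);
        // ByteBuffer view of the destination; this is where the int-size limit comes from.
        ByteBuffer out = dst.asSlice(dstOffset).asByteBuffer();
        CharsetEncoder encoder = dstEncoding.newEncoder();
        try {
            CoderResult result = encoder.encode(in, out, true);
            if (result.isUnderflow()) {
                result = encoder.flush(out);
            }
            if (!result.isUnderflow()) {
                // Overflow, malformed input or unmappable character.
                result.throwException();
            }
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("Cannot encode as " + dstEncoding, e);
        }
        return out.position();
    }

    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment dst = arena.allocate(16);
            // Encodes the 4 chars after the leading 'h'.
            int written = encodeInto("h\u00e9llo", StandardCharsets.UTF_8, 1, dst, 0, 4);
            System.out.println(written + " bytes written");
        }
    }
}
```
</pre>
    <p>If the destination is too small, `throwException()` surfaces that as
      a BufferOverflowException; a real implementation would probably want
      to report it differently.</p>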

    <p>For allocateFrom, since we don't yet have a way to determine the

      encoded length of a String, I think we'd still have to go to an

      intermediate byte[], and then allocate the result segment based on

      its length. We can still avoid the intermediate byte[] in most

      cases where the encoding of the String's internal buffer is

      compatible with the target encoding, and again just do a bulk copy

      from the string's internal buffer.</p>
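    <p>The slow path for allocateFrom could look roughly like the sketch
      below (Java 22 or later). The static helper taking the allocator as
      its first argument is a stand-in for the proposed instance method, and
      going through `substring` plus `getBytes` is just the simplest way to
      show the intermediate byte[]; note that `getBytes` silently replaces
      unmappable characters, which may or may not be what we want.</p>
<pre>
```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.SegmentAllocator;
import java.lang.foreign.ValueLayout;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class AllocateFromSketch {

    // Hypothetical stand-in for the proposed SegmentAllocator::allocateFrom overload.
    static MemorySegment allocateFrom(SegmentAllocator allocator, String src,
                                      Charset dstEncoding, int srcIndex, int numChars) {
        // Intermediate byte[]: its length tells us how large the segment must be.
        byte[] encoded = src.substring(srcIndex, srcIndex + numChars).getBytes(dstEncoding);
        // Allocates exactly encoded.length bytes and copies them; no terminator is added.
        return allocator.allocateFrom(ValueLayout.JAVA_BYTE, encoded);
    }

    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment utf8 = allocateFrom(arena, "h\u00e9llo", StandardCharsets.UTF_8, 0, 5);
            MemorySegment utf16 = allocateFrom(arena, "h\u00e9llo", StandardCharsets.UTF_16LE, 0, 5);
            System.out.println(utf8.byteSize() + " and " + utf16.byteSize() + " bytes");
        }
    }
}
```
</pre>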

    <p>Note on the length parameter for getString: we thought that it

      might be possible to open this up to any charset, not just the

      standard ones we support now, in which case having the length be

      specified as a byte length would be more flexible, since not every

      charset might have a notion of 'code unit' (and associated unit

      size). For charsets with a code unit size, converting to a byte

      length would be trivial anyway (sorry for the back-and-forth on

      that). Right now we can't handle a length > Integer.MAX_VALUE

      because of limitations of ByteBuffer used in the decoding

      (CharsetDecoder::decode takes ByteBuffer as input), but we wanted

      to keep this option open for the future, so that's why the length

      is a `long` above.</p>
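    <p>As a sketch of what a byte-length getString means for callers (Java
      22 or later): the static helpers below are illustrative only, not the
      implementation in the PR. `Math.toIntExact` stands in for the current
      Integer.MAX_VALUE limit, and `byteLength` shows the code-unit
      conversion for charsets that have a fixed unit size.</p>
<pre>
```java
import java.lang.foreign.MemorySegment;
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class GetStringSketch {

    // Hypothetical stand-in for the proposed MemorySegment::getString overload.
    // length is a byte length, not a count of code units.
    static String getString(MemorySegment segment, long offset, Charset charset, long length) {
        // Throws ArithmeticException beyond Integer.MAX_VALUE, mirroring today's limit.
        int byteLength = Math.toIntExact(length);
        ByteBuffer in = segment.asSlice(offset, byteLength).asByteBuffer();
        // Charset.decode replaces malformed input rather than reporting it.
        return charset.decode(in).toString();
    }

    // Converts a code-unit count to a byte length for charsets with a fixed unit size.
    static long byteLength(long codeUnits, int codeUnitSize) {
        return Math.multiplyExact(codeUnits, (long) codeUnitSize);
    }

    public static void main(String[] args) {
        MemorySegment segment = MemorySegment.ofArray("hello".getBytes(StandardCharsets.UTF_16LE));
        // Skip the first UTF-16 code unit (2 bytes), then read 3 code units.
        System.out.println(getString(segment, 2, StandardCharsets.UTF_16LE, byteLength(3, 2)));
    }
}
```
</pre>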

    <p>Liam, would you be interested in working on these as part of your

      PR [1]?</p>

    <p>Jorn</p>

    <p>[1]: <a class="moz-txt-link-freetext" href="https://github.com/openjdk/jdk/pull/28043">https://github.com/openjdk/jdk/pull/28043</a></p>

    <div class="moz-cite-prefix">On 12-11-2025 15:54, Liam Miller-Cushon

      wrote:<br>

    </div>

    <blockquote type="cite" cite="mid:CAL4QsgtAJ2TcQ9KLSyA6kS-WFERrw5Z3Wo8XXO=LhpFHJZuzcQ@mail.gmail.com">

      

      <div dir="ltr">Thanks. I am convinced :)</div>

      <br>

      <div class="gmail_quote gmail_quote_container">

        <div dir="ltr" class="gmail_attr">On Wed, Nov 12, 2025 at

          3:30 PM Maurizio Cimadamore &lt;<a href="mailto:maurizio.cimadamore@oracle.com" moz-do-not-send="true" class="moz-txt-link-freetext">maurizio.cimadamore@oracle.com</a>&gt;

          wrote:<br>

        </div>

        <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

          <div>

            <p><br>

            </p>

            <div>On 12/11/2025 11:40, Liam Miller-Cushon wrote:<br>

            </div>

            <blockquote type="cite">

              <div dir="ltr">

                <div class="gmail_quote">

                  <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

                    <div>

                      <p>For the non-\0 terminated strings, you have the

                        String-based MemorySegment::copy I described -

                        e.g.</p>

                      <pre>void copy(String srcString, Charset srcCharset, int srcIndex, MemorySegment dstSegment, long dstOffset, int length);</pre>

                      <p>With this, we also have two cases:</p>

                      <p>* if the charset is compatible with the string

                        buffer, we just bulk-copy the string buffer (or

                        a portion of it) into the dest segment<br>

                        * otherwise we can encode the srcString directly

                        into the dest segment</p>

                    </div>

                  </blockquote>

                  <div>Thanks! I think I'm caught up now. My

                    misunderstanding was whether MS::ofString was being

                    suggested instead of and not in addition to the bulk

                    copy.</div>

                </div>

              </div>

            </blockquote>

            <p>Ah, gotcha.</p>

            <p>I think MS::ofString is a possible add-on. To be fair,

              since writing the document I think we've grown a little

              colder on it, as such a view would make for a pretty big

              footgun, as it would allow a native function (invoked via

              critical downcall handle) to directly modify the string

              buffer (at least in some cases). There's also some

              question about how `MemorySegment::equals` should work in

              this case, as `equals` for heap segments takes into

              account the identity of the underlying heap object.</p>

            <p>So, if we could get there with the new `getString`/`copy`

              + maybe some way to determine the length of an encoded

              string, I think it would be preferable/less risky. We

              could always add `ofString` later, if we find a way to

              address and/or mitigate the issues above.<br>

            </p>

            <p>Maurizio<br>

            </p>

          </div>

        </blockquote>

      </div>

    </blockquote>

  </body>

</html>