<!DOCTYPE html><html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<div class="moz-cite-prefix">On 12/11/2025 10:02, Liam Miller-Cushon
wrote:<br>
</div>
<blockquote type="cite" cite="mid:CAL4Qsgu_53bsnuG0efEQYyQKsCT0xuMdsEDq=kt0507XbkzOWA@mail.gmail.com">
<div dir="ltr">Thanks, yes, I think string concat is a good
analogy.<br>
<br>
Thinking about this more, isn't this use-case an example where
the proposed MemorySegment::ofString approach wouldn't always
offer the best possible performance? In the case where the
internal string buffer isn't compatible with the requested
charset it has to make an intermediate copy. In theory with the
alternative of a setString or copy method that took a String and
directly wrote it to the output, the intermediate copy could be
avoided.</div>
</blockquote>
<p>Let's leave MS::ofString aside for this discussion (as I agree
that wouldn't be optimal for this use case).</p>
<p>I believe what you mean here is that if I have a string and I
want to copy it to a destination segment, I could either:</p>
<p>* if the string buffer is compatible, just bulk-copy that buffer
into the target segment<br>
* if the string buffer is not compatible, encode the string
_directly_ into the target segment</p>
<p>Correct? If so, I tend to agree this would be slightly
preferable, as we'd be touching the data only once. And, I believe
this could also be done with the existing setString method?</p>
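<p>(To make the setString idea concrete, here is a minimal,
illustrative sketch -- the class name and the worst-case sizing are
just for illustration, not a proposal:)</p>
<pre>
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.nio.charset.StandardCharsets;

// Illustrative only: encode a String straight into a target segment with the
// existing MemorySegment::setString, instead of going through an intermediate
// byte[] from String::getBytes. The sizing is a worst-case guess: UTF-8 needs
// at most 3 bytes per Java char, plus 1 for the NUL terminator that setString
// appends.
class SetStringSketch {
    public static void main(String[] args) {
        String s = "hello, panama";
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment target = arena.allocate(s.length() * 3L + 1);
            target.setString(0, s, StandardCharsets.UTF_8); // encode in place
            System.out.println(target.getString(0, StandardCharsets.UTF_8));
        }
    }
}
</pre>
<p>(With an exact encoded-length computation, the worst-case sizing
above could of course be replaced by the exact size -- which ties
back to the other part of this thread.)</p>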
<p>Cheers<br>
Maurizio<br>
</p>
<blockquote type="cite" cite="mid:CAL4Qsgu_53bsnuG0efEQYyQKsCT0xuMdsEDq=kt0507XbkzOWA@mail.gmail.com"><br>
<div class="gmail_quote gmail_quote_container">
<div dir="ltr" class="gmail_attr">On Tue, Nov 11, 2025 at
6:18 PM Maurizio Cimadamore &lt;<a href="mailto:maurizio.cimadamore@oracle.com" moz-do-not-send="true" class="moz-txt-link-freetext">maurizio.cimadamore@oracle.com</a>&gt;
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p>Thanks for the detailed reply.</p>
<p>For resizing, I tend to agree with you -- the problem is
that if you don't size correctly upfront, then you will
have to pay the cost (potentially multiple times) to
allocate a bigger buffer and move all the contents over
there.</p>
<p>A bit like how string concat has evolved, where we now
have ways to "guess" the size of each of the concatenation
arguments so we can correctly size the byte[] buffer we
create to hold the result of the concatenation.</p>
<p>In those cases, I agree, paying a small-ish cost to be
able to estimate the size of a sub-element of an
allocation goes a long way in making everything less
dynamic and more deterministic.</p>
<p>Maurizio</p>
<div>On 11/11/2025 17:04, Liam Miller-Cushon wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p>It seems to me that in this case encoding and
length travel together? E.g. you need to encode
anyway, at which point you also know the byte
size?</p>
<p>(I'm a bit unsure that here there's anything to
be gained by the method you proposed?)</p>
<p>Do you have use cases where you don't want to
decode, you just want to know the byte length?</p>
</div>
</blockquote>
<div>The main use-cases I've seen do want both the
encoding and the length.</div>
<div><br>
</div>
<div>I think there is still a benefit to a fast way to
get the length first. An alternative is to
accumulate into a temporary buffer, and potentially
have to resize it. If there are gigabytes of data
it's expensive to have to make another copy. Knowing
the encoded length up-front allows exactly sizing
the output buffer and avoids the temporary buffer.</div>
<div><br>
</div>
<div>Some slightly more concrete examples:</div>
<div><br>
</div>
<div>Building a byte[] with all of the content of a
lot of data: sizing the byte[] requires knowing the
sum of all the lengths you want to put into it first,
before encoding the strings into it.<br>
<br>
Streaming serialization to the network: the top
level has to know the length of the transitive
contents that it's going to be writing out in the
nested structures. The actual output is streamed; it
never constructs a byte[] of the complete data in
this scenario.</div>
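<div><br>
</div>
<div>(A minimal sketch of that exact-sizing pattern -- the
utf8Length helper below is hypothetical, not an existing JDK
method, and it assumes strings without unpaired surrogates:)</div>
<pre>
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Illustrative only: compute the exact UTF-8 size of each string without
// encoding it, allocate the output once, then encode each string directly
// into that buffer (no growable temporary that may need to be resized).
class ExactSizingSketch {

    // Exact UTF-8 byte length of a String, without allocating a byte[].
    // Assumes well-formed strings (no unpaired surrogates).
    static long utf8Length(String s) {
        long bytes = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (cp < 0x80)          bytes += 1;
            else if (cp < 0x800)    bytes += 2;
            else if (cp < 0x10000)  bytes += 3;
            else                    bytes += 4;
            i += Character.charCount(cp);
        }
        return bytes;
    }

    public static void main(String[] args) {
        var parts = List.of("field-one", "field-two", "some π and ≈ bytes");
        long total = parts.stream().mapToLong(ExactSizingSketch::utf8Length).sum();

        // One exactly-sized buffer instead of accumulating into a temporary
        // buffer that may have to be resized and re-copied.
        ByteBuffer out = ByteBuffer.allocate(Math.toIntExact(total));
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
        for (String p : parts) {
            encoder.reset();
            encoder.encode(CharBuffer.wrap(p), out, true); // encode in place
            encoder.flush(out);
        }
        System.out.println(out.position() + " of " + total + " bytes written");
    }
}
</pre>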
<div><br>
</div>
<div>(There are also some public protobuf APIs that
just return an encoded byte length for the data, but
that is a less performance sensitive use-case.)</div>
</div>
</div>
</blockquote>
</div>
</blockquote>
</div>
</blockquote>
</body>
</html>