<!DOCTYPE html><html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<div class="moz-cite-prefix">On 10/11/2025 14:20, Liam Miller-Cushon
wrote:<br>
</div>
<blockquote type="cite" cite="mid:CAL4QsgseX-p_03RSkisrrrOH6nUZ=zpjFZmAc8F+vM48n9w_Pw@mail.gmail.com">
<div dir="ltr">
<div dir="ltr">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>I hope this comment was not in my doc?<br>
</div>
</blockquote>
<div><br>
</div>
It's a parenthetical in the paragraph starting with
"Finally, ultimately, the user is probably the most happy
with an API that directly accepts the units in which they
are already measuring their string"</div>
</div>
</div>
</blockquote>
Apologies for the confusion; that was a leftover from a previous
version. It's removed now.<br>
<blockquote type="cite" cite="mid:CAL4QsgseX-p_03RSkisrrrOH6nUZ=zpjFZmAc8F+vM48n9w_Pw@mail.gmail.com">
<div dir="ltr">
<div dir="ltr">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<blockquote type="cite">
<div dir="ltr"> </div>
</blockquote>
<p>You mean the _byte size_ of the encoded string
(rather than the number of code units)?</p>
</div>
</blockquote>
<div>Yes, exactly.</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p>Something like this might be interesting. That
said, if the charset matches, creating the segment
view and then obtaining its byte size is O(1)
(i.e. no conversion is needed). And if the charset
doesn't match, you'll need to encode anyway, at which
point I'm not sure the array creation is really the
bottleneck.</p>
</div>
</blockquote>
<div>Thanks. Yes, MemorySegment.ofString seems to solve
the case where the charset matches, so the open
question is whether there are performance gains to be
had when the charset doesn't match. The benchmarking
I've seen suggests that a carefully optimized loop over
the string outperforms getBytes(charset).length in that
case. I can do some more analysis and report
back.</div>
</div>
</div>
</div>
</blockquote>
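<p>For illustration only, here is a minimal sketch of what such a loop might look like, assuming the target charset is UTF-8 (the benchmarked code itself isn't shown in this thread). The class name Utf8Length and method name utf8Length are hypothetical. The method computes the encoded length in a single pass without allocating the byte array that getBytes(charset).length creates, and it counts each unpaired surrogate as one byte, because String.getBytes substitutes the UTF-8 replacement '?' for malformed input.</p>

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class Utf8Length {
    // Number of bytes s would occupy when encoded as UTF-8,
    // computed without materializing the encoded byte[].
    static int utf8Length(String s) {
        int len = 0;
        int n = s.length();
        for (int i = 0; i < n; i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                len += 1;                       // ASCII: one byte
            } else if (c < 0x800) {
                len += 2;                       // two-byte sequence
            } else if (Character.isHighSurrogate(c)
                    && i + 1 < n
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                len += 4;                       // valid surrogate pair: one 4-byte code point
                i++;                            // skip the low surrogate
            } else if (Character.isSurrogate(c)) {
                len += 1;                       // unpaired surrogate: getBytes writes '?'
            } else {
                len += 3;                       // remaining BMP characters
            }
        }
        return len;
    }

    public static void main(String[] args) {
        String[] samples = {"hello", "h\u00e9llo", "\u20ac10", "\uD83D\uDE00", "bad\uD800"};
        for (String s : samples) {
            int expected = s.getBytes(StandardCharsets.UTF_8).length;
            System.out.println(utf8Length(s) + " (getBytes: " + expected + ")");
        }
    }
}
```

<p>No performance claim is implied by the sketch; whether a loop like this actually beats getBytes(charset).length would need to be measured, e.g. with JMH.</p>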
<p>I believe you. My hunch here would be to separate this one out,
as it has more to do with the Charset/String API than with
memory segments?</p>
<p>For example, you'd want an API like:</p>
<p><code>String::getNumBytes(Charset)</code></p>
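<p>As a rough illustration of what such a method could compute (String::getNumBytes(Charset) is hypothetical; String has no such method today), the sketch below counts the encoded bytes for an arbitrary charset by streaming the string through a CharsetEncoder into a small reusable buffer, so the full byte array is never allocated. The class name EncodedLength and the 256-byte scratch size are assumptions. Malformed and unmappable input are both set to REPLACE, mirroring String.getBytes(Charset), which substitutes the charset's default replacement bytes.</p>

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a String::getNumBytes(Charset) method.
class EncodedLength {
    // Returns the length of s.getBytes(cs) without allocating that array:
    // the encoder writes into a small scratch buffer, which is counted and
    // cleared every time it fills up.
    static long getNumBytes(String s, Charset cs) {
        // REPLACE for both error kinds mirrors String.getBytes(Charset).
        CharsetEncoder enc = cs.newEncoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        CharBuffer in = CharBuffer.wrap(s);
        ByteBuffer scratch = ByteBuffer.allocate(256);
        long total = 0;

        // With REPLACE, encode() reports only UNDERFLOW (all input consumed)
        // or OVERFLOW (scratch full); on overflow, count the bytes and retry.
        CoderResult r;
        do {
            r = enc.encode(in, scratch, true);
            total += scratch.position();
            scratch.clear();
        } while (r.isOverflow());

        // Flush any trailing encoder state (relevant for stateful charsets).
        do {
            r = enc.flush(scratch);
            total += scratch.position();
            scratch.clear();
        } while (r.isOverflow());

        return total;
    }

    public static void main(String[] args) {
        String s = "h\u00e9llo \u20ac";
        Charset[] charsets = {StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_16};
        for (Charset cs : charsets) {
            System.out.println(cs + ": " + getNumBytes(s, cs) + " (getBytes: " + s.getBytes(cs).length + ")");
        }
    }
}
```

<p>This generic path saves the allocation but still runs the full encoder, so it does not remove the transcoding work; a method living in String could presumably add charset-specific fast paths on top of it.</p>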
<p>Whether this API exists or not seems orthogonal to the
improvements described in the documents I shared.</p>
<p>Cheers<br>
Maurizio<br>
</p>
</body>
</html>