<!DOCTYPE html><html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<p>> I wouldn't have expected looping with charAt to be
competitive with the fast paths in StringSupport where the bytes
are compatible, or is that not right?<br>
<br>
The fast paths in StringSupport call an out-of-line stub that does
a vectorized copy. At least in theory C2's auto-vectorizer should
be able to do the exact same thing for a manual loop using charAt,
but inline, so it might even be faster, especially for small
strings. That's why it would be good to try that approach and see
how it compares.<br>
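<br>
For illustration, a minimal sketch of such a manual loop might look like the following. The class and method names are made up, the helper only accepts ASCII input (an assumption made to keep the encoding trivial), and whether C2 actually vectorizes this particular loop shape is something a benchmark would have to show.<br>
<br>
```java
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

public class CharAtCopySketch {

    // Hypothetical helper, not an existing JDK method: copies an ASCII-only
    // string into dst starting at dstOffset, one byte per char.
    // Returns the number of bytes written.
    static int copyAscii(String src, MemorySegment dst, long dstOffset) {
        int len = src.length();
        for (int i = 0; i < len; i++) {
            char c = src.charAt(i);
            if (c >= 0x80) {
                // Assumption of this sketch: anything outside ASCII is rejected.
                throw new IllegalArgumentException("non-ASCII char at index " + i);
            }
            dst.set(ValueLayout.JAVA_BYTE, dstOffset + i, (byte) c);
        }
        return len;
    }

    public static void main(String[] args) {
        // A heap segment keeps the example free of arena management.
        MemorySegment segment = MemorySegment.ofArray(new byte[16]);
        int written = copyAscii("hello", segment, 4);
        System.out.println(written + " bytes written");
    }
}
```
<br>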
<br>
> Do you have a feeling for how that approach might best be
exposed in the API? Do you think it might look like more
variations of getString/setString in MemorySegment? Or that there
might be a missing API primitive that could encapsulate those
String sources and destinations? Or something else?<br>
<br>
I was thinking primarily along the lines of adding a
MemorySegment::copy overload that accepts Strings as a source (as
opposed to e.g. an array), for copying from a string to a memory
segment only. We should probably also add an overload to
SegmentAllocator::allocateFrom that accepts an offset and a length
(we already have two for full strings). These two overloads could
fully support the substring use case without looking too out of
place.<br>
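<br>
To make the shape of those two overloads concrete, here is a rough sketch that emulates them with APIs that exist today. The helper names and parameter order are assumptions for illustration only, not a proposed signature, and both helpers go through substring and an intermediate byte[], which is the cost a real overload would be meant to avoid.<br>
<br>
```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.SegmentAllocator;
import java.lang.foreign.ValueLayout;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class SubstringCopySketch {

    // Hypothetical stand-in for a MemorySegment::copy overload with a String source.
    // Encodes src[srcIndex, srcIndex + length) and copies the bytes into dst.
    // No null terminator is written. Returns the number of bytes copied.
    static int copySubstring(String src, int srcIndex, int length, Charset cs,
                             MemorySegment dst, long dstOffset) {
        byte[] bytes = src.substring(srcIndex, srcIndex + length).getBytes(cs);
        MemorySegment.copy(bytes, 0, dst, ValueLayout.JAVA_BYTE, dstOffset, bytes.length);
        return bytes.length;
    }

    // Hypothetical stand-in for an allocateFrom overload taking an offset and a length.
    // The existing allocateFrom(String, Charset) appends a null terminator.
    static MemorySegment allocateFromSubstring(SegmentAllocator allocator, String src,
                                               int srcIndex, int length, Charset cs) {
        return allocator.allocateFrom(src.substring(srcIndex, srcIndex + length), cs);
    }

    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment dst = arena.allocate(16);
            int copied = copySubstring("hello world", 6, 5, StandardCharsets.UTF_8, dst, 0);
            MemorySegment fresh =
                allocateFromSubstring(arena, "hello world", 6, 5, StandardCharsets.UTF_8);
            System.out.println(copied + " bytes copied, allocated: " + fresh.getString(0));
        }
    }
}
```
<br>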
<br>
For reading a String, I think your proposal to augment
MemorySegment::getString looks good, but I think we should leave
setString alone in favor of adding a MS::copy overload (there's
the asymmetry I was talking about before).<br>
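<br>
As a point of comparison, this is roughly what reading a length-delimited string looks like with today's API. It assumes that the getString variant under discussion reads a given number of bytes rather than scanning for a null terminator, which is my reading and is not spelled out here. The readString helper is hypothetical, and it pays for an intermediate byte[].<br>
<br>
```java
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ReadStringSketch {

    // Hypothetical helper: decodes exactly `length` bytes starting at `offset`.
    // Strings are immutable, so reading always produces a new String instance.
    static String readString(MemorySegment src, long offset, int length, Charset cs) {
        byte[] bytes = src.asSlice(offset, length).toArray(ValueLayout.JAVA_BYTE);
        return new String(bytes, cs);
    }

    public static void main(String[] args) {
        byte[] data = "xxhello".getBytes(StandardCharsets.UTF_8);
        MemorySegment segment = MemorySegment.ofArray(data);
        // Skips the two leading bytes and decodes the next five.
        System.out.println(readString(segment, 2, 5, StandardCharsets.UTF_8));
    }
}
```
<br>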
<br>
For completeness, I think we should also just add the
MemorySegment::ofString(String, Charset) overload which tries to
return a read-only view of the string, to match the existing
ofArray methods. This seems generally just a good primitive to
have.</p>
<p>Jorn</p>
<div class="moz-cite-prefix">On 4-11-2025 14:13, Liam Miller-Cushon
wrote:<br>
</div>
<blockquote type="cite" cite="mid:CAL4QsgvdeCsCVHoq90uneSbtJctxG74h09cUjKxGa7dqEoB0Hw@mail.gmail.com">
<div dir="ltr">
<div>Hi Jorn,</div>
<div><br>
</div>
<div>Thanks for the discussion and input!</div>
<br>
<div class="gmail_quote gmail_quote_container">
<div dir="ltr" class="gmail_attr">On Mon, Nov 3, 2025 at
7:47 PM Jorn Vernee &lt;<a href="mailto:jorn.vernee@oracle.com" moz-do-not-send="true" class="moz-txt-link-freetext">jorn.vernee@oracle.com</a>&gt;
wrote:</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
About the copy elision, this is something that we have seen
being <br>
visible in benchmarks [1]. Were there any benchmarks in
which you've <br>
seen this difference show up as well, or is this more of a
theoretical <br>
benefit? It would be good to understand how important
performance is in <br>
all of this, or if it's more about API usability. Also, I
should note <br>
that in the case of setString, we only avoid the extra
byte[] when the <br>
source and target encodings are compatible.<br>
</blockquote>
<div><br>
</div>
We had done some earlier benchmarking, I think that was part
of the discussion that led to JDK-8362893: <a href="https://mail.openjdk.org/pipermail/core-libs-dev/2025-July/149189.html" moz-do-not-send="true" class="moz-txt-link-freetext">https://mail.openjdk.org/pipermail/core-libs-dev/2025-July/149189.html</a>.
I also made some draft changes to <a href="https://github.com/openjdk/jdk/pull/28043" moz-do-not-send="true">https://github.com/openjdk/jdk/pull/28043</a>
to add a prototype of setStringWithoutNullTerminator and did
some more microbenchmarking. I updated the PR description with
some results.<br>
<br>
For use-cases like the protobuf one, the interest is more in
getting the best possible performance, rather than API
usability.</div>
<div class="gmail_quote gmail_quote_container">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">It's
possible to call `getBytes` on a string and copy the
resulting byte[] </blockquote>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">into
the memory segment as well, but I suppose you want to avoid
that </blockquote>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">because
of the extra copy of the byte[] (although I think perhaps C2
can </blockquote>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
elide the extra object in that case)? Have you tried looping
over the string </blockquote>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
and manually copying each character using charAt?<br>
</blockquote>
<div><br>
</div>
<div>I'm not seeing competitive performance with the explicit
call to getBytes in the microbenchmarks, so I wonder if it
is perhaps not eliding the copy, although I haven't verified
in the assembly.</div>
<br>
I wouldn't have expected looping with charAt to be competitive
with the fast paths in StringSupport where the bytes are
compatible, or is that not right?</div>
<div class="gmail_quote gmail_quote_container"><br>
</div>
<div class="gmail_quote gmail_quote_container">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
I think I'm slowly coming to the conclusion that we should
just treat <br>
Strings as another source and destination format for data,
with the <br>
caveat that we can not modify a String in place, so any read
operations <br>
will have to create a new String instance instead. This
creates some <br>
asymmetry with the existing MemorySegment::copy methods. I
think because <br>
of that restriction, we might have to accept some asymmetry
between the <br>
read and write APIs for strings as well.</blockquote>
<div><br>
</div>
Do you have a feeling for how that approach might best be
exposed in the API? Do you think it might look like more
variations of getString/setString in MemorySegment? Or that
there might be a missing API primitive that could encapsulate
those String sources and destinations? Or something else?<br>
</div>
</div>
</blockquote>
</body>
</html>