<div dir="ltr"><div>Thanks,</div><div class="gmail_quote gmail_quote_container"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><p>
The fast paths in StringSupport call an out-of-line stub that does
a vectorized copy. At least in theory C2's auto-vectorizer should
be able to do the exact same thing for a manual loop using charAt,
but inline. i.e. it might even be faster, especially for small
strings. That's why it would be good to try that approach and see
how it compares.<br></p></div></blockquote><div>I can take a closer look at this. To check my understanding, would you expect it to be competitive for UTF-16, or also UTF-8? For the UTF-8 case, would you expect something like what proto is currently doing here [1] to get vectorized?</div><div><br></div><div>[1] <a href="https://github.com/protocolbuffers/protobuf/blob/0a727cfc6e0a6dbeb46716f2f6142b99b6a604e0/java/core/src/main/java/com/google/protobuf/Utf8.java#L939-L990">https://github.com/protocolbuffers/protobuf/blob/0a727cfc6e0a6dbeb46716f2f6142b99b6a604e0/java/core/src/main/java/com/google/protobuf/Utf8.java#L939-L990</a></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><p>
I was thinking primarily along the lines of adding a
MemorySegment::copy overload that accepts Strings as a source (as
opposed to e.g. an array), for copying from a string to a memory
segment only. We should probably also add an overload to
SegmentAllocator::allocateFrom that accepts an offset and a length
(we already have two for full strings). These two overloads could
fully support the substring use case without looking too out of
place. </p></div></blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><p>For reading a String, I think your proposal to augment MemorySegment::getString looks good, but I think we should leave setString alone in favor of adding a MS::copy overload (there's the asymmetry I was talking about before). </p></div></blockquote><div>Thanks, I think I understand better now. Using copy for this seems a lot nicer than setStringWithoutNullTerminator.</div><div> </div><div>For the allocateFrom part, do you think it would make sense to pass the offset/length all the way through bytesCompatible/copyToSegmentRaw? Benchmarks could settle that, and it could also be done later behind the same allocateFrom API shape if it turns out to be worthwhile.</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><p>For completeness, I think we should also just add the
MemorySegment::ofString(String, Charset) overload which tries to
return a read-only view of the string, to match the existing
ofArray methods. This seems generally just a good primitive to
have.</p></div></blockquote><div>That sounds good to me.</div><div><br></div><div>Do you have thoughts on the best way to proceed here? Do you think it makes sense to do this incrementally, or would you prefer to see all of these related changes land together under a single issue?</div></div></div>
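<div>For reference, a minimal sketch of the kind of manual charAt loop discussed above, restricted to the ASCII prefix of a substring. The class and method names (CharAtCopySketch, copyAsciiPrefix) are hypothetical and are not part of the JDK or of protobuf; the sketch only uses the existing MemorySegment::set accessor. Whether C2 vectorizes a loop like this is exactly what a benchmark would need to show, and the early exit on a non-ASCII char may itself get in the way.</div>

```java
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.charset.StandardCharsets;
import java.util.Objects;

// Hypothetical helper, for illustration only.
final class CharAtCopySketch {

    /**
     * Copies chars from src[srcOffset, srcOffset + length) into dst as single
     * bytes, stopping at the first char outside the ASCII range.
     * Returns the number of chars copied; a caller would fall back to a
     * general encoder for the remainder if the result is less than length.
     */
    static int copyAsciiPrefix(String src, int srcOffset, int length,
                               MemorySegment dst, long dstOffset) {
        Objects.checkFromIndexSize(srcOffset, length, src.length());
        for (int i = 0; i < length; i++) {
            char c = src.charAt(srcOffset + i);
            if (c >= 0x80) {
                return i; // non-ASCII: leave the rest to a slower path
            }
            dst.set(ValueLayout.JAVA_BYTE, dstOffset + i, (byte) c);
        }
        return length;
    }

    public static void main(String[] args) {
        byte[] backing = new byte[16];
        MemorySegment dst = MemorySegment.ofArray(backing); // heap segment over the array
        int copied = copyAsciiPrefix("hello world", 6, 5, dst, 0);
        System.out.println(copied + " bytes: "
                + new String(backing, 0, copied, StandardCharsets.US_ASCII));
    }
}
```

<div>The offset and length are taken directly as parameters, so no intermediate substring is allocated, which is the same shape as the copy and allocateFrom overloads being discussed.</div>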