Re: MemorySegment APIs for reading and writing strings with known lengths
Liam Miller-Cushon
cushon at google.com
Tue Nov 4 15:00:05 UTC 2025
Thanks,
> The fast paths in StringSupport call an out-of-line stub that does a
> vectorized copy. At least in theory C2's auto-vectorizer should be able to
> do the exact same thing for a manual loop using charAt, but inline. i.e. it
> might even be faster, especially for small strings. That's why it would be
> good to try that approach and see how it compares.
>
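Here is a minimal sketch of the manual loop described in the quote, for the UTF-16 case. It assumes JDK 22 or later (the final java.lang.foreign API). The class and method names are hypothetical. Nothing here claims that C2 actually vectorizes this loop. It only shows the kind of inlined counted charAt loop the quote has in mind.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.util.Objects;

final class Utf16CharAtCopy {

    // Hypothetical helper: writes src.charAt(srcIndex) .. src.charAt(srcIndex + length - 1)
    // into dst as UTF-16 code units (native byte order), starting at byte offset dstOffset.
    // No terminator is written; dst bounds are checked by MemorySegment.set.
    static void copyUtf16(String src, int srcIndex, MemorySegment dst, long dstOffset, int length) {
        Objects.checkFromIndexSize(srcIndex, length, src.length());
        // Plain counted loop over charAt: the shape C2's auto-vectorizer
        // might turn into vector code (not guaranteed).
        for (int i = 0; i < length; i++) {
            dst.set(ValueLayout.JAVA_CHAR_UNALIGNED, dstOffset + 2L * i, src.charAt(srcIndex + i));
        }
    }

    // Copies a substring into native memory, then reads it back char by char.
    static String roundTrip(String src, int srcIndex, int length) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment seg = arena.allocate(2L * length);
            copyUtf16(src, srcIndex, seg, 0, length);
            char[] out = new char[length];
            for (int i = 0; i < length; i++) {
                out[i] = seg.get(ValueLayout.JAVA_CHAR_UNALIGNED, 2L * i);
            }
            return new String(out);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello, panama", 7, 6)); // prints "panama"
    }
}
```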
I can take a closer look at this. To check my understanding, would you
expect it to be competitive for UTF-16, or also UTF-8? For the UTF-8 case,
would you expect something like what proto is currently doing here [1] to
get vectorized?
[1]
https://github.com/protocolbuffers/protobuf/blob/0a727cfc6e0a6dbeb46716f2f6142b99b6a604e0/java/core/src/main/java/com/google/protobuf/Utf8.java#L939-L990
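For the UTF-8 question, here is a sketch of a charAt-based encoder that writes straight into a MemorySegment. It uses a separate loop for the leading ASCII run and a general loop for everything after it. This is an independent illustration, not the code at [1]. Names are hypothetical, and it assumes JDK 22 or later. The ASCII loop exits early on the first non-ASCII char, and whether C2 vectorizes a loop with that kind of exit is exactly the open question.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

import static java.lang.foreign.ValueLayout.JAVA_BYTE;

final class Utf8CharAtEncode {

    // Hypothetical helper: encodes src as UTF-8 into dst starting at offset and returns
    // the number of bytes written. Unpaired surrogates become '?', matching
    // String.getBytes(UTF_8). dst needs room for up to 3 bytes per char.
    static long encodeUtf8(String src, MemorySegment dst, long offset) {
        int len = src.length();
        long pos = offset;
        int i = 0;
        // Leading ASCII run: one byte per char, stop at the first non-ASCII char.
        for (; i < len; i++) {
            char c = src.charAt(i);
            if (c >= 0x80) {
                break;
            }
            dst.set(JAVA_BYTE, pos++, (byte) c);
        }
        // General loop for the remainder of the string.
        for (; i < len; i++) {
            char c = src.charAt(i);
            if (c < 0x80) {
                dst.set(JAVA_BYTE, pos++, (byte) c);
            } else if (c < 0x800) {
                dst.set(JAVA_BYTE, pos++, (byte) (0xC0 | (c >>> 6)));
                dst.set(JAVA_BYTE, pos++, (byte) (0x80 | (c & 0x3F)));
            } else if (Character.isSurrogate(c)) {
                if (Character.isHighSurrogate(c) && i + 1 < len
                        && Character.isLowSurrogate(src.charAt(i + 1))) {
                    int cp = Character.toCodePoint(c, src.charAt(++i));
                    dst.set(JAVA_BYTE, pos++, (byte) (0xF0 | (cp >>> 18)));
                    dst.set(JAVA_BYTE, pos++, (byte) (0x80 | ((cp >>> 12) & 0x3F)));
                    dst.set(JAVA_BYTE, pos++, (byte) (0x80 | ((cp >>> 6) & 0x3F)));
                    dst.set(JAVA_BYTE, pos++, (byte) (0x80 | (cp & 0x3F)));
                } else {
                    dst.set(JAVA_BYTE, pos++, (byte) '?'); // unpaired surrogate
                }
            } else {
                dst.set(JAVA_BYTE, pos++, (byte) (0xE0 | (c >>> 12)));
                dst.set(JAVA_BYTE, pos++, (byte) (0x80 | ((c >>> 6) & 0x3F)));
                dst.set(JAVA_BYTE, pos++, (byte) (0x80 | (c & 0x3F)));
            }
        }
        return pos - offset;
    }

    // Checks encodeUtf8 against String.getBytes(UTF_8).
    static boolean matchesJdk(String s) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment seg = arena.allocate(3L * s.length());
            long n = encodeUtf8(s, seg, 0);
            byte[] ours = seg.asSlice(0, n).toArray(JAVA_BYTE);
            return Arrays.equals(ours, s.getBytes(StandardCharsets.UTF_8));
        }
    }
}
```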
> I was thinking primarily along the lines of adding a MemorySegment::copy
> overload that accepts Strings as a source (as opposed to e.g. an array),
> for copying from a string to a memory segment only. We should probably also
> add an overload to SegmentAllocator::allocateFrom that accepts an offset
> and a length (we already have two for full strings). These two overloads
> could fully support the substring use case without looking too out of
> place.
>
> For reading a String, I think your proposal to augment
> MemorySegment::getString looks good, but I think we should leave setString
> alone in favor of adding a MS::copy overload (there's the asymmetry I was
> talking about before).
>
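To make the getString point concrete: the existing MemorySegment::getString(long, Charset) scans for a NUL terminator, so a string with a known length, or one with embedded NULs, needs a different path. The sketch below uses only existing JDK 22 APIs. The getString(segment, offset, byteLength, charset) helper is hypothetical and says nothing about the signature a real overload would have.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class KnownLengthRead {

    // Hypothetical helper: decodes exactly byteLength bytes starting at offset,
    // including any NUL bytes, instead of stopping at the first terminator.
    static String getString(MemorySegment segment, long offset, int byteLength, Charset charset) {
        byte[] bytes = new byte[byteLength];
        MemorySegment.copy(segment, ValueLayout.JAVA_BYTE, offset, bytes, 0, byteLength);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        byte[] data = "ab\0cd".getBytes(StandardCharsets.UTF_8);
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment seg = arena.allocate(data.length);
            MemorySegment.copy(data, 0, seg, ValueLayout.JAVA_BYTE, 0, data.length);
            // Existing API stops at the NUL: prints 2
            System.out.println(seg.getString(0, StandardCharsets.UTF_8).length());
            // Known-length read keeps all five chars: prints 5
            System.out.println(getString(seg, 0, data.length, StandardCharsets.UTF_8).length());
        }
    }
}
```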
Thanks, I think I understand better now. Using copy for this seems a lot
nicer than setStringWithoutNullTerminator.
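Here is a rough sketch of what a String-source copy could look like from the caller's side. The copy(...) helper, its parameter order, and the choice to count the source range in chars while returning bytes written are all assumptions made for illustration. It is built from String::getBytes and the existing MemorySegment::copy(Object, int, MemorySegment, ValueLayout, long, int). Because it goes through an intermediate byte[], it does not reflect what a JDK-internal overload could do.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class StringToSegmentCopy {

    // Hypothetical shape: encodes chars [srcIndex, srcIndex + numChars) of src with
    // charset and writes the bytes to dst at dstOffset, with no NUL terminator.
    // Returns the number of bytes written.
    static long copy(String src, int srcIndex, int numChars, Charset charset,
                     MemorySegment dst, long dstOffset) {
        byte[] bytes = src.substring(srcIndex, srcIndex + numChars).getBytes(charset);
        MemorySegment.copy(bytes, 0, dst, ValueLayout.JAVA_BYTE, dstOffset, bytes.length);
        return bytes.length;
    }

    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment seg = arena.allocate(16);
            long n = copy("hello world", 6, 5, StandardCharsets.UTF_8, seg, 0);
            byte[] written = seg.asSlice(0, n).toArray(ValueLayout.JAVA_BYTE);
            System.out.println(new String(written, StandardCharsets.UTF_8)); // prints "world"
        }
    }
}
```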
For the allocateFrom part, do you think it would make sense to pass the
offset/length all the way through bytesCompatible/copyToSegmentRaw? That
could be decided with benchmarks, and also potentially done later with the
same allocateFrom API shape if it ended up being worthwhile.
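As a baseline for such benchmarks, an (offset, length) allocateFrom can already be expressed on top of the existing full-string overload. The helper below is hypothetical. It keeps the NUL termination of SegmentAllocator::allocateFrom(String, Charset), which is an assumption about what a new overload would do. Its substring() call is the extra copy that threading offset/length through the internal path would remove.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.SegmentAllocator;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class SubstringAllocate {

    // Hypothetical helper: allocates a NUL-terminated copy of
    // str[offset, offset + length) encoded with charset.
    static MemorySegment allocateFrom(SegmentAllocator allocator, String str, Charset charset,
                                      int offset, int length) {
        return allocator.allocateFrom(str.substring(offset, offset + length), charset);
    }

    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment seg = allocateFrom(arena, "hello world", StandardCharsets.UTF_8, 6, 5);
            System.out.println(seg.getString(0, StandardCharsets.UTF_8)); // prints "world"
            System.out.println(seg.byteSize());                           // prints 6 (5 bytes + NUL)
        }
    }
}
```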
>
> For completeness, I think we should also just add the
> MemorySegment::ofString(String, Charset) overload which tries to return a
> read-only view of the string, to match the existing ofArray methods. This
> seems generally just a good primitive to have.
>
That sounds good to me.
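For comparison, here is the closest user-level approximation of ofString. The helper name ofStringCopy is hypothetical. Unlike the proposed method, it always copies, because code outside java.base cannot view a String's backing array. It does show the intended result: a read-only segment over the encoded bytes with no terminator, analogous to MemorySegment::ofArray.

```java
import java.lang.foreign.MemorySegment;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class StringSegments {

    // Hypothetical stand-in for the proposed MemorySegment.ofString(String, Charset):
    // encodes s into a fresh byte[] (no NUL terminator) and wraps it in a
    // read-only heap segment. This copies; it is not a view of the String.
    static MemorySegment ofStringCopy(String s, Charset charset) {
        return MemorySegment.ofArray(s.getBytes(charset)).asReadOnly();
    }

    public static void main(String[] args) {
        MemorySegment seg = ofStringCopy("panama", StandardCharsets.UTF_8);
        System.out.println(seg.isReadOnly()); // prints true
        System.out.println(seg.byteSize());   // prints 6
    }
}
```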
Do you have thoughts on the best way to proceed here? Do you think it makes
sense to do this incrementally, or would you prefer to see all of these related
changes happen together under a single issue?
More information about the panama-dev mailing list