Re: MemorySegment APIs for reading and writing strings with known lengths

Jorn Vernee jorn.vernee at oracle.com
Tue Nov 4 15:23:57 UTC 2025


>     The fast paths in StringSupport call an out-of-line stub that does
>     a vectorized copy. At least in theory C2's auto-vectorizer should
>     be able to do the exact same thing for a manual loop using charAt,
>     but inline. i.e. it might even be faster, especially for small
>     strings. That's why it would be good to try that approach and see
>     how it compares.
>
> I can take a closer look at this. To check my understanding, would you 
> expect it to be competitive for UTF-16, or also UTF-8?
Either should work, though the UTF-16 code for expanding to a char is
more complex, so the vectorizer's pattern matching might fail there. The
code for UTF-8 (well, really latin1) is much simpler (just a plain
array load), so of the two, that one is more likely to vectorize.
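
For illustration, here is a minimal sketch of the kind of manual charAt
loop being discussed, for the latin1 case only. The class and method
names (CharAtCopy, copyLatin1) are hypothetical and not part of any JDK
API; the sketch assumes the caller has already checked that every char
is <= 0xFF. Whether C2 actually vectorizes this loop is exactly what
would need to be benchmarked.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

final class CharAtCopy {
    /**
     * Hypothetical helper: copies s[off, off + len) into dst as single bytes.
     * Assumes every char in the range is <= 0xFF; larger chars would be
     * silently truncated by the narrowing cast.
     */
    static void copyLatin1(String s, int off, int len, MemorySegment dst, long dstOff) {
        // Straight-line loop body: one charAt, one narrowing cast, one store.
        for (int i = 0; i < len; i++) {
            dst.set(ValueLayout.JAVA_BYTE, dstOff + i, (byte) s.charAt(off + i));
        }
    }

    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment seg = arena.allocate(5);
            copyLatin1("xhelloy", 1, 5, seg, 0);
            // seg now holds the five bytes of the substring, with no terminator.
            System.out.println(seg.byteSize());
        }
    }
}
```
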
> For the UTF-8 case, would you expect something like what proto is
> currently doing here [1] to get vectorized?
>
> [1] 
> https://github.com/protocolbuffers/protobuf/blob/0a727cfc6e0a6dbeb46716f2f6142b99b6a604e0/java/core/src/main/java/com/google/protobuf/Utf8.java#L939-L990 

This doesn't look like something that would vectorize. Typically, any 
non-loop-invariant control flow you have in a loop body will inhibit 
vectorization.
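
To make the distinction concrete, here is a rough sketch contrasting a
loop with a data-dependent exit against loops whose bodies are
straight-line code. The names (LoopShapes and its methods) are
hypothetical, and this is not the protobuf code linked above. No claim
is made that the branch-free variants do get vectorized; they merely
avoid the control flow described as an inhibitor.

```java
final class LoopShapes {
    /**
     * Data-dependent control flow inside the loop body: stops at the first
     * non-ASCII char. Returns the number of bytes written.
     */
    static int copyAsciiPrefix(String s, byte[] out) {
        int i = 0;
        for (; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x80) {
                break; // exit depends on the data, not on a loop invariant
            }
            out[i] = (byte) c;
        }
        return i;
    }

    /** Branch-free check: OR all chars together and test once after the loop. */
    static boolean isAscii(String s) {
        int acc = 0;
        for (int i = 0; i < s.length(); i++) {
            acc |= s.charAt(i);
        }
        return acc < 0x80;
    }

    /** Straight-line narrowing copy; only meaningful if isAscii(s) is true. */
    static void copyAscii(String s, byte[] out) {
        for (int i = 0; i < s.length(); i++) {
            out[i] = (byte) s.charAt(i);
        }
    }
}
```

The two-pass shape (isAscii, then copyAscii) reads the string twice, so
it would need benchmarking against the single branchy loop.
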

>     I was thinking primarily along the lines of adding a
>     MemorySegment::copy overload that accepts Strings as a source (as
>     opposed to e.g. an array), for copying from a string to a memory
>     segment only. We should probably also add an overload to
>     SegmentAllocator::allocateFrom that accepts an offset and a length
>     (we already have two for full strings). These two overloads could
>     fully support the sub string use case without looking too out of
>     place.
>
>     For reading a String, I think your proposal to augment
>     MemorySegment::getString looks good, but I think we should leave
>     setString alone in favor of adding a MS::copy overload (there's
>     the asymmetry I was talking about before).
>
> Thanks, I think I understand better now. Using copy for this seems a 
> lot nicer than setStringWithoutNullTerminator.
> For the allocateFrom part, do you think it would make sense to pass 
> the offset/length all the way through 
> bytesCompatible/copyToSegmentRaw? That could be decided with 
> benchmarks, and also potentially done later with the same allocateFrom 
> API shape if it ended up being worthwhile.
I think it should work similarly to the overload we have with
MemorySegment as a source: i.e. just call allocateNoInit, and then
delegate to MemorySegment::copy.
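
As a rough sketch of that "allocate, then delegate to copy" shape using
only public API that exists today: the helper below (SubstringAlloc and
its allocateFrom method are hypothetical names, not the proposed
overload) encodes the substring to a byte[] first, whereas the proposed
MemorySegment::copy overload would take the String directly. It also
uses the public allocate method, since allocateNoInit is internal to
the JDK.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.SegmentAllocator;
import java.lang.foreign.ValueLayout;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class SubstringAlloc {
    /**
     * Hypothetical helper: allocates a segment holding the encoded bytes of
     * s[off, off + len), with no null terminator.
     */
    static MemorySegment allocateFrom(SegmentAllocator allocator, String s,
                                      Charset cs, int off, int len) {
        // Stand-in for the proposed String-source copy: encode to a temporary array.
        byte[] bytes = s.substring(off, off + len).getBytes(cs);
        // Step 1: allocate exactly as many bytes as the encoded form needs.
        MemorySegment seg = allocator.allocate(bytes.length);
        // Step 2: delegate to the existing array-source MemorySegment::copy.
        MemorySegment.copy(bytes, 0, seg, ValueLayout.JAVA_BYTE, 0, bytes.length);
        return seg;
    }

    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment seg = allocateFrom(arena, "xhelloy", StandardCharsets.UTF_8, 1, 5);
            System.out.println(seg.byteSize());
        }
    }
}
```
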
>
>     For completeness, I think we should also just add the
>     MemorySegment::ofString(String, Charset) overload which tries to
>     return a read-only view of the string, to match the existing
>     ofArray methods. This seems generally just a good primitive to have.
>
> That sounds good to me.
>
> Do you have thoughts on the best way to proceed here? Do you think it 
> makes sense to do incrementally, or would you prefer to see all of 
> these related changes happen together under a single issue?
>
I don't have a preference. Since you've already started a PR for
enhancing getString, maybe you can focus on that for now, and we'll file
follow-up issues for the others. Splitting things up might be nice since
there's probably some benchmarking work involved for each. I think the
copy and allocateFrom overloads can be done in one patch though.

Jorn


More information about the panama-dev mailing list