Re: MemorySegment APIs for reading and writing strings with known lengths

Wed Nov 12 10:56:03 UTC 2025

On 12/11/2025 10:02, Liam Miller-Cushon wrote:
> Thanks, yes, I think string concat is a good analogy.
>
> Thinking about this more, isn't this use-case an example where the 
> proposed MemorySegment::ofString approach wouldn't always offer the 
> best possible performance? In the case where the internal string 
> buffer isn't compatible with the requested charset it has to make an 
> intermediate copy. In theory with the alternative of a setString or 
> copy method that took a String and directly wrote it to the output, 
> the intermediate copy could be avoided.

Let's leave MS::ofString aside for this discussion (as I agree that 
wouldn't be optimal for this use case).

I believe what you mean here is that if I have a string, and I want to 
copy to a destination segment I could either:

* if the string buffer is compatible, just bulk-copy that buffer into 
the target segment
* if the string buffer is not compatible, encode the string _directly_ 
into the target segment

Correct? If so, I tend to agree this would be slightly preferable, as 
we'd be touching things only once. And I believe this could also be done 
in the existing setString method?
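
To make the second bullet a bit more concrete, here is a rough sketch 
using only existing public API. It is not how setString is (or would be) 
implemented; DirectEncode and encodeInto are made-up names. It encodes 
straight into a ByteBuffer, so no intermediate byte[] is created. For a 
segment, the target could be the view returned by 
MemorySegment::asByteBuffer. The first bullet (bulk-copying the string's 
internal buffer) can't be expressed in user code, since that buffer is not 
accessible outside the JDK.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

public class DirectEncode {

    /**
     * Hypothetical helper: encodes s directly into target, starting at
     * target's current position, and returns the number of bytes written.
     * No intermediate byte[] holding the encoded form is allocated.
     */
    static int encodeInto(String s, Charset cs, ByteBuffer target) {
        CharsetEncoder encoder = cs.newEncoder(); // reports malformed input by default
        int start = target.position();
        try {
            // CharBuffer.wrap creates a read-only view over the string's chars.
            CoderResult result = encoder.encode(CharBuffer.wrap(s), target, true);
            if (result.isUnderflow()) {
                result = encoder.flush(target);
            }
            if (!result.isUnderflow()) {
                // Overflow (target too small) or malformed/unmappable input.
                result.throwException();
            }
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("cannot encode string", e);
        }
        return target.position() - start;
    }

    public static void main(String[] args) {
        // A direct buffer stands in for an off-heap segment here.
        ByteBuffer target = ByteBuffer.allocateDirect(16);
        int written = encodeInto("h\u00e9llo", StandardCharsets.UTF_8, target);
        System.out.println(written + " bytes written");
    }
}
```

If the target is too small, this throws BufferOverflowException, which is 
where knowing the encoded length up front would help.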

Cheers
Maurizio

>
> On Tue, Nov 11, 2025 at 6:18 PM Maurizio Cimadamore 
> <maurizio.cimadamore at oracle.com> wrote:
>
>     Thanks for the detailed reply.
>
>     For resizing, I tend to agree with you -- the problem is that if
>     you don't size correctly upfront, then you will have to pay the
>     cost (potentially multiple times) to allocate a bigger buffer and
>     move all the contents over there.
>
>     A bit like how string concat has evolved, where we now have ways
>     to "guess" the size of each of the concatenation arguments so we
>     can correctly size the byte[] buffer we create to hold the result
>     of the concatenation.
>
>     In those cases, I agree, paying a small-ish cost to be able to
>     estimate the size of a sub-element of an allocation goes a long
>     way in making everything less dynamic and more deterministic.
>
>     Maurizio
>
>     On 11/11/2025 17:04, Liam Miller-Cushon wrote:
>>
>>         It seems to me that in this case encoding and length travel
>>         together? E.g. you need to encode anyway, at which point you
>>         also know the byte size?
>>
>>         (I'm a bit unsure that here there's anything to be gained by
>>         the method you proposed?)
>>
>>         Do you have use cases where you don't want to decode, you
>>         just want to know the byte length?
>>
>>     The main use-cases I've seen do want both the encoding and the
>>     length.
>>
>>     I think there is still a benefit to a fast way to get the length
>>     first. An alternative is to accumulate into a temporary buffer,
>>     and potentially have to resize it. If there are gigabytes of data
>>     it's expensive to have to make another copy. Knowing the encoded
>>     length up-front allows exactly sizing the output buffer and
>>     avoids the temporary buffer.
>>
>>     Some slightly more concrete examples:
>>
>>     Building a byte[] with all of the content of a lot of data,
>>     sizing the byte[] requires knowing the sum of all the lengths you
>>     want to put into it first and then encoding the strings into it.
>>
>>     Streaming serialization to the network: the top level has to know
>>     the length of the transitive contents that it's going to be
>>     writing out in the nested structures. The actual output is
>>     streamed, it never constructs a byte[] of the complete data in
>>     this scenario.
>>
>>     (There are also some public protobuf APIs that just return an
>>     encoded byte length for the data, but that is a less performance
>>     sensitive use-case.)
>
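
For reference, the "get the length first" step discussed in the quoted 
thread can be sketched in plain Java for UTF-8, without any new API. 
Utf8Length, utf8Length and totalLength are made-up names, and this is only 
an illustration, not a proposed implementation. One assumption is labeled 
in the code: an unpaired surrogate is counted as one byte, matching the 
'?' replacement that String::getBytes(UTF_8) performs.

```java
import java.util.List;

public class Utf8Length {

    /** Hypothetical helper: UTF-8 encoded size of s, computed without encoding it. */
    static long utf8Length(CharSequence s) {
        long bytes = 0;
        int len = s.length();
        for (int i = 0; i < len; i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                bytes += 1;                       // ASCII
            } else if (c < 0x800) {
                bytes += 2;                       // two-byte sequence
            } else if (Character.isSurrogate(c)) {
                if (Character.isHighSurrogate(c) && i + 1 < len
                        && Character.isLowSurrogate(s.charAt(i + 1))) {
                    bytes += 4;                   // supplementary code point
                    i++;                          // skip the low surrogate
                } else {
                    // Assumption: unpaired surrogates are replaced by '?',
                    // as String.getBytes(UTF_8) does.
                    bytes += 1;
                }
            } else {
                bytes += 3;                       // rest of the BMP
            }
        }
        return bytes;
    }

    /** Sum of the encoded sizes, e.g. to size one output buffer exactly. */
    static long totalLength(List<String> strings) {
        long total = 0;
        for (String s : strings) {
            total += utf8Length(s);
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> parts = List.of("abc", "h\u00e9llo", "\u20ac");
        System.out.println(totalLength(parts) + " bytes needed");
    }
}
```

With the total known, the output buffer can be allocated once at exactly 
the right size, and each string encoded into it in a second pass.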